How do I create full URLs from partial URLs found in a web page?

I can retrieve the text of a web page, say https://stackoverflow.com/questions, which contains some real and some made-up links:

    /questions
    /tags
    /questions?sort=votes
    /questions?sort=active
    randompage.aspx
    ../coolhomepage.aspx

Knowing that my originating page was https://stackoverflow.com/questions, is there a way in .NET to resolve these links against it, much as a browser is smart enough to resolve the links it finds?
Update
===========================
Using David's solution:
    'Regex to match all <a ... /a> links
    Dim myRegEx As New Regex("\<\s*a (?# Find opening <a tag) " & _
        ".+?href\s*=\s*['""] (?# Then all to href=' or "") " & _
        "(?<href>.*?)['""] (?# Then all to the next ' or "") " & _
        ".*?\> (?# Then all to >) " & _
        "(?<name>.*?)\<\s*/a\s*\> (?# Then all to </a>) ", _
        RegexOptions.IgnoreCase Or _
        RegexOptions.IgnorePatternWhitespace Or _
        RegexOptions.Multiline)

    'MatchCollection to hold all the links that are matched
    Dim myMatchCollection As MatchCollection
    myMatchCollection = myRegEx.Matches(Me._RawPageText)

    'Loop through all matches and evaluate the value of the href attribute.
    For i As Integer = 0 To myMatchCollection.Count - 1
        Dim thisLink As String = ""
        thisLink = myMatchCollection(i).Groups("href").Value()

        'This checks for Javascript and Mailto links.
        'This is not complete. There are others to check I just haven't encountered them yet.
        If thisLink.ToLower.StartsWith("javascript") Then
            thisLink = "JAVASCRIPT: " & thisLink
        ElseIf thisLink.ToLower.StartsWith("mailto") Then
            thisLink = "MAILTO: " & thisLink
        Else
            Dim baseUri As New Uri(Me.URL)
            If Not thisLink.ToLower.StartsWith("http") Then
                'This is a partial URL so we will assume that it's relative to our originating URL
                Dim myUri As New Uri(baseUri, thisLink)
                thisLink = "RELATIVE LOCAL LINK: RESOLVED: " & myUri.ToString() & " ORIGINAL: " & thisLink
            Else
                'The link starts with HTTP, determine if part of base host or is outside host.
                Dim ThisUri As New Uri(thisLink)
                If ThisUri.Host.ToLower = baseUri.Host.ToLower Then
                    thisLink = "INSIDE COMPLETE LINK: " & thisLink
                Else
                    thisLink = "OUTSIDE LINK: " & thisLink
                End If
            End If
        End If

        'I'm storing the found links into a Generic.List(Of String)
        'This link has descriptive text added to it.
        'TODO: Make collection to hold only unique internal links.
        Me._Links.Add(thisLink)
    Next
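The resolution step in the code above can be isolated into a small, standalone sketch. It is only an illustration under stated assumptions: the module name `ResolveLinksSketch` and the helper `ResolveLink` are hypothetical, and the last entry in the list (`https://example.com/page`) is a made-up absolute link added to show the host comparison. The sketch uses `Uri.TryCreate(baseUri, href, result)` rather than the `New Uri(baseUri, href)` constructor so that a malformed `href` yields `Nothing` instead of throwing.

```vbnet
Imports System

Module ResolveLinksSketch

    ' Hypothetical helper: combines href with baseUri and returns the
    ' absolute Uri, or Nothing when the two cannot be combined.
    Function ResolveLink(baseUri As Uri, href As String) As Uri
        Dim resolved As Uri = Nothing
        If Uri.TryCreate(baseUri, href, resolved) Then
            Return resolved
        End If
        Return Nothing
    End Function

    Sub Main()
        ' The originating page from the question.
        Dim baseUri As New Uri("https://stackoverflow.com/questions")

        ' The sample links from the question, plus one made-up absolute link.
        Dim hrefs As String() = {
            "/questions",
            "/tags",
            "/questions?sort=votes",
            "/questions?sort=active",
            "randompage.aspx",
            "../coolhomepage.aspx",
            "https://example.com/page"
        }

        For Each href As String In hrefs
            Dim resolved As Uri = ResolveLink(baseUri, href)
            If resolved Is Nothing Then
                Console.WriteLine(href & " -> (could not be resolved)")
                Continue For
            End If

            ' Compare hosts case-insensitively to tell inside links from outside ones.
            Dim sameHost As Boolean = String.Equals(
                resolved.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase)
            Dim label As String = If(sameHost, "INSIDE", "OUTSIDE")

            Console.WriteLine(href & " -> " & resolved.AbsoluteUri & " [" & label & "]")
        Next
    End Sub

End Module
```

Because the base URL has no trailing slash, `questions` is treated as the last path segment and is replaced during resolution, so `randompage.aspx` resolves to `https://stackoverflow.com/randompage.aspx`. An `href` that is already absolute is returned as given, which suggests the `StartsWith("http")` branch may not be needed for the resolution itself, only for labeling.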
Yes, this is what I needed. It works on URLs coming from a different location. I'll update my question to show how I implemented it. Thanks! – rvarcher