2017-09-21

Parsing HTML source code to extract an expected value

<table width="100%" border="0" cellpadding="0" cellspacing="0" class="ms-bottompaging" xmlns:x="http://www.w3.org/2001/XMLSchema" xmlns:d="http://schemas.microsoft.com/" xmlns:asp="http://schemas.microsoft.com/ASPNET/20" xmlns:pcm="urn:PageContentManager" xmlns:ddwrt2="urn:frontpage:internal"> 
    <tbody> 
     <tr> 
     <td class="ms-bottompagingline1"><img src="/_images/11/images/blank.gif?rev=40" width="1" height="1" alt="" /></td> 
     </tr> 
     <tr> 
     <td class="ms-bottompagingline2"><img src="/_images/11/images/blank.gif?rev=40" width="1" height="1" alt="" /></td> 
     </tr> 
     <tr> 
     <td class="ms-vb" id="bottomPagingCellWPQ2" align="center"> 
     <table> 
     <tbody> 
      <tr> 
      <td class="ms-paging">1 - 15</td> 
      <td><a onclick="javascript:RefreshPageTo(event, &quot;/sites/myAppDetail/My%20Documents/Forms/AllApplicationss.aspx?Paged=TRUE&amp;p_SortBehavior=0&amp;p_FileLeafRef=LT%5fSW%20TEAM%5fNatural%5fItemCode%5f20170909%5fvstatus%2epdf&amp;p_ID=85&amp;RootFolder=%2fmyData%2fFolder3%2fCommon%20Docs%2fdaily%20Report%2f2017&amp;PageFirstRow=16&amp;&amp;View={05465DFA-110E-21FC-8AD6-8B9846567FF8B}&quot;);javascript:return false;" href="javascript:"><img src="/_layouts/15/1011/images/next.gif" border="0" alt="Next" /></a></td> 
      </tr> 
     </tbody> 
     </table></td> 
     </tr> 
    <tr>....... 

How can I get the value of <a onclick=".."> from the HTML above?

Expected result:

&quot;/sites/myAppDetail/My%20Documents/Forms/AllApplicationss.aspx?Paged=TRUE&amp;p_SortBehavior=0&amp;p_FileLeafRef=LT%5fSW%20TEAM%5fNatural%5fItemCode%5f20170909%5fvstatus%2epdf&amp;p_ID=85&amp;RootFolder=%2fmyData%2fFolder3%2fCommon%20Docs%2fdaily%20Report%2f2017&amp;PageFirstRow=16&amp;&amp;View={05465DFA-110E-21FC-8AD6-8B9846567FF8B}&quot; 

I tried the code below, but the output is not what I expected.

File input = new File("myHtml.html");
Document doc = Jsoup.parse(input, "UTF-8");
// Intended target: the <a onclick="javascript:RefreshPageTo(event, &quot;...)"> next to <td class="ms-paging">1 - 15</td>
Elements links = doc.select(".ms-paging > td > a");
System.out.println("size : " + links.size()); // prints 0
for (Element link : links) {
    System.out.println(link); // never reached; it should print the link
}

".ms-paging + td > a"? The '>' means "direct child", but what you want is the td sibling that follows the one with class "ms-paging".

Answer

Use the sibling combinator ~ to select a td that follows the td with class="ms-paging" (your selector looked for a td nested inside it, and there is none). The following worked for me:

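File input = new File("myHtml.html"); // same file as in the question
Document doc = Jsoup.parse(input, "UTF-8");
// td.ms-paging ~ td: a td that is a later sibling of <td class="ms-paging">
Elements elements = doc.select("td.ms-paging ~ td > a");
for (Element e : elements) {
    // attr() returns the decoded value, so &quot; is already a plain "
    String attrValue = e.attr("onclick");
    // keep only the text between the first and the last double quote
    System.out.println(attrValue.substring(attrValue.indexOf("\"") + 1,
            attrValue.lastIndexOf("\"")));
}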

This prints the expected value:

/sites/myAppDetail/My%20Documents/Forms/AllApplicationss.aspx?Paged=TRUE&p_SortBehavior=0&p_FileLeafRef=LT%5fSW%20TEAM%5fNatural%5fItemCode%5f20170909%5fvstatus%2epdf&p_ID=85&RootFolder=%2fmyData%2fFolder3%2fCommon%20Docs%2fdaily%20Report%2f2017&PageFirstRow=16&&View={05465DFA-110E-21FC-8AD6-8B9846567FF8B}
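One caveat about the substring step: if an onclick value contains no double quote, or only one, substring() throws a StringIndexOutOfBoundsException. Here is a minimal sketch of a more defensive variant that uses only the standard library. The class name OnclickArgument, the method firstQuotedArgument and the shortened sample URL are made up for illustration, and it assumes the URL is the first double-quoted argument, as in the markup from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: pulls the first double-quoted
// argument out of an already decoded onclick attribute value.
public class OnclickArgument {

    // A double quote, then any run of non-quote characters, then a double quote.
    private static final Pattern FIRST_QUOTED = Pattern.compile("\"([^\"]*)\"");

    public static Optional<String> firstQuotedArgument(String onclick) {
        Matcher m = FIRST_QUOTED.matcher(onclick);
        // An empty Optional means the value held no quoted argument at all.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened, made-up stand-in for the value returned by e.attr("onclick").
        String onclick = "javascript:RefreshPageTo(event, "
                + "\"/sites/x.aspx?Paged=TRUE&PageFirstRow=16\");javascript:return false;";
        System.out.println(firstQuotedArgument(onclick).orElse("(no quoted argument)"));
    }
}
```

In the loop above you would pass e.attr("onclick") to firstQuotedArgument and skip links for which the result is empty.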

Hope it helps!