2017-04-21

I have data from a website that I am scraping. The data looks like the HTML below. How can I extract exactly the table using ScrapySharp?

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Threading.Tasks; 

using HtmlAgilityPack; 
using ScrapySharp.Extensions; 
using ScrapySharp.Network; 

namespace Scrape 
{ 
    class Program 
    { 
        static void Main(string[] args)
        {
            ScrapingBrowser browser = new ScrapingBrowser();

            //set UseDefaultCookiesParser as false if a website returns invalid cookies format
            //browser.UseDefaultCookiesParser = false;

            WebPage homePage = browser.NavigateToPage(new Uri("http://www.nasdaq.com/earnings/earnings-calendar.aspx"));
            var divs = homePage.Html.CssSelect("div"); //all div elements
            var trs = homePage.Html.SelectNodes("//div")
                .Where(n => !String.IsNullOrEmpty(n.GetAttributeValue("class"))
                //(n.GetAttributeValue("class") == "genTable")
                );
        }
    } 
} 

This is the relevant part of the HTML:

<div class="clearB"></div> 

     <div class="genTable"> 
      <div id="_confirmed" > 
       <!--<div class="floatL"> 
        <h3>Earnings Date - Confirmed by Zacks</h3> 
       </div> 
       <div class="clearB"></div> 
       <br />--> 

       <div id="two_column_main_content_pnlInsider"> 


          <table class="USMN_EarningsCalendar" id="ECCompaniesTable" border="0"cellpadding="0" cellspacing="0"> 
           <thead> 
           <tr> 
            <th> 
             <a href="javascript:void(0);" onclick="getdata('earningtype',1)">Time</a> 
            </th> 
            <th> 
             Company Name (Symbol) <br /> Market Cap<br /> 
             Sort by: <a href="javascript:void(0);" onclick="getdata('name',1)">Name</a>/<a href="javascript:void(0);" onclick="getdata('marketcap',1)">Size</a> 
            </th> 
            <th> 
             Expected Report Date 
            </th> 
            <th> 
             Fiscal<br /> 
             Quarter<br /> 
             Ending 
            </th> 
            <th> 
             <span id="two_column_main_content_CompanyTable_EPS"> 
              Consensus<br /> 
              EPS* Forecast 
             </span> 
            </th> 
            <th> 
             # of Ests 
            </th> 
            <th style=""> 
             <span id="two_column_main_content_CompanyTable_previousreportdate"> 
              Last Year's<br /> 
              Report Date 
             </span> 
            </th> 
            <th> 
             Last Year's EPS* 
            </th> 
            <th style="display:none"> 
             % Suprise<br /> 
            </th> 
            </tr> 
           </thead> 

          <tr> 
           <td> 
            <a href="http://www.nasdaq.com/symbol/abb/premarket" title="Pre-market Quotes"><img src="http://www.nasdaq.com/images/weather_sun.jpg" alt="Pre-Market Quotes" height="16" width="16"></a> 
           </td> 
           <td> 
            <a id="two_column_main_content_CompanyTable_companyname_0" href="http://www.nasdaq.com/earnings/report/abb">ABB Ltd (ABB) <br/><b>Market Cap: $47.63B</b></a> 
           </td> 
           <td> 
            04/20/2017 
           </td> 
           <td> 
            Mar 2017 
           </td> 
           <td> 
            $0.25 
           </td> 
           <td> 
            2 
           </td> 
           <td style=""> 
            04/20/2016 
           </td> 
           <td> 
            $0.23 
           </td> 
           <td style="display:none"> 
            <span style='color:green'>Met</span> 
           </td> 
          </tr> 

          <tr> 
           <td> 
            <a href="http://www.nasdaq.com/symbol/acu/premarket" title="Pre-market Quotes"><img src="http://www.nasdaq.com/images/weather_sun.jpg" alt="Pre-Market Quotes" height="16" width="16"></a> 
           </td> 
           <td> 
            <a id="two_column_main_content_CompanyTable_companyname_1" href="http://www.nasdaq.com/earnings/report/acu">Acme United Corporation. (ACU) <br/><b>Market Cap: $92.5M</b></a> 
           </td> 
           <td> 
            04/20/2017 
           </td> 
           <td> 
            Mar 2017 
           </td> 
           <td> 
            $0.18 
           </td> 
           <td> 
            1 
           </td> 
           <td style=""> 
            04/22/2016 
           </td> 
           <td> 
            $0.16 
           </td> 
           <td style="display:none"> 
            <span style='color:green'>Met</span> 
           </td> 
          </tr> 

      ................ 

1 Answer

I imagine something like this code:

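var hw = new HtmlWeb();
var doc = hw.Load("http://www.nasdaq.com/earnings/earnings-calendar.aspx");

// Locate the table by its id; FirstOrDefault returns null if it is not on the page
var table = doc.DocumentNode.Descendants("table").FirstOrDefault(t => t.Id == "ECCompaniesTable");
if (table != null)
{
    // Print the text of every row, header row included
    foreach (HtmlNode row in table.Descendants("tr"))
    {
        Console.WriteLine(row.InnerText);
    }
}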
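
Printing `row.InnerText` runs all the cells of a row together. If you need the individual values, something along these lines might work. It is only a sketch: the class name `EarningsTableSketch`, the helper `ExtractRows` and the shortened sample HTML are my own placeholders, not part of the page or of any library, and it parses an in-memory string so it can be tried without hitting the site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class EarningsTableSketch
{
    // Hypothetical helper: returns one list of trimmed cell texts per data row
    // of the table with the given id. Header rows (th only) are skipped.
    public static List<List<string>> ExtractRows(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = doc.DocumentNode.SelectSingleNode($"//table[@id='{tableId}']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = table?.SelectNodes(".//tr[td]");
        if (rows == null)
        {
            return new List<List<string>>();
        }

        return rows
            .Select(row => row.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList())
            .ToList();
    }

    static void Main()
    {
        // Shortened sample modeled on the HTML in the question (assumed, not the full page).
        const string html = @"
<table class=""USMN_EarningsCalendar"" id=""ECCompaniesTable"">
  <thead><tr><th>Company Name (Symbol)</th><th>Expected Report Date</th><th>Consensus EPS* Forecast</th></tr></thead>
  <tr><td>ABB Ltd (ABB)</td><td> 04/20/2017 </td><td> $0.25 </td></tr>
  <tr><td>Acme United Corporation. (ACU)</td><td> 04/20/2017 </td><td> $0.18 </td></tr>
</table>";

        foreach (var cells in ExtractRows(html, "ECCompaniesTable"))
        {
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
```

For the live page you would pass the downloaded markup instead of the sample string, for example `hw.Load(...).DocumentNode.OuterHtml`.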
A variant of this actually works. Thanks. – Ivan
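
Since the question asks about ScrapySharp specifically, here is one possible variant built on the `CssSelect` extension that the question's code already uses. This is a guess at what such a variant could look like, not necessarily the one Ivan ended up with. The class name `CssSelectSketch` and the shortened sample HTML are assumptions made so the snippet can run without a network call.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

static class CssSelectSketch
{
    static void Main()
    {
        // Shortened sample modeled on the HTML in the question (assumed, not the full page).
        const string html = @"
<div class=""genTable"">
  <table class=""USMN_EarningsCalendar"" id=""ECCompaniesTable"">
    <thead><tr><th>Company Name (Symbol)</th><th>Expected Report Date</th></tr></thead>
    <tr><td>ABB Ltd (ABB)</td><td> 04/20/2017 </td></tr>
    <tr><td>Acme United Corporation. (ACU)</td><td> 04/20/2017 </td></tr>
  </table>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the table by id with a CSS selector instead of filtering every div.
        var table = doc.DocumentNode.CssSelect("#ECCompaniesTable").FirstOrDefault();
        if (table == null)
        {
            Console.WriteLine("Table not found");
            return;
        }

        foreach (var row in table.CssSelect("tr"))
        {
            var cells = row.CssSelect("td")
                .Select(td => td.InnerText.Trim())
                .ToList();

            // The header row has only th cells, so it yields no td values and is skipped.
            if (cells.Count > 0)
            {
                Console.WriteLine(string.Join(" | ", cells));
            }
        }
    }
}
```

With the `ScrapingBrowser` code from the question, `homePage.Html` would take the place of `doc.DocumentNode`.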