2011-11-20 4 views


0

Use BaseSpider instead of CrawlSpider, then supply the starting URLs either by overriding start_requests or by listing them in start_urls:

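from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = "myspider"

    def start_requests(self):
        # Schedule the first request explicitly and route it to parse()
        return [Request("https://www.example.com",
                        callback=self.parse)]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        ...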
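
To illustrate why the two options in this answer are interchangeable, here is a minimal pure-Python sketch. It assumes the usual default behaviour, where start_requests produces one request per entry in start_urls and sends each response to parse. FakeRequest, FakeSpider and MySpiderSketch are hypothetical stand-ins written for this example; they are not Scrapy classes.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: a URL plus a callback."""
    url: str
    callback: Callable


class FakeSpider:
    """Hypothetical stand-in for a spider base class (not Scrapy code)."""
    start_urls: List[str] = []

    def start_requests(self) -> Iterator[FakeRequest]:
        # Default behaviour being illustrated: one request per start URL,
        # each routed to self.parse.
        for url in self.start_urls:
            yield FakeRequest(url, callback=self.parse)

    def parse(self, response):
        raise NotImplementedError


class MySpiderSketch(FakeSpider):
    # Listing the URL here has the same effect as overriding
    # start_requests to return a single request for it.
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        return response


requests = list(MySpiderSketch().start_requests())
```

Overriding start_requests is only needed when the first requests require something extra, such as a different callback; otherwise start_urls is the shorter form.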
0

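from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

class ThemenHubSpider(CrawlSpider):
    name = 'themenHub'
    allowed_domains = ['themen.t-online.de']
    start_urls = ["http://themen.t-online.de/themen-a-z/a"]
    # 'parse_news' must be defined as a method on this class
    rules = [Rule(SgmlLinkExtractor(allow=[r'id_\d+']), 'parse_news')]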
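
To see which links the rule above would follow, the allow pattern can be tried on its own with the standard re module. This is a rough sketch that assumes the link extractor applies each allow pattern as a regular-expression search over the link's absolute URL; the candidate URLs are made up for illustration and are not taken from the site.

```python
import re

# Same pattern as in the rule above, written as a raw string.
allow = re.compile(r"id_\d+")

# Hypothetical URLs, invented for this example only.
candidates = [
    "http://themen.t-online.de/news/id_12345/index",
    "http://themen.t-online.de/themen-a-z/b",
    "http://themen.t-online.de/news/id_/index",
]

# Keep only the URLs in which the pattern occurs somewhere.
followed = [url for url in candidates if allow.search(url)]
```

Only the first URL contains "id_" followed by at least one digit, so it is the only one kept. Note that a plain start URL such as the themen-a-z page does not match the pattern itself; it is fetched because it is listed in start_urls, and the rule is then applied to the links found on it.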