2017-08-23 2 views

I am new to Scrapy and am trying to crawl a few links as a test. Whenever I run scrapy crawl tier1, I get "TypeError: object() takes no parameters", as follows:

Traceback (most recent call last): 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse 
    mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath 
    self.add_value(field_name, values, *processors, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value 
    self._add_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value 
    processed_value = self._process_input_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value 
    return proc(value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__ 
    next_values += arg_to_iter(func(v)) 
TypeError: object() takes no parameters 
2017-08-23 17:25:02 [tier1-parse-logger] INFO: Entered the parse function to parse and index: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166 
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<date>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166 
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<author>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166 
2017-08-23 17:25:02 [scrapy.core.scraper] ERROR: Spider error processing <GET http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166> (referer: None) 
Traceback (most recent call last): 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse 
    mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath 
    self.add_value(field_name, values, *processors, **kw) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value 
    self._add_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value 
    processed_value = self._process_input_value(field_name, value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value 
    return proc(value) 
    File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__ 
    next_values += arg_to_iter(func(v)) 
TypeError: object() takes no parameters 

And my spider file (tier1_crawler.py):

And my items.py file:

# -*- coding: utf-8 -*- 

import scrapy 
from scrapy.loader.processors import Join, MapCompose, TakeFirst 
from w3lib.html import remove_tags 

def filter_date(value):
    if isinstance(value, unicode):
        (year, month, day) = str(value.split(" ")[-2]).split(".")
        return year+"-"+month+"-"+day

def filter_utf(value):
    if isinstance(value, unicode):
        return value.encode('utf-8')

class AdvCrawlerItem(scrapy.Item): 
    author = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),) # Name of the publisher/author 
    content = scrapy.Field(input_processor=MapCompose(remove_tags, Join, filter_utf),) # Content of the article (entire contents) 
    content_type = scrapy.Field() 
    date = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_date),) 
    timestamp = scrapy.Field() # timestamp of when the document is being indexed 
    title = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),) # title of the article 
    url = scrapy.Field() # url of the article 

And my pipelines.py file:

import json 
from scrapy import signals 
from scrapy.exporters import JsonLinesItemExporter 

class AdvCrawlerJsonExportPipeline(object): 
    def open_spider(self, spider):
        self.file = open('crawled-articles1.txt', 'w')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + "\n"
        self.file.write(line)
        return item

I am aware that "TypeError: object() takes no parameters" is usually raised when a class's __init__ method is not defined at all, or is not defined to accept parameters, and the class is nevertheless called with arguments.

However, in the case above, how can I fix the error? Am I doing something wrong in how I use the item loader or the nested item loader?
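The general rule mentioned in the question can be reproduced in isolation. The sketch below is only an illustration: the class names NoInit and WithInit and the helper construct are hypothetical and do not come from the project above. Python 2.7 words the message as "object() takes no parameters"; newer interpreters word it differently, so the sketch captures the exception instead of printing a fixed message.

```python
class NoInit(object):
    """Hypothetical class that defines no __init__ at all."""
    pass


class WithInit(object):
    """Hypothetical class whose __init__ accepts one parameter."""

    def __init__(self, value):
        self.value = value


def construct(cls, *args):
    """Call cls(*args) and return either the instance or the TypeError."""
    try:
        return cls(*args)
    except TypeError as exc:
        return exc


# Calling a class with an argument it cannot accept raises TypeError.
failed = construct(NoInit, 'some value')

# The same call succeeds once __init__ is defined to take the argument.
worked = construct(WithInit, 'some value')
```

In other words, the error means that some class was called with an argument it does not accept, so the thing to look for is a place where a class is being called where a function or an instance was expected.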


It probably has to do with passing processor classes such as 'TakeFirst' and 'Join' to 'MapCompose()' instead of functions. –

Answer


When you use Scrapy processors, you have to instantiate the classes: it is the resulting objects, not the classes themselves, that do the processing:

# wrong 
field = Field(output_processor=MapCompose(TakeFirst)) 
# right 
field = Field(output_processor=MapCompose(TakeFirst())) 
                                                   ^^
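To see why the missing parentheses produce this particular error, here is a minimal sketch using simplified stand-ins. The MapCompose and TakeFirst classes below are NOT Scrapy's real implementations; they are hypothetical reductions that only mimic the one behaviour visible in the traceback, namely that each function is called as func(v) on every value. The helper try_processor is likewise an assumption made for illustration.

```python
# Simplified stand-ins for illustration only, NOT Scrapy's real classes.

class TakeFirst(object):
    """Return the first value that is neither None nor an empty string."""

    def __call__(self, values):
        for value in values:
            if value is not None and value != '':
                return value


class MapCompose(object):
    """Apply each function, in order, to every value in a list."""

    def __init__(self, *functions):
        self.functions = functions

    def __call__(self, values):
        for func in self.functions:
            next_values = []
            for v in values:
                # If func is the class itself, this line is TakeFirst(v):
                # an attempt to construct TakeFirst with one argument.
                result = func(v)
                if result is not None:
                    next_values.append(result)
            values = next_values
        return values


def try_processor(processor, values):
    """Run the processor and report either its result or the TypeError."""
    try:
        return ('ok', processor(values))
    except TypeError as exc:
        return ('TypeError', str(exc))


# Passing the class: func(v) tries to build a TakeFirst from v and fails.
wrong = try_processor(MapCompose(TakeFirst), [['a', 'b']])

# Passing an instance: func(v) invokes TakeFirst.__call__ and succeeds.
right = try_processor(MapCompose(TakeFirst()), [['a', 'b']])
```

With these stand-ins, wrong records a TypeError while right holds the processed list. As a separate point, Scrapy's documentation describes TakeFirst as typically used as an output processor for single-valued fields, so it may be worth checking whether TakeFirst() and Join() belong inside MapCompose() at all or directly in output_processor.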

Ah, you're right. I made a silly mistake. Thanks. – btaek