Je suis nouveau sur Scrapy et j'essaie d'explorer quelques liens en tant que test en utilisant Scrapy. Chaque fois que je lance scrapy crawl tier1
, je reçois "TypeError: objet() ne prend aucun paramètre" comme suit:scrapy TypeError: object() ne prend aucun paramètre
Traceback (most recent call last):
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse
mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath
self.add_value(field_name, values, *processors, **kw)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value
self._add_value(field_name, value)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value
processed_value = self._process_input_value(field_name, value)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value
return proc(value)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__
next_values += arg_to_iter(func(v))
TypeError: object() takes no parameters
2017-08-23 17:25:02 [tier1-parse-logger] INFO: Entered the parse function to parse and index: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<date>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166
2017-08-23 17:25:02 [tier1-parse-logger] ERROR: Error (object() takes no parameters) when trying to parse <<author>> from a mk article: http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166
2017-08-23 17:25:02 [scrapy.core.scraper] ERROR: Spider error processing <GET http://news.mk.co.kr/newsRead.php?sc=30000001&year=2017&no=535166> (referer: None)
Traceback (most recent call last):
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/btaek/TaeksProgramming/adv/crawler/adv_crawler/adv_crawler/spiders/tier1_crawler.py", line 93, in parse
mk_loader.add_xpath('title', 'h1[@class="top_title"]') # Title of the article
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 167, in add_xpath
self.add_value(field_name, values, *processors, **kw)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 77, in add_value
self._add_value(field_name, value)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 91, in _add_value
processed_value = self._process_input_value(field_name, value)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/__init__.py", line 150, in _process_input_value
return proc(value)
File "/Users/btaek/TaeksProgramming/adv/crawler/lib/python2.7/site-packages/scrapy/loader/processors.py", line 28, in __call__
next_values += arg_to_iter(func(v))
TypeError: object() takes no parameters
Et, mon dossier d'araignée (de tier1_crawler.py):
Et, mes articles fichier .py:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.loader.processors import Join, MapCompose, TakeFirst
from w3lib.html import remove_tags
def filter_date(value):
if isinstance(value, unicode):
(year, month, day) = str(value.split(" ")[-2]).split(".")
return year+"-"+month+"-"+day
def filter_utf(value):
if isinstance(value, unicode):
return value.encode('utf-8')
class AdvCrawlerItem(scrapy.Item):
author = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),) # Name of the publisher/author
content = scrapy.Field(input_processor=MapCompose(remove_tags, Join, filter_utf),) # Content of the article (entire contents)
content_type = scrapy.Field()
date = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_date),)
timestamp = scrapy.Field() # timestamp of when the document is being indexed
title = scrapy.Field(input_processor=MapCompose(remove_tags, TakeFirst, filter_utf),) # title of the article
url = scrapy.Field() # url of the article
Et, fichier pipelines.py:
import json
from scrapy import signals
from scrapy.exporters import JsonLinesItemExporter
class AdvCrawlerJsonExportPipeline(object):
def open_spider(self, spider):
self.file = open('crawled-articles1.txt', 'w')
def close_spider(self, spider):
self.file.close()
def process_item(self, item, spider):
line = json.dummps(dict(item)) + "\n"
self.file.write(line)
return item
Je suis conscient que l'erreur "TypeError: object() takes no parameters" est généralement levée lorsque la méthode __init__
d'une classe n'est pas définie du tout ou n'est pas définie pour prendre en paramètre (s).
Cependant, dans le cas ci-dessus, comment puis-je corriger l'erreur? Est-ce que je fais quelque chose de mal en utilisant le chargeur d'élément ou le chargeur d'élément imbriqué?
Il a probablement à voir avec l'utilisation de processeurs comme '' TakeFirst' et Join' dans 'MapCompose()', au lieu de fonctions. –