2012-03-13 2 views
1

J'ai écrit une fonction python pour évaluer les sites Web en fonction de certains paramètres (une série de mots). La fonction utilise Python Mechanize et fonctionne très bien la plupart du temps.Python mechanize se bloque sur br.open (url) avec self._sleep (pause) traceback

Cependant, pour certains sites Web, il reste suspendu jusqu'à ce que je ctrl + c sur le terminal. Je suppose qu'il s'agit d'une sorte de problème lié à javascript, existe-t-il un moyen de créer une fonction de temporisation autour de cela?

C'est ma fonction:

def rateSite(site_url,comparisonWords): 
    #open the site 
    localBrowser = mechanize.Browser() 
    localBrowser.addheaders = [('User-agent', 'Mozilla/5.1 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/9.0.1')] 
    localBrowser.set_handle_robots(False) 
    site = localBrowser.open(site_url,timeout=5000) 
    html = site.read() 

    #rate the site 
    for i in comparisonWords.split(): 
     #do some rating math 

    return rating 

et c'est le retraçage que je reçois sur ctrl + c:

site=localBrowser.open(site_url,timeout=5000) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open 
    return self._mech_open(url, data, timeout=timeout) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open 
    response = UserAgentBase.open(self, request, data) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open 
    response = meth(req, response) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 612, in http_response 
    "http", request, response, code, msg, hdrs) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 219, in error 
    result = apply(self._call_chain, args) 
    File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain 
    result = func(*args) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 146, in http_error_302 
    return self.parent.open(new) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open 
    return self._mech_open(url, data, timeout=timeout) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open 
    response = UserAgentBase.open(self, request, data) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open 
    response = meth(req, response) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 612, in http_response 
    "http", request, response, code, msg, hdrs) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 219, in error 
    result = apply(self._call_chain, args) 
    File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain 
    result = func(*args) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 146, in http_error_302 
    return self.parent.open(new) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open 
    return self._mech_open(url, data, timeout=timeout) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open 
    response = UserAgentBase.open(self, request, data) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open 
    response = meth(req, response) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 612, in http_response 
    "http", request, response, code, msg, hdrs) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 219, in error 
    result = apply(self._call_chain, args) 
    File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain 
    result = func(*args) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 146, in http_error_302 
    return self.parent.open(new) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open 
    return self._mech_open(url, data, timeout=timeout) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open 
    response = UserAgentBase.open(self, request, data) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open 
    response = meth(req, response) 
    File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 578, in http_response 
    self._sleep(pause) 
KeyboardInterrupt 

Toute aide sur la façon de résoudre ce ou de construire un temps pour elle sera grandement apprécié.

Merci!

Répondre

1

timeout=5000 est de plus d'une heure; vous pourriez vouloir dire timeout=5. Par défaut, mechanize suit au plus 10 redirections avant d'abandonner, voir HTTPRedirectHandler.max_redirections.

+0

Merci, cela semble fonctionner correctement. Je pensais que la valeur du délai d'attente était en millisecondes. – dtrujillo

Questions connexes