You should be fine with:
apt-get install -y \
    python3 \
    python-dev \
    python3-dev

# for cryptography
apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev

# for lxml
apt-get install -y \
    libxml2-dev \
    libxslt-dev

# install pip
apt-get install -y python-pip
Here is an example Dockerfile for testing a Scrapy install with Python 3 on Ubuntu 16.04/Xenial:
$ cat Dockerfile
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update
# Install Python 3 and dev headers
RUN apt-get install -y \
    python3 \
    python-dev \
    python3-dev

# Build dependencies for cryptography
RUN apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev

# Build dependencies for lxml
RUN apt-get install -y \
    libxml2-dev \
    libxslt-dev

# Install pip
RUN apt-get install -y python-pip
RUN useradd --create-home --shell /bin/bash scrapyuser
USER scrapyuser
WORKDIR /home/scrapyuser
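As a side note, a common Dockerfile pattern is to merge apt-get update and the installs into a single RUN layer, so a cached update layer never pairs with a stale package index. A sketch of the same image in that style (same packages, just reorganized; untested variant):

```dockerfile
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive

# One layer: the package index is guaranteed fresh for these installs,
# and the list files are removed to keep the image smaller
RUN apt-get update && apt-get install -y \
        python3 \
        python-dev \
        python3-dev \
        build-essential \
        libssl-dev \
        libffi-dev \
        libxml2-dev \
        libxslt-dev \
        python-pip \
    && rm -rf /var/lib/apt/lists/*

RUN useradd --create-home --shell /bin/bash scrapyuser
USER scrapyuser
WORKDIR /home/scrapyuser
```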
Then, after building the Docker image and starting a container with:
$ sudo docker build -t redapple/scrapy-ubuntu-xenial .
$ sudo docker run -t -i redapple/scrapy-ubuntu-xenial
you can run pip install scrapy. Below I'm using virtualenvwrapper to create a Python 3 virtualenv:
[email protected]:~$ pip install --user virtualenvwrapper
Collecting virtualenvwrapper
Downloading virtualenvwrapper-4.7.1-py2.py3-none-any.whl
Collecting virtualenv-clone (from virtualenvwrapper)
Downloading virtualenv-clone-0.2.6.tar.gz
Collecting stevedore (from virtualenvwrapper)
Downloading stevedore-1.14.0-py2.py3-none-any.whl
Collecting virtualenv (from virtualenvwrapper)
Downloading virtualenv-15.0.2-py2.py3-none-any.whl (1.8MB)
100% |################################| 1.8MB 320kB/s
Collecting pbr>=1.6 (from stevedore->virtualenvwrapper)
Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB)
100% |################################| 102kB 1.5MB/s
Collecting six>=1.9.0 (from stevedore->virtualenvwrapper)
Downloading six-1.10.0-py2.py3-none-any.whl
Building wheels for collected packages: virtualenv-clone
Running setup.py bdist_wheel for virtualenv-clone ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/24/51/ef/93120d304d240b4b6c2066454250a1626e04f73d34417b956d
Successfully built virtualenv-clone
Installing collected packages: virtualenv-clone, pbr, six, stevedore, virtualenv, virtualenvwrapper
Successfully installed pbr six stevedore virtualenv virtualenv-clone virtualenvwrapper
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[email protected]:~$ source ~/.local/bin/virtualenvwrapper.sh
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/premkproject
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postmkproject
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/initialize
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/premkvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postmkvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/prermvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postrmvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/predeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postdeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/preactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/get_env_details
[email protected]:~$ export PATH=$PATH:/home/scrapyuser/.local/bin
[email protected]:~$ mkvirtualenv --python=/usr/bin/python3 scrapy11.py3
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/python3
Also creating executable in /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/predeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/postdeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/preactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/postactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/get_env_details
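If you'd rather not pull in virtualenvwrapper, the stdlib venv module (Python 3.3+) gives an equivalent result. A minimal sketch, assuming the paths above; on Debian/Ubuntu this may additionally require the python3-venv package:

```shell
# Create and activate a Python 3 virtualenv with the stdlib venv module
python3 -m venv ~/.virtualenvs/scrapy11.py3
. ~/.virtualenvs/scrapy11.py3/bin/activate
```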
and installing Scrapy 1.1 is then just a matter of pip install scrapy:
(scrapy11.py3) [email protected]:~$ pip install scrapy
Collecting scrapy
Downloading Scrapy-1.1.0-py2.py3-none-any.whl (294kB)
100% |################################| 296kB 1.0MB/s
Collecting PyDispatcher>=2.0.5 (from scrapy)
Downloading PyDispatcher-2.0.5.tar.gz
Collecting pyOpenSSL (from scrapy)
Downloading pyOpenSSL-16.0.0-py2.py3-none-any.whl (45kB)
100% |################################| 51kB 1.8MB/s
Collecting lxml (from scrapy)
Downloading lxml-3.6.0.tar.gz (3.7MB)
100% |################################| 3.7MB 312kB/s
Collecting parsel>=0.9.3 (from scrapy)
Downloading parsel-1.0.2-py2.py3-none-any.whl
Collecting six>=1.5.2 (from scrapy)
Using cached six-1.10.0-py2.py3-none-any.whl
Collecting Twisted>=10.0.0 (from scrapy)
Downloading Twisted-16.2.0.tar.bz2 (2.9MB)
100% |################################| 2.9MB 307kB/s
Collecting queuelib (from scrapy)
Downloading queuelib-1.4.2-py2.py3-none-any.whl
Collecting cssselect>=0.9 (from scrapy)
Downloading cssselect-0.9.1.tar.gz
Collecting w3lib>=1.14.2 (from scrapy)
Downloading w3lib-1.14.2-py2.py3-none-any.whl
Collecting service-identity (from scrapy)
Downloading service_identity-16.0.0-py2.py3-none-any.whl
Collecting cryptography>=1.3 (from pyOpenSSL->scrapy)
Downloading cryptography-1.4.tar.gz (399kB)
100% |################################| 409kB 1.1MB/s
Collecting zope.interface>=4.0.2 (from Twisted>=10.0.0->scrapy)
Downloading zope.interface-4.1.3.tar.gz (141kB)
100% |################################| 143kB 1.3MB/s
Collecting attrs (from service-identity->scrapy)
Downloading attrs-16.0.0-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->scrapy)
Downloading pyasn1-0.1.9-py2.py3-none-any.whl
Collecting pyasn1-modules (from service-identity->scrapy)
Downloading pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting idna>=2.0 (from cryptography>=1.3->pyOpenSSL->scrapy)
Downloading idna-2.1-py2.py3-none-any.whl (54kB)
100% |################################| 61kB 2.0MB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools>=11.3 in ./.virtualenvs/scrapy11.py3/lib/python3.5/site-packages (from cryptography>=1.3->pyOpenSSL->scrapy)
Collecting cffi>=1.4.1 (from cryptography>=1.3->pyOpenSSL->scrapy)
Downloading cffi-1.6.0.tar.gz (397kB)
100% |################################| 399kB 1.1MB/s
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3->pyOpenSSL->scrapy)
Downloading pycparser-2.14.tar.gz (223kB)
100% |################################| 225kB 1.2MB/s
Building wheels for collected packages: PyDispatcher, lxml, Twisted, cssselect, cryptography, zope.interface, cffi, pycparser
Running setup.py bdist_wheel for PyDispatcher ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/86/02/a1/5857c77600a28813aaf0f66d4e4568f50c9f133277a4122411
Running setup.py bdist_wheel for lxml ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/6c/eb/a1/e4ff54c99630e3cc6ec659287c4fd88345cd78199923544412
Running setup.py bdist_wheel for Twisted ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/fe/9d/3f/9f7b1c768889796c01929abb7cdfa2a9cdd32bae64eb7aa239
Running setup.py bdist_wheel for cssselect ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/1b/41/70/480fa9516ccc4853a474faf7a9fb3638338fc99a9255456dd0
Running setup.py bdist_wheel for cryptography ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/f6/6c/21/11ec069285a52d7fa8c735be5fc2edfb8b24012c0f78f93d20
Running setup.py bdist_wheel for zope.interface ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/52/04/ad/12c971c57ca6ee5e6d77019c7a1b93105b1460d8c2db6e4ef1
Running setup.py bdist_wheel for cffi ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/8f/00/29/553c1b1db38bbeec3fec428ae4e400cd8349ecd99fe86edea1
Running setup.py bdist_wheel for pycparser ... done
Stored in directory: /home/scrapyuser/.cache/pip/wheels/9b/f4/2e/d03e949a551719a1ffcb659f2c63d8444f4df12e994ce52112
Successfully built PyDispatcher lxml Twisted cssselect cryptography zope.interface cffi pycparser
Installing collected packages: PyDispatcher, idna, pyasn1, six, pycparser, cffi, cryptography, pyOpenSSL, lxml, w3lib, cssselect, parsel, zope.interface, Twisted, queuelib, attrs, pyasn1-modules, service-identity, scrapy
Successfully installed PyDispatcher-2.0.5 Twisted-16.2.0 attrs-16.0.0 cffi-1.6.0 cryptography-1.4 cssselect-0.9.1 idna-2.1 lxml-3.6.0 parsel-1.0.2 pyOpenSSL-16.0.0 pyasn1-0.1.9 pyasn1-modules-0.0.8 pycparser-2.14 queuelib-1.4.2 scrapy-1.1.0 service-identity-16.0.0 six-1.10.0 w3lib-1.14.2 zope.interface-4.1.3
Finally, the sample project:
(scrapy11.py3) [email protected]:~$ scrapy startproject tutorial
New Scrapy project 'tutorial', using template directory '/home/scrapyuser/.virtualenvs/scrapy11.py3/lib/python3.5/site-packages/scrapy/templates/project', created in:
/home/scrapyuser/tutorial
You can start your first spider with:
cd tutorial
scrapy genspider example example.com
(scrapy11.py3) [email protected]:~$ cd tutorial
(scrapy11.py3) [email protected]:~/tutorial$ scrapy genspider example example.com
Created spider 'example' using template 'basic' in module:
tutorial.spiders.example
(scrapy11.py3) [email protected]:~/tutorial$ cat tutorial/spiders/example.py
# -*- coding: utf-8 -*-
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        pass
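The allowed_domains attribute in the generated spider is what OffsiteMiddleware uses to filter out requests to other hosts. A rough stdlib-only illustration of that kind of host check (a simplified stand-in, not Scrapy's actual implementation):

```python
from urllib.parse import urlparse

def url_is_allowed(url, allowed_domains):
    """Return True if the URL's host is one of the allowed domains
    or a subdomain of one (a simplified offsite check)."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)

allowed = ["example.com"]
print(url_is_allowed("http://www.example.com/page", allowed))  # True
print(url_is_allowed("http://elsewhere.org/", allowed))        # False
```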
(scrapy11.py3) [email protected]:~/tutorial$ scrapy crawl example
2016-06-07 11:08:27 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-06-07 11:08:27 [scrapy] INFO: Overridden settings: {'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial', 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'tutorial.spiders'}
2016-06-07 11:08:27 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats', 'scrapy.extensions.corestats.CoreStats']
2016-06-07 11:08:27 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-07 11:08:27 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-07 11:08:27 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-07 11:08:27 [scrapy] INFO: Spider opened
2016-06-07 11:08:28 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-07 11:08:28 [scrapy] DEBUG: Crawled (404) <GET http://www.example.com/robots.txt> (referer: None)
2016-06-07 11:08:28 [scrapy] DEBUG: Crawled (200) <GET http://www.example.com/> (referer: None)
2016-06-07 11:08:28 [scrapy] INFO: Closing spider (finished)
2016-06-07 11:08:28 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 436,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 1921,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/404': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 6, 7, 11, 8, 28, 614605),
'log_count/DEBUG': 2,
'log_count/INFO': 7,
'response_received_count': 2,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2016, 6, 7, 11, 8, 28, 24624)}
2016-06-07 11:08:28 [scrapy] INFO: Spider closed (finished)
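Note the first request in the log: with ROBOTSTXT_OBEY set to True (the default in projects generated by Scrapy 1.1), Scrapy fetches robots.txt before crawling a site. The underlying idea can be illustrated with the stdlib's urllib.robotparser (a simplified stand-in for Scrapy's RobotsTxtMiddleware):

```python
import urllib.robotparser

# Parse a robots.txt body and check whether URLs may be fetched
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("*", "http://www.example.com/"))           # True
print(rp.can_fetch("*", "http://www.example.com/private/x"))  # False
```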
(scrapy11.py3) [email protected]:~/tutorial$
Thanks Paul. I followed your steps (but without using Docker and virtualenv), and the installation succeeded. However, my default python became 2.7.11, whereas it used to be 3.5.1. Is it possible to fix this? – Harrison
I got the message 'Successfully installed ... scrapy ...', but when I ran 'scrapy startproject myProject' I got an error saying 'The program 'scrapy' is currently not installed. You can install it by typing: sudo apt install python-scrapy' – Harrison
The second problem was solved by 'sudo pip install scrapy' – Harrison
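Regarding that second comment: a likely cause (an assumption, not something the comment confirms) is that pip placed the scrapy console script in a directory that wasn't on PATH, e.g. ~/.local/bin after a pip install --user. Extending PATH avoids resorting to sudo pip:

```shell
# If 'scrapy' was installed with 'pip install --user', its entry-point
# script lands in ~/.local/bin, which may not be on the PATH yet
export PATH="$PATH:$HOME/.local/bin"
```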