A Simple Guide to Installing Scrapy on Windows
阿新 • Published: 2018-11-11
Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.
Part of Scrapy's appeal is that it is a framework, so anyone can adapt it to their needs. It also provides base classes for several kinds of spiders, such as BaseSpider and sitemap spiders, and recent versions add support for crawling web 2.0 sites.
A note before we start: this article uses Python 3.7 on 64-bit Windows; readers may substitute their own version.
Step 1: Prepare the environment
Install Python yourself, and add both the Python directory and its Scripts subdirectory to the system environment variables, as shown in the figure below:
Note: if you checked the option to add Python to PATH during installation, you can skip this step.
Prepare the two required files: **Twisted-18.7.0-cp37-cp37m-win_amd64.whl** and **lxml-4.2.3-cp37-cp37m-win_amd64.whl**
Download link: https://pan.baidu.com/s/1TC2q_oC5h6Z4ymRpmpSxsA (includes builds for Python 3.5 and 3.7)
You can also download them yourself from the unofficial Windows binaries for Python packages: https://www.lfd.uci.edu/~gohlke/pythonlibs/
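The wheel you download must match your interpreter: `cp37` in the filename means CPython 3.7, and `win_amd64` means 64-bit Windows. A quick way to check both with the standard library (a minimal sketch, nothing Scrapy-specific):

```python
import struct
import sys

# Python version: must match the "cp37" part of the wheel filename
print("Python %d.%d" % sys.version_info[:2])

# Pointer size in bits: 64 means you need win_amd64 wheels, 32 means win32
print("%d-bit" % (struct.calcsize("P") * 8))
```

If these values disagree with the wheel's filename, pip will refuse to install it with a "not a supported wheel on this platform" error.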
Note: Scrapy is built on the Twisted framework and uses lxml to parse HTML. A plain pip install on Windows often fails to build these two components correctly, so we install them separately first.
Step 2: Install Scrapy
Change into the directory containing Twisted-18.7.0-cp37-cp37m-win_amd64.whl and lxml-4.2.3-cp37-cp37m-win_amd64.whl, then install each with pip:
C:\Users\WU\Downloads\scrapyFile>pip install lxml-4.2.3-cp37-cp37m-win_amd64.whl
C:\Users\WU\Downloads\scrapyFile>pip install Twisted-18.7.0-cp37-cp37m-win_amd64.whl
Note: I keep these two files in C:\Users\WU\Downloads\scrapyFile.
Once lxml and Twisted have installed successfully, run the following commands to install Scrapy itself:
pip install pywin32
pip install scrapy
Note: pywin32 is needed because Scrapy will later call into the Windows system API.
Step 3: Verify that Scrapy installed correctly
Run `scrapy` in cmd; you should see output like the following:
C:\Users\WU\Downloads\scrapyFile>scrapy
Scrapy 1.5.1 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
Step 4: Create a Scrapy project
Change into your working directory and run the following command to generate a project:
D:\pythonplace\scrapy>scrapy startproject helloworld
New Scrapy project 'helloworld', using template directory 'd:\\software\\python3.7\\lib\\site-packages\\scrapy\\templates\\project', created in:
D:\pythonplace\scrapy\helloworld
You can start your first spider with:
cd helloworld
scrapy genspider example example.com
Appendix:
1. If, when starting a spider with `scrapy crawl xxx`, you get the following error:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
func(*a, **kw)
File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python3.7/site-packages/scrapy/commands/crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 170, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
return Crawler(spidercls, self.settings)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
self.extensions = ExtensionManager.from_crawler(self)
File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
mod = import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
from twisted.conch import manhole, telnet
File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
def write(self, data, async=False):
^
SyntaxError: invalid syntax
Find the file Lib/site-packages/twisted/conch/manhole.py under your Python directory and rename the `async` parameter on lines 154, 155, 240, 241, and 247 (this release of Twisted uses `async` as a parameter name, which Python 3.7 rejects), like so:
154 def write(self, data, async1=False):
155 self.handler.addOutput(data, async1)
........
240 def addOutput(self, data, async1=False):
241 if async1:
........
247 if async1:
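The root cause of the error above is that `async` was promoted to a full reserved keyword in Python 3.7, so any code that uses it as a parameter name fails with a SyntaxError at import time. This can be confirmed with the standard library alone (a minimal check, independent of Twisted):

```python
import keyword

# On Python 3.7+, "async" is a reserved keyword and can no longer
# be used as a variable or parameter name
print(keyword.iskeyword("async"))

# The old Twisted signature does not even compile on 3.7+
try:
    compile("def write(self, data, async=False): pass", "<demo>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```

This is why renaming the parameter (here to `async1`) fixes the traceback: the file simply becomes valid Python 3.7 syntax again.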