I. Python Command Line Tool: Study Notes (Scrapy command line tool)
The command line tool
Scrapy is controlled through the scrapy command-line tool, referred to here as the "Scrapy tool" to distinguish it from its sub-commands, which we simply call "commands" or "Scrapy commands".
The Scrapy tool provides several commands for different purposes, and each command accepts a different set of arguments and options.
Creating a project
scrapy startproject myproject [project_dir]
Create a project from the command line:
scrapy startproject myproject E:\pythoncode\
This creates the myproject project under E:\pythoncode
Next:
cd E:\pythoncode
Note: if project_dir is not specified, project_dir defaults to the same name, myproject
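For orientation, the tree that startproject lays out looks roughly like this (layout as of Scrapy 1.5; later versions may differ slightly):

```
E:\pythoncode\myproject\
    scrapy.cfg            # config file that marks the project root
    myproject\            # the project's Python module
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py       # project settings
        spiders\          # spiders created by genspider end up here
            __init__.py
```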
Controlling the project
For example: scrapy genspider mydomain mydomain.com
This creates a spider named mydomain that crawls the site mydomain.com
You can see this spider's code under E:\pythoncode\myproject\spiders
scrapy -h
We see the following:
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
Note: in other words, whenever something is unclear, run scrapy <command> -h and learn it yourself
The startproject command
Syntax: scrapy startproject <project_name> [project_dir]
Example:
scrapy startproject myproject
The genspider command
Syntax: scrapy genspider [-t template] <name> <domain>
Example:
E:\pythoncode>scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed
Note: these are the templates usable with -t, from basic through xmlfeed

E:\pythoncode>scrapy genspider example example.com
Created spider 'example' using template 'basic' in module:
  myproject.spiders.example
Note: scrapy genspider xx xx.com is equivalent to scrapy genspider -t basic xx xx.com

E:\pythoncode>scrapy genspider -t crawl scrapyorg scrapy.org
Created spider 'scrapyorg' using template 'crawl' in module:
  myproject.spiders.scrapyorg
Note: a spider created with -t crawl is different from one created with -t basic; presumably the extra templates exist to cover needs a plain spider cannot anticipate
This is just a convenient shortcut command for creating spiders from predefined templates, and it is certainly not the only way to create them. You can write the spider source file yourself instead of using this command.
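Since genspider essentially just fills in a template, you can mimic it with a few lines of plain Python. A hypothetical sketch: the TEMPLATE string below imitates the shape of Scrapy's basic template, but the real command's exact output can differ between Scrapy versions.

```python
import tempfile
from pathlib import Path

# A minimal stand-in for scrapy's "basic" template. The class and field
# names mirror what `scrapy genspider` writes; this is an approximation.
TEMPLATE = """import scrapy


class {classname}(scrapy.Spider):
    name = '{name}'
    allowed_domains = ['{domain}']
    start_urls = ['http://{domain}/']

    def parse(self, response):
        pass
"""


def write_spider(name, domain, spiders_dir=tempfile.gettempdir()):
    """Write a spider source file, like `scrapy genspider name domain` does."""
    source = TEMPLATE.format(classname=name.capitalize() + "Spider",
                             name=name, domain=domain)
    Path(spiders_dir, name + ".py").write_text(source)
    return source


print(write_spider("example", "example.com"))
```

In a real project you would point spiders_dir at myproject\spiders so that scrapy list can find the new spider.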
The crawl command
Syntax: scrapy crawl <spider>
Example:
E:\pythoncode>scrapy crawl mydomain
[scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: myproject)
.................
Note: the crawl command is already thoroughly familiar from the tutorial examples; it starts crawling with the given spider.
The check command
Syntax: scrapy check [-l] <spider>
Example:
E:\pythoncode>scrapy check -l mydomain
E:\pythoncode>scrapy check -l
Note: nothing is printed here. check runs the contract checks written in a spider's callback docstrings, and -l only lists the contracts each spider defines, so with no contracts defined there is nothing to show.
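For context, the "contracts" that check reads are annotation lines (@url, @returns, @scrapes) inside a callback's docstring. A sketch of what such a docstring looks like, with a plain class standing in for scrapy.Spider so the snippet stays self-contained:

```python
class MydomainSpider:  # a real spider would subclass scrapy.Spider
    name = "mydomain"

    def parse(self, response):
        """This docstring is what `scrapy check` reads.

        @url http://www.mydomain.com/
        @returns items 1 16
        @returns requests 0 0
        @scrapes title
        """


# The contract lines are ordinary docstring text:
print("@url" in MydomainSpider.parse.__doc__)  # True
```

With a contract like this in place, scrapy check would fetch the @url, run parse on the response, and verify the @returns/@scrapes conditions.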
The list command
Syntax: scrapy list
Example:
E:\pythoncode>scrapy list
example
mydomain
scrapyorg
Note: lists the names of all the spiders in the project
The edit command
Syntax: scrapy edit <spider>
Edit the given spider using the editor defined in the EDITOR environment variable or (if unset) the EDITOR setting.
This command is provided only as a convenience shortcut for the most common case; the developer is of course free to choose any tool or IDE to write and debug spiders.
Note: not sure what to add here, so the official wording is quoted above.
The fetch command
Syntax: scrapy fetch <url>
Example:
E:\pythoncode>scrapy fetch --nolog http://www.example.com/some/page.html
<?xml version="1.0" encoding="iso-8859-1"?>
.....................................
Note: downloads the given URL with the Scrapy downloader and writes the content to standard output.

E:\pythoncode>scrapy fetch --nolog --headers http://www.example.com/
> Accept-Language: en
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> User-Agent: Scrapy/1.5.1 (+https://scrapy.org)
> Accept-Encoding: gzip,deflate
.......................................................
Note: --headers shows what identity (headers) you used to access the URL.
The view command
Syntax: scrapy view <url>
Example:
E:\pythoncode>scrapy view https://movie.douban.com/
Note: you get a 403 error once it opens, right? Douban checks what identity you use to access it. Let's see what --headers reveals:
E:\pythoncode>scrapy fetch --headers --nolog https://movie.douban.com/
> Accept-Encoding: gzip,deflate
> User-Agent: Scrapy/1.5.1 (+https://scrapy.org)
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en
>
< Server: dae
< Content-Type: text/html
< Date: Sun, 07 Oct 2018 15:06:06 GMT
Note: User-Agent: Scrapy/1.5.1 (+https://scrapy.org). Remember to change this header the next time you crawl sites like these. If unclear, this page is recommended: https://blog.csdn.net/u012195214/article/details/78889602
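Following up that note about changing the User-Agent: the USER_AGENT setting in the project's settings.py overrides Scrapy's default. A sketch, where the browser-like string is purely an example value:

```python
# myproject/settings.py (fragment)
# USER_AGENT is a standard Scrapy setting; any string works, and a
# browser-like one is shown here only as an arbitrary example.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
```

The same override also works for a single run through Scrapy's global -s option, e.g. scrapy fetch -s USER_AGENT="Mozilla/5.0" https://movie.douban.com/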
The shell command
Syntax: scrapy shell [url]
Example:
E:\pythoncode>scrapy shell https://movie.douban.com/
[scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: myproject)
........................
Note: shell apparently wants IPython installed; to learn your way around inside it, see the examples in the official tutorial.
E:\pythoncode>scrapy shell --nolog https://movie.douban.com/ -c "(response.status, response.url)"
(403, 'https://movie.douban.com/')
Note: 403 means access was refused; 200 means success. These are HTTP response codes; if unclear: https://blog.csdn.net/jackfrued/article/details/25662527
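The numbers the shell prints are plain HTTP status codes, and Python's standard library can decode them, in case the linked page ever disappears:

```python
from http import HTTPStatus

# Translate the numeric codes seen above into their standard meanings.
for code in (200, 403, 404, 503):
    status = HTTPStatus(code)
    print(code, status.phrase)
# 403 Forbidden is exactly what douban returned to the default Scrapy client.
```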
The parse command
Syntax: scrapy parse <url> [options]
Fetches the given URL and parses it with the spider that handles it
Example: the spider here has no parse logic to show, so no example.
The settings command
Syntax: scrapy settings [options]
Example:
E:\pythoncode>scrapy settings --get BOT_NAME
myproject
Note: the project name
E:\pythoncode>scrapy settings --get DOWNLOAD_DELAY
0
Note: the download delay. If unclear, open the project's settings.py and have a look, plus run scrapy settings -h.
The runspider command
Syntax: scrapy runspider <spider_file.py>
Example:
E:\pythoncode>scrapy runspider E:\pythoncode\myproject\spiders\mydomain.py
[scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: myproject)
...................................
The version command
Syntax: scrapy version [-v]
Example:
E:\pythoncode>scrapy version
Scrapy 1.5.1
E:\pythoncode>scrapy version -v
Scrapy       : 1.5.1
lxml         : XXX
libxml2      : XXX
cssselect    : XXX
parsel       : XXX
w3lib        : XXX
Twisted      : XXX
Python       : 3.XXXX
pyOpenSSL    : XXX
cryptography : XXX
Platform     : XXX
Note: the version numbers of the libraries Scrapy depends on
Official wording: Prints the Scrapy version. If used with -v it also prints Python, Twisted and Platform info, which is useful for bug reports.
The bench command
Syntax: scrapy bench
Note: runs a quick benchmark test
Official wording: Run a quick benchmark test.
If unclear: https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking
Custom project commands
Haven't tried these yet; I'll play with them some day.
Learning need not be rushed; here is the official source to keep coming back to: https://docs.scrapy.org/en/latest/topics/commands.html