Scrapy Study Notes
阿新 • Published: 2018-03-27
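Using the 1919 website as an example, these notes walk through creating a Scrapy project that crawls the data of an entire site.
Creating a Scrapy project
Run the following command in any directory: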
scrapy startproject OneNine    # OneNine is the project name
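This produces the following directory layout:
OneNine/
    scrapy.cfg            # deployment configuration file
    OneNine/              # Python module; all of your code goes here
        __init__.py
        items.py          # Item definitions
        pipelines.py      # pipeline definitions
        settings.py       # project settings
        spiders/          # all spiders go in this folder
            __init__.py
            ...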
Next, create a spider file in which to write the crawling rules:
cd OneNine
scrapy genspider onenine onenine.com
This generates an onenine.py file in the spiders folder; the crawling rules go in this file.
Defining the Item
In items.py, declare the fields we want to scrape.
import scrapy


class OnenineItem(scrapy.Item):
    # One scrapy.Field() per value scraped for each product.
    url = scrapy.Field()
    good_name = scrapy.Field()
    actual_price = scrapy.Field()
    details = scrapy.Field()
    year = scrapy.Field()
    month = scrapy.Field()
    plateform = scrapy.Field()
    cat_lv_one = scrapy.Field()
    cat_lv_two = scrapy.Field()
    shop_id = scrapy.Field()
    shop_name = scrapy.Field()
    shop_area = scrapy.Field()
    shop_province = scrapy.Field()
    shop_city = scrapy.Field()
    good_id = scrapy.Field()
    brand = scrapy.Field()
    size = scrapy.Field()
    percent = scrapy.Field()
    country = scrapy.Field()
    area = scrapy.Field()
    type = scrapy.Field()
    grape_type = scrapy.Field()
    num = scrapy.Field()
    name_price = scrapy.Field()
    bottle_price = scrapy.Field()
    comments = scrapy.Field()
    accumulate_sales = scrapy.Field()
    month_sales = scrapy.Field()
    month_bottle_sales = scrapy.Field()
    month_sale_amounts = scrapy.Field()
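Fields declared with scrapy.Field() can later be exported directly to whatever file format you need (for example, CSV or JSON through Scrapy's feed exports).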
The spider file
The spider file is where we write the crawling rules for the site.