
Scrapy Study Notes


These notes use the 1919 website as an example and walk through creating a Scrapy project that crawls the data of an entire site.

Creating a Scrapy project

Run the following command in any directory:

scrapy startproject OneNine    # OneNine is the project name

This produces the following directory layout:

OneNine/
    scrapy.cfg            # deployment configuration file
    OneNine/              # the project's Python module; all your code goes here
        __init__.py
        items.py          # Item definitions
        pipelines.py      # pipeline definitions
        settings.py       # project settings
        spiders/          # all spiders go in this folder
            __init__.py
            ...

Next, create a spider file in which to write the crawling rules:

cd OneNine
scrapy genspider onenine onenine.com

A file named onenine.py now appears in the spiders folder; this is where we will write the crawling rules.

Defining the Item

In items.py, declare the fields we want to scrape.

import scrapy


class OnenineItem(scrapy.Item):
    url = scrapy.Field()
    good_name = scrapy.Field()
    actual_price = scrapy.Field()
    details = scrapy.Field()
    year = scrapy.Field()
    month = scrapy.Field()
    plateform = scrapy.Field()
    cat_lv_one = scrapy.Field()
    cat_lv_two = scrapy.Field()
    shop_id = scrapy.Field()
    shop_name = scrapy.Field()
    shop_area = scrapy.Field()
    shop_province = scrapy.Field()
    shop_city = scrapy.Field()
    good_id = scrapy.Field()
    brand = scrapy.Field()
    size = scrapy.Field()
    percent = scrapy.Field()
    country = scrapy.Field()
    area = scrapy.Field()
    type = scrapy.Field()
    grape_type = scrapy.Field()
    num = scrapy.Field()
    name_price = scrapy.Field()
    bottle_price = scrapy.Field()
    comments = scrapy.Field()
    accumulate_sales = scrapy.Field()
    month_sales = scrapy.Field()
    month_bottle_sales = scrapy.Field()
    month_sale_amounts = scrapy.Field()

Fields declared with scrapy.Field can later be exported directly to whatever file format you need.
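One way to get that export is Scrapy's feed exports, configured through the FEEDS setting (available in Scrapy 2.1 and later). The fragment below is a minimal sketch for settings.py; the output file name and the choice of three fields are assumptions made for illustration, not something the original notes specify.

```python
# settings.py (fragment): write every scraped item to a CSV file.
FEEDS = {
    "items.csv": {                 # hypothetical output path
        "format": "csv",           # other built-in formats: json, jsonlines, xml
        "encoding": "utf8",
        # Optional: restrict and order the exported columns.
        "fields": ["url", "good_name", "actual_price"],
    },
}
```

For a one-off run, the command `scrapy crawl onenine -o items.csv` gives a similar result without touching settings.py.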

The spider file

The spider file is where we write the crawling rules for the website.
