Saving Scrapy data with sqlite3
阿新 • Published: 2018-05-13
This example crawls Ganji rental listings: http://bj.ganji.com/fang1/chaoyang/
We extract each listing's title and price via XPath.
The spider, items, and pipelines code are listed below, in that order.
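If you want to sanity-check the XPath expressions before writing the spider, the scrapy shell is handy. A quick sketch (same selectors as in the spider below):

scrapy shell 'http://bj.ganji.com/fang1/chaoyang/'
>>> response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[1]/a/text()').extract()
>>> response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[5]/div[1]/span[1]/text()').extract()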
# -*- coding: utf-8 -*-
import scrapy
from ..items import RenthouseItem


class GanjiSpider(scrapy.Spider):
    name = 'ganji'
    # allowed_domains = ['bj.ganji.com']
    start_urls = ['http://bj.ganji.com/fang1/chaoyang/']

    def parse(self, response):
        title_list = response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[1]/a/text()').extract()
        price_list = response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[5]/div[1]/span[1]/text()').extract()
        for i, j in zip(title_list, price_list):
            # create a fresh item per listing so each yield carries its own data
            rh = RenthouseItem()
            rh['title'] = i
            rh['price'] = j
            yield rh
            # a plain dict would also work: yield {'title': i, 'price': j}
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class RenthouseItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import sqlite3


class RenthousePipeline(object):
    def open_spider(self, spider):
        self.con = sqlite3.connect('renthouse.sqlite')
        self.cu = self.con.cursor()
        # make sure the table exists before inserting
        self.cu.execute('create table if not exists renthouse (title text, price text)')

    def process_item(self, item, spider):
        # use a parameterized query instead of string formatting,
        # which avoids quoting problems and SQL injection
        self.cu.execute('insert into renthouse (title, price) values (?, ?)',
                        (item['title'], item['price']))
        self.con.commit()
        return item

    def close_spider(self, spider):
        # Scrapy calls close_spider (not spider_close) when the spider finishes
        self.con.close()
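As the comment in pipelines.py says, the pipeline must also be registered in settings.py, or process_item will never run. A minimal sketch, assuming the project package is named renthouse (adjust the dotted path to your own project layout):

# settings.py (fragment)
ITEM_PIPELINES = {
    'renthouse.pipelines.RenthousePipeline': 300,
}

Then start the crawl with: scrapy crawl ganji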
In the spider, rh = RenthouseItem() creates an item instance, and it is this rh that gets handed to the pipelines for processing.
So each time through the loop we pass a dict-like item (title, price) to the pipelines via rh, and the pipeline inserts it into sqlite3 with a SQL statement.
open_spider runs when the spider starts, so this is where we connect to the database. Personally, I think this article explains cursors and sqlite3 usage very clearly: https://www.cnblogs.com/qq78292959/archive/2013/04/01/2993327.html
Note: after execute()-ing any statement that modifies data, such as insert, you must commit()!
close_spider runs when the spider closes, so this is where we close the database connection.
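Here is a minimal standalone sketch of the connect / cursor / execute / commit cycle, using a hypothetical demo.sqlite file that is independent of the Scrapy project:

import sqlite3

con = sqlite3.connect('demo.sqlite')   # opens (or creates) the database file
cu = con.cursor()                      # the cursor is what issues SQL statements
cu.execute('create table if not exists t (x text)')
cu.execute('insert into t (x) values (?)', ('hello',))
con.commit()                           # without commit() the insert is not persisted
con.close()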
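After the crawl finishes, you can check that the data actually landed. A quick sketch that reads a few rows back from renthouse.sqlite:

import sqlite3

con = sqlite3.connect('renthouse.sqlite')
for title, price in con.execute('select title, price from renthouse limit 5'):
    print(title, ':', price)
con.close()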