Scraping Google Images by keyword with Scrapy

阿新 • Published: 2018-12-14

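The images on the Google Images results page are loaded asynchronously via Ajax. Open the browser's developer tools (F12), go to the Network tab and filter by XHR: as you scroll down and more images load, new XHR requests appear and return a lot of content. This tells us the data we need can be fetched from that endpoint. Here is the code: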

Spider file:

# -*- coding: utf-8 -*-
import re
from urllib.parse import urlencode

import scrapy

from picture.items import PictureItem


class DongzhiwuSpider(scrapy.Spider):
    name = 'dongzhiwu'
    # Must match the host the requests below actually go to
    allowed_domains = ['google.com.hk']
    phrase_list = ['蘋果', '香蕉', '深圳']  # keywords to crawl (apple, banana, Shenzhen)

    def start_requests(self):
        for phrase in self.phrase_list:
            key = urlencode({'q': phrase})
            for page in range(1, 21):  # crawl 20 pages per keyword
                # ijn is the page index; start advances by 100 for each page
                url = "https://www.google.com.hk/search?ei=0n6sW-DlJITr-QbQl7Mw&hl=zh-CN&safe=strict&yv=3&tbm=isch&" + key + "&vet=10ahUKEwjglqSRzdrdAhWEdd4KHdDLDAYQuT0IOCgB.0n6sW-DlJITr-QbQl7Mw.i&ved=0ahUKEwjglqSRzdrdAhWEdd4KHdDLDAYQuT0IOCgB&ijn=" + str(page) + "&start=" + str(page * 100) + "&asearch=ichunk&async=_id:rg_s,_pms:s,_fmt:pc"
                yield scrapy.Request(url, callback=self.parse, meta={'q': phrase}, dont_filter=True)

    def parse(self, response):
        item = PictureItem()
        item['name'] = response.meta['q']  # the keyword this page belongs to
        # All image links on this page
        item['pic_urls'] = re.findall(r'imgurl=(http.*?)&', response.text)
        yield item
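Both the paging scheme and the link extraction can be checked without Scrapy or network access. The sketch below is a stdlib-only approximation: page_query and extract_image_urls are hypothetical helper names, page_query keeps only the three parameters that change between requests (q, ijn, start) rather than rebuilding the full URL, and the sample markup is invented for illustration, not captured from Google.

```python
import re
from urllib.parse import urlencode

# Same pattern as DongzhiwuSpider.parse: capture from "http" up to the first "&"
# that follows "imgurl=" (non-greedy match).
IMGURL_RE = re.compile(r"imgurl=(http.*?)&")


def page_query(keyword: str, page: int) -> str:
    """Hypothetical helper: the query parameters that vary per request.

    As in the spider, start advances by 100 for each page index ijn.
    """
    return urlencode({"q": keyword, "ijn": page, "start": page * 100})


def extract_image_urls(text: str) -> list:
    """Return every image URL the spider's regex would find in text."""
    return IMGURL_RE.findall(text)


if __name__ == "__main__":
    print(page_query("apple", 2))
    # Invented sample markup, not real Google output
    sample = '<a href="/imgres?imgurl=http://example.com/a.jpg&amp;imgrefurl=x">'
    print(extract_image_urls(sample))
```

Running it prints q=apple&ijn=2&start=200 and ['http://example.com/a.jpg']. Because the match stops at the first &, an image URL that itself contains & would be cut short there.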

This gives us the image links on each page; all that is left is to write them to local disk in the pipeline file.
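Scrapy only hands items to the pipeline if it is enabled in the project's settings.py, a step the post does not show. A minimal excerpt, assuming the project package is named picture (as the spider's import from picture.items suggests); 300 is just an ordering value, and lower numbers run earlier when several pipelines are enabled:

```python
# settings.py (excerpt): register PicturePipeline so Scrapy passes every
# item yielded by the spider to its process_item method.
# "picture" is assumed to be the project package name.
ITEM_PIPELINES = {
    "picture.pipelines.PicturePipeline": 300,
}
```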

Pipeline file:

import os
from hashlib import md5
from urllib.request import urlretrieve


class PicturePipeline(object):
    def process_item(self, item, spider):
        # One sub-directory per keyword under "google圖片"
        kind_path = os.path.join('google圖片', item['name'])
        os.makedirs(kind_path, exist_ok=True)
        for url in item['pic_urls']:
            # Name each file after the MD5 of its URL so repeated URLs are skipped
            img_path = os.path.join(kind_path, md5(url.encode('utf-8')).hexdigest() + '.jpg')
            if os.path.exists(img_path):
                continue
            try:
                urlretrieve(url, filename=img_path)
            except Exception:
                # Skip images that fail to download (dead links, bad URLs, etc.)
                continue
        spider.logger.info('%s: finished writing images', item['name'])
        return item
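Naming each file after the MD5 hex digest of its URL means a given URL always maps to the same path, so the os.path.exists check skips images already saved on an earlier run. A stdlib sketch of just that naming rule; image_path is a hypothetical helper, and, like the pipeline, it always uses a .jpg extension regardless of the image's real format:

```python
import hashlib
import os


def image_path(root: str, keyword: str, url: str) -> str:
    """Hypothetical helper mirroring PicturePipeline's naming:
    <root>/<keyword>/<md5 hex digest of url>.jpg
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(root, keyword, digest + ".jpg")


if __name__ == "__main__":
    first = image_path("google圖片", "蘋果", "http://example.com/a.png")
    again = image_path("google圖片", "蘋果", "http://example.com/a.png")
    other = image_path("google圖片", "蘋果", "http://example.com/b.png")
    print(first == again)  # same URL, same file name
    print(first == other)  # different URL, different file name
```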

That's it.
