Python 3 Crawler: Scraping Glamour Model Photos with the Scrapy Framework
阿新 • Published: 2019-01-05
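Scraping glamour model photos with the Scrapy framework is actually quite simple: it takes about five minutes, and the crawler can fetch upwards of ten thousand photos.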
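Excited? Then let's get started.
Before starting the project you need Python 3 and Scrapy installed. Installation is not covered here; a quick web search will turn up instructions.
Next comes the code.
First, create the project: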
scrapy startproject Sexygirl
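This generates the project directory structure (the default layout produced by scrapy startproject):
Sexygirl/
    scrapy.cfg
    Sexygirl/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py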
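First, define the fields to scrape in items.py. This project uses the image name and the image links:
import scrapy


class SexygirlItem(scrapy.Item):
    name = scrapy.Field()         # title of the photo set
    ImgUrl = scrapy.Field()       # list of image URLs found on the page
    # Filled in by the pipeline's item_completed(); a scrapy.Item raises
    # KeyError when an undeclared field is assigned, so it must be declared.
    image_paths = scrapy.Field()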
Next, create the spider file SexygirlSpider.py in the spiders directory and write the main code:
# -*- coding: utf-8 -*-
import scrapy

from Sexygirl.items import SexygirlItem


class SexygirlpiderSpider(scrapy.Spider):
    name = "Sexygirl"
    allowed_domains = ["maotiao.com"]
    start_urls = ["http://www.maotiao.com/xingganmeinv/"]

    def parse(self, response):
        # Every <dd> except the pagination one links to a photo-set page.
        for entry in response.css("dl dd:not(.page)"):
            detail_url = entry.css("a::attr(href)").extract_first()
            # Skip entries without a link; str(None) would produce the
            # invalid URL "None".
            if detail_url is not None:
                yield response.follow(detail_url, callback=self.content)

        # Listing-page pagination, left disabled as in the original:
        # next_url = response.css(".page-en:nth-last-child(2)::attr(href)").extract_first()
        # if next_url is not None:
        #     yield response.follow(next_url, callback=self.parse)

    def content(self, response):
        item = SexygirlItem()
        item['name'] = response.css(".content-pic img::attr(alt)").extract_first()
        item['ImgUrl'] = response.css(".content-pic img::attr(src)").extract()
        # Hand the item to the pipeline, which downloads and stores the images.
        yield item

        # Follow the "next page" link inside the same photo set.
        next_url = response.css("a.page-ch:last-child::attr(href)").extract_first()
        if next_url is not None:
            yield response.follow(next_url, callback=self.content)
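The spider relies on response.follow for pagination. Unlike scrapy.Request, which needs an absolute URL, response.follow also accepts relative links and resolves them against the current page URL, essentially the way the standard library's urljoin does. The sketch below shows that resolution rule without Scrapy; the href values are hypothetical examples, not links taken from the site.

```python
from urllib.parse import urljoin

# Start URL from the spider above.
PAGE_URL = "http://www.maotiao.com/xingganmeinv/"


def resolve(href, base=PAGE_URL):
    """Resolve an href against the URL of the page it was found on."""
    return urljoin(base, href)


# Hypothetical href values, chosen only to illustrate the three common cases.
relative = resolve("list_2.html")
root_relative = resolve("/meinv/123.html")
absolute = resolve("http://www.maotiao.com/a/b.html")

for url in (relative, root_relative, absolute):
    print(url)
```

A relative href is appended to the directory of the current page, a root-relative href replaces the whole path, and an absolute URL passes through unchanged.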
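The image links and names are now available. Next, use an images pipeline to download the images and save them locally. Scrapy's ImagesPipeline requires the Pillow library to be installed. In pipelines.py:
import re

from scrapy.exceptions import DropItem
from scrapy.http import Request
from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # One download request per image URL; the set name travels in meta.
        for image_url in item['ImgUrl']:
            yield Request(image_url, meta={'item': item['name']})

    def file_path(self, request, response=None, info=None, *, item=None):
        # The keyword-only item argument keeps the signature compatible
        # with newer Scrapy releases; it is unused here.
        name = request.meta['item']
        # Strip digits and characters that are unsafe in directory names.
        name = re.sub(r'[?\\*|“<>:/()0123456789]', '', name)
        image_guid = request.url.split('/')[-1]
        # The returned path is relative to IMAGES_STORE.
        return 'sexygirl/{0}/{1}'.format(name, image_guid)

    def item_completed(self, results, item, info):
        # results is a list of (success, info) tuples, one per image.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem('Item contains no images')
        # The item class must declare an image_paths field for this to work.
        item['image_paths'] = image_paths
        return item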
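To see what the pipeline does to names and results without running a crawl, the two pieces of logic can be pulled out into plain functions. This is only a sketch: build_image_path and successful_paths are hypothetical helper names, the sample title and URL are made up, and the failure entry uses a plain exception as a stand-in for the failure object Scrapy would pass.

```python
import re

# Same character class as in file_path above: digits plus characters
# that are unsafe in directory names.
UNSAFE = re.compile(r'[?\\*|“<>:/()0123456789]')


def build_image_path(name, url):
    """Mirror file_path: cleaned set name as folder, last URL segment as file."""
    folder = UNSAFE.sub('', name)
    filename = url.split('/')[-1]
    return 'sexygirl/{0}/{1}'.format(folder, filename)


def successful_paths(results):
    """Mirror item_completed: keep the path of every successful download."""
    return [info['path'] for ok, info in results if ok]


# Hypothetical title and image URL, for illustration only.
path = build_image_path("Beach set(3)", "http://img.example.com/2019/01/0001.jpg")

# One successful and one failed download, in the (success, info) shape
# that item_completed receives.
results = [
    (True, {"url": "http://img.example.com/2019/01/0001.jpg",
            "path": path,
            "checksum": "placeholder"}),
    (False, RuntimeError("download failed")),  # stand-in for a failure object
]

print(path)
print(successful_paths(results))
```

Because digits are stripped from the name, pages of the same set whose titles differ only by a page number end up in the same folder.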
Finally, set the storage directory and enable the pipeline in settings.py:
# Image storage path
IMAGES_STORE = 'F:/python/meizi'
# Enable the item pipeline
ITEM_PIPELINES = {
'Sexygirl.pipelines.MyImagesPipeline': 300,
}
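The storage path above is tied to a Windows drive letter. A possible variant of these settings, sketched under the assumption that a folder in the user's home directory is acceptable, is shown below. The setting names are standard Scrapy settings, but every value here is an illustrative guess rather than something taken from the original project.

```python
from pathlib import Path

# Store images under the user's home directory instead of a fixed drive letter.
# The folder name "meizi" is borrowed from the original path.
IMAGES_STORE = str(Path.home() / "meizi")

# Enable the custom images pipeline (unchanged from the original).
ITEM_PIPELINES = {
    "Sexygirl.pipelines.MyImagesPipeline": 300,
}

# Optional settings; values are assumptions chosen for illustration.
DOWNLOAD_DELAY = 0.5     # seconds to wait between requests to the same site
IMAGES_MIN_WIDTH = 200   # skip images narrower than this many pixels
IMAGES_MIN_HEIGHT = 200  # skip images shorter than this many pixels
IMAGES_EXPIRES = 90      # days before a stored image is downloaded again
```

A small download delay is a courtesy to the target site, and the minimum-size filters keep thumbnails and icons out of the result folder.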
Run the crawler from the project root directory:
scrapy crawl Sexygirl
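The same crawl can also be started from a Python script, which is handy when launching it from an IDE or a scheduler. The sketch below assumes the script sits next to scrapy.cfg so that the project settings can be found; run_spider and SPIDER_NAME are hypothetical names introduced here.

```python
SPIDER_NAME = "Sexygirl"  # must match the spider's `name` attribute


def run_spider(spider_name=SPIDER_NAME):
    """Start the named spider with the project's settings and wait for it."""
    # Imported inside the function so this file can be loaded
    # even where Scrapy is not installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)
    process.start()  # blocks until the crawl has finished


# Usage, from a script placed in the project root:
# run_spider()
```

Note that process.start() runs the Twisted reactor, which cannot be restarted, so the function is meant to be called once per Python process.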
OK! Once the download finishes, the photos are waiting in the storage directory for you to enjoy.