
Python 3 scraping: downloading "sexy goddess" photos with the Scrapy framework

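Using the Scrapy framework to crawl these photos is actually simple. The project takes only about five minutes to write, and it can download tens of thousands of photos.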


Excited? Then let's get started:

Before starting, install Python 3 and Scrapy (for example with pip install scrapy). Installation isn't covered in detail here.

Now for the code.

First, create the project:

scrapy startproject Sexygirl

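This generates the project skeleton: scrapy.cfg at the root, plus a Sexygirl package containing items.py, pipelines.py, settings.py and a spiders/ directory.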

Next, define the fields to scrape in items.py. This project needs the gallery name and the image links:

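import scrapy

class SexygirlItem(scrapy.Item):
    # Gallery title, used as the folder name for its images
    name = scrapy.Field()
    # List of image URLs found on a gallery page
    ImgUrl = scrapy.Field()
    # Filled in by MyImagesPipeline.item_completed() with the saved file paths;
    # without this declaration, item['image_paths'] = ... raises KeyError
    image_paths = scrapy.Field()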
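Why declare every field? A scrapy.Item only accepts keys that were declared as Fields, so writing an undeclared key such as image_paths raises KeyError. The sketch below imitates that check with the standard library only; FieldRestrictedItem, OldSexygirlItem, FixedSexygirlItem and try_store_paths are hypothetical names for illustration, not Scrapy classes.

```python
class FieldRestrictedItem(dict):
    """Hypothetical stdlib imitation of scrapy.Item's declared-field check."""
    fields = frozenset()

    def __setitem__(self, key, value):
        # Reject keys that were not declared, as scrapy.Item does
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


class OldSexygirlItem(FieldRestrictedItem):
    # Only the two fields from the first version of items.py
    fields = frozenset({"name", "ImgUrl"})


class FixedSexygirlItem(FieldRestrictedItem):
    # Same fields plus image_paths, which the pipeline writes
    fields = frozenset({"name", "ImgUrl", "image_paths"})


def try_store_paths(item_cls):
    """Return True if the pipeline's image_paths assignment would succeed."""
    item = item_cls()
    item["name"] = "example"
    try:
        item["image_paths"] = ["sexygirl/example/1.jpg"]
        return True
    except KeyError:
        return False


print(try_store_paths(OldSexygirlItem))    # False
print(try_store_paths(FixedSexygirlItem))  # True
```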

Next, create the spider file SexygirlSpider.py in the spiders directory and write the main code:

# -*- coding: utf-8 -*-
import scrapy
from Sexygirl.items import SexygirlItem


class SexygirlSpider(scrapy.Spider):
    name = "Sexygirl"
    allowed_domains = ["maotiao.com"]
    start_urls = [
        'http://www.maotiao.com/xingganmeinv/',
    ]

    def parse(self, response):
        # Each <dd> inside <dl> (except the pagination one) links to a gallery
        for entry in response.css("dl dd:not(.page)"):
            href = entry.css("a::attr(href)").extract_first()
            if href:
                # response.follow() resolves relative links against the page URL
                yield response.follow(href, callback=self.content)

        # Pagination of the list page (disabled in the original):
        # next_url = response.css(".page-en:nth-last-child(2)::attr(href)").extract_first()
        # if next_url is not None:
        #     yield response.follow(next_url, callback=self.parse)

    def content(self, response):
        item = SexygirlItem()
        item['name'] = response.css(".content-pic img::attr(alt)").extract_first(default='')
        # ImagesPipeline needs absolute URLs, so resolve each src against the page
        item['ImgUrl'] = [response.urljoin(src) for src in
                          response.css(".content-pic img::attr(src)").extract()]
        yield item

        # Follow the gallery's "next page" link, if there is one
        next_url = response.css("a.page-ch:last-child::attr(href)").extract_first()
        if next_url is not None:
            yield response.follow(next_url, callback=self.content)
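Links in the HTML may be relative, and Scrapy's response.follow() and response.urljoin() resolve them against the page's (base) URL, essentially the way the standard library's urllib.parse.urljoin does. The sketch below shows that resolution with urljoin; the href/src values are hypothetical, and only the page URL comes from start_urls.

```python
from urllib.parse import urljoin

# Page URL taken from start_urls in the spider
page_url = "http://www.maotiao.com/xingganmeinv/"

# Hypothetical href/src values as they might appear in the HTML
links = [
    "/xingganmeinv/1234.html",       # root-relative
    "1234_2.html",                   # relative to the current directory
    "http://img.example.com/a.jpg",  # already absolute, left unchanged
]

resolved = [urljoin(page_url, link) for link in links]
for url in resolved:
    print(url)
```

This is also why the spider checks "if href:" instead of calling str() on the result: str(None) yields the string "None", which is not a usable URL.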

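The image names and links are now collected. Next, use an images pipeline to download the images and save them locally, in pipelines.py (Scrapy's ImagesPipeline requires the Pillow library):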

import re

from scrapy.exceptions import DropItem
from scrapy.http import Request
from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # One download request per image URL; carry the gallery name along
        for image_url in item['ImgUrl']:
            yield Request(image_url, meta={'item': item['name']})

    def file_path(self, request, response=None, info=None, *, item=None):
        # Strip characters that are illegal or unwanted in folder names
        name = re.sub(r'[?\\*|"“<>:/()0-9]', '', request.meta.get('item') or '')
        name = name.strip() or 'untitled'
        # Keep the original file name from the end of the URL
        image_guid = request.url.split('/')[-1]
        return 'sexygirl/{0}/{1}'.format(name, image_guid)

    def item_completed(self, results, item, info):
        # results is a list of (success, info_dict) tuples, one per request
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem('Item contains no images')
        item['image_paths'] = image_paths
        return item
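The folder-naming logic in file_path() can be tried without Scrapy. The sketch below is a hypothetical stdlib helper, build_image_path, that applies the same character filter and keeps the last URL segment as the file name; the fallback to an "untitled" folder for empty names is an assumption added for illustration.

```python
import re

# Characters illegal on Windows, full-width quotes, parentheses and digits
ILLEGAL = re.compile(r'[?\\*|"“<>:/()0-9]')


def build_image_path(name, url):
    """Hypothetical helper mirroring the naming in MyImagesPipeline.file_path()."""
    # Remove unwanted characters, trim spaces, fall back if nothing is left
    folder = ILLEGAL.sub('', name or '').strip() or 'untitled'
    # Use the last path segment of the image URL as the file name
    filename = url.split('/')[-1]
    return 'sexygirl/{0}/{1}'.format(folder, filename)


print(build_image_path('Gallery (12)', 'http://img.example.com/2018/abc.jpg'))
# sexygirl/Gallery/abc.jpg
```

Stripping digits means galleries whose titles differ only by a number (for example page numbers) end up in the same folder, which is presumably the intent of the original pattern.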

Finally, set the storage directory in settings.py and enable the pipeline:

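# Directory where downloaded images are saved (change to a path on your machine)
IMAGES_STORE = 'F:/python/meizi'
# Enable the image pipeline; lower numbers run first (customarily 0-1000)
ITEM_PIPELINES = {
   'Sexygirl.pipelines.MyImagesPipeline': 300,
}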
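With the filesystem store, ImagesPipeline saves each file at IMAGES_STORE joined with the relative path returned by file_path(). The sketch below just performs that join with pathlib to show where a file lands; final_location is a hypothetical helper, and PureWindowsPath is used only because the example store is on drive F:.

```python
from pathlib import PureWindowsPath

# Value of IMAGES_STORE from settings.py
IMAGES_STORE = 'F:/python/meizi'


def final_location(relative_path):
    """Hypothetical helper: join IMAGES_STORE with a path from file_path()."""
    return (PureWindowsPath(IMAGES_STORE) / relative_path).as_posix()


print(final_location('sexygirl/Gallery/abc.jpg'))
# F:/python/meizi/sexygirl/Gallery/abc.jpg
```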

Run the spider from the project's root directory:

scrapy crawl Sexygirl

OK! Once the download finishes, the photos are waiting in the storage directory!
