Crawling Website Images Concurrently

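The images on the target site:

Open a topic page, for example "https://photo.fengniao.com/#p=4" (portraits).

The page shows a few dozen thumbnails, each with a link to a detail page; clicking a thumbnail loads the full-size image.

The goal is to download the full-size image behind every thumbnail. Visiting each detail link and saving the image one after another (serially) is very slow, because each request spends most of its time waiting on the network.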
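To make the cost of the serial approach concrete, here is a minimal sketch that replaces the real network request with a fixed sleep. The function `fake_download`, the 0.2 second latency, and the example.com URLs are assumptions for illustration only; they are not part of the original script.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for a network request: sleep to simulate I/O wait."""
    time.sleep(0.2)  # assumed latency, for illustration only
    return url


def run_serial(urls):
    """Fetch the URLs one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_download(u) for u in urls]
    return results, time.perf_counter() - start


def run_threaded(urls, workers=8):
    """Fetch the URLs in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(fake_download, urls))
    return results, time.perf_counter() - start


# Hypothetical URLs; nothing is actually requested.
urls = ["https://example.com/big/{}.jpg".format(i) for i in range(8)]
serial_results, serial_time = run_serial(urls)
threaded_results, threaded_time = run_threaded(urls)
print("serial: {:.2f}s, threaded: {:.2f}s".format(serial_time, threaded_time))
```

Because the simulated work is pure waiting, the threads overlap their sleeps and the threaded run should finish in roughly the time of a single request, while the serial run takes the sum of all of them.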

1. Fetching the images with multiple threads

import requests
from lxml import etree
from concurrent.futures import ThreadPoolExecutor
from functools import partial


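def get_paths(path, regex, code):
    """
    :param path: URL of the page to fetch
    :param regex: XPath expression used to parse the page
    :param code: character encoding of the page
    :return: list of values matched by the XPath expression (empty if the request fails)
    """
    resp = requests.get(path)
    if resp.status_code != 200:
        return []  # return an empty list rather than None so callers can iterate safely
    content = resp.content.decode(code)
    select = etree.HTML(content)
    return select.xpath(regex)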


def save_pic(path, pic_name, directory):
    """
    :param path: URL of the image
    :param pic_name: file name (without extension) to save the image under
    :param directory: directory to save the image in
    """
    resp = requests.get(path, stream=True)
    if resp.status_code == 200:
        with open('{}/{}.jpg'.format(directory, pic_name), 'wb') as f:
            f.write(resp.content)


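if __name__ == '__main__':
    paths = get_paths('https://photo.fengniao.com/#p=4', '//a[@class="pic"]/@href', 'utf-8')
    paths = ['https://photo.fengniao.com/' + p for p in (paths or [])]

    # Collect the URLs of all full-size images
    p = partial(get_paths, regex='//img[@class="picBig"]/@src', code='utf-8')  # freeze the XPath rule and encoding
    with ThreadPoolExecutor() as executor:
        res = executor.map(p, paths)
    big_paths = [i[0] for i in res if i]  # first match per page; skip pages that returned nothing

    # Save the images (the fn_pics directory must already exist)
    p = partial(save_pic, directory='fn_pics')  # freeze the target directory
    with ThreadPoolExecutor() as executor:
        # Argument order follows save_pic(path, pic_name, ...): URL first, then the index as file name
        res = executor.map(p, big_paths, range(len(big_paths)))
    for _ in res:  # the tasks have already run; iterating re-raises any exception from a worker
        pass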
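
Two details of `executor.map` matter for the script above: each iterable is matched positionally to a parameter of the callable, and an exception raised in a worker only surfaces when the result iterator is consumed. The sketch below illustrates both with a hypothetical stand-in, `save_stub`, which returns the target file name instead of downloading anything; the example.com URLs are likewise assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def save_stub(path, pic_name, directory):
    """Hypothetical stand-in for save_pic: return the target file name, write nothing."""
    if not path.startswith("https://"):
        raise ValueError("not an https URL: {!r}".format(path))
    return "{}/{}.jpg".format(directory, pic_name)


urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]  # hypothetical
job = partial(save_stub, directory="fn_pics")  # freeze the keyword argument

with ThreadPoolExecutor(max_workers=2) as executor:
    # One iterable per positional parameter: path first, then pic_name.
    names = list(executor.map(job, urls, range(len(urls))))

print(names)  # ['fn_pics/0.jpg', 'fn_pics/1.jpg']

with ThreadPoolExecutor(max_workers=2) as executor:
    bad = executor.map(job, ["ftp://example.com/c.jpg"], [0])

# The failing call has already finished; its exception is raised only here.
try:
    list(bad)
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```

Results come back in the order of the inputs regardless of which worker finishes first, which is why an index can safely double as the file name.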
