# Source: 程式人生 > 其它 > Python 爬取圖片
#
# Python image scraper (multiprocess producer/consumer).

import os
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Process, Queue
from threading import Thread
from urllib import parse

import requests
from lxml import etree

# TODO: exception handling is still rough; improve later.
# Open issue 1: only the images listed on each crawled page are fetched; images nested inside detail pages are not handled yet.
# Open issue 2: crawling too many pages raises an error; root cause not yet identified.

# HTTP headers sent with every request in this script.
headers = {
"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Mobile Safari/537.36",
# Anti-hotlinking: the Referer tells the server which page this request came from
"Referer": "https://xxx"
}


def get_img_src(q, page_count=4):
    """Producer: collect image src URLs from listing pages and put them on *q*.

    Walks *page_count* listing pages (page 1 is ``index.html``, page ``i``
    is ``i.html``), gathers every detail-page link, extracts the first
    image ``src`` from each detail page, and pushes it onto the queue.
    A sentinel string is pushed last so the consumer knows production
    is finished.

    :param q: queue shared with the downloader process
    :param page_count: number of listing pages to crawl (default 4,
        matching the original hard-coded ``range(1, 5)``)
    """
    # Build the listing-page URLs; the first page has a special file name.
    urls = ["https://xxx/index.html"]
    urls += [f"https://xxx/{i}.html" for i in range(2, page_count + 1)]

    # Collect detail-page links from every listing page into one flat list
    # (the original built a list of lists and nested the loops).
    hrefs = []
    for url in urls:
        try:
            resp = requests.get(url, headers=headers, timeout=10)
            resp.encoding = 'utf-8'
            tree = etree.HTML(resp.text)
            hrefs.extend(tree.xpath("//div[@class='list-box-p']/ul/li/a/@href"))
        except requests.RequestException as err:
            # Best effort: skip a broken listing page instead of aborting.
            print(f"listing page failed: {url} ({err})")

    for href in hrefs:
        try:
            child_resp = requests.get(href, headers=headers, timeout=10)
            child_resp.encoding = 'utf-8'
            child_tree = etree.HTML(child_resp.text)
            # xpath returns a list; guard against pages with no matching img
            # (the original indexed [0] unconditionally and could raise).
            srcs = child_tree.xpath("//div[@class='img_box']/a/img/@src")
            if not srcs:
                continue
            src = srcs[0]
            q.put(src)  # hand the URL to the downloader process
            print(f"---------------------------------------------------被塞進佇列--------------------->{src}")
        except requests.RequestException as err:
            print(f"detail page failed: {href} ({err})")

    # Sentinel: tells the consumer that production is done.
    q.put("完事了")


def download(src):
    """Download one image URL *src* into ``./image/``, named by its basename.

    Fetches the bytes *before* opening the target file, so a failed request
    no longer leaves an empty file behind (the original opened the file
    first, then fetched inside the ``with`` block).
    """
    print('開始下載------------>', src)
    name = src.split('/')[-1]
    # Ensure the target directory exists; the original crashed on first run
    # if ./image was missing.
    os.makedirs("./image", exist_ok=True)
    resp = requests.get(src, headers=headers, timeout=10)
    with open("./image/" + name, mode='wb') as f:
        f.write(resp.content)
    print('下載完畢------------>', src)


def download_img(q):
    """Consumer: pull image URLs off *q* and download each on a thread pool.

    Blocks on ``q.get()`` until data arrives; returns when the producer's
    sentinel string is received.  Leaving the ``with`` block waits for all
    submitted downloads to finish.
    """
    with ThreadPoolExecutor(5) as pool:
        while True:
            src = q.get()  # blocks until the producer supplies a URL
            if src == "完事了":
                return  # sentinel: production finished, drain the pool
            pool.submit(download, src)


if __name__ == '__main__':
    # Queue shared between the producer (URL scraper) and the consumer
    # (downloader) processes.
    q = Queue()
    producer = Process(target=get_img_src, args=(q,))
    consumer = Process(target=download_img, args=(q,))
    producer.start()
    consumer.start()
    # Join both children: the original never joined, so the parent exited
    # immediately and its exit did not reflect the crawl's completion.
    producer.join()
    consumer.join()