[python][爬蟲]批量爬取【漫畫DB】的漫畫圖片
阿新 • • 發佈:2020-08-12
今天看漫畫的時候感覺用瀏覽器實在不爽,就寫個小程式爬了下來。順便安利一個漫畫軟體MComix,超級好用(Linux限定,Win可以用ComicViewer)
"""Bulk downloader for comic page images from www.manhuadb.com.

Accepts either a single-chapter URL (.../manhua/<id>/<chapter>_p<n>.html)
or a whole-book URL (.../manhua/<id>) and saves every page image as
numbered .jpg files inside per-chapter directories.
"""
import os

import requests
from bs4 import BeautifulSoup


class Manhuadb:
    """Scraper for manhuadb.com chapter/book pages."""

    # URL prefix that identifies a manhuadb comic page (length used for slicing).
    PREFIX = "https://www.manhuadb.com/manhua/"

    def __init__(self, url):
        """Store the target URL, build browser-like headers, and start scraping.

        :param url: chapter URL or book URL on www.manhuadb.com.
        """
        self.url = url
        # Browser-like request headers.  NOTE(review): the Cookie is a captured
        # session value and will expire; refresh it if requests start failing.
        self.header = {
            'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            'Cookie': 'PHPSESSID=tcsqlbacj644lgdi9gf3ngj3sf; Hm_lvt_b09a6e73b4faec9edd5935dc45604b5b=1597234279,1597234684; Hm_lpvt_b09a6e73b4faec9edd5935dc45604b5b=1597236589; _ga=GA1.2.1851898722.1597234280; _gid=GA1.2.615465153.1597234280; __cfduid=df113e815d70b868f8f455ea0cd34d9271597234297',
            'Host': "www.manhuadb.com",
            # Fixed: header values must be strings — requests raises
            # InvalidHeader on an int once the dict is actually sent.
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0",
        }
        self.start()

    def judge_type(self):
        """Classify self.url, setting self.type_ to 'chapter', 'book', or None."""
        if self.PREFIX in self.url:
            loc = self.url[len(self.PREFIX):]
            # A chapter URL has an extra path component ending in .html.
            if '/' in loc and ".html" in loc:
                self.type_ = 'chapter'
            else:
                self.type_ = 'book'
        else:
            self.type_ = None

    def get_page(self, url):
        """Fetch one chapter page and return the URL of its comic image.

        Side effect: records the book title (h1) and chapter name (h2) on self.
        """
        # Fixed: actually send the browser headers built in __init__.
        html = requests.get(url, headers=self.header)
        # Fixed: name the parser explicitly so bs4 does not guess (and warn).
        soup = BeautifulSoup(html.text, "html.parser")
        self.title = soup.find_all('h1')[0].text
        self.chapter = soup.find_all('h2')[0].text
        return soup.find_all("img", class_="img-fluid show-pic")[0]['src']

    def get_page_list_from_page(self, url):
        """Return the URLs of every page of the chapter that *url* belongs to."""
        html = requests.get(url, headers=self.header)
        soup = BeautifulSoup(html.text, "html.parser")
        # The site stashes the total page count in a hidden data attribute.
        total = int(soup.find_all('div', class_="d-none vg-r-data")[0]['data-total'])
        comic_id = url.split('/')[4]
        # ".../1330_13291_p1.html" -> "1330_13291_" (keeps the separator so
        # "...{}p{}..." reassembles correctly).  NOTE(review): breaks if the
        # chapter id itself ever contains a 'p' — confirm against site URLs.
        page_id = url.split('/')[5][:-5].split('p')[0]
        self.title = soup.find_all('h1')[0].text
        self.chapter = soup.find_all('h2')[0].text
        return [
            "https://www.manhuadb.com/manhua/{}/{}p{}.html".format(comic_id, page_id, n + 1)
            for n in range(total)
        ]

    def get_chapter_list_from_book(self, url):
        """Return the URL of every chapter listed on a book page."""
        html = requests.get(url, headers=self.header)
        # Fixed: parse the response body, not the Response object itself.
        soup = BeautifulSoup(html.text, "html.parser")
        chapters = soup.find_all("li", class_="sort_div")
        self.title = soup.find_all('h1')[0].text
        self.chapter = soup.find_all('h2')[0].text
        # Fixed: find_all() returns a list and cannot be indexed with 'href';
        # find() yields the single <a> element inside each <li>.
        return [
            "https://www.manhuadb.com" + each.find('a')['href']
            for each in chapters
        ]

    def get_imgs(self, src, page):
        """Download image *src* and save it as '<page>.jpg' in the cwd."""
        img = requests.get(src, headers=self.header)
        with open('{}.jpg'.format(page), 'wb') as f:
            f.write(img.content)

    def get_chapter(self, url):
        """Download every page of one chapter into a directory named after it."""
        pages = self.get_page_list_from_page(url)
        os.mkdir(self.chapter)
        cwd = os.getcwd()
        os.chdir(self.chapter)
        try:
            # Fixed: enumerate() instead of list.index() inside the loop —
            # the original was O(n^2) and wrong if two page URLs were equal.
            for number, page_url in enumerate(pages):
                self.get_imgs(self.get_page(page_url), number)
        finally:
            # Always restore the working directory, even on a failed download.
            os.chdir(cwd)

    def start(self):
        """Dispatch on URL type: one chapter, a whole book, or report bad input."""
        self.judge_type()
        if self.type_ == "chapter":
            self.get_chapter(self.url)
        elif self.type_ == "book":
            chapter_list = self.get_chapter_list_from_book(self.url)
            os.mkdir(self.title)
            cwd = os.getcwd()
            os.chdir(self.title)
            try:
                for chapter_url in chapter_list:
                    self.get_chapter(chapter_url)
            finally:
                os.chdir(cwd)
        else:
            print('wrong input')


if __name__ == "__main__":
    Manhuadb("https://www.manhuadb.com/manhua/147/1330_13291_p1.html")
效果: