Python Multiprocess Crawler

阿新 • Published: 2019-02-14

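import re
import time
from multiprocessing import Pool

import requests

# Browser-like User-Agent so the site does not reject the request outright
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'
}


def re_scraper(url):
    """Fetch one listing page and extract each post's fields with regular expressions."""
    res = requests.get(url, headers=headers)
    # re.S lets '.' match newlines, since each post spans several lines of HTML
    names = re.findall(r'<h2>(.*?)</h2>', res.text, re.S)
    # Capture the inner HTML of the content block (the original pattern had no group)
    contents = re.findall(r'<div class="content">(.*?)</div>', res.text, re.S)
    laughs = re.findall(r'<span class="stats-vote">.*?<i class="number">(\d+)</i>', res.text, re.S)
    # The literal after </i> must match the page's own label text exactly
    comments = re.findall(r'<i class="number">(\d+)</i> 評論', res.text, re.S)

    infos = []
    for name, content, laugh, comment in zip(names, contents, laughs, comments):
        info = {
            'name': name,
            'content': content,
            'laugh': laugh,
            'comment': comment,
        }
        infos.append(info)
    # Return after the loop so every post on the page is included
    return infos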

if __name__ == "__main__":
    # re_scraper("https://www.qiushibaike.com/8hr/page/1/")
    urls = ["https://www.qiushibaike.com/8hr/page/{}/".format(i) for i in range(1, 35)]

    # Baseline: fetch the pages one after another in a single process
    start_1 = time.time()
    for url in urls:
        re_scraper(url)
    end_1 = time.time()
    print('Serial crawler time:', end_1 - start_1)

    # Same pages spread across a pool of 2 worker processes
    start_2 = time.time()
    with Pool(processes=2) as pool:
        pool.map(re_scraper, urls)
    end_2 = time.time()
    print('2-process crawler time:', end_2 - start_2)

    # Same pages spread across a pool of 4 worker processes
    start_3 = time.time()
    with Pool(processes=4) as pool:
        pool.map(re_scraper, urls)
    end_3 = time.time()
    print('4-process crawler time:', end_3 - start_3)
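
The three timing sections above repeat the same pattern, so the comparison can be tried without touching the network. What follows is only a rough sketch under stated assumptions: `fake_fetch` and `timed_map` are hypothetical names that do not appear in the original script, and `fake_fetch` sleeps briefly instead of downloading a page.

```python
import time
from multiprocessing import Pool


def fake_fetch(page):
    """Hypothetical stand-in for re_scraper: sleeps instead of making a request."""
    time.sleep(0.05)
    return page * 2


def timed_map(func, items, processes):
    """Map func over items in a pool of the given size; return (results, seconds)."""
    start = time.perf_counter()
    # The with-block shuts the pool down once map() has returned all results
    with Pool(processes=processes) as pool:
        results = pool.map(func, items)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(1, 9))
    for n in (1, 2, 4):
        results, elapsed = timed_map(fake_fetch, pages, n)
        print(n, "process(es):", round(elapsed, 2), "s")
```

Because the stand-in task only waits, the elapsed time should generally shrink as the pool grows. With real pages, the gain depends on network speed and on how the site responds to concurrent requests.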
