Crawling the images on pages 1 to 5 of Qiushibaike and downloading them locally

阿新 • • Published: 2019-04-05


The approach is as follows:

First, locate the node that contains the image.

<div class="thumb">

<a href="/article/121672165" target="_blank">
<img src="//pic.qiushibaike.com/system/pictures/12167/121672165/medium/NTDNQY3EJKUSRZ2X.jpg" alt="糗事#121672165" class="illustration" width="100%" height="auto">
</a>
</div>

Next, find the URL of the page to crawl.

https://www.qiushibaike.com/imgrank/

Send the request and get the response (omitted here).

Use a regular expression to extract the src of each image.

re.compile(r'<div class="thumb">.*?<img src="(.*?)".*?>.*?</div>', re.S)
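As a quick offline check of this step, the pattern can be run against the sample node shown earlier. This is only a sketch: `SAMPLE_HTML` is a hypothetical name holding a copy of that markup, and the live page may differ from it.

```python
import re

# Copy of the sample node from the page markup shown earlier (an assumption:
# the live page may have changed since then).
SAMPLE_HTML = '''<div class="thumb">

<a href="/article/121672165" target="_blank">
<img src="//pic.qiushibaike.com/system/pictures/12167/121672165/medium/NTDNQY3EJKUSRZ2X.jpg" alt="糗事#121672165" class="illustration" width="100%" height="auto">
</a>
</div>'''

# re.S lets "." match newlines, so the pattern can span the multi-line node.
pattern = re.compile(r'<div class="thumb">.*?<img src="(.*?)".*?>.*?</div>', re.S)

# findall returns only the captured group, i.e. the value of each src attribute.
srcs = pattern.findall(SAMPLE_HTML)
print(srcs)
```

The captured value is protocol-relative (it starts with `//`), so a scheme has to be added before the image can be requested.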

Finally, persist the images by writing them to files.

The full code is as follows:

import requests
import re
import os

url = "https://www.qiushibaike.com/imgrank/page/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
}

pattern = re.compile(r'<div class="thumb">.*?<img src="(.*?)".*?>.*?</div>', re.S)

if not os.path.exists("./imgs"):
    os.mkdir("imgs")

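# Qiushibaike paginates through the URL path, so no query parameters are needed
for page in range(1, 6):
    # Switch pages by updating the URL directly
    new_url = url + "%s/" % page
    response = requests.get(url=new_url, headers=headers)
    page_text = response.text
    # Collect the src links of all images on the page
    list_img = pattern.findall(page_text)

    # Persistent storage: one sub-directory per page
    # (exist_ok avoids an error when the script is run again)
    page_path = "pages%s/" % page
    os.makedirs("imgs/%s" % page_path, exist_ok=True)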

    for my_url in list_img:
        # Complete the protocol-relative image URL
        url_img = "https:" + my_url
        print(url_img)
        # Fetch the image as binary data
        data_img = requests.get(url=url_img, headers=headers).content
        # File name of the image
        name_img = my_url.split("/")[-1]
        print(name_img)
        # Local path the file is written to
        path_img = "imgs/" + page_path + name_img
        print(path_img)
        with open(path_img, "wb") as fp:
            fp.write(data_img)
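The URL completion and path building done inside the loop above could also be sketched with the standard library instead of string concatenation. The helper names `absolute_image_url` and `local_image_path` are hypothetical and not part of the original script. The directory layout (`imgs/pages<N>/`) follows the code above.

```python
import os
from urllib.parse import urljoin, urlsplit

# Base page URL taken from the article; used only to resolve relative links.
BASE_URL = "https://www.qiushibaike.com/imgrank/"


def absolute_image_url(src, base=BASE_URL):
    """Resolve a protocol-relative src such as //host/path against the page URL."""
    return urljoin(base, src)


def local_image_path(src, page, root="imgs"):
    """Build imgs/pages<page>/<file name> from the image src."""
    name = os.path.basename(urlsplit(src).path)
    return os.path.join(root, "pages%s" % page, name)


src = "//pic.qiushibaike.com/system/pictures/12167/121672165/medium/NTDNQY3EJKUSRZ2X.jpg"
print(absolute_image_url(src))
print(local_image_path(src, 1))
```

`urljoin` reuses the scheme of the base URL, so the result stays correct if a src ever arrives with its scheme already included.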
