Scraping Qiushibaike with requests and BeautifulSoup

阿新 • Published: 2018-11-10

Scraping Qiushibaike

Today we will use requests and bs4 (BeautifulSoup) to scrape the jokes posted on Qiushibaike (糗事百科).

Steps:

1. Analyze the page structure
2. Fetch the page content with requests
3. Filter the page content with bs4
4. Print or save the extracted content

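Let's work through these steps one at a time.
1. Analyze the page structure
[Screenshot of the page's HTML structure]
The screenshot shows that the text of each joke is stored in a <span> tag nested inside a <div class="content"> tag (the same tags that the code in steps 3 and 4 selects), so we can use bs4 to filter for it.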
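To make that structure concrete, here is a minimal sketch that runs BeautifulSoup over a hand-written HTML fragment. The fragment is an assumption that only imitates the div/span nesting described above and is not copied from the live site. The function name extract_jokes is likewise made up for illustration, and Python's built-in 'html.parser' is used so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the nesting described above;
# the real page contains far more markup than this.
SAMPLE_HTML = """
<div class="article">
  <div class="content"><span>First joke</span></div>
</div>
<div class="article">
  <div class="content"><span>Second joke</span></div>
</div>
"""


def extract_jokes(html):
    """Return the text of the <span> inside every <div class="content">."""
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", attrs={"class": "content"})
    return [div.span.get_text().strip() for div in divs]


if __name__ == "__main__":
    for joke in extract_jokes(SAMPLE_HTML):
        print(joke)
```

Trying the filter on a small fragment like this is a quick way to check the selection logic before pointing it at the real page.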

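2. Fetch the page content with requests

import requests

# Pretend to be a browser by sending a browser User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
# Fetch the page and keep the response body as text
r = requests.get('http://www.qiushibaike.com', headers=headers).text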
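The one-line request above assumes the request always succeeds. A slightly more defensive sketch adds a timeout and checks the HTTP status before using the body. The function name fetch_html and the 10-second timeout are assumptions chosen for illustration; they are not part of the original code.

```python
import requests

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/56.0.2924.87 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        # Turn 4xx/5xx responses into an exception instead of parsing an error page
        response.raise_for_status()
    except requests.RequestException as exc:
        print("Request failed:", exc)
        return None
    return response.text


if __name__ == "__main__":
    html = fetch_html("http://www.qiushibaike.com")
    print("fetched" if html is not None else "no content")
```

Returning None on failure lets the caller decide whether to retry or skip the page, instead of crashing in the middle of a run.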

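3. Filter the page content with bs4

from bs4 import BeautifulSoup

# Parse the page content with the lxml parser
soup = BeautifulSoup(r, 'lxml')
# Find every <div class="content"> tag identified in step 1
divs = soup.find_all('div', attrs={'class': 'content'})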
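find_all is one way to express this filter. BeautifulSoup's select method accepts a CSS selector that describes the same nesting. The sketch below runs on a made-up fragment (an assumption, not the real page) and uses the built-in 'html.parser' so it does not need lxml. It also shows that a div without a span simply contributes nothing, whereas calling div.span.get_text() on such a div would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Made-up fragment: the second div deliberately has no <span>.
SAMPLE_HTML = """
<div class="content"><span>Joke with a span</span></div>
<div class="content">Text without a span</div>
<div class="footer"><span>Not a joke</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select every <span> that is a direct child of a <div class="content">.
spans = soup.select("div.content > span")
texts = [span.get_text() for span in spans]
print(texts)
```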

4. Print the content

# Print every joke
for div in divs:
    contents = div.span.get_text()
    print(contents)

    # Open the file in append mode and write the joke to it
    with open('C:\\Users\\Administrator\\Desktop\\11.txt', 'a', encoding='utf-8') as f:
        f.write(contents)
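Since write does not add a newline by itself, opening the file once and writing one joke per line keeps the entries separated. In this sketch the function name save_jokes and the sample list are placeholders (assumptions), and a file in the system's temporary directory stands in for the desktop path used above.

```python
import os
import tempfile


def save_jokes(jokes, path):
    """Append each joke to the file at path, one per line, as UTF-8."""
    with open(path, "a", encoding="utf-8") as f:
        for joke in jokes:
            f.write(joke.strip() + "\n")


if __name__ == "__main__":
    sample = ["First joke", "Second joke"]  # placeholder data
    out_path = os.path.join(tempfile.gettempdir(), "jokes_demo.txt")
    save_jokes(sample, out_path)
    print("saved to", out_path)
```

Passing encoding='utf-8' explicitly matters on Windows, where the default encoding may not be able to represent Chinese text.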

Complete program:

import requests
from bs4 import BeautifulSoup


# Pretend to be a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
r = requests.get('http://www.qiushibaike.com', headers=headers).text
soup = BeautifulSoup(r, 'lxml')

# Every joke sits in a <div class="content"> tag
divs = soup.find_all('div', attrs={'class': 'content'})
print(divs)  # debug output: the raw tags that matched

for div in divs:
    contents = div.span.get_text()
    # Append the joke to a text file on the desktop
    with open('C:\\Users\\Administrator\\Desktop\\11.txt', 'a', encoding='utf-8') as f:
        f.write(contents)
    print(contents)

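Now let's make this a little more complete and scrape a specific page of Qiushibaike.
Comparing the two screenshots makes the difference clear.
[Screenshots of two different pages and their URLs]
The URLs differ only in the number at the end, so selecting a page is easy.
Code: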

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
# num is the number of the page to scrape
def pages(num):
    # Append the page number itself, not the literal text 'str(num)'
    url = 'https://www.qiushibaike.com/8hr/page/' + str(num)
    r = requests.get(url, headers=headers).text
    soup = BeautifulSoup(r, 'lxml')

    divs = soup.find_all('div', attrs={'class': 'content'})
    print(divs)

    for div in divs:
        contents = div.span.get_text()
        with open('C:\\Users\\Administrator\\Desktop\\11.txt', 'a', encoding='utf-8') as f:
            f.write(contents)
        print(contents)

# Scrape page 5
pages(5)
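Because only the trailing number changes, the URLs for a range of pages can be generated up front. The helper below is a sketch: the name page_urls is an assumption, and it only builds the strings, so fetching and parsing each page is still left to code such as pages above.

```python
BASE_URL = "https://www.qiushibaike.com/8hr/page/"


def page_urls(first, last):
    """Return the URLs for pages first..last (inclusive)."""
    return [BASE_URL + str(num) for num in range(first, last + 1)]


if __name__ == "__main__":
    for url in page_urls(1, 3):
        print(url)
```

When looping over several pages, a short pause between requests (for example with time.sleep) is a common courtesy to the site.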
