Scraping text jokes from Qiushibaike (working as of October 22, 2016)

阿新 • Published: 2019-01-27

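I used bs4 to extract a few things from the page. Along the way I tried several versions found online and then wrote my own simple imitation of them.

The main part to extract: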

<a href="/article/117808662" target="_blank" class='contentHerf' >
<div class="content">



<span>偶遇小朋友玩家家酒！<br/>一小姑娘說：誰要扮演老公的？只見小男孩們紛紛舉起小手：我、我、我……<br/>好，這是你的搓衣板和尿壺，你就跪在這上面手上拖著尿壺，我在旁邊化妝</span>


</div>
</a>

Find the <a> tag with the matching class and extract the text of its <span>.
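The extraction step can be tried offline before touching the network. The following is a minimal sketch, assuming bs4 is installed; `SAMPLE_HTML` and `extract_jokes` are hypothetical names introduced for illustration, and the sample markup is a shortened stand-in for the fragment above, not real page content.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the fragment above; the joke text is a placeholder.
SAMPLE_HTML = """
<a href="/article/117808662" target="_blank" class='contentHerf' >
<div class="content">
<span>first line<br/>second line</span>
</div>
</a>
<a href="/users/1" class="author"><span>someone</span></a>
"""

def extract_jokes(html):
    """Return the <span> text of every <a> whose class contains "content"."""
    soup = BeautifulSoup(html, "html.parser")
    # The regex is searched against each class name, so "contentHerf" matches.
    links = soup.find_all("a", class_=re.compile("content"))
    jokes = []
    for link in links:
        span = link.find("span")
        if span is not None:  # skip links without joke text
            # Use a newline wherever the markup has a <br/>.
            jokes.append(span.get_text("\n").strip())
    return jokes

if __name__ == "__main__":
    for joke in extract_jokes(SAMPLE_HTML):
        print(joke)
```

Matching on the class with a regular expression rather than an exact string tolerates small variations in the class name, at the cost of possibly matching unrelated links, so it is worth checking the result against the live page in the developer tools.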

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import re
import time

x = 1  # running joke number, shared across pages
def gogogo(page, f):
    global x
    url = "http://www.qiushibaike.com/text/page/" + str(page) + "/?s=4922848"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    req = Request(url=url, headers=headers)
    res = urlopen(req)
    soup = BeautifulSoup(res, "html.parser")
    # <a> tags whose class contains "content" (e.g. contentHerf)
    links = soup.find_all('a', {"class": re.compile("content")})

    for link in links:
        span = link.find('span')
        if span is None:  # skip links that carry no joke text
            continue
        f.write(str(x) + ":")
        x = x + 1
        f.write(span.get_text())
        f.write("\n\n")
    time.sleep(1)  # pause before requesting the next page

if __name__ == "__main__":
    # Append to d.txt; the with block closes the file even if a request fails
    with open("d.txt", 'a', encoding='utf-8') as f:
        for i in range(1, 4):  # pages 1 to 3
            gogogo(i, f)
    print('Good Job!')

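The code is fairly simple: it only needs to imitate a browser visit. I am currently studying computer networking. The User-Agent string on line 10 of the code can be replaced with the one your own browser sends, so learning to use the browser's developer tools is well worth the effort.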
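To make the browser imitation concrete, here is a small sketch of building the request with a custom User-Agent, without sending it. `build_request` is a hypothetical helper introduced for illustration, and the User-Agent value shown is only a placeholder to be swapped for the one copied from your browser's developer tools.

```python
from urllib.request import Request

# Same URL pattern as the script above.
BASE_URL = "http://www.qiushibaike.com/text/page/"

def build_request(page, user_agent):
    """Build (but do not send) a request for one page with a browser-like header."""
    url = BASE_URL + str(page) + "/?s=4922848"
    return Request(url=url, headers={"User-Agent": user_agent})

if __name__ == "__main__":
    # Placeholder value; copy the real string from the Network tab of your browser.
    req = build_request(1, "Mozilla/5.0 (placeholder)")
    print(req.full_url)
    # urllib stores header names capitalized, hence "User-agent" here.
    print(req.get_header("User-agent"))
```

Separating request construction from `urlopen` makes it easy to inspect exactly which headers will be sent before any traffic reaches the site.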
