Python Crawler: Text Jokes from Qiushibaike (糗事百科)

阿新 • Published: 2019-02-12

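## Runtime environment
python: python3.6.5
IDE: pycharm
## Dependencies
requests, re

## Goal
Scrape the text jokes from the Qiushibaike website (the script walks pages 1 to 13 of the /text/ section) and save them as a txt file in the folder where the program is run.
The source code below can be run as-is.
## Source code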

import requests
import re

# Set the User-Agent so the request looks like a normal browser visit
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}


# Fetch a page
def get_page(url):
    try:
        response = requests.get(url, headers=head, timeout=10)
        # Check the status code to confirm the page was fetched successfully
        if response.status_code == 200:
            # Return the decoded page text
            return response.text
        return None
    except requests.RequestException:
        # Report the failure and let the caller skip this page
        print('An error occurred, please check and retry')
        return None


def parse_page(html):
    print('Parsing data.....')
    # Regular expression: capture the text inside the <span> of each joke's content block
    req_pattern = re.compile(
        r'<div class="article block untagged mb15 typs.*?>.*?class="content".*?span>(.*?)</span>',
        re.S)
    # Find everything that matches the pattern
    joke_texts = req_pattern.findall(html)
    print('Writing text.......')
    save_to_text(joke_texts)


def save_to_text(joke_texts):
    # Append to the output file so every page ends up in the same txt
    with open('糗事百科.txt', 'a+', encoding='utf-8') as f:
        for joke in joke_texts:
            # Turn HTML line breaks into real newlines
            f.write(joke.replace('<br/>', '\n'))


def main(page):
    url = 'https://www.qiushibaike.com/text/page/{}/'.format(page)
    print('Requesting page {}...'.format(page))
    html = get_page(url)
    if html:
        parse_page(html)


if __name__ == '__main__':
    # Pages 1 through 13
    for i in range(1, 14):
        main(i)
    print('All data written, enjoy reading')
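
To see what the regular expression in `parse_page` actually captures, it can help to run it offline against a small piece of markup. The sketch below is only an illustration: `SAMPLE_HTML` is hypothetical markup modeled on the class names the pattern expects, not a copy of the real page, and `extract_jokes` is a helper name introduced here. It also trims surrounding whitespace, which the script above does not do.

```python
import re

# Same pattern as in parse_page.
JOKE_PATTERN = re.compile(
    r'<div class="article block untagged mb15 typs.*?>.*?class="content".*?span>(.*?)</span>',
    re.S,
)

# Hypothetical markup (assumption): shaped only after the class names in the pattern.
SAMPLE_HTML = '''
<div class="article block untagged mb15 typs_hot" id="qiushi_tag_1">
<a href="/article/1" class="contentHerf">
<div class="content">
<span>

First line<br/>second line

</span>
</div>
</a>
</div>
<div class="article block untagged mb15 typs_long" id="qiushi_tag_2">
<div class="content">
<span>Another joke</span>
</div>
</div>
'''


def extract_jokes(html):
    """Return each captured joke with <br/> turned into newlines and whitespace trimmed."""
    return [m.replace('<br/>', '\n').strip() for m in JOKE_PATTERN.findall(html)]


jokes = extract_jokes(SAMPLE_HTML)
for joke in jokes:
    print(repr(joke))
```

Because the pattern is compiled with `re.S`, `.` also matches newlines, so each non-greedy `.*?` can skip across the tags between the outer `div` and the `span`. If the site changes its class names the pattern simply returns an empty list, so checking `len(jokes)` is a quick way to notice that.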

