# Qiushibaike joke scraper (糗事百科段子爬蟲)
import re
import urllib.request
import urllib.error
# Global default opener: attach a browser User-Agent to every request so the
# target site does not reject us as an obvious bot.
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/61.0")
opener = urllib.request.build_opener()
# BUG FIX: the OpenerDirector attribute is `addheaders` (plural, a list of
# (name, value) tuples).  Assigning to the nonexistent `addheader` silently
# did nothing, so the custom User-Agent was never sent.
opener.addheaders = [headers]
urllib.request.install_opener(opener)
def use_proxy(url, IP):
    """Fetch *url* through the HTTP proxy *IP* (``"host:port"``) and return
    the response body decoded as UTF-8 (undecodable bytes are dropped).

    Side effect: installs the proxy-enabled opener as the process-wide
    default, so every subsequent ``urllib`` request also uses the proxy.
    May raise ``urllib.error.URLError`` (including ``HTTPError``) on failure.
    """
    proxy = urllib.request.ProxyHandler({"http": IP})
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    # Install as the global default opener (original comment: 新增為全域性).
    urllib.request.install_opener(opener)
    data = urllib.request.urlopen(url).read()
    # "ignore" keeps a partially mojibake page usable instead of raising.
    return data.decode("utf-8", "ignore")
IP = "61.135.217.7:80"
# Scrape joke pages 1..9; one failed page must not abort the whole run.
for i in range(1, 10):
    try:
        url = "http://www.qiushibaike.com/8hr/page/" + str(i)
        pagedata = use_proxy(url, IP)
        print(len(pagedata))
        # Non-greedy capture of each joke's text; re.S lets .*? span newlines.
        pat = '<div class="content">.*?<span>(.*?)</span>.*?</div>'
        datalist = re.compile(pat, re.S).findall(pagedata)
        print(len(datalist))
        for j, joke in enumerate(datalist):
            print("第" + str(i) + "頁第" + str(j) + "個段子的內容是: ")
            print(joke)
    # ROBUSTNESS FIX: catch URLError (parent of HTTPError) so plain
    # connection/proxy failures are also reported instead of crashing.
    except urllib.error.URLError as e:
        # HTTPError has .code; bare URLError only has .reason.
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)