The urllib.request Module (1)

阿新 • Published: 2020-12-01

1. Parameters of Request()

import urllib.request

request=urllib.request.Request('https://python.org')
response=urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

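Building this data structure lets us treat the request as a standalone object, and it also lets us configure its parameters more richly and flexibly.

Its constructor is as follows:

class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

Parameters:

1. url: required.

2. data: must be bytes. If you have a dict, encode it first with urlencode() from urllib.parse, then convert the resulting string to bytes.

3. headers: a dict of request headers. Pass it to the constructor, or add entries afterwards with add_header().

4. origin_req_host: the host name or IP address of the requesting party.

5. unverifiable: defaults to False. It indicates whether the request is unverifiable, meaning the user had no option to approve it. For example, when an image in an HTML document is fetched automatically and the user had no option to approve that fetch, the value is True.

6. method: a string naming the HTTP method to use, such as GET, POST, or PUT.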
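As a minimal sketch of how url, data, and method interact, the snippet below builds Request objects without sending them and inspects the result. The second form field is a placeholder added for illustration, and nothing here touches the network.

```python
from urllib import parse, request

# Placeholder form data. urlencode() returns a str, which still has to be encoded to bytes.
form = {'name': 'Germey', 'lang': 'zh-TW'}
query = parse.urlencode(form)
data = query.encode('utf-8')

# Constructing a Request does not contact the server.
req = request.Request('http://httpbin.org/post', data=data)

print(query)             # name=Germey&lang=zh-TW
print(type(data))        # <class 'bytes'>
print(req.get_method())  # POST, because data is present and method was not given
print(request.Request('http://httpbin.org/get').get_method())  # GET
```

When method is omitted, get_method() falls back to POST if data is set and to GET otherwise, so passing method='POST' explicitly mainly documents intent.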

Now try building a request with several parameters:

from urllib import request,parse

url='http://httpbin.org/post'

headers={
   'User-Agent':'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)',
   'Host':'httpbin.org'
}
#Headers can also be added with add_header():
#req=request.Request(url=url,data=data,method='POST')
#req.add_header('User-Agent','Mozilla/4.0(compatible;MSIE 5.5;Windows NT)')

form={
   'name':'Germey'
}
data=bytes(parse.urlencode(form),encoding='utf-8')#urlencode() turns the dict into a query string; bytes() then encodes it for data

req=request.Request(url=url,data=data,headers=headers,method='POST')
response=request.urlopen(req)
print(response.read().decode('utf-8'))

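Result: httpbin.org echoes the request back as JSON, including the form field name=Germey and the headers that were sent.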
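The two ways of setting headers can be compared without sending anything. This is a small sketch that reuses the User-Agent string from the example; the variable names are illustrative only.

```python
from urllib import request

ua = 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)'

# Headers passed to the constructor...
req_a = request.Request('http://httpbin.org/post', headers={'User-Agent': ua})

# ...and headers added afterwards end up stored the same way.
req_b = request.Request('http://httpbin.org/post')
req_b.add_header('User-Agent', ua)

# Request normalises header names with str.capitalize(), so the stored key is 'User-agent'.
same = req_a.get_header('User-agent') == req_b.get_header('User-agent')
print(same)                            # True
print(req_b.has_header('User-agent'))  # True
print(req_b.header_items())
```

Because of that normalisation, get_header('User-Agent') returns None; look headers up with only the first letter capitalised.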

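2. Handler and Opener

Handler:

Handlers are the various processors that each take care of one part of a request, such as authentication, proxies, or cookies. Between them they cover almost everything an HTTP request involves.

BaseHandler in urllib.request is the parent class of all other Handlers and provides the most basic methods.

Opener:

An Opener is an OpenerDirector object. The urlopen() function used so far is a ready-made Opener that urllib provides for us.

The relationship between the two: Handlers are used to build an Opener.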
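To make that relationship concrete, here is a rough sketch that subclasses BaseHandler and passes it to build_opener(). The TagHandler class and the X-Demo header are hypothetical names invented for this illustration, and no request is actually sent.

```python
from urllib.request import BaseHandler, OpenerDirector, build_opener

class TagHandler(BaseHandler):
    """Hypothetical handler that stamps every outgoing HTTP(S) request with a header."""

    def http_request(self, req):
        # <protocol>_request methods pre-process a request before it is sent.
        req.add_unredirected_header('X-Demo', 'urllib-notes')
        return req

    https_request = http_request

opener = build_opener(TagHandler())

print(isinstance(opener, OpenerDirector))                       # True
print(any(isinstance(h, TagHandler) for h in opener.handlers))  # True
# build_opener() also installs the default handlers, for example HTTPHandler.
print(sorted(type(h).__name__ for h in opener.handlers))
```

Calling opener.open(url) would then run the request through every installed Handler, the custom one included.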

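3. Usage

Authentication:

Set up a site that requires authentication. I used IIS here.

Problems I ran into, and the articles that helped:

How to install and configure IIS - Baidu Jingyan (baidu.com)

How to set up Basic authentication for an IIS site - Baidu Jingyan (baidu.com)

Fixing the missing Windows Authentication option under World Wide Web Services security in IIS on Windows 10 Home - enjoryWeb - cnblogs (cnblogs.com)

Code: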

from urllib.request import HTTPPasswordMgrWithDefaultRealm,HTTPBasicAuthHandler,build_opener
from urllib.error import URLError

username='username'#fill in your own username and password
password='password'
url='http://localhost:5000/'

p=HTTPPasswordMgrWithDefaultRealm()
p.add_password(None,url,username,password)#register the username and password for this URL
auth_handler=HTTPBasicAuthHandler(p)#create a Handler that performs HTTP Basic authentication
opener=build_opener(auth_handler)#build an Opener from the Handler

try:
    result=opener.open(url)#open the URL
    html=result.read().decode('utf-8')
    print(html)#print the HTML source of the page
except URLError as e:
    print(e.reason)
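The password manager can be checked on its own before any server is involved. The sketch below uses the same placeholder URL and credentials as above, with the realm name and the example.com URL made up for the demonstration, and shows which lookups find the stored pair.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

url = 'http://localhost:5000/'
p = HTTPPasswordMgrWithDefaultRealm()
p.add_password(None, url, 'username', 'password')

# Registering under realm None makes the entry a fallback for any realm.
exact = p.find_user_password('Some Realm', url)
# URLs below the registered one match as well.
nested = p.find_user_password('Some Realm', url + 'admin')
# A different host has no stored credentials.
other = p.find_user_password('Some Realm', 'http://example.com/')

print(exact)   # ('username', 'password')
print(nested)  # ('username', 'password')
print(other)   # (None, None)
```

HTTPBasicAuthHandler performs this same lookup when the server answers with 401 and a WWW-Authenticate header.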

Proxy:

To add a proxy, set one up locally, running on port 9743.

Code:

from urllib.request import ProxyHandler,build_opener
from urllib.error import URLError

proxy_handler=ProxyHandler({
    'http':'http://127.0.0.1:9743',
    'https':'https://127.0.0.1:9743'
})#build a Handler
opener=build_opener(proxy_handler)#build an Opener
try:
    response=opener.open('https://www.baidu.com')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
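ProxyHandler does not have to be given a dict. As a hedged sketch, assuming the same local proxy on port 9743 is exposed through environment variables, the following shows how urllib detects it and how to switch proxying off. Nothing is sent over the network.

```python
import os
from urllib.request import ProxyHandler, build_opener, getproxies

# Assumption: the local proxy from the example, published via environment variables.
os.environ['http_proxy'] = 'http://127.0.0.1:9743'
os.environ['https_proxy'] = 'http://127.0.0.1:9743'

detected = getproxies()
print(detected['http'])
print(detected['https'])

env_opener = build_opener(ProxyHandler())       # no argument: proxies are read from the environment
direct_opener = build_opener(ProxyHandler({}))  # empty dict: autodetected proxies are disabled
```

The plain urlopen() function also honours these environment variables, so an explicit ProxyHandler is mostly useful when different Openers need different proxies.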

Cookies:

Fetch the cookies that a site sets:

Code:

import http.cookiejar,urllib.request

cookie=http.cookiejar.CookieJar()#create a CookieJar object
handler=urllib.request.HTTPCookieProcessor(cookie)#build a Handler
opener=urllib.request.build_opener(handler)#build an Opener
response=opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name+"="+item.value)

Result: each cookie is printed on its own line as name=value.
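What HTTPCookieProcessor does for each response can be reproduced offline. The sketch below is only an illustration: FakeResponse is a hypothetical stand-in that exposes info() and nothing else, and the example.com URL and cookie values are invented.

```python
import email
from http.cookiejar import CookieJar
from urllib.request import Request

class FakeResponse:
    """Hypothetical stand-in for a response; CookieJar only needs info() here."""

    def __init__(self, raw_headers):
        self._headers = email.message_from_string(raw_headers)

    def info(self):
        return self._headers

jar = CookieJar()
req = Request('http://www.example.com/')
res = FakeResponse('Set-Cookie: sid=abc123; Path=/\nSet-Cookie: theme=dark; Path=/\n\n')

# HTTPCookieProcessor calls this for every response it sees.
jar.extract_cookies(res, req)

pairs = sorted(item.name + '=' + item.value for item in jar)
print(pairs)  # ['sid=abc123', 'theme=dark']
```

On the next request to the same site, the processor calls add_cookie_header() on the jar to send the stored cookies back.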

Save the cookies to a file:

Code:

import http.cookiejar,urllib.request

filename='cookies.txt'

cookie=http.cookiejar.MozillaCookieJar(filename)
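#MozillaCookieJar is the CookieJar subclass to use when writing a file; it reads and saves cookies in the Mozilla (Netscape) cookies.txt format
#To save the cookies in LWP format instead, use:
#cookie=http.cookiejar.LWPCookieJar(filename)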

handler=urllib.request.HTTPCookieProcessor(cookie)
opener=urllib.request.build_opener(handler)
response=opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True,ignore_expires=True)

Result (contents of cookies.txt):

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file!  Do not edit.

.baidu.com    TRUE    /    FALSE    1638359640    BAIDUID    9BB1BA4FDD840EBD956A3D2EFB6BF883:FG=1
.baidu.com    TRUE    /    FALSE    3754307287    BIDUPSID    9BB1BA4FDD840EBD25D00EE8183D1125
.baidu.com    TRUE    /    FALSE        H_PS_PSSID    1445_33119_33059_31660_33099_33101_26350_33199
.baidu.com    TRUE    /    FALSE    3754307287    PSTM    1606823639
www.baidu.com    FALSE    /    FALSE        BDSVRTM    7
www.baidu.com    FALSE    /    FALSE        BD_HOME    1

LWP format:

#LWP-Cookies-2.0
Set-Cookie3: BAIDUID="DDF5CB401A1543ED614CE42962D48099:FG=1"; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2021-12-01 12:04:18Z"; comment=bd; version=0
Set-Cookie3: BIDUPSID=DDF5CB401A1543ED00860C3997C3282C; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2088-12-19 15:18:25Z"; version=0
Set-Cookie3: H_PS_PSSID=1430_33058_31254_33098_33101_33199; path="/"; domain=".baidu.com"; path_spec; domain_dot; discard; version=0
Set-Cookie3: PSTM=1606824257; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2088-12-19 15:18:25Z"; version=0
Set-Cookie3: BDSVRTM=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
Set-Cookie3: BD_HOME=1; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
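The two flags passed to save() decide which cookies reach the file. Here is a minimal sketch of their effect, assuming two made-up cookies for .example.com and a temporary file; make_cookie is a hypothetical helper, not part of the library.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, MozillaCookieJar

def make_cookie(name, value, expires):
    """Hypothetical helper: expires=None marks a session cookie (discard=True)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain='.example.com', domain_specified=True, domain_initial_dot=True,
        path='/', path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )

path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
jar = MozillaCookieJar(path)
jar.set_cookie(make_cookie('persistent', '1', int(time.time()) + 3600))
jar.set_cookie(make_cookie('session', '2', None))

jar.save()  # defaults: session cookies and expired cookies are skipped
default_jar = MozillaCookieJar()
default_jar.load(path)
names_default = sorted(c.name for c in default_jar)

jar.save(ignore_discard=True, ignore_expires=True)
full_jar = MozillaCookieJar()
full_jar.load(path, ignore_discard=True, ignore_expires=True)
names_full = sorted(c.name for c in full_jar)

print(names_default)  # ['persistent']
print(names_full)     # ['persistent', 'session']
```

This is why the cookies without an expiry time in the listings above, such as H_PS_PSSID and BD_HOME, only appear when ignore_discard=True is passed.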

Using the LWP-format file as an example, here is how to read the cookies back and use them:

Code:

import http.cookiejar,urllib.request

cookie=http.cookiejar.LWPCookieJar()
#The class must match the format the file was saved in. For a Mozilla-format file, use:
#cookie=http.cookiejar.MozillaCookieJar()

cookie.load('cookies.txt',ignore_discard=True,ignore_expires=True)
#load() reads the local cookies file and fills the jar with its contents

handler=urllib.request.HTTPCookieProcessor(cookie)
opener=urllib.request.build_opener(handler)
response=opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))

Result: the page source is printed.
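One pitfall when reading a cookies file is loading it with the wrong class. The following sketch, which uses a temporary file rather than the cookies.txt from above, shows what happens when a Mozilla-format file is loaded with LWPCookieJar.

```python
import os
import tempfile
from http.cookiejar import LoadError, LWPCookieJar, MozillaCookieJar

path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

# Even an empty jar writes the format's header line, which load() checks first.
MozillaCookieJar(path).save()

try:
    LWPCookieJar().load(path)
    outcome = 'loaded'
except LoadError as e:
    outcome = 'LoadError'
    print(e)

# The matching class reads the same file without complaint.
MozillaCookieJar().load(path)
print(outcome)  # LoadError
```

If load() fails this way, check whether the file starts with "# Netscape HTTP Cookie File" or with "#LWP-Cookies-2.0" and pick the class accordingly.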

Reference book: 《python3網路爬蟲開發實戰》 (Python 3 Web Crawler Development in Practice)
