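Python: A Detailed Guide to the Third-Party Library requests

阿新 • Published: 2018-12-09

Requests is an HTTP library written in Python, built on urllib3 and released under the Apache 2.0 open-source license. It is far more convenient than urllib, saves a great deal of work, and fully covers HTTP testing needs. Its design philosophy centers on the idioms of PEP 20, which makes it more Pythonic than urllib. Just as importantly, it supports Python 3.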

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Readability counts.

I. Installing Requests

Install with pip:

pip install requests

Or install from a source checkout:

$ git clone https://github.com/psf/requests.git
$ cd requests
$ python -m pip install .

If you are lazier still, install it through an IDE such as PyCharm.

II. Sending Requests and Passing Parameters

Start with a simple example that shows what the library can do:

import requests

r = requests.get(url='http://www.itwhy.org')  # the most basic GET request

print(r.status_code)  # the response status code

r = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})  # GET request with query parameters
print(r.url)
print(r.text)  # print the decoded response body

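Simple, isn't it? GET is not the only easy one, because the other methods share the same interface style.

requests.get('https://github.com/timeline.json')  # GET request
requests.post('http://httpbin.org/post')          # POST request
requests.put('http://httpbin.org/put')            # PUT request
requests.delete('http://httpbin.org/delete')      # DELETE request
requests.head('http://httpbin.org/get')           # HEAD request
requests.options('http://httpbin.org/get')        # OPTIONS request

PS: Of the HTTP methods above, web systems generally support only GET and POST, and some also support HEAD. Examples of requests that carry parameters: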

import requests
requests.get('http://www.dict.baidu.com/s', params={'wd': 'python'})    # GET with query parameters
requests.post('http://www.itwhy.org/wp-comments-post.php', data={'comment': 'test POST'})    # POST with form data

Sending JSON data with POST:

import requests
import json
 
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))
print(r.json())
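
Recent versions of Requests also accept a json= argument that serializes the payload and sets the Content-Type header in one step. The sketch below only prepares the request locally (nothing is sent), and the endpoint is the same placeholder URL used above:

```python
import json

import requests

# Build the request without sending it; the URL is a placeholder endpoint.
req = requests.Request(
    "POST",
    "https://api.github.com/some/endpoint",
    json={"some": "data"},
)
prepared = req.prepare()

# Requests serialized the dict and chose the Content-Type header itself.
content_type = prepared.headers["Content-Type"]
payload = json.loads(prepared.body)

print(content_type)
print(payload)
```

Passing the same keyword to requests.post(...) would send the request for real.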

Custom headers:

import requests
import json
 
data = {'some': 'data'}
headers = {'content-type': 'application/json',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
 
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps(data), headers=headers)
print(r.text)

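III. The Response Object

Calling any of the request methods returns a Response object that holds what the server sent back, such as the r.text and r.status_code seen in the examples above. To read the body as text, access r.text: Requests decodes the body using the encoding it detected for the response, and you can assign a different value to r.encoding to make r.text decode with an encoding of your choice.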

r = requests.get('http://www.itwhy.org')
print(r.text, '\n{}\n'.format('*'*79), r.encoding)
r.encoding = 'GBK'
print(r.text, '\n{}\n'.format('*'*79), r.encoding)

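Other response attributes:

r.status_code  # response status code
r.raw          # the raw response, i.e. the underlying urllib3 response object; read it with r.raw.read()
r.content      # response body as bytes; gzip and deflate compression is decoded automatically
r.text         # response body as a string, decoded according to the charset declared in the response headers
r.headers      # server response headers as a special dictionary with case-insensitive keys; r.headers.get(key) returns None for a missing key
# special methods
r.json()              # the JSON decoder built into Requests
r.raise_for_status()  # raises an exception for a failed request (a 4xx or 5xx response)

An example: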

import requests
 
URL = 'http://ip.taobao.com/service/getIpInfo.php'  # Taobao IP address database API
try:
    r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
    r.raise_for_status()    # raise an exception if the status code is 4xx or 5xx
except requests.RequestException as e:
    print(e)
else:
    result = r.json()
    print(type(result), result, sep='\n')
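
To see which status codes make raise_for_status() raise, the following sketch builds empty Response objects by hand and sets only status_code. Constructing a Response directly is done purely for illustration; in real code the object comes back from a request. The helper name check is hypothetical:

```python
import requests


def check(status_code):
    """Return the message raise_for_status() raises with, or None if it does not raise."""
    resp = requests.Response()       # hand-built, empty response (illustration only)
    resp.status_code = status_code
    try:
        resp.raise_for_status()
    except requests.HTTPError as exc:
        return str(exc)
    return None


results = {code: check(code) for code in (200, 204, 302, 404, 500)}
for code, message in results.items():
    print(code, message)
```

Only the 4xx and 5xx codes produce an error message; 2xx and 3xx responses pass through silently.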

IV. Uploading Files

Uploading a file with Requests is just as simple, and the file's content type is handled automatically:

import requests
 
url = 'http://127.0.0.1:5000/upload'
files = {'file': open('/home/lyb/sjzl.mpg', 'rb')}
#files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}     # set the file name explicitly
 
r = requests.post(url, files=files)
print(r.text)

Even more conveniently, you can upload a string as if it were a file:

import requests
 
url = 'http://127.0.0.1:5000/upload'
files = {'file': ('test.txt', b'Hello Requests.')}     # the file name must be set explicitly
 
r = requests.post(url, files=files)
print(r.text)
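
What Requests builds from the files argument can be inspected without a running server by preparing the request instead of sending it. A minimal sketch, reusing the URL and tuple from the example above:

```python
import requests

# Prepare (do not send) a multipart upload of an in-memory "file".
req = requests.Request(
    "POST",
    "http://127.0.0.1:5000/upload",
    files={"file": ("test.txt", b"Hello Requests.")},
)
prepared = req.prepare()

content_type = prepared.headers["Content-Type"]  # multipart/form-data plus a generated boundary
body = prepared.body                             # bytes: the encoded multipart payload

print(content_type)
print(body.decode("utf-8"))
```

The printed body shows the form field name, the file name, and the content wrapped between boundary markers.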

V. Authentication

Basic authentication (HTTP Basic Auth):

import requests
from requests.auth import HTTPBasicAuth
 
r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
# r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user', 'passwd'))    # shorthand
print(r.json())

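Another very popular form of HTTP authentication is digest authentication, which Requests also supports out of the box:

from requests.auth import HTTPDigestAuth
requests.get(URL, auth=HTTPDigestAuth('user', 'pass'))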
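
Basic authentication boils down to an Authorization header that carries the Base64 form of user:password. The sketch below prepares a request offline and decodes that header to make this visible; it uses the httpbin URL from the Basic Auth example and sends nothing:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "https://httpbin.org/hidden-basic-auth/user/passwd"
prepared = requests.Request("GET", url, auth=HTTPBasicAuth("user", "passwd")).prepare()

# The auth handler adds a single header to the prepared request.
auth_header = prepared.headers["Authorization"]
scheme, token = auth_header.split(" ", 1)
decoded = base64.b64decode(token).decode("latin-1")

print(auth_header)
print(decoded)
```

Because Base64 is an encoding and not encryption, Basic Auth is only safe over HTTPS.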

VI. Cookies and Session Objects

If a response contains cookies, you can access them quickly:

import requests
 
r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))

To send your own cookies to the server, use the cookies parameter:

import requests
 
url = 'http://httpbin.org/cookies'
cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
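# Cookie Version 0 forbids special characters such as spaces, square brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, @, colons, and semicolons in cookie values.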
r = requests.get(url, cookies=cookies)
print(r.json())

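A session object lets you persist certain parameters across requests. Most conveniently, cookies are kept across all requests made from the same Session instance, and this is handled automatically. Here is a real-world example, a sign-in script for Kuaipan: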

import requests
 
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Encoding': 'gzip, deflate, compress',
           'Accept-Language': 'en-us;q=0.5,en;q=0.3',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
 
s = requests.Session()
s.headers.update(headers)
# s.auth = ('superuser', '123')
s.get('https://www.kuaipan.cn/account_login.htm')
 
_URL = 'http://www.kuaipan.cn/index.php'
s.post(_URL, params={'ac':'account', 'op':'login'},
       data={'username':'****@foxmail.com', 'userpwd':'********', 'isajax':'yes'})
r = s.get(_URL, params={'ac':'zone', 'op':'taskdetail'})
print(r.json())
s.get(_URL, params={'ac':'common', 'op':'usersign'})
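
How a Session carries state into each request can be observed offline with Session.prepare_request, which merges the session's headers and cookies into a request without sending it. A small sketch; the header value and the cookie are made-up illustration values:

```python
import requests

s = requests.Session()
s.headers.update({"User-Agent": "example-agent/0.1"})  # hypothetical value
s.cookies.set("sessionid", "abc123")                   # hypothetical cookie, as if set by a login

# Merge session state into a new request; nothing goes over the network.
prepared = s.prepare_request(requests.Request("GET", "http://httpbin.org/cookies"))

user_agent = prepared.headers["User-Agent"]
cookie_header = prepared.headers["Cookie"]

print(user_agent)
print(cookie_header)
s.close()
```

In the sign-in script above the same merging happens on every s.get and s.post call, which is why the login cookie reaches the later requests.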

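VII. Timeouts and Exceptions

timeout limits how long Requests waits to connect and how long it waits for the server to send data; it is not a limit on the total time taken to download the response body.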

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

All exceptions that Requests raises explicitly inherit from requests.exceptions.RequestException, including ConnectionError, HTTPError, Timeout, and TooManyRedirects.
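
The inheritance relationship can be checked directly against the installed library. The sketch below also looks at ConnectTimeout and ReadTimeout, two more specific timeout classes found in current releases of Requests:

```python
from requests import exceptions as exc

names = [
    "ConnectionError", "HTTPError", "Timeout", "TooManyRedirects",
    "ConnectTimeout", "ReadTimeout",
]

# Every listed class should derive from RequestException.
is_request_exception = {
    name: issubclass(getattr(exc, name), exc.RequestException) for name in names
}

# Both specific timeout classes are caught by "except Timeout".
timeouts_share_base = (
    issubclass(exc.ConnectTimeout, exc.Timeout)
    and issubclass(exc.ReadTimeout, exc.Timeout)
)

for name, ok in is_request_exception.items():
    print(name, ok)
print("timeouts share base:", timeouts_share_base)
```

Catching RequestException therefore works as a catch-all, while the subclasses allow finer handling.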

Reposted from

http://www.itwhy.org/%E8%BD%AF%E4%BB%B6%E5%B7%A5%E7%A8%8B/python/python-%E7%AC%AC%E4%B8%89%E6%96%B9-http-%E5%BA%93-requests-%E5%AD%A6%E4%B9%A0.html

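requests is an HTTP client library for Python, similar to urllib and urllib2. So why use requests instead of urllib2? The official documentation puts it this way:

Python's standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken, and even a simple task takes a large amount of code.

I looked through the requests documentation and it really is simple, which suits lazy people like me. What follows is a brief guide.

Some good news: I just noticed that requests now has a Chinese translation of its documentation. If your English is not strong, it is worth reading, and it is much better than this blog post. The link is http://cn.python-requests.org/en/latest/ (it covers v1.1.0; apologies for posting the wrong link earlier).

1. Installation

Installation is simple. I am on Windows, so I downloaded the package from here (the "download the zipball" link on that page) and ran $ python setup.py install. If you have easy_install or pip, you can run easy_install requests or pip install requests instead. For Linux users, that page lists other installation methods. To test, type import requests in IDLE; if no error appears, the installation succeeded.

2. A Quick Taste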

>>> import requests
>>> r = requests.get('http://www.zhidaow.com')  # send the request
>>> r.status_code  # status code
200
>>> r.headers['content-type']  # response header
'text/html; charset=utf8'
>>> r.encoding  # encoding
'utf-8'
>>> r.text  # body (PS: because of encoding problems, r.content is recommended here)
'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'
...

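Simple, isn't it? Far simpler and more intuitive than urllib2 and urllib. Read on for the quick start guide.

3. Quick Start Guide

3.1 Sending a Request

Sending a request is simple. First import the requests module:

>>> import requests

Next, fetch a web page, for example the home page of my personal blog:

>>> r = requests.get('http://www.zhidaow.com')

We can now use the methods and attributes of r. HTTP has other request types as well, such as POST, PUT, DELETE, HEAD, and OPTIONS, and they all work the same way: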

>>> r = requests.post("http://httpbin.org/post")
>>> r = requests.put("http://httpbin.org/put")
>>> r = requests.delete("http://httpbin.org/delete")
>>> r = requests.head("http://httpbin.org/get")
>>> r = requests.options("http://httpbin.org/get")

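I have not needed these yet, so I have not looked into them in depth.

3.2 Passing Parameters in URLs

Sometimes we need to pass parameters in the URL. When scraping Baidu search results, for example, there are the wd parameter (the search term) and the rn parameter (the number of results). You can assemble the URL by hand, but requests also offers a rather slick way to do it: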

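>>> payload = {'wd': '张亚楠', 'rn': '100'}
>>> r = requests.get("http://www.baidu.com/s", params=payload)
>>> print(r.url)
http://www.baidu.com/s?wd=%E5%BC%A0%E4%BA%9A%E6%A5%A0&rn=100

The percent-escaped text after wd= is the URL-encoded form of "张亚楠". The parameters are not sorted: they follow the dictionary's iteration order, which is insertion order on Python 3.7+ and was arbitrary on older versions.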
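
The URL that params produces can be checked offline by preparing the request instead of sending it. A small sketch using the search term in its simplified-character spelling, which is the spelling the percent-encoded bytes above correspond to:

```python
from urllib.parse import parse_qs, urlsplit

import requests

payload = {"wd": "张亚楠", "rn": "100"}

# Prepare only: Requests percent-encodes the values as UTF-8 and appends them to the URL.
prepared = requests.Request("GET", "http://www.baidu.com/s", params=payload).prepare()
print(prepared.url)

# Decoding the query string recovers the original values.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

parse_qs returns each value as a list because a query string may repeat a key.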

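3.3 Getting the Response Content

Use r.text to get the content of the page.

>>> r = requests.get('https://www.zhidaow.com')
>>> r.text
'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'

According to the documentation, requests decodes the content automatically, and most Unicode character sets are handled seamlessly. Under Cygwin, however, I kept getting UnicodeEncodeError, which was frustrating, while everything worked fine in Python's IDLE. You can also get the page content through r.content.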

>>> r = requests.get('https://www.zhidaow.com')
>>> r.content
b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'

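The documentation says r.content holds the body as bytes, which is why it shows a b prefix in IDLE. I did not see the prefix under Cygwin, and it works well for downloading pages, so it replaces urllib2's urllib2.urlopen(url).read(). (This is basically the feature I use most.)

3.4 Getting the Page Encoding

Use r.encoding to get the page's encoding.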

>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'

When you send a request, requests guesses the page's encoding from the HTTP headers, and r.text uses that encoding. You can also change the encoding requests uses.

>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

As in the example above, once encoding is changed, subsequent reads of r.text decode the page with the new encoding.
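
Conceptually, r.text is the bytes in r.content decoded with r.encoding, with undecodable bytes replaced. The standard-library sketch below imitates that step on a sample string (the sample text is an assumption, not data from a real response) to show why a wrong encoding yields garbled text:

```python
# Bytes as a server might send them for a UTF-8 page.
raw = "café".encode("utf-8")

# Decoding with the correct encoding recovers the text.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding the same bytes as ISO-8859-1 turns each byte into one character.
as_latin1 = raw.decode("iso-8859-1", errors="replace")

print(as_utf8)
print(as_latin1)
```

This is the effect of setting r.encoding to a value that does not match the page: r.content is unchanged, only its interpretation differs.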

3.5 JSON

With urllib and urllib2, handling JSON means importing another module such as json or simplejson. requests has a built-in method, r.json(). Take an IP lookup API as an example:

>>> r = requests.get('http://ip.taobao.com/service/getIpInfo.php?ip=122.88.60.28')
>>> r.json()['data']['country']
'中國'

3.6 Page Status Codes

Use r.status_code to check a page's status code.

>>> r = requests.get('http://www.mengtiankong.com')
>>> r.status_code
200
>>> r = requests.get('http://www.mengtiankong.com/123123/')
>>> r.status_code
404
>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')
>>> r.url
'http://www.zhidaow.com/'
>>> r.status_code
200

The first two examples behave as expected: a page that opens returns 200 and one that does not returns 404. The third is a little odd. It is a 302 redirect URL from Baidu's search results, yet the status code shows 200. One more call exposes what really happened:

>>> r.history
(<Response [302]>,)

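This shows that a 302 redirect was used. You might think of extracting the redirect status code with some checks and regular expressions, but there is a simpler way:

>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)
>>> r.status_code
302

Just pass allow_redirects=False to disable redirect following, and the redirect's own status code shows up directly. Handy, right? I use this in the final section to build a small tool for fetching page status codes; it works on the same principle.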
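
The same behavior can be reproduced without depending on an external site. The sketch below starts a throwaway local server (the handler class and the /old and /new paths are invented for this illustration) and requests the redirecting path with and without redirect following:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy server: /old answers 302 pointing at /new; anything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:{}".format(server.server_port)

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this loopback request

followed = session.get(base + "/old", timeout=5)
not_followed = session.get(base + "/old", allow_redirects=False, timeout=5)

print(followed.status_code, [h.status_code for h in followed.history], followed.url)
print(not_followed.status_code, not_followed.headers["Location"])

session.close()
server.shutdown()
server.server_close()
```

The followed response reports the final page's status and keeps the intermediate 302 in history, while the second response is the 302 itself.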

3.7 Response Headers

Use r.headers to get the response headers.

>>> r = requests.get('http://www.zhidaow.com')
>>> r.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'content-type': 'text/html; charset=utf-8',
    ...
}

The headers come back as a dictionary, and we can also access individual entries.

>>> r.headers['Content-Type']
'text/html; charset=utf-8'

>>> r.headers.get('content-type')
'text/html; charset=utf-8'
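
Both spellings of the key work because r.headers is a case-insensitive dictionary. The class behind it, requests.structures.CaseInsensitiveDict, can be tried on its own; the header value here is copied from the output above and the missing key name is made up:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=utf-8"})

lower = headers["content-type"]         # lookup ignores case
upper = headers["CONTENT-TYPE"]
missing = headers.get("x-not-there")    # .get() returns None for an absent key

try:
    headers["x-not-there"]              # plain indexing still raises KeyError
    raised = False
except KeyError:
    raised = True

print(lower, upper, missing, raised)
```

Using .get() is the safer choice when a header may be absent from the response.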

3.8 Setting a Timeout

Use the timeout parameter to set a timeout. If no response arrives within that time, an error is raised.

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

3.9 Using a Proxy

When scraping, proxies are often used to avoid having your IP blocked. requests has a proxies parameter for this.

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://www.zhidaow.com", proxies=proxies)

If the proxy requires a username and password, write it like this:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}

3.10 Request Headers

Use r.request.headers to get the request headers.

>>> r.request.headers
{'Accept-Encoding': 'identity, deflate, compress, gzip',
'Accept': '*/*', 'User-Agent': 'python-requests/1.2.3 CPython/2.7.3 Windows/XP'}

3.11 Custom Request Headers

Spoofing request headers is common when scraping. Here is how to disguise the User-Agent:

r = requests.get('http://www.zhidaow.com')
print(r.request.headers['User-Agent'])
# python-requests/1.2.3 CPython/2.7.3 Windows/XP

headers = {'User-Agent': 'alexkh'}
r = requests.get('http://www.zhidaow.com', headers=headers)
print(r.request.headers['User-Agent'])
# alexkh
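
The override can also be confirmed without touching the network. The sketch below asks the library for its default User-Agent and then prepares a request through a Session, which is roughly what requests.get does internally before sending:

```python
import requests

# The default value depends on the installed version, e.g. "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

with requests.Session() as s:
    # Merge the session's default headers with our override; nothing is sent.
    prepared = s.prepare_request(
        requests.Request("GET", "http://www.zhidaow.com", headers={"User-Agent": "alexkh"})
    )

sent_ua = prepared.headers["User-Agent"]
accept = prepared.headers["Accept"]

print(default_ua)
print(sent_ua)
print(accept)
```

Only the header that was passed explicitly is replaced; the other defaults, such as Accept, stay in place.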

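3.12 Persistent Connections (keep-alive)

keep-alive in requests is built on urllib3, and persistent connections within a session are fully automatic. All requests made within the same session automatically reuse the appropriate connection.

In other words, no configuration is needed within a session; requests handles keep-alive for you.

4. A Simple Application

4.1 Getting a Page's Status Code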

import requests

def get_status(url):
    # Do not follow redirects, so a 301/302 is reported as-is
    r = requests.get(url, allow_redirects=False)
    return r.status_code

print(get_status('http://www.zhidaow.com'))
# 200
print(get_status('http://www.zhidaow.com/hi404/'))
# 404
print(get_status('http://mengtiankong.com'))
# 301
print(get_status('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN'))
# 302
print(get_status('http://www.huiya56.com/com8.intre.asp?46981.html'))
# 500

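Afterword

1. Official documentation
   Installation: http://docs.python-requests.org/en/latest/user/install.html#install
   Quick start guide: http://docs.python-requests.org/en/latest/user/quickstart.html
   Advanced guide: http://docs.python-requests.org/en/latest/user/advanced.html#advanced
2. Part of this article is translated from the official documentation and part is my own summary.
3. Most of the examples use the IDLE session format, which was exhausting; next time I will use plain editor format, which suits me better.
4. As always, leave a comment or send an email if you have questions.
5. Image caption: a turtle from the official requests documentation.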
