requests (2): Advanced Usage

1. File Upload

Code:

import requests

#upload the icon saved earlier; the with block closes the file afterwards
with open('favicon.ico','rb') as f:
    r=requests.post("http://httpbin.org/post",files={'file':f})
print(r.text)

Result: output

{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAE......="
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=33cdfa79c0730d8f84ea9bdb3501852d",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.0",
    "X-Amzn-Trace-Id": "Root=1-5fd8b40b-2727d1dd67bcac6071e02b44"
  },
  "json": null,
  "origin": "183.92.250.185",
  "url": "http://httpbin.org/post"
}

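The files dictionary also accepts a (filename, file object, content type) tuple, which controls the file name and type the server sees. The sketch below only builds the request and does not send it, so the multipart body can be inspected offline. The in-memory bytes and the image/x-icon type are placeholders chosen for illustration, not values from the run above.

```python
import io

import requests

# Placeholder bytes standing in for the real icon file (assumption for this demo).
fake_icon = io.BytesIO(b"\x00\x00\x01\x00demo")

# (filename, file object, content type) tuple form of the files argument.
files = {'file': ('favicon.ico', fake_icon, 'image/x-icon')}

# prepare() encodes the request but sends nothing over the network.
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])
print(b'filename="favicon.ico"' in prepared.body)
```

The same files dictionary can be passed to requests.post() to actually perform the upload.
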
2. Cookies

Getting cookies

Code:

import requests

r=requests.get("http://www.baidu.com")
print(r.cookies)
for key,value in r.cookies.items():
    print(key+'='+value)

Result: output

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315

(This differs from the output shown in the book.)

You can also send a Cookie header directly to stay logged in.

Code:

import requests

headers={
  'Cookie':'_xsrf=1IbxapcP037H2q4hiTOHsEGg5Ep1mgUH; d_c0="AFCgLlaSaBCPTmv-uet83q--TEfaCHzj2jU=|1574675590"; _zap=f9eba4eb-c88d-4ea6-b641-76d6dd3aeb81; r_cap_id="NDU2ZDA5N2U4NGUwNDBhNjgwZmRkNWM2NjBmOGIwNjQ=|1607844299|849a3ebc0b6b228fb8ccbe7bc956d4ab87fbb7de"; cap_id="NzY4NGQzYWQwMzVjNDZiNmEwZjc5N2M0YjFkMjE1ZWU=|1607844299|79454ea6ae9b184b66107ef5499f73a76d378dcf"; l_cap_id="YjMyMDNhYjQxYmJmNDQxMGFmNGQ5ZGI5MjlhMWQ1NWU=|1607844299|6ff49d12514843f595e6452ec70cd8c79904b72f"; auth_type=d2VjaGF0|1607844323|de686e370b55abea1d50ed497d5e75c1f11ceaf2; token="NDBfLW1xdUp5cV9uWDF5RHFBRHZGb01TRmlGSnc0RlBzaU96ekFOZXQ4Sl9zX3VBQ1laMW1NckNCcDZyWW9JaF9FOU5Za1poNENHYXQ4QUZfM3BQZEtxQ0xneFlWbTQ4NGo1akpNOG1PLXdnVWc=|1607844323|e244c2a10cc41ef5398b89afc7f132ef2f33789e"; client_id="bzNwMi1qa1RCSDh1TWQ1cGtlRWhnWF9TSVI1MA==|1607844323|ba8ea5aacdf5e242254be17b0a39013db27f02b7"; capsion_ticket="2|1:0|10:1607845046|14:capsion_ticket|44:MTM1YzU3MWI1MTcxNDQ2YTgyOTA0ZmRkNDMwNTlmMTU=|18e09e534dcd722a06d6c1cabb4256bca9429743fa8dff7847a12de33d44210d"; z_c0="2|1:0|10:1607845086|4:z_c0|92:Mi4xWV9qN0Z3QUFBQUFBVUtBdVZwSm9FQ2NBQUFDRUFsVk4zbEg5WHdDWEdaM01FUDZNV2xfa0U0V3hnejBWNXY0Vm13|5d849402605e16c9cfde29501841cd672a237595b0eada30f6ace2326d9a3885"; tst=r; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1606400831,1607224141,1607844250,1608038711; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1608038711; SESSIONID=HLfiVBjVfxVpMa6fEvx5rioAGM3bCCHuGAEVoPObzXq; JOID=VFoSBkJRNcGGFuW9alJO0RA7gLN5HXyl40-j0j44SZL0V4uPCWeLF9od5LpkY1japKhIBVOvrrfr_V-s5S8xMC4=; osd=VFgWAEpRN8WAHuW_blRG0RI_hrt5H3ij60-h1jgwSZDwUYOPC2ONH9of4LxsY1reoqBIB1epprfp-Vmk5S01NiY=; KLBRSID=0a401b23e8a71b70de2f4b37f5b4e379|1608038712|1608038704',
  'Host':'www.zhihu.com',
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
r=requests.get('https://www.zhihu.com',headers=headers)
print(r.text)

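Result: the output is the page as it appears after logging in.

Alternatively, pass the cookies through the cookies parameter, although this is more tedious.

Code: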

import requests

cookies='_xsrf=1IbxapcP037H2q4hiTOHsEGg5Ep1mgUH; d_c0="AFCgLlaSaBCPTmv-uet83q--TEfaCHzj2jU=|1574675590"; _zap=f9eba4eb-c88d-4ea6-b641-76d6dd3aeb81; r_cap_id="NDU2ZDA5N2U4NGUwNDBhNjgwZmRkNWM2NjBmOGIwNjQ=|1607844299|849a3ebc0b6b228fb8ccbe7bc956d4ab87fbb7de"; cap_id="NzY4NGQzYWQwMzVjNDZiNmEwZjc5N2M0YjFkMjE1ZWU=|1607844299|79454ea6ae9b184b66107ef5499f73a76d378dcf"; l_cap_id="YjMyMDNhYjQxYmJmNDQxMGFmNGQ5ZGI5MjlhMWQ1NWU=|1607844299|6ff49d12514843f595e6452ec70cd8c79904b72f"; auth_type=d2VjaGF0|1607844323|de686e370b55abea1d50ed497d5e75c1f11ceaf2; token="NDBfLW1xdUp5cV9uWDF5RHFBRHZGb01TRmlGSnc0RlBzaU96ekFOZXQ4Sl9zX3VBQ1laMW1NckNCcDZyWW9JaF9FOU5Za1poNENHYXQ4QUZfM3BQZEtxQ0xneFlWbTQ4NGo1akpNOG1PLXdnVWc=|1607844323|e244c2a10cc41ef5398b89afc7f132ef2f33789e"; client_id="bzNwMi1qa1RCSDh1TWQ1cGtlRWhnWF9TSVI1MA==|1607844323|ba8ea5aacdf5e242254be17b0a39013db27f02b7"; capsion_ticket="2|1:0|10:1607845046|14:capsion_ticket|44:MTM1YzU3MWI1MTcxNDQ2YTgyOTA0ZmRkNDMwNTlmMTU=|18e09e534dcd722a06d6c1cabb4256bca9429743fa8dff7847a12de33d44210d"; z_c0="2|1:0|10:1607845086|4:z_c0|92:Mi4xWV9qN0Z3QUFBQUFBVUtBdVZwSm9FQ2NBQUFDRUFsVk4zbEg5WHdDWEdaM01FUDZNV2xfa0U0V3hnejBWNXY0Vm13|5d849402605e16c9cfde29501841cd672a237595b0eada30f6ace2326d9a3885"; tst=r; SESSIONID=Lw1oXe6qnCJ82o9ksVXjfwe0DNCniiBKX34UfmTXuLx; JOID=WlgTCk4yx7ADY72OYDOyr5tD2o58draOYDbQvh5RjPw0Xd-_DThCal1ms4djtwCvTf2KMHI_GjSp_ueD_caBq74=; osd=UV8RCk05wLIDYLaJYjOxpJxB2o13cbSOYz3XvB5Sh_s2Xdy0CjpCaVZhsYdgvAetTf6BN3A_GT-u_OeA9sGDq70=; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1608038737,1608092677,1608092706,1608092743; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1608092743; KLBRSID=76ae5fb4fba0f519d97e594f1cef9fab|1608093479|1608092673'
jar=requests.cookies.RequestsCookieJar()#create a new RequestsCookieJar object
headers={
  'Host':'www.zhihu.com',
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
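for cookie in cookies.split(';'):#split the string into individual cookies
    key,value=cookie.strip().split('=',1)#strip the space after ';' and split on the first '='
    jar.set(key,value)#use set() to store each cookie's key and value
r=requests.get('https://www.zhihu.com',cookies=jar,headers=headers)#pass the jar via the cookies parameter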
print(r.text)

Result: same as above.
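The cookies parameter of requests also accepts a plain dict, so the parsing step can be isolated in a small helper. This is a minimal sketch: parse_cookie_header is a hypothetical name, and the cookie string is made up rather than taken from a real login.

```python
def parse_cookie_header(raw):
    """Split a raw Cookie header string into a {name: value} dict."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()  # drop the space that follows each ';'
        if not pair:
            continue
        name, _, value = pair.partition('=')  # split on the first '=' only
        cookies[name] = value
    return cookies


# Short made-up cookie string, not a real login cookie.
raw = 'tst=r; token="abc=|123"; theme=dark'
print(parse_cookie_header(raw))
```

The resulting dict can then be passed as cookies=parse_cookie_header(raw) in place of the RequestsCookieJar.
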

3. Session Persistence

Use a Session object to keep the same session across requests.

Code:

import requests

requests.get('http://httpbin.org/cookies/set/number/123456')#set a cookie named number with the value 123456
r=requests.get('http://httpbin.org/cookies')
print(r.text)

Result: output

{
  "cookies": {}
}

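As the output shows, the cookie set by the first request is not available to the second one, because each requests.get() call runs in its own independent session.

Now use a Session object instead.

Code: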

import requests

s=requests.Session()#create a Session object
s.get('http://httpbin.org/cookies/set/number/123456')
r=s.get('http://httpbin.org/cookies')
print(r.text)

Result: output

{
  "cookies": {
    "number": "123456"
  }
}

This time the cookie is retrieved successfully.
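To see why this works, it can help to look at what a Session attaches to each request. The sketch below sets a cookie and a header on the session by hand and prepares a request without sending it. The User-Agent string is a made-up value for illustration.

```python
import requests

with requests.Session() as s:
    # State stored on the session is merged into every request it prepares.
    s.headers.update({'User-Agent': 'demo-client/0.1'})  # hypothetical UA string
    s.cookies.set('number', '123456')

    # prepare_request() applies the session state without sending anything.
    prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))

print(prepared.headers['Cookie'])
print(prepared.headers['User-Agent'])
```

Cookies returned by a server are stored in s.cookies in the same way, which is why the second s.get() call above sends them back.
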

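4. SSL Certificate Verification

If a website's certificate is not trusted by an official CA, the request fails with a certificate error. Taking 12306 as an example (12306 no longer actually has this problem), you can set the verify parameter to False to skip verification.

Code: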

import requests

response=requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

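The request now succeeds, but it emits a warning recommending that we specify a certificate. We can either suppress the warning or capture it into the log.

Code: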

import requests
import logging
import urllib3

urllib3.disable_warnings()#suppress the warning
#alternatively, use logging.captureWarnings(True)
#to capture the warning into the log

response=requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
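The logging.captureWarnings(True) alternative redirects anything issued through the warnings module to the py.warnings logger. The stdlib-only sketch below shows that routing with a demo warning instead of the real one raised by verify=False. ListHandler is a hypothetical helper written for this example.

```python
import logging
import warnings


class ListHandler(logging.Handler):
    """Hypothetical helper: keeps log messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# Route warnings issued via the warnings module to the 'py.warnings' logger.
logging.captureWarnings(True)
handler = ListHandler()
logging.getLogger('py.warnings').addHandler(handler)

# Make sure the demo warning is not filtered out, then trigger it.
warnings.simplefilter('always')
warnings.warn('certificate verification is disabled (demo warning)')

print(len(handler.messages))
```

In a real script you would attach a file or stream handler instead, so the warning ends up in the application log rather than on stderr.
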

We can also specify a local certificate to use as the client certificate.

Code:

import requests

response=requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
#a tuple of two file paths (certificate, private key); the key must be unencrypted
print(response.status_code)
 

5. Proxy Settings

Use the proxies parameter to set a proxy.

Code:

import requests

proxies={
  'http':'http://10.10.1.10:3128',
  'https':'http://10.10.1.10:1080'
}

#if the proxy requires HTTP Basic Auth, use the http://user:password@host:port syntax
#proxies={
#  'http':'http://user:password@10.10.1.10:3128/',
#}

#SOCKS proxies are also supported (requires installing requests[socks])
#proxies={
#  'http':'socks5://user:password@host:port',
#  'https':'socks5://user:password@host:port'
#}

requests.get("https://www.taobao.com",proxies=proxies)
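When the proxy credentials contain characters such as '@' or ':', they need to be percent-encoded before being placed in the proxy URL. The helper below is a minimal sketch: build_proxy_url is a hypothetical name, and the password is a made-up value.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme='http'):
    """Return scheme://user:password@host:port with the credentials percent-encoded."""
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Made-up password containing characters that are not allowed as-is in a URL.
proxy = build_proxy_url('user', 'p@ss:word', '10.10.1.10', 3128)
proxies = {'http': proxy, 'https': proxy}
print(proxies)
```

The resulting dictionary is passed to requests.get() through the proxies parameter exactly as above.
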

6. Timeout Settings

Use the timeout parameter.

Code:

import requests

r=requests.get("http://www.taobao.com",timeout=1)
print(r.status_code)

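Here the timeout is set to 1 second; if no response arrives within 1 second, an exception is raised.

In fact, a request has two phases, connecting and reading. A single timeout value is applied to each phase separately, not to their combined total.

You can pass a tuple to set the connect and read timeouts individually.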

r=requests.get("http://www.taobao.com",timeout=(5,30))

To wait indefinitely, leave timeout unset or set it to None, which is the default.
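The two phases raise different exceptions, so they can be handled separately. The following is a minimal sketch of that pattern. fetch_status is a hypothetical helper name, and the returned labels are arbitrary.

```python
import requests


def fetch_status(url, timeout=(5, 30)):
    """Return the status code, or a short label if the request times out."""
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.ConnectTimeout:
        return 'connect timeout'  # the connection was not established in time
    except requests.exceptions.ReadTimeout:
        return 'read timeout'  # the server did not send data in time
```

Both exceptions inherit from requests.exceptions.Timeout, so a single except clause on that class is enough when the phase does not matter.
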

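7. Authentication

Some websites require authentication. You can pass a tuple directly to the auth parameter; requests treats it as HTTP Basic Auth credentials.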

import requests

r=requests.get('http://localhost:5000',auth=('username','password'))
print(r.status_code)
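The tuple is shorthand for the HTTPBasicAuth class from requests.auth. The sketch below prepares the same request both ways without sending it and compares the resulting Authorization header. The URL and credentials are the placeholder values used above.

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'http://localhost:5000'

# Neither request is sent; prepare() only builds the headers.
from_tuple = requests.Request('GET', url, auth=('username', 'password')).prepare()
from_class = requests.Request('GET', url, auth=HTTPBasicAuth('username', 'password')).prepare()

print(from_tuple.headers['Authorization'])
print(from_tuple.headers['Authorization'] == from_class.headers['Authorization'])
```

The header is only base64-encoded, not encrypted, so Basic Auth should be used over HTTPS.
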

8. Prepared Request

In requests, a request can itself be represented as a data structure, called a Prepared Request.

Code:

from requests import Request,Session

url='http://httpbin.org/post'
data={
  'name':'germey'
}
headers={
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
s=Session()
req=Request('POST',url,data=data,headers=headers)#construct a Request object
prepped=s.prepare_request(req)#use prepare_request() to convert it into a Prepared Request object
r=s.send(prepped)#call send() to send it
print(r.text)

Result: output

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60",
    "X-Amzn-Trace-Id": "Root=1-5fd9da60-2467cc031aa12ebd39daf72d"
  },
  "json": null,
  "origin": "183.92.251.74",
  "url": "http://httpbin.org/post"
}

This achieves the same result as a normal POST request.
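Because a Prepared Request is an ordinary object, it can be inspected or adjusted before it is sent. The sketch below prepares the same form data without a Session and without touching the network. The X-Demo header is a hypothetical name used only for illustration.

```python
from requests import Request

req = Request('POST', 'http://httpbin.org/post', data={'name': 'germey'})
prepped = req.prepare()  # no Session is needed just to inspect the result

print(prepped.method, prepped.url)
print(prepped.body)
print(prepped.headers['Content-Type'])

# The prepared request can still be adjusted before it is sent.
prepped.headers['X-Demo'] = '1'  # hypothetical header, for illustration only
```

Passing prepped to Session().send() then sends exactly what was inspected.
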

Reference book: 《Python 3 網路爬蟲開發實戰》 (Python 3 Web Crawler Development in Practice)