requests (2): Advanced Usage
阿新 • Published: 2020-12-16
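1. File upload
Code:
import requests
# Upload the icon saved earlier; the with-block closes the file afterwards
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    r = requests.post("http://httpbin.org/post", files=files)
print(r.text)
Result (output):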
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAE......="
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=33cdfa79c0730d8f84ea9bdb3501852d",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.0",
    "X-Amzn-Trace-Id": "Root=1-5fd8b40b-2727d1dd67bcac6071e02b44"
  },
  "json": null,
  "origin": "183.92.250.185",
  "url": "http://httpbin.org/post"
}
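To see where the multipart/form-data Content-Type in the output comes from, the request can be built without sending it. The following is a minimal sketch, assuming an in-memory stand-in for the icon: the bytes, the explicit file name, and the image/x-icon type are made up for illustration. It uses the (filename, fileobj, content_type) tuple form of the files parameter.

```python
import io

import requests

# Hypothetical stand-in for the icon file, so that nothing is read from disk.
fake_icon = io.BytesIO(b"\x00\x00\x01\x00fake-icon-bytes")

# The tuple form lets us choose the file name and content type explicitly.
files = {"file": ("favicon.ico", fake_icon, "image/x-icon")}

# prepare() encodes the body but sends nothing over the network.
prepped = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepped.headers["Content-Type"])             # multipart/form-data with a random boundary
print(b'filename="favicon.ico"' in prepped.body)   # the file name travels inside the body
```

Preparing the request this way is only for inspection; requests.post() performs the same encoding before it sends.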
2. Cookies
Getting cookies
Code:
import requests
r = requests.get("http://www.baidu.com")
print(r.cookies)  # a RequestsCookieJar
for key, value in r.cookies.items():
    print(key + '=' + value)
Result (output):
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
(This differs from the output shown in the book.)
A Cookie header copied from a logged-in browser session can also be sent directly to keep the login state.
Code:
import requests
headers = {
    'Cookie': '_xsrf=1IbxapcP037H2q4hiTOHsEGg5Ep1mgUH; d_c0="AFCgLlaSaBCPTmv-uet83q--TEfaCHzj2jU=|1574675590"; _zap=f9eba4eb-c88d-4ea6-b641-76d6dd3aeb81; r_cap_id="NDU2ZDA5N2U4NGUwNDBhNjgwZmRkNWM2NjBmOGIwNjQ=|1607844299|849a3ebc0b6b228fb8ccbe7bc956d4ab87fbb7de"; cap_id="NzY4NGQzYWQwMzVjNDZiNmEwZjc5N2M0YjFkMjE1ZWU=|1607844299|79454ea6ae9b184b66107ef5499f73a76d378dcf"; l_cap_id="YjMyMDNhYjQxYmJmNDQxMGFmNGQ5ZGI5MjlhMWQ1NWU=|1607844299|6ff49d12514843f595e6452ec70cd8c79904b72f"; auth_type=d2VjaGF0|1607844323|de686e370b55abea1d50ed497d5e75c1f11ceaf2; token="NDBfLW1xdUp5cV9uWDF5RHFBRHZGb01TRmlGSnc0RlBzaU96ekFOZXQ4Sl9zX3VBQ1laMW1NckNCcDZyWW9JaF9FOU5Za1poNENHYXQ4QUZfM3BQZEtxQ0xneFlWbTQ4NGo1akpNOG1PLXdnVWc=|1607844323|e244c2a10cc41ef5398b89afc7f132ef2f33789e"; client_id="bzNwMi1qa1RCSDh1TWQ1cGtlRWhnWF9TSVI1MA==|1607844323|ba8ea5aacdf5e242254be17b0a39013db27f02b7"; capsion_ticket="2|1:0|10:1607845046|14:capsion_ticket|44:MTM1YzU3MWI1MTcxNDQ2YTgyOTA0ZmRkNDMwNTlmMTU=|18e09e534dcd722a06d6c1cabb4256bca9429743fa8dff7847a12de33d44210d"; z_c0="2|1:0|10:1607845086|4:z_c0|92:Mi4xWV9qN0Z3QUFBQUFBVUtBdVZwSm9FQ2NBQUFDRUFsVk4zbEg5WHdDWEdaM01FUDZNV2xfa0U0V3hnejBWNXY0Vm13|5d849402605e16c9cfde29501841cd672a237595b0eada30f6ace2326d9a3885"; tst=r; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1606400831,1607224141,1607844250,1608038711; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1608038711; SESSIONID=HLfiVBjVfxVpMa6fEvx5rioAGM3bCCHuGAEVoPObzXq; JOID=VFoSBkJRNcGGFuW9alJO0RA7gLN5HXyl40-j0j44SZL0V4uPCWeLF9od5LpkY1japKhIBVOvrrfr_V-s5S8xMC4=; osd=VFgWAEpRN8WAHuW_blRG0RI_hrt5H3ij60-h1jgwSZDwUYOPC2ONH9of4LxsY1reoqBIB1epprfp-Vmk5S01NiY=; KLBRSID=0a401b23e8a71b70de2f4b37f5b4e379|1608038712|1608038704',
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
r = requests.get('https://www.zhihu.com', headers=headers)
print(r.text)
Result: the output is the page as it appears after logging in.
Alternatively, the cookies can be passed through the cookies parameter, although this is more cumbersome.
Code:
import requests
cookies = '_xsrf=1IbxapcP037H2q4hiTOHsEGg5Ep1mgUH; d_c0="AFCgLlaSaBCPTmv-uet83q--TEfaCHzj2jU=|1574675590"; _zap=f9eba4eb-c88d-4ea6-b641-76d6dd3aeb81; r_cap_id="NDU2ZDA5N2U4NGUwNDBhNjgwZmRkNWM2NjBmOGIwNjQ=|1607844299|849a3ebc0b6b228fb8ccbe7bc956d4ab87fbb7de"; cap_id="NzY4NGQzYWQwMzVjNDZiNmEwZjc5N2M0YjFkMjE1ZWU=|1607844299|79454ea6ae9b184b66107ef5499f73a76d378dcf"; l_cap_id="YjMyMDNhYjQxYmJmNDQxMGFmNGQ5ZGI5MjlhMWQ1NWU=|1607844299|6ff49d12514843f595e6452ec70cd8c79904b72f"; auth_type=d2VjaGF0|1607844323|de686e370b55abea1d50ed497d5e75c1f11ceaf2; token="NDBfLW1xdUp5cV9uWDF5RHFBRHZGb01TRmlGSnc0RlBzaU96ekFOZXQ4Sl9zX3VBQ1laMW1NckNCcDZyWW9JaF9FOU5Za1poNENHYXQ4QUZfM3BQZEtxQ0xneFlWbTQ4NGo1akpNOG1PLXdnVWc=|1607844323|e244c2a10cc41ef5398b89afc7f132ef2f33789e"; client_id="bzNwMi1qa1RCSDh1TWQ1cGtlRWhnWF9TSVI1MA==|1607844323|ba8ea5aacdf5e242254be17b0a39013db27f02b7"; capsion_ticket="2|1:0|10:1607845046|14:capsion_ticket|44:MTM1YzU3MWI1MTcxNDQ2YTgyOTA0ZmRkNDMwNTlmMTU=|18e09e534dcd722a06d6c1cabb4256bca9429743fa8dff7847a12de33d44210d"; z_c0="2|1:0|10:1607845086|4:z_c0|92:Mi4xWV9qN0Z3QUFBQUFBVUtBdVZwSm9FQ2NBQUFDRUFsVk4zbEg5WHdDWEdaM01FUDZNV2xfa0U0V3hnejBWNXY0Vm13|5d849402605e16c9cfde29501841cd672a237595b0eada30f6ace2326d9a3885"; tst=r; SESSIONID=Lw1oXe6qnCJ82o9ksVXjfwe0DNCniiBKX34UfmTXuLx; JOID=WlgTCk4yx7ADY72OYDOyr5tD2o58draOYDbQvh5RjPw0Xd-_DThCal1ms4djtwCvTf2KMHI_GjSp_ueD_caBq74=; osd=UV8RCk05wLIDYLaJYjOxpJxB2o13cbSOYz3XvB5Sh_s2Xdy0CjpCaVZhsYdgvAetTf6BN3A_GT-u_OeA9sGDq70=; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1608038737,1608092677,1608092706,1608092743; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1608092743; KLBRSID=76ae5fb4fba0f519d97e594f1cef9fab|1608093479|1608092673'
jar = requests.cookies.RequestsCookieJar()  # create a new RequestsCookieJar object
headers = {
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
for cookie in cookies.split(';'):  # split the string into individual cookies
    # strip() removes the space after each ';' so it does not end up in the cookie name
    key, value = cookie.strip().split('=', 1)
    jar.set(key, value)  # set each cookie's key and value with set()
r = requests.get('https://www.zhihu.com', cookies=jar, headers=headers)  # pass the jar via the cookies parameter
print(r.text)
Result: same as above.
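One detail of the split-and-set approach is easy to miss: the pieces of a Cookie string are separated by "; ", so splitting on ";" alone leaves a leading space on every name after the first. Below is a minimal sketch of a helper that handles this; jar_from_cookie_header is a hypothetical name, and the cookie string is made up for illustration.

```python
import requests


def jar_from_cookie_header(header):
    """Build a RequestsCookieJar from a 'k1=v1; k2=v2' string (hypothetical helper)."""
    jar = requests.cookies.RequestsCookieJar()
    for pair in header.split(";"):
        pair = pair.strip()  # drop the space that follows each ';'
        if not pair:
            continue  # tolerate a trailing ';'
        name, value = pair.split("=", 1)  # split once: values may contain '='
        jar.set(name, value)
    return jar


# Made-up cookie string, for illustration only.
jar = jar_from_cookie_header("tst=r; token=abc=def; theme=dark;")
print(requests.utils.dict_from_cookiejar(jar))
```

The resulting jar can be passed as cookies=jar exactly as in the code above.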
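3. Session persistence
A Session object keeps several requests within the same session. Without one, consecutive calls do not share cookies.
Code:
import requests
requests.get('http://httpbin.org/cookies/set/number/123456')  # set a cookie named number with the value 123456
r = requests.get('http://httpbin.org/cookies')
print(r.text)
Result (output):
{ "cookies": {} }
As the result shows, the cookie that was just set cannot be retrieved this way, because each requests.get() call runs in its own independent session.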
Now with a Session object.
Code:
import requests
s = requests.Session()  # create a Session object
s.get('http://httpbin.org/cookies/set/number/123456')
r = s.get('http://httpbin.org/cookies')
print(r.text)
Result (output):
{ "cookies": { "number": "123456" } }
This time the cookie is retrieved successfully.
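The difference can also be observed without any network access. The sketch below puts the cookie into the session's jar by hand (an assumption standing in for the /cookies/set response) and compares the headers of a request prepared through the session with one prepared on its own.

```python
import requests

s = requests.Session()
# Simulate what the /cookies/set/number/123456 response would store in the session.
s.cookies.set("number", "123456", domain="httpbin.org", path="/")

req = requests.Request("GET", "http://httpbin.org/cookies")

# Prepared through the session: the stored cookie is merged into the headers.
with_session = s.prepare_request(req)
# Prepared on its own: there is no cookie jar to draw from.
without_session = req.prepare()

print(with_session.headers.get("Cookie"))
print(without_session.headers.get("Cookie"))
```

Only the session-prepared request carries a Cookie header, which is why the second s.get() call can see the value set by the first.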
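4. SSL certificate verification
If a site's certificate is not trusted by an official CA, the request fails with a certificate error. Taking 12306 as an example (in fact, 12306 no longer has this problem), verification can be skipped by setting the verify parameter to False.
Code:
import requests
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Running this prints a warning that recommends specifying a certificate. We can either suppress the warning or capture it into the log.
Code:
import logging
import requests
from requests.packages import urllib3
urllib3.disable_warnings()  # suppress the warning
# Or capture warnings into the log instead:
# logging.captureWarnings(True)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)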
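To make the second option more concrete, here is a minimal sketch of what logging.captureWarnings(True) does, using only the standard library. ListHandler is a made-up helper class, and a plain UserWarning stands in for the warning that a verify=False request would trigger, so that the example runs without network access.

```python
import logging
import warnings


class ListHandler(logging.Handler):
    """Collect log messages in a list (demonstration helper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
# Captured warnings are routed to the logger named "py.warnings".
logging.getLogger("py.warnings").addHandler(handler)
logging.captureWarnings(True)

with warnings.catch_warnings():
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    # Stand-in for the warning raised by an unverified HTTPS request.
    warnings.warn("Unverified HTTPS request (simulated)", UserWarning)

logging.captureWarnings(False)  # restore normal warning output
print(handler.messages)
```

In a real program the handler would typically be a file or stream handler configured through logging.basicConfig() rather than a list.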
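We can also specify a local certificate to use as the client certificate.
Code:
import requests
# cert is a tuple of two file paths (certificate, key); the private key must be unencrypted
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)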
5. Proxy settings
Use the proxies parameter to set proxies.
Code:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}
# If the proxy requires HTTP Basic Auth, use a URL of the form http://user:password@host:port
# proxies = {
#     'http': 'http://user:password@10.10.1.10:3128/',
# }
# SOCKS proxies are also supported (this needs the optional dependency: pip install requests[socks])
# proxies = {
#     'http': 'socks5://user:password@host:port',
#     'https': 'socks5://user:password@host:port'
# }
requests.get("https://www.taobao.com", proxies=proxies)
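The keys of the proxies dictionary are matched against the scheme of the target URL, and requests also accepts a more specific scheme://hostname key for a single host. The sketch below shows which entry would be chosen for a few URLs, using select_proxy from requests.utils; the internal.example host and its 10.10.1.20:8080 proxy are made up for illustration.

```python
import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
    # Hypothetical host-specific entry: it takes precedence for this host only.
    "https://internal.example": "http://10.10.1.20:8080",
}

for url in (
    "http://www.taobao.com",
    "https://www.taobao.com",
    "https://internal.example/api",
):
    # select_proxy picks the proxy entry that applies to the given URL.
    print(url, "->", requests.utils.select_proxy(url, proxies))
```

Nothing is sent here; the loop only reports the proxy that a real request to each URL would go through.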
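6. Timeout settings
Use the timeout parameter.
Code:
import requests
r = requests.get("http://www.taobao.com", timeout=1)
print(r.status_code)
Here the timeout is set to 1 second; if the server does not respond within 1 second, an exception is raised.
In fact, a request has two phases, connect and read. A single timeout value is applied to each phase separately, as the connect timeout and as the read timeout, rather than to the sum of the two.
To give the two phases different limits, pass a (connect, read) tuple.
r = requests.get("http://www.taobao.com", timeout=(5, 30))
To wait indefinitely, leave timeout unset or set it to None.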
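The connect and read phases raise different exceptions, which makes it possible to tell them apart. The sketch below is self-contained: it starts a throwaway local server that waits 0.5 seconds before answering and requests it with two different (connect, read) tuples. SlowHandler and classify are made-up names, and trust_env is switched off only so that proxy settings from the environment do not interfere with the loopback request.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Answer every GET after a 0.5 second delay (demonstration server)."""

    def do_GET(self):
        time.sleep(0.5)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test


def classify(timeout):
    """Return a short label describing how the request ended."""
    try:
        session.get(url, timeout=timeout)
        return "ok"
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"


impatient = classify((1, 0.1))  # read limit shorter than the server delay
patient = classify((1, 5))      # read limit comfortably longer
server.shutdown()
server.server_close()
print(impatient, "/", patient)
```

Both exception classes derive from requests.exceptions.Timeout, so a single except clause on Timeout is enough when the phase does not matter.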
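7. Authentication
Some sites require authentication. A (username, password) tuple can be passed directly to the auth parameter, and requests treats it as HTTP Basic Auth.
import requests
r = requests.get('http://localhost:5000', auth=('username', 'password'))
print(r.status_code)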
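The tuple is shorthand for requests.auth.HTTPBasicAuth. The sketch below prepares the same request both ways without sending it and compares the resulting Authorization header with a hand-built Basic value; the URL and credentials are placeholders.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://localhost:5000"  # placeholder; nothing is sent to it

# prepare() builds the request without sending it, so no server is needed.
short_form = requests.Request("GET", url, auth=("username", "password")).prepare()
long_form = requests.Request("GET", url, auth=HTTPBasicAuth("username", "password")).prepare()

# Basic auth is "Basic " followed by base64("username:password").
expected = "Basic " + base64.b64encode(b"username:password").decode("ascii")

print(short_form.headers["Authorization"] == expected)
print(long_form.headers["Authorization"] == expected)
```

Because the credentials are only base64-encoded, not encrypted, Basic Auth is normally used over HTTPS.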
8. Prepared Request
requests can represent a request as a data structure of its own: the parameters go into a Request object, which is then converted into what is called a Prepared Request.
Code:
from requests import Request, Session
url = 'http://httpbin.org/post'
data = {
    'name': 'germey'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
s = Session()
req = Request('POST', url, data=data, headers=headers)  # construct a Request object
prepped = s.prepare_request(req)  # convert it into a Prepared Request object with prepare_request()
r = s.send(prepped)  # send it with send()
print(r.text)
Result (output):
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60",
    "X-Amzn-Trace-Id": "Root=1-5fd9da60-2467cc031aa12ebd39daf72d"
  },
  "json": null,
  "origin": "183.92.251.74",
  "url": "http://httpbin.org/post"
}
This achieves the same result as an ordinary POST request.
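A practical benefit of the prepared form is that the request can be inspected, and adjusted, before anything is sent. The sketch below prepares the same form POST and prints the parts that would go on the wire; the X-Debug header is a made-up example of a last-minute change, and the actual send is left commented out.

```python
from requests import Request, Session

s = Session()
req = Request("POST", "http://httpbin.org/post", data={"name": "germey"})
prepped = s.prepare_request(req)

# Everything that would go on the wire can be inspected before sending.
print(prepped.method, prepped.url)
print(prepped.body)                       # the form-encoded body
print(prepped.headers["Content-Length"])  # length of that body
print(prepped.headers["Content-Type"])

# The prepared request can still be adjusted; X-Debug is a made-up header.
prepped.headers["X-Debug"] = "1"
# r = s.send(prepped, timeout=5)  # uncomment to actually send it
```

The body and Content-Length seen here correspond to the "form" and "Content-Length" fields that httpbin echoes back.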
Reference book: 《python3網路爬蟲開發實戰》