Advanced usage of requests for web crawlers

阿新 • Published: 2019-01-13

I. File upload

1 Code

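import requests

# favicon.ico must be in the same directory as this script.
# The with block closes the file once the upload has finished.
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    r = requests.post("http://httpbin.org/post", files=files)
print(r.text)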

2 Output

E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/3_2_2.py
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=ea9414f1b8ae442f768c08b6a62b8082",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  "origin": "106.36.218.77",
  "url": "http://httpbin.org/post"
}

Process finished with exit code 0

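3 Explanation

The site echoes the request back in its response. The response contains a files field while the form field is empty, which shows that an uploaded file is reported in its own files field rather than among the ordinary form data.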
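
To see what the files argument produces on the wire, the request can be built without being sent. The following is a minimal sketch and not part of the original post: the helper name build_upload_request, the sample bytes, the extra form field note, and the image/x-icon content type are assumptions made for illustration. It passes a (filename, file object, content type) tuple and uses requests.Request(...).prepare() to inspect the multipart body locally.

```python
import io

import requests


def build_upload_request(payload: bytes) -> requests.PreparedRequest:
    """Build (but do not send) a multipart upload request."""
    # A 3-tuple lets us set the filename and content type explicitly.
    files = {'file': ('favicon.ico', io.BytesIO(payload), 'image/x-icon')}
    # Ordinary form fields go in data; they end up in "form", not "files".
    data = {'note': 'demo'}
    req = requests.Request('POST', 'http://httpbin.org/post',
                           files=files, data=data)
    return req.prepare()


# Placeholder bytes standing in for a real icon file (assumption).
prepared = build_upload_request(b'\x00\x00\x01\x00')

# The header carries a generated boundary, as in the output shown above.
print(prepared.headers['Content-Type'])
# The file part and the form part are separate sections of the body.
print(b'filename="favicon.ico"' in prepared.body)
print(b'name="note"' in prepared.body)
```

Sending the prepared request is left out on purpose, so the sketch runs without network access; requests.Session().send(prepared) would be one way to submit it.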

II. Cookies

1 Code

import requests

r = requests.get("https://www.baidu.com")
print(r.cookies)
for key, value in r.cookies.items():
    print(key + '=' + value)

2 Output

E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/3_2_2.py
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315

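3 Explanation

Reading the cookies attribute of the response is enough to get the cookies; as the output shows, it is a RequestsCookieJar object. Calling items() on it returns a list of (name, value) tuples, so looping over that list prints the name and value of every cookie.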
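
The same jar operations can be tried without a network request. This is a small sketch under stated assumptions: the jar is filled by hand, the BDORZ value is copied from the output above, and the second cookie (demo on .example.com) is made up for illustration. It shows items(), requests.utils.dict_from_cookiejar(), and get_dict() with a domain filter.

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a jar by hand instead of reading r.cookies from a live response.
jar = RequestsCookieJar()
jar.set('BDORZ', '27315', domain='.baidu.com', path='/')
jar.set('demo', 'abc', domain='.example.com', path='/')  # made-up cookie

# items() gives a list of (name, value) tuples, as used in the loop above.
for key, value in jar.items():
    print(key + '=' + value)

# Convert the whole jar to a plain dict.
as_dict = requests.utils.dict_from_cookiejar(jar)
print(as_dict)

# Keep only the cookies that belong to one domain.
baidu_only = jar.get_dict(domain='.baidu.com')
print(baidu_only)
```

A plain dict is often easier to log or store, while the jar keeps the domain and path information needed to decide which cookies go with which request.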

III. Using cookies to stay logged in

1 First log in to Zhihu in a browser and copy the Cookie value from the request headers.

2 Code

import requests

headers = {
    'Cookie': '_zap=5278893c-f810-4c54-bab7-8a41270bc214; d_c0="APDmmdY8Cg6PTudPGBNFoxdsxayIFBmmnPc=|1533985205"; q_c1=cbbdfe655e4c42b0bbb6bd5b98669e82|1533985206000|1533985206000; tgw_l7_route=e5fff8427ab0da864ad8c176457be0a7; _xsrf=9FMv1Gt6dC9h7xdayPvNCyaiVcwri6GK; capsion_ticket="2|1:0|10:1547258019|14:capsion_ticket|44:YzIwN2Y1ZTM2MDU1NGRiN2I3MGYzYzk2NDRlNWM4N2Y=|caf585e4ea653247735804c0a1457db785cee1269503caee2a240ba56c22ef62"; z_c0="2|1:0|10:1547258039|4:z_c0|92:Mi4xa0EwN0JnQUFBQUFBOE9hWjFqd0tEaVlBQUFCZ0FsVk50cFltWFFCaHp1NnY4Tm5pdHlzc2pWMXNKdUwzOUVnM1VB|0f317c974b531bb5bc0fc809108c496c3cc193d290555234949ee2eebd7ee059"',
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',
}
r = requests.get('https://www.zhihu.com', headers=headers)
print(r.text)

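3 Result

The response contains content that is only available after logging in, as shown in Figure 3-7, which confirms that the login state was carried over.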
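
Instead of placing the raw string in the Cookie header, the copied value can also be split into a RequestsCookieJar and passed through the cookies parameter. The following is a hedged sketch: the helper name jar_from_header and the placeholder cookie string are assumptions, not real Zhihu cookies, and the request is only prepared, not sent.

```python
import requests
from requests.cookies import RequestsCookieJar


def jar_from_header(cookie_header: str) -> RequestsCookieJar:
    """Split a copied 'name=value; name2=value2' string into a cookie jar."""
    jar = RequestsCookieJar()
    for part in cookie_header.split(';'):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only; values may contain '=' themselves.
        name, _, value = part.partition('=')
        jar.set(name, value)
    return jar


# Placeholder string standing in for the value copied from the browser.
raw = 'token=placeholder-value; theme=dark'
jar = jar_from_header(raw)

# Prepare the request locally to see the Cookie header requests would send.
prepared = requests.Request('GET', 'https://www.zhihu.com',
                            cookies=jar).prepare()
print(prepared.headers['Cookie'])
```

With a real cookie string, requests.get('https://www.zhihu.com', cookies=jar, headers=headers) would send the same cookies, provided headers no longer contains its own Cookie entry.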

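IV. Session persistence

1 Key point

In requests, calling get() or post() directly does simulate a web request, but each call is effectively a separate session. It is as if you opened each page in a different browser.

Consider this scenario: the first request uses post() to log in to a site, and the second uses get() to fetch your own profile page after the login. In practice this is like opening two browsers, so the two requests belong to two unrelated sessions. Can the second one fetch the profile? It cannot.

You might suggest setting the same cookies on both requests. That works, but it is tedious, and there is a simpler way.

The real fix is to stay in the same session, which is like opening a new tab in the same browser instead of launching a new browser. To do that without setting cookies by hand on every request, requests provides the Session object.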

2 Code

import requests

requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)

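3 Explanation

When this code runs, the first print, which follows two independent requests.get() calls, should show an empty cookies object, while the second, made through the Session, should show number set to 123456789. So a Session lets you simulate a single browsing session without managing cookies yourself. It is typically used to log in first and then carry out further steps as the logged-in user.

Session is used very widely in practice, for example to simulate opening different pages of the same site in one browser.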
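
What a Session carries from one request to the next can also be checked without network access. This is a minimal sketch, assuming a made-up User-Agent string (demo-crawler/0.1) and a cookie set by hand rather than by the server. It uses Session.prepare_request() to show that cookies and default headers stored on the session are merged into each request, while a request prepared on its own has neither.

```python
import requests

with requests.Session() as s:
    # Default headers set once are applied to every request in the session.
    s.headers.update({'User-Agent': 'demo-crawler/0.1'})  # made-up value
    # Stand-in for the cookie that /cookies/set/number/123456789 would set.
    s.cookies.set('number', '123456789')

    prepared = s.prepare_request(
        requests.Request('GET', 'http://httpbin.org/cookies'))
    print(prepared.headers['Cookie'])
    print(prepared.headers['User-Agent'])

# A request prepared outside the session carries no cookies at all.
bare = requests.Request('GET', 'http://httpbin.org/cookies').prepare()
print('Cookie' in bare.headers)
```

Using the session as a context manager closes its connections when the block ends, which is convenient in longer-running crawlers.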
