Introduction to Python requests
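requests is a more powerful library designed to make web scraping easier. With it, tasks such as handling cookies, login authentication, and proxy settings are no trouble at all.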
1. Install the requests module (run in a cmd window)
pip3 install requests
2. Basic usage of requests
import requests

response = requests.get("https://www.baidu.com/")
print(type(response))        # <class 'requests.models.Response'>, the response type
print(response.status_code)  # 200, the HTTP status code
print(response.text)         # page source as a decoded str
print(response.content)      # page source as raw bytes
print(response.cookies)      # the page's cookies, a RequestsCookieJar
print(response.headers)      # the response headers sent by the server
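You can try these attributes without depending on an external site. The sketch below starts a throwaway local HTTP server with Python's standard http.server module and inspects the Response it returns. The handler, the `session=abc123` cookie, and the HTML body are made up for illustration. `trust_env = False` only stops any proxy environment variables from intercepting the local request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a real site such as www.baidu.com."""

    def do_GET(self):
        body = "<html><body>你好</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Set-Cookie", "session=abc123")  # made-up cookie
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

response = session.get(url, timeout=5)
print(response.status_code)              # 200
print(type(response.text))               # <class 'str'>: body decoded using the charset
print(type(response.content))            # <class 'bytes'>: raw body
print(response.cookies.get_dict())       # {'session': 'abc123'}
print(response.headers["Content-Type"])  # text/html; charset=utf-8

session.close()
server.shutdown()
server.server_close()
```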
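3. Other request methods, tested against http://httpbin.org. httpbin is a request-testing site that echoes back what you send, so you can experiment with it freely.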
import requests

r = requests.post("http://httpbin.org/post")
print(r.text)  # httpbin echoes back the details of the POST request
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.options("http://httpbin.org/get")
Here the post(), put(), delete(), and options() methods send POST, PUT, DELETE, and OPTIONS requests respectively.
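Each of these helpers is a thin wrapper around requests.request(method, url, **kwargs), so the HTTP method is just a string. The sketch below builds one prepared request per method with requests.Request(...).prepare() but does not send any of them. Each method is paired with the httpbin endpoint of the same name, and the output shows the method and URL that would go on the wire.

```python
import requests

# httpbin has one endpoint per method; OPTIONS can be sent to an existing endpoint.
targets = {
    "POST": "http://httpbin.org/post",
    "PUT": "http://httpbin.org/put",
    "DELETE": "http://httpbin.org/delete",
    "OPTIONS": "http://httpbin.org/get",
}

prepared = {}
for method, url in targets.items():
    # prepare() builds the request exactly as requests would send it,
    # without opening a network connection.
    req = requests.Request(method, url).prepare()
    prepared[method] = req
    print(req.method, req.url)
```

To actually send one of them, requests.request("PUT", "http://httpbin.org/put") is equivalent to requests.put("http://httpbin.org/put").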
4. GET requests
Inspect the request information a GET request carries:
import requests

r = requests.get("http://httpbin.org/get")
print(r.text)  # print the GET request details echoed by httpbin
Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "119.123.196.143",
  "url": "http://httpbin.org/get"
}
As the output shows, the echoed request information includes the request headers, the client's IP address (origin), and the URL.
(1) Adding extra parameters to a request
Method 1: append ?key=value&key2=value2... to the URL (? marks the start of the query string, and & separates parameters)
r= requests.get("http://httpbin.org/get?name=germey&age=22")
Method 1 example:
import requests

r = requests.get("http://httpbin.org/get?name=germey&age=22")
print(r.text)
Method 1 output:
{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "119.123.196.143",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
The output confirms that the request was sent to http://httpbin.org/get?name=germey&age=22, and httpbin parsed the two parameters into "args".
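Writing the query string by hand works for simple values like these. Characters such as spaces, &, or Chinese text, however, must be percent-encoded first. The minimal sketch below shows the difference using the standard library's urllib.parse.urlencode, chosen here only for illustration because it is not part of requests. The value "tom & jerry" is made up.

```python
from urllib.parse import urlencode

base = "http://httpbin.org/get"
params = {"name": "germey", "age": 22}

# For simple values, the encoded query string matches the hand-written one.
print(base + "?" + urlencode(params))  # http://httpbin.org/get?name=germey&age=22

# A raw '&' or space would split or corrupt a hand-built URL
# ("?q=tom & jerry" reads as two parameters), so it must be encoded.
tricky = {"q": "tom & jerry"}
print(base + "?" + urlencode(tricky))  # http://httpbin.org/get?q=tom+%26+jerry
```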
Method 2 (recommended): pass a dict to the params argument of get(), and requests encodes it into the URL's query string.
import requests

data = {
    "name": "germey",
    "age": 22
}
r = requests.get("http://httpbin.org/get", params=data)
print(r.text)
Method 2 output:
{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "119.123.196.143",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
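Both methods produce the URL http://httpbin.org/get?name=germey&age=22. Method 2 is more practical because requests builds and encodes the query string for you.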
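You can check the URL that params produces without sending anything by preparing the request. The sketch below uses requests.Request(...).prepare() inside a hypothetical helper, build_url. The extra cases (a list value, a None value, and an existing query string) are made up to show how requests encodes and merges parameters.

```python
import requests


def build_url(url, params):
    """Return the URL requests would send for a GET with these params (no network I/O)."""
    return requests.Request("GET", url, params=params).prepare().url


print(build_url("http://httpbin.org/get", {"name": "germey", "age": 22}))
# http://httpbin.org/get?name=germey&age=22

# A list value repeats the key, and a None value is dropped.
print(build_url("http://httpbin.org/get", {"tag": ["a", "b"], "age": None}))
# http://httpbin.org/get?tag=a&tag=b

# params are appended to a query string already present in the URL.
print(build_url("http://httpbin.org/get?name=germey", {"age": 22}))
# http://httpbin.org/get?name=germey&age=22
```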
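(2) When the response body is a JSON-formatted string, as httpbin's is, call .json() to convert it into a dict.
If the body is not valid JSON, .json() raises a JSONDecodeError exception (a subclass of ValueError).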
import requests

r = requests.get("http://httpbin.org/get")
print(type(r.text))    # type of the response body
print(r.text)          # print the request details
# r.json() parses the JSON string into a dict
print(type(r.json()))  # type after conversion
Output:
<class 'str'>
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "119.123.196.143",
  "url": "http://httpbin.org/get"
}
<class 'dict'>
The output shows that r.text is a str, while r.json() returns a dict.
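Because r.json() parses the body as JSON, you can reproduce its failure mode with the standard json module. A JSON string becomes a dict, while an HTML page raises JSONDecodeError. The helper name parse_json_or_none and the two sample strings are hypothetical.

```python
import json


def parse_json_or_none(text):
    """Parse a JSON string into Python objects; return None if it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:  # a subclass of ValueError
        return None


echo = '{"args": {}, "url": "http://httpbin.org/get"}'
html = "<html><body>not json</body></html>"

print(type(parse_json_or_none(echo)))  # <class 'dict'>
print(parse_json_or_none(html))        # None
```

With a live response, the same pattern is `try: data = r.json()` followed by `except ValueError: data = None`. Catching ValueError covers the JSONDecodeError that r.json() raises.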
5. Scraping web pages with GET: examples
(1) Fetching Zhihu search results
Adding search parameters and headers:
# Request Zhihu
import requests

# Search parameters for the query string
data = {
    "type": "content",
    "q": "趙麗穎"
}
url = "https://www.zhihu.com/search"
# Browser identification sent with the request
# (headers cannot change the client IP address the server sees)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
}
response = requests.get(url, params=data, headers=headers)
print(response.text)
Here we added headers containing the User-Agent field, which is the browser identification string. Without it, Zhihu blocks the request. The data dict builds the search query.
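You can prepare the Zhihu request without network access to see exactly what it sends. The sketch below reuses the data and User-Agent from the example above. It shows that requests percent-encodes the Chinese query as UTF-8 and puts the User-Agent into the request headers. It does not check whether Zhihu still accepts such a request today.

```python
import requests

data = {"type": "content", "q": "趙麗穎"}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
}

# Build the request as requests would send it, without sending it.
prepared = requests.Request(
    "GET", "https://www.zhihu.com/search", params=data, headers=headers
).prepare()

print(prepared.url)                    # the q value appears as UTF-8 percent-escapes
print(prepared.headers["User-Agent"])  # the browser string supplied above
```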
(2) Downloading the GitHub site icon
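Printing and saving the icon:
import requests

r = requests.get("https://github.com/favicon.ico")
print(r.text)     # body decoded as text (garbled for a binary file)
print(r.content)  # raw bytes
with open("github.ico", "wb") as f:  # "wb": write in binary mode
    f.write(r.content)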
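For a small file like favicon.ico, r.content is fine. For larger files, a common pattern is stream=True plus iter_content(), which writes the body in chunks instead of holding all of it in memory. The sketch below is hypothetical and runs entirely on 127.0.0.1: download_file, the chunk size, the payload, and the local server that stands in for github.com are all illustrative.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = bytes(range(256)) * 100  # 25,600 bytes of made-up binary data


class FileHandler(BaseHTTPRequestHandler):
    """Serves PAYLOAD, standing in for https://github.com/favicon.ico."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the console quiet


def download_file(session, url, path, chunk_size=8192):
    """Stream url to path in chunks; return the number of bytes written."""
    written = 0
    with session.get(url, stream=True, timeout=5) as r:
        r.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written


server = HTTPServer(("127.0.0.1", 0), FileHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/favicon.ico" % server.server_address[1]

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
target = os.path.join(tempfile.mkdtemp(), "github.ico")
size = download_file(session, url, target)
print(size)  # 25600

session.close()
server.shutdown()
server.server_close()
```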
(3) Request headers
httpbin's echo of a plain GET to https://httpbin.org/get, listing the request headers it received:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "119.123.196.143",
  "url": "https://httpbin.org/get"
}
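The User-Agent in this output (python-requests/2.20.1) is requests' own default. requests.utils.default_headers() returns the headers requests adds by itself. The exact version string and Accept-Encoding value depend on the installed version. Passing headers= on a request overrides these defaults. This is why the Zhihu example supplies a browser string. The value "my-crawler/0.1" below is a made-up placeholder.

```python
import requests

defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")

# A Session starts with these defaults; headers passed on a request override them.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/get",
                     headers={"User-Agent": "my-crawler/0.1"})  # hypothetical UA
)
print(prepared.headers["User-Agent"])  # my-crawler/0.1
print(prepared.headers["Accept"])      # */* (still the session default)
session.close()
```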
(4) Fetching the GitHub icon: text vs. content
r.text returns the body as a str.
r.content returns the body as bytes.
import requests

r = requests.get("https://github.com/favicon.ico")
print("text", r.text)        # decoded into a str
print("content", r.content)  # raw binary bytes
Writing the GitHub icon to a file:
import requests

r = requests.get("https://github.com/favicon.ico")
print(r.text)
print(r.content)
with open("gg.ico", "wb") as f:
    f.write(r.content)
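You have to handle binary files through r.content. As an illustration, an ICO file begins with a header made of a 2-byte reserved field (0), a 2-byte image type (1 for icons), and a 2-byte image count, all little-endian. The hypothetical helper below checks that header, and the check only works on raw bytes such as r.content. Decoding the same bytes into r.text would turn unprintable bytes into garbled characters. The sample bytes are made up.

```python
import struct


def looks_like_ico(data):
    """Return True if data starts with an ICO header (reserved=0, type=1, count>0)."""
    if len(data) < 6:
        return False
    reserved, image_type, count = struct.unpack("<HHH", data[:6])
    return reserved == 0 and image_type == 1 and count > 0


sample = b"\x00\x00\x01\x00\x01\x00" + b"\x10" * 10  # made-up bytes with an ICO header
print(looks_like_ico(sample))        # True
print(looks_like_ico(b"<html>..."))  # False
```

Applied to the download above, looks_like_ico(r.content) would check the fetched bytes before they are saved.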
6. POST requests
Sending a request with form data
import requests

data = {
    'name': 'pig',
    'age': 18
}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
POST request output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "18",
    "name": "pig"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "15",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "119.123.198.80",
  "url": "http://httpbin.org/post"
}
The submitted data appears under "form", and requests set Content-Type to application/x-www-form-urlencoded.
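You can inspect the body and headers behind that output by preparing the request offline. The sketch below also contrasts data= with json=. data= sends a form-encoded body, which httpbin reports under "form". json= sends a JSON body, which httpbin reports under "json", the field that is null in the output above.

```python
import json

import requests

data = {"name": "pig", "age": 18}

# data= sends an HTML-form style body.
form_req = requests.Request("POST", "http://httpbin.org/post", data=data).prepare()
print(form_req.headers["Content-Type"])    # application/x-www-form-urlencoded
print(form_req.body)                       # name=pig&age=18
print(form_req.headers["Content-Length"])  # 15

# json= serialises the same dict as JSON instead.
json_req = requests.Request("POST", "http://httpbin.org/post", json=data).prepare()
print(json_req.headers["Content-Type"])    # application/json
print(json.loads(json_req.body))           # {'name': 'pig', 'age': 18}
```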
7. Response status codes
1. 1xx: informational
2. 2xx: success
3. 3xx: redirection
4. 4xx: client error
5. 5xx: server error
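The categories above depend only on the first digit of the code. The minimal sketch below maps a code to its category with a hypothetical helper, status_class, and uses the standard library's http.HTTPStatus to look up each code's reason phrase.

```python
from http import HTTPStatus


def status_class(code):
    """Map an HTTP status code to the category listed above."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return categories.get(code // 100, "unknown")


for code in (100, 200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
# 100 Continue -> informational
# 200 OK -> success
# 301 Moved Permanently -> redirection
# 404 Not Found -> client error
# 500 Internal Server Error -> server error
```

With requests, r.status_code supplies the code. r.raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses, and r.ok is False in those cases.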