
Introduction to Python requests

 

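  The more powerful requests library exists to make crawling more convenient. With it, cookies, login authentication, proxy settings, and similar tasks are no longer a problem.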

  

 

I. Installing the requests module (run in a cmd window)

pip3 install requests

 

II. Basic usage of requests

import requests

response = requests.get("https://www.baidu.com/")
print(type(response))        # <class 'requests.models.Response'>, the response type
print(response.status_code)  # 200, the HTTP status code
print(response.text)         # page source as a str
print(response.content)      # page source as raw bytes
print(response.cookies)      # cookies set by the site, a RequestsCookieJar
print(response.headers)      # response headers

 

III. A recommended test site: http://httpbin.org, a request-testing site you can experiment with freely (other request methods)

import requests

r = requests.post("http://httpbin.org/post")
print(r.text)  # print the POST request details echoed back by httpbin
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.options("http://httpbin.org/get")

  Here post(), put(), delete(), and the other methods send POST, PUT, DELETE, and other requests.
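
  The mapping between helper functions and HTTP verbs can be checked without touching the network. The sketch below builds prepared requests and prints their method and URL. The ENDPOINTS table and the build() helper are names introduced here for illustration; the paths are the per-verb echo endpoints that httpbin exposes.

```python
import requests

BASE = "http://httpbin.org"
# Illustrative table: httpbin offers one echo endpoint per HTTP verb.
ENDPOINTS = {"POST": "/post", "PUT": "/put", "DELETE": "/delete", "PATCH": "/patch"}


def build(method):
    """Return a PreparedRequest for the given verb without sending it."""
    return requests.Request(method, BASE + ENDPOINTS[method]).prepare()


prepared = [build(m) for m in ENDPOINTS]
for p in prepared:
    print(p.method, p.url)

# To actually send one of them, something like
# requests.Session().send(prepared[0]) or
# requests.request("PUT", BASE + "/put") should work.
```

  requests.post(url) is shorthand for requests.request("POST", url), so the same pattern covers every verb.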

 

IV. GET requests

  Inspecting the information carried by a GET request

import requests

r = requests.get("http://httpbin.org/get")
print(r.text)  # print the GET request details
Output:
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.20.1"
  }, 
  "origin": "119.123.196.143", 
  "url": "http://httpbin.org/get"
}

  What the output shows: a request carries the request headers, the client IP address, the URL, and related information.

  (1) Adding extra information to a request

    Method 1: append ?key=value&key2=value2... to the URL (? marks the start of the query string, & separates the parameters)

import requests

r = requests.get("http://httpbin.org/get?name=germey&age=22")
print(r.text)
Method 1 example
{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.20.1"
  }, 
  "origin": "119.123.196.143", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}
Method 1 result

  The output shows that the requested URL was http://httpbin.org/get?name=germey&age=22 and that the query parameters were parsed into args.
    

  Method 2: pass the params argument to get(), which encodes the parameters into the URL (recommended)

import requests

data={
    "name":"germey",
    "age":22
}
r=requests.get("http://httpbin.org/get",params=data)
print(r.text)
Output:
{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.20.1"
  }, 
  "origin": "119.123.196.143", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}
Method 2 result

  Both methods produce http://httpbin.org/get?name=germey&age=22; method 2 is the more practical one.
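
  One reason params is the more practical choice is that it handles encoding for you. The sketch below builds the URL offline through a prepared request, so nothing is sent. build_url() is a helper name made up for this example, and the second set of parameters is only there to show how a space and a list value are encoded.

```python
from urllib.parse import parse_qs, urlsplit

import requests


def build_url(base, params):
    """Return the final URL requests would use, without sending anything."""
    return requests.Request("GET", base, params=params).prepare().url


# Same parameters as in the example above.
simple = build_url("http://httpbin.org/get", {"name": "germey", "age": 22})
print(simple)

# A value containing a space and a list value are both encoded for us.
tricky = build_url("http://httpbin.org/get", {"q": "hello world", "tag": ["a", "b"]})
print(tricky)

# Decoding the query string again recovers the original values.
decoded = parse_qs(urlsplit(tricky).query)
print(decoded)
```

  Building the same query string by hand (method 1) would require escaping the space and repeating the tag key yourself.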

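  (2) The response returned by httpbin is a JSON-formatted string; call .json() to convert it into a dict.

    If the body is not valid JSON, .json() raises a JSONDecodeError exception (json.decoder.JSONDecodeError, a subclass of ValueError).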

import requests

r = requests.get("http://httpbin.org/get")
print(type(r.text))    # data type of the response text
print(r.text)          # print the response body
# r.json() parses the JSON string into a dict
print(type(r.json()))  # print the data type after conversion
<class 'str'>
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.20.1"
  }, 
  "origin": "119.123.196.143", 
  "url": "http://httpbin.org/get"
}

<class 'dict'>
Result

  The output shows that r.text is a <str> and that the data returned by r.json() is a <dict>.
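
  Since a non-JSON body raises an exception, it can help to guard the conversion. The sketch below uses the standard library json module to show the same failure mode on plain strings; parse_json_or_none() is a hypothetical helper, not part of requests. With a real response, wrapping r.json() in try/except ValueError should behave similarly, because the decode error is a ValueError subclass.

```python
import json


def parse_json_or_none(text):
    """Return the parsed object, or None when text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:  # subclass of ValueError
        return None


good = parse_json_or_none('{"args": {}, "url": "http://httpbin.org/get"}')
bad = parse_json_or_none("<html>not json</html>")

print(type(good))  # a dict for a JSON object
print(bad)         # None, because an HTML page is not JSON
```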

 

V. Examples of fetching pages with GET

  (1) Successfully fetching a Zhihu page

# Request Zhihu
import requests

# Build the search query parameters
data = {
    "type": "content",
    "q": "趙麗穎",
}
url = "https://www.zhihu.com/search"
# Build the request headers (browser identification)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
}
response = requests.get(url, params=data, headers=headers)
print(response.text)
Adding search parameters and headers

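  Here we added headers containing the User-Agent field, which is the browser identification string. Without it, Zhihu refuses the request. The data dict builds the search query parameters.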
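
  When several requests need the same User-Agent, one option is to set it once on a Session instead of passing headers every time. The sketch below only prepares the request, so nothing is sent to Zhihu. The UA constant is a name introduced for this example, and its value is copied from the code above.

```python
import requests

# Browser identification string copied from the example above.
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36")

# Without custom headers, requests identifies itself as python-requests/<version>.
default_ua = requests.utils.default_user_agent()
print(default_ua)

session = requests.Session()
session.headers.update({"User-Agent": UA})  # applied to every request of this session

req = requests.Request(
    "GET",
    "https://www.zhihu.com/search",
    params={"type": "content", "q": "趙麗穎"},
)
prepared = session.prepare_request(req)  # merges the session headers, sends nothing

print(prepared.headers["User-Agent"])
print(prepared.url)
```

  Calling session.get(url, params=...) would then send the request with the same merged headers.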
  (2) Downloading the GitHub site icon

import requests

r = requests.get("https://github.com/favicon.ico")
print(r.text)     # the bytes decoded as text, unreadable for an image
print(r.content)  # the raw bytes
with open("github.ico", "wb") as f:
    f.write(r.content)
Icon download example

  (3) Request header information (headers)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.20.1"
  }, 
  "origin": "119.123.196.143", 
  "url": "https://httpbin.org/get"
}

   (4) Fetching the GitHub icon

    r.text returns the data as a string (str)

    r.content returns the data as bytes

 

import requests

r = requests.get("https://github.com/favicon.ico")
print('text', r.text)        # the body decoded to a string
print('content', r.content)  # the raw binary data

import requests

r = requests.get("https://github.com/favicon.ico")
print(r.text)
print(r.content)
with open("gg.ico", "wb") as f:
    f.write(r.content)
Writing the GitHub icon to a file
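
  Because r.content is plain bytes, it can be checked before it is written to disk. The sketch below is one possible sanity check, assuming the server really returns a file in ICO format (ICO files start with the four bytes 00 00 01 00). looks_like_ico() and save_binary() are hypothetical helpers, and the sample bytes stand in for r.content so the example runs without a network connection.

```python
import os
import tempfile

ICO_MAGIC = b"\x00\x00\x01\x00"  # reserved field (0) followed by image type 1 (icon)


def looks_like_ico(data):
    """Return True when data starts with the ICO file signature."""
    return data[:4] == ICO_MAGIC


def save_binary(data, path):
    """Write bytes to path in binary mode and return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(data)


sample = ICO_MAGIC + bytes(12)        # stand-in for r.content of a real icon
error_page = b"<!DOCTYPE html><html>"  # what a blocked request might return

print(looks_like_ico(sample))      # True
print(looks_like_ico(error_page))  # False

path = os.path.join(tempfile.mkdtemp(), "gg.ico")
written = save_binary(sample, path)
print(written)  # 16
```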

 

VI. POST requests

  A request carrying form data

import requests
data ={
    'name' :'pig',
    'age':18
}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "18", 
    "name": "pig"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "15", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.20.1"
  }, 
  "json": null, 
  "origin": "119.123.198.80", 
  "url": "http://httpbin.org/post"
}
POST request result
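
  In the output above the values appear under "form" and "json" is null, because data= sends a URL-encoded form body. The sketch below compares that with the json= argument by preparing both requests offline; form_req and json_req are names chosen for this example.

```python
import json

import requests

payload = {"name": "pig", "age": 18}
url = "http://httpbin.org/post"

# data= encodes the dict as an HTML form body.
form_req = requests.Request("POST", url, data=payload).prepare()
# json= serializes the dict as a JSON document instead.
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)
```

  With json=, httpbin would be expected to echo the values under "json" rather than "form".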

 

VII. Response status codes

  1. 1xx status codes: informational

  2. 2xx status codes: success

  3. 3xx status codes: redirection

  4. 4xx status codes: client error

  5. 5xx status codes: server error
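
  The five classes above follow from the first digit of the code. The sketch below maps a code to its class and shows two helpers that requests provides: the requests.codes lookup table and raise_for_status(). status_class() is a hypothetical helper, and the Response object is built by hand purely for illustration; in real code it would come from requests.get().

```python
import requests


def status_class(code):
    """Map an HTTP status code to the name of its class."""
    names = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return names.get(code // 100, "unknown")


print(status_class(200))  # success
print(status_class(404))  # client error

# requests.codes gives readable names for common status codes.
print(requests.codes.ok, requests.codes.not_found)  # 200 404

# Hand-built response, for illustration only.
resp = requests.Response()
resp.status_code = 404
try:
    resp.raise_for_status()  # raises for 4xx and 5xx codes
    raised = False
except requests.HTTPError:
    raised = True
print(raised)  # True
```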