Python Web Scraping: The requests Module
阿新 • Published: 2019-11-21
Getting response information
import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)  # status code
print(response.url)          # URL of the request
print(response.headers)      # response headers
print(response.cookies)      # cookies
print(response.content)      # response body as bytes
print(response.encoding)     # encoding used to decode the body
response.encoding = 'utf-8'  # set the encoding explicitly
print(response.text)         # response body as text: response.content decoded with response.encoding
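To make the relationship between response.content, response.encoding, and response.text concrete, here is a minimal offline sketch. It builds a Response object by hand and sets the private _content attribute, which is an assumption made only to avoid a network call; in real code the body comes from requests.get.

```python
import requests

# Hand-built response for illustration only; normally requests.get() returns it.
r = requests.Response()
r.status_code = 200
r._content = "你好".encode("utf-8")  # private attribute, set here to avoid a network call

# With the wrong encoding, the same bytes decode to unreadable text.
r.encoding = "ISO-8859-1"
garbled = r.text

# After setting the right encoding, response.text is decoded correctly.
r.encoding = "utf-8"
fixed = r.text

print(r.content)  # the raw bytes never change
print(garbled)
print(fixed)
```

response.text is recomputed from response.content each time it is read, which is why changing response.encoding afterwards still takes effect.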
Sending GET requests
GET request without parameters
response = requests.get('http://www.baidu.com')
print(response.text)
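GET request with parameters
Writing the parameters directly in the URL
Append ? to the URL to start the query string, and separate the parameter pairs with &. For example: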
https://www.bilibili.com/video/av4050443?from=search&seid=17321873743047145176
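Note: URLs have a practical length limit (about 2048 bytes is a commonly cited figure for some browsers and servers), and the parameters are visible in plain text in the URL, so do not put sensitive data there.
Passing the parameters as a dict
data = {'name': 'xiaoming', 'age': 26}
response = requests.get('http://www.abcd.com', params=data)
print(response.text)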
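To see what requests does with the params dict, the request can be prepared without being sent. This is a small sketch that uses the same placeholder URL as above and needs no network access; the variable names are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

import requests

data = {'name': 'xiaoming', 'age': 26}

# Build the request without sending it, so no network access is needed.
prepared = requests.Request('GET', 'http://www.abcd.com', params=data).prepare()
print(prepared.url)  # the dict has been encoded into the query string

# Parse the query string back into a dict to confirm what was encoded.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

The result is the same kind of URL as the hand-written one, but requests takes care of escaping special characters in the values.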
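Sending POST requests
Form data is usually passed as a dict (data also accepts a list of tuples, bytes, or a file-like object). Note that the argument name is data, not params.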
data = {'name': 'xiaoming', 'age': 26}
response = requests.post('http://www.abcd.com', data=data)
print(response.text)
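The difference between a form body and a JSON body is easy to check offline by preparing the requests instead of sending them. The sketch below assumes the same placeholder URL; the json argument is a separate requests parameter not used in the text above, shown here only for comparison.

```python
import json
from urllib.parse import parse_qs

import requests

payload = {'name': 'xiaoming', 'age': 26}
url = 'http://www.abcd.com'  # placeholder URL, never contacted here

# data= sends the dict as a URL-encoded form body.
form_request = requests.Request('POST', url, data=payload).prepare()

# json= serializes the dict to JSON and sets the matching Content-Type.
json_request = requests.Request('POST', url, json=payload).prepare()

print(form_request.headers['Content-Type'], form_request.body)
print(json_request.headers['Content-Type'], json_request.body)
```

Which one to use depends on what the server expects: HTML form endpoints usually want data, while many web APIs want json.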
Adding headers
heads = {}
heads['User-Agent'] = ('Mozilla/5.0 '
                       '(Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 '
                       '(KHTML, like Gecko) Version/5.1 Safari/534.50')
# Pass the dict defined above (heads) as the headers argument
response = requests.get('http://www.baidu.com', headers=heads)
Using a proxy
proxy = {'http': '49.89.84.106:9999', 'https': '49.89.84.106:9999'}
heads = {}
heads['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0'
req = requests.get(url, proxies=proxy, headers=heads)
print(req.text)
Using a proxy that requires authentication
from requests.auth import HTTPProxyAuth
proxies= {'http': '127.0.0.1:8888', 'https': '127.0.0.1:8888'}
auth = HTTPProxyAuth('user', 'pwd')
requests.get(url, proxies=proxies, auth=auth)
Alternatively, put the credentials in the proxy URL:
proxies = {"http": "http://user:[email protected]:3128/",}
req = requests.get(url, proxies=proxies, headers=heads)
Cookie
Getting cookies
import requests
response = requests.get("http://www.baidu.com")
print(type(response.cookies))
# Convert the CookieJar object to a dict
cookies = requests.utils.dict_from_cookiejar(response.cookies)
print(cookies)
Using cookies
cookie = {"cookie_name": "xxxxxxxx"}  # maps each cookie name to its value
response = requests.get(url, cookies=cookie)
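The dict passed as cookies maps each cookie name to its value, and requests joins the pairs into a single Cookie header. A minimal offline sketch, with made-up cookie names and a placeholder URL, shows the header that would be sent and the conversion between a dict and a CookieJar:

```python
import requests

# Hypothetical cookie names and values, for illustration only.
cookies = {'sessionid': 'abc123', 'theme': 'dark'}

# Prepare the request without sending it and inspect the Cookie header.
prepared = requests.Request('GET', 'http://www.abcd.com', cookies=cookies).prepare()
cookie_header = prepared.headers['Cookie']
print(cookie_header)

# A dict can be turned into a CookieJar and back again.
jar = requests.utils.cookiejar_from_dict(cookies)
round_trip = requests.utils.dict_from_cookiejar(jar)
print(round_trip)
```

If what you have is a raw Cookie string copied from a browser, it belongs in the headers dict under the 'Cookie' key rather than in the cookies argument.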
Session
session = requests.Session()
session.get('http://httpbin.org/cookies/set/number/12345')
response = session.get('http://httpbin.org/cookies')
print(response.text)
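A Session keeps cookies and default headers between requests, which is why the second call above can see the cookie set by the first. The sketch below shows the same idea without touching the network: the cookie is set by hand instead of by the server, and the User-Agent value is a made-up example.

```python
import requests

session = requests.Session()

# Defaults stored on the session are applied to every request it sends.
session.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical value
session.cookies.set('number', '12345')  # set by hand here instead of by the server

# Prepare (but do not send) a request to see what the session would add.
prepared = session.prepare_request(
    requests.Request('GET', 'http://httpbin.org/cookies')
)
print(prepared.headers['User-Agent'])
print(prepared.headers['Cookie'])
```

Using one Session for a whole crawl also reuses the underlying connections, which tends to be faster than calling requests.get repeatedly.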
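Limiting the response time
import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://www.baidu.com', timeout=1)
    print(response.status_code)
except Timeout:  # covers both connect and read timeouts
    print('No response within the given time')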
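The timeout argument also accepts a (connect, read) tuple, so the two phases can be limited separately. The following sketch demonstrates a read timeout against a deliberately slow local server; SlowHandler, fetch_status, and demo are hypothetical helpers written for this example, and the delays are arbitrary.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: waits before answering every GET."""

    def do_GET(self):
        time.sleep(1.0)  # longer than the read timeout used below
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"late")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch_status(url, timeout=(3.0, 0.3)):
    """Return the status code, or a short label if the request timed out."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        return session.get(url, timeout=timeout).status_code
    except ConnectTimeout:
        return "connect timeout"
    except ReadTimeout:
        return "read timeout"


def demo():
    """Start the slow server on a free local port and request it once."""
    server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_status("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() connects quickly but waits longer than 0.3 seconds for the body, so the read timeout branch is the one that fires. Without a timeout, requests will wait indefinitely, so setting one is a good habit in any crawler.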
Parsing a JSON response
response.json() converts a JSON response body into a Python object; json.loads(response.text) does the same thing.
response = requests.get('http://www.abcd.com')
print(response.text)
print(response.json())
print(type(response.json()))
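Not every response is valid JSON (an error page is often HTML), so it is worth guarding the call. Here is a minimal offline sketch; fake_response and parse_json are hypothetical helpers, and the private _content attribute is set only to avoid a network request.

```python
import json

import requests


def fake_response(body, encoding="utf-8"):
    """Build a Response by hand for illustration (normally requests.get returns it)."""
    r = requests.Response()
    r.status_code = 200
    r.encoding = encoding
    r._content = body  # private attribute, set here only to avoid a network call
    return r


def parse_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # the JSON decode errors raised by requests subclass ValueError
        return None


ok = fake_response(b'{"name": "xiaoming", "age": 26}')
print(parse_json(ok))          # a Python dict
print(json.loads(ok.text))     # same result via the json module

bad = fake_response(b'<html>not json</html>')
print(parse_json(bad))         # None instead of an exception
```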