Web Scraping: Basic Usage of Requests
阿新 · Published 2018-12-10
Installation
- pip install requests
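A quick way to confirm the install is to import the package and print its version string (requests exposes __version__):
import requests
print(requests.__version__)  # e.g. 2.x; confirms the package is importable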
1. Making requests with the Requests module
- Fetching a page (without parameters)
r = requests.get('http://www.chinahufei.com')
r = requests.post('http://www.chinahufei.com')
r = requests.delete('http://www.chinahufei.com')
r = requests.head('http://www.chinahufei.com')
r = requests.options('http://www.chinahufei.com')
- Fetching a page (with parameters)
# GET with query parameters
r = requests.get("http://api.chinahufei.com", params={'page': 1})
# POST with form data
r = requests.post('http://api.chinahufei.com', data={'kwd': 'hufei'})
# Generic interface: pass the HTTP method as a string
r = requests.request("get", "http://api.chinahufei.com/")
# A parameter value may also be a list
payload = {'page': '1', 'kwd': ['hufei', 'china']}
r = requests.get('http://api.chinahufei.com', params=payload)
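To see how the params dict is turned into a query string, inspect r.url after the request (continuing from the payload example above; list values become repeated keys):
print(r.url)
# e.g. http://api.chinahufei.com/?page=1&kwd=hufei&kwd=china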
- Fetching a page (with headers and a User-Agent)
# GET
kw = {'kwd': '長城'}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically, so urlencode() is not needed
response = requests.get("http://api.chinahufei.com", params=kw, headers=headers)
# POST
formdata = {
    "type": "AUTO",
    "i": "i love python",
    "doctype": "json",
    "xmlVersion": "1.8",
    "keyfrom": "fanyi.web",
    "ue": "UTF-8",
    "action": "FY_BY_ENTER",
    "typoResult": "true"
}
url = "http://api.chinahufei.com"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
r = requests.post(url, data=formdata, headers=headers)
- Fetching a page (through a proxy)
import requests
# Choose a proxy according to the protocol of the target URL
proxies = {
"http": "http://12.34.56.79:9527",
"https": "http://12.34.56.79:9527"
}
response = requests.get("http://api.chinahufei.com", proxies = proxies)
print(response.text)
# Authenticated (private) proxy
import requests
# If the proxy requires HTTP Basic Auth, use the user:password@host:port format:
proxy = { "http": "mr_mao_hacker:[email protected]:16816" }
response = requests.get("http://api.chinahufei.com", proxies = proxy)
print(response.text)
# Web client authentication (HTTP Basic Auth on the target site)
import requests
auth=('test', '123456')
response = requests.get('http://192.168.199.107', auth = auth)
print(response.text)
- Fetching a page (controlling redirects)
# Disallow automatic redirects
r = requests.head('http://github.com', allow_redirects=False)
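With redirects disabled, the original 3xx response is returned directly instead of being followed. A minimal check (at the time of writing, http://github.com redirects to HTTPS):
r = requests.head('http://github.com', allow_redirects=False)
print(r.status_code)          # e.g. 301
print(r.headers['Location'])  # e.g. https://github.com/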
- HTTPS requests and SSL certificate verification
# To skip certificate verification for 12306, set verify to False and the request will go through.
r = requests.get("https://www.12306.cn/mormhweb/", verify = False)
2. Responses from the Requests module
- Response body: text (data as a Unicode string)
- Response body: content (data as raw bytes)
- Response body: json() (data parsed as JSON)
- URL: url (the full request URL)
- Status code: status_code
- Response headers: headers
- Response encoding: encoding (character set used to decode text)
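A quick look at these attributes on a live response; this sketch assumes httpbin.org is reachable (any endpoint that returns JSON works the same way):
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code)              # e.g. 200
print(response.url)                      # the final URL after any redirects
print(response.encoding)                 # charset taken from the headers (may be None)
print(response.headers['Content-Type'])  # headers behave like a case-insensitive dict
print(response.text[:60])                # body decoded to a Unicode string
print(response.content[:60])             # body as raw bytes
print(response.json())                   # body parsed as JSON (httpbin returns JSON)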
- Cookies: cookies
import requests
response = requests.get("http://www.baidu.com/")
# response.cookies is a CookieJar object
cookiejar = response.cookies
# Convert the CookieJar to a plain dict
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)
print(cookiejar)
print(cookiedict)
- Session: session (persists cookies across requests)
# Simulated login to renren.com
import requests
# 1. Create a session object, which stores cookie values
ssion = requests.session()
# 2. Set up the headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# 3. Username and password needed to log in
data = {"email":"[email protected]", "password":"alarmchime"}
# 4. Send the login request with the credentials; the cookies set after login are stored in ssion
ssion.post("http://www.renren.com/PLogin.do", data = data)
# 5. ssion now carries the logged-in cookies, so pages that require login can be fetched directly
response = ssion.get("http://www.renren.com/410043129/profile")
# 6. Print the response body
print(response.text)
- Response history: history
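When redirects are followed (the default), response.history holds the intermediate 3xx responses in order; a minimal sketch reusing the github.com redirect from above:
r = requests.get('http://github.com')
print(r.history)      # e.g. [<Response [301]>]
print(r.url)          # e.g. https://github.com/
print(r.status_code)  # e.g. 200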
3. Three ways to handle encoding/decoding issues with the Requests module
- response.content.decode()
- response.content.decode('gbk')
- response.text
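The difference in practice, reusing the example site from above (which URL you hit does not matter; pick decode('gbk') only when the page is actually served in a legacy Chinese encoding):
import requests
response = requests.get("http://www.chinahufei.com")
# 1. Decode the raw bytes yourself; UTF-8 is the default argument
html = response.content.decode()
# 2. Decode explicitly as GBK for pages served in a legacy Chinese encoding
# html = response.content.decode('gbk')
# 3. Let requests decode for you; override its guess by setting response.encoding first
response.encoding = 'utf-8'
html = response.text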