Web Scraping with requests

阿新 • Published: 2018-05-07


requests

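The Python standard library ships several modules for making HTTP requests: urllib, urllib2 and httplib (in Python 3 these became urllib.request and http.client). Their APIs, however, are clumsy. They were built for another era and another Internet, and they take a huge amount of work, even method overrides, to accomplish the simplest tasks.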

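Requests is an HTTP library written in Python and released under the Apache2 license. It wraps the lower-level HTTP machinery (urllib3) in a much higher-level API, which makes network requests far more pleasant for Python developers. With Requests you can easily reproduce most of the HTTP operations a browser performs.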

I. The difference between content and text on a requests response

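Conclusion
resp.text returns the body as Unicode text (str).
resp.content returns the body as bytes, that is, binary data.
In practice
If you want text, use r.text.
If you want an image or a file, use r.content.
(resp.json() parses a JSON body and returns the corresponding Python object.)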
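The relationship between the two can be sketched without any network access. The snippet below is only an approximation of what r.text does with r.content: it decodes the raw bytes with a chosen encoding and replaces undecodable bytes. The helper name decode_body and the sample string are illustrative assumptions, not part of Requests.

```python
def decode_body(content: bytes, encoding: str) -> str:
    """Decode raw body bytes, roughly the way r.text decodes r.content."""
    return content.decode(encoding, errors="replace")


# Stand-in for r.content: the bytes a GB2312-encoded page would send.
content = "汽车之家".encode("gb2312")

right = decode_body(content, "gb2312")  # matching encoding: readable text
wrong = decode_body(content, "utf-8")   # wrong encoding: garbled text

print(type(content).__name__, type(right).__name__)
print(right)
print(wrong)
```

This is why setting the response encoding correctly matters before reading r.text, while r.content can be written to a file unchanged.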

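Response attributes and what they hold

r.status_code #HTTP status code of the response
r.raw #raw response body, i.e. the underlying urllib3 response object; pass stream=True to the request and read it with r.raw.read()
r.content #response body as bytes; gzip and deflate compression is decoded for you automatically
r.text #response body as a string, decoded using the character encoding taken from the response headers (r.encoding)

r.headers #server response headers stored in a dict-like object with case-insensitive keys; r.headers.get(key) returns None if the key is missing
#*Special methods*#
r.json() #JSON decoder built into Requests
r.raise_for_status() #raises an exception for failed requests (4xx and 5xx responses)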
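Two of these behaviours are easy to check offline. The sketch below builds the header dictionary and the response object by hand purely for illustration (real code gets both from a call such as requests.get); the header value and the helper name is_error are assumptions.

```python
import requests
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict: lookups ignore the case of the key.
headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=gb2312"})
content_type = headers["content-type"]
missing = headers.get("X-Missing")  # .get() returns None for a missing key


def is_error(status_code: int) -> bool:
    """Report whether raise_for_status() raises for this status code."""
    resp = requests.Response()  # hand-built response, for illustration only
    resp.status_code = status_code
    try:
        resp.raise_for_status()
    except requests.HTTPError:
        return True
    return False


print(content_type, missing)
print({code: is_error(code) for code in (200, 302, 404, 500)})
```

Note that a redirect status such as 302 does not raise: only 4xx and 5xx codes do.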

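1. GET requests

1.1 Example without parameters

# encoding=utf-8
import requests

ret = requests.get('https://www.autohome.com.cn/news/')
print(ret.apparent_encoding)
# Prints the encoding detected from the page body; the original run printed GB2312

ret.encoding = ret.apparent_encoding
# Sets the encoding used to decode ret.text; written this way, the data is decoded with whatever encoding is detected for the page
# print(ret.content)
# print(ret.text)
print(ret.cookies)
ret1 = ret.text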
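For a GET request with parameters, requests.get accepts a params dictionary and encodes it into the query string. A minimal sketch, built offline with requests.Request(...).prepare() so nothing is actually sent; the URL, the parameter names and the helper build_get_url are hypothetical.

```python
import requests


def build_get_url(base_url: str, params: dict) -> str:
    """Return the final URL Requests would use for a GET with these params."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


url = build_get_url("https://example.com/search", {"q": "python", "page": 2})
print(url)
```

Passing the same dictionary as requests.get(base_url, params=...) would send the request to that URL.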

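2. POST requests

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
# cookies1 is not defined in the original snippet; it is assumed here to come from a first GET to the site
cookies1 = requests.get('https://dig.chouti.com/', headers=headers).cookies.get_dict()

ret1 = requests.post(
    url='https://dig.chouti.com/login',  # URL the data is submitted to
    data={"phone": "<your phone number>", "password": "<your password>", "oneMonth": "1"},  # form data to submit; use your own credentials
    headers=headers,  # extra fields added to the request headers
    cookies=cookies1,  # cookies sent along with the request
)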
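requests.post takes the payload either as data (sent as a form) or as json (sent as a JSON document). The sketch below prepares both kinds of request without sending them, to show how the body and the Content-Type header differ; the URL, the field names and the helper prepare_post are hypothetical.

```python
import json

import requests


def prepare_post(url: str, **kwargs) -> requests.PreparedRequest:
    """Build a POST request without sending it, so it can be inspected."""
    return requests.Request("POST", url, **kwargs).prepare()


# data=... is sent as a URL-encoded form.
form_req = prepare_post("https://example.com/login",
                        data={"user": "alice", "remember": "1"})
print(form_req.headers["Content-Type"], form_req.body)

# json=... is serialized to a JSON document.
json_req = prepare_post("https://example.com/login", json={"user": "alice"})
decoded = json.loads(json_req.body)
print(json_req.headers["Content-Type"], decoded)
```

Which of the two a site expects depends on the site; a login form like the one above typically uses data.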

3. Other requests

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
  
# All of the functions above are built on top of this one
requests.request(method, url, **kwargs)

4. More parameters

Official documentation: http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#id4

BeautifulSoup

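BeautifulSoup is a module that takes an HTML or XML string and parses it into a structured tree. You can then use the methods it provides to quickly locate specific elements, which makes finding elements in HTML or XML simple.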

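Parsing an HTML string

pip3 install beautifulsoup4

from bs4 import BeautifulSoup

# The first argument is the data fetched with requests above; the second is the parser (lxml is the usual choice in production)
soup = BeautifulSoup('<html>....</html>', "html.parser")

div = soup.find(name='tag_name')
div = soup.find(name='tag_name', id='i1')
div = soup.find(name='tag_name', class_='i1')  # the keyword is class_, because class is reserved in Python
div = soup.find(name='div', attrs={'id': 'auto-channel-lazyload-article', 'class': 'id'})  # attrs lists the attributes the tag must have and their values

# find returns None when nothing matches, so check the result before using it
div.text  # text content inside the tag
div.attrs  # dict of all attributes on the tag
div.get('href')  # value of a single attribute on the tag

divs = soup.find_all(name='tag_name')  # find_all returns a list of matches
divs = soup.find_all(name='tag_name', id='i1')
divs = soup.find_all(name='tag_name', class_='i1')
divs = soup.find_all(name='div', attrs={'id': 'auto-channel-lazyload-article', 'class': 'id'})

# divs is a list, so index into it
divs[0]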
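To see find and find_all working end to end, here is a small self-contained sketch that parses an inline HTML string instead of a downloaded page. The markup, ids and class names are made up for illustration; only the BeautifulSoup calls themselves come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page fetched with requests.
HTML = """
<div id="news" class="list">
  <a class="item" href="/a/1.html">First</a>
  <a class="item" href="/a/2.html">Second</a>
  <a class="other" href="/b/3.html">Third</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find returns the first matching tag, or None if there is no match.
container = soup.find(name="div", attrs={"id": "news"})
missing = soup.find(name="table")

# find_all returns every match; class_ filters on the class attribute.
links = container.find_all(name="a", class_="item")
hrefs = [a.get("href") for a in links]
titles = [a.text for a in links]

print(hrefs)
print(titles)
print(missing)
```

Searching from container rather than soup limits the search to that tag's descendants, which is handy when a page repeats the same class names in several places.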
