Understanding HTTP Messages Through the Requests Library

阿新 • Published: 2017-12-07


Background

Official requests documentation

http://docs.python-requests.org/en/master/

Author's blog

http://www.kennethreitz.org

GitHub repository

https://github.com/requests/requests

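Environment setup

Basic environment

Python and pip

Note: pip freeze lists the packages currently installed with pip.

Installing other required software

virtualenv

Purpose

Initializes an empty environment (this step is optional).

pip and easy_install exist so that your packages are properly managed, much like deb and similar package formats.

The main reason to use virtualenv is to isolate the Python environment you are developing or deploying in from other Python environments. For example, if Proj1 needs LibA 1.0 while Proj2 needs LibA 2.0, installing both into the same environment can cause conflicts.

Each virtualenv environment is isolated and holds its own copies of the packages it needs, so you can think of it as a sandbox. The Python interpreter that pip and easy_install run against is isolated as well: it is $VENV_HOME/bin/python rather than /usr/bin/python.

Packages installed with yum or apt generally go into system paths; for Python that means the global PYTHONPATH, which makes isolation hard.

Packages installed from source are generally isolated according to the python command in effect at install time. That is, once a virtualenv is active (source $VENV_HOME/bin/activate), packages installed from source also land in that venv.

Install virtualenv

pip install virtualenv

Initialize the current folder with virtualenv

virtualenv .env

Activate the virtualenv in the current folder

Windows: .env\Scripts\activate

Linux: source .env/bin/activate

Note: deactivate exits the virtual environment.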
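To confirm that activation worked, the interpreter can be asked directly. The following is a minimal sketch, assuming nothing beyond the standard library; the helper name in_virtual_environment is made up for this illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module and
    # newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

After source .env/bin/activate the interpreter path should point into .env, and after deactivate it should point back to the system Python.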

The requests library

pip install requests

httpbin.org

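Used for testing. The http://httpbin.org/ server is located in the United States, so you may prefer to run a similar service locally:

pip install gunicorn httpbin

gunicorn httpbin:app

Note: gunicorn targets UNIX and does not work on Windows.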

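HTTP protocol

Concept

HyperText Transfer Protocol

The Hypertext Transfer Protocol is a stateless, application-level protocol for distributed, collaborative, hypertext information systems.

Showing the full course of one HTTP exchange

curl -v https://www.imooc.com/ >/data01/yc_files/http_contact_demo.log 2>&1

Note: curl -v writes the request and response headers to stderr, so 2>&1 is needed to capture them in the log file along with the body.

Analysis of the output

[Figure: analysis of the curl output]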
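The response half of such an exchange has a fixed layout: a status line, header lines, an empty line, then the body. The sketch below splits a raw response along those boundaries. It is an illustration only: the function name parse_http_response is hypothetical, and SAMPLE is a hand-written message, not output captured from a real server.

```python
def parse_http_response(raw):
    """Split a raw HTTP/1.1 response (bytes) into its parts."""
    # Headers and body are separated by an empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP-version SP status-code SP reason-phrase
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status_code), reason, headers, body


# Hand-written sample message, used only for illustration.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 17\r\n"
    b"\r\n"
    b'{"origin": "::1"}'
)

version, status_code, reason, headers, body = parse_http_response(SAMPLE)
print(version, status_code, reason)
print(headers)
print(body)
```

A request message follows the same layout, except that its first line is the request line (method, target, version) instead of the status line.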

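urllib

Concept

Python's built-in networking library

How urllib, urllib2 and urllib3 relate

urllib and urllib2 are independent modules with no direct relationship; they are used together to implement more complex functionality.

urllib2 exists only in Python 2; in Python 3 the two modules were merged into the urllib package (urllib.request, urllib.parse, urllib.error).

requests is built on urllib3, which reuses a single socket across multiple requests.

Differences between urllib and requests

[Figure: comparison of urllib and requests]

urllib_demo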

# -*- coding:utf-8 -*-
# Python 2 only: urllib2 does not exist in Python 3.
import urllib2
import urllib

URL_simple = "http://httpbin.org/ip"
URL_get = "http://httpbin.org/get"

def urllib_simple_use():
    # Plain GET request
    response = urllib2.urlopen(URL_simple)
    print('>>>>Response Headers:')
    print(response.info())
    print('>>>>Response Status Code:')
    print(response.getcode())
    print('>>>>Response :')
    print(''.join([line for line in response]))

def urllib_params_use():
    # GET request with URL-encoded query parameters
    params = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
    response = urllib2.urlopen('?'.join([URL_get, params]))
    print('>>>>Response Headers:')
    print(response.info())
    print('>>>>Response Status Code:')
    print(response.getcode())
    print('>>>>Response :')
    print(''.join([line for line in response]))

if __name__ == '__main__':
    print('>>>>Urllib Simple Use:')
    urllib_simple_use()
    print('>>>>Urllib Params Use:')
    urllib_params_use()

urllib simple demo
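The demo above runs only on Python 2. For comparison, here is a rough Python 3 counterpart, assuming the same httpbin.org endpoint; the helper names build_get_url and fetch are hypothetical. Only build_get_url is called here, so the block runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

URL_GET = "http://httpbin.org/get"


def build_get_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return "?".join([base_url, urlencode(params)])


def fetch(url):
    """Send a GET request and return (status code, headers, decoded body)."""
    # Needs network access; the response object is closed by the with block.
    with urlopen(url, timeout=10) as response:
        body = response.read().decode("utf-8")
        return response.getcode(), response.info(), body


url = build_get_url(URL_GET, {"param1": "hello", "param2": "world"})
print(url)
```

With network access, status, headers, body = fetch(url) returns roughly the same information that the Python 2 demo prints.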

requests_demo

quick start: http://docs.python-requests.org/en/latest/user/quickstart/

# -*- coding:utf-8 -*-
import requests

URL_simple = "http://httpbin.org/ip"
URL_get = "http://httpbin.org/get"

def requests_simple_use():
    response = requests.get(URL_simple)
    print('>>>>Request Headers:')
    print(response.request.headers)
    print('>>>>Request body:')
    print(response.request.body)
    print('>>>>Response url:')
    print(response.url)
    print('>>>>Response Headers:')
    print(response.headers)
    print('>>>>Response Status Code:')
    print(response.status_code)
    print('>>>>Response Status Code Reason:')
    print(response.reason)
    print('>>>>Response :')
    print(response.text)
    print(response.json())

def requests_params_use():
    params = {'param1': 'hello', 'param2': 'world'}
    response = requests.get(URL_get, params=params)
    print('>>>>Request Headers:')
    print(response.request.headers)
    print('>>>>Request body:')
    print(response.request.body)
    print('>>>>Response url:')
    print(response.url)
    print('>>>>Response Headers:')
    print(response.headers)
    print('>>>>Response Status Code:')
    print(response.status_code)
    print('>>>>Response Status Code Reason:')
    print(response.reason)
    print('>>>>Response :')
    print(response.text)
    print(response.json())

if __name__ == '__main__':
    print('>>>>Requests Simple Use:')
    requests_simple_use()
    print('>>>>Requests Params Use:')
    requests_params_use()

requests simple demo

Further reading

Read RFC 7230 through RFC 7235

https://tools.ietf.org/html/

Sending requests

import requests
import json
from requests.exceptions import RequestException

ROOT_URL = 'https://api.github.com'

def better_print(src_json):
    '''
        Pretty-print JSON
    '''
    return json.dumps(src_json, indent=4)

def get_simple_use():
    '''
        Fetch a given user's profile through the API
    '''
    response = requests.get('/'.join([ROOT_URL, 'users/wahaha']))
    print('>>>>Response text:')
    print(better_print(response.json()))

def get_auth_use():
    '''
        Fetch the authenticated user's email addresses through the API.
        The credentials travel in plain text, so this approach is not recommended.
        (123456 is, of course, not the real password.)
        Simple OAuth authentication is the better choice.
    '''
    # response = requests.get('/'.join([ROOT_URL, 'user/emails']))
    response = requests.get('/'.join([ROOT_URL, 'user/emails']), auth=('wahaha', '123456'))
    print('>>>>Response text:')
    print(better_print(response.json()))

def get_params_use():
    '''
        GET with URL parameters: fetch the users that come after ID 11
    '''
    response = requests.get('/'.join([ROOT_URL, 'users']), params={'since': 11})
    print('>>>>Request Headers:')
    print(response.request.headers)
    print('>>>>Request body:')
    print(response.request.body)
    print('>>>>Response url:')
    print(response.url)
    print('>>>>Response text:')
    print(better_print(response.json()))

def patch_json_use():
    '''
        JSON parameters + PATCH: update the user's email
    '''
    response = requests.patch('/'.join([ROOT_URL, 'user']), auth=('wahaha', '123456'), json={'name': 'test_name', 'email': '[email protected]'})
    print('>>>>Request Headers:')
    print(response.request.headers)
    print('>>>>Request body:')
    print(response.request.body)
    print('>>>>Response url:')
    print(response.url)
    print('>>>>Response text:')
    print(better_print(response.json()))

def request_exception_use():
    '''
        Set a timeout + handle exceptions
        timeout = x        x applies to both the connect timeout and the read timeout
        timeout = (x, y)   connect timeout x, read timeout y
    '''
    try:
        response = requests.get('/'.join([ROOT_URL, 'users']), timeout=(0.1, 0.2), params={'since': 11})
    except RequestException as e:
        print(e)
    else:
        print('>>>>Request Headers:')
        print(response.request.headers)
        print('>>>>Request body:')
        print(response.request.body)
        print('>>>>Response url:')
        print(response.url)
        print('>>>>Response text:')
        print(better_print(response.json()))

def my_request_use():
    '''
        Roughly mimic how requests sends a request under the hood
    '''
    # imports
    from requests import Request, Session
    # create the session
    my_session = Session()
    # set the headers
    my_headers = {'User-Agent': 'fake1.1.1'}
    # build the request
    my_request = Request('GET', '/'.join([ROOT_URL, 'users']), headers=my_headers, params={'since': '11'})
    # prepare the request
    my_prepared_request = my_request.prepare()
    # send the request and receive the response
    my_response = my_session.send(my_prepared_request, timeout=(3, 3))
    print('>>>>Request Headers:')
    print(json.dumps(dict(my_response.request.headers), indent=4))
    print('>>>>Request body:')
    print(my_response.request.body)
    print('>>>>Response url:')
    print(my_response.url)
    print('>>>>Response Headers:')
    print(json.dumps(dict(my_response.headers), indent=4))
    print('>>>>Response Status Code:')
    print(my_response.status_code)
    print('>>>>Response Status Code Reason:')
    print(my_response.reason)
    # print('>>>>Response :')
    # print(better_print(my_response.json()))

def hook_function(response, *args, **kw):
    print('hook function >>>', response.headers['Content-Type'])

def event_hook_use():
    '''
        Event hook, i.e. a callback: the function to call once the response arrives
    '''
    response = requests.get('http://www.baidu.com', hooks={'response': hook_function})

if __name__ == "__main__":
    # get_simple_use()
    # get_auth_use()
    # get_params_use()
    # patch_json_use()
    # request_exception_use()
    # my_request_use()
    event_hook_use()

requests library usage demo

Parameter types

Different site APIs expect different parameter types in the requests sent to them.

https://developer.github.com/v3/#parameters

1. URL parameters

https://xxx/xx?a=1&b=2&c=3

With requests: requests.get(url, params={'a':'1','b':'2','c':'3'})

Advantages: convenient for navigation and linking, fast

Disadvantages: visible in plain text in the URL, limited in length

2. Form parameters

Content-Type: application/x-www-form-urlencoded

With requests: requests.post(url, data={'a':'1','b':'2','c':'3'})

3. JSON parameters

Content-Type: application/json

With requests: requests.post(url, json={'a':'1','b':'2','c':'3'})
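The three parameter types differ only in where the data ends up in the HTTP message and how it is encoded. The sketch below reproduces that with the standard library instead of requests, so it shows roughly what goes on the wire; the helper names are hypothetical and the URL is the placeholder from the list above.

```python
import json
from urllib.parse import urlencode

PARAMS = {"a": "1", "b": "2", "c": "3"}


def as_url_params(url, params):
    """URL parameters: encoded into the query string, no request body."""
    return url + "?" + urlencode(params)


def as_form_body(params):
    """Form parameters: the same encoding, but carried in the request body."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return headers, urlencode(params).encode("ascii")


def as_json_body(params):
    """JSON parameters: serialized as a JSON document in the request body."""
    headers = {"Content-Type": "application/json"}
    return headers, json.dumps(params).encode("utf-8")


print(as_url_params("https://xxx/xx", PARAMS))
print(as_form_body(PARAMS))
print(as_json_body(PARAMS))
```

The query string and the form body are byte-for-byte the same encoding; only the Content-Type header and the position in the message change.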

Example site

https://developer.github.com

Usage reference: https://developer.github.com/v3/guides/getting-started/

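Request methods

GET -- retrieve a resource

POST -- create a resource

PATCH -- modify a resource (partial update)

PUT -- modify a resource (replaces it as a whole, a bigger change than PATCH)

DELETE -- delete a resource

HEAD -- retrieve only the response headers

OPTIONS -- list the request methods available

Exception handling

All exceptions that requests explicitly raises inherit from requests.exceptions.RequestException, so catching that one exception is enough.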

Custom requests

[email protected]:requests/requests.git

Reading the requests source is a good way to understand Session (proxy, timeout, verify), PreparedRequest (body, headers, auth) and Response (text, json...).

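Handling responses

Basic response API

Response metadata

status_code -- the response status code

reason -- the textual explanation of the status code

headers -- the response headers

url -- the URL the response came from

history -- the redirect history

elapsed -- how long the call took

request -- the request this response answers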
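The pairing of status_code and reason follows the standard HTTP status table, which Python 3 also ships as http.HTTPStatus. A small sketch, assuming only the standard library; describe_status and STATUS_CLASSES are names invented for this example.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return 'code reason (class)' for a standard HTTP status code."""
    status = HTTPStatus(code)
    return "{} {} ({})".format(status.value, status.phrase, STATUS_CLASSES[code // 100])


for code in (200, 301, 404, 500):
    print(describe_status(code))
```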

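Response body

encoding -- the encoding of the body

content -- the body as bytes (str in Python 2)

text -- the body as Unicode text (unicode in Python 2, str in Python 3)

json -- the body parsed as JSON (a method: response.json())

raw -- in stream mode, returns a urllib3.response.HTTPResponse object from which the data can be read in chunks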

>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

iter_content -- in stream mode, yields the body in chunks of bytes, with gzip and deflate decoded automatically

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

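Note 1: the difference between iter_content and raw

Response.iter_content handles a lot of what you would otherwise have to handle yourself when using Response.raw directly. When streaming a download, the pattern above is the preferred and recommended way to retrieve the content. Note that chunk_size can be freely adjusted to a number that better fits your use case.

Response.iter_content automatically decodes the gzip and deflate transfer-encodings. Response.raw is a raw stream of bytes; it does not transform the response content. If you really need access to the bytes as they were returned, use Response.raw.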
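The bytes returned by r.raw.read(10) above begin with \x1f\x8b, which is the gzip magic number: the raw stream is still compressed. The sketch below imitates that situation with the standard gzip module. The payload is made up, and looks_gzipped is a hypothetical helper.

```python
import gzip

# Made-up payload standing in for a JSON response body.
body = b'{"message": "hello"}'

# What a server answering with Content-Encoding: gzip would put on the wire.
wire_bytes = gzip.compress(body)


def looks_gzipped(data):
    """Check for the two-byte gzip magic number."""
    return data[:2] == b"\x1f\x8b"


print(looks_gzipped(wire_bytes))    # the raw view: still compressed
print(gzip.decompress(wire_bytes))  # the decoded view: the original body
```

Reading from Response.raw corresponds to the first view, and iterating with Response.iter_content corresponds to the second.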

Note 2: when a file is read in stream mode, the stream must be closed at the end.

http://www.cnblogs.com/Security-Darren/p/4196634.html

# -*- coding:utf-8 -*-

def get_image(img_url):
    import requests
    # response = requests.get(img_url, stream=True)
    # use contextlib.closing so that the stream is always closed
    from contextlib import closing
    # with closing(requests.get(img_url, stream=True, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'})) as response:
    with closing(requests.get(img_url, stream=True)) as response:
        print('Request Headers>>:\n', response.request.headers)
        # write the body to disk 128 bytes at a time
        with open('demo1.jpg', 'wb') as img_file:
            for chunk in response.iter_content(128):
                img_file.write(chunk)


def main():
    img_url = 'https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1512314249355&di=6371398ddfba39ce23fc02b74b2d59cf&imgtype=0&src=http%3A%2F%2Fpic.58pic.com%2F58pic%2F16%2F42%2F96%2F56e58PICAu9_1024.jpg'
    # img_url = 'http://img5.imgtn.bdimg.com/it/u=2502798296,3184925683&fm=26&gp=0.jpg'
    get_image(img_url)

if __name__ == '__main__':
    main()

requests: download an image and save it locally
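The point of writing in chunks is that memory use stays bounded by the chunk size, however large the download is. The same loop can be tried offline with in-memory streams. This is a sketch with a hypothetical helper copy_in_chunks, using io.BytesIO in place of the network response and the output file.

```python
import io


def copy_in_chunks(source, destination, chunk_size=128):
    """Copy a binary stream piece by piece and return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total


source = io.BytesIO(b"x" * 300)  # stands in for the streamed response
destination = io.BytesIO()       # stands in for the file opened with 'wb'
copied = copy_in_chunks(source, destination)
print(copied)
```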

Event hooks (callbacks)

See the code under "Sending requests".
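Conceptually, a hook is just a function looked up by event name and called with the response. The following is a simplified sketch of that dispatch idea, not the library's actual code: dispatch_hooks, FakeResponse and record_content_type are all invented for this illustration.

```python
def dispatch_hooks(hooks, event, data, **kwargs):
    """Call every callback registered for event, passing data through them."""
    callbacks = hooks.get(event, [])
    if callable(callbacks):
        callbacks = [callbacks]  # allow a single function instead of a list
    for callback in callbacks:
        result = callback(data, **kwargs)
        if result is not None:
            data = result  # a callback may return a replacement object
    return data


class FakeResponse:
    """Stand-in for a response object; only headers is modelled."""

    def __init__(self, headers):
        self.headers = headers


seen = []


def record_content_type(response, *args, **kwargs):
    seen.append(response.headers["Content-Type"])


response = FakeResponse({"Content-Type": "text/html"})
returned = dispatch_hooks({"response": record_content_type}, "response", response)
print(seen)
```

The callback has the same signature as hook_function in the demo, so it could be passed to requests through hooks={'response': ...} in the same way.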

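HTTP authentication

Basic authentication

How it works

[Figure: how basic authentication works]

Usage

This is the plain-text auth shown earlier:

requests.get('https://api.github.com/user', auth=('wahaha', '123456'))

Result

With this approach requests sends the credentials Base64-encoded. Base64 is an encoding, not encryption, so decoding the header directly yields the username and password.

The headers look like this:

[Figure: request headers with basic authentication]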
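How little protection Base64 offers is easy to check with the standard library. This sketch builds the Authorization header for the demo credentials and decodes it again; both helper names are hypothetical, and UTF-8 is assumed for the credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for basic authentication."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from a basic Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a basic authentication header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("wahaha", "123456")
print(header)
print(decode_basic_auth_header(header))
```

Anyone who can read the request can run the second function, which is why basic authentication should only be used over HTTPS.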

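OAuth authentication

How it works

[Figure: how OAuth works]

The OAuth flow

http://www.barretlee.com/blog/2016/01/10/oauth2-introduce/

Official custom authentication example

http://www.python-requests.org/en/master/user/advanced/#custom-authentication

Usage

Simple usage: under Settings -> Developer settings -> Personal access tokens, generate a test token and select its scopes, then put the token value in the request headers.

Result

The headers look like this:

[Figure: request headers carrying the token]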
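Placing the token in the headers amounts to sending one extra Authorization line. A minimal sketch using the standard library's Request object: it assumes the "token <value>" scheme that the GitHub v3 documentation describes, and YOUR_TOKEN is a placeholder, not a real credential. No request is actually sent.

```python
from urllib.request import Request


def build_token_request(url, token):
    """Build (but do not send) a GET request that carries a personal access token."""
    # Assumption: the API accepts "Authorization: token <value>".
    return Request(url, headers={"Authorization": "token " + token})


request = build_token_request("https://api.github.com/user", "YOUR_TOKEN")
print(request.get_method(), request.full_url)
print(request.header_items())
```

With requests the same header can be passed as headers={'Authorization': 'token ' + token}.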

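Proxy

How it works

[Figure: how a proxy works]

Background

The local machine cannot reach the external network.

The proxy server can reach the external network.

Once a proxy server is configured, every request bound for the external network is sent to the proxy server, which forwards it to the external server; the external server returns the response to the proxy server, which passes it back to the local machine.

More: search for heroku + socks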
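On the client side, configuring a proxy is usually a mapping from URL scheme to proxy address. The sketch below wires such a mapping into a standard-library opener without sending anything. The address 127.0.0.1:8080 is a made-up placeholder and make_proxy_opener is a hypothetical helper.

```python
from urllib.request import ProxyHandler, build_opener

# Placeholder proxy address; replace it with a proxy you actually run.
PROXIES = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}


def make_proxy_opener(proxies):
    """Return an opener that routes matching requests through the given proxies."""
    return build_opener(ProxyHandler(proxies))


opener = make_proxy_opener(PROXIES)
proxy_handlers = [h for h in opener.handlers if isinstance(h, ProxyHandler)]
print(proxy_handlers[0].proxies)
```

requests accepts the same kind of mapping through its proxies argument, for example requests.get(url, proxies=PROXIES).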

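cookie

How it works

HTTP itself is stateless, meaning every HTTP request should be independent. In practice, however, later requests often depend on the results of earlier ones, and cookies were introduced to solve this.

1. The browser sends its first HTTP request (without a cookie).

2. The server returns an HTTP response carrying a cookie.

3. The browser parses the cookie in the response and stores it locally.

4. The browser sends its second request (with the cookie).

[Figure: cookie exchange between browser and server]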
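The four steps above can be sketched with the standard http.cookies module, which plays the browser's part of parsing Set-Cookie and building the Cookie header. The cookie name and value are made up, and both helper names are hypothetical.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(header_value):
    """Step 3: parse a Set-Cookie header value into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


def cookie_header(cookies):
    """Step 4: build the Cookie header value for the next request."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


# Step 2: a made-up Set-Cookie header as a server might return it.
set_cookie = "sessionid=abc123; Path=/; HttpOnly"
stored = cookies_from_set_cookie(set_cookie)
print(stored)
print("Cookie:", cookie_header(stored))
```

Attributes such as Path and HttpOnly stay with the browser; only the name=value pair is sent back to the server.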

Official usage

http://docs.python-requests.org/en/master/user/quickstart/#cookies

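session

How it works

Cookies attach the user's data to every request, and data travelling over the network is not safe, which is why sessions were introduced. A session keeps the main data on the server and returns to the browser only a cookie carrying a session ID, which to some extent improves both data security and transfer efficiency.

[Figure: how sessions work]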
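The server side of that idea fits in a few lines: the data lives in a server-side store, and only a random identifier goes to the browser. The following is a toy sketch kept in memory. SessionStore is a hypothetical class, not part of requests or of any web framework.

```python
import uuid


class SessionStore:
    """Toy in-memory session store keyed by a random session ID."""

    def __init__(self):
        self._sessions = {}

    def create(self, data):
        """Store data on the server and return the ID to place in a cookie."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = dict(data)
        return session_id

    def load(self, session_id):
        """Look up the data for the ID that came back in the Cookie header."""
        return self._sessions.get(session_id)


store = SessionStore()
session_id = store.create({"user": "wahaha"})
print("Set-Cookie: session-id={}".format(session_id))
print(store.load(session_id))
```

The cookie now carries only the ID, so the user's data never leaves the server.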
