spider.?-Using urllib.request and requests in Python, and how they differ

阿新 • Published: 2020-10-10

Reposted from: https://blog.csdn.net/qq_38783948/article/details/88239109

1.urllib.request

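As we all know, urlopen() can issue the most basic kind of request, but in practice that is usually not enough: when we need to pass headers or similar parameters, the request has to be built with the more capable Request class.

When no other parameters need to be configured, urlopen() can be called directly to issue a simple web request.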

1.1 Issuing a simple request

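import urllib.request

url = 'https://www.baidu.com'

# urlopen() sends a GET request and returns an http.client.HTTPResponse
webPage = urllib.request.urlopen(url)

print(webPage)
print('--------------------------------------')
# read() returns the raw body as bytes
data = webPage.read()

print(data)
print('--------------------------------------')
# decode the bytes to str to get readable page source
print(data.decode('utf-8'))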

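urlopen() returns an http.client.HTTPResponse object, which needs further processing through read(). read() returns bytes, so we normally follow it with decode(), usually with utf-8, and only after these steps do we get the page text we want.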
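
The read-then-decode steps can be wrapped in a small helper. The sketch below is one possible way to do it, assuming the server may declare its charset in the Content-Type header. fetch_text and decode_body are hypothetical names, and falling back to utf-8 is an assumption rather than a rule.

```python
import urllib.request


def decode_body(raw, charset=None, fallback='utf-8'):
    """Decode response bytes, using the declared charset when there is one."""
    # errors='replace' keeps going on bytes that do not fit the encoding
    return raw.decode(charset or fallback, errors='replace')


def fetch_text(url):
    """Open url with urlopen() and return the body as str."""
    # The with-block closes the connection once the body has been read
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        # Charset taken from the Content-Type header; None when absent
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset)
```

Calling fetch_text('https://www.baidu.com') would then return the page source as a str.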

1.2 Adding Headers

import urllib.request

url = 'https://www.douban.com'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}

# Request only describes the request; nothing is sent yet
request = urllib.request.Request(url=url, headers=headers)

# urlopen() accepts a Request object as well as a URL string
webPage = urllib.request.urlopen(request)

print(webPage.read().decode('utf-8'))

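Instantiating the Request class gives a urllib.request.Request object, which describes the request and is then passed to urlopen().
When crawling pages, we usually need to add extra information while constructing the HTTP request, such as a User-Agent or cookies, or to route it through a proxy server. These are often necessary to get past a site's anti-crawling mechanisms.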
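
Cookies and a proxy can be attached on the urllib.request side through an opener. The following is a minimal sketch, assuming a proxy listening at 127.0.0.1:8080 (a placeholder address, not one from the original post). Nothing is sent until opener.open() is called.

```python
import http.cookiejar
import urllib.request

url = 'https://www.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}

# Placeholder proxy address (assumption): replace with a real proxy
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}

# The jar stores cookies set by responses and replays them on later requests
cookie_jar = http.cookiejar.CookieJar()

# An opener combines the handlers that every request will pass through
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler(proxies),
    urllib.request.HTTPCookieProcessor(cookie_jar),
)

request = urllib.request.Request(url=url, headers=headers)

# Sending is left commented out because it needs the proxy to be reachable:
# with opener.open(request, timeout=10) as response:
#     print(response.read().decode('utf-8'))
```

Using opener.open() in place of urlopen() keeps the proxy and cookie settings local to this opener instead of changing them globally.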

2.requests

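Generally speaking, the requests library is the better choice when writing a crawler in Python, because it is more convenient than urllib: requests builds and sends a GET or POST request in a single call, whereas with urllib.request you typically build the request first and then send it with urlopen().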
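
To make the two-step pattern concrete, here is a sketch of a form POST with urllib.request: the body is encoded by hand, wrapped in a Request, and only then passed to urlopen(). The endpoint https://example.com/login and the form fields are illustrative assumptions, not part of the original post.

```python
import urllib.parse
import urllib.request

url = 'https://example.com/login'          # hypothetical endpoint
form = {'user': 'alice', 'lang': 'zh-TW'}  # hypothetical form fields

# Step 1: build the request. The form is url-encoded and turned into bytes.
body = urllib.parse.urlencode(form).encode('utf-8')
request = urllib.request.Request(url=url, data=body)

# Supplying data switches the method from GET to POST
print(request.get_method())  # POST

# Step 2: send it (commented out because the endpoint is a placeholder)
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

With requests the same request is a single call, requests.post(url, data=form), which encodes the form itself.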

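import requests

url = 'https://www.douban.com'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}

# GET request; params would carry query-string parameters
get_response = requests.get(url, headers=headers, params=None)

# POST request; data is a form body, json is a JSON body
post_response = requests.post(url, headers=headers, data=None, json=None)

print(post_response)         # the Response object itself
print(get_response.text)     # body decoded to str
print(get_response.content)  # raw body as bytes

# json() is a method; it raises when the body is not JSON (this page is HTML)
try:
    print(get_response.json())
except ValueError:
    print('response body is not JSON')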

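get_response.text is of type str.
get_response.content is of type bytes and has to be decoded before use; it carries the same body as get_response.text.
get_response.json() is a method, so it needs the parentheses; it parses the body as JSON and returns the resulting Python object, and raises an error when the body is not valid JSON.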
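
The relationship between the three views can be illustrated without a network call. The sketch below uses only the standard library to mimic what content, text and json() give you, starting from a made-up JSON body. It illustrates the idea and is not how requests is implemented internally.

```python
import json

# Made-up response body (assumption), as it would arrive over the wire
content = '{"title": "豆瓣", "count": 2}'.encode('utf-8')

# .content corresponds to the raw bytes
print(type(content))

# .text corresponds to those bytes decoded to str
text = content.decode('utf-8')
print(type(text))

# .json() corresponds to parsing the text into Python objects
parsed = json.loads(text)
print(parsed['title'], parsed['count'])
```

In practice, reach for text when you want page source, content for binary data such as images, and json() for API responses.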

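In short, requests offers a higher-level interface than urllib.request (it is built on the third-party urllib3 package rather than on the standard library's urllib), which makes it more convenient to use, so prefer requests in real projects wherever possible.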
