Python Web Scraping Project
阿新 • Published: 2021-09-04
Sending an HTTP GET request with urlopen
import urllib.request

response = urllib.request.urlopen("https://www.cnblogs.com")
# Decode the response body (bytes) as UTF-8
print(response.read().decode('utf-8'))
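The example above hardcodes UTF-8. As a hedged alternative, the sketch below decodes with whatever charset the server declares in its Content-Type header and only falls back to UTF-8 when none is given. `fetch_text` is a hypothetical helper name; `get_content_charset()` is a standard method of the header object that `urlopen` returns. To keep the demo runnable offline it uses a `data:` URL, which urllib's built-in DataHandler serves without any network access.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Hypothetical helper: fetch url and decode the body with its declared charset.

    Falls back to UTF-8 (the article's choice) when no charset is declared.
    """
    # The with block closes the response once the body has been read
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# data: URLs are handled by urllib's built-in DataHandler, so this runs offline.
# The percent-escapes are the UTF-8 bytes of the two characters 你好.
print(fetch_text("data:text/plain;charset=utf-8,%E4%BD%A0%E5%A5%BD"))
```

Against a real site you would call `fetch_text("https://www.cnblogs.com")` instead. The `with` block also closes the connection, which the original snippet never does explicitly.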
Hands-on project: JD.com
Getting HTTP response information
import urllib.request

response = urllib.request.urlopen("https://www.jd.com")
print("type of response:", type(response))
# Status code, reason phrase, and HTTP version (11 means HTTP/1.1)
print("status:", response.status, " msg:", response.msg, " version:", response.version)
# All headers, first as a message object, then as a list of (name, value) tuples
print('header:', response.headers, "\n\n", response.getheaders())
print('headers-content-type:', response.getheader('Content-Type'))
print(response.read().decode('utf-8'))
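Because www.jd.com can change its headers at any time, the sketch below repeats the same attribute lookups against a small local server so the values are predictable. `DemoHandler` and `describe_response` are hypothetical names, and the server is only a stand-in for the JD.com site. Note that `urlopen` replaces the response's `.msg` with the reason phrase (for example `OK`), and `.version` is an integer such as 10 for HTTP/1.0 or 11 for HTTP/1.1; Python's `http.server` answers with HTTP/1.0 by default.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler so the example runs without network access."""

    def do_GET(self):
        body = "<h1>hello</h1>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request log line on stderr


def describe_response(url):
    """Collect the fields the JD.com example prints into a dict."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return {
            "status": response.status,
            "reason": response.msg,       # urlopen sets .msg to the reason phrase
            "version": response.version,  # 10 for HTTP/1.0, 11 for HTTP/1.1
            "content_type": response.getheader("Content-Type"),
            "body": response.read().decode("utf-8"),
        }


# Port 0 asks the OS for any free port
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
info = describe_response(f"http://127.0.0.1:{server.server_port}/")
server.shutdown()
server.server_close()
print(info)
```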
Sending an HTTP POST request with urlopen
The form data must be URL-encoded and converted to bytes before it is passed as the data argument.
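import urllib.parse
import urllib.request

# URL-encode the fields ("name=Bill&age=30"), then convert the string to bytes
data = bytes(urllib.parse.urlencode({'name': 'Bill', 'age': 30}), encoding='utf-8')
# Passing data makes urlopen send a POST instead of a GET
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode('utf-8'))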
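The key point above is that supplying `data` is what turns the request into a POST. The sketch below builds, without sending, two `Request` objects: one with the article's form encoding and one with a JSON body, so the method, body bytes and Content-Type can be inspected offline. `build_post` is an illustrative name, not part of urllib. When no Content-Type is set, urllib adds `application/x-www-form-urlencoded` itself for requests that carry data; it is set explicitly here only to make the contrast visible.

```python
import json
import urllib.parse
import urllib.request


def build_post(url, fields, as_json=False):
    """Hypothetical helper: build (but do not send) a POST Request."""
    if as_json:
        body = json.dumps(fields).encode("utf-8")
        headers = {"Content-Type": "application/json"}
    else:
        # Same form encoding as the article: name=Bill&age=30
        body = urllib.parse.urlencode(fields).encode("utf-8")
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Passing data= is what makes get_method() return "POST"
    return urllib.request.Request(url, data=body, headers=headers)


form_req = build_post("http://httpbin.org/post", {"name": "Bill", "age": 30})
json_req = build_post("http://httpbin.org/post", {"name": "Bill", "age": 30}, as_json=True)
print(form_req.get_method(), form_req.data)
# urllib stores header names via str.capitalize(), hence "Content-type"
print(json_req.get_method(), json_req.data, json_req.get_header("Content-type"))
```

Either request can then be sent with `urllib.request.urlopen(form_req)`.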
Catching a timeout exception with try/except
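import socket
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen("http://httpbin.org/get", timeout=0.1)
except urllib.error.URLError as e:
    # isinstance() checks whether an object is an instance of a given type, similar to type().
    # A timeout while connecting or sending arrives wrapped in URLError.
    if isinstance(e.reason, socket.timeout):
        print('timed out')
except socket.timeout:
    # A timeout while waiting for the response is raised directly, not wrapped in URLError.
    print('timed out')
print('continue....')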
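One subtlety: in CPython's implementation, `urlopen` wraps a timeout in `URLError` only while connecting and sending the request; a timeout while waiting for the reply is raised as a bare `socket.timeout` (an alias of the built-in `TimeoutError` since Python 3.10). The sketch below exercises that second path with a hypothetical local server that deliberately stalls, and wraps both cases in an illustrative helper, `fetch_with_timeout`, that returns `None` on a timeout and lets other errors propagate.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that stalls instead of answering."""

    def do_GET(self):
        time.sleep(0.5)  # longer than the client timeout; no response is written


def fetch_with_timeout(url, timeout):
    """Return the body as text, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as e:
        # Timeouts while connecting or sending arrive wrapped in URLError
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # DNS failures, refused connections, HTTP errors, ...
    except socket.timeout:
        # Timeouts while waiting for the response are raised unwrapped
        return None


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch_with_timeout(f"http://127.0.0.1:{server.server_port}/", timeout=0.1)
print("timed out" if result is None else result)
server.shutdown()  # returns once the slow handler has finished sleeping
server.server_close()
```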
Setting HTTP request headers
# Change the User-Agent and Host request headers, add a custom header, and submit the request to the web server
from urllib import request, parse

url = "http://httpbin.org/post"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.7113.93 Safari/537.36",
    # The connection still goes to httpbin.org; only the Host header value changes
    "Host": "127.0.0.1",
    "who": 'my python'
}
# Named "form" rather than "dict" so the built-in dict type is not shadowed
form = {
    'name': 'Bill',
    'age': 30
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers)
print(str(req) + "\n\n")
response = request.urlopen(req)
print(response.read().decode('utf-8'))
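To see which headers actually reach the server without depending on httpbin.org, the sketch below posts the same form to a hypothetical local echo server (`EchoHandler`, standing in for httpbin) and reads back what it received. It also sends one request without custom headers, which shows the default `Python-urllib/<version>` User-Agent that urllib sends when none is set. The Host override from the article is left out, since the local server is addressed by 127.0.0.1 anyway; `post_form` is likewise an illustrative name.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin.org/post: echoes what it received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = json.dumps({
            # Header lookups here are case-insensitive, like HTTP header names
            "user_agent": self.headers.get("User-Agent"),
            "who": self.headers.get("who"),
            "form": dict(urllib.parse.parse_qsl(body)),
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default per-request log line on stderr


def post_form(url, form, headers=None):
    """Hypothetical helper: URL-encode form, POST it, and parse the JSON reply."""
    data = urllib.parse.urlencode(form).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers or {})
    with urllib.request.urlopen(req, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/post"
custom = post_form(url, {"name": "Bill", "age": 30},
                   headers={"User-Agent": "Mozilla/5.0 (demo)", "who": "my python"})
default = post_form(url, {"name": "Bill", "age": 30})
server.shutdown()
server.server_close()
print(custom)
print(default)
```

Form values come back as strings (`"age": "30"`), because URL encoding carries only text.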
This article comes from cnblogs. Author: Zeker62. Please credit the original link when reposting: https://www.cnblogs.com/Zeker62/p/15227504.html