
Using the urllib Library in Python 3

1. urlopen()

import urllib.request
# Send a GET request to Baidu's home page
response = urllib.request.urlopen("http://www.baidu.com")

For an http:// URL, urlopen(url) returns an object of type HTTPResponse:

print(type(response))

<class 'http.client.HTTPResponse'>
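The sketch below repeats the same check without depending on www.baidu.com: it assumes a throwaway local server built with the standard library's http.server (the HelloHandler class is hypothetical, purely for illustration). It also uses urlopen as a context manager so the connection is closed afterwards, and passes the timeout parameter (in seconds).

```python
import http.client
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local handler standing in for www.baidu.com."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Port 0 lets the OS pick a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# The with-block closes the response when done; timeout is in seconds.
with urllib.request.urlopen(url, timeout=5) as response:
    print(type(response))  # <class 'http.client.HTTPResponse'>
    is_http_response = isinstance(response, http.client.HTTPResponse)

server.shutdown()
server.server_close()
```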

We can use read() to get the page's source code. read() returns bytes, so we decode them into a string:

print(response.read().decode('utf-8'))
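Hard-coding 'utf-8' only works when the page really is UTF-8. A slightly more defensive sketch reads the charset from the Content-Type header via response.headers.get_content_charset() and falls back to UTF-8 when none is declared (as with Baidu's plain 'text/html' in the header output further down). The local server, the GbkHandler class, and the fetch_text helper are hypothetical, chosen so the example runs offline and serves a GBK page that decode('utf-8') would fail on.

```python
import http.server
import threading
import urllib.request

PAGE = "<html><body>你好</body></html>"


class GbkHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local handler that serves a GBK-encoded page."""

    def do_GET(self):
        body = PAGE.encode("gbk")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=gbk")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_text(url, default_charset="utf-8"):
    """Hypothetical helper: decode the body using the declared charset, if any."""
    with urllib.request.urlopen(url, timeout=5) as response:
        raw = response.read()  # bytes, not str
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


server = http.server.HTTPServer(("127.0.0.1", 0), GbkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

text = fetch_text(url)  # decoded with "gbk", so text == PAGE

server.shutdown()
server.server_close()
```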

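We can also get the status code and the headers. The status attribute holds the status code, getheaders() returns all headers as a list of (name, value) tuples, and getheader(key) returns the value of the header named key. For example, the third line of the code below passes Server as the argument and gets back BWS/1.1. Incidentally, BWS/1.1 is a web server that Baidu developed in-house.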

print(response.status)
print(response.getheaders())
print(response.getheader('Server'))

200
[('Bdpagetype', '1'), ('Bdqid', '0x86df3e810007cff5'), ('Cache-Control', 'private'), ('Content-Type', 'text/html'), ('Cxy_all', 'baidu+a3b3120deaa50f29fcbabca556087115'), ('Date', 'Sun, 23 Sep 2018 12:57:25 GMT'), ('Expires', 'Sun, 23 Sep 2018 12:56:31 GMT'), ('P3p', 'CP=" OTI DSP COR IVA OUR IND COM "'), ('Server', 'BWS/1.1'), ('Set-Cookie', 'BAIDUID=67D8FDD012F5DDD1D66D4CF65F0097DE:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'BIDUPSID=67D8FDD012F5DDD1D66D4CF65F0097DE; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'PSTM=1537707445; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'delPer=0; expires=Tue, 15-Sep-2048 12:56:31 GMT'), ('Set-Cookie', 'BDSVRTM=0; path=/'), ('Set-Cookie', 'BD_HOME=0; path=/'), ('Set-Cookie', 'H_PS_PSSID=1429_27213_21093_22158; path=/; domain=.baidu.com'), ('Vary', 'Accept-Encoding'), ('X-Ua-Compatible', 'IE=Edge,chrome=1'), ('Connection', 'close'), ('Transfer-Encoding', 'chunked')]

BWS/1.1
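Baidu's response above carries several Set-Cookie headers, which getheader() handles in a way that is easy to miss: it joins repeated headers with ", ", and it accepts a default for headers that are absent. The sketch below shows both, plus response.headers.get_all() for getting each value separately. It assumes a hypothetical local server (CookieHandler) in place of Baidu so it runs offline; note that Python's built-in server reports itself as BaseHTTP/... rather than BWS/1.1.

```python
import http.server
import threading
import urllib.request


class CookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local handler that sends two Set-Cookie headers."""

    def do_GET(self):
        self.send_response(200)  # also adds Server and Date headers
        self.send_header("Set-Cookie", "a=1; path=/")
        self.send_header("Set-Cookie", "b=2; path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

with urllib.request.urlopen(url, timeout=5) as response:
    status = response.status                             # 200
    headers = response.getheaders()                      # list of (name, value) tuples
    server_name = response.getheader("Server")           # e.g. "BaseHTTP/0.6 Python/3.x"
    missing = response.getheader("X-Not-There", "n/a")   # default when the header is absent
    joined = response.getheader("Set-Cookie")            # "a=1; path=/, b=2; path=/"
    cookies = response.headers.get_all("Set-Cookie")     # ["a=1; path=/", "b=2; path=/"]

server.shutdown()
server.server_close()
```

Because cookie values can themselves contain commas (the expires=Thu, 31-Dec-37 dates above, for instance), get_all() is the safer choice when you need the cookies one by one.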