Getting Started with Data Collection: Common Ways to Collect and Store Data with a Python Crawler

阿新 • Published: 2020-08-29

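This article shows two ways to fetch data with a Python crawler and save the fetched data to a file.
1. The first approach:
We fetch the Baidu home page and save it to the file baidu.html. After the program finishes, open baidu.html to see the result. The code is commented step by step, so it should be readable even if you are just getting started.
My environment is Python 3.7. If your local environment is Python 2.x, parts of the code will need to change; any Python 3.x environment should work as is.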
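For readers on Python 2.x, the main change is where the request functions are imported from. The following is a minimal sketch of a compatibility import, not part of the original code; the helper name `build_request` is hypothetical and only illustrates that the rest of the code can stay the same.

```python
# Hypothetical compatibility shim: choose the right module for the running interpreter.
try:
    # Python 3: Request and urlopen live in urllib.request
    from urllib.request import Request, urlopen
except ImportError:
    # Python 2: the same names live in urllib2
    from urllib2 import Request, urlopen


def build_request(url):
    """Create a request object the same way on both Python versions."""
    return Request(url)
```

Note that Python 2 has other differences as well, for example `print` is a statement and `open()` has no `encoding` parameter, so this import alone is not a complete port.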
The code is as follows:

# -*- coding: utf-8 -*-
import urllib.request

# 1. The URL to fetch (Baidu)
url = 'http://www.baidu.com'
 
# 2. Create the request object
request = urllib.request.Request(url)

# 3. Send the request and read the response body (bytes)
response = urllib.request.urlopen(request)
htmldata = response.read()

# 4. Decode the bytes as UTF-8
htmldata = htmldata.decode('utf-8')

# 5. Print the result
print(htmldata)
 
# 6. Print information about the fetched page
print("response type:", type(response))
print("requested url:", response.geturl())
print("response headers:", response.info())
print("status code:", response.getcode())
 
# 7. Save the fetched data to a file
fileOb = open('baidu.html', 'w', encoding='utf-8')  # open the file, creating it if it does not exist
fileOb.write(htmldata)
fileOb.close()
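The seven steps above can also be wrapped into two small functions. This is a minimal sketch rather than the original code: the names `fetch_html` and `save_html`, the 10-second timeout, and the fallback to UTF-8 when the server does not declare a charset are all assumptions made for illustration. It uses `with` so the response and the file are closed even if an error occurs.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10, fallback_encoding='utf-8'):
    """Return the decoded body of url, or None if the request fails."""
    try:
        # The response is a context manager, so it is closed automatically
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared in the Content-Type header when present
            encoding = response.info().get_content_charset() or fallback_encoding
            return response.read().decode(encoding)
    except urllib.error.URLError as err:
        # HTTPError is a subclass of URLError, so both are handled here
        print("request failed:", err)
        return None


def save_html(text, path, encoding='utf-8'):
    """Write text to path; the with-block closes the file even if writing fails."""
    with open(path, 'w', encoding=encoding) as file_obj:
        file_obj.write(text)
```

A typical call would be `html = fetch_html('http://www.baidu.com')`, followed by `save_html(html, 'baidu.html')` when `html` is not `None`.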

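If open() is called without encoding='utf-8', the write may fail with a UnicodeEncodeError. The reason is as follows:
On Windows, a file opened in text mode defaults to the system locale encoding, which is gbk on a Simplified Chinese system, so Python tries to encode the text being written as gbk.
However, htmldata has already been decoded into a Unicode string, and it can contain characters that gbk cannot represent, so the encoding step fails.
With encoding='utf-8', the file is written as UTF-8 and the program runs without error.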
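The failure can be reproduced without any network access. The sketch below uses a made-up sample string containing the copyright sign (U+00A9), a character that often appears in page footers and that gbk cannot encode; the string itself is an assumption for illustration, not data from the Baidu page.

```python
import locale

# The locale's preferred encoding, which open() typically uses in text mode
# when no encoding is given (gbk/cp936 on Simplified Chinese Windows)
print("default text encoding:", locale.getpreferredencoding(False))

# Sample text with a character that gbk cannot represent (U+00A9)
text = "百度 \u00a9 2020"

# UTF-8 covers every Unicode character, so this always succeeds
utf8_bytes = text.encode('utf-8')

# gbk fails on the unsupported character
try:
    text.encode('gbk')
    gbk_failed = False
except UnicodeEncodeError as err:
    gbk_failed = True
    print("gbk cannot encode:", err)
```

Writing to a file opened with a gbk default performs the same encoding step, which is why passing encoding='utf-8' to open() avoids the error.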
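Result:
The screenshot covered only the page information printed in step 6. The data printed in step 5 is too long to show and is already saved in baidu.html, so it was not captured.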

2. The second approach: adding a handler for a special scenario

The code is as follows:

# -*- coding: utf-8 -*-
import urllib.request, http.cookiejar

# 1. The URL to fetch (Baidu)
url = 'http://www.baidu.com'

# 2. Create a cookie container and a handler that uses it
cj = http.cookiejar.CookieJar()
handle = urllib.request.HTTPCookieProcessor(cj)

# 3. Build an opener with the cookie handler
opener = urllib.request.build_opener(handle)

# 4. Install the opener as the default for urllib.request
urllib.request.install_opener(opener)
 
# 5. Fetch the page through the cookie-aware urllib.request and read the response
response = urllib.request.urlopen(url)
htmldata = response.read()

# 6. Decode the bytes as UTF-8
data = htmldata.decode("utf-8")

# 7. Print the result
print(data)

# 8. Print information about the fetched page
print("response type:", type(response))
print("requested url:", response.geturl())
print("response headers:", response.info())
print("status code:", response.getcode())
 
# 9. Save the fetched data to a file
fileOb = open('baiduCookie.html', 'w', encoding='utf-8')  # open the file, creating it if it does not exist
fileOb.write(data)
fileOb.close()
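The code above stores cookies in the container but never looks at them. As a minimal sketch, assuming you want to inspect what the server set, the hypothetical helpers below (`make_cookie_opener` and `fetch_with_cookies` are illustrative names, not from the original code) call the opener directly instead of installing it globally and return the collected cookies as a dictionary.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); cookies received through opener are stored in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookies(url, timeout=10):
    """Fetch url with a private opener and return (html, cookies as a dict)."""
    opener, jar = make_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        html = response.read().decode('utf-8')
    # A CookieJar is iterable; each item has name and value attributes
    cookies = {cookie.name: cookie.value for cookie in jar}
    return html, cookies
```

Calling `opener.open()` keeps the cookie handling local to this opener, whereas `install_opener()` changes the behavior of every later `urllib.request.urlopen()` call in the program. Which one fits depends on whether the whole program should share the same cookies.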
