Reading and Writing Ordinary Files, and Storing Scraped Data

阿新 • Published: 2020-07-21


1. Reading and Writing Files

Reading a file

Reading a file is simple: it takes just three steps.
First, open the file with the open() function:
```
myfile = open(r'test.txt','r')
```
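Here myfile is a variable that holds the opened file. The r before the first string makes it a raw string, so backslashes in the path are taken literally instead of being read as escape sequences (this matters for Windows paths); 'test.txt' is the name of the file to read; and the final 'r' is the mode the file is opened in, where 'r' stands for read.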
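To see what the r prefix changes, compare a Windows-style path written with and without it. The path below is a made-up example used only for illustration:

```python
# Hypothetical Windows path, used only to illustrate escape sequences
plain = 'C:\new\test.txt'   # "\n" becomes a newline and "\t" a tab
raw = r'C:\new\test.txt'    # the r prefix keeps every backslash as-is

print(len(plain))  # 13: "\n" and "\t" each count as one character
print(len(raw))    # 15
print(raw == 'C:\\new\\test.txt')  # True: same as doubling each backslash
```

For a bare file name such as 'test.txt', which contains no backslashes, the r prefix makes no difference.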

Second, read the contents of the file:

myfilecontent = myfile.read()
# read() reads the whole file and returns it as a string, which is stored in myfilecontent
print(myfilecontent)
# Print it to see what was read
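read() returns the whole file as one string. File objects also offer readline() and readlines(), and can be iterated line by line. The sketch below shows all three on a small sample file whose name, lines_demo.txt, is hypothetical:

```python
# Create a small sample file first so the example is self-contained
with open('lines_demo.txt', 'w', encoding='utf-8') as f:
    f.write('line one\nline two\nline three\n')

myfile = open('lines_demo.txt', 'r', encoding='utf-8')
first = myfile.readline()   # reads one line, including its trailing "\n"
rest = myfile.readlines()   # reads the remaining lines into a list
myfile.close()

print(repr(first))  # 'line one\n'
print(rest)         # ['line two\n', 'line three\n']

# Iterating over the file object yields one line at a time,
# which avoids loading a large file into memory all at once
myfile = open('lines_demo.txt', 'r', encoding='utf-8')
for line in myfile:
    print(line.rstrip('\n'))
myfile.close()
```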

Finally, close the file.

myfile.close()
# Close the file. Never skip this step: closing releases the file handle and, when writing, flushes any buffered data to disk.

The complete code:

myfile = open(r'test.txt','r')
myfilecontent = myfile.read()
print(myfilecontent)
myfile.close()
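The same three steps can be written with a with statement, which closes the file automatically when the block ends, even if an error occurs inside it. This sketch also passes encoding='utf-8', which assumes the file is saved as UTF-8; without it, Python falls back to the platform's default encoding, which may not decode Chinese text correctly:

```python
# Create test.txt first so the example runs on its own
with open(r'test.txt', 'w', encoding='utf-8') as myfile:
    myfile.write('從你的全世界路過')

# Read it back; the file is closed automatically when the with block exits
with open(r'test.txt', 'r', encoding='utf-8') as myfile:
    myfilecontent = myfile.read()

print(myfilecontent)   # 從你的全世界路過
print(myfile.closed)   # True
```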

Writing to a file

Writing a file also takes three steps: open the file, write to it, close it.

Method 1

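Step 1, open the file:
myfile=open(r'test.txt','w') # Same open() call as before; only the last argument changes, to 'w' (write). 'w' creates the file if it does not exist and erases its existing content if it does.
Step 2, write the content:
myfile.write('從你的全世界路過')
Step 3, close the file:
myfile.close()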
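Two details of write() are easy to miss: it does not add a newline, and it returns the number of characters written. A small check, using a hypothetical file name write_demo.txt:

```python
myfile = open(r'write_demo.txt', 'w', encoding='utf-8')
n = myfile.write('first line')   # write() returns the number of characters written
myfile.write('second line\n')    # no newline is added automatically, so add "\n" yourself
myfile.close()

myfile = open(r'write_demo.txt', 'r', encoding='utf-8')
content = myfile.read()
myfile.close()

print(n)              # 10
print(repr(content))  # 'first linesecond line\n'
```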

Method 2

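with open(r'test.txt','a') as myfile:
    # 'a' (append) adds to the end of the file instead of overwriting it;
    # the with statement closes the file automatically when the block ends
    myfile.write('從你的全世界路過')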
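Apart from the with statement, the two methods differ in their mode, and the mode decides what happens to existing content: 'w' erases it, while 'a' keeps it and writes at the end. A sketch showing both on a hypothetical file mode_demo.txt:

```python
# 'w' truncates the file (or creates it) before writing
with open(r'mode_demo.txt', 'w', encoding='utf-8') as myfile:
    myfile.write('first\n')
with open(r'mode_demo.txt', 'w', encoding='utf-8') as myfile:
    myfile.write('second\n')   # "first" is gone after this

# 'a' keeps the existing content and writes at the end
with open(r'mode_demo.txt', 'a', encoding='utf-8') as myfile:
    myfile.write('third\n')

with open(r'mode_demo.txt', 'r', encoding='utf-8') as myfile:
    result = myfile.read()
print(repr(result))  # 'second\nthird\n'
```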

2. Reading and Saving Excel Files

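import openpyxl
# Import openpyxl (install it first with pip install openpyxl)
wb = openpyxl.Workbook()
# openpyxl.Workbook() creates a new Workbook object, i.e. a new, empty Excel file.
sheet = wb.active
# wb.active returns the workbook's active worksheet; in a new workbook this is its only sheet, named "Sheet" by default.
sheet.title = 'kaikeba'
# Rename the worksheet via .title: its name changes from the default "Sheet" to "kaikeba".
sheet['A1'] = 'gdp'
# Write a value into a single cell
score1 = ['math', 95]
sheet.append(score1)
# Append a whole row after the last used row (here row 2); the row is passed as a list
wb.save('score.xlsx')
# Save the workbook as score.xlsx
wb.close()
# Close the workbook (this only has an effect in read-only or write-only mode)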
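The code above only writes an Excel file. To read one back, openpyxl provides load_workbook(). The sketch below reopens the score.xlsx produced above (recreating it first so it runs on its own) and reads its cells:

```python
import openpyxl

# Recreate score.xlsx the same way as above so the sketch runs on its own
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = 'kaikeba'
sheet['A1'] = 'gdp'
sheet.append(['math', 95])
wb.save('score.xlsx')

# Open the saved file and read it back
wb = openpyxl.load_workbook('score.xlsx')
sheet = wb['kaikeba']          # select a worksheet by its name
print(sheet['A1'].value)       # gdp
rows = list(sheet.iter_rows(values_only=True))  # each row as a tuple of cell values
print(rows)                    # [('gdp', None), ('math', 95)]
wb.close()
```

values_only=True returns plain values instead of Cell objects; B1 was never written, so it reads back as None.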

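# Practice exercise:
'''
Starting from the given URL, scrape the question titles and their URLs from the first 10 pages of Guokr,
and write the scraped data to an Excel sheet or a CSV file.
URL: https://www.guokr.com/ask/highlight/?page=
'''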
import requests
import openpyxl
from bs4 import BeautifulSoup
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
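# Create a workbook with a header row (標題 = title, 網址 = URL)
wb=openpyxl.Workbook()
sheet=wb.active
sheet.title='guoke'
sheet['A1']='標題'
sheet['B1']='網址'
# Fetch pages 1 to 10
for i in range(1,11):
    res=requests.get('https://www.guokr.com/ask/highlight/?page=%s'%i,headers=headers)
    soup=BeautifulSoup(res.text,'html.parser')
    # The question list on each page is a <ul class="ask-list-cp">
    list_info=soup.find_all('ul',class_="ask-list-cp")
    if not list_info:
        continue  # skip pages without the expected list instead of crashing with IndexError
    # Each question sits in a <div class="ask-list-detials">
    list_data=list_info[0].find_all('div',class_="ask-list-detials")
    for data in list_data:
        # The first <a> in each block holds the title text and the link
        res_data=data.find('a')
        title=res_data.text
        url=res_data['href']
        sheet.append([title,url])
wb.save('guoke.xlsx')
wb.close()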
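Since the exercise also allows CSV output, here is a sketch of the saving step with the standard csv module. save_rows_to_csv is a hypothetical helper and the rows are placeholder data; in the scraper you would collect [title, url] into a list inside the loop instead of calling sheet.append:

```python
import csv

def save_rows_to_csv(rows, path):
    """Write (title, url) pairs to a CSV file with a header row."""
    # utf-8-sig writes a BOM, which helps Excel recognise the file as UTF-8
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['標題', '網址'])   # same header as the Excel version
        writer.writerows(rows)

# Hypothetical placeholder data standing in for the scraped results
rows = [
    ('example title 1', 'https://example.com/1'),
    ('example title 2', 'https://example.com/2'),
]
save_rows_to_csv(rows, 'guoke.csv')
```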

3. Reading and Saving Data in CSV Files

Saving (writing) data to CSV (csv.writer() and writerow())

Method 1:
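import csv
# Data to write
score1 = ['math', 95]
score2 = ['english', 90]

# Open the file in append mode ('a'); newline="" stops the csv module from inserting blank lines between rows (seen on Windows)
file=open("score.csv",'a',newline="")
# Create a writer object bound to the file
csv_write=csv.writer(file)
# Write each list as one row
csv_write.writerow(score1)
csv_write.writerow(score2)
file.close()
# Read the file back to check the result
with open("score.csv", newline="") as file:
    print(file.read())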
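When there are many rows, writerows() takes a list of lists and writes them in one call. This sketch opens the file with 'w' rather than 'a' so that running it twice does not duplicate rows; the file name score_rows.csv and the header row are assumptions for illustration:

```python
import csv

scores = [['math', 95], ['english', 90]]

# writerows() writes every list in the outer list as its own row
with open('score_rows.csv', 'w', newline='') as file:
    csv_write = csv.writer(file)
    csv_write.writerow(['subject', 'score'])  # hypothetical header row
    csv_write.writerows(scores)

with open('score_rows.csv', newline='') as file:
    text = file.read()
print(repr(text))  # 'subject,score\r\nmath,95\r\nenglish,90\r\n'
```

The rows end in \r\n because that is csv.writer's default line terminator; opening with newline='' keeps Python from translating it.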
Method 2:
import csv
score1 = ['math', 95]
score2 = ['english', 90]
with open('score.csv','a',newline='') as file:
    # writer is an instance of csv.writer; writerow() writes one row, given as a list
    writer=csv.writer(file)
    writer.writerow(score1)
    writer.writerow(score2)
print('Finished writing')
with open('score.csv', newline='') as file:
    print(file.read())
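If each record is a dict rather than a list, csv.DictWriter maps keys to columns and can write the header for you. The column names and the file name score_dict.csv below are hypothetical:

```python
import csv

# Hypothetical column names; each row is a dict keyed by those names
fieldnames = ['subject', 'score']
rows = [
    {'subject': 'math', 'score': 95},
    {'subject': 'english', 'score': 90},
]

with open('score_dict.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()      # writes "subject,score" as the first row
    writer.writerows(rows)

with open('score_dict.csv', newline='') as file:
    lines = file.read().splitlines()
print(lines)  # ['subject,score', 'math,95', 'english,90']
```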

Reading CSV data (csv.reader())

import csv
with open("mytest.csv",'r',newline='') as file:
    reader=csv.reader(file)
    # writerow() stored each row as a list, so the reader yields one list per row; loop over it with for
    for content in reader:
        print(content)
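Note that csv.reader() returns every field as a string, so a 95 written earlier comes back as '95'. A round-trip sketch, which first writes its own mytest.csv:

```python
import csv

# Write a small file first so the sketch is self-contained
with open('mytest.csv', 'w', newline='') as file:
    csv.writer(file).writerows([['math', 95], ['english', 90]])

with open('mytest.csv', newline='') as file:
    rows = list(csv.reader(file))
print(rows)  # [['math', '95'], ['english', '90']]: numbers come back as strings

# Convert the score column back to int before doing arithmetic
total = sum(int(score) for _, score in rows)
print(total)  # 185
```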