
Python Crawler Lab Report: Big_Homework1_Lishipin

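Objective:

Crawl all of the entries in one category of the Pear Video site (pearvideo.com);

The fields to collect are the video title, author, like count, and direct video link, all saved to a txt file.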
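As a minimal sketch of what one saved record could look like, the helper below joins the four fields into a two-line text entry. The name `format_record`, the English labels, and the sample values are assumptions made for illustration; only the field list and the tab/newline layout come from the report.

```python
def format_record(title: str, author: str, likes: str, video_url: str) -> str:
    """Build the text entry for one video (hypothetical helper)."""
    # Line 1: tab-separated labelled fields; line 2: the direct video link.
    return f"Title: {title}\tAuthor: {author}\tLikes: {likes}\n{video_url}\n"


# Sample values are made up; they are not real data from the site.
record = format_record("Sample title", "Sample author", "123",
                       "https://example.com/sample.mp4")
print(record, end="")
```

Keeping the link on its own line makes it easy to pick the URLs back out of the txt file later.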

Screenshots of the process

Source code:

import requests
from lxml import etree
from urllib import request  # not used yet; download() below is still a stub
import re

# Globals: request headers and the output file object
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44'
}
file = open('./梨視訊.txt', 'w', encoding='utf-8')


# Fetch the listing pages
def index():
    for num in range(0, 493, 12):
        base_url = 'https://www.pearvideo.com/category_loading.jsp?reqType=5&categoryId=59&start={}'.format(num)
        print('Writing data from', base_url, '...')
        response = requests.get(base_url, headers=headers)  # simulate a browser visit with the request headers
        response.encoding = 'utf-8'  # decode as UTF-8
        html = response.text  # page source
        clean(html)  # parse the data


# Parse the data
def clean(html):
    htmls = etree.HTML(html)  # build the element tree
    video_titles = htmls.xpath('//div[@class="vervideo-bd"]/a/div[2]/text()')
    # print(video_titles)  # video titles
    video_authors = htmls.xpath('//div[@class="vervideo-bd"]/div/a/text()')
    # print(video_authors)  # authors
    video_likes = htmls.xpath('//div[@class="vervideo-bd"]/div/span/text()')
    # print(video_likes)  # like counts
    video_urls1 = htmls.xpath('//div[@class="vervideo-bd"]/a/@href')
    # print(video_urls1)  # relative (incomplete) video links
    printt(video_titles, video_authors, video_likes, video_urls1)


# Write the data
def printt(video_titles, video_authors, video_likes, video_urls1):
    for vu, vt, va, vl in zip(video_urls1, video_titles, video_authors, video_likes):
        # Build the full detail-page URL
        video_urls2 = 'https://www.pearvideo.com/' + vu
        # print(video_urls2)
        # Second-level request: the video detail page
        response = requests.get(video_urls2, headers=headers)
        response.encoding = 'utf-8'
        html = response.text
        # print(html)
        # Pull the direct video URL out of the page source with a regex
        pattern = re.compile('srcUrl="(.*?)",vdoUrl')
        matches = pattern.findall(html)
        if not matches:  # skip pages where the pattern is not found
            continue
        video_url = matches[0]
        # print(video_url)
        full_info = 'Title: ' + vt + '\t' + 'Author: ' + va + '\t' + 'Likes: ' + str(vl) + '\n' + video_url
        file.write(full_info + '\n')


# Download module (not implemented)
def download():
    pass


if __name__ == '__main__':
    index()
    file.close()
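The `download()` function in the listing is left as a stub. The following is one possible way to fill it in, offered as a sketch rather than the author's method: the parameters `video_url`, `title`, and `out_dir`, the helper `safe_filename`, and the `./videos` folder are all assumptions introduced here. It relies on `urllib.request.urlretrieve`, which does not send the custom `User-Agent` header, so the server may reject the request in practice.

```python
import os
import re
from urllib import request


def safe_filename(title: str, ext: str = ".mp4") -> str:
    """Turn a video title into a file name (hypothetical helper)."""
    # Collapse characters that are invalid in Windows or POSIX paths,
    # plus whitespace, into single underscores.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return (cleaned or "video") + ext


def download(video_url: str, title: str, out_dir: str = "./videos") -> str:
    """Save one video into out_dir and return the local path (sketch)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_filename(title))
    request.urlretrieve(video_url, path)  # writes the response body to path
    return path
```

Under these assumptions it could be called inside the write loop once the direct link is known, for example `download(video_url, vt)`.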

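Reflections:

Because I wrote Big Homework 2 first, this one went very smoothly. I did not run into any annoying bugs along the way, so it was another enjoyable programming experience.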