
20213306 李鵬宇: "Python Programming" Lab 4 Report


Course: "Python Programming"
Class: 2133
Name: 李鵬宇
Student ID: 20213306
Instructor: 王志強
Experiment date: May 24, 2022
Required/Elective: Public elective

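1. Experiment content

Use Python to push notifications containing the data of my personal Bilibili account.

2. Experiment process and results

2.1 Inspiration

As a gaming enthusiast, I occasionally post first-impression videos of new games on Bilibili, and sometimes I want to check whether anyone is actually watching them. Having just learned web scraping in Python, I wondered whether I could build a push notification service.

2.2 Process

2.2.1.1 Version 1.0

At first I only wanted to look at the personal data exposed by Bilibili's public API, so I followed online tutorials, patched things together, and wrote a simple script:

2.2.1.2 Version 1.0 code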

import requests
import re
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47', 'Referer': 'https://space.bilibili.com/410279961'}
url1 = 'https://api.bilibili.com/x/space/acc/info?mid=410279961'
url2 = "http://api.bilibili.com/x/relation/stat?vmid=410279961"
data1 = requests.get(url1,headers=headers).text  # version 1.0 does not support a custom id yet
data2 = requests.get(url2,headers=headers).text
uid=re.search(r'''"mid":(.*?),''',data1)  # regular-expression search
uid_1 = re.sub(r'''[",]''','',uid.group(0))  # strip quotes and commas
uid_2 = re.sub(r'''mid''',"Bilibili ID",uid_1,count=1)  # replace the key with a readable label
name=re.search(r'''"name":"(.*?)"''',data1)
name_1 = re.sub(r'''"''','',name.group(0))
name_2 = re.sub(r'''name''',"Username",name_1,count=1)  # count=1 leaves a username containing "name" intact
sex=re.search(r'''"sex":"(.*?)"''',data1)
sex_1 = re.sub(r'''"''','',sex.group(0))
sex_2 = re.sub(r'''sex''',"Sex",sex_1,count=1)
level=re.search(r'''"level":(.*?),''',data1)
level_1 = re.sub(r'''[",]''','',level.group(0))
level_2 = re.sub(r'''level''',"Level",level_1,count=1)
following=re.search(r'''"following":(.*?),''',data2)
following_1 = re.sub(r'''[",]''','',following.group(0))
following_2 = re.sub(r'''following''',"Following",following_1,count=1)
follower=re.search(r'''"follower":(.*?)}''',data2)
follower_1 = re.sub(r'''["}]''','',follower.group(0))
follower_2 = re.sub(r'''follower''',"Followers",follower_1,count=1)
print(uid_2+'\n'+name_2+'\n'+sex_2+'\n'+level_2+'\n'+following_2+'\n'+follower_2)  # output
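
Since both endpoints return JSON, the same fields could also be read with the standard `json` module instead of regular expressions. The following is only a sketch under that assumption: `format_profile` is a hypothetical helper, the fields are assumed to sit under a top-level `data` key, and the two sample payloads are made-up stand-ins rather than real API responses.

```python
import json

def format_profile(info_text, stat_text):
    """Build a six-line summary from two JSON response bodies."""
    info = json.loads(info_text)["data"]   # assumed layout: fields live under "data"
    stat = json.loads(stat_text)["data"]
    lines = [
        f"Bilibili ID:{info['mid']}",
        f"Username:{info['name']}",
        f"Sex:{info['sex']}",
        f"Level:{info['level']}",
        f"Following:{stat['following']}",
        f"Followers:{stat['follower']}",
    ]
    return "\n".join(lines)

# Made-up sample payloads, for illustration only
sample_info = '{"code":0,"data":{"mid":1,"name":"demo","sex":"secret","level":5}}'
sample_stat = '{"code":0,"data":{"mid":1,"following":10,"follower":20}}'
print(format_profile(sample_info, sample_stat))
```

Parsing the body this way avoids the quote and comma clean-up steps, and a missing field raises a `KeyError` instead of an `AttributeError` on a failed regex match.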

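2.2.2.1 Version 2.0

What puzzled me was that the public API does not include my personal coin count?!
As a freeloader sitting on 700+ coins, I absolutely could not accept that!
I had learned earlier that cookies can carry the login state, so I tried adding my cookie to the request.
However, what I scraped did not match the Bilibili web page at all. How could I make the scraped result match the actual page?
I remembered the Selenium tutorial on simulated check-ins that our teacher had shared earlier.
After logging in to Bilibili through a Selenium-controlled Firefox, the result was still the same, but I did find a workaround:
A Firefox extension I found online makes Firefox under automation be treated as a normally running browser. The second link below is the original article, so I will not reproduce the extension's contents here.

2.2.2.2 Version 2.0 code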

import os
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
from selenium.webdriver import FirefoxOptions

def isElementExist(browser,element):
    # Return True if an element matching the XPath exists on the current page
    try:
        browser.find_element(by=By.XPATH,value=element)
        return True
    except NoSuchElementException:
        return False

opts = FirefoxOptions()
opts.add_argument("--headless")
opts.add_argument("--disable-gpu")
browser1 = webdriver.Firefox(options=opts)
url = 'https://space.bilibili.com'
url2 = "https://member.bilibili.com/platform/home"
url3 = 'https://member.bilibili.com/platform/data-up/video/dataCenter/video'
browser1.install_addon(os.path.realpath(r'C:\linshi'), temporary=True)  # raw string keeps the backslash in the Windows path literal
browser1.get(url)
cookies ={
		# put your cookies here
	}
for i in cookies:
    browser1.add_cookie({"name":i,"value":cookies[i],"domain":".bilibili.com","path":'/'})
browser1.refresh()
time.sleep(5)
tx = browser1.find_element(by=By.XPATH, value=r'''/html/body/div[1]/div/div/div[3]/div[2]/div[1]/span''')
ActionChains(browser1).move_to_element(tx).perform()
time.sleep(1)
name = browser1.find_element(by=By.XPATH, value=r'''/html/body/div[4]/div/p''').text
vip = browser1.find_element(by=By.XPATH, value=r'''/html/body/div[4]/div/div[2]/a''').text
level = browser1.find_element(by=By.XPATH, value=r'''/html/body/div[4]/div/div[3]/div[1]/span[1]''').text+'\tEXP:'+browser1.find_element(by=By.XPATH, value=r'''/html/body/div[4]/div/div[3]/div[1]/span[2]''').text
coin = 'Coins:'+browser1.find_element(by=By.XPATH, value=r'/html/body/div[4]/div/div[4]/div/div[1]/a[1]/span').text
bcoin = 'B-coins:'+browser1.find_element(by=By.XPATH, value=r'/html/body/div[4]/div/div[4]/div/div[1]/a[2]/span').text
following = 'Following:'+browser1.find_element(by=By.XPATH, value='//*[@id="n-gz"]').text
follower = 'Followers:'+browser1.find_element(by=By.XPATH, value='//*[@id="n-fs"]').text
tougao = 'Total uploads:'+browser1.find_element(by=By.XPATH,value='/html/body/div[2]/div[2]/div/div[1]/div[1]/a[3]/span[3]').text
zan = browser1.find_element(by=By.XPATH, value='/html/body/div[2]/div[2]/div/div[1]/div[3]/div[1]').get_attribute("title")
bofang = browser1.find_element(by=By.XPATH, value='/html/body/div[2]/div[2]/div/div[1]/div[3]/div[2]').get_attribute("title")
browser1.get(url2)
time.sleep(7)
if isElementExist(browser1,'/html/body/div[2]/div/div/img'):
    browser1.find_element(by=By.XPATH,value='/html/body/div[2]/div/div/img').click()
time.sleep(1)
date = 'Today is your '+browser1.find_element(by=By.XPATH,value='/html/body/div/div[1]/div/div[2]/div[1]').text
dcl = browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div[3]/div/div[1]/div[1]/div[2]/a').text
pinglun = 'Total comments:'+browser1.find_element(by=By.XPATH, value='/html/body/div/div[3]/div[4]/div[2]/div[3]/div/div[2]/div[1]/div[3]/div/div/div[2]/span').text
dm = 'Total danmaku:'+browser1.find_element(by=By.XPATH, value='/html/body/div/div[3]/div[4]/div[2]/div[3]/div/div[2]/div[1]/div[4]/div/div/div[2]/span').text
share = 'Total shares:'+browser1.find_element(by=By.XPATH, value='/html/body/div/div[3]/div[4]/div[2]/div[3]/div/div[2]/div[2]/div[2]/div/div/div[2]/span').text
save = 'Total favorites:'+browser1.find_element(by=By.XPATH, value='/html/body/div/div[3]/div[4]/div[2]/div[3]/div/div[2]/div[2]/div[3]/div/div/div[2]/span').text
tb = 'Total coins received:'+browser1.find_element(by=By.XPATH, value='/html/body/div/div[3]/div[4]/div[2]/div[3]/div/div[2]/div[2]/div[4]/div/div/div[2]/span').text
money = 'Earnings (batteries):'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div[4]/div[2]/div/div/p').text
browser1.get(url3)
time.sleep(5)
if isElementExist(browser1,'/html/body/div[2]/div/div/img'):
    browser1.find_element(by=By.XPATH,value='/html/body/div[2]/div/div/img').click()
browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[1]/div/div[2]/div/div[1]/div/div/span').click()
time.sleep(1)
browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[1]/div/div[2]/div/div[1]/div/div/span').click()
time.sleep(1)
bofang = bofang+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value="/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/span").text+')'
follower = follower+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[3]/div[1]/div[2]/span').text+')'
zan = zan+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[4]/div[1]/div[2]/span').text+')'
save = save+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[5]/div[1]/div[2]/span').text+')'
tb = tb+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[6]/div[1]/div[2]/span').text+')'
pinglun = pinglun+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[7]/div[1]/div[2]/span').text+')'
dm = dm+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[8]/div[1]/div[2]/span').text+')'
share = share+'(gained yesterday:'+browser1.find_element(by=By.XPATH,value='/html/body/div/div[3]/div[4]/div[2]/div/micro-app/micro-app-body/div/div/div/div[2]/div[2]/div[2]/div[9]/div[1]/div[2]/span').text+')'
shuchu = name+'\n'+vip+'\n'+level+'\n'+coin+'\n'+bcoin+'\nCreator stats:\n'+date+'\n'+tougao+'\n'+dcl+'\n'+follower+'\n'+following+'\n'+zan+'\n'+bofang+'\n'+pinglun+'\n'+dm+'\n'+share+'\n'+save+'\n'+tb+'\n'+money
print(shuchu)
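
The `cookies` dictionary in the script above is left for the reader to fill in. One way to fill it, sketched here as an assumption rather than as the author's method, is to copy the raw `Cookie` request header from the browser's developer tools and split it into name/value pairs. `parse_cookie_header` is a hypothetical helper, and the cookie names and values below are placeholders.

```python
def parse_cookie_header(raw):
    """Split 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    cookies = {}
    for pair in raw.split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.strip().split("=", 1)  # split once: values may contain '='
        cookies[name] = value
    return cookies

# Placeholder names and values, not real credentials
raw_cookie = "session_token=abc=123; user_id=1; "
cookies = parse_cookie_header(raw_cookie)
print(cookies)
```

The resulting dictionary has the same shape as the one the loop with `add_cookie` iterates over.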

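2.2.3.1 Version 3.0

Although the version above ran successfully on my own computer, once I had installed Firefox on my ECS server (note: neither the yum nor the dnf repositories of EulerOS carry Firefox, so I had to use openEuler), the Python program crashed on startup because the Selenium version did not match Firefox. After several rounds of installing and updating got me nowhere (one update of a gcc library even broke the links for the server's commands entirely; see the reference links below), I finally gave up on running Selenium on the server.
So I went back to plain HTTP requests. After some searching I learned that dynamic pages render their content by issuing a number of separate requests, and that these requests can be found with the F12 developer tools. After a series of searches and improvements, I finished the final version of the code for this experiment (since almost none of it is borrowed, it may look rather clumsy...).

2.2.3.2 Version 3.0 code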

import datetime
import json
import requests
import time

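try:  # generate a config.json, in the style of some server software (the Minecraft server, the Grasscutter server)
    with open('config.json',encoding='utf8') as f:
        configforuser = json.load(f)
    cookie = configforuser['cookie']
    up = int(configforuser['up'])  # true/false would make more sense, but I did not bother to change it
    Ua = configforuser['UA']
    ts = int(configforuser['ts'])
    tgbotweb = configforuser['server']
except (OSError,ValueError,KeyError):  # the file is missing or its contents cannot be used
    print("config.json is missing or invalid; regenerating it. Please edit config.json correctly and run again.")
    with open('config.json',mode='w',encoding='utf8') as f:
        f.write('''{"cookie":"enter your Bilibili cookie here",\n"up":"uploader mode: 1 to enable, 0 to disable",\n"UA":"enter your browser User-Agent",\n"ts":"push: 1 for on, 0 for off",\n"server":"only a Telegram bot is supported for now; enter the API URL up to and including text="}''')
    raise SystemExit(1)
# Request headers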
headers = {"Referer": r"https://www.bilibili.com/",
           'origin': r'https://space.bilibili.com',
           "Accept": 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
           "Accept-Language": 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
           "Cookie": cookie,
           'User-Agent': Ua, }
res2 = requests.get(
    r'https://api.bilibili.com/x/web-interface/nav', headers=headers).json()
res1 = requests.get(
    r'https://api.bilibili.com/x/space/acc/info?mid=410279961&jsonp=jsonp', headers=headers).json()
res3 = requests.get('https://member.bilibili.com/x/web/index/stat',headers=headers).json()
res4 = requests.get(r'https://api.bilibili.com/x/web-interface/nav/stat',headers=headers).json()
uid = 'Bilibili ID:'+str(res1['data']['mid'])
name = res1['data']['name']
coins = 'Coins:'+str(res1['data']['coins'])
vip = res2['data']['vip']['label']['text']
level_info = 'Level:Lv'+str(res2['data']['level_info']['current_level'])+'\tEXP'+str(res2['data']['level_info']['current_exp'])+'/'+str(res2['data']['level_info']['next_exp'])
following = 'Following:'+str(res4['data']['following'])  # the data shown when you open your personal center
# Part 2
date = 'Today is '+requests.get(r'https://member.bilibili.com/x/web/index/scrolls',headers=headers).json()['data']['scrolls'][0]['name']
total_data = res3['data']
click = 'Total plays:'+str(total_data['total_click'])+'(gained yesterday:'+str(total_data["incr_click"])+')'
dm = 'Total danmaku:'+str(total_data['total_dm'])+'(gained yesterday:'+str(total_data["incr_dm"])+')'
fans = 'Followers:'+str(total_data['total_fans'])+'(gained yesterday:'+str(total_data["incr_fans"])+')'
reply = 'Comments:'+str(total_data['total_reply'])+'(gained yesterday:'+str(total_data["incr_reply"])+')'
like = 'Total likes:'+str(total_data['total_like'])+'(gained yesterday:'+str(total_data["inc_like"])+')'
fav = 'Total favorites:'+str(total_data['total_fav'])+'(gained yesterday:'+str(total_data["inc_fav"])+')'
coin = 'Coins received:'+str(total_data['total_coin'])+'(gained yesterday:'+str(total_data["inc_coin"])+')'
share = 'Total shares:'+str(total_data['total_share'])+'(gained yesterday:'+str(total_data["inc_share"])+')'
dcl = 'Electromagnetic power Lv.'+str(requests.get('https://api.bilibili.com/studio/up-rating/v3/rating/status',headers=headers).json()['data']['level'])
money = 'Earnings (shells):'+str(requests.get('https://member.bilibili.com/x/web/elec/balance',headers=headers).json()['data']['bpay_account']['brokerage'])
dt = 'Total feed posts:'+str(res4['data']['dynamic_count'])  # the data shown after opening the Creator Center (it updates yesterday's figures at 12:00)
# Part 3
todaytime = int(time.mktime(datetime.date.today().timetuple()))
yesterdaytime = todaytime-86400
historybase = requests.get('https://api.bilibili.com/x/web-interface/history/cursor',headers=headers).json()
yestview = 0
todaview = 0
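#todalike = 0  Bilibili's history API does not say whether I liked a video, and checking videos one by one is too slow, so this is shelved for now
isyest = 1  # Note: Bilibili does not keep the full watch history. I only want a push tool, so I just use timestamps to count yesterday's and today's views; accumulating several days into a table with pandas would be entirely possible
while isyest:
    cursor = historybase['data']['cursor']
    datalist = historybase['data']['list']
    if not datalist:  # no more history records, so stop instead of looping forever
        break
    for i in datalist:
        view_at = i['view_at']
        if view_at>= yesterdaytime:
            if view_at>=todaytime:
                todaview+=1
            else:
                yestview+=1
        else:  # walk back through the records until passing 0:00 yesterday
            isyest = 0
            break
    if isyest:  # each response holds at most 20 records, so fetch the next page once per pass
        tempurl = 'https://api.bilibili.com/x/web-interface/history/cursor?max={}&view_at={}&business=archive'.format(cursor['max'], cursor['view_at'])
        historybase = requests.get(tempurl,headers=headers).json()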
zrgk = 'Videos watched yesterday:'+str(yestview)
jrgk = 'Videos watched today (so far):'+str(todaview)
upbf = date+'\n'+dcl+'\n'+dt+'\n'+click+'\n'+dm+'\n'+fans+'\n'+reply+'\n'+like+'\n'+fav+'\n'+coin+'\n'+share+'\n'+money
basepart =  name+'\n'+uid+'\n'+level_info+'\n'+vip+'\n'+zrgk+'\n'+jrgk+'\n'+coins
if up:  # set in config.json
    output = basepart+'\n'+upbf
else:
    output = basepart
if ts:  # push (Telegram bot API behind a Cloudflare reverse proxy)
    tgbot = tgbotweb+requests.utils.quote(output)  # percent-encode the message so newlines and symbols survive in the URL
    a = requests.post(tgbot)
    time.sleep(5)
else:
    print(output)
print('Done')
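
The counting logic in part 3 can be separated from the network calls so that it can be checked without touching the API. The sketch below is one possible arrangement, not the script's actual structure: `count_views` and the `fetch_page` callback are hypothetical names, and the fake pages only imitate the `view_at` timestamps that the history records carry.

```python
import datetime
import time

def count_views(fetch_page, today_start):
    """Count history entries viewed today and yesterday.

    fetch_page(cursor) must return (entries, next_cursor), newest first;
    each entry is a dict with a 'view_at' Unix timestamp.
    """
    yesterday_start = today_start - 86400
    today = yesterday = 0
    cursor = None
    while True:
        entries, cursor = fetch_page(cursor)
        if not entries:          # history exhausted
            break
        for entry in entries:
            if entry["view_at"] >= today_start:
                today += 1
            elif entry["view_at"] >= yesterday_start:
                yesterday += 1
            else:                # older than 0:00 yesterday: stop paging
                return today, yesterday
    return today, yesterday

# Fake two-page history for illustration (timestamps relative to local midnight)
midnight = int(time.mktime(datetime.date.today().timetuple()))
pages = {
    None: ([{"view_at": midnight + 60}, {"view_at": midnight - 60}], 1),
    1: ([{"view_at": midnight - 3600}, {"view_at": midnight - 90000}], 2),
    2: ([], None),
}
print(count_views(lambda cursor: pages[cursor], midnight))
```

In a real run, `fetch_page` would wrap the `requests.get` call on the history endpoint and return the `list` and `cursor` fields of the response.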

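2.3 About the push

Originally there was no push feature at all. But because I had previously used a certain panel, I happened to have a Telegram bot, and later I also tinkered with putting it behind a Cloudflare reverse proxy, so I put it to use. (Note: Telegram goes through Google FCM, so pushes do reach mainland China, but the full message cannot be read...)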
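
Because the message is appended to a URL that ends in `text=`, it needs to be percent-encoded so that newlines, spaces and symbols such as `&` or `+` are not misread as part of the query string. A minimal sketch follows, assuming a base URL in the shape of the Telegram Bot API's `sendMessage` method with `chat_id` and `text` parameters; the proxy host, token and chat id are hypothetical placeholders, and `build_push_url` is an illustrative helper.

```python
from urllib.parse import quote

def build_push_url(base, message):
    """Append a percent-encoded message to a base URL that ends with 'text='."""
    return base + quote(message, safe="")

# Hypothetical proxy address, bot token and chat id; substitute your own
base = "https://tg-proxy.example.com/bot<token>/sendMessage?chat_id=<id>&text="
print(build_push_url(base, "Coins:700\nFollowers:12 (+1)"))
```

Passing `safe=""` encodes every reserved character, so the whole report travels as a single `text` value.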

2.4 Screenshots of the run






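3. Problems encountered during the experiment and how they were solved

Far too many, especially when installing Selenium on the ECS, installing a desktop environment on EulerOS, and upgrading and compiling gcc, above all when updating glibc (oops).
Solution: look things up online bit by bit, debug, run, modify, and debug again (at worst, reinstall the server).
Below are some of the resources I consulted......

4. References

5. Course reflections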

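The Python course has officially come to an end. Compared with C, our required course, Python really is easy to use. Whereas our C programs so far can only do command-line input and output, Python, backed by all kinds of modules, lets you get into practice and see results faster and more visibly.
In high school, a few classmates who also liked tinkering and I ran a Minecraft server on an old phone, using Termux, AnLinux and so on. That made me familiar with tools such as VNC and SSH, and I gradually started to learn about programming languages; after all, you have to be able to read config.json (there is now a very good multi-server management tool called MCDR, a Python module that works out of the box).
When I saw Python among the electives, my first reaction was to sign up. In this course I formally learned (or really, just got a first acquaintance with) Python, and also picked up a great deal of knowledge that applies to other programming languages. It was a wonderful time of learning. I am glad I could take this elective, and I hope to keep using and learning Python in the future. After all: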

Life is short, you need python.