Scraping Bilibili Danmaku and Building a Word Cloud

Scraping the danmaku

1. Open the video page in its mobile version and inspect the network requests to find the danmaku API endpoint.
2. Code
import requests
from lxml import etree

# Danmaku API endpoint found from the mobile page
url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=198835779'
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36',
}
response = requests.get(url, headers=headers)
# response.encoding = 'utf-8'
# print(response.text)
# Parse the raw response bytes with lxml's HTML parser
html = etree.HTML(response.content)
# Each danmaku is the text of a <d> element
words = html.xpath('/html//d/text()')
# Write as GBK because the word cloud script below reads word.txt as GBK;
# characters GBK cannot encode (e.g. emoji) are dropped
with open('word.txt', 'w', encoding='gbk', errors='ignore') as f:
    for word in words:
        f.write(word + ' ')
print(words)
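
The XPath above picks out the text of every `d` element in the response. As a minimal sketch of that extraction step using only the standard library, the snippet below parses a small made-up sample. The sample document (including its `i` root element) and the helper name `extract_danmaku` are assumptions for illustration, not real API output.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like a danmaku response (an assumption, not real output)
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<i>
  <d>first comment</d>
  <d>second comment</d>
  <d></d>
</i>"""


def extract_danmaku(xml_bytes):
    """Return the non-empty text of every <d> element in the document."""
    root = ET.fromstring(xml_bytes)
    # iter('d') walks the whole tree, similar in spirit to the //d XPath
    return [d.text for d in root.iter('d') if d.text]


print(extract_danmaku(SAMPLE_XML))
```

With a real response, `response.content` would take the place of `SAMPLE_XML`.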

Each request returns a different set of danmaku, so there is no need to worry about collecting too few.

Building the word cloud

This step uses the wordcloud library.

1. Read the file
2. Code
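# coding: utf-8

from matplotlib import pyplot as plt
from wordcloud import WordCloud

# Read the danmaku text saved by the scraper; the file is expected to be GBK-encoded
with open(r'word.txt', 'r', encoding='gbk') as fh:
    text = fh.read()
# A font with Chinese glyphs is needed to render the text; this path is Windows-specific
font = r'C:\Windows\Fonts\FZSTK.TTF'
cloud = WordCloud(
    font_path=font,
    width=1000,
    height=800,
    margin=2
).generate(text)
plt.imshow(cloud)
plt.axis('off')
plt.show()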
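
`generate` tokenizes the text on its own. As an alternative sketch, the frequencies can be counted explicitly and then handed to `WordCloud.generate_from_frequencies`. The snippet below covers only the counting step with the standard library and assumes, as in the scraper, that the danmaku in word.txt are separated by spaces. The helper name `count_danmaku` and the sample string are hypothetical.

```python
from collections import Counter


def count_danmaku(text):
    """Count how often each space-separated danmaku appears."""
    # split() with no arguments also discards empty strings from repeated spaces
    return Counter(text.split())


# Made-up sample in the same space-separated layout as word.txt
sample = "哈哈哈 前方高能 哈哈哈 awsl "
freqs = count_danmaku(sample)
print(freqs.most_common(2))
```

The resulting mapping could then be passed along as `WordCloud(font_path=font).generate_from_frequencies(freqs)`. Danmaku that contain spaces themselves would be split into several entries. If word-level rather than comment-level counts are wanted, a segmenter such as jieba could be applied first.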