【Python scraping self-study notes】----- Scraping the lyrics of every song in a NetEase Cloud Music playlist
阿新 • Published: 2019-02-07
Tools: Python 3.6, PyCharm
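When I first tried to scrape the page, I used requests and passed only the URL, but got no usable response. I then switched to urllib so I could add request headers, decoded the response body as UTF-8, turned it into an HTML object with BeautifulSoup, and pretty-printed the result.
The most important part of this scraper is getting the lyrics link. That link does not appear in the page source. According to the article I consulted, the lyrics come from NetEase Cloud Music's public API.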
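The lyrics request is an ordinary GET-style URL with query parameters. The sketch below builds that URL with urllib.parse.urlencode instead of string concatenation.

Some details are assumptions rather than documented facts:
- The helper name `lyric_api_url` is hypothetical.
- The `lv`, `kv` and `tv` values are copied unchanged from the URL used later in this post, and their exact meaning is not documented here.

```python
from urllib.parse import urlencode

# Lyrics endpoint used in this post
LYRIC_API = "http://music.163.com/api/song/lyric"


def lyric_api_url(sid):
    """Build the lyrics API URL for one song ID (hypothetical helper)."""
    # Parameter values copied verbatim from the post's URL; meaning not documented here
    params = {"id": sid, "lv": 1, "kv": 1, "tv": -1}
    return LYRIC_API + "?" + urlencode(params)


print(lyric_api_url("5048569"))
# prints http://music.163.com/api/song/lyric?id=5048569&lv=1&kv=1&tv=-1
```

urlencode converts the integer values to strings and keeps the dict order, so the result matches the hand-built URL.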
# Scrape the lyrics of every song in my NetEase Cloud Music playlist
import json
import re
import urllib.request

import requests
from bs4 import BeautifulSoup

myurl = "http://music.163.com/playlist?id=2251736705"
headers = {
    "Host": "music.163.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0",
}
request = urllib.request.Request(myurl, headers=headers)
response = urllib.request.urlopen(request)
# Without decode() the body is raw bytes (shown as \x.. escapes), not readable Chinese text
html = response.read().decode('utf-8', 'ignore')
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())

# The useful part of the printed output:
<ul class="f-hide">
<li><a href="/song?id=5048569">Wonderful Tonight</a></li>
<li><a href="/song?id=1299217">Tears in Heaven</a></li>
<li><a href="/song?id=17541009">Autumn Leaves</a></li>
<li><a href="/song?id=28851137">Sensitive Kind</a></li>
<li><a href="/song?id=25542198">My Back Pages</a></li>
<li><a href="/song?id=17541090">Lay Down Sally</a></li>
<li><a href="/song?id=26641658">Riding With the King</a></li>
<li><a href="/song?id=17540892">Change The World</a></li>
<li><a href="/song?id=28040815">Layla</a></li>
<li><a href="/song?id=26641663">Help the Poor</a></li>
<li><a href="/song?id=5201813">Tears In Heaven</a></li>
<li><a href="/song?id=17540496">Piece Of My Heart (Album Version)</a></li>
<li><a href="/song?id=28851139">Magnolia</a></li>
<li><a href="/song?id=17540498">One Track Mind (Album Version)</a></li>
<li><a href="/song?id=26641661">Marry You</a></li>
<li><a href="/song?id=26641665">Worried Life Blues</a></li>
<li><a href="/song?id=28851135">Someday</a></li>
<li><a href="/song?id=28851134">Rock And Roll Records</a></li>
<li><a href="/song?id=17541200">Old Love</a></li>
<li><a href="/song?id=17541190">Hey Hey</a></li>
<li><a href="/song?id=26641669">Come Rain or Come Shine</a></li>
<li><a href="/song?id=1077606">Change the World (Live)</a></li>
<li><a href="/song?id=28851141">Songbird</a></li>
<li><a href="/song?id=413961594">I Will Be There</a></li>
<li><a href="/song?id=18610067">Last Will And Testament (Album Version)</a></li>
<li><a href="/song?id=28851136">Lies</a></li>
<li><a href="/song?id=1298826">Knockin' on Heaven's Door</a></li>
<li><a href="/song?id=17540893">My Father's Eyes</a></li>
<li><a href="/song?id=27490248">Everytime I Sing the Blues</a></li>
<li><a href="/song?id=17540856">Cocaine</a></li>
<li><a href="/song?id=18610066">Don't Cry Sister (Album Version)</a></li>
<li><a href="/song?id=31918662">Riding With The King</a></li>
<li><a href="/song?id=26641662">Three O'Clock Blues</a></li>
<li><a href="/song?id=1299044">Jeff's Blues</a></li>
<li><a href="/song?id=26641668">Hold On! I'm Comin'</a></li>
<li><a href="/song?id=17540639">Golden Ring</a></li>
<li><a href="/song?id=31918653">Behind The Mask</a></li>
<li><a href="/song?id=28851140">I Got The Same Old Blues</a></li>
<li><a href="/song?id=1297898">Over The Rainbow</a></li>
<li><a href="/song?id=17540956">Tears In Heaven</a></li>
<li><a href="/song?id=17540890">Running On Faith - Unplugged</a></li>
<li><a href="/song?id=26641659">Ten Long Years</a></li>
<li><a href="/song?id=26641660">Key to the Highway</a></li>
<li><a href="/song?id=26641664">I Wanna Be</a></li>
<li><a href="/song?id=31918654">Sweet Home Chicago</a></li>
<li><a href="/song?id=28040813">Driftin'</a></li>
<li><a href="/song?id=413961593">Can't Let You Do It</a></li>
<li><a href="/song?id=28851133">They Call Me The Breeze</a></li>
<li><a href="/song?id=18610062">It's Easy (Album Version)</a></li>
<li><a href="/song?id=17541198">San Francisco Bay Blues</a></li>
</ul>
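The request and decoding steps above can be wrapped in small helpers, so that the headers and the UTF-8 handling live in one place. This sketch uses only the standard library.

Two caveats apply:
- The helper names `build_request`, `decode_body` and `fetch_html` are hypothetical.
- The site may no longer accept these particular headers; that they still work is an assumption.

```python
import urllib.request

# Browser-like headers, as in the post; the server may ignore requests without them
HEADERS = {
    "Host": "music.163.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) "
                  "Gecko/20100101 Firefox/56.0",
}


def build_request(url):
    """Wrap the URL in a Request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def decode_body(raw):
    """Turn raw response bytes into text, silently dropping undecodable bytes."""
    return raw.decode("utf-8", "ignore")


def fetch_html(url, timeout=10):
    """Download a page and return it as str (requires network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.read())
```

Assuming the page is still reachable, `fetch_html(myurl)` would return the same string that `html` holds above. The `timeout` argument keeps a stalled connection from hanging the whole run.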
Write the scraped lyrics to a file:
# Open jazz.txt and write the playlist's lyrics into it
f=open('jazz.txt','w',encoding='utf-8')
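A common alternative to pairing open() with an explicit f.close() is to open the file in a with statement. The file is then closed even if a request fails partway through the playlist.

This is a minimal sketch. The helper `write_lyrics` is hypothetical, and the two entries are just header lines used as placeholders.

```python
import os
import tempfile


def write_lyrics(path, entries):
    """Write each song's text block to path; the with-block closes the file."""
    with open(path, "w", encoding="utf-8") as out:
        for entry in entries:
            out.write(entry)


# Usage with placeholder entries, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "jazz.txt")
write_lyrics(path, ["5048569-Wonderful Tonight\n", "1299217-Tears in Heaven\n"])
with open(path, encoding="utf-8") as check:
    print(check.read())
```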
First, get each song's ID. The printed HTML shows that the songs sit inside a ul tag with class "f-hide", one li tag per song:
for item in soup.find('ul', class_='f-hide').find_all('li'):
    # Link of the song, in the form /song?id=11111111
    song_id = item('a')[0].get("href", None)
    # Song title
    song_name = item.string
    # Use a regular expression to pull out the numeric part of song_id as sid
    pat = re.compile(r'\d+')  # one or more digits
    sid = re.findall(pat, song_id)[0]  # the song ID
    # Print the song ID and title
    print(sid + "-" + song_name)

5048569-Wonderful Tonight
1299217-Tears in Heaven
17541009-Autumn Leaves
28851137-Sensitive Kind
25542198-My Back Pages
17541090-Lay Down Sally
26641658-Riding With the King
17540892-Change The World
28040815-Layla
26641663-Help the Poor
5201813-Tears In Heaven
17540496-Piece Of My Heart (Album Version)
28851139-Magnolia
17540498-One Track Mind (Album Version)
26641661-Marry You
26641665-Worried Life Blues
28851135-Someday
28851134-Rock And Roll Records
17541200-Old Love
17541190-Hey Hey
26641669-Come Rain or Come Shine
1077606-Change the World (Live)
28851141-Songbird
413961594-I Will Be There
18610067-Last Will And Testament (Album Version)
28851136-Lies
1298826-Knockin' on Heaven's Door
17540893-My Father's Eyes
27490248-Everytime I Sing the Blues
17540856-Cocaine
18610066-Don't Cry Sister (Album Version)
31918662-Riding With The King
26641662-Three O'Clock Blues
1299044-Jeff's Blues
26641668-Hold On! I'm Comin'
17540639-Golden Ring
31918653-Behind The Mask
28851140-I Got The Same Old Blues
1297898-Over The Rainbow
17540956-Tears In Heaven
17540890-Running On Faith - Unplugged
26641659-Ten Long Years
26641660-Key to the Highway
26641664-I Wanna Be
31918654-Sweet Home Chicago
28040813-Driftin'
413961593-Can't Let You Do It
28851133-They Call Me The Breeze
18610062-It's Easy (Album Version)
17541198-San Francisco Bay Blues
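If BeautifulSoup is not available, the same ID and title extraction can be sketched with the standard library's html.parser. This swaps in a different parser than the one the post uses.

A few assumptions are built in:
- The class name `SongListParser` is hypothetical.
- The list is assumed to keep the f-hide structure printed above.
- The regex `id=(\d+)` captures only the digits after `id=`.

```python
import re
from html.parser import HTMLParser


class SongListParser(HTMLParser):
    """Collect (song_id, title) pairs from the links inside <ul class="f-hide">."""

    def __init__(self):
        super().__init__()
        self.in_list = False     # currently inside the f-hide <ul>?
        self.current_id = None   # ID of the <a> being read, if any
        self.songs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("class") == "f-hide":
            self.in_list = True
        elif tag == "a" and self.in_list:
            match = re.search(r"id=(\d+)", attrs.get("href") or "")
            self.current_id = match.group(1) if match else None

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "a":
            self.current_id = None

    def handle_data(self, data):
        # Text between <a ...> and </a> is the song title
        if self.current_id and data.strip():
            self.songs.append((self.current_id, data.strip()))


sample = ('<ul class="f-hide"><li><a href="/song?id=5048569">Wonderful Tonight</a></li>'
          '<li><a href="/song?id=1299217">Tears in Heaven</a></li></ul>')
parser = SongListParser()
parser.feed(sample)
for sid, name in parser.songs:
    print(sid + "-" + name)
```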
For each song ID sid, the lyrics API returns JSON. Request it, parse the JSON and print it:
# This URL is the real lyrics endpoint
url = "http://music.163.com/api/song/lyric?"+"id="+str(sid)+"&lv=1&kv=1&tv=-1"
html = requests.post(url)
json_obj = html.text
# The lyrics come back as a JSON object; parse it
j = json.loads(json_obj)
print(j)
{'sgc': True, 'sfy': False, 'qfy': False, 'transUser': {'id': 5048569, 'status': 99, 'demand': 1, 'userid': 121424, 'nickname': '老白怪蜀黍', 'uptime': 1522309673919}, 'lrc': {'version': 12, 'lyric': "[00:22.270]It's late in the evening\n[00:27.140]she's wondering what clothes to wear\n[00:32.200]She puts on her make-up\n[00:37.410]and brushes her long blonde hair\n[00:42.600]And then she asks me Do I look all right\n[00:50.690]And I say Yes you look wonderful tonight\n[01:07.890]We go to a party and everyone turns to see\n[01:17.760]This beautiful lady that's walking around with me\n[01:27.790]And then she asks me Do you feel all right\n[01:36.160]And I say Yes I feel wonderful tonight\n[01:46.030]I feel wonderful because I see\n[01:51.720]The love light in your eyes\n[01:57.140]And the wonder of it all\n[02:01.770]Is that you just don't realize how much I love you\n[02:29.420]It's time to go home now and I've got an aching head\n[02:39.040]So I give her the car keys and she helps me to bed\n[02:49.400]And then I tell her as I turn out the light\n[02:57.860]I say My darling you were wonderful tonight\n[03:07.960]Oh my darling you were wonderful tonight\n"}, 'klyric': {'version': 0, 'lyric': None}, 'tlyric': {'version': 1, 'lyric': '[by:阿坤_Arcane]\n[00:22.270]那是一個傍晚\n[00:27.140]她在想穿什麼衣服\n[00:32.200]她打扮好自己\n[00:37.410]然後梳理妥金色的長髮\n[00:42.600]然後她問我:我看起來還好嗎?\n[00:50.690]我說:是的,今晚的你美極了\n[01:07.890]我們去參加派對,所有的人都轉過頭\n[01:17.760]看著這位陪在我身邊的美麗的女士\n[01:27.790]然後她問我:你感覺還好吧\n[01:36.160]我說:是的,今晚感覺棒極了\n[01:46.030]我感到美妙,是因為我看到了\n[01:51.720]你眼中愛的光芒\n[01:57.140]而其中最最美妙的\n[02:01.770]恰是你不會明白我有多麼的愛你\n[02:29.420]是時候回家了,我有一點酒醉頭痛\n[02:39.040]我把車鑰匙給她,她會服侍我回家躺下\n[02:49.400]當我走出派對最後一縷燈光\n[02:57.860]我說:親愛的,今晚你真的很美\n[03:07.960]哦,我的愛人,今晚你真的很美\n'}, 'code': 200}
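A small sketch of the parsing step that also checks the status field.

Assumptions:
- Any `code` other than 200 is treated as a failure. The only evidence for this is the `'code': 200` seen in the successful response above.
- The name `parse_lyric_response` is hypothetical.

```python
import json


def parse_lyric_response(text):
    """Decode the lyrics API response and check its status code."""
    data = json.loads(text)
    # Assumption: success is reported as 'code': 200, as in the sample above
    if data.get("code") != 200:
        raise ValueError("unexpected response code: %r" % (data.get("code"),))
    return data


# A trimmed-down response with the same shape as the one printed above
sample = json.dumps({"lrc": {"version": 12,
                             "lyric": "[00:22.270]It's late in the evening\n"},
                     "code": 200})
j = parse_lyric_response(sample)
print(j["lrc"]["lyric"])
```

With requests, `response.json()` performs the same decoding as `json.loads(response.text)`.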
From this JSON, take the lyric fields: the original lyrics (lrc) and the translated lyrics (tlyric):
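try:
    lyric = j['lrc']['lyric']
    # 'lyric' can be None (compare 'klyric' above), so fall back to an empty string
    tlyric = j['tlyric']['lyric'] or ""
    print(lyric)
    print(tlyric)
except KeyError:
    # No lyrics for this song; define both variables so the code below still runs
    lyric = "No lyrics"
    tlyric = ""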
[00:22.270]It's late in the evening
[00:27.140]she's wondering what clothes to wear
[00:32.200]She puts on her make-up
[00:37.410]and brushes her long blonde hair
[00:42.600]And then she asks me Do I look all right
[00:50.690]And I say Yes you look wonderful tonight
[01:07.890]We go to a party and everyone turns to see
[01:17.760]This beautiful lady that's walking around with me
[01:27.790]And then she asks me Do you feel all right
[01:36.160]And I say Yes I feel wonderful tonight
[01:46.030]I feel wonderful because I see
[01:51.720]The love light in your eyes
[01:57.140]And the wonder of it all
[02:01.770]Is that you just don't realize how much I love you
[02:29.420]It's time to go home now and I've got an aching head
[02:39.040]So I give her the car keys and she helps me to bed
[02:49.400]And then I tell her as I turn out the light
[02:57.860]I say My darling you were wonderful tonight
[03:07.960]Oh my darling you were wonderful tonight
[by:阿坤_Arcane]
[00:22.270]那是一個傍晚
[00:27.140]她在想穿什麼衣服
[00:32.200]她打扮好自己
[00:37.410]然後梳理妥金色的長髮
[00:42.600]然後她問我:我看起來還好嗎?
[00:50.690]我說:是的,今晚的你美極了
[01:07.890]我們去參加派對,所有的人都轉過頭
[01:17.760]看著這位陪在我身邊的美麗的女士
[01:27.790]然後她問我:你感覺還好吧
[01:36.160]我說:是的,今晚感覺棒極了
[01:46.030]我感到美妙,是因為我看到了
[01:51.720]你眼中愛的光芒
[01:57.140]而其中最最美妙的
[02:01.770]恰是你不會明白我有多麼的愛你
[02:29.420]是時候回家了,我有一點酒醉頭痛
[02:39.040]我把車鑰匙給她,她會服侍我回家躺下
[02:49.400]當我走出派對最後一縷燈光
[02:57.860]我說:親愛的,今晚你真的很美
[03:07.960]哦,我的愛人,今晚你真的很美
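A try/except on KeyError only covers a missing key. The sketch below is a slightly more defensive extraction. It assumes a field may be absent entirely or present with a null value, as the `'klyric': {'version': 0, 'lyric': None}` entry in the sample response shows.

The function `extract_lyrics` and its `"No lyrics"` placeholder are hypothetical.

```python
def extract_lyrics(j, missing="No lyrics"):
    """Return (original, translation) from the parsed JSON, tolerating gaps."""
    # .get() avoids KeyError when a field is absent; "or" also covers 'lyric': None
    lyric = (j.get("lrc") or {}).get("lyric") or missing
    tlyric = (j.get("tlyric") or {}).get("lyric") or ""
    return lyric, tlyric


print(extract_lyrics({"lrc": {"lyric": "[00:22.270]It's late in the evening\n"},
                      "tlyric": {"lyric": None}}))
print(extract_lyrics({"code": 200}))
```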
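Use a regular expression to match tags such as [00:22.270], and replace each match with the empty string using re.sub(). Then call str.strip() to remove the leading and trailing whitespace (including newlines) that is left over. See the documentation of the re module and of str.strip() for details.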
pat = re.compile(r'\[[^\]]*\]')  # one tag: from '[' up to the first ']'
lrc = re.sub(pat,"",lyric)
tlrc = re.sub(pat,"",tlyric)
lrc = sid+"-"+song_name+'\n'+lrc.strip()+'\n'+tlrc.strip()+'\n'
print(lrc)
f.write(lrc)
f.close()
5048569-Wonderful Tonight
It's late in the evening
she's wondering what clothes to wear
She puts on her make-up
and brushes her long blonde hair
And then she asks me Do I look all right
And I say Yes you look wonderful tonight
We go to a party and everyone turns to see
This beautiful lady that's walking around with me
And then she asks me Do you feel all right
And I say Yes I feel wonderful tonight
I feel wonderful because I see
The love light in your eyes
And the wonder of it all
Is that you just don't realize how much I love you
It's time to go home now and I've got an aching head
So I give her the car keys and she helps me to bed
And then I tell her as I turn out the light
I say My darling you were wonderful tonight
Oh my darling you were wonderful tonight
那是一個傍晚
她在想穿什麼衣服
她打扮好自己
然後梳理妥金色的長髮
然後她問我:我看起來還好嗎?
我說:是的,今晚的你美極了
我們去參加派對,所有的人都轉過頭
看著這位陪在我身邊的美麗的女士
然後她問我:你感覺還好吧
我說:是的,今晚感覺棒極了
我感到美妙,是因為我看到了
你眼中愛的光芒
而其中最最美妙的
恰是你不會明白我有多麼的愛你
是時候回家了,我有一點酒醉頭痛
我把車鑰匙給她,她會服侍我回家躺下
當我走出派對最後一縷燈光
我說:親愛的,今晚你真的很美
哦,我的愛人,今晚你真的很美
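The tag stripping and the per-song block can be sketched as two small functions. The names `strip_tags` and `format_entry` are hypothetical.

The sketch also compares two patterns on a made-up lyric line containing a bracketed word:
- A greedy pattern like `\[.*\]` runs to the last `]` on a line, so it swallows the text between two brackets.
- `\[[^\]]*\]` stops at the first `]`. It still removes the bracketed word itself.

```python
import re

# One bracketed LRC tag such as [00:22.270] or [by:...]; stops at the first ']'
TAG = re.compile(r"\[[^\]]*\]")
# A greedy alternative, for comparison
GREEDY = re.compile(r"\[.*\]")


def strip_tags(text):
    """Remove every bracketed tag, then trim surrounding whitespace."""
    return TAG.sub("", text).strip()


def format_entry(sid, name, lyric, tlyric):
    """Build the block written to the file: header line, original, translation."""
    return sid + "-" + name + "\n" + strip_tags(lyric) + "\n" + strip_tags(tlyric) + "\n"


line = "[00:01.000]a [x] b"
print(repr(TAG.sub("", line)))     # 'a  b'
print(repr(GREEDY.sub("", line)))  # ' b'
print(format_entry("5048569", "Wonderful Tonight",
                   "[00:22.270]It's late in the evening\n",
                   "[by:阿坤_Arcane]\n[00:22.270]那是一個傍晚\n"))
```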