Crawling NetEase Cloud Music and Comments with Scrapy (Part 5: Comments)

阿新 • Published: 2021-10-06

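Tutorial series index:

1. Crawling NetEase Cloud Music and Comments with Scrapy (Part 1: Analyzing the Approach)
2. Crawling NetEase Cloud Music and Comments with Scrapy (Part 2: What Each Module of the Scrapy Framework Does)
3. Crawling NetEase Cloud Music and Comments with Scrapy (Part 3: Crawling Artists)
4. Crawling NetEase Cloud Music and Comments with Scrapy (Part 4: About the API)
5. Crawling NetEase Cloud Music and Comments with Scrapy (Part 5: Comments)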

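Project on GitHub: https://github.com/sujiujiu/WYYScrapy

CSDN does not allow posts about web scraping, so the other chapters were blocked there. You can read them on my Jianshu page: https://www.jianshu.com/u/a0871cf1b395. I may gradually move them to cnblogs (博客園) later.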

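Reference links for the comment API:
1. https://github.com/darknessomi/musicbox/wiki/網易雲音樂新版WebAPI分析 (this one starts from playlists; its section on comments is a useful reference)
2. http://www.imooc.com/article/17459?block_id=tuijian_wz
3. http://blog.csdn.net/u012104691/article/details/53766045

These articles all explain things in fair detail. While researching I also came across another way of writing it, full of names like first_param, which made my head spin, and it did not seem to work when I tested it at the time. I recommend sticking with the approach used here.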

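The basic pattern looks like this:

The code in the image comes from: https://github.com/darknessomi/musicbox/wiki/%E7%BD%91%E6%98%93%E4%BA%91%E9%9F%B3%E4%B9%90%E6%96%B0%E7%89%88WebAPI%E5%88%86%E6%9E%90%E3%80%82

Because both albums and songs have comments, I wrote this part as a dedicated class that can simply be called later.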

# -*-coding:utf-8-*-
import os
import json
import base64
import binascii
import requests
from Crypto.Cipher import AES  # provided by PyCrypto or pycryptodome

class CommentCrawl(object):
    '''Wraps the comment API in a class. Pass in the comment API URL, then call
    get_song_comment() or get_album_comment() to fetch the comments of a song
    or an album respectively.
    '''

    def __init__(self, comment_url):
        self.comment_url = comment_url
        self.headers = {
            "Referer":"http://music.163.com",
            "User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3067.6 Safari/537.36",
        }

    def createSecretKey(self, size):
        '''Generate a random 16-character hex string to use as the key secKey.
        '''
        # bytearray yields integers on both Python 2 and Python 3
        return ''.join('%02x' % b for b in bytearray(os.urandom(size)))[0:16]

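    def AES_encrypt(self, text, secKey):
        '''Encrypt text with AES in CBC mode and return it base64-encoded.
        '''
        # Pad up to a multiple of the 16-byte block size (PKCS#7 style)
        pad = 16 - len(text) % 16
        text = text + pad * chr(pad)
        # On Python 3 the key and IV must be bytes
        encryptor = AES.new(secKey.encode('utf-8'), AES.MODE_CBC, b'0102030405060708')
        encrypt_text = encryptor.encrypt(text.encode('utf-8'))
        # Decode to text so the result can be padded and encrypted a second time
        return base64.b64encode(encrypt_text).decode('utf-8')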

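    def rsaEncrypt(self, text, pubKey, modulus):
        '''Encrypt text with RSA and return the result as 256 hex digits.
        '''
        text = text[::-1]
        # hexlify works on Python 2 and 3; three-argument pow() performs modular
        # exponentiation without building the huge intermediate power
        rs = pow(int(binascii.hexlify(text.encode('utf-8')), 16), int(pubKey, 16), int(modulus, 16))
        return format(rs, 'x').zfill(256)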

    def encrypted_request(self, text):
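        '''Encrypt the plaintext text with AES twice to obtain the ciphertext encText.
        Because secKey is generated on the client, it also has to be RSA-encrypted
        before being sent to the server.
        '''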
        pubKey = '010001'
        modulus = '00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7'
        nonce = '0CoJUm6Qyw8W8jud'

        text = json.dumps(text)
        secKey = self.createSecretKey(16)
        encText = self.AES_encrypt(self.AES_encrypt(text, nonce), secKey)
        encSecKey = self.rsaEncrypt(secKey, pubKey, modulus)
        data = {
            'params': encText,
            'encSecKey': encSecKey
        }
        return data

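    def get_post_req(self, url, data):
        '''POST the encrypted form data; return the parsed JSON, or None on failure.
        '''
        try:
            req = requests.post(url, headers=self.headers, data=data)
            return req.json()
        except Exception as e:
            # Covers network errors as well as responses that are not valid JSON
            print('%s %s' % (url, e))
            return None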

    def get_offset(self, offset=0):
        '''Build the request payload for a given offset.
        '''
        if offset == 0:
            # 'total' is 'true' only on the first page
            text = {'rid': '', 'offset': '0', 'total': 'true', 'limit': '20', 'csrf_token': ''}
        else:
            text = {'rid': '', 'offset': '%s' % offset, 'total': 'false', 'limit': '20', 'csrf_token': ''}
        return text

    def get_json_data(self, url, offset):
        '''Fetch one page of comments as JSON.
        '''
        text = self.get_offset(offset)
        data = self.encrypted_request(text)
        json_text = self.get_post_req(url, data)
        return json_text

    def get_song_comment(self):
        '''Fetch all comments under a song.
        '''
        comment_info = []
        # The first page reports the total number of comments
        data = self.get_json_data(self.comment_url, offset=0)
        comment_count = data.get('total', 0) if data else 0
        if comment_count:
            comment_info.append(data)
            # range() is empty when there are 20 comments or fewer
            for offset in range(20, int(comment_count), 20):
                comment = self.get_json_data(self.comment_url, offset=offset)
                comment_info.append(comment)
        return comment_info

    def get_album_comment(self, comment_count):
        '''Fetch all comments under an album.
        '''
        album_comment_info = []
        if comment_count:
            for offset in range(0,int(comment_count),20):
                comment = self.get_json_data(self.comment_url,offset=offset)
                album_comment_info.append(comment)
        return album_comment_info
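
The padding step inside AES_encrypt can be tried out on its own. Below is a minimal Python 3 sketch; the helper names pkcs7_pad and pkcs7_unpad are made up for illustration and are not part of the project. It counts characters rather than bytes, which is only safe for ASCII text; json.dumps escapes non-ASCII characters by default, so that assumption holds for the payload used here.

```python
def pkcs7_pad(text, block_size=16):
    """Pad text to a multiple of block_size, the same way AES_encrypt does."""
    pad = block_size - len(text) % block_size
    # Every padding character encodes the padding length itself
    return text + pad * chr(pad)


def pkcs7_unpad(padded):
    """Remove the padding by reading its length from the last character."""
    return padded[:-ord(padded[-1])]


print(len(pkcs7_pad("a" * 15)))  # 16: one character of padding
print(len(pkcs7_pad("a" * 16)))  # 32: a full extra block is added
```

Note that input which is already a multiple of 16 still gets a whole block of padding, so the padding can always be removed unambiguously.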

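The code contains four constants. '0102030405060708' is a fixed value (it is passed to AES.new as the initialization vector), and pubKey, modulus, and nonce are constants as well. Don't change them, and don't puzzle over why they have these values; that is simply how they are set. By convention, constant names should be uppercase and ideally defined outside the function, but I deliberately kept them inside it here to make the code easier to follow.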
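
To see what the RSA step does with pubKey and modulus, here is a small standalone Python 3 sketch. The uppercase names PUB_KEY and MODULUS and the function name rsa_encrypt are choices made for this example; the values are copied from the class above, and the sample key string is arbitrary.

```python
import binascii

# Constants copied from encrypted_request(); the uppercase names are this sketch's own
PUB_KEY = '010001'
MODULUS = ('00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725'
           '152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312'
           'ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424'
           'd813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7')


def rsa_encrypt(text, pub_key, modulus):
    """Reverse the text, read it as one big integer, and raise it to pub_key mod modulus."""
    text = text[::-1]
    message = int(binascii.hexlify(text.encode('utf-8')), 16)
    # Three-argument pow() does modular exponentiation directly
    result = pow(message, int(pub_key, 16), int(modulus, 16))
    # Left-pad with zeros to a fixed width of 256 hex digits
    return format(result, 'x').zfill(256)


enc = rsa_encrypt('0123456789abcdef', PUB_KEY, MODULUS)
print(len(enc))  # 256
```

Since no random padding is involved, the same key string always produces the same encSecKey. Writing the step as `a ** b % c` gives the same result but first computes the full power, which is far slower than `pow(a, b, c)`.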

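I won't go over the repeated parts again. The last two methods are separate because an album's comment count is available from the album information, whereas a song's comment count cannot be obtained from the album's track list. For a song, you have to fetch the first page of JSON data first; its total field is the total number of comments, and the rest of the processing follows from that.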
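
The paging logic shared by both methods can be sketched separately from the network code. This is a minimal Python 3 illustration; the helper names page_offsets and build_payload are hypothetical, and the page size of 20 mirrors the limit used in get_offset.

```python
def page_offsets(total, limit=20):
    """Return the offset of every page needed to cover `total` comments."""
    return list(range(0, int(total), limit))


def build_payload(offset, limit=20):
    """Build the same dictionary as get_offset(), with the branch folded into one expression."""
    return {
        'rid': '',
        'offset': str(offset),
        # 'total' is 'true' only for the first page
        'total': 'true' if offset == 0 else 'false',
        'limit': str(limit),
        'csrf_token': '',
    }


print(page_offsets(45))  # [0, 20, 40]
print(build_payload(20)['total'])  # false
```

For an album, the count is known up front, so all offsets can be generated at once. For a song, the request at offset 0 has to be made first to read total, and the remaining offsets start at 20.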

The comment APIs:

# 1. Album:
comment_url = 'http://music.163.com/weapi/v1/resource/comments/R_AL_3_%s?csrf_token=' % album_id
# 2. Song:
comment_url = 'http://music.163.com/weapi/v1/resource/comments/R_SO_4_%s?csrf_token=' % song_id
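
The two URL patterns differ only in the resource prefix, so they can be wrapped in one small helper. This is a Python 3 sketch; build_comment_url, COMMENT_URL, and PREFIXES are names invented for this example, and the IDs are placeholders. The commented-out lines show how the result might be handed to the CommentCrawl class above; they are left as comments because they need network access.

```python
COMMENT_URL = 'http://music.163.com/weapi/v1/resource/comments/%s%s?csrf_token='
# Resource prefixes taken from the two URLs above
PREFIXES = {'album': 'R_AL_3_', 'song': 'R_SO_4_'}


def build_comment_url(kind, resource_id):
    """Return the comment API URL for an album or a song."""
    return COMMENT_URL % (PREFIXES[kind], resource_id)


print(build_comment_url('song', 123456))
# http://music.163.com/weapi/v1/resource/comments/R_SO_4_123456?csrf_token=

# Possible usage with the class from this article (requires network access):
# song_pages = CommentCrawl(build_comment_url('song', 123456)).get_song_comment()
# album_pages = CommentCrawl(build_comment_url('album', 654321)).get_album_comment(100)
```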

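Then pass comment_url as the argument to the class defined above. The one difference is that for an album you first need to obtain the number of album comments.
That concludes the analysis; the rest of the code is left for you to write.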
