
QQ Group Info Scraping: Cracking the API via Packet Capture (2)

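Previous post: QQ Group Info Scraping: Using Simulated Login (1)
This approach scrapes more efficiently than the previous one. The target page is https://qun.qq.com/member.html. Here is the idea in brief: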

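  1. First, simulate a login to obtain the cookies.
  2. Packet capture shows that the QQ group data API is https://qun.qq.com/cgi-bin/qun_mgr/search_group_members. Send it a POST request with the requests module; the form data has the shape {'gc': qqGroupNum, 'st': 0, 'end': 100, 'sort': 0, 'bkn': self.bkn}, and the response is JSON.
  3. Once you have the JSON, you can save it to a text file or a database.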
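For reference, the form data in step 2 travels as an ordinary application/x-www-form-urlencoded body, which should match what Fiddler shows for the captured request. A minimal sketch with the standard library; the group number and bkn below are made-up values, not real ones:

```python
from urllib.parse import urlencode

# Made-up group number and bkn, for illustration only
form = {'gc': '123456789', 'st': 0, 'end': 100, 'sort': 0, 'bkn': 5381}

# The application/x-www-form-urlencoded body of the POST request
body = urlencode(form)
print(body)  # gc=123456789&st=0&end=100&sort=0&bkn=5381
```

requests encodes a dict passed as data= in this same form-urlencoded format, so the dict from step 2 can be handed to requests.post as-is.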

1. Simulated login

As in part 1, Selenium handles the login:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://qun.qq.com/member.html"
driver = webdriver.Chrome()
driver.delete_all_cookies()
driver.get(url)

time.sleep(1)
driver.switch_to.frame("login_frame")  # enter the login iframe
time.sleep(1)
driver.find_element(By.ID, "switcher_plogin").click()  # switch to username/password login
driver.find_element(By.ID, 'u').clear()  # username box
driver.find_element(By.ID, 'u').send_keys(qq)
driver.find_element(By.ID, 'p').clear()  # password box
driver.find_element(By.ID, 'p').send_keys(passwd)
driver.find_element(By.CLASS_NAME, "login_button").click()

Once you have the key cookie values, skey and p_skey, the browser can be closed.
The packet capture is explained in detail below.

2. Cracking the API via packet capture

Open https://qun.qq.com/member.html, choose username/password login, and start Fiddler.
Log in with a correct QQ number and password, then look at the sessions Fiddler captured.
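To see this request more clearly, open it in Fiddler's Composer tab.
The Execute button in the top-right corner re-sends this POST request, so you can experiment with it yourself.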
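In my testing, getting correct data from this request requires the uin, skey, and p_skey cookies plus the request fields gc, st, end, sort, and bkn.
Let's go through these parameters one by one: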

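  • uin: obviously the letter o followed by your own QQ number.
  • skey and p_skey: I'm not sure what they are, but they can be read directly from the cookies, which is why the simulated-login step is needed.
  • gc: the QQ group number.
  • st: presumably short for "start", the index of the first group member to return.
  • end: the counterpart of st; together, st and end request the members from index st up to index end.
  • sort: not important; just set it to 0.
  • bkn: a specially computed value, presumably used to verify the login, a light layer of obfuscation. I spent quite a while on it and could not obtain this value from anywhere; in the end I found how it is computed in a piece of JavaScript. (I must be a genius, haha.) Back in the browser, press F12 and look at which JS files the page loads.
    Copy that JS into Notepad++ and take a closer look.
    It shows that bkn is derived from the skey cookie: starting from t = 5381, each character of skey updates t += (t << 5) + (its character code), and only the low 31 bits of the result are kept. With all the inputs known, on to the code: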
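The same formula pulled out as a standalone function, handy for checking it without a browser. This is a sketch, not the author's code: it assumes the mask is 2**31 - 1 (0x7fffffff). The author's version masks with sys.maxint, which does not exist in Python 3 and equals 2**31 - 1 only on some Python 2 builds. Since each step is t = 33 * t + code, masking after every step or only once at the end gives the same 31-bit result.

```python
def get_bkn(skey):
    """Compute bkn from the skey cookie (a djb2-style hash, 31-bit mask)."""
    t = 5381
    for ch in skey:
        t += (t << 5) + ord(ch)  # same as t = t * 33 + code
        t &= 0x7fffffff          # assumption: the mask is 2**31 - 1
    return t


print(get_bkn('a'))   # 177670
print(get_bkn('ab'))  # 5863208
```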
skey = driver.get_cookie('skey')['value']
p_skey = driver.get_cookie('p_skey')['value']
cookie = "uin=o" + qq + "; skey=" + skey + "; p_skey=" + p_skey
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
    "Cookie": cookie
}
# bkn: hash of skey, computed the same way as the page's JavaScript
t = 5381
for ch in skey:
    t += (t << 5) + ord(ch)
    t &= 0x7fffffff  # keep the low 31 bits (2147483647)
bkn = t
driver.quit()
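If a cookie is missing (for example because the login did not complete), driver.get_cookie returns None and the lookups above fail with a TypeError. A slightly more defensive sketch builds the Cookie header from the list that Selenium's driver.get_cookies() returns, where each entry is a dict with 'name' and 'value'. The function name build_cookie_header is my own, not part of the original code.

```python
def build_cookie_header(qq, cookies):
    """Build the Cookie header value for qun.qq.com requests.

    cookies: a list of dicts like the one driver.get_cookies() returns,
    each with at least 'name' and 'value' keys.
    """
    jar = {c['name']: c['value'] for c in cookies}
    missing = [name for name in ('skey', 'p_skey') if name not in jar]
    if missing:
        # e.g. the login did not finish, or a captcha blocked it
        raise RuntimeError('login cookies not found: ' + ', '.join(missing))
    # uin is the letter "o" followed by the QQ number
    return 'uin=o{}; skey={}; p_skey={}'.format(qq, jar['skey'], jar['p_skey'])


# Usage with made-up cookie values in place of driver.get_cookies()
fake_cookies = [
    {'name': 'skey', 'value': '@AbCdEfGhI'},
    {'name': 'p_skey', 'value': 'xyz123'},
    {'name': 'other', 'value': '1'},
]
print(build_cookie_header('10001', fake_cookies))
# uin=o10001; skey=@AbCdEfGhI; p_skey=xyz123
```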

3. Send the POST request and get the JSON data

import requests

data = {'gc': qqGroupNum, 'st': 0, 'end': 100, 'sort': 0, 'bkn': bkn}
# verify=False turns off TLS certificate verification
response = requests.post("https://qun.qq.com/cgi-bin/qun_mgr/search_group_members", data=data, headers=headers,
                         verify=False)
qqJson = response.json()
qqGroupCount = qqJson['count']  # total number of members in the group
members = qqJson['mems']        # the members in this page
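The outline also mentions saving to a database. A minimal sketch with the standard-library sqlite3 module, assuming members is the list from the code above; the table name and column choice are my own and use only fields the full script reads (uin, nick, g, join_time, last_speak_time).

```python
import sqlite3


def save_members(conn, group, members):
    """Insert or update member records (entries of qqJson['mems'])."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members ("
        " gc TEXT, uin INTEGER, nick TEXT, g INTEGER,"
        " join_time INTEGER, last_speak_time INTEGER,"
        " PRIMARY KEY (gc, uin))"
    )
    rows = [(str(group), m['uin'], m['nick'], m['g'], m['join_time'], m['last_speak_time'])
            for m in members]
    # re-running the scraper updates existing rows instead of duplicating them
    conn.executemany("INSERT OR REPLACE INTO members VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


# Usage with two made-up member records
conn = sqlite3.connect(':memory:')
fake = [
    {'uin': 10001, 'nick': 'Alice', 'g': 1, 'join_time': 1500000000, 'last_speak_time': 1550000000},
    {'uin': 10002, 'nick': 'Bob, Jr.', 'g': 0, 'join_time': 1500000100, 'last_speak_time': 1550000100},
]
save_members(conn, '123456789', fake)
save_members(conn, '123456789', fake)  # second run: still two rows
print(conn.execute("SELECT COUNT(*) FROM members").fetchone()[0])  # 2
```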

Finally, I'd love a thousand likes!
The complete code:

import csv
import getpass
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")
MEMBERS_URL = "https://qun.qq.com/cgi-bin/qun_mgr/search_group_members"


class qqGroupSpider():
    '''
    QQ group spider
    '''
    def __init__(self, qq, passwd):
        self.qqGroupCount = 0
        self.nextCount = 0
        self.members = []
        self.qqGroupNum = 0
        url = "https://qun.qq.com/member.html"
        driver = webdriver.Chrome()
        driver.delete_all_cookies()
        driver.get(url)

        time.sleep(1)
        driver.switch_to.frame("login_frame")  # enter the login iframe
        time.sleep(1)
        driver.find_element(By.ID, "switcher_plogin").click()  # switch to password login
        driver.find_element(By.ID, 'u').clear()  # username box
        driver.find_element(By.ID, 'u').send_keys(qq)
        driver.find_element(By.ID, 'p').clear()  # password box
        driver.find_element(By.ID, 'p').send_keys(passwd)
        driver.find_element(By.CLASS_NAME, "login_button").click()

        time.sleep(3)  # give the login time to set the cookies
        self.skey = driver.get_cookie('skey')['value']
        self.p_skey = driver.get_cookie('p_skey')['value']
        self.cookie = "uin=o" + qq + "; skey=" + self.skey + "; p_skey=" + self.p_skey
        self.headers = {
            "User-Agent": USER_AGENT,
            "Cookie": self.cookie
        }
        # bkn: hash of skey, computed the same way as the page's JavaScript
        t = 5381
        for ch in self.skey:
            t += (t << 5) + ord(ch)
            t &= 0x7fffffff  # keep the low 31 bits (2147483647)
        self.bkn = t
        driver.quit()

    class Mylist():
        '''
        Custom list with a custom iterator: fetches the next 100 members when needed
        '''
        def __init__(self, members, qqGroupNum, qqGroupCount):
            self.nextCount = 0
            self.qqGroupCount = qqGroupCount
            self.qqGroupNum = qqGroupNum
            self.members = members
            self.spider = None

        def __iter__(self):
            return self

        def __next__(self):  # Python 3 iterator protocol (next() in Python 2)
            if self.nextCount >= self.qqGroupCount:
                raise StopIteration
            elif self.nextCount == 0:
                self.nextCount += 1
                return self.members[0]
            elif self.nextCount % 100 == 0:
                # crossed a page boundary: fetch the next 100 members
                members = self.spider.getMembers(self.qqGroupNum, start=self.nextCount)
                self.members = members
                self.nextCount += 1
                return members[0]
            else:
                self.nextCount += 1
                return self.members[(self.nextCount - 1) % 100]

    def getGroupList(self):
        '''
        Get all QQ groups this account has created or joined
        :return: list of groups
        '''
        groupInfoR = requests.post('https://qun.qq.com/cgi-bin/qun_mgr/get_group_list',
                                   data={'bkn': self.bkn}, headers=self.headers, verify=False)
        groupInfo = groupInfoR.json()
        # .get() avoids a KeyError if either list is absent from the response
        groupList = groupInfo.get('create', []) + groupInfo.get('join', [])
        return groupList

    def getMembers(self, qqGroupNum, start=0, count=100):
        '''
        Get a page of QQ group members as a list of JSON objects
        :param qqGroupNum: QQ group number
        :param start: index of the first member to fetch
        :param count: number of members to fetch (default 100; smaller groups return all members)
        :return: list of members
        '''
        self.qqGroupNum = qqGroupNum
        data = {'gc': qqGroupNum, 'st': start, 'end': start + count, 'sort': 0, 'bkn': self.bkn}
        # verify=False turns off TLS certificate verification
        response = requests.post(MEMBERS_URL, data=data, headers=self.headers, verify=False)
        qqJson = response.json()
        self.qqGroupCount = qqJson['count']
        members = qqJson['mems']
        return members

    def getMembers2(self, qqGroupNum):
        '''
        Return a custom list that iterates over all members of the group
        :param qqGroupNum: QQ group number
        :return: Mylist
        '''
        members = self.getMembers(qqGroupNum)  # first page; also sets self.qqGroupCount
        mylist = self.Mylist(members, qqGroupNum, self.qqGroupCount)
        mylist.spider = self
        return mylist


def main():
    qq = input('QQ number: ')
    pwd = getpass.getpass('Password: ')  # does not echo the password
    spider = qqGroupSpider(qq, pwd)
    groupList = spider.getGroupList()

    print('Choose one of the following QQ groups to scrape:')
    for each in groupList:
        print(each['gn'], ":", each['gc'])
    qqGroup = input('QQ group number: ')

    members = spider.getMembers2(qqGroup)
    # members = spider.getMembers(qqGroup)

    # gender codes: 0 = male, 1 = female, 255 = unknown; other values are written as-is
    genders = {0: 'male', 1: 'female', 255: 'unknown'}
    # csv.writer quotes nicknames that contain commas
    with open('qqGroup.txt', 'w', encoding='utf-8', newline='') as writefile:
        writer = csv.writer(writefile)
        writer.writerow(['nickname', 'QQ number', 'gender', 'QQ age', 'join time',
                         'level (points)', 'last spoke'])
        for one in members:
            lv = one['lv']
            join_time = time.strftime('%Y-%m-%d %H:%M', time.localtime(one['join_time']))
            last_speak_time = time.strftime('%Y-%m-%d %H:%M', time.localtime(one['last_speak_time']))
            writer.writerow([one['nick'], one['uin'], genders.get(one['g'], one['g']), one['qage'],
                             join_time, '%s(%s)' % (lv['level'], lv['point']), last_speak_time])


if __name__ == '__main__':
    main()
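The Mylist class above hand-rolls an iterator with manual index bookkeeping. A generator can express the same paging more compactly; this is an alternative sketch, not the author's code. fetch_page is a hypothetical callable that sends st=start and end=end and returns (count, mems); spider_fetch adapts the existing getMembers method to that shape. Like Mylist, it uses at most page_size entries from each response and stops at count, so it behaves the same whether or not the server treats end as inclusive.

```python
def iter_members(fetch_page, group, page_size=100):
    """Yield every member of a group, fetching one page at a time.

    fetch_page(group, start, end) is a hypothetical callable that sends
    st=start and end=end and returns (count, mems) from the JSON response.
    """
    start = 0
    while True:
        count, page = fetch_page(group, start, start + page_size)
        if not page:
            return  # nothing more to read; also guards against an endless loop
        # use at most page_size entries per response and never go past count
        for member in page[:min(page_size, count - start)]:
            yield member
        start += page_size
        if start >= count:
            return


def spider_fetch(spider):
    """Adapt qqGroupSpider.getMembers to the fetch_page shape."""
    def fetch(group, start, end):
        mems = spider.getMembers(group, start=start, count=end - start)
        return spider.qqGroupCount, mems  # getMembers stores 'count' here
    return fetch
```

With a logged-in spider, `for one in iter_members(spider_fetch(spider), qqGroup):` would replace the getMembers2 loop in main().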

The code is on GitHub in python-learning;
the script above is at other/qq_group2.py.
My GitHub
QQ:2541692705
I want to wander, I want to study; if the chance comes, we'll meet again somewhere out in the world.