Python: Capturing Encrypted Ajax Data with BrowserMob Proxy + Selenium

阿新 • Published: 2020-07-15

BrowserMob Proxy (BMP for short) is an HTTP proxy service. We can use it to intercept HTTP requests and the contents of their responses.

Step 1: Install the BrowserMob Proxy Python package.

pip install browsermob-proxy

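Step 2: Download the browsermob-proxy binary release, which is used to start the BrowserMob Proxy server.

Download: https://github.com/lightbody/browsermob-proxy/releases

Step 3: Unpack the downloaded files directly into the project directory.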
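As a rough sketch of how the launcher path might be resolved once the archive sits in the project directory: the helper name `bmp_launcher_path` is hypothetical, the directory name `browsermob-proxy-2.1.4` matches the path used in the script in this post, and the `.bat` launcher for Windows is an assumption about how the release archive is laid out.

```python
import os
import sys


def bmp_launcher_path(project_dir, version="2.1.4", platform=sys.platform):
    """Build the path to the BrowserMob Proxy launcher script.

    Assumes the release archive was unpacked as
    <project_dir>/browsermob-proxy-<version>/ and that Windows uses a
    .bat launcher in the same bin directory (an assumption; check your copy).
    """
    name = "browsermob-proxy.bat" if platform.startswith("win") else "browsermob-proxy"
    return os.path.join(project_dir, f"browsermob-proxy-{version}", "bin", name)


if __name__ == "__main__":
    path = bmp_launcher_path(".")
    # Report whether the launcher is where we expect it to be.
    print(path, "(found)" if os.path.isfile(path) else "(missing)")
```

Checking the path up front gives a clearer message than letting the server start fail later.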

Now for the code:

# _*_ coding:utf-8 _*_
import os
 
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from browsermobproxy import Server
import time
import json


class BaseFramework(object):

    def __init__(self):
        self.server = Server('./browsermob-proxy-2.1.4/bin/browsermob-proxy')
        self.server.start()
        self.proxy = self.server.create_proxy()
        chrome_options = Options()
        chrome_options.add_argument('--ignore-certificate-errors')
        chrome_options.add_argument('--proxy-server={0}'.format(self.proxy.proxy))
        chrome_options.add_argument('--headless')  # headless mode
        self.browser = webdriver.Chrome(options=chrome_options)

     
    def process_request(self, request, response):
        pass

    def process_response(self, response, request):
        pass

    def run(self, func, *args):
        self.proxy.new_har(options={
            'captureContent': True,
            'captureHeaders': True
        })
        func(*args)
        result = self.proxy.har
        for entry in result['log']['entries']:
            request = entry['request']
            response = entry['response']
            self.process_request(request, response)
            self.process_response(response, request)

    def __del__(self):
        # Close the proxy, quit the browser, and stop the BMP server process.
        self.proxy.close()
        self.browser.quit()
        self.server.stop()


class Framework(BaseFramework):

    def load(self, url):
        self.browser.get(url)
        time.sleep(3)

    def process_request(self, request, response):
        pass

    def process_response(self, response, request):
        # print(request['url'])
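        if '/item/timemap/cn/' in request['url']:
            # Once you have found the URL that carries the data you need, parsing is easy.
            try:
                text = response['content']['text']
                text_dict = json.loads(text)
                data_result = text_dict['data']
            except (KeyError, ValueError):
                # A field is missing, or the body is not valid JSON.
                print('----could not read data from response----')
                return
            name = data_result['name']  # person's name
            # name_id is the module-level loop variable set in the __main__ block.
            id_name = name_id + '_' + name
            print(id_name)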
            time_map_list = data_result['timeMap']
            if not time_map_list:
                return
            # Key each timeline item by its index, stored as a string.
            time_map_dict = {str(i): item for i, item in enumerate(time_map_list)}
            path = f'./****/{id_name}.json'
            if os.path.exists(path):
                print(f'------{id_name}--already exists------')
                return
            with open(path, 'w', encoding='utf-8') as f:
                f.write(json.dumps(time_map_dict, ensure_ascii=False, indent=4))


if __name__ == '__main__':
    framework = Framework()  # lowercase name so the instance does not shadow the class
    id_list = ['********']
    for name_id in id_list:
        url = "************************"
        framework.run(framework.load, url)

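Note: BrowserMob Proxy needs Java. If Java is not installed, starting the server will probably fail with an error; install Java and configure its environment variables.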
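A quick way to check for Java before starting the server might look like the sketch below. The helper name `java_on_path` is hypothetical, and the check only looks for a `java` executable on PATH; it does not verify the Java version.

```python
import shutil


def java_on_path(which=shutil.which):
    """Return True if a `java` executable can be found on PATH.

    The `which` parameter exists only so the lookup can be swapped out in tests.
    """
    return which("java") is not None


if __name__ == "__main__":
    if java_on_path():
        print("java found on PATH")
    else:
        print("java not found; install a Java runtime before starting the proxy")
```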

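Explanation:

The code works in four steps:

•Step one starts BrowserMob Proxy, which launches a proxy service on the local machine. Note that the first argument to Server must be the path to the BrowserMob Proxy executable; here it points to the browsermob-proxy script in the bin directory of the downloaded release.
•Step two starts Selenium, with its proxy server set to the BrowserMob Proxy address.
•Step three visits the page while recording traffic. We call new_har and ask it to capture the response body and headers, then call Selenium's get method to open a page. The browser loads the page, and every request and response is recorded in the HAR.
•Step four reads the contents of the HAR. The entries field under log holds the details of each request and response, so all of the request and response data is available, Ajax responses included.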
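To make step four concrete, the sketch below walks a HAR-shaped dictionary and picks out the entries whose request URL contains a given fragment. The function name `matching_entries` and the sample data are made up for illustration; in the real script the dictionary would come from `self.proxy.har`.

```python
def matching_entries(har, url_fragment):
    """Yield (request, response) pairs whose request URL contains url_fragment."""
    for entry in har.get("log", {}).get("entries", []):
        request = entry["request"]
        if url_fragment in request["url"]:
            yield request, entry["response"]


# Hand-written sample data in HAR shape; the URLs are placeholders.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/index.html"},
             "response": {"status": 200, "content": {"text": "<html></html>"}}},
            {"request": {"url": "https://example.com/api/items?page=1"},
             "response": {"status": 200, "content": {"text": "{\"data\": []}"}}},
        ]
    }
}

if __name__ == "__main__":
    for request, response in matching_entries(sample_har, "/api/"):
        print(request["url"], response["status"])
```

Filtering by a URL fragment is the same idea as the `'/item/timemap/cn/' in request['url']` test in process_response, just pulled out into a reusable generator.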

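With this approach we no longer have to wait for the page to finish rendering and then extract information from the rendered result. We can take the raw data straight from the Ajax responses, which is very convenient.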
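Pulling the raw data out of a captured response could be sketched as follows. The helper name `response_json` is hypothetical. It relies on the HAR convention that the body is stored in `content.text` and that `content.encoding` is `"base64"` when the text is base64-encoded; whether the proxy actually sets that field for a given response is something to verify against your own capture.

```python
import base64
import json


def response_json(response):
    """Return the parsed JSON body of a HAR response entry, or None.

    HAR stores the body in response["content"]["text"]; when
    response["content"]["encoding"] is "base64", the text is base64-encoded.
    """
    content = response.get("content", {})
    text = content.get("text")
    if not text:
        return None
    if content.get("encoding") == "base64":
        text = base64.b64decode(text).decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


if __name__ == "__main__":
    sample = {"content": {"text": '{"data": {"name": "demo"}}'}}
    print(response_json(sample))
```

Returning None for missing or non-JSON bodies lets the caller skip irrelevant entries without wrapping every access in try/except.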
