A Brief Introduction to CDP (Chrome DevTools Protocol)

阿新 • Published: 2021-06-16

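What is CDP

According to the official site, CDP (Chrome DevTools Protocol) lets us inspect and debug Chromium, Chrome, and other Blink-based browsers. The protocol is widely used; its best-known client is Chrome DevTools, and the DevTools team also maintains the protocol's API.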

How to use CDP

First, start Chrome with remote debugging enabled: "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
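For scripted setups, the launch step can be wrapped in a small helper. The sketch below is only an illustration: `build_chrome_command`, `launch_chrome`, and the `/tmp/cdp-profile` directory are hypothetical names, while `--remote-debugging-port` and `--user-data-dir` are existing Chrome switches. Passing a separate profile directory is an assumption worth testing in your environment, since newer Chrome builds may not honor the debugging port on the default profile.

```python
import subprocess


def build_chrome_command(chrome_path, port=9222, user_data_dir=None):
    """Return the argument list that starts Chrome with remote debugging on."""
    command = [chrome_path, "--remote-debugging-port=%d" % port]
    if user_data_dir is not None:
        # Optional: run against a dedicated profile directory.
        command.append("--user-data-dir=%s" % user_data_dir)
    return command


def launch_chrome(chrome_path, port=9222, user_data_dir=None):
    """Start Chrome in the background and return the Popen handle."""
    return subprocess.Popen(build_chrome_command(chrome_path, port, user_data_dir))


# Only builds the command; call launch_chrome(...) to actually start the browser.
print(build_chrome_command("chrome", 9222, "/tmp/cdp-profile"))
```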

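Whenever you open DevTools in the browser you are already using CDP, even if you barely notice it. One way to see CDP more directly is to open DevTools on DevTools itself:

Undock DevTools into a separate window (the first option under Dock side).
With the undocked DevTools window focused, press Ctrl+Shift+I again. This opens a second DevTools that inspects the first one. In the Console of this new DevTools, enter the following code: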

let Main = await import('./main/main.js');
Main.MainImpl.sendOverProtocol('Runtime.evaluate', {expression: 'alert (12345)'});

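The page then shows an alert with 12345. In other words, even a simple snippet run in the Console is sent over CDP to the page's JavaScript engine, executed remotely, and the result is returned.

Beyond that, the Protocol Monitor panel also helps make CDP more tangible.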

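Some important URLs

Once a browser exposes its remote debugging port, we can use CDP to debug its pages remotely. Because CDP runs over WebSocket, two URLs matter before anything else:
http://localhost:[port]/json/list
http://localhost:[port]/json/version
These two URLs give us the WebSocket URLs to connect to: json/list lists the debuggable targets such as pages, and json/version describes the browser itself. The data returned by json/list looks something like this: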

[
{
description: "",
devtoolsFrontendUrl: "/devtools/inspector.html?ws=localhost:8080/devtools/page/a31c4d5c-b0df-48e8-8dcc-7c98964e2ebe",
id: "a31c4d5c-b0df-48e8-8dcc-7c98964e2ebe",
title: "",
type: "page",
url: "xxx://xxx",
webSocketDebuggerUrl: "ws://localhost:8080/devtools/page/a31c4d5c-b0df-48e8-8dcc-7c98964e2ebe"
}
]

The webSocketDebuggerUrl field is the key that opens up remote debugging.
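The lookup described above can be sketched in a few lines of Python using only the standard library. The helper names `list_targets` and `first_page_ws_url` are hypothetical; the sample data reuses the values shown above. `list_targets` needs a running Chrome with the debugging port open, so the demonstration at the bottom only exercises the parsing step.

```python
import json
from urllib.request import urlopen


def list_targets(port=9222, host="localhost"):
    """Fetch and decode http://host:port/json/list (needs a running Chrome)."""
    with urlopen("http://%s:%d/json/list" % (host, port)) as response:
        return json.loads(response.read().decode("utf-8"))


def first_page_ws_url(targets):
    """Return the webSocketDebuggerUrl of the first target of type 'page'."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


# Sample payload shaped like the json/list output shown above.
sample_targets = [{
    "description": "",
    "id": "a31c4d5c-b0df-48e8-8dcc-7c98964e2ebe",
    "title": "",
    "type": "page",
    "url": "xxx://xxx",
    "webSocketDebuggerUrl": "ws://localhost:8080/devtools/page/a31c4d5c-b0df-48e8-8dcc-7c98964e2ebe",
}]
print(first_page_ws_url(sample_targets))
```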

The main act: WebSocket

Next we connect to the WebSocket and can happily drive the page remotely, just as Chrome DevTools does. Here is an example:

const WebSocket = require('ws');
const puppeteer = require('puppeteer');

(async () => {
  // Puppeteer launches browser with a --remote-debugging-port=0 flag,
  // parses Remote Debugging URL from Chromium's STDOUT and exposes
  // it as |browser.wsEndpoint()|.
  const browser = await puppeteer.launch();

  // Create a websocket to issue CDP commands.
  const ws = new WebSocket(browser.wsEndpoint(), {perMessageDeflate: false});
  await new Promise(resolve => ws.once('open', resolve));
  console.log('connected!');

  ws.on('message', msg => console.log(msg));

  console.log('Sending Target.setDiscoverTargets');
  ws.send(JSON.stringify({
    id: 1,
    method: 'Target.setDiscoverTargets',
    params: {
      discover: true
    },
  }));
})();

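More examples can be found in the references at the end of this article.

JSON-RPC

As the example above shows, once the WebSocket is connected, a command sent to the browser has roughly three parts: id, method, and params. For example, a command that runs console.log('hello'):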

{
  "id": 235,
  "method": "Runtime.evaluate",
  "params": {
    "expression": "console.log('hello');",
    "objectGroup": "console",
    "includeCommandLineAPI": true,
    "silent": false,
    "contextId": 1,
    "returnByValue": false,
    "generatePreview": true,
    "userGesture": true,
    "awaitPromise": false
  }
}
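To make the message shape concrete, here is a minimal Python sketch that builds commands with increasing ids and tells the browser's replies apart. `CommandBuilder` and `classify` are hypothetical helpers, not part of any library. The sketch relies on how the protocol frames messages: a reply repeats the command's id and carries either a result or an error, while an event has a method and params but no id.

```python
import itertools
import json


class CommandBuilder:
    """Builds CDP command strings with a fresh id for every command."""

    def __init__(self):
        self._ids = itertools.count(1)

    def build(self, method, **params):
        return json.dumps({"id": next(self._ids), "method": method, "params": params})


def classify(raw_message):
    """Return 'result' or 'error' for a command reply, 'event' otherwise."""
    message = json.loads(raw_message)
    if "id" in message:
        return "error" if "error" in message else "result"
    return "event"


builder = CommandBuilder()
print(builder.build("Runtime.evaluate", expression="console.log('hello');"))
print(classify('{"id": 1, "result": {"result": {"type": "undefined"}}}'))
print(classify('{"method": "Runtime.consoleAPICalled", "params": {}}'))
```

Matching replies to commands by id is what allows several commands to be in flight on one socket at the same time.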

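Chrome DevTools offers a huge range of features, and nearly all of them are built from individual commands like this one. It calls to mind the old Chinese saying: a nine-story terrace rises from a heap of earth. (End of this article.)

References: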
https://chromedevtools.github.io/devtools-protocol
https://github.com/aslushnikov/getting-started-with-cdp/blob/master/README.md

Source: www.cnblogs.com, author: nobody-junior. Copyright belongs to the original author; please contact the author before reposting.

Original link: https://www.cnblogs.com/imgss/p/12852595.html

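Here I use the Python library pychrome (available on GitHub). It is simple to use; see the demo in its GitHub repository.

This library depends on websocket-client.

Getting Performance API data

This uses Runtime.evaluate, the Runtime domain API that executes JavaScript.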

import pychrome

# Start Chrome first with `--remote-debugging-port=9222`; without remote
# debugging enabled there is no way to talk to Chrome.
PAGE_URL = 'http://example.com'  # placeholder: replace with your page URL

browser = pychrome.Browser('http://127.0.0.1:%d' % 9222)
tab = browser.new_tab()
tab.start()
tab.Runtime.enable()
tab.Page.navigate(url=PAGE_URL)
# Give the page up to 10 seconds to finish loading
tab.wait(10)
# Run JavaScript in the page
timing_remote_object = tab.Runtime.evaluate(expression='performance.timing')
# Read the properties of the performance.timing result
timing_properties = tab.Runtime.getProperties(
    objectId=timing_remote_object.get('result').get('objectId')
)
timing = {}
for item in timing_properties.get('result'):
    if item.get('value', {}).get('type') == 'number':
        timing[item.get('name')] = item.get('value').get('value')
# Read the performance.getEntries() data
entries_remote_object = tab.Runtime.evaluate(expression='performance.getEntries()')
entries_properties = tab.Runtime.getProperties(
    objectId=entries_remote_object.get('result').get('objectId')
)
entries_values = []
for item in entries_properties.get('result'):
    # Array elements show up as properties named '0', '1', ...
    if item.get('name').isdigit():
        url_timing_properties = tab.Runtime.getProperties(
            objectId=item.get('value').get('objectId')
        )
        entries_value = {}
        for son_item in url_timing_properties.get('result'):
            if son_item.get('value', {}).get('type') in ('number', 'string'):
                entries_value[son_item.get('name')] = son_item.get('value').get('value')
        entries_values.append(entries_value)
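The same filtering loop appears twice above, so it can be factored into a helper. The sketch below is one possible shape, run against hand-written sample data rather than a live browser: `properties_to_dict` and `page_load_ms` are hypothetical names, and the sample values are made up. `navigationStart` and `loadEventEnd` are standard fields of `performance.timing`.

```python
def properties_to_dict(properties, allowed_types=("number", "string")):
    """Flatten a Runtime.getProperties 'result' list into {name: value}."""
    flat = {}
    for item in properties:
        value = item.get("value", {})  # accessor properties carry no 'value'
        if value.get("type") in allowed_types:
            flat[item.get("name")] = value.get("value")
    return flat


def page_load_ms(timing):
    """Milliseconds from navigationStart to loadEventEnd, or None if unknown."""
    start = timing.get("navigationStart")
    end = timing.get("loadEventEnd")
    if not start or not end:  # loadEventEnd stays 0 until the load event ends
        return None
    return end - start


# Made-up sample shaped like the 'result' list of Runtime.getProperties.
sample_properties = [
    {"name": "navigationStart", "value": {"type": "number", "value": 1620000000000}},
    {"name": "loadEventEnd", "value": {"type": "number", "value": 1620000001234}},
    {"name": "toJSON", "value": {"type": "function"}},
    {"name": "accessorOnly"},
]
timing = properties_to_dict(sample_properties)
print(page_load_ms(timing))  # prints 1234
```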

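Getting Network data

In practice performance.getEntries() does not record requests that returned 404, and when JavaScript on the current page triggers a request for a new HTML page, only the first page's requests are recorded. In these cases the Network domain API is needed to collect every request. The events used are:

Network.requestWillBeSent: fired before each HTTP request is sent
Network.responseReceived: fired when the HTTP response is first received
Network.loadingFinished: fired when a request finishes loading
Network.loadingFailed: fired when a request fails to load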

import time

import pychrome

PAGE_URL = 'http://example.com'  # placeholder: replace with your page URL
disable_cache = True             # whether to disable the browser cache


# Callbacks for the four events above
class NetworkAPIImplemention(object):

    def __init__(self):
        self.request_dict = {}
        # Time at which the first request started
        self.start = None

    def _entry(self, kwargs):
        # Create the record on demand in case events arrive out of order
        return self.request_dict.setdefault(kwargs.get('requestId'), {})

    def request_will_be_sent(self, **kwargs):
        if self.start is None:
            self.start = time.time()
        entry = self._entry(kwargs)
        entry['url'] = kwargs.get('request').get('url')
        entry['start'] = kwargs.get('timestamp')

    def loading_finished(self, **kwargs):
        # A response with an error status such as 404 still counts as finished
        entry = self._entry(kwargs)
        entry['end'] = kwargs.get('timestamp')
        entry['size'] = kwargs.get('encodedDataLength')

    def response_received(self, **kwargs):
        entry = self._entry(kwargs)
        entry['type'] = kwargs.get('type')
        entry['response'] = kwargs.get('response')

    def loading_failed(self, **kwargs):
        entry = self._entry(kwargs)
        entry['end'] = kwargs.get('timestamp')
        entry['error_text'] = kwargs.get('errorText')


network_api = NetworkAPIImplemention()
browser = pychrome.Browser('http://127.0.0.1:%d' % 9222)
tab = browser.new_tab()
# Bind the callbacks
tab.Network.requestWillBeSent = network_api.request_will_be_sent
tab.Network.responseReceived = network_api.response_received
tab.Network.loadingFinished = network_api.loading_finished
tab.Network.loadingFailed = network_api.loading_failed
tab.start()
tab.Network.enable()
tab.Runtime.enable()
# Optionally disable the cache
if disable_cache:
    tab.Network.setCacheDisabled(cacheDisabled=True)
tab.Page.navigate(url=PAGE_URL)
tab.wait(10)
tab.stop()
browser.close_tab(tab)
# Details of every URL that was requested
print(network_api.request_dict)
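Once the callbacks have filled `request_dict`, the raw records can be turned into a per-request summary. The sketch below is an illustration on made-up data: `summarize_requests` is a hypothetical helper, and it assumes the record layout built by the callbacks above (`start`, `end`, `response`, `error_text`). Network timestamps in the protocol are monotonic values in seconds, which is why the difference is multiplied by 1000.

```python
def summarize_requests(request_dict):
    """Return one row per request with its duration in milliseconds."""
    rows = []
    for request_id, info in request_dict.items():
        start, end = info.get("start"), info.get("end")
        duration_ms = None
        if start is not None and end is not None:
            duration_ms = round((end - start) * 1000, 1)
        rows.append({
            "requestId": request_id,
            "url": info.get("url"),
            "status": (info.get("response") or {}).get("status"),
            "duration_ms": duration_ms,
            "failed": "error_text" in info,
        })
    return rows


# Made-up records in the shape produced by the callbacks above.
sample_requests = {
    "1": {"url": "http://example.com/", "start": 100.0, "end": 100.25,
          "response": {"status": 200}},
    "2": {"url": "http://example.com/missing.js", "start": 100.5, "end": 101.0,
          "response": {"status": 404}},
    "3": {"url": "http://example.com/slow", "start": 100.75},
}
for row in summarize_requests(sample_requests):
    print(row)
```

A request without an `end` value, like the third one here, is one that was still in flight when the tab was stopped.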

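Listening for page events

Sometimes, especially on complex pages that depend on JavaScript and back-end data, the loadEventEnd event does not mean the page has really finished loading. In such cases you may need instrumentation hooks added by the developers.
The example below assumes the developers have defined a Loaded event.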

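# Ask the developers how and when to register for the event. "When" means
# the listener must be registered after the JS object exists; in our project
# `page` is declared in a JS file, so we register only after that file has loaded.
# This uses a Promise, which requires awaitPromise=True.
timeout = 30  # seconds; example value, adjust to your page
js = """
    new Promise((resolve, reject) => {
        page.getController().getPageEvent().addEventListener("Loaded",
            function() {
                resolve(new Date().getTime());
            });
    });
"""
custom_result = tab.Runtime.evaluate(
    expression=js,
    awaitPromise=True,
    timeout=timeout * 1000
)
print(custom_result.get('result').get('value'))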

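One pitfall: when I used performance.now() to get a time of the same kind as the Chrome DevTools Protocol timestamps, the value was not accurate, so I had to fall back to new Date().getTime().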
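Since the page-side value is now an epoch time in milliseconds while the Network events report monotonic seconds, the two need a common scale before they can be compared. One possible approach, offered as an assumption rather than the author's method, is to use the `wallTime` field that Network.requestWillBeSent carries alongside `timestamp` as a reference point. `monotonic_to_epoch_ms` is a hypothetical helper and the numbers are made up.

```python
def monotonic_to_epoch_ms(timestamp, ref_timestamp, ref_wall_time):
    """Convert a CDP monotonic timestamp (seconds) to epoch milliseconds.

    ref_timestamp and ref_wall_time are the 'timestamp' and 'wallTime'
    values taken from the same Network.requestWillBeSent event.
    """
    return (ref_wall_time + (timestamp - ref_timestamp)) * 1000


# Made-up reference pair from a first request, then a later event timestamp.
ref_timestamp = 100.0          # monotonic seconds
ref_wall_time = 1620000000.0   # seconds since the epoch
loaded_at_ms = 1620000006000   # value returned by new Date().getTime()

last_request_end_ms = monotonic_to_epoch_ms(105.5, ref_timestamp, ref_wall_time)
print(loaded_at_ms - last_request_end_ms)  # prints 500.0
```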

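Final notes

I started with the Node.js library chrome-remote-interface, but found that no more requests were recorded after the Page.loadEventFired callback, even though some pages still had unfinished requests. I am not sure whether I was using it incorrectly.
[Figure: a diagram from the W3C, included as a bonus]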
