
E-GraphSAGE: A Graph Neural Network based Intrusion Detection System (Notes)



Introduction

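In summary, the paper makes two main contributions:

• We propose and implement E-GraphSAGE, an extension of GraphSAGE that allows edge features/attributes to be incorporated into graph representation learning. This contribution applies to a range of GNN use cases in which edge features carry key information.
• We apply E-GraphSAGE to network intrusion detection and network flow classification, and demonstrate its potential through an extensive experimental evaluation.
The rest of the paper is organised as follows. Section II discusses key related work, and Section III provides the relevant background on GNNs and GraphSAGE. The proposed E-GraphSAGE algorithm and the corresponding NIDS are introduced in Section IV. Experimental evaluation results are presented in Section VI, and Section VII concludes the paper.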

Translation

Training Phase

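The neural network model used in our implementation consists of two E-GraphSAGE layers, which means that neighbor information is aggregated from a two-hop neighborhood. For the aggregation function AGG, as shown in Equation 5, we use the mean function, which simply takes the element-wise mean of the edge embeddings in the sampled neighborhood. The mean aggregator in E-GraphSAGE is defined as follows: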

\[h^k_{N(v)}=\sum\limits_{{u\in N(v),\atop uv\in \epsilon}} \frac{h^{k-1}_{uv}}{\lvert N(v)\rvert _e} \]

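Here, \(\lvert N(v)\rvert _e\) denotes the number of edges in the sampled neighborhood, and \(h^{k-1}_{uv}\) denotes their embeddings at layer k-1. In our implementation we use full neighborhood sampling, which means that the information from all edges in a node's neighborhood is aggregated by averaging.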
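The mean aggregator defined above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `mean_edge_aggregate`, the dictionary-based edge storage, and the treatment of edges as undirected are hypothetical choices for the example, not the paper's implementation.

```python
def mean_edge_aggregate(node, edges, edge_emb):
    """Element-wise mean of the embeddings of all edges incident to `node`.

    edges:    list of (u, v) pairs (treated as undirected here, an assumption)
    edge_emb: dict mapping each (u, v) pair to its embedding from layer k-1
    """
    incident = [e for e in edges if node in e]
    if not incident:
        raise ValueError(f"node {node!r} has no incident edges")
    dim = len(edge_emb[incident[0]])
    total = [0.0] * dim
    for e in incident:
        for i, x in enumerate(edge_emb[e]):
            total[i] += x
    # Divide by |N(v)|_e, the number of edges in the (full) neighborhood.
    return [t / len(incident) for t in total]


# Toy graph: node "a" touches two edges, so |N(a)|_e = 2.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
edge_emb = {
    ("a", "b"): [1.0, 2.0],
    ("a", "c"): [3.0, 6.0],
    ("b", "c"): [10.0, 10.0],
}
h_a = mean_edge_aggregate("a", edges, edge_emb)
print(h_a)  # [2.0, 4.0]
```

Because full neighborhood sampling is used, every incident edge contributes to the mean; no edges are dropped.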

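In both E-GraphSAGE layers, for the hidden feature size of each layer indicated in Equation 3, we use 128 hidden units, which is also the dimension of the node embeddings. For the non-linear transformation we use the ReLU activation function, and for regularisation we apply dropout with a rate of 0.2 in the two E-GraphSAGE layers. We use the cross-entropy loss function, and gradient descent in the backpropagation phase is performed with the Adam optimizer using a learning rate of 0.001.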
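To make the loss and optimizer concrete, here is a minimal sketch of softmax cross-entropy and a single Adam update in plain Python. Only the learning rate of 0.001 comes from the text; the values `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are commonly used defaults assumed for illustration, and the function names `cross_entropy` and `adam_step` are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; `target` is the true class index."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum - logits[target]


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place and return the parameter list.

    beta1, beta2 and eps are assumed defaults, not values stated in the notes.
    """
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


loss = cross_entropy([0.0, 0.0], target=0)  # two equally likely classes
params = [0.5, -0.5]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
adam_step(params, [2.0, -4.0], state)
```

On the very first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient scale, so each parameter here moves by roughly 0.001 against the sign of its gradient.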

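Once the node embeddings have been generated in the final E-GraphSAGE layer, they are converted into the corresponding edge embeddings. Because each edge embedding is produced by concatenating the embeddings of its two nodes, the edge embeddings are 256-dimensional.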
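The dimension arithmetic (128 + 128 = 256) follows directly from the concatenation. A minimal sketch, with the helper name `edge_embedding` and the placeholder vectors being assumptions for illustration:

```python
def edge_embedding(z_u, z_v):
    """Concatenate the final-layer embeddings of an edge's two endpoints."""
    return list(z_u) + list(z_v)


HIDDEN = 128  # node embedding size stated in the text
z_u = [0.1] * HIDDEN  # placeholder embedding of node u
z_v = [0.2] * HIDDEN  # placeholder embedding of node v
z_uv = edge_embedding(z_u, z_v)
print(len(z_uv))  # 256
```

Note that concatenation is order-sensitive, so `edge_embedding(z_u, z_v)` and `edge_embedding(z_v, z_u)` are different vectors.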

GNN

A common task performed by GNNs is generating node embeddings [16], which aims to encode nodes as low-dimensional vectors while maintaining their key relationships and graph position in the original format. A pair of node embeddings can be concatenated to form an edge embedding that represents the edge between them. Node or edge embedding is typically a key precursor to downstream tasks such as node and edge classification or link prediction [16]. GNNs have recently received a lot of attention due to their convincing performance and the high interpretability of their results through the visualisation of the graph embeddings [17].

GraphSAGE

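To generalise the power of CNNs to data with non-Euclidean structure, GNNs use the concept of message passing. To this end, the features of a graph node's neighbors are typically aggregated and passed to that node as a message. This process is repeated over several iterations in order to propagate information through the graph's nodes. The final result, i.e. the aggregated information obtained at each node, is called the node embedding.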
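The effect of repeating message passing can be seen on a toy graph. The sketch below is an assumption-laden illustration, not any specific GNN: features are scalars, and each round simply averages a node's own value with those of its neighbors (the function name `message_passing_round` is hypothetical).

```python
def message_passing_round(adj, h):
    """One round: each node averages its own value with its neighbors' values."""
    new_h = {}
    for v, neighbors in adj.items():
        values = [h[v]] + [h[u] for u in neighbors]
        new_h[v] = sum(values) / len(values)
    return new_h


# Path graph 0 - 1 - 2; only node 2 starts with a non-zero signal.
adj = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: 0.0, 1: 0.0, 2: 1.0}
h1 = message_passing_round(adj, h0)  # signal reaches node 1, not yet node 0
h2 = message_passing_round(adj, h1)  # signal reaches node 0 (two hops away)
```

After one round node 0 is still at zero, because node 2 is two hops away; after the second round it has received part of the signal. This is why k rounds (or k layers) correspond to a k-hop neighborhood.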

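If information is collected from all neighbors of every node at every iteration, as proposed in many GNNs, the approach suffers from limited scalability, as well as unpredictable memory and compute requirements on large graphs.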
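GraphSAGE bounds this cost by aggregating over a fixed-size sample of each node's neighbors instead of the full neighborhood. A rough sketch of that idea follows; the function name `sample_neighbors`, the adjacency-dictionary layout, and the choice to return all neighbors when a node has fewer than `sample_size` of them are assumptions made for illustration.

```python
import random


def sample_neighbors(adj, node, sample_size, rng):
    """Return at most `sample_size` neighbors of `node`, drawn uniformly
    without replacement (a simplification assumed for this sketch)."""
    neighbors = adj[node]
    if len(neighbors) <= sample_size:
        return list(neighbors)
    return rng.sample(neighbors, sample_size)


# A hub with 1000 neighbors costs the same to aggregate as a node with 10.
adj = {"hub": [f"n{i}" for i in range(1000)], "leaf": ["hub"]}
rng = random.Random(0)  # seeded only to make the example repeatable
picked = sample_neighbors(adj, "hub", 10, rng)
print(len(picked))  # 10
```

With a cap on the sample size, the per-node work per layer is bounded by a constant rather than by the node's degree, which makes memory and compute needs predictable.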

Batch size: the number of samples selected for one training step.

Forward Propagation - Node Embedding

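The embedding of node v at the current layer k equals the activation function applied to the weight matrix multiplied by the concatenation of v's embedding from layer k-1 with the aggregated embeddings of v's neighbors from layer k-1.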
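In symbols this is the standard GraphSAGE update \(h_v^k = \sigma\left(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{N(v)}^k)\right)\), where \(h_{N(v)}^k\) is the aggregate of the neighbors' layer k-1 embeddings. A minimal sketch with ReLU as \(\sigma\); the helper names and the hand-picked weight matrix are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, xi) for xi in x]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def node_update(W, h_v_prev, h_neigh):
    """h_v^k = ReLU(W . CONCAT(h_v^{k-1}, h_N(v)^k))."""
    return relu(matvec(W, h_v_prev + h_neigh))


# W maps the 4-dimensional concatenation to a 2-dimensional embedding.
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, -1.0, 0.0, -1.0]]
h_v_prev = [1.0, 2.0]  # v's own embedding from layer k-1
h_neigh = [3.0, 4.0]   # aggregated neighborhood embedding
h_v = node_update(W, h_v_prev, h_neigh)
print(h_v)  # [4.0, 0.0]
```

The first output is 1 + 3 = 4, and the second is -2 - 4 = -6, which ReLU clips to 0.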

Key References

Q. Xiao, J. Liu, Q. Wang, Z. Jiang, X. Wang, and Y. Yao, “Towards Network Anomaly Detection Using Graph Embedding,” in Computational Science – ICCS 2020, V. V. Krzhizhanovskaya, G. Závodszky, M. H. Lees, J. J. Dongarra, P. M. A. Sloot, S. Brissos, and J. Teixeira, Eds., Cham: Springer International Publishing, 2020, pp. 156–169, ISBN: 978-3-030-50423-6.


Xiao et al. [11] proposed a graph embedding approach to perform anomaly detection on network flows. The authors first converted the network flows into a first-order and a second-order graph. The first-order graph learns the latent features from the perspective of a single host by using its IP address and port number. The second-order graph aims to learn the latent features from a global perspective by using source IP addresses, source ports, destination IP addresses, as well as destination ports. The extracted graph embeddings and the raw features are then used to train a Random Forest classifier to detect network attacks. The evaluation is limited to only two NIDS datasets, namely CICIDS 2017 [12] and CIDDS-001 [13]. In contrast, the evaluation of the E-GraphSAGE-based NIDS considers six recent benchmark datasets. Moreover, a more significant limitation of this approach is its use of a traditional transductive graph embedding method [6], which limits its ability to classify samples with graph nodes, e.g. IP addresses and port numbers, which were not seen during the training phase. This makes the approach unsuitable for most practical NIDS application scenarios, as we cannot assume that all local and remote IP addresses and port numbers in the network are known at training time. In contrast, the E-GraphSAGE approach presented in this paper uses an inductive graph neural learning approach, which does not suffer from this limitation.