Detecting Rootkits and WebShells with the K-Nearest Neighbors Algorithm

阿新 • Published: 2019-01-20

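Detecting Rootkits with K-Nearest Neighbors

Rootkit detection workflow for telnet connections:

KDD 99 data (41 features) -> select rootkit-related features -> TCP content features -> vectorize -> rootkit-related feature vectors -> KNN + 10-fold cross-validation -> evaluate results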

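1. Data collection and cleaning

The data comes from the KDD 99 dataset. Keep only the records labeled rootkit or normal whose service field (index 2) is telnet: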

        if(x1[41] in ['rootkit.','normal.']) and (x1[2] == 'telnet'):
            if x1[41] == 'rootkit.':
                y.append(1)
            else:
                y.append(0)
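
As a minimal sketch of this filtering step, the snippet below applies the same condition to a few in-memory records. The helper names `filter_telnet_rootkit` and `make_record` and the sample records are hypothetical, made up for illustration. Only the field positions (service at index 2, label at index 41) come from the code above.

```python
def filter_telnet_rootkit(records):
    """Keep telnet records labeled rootkit. or normal.; return (rows, labels)."""
    rows, labels = [], []
    for rec in records:
        # Skip blank or truncated records that have no label field.
        if len(rec) < 42:
            continue
        if rec[2] == 'telnet' and rec[41] in ('rootkit.', 'normal.'):
            rows.append(rec)
            labels.append(1 if rec[41] == 'rootkit.' else 0)
    return rows, labels


def make_record(service, label):
    """Build a hypothetical 42-field record; only indexes 2 and 41 matter here."""
    rec = ['0'] * 42
    rec[1] = 'tcp'
    rec[2] = service
    rec[41] = label
    return rec


sample = [
    make_record('telnet', 'rootkit.'),
    make_record('telnet', 'normal.'),
    make_record('http', 'normal.'),    # wrong service, dropped
    make_record('telnet', 'smurf.'),   # other attack label, dropped
    [],                                # blank line, skipped
]
rows, labels = filter_telnet_rootkit(sample)
print(len(rows), labels)
```

The length check is an addition of this sketch: it keeps a blank or truncated line from raising an IndexError.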

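2. Feature extraction

Use the rootkit-related fields, columns 9 through 20 of each record (the TCP content features), as the sample features: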

            x1 = x1[9:21]
            v.append(x1)
    for x1 in v:
        v1 = []
        for x2 in x1:
            v1.append(float(x2))
        w.append(v1)
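
To see which fields the slice `[9:21]` selects, the following sketch pairs each value with its column name. It assumes the standard KDD Cup 99 column order, and the names `CONTENT_FEATURES`, `to_feature_vector`, and `describe` as well as the sample record are illustrative rather than taken from the original code.

```python
# Column names for indexes 9..20, assuming the standard KDD Cup 99 field order.
CONTENT_FEATURES = [
    'hot', 'num_failed_logins', 'logged_in', 'num_compromised',
    'root_shell', 'su_attempted', 'num_root', 'num_file_creations',
    'num_shells', 'num_access_files', 'num_outbound_cmds', 'is_host_login',
]


def to_feature_vector(record):
    """Convert fields 9..20 of one record from strings to floats."""
    return [float(value) for value in record[9:21]]


def describe(record):
    """Map each content feature name to its numeric value."""
    return dict(zip(CONTENT_FEATURES, to_feature_vector(record)))


# Hypothetical record: all zeros except root_shell (index 13).
record = ['0'] * 42
record[13] = '1'
features = describe(record)
print(features['root_shell'], features['hot'])
```

Naming the columns makes it easier to check that the slice boundaries match the intended features before training.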

3. Training

Instantiate the KNN classifier with k set to 3:

    model = KNeighborsClassifier(n_neighbors=3)
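
To make the role of `n_neighbors=3` concrete, here is a plain-Python sketch of the voting rule: find the three closest training points and take the majority label. It assumes Euclidean distance, and `knn_predict` and the toy points are illustrative. This is not how scikit-learn implements the classifier internally.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: class 0 clusters near the origin, class 1 near (5, 5).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(train_x, train_y, [0.5, 0.5])
near_five = knn_predict(train_x, train_y, [5.5, 5.5])
print(near_origin, near_five)
```

An odd k such as 3 avoids tied votes when there are only two classes.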

4. Validation

Use 10-fold cross-validation:

model_selection.cross_val_score(model,x,y,n_jobs=-1,cv=10)
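
The idea behind `cv=10` is that the data is split into ten folds, and each fold serves once as the test set while the other nine are used for training. The sketch below shows only the index bookkeeping, with a hypothetical helper `k_fold_indices`. It is a simplification: it does not shuffle or stratify, whereas scikit-learn uses a stratified split when given an integer `cv` and a classifier.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(25, 10))
test_sizes = [len(test_idx) for _, test_idx in folds]
print(test_sizes)
```

Every sample lands in exactly one test fold, so the ten scores together cover the whole dataset.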

Full code:

from sklearn import model_selection
from sklearn.neighbors import KNeighborsClassifier


def load_kdd99(filename):
    """Read a KDD 99 file into a list of comma-separated field lists."""
    x = []
    with open(filename) as f:
        for line in f:
            line = line.strip('\n')
            line = line.split(',')
            x.append(line)
    return x


def get_rookit2andNormal(x):
    """Return (features, labels) for telnet records labeled rootkit or normal."""
    v = []
    w = []
    y = []
    for x1 in x:
        # Skip blank or truncated lines that lack the label field (index 41).
        if len(x1) < 42:
            continue
        if x1[41] in ['rootkit.', 'normal.'] and x1[2] == 'telnet':
            # Label rootkit as 1 and normal as 0.
            y.append(1 if x1[41] == 'rootkit.' else 0)
            # Keep fields 9..20 as the features.
            v.append(x1[9:21])
    # Convert the string fields to floats.
    for x1 in v:
        w.append([float(x2) for x2 in x1])
    return w, y


if __name__ == '__main__':
    v = load_kdd99("G:/data/kddcup99/corrected")
    x, y = get_rookit2andNormal(v)
    model = KNeighborsClassifier(n_neighbors=3)
    # Print the score of each of the 10 folds.
    print(model_selection.cross_val_score(model, x, y, n_jobs=-1, cv=10))

Detecting WebShells with K-Nearest Neighbors

1. Data collection and cleaning

Load the normal samples from ADFA-LD:

def load_adfa_training_files(rootdir):
    x=[]
    y=[]
    list = os.listdir(rootdir) # list every entry under rootdir
    for i in range(0, len(list)):
        path = os.path.join(rootdir, list[i]) # append list[i] to the rootdir path
        if os.path.isfile(path): # only load regular files
            x.append(load_one_flle(path))
            y.append(0)
    return x,y

Define a helper that walks the directory tree recursively until every file has been collected:

def dirlist(path, allfile):
    filelist = os.listdir(path)
    for filename in filelist:
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            dirlist(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile
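
The same traversal can also be written with the standard library's `os.walk`, which handles the recursion itself. The sketch below is an alternative, not the original author's code. The function name `list_files` and the directory and file names created in the temporary folder are made up for the demonstration.

```python
import os
import tempfile


def list_files(root):
    """Return the paths of all files under root, at any depth."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files.append(os.path.join(dirpath, name))
    return files


# Build a small throwaway tree with two subdirectories, one file each.
with tempfile.TemporaryDirectory() as root:
    for subdir, name in (("Web_Shell_1", "UAD-WS1-0001.txt"),
                         ("Adduser_1", "UAD-Adduser-1-0001.txt")):
        os.makedirs(os.path.join(root, subdir))
        with open(os.path.join(root, subdir, name), "w") as f:
            f.write("6 6 63\n")
    found = sorted(os.path.basename(p) for p in list_files(root))

print(found)
```

Unlike the recursive version, this one needs no accumulator argument, so callers cannot accidentally pass in a list that already holds entries.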

Select the WebShell-related samples from the attack data:

def load_adfa_webshell_files(rootdir):
    x=[]
    y=[]
    allfile=dirlist(rootdir,[])
    for file in allfile:
        if re.match(r"G:/data/ADFA-LD/Attack_Data_Master/Web_Shell_\d+[/\\]UAD-W*",file): # keep only Web_Shell traces; [/\\] accepts either path separator
            x.append(load_one_flle(file))
            y.append(1)
    return x,y
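
Paths built with `os.path.join` use a backslash between components on Windows, so a pattern meant for a `G:/` path has to accept both separators. Here is a minimal sketch of a separator-agnostic check. The pattern `WEBSHELL_RE`, the helper `is_webshell_trace`, and the sample paths are illustrative assumptions, not part of the original code.

```python
import re

# Hypothetical pattern: a Web_Shell_<n> directory followed by a UAD-W file name,
# with either "/" or "\" between them.
WEBSHELL_RE = re.compile(r"Web_Shell_\d+[/\\]UAD-W")


def is_webshell_trace(path):
    """Return True if the path points into a Web_Shell_<n> attack directory."""
    return WEBSHELL_RE.search(path) is not None


paths = [
    "G:/data/ADFA-LD/Attack_Data_Master/Web_Shell_1/UAD-WS1-1234.txt",
    "G:/data/ADFA-LD/Attack_Data_Master/Web_Shell_1\\UAD-WS1-1234.txt",
    "G:/data/ADFA-LD/Attack_Data_Master/Adduser_1/UAD-Adduser-1-2311.txt",
]
flags = [is_webshell_trace(p) for p in paths]
print(flags)
```

Using `search` on the directory and file name, rather than `match` on the full path, also removes the hard-coded root directory from the pattern.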

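2. Feature extraction

The ADFA-LD dataset records system call sequences, and the number of calls differs from file to file, so the traces are vectorized with a word model via CountVectorizer, which by default records how often each token occurs: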

    x1,y1=load_adfa_training_files("G:/data/ADFA-LD/Training_Data_Master/")
    x2,y2=load_adfa_webshell_files("G:/data/ADFA-LD/Attack_Data_Master/")
    x=x1+x2
    y=y1+y2
    vectorizer = CountVectorizer(min_df=1)
    x=vectorizer.fit_transform(x)
    x=x.toarray()
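
One detail worth checking when vectorizing number sequences: the default `token_pattern` of `CountVectorizer` only keeps tokens of two or more characters, so single-digit call numbers are dropped. The sketch below compares the default with a pattern that keeps them. The two traces are invented for illustration, and whether keeping single-digit tokens helps on ADFA-LD is not something this example establishes.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up traces of space-separated call numbers.
traces = ["6 6 63 6 42", "54 175 120 6"]

# Default settings, as in the code above: one-character tokens are ignored.
default_vec = CountVectorizer(min_df=1)
default_vec.fit(traces)

# Alternative pattern that also keeps one-character tokens.
full_vec = CountVectorizer(min_df=1, token_pattern=r"(?u)\b\w+\b")
counts = full_vec.fit_transform(traces).toarray()

print(sorted(default_vec.vocabulary_))  # no "6" in this vocabulary
print(sorted(full_vec.vocabulary_))     # "6" is included here
# Occurrences of "6" in the first trace.
six_count = counts[0][full_vec.vocabulary_["6"]]
print(six_count)
```

Each row of `counts` is one trace and each column is one token, so traces of different lengths end up as vectors of the same width.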

The remaining steps are the same as in the rootkit example. Full code:

import os
import re

import numpy as np
from sklearn import model_selection
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier


def load_one_flle(filename):
    """Return the first line of the file with the trailing newline removed."""
    with open(filename) as f:
        line = f.readline()
        line = line.strip('\n')
    return line


def load_adfa_training_files(rootdir):
    """Load every file directly under rootdir as a normal sample (label 0)."""
    x = []
    y = []
    for name in os.listdir(rootdir):
        path = os.path.join(rootdir, name)
        if os.path.isfile(path):
            x.append(load_one_flle(path))
            y.append(0)
    return x, y


def dirlist(path, allfile):
    """Recursively append the paths of all files under path to allfile."""
    filelist = os.listdir(path)
    for filename in filelist:
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            dirlist(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile


def load_adfa_webshell_files(rootdir):
    """Load the Web_Shell traces under rootdir as attack samples (label 1)."""
    x = []
    y = []
    allfile = dirlist(rootdir, [])
    for file in allfile:
        # [/\\] accepts either separator, since os.path.join uses "\" on Windows.
        if re.match(r"G:/data/ADFA-LD/Attack_Data_Master/Web_Shell_\d+[/\\]UAD-W*", file):
            x.append(load_one_flle(file))
            y.append(1)
    return x, y


if __name__ == '__main__':
    x1, y1 = load_adfa_training_files("G:/data/ADFA-LD/Training_Data_Master/")
    x2, y2 = load_adfa_webshell_files("G:/data/ADFA-LD/Attack_Data_Master/")
    x = x1 + x2
    y = y1 + y2
    # Turn each trace into a vector of token counts.
    vectorizer = CountVectorizer(min_df=1)
    x = vectorizer.fit_transform(x)
    x = x.toarray()
    clf = KNeighborsClassifier(n_neighbors=3)
    # Print the score of each of the 10 folds, then their mean.
    scores = model_selection.cross_val_score(clf, x, y, n_jobs=-1, cv=10)
    print(scores)
    print(np.mean(scores))
