
Anomaly Detection: Linear Methods

Tags: machine learning

PCA anomaly detection

From the pyod documentation:

Principal component analysis (PCA) can be used to detect outliers. PCA is a linear dimensionality reduction that uses the Singular Value Decomposition of the data to project it to a lower-dimensional space.
In this procedure, the covariance matrix of the data is decomposed into orthogonal vectors, called eigenvectors, each associated with an eigenvalue. The eigenvectors with large eigenvalues capture most of the variance in the data.

Therefore, a low-dimensional hyperplane constructed from the top k eigenvectors can capture most of the variance in the data. Outliers, however, differ from normal data points, and this difference is most visible along the eigenvectors with small eigenvalues.
Therefore, outlier scores can be obtained as the sum of the projected distances of a sample on all eigenvectors. See [BSCSC03, BAgg15] for details.
Score(X) = sum of the weighted Euclidean distances from each sample to the hyperplane constructed from the selected eigenvectors
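The weighted-distance score can be sketched with plain NumPy. The 1/λ weighting below (so that deviations along low-variance directions dominate) is an assumption on my part, matching the classic formulation in [BSCSC03]; pyod's exact weighting may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D normal data plus one obvious outlier off the main axis
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.3]])
X = np.vstack([X, [[-4.0, 4.0]]])

# Eigendecomposition of the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Project each sample onto every eigenvector; weight the squared
# distances by 1/eigenvalue so that deviations along low-variance
# directions (where outliers stand out) dominate the score
proj = Xc @ eigvecs                       # shape: (n_samples, n_components)
scores = (proj ** 2 / eigvals).sum(axis=1)

print(scores.argmax())  # index of the appended outlier (the last row)
```

With this weighting the score is essentially a squared Mahalanobis distance, which is why the single point lying off the data's main axis receives by far the largest score.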

Hands-on with a breast cancer dataset

from pyod.models import pca

# train_data is assumed to be a DataFrame whose last column is the label
data = train_data.values
y = data[:, -1]        # labels: 'n' marks a normal sample
X = data[:, :-1]       # numeric feature columns

# 80/20 train/test split
split = int(X.shape[0] * 0.8)
train_set, test_set = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

my_pca = pca.PCA()
my_pca.fit(train_set)

import numpy as np

y_pre = my_pca.predict(test_set)

# Map the string labels to pyod's convention: 0 = inlier, 1 = outlier
def trans(c):
    return 0 if c == 'n' else 1

y_true = np.array([trans(c) for c in y_test])
print('Accuracy: %.2f%%' % ((y_true == y_pre).sum() / y_pre.shape[0] * 100))
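Since the snippet above depends on an undefined train_data, here is a fully self-contained variant. It uses scikit-learn's built-in Wisconsin breast cancer data as a stand-in for the dataset above (an assumption) and scores with the eigenvalue-weighted projection distance in place of pyod, so it runs with no extra dependencies:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

# Stand-in data: 0 = malignant (treated as the outlier class), 1 = benign
X, y = load_breast_cancer(return_X_y=True)
outlier = 1 - y                      # 1 = malignant / outlier

# Standardize, then score with the 1/eigenvalue-weighted projection distance
Xs = StandardScaler().fit_transform(X)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
scores = ((Xs @ eigvecs) ** 2 / eigvals).sum(axis=1)

# Rank-based evaluation avoids having to pick a score threshold
print('ROC AUC: %.3f' % roc_auc_score(outlier, scores))
```

Because the score is threshold-free, ROC AUC is a convenient way to check that malignant samples really do rank higher than benign ones before committing to a contamination rate.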