ROC Curve Evaluation and Anomaly Point Removal

阿新 • Published: 2020-07-13

1. For details, see https://www.cnblogs.com/mdevelopment/p/9456486.html

Review of the ROC curve:

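The ROC curve shows the discriminative ability of an anomaly detection system (ADS), i.e. how well it separates normal points from anomalous ones. It plots the true positive rate (TPR, also called recall) as a function of the false positive rate (FPR).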

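The larger the area under the curve (AUC), the closer the curve approaches the horizontal line TPR = 1 (the upper-left corner of the plot), and the better the ADS performs.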
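To make the TPR/FPR definitions concrete, the following minimal pure-Python sketch sweeps a threshold down through the distinct scores, treats a point as anomalous when its score is at least the threshold, and integrates the resulting curve with the trapezoidal rule. The helper names roc_points and trapezoid_auc are hypothetical; this is an illustration under the assumption that labels are 0/1 with at least one point of each class, not a replacement for sklearn.metrics.roc_curve.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low.

    A point is predicted anomalous (positive) when its score >= threshold.
    Assumes labels are 0/1 and both classes are present.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        tpr.append(tp / pos)  # TPR = TP / (TP + FN)
        fpr.append(fp / neg)  # FPR = FP / (FP + TN)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


fpr, tpr = roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(fpr, tpr)                  # [0.0, 0.0, 0.5, 0.5, 1.0] [0.0, 0.5, 0.5, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr))   # 0.75
```

A perfect detector, where every anomaly outscores every normal point, reaches TPR = 1 while FPR is still 0 and gets an AUC of 1.0.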

Import the module with from sklearn import metrics; every metric is then called as metrics.<metric function name>(parameters).

```python
from sklearn import metrics


def evaluate(scores, labels):
    """
    It returns the auc and prauc scores.
    :param scores: list<float> | the anomaly scores predicted by CellPAD.
    :param labels: list<float> | the true labels.
    :return: the auc, prauc.
    """
    # Coordinates of the ROC curve:
    #   TPR = TP / (TP + FN) = recall (true positive rate, sensitivity)
    #   FPR = FP / (FP + TN)          (false positive rate, 1 - specificity)
    fpr, tpr, _ = metrics.roc_curve(labels, scores, pos_label=1)

    # Points of the precision-recall curve (computed here, not plotted)
    precision, recall, _ = metrics.precision_recall_curve(labels, scores, pos_label=1)

    # metrics.auc(x, y) integrates y over x with the trapezoidal rule;
    # a larger area under the ROC curve means better performance.
    auc = metrics.auc(fpr, tpr)
    # Area under the precision-recall curve
    prauc = metrics.auc(recall, precision)
    return auc, prauc
```
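A handy sanity check for the auc value returned by evaluate(): ROC AUC equals the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as one half (the Mann-Whitney U formulation). The sketch below, using the hypothetical name auc_by_ranking, computes it by brute-force pairwise comparison, which costs O(P·N) and is meant only for small checks.

```python
def auc_by_ranking(scores, labels):
    """ROC AUC as P(score of an anomaly > score of a normal point).

    Ties count as half a win. Assumes labels are 0/1 with both classes present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


print(auc_by_ranking([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

For the same data, the trapezoidal area under the empirical ROC curve should give the same value, since the two quantities are equivalent. For the precision-recall side, note that metrics.auc(recall, precision) uses linear interpolation, which the scikit-learn documentation describes as potentially too optimistic; metrics.average_precision_score is the usual alternative.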

2. The detect_anomaly() function

```python
def detect_anomaly(self, predicted_series, practical_series):
    """
    It calculates the drop ratio of each point by comparing the predicted value and practical value.
    Then it runs filter_anomaly() function to filter out the anomalies by the parameter "rule".
    :param predicted_series: the predicted values of a KPI series
    :param practical_series: the practical values of a KPI series
    :return: drop_ratios, drop_labels and drop_scores
    """
    drop_ratios = []
    for i in range(len(practical_series)):
        # dp = (practical - predicted) / (predicted + 1e-7);
        # the small 1e-7 (10^-7) term guards against division by zero.
        dp = (practical_series[i] - predicted_series[i]) / (predicted_series[i] + 1e-7)
        drop_ratios.append(dp)

    drop_scores = []
    # A negative ratio means the practical value fell below the prediction (a drop):
    # its magnitude becomes the anomaly score. Non-negative ratios score 0.0.
    for r in drop_ratios:
        if r < 0:
            drop_scores.append(-r)
        else:
            drop_scores.append(0.0)

    drop_labels = self.filter_anomaly(drop_ratios)
    return drop_ratios, drop_labels, drop_scores
```
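To see what detect_anomaly() produces, here is a standalone sketch of its ratio and score computation on a made-up KPI series (the numbers are invented for illustration). drop_ratios_and_scores is a hypothetical helper that mirrors the method without the filter_anomaly() call.

```python
def drop_ratios_and_scores(predicted_series, practical_series, eps=1e-7):
    """Mirror of the ratio/score part of detect_anomaly(), without thresholding."""
    drop_ratios = [(act - pred) / (pred + eps)
                   for pred, act in zip(predicted_series, practical_series)]
    # Only drops (negative ratios) get a positive anomaly score.
    drop_scores = [-r if r < 0 else 0.0 for r in drop_ratios]
    return drop_ratios, drop_scores


predicted = [100.0, 100.0, 100.0, 100.0]
practical = [98.0, 100.0, 60.0, 120.0]
ratios, scores = drop_ratios_and_scores(predicted, practical)
print([round(r, 4) for r in ratios])  # [-0.02, 0.0, -0.4, 0.2]
print([round(s, 4) for s in scores])  # [0.02, 0.0, 0.4, 0.0]
```

The last point rises above its prediction, so its ratio is positive but its drop score is 0.0: this detector scores drops only, not spikes.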

3. The filter_anomaly() function, called from step 2

```python
import numpy as np


def filter_anomaly(self, drop_ratios):
    """
    It calculates the threshold for each approach (rule) and then calls filter_by_threshold().
    - gauss: threshold = mean - self.sigma * std
    - threshold: the given threshold variable
    - proportion: threshold = sort_scores[threshold_index]
    :param drop_ratios: list<float> | a measure of predicted drop anomaly degree
    :return: list<bool> | the drop labels
    """
    if self.rule == "gauss":
        mean = np.mean(drop_ratios)
        std = np.std(drop_ratios)  # population standard deviation (ddof=0), not the variance
        threshold = mean - self.sigma * std  # threshold = mean - sigma * std
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels

    if self.rule == "threshold":
        threshold = self.threshold  # a fixed, user-supplied threshold
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels

    if self.rule == "proportion":
        sort_scores = sorted(np.array(drop_ratios))  # ascending order
        threshold_index = int(len(drop_ratios) * self.proportion)
        threshold = sort_scores[threshold_index]
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels
```
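The three rules can be compared on a small, invented list of drop ratios. This sketch uses only the standard library; statistics.pstdev matches np.std's default population standard deviation (ddof=0). The function name rule_threshold and the default sigma, threshold and proportion values are illustrative assumptions. Unlike the original, the proportion branch clamps the index so that proportion = 1.0 does not raise IndexError.

```python
import statistics


def rule_threshold(drop_ratios, rule, sigma=3.0, threshold=-0.3, proportion=0.1):
    """Return the threshold each rule would use (ratios below it are flagged)."""
    if rule == "gauss":
        mean = statistics.mean(drop_ratios)
        std = statistics.pstdev(drop_ratios)  # population std, like np.std(ddof=0)
        return mean - sigma * std
    if rule == "threshold":
        return threshold
    if rule == "proportion":
        ordered = sorted(drop_ratios)  # ascending: biggest drops first
        # Clamp added in this sketch so proportion = 1.0 stays in range.
        index = min(int(len(drop_ratios) * proportion), len(drop_ratios) - 1)
        return ordered[index]
    raise ValueError(f"unknown rule: {rule!r}")


ratios = [0.0, 0.01, -0.01, 0.02, -0.5, 0.0, 0.01, -0.02, 0.0, 0.01]
for rule in ("gauss", "threshold", "proportion"):
    t = rule_threshold(ratios, rule, sigma=2.0)
    flagged = [i for i, r in enumerate(ratios) if r < t]
    print(rule, round(t, 4), flagged)  # each rule flags only index 4 here
```

The choice of sigma matters: with the same data and sigma = 3.0, the gauss threshold falls just below -0.5 and nothing is flagged.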

4. The filter_by_threshold() function, called from step 3

```python
def filter_by_threshold(self, drop_scores, threshold):
    """
    It judges whether a point is an anomaly by comparing its drop score and the threshold.
    :param drop_scores: list<float> | a measure of predicted drop anomaly degree.
    :param threshold: float | the threshold to filter out anomalies.
    :return: list<bool> | a list of labels where a point with a "true" label is an anomaly.
    """
    # filter_anomaly() passes drop_ratios here, so a point is flagged when
    # its ratio falls below the threshold, i.e. the drop is large enough.
    drop_labels = []
    for r in drop_scores:
        if r < threshold:
            drop_labels.append(True)
        else:
            drop_labels.append(False)
    return drop_labels
```
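Steps 2 to 4 can also be written with NumPy array operations instead of Python loops. The sketch below chains the drop-ratio computation, the gauss rule and the final comparison for one series; detect_drops_gauss is a hypothetical name, the sample data are invented, and the rule is fixed to "gauss" for brevity.

```python
import numpy as np


def detect_drops_gauss(predicted, practical, sigma=2.0, eps=1e-7):
    """Vectorized sketch of detect_anomaly() + filter_anomaly(rule="gauss") + filter_by_threshold()."""
    predicted = np.asarray(predicted, dtype=float)
    practical = np.asarray(practical, dtype=float)
    drop_ratios = (practical - predicted) / (predicted + eps)
    drop_scores = np.where(drop_ratios < 0, -drop_ratios, 0.0)  # only drops are scored
    threshold = drop_ratios.mean() - sigma * drop_ratios.std()  # np.std defaults to ddof=0
    drop_labels = drop_ratios < threshold  # True marks an anomalous drop
    return drop_ratios, drop_labels, drop_scores


predicted = [100.0] * 10
practical = [100, 101, 99, 102, 50, 100, 101, 98, 100, 101]
ratios, labels, scores = detect_drops_gauss(predicted, practical)
print(np.flatnonzero(labels))  # [4]
```

Because drop_scores grow with the size of the drop, they are a natural candidate for the scores argument of evaluate(), paired with ground-truth labels for the same points.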
