ALINK (36): Model Evaluation (1) Binary Classification Evaluation (EvalBinaryClassBatchOp)
Java class name: com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp
Python class name: EvalBinaryClassBatchOp
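Overview
Binary classification evaluation measures how well the predictions of a binary classification algorithm match the true labels.
It supports plotting the ROC curve, the lift chart, the K-S curve, and the Recall-Precision curve.
The streaming version supports both cumulative and windowed statistics; in addition to the four curves above, it also reports how AUC, Kappa, Accuracy, and Logloss change over time.
The overall evaluation metrics are AUC, K-S, and PRC, together with Precision, Recall, F-Measure, Sensitivity, Accuracy, Specificity, and Kappa at different thresholds.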
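Confusion matrix
ROC curve
x-axis: FPR
y-axis: TPR
AUC
The area under the ROC curve
K-S
x-axis: threshold
y-axis: TPR and FPR
KS
The maximum vertical gap between the TPR and FPR curves on the K-S chart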
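The ROC, AUC, and KS definitions above can be checked by hand with a small threshold sweep. The sketch below is plain Python, not Alink's implementation. It assumes the sample data from this article's code example, that a sample is predicted positive when its score is greater than or equal to the threshold, and that the area is computed with the trapezoidal rule. The helper names `roc_points`, `auc`, and `ks` are hypothetical.

```python
# Sample data from this article's code example: (true label, P(positive)).
samples = [
    ("prefix1", 0.9),
    ("prefix1", 0.8),
    ("prefix1", 0.7),
    ("prefix0", 0.75),
    ("prefix0", 0.6),
]
POSITIVE = "prefix1"


def roc_points(samples, positive):
    """Return (threshold, FPR, TPR) for every distinct score, highest first."""
    n_pos = sum(1 for label, _ in samples if label == positive)
    n_neg = len(samples) - n_pos
    points = []
    for threshold in sorted({score for _, score in samples}, reverse=True):
        # Assumption: score >= threshold means "predicted positive".
        tp = sum(1 for label, s in samples if s >= threshold and label == positive)
        fp = sum(1 for label, s in samples if s >= threshold and label != positive)
        points.append((threshold, fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve, starting from the origin."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


def ks(points):
    """Largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for _, fpr, tpr in points)


points = roc_points(samples, POSITIVE)
print("AUC:", auc(points))
print("KS:", ks(points))
```

On this data the sweep gives an AUC of about 0.833 and a KS of about 0.667, which agrees with the values Alink reports for the same input in the results at the end of this article.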
Recall-Precision curve
x-axis: Recall
y-axis: Precision
PRC
The area under the Recall-Precision curve
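As a rough illustration of PRC, the sketch below computes the area under the Recall-Precision curve for the sample data from this article's code example. It is plain Python and makes two assumptions that are not stated in the source: the area is integrated with the trapezoidal rule, and the curve is extended back to Recall = 0 at the precision of the first point. The helper names `pr_points` and `prc` are hypothetical.

```python
# Sample data from this article's code example: (true label, P(positive)).
samples = [
    ("prefix1", 0.9),
    ("prefix1", 0.8),
    ("prefix1", 0.7),
    ("prefix0", 0.75),
    ("prefix0", 0.6),
]
POSITIVE = "prefix1"


def pr_points(samples, positive):
    """Return (recall, precision) for every distinct score, highest first."""
    n_pos = sum(1 for label, _ in samples if label == positive)
    points = []
    for threshold in sorted({score for _, score in samples}, reverse=True):
        tp = sum(1 for label, s in samples if s >= threshold and label == positive)
        fp = sum(1 for label, s in samples if s >= threshold and label != positive)
        # tp + fp >= 1 here because the threshold is itself one of the scores.
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def prc(points):
    """Trapezoidal area under the Recall-Precision curve."""
    # Assumption: start the curve at recall 0 with the first point's precision.
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


print("PRC:", prc(pr_points(samples, POSITIVE)))
```

Under these assumptions the area is 65/72, roughly 0.9028, which matches the PRC value in this article's results. Other integration conventions would give a slightly different number.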
Lift chart
x-axis: $$ \dfrac{TP + FP}{total} $$
y-axis: TP
Precision
$$ Precision = \dfrac{TP}{TP + FP} $$
Recall
$$ Recall = \dfrac{TP}{TP + FN} $$
F-Measure
$$ F1 = \dfrac{2TP}{2TP + FP + FN} = \dfrac{2 \cdot Precision \cdot Recall}{Precision + Recall} $$
Sensitivity
$$ Sensitivity = \dfrac{TP}{TP + FN} $$
Accuracy
$$ Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN} $$
Specificity
$$ Specificity = \dfrac{TN}{FP + TN} $$
Kappa
$$ p_a = \dfrac{TP + TN}{TP + TN + FP + FN} $$
$$ p_e = \dfrac{(TN + FP)(TN + FN) + (FN + TP)(FP + TP)}{(TP + TN + FP + FN)^2} $$
$$ kappa = \dfrac{p_a - p_e}{1 - p_e} $$
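To make the per-threshold formulas concrete, the following sketch evaluates them for one confusion matrix. It is plain Python rather than Alink code, the function name `threshold_metrics` is hypothetical, and the counts are made up purely for illustration.

```python
def threshold_metrics(tp, fp, tn, fn):
    """Compute the per-threshold metrics from confusion-matrix counts.

    Assumes every denominator is non-zero; a fuller version would have to
    decide what to return when, for example, TP + FP == 0.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    f1 = 2 * tp / (2 * tp + fp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / total
    # Kappa: observed agreement p_a corrected for chance agreement p_e.
    p_a = accuracy
    p_e = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / (total * total)
    kappa = (p_a - p_e) / (1 - p_e)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "sensitivity": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only for illustration.
result = threshold_metrics(tp=40, fp=10, tn=35, fn=15)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these counts, p_a is 0.75 and p_e is 0.5, so kappa comes out at 0.5: the classifier closes half of the gap between chance agreement and perfect agreement.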
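Logloss
$$ logloss = -\dfrac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{i,j} \log(p_{i,j}) $$
Here N is the number of samples, M is the number of classes (2 for binary classification), y_{i,j} is 1 if sample i has label j and 0 otherwise, and p_{i,j} is the predicted probability that sample i belongs to class j.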
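Because y_{i,j} is one-hot, the inner sum keeps only the log-probability of each sample's true class. The sketch below applies that simplification to the sample data from this article's code example. It is plain Python using the natural logarithm, and the function name `log_loss` is hypothetical.

```python
import math

# Sample data from this article's code example:
# true label, then the predicted probability of each class.
labels = ["prefix1", "prefix1", "prefix1", "prefix0", "prefix0"]
probabilities = [
    {"prefix1": 0.9, "prefix0": 0.1},
    {"prefix1": 0.8, "prefix0": 0.2},
    {"prefix1": 0.7, "prefix0": 0.3},
    {"prefix1": 0.75, "prefix0": 0.25},
    {"prefix1": 0.6, "prefix0": 0.4},
]


def log_loss(labels, probabilities):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(labels, probabilities):
        # y_{i,j} is 1 only for the true class, so only that term survives.
        total += math.log(probs[label])
    return -total / len(labels)


print("Logloss:", log_loss(labels, probabilities))
```

The two negative samples dominate the result because the model gives their true class low probabilities (0.25 and 0.4), which is exactly the behaviour Logloss is meant to penalise.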
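Parameters
| Name | Chinese name | Description | Type | Required? | Default |
| --- | --- | --- | --- | --- | --- |
| predictionDetailCol | 預測詳細資訊列名 | Name of the prediction detail column | String | ✓ | |
| labelCol | 標籤列名 | Name of the label column in the input table | String | ✓ | |
| positiveLabelValueString | 正樣本 | String form of the label value that marks a positive sample | String | | null |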
Code examples
Python code
from pyalink.alink import *
import pandas as pd

useLocalEnv(1)

# Each row: the true label, then a JSON string mapping each label to its predicted probability.
df = pd.DataFrame([
    ["prefix1", "{\"prefix1\": 0.9, \"prefix0\": 0.1}"],
    ["prefix1", "{\"prefix1\": 0.8, \"prefix0\": 0.2}"],
    ["prefix1", "{\"prefix1\": 0.7, \"prefix0\": 0.3}"],
    ["prefix0", "{\"prefix1\": 0.75, \"prefix0\": 0.25}"],
    ["prefix0", "{\"prefix1\": 0.6, \"prefix0\": 0.4}"]
])

inOp = BatchOperator.fromDataframe(df, schemaStr='label string, detailInput string')

metrics = EvalBinaryClassBatchOp().setLabelCol("label").setPredictionDetailCol("detailInput").linkFrom(inOp).collectMetrics()
print("AUC:", metrics.getAuc())
print("KS:", metrics.getKs())
print("PRC:", metrics.getPrc())
print("Accuracy:", metrics.getAccuracy())
print("Macro Precision:", metrics.getMacroPrecision())
print("Micro Recall:", metrics.getMicroRecall())
print("Weighted Sensitivity:", metrics.getWeightedSensitivity())
Java code
import org.apache.flink.types.Row;

import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import com.alibaba.alink.operator.common.evaluation.BinaryClassMetrics;
import org.junit.Test;

import java.util.Arrays;
import java.util.List;

public class EvalBinaryClassBatchOpTest {
    @Test
    public void testEvalBinaryClassBatchOp() throws Exception {
        // Each row: the true label, then a JSON string mapping each label to its predicted probability.
        List<Row> df = Arrays.asList(
            Row.of("prefix1", "{\"prefix1\": 0.9, \"prefix0\": 0.1}"),
            Row.of("prefix1", "{\"prefix1\": 0.8, \"prefix0\": 0.2}"),
            Row.of("prefix1", "{\"prefix1\": 0.7, \"prefix0\": 0.3}"),
            Row.of("prefix0", "{\"prefix1\": 0.75, \"prefix0\": 0.25}"),
            Row.of("prefix0", "{\"prefix1\": 0.6, \"prefix0\": 0.4}")
        );
        BatchOperator<?> inOp = new MemSourceBatchOp(df, "label string, detailInput string");
        BinaryClassMetrics metrics = new EvalBinaryClassBatchOp()
            .setLabelCol("label")
            .setPredictionDetailCol("detailInput")
            .linkFrom(inOp)
            .collectMetrics();
        System.out.println("AUC:" + metrics.getAuc());
        System.out.println("KS:" + metrics.getKs());
        System.out.println("PRC:" + metrics.getPrc());
        System.out.println("Accuracy:" + metrics.getAccuracy());
        System.out.println("Macro Precision:" + metrics.getMacroPrecision());
        System.out.println("Micro Recall:" + metrics.getMicroRecall());
        System.out.println("Weighted Sensitivity:" + metrics.getWeightedSensitivity());
    }
}
Output
AUC: 0.8333333333333334
KS: 0.6666666666666666
PRC: 0.9027777777777777
Accuracy: 0.6
Macro Precision: 0.8
Micro Recall: 0.6
Weighted Sensitivity: 0.6