On Multi-label Classification
Multi-class:
Each sample in the training set carries exactly one label, drawn from a set of mutually exclusive labels L with |L| > 1.
When |L| = 2 this is a binary classification problem; when |L| > 2 it is a multi-class problem.
Single-label vs. multi-label:
Multi-label:
Let X denote the instance space and L the label set. Given training samples of the form (x, Y), with x ∈ X and Y ⊆ L (Y is the set of distinct labels relevant to x), the goal is to learn a suitable low-error function h: X → 2^L.
In most cases, multi-label methods involve ranking the candidate labels for a given sample, so the learner can be viewed as a function f: X × L → ℝ, and the labels are sorted in descending order of f(x, l).
Let τ_x denote the ranking of the labels for sample x under f, where τ_x is a one-to-one mapping from L onto {1, 2, …, |L|} such that if f(x, l1) > f(x, l2), then τ_x(l1) < τ_x(l2).
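The ranking τ_x can be sketched in plain Python; the scores below stand in for hypothetical values of f(x, l) over an assumed label set {a, b, c, d}:

```python
# Hypothetical scores f(x, l) for one sample over label set L = {a, b, c, d}.
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4}

def rank_labels(scores):
    """Return tau_x: label -> rank, where a higher score gets a lower
    (better) rank, i.e. f(x, l1) > f(x, l2) implies tau_x(l1) < tau_x(l2)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {label: rank for rank, label in enumerate(ordered, start=1)}

tau = rank_labels(scores)
# tau == {"a": 1, "c": 2, "d": 3, "b": 4}
```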
Methods for multi-label classification:
Multi-label methods can be grouped as Algorithm Independent vs. Algorithm Dependent, or equivalently as Problem Transformation Methods vs. Algorithm Adaptation Methods (Algorithm Dependent is the same as Algorithm Adaptation). Many methods combine Problem Transformation with Algorithm Adaptation.
Problem Transformation Methods
Transform the multi-label problem into one or more single-label problems: change the data to fit the algorithm.
At training time, with training data D = {(x_i, Y_i), i = 1, …, N}:
1 Transform the multi-label training data to single-label data
2 Learn from the single-label transformed data
At testing time, for a test instance x̃:
1 Make single-label predictions
2 Translate these into multi-label predictions
e.g.
1 Binary Relevance (BR): |L| binary problems (one vs. all)
Drawbacks: cannot model dependencies between labels; class imbalance.
Improvements: Stacked BR (2BR) [Godbole and Sarawagi, 2004]; Classifier Chains (CC) [Cheng et al., 2010, Read et al., 2011]
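The BR transformation can be sketched as follows; the toy feature vectors and label names are illustrative only:

```python
# Multi-label training data: each example pairs features with a label subset.
data = [
    ([1.0, 0.0], {"sports", "news"}),
    ([0.0, 1.0], {"news"}),
    ([1.0, 1.0], {"music"}),
]
labels = ["sports", "news", "music"]

def binary_relevance_transform(data, labels):
    """One binary (one-vs-all) dataset per label: y = 1 iff the label
    is present in the example's label set."""
    return {
        l: [(x, int(l in ys)) for x, ys in data]
        for l in labels
    }

datasets = binary_relevance_transform(data, labels)
# datasets["news"] == [([1.0, 0.0], 1), ([0.0, 1.0], 1), ([1.0, 1.0], 0)]
```

Any off-the-shelf binary classifier can then be trained on each of the |L| datasets independently.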
2 Label Powerset (LP): one multi-class problem with up to 2^|L| class values (one class per distinct label subset)
Drawbacks: many, complex class labels; imbalance (few instances per class); overfitting (cannot predict label combinations unseen in training).
Improvements: Ensembles of RAndom k-labEL subsets (RAkEL) [Tsoumakas and Vlahavas, 2007] and Ensembles of Pruned Sets (EPS) [Read et al., 2008]; both predict via a voting scheme
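The LP transformation maps each distinct label subset to one atomic class; a minimal sketch with made-up data:

```python
def label_powerset_transform(data):
    """Map each distinct label subset to one atomic class; keep the
    inverse map so multi-class predictions translate back to label sets."""
    class_of = {}
    single = []
    for x, ys in data:
        key = frozenset(ys)
        cls = class_of.setdefault(key, len(class_of))
        single.append((x, cls))
    inverse = {cls: set(key) for key, cls in class_of.items()}
    return single, inverse

data = [
    ([1.0], {"a", "b"}),
    ([2.0], {"a"}),
    ([3.0], {"a", "b"}),
]
single, inverse = label_powerset_transform(data)
# single == [([1.0], 0), ([2.0], 1), ([3.0], 0)]
# inverse[0] == {"a", "b"}
```

The inverse map also makes the overfitting drawback concrete: a subset never seen in training has no class, so it can never be predicted.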
3 Pairwise (PW): |L|(|L| − 1)/2 binary problems (all vs. all)
Each model is trained on the examples annotated with exactly one of its two labels, not both.
Drawbacks: produces pairwise rankings, from which a label set must still be derived; the decision boundary between labels that can co-occur is ill-defined; the number of classifiers grows quadratically.
Improvement: Calibrated Label Ranking (CLR) [Fürnkranz et al., 2008]
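Building the per-pair training sets can be sketched as follows (toy data; note how an example carrying both labels of a pair is excluded):

```python
from itertools import combinations

def pairwise_transform(data, labels):
    """One binary problem per label pair (l1, l2); an example is used
    only if it carries exactly one of the two labels (y = 1 for l1)."""
    problems = {}
    for l1, l2 in combinations(labels, 2):
        subset = []
        for x, ys in data:
            if (l1 in ys) != (l2 in ys):   # exactly one of the pair
                subset.append((x, int(l1 in ys)))
        problems[(l1, l2)] = subset
    return problems

data = [([1.0], {"a"}), ([2.0], {"b"}), ([3.0], {"a", "b"})]
problems = pairwise_transform(data, ["a", "b"])
# problems[("a", "b")] == [([1.0], 1), ([2.0], 0)]
```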
4 Copy-Weight (CW): one multi-class problem with |L| class values
Build a single multi-class problem with |L| possible class values; each example (x_i, Y_i) is duplicated |Y_i| times, once per label, each copy weighted 1/|Y_i|.
Drawbacks: the same instance appears under different labels, blurring the decision boundaries; high label cardinality greatly inflates the dataset; no direct way to model label dependencies.
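The CW duplication-and-weighting step, sketched with made-up data:

```python
def copy_weight_transform(data):
    """One multi-class problem with |L| class values: each example is
    duplicated once per label it carries, each copy weighted 1/|Y_i|."""
    rows = []
    for x, ys in data:
        w = 1.0 / len(ys)
        for l in sorted(ys):
            rows.append((x, l, w))
    return rows

data = [([1.0], {"a", "b"}), ([2.0], {"c"})]
rows = copy_weight_transform(data)
# rows == [([1.0], "a", 0.5), ([1.0], "b", 0.5), ([2.0], "c", 1.0)]
```

The identical feature vector appearing under two classes ("a" and "b") is exactly the blurred-decision-boundary drawback noted above.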
Algorithm Independent
Two kinds of transformation: label-based and instance-based.
Label-based Transformation
Instance-based Transformation
Instance Elimination
Remove instances that carry more than one label.
Label Creation
Merge each set of co-occurring labels into a single new label.
Conversion
Convert multi-label instances into single-label instances, by simplification or by decomposition (additive or multiplicative).
Label Elimination (Simplification)
Keep the single label most likely to be true, or pick one label at random.
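Label Elimination can be sketched as follows; the per-label scores are an assumed stand-in for "most likely to be true", and the fallback is a random pick:

```python
import random

def simplify(data, scores=None, seed=0):
    """Convert multi-label examples to single-label ones by keeping a
    single label: the highest-scoring one if per-label scores are given
    (a stand-in for 'most likely true'), otherwise a random choice."""
    rng = random.Random(seed)
    out = []
    for x, ys in data:
        if scores:
            keep = max(ys, key=lambda l: scores.get(l, 0.0))
        else:
            keep = rng.choice(sorted(ys))
        out.append((x, keep))
    return out

data = [([1.0], {"a", "b"}), ([2.0], {"b", "c"})]
# With the assumed scores, "b" wins in both instances:
out = simplify(data, scores={"a": 0.1, "b": 0.9, "c": 0.5})
# out == [([1.0], "b"), ([2.0], "b")]
```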
Label Decomposition
Additive
In the illustrated example (figure omitted), the number of classifiers is 1 + 1 = 2.
Multiplicative
In the illustrated example (figure omitted), the number of classifiers is 2 × 1 × 2 × 1 × 1 × 1 = 4.
Summary of Algorithm Independent Methods
Algorithm Adaptation / Dependent Methods
Adapt a single-label algorithm so that it produces multi-label output.
e.g.,
k-Nearest Neighbours (KNN)
MLkNN [Zhang and Zhou, 2007]
Decision Trees (DT)
Multi-label C4.5 [Clare and King, 2001]
Multi-label Alternating Decision Trees [de Comite, Gilleron and Tommasi, 2003], building on AdaBoost [Freund and Schapire, 1995] and ADTBoost [Freund and Mason, 1999]; also AdaBoost.MH and AdaBoost.MR [Schapire and Singer, 2000]
Support Vector Machines (SVM)
RankSVM, a Maximum Margin approach [Elisseeff and Weston, 2002]
Godbole and Sarawagi (2004), SVMs combined with PT4
Aiolli, F. & Sperduti, A. (2005) Multiclass Classification with Multi-Prototype Support Vector Machines.
Neural Networks (NN)
BPMLL [Zhang and Zhou, 2006]
Other methods are covered in surveys such as A Tutorial on Multi-Label Classification Techniques and Multi-Label Classification: An Overview.
Which method works best depends on the problem at hand. For efficiency and speed: decision-tree-based methods; for flexibility: problem transformation methods, especially BR-based ones; for predictive power: ensembles (most modern methods).
An extensive empirical study by [Madjarov et al., 2012] recommends:
RT-PCT: Random Forest of Predictive Clustering Trees (Algorithm Adaptation, Decision Tree based)
HOMER: Hierarchy Of Multilabel ClassifiERs (Problem Transformation, LP-based (original presentation))
CC: Classifier Chains (Problem Transformation, BR-based)
References:
Jesse Read. Multi-label Classification, Part 01 (lecture slides).
de Carvalho, A. C. P. L. F., Freitas, A. A. A Tutorial on Multi-Label Classification Techniques. In: Foundations of Computational Intelligence Volume 5. Springer Berlin Heidelberg, 2009: 177-195.
Li, T., Zhang, C., Zhu, S. Empirical Studies on Multi-Label Classification. In: IEEE International Conference on Tools with Artificial Intelligence. IEEE Computer Society, 2006: 86-92.
Tsoumakas, G., Katakis, I. Multi-Label Classification: An Overview.