Machine Learning (Zhou Zhihua) - Personal Exercise 13.4
阿新 • Published: 2019-01-05
13.4 Download or implement the TSVM algorithm yourself. Choose two UCI datasets, use 30% of the samples as test samples, 10% as labelled samples, and 60% as unlabelled samples. Train a TSVM that exploits the unlabelled samples and an SVM that uses only the labelled samples, and compare their performance.
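The 30/10/60 split required by the exercise can also be produced with scikit-learn's `train_test_split` instead of manual slicing. A minimal sketch (the `random_state` values here are arbitrary assumptions, not from the original post):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# keep classes 2 and 3 only (100 samples) and map their labels to {-1, +1}
data, label = iris.data[50:], iris.target[50:] * 2 - 3

# first carve off 30 stratified test samples, leaving 70 for training
train_d, test_d, train_c, test_c = train_test_split(
    data, label, test_size=30, stratify=label, random_state=0)
# then split the remaining 70 into 10 labelled / 60 unlabelled samples
l_d, u_d, l_c, u_c = train_test_split(
    train_d, train_c, train_size=10, stratify=train_c, random_state=0)
```

Stratifying both splits keeps the two classes balanced in every subset, which is what the manual slicing in the listing below achieves by hand.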
I chose the most commonly used iris dataset and built a TSVM on top of scikit-learn's SVM implementation. To make the effect easier to show, only the second and third classes of iris are used, with class labels mapped to -1 and 1, and the final training result is visualized using two of the attributes. To allow a direct comparison with the book, a linear hyperplane is used to separate the classes, so that the weight coefficients and slack variables can be obtained directly. The code is as follows:
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import sklearn.svm as svm
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# balanced data: each class contributes the same number of samples
iris = datasets.load_iris()
# data, label = iris.data[50:, [0, 3]], iris.target[50:] * 2 - 3  # labels mapped to -1, 1
data, label = iris.data[50:, :], iris.target[50:] * 2 - 3  # all 4 attributes; labels become -1, 1
# standardizing
sc = StandardScaler()
sc.fit(data)
data = sc.transform(data)
test_d, test_c = np.concatenate((data[:15], data[50:65])), np.concatenate((label[:15], label[50:65]))  # 30 test samples
l_d, l_c = np.concatenate((data[45:50], data[95:])), np.concatenate((label[45:50], label[95:]))  # 10 labelled samples
u_d = np.concatenate((data[15:45], data[65:95]))  # 60 unlabelled samples
lu_d = np.concatenate((l_d, u_d))
n = len(l_d)+len(u_d)
clf1 = svm.SVC(C=1, kernel='linear')
clf1.fit(l_d, l_c)
clf0 = svm.SVC(C=1, kernel='linear')
clf0.fit(l_d, l_c)
lu_c_0 = clf0.predict(lu_d)
u_c_new = clf1.predict(u_d) # the pseudo label for unlabelled samples
cu, cl = 0.001, 1
sample_weight = np.ones(n)
sample_weight[len(l_c):] = cu
id_set = np.arange(len(u_d))
while cu < cl:
    lu_c = np.concatenate((l_c, u_c_new))  # 70 labels: 10 true + 60 pseudo
    clf1.fit(lu_d, lu_c, sample_weight=sample_weight)
    while True:
        u_c_new = clf1.predict(u_d)  # pseudo labels for the unlabelled samples
        u_dist = clf1.decision_function(u_d)  # functional margin w.x + b of each sample
        # slack xi_i = 1 - y_i * (w.x_i + b); decision_function already returns
        # the functional margin, so no extra scaling by ||w|| is needed
        epsilon = 1 - u_dist * u_c_new
        plus_set, plus_id = epsilon[u_c_new > 0], id_set[u_c_new > 0]  # pseudo-positive samples
        minus_set, minus_id = epsilon[u_c_new < 0], id_set[u_c_new < 0]  # pseudo-negative samples
        plus_max_id, minus_max_id = plus_id[np.argmax(plus_set)], minus_id[np.argmax(minus_set)]
        a, b = epsilon[plus_max_id], epsilon[minus_max_id]
        if a > 0 and b > 0 and a + b > 2:
            # swap the pair of most-violating pseudo labels and refit
            u_c_new[plus_max_id], u_c_new[minus_max_id] = -u_c_new[plus_max_id], -u_c_new[minus_max_id]
            lu_c = np.concatenate((l_c, u_c_new))
            clf1.fit(lu_d, lu_c, sample_weight=sample_weight)
        else:
            break
    cu = min(cu * 2, cl)
    sample_weight[len(l_c):] = cu
lu_c = np.concatenate((l_c, u_c_new))
test_c1 = clf0.predict(test_d)
test_c2 = clf1.predict(test_d)
score1 = clf0.score(test_d,test_c)
score2 = clf1.score(test_d,test_c)
fig = plt.figure(figsize=(16,4))
ax = fig.add_subplot(131)
ax.scatter(test_d[:,0],test_d[:,2],c=test_c,marker='o',cmap=plt.cm.coolwarm)
ax.set_title('True Labels for test samples',fontsize=16)
ax1 = fig.add_subplot(132)
ax1.scatter(test_d[:,0],test_d[:,2],c=test_c1,marker='o',cmap=plt.cm.coolwarm)
ax1.scatter(lu_d[:,0], lu_d[:,2], c=lu_c_0, marker='o',s=10,cmap=plt.cm.coolwarm,alpha=.6)
ax1.set_title('SVM, score: {0:.2f}%'.format(score1*100),fontsize=16)
ax2 = fig.add_subplot(133)
ax2.scatter(test_d[:,0],test_d[:,2],c=test_c2,marker='o',cmap=plt.cm.coolwarm)
ax2.scatter(lu_d[:,0], lu_d[:,2], c=lu_c, marker='o',s=10,cmap=plt.cm.coolwarm,alpha=.6)
ax2.set_title('TSVM, score: {0:.2f}%'.format(score2*100),fontsize=16)
for a in [ax,ax1,ax2]:
a.set_xlabel(iris.feature_names[0])
a.set_ylabel(iris.feature_names[2])
plt.show()
The output of the code above is shown below. As the figure shows, on the iris dataset TSVM improves the final classification accuracy by exploiting the unlabelled data, from 96.67% for the SVM to 100% for the TSVM, whose predicted labels match the true labels of the test set exactly. In addition, testing shows that on the iris dataset, if a nonlinear kernel such as RBF is used, TSVM gives no performance improvement over the plain SVM.
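The closing remark about the RBF kernel deserves one implementation note: with a nonlinear kernel, `clf.coef_` no longer exists, so a kernelized TSVM must compute the slack directly from `decision_function`, which already returns the functional margin f(x), giving ξ_i = 1 - y_i f(x_i). A minimal sketch of that step (the 5-per-class labelled split here is an illustrative assumption, not the split used above):

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
data, label = iris.data[50:], iris.target[50:] * 2 - 3  # classes 2, 3 -> -1, +1
data = StandardScaler().fit_transform(data)

l_idx = np.r_[0:5, 50:55]    # 5 labelled samples per class
u_idx = np.r_[5:50, 55:100]  # remaining 90 samples treated as unlabelled

clf = svm.SVC(C=1, kernel='rbf')
clf.fit(data[l_idx], label[l_idx])

u_c = clf.predict(data[u_idx])                      # pseudo labels
xi = 1 - u_c * clf.decision_function(data[u_idx])   # slack, no clf.coef_ needed
```

The rest of the label-swapping loop carries over unchanged, since it only depends on the slack values and the pseudo labels, not on the weight vector.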