Machine Learning in Action (《機器學習實戰》) notes, Chapter 2 (2)
阿新 • Published: 2018-12-12
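2.2 Dating site matching
Code implementation
The labels in the data set provided by the book's author are not integers, so the book's code fails when it is run as written: int(listFromLine[-1]) raises a ValueError on a string label. Two solutions are given here. The book's original code is: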
# Parse the dating data text records into NumPy form
# (requires numpy's zeros, e.g. "from numpy import *" at the top of kNN.py)
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    # parse the file data into a list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
Solution 1
Replace classLabelVector.append(int(listFromLine[-1])) with an explicit mapping from the string labels to integers:
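# Parse the dating data text records into NumPy form
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    # parse the file data into a list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        # The strings below must match the last column of your file exactly;
        # check your copy (some copies use didntLike, smallDoses, largeDoses).
        if listFromLine[-1] == 'did_not_Like':
            classLabelVector.append(1)
        elif listFromLine[-1] == 'small_Doses':
            classLabelVector.append(2)
        elif listFromLine[-1] == 'large_Doses':
            classLabelVector.append(3)
        else:
            # fail loudly instead of silently skipping the label
            raise ValueError('unknown label: %r' % listFromLine[-1])
        index += 1
    return returnMat,classLabelVector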
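The if/elif chain in Solution 1 can also be written as a dictionary lookup. The following is only a sketch: the names LABEL_TO_INT and parse_dating_lines are made up for illustration, the label strings are copied from the code above, and the two sample records reuse the first two rows and labels printed later in this post. It uses plain Python lists rather than a NumPy matrix so that it runs on its own.

```python
# Label strings copied from Solution 1; they must match the file's last column exactly.
LABEL_TO_INT = {"did_not_Like": 1, "small_Doses": 2, "large_Doses": 3}

def parse_dating_lines(lines):
    """Split tab-separated records into feature rows and integer labels."""
    features, labels = [], []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        features.append([float(value) for value in fields[0:3]])
        labels.append(LABEL_TO_INT[fields[-1]])  # a KeyError flags an unknown label
    return features, labels

# Two in-memory records standing in for lines read from the data file
sample = [
    "40920\t8.326976\t0.953952\tlarge_Doses\n",
    "14488\t7.153469\t1.673904\tsmall_Doses\n",
]
features, labels = parse_dating_lines(sample)
print(labels)  # → [3, 2]
```

With a dictionary, adding or renaming a label means editing one line, and an unknown label raises an error instead of leaving the label list shorter than the feature matrix.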
Note: in Python 2 you can call reload() directly,
but in Python 3 you must first import importlib and call importlib.reload()!
In IPython:
>>> import importlib
>>> import kNN
>>> importlib.reload(kNN)
>>> datingDataMat, datingLabels = kNN.file2matrix('datingTestSet.txt')
Solution 2
Keep the code exactly as in the book:
# Parse the dating data text records into NumPy form
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    # parse the file data into a list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
In IPython, load datingTestSet2.txt instead, in which the labels are already stored as integers:
>>> datingDataMat, datingLabels = kNN.file2matrix('datingTestSet2.txt')
Print datingDataMat and datingLabels:
In[1]: datingDataMat
Out[1]:
array([[4.0920000e+04, 8.3269760e+00, 9.5395200e-01],
[1.4488000e+04, 7.1534690e+00, 1.6739040e+00],
[2.6052000e+04, 1.4418710e+00, 8.0512400e-01],
...,
[2.6575000e+04, 1.0650102e+01, 8.6662700e-01],
[4.8111000e+04, 9.1345280e+00, 7.2804500e-01],
[4.3757000e+04, 7.8826010e+00, 1.3324460e+00]])
In[2]: datingLabels[0:20]
Out[2]: [3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]
Creating the scatter plot
matplotlib must be imported to draw the scatter plot:
import matplotlib
import matplotlib.pyplot as plt
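Build the plot:
from numpy import array  # array() is needed for the marker sizes and colors
fig = plt.figure()
ax = fig.add_subplot(111)
# columns 1 and 2 are plotted; marker size and color both follow the class label
ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*array(datingLabels), 15.0*array(datingLabels))
plt.show()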
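The result is shown in the figure: the horizontal axis is the percentage of time spent playing video games, and the vertical axis is the liters of ice cream consumed per week.
A special reminder: if you change the book's line classLabelVector.append(int(listFromLine[-1])) to classLabelVector.append(listFromLine[-1]), the labels stay as strings and hard-to-predict errors follow, because later steps such as 15.0*array(datingLabels) and the %d formatting in datingClassTest expect numbers. Use one of the two solutions described in this post instead.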
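To make the warning above concrete, the sketch below shows two of the places where a string label breaks later code. The helper names try_format and try_lookup are made up for illustration; the format string and the resultList lookup mirror the ones used in datingClassTest and classifyPerson.

```python
result_list = ['not at all', 'in small doses', 'in large doses']

def try_format(label):
    """Mimic the %d formatting used when printing test results."""
    try:
        return "the real answer is : %d" % label
    except TypeError:
        return "TypeError"

def try_lookup(label):
    """Mimic the resultList[classifierResult - 1] lookup."""
    try:
        return result_list[label - 1]
    except TypeError:
        return "TypeError"

label_as_int = 3    # what append(int(listFromLine[-1])) stores
label_as_str = "3"  # what append(listFromLine[-1]) would store

print(try_format(label_as_int), "|", try_format(label_as_str))
print(try_lookup(label_as_int), "|", try_lookup(label_as_str))
```

With the integer label both calls succeed; with the string label both raise a TypeError, which is why converting the labels at load time is the simpler fix.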
Normalizing the data
The book's code is correct here; typing it out by hand once is recommended.
def autoNorm(dataSet):
    # column-wise minimum, maximum and range of each feature
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # subtract the minimum from every row
    normDataSet = dataSet - tile(minVals, (m,1))
    # element-wise division by the feature ranges
    normDataSet = normDataSet/tile(ranges, (m,1))
    return normDataSet, ranges, minVals
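autoNorm applies newValue = (oldValue - min) / (max - min) to each column. As a sketch of the same computation, NumPy broadcasting can take the place of the tile() calls. The function name auto_norm is made up for illustration, the sample matrix reuses the first three rows printed earlier in this post, and the code assumes no column is constant (a zero range would cause a division by zero).

```python
import numpy as np

def auto_norm(data_set):
    """Min-max scale each column to [0, 1] using broadcasting instead of tile()."""
    min_vals = data_set.min(axis=0)            # per-column minimum
    ranges = data_set.max(axis=0) - min_vals   # per-column range (assumed non-zero)
    norm = (data_set - min_vals) / ranges      # row vectors broadcast over all rows
    return norm, ranges, min_vals

sample = np.array([
    [40920.0, 8.326976, 0.953952],
    [14488.0, 7.153469, 1.673904],
    [26052.0, 1.441871, 0.805124],
])
norm, ranges, min_vals = auto_norm(sample)
print(norm.min(axis=0), norm.max(axis=0))  # each column now spans 0 to 1
```

Because a (3,) vector is broadcast against every row of an (m, 3) matrix, the result matches what the tile() version computes without building the intermediate tiled matrices.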
Testing the classifier as a complete program
Source code:
def datingClassTest():
    # fraction of the data held out for testing
    hoRatio = 0.10
    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        # classify each held-out row against the remaining rows with k = 3
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:], \
                                     datingLabels[numTestVecs:m],3)
        print("the classfier came back with: %d,the real answer is : %d" \
              % (classifierResult,datingLabels[i]))
        if (classifierResult != datingLabels[i]): errorCount += 1.0
    print("the total error rate is: %f" % (errorCount/float(numTestVecs)))
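The hold-out scheme in datingClassTest (the first hoRatio share of the rows is tested against the rest) can be tried without the data file. The sketch below is not the book's code: knn_classify is a simplified pure-Python stand-in for classify0, holdout_error_rate is a made-up helper, and the ten points are synthetic, forming two well-separated clusters.

```python
from collections import Counter

def knn_classify(point, train_points, train_labels, k):
    """Stand-in for classify0: majority vote among the k nearest training points."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    order = sorted(range(len(train_points)),
                   key=lambda i: sq_dist(point, train_points[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def holdout_error_rate(points, labels, ho_ratio, k):
    """Test the first ho_ratio share of rows against the remaining rows."""
    num_test = int(len(points) * ho_ratio)
    errors = 0
    for i in range(num_test):
        predicted = knn_classify(points[i], points[num_test:], labels[num_test:], k)
        if predicted != labels[i]:
            errors += 1
    return errors / num_test

# Synthetic data: label 1 near (0, 0), label 2 near (1, 1)
points = [(0.1, 0.1), (0.9, 0.9), (0.0, 0.2), (0.2, 0.0), (0.15, 0.05),
          (1.0, 0.8), (0.8, 1.0), (0.95, 0.85), (0.05, 0.15), (0.85, 0.95)]
labels = [1, 2, 1, 1, 1, 2, 2, 2, 1, 2]

rate = holdout_error_rate(points, labels, 0.2, 3)
print("the total error rate is: %f" % rate)
```

As in datingClassTest, the held-out rows are simply the first ones in the file, so this only gives a fair estimate when the rows are in no particular order.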
Switch to IPython:
In[1]: import kNN
In[2]: datingDataMat,datingLabels = kNN.file2matrix('datingTestSet2.txt')
In[3]: normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
In[4]: normMat
Out[4]:
array([[0.44832535, 0.39805139, 0.56233353],
[0.15873259, 0.34195467, 0.98724416],
[0.28542943, 0.06892523, 0.47449629],
...,
[0.29115949, 0.50910294, 0.51079493],
[0.52711097, 0.43665451, 0.4290048 ],
[0.47940793, 0.3768091 , 0.78571804]])
In[5]: ranges
Out[5]: array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])
In[6]: minVals
Out[6]: array([0. , 0. , 0.001156])
Building the complete system
Source code:
def classifyPerson():
    resultList = ['not at all','in small doses','in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])
    # scale the new input with the training set's minimum and range before classifying
    classifierResult = classify0((inArr-minVals)/ranges,normMat,datingLabels,3)
    print("You will probably like this person: ",resultList[classifierResult - 1])
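classifyPerson scales the new input with (inArr-minVals)/ranges, using the minimum and range of the training data rather than of the input itself. As a quick check of that arithmetic, the sketch below applies the same formula to the first row of datingDataMat with the ranges and minVals values printed in this post; the helper name normalize_input is made up for illustration.

```python
# Values printed in the IPython session of this post
ranges = [9.1273000e+04, 2.0919349e+01, 1.6943610e+00]
min_vals = [0.0, 0.0, 0.001156]

def normalize_input(raw, min_vals, ranges):
    """Apply (value - min) / range to each feature of one sample."""
    return [(value - low) / span for value, low, span in zip(raw, min_vals, ranges)]

# First row of datingDataMat as printed in the post
first_row = [4.0920000e+04, 8.3269760e+00, 9.5395200e-01]
scaled = normalize_input(first_row, min_vals, ranges)
print(scaled)
```

The printed values can be compared with the first row of normMat shown in the session (0.44832535, 0.39805139, 0.56233353); a new person's input goes through exactly the same scaling before it is passed to classify0.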
Switch to IPython:
In[1]: import kNN
In[2]: kNN.datingClassTest()
Out[2]:
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the re