Improving Matches on a Dating Site with the k-Nearest Neighbors Algorithm
阿新 • Published: 2018-12-29
Building on the previous post, add the following code:
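from numpy import zeros, tile, array

'''
Parser that converts the text records into NumPy data
Input: a filename string
Output: a training-sample matrix and a class-label vector
'''
def file2matrix(filename):
    with open(filename) as fr:                  # the with block closes the file when done
        arrayOLine = fr.readlines()
    numberOfLines = len(arrayOLine)             # number of lines in the file
    returnMat = zeros((numberOfLines, 3))       # NumPy matrix filled with zeros
    '''
    Parse the text data into lists. Each line has 4 tab-separated columns:
        frequent flier miles earned per year
        percentage of time spent playing video games
        liters of ice cream consumed per week
        label, as an integer: 1 = disliked, 2 = moderately attractive, 3 = very attractive
    '''
    classLabelVector = []
    index = 0
    for line in arrayOLine:
        line = line.strip()                     # strip() removes leading/trailing whitespace by default ('\n', '\r', '\t', ' ')
        listFromLine = line.split('\t')
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]   # store the first 3 fields in the feature matrix
        classLabelVector.append(int(listFromLine[-1]))  # -1 is the last column; without int() it would stay a string
        index += 1
    return returnMat, classLabelVector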
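The parsing step can be tried without the data file. Below is a minimal sketch that applies the same split-and-convert logic to an in-memory file object; `parse_dating_lines` is a hypothetical name, and the two input records are made up purely for illustration. Unlike `file2matrix`, it also skips blank lines, which is an assumption added for robustness.

```python
import io
import numpy as np

def parse_dating_lines(fr):
    """Hypothetical helper mirroring file2matrix for any file-like object.

    Each line: 3 tab-separated numeric features followed by an integer label.
    """
    lines = [ln.strip() for ln in fr if ln.strip()]   # strip whitespace, skip blank lines
    features = np.zeros((len(lines), 3))              # one row per record
    labels = []
    for i, ln in enumerate(lines):
        fields = ln.split('\t')
        features[i, :] = [float(x) for x in fields[0:3]]  # feature columns as floats
        labels.append(int(fields[-1]))                    # last column is the class label
    return features, labels

# Two made-up records in the same 4-column, tab-separated layout
sample = io.StringIO("1000\t5.0\t0.5\t1\n20000\t10.0\t1.5\t3\n")
mat, labels = parse_dating_lines(sample)
print(mat.shape)   # (2, 3)
print(labels)      # [1, 3]
```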
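# Normalize feature values to [0, 1]: newValue = (oldValue - min) / (max - min)
def autoNorm(dataSet):
    minVals = dataSet.min(0)        # per-column minimums; the argument 0 takes the minimum down each column, not across each row
    maxVals = dataSet.max(0)        # per-column maximums
    ranges = maxVals - minVals      # per-column ranges (an array of 3 values)
    m = dataSet.shape[0]            # number of rows
    normDataSet = dataSet - tile(minVals, (m, 1))     # tile(A, (rows, cols)) repeats minVals m times to match dataSet's shape
    normDataSet = normDataSet / tile(ranges, (m, 1))  # element-wise division, not matrix division
    return normDataSet, ranges, minVals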
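NumPy broadcasting can do the same min-max scaling without building tiled copies. The sketch below is an alternative to the `tile` approach, not the author's code; `auto_norm_broadcast` is a hypothetical name and the 3x3 data is made up to show columns on very different scales. It checks that both versions agree.

```python
import numpy as np

def auto_norm_broadcast(data):
    """Min-max normalize each column to [0, 1] using broadcasting."""
    min_vals = data.min(axis=0)              # per-column minimums
    ranges = data.max(axis=0) - min_vals     # per-column ranges
    return (data - min_vals) / ranges, ranges, min_vals  # (m, 3) op (3,) broadcasts row-wise

# Made-up data: miles, % gaming time, liters of ice cream
data = np.array([[1000.0, 5.0, 0.5],
                 [20000.0, 10.0, 1.5],
                 [10500.0, 0.0, 1.0]])

norm, ranges, min_vals = auto_norm_broadcast(data)

# The tile-based formulation used in autoNorm gives the same result
m = data.shape[0]
tiled = (data - np.tile(min_vals, (m, 1))) / np.tile(ranges, (m, 1))
print(np.allclose(norm, tiled))   # True
print(norm[2].tolist())           # [0.5, 0.0, 0.5]
```

Without this scaling, the miles column (values in the thousands) would dominate the Euclidean distance used by k-NN.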
# Test code for the classifier on the dating-site data
def datingClassTest():
    hoRatio = 0.1                                   # hold out 10% of the data for testing
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)                  # number of records used for testing
    errorCount = 0.0                                # number of misclassified test records
    for i in range(numTestVecs):
        # classify0 comes from the previous post; rows numTestVecs..m-1 form the training set
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                     datingLabels[numTestVecs:m], 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print("the total error rate is: %f" % (errorCount / float(numTestVecs)))
Test:
...
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is: 0.050000
The error rate is 5%.
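`classify0` is defined in the previous post and not shown here. To make the hold-out test above concrete, here is a minimal sketch assuming a common k-NN formulation (Euclidean distance plus majority vote among the k nearest rows); `knn_classify` and `holdout_error_rate` are hypothetical names, and the 20-row dataset is made up. As in the test function, the first `ho_ratio` of the rows are the test set and the rest are the training set, and the error rate is misclassified test rows divided by the number of test rows, so 5% means 5 misses per 100 held-out records.

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, data_set, labels, k):
    """Hypothetical stand-in for classify0: Euclidean distance + majority vote."""
    dists = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))  # distance to every training row
    nearest = dists.argsort()[:k]                            # indices of the k closest rows
    votes = Counter(labels[i] for i in nearest)
    # most_common orders ties by first occurrence, i.e. the closer neighbor's label wins
    return votes.most_common(1)[0][0]

def holdout_error_rate(norm_mat, labels, ho_ratio=0.1, k=3):
    """First ho_ratio of rows are tested against the remaining rows."""
    m = norm_mat.shape[0]
    num_test = int(m * ho_ratio)
    errors = 0
    for i in range(num_test):
        pred = knn_classify(norm_mat[i], norm_mat[num_test:], labels[num_test:], k)
        if pred != labels[i]:
            errors += 1
    return errors / num_test

# Made-up, already-normalized data: two well-separated clusters, labels 1 and 3
points, labels = [], []
for j in range(10):
    offset = j * 0.01
    points.append([0.1 + offset, 0.2, 0.1]); labels.append(1)
    points.append([0.9 - offset, 0.8, 0.9]); labels.append(3)
norm_mat = np.array(points)

error_rate = holdout_error_rate(norm_mat, labels)
print("the total error rate is: %f" % error_rate)   # the total error rate is: 0.000000
```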
Add the following function to make predictions:
# Prediction function for the dating site
def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']   # indexed by label - 1
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])   # same column order as the training data
    # scale the new sample with the training set's minVals and ranges before classifying
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("You will probably like this person:", resultList[classifierResult - 1])
>>> import KNN
>>> KNN.classifyPerson()
percentage of time spent playing video games?20
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.6
You will probably like this person: in large doses
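Because `classifyPerson` reads its inputs with `input()`, it is hard to exercise automatically. The sketch below is a hypothetical non-interactive variant: the three feature values are passed as arguments, the query is scaled with the training set's minimums and ranges (the key step in `classifyPerson`), and the label 1 to 3 is mapped to a list index 0 to 2. `predict_from_values` and `knn_classify` are illustrative names, the k-NN step assumes Euclidean distance with majority vote, and the nine training rows are made up, so the results say nothing about the real `datingTestSet2.txt`.

```python
import numpy as np
from collections import Counter

RESULT_LIST = ['not at all', 'in small doses', 'in large doses']

def knn_classify(in_x, data_set, labels, k):
    # Hypothetical stand-in for classify0: Euclidean distance + majority vote
    dists = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    nearest = dists.argsort()[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def predict_from_values(ff_miles, percent_tats, ice_cream, data, labels, k=3):
    """Non-interactive variant of classifyPerson: values are passed in, not read with input()."""
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    norm_mat = (data - min_vals) / ranges                     # normalize the training rows
    in_arr = np.array([ff_miles, percent_tats, ice_cream])    # same column order as the training data
    label = knn_classify((in_arr - min_vals) / ranges, norm_mat, labels, k)  # scale the query the same way
    return RESULT_LIST[label - 1]                             # labels 1..3 map to indices 0..2

# Made-up raw training data: [miles per year, % gaming time, liters of ice cream]
data = np.array([[ 2000.0,  1.0, 0.2],
                 [ 3000.0,  2.0, 0.3],
                 [ 2500.0,  1.5, 0.4],
                 [40000.0,  8.0, 1.0],
                 [42000.0,  9.0, 1.2],
                 [41000.0,  8.5, 0.9],
                 [10000.0, 20.0, 0.6],
                 [11000.0, 21.0, 0.5],
                 [ 9000.0, 19.0, 0.7]])
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]

print(predict_from_values(41500, 8.2, 1.1, data, labels))   # in small doses
```

Forgetting to scale the query would compare raw miles against normalized training rows, so the miles value alone would decide the nearest neighbors.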