The KNN Algorithm: A Classification Model
Posted by 阿新 on 2020-12-21
How the KNN algorithm works, implemented by hand
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
from collections import Counter
x = [
    [2.4, 2.0],
    [4.1, 2.8],
    [1.2, 1.4],
    [0.6, 3.7],
    [1.3, 2.5],
    [6.4, 4.2],
    [1.7, 4.5],
    [8.2, 3.5]
]
y = [0, 0, 0, 0, 0, 1, 1, 1]
sample = [5.0, 6.2]  # the point to classify
k = 3
x_train = np.array(x)
y_train = np.array(y)
distances = []
for i in x_train:
    # Euclidean distance from each training point to the sample;
    # note the square goes inside the sum: sqrt(sum((i - sample)**2))
    d = sqrt(np.sum((i - sample) ** 2))
    distances.append(d)
nearest = np.argsort(distances)[:k]   # indices of the k closest points
k_y = [y_train[j] for j in nearest]   # labels of those k neighbors
count = Counter(k_y)
print(count)
print(count.most_common(1)[0][0])     # majority label among the k neighbors
plt.scatter(x_train[y_train == 0, 0], x_train[y_train == 0, 1], color='g')  # class 0
plt.scatter(x_train[y_train == 1, 0], x_train[y_train == 1, 1], color='r')  # class 1
plt.scatter(sample[0], sample[1], color='b')  # the point being classified
plt.show()
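The manual steps above can be packaged into a reusable helper (a sketch; the function name `knn_classify` is my own, not from the original post):

```python
import numpy as np
from collections import Counter

def knn_classify(x_train, y_train, sample, k=3):
    """Classify `sample` by majority vote among its k nearest neighbors."""
    x_train = np.asarray(x_train)
    y_train = np.asarray(y_train)
    # Euclidean distance from every training point to the sample
    distances = np.sqrt(np.sum((x_train - np.asarray(sample)) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]       # indices of the k closest points
    votes = Counter(y_train[nearest])         # count the labels of those neighbors
    return votes.most_common(1)[0][0]         # majority label

x = [[2.4, 2.0], [4.1, 2.8], [1.2, 1.4], [0.6, 3.7],
     [1.3, 2.5], [6.4, 4.2], [1.7, 4.5], [8.2, 3.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1]
print(knn_classify(x, y, [5.0, 6.2], k=3))  # → 1
```

The sample [5.0, 6.2] is assigned class 1 because two of its three nearest neighbors ([6.4, 4.2] and [1.7, 4.5]) carry label 1.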
Using KNN to classify scikit-learn's built-in handwritten-digits dataset
1. Load scikit-learn's built-in handwritten-digits dataset
import numpy as np
import pandas as pd
from sklearn import datasets
digits = datasets.load_digits()  # the digits dataset (the original used the misleading name `iris`)
# digits.keys()
x = digits.data
y = digits.target
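For orientation, `load_digits` returns 1,797 samples, each an 8×8 grayscale image flattened into 64 features, with labels 0 through 9. A quick check:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
print(digits.data.shape)         # (1797, 64): 8x8 images flattened to 64 features
print(np.unique(digits.target))  # [0 1 2 3 4 5 6 7 8 9]
```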
2. Split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=666)
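With 1,797 samples and `test_size=0.2`, `train_test_split` rounds the test fraction up, giving 360 test rows and 1,437 training rows; a self-contained check:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
# 20% of 1797 rounds up to 360 test samples, leaving 1437 for training
x_tr, x_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)
print(x_tr.shape, x_te.shape)  # (1437, 64) (360, 64)
```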
3. Standardize the features
from sklearn.preprocessing import StandardScaler
standscaler = StandardScaler()  # create an instance
standscaler.fit(x_train)  # compute and store the mean and variance of the training features (the labels y are not needed)
# Transform the data using the stored statistics; the same scaler is reused for the test data
std_x_train = standscaler.transform(x_train)  # standardize x_train with the mean/variance saved by fit
std_x_test = standscaler.transform(x_test)
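Because `fit` stores only the training statistics, the transformed training features end up with exactly zero mean and unit variance per column, while the test features are only approximately standardized. A self-contained sketch on synthetic data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic training features
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # synthetic test features

scaler = StandardScaler()
scaler.fit(train)                    # mean/variance come from the training set only
std_train = scaler.transform(train)
std_test = scaler.transform(test)

print(np.allclose(std_train.mean(axis=0), 0.0))  # True: exactly centered
print(np.allclose(std_train.std(axis=0), 1.0))   # True: unit variance
print(np.allclose(std_test.mean(axis=0), 0.0))   # False in general
```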
4. Build the KNN model
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(std_x_train, y_train)
clf.score(std_x_test, y_test)  # accuracy on the standardized test set
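`score` is simply the mean accuracy of `predict` over the test rows; a self-contained sketch making that explicit:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)

scaler = StandardScaler().fit(x_train)
clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(scaler.transform(x_train), y_train)

pred = clf.predict(scaler.transform(x_test))  # one predicted label per test row
acc = (pred == y_test).mean()                 # fraction of correct predictions
print(acc)                                    # matches clf.score(std_x_test, y_test)
```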
5. Run a grid search over the KNN model to find the best model and parameters
from sklearn.model_selection import GridSearchCV
clf = KNeighborsClassifier()
param_grid = [{
    'n_neighbors': [i for i in range(1, 11)],  # must be >= 1; 0 neighbors is invalid
    'weights': ['uniform', 'distance']
}]
gs_clf = GridSearchCV(clf,param_grid = param_grid)
gs_clf.fit(std_x_train, y_train)
gs_clf.best_estimator_  # the best model, refitted on the whole training set
gs_clf.best_params_     # the winning n_neighbors and weights
# gs_clf.best_score_    # best cross-validation accuracy
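Since `GridSearchCV` refits the best estimator on the full training set by default (`refit=True`), the fitted `gs_clf` can score the test set directly, instead of retyping the winning parameters by hand. A self-contained sketch:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)
scaler = StandardScaler().fit(x_train)
std_x_train = scaler.transform(x_train)
std_x_test = scaler.transform(x_test)

param_grid = [{'n_neighbors': list(range(1, 11)),  # start at 1, not 0
               'weights': ['uniform', 'distance']}]
gs_clf = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid)
gs_clf.fit(std_x_train, y_train)

print(gs_clf.best_params_)              # the winning parameter combination
print(gs_clf.score(std_x_test, y_test)) # scores with the refitted best estimator
```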
6. Predict the test set with the best model and compute the prediction accuracy
clf = KNeighborsClassifier(n_neighbors=6, weights='distance')
clf.fit(std_x_train, y_train)
clf.score(std_x_test, y_test)  # test-set accuracy with the grid-search parameters