
KNN Algorithm: Classification Model


The KNN algorithm: principle and a hand-written implementation

The idea behind KNN: for a new sample, compute its distance to every training sample, take the k nearest neighbours, and predict the class that wins the majority vote among those k labels.

import numpy as np 
import matplotlib.pyplot as plt
from math import sqrt
from collections import Counter

x = [
    [2.40,2.0],
    [4.1,2.8],
    [1.2,1.4],
    [0.6,3.7],
    [1.3,2.5],
    [6.4,4.2],
    [1.7,4.5],
    [8.2,3.5]
]

y = [0, 0, 0, 0, 0, 1, 1, 1]

sample = [5.0, 6.2]   # the new point to classify
k = 3
x_train = np.array(x)
y_train = np.array(y)

# Euclidean distance from every training point to the sample
distance = []
for i in x_train:
    d = sqrt(np.sum((i - sample) ** 2))
    distance.append(d)

# indices of the k nearest training points, then their labels
nearest = np.argsort(distance)[:k]
k_y = [y_train[j] for j in nearest]

# majority vote among the k labels
count = Counter(k_y)
print(count)
predict = count.most_common(1)[0][0]
print(predict)

# visualise the two classes (green/red) and the new sample (blue)
plt.scatter(x_train[y_train == 0, 0], x_train[y_train == 0, 1], color='g')
plt.scatter(x_train[y_train == 1, 0], x_train[y_train == 1, 1], color='r')
plt.scatter(5.0, 6.2, color='b')
plt.show()
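
As a side note (not part of the original post), the same distance-and-vote logic can be wrapped in a small helper for reuse; the function name knn_classify below is just an illustrative choice, and the example call assumes the x and y lists defined above.

import numpy as np
from math import sqrt
from collections import Counter

def knn_classify(x_train, y_train, sample, k=3):
    """A minimal sketch: Euclidean distance to every training point, then a majority vote among the k nearest."""
    x_train = np.array(x_train)
    y_train = np.array(y_train)
    distances = [sqrt(np.sum((xi - sample) ** 2)) for xi in x_train]
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# example: classify the same point as above
print(knn_classify(x, y, [5.0, 6.2], k=3))   # should agree with the manual prediction above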

Using the KNN algorithm to predict on scikit-learn's built-in handwritten-digit dataset

1. Import the handwritten-digit dataset built into scikit-learn

import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
# digits.keys()
x = digits.data
y = digits.target

2. Split the data into a training set and a test set

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=666)

3. Standardize the data

from sklearn.preprocessing import StandardScaler

standscaler = StandardScaler()                 # create the scaler instance
standscaler.fit(x_train)                       # fit on the training data only (y is not needed); stores the mean and variance
std_x_train = standscaler.transform(x_train)   # transform x_train with the stored mean and variance
std_x_test = standscaler.transform(x_test)     # transform x_test with the same statistics

4. Build a KNN model

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(std_x_train, y_train)
print(clf.score(std_x_test, y_test))

5. Grid-search the KNN model to find the best estimator and parameters

from sklearn.model_selection import GridSearchCV

clf = KNeighborsClassifier()
param_grid = [{
    'n_neighbors': [i for i in range(1, 11)],   # n_neighbors must be at least 1
    'weights': ['uniform', 'distance']
}]
gs_clf = GridSearchCV(clf, param_grid=param_grid)
gs_clf.fit(std_x_train, y_train)
print(gs_clf.best_estimator_)
print(gs_clf.best_params_)
# print(gs_clf.best_score_)

6. Use the best parameters to predict on the test set and compute the accuracy

clf = KNeighborsClassifier(n_neighbors=6, weights='distance')   # parameters found by the grid search
clf.fit(std_x_train, y_train)
print(clf.score(std_x_test, y_test))
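
As an additional sketch (not in the original post): since GridSearchCV refits the best estimator on the full training set by default (refit=True), it can be reused directly instead of rebuilding the classifier by hand. The snippet below assumes the std_x_test, y_test and gs_clf variables from the steps above and adds per-class metrics via scikit-learn's classification_report.

from sklearn.metrics import classification_report

best_clf = gs_clf.best_estimator_              # already refit on std_x_train by GridSearchCV
y_pred = best_clf.predict(std_x_test)
print(best_clf.score(std_x_test, y_test))      # overall accuracy of the best model
print(classification_report(y_test, y_pred))   # precision/recall/F1 for each digit class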