
Implementing kNN from scratch (Python)

Preface:

kNN has two hyperparameters (choices about the algorithm that we set rather than learn; they are very problem-dependent, so you must try them all out and see what works best): first, which distance metric to use, and second, how to choose k.
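That "try them all out" advice is easy to act on. Here is a minimal sketch of a search over k (my addition, using scikit-learn's built-in KNeighborsClassifier rather than the hand-rolled classifier below), scoring a few candidate values with 5-fold cross-validation:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
for k in (1, 3, 5, 7, 9):
    # mean accuracy over 5 folds for this choice of k
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             iris.data, iris.target, cv=5)
    print('k =', k, ', mean accuracy =', scores.mean())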

Two distance metrics are covered here: the Manhattan metric (the sum of absolute coordinate differences), also called L1, and the Euclidean metric (the square root of the sum of squared differences), i.e. L2.

Each has its own areas of application; I am still learning this myself and am not entirely clear on the details, but as the instructor put it, L1 is coordinate-dependent. For example, if you have a vector describing an employee, whose elements capture different attributes such as salary, age, and gender, then the individual coordinates carry meaning, and the L1 distance is worth considering.
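To make the two metrics concrete, a quick sketch on a single pair of vectors (numpy assumed; the employee numbers are made up):

import numpy as np

x = np.array([35000., 42., 1.])  # hypothetical employee vector: salary, age, gender
y = np.array([38000., 35., 0.])

l1 = np.abs(x - y).sum()             # Manhattan/L1: sum of absolute coordinate differences
l2 = np.sqrt(((x - y) ** 2).sum())   # Euclidean/L2: straight-line distance

print(l1, l2)  # 3008.0 vs. ~3000.008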

 

Code implementation:

    Steps:

                1. Load the dataset

                2. Split the data into training and test sets

                3. Compute the distance between each test instance and the training set

                4. Select the k nearest neighbours by distance

                5. Let the k neighbours vote on the label (a plain majority vote; this handles the three-class iris data just as well as binary kNN)

    Implementation:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from collections import Counter
from operator import itemgetter
import numpy as np
import math


# 1) given two data points, calculate the euclidean distance between them
def get_distance(data1, data2):
    points = zip(data1, data2)
    diffs_squared_distance = [pow(a - b, 2) for (a, b) in points]
    return math.sqrt(sum(diffs_squared_distance))
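# Aside (added, not in the original post): swapping in the Manhattan (L1)
# metric from the preface only changes the accumulation, e.g.:
#
#     def get_manhattan_distance(data1, data2):
#         return sum(abs(a - b) for a, b in zip(data1, data2))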

 
# 2) given a training set and a test instance, use getDistance to calculate all pairwise distances
def get_neighbours(training_set, test_instance, k):
    distances = [_get_tuple_distance(training_instance, test_instance) for training_instance in training_set]
 
    # index 1 is the calculated distance between training_instance and test_instance
    sorted_distances = sorted(distances, key=itemgetter(1))
 
    # extract only training instances without distance
    sorted_training_instances = [pair[0] for pair in sorted_distances]
 
    # select first k elements
    return sorted_training_instances[:k]

def _get_tuple_distance(training_instance, test_instance):
    # pair each training instance with its distance to the test instance;
    # training_instance[0] is the feature vector, training_instance[1] the label
    return (training_instance, get_distance(test_instance, training_instance[0]))



def get_majority_vote(neighbours):
    # index 1 is the class
    classes = [neighbour[1] for neighbour in neighbours]
    count = Counter(classes)
    return count.most_common(1)[0][0]
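# Note (added): most_common() orders equal counts arbitrarily (by first
# insertion), so a tie among the k neighbours is broken silently; choosing
# an odd k avoids ties in the two-class case.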


def main():
 
    # load the data and create the training and test sets;
    # random_state=1 seeds the split so the same train/test partition
    # can be reproduced in later experiments
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=1)
 
    # reformat the train/test datasets for convenience: each element is a
    # (feature_vector, label) pair; plain lists avoid numpy's restrictions
    # on ragged object arrays
    train = list(zip(X_train, y_train))
    test = list(zip(X_test, y_test))

    # train and test now have the following structure:
    """
        [(array([5.8, 2.8, 5.1, 2.4]), 2),
         (array([6. , 2.2, 4. , 1. ]), 1),
         (array([5.5, 4.2, 1.4, 0.2]), 0), ...]
    """

 
    # generate predictions
    predictions = []
 
    # arbitrarily set k to 5, meaning that the 5 nearest neighbours
    # vote on the predicted class of each new instance
    k = 5
 
    # for each instance in the test set, get nearest neighbours and majority vote on predicted class
    for x in range(len(X_test)):
        print('Classifying test instance number', x, ':')
        neighbours = get_neighbours(training_set=train, test_instance=test[x][0], k=k)
        majority_vote = get_majority_vote(neighbours)
        predictions.append(majority_vote)
        print('Predicted label =', majority_vote, ', actual label =', test[x][1])
            
    # summarize performance of the classification
    # (accuracy_score expects 1-d label arrays, so no reshape is needed)
    print('The overall accuracy of the model is:', accuracy_score(y_test, predictions))
    report = classification_report(y_test, predictions, target_names=iris.target_names)
    print('A detailed classification report:\n\n', report)
 
if __name__ == "__main__":
    main()
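As a follow-up, here is a sketch of a fully vectorized predictor (my addition; it assumes the raw X_train/y_train/X_test arrays from train_test_split rather than the zipped lists above). It computes all test-to-train distances in one broadcast and should reproduce the loop-based predictions on the same random_state=1 split:

import numpy as np

def predict_vectorized(X_train, y_train, X_test, k=5):
    # (n_test, n_train) matrix of Euclidean distances via broadcasting
    dists = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    # indices of the k nearest training points for every test point
    nearest = np.argsort(dists, axis=1)[:, :k]
    # majority vote over the neighbours' labels, row by row
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])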