Data Preprocessing: Normalization

Concept:

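Normalization uses each feature's minimum and maximum to rescale its values into the interval [new_min, new_max]. Each feature column is scaled with the min-max function, computed as follows:

x' = (x - min) / (max - min) * (new_max - new_min) + new_min

Here min and max are the smallest and largest values in that column. With new_min = 0 and new_max = 1 this reduces to x' = (x - min) / (max - min).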
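
To make the general [new_min, new_max] case concrete, here is a minimal NumPy sketch of column-wise min-max scaling. The helper name `min_max_scale`, the toy data, and the handling of constant columns are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np

def min_max_scale(X, new_min=0.0, new_max=1.0):
    """Scale each column of X into [new_min, new_max] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    # Assumption: a constant column (span == 0) is mapped to new_min
    # rather than dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    unit = (X - col_min) / safe_span              # values now in [0, 1]
    return unit * (new_max - new_min) + new_min   # stretch and shift

# Made-up toy data: two feature columns with different ranges
sample = np.array([[0.0, 10.0],
                   [50.0, 20.0],
                   [100.0, 40.0]])
scaled = min_max_scale(sample, new_min=-1.0, new_max=1.0)
print(scaled)
```

In scikit-learn, the same target interval can be requested with `MinMaxScaler(feature_range=(new_min, new_max))`; the default `feature_range` is (0, 1).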

Code example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

### Machine Learning in Action, Chapter 2 rewrite
def file2matrix(filename):
    # Load the tab-separated file: columns 0-2 are features, column 3 is the class label
    data = np.genfromtxt(filename, delimiter="\t")
    returnMat = data[:, 0:3]
    classLabelVector = data[:, 3:4]
    return returnMat, classLabelVector

def autoNorm(dataset):
    # Only the first feature column is normalized here
    x = dataset[:, 0:1]
    # method 1: use scikit-learn's MinMaxScaler
    minMax = MinMaxScaler()
    x_std = minMax.fit_transform(x)
    print(x.min())
    print(x.max())
    print(x[2])
    # hand-computed check for row 2, with values hard-coded for this dataset
    print((26052 - 0) / 91273)
    print(x_std[2])

    # method 2: use a lambda
    a = lambda x: (x - x.min()) / (x.max() - x.min())
    print(a(x)[2])

if __name__ == '__main__':
    returnMat, classLabelVector = file2matrix('F:\\datingTestSet2.txt')
    autoNorm(returnMat)

Output:

Dataset sample: