Locally Weighted Linear Regression
阿新 • Published: 2019-02-11
Ordinary linear fitting usually cannot predict all values well, because it easily leads to underfitting, for example when the data set follows a bell-shaped curve. Polynomial fitting, on the other hand, can fit all of the training data, but it becomes poor at predicting new samples, because it overfits and no longer matches the true model underlying the data.
Today we introduce a non-parametric learning method called locally weighted regression (LWR). Why is it called non-parametric? A parametric learning method works like this: after training on all the data, it produces a fixed set of parameters, and then predicts new samples from those parameters alone, no longer depending on the training data; the parameter values are fixed. A non-parametric learning method instead re-fits the training data every time it predicts a new sample, obtaining fresh parameter values; in other words, every prediction depends on the whole training data set, so the parameters obtained each time are not fixed.
Next, let's look at the principle of locally weighted regression.
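The original post's derivation appears to have been lost in extraction (it was most likely an image). As a sketch, the standard LWR formulation, which is what the Gaussian kernel and weighted normal equation in the code below implement, is:

```latex
% Weight for training example x^{(i)}, relative to the query point x
% (c is the kernel bandwidth, matching the parameter c in the code):
w^{(i)} = \exp\!\left( -\frac{\lVert x^{(i)} - x \rVert^{2}}{2c^{2}} \right)

% Fit the parameters \theta by weighted least squares around x:
\min_{\theta} \sum_{i} w^{(i)} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2}

% Closed-form solution (W is the diagonal matrix of the w^{(i)}):
\theta = \left( X^{T} W X \right)^{-1} X^{T} W y, \qquad \hat{y} = \theta^{T} x
```

Unlike ordinary least squares, θ must be recomputed for every query point x, which is exactly why the method is non-parametric.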
With the principle above in hand, let's put it into practice in Python:
```python
# python 3.5.3  蔡軍生
# http://edu.csdn.net/course/detail/2592
# Locally weighted regression
import numpy as np
import random
import matplotlib.pyplot as plt

def gaussian_kernel(x, x0, c, a=1.0):
    """
    Gaussian kernel.

    :Parameters:
      - `x`: nearby datapoint we are looking at.
      - `x0`: data point we are trying to estimate.
      - `c`, `a`: kernel parameters.
    """
    # Euclidean distance
    diff = x - x0
    dot_product = diff * diff.T
    return a * np.exp(dot_product / (-2.0 * c**2))

def get_weights(training_inputs, datapoint, c=1.0):
    """
    Function that calculates weight matrix for a given data point and
    training data.

    :Parameters:
      - `training_inputs`: training data set the weights should be assigned to.
      - `datapoint`: data point we are trying to predict.
      - `c`: kernel function parameter

    :Returns:
      NxN weight matrix, where N is the size of the `training_inputs`.
    """
    x = np.mat(training_inputs)
    n_rows = x.shape[0]
    # Create diagonal weight matrix from identity matrix
    weights = np.mat(np.eye(n_rows))
    for i in range(n_rows):
        weights[i, i] = gaussian_kernel(datapoint, x[i], c)
    return weights

def lwr_predict(training_inputs, training_outputs, datapoint, c=1.0):
    """
    Predict a data point by fitting local regression.

    :Parameters:
      - `training_inputs`: training input data.
      - `training_outputs`: training outputs.
      - `datapoint`: data point we want to predict.
      - `c`: kernel parameter.

    :Returns:
      Estimated value at `datapoint`.
    """
    weights = get_weights(training_inputs, datapoint, c=c)
    x = np.mat(training_inputs)
    y = np.mat(training_outputs).T
    xt = x.T * (weights * x)
    betas = xt.I * (x.T * (weights * y))
    return datapoint * betas

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # Generate points scattered around a straight line
    for i in range(0, numPoints):
        # intercept term
        x[i][0] = 1
        x[i][1] = i
        # target value
        y[i] = bias + i * variance + random.uniform(0, 1) * 20
    return x, y

# Generate the data
a1, a2 = genData(100, 10, 0.6)
a3 = []
# Predict at every point
for i in a1:
    pdf = lwr_predict(a1, a2, i, 1)
    a3.append(pdf.tolist()[0])
plt.plot(a1[:, 1], a2, "x")
plt.plot(a1[:, 1], a3, "r-")
plt.show()
```

Result plot with C = 1.0:
Result plot with C = 2.0:
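The two result plots differ only in the kernel bandwidth. As a small illustration (the `gaussian_weight` helper is hypothetical, not from the post), here is how `c` changes the weight a training point receives at a given distance from the query point; a larger `c` spreads weight over more neighbors and therefore yields a smoother fit:

```python
import numpy as np

def gaussian_weight(dist, c):
    # Same formula as gaussian_kernel in the code above, for a scalar distance
    return np.exp(-dist**2 / (2.0 * c**2))

for c in (1.0, 2.0):
    w = [round(float(gaussian_weight(d, c)), 3) for d in range(5)]
    print("c = %.1f -> weights at distances 0..4: %s" % (c, w))
```

At distance 2, for instance, the weight under `c = 2.0` is noticeably larger than under `c = 1.0`, so far-away points contribute more to the local fit and the red curve flattens out.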