
Machine learning: neural network backpropagation

Techniques used:

1. One-hot encoding:

Definition: represent the ten digits 1–10 in a different form; 1 is represented as [1,0,0,0,0,0,0,0,0,0], and so on for the rest.

Why use one-hot encoding: the loss function is carried over from logistic regression, and in logistic regression y can only take the values 0 and 1, so the concrete digits have to be re-encoded here.
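A minimal sketch of this encoding (the helper name one_hot and the vectorized indexing are just for illustration; the full program below builds the same thing with a plain loop):

import numpy as np

def one_hot(labels, num_classes = 10):
    # each label k in 1..10 gets a row with a 1 at position k-1
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), np.asarray(labels) - 1] = 1
    return encoded

print(one_hot([1, 3, 10]))
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]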

Backpropagation approach:

First, the purpose of backpropagation: to find the optimal weights.

Workflow: first run one pass of forward propagation and cache all the intermediate variables (the input layer values, the hidden layer pre-activation values, the hidden layer activations, the output layer pre-activation values, and the output layer activations, i.e. the final outputs). Note that when initializing the weights they must not all be set to 0 (unlike logistic regression), otherwise the update is meaningless; instead, use a random function to draw values from a small symmetric interval (see the sketch right after this paragraph). Then start backpropagation: using the chain rule of differentiation, write out the corresponding loss function and add the regularization (penalty) term to guard against overfitting. Once backpropagation has produced the final error terms, use scipy's optimization module to obtain the optimal parameters and return them; at that point the neural network model is trained.
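A minimal sketch of the initialization point above (the bound 0.5 and the size 10285 = 25*401 + 10*26 match the network trained below; the helper name random_init is just for illustration):

import numpy as np

def random_init(size, epsilon = 0.5):
    # all-zero weights would give every hidden unit the same activations and
    # the same gradients, so symmetry is broken with small random values
    # drawn from the symmetric interval (-epsilon, epsilon)
    return np.random.uniform(-epsilon, epsilon, size)

init_theta = random_init(25*401 + 10*26)  # 10285 parameters for this network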

Source code:

'''
backpropagation of a neural network (without optimization extras)
'''

import numpy as np
import scipy.io as sio
import matplotlib.pyplot as plt
from scipy.optimize import minimize


'''
Optimization / training function.
If the unregularized gradient function is chosen, the model easily
overfits; with regularization the training fit drops somewhat.
'''
def nn_training(X,y):
    init_theta = np.random.uniform(-0.5,0.5,10285)
    res = minimize(
        fun = reg_cost,
        x0 = init_theta,
        args = (X,y,lamda),
        method = 'TNC',
        jac = reg_gradient,
        options = {'maxiter':300}
    )
    return res

def one_hot_encoder(raw_y):
    result = []
    for i in raw_y:  # iterate over every label in raw_y
        # temporary array holding the encoded row
        y_temp = np.zeros(10)
        y_temp[i-1] = 1
        result.append(y_temp)
    return np.array(result)

# serialization: flatten both weight matrices into one vector
def serialize(a,b):
    return np.append(a.flatten(),b.flatten())

def deserialize(theta_serialize):
    theta1 = theta_serialize[:25*401].reshape(25,401)
    theta2 = theta_serialize[25*401:].reshape(10,26)
    return theta1,theta2

def sigmoid(z):
    return 1/(1+np.exp(-z))

'''
Forward propagation.
All intermediate values are returned, because backpropagation needs them.
a denotes the activations of each layer (after the sigmoid),
z denotes the values after multiplying by theta but before the activation,
h is the output.
'''
def feed_forward(theta_serialize,X):
    theta1,theta2 = deserialize(theta_serialize)
    a1 = X
    z2 = a1 @ theta1.T
    a2 = sigmoid(z2)
    a2 = np.insert(a2,0,values = 1,axis = 1)
    z3 = a2 @ theta2.T
    h = sigmoid(z3)
    return a1,z2,a2,z3,h

'''
Cost function without regularization
(without regularization, more iterations are needed)
'''
def cost(theta_serialize,X,y):
    a1,z2,a2,z3,h = feed_forward(theta_serialize,X)
    J = -np.sum(y*np.log(h) + (1-y)*np.log(1-h))/len(X)
    return J

'''
Cost function with regularization
'''
def reg_cost(theta_serialize,X,y,lamda):
    theta1,theta2 = deserialize(theta_serialize)
    sum1 = np.sum(np.power(theta1[:,1:],2))
    sum2 = np.sum(np.power(theta2[:,1:],2))
    reg = (sum1 + sum2)*lamda/(2*len(X))
    return reg + cost(theta_serialize,X,y)

'''
Derivative of the sigmoid function
'''
def sigmoid_gradient(z):
    return sigmoid(z)*(1-sigmoid(z))

'''
Gradient without regularization
d denotes the error terms
'''
def gradient(theta_serialize,X,y):
    theta1,theta2 = deserialize(theta_serialize)
    a1, z2, a2, z3, h = feed_forward(theta_serialize, X)
    d3 = h - y
    d2 = d3 @ theta2[:,1:]*sigmoid_gradient(z2)
    D2 = (d3.T @ a2)/len(X)
    D1 = (d2.T @ a1)/len(X)
    return serialize(D1,D2)

'''
Gradient with regularization
d denotes the error terms
'''
def reg_gradient(theta_serialize,X,y,lamda):
    D = gradient(theta_serialize,X,y)
    D1,D2 = deserialize(D)
    theta1,theta2 = deserialize(theta_serialize)
    # add the penalty term (the bias column is not regularized)
    D1[:,1:] += theta1[:,1:]*lamda / len(X)
    D2[:,1:] += theta2[:,1:]*lamda / len(X)
    return serialize(D1,D2)

'''
Visualize the hidden layer. The images carry no direct human-readable
meaning; only the network itself can interpret these weights.
'''
def plot_hidden_layer(theta):
    theta1,_ = deserialize(theta)
    hidden_layer = theta1[:,1:]  # 25x400; the first column is the bias term, so drop it
    fig,ax = plt.subplots(ncols = 5,nrows = 5,figsize = (8,8),sharey = True,sharex = True)
    for r in range(5):
        for c in range(5):
            ax[r,c].imshow(hidden_layer[5*r+c].reshape(20,20).T,cmap = 'gray_r')
    # hide the x and y axis ticks
    plt.xticks([])
    plt.yticks([])
    plt.show()

data = sio.loadmat('./data_set/ex4data1.mat')
raw_X = data['X']
raw_y = data['y']
X = np.insert(raw_X,0,values = 1,axis = 1)
# X.shape
y = one_hot_encoder(raw_y)

theta = sio.loadmat('./data_set/ex4weights.mat')
theta1,theta2 = theta['Theta1'],theta['Theta2']
# print(theta1)
# print(theta2)
theta_serialize = serialize(theta1,theta2)

lamda = 10
# print(reg_cost(theta_serialize,X,y,lamda))

res = nn_training(X,y)
# print(res.x)

raw_y = raw_y.reshape(5000,)
_,_,_,_,h = feed_forward(res.x,X)
y_pred = np.argmax(h,axis = 1) + 1
acc = np.mean(y_pred == raw_y)  # training accuracy
plot_hidden_layer(res.x)
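For reference, the quantities computed by reg_cost and gradient above correspond to the following formulas (m is the number of training samples, g is the sigmoid; this is my reading of the code, written out here rather than taken from the original post):

$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{10}\Big[y_k^{(i)}\log h_k^{(i)} + \big(1-y_k^{(i)}\big)\log\big(1-h_k^{(i)}\big)\Big] + \frac{\lambda}{2m}\Big(\sum_{j,k\ge 1}\big(\Theta^{(1)}_{jk}\big)^2 + \sum_{j,k\ge 1}\big(\Theta^{(2)}_{jk}\big)^2\Big)
$$

$$
\delta^{(3)} = h - y,\qquad
\delta^{(2)} = \big(\delta^{(3)}\,\Theta^{(2)}_{:,1:}\big)\odot g'\big(z^{(2)}\big),\qquad
D^{(l)} = \frac{1}{m}\,\big(\delta^{(l+1)}\big)^{\top} a^{(l)}
$$

reg_gradient then adds λ/m · Θ to every non-bias column of D, matching the penalty term in the cost.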

That's all.

I hope this helps.