Andrew Ng's Machine Learning: The Backpropagation Algorithm for Neural Networks

阿新 • Published: 2018-11-05

Original post: 2018-06-21 20:59:35, 離殤灬孤狼, views: 373

Tags: machine learning, neural networks
Category: Andrew Ng's Machine Learning (https://blog.csdn.net/wyg1997/article/category/7742222)

Copyright notice: if you find this post useful, please include the source link when reposting. blog.csdn.net/wyg1997 https://blog.csdn.net/wyg1997/article/details/80766153

Exercise link: https://s3.amazonaws.com/spark-public/ml/exercises/on-demand/machine-learning-ex4.zip

Notes:

[Image: notes]

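This part of the course is genuinely difficult, so I will follow the outline on the last page of my notes and work through the implementation and my own thinking step by step.

First, visualize the data

Run the following code: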

load('ex4data1.mat');
m = size(X, 1);

sel = randperm(size(X, 1));     % shuffle, then pick 100 random examples to display
sel = sel(1:100);

displayData(X(sel, :));
  

The result:

[Image: the 100 selected samples displayed in a grid]

The function used is displayData.m:

function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
%   [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
%   stored in X in a nice grid. It returns the figure handle h and the 
%   displayed array if requested.

% Set example_width automatically if not passed in 

if ~exist('example_width', 'var') || isempty(example_width) 
    example_width = round(sqrt(size(X, 2)));
end

% Gray Image
colormap(gray);

% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);

% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);

% Between images padding
pad = 1;

% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
                       pad + display_cols * (example_width + pad));

% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m, 
            break; 
        end
        % Copy the patch

        % Get the max value of the patch
        max_val = max(abs(X(curr_ex, :)));
        display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
                      pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
                        reshape(X(curr_ex, :), example_height, example_width) / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m, 
        break; 
    end
end

% Display Image
h = imagesc(display_array, [-1 1]);

% Do not show axis
axis image off

drawnow;

end

  

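The neural network model:

[Image: neural network model diagram]

From it we can read off the following:

The network has 3 layers
The input layer has 400 units (each sample is a 20*20 image), not counting the bias unit
The output layer has 10 units (representing the digits 0, 1, 2, …, 9)
The hidden layer has 25 units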
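
As a quick sanity check on the numbers above, the layer sizes fix the shapes of the two weight matrices. A minimal sketch, assuming the variable names `input_layer_size`, `hidden_layer_size` and `num_labels` that nnCostFunction.m uses later in this post:

```matlab
% Layer sizes taken from the list above.
input_layer_size  = 400;   % 20x20 pixel images
hidden_layer_size = 25;
num_labels        = 10;    % one output unit per digit

% Each weight matrix has one extra column for the bias unit of the
% layer that feeds into it.
Theta1_size = [hidden_layer_size, input_layer_size + 1];   % 25 x 401
Theta2_size = [num_labels, hidden_layer_size + 1];         % 10 x 26

num_params = prod(Theta1_size) + prod(Theta2_size);
fprintf('Theta1: %d x %d, Theta2: %d x %d, %d parameters in total\n', ...
        Theta1_size, Theta2_size, num_params);
```

These are the same 25*401 and 10*26 shapes listed in the comments of the complete nnCostFunction.m further down.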

Cost function

Formula

[Image: cost function formula]
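
The formula image did not survive here. Read back from the code in this post (the double sum over examples and output units, plus the regularization term that skips the bias columns), the cost being computed should be the following, where $K = 10$ is the number of output units and $h_\Theta(x^{(i)})_k$ is the k-th output activation (`a3(i, k)` in the code). Treat this as a reconstruction from the code, not a copy of the original figure:

```latex
J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ -y_k^{(i)} \log\left(h_\Theta(x^{(i)})_k\right)
         - \left(1 - y_k^{(i)}\right) \log\left(1 - h_\Theta(x^{(i)})_k\right) \right]
  + \frac{\lambda}{2m} \left[
      \sum_{j=1}^{25} \sum_{k=1}^{400} \left(\Theta^{(1)}_{j,k}\right)^2
    + \sum_{j=1}^{10} \sum_{k=1}^{25} \left(\Theta^{(2)}_{j,k}\right)^2 \right]
```

In the regularization sums the column index starts at 1 with the bias weight at index 0, so the bias column is left out, which matches `Theta1(:,2:end)` and `Theta2(:,2:end)` in the code.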

Code for sigmoid.m (nothing difficult here; it is pasted first only because the code below calls it):

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

g = 1.0 ./ (1.0 + exp(-z));
end
  

Computing the cost: the code filled into nnCostFunction.m (no regularization yet; it has to work for an output layer of any size):

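% Forward pass: compute z and the activations of each layer
a1 = [ones(m,1), X];        %input
z2 = a1*Theta1';       %hidden
a2 = [ones(m,1), sigmoid(z2)];
z3 = a2*Theta2';       %output
a3 = sigmoid(z3);

% Convert the label vector y into a one-hot matrix Y
Y = zeros(m, size(Theta2, 1));        % works for any output layer size
for i = 1:size(Theta2, 1)
    Y(find(y==i), i) = 1;
end

% Then compute J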
J = sum(sum(-(Y.*log(a3)+(1-Y).*log(1-a3))))/m;
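
The loop that builds `Y` can also be written by indexing into an identity matrix. A small sketch on made-up labels (the values of `y` and `K` here are assumptions for illustration, not the exercise data):

```matlab
% Toy labels in 1..K (made up for illustration).
y = [2; 1; 3; 2];
K = 3;

% Loop version, following the code in the post.
Y_loop = zeros(numel(y), K);
for i = 1:K
    Y_loop(y == i, i) = 1;
end

% Indexing version: row y(t) of the identity matrix is the one-hot
% encoding of label y(t).
I = eye(K);
Y_index = I(y, :);

disp(Y_index);
assert(isequal(Y_loop, Y_index));
```

Both versions give the same matrix; the loop is arguably easier to read, while the indexing form avoids the explicit loop.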
  

Regularizing the cost function (add this below the code above):

% Add the regularization term to J (bias columns excluded)
J = J + lambda/(2.0*m)* ...
    (sum(sum(Theta1(:,2:size(Theta1,2)).^2))+ ...
    sum(sum(Theta2(:,2:size(Theta2,2)).^2)));
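
To see that the first (bias) column really is left out of the penalty, here is a tiny worked example with a made-up weight matrix (`Theta`, `lambda` and `m` are illustrative values only):

```matlab
% Made-up weights: the first column holds the bias weights.
Theta  = [10 1 2;
          20 3 4];
lambda = 1;
m      = 5;

% Only columns 2:end are penalized: 1^2 + 2^2 + 3^2 + 4^2 = 30.
penalized_sum = sum(sum(Theta(:, 2:end) .^ 2));
reg = lambda / (2 * m) * penalized_sum;

fprintf('sum of squares without bias: %g, regularization term: %g\n', ...
        penalized_sum, reg);
```

The large bias weights 10 and 20 do not contribute, so the term is 30 / 10 = 3.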
  

Backpropagation

Implementing the sigmoid derivative (sigmoidGradient.m):

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
%               each value of z (z can be a matrix, vector or scalar).

g = sigmoid(z).*(1-sigmoid(z));

% =============================================================

end
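
A quick way to convince yourself that the derivative is right is to compare it with a central-difference approximation. A minimal sketch, using anonymous functions in place of the two .m files so that it runs on its own (the test points `z` and step `h` are arbitrary choices):

```matlab
% Stand-ins for sigmoid.m and sigmoidGradient.m.
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

z = [-1 -0.5 0 0.5 1];
h = 1e-4;

% Central difference: (g(z + h) - g(z - h)) / (2h).
numeric  = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);
analytic = sigmoidGradient(z);

max_err = max(abs(numeric - analytic));
fprintf('gradient at 0: %g, max difference: %g\n', sigmoidGradient(0), max_err);
```

At z = 0 the sigmoid equals 0.5, so the derivative there is 0.5 * 0.5 = 0.25.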
  

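Random initialization (randInitializeWeights.m). The weights must not all start at zero: with identical weights every hidden unit would compute the same value and receive the same update. The notes explain this in more detail: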

function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights 
%   of a layer with L_in incoming connections and L_out outgoing 
%   connections. 
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

% You need to return the following variables correctly 
W = zeros(L_out, 1 + L_in);

% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias unit
%

epsilon_init = 0.12;        % keep this value small so that learning is efficient
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

% =========================================================================

end
  

There is also a good rule for choosing epsilon:
[Image: formula for choosing epsilon]
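
The image with the rule is missing. The course exercise handout suggests basing epsilon on the sizes of the two layers a weight matrix connects, epsilon_init = sqrt(6) / sqrt(L_in + L_out). Assuming that is the rule the image showed, a minimal sketch for the first layer of this network looks like this:

```matlab
% Sizes of the layers connected by Theta1 in this exercise.
L_in  = 400;
L_out = 25;

% Assumed rule: epsilon_init = sqrt(6) / sqrt(L_in + L_out).
epsilon_init = sqrt(6) / sqrt(L_in + L_out);

% Uniform random weights in [-epsilon_init, epsilon_init],
% with one extra column for the bias unit.
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

fprintf('epsilon_init = %.4f\n', epsilon_init);
```

For L_in = 400 and L_out = 25 this comes out at roughly 0.12, which is the hard-coded value used in randInitializeWeights.m above.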

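Backpropagation (filled into nnCostFunction.m below the cost computation; note that there is no regularization here yet):

Formulas and diagram:
[Image: backpropagation formulas and diagram]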

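% Backpropagation: compute the gradients in 5 steps
% 1. Set the input layer and compute the activations for every example (already done above)

% 2. Compute the error of the output layer
delta3 = a3 - Y;

% 3. Compute the error of layer l=2 (examples are stored as rows here, so the
%    expression looks slightly different from the one in the handout)
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);

% 4. Accumulate DELTA (capital delta) using the formula
DELTA1 = delta2'*a1;
DELTA2 = delta3'*a2;

% 5. Divide by the number of examples to get the gradients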
Theta1_grad = DELTA1./m;
Theta2_grad = DELTA2./m;
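
Because the vectorized form stores one example per row, it can be hard to match against the per-example formulas. The sketch below runs both versions on a tiny made-up network and checks that they agree (all sizes, weights and data are assumptions chosen for illustration; anonymous functions stand in for sigmoid.m and sigmoidGradient.m):

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Tiny network with made-up sizes, weights and data.
m = 4; n_in = 3; n_hid = 2; K = 3;
Theta1 = reshape(sin(1:n_hid * (n_in + 1)), n_hid, n_in + 1) / 10;
Theta2 = reshape(cos(1:K * (n_hid + 1)), K, n_hid + 1) / 10;
X = reshape(sin(1:m * n_in), m, n_in);
y = [1; 2; 3; 1];
I = eye(K);
Y = I(y, :);

% Vectorized version, one example per row (as in the post).
a1 = [ones(m, 1), X];
z2 = a1 * Theta1';
a2 = [ones(m, 1), sigmoid(z2)];
a3 = sigmoid(a2 * Theta2');
delta3 = a3 - Y;
delta2 = delta3 * Theta2(:, 2:end) .* sigmoidGradient(z2);
grad1_vec = delta2' * a1 / m;
grad2_vec = delta3' * a2 / m;

% Per-example loop with column vectors.
grad1_loop = zeros(size(Theta1));
grad2_loop = zeros(size(Theta2));
for t = 1:m
    a1_t = [1; X(t, :)'];
    z2_t = Theta1 * a1_t;
    a2_t = [1; sigmoid(z2_t)];
    a3_t = sigmoid(Theta2 * a2_t);
    d3 = a3_t - Y(t, :)';
    d2 = (Theta2(:, 2:end)' * d3) .* sigmoidGradient(z2_t);
    grad1_loop = grad1_loop + d2 * a1_t';
    grad2_loop = grad2_loop + d3 * a2_t';
end
grad1_loop = grad1_loop / m;
grad2_loop = grad2_loop / m;

max_diff = max([abs(grad1_vec(:) - grad1_loop(:)); ...
                abs(grad2_vec(:) - grad2_loop(:))]);
fprintf('largest difference between the two versions: %g\n', max_diff);
```

The transpose moves from `Theta2(:, 2:end)' * d3` in the column form to `delta3 * Theta2(:, 2:end)` in the row form, which is the difference the comment in step 3 refers to.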
  

Gradient checking (checkNNGradients.m):

function checkNNGradients(lambda)
%CHECKNNGRADIENTS Creates a small neural network to check the
%backpropagation gradients
%   CHECKNNGRADIENTS(lambda) Creates a small neural network to check the
%   backpropagation gradients, it will output the analytical gradients
%   produced by your backprop code and the numerical gradients (computed
%   using computeNumericalGradient). These two gradient computations should
%   result in very similar values.
%

if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end

input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;

% We generate some 'random' test data
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);
% Reusing debugInitializeWeights to generate X
X  = debugInitializeWeights(m, input_layer_size - 1);
y  = 1 + mod(1:m, num_labels)';

% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];

% Short hand for cost function
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

[cost, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);

% Visually examine the two gradient computations.  The two columns
% you get should be very similar. 
disp([numgrad grad]);
fprintf(['The above two columns you get should be very similar.\n' ...
         '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);

% Evaluate the norm of the difference between two solutions.  
% If you have a correct implementation, and assuming you used EPSILON = 0.0001 
% in computeNumericalGradient.m, then diff below should be less than 1e-9
diff = norm(numgrad-grad)/norm(numgrad+grad);

fprintf(['If your backpropagation implementation is correct, then \n' ...
         'the relative difference will be small (less than 1e-9). \n' ...
         '\nRelative Difference: %g\n'], diff);

end
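
checkNNGradients.m relies on computeNumericalGradient, which this post does not list. The idea behind it is a central-difference approximation of each partial derivative. Below is a sketch of that idea on a made-up toy cost (the cost `J`, the point `theta` and the step `e` are all assumptions for illustration; this is not the course's helper file):

```matlab
% Toy cost with a known gradient (made up for illustration).
J = @(theta) sum(theta .^ 2) + prod(theta);
theta = [1; -2; 0.5];
e = 1e-4;

% Central differences: perturb one parameter at a time.
numgrad = zeros(size(theta));
for p = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(p) = e;
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
end

% Analytical gradient of the toy cost: 2*theta_p plus the product of
% the other entries (valid here because no entry of theta is zero).
grad = 2 * theta + prod(theta) ./ theta;

% Same relative-difference measure as in checkNNGradients.m.
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
disp([numgrad grad]);
fprintf('Relative Difference: %g\n', rel_diff);
```

Each entry needs two evaluations of the cost, which is why this check is only run on a small debug network and not during training.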
  

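With the check passing, we move on to the next step: regularizing the gradients (in nnCostFunction.m, added below the gradient code above):

% Regularize the gradients (bias columns excluded)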
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m*Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m*Theta2(:,2:end);
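
Written as a formula, the two lines above correspond to the following, where $j = 0$ denotes the bias column (column 1 in MATLAB's 1-based indexing) and $\Delta^{(l)}$ is the accumulated DELTA from step 4. This is read back from the code, so treat it as a sketch of what the code computes:

```latex
\frac{\partial J}{\partial \Theta^{(l)}_{ij}} =
\begin{cases}
  \dfrac{1}{m} \Delta^{(l)}_{ij} & j = 0 \\[2ex]
  \dfrac{1}{m} \Delta^{(l)}_{ij} + \dfrac{\lambda}{m} \Theta^{(l)}_{ij} & j \ge 1
\end{cases}
```

The extra term is the derivative of the $\frac{\lambda}{2m}$ penalty in the cost, which is why the factor 2 disappears.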
  

That completes the cost and gradient computation. Here is the complete nnCostFunction.m:

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, hidden_layer_size, ...
%   num_labels, X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices. 
% 
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial derivatives of
%         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
%         Theta2_grad, respectively. After implementing Part 2, you can check
%         that your implementation is correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector into a 
%               binary vector of 1's and 0's to be used with the neural network
%               cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for the 
%               first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to Theta1_grad
%               and Theta2_grad from Part 2.
%

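% Matrix sizes for the exercise data set
% Theta1: 25*401
% Theta2: 10*26
% X: 5000*400
% z2: 5000*25
% z3: 5000*10
% y: 5000*1
% Y: 5000*10
% a1: 5000*401
% a2: 5000*26
% a3: 5000*10
% delta3: 5000*10
% delta2: 5000*25

% Forward pass: compute z and the activations of each layer
a1 = [ones(m,1), X];        %input
z2 = a1*Theta1';       %hidden
a2 = [ones(m,1), sigmoid(z2)];
z3 = a2*Theta2';       %output
a3 = sigmoid(z3);

% Convert the label vector y into a one-hot matrix Y
Y = zeros(m, size(Theta2, 1));        % works for any output layer size
for i = 1:size(Theta2, 1)
    Y(find(y==i), i) = 1;
end

% Then compute J
J = sum(sum(-(Y.*log(a3)+(1-Y).*log(1-a3))))/m;

% Add the regularization term to J (bias columns excluded)
J = J + lambda/(2.0*m)* ...
    (sum(sum(Theta1(:,2:size(Theta1,2)).^2))+ ...
    sum(sum(Theta2(:,2:size(Theta2,2)).^2)));


% Backpropagation: compute the gradients in 5 steps
% 1. Set the input layer and compute the activations for every example (already done above)

% 2. Compute the error of the output layer
delta3 = a3 - Y;

% 3. Compute the error of layer l=2 (examples are stored as rows here, so the
%    expression looks slightly different from the one in the handout)
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);

% 4. Accumulate DELTA (capital delta) using the formula
DELTA1 = delta2'*a1;
DELTA2 = delta3'*a2;

% 5. Divide by the number of examples to get the gradients
Theta1_grad = DELTA1./m;
Theta2_grad = DELTA2./m;

% Regularize the gradients (bias columns excluded)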
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m*Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m*Theta2(:,2:end);


% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];


end
  

Learning the parameters with an advanced optimizer (if the syntax is unclear, the original post links to an explanatory article here):

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
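
fmincg (a course-supplied optimizer that is not listed in this post) works on a single parameter vector, which is why nnCostFunction takes `nn_params` and reshapes it. A small sketch of that unroll/reshape round trip, with made-up toy sizes and weights:

```matlab
% Toy sizes, chosen only to keep the example small.
input_layer_size  = 3;
hidden_layer_size = 2;
num_labels        = 4;

% Made-up weight matrices with recognizable entries.
Theta1 = reshape(1:hidden_layer_size * (input_layer_size + 1), ...
                 hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(101:100 + num_labels * (hidden_layer_size + 1), ...
                 num_labels, hidden_layer_size + 1);

% Unroll both matrices into one column vector.
nn_params = [Theta1(:); Theta2(:)];

% Reshape back, using the same index arithmetic as nnCostFunction.m.
split = hidden_layer_size * (input_layer_size + 1);
Theta1_back = reshape(nn_params(1:split), ...
                      hidden_layer_size, input_layer_size + 1);
Theta2_back = reshape(nn_params(split + 1:end), ...
                      num_labels, hidden_layer_size + 1);

fprintf('%d parameters unrolled, round trip ok: %d\n', numel(nn_params), ...
        isequal(Theta1, Theta1_back) && isequal(Theta2, Theta2_back));
```

The same reshape is needed on the vector the optimizer returns in order to recover Theta1 and Theta2 for prediction or visualization.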
  

Visualizing the hidden layer (using `displayData(Theta1(:, 2:end));`, with the same displayData.m listed earlier):


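Let's see what the hidden layer has learned:

[Image: visualization of the hidden layer weights]

Finally, we could try different values of λ to obtain more accurate parameters; I will leave that experiment for later.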
