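Deep Learning 8: UFLDL Tutorial Exercise: Stacked Autoencoders and Implement deep networks for digit classification (Stanford deep learning tutorial)
2. Experimental environment: Windows 7, MATLAB 2015b, 16 GB RAM, 2 TB hard disk
3. Experiment: Exercise: Implement deep networks for digit classification. A deep network is used to recognize handwritten digits from the MNIST database. The 60,000 labeled examples (60,000 image patches of 28*28 pixels) serve as the training set and are fed into a stacked autoencoder. The first autoencoder layer extracts first-order features from the training data. These first-order features are fed into the second autoencoder layer, which extracts second-order features. The second-order features are then fed into a softmax classifier, which is trained on the second-order features together with the original labels. Finally, the backpropagation (BP) algorithm fine-tunes the weights of the whole network so that it fits the data better. The 10,000 labeled examples (10,000 image patches of 28*28 pixels) serve as the test set: the trained softmax classifier classifies them and the classification accuracy is computed. The overall network structure for this section is as follows:
When should fine-tuning be applied? Usually only when a large amount of labeled training data is available. In that case fine-tuning can significantly improve classifier performance. However, if there is a large unlabeled data set (for unsupervised feature learning/pre-training) but only a relatively small labeled training set, fine-tuning helps very little; in that case, use the method described in Deep Learning 7: Self-Taught Learning_Exercise (Stanford deep learning tutorial UFLDL).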
[params, netconfig] = stack2params(stack)
[ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, numClasses, netconfig,lambda, data, labels)
stack = params2stack(params, netconfig)
[pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)
[h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
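The two helpers stack2params and params2stack convert between a cell array of layers and one flat parameter vector. The sketch below only illustrates that round trip on a tiny hypothetical stack. It assumes each layer is stored as weights followed by biases, and the variable names layerSizes, rebuilt, pos and prevSize are made up for the illustration; the helper files shipped with the exercise may differ in detail.

```matlab
% Hypothetical two-layer stack with tiny sizes (3 inputs -> 2 units -> 2 units).
stack = cell(2, 1);
stack{1}.w = reshape(1:6, 2, 3);    stack{1}.b = [7; 8];
stack{2}.w = reshape(9:12, 2, 2);   stack{2}.b = [13; 14];

% Flatten: for every layer append w(:) and then b(:) (assumed order).
params = [];
layerSizes = zeros(numel(stack), 1);
for d = 1:numel(stack)
    params = [params; stack{d}.w(:); stack{d}.b(:)];
    layerSizes(d) = size(stack{d}.w, 1);
end
inputSize = size(stack{1}.w, 2);

% Rebuild the stack from the flat vector using only the layer sizes.
rebuilt = cell(numel(stack), 1);
pos = 0;
prevSize = inputSize;
for d = 1:numel(stack)
    nW = layerSizes(d) * prevSize;
    rebuilt{d}.w = reshape(params(pos+1:pos+nW), layerSizes(d), prevSize);
    pos = pos + nW;
    rebuilt{d}.b = params(pos+1:pos+layerSizes(d));
    pos = pos + layerSizes(d);
    prevSize = layerSizes(d);
end
```

The layer sizes play the role of netconfig here: without them the flat vector cannot be cut back into matrices.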
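s = struct; creates a structure array s.
For example, the call save('saves/step2.mat', 'sae1OptTheta'); requires that a directory named saves exists under the current directory; otherwise the call fails.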
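One way to avoid the failing save call is to create the directory before saving. This is a minimal sketch; the location under tempdir, the file name example.mat and the field name note are placeholders chosen for the illustration, not part of the exercise.

```matlab
% Hypothetical output directory; the exercise itself uses 'saves' in the current folder.
outDir = fullfile(tempdir, 'saves_demo');
if ~exist(outDir, 'dir')      % exist(..., 'dir') is nonzero only if the folder exists
    mkdir(outDir);
end

s = struct;                   % 1x1 struct with no fields
s.note = 'example';           % hypothetical field, added by assignment
save(fullfile(outDir, 'example.mat'), 's');
```

In the exercise script the same guard with outDir = 'saves' would go before the first save call.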
images = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
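labels(labels==0) = 10; % Remap label 0 to label 10, so labels take values in [1,10] instead of the original [0,9]. Why is this necessary? The labels are later used as row indices in sparse(labels, 1:M, 1), and MATLAB indices start at 1.
trainLabels(trainLabels == 0) = 10; % Same remapping in the exercise script: label 0 becomes 10 so that labels start from 1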
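The remapping matters because the cost function builds its one-hot ground-truth matrix by using the labels as row indices. The sketch below uses four made-up labels and passes the matrix size to sparse explicitly, which the exercise code does not do; it is an illustration, not the exercise code.

```matlab
labels = [0; 3; 9; 1];                 % hypothetical raw labels in 0..9
labels(labels == 0) = 10;              % digit 0 now lives in row 10

M = numel(labels);
numClasses = 10;
% Row index = class label, column index = example number.
groundTruth = full(sparse(labels, 1:M, 1, numClasses, M));
```

With a label of 0 left in place, sparse would be asked for row 0 and raise an indexing error.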
[prob pred] = max(softmaxTheta*a{depth+1});
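The prediction line takes the row index of the largest score in each column and never applies the softmax normalization. The sketch below, on a made-up 3-class score matrix, suggests why that shortcut is safe: normalizing each column does not change which row is largest.

```matlab
scores = [1.0 0.2; 3.0 0.1; 2.0 0.9];  % hypothetical scores: 3 classes x 2 examples

% Argmax over rows of the raw scores (what the prediction line does).
[~, predRaw] = max(scores, [], 1);

% Argmax over rows of the softmax probabilities.
E = exp(bsxfun(@minus, scores, max(scores, [], 1)));
P = bsxfun(@rdivide, E, sum(E, 1));
[~, predProb] = max(P, [], 1);
```

Both predRaw and predProb are row vectors of class indices, one per example.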
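7. Questions
First, we need to be clear about why the features of the second hidden layer cannot be displayed. Many people (for example: Deep learning: 24 (stacked autoencoder exercise)) believe the reason is this: the function display_network.m requires the square root of the number of hidden units to be an integer, while in this experiment the number of hidden units is set to 200, which is not a perfect square, so the features cannot be displayed. But that is only a programming issue and it is easy to solve: simply set the number of hidden units to 196, and display_network.m can then be used on the second layer in the same way as on the first layer. In fact this is not the real reason, as the following result shows:
So why can the features not be displayed this way, and how should they be displayed? This is actually a research direction in deep learning; for details see: Deep Learning paper notes (7): visualizing high-level features of deep networks
8. Cost function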
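depth = size(stack, 1);  % number of hidden layers
a = cell(depth+1, 1);    % outputs of the input layer and activations of the hidden layers
a{1} = data;             % output of the input layer
Jweight = 0;             % weight penalty term
m = size(data, 2);       % number of examples

% Compute the activations of the hidden layers
for i=2:numel(a)
    a{i} = sigmoid(stack{i-1}.w*a{i-1}+repmat(stack{i-1}.b, [1 size(a{i-1}, 2)]));
    %Jweight = Jweight + sum(sum(stack{i-1}.w).^2);
end

M = softmaxTheta*a{depth+1};  % a{depth+1} is the output of the last hidden layer; M is the input to the softmax layer, i.e. the values before the softmax activation is applied
M = bsxfun(@minus, M, max(M, [], 1));  % prevents overflow in the exponential computed next
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));  % p is the output of the softmax layer, i.e. the probability of each class

Jweight = Jweight + sum(softmaxTheta(:).^2);  % softmaxTheta holds the weights of the softmax layer

% Cost of the softmax classifier. Why is it also the cost of the whole model?
cost = -1/m .* groundTruth(:)'*log(p(:)) + lambda/2*Jweight;  % cost = cross-entropy (negative log-likelihood) term + weight decay term (also called the regularization term)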
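The line that subtracts the column maximum before calling exp is what keeps the softmax finite. A small sketch with made-up scores shows the effect: the naive formula overflows, while the shifted one gives the same probabilities as an equivalent small-valued input.

```matlab
z = [1000; 1001; 1002];            % hypothetical large scores for one example

naive = exp(z) ./ sum(exp(z));     % exp overflows to Inf, so this is Inf/Inf = NaN

shifted = exp(z - max(z));         % largest exponent is now exp(0) = 1
stable = shifted ./ sum(shifted);

% The same scores shifted down by 1000 give identical probabilities.
zSmall = [0; 1; 2];
reference = exp(zSmall) ./ sum(exp(zSmall));
```

Subtracting a constant from every score in a column cancels in the ratio, so the probabilities are unchanged.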
Before Finetuning Test Accuracy: 92.140%
After Finetuning Test Accuracy: 97.590%
%% CS294A/CS294W Stacked Autoencoder Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  stacked autoencoder exercise. You will need to complete code in
%  stackedAECost.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises. You will need the initializeParameters.m
%  loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
%
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

DISPLAY = true;
inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200;    % Layer 1 Hidden Size
hiddenSizeL2 = 200;    % Layer 2 Hidden Size
sparsityParam = 0.1;   % desired average activation of the hidden units.
                       % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                       %  in the lecture notes).
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
%  This loads our training data from the MNIST database files.

% Load MNIST database files
trainData   = loadMNISTImages('train-images.idx3-ubyte');
trainLabels = loadMNISTLabels('train-labels.idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the unlabelled STL training
%  images.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

%  Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the first layer sparse autoencoder, this layer has
%                an hidden size of "hiddenSizeL1"
%                You should store the optimal parameters in sae1OptTheta
addpath minFunc/;
options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[sae1OptTheta, cost] = minFunc(@(p)sparseAutoencoderCost(p,...
    inputSize,hiddenSizeL1,lambda,sparsityParam,beta,trainData),sae1Theta,options); % train the parameters of the first layer
save('saves/step2.mat', 'sae1OptTheta');

if DISPLAY
    W1 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
    display_network(W1');
end
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 3: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

% Use the weights sae1OptTheta of the first sparse autoencoder to obtain
% the first-order feature representation of the input data
[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

%  Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the second layer sparse autoencoder, this layer has
%                an hidden size of "hiddenSizeL2" and an inputsize of
%                "hiddenSizeL1"
%
%                You should store the optimal parameters in sae2OptTheta
[sae2OptTheta, cost] = minFunc(@(p)sparseAutoencoderCost(p,...
    hiddenSizeL1,hiddenSizeL2,lambda,sparsityParam,beta,sae1Features),sae2Theta,options); % train the parameters of the second layer
save('saves/step3.mat', 'sae2OptTheta');

figure;
if DISPLAY
    W11 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
    W12 = reshape(sae2OptTheta(1:hiddenSizeL2 * hiddenSizeL1), hiddenSizeL2, hiddenSizeL1);
    % TODO(zellyn): figure out how to display a 2-level network
    % display_network(log(W11' ./ (1-W11')) * W12');
    % W12_temp = W12(1:196,1:196);
    % display_network(W12_temp');
    % figure;
    % display_network(W12_temp');
end
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 4: Train the softmax classifier on the second-order features
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

% Use the weights sae2OptTheta of the second sparse autoencoder to obtain
% the second-order feature representation of the input data
[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

%  Randomly initialize the parameters
% What is this parameter for? Computing softmaxCost? It is not used below and can be dropped,
% because softmaxCost was already implemented in the softmaxExercise exercise,
% where its gradient computation was verified.
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the softmax classifier, the classifier takes in
%                input of dimension "hiddenSizeL2" corresponding to the
%                hidden layer size of the 2nd layer.
%
%                You should store the optimal parameters in saeSoftmaxOptTheta
%
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);
softmaxLambda = 1e-4;
numClasses = 10;
softoptions = struct;
softoptions.maxIter = 400;
softmaxModel = softmaxTrain(hiddenSizeL2,numClasses,softmaxLambda,...
    sae2Features,trainLabels,softoptions);
saeSoftmaxOptTheta = softmaxModel.optTheta(:); % weights of the softmax classifier
save('saves/step4.mat', 'saeSoftmaxOptTheta');
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 5: Finetune softmax model

% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% Initialize the stack using the parameters learned
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack); % flatten the weights of the stack (the two hidden layers) into one vector stackparams
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ]; % parameter vector of the whole network before fine-tuning; the softmax classifier parameters saeSoftmaxOptTheta come first

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the deep network, hidden size here refers to the
%                dimension of the input to the classifier, which corresponds
%                to "hiddenSizeL2".
%
% Fine-tune with backpropagation to obtain the parameters stackedAEOptTheta of the whole network
[stackedAEOptTheta, cost] = minFunc(@(p)stackedAECost(p,inputSize,hiddenSizeL2,...
    numClasses, netconfig,lambda, trainData, trainLabels),...
    stackedAETheta,options); % fine-tuned parameters of the whole network
save('saves/step5.mat', 'stackedAEOptTheta');

figure;
if DISPLAY
    optStack = params2stack(stackedAEOptTheta(hiddenSizeL2*numClasses+1:end), netconfig);
    W11 = optStack{1}.w;
    W12 = optStack{2}.w;
    % TODO(zellyn): figure out how to display a 2-level network
    % display_network(log(1 ./ (1-W11')) * W12');
end
% -------------------------------------------------------------------------

%%======================================================================
%% STEP 6: Test
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('t10k-images.idx3-ubyte');
testLabels = loadMNISTLabels('t10k-labels.idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:  97.6%
%
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
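The index arithmetic in STEP 5 (weights at 1:hidden*input, first bias vector starting at 2*hidden*input+1) only works for one particular layout of the autoencoder parameter vector. The sketch below assumes that layout is [W1(:); W2(:); b1; b2], which is what the indexing implies, and checks the arithmetic on tiny made-up sizes and values.

```matlab
inputSize = 4;  hiddenSize = 3;                 % hypothetical tiny sizes

% Made-up parameters of one autoencoder, packed in the assumed order.
W1 = reshape(1:12, hiddenSize, inputSize);      % encoder weights
W2 = -reshape(1:12, inputSize, hiddenSize);     % decoder weights (not used by the stack)
b1 = [101; 102; 103];                           % encoder biases
b2 = [201; 202; 203; 204];                      % decoder biases (not used by the stack)
theta = [W1(:); W2(:); b1(:); b2(:)];

% Same slicing as in STEP 5.
nW = hiddenSize * inputSize;
w = reshape(theta(1:nW), hiddenSize, inputSize);
b = theta(2*nW+1 : 2*nW+hiddenSize);
```

Only the encoder half of each autoencoder (W1 and b1) ends up in the stack; the decoder half is discarded.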
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                              numClasses, netconfig, ...
                                              lambda, data, labels)
% Computes the cost function of the whole model and its gradient.
% Note: after completing this function, it is best to verify the gradient
% computation with checkStackedAECost.

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.

% theta: trained weights from the autoencoder
% inputSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% netconfig:   the network configuration of the stack
% lambda:      the weight regularization penalty
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example


%% Unroll softmaxTheta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize); % softmax classifier parameters, extracted from the full parameter vector as a matrix

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig); % hidden-layer parameters, extracted from the full parameter vector as a cell array of structs

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta)); % gradient of softmaxTheta
stackgrad = cell(size(stack));                % gradient of stack
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));


%% --------------------------- YOUR CODE HERE -----------------------------
%  Instructions: Compute the cost function and gradient vector for
%                the stacked autoencoder.
%
%                You are given a stack variable which is a cell-array of
%                the weights and biases for every layer. In particular, you
%                can refer to the weights of Layer d, using stack{d}.w and
%                the biases using stack{d}.b . To get the total number of
%                layers, you can use numel(stack).
%
%                The last layer of the network is connected to the softmax
%                classification layer, softmaxTheta.
%
%                You should compute the gradients for the softmaxTheta,
%                storing that in softmaxThetaGrad. Similarly, you should
%                compute the gradients for each layer in the stack, storing
%                the gradients in stackgrad{d}.w and stackgrad{d}.b
%                Note that the size of the matrices in stackgrad should
%                match exactly that of the size of the matrices in stack.
%

depth = size(stack, 1);  % number of hidden layers
a = cell(depth+1, 1);    % outputs of the input layer and activations of the hidden layers
a{1} = data;             % output of the input layer
Jweight = 0;             % weight penalty term
m = size(data, 2);       % number of examples

% Compute the activations of the hidden layers
for i=2:numel(a)
    a{i} = sigmoid(stack{i-1}.w*a{i-1}+repmat(stack{i-1}.b, [1 size(a{i-1}, 2)]));
    %Jweight = Jweight + sum(sum(stack{i-1}.w).^2);
end

M = softmaxTheta*a{depth+1};
M = bsxfun(@minus, M, max(M, [], 1));  % prevents overflow in the exponential computed next
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));

Jweight = Jweight + sum(softmaxTheta(:).^2);

% Cost of the softmax classifier. Why is it also the cost of the whole model?
cost = -1/m .* groundTruth(:)'*log(p(:)) + lambda/2*Jweight;  % cost = cross-entropy (negative log-likelihood) term + weight decay term (also called the regularization term)

% Gradient of the softmax classifier cost, i.e. the gradient of the output layer
softmaxThetaGrad = -1/m .* (groundTruth - p)*a{depth+1}' + lambda*softmaxTheta;

delta = cell(depth+1, 1);  % error terms (residuals) of the hidden layers and the output layer

% Error term at the last hidden layer, backpropagated from the softmax output
delta{depth+1} = -softmaxTheta' * (groundTruth - p) .* a{depth+1} .* (1-a{depth+1});

% Error terms of the remaining hidden layers
for i=depth:-1:2
    delta{i} = stack{i}.w'*delta{i+1}.*a{i}.*(1-a{i});
end

% Use the error terms computed above to get the gradients of the hidden-layer parameters
for i=depth:-1:1
    stackgrad{i}.w = 1/m .* delta{i+1}*a{i}';
    stackgrad{i}.b = 1/m .* sum(delta{i+1}, 2);
end

% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end


% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
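The comments in stackedAECost recommend checking the gradient with checkStackedAECost. The idea behind such a check can be sketched on a much smaller problem: the snippet below compares the analytic softmax gradient (the same form as softmaxThetaGrad) with a central-difference estimate. The data, sizes and names (X, G, probs, costFn, theta0) are all made up for the illustration, and this is not the checkStackedAECost file from the exercise.

```matlab
% Fixed toy data: n = 3 features, m = 4 examples, k = 3 classes (all hypothetical).
X = reshape(linspace(-1, 1, 12), 3, 4);
labels = [1; 2; 3; 1];
[n, m] = size(X);
k = 3;
lambda = 1e-3;
G = full(sparse(labels, 1:m, 1, k, m));          % one-hot ground truth

% Softmax probabilities and regularized cross-entropy cost.
probs  = @(T) bsxfun(@rdivide, exp(T * X), sum(exp(T * X), 1));
costFn = @(t) -1/m * sum(sum(G .* log(probs(reshape(t, k, n))))) ...
              + lambda/2 * sum(t.^2);

theta0 = reshape(linspace(-0.5, 0.5, k * n), [], 1);

% Analytic gradient, same form as softmaxThetaGrad in stackedAECost.
T0 = reshape(theta0, k, n);
analytic = -1/m * (G - probs(T0)) * X' + lambda * T0;
analytic = analytic(:);

% Central-difference numerical gradient, one coordinate at a time.
h = 1e-4;
numeric = zeros(size(theta0));
for i = 1:numel(theta0)
    e = zeros(size(theta0));
    e(i) = h;
    numeric(i) = (costFn(theta0 + e) - costFn(theta0 - e)) / (2 * h);
end

relDiff = norm(numeric - analytic) / norm(numeric + analytic);
```

A small relDiff suggests the analytic formula matches the cost; the same comparison applied to the full stackedAECost on a tiny network is a cheap way to catch sign or indexing mistakes before running minFunc.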
function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.

% theta: trained weights from the autoencoder
% inputSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.

% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

depth = numel(stack);
a = cell(depth+1, 1);   % column cell array: input plus one entry per hidden layer
a{1} = data;
m = size(data, 2);

% Forward pass through the hidden layers
for i=2:depth+1
    a{i} = sigmoid(stack{i-1}.w*a{i-1}+ repmat(stack{i-1}.b, [1 m]));
end

% The predicted class is the row index of the largest softmax score in each column
[~, pred] = max(softmaxTheta*a{depth+1});

% -----------------------------------------------------------

end


% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
function [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
% This function visualizes filters in matrix A. Each column of A is a
% filter. We will reshape each column into a square image and visualizes
% on each cell of the visualization panel.
% All other parameters are optional, usually you do not need to worry
% about it.
% opt_normalize: whether we need to normalize the filter so that all of
% them can have similar contrast. Default value is true.
% opt_graycolor: whether we use gray as the heat map. Default is true.
% cols: how many columns are there in the display. Default value is the
% squareroot of the number of columns in A.
% opt_colmajor: you can switch convention to row major for A. In that
% case, each row of A is a filter. Default value is false.

% opt_normalize: whether to normalize. True: each patch is normalized separately (each element of a patch is divided by the largest absolute pixel value in that patch);
%                false: the whole image is normalized together (each element is divided by the largest absolute pixel value in the whole image). Default is true.
% opt_graycolor: whether to display in grayscale.
%                True: display in grayscale; false: do not display in grayscale. Default is true.
% cols: the number of patches in each row of the displayed image. Default is the square root of the number of columns of A.
% opt_colmajor: whether the patches in the displayed image are arranged row by row from left to right, or column by column from top to bottom.
%                True: the patches are arranged column by column, from top to bottom;
%                false: the patches are arranged row by row, from left to right. Default is false.

warning off all  % turn off warnings

% Default values of the parameters
if ~exist('opt_normalize', 'var') || isempty(opt_normalize)
    opt_normalize= true;
end

if ~exist('opt_graycolor', 'var') || isempty(opt_graycolor)
    opt_graycolor= true;