CS229 6.11 Neurons Networks implements of self-taught learning

阿新 • • Published: 2018-11-27

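In machine learning, more data often beats a better algorithm. In practice, however, most people cannot obtain large amounts of labeled data. Unlabeled data is far easier to collect, and unsupervised methods can put it to use: self-taught learning and unsupervised feature learning are algorithms of this kind. Under otherwise equal conditions a labeled example carries more information than an unlabeled one, but if a large amount of unlabeled data is available and the computer can exploit it, the results are often quite good.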

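Self-taught learning and unsupervised feature learning use a large amount of unlabeled data to learn a good feature representation. When tackling a specific classification problem, we can then combine these learned features with whatever labeled data is available (possibly very little) and train a supervised classifier on the labeled data.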

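When there is a lot of unlabeled data and only a little labeled data, we run feature learning on all x⁽ⁱ⁾ to obtain a⁽ⁱ⁾, then replace the original input x⁽ⁱ⁾ in the labeled data with a⁽ⁱ⁾ to get new training examples {(a⁽ⁱ⁾, y⁽ⁱ⁾)} (i = 1...m). This can work very well, and the method can still perform well even when only labeled data is available.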

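An autoencoder can learn features from an unlabeled dataset. Given an unlabeled training set $\textstyle \{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}$ (the subscript $\textstyle u$ stands for "unlabeled"), first preprocess the data, for example with PCA or whitening, and then train a sparse autoencoder.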

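With the trained model parameters $\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$, we can compute the hidden-unit activations $\textstyle a$ for any input $\textstyle x$. Compared with the raw input $\textstyle x$, $\textstyle a$ may be a better feature representation. The features (the activations $\textstyle a$) are computed by the encoder half of the network: $\textstyle a = f(W^{(1)} x + b^{(1)})$, where $\textstyle f$ is the sigmoid function.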
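The feature computation above can be sketched in a few lines. This is a minimal pure-Python illustration rather than the MATLAB code of the exercise. The function names and the toy values of `W1`, `b1`, and `x` are made up for the example.

```python
import math

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_activations(W1, b1, x):
    """Return a = f(W1 x + b1) for one input vector x.

    W1 is a list of rows (one row per hidden unit), b1 has one bias per hidden unit.
    """
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

# Toy parameters (hypothetical values): 2 hidden units, 3 inputs.
W1 = [[0.0, 0.0, 0.0],
      [1.0, -1.0, 0.5]]
b1 = [0.0, 0.0]
x = [2.0, 2.0, 0.0]

a = hidden_activations(W1, b1, x)
print(a)  # both pre-activations are 0, so both activations are 0.5
```

Only the encoder parameters $\textstyle W^{(1)}$ and $\textstyle b^{(1)}$ are needed to extract features. The decoder parameters matter only while training the autoencoder.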

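As described earlier, suppose we have a labeled training set of $\textstyle m_l$ examples $\textstyle \{ (x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots (x_l^{(m_l)}, y^{(m_l)}) \}$ (the subscript $\textstyle l$ stands for "labeled"). We can now find a better feature representation for these inputs. Feeding $\textstyle x_l^{(1)}$ into the sparse autoencoder yields the hidden-unit activations $\textstyle a_l^{(1)}$. We can then use $\textstyle a_l^{(1)}$ directly in place of the original data $\textstyle x_l^{(1)}$ (the "replacement representation"), or combine the two and use the new vector $\textstyle (x_l^{(1)}, a_l^{(1)})$ in place of $\textstyle x_l^{(1)}$ (the "concatenation representation").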

After this transformation, the training set becomes $\textstyle \{ (a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots (a_l^{(m_l)}, y^{(m_l)}) \}$ or $\textstyle \{ ((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots, ((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}$ (depending on whether $\textstyle a_l^{(i)}$ replaces $\textstyle x_l^{(i)}$ or is concatenated with it). In practice, concatenating $\textstyle a_l^{(i)}$ and $\textstyle x_l^{(i)}$ usually performs better. When memory and computation costs are a concern, replacement is also an option.
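To make the two representations concrete, here is a small Python sketch. The encoder `toy_encode` is a hypothetical stand-in for the trained sparse autoencoder, and all names and values are illustrative only.

```python
def replacement_representation(x, a):
    """Use the learned features a in place of the raw input x."""
    return list(a)

def concatenation_representation(x, a):
    """Use the combined vector (x, a) in place of the raw input x."""
    return list(x) + list(a)

def build_training_set(inputs, labels, encode, combine):
    """Pair each transformed input with its label: [(features_i, y_i), ...]."""
    return [(combine(x, encode(x)), y) for x, y in zip(inputs, labels)]

def toy_encode(x):
    # Hypothetical stand-in for the autoencoder's hidden layer:
    # a single "feature" equal to the mean of the inputs.
    return [sum(x) / len(x)]

x_l = [[1.0, 3.0], [0.0, 2.0]]
y_l = [1, 2]

replaced = build_training_set(x_l, y_l, toy_encode, replacement_representation)
concatenated = build_training_set(x_l, y_l, toy_encode, concatenation_representation)
print(replaced)      # [([2.0], 1), ([1.0], 2)]
print(concatenated)  # [([1.0, 3.0, 2.0], 1), ([0.0, 2.0, 1.0], 2)]
```

Concatenation grows the classifier's input dimension to the sum of the two sizes, which is the memory and computation cost mentioned above.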

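Finally, we train a supervised learning algorithm (for example an SVM or logistic regression) to obtain a discriminant function that predicts $\textstyle y$. Prediction works as follows: given a test example $\textstyle x_{\rm test}$, repeat the earlier procedure and feed it into the sparse autoencoder to get $\textstyle a_{\rm test}$. Then feed $\textstyle a_{\rm test}$ (or $\textstyle (x_{\rm test}, a_{\rm test})$) into the classifier to obtain the prediction.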
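The prediction path can be sketched as follows, assuming a softmax-style linear classifier. The encoder `toy_encode` and the weight matrices are hypothetical values chosen so the result can be checked by hand. They are not taken from the exercise.

```python
def softmax_predict(theta, features):
    """Return the 1-based class with the highest linear score.

    theta holds one weight row per class. Softmax is monotonic in the
    scores, so the most probable class is the argmax of theta * features.
    """
    scores = [sum(w * f for w, f in zip(row, features)) for row in theta]
    return scores.index(max(scores)) + 1

def predict(x_test, encode, theta, concatenate=False):
    """Encode x_test, then classify a_test or (x_test, a_test)."""
    a_test = encode(x_test)
    inputs = (list(x_test) + list(a_test)) if concatenate else list(a_test)
    return softmax_predict(theta, inputs)

def toy_encode(x):
    # Hypothetical stand-in for the trained autoencoder's hidden layer.
    return [x[0] + x[1], x[0] - x[1]]

x_test = [1.0, 2.0]                      # a_test = [3.0, -1.0]
theta_replacement = [[1.0, 0.0],         # class 1 score: 3.0
                     [0.0, 1.0]]         # class 2 score: -1.0
theta_concatenation = [[0.0, 0.0, 0.0, 1.0],   # class 1 score: -1.0
                       [0.0, 1.0, 0.0, 0.0]]   # class 2 score: 2.0

print(predict(x_test, toy_encode, theta_replacement))                       # 1
print(predict(x_test, toy_encode, theta_concatenation, concatenate=True))   # 2
```

The test example must go through exactly the same encoder, and the same choice of representation, as the training examples did.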

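While learning from the unlabeled training set $\textstyle \{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}$, we may have computed various preprocessing parameters. For example, we may have computed the data mean and applied mean normalization, or run principal component analysis (PCA) on the raw data and represented it as $\textstyle U^Tx$ (or used PCA whitening or ZCA whitening). If so, these parameters must be saved and reused unchanged in the later training and test phases, so that new (test) data undergoes the same transformation before entering the sparse autoencoder. For example, if the unlabeled dataset was preprocessed with PCA, the resulting matrix $\textstyle U$ must be saved and applied to both the labeled training set and the test set. We must not re-estimate a different $\textstyle U$ from the labeled training set, nor recompute the mean for mean normalization. Otherwise the preprocessing may be inconsistent, and the data entering the autoencoder would follow a very different distribution from the data it was trained on.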
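A minimal Python sketch of this rule, using mean normalization as the preprocessing step. It assumes a per-feature mean and one example per row. Both are choices made for this illustration, and the function names are hypothetical.

```python
def fit_mean(data):
    """Estimate the per-feature mean from a dataset (one example per row)."""
    n = len(data)
    return [sum(column) / n for column in zip(*data)]

def apply_mean(data, mean):
    """Subtract a previously stored mean; never re-estimate it here."""
    return [[value - m for value, m in zip(row, mean)] for row in data]

# The mean is estimated once, on the unlabeled set only.
unlabeled = [[1.0, 10.0], [3.0, 30.0]]
mean_u = fit_mean(unlabeled)                   # [2.0, 20.0]
unlabeled_centered = apply_mean(unlabeled, mean_u)

# Labeled training data and test data reuse the stored mean.
test = [[2.0, 25.0]]
test_centered = apply_mean(test, mean_u)       # [[0.0, 5.0]]

# Re-estimating the mean on the test set would give a different transform.
wrong = apply_mean(test, fit_mean(test))       # [[0.0, 0.0]]
print(mean_u, test_centered, wrong)
```

The same pattern applies to a PCA matrix $\textstyle U$ or to whitening parameters: fit once on the unlabeled data, store the result, and apply it everywhere else.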

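There are two common forms of unsupervised feature learning, and they differ in the kind of unlabeled data available. Self-taught learning is the more general and more powerful setting: it does not require the unlabeled data $\textstyle x_u$ and the labeled data $\textstyle x_l$ to come from the same distribution. The other, more restrictive setting is sometimes called semi-supervised learning, and it requires $\textstyle x_u$ and $\textstyle x_l$ to follow the same distribution. The examples below illustrate the difference.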

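Suppose we have a computer vision task whose goal is to distinguish images of cars from images of motorcycles, so every training example is either a car image or a motorcycle image. Where can we get a large amount of unlabeled data? The simplest way may be to download a collection of random images from the Internet, train a sparse autoencoder on them, and extract useful features. In this example the unlabeled data comes from a different distribution than the labeled data (some images in the unlabeled set may contain cars or motorcycles, but not all of them do). This setting is called self-taught learning.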

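In contrast, suppose we have a large amount of unlabeled image data in which every image is either a car or a motorcycle, and only the class labels are missing (no image is marked as car or motorcycle). We can also use this unlabeled data to learn features. This setting, which requires unlabeled and labeled examples to follow the same distribution, is sometimes called semi-supervised learning. In practice it is often impossible to find unlabeled data that meets this requirement (where would one find an image database in which every image is a car or a motorcycle and only the labels are missing?). Self-taught learning is therefore more widely applicable for learning features from unlabeled datasets.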

Next, we use self-taught learning, combining a sparse autoencoder with softmax regression, to build a handwritten-digit classifier.

Algorithm steps:

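1) Treat the MNIST digits 0-4 as the labeled set and the digits 5-9 as the unlabeled set, then split the labeled data into training data and test data, half for training and half for testing.

2) Train a sparse autoencoder on the unlabeled data (5-9) to obtain all parameters W⁽¹⁾, W⁽²⁾, b⁽¹⁾, b⁽²⁾, written together as θ, and visualize the first-layer weights W⁽¹⁾.

3) Feed the labeled data (0-4) forward through the trained sparse autoencoder, using its W⁽¹⁾ and b⁽¹⁾, to obtain the hidden-layer output a⁽²⁾. The digits 0-4 are then classified using the learned features instead of the raw pixel values.

4) Replace each original input x⁽ⁱ⁾ with its learned features a⁽²⁾⁽ⁱ⁾. Indexing the feature vectors by example, the sample is now {(a⁽¹⁾,y⁽¹⁾), (a⁽²⁾,y⁽²⁾), ..., (a⁽ᵐ⁾,y⁽ᵐ⁾)}, and we use it to train the softmax classifier.

5) Use the trained softmax classifier to predict on the test portion of the labeled data. The accuracy should come out above 98%.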
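Step 1 can be sketched in Python as follows. This is an illustration with a made-up label list, not the exercise code. The indices are 0-based here, unlike MATLAB's 1-based indices.

```python
def split_self_taught(labels):
    """Split example indices into unlabeled (digits 5-9), train, and test (digits 0-4)."""
    labeled = [i for i, y in enumerate(labels) if 0 <= y <= 4]
    unlabeled = [i for i, y in enumerate(labels) if y >= 5]
    # MATLAB's round() rounds halves away from zero, while Python's round()
    # rounds halves to even, so the midpoint is computed with integer arithmetic.
    num_train = (len(labeled) + 1) // 2
    return unlabeled, labeled[:num_train], labeled[num_train:]

# Hypothetical label list standing in for the MNIST labels.
labels = [5, 0, 4, 1, 9, 2, 1, 3]
unlabeled_idx, train_idx, test_idx = split_self_taught(labels)

# Shift labels 0-4 to the range 1-5, as the MATLAB code does for softmax.
train_labels = [labels[i] + 1 for i in train_idx]
test_labels = [labels[i] + 1 for i in test_idx]

print(unlabeled_idx)              # [0, 4]
print(train_idx, train_labels)    # [1, 2, 3] [1, 5, 2]
print(test_idx, test_labels)      # [5, 6, 7] [3, 2, 4]
```

The labels of the 5-9 examples are never used: those images serve only as inputs for training the autoencoder.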

The MATLAB code follows. Parts of it call functions written in earlier chapters:

%% CS294A/CS294W Self-taught Learning Exercise
 
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  self-taught learning. You will need to complete code in feedForwardAutoencoder.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises.
%
%% ======================================================================
%  STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.
 
inputSize  = 28 * 28;
numLabels  = 5;
hiddenSize = 200;
sparsityParam = 0.1; % desired average activation of the hidden units.
                     % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                     %  in the lecture notes).
lambda = 3e-3;       % weight decay parameter      
beta = 3;            % weight of sparsity penalty term  
maxIter = 400;
 
%% ======================================================================
%  STEP 1: Load data from the MNIST database
%
%  This loads our training and test data from the MNIST database files.
%  We have sorted the data for you in this so that you will not have to
%  change it.
 
% Load MNIST database files
mnistData   = loadMNISTImages('mnist/train-images-idx3-ubyte');
mnistLabels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');
 
% Simulate a labeled set (digits 0-4) and an unlabeled set (digits 5-9)
labeledSet   = find(mnistLabels >= 0 & mnistLabels <= 4);
unlabeledSet = find(mnistLabels >= 5);
 
% Split the labeled set in half: first half for training, second half for testing
numTrain = round(numel(labeledSet)/2);
trainSet = labeledSet(1:numTrain);
testSet  = labeledSet(numTrain+1:end);
 
unlabeledData = mnistData(:, unlabeledSet);
 
trainData   = mnistData(:, trainSet);
trainLabels = mnistLabels(trainSet)' + 1; % Shift Labels 0-4 to the Range 1-5
 
testData   = mnistData(:, testSet);
testLabels = mnistLabels(testSet)' + 1;   % Shift Labels 0-4 to the Range 1-5
 
% Output Some Statistics
fprintf('# examples in unlabeled set: %d\n', size(unlabeledData, 2));
fprintf('# examples in supervised training set: %d\n\n', size(trainData, 2));
fprintf('# examples in supervised testing set: %d\n\n', size(testData, 2));
 
%% ======================================================================
%  STEP 2: Train the sparse autoencoder
%  This trains the sparse autoencoder on the unlabeled training
%  images.
 
%  Randomly initialize the parameters
theta = initializeParameters(hiddenSize, inputSize);
 
%% ----------------- YOUR CODE HERE ----------------------
%  Find opttheta by running the sparse autoencoder on
%  unlabeledTrainingImages
%  theta is a single unrolled vector holding [W1, W2, b1, b2]
opttheta = theta;
 
addpath minFunc/
options.Method = 'lbfgs';   % L-BFGS optimizer from the minFunc package
options.maxIter = maxIter;  % 400, set in STEP 0
options.display = 'on';
% sparseAutoencoderCost.m is the cost function from the earlier sparse
% autoencoder exercise (see the instructions at the top of this file).
[opttheta, loss] = minFunc( @(p) sparseAutoencoderCost(p, ...
      inputSize, hiddenSize, ...
      lambda, sparsityParam, ...
      beta, unlabeledData), ...
      theta, options);
 
 
%% -----------------------------------------------------
                           
% Visualize weights: W1' is a (28*28) x 200 matrix.
% Each of its columns is displayed as one 28x28 image.
W1 = reshape(opttheta(1:hiddenSize * inputSize), hiddenSize, inputSize);
display_network(W1');
 
%%======================================================================
%% STEP 3: Extract Features from the Supervised Dataset
% 
%  You need to complete the code in feedForwardAutoencoder.m so that the
%  following command will extract features from the data.
 
trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       trainData);
 
testFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       testData);
 
%%======================================================================
%% STEP 4: Train the softmax classifier
 
softmaxModel = struct; 
%  Use softmaxTrain.m from the previous exercise to train a multi-class
%  classifier.
 
%  Use lambda = 1e-4 for the weight regularization for softmax
 
% You need to compute softmaxModel using softmaxTrain on trainFeatures and
% trainLabels
 
lambda = 1e-4;
inputSize = hiddenSize;
numClasses = numel(unique(trainLabels)); % unique returns the distinct elements of a vector, sorted
 
options.maxIter = 100;
% Note: the inputs here are the hidden activations a^(2), not the raw x^(i).
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            trainFeatures, trainLabels, options);
 
%% -----------------------------------------------------
 
 
%%======================================================================
%% STEP 5: Testing
 
% Compute Predictions on the test set (testFeatures) using softmaxPredict
% and softmaxModel
 
[pred] = softmaxPredict(softmaxModel, testFeatures);
%% -----------------------------------------------------
 
% Classification Score
fprintf('Test Accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));
 
% (note that we shift the labels by 1, so that digit 0 now corresponds to
%  label 1)
%
% Accuracy is the proportion of correctly classified images
% The results for our implementation was:
%
% Accuracy: 98.3%
%
%
 
 
 
%%%%%%%%%%%%% The following corresponds to STEP 3 (feedForwardAutoencoder.m) %%%%%%%%%%%%%%
function [activation] = feedForwardAutoencoder(theta, hiddenSize, visibleSize, data)
 
% theta: trained weights from the autoencoder
% visibleSize: the number of input units (28*28 = 784 in this exercise)
% hiddenSize: the number of hidden units (200 in this exercise)
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.
   
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.
 
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
 
%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the activation of the hidden layer for the Sparse Autoencoder.
% Compute the hidden-layer output a^(2)
activation  = sigmoid(W1*data+repmat(b1,[1,size(data,2)]));
 
%-------------------------------------------------------------------
 
end
 
%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients.  This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).
 
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
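The index arithmetic used to slice `theta` in feedForwardAutoencoder can be checked with a small Python sketch. It assumes the unrolled layout [W1(:); W2(:); b1; b2] with MATLAB's column-major ordering. The code above confirms the positions of W1 and b1. The positions of W2 and b2 are an assumption based on that ordering, and `unpack_theta` is a hypothetical helper.

```python
def unpack_theta(theta, hidden_size, visible_size):
    """Split an unrolled parameter vector into W1, W2, b1, b2 (0-based slices)."""
    hv = hidden_size * visible_size

    def reshape_col_major(flat, rows, cols):
        # MATLAB's reshape fills column by column: M(i, j) = flat(j * rows + i).
        return [[flat[j * rows + i] for j in range(cols)] for i in range(rows)]

    W1 = reshape_col_major(theta[0:hv], hidden_size, visible_size)
    W2 = reshape_col_major(theta[hv:2 * hv], visible_size, hidden_size)
    b1 = theta[2 * hv:2 * hv + hidden_size]
    b2 = theta[2 * hv + hidden_size:2 * hv + hidden_size + visible_size]
    return W1, W2, b1, b2

# Tiny example: 2 hidden units, 3 visible units, theta = 0, 1, ..., 16.
theta = list(range(2 * 2 * 3 + 2 + 3))
W1, W2, b1, b2 = unpack_theta(theta, hidden_size=2, visible_size=3)
print(W1)  # [[0, 2, 4], [1, 3, 5]]
print(b1)  # [12, 13]

# Length of theta for the sizes used in the exercise (784 inputs, 200 hidden units).
exercise_length = 2 * 200 * 784 + 200 + 784
print(exercise_length)  # 314584
```

MATLAB indices are 1-based and inclusive, so `theta(2*hiddenSize*visibleSize+1 : 2*hiddenSize*visibleSize+hiddenSize)` in the MATLAB code selects the same elements as the Python slice used for `b1`.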
