
CS229 6.5 Neurons Networks Implements of Sparse Autoencoder

This is a hands-on exercise for the sparse autoencoder. The task is roughly as follows: from a set of natural images, randomly crop 10,000 small 8*8 patches, then train a sparse autoencoder and inspect the features learned by its hidden layer. The network has 3 layers: an input layer with 64 units, a hidden layer with 25 units, and (of course) an output layer with 64 units.
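For reference, the objective that sparseAutoencoderCost (implemented below) computes combines a reconstruction term, a weight-decay term, and a KL-divergence sparsity penalty; this summary of the UFLDL formulation is my own addition, not part of the original post:

\[
J_{\mathrm{sparse}}(W,b) \;=\; \frac{1}{m}\sum_{i=1}^{m}\tfrac{1}{2}\,\bigl\lVert h_{W,b}(x^{(i)}) - x^{(i)} \bigr\rVert^2
\;+\; \frac{\lambda}{2}\Bigl(\textstyle\sum_{i,j}\bigl(W^{(1)}_{ij}\bigr)^2 + \sum_{i,j}\bigl(W^{(2)}_{ij}\bigr)^2\Bigr)
\;+\; \beta \sum_{j} \mathrm{KL}\bigl(\rho \,\Vert\, \hat\rho_j\bigr)
\]
\[
\mathrm{KL}(\rho \,\Vert\, \hat\rho_j) \;=\; \rho \log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j},
\qquad
\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a^{(2)}_j\bigl(x^{(i)}\bigr)
\]

where rho = sparsityParam, lambda and beta are the weights set in STEP 0, and the sum over j runs over the 25 hidden units.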

The main script is laid out step by step (STEP 0 through STEP 5); the implementation of every function it calls is listed below.

%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
% allow your sparse autoencoder to get good filters; you do not need to
% change the parameters below.

visibleSize = 8*8;     % number of input units
hiddenSize = 25;       % number of hidden units
sparsityParam = 0.01;  % desired average activation of the hidden units.
                       % (This was denoted by the Greek alphabet rho,
                       % which looks like a lower-case "p",
                       % in the lecture notes).
lambda = 0.0001;       % weight decay parameter
beta = 3;              % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Implement sampleIMAGES
%
% After implementing sampleIMAGES, the display_network command should
% display a random sample of 200 patches from the dataset
patches = sampleIMAGES;
display_network(patches(:,randi(size(patches,2),200,1)),8);

% Obtain random parameters theta
theta = initializeParameters(hiddenSize, visibleSize);

%%======================================================================
%% STEP 2: Implement sparseAutoencoderCost
%
% You can implement all of the components (squared error cost, weight decay term,
% sparsity penalty) in the cost function at once, but it may be easier to do
% it step-by-step and run gradient checking (see STEP 3) after each step. We
% suggest implementing the sparseAutoencoderCost function using the following steps:
%
% (a) Implement forward propagation in your neural network, and implement the
%     squared error term of the cost function. Implement backpropagation to
%     compute the derivatives. Then (using lambda=beta=0), run Gradient Checking
%     to verify that the calculations corresponding to the squared error cost
%     term are correct.
%
% (b) Add in the weight decay term (in both the cost function and the derivative
%     calculations), then re-run Gradient Checking to verify correctness.
%
% (c) Add in the sparsity penalty term, then re-run Gradient Checking to
%     verify correctness.
%
% Feel free to change the training settings when debugging your
% code. (For example, reducing the training set size or
% number of hidden units may make your code run faster; and setting beta
% and/or lambda to zero may be helpful for debugging.) However, in your
% final submission of the visualized weights, please use parameters we
% gave in Step 0 above.

[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                     lambda, sparsityParam, beta, patches);

%%======================================================================
%% STEP 3: Gradient Checking
%
% Hint: If you are debugging your code, performing gradient checking on smaller models
% and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
% units) may speed things up.

% First, lets make sure your numerical gradient computation is correct for a
% simple function. After you have implemented computeNumericalGradient.m,
% run the following:
checkNumericalGradient();

% Now we can use it to check your cost function and derivative calculations
% for the sparse autoencoder.
numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
                                    hiddenSize, lambda, sparsityParam, beta, patches), theta);

% Use this to visually compare the gradients side by side
disp([numgrad grad]);

% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff); % Should be small. In our implementation, these values are
            % usually less than 1e-9.
% When you got this working, Congratulations!!!

%%======================================================================
%% STEP 4: After verifying that your implementation of
% sparseAutoencoderCost is correct, You can start training your sparse
% autoencoder with minFunc (L-BFGS).

% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, visibleSize, hiddenSize, ...
                                 lambda, sparsityParam, beta, patches), theta, options);

%%======================================================================
%% STEP 5: Visualization

W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);

print -djpeg weights.jpg % save the visualization to a file

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% helpers for STEP 1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% three functions: (sampleIMAGES) (normalizeData) (initializeParameters)
function patches = sampleIMAGES()
load IMAGES;    % load the 10 original 512*512 images

patchsize = 8;  % patch size
numpatches = 10000;

% Initialize the output to zeros; the matrix is 64*10000, one patch per column.
patches = zeros(patchsize*patchsize, numpatches);

% IMAGES is a 3-D array containing the 10 images; IMAGES(:,:,6) is the 512x512
% 2-D array holding the 6th image.
% The command "imagesc(IMAGES(:,:,6)), colormap gray;" visualizes the 6th image.
% (These images appear to have been preprocessed with whitening.)
% IMAGES(21:30,21:30,1) is the (21,21)-to-(30,30) patch sampled from the 1st image.

% Randomly pick 1000 patches from each image, 10000 patches in total
for imageNum = 1:10
    [rowNum colNum] = size(IMAGES(:,:,imageNum));
    % pick 1000 patches from this image
    for patchNum = 1:1000
        % pick the top-left corner of the patch
        xPos = randi([1, rowNum-patchsize+1]);
        yPos = randi([1, colNum-patchsize+1]);
        % store the patch as one column of the output matrix
        patches(:,(imageNum-1)*1000+patchNum) = ...
            reshape(IMAGES(xPos:xPos+7,yPos:yPos+7,imageNum),64,1);
    end
end
% Since the autoencoder's activation function is the sigmoid, its outputs are
% restricted to [0,1]. To make h_{W,b}(x) = x achievable with x as input, the
% inputs must also lie in [0,1], so the patches need to be normalized.
patches = normalizeData(patches);
end

% Normalization helper: truncate to 3 standard deviations (the 3-sigma rule), then rescale
function patches = normalizeData(patches)
% Subtract the mean
patches = bsxfun(@minus, patches, mean(patches));
% s = std(X): for a vector X this returns the standard deviation (note the
% denominator is n-1, not n), i.e. the square root of an unbiased estimate of
% the variance of independent, identically distributed samples.
% For a matrix X it returns a row vector with the standard deviation of each column.
pstd = 3 * std(patches(:));
patches = max(min(patches, pstd), -pstd) / pstd;
% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;
end

% Parameter initialization
function theta = initializeParameters(hiddenSize, visibleSize)
% Initialize parameters randomly based on layer sizes.
% we'll choose weights uniformly from the interval [-r, r]
r = sqrt(6) / sqrt(hiddenSize+visibleSize+1);
% rand(a,b) produces an a-by-b matrix of uniform random numbers in [0.0, 1.0]
W1 = rand(hiddenSize, visibleSize) * 2 * r - r;
% rand(a,b)*2*r lies in (0, 2r), so rand(a,b)*2*r - r lies in (-r, r)
W2 = rand(visibleSize, hiddenSize) * 2 * r - r;
b1 = zeros(hiddenSize, 1);  % bias terms feeding into the hidden units
b2 = zeros(visibleSize, 1); % bias terms feeding into the output layer
% Unroll all parameters into a single vector
theta = [W1(:) ; W2(:) ; b1(:) ; b2(:)];
end
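As a quick sanity check (my own addition, not part of the original exercise code), after normalizeData every entry of patches should lie in [0.1, 0.9] and the matrix should be 64x10000; a few lines along these lines catch a broken sampling step early:

% Hypothetical sanity check -- not part of the original assignment code.
patches = sampleIMAGES();
assert(isequal(size(patches), [64 10000]), 'patches should be 64x10000');
assert(min(patches(:)) >= 0.1 - 1e-12 && max(patches(:)) <= 0.9 + 1e-12, ...
       'normalizeData should map values into [0.1, 0.9]');
fprintf('patch value range: [%.4f, %.4f]\n', min(patches(:)), max(patches(:)));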

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% helper for STEP 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% returns the value and gradient of the sparse autoencoder cost %%%%%%%%%
function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                             lambda, sparsityParam, beta, data)
% visibleSize:   number of input units
% hiddenSize:    number of hidden units
% lambda:        weight decay parameter
% sparsityParam: the desired average activation rho of the hidden units
% beta:          weight of the sparsity penalty term
% data:          64x10000 matrix of training data; data(:,i) is the i-th training example.

% The parameters arrive packed into a single vector because L-BFGS optimizes
% over vectors; first convert the long vector back into the per-layer weight
% matrices and bias vectors.
% Elements 1 -> hiddenSize*visibleSize of theta hold W1
% (hiddenSize*visibleSize entries), reshaped back into a matrix.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);

% The remaining parameters follow in the same order
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Gradient matrices matching each parameter
cost = 0;
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

Jcost = 0;   % reconstruction (squared error) term
Jweight = 0; % weight decay term
Jsparse = 0; % sparsity penalty term
[n m] = size(data); % m is the number of examples, n the number of features

% Forward pass: compute the linear combination and activation of every unit.
% W1 is hiddenSize*visibleSize, data is visibleSize*m,
% repmat(b1,1,m) replicates the vector b1 into a hiddenSize*m matrix.
% Using z^(l+1) = W^(l)*a^(l) + b^(l):
% z2 holds the hidden-layer inputs for all 10000 examples; it is
% hiddenSize*m, one column per example.
z2 = W1*data + repmat(b1,1,m); % input to the hidden layer
a2 = sigmoid(z2);              % hidden-layer activations
z3 = W2*a2 + repmat(b2,1,m);   % input to the output layer
a3 = sigmoid(z3);              % output-layer activations

% Reconstruction error
% This is J(W,b): the outer sum runs over all examples, the inner sum over
% all output components.
Jcost = (0.5/m)*sum(sum((a3-data).^2));
% Weight decay term (the weight lambda is applied later)
Jweight = (1/2)*(sum(sum(W1.^2))+sum(sum(W2.^2)));
% Sparsity penalty: sum(matrix,2) sums along rows, i.e. accumulates the
% hidden activations over all examples before averaging.
% rho is a hiddenSize*1 vector.

rho = (1/m).*sum(a2,2); % average activation rho_hat of each hidden unit
% Sparsity loss
Jsparse = sum(sparsityParam.*log(sparsityParam./rho)+(1-sparsityParam).*log((1-sparsityParam)./(1-rho)));
% Total cost: reconstruction + weight decay + sparsity
cost = Jcost + lambda*Jweight + beta*Jsparse;
% Error term delta3 of the output layer (l = 3); in an autoencoder the target
% output equals the input, h(W,b) = x.
delta3 = -(data-a3).*sigmoidInv(z3);
% Because of the sparsity penalty, the hidden-layer error gets an extra term;
% sterm is a hiddenSize*1 vector.
sterm = beta*(-sparsityParam./rho+(1-sparsityParam)./(1-rho));
% W2 is 64*25, delta3 is 64*10000 (one column per example x^(i)), W2' is the transpose.
% repmat(sterm,1,m) replicates sterm into an m-column matrix, every column equal to sterm.
% delta2 is hiddenSize*10000.
delta2 = (W2'*delta3+repmat(sterm,1,m)).*sigmoidInv(z2);

% W1 gradient
% data' is 10000*64, so delta2*data' is 25*64
W1grad = W1grad+delta2*data';
W1grad = (1/m)*W1grad+lambda*W1;

% W2 gradient
W2grad = W2grad+delta3*a2';
W2grad = (1/m).*W2grad+lambda*W2;

% b1 gradient
b1grad = b1grad+sum(delta2,2);
b1grad = (1/m)*b1grad; % the gradient of b is a vector, so accumulate across examples (columns)

% b2 gradient
b2grad = b2grad+sum(delta3,2);
b2grad = (1/m)*b2grad;
% Roll the gradients back into a single vector
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];
end
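The delta3/delta2 lines above implement the backpropagation equations with the extra sparsity term; written out for reference (my own transcription of the UFLDL notes, not part of the original post):

\[
\delta^{(3)} = -\bigl(x - a^{(3)}\bigr) \odot f'\bigl(z^{(3)}\bigr),
\qquad
\delta^{(2)} = \Bigl( \bigl(W^{(2)}\bigr)^{\mathsf T}\,\delta^{(3)}
 + \beta\Bigl(-\tfrac{\rho}{\hat\rho} + \tfrac{1-\rho}{1-\hat\rho}\Bigr) \Bigr) \odot f'\bigl(z^{(2)}\bigr)
\]
\[
\nabla_{W^{(l)}} J = \frac{1}{m}\,\delta^{(l+1)}\,\bigl(a^{(l)}\bigr)^{\mathsf T} + \lambda W^{(l)},
\qquad
\nabla_{b^{(l)}} J = \frac{1}{m}\sum_{i=1}^{m} \delta^{(l+1),(i)}
\]

Here f is the sigmoid, f'(z) = f(z)(1 - f(z)) (sigmoidInv below), a^(1) = data, and the sparsity term is broadcast across all m columns via repmat.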

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

% Derivative of the sigmoid function
function sigmInv = sigmoidInv(x)
    sigmInv = sigmoid(x).*(1-sigmoid(x));
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% helpers for STEP 3 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
% three functions: (checkNumericalGradient) (simpleQuadraticFunction) (computeNumericalGradient)
function [] = checkNumericalGradient()
x = [4; 10];
% The value and analytical gradient of the simple test function
[value, grad] = simpleQuadraticFunction(x);
% Numerically evaluate the gradient of the simple function at the point x.
% ("@simpleQuadraticFunction" denotes a pointer to a function.)
numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);
% disp() simply prints its argument
disp([numgrad grad]);
fprintf('The above two columns you get should be very similar.\n(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n');
% norm(X) equals sqrt(sum(X.^2)); if the implementation is correct and
% EPSILON = 0.0001, the difference should be about 2.1452e-12.
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff);
fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n');
end

% A simple function used to verify that computeNumericalGradient is implemented correctly
function [value,grad] = simpleQuadraticFunction(x)
% this function accepts a 2D vector as input.
% Its outputs are:
%   value: h(x1, x2) = x1^2 + 3*x1*x2
%   grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2
% Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming
% that computeNumericalGradients will use only the first returned value of this function.
value = x(1)^2 + 3*x(1)*x(2);
grad = zeros(2, 1);
grad(1) = 2*x(1) + 3*x(2);
grad(2) = 3*x(1);
end

% Numerical gradient used for gradient checking
function numgrad = computeNumericalGradient(J, theta)
% theta: the parameters, either a vector or a scalar
% J: a function returning a real value; calling y = J(theta) returns the function value at theta

% numgrad is initialized to zeros, with the same dimensions as theta
numgrad = zeros(size(theta));
EPSILON = 1e-4;
% theta is a column vector; size(theta,1) gives the number of rows
n = size(theta,1);
% n-by-n identity matrix
E = eye(n);
for i = 1:n
    % E(i,:) is the i-th row, E(:,i) the i-th column;
    % since E is the identity, only the i-th component of delta equals EPSILON
    delta = E(:,i)*EPSILON;
    % i-th component of the numerical gradient
    numgrad(i) = (J(theta+delta)-J(theta-delta))/(EPSILON*2.0);
end
%% ---------------------------------------------------------------
end
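For reference, the loop in computeNumericalGradient approximates each gradient component with a two-sided (central) difference:

\[
\frac{\partial J(\theta)}{\partial \theta_i} \;\approx\; \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon},
\qquad \epsilon = 10^{-4},
\]

where e_i is the i-th standard basis vector (the i-th column of the identity matrix E).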

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% helper for STEP 5 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% visualization function %%%%%%%%%%%%%%%%%%%%%%%%
function [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
% This function visualizes filters in matrix A. Each column of A is a
% filter. We will reshape each column into a square image and visualize it
% on a cell of the visualization panel.
% All other parameters are optional; usually you do not need to worry
% about them.
% opt_normalize: whether we need to normalize the filter so that all of
%   them can have similar contrast. Default value is true.
% opt_graycolor: whether we use gray as the heat map. Default is true.
% cols: how many columns are there in the display. Default value is the
%   squareroot of the number of columns in A.
% opt_colmajor: you can switch convention to row major for A. In that
%   case, each row of A is a filter. Default value is false.
warning off all

if ~exist('opt_normalize', 'var') || isempty(opt_normalize)
    opt_normalize = true;
end

if ~exist('opt_graycolor', 'var') || isempty(opt_graycolor)
    opt_graycolor = true;
end

if ~exist('opt_colmajor', 'var') || isempty(opt_colmajor)
    opt_colmajor = false;
end

% rescale
A = A - mean(A(:));

if opt_graycolor, colormap(gray); end

% compute rows, cols
[L M] = size(A);
sz = sqrt(L);
buf = 1;
if ~exist('cols', 'var')
    if floor(sqrt(M))^2 ~= M
        n = ceil(sqrt(M));
        while mod(M, n)~=0 && n<1.2*sqrt(M), n = n+1; end
        m = ceil(M/n);
    else
        n = sqrt(M);
        m = n;
    end
else
    n = cols;
    m = ceil(M/n);
end

array = -ones(buf+m*(sz+buf), buf+n*(sz+buf));

if ~opt_graycolor
    array = 0.1.*array;
end

if ~opt_colmajor
    k = 1;
    for i = 1:m
        for j = 1:n
            if k>M,
                continue;
            end
            clim = max(abs(A(:,k)));
            if opt_normalize
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz)/clim;
            else
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz)/max(abs(A(:)));
            end
            k = k+1;
        end
    end
else
    k = 1;
    for j = 1:n
        for i = 1:m
            if k>M,
                continue;
            end
            clim = max(abs(A(:,k)));
            if opt_normalize
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz)/clim;
            else
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz);
            end
            k = k+1;
        end
    end
end

if opt_graycolor
    h = imagesc(array, 'EraseMode', 'none', [-1 1]);
else
    h = imagesc(array, 'EraseMode', 'none', [-1 1]);
end
axis image off

drawnow;

warning on all