
Neural Network Handwritten Digit Recognition Example in MATLAB

Model representation

(Figure: the three-layer neural network model.)
As an example model, the neural network has three layers: an input layer, a hidden layer and an output layer.
For handwritten digit recognition the input is the pixel values of a 20*20 image, so the input layer has 400 units, plus the bias unit fixed at 1.
The second layer has 25 units; the output layer has 10 units, one per digit class, with the digit 0 represented by label 10.
The training set is denoted by X and y as usual.
The parameters in between are Theta1 and Theta2.
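
With these sizes, the shapes of the weight matrices follow directly; the comment block below just spells them out (each matrix gets an extra column for the bias unit):

% Dimensions implied by the architecture (bias columns included):
%   Theta1 : 25 x 401   maps layer 1 (400 units + bias) to layer 2 (25 units)
%   Theta2 : 10 x 26    maps layer 2 (25 units + bias)  to layer 3 (10 units)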

Load data and set parameters

The data file is in .mat format and stores X and y directly.

fileName = 'xxx.mat';
load(fileName, 'X', 'y');

m = size(X, 1);
input_layer_size  = 400;  % 20x20 input images of digits
hidden_layer_size = 25;   % 25 hidden units
num_labels = 10;          % 10 labels, from 1 to 10
                          % (note that "0" has been mapped to label 10)

Visualize data

Randomly select 100 data points and visualize them.
randperm(n) returns a random permutation of the integers 1 to n, i.e. a randomly shuffled index sequence of length n.

sel = randperm(m);
sel = sel(1:100);

displayData(X(sel, :));

The displayData function

function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
%   [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
%   stored in X in a nice grid. It returns the figure handle h and the 
%   displayed array if requested.
% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width)
    example_width = round(sqrt(size(X, 2)));
end

% Gray image
colormap(gray);

% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);

% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);

% Padding between images
pad = 1;

% Set up a blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
                       pad + display_cols * (example_width + pad));

% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m,
            break;
        end
        % Copy the patch, normalized by its max absolute value
        max_val = max(abs(X(curr_ex, :)));
        display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
                      pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
            reshape(X(curr_ex, :), example_height, example_width) / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m,
        break;
    end
end

% Display the image
h = imagesc(display_array, [-1 1]);

% Do not show the axis
axis image off

drawnow;

end

Train parameters

Randomly initialize Theta

The random initial values of $\Theta^{(l)}$ are usually drawn from the interval $[-\epsilon_{init}, \epsilon_{init}]$. Numbers in this range keep the parameters small and make learning more efficient. One effective strategy is to set $\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$, where $L_{in} = s_l$ and $L_{out} = s_{l+1}$ are the numbers of units in the two layers adjacent to $\Theta^{(l)}$.
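
Plugging in this network's layer sizes gives concrete values, as a quick check:

sqrt(6) / sqrt(400 + 25)   % epsilon_init for Theta1, about 0.12
sqrt(6) / sqrt(25 + 10)    % epsilon_init for Theta2, about 0.41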

% Randomly initialize theta
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
% Unroll the parameters into one vector, convenient for passing as an argument
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];

The randInitializeWeights function

function W = randInitializeWeights(L_in, L_out)
    % Initialize W as an L_out x (1 + L_in) matrix (the +1 handles the
    % bias column), with entries uniform in [-epsilon_init, epsilon_init]
    epsilon_init = sqrt(6) / sqrt(L_in + L_out);
    W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end

Cost Function

First, feedforward propagation computes $h_\theta(x)$ together with the intermediate values $z^{(l)}$ and $a^{(l)}$; from these $J$ is computed and the regularization term added.
Next, backpropagation computes Theta_grad, as follows.
For each training example $t = 1:m$:

  • For each output unit $k$ in the output layer (layer 3 here), set $\delta^{(3)}_k = a^{(3)}_k - y_k$, where $y_k \in \{0, 1\}$ is 1 if the example belongs to class $k$ and 0 otherwise.
  • For the hidden layer $l = 2$: $\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \,.\!*\, g'(z^{(2)})$.
  • Drop the first entry of $\delta^{(2)}$ (the bias-unit entry; only the layer right after the input layer needs this here), then accumulate $\text{grad} = \text{grad} + \delta^{(l+1)} (a^{(l)})^T$.
  • $\text{grad} = \frac{1}{m}\,\text{grad}$.
  • Add the regularization term.
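
Before training, it is worth comparing the backpropagation gradient against a central-difference estimate. Below is a minimal sketch of such a check, assuming CostFunction (implemented next), X, y and the size variables from the earlier sections are all in scope; spot-checking only a few entries keeps it fast:

lambda = 1;
f = @(p) CostFunction(p, input_layer_size, hidden_layer_size, ...
                      num_labels, X, y, lambda);
[J, grad] = f(initial_nn_params);      % analytic gradient via backpropagation
epsilon = 1e-4;
for i = 1:10                           % spot-check the first few entries only
    e = zeros(size(initial_nn_params));
    e(i) = epsilon;
    numgrad = (f(initial_nn_params + e) - f(initial_nn_params - e)) / (2 * epsilon);
    fprintf('%3d: backprop %12.8f  numerical %12.8f\n', i, grad(i), numgrad);
end

The two columns should agree to several decimal places; a large discrepancy usually points to a bug in the delta computations.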

The CostFunction function

function [J grad] = CostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

    m = size(X, 1);

    Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                     hidden_layer_size, (input_layer_size + 1));

    Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                     num_labels, (hidden_layer_size + 1));

    J = 0;
    Theta1_grad = zeros(size(Theta1));
    Theta2_grad = zeros(size(Theta2));

% Feedforward
    a1 = [ones(m, 1) X];
    z2 = a1 * Theta1';
    a2 = [ones(m, 1) sigmoid( z2 )];
    z3 = a2 * Theta2';
    a3 = sigmoid( z3 );
    hx = a3;

    for i = 1 : m
        y_vec = zeros(1, num_labels);
        y_vec(y(i)) = 1;
        J = J + sum( -y_vec .* log(hx(i, :)) - (1 - y_vec) .* log(1 - hx(i, :)) );
    end
    J = J / m;
    J = J + lambda/(2*m) * (sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));

% backpropagation
    for i = 1 : m
        y_vec = zeros(1, num_labels);
        y_vec(y(i)) = 1;
        delta3 = a3(i, :) - y_vec;                                 % 1 x 10
        delta2 = delta3 * Theta2 .* [0 sigmoidGradient(z2(i, :))]; % 1 x 26, bias entry zeroed
        delta2 = delta2(2 : end);                                  % drop bias entry -> 1 x 25
        Theta2_grad = Theta2_grad + delta3' * a2(i, :);
        Theta1_grad = Theta1_grad + delta2' * a1(i, :);
    end
    Theta2_grad = Theta2_grad / m;
    Theta1_grad = Theta1_grad / m;

% regularization (the bias weights in the first columns are not regularized)
    Theta1(:, 1) = 0;
    Theta1_grad = Theta1_grad + lambda / m * Theta1;
    Theta2(:, 1) = 0;
    Theta2_grad = Theta2_grad + lambda / m * Theta2;

    grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
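
A quick sanity check, assuming the data and initial parameters from the earlier sections are in scope, is to evaluate the cost once before training:

lambda = 1;
[J, grad] = CostFunction(initial_nn_params, input_layer_size, ...
                         hidden_layer_size, num_labels, X, y, lambda);
fprintf('Cost at initial parameters: %f\n', J);

With small random weights every output unit is close to 0.5, so by the cost formula above the unregularized cost should be roughly 10 * log(2), about 6.9.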

The sigmoid function

function g = sigmoid(z)
    g = 1.0 ./ (1.0 + exp(-z));
end

The sigmoidGradient (sigmoid derivative) function

function g = sigmoidGradient(z)
    g = sigmoid(z);
    g = g .* ( 1 - g );
end
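
As a quick check of sigmoidGradient: $g'(z) = g(z)(1 - g(z))$ peaks at $z = 0$ with value 0.25 and falls off symmetrically:

sigmoidGradient(0)          % ans = 0.2500
sigmoidGradient([-1 0 1])   % ans = [0.1966 0.2500 0.1966] (approx.)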

Train

Training calls fmincg, which is not a built-in but ships with the course materials (it is Carl Edward Rasmussen's conjugate-gradient minimizer; see its header below).
Using fminunc or fminsearch instead crashes, probably because the problem is too large.

options = optimset('MaxIter', 50);
lambda = 1;

f = @(p) CostFunction(p, ...
                        input_layer_size, ...
                        hidden_layer_size, ...
                        num_labels, X, y, lambda);

[nn_params, cost] = fmincg(f, initial_nn_params, options);

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
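
With the trained Theta1 and Theta2, predictions come from one more forward pass. The sketch below inlines the same propagation used in CostFunction and reports training-set accuracy:

h1 = sigmoid([ones(m, 1) X] * Theta1');
h2 = sigmoid([ones(m, 1) h1] * Theta2');
[~, pred] = max(h2, [], 2);        % row-wise argmax = predicted label (0 is label 10)
fprintf('Training set accuracy: %f\n', mean(double(pred == y)) * 100);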

The fmincg function

function [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
% Minimize a continuous differentiable multivariate function. Starting point
% is given by "X" (D by 1), and the function named in the string "f", must
% return a function value and a vector of partial derivatives. The Polack-
% Ribiere flavour of conjugate gradients is used to compute search directions,
% and a line search using quadratic and cubic polynomial approximations and the
% Wolfe-Powell stopping criteria is used together with the slope ratio method
% for guessing initial step sizes. Additionally a bunch of checks are made to
% make sure that exploration is taking place and that extrapolation will not
% be unboundedly large. The "length" gives the length of the run: if it is
% positive, it gives the maximum number of line searches, if negative its
% absolute gives the maximum allowed number of function evaluations. You can
% (optionally) give "length" a second component, which will indicate the
% reduction in function value to be expected in the first line-search (defaults
% to 1.0). The function returns when either its length is up, or if no further
% progress can be made (ie, we are at a minimum, or so close that due to
% numerical problems, we cannot get any closer). If the function terminates
% within a few iterations, it could be an indication that the function value
% and derivatives are not consistent (ie, there may be a bug in the
% implementation of your "f" function). The function returns the found
% solution "X", a vector of function values "fX" indicating the progress made
% and "i" the number of iterations (line searches or function evaluations,
% depending on the sign of "length") used.
%
% Usage: [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
%
% See also: checkgrad 
%
% Copyright (C) 2001 and 2002 by Carl Edward Rasmussen. Date 2002-02-13
%
%
% (C) Copyright 1999, 2000 & 2001, Carl Edward Rasmussen
% 
% Permission is granted for anyone to copy, use, or modify these
% programs and accompanying documents for purposes of research or
% education, provided this copyright notice is retained, and note is
% made of any changes that have been made.
% 
% These programs and documents are distributed without any warranty,
% express or implied.  As the programs were written for research
% purposes only, they have not been tested to the degree that would be
% advisable in any important application.  All use of these programs is
% entirely at the user's own risk.
%
% [ml-class] Changes Made:
% 1) Function name and argument specifications
% 2) Output display
%

% Read options
if exist('options', 'var') && ~isempty(options) && isfield(options, 'MaxIter')
    length = options.MaxIter;
else
    length = 100;
end


RHO = 0.01;                            % a bunch of constants for line searches
SIG = 0.5;       % RHO and SIG are the constants in the Wolfe-Powell conditions
INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;                    % extrapolate maximum 3 times the current bracket
MAX = 20;                         % max 20 function evaluations per line search
RATIO = 100;                                      % maximum allowed slope ratio

argstr = ['feval(f, X'];                      % compose string used to call function
for i = 1:(nargin - 3)
  argstr = [argstr, ',P', int2str(i)];
end
argstr = [argstr, ')'];

if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
S=['Iteration '];

i = 0;                                            % zero the run length counter
ls_failed = 0;                             % no previous line search has failed
fX = [];
[f1 df1] = eval(argstr);                      % get function value and gradient
i = i + (length<0);                                            % count epochs?!
s = -df1;                                        % search direction is steepest
d1 = -s'*s;                                                 % this is the slope
z1 = red/(1-d1);                                  % initial step is red/(|s|+1)

while i < abs(length)                                      % while not finished
  i = i + (length>0);                                      % count iterations?!

  X0 = X; f0 = f1; df0 = df1;                   % make a copy of current values
  X = X + z1*s;                                             % begin line search
  [f2 df2] = eval(argstr);
  i = i + (length<0);                                          % count epochs?!
  d2 = df2'*s;
  f3 = f1; d3 = d1; z3 = -z1;             % initialize point 3 equal to point 1
  if length>0, M = MAX; else M = min(MAX, -length-i); end
  success = 0; limit = -1;                     % initialize quantities
  while 1
    while ((f2 > f1+z1*RHO*d1) || (d2 > -SIG*d1)) && (M > 0) 
      limit = z1;                                         % tighten the bracket
      if f2 > f1
        z2 = z3 - (0.5*d3*z3*z3)/(d3*z3+f2-f3);                 % quadratic fit
      else
        A = 6*(f2-f3)/z3+3*(d2+d3);                                 % cubic fit
        B = 3*(f3-f2)-z3*(d3+2*d2);
        z2 = (sqrt(B*B-A*d2*z3*z3)-B)/A;       % numerical error possible - ok!
      end
      if isnan(z2) || isinf(z2)
        z2 = z3/2;                  % if we had a numerical problem then bisect
      end
      z2 = max(min(z2, INT*z3),(1-INT)*z3);  % don't accept too close to limits
      z1 = z1 + z2;                                           % update the step
      X = X + z2*s;
      [f2 df2] = eval(argstr);
      M = M - 1; i = i + (length<0);                           % count epochs?!
      d2 = df2'*s;
      z3 = z3-z2;                    % z3 is now relative to the location of z2
    end
    if f2 > f1+z1*RHO*d1 || d2 > -SIG*d1
      break;                                                % this is a failure
    elseif d2 > SIG*d1
      success = 1; break;                                             % success
    elseif M == 0
      break;                                                          % failure
    end
    A = 6*(f2-f3)/z3+3*(d2+d3);                      % make cubic extrapolation
    B = 3*(f3-f2)-z3*(d3+2*d2);
    z2 = -d2*z3*z3/(B+sqrt(B*B-A*d2*z3*z3));        % num. error possible - ok!
    if ~