Because pooling is a two-dimensional operation, a purely textual explanation may cause more confusion than it resolves, so let's go through an example.
Consider the 4×4 pixel input image, which is expressed by the matrix shown in Figure 6-15.
Figure 6-15. The 4×4 pixel input image
We group the pixels of the input image into 2×2 blocks that do not overlap, and each block is reduced to a single representative value.
Once the input image passes through the pooling layer, it shrinks to a 2×2 pixel image.
Figure 6-16 shows the results of pooling with mean pooling and with max pooling.
Figure 6-16. The results of pooling using two different methods
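To make the block-by-block reduction concrete, here is a minimal sketch in plain Python. The 4×4 matrix is a made-up example and is not the matrix from Figure 6-15; the function names `pool2x2` and `mean` are likewise chosen only for illustration.

```python
def pool2x2(image, reduce_fn):
    """Apply non-overlapping 2x2 pooling to a 2D list with even dimensions."""
    pooled = []
    for r in range(0, len(image), 2):
        pooled_row = []
        for c in range(0, len(image[0]), 2):
            # Collect the four pixels of the current 2x2 block.
            block = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            pooled_row.append(reduce_fn(block))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 input; these are NOT the values from Figure 6-15.
image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 1, 0, 2],
    [3, 5, 4, 6],
]

mean_pooled = pool2x2(image, mean)  # [[4.0, 5.0], [4.5, 3.0]]
max_pooled = pool2x2(image, max)    # [[7, 8], [9, 6]]
print(mean_pooled)
print(max_pooled)
```

Mean pooling averages each block, while max pooling keeps only its largest pixel; in both cases the 4×4 input becomes a 2×2 output.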
In a mathematical sense, the pooling process is actually a type of convolution operation.
It differs from the convolution layer in that the convolution filter is stationary (its values are fixed), and the areas it is applied to do not overlap.
The example provided in the next section will elaborate on this.
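For mean pooling, the relationship to convolution can be sketched directly: a fixed 2×2 filter whose entries are all 0.25, moved two pixels at a time so that its windows never overlap, reproduces 2×2 mean pooling. The function `strided_conv2d` and the input matrix below are illustrative assumptions, not code or data from the book.

```python
def strided_conv2d(image, kernel, stride):
    """Slide a square kernel over a 2D list, moving `stride` pixels per step."""
    k = len(kernel)
    out = []
    for r in range(0, len(image) - k + 1, stride):
        row = []
        for c in range(0, len(image[0]) - k + 1, stride):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 input, chosen only for illustration.
image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 1, 0, 2],
    [3, 5, 4, 6],
]

# Fixed (not learned) averaging filter.
mean_kernel = [[0.25, 0.25],
               [0.25, 0.25]]

# Stride 2 equals the filter size, so the windows do not overlap: mean pooling.
pooled = strided_conv2d(image, mean_kernel, stride=2)

# Stride 1 gives the overlapping windows typical of a convolution layer.
overlapping = strided_conv2d(image, mean_kernel, stride=1)

print(pooled)            # [[4.0, 5.0], [4.5, 3.0]]
print(len(overlapping))  # 3 (a 3x3 output instead of 2x2)
```

Max pooling follows the same windowing scheme but takes the maximum of each window instead of a weighted sum.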
The pooling layer compensates, to some extent, for objects that are off-center or tilted.
For example, the pooling layer can improve the recognition of a cat that is off-center in the input image.
In addition, because the pooling process reduces the image size, it helps relieve the computational load and prevent overfitting.
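The "to some extent" qualification can be illustrated with a toy case. In the sketch below, a single bright pixel is moved around a hypothetical 4×4 image: max pooling gives the same output while the pixel stays inside one 2×2 block, but not once it crosses into the neighboring block. The helper `with_bright_pixel` is invented for this illustration.

```python
def max_pool2x2(image):
    """Non-overlapping 2x2 max pooling on a 2D list with even dimensions."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


def with_bright_pixel(row, col, size=4):
    """Hypothetical helper: an all-zero size x size image with one pixel set to 9."""
    image = [[0] * size for _ in range(size)]
    image[row][col] = 9
    return image


original = max_pool2x2(with_bright_pixel(0, 0))
small_shift = max_pool2x2(with_bright_pixel(1, 1))  # stays inside the same 2x2 block
large_shift = max_pool2x2(with_bright_pixel(0, 2))  # moves into the neighboring block

print(original == small_shift)  # True: the small shift is absorbed
print(original == large_shift)  # False: the larger shift changes the output
```

The output also has 4 values instead of 16, which is the size reduction that lightens the computation in later layers.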
Example: MNIST
We implement a neural network that takes an input image and recognizes the digit that it represents.
The training data is the MNIST (Modified National Institute of Standards and Technology) database, which contains 70,000 images of handwritten digits.
In general, 60,000 images are used for training, and the remaining 10,000 images are used for the validation test.
Each digit image is a 28-by-28 pixel black-and-white image, as shown in Figure 6-17.
Figure 6-17. A 28-by-28 pixel black-and-white image from the MNIST database
To keep the training time manageable, this example uses only 10,000 images, divided into training data and verification data in an 8:2 ratio.
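As a quick check of the arithmetic, an 8:2 split of 10,000 images gives 8,000 for training and 2,000 for verification. The sketch below splits image indices rather than real data. The text does not say whether the images are shuffled before splitting, so the shuffle, the seed, and the function name `split_indices` are assumptions made for illustration.

```python
import random


def split_indices(num_samples, train_fraction, seed=0):
    """Return (training indices, verification indices) for the given fraction."""
    indices = list(range(num_samples))
    # Assumption: shuffle before splitting; the source does not specify this.
    random.Random(seed).shuffle(indices)
    num_train = round(num_samples * train_fraction)
    return indices[:num_train], indices[num_train:]


train_idx, verify_idx = split_indices(10_000, 0.8)
print(len(train_idx), len(verify_idx))  # 8000 2000
```

Every index lands in exactly one of the two sets, so no image is used for both training and verification.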
This article is translated from MATLAB Deep Learning by Phil Kim.