Understanding the Caffe Framework (2): AlexNet in Detail
Introduction
In 2012, Geoffrey Hinton and his student Alex Krizhevsky answered the skeptics by winning the ImageNet competition with AlexNet, shattering the image classification record and cementing deep learning's place in computer vision. Here we analyze this model as a way of learning the structure of Caffe.
The AlexNet Model Structure
The model file is models/bvlc_reference_caffenet/deploy.prototxt under the Caffe root directory (the reference model is named CaffeNet, a slight variant of AlexNet); its contents are listed in Appendix 1. A diagram of the network can be generated with draw_net.py (the commands below use a local copy named deploy-gph.prototxt):
python python/draw_net.py models/bvlc_reference_caffenet/deploy-gph.prototxt examples/AlexNet-gph/pic/alexnet.png --rankdir=TB --phase=ALL
The resulting image is shown in Appendix 2.
Layer-by-Layer Analysis of the Model
1. Data structures of each layer
Enter the following commands in a terminal to prepare the environment:
gph@gph-pc:~ $ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import caffe
>>> import cv2
>>> import cv2.cv as cv
>>> caffe.set_mode_gpu()
>>> caffe_root = '/home/gph/Desktop/caffe-ssd/'
>>> model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy-gph.prototxt'
>>> model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
>>> img_file = caffe_root + 'examples/images/cat.jpg'
>>>
Load the model:
>>> net = caffe.Net(model_def, model_weights, caffe.TEST)
Display all layers together with the dimensions of their data and diff blobs:
Enter:
>>> for layer, blob in net.blobs.iteritems():
...     print layer + ' ' + str(blob.data.shape) + ' ' + str(blob.diff.shape)
...
The output is:
data (10, 3, 227, 227) (10, 3, 227, 227)
conv1 (10, 96, 55, 55) (10, 96, 55, 55)
pool1 (10, 96, 27, 27) (10, 96, 27, 27)
norm1 (10, 96, 27, 27) (10, 96, 27, 27)
conv2 (10, 256, 27, 27) (10, 256, 27, 27)
pool2 (10, 256, 13, 13) (10, 256, 13, 13)
norm2 (10, 256, 13, 13) (10, 256, 13, 13)
conv3 (10, 384, 13, 13) (10, 384, 13, 13)
conv4 (10, 384, 13, 13) (10, 384, 13, 13)
conv5 (10, 256, 13, 13) (10, 256, 13, 13)
pool5 (10, 256, 6, 6) (10, 256, 6, 6)
fc6 (10, 4096) (10, 4096)
fc7 (10, 4096) (10, 4096)
fc8 (10, 1000) (10, 1000)
prob (10, 1000) (10, 1000)
Display the layers that have weights
Enter:
>>> for layer, param in net.params.iteritems():
...     print layer + ' ' + str(param[0].data.shape) + ' ' + str(param[1].data.shape)
...
The output is:
conv1 (96, 3, 11, 11) (96,)
conv2 (256, 48, 5, 5) (256,)
conv3 (384, 256, 3, 3) (384,)
conv4 (384, 192, 3, 3) (384,)
conv5 (256, 192, 3, 3) (256,)
fc6 (4096, 9216) (4096,)
fc7 (4096, 4096) (4096,)
fc8 (1000, 4096) (1000,)
2. Analysis
Two kinds of data flow through Caffe:
One is the data being processed: it enters at the input layer, is processed by each layer in turn, and emerges at the output layer. This data is stored in the data field of each blob in net.blobs; the blob's diff field holds the corresponding gradients. These two are the data we usually care about most.
The other is the parameters each layer needs for its computation, i.e. the weights and the bias terms, stored in net.params[layer][0] (weights) and net.params[layer][1] (biases).
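As a minimal sketch continuing the interactive session above (conv1 is used only as an example layer name), both kinds of data can be inspected directly:
>>> net.blobs['conv1'].data.shape        # activations flowing through the net
(10, 96, 55, 55)
>>> net.blobs['conv1'].diff.shape        # gradients for those activations, same shape
(10, 96, 55, 55)
>>> net.params['conv1'][0].data.shape    # learned weights of conv1
(96, 3, 11, 11)
>>> net.params['conv1'][1].data.shape    # learned biases of conv1
(96,)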
In the AlexNet model, the layers that perform convolution-like processing can change the spatial size of the data. Convolution and pooling layers, for example, both have a kernel_size parameter and may therefore change the size. Whether the size actually changes is determined by the convolution-related parameters kernel_size, pad, and stride, via y = (x + 2*pad - kernel_size)/stride + 1.
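As a quick check of this formula against the blob shapes listed above (a sketch only; Caffe's pooling layer actually rounds this division up rather than down, but for these sizes the result is the same):
def output_size(x, kernel_size, pad=0, stride=1):
    # y = (x + 2*pad - kernel_size) / stride + 1, with integer division
    return (x + 2 * pad - kernel_size) // stride + 1

print(output_size(227, 11, pad=0, stride=4))   # conv1: 227 -> 55
print(output_size(55, 3, pad=0, stride=2))     # pool1: 55 -> 27
print(output_size(27, 5, pad=2, stride=1))     # conv2: 27 -> 27
print(output_size(13, 3, pad=0, stride=2))     # pool5: 13 -> 6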
The dimensions of the weights stored by a convolutional layer are determined by that layer's kernel_size, group, and num_output together with the previous layer's num_output, giving (this layer's num_output, previous layer's num_output / group, kernel_h, kernel_w). The reason lies in how the (grouped) convolution is actually carried out.
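A small sketch of that rule, checked against the weight shapes printed earlier (the helper conv_weight_shape is only for illustration):
def conv_weight_shape(num_output, prev_num_output, kernel_size, group=1):
    # (this layer's num_output, previous layer's num_output / group, kernel_h, kernel_w)
    return (num_output, prev_num_output // group, kernel_size, kernel_size)

print(conv_weight_shape(96, 3, 11))              # conv1: (96, 3, 11, 11)
print(conv_weight_shape(256, 96, 5, group=2))    # conv2: (256, 48, 5, 5)
print(conv_weight_shape(384, 256, 3))            # conv3: (384, 256, 3, 3)
print(conv_weight_shape(384, 384, 3, group=2))   # conv4: (384, 192, 3, 3)
print(conv_weight_shape(256, 384, 3, group=2))   # conv5: (256, 192, 3, 3)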
The weights of a fully connected layer are much simpler: because every input is connected to every output, the parameter shape depends only on the number of neurons in the previous layer and the number of output neurons, i.e. (num_output, number of neurons in the previous layer).
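For example, fc6 takes the flattened pool5 blob as input, so its weight shape follows directly from the shapes above (continuing the same interactive session):
>>> 256 * 6 * 6                          # pool5 flattened: channels * height * width
9216
>>> net.params['fc6'][0].data.shape      # (num_output, input neurons)
(4096, 9216)
>>> net.params['fc8'][0].data.shape      # fc8 takes fc7's 4096 outputs as input
(1000, 4096)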
Appendix 1: contents of deploy.prototxt
name: "CaffeNet"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
inner_product_param {
num_output: 1000
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc8"
top: "prob"
}