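AlexNet: Principles and TensorFlow Implementation
AlexNet sparked the current wave of enthusiasm for deep learning. This post introduces the network and implements it with TensorFlow.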
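1. AlexNet network structure
Figure source: the AlexNet paper
The network has 8 trainable layers: the first 5 are convolutional layers and the last 3 are fully connected layers.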
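First convolutional layer
Input image size: 224×224×3.
The first convolutional layer is 11×11×96, i.e. 96 kernels of size 11×11, with stride 4, followed by ReLU. The paper reports an output feature map of 55×55×96, which matches an unpadded convolution on a 227×227 input: (227 − 11)/4 + 1 = 55. With the SAME padding used in the TensorFlow code later in this post, a 224×224 input instead gives 224/4 = 56, i.e. 56×56×96. An LRN layer follows and leaves the size unchanged.
Max pooling layer with a 3×3 kernel and stride 2, so the feature map becomes 27×27×96. This holds for both cases, since (55 − 3)/2 + 1 = 27 and ⌊(56 − 3)/2⌋ + 1 = 27.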
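Second convolutional layer
Input tensor: 27×27×96.
Kernel size: 5×5×256 with stride 1. With padding, the spatial size does not change. The layer is again followed by ReLU and an LRN layer.
Max pooling layer with a 3×3 kernel and stride 2, so the feature map becomes 13×13×256.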
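Third to fifth convolutional layers
Input tensor: 13×13×256.
The third convolutional layer is 3×3×384 with stride 1, followed by ReLU.
The fourth convolutional layer is 3×3×384 with stride 1, followed by ReLU.
The fifth convolutional layer is 3×3×256 with stride 1, followed by ReLU.
The fifth layer is followed by a max pooling layer with a 3×3 kernel and stride 2, so the feature map becomes 6×6×256.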
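The size arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that assumes TensorFlow's two padding rules (SAME: ceil(size / stride); VALID: floor((size − kernel) / stride) + 1) and the padding settings used in the TensorFlow code in section 3. The names `out_size`, `LAYERS` and `trace_shapes` are made up for this illustration.

```python
import math


def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel) // stride + 1
    raise ValueError("unknown padding: %r" % padding)


# (name, kernel, stride, padding, output channels; None keeps the channel count)
LAYERS = [
    ("conv1", 11, 4, "SAME", 96),
    ("pool1", 3, 2, "VALID", None),
    ("conv2", 5, 1, "SAME", 256),
    ("pool2", 3, 2, "VALID", None),
    ("conv3", 3, 1, "SAME", 384),
    ("conv4", 3, 1, "SAME", 384),
    ("conv5", 3, 1, "SAME", 256),
    ("pool5", 3, 2, "VALID", None),
]


def trace_shapes(size=224, channels=3):
    """Return (name, height, width, channels) for every layer in LAYERS."""
    shapes = []
    for name, kernel, stride, padding, out_channels in LAYERS:
        size = out_size(size, kernel, stride, padding)
        channels = out_channels or channels
        shapes.append((name, size, size, channels))
    return shapes


for shape in trace_shapes():
    print(shape)
```

With a 224×224 input the spatial sizes come out as 56, 27, 27, 13, 13, 13, 13, 6, and `out_size(227, 11, 4, "VALID")` gives the 55 reported in the paper.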
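Sixth to eighth layers: fully connected
The last three layers are fully connected:
1. FC: 4096 + ReLU
2. FC: 4096 + ReLU
3. FC: 1000
The output of the last layer is passed through a softmax, which gives probabilities over the 1000 classes.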
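To get a feel for how large these three layers are, the following sketch counts their weights and biases. It assumes the 6×6×256 pooling output is flattened into a 9216-dimensional vector before the first FC layer; `fc_params` is a name invented for this example.

```python
def fc_params(n_in, n_out):
    """Number of weights plus biases in one fully connected layer."""
    return n_in * n_out + n_out


flat = 6 * 6 * 256  # pool5 output flattened: 9216 values
sizes = [flat, 4096, 4096, 1000]
counts = [fc_params(a, b) for a, b in zip(sizes, sizes[1:])]
print(counts)       # [37752832, 16781312, 4097000]
print(sum(counts))  # 58631144
```

Under these assumptions the fully connected layers alone hold roughly 58.6 million parameters, most of them in the first FC layer.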
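2. Tricks in AlexNet
AlexNet applied CNNs to a deeper and wider network and achieved higher classification accuracy than earlier networks such as LeNet. Several of its tricks are worth knowing.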
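Use of ReLU
AlexNet uses ReLU instead of sigmoid as the activation function. It trains faster and mitigates the vanishing gradient problem that sigmoid causes when training deeper networks.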
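The vanishing gradient effect can be illustrated with a small, simplified calculation. The sketch below multiplies only the activation derivatives along a chain of 8 layers and ignores the weights entirely, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid function at x."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU at x (taken as 0 for x <= 0)."""
    return 1.0 if x > 0 else 0.0


depth = 8
# Best case for sigmoid: every pre-activation is 0, where the derivative peaks at 0.25.
sigmoid_chain = sigmoid_grad(0.0) ** depth
# For ReLU, a positive pre-activation passes the gradient through unchanged.
relu_chain = relu_grad(1.0) ** depth
print(sigmoid_chain, relu_chain)
```

Even in the best case the sigmoid factor shrinks to about 1.5e-05 after 8 layers, while the ReLU factor stays at 1 for active units.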
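Dropout
During training, some neurons are randomly ignored to reduce overfitting. In AlexNet, dropout is applied in the first two fully connected layers.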
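A minimal sketch of the idea in plain Python follows. Note that it uses the "inverted dropout" variant, which rescales the kept activations during training; this is an assumption of the example, whereas the AlexNet paper instead multiplies the outputs by 0.5 at test time. The function name and parameters are illustrative.

```python
import random


def dropout(values, keep_prob=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability 1 - keep_prob.

    Kept values are divided by keep_prob so the expected sum is unchanged.
    At test time (training=False) the values pass through untouched.
    """
    if not training:
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [0.2, 1.5, 0.7, 3.0]
print(dropout(activations, rng=random.Random(0)))
print(dropout(activations, training=False))
```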
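Overlapping max pooling
Earlier CNNs commonly used average pooling. AlexNet uses max pooling throughout, which avoids the blurring effect of average pooling. In addition, the stride (2) is smaller than the pooling kernel size (3), so adjacent pooling windows overlap, which enriches the features.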
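The overlap is easiest to see in one dimension. The sketch below is an illustration only (the helper `max_pool_1d` is made up for this example); it pools the same row once with size 3 and stride 2, as in AlexNet, and once with non-overlapping windows.

```python
def max_pool_1d(values, size, stride):
    """Max over each window of `size` elements, moving `stride` elements at a time."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


row = [1, 5, 2, 0, 3, 7, 4]
print(max_pool_1d(row, size=3, stride=2))  # overlapping windows: [5, 3, 7]
print(max_pool_1d(row, size=2, stride=2))  # non-overlapping windows: [5, 2, 7]
```

With size 3 and stride 2, the elements at indices 2 and 4 each fall into two neighbouring windows.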
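Introduction of the LRN layer
Local response normalization (LRN) creates competition among neighbouring neurons: large responses become relatively larger, while neurons with smaller responses are suppressed.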
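As a sketch of what this normalization computes, the function below applies the formula from the AlexNet paper, b_i = a_i / (k + α · Σ_j a_j²)^β, where the sum runs over the n channels adjacent to channel i at one spatial position. The defaults k = 2, n = 5, α = 1e-4, β = 0.75 are the values given in the paper; the function itself is an illustrative plain-Python version, not the TensorFlow op.

```python
def lrn(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at a single pixel.

    channels: list of activations, one per channel, at the same (x, y) position.
    """
    total = len(channels)
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        lo, hi = max(0, i - half), min(total - 1, i + half)
        scale = (k + alpha * sum(c * c for c in channels[lo:hi + 1])) ** beta
        out.append(a / scale)
    return out


# A strong neighbour reduces the normalized response of a weak channel.
print(lrn([1.0, 0.0, 0.0], alpha=1.0)[0])
print(lrn([1.0, 100.0, 0.0], alpha=1.0)[0])
```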
GPU acceleration
GPUs are used to accelerate the training of the network.
Data augmentation
Data augmentation is used to alleviate overfitting.
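One of the augmentations described in the AlexNet paper is taking random 224×224 crops from 256×256 images together with their horizontal reflections (the paper also perturbs colours with PCA, which is not shown here). A minimal sketch on a plain nested list, with a hypothetical function name, could look like this:

```python
import random


def random_crop_and_flip(image, crop, rng=random):
    """Return a random crop x crop patch of image, mirrored with probability 0.5.

    image: list of rows, each row a list of pixels.
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch


# Toy 8x8 "image" whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(8)] for r in range(8)]
patch = random_crop_and_flip(image, crop=6, rng=random.Random(0))
print(len(patch), len(patch[0]))  # 6 6
```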
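3. Implementing AlexNet in TensorFlow
Training AlexNet is very time-consuming, so here we only define the network structure and benchmark the forward and backward passes. I ran this on a CPU…
First we define an interface that takes images as input and returns the output of the pooling layer after the fifth convolutional layer, together with the parameters of every layer. It is all fairly simple; if anything is unclear, refer to the book "TensorFlow實戰" or feel free to discuss it with me.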
import tensorflow as tf  # written against the TensorFlow 1.x API

def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())
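The function above prints the name and output shape of the current layer. Below is the open-source implementation with some parameters modified by me: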
def inference(images):
    """Build the AlexNet model.

    Args:
      images: Images Tensor

    Returns:
      pool5: the last Tensor in the convolutional component of AlexNet.
      parameters: a list of Tensors corresponding to the weights and biases of the
          AlexNet model.
    """
    parameters = []
    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]

    # lrn1
    # TODO(shlens, jiayq): Add a GPU version of local response normalization.

    # pool1
    pool1 = tf.nn.max_pool(conv1,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool1')
    print_activations(pool1)

    # conv2
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 96, 256], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv2)

    # pool2
    pool2 = tf.nn.max_pool(conv2,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool2')
    print_activations(pool2)

    # conv3
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 384],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv3)

    # conv4
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 384],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv4)

    # conv5
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv5)

    # pool5
    pool5 = tf.nn.max_pool(conv5,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool5')
    print_activations(pool5)

    return pool5, parameters
Benchmark function:
The images are randomly generated data, not real data.
def run_benchmark():
    """Run the benchmark on AlexNet."""
    # FLAGS.batch_size and time_tensorflow_run() are not shown in this post;
    # they are expected to be defined elsewhere in the benchmark script.
    with tf.Graph().as_default():
        # Generate some dummy images.
        image_size = 224
        # Note: conv1 above uses SAME padding with stride 4, so a 224x224
        # input produces 56x56 activations (the paper reports 55x55).
        images = tf.Variable(tf.random_normal([FLAGS.batch_size,
                                               image_size,
                                               image_size, 3],
                                              dtype=tf.float32,
                                              stddev=1e-1))

        # Build a Graph that computes the pool5 activations from the
        # inference model.
        pool5, parameters = inference(images)

        # Build an initialization operation.
        init = tf.global_variables_initializer()

        # Start running operations on the Graph.
        config = tf.ConfigProto()
        config.gpu_options.allocator_type = 'BFC'
        sess = tf.Session(config=config)
        sess.run(init)

        # Run the forward benchmark.
        time_tensorflow_run(sess, pool5, "Forward")

        # Add a simple objective so we can calculate the backward pass.
        objective = tf.nn.l2_loss(pool5)
        # Compute the gradient with respect to all the parameters.
        grad = tf.gradients(objective, parameters)
        # Run the backward benchmark.
        time_tensorflow_run(sess, grad, "Forward-backward")
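The post does not show `time_tensorflow_run`. The following is a sketch of how such a timing helper could be written, under the assumption that it skips a few warm-up iterations, prints every tenth step, and reports the mean and standard deviation per batch in the same format as the log shown in this post. To keep the sketch independent of TensorFlow it times an arbitrary callable; the name `time_run` and its parameters are invented for this example.

```python
import math
import time
from datetime import datetime


def time_run(run_once, info_string, num_batches=100, burn_in=10):
    """Time repeated calls of run_once(), ignoring the first burn_in calls."""
    total = 0.0
    total_sq = 0.0
    for i in range(num_batches + burn_in):
        start = time.time()
        run_once()
        duration = time.time() - start
        if i >= burn_in:
            step = i - burn_in
            if step % 10 == 0:
                print('%s: step %d, duration = %.3f'
                      % (datetime.now(), step, duration))
            total += duration
            total_sq += duration * duration
    mean = total / num_batches
    # Clamp at 0 to guard against tiny negative values from rounding.
    std = math.sqrt(max(total_sq / num_batches - mean * mean, 0.0))
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch'
          % (datetime.now(), info_string, num_batches, mean, std))
    return mean, std


# Example with a dummy workload instead of a TensorFlow session.
time_run(lambda: sum(range(10000)), "Dummy", num_batches=20, burn_in=5)
```

With a TensorFlow 1.x session, the callable could be something like `lambda: sess.run(pool5)` for the forward pass.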
Output:
Below are the output shapes of each layer; the derivation is described in detail above.
conv1 [128, 56, 56, 96]
pool1 [128, 27, 27, 96]
conv2 [128, 27, 27, 256]
pool2 [128, 13, 13, 256]
conv3 [128, 13, 13, 384]
conv4 [128, 13, 13, 384]
conv5 [128, 13, 13, 256]
pool5 [128, 6, 6, 256]
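Below are the timings for the forward pass and for the combined forward-backward pass. The forward-backward pass (about 11.2 s per batch) takes roughly 3 times as long as the forward pass alone (about 3.6 s per batch), so the backward computation by itself costs roughly twice as much as the forward pass.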
2017-05-02 15:40:53.118788: step 0, duration = 3.969
2017-05-02 15:41:30.003927: step 10, duration = 3.550
2017-05-02 15:42:07.242987: step 20, duration = 3.797
2017-05-02 15:42:44.610630: step 30, duration = 3.487
2017-05-02 15:43:20.021931: step 40, duration = 3.535
2017-05-02 15:43:55.832460: step 50, duration = 3.687
2017-05-02 15:44:31.803954: step 60, duration = 3.567
2017-05-02 15:45:08.156715: step 70, duration = 3.803
2017-05-02 15:45:44.739322: step 80, duration = 3.584
2017-05-02 15:46:20.349876: step 90, duration = 3.569
2017-05-02 15:46:53.242329: Forward across 100 steps, 3.641 +/- 0.130 sec / batch
2017-05-02 15:49:01.054495: step 0, duration = 11.493
2017-05-02 15:50:55.424543: step 10, duration = 10.905
2017-05-02 15:52:47.021526: step 20, duration = 11.797
2017-05-02 15:54:42.965286: step 30, duration = 11.559
2017-05-02 15:56:36.329784: step 40, duration = 11.185
2017-05-02 15:58:32.146361: step 50, duration = 11.945
2017-05-02 16:00:21.971351: step 60, duration = 10.887
2017-05-02 16:02:10.775796: step 70, duration = 10.914
2017-05-02 16:04:07.438658: step 80, duration = 11.409
2017-05-02 16:05:56.403530: step 90, duration = 10.915
2017-05-02 16:07:34.297486: Forward-backward across 100 steps, 11.247 +/- 0.448 sec / batch
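The two summary lines can be compared with a little arithmetic. This sketch just plugs in the mean times from the log above and the batch size of 128 that appears in the printed shapes; the variable names are illustrative.

```python
forward = 3.641            # sec / batch, from the "Forward" summary line
forward_backward = 11.247  # sec / batch, from the "Forward-backward" summary line
batch_size = 128           # first dimension of the printed activation shapes

backward_only = forward_backward - forward
print(round(forward_backward / forward, 2))  # 3.09
print(round(backward_only / forward, 2))     # 2.09
print(round(batch_size / forward, 1))        # 35.2 images / sec, forward only
```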