
Classic CNN architectures: ResNet, with a Keras implementation

The author's knowledge is limited; corrections are welcome if anything below is wrong.

Original paper: https://arxiv.org/pdf/1512.03385.pdf


Background


The deep residual network (ResNet) was a milestone in the history of CNNs for image recognition: the network proposed by Kaiming He's team took first place in five tracks of the ILSVRC and COCO 2015 competitions. Why does ResNet perform so well?

Our general intuition is that the deeper (more complex, more parameters) a network is, the stronger its representational power. Following this principle, CNN classification networks grew from AlexNet's 8 layers to VGG's 16 and 19 layers, and later to GoogLeNet's 22 layers. It was then discovered, however, that once a deep CNN reaches a certain depth, simply stacking more layers no longer improves classification performance; instead the network converges more slowly and accuracy gets worse. Even after ruling out problems such as overfitting on a too-small dataset, overly deep networks still show lower classification accuracy than their shallower counterparts.

As the figure above (from the paper) and its caption show, a 56-layer plain network performs considerably worse than a 20-layer one. To address this degradation problem, Kaiming He proposed residual learning.


Residual Learning


The figure above shows a residual learning unit. The input x passes through two stacked weight layers to produce F(x), and the element-wise sum of x and F(x) is taken as the unit's output, so the shortcut adds no extra computational complexity to the network. Instead of learning to approximate the unknown underlying mapping H(x) directly, the unit keeps an identity mapping on the shortcut and only learns the residual F(x) = H(x) - x. The two formulations are equally expressive, but they differ in how hard they are to optimize: the authors hypothesize that optimizing F(x) is much easier than optimizing H(x).
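
Written out in the paper's notation, a building block computes

y = F(x, {W_i}) + x

and when the dimensions of x and F(x) do not match, a linear projection W_s (in practice a 1x1 convolution) is applied on the shortcut:

y = F(x, {W_i}) + W_s x

This second form is what the convolutional/bottleneck block further below implements.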

In the paper, the authors ran a controlled experiment: two networks with exactly the same parameter count and computation, differing only in that shortcut connections were inserted into one of them. The network with shortcuts was easier to optimize and achieved better results.
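
As a minimal sketch of the idea (my own illustration, not the paper's exact architecture; the 64-channel sizes are arbitrary and the input is assumed to already have 64 channels so the shapes match), this is how a shortcut turns a plain stack of two convolutions into a residual unit in Keras:

from keras.layers import Conv2D, Activation, Add

def plain_unit(x):
    # Two stacked 3x3 convolutions with no shortcut: the layers must learn H(x) directly
    y = Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    y = Conv2D(64, (3, 3), padding="same")(y)
    return Activation("relu")(y)

def residual_unit(x):
    # The same two convolutions, but the input is added back,
    # so the layers only need to learn the residual F(x) = H(x) - x
    y = Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    y = Conv2D(64, (3, 3), padding="same")(y)
    y = Add()([y, x])            # element-wise sum with the identity shortcut
    return Activation("relu")(y)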


ResNet in Keras


  • The ResNet paper uses two kinds of residual blocks: a basic two-convolution block (ResNet-18/34) and a three-convolution bottleneck block (ResNet-50/101/152, shown on the right of the paper's figure). The Keras code below builds ResNet50 from identity_block (a bottleneck block whose shortcut is the identity mapping) and convolutional_block (a bottleneck block with a 1×1 convolution on the shortcut to match dimensions); the imports the snippets rely on are listed right after this list.
  • A BatchNormalization layer follows every convolution. It normalizes each channel's activations over the batch (roughly zero mean, unit variance), which stabilizes and speeds up training and also acts as a mild regularizer against overfitting.
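
The snippets below use the Keras functional API. The import list here is my assumption of what they need (with TensorFlow 2, the tensorflow.keras equivalents work the same way); it is not part of the original post:

from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D
from keras.models import Model
from keras.initializers import glorot_uniform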

identity_block (shortcut is the identity; input and output dimensions match):

def identity_block(X, f, filters, stage, block):
    # Defining name basis
    conv_name_base = "res" + str(stage) + block + "_branch"
    bn_name_base = "bn" + str(stage) + block + "_branch"

    # Retrieve filters
    F1, F2, F3 = filters

    # Save the input value; it is added back to the main path later
    X_shortcut = X

    # First component of main path: 1x1 convolution
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding="valid",
               name=conv_name_base + "2a", kernel_initializer=glorot_uniform(seed=0))(X)
    # "valid" means no padding; glorot_uniform is the Xavier initializer
    X = BatchNormalization(axis=3, name=bn_name_base + "2a")(X)
    X = Activation("relu")(X)

    # Second component of main path: f x f convolution
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding="same",
               name=conv_name_base + "2b", kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + "2b")(X)
    X = Activation("relu")(X)

    # Third component of main path: 1x1 convolution, no activation before the addition
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding="valid",
               name=conv_name_base + "2c", kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + "2c")(X)

    # Final step: add the shortcut to the main path, then apply ReLU
    X = Add()([X, X_shortcut])
    X = Activation("relu")(X)

    return X
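
A quick sanity check (my own example, not from the original post): feed the block a dummy tensor and confirm that the output shape equals the input shape, which is what makes the element-wise addition possible:

from keras import backend as K

X_in = Input((56, 56, 256))
X_out = identity_block(X_in, f=3, filters=[64, 64, 256], stage=2, block="test")
print(K.int_shape(X_out))   # (None, 56, 56, 256) -- same spatial size and channel count as the input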

convolutional_block (bottleneck block with a 1×1 convolution on the shortcut; used together with identity_block to build ResNet50 below):

def convolutional_block(X, f, filters, stage, block, s=2):
    # Defining name basis
    conv_name_base = "res" + str(stage) + block + "_branch"
    bn_name_base = "bn" + str(stage) + block + "_branch"

    # Retrieve filters
    F1, F2, F3 = filters

    # Save the input value for the shortcut path
    X_shortcut = X

    ##### MAIN PATH #####
    # First component of main path: 1x1 convolution with stride s (this is where downsampling happens)
    X = Conv2D(F1, (1, 1), strides=(s, s), padding="valid", name=conv_name_base + "2a",
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + "2a")(X)
    X = Activation("relu")(X)

    # Second component of main path: f x f convolution
    X = Conv2D(F2, (f, f), strides=(1, 1), padding="same", name=conv_name_base + "2b",
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + "2b")(X)
    X = Activation("relu")(X)

    # Third component of main path: 1x1 convolution, no activation before the addition
    X = Conv2D(F3, (1, 1), strides=(1, 1), padding="valid", name=conv_name_base + "2c",
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + "2c")(X)

    ##### SHORTCUT PATH #####
    # 1x1 convolution with stride s so the shortcut matches the main path's spatial size and channels
    X_shortcut = Conv2D(F3, (1, 1), strides=(s, s), padding="valid", name=conv_name_base + "1",
                        kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + "1")(X_shortcut)

    # Final step: add the shortcut to the main path, then apply ReLU
    X = Add()([X, X_shortcut])
    X = Activation("relu")(X)

    return X
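
The only difference from identity_block is the 1×1 convolution with stride s on the shortcut, so this block can change both the spatial size and the number of channels. A quick check of the effect (again my own example):

X_in = Input((56, 56, 256))
X_out = convolutional_block(X_in, f=3, filters=[128, 128, 512], stage=3, block="test", s=2)
print(K.int_shape(X_out))   # (None, 28, 28, 512) -- spatial size halved, channel count increased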

ResNet50:

def ResNet50(input_shape=(64, 64, 3), classes=6):
    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-padding
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1: 7x7 convolution, batch norm, ReLU, 3x3 max pooling
    X = Conv2D(filters=64, kernel_size=(7, 7), strides=(2, 2), name="conv1",
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name="bn_conv1")(X)
    X = Activation("relu")(X)
    X = MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(X)

    # Stage 2: one convolutional block (s=1, no further downsampling after the max pool) and two identity blocks
    X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, block="a", s=1)
    X = identity_block(X, f=3, filters=[64, 64, 256], stage=2, block="b")
    X = identity_block(X, f=3, filters=[64, 64, 256], stage=2, block="c")

    # Stage 3: one convolutional block (s=2) and three identity blocks, filters [128, 128, 512]
    X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, block="a", s=2)
    X = identity_block(X, f=3, filters=[128, 128, 512], stage=3, block="b")
    X = identity_block(X, f=3, filters=[128, 128, 512], stage=3, block="c")
    X = identity_block(X, f=3, filters=[128, 128, 512], stage=3, block="d")

    # Stage 4: one convolutional block (s=2) and five identity blocks, filters [256, 256, 1024]
    X = convolutional_block(X, f=3, filters=[256, 256, 1024], stage=4, block="a", s=2)
    X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block="b")
    X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block="c")
    X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block="d")
    X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block="e")
    X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block="f")

    # Stage 5: one convolutional block (s=2) and two identity blocks, filters [512, 512, 2048]
    # (the course notes list [256, 256, 2048] here, but the standard ResNet50 uses [512, 512, 2048])
    X = convolutional_block(X, f=3, filters=[512, 512, 2048], stage=5, block="a", s=2)
    X = identity_block(X, f=3, filters=[512, 512, 2048], stage=5, block="b")
    X = identity_block(X, f=3, filters=[512, 512, 2048], stage=5, block="c")

    # 2D average pooling with a (2, 2) window
    X = AveragePooling2D(pool_size=(2, 2), padding="same", name="avg_pool")(X)

    # Output layer: flatten, then a softmax over the classes
    X = Flatten()(X)
    X = Dense(classes, activation="softmax", name="fc" + str(classes),
              kernel_initializer=glorot_uniform(seed=0))(X)

    # Create model
    model = Model(inputs=X_input, outputs=X, name="ResNet50")

    return model
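
A minimal usage sketch, assuming the 6-class, 64×64 input setting from the function's defaults (the dataset and the fit call are placeholders, not part of the original post):

model = ResNet50(input_shape=(64, 64, 3), classes=6)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, Y_train, epochs=20, batch_size=32)   # X_train: (m, 64, 64, 3), Y_train: one-hot (m, 6)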


References

https://www.jianshu.com/p/93990a641066

https://www.cnblogs.com/long5683/p/12957042.html

Andrew Ng's ResNet assignment: https://blog.csdn.net/Solo95/article/details/85177557