Implementing DenseNet in Keras
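DenseNet was proposed in 2016 by Gao Huang, Zhuang Liu, et al., and received the Best Paper Award at CVPR 2017. Paper: https://arxiv.org/abs/1608.06993v3. The core of the network is the Dense block illustrated in the paper's Figure 1. Each Dense block contains several Dense layers, labelled H1 to H4 in that figure, and every Dense layer is directly connected to every layer that follows it: H1 takes x0 as input and produces x1, H2 takes [x0, x1] and produces x2, and so on. The output of the whole Dense block is therefore [x0, x1, x2, x3, x4]. Personally, I find this structure quite similar to the way neurons are wired in biology, and it should make fairly efficient use of the feature information flowing through the network.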
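To make the connectivity pattern concrete, here is a minimal NumPy sketch of a Dense block. It is only a toy: the hypothetical `toy_dense_layer` stands in for H_l with a random 1×1 channel mixing followed by ReLU (not the paper's BN-ReLU-Conv composite function), and the shapes are arbitrary. What it does reflect is that layer l receives the concatenation [x0, ..., x_(l-1)] and that the block outputs [x0, x1, ..., x_L].

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_dense_layer(x, k):
    """Hypothetical stand-in for H_l: random 1x1 channel mixing to k channels, then ReLU."""
    w = rng.standard_normal((x.shape[-1], k))
    return np.maximum(x @ w, 0.0)  # (H, W, C_in) @ (C_in, k) -> (H, W, k)


def toy_dense_block(x0, nb_layers, k):
    """Each layer sees the concatenation of all earlier feature maps."""
    features = [x0]
    for _ in range(nb_layers):
        x_in = np.concatenate(features, axis=-1)  # [x0, x1, ..., x_(l-1)]
        features.append(toy_dense_layer(x_in, k))
    return np.concatenate(features, axis=-1), features  # block output [x0, ..., x_L]


x0 = rng.standard_normal((8, 8, 16))
out, feats = toy_dense_block(x0, nb_layers=4, k=4)
print(out.shape)  # (8, 8, 32): 16 input channels + 4 layers * 4 new channels
```

With four layers this mirrors the H1 to H4 example: the block output stacks x0 with the four new feature maps, so its channel count is the input channels plus (number of layers × growth rate).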
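Apart from the Dense blocks, DenseNet looks much like an ordinary convolutional neural network; see the overall architecture diagram in the paper. Still, I suspect this structure leaves room for further optimization. For example, it may not be necessary to connect every Dense layer in a block directly to every other one, which would shrink the network; non-adjacent Dense blocks could also be linked through simple downsampling operations, which might further improve how well the network uses features at different scales.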
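Because of its dense connectivity, DenseNet needs far fewer parameters than earlier architectures such as ResNet to reach comparable accuracy. Going further, I think a Dense block can be viewed as an efficient substitute for a convolutional layer with many parameters. It could therefore be combined with architectures such as U-Net to improve performance; for instance, simply replacing every convolutional layer in U-Net with DenseNet-style blocks should shrink the network considerably.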
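Below is a Keras implementation of the DenseNet-BC variant (bottleneck layers plus compression in the transition layers). First, the Dense layer, built as described in the paper: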
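```python
from keras.layers import BatchNormalization, LeakyReLU, Conv2D, Dropout

def DenseLayer(x, nb_filter, bn_size=4, alpha=0.0, drop_rate=0.2):
    # Bottleneck layer: BN -> activation -> 1x1 conv producing bn_size*nb_filter (4k) maps
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(bn_size*nb_filter, (1, 1), strides=(1,1), padding='same')(x)
    # Composite function: BN -> activation -> 3x3 conv producing nb_filter (k) maps
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(nb_filter, (3, 3), strides=(1,1), padding='same')(x)
    if drop_rate:
        x = Dropout(drop_rate)(x)
    return x
```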
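The paper places a 1×1 convolution before each 3×3 convolution as a bottleneck layer (producing 4k feature maps, hence bn_size=4) to improve computational efficiency. The paper uses ReLU for every activation; out of habit I build with LeakyReLU instead, because it makes tuning more convenient. With the default alpha=0.0 used here, LeakyReLU is identical to ReLU.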
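A quick back-of-the-envelope check of when the 1×1 bottleneck actually saves work. The sketch counts convolution weights (equivalently, multiply-accumulates per output pixel), ignoring biases and BatchNormalization, for a plain 3×3 convolution from c input channels to k outputs versus the bottleneck version (1×1 to 4k, then 3×3 to k). The function names are my own, for illustration only.

```python
def plain_3x3_weights(c, k):
    # a single 3x3 conv: c input channels -> k output channels
    return 9 * c * k


def bottleneck_weights(c, k, bn_size=4):
    # 1x1 conv c -> bn_size*k, followed by 3x3 conv bn_size*k -> k
    return c * bn_size * k + 9 * bn_size * k * k


k = 12
# c = 24, 84, 246 are the input widths of the first layer of blocks 1 and 2
# and the last layer of block 3 in the model built in this post
for c in (24, 84, 156, 246):
    print(c, plain_3x3_weights(c, k), bottleneck_weights(c, k))
```

Setting the two counts equal gives 9ck = 4ck + 36k², i.e. c = 7.2k (86.4 channels for k = 12). At c = 24 the plain 3×3 conv needs 2,592 weights versus 6,336 with the bottleneck, while at c = 246 it is 26,568 versus 16,992. Since a Dense layer's input keeps widening, the bottleneck pays off for most layers, but not for the first few in the network.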
Next, Dense layers are stacked into a Dense block:
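```python
def DenseBlock(x, nb_layers, growth_rate, drop_rate=0.2):
    for ii in range(nb_layers):
        # each new layer receives the concatenation of all previous feature maps
        conv = DenseLayer(x, nb_filter=growth_rate, drop_rate=drop_rate)
        x = concatenate([x, conv], axis=3)  # axis 3 = channel axis (channels_last)
    return x
```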
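As described in the paper, each Dense layer's output is concatenated with its input along the channel axis, and the result becomes the input of the next Dense layer; this is what implements the dense connectivity.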
Finally, the transition layer placed between Dense blocks:
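```python
def TransitionLayer(x, compression=0.5, alpha=0.0, is_max=0):
    # compress the channel count by the factor `compression` (0.5 in DenseNet-BC)
    nb_filter = int(x.shape.as_list()[-1]*compression)
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(nb_filter, (1, 1), strides=(1,1), padding='same')(x)
    # halve the spatial size: max pooling if is_max is set, else average pooling (the paper's choice)
    if is_max != 0:
        x = MaxPooling2D(pool_size=(2, 2), strides=2)(x)
    else:
        x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)
    return x
```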
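The paper uses average pooling for downsampling, but I expect max pooling to work better for extracting edge features, so I added an option for it (is_max).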
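A toy NumPy illustration of the difference between the two options, assuming a single-channel feature map that contains a one-pixel-wide vertical edge. It does not show that max pooling trains better; it only shows that 2×2 max pooling keeps the edge's full response while 2×2 average pooling halves it. `pool2x2` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def pool2x2(x, mode="avg"):
    """Hypothetical helper: 2x2 pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)  # blocks[i, a, j, b] == x[2*i + a, 2*j + b]
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))


fmap = np.zeros((4, 4))
fmap[:, 1] = 1.0  # a one-pixel-wide vertical edge in column 1

print(pool2x2(fmap, "max"))  # values [[1, 0], [1, 0]]: edge kept at full strength
print(pool2x2(fmap, "avg"))  # values [[0.5, 0], [0.5, 0]]: edge diluted by zero neighbours
```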
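The pieces above are assembled following the structure proposed in the paper, with the paper's growth rate k=12. Note, however, that the paper's DenseNet-BC with L=100 has (100 − 4)/6 = 16 bottleneck Dense layers in each of its three blocks, whereas the code below uses 12 per block (a depth of 76); change 12 to 16 in the three DenseBlock calls to reproduce L=100. The network is wired as follows: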
```python
growth_rate = 12
inpt = Input(shape=(32,32,3))
# initial convolution with 2k output channels, as in DenseNet-BC
x = Conv2D(growth_rate*2, (3, 3), strides=1, padding='same')(inpt)
x = BatchNormalization(axis=3)(x)
x = LeakyReLU(alpha=0.1)(x)
# three Dense blocks separated by transition layers
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = TransitionLayer(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = TransitionLayer(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
# classifier: BN -> activation -> global average pooling -> softmax
x = BatchNormalization(axis=3)(x)
x = LeakyReLU(alpha=0.0)(x)
x = GlobalAveragePooling2D()(x)
x = Dense(10, activation='softmax')(x)
model = Model(inpt, x)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```
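As a sanity check on this model, the following plain-Python sketch traces the channel count through the network and counts its parameters layer by layer. It assumes Keras defaults: Conv2D and Dense have biases, and BatchNormalization stores 4 values per channel, two of which are the non-trainable moving statistics that model.summary() includes in its total. It also computes the depth L the way the paper counts it (convolutional plus fully connected layers). The helper name is my own.

```python
def densenet_bc_stats(nb_layers=12, growth_rate=12, bn_size=4, compression=0.5,
                      nb_blocks=3, in_channels=3, nb_classes=10):
    """Return (depth L, total parameter count, channel trace) for the model above."""
    def bn(c):  # gamma, beta, moving_mean, moving_variance
        return 4 * c

    def conv(c_in, c_out, size):  # kernel weights + bias
        return size * size * c_in * c_out + c_out

    c = 2 * growth_rate                      # stem Conv2D outputs 2k channels
    params = conv(in_channels, c, 3) + bn(c)
    trace = [c]
    for b in range(nb_blocks):
        for _ in range(nb_layers):           # one DenseLayer
            inter = bn_size * growth_rate
            params += bn(c) + conv(c, inter, 1) + bn(inter) + conv(inter, growth_rate, 3)
            c += growth_rate                 # concatenation adds k channels
        trace.append(c)
        if b < nb_blocks - 1:                # TransitionLayer
            out = int(c * compression)
            params += bn(c) + conv(c, out, 1)
            c = out
            trace.append(c)
    params += bn(c) + c * nb_classes + nb_classes  # final BN + Dense softmax layer
    # depth: stem conv + 2 convs per DenseLayer + transition convs + Dense
    depth = 1 + 2 * nb_blocks * nb_layers + (nb_blocks - 1) + 1
    return depth, params, trace


print(densenet_bc_stats(12))  # (76, 504052, [24, 168, 84, 228, 114, 258])
print(densenet_bc_stats(16))  # (100, 796408, [24, 216, 108, 300, 150, 342])
```

With 12 layers per block the count comes to about 0.5M parameters at depth 76; with 16 layers per block the depth is 100 and the count is roughly the 0.8M the paper reports for DenseNet-BC (L=100, k=12).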
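Although the network is now complete and has only about 0.5M parameters, implementing it this way means that every concatenation inside a Dense block allocates a brand-new block of memory, so the memory actually required is very large. In 2017 the authors wrote a dedicated technical report on reducing this memory footprint (https://arxiv.org/abs/1707.06990), but implementing that approach in pure Keras is fairly cumbersome. The next blog post will implement it with PyTorch.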
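A rough sketch of why memory grows so quickly, counting only the outputs of the concatenate calls in one Dense block (the BatchNormalization and activation outputs computed on those concatenated inputs add more of the same kind). In the naive implementation the l-th concatenation materializes a new tensor with k0 + l·k channels, so the total grows quadratically with the number of layers; the shared-buffer idea is approximated here as a single buffer sized for the block's final output. The numbers assume the first block of the model above (k0 = 24, k = 12, 12 layers, 32×32 maps), a batch of 100 and float32 activations; they are estimates, not measurements, and `concat_memory` is a hypothetical helper.

```python
def concat_memory(k0, k, nb_layers, h, w, batch, bytes_per_value=4):
    """Rough estimate of memory held by concatenate outputs in one Dense block."""
    bytes_per_channel = h * w * batch * bytes_per_value
    # naive: the l-th concatenation allocates a fresh tensor with k0 + l*k channels
    naive_channels = sum(k0 + l * k for l in range(1, nb_layers + 1))
    # shared buffer: one tensor sized for the block's final output
    shared_channels = k0 + nb_layers * k
    return (naive_channels, naive_channels * bytes_per_channel,
            shared_channels, shared_channels * bytes_per_channel)


naive_c, naive_b, shared_c, shared_b = concat_memory(k0=24, k=12, nb_layers=12, h=32, w=32, batch=100)
print(naive_c, naive_b / 2**20)    # 1224 channels, about 478 MiB
print(shared_c, shared_b / 2**20)  # 168 channels, about 66 MiB
```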
Finally, here is the complete code:
```python
import pickle

import numpy as np
import keras
from keras.models import Model, save_model, load_model
from keras.layers import Input, Dense, Dropout, BatchNormalization, LeakyReLU, concatenate
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, GlobalAveragePooling2D

## data
def load_batch(path):
    # a CIFAR-10 python batch stores 10000 images as flat channel-first rows of 3*32*32 bytes
    with open(path, 'rb') as f:
        batch = pickle.load(f, encoding='bytes')
    X = batch[b'data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1).astype("float32")  # -> NHWC
    Y = np.array(batch[b'labels'], dtype="int32")
    return X, Y

train_batches = [load_batch("cifar-10-batches-py/data_batch_%d" % i) for i in range(1, 6)]
train_X = np.concatenate([X for X, _ in train_batches])
train_Y = keras.utils.to_categorical(np.concatenate([Y for _, Y in train_batches]), 10)
test_X, test_Y = load_batch("cifar-10-batches-py/test_batch")
test_Y = keras.utils.to_categorical(test_Y, 10)
train_X /= 255
test_X /= 255

# model
def DenseLayer(x, nb_filter, bn_size=4, alpha=0.0, drop_rate=0.2):
    # Bottleneck layer: BN -> activation -> 1x1 conv producing bn_size*nb_filter maps
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(bn_size*nb_filter, (1, 1), strides=(1,1), padding='same')(x)
    # Composite function: BN -> activation -> 3x3 conv producing nb_filter maps
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(nb_filter, (3, 3), strides=(1,1), padding='same')(x)
    if drop_rate:
        x = Dropout(drop_rate)(x)
    return x

def DenseBlock(x, nb_layers, growth_rate, drop_rate=0.2):
    for ii in range(nb_layers):
        conv = DenseLayer(x, nb_filter=growth_rate, drop_rate=drop_rate)
        x = concatenate([x, conv], axis=3)
    return x

def TransitionLayer(x, compression=0.5, alpha=0.0, is_max=0):
    nb_filter = int(x.shape.as_list()[-1]*compression)
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(nb_filter, (1, 1), strides=(1,1), padding='same')(x)
    if is_max != 0:
        x = MaxPooling2D(pool_size=(2, 2), strides=2)(x)
    else:
        x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)
    return x

growth_rate = 12
inpt = Input(shape=(32,32,3))
x = Conv2D(growth_rate*2, (3, 3), strides=1, padding='same')(inpt)
x = BatchNormalization(axis=3)(x)
x = LeakyReLU(alpha=0.1)(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = TransitionLayer(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = TransitionLayer(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = BatchNormalization(axis=3)(x)
x = LeakyReLU(alpha=0.0)(x)
x = GlobalAveragePooling2D()(x)
x = Dense(10, activation='softmax')(x)
model = Model(inpt, x)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

# train one epoch at a time and report test performance after each epoch
for ii in range(10):
    print("Epoch:", ii+1)
    model.fit(train_X, train_Y, batch_size=100, epochs=1, verbose=1)
    score = model.evaluate(test_X, test_Y, verbose=1)
    print('Test loss =', score[0])
    print('Test accuracy =', score[1])

save_model(model, 'DenseNet.h5')
model = load_model('DenseNet.h5')
pred_Y = model.predict(test_X)
score = model.evaluate(test_X, test_Y, verbose=0)
print('Test loss =', score[0])
print('Test accuracy =', score[1])
```