TensorFlow + SSD: single-class detection using the original weights with a modified classification network
The source code discussed in this article is at https://github.com/balancap/SSD-Tensorflow
My project requires detecting people in a scene, but the original SSD network classifies 20 object categories (plus background), while I only need the person class. Following the repo's instructions, I trained for 8 hours on a server with two 1080Ti GPUs and the loss dropped to around 10, yet the result was worse than with the original pretrained weights. I therefore decided to modify the network connections instead: keep the original weights and retain only the person outputs of the final classification layer.
The steps are as follows:
- Download the project above from GitHub and test the provided note_books.ipynb; it performs multi-class detection out of the box. It is best to convert it into a note_books.py script first.
- Print the names of the trainable variables in note_books.py (they can also be worked out from the paper and the source code); part of the output is shown below.
Note that the names of the form ***_box/conv_cls/biases and ***_box/conv_cls/weights are the final classification parameters that need to be extracted, and they are also the place where the network has to be modified.
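A minimal sketch for listing these names (assuming the SSD graph has already been built, e.g. inside note_books.py after the network is constructed):

import tensorflow as tf

# Print every trainable variable of the SSD graph together with its shape.
# The classification parameters appear as e.g. ssd_300_vgg/block4_box/conv_cls/weights.
for var in tf.trainable_variables():
    print(var.name, var.get_shape().as_list())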
- The classification layers are built in the ssd_multibox_layer() function of ssd_vgg_300.py. In the original code the weights and biases are created implicitly by TensorFlow's slim wrapper, so they cannot be accessed by hand; the first step is therefore to rewrite the slim call with plain TensorFlow ops.
For saving and restoring TensorFlow models with Saver, see the earlier article: https://blog.csdn.net/weixin_40100431/article/details/82860478
channels = [512, 1024, 512, 256, 256, 256]   ### number of input channels of each feature layer; an extra argument i must be added to the function signature so that we know which feature layer is being processed
weights = tf.Variable(tf.truncated_normal([3, 3, channels[i], num_cls_pred], dtype=tf.float32, stddev=1e-1),
                      name='conv_cls/weights')
biases = tf.Variable(tf.constant(0.0, shape=[num_cls_pred], dtype=tf.float32), name='conv_cls/biases')
####
weights1 = weights
biases1 = biases
#### the two lines above are the important part: later they are replaced in order to extract the network parameters and turn the classifier into a single-class detector
## first declare the variables; the names must match the ones printed in the previous step, otherwise restoring the parameters in ssd_notebook will fail with a name mismatch
tmp = tf.nn.conv2d(net, weights1, strides=[1, 1, 1, 1], padding='SAME')
# cls_pred = tf.nn.relu(tf.nn.bias_add(tmp, biases))
cls_pred = tf.nn.bias_add(tmp, biases1)
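To check that these hand-made variables will line up with what is stored in the pretrained checkpoint, the checkpoint keys can be listed directly. A small sketch, assuming the stock ssd_300_vgg.ckpt from the repo's checkpoints directory (adjust the path to your setup):

import tensorflow as tf

# List every conv_cls entry stored in the original checkpoint; these are the names
# that the tf.Variable(..., name='conv_cls/...') declarations above must reproduce
# (the surrounding <layer>_box variable scope supplies the prefix).
reader = tf.train.NewCheckpointReader('./checkpoints/ssd_300_vgg.ckpt')  # path is an assumption
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    if 'conv_cls' in name:
        print(name, shape)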
- Extract the parameters and modify the classification network. SSD attaches anchors to every feature-map location, 4 or 6 per location depending on the layer, and in the code the output channels are not grouped class-by-class: the 21 class scores of the first anchor are stored first, then the 21 of the second anchor, and so on. In pascalvoc_2007.py the person class is class 15, so the two lines marked above are replaced with the following code.
For the usage of tf.concat, see the earlier article:
https://blog.csdn.net/weixin_40100431/article/details/82858085
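A tiny sketch of the axis semantics the loop below relies on (independent of SSD):

import tensorflow as tf

# Concatenating two [3, 3, C, 1] kernels along axis 3 yields a [3, 3, C, 2] kernel;
# this is exactly how weights1 below grows by two output channels per anchor.
a = tf.ones([3, 3, 512, 1])
b = tf.zeros([3, 3, 512, 1])
print(tf.concat([a, b], 3).get_shape())   # (3, 3, 512, 2)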
for ii in range(num_anchors):   # one pass per anchor; each anchor owns num_classes consecutive channels
    if ii == 0:
        weights1 = tf.concat([weights[0:3, 0:3, 0:channels[i], ii * num_classes:ii * num_classes + 1],
                              weights[0:3, 0:3, 0:channels[i], ii * num_classes + 15:ii * num_classes + 16]], 3)
        biases1 = tf.concat(
            [biases[ii * num_classes:ii * num_classes + 1], biases[ii * num_classes + 15:ii * num_classes + 16]], 0)
    else:
        weights1 = tf.concat([weights1, weights[0:3, 0:3, 0:channels[i], ii * num_classes:ii * num_classes + 1],
                              weights[0:3, 0:3, 0:channels[i], ii * num_classes + 15:ii * num_classes + 16]], 3)
        biases1 = tf.concat(
            [biases1, biases[ii * num_classes:ii * num_classes + 1], biases[ii * num_classes + 15:ii * num_classes + 16]], 0)
cls_pred = tf.reshape(cls_pred,
                      tensor_shape(cls_pred, 4)[:-1] + [num_anchors, 2])  # cls_pred was [N, H, W, num_anchors * classes]; here it becomes [N, H, W, num_anchors, 2]
### the 21 classes are reduced to 2 here (background + person)
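To sanity-check which kernel channels the loop keeps, here is a small Python sketch for the first feature layer (num_anchors = 4, num_classes = 21, person at offset 15 inside each anchor's block of 21 channels):

num_classes, num_anchors = 21, 4      # first feature layer (block4)
kept = []
for a in range(num_anchors):          # one pass per anchor
    kept += [a * num_classes,         # background channel of anchor a
             a * num_classes + 15]    # person channel of anchor a
print(kept)                           # [0, 15, 21, 36, 42, 57, 63, 78]
print(len(kept))                      # 8 = num_anchors * 2 remaining classes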
Note: there is a 21-class parameter in ssd_notebook.py that does not need to be changed; it is not actually used inside the function's implementation.
The experimental result images are shown below:
Note: it is best to read the source code together with the paper; only then can the code be handled and modified with confidence.
Modified code: simply replace the following functions in ssd_vgg_300.py. If you run into any problems, please leave a comment.
def ssd_multibox_layer(inputs,
                       num_classes,
                       sizes,
                       i,
                       ratios=[1],
                       normalization=-1,
                       bn_normalization=False):
    """Construct a multibox layer, return a class and localization predictions.
    """
    channels = [512, 1024, 512, 256, 256, 256]   # input channels of the six feature layers
    net = inputs
    if normalization > 0:
        net = custom_layers.l2_normalization(net, scaling=True)
    # Number of anchors.
    num_anchors = len(sizes) + len(ratios)   ### num_anchors = [4, 6, 6, 6, 4, 4] over the six layers
    # Location.
    num_loc_pred = num_anchors * 4
    loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,
                           scope='conv_loc')
    loc_pred = custom_layers.channel_to_last(loc_pred)  # data_format NHWC = [batch, height, width, channels]; NCHW = [batch, channels, height, width]
    loc_pred = tf.reshape(loc_pred,
                          tensor_shape(loc_pred, 4)[:-1] + [num_anchors, 4])  # loc_pred was [N, H, W, num_anchors * 4]; here it becomes [N, H, W, num_anchors, 4]
    # Class prediction.
    # num_cls_pred = num_anchors * num_classes
    # cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None,
    #                        scope='conv_cls')
    # cls_pred = custom_layers.channel_to_last(cls_pred)
    ## added code: declare the classification kernel and bias by hand so they can be sliced
    num_cls_pred = num_anchors * num_classes
    weights = tf.Variable(tf.truncated_normal([3, 3, channels[i], num_cls_pred], dtype=tf.float32, stddev=1e-1),
                          name='conv_cls/weights')
    biases = tf.Variable(tf.constant(0.0, shape=[num_cls_pred], dtype=tf.float32), name='conv_cls/biases')
    # as usual (keep all 21 classes):
    # weights1 = weights
    # # weights = weights[0:3, 0:3, 0:channels[i], 0:num_cls_pred]
    # biases1 = biases
    ### only detect person: keep the background and person channels of every anchor
    for ii in range(num_anchors):   # one pass per anchor
        if ii == 0:
            weights1 = tf.concat([weights[0:3, 0:3, 0:channels[i], ii * num_classes:ii * num_classes + 1],
                                  weights[0:3, 0:3, 0:channels[i], ii * num_classes + 15:ii * num_classes + 16]], 3)
            biases1 = tf.concat(
                [biases[ii * num_classes:ii * num_classes + 1], biases[ii * num_classes + 15:ii * num_classes + 16]], 0)
        else:
            weights1 = tf.concat([weights1, weights[0:3, 0:3, 0:channels[i], ii * num_classes:ii * num_classes + 1],
                                  weights[0:3, 0:3, 0:channels[i], ii * num_classes + 15:ii * num_classes + 16]], 3)
            biases1 = tf.concat(
                [biases1, biases[ii * num_classes:ii * num_classes + 1], biases[ii * num_classes + 15:ii * num_classes + 16]], 0)
    ###
    # print("*****************")
    # print(weights.name)
    # print(weights.get_shape)
    # print(biases.name)
    # print(biases.get_shape)
    tmp = tf.nn.conv2d(net, weights1, strides=[1, 1, 1, 1], padding='SAME')
    # cls_pred = tf.nn.relu(tf.nn.bias_add(tmp, biases))
    cls_pred = tf.nn.bias_add(tmp, biases1)
    # print(cls_pred.get_shape)  ### with 21 classes the shapes were (1,38,38,84) (1,19,19,126) (1,10,10,126) (1,5,5,126) (1,3,3,84) (1,1,1,84)
    # cls_pred = tf.reshape(cls_pred,
    #                       tensor_shape(cls_pred, 4)[:-1] + [num_anchors, num_classes])
    cls_pred = tf.reshape(cls_pred,
                          tensor_shape(cls_pred, 4)[:-1] + [num_anchors, 2])  # cls_pred was [N, H, W, num_anchors * classes]; here it becomes [N, H, W, num_anchors, 2]
    # print(cls_pred)  # with 21 classes: (1,38,38,4,21) (1,19,19,6,21) (1,10,10,6,21) (1,5,5,6,21) (1,3,3,4,21) (1,1,1,4,21)
    return cls_pred, loc_pred
def ssd_net(inputs,
            num_classes=SSDNet.default_params.num_classes,
            feat_layers=SSDNet.default_params.feat_layers,
            anchor_sizes=SSDNet.default_params.anchor_sizes,
            anchor_ratios=SSDNet.default_params.anchor_ratios,
            normalizations=SSDNet.default_params.normalizations,
            is_training=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
    """SSD net definition.
    """
    # if data_format == 'NCHW':
    #     inputs = tf.transpose(inputs, perm=(0, 3, 1, 2))
    # End_points collect relevant activations for external use.
    end_points = {}
    with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
        # Original VGG-16 blocks.
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')  ### creates model variables that can be trained or fine-tuned
        ## equivalent to:
        ## net = slim.conv2d(inputs, 64, [3, 3], scope='conv1')
        ## net = slim.conv2d(net, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')  # 150*150*64
        # Block 2.
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')  # 75*75*128
        # Block 3.
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool3')  # 38*38*256
        # Block 4.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool4')  # 19*19*512
        # Block 5.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], stride=1, scope='pool5')  # 19*19*512

        # Additional SSD blocks.
        # Block 6: let's dilate the hell out of it!
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')  # 19*19*1024
        end_points['block6'] = net
        net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training)
        # Block 7: 1x1 conv. Because the fuck.
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')  # 19*19*1024
        end_points['block7'] = net
        net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training)

        # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')  ### 10*10*512
        end_points[end_point] = net
        end_point = 'block9'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')  ### 5*5*256
        end_points[end_point] = net
        end_point = 'block10'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')  ### 3*3*256
        end_points[end_point] = net
        end_point = 'block11'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')  ### 1*1*256
        end_points[end_point] = net

        # Prediction and localisations layers.
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):   #### create the <layer>_box variable scope
                p, l = ssd_multibox_layer(end_points[layer],
                                          num_classes,
                                          anchor_sizes[i],
                                          i,
                                          anchor_ratios[i],
                                          normalizations[i])
            predictions.append(prediction_fn(p))   ### softmax over the class predictions
            logits.append(p)
            localisations.append(l)
        return predictions, localisations, logits, end_points


ssd_net.default_image_size = 300
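Finally, a hedged sketch of how the modified network can be used for inference with the original pretrained weights, condensed from the repo's ssd_notebook.ipynb (the checkpoint path and the test image path are assumptions; the thresholds are the notebook defaults, and the script is assumed to run from the repo root so that nets/ and preprocessing/ are importable):

import matplotlib.image as mpimg
import tensorflow as tf
slim = tf.contrib.slim

from nets import ssd_vgg_300, np_methods
from preprocessing import ssd_vgg_preprocessing

isess = tf.InteractiveSession()
net_shape = (300, 300)

# Evaluation preprocessing, unchanged from the original notebook.
img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, 'NHWC',
    resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
image_4d = tf.expand_dims(image_pre, 0)

# Build the modified network. The conv_cls variables keep their original 21-class
# shapes (only their outputs are sliced), so the stock checkpoint restores as-is.
ssd_net = ssd_vgg_300.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format='NHWC')):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False)

isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, './checkpoints/ssd_300_vgg.ckpt')   # path is an assumption
ssd_anchors = ssd_net.anchors(net_shape)

# Run a single image and select boxes; with the 2-class head every returned
# class id is 1, i.e. person.
img = mpimg.imread('demo/person.jpg')                    # test image is an assumption
rimg, rpredictions, rlocalisations, rbbox_img = isess.run(
    [image_4d, predictions, localisations, bbox_img], feed_dict={img_input: img})
rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
    rpredictions, rlocalisations, ssd_anchors,
    select_threshold=0.5, img_shape=net_shape,
    num_classes=21, decode=True)   # per the note earlier, this argument is not consumed inside the function
print(rclasses, rscores)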