RefineDet Source Code (Part 2): Network Structure
阿新 · Published: 2019-01-06
For background on the RefineDet algorithm itself, see the blog post: RefineDet paper notes.
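RefineDet is an upgraded version of SSD, so most of its code is also adapted from the open-source SSD code: https://github.com/weiliu89/caffe/tree/ssd. RefineDet consists of three main parts: the anchor refinement module (ARM), the object detection module (ODM), and the transfer connection block (TCB). The ARM can reuse the SSD code directly; the only change is that the classification branch predicts 2 classes instead of (number of object classes + 1), much like an RPN, so that the ARM produces better initial boxes. The ODM can also be adapted from the SSD code: the main change is that SSD's default boxes are replaced by the boxes produced by the ARM, while the classification and regression branches stay the same as in SSD. The TCB can be implemented with a few convolution and deconvolution layers.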
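Replacing SSD's default boxes with ARM-refined boxes amounts to decoding the ARM's predicted offsets onto each fixed anchor, and then letting the ODM regress relative to that refined box. The sketch below only illustrates this two-step idea and is not the RefineDet code (the decoding is not part of CreateRefineDetHead, which only builds the layers whose outputs feed it). It assumes SSD-style center-size box coding with per-coordinate variances (0.1, 0.1, 0.2, 0.2), a common SSD setting, and boxes in normalized (xmin, ymin, xmax, ymax) form; decode_center_size and two_step_regression are hypothetical names.

```python
import math


# Hypothetical helper: SSD-style center-size decoding (assumed coding scheme).
def decode_center_size(anchor, offsets, variance=(0.1, 0.1, 0.2, 0.2)):
    """Apply predicted (dx, dy, dw, dh) offsets to an anchor given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = anchor
    aw, ah = xmax - xmin, ymax - ymin
    acx, acy = xmin + aw / 2.0, ymin + ah / 2.0
    dx, dy, dw, dh = offsets
    # Center shifts are scaled by the anchor size; width and height are scaled in log space.
    cx = acx + dx * variance[0] * aw
    cy = acy + dy * variance[1] * ah
    w = aw * math.exp(dw * variance[2])
    h = ah * math.exp(dh * variance[3])
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def two_step_regression(anchor, arm_offsets, odm_offsets):
    """The ARM refines the fixed anchor; the ODM then regresses relative to the refined anchor."""
    refined = decode_center_size(anchor, arm_offsets)
    final = decode_center_size(refined, odm_offsets)
    return refined, final
```

For example, with anchor (0.4, 0.4, 0.6, 0.6) and an ARM offset of dx = 1.0, the refined box shifts right by 0.1 × anchor width to about (0.42, 0.4, 0.62, 0.6); zero ODM offsets then leave that refined box unchanged.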
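The previous post, RefineDet Source Code (Part 1): Training Script, covered the code that trains RefineDet, including how the network is assembled at a high level, but did not go into the details. This post therefore walks through how the RefineDet network structure is actually built. The code is the CreateRefineDetHead function in ~RefineDet/python/caffe/model_libs.py.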
```python
# Assumed context: this function lives in model_libs.py of the SSD/RefineDet Caffe fork, which
# imports pycaffe's layer factory as L and defines the ConvBNLayer and ResBody helpers used below.
from caffe import layers as L

'''
CreateRefineDetHead is the core of the network construction covered in this post. It is adapted from
SSD's CreateMultiBoxHead function and can be seen as running the body of CreateMultiBoxHead twice:
once to build the ARM and once to build the ODM. The two key inputs are from_layers and from_layers2,
which feed the ARM and ODM branches in Figure 1 of the paper, respectively. Apart from the different
inputs, the two passes differ in that the ARM performs RPN-style box regression and binary
classification, while the ODM performs SSD-style box regression and object classification.
'''
def CreateRefineDetHead(net, data_layer="data", num_classes=[], from_layers=[], from_layers2=[],
                        normalizations=[], use_batchnorm=True, lr_mult=1, min_sizes=[], max_sizes=[],
                        prior_variance=[0.1], aspect_ratios=[], steps=[], img_height=0, img_width=0,
                        share_location=True, flip=True, clip=True, offset=0.5, inter_layer_depth=[],
                        kernel_size=1, pad=0, conf_postfix='', loc_postfix='', **bn_param):
    assert num_classes, "must provide num_classes"
    assert num_classes > 0, "num_classes must be positive number"
    if normalizations:
        assert len(from_layers) == len(normalizations), "from_layers and normalizations should have same length"
    assert len(from_layers) == len(min_sizes), "from_layers and min_sizes should have same length"
    if max_sizes:
        assert len(from_layers) == len(max_sizes), "from_layers and max_sizes should have same length"
    if aspect_ratios:
        assert len(from_layers) == len(aspect_ratios), "from_layers and aspect_ratios should have same length"
    if steps:
        assert len(from_layers) == len(steps), "from_layers and steps should have same length"
    net_layers = net.keys()
    assert data_layer in net_layers, "data_layer is not in net's layers"
    if inter_layer_depth:
        assert len(from_layers) == len(inter_layer_depth), "from_layers and inter_layer_depth should have same length"
    # The rest of the function has two parts: the Anchor Refinement Module (ARM) and the
    # Object Detection Module (ODM). We start with the ARM.
    prefix = 'arm'
    num_classes_rpn = 2
    num = len(from_layers)
    priorbox_layers = []
    loc_layers = []
    conf_layers = []
    # This loop runs once per fused feature layer; by default the paper uses 4 of them.
    for i in range(0, num):
        from_layer = from_layers[i]
        # Get the normalize value.
        if normalizations:
            if normalizations[i] != -1:
                norm_name = "{}_norm".format(from_layer)
                net[norm_name] = L.Normalize(net[from_layer], scale_filler=dict(type="constant", value=normalizations[i]),
                                             across_spatial=False, channel_shared=False)
                from_layer = norm_name
        # Add intermediate layers.
        # This branch runs by default with inter_layer_depth=[1,1,1,1], i.e. every fused layer is
        # followed by a residual block. Adding extra layers in front of the classification and
        # regression branches is common in object detection algorithms.
        if inter_layer_depth:
            if inter_layer_depth[i] > 0:
                inter_name = "{}_inter".format(from_layer)
                ResBody(net, from_layer, inter_name, out2a=256, out2b=256, out2c=1024, stride=1, use_branch1=True)
                # ConvBNLayer(net, from_layer, inter_name, use_bn=use_batchnorm, use_relu=True, lr_mult=lr_mult,
                #     num_output=inter_layer_depth[i], kernel_size=3, pad=1, stride=1, **bn_param)
                from_layer = "res{}".format(inter_name)
        # Estimate number of priors per location given provided parameters.
        min_size = min_sizes[i]
        if type(min_size) is not list:
            min_size = [min_size]
        aspect_ratio = []
        if len(aspect_ratios) > i:
            aspect_ratio = aspect_ratios[i]
            if type(aspect_ratio) is not list:
                aspect_ratio = [aspect_ratio]
        max_size = []
        if len(max_sizes) > i:
            max_size = max_sizes[i]
            if type(max_size) is not list:
                max_size = [max_size]
            if max_size:
                assert len(max_size) == len(min_size), "max_size and min_size should have same length."
        if max_size:
            num_priors_per_location = (2 + len(aspect_ratio)) * len(min_size)
        else:
            num_priors_per_location = (1 + len(aspect_ratio)) * len(min_size)
        if flip:
            num_priors_per_location += len(aspect_ratio) * len(min_size)
        step = []
        if len(steps) > i:
            step = steps[i]
        # Create location prediction layer.
        # Build the box-regression (loc) layer; num_priors_per_location is the number of boxes
        # predicted at each feature-map position. share_location defaults to True, so the if-branch
        # below is skipped. The result is appended to loc_layers, so after the 4 fused layers
        # loc_layers holds the box-regression outputs of all 4 layers.
        name = "{}_mbox_loc{}".format(from_layer, loc_postfix)
        num_loc_output = num_priors_per_location * 4
        if not share_location:
            num_loc_output *= num_classes_rpn
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                    num_output=num_loc_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        loc_layers.append(net[flatten_name])
        # Create confidence prediction layer.
        # Build the box-classification (conf) layer, with
        # num_conf_output = num_priors_per_location * num_classes_rpn.
        # Note that num_classes_rpn is 2, so every box gets a binary foreground/background
        # classification, exactly like the classification branch of an RPN. The result is appended
        # to conf_layers, so after the 4 fused layers conf_layers holds all 4 binary outputs.
        name = "{}_mbox_conf{}".format(from_layer, conf_postfix)
        num_conf_output = num_priors_per_location * num_classes_rpn
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                    num_output=num_conf_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        conf_layers.append(net[flatten_name])
        # Create prior generation layer.
        '''
        This part generates the anchors (also called prior boxes). Like the anchors in an RPN, they are
        fixed once generated. The "bboxes" mentioned above are predicted boxes, which are not the same
        thing as these anchors. So why generate anchors at all? They are needed to compute the loss.
        RefineDet, SSD, and Faster R-CNN all compute the coordinate regression loss the same way: the
        predicted offsets should be as close as possible to the offsets between the ground truth and the
        anchor. Computing those ground-truth-to-anchor offsets requires the anchor coordinates produced
        here (and at inference the same anchors are needed to turn predicted offsets back into boxes).
        '''
        name = "{}_mbox_priorbox".format(from_layer)
        net[name] = L.PriorBox(net[from_layer], net[data_layer], min_size=min_size,
                               clip=clip, variance=prior_variance, offset=offset)
        if max_size:
            net.update(name, {'max_size': max_size})
        if aspect_ratio:
            net.update(name, {'aspect_ratio': aspect_ratio, 'flip': flip})
        if step:
            net.update(name, {'step': step})
        if img_height != 0 and img_width != 0:
            if img_height == img_width:
                net.update(name, {'img_size': img_height})
            else:
                net.update(name, {'img_h': img_height, 'img_w': img_width})
        priorbox_layers.append(net[name])
    # Concatenate priorbox, loc, and conf layers.
    # Concatenate the outputs of the different fused layers.
    mbox_layers = []
    name = '{}{}'.format(prefix, "_loc")
    net[name] = L.Concat(*loc_layers, axis=1)
    mbox_layers.append(net[name])
    name = '{}{}'.format(prefix, "_conf")
    net[name] = L.Concat(*conf_layers, axis=1)
    mbox_layers.append(net[name])
    name = '{}{}'.format(prefix, "_priorbox")
    net[name] = L.Concat(*priorbox_layers, axis=2)
    mbox_layers.append(net[name])
    # Next is the Object Detection Module (ODM). Code identical to the ARM part is not explained
    # again; only the differences are pointed out.
    prefix = 'odm'
    num = len(from_layers2)
    loc_layers = []
    conf_layers = []
    for i in range(0, num):
        from_layer = from_layers2[i]
        # Get the normalize value.
        if normalizations:
            if normalizations[i] != -1:
                norm_name = "{}_norm".format(from_layer)
                net[norm_name] = L.Normalize(net[from_layer], scale_filler=dict(type="constant", value=normalizations[i]),
                                             across_spatial=False, channel_shared=False)
                from_layer = norm_name
        # Add intermediate layers.
        if inter_layer_depth:
            if inter_layer_depth[i] > 0:
                inter_name = "{}_inter".format(from_layer)
                ResBody(net, from_layer, inter_name, out2a=256, out2b=256, out2c=1024, stride=1, use_branch1=True)
                # ConvBNLayer(net, from_layer, inter_name, use_bn=use_batchnorm, use_relu=True, lr_mult=lr_mult,
                #     num_output=inter_layer_depth[i], kernel_size=3, pad=1, stride=1, **bn_param)
                # from_layer = inter_name
                from_layer = "res{}".format(inter_name)
        # Estimate number of priors per location given provided parameters.
        min_size = min_sizes[i]
        if type(min_size) is not list:
            min_size = [min_size]
        aspect_ratio = []
        if len(aspect_ratios) > i:
            aspect_ratio = aspect_ratios[i]
            if type(aspect_ratio) is not list:
                aspect_ratio = [aspect_ratio]
        max_size = []
        if len(max_sizes) > i:
            max_size = max_sizes[i]
            if type(max_size) is not list:
                max_size = [max_size]
            if max_size:
                assert len(max_size) == len(min_size), "max_size and min_size should have same length."
        if max_size:
            num_priors_per_location = (2 + len(aspect_ratio)) * len(min_size)
        else:
            num_priors_per_location = (1 + len(aspect_ratio)) * len(min_size)
        if flip:
            num_priors_per_location += len(aspect_ratio) * len(min_size)
        # Create location prediction layer.
        name = "{}_mbox_loc{}".format(from_layer, loc_postfix)
        num_loc_output = num_priors_per_location * 4
        if not share_location:
            num_loc_output *= num_classes
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                    num_output=num_loc_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        loc_layers.append(net[flatten_name])
        # Create confidence prediction layer.
        # Here num_conf_output = num_priors_per_location * num_classes, where num_classes is the
        # number of object classes plus background, so this classification branch is the same as SSD's.
        name = "{}_mbox_conf{}".format(from_layer, conf_postfix)
        num_conf_output = num_priors_per_location * num_classes
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                    num_output=num_conf_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        conf_layers.append(net[flatten_name])
    # Concatenate priorbox, loc, and conf layers.
    # Finally, append the ODM box-regression and classification outputs to the returned list. The ODM
    # creates no PriorBox layer of its own, so the function returns
    # [arm_loc, arm_conf, arm_priorbox, odm_loc, odm_conf].
    name = '{}{}'.format(prefix, "_loc")
    net[name] = L.Concat(*loc_layers, axis=1)
    mbox_layers.append(net[name])
    name = '{}{}'.format(prefix, "_conf")
    net[name] = L.Concat(*conf_layers, axis=1)
    mbox_layers.append(net[name])
    return mbox_layers
```
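num_priors_per_location has to agree with the number of boxes Caffe's PriorBox layer emits per feature-map cell, otherwise the loc/conf channel counts would not line up with the priors. The plain-Python sketch below approximates, as I understand it, what SSD's PriorBox layer computes: aspect ratio 1 is always included, each further ratio r adds r and, with flip, 1/r (duplicates are skipped), and a max_size adds one square of side sqrt(min_size × max_size). It omits the variance output and requires an explicit step; count_priors_per_location, expand_aspect_ratios and prior_boxes are hypothetical names.

```python
import math


def count_priors_per_location(min_size, max_size, aspect_ratio, flip):
    """Same arithmetic as num_priors_per_location in CreateRefineDetHead."""
    n = ((2 if max_size else 1) + len(aspect_ratio)) * len(min_size)
    if flip:
        n += len(aspect_ratio) * len(min_size)
    return n


def expand_aspect_ratios(aspect_ratio, flip):
    """Ratio 1.0 first, then each new ratio r (and 1/r if flip); near-duplicates are skipped."""
    ratios = [1.0]
    for r in aspect_ratio:
        if any(abs(r - q) < 1e-6 for q in ratios):
            continue
        ratios.append(float(r))
        if flip:
            ratios.append(1.0 / r)
    return ratios


def prior_boxes(fm_h, fm_w, img_h, img_w, step, min_size, max_size=(),
                aspect_ratio=(), flip=True, clip=True, offset=0.5):
    """Normalized (xmin, ymin, xmax, ymax) priors, cell by cell in row-major order."""
    ratios = expand_aspect_ratios(aspect_ratio, flip)
    boxes = []
    for y in range(fm_h):
        for x in range(fm_w):
            # Center of this cell in input-image pixels.
            cx, cy = (x + offset) * step, (y + offset) * step
            for k, s in enumerate(min_size):
                sizes = [(s, s)]
                if max_size:
                    side = math.sqrt(s * max_size[k])
                    sizes.append((side, side))
                for r in ratios:
                    if abs(r - 1.0) < 1e-6:
                        continue
                    sizes.append((s * math.sqrt(r), s / math.sqrt(r)))
                for bw, bh in sizes:
                    box = ((cx - bw / 2) / img_w, (cy - bh / 2) / img_h,
                           (cx + bw / 2) / img_w, (cy + bh / 2) / img_h)
                    if clip:
                        box = tuple(min(max(v, 0.0), 1.0) for v in box)
                    boxes.append(box)
    return boxes
```

With aspect ratio 2, flip enabled and no max_size, each cell gets 3 anchors (ratios 1, 2, 1/2). Under assumed settings of a 320×320 input, steps 8/16/32/64 and min sizes 32/64/128/256, the four layers give 40×40, 20×20, 10×10 and 5×5 cells and 3 × 2125 = 6375 anchors in total. The closed-form count in CreateRefineDetHead implicitly assumes that aspect_ratios contains neither 1 nor duplicates, since PriorBox skips those.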
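The comment on the PriorBox layer says the anchors are needed to compute the loss: the regression target is the ground-truth box encoded relative to its matched anchor, and the loss pushes the predicted offsets toward that target. A minimal sketch, assuming SSD-style center-size encoding with variances (0.1, 0.1, 0.2, 0.2) and SSD's default Smooth L1 localization loss; anchor-to-ground-truth matching is left out, and encode_center_size and smooth_l1 are hypothetical names.

```python
import math


def encode_center_size(anchor, gt, variance=(0.1, 0.1, 0.2, 0.2)):
    """Regression target for one matched (anchor, ground truth) pair, both (xmin, ymin, xmax, ymax)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2.0, anchor[1] + ah / 2.0
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + gw / 2.0, gt[1] + gh / 2.0
    # Center offsets are normalized by anchor size; sizes are compared in log space.
    return ((gcx - acx) / aw / variance[0],
            (gcy - acy) / ah / variance[1],
            math.log(gw / aw) / variance[2],
            math.log(gh / ah) / variance[3])


def smooth_l1(pred, target):
    """Sum of Smooth L1 terms: 0.5 * d**2 if |d| < 1, else |d| - 0.5."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total
```

For anchor (0.4, 0.4, 0.6, 0.6) and ground truth (0.42, 0.4, 0.62, 0.6), the target is approximately (1, 0, 0, 0). In RefineDet the ARM loss would use the fixed anchors here, while the ODM loss would encode against the ARM-refined anchors.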
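Every loc and conf branch in the function passes through Permute(order=[0, 2, 3, 1]) and Flatten(axis=1) before the Concat. The effect is easiest to see on a tiny tensor. The plain-Python sketch below mimics those three Caffe layers on nested lists in NCHW layout (an illustration only, not Caffe code; permute_flatten, concat_axis1 and loc_index are hypothetical names). After the permute, all channels of one spatial cell are contiguous, so the four coordinates of each prior sit side by side, in the same cell-by-cell order in which PriorBox emits its boxes.

```python
def permute_flatten(blob):
    """Mimic Permute(order=[0, 2, 3, 1]) + Flatten(axis=1) on a nested [N][C][H][W] list."""
    flat = []
    for sample in blob:
        channels, height, width = len(sample), len(sample[0]), len(sample[0][0])
        # NCHW -> NHWC, then flatten everything after the batch axis.
        flat.append([sample[c][h][w]
                     for h in range(height)
                     for w in range(width)
                     for c in range(channels)])
    return flat


def concat_axis1(*layers):
    """Mimic Concat(axis=1) on already-flattened [N][K] lists."""
    return [[v for layer in layers for v in layer[n]] for n in range(len(layers[0]))]


def loc_index(h, w, prior, coord, width, num_priors):
    """Index of one coordinate of one prior in a single layer's flattened loc vector."""
    return ((h * width + w) * num_priors + prior) * 4 + coord
```

With num_priors priors per cell, channel prior * 4 + coord of cell (h, w) ends up at loc_index(h, w, prior, coord, width, num_priors). Concatenating along axis 1 simply appends the next layer's vector, which is why arm_loc and arm_priorbox stay aligned as long as both concatenate the layers in the same order.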
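After the permute and flatten, the conf outputs hold num_classes_rpn = 2 scores per anchor for the ARM and num_classes scores per anchor for the ODM, laid out anchor by anchor. Turning one sample's flattened conf vector into per-anchor probabilities is then a reshape followed by a softmax. A minimal sketch; class_probs is a hypothetical name, and reading index 0 as background follows SSD's usual background_label_id = 0, which is an assumption here.

```python
import math


def class_probs(conf_flat, num_classes):
    """Split a flattened conf vector into per-anchor softmax distributions."""
    assert len(conf_flat) % num_classes == 0, "length must be a multiple of num_classes"
    probs = []
    for start in range(0, len(conf_flat), num_classes):
        logits = conf_flat[start:start + num_classes]
        # Subtract the max logit for numerical stability before exponentiating.
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        s = sum(exps)
        probs.append([e / s for e in exps])
    return probs
```

For the ARM (num_classes = 2), probs[i][1] would be anchor i's foreground probability under that background-at-0 assumption; for the ODM the same function applies with num_classes equal to the number of object classes plus background.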