tensorflow + faster rcnn code walkthrough (1): building the VGG front end and the RPN network
0. Preface
When the code runs, the first thing it does is instantiate the network object self.net from the vgg16 class:
if cfg.FLAGS.network == 'vgg16':
    self.net = vgg16(batch_size=cfg.FLAGS.ims_per_batch)
The class lives in vgg.py:
class vgg16(Network):
    def __init__(self, batch_size=1):
        Network.__init__(self, batch_size=batch_size)
As you can see, vgg16 inherits from the Network class, so an object built from vgg16 carries all of Network's member variables in addition to anything it adds itself. Now look at the Network class in network.py: its members are exactly the basic variables needed to train a network.
class Network(object):
    def __init__(self, batch_size=1):
        self._feat_stride = [16, ]
        self._feat_compress = [1. / 16., ]
        self._batch_size = batch_size
        self._predictions = {}
        self._losses = {}
        self._anchor_targets = {}
        self._proposal_targets = {}
        self._layers = {}
        self._act_summaries = []
        self._score_summaries = {}
        self._train_summaries = []
        self._event_summaries = {}
        self._variables_to_fix = {}
All of the code below uses a 600×800 input image as the running example.
1. Building the VGG16 front end (the build_head function)
The VGG16 architecture is shown in the figure below; build_head implements the part inside the red box, i.e. the convolutional layers through conv5_3 (there is no pool5).
def build_head(self, is_training):
    # Main network (shape comments assume the 600×800 example input)
    # Layer 1
    net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3], trainable=False, scope='conv1')  # 600×800×64
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')                           # 300×400×64
    # Layer 2
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], trainable=False, scope='conv2')         # 300×400×128
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')                           # 150×200×128
    # Layer 3
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], trainable=is_training, scope='conv3')   # 150×200×256
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')                           # 75×100×256
    # Layer 4
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv4')   # 75×100×512
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')                           # 38×50×512
    # Layer 5 (no pool5 -- the conv5_3 feature map is handed to the RPN)
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')   # 38×50×512

    # Append network to summaries
    self._act_summaries.append(net)
    # Append network as head layer
    self._layers['head'] = net

    return net
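As a quick sanity check on the shape comments, here is a tiny sketch (mine, not from the repo) that walks the 600×800 input through the four stride-2, SAME-padded poolings:

import math

h, w = 600, 800
for pool in ('pool1', 'pool2', 'pool3', 'pool4'):
    # a 2x2 max pool with stride 2 and SAME padding gives output size ceil(input / 2)
    h, w = math.ceil(h / 2), math.ceil(w / 2)
    print(pool, h, w)
# pool1 300 400, pool2 150 200, pool3 75 100, pool4 38 50 -> overall stride of 16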
2. Building the RPN network (the build_rpn function)
I will split the contents of build_rpn into two parts; first comes the anchor-generation part.
2.1 Generating anchors
def build_rpn(self, net, is_training, initializer):
    # Build anchor component: calls _anchor_component (in network.py) to set up the anchors;
    # the two parameters you would normally tweak are anchor_scales and anchor_ratios
    self._anchor_component()

How the anchors are constructed:

def _anchor_component(self):
    with tf.variable_scope('ANCHOR_' + 'default'):
        # just to get the shape right. feat_stride = 16 here: by the time the RPN is attached to the
        # VGG model, the feature map has been through 4 poolings, i.e. it is downsampled 16x
        height = tf.to_int32(tf.ceil(self._im_info[0, 0] / np.float32(self._feat_stride[0])))  # downsampled feature-map height, here 38 (600/16)
        width = tf.to_int32(tf.ceil(self._im_info[0, 1] / np.float32(self._feat_stride[0])))   # downsampled feature-map width, here 50 (800/16)
        # anchor_length is the number of anchors; generate_anchors_pre (in snippets.py) does the actual work
        anchors, anchor_length = tf.py_func(generate_anchors_pre,
                                            [height, width,
                                             self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                            [tf.float32, tf.int32], name="generate_anchors")
        anchors.set_shape([None, 4])
        anchor_length.set_shape([])
        self._anchors = anchors
        self._anchor_length = anchor_length
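The anchors themselves are computed in plain numpy; tf.py_func merely wraps that numpy function as a graph op (TF 1.x API), which is also why the output shapes have to be pinned down afterwards with set_shape. A minimal, self-contained sketch of the same pattern (make_grid is a toy stand-in for generate_anchors_pre, not code from the repo):

import numpy as np
import tensorflow as tf  # TF 1.x, matching the repo

def make_grid(height, width):
    # toy numpy stand-in: one point per feature-map cell, mapped back to image coordinates
    stride = 16
    ys, xs = np.meshgrid(np.arange(height) * stride, np.arange(width) * stride, indexing='ij')
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    return pts, np.int32(pts.shape[0])

h, w = tf.constant(38), tf.constant(50)
pts, n = tf.py_func(make_grid, [h, w], [tf.float32, tf.int32], name="toy_grid")
pts.set_shape([None, 2])  # py_func outputs come back with unknown static shape
n.set_shape([])

with tf.Session() as sess:
    print(sess.run(n))  # 1900 = 38 * 50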
Building on that, _anchor_component calls generate_anchors_pre (located in snippets.py) to generate the anchors:
def generate_anchors_pre(height, width, feat_stride, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
    """ A wrapper function to generate anchors given different scales
      Also return the number of anchors in variable 'length'
    """
    anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))
    # anchors = generate_anchors()  # use generate_anchors' default arguments
    # pdb.set_trace()
    A = anchors.shape[0]
    shift_x = np.arange(0, width) * feat_stride   # anchor-centre x positions mapped back to the original image
    shift_y = np.arange(0, height) * feat_stride  # anchor-centre y positions mapped back to the original image
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # shifts has the (xmin, ymin, xmax, ymax) layout, but since it only enumerates the anchor centres,
    # xmin == xmax and ymin == ymax; the anchor positions are laid out row by row
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
    K = shifts.shape[0]  # number of positions at which anchors are placed
    # width changes faster, so here it is H, W, C
    anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
    anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
    length = np.int32(anchors.shape[0])
    return anchors, length
The anchor-generation process is as follows (Facebook's Detectron framework generates anchors the same way; see the post "detectron code walkthrough (6): how anchors are generated for the input samples"):

(1) First generate the anchors for a single cell. At this point an anchor has no position information, only a width and height that satisfy the chosen anchor_scales and anchor_ratios (for an explanation of generate_anchors, see "detectron code walkthrough (4): generate_anchors"). After this step, A is the number of anchors per cell, here 9, i.e. the 3 anchor_scales crossed with the 3 anchor_ratios.

(2) Then, from the input image's height and width and the feat_stride, compute the positions on the image at which anchors should be placed, stepping by feat_stride; these placement points are the shifts.

(3) With the placement points in hand, shift the step (1) anchors onto each of them, which amounts to generating all 9 anchor shapes at every position (the numpy sketch below replays this).

The final number of anchors generated is 38×50×9 = 17100.
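To make the broadcasting concrete, here is a small numpy sketch (my own, not from the repo) that replays the shift computation for the 600×800 example; the 9 base anchors are stubbed out with zeros, since only the shapes and the final count matter here:

import numpy as np

height, width, feat_stride, A = 38, 50, 16, 9
base_anchors = np.zeros((A, 4))  # placeholder for the output of generate_anchors(...)

shift_x = np.arange(0, width) * feat_stride   # 50 x-offsets: 0, 16, ..., 784
shift_y = np.arange(0, height) * feat_stride  # 38 y-offsets: 0, 16, ..., 592
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # (1900, 4)

K = shifts.shape[0]  # 38 * 50 = 1900 placement points
all_anchors = (base_anchors.reshape((1, A, 4)) +
               shifts.reshape((1, K, 4)).transpose((1, 0, 2)))      # broadcasts to (1900, 9, 4)
all_anchors = all_anchors.reshape((K * A, 4))

print(K, all_anchors.shape)  # 1900 (17100, 4)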
2.2 Building the RPN layer
def build_rpn(self, net, is_training, initializer):
    # generate the anchors (code shown above)

    # the RPN head: a 3×3 conv with 512 output channels
    rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")
    self._act_summaries.append(rpn)

    # rpn_cls_score.shape = (1, 38, 50, 18): each anchor gets a 2-way (object / background) score.
    # H and W are the size of the final feature map (the 600×800 image becomes 38×50 after 16× downsampling).
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')

    # Change it so that the score has 2 as its channel size
    # Steps: 1. transpose rpn_cls_score to the "caffe" layout: (1, 38, 50, 18) -> (1, 18, 38, 50)
    #        2. reshape to ([self._batch_size], [num_dim, -1], [input_shape[2]]), where num_dim is the 2
    #           passed in below and input_shape[2] is rpn_cls_score's 50:
    #           to_caffe -> (1, 2, 9×38, 50) = (1, 2, 342, 50)
    #        3. move the second dimension back to the end, giving (1, 342, 50, 2)
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')  # (1, 342, 50, 2)

    # rpn_cls_score_reshape is now (1, 342, 50, 2). The softmax layer flattens it to (1×342×50, 2) = (17100, 2),
    # applies softmax, then reshapes back, so rpn_cls_prob_reshape is again (1, 342, 50, 2).
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")  # (1, 38, 50, 18)

    # bounding-box regression deltas: 4 values per anchor
    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')

    return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
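_reshape_layer and _softmax_layer are defined in network.py and not reproduced in this post. Based on the step-by-step comments above, _reshape_layer plausibly looks like the sketch below (TF 1.x; treat it as an illustration of the transpose/reshape trick rather than a verbatim copy of the repo). _softmax_layer, per the comments, simply flattens the input to (-1, 2), applies softmax, and reshapes back.

def _reshape_layer(self, bottom, num_dim, name):
    input_shape = tf.shape(bottom)
    with tf.variable_scope(name):
        # 1. to the "caffe" layout: (N, H, W, C) -> (N, C, H, W), e.g. (1, 38, 50, 18) -> (1, 18, 38, 50)
        to_caffe = tf.transpose(bottom, [0, 3, 1, 2])
        # 2. keep num_dim channels and fold the rest into the height:
        #    (1, 18, 38, 50) -> (1, 2, 9*38, 50) = (1, 2, 342, 50)
        reshaped = tf.reshape(to_caffe,
                              tf.concat(axis=0,
                                        values=[[self._batch_size], [num_dim, -1], [input_shape[2]]]))
        # 3. back to the TF layout: (1, 2, 342, 50) -> (1, 342, 50, 2)
        to_tf = tf.transpose(reshaped, [0, 2, 3, 1])
        return to_tf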
The final outputs of the RPN layer are:
- rpn_cls_prob
- rpn_bbox_pred
- rpn_cls_score
- rpn_cls_score_reshape