
Image Semantic Segmentation with Deep Learning: A Code Implementation

Using TensorFlow and Python, we implement the FCN8s network on top of VGG to perform image semantic segmentation:

Dataset: VOC2012/ImageSets/Segmentation, split into train.txt (1,464 images) and val.txt (1,449 images).

# The 21 PASCAL VOC classes (background + 20 object categories)
classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
           'dog', 'horse', 'motorbike', 'person', 'potted plant',
           'sheep', 'sofa', 'train', 'tv/monitor']

# RGB color for each class
colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0], [0, 0, 128],
            [128, 0, 128], [0, 128, 128], [128, 128, 128], [64, 0, 0], [192, 0, 0],
            [64, 128, 0], [192, 128, 0], [64, 0, 128], [192, 0, 128],
            [64, 128, 128], [192, 128, 128], [0, 64, 0], [128, 64, 0],
            [0, 192, 0], [128, 192, 0], [0, 64, 128]]
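
The colormap above is how VOC2012 encodes per-pixel labels in the ground-truth PNGs. A common way to convert a color label image into a map of class indices is a packed-RGB lookup table; the following is a minimal sketch (the helper names are illustrative, not part of the original code):

import numpy as np

# One lookup entry per possible packed RGB value.
cm2lbl = np.zeros(256 ** 3, dtype=np.int64)
for i, cm in enumerate(colormap):
    cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i

def image2label(im):
    # im: HxWx3 uint8 color label image -> HxW array of class indices
    data = im.astype('int32')
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    return cm2lbl[idx]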

Pre-trained model: the VGG16 model from the model zoo.

Dataset processing: use the convert_fcn_dataset.py script to generate the tfrecord files.
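
The script itself is not shown here; a minimal sketch of what it likely does (the feature keys and helper names are assumptions) is to pack each image/label pair into a tf.train.Example:

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_example(writer, image_path, label_path):
    # Store the raw encoded bytes of the image and of the label PNG in one Example.
    with open(image_path, 'rb') as f:
        image_data = f.read()
    with open(label_path, 'rb') as f:
        label_data = f.read()
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_feature(image_data),
        'label/encoded': _bytes_feature(label_data),
    }))
    writer.write(example.SerializeToString())

# Usage: writer = tf.python_io.TFRecordWriter('fcn_train.tfrecord')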

Understanding FCN8s:

This is a schematic of a fully convolutional network built on VGG_16. The input is 500x500; it is first resized to the multiple of 32 closest to the original size, e.g. 512x512, because VGG_16's five pooling layers each halve the resolution, for a total downsampling factor of 2^5 = 32. At the final stage the network does no flattening: the output is kept as a 16x16x4096 feature map (512/32 = 16). Since there are 21 classes, the classification output is 16x16x21.

First, a 2x upsampling (also called deconvolution, or transposed convolution) is applied to this result, giving 32x32x21. The feature maps from vgg_16/pool4 are then passed through a classifier (a 1x1 convolution) to get a 32x32x21 result, and the two same-sized tensors are added element-wise, which again gives 32x32x21. This sum is upsampled by another 2x, producing 64x64x21. At that point the feature maps from vgg_16/pool3 are taken, likewise passed through a classifier (a 1x1 convolution) to get a 64x64x21 result, and the two same-sized tensors are again added element-wise, still yielding 64x64x21. Finally, an 8x upsampling of that result recovers an output the same size as the input: 512x512x21.

In this way, after two 2x upsamplings and one 8x upsampling, we obtain FCN8s.

Code implementation:

# Take the feature maps from end_points['vgg_16/pool4'] and run them through a 1x1
# convolution that classifies each position into the 21 classes. Because the weights
# are initialized with zeros_initializer, this branch initially adds nothing to the
# output. The result, aux_logits_16s, is 32x32x21.
pool4_feature = end_points['vgg_16/pool4']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_16s = slim.conv2d(pool4_feature, number_of_classes, [1, 1],
                                 activation_fn=None,
                                 weights_initializer=tf.zeros_initializer(),
                                 scope='conv_pool4')


upsample_filter_np_x2 = bilinear_upsample_weights(2,  # bilinear-interpolation filter weights for 2x upsampling
                                                  number_of_classes)

upsample_filter_tensor_x2 = tf.Variable(upsample_filter_np_x2, name='vgg_16/fc8/t_conv_x2')  # 2x upsampling filter
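
bilinear_upsample_weights is not defined in this excerpt. The standard implementation (a sketch, assuming the conventional FCN formulation) builds a 2D bilinear kernel and places one copy on the diagonal of the class dimensions, so every class channel is upsampled independently:

import numpy as np

def bilinear_upsample_weights(factor, number_of_classes):
    # Filter size that realizes bilinear interpolation for an integer factor.
    filter_size = 2 * factor - factor % 2
    center = factor - 1 if filter_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:filter_size, :filter_size]
    # 2D bilinear kernel as the outer product of two 1D triangle functions.
    kernel = (1 - abs(og[0] - center) / factor) * \
             (1 - abs(og[1] - center) / factor)
    weights = np.zeros((filter_size, filter_size,
                        number_of_classes, number_of_classes), dtype=np.float32)
    for i in range(number_of_classes):
        weights[:, :, i, i] = kernel
    return weights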

# Apply a 2x upsampling to the final logits output of vgg_16; the result is the
# same size as aux_logits_16s: 32x32x21.
upsampled_logits = tf.nn.conv2d_transpose(logits, upsample_filter_tensor_x2,
                                          output_shape=tf.shape(aux_logits_16s),
                                          strides=[1, 2, 2, 1],
                                          padding='SAME')

# Add the two same-sized feature maps element-wise; the result is still 32x32x21
# and is stored back into upsampled_logits.
upsampled_logits = upsampled_logits + aux_logits_16s

# Likewise take the feature maps from end_points['vgg_16/pool3'] and run them through
# a 1x1 convolution that classifies each position into the 21 classes. Again,
# zeros_initializer means this branch initially adds nothing. The result,
# aux_logits_8s, is 64x64x21.
pool3_feature = end_points['vgg_16/pool3']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_8s = slim.conv2d(pool3_feature, number_of_classes, [1, 1],
                                activation_fn=None,
                                weights_initializer=tf.zeros_initializer(),
                                scope='conv_pool3')

# Apply another 2x upsampling to upsampled_logits; the result is the same size as
# aux_logits_8s: 64x64x21.
upsampled_logits = tf.nn.conv2d_transpose(upsampled_logits, upsample_filter_tensor_x2,
                                          output_shape=tf.shape(aux_logits_8s),
                                          strides=[1, 2, 2, 1],
                                          padding='SAME')

# Add the two same-sized feature maps element-wise; the result is still 64x64x21,
# again stored back into upsampled_logits.
upsampled_logits = upsampled_logits + aux_logits_8s

upsample_filter_np_x8 = bilinear_upsample_weights(upsample_factor,  # bilinear-interpolation filter weights for 8x upsampling (upsample_factor = 8)
                                                  number_of_classes)

upsample_filter_tensor_x8 = tf.Variable(upsample_filter_np_x8, name='vgg_16/fc8/t_conv_x8')  # 8x upsampling filter

# Finally, apply an 8x upsampling to upsampled_logits. For example, with a 512x352
# input at validation time, upsampled_logits is 64x44x21, and the 8x upsampling
# produces 512x352x21, the same size as the model input.
upsampled_logits = tf.nn.conv2d_transpose(upsampled_logits, upsample_filter_tensor_x8,
                                          output_shape=upsampled_logits_shape,
                                          strides=[1, upsample_factor, upsample_factor, 1],
                                          padding='SAME')

Training: train.py
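
train.py itself is not shown. The usual FCN training objective (a sketch under that assumption; annotation_batch stands for the per-pixel ground-truth class indices) is per-pixel softmax cross-entropy over the upsampled logits:

flat_logits = tf.reshape(upsampled_logits, [-1, number_of_classes])
flat_labels = tf.reshape(annotation_batch, [-1])
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=flat_logits,
                                                               labels=flat_labels)
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)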

Generating the semantic segmentation images and the CRF-refined images.
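
The CRF step typically runs a fully connected CRF on top of the network's softmax output, e.g. with the pydensecrf library. A minimal sketch (the kernel parameters here are common defaults, not values from the original code):

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probabilities, num_classes=21, iterations=5):
    # image: HxWx3 uint8; probabilities: num_classes x H x W softmax output of the FCN.
    h, w = image.shape[:2]
    d = dcrf.DenseCRF2D(w, h, num_classes)
    d.setUnaryEnergy(unary_from_softmax(probabilities))
    # Pairwise terms: a location-only smoothness kernel plus a color-dependent kernel.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)
    Q = d.inference(iterations)
    return np.argmax(Q, axis=0).reshape((h, w))  # HxW map of refined class indices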