Deep Learning: Code Implementation of Image Semantic Segmentation
阿新 · Published 2019-02-06
Using TensorFlow and Python, this post implements the FCN8s network on top of VGG to perform image semantic segmentation:
Dataset: the lists in VOC2012/ImageSets/Segmentation, split into train.txt (1464 images) and val.txt (1449 images).
# class
classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
           'diningtable', 'dog', 'horse', 'motorbike', 'person',
           'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']
# RGB color for each class
colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
            [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
            [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
            [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
            [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
            [0, 64, 128]]
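To convert a VOC label PNG into an (H, W) class-index map for training, a common trick is a flat RGB lookup table built from the colormap above; a minimal sketch:

import numpy as np

# Map each RGB color in colormap to its class index via a flat lookup table
cm2lbl = np.zeros(256 ** 3, dtype=np.int64)
for i, cm in enumerate(colormap):
    cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i

def image_to_label(im):
    # im: (H, W, 3) uint8 RGB label image -> (H, W) class-index map
    data = im.astype(np.int32)
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    return cm2lbl[idx]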
Pre-trained model: the VGG16 model from the model zoo.
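The post doesn't show how the checkpoint is loaded; a common TF-Slim pattern is to restore everything except the fc8 classifier that FCN8s replaces (a sketch; the checkpoint filename vgg_16.ckpt is an assumption):

import tensorflow as tf
slim = tf.contrib.slim

# Restore every vgg_16 variable except the fc8 classifier, which FCN8s
# re-creates with 21 output channels.
variables_to_restore = slim.get_variables_to_restore(exclude=['vgg_16/fc8'])
init_fn = slim.assign_from_checkpoint_fn('vgg_16.ckpt', variables_to_restore)
# Later, inside a session:  init_fn(sess)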
Dataset preprocessing: run the convert_fcn_dataset.py script to generate the TFRecord files.
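convert_fcn_dataset.py itself is not reproduced here; a minimal sketch of the kind of record such a script writes (the feature keys image/encoded and label/encoded and the file names are hypothetical, not necessarily the script's actual ones):

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_pair(writer, image_path, label_path):
    # One tf.train.Example per image/label pair, both stored as raw
    # encoded bytes; the keys here are hypothetical.
    with open(image_path, 'rb') as f:
        image_bytes = f.read()
    with open(label_path, 'rb') as f:
        label_bytes = f.read()
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_feature(image_bytes),
        'label/encoded': _bytes_feature(label_bytes),
    }))
    writer.write(example.SerializeToString())

with tf.python_io.TFRecordWriter('fcn_train.record') as writer:
    write_pair(writer, 'JPEGImages/2007_000032.jpg',
               'SegmentationClass/2007_000032.png')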
Understanding FCN8s:
This is a schematic of a fully convolutional network based on VGG_16. The input is 500x500 (the input is first transformed to the multiple of 32 nearest to the original size, e.g. 512x512). At the final output there is no flattening step; the 16x16x4096 feature maps are kept, and since there are 21 classes, the classification output is 16x16x21.
This result is 2x-upsampled (also called deconvolution or transposed convolution) to 32x32x21. The feature maps from vgg_16/pool4 are then taken and passed through a classifier (a 1x1 convolution) to produce a 32x32x21 result, and these two same-shaped tensors are added element-wise, still giving 32x32x21. That sum is 2x-upsampled again to 64x64x21. Next, the feature maps from vgg_16/pool3 are likewise passed through a classifier (a 1x1 convolution) to get a 64x64x21 result, which is again added element-wise, still giving 64x64x21. Finally, one 8x upsampling of this result recovers an output the same size as the input: 512x512x21.
In this way, two 2x upsamplings plus one 8x upsampling yield FCN8s.
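As a quick sanity check of this shape arithmetic for a 512x512 input (plain Python, just tracing the numbers):

# vgg_16 halves the spatial size at each of its five pooling stages
input_size = 512
pool3_size = input_size // 2**3   # 64
pool4_size = input_size // 2**4   # 32
logits_size = input_size // 2**5  # 16: the 16x16x21 classification output
assert logits_size * 2 == pool4_size   # first 2x upsample aligns with pool4
assert pool4_size * 2 == pool3_size    # second 2x upsample aligns with pool3
assert pool3_size * 8 == input_size    # final 8x upsample restores input size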
Code implementation:
import tensorflow as tf
slim = tf.contrib.slim

# Context assumed from the surrounding script: `logits` and `end_points`
# come from the slim vgg_16 forward pass, number_of_classes = 21, and
# upsample_factor = 8. Shapes below are for a 512x512 input.

# Take the feature maps from end_points['vgg_16/pool4'] and run a 1x1
# convolution over them as a 21-class classifier. Because the weights are
# zero-initialized (and slim's default bias initializer is also zeros),
# this branch outputs zeros at first, so initially it changes nothing.
# Result: 32x32x21.
pool4_feature = end_points['vgg_16/pool4']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_16s = slim.conv2d(pool4_feature, number_of_classes, [1, 1],
                                 activation_fn=None,
                                 weights_initializer=tf.zeros_initializer(),
                                 scope='conv_pool4')

# Bilinear-interpolation filter weights for the 2x upsampling
upsample_filter_np_x2 = bilinear_upsample_weights(2, number_of_classes)
upsample_filter_tensor_x2 = tf.Variable(upsample_filter_np_x2,
                                        name='vgg_16/fc8/t_conv_x2')

# 2x-upsample the final vgg_16 logits; the result has the same shape as
# aux_logits_16s, i.e. 32x32x21.
upsampled_logits = tf.nn.conv2d_transpose(
    logits, upsample_filter_tensor_x2,
    output_shape=tf.shape(aux_logits_16s),
    strides=[1, 2, 2, 1], padding='SAME')
# Element-wise addition of the two same-shaped feature maps; still 32x32x21.
upsampled_logits = upsampled_logits + aux_logits_16s

# Likewise take end_points['vgg_16/pool3'] and run a zero-initialized
# 21-class 1x1 convolution over it. Result: 64x64x21.
pool3_feature = end_points['vgg_16/pool3']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_8s = slim.conv2d(pool3_feature, number_of_classes, [1, 1],
                                activation_fn=None,
                                weights_initializer=tf.zeros_initializer(),
                                scope='conv_pool3')

# 2x-upsample upsampled_logits again; the result has the same shape as
# aux_logits_8s, i.e. 64x64x21.
upsampled_logits = tf.nn.conv2d_transpose(
    upsampled_logits, upsample_filter_tensor_x2,
    output_shape=tf.shape(aux_logits_8s),
    strides=[1, 2, 2, 1], padding='SAME')
# Element-wise addition again; still 64x64x21.
upsampled_logits = upsampled_logits + aux_logits_8s

# Bilinear-interpolation filter weights for the 8x upsampling
upsample_filter_np_x8 = bilinear_upsample_weights(upsample_factor,
                                                  number_of_classes)
upsample_filter_tensor_x8 = tf.Variable(upsample_filter_np_x8,
                                        name='vgg_16/fc8/t_conv_x8')

# Finally 8x-upsample upsampled_logits back to the network input size.
# upsampled_logits_shape is computed elsewhere in the script from the
# input image size; e.g. for a 512x352 validation input, the 64x44x21
# map becomes 512x352x21.
upsampled_logits = tf.nn.conv2d_transpose(
    upsampled_logits, upsample_filter_tensor_x8,
    output_shape=upsampled_logits_shape,
    strides=[1, upsample_factor, upsample_factor, 1],
    padding='SAME')
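The bilinear_upsample_weights helper used above is not shown in the post; the following is a minimal sketch of the standard construction (a bilinear interpolation kernel placed on the diagonal of the filter, so each class channel is upsampled independently):

import numpy as np

def bilinear_upsample_weights(factor, number_of_classes):
    # Filter size that realizes bilinear interpolation for this factor
    filter_size = 2 * factor - factor % 2
    center = factor - 1 if filter_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:filter_size, :filter_size]
    kernel = ((1 - abs(og[0] - center) / factor) *
              (1 - abs(og[1] - center) / factor))
    # One bilinear kernel per class on the diagonal: class i maps only to i
    weights = np.zeros((filter_size, filter_size,
                        number_of_classes, number_of_classes),
                       dtype=np.float32)
    for i in range(number_of_classes):
        weights[:, :, i, i] = kernel
    return weights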
Training: train.py
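train.py is likewise not reproduced; as a rough sketch of the per-pixel loss it presumably computes over the upsampled logits (the ground-truth tensor name annotation is an assumption):

# Hedged sketch: flatten per-pixel logits and one-hot labels, then average
# softmax cross-entropy over all pixels. `annotation` (the ground-truth
# class-index map, shape [batch, H, W]) is an assumed name.
flat_logits = tf.reshape(upsampled_logits, [-1, number_of_classes])
flat_labels = tf.reshape(tf.one_hot(annotation, number_of_classes),
                         [-1, number_of_classes])
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=flat_labels,
                                               logits=flat_logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)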
Inference: generate the semantic segmentation images and the CRF-refined images.
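The post does not show the CRF step either; below is a minimal sketch using the pydensecrf package (an assumption — the post does not name its CRF library) to refine the network's softmax output with a dense CRF:

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, n_classes=21, n_iters=5):
    # image: (H, W, 3) uint8 RGB; probs: (n_classes, H, W) softmax output
    h, w = image.shape[:2]
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Smoothness kernel (location only) and appearance kernel (location + color);
    # the sxy/srgb/compat values are typical defaults, not the post's settings.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w)  # refined class-index map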