Getting Started with Object Detection: Faster R-CNN in TensorFlow with TFFRCNN

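1. Data, code, and files to download:

Data: the Pascal VOC2007 dataset

2. Training and testing

Testing directly with the model already trained for the paper: demo.py (in the faster_rcnn folder)

  • Go into the lib folder and run make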
cd ./lib
make
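  • Create a model folder in the repository root and put the downloaded VGGnet_fast_rcnn_iter_70000.ckpt file in it
  • Move demo.py from the faster_rcnn folder to the repository root, then edit demo.py
# Add the following two lines below the existing imports
import glob
plt.switch_backend('agg')  # non-interactive backend, so figures can be saved without a display


# Change the last few lines to the following:
for im_name in im_names:
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Demo for {:s}'.format(im_name)
    demo(sess, net, im_name)
    plt.savefig(im_name)  # write the annotated figure to the path held in im_name

# plt.show()
  • Run demo.py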
python demo.py --model model/VGGnet_fast_rcnn_iter_70000.ckpt
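
The modified loop saves each figure to whatever path im_name holds. If im_name is the path of the input image, that save would replace the input, so it may be worth deriving a separate output path. Below is a minimal sketch of that idea. The helper name annotated_path, the demo_output directory, and the example file name are all made up for illustration and are not part of TFFRCNN.

```python
import os


def annotated_path(im_name, out_dir="demo_output"):
    """Return a path inside out_dir for the annotated copy of im_name.

    Both the function name and the default directory are hypothetical.
    """
    base = os.path.basename(im_name)
    stem, _ext = os.path.splitext(base)
    # Always use a .png name so the result does not depend on the input format.
    return os.path.join(out_dir, stem + "_det.png")


# Example with a made-up input file name.
print(annotated_path(os.path.join("data", "demo", "example.jpg")))
```

In the loop, this would be used as plt.savefig(annotated_path(im_name)), after creating the output directory once with os.makedirs.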

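Training your own model: train_net.py

  • Create a pretrain_model folder under the data folder and put the downloaded VGG_16.npy file in it
  • Put the downloaded VOC2007 dataset in the data folder and extract it, then rename the extracted folder to VOCdevkit2007
  • Run train_net.py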
python ./faster_rcnn/train_net.py --gpu 0 --restore 0 --weights /root/hujiahui/TFFRCNN-master/data/pretrain_model//VGG_16.npy --imdb voc_2007_trainval --iters 70000 --cfg /root/hujiahui/TFFRCNN-master/experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train --set EXP_DIR exp_dir
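
Most failed runs I have seen with this kind of setup come from a file sitting in the wrong folder, so a quick check before starting can save time. The sketch below only looks for the three locations named in the steps above, relative to the repository root. The function name missing_paths is hypothetical.

```python
import os

# Locations taken from the steps above, relative to the repository root.
EXPECTED = [
    os.path.join("model", "VGGnet_fast_rcnn_iter_70000.ckpt"),
    os.path.join("data", "pretrain_model", "VGG_16.npy"),
    os.path.join("data", "VOCdevkit2007"),
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for path in missing_paths("."):
        print("missing: " + path)
```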

3. Pitfalls I ran into:

 (1) tensorflow.python.framework.errors_impl.NotFoundError: ./lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE. To fix this, edit make.sh in the lib folder. After the change it looks like this:

#!/usr/bin/env bash
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC

CUDA_PATH=/usr/local/cuda/

cd roi_pooling_layer

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
#	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
	roi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
	-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
	
cd ..


# add building psroi_pooling layer
cd psroi_pooling_layer
nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc \
	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
#	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

cd ..

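  (2) tensorflow.python.framework.errors_impl.NotFoundError: ./faster_rcnn/../lib/psroi_pooling_layer/psroi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE. If the same error comes back, this time for psroi_pooling.so, make.sh in the lib folder needs one more change: the psroi_pooling build gets the same linker flags as roi_pooling. After the change it looks like this: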

#!/usr/bin/env bash
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC

CUDA_PATH=/usr/local/cuda/

cd roi_pooling_layer

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
#	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
	roi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
	-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
	
cd ..

# add building psroi_pooling layer
cd psroi_pooling_layer
nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
	psroi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
	-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64

## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
#	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

cd ..
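
The symbol in both error messages is a mangled C++ name, and decoding it makes the fix easier to follow. The sketch below is a toy decoder that handles only symbols of this exact shape (a typeinfo entry for a nested name). It is not a general demangler, and the function name demangle_typeinfo is made up. A real tool such as c++filt covers the full scheme.

```python
import re


def demangle_typeinfo(symbol):
    """Decode symbols of the form _ZTIN<len><name>...E only.

    A toy decoder for illustration; it does not cover the full
    Itanium C++ ABI mangling scheme.
    """
    prefix = "_ZTIN"
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        raise ValueError("unsupported symbol: " + symbol)
    body = symbol[len(prefix):-1]
    parts = []
    while body:
        # Each component is a decimal length followed by that many characters.
        match = re.match(r"\d+", body)
        if match is None:
            raise ValueError("unsupported symbol: " + symbol)
        length = int(match.group())
        start = match.end()
        parts.append(body[start:start + length])
        body = body[start + length:]
    return "typeinfo for " + "::".join(parts)


print(demangle_typeinfo("_ZTIN10tensorflow8OpKernelE"))
# prints: typeinfo for tensorflow::OpKernel
```

So the .so file cannot find the type information for tensorflow::OpKernel when it is loaded. The modified make.sh adds -L $TF_LIB -ltensorflow_framework to the g++ line, which suggests that this symbol is provided by the tensorflow_framework library in the installed TensorFlow.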

(3) TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType…… (the proper fix looked like a hassle to me, so I simply changed the code). Around line 155 of lib/fast_rcnn/train.py:

# load vgg16
if self.pretrained_model is not None and not restore:
    print 'Loading pretrained model weights from {:s}'.format(self.pretrained_model)
    self.net.load(self.pretrained_model, sess, True)
    # Original try/except, commented out as the workaround:
    # try:
    #     print 'Loading pretrained model weights from {:s}'.format(self.pretrained_model)
    #     self.net.load(self.pretrained_model, sess, True)
    # except:
    #     raise 'Check your pretrained model {:s}'.format(self.pretrained_model)

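(4) If every layer gets skipped during training, i.e. the log is full of ignore……, the downloaded VGG_16.npy is laid out somewhat differently from the VGG_imagenet.npy that the paper's code expects. The load function in lib/networks/network.py needs a small change: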

def load(self, data_path, session, ignore_missing=False):
    # data_dict maps each layer name to that layer's parameters
    data_dict = np.load(data_path).item()
    for key in data_dict:
        with tf.variable_scope(key, reuse=True):
            for subkey in data_dict[key]:
                try:
                    # Original lookup by subkey name:
                    # var = tf.get_variable(subkey)
                    # session.run(var.assign(data_dict[key][subkey]))
                    # print "assign pretrain model "+subkey+ " to "+key
                    # Modified: index 0 holds the weights, index 1 the biases
                    var = tf.get_variable("weights")
                    session.run(var.assign(data_dict[key][0]))
                    var = tf.get_variable("biases")
                    session.run(var.assign(data_dict[key][1]))
                    print "assign pretrain model to " + key
                except ValueError:
                    print "ignore " + key
                    if not ignore_missing:
                        raise
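
The difference between the two files, as far as I can tell from the code, is how each layer's parameters are stored: the original lookup goes by name, while the modified one goes by position. The sketch below shows one way to accept both layouts. Both layouts are my assumption based on the two versions of load, and layer_params and the layer name conv1_1 are only illustrative.

```python
def layer_params(entry):
    """Return (weights, biases) from either assumed .npy layout.

    Layout A (assumed for VGG_imagenet.npy): {"weights": w, "biases": b}
    Layout B (assumed for the downloaded file): [w, b]
    """
    if isinstance(entry, dict):
        return entry["weights"], entry["biases"]
    weights, biases = entry
    return weights, biases


# Plain lists stand in for the NumPy arrays stored in the real files.
layout_a = {"conv1_1": {"weights": [[1.0, 2.0]], "biases": [0.1]}}
layout_b = {"conv1_1": [[[1.0, 2.0]], [0.1]]}

for data_dict in (layout_a, layout_b):
    for key in data_dict:
        weights, biases = layer_params(data_dict[key])
        print("{} {} {}".format(key, weights, biases))
```

With a helper like this, load could assign weights and biases once per layer without hard-coding which file format was downloaded.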

(5) Various pieces of the environment are missing, such as yaml and skimage:

sudo apt-get install python-skimage
sudo apt-get install python-yaml
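
To find out what is missing before a long run fails halfway, you can try importing each module up front. A small sketch follows, checking only the two modules named above. The function name missing_modules is made up.

```python
import importlib


def missing_modules(names):
    """Return the module names from names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# yaml and skimage are the two modules named above.
for name in missing_modules(["yaml", "skimage"]):
    print("missing Python module: " + name)
```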