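Getting Started with Object Detection: Faster R-CNN in TensorFlow (TFFRCNN)
1. What you need to download (data, code, files):
Data: the Pascal VOC2007 dataset
2. Training and testing
Testing directly with the pretrained model from the paper: demo.py (in the faster_rcnn folder)
- Go into the lib folder and run make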
cd ./lib
make
- Create a model folder in the repository root and put the downloaded VGGnet_fast_rcnn_iter_70000.ckpt file in it
- Move demo.py from the faster_rcnn folder to the repository root, then edit demo.py
# Add the following two lines below the imports
import glob
plt.switch_backend('agg')
# Change the last few lines to the following:
for im_name in im_names:
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Demo for {:s}'.format(im_name)
    demo(sess, net, im_name)
    plt.savefig(im_name)
# plt.show()
- Run demo.py
python demo.py --model model/VGGnet_fast_rcnn_iter_70000.ckpt
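One thing to watch in the modified loop: `plt.savefig(im_name)` writes to whatever path `im_name` holds, so if `im_name` happens to be the path of the input image, the result would overwrite it. Below is a minimal sketch of one way around that, assuming you want the results in a separate folder. `result_path` and the `demo_out` folder name are hypothetical and not part of TFFRCNN.

```python
import os


def result_path(im_name, out_dir="demo_out"):
    """Return a path inside out_dir for the annotated copy of im_name.

    Hypothetical helper: keeps the original file name but swaps the
    directory and tags the stem, so input and output never collide.
    """
    stem, ext = os.path.splitext(os.path.basename(im_name))
    # Fall back to .png when the input name has no extension.
    return os.path.join(out_dir, stem + "_det" + (ext or ".png"))


# Example with a made-up input path.
print(result_path(os.path.join("data", "demo", "000456.jpg")))
```

In demo.py one could then create the folder once with `os.makedirs` and call `plt.savefig(result_path(im_name))` instead of `plt.savefig(im_name)`.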
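Training it yourself: train_net.py
- Create a pretrain_model folder under the data folder and put the downloaded VGG_16.npy file in it
- Put the downloaded VOC2007 dataset in the data folder, extract it, and rename the extracted folder to VOCdevkit2007
- Run train_net.py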
python ./faster_rcnn/train_net.py --gpu 0 --restore 0 \
    --weights /root/hujiahui/TFFRCNN-master/data/pretrain_model/VGG_16.npy \
    --imdb voc_2007_trainval --iters 70000 \
    --cfg /root/hujiahui/TFFRCNN-master/experiments/cfgs/faster_rcnn_end2end.yml \
    --network VGGnet_train --set EXP_DIR exp_dir
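Before launching a long training run, it can help to confirm that the files from the steps above are where the command expects them. Here is a small sketch, assuming the layout described in this post (`data/pretrain_model/VGG_16.npy`, `data/VOCdevkit2007`, and the cfg file under `experiments/cfgs`). `missing_training_files` is a hypothetical helper, not something shipped with TFFRCNN.

```python
import os

# Paths relative to the repository root, taken from the steps in this post.
REQUIRED = [
    os.path.join("data", "pretrain_model", "VGG_16.npy"),
    os.path.join("data", "VOCdevkit2007"),
    os.path.join("experiments", "cfgs", "faster_rcnn_end2end.yml"),
]


def missing_training_files(root, required=REQUIRED):
    """Return the entries of `required` that do not exist under `root`."""
    return [rel for rel in required
            if not os.path.exists(os.path.join(root, rel))]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is present.
    for rel in missing_training_files("."):
        print("missing: " + rel)
```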
3. Pitfalls I ran into:
(1) tensorflow.python.framework.errors_impl.NotFoundError: ./lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE. The make.sh file in the lib folder needs to be modified; after the change it looks like this:
#!/usr/bin/env bash
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC
CUDA_PATH=/usr/local/cuda/
cd roi_pooling_layer
nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
# roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
roi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
cd ..
# add building psroi_pooling layer
cd psroi_pooling_layer
nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc \
psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
# psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
cd ..
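(2) tensorflow.python.framework.errors_impl.NotFoundError: ./faster_rcnn/../lib/psroi_pooling_layer/psroi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE. If the error shows up again, this time for psroi_pooling.so, the make.sh file in the lib folder needs a further change so that the psroi_pooling build links the same way; after the change it looks like this: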
#!/usr/bin/env bash
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC
CUDA_PATH=/usr/local/cuda/
cd roi_pooling_layer
nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
# roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
roi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
cd ..
# add building psroi_pooling layer
cd psroi_pooling_layer
nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
psroi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
# psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
cd ..
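The undefined symbol in both errors can be read directly. Under the Itanium C++ ABI name mangling that g++ uses, `_ZTI` means "typeinfo for", and what follows is a nested name built from length-prefixed parts. The sketch below decodes only that simple case (it is not a general demangler; `c++filt` does the full job) and shows that the loader was looking for the typeinfo of `tensorflow::OpKernel`. That is consistent with the fix: the modified make.sh adds `-L $TF_LIB -ltensorflow_framework` to the g++ line, presumably because the symbol is provided by that library.

```python
def demangle_typeinfo(symbol):
    """Decode '_ZTIN<len><name>...E' into 'typeinfo for a::b'.

    Handles only typeinfo symbols for plain nested names; anything
    else raises ValueError.
    """
    prefix = "_ZTIN"
    if not symbol.startswith(prefix) or not symbol.endswith("E"):
        raise ValueError("unsupported symbol: " + symbol)
    body = symbol[len(prefix):-1]
    parts = []
    pos = 0
    while pos < len(body):
        # Read the decimal length prefix of the next name component.
        start = pos
        while pos < len(body) and body[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length prefix in: " + symbol)
        length = int(body[start:pos])
        name = body[pos:pos + length]
        if len(name) != length:
            raise ValueError("truncated name in: " + symbol)
        parts.append(name)
        pos += length
    if not parts:
        raise ValueError("no name components in: " + symbol)
    return "typeinfo for " + "::".join(parts)


print(demangle_typeinfo("_ZTIN10tensorflow8OpKernelE"))
# prints: typeinfo for tensorflow::OpKernel
```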
(3) TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType ... (the proper fix looked like too much trouble to me, so I simply changed the code). Around line 155 of lib/fast_rcnn/train.py:
# load vgg16
if self.pretrained_model is not None and not restore:
    print 'Loading pretrained model weights from {:s}'.format(self.pretrained_model)
    self.net.load(self.pretrained_model, sess, True)
    # Original try/except, commented out so loading errors surface directly:
    # try:
    #     print 'Loading pretrained model weights from {:s}'.format(self.pretrained_model)
    #     self.net.load(self.pretrained_model, sess, True)
    # except:
    #     raise 'Check your pretrained model {:s}'.format(self.pretrained_model)
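If you would rather keep the try/except than remove it, note that `raise` in Python only accepts an exception class or instance. Raising a plain string, as the commented-out `raise 'Check your pretrained model ...'` line does, fails with a TypeError of its own and hides the real problem. Below is a minimal sketch of the same pattern with a real exception type. `load_pretrained` and `fake_load` are hypothetical stand-ins, not TFFRCNN functions.

```python
def load_pretrained(load_fn, path):
    """Call load_fn(path); on failure raise a RuntimeError naming the file."""
    try:
        return load_fn(path)
    except Exception as err:
        # Raise a proper exception object and keep the original message.
        raise RuntimeError(
            "Check your pretrained model {:s}: {}".format(path, err))


def fake_load(path):
    # Stand-in for self.net.load(...); always fails for demonstration.
    raise IOError("file not found")


try:
    load_pretrained(fake_load, "VGG_16.npy")
except RuntimeError as err:
    print(err)
```

With this shape the message still points at the model file, and the underlying cause stays visible instead of being replaced by an unrelated TypeError.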
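(4) If every network layer is skipped during training (the log prints "ignore ..."), the downloaded VGG16.npy differs somewhat from the VGG_imagenet.npy that the paper calls for, and the load function in lib/networks/network.py needs a small change: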
def load(self, data_path, session, ignore_missing=False):
    data_dict = np.load(data_path).item()
    for key in data_dict:
        with tf.variable_scope(key, reuse=True):
            for subkey in data_dict[key]:
                try:
                    # Original name-based lookup:
                    # var = tf.get_variable(subkey)
                    # session.run(var.assign(data_dict[key][subkey]))
                    # print "assign pretrain model "+subkey+ " to "+key
                    # Position-based lookup: index 0 holds the weights, index 1 the biases
                    var = tf.get_variable("weights")
                    session.run(var.assign(data_dict[key][0]))
                    var = tf.get_variable("biases")
                    session.run(var.assign(data_dict[key][1]))
                    print "assign pretrain model " + " to " + key
                except ValueError:
                    print "ignore " + key
                    if not ignore_missing:
                        raise
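The change above swaps the name-based lookup (`data_dict[key][subkey]`) for a position-based one (`data_dict[key][0]` for the weights, `data_dict[key][1]` for the biases). To see which layout a given .npy file uses before editing `load`, a sketch like the following may help. It works on the already-loaded dictionary; `describe_layout` is a hypothetical helper and the sample data is made up for illustration.

```python
def describe_layout(data_dict):
    """Map each layer name to 'named', 'positional', or 'unknown'."""
    layout = {}
    for layer, params in data_dict.items():
        if isinstance(params, dict):
            layout[layer] = "named"        # e.g. {'weights': ..., 'biases': ...}
        elif isinstance(params, (list, tuple)) and len(params) == 2:
            layout[layer] = "positional"   # e.g. [weights, biases]
        else:
            layout[layer] = "unknown"
    return layout


# Made-up miniature data; a real file would first be loaded with
# np.load(path).item(), as load() does.
sample = {
    "conv1_1": {"weights": [0.1], "biases": [0.0]},
    "conv1_2": [[0.2], [0.0]],
}
print(describe_layout(sample))
```

If the layers of the downloaded file come back as "positional", that matches the situation this fix addresses.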
(5) Various dependencies are missing from the environment, such as yaml and skimage:
sudo apt-get install python-skimage
sudo apt-get install python-yaml
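After installing the packages, a quick way to confirm they are importable from the Python interpreter you use for training is a check like the one below. `missing_modules` is a hypothetical helper. The module names `yaml` and `skimage` are the import names of the two packages mentioned above; extend the list with anything else your setup complains about.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means both modules were found.
print(missing_modules(["yaml", "skimage"]))
```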