Training and inference with Faster R-CNN (VGG16) under Caffe
I frequently use py-faster-rcnn at work for image detection and recognition, so the training process is worth writing down. Below is a summary based on material from around the web, verified in practice:
py-faster-rcnn GitHub repository: https://github.com/rbgirshick/py-faster-rcnn
The data follows the VOC 2007 format.
1. Preparing the dataset
Tools: a VOC2007 directory layout and the labelImg annotation tool.
Workflow: rename the images to six-digit numbers, annotate them with labelImg, generate the four split files (train.txt, val.txt, test.txt, trainval.txt) from the xml annotations, and place the jpg, xml, and txt files in the locations shown in the layout diagram.
A utility class for generating these files can be found here:
2. Modifying the network files
train.prototxt:
models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt (the VGG16 train.prototxt)
Line 11: change 'num_classes': 2 to the number of damage classes + 1 (background counts as one class)
Line 530: change 'num_classes': 2 to the number of damage classes + 1 (background counts as one class)
Line 620: change num_output: 2 to the number of damage classes + 1 (background counts as one class)
Line 643: change num_output: 8; this value should be (number of damage classes + 1) * 4, where 4 is the number of bounding-box coordinates regressed per class
test.prototxt:
models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt (the VGG16 test.prototxt)
Line 567: change num_output: 2 to the number of damage classes + 1 (background counts as one class)
Line 592: change num_output: 8; this value should be (number of damage classes + 1) * 4, where 4 is the number of bounding-box coordinates regressed per class
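The relationship between the class count and the numbers edited above can be captured in a small helper. This is an illustrative sketch (the function name is mine, not part of py-faster-rcnn):

```python
def frcnn_layer_dims(num_damage_types):
    """Given K custom classes, return the three values to edit in the prototxts."""
    num_classes = num_damage_types + 1            # +1 for the __background__ class
    return {
        'num_classes': num_classes,               # lines 11 and 530 of train.prototxt
        'cls_score_num_output': num_classes,      # line 620 (train) / 567 (test)
        'bbox_pred_num_output': 4 * num_classes,  # line 643 (train) / 592 (test)
    }
```

For a single class, this reproduces the values 2 and 8 shown above.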
pascal_voc.py:
lib/datasets/pascal_voc.py: change the class list at line 31 to your custom classes
3. Running the training
Remember to clear the cache whenever the data changes: rm -rf data/cache
From a terminal in the py-faster-rcnn directory, run:
./experiments/scripts/faster_rcnn_end2end.sh 0 VGG16 pascal_voc
Here 0 means run on GPU 0 (change as needed), and VGG16 is the network to use.
4. Inference
Straight to the code:
#!/usr/bin/env python
# Inference demo for the trained model (Python 2, as required by py-faster-rcnn).
import _init_paths
from fast_rcnn.config import cfg
from fast_rcnn.test import im_detect
from fast_rcnn.nms_wrapper import nms
import numpy as np
import caffe, os, cv2

CLASSES = ('__background__',
           'hand')

def get_detections(dets, thresh=0.5):
    """Return the [x1, y1, x2, y2] boxes whose score is >= thresh, or None."""
    inds = np.where(dets[:, -1] >= thresh)[0]
    if len(inds) == 0:
        return None
    bboxs = []
    for i in inds:
        bbox = dets[i, :4]
        bboxs.append([int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])])
    return bboxs

def frcn_predict(net, img_im):
    # Detect all object classes and regress object bounds
    scores, boxes = im_detect(net, img_im)
    # Collect detections for each class
    CONF_THRESH = 0.65
    NMS_THRESH = 0.15
    res_dict = {}
    for cls_ind, cls in enumerate(CLASSES[1:]):
        cls_ind += 1  # because we skipped background
        cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes,
                          cls_scores[:, np.newaxis])).astype(np.float32)
        keep = nms(dets, NMS_THRESH)
        dets = dets[keep, :]
        boxs = get_detections(dets, thresh=CONF_THRESH)
        if boxs is not None:
            res_dict[cls] = boxs
    return res_dict

def get_init_net():
    cfg.TEST.HAS_RPN = True  # Use RPN for proposals
    prototxt = r'models/online/models/pascal_voc/VGG16/faster_rcnn_end2end/hand_test.prototxt'
    caffemodel = r'models/online/faster_rcnn_models/vgg16_faster_rcnn_hand_iter_500000.caffemodel'
    if not os.path.isfile(caffemodel):
        raise IOError('{:s} not found.\n'.format(caffemodel))
    caffe.set_mode_gpu()
    caffe.set_device(0)
    cfg.GPU_ID = 0
    net = caffe.Net(prototxt, caffemodel, caffe.TEST)
    print '\n\nLoaded network {:s}'.format(caffemodel)
    # Warm up on a dummy image
    im = 128 * np.ones((300, 500, 3), dtype=np.uint8)
    for i in xrange(2):
        _, _ = im_detect(net, im)
    return net

if __name__ == '__main__':
    net = get_init_net()
    img_path = r'data/VOCdevkit/VOC2007_lisa/JPEGImages/5500.jpg'
    im = cv2.imread(img_path)
    res = frcn_predict(net, im)
    print(res)
Example of the returned res; the format is {label: [box1, box2, …]}, one [x1, y1, x2, y2] list per detection:
{'hand': [[482, 347, 570, 438], [52, 289, 147, 362], [104, 261, 273, 375]]}
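To visualize such a result, the boxes can be overlaid on the image. Below is a minimal numpy-only sketch (avoiding a cv2 dependency so it runs anywhere); the helper name draw_boxes is mine:

```python
import numpy as np

def draw_boxes(im, res_dict, color=(0, 255, 0)):
    """Paint 2-pixel borders for every [x1, y1, x2, y2] box in res_dict
    onto a copy of the HxWx3 image im; the original is left untouched."""
    out = im.copy()
    for boxes in res_dict.values():
        for x1, y1, x2, y2 in boxes:
            out[y1:y2 + 1, x1:x1 + 2] = color              # left edge
            out[y1:y2 + 1, max(x2 - 1, 0):x2 + 1] = color  # right edge
            out[y1:y1 + 2, x1:x2 + 1] = color              # top edge
            out[max(y2 - 1, 0):y2 + 1, x1:x2 + 1] = color  # bottom edge
    return out
```

With cv2 available, cv2.rectangle and cv2.putText would do the same job and add the class label.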