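How to fine-tune YOLOv2?
In the previous post, "Training on the VOC dataset with the YOLOv2 model", we tried training YOLOv2 on the VOC dataset. But I want to train on my own dataset, so how do we fine-tune YOLOv2? Let's go through it step by step.
1 Preparing the data
1.1 Create the directory structure
First, create a folder named fddb2016 under darknet/data, with the following layout: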
--fddb2016
    --Annotations
        2002_07_19_big_img_130.xml
        2002_07_25_big_img_84.xml
        2002_08_01_big_img_1445.xml
        2002_08_08_big_img_277.xml
        2002_08_16_big_img_637.xml
        2002_08_25_big_img_199.xml
        2003_01_01_big_img_698.xml
        .
        .
        .
    --ImageSets
        --Main
            test.txt
            trainval.txt
    --JPEGImages
        2002_07_19_big_img_130.jpg
        2002_07_25_big_img_84.jpg
        2002_08_01_big_img_1445.jpg
        2002_08_08_big_img_277.jpg
        2002_08_16_big_img_637.jpg
        2002_08_25_big_img_199.jpg
        2003_01_01_big_img_698.jpg
        .
        .
        .
    --labels
trainval.txt stores the image names, one per line and without the file extension. Let's take a look:
2002_08_11_big_img_591
2002_08_26_big_img_265
2002_07_19_big_img_423
2002_08_24_big_img_490
2002_08_31_big_img_17676
2002_07_31_big_img_228
.
.
.
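If the two list files under ImageSets/Main do not exist yet, they can be generated from the image folder. The following is only a minimal sketch and not part of the original workflow: the function names `split_image_ids` and `write_image_sets`, the 10% test fraction, and the fixed random seed are all assumptions chosen for illustration.

```python
import os
import random


def split_image_ids(filenames, test_fraction=0.1, seed=0):
    """Split image file names into (trainval_ids, test_ids), extensions removed.

    test_fraction and seed are illustrative assumptions, not values from the post.
    """
    # Keep only .jpg files and strip the extension, as trainval.txt expects.
    ids = sorted(os.path.splitext(name)[0] for name in filenames
                 if name.lower().endswith('.jpg'))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return sorted(ids[n_test:]), sorted(ids[:n_test])


def write_image_sets(root='fddb2016'):
    """Write ImageSets/Main/trainval.txt and test.txt under the dataset root."""
    main_dir = os.path.join(root, 'ImageSets', 'Main')
    os.makedirs(main_dir, exist_ok=True)
    trainval, test = split_image_ids(os.listdir(os.path.join(root, 'JPEGImages')))
    for name, ids in (('trainval.txt', trainval), ('test.txt', test)):
        with open(os.path.join(main_dir, name), 'w') as f:
            f.write('\n'.join(ids) + '\n')
```

Calling `write_image_sets()` from darknet/data would then produce both list files. Whether a random split is appropriate depends on your data.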
1.2 xml2txt
Because YOLO reads its labels from txt files, we need to convert the XML annotations to txt format. The program is shown below:
import os
import xml.etree.ElementTree as ET

import cv2

# Run this script from darknet/data, the directory that contains fddb2016/.
classes = ["face"]


def convert(size, box):
    """Convert (xmin, xmax, ymin, ymax) in pixels to normalized (x_center, y_center, w, h)."""
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(w, h, image_id):
    """Write one YOLO label file (class x y w h per line) for the given image."""
    tree = ET.parse('fddb2016/Annotations/%s.xml' % image_id)
    root = tree.getroot()
    with open('fddb2016/labels/%s.txt' % image_id, 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip unknown classes and objects marked as difficult.
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


wd = os.getcwd()
if not os.path.exists('fddb2016/labels/'):
    os.makedirs('fddb2016/labels/')

with open('fddb2016/ImageSets/Main/trainval.txt') as f:
    image_ids = f.read().strip().split()

with open('fddb2016_train.txt', 'w') as list_file:
    for image_id in image_ids:
        image_path = '%s/fddb2016/JPEGImages/%s.jpg' % (wd, image_id)
        list_file.write(image_path + '\n')
        # Read the image to get its actual width and height.
        image = cv2.imread(image_path)
        h, w, c = image.shape
        convert_annotation(w, h, image_id)
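To make the label format concrete: each line of a label file is `class x_center y_center width height`, where the four box values are divided by the image width or height. The sketch below redoes that arithmetic for one made-up box and converts it back to pixels as a sanity check. The helper names `to_yolo` and `to_pixels` and the example numbers are hypothetical and not taken from the FDDB data.

```python
def to_yolo(img_w, img_h, xmin, xmax, ymin, ymax):
    """Pixel box -> normalized (x_center, y_center, width, height)."""
    return ((xmin + xmax) / 2.0 / img_w,
            (ymin + ymax) / 2.0 / img_h,
            (xmax - xmin) / float(img_w),
            (ymax - ymin) / float(img_h))


def to_pixels(img_w, img_h, x, y, w, h):
    """Normalized (x_center, y_center, width, height) -> (xmin, xmax, ymin, ymax)."""
    return ((x - w / 2.0) * img_w, (x + w / 2.0) * img_w,
            (y - h / 2.0) * img_h, (y + h / 2.0) * img_h)


# Hypothetical example: a 400x200 image with a face at x 100..300, y 50..150.
label = to_yolo(400, 200, 100, 300, 50, 150)
print("0 " + " ".join(str(v) for v in label))   # one line of a label file
print(to_pixels(400, 200, *label))              # back to pixel coordinates
```

If the round trip does not return the original box for one of your own annotations, the coordinate order passed to the conversion is the first thing to check.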
2 Fine tuning
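2.1 Modify the .cfg file
If you want to use the 22-layer model, modify cfg/yolo-voc.cfg; if you want the 9-layer model, modify cfg/tiny-yolo-voc.cfg. Both are modified in the same way, so we use yolo-voc.cfg as the example:
Copy the cfg file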
$ cp cfg/yolo-voc.cfg cfg/yolo-fddb.cfg
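Open yolo-fddb.cfg and make the following changes
a. Change learning_rate=0.0001 to learning_rate=0.00005
b. Change max_batches = 45000 to max_batches = 200000
c. Change classes=20 to classes=1
d. In the last [convolutional] layer, change filters=125 to filters=30. filters is computed with the formula below; adjust it to the number of classes in your own data
filters = num * (classes + coords + 1) = 5 * (1 + 4 + 1) = 30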
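The formula is easy to re-evaluate for a different number of classes. A minimal sketch follows; the function name `region_filters` is hypothetical, and the defaults num=5 and coords=4 are the values from the [region] section of this cfg.

```python
def region_filters(classes, num=5, coords=4):
    """Filters for the last [convolutional] layer before a [region] layer."""
    # Each of the `num` anchor boxes predicts `coords` box values,
    # one objectness score, and one score per class.
    return num * (classes + coords + 1)


print(region_filters(1))    # 1 class (face) -> 30
print(region_filters(20))   # 20 classes (VOC) -> 125
```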
The final result looks like this:
[net]
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.00005
max_batches = 200000
policy=steps
steps=100,25000,35000
scales=10,.1,.1
.
.
.
[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear
[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=0
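A mismatch between filters and the [region] settings is an easy mistake to make when editing by hand, so a small consistency check can help. This is a rough sketch that assumes the cfg follows the simple `[section]` / `key=value` layout shown above; `parse_cfg` and `check_region_filters` are hypothetical helpers and not part of darknet.

```python
def parse_cfg(text):
    """Parse darknet-style cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in '#;':
            continue
        if line.startswith('[') and line.endswith(']'):
            sections.append((line[1:-1], {}))
        elif '=' in line and sections:
            key, value = line.split('=', 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_region_filters(text):
    """Return (actual, expected) filters for the layer feeding the [region] layer."""
    sections = parse_cfg(text)
    for i, (name, opts) in enumerate(sections):
        if name == 'region' and i > 0:
            expected = int(opts['num']) * (int(opts['classes']) + int(opts['coords']) + 1)
            actual = int(sections[i - 1][1]['filters'])
            return actual, expected
    raise ValueError('no [region] section found')


# Shortened sample using the values from this post.
sample = """
[convolutional]
size=1
filters=30
activation=linear

[region]
classes=1
coords=4
num=5
"""
actual, expected = check_region_filters(sample)
print(actual == expected)
```

To check the real file, read cfg/yolo-fddb.cfg into a string and pass it to `check_region_filters`.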
2.2 Modify the voc.names file
Copy the voc.names file
$ cp data/voc.names data/fddb.names
Edit fddb.names so that it reads:
face
2.3 Modify the voc.data file
Copy the voc.data file
$ cp cfg/voc.data cfg/fddb.data
Edit fddb.data so that it reads:
classes= 1
train = /home/usrname/darknet-v2/data/fddb2016_train.txt
valid = /home/pjreddie/data/voc/2007_test.txt
names = data/fddb.names
backup = /home/guoyana/my_files/local_install/darknet-v2/backup
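Since classes in the .data file must match the number of lines in the .names file, one option is to generate both from a single class list. The sketch below is an illustration only: `render_data_file` and `render_names_file` are hypothetical helpers, and the paths passed in are placeholders to be replaced with your own.

```python
def render_names_file(class_names):
    """Return the text of a .names file: one class name per line."""
    return '\n'.join(class_names) + '\n'


def render_data_file(class_names, train_list, valid_list, names_path, backup_dir):
    """Return the text of a darknet .data file using the keys shown above."""
    lines = [
        'classes= %d' % len(class_names),
        'train = %s' % train_list,
        'valid = %s' % valid_list,
        'names = %s' % names_path,
        'backup = %s' % backup_dir,
    ]
    return '\n'.join(lines) + '\n'


# Placeholder paths for illustration only; substitute your own.
class_names = ['face']
text = render_data_file(class_names, 'data/fddb2016_train.txt',
                        'data/fddb2016_test.txt', 'data/fddb.names', 'backup')
print(render_names_file(class_names))
print(text)
```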
3 Start training
YOLOv2 now supports multiple GPUs. To train starting from the weights obtained on the VOC dataset, run the following command
./darknet detector train ./cfg/fddb.data ./cfg/yolo-fddb.cfg backup/yolo-voc_6000.weights -gpus 0,1,2,3
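4 Results
There is a problem with step 3: pretrained models are usually image-classification models, not models trained for detection. So the approach above is flawed, and the loss stopped decreasing once it had dropped to 0.1. In the end I trained the network without a pretrained model; the results after 18,000 iterations are shown below (note: the images come from Baidu Images)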