YOLO配置文件理解
[net] batch=64 每batch個樣本更新一次參數。 subdivisions=8 如果內存不夠大,將batch分割為subdivisions個子batch,每個子batch的大小為batch/subdivisions。 在darknet代碼中,會將batch/subdivisions命名為batch。 height=416 input圖像的高 width=416 Input圖像的寬 channels=3 Input圖像的通道數 momentum=0.9 動量 decay=0.0005 權重衰減正則項,防止過擬合 angle=0 通過旋轉角度來生成更多訓練樣本 saturation = 1.5 通過調整飽和度來生成更多訓練樣本 exposure = 1.5 通過調整曝光量來生成更多訓練樣本 hue=.1 通過調整色調來生成更多訓練樣本 learning_rate=0.0001 初始學習率 max_batches = 45000 訓練達到max_batches後停止學習 policy=steps 調整學習率的policy,有如下policy:CONSTANT, STEP, EXP, POLY, STEPS, SIG, RANDOM steps=100,25000,35000 根據batch_num調整學習率 scales=10,.1,.1 學習率變化的比例,累計相乘 [convolutional] batch_normalize=1 是否做BN filters=32 輸出多少個特征圖 size=3 卷積核的尺寸 stride=1 做卷積運算的步長 pad=1 如果pad為0,padding由 padding參數指定。如果pad為1,padding大小為size/2 activation=leaky 激活函數: logistic,loggy,relu,elu,relie,plse,hardtan,lhtan,linear,ramp,leaky,tanh,stair [maxpool] size=2 池化層尺寸 stride=2 池化步進 [convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky [maxpool] size=2 stride=2 ...... ...... ####### [convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky [convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky [route] the route layer is to bring finer grained features in from earlier in the network layers=-9 [reorg] the reorg layer is to make these features match the feature map size at the later layer. The end feature map is 13x13, the feature map from earlier is 26x26x512. The reorg layer maps the 26x26x512 feature map onto a 13x13x2048 feature map so that it can be concatenated with the feature maps at 13x13 resolution. stride=2 [route] layers=-1,-3 [convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky [convolutional] size=1 stride=1 pad=1 filters=125 region前最後一個卷積層的filters數是特定的,計算公式為filter=num*(classes+5) 5的意義是5個坐標,論文中的tx,ty,tw,th,to activation=linear [region] anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 預選框,可以手工挑選, 也可以通過k means 從訓練樣本中學出 bias_match=1 classes=20 網絡需要識別的物體種類數 coords=4 每個box的4個坐標tx,ty,tw,th num=5 每個grid cell預測幾個box softmax=1 使用softmax做激活函數 jitter=.2 通過抖動增加噪聲來抑制過擬合 rescore=1 暫理解為一個開關,非0時通過重打分來調整l.delta(預測值與真實值的差) object_scale=5 暫理解為計算損失時預測框中有物體時的權重 noobject_scale=1 暫理解為計算損失時預測框中無物體時的權重 class_scale=1 暫理解為計算類別損失時的權重 coord_scale=1 暫理解為計算損失時坐標偏差的權重 absolute=1 thresh = .6 random=0 是否隨機確定最後一個預測框
darknet對應代碼
找到cfg文件解析的代碼,選擇detector demo 作為入口
darknet.c文件 main 函數開始
} else if (0 == strcmp(argv[1], "detector")){
run_detector(argc, argv);
Detector.c文件 run_detector函數
char *prefix = find_char_arg(argc, argv, "-prefix", 0); float thresh = find_float_arg(argc, argv, "-thresh", .24); float hier_thresh = find_float_arg(argc, argv, "-hier", .5); int cam_index = find_int_arg(argc, argv, "-c", 0); int frame_skip = find_int_arg(argc, argv, "-s", 0); if(argc < 4){ fprintf(stderr, "usage: %s %s [train/test/valid] [cfg] [weights (optional)]\n", argv[0], argv[1]); return; } char *gpu_list = find_char_arg(argc, argv, "-gpus", 0); char *outfile = find_char_arg(argc, argv, "-out", 0); ...... ...... else if(0==strcmp(argv[2], "demo")) { list *options = read_data_cfg(datacfg); int classes = option_find_int(options, "classes", 20); char *name_list = option_find_str(options, "names", "data/names.list"); char **names = get_labels(name_list); demo(cfg, weights, thresh, cam_index, filename, names, classes, frame_skip, prefix, hier_thresh); }
read_data_cfg函數解析配置文件,保存到options指針。
class
int classes = option_find_int(options, "classes", 20);
classes為YOLO可識別的種類數
batch、learning_rate、momentum、decay和 subdivisions
demo.c文件demo函數
net = parse_network_cfg(cfgfile);
Parser.c文件 parse_network_cfg函數
list *sections = read_cfg(filename); node *n = sections->front; if(!n) error("Config file has no sections"); network net = make_network(sections->size - 1); net.gpu_index = gpu_index; size_params params; section *s = (section *)n->val; list *options = s->options; if(!is_network(s)) error("First section must be [net] or [network]"); parse_net_options(options, &net);
parse_net_options函數
net->batch = option_find_int(options, "batch",1);
net->learning_rate = option_find_float(options, "learning_rate", .001);
net->momentum = option_find_float(options, "momentum", .9);
net->decay = option_find_float(options, "decay", .0001);
int subdivs = option_find_int(options, "subdivisions",1);
net->time_steps = option_find_int_quiet(options, "time_steps",1);
net->batch /= subdivs;
net->batch *= net->time_steps;
net->subdivisions = subdivs;
learning_rate為初始學習率,訓練時的真正學習率和學習率的策略及初始學習率有關。
momentum為動量,在訓練時加入動量可以幫助走出local minima 以及saddle point。
decay是權重衰減正則項,用來防止過擬合。
batch的值等於cfg文件中的batch/subdivisions 再乘以time_steps。
time_steps在yolo默認的cfg中是沒有配置的,所以是默認值1。
因此batch可以認為就是cfg文件中的batch/subdivisions。
前面有提到batch的意義是每batch個樣本更新一次參數。
而subdivisions的意義在於降低對GPU memory的要求。
darknet將batch分割為subdivisions個子batch,每個子batch的大小為batch/subdivisions,並將子batch命名為batch。
我們看下訓練時和batch有關的代碼
Detector.c文件的train_detector函數
#ifdef GPU
if(ngpus == 1){
loss = train_network(net, train);
} else {
loss = train_networks(nets, ngpus, train, 4);
}
#else
loss = train_network(net, train);
#endif
Network.c文件的train_network函數
int batch = net.batch;
int n = d.X.rows / batch;
float *X = calloc(batch*d.X.cols, sizeof(float));
float *y = calloc(batch*d.y.cols, sizeof(float));
int i;
float sum = 0;
for(i = 0; i < n; ++i){
get_next_batch(d, batch, i*batch, X, y);
float err = train_network_datum(net, X, y);
sum += err;
}
train_network_datum函數
*net.seen += net.batch;
......
......
forward_network(net, state);
backward_network(net, state);
float error = get_network_cost(net);
if(((*net.seen)/net.batch)%net.subdivisions == 0) update_network(net);
我們看到,只有((*net.seen)/net.batch)%net.subdivisions == 0時才會更新網絡參數。
*net.seen是已經訓練過的子batch數,((*net.seen)/net.batch)%net.subdivisions的意義正是已經訓練過了多少個真正的batch。
policy、steps和scales
Parser.c文件 parse_network_cfg函數
char *policy_s = option_find_str(options, "policy", "constant");
net->policy = get_policy(policy_s);
net->burn_in = option_find_int_quiet(options, "burn_in", 0);
if(net->policy == STEP){
net->step = option_find_int(options, "step", 1);
net->scale = option_find_float(options, "scale", 1);
} else if (net->policy == STEPS){
char *l = option_find(options, "steps");
char *p = option_find(options, "scales");
if(!l || !p) error("STEPS policy must have steps and scales in cfg file");
int len = strlen(l);
int n = 1;
int i;
for(i = 0; i < len; ++i){
if (l[i] == ‘,‘) ++n;
}
int *steps = calloc(n, sizeof(int));
float *scales = calloc(n, sizeof(float));
for(i = 0; i < n; ++i){
int step = atoi(l);
float scale = atof(p);
l = strchr(l, ‘,‘)+1;
p = strchr(p, ‘,‘)+1;
steps[i] = step;
scales[i] = scale;
}
net->scales = scales;
net->steps = steps;
net->num_steps = n;
} else if (net->policy == EXP){
net->gamma = option_find_float(options, "gamma", 1);
} else if (net->policy == SIG){
net->gamma = option_find_float(options, "gamma", 1);
net->step = option_find_int(options, "step", 1);
} else if (net->policy == POLY || net->policy == RANDOM){
net->power = option_find_float(options, "power", 1);
}
get_policy函數
if (strcmp(s, "random")==0) return RANDOM;
if (strcmp(s, "poly")==0) return POLY;
if (strcmp(s, "constant")==0) return CONSTANT;
if (strcmp(s, "step")==0) return STEP;
if (strcmp(s, "exp")==0) return EXP;
if (strcmp(s, "sigmoid")==0) return SIG;
if (strcmp(s, "steps")==0) return STEPS;
fprintf(stderr, "Couldn‘t find policy %s, going with constant\n", s);
return CONSTANT;
學習率動態調整的策略有多種,YOLO默認使用的是steps。
yolo-voc.cfg文件:
steps=100,25000,35000
scales=10,.1,.1
Network.c文件get_current_rate函數
int batch_num = get_current_batch(net);
int i;
float rate;
switch (net.policy) {
case CONSTANT:
return net.learning_rate;
case STEP:
return net.learning_rate * pow(net.scale, batch_num/net.step);
case STEPS:
rate = net.learning_rate;
for(i = 0; i < net.num_steps; ++i){
if(net.steps[i] > batch_num) return rate;
rate *= net.scales[i];
//if(net.steps[i] > batch_num - 1 && net.scales[i] > 1) reset_momentum(net);
}
return rate;
get_current_batch獲取的是(*net.seen)/(net.batch*net.subdivisions),即真正的batch。
steps的每個階段是根據batch_num劃分的,根據配置文件,學習率會在batch_num達到100、25000、35000時發生改變。
當前的學習率是初始學習率與當前階段及之前所有階段對應的scale的總乘積。
convolutional超參數加載
Parser.c文件parse_network_cfg函數
LAYER_TYPE lt = string_to_layer_type(s->type);
if(lt == CONVOLUTIONAL){
l = parse_convolutional(options, params);
parse_convolutional函數
int n = option_find_int(options, "filters",1);
int size = option_find_int(options, "size",1);
int stride = option_find_int(options, "stride",1);
int pad = option_find_int_quiet(options, "pad",0);
int padding = option_find_int_quiet(options, "padding",0);
if(pad) padding = size/2;
char *activation_s = option_find_str(options, "activation", "logistic");
ACTIVATION activation = get_activation(activation_s);
int batch,h,w,c;
h = params.h;
w = params.w;
c = params.c;
batch=params.batch;
if(!(h && w && c)) error("Layer before convolutional layer must output image.");
int batch_normalize = option_find_int_quiet(options, "batch_normalize", 0);
需要註意的是如果enable了pad,cfg文件中的padding不會生效,實際的padding值為size/2。
YOLO配置文件理解