YOLO v2 loss function source code (the core training code): a walkthrough and how it works

阿新 • Published: 2019-02-13

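Before we start:

1. For detailed explanations of YOLO and YOLO v2, see the two links below, or read the papers directly (I did want to write a YOLO tutorial myself, but on reflection the two linked articles are simply too good _(:з」∠)_)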

yolo: https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

yolo v2: https://zhuanlan.zhihu.com/p/25167153

2. This article only walks through the source code of the YOLO v2 loss function. To get the code, run

git clone https://github.com/pjreddie/darknet

and then open src/region_layer.c.

3. The official YOLO website is: https://pjreddie.com/darknet/yolo/

4. The command I used while debugging the code:

./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23

Below is a walkthrough of the loss function source in the latest version of YOLO v2 (explaining the non-GPU code path):

void forward_region_layer(const region_layer l, network_state state)
{
    int i,j,b,t,n;
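    // size is the number of values predicted per box: l.coords (4) box coordinates + l.classes class scores + 1 objectness score.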
    int size = l.coords + l.classes + 1;
    memcpy(l.output, state.input, l.outputs*l.batch*sizeof(float));
    #ifndef GPU
    flatten(l.output, l.w*l.h, size*l.n, l.batch, 1);
    #endif
    for (b = 0; b < l.batch; ++b){
        for(i = 0; i < l.h*l.w*l.n; ++i){
            int index = size*i + b*l.outputs;
            l.output[index + 4] = logistic_activate(l.output[index + 4]);
        }
    }
#ifndef GPU
    if (l.softmax_tree){
        for (b = 0; b < l.batch; ++b){
            for(i = 0; i < l.h*l.w*l.n; ++i){
                int index = size*i + b*l.outputs;
                softmax_tree(l.output + index + 5, 1, 0, 1, l.softmax_tree, l.output + index + 5);
            }
        }
    } else if (l.softmax){
        for (b = 0; b < l.batch; ++b){
            for(i = 0; i < l.h*l.w*l.n; ++i){
                int index = size*i + b*l.outputs;
                softmax(l.output + index + 5, l.classes, 1, l.output + index + 5);
            }
        }
    }
#endif
    if(!state.train) return;
    memset(l.delta, 0, l.outputs * l.batch * sizeof(float));
    float avg_iou = 0;
    float recall = 0;
    float avg_cat = 0;
    float avg_obj = 0;
    float avg_anyobj = 0;
    int count = 0;
    int class_count = 0;
    *(l.cost) = 0;
    // Loop over every image in the batch and accumulate its contribution to the loss.
    for (b = 0; b < l.batch; ++b) {
        // l.softmax_tree (a hierarchical softmax classifier) is not used with this configuration, so this block is skipped.
        if(l.softmax_tree){
            int onlyclass = 0;
            for(t = 0; t < 30; ++t){
                box truth = float_to_box(state.truth + t*5 + b*l.truths);
                if(!truth.x) break;
                int class = state.truth[t*5 + b*l.truths + 4];
                float maxp = 0;
                int maxi = 0;
                if(truth.x > 100000 && truth.y > 100000){
                    for(n = 0; n < l.n*l.w*l.h; ++n){
                        int index = size*n + b*l.outputs + 5;
                        float scale =  l.output[index-1];
                        float p = scale*get_hierarchy_probability(l.output + index, l.softmax_tree, class);
                        if(p > maxp){
                            maxp = p;
                            maxi = n;
                        }
                    }
                    int index = size*maxi + b*l.outputs + 5;
                    delta_region_class(l.output, l.delta, index, class, l.classes, l.softmax_tree, l.class_scale, &avg_cat);
                    ++class_count;
                    onlyclass = 1;
                    break;
                }
            }
            if(onlyclass) continue;
        }
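        /*
        l.h and l.w are the height and width of the final convolutional feature map. l.n is the number of
        anchor boxes, a mechanism borrowed from the box regression in Faster R-CNN. l.n equals num in the
        config file, and the anchors entry there supplies the box sizes. Unlike v1, which always divides the
        image into 7*7 cells whatever the resolution of the final feature map, v2 treats every location of
        the feature map as a cell. The advantage is that it can localize and recognize smaller objects.
        */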
        for (j = 0; j < l.h; ++j) {
            for (i = 0; i < l.w; ++i) {
                // l.n is the number of boxes of different sizes predicted at each cell; their widths and heights come from the anchors values in the config file.
                for (n = 0; n < l.n; ++n) {
                    int index = size*(j*l.w*l.n + i*l.n + n) + b*l.outputs;
                    box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
                    float best_iou = 0;
                    int best_class = -1;
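                    // Loop over the ground-truth boxes of this image (at most 30 are read). The limit of 30 rarely matters, because the loop ends at the first empty entry (truth.x == 0).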
                    for(t = 0; t <30; ++t){
						// get truth_box's x, y, w, h  
                        box truth = float_to_box(state.truth + t*5 + b*l.truths);
                        // Stop once every labelled object in the image has been visited.
                        if (!truth.x)
                            break;
                        float iou = box_iou(pred, truth);
                        // Keep the ground-truth box that overlaps this prediction the most, together with its class.
                        if (iou > best_iou) {
                            best_class = state.truth[t*5 + b*l.truths + 4];
                            best_iou = iou;
                        }	
                    }
                    // Gradient of the objectness ("is there an object?") score. By default the target is 0, i.e. no object.
                    avg_anyobj += l.output[index + 4];
                    l.delta[index + 4] = l.noobject_scale * ((0 - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
                    if(l.classfix == -1) l.delta[index + 4] = l.noobject_scale * ((best_iou - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
                    else{
                        if (best_iou > l.thresh) {
                            l.delta[index + 4] = 0;
                            if(l.classfix > 0){
                                delta_region_class(l.output, l.delta, index + 5, best_class, l.classes, l.softmax_tree, l.class_scale*(l.classfix == 2 ? l.output[index + 4] : 1), &avg_cat);
                                ++class_count;
                            }
                        }
                    }
                    // Only entered during the first 12800 training images, i.e. while *(state.net.seen) < 12800.
                    if(*(state.net.seen) < 12800){
                        box truth = {0};
                        truth.x = (i + .5)/l.w;
                        truth.y = (j + .5)/l.h;
                        truth.w = l.biases[2*n];
                        truth.h = l.biases[2*n+1];
                        if(DOABS){
                            truth.w = l.biases[2*n]/l.w;
                            truth.h = l.biases[2*n+1]/l.h;
                        }
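                        // Store the difference between the predicted tx, ty, tw, th and the tx', ty', tw', th' computed from the target box in l.delta.
                        // The target here is the anchor box centred on the cell, weighted by .01.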
                        delta_region_box(truth, l.output, l.biases, n, index, i, j, l.w, l.h, l.delta, .01);
                    }
                }
            }
        }
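        // At this point every predicted box on the feature map has its no-object confidence delta set (or cleared to 0 where best_iou > l.thresh).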
        for(t = 0; t < 30; ++t){
			// get truth_box's x, y, w, h  
            box truth = float_to_box(state.truth + t*5 + b*l.truths);
            if(!truth.x) break;
            float best_iou = 0;
            int best_index = 0;
            int best_n = 0;
            i = (truth.x * l.w);
            j = (truth.y * l.h);
            //printf("%d %f %d %f\n", i, truth.x*l.w, j, truth.y*l.h);
            // truth now holds the box's x, y, w, h. Copy it and move its x, y to (0, 0) as
            // truth_shift.x, truth_shift.y, so that the IoU below compares box shape (w, h) only.
            box truth_shift = truth;
            truth_shift.x = 0;
            truth_shift.y = 0;
            //printf("index %d %d\n",i, j);
            // Find which anchor box best matches this ground-truth object.
            for(n = 0; n < l.n; ++n){
                // Index of this box in l.output: size is the number of values per box, (j*l.w*l.n + i*l.n + n) is the
                // box's position (cell (i, j), anchor n), and b*l.outputs is the offset of image b within the batch.
                int index = size*(j*l.w*l.n + i*l.n + n) + b*l.outputs;
                // Decode the predicted box coordinates x, y, w, h; the objectness and class scores are handled later.
                box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
                // With l.bias_match set, the box w and h are taken directly from the anchors; l.biases holds the anchors values from the config file.
                if(l.bias_match){
                    pred.w = l.biases[2*n];
                    pred.h = l.biases[2*n+1];
                    if(DOABS){
                        pred.w = l.biases[2*n]/l.w;
                        pred.h = l.biases[2*n+1]/l.h;
                    }
                }
                //printf("pred: (%f, %f) %f x %f\n", pred.x, pred.y, pred.w, pred.h);
                // Move this box to (0, 0) as well, so the IoU compares shape only.
                pred.x = 0;
                pred.y = 0;
                float iou = box_iou(pred, truth_shift);
                if (iou > best_iou){
                    best_index = index;
                    best_iou = iou;
                    best_n = n;
                }
            }
            //printf("%d %f (%f, %f) %f x %f\n", best_n, best_iou, truth.x, truth.y, truth.w, truth.h);
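            // Write the coordinate deltas for the best-matching box and get back its IoU with the truth box.
            float iou = delta_region_box(truth, l.output, l.biases, best_n, best_index, i, j, l.w, l.h, l.delta, l.coord_scale);
            // Count the object as recalled if the IoU exceeds the .5 threshold.
            if(iou > .5) recall += 1;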
            avg_iou += iou;
            // Box regression is essentially done at this point; what follows handles the objectness score and the class scores.
            //l.delta[best_index + 4] = iou - l.output[best_index + 4];
            avg_obj += l.output[best_index + 4];
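            // Objectness delta for the box responsible for this object: the target is 1, and logistic_gradient is the derivative of the logistic activation applied at the top of this function.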
            l.delta[best_index + 4] = l.object_scale * (1 - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
            if (l.rescore) {
                // Use iou instead of the 1 above as the target (while debugging I found l.rescore = 1, so this branch is taken).
                l.delta[best_index + 4] = l.object_scale * (iou - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
            }

            // Get the ground-truth class.
            int class = state.truth[t*5 + b*l.truths + 4];
            if (l.map) class = l.map[class];
            // For every class, store (target - predicted probability) * scale in l.delta at that class's position, where the target is 1 for the true class and 0 otherwise.
            delta_region_class(l.output, l.delta, best_index + 5, class, l.classes, l.softmax_tree, l.class_scale, &avg_cat);
            ++count;
            ++class_count;
        }
    }
    //printf("\n");
    #ifndef GPU
    flatten(l.delta, l.w*l.h, size*l.n, l.batch, 0);
    #endif
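    // l.delta now holds the differences for class, confidence, x, y, w and h at every position. mag_array walks all positions and returns the square root of the sum of squares,
    // and pow squares that result, so the cost is the sum of squared deltas.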
    *(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2);
    printf("Region Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, Avg Recall: %f,  count: %d\n", avg_iou/count, avg_cat/class_count, avg_obj/count, avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, count);
}
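
The objectness deltas and the final cost in the function above can be checked in isolation. The block below is a minimal sketch and not darknet code: the names logistic_grad, objectness_delta and sum_squares are hypothetical, and it assumes that logistic_gradient(x) is (1 - x) * x and that mag_array returns the square root of the sum of squares, as the comments above describe.

```c
#include <assert.h>

/* Assumed form of the logistic derivative, written in terms of the
   activated output x (hypothetical stand-in for logistic_gradient). */
static float logistic_grad(float x) { return (1 - x) * x; }

/* Objectness delta as used in the walkthrough:
   scale * (target - output) * logistic_grad(output).
   target is 0 for "no object", and 1 or the IoU for a matched box. */
static float objectness_delta(float scale, float target, float output)
{
    assert(output >= 0 && output <= 1);   /* output is a sigmoid value */
    return scale * (target - output) * logistic_grad(output);
}

/* Sum of squared deltas: the same quantity as squaring the magnitude
   (square root of the sum of squares) of the delta array. */
static float sum_squares(const float *delta, int n)
{
    float sum = 0;
    for (int i = 0; i < n; ++i) sum += delta[i] * delta[i];
    return sum;
}
```

For example, objectness_delta(1, 0, 0.5f) gives 1 * (0 - 0.5) * 0.25 = -0.125, and sum_squares over {3, 4} gives 25. Note that a prediction that is already saturated at 0 or 1 gets a zero delta, because the logistic derivative vanishes there.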

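Note: the annotations above are my own understanding after consulting material online. If anything is wrong, please point it out so it can be corrected and help more readers.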
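
The anchor-matching loop (the one that shifts both boxes to 0,0 before calling box_iou) can be sketched on its own as well. This is an illustration of the l.bias_match case only, assuming the anchors and the truth box are expressed in the same units; box_t, shape_iou and best_anchor are hypothetical names, not darknet functions.

```c
#include <assert.h>

/* Hypothetical stand-in for darknet's box type: centre x, y and size w, h. */
typedef struct { float x, y, w, h; } box_t;

/* IoU of two boxes that share the same centre, so only w and h matter.
   This is what shifting both boxes to (0, 0) achieves. */
static float shape_iou(float w1, float h1, float w2, float h2)
{
    float iw = w1 < w2 ? w1 : w2;          /* overlap width  */
    float ih = h1 < h2 ? h1 : h2;          /* overlap height */
    float inter = iw * ih;
    float uni = w1 * h1 + w2 * h2 - inter;
    return uni > 0 ? inter / uni : 0;
}

/* Return the index of the anchor whose shape best matches truth.
   anchors holds n (w, h) pairs, laid out like l.biases. */
static int best_anchor(box_t truth, const float *anchors, int n)
{
    assert(anchors && n > 0);
    int best_n = 0;
    float best_iou = 0;
    for (int k = 0; k < n; ++k) {
        float iou = shape_iou(truth.w, truth.h, anchors[2*k], anchors[2*k + 1]);
        if (iou > best_iou) { best_iou = iou; best_n = k; }
    }
    return best_n;
}
```

With the made-up anchors {1,1, 2,4, 5,3} and a truth box of size 2.2 x 3.8, the second anchor (index 1) wins, because its shape is closest. The position of the truth box plays no part in this choice; it only decides which cell (i, j) is responsible.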
