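PyTorch YOLOv3: building the yolo layer, a primer on thinking in matrix operations, and multiplying by the corresponding anchor as the loss-function formula requires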

阿新 • Published: 2018-12-01

Previous post: PyTorch YOLOv3: building class Darknet, a mental walkthrough

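The route and shortcut layers covered in the previous post are essentially simple combinations of earlier layers' outputs (concatenation for route, element-wise addition for shortcut). The yolo layer is comparatively more complex.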

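While writing this post I came to see how the author splits the functionality into separate pieces. For example:

1. Parse the input: from the cfg file, store each block separately as a dictionary and collect the dictionaries in the list blocks.

2. From the block dictionaries in blocks, build modules (nn.Sequential()) and add them to module_list (nn.ModuleList). Some layers (route, shortcut and yolo) have no ready-made PyTorch equivalent, so we create new layer classes. Their initialization goes in the new class's __init__, but the actual computation is not implemented in the new class's forward. The concrete operations come later: they end up in the forward of the Darknet class, the final step. (In fact, writing them directly in that top-level forward keeps the calls simpler.)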
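Step 1 can be sketched as follows. This is a minimal version, assuming the cfg text is already in a string; the function name parse_cfg_text and the shortened cfg excerpt are illustrative, not the author's exact code.

```python
def parse_cfg_text(text):
    """Split Darknet cfg text into a list of dicts, one per [section] block."""
    blocks = []
    block = {}
    for raw in text.splitlines():
        line = raw.strip()
        # skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # a new section starts: store the previous block, if any
            if block:
                blocks.append(block)
            block = {"type": line[1:-1].strip()}
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)
    return blocks


cfg = """
[net]
height=416
width=416

# a detection head
[yolo]
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
"""

blocks = parse_cfg_text(cfg)
print([b["type"] for b in blocks])   # ['net', 'yolo']
print(blocks[1]["mask"])             # 6,7,8
```

All values stay strings; later code converts them with int() where needed, which also tolerates the extra spaces inside the anchors value.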

I. yolo layer parameters
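The [yolo] block supplies mask, anchors and classes, which the code below reads (num is the total number of anchors). As an illustration, assuming the values of the standard yolov3.cfg (nine anchors in input pixels, shared by three [yolo] blocks), this sketch shows which anchors each block selects through its mask; the helper anchors_for_layer is hypothetical.

```python
# Anchor list shared by all three [yolo] blocks in yolov3.cfg: (width, height) in input pixels
ANCHORS_STR = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"

# (mask, stride) of each [yolo] block, in the order the blocks appear in yolov3.cfg
YOLO_LAYERS = [("6,7,8", 32), ("3,4,5", 16), ("0,1,2", 8)]


def anchors_for_layer(mask_str, anchors_str=ANCHORS_STR):
    values = [int(v) for v in anchors_str.split(",")]
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [pairs[int(m)] for m in mask_str.split(",")]


for mask_str, stride in YOLO_LAYERS:
    grid = 416 // stride
    print(f"{grid}x{grid} grid, stride {stride}: {anchors_for_layer(mask_str)}")
# 13x13 grid, stride 32: [(116, 90), (156, 198), (373, 326)]
# 26x26 grid, stride 16: [(30, 61), (62, 45), (59, 119)]
# 52x52 grid, stride 8: [(10, 13), (16, 30), (33, 23)]
```

The coarsest grid (largest stride) gets the largest anchors, since each of its cells covers the largest patch of the image.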

II. Initializing the yolo layer

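The yolo layer needs the anchors; apart from that it is no different from an ordinary module. So we create a DetectionLayer that only stores the anchors, and put all other functionality in the forward function of the main Darknet class.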

class DetectionLayer(nn.Module):
    # Placeholder: only stores this yolo layer's anchors; the computation is done in Darknet.forward
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

III. Implementing the yolo layer

In the Darknet class's initialization, create_module is called to build the network structure. The part of it that initializes the yolo layer is:

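        elif x["type"] == "yolo":
            # mask: indices of the anchors this yolo layer uses, e.g. "6,7,8"
            mask = x["mask"].split(",")
            mask = [int(m) for m in mask]

            # anchors: all (width, height) pairs listed in the block, shared by the yolo layers
            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors), 2)]
            # keep only the anchors selected by mask
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)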
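The placeholder layer is never called; Darknet.forward only reads its anchors back through module_list[i][0].anchors. A minimal sketch of that round trip, assuming a single hypothetical yolo block at index 0 with the 13×13 anchors of yolov3.cfg:

```python
import torch.nn as nn


class DetectionLayer(nn.Module):
    """Placeholder layer: it only stores the anchors of one [yolo] block."""

    def __init__(self, anchors):
        super().__init__()
        self.anchors = anchors


module_list = nn.ModuleList()

# Hypothetical index and anchors, mimicking what create_module builds for one [yolo] block
index = 0
module = nn.Sequential()
module.add_module("Detection_{}".format(index), DetectionLayer([(116, 90), (156, 198), (373, 326)]))
module_list.append(module)

# In Darknet.forward the anchors are read back like this (i is the layer index):
i = 0
anchors = module_list[i][0].anchors
print(anchors)  # [(116, 90), (156, 198), (373, 326)]
```

Because DetectionLayer defines no forward, calling it directly would raise an error; it also has no learnable parameters, so it adds nothing to the model's weights.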

In the forward function of class Darknet(nn.Module), the yolo branch mainly comes down to the predict_transform function:

            elif module_type == 'yolo':
                anchors = self.module_list[i][0].anchors
                # Get the input dimensions
                inp_dim = int(self.net_info["height"])

                # Get the number of classes
                num_classes = int(module["classes"])

                # Transform
                x = x.data  # detach from the autograd graph; this forward is used for detection
                # Reshape x from (batch_size, 3*85, grid_size, grid_size)
                # to (batch_size, grid_size*grid_size*3, 5 + num_classes).
                # predict_transform also uses sigmoid (and exp) to decode x, y, w, h,
                # because the regression formulas need the cell offsets cx, cy and the sigmoid.

                x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
                if not write:              # if no collector has been initialised
                    detections = x
                    write = 1

                else:
                    detections = torch.cat((detections, x), 1)
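The write flag only records whether detections exists yet; each later yolo layer is appended along dimension 1. A sketch of the shapes involved, assuming a 416×416 input, 80 classes and all-zero placeholder outputs for the three scales (None stands in for the write flag here):

```python
import torch

batch_size, num_classes = 2, 80
bbox_attrs = 5 + num_classes

# Placeholder outputs of predict_transform for the 13x13, 26x26 and 52x52 yolo layers
outputs = [torch.zeros(batch_size, g * g * 3, bbox_attrs) for g in (13, 26, 52)]

detections = None
for x in outputs:
    # first yolo layer starts the collector; later layers are appended along dim 1
    detections = x if detections is None else torch.cat((detections, x), 1)

print(detections.shape)  # torch.Size([2, 10647, 85])
```

The row count is 13*13*3 + 26*26*3 + 52*52*3 = 507 + 2028 + 8112 = 10647 candidate boxes per image.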

predict_transform
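For reference, these are the decoding formulas that the comments in predict_transform cite, with $c_x, c_y$ the top-left corner of the cell and $p_w, p_h$ the anchor size, both in grid units (the code then multiplies by stride to get input-image pixels). The worked line uses the 116×90 anchor and stride 32 as an example:

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x, & b_y &= \sigma(t_y) + c_y,\\
b_w &= p_w\, e^{t_w}, & b_h &= p_h\, e^{t_h},\\
p_w &= 116/32 = 3.625, & t_w = 0 &\;\Rightarrow\; b_w = 3.625 \text{ cells} = 3.625 \times 32 = 116 \text{ px}.
\end{aligned}
$$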

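A primer on thinking in matrices

This is how the 3 anchors are expanded to cover the whole image.

Tensor.repeat(): to turn 4 rows × 2 columns into 12 rows × 2 columns, first call repeat(1, 3), which keeps the rows and triples the columns (4 × 6), then view(-1, 2).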
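The repeat-then-view trick can be checked on a small tensor. The sketch below contrasts it with repeat(3, 1): the first pattern repeats each row consecutively (what x_y_offset needs, one offset per anchor of a cell), the second cycles the whole block (what anchors.repeat(grid_size*grid_size, 1) does, three anchors for every cell). The toy 4×2 tensor is an arbitrary example.

```python
import torch

t = torch.arange(8).view(4, 2)            # 4 rows x 2 columns: [[0,1],[2,3],[4,5],[6,7]]

# repeat(1, 3): rows unchanged, columns tripled -> 4 x 6; view(-1, 2) -> 12 x 2
per_row = t.repeat(1, 3).view(-1, 2)      # each row appears 3 times in a row
# repeat(3, 1): the whole 4 x 2 block stacked 3 times -> 12 x 2
per_block = t.repeat(3, 1)                # the 4 rows cycled 3 times

print(per_row[:4].tolist())    # [[0, 1], [0, 1], [0, 1], [2, 3]]
print(per_block[:5].tolist())  # [[0, 1], [2, 3], [4, 5], [6, 7], [0, 1]]
```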

np.meshgrid and Tensor.repeat()

grid = np.arange(grid_size)
a,b = np.meshgrid(grid, grid)
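# x_offset is cx and y_offset is cy: the top-left corner coordinates of each cell
x_offset = torch.FloatTensor(a).view(-1,1)  # shape (13*13, 1) for a 13x13 grid
y_offset = torch.FloatTensor(b).view(-1,1)
# In the result, each row is the offset of one anchor at one grid cell; the two columns are the x and y offsets.

# With 3 anchors, repeat(1, num_anchors) keeps the row count and triples the columns, then view(-1, 2) reshapes back to 2 columns.
x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)  # unsqueeze(0) adds a dimension at dim 0

prediction[:,:,:2] += x_y_offset  # bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy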
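On a tiny 2×2 grid with 3 anchors the offset table is easy to read: rows 0 to 2 belong to cell (x=0, y=0), rows 3 to 5 to cell (1, 0), and so on, with cx the column index and cy the row index. This sketch uses torch.tensor(a, dtype=torch.float32), which gives the same values as torch.FloatTensor(a); the grid size of 2 is an arbitrary choice for readability.

```python
import numpy as np
import torch

grid_size, num_anchors = 2, 3
grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)   # a: x index of each cell, b: y index of each cell

x_offset = torch.tensor(a, dtype=torch.float32).view(-1, 1)
y_offset = torch.tensor(b, dtype=torch.float32).view(-1, 1)
x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0)

print(x_y_offset.shape)          # torch.Size([1, 12, 2])
print(x_y_offset[0].tolist())
# [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
#  [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
```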

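Combining view and transpose. Next, look at how an input of shape (n, 3*85, 13, 13) becomes an output of shape (n, 13*13*3, 85). After this transformation, torch.cat along dimension 1 can join it with the 26*26*3-row and 52*52*3-row results of the other two yolo layers. Each row holds the 85 values of one anchor at one grid cell: (n, 3*85, 13, 13) → (n, 13*13*3, 85).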

(n, 3*85, 13, 13) → (n, 3*85, 13*13)

prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)

(n, 3*85, 13*13) → (n, 13*13, 3*85)

prediction = prediction.transpose(1,2).contiguous()

(n, 13*13, 3*85) → (n, 13*13*3, 85)

prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
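The three steps can be verified numerically. The sketch below checks that anchor k at cell (gy, gx), attribute j lands in row (gy*grid_size + gx)*num_anchors + k, and compares the result with an equivalent view/permute/reshape formulation (an alternative spelling, not the author's code). The random input and the probed indices are arbitrary.

```python
import torch

batch_size, num_anchors, bbox_attrs, grid_size = 1, 3, 85, 13
x = torch.randn(batch_size, num_anchors * bbox_attrs, grid_size, grid_size)

p = x.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
p = p.transpose(1, 2).contiguous()
p = p.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)

# Check one entry: anchor k at cell (gy, gx), attribute j
gy, gx, k, j = 4, 7, 2, 10
row = (gy * grid_size + gx) * num_anchors + k
print(p.shape)                                               # torch.Size([1, 507, 85])
print(torch.equal(p[0, row, j], x[0, k * bbox_attrs + j, gy, gx]))  # True

# Equivalent formulation: split channels into (anchor, attribute), move them last
alt = (x.view(batch_size, num_anchors, bbox_attrs, grid_size, grid_size)
        .permute(0, 3, 4, 1, 2)
        .reshape(batch_size, -1, bbox_attrs))
print(torch.equal(p, alt))                                   # True
```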

Element-wise multiplication (*), not a matrix product

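# Here anchors is a (3, 2) tensor (three anchors, each a width and a height). repeat(grid_size*grid_size, 1)
# copies it grid_size*grid_size times along dim 0 (rows) and once along dim 1 (columns, unchanged),
# so every grid cell gets the three anchors.
# unsqueeze(0) adds a new dimension at dim 0.
# Tiling by grid_size*grid_size lines the anchors up row by row with the t_w, t_h predictions
# used in the formula bw = pw*e^tw.
# (3, 2) -> (1, 13*13*3, 2)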
anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)

# Multiply every row by its corresponding anchor
prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors  # bw = pw*e^tw, bh = ph*e^th; pw, ph are the anchor box width and height
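A worked check of the anchor tiling and the exponent, assuming the 13×13 anchors of yolov3.cfg (stride 32) and hypothetical raw t_w = t_h = 0, so that exp(t) = 1 and each predicted box has exactly its anchor's size:

```python
import torch

grid_size, stride = 13, 32
# Anchors of the 13x13 yolo layer in yolov3.cfg, converted to grid units
anchors = torch.FloatTensor([(116 / stride, 90 / stride),
                             (156 / stride, 198 / stride),
                             (373 / stride, 326 / stride)])
anchors = anchors.repeat(grid_size * grid_size, 1).unsqueeze(0)   # (1, 507, 2)

# Hypothetical raw t_w, t_h: all zero
t_wh = torch.zeros(1, grid_size * grid_size * 3, 2)
b_wh = torch.exp(t_wh) * anchors        # element-wise, not a matrix product

print(b_wh.shape)                       # torch.Size([1, 507, 2])
print((b_wh[0, :3] * stride).tolist())  # [[116.0, 90.0], [156.0, 198.0], [373.0, 326.0]]
```

Row 3 (the first anchor of the second cell) equals row 0, which is exactly the cycling that repeat(grid_size*grid_size, 1) produces.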

import numpy as np
import torch

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
    """
    prediction is the output feature map, shape (batch_size, 3*85, 13, 13)
    -> returns shape (batch_size, 13*13*3, 5+80)
    """
    batch_size = prediction.size(0)
    # stride is the network's total downsampling factor at this layer:
    # the input image size divided by the size of the feature map fed to the yolo layer
    stride =  inp_dim // prediction.size(2)  # 416 // 13 = 32
    # number of cells along each side of the feature map, 416 // 32 = 13
    grid_size = inp_dim // stride
    # attributes per box: 5 + number of classes
    bbox_attrs = 5 + num_classes
    # number of anchors
    num_anchors = len(anchors)

    # (batch_size, num_anchors*bbox_attrs, grid_size, grid_size) -> (batch_size, num_anchors*bbox_attrs, grid_size*grid_size)

    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)

    # (batch_size, 85*3, 13*13) -> (batch_size, 13*13, 3*85)
    prediction = prediction.transpose(1,2).contiguous()
    # -> (batch_size, grid_size*grid_size*num_anchors, bbox_attrs). Ignoring batch_size, each row now
    # holds the attributes of one anchor; the values are still the raw feature-map outputs
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
    # The anchors are in the units of the net block's height and width, i.e. input-image pixels, which is
    # larger than the feature map by a factor of stride. Dividing by stride expresses them in feature-map units
    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]

    # Sigmoid the tx, ty (predicted center offsets) and the object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

    # Top-left corner of every cell: a and b are grid_size x grid_size arrays of the x and y coordinates.
    # Each cell has size 1, so the range is [0, 12] for a 13x13 feature map
    grid = np.arange(grid_size)
    a,b = np.meshgrid(grid, grid)
    # x_offset is cx, y_offset is cy: the top-left corner of each cell
    x_offset = torch.FloatTensor(a).view(-1,1)  # (13*13, 1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()
    # Each row is the (cx, cy) offset of one anchor at one cell. With 3 anchors, repeat keeps the rows and
    # triples the columns, then view reshapes back to 2 columns; unsqueeze(0) adds a dimension at dim 0
    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)

    prediction[:,:,:2] += x_y_offset  # bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy

    # list of 3 (w, h) tuples -> (3, 2) tensor
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()
    # Tile the (3, 2) anchors grid_size*grid_size times along dim 0 (once along dim 1), so every cell gets its
    # three anchors, row-aligned with t_w, t_h for bw = pw*e^tw; unsqueeze(0) adds a dimension at dim 0
    # (3, 2) -> (1, 13*13*3, 2)
    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    # Exponentiate the predicted width/height offsets and multiply by the anchor sizes (in feature-map units),
    # giving box widths and heights relative to the feature map
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors  # bw = pw*e^tw, bh = ph*e^th; pw, ph: anchor width and height
    # Class scores: apply sigmoid to each predicted class score
    prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))

    # Map box centers and sizes from feature-map units back to the input image (e.g. 416x416) by multiplying by stride
    prediction[:,:,:4] *= stride

    return prediction
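To sanity-check the whole transform, feed it an all-zero raw output: every sigmoid gives 0.5 and every exp gives 1, so each box sits at its cell center with exactly its anchor's size. The sketch is a condensed copy of the function above with one assumed change, clearly not the original: the CUDA flag is dropped and new tensors follow prediction.device (created with torch.tensor).

```python
import numpy as np
import torch


def predict_transform(prediction, inp_dim, anchors, num_classes):
    # Condensed version of the function above; tensors follow prediction.device
    # instead of the CUDA flag (an assumption made for this sketch).
    batch_size = prediction.size(0)
    stride = inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    device = prediction.device

    # (B, A*attrs, G, G) -> (B, G*G*A, attrs)
    prediction = prediction.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
    prediction = prediction.transpose(1, 2).contiguous()
    prediction = prediction.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)

    # sigmoid for tx, ty and objectness
    prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0])
    prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1])
    prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4])

    # cell offsets (cx, cy), repeated once per anchor
    a, b = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
    x_offset = torch.tensor(a, dtype=torch.float32, device=device).view(-1, 1)
    y_offset = torch.tensor(b, dtype=torch.float32, device=device).view(-1, 1)
    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0)
    prediction[:, :, :2] += x_y_offset

    # anchors in grid units, tiled for every cell
    anchors = torch.tensor([(w / stride, h / stride) for w, h in anchors],
                           dtype=torch.float32, device=device)
    anchors = anchors.repeat(grid_size * grid_size, 1).unsqueeze(0)
    prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4]) * anchors

    # class scores
    prediction[:, :, 5:5 + num_classes] = torch.sigmoid(prediction[:, :, 5:5 + num_classes])

    # grid units -> input-image pixels
    prediction[:, :, :4] *= stride
    return prediction


# All-zero raw output for a 13x13 yolo layer (3 anchors, 80 classes)
raw = torch.zeros(1, 3 * 85, 13, 13)
out = predict_transform(raw, 416, [(116, 90), (156, 198), (373, 326)], 80)

print(out.shape)                # torch.Size([1, 507, 85])
print(out[0, 0, :5].tolist())   # [16.0, 16.0, 116.0, 90.0, 0.5]
print(out[0, 3, :2].tolist())   # [48.0, 16.0]
```

Row 0 is the first anchor of cell (0, 0), whose center is (0 + 0.5) × 32 = 16 px; row 3 is the first anchor of cell (1, 0), centered at (1 + 0.5) × 32 = 48 px.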
