Implementing YOLOv3 in PyTorch (3): implementing forward

The previous article (https://www.cnblogs.com/sdu20112013/p/11099244.html) implemented the individual layers of the network.
This article implements the network's forward pass.

Defining the network

class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)

Implementing the forward pass

The forward method overrides the one inherited from nn.Module.
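The per-layer branches in the following sections all sit inside one loop over the parsed cfg blocks. Below is a minimal sketch of that loop. The class name TinyNet, the hand-written blocks list and the outputs cache are assumptions made for illustration, not the author's exact code.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for Darknet with two hand-written blocks."""

    def __init__(self):
        super().__init__()
        # In the article these come from parse_cfg / create_modules.
        self.blocks = [
            {"type": "net"},
            {"type": "convolutional"},
            {"type": "upsample"},
        ]
        self.module_list = nn.ModuleList([
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.Upsample(scale_factor=2, mode="nearest"),
        ])

    def forward(self, x):
        modules = self.blocks[1:]   # assumed: block 0 is the "net" block, not a layer
        outputs = {}                # cache every layer's output for route/shortcut
        for i, module in enumerate(modules):
            module_type = module["type"]
            if module_type in ("convolutional", "upsample"):
                x = self.module_list[i](x)
            outputs[i] = x
        return x, outputs


net = TinyNet()
out, cache = net(torch.zeros(1, 3, 16, 16))
print(out.shape)   # torch.Size([1, 8, 32, 32])
```

Caching every output by layer index is what lets the route and shortcut branches look back at earlier layers.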

Convolutional and Upsample Layers

        if module_type == "convolutional" or module_type == "upsample":
            x = self.module_list[i](x)

Route Layer / Shortcut Layer

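As covered in the previous article, the output of a route layer is either the output of one earlier layer, or the outputs of two earlier layers concatenated along the depth (channel) dimension. That is,

output[current_layer] = outputs[previous_layer]
or
map1 = outputs[i + layers[0]]
map2 = outputs[i + layers[1]]
output[current_layer] = torch.cat((map1, map2), 1)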

So the route layer code is:

        elif module_type == "route":
            layers = module["layers"]
            layers = [int(a) for a in layers]

            if (layers[0]) > 0:
                layers[0] = layers[0] - i

            if len(layers) == 1:
                x = outputs[i + (layers[0])]

            else:
                if (layers[1]) > 0:
                    layers[1] = layers[1] - i

                map1 = outputs[i + layers[0]]
                map2 = outputs[i + layers[1]]

                x = torch.cat((map1, map2), 1)
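The index handling in the route branch can be isolated into a small helper. This is a sketch only: resolve_route is a hypothetical name, and the layer indices in the usage lines are example values.

```python
def resolve_route(layers, i):
    """Return the absolute layer indices a route layer at index i reads from.

    Positive cfg values are absolute indices; negative values are relative to i.
    """
    resolved = []
    for layer in layers:
        layer = int(layer)
        # Same trick as the branch above: turn everything into a relative offset...
        offset = layer - i if layer > 0 else layer
        # ...then add i back to get the absolute index into outputs.
        resolved.append(i + offset)
    return resolved


print(resolve_route(["-4"], 83))         # [79]
print(resolve_route(["-1", "61"], 86))   # [85, 61]
```

Converting positive indices to relative ones first means the lookup outputs[i + offset] works the same way for both notations.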

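The output of a shortcut layer is the sum of the previous layer's output and the output of an earlier layer, whose relative position is given by the from value in the configuration file.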

        elif  module_type == "shortcut":
            from_ = int(module["from"])
            x = outputs[i-1] + outputs[i+from_]

YOLO layer

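The output of a YOLO layer is an n*n*depth feature map. To access the 2nd bounding box of cell (5,6) you would have to index it as map[5,6,(5+C):2*(5+C)]. That form is awkward to work with, so we introduce a predict_transform function to change the shape of the output.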

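In short, we want to turn the 4-D tensor of size batch_size*grid_size*grid_size*(B*(5+C)) into a tensor of size batch_size*(grid_size*grid_size*B)*(5+C).
For each image this is a 2-D matrix with one row per bounding box. The rows run cell by cell, and within each cell anchor by anchor: the B boxes of cell (0,0) come first, then the B boxes of cell (0,1), and so on.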

    batch_size = prediction.size(0)
    stride =  inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    
    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)

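The code above relies on PyTorch's view, which behaves like reshape in numpy. contiguous is commonly paired with transpose and permute: after such a dimension change the tensor is no longer stored contiguously in memory, while view requires contiguous storage, so contiguous() has to be called first. The end result is a tensor of size batch_size*(grid_size*grid_size*num_anchors)*bbox_attrs.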
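To check that the reshape really puts each box in its own row, the same three lines can be run on a tiny tensor filled with distinct values. The sizes below (a 2*2 grid, 3 anchors, 1 class) are made up for illustration.

```python
import torch

batch_size, grid_size, num_anchors, num_classes = 1, 2, 3, 1
bbox_attrs = 5 + num_classes

# Fill the tensor with distinct values so every element can be traced.
n = batch_size * num_anchors * bbox_attrs * grid_size * grid_size
raw = torch.arange(n, dtype=torch.float32).view(
    batch_size, num_anchors * bbox_attrs, grid_size, grid_size)

pred = raw.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
pred = pred.transpose(1, 2).contiguous()
pred = pred.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)

# Box `anchor` of cell (row, col) should now be one row of length bbox_attrs.
row, col, anchor = 1, 0, 2
flat_row = (row * grid_size + col) * num_anchors + anchor
expected = raw[0, anchor * bbox_attrs:(anchor + 1) * bbox_attrs, row, col]

print(pred.shape)                                  # torch.Size([1, 12, 6])
print(torch.equal(pred[0, flat_row], expected))    # True
```

The row index (row * grid_size + col) * num_anchors + anchor is the cell-by-cell, anchor-by-anchor ordering in explicit form.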

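Next we compute the predicted bounding-box coordinates.

At this point prediction[:,:,0], prediction[:,:,1], prediction[:,:,2], prediction[:,:,3] and prediction[:,:,4] hold tx, ty, tw, th and the objectness score respectively.
We start with the centre offsets, which are predicted relative to the top-left corner of the current cell.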

    # Sigmoid squashes the values into the range 0-1
    # Sigmoid the centre_X, centre_Y and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])
    
    #Add the center offsets
    grid = np.arange(grid_size)
    a,b = np.meshgrid(grid, grid)
    
    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)
    
    # Add each cell's top-left offset to prediction[:,:,0] and prediction[:,:,1]
    prediction[:,:,:2] += x_y_offset

The effect of meshgrid can be seen with this snippet:

import numpy as np
import torch
grid_size = 13
grid = np.arange(grid_size)
a,b = np.meshgrid(grid, grid)
print(a)
print(b)

x_offset = torch.FloatTensor(a).view(-1,1)
#print(x_offset)
y_offset = torch.FloatTensor(b).view(-1,1)

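This prints two 13*13 arrays. Every row of a is 0, 1, ..., 12 (the column index of each cell), while row k of b is filled with k (the row index of each cell). Flattened into columns, they give the x and y offset of every cell.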
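On a smaller grid the full x_y_offset tensor is short enough to inspect directly. This sketch assumes a 2*2 grid with 3 anchors and uses torch.from_numpy(...).float() in place of torch.FloatTensor(...) for the conversion; the rest follows the code above.

```python
import numpy as np
import torch

grid_size, num_anchors = 2, 3
grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)   # a: column index per cell, b: row index per cell

x_offset = torch.from_numpy(a).float().view(-1, 1)
y_offset = torch.from_numpy(b).float().view(-1, 1)

# One (x, y) pair per cell, repeated once per anchor, with a leading batch axis.
x_y_offset = (torch.cat((x_offset, y_offset), 1)
              .repeat(1, num_anchors)
              .view(-1, 2)
              .unsqueeze(0))

print(x_y_offset.shape)              # torch.Size([1, 12, 2])
print(x_y_offset[0, :4].tolist())    # [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
```

Each cell's offset appears num_anchors times in a row, which matches the row ordering of prediction, so the two can be added directly.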

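Next come the bounding-box width and height. Note that the anchors have to be rescaled to the size of the current feature map, because the configuration file specifies them relative to the model's input size.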

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]  # rescale to feature-map units
    
    #log space transform height and the width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors
    
    ## Map back to coordinates on the original input image
    prediction[:,:,:4] *= stride
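Taken together, the centre and size steps amount to the following box decoding. The symbol names are chosen here for illustration: c_x, c_y is the top-left corner of the cell in grid units, p_w, p_h is the anchor size in grid units, and s is the stride.

```latex
\begin{aligned}
b_x &= s\,\bigl(\sigma(t_x) + c_x\bigr), & b_y &= s\,\bigl(\sigma(t_y) + c_y\bigr),\\
b_w &= s\,p_w\,e^{t_w}, & b_h &= s\,p_h\,e^{t_h}.
\end{aligned}
```

Because the anchors were divided by the stride first, multiplying by s at the end restores them to input-image pixels.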

Predicting the class probabilities

prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))
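YOLOv3 scores each class with an independent sigmoid rather than a softmax, so the class scores of a single box need not sum to 1. A small pure-Python illustration with made-up logits:

```python
import math


def sigmoid(v):
    """Logistic function, applied independently to each class logit."""
    return 1.0 / (1.0 + math.exp(-v))


logits = [2.0, 1.5, -3.0]   # hypothetical raw class scores for one box
scores = [sigmoid(v) for v in logits]

print([round(s, 3) for s in scores])   # [0.881, 0.818, 0.047]
print(sum(scores) > 1.0)               # True
```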

The complete predict_transform code:

import numpy as np
import torch

# After the stacked convolutions, the YOLO feature map has size batch_size*(B*(5+C))*grid_size*grid_size
def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
    if CUDA:
        prediction = prediction.to(torch.device("cuda")) # move to the GPU; not needed with torch 0.4, needed with torch 1.0

    batch_size = prediction.size(0)
    stride =  inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    
    print("prediction.shape=",prediction.shape)
    print("batch_size=",batch_size)
    print("inp_dim=",inp_dim)
    #print("anchors=",anchors)
    #print("num_classes=",num_classes)

    print("grid_size=",grid_size)
    print("bbox_attrs=",bbox_attrs)


    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)

    #Sigmoid the centre_X, centre_Y and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

    #Add the center offsets
    grid = np.arange(grid_size).astype(np.float32)
    a,b = np.meshgrid(grid, grid)

    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)

    print(type(x_y_offset),type(prediction[:,:,:2]))
    prediction[:,:,:2] += x_y_offset

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]  # rescale to feature-map units
    #log space transform height and the width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

    prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))

    prediction[:,:,:4] *= stride # map centre, width and height back to input-image coordinates

    return prediction

With the helper function written, we can go back to the forward method of the Darknet class.

            elif module_type == "yolo":
                anchors = self.module_list[i][0].anchors
                inp_dim = int(self.net_info["height"])
                num_classes = int (module["classes"])
                x = x.data
                x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
                if not write:              # if no collector has been initialised
                    detections = x
                    write = 1
                else:       
                    detections = torch.cat((detections, x), 1)

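Before predict_transform, feature maps of different sizes, for example 13*13*N1, 26*26*N2 and 52*52*N3, could not be concatenated into a single tensor. Now that each has the form xx*(5+C), they can.
The write flag in the code above tells us whether detections is still empty. If it is, this is the first YOLO layer to make predictions and its output is assigned to detections. Otherwise the current YOLO layer's output is concatenated onto detections.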
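A sketch of the accumulation on its own, with zero tensors standing in for the output of predict_transform. The sizes are assumptions (416*416 input, 80 classes, 3 anchors per scale), and a None sentinel replaces the write flag purely for brevity.

```python
import torch

num_classes, num_anchors = 80, 3
detections = None

for grid_size in (13, 26, 52):   # assumed grid sizes for a 416x416 input
    # Stand-in for the transformed output of one YOLO layer.
    x = torch.zeros(1, grid_size * grid_size * num_anchors, 5 + num_classes)
    if detections is None:
        detections = x                               # first YOLO layer
    else:
        detections = torch.cat((detections, x), 1)   # append along the box axis

print(detections.shape)   # torch.Size([1, 10647, 85])
```

Concatenating along dimension 1 works because only the number of boxes differs between scales; the batch size and 5+C stay the same.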

Testing

Download the test image: wget https://github.com/ayooshkathuria/pytorch-yolo-v3/raw/master/dog-cycle-car.png

import cv2
import numpy as np
import torch
from torch.autograd import Variable

def get_test_input():
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (608,608))          # Resize to the input dimension
    img_ = img[:,:,::-1].transpose((2,0,1))   # BGR -> RGB | H x W x C -> C x H x W
    img_ = img_[np.newaxis,:,:,:]/255.0       # Add a batch dimension at axis 0 | Normalise
    img_ = torch.from_numpy(img_).float()     # Convert to a float tensor
    img_ = Variable(img_)                     # Convert to Variable (a no-op since torch 0.4)
    return img_
    
model = Darknet("cfg/yolov3.cfg")
inp = get_test_input()
pred = model(inp, torch.cuda.is_available())
print (pred)

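cv2.imread() loads the image in BGR channel order and in h*w*c layout, for example 416*416*3, which we need to convert to 3*416*416. If you run into either of the following errors: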

  • RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
    Add prediction = prediction.to(torch.device("cuda")) at the start of predict_transform to move the tensor to the GPU.
  • RuntimeError: shape '[1, 255, 3025]' is invalid for input of size 689520
    Check that the size of your input image matches the input size of the model, for example 608*608.
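One plausible way to end up with exactly the numbers in the second error is a cfg height of 608 combined with a 416*416 image and 80 classes. That is an assumption; the error message alone does not say which input caused it. Repeating the stride and grid_size arithmetic from predict_transform under that assumption:

```python
def inferred_grid(inp_dim, feature_size):
    """Repeat the stride / grid_size arithmetic from predict_transform."""
    stride = inp_dim // feature_size
    grid_size = inp_dim // stride
    return stride, grid_size


# A 416x416 image gives feature maps of 13, 26 and 52 cells per side,
# while inp_dim is read from the cfg as 608.
for feature_size in (13, 26, 52):
    stride, grid_size = inferred_grid(608, feature_size)
    print(feature_size, stride, grid_size)
# 13 46 13
# 26 23 26
# 52 11 55

print(255 * 52 * 52, 55 * 55)   # 689520 3025
```

The integer divisions happen to round-trip for the two coarser scales, so the mismatch only surfaces at the third YOLO layer, where the inferred 55*55 grid no longer matches the actual 52*52 feature map.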

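In the final test run the network predicts 22743 bounding boxes. There are three feature-map scales, 19*19, 38*38 and 76*76, and every cell at each scale predicts 3 boxes, giving 3*(19*19 + 38*38 + 76*76) = 22743 bounding boxes in total.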
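The count can be reproduced for any square input size with a few lines of arithmetic. The helper name total_boxes and the strides (32, 16, 8) are assumptions; the strides are chosen to match the 19, 38 and 76 grids of a 608*608 input.

```python
def total_boxes(inp_dim, strides=(32, 16, 8), boxes_per_cell=3):
    """Count the predicted boxes over all scales for a square input."""
    return sum(boxes_per_cell * (inp_dim // s) ** 2 for s in strides)


print(total_boxes(608))   # 22743
print(total_boxes(416))   # 10647
```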