Reproducing YOLO-v3 with TensorFlow

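YOLO-v3 is the latest version in the YOLO series at the time of writing. It is used for object detection, performs well, and meets both accuracy and real-time requirements. However, YOLO-v3 is built on the author's own Darknet framework. Darknet is written in pure C and CUDA; it is compact and simple without giving up performance, but that simplicity comes with tight coupling, so users who want to make their own changes have to modify the source code. Reproducing the model in TensorFlow makes such adjustments easier.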

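To preserve performance, the forward pass at prediction time is implemented entirely with TensorFlow operations rather than native Python loops and arithmetic. As a result, a forward pass takes about 20 ms, roughly on par with the original Darknet implementation of YOLO-v3.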

Below is the code for the YOLO-v3 prediction stage.

from __future__ import print_function

from __future__ import division

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import numpy as np
import tensorflow as tf

import time

import matplotlib.pyplot as plt

%matplotlib inline

import cv2

input_dim = 416

bn_variance_epsilon = 1e-6


batch_size = 1

confidence_threshold = 0.55

iou_threshold=0.45

max_output_size = 1000
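
The code shown in this article never reaches the point where `iou_threshold` and `max_output_size` are consumed. Their names match the parameters of non-maximum suppression (NMS), so the following is only a sketch of what such a step usually does, written in plain Python for readability rather than speed. The helper names `iou` and `greedy_nms` are hypothetical and do not come from the original code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.45, max_output_size=1000):
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (relative coordinates).
boxes = [(0.10, 0.10, 0.50, 0.50), (0.12, 0.12, 0.52, 0.52), (0.60, 0.60, 0.90, 0.90)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

Inside a TensorFlow graph, `tf.image.non_max_suppression` accepts the same two parameters and would keep this step out of Python loops; whether the original author used it is an assumption.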

def print_tensor_info(t):
    print(t.op.name, ' ', t.get_shape().as_list())

# !python -m wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg

blockList = []
current_block = {}
with open('cfg/yolov3.cfg') as cfg:
    line = cfg.readline()
    while line:
        if line.startswith("#"):
            line = cfg.readline()
            continue
        if line.startswith("["):
            if current_block != {}:
                blockList.append(current_block)
            current_block = {}
            # strip() first so the section name is correct regardless of line ending
            current_block["type"] = line.strip()[1:-1]
        else:
            if "=" in line:
                key, value = line.split("=", 1)
                key = key.strip()
                value = value.strip()
                current_block[key] = value

        line = cfg.readline()
    if current_block != {}:
        blockList.append(current_block)
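
To make the resulting data structure concrete, here is a small sketch of the same parsing logic applied to an in-memory string instead of `cfg/yolov3.cfg`. The function name `parse_cfg` and the two-section sample text are made up for illustration. Note that every parsed value is a string, so numeric fields must be converted before use.

```python
def parse_cfg(text):
    """Parse Darknet-style cfg text into a list of dicts, one per [section]."""
    blocks = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            if current:
                blocks.append(current)
            current = {"type": line[1:-1]}
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


sample = """
[net]
width=416
channels=3

# a comment line
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
activation=leaky
"""
blocks = parse_cfg(sample)
print(blocks[0])             # {'type': 'net', 'width': '416', 'channels': '3'}
print(blocks[1]["filters"])  # 32 (as a string)
```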

yolo_blocks = []
for block in blockList:
    if block["type"] == "yolo":
        yolo_blocks.append(block)

yolo_blocks[0]

{'type': 'yolo',
 'mask': '6,7,8',
 'anchors': '10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326',
 'classes': '80',
 'num': '9',
 'jitter': '.3',
 'ignore_thresh': '.7',
 'truth_thresh': '1',
 'random': '1'}
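
The `mask` entry selects which of the nine anchor pairs belong to this yolo layer. As a quick check in plain Python, using the values from the block printed above (the function name `select_anchors` is only illustrative):

```python
def select_anchors(block):
    """Return the (width, height) anchor pairs picked out by the block's mask."""
    mask = [int(x) for x in block["mask"].split(",")]
    flat = [int(x.strip()) for x in block["anchors"].split(",")]
    return [(flat[2 * i], flat[2 * i + 1]) for i in mask]


block = {
    "mask": "6,7,8",
    "anchors": "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326",
}
priors = select_anchors(block)
print(priors)  # [(116, 90), (156, 198), (373, 326)]
```

The first yolo block therefore works with the three largest anchors, which fits its coarse 13 x 13 feature map.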

def yolo_layer(forward_input, block_info):
    shape = forward_input.get_shape()
    
    if len(shape.as_list()) == 4:
        shape_list = [int(shape[0]), int(shape[1]), int(shape[2]), 3, int(shape[3]) // 3]
        yolo_tensor = tf.reshape(forward_input, shape_list)
    else:
        yolo_tensor = forward_input
    # yolo_tensor has shape (1, 13, 13, 3, 85)
    # 85 = 4 (bbox: x_center_in_grid, y_center_in_grid, width, height) + 1 (objectness) + 80 (class confidences)
    
    # stride from input image to feature map
    # With a 416 x 416 input, the first yolo layer is 13 x 13, so its stride is 32.
    # The second yolo layer has stride 16 and the third has stride 8.
    stride = input_dim // shape.as_list()[1]

    
    # There are 9 prior box sizes in total; each yolo layer uses the 3 selected by its mask
    mask_list = [ int(x) for x in block_info["mask"].split(",")]
    anchors_list = [ int(x.strip()) for x in block_info["anchors"].split(",")]
    prior_width_height = [ tuple([anchors_list[2*i], anchors_list[2*i + 1] ]) for i in mask_list]
    
    assert(len(prior_width_height) == 3)
    
    # Box width and height predicted for each of the three prior boxes

    box_width_height1 = tf.exp(yolo_tensor[:, :, :, 0:1, 2:4]) * prior_width_height[0] 
    box_width_height2 = tf.exp(yolo_tensor[:, :, :, 1:2, 2:4]) * prior_width_height[1] 
    box_width_height3 = tf.exp(yolo_tensor[:, :, :, 2:3, 2:4]) * prior_width_height[2] 
    

    box_width_height = tf.concat([box_width_height1, box_width_height2, box_width_height3], axis=3)

    
    confidence_tensor = tf.sigmoid(yolo_tensor[:, :, :, :, 4:])

    
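    # Select the indices of predictions whose objectness is at least the threshold.
    # Each row of result_indexes is (N, H, W, anchor).
    # confidence_tensor has already been passed through sigmoid, so it is compared directly.
    result_indexes = tf.where(tf.greater_equal(confidence_tensor[:, :, :, :, 0], confidence_threshold))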
    
    
    # TODO: without the +1 here, the bbox coordinates come out wrong
#     gird_x_y_raw = tf.cast(result_indexes[:,1:3],  tf.float32)
    gird_x_y_raw = tf.cast(result_indexes[:,1:3],  tf.float32) + 1
    
    # Index columns are (N, H, W, anchor), so H and W must be swapped to get (x, y)
    gird_x_y = tf.concat([gird_x_y_raw[:,1:], gird_x_y_raw[:,0:1]] , axis=1)
    
    
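    # Keep only the detections that passed the threshold
    result_boxes = tf.gather_nd(yolo_tensor, result_indexes)
    
    # x, y are coordinates relative to the grid cell
    center_x_y = tf.sigmoid(result_boxes[:, 0:2])
    # Convert to absolute coordinates in the full image
    output_center_x_y = (gird_x_y + center_x_y) * stride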
    
    output_width_height =  tf.gather_nd(box_width_height, result_indexes)
    
    x_min = tf.maximum(output_center_x_y[:, 0:1] - 0.5 * output_width_height[:, 0:1], 0)
    x_max = tf.minimum(output_center_x_y[:, 0:1] + 0.5 * output_width_height[:, 0:1], input_dim)
    y_min = tf.maximum(output_center_x_y[:, 1:2] - 0.5 * output_width_height[:, 1:2], 0)
    y_max = tf.minimum(output_center_x_y[:, 1:2] + 0.5 * output_width_height[:, 1:2], input_dim)

    output_xmin_ymin_xmax_y_max = tf.concat([x_min, y_min, x_max, y_max], axis=1)
    
    # Convert absolute coordinates to relative coordinates in [0, 1]
    output_xmin_ymin_xmax_y_max_float = output_xmin_ymin_xmax_y_max / input_dim
        
    classes_confidence = tf.gather_nd(confidence_tensor[:, :, :, :,1:], result_indexes)
    index_int = tf.argmax(classes_confidence ,axis=1)[ :, tf.newaxis]
    max_index = tf.cast(index_int, dtype=tf.float32)
    max_confidence = tf.reduce_max(classes_confidence ,axis=1)[ :, tf.newaxis]
    
    # The output tensor has shape (None, 6)
    # 6 = 4 (bbox) + 1 (class_index) + 1 (class_confidence)
    yolo_output = tf.concat([output_xmin_ymin_xmax_y_max_float,  max_index, max_confidence], axis=1)
    return yolo_output
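
As a worked example of the decoding arithmetic, the sketch below applies the formulas from the YOLOv3 paper to a single prediction in plain Python: the center is `(cell + sigmoid(t)) * stride` and the size is `anchor * exp(t)`. The function name `decode_box` is hypothetical. The sketch deliberately omits the extra `+ 1` grid offset that the function above adds as a workaround, so its centers differ from that code by one stride.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell_xy, anchor_wh, stride, input_dim=416):
    """Decode raw outputs t = (tx, ty, tw, th) into relative (xmin, ymin, xmax, ymax)."""
    tx, ty, tw, th = t
    cx = (cell_xy[0] + sigmoid(tx)) * stride   # absolute center x in pixels
    cy = (cell_xy[1] + sigmoid(ty)) * stride   # absolute center y in pixels
    w = anchor_wh[0] * math.exp(tw)            # width in pixels
    h = anchor_wh[1] * math.exp(th)            # height in pixels
    # Clip to the image, then scale to the [0, 1] range.
    xmin = max(cx - 0.5 * w, 0.0) / input_dim
    ymin = max(cy - 0.5 * h, 0.0) / input_dim
    xmax = min(cx + 0.5 * w, float(input_dim)) / input_dim
    ymax = min(cy + 0.5 * h, float(input_dim)) / input_dim
    return xmin, ymin, xmax, ymax


stride = 416 // 13  # 32 for the 13 x 13 feature map
box = decode_box((0.0, 0.0, 0.0, 0.0), cell_xy=(6, 6), anchor_wh=(116, 90), stride=stride)
print([round(v * 416) for v in box])  # [150, 163, 266, 253]
```

With all raw outputs at zero, the box sits at the middle of cell (6, 6) and has exactly the anchor's size, which is a convenient baseline when checking the TensorFlow version.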

def make_one_layer(forward_input, block, layersList, np_weights,  weights_ptr):
    # Build the next layer from the previous one according to the block info
    
    forward_output = None
    
    # Pointer (offset) used when reading the yolov3.weights weight array
    next_weights_ptr = 0
    
    if block['type'] == "net":
#         forward_output = tf.placeholder(dtype=tf.float32, shape = 
#                                         [batch_size, int(block["height"]), int(block["width"]), int(block["channels"])])
        forward_output = tf.placeholder(dtype=tf.float32, shape = 
                                        [batch_size, input_dim, input_dim, int(block["channels"])])
        with tf.name_scope("input") as scope:
            return tf.identity(forward_output, name="x" ), weights_ptr
    if block["type"] == "convolutional":
        input_filters = int(forward_input.get_shape()[3])
        output_filters =  int(block["filters"])
        kernel_size = int(block["size"])
        kernel_shape = [int(block["size"]),  int(block["size"]), input_filters, output_filters]
        stride = int(block["stride"])
        strides = [1, stride, stride, 1]
        pad = (kernel_size - 1) // 2
        # batch norm load more weights
        # cfg values are strings, so convert to int; the string "0" would otherwise be truthy
        batch_norm = int(block.get("batch_normalize", 0))
        if batch_norm:
            
            #batch norm weights
            bn_offset = np_weights[weights_ptr:weights_ptr+output_filters]
            weights_ptr += output_filters
            
            bn_scale = np_weights[weights_ptr:weights_ptr+output_filters]
            weights_ptr += output_filters
            
            bn_mean = np_weights[weights_ptr:weights_ptr+output_filters]
            weights_ptr += output_filters
            
            bn_var = np_weights[weights_ptr:weights_ptr+output_filters]
            weights_ptr += output_filters
            
            #conv filter weights
            conv_weights_num = output_filters * input_filters * kernel_size * kernel_size
            np_conv_weights = np_weights[weights_ptr:weights_ptr+conv_weights_num]
            weights_ptr += conv_weights_num
            np_conv_weights = np_conv_weights.reshape( output_filters, input_filters, kernel_size, kernel_size)
#             print(np_conv_weights.shape)
#             print(tf.constant(np_conv_weights, dtype=tf.float32))

#             conv_kernel =  (tf.transpose(tf.constant(np_conv_weights, dtype=tf.float32), [2, 3, 1, 0]))
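            # A TensorFlow op was used here before to transpose the weight tensor, which was slow.
            # The transpose never needs to be repeated, so doing it once in numpy is enough.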
            np_conv_weights = np.transpose(np_conv_weights, [2, 3, 1, 0])
            conv_kernel =  tf.constant(np_conv_weights, dtype=tf.float32)
    
            conv_output = tf.nn.conv2d(forward_input, conv_kernel, strides, "SAME")
            
            bn_output = tf.nn.batch_normalization(conv_output, tf.constant(bn_mean, dtype=tf.float32), 
                                                 tf.constant(bn_var, dtype=tf.float32), tf.constant(bn_offset, dtype=tf.float32),
                                                 tf.constant(bn_scale, dtype=tf.float32), bn_variance_epsilon)
            
            if block["activation"] == "leaky":
                leaky_relu_output = tf.nn.leaky_relu(bn_output, alpha=0.1)
                # layersList.append(leaky_relu_output)
                with tf.name_scope("conv"+str(len(layersList))) as scope:
                    return tf.identity(leaky_relu_output, name=scope ), weights_ptr
            
            # layersList.append(bn_output)
            # Only the leaky activation is handled for batch-normalized convolutions
            raise Exception("unhandled condition")
        else:
            # no batch_norm
            
            #conv bias
            conv_bias = np_weights[weights_ptr:weights_ptr+output_filters]
            weights_ptr += output_filters
            
            #conv filter weights
            conv_weights_num = output_filters * input_filters * kernel_size * kernel_size
            np_conv_weights = np_weights[weights_ptr:weights_ptr+conv_weights_num]
            weights_ptr += conv_weights_num
            np_conv_weights = np_conv_weights.reshape( output_filters, input_filters, kernel_size, kernel_size)
            
#             conv_kernel =  (tf.transpose(tf.constant(np_conv_weights, dtype=tf.float32), [2, 3, 1, 0]))
            
            np_conv_weights = np.transpose(np_conv_weights, [2, 3, 1, 0])
            conv_kernel =  tf.constant(np_conv_weights, dtype=tf.float32)
            
            
            conv_output = tf.nn.conv2d(forward_input, conv_kernel, strides, "SAME")
            
            conv_bias_output = tf.nn.bias_add(conv_output, tf.constant(conv_bias, dtype=tf.float32))
            
            # layersList.append(conv_bias_output)
            with tf.name_scope("conv"+str(len(layersList))) as scope:
                return tf.identity(conv_bias_output, name=scope), weights_ptr
    
    if  block["type"] == "route":
        layers = []
        if "," not in block["layers"]:
            layers.append(int(block["layers"]))
        else:
            for l in block["layers"].split(","):
                layers.append(int(l.strip()))
        
#         for l in layers:
#             if l <0:
#                 l = len(layersList) + l


        # layersList starts with an extra input_data layer, so absolute layer indices must be shifted by one
        layers_new=[]
        for l in layers:
            if l<0:
                layers_new.append(len(layersList) + l)
            else:
                layers_new.append( l +1 )
        layers = layers_new
        
        if len(layers) == 1:
            with tf.name_scope("route"+str(len(layersList))) as scope:
                return tf.identity(layersList[layers[0]], name= scope), weights_ptr
        else:
            
            to_concat_list = [layersList[i] for i in layers]
            concat_result =tf.concat( to_concat_list, axis=3)
            
            with tf.name_scope("route"+str(len(layersList))) as scope:
                return tf.identity(concat_result, name=scope), weights_ptr
        

    if  block["type"] == "shortcut":
        prior_index = int(block["from"])
#         print("prior_index = ", prior_index)
#         print("layersList[prior_index]", layersList[prior_index])
        residual_layer = forward_input + layersList[prior_index]
    
        with tf.name_scope("shortcut"+str(len(layersList))) as scope:
            return tf.identity(residual_layer, name= scope), weights_ptr
        
    
    if  block["type"] == "upsample":
        new_height = int(block["stride"]) * int(forward_input.get_shape()[1])
        new_width = int(block["stride"]) * int(forward_input.get_shape()[2])
        
        # Darknet's upsample layer uses nearest-neighbor interpolation, so match it here
        upsample_layer = tf.image.resize_nearest_neighbor(forward_input, [new_height, new_width])
        
        with tf.name_scope("upsample"+str(len(layersList))) as scope:
            return tf.identity(upsample_layer, name= scope), weights_ptr
        
    
    if  block["type"] == "yolo":
        
        with tf.name_scope("yolo"+str(len(layersList))) as scope:
            return tf.identity(yolo_layer(forward_input, block_info=block), name=scope), weights_ptr
        
    
    raise Exception('unknown block type')
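
The pointer arithmetic above determines how many floats each convolutional block consumes from the flat weight array: four per-filter vectors (offset, scale, mean, variance) with batch normalization, or one bias vector without it, followed by the kernel. A small sketch of that bookkeeping, with a hypothetical helper name `conv_weight_count`:

```python
def conv_weight_count(input_filters, output_filters, kernel_size, batch_norm):
    """Number of float values one convolutional block reads from the weight array."""
    per_filter_vectors = 4 if batch_norm else 1
    kernel = output_filters * input_filters * kernel_size * kernel_size
    return per_filter_vectors * output_filters + kernel


# First layer of yolov3.cfg: 3 input channels, 32 filters of size 3, batch-normalized.
first = conv_weight_count(3, 32, 3, batch_norm=True)
print(first)  # 992

# A detection-head convolution: 1024 inputs, 255 filters of size 1, bias only.
head = conv_weight_count(1024, 255, 1, batch_norm=False)
print(head)  # 261375
```

Summing this count over all convolutional blocks should match the number of floats left in the weights file after its header, which makes a handy sanity check that `weights_ptr` ended in the right place.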

# !python -m wget https://pjreddie.com/media/files/yolov3.weights


              
           
              
              
            