
[TensorFlow tf muck-digging notes] Note 5: Implementing YOLOv3 in TensorFlow

YOLOv3 is the YOLO author's further refinement of the YOLO algorithm. Compared with its predecessors, the network adds residual blocks to connect layers and adopts a pyramid-style multi-scale structure; its depth increases substantially, reaching 53 layers, which is why the author named the network DarkNet-53.

Since the author's coding ability is formidable (even the framework he uses appears to be self-written), there is currently no official release for other frameworks. Based on the author's YOLOv3 paper and all of the earlier YOLO papers, I implemented YOLOv3 according to my own understanding of them. (If my understanding of the papers is wrong, please point it out; I would be very grateful.)
As usual, the various settings can be changed under the config folder in the corresponding configuration entries.

Project code

A brief introduction to YOLO

YOLO is short for "you only look once". As the name says, the whole prediction pipeline looks at the image only once. This differs from earlier detection-and-recognition projects, which typically look at the image once to localize objects and then again to classify them. So, intuitively, YOLO should be more efficient: one fewer pass saves time. In practice YOLO is indeed astonishingly fast; the author achieves real-time detection and recognition on a Titan GPU, and the demo video is right on the project's homepage (if you cannot see it, your way of surfing the web may not be "scientific" enough, i.e. you may need a proxy).

Project structure

This project consists mainly of 3 files and 2 folders. The utils folder contains 6 files; the config folder holds the config.yml configuration file.

  • reader.py:
    • Methods for reading the dataset and the paths of the dataset's label files; the mini-batch splitting is also done while reading the file names.
  • train.py:
    • Glues the various utilities together; this is the code that trains the network.
  • eval.py:
    • Runs the trained YOLOv3.
  • utils folder:
    • extract_labels.py:
      • labels_normalizer() reads the labels from the given label paths and converts them into the format our network needs.
    • get_loss.py:
      • Combines the methods for computing the 3 loss terms the YOLO author described in his earlier papers, and computes the batch loss.
    • IOU.py:
      • IOU_calculator() computes the IOU from the predicted values and the target label values.
    • net.py:
      • Implements the core of YOLOv3, DarkNet-53.
    • read_config.py:
      • Reads the configuration parameters from the config file.
    • select_things.py:
      • As the name suggests, helper methods for selecting things, e.g. the size of a YOLOv3 scale, or the checkpoint file matching a scale.

Utils

extract_labels

My approach to handling the annotation data is to create an array whose shape matches the shape of the network's output array, and to write each object's label into the array element at the corresponding index.

In the labels_normalizer() method I build a map that converts every class name into the corresponding index in that array. I use the VOC2007 dataset here; it can be downloaded from the official site, which also documents how many classes the dataset contains. If you switch to another dataset, remember to come back and update this map.

Since the VOC labels are stored as XML, I use the xml.dom.minidom library to parse each XML file, extract object_name and the bndbox values xmin, ymin, xmax and ymax, pack them into tuples, and collect the tuples into a list that is returned.

from xml.dom.minidom import parse

def xml_extractor( dir ):
    DOMTree = parse( dir )
    collection = DOMTree.documentElement    # get the root node of the xml file
    file_name_xml = collection.getElementsByTagName( 'filename' )[0]
    objects_xml = collection.getElementsByTagName( 'object' )
    size_xml = collection.getElementsByTagName( 'size' )

    file_name = file_name_xml.childNodes[0].data

    for size in size_xml:
        width = size.getElementsByTagName( 'width' )[0]
        height = size.getElementsByTagName( 'height' )[0]

        width = width.childNodes[0].data
        height = height.childNodes[0].data

    objects = []
    for object_xml in objects_xml:
        object_name = object_xml.getElementsByTagName( 'name' )[0]
        bdbox = object_xml.getElementsByTagName( 'bndbox' )[0]
        xmin = bdbox.getElementsByTagName( 'xmin' )[0]
        ymin = bdbox.getElementsByTagName( 'ymin' )[0]
        xmax = bdbox.getElementsByTagName( 'xmax' )[0]
        ymax = bdbox.getElementsByTagName( 'ymax' )[0]

        object = ( object_name.childNodes[0].data,
                   xmin.childNodes[0].data,
                   ymin.childNodes[0].data,
                   xmax.childNodes[0].data,
                   ymax.childNodes[0].data )

        objects.append( object )

    return file_name, width, height, objects
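
A minimal usage sketch of xml_extractor (the annotation path below is hypothetical):

file_name, width, height, objects = xml_extractor( './VOCdevkit/VOC2007/Annotations/000001.xml' )
print( file_name, width, height )
for object_name, xmin, ymin, xmax, ymax in objects:
    print( object_name, xmin, ymin, xmax, ymax )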

labels_normalizer() is where the labels are actually converted into arrays. One detail is worth noting: when creating the new array, be sure to add 1e-8 (a number close to 0); this prevents a zero denominator, and hence a nan output, when the IOU is computed later. Also, the VOC dataset annotates the corner coordinates of each object's bounding box, while what we need is the object's centre point together with the width and height of the corresponding bounding box, so a small calculation is required.

Moreover, because YOLOv3's detection mechanism is that the box containing the object's centre point is the one that predicts the object, we also have to work out which box the object falls in and assign the label to the array element with that index. To avoid an out-of-bounds index, I assign points lying exactly on the rightmost or bottom border to the previous box; strictly they belong to the next box, but that would trigger an out-of-range error. A worked example with made-up numbers is given below.
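
For instance (all numbers made up; this only illustrates the arithmetic above, it is not code from the project):

# original image 500x375, target size 416x416, 13x13 output grid,
# and an object annotated with corners (48, 240) and (195, 371)
target_width, target_height = 416, 416
layerout_width, layerout_height = 13, 13
width, height = 500, 375                      # original image size from the XML
xmin, ymin, xmax, ymax = 48, 240, 195, 371    # annotated corner coordinates

width_proportion = target_width / width       # 0.832
height_proportion = target_height / height    # ~1.109

x = ( xmax + xmin ) / 2 * width_proportion            # centre x after resizing, ~101.1
y = ( ymax + ymin ) / 2 * height_proportion           # centre y after resizing, ~338.9
bdbox_width = ( xmax - xmin ) * width_proportion      # ~122.3
bdbox_height = ( ymax - ymin ) * height_proportion    # ~145.3

flag_width = target_width / layerout_width      # 32 resized-image pixels per box horizontally
flag_height = target_height / layerout_height   # 32 resized-image pixels per box vertically
box_x = x // flag_width                         # the object's centre falls in box column 3
box_y = y // flag_height                        # the object's centre falls in box row 10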

def labels_normalizer( batches_filenames, target_width, target_height, layerout_width, layerout_height ):

    class_map = {
        'person' : 5,
        'bird' : 6,
        'cat' : 7,
        'cow' : 8,
        'dog' : 9,
        'horse' : 10,
        'sheep' : 11,
        'aeroplane' : 12,
        'bicycle' : 13,
        'boat' : 14,
        'bus' : 15,
        'car' : 16,
        'motorbike' : 17,
        'train' : 18,
        'bottle' : 19,
        'chair' : 20,
        'diningtable' : 21,
        'pottedplant': 22,
        'sofa' : 23,
        'tvmonitor' : 24
    }

    height_width = []
    batches_labels = []
    for batch_filenames in batches_filenames:
        batch_labels = []
        for filename in batch_filenames:
            _, width, height, objects = xml_extractor( filename )
            width_proportion = target_width / int( width )
            height_proportion = target_height / int( height )
            label = np.add( np.zeros( [int( layerout_height ), int( layerout_width ), 255] ), 1e-8 )    # adding 1e-8 avoids a zero denominator, and hence a nan, when this data is later used to compute the IOU
            for object in objects:
                class_label = class_map[object[0]]
                xmin = float( object[1] )
                ymin = float( object[2] )
                xmax = float( object[3] )
                ymax = float( object[4] )
                x = ( 1.0 * xmax + xmin ) / 2 * width_proportion    # x coordinate of the object's centre
                y = ( 1.0 * ymax + ymin ) / 2 * height_proportion    # y coordinate of the object's centre
                bdbox_width = ( 1.0 * xmax - xmin ) * width_proportion    # width of the object's bounding box
                bdbox_height = ( 1.0 * ymax - ymin ) * height_proportion    # height of the object's bounding box
                flag_width = int( target_width ) / layerout_width    # number of resized-image pixels covered by one box horizontally
                flag_height = int( target_height ) / layerout_height    # number of resized-image pixels covered by one box vertically
                box_x = x // flag_width    # x index of the box that x falls in
                box_y = y // flag_height    # y index of the box that y falls in
                if box_x == layerout_width:    # points on the right border of the last box are assigned to that last box (they would otherwise belong to the next box)
                    box_x -= 1
                if box_y == layerout_height:    # points on the bottom border of the lowest box are assigned to that lowest box (they would otherwise belong to the next box)
                    box_y -= 1
                for i in range( 3 ):    # every box predicts 3 bounding boxes
                    label[int( box_y ), int( box_x ), i * 25] = x    # point x
                    label[int( box_y ), int( box_x ), i * 25 + 1] = y    # point y
                    label[int( box_y ), int( box_x ), i * 25 + 2] = bdbox_width    # bdbox width
                    label[int( box_y ), int( box_x ), i * 25 + 3] = bdbox_height    # bdbox height
                    label[int( box_y ), int( box_x ), i * 25 + 4] = 1    # objectness
                    label[int( box_y ), int( box_x ), i * 25 + int( class_label )] = 0.9    # class label

            batch_labels.append( label )

        batches_labels.append( batch_labels )

    # batches_labels = np.array( batches_labels )

    return batches_labels
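
A small usage sketch (the annotation file below is hypothetical): with a 416x416 input and a 13x13 grid, each label array should have shape (13, 13, 255).

batches_filenames = [['./VOCdevkit/VOC2007/Annotations/000001.xml']]
batches_labels = labels_normalizer( batches_filenames, 416, 416, 13, 13 )
print( np.array( batches_labels ).shape )    # expected: (1, 1, 13, 13, 255)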

get_loss

The mathematical expression of the total loss function, with λcoord taken as 5 and λnoobj taken as 0.5:
(figure: the total loss function)
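The figure is missing here, so for reference this is, to my understanding, the sum-squared-error loss from the original YOLO paper that it showed, with \lambda_{coord} = 5 and \lambda_{noobj} = 0.5:

$$
\begin{aligned}
loss = {} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
+ {} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
+ {} & \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2
+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2 \\
+ {} & \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$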
Code implementation:

def calculate_loss( batch_inputs, batch_labels ):
    batch_loss = 0
    # for batch in range( batch_inputs.shape[0] ):
    for image_num in range( batch_inputs.shape[0] ):
        for y in range( batch_inputs.shape[1] ):
            for x in range( batch_inputs.shape[2] ):
                for i in range( 3 ):
                    predict_x = batch_inputs[image_num][y][x][i * 25]
                    predict_y = batch_inputs[image_num][y][x][i * 25 + 1]
                    predict_width = batch_inputs[image_num][y][x][i * 25 + 2]
                    predict_height = batch_inputs[image_num][y][x][i * 25 + 3]
                    predict_objectness = batch_inputs[image_num][y][x][i * 25 + 4]
                    predict_class = batch_inputs[image_num][y][x][i * 25 + 5 : i * 25 + 5 + 20]
                    label_x = batch_labels[image_num][y][x][i * 25]
                    label_y = batch_labels[image_num][y][x][i * 25 + 1]
                    label_width = batch_labels[image_num][y][x][i * 25 + 2]
                    label_height = batch_labels[image_num][y][x][i * 25 + 3]
                    label_objectness = batch_labels[image_num][y][x][i * 25 + 4]
                    label_class = batch_labels[image_num][y][x][i * 25 + 5 : i * 25 + 5 + 20]
                    IOU = get_IOU.IOU_calculator( tf.cast( predict_x, tf.float32 ),
                                                  tf.cast( predict_y, tf.float32 ),
                                                  tf.cast( predict_width, tf.float32 ),
                                                  tf.cast( predict_height, tf.float32 ),
                                                  tf.cast( label_x, tf.float32 ),
                                                  tf.cast( label_y, tf.float32 ),
                                                  tf.cast( label_width, tf.float32 ),
                                                  tf.cast( label_height, tf.float32 ) )
                    loss = class_loss( predict_class,
                                       label_class ) + location_loss( predict_x,
                                                                      predict_y,
                                                                      predict_width,
                                                                      predict_height,
                                                                      label_x,
                                                                      label_y,
                                                                      label_width,
                                                                      label_height ) + objectness_loss( IOU, predict_objectness, label_objectness )

                    batch_loss += loss
    return batch_loss

The loss function for the IOU (objectness) term:
(figure: the objectness loss term)
Code implementation:

def objectness_loss( input, switch, l_switch, alpha = 0.5 ):
    '''
    Calculate the objectness loss

    :param input: input IOU
    :param switch: 1 if there is a target in this box, else 1e-8
    :param l_switch: 1 if there is a target in this box, else 0
    :return: objectness_loss
    '''

    IOU_loss = tf.square( l_switch - input * switch )
    loss_max = tf.square( l_switch * 0.5 - input * switch )

    IOU_loss = tf.cond( IOU_loss < loss_max, lambda : tf.cast( 1e-8, tf.float32 ), lambda : IOU_loss )

    IOU_loss = tf.cond( l_switch < 1, lambda : IOU_loss * alpha, lambda : IOU_loss )

    return IOU_loss

The author says that this time an IOU error of 0.5 is within the acceptable range, so I added a conditional to the objectness_loss method that sets the IOU_loss to 1e-8 (a number very close to 0) whenever the error is below that threshold. The author also wants a box that contains no object centre to predict an IOU of 0; I represent that 0 with 1e-8.

The loss function for the class term:
(figure: the class loss term)
Code implementation:

def class_loss( inputs, labels ):
    classloss = tf.square( labels - inputs )
    loss_sum = tf.reduce_sum( classloss )

    return loss_sum

Computing this kind of loss is machine-learning basics; nothing difficult here.

The loss function for the location term:
(figure: the location loss term)
Code implementation:

def location_loss( x, y, width, height, l_x, l_y, l_width, l_height, alpha = 5 ):
    point_loss = ( tf.square( l_x - x ) + tf.square( l_y - y ) ) * alpha
    size_loss = ( tf.square( tf.sqrt( l_width ) - tf.sqrt( width ) ) + tf.square( tf.sqrt( l_height ) - tf.sqrt( height ) ) ) * alpha

    location_loss = point_loss + size_loss

    return location_loss

There are square roots here, so when writing the net later, remember to take the absolute value of the outputs; this avoids a negative number under the root producing nan.

IOU

The IOU is the area of the intersection of the predicted box and the label box, divided by the total area covered by the two boxes.
Here we know the centre coordinates, width and height of the label box as well as of the predicted box, so the rest is a middle-school geometry exercise.
Throughout I try to keep denominators away from 0 to avoid the annoying nan errors.

def IOU_calculator( x, y, width, height, l_x, l_y, l_width, l_height ):
    '''
    Calculate IOU

    :param x: net predicted x
    :param y: net predicted y
    :param width: net predicted width
    :param height: net predicted height
    :param l_x: label x
    :param l_y: label y
    :param l_width: label width
    :param l_height: label height
    :return: IOU
    '''

    x_max = calculate_max( x , width / 2 )
    y_max = calculate_max( y, height / 2 )
    x_min = calculate_min( x, width / 2 )
    y_min = calculate_min( y, height / 2 )

    l_x_max = calculate_max( l_x, l_width / 2 )
    l_y_max = calculate_max( l_y, l_height / 2 )
    l_x_min = calculate_min( l_x, l_width / 2 )
    l_y_min = calculate_min( l_y, l_height / 2 )

    '''--------Calculate the intersection's corner points--------'''
    xend = tf.minimum( x_max, l_x_max )
    xstart = tf.maximum( x_min, l_x_min )

    yend = tf.minimum( y_max, l_y_max )
    ystart = tf.maximum( y_min, l_y_min )

    area_width = xend - xstart
    area_height = yend - ystart

    '''--------Calculate the IOU--------'''
    area = area_width * area_height

    all_area = tf.cond( ( width * height + l_width * l_height - area ) <= 0, lambda : tf.cast( 1e-8, tf.float32 ), lambda : ( width * height + l_width * l_height - area ) )

    IOU = area / all_area

    IOU = tf.cond( area_width < 0, lambda : tf.cast( 1e-8, tf.float32 ), lambda : IOU )
    IOU = tf.cond( area_height < 0, lambda : tf.cast( 1e-8, tf.float32 ), lambda : IOU )

    return IOU
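
A quick sanity check of IOU_calculator (made-up values; this assumes utils/IOU.py is importable as shown and uses a TF1-style session, like the rest of the project):

import tensorflow as tf
from utils.IOU import IOU_calculator

# two identical boxes centred at (5, 5) with width = height = 4 should give an IOU close to 1
iou = IOU_calculator( tf.constant( 5.0 ), tf.constant( 5.0 ), tf.constant( 4.0 ), tf.constant( 4.0 ),
                      tf.constant( 5.0 ), tf.constant( 5.0 ), tf.constant( 4.0 ), tf.constant( 4.0 ) )

with tf.Session() as sess:
    print( sess.run( iou ) )    # expect a value close to 1.0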

net

The DarkNet-53 structure diagram from the paper:
(figure: the DarkNet-53 architecture)
The paper says there are 3 scales of different sizes. The scales take their feature maps from, respectively, the last, the second-to-last and the third-to-last blocks in the figure above, and each is combined with the output of the final layers. The author says this yields a great many extra features.

The activation function I use is Leaky ReLU. Because the TensorFlow I was using does not ship a Leaky ReLU op, I wrote one myself; in essence it is just a ReLU whose gradient is non-zero for negative x.

def Leaky_Relu( input, alpha = 0.01 ):
    output = tf.maximum( input, tf.multiply( input, alpha ) )

    return output

I defined two kinds of convolution functions. One performs the convolution, then batch_normalization and Leaky ReLU, and passes the result straight on. The other performs the convolution, batch_normalization and Leaky ReLU, then adds the residual shortcut and sends the sum through Leaky ReLU once more.

def Res_conv2d( inputs, shortcut, filters, shape, stride = ( 1, 1 ) ):
    conv = conv2d( inputs, filters, shape )
    Res = Leaky_Relu( conv + shortcut )

    return Res
def conv2d( inputs, filters, shape, stride = ( 1, 1 ) ):
    layer = tf.layers.conv2d( inputs,
                              filters,
                              shape,
                              stride,
                              padding = 'SAME',
                              kernel_initializer=tf.truncated_normal_initializer( stddev=0.01 ) )

    layer = tf.layers.batch_normalization( layer, training = True )

    layer = Leaky_Relu( layer )

    return layer

Then the network is implemented by following the diagram.

def feature_extractor( inputs ):
    layer = conv2d( inputs, 32, [3, 3] )
    layer = conv2d( layer, 64, [3, 3], ( 2, 2 ) )
    shortcut = layer

    layer = conv2d( layer, 32, [1, 1] )
    layer = Res_conv2d( layer, shortcut, 64, [3, 3] )

    layer = conv2d( layer, 128, [3, 3], ( 2, 2 ) )
    shortcut = layer

    for _ in range( 2 ):
        layer = conv2d( layer, 64, [1, 1] )
        layer = Res_conv2d( layer, shortcut, 128, [3, 3] )

    layer = conv2d( layer, 256, [3, 3], ( 2, 2 ) )
    shortcut = layer

    for _ in range( 8 ):
        layer = conv2d( layer, 128, [1, 1] )
        layer = Res_conv2d( layer, shortcut, 256, [3, 3] )
    pre_scale3 = layer

    layer = conv2d( layer, 512, [3, 3], ( 2, 2 ) )
    shortcut = layer

    for _ in range( 8 ):
        layer = conv2d( layer, 256, [1, 1] )
        layer = Res_conv2d( layer, shortcut, 512, [3, 3] )
    pre_scale2 = layer

    layer = conv2d( layer, 1024, [3, 3], ( 2, 2 ) )
    shortcut = layer

    for _ in range( 4 ):
        layer = conv2d( layer, 512, [1, 1] )
        layer = Res_conv2d( layer, shortcut, 1024, [3, 3] )
    pre_scale1 = layer

    return pre_scale1, pre_scale2, pre_scale3
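
As a rough sanity check of the downsampling (my own reasoning, not from the original post): there are five stride-2 convolutions, so the three returned feature maps are 1/32, 1/16 and 1/8 of the input size. For a 416x416 input that means 13x13, 26x26 and 52x52:

inputs = tf.placeholder( tf.float32, [None, 416, 416, 3] )
pre_scale1, pre_scale2, pre_scale3 = feature_extractor( inputs )
print( pre_scale1.shape )    # (?, 13, 13, 1024)
print( pre_scale2.shape )    # (?, 26, 26, 512)
print( pre_scale3.shape )    # (?, 52, 52, 256)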

The author says that for scale 2 and scale 3, the feature maps taken from the middle of the network go through a 2x operation. My understanding is to simply treat the output as an image and resize it.
In this function I also combine the upsampled maps with the output of the network's final layers (the code concatenates them along the channel axis rather than adding them element-wise).

def get_layer2x( layer_final, pre_scale ):
    layer2x = tf.image.resize_images(layer_final,
                                     [2 * tf.shape(layer_final)[1], 2 * tf.shape(layer_final)[2]])
    layer2x_add = tf.concat( [layer2x, pre_scale], 3 )

    return layer2x_add

The final scales still have to pass through a few more layers to produce the final predictions. I implemented this part following the structure printed by the DarkNet I had installed. At the very end I take the absolute value of all outputs to avoid the nan error mentioned earlier.

def scales( layer, pre_scale2, pre_scale3 ):
    layer_copy = layer
    layer = conv2d( layer, 512, [1, 1] )
    layer = conv2d( layer, 1024, [3, 3] )
    layer = conv2d(layer, 512, [1, 1])
    layer_final = layer
    layer = conv2d(layer, 1024, [3, 3])

    '''--------scale_1--------'''
    scale_1 = conv2d( layer, 255, [1, 1] )

    '''--------scale_2--------'''
    layer = conv2d( layer_final, 256, [1, 1] )
    layer = get_layer2x( layer, pre_scale2 )

    layer = conv2d( layer, 256, [1, 1] )
    layer= conv2d( layer, 512, [3, 3] )
    layer = conv2d( layer, 256, [1, 1] )
    layer = conv2d( layer, 512, [3, 3] )
    layer = conv2d( layer, 256, [1, 1] )
    layer_final = layer
    layer = conv2d( layer, 512, [3, 3] )
    scale_2 = conv2d( layer, 255, [1, 1] )

    '''--------scale_3--------'''
    layer = conv2d( layer_final, 128, [1, 1] )
    layer = get_layer2x( layer, pre_scale3 )

    for _ in range( 3 ):
        layer = conv2d( layer, 128, [1, 1] )
        layer = conv2d( layer, 256, [3, 3] )
    scale_3 = conv2d( layer, 255, [1, 1] )

    scale_1 = tf.abs( scale_1 )
    scale_2 = tf.abs( scale_2 )
    scale_3 = tf.abs( scale_3 )

    return scale_1, scale_2, scale_3

eval

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument( '-c', '--conf', default = './config/eval_config.yml', help = 'the path to the eval_config file' )
    return parser.parse_args()

def main( FLAGS ):
    if not os.path.exists( FLAGS.save_dir ):
        os.makedirs( FLAGS.save_dir )

    input_image = reader.get_image( FLAGS.image_dir, FLAGS.image_width, FLAGS.image_height )
    output_image = np.copy( input_image )

    '''--------Create placeholder--------'''
    image = net.create_eval_placeholder( FLAGS.image_width, FLAGS.image_height )

    '''--------net--------'''
    pre_scale1, pre_scale2, pre_scale3 = net.feature_extractor( image )
    scale1, scale2, scale3 = net.scales( pre_scale1, pre_scale2, pre_scale3 )

    with tf.Session() as sess:
        saver = tf.train.Saver()
        save_path = select_things.select_checkpoint( FLAGS.scale )
        last_checkpoint = tf.train.latest_checkpoint( save_path, 'checkpoint' )
        if last_checkpoint:
            saver.restore(sess, last_checkpoint)
            print( 'Successfully loaded model from:', last_checkpoint )
        else:
            print( 'Model has not been trained' )

        start_time = time.time()
        scale1, scale2, scale3 = sess.run( [scale1, scale2, scale3], feed_dict = {image: [output_image]} )

    if FLAGS.scale == 1:
        scale = scale1
    if FLAGS.scale == 2:
        scale = scale2
    if FLAGS.scale == 3:
        scale = scale3

    boxes_labels = eval_uitls.label_extractor( scale[0] )

    bdboxes = eval_uitls.get_bdboxes( boxes_labels )

    for bdbox in bdboxes:
        font = cv2.FONT_HERSHEY_SIMPLEX
        output_image = cv2.rectangle( output_image,
                                      ( int( bdbox[0] - bdbox[2] / 2 ), int( bdbox[1] - bdbox[3] / 2 ) ),
                                      ( int( bdbox[0] + bdbox[2] / 2 ), int( bdbox[1] + bdbox[3] / 2 ) ),
                                      ( 200, 0, 0 ),
                                      1 )
        # output_image = cv2.putText( output_image, bdbox[5],
        #                             ( bdbox[0] - bdbox[2] / 2, bdbox[1] - bdbox[3] / 2 ),
        #                             1.2,
        #                             (0, 255, 0),
        #                             2 )
    # output_image = np.multiply( output_image, 255 )

    generate_image = FLAGS.save_dir + '/res.jpg'
    if not os.path.exists( FLAGS.save_dir ):
        os.makedirs( FLAGS.save_dir )

    cv2.imwrite( generate_image, output_image )    # encode with OpenCV; writing the raw array bytes would not produce a valid JPEG
    end_time = time.time()

    print( 'Use time: ', end_time - start_time )

    plt.imshow( output_image )
    plt.show()
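
eval_uitls.label_extractor() and eval_uitls.get_bdboxes() are not listed in this post. As a rough idea of what that decoding could look like under the label layout used above (25 channels per box: x, y, width, height, objectness, then 20 class channels); this is my own sketch with a made-up objectness threshold, not the project's actual code:

import numpy as np

def decode_scale( scale_output, threshold = 0.5 ):
    # scale_output: the network output for one image, shape (grid_h, grid_w, 255)
    bdboxes = []
    grid_h, grid_w, _ = scale_output.shape
    for y in range( grid_h ):
        for x in range( grid_w ):
            for i in range( 3 ):
                box = scale_output[y, x, i * 25 : ( i + 1 ) * 25]
                if box[4] < threshold:    # skip boxes whose predicted objectness is too low
                    continue
                class_id = int( np.argmax( box[5 : 25] ) )
                # box[0:4] holds the predicted centre x, centre y, width and height
                bdboxes.append( ( box[0], box[1], box[2], box[3], box[4], class_id ) )
    return bdboxes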

train

import tensorflow as tf
import numpy as np
import os
import argparse
import time

from utils import net, read_config, get_loss, IOU, extract_labels, select_things
import reader

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument( '-c', '--conf', default = './config/config.yml', help = 'the path to the config file' )
    return parser.parse_args()

def main( FLAGS ):

    scale_width, scale_height = select_things.select_scale( FLAGS.scale, FLAGS.width, FLAGS.height )
    '''--------Create placeholders--------'''
    datas, labels = net.create_placeholder( FLAGS.batch_size, FLAGS.width, FLAGS.height, scale_width, scale_height )

    '''--------net--------'''
    pre_scale1, pre_scale2, pre_scale3 = net.feature_extractor( datas )
    scale1, scale2, scale3 = net.scales( pre_scale1, pre_scale2, pre_scale3 )

    '''--------get labels_filenames and datas_filenames--------'''
    datas_filenames = reader.images( FLAGS.batch_size, FLAGS.datas_path )
    labels_filenames = reader.labels( FLAGS.batch_size, FLAGS.labels_path )
    normalize_labels = extract_labels.labels_normalizer( labels_filenames,
                                                         FLAGS.width,
                                                         FLAGS.height,
                                                         scale_width,
                                                         scale_height )

    '''---------partition the train data and val data--------'''
    train_filenames = datas_filenames[: int( len( datas_filenames ) * 0.9 )]
    train_labels = normalize_labels[: int( len( normalize_labels ) * 0.9 )]
    val_filenames = datas_filenames[int( len( datas_filenames ) * 0.9 ) :]
    val_labels = normalize_labels[int( len( normalize_labels ) * 0.9 ) :]

    '''--------calculate loss--------'''
    if FLAGS.scale == 1:
        loss = get_loss.calculate_loss( scale1, labels )

    if FLAGS.scale == 2:
        loss = get_loss.calculate_loss( scale2, labels )

    if FLAGS.scale == 3:
        loss = get_loss.calculate_loss( scale3, labels )

    '''--------Optimizer--------'''
    optimizer = tf.train.AdamOptimizer( learning_rate=FLAGS.learning_rate ).minimize( loss )

    tf.summary.scalar( 'epoch_loss', loss )

    merged = tf.summary.merge_all()

    init = tf.global_variables_initializer()
    '''--------train--------'''
    with tf.Session() as sess:
        saver = tf.train.Saver()
        save_path = select_things.select_checkpoint( FLAGS.scale )
        last_checkpoint = tf.train.latest_checkpoint( save_path, 'checkpoint' )
        if last_checkpoint:
            saver.restore( sess, last_checkpoint )
            print( 'Reuse model' )
        else:
            sess.run( init )


        for epoch in range( FLAGS.epoch ):
            epoch_loss = 0.0
            for i in range( len( train_filenames ) ):
                normalize_datas = []
                for data_filename in train_filenames[i]:
                    image = reader.get_image( data_filename, FLAGS.width, FLAGS.height )
                    image = np.array( image, np.float32 )

                    normalize_datas.append( image )

                normalize_datas = np.array( normalize_datas )

                _, batch_loss = sess.run( [optimizer, loss], feed_dict = {datas: normalize_datas, labels: train_labels[i]} )

                epoch_loss += batch_loss

            if epoch % 10 == 0:
                print( 'Cost after epoch %i: %f' % ( epoch, epoch_loss ) )

            if epoch % 50 == 0:
                val_loss = 0.0
                for i in range( len( val_filenames ) ):
                    normalize_datas = []
                    for val_filename in val_filenames[i]:
                        image = reader.get_image( val_filename, FLAGS.width, FLAGS.height )
                        image = np.array( image, np.float32 )
                        image = np.divide( image, 255 )

                        normalize_datas.append( image )

                    normalize_datas = np.array( normalize_datas )

                    batch_loss = sess.run( loss, feed_dict = {datas: normalize_datas, labels: val_labels[i]} )

                    val_loss += batch_loss

                print( 'VAL_Cost after epoch %i: %f' %( epoch, val_loss ) )
                saver.save( sess, save_path, global_step = epoch )



if __name__ == '__main__':
    args = parse_args()
    FLAGS = read_config.read_config_file( args.conf )
    main( FLAGS )