GluonCV: Training YOLOv3 on Pascal VOC Data (Part 2: Training)
This tutorial walks through the basic steps of training the YOLOv3 object detection model provided by GluonCV.
Specifically, it shows how to build a state-of-the-art YOLOv3 model by stacking GluonCV components.
First, three notes about training (a small code sketch applying them follows the list):
(1) The default initial learning rate is 0.001; my training loss went to NaN, and lowering it to 0.0001 fixed the problem.
(2) With eight 1080Ti GPUs the default batch-size=64 is fine; with four GPUs use 32, and scale down accordingly.
(3) Regarding the following message:
[07:40:55] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
Solution: before launching training, run
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0
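These settings can also be applied directly from Python. Below is a minimal sketch, assuming you launch training from your own script rather than the stock GluonCV training script; the environment variable must be set before mxnet is imported, and the learning-rate/batch-size values are simply the ones from the notes above:
import os
# disable cuDNN autotune (must be set before mxnet is imported)
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

import mxnet as mx

# values from the notes above; adjust batch size to your GPU count
learning_rate = 0.0001   # the default 0.001 produced a NaN loss in my run
batch_size = 32          # 64 for eight 1080Ti GPUs, 32 for four, and so on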
1. Dataset
Please read the previous tutorial first and prepare the Pascal VOC dataset on disk.
Then we are ready to load the training and validation images.
import gluoncv as gcv
from gluoncv.data import VOCDetection
# typically we use 2007+2012 trainval splits for training data
train_dataset = VOCDetection(splits=[(2007, 'trainval'), (2012, 'trainval')])
# and use 2007 test as validation data
val_dataset = VOCDetection(splits=[(2007, 'test')])
print('Training images:', len(train_dataset))
print('Validation images:', len(val_dataset))
Out:
Training images: 16551
Validation images: 4952
2. Data Transform
We can read an image-label pair from the training dataset:
train_image, train_label = train_dataset[80]
bboxes = train_label[:, :4]
cids = train_label[:, 4:5]
print('image:', train_image.shape)
print('bboxes:', bboxes.shape, 'class ids:', cids.shape)
Out:
image: (375, 500, 3)
bboxes: (2, 4) class ids: (2, 1)
Plot the image and its bounding-box labels with matplotlib:
from matplotlib import pyplot as plt
from gluoncv.utils import viz
ax = viz.plot_bbox(train_image.asnumpy(), bboxes, labels=cids, class_names=train_dataset.classes)
plt.show()
The validation images are very similar to the training ones, since they are essentially random splits of the same data:
val_image, val_label = val_dataset[120]
bboxes = val_label[:, :4]
cids = val_label[:, 4:5]
ax = viz.plot_bbox(val_image.asnumpy(), bboxes, labels=cids, class_names=train_dataset.classes)
plt.show()
Transform
from gluoncv.data.transforms import presets
from gluoncv import utils
from mxnet import nd
width, height = 416, 416 # resize image to 416x416 after all data augmentation
train_transform = presets.yolo.YOLO3DefaultTrainTransform(width, height)
val_transform = presets.yolo.YOLO3DefaultValTransform(width, height)
utils.random.seed(123) # fix seed in this tutorial
Apply the transform to a training image:
train_image2, train_label2 = train_transform(train_image, train_label)
print('tensor shape:', train_image2.shape)
Out:
tensor shape: (3, 416, 416)
The image in the tensor looks distorted because its values are no longer within the (0, 255) range. Let's convert it back so we can view it clearly.
train_image2 = train_image2.transpose((1, 2, 0)) * nd.array((0.229, 0.224, 0.225)) + nd.array((0.485, 0.456, 0.406))
train_image2 = (train_image2 * 255).clip(0, 255)
ax = viz.plot_bbox(train_image2.asnumpy(), train_label2[:, :4],
labels=train_label2[:, 4:5],
class_names=train_dataset.classes)
plt.show()
The transforms used in training include random color distortion, random expansion/cropping, random flipping, resizing, and a fixed color normalization. In contrast, validation only involves resizing and color normalization.
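To make the contrast concrete, here is a small check (a sketch reusing the val_image and val_label loaded earlier); the validation transform simply produces a resized, normalized tensor with the labels rescaled to match:
val_image2, val_label2 = val_transform(val_image, val_label)
print('validation tensor shape:', val_image2.shape)  # expected: (3, 416, 416)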
3. Data Loader
We will iterate over the entire dataset many times during training. Keep in mind that raw images must be converted to tensors (mxnet uses the BCHW format) before they are fed into the neural network.
A DataLoader makes it convenient to apply the different transforms and aggregate the data into mini-batches.
Because the number of objects varies widely from image to image, the label sizes differ as well, so we need to pad the labels to a common size. To handle this, GluonCV provides gluoncv.data.batchify.Pad, which takes care of the padding automatically. There is also gluoncv.data.batchify.Stack for stacking NDArrays with consistent shapes, and gluoncv.data.batchify.Tuple for handling the different behaviors of the multiple outputs of the transform function.
from gluoncv.data.batchify import Tuple, Stack, Pad
from mxnet.gluon.data import DataLoader
batch_size = 2 # for tutorial, we use smaller batch-size
num_workers = 0 # you can make it larger(if your CPU has more cores) to accelerate data loading
# behavior of batchify_fn: stack images, and pad labels
batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
train_loader = DataLoader(train_dataset.transform(train_transform), batch_size, shuffle=True,
batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)
val_loader = DataLoader(val_dataset.transform(val_transform), batch_size, shuffle=False,
batchify_fn=batchify_fn, last_batch='keep', num_workers=num_workers)
for ib, batch in enumerate(train_loader):
if ib > 3:
break
print('data 0:', batch[0][0].shape, 'label 0:', batch[1][0].shape)
print('data 1:', batch[0][1].shape, 'label 1:', batch[1][1].shape)
Out:
data 0: (3, 416, 416) label 0: (6, 6)
data 1: (3, 416, 416) label 1: (6, 6)
data 0: (3, 416, 416) label 0: (3, 6)
data 1: (3, 416, 416) label 1: (3, 6)
data 0: (3, 416, 416) label 0: (2, 6)
data 1: (3, 416, 416) label 1: (2, 6)
data 0: (3, 416, 416) label 0: (2, 6)
data 1: (3, 416, 416) label 1: (2, 6)
4. YOLOv3 Network
GluonCV's YOLOv3 implementation is a composite Gluon HybridBlock. Structurally, the YOLOv3 network consists of a base feature-extraction network, convolutional transition layers, upsampling layers, and specially designed YOLOv3 output layers.
The Gluon Model Zoo has several built-in YOLO networks that can be loaded with a single line of code:
(To avoid downloading the model in this tutorial, we set pretrained_base=False; in practice we usually load an ImageNet pre-trained base network by setting pretrained_base=True.)
from gluoncv import model_zoo
net = model_zoo.get_model('yolo3_darknet53_voc', pretrained_base=False)
print(net)
Out:
YOLOV3(
(_target_generator): YOLOV3TargetMerger(
(_dynamic_target): YOLOV3DynamicTargetGeneratorSimple(
(_batch_iou): BBoxBatchIOU(
(_pre): BBoxSplit(
)
)
)
)
(_loss): YOLOV3Loss(batch_axis=0, w=None)
(stages): HybridSequential(
(0): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(2): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(3): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(4): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(5): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(6): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(7): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(8): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(9): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(10): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(11): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(12): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(13): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(14): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
)
(1): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(2): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(3): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(4): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(5): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(6): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(7): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(8): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
)
(2): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(2): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(3): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(4): DarknetBasicBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
)
)
(transitions): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
(yolo_blocks): HybridSequential(
(0): YOLODetectionBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(2): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(3): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(4): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
(tip): HybridSequential(
(0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
(1): YOLODetectionBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(2): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(3): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(4): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
(tip): HybridSequential(
(0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
(2): YOLODetectionBlockV3(
(body): HybridSequential(
(0): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(1): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(2): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(3): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
(4): HybridSequential(
(0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
(tip): HybridSequential(
(0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
(2): LeakyReLU(0.1)
)
)
)
(yolo_outputs): HybridSequential(
(0): YOLOOutputV3(
(prediction): Conv2D(None -> 75, kernel_size=(1, 1), stride=(1, 1))
)
(1): YOLOOutputV3(
(prediction): Conv2D(None -> 75, kernel_size=(1, 1), stride=(1, 1))
)
(2): YOLOOutputV3(
(prediction): Conv2D(None -> 75, kernel_size=(1, 1), stride=(1, 1))
)
)
)
The YOLOv3 network can be called with an image tensor:
import mxnet as mx
x = mx.nd.zeros(shape=(1, 3, 416, 416))
net.initialize()
cids, scores, bboxes = net(x)
YOLOv3 returns three values: cids are the class labels, scores are the confidence scores of each prediction, and bboxes are the absolute coordinates of the corresponding bounding boxes.
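As a quick sanity check, we can print the output shapes (a sketch; the exact number of boxes along the second axis depends on the network's non-maximum-suppression settings):
print('class ids:', cids.shape)   # (batch, num_boxes, 1)
print('scores:', scores.shape)    # (batch, num_boxes, 1)
print('bboxes:', bboxes.shape)    # (batch, num_boxes, 4)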
5. Training Targets
End-to-end YOLOv3 training involves four losses. The losses penalize incorrect class/box predictions and are defined in gluoncv.loss.YOLOV3Loss:
loss = gcv.loss.YOLOV3Loss()
# which is already included in YOLOv3 network
print(net._loss)
Out:
YOLOV3Loss(batch_axis=0, w=None)
To speed up training, we let the CPU pre-compute some of the training targets. This is especially useful when your CPU is powerful and you can use -j num_workers to take advantage of multi-core CPUs.
If we provide the network to the training transform function, it will compute part of the training targets:
from mxnet import autograd
train_transform = presets.yolo.YOLO3DefaultTrainTransform(width, height, net)
# return stacked images, center_targets, scale_targets, gradient weights, objectness_targets, class_targets
# additionally, return padded ground truth bboxes, so there are 7 components returned by dataloader
batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))
train_loader = DataLoader(train_dataset.transform(train_transform), batch_size, shuffle=True,
batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)
for ib, batch in enumerate(train_loader):
if ib > 0:
break
print('data:', batch[0][0].shape)
print('label:', batch[6][0].shape)
with autograd.record():
input_order = [0, 6, 1, 2, 3, 4, 5]
obj_loss, center_loss, scale_loss, cls_loss = net(*[batch[o] for o in input_order])
# sum up the losses
# some standard gluon training steps:
# autograd.backward(sum_loss)
# trainer.step(batch_size)
Out:
data: (3, 416, 416)
label: (4, 4)
We can see that the data loader is actually returning the training targets for us. We can then plug the data loading into a standard Gluon training loop and let it update the weights.
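Below is a minimal sketch of such a loop, assuming a single context and the 7-component train_loader defined above; the optimizer settings (plain SGD with the learning rate from the notes at the top) are illustrative, not the official training-script configuration:
from mxnet import gluon

ctx = mx.cpu()   # use mx.gpu(0) if a GPU is available
net.collect_params().reset_ctx(ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.0001, 'momentum': 0.9, 'wd': 0.0005})

for epoch in range(1):                        # one epoch as an illustration
    for ib, batch in enumerate(train_loader):
        # move the image tensor and the pre-computed targets to the training context
        batch = [b.as_in_context(ctx) for b in batch]
        input_order = [0, 6, 1, 2, 3, 4, 5]   # same order as in the snippet above
        with autograd.record():
            obj_loss, center_loss, scale_loss, cls_loss = net(*[batch[o] for o in input_order])
            sum_loss = obj_loss + center_loss + scale_loss + cls_loss
        autograd.backward(sum_loss)
        trainer.step(batch_size)
        if ib > 1:                            # only a few iterations for demonstration
            break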
References
- Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).