Object Detection: Generating SSD Prior Boxes

阿新 • Published: 2020-12-15

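SSD is a widely used one-stage object detection algorithm. Put plainly, object detection means placing boxes over different positions in an image: if a box covers an object and the box's edges coincide with the object's edges, an object has been detected.

If we slid boxes of every possible shape across the image pixel by pixel, we would certainly find the objects. But every shape at every pixel means an effectively unlimited number of boxes, which is computationally infeasible. Instead, we pick a few typical boxes and move them across the image at a fixed stride. Tiled over the image, these boxes number 8732 in SSD300.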

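We call these boxes prior boxes. Prior boxes alone rarely match the ground-truth boxes exactly, so the predicted box is regressed by fine-tuning the center point and the width and height of a prior box.

Like anchors in YOLO, prior boxes provide the reference for bounding-box regression.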
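To make the fine-tuning step concrete, here is a minimal sketch of how predicted offsets could be applied to one prior box, following the encoding commonly used by SSD implementations. The helper name `decode_box`, the sample prior box and the offset values are assumptions made up for illustration; the variances (0.1, 0.1, 0.2, 0.2) are the ones that appear in the code at the end of this post.

```python
import numpy as np

def decode_box(prior, offsets, variances=(0.1, 0.1, 0.2, 0.2)):
    """Hypothetical helper: apply predicted offsets to one prior box.

    prior   : (cx, cy, w, h) of the prior box
    offsets : (dx, dy, dw, dh) predicted by the network
    """
    cx, cy, w, h = prior
    dx, dy, dw, dh = offsets
    # Shift the center in proportion to the prior box's own size.
    new_cx = cx + dx * variances[0] * w
    new_cy = cy + dy * variances[1] * h
    # Rescale width and height exponentially so they stay positive.
    new_w = w * np.exp(dw * variances[2])
    new_h = h * np.exp(dh * variances[3])
    return np.array([new_cx, new_cy, new_w, new_h])

prior = (150.0, 150.0, 60.0, 60.0)
# Zero offsets leave the prior box unchanged.
unchanged = decode_box(prior, (0.0, 0.0, 0.0, 0.0))
# A positive dx moves the center to the right: 150 + 1.0 * 0.1 * 60 = 156.
shifted = decode_box(prior, (1.0, 0.0, 0.0, 0.0))
print(unchanged, shifted)
```

The network therefore only has to predict small corrections relative to a prior box, not absolute coordinates.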

So why are there 8732 boxes in total, and how are they obtained?

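1. Box centers and sizes

A box is fully determined once its center point and its size are known.

How do we determine the centers? We divide the image into N*N cells and take the center of each cell as a box center. Adjacent centers are image_size / N apart, which amounts to moving the box in steps of image_size / N to detect objects.

How do we determine the sizes? Objects in detection tasks vary in size, so boxes of different widths and heights are needed to cover them. When image_size / N is large, the boxes can be made larger; when image_size / N is small, they can be made smaller. To cover objects of different shapes as well, the boxes are given several aspect ratios.

So the problems we now need to solve are: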

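1.1. Obtaining the box centers

SSD divides the input image into grids of 38*38, 19*19, 10*10, 5*5, 3*3 and 1*1 cells (the sizes of its feature layers) to handle objects of different sizes.

Take a 300*300 image divided into 3*3 cells as an example: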

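import numpy as np
import matplotlib.pyplot as plt

# Image size and grid size for this example
img_width, img_height = 300, 300
layer_width, layer_height = 3, 3

# Distance between adjacent centers
step_x = img_width / layer_width
step_y = img_height / layer_height

# Cell centers along each axis: 50, 150, 250
linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x,
                   layer_width)
liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y,
                   layer_height)

centers_x, centers_y = np.meshgrid(linx, liny)

# Flatten to column vectors, one row per center
centers_x = centers_x.reshape(-1, 1)
centers_y = centers_y.reshape(-1, 1)

# Plot the 9 centers
fig = plt.figure()
ax = fig.add_subplot(111)
plt.ylim(0, 300)
plt.xlim(0, 300)
plt.scatter(centers_x, centers_y)
plt.show()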
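The same calculation can be repeated for every grid size. The sketch below prints the stride and the first and last center along one axis for each SSD300 layer; the helper name `grid_centers` is hypothetical and only wraps the `np.linspace` call used above.

```python
import numpy as np

def grid_centers(img_size, n):
    """Hypothetical helper: cell centers along one axis for an n-cell grid."""
    step = img_size / n
    return np.linspace(0.5 * step, img_size - 0.5 * step, n)

img_size = 300
# Grid sizes listed above for SSD300.
layer_sizes = [38, 19, 10, 5, 3, 1]

for n in layer_sizes:
    centers = grid_centers(img_size, n)
    step = img_size / n
    print(f"{n:2d}x{n:<2d} stride = {step:6.2f}, "
          f"first center = {centers[0]:6.2f}, last center = {centers[-1]:6.2f}")
```

The finer grids place centers only a few pixels apart, while the 1*1 grid has a single center in the middle of the image.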

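1.2. Obtaining the sizes

SSD computes the scale of each layer with the formula

S_k = S_min + (S_max - S_min) / (m - 1) * (k - 1), k = 1, ..., m

where S_k is the box size as a fraction of input_shape (300), S_min = 0.2 and S_max = 0.9. m = 5 because the first layer is assigned S_k = 0.1 separately.

This gives the box size for each layer's S_k. In practice the step is rounded: (0.9 - 0.2) / 4 = 0.175 is rounded down to 0.17, and 0.17 * 300 = 51. So apart from the first layer, each size is obtained by adding 51 to the previous one.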
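The sizes can be reproduced with a few lines of arithmetic. This is a minimal sketch that works in integer percentages and rounds the step down, which is an assumption chosen because it yields the same sizes (30, 60, 111, ...) that the code at the end of this post passes to PriorBox. The function name `ssd_sizes` and its parameter names are hypothetical.

```python
def ssd_sizes(input_size=300, min_ratio=20, max_ratio=90, m=5, first_ratio=10):
    """Hypothetical helper: box sizes in pixels for SSD300.

    Ratios are integer percentages of input_size (20 means S_min = 0.2).
    """
    # Step between consecutive scales, rounded down: (90 - 20) // 4 = 17.
    step = (max_ratio - min_ratio) // (m - 1)
    # The first layer uses its own, separately chosen scale.
    sizes = [input_size * first_ratio / 100.0]
    # m + 1 further values: one min_size per remaining layer plus the
    # max_size of the last layer.
    for k in range(m + 1):
        sizes.append(input_size * (min_ratio + k * step) / 100.0)
    return sizes

sizes = ssd_sizes()
print(sizes)
# Consecutive entries form the (min_size, max_size) pair of each layer.
pairs = list(zip(sizes[:-1], sizes[1:]))
print(pairs)
```

Each layer's max_size is the next layer's min_size, which is why seven values are needed for six layers.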

1.3. Obtaining the different aspect ratios

SSD uses five aspect ratios: 1, 2, 1/2, 3 and 1/3.

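Aspect ratio = 1 (square), which yields two boxes:

    W_k = H_k = min_size

    W_k = H_k = √(min_size * max_size)

Aspect ratio = 2:

    W_k = min_size * √2

    H_k = min_size * √(1/2)

The box sizes for the other aspect ratios are obtained in the same way.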
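As a quick check of these formulas, the sketch below lists the width and height of every box at one center for min_size = 60 and max_size = 111, the values used for the 19*19 layer in the code at the end of this post. The helper name `box_shapes` is hypothetical.

```python
import numpy as np

def box_shapes(min_size, max_size, aspect_ratios):
    """Hypothetical helper: (width, height) of every prior box at one center."""
    side = np.sqrt(min_size * max_size)
    # The two square boxes come first.
    shapes = [(min_size, min_size), (side, side)]
    # Then one box per non-square aspect ratio.
    for ar in aspect_ratios:
        shapes.append((min_size * np.sqrt(ar), min_size / np.sqrt(ar)))
    return np.array(shapes)

shapes = box_shapes(60.0, 111.0, [2, 1 / 2, 3, 1 / 3])
for w, h in shapes:
    print(f"w = {w:7.2f}, h = {h:7.2f}, w/h = {w / h:.2f}")
```

Multiplying the width by √ar and dividing the height by √ar keeps the area of every non-square box equal to min_size squared, so changing the aspect ratio changes only the shape.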

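The 38*38, 3*3 and 1*1 layers do not use the aspect ratios 3 and 1/3, so each center there has 4 boxes,

while each center in the 19*19, 10*10 and 5*5 layers has 6 boxes.

So the total number of boxes is 38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 5776 + 2166 + 600 + 150 + 36 + 4 = 8732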
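The sum is easy to verify in code. The dictionary below maps each grid size to the number of boxes per center, as stated above; the variable names are only illustrative.

```python
# Grid size -> number of prior boxes per center, as described above.
boxes_per_center = {38: 4, 19: 6, 10: 6, 5: 6, 3: 4, 1: 4}

# Number of prior boxes contributed by each layer.
counts = {n: n * n * k for n, k in boxes_per_center.items()}
total = sum(counts.values())

for n, c in counts.items():
    print(f"{n:2d}x{n:<2d}: {c}")
print("total:", total)
```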

The code is as follows:

import numpy as np
import matplotlib.pyplot as plt


class PriorBox():
    def __init__(self, img_size, min_size, max_size=None, aspect_ratios=None,
                 flip=True, variances=[0.1], clip=True, **kwargs):

        self.waxis = 1
        self.haxis = 0

        self.img_size = img_size
        if min_size <= 0:
            raise Exception('min_size must be positive.')

        self.min_size = min_size
        self.max_size = max_size
        self.aspect_ratios = [1.0]
        if max_size:
            if max_size < min_size:
                raise Exception('max_size must be greater than min_size.')
            self.aspect_ratios.append(1.0)
        if aspect_ratios:
            for ar in aspect_ratios:
                if ar in self.aspect_ratios:
                    continue
                self.aspect_ratios.append(ar)
                if flip:
                    self.aspect_ratios.append(1.0 / ar)
        self.variances = np.array(variances)
        self.clip = clip


    def call(self, input_shape, mask=None):

        # Width and height of the incoming feature layer,
        # e.g. 3x3
        layer_width = input_shape[self.waxis]
        layer_height = input_shape[self.haxis]

        # Width and height of the input image,
        # e.g. 300x300
        img_width = self.img_size[0]
        img_height = self.img_size[1]

        # Widths and heights of the prior boxes
        box_widths = []
        box_heights = []
        for ar in self.aspect_ratios:
            if ar == 1 and len(box_widths) == 0:
                box_widths.append(self.min_size)
                box_heights.append(self.min_size)
            elif ar == 1 and len(box_widths) > 0:
                box_widths.append(np.sqrt(self.min_size * self.max_size))
                box_heights.append(np.sqrt(self.min_size * self.max_size))
            elif ar != 1:
                box_widths.append(self.min_size * np.sqrt(ar))
                box_heights.append(self.min_size / np.sqrt(ar))


        # Keep half-sizes: the offsets from a center to the box edges
        box_widths = 0.5 * np.array(box_widths)
        box_heights = 0.5 * np.array(box_heights)

        step_x = img_width / layer_width
        step_y = img_height / layer_height

        linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x,
                           layer_width)
        liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y,
                           layer_height)

        centers_x, centers_y = np.meshgrid(linx, liny)

        # Flatten the grid centers into column vectors
        centers_x = centers_x.reshape(-1, 1)
        centers_y = centers_y.reshape(-1, 1)

        num_priors_ = len(self.aspect_ratios)

        # Each prior box needs two copies of (centers_x, centers_y): the first
        # becomes the top-left corner, the second the bottom-right corner
        prior_boxes = np.concatenate((centers_x, centers_y), axis=1)
        prior_boxes = np.tile(prior_boxes, (1, 2 * num_priors_))
        
        # Compute the top-left and bottom-right corners of the prior boxes
        prior_boxes[:, ::4] -= box_widths
        prior_boxes[:, 1::4] -= box_heights
        prior_boxes[:, 2::4] += box_widths
        prior_boxes[:, 3::4] += box_heights

        # Normalize the coordinates to fractions of the image size
        prior_boxes[:, ::2] /= img_width
        prior_boxes[:, 1::2] /= img_height
        prior_boxes = prior_boxes.reshape(-1, 4)

        # Clip the boxes to the image, i.e. to the range [0, 1]
        if self.clip:
            prior_boxes = np.minimum(np.maximum(prior_boxes, 0.0), 1.0)

        num_boxes = len(prior_boxes)
        
        if len(self.variances) == 1:
            variances = np.ones((num_boxes, 4)) * self.variances[0]
        elif len(self.variances) == 4:
            variances = np.tile(self.variances, (num_boxes, 1))
        else:
            raise Exception('Must provide one or four variances.')

        prior_boxes = np.concatenate((prior_boxes, variances), axis=1)
        return prior_boxes


def get_anchors(img_size = (300,300)):
    net = {} 
    priorbox = PriorBox(img_size, 30.0,max_size = 60.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv4_3_norm_mbox_priorbox')
    net['conv4_3_norm_mbox_priorbox'] = priorbox.call([38,38])


    priorbox = PriorBox(img_size, 60.0, max_size=111.0, aspect_ratios=[2, 3],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='fc7_mbox_priorbox')
    net['fc7_mbox_priorbox'] = priorbox.call([19,19])


    priorbox = PriorBox(img_size, 111.0, max_size=162.0, aspect_ratios=[2, 3],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv6_2_mbox_priorbox')
    net['conv6_2_mbox_priorbox'] = priorbox.call([10,10])


    priorbox = PriorBox(img_size, 162.0, max_size=213.0, aspect_ratios=[2, 3],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv7_2_mbox_priorbox')
    net['conv7_2_mbox_priorbox'] = priorbox.call([5,5])


    priorbox = PriorBox(img_size, 213.0, max_size=264.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv8_2_mbox_priorbox')
    net['conv8_2_mbox_priorbox'] = priorbox.call([3,3])

    priorbox = PriorBox(img_size, 264.0, max_size=315.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='pool6_mbox_priorbox')
                        
    net['pool6_mbox_priorbox'] = priorbox.call([1,1])

    net['mbox_priorbox'] = np.concatenate([net['conv4_3_norm_mbox_priorbox'],
                                    net['fc7_mbox_priorbox'],
                                    net['conv6_2_mbox_priorbox'],
                                    net['conv7_2_mbox_priorbox'],
                                    net['conv8_2_mbox_priorbox'],
                                    net['pool6_mbox_priorbox']],
                                    axis=0)

    return net['mbox_priorbox']
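Judging from the code above, each row of the returned array holds a normalized xmin, ymin, xmax, ymax followed by the four variances. To inspect the result it can help to convert rows back to center and size in pixels. The sketch below does this for a made-up sample row; the helper name `corners_to_centers` is hypothetical and not part of the code above.

```python
import numpy as np

def corners_to_centers(priors, img_size=(300, 300)):
    """Hypothetical helper: (xmin, ymin, xmax, ymax) fractions -> (cx, cy, w, h) pixels."""
    priors = np.asarray(priors, dtype=float)
    img_w, img_h = img_size
    xmin, ymin, xmax, ymax = priors[:, 0], priors[:, 1], priors[:, 2], priors[:, 3]
    cx = (xmin + xmax) / 2 * img_w
    cy = (ymin + ymax) / 2 * img_h
    w = (xmax - xmin) * img_w
    h = (ymax - ymin) * img_h
    return np.stack([cx, cy, w, h], axis=1)

# Made-up sample row in the same 8-column layout:
# four corner coordinates followed by four variances.
sample = np.array([[0.4, 0.4, 0.6, 0.6, 0.1, 0.1, 0.2, 0.2]])
boxes = corners_to_centers(sample)
print(boxes)
```

Note that boxes near the image border are clipped to [0, 1], so for those rows the recovered center and size describe the clipped box, not the original prior box.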