Understanding the RPN in Faster R-CNN

The role of the RPN
The RPN is an improvement over the selective-search strategy. The region proposal network takes one or more feature levels from the backbone as input, say of dimensions (B, C, H, W). A 3x3 convolution first fuses the features of the input feature map, and then two independent 1x1 convolutions produce the objectness and bounding-box regression outputs. The objectness output has dimensions (B, K, H, W), where K is the number of anchors generated per cell. The bounding-box regression output has dimensions (B, Kx4, H, W), and its values are the offsets of the proposals relative to the anchors.
Note: "For simplicity we implement the cls layer as a two-class softmax layer. Alternatively, one may use logistic regression to produce k scores." In other words, the objectness output could instead have dimensions (B, Kx2, H, W), treating the prediction as an explicit two-class problem. For simplicity, the implementation here adopts the alternative: logistic regression producing K scores directly, hence the (B, K, H, W) output above.
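The two-branch head described above can be sketched as a small module. The 256 channels and K = 3 below are illustrative values, and the class name is a placeholder, but the structure (shared 3x3 conv, then two 1x1 convs) follows the description:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of the RPN head: a shared 3x3 conv feeding two 1x1 convs."""

    def __init__(self, in_channels, num_anchors):
        super().__init__()
        # 3x3 conv fuses features around each cell
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        # 1x1 conv -> K objectness logits per cell (logistic-regression variant)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        # 1x1 conv -> 4K box-regression offsets per cell
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, x):
        t = torch.relu(self.conv(x))
        return self.cls_logits(t), self.bbox_pred(t)

feat = torch.randn(2, 256, 25, 38)       # (B, C, H, W)
objectness, bbox_reg = RPNHead(256, 3)(feat)
print(objectness.shape)                  # (2, 3, 25, 38)  = (B, K, H, W)
print(bbox_reg.shape)                    # (2, 12, 25, 38) = (B, Kx4, H, W)
```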
Definition of the RPN loss function
Anchor generation. For a feature map of size (B, C, H, W), let each cell generate K anchors; that feature level then yields HxWxK anchors in total. Mapping the anchor coordinates back to the input image produces a set of anchor boxes with different aspect ratios. Each anchor box is assigned a class label indicating whether or not it is an object. Two kinds of anchor boxes receive a positive label: (1) anchors whose IoU with a ground-truth box exceeds 0.7 (a hand-picked threshold); (2) for each ground-truth box, the anchor with the highest IoU among all anchors matched to it, even if that IoU is below 0.7. (Presumably this is done so that every ground-truth box gets at least one positive sample, improving the recall of the later detection stage.) A single ground-truth box may match several anchors, while anchors whose IoU with every ground-truth box is below 0.3 are treated as negatives. With this labeling, the objective function is defined as:

    L({p_i}, {t_i}) = (1/N_cls) * Σ_i L_cls(p_i, p_i*) + λ * (1/N_reg) * Σ_i p_i* · L_reg(t_i, t_i*)
where:
i: index of an anchor within a mini-batch
p_i: predicted probability that anchor i is an object
p_i*: ground-truth label; 1 if the anchor is a positive sample, 0 if it is a negative sample
t_i: predicted bounding-box coordinates for anchor i, i.e. the proposal
t_i*: coordinates of the ground-truth box associated with positive anchor i
L_cls: classification loss, a binary classification (log) loss
p_i* · L_reg: regression loss, active only when the anchor is positive (p_i* = 1); L_reg is the smooth L1 function
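As a sketch, the loss can be computed on a batch of already-sampled anchors as follows. The normalization here divides both terms by the number of sampled anchors, a common simplification of the paper's N_cls, N_reg, and λ; the function and variable names are illustrative, not a library API:

```python
import torch
import torch.nn.functional as F

def rpn_loss(objectness, pred_deltas, labels, reg_targets):
    """Sketch of the RPN loss over sampled anchors.

    objectness:  (N,)   predicted logits, p_i before the sigmoid
    pred_deltas: (N, 4) predicted offsets t_i
    labels:      (N,)   p_i* -- 1.0 for positive anchors, 0.0 for negative
    reg_targets: (N, 4) t_i* -- offsets of the matched ground-truth box
    """
    # L_cls: binary cross-entropy over all sampled anchors
    cls_loss = F.binary_cross_entropy_with_logits(objectness, labels)
    # p_i* * L_reg: smooth L1, counted only for positive anchors
    pos = labels == 1
    reg_loss = F.smooth_l1_loss(pred_deltas[pos], reg_targets[pos], reduction="sum")
    reg_loss = reg_loss / labels.numel()
    return cls_loss + reg_loss

objectness = torch.tensor([2.0, -1.0, 0.5, -2.0])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
deltas = torch.zeros(4, 4)
targets = torch.zeros(4, 4)   # perfect regression -> only L_cls remains
loss = rpn_loss(objectness, deltas, labels, targets)
```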
Definition of the smooth L1 loss

    smooth_L1(x) = 0.5 * x^2       if |x| < 1
                 = |x| - 0.5       otherwise
Definition of the regression parameters

(tx, ty, tw, th) are the offsets of the proposal (the predicted bbox) relative to the anchor:

    tx = (x - xa) / wa,   ty = (y - ya) / ha,   tw = log(w / wa),   th = log(h / ha)

(tx*, ty*, tw*, th*) are the offsets of the ground-truth box relative to the anchor:

    tx* = (x* - xa) / wa,   ty* = (y* - ya) / ha,   tw* = log(w* / wa),   th* = log(h* / ha)

Here x, y, w, h denote a box's center coordinates, width, and height, and x, xa, x* refer to the predicted box, the anchor box, and the ground-truth box respectively.
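These formulas can be sketched as a small encoding helper (the function name is illustrative, not a library API; boxes are assumed to be in (x1, y1, x2, y2) corner form):

```python
import torch

def encode_boxes(boxes, anchors):
    """Compute (tx, ty, tw, th) of `boxes` relative to `anchors`; both (N, 4)."""
    # anchor width, height, center
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha
    # box width, height, center
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    x = boxes[:, 0] + 0.5 * w
    y = boxes[:, 1] + 0.5 * h
    # offsets as defined above
    tx = (x - xa) / wa
    ty = (y - ya) / ha
    tw = torch.log(w / wa)
    th = torch.log(h / ha)
    return torch.stack([tx, ty, tw, th], dim=1)

anchor = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
box = torch.tensor([[5.0, 5.0, 15.0, 15.0]])
# same size as the anchor, center shifted by +5 in both axes -> (0.5, 0.5, 0, 0)
print(encode_boxes(box, anchor))
```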
Anchor sampling

To counter the imbalance between positive and negative samples when computing the RPN loss, a target positive fraction of 0.5 is used; if there are not enough positives, all available positives are taken, and the same applies to negatives. The total number of samples drawn per image is set to 256 (another hand-picked value).

The code is as follows:
```python
from typing import List, Tuple

import torch
from torch import Tensor


class BalancedPositiveNegativeSampler(object):
    """
    This class samples batches, ensuring that they contain a fixed proportion of positives
    """

    def __init__(self, batch_size_per_image, positive_fraction):
        # type: (int, float) -> None
        """
        Arguments:
            batch_size_per_image (int): number of elements to be selected per image
            positive_fraction (float): percentage of positive elements per batch
        """
        self.batch_size_per_image = batch_size_per_image
        self.positive_fraction = positive_fraction

    def __call__(self, matched_idxs):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        """
        Arguments:
            matched_idxs: list of tensors containing -1, 0 or positive values.
                Each tensor corresponds to a specific image.
                -1 values are ignored, 0 are considered as negatives and > 0 as
                positives.

        Returns:
            pos_idx (list[tensor])
            neg_idx (list[tensor])

        Returns two lists of binary masks for each image.
        The first list contains the positive elements that were selected,
        and the second list the negative examples.
        """
        pos_idx = []
        neg_idx = []
        # iterate over the matched_idxs of every image
        for matched_idxs_per_image in matched_idxs:
            # values >= 1 are positive samples; torch.where returns their indices
            # positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
            positive = torch.where(torch.ge(matched_idxs_per_image, 1))[0]
            # values == 0 are negative samples
            # negative = torch.nonzero(matched_idxs_per_image == 0).squeeze(1)
            negative = torch.where(torch.eq(matched_idxs_per_image, 0))[0]

            # target number of positive samples
            num_pos = int(self.batch_size_per_image * self.positive_fraction)
            # protect against not enough positive examples:
            # if there are fewer positives than requested, take all of them
            num_pos = min(positive.numel(), num_pos)
            # fill the rest of the batch with negative samples
            num_neg = self.batch_size_per_image - num_pos
            # protect against not enough negative examples
            num_neg = min(negative.numel(), num_neg)

            # randomly select the requested number of positive and negative samples;
            # torch.randperm returns a random permutation of integers from 0 to n - 1
            perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
            perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

            pos_idx_per_image = positive[perm1]
            neg_idx_per_image = negative[perm2]

            # create binary masks from the selected indices
            pos_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            neg_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )

            pos_idx_per_image_mask[pos_idx_per_image] = 1
            neg_idx_per_image_mask[neg_idx_per_image] = 1

            pos_idx.append(pos_idx_per_image_mask)
            neg_idx.append(neg_idx_per_image_mask)

        return pos_idx, neg_idx
```
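The two min() guards above can be checked with concrete numbers. Suppose an image has only 2 positive anchors but plenty of negatives, with the usual 256-sample batch and 0.5 positive fraction (the counts are made up for illustration):

```python
batch_size_per_image, positive_fraction = 256, 0.5
num_positive_available, num_negative_available = 2, 300

num_pos = int(batch_size_per_image * positive_fraction)  # target: 128 positives
num_pos = min(num_positive_available, num_pos)           # only 2 exist -> take 2
num_neg = batch_size_per_image - num_pos                 # negatives fill the rest: 254
num_neg = min(num_negative_available, num_neg)           # 300 available -> keep 254
print(num_pos, num_neg)  # 2 254
```

So when positives are scarce, the batch is padded with extra negatives rather than shrunk, keeping the per-image sample count at 256.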