Understanding the RPN in Faster R-CNN

The role of the RPN
The RPN is an improvement over the selective-search strategy. The region proposal network takes one or more feature levels from the backbone as input, say of dimensions (B, C, H, W). A 3x3 convolution first fuses the features of the input feature map, and then two independent 1x1 convolutions produce the objectness and bounding-box regression outputs. The objectness output has dimensions (B, K, H, W), where K is the number of anchors generated per cell. The bounding-box regression output has dimensions (B, Kx4, H, W), and its values are the offsets of the proposals relative to the anchors.
Note: "For simplicity we implement the cls layer as a two-class softmax layer. Alternatively, one may use logistic regression to produce k scores." In other words, the objectness output could instead have dimensions (B, Kx2, H, W), treating the prediction as an explicit two-class problem. For simplicity, the implementation here adopts the alternative: logistic regression producing K scores directly, hence the (B, K, H, W) output above.
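The two-branch head described above can be sketched as a small module. The 256 channels and K = 3 below are illustrative values, and the class name is a placeholder, but the structure (shared 3x3 conv, then two 1x1 convs) follows the description:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of the RPN head: a shared 3x3 conv feeding two 1x1 convs."""

    def __init__(self, in_channels, num_anchors):
        super().__init__()
        # 3x3 conv fuses features around each cell
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        # 1x1 conv -> K objectness logits per cell (logistic-regression variant)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        # 1x1 conv -> 4K box-regression offsets per cell
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, x):
        t = torch.relu(self.conv(x))
        return self.cls_logits(t), self.bbox_pred(t)

feat = torch.randn(2, 256, 25, 38)       # (B, C, H, W)
objectness, bbox_reg = RPNHead(256, 3)(feat)
print(objectness.shape)                  # (2, 3, 25, 38)  = (B, K, H, W)
print(bbox_reg.shape)                    # (2, 12, 25, 38) = (B, Kx4, H, W)
```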
Definition of the RPN loss function
Anchor generation. For a feature map of size (B, C, H, W), let each cell generate K anchors; that feature level then yields HxWxK anchors in total. Mapping the anchor coordinates back to the input image produces a set of anchor boxes with different aspect ratios. Each anchor box is assigned a class label indicating whether or not it is an object. Two kinds of anchor boxes receive a positive label: (1) anchors whose IoU with a ground-truth box exceeds 0.7 (a hand-picked threshold); (2) for each ground-truth box, the anchor with the highest IoU among all anchors matched to it, even if that IoU is below 0.7. (Presumably this is done so that every ground-truth box gets at least one positive sample, improving the recall of the later detection stage.) A single ground-truth box may match several anchors, while anchors whose IoU with every ground-truth box is below 0.3 are treated as negatives. With this labeling, the objective function is defined as:

    L({p_i}, {t_i}) = (1/N_cls) * Σ_i L_cls(p_i, p_i*) + λ * (1/N_reg) * Σ_i p_i* · L_reg(t_i, t_i*)
where:
i: index of an anchor within a mini-batch
p_i: predicted probability that anchor i is an object
p_i*: ground-truth label; 1 if the anchor is a positive sample, 0 if it is a negative sample
t_i: predicted bounding-box coordinates for anchor i, i.e. the proposal
t_i*: coordinates of the ground-truth box associated with positive anchor i
L_cls: classification loss, a binary classification (log) loss
p_i* · L_reg: regression loss, active only when the anchor is positive (p_i* = 1); L_reg is the smooth L1 function
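As a sketch, the loss can be computed on a batch of already-sampled anchors as follows. The normalization here divides both terms by the number of sampled anchors, a common simplification of the paper's N_cls, N_reg, and λ; the function and variable names are illustrative, not a library API:

```python
import torch
import torch.nn.functional as F

def rpn_loss(objectness, pred_deltas, labels, reg_targets):
    """Sketch of the RPN loss over sampled anchors.

    objectness:  (N,)   predicted logits, p_i before the sigmoid
    pred_deltas: (N, 4) predicted offsets t_i
    labels:      (N,)   p_i* -- 1.0 for positive anchors, 0.0 for negative
    reg_targets: (N, 4) t_i* -- offsets of the matched ground-truth box
    """
    # L_cls: binary cross-entropy over all sampled anchors
    cls_loss = F.binary_cross_entropy_with_logits(objectness, labels)
    # p_i* * L_reg: smooth L1, counted only for positive anchors
    pos = labels == 1
    reg_loss = F.smooth_l1_loss(pred_deltas[pos], reg_targets[pos], reduction="sum")
    reg_loss = reg_loss / labels.numel()
    return cls_loss + reg_loss

objectness = torch.tensor([2.0, -1.0, 0.5, -2.0])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
deltas = torch.zeros(4, 4)
targets = torch.zeros(4, 4)   # perfect regression -> only L_cls remains
loss = rpn_loss(objectness, deltas, labels, targets)
```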
Definition of the smooth L1 loss

    smooth_L1(x) = 0.5 * x^2       if |x| < 1
                 = |x| - 0.5       otherwise
Definition of the regression parameters

(tx, ty, tw, th) are the offsets of the proposal (the predicted bbox) relative to the anchor:

    tx = (x - xa) / wa,   ty = (y - ya) / ha,   tw = log(w / wa),   th = log(h / ha)

(tx*, ty*, tw*, th*) are the offsets of the ground-truth box relative to the anchor:

    tx* = (x* - xa) / wa,   ty* = (y* - ya) / ha,   tw* = log(w* / wa),   th* = log(h* / ha)

Here x, y, w, h denote a box's center coordinates, width, and height, and x, xa, x* refer to the predicted box, the anchor box, and the ground-truth box respectively.
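These formulas can be sketched as a small encoding helper (the function name is illustrative, not a library API; boxes are assumed to be in (x1, y1, x2, y2) corner form):

```python
import torch

def encode_boxes(boxes, anchors):
    """Compute (tx, ty, tw, th) of `boxes` relative to `anchors`; both (N, 4)."""
    # anchor width, height, center
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha
    # box width, height, center
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    x = boxes[:, 0] + 0.5 * w
    y = boxes[:, 1] + 0.5 * h
    # offsets as defined above
    tx = (x - xa) / wa
    ty = (y - ya) / ha
    tw = torch.log(w / wa)
    th = torch.log(h / ha)
    return torch.stack([tx, ty, tw, th], dim=1)

anchor = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
box = torch.tensor([[5.0, 5.0, 15.0, 15.0]])
# same size as the anchor, center shifted by +5 in both axes -> (0.5, 0.5, 0, 0)
print(encode_boxes(box, anchor))
```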
Anchor sampling

To counter the imbalance between positive and negative samples when computing the RPN loss, a target positive fraction of 0.5 is used; if there are not enough positives, all available positives are taken, and the same applies to negatives. The total number of samples drawn per image is set to 256 (another hand-picked value).

The code is as follows:
```python
from typing import List, Tuple

import torch
from torch import Tensor


class BalancedPositiveNegativeSampler(object):
    """
    This class samples batches, ensuring that they contain a fixed proportion of positives
    """

    def __init__(self, batch_size_per_image, positive_fraction):
        # type: (int, float) -> None
        """
        Arguments:
            batch_size_per_image (int): number of elements to be selected per image
            positive_fraction (float): percentage of positive elements per batch
        """
        self.batch_size_per_image = batch_size_per_image
        self.positive_fraction = positive_fraction

    def __call__(self, matched_idxs):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        """
        Arguments:
            matched_idxs: list of tensors containing -1, 0 or positive values.
                Each tensor corresponds to a specific image.
                -1 values are ignored, 0 are considered as negatives and > 0 as
                positives.

        Returns:
            pos_idx (list[tensor])
            neg_idx (list[tensor])

        Returns two lists of binary masks for each image.
        The first list contains the positive elements that were selected,
        and the second list the negative examples.
        """
        pos_idx = []
        neg_idx = []
        # iterate over the matched_idxs of every image
        for matched_idxs_per_image in matched_idxs:
            # values >= 1 are positive samples; torch.where returns their indices
            # positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
            positive = torch.where(torch.ge(matched_idxs_per_image, 1))[0]
            # values == 0 are negative samples
            # negative = torch.nonzero(matched_idxs_per_image == 0).squeeze(1)
            negative = torch.where(torch.eq(matched_idxs_per_image, 0))[0]

            # target number of positive samples
            num_pos = int(self.batch_size_per_image * self.positive_fraction)
            # protect against not enough positive examples:
            # if there are fewer positives than requested, take all of them
            num_pos = min(positive.numel(), num_pos)
            # fill the rest of the batch with negative samples
            num_neg = self.batch_size_per_image - num_pos
            # protect against not enough negative examples
            num_neg = min(negative.numel(), num_neg)

            # randomly select the requested number of positive and negative samples;
            # torch.randperm returns a random permutation of integers from 0 to n - 1
            perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
            perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

            pos_idx_per_image = positive[perm1]
            neg_idx_per_image = negative[perm2]

            # create binary masks from the selected indices
            pos_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            neg_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )

            pos_idx_per_image_mask[pos_idx_per_image] = 1
            neg_idx_per_image_mask[neg_idx_per_image] = 1

            pos_idx.append(pos_idx_per_image_mask)
            neg_idx.append(neg_idx_per_image_mask)

        return pos_idx, neg_idx
```
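The two min() guards above can be checked with concrete numbers. Suppose an image has only 2 positive anchors but plenty of negatives, with the usual 256-sample batch and 0.5 positive fraction (the counts are made up for illustration):

```python
batch_size_per_image, positive_fraction = 256, 0.5
num_positive_available, num_negative_available = 2, 300

num_pos = int(batch_size_per_image * positive_fraction)  # target: 128 positives
num_pos = min(num_positive_available, num_pos)           # only 2 exist -> take 2
num_neg = batch_size_per_image - num_pos                 # negatives fill the rest: 254
num_neg = min(num_negative_available, num_neg)           # 300 available -> keep 254
print(num_pos, num_neg)  # 2 254
```

So when positives are scarce, the batch is padded with extra negatives rather than shrunk, keeping the per-image sample count at 256.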