
Selective Search: The "Stepping Stone" of Object Detection

Preface

Now that I have found an advisor, I am starting to study object detection. Getting an intuitive feel for object detection is fairly easy, but implementing each generation of detectors yourself and really understanding the details is harder.
I read an introductory survey blog post on object detection that is very well written; it is more than enough for a first intuitive understanding.
From that basic overview, the object detection pipeline essentially consists of selecting candidate regions, then classifying each region and regressing its bounding box. The different techniques mainly differ in how they avoid the huge waste of time of training (or predicting) once per region; the more advanced ones no longer select candidate regions on the original image at all. So region proposal is the foundation, and the selective search method is the entry-level stepping stone of object detection.

Selective Search

Algorithm Steps

  • Algorithm input: a (colour) image
  • Algorithm output: candidate regions of varying sizes
  • step1: Use the graph-based image segmentation algorithm proposed by Felzenszwalb in 2004 to generate the initial candidate regions. There is a separate blog post analysing that paper in detail; my understanding is to view the image as a graph in the discrete-mathematics sense, where each pixel is a vertex and the dissimilarity between two pixels is an edge weight. A threshold (or the adaptive threshold proposed in the paper) decides whether two vertices are connected, and a minimum-spanning-tree style algorithm then finds the connected components of the graph. Each connected component is one segment, and together they form the initial region set R we need. Note that each element of R is stored as the smallest rectangle enclosing its pixel region.
  • step2: Compute the similarity between every pair of adjacent regions and put the results into a set S. How similarity is computed is explained later.
  • step3: Pick the two regions r_i, r_j with the highest similarity in S.
  • step4: Merge the two regions: r_t = r_i ∪ r_j.
  • step5: Remove from S every similarity involving r_i or r_j.
  • step6: Compute the similarity between r_t and its neighbouring regions and add these to S.
  • step7: R = R ∪ {r_t} (note that the old regions r_i and r_j are not removed, which is why the algorithm's output clearly contains boxes nested inside one another; I am not sure why they are kept).
  • step8: If S is not empty, go back to step3; otherwise return R.
    The algorithm listing from the paper is shown here; a small Python sketch of the merging loop follows.
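
To make these steps concrete, here is a minimal Python sketch of the merging loop. The names (selective_search_loop, compute_similarity, merge_regions, neighbours) are my own placeholders, not the paper's; the full skimage implementation at the end of this post fills in the details.

# Minimal sketch of the hierarchical merging loop (steps 3-8).
# `regions` maps a region id to its data; `neighbours` is a list of
# (i, j) id pairs of adjacent regions; `compute_similarity` and
# `merge_regions` are placeholder helpers.
def selective_search_loop(regions, neighbours, compute_similarity, merge_regions):
    # step2: similarities of all adjacent pairs
    S = {(i, j): compute_similarity(regions[i], regions[j])
         for (i, j) in neighbours}
    while S:
        # step3: the most similar pair
        i, j = max(S, key=S.get)
        # step4 + step7: merge into a new region; the old ones are kept in R
        t = max(regions) + 1
        regions[t] = merge_regions(regions[i], regions[j])
        # step5: drop every similarity involving r_i or r_j
        stale = [pair for pair in S if i in pair or j in pair]
        for pair in stale:
            del S[pair]
        # step6: the new region's neighbours are the other members of the
        # stale pairs; compute their similarity to r_t
        for pair in stale:
            if pair == (i, j):
                continue
            n = pair[1] if pair[0] in (i, j) else pair[0]
            S[(t, n)] = compute_similarity(regions[t], regions[n])
    # step8: S is empty, so return every region ever created
    return regions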

Similarity Computation

There are four kinds of similarity: colour similarity, texture similarity, size similarity, and fill (overlap) similarity. The final similarity is a weighted sum of the four.

Colour Similarity

First convert the image from the RGB colour space to HSV, then for each channel compute a histogram with bins = 25, giving 75 bins in total. Normalise the histogram (divide by the region size) and then compute the following:

Expression and meaning:
  • \(r_i\): the i-th region
  • \(c_i^k\): the value of the k-th histogram bin of region i
  • \(S_{colour}(r_i,r_j) = \sum_{k=1}^n\min(c_i^k,c_j^k)\): the colour similarity of the two regions

If two regions have similar colours, the peaks and valleys of their histograms line up, so the sum of per-bin minima S is large and the similarity is high; if the peaks and valleys are offset, each bin contributes the smaller value, S is small and the similarity is low. Note also that this measure can compare candidate regions of different sizes, which I think is an advantage of using histograms and also the reason for the normalisation.
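
To illustrate, here is a small sketch of the colour histogram and the intersection-based similarity. The 25 bins per channel follow the paper; the function names and the (N, 3) pixel-array input are my own assumptions.

import numpy as np
import skimage.color

def colour_hist(region_pixels_rgb, bins=25):
    # region_pixels_rgb: (N, 3) array of the region's RGB pixels
    hsv = skimage.color.rgb2hsv(region_pixels_rgb[np.newaxis, :, :])[0]
    hist = np.concatenate(
        [np.histogram(hsv[:, c], bins, (0.0, 1.0))[0] for c in range(3)])
    return hist / len(region_pixels_rgb)  # normalise by the region size

def colour_similarity(hist_i, hist_j):
    # histogram intersection: sum the per-bin minima
    return np.minimum(hist_i, hist_j).sum()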

Texture Similarity

  • The paper uses Gaussian derivatives with σ = 1 in 8 orientations to collect gradient statistics, then computes a histogram with bins = 10 over each result (which has the same size as the region). The total number of histogram bins is 8 × 3 × 10 = 240 (using the RGB colour space).
    \(S_{texture}(r_i,r_j) = \sum^n_{k=1}\min(t^k_i,t^k_j)\)
    where \(t^k_i\) is the value of the k-th bin for region i.
  • The code I read computes texture similarity with LBP (Local Binary Pattern) instead (there is a separate blog post on LBP with the details). My understanding is that the procedure is essentially the same as for colour similarity, except that the per-pixel value in each channel is no longer its HSV value but an LBP code determined by a radius R and a number of neighbouring pixels P, which is where the "texture" comes from; the rest is identical, only the number of bins may differ. A sketch follows this list.
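
A rough sketch of the LBP variant used in the skimage code below (P = 8 sample points, R = 1). The function names, the 3-channel RGB input, and histogramming the LBP codes over their full 0-255 range are my own choices.

import numpy as np
import skimage.feature

def lbp_image(img_rgb, P=8, R=1.0):
    # per-channel LBP codes for the whole image, computed once up front
    out = np.zeros(img_rgb.shape, dtype=float)
    for c in range(3):
        out[:, :, c] = skimage.feature.local_binary_pattern(img_rgb[:, :, c], P, R)
    return out

def texture_hist(region_pixels_lbp, bins=10):
    # region_pixels_lbp: (N, 3) array of the region's per-channel LBP codes
    hist = np.concatenate(
        [np.histogram(region_pixels_lbp[:, c], bins, (0.0, 255.0))[0]
         for c in range(3)])
    return hist / len(region_pixels_lbp)  # normalise by the region size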

Size Similarity

The formula: \(S_{size}(r_i,r_j) = 1-\frac{size(r_i)+size(r_j)}{size(im)}\)
This keeps large regions from swallowing small ones: regions of similar (small) size are merged first, so the merging proceeds at a uniform scale over the whole image.
Size similarity (i.e. merge priority), high -> low: both regions small and of similar size -> both regions of similar size -> both regions small -> regions of very different size.
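
In code this is a one-liner; a sketch with hypothetical argument names:

def size_similarity(size_i, size_j, size_im):
    # the smaller the two regions are relative to the image,
    # the closer the score is to 1, so small regions merge first
    return 1.0 - (size_i + size_j) / size_im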

Fill Similarity

(A tangent: when I first saw this I immediately thought of IoU, the intersection over union, so I misunderstood it at first; but the initial candidate regions never coincide with each other, at most their bounding boxes interlock. An example below illustrates this.)

Formula: \(S_{fill}(r_i,r_j) = 1 - \frac{size(BB_{ij})-size(r_i)-size(r_j)}{size(im)}\)
where \(size(BB_{ij})\) is the area of the smallest bounding rectangle enclosing regions i and j, and \(size(r_i)\) is the pixel area of region i.
If the two regions fit together tightly, the numerator is small and S is large, i.e. the fill similarity is high. The figure below illustrates this:
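
To make the computation itself concrete, here is a small sketch where each region carries its bounding box and pixel count; the dictionary field names follow the skimage code further below.

def fill_similarity(r_i, r_j, size_im):
    # area of the tight bounding box around both regions
    bb_w = max(r_i["max_x"], r_j["max_x"]) - min(r_i["min_x"], r_j["min_x"])
    bb_h = max(r_i["max_y"], r_j["max_y"]) - min(r_i["min_y"], r_j["min_y"])
    bb_size = bb_w * bb_h
    # the less empty space the joint bounding box leaves, the higher the score
    return 1.0 - (bb_size - r_i["size"] - r_j["size"]) / size_im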

Algorithm Results

I don't have any nice pictures of pretty girls to run it on (doge).

The original image

The greyscale mask after Felzenszwalb segmentation

The image after running the algorithm

  • With more candidate boxes
  • With fewer candidate boxes

CODE

Python code using OpenCV (to look at the results)

Source


'''
Usage:
    ./ssearch.py input_image (f|q)
    f=fast, q=quality
Use "l" to display less rects, 'm' to display more rects, "q" to quit.
'''

import sys
import cv2
import ipdb
if __name__ == '__main__':
    # If image path and f/q is not passed as command
    # line arguments, quit and display help message
    if len(sys.argv) < 3:
        print(__doc__)
        sys.exit(1)

    # speed-up using multithreads
    cv2.setUseOptimized(True)  # enable optimised code paths
    cv2.setNumThreads(4)  # use multiple threads

    # ipdb.set_trace()
    # read image
    im = cv2.imread(sys.argv[1])  # note: cv2.imread loads the image in BGR channel order by default
    # resize image
    newHeight = 200
    newWidth = int(im.shape[1]*200/im.shape[0])
    im = cv2.resize(im, (newWidth, newHeight))  # resize to height 200, keeping the aspect ratio

    # create Selective Search Segmentation Object using default parameters
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

    # set input image on which we will run segmentation
    ss.setBaseImage(im)

    # Switch to fast but low recall Selective Search method
    if (sys.argv[2] == 'f'):
        ss.switchToSelectiveSearchFast()

    # Switch to high recall but slow Selective Search method
    elif (sys.argv[2] == 'q'):
        ss.switchToSelectiveSearchQuality()
    # if argument is neither f nor q print help message
    else:
        print(__doc__)
        sys.exit(1)

    # run selective search segmentation on input image
    rects = ss.process()
    print('Total Number of Region Proposals: {}'.format(len(rects)))
    
    # number of region proposals to show
    numShowRects = 100
    # increment to increase/decrease total number
    # of reason proposals to be shown
    increment = 50
    count = 1
    while True:
        # create a copy of original image
        imOut = im.copy()

        # itereate over all the region proposals
        for i, rect in enumerate(rects):
            # draw rectangle for region proposal till numShowRects
            if (i < numShowRects):
                x, y, w, h = rect
                cv2.rectangle(imOut, (x, y), (x+w, y+h), (0, 255, 0), 1, cv2.LINE_AA)
            else:
                break

        # show output
        cv2.imshow("Output", imOut)
        cv2.imwrite('{}.jpg'.format(count),imOut)
        count += 1
        # record key press
        k = cv2.waitKey(0) & 0xFF  # keep only the low 8 bits of the key code

        # m is pressed
        if k == 109:
            # increase total number of rectangles to show by increment
            numShowRects += increment
        # l is pressed
        elif k == 108 and numShowRects > increment:
            # decrease total number of rectangles to show by increment
            numShowRects -= increment
        # q is pressed
        elif k == 113:
            break

    # close image show window
    cv2.destroyAllWindows()
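
For reference, the script is run as python ssearch.py <image> f (fast mode) or python ssearch.py <image> q (quality mode), as the docstring at the top says; press "m" to draw more boxes, "l" to draw fewer, and "q" to quit.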

Python code using skimage (easier to understand)

Source

# -*- coding: utf-8 -*-
from __future__ import division

import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy
import ipdb
import sys
import pandas as pd 

# "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
#
#  - Modified version with LBP extractor for texture vectorization


def _generate_segments(im_orig, scale, sigma, min_size):
    """
        segment smallest regions by the algorithm of Felzenswalb and
        Huttenlocher
    """

    # run Felzenszwalb segmentation to get a mask of segment labels;
    # each label says which initial region a pixel belongs to
    im_mask = skimage.segmentation.felzenszwalb(
        skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
        min_size=min_size)
    # pd.DataFrame(im_mask).to_excel('./mask.xlsx')
    # attach the initial segment labels to the image as a fourth channel
    im_orig = numpy.append(
        im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
    im_orig[:, :, 3] = im_mask

    return im_orig


def _sim_colour(r1, r2):  # colour similarity
    """
        calculate the sum of histogram intersection of colour
    """
    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])


def _sim_texture(r1, r2):  # texture similarity
    """
        calculate the sum of histogram intersection of texture
    """
    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])


def _sim_size(r1, r2, imsize):  # size similarity
    """
        calculate the size similarity over the image
    """
    return 1.0 - (r1["size"] + r2["size"]) / imsize

'''
imsize: area of the whole image
r1['size']: pixel area of the candidate region
'''
def _sim_fill(r1, r2, imsize):  # fill (spatial overlap) similarity
    """
        calculate the fill similarity over the image
    """
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
    # the tighter the two regions fill their joint bounding box, the smaller the
    # numerator; subtracting from 1 makes "more similar" give a higher score


def _calc_sim(r1, r2, imsize):  # 總相似性
    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))


def _calc_colour_hist(img):
    """
        calculate colour histogram for each region
        the size of output histogram will be BINS * COLOUR_CHANNELS(3)
        number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
        extract HSV
    """

    BINS = 25
    hist = numpy.array([])

    for colour_channel in (0, 1, 2):

        # extracting one colour channel
        c = img[:, colour_channel]

        # calculate histogram for each colour and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])

    # L1 normalize
    hist = hist / len(img)

    return hist


def _calc_texture_gradient(img):
    """
        calculate texture gradient for entire image
        The original SelectiveSearch algorithm proposed Gaussian derivative
        for 8 orientations, but we use LBP instead.
        output will be [height(*)][width(*)]
    """
    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))

    for colour_channel in (0, 1, 2):
        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)
    # arguments: the single-channel image, the number of neighbouring sample points
    # per pixel (P = 8), and the radius (R = 1.0)
    return ret


def _calc_texture_hist(img):
    """
        calculate texture histogram for each region
        calculate the histogram of gradient for each colours
        the size of output histogram will be
            BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
    """
    BINS = 10

    hist = numpy.array([])

    for colour_channel in (0, 1, 2):

        # mask by the colour channel
        fd = img[:, colour_channel]

        # calculate histogram for each orientation and concatenate them all
        # and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])

    # L1 Normalize
    hist = hist / len(img)

    return hist


def _extract_regions(img):

    R = {}

    # get hsv image: convert to the HSV colour space
    hsv = skimage.color.rgb2hsv(img[:, :, :3])

    # pass 1: count pixel positions
    # walk over every pixel to find the bounding box of each labelled region
    for y, i in enumerate(img):

        for x, (r, g, b, l) in enumerate(i):  # one pixel with its four channels (r, g, b, label)

            # initialize a new region
            if l not in R:
                R[l] = {
                    "min_x": 0xffff, "min_y": 0xffff,
                    "max_x": 0, "max_y": 0, "labels": [l]}
            # expand this region's bounding box
            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y

    # pass 2: calculate the texture gradient (LBP codes) for the whole image
    tex_grad = _calc_texture_gradient(img)

    # pass 3: calculate colour histogram of each region
    for k, v in list(R.items()):

        # colour histogram
        masked_pixels = hsv[:, :, :][img[:, :, 3] == k]  # HSV pixels belonging to region k
        R[k]["size"] = len(masked_pixels)  # number of pixels in the region
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)  # colour histogram vector of the region

        # texture histogram
        R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])  # texture histogram vector of the region

    return R  # each entry holds the bounding box (min_x, min_y, max_x, max_y), the pixel count, and the colour/texture histograms


def _extract_neighbours(regions):
    # collect pairs of neighbouring regions
    def intersect(a, b):  # do the two bounding boxes intersect?
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False

    R = list(regions.items())
    neighbours = []
    for cur, a in enumerate(R[:-1]):
        for b in R[cur + 1:]:
            if intersect(a[1], b[1]):
                neighbours.append((a, b))

    return neighbours


def _merge_regions(r1, r2):
    new_size = r1["size"] + r2["size"]
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]
    }
    return rt


def selective_search(
        im_orig, scale=1.0, sigma=0.8, min_size=50):
    '''Selective Search
    Parameters
    ----------
        im_orig : ndarray
            Input image
        scale : int
            Free parameter. Higher means larger clusters in felzenszwalb segmentation.
        sigma : float
            Width of Gaussian kernel for felzenszwalb segmentation.
        min_size : int
            Minimum component size for felzenszwalb segmentation.
    Returns
    -------
        img : ndarray
            image with region label
            region label is stored in the 4th value of each pixel [r,g,b,(region)]
        regions : array of dict
            [
                {
                    'rect': (left, top, width, height),
                    'labels': [...],
                    'size': component_size
                },
                ...
            ]
    '''
    assert im_orig.shape[2] == 3, "3ch image is expected"

    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(im_orig, scale, sigma, min_size)

    if img is None:
        return None, {}

    imsize = img.shape[0] * img.shape[1]
    R = _extract_regions(img)  # R is the set of initial candidate regions

    # extract neighbouring information
    neighbours = _extract_neighbours(R)

    # calculate initial similarities
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        S[(ai, bi)] = _calc_sim(ar, br, imsize)

    # hierarchal search
    while S != {}:

        # get highest similarity
        i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]
        
        # merge corresponding regions
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])

        # mark similarities for regions to be removed
        key_to_delete = []
        for k, v in list(S.items()):
            if (i in k) or (j in k):
                key_to_delete.append(k)

        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]

        # calculate similarity set with the new region
        for k in [a for a in key_to_delete if a != (i, j)]:  # the new region's neighbours are exactly the regions whose similarities were just removed
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)

    regions = []
    for k, r in list(R.items()):
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']
        })
    # note: size and rect do not correspond exactly; size is computed from the mask and follows the region's actual shape,
    # while rect is only a rough bounding rectangle (this can be seen from the dumped mask.xlsx file)
    return img, regions
    
if __name__ == '__main__':
    # ipdb.set_trace()
    img = skimage.io.imread(sys.argv[1])
    img_lbl, regions = selective_search(img)
    # show only the RGB channels; the fourth channel holds the region labels
    skimage.io.imshow(img_lbl[:, :, :3].astype('uint8'))
    skimage.io.show()

Wherever you are in life, stay absolutely optimistic.