Matching cost aggregation methods in the local stereo matching algorithm BM

阿新 • Published: 2019-01-04

Block Matching methods

The following matching costs are defined for patches centered at the pixel positions $\mathbf p$ and $\mathbf q$ of the images $u$ and $v$, respectively.

The matching costs described below extend trivially to color images by making no distinction between the channels. This is equivalent to considering larger patches (three times the number of pixels) containing the pixel values from all the channels.

SSD (sum of squared differences) and SAD (sum of absolute differences)

By taking the exponent $p \in \{1, 2\}$ in the following equation, we obtain the Sum of Absolute Differences (SAD) and the Sum of Squared Differences (SSD), respectively:

$\mbox{SSD}(\mathbf p,\mathbf q) := \sum_{\mathbf t \in B(0,r)} | u({\mathbf p} + \mathbf t) - v({\mathbf q} + \mathbf t)|^p.$

But any value of $p$ will work as well.
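As an illustration of the formula above, here is a minimal sketch in Python with NumPy. It assumes that $B(0,r)$ is a square $(2r+1) \times (2r+1)$ window and that both blocks lie fully inside their images; the function name `block_cost` and its parameters are hypothetical, not taken from the original text.

```python
import numpy as np

def block_cost(u, v, p, q, r, exponent=2):
    """Sum of |u(p+t) - v(q+t)|**exponent over a (2r+1)x(2r+1) block.

    p and q are (row, col) block centers. Assumption: both blocks fit
    inside their images, so no border handling is done in this sketch.
    """
    (py, px), (qy, qx) = p, q
    bu = u[py - r:py + r + 1, px - r:px + r + 1].astype(np.float64)
    bv = v[qy - r:qy + r + 1, qx - r:qx + r + 1].astype(np.float64)
    return float(np.sum(np.abs(bu - bv) ** exponent))

u = np.arange(25, dtype=np.float64).reshape(5, 5)
v = u + 2.0  # every pixel differs by exactly 2

sad = block_cost(u, v, (2, 2), (2, 2), r=1, exponent=1)  # 9 pixels * 2
ssd = block_cost(u, v, (2, 2), (2, 2), r=1, exponent=2)  # 9 pixels * 4
print(sad, ssd)
```

With `exponent=1` the function returns the SAD and with `exponent=2` the SSD of the two 3x3 blocks.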

ZSSD (Zero mean SSD, or SSD-mean)

The Zero-mean SSD adds invariance with respect to additive contrast changes to the SSD. It is defined by

$\mbox{ZSSD}(\mathbf p, \mathbf q) := \sum_{\mathbf t \in B(0,r)} \left| u({\mathbf p} + \mathbf t) - v({\mathbf q} + \mathbf t) + \mu_v({\mathbf q})- \mu_u({\mathbf p}) \right|^2 ,$

where $\textstyle \mu_u({\mathbf p})= \frac{1}{| B(0,r)|} \sum_{\mathbf t \in B(0,r)} u(\mathbf p+\mathbf t)$ is the precomputed average of the pixel values of $u$ over the block centered at $\mathbf p$. The ZSSD cannot be implemented with integral images if expressed in this form, because the pixel value distances depend on the position of the pixels. Indeed, depending on the position, a different pair of means $\mu_{v}$ and $\mu_{u}$ is subtracted. However, expanding the square we get

$\mbox{ZSSD}(\mathbf p,\mathbf q) := \mbox{SSD}(\mathbf p,\mathbf q) - {|B(0,r)|} \left( \mu_v({\mathbf q})- \mu_u({\mathbf p}) \right)^2$

and since $\mu_v(\cdot)$ and $\mu_u(\cdot)$ can also be precomputed by integral images, the overall cost of ZSSD is comparable to that of SSD.
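The equivalence of the two ZSSD expressions can be checked numerically. The sketch below works on two already extracted blocks `bu` and `bv` (hypothetical names) and compares the definition with the expanded form; it is a sanity check under these assumptions, not an integral-image implementation.

```python
import numpy as np

def zssd_direct(bu, bv):
    # Definition: u - v + mu_v - mu_u, i.e. each block minus its own mean.
    return float(np.sum(((bu - bu.mean()) - (bv - bv.mean())) ** 2))

def zssd_expanded(bu, bv):
    # Expanded form: SSD - |B| * (mu_v - mu_u)^2
    ssd = np.sum((bu - bv) ** 2)
    return float(ssd - bu.size * (bv.mean() - bu.mean()) ** 2)

rng = np.random.default_rng(0)
bu = rng.random((5, 5))
bv = rng.random((5, 5))

d = zssd_direct(bu, bv)
e = zssd_expanded(bu, bv)
shifted = zssd_direct(bu, bv + 10.0)  # additive contrast change on v
print(d, e, shifted)
```

Both functions agree up to rounding, and adding a constant to one block leaves the cost unchanged, which is the additive invariance described above.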

SSD/Norm

Normalizing the patches by their $L^2$-norms renders the comparison robust to multiplicative contrast changes. The SSD/Norm is defined as

$\mbox{SSD/Norm}(\mathbf p, \mathbf q) := \sum_{ \mathbf t \in B(0,r)} \left| \frac{ u({\mathbf p} + \mathbf t) } { \| u|_{B({\mathbf p},r)} \| } - \frac{ v({\mathbf q} + \mathbf t) } { \| v|_{B({\mathbf q},r)} \| } \right|^2 ,$

where $\textstyle \| u|_{B({\mathbf p},r)} \| = \sqrt{\sum_{\mathbf t \in B(0,r)} \left( u(\mathbf p+\mathbf t) \right)^2}$. Expanding the above expression, we arrive at a correlation formula

$\mbox{SSD/Norm}(\mathbf p, \mathbf q) := 2 - 2 \frac{ \mbox{Prod}(\mathbf p ,\mathbf q) } { \| u|_{B({\mathbf p},r)} \| \,\, \| v|_{B({\mathbf q},r)} \| } ,$

where $\mbox{Prod}(\mathbf p ,\mathbf q):= \sum_{\mathbf t \in B(0,r)} u({\mathbf p} + \mathbf t) v({\mathbf q} + \mathbf t)$, which can be computed using integral images by taking $d(a,b) = ab$, and the terms $\| u|_{B({\mathbf p},r)} \|$ and $\| v|_{B({\mathbf q},r)} \|$ can be precomputed.

The implementation of this cost should prevent dividing by zero. Under this circumstance, the cost should be set to .
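A small numerical check of the correlation form of SSD/Norm, written as a sketch on two extracted blocks. The function names are hypothetical, and this sketch simply raises an error for a zero-norm block instead of choosing a fallback cost.

```python
import numpy as np

def _norms(bu, bv):
    # Frobenius norm of each block, i.e. sqrt of the sum of squares.
    nu, nv = np.linalg.norm(bu), np.linalg.norm(bv)
    if nu == 0 or nv == 0:
        # Assumption of this sketch: no fallback cost is chosen here.
        raise ValueError("zero-norm block")
    return nu, nv

def ssd_norm_direct(bu, bv):
    nu, nv = _norms(bu, bv)
    return float(np.sum((bu / nu - bv / nv) ** 2))

def ssd_norm_corr(bu, bv):
    nu, nv = _norms(bu, bv)
    prod = np.sum(bu * bv)  # Prod(p, q)
    return float(2.0 - 2.0 * prod / (nu * nv))

rng = np.random.default_rng(1)
bu = rng.random((5, 5))
bv = rng.random((5, 5))

direct = ssd_norm_direct(bu, bv)
corr_form = ssd_norm_corr(bu, bv)
scaled = ssd_norm_corr(bu, 4.0 * bu)  # multiplicative contrast change
print(direct, corr_form, scaled)
```

The two forms agree up to rounding, and a block compared against a scaled copy of itself has a cost of (numerically) zero.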

NCC (normalized cross correlation)

The Normalized Cross Correlation combines the benefits of ZSSD and SSD/Norm, as it is invariant to affine contrast changes. It is defined as

$\mbox{NCC}(\mathbf p, \mathbf q) := 1 - \mbox{Corr}(\mathbf p, \mathbf q) ,$

with

$\mbox{Corr}(\mathbf p, \mathbf q) := \frac{ \frac1{|B(0,r)|} \sum_{\mathbf t \in B(0,r)} \left( u(\mathbf p + \mathbf t) - \mu_u{(\mathbf p)}\right) \left( v(\mathbf q + \mathbf t) - \mu_v{(\mathbf q)}\right) }{ \sqrt{ \sigma^2\left( u|_{B({\mathbf p},r)} \right) \, \sigma^2\left( v|_{B({\mathbf q},r)} \right) } } ,$

and where $\sigma^2\left( u|_{B({\mathbf p},r)}\right)$ is the sample variance of the block centered at $\mathbf p$.

$\mbox{Corr}$ takes values in $[-1,1]$, where 1 indicates the maximum correspondence. For a consistent notation across the different methods, $\mbox{NCC}$ is defined as $1 - \mbox{Corr}$. The above expression can be written as

$\mbox{Corr}(\mathbf p, \mathbf q) := \frac{ \frac1{|B(0,r)|} \mbox{Prod}(\mathbf p,\mathbf q) - \mu_u{(\mathbf p)} \,\cdot \, \mu_v{(\mathbf q)} }{ \sqrt{ \sigma^2\left( u|_{B({\mathbf p},r)} \right) \, \sigma^2\left( v|_{B({\mathbf q},r)} \right) } } ,$

which can be computed using one integral image (for $\mbox{Prod}$) for each offset value ($\mathbf p-\mathbf q$), and where the terms $\mu$ and $\sigma^2$ can also be precomputed by integral images.

The implementation of this cost must prevent dividing by zero. Under that circumstance the cost should be set to .
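The integral-image formulation can be sketched as follows for a single horizontal offset. Several choices here are assumptions, not taken from the original: the block is a square window, the offset convention is $\mathbf q = \mathbf p + (0, dx)$, the variance is normalized by $|B(0,r)|$ so that it matches the numerator, and the hypothetical parameter `flat_cost` (default 1, i.e. zero correlation) is the value used when the denominator vanishes.

```python
import numpy as np

def box_sum(img, r):
    """Sum of img over every fully contained (2r+1)x(2r+1) block,
    computed with an integral image (summed-area table)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    k = 2 * r + 1
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def ncc_cost_map(u, v, dx, r, flat_cost=1.0, eps=1e-12):
    """NCC = 1 - Corr for all blocks, comparing u(x) with v(x + dx)."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    w = u.shape[1]
    # Crop both images to the columns where the shifted pair overlaps.
    if dx >= 0:
        uc, vc = u[:, :w - dx], v[:, dx:]
    else:
        uc, vc = u[:, -dx:], v[:, :w + dx]
    n = (2 * r + 1) ** 2
    mu_u, mu_v = box_sum(uc, r) / n, box_sum(vc, r) / n
    var_u = np.clip(box_sum(uc * uc, r) / n - mu_u ** 2, 0.0, None)
    var_v = np.clip(box_sum(vc * vc, r) / n - mu_v ** 2, 0.0, None)
    # Only this term needs a new integral image for each offset.
    cov = box_sum(uc * vc, r) / n - mu_u * mu_v
    denom = np.sqrt(var_u * var_v)
    cost = np.full(cov.shape, flat_cost)  # assumed fallback for flat blocks
    ok = denom > eps
    cost[ok] = 1.0 - cov[ok] / denom[ok]
    return cost

rng = np.random.default_rng(0)
u = rng.random((8, 10))
cost_affine = ncc_cost_map(u, 3.0 * u + 5.0, dx=0, r=1)
cost_shift = ncc_cost_map(u, np.roll(u, 2, axis=1), dx=2, r=1)
cost_flat = ncc_cost_map(np.ones((6, 6)), u[:6, :6], dx=0, r=1)
print(cost_affine.max(), cost_shift.max(), cost_flat.min())
```

In this sketch the means and variances are recomputed on every call for brevity; in a full disparity search they would be computed once per image, and only the product term would be recomputed per offset.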

AFF (“affine” similarity measure)

The “affine” similarity measure proposed by Delon and Desolneux (2010) is also invariant to affine contrast changes, but unlike NCC it can distinguish flat patches from those containing edges. It can be seen from the definition of $\mbox{Corr}$ that if one of the patches is flat, then the correlation will be zero independently of the content of the second patch. In contrast, the “affine” similarity measure defined below can be nonzero under the same circumstances:

$\mbox{AFF}(\mathbf p, \mathbf q) := \max \left( \min_{\alpha\ge0, \beta} \left\| U_{{\mathbf p}} - \alpha V_{{\mathbf q}} - \beta \right\| , \min_{\alpha\ge0, \beta} \left\| V_{{\mathbf q}} - \alpha U_{{\mathbf p}} - \beta \right\| \right) ,$

where $U_{{\mathbf p}}=u|_{B({\mathbf p},r)}$, $V_{{\mathbf q}}=v|_{B({\mathbf q},r)}$ and $\| \cdot \|$ denotes the $L^2$-norm. It can be explicitly computed by the formula

$\mbox{AFF}^2(\mathbf p, \mathbf q) := \max \left( \sigma^2\left( u|_{B({\mathbf p},r)} \right) , \sigma^2\left( v|_{B({\mathbf q},r)} \right) \right) \cdot \min \left(1, 1- \mbox{Corr}( {\mathbf p}, {\mathbf q} ) \, | \mbox{Corr}( {\mathbf p}, {\mathbf q} ) | \right),$

which can be implemented with integral images after pre-computing $\sigma^2\left( u|_{B({\mathbf p},r)} \right)$ and $\sigma^2\left( v|_{B({\mathbf q},r)} \right)$, which in turn can be done with two integral images per image, one for the square of the image and one for the image itself.

LIN is a simpler variant of the AFF cost that drops the invariance to additive changes but has a similar performance:

$\mbox{LIN}(\mathbf p, \mathbf q) := \max \left( \min_{\alpha\ge0} \left\| U_{{\mathbf p}} - \alpha V_{{\mathbf q}} \right\| , \min_{\alpha\ge0} \left\| V_{{\mathbf q}} - \alpha U_{{\mathbf p}} \right\| \right).$

It can also be implemented with integral images by rewriting it as

$\mbox{LIN}^2(\mathbf p, \mathbf q) := \max(\| U_{{\mathbf p}}\|^2 , \| V_{{\mathbf q}}\|^2) \left( 1- \frac{\left[ \mbox{Prod}( {\mathbf p}, {\mathbf q} )\right]^2}{\| U_{{\mathbf p}}\|^2 \| V_{{\mathbf q}}\|^2} \right).$

The implementation of these costs must prevent dividing by zero. Under that circumstance, the cost should be set respectively to the maximum variance or norm of the two patches.
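The closed-form expressions for $\mbox{AFF}^2$ and $\mbox{LIN}^2$ can be sketched on two extracted blocks as follows. The function names are hypothetical, the variance is normalized by the block size (an assumption), and the zero-denominator cases return the maximum variance or squared norm as stated above.

```python
import numpy as np

def aff2(bu, bv):
    """Squared AFF cost from the closed-form expression."""
    var_u, var_v = bu.var(), bv.var()
    denom = np.sqrt(var_u * var_v)
    if denom == 0:  # at least one flat patch: fall back to the max variance
        return float(max(var_u, var_v))
    corr = np.mean((bu - bu.mean()) * (bv - bv.mean())) / denom
    return float(max(var_u, var_v) * min(1.0, 1.0 - corr * abs(corr)))

def lin2(bu, bv):
    """Squared LIN cost from the closed-form expression."""
    nu2, nv2 = np.sum(bu ** 2), np.sum(bv ** 2)
    if nu2 == 0 or nv2 == 0:  # zero patch: fall back to the max squared norm
        return float(max(nu2, nv2))
    prod = np.sum(bu * bv)
    return float(max(nu2, nv2) * (1.0 - prod ** 2 / (nu2 * nv2)))

edge = np.array([[0.0, 0.0, 1.0]] * 3)  # patch containing a vertical edge
flat = np.full((3, 3), 0.5)             # constant patch

flat_vs_edge = aff2(flat, edge)              # nonzero: equals the edge variance
affine_pair = aff2(edge, 2.0 * edge + 3.0)   # affine contrast change
linear_pair = lin2(edge, 2.0 * edge)         # multiplicative contrast change
print(flat_vs_edge, affine_pair, linear_pair)
```

The flat-versus-edge pair gets a positive cost, while patches related by an affine (for AFF) or multiplicative (for LIN) contrast change get a cost of numerically zero.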

BTSAD and BTSSD (Birchfield & Tomasi sampling insensitive pixel dissimilarities)

By BTSAD or BTSSD, we mean an adaptation to SAD or SSD of the Birchfield and Tomasi (1998) sampling insensitive pixel dissimilarity. This matching cost, originally proposed for stereo matching, is designed to be insensitive to image sampling. For smooth (non aliased) images this cost is proven to be stable with respect to subpixel translations of the patches, while still evaluating the costs at integer positions.

This cost is particularly useful in combination with global block matching methods (Dynamic Programming for instance), where the dissimilarity is accumulated along scanlines, and where the subpixel computations are unaffordable. The insensitivity usually prevents the misclassification of some pixels as occlusions.

The usefulness of this matching cost for a “local” block matching method is less clear, since the minimum SAD/SSD cost may not change. Nevertheless, it is worth comparing its performance with subpixel matching.

The matching costs proposed by Birchfield & Tomasi replace the definition of the pixel value distances with $d^{BT}$. For the case of the SSD we get

$\mbox{BTSSD}(\mathbf p,\mathbf q) := \sum_{\mathbf t \in B(0,r)} |d^{BT}( I_{L}({\mathbf p} + \mathbf t), I_{R}({\mathbf q} + \mathbf t))|^2,$

where the distance $d^{BT}$ is defined as a symmetrization of the distance $\bar d$ by

$d^{BT}(I_{L}(\mathbf p) , I_{R}(\mathbf q)) = \min ( \bar d ( I_{L}(\mathbf p), I_{R}(\mathbf q)), \bar d ( I_{R}(\mathbf q), I_{L}(\mathbf p)) ),$

and where $\bar d$ is

$\bar d ( I_{L}(\mathbf p), I_{R}(\mathbf q))= \max (0,I_{L}(\mathbf p) - I^{max}_{R}(\mathbf q),I^{min}_{R}(\mathbf q) - I_{L}(\mathbf p)),$
$\bar d ( I_{R}(\mathbf q), I_{L}(\mathbf p))= \max (0,I_{R}(\mathbf q) - I^{max}_{L}(\mathbf p),I^{min}_{L}(\mathbf p) - I_{R}(\mathbf q)).$

The four precomputed images $I^{max}_{R}, I^{min}_{R},I^{max}_{L}, I^{min}_{L}$ contain the maximum and minimum interpolated values of the image in a half-pixel neighborhood. For instance, with $I_{R}$ we have

$I^{min}_{R}(\mathbf q) = \min(I^-_{R}(\mathbf q), I^+_{R}(\mathbf q),I_{R}(\mathbf q)) \quad \mbox{and} \quad I^{max}_{R}(\mathbf q) = \max(I^-_{R}(\mathbf q), I^+_{R}(\mathbf q),I_{R}(\mathbf q)),$

where the interpolated values are computed by bilinear interpolation:

$I^-_{R}(\mathbf q) = \frac12(I_{R}(\mathbf q)+I_{R}(\mathbf q-(1,0)^T)), \quad I^+_{R}(\mathbf q) =\frac12(I_{R}(\mathbf q)+I_{R}(\mathbf q+(1,0)^T)).$

Note that the maximum (and minimum) of the interpolated pixels $I^{max}$ (resp. $I^{min}$) are only computed along the horizontal axis (definitions of $I^+$ and $I^-$). This is because the cost was originally proposed for stereo matching, so the subpixel differences are supposed to occur only along the horizontal axis. A straightforward extension of this cost for two-dimensional block matching replaces the definition of $I^{max}$ (resp. $I^{min}$) to consider subpixel offsets also in the vertical direction:

$I^{max}(\mathbf q) = \max \left\{ \hat I(\mathbf q+ \mathbf V) \right\} \quad \mbox{with} \quad \mathbf V = \left[-1/2,1/2\right]^2,$

where $\hat I$ denotes the bilinear interpolation of the image $I$. For this interpolation the maximum (resp. minimum) will occur at one of nine possible positions, therefore

$I^{max}(\mathbf q) = \max \left\{ \hat I(\mathbf q+ \mathbf s) : \mathbf s \in \{-\frac12, 0,\frac12\} \times \{-\frac12, 0,\frac12\} \right\}.$
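The horizontal (stereo) version of $d^{BT}$ can be sketched as follows. The function names are hypothetical, and replicating the border pixels is an assumption of this sketch, since the text does not say how image borders are handled.

```python
import numpy as np

def min_max_half_pixel(img):
    """I^min and I^max over {I^-, I, I^+} along the horizontal axis."""
    img = np.asarray(img, dtype=np.float64)
    # Assumption: borders are handled by replicating the edge pixels.
    padded = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    minus = 0.5 * (img + padded[:, :-2])  # I^-: halfway to the left neighbor
    plus = 0.5 * (img + padded[:, 2:])    # I^+: halfway to the right neighbor
    stack = np.stack([minus, img, plus])
    return stack.min(axis=0), stack.max(axis=0)

def bt_dissimilarity(il, ir):
    """Pixel-wise d^BT between two aligned images of the same shape."""
    il = np.asarray(il, dtype=np.float64)
    ir = np.asarray(ir, dtype=np.float64)
    l_min, l_max = min_max_half_pixel(il)
    r_min, r_max = min_max_half_pixel(ir)
    d_lr = np.maximum(0.0, np.maximum(il - r_max, r_min - il))
    d_rl = np.maximum(0.0, np.maximum(ir - l_max, l_min - ir))
    return np.minimum(d_lr, d_rl)

# The same linear ramp sampled with a half-pixel shift.
il = np.array([[0.0, 10.0, 20.0, 30.0]])
ir = np.array([[5.0, 15.0, 25.0, 35.0]])

plain = np.abs(il - ir)          # ordinary absolute difference
d_bt = bt_dissimilarity(il, ir)  # sampling-insensitive dissimilarity
print(plain, d_bt)
```

On this ramp the ordinary absolute difference is 5 at every pixel, whereas $d^{BT}$ vanishes. Summing `d_bt ** 2` over a block would give the BTSSD, and summing `d_bt` the BTSAD.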

Subpixel

This option computes a subpixel disparity map. It is important to note that this is not a subpixel refinement step: the algorithm considers all the subpixel disparities within the disparity range. This option is ignored when computing 2D displacement fields.

Output, Evaluation and Statistics

The output of the block matching methods always contains the following images:

disparity: Represents the computed disparity map using a grayscale coding. Next to this disparity map, another image is displayed for those stereo pairs that provide a ground truth. This second image (titled METHOD/error ) shows the differences between the computed disparity and the ground truth.
matching cost: Represents the minimum matching cost for each pixel, using a grayscale coding (black is 0).
back-projection: Is obtained by warping the secondary image using the displacement field computed from the first image to the second image (denoted and respectively). That is, the backprojection is

Warping with the ground truth displacement field produces an image that matches the first image, except at the occlusions. Thus, comparing the back-projection with the first image makes it possible to assess the quality of the displacement field. To facilitate the comparison, the image back-projection/error shows the pixel-wise difference between the two, using a grayscale coding.
first/second image: The color range of poorly contrasted images is stretched for visualization.

When the ground truth for the stereo pair is available, the errors with respect to it are computed and shown. The errors are shown as images with fixed range from -4 to 4. In this case the following images are also shown as part of the output:

ground truth: Represents the ground truth using a grayscale coding.
evaluation mask: Is an optional input image that indicates which pixels are considered (white) in the computation of the statistics (usually the boundaries are not considered). The pixels discarded by the evaluation mask are not shown in the error displays, are not considered in the statistics and are also removed from the non-occluded mask (see below).

Evaluation mask example

occlusion mask: Is a binary mask indicating which pixels are occluded (black), and therefore not considered for the computation of the statistics on Non Occluded areas. This mask is only present in the default input datasets of the demo.

Occlusion mask example

Show |Err| > 1: This binary mask indicates the points where the ground truth and the computed disparity differ by more than 1 pixel. These pixels are superimposed on the disparity map and painted in red.

Statistics

The statistics consider two aspects of the results: the density and the precision with respect to the ground truth disparity. The errors are computed as the absolute difference between computed disparity values and the ground truth.

Density: The percentage of pixels for which the algorithm has returned a disparity. This quantity does not consider the pixels removed by the evaluation mask.
Percentage of pixels with error > 1 in the Evaluation Mask: This quantity is computed considering the pixels in the evaluation mask.
Percentage of pixels with error > 1 in Occluded areas: Considers the occluded pixels (according to the mask) that are in the evaluation mask. Since the occluded regions cannot be seen in one of the images, this quantity evaluates how well the algorithm extrapolates the disparity map.
Percentage of pixels with error > 1 in NON Occluded areas: Considers the non-occluded pixels that are in the evaluation mask. This quantity is similar to the percentage over the whole evaluation mask, but it is not contaminated by the errors on occluded regions.
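The four statistics above can be sketched as follows. The conventions are assumptions, not taken from the original: missing disparities are encoded as NaN, both masks are boolean arrays (`eval_mask` is True for considered pixels, `occluded` is True for occluded pixels), and the error percentages are taken over the pixels of each region that have a computed disparity. The function name `stereo_stats` is hypothetical.

```python
import numpy as np

def stereo_stats(disp, gt, eval_mask, occluded, thresh=1.0):
    """Density and percentage of errors above thresh, per region."""
    valid = ~np.isnan(disp)  # pixels where a disparity was returned

    def bad_pct(region):
        # Assumption: only pixels with a computed disparity are counted.
        sel = region & valid
        if not sel.any():
            return float("nan")
        err = np.abs(disp[sel] - gt[sel])
        return 100.0 * float(np.mean(err > thresh))

    return {
        "density": 100.0 * float(valid[eval_mask].mean()),
        "bad_eval": bad_pct(eval_mask),
        "bad_occ": bad_pct(eval_mask & occluded),
        "bad_nonocc": bad_pct(eval_mask & ~occluded),
    }

gt = np.zeros((2, 4))
disp = np.array([[0.0, 0.5, 3.0, np.nan],
                 [0.0, 0.0, 0.0, 5.0]])
eval_mask = np.ones((2, 4), dtype=bool)
eval_mask[1, 3] = False          # this pixel is excluded from the statistics
occluded = np.zeros((2, 4), dtype=bool)
occluded[0, 2] = True            # a single occluded pixel

stats = stereo_stats(disp, gt, eval_mask, occluded)
print(stats)
```

In this toy example six of the seven evaluated pixels have a disparity, the only occluded pixel is wrong by 3, and all non-occluded pixels are within the threshold.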
