HEVC Core Coding Techniques, Part 3: Interpicture Prediction

Overview of the High Efficiency Video Coding (HEVC) Standard, Part 4

H. Interpicture Prediction

1) PB Partitioning:

Compared to intrapicture-predicted
CBs, HEVC supports more PB partition shapes for
interpicture-predicted CBs. The partitioning modes of
PART_2N×2N, PART_2N×N, and PART_N×2N indicate
the cases when the CB is not split, split into two equal-size
PBs horizontally, and split into two equal-size PBs vertically,
respectively. PART_N×N specifies that the CB is split into
four equal-size PBs, but this mode is only supported when the
CB size is equal to the smallest allowed CB size. In addition,
there are four partitioning types that support splitting the
CB into two PBs having different sizes: PART_2N×nU,
PART_2N×nD, PART_nL×2N, and PART_nR×2N. These
types are known as asymmetric motion partitions.
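In short, HEVC offers more PB partition shapes for interpicture-predicted CBs than for intrapicture-predicted ones.
The four symmetric modes split the CB as follows:
PART_2N×2N: the CB is not split;
PART_2N×N: the CB is split horizontally into two equal-size PBs;

PART_N×2N: the CB is split vertically into two equal-size PBs;
PART_N×N: the CB is split into four equal-size PBs,
which is allowed only when the CB size equals the smallest allowed CB size.

Four further types split the CB into two PBs of different sizes:
PART_2N×nU,
PART_2N×nD,
PART_nL×2N,
PART_nR×2N.
These types are called asymmetric motion partitions.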
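
The eight partition modes can be sketched as a lookup from mode name to PB rectangles. This is only an illustration: the function name `pb_partitions` and the ASCII mode spellings (`x` instead of `×`) are made up for this example, and the quarter / three-quarter split used for the asymmetric modes comes from the HEVC standard rather than from the text above.

```python
def pb_partitions(mode: str, size: int) -> list[tuple[int, int, int, int]]:
    """Return PB rectangles (x, y, width, height) inside a size x size CB."""
    half, quarter = size // 2, size // 4
    rest = size - quarter  # the larger part of an asymmetric split
    layouts = {
        "PART_2Nx2N": [(0, 0, size, size)],
        # "Split horizontally": two PBs stacked on top of each other.
        "PART_2NxN": [(0, 0, size, half), (0, half, size, half)],
        # "Split vertically": two PBs side by side.
        "PART_Nx2N": [(0, 0, half, size), (half, 0, half, size)],
        "PART_NxN": [(0, 0, half, half), (half, 0, half, half),
                     (0, half, half, half), (half, half, half, half)],
        # Asymmetric motion partitions (assumed 1/4 : 3/4 split).
        "PART_2NxnU": [(0, 0, size, quarter), (0, quarter, size, rest)],
        "PART_2NxnD": [(0, 0, size, rest), (0, rest, size, quarter)],
        "PART_nLx2N": [(0, 0, quarter, size), (quarter, 0, rest, size)],
        "PART_nRx2N": [(0, 0, rest, size), (rest, 0, quarter, size)],
    }
    return layouts[mode]


if __name__ == "__main__":
    for name in ("PART_2NxN", "PART_2NxnU", "PART_nRx2N"):
        print(name, pb_partitions(name, 32))
```

The check that PART_N×N is allowed only at the smallest CB size is left to the caller in this sketch.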

Fig. 7. Integer and fractional sample positions for luma interpolation.

2) Fractional Sample Interpolation:

The samples of the PB for an interpicture-predicted CB are obtained from those of
a corresponding block region in the reference picture identified
by a reference picture index, which is at a position displaced
by the horizontal and vertical components of the motion vector.
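That is, the PB samples of an interpicture-predicted CB are taken from the corresponding block region of the reference picture selected by the reference picture index,
at a position offset by the horizontal and vertical components of the motion vector.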

Except for the case when the motion vector has an integer
value, fractional sample interpolation is used to generate the
prediction samples for noninteger sampling positions. As in
H.264/MPEG-4 AVC, HEVC supports motion vectors with
units of one quarter of the distance between luma samples.
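Unless the motion vector is integer-valued, fractional sample interpolation generates the prediction samples at the noninteger positions.
Like H.264/MPEG-4 AVC, HEVC supports motion vectors in units of a quarter luma sample.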

For chroma samples, the motion vector accuracy is determined
according to the chroma sampling format, which for 4:2:0
sampling results in units of one eighth of the distance between
chroma samples.
For chroma, the motion vector precision depends on the chroma sampling format;
for 4:2:0 it is one eighth of a chroma sample.
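
The units described above can be made concrete by splitting one motion vector component into an integer sample offset and a fractional phase. This is a minimal sketch: the helper name `split_mv_component` is hypothetical, and it assumes 4:2:0 chroma reuses the same stored value, read in eighth-sample units.

```python
def split_mv_component(mv: int, chroma_420: bool = False) -> tuple[int, int]:
    """Split one MV component into (integer sample offset, fractional phase).

    Luma: quarter-sample units, so the phase is in 0..3.
    Chroma 4:2:0: the same value read in eighth-sample units, phase in 0..7.
    """
    shift = 3 if chroma_420 else 2
    mask = (1 << shift) - 1
    # Python's >> floors toward minus infinity, so negative vectors
    # still produce a non-negative fractional phase.
    return mv >> shift, mv & mask


if __name__ == "__main__":
    print(split_mv_component(13))                   # luma: 3 + 1/4 samples
    print(split_mv_component(13, chroma_420=True))  # chroma: 1 + 5/8 samples
    print(split_mv_component(-5))                   # luma: -2 + 3/4 = -1.25
```

A phase of 0 means the prediction can be copied from integer positions; any other phase selects one of the interpolation filters.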

The fractional sample interpolation for luma samples in
HEVC uses separable application of an eight-tap filter for the
half-sample positions and a seven-tap filter for the quarter-sample
positions. This is in contrast to the process used in
H.264/MPEG-4 AVC, which applies a two-stage interpolation
process by first generating the values of one or two
neighboring samples at half-sample positions using six-tap
filtering, rounding the intermediate results, and then averaging
two values at integer or half-sample positions. HEVC instead
uses a single consistent separable interpolation process for
generating all fractional positions without intermediate rounding
operations, which improves precision and simplifies the
architecture of the fractional sample interpolation. The interpolation
precision is also improved in HEVC by using longer
filters, i.e., seven-tap or eight-tap filtering rather than the six-tap
filtering used in H.264/MPEG-4 AVC. Using only seven
taps rather than the eight used for half-sample positions was
sufficient for the quarter-sample interpolation positions since
the quarter-sample positions are relatively close to integer sample
positions, so the most distant sample in an eight-tap
interpolator would effectively be farther away than in the half-sample
case (where the relative distances of the integer-sample
positions are symmetric). The actual filter tap values of the
interpolation filtering kernel were partially derived from DCT
basis function equations.
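To summarize, luma fractional sample interpolation in HEVC applies two filters separably:
an eight-tap filter for half-sample positions;
a seven-tap filter for quarter-sample positions.
H.264/MPEG-4 AVC works differently,
using a two-stage process:
it first generates one or two neighboring half-sample values with a six-tap filter and rounds the intermediate results;
it then averages two values at integer or half-sample positions.
HEVC uses one consistent separable interpolation process for all fractional positions, with no intermediate rounding,
which improves precision and simplifies the interpolation architecture.
The longer seven-tap and eight-tap filters also improve precision
over the six-tap filter of H.264/MPEG-4 AVC.
Seven taps suffice for the quarter-sample positions, instead of the eight used for half-sample positions,
because a quarter-sample position lies close to an integer sample position,
so the most distant sample of an eight-tap interpolator would be farther away than in the half-sample case,
where the integer-sample positions are at symmetric distances.
The actual tap values of the interpolation kernel were partially derived from DCT basis function equations.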

In Fig. 7, the positions labeled with upper-case letters,
Ai,j , represent the available luma samples at integer sample
locations, whereas the other positions labeled with lower-case
letters represent samples at noninteger sample locations, which
need to be generated by interpolation.
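In Fig. 7, the upper-case positions Ai,j are the luma samples available at integer sample locations;
the lower-case positions are samples at noninteger locations, which must be generated by interpolation.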

The samples labeled a0,j, b0,j, c0,j, d0,0, h0,0, and n0,0
are derived from the samples Ai,j by applying the eight-tap
filter for half-sample positions and the seven-tap filter for the
quarter-sample positions as follows:
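The samples a0,j, b0,j, c0,j, d0,0, h0,0, and n0,0 are all derived from the Ai,j samples,
with the eight-tap filter at half-sample positions
and the seven-tap filter at quarter-sample positions, using these formulas: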

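$$
\begin{aligned}
a_{0,j} &= \Big(\sum_{i=-3}^{3} A_{i,j}\,\mathrm{qfilter}[i]\Big) \gg (B-8) \\
b_{0,j} &= \Big(\sum_{i=-3}^{4} A_{i,j}\,\mathrm{hfilter}[i]\Big) \gg (B-8) \\
c_{0,j} &= \Big(\sum_{i=-2}^{4} A_{i,j}\,\mathrm{qfilter}[1-i]\Big) \gg (B-8) \\
d_{0,0} &= \Big(\sum_{j=-3}^{3} A_{0,j}\,\mathrm{qfilter}[j]\Big) \gg (B-8) \\
h_{0,0} &= \Big(\sum_{j=-3}^{4} A_{0,j}\,\mathrm{hfilter}[j]\Big) \gg (B-8) \\
n_{0,0} &= \Big(\sum_{j=-2}^{4} A_{0,j}\,\mathrm{qfilter}[1-j]\Big) \gg (B-8)
\end{aligned}
$$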

where the constant B ≥ 8 is the bit depth of the reference
samples (and typically B = 8 for most applications) and the
filter coefficient values are given in Table II. In these formulas,
>> denotes an arithmetic right shift operation.
Here B is the bit depth of the reference samples, typically 8;
the filter coefficients are listed in Table II;
and >> denotes an arithmetic right shift.

TABLE II
Filter Coefficients for Luma Fractional Sample Interpolation
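
The horizontal formulas for a0,j, b0,j, and c0,j translate almost directly into code. Table II itself is not reproduced in this text, so the coefficient values below are filled in from the HEVC standard's luma interpolation filters and are not taken from this page; the function name `interp_row` is made up for this sketch.

```python
# Luma filter coefficients indexed by tap position i (values as in the
# HEVC standard; an addition of this sketch, not quoted from the text).
QFILTER = {-3: -1, -2: 4, -1: -10, 0: 58, 1: 17, 2: -5, 3: 1}          # 7 taps
HFILTER = {-3: -1, -2: 4, -1: -11, 0: 40, 1: 40, 2: -11, 3: 4, 4: -1}  # 8 taps


def interp_row(row, x, bit_depth=8):
    """Return (a, b, c): the 1/4, 1/2 and 3/4 positions to the right of row[x].

    The results are intermediate values, still scaled by the filter gain of
    64 (reduced by the B - 8 shift); final rounding happens later.
    """
    shift = bit_depth - 8
    a = sum(row[x + i] * QFILTER[i] for i in range(-3, 4)) >> shift
    b = sum(row[x + i] * HFILTER[i] for i in range(-3, 5)) >> shift
    # The 3/4 position reuses the quarter-sample filter, mirrored.
    c = sum(row[x + i] * QFILTER[1 - i] for i in range(-2, 5)) >> shift
    return a, b, c


if __name__ == "__main__":
    flat = [100] * 16
    print(interp_row(flat, 5))  # each value equals 100 * 64
```

Both filters sum to 64, which is why a flat region of value 100 yields 6400 at every fractional position before normalization.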

The samples labeled e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0,
and r0,0 can be derived by applying the corresponding filters
to samples located at vertically adjacent a0,j, b0,j and c0,j
positions as follows:
The samples e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0, and r0,0 are
obtained by filtering the vertically adjacent a0,j, b0,j, and c0,j positions with the following formulas:

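$$
\begin{aligned}
e_{0,0} &= \Big(\sum_{v=-3}^{3} a_{0,v}\,\mathrm{qfilter}[v]\Big) \gg 6 \\
f_{0,0} &= \Big(\sum_{v=-3}^{3} b_{0,v}\,\mathrm{qfilter}[v]\Big) \gg 6 \\
g_{0,0} &= \Big(\sum_{v=-3}^{3} c_{0,v}\,\mathrm{qfilter}[v]\Big) \gg 6 \\
i_{0,0} &= \Big(\sum_{v=-3}^{4} a_{0,v}\,\mathrm{hfilter}[v]\Big) \gg 6 \\
j_{0,0} &= \Big(\sum_{v=-3}^{4} b_{0,v}\,\mathrm{hfilter}[v]\Big) \gg 6 \\
k_{0,0} &= \Big(\sum_{v=-3}^{4} c_{0,v}\,\mathrm{hfilter}[v]\Big) \gg 6 \\
p_{0,0} &= \Big(\sum_{v=-2}^{4} a_{0,v}\,\mathrm{qfilter}[1-v]\Big) \gg 6 \\
q_{0,0} &= \Big(\sum_{v=-2}^{4} b_{0,v}\,\mathrm{qfilter}[1-v]\Big) \gg 6 \\
r_{0,0} &= \Big(\sum_{v=-2}^{4} c_{0,v}\,\mathrm{qfilter}[1-v]\Big) \gg 6
\end{aligned}
$$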

The interpolation filtering is separable when B is equal to
8, so the same values could be computed in this case by
applying the vertical filtering before the horizontal filtering.
When implemented appropriately, the motion compensation
process of HEVC can be performed using only 16-b storage
elements (although care must be taken to do this correctly).
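When B equals 8, the interpolation filtering is separable,
so the same values can be obtained by applying the vertical filter before the horizontal one.
With a careful implementation, HEVC motion compensation needs only 16-bit storage elements.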
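
The separability claim can be checked numerically. For B = 8 the first-stage shift B - 8 is zero, so nothing is discarded between the two passes and the order of the passes cannot matter. The sketch below computes the j0,0 position (half-sample in both directions) both ways; the function name and the test block are invented for the illustration, and the half-sample coefficients are those of the HEVC standard.

```python
# Half-sample luma filter, indexed by tap position (values as in the
# HEVC standard; an addition of this sketch).
HFILTER = {-3: -1, -2: 4, -1: -11, 0: 40, 1: 40, 2: -11, 3: 4, 4: -1}
TAPS = range(-3, 5)


def half_half_sample(block, x, y, horizontal_first=True):
    """Compute j (half-sample in x and y) next to block[y][x], for B = 8."""
    if horizontal_first:
        # First pass: b values on each of the eight rows (shift B - 8 = 0).
        inter = {v: sum(block[y + v][x + i] * HFILTER[i] for i in TAPS)
                 for v in TAPS}
    else:
        # First pass: h values on each of the eight columns.
        inter = {v: sum(block[y + i][x + v] * HFILTER[i] for i in TAPS)
                 for v in TAPS}
    # Second pass over the intermediate values, then the >> 6 shift.
    return sum(inter[v] * HFILTER[v] for v in TAPS) >> 6


if __name__ == "__main__":
    block = [[(7 * r * r + 13 * c + r * c) % 256 for c in range(8)]
             for r in range(8)]
    print(half_half_sample(block, 3, 3, True) ==
          half_half_sample(block, 3, 3, False))
```

With B greater than 8 the first pass shifts right before the second pass runs, so the two orders are no longer guaranteed to agree bit for bit.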

It is at this point in the process that weighted prediction
is applied when selected by the encoder. Whereas
H.264/MPEG-4 AVC supported both temporally implicit and
explicit weighted prediction, in HEVC only explicit weighted
prediction is applied, by scaling and offsetting the prediction
with values sent explicitly by the encoder. The bit depth of
the prediction is then adjusted to the original bit depth of
the reference samples. In the case of uniprediction, the interpolated
(and possibly weighted) prediction value is rounded,
right-shifted, and clipped to have the original bit depth. In the
case of biprediction, the interpolated (and possibly weighted)
prediction values from two PBs are added first, and then
rounded, right-shifted, and clipped.
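If the encoder selects it, weighted prediction is applied at this point.
H.264/MPEG-4 AVC supported both implicit and explicit weighted prediction;
HEVC supports only the explicit form,
which scales and offsets the prediction with values sent explicitly by the encoder.
The bit depth of the prediction is then brought back to the original bit depth of the reference samples.
For uniprediction, the interpolated (and possibly weighted) value is rounded, right-shifted, and clipped to the original bit depth.
For biprediction, the interpolated values from the two PBs are added first, then rounded, right-shifted, and clipped.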
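
The round, shift, and clip step without weighting can be sketched as follows. The sketch assumes the interpolation stage hands over values at 14-bit intermediate precision (sample value times 64 when B = 8), which is how the HEVC standard defines it but is not stated in the text above; explicit weights and offsets are left out, and the function names are hypothetical.

```python
def clip(value, bit_depth):
    """Clamp to the valid sample range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def uni_pred(p, bit_depth=8):
    """Uniprediction: round, right-shift, clip one intermediate value."""
    shift = 14 - bit_depth            # assumed 14-bit intermediate precision
    offset = (1 << shift) >> 1        # rounding offset, 0 when shift is 0
    return clip((p + offset) >> shift, bit_depth)


def bi_pred(p0, p1, bit_depth=8):
    """Biprediction: add both intermediates first, then round once."""
    shift = 15 - bit_depth            # one extra bit for the average
    offset = 1 << (shift - 1)
    return clip((p0 + p1 + offset) >> shift, bit_depth)


if __name__ == "__main__":
    print(uni_pred(6400))        # 6400 / 64 = 100
    print(bi_pred(6400, 6464))   # average of 100 and 101, rounded once
```

Adding the two intermediate values before rounding is what lets biprediction get by with a single final rounding step instead of one per prediction.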

In H.264/MPEG-4 AVC, up to three stages of rounding
operations are required to obtain each prediction sample (for
samples located at quarter-sample positions). If biprediction is
used, the total number of rounding operations is then seven
in the worst case. In HEVC, at most two rounding operations
are needed to obtain each sample located at the quarter-sample
positions, thus five rounding operations are sufficient in the
worst case when biprediction is used. Moreover, in the most
common usage, where the bit depth B is 8 b, the total number
of rounding operations in the worst case is further reduced
to 3. Due to the lower number of rounding operations, the
accumulated rounding error is decreased and greater flexibility
is enabled in regard to the manner of performing the necessary
operations in the decoder.
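In H.264/MPEG-4 AVC, each prediction sample at a quarter-sample position
needs up to three stages of rounding,
and with biprediction the worst case is seven rounding operations.
In HEVC, at most two rounding operations produce each quarter-sample value,
so five suffice in the worst case with biprediction.
In the most common case, where the bit depth B is 8, the worst-case total
drops to three.
Fewer rounding operations mean a smaller accumulated rounding error
and more freedom in how the decoder performs the necessary operations.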

The fractional sample interpolation process for the chroma
components is similar to the one for the luma component,
except that the number of filter taps is 4 and the fractional
accuracy is 1/8 for the usual 4:2:0 chroma format case. HEVC
defines a set of four-tap filters for eighth-sample positions, as
given in Table III for the case of 4:2:0 chroma format (where,
in H.264/MPEG-4 AVC, only two-tap bilinear filtering was applied).
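Chroma fractional sample interpolation is similar to the luma process,
except that it uses four-tap filters and, for the usual 4:2:0 format, a fractional accuracy of 1/8.
HEVC defines a set of four-tap filters for the eighth-sample positions,
listed in Table III; H.264/MPEG-4 AVC used only two-tap bilinear filtering here.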

TABLE III
Filter Coefficients for Chroma Fractional Sample Interpolation

Filter coefficient values denoted as filter1[i], filter2[i], filter3[i],
and filter4[i] with i = -1, ..., 2 are used for interpolating
the 1/8th, 2/8th, 3/8th, and 4/8th fractional positions
for the chroma samples, respectively. Using symmetry for the
5/8th, 6/8th, and 7/8th fractional positions, the mirrored values
of filter3[1 - i], filter2[1 - i], and filter1[1 - i] with i = -1, ..., 2
are used, respectively.
That is, filter1[i], filter2[i], filter3[i], and filter4[i] with i = -1, ..., 2
hold the coefficients for the 1/8th, 2/8th, 3/8th, and 4/8th fractional positions;
by symmetry, the 5/8th, 6/8th, and 7/8th positions
use the mirrored values filter3[1 - i], filter2[1 - i], and filter1[1 - i] with i = -1, ..., 2.
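
The mirroring rule means only four coefficient sets need to be stored for the seven fractional chroma positions. Table III is not reproduced in this text, so the values below are filled in from the HEVC standard's chroma filters; the names `CHROMA_BASE`, `chroma_filter`, and `interp_chroma` are made up for this sketch.

```python
# Four-tap chroma filters for phases 1/8 .. 4/8, listed for i = -1, 0, 1, 2
# (values as in the HEVC standard; an addition of this sketch).
CHROMA_BASE = {
    1: (-2, 58, 10, -2),
    2: (-4, 54, 16, -2),
    3: (-6, 46, 28, -4),
    4: (-4, 36, 36, -4),
}


def chroma_filter(phase):
    """Return the four taps (i = -1..2) for an eighth-sample phase 1..7."""
    if phase <= 4:
        return CHROMA_BASE[phase]
    # filterK[1 - i] for i = -1..2 reads filterK back to front.
    return CHROMA_BASE[8 - phase][::-1]


def interp_chroma(row, x, phase):
    """Intermediate chroma value at row[x] + phase/8, scaled by 64."""
    taps = chroma_filter(phase)
    return sum(row[x + i] * taps[i + 1] for i in range(-1, 3))


if __name__ == "__main__":
    for phase in range(1, 8):
        print(phase, chroma_filter(phase))
```

The 4/8 filter is its own mirror image, which is why the symmetric half needs no separate entry for it.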

3) Merge Mode:

Motion information typically consists of
the horizontal and vertical motion vector displacement values,
one or two reference picture indices, and, in the case of prediction
regions in B slices, an identification of which reference
picture list is associated with each index. HEVC includes a
merge mode to derive the motion information from spatially
or temporally neighboring blocks. It is denoted as merge mode
since it forms a merged region sharing all motion information.
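Motion information typically consists of
the horizontal and vertical motion vector displacement values,
one or two reference picture indices and, for prediction regions in B slices, the reference picture list associated with each index.
HEVC provides a merge mode that derives the motion information from spatially or temporally neighboring blocks;
the name reflects that it forms a merged region sharing all motion information.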

The merge mode is conceptually similar to the direct and
skip modes in H.264/MPEG-4 AVC. However, there are two
important differences. First, it transmits index information to
select one out of several available candidates, in a manner
sometimes referred to as a motion vector competition scheme.
It also explicitly identifies the reference picture list and reference
picture index, whereas the direct mode assumes that
these have some predefined values.
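Merge mode is conceptually similar to the direct and skip modes of H.264/MPEG-4 AVC,
with two important differences:
first, it transmits an index that selects one of several available candidates, a motion vector competition scheme;
second, it explicitly identifies the reference picture list and reference picture index, whereas direct mode assumes predefined values for them.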

Fig. 8. Positions of spatial candidates of motion information.

The set of possible candidates in the merge mode consists
of spatial neighbor candidates, a temporal candidate, and
generated candidates. Fig. 8 shows the positions of five spatial
candidates. For each candidate position, the availability is
checked according to the order {a1, b1, b0, a0, b2}. If the
block located at the position is intrapicture predicted or the
position is outside of the current slice or tile, it is considered
as unavailable.
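The candidate set in merge mode consists of
spatial neighbor candidates,
a temporal candidate,
and generated candidates.
Fig. 8 shows the positions of the five spatial candidates.
Availability is checked for each position in the order {a1, b1, b0, a0, b2};
a position is unavailable if its block is intrapicture predicted or lies outside the current slice or tile.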

After validating the spatial candidates, two kinds of redundancy
are removed. If the candidate position for the current
PU would refer to the first PU within the same CU, the
position is excluded, as the same merge could be achieved by
a CU without splitting into prediction partitions. Furthermore,
any redundant entries where candidates have exactly the same
motion information are also excluded.
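Once the spatial candidates are validated, two kinds of redundancy are removed:
a candidate position that refers to the first PU of the same CU is excluded for the current PU,
because the same merge could be achieved by a CU that is not split into prediction partitions;
candidates with exactly the same motion information as another entry are also removed.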
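
The availability check and the two redundancy rules can be sketched as a single pass over the five positions. This follows the prose literally and is not the normative derivation, which is more detailed in the standard. The `Motion` record, the function name, and the `first_pu_position` parameter are all hypothetical; an unavailable position is represented by `None`.

```python
from typing import NamedTuple, Optional


class Motion(NamedTuple):
    """Hypothetical container for one candidate's motion information."""
    mv: tuple[int, int]
    ref_idx: int
    ref_list: int


CHECK_ORDER = ("a1", "b1", "b0", "a0", "b2")


def spatial_merge_candidates(neighbors: dict[str, Optional[Motion]],
                             first_pu_position: Optional[str] = None
                             ) -> list[Motion]:
    """Collect spatial merge candidates in checking order."""
    candidates: list[Motion] = []
    for pos in CHECK_ORDER:
        cand = neighbors.get(pos)
        if cand is None:
            continue  # intrapicture predicted, or outside the slice/tile
        if pos == first_pu_position:
            continue  # would merge with the first PU of the same CU
        if cand in candidates:
            continue  # identical motion information already listed
        candidates.append(cand)
    return candidates


if __name__ == "__main__":
    m1, m2 = Motion((4, -2), 0, 0), Motion((0, 8), 1, 0)
    found = spatial_merge_candidates(
        {"a1": m1, "b1": m1, "b0": None, "a0": m2, "b2": None})
    print(found)
```

In the usage above, b1 is dropped because it repeats the motion of a1, and b0 and b2 are skipped as unavailable, leaving two candidates.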

For the temporal candidate, the right bottom position just
outside of the collocated PU of the reference picture is used if
it is available. Otherwise, the center position is used instead.
The way to choose the collocated PU is similar to that of prior
standards, but HEVC allows more flexibility by transmitting
an index to specify which reference picture list is used for the
collocated reference picture.
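For the temporal candidate, the position just outside the bottom-right corner of the collocated PU in the reference picture is used when available;
otherwise the center position is used.
The collocated PU is chosen much as in earlier standards,
but HEVC is more flexible: it transmits an index specifying which reference picture list supplies the collocated reference picture.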
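
The position choice for the temporal candidate amounts to a two-step fallback. In this sketch the function name and the `is_available` callback are hypothetical stand-ins for the decoder's availability rules, and the coordinates are luma sample positions of the current PB.

```python
def temporal_candidate_position(x, y, width, height, is_available):
    """Pick the luma position whose collocated motion is used.

    (x, y) is the top-left corner of the PB and is_available is a
    caller-supplied predicate (an assumption of this sketch).
    """
    bottom_right = (x + width, y + height)   # just outside the PB
    if is_available(bottom_right):
        return bottom_right
    return (x + width // 2, y + height // 2)  # center fallback


if __name__ == "__main__":
    def inside_64x64(pos):
        return 0 <= pos[0] < 64 and 0 <= pos[1] < 64

    print(temporal_candidate_position(0, 0, 16, 16, inside_64x64))
    print(temporal_candidate_position(48, 48, 16, 16, inside_64x64))
```

The second call falls back to the center because the bottom-right neighbor of a PB in the picture corner lies outside the 64×64 area used here.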

One issue related to the use of the temporal candidate is
the amount of the memory to store the motion information
of the reference picture. This is addressed by restricting the
granularity for storing the temporal motion candidates to only
the resolution of a 16×16 luma grid, even when smaller
PB structures are used at the corresponding location in the
reference picture. In addition, a PPS-level flag allows the
encoder to disable the use of the temporal candidate, which is
useful for applications with error-prone transmission.
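The temporal candidate raises a memory issue, since the motion information of the reference picture must be stored.
This is handled by limiting the storage granularity of temporal motion candidates
to a 16×16 luma grid, even where the reference picture uses smaller PB structures at that location.
In addition, a PPS-level flag lets the encoder disable the temporal candidate,
which helps applications with error-prone transmission.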
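
The 16×16 storage restriction can be pictured as snapping every lookup position to a grid before reading the stored motion. The sketch assumes that each 16×16 cell keeps the motion found at its top-left sample; the dictionary-based motion field and both function names are hypothetical.

```python
def stored_motion_position(x, y, grid_log2=4):
    """Snap a luma position to the top-left corner of its 16x16 cell."""
    return (x >> grid_log2) << grid_log2, (y >> grid_log2) << grid_log2


def fetch_temporal_motion(motion_field, x, y):
    """Look up the motion kept for the cell containing (x, y).

    motion_field maps cell corners to motion data; None means nothing
    usable is stored there (for example an intrapicture-predicted block).
    """
    return motion_field.get(stored_motion_position(x, y))


if __name__ == "__main__":
    field = {(32, 16): (4, -2)}  # one stored vector for the whole cell
    print(stored_motion_position(37, 21))
    print(fetch_temporal_motion(field, 37, 21))
    print(fetch_temporal_motion(field, 5, 5))
```

Compared with keeping one entry per 4×4 block, one entry per 16×16 cell cuts the number of stored entries by a factor of 16.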

The maximum number of merge candidates C is specified
in the slice header. If the number of merge candidates found
(including the temporal candidate) is larger than C, only the
first C – 1 spatial candidates and the temporal candidate
are retained. Otherwise, if the number of merge candidates
