C. E. Rhee et al.: A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their Applications to High Efficiency Video Coding 1375
Contributed Paper
Manuscript received 10/15/12
Current version published 12/28/12
Electronic version published 12/28/12. 0098-3063/12/$20.00 © 2012 IEEE
A Survey of Fast Mode Decision Algorithms
for Inter-Prediction and Their Applications
to High Efficiency Video Coding
Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee
Abstract: The emerging High Efficiency Video Coding
(HEVC) standard attempts to improve the coding efficiency by a
factor of two over H.264/AVC using new compression tools with
high computational complexity. The increased computational
complexity makes real-time execution with reasonable
computing power one of the critical concerns for the
commercialization of HEVC. The large number of prediction
modes is the main cause of the increased complexity of HEVC.
Thus, a fast decision of the prediction mode is needed
to reduce the computational complexity. To take advantage
of the large body of previous work and to find a guide for
application to HEVC, this paper presents a survey of these efforts
for the previous standards, especially for H.264/AVC, and
examines whether the previous algorithms are
applicable to HEVC. To this end, previous algorithms are
categorized and then the effectiveness of each category for
HEVC is evaluated. For this evaluation, a previous algorithm is
modified for HEVC when it is not applicable to HEVC directly.
Simulation results show that most previous algorithms with slight
modification, in general, improve the encoding speed with a
relatively small degradation of the compression efficiency.
Among them, hierarchical mode decision is especially effective
whereas mode pre-decision using motion or spatial homogeneity
often yields inaccurate results.
1
Index Terms: Fast inter-prediction, Mode decision,
Hardware encoder, HEVC, H.264/AVC.
I. INTRODUCTION
Video compression technologies as well as video applications
such as video conferencing, streaming, video storage and
communication have attracted industry attention due to the
increasing popular demand for high-definition (HD) video content.
H.264/AVC [1] has been regarded as the state-of-the-art video
coding standard and is widely used. Recently, the next-generation
video coding standard [2]-[4] known as High Efficiency Video
Coding (HEVC) has been developed by ISO/IEC MPEG and ITU-T
VCEG. In the emerging HEVC standard, several new features
1 This work was supported by the National Research Foundation of
Korea (NRF) grant funded by the Korea government (MEST) (No.
2012R1A2A2A06047297).
Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee are with
the Inter-university Semiconductor Research Center (ISRC), Department of
Electrical Engineering and Computer Science, Seoul National University,
Seoul, Korea (e-mail: [email protected], [email protected],
[email protected], [email protected]).
are introduced, including a flexible block structure, an increased number of
intra-coding directions, sophisticated interpolation filters, various
in-loop filters, and enhanced entropy coding schemes. The HEVC
standard aims at bitrate saving by a factor of two over H.264/AVC
at the expense of an increase in computational complexity.
Like H.264/AVC, mode decisions with motion estimation (ME)
remain among the most time-consuming computations in HEVC.
In an inter-prediction mode decision, a full-search algorithm
searches every possible block size and refines the results from
integer-pel to quarter-pel resolution. Thus, a full-search algorithm
guarantees the highest level of compression performance. However,
the considerable computational complexity of the mode decision
limits the encoding speed. Moreover, the main target
resolution of HEVC is full HD (1920×1080) and beyond.
Therefore, fast inter-prediction is not only an important challenge
but also an urgent problem to be solved for HEVC compression to
be used in real-time consumer electronic devices.
Extensive research has been conducted to reduce the
computational complexity of inter-prediction for H.264/AVC,
pursuing an effective trade-off between the rate-distortion (RD)
drop and the speed-up. In order to deal with the similar challenge
for HEVC, this paper reviews principal algorithms which have
already been attempted for H.264/AVC. A survey of these various
algorithms and an evaluation of their contributions and limitations
provide valuable leads for the development of fast algorithms for
HEVC inter-prediction. Major differences between H.264/AVC
and HEVC are also investigated from an algorithmic and
architectural perspective. Previous algorithms for the fast
H.264/AVC inter-predictions are then modified and re-designed
for HEVC inter-predictions so as to explore the possibilities for
application to HEVC.
The rest of the paper is organized as follows. Section II gives
an overview of inter-prediction in HEVC. Previous approaches
for fast inter-predictions in H.264/AVC are surveyed in Section
III, and the application of the fast inter-mode selection algorithms
to HEVC is presented in Section IV. Conclusions are given in
Section V.
II. OVERVIEW OF INTER-PREDICTION IN HEVC
A. Inter-Prediction Algorithm in HEVC
To achieve high compression performance for high-
resolution videos, HEVC defines the coding unit (CU) as the
basic processing unit instead of the macroblock (MB).
1376 IEEE Transactions on Consumer Electronics, Vol. 58, No. 4, November 2012
Unlike an MB, whose size is fixed at 16×16 pixels, the
size of a CU is not fixed, varying from 8×8 to 64×64. A
large CU reduces the motion information data. Thus, the
compression efficiency is improved in a lossless manner,
especially for high-resolution videos. A CU can be
partitioned into smaller CUs and the structure among
different CUs is represented by a quad-tree. The depth of
this tree can be as large as four. The largest CU in depth 0 is
denoted as LCU. For a CU whose size is denoted by
2N×2N, predictions are performed for various block sizes of
2N×2N, 2N×N, N×2N and N×N. The processing unit for
prediction is called the prediction unit (PU).
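The block structure described above can be made concrete with a small Python sketch. It assumes a 64×64 LCU and an 8×8 minimum CU, as stated in the text; the function names cu_sizes and pu_partitions are hypothetical and are not taken from the HM software.

```python
def cu_sizes(lcu_size=64, min_cu_size=8):
    """Return the CU edge length at each quad-tree depth, starting at depth 0."""
    sizes = []
    size = lcu_size
    while size >= min_cu_size:
        sizes.append(size)
        size //= 2  # each quad-tree split halves the CU edge length
    return sizes


def pu_partitions(cu_size):
    """Return (width, height, count) of the PUs for each PU type of a 2Nx2N CU."""
    n = cu_size // 2
    return {
        "2Nx2N": (cu_size, cu_size, 1),
        "2NxN": (cu_size, n, 2),
        "Nx2N": (n, cu_size, 2),
        "NxN": (n, n, 4),
    }


for depth, size in enumerate(cu_sizes()):
    print(depth, size, pu_partitions(size))
```

With these assumptions the quad-tree has four levels (depths 0 to 3), which matches the depth range used in Tables I and II.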
The HM5.0 reference software offers fast algorithms to
speed up the prediction time by an early decision of the final
prediction mode with the evaluation of only subsets of
prediction modes. One of the fast algorithms is the early SKIP
mode decision in which the computation for the SKIP mode is
performed first for a 2N×2N PU. If the RD cost is less than
the average SKIP cost, as accumulated from the previous
SKIP modes, not only the prediction of the other PU types at
the same depth but also all predictions for the further depths
are omitted. The other fast algorithms are the early CU
determination and the coded block flag (CBF)-based fast
mode decision, denoted as ECU and CFM, respectively. The
CBF indicates whether a block has a zero residual. In the ECU, the
RD costs for the SKIP, 2N×2N, 2N×N and N×2N inter-modes
as well as the intra-mode are calculated at the current depth. If
the SKIP mode cost is the smallest, predictions for CUs
smaller than the current CU are not performed. Meanwhile,
the CFM is used to select the PU size in the current CU and to
save computation power for predictions of less-probable PU
sizes. The predictions for the SKIP, 2N×2N, 2N×N and N×2N
PUs are processed in sequence. If the CBF of the current PU
happens to be all zeros, the prediction is terminated and the
computation for the remaining PU sizes is saved, as a zero
CBF indicates that the RD performance is adequate when the
current PU is determined to be the best mode. Even if the
current PU is different from the best PU, the difference in the
RD cost between the current PU and the best PU may be
negligible.
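The three early-termination rules described above can be sketched in Python as follows. This is a minimal illustration and not the HM5.0 code: the function names, the evaluate callback and the cost dictionary are hypothetical, and keeping the lowest-cost PU among those tested before the CFM stop is an assumption of this sketch.

```python
PU_ORDER = ["SKIP", "2Nx2N", "2NxN", "Nx2N"]


def early_skip(skip_cost, average_skip_cost):
    """Early SKIP: stop all further prediction when SKIP is cheaper than average."""
    return skip_cost < average_skip_cost


def ecu_stops_splitting(rd_costs):
    """ECU: do not predict smaller CUs when SKIP has the smallest RD cost.

    rd_costs maps a mode name (including "SKIP") to its RD cost at this depth.
    """
    return min(rd_costs, key=rd_costs.get) == "SKIP"


def cfm_select(evaluate):
    """CFM: test PU types in order and stop at the first one with an all-zero CBF.

    evaluate(pu) is a hypothetical callback returning (rd_cost, cbf_is_zero).
    Returns the best PU among those tested and the list of tested PUs.
    """
    best_pu, best_cost, tested = None, float("inf"), []
    for pu in PU_ORDER:
        cost, cbf_is_zero = evaluate(pu)
        tested.append(pu)
        if cost < best_cost:
            best_pu, best_cost = pu, cost
        if cbf_is_zero:
            break  # remaining PU types are not evaluated
    return best_pu, tested
```

In this sketch the saving of CFM shows up as the PU types missing from the returned list of tested PUs.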
B. Hardware Implementation for an HEVC Encoder
This subsection examines the impact of the HEVC coding
structure on the hardware implementation. In this paper, it is
assumed that the pipeline architecture for the HEVC hardware
encoder may be similar to the widely used architectures for
H.264/AVC encoders [5]-[9], where the integer motion
estimation (IME) is performed in stage 1, whereas the fractional
motion estimation (FME) with the motion compensation (MC) is
performed in stage 2. As the hardware encoder
takes advantage of parallel and/or pipelined execution of
multiple hardware resources, the dependence between
computations in the HEVC standard often causes an
unexpected slow-down.
To support the parallel execution of IMEs for all block sizes
in H.264/AVC, the sums of absolute differences (SADs) for all
4×4 blocks of an MB are calculated simultaneously. The
obtained SAD values are combined in the variable-block-size
(VBS) adder tree and 41 SADs for all block sizes are
generated in one cycle. The problem is that the rate term of
the RD cost function can be computed only after the motion
vectors (MVs) of the neighboring blocks are determined,
which causes dependence among IMEs for various size blocks.
In addition, when IME and FME are processed not serially but
in a pipelined manner, the left MB is still in the FME stage.
Thus, the best mode and the best MV of the left block are not
available. In H.264/AVC, the modified MV predictor
(MVP) is applied for all 41 blocks [8]. Instead of the median
value of the MVs of the left, upper and upper-right blocks, the
median value of the MVs of the upper-left, upper and upper-right
MBs is used for all 41 blocks equally in order to facilitate the
parallel processing and the MB pipelining.
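The count of 41 SADs mentioned above follows from the H.264/AVC partition sizes: 16 blocks of 4×4, 8 of 8×4, 8 of 4×8, 4 of 8×8, 2 of 16×8, 2 of 8×16 and 1 of 16×16. The Python sketch below combines sixteen 4×4 SADs into these 41 values. It is a software illustration only: a hardware adder tree would reuse partial sums instead of summing each block from scratch, and the names SHAPES and vbs_sads are hypothetical.

```python
# Block shapes as (name, width, height), measured in units of 4x4 blocks.
SHAPES = [
    ("4x4", 1, 1), ("8x4", 2, 1), ("4x8", 1, 2), ("8x8", 2, 2),
    ("16x8", 4, 2), ("8x16", 2, 4), ("16x16", 4, 4),
]


def vbs_sads(sad4x4):
    """Combine a 4x4 grid of 4x4-block SADs into the SADs of all block sizes.

    sad4x4[r][c] is the SAD of the 4x4 block at row r, column c of a 16x16 MB.
    Returns a dict mapping each block-size name to its list of SADs.
    """
    result = {}
    for name, w, h in SHAPES:
        result[name] = [
            sum(sad4x4[r + dr][c + dc] for dr in range(h) for dc in range(w))
            for r in range(0, 4, h)
            for c in range(0, 4, w)
        ]
    return result


sads = vbs_sads([[1] * 4 for _ in range(4)])
print(sum(len(values) for values in sads.values()))
```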
The solution for parallel IME executions for H.264/AVC
can be applied to an HEVC encoder. The MVP of the 2N×2N
PU in the LCU is used for all blocks equally. When this MVP is
derived, the left and below-left candidates among the spatial
MVP candidates are excluded. With this modification of the
MVP derivation, the IME execution in HEVC has large
parallelism. Moreover, the parallelism in IME execution for
HEVC is larger than that for H.264/AVC. In H.264/AVC, the
parallel execution of IME is done for a 16×16 MB and its
sub-blocks, whereas a 64×64 LCU and all blocks smaller than the LCU
can have their IMEs processed in parallel in HEVC. When the
same search range and the same search scheme are used for
H.264/AVC and HEVC, a 16×16 MB and a 64×64 LCU are
expected to have an identical IME time.
For FME, it is more difficult to exploit available parallelism
than for IME because the two-step FMEs for half- and
quarter-pixel precisions should be performed sequentially.
Furthermore, the modified MVP or the mode reduction
decreases the compression efficiency more seriously than IME
fast computation algorithms, in general. Besides, recent studies
on H.264/AVC hardware encoders [10]-[16] also show that
the speed-up of the execution time is easier for IME than for FME.
Thus, in H.264/AVC, FME is usually conducted one by one
for the 41 blocks in a 16×16 MB. As a result, the execution time
for FME with the MC is most likely to be larger than that for
IME. Even when additional hardware resources are used
for a parallel FME execution [12], the encoding time for an
LCU is most likely determined by the time for the FME
followed by the MC and mode decision.
III. FAST INTER-MODE SELECTION ALGORITHMS FOR H.264/AVC
In this section, fast inter-mode selection algorithms
proposed for H.264/AVC are surveyed. An effective pre-selection
of prediction block sizes is crucial for fast encoding.
Reduction of the prediction block sizes often incurs an RD
drop. An effective trade-off between the RD drop and the
speed-up has been one of the main research subjects to be
tackled. The previous algorithms are categorized according to
the decision stage and criteria, as shown in the classification
tree in Fig. 1.
Fast inter-mode selection
  ME pre-decision
    Neighboring (spatial and temporal) information
    Motion characteristics of the current MB
    Spatial characteristics of the current MB
  Hierarchical decision
    Remaining prediction according to the prior prediction result
    Further prediction by comparing the prior predictions
  FME pre-decision
    Mode pre-decision based on the rate-distortion cost from IME
    Reduction of FME calculation from the reuse of integer-pel MVs
Fig. 1. Classification tree of fast inter-predictions for H.264/AVC
There are roughly three categories of algorithms for fast
inter-mode prediction. In the first category, candidate block
sizes are determined prior to ME, and prediction operations
including ME are performed for only the selected candidate
block sizes. This category is further classified into three
sub-categories. In the first sub-category, spatial and/or temporal
correlation in a video is widely used to select candidates and
the degree of correlation is obtained from neighboring
information. For instance, if an MB is surrounded by
neighboring MBs coded as the DIRECT or SKIP mode, the
video sequences are assumed to be changing smoothly and the
motion is similar to that in the neighboring area. In this case,
the current MB is very likely to be coded in the DIRECT or
SKIP mode or with a large block size such as 16×16 [17]-[22].
In a similar manner, various algorithms [23]-[27] search for
the best block size based on spatial and temporal homogeneity
investigations of the neighboring blocks.
The algorithms in the second sub-category take advantage of
the correlation between the motion homogeneity and the best
block size. Natural video sequences include stationary or
motionless regions for which the optimal block sizes are
mostly large. Thus, the MVs of spatially and temporally
adjacent MBs are used to classify the motion characteristics
[28], whereas the absolute difference between consecutive
frames is used to detect motion homogeneity [26][29][30]. Liu
et al. [33] estimate the motion homogeneity of the current MB
by MVs, which are generated from ME on 4×4 blocks inside
the current MB.
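A motion-homogeneity test of this kind can be sketched in Python as follows. The homogeneity measure (the spread of the MV components), the threshold value and both function names are assumptions made for illustration; the surveyed papers define their own measures, which are not reproduced here.

```python
ALL_BLOCK_SIZES = ["SKIP", "16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]


def is_motion_homogeneous(mvs, threshold=1):
    """Return True when the MVs of the 4x4 blocks differ by at most threshold.

    mvs is a list of (mv_x, mv_y) pairs; threshold is a hypothetical tuning value.
    """
    xs = [mv[0] for mv in mvs]
    ys = [mv[1] for mv in mvs]
    return (max(xs) - min(xs) <= threshold) and (max(ys) - min(ys) <= threshold)


def candidate_block_sizes(mvs, threshold=1):
    """Restrict the search to large blocks when the motion is homogeneous."""
    if is_motion_homogeneous(mvs, threshold):
        return ["SKIP", "16x16"]
    return list(ALL_BLOCK_SIZES)
```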
In the algorithms in the third sub-category, candidate block
sizes are predicted through spatial characteristics of the
current MB. A frame-level edge map or a variance of the MB
is estimated to detect a homogeneous region [23][31][32]. In
other studies, the image is down-sampled and pre-encoded
[34][35]. The candidate block sizes are obtained after
comparing the estimated RD cost during pre-encoding.
The algorithms of the second category explore the best block
size in a hierarchical manner. In other words, certain block
sizes are estimated prior to the other block sizes and the
decision for a further block size search is then made using the
result of the prediction of the prior block sizes [29][31][36]-[44].
In the first sub-category, the result of the prior prediction
is tested. The decision regarding a further prediction is
determined based on the test result. One of the most popular
algorithms is the early SKIP mode decision, where predictions
of remaining block sizes are performed only when the early
SKIP condition is not satisfied. Kannangara et al. [43] make
the early SKIP mode prediction by estimating a Lagrangian
RD cost function which incorporates an adaptive model for
the Lagrangian multiplier parameter based on local sequence
statistics. Other studies [37][38][41][44] propose a simple
threshold-based algorithm to detect zero-coefficient blocks.
Zero coefficients represent a small distortion in the RD cost
function, and the SKIP mode decision is made early without
the expensive computation of the real RD cost. Alternatively, if the RD
cost of the SKIP mode is less than the threshold, the SKIP
mode is selected as the best mode [36]. Here, the threshold is
defined as N bits × the Lagrangian multiplier parameter,
where N bits is equal to the minimum number of bits
required for the non-SKIP mode.
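The threshold rule of [36] can be written out as a short Python sketch. The reasoning is that, for an RD cost of the form distortion plus the multiplier times the rate, any non-SKIP mode costs at least N bits times the multiplier because its distortion is non-negative, so a SKIP cost below that bound cannot be beaten. The function and parameter names are hypothetical.

```python
def early_skip_by_threshold(skip_rd_cost, min_non_skip_bits, lagrangian_multiplier):
    """Select SKIP early when its RD cost is below N bits times the multiplier.

    min_non_skip_bits is the minimum number of bits needed by any non-SKIP mode.
    """
    threshold = min_non_skip_bits * lagrangian_multiplier
    return skip_rd_cost < threshold
```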
In the second sub-category, the results of the prior
predictions are compared and further prediction is determined
according to the comparison result. In the studies of Yu
and Choi [37][40], the RD costs of block sizes are
compared in order from large to small block sizes. If the
current RD cost is larger than the RD cost of the larger block
size, further searches for blocks smaller than the current block
are stopped. In the studies of Yin and Lee [36][42], the RD costs
of the square blocks, 16×16, 8×8 and 4×4, are tested first. If the
tendency of these RD costs is not monotonic, all other
non-square blocks need to be tested. Otherwise, only block sizes
between the best two square block sizes are searched. In
addition to the hierarchical decision approach, much research
has proposed a hybrid solution which selects candidate
prediction block sizes prior to ME using the information
mentioned in the first category, after which prediction block
sizes are searched in a hierarchical manner [18][31][37].
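The two comparison rules described above can be sketched in Python as follows. Both functions are illustrative readings of the text, not the published algorithms: the function names are hypothetical, and mapping "between the best two square block sizes" to the 16×8/8×16 or 8×4/4×8 partitions is an assumption of this sketch.

```python
def large_to_small_search(costs_in_order):
    """Test block sizes from large to small; stop when the RD cost increases.

    costs_in_order yields (block_size_name, rd_cost) pairs from the largest
    block to the smallest. Passing a generator keeps the cost evaluation lazy.
    Returns the names of the block sizes that were actually tested.
    """
    tested = []
    previous_cost = None
    for name, cost in costs_in_order:
        tested.append(name)
        if previous_cost is not None and cost > previous_cost:
            break  # smaller blocks are not searched
        previous_cost = cost
    return tested


def non_square_candidates(cost_16x16, cost_8x8, cost_4x4):
    """Choose which non-square sizes to test from the three square-block costs."""
    if cost_16x16 <= cost_8x8 <= cost_4x4:
        return ["16x8", "8x16"]  # best two squares assumed to be 16x16 and 8x8
    if cost_16x16 >= cost_8x8 >= cost_4x4:
        return ["8x4", "4x8"]  # best two squares assumed to be 8x8 and 4x4
    return ["16x8", "8x16", "8x4", "4x8"]  # not monotonic: test everything
```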
In the first sub-category of the third category, IME is
performed for every block size. Next, the results of IME are
used to select the candidate block sizes for FME. In the
simplest approach, called mode pre-decision (MPD), the best
combination of various block sizes (VBS) for an MB is
selected with the IME results. FME simply refines the
integer-pel MV of the selected block size to quarter-pel precision.
MPD suffers from a significant RD drop because the best
block size from the IME may change after refinement in the
FME. To achieve a better trade-off between the compression
efficiency and computational complexity, the advanced MPD
(AMPD) is proposed [45]. In the AMPD, more than one
candidate block size for the subsequent FME operations is
selected. Seven partitions, four 8×8 partitions together with
the 16×8, 8×16 and 16×16 partitions, are sorted according to
their IME cost. As a result, N (N = 1 to 7) partitions are selected
for the FME. In AMPD2 [45], one candidate is selected from
the 16×8, 8×16 and 16×16 partitions, whereas two are
selected from the 8×8 partitions. Similarly, two partitions are
selected by mode filtering (MF) for the FME operation from
the IME phase [10]. One is selected from the 16×8, 8×16 and
16×16 partitions and the other is selected from the 8×8, 16×8,
8×16 and 16×16 partitions, where the 8×8 partition consists of
the best sub-block sizes. In this MF algorithm, the number of
selected block sizes is relatively low and larger block sizes are
more frequently selected than in AMPD2.
In the second sub-category, computation-reuse techniques
are adopted. Shao et al. [46] propose that the FME for each
block size is performed one by one. If the integer MV of the
current block is identical to that of a block already processed,
no FME computation needs to be performed. In particular, in
a homogeneous region, adjacent blocks tend to have the
same integer MV after IME. Therefore, this reuse technique
reduces the calculation for the prediction of the block size
with no RD drop. The same algorithm is applied to block sizes
larger than an 8×8 block [47]. The FMEs for blocks smaller
than 8×8 are omitted. Thus, the encoding time decreases more
than that of Shao's algorithm with a reasonable PSNR drop.
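The computation-reuse idea can be illustrated with a small memoization sketch in Python. It is a simplification under stated assumptions: the cache is keyed on the integer MV alone, whereas a real encoder would also have to check that the already processed block covers the same pixels, and the function name fme_with_reuse and the refine callback are hypothetical.

```python
def fme_with_reuse(blocks, refine):
    """Run FME block by block, skipping blocks whose integer MV was already refined.

    blocks is an iterable of (block_id, integer_mv) pairs in processing order.
    refine(block_id, integer_mv) is a hypothetical callback that performs the
    fractional refinement and returns the refined MV.
    Returns the refined MV per block and the number of refine calls made.
    """
    cache = {}
    results = {}
    refine_calls = 0
    for block_id, integer_mv in blocks:
        if integer_mv not in cache:
            cache[integer_mv] = refine(block_id, integer_mv)
            refine_calls += 1
        results[block_id] = cache[integer_mv]
    return results, refine_calls
```

The returned call count makes the saving visible: in a homogeneous region where many blocks share one integer MV, it stays close to one.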
IV. APPLICATION OF FAST INTER-MODE SELECTION
ALGORITHMS TO HEVC
As explained in Section II, HEVC supports larger and more
varied block sizes than H.264/AVC. If an early decision is
made to select the prediction block size, the computational
complexity is significantly reduced by omitting the remaining
predictions. Recently, several fast inter-mode selection
algorithms have been proposed for HEVC. However, it is
important first to take advantage of the considerable amount
of previous work and to find a guide for application to HEVC.
In this paper, several previous algorithms proposed for
H.264/AVC are modified and tested for HEVC. Algorithms
which require an additional calculation to judge the texture
characteristic or motion homogeneity, such as a frame-level
edge map, are not used, for simplicity. The following
algorithms are implemented in the HM5.0 reference software.
A. Prediction Block Size Pre-Decision
In Section III, the block size prediction algorithms are
classified into three sub-categories that utilize spatial/temporal
correlations, motion vector information and the spatial
characteristics of the current MB, respectively. This
subsection evaluates the effectiveness of the three types of
algorithms when they are used for early block size decisions
in HEVC. To this end, the relationship between the above
information and the depth of the current LCU is examined with
experiments. Note that the depth of the LCU determines the
block size in HEVC. Ten video sequences, Akiyo, Container,
TABLE I
CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 352×288 TEST VIDEOS
(each entry gives the average, with the variance in parentheses on the following row)
QP  Depth  Max(Neighboring MVs)                                   Variance of the current LCU (×10^3)                   Max(Neighboring Depths)
           Akiyo      Container  Foreman    Sean       Stefan     Akiyo      Container  Foreman    Sean       Stefan     Akiyo      Container  Foreman    Sean       Stefan
20  0      -          -          -          -          -          -          -          -          -          -          -          -          -          -          -
20  1      0.23       0.22       5.26       3.81       7.25       2.04       3.41       2.75       1.81       3.33       1.84       2.49       2.81       1.85       2.88
           (0.41)     (0.38)     (21.81)    (370.40)   (91.98)    (6258.15)  (10528.18) (7506.87)  (736.33)   (4891.66)  (0.82)     (0.39)     (0.18)     (0.76)     (0.13)
20  2      0.51       0.28       7.05       2.73       13.96      2.26       2.67       3.81       2.60       3.08       2.85       2.78       2.80       2.75       2.95
           (0.56)     (0.34)     (120.28)   (222.79)   (295.62)   (3020.20)  (7049.42)  (11020.76) (1746.03)  (3889.57)  (0.15)     (0.20)     (0.17)     (0.31)     (0.05)
20  3      0.70       0.49       6.71       10.99      16.76      2.93       1.96       2.65       2.30       2.56       2.93       2.89       2.97       2.97       2.98
           (2.01)     (0.51)     (145.70)   (1151.85)  (382.39)   (4844.59)  (4592.76)  (6035.12)  (1767.59)  (2883.81)  (0.07)     (0.13)     (0.03)     (0.03)     (0.03)
32  0      -          -          -          -          -          -          -          -          -          -          -          -          -          -          -
32  1      0.43       0.37       9.12       5.42       11.89      2.51       4.26       3.93       1.97       3.19       1.29       1.60       2.00       1.50       2.21
           (0.45)     (0.86)     (1042.00)  (547.51)   (171.20)   (6869.12)  (9560.06)  (12065.05) (806.67)   (5411.69)  (0.44)     (0.63)     (0.54)     (0.59)     (0.55)
32  2      1.21       0.43       8.91       6.87       10.81      2.95       2.96       2.99       3.07       2.88       2.14       2.33       2.29       2.27       2.55
           (2.29)     (0.36)     (203.11)   (644.77)   (194.82)   (3940.74)  (3748.08)  (6930.70)  (1279.87)  (2980.45)  (0.42)     (0.60)     (0.42)     (0.63)     (0.25)
32  3      1.00       0.67       5.49       6.74       10.84      2.70       1.18       2.17       2.65       2.72       2.70       2.82       2.72       2.90       2.94
           (0.86)     (0.67)     (55.16)    (542.93)   (189.52)   (3864.05)  (1387.00)  (1105.39)  (1885.03)  (1729.40)  (0.23)     (0.28)     (0.21)     (0.09)     (0.06)
TABLE II
CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 1920×1080 TEST VIDEOS
(each entry gives the average, with the variance in parentheses on the following row)
QP  Depth  Max(Neighboring MVs)                                   Variance of the current LCU (×10^3)                   Max(Neighboring Depths)
           Aspen      BasketBall BQTerrace  Cactus     Kimono1    Aspen      BasketBall BQTerrace  Cactus     Kimono1    Aspen      BasketBall BQTerrace  Cactus     Kimono1
                      Drive                                                  Drive                                                  Drive
20  0      69.63      73.68      27.17      11.82      15.18      0.03       0.19       0.07       0.06       0.19       1.57       1.24       1.29       2.00       2.02
           (14436.13) (40665.65) (3582.74)  (15.78)    (624.55)   (12.57)    (91.78)    (0.45)     (29.32)    (135.24)   (0.68)     (0.75)     (0.81)     (0.70)     (0.53)
20  1      44.90      75.85      40.38      37.86      18.93      0.43       0.52       1.83       0.96       0.39       2.21       2.51       2.17       2.83       2.40
           (7211.57)  (22441.74) (8694.43)  (16047.76) (721.20)   (432.46)   (761.29)   (10125.72) (2365.81)  (363.05)   (0.44)     (0.35)     (0.73)     (0.23)     (0.39)
20  2      33.70      62.34      20.82      12.94      20.59      0.46       0.54       1.80       0.86       0.44       2.51       2.76       2.88       2.97       2.62
           (3316.01)  (10757.60) (3023.15)  (3732.76)  (893.41)   (332.50)   (611.10)   (4497.16)  (1786.60)  (587.28)   (0.32)     (0.20)     (0.13)     (0.03)     (0.27)
20  3      31.21      60.76      11.81      8.14       24.33      0.52       0.69       1.92       1.00       0.46       2.77       2.95       3.00       2.99       2.82
           (3590.61)  (7221.25)  (3039.67)  (509.86)   (1560.22)  (356.23)   (666.90)   (3834.08)  (2284.00)  (669.37)   (0.19)     (0.05)     (0.01)     (0.01)     (0.15)
32  0      16.71      36.81      5.01       1.79       13.59      0.20       0.23       0.99       0.60       0.27       0.76       0.77       0.95       0.69       1.44
           (1353.52)  (2521.04)  (101.63)   (283.58)   (440.45)   (108.80)   (184.61)   (1854.24)  (875.66)   (275.87)   (0.60)     (0.61)     (0.93)     (0.81)     (0.57)
32  1      31.24      64.82      4.31       7.98       19.02      0.77       0.99       2.90       1.53       0.45       1.64       1.69       1.92       1.99       1.82
           (3348.42)  (8081.78)  (52.68)    (426.45)   (1108.55)  (680.71)   (997.58)   (7925.55)  (4714.41)  (539.68)   (0.51)     (0.51)     (0.67)     (0.54)     (0.37)
32  2      12.48      63.44      4.78       12.27      23.72      0.58       1.06       2.38       1.03       0.55       2.14       2.18       2.43       2.47       2.02
           (75.76)    (4560.53)  (68.73)    (856.59)   (2576.14)  (236.52)   (640.00)   (4373.66)  (750.40)   (696.87)   (0.50)     (0.38)     (0.47)     (0.36)     (0.30)
32  3      55.96      57.45      4.02       14.01      26.78      0.51       1.03       2.02       0.98       0.68       2.07       2.58       2.81       2.81       2.34
           (4838.08)  (2710.68)  (23.66)    (1307.32)  (6792.91)  (121.25)   (601.11)   (3148.50)  (638.76)   (613.05)   (0.60)     (0.27)     (0.21)     (0.20)     (0.33)
Foreman, Sean and Stefan with a resolution of 352×288 as
well as Aspen, BasketBallDrive, BQTerrace, Cactus and
Kimono1 with a resolution of 1920×1080 are used. The
352×288 test videos are the same sequences used in the
research for H.264 [17][28].
In Table I and Table II, the correlation between the depth of
the current LCU and the information obtained from
neighboring LCUs is presented for 352×288 and 1920×1080
video sequences, respectively. The first column represents the
quantization parameter (QP) values, while the second column
represents the depth. The maximum value among the absolute
MVs of the neighboring LCUs is obtained for each LCU and
presented from the third to seventh columns of Tables I and II.
The average and the variance (given in parentheses) of the
maximum MVs for each depth are shown in these columns.
For the low-resolution videos in Table I, depth 0 is seldom
selected. For these cases, the corresponding cells are marked with a dash.
For videos at a resolution of 352×288 and with QP=20, it
appears that the depth of the LCU increases as the average
magnitude of the neighboring MVs increases. The variance is
also not large in this case. This result is consistent with the data reported
in [28], which uses H.264/AVC targeting 176×144
and 352×288 videos. For the 1920×1080 videos in Table II,
however, the depth of the LCU does not increase along with the
neighboring MVs and the variance is very large. This indicates
that the correlation between the depth of the current LCU and
the neighboring MVs does not exist. In high-resolution videos,
the MV values are quite large, even when the motion seems
stationary. Sometimes, a large block size is preferred, even with
fast and complex motion, because the texture, brightness or
colors of the same object can change in a different way in
every frame. In this case, the elaborate ME with a small block
size cannot reduce the prediction error. Therefore, the
observation leads to the conclusion that the correlation between
the depth of each LCU and the neighboring MVs becomes small
for high-resolution videos.
If the pixel variance of a certain region is small, this region is
likely to be spatially homogeneous and is probably encoded with a
large block size. This possibility is tested and the results are
presented from the eighth to twelfth columns in Tables I and II.
In these columns, the correlation between the depth and the
variance of the current LCU is presented. For the 1920×1080
videos encoded with QP=20 in Table II, the variance of the
current LCU is quite low when its depth is 0. However, in other
cases, the correlation between the depth and the variance of the
current LCU is not very strong. In HEVC, the number of block
sizes is significantly larger than that in H.264/AVC. Moreover,
blocks can be encoded in the SKIP mode not only at the LCU
size but also in every CU. The changed SKIP mode decision
and the larger number of block sizes make it difficult to find a strong
correlation between the depth and the variance.
From the thirteenth to seventeenth columns in Tables I and
II, the correlation between the depth of the current LCU and
the depth information of the neighboring LCUs is presented.
The depth of the current LCU becomes large as the
neighboring depths increase, while the variance is quite small.
These results show that the correlation between the depths for
the current and neighboring LCUs is positive.
From the above simulations, the neighboring depth
information may be helpful for a prediction of the block size
(or the LCU depth), whereas the MV or variance information
may not be very useful.
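The per-depth statistics reported in Tables I and II can be computed along the lines of the Python sketch below. It assumes that each LCU contributes one sample, the maximum absolute value among its neighbors, and that the population variance is meant; neither detail is stated in the text, and the function name depth_statistics is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def depth_statistics(records):
    """Group LCUs by depth and summarize the maximum neighboring value.

    records is a list of (depth, neighbor_values) pairs, where neighbor_values
    holds one quantity per neighboring LCU (an MV magnitude, for example).
    Returns {depth: (average, variance)} over the per-LCU maxima.
    """
    per_depth = defaultdict(list)
    for depth, neighbor_values in records:
        per_depth[depth].append(max(abs(value) for value in neighbor_values))
    return {
        depth: (mean(values), pvariance(values))
        for depth, values in sorted(per_depth.items())
    }
```

A depth whose average grows steadily with a small variance, as in the neighboring-depth columns, is the pattern that suggests a usable predictor.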
B. Hierarchical Decision of Prediction Block Size
The algorithm of Yu [37] checks three conditions for an
early SKIP mode decision. First, one of the neighboring blocks is
a SKIP mode block. Second, the sum of absolute differences
(SAD) of the current MB is less than the average SAD of the
neighboring MBs. Here, the SAD is the difference between a
block in the current frame and the co-located block in the
reference frame. Lastly, the result of the fast
transform-quantized coefficients is zero. In Table III, the early SKIP
mode decision algorithms, denoted by ES, proposed in the
HM5.0 reference software and Yu's algorithm [37] are tested.
For the simulation of a hardware-based HEVC encoder, the
encoding time to process an LCU is estimated by adding the
time for the FME with the MC of each CU inside an LCU,
as the stage of the FME with the MC operations takes the
most time in the pipeline schedule, as discussed in Section
II.B. The configurations for the encoding are low-complexity,
low-delay, and P picture-only, and the number
of reference frames is four at most. Twelve video sequences,
BQMall, FlowerVase, Keiba and RaceHorses with a
resolution of 832×480; FourPeople, KristenAndSara,
Johnny and Vidyo1 with a resolution of 1280×720; and
Aspen, BasketBallDrive, SnowMountain and Kimono1 with a
resolution of 1920×1080, are used in the evaluation. There
are 50 frames in each test sequence, and four QPs (20, 24,
28 and 32) are used. The first and the second columns
represent the resolutions and test sequences used in the
simulation. From the third to fifth columns, the increase in
bitrate and PSNR and the time saved, denoted by ΔB, ΔP
and ΔT, respectively, are shown when the ES proposed in
the reference software is applied. The time is reduced by
60.76%, whereas the bitrate slightly decreases and the PSNR
is degraded by 0.02 dB. Yu's algorithm [37] makes the early
SKIP mode decision considering neighboring information
and the characteristics of the current CU, unlike the RD
cost-based algorithm in the reference software. From the sixth to
eighth columns, the two algorithms are used together to
complement each other. The time is reduced by 69.82%.
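The three conditions of Yu's early SKIP test described above can be sketched in Python as follows. The sketch assumes that all three conditions must hold at the same time, which the text implies but does not state outright; the function name, the argument names and the "SKIP" mode label are hypothetical.

```python
def yu_early_skip(neighbor_modes, current_sad, neighbor_sads, quantized_coeffs):
    """Return True when all three early SKIP conditions are satisfied.

    neighbor_modes: coding modes of the neighboring blocks, e.g. "SKIP".
    current_sad: SAD between the current block and the co-located block.
    neighbor_sads: SADs of the neighboring blocks.
    quantized_coeffs: fast transform-quantized coefficients of the residual.
    """
    has_skip_neighbor = any(mode == "SKIP" for mode in neighbor_modes)
    below_average_sad = (
        len(neighbor_sads) > 0
        and current_sad < sum(neighbor_sads) / len(neighbor_sads)
    )
    all_coeffs_zero = all(coeff == 0 for coeff in quantized_coeffs)
    return has_skip_neighbor and below_average_sad and all_coeffs_zero
```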
TABLE III
RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY AN EARLY
SKIP MODE DECISION
Size       Videos           ES in HM                   ES in HM + Yu's
                            ΔB (%)   ΔP (dB)  ΔT (%)   ΔB (%)   ΔP (dB)  ΔT (%)
832×480    BQMall           -0.32    -0.02    53.81    2.45     -0.13    67.62
           FlowerVase       -0.03    -0.01    71.87    -0.56    -0.09    81.95
           Keiba            -0.13    -0.01    38.96    2.76     -0.08    53.05
           RaceHorses       -0.18    -0.02    31.48    1.49     -0.08    39.98
1280×720   FourPeople       -0.84    -0.03    82.55    -0.30    -0.09    88.00
           Johnny           -1.15    -0.03    78.36    -1.30    -0.07    84.50
           KristenAndSara   -1.06    -0.02    79.08    -0.63    -0.08    84.92
           Vidyo1           -0.62    -0.02    78.08    -0.29    -0.10    85.61
1920×1080  Aspen            -0.19    -0.01    59.42    0.68     -0.05    68.72
           BasketBallDrive  -0.43    -0.02    54.68    0.97     -0.07    63.27
           SnowMountain     -0.50    -0.02    58.79    -2.13    -0.10    68.53
           Kimono1          -0.16    -0.01    42.09    0.77     -0.04    51.73
Average                     -0.47    -0.02    60.76    0.33     -0.08    69.82
The ECU and CFM algorithms are applied to HEVC and the
results are tabulated, as presented in Table IV. From the third
to fifth columns, the ECU algorithm is used alone. The
encoding time is reduced by 48.03% and the RD drop is
marginal. When the ECU and CFM algorithms are used together,
the encoding time saved is 64.03%, whereas the PSNR is
0.07 dB less than that of ECU. When the three algorithms of
ECU, CFM and ES are used together, only an additional 5% of the
time is saved.
According to the categorization in Section III, the early
SKIP mode decision, ECU, and CFM are all classified as
belonging in the first sub-category in the hierarchical decision.
On the other hand, no algorithms for the second sub-category
are defined in the HM5.0 reference software. The second sub-
category algorithms for H.264/AVC are applied to HEVC
compression and the effect on the RD performance and the
speed-up is investigated. In a number of previous block-size-
reduction algorithms, the prediction of the block size at the
lower depth is performed first and the searches for deeper
depths are then stopped if a certain condition is satisfied
[18][31][37]. The following three algorithms are classified as belonging in the third sub-category in the hierarchical decision
according to the categorization in Section III.A Early
termination of CU, which is similar to the algorithm proposed
by Lee [31] (denoted as ETCU1 henceforth), is applicable for
the reduction of the block size search. The predictions for four
CUs at depth (d+1) are performed after the prediction for CU
at depth (d). Every time the prediction for the CU at depth
(d+1) is finished, the RD cost of each CU is accumulated and
compared with the early termination threshold. If the current
accumulated RD cost at depth (d+1) is larger than the
threshold, the total RD cost of four CUs at depth (d+1) is
expected to be larger than that of the corresponding CU at
depth (d). Thus, the ongoing prediction at depth (d+1) is terminated early. The threshold is derived from the RD cost at
depth (d). In Yu's algorithm [37], if the RD cost of
2N×2N at depth (d+1) is greater than a quarter of the best RD
cost at depth (d), further searches on 2N×N and N×2N PUs at
depth (d+1) as well as deeper depths are not performed. This
algorithm is denoted as ETCU2 henceforth. Another early
termination algorithm proposes skipping the FME
operation at each depth [18]. This strategy is denoted as
FME_SKIP hereafter. The SKIP mode plays an important role
in compression efficiency and SKIP mode prediction is, thus,
always performed, even when various fast-mode decision
schemes are applied. The result of the SKIP mode prediction
is obtained very quickly due to its low complexity as
compared to other inter- and intra-predictions. If the ME cost
as estimated in the middle of its computation is greater than
the SKIP cost, the ME operation is terminated. A specific
algorithm is as follows. After IME, the IME cost is compared
to the cost of the SKIP mode using the condition $C_{FME\_SKIP}$ as
defined in (1). Here, $COST_{SKIP}$ is the cost of the SKIP mode,
whereas $COST_{IME}$ denotes the IME cost. If $COST_{SKIP}$ is less
than $COST_{IME}$ multiplied by $W_{FME\_SKIP}$, FME is not performed
for the current block. The weight value, $W_{FME\_SKIP}$, is chosen
experimentally and is set to 0.8 because it is observed that the
cost obtained from FME is approximately 80% of $COST_{IME}$
on average. Therefore, the final cost of ME can be estimated
as $0.8 \times COST_{IME}$, and this estimated ME cost is compared with
$COST_{SKIP}$.
$$C_{FME\_SKIP}:\quad COST_{SKIP} < W_{FME\_SKIP} \times COST_{IME} \qquad (1)$$
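As a minimal sketch of condition (1), the check can be written as a single comparison. The function and constant names below are hypothetical and chosen only for illustration; the weight of 0.8 is the value quoted in the text, and the example costs are made-up numbers.

```python
# Weight from the text: the cost after FME is about 80% of the IME cost on average.
W_FME_SKIP = 0.8


def should_skip_fme(cost_skip: float, cost_ime: float,
                    weight: float = W_FME_SKIP) -> bool:
    """Return True when condition (1) holds, i.e. FME is not performed."""
    # Estimated final ME cost if FME were carried out.
    estimated_me_cost = weight * cost_ime
    return cost_skip < estimated_me_cost


if __name__ == "__main__":
    # Illustrative costs only (not taken from the paper).
    print(should_skip_fme(cost_skip=700.0, cost_ime=1000.0))  # SKIP cheaper than the estimate
    print(should_skip_fme(cost_skip=900.0, cost_ime=1000.0))  # SKIP more expensive than the estimate
```

In the first call the SKIP cost is below the estimated ME cost, so FME would be omitted; in the second it is not, so FME would still be performed.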
In Table IV, from the twelfth to fourteenth columns, the
ETCU1 algorithm is used alongside the ECU and CFM
algorithms. The encoding time is reduced by 68.26%, whereas
the increase in bitrate and the PSNR drop are 1.88% and
0.23 dB, respectively. The ETCU2 algorithm, from the fifteenth to
seventeenth columns, shows a time saving of 75.25%, but the RD
drop is quite large (a 7.40% bitrate increase and a 0.37 dB PSNR
drop on average). Lastly, from the eighteenth to twentieth
columns, the simulation results are shown when the ECU, CFM
and FME_SKIP algorithms are used together. A time saving
of 85.95% is achieved, whereas the RD performance is much
better than those of ETCU1 and ETCU2. For the three simulations
of ECU+CFM+ETCU1, ECU+CFM+ETCU2 and
ECU+CFM+FME_SKIP, additionally using the ES algorithm
helps neither the time saving nor the RD performance.
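The two early-termination rules described earlier in this subsection, ETCU1 and ETCU2, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the ETCU1 threshold is passed in as a parameter because the text only says it is derived from the RD cost at depth (d), and the costs in the usage lines are made-up values.

```python
from typing import Iterable, Optional


def etcu1_accumulate(sub_cu_costs: Iterable[float],
                     threshold: float) -> Optional[float]:
    """ETCU1-style check: accumulate the RD costs of the CUs at depth d+1.

    Returns the total cost when all sub-CUs are evaluated, or None when the
    accumulated cost exceeds the threshold and prediction is terminated early.
    """
    accumulated = 0.0
    for cost in sub_cu_costs:
        accumulated += cost
        if accumulated > threshold:
            return None  # remaining sub-CUs at depth d+1 are not predicted
    return accumulated


def etcu2_continue(cost_2nx2n_next_depth: float,
                   best_cost_depth_d: float) -> bool:
    """ETCU2-style check: True when further searches should still be performed.

    Searching stops when the 2Nx2N cost at depth d+1 is greater than a
    quarter of the best RD cost at depth d.
    """
    return cost_2nx2n_next_depth <= best_cost_depth_d / 4.0


if __name__ == "__main__":
    # Illustrative numbers only.
    print(etcu1_accumulate([30.0, 40.0, 50.0, 20.0], threshold=100.0))
    print(etcu1_accumulate([20.0, 20.0, 20.0, 20.0], threshold=100.0))
    print(etcu2_continue(30.0, best_cost_depth_d=100.0))
    print(etcu2_continue(20.0, best_cost_depth_d=100.0))
```

Passing the sub-CU costs as an iterable lets a caller supply a generator, so the cost of a later sub-CU is never computed once the threshold has been exceeded.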
TABLE IV
RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY THE ECU AND CFM ALGORITHMS PROPOSED IN THE HM5.0 REFERENCE SOFTWARE
Size       Video                    ECU                  ECU+CFM               ES+ECU+CFM           ECU+CFM+ETCU1          ECU+CFM+ETCU2        ECU+CFM+FME_SKIP
                                 B      P      T        B      P      T        B      P      T        B      P      T        B      P      T        B      P      T
                               (%)   (dB)    (%)      (%)   (dB)    (%)      (%)   (dB)    (%)      (%)   (dB)    (%)      (%)   (dB)    (%)      (%)   (dB)    (%)
832×480    BQMall            -0.87  -0.05  44.84    -1.47  -0.12  60.83    -1.81  -0.16  65.49     3.48  -0.25  64.04    19.78  -0.58  72.04    -1.10  -0.15  82.49
           FlowerVase        -1.77  -0.07  66.12    -3.97  -0.21  82.30    -4.11  -0.23  86.46     1.79  -0.52  84.82     5.44  -0.63  86.34    -2.63  -0.29  96.35
           Keiba             -0.51  -0.03  30.16    -1.13  -0.09  47.67    -1.36  -0.11  51.64     3.23  -0.17  53.90    10.49  -0.31  64.86    -0.70  -0.12  78.27
           RaceHorses        -0.27  -0.02  14.67    -0.66  -0.09  32.62    -0.93  -0.14  38.16     5.98  -0.27  43.83    16.76  -0.48  55.73    -0.18  -0.10  66.37
1280×720   FourPeople        -1.15  -0.04  67.92    -1.89  -0.09  81.95    -2.42  -0.12  86.97     2.71  -0.28  83.15    10.20  -0.47  86.34    -1.85  -0.12  95.49
           Johnny            -1.96  -0.04  64.72    -3.95  -0.13  80.83    -4.85  -0.16  85.96    -0.36  -0.22  82.02     5.91  -0.39  85.71    -3.91  -0.18  95.05
           KristenAndSara    -1.79  -0.06  64.35    -2.79  -0.14  80.33    -3.52  -0.18  85.34     2.36  -0.30  81.70     9.21  -0.50  85.15    -2.67  -0.17  95.30
           Vidyo1            -1.89  -0.05  65.23    -2.80  -0.13  80.14    -3.24  -0.15  85.03     1.70  -0.30  82.14     6.88  -0.44  84.90    -2.85  -0.16  94.92
1920×1080  Aspen             -0.34  -0.01  45.75    -1.01  -0.05  62.68    -1.24  -0.07  67.58     1.71  -0.12  67.94     1.47  -0.14  74.94    -0.61  -0.08  86.06
           BasketBallDrive   -0.53  -0.02  35.41    -1.27  -0.07  53.89    -1.72  -0.10  60.35     2.16  -0.12  61.24     3.69  -0.17  69.32    -1.17  -0.09  81.94
           SnowMoutain       -2.31  -0.08  52.21    -3.30  -0.12  64.35    -3.46  -0.13  67.82    -3.20  -0.17  66.15    -3.11  -0.28  75.54    -3.10  -0.13  82.42
           Kimono1           -0.16  -0.01  24.94    -0.32  -0.03  40.79    -0.64  -0.07  46.65     1.04  -0.06  48.15     2.14  -0.10  62.14    -0.06  -0.05  76.73
Average                      -1.13  -0.04  48.03    -2.05  -0.11  64.03    -2.44  -0.13  68.95     1.88  -0.23  68.26     7.40  -0.37  75.25    -1.74  -0.14  85.95

C. Decision of Prediction Block Sizes before FME
In H.264/AVC, AMPD (or AMPD2) or MF has been
successfully used for block size reduction. As explained in
Section II, the reduction of the FME time is very important for
real-time encoding for a hardware-based encoder
implementation. Thus, in this subsection, the above algorithms
are applied for HEVC and their effectiveness is then tested
through simulation. To apply AMPD2 or MF to the HEVC
mode decision, candidate partitions should be defined during
the IME phase. Fig. 2 shows one example of a block size
prediction in the IME phase. In Clusters 1 and 2, there are
three 64×64 CU partitions and three 32×32 CU partitions,
respectively, whereas there are four 16×16 CU partitions in
Cluster 3. For the 8×8 CU partition, the best block size is
selected based on the IME cost. For the 16×16 CU partition,
the IME costs of the 2N×2N, N×2N, 2N×N and N×N types
are sorted in ascending order, whereas the IME costs of the
2N×2N, N×2N and 2N×N types are sorted in ascending
order for the 32×32 and 64×64 CU partitions. Through this
process, ten partitions in total are selected for FME, as shown
in Fig. 2.
[Figure: Cluster 1 (64×64), Cluster 2, Cluster 3]
Fig. 2. Prediction modes pre-determined in the IME phase
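One way to read the selection process above is as a per-cluster ranking by IME cost followed by a per-cluster quota. The sketch below follows that reading and is not the authors' implementation: the function name, the cluster labels, the quota dictionaries and all cost values are hypothetical. The quotas mirror the 7-candidate (3, 2, 2) and 4-candidate (1, 1, 2) configurations evaluated in Table V.

```python
from typing import Dict, List, Tuple

# Hypothetical IME costs per cluster and partition type (made-up numbers).
EXAMPLE_IME_COSTS: Dict[str, Dict[str, float]] = {
    "64x64": {"2Nx2N": 520.0, "Nx2N": 480.0, "2NxN": 610.0},
    "32x32": {"2Nx2N": 260.0, "Nx2N": 300.0, "2NxN": 240.0},
    "16x16": {"2Nx2N": 130.0, "Nx2N": 125.0, "2NxN": 150.0, "NxN": 140.0},
}

# Number of partitions kept per cluster for the 7- and 4-candidate settings.
QUOTA_7 = {"64x64": 3, "32x32": 2, "16x16": 2}
QUOTA_4 = {"64x64": 1, "32x32": 1, "16x16": 2}


def select_fme_candidates(ime_costs: Dict[str, Dict[str, float]],
                          quota: Dict[str, int]) -> List[Tuple[str, str]]:
    """Keep the lowest-IME-cost partitions of each cluster, up to its quota."""
    selected: List[Tuple[str, str]] = []
    for cluster, costs in ime_costs.items():
        ranked = sorted(costs, key=costs.get)  # ascending IME cost
        for partition in ranked[: quota[cluster]]:
            selected.append((cluster, partition))
    return selected


if __name__ == "__main__":
    print(select_fme_candidates(EXAMPLE_IME_COSTS, QUOTA_7))
    print(select_fme_candidates(EXAMPLE_IME_COSTS, QUOTA_4))
```

Only the partitions returned by the selection would then be passed on to FME, which is where the time saving comes from.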
From the third to fifth columns in Table V, the RD
performance degradation and the encoding time saved are
shown when FME is performed for the ten candidate
partitions of Fig. 2. The time saving is 30.15%, whereas the RD drop is marginal. From the sixth to eighth columns, the
seven candidates, three from the Cluster 1, two from the
Cluster 2 and two from the Cluster 3, are chosen. The time is
reduced by 57.44 %, whereas the increase in bitrate and the
drop in the PSNR are 0.21% and 0.04 dB on average. From
the ninth to eleventh columns, FME is performed for the four
candidate partitions. One from the Cluster 1 and another one
from the Cluster 2 are selected, whereas two are selected from
the Cluster 3. The encoding time is reduced by 73.36%,
whereas the increase in bitrate and the drop in the PSNR are
0.26% and 0.04 dB on average. From these simulations, this
algorithm turns out to be very effective for speed-up without a
significant RD degradation for all types of video sequences.
D. Algorithm Evaluation
From Sections IV.A to IV.C, it can be inferred that, in HEVC,
pre-decisions of prediction block sizes are very difficult, whereas hierarchical decisions or decisions based on the
results from IME are useful for saving time. However, some
of these algorithms offer a different degree of performance
according to the video characteristics. In Fig. 3, the times
saved by various algorithms are compared for the RaceHorses
and FourPeople video sequences denoted by black and gray
bar graphs, respectively. The FourPeople sequence has slow
motion and its texture is smooth, whereas the RaceHorses
sequence includes fast and irregular motion. The hierarchical
decision presented in the HM5.0 reference software, including
the ES scheme, is very effective for the FourPeople sequence.
However, for the RaceHorses sequence, the benefit from the ES, ECU and CFM algorithms is not large and is less than half of that
for the FourPeople sequence. Another notable observation is
that the combination of the ES, ECU and CFM increases the
time saving. However, the rate of increase is not significant as
the effects of those schemes are overlapping in many cases.
When the ECU and CFM schemes are combined with other
hierarchical decision schemes of the ETCU1, ETCU2 and
FME_SKIP, the time saving for the RaceHorses is improved
substantially, whereas the amount of the time saving is
increased slightly for the FourPeople sequence. Unlike other
hierarchical decision algorithms, AMPD algorithms show
similar performance for both video sequences. The time
saving increases as the number of candidates is reduced.
As shown in Fig. 3, most algorithms show significant time
savings for the FourPeople sequence, whereas the variation in
the saved time is very large in the RaceHorses sequence. Only
four combinations, ECU+CFM+ETCU2, ECU+CFM+
FME_SKIP as well as the AMPD algorithms with 7 and 4
candidates, show time savings of over 50% for both
FourPeople and RaceHorses.
Fig. 3. Algorithm comparison in terms of the time saved
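The observation that only four combinations save more than 50% of the time for both sequences can be checked against the tabulated values. The short sketch below copies the per-sequence time savings of RaceHorses and FourPeople from Tables IV and V; the dictionary layout and the function name are illustrative choices and not part of the original evaluation.

```python
from typing import Dict, List, Tuple

# Time saved (%) as (RaceHorses, FourPeople), copied from Tables IV and V.
TIME_SAVED: Dict[str, Tuple[float, float]] = {
    "ECU": (14.67, 67.92),
    "ECU+CFM": (32.62, 81.95),
    "ES+ECU+CFM": (38.16, 86.97),
    "ECU+CFM+ETCU1": (43.83, 83.15),
    "ECU+CFM+ETCU2": (55.73, 86.34),
    "ECU+CFM+FME_SKIP": (66.37, 95.49),
    "AMPD 10 candidates": (30.23, 30.08),
    "AMPD 7 candidates": (55.92, 57.71),
    "AMPD 4 candidates": (71.73, 73.87),
}


def robust_combinations(table: Dict[str, Tuple[float, float]],
                        minimum: float = 50.0) -> List[str]:
    """Return the combinations whose time saving exceeds `minimum` on both sequences."""
    return [name for name, savings in table.items() if min(savings) > minimum]


if __name__ == "__main__":
    print(robust_combinations(TIME_SAVED))
```

Taking the minimum over the two sequences is what separates algorithms that are fast only on low-motion content from those that remain effective on RaceHorses as well.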
TABLE V
RD PERFORMANCE DEGRADATION AND ENCODING TIME SAVED ACCORDING TO MODES DETERMINED IN THE IME PHASE
Size       Video                 10 candidates            7 candidates            4 candidates
                                 B       P       T       B       P       T       B       P       T
                               (%)    (dB)     (%)     (%)    (dB)     (%)     (%)    (dB)     (%)
832×480    BQMall             0.25   -0.03   30.23    0.58   -0.05   57.24    0.48   -0.05   72.98
           FlowerVase        -0.05   -0.04   30.23    0.20   -0.07   58.19    0.29   -0.07   73.93
           Keiba              0.16   -0.02   30.23    0.75   -0.05   56.74    1.01   -0.04   72.42
           RaceHorses         0.61   -0.04   30.23    1.30   -0.07   55.92    1.37   -0.07   71.73
1280×720   FourPeople        -0.04   -0.01   30.08    0.01   -0.03   57.71    0.08   -0.03   73.87
           Johnny            -0.46   -0.01   30.08   -0.62   -0.03   57.64   -0.24   -0.04   73.68
           KristenAndSara    -0.37   -0.02   30.08   -0.20   -0.03   57.71   -0.18   -0.03   73.75
           Vidyo1            -0.03   -0.01   30.08   -0.23   -0.02   57.71   -0.01   -0.03   73.75
1920×1080  Aspen             -0.05    0.00   30.14    0.39   -0.02   57.95    0.41   -0.02   73.86
           BasketBallDrive   -0.10   -0.01   30.14    0.42   -0.02   57.54    0.38   -0.02   73.50
           SnowMoutain       -0.34   -0.04   30.14   -0.51   -0.06   57.12   -0.55   -0.06   73.15
           Kimono1           -0.09    0.00   30.14    0.39   -0.02   57.78    0.08   -0.02   73.74
Average                      -0.04   -0.02   30.15    0.21   -0.04   57.44    0.26   -0.04   73.36
In Fig. 4, the RD performances of the ECU+CFM+ETCU2,
ECU+CFM+FME_SKIP and AMPD algorithms with 7 and 4
candidates are compared to that of the HM5.0 reference
software where no early decision algorithm is adopted. The
horizontal and the vertical axes show the bitrate and the PSNR,
respectively. The RaceHorses and FourPeople video
sequences are used in Figs. 4(a) and (b), respectively. The RD
performance of three algorithms, namely
ECU+CFM+FME_SKIP and the AMPD algorithms with 7 and 4 candidates, is
comparable to that of the HM5.0 reference software, whereas
the RD drop of the ECU+CFM+ETCU2 algorithm denoted by
the dashed curve is quite large.
[Figure: two RD plots of PSNR (dB) versus bitrate (kbps) with curves for ECU+CFM+FME_SKIP, AMPD 7Cand, AMPD 4Cand, HM5.0 and ECU+CFM+ETCU2]
Fig. 4. Algorithm comparison in terms of the RD performance: (a) 832×480-size RaceHorses sequence (b) 1280×720-size FourPeople sequence
V. CONCLUSION
The HEVC standard employs a hybrid coding approach
similar to that of the H.264/AVC standard. Thus, the two
standards have much in common. In this paper, the fast mode
decision algorithms for H.264/AVC are surveyed and then
they are applied for the speed-up of HEVC encoding. One of
the major differences is that the number of block sizes
supported by HEVC is 10 times more than that of H.264/AVC.
The other is that the execution time for FME becomes much
larger than that for IME because IME execution can be sped
up by exploiting parallelism, while FME needs to be
executed in a serial manner. This second difference
makes the fast execution of FME more important than
that of IME when a hardware-based encoder is used for
HEVC compression. It is experimentally shown that a
hierarchical inter-mode decision algorithm is a very effective
solution for HEVC because there are many opportunities to
terminate further prediction while searching a tree of CUs. In
the future, the previous algorithms tested in this paper need to be further elaborated and enhanced.
REFERENCES
[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264-ISO/IEC 14496-10 AVC), 2003.
[2] ISO/IEC JTC 1 SC29 WG11, "Joint Call for Proposals on Video Compression Technology," Doc. N11113, Jan. 2010.
[3] ISO/IEC JTC 1 SC29 WG11, "Vision, Applications and Requirements of High-Performance Video Coding," Doc. N11096, Jan. 2010.
[4] T. Wiegand, W.J. Han, B. Bross, J. R. Ohm, and G.J. Sullivan, "WD4: Working Draft 4 of High-Efficiency Video Coding," JCTVC-F803, Torino, IT, July 2011.
[5] Y.-K. Lin, D.-W. Li, C.-C. Lin, T.-Y. Kou, S.-J. Wu, W.-C. Tai, W.-C. Chang, and T.-Sheuan Chang, "A 242mW, 10mm2 1080p H.264/AVC High Profile Encoder Chip," in Proc. of Design Automat. Conf., pp. 78-83, July 2008.
[6] Y.-H. Chen, T.-D. Chuang, Y.-J. Chen, C.-T. Li, C.-J. Hsu, S.-Y. Chien, and L.-G. Chen, "An H.264/AVC scalable extension and high profile HDTV 1080p encoder chip," in Proc. of Sym. on VLSI Circuits, pp. 104-105, Aug. 2008.
[7] Y.-H. Chen, T.-C. Chen, and L.-G. Chen, "Power-scalable algorithm and reconfigurable macro-block pipelining architecture of H.264 encoder for mobile application," in Proc. Int. Conf. Multimedia Expo, pp. 281-284, Dec. 2006.
[8] T.-C. Chen, S.-Y. Chien, Y.-W. Huang, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, and L.-G. Chen, "Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp. 673-688, June 2006.
[9] H.-C. Chang, Y.-C. Yang, J.-W. Chen, C.-L. Su, C.-A. Chien, J.-I. Guo, and J.-S. Wang, "A dynamic quality-scalable H.264 video encoder chip," in Proc. Asia South Pacific Design Automat. Conf., pp. 125-126, Feb. 2009.
[10] Y.-K. Lin, C.-C. Lin, T.-Y. Kuo, and T.-S. Chang, "A Hardware-Efficient H.264/AVC Motion-Estimation Design for High-Definition Video," IEEE Trans. Circuits and System I, vol. 55, no. 6, pp. 1526-1535, July 2008.
[11] C. Yang, S. Goto and T. Ikenaga, "High Performance VLSI Architecture of Fractional Motion Estimation in H.264 for HDTV," in Proc. of Int. Symposium on Circuits and Systems, pp. 2605-2608, May 2006.
[12] C.-Y. Kao, C.-L. Wu and Y.-L. Lin, "A High-Performance Three-Engine Architecture for H.264/AVC Fractional Motion Estimation," IEEE Trans. Very Large Scale Integration Sys., vol. 18, no. 4, pp. 662-666, April 2010.
[13] P. K. Tsung, W.-Y. Chen, L.-F. Ding, S.-Y. Chien, L.-G. Chen, "Cache-based Integer Motion/Disparity Estimation for Quad-HD H.264/AVC and HD Multiview Video Coding," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 2013-2016, April 2009.
[14] C.-M. Ou, C.-F. Le, W.-J. Hwang, "An efficient VLSI architecture for H.264 variable block size motion estimation," IEEE Trans. Consumer Electronics, vol. 51, no. 4, pp. 1291-1299, Nov. 2005.
[15] J. Kim and T. Park, "A novel VLSI architecture for full-search variable block-size motion estimation," IEEE Trans. Consumer Electronics, vol. 55, no. 2, pp. 728-733, May 2009.
[16] L. Zhang and W. Gao, "Reusable Architecture and Complexity-Controllable Algorithm for the Integer/Fractional Motion Estimation of H.264," IEEE Trans. Consumer Electronics, vol. 53, no. 2, pp. 749-756, May 2007.
[17] X. Lu, A.M. Tourapis, P. Yin, and J. Boyce, "Fast Mode Decision and Motion Estimation for H.264 with a Focus on MPEG-2/H.264 Transcoding," in Proc. of Int. Symposium on Circuits and Systems, vol. 2, pp. 1246-1249, May 2005.
[18] C. E. Rhee, J.-S. Kim, and H.-J. Lee, "Cascaded Direction Filtering for Fast Multidirectional Inter-Prediction in H.264/AVC Main and High Profile Compression," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 403-413, March 2012.
[19] B.-G. Kim, S.-K. Song, and C.-S. Cho, "Efficient inter-mode decision based on contextual prediction for the P-slice in H.264/AVC video coding," in Proc. Int. Conf. Image Processing, pp. 1333-1336, Oct. 2006.
[20] B.-G. Kim and C.-S. Cho, "A fast inter-mode decision algorithm based on macro-Block tracking for P slices in the H.264/AVC video standard," in Proc. Int. Conf. Image Processing, vol. 5, pp. 301-304, Sept. 2007.
[21] X. Jin, Y. Huang, Q. Liu, S. Wu, and T. Ikenaga, "Fast Spatial Direct Mode Decision for B Slice based on Temporal Information in H.264 Standard," in Proc. of Int. Sym. on Intelligent Signal Processing and Communication Systems, pp. 331-334, Jan. 2009.
[22] T. Zhao, H. Wang, S. Kwong, and C.-C. J. Kuo, "Fast Mode Decision Based on Mode Adaptation," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5, pp. 697-705, May 2010.
[23] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja, and C. C. Ko, "Fast Intermode Decision in H.264/AVC Video Coding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 953-958, July 2005.
[24] C. Y. Chang, C. H. Pan, and H. Chen, "Fast mode decision for P-frames in H.264," presented at the Picture Coding Symp., Dec. 2004.
[25] S.-H. Ri, Y. Vatis, and J. Ostermann, "Fast Inter-Mode Decision in an H.264/AVC Encoder Using Mode and Lagrangian Cost Correlation," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 302-306, Feb. 2009.
[26] X. Jing and L.-P. Chau, "Fast approach for H.264 INTER mode decision," Electronics Letters, vol. 40, no. 17, pp. 1050-1052, Aug. 2004.
[27] A. Ahmad, N. Khan, S. Masud, and M.A. Maud, "Selection of variable block sizes in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 173-176, May 2004.
[28] H. Zeng, C. Cai, and K.-K. Ma, "Fast Mode Decision for H.264/AVC Based on Macroblock Motion Activity," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp. 491-499, April 2009.
[29] J. Bu, S. Lou, C. Chen, and J. Zhu, "A predictive block-size mode selection for inter frame in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pp. 917-920, May 2006.
[30] H. Ko, K. Yoo, and K. Sohn, "Fast mode-decision for H.264/AVC based on inter-frame correlations," Signal Processing: Image Commun., vol. 24, no. 10, pp. 803-813, Nov. 2009.
[31] J. Y. Lee and H. Park, "A Fast Mode Decision Method Based on Motion Cost and Intra Prediction Cost for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 393-402, March 2012.
[32] D. Wu, S. Wu, K. P. Lim, F. Pan, Z. G. Li, and X. Lin, "Block intermode decision for fast encoding of H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 181-184, May 2004.
[33] Z. Liu, L. Shen, and Z. Zhang, "An Efficient Intermode Decision Algorithm Based on Motion Homogeneity for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 128-132, Jan. 2009.
[34] D. Zhu, Q. Dai, and R. Ding, "Fast inter-prediction mode decision for H.264," in Proc. Int. Conf. Multimedia Expo, vol. 2, pp. 1123-1126, June 2004.
[35] C.-H. Kuo, M. Shen, and C.-C. J. Kuo, "Fast inter-prediction mode decision and motion search for H.264," in Proc. IEEE Int. Conf. Multimedia Expo, vol. 1, pp. 663-666, June 2004.
[36] P. Yin, H.-Y.C. Tourapis, A.M. Tourapis, and J. Boyce, "Fast mode decision and motion estimation for JVT/H.264," in Proc. of the IEEE Int. Conf. on Image Processing, vol. 3, pp. 853-856, Sept. 2003.
[37] A. C. W. Yu, G. R. Martin, and H. Park, "Fast Inter-Mode Selection in the H.264/AVC Standard Using a Hierarchical Decision Process," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 186-195, April 2009.
[38] G. Kim, Y. Moon, and J. Kim, "An early detection of all-zero DCT block in H.264," in Proc. Int. Conf. Image Processing, vol. 1, pp. 453-456, Oct. 2004.
[39] J. Lee and B. W. Jeon, "Fast mode decision for H.264 with variable motion block size," Lecture Notes in Computer Science, vol. 2869, pp. 723-730, 2003.
[40] I. Choi, J. Lee, and B. Jeon, "Fast coding mode selection with rate-distortion optimization for MPEG-4 part-10 AVC/H.264," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 12, pp. 1557-1561, Dec. 2006.
[41] Y.-H. Kim, J.-W. Yoo, S.-W. Lee, J. Shin, J. Paik, and H.-K. Jung, "Adaptive mode decision for H.264 encoder," Electronics Letters, vol. 40, no. 19, pp. 1172-1173, Sept. 2004.
[42] J. Lee and B. Jeon, "Pruned mode decision based on variable block sizes motion compensation for H.264," Lecture Notes in Computer Science, vol. 2899, pp. 410-418, Nov. 2003.
[43] C.S. Kannangara, I.E.G. Richardson, M. Bystrom, J.R. Solera, Y. Zhao, A. MacLennan, and R. Cooney, "Low complexity skip prediction for H.264 through Lagrangian cost estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 2, pp. 202-208, Feb. 2006.
[44] Y. Moon, G. Kim, and J. Kim, "An improved early detection algorithm for all-zero blocks in H.264 video encoding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1053-1057, Aug. 2005.
[45] T.-C. Chen, Y.-W. Huang, and L.-G. Chen, "Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 9-12, May 2004.
[46] M. Shao, Z. Liu, S. Goto, and T. Ikenaga, "Lossless VLSI oriented full computation reusing algorithm for H.264/AVC fractional motion estimation," IEICE Trans. Fundamentals, vol. 90-A, no. 5, pp. 756-763, April 2007.
[47] Y. Song, M. Shao, Z. Liu, S. Li, L. Li, T. Ikenaga, and S. Goto, "H.264/AVC fractional motion estimation engine with computation reusing in HDTV1080p real-time encoding applications," in Proc. of the IEEE Workshop on Signal Processing Systems, pp. 509-514, Oct. 2007.
BIOGRAPHIES
Chae Eun Rhee received the B.S., M.S. and Ph.D degrees in
Electrical Engineering and Computer Science from Seoul
National University, Seoul, Korea, in 2000, 2002 and 2011,
respectively. From 2002 to 2005, she was with the Digital TV
Development Group, Samsung Electronics Company Ltd., Suwon City, Korea, as an Engineer, where she was involved
in bus architecture and MPEG decoder development. She is currently working
as a research professor in Electrical Engineering and Computer Science at
Seoul National University, Korea. Her research interests include algorithm
and architecture design of video coding for HEVC and H.264/AVC and
configurable video coding for real time systems.
Kyujoong Lee received the B.S. degree in electrical
engineering from Seoul National University, Seoul, Korea,
in 2002 and the M.S. degree in electrical engineering from
University of Southern California, Los Angeles, USA, in
2008. He is working toward the Ph.D. degree in electrical
engineering at Seoul National University. From 2002 to
2005, he was with Com2us Corporation, Seoul, Korea, as a
developer. His major research interests include the algorithm and architecture
of H.264/AVC and SVC and noise reduction of video stream.
Tae-Sung Kim received the B.S. degree in electrical
electronic engineering from Pusan National University,
Pusan, Korea, in 2010. He is working toward the M.S. degree in
electrical engineering at Seoul National University. His
research interests include the algorithm and architecture of
H.264/AVC and HEVC.
Hyuk-Jae Lee received the B.S. and M.S. degrees in
Electronics Engineering from Seoul National University,
Korea, in 1987 and 1989, respectively, and the Ph.D. degree
in Electrical and Computer Engineering from Purdue
University at West Lafayette, Indiana, in 1996. From 1998 to
2001, he worked at the Server and Workstation Chipset
Division of Intel Corporation in Hillsboro, Oregon as a senior
component design engineer. From 1996 to 1998, he was on the faculty of the
Department of Computer Science of Louisiana Tech University at Ruston,
Louisiana. In 2001, he joined the School of Electrical Engineering and
Computer Science at Seoul National University, Korea, where he is currently
working as a Professor. He is a founder of Mamurian Design, Inc., a fabless
SoC design house for multimedia applications. His research interests are in the
areas of computer architecture and SoC design for multimedia applications.