
    C. E. Rhee et al.: A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their Applications to High Efficiency Video Coding 1375

    Contributed Paper

    Manuscript received 10/15/12

    Current version published 12/28/12

    Electronic version published 12/28/12. 0098 3063/12/$20.00 2012 IEEE

    A Survey of Fast Mode Decision Algorithms

    for Inter-Prediction and Their Applications

    to High Efficiency Video Coding

    Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee

    Abstract: The emerging High Efficiency Video Coding (HEVC) standard attempts to improve the coding efficiency by a factor of two over H.264/AVC using new compression tools with high computational complexity. This increased complexity makes real-time execution with reasonable computing power one of the critical concerns for the commercialization of HEVC. The large number of prediction modes is the main cause of the increased complexity of HEVC. Thus, a fast prediction mode decision is needed to reduce the computational complexity. To take advantage of the large body of previous work and to find a guide for application to HEVC, this paper presents a survey of these efforts for the previous standards, especially H.264/AVC, and examines whether the previous algorithms are applicable to HEVC. To this end, previous algorithms are categorized and the effectiveness of each category for HEVC is evaluated. For this evaluation, a previous algorithm is modified for HEVC when it is not directly applicable. Simulation results show that most previous algorithms, with slight modification, generally improve the encoding speed with a relatively small degradation of the compression efficiency. Among them, hierarchical mode decision is especially effective, whereas mode pre-decision using motion or spatial homogeneity often yields inaccurate results.

    1

    Index Terms: Fast inter-prediction, Mode decision, Hardware encoder, HEVC, H.264/AVC.

    1 This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A2A2A06047297).

    Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee are with the Inter-university Semiconductor Research Center (ISRC), Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea (e-mail: [email protected], [email protected], [email protected], [email protected]).

    I. INTRODUCTION

    Video compression technologies as well as video applications such as video conferencing, streaming, video storage and communication have attracted industry attention due to the increasing popular demand for high-definition (HD) video content. H.264/AVC [1] has been regarded as the state-of-the-art video coding standard and is widely used. Recently, the next-generation video coding standard [2]-[4], known as High Efficiency Video Coding (HEVC), has been developed by ISO/IEC MPEG and ITU-T VCEG. In the emerging HEVC standard, several new features are introduced, including a flexible block structure, an increased number of intra-coding directions, sophisticated interpolation filters, various in-loop filters, and enhanced entropy coding schemes. The HEVC standard aims at a bitrate saving by a factor of two over H.264/AVC at the expense of an increase in computational complexity.

    Like H.264/AVC, mode decisions with motion estimation (ME) remain among the most time-consuming computations in HEVC. In an inter-prediction mode decision, a full-search algorithm searches every possible block size and refines the results from integer-pel to quarter-pel resolution. Thus, a full-search algorithm guarantees the highest level of compression performance. However, the considerable computational complexity of the mode decision limits the encoding speed. Moreover, the main target resolution of HEVC is full HD (1920×1080) and beyond. Therefore, fast inter-prediction is not only an important challenge but also an urgent problem to be solved for HEVC compression to be used in real-time consumer electronic devices.

    Extensive research effort has been conducted to reduce the

    computational complexity for inter-prediction for H.264/AVC,

    pursuing an effective trade-off between the rate-distortion (RD)

    drop and the speed-up. In order to deal with the similar challenge

    for HEVC, this paper reviews principal algorithms which have already been attempted for H.264/AVC. A survey of these various

    algorithms and an evaluation of their contributions and limitations

    provide valuable leads for the development of fast algorithms for

    HEVC inter-prediction. Major differences between H.264/AVC

    and HEVC are also investigated from an algorithmic and

    architectural perspective. Previous algorithms for the fast

    H.264/AVC inter-predictions are then modified and re-designed

    for HEVC inter-predictions so as to explore the possibilities for

    application to HEVC.

    The rest of the paper is organized as follows. Section II gives

    an overview of inter-prediction in HEVC. Previous approaches

    for fast inter-predictions in H.264/AVC are surveyed in Section

    III, and the application of the fast inter-mode selection algorithms

    to HEVC is presented in Section IV. Conclusions are given in

    Section V.

    II. OVERVIEW OF INTER-PREDICTION IN HEVC

    A. Inter-Prediction Algorithm in HEVC

    To achieve high compression performance for high-

    resolution videos, HEVC defines the coding unit (CU) as the

    basic processing unit instead of the macroblock (MB).


    1376 IEEE Transactions on Consumer Electronics, Vol. 58, No. 4, November 2012

    Unlike an MB, whose size is fixed at 16×16 pixels, the size of a CU is not fixed and varies from 8×8 to 64×64. A large CU reduces the amount of motion information. Thus, the compression efficiency is improved in a lossless manner, especially for high-resolution videos. A CU can be partitioned into smaller CUs, and the structure among different CUs is represented by a quad-tree. The depth of this tree can be as large as four. The largest CU, at depth 0, is denoted as the LCU. For a CU whose size is denoted by 2N×2N, predictions are performed for block sizes of 2N×2N, 2N×N, N×2N and N×N. The processing unit for prediction is called the prediction unit (PU).
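The CU quad-tree and the PU shapes described above can be enumerated in a few lines. This is a minimal sketch, assuming a 64×64 LCU with depths 0 to 3; the function names `cu_sizes` and `pu_partitions` are hypothetical and do not come from the HM software.

```python
def cu_sizes(lcu_size=64, max_depth=3):
    """Return {depth: CU side length} for a quad-tree rooted at the LCU."""
    # Each quad-tree split halves the side length of the CU.
    return {depth: lcu_size >> depth for depth in range(max_depth + 1)}


def pu_partitions(cu_size):
    """Return the (width, height) PU shapes tested for a 2Nx2N CU."""
    n = cu_size // 2
    return {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN": [(cu_size, n)] * 2,
        "Nx2N": [(n, cu_size)] * 2,
        "NxN": [(n, n)] * 4,
    }


if __name__ == "__main__":
    for depth, size in cu_sizes().items():
        shapes = {name: pus[0] for name, pus in pu_partitions(size).items()}
        print(depth, size, shapes)
```

A full search has to evaluate every PU shape at every depth, which is why the number of candidate modes grows so quickly compared with a single 16×16 MB.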

    The HM5.0 reference software offers fast algorithms that reduce the prediction time through an early decision of the final prediction mode after evaluating only a subset of the prediction modes. One of these is the early SKIP mode decision, in which the computation for the SKIP mode is performed first for a 2N×2N PU. If the RD cost is less than the average SKIP cost, as accumulated from the previous SKIP modes, not only the predictions of the other PU types at the same depth but also all predictions for the further depths are omitted. Other fast algorithms are the early CU determination and the coded block flag (CBF)-based fast mode decision, denoted as ECU and CFM, respectively. The CBF indicates whether a block has a non-zero residual; an all-zero CBF means that the residual is zero. In the ECU, the RD costs for the SKIP, 2N×2N, 2N×N and N×2N inter-modes as well as the intra-mode are calculated at the current depth. If the SKIP mode cost is the smallest, predictions for CUs smaller than the current CU are not performed. Meanwhile, the CFM is used to select the PU size in the current CU and to save computation on predictions of less probable PU sizes. The predictions for the SKIP, 2N×2N, 2N×N and N×2N PUs are processed in sequence. If the CBF of the current PU happens to be all zeros, the prediction is terminated and the computation for the remaining PU sizes is saved, as a zero CBF indicates that the RD performance is adequate when the current PU is chosen as the best mode. Even if the current PU differs from the best PU, the difference in RD cost between the two may be negligible.
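The three early-termination rules above can be sketched as follows. This is an illustrative outline only, not the HM5.0 code: the class `EarlySkip`, the functions `cfm_search` and `ecu_should_split`, and the `evaluate` callback (assumed to return an RD cost and a flag telling whether the CBF is all zero) are hypothetical names.

```python
class EarlySkip:
    """Running-average threshold for the early SKIP test (a sketch)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def should_stop(self, skip_cost):
        # Stop when the SKIP cost is below the average of earlier SKIP costs.
        return self.count > 0 and skip_cost < self.total / self.count

    def record(self, skip_cost):
        # Accumulate the cost of a block that was finally coded as SKIP.
        self.total += skip_cost
        self.count += 1


def cfm_search(pu_types, evaluate):
    """Evaluate PU types in order and stop at the first all-zero CBF.

    Returns (best_pu, best_cost, evaluated_pus).
    """
    best_pu, best_cost, tried = None, float("inf"), []
    for pu in pu_types:
        cost, cbf_is_zero = evaluate(pu)
        tried.append(pu)
        if cost < best_cost:
            best_pu, best_cost = pu, cost
        if cbf_is_zero:
            break  # remaining PU sizes are not evaluated
    return best_pu, best_cost, tried


def ecu_should_split(costs):
    """ECU: do not visit smaller CUs when SKIP is cheapest at this depth."""
    return min(costs, key=costs.get) != "SKIP"
```

In this sketch the early SKIP test and the ECU prune further depths, while the CFM prunes the remaining PU types within the current CU.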

    B. Hardware Implementation for an HEVC Encoder

    This subsection examines the impact of the HEVC coding structure on the hardware implementation. In this paper, it is assumed that the pipeline architecture for the HEVC hardware encoder may be similar to the widely used architectures for H.264/AVC encoders [5]-[9], where the integer motion estimation (IME) is performed in stage 1, whereas the fractional motion estimation (FME) with the motion compensation (MC) is performed in stage 2. As the hardware encoder takes advantage of parallel and/or pipelined execution of multiple hardware resources, the dependence between computations in the HEVC standard often causes an unexpected slow-down.

    To support the parallel execution of IMEs for all block sizes in H.264/AVC, the sums of absolute differences (SADs) for all 4×4 blocks of an MB are calculated simultaneously. The obtained SAD values are combined in the variable-block-size (VBS) adder tree, and 41 SADs for all block sizes are generated in one cycle. The problem is that the rate term of the RD cost function can be computed only after the motion vectors (MVs) of the neighboring blocks are determined, which causes dependence among the IMEs for blocks of various sizes. In addition, when IME and FME are processed not serially but in a pipelined manner, the left MB is still in the FME stage. Thus, the best mode and the best MV of the left block are not available. In H.264/AVC, a modified MV predictor (MVP) is applied to all 41 blocks [8]. Instead of the median of the MVs of the left, upper and upper-right blocks, the median of the MVs of the upper-left, upper and upper-right MBs is used for all 41 blocks equally in order to facilitate the parallel processing and the MB pipelining.
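The count of 41 SADs follows from the H.264/AVC partition sizes of a 16×16 MB: 16 (4×4) + 8 (8×4) + 8 (4×8) + 4 (8×8) + 2 (16×8) + 2 (8×16) + 1 (16×16). The sketch below derives all of them from the sixteen 4×4 SADs. It is a software illustration with a hypothetical function name `vbs_sads`; it sums each block directly rather than modelling the staged adders of a hardware tree.

```python
def vbs_sads(sad4x4):
    """Combine a 4x4 grid of 4x4-block SADs into the SADs of all 41 blocks.

    Returns {(width, height): [SAD of each block in raster order]}.
    """
    def merge(width, height):
        # Number of 4x4 cells covered by one block, horizontally and vertically.
        bw, bh = width // 4, height // 4
        return [
            sum(sad4x4[r + i][c + j] for i in range(bh) for j in range(bw))
            for r in range(0, 4, bh)
            for c in range(0, 4, bw)
        ]

    shapes = [(4, 4), (8, 4), (4, 8), (8, 8), (16, 8), (8, 16), (16, 16)]
    return {shape: merge(*shape) for shape in shapes}


if __name__ == "__main__":
    grid = [[1] * 4 for _ in range(4)]
    sads = vbs_sads(grid)
    print(sum(len(v) for v in sads.values()))
```

Because every larger SAD is a plain sum of 4×4 SADs, the distortion terms carry no dependence between block sizes; the dependence discussed above comes only from the rate term through the MVP.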

    The solution for parallel IME executions for H.264/AVC can be applied to an HEVC encoder. The MVP of the 2N×2N PU in the LCU is used for all blocks equally. When this MVP is derived, the left and below-left candidates among the spatial MVP candidates are excluded. With this modification of the MVP derivation, the IME execution in HEVC has a large degree of parallelism. Moreover, the parallelism in IME execution for HEVC is larger than that for H.264/AVC. In H.264/AVC, the parallel execution of IME is done for a 16×16 MB and its sub-blocks, whereas a 64×64 LCU and all blocks smaller than the LCU can have their IMEs processed in parallel in HEVC. When the same search range and search scheme are used for H.264/AVC and HEVC, a 16×16 MB and a 64×64 LCU are expected to have an identical IME time.

    For FME, it is more difficult to exploit available parallelism than for IME because the two-step FMEs for half- and quarter-pixel precisions should be performed sequentially. Furthermore, the modified MVP or the mode reduction generally decreases the compression efficiency more seriously than IME fast computation algorithms. Besides, recent studies on H.264/AVC hardware encoders [10]-[16] also show that the speed-up of the execution time is easier for IME than for FME. Thus, in H.264/AVC, FME is usually conducted one by one for the 41 blocks in a 16×16 MB. As a result, the execution time for FME with the MC is most likely to be larger than that for IME. Even when additional hardware resources are used for a parallel FME execution [12], the encoding time for an LCU is most likely determined by the time for the FME followed by the MC and mode decision.

    III. FAST INTER-MODE SELECTION ALGORITHMS FOR H.264/AVC

    In this section, fast inter-mode selection algorithms proposed for H.264/AVC are surveyed. An effective pre-selection of prediction block sizes is crucial for fast encoding. Reducing the number of prediction block sizes often incurs an RD drop. An effective trade-off between the RD drop and the speed-up has been one of the main research subjects. The previous algorithms are categorized according to the decision stage and criteria, as shown in the classification tree in Fig. 1.


    Fast inter-mode selection
      ME pre-decision
        Neighboring (spatial and temporal) information
        Motion characteristics of the current MB
        Spatial characteristics of the current MB
      Hierarchical decision
        Remaining prediction according to the prior prediction result
        Further prediction by comparing the prior predictions
      FME pre-decision
        Mode pre-decision based on the rate-distortion cost from IME
        Reduction of FME calculation from the reuse of integer-pel MVs

    Fig. 1. Classification tree of fast inter-predictions for H.264/AVC

    There are roughly three categories of algorithms for fast inter-mode prediction. In the first category, candidate block sizes are determined prior to ME, and prediction operations including ME are performed only for the selected candidate block sizes. This category is further classified into three sub-categories. In the first sub-category, spatial and/or temporal correlation in a video is used to select candidates, and the degree of correlation is obtained from neighboring information. For instance, if an MB is surrounded by neighboring MBs coded in the DIRECT or SKIP mode, the video sequence is assumed to be changing smoothly and the motion is assumed to be similar to that in the neighboring area. In this case, the current MB is very likely to be coded in the DIRECT or SKIP mode or with a large block size such as 16×16 [17]-[22]. In a similar manner, various algorithms [23]-[27] search for the best block size based on spatial and temporal homogeneity investigations of the neighboring blocks.

    The algorithms in the second sub-category take advantage of the correlation between the motion homogeneity and the best block size. Natural video sequences include stationary or motionless regions for which the optimal block sizes are mostly large. Thus, the MVs of spatially and temporally adjacent MBs are used to classify the motion characteristics [28], whereas the absolute difference between consecutive frames is used to detect motion homogeneity [26][29][30]. Liu et al. [33] estimate the motion homogeneity of the current MB from MVs generated by ME on the 4×4 blocks inside the current MB.

    The algorithms in the third sub-category predict candidate block sizes from the spatial characteristics of the current MB. A frame-level edge map or the variance of the MB is estimated to detect a homogeneous region [23][31][32]. In other studies, the image is down-sampled and pre-encoded [34][35]. The candidate block sizes are obtained by comparing the RD costs estimated during pre-encoding.
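Two of the pre-decision cues above, the modes of the neighboring MBs and the pixel variance of the current MB, can be illustrated with a small sketch. The reduced candidate set returned for smooth neighborhoods and the function names `candidate_modes` and `block_variance` are assumptions made for illustration; the surveyed papers each define their own rules and thresholds.

```python
ALL_MODES = ("SKIP", "16x16", "16x8", "8x16", "8x8")


def candidate_modes(neighbor_modes, all_modes=ALL_MODES):
    """Keep only large-block candidates when all neighbors are SKIP/DIRECT."""
    smooth = {"SKIP", "DIRECT"}
    if neighbor_modes and all(mode in smooth for mode in neighbor_modes):
        # Assumed reduced set: motion is expected to follow the neighbors.
        return ("SKIP", "16x16")
    return tuple(all_modes)


def block_variance(pixels):
    """Population variance of the pixel values of a block (list of rows)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return sum((p - mean) ** 2 for p in flat) / len(flat)
```

A low `block_variance` would mark a spatially homogeneous block, for which a similar restriction to large block sizes could be applied.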

    The algorithms of the second category explore the best block size in a hierarchical manner. In other words, certain block sizes are estimated prior to the other block sizes, and the decision on a further block size search is then made using the result of the prediction of the prior block sizes [29][31][36]-[44]. In the first sub-category, the result of the prior prediction is tested, and the decision regarding a further prediction is made based on the test result. One of the most popular algorithms is the early SKIP mode decision, where predictions of the remaining block sizes are performed only when the early SKIP condition is not satisfied. Kannangara et al. [43] make the early SKIP mode prediction by estimating a Lagrangian RD cost function which incorporates an adaptive model for the Lagrangian multiplier parameter based on local sequence statistics. Other studies [37][38][41][44] propose a simple threshold-based algorithm to detect zero-coefficient blocks. Zero coefficients indicate a small distortion in the RD cost function, so the SKIP mode decision is made early without the expensive computation of the real RD cost. Alternatively, if the RD cost of the SKIP mode is less than a threshold, the SKIP mode is selected as the best mode [36]. Here, the threshold is defined as N_bits × the Lagrangian multiplier parameter, where N_bits is the minimum number of bits required for a non-SKIP mode.
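One way to read the threshold in [36]: with an RD cost of the form J = D + λ·R, any non-SKIP mode costs at least λ·N_bits even if its distortion were zero, so a SKIP cost below that bound cannot be beaten. The sketch below shows the test under that reading; the cost form and the function names `rd_cost` and `early_skip` are assumptions, not definitions taken from the cited study.

```python
def rd_cost(distortion, bits, lagrange_multiplier):
    """Lagrangian RD cost J = D + lambda * R (assumed form)."""
    return distortion + lagrange_multiplier * bits


def early_skip(skip_cost, min_non_skip_bits, lagrange_multiplier):
    """Select SKIP early when its cost is below N_bits * lambda."""
    threshold = min_non_skip_bits * lagrange_multiplier
    return skip_cost < threshold


if __name__ == "__main__":
    # Lower bound on any non-SKIP mode: zero distortion, minimum bits.
    bound = rd_cost(0.0, 4, 10.0)
    print(bound, early_skip(30.0, 4, 10.0))
```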

    In the second sub-category, the results of the prior predictions are compared, and further prediction is determined according to the comparison result. In the algorithms of Yu and Choi's studies [37][40], the RD costs of block sizes are compared in order from large to small block sizes. If the current RD cost is larger than the RD cost of the larger block size, further searches for blocks smaller than the current block are stopped. In Yin and Lee's studies [36][42], the RD costs of the square blocks, 16×16, 8×8 and 4×4, are tested first. If the tendency of these RD costs is not monotonic, all other non-square blocks need to be tested. Otherwise, only block sizes between the best two square block sizes are searched. In addition to the hierarchical decision approach, many studies have proposed a hybrid solution which selects candidate prediction block sizes prior to ME using the information mentioned in the first category, after which prediction block sizes are searched in a hierarchical manner [18][31][37].
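The two comparison rules can be sketched as follows. This is a simplified reading of the descriptions above: the function names, the `rd_cost_of` callback and the exact lists of non-square sizes are hypothetical, and ties are treated as monotonic.

```python
def large_to_small_search(sizes, rd_cost_of):
    """Search sizes from large to small; stop once the cost increases.

    Returns (best_size, evaluated_sizes).
    """
    evaluated, previous = [], None
    for size in sizes:
        cost = rd_cost_of(size)
        evaluated.append((size, cost))
        if previous is not None and cost > previous:
            break  # smaller blocks are not searched
        previous = cost
    best = min(evaluated, key=lambda item: item[1])[0]
    return best, [size for size, _ in evaluated]


def sizes_after_square_test(cost16, cost8, cost4):
    """Pick the non-square sizes to test after the 16x16, 8x8, 4x4 costs."""
    rising = cost16 <= cost8 <= cost4
    falling = cost16 >= cost8 >= cost4
    if not (rising or falling):
        return ["16x8", "8x16", "8x4", "4x8"]  # no clear trend: test all
    if rising:
        return ["16x8", "8x16"]  # best two squares are 16x16 and 8x8
    return ["8x4", "4x8"]  # best two squares are 8x8 and 4x4
```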

    In the first sub-category of the third category, IME is performed for every block size. Next, the results of IME are used to select the candidate block sizes for FME. In the simplest approach, called mode pre-decision (MPD), the best combination of various block sizes (VBS) for an MB is selected with the IME results. FME simply refines the integer-pel MV of the selected block size to quarter-pel precision. MPD suffers from a significant RD drop because the best block size from the IME may change after refinement in the FME. To achieve a better trade-off between the compression efficiency and computational complexity, the advanced MPD (AMPD) is proposed [45]. In the AMPD, more than one candidate block size is selected for the subsequent FME operations. Seven partitions, four 8×8 partitions together with the 16×8, 8×16 and 16×16 partitions, are sorted according to their IME cost. As a result, N (N = 1 to 7) partitions are selected for the FME. In AMPD2 [45], one candidate is selected from the 16×8, 8×16 and 16×16 partitions, whereas two are selected from the 8×8 partitions. Similarly, two partitions are selected by mode filtering (MF) for the FME operation from the IME phase [10]. One is selected from the 16×8, 8×16 and 16×16 partitions and the other is selected from the 8×8, 16×8, 8×16 and 16×16 partitions, where the 8×8 partition consists of the best sub-block sizes. In this MF algorithm, the number of selected block sizes is relatively low and larger block sizes are selected more frequently than in AMPD2.

    In the second sub-category, computation-reuse techniques are adopted. Shao et al. [46] propose that the FME for each block size be performed one by one. If the integer MV of the current block is identical to that of a block already processed, no FME computation needs to be performed. In particular, in a homogeneous region, adjacent blocks tend to have the same integer MV after IME. Therefore, this reuse technique reduces the calculation for the prediction of the block size with no RD drop. The same algorithm is applied to block sizes larger than an 8×8 block [47]. The FMEs for blocks smaller than 8×8 are omitted. Thus, the encoding time decreases more than that of Shao's algorithm with a reasonable PSNR drop.
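Both FME pre-decision ideas are easy to express in code. The sketch below shows an AMPD-style selection of the N cheapest partitions and a reuse cache keyed by the integer MV. The names `select_fme_candidates` and `fme_with_reuse` and the `refine` callback are hypothetical, and treating the FME result as a function of the integer MV alone is a simplification of the reuse condition described above.

```python
def select_fme_candidates(ime_costs, n):
    """Return the n partitions with the lowest IME cost."""
    return sorted(ime_costs, key=ime_costs.get)[:n]


def fme_with_reuse(blocks, refine):
    """Run FME once per distinct integer MV and reuse the result.

    blocks: list of (block_id, integer_mv) pairs.
    refine: callback mapping an integer MV to a refined (fractional) MV.
    Returns ({block_id: refined_mv}, number_of_refine_calls).
    """
    cache, refined, calls = {}, {}, 0
    for block_id, integer_mv in blocks:
        if integer_mv not in cache:
            cache[integer_mv] = refine(integer_mv)
            calls += 1
        refined[block_id] = cache[integer_mv]
    return refined, calls
```

In a homogeneous region many blocks share one integer MV, so the number of `refine` calls can be much smaller than the number of blocks.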

    IV. APPLICATION OF FAST INTER-MODE SELECTION ALGORITHMS TO HEVC

    As explained in Section II, HEVC supports larger and more varied block sizes than H.264/AVC. If an early decision is made to select the prediction block size, the computational complexity is significantly reduced by omitting the remaining predictions. Recently, several fast inter-mode selection algorithms have been proposed for HEVC. However, it is important first to take advantage of the considerable amount of previous work and to find a guide for application to HEVC. In this paper, several previous algorithms proposed for H.264/AVC are modified and tested for HEVC. For simplicity, algorithms which require an additional calculation to judge the texture characteristic or motion homogeneity, such as a frame-level edge map, are not used. The following algorithms are implemented in the HM5.0 reference software.

    A. Prediction Block Size Pre-Decision

    In Section III.A, the block size prediction algorithms are classified into three sub-categories that utilize spatial/temporal correlations, motion vector information and the spatial characteristics of the current MB, respectively. This subsection evaluates the effectiveness of the three types of algorithms when they are used for early block size decisions in HEVC. To this end, the relationship between the above information and the depth of the current LCU is examined with experiments. Note that the depth of the LCU determines the block size in HEVC. Ten video sequences, Akiyo, Container, Foreman, Sean and Stefan with a resolution of 352×288 as well as Aspen, BasketBallDrive, BQTerrace, Cactus and Kimono1 with a resolution of 1920×1080, are used. The 352×288 test videos are the same sequences used in the research for H.264 [17][28].

    TABLE I
    CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 352×288 TEST VIDEOS
    (each entry: average, with the variance in parentheses)

    Max(Neighboring MVs)
    QP  Depth  Akiyo             Container         Foreman           Sean              Stefan
    20  0      -                 -                 -                 -                 -
    20  1      0.23 (0.41)       0.22 (0.38)       5.26 (21.81)      3.81 (370.40)     7.25 (91.98)
    20  2      0.51 (0.56)       0.28 (0.34)       7.05 (120.28)     2.73 (222.79)     13.96 (295.62)
    20  3      0.70 (2.01)       0.49 (0.51)       6.71 (145.70)     10.99 (1151.85)   16.76 (382.39)
    32  0      -                 -                 -                 -                 -
    32  1      0.43 (0.45)       0.37 (0.86)       9.12 (1042.00)    5.42 (547.51)     11.89 (171.20)
    32  2      1.21 (2.29)       0.43 (0.36)       8.91 (203.11)     6.87 (644.77)     10.81 (194.82)
    32  3      1.00 (0.86)       0.67 (0.67)       5.49 (55.16)      6.74 (542.93)     10.84 (189.52)

    Variance of the current LCU (×10^3)
    QP  Depth  Akiyo             Container         Foreman           Sean              Stefan
    20  0      -                 -                 -                 -                 -
    20  1      2.04 (6258.15)    3.41 (10528.18)   2.75 (7506.87)    1.81 (736.33)     3.33 (4891.66)
    20  2      2.26 (3020.20)    2.67 (7049.42)    3.81 (11020.76)   2.60 (1746.03)    3.08 (3889.57)
    20  3      2.93 (4844.59)    1.96 (4592.76)    2.65 (6035.12)    2.30 (1767.59)    2.56 (2883.81)
    32  0      -                 -                 -                 -                 -
    32  1      2.51 (6869.12)    4.26 (9560.06)    3.93 (12065.05)   1.97 (806.67)     3.19 (5411.69)
    32  2      2.95 (3940.74)    2.96 (3748.08)    2.99 (6930.70)    3.07 (1279.87)    2.88 (2980.45)
    32  3      2.70 (3864.05)    1.18 (1387.00)    2.17 (1105.39)    2.65 (1885.03)    2.72 (1729.40)

    Max(Neighboring Depths)
    QP  Depth  Akiyo             Container         Foreman           Sean              Stefan
    20  0      -                 -                 -                 -                 -
    20  1      1.84 (0.82)       2.49 (0.39)       2.81 (0.18)       1.85 (0.76)       2.88 (0.13)
    20  2      2.85 (0.15)       2.78 (0.20)       2.80 (0.17)       2.75 (0.31)       2.95 (0.05)
    20  3      2.93 (0.07)       2.89 (0.13)       2.97 (0.03)       2.97 (0.03)       2.98 (0.03)
    32  0      -                 -                 -                 -                 -
    32  1      1.29 (0.44)       1.60 (0.63)       2.00 (0.54)       1.50 (0.59)       2.21 (0.55)
    32  2      2.14 (0.42)       2.33 (0.60)       2.29 (0.42)       2.27 (0.63)       2.55 (0.25)
    32  3      2.70 (0.23)       2.82 (0.28)       2.72 (0.21)       2.90 (0.09)       2.94 (0.06)

    TABLE II
    CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 1920×1080 TEST VIDEOS
    (each entry: average, with the variance in parentheses)

    Max(Neighboring MVs)
    QP  Depth  Aspen             BasketBallDrive   BQTerrace         Cactus            Kimono1
    20  0      69.63 (14436.13)  73.68 (40665.65)  27.17 (3582.74)   11.82 (15.78)     15.18 (624.55)
    20  1      44.90 (7211.57)   75.85 (22441.74)  40.38 (8694.43)   37.86 (16047.76)  18.93 (721.20)
    20  2      33.70 (3316.01)   62.34 (10757.60)  20.82 (3023.15)   12.94 (3732.76)   20.59 (893.41)
    20  3      31.21 (3590.61)   60.76 (7221.25)   11.81 (3039.67)   8.14 (509.86)     24.33 (1560.22)
    32  0      16.71 (1353.52)   36.81 (2521.04)   5.01 (101.63)     1.79 (283.58)     13.59 (440.45)
    32  1      31.24 (3348.42)   64.82 (8081.78)   4.31 (52.68)      7.98 (426.45)     19.02 (1108.55)
    32  2      12.48 (75.76)     63.44 (4560.53)   4.78 (68.73)      12.27 (856.59)    23.72 (2576.14)
    32  3      55.96 (4838.08)   57.45 (2710.68)   4.02 (23.66)      14.01 (1307.32)   26.78 (6792.91)

    Variance of the current LCU (×10^3)
    QP  Depth  Aspen             BasketBallDrive   BQTerrace         Cactus            Kimono1
    20  0      0.03 (12.57)      0.19 (91.78)      0.07 (0.45)       0.06 (29.32)      0.19 (135.24)
    20  1      0.43 (432.46)     0.52 (761.29)     1.83 (10125.72)   0.96 (2365.81)    0.39 (363.05)
    20  2      0.46 (332.50)     0.54 (611.10)     1.80 (4497.16)    0.86 (1786.60)    0.44 (587.28)
    20  3      0.52 (356.23)     0.69 (666.90)     1.92 (3834.08)    1.00 (2284.00)    0.46 (669.37)
    32  0      0.20 (108.80)     0.23 (184.61)     0.99 (1854.24)    0.60 (875.66)     0.27 (275.87)
    32  1      0.77 (680.71)     0.99 (997.58)     2.90 (7925.55)    1.53 (4714.41)    0.45 (539.68)
    32  2      0.58 (236.52)     1.06 (640.00)     2.38 (4373.66)    1.03 (750.40)     0.55 (696.87)
    32  3      0.51 (121.25)     1.03 (601.11)     2.02 (3148.50)    0.98 (638.76)     0.68 (613.05)

    Max(Neighboring Depths)
    QP  Depth  Aspen             BasketBallDrive   BQTerrace         Cactus            Kimono1
    20  0      1.57 (0.68)       1.24 (0.75)       1.29 (0.81)       2.00 (0.70)       2.02 (0.53)
    20  1      2.21 (0.44)       2.51 (0.35)       2.17 (0.73)       2.83 (0.23)       2.40 (0.39)
    20  2      2.51 (0.32)       2.76 (0.20)       2.88 (0.13)       2.97 (0.03)       2.62 (0.27)
    20  3      2.77 (0.19)       2.95 (0.05)       3.00 (0.01)       2.99 (0.01)       2.82 (0.15)
    32  0      0.76 (0.60)       0.77 (0.61)       0.95 (0.93)       0.69 (0.81)       1.44 (0.57)
    32  1      1.64 (0.51)       1.69 (0.51)       1.92 (0.67)       1.99 (0.54)       1.82 (0.37)
    32  2      2.14 (0.50)       2.18 (0.38)       2.43 (0.47)       2.47 (0.36)       2.02 (0.30)
    32  3      2.07 (0.60)       2.58 (0.27)       2.81 (0.21)       2.81 (0.20)       2.34 (0.33)

    In Table I and Table II, the correlation between the depth of the current LCU and the information obtained from neighboring LCUs is presented for the 352×288 and 1920×1080 video sequences, respectively. The QP column gives the quantization parameter (QP) values, and the Depth column gives the depth of the current LCU. The maximum value among the absolute MVs of the neighboring LCUs is obtained for each LCU and presented in the Max(Neighboring MVs) columns of Tables I and II. The average and the variance (given in parentheses) of the maximum MVs for each depth are shown in these columns. For the low-resolution videos in Table I, depth 0 is seldom selected. For these cases, the corresponding cells are marked with "-". For videos at a resolution of 352×288 and with QP=20, it appears that the depth of the LCU increases as the average magnitude of the neighboring MVs increases. The variance is also not large in this case. This result is consistent with the data reported in [28], which uses H.264/AVC targeting 176×144 and 352×288 videos. For the 1920×1080 videos in Table II, however, the depth of the LCU does not increase along with the neighboring MVs, and the variance is very large. This indicates that there is little correlation between the depth of the current LCU and the neighboring MVs. In high-resolution videos, the MV values are quite large, even when the motion seems stationary. Sometimes, a large block size is preferred, even with fast and complex motion, because the texture, brightness or colors of the same object can change differently in every frame. In this case, an elaborate ME with a small block size cannot reduce the prediction error. Therefore, the observation leads to the conclusion that the correlation between the depth of each LCU and the neighboring MVs becomes small for high-resolution videos.

    If the pixel variance of a certain region is small, this region is likely to be spatially homogeneous and is probably encoded with a large block size. This possibility is tested and the results are presented in the "Variance of the current LCU" columns of Tables I and II, which show the correlation between the depth and the variance of the current LCU. For the 1920×1080 videos encoded with QP=20 in Table II, the variance of the current LCU is quite low when its depth is 0. However, in other cases, the correlation between the depth and the variance of the current LCU is not very strong. In HEVC, the number of block sizes is significantly larger than that in H.264/AVC. Moreover, blocks can be encoded in the SKIP mode not only at the LCU size but also in every CU. The changed SKIP mode decision and the larger number of block sizes make it difficult to find a strong correlation between the depth and the variance.

    The Max(Neighboring Depths) columns of Tables I and II present the correlation between the depth of the current LCU and the depth information of the neighboring LCUs. The depth of the current LCU becomes large as the neighboring depths increase, while the variance is quite small. These results show that the correlation between the depths of the current and neighboring LCUs is positive.

    From the above simulations, the neighboring depth information may be helpful for predicting the block size (or the LCU depth), whereas the MV or variance information may not be very useful.
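The per-depth averages and variances reported in Tables I and II can be gathered with a simple grouping step. The sketch below assumes one (depth, value) sample per LCU, where the value is a neighbor statistic such as the maximum neighboring depth. The function name `stats_by_depth` is hypothetical, and the use of the population variance is an assumption, since the paper does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pvariance


def stats_by_depth(samples):
    """Group (lcu_depth, value) samples and return {depth: (mean, variance)}."""
    groups = defaultdict(list)
    for depth, value in samples:
        groups[depth].append(value)
    return {
        depth: (mean(values), pvariance(values))
        for depth, values in sorted(groups.items())
    }


if __name__ == "__main__":
    demo = [(1, 2.0), (1, 4.0), (2, 3.0)]
    print(stats_by_depth(demo))
```

A statistic is a useful predictor of the LCU depth when its average moves steadily with the depth while its variance within each depth stays small, which is the pattern seen for the neighboring depths.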

    B. Hierarchical Decision of Prediction Block Size

    The algorithm of Yu [37] checks three conditions for an early SKIP mode decision. First, one of the neighboring blocks is a SKIP mode block. Second, the sum of absolute differences (SAD) of the current MB is less than the average SAD of the neighboring MBs. Here, the SAD is the difference between a block in the current frame and the co-located block in the reference frame. Lastly, the result of the fast transform-quantized coefficients is zero. In Table III, the early SKIP mode decision algorithm proposed in the HM5.0 reference software, denoted by ES, and Yu's algorithm [37] are tested. For the simulation of a hardware-based HEVC encoder, the encoding time to process an LCU is estimated by adding the time for the FME with the MC of each CU inside the LCU, as the stage of the FME with the MC operations takes the most time in the pipeline schedule, as discussed in Section II.B. The configurations for the encoding are low-complexity, low-delay, and P picture-only, and the number of reference frames is four at most. Twelve video sequences, BQMall, FlowerVase, Keiba and RaceHorses with a resolution of 832×480; FourPeople, KristenAndSara, Johnny and Vidyo1 with a resolution of 1280×720; and Aspen, BasketBallDrive, SnowMountain and Kimono1 with a resolution of 1920×1080, are used in the evaluation. There are 50 frames in each test sequence, and four QPs (20, 24, 28 and 32) are used. The first and the second columns represent the resolutions and test sequences used in the simulation. From the third to fifth columns, the increase in bitrate, the increase in PSNR and the time saved, denoted by ΔB, ΔP and ΔT, respectively, are shown when the ES proposed in the reference software is applied. The time is reduced by 60.76%, whereas the bitrate slightly decreases and the PSNR is degraded by 0.02 dB. Yu's algorithm [37] makes the early SKIP mode decision by considering neighboring information and the characteristics of the current CU, unlike the RD cost-based algorithm in the reference software. From the sixth to eighth columns, the two algorithms are used together to complement each other. The time is reduced by 69.82%.
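The three conditions of Yu's test and the time estimate used in the simulation can be sketched as below. It is assumed here that all three conditions must hold together, which the text implies but does not state explicitly; the function names and argument layout are hypothetical.

```python
def yu_early_skip(neighbor_modes, current_sad, neighbor_sads, quantized_coeffs):
    """Return True when the three early SKIP conditions are all satisfied."""
    # 1) At least one neighboring block was coded in the SKIP mode.
    has_skip_neighbor = "SKIP" in neighbor_modes
    # 2) The current SAD is below the average SAD of the neighbors.
    low_sad = bool(neighbor_sads) and (
        current_sad < sum(neighbor_sads) / len(neighbor_sads)
    )
    # 3) All transform-quantized coefficients are zero.
    all_zero = all(coeff == 0 for coeff in quantized_coeffs)
    return has_skip_neighbor and low_sad and all_zero


def lcu_time(cu_fme_mc_times):
    """Estimate the LCU encoding time as the sum of per-CU FME+MC times."""
    return sum(cu_fme_mc_times)


def time_saved_percent(reference_time, fast_time):
    """Time saving relative to the reference encoder, in percent."""
    return 100.0 * (reference_time - fast_time) / reference_time
```

Under this estimate, every CU whose FME is skipped by an early decision directly shortens the LCU time, which is how the time savings in Table III arise.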

    TABLE III
    RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY AN EARLY SKIP MODE DECISION

    Size       Videos              ES in HM                   ES in HM + Yu's
                                   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)
    832×480    BQMall               -0.32    -0.02    53.81     2.45    -0.13    67.62
               FlowerVase           -0.03    -0.01    71.87    -0.56    -0.09    81.95
               Keiba                -0.13    -0.01    38.96     2.76    -0.08    53.05
               RaceHorses           -0.18    -0.02    31.48     1.49    -0.08    39.98
    1280×720   FourPeople           -0.84    -0.03    82.55    -0.30    -0.09    88.00
               Johnny               -1.15    -0.03    78.36    -1.30    -0.07    84.50
               KristenAndSara       -1.06    -0.02    79.08    -0.63    -0.08    84.92
               Vidyo1               -0.62    -0.02    78.08    -0.29    -0.10    85.61
    1920×1080  Aspen                -0.19    -0.01    59.42     0.68    -0.05    68.72
               BasketBallDrive      -0.43    -0.02    54.68     0.97    -0.07    63.27
               SnowMountain         -0.50    -0.02    58.79    -2.13    -0.10    68.53
               Kimono1              -0.16    -0.01    42.09     0.77    -0.04    51.73
               Average              -0.47    -0.02    60.76     0.33    -0.08    69.82


    1380 IEEE Transactions on Consumer Electronics, Vol. 58, No. 4, November 2012

    The ECU and CFM algorithms are applied to HEVC and the results are tabulated in Table IV. From the third to fifth columns, the ECU algorithm is used alone. The encoding time is reduced by 48.03% and the RD drop is marginal. When the ECU and CFM algorithms are used together, the encoding time saved is 64.03%, whereas the PSNR is 0.07 dB less than that of ECU. When the three algorithms of ECU, CFM and ES are used together, only about 5% of time is additionally saved.

    According to the categorization in Section III, the early SKIP mode decision, ECU, and CFM are all classified as belonging to the first sub-category in the hierarchical decision. On the other hand, no algorithms for the second sub-category are defined in the HM5.0 reference software. The second sub-category algorithms for H.264/AVC are applied to HEVC compression and the effect on the RD performance and the speed-up is investigated. In a number of previous block-size-reduction algorithms, the prediction of the block size at the lower depth is performed first and the searches for deeper depths are then stopped if a certain condition is satisfied [18][31][37]. The following three algorithms are classified as belonging to the third sub-category in the hierarchical decision according to the categorization in Section III.A. Early termination of CU, which is similar to the algorithm proposed by Lee [31] (denoted as ETCU1 henceforth), is applicable for the reduction of the block size search. The predictions for the four CUs at depth (d+1) are performed after the prediction for the CU at depth (d). Every time the prediction for a CU at depth (d+1) is finished, the RD cost of that CU is accumulated and the accumulated cost is compared with the early termination threshold. If the current accumulated RD cost at depth (d+1) is larger than the threshold, the total RD cost of the four CUs at depth (d+1) is expected to be larger than that of the corresponding CU at depth (d). Thus, the ongoing prediction at depth (d+1) is terminated early. The threshold is derived from the RD cost at depth (d). In Yu's algorithm [37], if the RD cost of 2N×2N at depth (d+1) is greater than a quarter of the best RD cost at depth (d), further searches on the 2N×N and N×2N PUs at depth (d+1) as well as at deeper depths are not performed. This algorithm is denoted as ETCU2 henceforth. Another early termination algorithm proposed not performing an FME operation at each depth [18]. This strategy is denoted as FME_SKIP hereafter.

    The SKIP mode plays an important role in compression efficiency and SKIP mode prediction is, thus, always performed, even when various fast mode decision schemes are applied. The result of the SKIP mode prediction is obtained very quickly due to its low complexity as compared to other inter- and intra-predictions. If the ME cost as estimated in the middle of its computation is greater than the SKIP cost, the ME operation is terminated. The specific algorithm is as follows. After IME, the IME cost is compared to the cost of the SKIP mode using the condition C_FME_SKIP as defined in (1). Here, COST_SKIP is the cost of the SKIP mode, whereas COST_IME denotes the IME cost. If COST_SKIP is less than COST_IME multiplied by W_FME_SKIP, FME is not performed for the current block. The weight value, W_FME_SKIP, is chosen experimentally and is set to 0.8 because it is observed that the cost obtained from FME is approximately 80% of COST_IME on average. Therefore, the final cost of ME can be estimated as 0.8 × COST_IME, and this estimated ME cost is compared with COST_SKIP.

    $C_{\mathrm{FME\_SKIP}}:\ \mathrm{COST}_{\mathrm{SKIP}} < W_{\mathrm{FME\_SKIP}} \times \mathrm{COST}_{\mathrm{IME}}$    (1)
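    Condition (1) reduces to a single comparison once the SKIP cost and the IME cost of a block are known. The following is a minimal sketch under that reading; the function name and the example costs are hypothetical, and only the weight of 0.8 comes from the text.

```python
W_FME_SKIP = 0.8  # experimentally chosen weight reported in the text


def fme_can_be_skipped(cost_skip: float, cost_ime: float,
                       weight: float = W_FME_SKIP) -> bool:
    """Evaluate condition (1): COST_SKIP < W_FME_SKIP * COST_IME.

    weight * cost_ime is the estimated final ME cost after FME. When the SKIP
    cost is already below this estimate, FME is not expected to beat the SKIP
    mode, so it is not performed for the current block.
    """
    return cost_skip < weight * cost_ime


# Hypothetical costs: estimated FME cost = 0.8 * 1000 = 800.
skip_wins = fme_can_be_skipped(cost_skip=700.0, cost_ime=1000.0)
fme_needed = not fme_can_be_skipped(cost_skip=900.0, cost_ime=1000.0)
```

    Keeping the weight as a parameter makes it easy to trade time saving against RD loss, although the text reports results only for 0.8.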

    In Table IV, from the twelfth to fourteenth columns, the ETCU1 algorithm is used alongside the ECU and CFM algorithms. The encoding time is reduced by 68.26%, whereas the increase in bitrate and the PSNR drop are 1.88% and 0.23 dB, respectively. The ETCU2 algorithm, from the fifteenth to seventeenth columns, shows a time saving of 75.25%, but the RD drop is quite large (a 7.40% bitrate increase and a 0.37 dB PSNR drop on average). Lastly, from the eighteenth to twentieth columns, the simulation results are shown when the ECU, CFM and FME_SKIP algorithms are used together. A time saving of 85.95% is achieved, whereas the RD performance is much better than those of ETCU1 and ETCU2. For the three simulations, ECU+CFM+ETCU1, ECU+CFM+ETCU2 and ECU+CFM+FME_SKIP, additionally using the ES algorithm helps neither the time saving nor the RD performance.
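    The two CU-level early termination rules evaluated above, ETCU1 and ETCU2, can be sketched as follows. This is only an illustration: the text says that the ETCU1 threshold is derived from the RD cost at depth (d) but does not give the derivation, so the `scale` factor below is a hypothetical placeholder, and all function names and costs are made up.

```python
from typing import Iterable, Optional


def etcu1_stop_index(rd_cost_depth_d: float,
                     sub_cu_costs: Iterable[float],
                     scale: float = 1.0) -> Optional[int]:
    """ETCU1-style check while predicting the four CUs at depth d+1.

    The RD cost of each finished sub-CU is accumulated. As soon as the sum
    exceeds the threshold, prediction at depth d+1 is terminated. Returns the
    number of sub-CUs processed when termination triggers, or None if all
    sub-CUs were processed without exceeding the threshold.
    `scale` is an assumed way of deriving the threshold from the depth-d cost.
    """
    threshold = scale * rd_cost_depth_d
    accumulated = 0.0
    for count, cost in enumerate(sub_cu_costs, start=1):
        accumulated += cost
        if accumulated > threshold:
            return count
    return None


def etcu2_skip_further_search(rd_2nx2n_depth_d1: float,
                              best_rd_depth_d: float) -> bool:
    """ETCU2-style check: skip the 2NxN and Nx2N PUs at depth d+1 and all
    deeper depths when the 2Nx2N cost at depth d+1 exceeds a quarter of the
    best RD cost at depth d."""
    return rd_2nx2n_depth_d1 > best_rd_depth_d / 4.0


stopped_after = etcu1_stop_index(100.0, [40.0, 70.0, 30.0, 20.0])
ran_to_end = etcu1_stop_index(100.0, [20.0, 20.0, 20.0, 20.0])
```

    In the first hypothetical call the accumulated cost reaches 110 after the second sub-CU and exceeds the threshold of 100, so the remaining two sub-CUs are never predicted.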

    C. Decision of Prediction Block Sizes before FME

    In H.264/AVC, AMPD (or AMPD2) or MF has been successfully used for block size reduction. As explained in Section II, the reduction of the FME time is very important for real-time encoding for a hardware-based encoder

    TABLE IV
    RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY THE ECU AND CFM ALGORITHMS PROPOSED IN THE HM5.0 REFERENCE SOFTWARE

    Size       Videos              ECU                        ECU+CFM                    ES+ECU+CFM                 ECU+CFM+ETCU1              ECU+CFM+ETCU2              ECU+CFM+FME_SKIP
                                   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)
    832×480    BQMall               -0.87    -0.05    44.84    -1.47    -0.12    60.83    -1.81    -0.16    65.49     3.48    -0.25    64.04    19.78    -0.58    72.04    -1.10    -0.15    82.49
               FlowerVase           -1.77    -0.07    66.12    -3.97    -0.21    82.30    -4.11    -0.23    86.46     1.79    -0.52    84.82     5.44    -0.63    86.34    -2.63    -0.29    96.35
               Keiba                -0.51    -0.03    30.16    -1.13    -0.09    47.67    -1.36    -0.11    51.64     3.23    -0.17    53.90    10.49    -0.31    64.86    -0.70    -0.12    78.27
               RaceHorses           -0.27    -0.02    14.67    -0.66    -0.09    32.62    -0.93    -0.14    38.16     5.98    -0.27    43.83    16.76    -0.48    55.73    -0.18    -0.10    66.37
    1280×720   FourPeople           -1.15    -0.04    67.92    -1.89    -0.09    81.95    -2.42    -0.12    86.97     2.71    -0.28    83.15    10.20    -0.47    86.34    -1.85    -0.12    95.49
               Johnny               -1.96    -0.04    64.72    -3.95    -0.13    80.83    -4.85    -0.16    85.96    -0.36    -0.22    82.02     5.91    -0.39    85.71    -3.91    -0.18    95.05
               KristenAndSara       -1.79    -0.06    64.35    -2.79    -0.14    80.33    -3.52    -0.18    85.34     2.36    -0.30    81.70     9.21    -0.50    85.15    -2.67    -0.17    95.30
               Vidyo1               -1.89    -0.05    65.23    -2.80    -0.13    80.14    -3.24    -0.15    85.03     1.70    -0.30    82.14     6.88    -0.44    84.90    -2.85    -0.16    94.92
    1920×1080  Aspen                -0.34    -0.01    45.75    -1.01    -0.05    62.68    -1.24    -0.07    67.58     1.71    -0.12    67.94     1.47    -0.14    74.94    -0.61    -0.08    86.06
               BasketBallDrive      -0.53    -0.02    35.41    -1.27    -0.07    53.89    -1.72    -0.10    60.35     2.16    -0.12    61.24     3.69    -0.17    69.32    -1.17    -0.09    81.94
               SnowMountain         -2.31    -0.08    52.21    -3.30    -0.12    64.35    -3.46    -0.13    67.82    -3.20    -0.17    66.15    -3.11    -0.28    75.54    -3.10    -0.13    82.42
               Kimono1              -0.16    -0.01    24.94    -0.32    -0.03    40.79    -0.64    -0.07    46.65     1.04    -0.06    48.15     2.14    -0.10    62.14    -0.06    -0.05    76.73
               Average              -1.13    -0.04    48.03    -2.05    -0.11    64.03    -2.44    -0.13    68.95     1.88    -0.23    68.26     7.40    -0.37    75.25    -1.74    -0.14    85.95


    implementation. Thus, in this subsection, the above algorithms are applied to HEVC and their effectiveness is then tested through simulation. To apply AMPD2 or MF to the HEVC mode decision, candidate partitions should be defined during the IME phase. Fig. 2 shows one example of a block size prediction in the IME phase. In Clusters 1 and 2, there are three 64×64 CU partitions and three 32×32 CU partitions, respectively, whereas there are four 16×16 CU partitions in Cluster 3. For the 8×8 CU partition, the best block size is selected based on the IME cost. For the 16×16 CU partition, the IME costs of the 2N×2N, N×2N, 2N×N and N×N types are sorted in ascending order, whereas the IME costs of the 2N×2N, N×2N and 2N×N types are sorted in ascending order for the 32×32 and 64×64 CU partitions. Through this process, ten partitions in total are selected for FME, as shown in Fig. 2.
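    The candidate pre-selection described above amounts to ranking the partition types of each cluster by IME cost and passing only the cheapest ones to FME. The sketch below shows one reading of that step; the cluster names, the cost values and the helper names are hypothetical, and the per-cluster counts (3, 2, 2) are taken from the seven-candidate configuration evaluated in Table V.

```python
from typing import Dict, List


def select_candidates(ime_costs: Dict[str, float], keep: int) -> List[str]:
    """Sort partition types by IME cost (ascending) and keep the first `keep`."""
    ranked = sorted(ime_costs, key=lambda partition: ime_costs[partition])
    return ranked[:keep]


def candidates_for_fme(costs: Dict[str, Dict[str, float]],
                       keep: Dict[str, int]) -> Dict[str, List[str]]:
    """Apply the per-cluster selection to every cluster."""
    return {name: select_candidates(costs[name], keep[name]) for name in costs}


# Hypothetical IME costs per partition type in each cluster.
CLUSTER_COSTS = {
    "cluster1_64x64": {"2Nx2N": 900.0, "Nx2N": 950.0, "2NxN": 870.0},
    "cluster2_32x32": {"2Nx2N": 400.0, "Nx2N": 380.0, "2NxN": 420.0},
    "cluster3_16x16": {"2Nx2N": 50.0, "Nx2N": 40.0, "2NxN": 60.0, "NxN": 45.0},
}
# Seven candidates in total: three, two and two from Clusters 1, 2 and 3.
KEEP_SEVEN = {"cluster1_64x64": 3, "cluster2_32x32": 2, "cluster3_16x16": 2}

SELECTED = candidates_for_fme(CLUSTER_COSTS, KEEP_SEVEN)
TOTAL_SELECTED = sum(len(types) for types in SELECTED.values())
```

    Changing only the `keep` table reproduces the other configurations in the same way, for example (1, 1, 2) for four candidates.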

    [Figure: candidate partitions grouped into Cluster 1, Cluster 2 and Cluster 3]

    Fig. 2. Prediction modes pre-determined in the IME phase

    From the third to fifth columns in Table V, the RD performance degradation and the encoding time saved are shown when FME is performed for the ten candidate partitions of Fig. 2. The time saving is 30.15%, whereas the RD drop is marginal. From the sixth to eighth columns, seven candidates, three from Cluster 1, two from Cluster 2 and two from Cluster 3, are chosen. The time is reduced by 57.44%, whereas the increase in bitrate and the drop in the PSNR are 0.21% and 0.04 dB on average. From the ninth to eleventh columns, FME is performed for four candidate partitions. One from Cluster 1 and another one from Cluster 2 are selected, whereas two are selected from Cluster 3. The encoding time is reduced by 73.36%, whereas the increase in bitrate and the drop in the PSNR are 0.26% and 0.04 dB on average. From these simulations, this algorithm turns out to be very effective for speed-up without a significant RD degradation for all types of video sequences.

    D. Algorithm Evaluation

    From Sections IV.A to C, it can be inferred that, in HEVC, pre-decisions of prediction block sizes are very difficult, whereas hierarchical decisions or decisions based on the results from IME are useful for saving time. However, some of these algorithms offer a different degree of performance according to the video characteristics. In Fig. 3, the times saved by various algorithms are compared for the RaceHorses and FourPeople video sequences, denoted by black and gray bar graphs, respectively. The FourPeople sequence has slow motion and its texture is smooth, whereas the RaceHorses sequence includes fast and irregular motion. The hierarchical decision presented in the HM5.0 reference software, including the ES scheme, is very effective for the FourPeople sequence. However, for the RaceHorses sequence, the benefit from the ES, ECU and CFM algorithms is not large and is less than half of that for the FourPeople sequence. Another notable observation is that the combination of ES, ECU and CFM increases the time saving. However, the rate of increase is not significant, as the effects of those schemes overlap in many cases. When the ECU and CFM schemes are combined with the other hierarchical decision schemes, ETCU1, ETCU2 and FME_SKIP, the time saving for the RaceHorses sequence is improved substantially, whereas the time saving is increased only slightly for the FourPeople sequence. Unlike other hierarchical decision algorithms, the AMPD algorithms show similar performance for both video sequences. The time saving increases as the number of candidates is reduced.

    As shown in Fig. 3, most algorithms show significant time savings for the FourPeople sequence, whereas the variation in the saved time is very large for the RaceHorses sequence. Only four combinations, ECU+CFM+ETCU2, ECU+CFM+FME_SKIP and the AMPD algorithms with 7 and 4 candidates, show time savings of over 50% for both FourPeople and RaceHorses.
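    The selection of combinations that save more than 50% of the time for both sequences can be checked directly against the ΔT values of the RaceHorses and FourPeople rows in Tables III to V. The snippet below is a small sketch of that check; the dictionary layout and function name are illustrative, and treating this set of eleven configurations as the ones compared in Fig. 3 is an assumption.

```python
from typing import Dict, List, Tuple

# Time saved in percent, copied from Tables III to V: (RaceHorses, FourPeople).
TIME_SAVED: Dict[str, Tuple[float, float]] = {
    "ES": (31.48, 82.55),
    "ES+Yu": (39.98, 88.00),
    "ECU": (14.67, 67.92),
    "ECU+CFM": (32.62, 81.95),
    "ES+ECU+CFM": (38.16, 86.97),
    "ECU+CFM+ETCU1": (43.83, 83.15),
    "ECU+CFM+ETCU2": (55.73, 86.34),
    "ECU+CFM+FME_SKIP": (66.37, 95.49),
    "AMPD 10 candidates": (30.23, 30.08),
    "AMPD 7 candidates": (55.92, 57.71),
    "AMPD 4 candidates": (71.73, 73.87),
}


def robust_algorithms(table: Dict[str, Tuple[float, float]],
                      minimum: float = 50.0) -> List[str]:
    """Return the configurations whose time saving exceeds `minimum` percent
    for both sequences, in table order."""
    return [name for name, (race_horses, four_people) in table.items()
            if race_horses > minimum and four_people > minimum]


ROBUST = robust_algorithms(TIME_SAVED)
```

    The result contains exactly the four combinations named in the text, which confirms that the statement is consistent with the tabulated values.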

    Fig. 3. Algorithm comparison in terms of the time saved

    TABLE V
    RD PERFORMANCE DEGRADATION AND ENCODING TIME SAVED ACCORDING TO MODES DETERMINED IN THE IME PHASE

    Size       Videos              10 candidates              7 candidates               4 candidates
                                   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)   ΔB (%)  ΔP (dB)   ΔT (%)
    832×480    BQMall                0.25    -0.03    30.23     0.58    -0.05    57.24     0.48    -0.05    72.98
               FlowerVase           -0.05    -0.04    30.23     0.20    -0.07    58.19     0.29    -0.07    73.93
               Keiba                 0.16    -0.02    30.23     0.75    -0.05    56.74     1.01    -0.04    72.42
               RaceHorses            0.61    -0.04    30.23     1.30    -0.07    55.92     1.37    -0.07    71.73
    1280×720   FourPeople           -0.04    -0.01    30.08     0.01    -0.03    57.71     0.08    -0.03    73.87
               Johnny               -0.46    -0.01    30.08    -0.62    -0.03    57.64    -0.24    -0.04    73.68
               KristenAndSara       -0.37    -0.02    30.08    -0.20    -0.03    57.71    -0.18    -0.03    73.75
               Vidyo1               -0.03    -0.01    30.08    -0.23    -0.02    57.71    -0.01    -0.03    73.75
    1920×1080  Aspen                -0.05     0.00    30.14     0.39    -0.02    57.95     0.41    -0.02    73.86
               BasketBallDrive      -0.10    -0.01    30.14     0.42    -0.02    57.54     0.38    -0.02    73.50
               SnowMountain         -0.34    -0.04    30.14    -0.51    -0.06    57.12    -0.55    -0.06    73.15
               Kimono1              -0.09     0.00    30.14     0.39    -0.02    57.78     0.08    -0.02    73.74
               Average              -0.04    -0.02    30.15     0.21    -0.04    57.44     0.26    -0.04    73.36


    In Fig. 4, the RD performances of the ECU+CFM+ETCU2,

    ECU+CFM+FME_SKIP and AMPD algorithms with 7 and 4

    candidates are compared to that of the HM5.0 reference

    software where no early decision algorithm is adopted. The

    horizontal and the vertical axes show the bitrate and the PSNR,

    respectively. The RaceHorses and FourPeople video

    sequences are used in Figs. 4(a) and (b), respectively. The RD performance of three of the algorithms, ECU+CFM+FME_SKIP and the AMPD algorithms with 7 and 4 candidates, is comparable to that of the HM5.0 reference software, whereas the RD drop of the ECU+CFM+ETCU2 algorithm, denoted by the dashed curve, is quite large.

    [Figure: two RD plots, (a) and (b), of PSNR (dB) versus bitrate (kbps) for ECU+CFM+FME_SKIP, AMPD 7Cand, AMPD 4Cand, HM5.0 and ECU+CFM+ETCU2]

    Fig. 4. Algorithm comparison in terms of the RD performance: (a) 832×480-size RaceHorses sequence (b) 1280×720-size FourPeople sequence

    V. CONCLUSION

    The HEVC standard employs a hybrid coding approach similar to that of the H.264/AVC standard. Thus, the two standards have much in common. In this paper, the fast mode decision algorithms for H.264/AVC are surveyed and then applied to speed up HEVC encoding. One of the major differences between the two standards is that the number of block sizes supported by HEVC is 10 times more than that of H.264/AVC. The other is that the execution time for FME becomes much larger than that for IME because IME execution can be sped up by exploiting parallelism, while FME needs to be executed in a serial manner. This second difference makes the fast execution of FME more important than that of IME when a hardware-based encoder is used for HEVC compression. It is experimentally shown that a hierarchical inter-mode decision algorithm is a very effective solution for HEVC because there are many opportunities to terminate further prediction while searching a tree of CUs. In the future, the previous algorithms tested in this paper need to be further elaborated and enhanced.

    REFERENCES

    [1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264-ISO/IEC 14496-10 AVC), 2003.

    [2] ISO/IEC JTC 1 SC29 WG11, "Joint Call for Proposals on Video Compression Technology," Doc. N11113, Jan. 2010.

    [3] ISO/IEC JTC 1 SC29 WG11, "Vision, Applications and Requirements of High-Performance Video Coding," Doc. N11096, Jan. 2010.

    [4] T. Wiegand, W.J. Han, B. Bross, J. R Ohm, and G.J. Sullivan, "WD4: Working Draft 4 of High-Efficiency Video Coding," JCTVC-F803, Torino, IT, July 2011.

    [5] Y.-K. Lin, D.-W. Li, C.-C. Lin, T.-Y. Kou, S.-J. Wu, W.-C. Tai, W.-C. Chang, and T.-Sheuan Chang, "A 242mW, 10mm2 1080p H.264/AVC High Profile Encoder Chip," in Proc. of Design Automat. Conf., pp. 78-83, July 2008.

    [6] Y.-H. Chen, T.-D. Chuang, Y.-J. Chen, C.-T. Li, C.-J. Hsu, S.-Y. Chien, and L.-G. Chen, "An H.264/AVC scalable extension and high profile HDTV 1080p encoder chip," in Proc. of Sym. on VLSI Circuits, pp. 104-105, Aug. 2008.

    [7] Y.-H. Chen, T.-C. Chen, and L.-G. Chen, "Power-scalable algorithm and reconfigurable macro-block pipelining architecture of H.264 encoder for mobile application," in Proc. Int. Conf. Multimedia Expo, pp. 281-284, Dec. 2006.

    [8] T.-C. Chen, S.-Y. Chien, Y.-W. Huang, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, and L.-G. Chen, "Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp. 673-688, June 2006.

    [9] H.-C. Chang, Y.-C. Yang, J.-W. Chen, C.-L. Su, C.-A. Chien, J.-I. Guo, and J.-S. Wang, "A dynamic quality-scalable H.264 video encoder chip," in Proc. Asia South Pacific Design Automat. Conf., pp. 125-126, Feb. 2009.

    [10] Y.-K. Lin, C.-C. Lin, T.-Y. Kuo, and T.-S. Chang, "A Hardware-Efficient H.264/AVC Motion-Estimation Design for High-Definition Video," IEEE Trans. Circuits and System I, vol. 55, no. 6, pp. 1526-1535, July 2008.

    [11] C. Yang, S. Goto and T. Ikenaga, "High Performance VLSI Architecture of Fractional Motion Estimation in H.264 for HDTV," in Proc. of Int. Symposium on Circuits and Systems, pp. 2605-2608, May 2006.

    [12] C.-Y. Kao, C.-L. Wu and Y.-L. Lin, "A High-Performance Three-Engine Architecture for H.264/AVC Fractional Motion Estimation," IEEE Trans. Very Large Scale Integration Sys., vol. 18, no. 4, pp. 662-666, April 2010.

    [13] P. K. Tsung, W.-Y. Chen, L.-F. Ding, S.-Y. Chien, L.-G. Chen, "Cache-based Integer Motion/Disparity Estimation for Quad-HD H.264/AVC and HD Multiview Video Coding," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 2013-2016, April 2009.

    [14] C.-M. Ou, C.-F. Le, W.-J. Hwang, "An efficient VLSI architecture for H.264 variable block size motion estimation," IEEE Trans. Consumer Electronics, vol. 51, no. 4, pp. 1291-1299, Nov. 2005.

    [15] J. Kim and T. Park, "A novel VLSI architecture for full-search variable block-size motion estimation," IEEE Trans. Consumer Electronics, vol. 55, no. 2, pp. 728-733, May 2009.

    [16] L. Zhang and W. Gao, "Reusable Architecture and Complexity-Controllable Algorithm for the Integer/Fractional Motion Estimation of H.264," IEEE Trans. Consumer Electronics, vol. 53, no. 2, pp. 749-756, May 2007.

    [17] X. Lu, A.M. Tourapis, P. Yin, and J. Boyce, "Fast Mode Decision and Motion Estimation for H.264 with a Focus on MPEG-2/H.264 Transcoding," in Proc. of Int. Symposium on Circuits and Systems, vol. 2, pp. 1246-1249, May 2005.

    [18] C. E. Rhee, J.-S. Kim, and H.-J. Lee, "Cascaded Direction Filtering for Fast Multidirectional Inter-Prediction in H.264/AVC Main and High Profile Compression," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 403-413, March 2012.

    [19] B.-G. Kim, S.-K. Song, and C.-S. Cho, "Efficient inter-mode decision based on contextual prediction for the P-slice in H.264/AVC video coding," in Proc. Int. Conf. Image Processing, pp. 1333-1336, Oct. 2006.

    [20] B.-G. Kim and C.-S. Cho, "A fast inter-mode decision algorithm based on macro-Block tracking for P slices in the H.264/AVC video standard," in Proc. Int. Conf. Image Processing, vol. 5, pp. 301-304, Sept. 2007.

    [21] X. Jin, Y. Huang, Q. Liu, S. Wu, and T. Ikenaga, "Fast Spatial Direct Mode Decision for B Slice based on Temporal Information in H.264 Standard," in Proc. of Int. Sym. on Intelligent Signal Processing and Communication Systems, pp. 331-334, Jan. 2009.

    [22] T. Zhao, H. Wang, S. Kwong, and C.-C. J. Kuo, "Fast Mode Decision Based on Mode Adaptation," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5, pp. 697-705, May 2010.


    [23] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja, and C. C. Ko, "Fast Intermode Decision in H.264/AVC Video Coding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 953-958, July 2005.

    [24] C. Y. Chang, C. H. Pan, and H. Chen, "Fast mode decision for P-frames in H.264," presented at the Picture Coding Symp., Dec. 2004.

    [25] S.-H. Ri, Y. Vatis, and J. Ostermann, "Fast Inter-Mode Decision in an H.264/AVC Encoder Using Mode and Lagrangian Cost Correlation," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 302-306, Feb. 2009.

    [26] X. Jing and L.-P. Chau, "Fast approach for H.264 INTER mode decision," Electronics Letters, vol. 40, no. 17, pp. 1050-1052, Aug. 2004.

    [27] A. Ahmad, N. Khan, S. Masud, and M.A. Maud, "Selection of variable block sizes in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 173-176, May 2004.

    [28] H. Zeng, C. Cai, and K.-K. Ma, "Fast Mode Decision for H.264/AVC Based on Macroblock Motion Activity," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp. 491-499, April 2009.

    [29] J. Bu, S. Lou, C. Chen, and J. Zhu, "A predictive block-size mode selection for inter frame in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pp. 917-920, May 2006.

    [30] H. Ko, K. Yoo, and K. Sohn, "Fast mode-decision for H.264/AVC based on inter-frame correlations," Signal Processing: Image Commun., vol. 24, no. 10, pp. 803-813, Nov. 2009.

    [31] J. Y. Lee and H. Park, "A Fast Mode Decision Method Based on Motion Cost and Intra Prediction Cost for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 393-402, March 2012.

    [32] D. Wu, S. Wu, K. P. Lim, F. Pan, Z. G. Li, and X. Lin, "Block intermode decision for fast encoding of H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 181-184, May 2004.

    [33] Z. Liu, L. Shen, and Z. Zhang, "An Efficient Intermode Decision Algorithm Based on Motion Homogeneity for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 128-132, Jan. 2009.

    [34] D. Zhu, Q. Dai, and R. Ding, "Fast inter-prediction mode decision for H.264," in Proc. Int. Conf. Multimedia Expo, vol. 2, pp. 1123-1126, June 2004.

    [35] C.-H. Kuo, M. Shen, and C.-C. J. Kuo, "Fast inter-prediction mode decision and motion search for H.264," in Proc. IEEE Int. Conf. Multimedia Expo, vol. 1, pp. 663-666, June 2004.

    [36] P. Yin, H.-Y.C. Tourapis, A.M. Tourapis, and J. Boyce, "Fast mode decision and motion estimation for JVT/H.264," in Proc. of the IEEE Int. Conf. on Image Processing, vol. 3, pp. 853-856, Sept. 2003.

    [37] A. C. W. Yu, G. R. Martin, and H. Park, "Fast Inter-Mode Selection in the H.264/AVC Standard Using a Hierarchical Decision Process," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 186-195, April 2009.

    [38] G. Kim, Y. Moon, and J. Kim, "An early detection of all-zero DCT block in H.264," in Proc. Int. Conf. Image Processing, vol. 1, pp. 453-456, Oct. 2004.

    [39] J. Lee and B. W. Jeon, "Fast mode decision for H.264 with variable motion block size," Lecture Notes in Computer Science, vol. 2869, pp. 723-730, 2003.

    [40] I. Choi, J. Lee, and B. Jeon, "Fast coding mode selection with rate-distortion optimization for MPEG-4 part-10 AVC/H.264," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 12, pp. 1557-1561, Dec. 2006.

    [41] Y.-H. Kim, J.-W. Yoo, S.-W. Lee, J. Shin, J. Paik, and H.-K. Jung, "Adaptive mode decision for H.264 encoder," Electronics Letters, vol. 40, no. 19, pp. 1172-1173, Sept. 2004.

    [42] J. Lee and B. Jeon, "Pruned mode decision based on variable block sizes motion compensation for H.264," Lecture Notes in Computer Science, vol. 2899, pp. 410-418, Nov. 2003.

    [43] C.S. Kannangara, I.E.G. Richardson, M. Bystrom, J.R. Solera, Y. Zhao, A. MacLennan, and R. Cooney, "Low complexity skip prediction for H.264 through Lagrangian cost estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 2, pp. 202-208, Feb. 2006.

    [44] Y. Moon, G. Kim, and J. Kim, "An improved early detection algorithm for all-zero blocks in H.264 video encoding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1053-1057, Aug. 2005.

    [45] T.-C. Chen, Y.-W. Huang, and L.-G. Chen, "Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 9-12, May 2004.

    [46] M. Shao, Z. Liu, S. Goto, and T. Ikenaga, "Lossless VLSI oriented full computation reusing algorithm for H.264/AVC fractional motion estimation," IEICE Trans. Fundamentals, vol. 90-A, no. 5, pp. 756-763, April 2007.

    [47] Y. Song, M. Shao, Z. Liu, S. Li, L. Li, T. Ikenaga, and S. Goto, "H.264/AVC fractional motion estimation engine with computation reusing in HDTV1080p real-time encoding applications," in Proc. of the IEEE Workshop on Signal Processing Systems, pp. 509-514, Oct. 2007.

    BIOGRAPHIES

    Chae Eun Rhee received the B.S., M.S. and Ph.D. degrees in Electrical Engineering and Computer Science from Seoul National University, Seoul, Korea, in 2000, 2002 and 2011, respectively. From 2002 to 2005, she was with the Digital TV Development Group, Samsung Electronics Company Ltd., Suwon City, Korea, as an Engineer, where she was involved in bus architecture and MPEG decoder development. She is currently working as a research professor in Electrical Engineering and Computer Science at Seoul National University, Korea. Her research interests include algorithm and architecture design of video coding for HEVC and H.264/AVC and configurable video coding for real-time systems.

    Kyujoong Lee received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2002 and the M.S. degree in electrical engineering from the University of Southern California, Los Angeles, USA, in 2008. He is working toward the Ph.D. degree in electrical engineering at Seoul National University. From 2002 to 2005, he was with Com2us Corporation, Seoul, Korea, as a developer. His major research interests include the algorithm and architecture of H.264/AVC and SVC and noise reduction of video streams.

    Tae-Sung Kim received the B.S. degree in electrical electronic engineering from Pusan National University, Pusan, Korea, in 2010. He is working toward the M.S. degree in electrical engineering at Seoul National University. His research interests include the algorithm and architecture of H.264/AVC and HEVC.

    Hyuk-Jae Lee received the B.S. and M.S. degrees in

    Electronics Engineering from Seoul National University,

    Korea, in 1987 and 1989, respectively, and the Ph.D. degree

    in Electrical and Computer Engineering from Purdue

    University at West Lafayette, Indiana, in 1996. From 1998 to

    2001, he worked at the Server and Workstation Chipset

    Division of Intel Corporation in Hillsboro, Oregon as a senior

    component design engineer. From 1996 to 1998, he was on the faculty of the

    Department of Computer Science of Louisiana Tech University at Ruston,

    Louisiana. In 2001, he joined the School of Electrical Engineering and

    Computer Science at Seoul National University, Korea, where he is currently

    working as a Professor. He is a founder of Mamurian Design, Inc., a fabless

    SoC design house for multimedia applications. His research interests are in the

    areas of computer architecture and SoC design for multimedia applications.