06415009

7/29/2019 06415009


C. E. Rhee et al.: A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their Applications to High Efficiency Video Coding 1375

Contributed Paper

Manuscript received 10/15/12

Current version published 12/28/12

Electronic version published 12/28/12. 0098-3063/12/$20.00 © 2012 IEEE

A Survey of Fast Mode Decision Algorithms

for Inter-Prediction and Their Applications

to High Efficiency Video Coding

Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee

Abstract: The emerging High Efficiency Video Coding

(HEVC) standard attempts to improve the coding efficiency by a

factor of two over H.264/AVC using new compression tools with

high computational complexity. The increased computational

complexity makes the real-time execution with reasonable

computing power become one of the critical concerns for the

commercialization of HEVC. The large number of prediction

modes is the main cause of the increased complexity of HEVC.

Thus, a fast decision of a prediction mode needs to be effectively

used to reduce the computational complexity. To take advantage of the large amount of previous work and to find a guide for

application to HEVC, this paper presents a survey of these efforts

for the previous standards, especially for H.264/AVC, and

examines the possibility of the previous algorithms to be

applicable for HEVC. To this end, previous algorithms are

categorized and then the effectiveness of each category for

HEVC is evaluated. For this evaluation, a previous algorithm is

modified for HEVC when it is not applicable to HEVC directly.

Simulation results show that most previous algorithms with slight

modification, in general, improve the encoding speed with a

relatively small degradation of the compression efficiency.

Among them, hierarchical mode decision is especially effective

whereas mode pre-decision using motion or spatial homogeneity often produces inaccurate results.

1

Index Terms: Fast inter-prediction, Mode decision,

Hardware encoder, HEVC, H.264/AVC.

I. INTRODUCTION

Video compression technologies as well as video applications

such as video conferencing, streaming, video storage and

communication have attracted industry attention due to the

increasing popular demand for high-definition (HD) video content.

H.264/AVC [1] has been regarded as the state-of-the-art video

coding standard and widely used. Recently, the next-generation

video coding standard [2]-[4] known as High Efficiency Video Coding (HEVC) has been developed by ISO/IEC MPEG and ITU-T

VCEG. In the emerging HEVC standard, several new features

1 This work was supported by the National Research Foundation of

Korea (NRF) grant funded by the Korea government (MEST) (No.

2012R1A2A2A06047297).

Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee are with

the Inter-university Semiconductor Research Center (ISRC), Department of

Electrical Engineering and Computer Science, Seoul National University,

Seoul, Korea (e-mail: [email protected], [email protected],

[email protected], [email protected]).

are introduced, including a flexible block structure, the increased

intra-coding directions, sophisticated interpolation filters, various

in-loop filters, and enhanced entropy coding schemes. The HEVC

standard aims at bitrate saving by a factor of two over H.264/AVC

at the expense of an increase in computational complexity.

Like H.264/AVC, mode decisions with motion estimation (ME)

remain among the most time-consuming computations in HEVC.

In an inter-prediction mode decision, a full-search algorithm

searches for every possible block size and refines the results from

integer-pel to quarter-pel resolution. Thus, a full-search algorithm guarantees the highest level of compression performance. However,

the considerable computational complexity for a mode decision is

critical for the encoding speed. Moreover, the main target

resolution of HEVC is full HD (1920×1080) and beyond.

Therefore, fast inter-prediction is not only an important challenge

but also an urgent problem to be solved for HEVC compression to

be used in real-time consumer electronic devices.

Extensive research effort has been conducted to reduce the

computational complexity for inter-prediction for H.264/AVC,

pursuing an effective trade-off between the rate-distortion (RD)

drop and the speed-up. In order to deal with the similar challenge

for HEVC, this paper reviews principal algorithms which have already been attempted for H.264/AVC. A survey of these various

algorithms and an evaluation of their contributions and limitations

provide valuable leads for the development of fast algorithms for

HEVC inter-prediction. Major differences between H.264/AVC

and HEVC are also investigated from an algorithmic and

architectural perspective. Previous algorithms for the fast

H.264/AVC inter-predictions are then modified and re-designed

for HEVC inter-predictions so as to explore the possibilities for

application to HEVC.

The rest of the paper is organized as follows. Section II gives

an overview of inter-prediction in HEVC. Previous approaches

for fast inter-predictions in H.264/AVC are surveyed in Section

III, and the application of the fast inter-mode selection algorithms

to HEVC is presented in Section IV. Conclusions are given in

Section V.

II. OVERVIEW OF INTER-PREDICTION IN HEVC

A. Inter-Prediction Algorithm in HEVC

To achieve high compression performance for high-

resolution videos, HEVC defines the coding unit (CU) as the

basic processing unit instead of the macroblock (MB).


1376 IEEE Transactions on Consumer Electronics, Vol. 58, No. 4, November 2012

Unlike an MB, whose size is fixed at 16×16 pixels, the size of a CU is not fixed and varies from 8×8 to 64×64. A large CU reduces the amount of motion information data. Thus, the compression efficiency is improved in a lossless manner, especially for high-resolution videos. A CU can be partitioned into smaller CUs, and the structure among different CUs is represented by a quad-tree. The depth of this tree can be as large as four. The largest CU, at depth 0, is denoted as the LCU. For a CU whose size is denoted by 2N×2N, predictions are performed for the block sizes 2N×2N, 2N×N, N×2N and N×N. The processing unit for prediction is called the prediction unit (PU).
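The block structure described above can be sketched in a few lines. This is only an illustration under stated assumptions: a 64×64 LCU, four quad-tree depths (0 to 3), and hypothetical helper names `cu_sizes` and `pu_partitions` that do not come from the HM software.

```python
def cu_sizes(lcu_size=64, max_depth=4):
    """Return the CU edge length at each quad-tree depth (depth 0 is the LCU)."""
    # Each additional depth halves the CU edge length.
    return [lcu_size >> depth for depth in range(max_depth)]


def pu_partitions(cu_size):
    """Return the (width, height) of every PU for the four partition types
    of a 2Nx2N CU, where cu_size = 2N."""
    n = cu_size // 2
    return {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN": [(cu_size, n)] * 2,   # two horizontal halves
        "Nx2N": [(n, cu_size)] * 2,   # two vertical halves
        "NxN": [(n, n)] * 4,          # four quadrants
    }


if __name__ == "__main__":
    for depth, size in enumerate(cu_sizes()):
        print(depth, size, pu_partitions(size))
```

With the assumed defaults, the CU sizes run from 64×64 at depth 0 down to 8×8 at depth 3, which matches the range stated in the text.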

The HM5.0 reference software offers fast algorithms to

speed up the prediction time by an early decision of the final

prediction mode with the evaluation of only subsets of

prediction modes. One of the fast algorithms is the early SKIP

mode decision in which the computation for the SKIP mode is

performed first for a 2N×2N PU. If the RD cost is less than

the average SKIP costs, as accumulated from the previous

SKIP modes, not only the prediction of the other PU types at

the same depth but also all predictions for the further depths are omitted. Two other fast algorithms are the early CU

determination and the coded block flag (CBF)-based fast

mode decision, denoted as ECU and CFM, respectively. The

CBF represents blocks with a zero residual. In the ECU, the

RD costs for the SKIP, 2N×2N, 2N×N and N×2N inter-modes

as well as the intra-mode are calculated at the current depth. If

the SKIP mode cost is the smallest, predictions for CUs

smaller than the current CU are not performed. Meanwhile,

the CFM is used to select the PU size in the current CU and to

save computation power for predictions of less-probable PU

sizes. The predictions for the SKIP, 2N×2N, 2N×N and N×2N

PUs are processed in sequence. If the CBF of the current PU happens to be all zeros, the prediction is terminated and the

computation for the remaining PU sizes is saved, as a zero

CBF indicates that the RD performance is adequate when the

current PU is determined to be the best mode. Even if the

current PU is different from the best PU, the difference in the

RD cost between the current PU and the best PU may be

negligible.
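The three fast decisions described above (early SKIP, ECU and CFM) can be summarized as simple tests on RD costs and CBFs. The following is a minimal sketch, not the HM5.0 implementation: the function names, the mode labels and the `evaluate_pu` callback are hypothetical, and the RD costs are assumed to be supplied by the encoder.

```python
PU_ORDER = ["SKIP", "2Nx2N", "2NxN", "Nx2N"]  # evaluation order stated in the text


def early_skip(skip_cost, previous_skip_costs):
    """Early SKIP: accept SKIP when its RD cost is below the average cost
    accumulated from previously selected SKIP modes."""
    if not previous_skip_costs:
        return False  # assumption: no history means no early decision
    average = sum(previous_skip_costs) / len(previous_skip_costs)
    return skip_cost < average


def ecu_stops_splitting(rd_costs):
    """ECU: do not predict smaller CUs when SKIP has the smallest RD cost
    among the modes evaluated at the current depth."""
    return min(rd_costs, key=rd_costs.get) == "SKIP"


def cfm_evaluate(evaluate_pu):
    """CFM: evaluate PUs in order and stop as soon as one has an all-zero CBF.

    evaluate_pu(mode) is assumed to return (rd_cost, cbf_is_zero).
    Returns the best evaluated mode and the list of modes actually evaluated.
    """
    evaluated = {}
    for mode in PU_ORDER:
        cost, cbf_is_zero = evaluate_pu(mode)
        evaluated[mode] = cost
        if cbf_is_zero:
            break  # remaining PU sizes are not computed
    best = min(evaluated, key=evaluated.get)
    return best, list(evaluated)
```

In this sketch CFM may miss a cheaper mode that was never evaluated, which corresponds to the small RD cost difference that the text describes as acceptable.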

B. Hardware Implementation for an HEVC Encoder

This subsection examines the impact of the HEVC coding structure on the hardware implementation. In this paper, it is assumed that the pipeline architecture for the HEVC hardware encoder may be similar to the widely used architectures for H.264/AVC encoders [5]-[9], where the integer motion estimation (IME) is performed in stage 1, whereas the fractional motion estimation (FME) with the motion compensation (MC) is performed in stage 2. As the hardware encoder takes advantage of parallel and/or pipelined execution of multiple hardware resources, the dependence between computations in the HEVC standard often causes an unexpected slow-down.

To support the parallel execution of IMEs for all block-sizes

in H.264/AVC, the sums of absolute differences (SADs) for all

4×4 blocks of an MB are calculated simultaneously. The

obtained SAD values are combined in the variable-block-size

(VBS) adder tree and 41 SADs for all block sizes are

generated in one cycle. The problem is that the rate term of

the RD cost function can be computed only after the motion

vectors (MVs) of the neighboring blocks are determined,

which causes dependence among IMEs for various size blocks.

In addition, when IME and FME are processed not serially but

in a pipelined manner, the left MB is still in the FME stage.

Thus, the best mode and the best MV of the left block are not

available. In H.264/AVC, a modified MV predictor

(MVP) is applied for all 41 blocks [8]. Instead of the median

value of the MVs of the left, upper and upper-right blocks, the

median value of the MVs of the upper-left, upper and upper-right

MBs is used for all 41 blocks equally in order to facilitate the

parallel processing and the MB pipelining.
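The way 41 SADs arise from the sixteen 4×4 SADs of an MB can be illustrated in software. The sketch below mirrors the idea of the VBS adder tree only; it is not a hardware description, and the function name `vbs_sads` and the dictionary layout are assumptions made for this example.

```python
def vbs_sads(sad4x4):
    """Combine sixteen 4x4 SADs into the 41 SADs of all H.264/AVC block sizes.

    sad4x4[r][c] is the SAD of the 4x4 block in row r and column c (0..3)
    of a 16x16 MB. Block names are written as width x height.
    """
    out = {}
    out["4x4"] = [sad4x4[r][c] for r in range(4) for c in range(4)]
    # 8x4: two horizontally adjacent 4x4 blocks.
    out["8x4"] = [sad4x4[r][c] + sad4x4[r][c + 1]
                  for r in range(4) for c in (0, 2)]
    # 4x8: two vertically adjacent 4x4 blocks.
    out["4x8"] = [sad4x4[r][c] + sad4x4[r + 1][c]
                  for r in (0, 2) for c in range(4)]
    # 8x8: a 2x2 group of 4x4 blocks, ordered TL, TR, BL, BR.
    out["8x8"] = [sum(sad4x4[r + i][c + j] for i in (0, 1) for j in (0, 1))
                  for r in (0, 2) for c in (0, 2)]
    tl, tr, bl, br = out["8x8"]
    out["16x8"] = [tl + tr, bl + br]   # top and bottom halves
    out["8x16"] = [tl + bl, tr + br]   # left and right halves
    out["16x16"] = [tl + tr + bl + br]
    return out


if __name__ == "__main__":
    sads = vbs_sads([[1] * 4 for _ in range(4)])
    print(sum(len(v) for v in sads.values()))
```

The counts are 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41, which is the number of SADs quoted in the text.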

The solution for parallel IME executions in H.264/AVC can also be applied to an HEVC encoder. The MVP of the 2N×2N PU in the LCU is used for all blocks equally. When this MVP is derived, the left and below-left candidates among the spatial MVP candidates are excluded. With this modification of the MVP derivation, the IME execution in HEVC has a large degree of parallelism. Moreover, the parallelism in IME execution for HEVC is larger than that for H.264/AVC. In H.264/AVC, the parallel execution of IME is done for a 16×16 MB and its sub-blocks, whereas a 64×64 LCU and all blocks smaller than the LCU can have their IMEs processed in parallel in HEVC. When the same search range and search scheme are used for H.264/AVC and HEVC, a 16×16 MB and a 64×64 LCU are expected to have an identical IME time.

For FME, it is more difficult to exploit the available parallelism than for IME because the two-step FMEs for half- and quarter-pixel precisions should be performed sequentially. Furthermore, the modified MVP or the mode reduction, in general, decreases the compression efficiency more seriously than IME fast computation algorithms. In addition, recent studies on H.264/AVC hardware encoders [10]-[16] show that the speed-up of the execution time is easier for IME than for FME. Thus, in H.264/AVC, FME is usually conducted one by one for the 41 blocks in a 16×16 MB. As a result, the execution time for FME with the MC is most likely to be larger than that for IME. Even if additional hardware resources are used for a parallel FME execution [12], the encoding time for an LCU is most likely determined by the time for the FME followed by the MC and mode decision.

III. FAST INTER-MODE SELECTION ALGORITHMS FOR H.264/AVC

In this section, fast inter-mode selection algorithms proposed for H.264/AVC are surveyed. An effective pre-selection of prediction block sizes is crucial for fast encoding. Reducing the number of prediction block sizes often incurs an RD drop. An effective trade-off between the RD drop and the speed-up has been one of the main research subjects. The previous algorithms are categorized according to the decision stage and criteria, as shown in the classification tree in Fig. 1.



Fast inter-mode selection
    ME pre-decision
        Neighboring (spatial and temporal) information
        Motion characteristics of the current MB
        Spatial characteristics of the current MB
    Hierarchical decision
        Remaining prediction according to the prior prediction result
        Further prediction by comparing the prior predictions
    FME pre-decision
        Mode pre-decision based on the rate-distortion cost from IME
        Reduction of FME calculation from the reuse of integer-pel MVs

Fig. 1. Classification tree of fast inter-predictions for H.264/AVC

There are roughly three categories of algorithms for fast inter-mode prediction. In the first category, candidate block sizes are determined prior to ME, and prediction operations including ME are performed for only the selected candidate block sizes. This category is further classified into three sub-categories. In the first sub-category, spatial and/or temporal correlation in a video is used to select candidates, and the degree of correlation is obtained from neighboring information. For instance, if an MB is surrounded by neighboring MBs coded in the DIRECT or SKIP mode, the video sequence is assumed to be changing smoothly and the motion is assumed to be similar to that in the neighboring area. In this case, the current MB is very likely to be coded in the DIRECT or SKIP mode or with a large block size such as 16×16 [17]-[22]. In a similar manner, various algorithms [23]-[27] search for the best block size based on spatial and temporal homogeneity investigations of the neighboring blocks.

The algorithms in the second sub-category take advantage of the correlation between the motion homogeneity and the best block size. Natural video sequences include stationary or motionless regions for which the optimal block sizes are mostly large. Thus, the MVs of spatially and temporally adjacent MBs are used to classify the motion characteristics [28], whereas the absolute difference between consecutive frames is used to detect motion homogeneity [26][29][30]. Liu et al. [33] estimate the motion homogeneity of the current MB from MVs generated by ME on the 4×4 blocks inside the current MB.

In the algorithms in the third sub-category, candidate block sizes are predicted through spatial characteristics of the current MB. A frame-level edge map or the variance of the MB is estimated to detect a homogeneous region [23][31][32]. In other studies, the image is down-sampled and pre-encoded [34][35]. The candidate block sizes are obtained after comparing the estimated RD costs during pre-encoding.

The algorithms of the second category explore the best block size in a hierarchical manner. In other words, certain block sizes are estimated prior to the other block sizes, and the decision for a further block size search is then made using the result of the prediction of the prior block sizes [29][31][36]-[44]. In the first sub-category, the result of the prior prediction is tested. The decision regarding a further prediction is determined based on the test result. One of the most popular algorithms is the early SKIP mode decision, where predictions of the remaining block sizes are performed only when the early SKIP condition is not satisfied. Kannangara et al. [43] make the early SKIP mode prediction by estimating a Lagrangian RD cost function which incorporates an adaptive model for the Lagrangian multiplier parameter based on local sequence statistics. Other studies [37][38][41][44] propose a simple threshold-based algorithm to detect zero-coefficient blocks. Zero coefficients represent a small distortion in the RD cost function, and the SKIP mode decision is made early without the expensive computation of the real RD cost. Alternatively, if the RD cost of the SKIP mode is less than a threshold, the SKIP mode is selected as the best mode [36]. Here, the threshold is defined as N bits × the Lagrangian multiplier parameter, where N bits is equal to the minimum number of bits required for the non-SKIP mode.
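The threshold test attributed to [36] can be written as a one-line comparison. The operator between "N bits" and the Lagrangian multiplier parameter is not legible in the source text, so the sketch below assumes that the two are multiplied; the function and parameter names are hypothetical.

```python
def early_skip_by_threshold(skip_rd_cost, n_bits, lagrangian_multiplier):
    """Select SKIP early when its RD cost is below the threshold.

    Assumption: threshold = n_bits * lagrangian_multiplier, where n_bits is
    the minimum number of bits required for a non-SKIP mode.
    """
    threshold = n_bits * lagrangian_multiplier
    return skip_rd_cost < threshold
```

The intuition under this assumption is that any non-SKIP mode pays at least the rate term of the threshold, so a SKIP cost below it cannot be beaten on rate alone.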

In the second sub-category, the results of the prior predictions are compared and further prediction is determined according to the comparison result. In the algorithms of Yu and Choi's studies [37][40], the RD costs of block sizes are compared in order from large to small block sizes. If the current RD cost is larger than the RD cost of the larger block size, further searches for blocks smaller than the current block are stopped. In Yin and Lee's studies [36][42], the RD costs of the square blocks, 16×16, 8×8 and 4×4, are tested first. If the tendency of these RD costs is not monotonic, all other non-square blocks need to be tested. Otherwise, only block sizes between the best two square block sizes are searched. In addition to the hierarchical decision approach, much research has proposed a hybrid solution which selects candidate prediction block sizes prior to ME using the information mentioned in the first category, after which prediction block sizes are searched in a hierarchical manner [18][31][37].
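The two comparison-based strategies described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the RD costs are assumed to be provided by a callback or by the caller, and the mapping of "block sizes between the best two square block sizes" to the 16×8/8×16 or 8×4/4×8 pairs is an assumption of this example rather than a detail given in the text.

```python
def search_large_to_small(sizes, rd_cost):
    """Evaluate block sizes from large to small and stop as soon as the
    current RD cost exceeds that of the previous (larger) size.

    sizes is ordered from large to small; rd_cost(size) returns the RD cost.
    Returns the best evaluated size and the list of evaluated sizes.
    """
    evaluated = []
    previous_cost = None
    for size in sizes:
        cost = rd_cost(size)
        evaluated.append((size, cost))
        if previous_cost is not None and cost > previous_cost:
            break  # smaller blocks are not searched
        previous_cost = cost
    best_size = min(evaluated, key=lambda item: item[1])[0]
    return best_size, [size for size, _ in evaluated]


def candidates_from_square_costs(cost_16, cost_8, cost_4):
    """Choose the non-square sizes to test from the RD costs of the
    16x16, 8x8 and 4x4 square blocks."""
    increasing = cost_16 <= cost_8 <= cost_4   # larger blocks are better
    decreasing = cost_16 >= cost_8 >= cost_4   # smaller blocks are better
    if increasing:
        return ["16x8", "8x16"]    # between 16x16 and 8x8
    if decreasing:
        return ["8x4", "4x8"]      # between 8x8 and 4x4
    # Non-monotonic tendency: all non-square sizes are tested.
    return ["16x8", "8x16", "8x4", "4x8"]
```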

In the first sub-category of the third category, IME is performed for every block size. Next, the results of IME are used to select the candidate block sizes for FME. In the simplest approach, called mode pre-decision (MPD), the best combination of various block sizes (VBS) for an MB is selected with the IME results. FME simply refines the integer-pel MV of the selected block size to quarter-pel precision. MPD suffers from a significant RD drop because the best block size from the IME may change after refinement in the FME. To achieve a better trade-off between the compression efficiency and computational complexity, the advanced MPD (AMPD) is proposed [45]. In the AMPD, more than one candidate block size is selected for the subsequent FME operations. Seven partitions, four 8×8 partitions together with the 16×8, 8×16 and 16×16 partitions, are sorted according to their IME cost. As a result, N (N = 1~7) partitions are selected for the FME. In AMPD2 [45], one candidate is selected from the 16×8, 8×16 and 16×16 partitions, whereas two are selected from the 8×8 partitions. Similarly, two partitions are selected by mode filtering (MF) for the FME operation from the IME phase [10]. One is selected from the 16×8, 8×16 and 16×16 partitions and the other is selected from the 8×8, 16×8, 8×16 and 16×16 partitions, where the 8×8 partition consists of the best sub-block sizes. In this MF algorithm, the number of selected block sizes is relatively low and larger block sizes are more frequently selected than in AMPD2.

In the second sub-category, computation-reuse techniques are adopted. Shao et al. [46] propose that the FME for each block size is performed one by one. If the integer MV of the current block is identical to that of a block already processed, no FME computation needs to be performed. In particular, in a homogeneous region, adjacent blocks tend to have the same integer MV after IME. Therefore, this reuse technique reduces the calculation for the prediction of the block size with no RD drop. The same algorithm is applied to block sizes larger than an 8×8 block [47]. The FMEs for blocks smaller than 8×8 are omitted. Thus, the encoding time decreases more than that of Shao's algorithm with a reasonable PSNR drop.
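The computation-reuse idea can be illustrated with a small cache keyed by the integer MV. This is a simplified sketch and not the method of [46] in detail: the names `fme_with_reuse` and `run_fme` are hypothetical, and the cache is keyed only on the integer MV, as the text states, without modeling block position or size.

```python
def fme_with_reuse(blocks, run_fme):
    """Process blocks one by one and skip FME when the integer MV was
    already refined for an earlier block.

    blocks is an iterable of (block_id, integer_mv) pairs, where integer_mv
    is a hashable (x, y) tuple; run_fme(block_id, integer_mv) performs the
    fractional refinement and returns its result.
    Returns the per-block results and the number of FME calls actually made.
    """
    cache = {}
    results = {}
    fme_calls = 0
    for block_id, integer_mv in blocks:
        if integer_mv not in cache:
            cache[integer_mv] = run_fme(block_id, integer_mv)
            fme_calls += 1
        results[block_id] = cache[integer_mv]  # reused when the MV repeats
    return results, fme_calls
```

In a homogeneous region many blocks share one integer MV, so the number of FME calls in this sketch approaches the number of distinct MVs rather than the number of blocks.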

IV. APPLICATION OF FAST INTER-MODE SELECTION ALGORITHMS TO HEVC

As explained in Section II, HEVC supports larger and more varied block sizes than H.264/AVC. If an early decision is made to select the prediction block size, the computational complexity is significantly reduced by omitting the remaining predictions. Recently, several fast inter-mode selection algorithms have been proposed for HEVC. However, it is important first to take advantage of the considerable amount of previous work and to find a guide for application to HEVC. In this paper, several previous algorithms proposed for H.264/AVC are modified and tested for HEVC. Algorithms which require an additional calculation to judge the texture characteristic or motion homogeneity, such as a frame-level edge map, are not used, for simplicity. The following algorithms are implemented in the HM5.0 reference software.

A. Prediction Block Size Pre-Decision

In Section III.A, the block size prediction algorithms are classified into three sub-categories that utilize spatial/temporal correlations, motion vector information and the spatial characteristics of the current MB, respectively. This subsection evaluates the effectiveness of the three types of algorithms when they are used for early block size decisions in HEVC. To this end, the relationship between the above information and the depth of the current LCU is examined with experiments. Note that the depth of the LCU determines the block size in HEVC. Ten video sequences, Akiyo, Container,

TABLE I
CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 352×288 TEST VIDEOS
(Average per depth; variance in parentheses on the following row.)

Max(Neighboring MVs)
QP  Depth  Akiyo       Container   Foreman     Sean        Stefan
20  0      -           -           -           -           -
    1      0.23        0.22        5.26        3.81        7.25
           (0.41)      (0.38)      (21.81)     (370.40)    (91.98)
    2      0.51        0.28        7.05        2.73        13.96
           (0.56)      (0.34)      (120.28)    (222.79)    (295.62)
    3      0.70        0.49        6.71        10.99       16.76
           (2.01)      (0.51)      (145.70)    (1151.85)   (382.39)
32  0      -           -           -           -           -
    1      0.43        0.37        9.12        5.42        11.89
           (0.45)      (0.86)      (1042.00)   (547.51)    (171.20)
    2      1.21        0.43        8.91        6.87        10.81
           (2.29)      (0.36)      (203.11)    (644.77)    (194.82)
    3      1.00        0.67        5.49        6.74        10.84
           (0.86)      (0.67)      (55.16)     (542.93)    (189.52)

Variance of the current LCU (×10^3)
QP  Depth  Akiyo       Container   Foreman     Sean        Stefan
20  0      -           -           -           -           -
    1      2.04        3.41        2.75        1.81        3.33
           (6258.15)   (10528.18)  (7506.87)   (736.33)    (4891.66)
    2      2.26        2.67        3.81        2.60        3.08
           (3020.20)   (7049.42)   (11020.76)  (1746.03)   (3889.57)
    3      2.93        1.96        2.65        2.30        2.56
           (4844.59)   (4592.76)   (6035.12)   (1767.59)   (2883.81)
32  0      -           -           -           -           -
    1      2.51        4.26        3.93        1.97        3.19
           (6869.12)   (9560.06)   (12065.05)  (806.67)    (5411.69)
    2      2.95        2.96        2.99        3.07        2.88
           (3940.74)   (3748.08)   (6930.70)   (1279.87)   (2980.45)
    3      2.70        1.18        2.17        2.65        2.72
           (3864.05)   (1387.00)   (1105.39)   (1885.03)   (1729.40)

Max(Neighboring Depths)
QP  Depth  Akiyo       Container   Foreman     Sean        Stefan
20  0      -           -           -           -           -
    1      1.84        2.49        2.81        1.85        2.88
           (0.82)      (0.39)      (0.18)      (0.76)      (0.13)
    2      2.85        2.78        2.80        2.75        2.95
           (0.15)      (0.20)      (0.17)      (0.31)      (0.05)
    3      2.93        2.89        2.97        2.97        2.98
           (0.07)      (0.13)      (0.03)      (0.03)      (0.03)
32  0      -           -           -           -           -
    1      1.29        1.60        2.00        1.50        2.21
           (0.44)      (0.63)      (0.54)      (0.59)      (0.55)
    2      2.14        2.33        2.29        2.27        2.55
           (0.42)      (0.60)      (0.42)      (0.63)      (0.25)
    3      2.70        2.82        2.72        2.90        2.94
           (0.23)      (0.28)      (0.21)      (0.09)      (0.06)

TABLE II
CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 1920×1080 TEST VIDEOS
(Average per depth; variance in parentheses on the following row.)

Max(Neighboring MVs)
QP  Depth  Aspen       BasketBallDrive  BQTerrace   Cactus      Kimono1
20  0      69.63       73.68            27.17       11.82       15.18
           (14436.13)  (40665.65)       (3582.74)   (15.78)     (624.55)
    1      44.90       75.85            40.38       37.86       18.93
           (7211.57)   (22441.74)       (8694.43)   (16047.76)  (721.20)
    2      33.70       62.34            20.82       12.94       20.59
           (3316.01)   (10757.60)       (3023.15)   (3732.76)   (893.41)
    3      31.21       60.76            11.81       8.14        24.33
           (3590.61)   (7221.25)        (3039.67)   (509.86)    (1560.22)
32  0      16.71       36.81            5.01        1.79        13.59
           (1353.52)   (2521.04)        (101.63)    (283.58)    (440.45)
    1      31.24       64.82            4.31        7.98        19.02
           (3348.42)   (8081.78)        (52.68)     (426.45)    (1108.55)
    2      12.48       63.44            4.78        12.27       23.72
           (75.76)     (4560.53)        (68.73)     (856.59)    (2576.14)
    3      55.96       57.45            4.02        14.01       26.78
           (4838.08)   (2710.68)        (23.66)     (1307.32)   (6792.91)

Variance of the current LCU (×10^3)
QP  Depth  Aspen       BasketBallDrive  BQTerrace   Cactus      Kimono1
20  0      0.03        0.19             0.07        0.06        0.19
           (12.57)     (91.78)          (0.45)      (29.32)     (135.24)
    1      0.43        0.52             1.83        0.96        0.39
           (432.46)    (761.29)         (10125.72)  (2365.81)   (363.05)
    2      0.46        0.54             1.80        0.86        0.44
           (332.50)    (611.10)         (4497.16)   (1786.60)   (587.28)
    3      0.52        0.69             1.92        1.00        0.46
           (356.23)    (666.90)         (3834.08)   (2284.00)   (669.37)
32  0      0.20        0.23             0.99        0.60        0.27
           (108.80)    (184.61)         (1854.24)   (875.66)    (275.87)
    1      0.77        0.99             2.90        1.53        0.45
           (680.71)    (997.58)         (7925.55)   (4714.41)   (539.68)
    2      0.58        1.06             2.38        1.03        0.55
           (236.52)    (640.00)         (4373.66)   (750.40)    (696.87)
    3      0.51        1.03             2.02        0.98        0.68
           (121.25)    (601.11)         (3148.50)   (638.76)    (613.05)

Max(Neighboring Depths)
QP  Depth  Aspen       BasketBallDrive  BQTerrace   Cactus      Kimono1
20  0      1.57        1.24             1.29        2.00        2.02
           (0.68)      (0.75)           (0.81)      (0.70)      (0.53)
    1      2.21        2.51             2.17        2.83        2.40
           (0.44)      (0.35)           (0.73)      (0.23)      (0.39)
    2      2.51        2.76             2.88        2.97        2.62
           (0.32)      (0.20)           (0.13)      (0.03)      (0.27)
    3      2.77        2.95             3.00        2.99        2.82
           (0.19)      (0.05)           (0.01)      (0.01)      (0.15)
32  0      0.76        0.77             0.95        0.69        1.44
           (0.60)      (0.61)           (0.93)      (0.81)      (0.57)
    1      1.64        1.69             1.92        1.99        1.82
           (0.51)      (0.51)           (0.67)      (0.54)      (0.37)
    2      2.14        2.18             2.43        2.47        2.02
           (0.50)      (0.38)           (0.47)      (0.36)      (0.30)
    3      2.07        2.58             2.81        2.81        2.34
           (0.60)      (0.27)           (0.21)      (0.20)      (0.33)



Foreman, Sean and Stefan with a resolution of 352×288 as well as Aspen, BasketBallDrive, BQTerrace, Cactus and Kimono1 with a resolution of 1920×1080, are used. The 352×288 test videos are the same sequences used in the research for H.264 [17][28].

In Table I and Table II, the correlation between the depth of the current LCU and the information obtained from neighboring LCUs is presented for the 352×288 and 1920×1080 video sequences, respectively. The first column represents the quantization parameter (QP) values, while the second column represents the depth. The maximum value among the absolute MVs of the neighboring LCUs is obtained for each LCU and presented in the third to seventh columns of Tables I and II. The average and the variance (given in parentheses) of the maximum MVs for each depth are shown in these columns. For the low-resolution videos in Table I, depth 0 is seldom selected. For these cases, the corresponding cells are marked with a dash. For videos at a resolution of 352×288 and with QP=20, it appears that the depth of the LCU increases as the average magnitude of the neighboring MVs increases. The variance is also not large in this case. This result agrees with the data reported in [28], which uses H.264/AVC targeting 176×144 and 352×288 videos. For the 1920×1080 videos in Table II, however, the depth of the LCU does not increase along with the neighboring MVs, and the variance is very large. This indicates that a correlation between the depth of the current LCU and the neighboring MVs does not exist. In high-resolution videos, the MV values are quite large, even when the motion seems stationary. Sometimes, a large block size is preferred, even with fast and complex motion, because the texture, brightness or colors of the same object can change in a different way in every frame. In this case, an elaborate ME with a small block size cannot reduce the prediction error. Therefore, the observation leads to the conclusion that the correlation between the depth of each LCU and the neighboring MVs becomes small for high-resolution videos.

If the pixel variance of a certain region is small, this region is likely to be spatially homogeneous and is probably encoded with a large block size. This possibility is tested and the results are presented in the eighth to twelfth columns of Tables I and II. In these columns, the correlation between the depth and the variance of the current LCU is presented. For the 1920×1080 videos encoded with QP=20 in Table II, the variance of the current LCU is quite low when its depth is 0. However, in other cases, the correlation between the depth and the variance of the current LCU is not very strong. In HEVC, the number of block sizes is significantly larger than that in H.264/AVC. Moreover, blocks can be encoded in the SKIP mode not only at the LCU size but also in every CU. The changed SKIP mode decision and the larger number of block sizes make it difficult to find a strong correlation between the depth and the variance.

In the thirteenth to seventeenth columns of Tables I and II, the correlation between the depth of the current LCU and the depth information of the neighboring LCUs is presented. The depth of the current LCU becomes large as the neighboring depths increase, while the variance is quite small. These results show that the correlation between the depths of the current and neighboring LCUs is positive.

From the above simulations, the neighboring depth information may be helpful for a prediction of the block size (or the LCU depth), whereas the MV or variance information may not be very useful.
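The per-depth statistics reported in Tables I and II can be reproduced from raw measurements with a simple grouping step. The sketch below is an assumption-laden illustration: the paper does not state whether the population or the sample variance is used, so the population variance is chosen here, and the function name `stats_by_depth` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def stats_by_depth(samples):
    """Group a neighboring-LCU feature by the depth of the current LCU.

    samples is an iterable of (current_depth, feature_value) pairs, where the
    feature may be, for example, the maximum neighboring MV magnitude or the
    maximum neighboring depth. Returns {depth: (average, variance)}.
    """
    groups = defaultdict(list)
    for depth, value in samples:
        groups[depth].append(value)
    # Population variance is an assumption; see the note above.
    return {depth: (mean(values), pvariance(values))
            for depth, values in sorted(groups.items())}
```

A feature is a useful predictor in the sense of the discussion above when its average changes steadily with the depth while its variance within each depth stays small.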

B. Hierarchical Decision of Prediction Block Size

The algorithm of Yu [37] checks three conditions for an early SKIP mode decision. First, one of the neighboring blocks is a SKIP mode block. Second, the sum of absolute differences (SAD) of the current MB is less than the average SAD of the neighboring MBs. Here, the SAD is the difference between a block in the current frame and the co-located block in the reference frame. Lastly, the fast transform-quantized coefficients are all zero. In Table III, the early SKIP mode decision algorithm (denoted by ES) provided in the HM5.0 reference software and Yu's algorithm [37] are tested.

For the simulation of a hardware-based HEVC encoder, the encoding time to process an LCU is estimated by adding the time for the FME with the MC of each CU inside the LCU, as the stage of the FME with the MC operations takes the most time in the pipeline schedule, as discussed in Section II.B. The configurations for the encoding are low-complexity, low-delay and P picture-only, and the number of reference frames is four at most. Twelve video sequences, BQMall, FlowerVase, Keiba and RaceHorses with a resolution of 832×480; FourPeople, KristenAndSara, Johnny and Vidyo1 with a resolution of 1280×720; and Aspen, BasketBallDrive, SnowMountain and Kimono1 with a resolution of 1920×1080, are used in the evaluation. There are 50 frames in each test sequence, and four QPs (20, 24, 28 and 32) are used. The first and the second columns represent the resolutions and test sequences used in the simulation. In the third to fifth columns, the increase in bitrate, the increase in PSNR and the time saved, denoted by B, P and T, respectively, are shown when the ES provided in the reference software is applied. The time is reduced by 60.76%, whereas the bitrate slightly decreases and the PSNR is degraded by 0.02dB. Unlike the RD cost-based algorithm in the reference software, Yu's algorithm [37] makes the early SKIP mode decision by considering neighboring information and the characteristics of the current CU. In the sixth to eighth columns, the two algorithms are used together to complement each other. The time is reduced by 69.82%.
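The three conditions attributed to Yu's algorithm [37] can be sketched as a single predicate. The text lists the conditions but does not state how they are combined, so this example assumes that all three must hold; the function name, the mode labels and the input layout are hypothetical.

```python
def yu_early_skip(neighbor_modes, current_sad, neighbor_sads, quantized_coeffs):
    """Return True when the early SKIP decision is taken.

    neighbor_modes: coding modes of the neighboring blocks, e.g. "SKIP".
    current_sad: SAD between the current block and the co-located block
        in the reference frame.
    neighbor_sads: SADs of the neighboring blocks.
    quantized_coeffs: coefficients after the fast transform and quantization.
    """
    # Condition 1: at least one neighboring block is coded in the SKIP mode.
    has_skip_neighbor = any(mode == "SKIP" for mode in neighbor_modes)
    # Condition 2: the current SAD is below the average neighboring SAD.
    below_average = (bool(neighbor_sads)
                     and current_sad < sum(neighbor_sads) / len(neighbor_sads))
    # Condition 3: every transform-quantized coefficient is zero.
    all_zero = all(coeff == 0 for coeff in quantized_coeffs)
    # Assumption: the three conditions are combined with a logical AND.
    return has_skip_neighbor and below_average and all_zero
```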

TABLE III
RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY AN EARLY SKIP MODE DECISION

                             ES in HM                   ES in HM + Yu's
Size       Videos            B (%)    P (dB)   T (%)    B (%)    P (dB)   T (%)
832×480    BQMall            -0.32    -0.02    53.81    2.45     -0.13    67.62
           FlowerVase        -0.03    -0.01    71.87    -0.56    -0.09    81.95
           Keiba             -0.13    -0.01    38.96    2.76     -0.08    53.05
           RaceHorses        -0.18    -0.02    31.48    1.49     -0.08    39.98
1280×720   FourPeople        -0.84    -0.03    82.55    -0.30    -0.09    88.00
           Johnny            -1.15    -0.03    78.36    -1.30    -0.07    84.50
           KristenAndSara    -1.06    -0.02    79.08    -0.63    -0.08    84.92
           Vidyo1            -0.62    -0.02    78.08    -0.29    -0.10    85.61
1920×1080  Aspen             -0.19    -0.01    59.42    0.68     -0.05    68.72
           BasketBallDrive   -0.43    -0.02    54.68    0.97     -0.07    63.27
           SnowMountain      -0.50    -0.02    58.79    -2.13    -0.10    68.53
           Kimono1           -0.16    -0.01    42.09    0.77     -0.04    51.73
Average                      -0.47    -0.02    60.76    0.33     -0.08    69.82



The ECU and CFM algorithms are applied to HEVC and the results are presented in Table IV. In the third to fifth columns, the ECU algorithm is used alone. The encoding time is reduced by 48.03% and the RD drop is marginal. When the ECU and CFM algorithms are used together, the encoding time saved is 64.03%, whereas the PSNR is 0.07dB less than that of ECU. When the three algorithms of ECU, CFM and ES are used together, only 5% of the time is additionally saved.

According to the categorization in Section III, the early

SKIP mode decision, ECU, and CFM are all classified as

belonging in the first sub-category in the hierarchical decision.

On the other hand, no algorithms for the second sub-category

are defined in the HM5.0 reference software. The second sub-

category algorithms for H.264/AVC are applied to HEVC

compression and the effect on the RD performance and the

speed-up is investigated. In a number of previous block-size-

reduction algorithms, the prediction of the block size at the

lower depth is performed first and the searches for deeper

depths are then stopped if a certain condition is satisfied

[18][31][37]. The following three algorithms are classified asbelonging in the third sub-category in the hierarchical decision

according to the categorization in Section III.A Early

termination of CU, which is similar to the algorithm proposed

by Lee [31] (denoted as ETCU1 henceforth), is applicable for

the reduction of the block size search. The predictions for four

CUs at depth (d+1) are performed after the prediction for CU

at depth (d). Every time the prediction for the CU at depth

(d+1) is finished, the RD cost of each CU is accumulated and

compared with the early termination threshold. If the current

accumulated RD cost at depth (d+1) is larger than the

threshold, the total RD cost of four CUs at depth (d+1) is

expected to be larger than that of the corresponding CU at

depth (d). Thus, the ongoing prediction at depth (d+1) isterminated early. The threshold is derived from the RD cost at

depth (d). In the Yus algorithm [37], if the RD cost of

2N2N at depth (d+1) is greater than a quarter of the best RD

cost at depth (d), further searches on 2NN and N2N PUs at

depth (d+1) as well as deeper depths are not performed. This

algorithm is denoted as ETCU2 henceforth. Another early

termination algorithm proposed not performing a FME

operation at each depth [18]. This strategy is denoted as

FME_SKIP hereafter. The SKIP mode plays an important role

in compression efficiency and SKIP mode prediction is, thus,

always performed, even when various fast-mode decision

schemes are applied. The result of the SKIP mode prediction

is obtained very quickly due to its low complexity as

compared to other inter- and intra-predictions. If the ME cost

as estimated in the middle of its computation is greater than

the SKIP cost, the ME operation is terminated. A specific

algorithm is as follows. After IME, the IME cost is compared

to the cost of the SKIP mode using the condition C FME_SKIP as

defined in (1). Here, COSTSKIP is the cost of the SKIP mode,

whereas COSTIME denotes the IME cost. If COSTSKIP is less

than COSTIME multiplied by WFME_SKIP, FME is not performed

for the current block. The weight value, WFME_SKIP, is chosen

experimentally and is set to 0.8 because it is observed that the

cost obtained from FME is approximately 80% of COST IME

on average. Therefore, the final cost of ME can be estimated

as 0.8COSTIME, and this estimated ME cost is compared with

COSTSKIP.

CFME_SKIP: COSTSKIP < WFME_SKIPCOSTIME (1)
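The three early-termination tests described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper or in HM5.0: the function names `etcu1_terminate`, `etcu2_terminate` and `fme_skip` are hypothetical, the cost values are made up, and the ETCU1 threshold is simply passed in as a parameter because the text only states that it is derived from the RD cost at depth (d).

```python
W_FME_SKIP = 0.8  # weight from the text: FME cost is roughly 80% of the IME cost


def etcu1_terminate(rd_costs_d1, threshold):
    """ETCU1-style check (hypothetical helper).

    Accumulates the RD costs of the sub-CUs at depth d+1 one by one and
    stops as soon as the running sum exceeds the threshold.
    Returns (terminated_early, number_of_sub_cus_evaluated).
    """
    accumulated = 0.0
    for evaluated, cost in enumerate(rd_costs_d1, start=1):
        accumulated += cost
        if accumulated > threshold:
            return True, evaluated
    return False, len(rd_costs_d1)


def etcu2_terminate(rd_2nx2n_d1, best_rd_d):
    """ETCU2-style check: the 2Nx2N cost at depth d+1 exceeds a quarter
    of the best RD cost at depth d, so further searches are skipped."""
    return rd_2nx2n_d1 > best_rd_d / 4.0


def fme_skip(cost_skip, cost_ime, weight=W_FME_SKIP):
    """Condition (1): skip FME when COST_SKIP < W_FME_SKIP * COST_IME."""
    return cost_skip < weight * cost_ime


# Illustrative values only. Assumption: the ETCU1 threshold is set equal
# to the RD cost at depth d.
best_rd_d = 1000.0
print(etcu1_terminate([300.0, 400.0, 350.0, 200.0], threshold=best_rd_d))
print(etcu2_terminate(260.0, best_rd_d))
print(fme_skip(cost_skip=750.0, cost_ime=1000.0))
```

With these sample numbers, the ETCU1 check stops after the third sub-CU (300 + 400 + 350 exceeds 1000), so the fourth sub-CU is never predicted, which is where the time saving comes from.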

In Table IV, from the twelfth to fourteenth columns, the ETCU1 algorithm is used alongside the ECU and CFM algorithms. The encoding time is reduced by 68.26%, whereas the increase in bitrate and the PSNR drop are 1.88% and 0.23 dB, respectively. The ETCU2 algorithm, from the fifteenth to seventeenth columns, shows a time saving of 75.25%, but the RD drop is quite large (a 7.40% bitrate increase and a 0.37 dB PSNR drop on average). Lastly, from the eighteenth to twentieth columns, the simulation results are shown when the ECU, CFM and FME_SKIP algorithms are used together. A time saving of 85.95% is achieved, whereas the RD performance is much better than those of ETCU1 and ETCU2. For the three simulations, ECU+CFM+ETCU1, ECU+CFM+ETCU2 and ECU+CFM+FME_SKIP, additionally using the ES algorithm helps neither the time saving nor the RD performance.

C. Decision of Prediction Block Sizes before FME

In H.264/AVC, AMPD (or AMPD2) or MF has been successfully used for block size reduction. As explained in Section II, the reduction of the FME time is very important for real-time encoding for a hardware-based encoder

TABLE IV
RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY THE ECU AND CFM ALGORITHMS PROPOSED IN THE HM5.0 REFERENCE SOFTWARE

                            ECU                  | ECU+CFM              | ES+ECU+CFM           | ECU+CFM+ETCU1        | ECU+CFM+ETCU2        | ECU+CFM+FME_SKIP
Size       Video             B (%) P (dB)  T (%) |  B (%) P (dB)  T (%) |  B (%) P (dB)  T (%) |  B (%) P (dB)  T (%) |  B (%) P (dB)  T (%) |  B (%) P (dB)  T (%)
832×480    BQMall            -0.87  -0.05  44.84 |  -1.47  -0.12  60.83 |  -1.81  -0.16  65.49 |   3.48  -0.25  64.04 |  19.78  -0.58  72.04 |  -1.10  -0.15  82.49
           FlowerVase        -1.77  -0.07  66.12 |  -3.97  -0.21  82.30 |  -4.11  -0.23  86.46 |   1.79  -0.52  84.82 |   5.44  -0.63  86.34 |  -2.63  -0.29  96.35
           Keiba             -0.51  -0.03  30.16 |  -1.13  -0.09  47.67 |  -1.36  -0.11  51.64 |   3.23  -0.17  53.90 |  10.49  -0.31  64.86 |  -0.70  -0.12  78.27
           RaceHorses        -0.27  -0.02  14.67 |  -0.66  -0.09  32.62 |  -0.93  -0.14  38.16 |   5.98  -0.27  43.83 |  16.76  -0.48  55.73 |  -0.18  -0.10  66.37
1280×720   FourPeople        -1.15  -0.04  67.92 |  -1.89  -0.09  81.95 |  -2.42  -0.12  86.97 |   2.71  -0.28  83.15 |  10.20  -0.47  86.34 |  -1.85  -0.12  95.49
           Johnny            -1.96  -0.04  64.72 |  -3.95  -0.13  80.83 |  -4.85  -0.16  85.96 |  -0.36  -0.22  82.02 |   5.91  -0.39  85.71 |  -3.91  -0.18  95.05
           KristenAndSara    -1.79  -0.06  64.35 |  -2.79  -0.14  80.33 |  -3.52  -0.18  85.34 |   2.36  -0.30  81.70 |   9.21  -0.50  85.15 |  -2.67  -0.17  95.30
           Vidyo1            -1.89  -0.05  65.23 |  -2.80  -0.13  80.14 |  -3.24  -0.15  85.03 |   1.70  -0.30  82.14 |   6.88  -0.44  84.90 |  -2.85  -0.16  94.92
1920×1080  Aspen             -0.34  -0.01  45.75 |  -1.01  -0.05  62.68 |  -1.24  -0.07  67.58 |   1.71  -0.12  67.94 |   1.47  -0.14  74.94 |  -0.61  -0.08  86.06
           BasketBallDrive   -0.53  -0.02  35.41 |  -1.27  -0.07  53.89 |  -1.72  -0.10  60.35 |   2.16  -0.12  61.24 |   3.69  -0.17  69.32 |  -1.17  -0.09  81.94
           SnowMountain      -2.31  -0.08  52.21 |  -3.30  -0.12  64.35 |  -3.46  -0.13  67.82 |  -3.20  -0.17  66.15 |  -3.11  -0.28  75.54 |  -3.10  -0.13  82.42
           Kimono1           -0.16  -0.01  24.94 |  -0.32  -0.03  40.79 |  -0.64  -0.07  46.65 |   1.04  -0.06  48.15 |   2.14  -0.10  62.14 |  -0.06  -0.05  76.73
Average                      -1.13  -0.04  48.03 |  -2.05  -0.11  64.03 |  -2.44  -0.13  68.95 |   1.88  -0.23  68.26 |   7.40  -0.37  75.25 |  -1.74  -0.14  85.95



implementation. Thus, in this subsection, the above algorithms are applied to HEVC and their effectiveness is then tested through simulation. To apply AMPD2 or MF to the HEVC mode decision, candidate partitions should be defined during the IME phase. Fig. 2 shows one example of a block size prediction in the IME phase. In Clusters 1 and 2, there are three 64×64 CU partitions and three 32×32 CU partitions, respectively, whereas there are four 16×16 CU partitions in Cluster 3. For the 8×8 CU partition, the best block size is selected based on the IME cost. For the 16×16 CU partition, the IME costs of the 2N×2N, N×2N, 2N×N and N×N types are sorted in ascending order, whereas the IME costs of the 2N×2N, N×2N and 2N×N types are sorted in ascending order for the 32×32 and 64×64 CU partitions. Through this process, ten partitions in total are selected for FME, as shown in Fig. 2.

[Figure: candidate partitions grouped into Cluster 1 (64×64), Cluster 2 and Cluster 3]

Fig. 2. Prediction modes pre-determined in the IME phase
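The per-cluster ranking of IME costs can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `select_fme_candidates`, the dictionary layout, and all IME cost values are hypothetical, and each cluster is treated as one flat set of partition types. Only the idea of sorting IME costs in ascending order within a cluster and passing a fixed number of partitions per cluster to FME comes from the text; the quotas mirror the 10-, 7- and 4-candidate configurations reported in Table V.

```python
def select_fme_candidates(ime_costs, quota):
    """Keep the lowest-IME-cost partition types in each cluster.

    ime_costs: {cluster: {partition_type: IME cost}}  (hypothetical layout)
    quota:     {cluster: number of partitions passed on to FME}
    """
    selected = {}
    for cluster, costs in ime_costs.items():
        # Sort partition types by IME cost in ascending order.
        ranked = sorted(costs, key=costs.get)
        selected[cluster] = ranked[:quota[cluster]]
    return selected


# Made-up IME costs (illustrative values only).
ime_costs = {
    "cluster1_64x64": {"2Nx2N": 5200, "Nx2N": 4900, "2NxN": 5100},
    "cluster2_32x32": {"2Nx2N": 1400, "Nx2N": 1350, "2NxN": 1500},
    "cluster3_16x16": {"2Nx2N": 420, "Nx2N": 390, "2NxN": 410, "NxN": 405},
}

# Per-cluster quotas for the 10-, 7- and 4-candidate configurations.
quotas = {
    10: {"cluster1_64x64": 3, "cluster2_32x32": 3, "cluster3_16x16": 4},
    7: {"cluster1_64x64": 3, "cluster2_32x32": 2, "cluster3_16x16": 2},
    4: {"cluster1_64x64": 1, "cluster2_32x32": 1, "cluster3_16x16": 2},
}

for total, quota in quotas.items():
    chosen = select_fme_candidates(ime_costs, quota)
    print(total, chosen)
```

Reducing the quota shrinks the number of partitions on which the serial FME stage has to run, which is the source of the time saving in this scheme.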

From the third to fifth columns in Table V, the RD performance degradation and the encoding time saved are shown when FME is performed for the ten candidate partitions of Fig. 2. The time saving is 30.15%, whereas the RD drop is marginal. From the sixth to eighth columns, seven candidates are chosen: three from Cluster 1, two from Cluster 2 and two from Cluster 3. The time is reduced by 57.44%, whereas the increase in bitrate and the drop in the PSNR are 0.21% and 0.04 dB on average. From the ninth to eleventh columns, FME is performed for four candidate partitions: one from Cluster 1, one from Cluster 2 and two from Cluster 3. The encoding time is reduced by 73.36%, whereas the increase in bitrate and the drop in the PSNR are 0.26% and 0.04 dB on average. These simulations show that this algorithm is very effective for speed-up without a significant RD degradation for all types of video sequences.

D. Algorithm Evaluation

From Sections IV.A to IV.C, it can be inferred that, in HEVC, pre-decisions of prediction block sizes are very difficult, whereas hierarchical decisions or decisions based on the results from IME are useful for saving time. However, some of these algorithms offer a different degree of performance according to the video characteristics. In Fig. 3, the times saved by various algorithms are compared for the RaceHorses and FourPeople video sequences, denoted by black and gray bars, respectively. The FourPeople sequence has slow motion and its texture is smooth, whereas the RaceHorses sequence includes fast and irregular motion. The hierarchical decision presented in the HM5.0 reference software, including the ES scheme, is very effective for the FourPeople sequence. However, for the RaceHorses sequence, the benefit from the ES, ECU and CFM algorithms is not large and is less than half of that for the FourPeople sequence. Another notable observation is that the combination of ES, ECU and CFM increases the time saving. However, the increase is not significant because the effects of those schemes overlap in many cases. When the ECU and CFM schemes are combined with the other hierarchical decision schemes, ETCU1, ETCU2 and FME_SKIP, the time saving for the RaceHorses sequence is improved substantially, whereas the time saving for the FourPeople sequence is increased only slightly. Unlike the other hierarchical decision algorithms, the AMPD algorithms show similar performance for both video sequences. The time saving increases as the number of candidates is reduced.

As shown in Fig. 3, most algorithms show significant time savings for the FourPeople sequence, whereas the variation in the saved time is very large for the RaceHorses sequence. Only four combinations, ECU+CFM+ETCU2, ECU+CFM+FME_SKIP and the AMPD algorithms with 7 and 4 candidates, show time savings of over 50% for both FourPeople and RaceHorses.

Fig. 3. Algorithm comparison in terms of the time saved
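The 50% observation can be checked against the tabulated results. The short sketch below is an illustration only: it assumes that the bars of Fig. 3 correspond to the time-saving (T) entries of the RaceHorses and FourPeople rows in Tables III, IV and V, which are copied here by hand, and the dictionary keys are informal labels rather than names used by the paper.

```python
# Time saved (%) as (RaceHorses, FourPeople), taken from Tables III-V.
time_saved = {
    "ES": (31.48, 82.55),
    "ES+Yu": (39.98, 88.00),
    "ECU": (14.67, 67.92),
    "ECU+CFM": (32.62, 81.95),
    "ES+ECU+CFM": (38.16, 86.97),
    "ECU+CFM+ETCU1": (43.83, 83.15),
    "ECU+CFM+ETCU2": (55.73, 86.34),
    "ECU+CFM+FME_SKIP": (66.37, 95.49),
    "AMPD, 10 candidates": (30.23, 30.08),
    "AMPD, 7 candidates": (55.92, 57.71),
    "AMPD, 4 candidates": (71.73, 73.87),
}

THRESHOLD = 50.0  # percent

# Keep the combinations that exceed the threshold on both sequences.
effective = [
    name
    for name, (race_horses, four_people) in time_saved.items()
    if race_horses > THRESHOLD and four_people > THRESHOLD
]
print(effective)

# Ratio of RaceHorses to FourPeople saving for the HM5.0 schemes combined.
rh, fp = time_saved["ES+ECU+CFM"]
print(round(rh / fp, 2))
```

Under this assumption the filter returns exactly the four combinations named in the text, and the ratio for ES+ECU+CFM is below one half, in line with the statement that the HM5.0 schemes are far less effective for RaceHorses.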

TABLE V
RD PERFORMANCE DEGRADATION AND ENCODING TIME SAVED ACCORDING TO MODES DETERMINED IN THE IME PHASE

                            10 candidates        | 7 candidates         | 4 candidates
Size       Video             B (%) P (dB)  T (%) |  B (%) P (dB)  T (%) |  B (%) P (dB)  T (%)
832×480    BQMall             0.25  -0.03  30.23 |   0.58  -0.05  57.24 |   0.48  -0.05  72.98
           FlowerVase        -0.05  -0.04  30.23 |   0.20  -0.07  58.19 |   0.29  -0.07  73.93
           Keiba              0.16  -0.02  30.23 |   0.75  -0.05  56.74 |   1.01  -0.04  72.42
           RaceHorses         0.61  -0.04  30.23 |   1.30  -0.07  55.92 |   1.37  -0.07  71.73
1280×720   FourPeople        -0.04  -0.01  30.08 |   0.01  -0.03  57.71 |   0.08  -0.03  73.87
           Johnny            -0.46  -0.01  30.08 |  -0.62  -0.03  57.64 |  -0.24  -0.04  73.68
           KristenAndSara    -0.37  -0.02  30.08 |  -0.20  -0.03  57.71 |  -0.18  -0.03  73.75
           Vidyo1            -0.03  -0.01  30.08 |  -0.23  -0.02  57.71 |  -0.01  -0.03  73.75
1920×1080  Aspen             -0.05   0.00  30.14 |   0.39  -0.02  57.95 |   0.41  -0.02  73.86
           BasketBallDrive   -0.10  -0.01  30.14 |   0.42  -0.02  57.54 |   0.38  -0.02  73.50
           SnowMountain      -0.34  -0.04  30.14 |  -0.51  -0.06  57.12 |  -0.55  -0.06  73.15
           Kimono1           -0.09   0.00  30.14 |   0.39  -0.02  57.78 |   0.08  -0.02  73.74
Average                      -0.04  -0.02  30.15 |   0.21  -0.04  57.44 |   0.26  -0.04  73.36



In Fig. 4, the RD performances of the ECU+CFM+ETCU2, ECU+CFM+FME_SKIP and AMPD algorithms with 7 and 4 candidates are compared to that of the HM5.0 reference software, in which no early decision algorithm is adopted. The horizontal and vertical axes show the bitrate and the PSNR, respectively. The RaceHorses and FourPeople video sequences are used in Figs. 4(a) and (b), respectively. The RD performance of three of the algorithms, ECU+CFM+FME_SKIP and the two AMPD configurations, is comparable to that of the HM5.0 reference software, whereas the RD drop of the ECU+CFM+ETCU2 algorithm, denoted by the dashed curve, is quite large.

[Figure: RD curves, PSNR (dB) versus bitrate (kbps), for ECU+CFM+FME_SKIP, AMPD with 7 candidates, AMPD with 4 candidates, HM5.0 and ECU+CFM+ETCU2; panels (a) and (b)]

Fig. 4. Algorithm comparison in terms of the RD performance: (a) 832×480 RaceHorses sequence, (b) 1280×720 FourPeople sequence

V. CONCLUSION

The HEVC standard employs a hybrid coding approach similar to that of the H.264/AVC standard. Thus, the two standards have much in common. In this paper, the fast mode decision algorithms for H.264/AVC are surveyed and then applied to speed up HEVC encoding. One of the major differences between the two standards is that the number of block sizes supported by HEVC is 10 times more than that of H.264/AVC. The other is that the execution time for FME becomes much larger than that for IME because IME execution can be sped up by exploiting parallelism, while FME needs to be executed in a serial manner. This second difference makes the fast execution of FME more important than that of IME when a hardware-based encoder is used for HEVC compression. It is experimentally shown that a hierarchical inter-mode decision algorithm is a very effective solution for HEVC because there are many opportunities to terminate further prediction while searching a tree of CUs. In the future, the previous algorithms tested in this paper need to be further elaborated and enhanced.

REFERENCES

[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264-ISO/IEC 14496-10 AVC), 2003.

[2] ISO/IEC JTC 1 SC29 WG11, "Joint Call for Proposals on Video Compression Technology," Doc. N11113, Jan. 2010.

[3] ISO/IEC JTC 1 SC29 WG11, "Vision, Applications and Requirements of High-Performance Video Coding," Doc. N11096, Jan. 2010.

[4] T. Wiegand, W.J. Han, B. Bross, J. R. Ohm, and G.J. Sullivan, "WD4: Working Draft 4 of High-Efficiency Video Coding," JCTVC-F803, Torino, IT, July 2011.

[5] Y.-K. Lin, D.-W. Li, C.-C. Lin, T.-Y. Kou, S.-J. Wu, W.-C. Tai, W.-C. Chang, and T.-Sheuan Chang, "A 242mW, 10mm2 1080p H.264/AVC High Profile Encoder Chip," in Proc. of Design Automat. Conf., pp. 78-83, July 2008.

[6] Y.-H. Chen, T.-D. Chuang, Y.-J. Chen, C.-T. Li, C.-J. Hsu, S.-Y. Chien, and L.-G. Chen, "An H.264/AVC scalable extension and high profile HDTV 1080p encoder chip," in Proc. of Sym. on VLSI Circuits, pp. 104-105, Aug. 2008.

[7] Y.-H. Chen, T.-C. Chen, and L.-G. Chen, "Power-scalable algorithm and reconfigurable macro-block pipelining architecture of H.264 encoder for mobile application," in Proc. Int. Conf. Multimedia Expo, pp. 281-284, Dec. 2006.

[8] T.-C. Chen, S.-Y. Chien, Y.-W. Huang, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, and L.-G. Chen, "Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp. 673-688, June 2006.

[9] H.-C. Chang, Y.-C. Yang, J.-W. Chen, C.-L. Su, C.-A. Chien, J.-I. Guo, and J.-S. Wang, "A dynamic quality-scalable H.264 video encoder chip," in Proc. Asia South Pacific Design Automat. Conf., pp. 125-126, Feb. 2009.

[10] Y.-K. Lin, C.-C. Lin, T.-Y. Kuo, and T.-S. Chang, "A Hardware-Efficient H.264/AVC Motion-Estimation Design for High-Definition Video," IEEE Trans. Circuits and System I, vol. 55, no. 6, pp. 1526-1535, July 2008.

[11] C. Yang, S. Goto and T. Ikenaga, "High Performance VLSI Architecture of Fractional Motion Estimation in H.264 for HDTV," in Proc. of Int. Symposium on Circuits and Systems, pp. 2605-2608, May 2006.

[12] C.-Y. Kao, C.-L. Wu and Y.-L. Lin, "A High-Performance Three-Engine Architecture for H.264/AVC Fractional Motion Estimation," IEEE Trans. Very Large Scale Integration Sys., vol. 18, no. 4, pp. 662-666, April 2010.

[13] P. K. Tsung, W.-Y. Chen, L.-F. Ding, S.-Y. Chien, L.-G. Chen, "Cache-based Integer Motion/Disparity Estimation for Quad-HD H.264/AVC and HD Multiview Video Coding," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 2013-2016, April 2009.

[14] C.-M. Ou, C.-F. Le, W.-J. Hwang, "An efficient VLSI architecture for H.264 variable block size motion estimation," IEEE Trans. Consumer Electronics, vol. 51, no. 4, pp. 1291-1299, Nov. 2005.

[15] J. Kim and T. Park, "A novel VLSI architecture for full-search variable block-size motion estimation," IEEE Trans. Consumer Electronics, vol. 55, no. 2, pp. 728-733, May 2009.

[16] L. Zhang and W. Gao, "Reusable Architecture and Complexity-Controllable Algorithm for the Integer/Fractional Motion Estimation of H.264," IEEE Trans. Consumer Electronics, vol. 53, no. 2, pp. 749-756, May 2007.

[17] X. Lu, A.M. Tourapis, P. Yin, and J. Boyce, "Fast Mode Decision and Motion Estimation for H.264 with a Focus on MPEG-2/H.264 Transcoding," in Proc. of Int. Symposium on Circuits and Systems, vol. 2, pp. 1246-1249, May 2005.

[18] C. E. Rhee, J.-S. Kim, and H.-J. Lee, "Cascaded Direction Filtering for Fast Multidirectional Inter-Prediction in H.264/AVC Main and High Profile Compression," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 403-413, March 2012.

[19] B.-G. Kim, S.-K. Song, and C.-S. Cho, "Efficient inter-mode decision based on contextual prediction for the P-slice in H.264/AVC video coding," in Proc. Int. Conf. Image Processing, pp. 1333-1336, Oct. 2006.

[20] B.-G. Kim and C.-S. Cho, "A fast inter-mode decision algorithm based on macro-Block tracking for P slices in the H.264/AVC video standard," in Proc. Int. Conf. Image Processing, vol. 5, pp. 301-304, Sept. 2007.

[21] X. Jin, Y. Huang, Q. Liu, S. Wu, and T. Ikenaga, "Fast Spatial Direct Mode Decision for B Slice based on Temporal Information in H.264 Standard," in Proc. of Int. Sym. on Intelligent Signal Processing and Communication Systems, pp. 331-334, Jan. 2009.

[22] T. Zhao, H. Wang, S. Kwong, and C.-C. J. Kuo, "Fast Mode Decision Based on Mode Adaptation," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5, pp. 697-705, May 2010.



[23] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja, and C. C. Ko, "Fast Intermode Decision in H.264/AVC Video Coding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 953-958, July 2005.

[24] C. Y. Chang, C. H. Pan, and H. Chen, "Fast mode decision for P-frames in H.264," presented at the Picture Coding Symp., Dec. 2004.

[25] S.-H. Ri, Y. Vatis, and J. Ostermann, "Fast Inter-Mode Decision in an H.264/AVC Encoder Using Mode and Lagrangian Cost Correlation," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 302-306, Feb. 2009.

[26] X. Jing and L.-P. Chau, "Fast approach for H.264 INTER mode decision," Electronics Letters, vol. 40, no. 17, pp. 1050-1052, Aug. 2004.

[27] A. Ahmad, N. Khan, S. Masud, and M.A. Maud, "Selection of variable block sizes in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 173-176, May 2004.

[28] H. Zeng, C. Cai, and K.-K. Ma, "Fast Mode Decision for H.264/AVC Based on Macroblock Motion Activity," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp. 491-499, April 2009.

[29] J. Bu, S. Lou, C. Chen, and J. Zhu, "A predictive block-size mode selection for inter frame in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pp. 917-920, May 2006.

[30] H. Ko, K. Yoo, and K. Sohn, "Fast mode-decision for H.264/AVC based on inter-frame correlations," Signal Processing: Image Commun., vol. 24, no. 10, pp. 803-813, Nov. 2009.

[31] J. Y. Lee and H. Park, "A Fast Mode Decision Method Based on Motion Cost and Intra Prediction Cost for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 393-402, March 2012.

[32] D. Wu, S. Wu, K. P. Lim, F. Pan, Z. G. Li, and X. Lin, "Block intermode decision for fast encoding of H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 181-184, May 2004.

[33] Z. Liu, L. Shen, and Z. Zhang, "An Efficient Intermode Decision Algorithm Based on Motion Homogeneity for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 128-132, Jan. 2009.

[34] D. Zhu, Q. Dai, and R. Ding, "Fast inter-prediction mode decision for H.264," in Proc. Int. Conf. Multimedia Expo, vol. 2, pp. 1123-1126, June 2004.

[35] C.-H. Kuo, M. Shen, and C.-C. J. Kuo, "Fast inter-prediction mode decision and motion search for H.264," in Proc. IEEE Int. Conf. Multimedia Expo, vol. 1, pp. 663-666, June 2004.

[36] P. Yin, H.-Y.C. Tourapis, A.M. Tourapis, and J. Boyce, "Fast mode decision and motion estimation for JVT/H.264," in Proc. of the IEEE Int. Conf. on Image Processing, vol. 3, pp. 853-856, Sept. 2003.

[37] A. C. W. Yu, G. R. Martin, and H. Park, "Fast Inter-Mode Selection in the H.264/AVC Standard Using a Hierarchical Decision Process," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 186-195, April 2009.

[38] G. Kim, Y. Moon, and J. Kim, "An early detection of all-zero DCT block in H.264," in Proc. Int. Conf. Image Processing, vol. 1, pp. 453-456, Oct. 2004.

[39] J. Lee and B. W. Jeon, "Fast mode decision for H.264 with variable motion block size," Lecture Notes in Computer Science, vol. 2869, pp. 723-730, 2003.

[40] I. Choi, J. Lee, and B. Jeon, "Fast coding mode selection with rate-distortion optimization for MPEG-4 part-10 AVC/H.264," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 12, pp. 1557-1561, Dec. 2006.

[41] Y.-H. Kim, J.-W. Yoo, S.-W. Lee, J. Shin, J. Paik, and H.-K. Jung, "Adaptive mode decision for H.264 encoder," Electronics Letters, vol. 40, no. 19, pp. 1172-1173, Sept. 2004.

[42] J. Lee and B. Jeon, "Pruned mode decision based on variable block sizes motion compensation for H.264," Lecture Notes in Computer Science, vol. 2899, pp. 410-418, Nov. 2003.

[43] C.S. Kannangara, I.E.G. Richardson, M. Bystrom, J.R. Solera, Y. Zhao, A. MacLennan, and R. Cooney, "Low complexity skip prediction for H.264 through Lagrangian cost estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 2, pp. 202-208, Feb. 2006.

[44] Y. Moon, G. Kim, and J. Kim, "An improved early detection algorithm for all-zero blocks in H.264 video encoding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1053-1057, Aug. 2005.

[45] T.-C. Chen, Y.-W. Huang, and L.-G. Chen, "Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 9-12, May 2004.

[46] M. Shao, Z. Liu, S. Goto, and T. Ikenaga, "Lossless VLSI oriented full computation reusing algorithm for H.264/AVC fractional motion estimation," IEICE Trans. Fundamentals, vol. 90-A, no. 5, pp. 756-763, April 2007.

[47] Y. Song, M. Shao, Z. Liu, S. Li, L. Li, T. Ikenaga, and S. Goto, "H.264/AVC fractional motion estimation engine with computation reusing in HDTV1080p real-time encoding applications," in Proc. of the IEEE Workshop on Signal Processing Systems, pp. 509-514, Oct. 2007.

BIOGRAPHIES

Chae Eun Rhee received the B.S., M.S. and Ph.D. degrees in Electrical Engineering and Computer Science from Seoul National University, Seoul, Korea, in 2000, 2002 and 2011, respectively. From 2002 to 2005, she was with the Digital TV Development Group, Samsung Electronics Company Ltd., Suwon City, Korea, as an Engineer, where she was involved in bus architecture and MPEG decoder development. She is currently working as a research professor in Electrical Engineering and Computer Science at Seoul National University, Korea. Her research interests include algorithm and architecture design of video coding for HEVC and H.264/AVC and configurable video coding for real-time systems.

Kyujoong Lee received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2002 and the M.S. degree in electrical engineering from the University of Southern California, Los Angeles, USA, in 2008. He is working toward the Ph.D. degree in electrical engineering at Seoul National University. From 2002 to 2005, he was with Com2us Corporation, Seoul, Korea, as a developer. His major research interests include the algorithm and architecture of H.264/AVC and SVC and noise reduction of video streams.

Tae-Sung Kim received the B.S. degree in electrical electronic engineering from Pusan National University, Pusan, Korea, in 2010. He is working toward the M.S. degree in electrical engineering at Seoul National University. His research interests include the algorithm and architecture of H.264/AVC and HEVC.

Hyuk-Jae Lee received the B.S. and M.S. degrees in Electronics Engineering from Seoul National University, Korea, in 1987 and 1989, respectively, and the Ph.D. degree in Electrical and Computer Engineering from Purdue University at West Lafayette, Indiana, in 1996. From 1998 to 2001, he worked at the Server and Workstation Chipset Division of Intel Corporation in Hillsboro, Oregon, as a senior component design engineer. From 1996 to 1998, he was on the faculty of the Department of Computer Science of Louisiana Tech University at Ruston, Louisiana. In 2001, he joined the School of Electrical Engineering and Computer Science at Seoul National University, Korea, where he is currently working as a Professor. He is a founder of Mamurian Design, Inc., a fabless SoC design house for multimedia applications. His research interests are in the areas of computer architecture and SoC design for multimedia applications.
