[ieee 2013 international siberian conference on control and communications (sibcon 2013) -...

2013 International Siberian Conference on Control and Communications (SIBCON)

978-1-4799-1062-5/13/$31.00 ©2013 IEEE

Abstract—The emerging video compression standard H.265/HEVC provides up to 2 times better compression efficiency than the H.264/AVC standard. However, it has a higher computational complexity. In this paper, we propose an iterative intra prediction search for the H.265/HEVC encoder that reduces the number of prediction modes to be estimated. The method is based on our study of the computational complexity and usage frequency of H.265/HEVC intra prediction modes. On JCT-VC test sequences, we obtained about 40% encoding time reduction for HM 10.1 intra-only coding with negligible bitrate increase and PSNR degradation. We also offer some additional speed-up techniques, including fast prediction error estimation.

Index Terms—video coding, HEVC, intra prediction search, complexity reduction.

I. INTRODUCTION

The evolution of multimedia systems increases the demand for data storage and transmission rates. The screen resolution and quality of video devices (TV sets, monitors, cellphone displays) keep increasing, and the efficiency of contemporary video compression systems becomes insufficient.

The emerging High Efficiency Video Coding standard H.265/HEVC [1] provides up to 2 times better compression efficiency than the H.264/AVC standard [2, 3], and can therefore reduce the demands on data rates.

HEVC provides a much more flexible frame representation structure by introducing the concepts of coding unit (CU), prediction unit (PU) and transform unit (TU). Besides, it provides a recursive quadtree structure for frame partitioning, larger block transforms, more efficient motion compensation and motion vector prediction, additional sample adaptive offset (SAO) filtering and a CABAC modification called syntax-based context-adaptive binary arithmetic coder (SBAC) [4]. All these improvements significantly increase decoding and especially encoding complexity. The mode decision algorithm must be optimized to achieve real-time compression speed.

In this paper, we study the computational complexity and usage frequency of HEVC intra prediction modes in RD-optimized coded streams. Based on this research, we propose an iterative intra prediction search that reduces the number of prediction modes evaluated during Rough Mode Decision (RMD) from 35 to at most 15. We also offer additional speed-up techniques, including fast prediction error estimation during the prediction process itself. We do not use any preprocessing; we only extend functions that an HEVC video encoder already has.

M. P. Sharabayko, P.G., Department of Computer Engineering, Institute of Cybernetics at Tomsk Polytechnic University (e-mail: [email protected]).

N. G. Markov, Prof., Department of Computer Engineering, Institute of Cybernetics at Tomsk Polytechnic University (e-mail: [email protected]).

The rest of this paper is structured as follows. Section II introduces intra prediction in HEVC and provides an overview of fast intra prediction methods developed for HEVC and AVC. Section III shows our research results on intra prediction complexity and usage frequency and describes the proposed iterative intra prediction technique. Section IV demonstrates the effectiveness of the proposed solution, followed by the conclusion in Section V.

II. HEVC INTRA PREDICTION SEARCH

HEVC introduces the concept of coding tree units (CTUs), an evolution of the 16×16 luma sample macroblock of AVC and previous standards. The CTU size can be 2N×2N luma samples, where N = 8, 16 or 32 [4]. A larger CTU size provides better compression efficiency for higher video resolutions.

Each CTU represents a top-level CU, and each CU can be adaptively partitioned into four sub-CUs until the minimum CU size is reached, thus forming a recursive quadtree structure.

A CU contains prediction units (PUs), which specify the prediction made. In the case of intra coding, the PU has the same size as the 2N×2N CU, except for a bottom-level CU of minimal size, which can also contain four N×N PUs.

A transform unit belongs to a CU and holds the transform data. A TU can be identical in size to its CU, and any TU may be further partitioned into four smaller TUs. HEVC allows adaptive transforms on 4×4, 8×8, 16×16 and 32×32 TUs.

Intra coding becomes more complicated as the number of intra prediction modes increases: the 10 intra prediction modes of AVC are extended to 35 modes, including planar, DC and 33 angular predictions (Fig. 1).

There are many research papers on the optimization of intra prediction search. The approaches fall into two classes: those that reduce the number of coding tree levels on which prediction is searched, and those that reduce the number of prediction modes checked on each level. Most of the techniques for AVC intra prediction search can be extended to HEVC.

Iterative Intra Prediction Search for H.265/HEVC

Maxim P. Sharabayko, Nikolay G. Markov


Elyousfi et al. [5] suggest determining the AVC intra prediction mode based on homogeneity characteristics, involving gradient and quadratic prediction. This reduces the complexity of intra prediction search to 76.07% while maintaining similar PSNR quality, with about 1.94% bitrate increase on average.

Liu et al. [6] propose an improved fast intra prediction algorithm for AVC, including block type selection and mode decision based on the analysis of block edge features. The authors reduce the computational complexity of intra prediction by 52.90% to 56.31% with 0.04 dB PSNR degradation and 2% bitrate increase.

Li et al. [7] create an edge map and build a local edge direction histogram, thus reducing the number of AVC intra prediction modes to check. They report 60% time saving for all-intra coding with an average PSNR loss of about 0.24 dB and a bitrate increase of 3.7%.

There are also papers on HEVC intra prediction optimization. For instance, the HEVC intra prediction mode decision based on edge direction information suggested by Silva et al. [8] decreases prediction time by up to 32.08% with at most 0.05 dB loss and 0.9% bitrate increase.

Zhao et al. [9] reduce the number of directions for the RDO process and compensate for possible mistakes by additionally evaluating the most probable mode. This approach results in 20% and 28% time savings in the intra high-efficiency and low-complexity cases, respectively, with negligible loss in BD-rate.

Sun et al. [10] suggest level filtering to reduce the number of prediction unit levels that require fine processing from 5 to 2. They also propose mode filtering to further reduce the number of angular modes for evaluation from 33 to 9. The reported complexity reduction is over 50% with a bitrate increase below 2.5%.

Most papers on fast intra prediction focus on accelerating the complex search model of reference encoders with exhaustive search and rate-distortion (RD) estimation. For example, the reference HM 10.1 encoder performs intra prediction search in two major steps. First, rough mode decision (RMD) is performed over all 35 intra prediction modes using the sum of absolute Hadamard-transformed differences (SATD). Then several best prediction modes with the least SATD cost (their number depends on the PU size) undergo expensive RD-cost estimation. Profiling results in [11] show that RMD takes about 20% of prediction search time, while RD estimation takes 80%. We observe similar results: RD estimation takes 75% and RMD only 25%.
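To make the RMD cost measure concrete, here is a minimal Python sketch of SATD on a 4×4 block next to plain SAD. It assumes the residual is transformed on both sides by an unnormalized 4×4 Hadamard matrix and the absolute coefficients are summed; HM additionally rescales the result and handles larger blocks, which this sketch does not reproduce. The function names are hypothetical.

```python
from typing import List, Sequence

Block = Sequence[Sequence[int]]

# Unnormalized 4x4 Hadamard matrix (symmetric, entries +/-1).
H4: List[List[int]] = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def sad4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute differences between original and predicted samples."""
    return sum(abs(orig[y][x] - pred[y][x]) for y in range(4) for x in range(4))


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute Hadamard-transformed differences (no final scaling)."""
    diff = [[orig[y][x] - pred[y][x] for x in range(4)] for y in range(4)]
    # Vertical pass: tmp = H4 * diff
    tmp = [[sum(H4[i][k] * diff[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    # Horizontal pass: coeff = tmp * H4^T (H4 is symmetric, so H4^T == H4)
    coeff = [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
             for i in range(4)]
    return sum(abs(c) for row in coeff for c in row)


if __name__ == "__main__":
    orig = [[5] * 4 for _ in range(4)]
    flat = [[3] * 4 for _ in range(4)]
    spike = [row[:] for row in orig]
    spike[0][0] = 4
    print(sad4x4(orig, flat), satd4x4(orig, flat))    # 32 32
    print(sad4x4(orig, spike), satd4x4(orig, spike))  # 1 16
```

A constant offset costs the same under both measures, while a single isolated error is spread over all 16 Hadamard coefficients; SAD avoids the transform passes entirely, which is why it is cheaper to compute.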

Some of the reviewed approaches improve compression speed by introducing a preprocessing stage [5, 6, 7, 8, 10] to extract block features and accurately guess the best prediction modes. However, real-time compression systems often use simplified RD estimation, so the preprocessing stage might become a bottleneck.

In [12], Hao Zhang and Zhan Ma reduce the number of intra prediction modes for RMD to at most 20 by progressive search. First, they estimate modes 0, 1, 6, 10, 14, 18, 22, 26, 30 and 34, and keep the six modes with the least SATD cost in the RD candidate list. Then, for each angular mode M from this list, they check the neighboring modes M-2 and M+2, where M ∈ {2…34} is the index of the angular mode (Fig. 1). Finally, they take the two angular prediction modes M1 and M2 with the least SATD cost and estimate the two closest neighbors of each (M1-1, M1+1, M2-1 and M2+1).

Additionally, the authors suggest speeding up SATD estimation by performing it on down-sampled residuals. Combining these techniques with early RDO termination, they achieve 38% encoding time reduction for the all-intra case.
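The progressive search of [12], as summarized above, can be sketched in Python as follows. This is our reading of the summary, not the code of [12]: `cost` is a hypothetical callable returning the SATD of a mode, and skipping neighbors that fall outside the angular range 2..34 is our assumption.

```python
from typing import Callable, Dict


def progressive_rmd(cost: Callable[[int], float]) -> Dict[int, float]:
    """Sketch of the three-stage progressive mode search summarized from [12]."""
    evaluated: Dict[int, float] = {}

    def check(mode: int) -> None:
        # Evaluate each mode at most once.
        if mode not in evaluated:
            evaluated[mode] = cost(mode)

    def check_angular_neighbor(mode: int) -> None:
        # Assumption: neighbors are only probed inside the angular range 2..34.
        if 2 <= mode <= 34:
            check(mode)

    # Stage 1: planar, DC and a coarse set of angular modes.
    for m in (0, 1, 6, 10, 14, 18, 22, 26, 30, 34):
        check(m)

    # Stage 2: keep the six cheapest modes; probe M-2 and M+2 of each angular one.
    six_best = sorted(evaluated, key=evaluated.__getitem__)[:6]
    for m in six_best:
        if m >= 2:
            check_angular_neighbor(m - 2)
            check_angular_neighbor(m + 2)

    # Stage 3: probe M-1 and M+1 of the two cheapest angular modes so far.
    angular = [m for m in evaluated if m >= 2]
    for m in sorted(angular, key=evaluated.__getitem__)[:2]:
        check_angular_neighbor(m - 1)
        check_angular_neighbor(m + 1)

    return evaluated
```

With a toy cost that is smallest near mode 13, this evaluates 20 modes and ends with mode 13 as the cheapest.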

The progressive mode search is very simple to implement and has good optimization potential. Yet we assume that this approach can be modified to achieve greater optimization benefits.

III. ITERATIVE INTRA PREDICTION SEARCH

Fast intra prediction search should provide a noticeable compression time reduction with minimal bitrate increase and PSNR quality degradation.

First, we accelerate prediction estimation itself. Error estimation can be combined with the intra prediction process, and there is no need to store predicted sample values. Hence, we eliminate extra read/write operations by estimating the distortion while generating each predicted sample of the PU, without storing the prediction result. We also skip post-filtering of the prediction at PU borders to simplify intra prediction for RMD. To further reduce the complexity of error estimation, we compute the sum of absolute differences (SAD) instead of SATD, since SAD can be accumulated for each sample independently and is less expensive. Prediction estimation could be further optimized with SIMD instructions, but we omit this in our research to keep the results proper and comparable.
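A minimal Python sketch of fused prediction and error estimation: each predicted sample is computed on the fly and immediately compared with the original sample, so no prediction block is stored. Only DC, pure horizontal (mode 10) and pure vertical (mode 26) are shown, without the boundary post-filter (which is skipped for RMD); the helper names are hypothetical.

```python
from typing import Callable, Sequence

Block = Sequence[Sequence[int]]


def fused_sad(orig: Block, predict_sample: Callable[[int, int], int]) -> int:
    """Accumulate SAD while generating predicted samples; nothing is stored."""
    n = len(orig)
    return sum(abs(orig[y][x] - predict_sample(x, y))
               for y in range(n) for x in range(n))


def sad_dc(orig: Block, top: Sequence[int], left: Sequence[int]) -> int:
    n = len(orig)
    # DC value: rounded mean of the n above and n left reference samples.
    dc = (sum(top[:n]) + sum(left[:n]) + n) // (2 * n)
    return fused_sad(orig, lambda x, y: dc)


def sad_vertical(orig: Block, top: Sequence[int]) -> int:
    # Mode 26: every sample copies the reference sample directly above.
    return fused_sad(orig, lambda x, y: top[x])


def sad_horizontal(orig: Block, left: Sequence[int]) -> int:
    # Mode 10: every sample copies the reference sample directly to the left.
    return fused_sad(orig, lambda x, y: left[y])
```

In Python this only illustrates the data flow; in the C++ encoder the point is that no n×n prediction buffer is written and then read back.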

There is an obvious need to reduce the number of intra prediction modes evaluated in RMD. Our approach relies on the observation that the local neighbors of the angular mode with minimal error are more likely to be good prediction modes than more distant modes. This conclusion follows from the geometric meaning of intra prediction directions and from our study of prediction errors for several PUs.

Fig. 1. HEVC angular intra prediction modes

We need to estimate as few prediction modes as possible while still reaching the local neighborhood of the global minimum of prediction error; the faster this estimation, the greater the speed-up. Table I shows the relative prediction times of all 35 modes, measured on the HM 10.1 implementation of the intra prediction functions. Each mode's prediction time is expressed relative to the DC prediction time (mode 1) for the same PU size.

As Table I shows, vertical modes 18, 26 and 34 are at most 3 times slower than DC. At the same time, they are the fastest angular modes, since each predicted sample is taken from a single reference sample. Horizontal modes 2 and 10 should have similar complexity, but they are almost 2 times slower than the vertical ones. The reason is that horizontal prediction is implemented inefficiently in HM 10.1: horizontal modes are predicted as if they were vertical, and the predicted block is then transposed. This last step slows down prediction. In this paper, we optimize horizontal prediction (modes 2-17) in HM 10.1 so that it becomes as efficient as vertical prediction.
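The transposition overhead can be illustrated for the pure horizontal mode 10 without boundary filtering. In this Python sketch, `predict_horizontal_via_transpose` mimics the approach described above (predict as if vertical, then transpose), while `predict_horizontal_direct` writes each row directly. Both names are hypothetical, and the sketch does not reproduce HM's actual code.

```python
from typing import List, Sequence

Block = List[List[int]]


def predict_vertical(ref: Sequence[int], n: int) -> Block:
    # Pure vertical prediction: every row is a copy of the reference row.
    return [list(ref[:n]) for _ in range(n)]


def transpose(block: Block) -> Block:
    # Extra pass over all n*n samples.
    return [list(col) for col in zip(*block)]


def predict_horizontal_via_transpose(left: Sequence[int], n: int) -> Block:
    # Treat the left column as if it were a top row, predict vertically, transpose.
    return transpose(predict_vertical(left, n))


def predict_horizontal_direct(left: Sequence[int], n: int) -> Block:
    # Row y is filled with left[y]; no transpose pass is needed.
    return [[left[y]] * n for y in range(n)]
```

Both functions produce the same block; the direct version simply avoids the second pass over the samples.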

Angular modes 3-9, 11-17, 19-25 and 27-33 compute each predicted sample as a weighted average of two reference samples and are at most about 5 times slower than DC for PUs up to 32×32 (up to 7.67 times for 64×64, see Table I).
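For reference, HEVC [1] computes an angular sample as ((32 − iFact)·ref[i] + iFact·ref[i+1] + 16) >> 5, where ref[i] and ref[i+1] are the two reference samples selected by the integer offset and iFact = ((k + 1)·intraPredAngle) & 31 for the k-th row (vertical modes) or column (horizontal modes). When intraPredAngle is 0 or ±32, iFact is always 0 and the weighted average degenerates to a single-sample copy, which is consistent with modes 2, 10, 18, 26 and 34 being the cheapest angular modes once the horizontal transposition overhead is removed. The Python sketch below checks this; the angle values follow the specification, while the helper names are ours.

```python
from typing import List

# intraPredAngle for angular modes 2..34, as listed in the HEVC specification.
_ANGLES = [32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
           -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32]
INTRA_PRED_ANGLE = dict(zip(range(2, 35), _ANGLES))


def interpolate(a: int, b: int, frac: int) -> int:
    """Two-tap angular interpolation with 1/32-sample accuracy (frac in 0..31)."""
    return ((32 - frac) * a + frac * b + 16) >> 5


def single_sample_modes(n: int) -> List[int]:
    """Angular modes whose fractional offset iFact is 0 on all n lines of a block."""
    return [mode for mode, angle in INTRA_PRED_ANGLE.items()
            if all(((k + 1) * angle) & 31 == 0 for k in range(n))]


print(single_sample_modes(32))    # [2, 10, 18, 26, 34]
print(interpolate(100, 132, 16))  # 116
```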

Planar prediction is the most complex and the most expensive mode. However, according to our study of the reference HM 10.1 encoder on JCT-VC test sequences [13], it is also the most frequently used. For example, Fig. 2 shows the usage frequency distribution of all 35 intra prediction modes for the Traffic video sequence with four quantization parameters. Planar prediction accounts for almost 20% of all prediction modes chosen by the reference HM 10.1 encoder, and 10% of prediction units use DC prediction. The most frequently used angular modes, 10 and 26, are chosen 10% and 8% of the time, respectively. Notably, planar prediction usage increases with the quantization parameter. The compressed FourPeople sequence (Fig. 3) also has about 20% planar and about 10% DC predictions, and its intra mode frequency distribution is very similar to that of Traffic (Fig. 2).

As a result of our investigation, we check planar, DC and the fastest angular modes 2, 10, 18, 26 and 34. However, our experiments showed that estimating these modes alone is not enough to locate the neighborhood of the best intra prediction mode accurately. Therefore, we also check the equidistant modes 6, 14, 22 and 30. This stage is similar to [12], except that we also check mode 2. Besides, we do not estimate vertical modes 19-34 if there is no above PU neighbor, or horizontal modes 2-17 if there is no left PU neighbor. We also do not check mode 18 if both left and above neighbors are unavailable.
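The first stage, including the neighbor availability rules, can be sketched as follows. `left_available` and `above_available` are hypothetical flags standing for the availability of the left and above PU neighbors.

```python
from typing import List

# Planar (0), DC (1), the fastest angular modes (2, 10, 18, 26, 34)
# and the equidistant modes (6, 14, 22, 30).
FIRST_STAGE = (0, 1, 2, 6, 10, 14, 18, 22, 26, 30, 34)


def first_stage_modes(left_available: bool, above_available: bool) -> List[int]:
    """Modes evaluated in the first RMD stage for one PU."""
    modes = []
    for m in FIRST_STAGE:
        if 2 <= m <= 17 and not left_available:
            continue  # horizontal modes are skipped without a left neighbor
        if 19 <= m <= 34 and not above_available:
            continue  # vertical modes are skipped without an above neighbor
        if m == 18 and not (left_available or above_available):
            continue  # mode 18 is skipped when neither neighbor is available
        modes.append(m)
    return modes
```

With both neighbors available, this yields the 11 first-stage modes; planar and DC are always kept.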

As a result, we estimate up to 11 prediction modes and determine the best angular prediction mode M. Then we check the neighboring modes M-2 and M+2 to refine the local minimum of prediction cost. Among these three modes, we again choose the best mode M’ and check modes M’-1 and M’+1.
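The two refinement steps can be sketched as below. `cost` is a hypothetical callable returning the RMD error (for example, SAD) of a mode, and `first_stage` is the list of modes checked in the first stage. Skipping neighbors that fall outside the angular range 2..34 is our assumption; the text does not specify this case.

```python
from typing import Callable, Dict, Iterable


def iterative_rmd(cost: Callable[[int], float],
                  first_stage: Iterable[int]) -> Dict[int, float]:
    """Return {mode: RMD cost} for every mode evaluated by the iterative search."""
    evaluated: Dict[int, float] = {}

    def check(mode: int) -> None:
        # Evaluate each mode at most once.
        if mode not in evaluated:
            evaluated[mode] = cost(mode)

    for m in first_stage:
        check(m)

    angular = [m for m in evaluated if m >= 2]
    if not angular:
        return evaluated  # only planar/DC were allowed

    # Step 2: best angular mode M, then probe M-2 and M+2.
    best = min(angular, key=evaluated.__getitem__)
    step2 = [best] + [m for m in (best - 2, best + 2) if 2 <= m <= 34]
    for m in step2:
        check(m)

    # Step 3: best mode M' among {M-2, M, M+2}, then probe M'-1 and M'+1.
    best2 = min(step2, key=evaluated.__getitem__)
    for m in (best2 - 1, best2 + 1):
        if 2 <= m <= 34:
            check(m)
    return evaluated
```

With all neighbors available, at most 11 + 2 + 2 = 15 modes are evaluated. For a toy cost that is smallest near mode 13, the search goes from mode 14 to 12 and then to 13.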

In this way, we perform RMD and form the list of intra prediction candidates for RD estimation. As a result, RMD checks at most 15 intra prediction modes instead of all 35, without extra implementation complexity or preprocessing stages.

A substantial further speed-up of intra prediction search can be achieved by optimizing RD estimation. As in [12], we do not estimate the RD cost of angular modes if the best RMD candidate is planar or DC. Furthermore, as in [12], we terminate RD estimation early for candidates whose RMD error is more than 20% higher than the best RMD error, as such modes are unlikely to be optimal.
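The two RD-stage rules can be expressed as a filter on the RMD results. In this sketch, `rmd_cost` maps modes to RMD errors, `max_candidates` is a hypothetical parameter standing for the PU-size-dependent candidate count (its values are not given here), and `ratio = 1.2` encodes the 20% threshold. Filtering the candidate list up front is a simplification of terminating RD estimation early.

```python
from typing import Dict, List

PLANAR, DC = 0, 1


def rd_candidates(rmd_cost: Dict[int, float], max_candidates: int,
                  ratio: float = 1.2) -> List[int]:
    """Select modes for full RD estimation from RMD costs (lower is better)."""
    ranked = sorted(rmd_cost, key=rmd_cost.__getitem__)[:max_candidates]
    best = ranked[0]
    if best in (PLANAR, DC):
        # Best RMD candidate is planar or DC: skip RD estimation of angular modes.
        ranked = [m for m in ranked if m in (PLANAR, DC)]
    limit = ratio * rmd_cost[best]
    # Drop candidates whose RMD error exceeds the best one by more than 20%.
    return [m for m in ranked if rmd_cost[m] <= limit]
```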

TABLE I
RELATIVE INTRA PREDICTION TIME DEPENDING ON PU SIZE

| Intra mode | 4×4 | 8×8 | 16×16 | 32×32 | 64×64 |
|---|---|---|---|---|---|
| 0 | 0.92 | 1.43 | 2.57 | 4.94 | 8.60 |
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 0.87 | 1.07 | 1.67 | 3.25 | 5.30 |
| 3-9 | 1.03 | 1.56 | 2.50 | 4.60 | 7.67 |
| 10 | 0.84 | 1.15 | 1.96 | 3.58 | 5.84 |
| 11-17 | 1.03 | 1.56 | 2.50 | 4.60 | 7.67 |
| 18 | 0.74 | 0.81 | 0.95 | 1.56 | 2.80 |
| 19-25 | 0.89 | 1.09 | 1.75 | 3.01 | 5.12 |
| 26 | 0.71 | 0.81 | 1.26 | 1.64 | 2.85 |
| 27-33 | 0.89 | 1.09 | 1.75 | 3.01 | 5.12 |
| 34 | 0.68 | 0.69 | 0.99 | 1.55 | 2.79 |

Fig. 2. Intra prediction mode frequency, Traffic

Fig. 3. Intra prediction mode frequency, FourPeople


IV. EXPERIMENTAL RESULTS

We implemented the described modifications in the HM 10.1 reference encoder and ran tests on JCT-VC test sequences with four quantization parameters, as suggested in [13]. Table II shows the performance results of our approach. The following equations are used:

$$\Delta\mathrm{Bitrate} = \frac{\mathrm{Bitrate}_{\mathrm{our}} - \mathrm{Bitrate}_{\mathrm{HM\,10.1}}}{\mathrm{Bitrate}_{\mathrm{HM\,10.1}}} \cdot 100\%$$

$$\Delta\mathrm{Time} = \frac{\mathrm{Time}_{\mathrm{our}} - \mathrm{Time}_{\mathrm{HM\,10.1}}}{\mathrm{Time}_{\mathrm{HM\,10.1}}} \cdot 100\%$$

$$\Delta\mathrm{PSNR} = \mathrm{PSNR}_{\mathrm{our}} - \mathrm{PSNR}_{\mathrm{HM\,10.1}}$$

As a result, we reduced HM 10.1 intra encoding time by about 40% (see Table II) at the expense of an insignificant bitrate increase and PSNR reduction. The speed-up generally grows with the quantization parameter.

As Table II shows, the speed-up barely depends on the resolution of the video sequence. The differences in performance seem to correlate with the usage frequency of planar and DC modes. For example, the Kimono test sequence has 27% to 37% planar-predicted PUs when coded with the original encoder. Since we terminate RD estimation early for some angular modes but always check planar and DC, the bitrate increase and quality degradation for this sequence are smaller than for the other test sequences. The same applies to the BQTerrace sequence.

V. CONCLUSION

Our optimizations of the reference HM 10.1 encoder reduce intra encoding time by 40% on average by decreasing prediction search time. RMD now takes only 5% of intra prediction search time, while RD cost calculation takes 95%. Still, compression time remains too high for real-time and even for offline applications. Further research should aim to reduce the time required for RD estimation, for example by reducing the number of RD candidates or even omitting RD estimation in some cases. Our solution primarily reduces the number of prediction modes to check for each PU, so it is worth combining it with level reduction techniques to further decrease prediction search time.

REFERENCES

[1] B. Bross, W.-J. Han, J.-R. Ohm, G. J. Sullivan, Y.-K. Wang, T. Wiegand, “High Efficiency Video Coding (HEVC) text specification draft 10,” JCT-VC Doc. JCTVC-L1003, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, January 2013.

[2] P. Andrivon, P. Bordes, M. Arena, P. Sunna, “Comparison of Compression Performance of HEVC Draft 10 with AVC for UHD-1 material,” JCT-VC Doc. JCTVC-M0166, April 2013.

[3] B. Li, G. J. Sullivan, J. Xu, “Comparison of Compression Performance of HEVC Draft 10 with AVC High Profile,” JCT-VC Doc. JCTVC-M0329, April 2013.

[4] G.J. Sullivan, J. Ohm, Woo-Jin Han, T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1667, December 2012.

[5] A. Elyousfi, A. Tamataoui, E. Bouyakhf, “Fast Intra Prediction Algorithm for H.264/AVC Based on Quadratic and Gradient Model,” International Journal of Electrical and Electronics Engineering, vol. 1, no. 1, pp. 27-35, 2010.

[6] Qiong Liu, Rui-min Hu, Li Zhu, Xin-chen Zhang, Zhen Han, “Improved fast intra prediction algorithm of H.264/AVC,” Journal of Zhejiang University SCIENCE A, vol. 7, issue 1 supplement, pp. 101-105, January 2006.

[7] Shen Li, Xianghui Wei, Takeshi Ikenaga, Satoshi Goto, “A VLSI architecture design of an edge based fast intra prediction mode decision algorithm for H.264/AVC,” Proceedings of the 17th ACM Great Lakes symposium on VLSI, pp. 20-24, March 2007.

[8] T.L. da Silva, L.V. Agostini, L.A. da Silva Cruz, “Fast HEVC intra prediction mode decision based on EDGE direction information,” Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pp. 1214 – 1218, August 2012.

[9] Liang Zhao, Li Zhang, Siwei Ma, Debin Zhao, “Fast mode decision algorithm for intra prediction in HEVC,” 2011 IEEE Visual Communications and Image Processing (VCIP), pp. 1-4, November 2011.

[10] Heming Sun, Dajiang Zhou, S. Goto, “A Low-Complexity HEVC Intra Prediction Algorithm Based on Level and Mode Filtering,” 2012 IEEE International Conference on Multimedia and Expo (ICME), pp. 1085 – 1090, July 2012.

[11] Younhee Kim, DongSan Jun, Soon-Heung Jung, Jin Soo Choi, Jinwoong Kim, “A Fast Intra-Prediction Method in HEVC Using Rate-Distortion Estimation Based on Hadamard Transform,” ETRI Journal, vol. 35, issue 2, pp. 270-280, April 2013.

[12] Hao Zhang, Zhan Ma, “Fast Intra Prediction for High Efficiency Video Coding,” Advances in Multimedia Information Processing, Proceedings on 13th Pacific-Rim Conference on Multimedia, pp. 568-577, December 2012.

[13] F. Bossen, “Common Test Conditions and software reference configurations,” JCT-VC Doc. JCTVC-F900, July 2011.

TABLE II
PERFORMANCE COMPARISON OF THE PROPOSED APPROACH AND THE ORIGINAL HM 10.1 ENCODER

| Sequence | QP | ΔBitrate [%] | ΔTime [%] | ΔY-PSNR [dB] |
|---|---|---|---|---|
| PeopleOnStreet 2560×1600 30 Hz | 22 | 0.45 | -28.95 | -0.114 |
| | 27 | 0.71 | -48.23 | -0.104 |
| | 32 | 0.84 | -42.13 | -0.112 |
| | 37 | 0.97 | -42.99 | -0.112 |
| Traffic 2560×1600 30 Hz | 22 | 0.39 | -30.19 | -0.114 |
| | 27 | 0.59 | -40.65 | -0.099 |
| | 32 | 0.64 | -38.09 | -0.095 |
| | 37 | 0.59 | -48.31 | -0.088 |
| FourPeople 1280×720 60 Hz | 22 | 0.89 | -42.15 | -0.088 |
| | 27 | 1.02 | -42.43 | -0.107 |
| | 32 | 1.01 | -42.58 | -0.119 |
| | 37 | 0.91 | -44.83 | -0.114 |
| Kimono 1920×1080 24 Hz | 22 | 0.15 | -36.82 | -0.025 |
| | 27 | 0.21 | -37.48 | -0.026 |
| | 32 | 0.23 | -43.91 | -0.037 |
| | 37 | 0.22 | -41.41 | -0.044 |
| ParkScene 1920×1080 24 Hz | 22 | 0.23 | -37.34 | -0.095 |
| | 27 | 0.43 | -38.97 | -0.092 |
| | 32 | 0.53 | -40.81 | -0.084 |
| | 37 | 0.40 | -42.91 | -0.062 |
| BQTerrace 1920×1080 60 Hz | 22 | 0.11 | -40.47 | -0.117 |
| | 27 | 0.13 | -41.32 | -0.084 |
| | 32 | 0.16 | -41.64 | -0.093 |
| | 37 | 0.23 | -42.72 | -0.094 |
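The average figures quoted in the text can be checked against Table II. This Python snippet stores the table rows (QP, ΔBitrate %, ΔTime %, ΔY-PSNR dB) and averages them over all 24 sequence/QP points with equal weight, which is our assumption about how the average is formed.

```python
from statistics import mean

# (QP, dBitrate %, dTime %, dY-PSNR dB) per sequence, copied from Table II.
TABLE_II = {
    "PeopleOnStreet": [(22, 0.45, -28.95, -0.114), (27, 0.71, -48.23, -0.104),
                       (32, 0.84, -42.13, -0.112), (37, 0.97, -42.99, -0.112)],
    "Traffic":        [(22, 0.39, -30.19, -0.114), (27, 0.59, -40.65, -0.099),
                       (32, 0.64, -38.09, -0.095), (37, 0.59, -48.31, -0.088)],
    "FourPeople":     [(22, 0.89, -42.15, -0.088), (27, 1.02, -42.43, -0.107),
                       (32, 1.01, -42.58, -0.119), (37, 0.91, -44.83, -0.114)],
    "Kimono":         [(22, 0.15, -36.82, -0.025), (27, 0.21, -37.48, -0.026),
                       (32, 0.23, -43.91, -0.037), (37, 0.22, -41.41, -0.044)],
    "ParkScene":      [(22, 0.23, -37.34, -0.095), (27, 0.43, -38.97, -0.092),
                       (32, 0.53, -40.81, -0.084), (37, 0.40, -42.91, -0.062)],
    "BQTerrace":      [(22, 0.11, -40.47, -0.117), (27, 0.13, -41.32, -0.084),
                       (32, 0.16, -41.64, -0.093), (37, 0.23, -42.72, -0.094)],
}


def table_averages(table):
    """Equal-weight means of dBitrate, dTime and dY-PSNR over all rows."""
    rows = [row for seq_rows in table.values() for row in seq_rows]
    return (mean(r[1] for r in rows),
            mean(r[2] for r in rows),
            mean(r[3] for r in rows))


d_rate, d_time, d_psnr = table_averages(TABLE_II)
print(f"{d_rate:.2f} % {d_time:.2f} % {d_psnr:.3f} dB")  # 0.50 % -40.72 % -0.088 dB
```

The mean time reduction of about 40.7% agrees with the 40% figure reported in the text.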
