objective · web viewfast decision of block size, prediction mode and intra block for h.264 intra...

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction

Guided by – Dr. K. R. Rao

Gaurav HansdaUTA ID – 1000721849

[email protected]

OBJECTIVEThe objective is to implement a variance-based algorithm for block size decision, an

improved filter-based algorithm for prediction mode decision, and a selection algorithm for intra block decision that exploits the relation between the rate-distortion characteristic and the best coding type. All algorithms are implemented using JM reference software [17]. This is for the latest video coding standard for H.264 [19].

MOTIVATIONThe H.264/Advanced video coding (AVC) standard [1] achieves better performance than

the previous video coding standards and has become a key technology for multimedia communications and consumer electronics. However, like the other advanced coding techniques adopted earlier, it also brings in significant computational overhead to the encoder. The mode decision of H.264 is more complicated and time-consuming than those of previous coding standards. In fact, it is the most computationally intensive component of an H.264 encoder, more so for the High Profile. Various profiles of H.264 are shown in Fig.1. Therefore, fast algorithms for mode decision are needed.

The high performance of this standard is achieved by the adoption of advanced coding tools, such as spatial-domain adaptive intra directional prediction, variable block-size motion estimation (Fig. 2), multiple reference frames, and rate-distortion (R-D) optimization [2], [3]. In particular, the H.264 High Profile is further developed for applications such as professional film production, digital video broadcasting, and high-definition TV and video disc. One notable difference of this new profile is that it adopts the 8×8 intra prediction as a new coding tool to improve the R-D performance.

Fig. 3 shows a block diagram of the video coding layer (VCL) for a macroblock [2]. The input video signal is split into macroblocks, the association of macroblocks to slice groups and slices are selected, and then each macroblock of each slice is processed as shown. An efficient parallel processing of macroblocks is possible when there are various slices in the picture.

mailto:[email protected]

Fig. 1. Profiles in H.264 [17]

Fig.2. Segmentations of the macroblock for motion compensation [2]

Fig. 3. Basic coding structure for H.264/AVC for a macroblock. [2]

As shown in Fig. 4, the mode decision process of an H.264 compliant encoder entails a three-stage hierarchy of operations: inter/intra block decision, block size decision, and prediction mode decision [4]. There are two important implications of this hierarchical structure of mode decision. First, to ensure the correctness of the decision at upper layer and second, to ensure early termination is executed accurately and as early as possible.

Most fast mode decision algorithms developed so far, only deal with a single stage of the mode decision hierarchy [5]-[15] and fail to achieve the best possible complexity reduction. In this project more focus is given to left branch of the mode decision hierarchy namely, inter/intra block decision of inter frames, block size decision of intra blocks, and the prediction mode decision of intra blocks.

The techniques employed in the proposed algorithms are inspired by several factual observations. For block size decision, it is known that the block size of the best coding mode is highly correlated with texture complexity [16]. In addition, the variance of a macroblock (MB) in the spatial domain, equivalent to the energy of AC coefficients in the discrete cosine transform domain, is a low-cost and effective measurement of texture complexity [16].

Fig. 4. Mode decision hierarchy of an H.264 compliant encoder. [4]

For prediction mode decision, most filter-based algorithms use edge detectors to determine dominant edges and limit the search of the best prediction mode to those along the detected edges [5]-[12]. However, the information between neighboring blocks is not taken into account. An improved traditional filter-based prediction mode decision algorithm can be obtained by incorporating contextual information [4]. The resulting algorithm works effectively for the High Profile of H.264. When using the Intra4x4 mode, each 4x4 block is predicted from spatially neighboring samples as illustrated in Fig. 4. Eight prediction modes are shown in Fig. 5. Fig. 6 shows a 4x4 luma block that is to be predicted. For the predicted samples [a,b, ... ,p] for the current block, the above and left previously reconstructed samples [A,B, ... ,M] are used according to direction modes. The arrows in Fig. 6 indicate the direction of prediction in each mode. For mode 0 (vertical) and mode 1 (horizontal), the predicted samples are formed by extrapolation from upper samples [A, B, C, D] and from left samples [I, J, K, L], respectively.

Fig. 5. Eight “prediction directions” for Intra 4x4 prediction [2].

For mode 2 (DC), all of the predicted samples are formed by mean of upper and left samples [A, B, C, D, I, J, K, L]. For mode 3 (diagonal down left), mode 4 (diagonal down right), mode 5 (vertical right), mode 6 (horizontal down), mode 7 (vertical left), and mode 8 (horizontal up), the predicted samples are formed from a weighted average of the prediction samples A–M. For example, samples ‘a’ and ‘d’ are, respectively, predicted by round (I/4 + M/2 + A/4) and round (B/4 + C/2 + D/4) in mode 4, also by round (I/2 + J/2) and round (J/4 + K/2 + L/4) in mode 8. The encoder may select the prediction mode for each block that minimizes the residual between the block to be encoded and its prediction [17].

For prediction of each 8x8 luma block, one mode is selected from the 9 modes, similar to the (4x4) intra-block prediction [17]. For prediction of all 16x16 luma components of a macroblock, four modes are available as shown in Fig. 7. For mode 0 (vertical), mode 1 (horizontal), mode 2 (DC), the predictions are similar with the cases of 4x4 luma block. For mode 4 (plane), a linear ‘plane’ function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance [19].

For intra/inter block decision, it is observed that the adoption of fast motion estimation greatly reduces the computational complexity of inter block coding. As a result, the time required for checking the modes of intra blocks becomes relatively significant compared to that of inter blocks. In addition, although the number of intra blocks of an inter frame is usually small, it takes a considerable amount of computation to come to the decision exhaustively [4]. Therefore, a fast algorithm for intra block decision is needed.

Fig. 6. Intra 4x4 prediction mode directions (vertical, 0; horizontal, 1; DC, 2; diagonal down left, 3; diagonal down right, 4; vertical right, 5; horizontal down, 6; vertical left, 7; and horizontal up, 8) [17]

Fig. 7. Intra 16 x 16 prediction modes [19]

Proposed Algorithm for Block Size DecisionBlock size is highly correlated with texture complexity. Variance of block corresponds to the

total energy of the AC coefficients of the block; hence it is good measurement of the texture complexity. Thus variance based classification of texture complexity [16] is used in the algorithm. Variance for 16x16 macroblock is given as:

VARIANCE=∑i=0

15

∑j=0

15

[Y (i , j )]2− 1256 [∑i=0

15

∑j=0

15

Y (i , j )]2

where Y(i, j) is the luminance value of the pixel at (i, j). The texture complexity of an MB is classified according to its variance as follows:

complexity={high ,∧if variance>T 1

low ,∧otherwise (2)

where T1 is a threshold, which is set to 92 735 for 8 bit PCM. The determination of this value is found in [16].

The flow of the proposed block size decision is shown in Fig. 8. If the calculated variance is above the threshold, Intra4MB and Intra8MB modes are selected; otherwise, Intra8MB and Intra16MB are chosen. This is a simple way to skip the examination of Intra4MB mode.

(1)

Fig. 8. Variance-based MB mode decision [4]

Improved Prediction Mode DecisionThis algorithm is based on the filter-based algorithm described in [8]. However this algorithm

only considers the edge information of the current block and does not take correlation between blocks into account. Hence use of Most Probable Mode (MPM) is incorporated to the algorithm described in [8].

Fig. 9. Formation of subsampled block for a block of (a) Intra4MB, (b) Intra8MB, and (c) Intra16MB [4]

The edge detection process is based on a spatial domain filtering technique. As indicated in Fig. 9, each original image block is evenly divided into four subblocks first. Each subblock will be represented by the average pixel magnitude of its pixels. Thus, we can obtain a 2x2 pseudoblock with, a 0, a1, a2 and a3

pixel values for the mode decision, where ai is the average of the ith subblock. For example, Fig. 10 illustrates the computed pseudoblock of a 4x4 luma block, where the 2x2 pseudoblock with a0 = (a+b+e+f)/4, a1 = (c+d+g+h)/4, a2 = (i+j+m+n)/4, and a3 = (k+l+o+p)/4. Let fv, fh, f45°, f135°, and fnd denote

Fig. 10. Computation of subblocks for detection of dominant edge (4 x 4 luma block).

Fig. 11. Five sets of filter coefficients for dominant edge detection. [4]

the 2 x 2 filter coefficients for computing the vertical, horizontal, 45° diagonal, 135° diagonal and non directional edges, respectively. The filter coefficients for all directions are shown in Fig. 11. The edge strength of each direction is derived from the corresponding filtering operation

The dominant edge will be chosen as the one with the maximum value obtained from (1) to (5). Thus, the detected edge corresponding to the dominant edge strength is expressed by

Along the direction of the block texture, each detected dominant edge will highly relate to a predicted mode in intra-prediction. In other words, the vertical, horizontal, 45 diagonal, and 135

diagonal edges directly associates with the modes 0, 1, 3, and 4, respectively. The fast mode decision algorithm mainly depends on the detected predicted mode.

The MPM, which takes advantage of the spatial correlation of the prediction modes between the neighboring blocks and the current block for coding, is defined as the prediction mode of the left or the upper neighbor, whichever has the smaller prediction mode number. If either of these two neighbors is unavailable, a default DC mode is considered. Since only one bit is needed to signal the MPM, the other eight modes can be effectively coded with three bits [1], keeping the total number of bits required for representing the prediction mode of the current block to four.

The block diagram of the improved algorithm for prediction mode decision is shown in Fig. 12. For 4×4 luma block, three intra prediction modes along the direction of the detected edge and its adjacent directions are chosen as candidate modes. Since the MPM is more likely than the DC mode to be the best mode, always include the MPM as the candidate mode. If the MPM is the same as any of the other three candidate modes, replace it by the DC mode.

Fig. 12. Prediction mode decision [4]

Selective Intra Block DecisionIntra block decision, which is performed for inter frames, occupies a considerable

percentage of the total computations of inter-frame coding. Intra16MB takes much less

Input 2x2 subsampled block

Pass through the filters separately

Determine the dominant edge

Choose the candidate modes

computation time than the other modes. In addition, it has been found that the rate-distortion (R-D) cost of Intra16MB is close to that of Intra4MB [13]. Therefore, it is less probable for Intra4MB to be the best mode if the R-D cost of Intra16MB is much larger than that of the best inter MB mode. Hence scaled R-D cost is used.

The R-D cost, denoted by Jmode, of an MB mode is defined as follows:

where mode ∈ {Inter modes, Intra4MB, Intra8MB, Intra16MB}, QP is the quantization parameter, λ is a Lagrange multiplier, SSD(·) denotes the sum of squared difference, and R(·) denotes the total number of bits used for coding the selected mode, the macroblock header, and the residual information [4].

The scaled R-D cost is defined as [4]

Ĵmode=J mode

λ (4)

where λ is a scaling factor. The advantage of using the scaled R-D cost is that it lies in the same range under different QPs, which makes the analysis simpler and, most importantly, a universal threshold can be used. An MB is very likely to be inter-coded when the R-D cost of the best inter mode is much lower than that of the Intra16MB. On the other hand, the intra modes would be selected as the best mode when the R-D cost of the best inter mode is higher than or close to the R-D cost of Intra16MB. Therefore, it is obvious to predict that an MB is less probable to be intra coded if the R-D cost difference is small. Denoting the scaled R-D cost differences between Intra16MB and the inter MB mode by dĴ, based on the above observation, both Intra4MB and Intra8MB are get skipped if dĴ is small. This prediction rule has a significant computational advantage for inter-frame coding since the examination of Intra4MB and Intra8MB is computationally more complex.

The proposed intra block decision algorithm consists of the following steps [4].

1) For each MB in an inter frame, examine all inter MB modes and store the R-D cost of the best inter MB mode.

2) Examine the Intra16MB mode and store its R-D cost.3) If dĴ >T2, check Intra4MB and Intra8MB.4) Select the MB mode that has the minimum R-D cost among all visited modes as the best

mode.

The threshold T2 in Step 3 is set to −150, which is empirically determined according to the histograms of the R- D cost difference (ĴB − ĴIntra16MB) where ĴB is the R-D cost of the best inter mode. The same T2 is applied to all the sequences used. It should be noted that, since λ in (4) is a function of QP [1] and controls dĴ, the algorithm is adaptive to QP. No manual tuning is required.

(3)

RESULTSThe JM reference software version 17.2 is implemented [18]. The conditions of the experiments are as follows.

1) Run on a PC with Intel Core i3 2.27GHz processor and 3.00 GB RAM. 2) Set the QP value to 16, 20, 24, and 28.3) Number of frames to be coded is 100.4) Enable the R-D optimization.5) Choose CABAC as the entropy coding method.

Hall (QCIF and CIF), Container (QCIF and CIF) and Mobile(CIF) (Fig. 13) is used for this experiment. The performance of JM reference software is observed with respect to varying values of quantization parameter (QP) for Hall (QCIF) in Table 1. Table 1 shows the variation of peak signal to noise ratio (PSNR), bit rate and total encoding time with QP values set at 28, 24, 20 and 16. Fig. 14 shows the respective variation in graphical format.

Fig. 13. Sequences used in the experiment

Hall Container

Mobile

Table IPerformance of Hall (QCIF) in JM reference software version 17.2

QP PSNR(dB) Bit rate (kbits/s) Time (s)28 38.685 697.64 20.02224 41.378 961.59 30.87520 44.118 1359.14 34.09416 47.085 1930.5 37.767

600 800 1000 1200 1400 1600 1800 20003537394143454749

QP= 28

QP= 24

QP=20

QP=16

Bitrate vs PSNR for Hall (QCIF)

Bit rate (kbits/s)

PSN

R (d

B)

(a)

10 15 20 25 30 35 403537394143454749

QP=28

QP=24

QP=20

QP=16

Encoding Time vs PSNR

Time (s)

PSN

R (d

B)

(b)

Fig. 14. Performance plots for Hall (QCIF) (a) Bit rate vs PSNR. (b) Encoding time vs PSNR.

A. Variance-Based Block Size DecisionAs shown in Table II, The variance-based block size decision achieves on the average 19% (maximum 25.4%) time saving while maintaining almost the same R-D performance as that of the full-search scheme employed in the JM reference software. For Mobile sequence that has more texture regions, less time is saved due to the computationally expensive operations for examining I4MB. As the texture of an MB becomes homogeneous as the frame size increases, more time is saved for high resolution sequences. Note that the bit-rate increase for all sequences in this test is below 0.5%, which is negligible.

Sequence Decrease in PSNR (dB) Increase in bitrate (%)Decrease in encoding

time (%)

Container (QCIF) 0.099 0.49 23.7

Hall (QCIF) 0.051 0.27 11.5

Container (CIF) 0.045 0.26 25.4

Hall (CIF) 0.054 0.43 24.2

Mobile (CIF) 0.019 0.07 7.41

Table II. Performance of variance based block size decision in JM reference software version 17.2

B. Improved Prediction Mode DecisionTable III shows the results of algorithm developed by Wang et al [8]. This algorithm do not incorporate most probable mode for prediction mode decision. The results of improved prediction mode decision which incorporates most probable mode (MPM) are shown in Table IV for various CIF and QCIF sequences. The results of the two algorithms clearly show that the improved prediction mode decision with MPM has lesser PSNR drop (average of 10.24%) with saving of time of almost as high as 32%.


time (%)

Container (QCIF) -0.186 0.95 -31.0

Hall (QCIF) -0.181 1.01 -28.6

Container (CIF) -0.150 0.83 -29.1

Hall (CIF) -0.203 1.60 -32.0

Mobile (CIF) -0.174 0.64 -25.4

Table III. Performance of prediction mode decision [8] in JM reference software version 17.2 compared to using JM v17.2 directly.


time (%)

Container (QCIF) -0.092 0.47 -31.0

Hall (QCIF) -0.118 0.65 -29.1

Container (CIF) -0.075 0.41 -29.1

Hall (CIF) -0.091 0.70 -32.4

Mobile (CIF) -0.136 0.50 -25.9

Table IV. Performance of improved prediction mode decision in JM reference software version 17.2 compared to using JM v17.2 directly.

C. Two-Stage Approach to Intra Frame CodingTable V shows the performance of the two-stage approach. It can be seen that up to 52.6% complexity reduction over the JM reference software version 17.2 is achieved, and both the PSNR drop and the bit rate increase are very small. Overall, the two-stage approach takes advantage of both the prediction mode decision and the block size decision to provide a more efficient way for fast mode decision.


time (%)

Container (QCIF) -0.191 0.97 -48.5

Hall (QCIF) -0.161 0.90 -36.7

Container (CIF) -0.123 0.68 -50.5

Hall (CIF) -0.145 1.12 -52.6

Mobile (CIF) -0.154 0.57 -35.5

Table V. Performance of two-stage approach incorporating Variance based block size decision and improved prediction mode decision in JM reference software version 17.2 compared to using JM v17.2 directly.

500 700 900 1100 1300 1500 1700 1900 210037

39

41

43

45

47

49

Bitrate vs PSNR for Hall (QCIF)

JM 17.2Variance based block size decisionImproved Prediction mode or MPMTwo-Stage approach

Bitrate (kbits/s)

PSN

R (d

B)

Fig. 15. Comparison of the R-D curves of various algorithms for Hall (QCIF).

ConclusionsThe mode decision in H.264 encoder is a computationally expensive operation. How to achieve fast decision of block size, prediction mode, and intra block for H.264 intra prediction is described. The proposed algorithms provide significant improvement of computational efficiency over previous methods without noticeable R-D performance loss. This can be easily seen from Fig. 15. The efficiency gain of these algorithms is attributed to the fact that the design of these algorithms is fully in line with the hierarchical structure of the mode decision process. Under this design principle, these algorithms can work independently or together. Working together, they constitute an integral approach to intra-frame coding and inter-frame coding, for which high accuracy of mode prediction is achieved without complex operations.

REFERENCES

[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, document JVT-G050.doc, ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, 2003.

[2] T. Wiegand et al, “Overview of H.264 video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[3] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.

[4] Y.-W. Huang, T. Ou, and H.Chen,” Fast decision of block size, prediction mode and intra block for H.264 intra prediction,” I EEE Trans. Circuits Syst. Video Technol., vol. 20, no.8, pp. 1122-1132, Aug. 2010

[5] Y.-W. Huang et al, “Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 378–401, Mar. 2005.

[6] F. Pan et al “Fast mode decision algorithm for intra prediction in H.264/AVC video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 813–822, Jul. 2005.

[7] C. Tseng, H. Wang, and J. Yang, “Enhanced intra-4x4 mode decision for H.264/AVC coder,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 8, pp. 1027–1032, Aug. 2006.

[8] J.-C. Wang, J.-F. Wang, and J.-T. Chen, “A fast mode decision algorithm and its VLSI design for H.264/AVC intra-prediction,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 10, pp. 1414–1422, Oct. 2007.

[9] A.-C. Tsai et al, “Intensity gradient technique for efficient intra-prediction in H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 694–698, May 2008.

[10] H. Li, K. Ngan, and Z. Wei, “Fast and efficient method for block edge classification and its application in H.264/AVC video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 756–768, Jun. 2008.

[11] A.-C. Tsai et al, “Effective subblock- based and pixel-based fast direction detections for H.264 intra prediction,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 7, pp. 975–982, Jul. 2008.

[12] K. Bharanitharan et al, “A low complexity detection of discrete cross differences for fast H.264/AVC intra prediction,” IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1250–1260, Nov. 2008.

[13] I. Choi, J. Lee, and B. Jeon, “Fast coding mode selection with rate-distortion optimization for MPEG-4. Part 10: AVC/H.264,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 12, pp. 1557–1561, Dec. 2006.

[14] C. Kim and C. Jay Kuo, “Feature-based intra/inter coding mode selection for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 441–453, Apr. 2007.

[15] B. Kim, “Fast selective intra-mode search algorithm based on adaptive thresholding scheme for H.264/AVC encoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 127–133, Jan. 2008.

[16] A. Yu, G. Martin, and H. Park, “Fast inter-mode selection in the H.264/AVC standard using a hierarchical decision process,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 186–195, Feb. 2008.

[17] S-k. Kwon, A. Tamhankar, and K. R. Rao, "Overview of H.264/MPEG-4 part 10,” Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 186-216, Apr. 2006.

[18] JM software – http://iphome.hhi.de/suehring/tml/ This software is a product of Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG. The latest version of JM Software is 18.0

[19] Iain E. Richardson , “The H.264 Advanced Video Compression Standard”, II edition, Wiley publications, 2010.

.

http://iphome.hhi.de/suehring/tml/

objective · web viewfast decision of block size, prediction mode and intra block for h.264 intra...

Documents