Study and Optimization of the Deblocking Filter in H.265 and its
Advantages over H.264/AVC
Valay Shah
Supervising Professor: Dr. K. R. Rao
Abstract: H.265 or High Efficiency Video Coding (HEVC) has been developed with the goal of
achieving significant compression relative to existing standards, in the range of a 50% bit-rate
reduction for the same perceptual video quality [1]. HEVC has been designed to address nearly
all existing applications of H.264/AVC and additionally to focus on two key issues: (a) increased
video resolution and (b) increased use of parallel processing architectures. The significant
reduction in bit rate comes at the cost of increased coding algorithm complexity and hence
increased processing time and higher hardware cost [1]. The objective of this project is twofold:
(a) to study the working of the in-loop deblocking filter and propose modifications that can
improve the performance of the HEVC codec and (b) to compare HEVC codec performance with
that of its predecessor, H.264/AVC.
1. HEVC Coding Design
The HEVC standard is designed to achieve multiple goals, including coding efficiency, ease of
transport-system integration and data-loss resilience, as well as suitability for parallel
processing architectures. In order to exploit parallel processing, one needs to understand the
HEVC design and capabilities thoroughly [1]. The following subsection describes the key
elements of picture partitioning, which is the most powerful technique relative to the
standard's predecessors and at the same time the most complex and time consuming.
a) Video Coding Layer
The video coding layer of HEVC employs the same hybrid approach of inter-/intra-picture
prediction and 2-D transform coding used in all video compression standards since
H.261. Figure 1 depicts the block diagram of a hybrid video encoder that could create the
output bitstream for the HEVC standard [1]. In order to generate the bitstream, the picture
is first split into block-shaped regions, with the exact block partitioning being conveyed
to the decoder.
Fig. 1: Typical HEVC video encoder with decoder modeling elements shaded in light gray
[1].
The first picture of a video sequence, and the first picture at each clean random access
point into a video sequence, is coded using only intra-picture prediction. For all
remaining pictures of the sequence, inter-picture prediction modes are typically used for
most blocks. Picture partitioning in HEVC is done using coding tree units and coding tree
blocks (CTBs), coding units and coding blocks (CBs), prediction units and prediction
blocks (PBs), and transform units and transform blocks (TBs). The largest coding structure
in previous standards was the macroblock, containing a 16×16 block of luma samples and, in
the case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples. In the
HEVC standard, the analogous structure is the coding tree block of size L×L, where L = 16,
32 or 64 samples, with the larger sizes typically enabling better compression. The CTB can
further be divided into smaller blocks using a tree structure and quadtree-like signaling.
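As a rough illustration of this quadtree partitioning, the sketch below recursively splits an L×L CTB into coding blocks. The `should_split` callback is a hypothetical stand-in for the encoder's rate-distortion decision, which in the real codec is conveyed to the decoder via split flags; none of these names come from the HM code.

```python
# Illustrative quadtree CTB partitioning (not the HM implementation).
# should_split is a hypothetical callback standing in for the encoder's
# rate-distortion decision, which HEVC signals with split flags.

def split_ctb(x, y, size, min_size, should_split):
    """Return the list of (x, y, size) coding blocks for one CTB."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):          # visit the four quadrants
            for dx in (0, half):
                blocks.extend(
                    split_ctb(x + dx, y + dy, half, min_size, should_split))
        return blocks
    return [(x, y, size)]             # leaf: emit one coding block

# Example: split a 64x64 CTB at the top level only -> four 32x32 CBs.
blocks = split_ctb(0, 0, 64, 8, lambda x, y, s: s == 64)
```

A larger L (up to 64) lets smooth regions be coded as a single large block while detailed regions recurse further down the tree, which is why the larger sizes typically compress better.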
These prediction units increase the complexity of the encoder algorithm and hence
require more processing time. Some researchers have suggested that optimization of
the deblocking filter algorithm can lead to a reduction in processing time [2]. A couple
of ways in which this algorithm can be modified are explained in the following sections.
2. HEVC Deblocking Filter Design
The deblocking filter of HEVC is similar to that of H.264/AVC and is implemented in the
inter-prediction loop. However, the design is simplified with regard to its decision-making
and filtering processes, which makes parallel processing easier. In HEVC, the deblocking
filter (DBF), followed by a sample adaptive offset (SAO) filter, is applied to the
reconstructed samples before writing them into the decoded picture buffer in the decoder
loop. The DBF is intended to reduce the blocking artifacts caused by block-based coding, and
it is applied only to samples located at block boundaries [1]. The detailed deblocking
filtering process of HEVC is explained in the next subsection.
a. Working of a Deblocking Filter
The deblocking filter is applied to all samples adjacent to a Prediction Unit (PU) or
Transform Unit (TU) boundary, except when the boundary is also a picture boundary or when
deblocking is disabled across slice or tile boundaries. This option is signaled by the
encoder. This is shown pictorially in figure 2.
Fig. 2: Schematic showing the edges of PU, TU and picture boundary [2].
The reason for including both PU and TU boundaries is that PU boundaries are not always
aligned with TU boundaries in some cases of inter-picture predicted coding blocks (CBs). The
syntax elements that control deblocking filtering across slice and tile boundaries are
located in the SPS and slice headers. In HEVC the deblocking filter is applied to edges that
are aligned on an 8×8 sample grid, for both the luma and chroma samples, instead of the 4×4
sample grid used in H.264/AVC. This restriction reduces the worst-case computational
complexity without noticeable degradation of visual quality. It also improves parallel
processing by preventing cascading interactions between nearby filtering operations [1].
Similar to the H.264/AVC scheme, the strength of the deblocking filter is controlled by the
values of several syntax elements; however, only three filter strengths are used instead of
five. For example, as shown in figure 3, given that P and Q are two adjacent blocks with a
common 8×8 grid boundary, a filter strength of 2 is assigned when one of the blocks is
predicted using intra-picture prediction. Otherwise, a filter strength of 1 is assigned when
any of the following conditions is met [1].
(i) P or Q has at least one nonzero transform coefficient.
(ii) The reference indices of P and Q are not equal.
(iii) The motion vectors of P and Q are not equal.
(iv) The difference between a motion vector component of P and Q is greater than or
equal to one integer sample.
A filter strength of 0 is assigned if none of the above conditions is met; in other words,
the deblocking process is not applied. Figure 3 depicts an example filtering decision for a
vertical edge and its pixel samples.
Fig. 3: Filtering decision example for HEVC [2].
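The strength-assignment rules above can be sketched as follows. This is a hedged illustration: the `Block` fields are invented names for the quantities in conditions (i)-(iv), not HM data structures, and the motion-vector test uses quarter-sample units so that one integer sample equals 4 units.

```python
from dataclasses import dataclass

@dataclass
class Block:
    intra: bool = False              # predicted with intra-picture prediction?
    has_nonzero_coeff: bool = False  # at least one nonzero transform coefficient
    ref_idx: int = 0                 # reference picture index
    mv: tuple = (0, 0)               # motion vector in quarter-sample units

def boundary_strength(p, q):
    """Filter strength for the boundary between adjacent blocks P and Q."""
    if p.intra or q.intra:
        return 2
    if p.has_nonzero_coeff or q.has_nonzero_coeff:
        return 1
    if p.ref_idx != q.ref_idx:
        return 1
    # an MV component difference of one integer sample = 4 quarter-samples
    if any(abs(a - b) >= 4 for a, b in zip(p.mv, q.mv)):
        return 1
    return 0                         # no deblocking applied
```

For instance, two inter-predicted blocks with identical references, motion and no residual yield strength 0, so their shared edge is left untouched.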
According to the filter strength and the average quantization parameter of P and Q, two
thresholds, tC and β, are determined from predefined tables. One of three cases, no
filtering, strong filtering or weak filtering, is chosen based on the β value for luma
samples. The computational complexity is reduced by sharing this decision across four luma
rows or columns, using only the first and last rows or columns. For chroma samples there are
only two cases: no filtering and normal filtering. When the filter strength is greater than
1, normal filtering is applied. The filtering process is then performed using the control
variables tC and β [1].
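The shared luma on/off decision can be sketched roughly as below, assuming `side[i][line]` holds the pixel `i` samples away from the boundary on line `line` (an indexing convention invented here for illustration); the activity measure is the second-difference check described in [4].

```python
# Hedged sketch of the shared luma filtering decision: local activity on
# lines 0 and 3 of the four-line segment is compared against beta.

def luma_filter_on(p, q, beta):
    """Decide deblocking for a 4-line segment using lines 0 and 3 only."""
    def activity(side, line):
        # second difference over the three pixels nearest the boundary
        return abs(side[2][line] - 2 * side[1][line] + side[0][line])
    d = (activity(p, 0) + activity(q, 0) +
         activity(p, 3) + activity(q, 3))
    return d < beta  # smooth signal near the edge -> filtering is applied
```

Evaluating only the first and last lines, as noted above, lets one decision cover all four lines of the segment, which is where the complexity saving comes from.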
In HEVC, horizontal filtering of the vertical edges for the entire picture is performed
first, followed by the filtering of the horizontal edges. This is why HEVC deblocking is
also called parallel deblocking. This specific order enables multiple horizontal or vertical
filtering processes to be applied in parallel threads, or the filter can still be
implemented on a CTB-by-CTB basis with only a small processing latency [1]. The detailed
filtering process is explained in figure 4 [2].
Fig. 4: Detailed explanation of deblocking filtering procedure in HEVC [2].
As per the basic ordering principle of HEVC, the rightmost horizontal edges in the current
LCU cannot be processed before the leftmost vertical edges of the next LCU are processed.
For example, in figure 4 the filtering of edges 21 and 22 is done only after edges 17
through 20 are completed. From the time slots it is easy to see that the filtering of the
#n+1, #n and #n-1 LCUs is not sequential but alternating, which introduces three drawbacks,
as explained below:
(i) The control of the filtering is complex and the hardware cost of the control part is
large. Usually the control part costs more than the filtering computation part, so the
control complexity is critical for hardware design.
(ii) The filtering of one LCU involves data from the left, right and upper neighboring
LCUs. The cost of buffers or memory accesses is increased.
(iii) There is latency in the processing of the current LCU. In other words, the filtering
of the current LCU cannot be completed before the data of the next LCU is available. This
decreases the throughput of the whole decoding system.
b. Filtering Operations
(i) Normal Filtering Operations
The normal filtering decisions are designed so that when a picture contains an inclined
(linear ramp) surface crossing a block boundary, the signal is not modified by the normal
deblocking filtering operations. In the normal filtering mode, for a segment of four lines
as shown in figure 5, filtering operations are applied to each line [4]. The detailed
mathematics for calculating the filtered pixel values across the block boundary is given
elsewhere [4].
Fig. 5: Four-pixel long vertical block boundary formed by the adjacent blocks P and Q. Deblocking decisions are based on lines marked with the dashed line (lines 0 and 3) [4].
c. Sequence and Picture Level Adaptivity
Since different video sequences have different characteristics, the deblocking strength can be adjusted on a sequence and even on a picture basis. As mentioned earlier, the main sources of blocking artifacts are block transforms and quantization. Blocking artifact severity therefore depends, to a large extent, on the quantization parameter QP, and the QP value is taken into account in the deblocking filtering decisions. The thresholds β and tC depend on the average QP value of the two neighboring blocks sharing the block edge [16] and are typically stored in corresponding tables. The dependence of these parameters on QP is shown in figures 6 and 7 [4].

The parameter β controls which edges are filtered, the selection between the normal and strong filter, and how many pixels from the block boundary are modified in the normal filtering operation. One can observe that the value of β increases with QP. Therefore, deblocking is enabled more frequently at high QP values than at low QP values; high QP values correspond to coarse quantization and low QP values to fine quantization. One can also see that the deblocking operation is effectively disabled at low QP values by setting one or both of β and tC to zero [4].

The parameter tC also influences the selection between the normal and strong filter and determines the maximum absolute modification allowed to the pixel values at a given QP for both normal and strong filtering operations. This helps adaptively limit the amount of blurriness introduced by the deblocking filtering. The deblocking parameters tC and β thus provide adaptivity according to the QP and prediction type. However, different sequences, or parts of the same sequence, may have different characteristics. It may be important for content providers to change the amount of deblocking filtering on a sequence or even on a slice or picture basis. Therefore, deblocking adjustment parameters can be sent in the slice header or the picture parameter set (PPS) to control the amount of deblocking filtering applied. The corresponding parameters are tc_offset_div2 and beta_offset_div2 [15]. These parameters specify the offsets (divided by two) that are added to the QP value before the β and tC values are determined. The parameter beta_offset_div2 adjusts the number of pixels to which the deblocking filtering is applied, whereas tc_offset_div2 adjusts the amount of filtering that can be applied to those pixels, as well as the detection of natural edges [4].
Fig. 6: Dependence of β on QP [4].
Fig. 7: Dependence of tC on QP [4].
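How the offsets enter the table lookup can be sketched as follows. The clip ranges (0-51 for the β table, 0-53 for the tC table) and the +2 term for intra boundaries follow the description in [4], but the expressions here are an informal approximation, not a restatement of the normative specification text.

```python
def clip3(lo, hi, v):
    """Clamp v to [lo, hi], as the HEVC spec's Clip3 operation does."""
    return max(lo, min(hi, v))

def beta_table_index(qp_avg, beta_offset_div2):
    # beta_offset_div2 is transmitted divided by two, hence the factor of 2
    return clip3(0, 51, qp_avg + 2 * beta_offset_div2)

def tc_table_index(qp_avg, bs, tc_offset_div2):
    # boundaries with strength 2 (intra) get a stronger tC via the +2 term
    return clip3(0, 53, qp_avg + 2 * (bs - 1) + 2 * tc_offset_div2)
```

Note how a sufficiently negative offset drives the index, and hence β or tC, to zero, which is exactly the mechanism by which deblocking is effectively disabled at low QP values.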
3. Methods to Optimize Deblocking Operation
Several ways have been suggested by different authors to decrease either the complexity of
deblocking filtering or the time it takes to filter out the artifacts introduced by coding
unit boundaries. Some of them are discussed in this section.
a. Unified Cross-Based Approach
A novel processing order is proposed in [2] by Li et al., where blocks are chosen and
combined to form a processing unit as shown in figure 8. This is termed the unified-cross
unit and is different from the LCU. The unit is symmetric, and the edges that need to be
filtered are arranged in several crosses. The benefit of this approach is that the
unified-cross units are independent of each other. The processing order for the
unified-cross unit is shown in figure 9.
Fig. 8: Different blocks are chosen to combine a processing unit called unified-cross unit
which is different than LCU approach [2].
Fig. 9: Unified-cross based processing [2].
The advantage of unified-cross based processing is that it implements parallel processing in
the true sense, which results in decreased computing time and lower hardware requirements.
This method seems efficient, but since it requires new hardware to be built in order to
implement the algorithm, it is out of scope for the current project.
b. Low Complexity Deblocking Filter Perceptual Optimization For The HEVC Codec Approach
Another technique for reducing the complexity and time consumption of the deblocking filter is suggested in [3] by Naccari et al. The deblocking filter in HEVC provides two offsets to vary the amount of filtering for each image area. The perceptual optimization is performed by varying these two offsets to minimize a Generalized Block-edge Impairment Metric (GBIM). A low-complexity perceptual optimization of the deblocking filter offsets is proposed to improve the GBIM quality while significantly reducing the computational resources that would be required by a brute-force approach in which all possible offset values are exhaustively tested [3].

The proposed GBIM extension comprises two terms: (i) the perceptually weighted block-edge pixel difference Mh (Mv), which represents the norm of the horizontal (or vertical) block-edge pixel differences, weighted by the perceptual weight wp, and (ii) the perceptually weighted non-block-edge average difference Eh (Ev), which represents the norm of the average for those pixels between horizontal (or vertical) block edges. The frame-level GBIMf is calculated using (1) [3].
GBIMf = 0.5 (Mh / Eh) + 0.5 (Mv / Ev)    (1)
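Equation (1) combines the four terms as an equal-weight sum of two ratios, which can be written directly (the numeric values in the example below are made up; computing Mh, Eh, Mv and Ev themselves requires the perceptual weighting described in [3]):

```python
def gbim_frame(Mh, Eh, Mv, Ev):
    """Frame-level GBIM per (1): equal-weight average of the two ratios."""
    return 0.5 * (Mh / Eh) + 0.5 * (Mv / Ev)

# Example with made-up term values: each ratio is 2, so GBIMf = 2.0.
example = gbim_frame(4.0, 2.0, 6.0, 3.0)
```

A larger ratio of block-edge differences Mh (Mv) to non-block-edge activity Eh (Ev) indicates more visible blockiness, which is why the optimization seeks offsets that minimize this value.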
4. Scope Of This Project
The main objective of this project is to reduce the processing time by modifying the
deblocking algorithm. This can be achieved by implementing various algorithms suggested by
experts in the literature on HEVC [2]-[5]. The output can be compared using the various test
sequences suggested and made available by the HEVC standard development committee. Some of
the ways that will be implemented on the HEVC test codec, also known as the HM (HEVC Test
Model) code, in this project are listed below:
(i) To understand and implement the unified-cross based processing in deblocking
filtering unit in HM codec [2].
(ii) To understand and implement a low complexity offsets perceptual optimization for
deblocking filtering unit in HM codec [3].
(iii) To understand and implement the skipping mode technique in order to decrease
edge processing thereby reducing the power consumption in deblocking filtering
unit in HM codec.
(iv) To compare the HEVC performance after implementing (i)-(iii) in the HM codec with
the H.264/AVC codec performance using processor clock cycles, mean square error (MSE)
and signal-to-noise ratio (SNR).
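The comparison metrics named in (iv) can be computed as below; this is a generic 8-bit MSE/PSNR sketch over flattened sample sequences, not code taken from the HM or JM packages.

```python
import math

def mse(ref, rec):
    """Mean squared error between two equal-length sample sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB for samples with the given peak."""
    e = mse(ref, rec)
    return float('inf') if e == 0 else 10.0 * math.log10(peak * peak / e)
```

In practice the luma and chroma planes of each decoded frame would be flattened and compared against the original sequence frame by frame.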
5. Results
The standard test sequences provided by the HEVC JCT-VC [19] were encoded using the
encoder_intra_main.cfg configuration file in two different ways: (a) by disabling the
deblocking filter and (b) by varying the deblocking parameters to get the best results in
terms of PSNR in dB, total encoding time in seconds, and bit rate in kbps. These results
were obtained by running the four test sequences listed in table 1, with the results in
tables 2 through 4.
Sequence #  Test Sequence     Resolution (pixels)  Frequency (Hz)
1           BasketBall.yuv    832×480              50
2           BQMall.yuv        832×480              60
3           Kirsten&Sara.yuv  1280×720             60
4           RaceHorses.yuv    416×240              30
Table 1: List of test sequences used to generate the results.
# of Frames  Seq.1 (a)  Seq.1 (b)  Seq.2 (a)  Seq.2 (b)  Seq.3 (a)  Seq.3 (b)  Seq.4 (a)  Seq.4 (b)
1            57.19      56.878     61.418     62.119     116.486    120.05     16.426     16.63
2            113.662    117.297    124.379    124.754    230.396    232.534    32.62      32.682
5            284.295    287.661    307.445    309.894    574.205    575.407    80.902     81.229
10           568.371    568.028    614.891    614.391    1154.293   1149.987   162.459    161.94
100          5712.818   5701.007   6044.842   6056.827   -          -          -          -
Table 2: Total encoding time in seconds; for each sequence, column (a) denotes case (a)
(deblocking disabled) and column (b) denotes case (b) (deblocking parameters tuned).
# of Frames  Seq.1 (a)  Seq.1 (b)  Seq.2 (a)  Seq.2 (b)  Seq.3 (a)  Seq.3 (b)  Seq.4 (a)  Seq.4 (b)
1            5816       5825.2     10667.04   10679.04   7966.08    7960.8     1932.72    1934.4
2            5801.6     5810.4     10677.12   10687.68   7948.08    7949.52    1902.24    1902.48
5            5762.24    5769.84    10653.12   10661.38   7910.112   7915.68    1836.768   1837.536
10           5714       5721.68    10612.85   10619.33   7895.712   7900.08    1818.144   1818.816
100          5910.58    5917.288   9837.552   9844.315   -          -          -          -
Table 3: Bit rate in kbps; for each sequence, column (a) denotes case (a) (deblocking
disabled) and column (b) denotes case (b) (deblocking parameters tuned).
# of Frames  Seq.1 (a)  Seq.1 (b)  Seq.2 (a)  Seq.2 (b)  Seq.3 (a)  Seq.3 (b)  Seq.4 (a)  Seq.4 (b)
1            36.3402    36.2705    36.2410    36.1578    40.65506   40.61633   34.89253   34.8078
2            36.3402    36.2705    36.2553    36.1716    40.65451   40.62429   34.90798   34.8258
5            36.3415    36.2669    36.2494    36.1595    40.66288   40.63086   34.93489   34.8476
10           36.3525    36.2791    36.2783    36.1862    40.65958   40.62673   34.95426   34.8701
100          36.3171    -          36.3994    -          -          -          -          -
Table 4: PSNR in dB; for each sequence, column (a) denotes case (a) (deblocking disabled)
and column (b) denotes case (b) (deblocking parameters tuned).
6. Conclusions
It is apparent from the results shown in tables 2 through 4 that the bit rate increases when
the deblocking filter parameters are optimized. Hence, applying the deblocking filter has
two effects: (i) it helps remove the blocking artifacts from the reconstructed image, and
(ii) it increases the bit rate of the signal. The performance can still be improved by
including the effect of the quantization parameter (QP) in the optimization of the
deblocking filter parameters.
7. References
[1] G. J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE
Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec.
2012.
[2] M. Li et al, “De-blocking Filter Design for HEVC and H.264/AVC”, PCM 2012, LNCS 7674, pp.
273–284, 2012.
[3] M. Naccari et al, “Low Complexity Deblocking Filter Perceptual Optimization For The HEVC
Codec”, 18th IEEE International Conference on Image Processing, pp. 737-740, 2011.
[4] A. Norkin et al, “HEVC Deblocking Filter”, IEEE Transactions on Circuits and Systems for
Video Technology, Vol. 22, No. 12, pp. 1746-1754, Dec. 2012.
[5] A. J. Honrubia, J. L. Martínez and P. Cuenca, “HEVC: A Review, Trends and Challenges”,
Instituto de Investigación en Informática de Albacete, Spain.
[6] T. Wiegand et al, “High Efficiency Video Coding (HEVC) Standardization”, IEEE
Transactions on Circuits and Systems for Video Technology, Dec. 2010.
[7] C. Man-Yau and S. Wan-Chi, “Computationally-Scalable Motion Estimation Algorithm for
H.264/AVC Video Coding”, IEEE Transactions on Consumer Electronics, vol. 56, pp. 895-903,
2010.
[8] R. Jianfeng, N. Kehtarnavaz, and M. Budagavi, “Computationally Efficient Mode Selection in
H.264/AVC Video Coding”, IEEE Transactions on Consumer Electronics, vol. 54, pp. 877-886,
2008.
[9] K. R. Rao, “High Efficiency Video Coding”, Chapter 5 – soon to be published.
[10] P. List et al, “Adaptive deblocking filter”, IEEE Transactions on Circuits and Systems for
Video Technology, vol. 13, pp. 614-619, 2003.
[11] K. Xu and C. S. Choy, “A Five-Stage Pipeline, 204 Cycles/MB, Single-Port SRAM-Based
Deblocking Filter for H.264/AVC”, IEEE Transactions on Circuits and Systems, vol. 18(3), pp.
363–374, 2008.
[12] F. Tobajas et al, “An Efficient Double-Filter Hardware Architecture for H.264/AVC De-
blocking Filtering”, IEEE Transactions on Consumer Electronics, Vol. 54(1), Feb. 2008.
[13] Y. C. Lin et al, “A Two-Result-Per-Cycle De-Blocking Filter Architecture for QFHD H.264/AVC
Decoder”, IEEE Transactions on VLSI Systems, vol. 17(6), June 2009.
[14] D. Zhou et al, “A 48 Cycles/MB H.264/AVC De-blocking Filter Architecture for Ultra High
Definition Applications”, IEICE Transactions Fundamentals E92-A (12), Dec. 2009.
[15] T. Yamakage, S. Asaka, T. Chujoh, M. Karczewicz, and I. S. Chong, “CE12: Deblocking
Filter Parameter Adjustment in Slice Level”, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11
document JCTVC-G174, Joint Collaborative Team on Video Coding (JCT-VC), Geneva, Switzerland,
Nov. 2011.
[16] G. Van der Auwera, X. Wang, M. Karczewicz, M. Narroschke, A. Kotra, and T. Wedi
(Panasonic), “Support of Varying QP in Deblocking”, ITU-T SG16 WP3 and ISO/IEC
JTC1/SC29/WG11 document JCTVC-G1031, Joint Collaborative Team on Video Coding (JCT-VC),
Geneva, Switzerland, Nov. 2011.
[17] JM software download for H.264/AVC: http://iphome.hhi.de/suehring/tml/
[18] HM codec download for H.265:
https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/
[19] HEVC standard test video sequences:
ftp://ftp.tnt.uni-hannover.de/testsequences