Study and Optimization of the Deblocking Filter in H.265 and its
Advantages over H.264/AVC
Valay Shah
Supervising Professor: Dr. K. R. Rao
Abstract: H.265 or High Efficiency Video Coding (HEVC) has been developed with the goal of
achieving significant compression relative to existing standards, in the range of a 50% bit-rate
reduction for the same perceptual video quality [1]. HEVC has been designed to address nearly
all existing applications of H.264/AVC and additionally to focus on two key issues: (a) increased
video resolution and (b) increased use of parallel processing architectures. The significant
reduction in bit rate comes at the cost of increased coding algorithm complexity and hence
increased processing time and higher hardware cost [1]. The objective of this project is twofold:
(a) to study the working of the in-loop deblocking filter and propose modifications that can
improve the performance of the HEVC codec and (b) to compare HEVC codec performance with
that of its predecessor, H.264/AVC.
1. HEVC Coding Design
The HEVC standard is designed to achieve multiple goals, including coding efficiency, ease of
transport-system integration and data-loss resilience, as well as suitability for parallel
processing architectures. In order to exploit parallel processing, one needs to understand the
HEVC design and capabilities thoroughly [1]. The following subsection describes the key
elements of picture partitioning, which is the most powerful technique relative to the
standard's predecessors and at the same time the most complex and time consuming.
a) Video Coding Layer
The video coding layer of HEVC employs the same hybrid approach of inter-/intra-picture
prediction and 2-D transform coding used in all video compression standards since
H.261. Figure 1 depicts the block diagram of a hybrid video encoder that could create the
output bitstream for the HEVC standard [1]. In order to generate the bitstream, the picture
is first split into block-shaped regions, with the exact block partitioning being conveyed
to the decoder.
Fig. 1: Typical HEVC video encoder with decoder modeling elements shaded in light gray
[1].
The first picture of a video sequence, and the first picture at each clean random access
point into a video sequence, is coded using only intra-picture prediction. For all
remaining pictures of the sequence, inter-picture prediction modes are typically used for
most blocks. Picture partitioning in HEVC is done using coding tree units and coding tree
blocks (CTBs), coding units and coding blocks (CBs), prediction units and prediction
blocks (PBs), and transform units and transform blocks (TBs). The largest coding structure
in previous standards was the macroblock, containing a 16×16 block of luma samples and, in
the case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples. In the
HEVC standard, the analogous structure is the coding tree block of size L×L, where L = 16,
32 or 64 samples, with the larger sizes typically enabling better compression. The CTB can
further be divided into smaller blocks using a tree structure and quadtree-like signaling.
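As a rough illustration of this quadtree partitioning, the sketch below recursively splits an L×L CTB into coding blocks. The `should_split` callback is a hypothetical stand-in for the encoder's rate-distortion decision, which in the real codec is conveyed to the decoder via split flags; none of these names come from the HM code.

```python
# Illustrative quadtree CTB partitioning (not the HM implementation).
# should_split is a hypothetical callback standing in for the encoder's
# rate-distortion decision, which HEVC signals with split flags.

def split_ctb(x, y, size, min_size, should_split):
    """Return the list of (x, y, size) coding blocks for one CTB."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):          # visit the four quadrants
            for dx in (0, half):
                blocks.extend(
                    split_ctb(x + dx, y + dy, half, min_size, should_split))
        return blocks
    return [(x, y, size)]             # leaf: emit one coding block

# Example: split a 64x64 CTB at the top level only -> four 32x32 CBs.
blocks = split_ctb(0, 0, 64, 8, lambda x, y, s: s == 64)
```

A larger L (up to 64) lets smooth regions be coded as a single large block while detailed regions recurse further down the tree, which is why the larger sizes typically compress better.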
These prediction units increase the complexity of the encoder algorithm and hence
require more processing time. Some researchers have suggested that optimization of
the deblocking filter algorithm can lead to a reduction in processing time [2]. A couple
of ways in which this algorithm can be modified are explained in the following sections.
2. HEVC Deblocking Filter Design
The deblocking filter of HEVC is similar to that of H.264/AVC and is implemented in the
inter-prediction loop. However, the design is simplified with regard to its decision-making
and filtering processes, which makes parallel processing easier. In HEVC, the deblocking
filter (DBF), followed by a sample adaptive offset (SAO) filter, is applied to the
reconstructed samples before writing them into the decoded picture buffer in the decoder
loop. The DBF is intended to reduce the blocking artifacts caused by block-based coding, and
it is applied only to samples located at block boundaries [1]. The detailed deblocking
filtering process of HEVC is explained in the next subsection.
a. Working of a Deblocking Filter
The deblocking filter is applied to all samples adjacent to a Prediction Unit (PU) or
Transform Unit (TU) boundary, except when the boundary is also a picture boundary or when
deblocking is disabled across slice or tile boundaries. This option is signaled by the
encoder. This is shown pictorially in figure 2.
Fig. 2: Schematic showing the edges of PU, TU and picture boundary [2].
The reason for including both PU and TU boundaries is that PU boundaries are not always
aligned with TU boundaries in some cases of inter-picture predicted coding blocks (CBs). The
syntax elements that control deblocking filtering across slice and tile boundaries are
located in the SPS and slice headers. In HEVC the deblocking filter is applied to edges that
are aligned on an 8×8 sample grid, for both the luma and chroma samples, instead of the 4×4
sample grid used in H.264/AVC. This restriction reduces the worst-case computational
complexity without noticeable degradation of visual quality. It also improves parallel
processing by preventing cascading interactions between nearby filtering operations [1].
Similar to the H.264/AVC scheme, the strength of the deblocking filter is controlled by the
values of several syntax elements; however, only three filter strengths are used instead of
five. For example, as shown in figure 3, given that P and Q are two adjacent blocks with a
common 8×8 grid boundary, a filter strength of 2 is assigned when one of the blocks is
predicted using intra-picture prediction. Otherwise, a filter strength of 1 is assigned when
any of the following conditions is met [1].
(i) P or Q has at least one nonzero transform coefficient.
(ii) The reference indices of P and Q are not equal.
(iii) The motion vectors of P and Q are not equal.
(iv) The difference between a motion vector component of P and Q is greater than or
equal to one integer sample.
A filter strength of 0 is assigned if none of the above conditions is met; in other words,
the deblocking process is not applied. Figure 3 depicts an example filtering decision for a
vertical edge and its pixel samples.
Fig. 3: Filtering decision example for HEVC [2].
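The strength-assignment rules above can be sketched as follows. This is a hedged illustration: the `Block` fields are invented names for the quantities in conditions (i)-(iv), not HM data structures, and the motion-vector test uses quarter-sample units so that one integer sample equals 4 units.

```python
from dataclasses import dataclass

@dataclass
class Block:
    intra: bool = False              # predicted with intra-picture prediction?
    has_nonzero_coeff: bool = False  # at least one nonzero transform coefficient
    ref_idx: int = 0                 # reference picture index
    mv: tuple = (0, 0)               # motion vector in quarter-sample units

def boundary_strength(p, q):
    """Filter strength for the boundary between adjacent blocks P and Q."""
    if p.intra or q.intra:
        return 2
    if p.has_nonzero_coeff or q.has_nonzero_coeff:
        return 1
    if p.ref_idx != q.ref_idx:
        return 1
    # an MV component difference of one integer sample = 4 quarter-samples
    if any(abs(a - b) >= 4 for a, b in zip(p.mv, q.mv)):
        return 1
    return 0                         # no deblocking applied
```

For instance, two inter-predicted blocks with identical references, motion and no residual yield strength 0, so their shared edge is left untouched.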
According to the filter strength and the average quantization parameter of P and Q, two
thresholds, tC and β, are determined from predefined tables. One of three cases, no
filtering, strong filtering or weak filtering, is chosen based on the β value for luma
samples. The computational complexity is reduced by sharing this decision across four luma
rows or columns, using only the first and last rows or columns. For chroma samples there are
only two cases: no filtering and normal filtering. When the filter strength is greater than
1, normal filtering is applied. The filtering process is then performed using the control
variables tC and β [1].
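The shared luma on/off decision can be sketched roughly as below, assuming `side[i][line]` holds the pixel `i` samples away from the boundary on line `line` (an indexing convention invented here for illustration); the activity measure is the second-difference check described in [4].

```python
# Hedged sketch of the shared luma filtering decision: local activity on
# lines 0 and 3 of the four-line segment is compared against beta.

def luma_filter_on(p, q, beta):
    """Decide deblocking for a 4-line segment using lines 0 and 3 only."""
    def activity(side, line):
        # second difference over the three pixels nearest the boundary
        return abs(side[2][line] - 2 * side[1][line] + side[0][line])
    d = (activity(p, 0) + activity(q, 0) +
         activity(p, 3) + activity(q, 3))
    return d < beta  # smooth signal near the edge -> filtering is applied
```

Evaluating only the first and last lines, as noted above, lets one decision cover all four lines of the segment, which is where the complexity saving comes from.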
In HEVC, horizontal filtering of the vertical edges for the entire picture is performed
first, followed by the filtering of the horizontal edges. This is why HEVC deblocking is
also called parallel deblocking. This specific order enables multiple horizontal or vertical
filtering processes to be applied in parallel threads, or the filter can still be
implemented on a CTB-by-CTB basis with only a small processing latency [1]. The detailed
filtering process is explained in figure 4 [2].
Fig. 4: Detailed explanation of deblocking filtering procedure in HEVC [2].
As per the basic ordering principle of HEVC, the rightmost horizontal edges in the current
LCU cannot be processed before the leftmost vertical edges of the next LCU are processed.
For example, in figure 4 the filtering of edges 21 and 22 is done only after edges 17
through 20 are completed. From the time slots it is easy to see that the filtering of the
#n+1, #n and #n-1 LCUs is not sequential but alternating, which introduces three drawbacks,
as explained below:
(i) The control of the filtering is complex and the hardware cost of the control part is
large. Usually the control part costs more than the filtering computation part, so the
control complexity is critical for hardware design.
(ii) The filtering of one LCU involves data from the left, right and upper neighboring
LCUs. The cost of buffers or memory accesses is increased.
(iii) There is latency in the processing of the current LCU. In other words, the filtering
of the current LCU cannot be completed before the data of the next LCU is available. This
decreases the throughput of the whole decoding system.
b. Filtering Operations
(i) Normal Filtering Operations
The normal filtering decisions are designed so that when a picture contains an inclined
(linear ramp) surface crossing a block boundary, the signal is not modified by the normal
deblocking filtering operations. In the normal filtering mode, for a segment of four lines
as shown in figure 5, filtering operations are applied to each line [4]. The detailed
mathematics for calculating the filtered pixel values across the block boundary is given
elsewhere [4].
Fig. 5: Four-pixel long vertical block boundary formed by the adjacent blocks P and Q. Deblocking decisions are based on lines marked with the dashed line (lines 0 and 3) [4].
c. Sequence and Picture Level Adaptivity
Since different video sequences have different characteristics, the deblocking strength can be adjusted on a sequence and even on a picture basis. As mentioned earlier, the main sources of blocking artifacts are block transforms and quantization. Blocking artifact severity therefore depends, to a large extent, on the quantization parameter QP, and the QP value is taken into account in the deblocking filtering decisions. The thresholds β and tC depend on the average QP value of the two neighboring blocks sharing the block edge [16] and are typically stored in corresponding tables. The dependence of these parameters on QP is shown in figures 6 and 7 [4].

The parameter β controls which edges are filtered, the selection between the normal and strong filter, and how many pixels from the block boundary are modified in the normal filtering operation. One can observe that the value of β increases with QP. Therefore, deblocking is enabled more frequently at high QP values than at low QP values; high QP values correspond to coarse quantization and low QP values to fine quantization. One can also see that the deblocking operation is effectively disabled at low QP values by setting one or both of β and tC to zero [4].

The parameter tC also influences the selection between the normal and strong filter and determines the maximum absolute modification allowed to the pixel values at a given QP for both normal and strong filtering operations. This helps adaptively limit the amount of blurriness introduced by the deblocking filtering. The deblocking parameters tC and β thus provide adaptivity according to the QP and prediction type. However, different sequences, or parts of the same sequence, may have different characteristics. It may be important for content providers to change the amount of deblocking filtering on a sequence or even on a slice or picture basis. Therefore, deblocking adjustment parameters can be sent in the slice header or the picture parameter set (PPS) to control the amount of deblocking filtering applied. The corresponding parameters are tc_offset_div2 and beta_offset_div2 [15]. These parameters specify the offsets (divided by two) that are added to the QP value before the β and tC values are determined. The parameter beta_offset_div2 adjusts the number of pixels to which the deblocking filtering is applied, whereas tc_offset_div2 adjusts the amount of filtering that can be applied to those pixels, as well as the detection of natural edges [4].
Fig. 6: Dependence of β on QP [4].
Fig. 7: Dependence of tC on QP [4].
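How the offsets enter the table lookup can be sketched as follows. The clip ranges (0-51 for the β table, 0-53 for the tC table) and the +2 term for intra boundaries follow the description in [4], but the expressions here are an informal approximation, not a restatement of the normative specification text.

```python
def clip3(lo, hi, v):
    """Clamp v to [lo, hi], as the HEVC spec's Clip3 operation does."""
    return max(lo, min(hi, v))

def beta_table_index(qp_avg, beta_offset_div2):
    # beta_offset_div2 is transmitted divided by two, hence the factor of 2
    return clip3(0, 51, qp_avg + 2 * beta_offset_div2)

def tc_table_index(qp_avg, bs, tc_offset_div2):
    # boundaries with strength 2 (intra) get a stronger tC via the +2 term
    return clip3(0, 53, qp_avg + 2 * (bs - 1) + 2 * tc_offset_div2)
```

Note how a sufficiently negative offset drives the index, and hence β or tC, to zero, which is exactly the mechanism by which deblocking is effectively disabled at low QP values.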
3. Methods to Optimize Deblocking Operation
Several ways have been suggested by different authors to decrease either the complexity of
deblocking filtering or the time it takes to filter out the artifacts introduced by coding
unit boundaries. Some of them are discussed in this section.
a. Unified Cross-Based Approach
A novel processing order is proposed in [2] by Li et al., where blocks are chosen and
combined to form a processing unit as shown in figure 8. This is termed the unified-cross
unit and is different from the LCU. The unit is symmetric, and the edges that need to be
filtered are arranged in several crosses. The benefit of this approach is that the
unified-cross units are independent of each other. The processing order for the
unified-cross unit is shown in figure 9.
Fig. 8: Different blocks are chosen to combine a processing unit called unified-cross unit
which is different than LCU approach [2].
Fig. 9: Unified-cross based processing [2].
The advantage of unified-cross based processing is that it implements parallel processing in
the true sense, which results in decreased computing time and lower hardware requirements.
This method seems efficient, but since it requires new hardware to be built in order to
implement the algorithm, it is out of scope for the current project.
b. Low Complexity Deblocking Filter Perceptual Optimization For The HEVC Codec Approach
Another technique for reducing the complexity and time consumption of the deblocking filter is suggested in [3] by Naccari et al. The deblocking filter in HEVC provides two offsets to vary the amount of filtering for each image area. The perceptual optimization is performed by varying these two offsets to minimize a Generalized Block-edge Impairment Metric (GBIM). A low-complexity perceptual optimization of the deblocking filter offsets is proposed to improve the GBIM quality while significantly reducing the computational resources that would be required by a brute-force approach in which all possible offset values are exhaustively tested [3].

The proposed GBIM extension comprises two terms: (i) the perceptually weighted block-edge pixel difference Mh (Mv), which represents the norm of the horizontal (or vertical) block-edge pixel differences, weighted by the perceptual weight wp, and (ii) the perceptually weighted non-block-edge average difference Eh (Ev), which represents the norm of the average for those pixels between horizontal (or vertical) block edges. The frame-level GBIMf is calculated using (1) [3].
GBIMf = 0.5 (Mh / Eh) + 0.5 (Mv / Ev)    (1)
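Equation (1) combines the four terms as an equal-weight sum of two ratios, which can be written directly (the numeric values in the example below are made up; computing Mh, Eh, Mv and Ev themselves requires the perceptual weighting described in [3]):

```python
def gbim_frame(Mh, Eh, Mv, Ev):
    """Frame-level GBIM per (1): equal-weight average of the two ratios."""
    return 0.5 * (Mh / Eh) + 0.5 * (Mv / Ev)

# Example with made-up term values: each ratio is 2, so GBIMf = 2.0.
example = gbim_frame(4.0, 2.0, 6.0, 3.0)
```

A larger ratio of block-edge differences Mh (Mv) to non-block-edge activity Eh (Ev) indicates more visible blockiness, which is why the optimization seeks offsets that minimize this value.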
4. Scope Of This Project
The main objective of this project is to reduce the processing time by modifying the
deblocking algorithm. This can be achieved by implementing various algorithms suggested by
experts in the literature on HEVC [2]-[5]. The output can be compared using the various test
sequences suggested and made available by the HEVC standard development committee. Some of
the ways that will be implemented on the HEVC test codec, also known as the HM (HEVC Test
Model) code, in this project are listed below:
(i) To understand and implement the unified-cross based processing in deblocking
filtering unit in HM codec [2].
(ii) To understand and implement a low complexity offsets perceptual optimization for
deblocking filtering unit in HM codec [3].
(iii) To understand and implement the skipping mode technique in order to decrease
edge processing thereby reducing the power consumption in deblocking filtering
unit in HM codec.
(iv) To compare the HEVC performance after implementing (i)-(iii) in the HM codec with
the H.264/AVC codec performance using processor clock cycles, mean square error (MSE)
and signal-to-noise ratio (SNR).
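The comparison metrics named in (iv) can be computed as below; this is a generic 8-bit MSE/PSNR sketch over flattened sample sequences, not code taken from the HM or JM packages.

```python
import math

def mse(ref, rec):
    """Mean squared error between two equal-length sample sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB for samples with the given peak."""
    e = mse(ref, rec)
    return float('inf') if e == 0 else 10.0 * math.log10(peak * peak / e)
```

In practice the luma and chroma planes of each decoded frame would be flattened and compared against the original sequence frame by frame.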
5. Results
The standard test sequences provided by the HEVC JCT-VC [19] were encoded using the
encoder_intra_main.cfg configuration file in two different ways: (a) by disabling the
deblocking filter and (b) by varying the deblocking parameters to get the best results in
terms of PSNR in dB, total encoding time in seconds, and bit rate in kbps. These results
were obtained by running the four test sequences listed in table 1, with the results in
tables 2 through 4.
Sequence #  Test Sequence     Resolution (pixels)  Frequency (Hz)
1           BasketBall.yuv    832×480              50
2           BQMall.yuv        832×480              60
3           Kirsten&Sara.yuv  1280×720             60
4           RaceHorses.yuv    416×240              30
Table 1: List of test sequences used to generate the results.
# of Frames  Seq.1 (a)  Seq.1 (b)  Seq.2 (a)  Seq.2 (b)  Seq.3 (a)  Seq.3 (b)  Seq.4 (a)  Seq.4 (b)
1            57.19      56.878     61.418     62.119     116.486    120.05     16.426     16.63
2            113.662    117.297    124.379    124.754    230.396    232.534    32.62      32.682
5            284.295    287.661    307.445    309.894    574.205    575.407    80.902     81.229
10           568.371    568.028    614.891    614.391    1154.293   1149.987   162.459    161.94
100          5712.818   5701.007   6044.842   6056.827   -          -          -          -
Table 2: Total encoding time in seconds; for each sequence, column (a) denotes case (a)
(deblocking disabled) and column (b) denotes case (b) (deblocking parameters tuned).
# of Frames  Seq.1 (a)  Seq.1 (b)  Seq.2 (a)  Seq.2 (b)  Seq.3 (a)  Seq.3 (b)  Seq.4 (a)  Seq.4 (b)
1            5816       5825.2     10667.04   10679.04   7966.08    7960.8     1932.72    1934.4
2            5801.6     5810.4     10677.12   10687.68   7948.08    7949.52    1902.24    1902.48
5            5762.24    5769.84    10653.12   10661.38   7910.112   7915.68    1836.768   1837.536
10           5714       5721.68    10612.85   10619.33   7895.712   7900.08    1818.144   1818.816
100          5910.58    5917.288   9837.552   9844.315   -          -          -          -
Table 3: Bit rate in kbps; for each sequence, column (a) denotes case (a) (deblocking
disabled) and column (b) denotes case (b) (deblocking parameters tuned).
# of Frames  Seq.1 (a)  Seq.1 (b)  Seq.2 (a)  Seq.2 (b)  Seq.3 (a)  Seq.3 (b)  Seq.4 (a)  Seq.4 (b)
1            36.3402    36.2705    36.2410    36.1578    40.65506   40.61633   34.89253   34.8078
2            36.3402    36.2705    36.2553    36.1716    40.65451   40.62429   34.90798   34.8258
5            36.3415    36.2669    36.2494    36.1595    40.66288   40.63086   34.93489   34.8476
10           36.3525    36.2791    36.2783    36.1862    40.65958   40.62673   34.95426   34.8701
100          36.3171    -          36.3994    -          -          -          -          -
Table 4: PSNR in dB; for each sequence, column (a) denotes case (a) (deblocking disabled)
and column (b) denotes case (b) (deblocking parameters tuned).
6. Conclusions
It is apparent from the results shown in tables 2 through 4 that the bit rate increases when
the deblocking filter parameters are optimized. Hence, applying the deblocking filter has
two effects: (i) it helps remove the blocking artifacts from the reconstructed image, and
(ii) it increases the bit rate of the signal. The performance can still be improved by
including the effect of the quantization parameter (QP) in the optimization of the
deblocking filter parameters.
7. References
[1] G. J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE
Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec.
2012.
[2] M. Li et al, “De-blocking Filter Design for HEVC and H.264/AVC”, PCM 2012, LNCS 7674, pp.
273–284, 2012.
[3] M. Naccari et al, “Low Complexity Deblocking Filter Perceptual Optimization For The HEVC
Codec”, 18th IEEE International Conference on Image Processing, pp. 737-740, 2011.
[4] A. Norkin et al, “HEVC Deblocking Filter”, IEEE Transactions on Circuits and Systems for
Video Technology, Vol. 22, No. 12, pp. 1746-1754, Dec. 2012.
[5] A. J. Honrubia, J. L. Martínez and P. Cuenca, “HEVC: A Review, Trends and Challenges”,
Instituto de Investigación en Informática de Albacete, Spain.
[6] T. Wiegand et al, “High Efficiency Video Coding (HEVC) Standardization”, IEEE
Transactions on Circuits and Systems for Video Technology, Dec. 2010.
[7] C. Man-Yau and S. Wan-Chi, “Computationally-Scalable Motion Estimation Algorithm for
H.264/AVC Video Coding”, IEEE Transactions on Consumer Electronics, vol. 56, pp. 895-903,
2010.
[8] R. Jianfeng, N. Kehtarnavaz, and M. Budagavi, “Computationally Efficient Mode Selection in
H.264/AVC Video Coding”, IEEE Transactions on Consumer Electronics, vol. 54, pp. 877-886,
2008.
[9] K. R. Rao, “High Efficiency Video Coding”, Chapter 5 – soon to be published.
[10] P. List et al, “Adaptive deblocking filter”, IEEE Transactions on Circuits and Systems for
Video Technology, vol. 13, pp. 614-619, 2003.
[11] K. Xu and C. S. Choy, “A Five-Stage Pipeline, 204 Cycles/MB, Single-Port SRAM-Based
Deblocking Filter for H.264/AVC”, IEEE Transactions on Circuits and Systems, vol. 18(3), pp.
363–374, 2008.
[12] F. Tobajas et al, “An Efficient Double-Filter Hardware Architecture for H.264/AVC De-
blocking Filtering”, IEEE Transactions on Consumer Electronics, Vol. 54(1), Feb. 2008.
[13] Y. C. Lin et al, “A Two-Result-Per-Cycle De-Blocking Filter Architecture for QFHD H.264/AVC
Decoder”, IEEE Transactions on VLSI Systems, vol. 17(6), June 2009.
[14] D. Zhou et al, “A 48 Cycles/MB H.264/AVC De-blocking Filter Architecture for Ultra High
Definition Applications”, IEICE Transactions Fundamentals E92-A (12), Dec. 2009.
[15] T. Yamakage, S. Asaka, T. Chujoh, M. Karczewicz, and I. S. Chong, “CE12: Deblocking
Filter Parameter Adjustment in Slice Level”, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11
document JCTVC-G174, Joint Collaborative Team on Video Coding (JCT-VC), Geneva, Switzerland,
Nov. 2011.
[16] G. Van der Auwera, X. Wang, M. Karczewicz, M. Narroschke, A. Kotra, and T. Wedi
(Panasonic), “Support of Varying QP in Deblocking”, ITU-T SG16 WP3 and ISO/IEC
JTC1/SC29/WG11 document JCTVC-G1031, Joint Collaborative Team on Video Coding (JCT-VC),
Geneva, Switzerland, Nov. 2011.
[17] JM software download for H.264/AVC: http://iphome.hhi.de/suehring/tml/
[18] HM codec download for H.265:
https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/
[19] HEVC standard test video sequences:
ftp://ftp.tnt.uni-hannover.de/testsequences