
A Study on AVS-M Video Standard

Sahana Devaraju and K.R. Rao, IEEE Fellow
Electrical Engineering Department, University of Texas at Arlington, Arlington, TX

E-mail: (sahana.devaraju, rao)@uta.edu

Abstract

Audio Video Standard for Mobile (AVS-M) [1][9] is the seventh part of the recent video coding standard developed by the AVS workgroup of China, aimed at mobile systems and devices with limited processing capability and power budgets. This paper provides an insight into the AVS-M video standard: the features it offers, the data formats it supports, the profiles and tools used in the standard, and the architecture of the AVS-M codec. Key techniques such as transform and quantization, intra prediction, quarter-pixel interpolation, motion compensation modes, entropy coding and the in-loop de-blocking filter are studied. Simulation results are evaluated in terms of bitrates and SNR.

1. Introduction

Over the past 20 years, analog communication around the world has been superseded by digital communication. The digital representation of information such as audio and video signals has advanced in leaps and bounds. With the increase in commercial interest in video communications, the need for international image and video compression standards arose. Many successful audio-video standards [18] [19] have been released, advancing a plethora of applications, the largest of which is digital entertainment media. Products have been developed that span a wide range of applications and have been enhanced by advances in other technologies such as the internet and digital media storage.

The Moving Picture Experts Group (MPEG) [3] developed the first format that quickly became the standard for audio and video compression and transmission. MPEG-2, released soon after, was broader in scope and supported interlacing and high definition video formats. Later, MPEG-4 added further coding tools, at the cost of additional complexity, to achieve higher compression factors than MPEG-2; MPEG-4 is very efficient in terms of coding, producing files almost one quarter the size of MPEG-1. Although the MPEG standards dominate most video signal formats, several other formats compete closely in terms of efficiency, complexity, and storage requirements.

AVS China [1] [8] was developed by the AVS workgroup and is owned by China. This audio and video standard was initiated by the Chinese government to counter the monopoly of the MPEG standards, whose licensing was costing the country dearly. AVS China focused on reducing dependence on MPEG-based audio-video formats, thereby providing China with a national standard and saving substantial licensing fees. The AVS objective was to create a national audio-video standard for broadcasting in China and to extend the technology across the globe.

2. Data formats

AVS supports both progressive and interlaced scan formats [5]. Progressive scan is a method of storing or transmitting images wherein all lines of each frame are scanned in sequence. Interlaced scanning alternates between odd and even lines. AVS codes video data in progressive scan format. The advantages of coding in progressive format are more efficient motion estimation and the fact that progressive content can be encoded at significantly lower bit rates than interlaced content; motion compensation of progressive content is also less complex.

2.1. Layered structure

AVS follows a layered structure for the data and this is very much visible in the coded bitstream. Figure 1 depicts the layered data structure. The first layer is a set of frames of video put together as a sequence. Video frames comprise the next layer, and are called Pictures. Pictures are subdivided into rectangular regions called slices. Slices are further subdivided into square regions of pixels called macroblocks (MB). These MBs consist of a set of luminance and chrominance blocks [5].

Figure 1. Layered structure [5]

2.1.1. Sequence: The sequence layer consists of a set of mandatory and optional downloaded system parameters. The mandatory parameters are necessary to initialize decoder systems. The optional parameters are used for other system settings at the discretion of the network provider. Sometimes user data can optionally be contained in the sequence header. The sequence layer provides an entry point into the coded video. Sequence headers should be placed in the bitstream to support user access appropriately for the given distribution medium. Repeat sequence headers may be inserted to support random access. Sequences are terminated with a sequence end code.

2.1.2. Picture: The picture layer provides the coded representation of a video frame [2] [4] [5]. It comprises a header with mandatory and optional parameters, and optionally user data. Three types of pictures are defined by AVS:

Intra pictures (I-pictures)
Predicted pictures (P-pictures)
Interpolated pictures (B-pictures)

AVS-M [6] supports only I pictures/frames and P pictures/frames, as depicted in Figure 2. An I frame can be reconstructed without any reference to other frames. P frames are forward predicted from the last I-frame or P-frame, i.e. they cannot be reconstructed without the data of another frame (I or P). A P frame can have a maximum of two reference frames for forward prediction.

Figure 2. Picture types in AVS part 7 [2]

2.1.3. Slice: The slice structure provides the lowest-layer mechanism for resynchronizing the bitstream in case of transmission error. Slices comprise a series of MBs. Slices must not overlap, must be contiguous, and must begin and terminate at the left and right edges of the picture. It is possible for a single slice to cover the entire picture. The slice structure is optional. Slices are independently coded, and no slice can refer to another slice during the decoding process.

2.1.4. Macroblock: A picture is divided into MBs. A macroblock includes the luminance and chrominance component pixels that collectively represent a 16x16 region of the picture. In 4:2:0 mode, the chrominance pixels are subsampled by a factor of two in each dimension; therefore each chrominance component contains only one 8x8 block. In 4:2:2 mode, the chrominance pixels are subsampled by a factor of two in the horizontal dimension only; therefore each chrominance component contains two 8x8 blocks [2] [4] [5]. The MB header contains information about the coding mode and the motion vectors, and may optionally contain the quantization parameter (QP). Macroblock partitioning and sub macroblock partitioning [2] are shown in Figures 3 and 4. The partitioning is used for motion compensation; the number in each rectangle specifies the order of appearance of motion vectors and reference indices in the bitstream.

Figure 3. Macroblock partitioning [2]

Figure 4. Sub macroblock partitioning [2]
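The chroma geometry described above can be sketched in a few lines of Python. The function names here are illustrative, not from the standard, and the subsampling shown is plain decimation (real encoders typically low-pass filter before decimating):

```python
def chroma_blocks_per_component(sampling: str) -> int:
    """Number of 8x8 blocks per chroma component in a 16x16 macroblock.

    4:2:0 subsamples chroma by 2 in both dimensions (one 8x8 block);
    4:2:2 subsamples by 2 horizontally only (two 8x8 blocks).
    """
    if sampling == "4:2:0":
        return 1  # 8x8 chroma region -> one 8x8 block
    if sampling == "4:2:2":
        return 2  # 8x16 chroma region -> two 8x8 blocks
    raise ValueError("unsupported sampling mode")


def subsample_420(chroma_16x16):
    """Decimate a 16x16 chroma plane by 2 in each dimension (sketch only)."""
    return [row[::2] for row in chroma_16x16[::2]]
```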

2.1.5. Block: The block is the smallest coded unit and contains the transform coefficient data for the prediction errors. For intra-coded blocks, intra prediction is performed from neighboring blocks.

3. Profile and levels

AVS-M defines a single profile, the Jiben (basic) profile. Nine levels are specified:

1.0 : up to QCIF and 64 kbps
1.1 : up to QCIF and 128 kbps
1.2 : up to CIF and 384 kbps
1.3 : up to CIF and 768 kbps
2.0 : up to CIF and 2 Mbps
2.1 : up to HHR and 4 Mbps
2.2 : up to SD and 4 Mbps
3.0 : up to SD and 6 Mbps
3.1 : up to SD and 8 Mbps

4. AVS-M codec

The block diagrams of the AVS-M encoder and decoder [6] are depicted in Figures 5 and 6. Each input macroblock needs to be predicted (intra predicted or inter predicted). In the AVS-M encoder, S0 is used to select the prediction method for the current MB, whereas in the decoder, S0 is controlled by the MB type of the current MB. The intra predictions are derived from neighboring pixels in the left and top blocks, and the inter predictions are derived from previously decoded frames. The unit size of intra prediction is 4×4, matching the 4×4 integer cosine transform (ICT) used by AVS-M. Seven block sizes, 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, are supported in AVS-M. The precision of motion vectors in inter prediction is up to 1/4 pixel. The prediction residues are transformed with the 4×4 ICT, the ICT coefficients are quantized with a scalar quantizer, and zig-zag scanning order is used for the quantized coefficients.

AVS-M employs an adaptive variable length coding (VLC) technique [6] [17]. Two different types of Exp-Golomb codebooks correspond to different symbol distributions. Mapping tables are defined to map each coded symbol into a specific codebook and its elements.

The reconstructed image is the sum of the predicted image and the reconstructed error image. The deblocking filter is used in the motion compensation loop and acts on the reconstructed image across vertical and horizontal edges. The filter strength is adjusted depending on the activities of the blocks and the quantization parameters.

Figure 5. AVS-M encoder [6]

Figure 6. AVS-M decoder [6]

5. Key techniques of AVS-M

5.1. Transform

Small block sizes perform better than large ones at lower image resolutions. The 4×4 block is the unit of the transform [6] [16], intra prediction, and the smallest motion compensation block in AVS Part 7. The 4×4 transform used in AVS is

T4 = [ 2   3   2   1
       2   1  -2  -3
       2  -1  -2   3
       2  -3   2  -1 ]

For a 4×4 block, the decoded levels are dequantized with

xij = (x'ij × d(QP) + 2^(s(QP)−1)) >> s(QP) (1)

where xij is the dequantized coefficient, QP is the quantization parameter, d(QP) is the inverse quantization table, and s(QP) is the QP-dependent shift value for inverse quantization. The range of x'ij is [−2^11, 2^11−1]. The horizontal inverse transform is performed as follows:

H' = X × T4^T (2)

where X is the 4×4 dequantized coefficient matrix, H' is the intermediate result after the horizontal inverse transform, and T4^T is the transpose of T4. Then the vertical inverse transform is performed:

H = T4 × H' (3)

H is the 4×4 matrix after the inverse transform; the range of its elements hij should be [−2^15, 2^15−1]. Because the transform matrix contains only small integer coefficients, it can be realized using only addition and shift operations, and the transform and quantization are completed within 16 bits. AVS-M uses prescaled integer transform (PIT) technology: all scale-related operations are done in the encoder, so the decoder does not need any scale operations. PIT is used in AVS-M to reduce decoder complexity.
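A minimal sketch of the dequantization and inverse transform path in Python. The d and s values passed to dequantize below are illustrative placeholders, not the normative AVS-M d(QP)/s(QP) tables; only the matrix T4 and the order of operations follow the text above:

```python
# AVS-M 4x4 inverse ICT sketch. T4 is the transform matrix from the text.
T4 = [
    [2,  3,  2,  1],
    [2,  1, -2, -3],
    [2, -1, -2,  3],
    [2, -3,  2, -1],
]

def dequantize(level, d, s):
    # Eq. (1): x = (x' * d(QP) + 2^(s(QP)-1)) >> s(QP)
    return (level * d + (1 << (s - 1))) >> s

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_transform(x):
    h_prime = matmul(x, transpose(T4))  # horizontal pass, Eq. (2)
    return matmul(T4, h_prime)          # vertical pass, Eq. (3)
```

Because every entry of T4 is a small integer, each multiply can be realized with shifts and adds (for instance 3*v = (v << 1) + v), which is what keeps the whole path within 16-bit arithmetic.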

5.2. Quantization

An adaptive uniform quantizer performs the quantization of the 4×4 transform coefficient matrix [5] [6] [12]. The step size of the quantizer can be varied to provide rate control; in constant bit rate operation, this mechanism is used to prevent buffer overflow. The transmitted quantization parameter (QP) is used directly for the luminance coefficients, and for the chrominance coefficients it is modified at the upper end of its range. The quantization parameter may optionally be fixed for an entire picture or slice; if it is not fixed, it may be updated differentially at every macroblock. The quantization parameter varies from 0 to 63 in steps of one. The uniform quantization process is designed to work together with the transform in order to allow a low complexity decoder implementation.
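The uniform quantizer above can be sketched as follows. The QP-to-step-size mapping here is purely illustrative (the real AVS-M table is not reproduced); the round-to-nearest behavior is the point:

```python
def quantize(coeff: int, step: int) -> int:
    """Uniform quantization with rounding to the nearest level (sketch)."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)

def step_size(qp: int) -> int:
    """Hypothetical QP -> step mapping that doubles every 8 QP values.
    Illustrative only; AVS-M defines QP in [0, 63] with its own table."""
    assert 0 <= qp <= 63
    return 1 << (qp // 8)
```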

5.3. Intra prediction

Two types of intra prediction modes are adopted in AVS-M, Intra_4x4 and Direct Intra Prediction (DIP) [13]. AVS-P7's intra coding brings a significant complexity reduction while maintaining comparable performance. In particular, the content-based most probable intra mode decision increases the probability of hitting the most probable intra mode, which results in bit reduction in the encoding process.

5.3.1. Intra_4x4: In Intra_4x4 mode, each 4x4 block is predicted from spatially neighboring samples as shown in Figure 7. The 16 samples of the 4x4 block, labeled a-p, are predicted using prior decoded samples in adjacent blocks labeled A-D, E-H and X [11]. The up-right pixels used for prediction are extended from pixel sample D; similarly, the down-left pixels are extended from H. For each 4x4 block, one of the nine prediction modes shown in Figure 8 can be utilized to exploit spatial correlation: eight directional prediction modes [10] (such as Down Left, Vertical, etc.) and one non-directional prediction mode (DC).

Figure 7. Intra_4×4 prediction [11]

Figure 8. Nine intra_4×4 prediction modes of AVS P7 [11]
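Two of the nine Intra_4x4 modes can be sketched directly; the mode names follow the figure, but the neighbor-availability and boundary handling of the standard are omitted here. Vertical prediction copies the top row of neighbors downward, and DC prediction fills the block with the rounded mean of the top and left neighbors:

```python
def predict_vertical(top):
    """Vertical mode: each of the 4 rows copies the 4 top neighbors."""
    return [list(top) for _ in range(4)]

def predict_dc(top, left):
    """DC mode: all 16 samples take the rounded mean of the 8 neighbors."""
    dc = (sum(top) + sum(left) + 4) >> 3  # +4 rounds; >>3 divides by 8
    return [[dc] * 4 for _ in range(4)]
```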

5.3.2. Direct intra prediction: When direct intra prediction is used, a new method is applied to code the intra prediction mode information. With Intra_4x4, at least 1 bit is needed to represent the mode information for each block; thus, for a macroblock, even when the intra prediction modes of all 16 blocks are their most probable mode (MPM), 16 bits are needed to indicate the mode information. Since AVS-P7 targets mobile applications with limited bandwidth, the QP is usually high, so the percentage of best modes equaling the most probable mode is high [7]. Many MBs therefore spend 16 bits even when every block is coded with its most probable mode. In direct intra prediction mode, a single 1-bit flag indicates whether all of the blocks in the MB are coded using their most probable modes.

All 16 4×4 blocks in a MB use their most probable modes for Intra_4×4 prediction, and the RDCost(DIP) of the MB is calculated as

RDCost(mode) = D(mode) + λ·R(mode) (4)

The rate distortion cost (RD cost) is used to apply rate distortion optimization when choosing the best mode for the MB. D(mode) is the sum of squared distortion between the reconstructed MB and the original MB under this mode, R(mode) is the number of bits required to code the MB, and λ specifies the relative importance of the distortion D and the rate R.
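Equation (4) amounts to the following mode selection loop; the candidate list and the measured (D, R) values below are hypothetical, the arithmetic is the point:

```python
def rd_cost(distortion: float, rate_bits: int, lam: float) -> float:
    # Eq. (4): RDCost(mode) = D(mode) + lambda * R(mode)
    return distortion + lam * rate_bits

def choose_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_bits) tuples."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]
```

With a small λ the lowest-distortion mode wins; with a large λ the mode that is cheapest in bits wins, which is why high-QP mobile scenarios favor the 1-bit DIP flag.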

5.4. Interframe prediction

AVS-M defines I pictures and P pictures. P pictures use forward motion compensated prediction; the maximum number of reference pictures used by a P picture is two. To improve error resilience, one of the two reference pictures can be an I/P picture far away from the current picture. AVS-M also specifies non-reference P pictures: if the nal_ref_idc of a P picture is equal to 0, the P picture shall not be used as a reference picture. Non-reference P pictures can be used for temporal scalability. Reference pictures are identified by the reference picture number, which is 0 for an IDR picture. The reference picture number of a non-IDR reference picture is calculated as given in equation 5.

refnum = refnumprev + num − numprev,      if numprev ≤ num
refnum = refnumprev + num − numprev + 32, otherwise (5)

where num is the frame_num value of the current picture, numprev is the frame_num value of the previous reference picture, and refnumprev is the reference picture number of the previous reference picture.
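Equation (5) is a wrap-around correction, a direct transcription in Python (the +32 branch suggests frame_num is counted modulo 32, an inference from the equation rather than a quoted fact):

```python
def ref_picture_number(num, num_prev, refnum_prev):
    """Eq. (5): reference picture number of the current picture, given the
    frame_num of the current picture (num), the frame_num of the previous
    reference picture (num_prev), and its reference picture number."""
    if num_prev <= num:
        return refnum_prev + num - num_prev
    return refnum_prev + num - num_prev + 32  # frame_num wrapped around
```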

The size of the motion compensation block can be 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 or 4×4. If half_pixel_mv_flag is equal to '1', the precision of the motion vector is up to 1/2 pixel; otherwise the precision of the motion vector is up to 1/4 pixel [16]. When half_pixel_mv_flag is not present in the bitstream, it shall be inferred to be '1'.

5.5. Deblocking filter

AVS Part 7 makes use of a simplified deblocking filter, wherein the boundary strength is decided at the macroblock level [4] [12]. Filtering is applied to the boundaries of luma and chroma blocks, except for picture and slice boundaries. In Figure 9, the dotted lines indicate the boundaries that will be filtered. Intra predicted MBs usually have more and larger residuals than inter predicted MBs, which leads to stronger blocking artifacts at the same QP; therefore, a strong filter is applied to intra predicted MBs and a weak filter to inter predicted MBs. When QP is not very large, the distortion caused by quantization is relatively small, hence no filtering is required.


Figure 9. Luma and chroma block edge [7]

5.6. Entropy coding

In entropy coding, the basic concept is a mapping from the predicted and transformed video signal to a variable length coded bitstream. Two entropy coding methods are generally used: variable length coding and arithmetic coding [7]. Context-based adaptive entropy coding comes into the picture when higher coding efficiency is desired.

AVS-M uses Exp-Golomb codes [11], as shown in Table 1, to encode syntax elements such as quantized coefficients, macroblock coding type, and motion vectors. Eighteen coding tables are used in encoding the quantized coefficients; the encoder uses the run and the absolute value of the current coefficient to select the table.

Table 1. Kth order Golomb code [6]

Exponential code structure    Range of code number

k = 0
1                             0
0 1 x0                        1 .. 2
0 0 1 x1 x0                   3 .. 6
0 0 0 1 x2 x1 x0              7 .. 14
...                           ...

k = 1
1 x0                          0 .. 1
0 1 x1 x0                     2 .. 5
0 0 1 x2 x1 x0                6 .. 13
0 0 0 1 x3 x2 x1 x0           14 .. 29
...                           ...
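The pattern in Table 1 is the standard kth-order Exp-Golomb construction, which admits a compact encoder sketch (assuming non-negative code numbers):

```python
def exp_golomb(code_num: int, k: int) -> str:
    """Encode code_num with a kth-order Exp-Golomb code, as a bit string.

    Construction: write v = code_num + 2^k in binary, then prefix it with
    (bit_length(v) - 1 - k) zeros. For k = 0 this yields the codewords
    1, 01x0, 001x1x0, ... listed in Table 1.
    """
    v = code_num + (1 << k)
    prefix_zeros = v.bit_length() - 1 - k
    return "0" * prefix_zeros + bin(v)[2:]
```

For example, exp_golomb(0, 0) gives "1" and exp_golomb(3, 0) gives "00100", matching the k = 0 rows of Table 1, while exp_golomb(0, 1) gives "10", matching the first k = 1 row.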

5.6.1. Context based adaptive 2 dimensional variable length coding: In AVS, an efficient context-based adaptive 2D variable length coding scheme is designed for coding the transform coefficients of a 4×4 block [6]. The transform coefficients are mapped into a one-dimensional (level, run) sequence by the reverse zigzag scan [8] [14]. The coding process is as follows.

Step 1. Transform coefficients are classified into three categories, intra, inter and chroma, for the luma components of intra MBs, the luma components of inter MBs, and the chroma components of both kinds of MB, respectively. Set tablenum = 0 and use the first VLC table to code the first (level, run) pair.

Step 2. If the (level, run) can be coded in the current table, code the (level, run) with Exp-Golomb code.

Step 3. If the (level, run) is out of the current table’s range, code the (level, run) with escape coding method.

Step 4. Use the coded information to choose the table, represented as tablenum, for the next (level, run); then jump to Step 2. When all the (level, run) pairs in the transform block are coded, code the EOB.
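The mapping of a quantized 4x4 block into (level, run) pairs can be sketched as follows. The scan order is the conventional 4x4 zigzag; for simplicity this sketch scans forward rather than in the reverse order used by the standard:

```python
# Conventional 4x4 zigzag scan order as (row, col) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3),
]

def to_level_run(block):
    """Map a 4x4 block of quantized coefficients to (level, run) pairs,
    where run counts the zeros preceding each nonzero coefficient."""
    pairs, run = [], 0
    for r, c in ZIGZAG_4X4:
        coeff = block[r][c]
        if coeff == 0:
            run += 1
        else:
            pairs.append((coeff, run))
            run = 0
    return pairs  # the encoder then codes these pairs, followed by EOB
```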

6. Simulation and results

Standard QCIF and CIF sequences like Foreman, News, Mobile and Tempete [21] are tested based on the encoder and decoder architecture of AVS-M using Microsoft Visual C++. Figure 10 shows the original and decoded sequences for various test sequences. Figure 11 gives the plot of SNR vs bits per frame for these sequences. A total of 20 frames for each sequence were considered.


Figure 10. (a) Original foreman sequence, (b) Decoded foreman sequence, (c) Original news sequence, (d) Decoded news sequence, (e) Original mobile sequence, (f) Decoded mobile sequence, (g) Original tempete sequence, (h) Decoded tempete sequence.


Figure 11. Plot of SNR vs bits/frame for the encoded (a) Foreman sequence, (b) News sequence, (c) Mobile sequence, (d) Tempete sequence.

7. Conclusions and future work

AVS-M is an application driven coding standard with well-optimized and efficient techniques. It achieves performance similar to H.264/AVC at a much lower cost. AVS Part 7 targets low complexity, low picture resolution mobility applications. The AVS encoder and decoder are implemented using the AVS-M software [20]. Tests are carried out on a set of QCIF and CIF sequences, and the SNR values of the luma and chroma components are tabulated.

The 2D-VLC can be studied further to improve performance, and the AVS-M access units are also within scope for future study. This paper dealt only with the video part of the AVS coding standard; similar tests can be carried out on the audio part [22] of the standard.

8. References

[1] AVS working group official website, http://www.avs.org.cn

[2] UTA Electrical Engineering courses website: http://www-ee.uta.edu/dip/Courses/EE5351/ISPACSAVS.pdf

[3] MPEG website: http://www.mpeg.org/

[4] L. Yu et al., “Overview of AVS-Video: Tools, performance and complexity,” SPIE VCIP, vol. 5960, pp. 596021-1~ 596021-12, Beijing, China, July 2005.

[5] W. Gao et al., “AVS– the Chinese next-generation video coding standard,” National Association of Broadcasters, Las Vegas, 2004.

[6] L. Fan, “Mobile Multimedia Broadcasting Standards”, ISBN: 978-0-387-78263-8, Springer US, 2009.

[7] F. Yi et al., “Low-Complexity Tools in AVS Part 7”, J. Comput. Sci. Technol., vol. 21, pp. 345-353, May 2006.

[8] L. Yu, S. Chen and J. Wang, “Overview of AVS-video coding standards”, Signal Process: Image Commun, vol. 24, Issue 4, pp. 247-262, April 2009.

[9] W. Gao, “AVS – A project towards to an open and cost efficient Chinese national standard”, ITU-T VICA workshop, ITU Headquarters, Geneva, 22-23 July 2005.

[10] Z. Zhang et al., “Improved Intra Prediction Mode-decision Method”, Proc. of SPIE, vol. 5960, pp. 59601W-1~59601W-9, Beijing, China, July 2005.

[11] http://zhan.ma.googlepages.com/INTRA_CODING_AVS.PDF

[12] W. Gao and T. Huang “AVS Standard -Status and Future Plan”, Workshop on Multimedia New Technologies and Application, Shenzhen, China, Oct. 2007.

[13] M. Liu and Z. Wei, “A fast mode decision algorithm for intra prediction in AVS-M video coding”, ICWAPR '07, vol. 1, pp. 326-331, Nov. 2007.

[14] Q. Wang et al., “Context-Based 2D-VLC for Video Coding”, IEEE Int’l Conf. on Multimedia and Expo (ICME), vol. 1, pp. 89-92, June 2004.

[15] W. Gao , K.N. Ngan and L. Yu “Special issue on AVS and its applications: Guest editorial”, Signal Process: Image Commun, vol. 24, Issue 4, pp 245-344, April 2009.

[16] S.W. Ma and W. Gao, “Low Complexity Integer Transform and Adaptive Quantization Optimization”, J. Comput. Sci. Technol, vol.21, pp.354-359, May 2006.

[17] Y. Xiang et al., “Perceptual evaluation of AVS-M based on mobile platform”, Congress on Image and Signal Processing, vol. 2, pp. 76-79, May 2008.

[18] R. Schafer and T. Sikora, “Digital video coding standards and their role in video communications”, Proc. of the IEEE, vol. 83, pp. 907-924, June 1995.

[19] K.R. Rao and J.J. Hwang, “Techniques and standards for digital image/video/audio coding,” Prentice Hall, 1996.

[20] AVS China software can be downloaded from the site ftp://159.226.42.57/public/avs_doc/avs_software  

[21] Test sequences can be downloaded from the site http://trace.eas.asu.edu/yuv/index.html

[22] H.J. Ai, S.X. Chen and R.M. Hu, “Introduction to AVS audio”, J. Comput. Sci. Technol , vol.21, Issue 3, pp.360-365, May 2006.