Scalable Video Coding: Scalable Extension of H.264/AVC

Scalable Video Coding

Video streaming over the Internet is increasingly popular, driven by applications such as video conferencing and video telephony.

The heterogeneous, dynamic, best-effort nature of the Internet motivates a scalability feature that adapts video streams to fluctuations in the available bandwidth.

Goal: optimize video quality over a wide range of bit rates. A video bit stream is called scalable if part of the stream can be removed in such a way that the resulting bit stream is still decodable.

Scalability here implies a single encoding with multiple possibilities to transmit and decode the bit stream.

A video bit stream is called scalable if part of the stream can be removed in such a way that the resulting bit stream is still decodable. This lets a single stream adapt to the various needs of end users, to varying terminal capabilities, and to varying network conditions.
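The definition above can be illustrated with a toy extractor: each NAL unit is tagged with its dependency (D), quality (Q) and temporal (T) identifiers, and a sub-stream is obtained by dropping every unit above a target operating point. This is a minimal sketch; the `NalUnit` record and `extract_substream` function are hypothetical names, and a real extractor must also respect the inter-layer dependencies signalled in the stream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NalUnit:
    # Hypothetical record: one NAL unit plus the IDs from its SVC header extension.
    dependency_id: int  # D: spatial/CGS layer
    quality_id: int     # Q: MGS/FGS quality layer
    temporal_id: int    # T: temporal level
    payload: bytes = b""

def extract_substream(nal_units, max_d, max_q, max_t):
    """Drop every NAL unit above the target operating point (max_d, max_q, max_t).

    Simplification: the same max_q is applied to every dependency layer;
    a conforming extractor would also follow the inter-layer dependencies
    signalled in the stream.
    """
    return [
        nal for nal in nal_units
        if nal.dependency_id <= max_d
        and nal.quality_id <= max_q
        and nal.temporal_id <= max_t
    ]

# Usage: 3 temporal levels x 2 dependency layers x 2 quality layers = 12 NAL units.
stream = [NalUnit(d, q, t) for t in range(3) for d in range(2) for q in range(2)]
base_layer = extract_substream(stream, max_d=0, max_q=0, max_t=0)
half_rate = extract_substream(stream, max_d=1, max_q=1, max_t=1)
```

The remaining units still form a decodable stream because lower layers never reference higher ones.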

SVC Standardization

SVC Principle: One Encoding

SVC Principle: Multiple Decoding

H.264 Simulcast vs. SVC: ManInRestaurent Sequence

(Figure: Y-PSNR (dB, 37 to 45) vs. bit rate (kbps, 0 to 10000) for 1920x1080 + 960x540 simulcast and for SVC with 2 spatial layers (1920x1080 <-> 960x540).)

H.264/AVC Simulcast vs. SVC

Typical quality gains of SVC spatial scalability over simulcast are in the range of 0.5 dB to 1.5 dB PSNR, or equivalently a 10% to 30% bit-rate reduction.

The gap is larger when there is more than one SNR layer per spatial layer.
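As a rough worked example of the quoted savings, the sketch below adds up hypothetical SD and HD simulcast rates and applies the 10% to 30% reduction to estimate the SVC rate at equal quality. The rates and function names are illustrative assumptions, not measurements.

```python
def simulcast_rate(layer_rates_kbps):
    # Simulcast sends each resolution as an independent H.264/AVC stream,
    # so the total rate is simply the sum of the per-resolution rates.
    return sum(layer_rates_kbps)

def svc_rate_range(simulcast_kbps, min_saving_pct=10, max_saving_pct=30):
    # Bit rate SVC would need for the same quality, assuming the
    # 10% to 30% reduction quoted above.
    low = simulcast_kbps * (100 - max_saving_pct) / 100
    high = simulcast_kbps * (100 - min_saving_pct) / 100
    return low, high

# Hypothetical per-resolution rates: SD (960x540) and HD (1920x1080).
total = simulcast_rate([1500, 4500])
low, high = svc_rate_range(total)
print(total, low, high)  # 6000 4200.0 5400.0
```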

(Figure: H.264 simulcast transmits separate SD and HD streams; SVC transmits a single HD+SD stream.)

Functionalities and Applications

SVC can reconstruct lower-resolution or lower-quality signals from partial bit streams.

Partial decoding of the bit stream allows:
• Graceful degradation when part of the bit stream is lost
• Bit-rate adaptation
• Format adaptation
• Power adaptation

This is beneficial for transmission services with uncertainties regarding the resolution required at the terminal, the channel conditions, or the device types.

SVC Basics

• A straightforward extension of H.264 with very limited added complexity.
• Layered approach: one base layer and one or more enhancement layers.
• The base layer is H.264/AVC compliant, so an H.264/AVC decoder can decode the base layer of an SVC stream.
• Enhancement layers enable temporal, spatial or quality (SNR) scalability.

SVC Profiles

The SVC standard defines 3 profiles:
• Scalable Baseline profile: targeted at conversational and surveillance applications. Spatial scalable coding is restricted to ratios of 1.5 and 2 between successive spatial layers. Interlaced video is not supported.
• Scalable High profile: designed for broadcast, storage and streaming applications. Spatial scalable coding with arbitrary resolution ratios is supported. Interlaced video is supported.
• Scalable High Intra profile: designed for professional applications. Contains only IDR pictures for all layers. All other coding tools are the same as in the Scalable High profile.

Temporal Scalability (Dyadic prediction structure)

Group of Pictures (GOP):
• Key picture: typically intra-coded
• Hierarchically predicted B pictures: motion-compensated prediction

(Figure: one GOP of 8 pictures between two key pictures at the GOP borders. The key pictures are T0; the hierarchically predicted B pictures are T1 (1 picture), T2 (2 pictures) and T3 (4 pictures). Decoding up to T0, T1, T2 and T3 gives frame rates of 3.75, 7.5, 15 and 30 fps.)

Tx: temporal layer identifier

Structural delay = 7 frames
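The layer assignment in the dyadic structure follows a simple rule: within a GOP of 2^L pictures, the more times a picture's position is divisible by 2, the lower its temporal layer. This sketch (hypothetical function names, 30 fps source assumed) reproduces the T0 to T3 labels and the 3.75/7.5/15/30 fps frame rates; with `gop_size=1` it degenerates to a single temporal level, as in IPP coding.

```python
def temporal_id(poc, gop_size, base=2):
    """Temporal layer of the picture at display position `poc`.

    Assumes a regular hierarchy: gop_size == base ** L and key pictures
    (poc % gop_size == 0) belong to T0. base=2 gives the dyadic structure.
    """
    levels, n = 0, 1
    while n < gop_size:
        n *= base
        levels += 1
    if n != gop_size:
        raise ValueError("gop_size must be a power of base")
    pos = poc % gop_size
    if pos == 0:
        return 0  # key picture
    # The more often `base` divides the position, the lower the layer.
    exponent = 0
    while pos % base == 0:
        pos //= base
        exponent += 1
    return levels - exponent

def layer_frame_rates(full_fps, gop_size, base=2):
    """Frame rate when decoding temporal layers T0..Tk, for each k."""
    ids = [temporal_id(p, gop_size, base) for p in range(gop_size)]
    return [full_fps * sum(1 for i in ids if i <= k) / gop_size
            for k in range(max(ids) + 1)]

print([temporal_id(p, 8) for p in range(9)])  # [0, 3, 2, 3, 1, 3, 2, 3, 0]
print(layer_frame_rates(30, 8))               # [3.75, 7.5, 15.0, 30.0]
```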

Hierarchical B-pictures

• The structure above is non-dyadic and provides 2 independently decodable subsequences with 1/9 and 1/3 of the full frame rate.

• Structural delay = 8 frames

Figure courtesy: H. Schwarz et al., “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.

• The structure above is non-dyadic; it has zero structural delay but lower coding efficiency than the previous examples.

• A chosen prediction structure need not be constant over time; it can be modified arbitrarily, e.g., to improve coding efficiency.
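For the non-dyadic case, the same hierarchy with a branching factor of 3 yields the 1/9 and 1/3 subsequences. The sketch below also assumes that the structural delay of a hierarchical B structure equals the GOP size minus one, which matches the 7 and 8 frames quoted on these slides; it does not apply to the zero-delay structure. Function names are hypothetical.

```python
from fractions import Fraction

def subsequence_rates(gop_size, base):
    """Frame-rate fractions of the independently decodable subsequences
    (excluding the full-rate sequence) for a regular hierarchy with
    gop_size == base ** L pictures per GOP."""
    fractions = []
    step = gop_size
    while step > 1:
        fractions.append(Fraction(1, step))  # keep every step-th picture
        step //= base
    return fractions

def structural_delay(gop_size):
    """Assumption: the first picture after a key picture can only be coded
    once the next key picture, gop_size - 1 pictures later, is available."""
    return gop_size - 1

print(subsequence_rates(9, 3), structural_delay(9))  # [Fraction(1, 9), Fraction(1, 3)] 8
```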

Group of Pictures (GOP)

• IPP: GOP size 1, no temporal scalability (temporal level 0 only)
• IBP: GOP size 2, temporal levels 0, 1
• GOP size 4: temporal levels 0, 1, 2
• GOP size 8: temporal levels 0, 1, 2, 3

Coding Efficiency of Hierarchical Prediction Structures

• Significant improvement in coding efficiency for high-delay applications.
• A larger GOP size gives a larger PSNR improvement.
• The gain depends on how the QP is chosen for the different temporal layers: use a smaller QP for lower layers.

Figure courtesy: H. Schwarz et al., “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.
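To make “smaller QP for lower layers” concrete, the sketch below uses a hypothetical cascading rule (QP grows by a fixed offset per temporal level) and converts each QP to the H.264/AVC quantizer step size, which doubles every 6 QP steps. The offset of 1 and the function names are assumptions for illustration, not values from the slides.

```python
# H.264/AVC quantizer step sizes for QP 0..5; the step doubles every 6 QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    if not 0 <= qp <= 51:
        raise ValueError("H.264/AVC QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

def cascaded_qps(key_qp, num_levels, offset=1):
    """Hypothetical cascading rule: QP_T = key_qp + offset * T (clipped to 51),
    so lower temporal layers get a smaller QP (finer quantization)."""
    return [min(51, key_qp + offset * t) for t in range(num_levels)]

qps = cascaded_qps(30, 4)          # temporal layers T0..T3
print(qps)                         # [30, 31, 32, 33]
print([qstep(qp) for qp in qps])   # [20.0, 22.0, 26.0, 28.0]
```

Lower layers are referenced by more pictures, so spending more bits on them improves the prediction of everything above them.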

Spatial Scalability

• Sub-sample and encode to form the base layer.
• Decode and up-sample to the original resolution.
• Subtract the prediction from the original.
• Encode the residue to form the enhancement layer.

The base layer contains a reduced-resolution version of each coded frame. Decoding the base layer alone produces a low-resolution output sequence; decoding the base layer together with the enhancement layer(s) produces a higher-resolution output.
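The four steps can be mimicked on a single row of samples. In this minimal sketch, pair averaging, nearest-neighbour up-sampling and uniform quantization stand in for the real SVC filters and encoder; all function names are hypothetical. The sketch follows the classic pyramid scheme on this slide, whereas SVC itself constrains inter-layer prediction so that single-loop decoding is possible.

```python
def downsample2(x):
    # Average sample pairs (stand-in for SVC's down-sampling filter).
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample2(x):
    # Nearest-neighbour up-sampling (stand-in for SVC's interpolation filter).
    return [v for v in x for _ in range(2)]

def code(x, step):
    # Stand-in for "encode then decode": uniform scalar quantization.
    return [step * round(v / step) for v in x]

def encode_spatial(frame, base_step, enh_step):
    base = code(downsample2(frame), base_step)            # sub-sample and encode
    prediction = upsample2(base)                          # decode and up-sample
    residue = [o - p for o, p in zip(frame, prediction)]  # subtract prediction
    enhancement = code(residue, enh_step)                 # encode residue
    return base, enhancement

def decode_spatial(base, enhancement=None):
    if enhancement is None:
        return base                                       # low-resolution output
    return [p + r for p, r in zip(upsample2(base), enhancement)]

frame = [10, 12, 20, 22, 30, 34, 40, 40]  # one hypothetical row of samples
base, enh = encode_spatial(frame, base_step=4, enh_step=1)
print(decode_spatial(base))       # [12, 20, 32, 40]
print(decode_spatial(base, enh))  # [10, 12, 20, 22, 30, 34, 40, 40]
```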

Spatial Scalability

The prediction signals are formed by:
• MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
• Up-sampling from the lower layer (spatial)
• The average of the above two predictions (temporal + spatial)
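An encoder has to choose among these three prediction signals per block. The sketch below picks the one with the lowest sum of absolute differences (SAD); using SAD instead of a full rate-distortion cost is a simplifying assumption, and the function names are hypothetical.

```python
def sad(a, b):
    # Sum of absolute differences between two blocks.
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_prediction(block, temporal_pred, spatial_pred):
    """Pick the prediction signal with the lowest SAD.

    temporal_pred: MCP inside the enhancement layer.
    spatial_pred: up-sampled lower-layer reconstruction.
    Assumption: SAD replaces the rate-distortion cost a real encoder uses.
    """
    candidates = {
        "temporal": temporal_pred,
        "spatial": spatial_pred,
        "average": [(t + s) / 2 for t, s in zip(temporal_pred, spatial_pred)],
    }
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    return mode, candidates[mode]

mode, _ = choose_prediction([10, 10, 10, 10], [8, 8, 8, 8], [12, 12, 12, 12])
print(mode)  # average
```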

Inter-layer prediction comes in three kinds:
• Inter-layer motion prediction
• Inter-layer residual prediction
• Inter-layer intra prediction (when the co-located lower-layer MB is intra-coded)

Base mode MB: only the residual is transmitted; no additional side information is sent.
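Two of these inter-layer tools reduce to simple arithmetic: motion vectors from the lower layer are scaled by the resolution ratio, and for residual prediction only the difference to the up-sampled lower-layer residual is coded. This is a sketch with hypothetical names; the exact rounding and up-sampling rules of the standard are not reproduced.

```python
def scale_motion_vector(mv, base_size, enh_size):
    """Inter-layer motion prediction: scale a base-layer MV (x, y) by the
    resolution ratio. Floor division keeps it integer; the rounding used by
    the standard may differ for non-dyadic ratios."""
    (base_w, base_h), (enh_w, enh_h) = base_size, enh_size
    return (mv[0] * enh_w // base_w, mv[1] * enh_h // base_h)

def residual_to_code(enh_residual, upsampled_base_residual):
    """Inter-layer residual prediction: only the difference between the
    enhancement-layer residual and the up-sampled base residual is coded."""
    return [e - b for e, b in zip(enh_residual, upsampled_base_residual)]

print(scale_motion_vector((3, -2), (960, 540), (1920, 1080)))  # (6, -4)
print(residual_to_code([5, 3, -1], [4, 4, -1]))               # [1, -1, 0]
```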

Extended Spatial Scalability (ESS)

ESS is needed in many applications where display sizes from broadcasting, communications and IT environments are commonly mixed and have different aspect ratios, such as 4:3 or 16:9.
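With ESS the resolution ratio between layers need not be 2, and the base layer may correspond to a cropped window of the enhancement picture. The sketch below maps an enhancement-layer position into base-layer coordinates for such a window; the crop parameters, picture sizes and function name are hypothetical, and exact fractions replace the fixed-point arithmetic a real codec uses.

```python
from fractions import Fraction

def map_to_base(x_enh, y_enh, base_size, crop):
    """Map an enhancement-layer luma position to base-layer coordinates.

    crop = (left, top, width, height): hypothetical window of the
    enhancement picture that corresponds to the whole base picture.
    Fractions keep the non-dyadic ratio exact.
    """
    base_w, base_h = base_size
    left, top, crop_w, crop_h = crop
    x_base = Fraction(x_enh - left) * base_w / crop_w
    y_base = Fraction(y_enh - top) * base_h / crop_h
    return x_base, y_base

# Hypothetical setup: a 4:3 base layer (640x480) covers the centred 1440x1080
# window of a 16:9 enhancement layer (1920x1080); the ratio is 2.25.
crop = (240, 0, 1440, 1080)
print(map_to_base(960, 540, (640, 480), crop))  # (Fraction(320, 1), Fraction(240, 1))
```

Positions that map outside the base picture (here, x below 240 or above 1680) have no co-located base-layer data, so inter-layer prediction cannot be used there.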

Quality / Fidelity / SNR Scalability

Types:
• Coarse Grain Scalability (CGS)
• Medium Grain Scalability (MGS)
• Fine Grain Scalability (FGS): not supported by the SVC standard because of its very poor enhancement-layer coding efficiency

Quality scalability provides bit-rate adaptation at the same spatial/temporal resolution. SVC supports up to 16 SNR layers for each spatial layer.

Coarse-grain quality scalability (CGS)

• A special case of spatial scalability with identical sizes (resolutions) for the base and enhancement layers
• Smaller quantization step sizes for higher enhancement (residual) layers
• Designed for only a few selected bit-rate points: the number of supported bit-rate points equals the number of layers
• Switching can occur only at IDR access units
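The coarse-to-fine quantization across CGS layers can be sketched as successive requantization: each layer codes what the previous layers missed with a smaller step, so each added layer gives one more bit-rate point. Plain scalar quantization stands in for a full CGS layer with inter-layer prediction; the step sizes and function names are hypothetical.

```python
def quantize(values, step):
    # Uniform scalar quantization with reconstruction.
    return [step * round(v / step) for v in values]

def cgs_encode(samples, steps):
    """Each quality layer requantizes what the previous layers missed,
    with a smaller step (steps must be decreasing)."""
    recon = [0] * len(samples)
    layers = []
    for step in steps:
        residual = [s - r for s, r in zip(samples, recon)]
        coded = quantize(residual, step)
        recon = [r + c for r, c in zip(recon, coded)]
        layers.append(coded)
    return layers

def cgs_decode(layers, num_layers):
    """Reconstruct from the first num_layers quality layers."""
    return [sum(col) for col in zip(*layers[:num_layers])]

samples = [13, 27, 8, 41]
layers = cgs_encode(samples, steps=[16, 4, 1])
for n in range(1, 4):
    print(n, cgs_decode(layers, n))
```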

Medium-grain quality scalability (MGS)

• More enhancement layers are supported: quality refinement layers of the residual
• Key pictures provide drift control
• Switching can occur at any access unit
• MGS = CGS + key pictures + quality refinement layers

Drift: the effect caused by unsynchronized MCP at the encoder and decoder sides.

In quality SVC, the choice of MCP references is a trade-off between coding efficiency and drift.

SVC Encoder

(Figure: SVC encoder block diagram showing the dependency layers, the temporal decomposition in each layer, and the same motion/prediction information used across layers.)

SVC: Combined Scalability

(Figure: spatio-temporal-quality cube.)

Combined Scalability

(Figure: dependency and quality refinement layers of a scalable bit stream: dependency layers D = 0, 1, 2, each with quality layers Q = 0, 1, 2.)

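Combined Scalability

(Figure: access units with temporal layers T0, T2, T1, T2, T0 in display order; each access unit contains dependency layers D0 and D1, each with quality layers Q0 and Q1.)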

Combined Scalability: Bit-stream format

Bit-stream switching:
• Inside a dependency layer: switching is possible everywhere.
• Outside a dependency layer: switching up only at IDR access units; switching down everywhere if multiple-loop decoding is used.
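The switching rules can be written as a small decision function. This is a hypothetical helper that models only the dependency layer (a switch with `from_d == to_d` changes only the quality or temporal level); switching down without multiple-loop decoding is assumed to be possible only at IDR access units, which the slide does not state.

```python
def can_switch(from_d, to_d, is_idr_access_unit, multiple_loop_decoding):
    """Whether a decoder may switch from dependency layer from_d to to_d at
    the current access unit, per the rules on this slide."""
    if from_d == to_d:
        return True  # inside a dependency layer: switching everywhere
    if to_d > from_d:
        return is_idr_access_unit  # switching up: only at IDR access units
    # Switching down: everywhere with multiple-loop decoding; otherwise
    # (assumption) only at IDR access units.
    return multiple_loop_decoding or is_idr_access_unit
```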

(Figure: SVC NAL unit structure: NAL unit header, NAL unit header extension, NAL unit payload.)

The header extension carries four scalability fields, P, T, D and Q:
• P (priority_id): indicates the importance of a NAL unit
• T (temporal_id): indicates the temporal level
• D (dependency_id): indicates the spatial/CGS layer
• Q (quality_id): indicates the MGS/FGS layer
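The P, T, D and Q fields can be read directly from the bytes. In H.264/AVC Annex G, NAL unit types 14 (prefix NAL unit) and 20 (coded slice in scalable extension) carry a 3-byte header extension after the 1-byte NAL unit header, with priority_id (6 bits), dependency_id (3 bits), quality_id (4 bits) and temporal_id (3 bits) among its fields; the 4-bit quality_id is why at most 16 quality layers exist per dependency layer. The parser below is a sketch of that layout with a hypothetical function name; it assumes svc_extension_flag = 1 (MVC reuses the same NAL unit types with the flag set to 0) and input with emulation prevention bytes already removed.

```python
def parse_svc_nal_header(data: bytes) -> dict:
    """Parse the 1-byte NAL unit header and, for nal_unit_type 14 or 20,
    the 3-byte SVC header extension."""
    if len(data) < 1:
        raise ValueError("empty NAL unit")
    b0 = data[0]
    hdr = {
        "forbidden_zero_bit": b0 >> 7,
        "nal_ref_idc": (b0 >> 5) & 0x3,
        "nal_unit_type": b0 & 0x1F,
    }
    if hdr["nal_unit_type"] not in (14, 20):
        return hdr  # plain H.264/AVC NAL unit, e.g. base-layer slice
    if len(data) < 4:
        raise ValueError("truncated SVC NAL unit header extension")
    b1, b2, b3 = data[1], data[2], data[3]
    hdr["svc_extension_flag"] = b1 >> 7
    if hdr["svc_extension_flag"] == 0:
        return hdr  # MVC extension, not parsed in this sketch
    hdr.update({
        "idr_flag": (b1 >> 6) & 0x1,
        "priority_id": b1 & 0x3F,            # P
        "no_inter_layer_pred_flag": b2 >> 7,
        "dependency_id": (b2 >> 4) & 0x7,    # D
        "quality_id": b2 & 0xF,              # Q
        "temporal_id": b3 >> 5,              # T
        "use_ref_base_pic_flag": (b3 >> 4) & 0x1,
        "discardable_flag": (b3 >> 3) & 0x1,
        "output_flag": (b3 >> 2) & 0x1,
        "reserved_three_2bits": b3 & 0x3,
    })
    return hdr
```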

Profiles of SVC

• Scalable Baseline: for conversational and surveillance applications requiring low decoding complexity. Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping. Temporal and quality scalability: arbitrary. No interlaced coding tools. B slices, weighted prediction, CABAC, and the 8x8 luma transform are supported in enhancement layers. The base layer conforms to the Baseline profile of H.264/AVC.
• Scalable High: for broadcast, streaming, and storage. Spatial, temporal, and quality scalability: arbitrary. The base layer conforms to the High profile of H.264/AVC.
• Scalable High Intra: Scalable High with all pictures coded as IDR pictures.
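The Scalable Baseline restriction on spatial ratios can be checked mechanically. This sketch assumes the same ratio is required horizontally and vertically and ignores the MB-aligned cropping condition; the function name is hypothetical.

```python
from fractions import Fraction

# Ratios between successive spatial layers listed for Scalable Baseline.
BASELINE_RATIOS = {Fraction(1), Fraction(3, 2), Fraction(2)}

def baseline_ratio_ok(lower_size, upper_size):
    """True if (width, height) of two successive spatial layers differ by an
    allowed Scalable Baseline ratio (assumed equal in both directions)."""
    ratio_w = Fraction(upper_size[0], lower_size[0])
    ratio_h = Fraction(upper_size[1], lower_size[1])
    return ratio_w == ratio_h and ratio_w in BASELINE_RATIOS

print(baseline_ratio_ok((960, 540), (1920, 1080)))  # True  (ratio 2)
print(baseline_ratio_ok((640, 360), (960, 540)))    # True  (ratio 1.5)
print(baseline_ratio_ok((640, 480), (1920, 1080)))  # False (different aspect ratios)
```

Layer configurations that fail this check, such as the mixed 4:3 and 16:9 case, call for the Scalable High profile.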

Conclusions

• Temporal scalability: hierarchical prediction structure
• Spatial and quality scalability: inter-layer prediction of intra, motion, and residual information; single-loop MC decoding; CGS uses an identical size for each layer; MGS = CGS + key pictures + quality refinement layers
• Applications: power adaptation (decode only the needed part of the video stream), graceful degradation (when the “right” parts are lost), format adaptation (backward-compatible extension in mobile TV)

What's next in SVC? Bit-depth scalability (8-bit 4:2:0 to 10-bit 4:2:0) and color format scalability (4:2:0 to 4:4:4).

2007/8 MC2008, VCLAB

References

H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, 2007.

T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.

T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available on http://iphome.hhi.de/wiegand/dic.htm)

H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.
