scalable video coding extension of h.264/avcmapl.nctu.edu.tw/course/vc_2009/files/07.special...

Scalable Video Coding Extension of H.264/AVC

Wen-Hsiao Peng, Ph.D

Multimedia Architecture Processing Laboratory (MAPL)Department of Computer Science, National Chiao Tung University

May 2008

Wen-Hsiao Peng, Ph.D (NCTU CS) MAPL May 2008 1 / 68

Scalable Extension of H.264/AVC

Outline

Introduction

Scalable Video Coding Extension of H.264/AVC

Standardization ActivitiesCoding TechnologiesTransport Interface

R-D Performance

Complexity Analysis

Concluding Remarks


Introduction

Introduction


Introduction De�nition

Scalable Video Coding (SVC)

Embedded Video Representation

Spatial, Temporal, SNR, and Combined Scalability

40kbits/s

Enhancement-LayerBase-Layer

3Mbits/s

Time

SNRScalability

SpatialScalability

TemporalScalability

EmbeddedRepresentation


Introduction Concepts

Scalability

QCIF

QCIF@75Hz,L QCIF@15Hz,M QCIF@30Hz,H

CIF

CIF@15Hz,L CIF@30Hz,H


Introduction Rationales

Why Do We Need SVC?

Diversi�ed Clients

Di�erent Computation Power and Display Capabilities

Heterogeneous Networks

Di�erent Types of NetworksTime-varying Channel Bandwidth

Servers

Limited Storage Spaces - compression is requiredLimited Bandwidth - unicast may not be applicable



Why Do We Need SVC?

Ethernet

Ethernet

Server

Wireless

Point-to-PointTransmission

Broadcasting

Router

Wireless

512 kbps

32 kbps

128 kbps

256 kbps

64 kbps

3 Mbps

1.5 Mbps

384 kbps

64 kbps

Bandwidth

Time

DiversifiedClients

Time-VaryingBandwidth

LimitedResources



Graceful Degradation

Foreman


Introduction Applications

How is SVC Useful?


SVC Standardization Activities

SVC Standardization Activities


SVC Standardization Activities SVC Roadmap

Roadmap

MPEG-4 Part 10 Amd. 3

Started in MPEG-21 Part 13Moved to Joint Video Team (JVT) since 2004/10

Roadmap

2003/03, Call for Evidence (CfE)2004/04, Call for Proposal (CfP) - 12 Wavelet-based + 2 DCT-based2004/10, Activities Moved to JVT (Palma Meeting)2005/01, Wording Draft (WD) - HHI Proposal2006/01, Proposed Draft of Amendment (PDAM)2006/07, Final Proposed Draft of Amendment (FPDAM)2007/01, Final Draft of Amendment (FDAM)2007/07, Amendment (AMD)


SVC Standardization Activities SVC History

SVC Call-for-Proposal

AVC/H.264-based technology

HHI, NCTU

Wavelet-based schemes

University of BresciaDanae+ ThomsonMicrosoft Research Asia (MSRA)University of South Wales (UNSW)



AVC/H.264-based Technology

Scalable Extension of AVC/H.264

Decomation

MCTF

MCTF

4x4Transform

4x4Transform

MotionRefinement

ResiduePrediction

Spatio-Temporal Transform

Scalable orCABACCoding

Entropy Coding

Scalable orCABACCoding

Bit-Stream

Input

Inter-LayerMotion

Prediction

SpatialDecimation

ScalableEntropyCodingTemporal

PredictionSpatialDCT

Inter-LayerResidue

Prediction



Wavelet-based Technology

t+2D

TemporalMCTF

Spatial2D Wavelet

Spatio-Temporal Transform

ScalableCoding

Entropy Coding

Bit-StreamInput

TemporalWavelet

SpatialWavelet

ScalableEntropyCoding

2D+t

Spatial2D W avele t

In-bandT empora l

M CT F

In-bandT empora l

M CT F

S patio-Temporal Transform

S ca lableCoding

Entropy Cod ing

S calableCoding

Bit-Stream

InputHF

LL

Spatial 2DD ecom po sition

S patia l 2DDecompositio n

SpatialWavelet

In-BandTemporalWavelet

In-BandDCT

ScalableEntropyCoding



Subjective Quality Evaluation

Single Stimulus Multimedia Test (SSMT)

Absence of unimpaired reference11 grade scale

T e s t s e q u e n c e 1 T e s t s eq u e n c e 2

V o te1

V o te2

1 0 s 5 s 1 0 s 5 s



Subjective Quality Comparison

Harbour - QCIF 15Hz

0 1 2 3 4 5 6 7 8 9 10

192 kbps

128 kbps

96 kbps

MOS

BresciaDanae + ThomsonHHIMSRAUNSW

AVC/H.264-based

Wavelet-based

Quality

Bit Rate



Summary

AVC/H.264-based Technologies

Better Viewing Quality but Worse PSNR

Wavelet-based Schemes

Worse Viewing Quality but Better PSNR

2004, AVC/H.264-based approach formally became starting point


SVC Technology

SVC Technology Overview


SVC Technology Technology Overview

SVC Technology Overview

Temporal Scalability

Motion Compensated Temporal Filtering (MCTF)Hierarchical B Pictures

Spatial Scalability

Pyramid and Stack StructureSeparate MCTF/Motion Descriptions

SNR Scalability

Coarse Granularity Scalability (CGS)Medium Granularity Scalability (MGS)

Adaptive Inter-layer Prediction

No need to decode multiple prediction loopsComparable decoding complexity than single layer



SVC Encoder Block Diagram



SVC Syntax Structure

NAL unitheader

Sliceheader Slice data

Slicetrail

Macroblock layer End ofslice

cabac_alignment

mb_skip_run

mb_skip_flag

mb_field_decoding_flag

Slice data

Sub_mb_pred

Mb_pred

residual_pred_flag

Macroblock layer

base_mode_flag mb_type

pcm sample

coded_block_pattern

mb_qp_delta Residual


SVC Technology Temporal Scalability

Temporal Scalability



Dyadic Temporal Scalability

U

L0 L0 L0 L0 L0 L0 L0 L0 L0

-

H1

-

H1

-

H1

-0.5

0.5

H1

+ +

0.25+ +

L1 L1 L1 L1

P

GOP BoundaryInput Frames

0.25

L0 L0 L0 L0 L0 L0 L0 L0 L0

1

2 2

53 4 43

5 6 6 7 7 8 8

GOP Boundary

MCTF Hierarchical B Prediction



Motion Compensated Temporal Filtering (MCTF)

Wavelet Transform along Motion Trajectory

H

G

2

2

InputHigh Pass Output

Low Pass Output

5/3 Lifting Scheme

Prediction Step, Pk = [�1/2, 1,+1/2]

hk|{z}HighPassOutputs

= S2k+1| {z }OddFrames

� Pk S2k| {z }EvenFrameFiltering

Update Step, Uk = [3/4, 1, 3/4]

lk|{z}LowPassOutputs

= S2k|{z}EvenFrames

+ Uk hk| {z }HighPassFrameFiltering



MCTF Example



Hierarchical B Predictions

Hierarchical B Predictions

Remove Update Step + Close-loop PredictionEnabled by H.264/AVC Syntax

Hierarchical B vs. P (Low Delay)



E�ciency and Mismatch Trade-o�

Closed Loop (Hierarchical B)

Use reconstructed frames for predictionNo Mismatch but Worse Prediction E�ciency

Open Loop (MCTF)

Employ original pictures for predictionMismatch with Better Prediction E�ciency

−

In-LoopFrame Buffer

+

Closed-Loop Encoder

Prediction

Input Video

Reconstruction

Quantization−

Out-LoopFrame Buffer

Open-Loop Encoder

Prediction

Input Video

Quantization

InverseQuantization



Encoder and Decoder Mismatch

Drifting Errors

−


Open-Loop Encoder

Prediction

Input Video

Quantization +

In-LoopFrame Buffer

InverseQuantization

Reconstruction

Closed-Loop Decoder

Output VideoPredictor Mismatch



Loop Control Performance

Closed-loop is better in most cases

Foreman CIF 30 Hz

kbits/s

150 200 250 300 350 400 450

PSNR

-Y

34

35

36

37

38

39

MCTF with Open-LoopMCTF with Closed-LoopTraditional GOP with IBBP

Foreman QCIF 15 Hz

kbits/s

40 50 60 70 80 90 100 110

PSNR

-Y

33

34

35

36

37

38

39

MCTF with Open-LoopMCTF with Closed-LoopTraditional GOP with IBBP

Close > Open > IBBP Close > Open > IBBP


SVC Technology Spatial Scalability

Spatial Scalability


SVC Technology Spatial Scalability

Spatial Scalability

Stack Structure with Adaptive Inter-layer Predictions

Residual prediction performed in Spatial domain

Hiearchical Prediction

−

In-LoopFrame Buffer

+

Q

−

In-LoopFrame Buffer

+

Q

Interpolation

Inter-Layer Residual Prediction


EntropyCoder

Spatial Enh.-Layer

EntropyCoder

Decimation

MotionCompensation

MotionCompensation

{M.V}

MCTF

Spatial Base-Layer IQ

Scaling &Refinement

IQ

−


SVC Technology SNR Scalability

SNR Scalability



SNR Scalability

Coarse Granularity Scalability (CGS)

Distinctive Quality LevelsStack Structure of Spatial Scalability

Medium Granularity Scalability (MGS)

Packet-based Quality Scalable CodingKey Pictures



Coarse Granularity Scalability

Stack Structure (Similar to Spatial Scalability)

Residual prediction performed in Transform domain

Distortion

Bit RateR1 R2 R3

1λ

2λ

3λ

Hiearchical Prediction

−

In-LoopFrame Buffer

+

Q

−

In-LoopFrame Buffer

+

Q

Inter-Layer Prediction


EntropyCoder

EntropyCoder

MotionCompensation

MotionCompensation

{M.V}

MCTF

Quality Base-Layer IQ

Refinement

IQ

−

Quality Enh.-Layer 1



Medium Granularity Scalability

Packet-based Quality Scalable Coding

Distribute Transform Coe�cients among Several Slices

1514109

131183

12742

6510

1514109

131183

12742

6510

Key Picture Concepts

Trade-o� between Drift and Coding E�ciencyResynchronization Points between Encoder and Decoder



Drift Control

BL-only, EL-only, Two-Loop, and Key Pictures

(A) BL-only (B) EL-only

(C) Two-Loop (D) Key Pictures



Viewing Quality Comparison

Two Loops

TwoLoops

Key Pictures

KeyPictures



Performance Comparison of Prediction Structures

BL-only, EL-only, Two-Loop, and Key Pictures



Performance Comparison of SNR Scalability

CGS performance decreases with increasing number of layers


SVC Technology Adaptive Inter-layer Prediction

Adaptive Inter-layer Prediction



Adaptive Inter-Layer Prediction

Motion Prediction

Macroblock Type, Reference Index, Motion Vector

Residual Prediction

Subtraction of Base Layer from Enhancement Layer

Textural Prediction

Inter-layer Intra Prediction

Remarks

No Base Layer ReconstructionApproximately 0-0.5dB Coding Loss



Multi-level Inter-layer Prediction

GOP Boundary

Spatial Enh. Layer 1with MCTF

(Frame Rate = FR)

Base Layer with Hierarchical B

Pictures(Frame Rate = FR/2)

B2B1 B2A

H1H2 H1

H3 H1H2 H1

L3

H1H2 H1

H3 H1H2 H1

L3Spatial Enh. Layer 2with MCTF

(Frame Rate = FR)

A

L3

L3



Inter-layer Motion Prediction

Macroblock Type, Reference Index, Motion Vector

base_mode_flag

co-located 8x8 block

inter-layerintra-prediction

deriveMB partition,mv, ref index

mb_type

motion_prediction_flag

derive mvp,ref index

mvd

mv, refindex

residual signal

1 0

01

Inter-codedIntra-coded



Macroblock Partition Derivation

Partition Derivation across Spatial Resolutions

8x8 8x4

4x44x8

D irec t,16x16, 16x8

8x16Intra

16x16 16x8

8x88x16

16x16 16x16

16x16 16x16

Intra Intra

Intra Intra

Subordinate Layer

Spatial Enhancement Layer



Inter-layer Residual and Textural Prediction

Textural Prediction - Intra-coded Macroblocks

De-blocking Filtering before Spatial InterpolationSpatial Interpolation across Submacroblocks (8x8)

Residual Prediction - Inter-coded Macroblocks

Independent from Motion PredictionNo Spatial Interpolation across Transform Blocks

Textural Prediction Residual Prediction



Gain of Textural Prediction

Simulcast vs. SVC+Textural



Gain of Motion Prediction

SVC+Textural vs. SVC+Textural/Motion



Gain of Residual Prediction

SVC+Textural/Motion vs. SVC+Textural/Motion/Residual


Combined Scalability





CIF, 30 Hz, 115 - 256 kbit/sCIF, 15 Hz, 88 - 222 kbit/s

CIF, 7.5 Hz, 65 - 189 kbit/s

CIF, 3.75 Hz, 55 - 165 kbit/sCIF, 1.875 Hz, 48 - 139kbit/s

QCIF, 15Hz 41 - 80 kbit/s

QCIF, 7.5Hz 32 - 66 kbit/s

I

L3 H0 H1

B3 B2 B1

Spatialupsampling

SNR base 41 kbit/s

B2 PB3 B3 B3

I B3 B2 B1 B2 PB3 B3 B3

Layer 0: QCIF 15Hz

FGS 41 - 80 kbit/s

Layer 1: CIF 30 Hz

FGS 115 - 256kbit/s

SNR base 115kbit/s

FGS refinement

FGS refinement

Prediction

H0 H2 H0 H1 H0 H3 H0 H1 H0 H2 H0 H1 H0 L3

L3 H0 H1 H0 H2 H0 H1 H0 H3 H0 H1 H0 H2 H0 H1 H0 L3


Encoder Optimization




Bottm-up Encoder Control

Current JSVM

Bottom-up Encoding Process

Sequential Mode Selection

Base Layerp�B = arg minfPBg

[DB (PB ) + λBRB (PB )]

Enhancement Layer

p�E = arg minfPE jP�Bg

[DE (PE jP�B ) + λERE (PE jP�B )]

Remarks

Base layer is comparable to H.264/AVCEnhancement layer is much worse than H.264/AVC



Multi-loop Encoder Control

Joint Base and Enhancement Layer Optimization

Base Layer Encoding

p�B = arg minfPE jPB ,PBg

�ω� [DB(PB) + λBRB(PB)]+

(1�ω)� [DE (PE ) + λERE (PE )]

�

Enhancement Layer Encoding (not modi�ed)

p�E = arg minfPE jP�Bg

[DE (PE jP�B ) + λERE (PE jP�B )]

Remarks

Trade-o� between Base and Enhancement Coding E�ciency



Multi-loop Encoder Control

10% bit rate increase relative to H.264/AVC


SVC Transport Interface

SVC Transport Interface


SVC Transport Interface SVC NAL Units

SVC NAL HEADER

TraditionalNAL

Header

(D, T, Q)Information

DiscardableFlag

Priority IDof a NAL

No Inter-LayerPrediction Flag

Use BaseReference

IDR Flag



SVC NAL Grouping

Layer Representation, Dependency Layer, Scalable Layer, ScalableLayer Representation, Access Unit



SVC Inter-layer Dependency Hierarchy

(Layer Type, Dependency id, Quality id)

MGS_3_2

MGS_3_3

MGS_4_1

MGS_4_2

MGS_3_1

BASE_4_0BASE_0_0

CGS_1_0

CGS_2_0

BASE_3_0

CGS_4_0

FGS_4_1

BASE_5_0

FGS_5_1

FGS_4_2

Spatial layer 0 Spatial layer 1 Spatial layer 2

FGS_5_2

BASE_0_0

CGS_1_0

CGS_2_0

BASE_3_0

MGS_3_1

MGS_3_2

BASE_4_0

MGS_4_1

MGS_3_3


MGS_4_2MGS_3_2

MGS_3_3

MGS_4_1

MGS_4_2

MGS_3_1

BASE_4_0BASE_0_0

CGS_1_0

CGS_2_0

BASE_3_0

CGS_4_0

FGS_4_1

BASE_5_0

FGS_5_1

FGS_4_2


FGS_5_2

BASE_0_0

CGS_1_0

CGS_2_0

BASE_3_0

MGS_3_1

MGS_3_2

BASE_4_0

MGS_4_1

MGS_3_3


MGS_4_2


SVC Transport Interface Scalability Information Signaling

Scalability Information SEI

Number of Scalable Layers

Information of Each Scalable Layer

Layer Identi�er + Decoding Dependency Informationlayer id = (Dependency id � 16+Quality id)� 8+ Temporal idBit Rate, Frame Size, Frame RateInitial Parameter SetsRegion-of-interest (ROI) Information

Priority Information


SVC Transport Interface Layer Switching

Layer Switching

Switching between Quality Re�nement Layers

Possible in each access unit

Switching between Dependency Layers

Possible at IDR access units

Down-switching

Possible in virtually any access unitRequire multiple-loop decoding

Up-switching

Wait for next IDR access unit


SVC Transport Interface Layer Switching

R-D Optimized Layer Switching

What to produce and what to use?

QCIF@30Hz or CIF@10Hz?Spatial quality vs. Temporal quality

RDO Without RDO


H.264/AVC vs. SVC

H.264/AVC vs. SVC


H.264/AVC vs. SVC

R-D Performance

CREW_Combine Scalability

34

34.5

35

35.5

36

36.5

37

37.5

0 400 800 1200 1600 2000 2400 2800 3200Bitrate (kbps)

PS

NR

Y (d

B)

SVC_4CIF60SVC_CIF30

SVC_QCIF15

AVC_4CIF60AVC_CIF30

AVC_QCIF15

AVC/H.264@4CIF SVC

@4CIF

AVC/H.264@CIF

SVC@CIF


SVC Complexity Analysis




How Complex is SVC?

Complexity of SVC depends on # of prediction loops

Encoder is slower than real-time by

60x for temporal scalability in CIF resolution146x for spatial scalability27x for CGS, 75x for FGS in CIF resolution1000x for combined scalability

Decoder is slower than real-time by

16x for spatial scalability1.7x for CGS, 5x for FGS in QCIF resolution92x for combined scalability

Decoder is faster than real-time for temporal scalability in QCIF

Ratio of memory access over computation is >200


Concluding Remarks

Concluding Remarks


Concluding Remarks

Concluding Remarks

Scalable Video Coding (SVC)

Spatial, Temporal, SNR, and Combined ScalabilityPower and Format Adaptation with Graceful Degradation

What is Next?

Encoder OptimizationDecoder ImplementationTransport MechanismBit-depth and Color ScalabilityInterlaced Videos


References

References

1 H. C. Huang, W. H. Peng, T. Chiang, and H. M. Hang, \Advances inthe Scalable Amendment of H.264/SVC," IEEE CommunicationsMagazine, vol.45, pp. 68 - 76, 2007.

2 H. Schwarz, D. Marpe, and T. Wiegand, \Overview of the ScalableVideo Coding Extension of the H.264/AVC Standard," IEEETransactions on Circuits and Systems for Video Technology, vol. 17,no. 9, pp. 1103 - 1120, September 2007.

3 T. Wiegand, G. Sllivan, J. Reichel, H. Schwarz, and M.Wien, \JointDraft ITUT Rec. H.264 { ISO/IEC 14496-10/Amd.3 Scalable VideoCoding," ISO/IEC JTCI/SC29/WG11 and ITU-T SG16 Q.6,JVT-X201, July 2007.

4 T. Wiegand, \Scalable Video Coding," ISO/IEC JTCI/SC29/WG11and ITU-T SG16 Q.6, JVT-W132, April 2007.

5 Y.-K. Wang, M. Hannuksela, S. Pateux, A. Eleftheriadis, and S.Wenger, \System and Transport Interface of SVC," IEEETransactions on Circuits and Systems for Video Technology, vol. 17,no. 9, pp. 1149 - 1163, September 2007.Wen-Hsiao Peng, Ph.D (NCTU CS) MAPL May 2008 68 / 68

scalable video coding extension of h.264/avcmapl.nctu.edu.tw/course/vc_2009/files/07.special...

Documents