The AVC Video Coding Standard
(ITU-T Rec. H.264 | ISO/IEC 14496-10)
Gary J. Sullivan, Ph.D.
ITU-T VCEG Rapporteur / Chair
ISO/IEC MPEG Video Rapporteur / Co-Chair
ITU/ISO/IEC JVT Rapporteur / Co-Chair
Microsoft Corporation Video Architect
October 2007
Shenzhen New Multimedia Technology Workshop, October 2007 Gary J. Sullivan 1
Chronology of International Video Coding Standards
[Figure: timeline with axis ticks from 1990 to 2004]
• ITU-T: H.120 (1984-1988), H.261 (1990+), H.263 (1995-2000+)
• ISO/IEC: MPEG-1 (1993), MPEG-4 Visual (1998-2001+)
• Joint ITU-T and ISO/IEC: H.262 / MPEG-2 (1994/95-1998+), H.264 / MPEG-4 AVC (2003-2007+)
The Scope of Picture and Video Coding Standardization
Only the Syntax and Decoder are standardized:
• Permits optimization beyond the obvious
• Permits complexity reduction for implementability
• Provides no guarantees of Quality
[Figure: processing chain with stages Source, Pre-Processing, Encoding, Decoding, Post-Processing & Error Recovery, Destination, and a box labeled "Scope of Standard"]
The Advanced Video Coding Project
AVC = ITU-T H.264 / MPEG-4 part 10
History: ITU-T Q.6/SG16 (VCEG - Video Coding Experts Group) “H.26L” standardization activity (where the “L” stood for “long-term”)
• Aug 1999: 1st test model (TML-1)
• July 2001: MPEG open call for technology: H.26L demo’ed
• Dec 2001: Formation of the Joint Video Team (JVT) between VCEG and MPEG to finalize H.26L as a new joint project (similar to MPEG-2/H.262)
• July 2002: Final Committee Draft status in MPEG
• Dec 2002: Technical freeze, FCD ballot approved
• May 2003: Completed in both orgs
• July 2004: Fidelity Range Extensions (FRExt) completed
• Jan 2007: Professional Profiles completed
AVC Objectives
Primary technical objectives:
• Significant improvement in coding efficiency
• High loss/error robustness
• “Network Friendliness” (carry it well on MPEG-2 or RTP or H.32x or in MPEG-4 file format or MPEG-4 systems or …)
• Low latency capability (better quality for higher latency)
• Exact match decoding
Initial extension objectives (in FRExt and Prof Profiles):
• Professional applications (more than 8 bits per sample, 4:4:4 color sampling, etc.)
• Higher-quality high-resolution video
• Alpha plane support (a degree of “object” functionality)
• Extended color gamut support
A Comparison of Performance
Test of different standards (ICIP 2002 study)
Using same rate-distortion optimization techniques for all codecs
Streaming test: High-latency (included B frames)
• Four QCIF sequences coded at 10 Hz and 15 Hz (Foreman, Container, News, Tempete)
• Four CIF sequences coded at 15 Hz and 30 Hz (Bus, Flower Garden, Mobile and Calendar, and Tempete)
Real-time conversation test: No B frames
• Four QCIF sequences encoded at 10 Hz and 15 Hz (Akiyo, Foreman, Mother and Daughter, and Silent Voice)
• Four CIF sequences encoded at 15 Hz and 30 Hz (Carphone, Foreman, Paris, and Sean)
Compare four codecs using PSNR measure:
• MPEG-2 (in high-latency/streaming test only)
• H.263 (high-latency profile, conversational high-compression profile, baseline profile)
• MPEG-4 Visual (simple and advanced simple profiles with & without B pictures)
• H.264/AVC (with & without B pictures)
Note: These test results are from a private study and are not an endorsed report of the JVT, VCEG or MPEG organizations.
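The PSNR measure used for the comparison can be sketched as follows. This is a minimal illustration, assuming 8-bit samples (peak value 255) and flat lists of luma samples; the function name and argument layout are illustrative and not taken from the study.

```python
import math

def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded):
        raise ValueError("sequences must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)
```

A Y-PSNR figure applies this to the luma plane only; averaging over the frames of a sequence is a further step not shown here.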
Comparison to MPEG-2, H.263, MPEG-4p2
[Figure: Foreman QCIF 10 Hz. Quality Y-PSNR [dB] (27 to 39) versus Bit-rate [kbit/s] (0 to 250). Legend: MPEG-2 or MPEG-1, H.263 (+), MPEG-4 Visual ASP, MPEG-4 AVC/H.264]
MPEG-4 AVC/H.264 Structure
[Figure: encoder block diagram. Blocks: Input Video Signal; Split into Macroblocks (16x16 luma samples + chroma); Coder Control; Transform/Scal./Quant.; Scaling & Inv. Transform; Deblocking Filter; Intra-frame Prediction; Motion Compensation; Motion Estimation; Intra/Inter switch; Entropy Coding; embedded Decoder. Signals: Control Data, Quant. Transf. coeffs, Motion Data, Output Video Signal]
Motion Compensation Accuracy
Motion vector accuracy 1/4 sample (6-tap filter)
[Figure: macroblock partitioning. MB types: 16x16 (partition 0), 16x8 (0, 1), 8x16 (0, 1), 8x8 (0 to 3). 8x8 types: 8x8 (0), 8x4 (0, 1), 4x8 (0, 1), 4x4 (0 to 3)]
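The quarter-sample accuracy and 6-tap filter noted above can be illustrated with a one-dimensional sketch. The tap values (1, -5, 20, 20, -5, 1) with rounding and a shift by 5 are those of the H.264 luma half-sample filter; the edge handling (repeating border samples) and the helper names are assumptions made for this example, and only the horizontal case is shown.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap half-sample filter, taps sum to 32

def clip8(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))

def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Border samples are repeated when the filter window leaves the row
    (an assumption for this sketch).
    """
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), len(row) - 1)
        acc += tap * row[j]
    return clip8((acc + 16) >> 5)

def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1
```

On a step edge such as `[0, 0, 0, 0, 255, 255, 255, 255]`, `half_sample(row, 3)` lands midway between the two levels, which is the behavior expected of an interpolation filter.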
Multiple Reference Frames
• Multiple Reference Frames
• Generalized B Frames
Intra Prediction
Directional spatial prediction (9 types for 4x4 luma pred, 4 types for 16x16 luma pred, 4 types for 8x8 chroma pred)
e.g., Mode 4: diagonal down/right prediction; a, f, k, p are predicted by (A + 2M + I + 2) >> 2

Neighboring samples (upper case) and predicted 4x4 samples (lower case):
M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: the eight prediction directions, labeled 0, 1, 3, 4, 5, 6, 7, 8]
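The Mode 4 example can be extended to the whole 4x4 block. The sketch below assumes that the same (1, 2, 1) filter with rounding slides along the neighbor samples L, K, J, I, M, A, B, C, D, so that every sample on a given down/right diagonal receives the same value; for the main diagonal (a, f, k, p) this reduces to the formula on the slide. The function name and argument layout are illustrative.

```python
def predict_diagonal_down_right(top, left, corner):
    """4x4 diagonal down/right prediction.

    top    = [A, B, C, D]  samples above the block
    left   = [I, J, K, L]  samples to the left of the block
    corner = M             sample above-left of the block
    Returns four rows of four predicted samples (a..p).
    """
    # Neighbors laid out along one line: L K J I M A B C D
    edge = left[::-1] + [corner] + top
    pred = []
    for y in range(4):
        row = []
        for x in range(4):
            c = 4 + x - y  # index of the neighbor this diagonal points at
            row.append((edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) >> 2)
        pred.append(row)
    return pred
```

With top = [20, 40, 60, 80], left = [30, 50, 70, 90] and corner = 10, the main diagonal evaluates (20 + 2*10 + 30 + 2) >> 2 for all four samples.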
DCT-Like Transform Coding
4x4 Block Integer Transform
Hierarchical transform of DC coeffs for 8x8 chroma and 16x16 Intra luma blocks

\[
H = \begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\]
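A short sketch of how the 4x4 integer matrix acts on a block, assuming the usual separable form Y = H X H^T (rows, then columns). The helper names are illustrative; the scaling that the standard folds into quantization is not modeled here.

```python
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    """Separable 4x4 core transform: H * block * H^T, integer arithmetic only."""
    return matmul(matmul(H, block), transpose(H))
```

The rows of H are mutually orthogonal but not of equal norm: H H^T is diagonal with entries 4, 10, 4, 10, which is why a per-coefficient scaling step is needed somewhere in the chain. A flat block produces a single nonzero (DC) coefficient.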
Deblocking Filter
Entropy Coding
• UVLC/Exp-Golomb +
• CABAC or
• CAVLC
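The Exp-Golomb code named above can be sketched for the unsigned, zero-order case: the value plus one is written in binary, preceded by as many zero bits as there are bits after the leading one. Bit strings are used instead of a real bitstream writer to keep the example short; the function names are illustrative.

```python
def ue_encode(code_num):
    """Unsigned zero-order Exp-Golomb codeword for a non-negative integer."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    bits = bin(code_num + 1)[2:]           # binary of value + 1, no '0b' prefix
    return "0" * (len(bits) - 1) + bits    # zero prefix, then the binary part

def ue_decode(bitstring):
    """Decode one codeword from the start of bitstring.

    Returns (value, number_of_bits_consumed).
    """
    zeros = 0
    while bitstring[zeros] == "0":
        zeros += 1
    length = 2 * zeros + 1
    return int(bitstring[zeros:length], 2) - 1, length
```

Small values get short codewords ("1" for 0, "010" for 1, "011" for 2), so the code suits syntax elements whose small values are the most frequent.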
AVC Version 1 Profiles
Three profiles in version 1: Baseline, Main, and Extended
Baseline (esp. Videoconferencing & Wireless)
• I and P progressive-scan picture coding (not B)
• In-loop deblocking filter
• 1/4-sample motion compensation
• Tree-structured motion segmentation down to 4x4 block size
• VLC-based entropy coding
• Some enhanced error resilience features
  – Flexible macroblock ordering/arbitrary slice ordering
  – Redundant slices
Non-Baseline AVC Version 1 Profiles
Main Profile (esp. Broadcast)
• All Baseline features except enhanced error resilience features
• Interlaced video handling
• Generalized B pictures
• Adaptive weighting for B and P picture prediction
• CABAC (arithmetic entropy coding)
Extended Profile (esp. Streaming)
• All Baseline features
• Interlaced video handling
• Generalized B pictures
• Adaptive weighting for B and P picture prediction
• More error resilience: Data partitioning
• SP/SI switching pictures
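The relationship between the three version 1 profiles can be sketched as plain sets of tools, using only the feature lists on the two profile slides. The tool names are shorthand for the bullets, and the `covers` helper is hypothetical; actual conformance also depends on levels and other constraints that are not modeled here.

```python
ERROR_RESILIENCE = {
    "flexible macroblock ordering / arbitrary slice ordering",
    "redundant slices",
}

BASELINE = {
    "I and P progressive-scan pictures",
    "in-loop deblocking filter",
    "1/4-sample motion compensation",
    "tree-structured motion segmentation down to 4x4",
    "VLC-based entropy coding",
} | ERROR_RESILIENCE

# Tools listed for both Main and Extended on the slide.
SHARED_EXTRAS = {
    "interlaced video handling",
    "generalized B pictures",
    "adaptive weighting for B and P prediction",
}

MAIN = (BASELINE - ERROR_RESILIENCE) | SHARED_EXTRAS | {"CABAC"}
EXTENDED = BASELINE | SHARED_EXTRAS | {"data partitioning", "SP/SI switching pictures"}

def covers(decoder_tools, stream_tools):
    """True if every tool used by the stream is in the decoder's tool set."""
    return stream_tools <= decoder_tools
```

Under this model Extended is a superset of Baseline, while Main is not, because Main leaves out the enhanced error resilience tools.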
Fidelity-Range and Professional Extensions
AVC standard finished May 2003, published as “twin text”
• ITU-T Recommendation H.264
• ISO/IEC 14496-10 MPEG-4 AVC
Fidelity-Range Extensions (FRExt)
• Work item initiated in July 2003
• More than 8 bits, color other than 4:2:0
• Alpha coding
• More coding efficiency capability
• Also new supplemental information
Professional Profiles
• Work item initiated in October 2005
• Focus initially on 4:4:4 (replacing prior FRExt 4:4:4 profile)
• Later work on all-intra and new supplemental information
FRExt Technical Features – Part 1
Larger transforms
• 8x8 transform (as was in older standards)
• Drop 4x8, 8x4, or larger, 16-point…
Filtered intra prediction modes for 8x8 block size
Quantization matrix
• 4x4, 8x8, intra, inter trans. coefficients weighted differently
• Old idea, dating to JPEG and before (circa 1986?)
• Full capabilities not yet explored (visual weighting)
Coding in various color spaces
• 4:2:2, 4:2:0, Monochrome, with/without Alpha
• New integer color transform (a VUI-message item)
FRExt Technical Features – Part 2
• Efficient lossless interframe coding
• Film grain characterization for analysis/synthesis representation
• Stereo-view video support
• Deblocking filter display preference
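8x8 16-Bit (Bossen) Transform

\[
\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12 \\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10 \\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6 \\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}
\]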
8x8 Transform Advantage
(JVT-K028, IBBP coding, prog. scan)

Sequence    % BD bit-rate reduction
Movie 1     11.59
Movie 2     12.71
Movie 3     12.01
Movie 4     11.06
Movie 5     13.46
Crawford    10.93
Riverbed    15.65
Average     12.48
Quantization Matrix
• Similar concept to MPEG-2 design
• Vary step size based on frequency
• Adapted to modified transform structure
• More efficient representation of weights
• Eight downloadable matrices (at least 4:2:0)
  – Intra 4x4 Y, Cb, Cr
  – Intra 8x8 Y
  – Inter 4x4 Y, Cb, Cr
  – Inter 8x8 Y
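The idea of varying the step size with frequency can be sketched as below. This is a simplified illustration, not the standard's integer quantizer: the weights are assumed to be normalized so that 16 means "no change" (a flat matrix), division is done in floating point, and levels are truncated toward zero. The names `FLAT_4X4` and `quantize` are hypothetical.

```python
FLAT_4X4 = [[16] * 4 for _ in range(4)]  # flat matrix: same step everywhere

def quantize(coeffs, base_step, weights=FLAT_4X4):
    """Quantize a 4x4 coefficient block with a per-frequency weight.

    The effective step for position (i, j) is base_step * weights[i][j] / 16,
    so larger weights quantize that frequency more coarsely.
    """
    levels = []
    for i, row in enumerate(coeffs):
        out = []
        for j, c in enumerate(row):
            step = base_step * weights[i][j] / 16
            out.append(int(c / step))  # truncation toward zero
        levels.append(out)
    return levels
```

Doubling a weight from 16 to 32 halves the level produced at that position, which is how a non-flat matrix spends fewer bits on frequencies judged less visible.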
New Profiles Created by FRExt
• 4:2:0, 8-bit: “High” (HP)
• 4:2:0, 10-bit: “High 10” (Hi10)
• 4:2:2, 10-bit: “High 4:2:2” (Hi422)
Effectively the same tools, but acting on different input data
The High Profile has been a major force in recent industry developments (HD DVD, Blu-ray Disc, DBS, Terrestrial Broadcast, IPTV, etc.)
The others are emerging in professional applications (e.g., content acquisition, editing, studios, recording)
A Performance Test for High Profile
(from JVT-L033 - Panasonic)
Subjective tests of FRExt HP by the Blu-ray Disc Founders
• 4:2:0/8 (HP) 1920x1080x24p (1080p), 3 clips
• Nominal 3:1 advantage to MPEG-2
  – 8 Mbps HP scored better than 24 Mbps MPEG-2
• Apparent transparency at 16 Mbps
[Figure 1: Results of subjective test with studio participants (Blu-ray Disc Founders). Vertical axis: Mean Opinion Score, 2 to 5. Bars: H.264/AVC FRExt 8 Mbps, H.264/AVC FRExt 12 Mbps, H.264/AVC FRExt 16 Mbps, H.264/AVC FRExt 20 Mbps, Original, MPEG2 24 Mbps (DVHS emulation). Bar values as extracted, with the assignment to bars not recoverable from the text: 3.65, 3.71, 4.00, 4.03, 3.90, 3.59]
Score scale: 5: Perfect, 4: Good, 3: Fair (OK for DVD), 2: Poor, 1: Very Poor
Some Notes on Quality Testing
• Use recent reference software (if using ref software)
• Use rate-distortion optimization in encoder
• Use large-range good-quality motion search
• Use appropriate “High” profile (incl. adaptive transform)
• If testing for PSNR, use “flat” quant matrices
• Otherwise, use “non-flat” quant matrices
• Use CABAC entropy coding
• Use more than 1 or 2 reference pictures
• Use hierarchical B reference frames coding structure
• Use bi-predictive search optimization (see JVT-N014)
• If testing high-quality PSNR, use adaptive thresholding*
* = See G. Sullivan & S. Sun, “On Dead-Zone…”, VCIP 2005/JVT-N011
AVC Profile Overview
[Figure: nested profile diagram. Profiles: Baseline, Extended, Main, High, High 10, High 4:2:2, High 4:4:4 Predictive. Tool labels: I and P slice; Motion-compensated prediction; In-loop deblocking; Intra prediction; CAVLC; ASO; FMO; Redundant pictures; SI and SP slice; Data partitioning; B slice; Field coding; MBAFF; Weighted prediction; CABAC; 8×8 spatial prediction; 8×8 transform; Monochrome format; Scaling matrices; 8-10b sample bit depth; 4:2:2 chroma format; 4:4:4 chroma format; 8-14b sample bit depth; Predictive lossless]
New Profiles for Professional Apps (2007)
[Figure: profile hierarchy. Profiles shown: High 10 (4:2:0 Predictive 10b), High 4:2:2 (Predictive 10b), High 4:4:4 Predictive (14b), High 10 Intra (4:2:0 10b), High 4:2:2 Intra (10b), High 4:4:4 Intra (14b), CAVLC 4:4:4 Intra (14b). Legend labels: Existing; CABAC+CAVLC; CAVLC only]
Notes:
• Arrows denote capability subset hierarchy.
• Four profiles not shown: Baseline, Extended, Main, High.
New Scalable Video Coding Profiles
(see following talk by Ohm)
[Figure: profile diagram. Profiles shown: Baseline, High, Scalable Baseline, Scalable High, Scalable High Intra. Labels: Spatial scalability (dyadic, 3/2); Spatial scalability (arbitrary up to 2); Coarse-grain scalability (shown twice); Existing]
For Further Information
JVT, MPEG, and VCEG management team members:
• Gary J. Sullivan ([email protected])
• Jens-Rainer Ohm ([email protected])
• Ajay Luthra ([email protected])
• Thomas Wiegand ([email protected])
AVC literature references:
• IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on H.264/AVC (July 2003) (Luthra, Sullivan, Wiegand, Eds.)
• Paper in Proceedings of IEEE January 2005 (Sullivan & Wiegand)
• I. Richardson, H.264 and MPEG-4 Video Compression, 2003
• Overview incl. FRExt: SPIE Aug 2004 (Sullivan, Topiwala, & Luthra)
• Paper at SPIE VCIP 2005: Meta-overview and deployment
• Paper in IEEE Communications Magazine, Aug 2006 (Marpe, Wiegand, Sullivan)
• Paper on professional extensions, IEEE ICIP, Sept 2007 (Sullivan et al.)
• Wikipedia AVC page