Overview of MPEG-4
Lihang Ying
Department of Computing Science
University of Alberta, Edmonton, Canada
These slides are available online: www.cs.ualberta.ca/~lihang/Share/mpeg4
Outline
MPEG-4 Demos and Overview
  Demos
  Overview
How to Organize MPEG-4 Contents – Scene/Object Description
  Examples Study
Synthetic and Natural Hybrid Coding (SNHC) – Visual Part
  2D Mesh Coding
  3D Mesh Coding
Demos
EnvivioTV: http://www.envivio.com/products/etv/content/technical.jsp
It is a plug-in for RealPlayer, Windows Media Player, or QuickTime
Characteristics (1)
MPEG-4 vs MPEG-1/2
  Not merely video and audio
  Interactive
  Object-based
  Scalability
Characteristics (2)
Why MPEG-4?
Interoperability:
  Run on all kinds of platforms and devices
  Reuse multimedia content: create once, use everywhere
Multi-network Delivery
  Internet/Mobile/Broadcast networks
  Different bandwidths
Scalability
  Different capabilities (e.g. display resolution) of different devices
MPEG-J
API:
  org.iso.mpeg.mpegj
  org.iso.mpeg.mpegj.scene
  org.iso.mpeg.mpegj.resource
  org.iso.mpeg.mpegj.decoder
  org.iso.mpeg.mpegj.net
Implement MPEG-4 Coder/Decoder conveniently with MPEG-J API
Create Coder/Decoder once, run on all kinds of devices and platforms
Profile/Level
Different Implementations:
Profile
  Divides functionality into different subsets
Level
  Constraints on parameters (bitrate, frames/sec, …)
Example: EnvivioTV
  Video: Advanced Simple profile at levels 0-5
  Audio: High Quality profile at levels 1-2
  Graphics: Advanced profile
Interactive
Multi-network Delivery
Coder/Decoder: Using MPEG-J
Scalability: Different Capacity, Profile/Level
Not merely audio/video
Object-based
Interoperability
How to Organize Contents
Scene Descriptor
  Assembles objects into an audiovisual scene
  Scene description format: binary format for MPEG-4 scenes (BIFS)
Object Descriptor
  Describes objects
[Figure: initial object descriptor (ES_Descriptors); scene descriptor stream (scene descriptions with Video and AudioSource nodes, BIFS update / replace scene); object descriptor stream (object descriptors with ES_Descriptors, object descriptor update); visual stream (base layer), visual stream (e.g. temporal enhancement layer), audio stream; the streams are linked by ES_ID]
Scene Description - BIFS
Represented by XMT-A Format:
  Similar to XML
  Express bitstream syntax in document
  Enable easy generation of bitstream parser
BIFS Examples: …
BIFS Example(1)–Trivial Scene(MPEG-2/DVD)
Scene Tree
  Layer2D
    Sound2D
      AudioSource
    Shape
      Bitmap
      Appearance
        MovieTexture
BIFS Example(2)–Movie with Subtitles
BIFS Example(3)–Icons
Icons
BIFS Example(4)–Buttons
Event Response
Object Description
Syntactic Description Language (SDL)
  Express bitstream syntax in document
  Enable easy generation of bitstream parser
SDL Example:…
Object Description - SDL
ObjectDescriptor
class ObjectDescriptor extends ObjectDescriptorBase : bit(8) tag=ObjectDescrTag {
    bit(10) ObjectDescriptorID;
    bit(1) URL_Flag;
    const bit(5) reserved=0b1111.1;
    if (URL_Flag) {
        bit(8) URLlength;
        bit(8) URLstring[URLlength];
    } else {
        ES_Descriptor esDescr[1..255];
        OCI_Descriptor ociDescr[0..255];
        IPMP_DescriptorPointer ipmpDescrPtr[0..255];
    }
    ExtensionDescriptor extDescr[0..255];
}
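As a rough illustration of how the SDL above maps to bits on the wire, the following Python sketch decodes the first fields of an ObjectDescriptor. It assumes the 8-bit tag and the descriptor size field have already been consumed, and that fields are packed most significant bit first; the function name and the sample bytes are hypothetical.

```python
def parse_object_descriptor_header(payload: bytes) -> dict:
    """Decode ObjectDescriptorID (10 bits), URL_Flag (1 bit), reserved (5 bits).

    Assumption: `payload` starts right after the tag and size fields.
    """
    if len(payload) < 2:
        raise ValueError("need at least 2 bytes")
    word = int.from_bytes(payload[:2], "big")
    header = {
        "ObjectDescriptorID": word >> 6,   # top 10 bits
        "URL_Flag": (word >> 5) & 0x1,     # next bit
        "reserved": word & 0x1F,           # low 5 bits, expected 0b11111
    }
    if header["URL_Flag"]:
        # URL branch of the SDL: one length byte, then the URL bytes
        url_length = payload[2]
        header["URLstring"] = payload[3:3 + url_length].decode("utf-8", errors="replace")
    return header


# Hypothetical sample: ID = 5, URL_Flag = 0, reserved = 0b11111
print(parse_object_descriptor_header(bytes([0b00000001, 0b01011111])))
```

The else branch (ES_Descriptor, OCI_Descriptor, and so on) is left out because each of those descriptors has its own tag and size, which a full parser would walk in a loop.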
Object Descriptor Summary
ObjectDescriptor
  ObjectDescriptorID
  URL_Flag
  ES_Descriptor // Elementary Stream
    ES_ID, streamDependenceFlag, URL_Flag, OCRstreamFlag, streamPriority, DecoderConfigDescriptor, SLConfigDescriptor, IPI_DescrPointer, IP_IdentificationDataSet, IPMP_DescriptorPointer, LanguageDescriptor, QoS_Descriptor...
  OCI_Descriptor // Object Content Information
    ContentClassificationDescriptor, KeywordDescriptor, RatingDescriptor, LanguageDescriptor, ShortTextualDescriptor, ExpandedTextualDescriptor, ContentCreatorNameDescriptor, ContentCreationDateDescriptor, OCICreatorNameDescriptor, OCICreationDateDescriptor, SmpteCameraPositionDescriptor, MediaTimeDescriptor, ...
  IPMP_DescriptorPointer // Intellectual Property Management and Protection
Applications of OCI/IPMP–eDonkey’s problems
MPEG-4 Objects and Tools
Audio
  Natural Audio
  Synthetic and Natural Hybrid Coding (SNHC)
Visual
  Natural Video
    Object-based/Scalability
  SNHC
    2D/3D Mesh Object/Face and Body Animation
  Image
  Text
  …
[2D Mesh Coding]
Natural Video Coding
  Block-based texture and motion coding
  Shape information coding
2D Mesh Coding
  Designed for video manipulation
  2D mesh: a 2D planar graph made of triangles
  Natural images and video are mapped onto 2D meshes
  Applications: object tracking, content-based video retrieval (e.g. motion-based queries), 2D animation, augmented reality, …
Example
(a) Original frame
(b) Mesh generated
(c) Text overlaid on video: the text moves along with the fish's mesh
Architecture of 2D Mesh Coding
2D Mesh Object
  Also called 2D Dynamic Mesh
  Supports video coding by moving the vertices of the mesh
  The topology of the mesh does not change within one session
Mesh data includes:
  Connectivity: how vertices are connected
  Geometry: 2D coordinates of vertices
  Motion: temporal difference of vertices' positions
I-MOP and P-MOP
I-MOP: Intra Mesh Object Plane
  For a given session, connectivity and geometry information needs to be transmitted only once
P-MOP: Inter Mesh Object Plane
  The deformation of the given mesh over time is described as the temporal difference of the geometry, i.e. the geometry motion
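The relationship between the two plane types can be sketched in a few lines of Python: the I-MOP supplies vertices and triangles once, and each P-MOP only supplies per-vertex motion vectors that are added to the previous positions. All names and sample values below are hypothetical and only meant to illustrate the idea.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def apply_p_mop(vertices: List[Point], motion: List[Point]) -> List[Point]:
    """Move every vertex by its motion vector; connectivity is untouched."""
    if len(vertices) != len(motion):
        raise ValueError("one motion vector per vertex is required")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(vertices, motion)]


# I-MOP: geometry and connectivity, sent once per session
i_mop_vertices = [(0.0, 0.0), (16.0, 0.0), (0.0, 16.0)]
i_mop_triangles = [(0, 1, 2)]

# P-MOPs: only motion vectors, one list per later frame
p_mops = [
    [(1.0, 0.0), (0.5, 0.0), (0.0, -1.0)],
    [(1.0, 1.0), (0.0, 0.0), (0.0, 0.0)],
]

frames = [i_mop_vertices]
for motion in p_mops:
    frames.append(apply_p_mop(frames[-1], motion))

for index, frame in enumerate(frames):
    print(index, frame, i_mop_triangles)
```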
2D Mesh Decoding Scheme
Mesh Data - Connectivity
Uniform Triangulation:
  Suited for rectangular video objects
  Vertices are located on a regular grid in x and y
  Specified by the length of the grid intervals
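A minimal sketch of why a uniform mesh is cheap to describe: given only the object size and the grid interval lengths, both vertices and triangles can be regenerated. The function name, the integer grid, and the choice of which diagonal splits each cell are assumptions made for this illustration.

```python
def uniform_mesh(width, height, step_x, step_y):
    """Build a regular grid of vertices and split each cell into two triangles."""
    cols = width // step_x + 1
    rows = height // step_y + 1
    vertices = [(c * step_x, r * step_y) for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            # Assumed split: diagonal from top-right to bottom-left
            triangles.append((top_left, bottom_left, top_right))
            triangles.append((top_right, bottom_left, bottom_right))
    return vertices, triangles


mesh_vertices, mesh_triangles = uniform_mesh(32, 16, 16, 16)
print(len(mesh_vertices), "vertices,", len(mesh_triangles), "triangles")
```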
Mesh Data - Connectivity
Delaunay Triangulation:
  Suited for arbitrarily shaped video objects
  Guarantees:
    Close to equilateral: produces the largest minimal angle
    Unique: a unique triangulation for given vertices
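One way to make the Delaunay property concrete is the empty-circumcircle test: a triangulation is Delaunay when no vertex lies strictly inside the circumcircle of any triangle. The Python sketch below checks this with the standard in-circle determinant; the function names and the four sample points are hypothetical, and the check is a brute-force illustration rather than a triangulation algorithm.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    if orientation(a, b, c) < 0:
        b, c = c, b  # the determinant test expects counterclockwise order
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def is_delaunay(vertices, triangles):
    """Brute-force check of the empty-circumcircle property."""
    for i, j, k in triangles:
        for n, p in enumerate(vertices):
            if n in (i, j, k):
                continue
            if in_circumcircle(vertices[i], vertices[j], vertices[k], p):
                return False
    return True


# A thin kite: splitting along the short diagonal avoids sliver triangles
points = [(0, 0), (4, 0), (2, 1), (2, -1)]
print(is_delaunay(points, [(0, 3, 2), (1, 2, 3)]))  # short diagonal
print(is_delaunay(points, [(0, 1, 2), (0, 3, 1)]))  # long diagonal
```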
Coding of Connectivity Data
Uniform Triangulation:
Delaunay Triangulation:
  Differential coding: $x_n = x_{n-1} + dx_n$, $y_n = y_{n-1} + dy_n$
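The differential coding relation can be illustrated with a short Python round trip. This sketch assumes the first vertex is coded relative to the origin (0, 0), which is a simplification chosen here; the function names and sample coordinates are hypothetical.

```python
def diff_encode(points):
    """Replace each vertex by its offset from the previously coded vertex."""
    deltas = []
    prev_x, prev_y = 0, 0  # assumption: first vertex is relative to the origin
    for x, y in points:
        deltas.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return deltas


def diff_decode(deltas):
    """Rebuild positions: x_n = x_{n-1} + dx_n, y_n = y_{n-1} + dy_n."""
    points = []
    x, y = 0, 0
    for dx, dy in deltas:
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


vertices = [(10, 12), (14, 12), (14, 20), (9, 18)]
deltas = diff_encode(vertices)
print(deltas)
print(diff_decode(deltas) == vertices)
```

Neighboring vertices tend to be close together, so the offsets are small numbers that an entropy coder can represent with fewer bits than absolute coordinates.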
Coding Order of Delaunay Triangulation
1) Boundary vertices
  Start from the top-left-most vertex
  Proceed counterclockwise
2) Inside vertices
  Choose the next vertex as the closest one by distance
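The coding order can be sketched as follows. The sketch assumes the boundary vertices are already listed in counterclockwise order, takes "top-left-most" to mean the smallest x + y with ties broken by smaller y (origin at the top-left corner), and measures "closest" from the last vertex coded; these choices and all names are assumptions for illustration.

```python
import math


def coding_order(boundary, interior):
    """Boundary vertices first (from the top-left-most one), then interior ones."""
    start = min(range(len(boundary)),
                key=lambda i: (boundary[i][0] + boundary[i][1], boundary[i][1]))
    # Rotate the counterclockwise boundary list so it begins at the start vertex
    order = boundary[start:] + boundary[:start]
    remaining = list(interior)
    last = order[-1]
    while remaining:
        # Greedy choice: the uncoded interior vertex nearest to the last coded one
        nearest = min(remaining, key=lambda p: math.dist(p, last))
        remaining.remove(nearest)
        order.append(nearest)
        last = nearest
    return order


boundary_pts = [(10, 0), (0, 0), (0, 10), (10, 10)]
interior_pts = [(5, 5), (8, 2), (2, 7)]
print(coding_order(boundary_pts, interior_pts))
```

Because each vertex is close to the one coded before it, this ordering keeps the differential values small.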
Coding of Mesh Motion
Motion: temporal difference of vertices' positions
Mesh Traversal:
  1) Start from top-left, breadth-first
  2) Right (next counterclockwise)
  3) Left
  This order remains unchanged until the next I-MOP is decoded
Mesh Motion Coding
  Encoded based on two previously encoded neighboring vertices, e.g. in triangle abc, the motion of c is predicted from (a, b)
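A small sketch of predictive motion coding for one triangle: the motion vector of the third vertex is predicted from the two vertices already coded, and only the prediction error is transmitted. Using the plain average as the predictor is an assumption of this sketch, as are all function names and sample values; rounding and quantization are ignored.

```python
def predict_motion(mv_a, mv_b):
    """Assumed predictor: average of the two already-coded motion vectors."""
    return ((mv_a[0] + mv_b[0]) / 2, (mv_a[1] + mv_b[1]) / 2)


def encode_residual(mv_c, mv_a, mv_b):
    """Encoder side: transmit only the difference from the prediction."""
    px, py = predict_motion(mv_a, mv_b)
    return (mv_c[0] - px, mv_c[1] - py)


def decode_motion(residual, mv_a, mv_b):
    """Decoder side: add the residual back onto the same prediction."""
    px, py = predict_motion(mv_a, mv_b)
    return (residual[0] + px, residual[1] + py)


mv_a, mv_b, mv_c = (2.0, 1.0), (4.0, 1.0), (3.5, 0.5)
residual = encode_residual(mv_c, mv_a, mv_b)
print(residual)
print(decode_motion(residual, mv_a, mv_b))
```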
[3D Mesh Coding]
2D Mesh Coding:
  Supports mapping natural images and video onto 2D meshes
3D Mesh Coding:
  Represents and compresses 3D objects onto which images and video may be mapped
  Compresses static 3D models, not their animation
Functionalities
High compression
  2%-4% of the size of a VRML ASCII file
Incremental rendering
  Build the model from a partial bitstream
Error resilience
  Suffers less from network errors
Hierarchical buildup
  Scalable bitstream with different resolutions, depending on viewing distance
Incremental Rendering
Data of 3D Mesh Object
Connectivity: how vertices are connected
Geometry: 3D coordinates of vertices
Photometry
  Colors
  Normals
  Texture
Bitstream of 3D Mesh Coding
Connectivity Data
  Vertex graph
  Triangle tree
Triangle Data
  Contains: geometry coordinates, colors, normals, texture coordinates
  Largest part of the bitstream
Bitstream of 3D Mesh Coding
Connectivity Data is packed separately and before the Triangle Data.
Benefits:
  Incremental rendering:
    Triangle Data can be decoded incrementally since the full Connectivity (topology) Data is already available
    Shortens the latency
  Error resilience:
    The 3D structure can be formed even with some missing Triangle Data
Decoding Scheme of 3D Mesh
Vertex Graph
Triangle Tree
Coding of Geometry and Photometry Data
1) Quantization
2) Differential Coding
  No prediction
  Parallelogram prediction
  Tree prediction
3) Adaptive Arithmetic Entropy Coding
  Codes the differential values
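The first two steps can be sketched in Python: coordinates are quantized to integers, then a new vertex is predicted from an adjacent, already decoded triangle by completing a parallelogram, and only the residual goes on to the entropy coder. The bounding range, bit depth, function names, and sample vertices are assumptions for illustration; the arithmetic coder itself is not shown.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)


def parallelogram_predict(a, b, c):
    """Predict the vertex opposite a across the shared edge (b, c)."""
    return tuple(bi + ci - ai for ai, bi, ci in zip(a, b, c))


def residual(actual, predicted):
    """Differential value that would be passed to the entropy coder."""
    return tuple(x - p for x, p in zip(actual, predicted))


print(quantize(0.25, 0.0, 1.0, 10))

# Already-quantized vertices: triangle (a, b, c) is known, d is the new vertex
a, b, c, d = (0, 0, 0), (10, 0, 0), (0, 10, 0), (11, 9, 1)
predicted = parallelogram_predict(a, b, c)
print(predicted, residual(d, predicted))
```

When neighboring triangles are roughly coplanar and similar in size, the residuals stay close to zero, which is what makes the subsequent entropy coding effective.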
3D Mesh Coding Modes
Error-Resilience Mode
  To minimize the impact of errors, divide the mesh into partitions or packets
  Render partitions independently
Progressive Transmission Mode
  Scalable coding
    One base layer
    One or more enhancement layers
  Provides Forest Split operations
    Contains face forest, triangle tree, triangle data
Forest Split Operation
(a) Cut through the edges of vertex tree
(b) Open the dotted line
(c) Triangulate the opening to form a triangle tree
(d) Refined mesh
References
Books:
  Major Reference: Fernando Pereira, Touradj Ebrahimi, The MPEG-4 Book, Prentice Hall PTR, 2002
  Natural Video Coding Technology: Joan L. Mitchell, et al., MPEG Video Compression Standard, Chapman & Hall, 1996
MPEG Official Websites:
  Overview: http://mpeg.telecomitalialab.com/standards.htm
  Resources: http://www.m4if.org/resources.php
Demos:
  http://www.envivio.com/products/etv/content/technical.jsp
  http://www.ivast.com/aboutmpeg4/index.html
MPEG-4 Series Slides, Course Presentation of C640/2003 Winter, U. of Alberta:
  http://www.cs.ualberta.ca/~anup/Courses/604/604_3D.htm
The End
Acknowledgements
  Yongjie Liu
  Michael Closson
Questions and Comments?
DecoderConfigDescriptor
class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {
    bit(8) objectTypeIndication;
    bit(6) streamType;
    bit(1) upStream;
    const bit(1) reserved=1;
    bit(24) bufferSizeDB;
    bit(32) maxBitrate;
    bit(32) avgBitrate;
    DecoderSpecificInfo decSpecificInfo[0..1];
    profileLevelIndicationIndexDescriptor profileLevelIndicationIndexDescr[0..255];
}
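The fixed-length part of DecoderConfigDescriptor adds up to 13 bytes (8 + 6 + 1 + 1 + 24 + 32 + 32 bits), which makes it easy to decode by hand. The Python sketch below assumes the tag and size fields have already been consumed and that fields are packed most significant bit first; the function name and the sample values are hypothetical, and the optional trailing descriptors are not parsed.

```python
import struct


def parse_decoder_config(payload: bytes) -> dict:
    """Decode the fixed 13-byte part of a DecoderConfigDescriptor payload."""
    if len(payload) < 13:
        raise ValueError("need at least 13 bytes")
    object_type, flags = payload[0], payload[1]
    buffer_size = int.from_bytes(payload[2:5], "big")  # 24-bit field
    max_bitrate, avg_bitrate = struct.unpack(">II", payload[5:13])
    return {
        "objectTypeIndication": object_type,
        "streamType": flags >> 2,        # top 6 bits
        "upStream": (flags >> 1) & 0x1,  # next bit
        "reserved": flags & 0x1,         # last bit, expected 1
        "bufferSizeDB": buffer_size,
        "maxBitrate": max_bitrate,
        "avgBitrate": avg_bitrate,
    }


# Hypothetical sample values, packed the same way the parser expects
sample = (bytes([0x20, (4 << 2) | 0x1])
          + (4096).to_bytes(3, "big")
          + struct.pack(">II", 500000, 300000))
print(parse_decoder_config(sample))
```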