
Visual Media Coding and Transmission

Ahmet Kondoz

Centre for Communication Systems Research, University of Surrey, UK


This edition first published 2009

© 2009 John Wiley & Sons Ltd.

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

© 1998, © 2001, © 2002, © 2003, © 2004. 3GPP™ TSs and TRs are the property of ARIB, ATIS, CCSA, ETSI, TTA and TTC, who jointly own the copyright in them. They are subject to further modifications and are therefore provided to you ‘as is’ for information purposes only. Further use is strictly prohibited.

Library of Congress Cataloging-in-Publication Data

Kondoz, A. M. (Ahmet M.)
Visual media coding and transmission / Ahmet Kondoz.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-74057-6 (cloth)
1. Multimedia communications. 2. Video compression. 3. Coding theory. 4. Data transmission systems. I. Title.
TK5105.15.K65 2009
621.382'1–dc22
2008047067

A catalogue record for this book is available from the British Library.

ISBN 9780470740576 (H/B)

Set in 10/12pt Times New Roman by Thomson Digital, Noida, India.

Printed in Great Britain by CPI Antony Rowe, Chippenham, England


Contents

VISNET II Researchers xiii

Preface xv

Glossary of Abbreviations xvii

1 Introduction 1

2 Video Coding Principles 7

2.1 Introduction 7

2.2 Redundancy in Video Signals 7

2.3 Fundamentals of Video Compression 8

2.3.1 Video Signal Representation and Picture Structure 8

2.3.2 Removing Spatial Redundancy 9

2.3.3 Removing Temporal Redundancy 14

2.3.4 Basic Video Codec Structure 16

2.4 Advanced Video Compression Techniques 17

2.4.1 Frame Types 17

2.4.2 MC Accuracy 19

2.4.3 MB Mode Selection 20

2.4.4 Integer Transform 21

2.4.5 Intra Prediction 22

2.4.6 Deblocking Filters 22

2.4.7 Multiple Reference Frames and Hierarchical Coding 24

2.4.8 Error-Robust Video Coding 24

2.5 Video Codec Standards 28

2.5.1 Standardization Bodies 28

2.5.2 ITU Standards 29

2.5.3 MPEG Standards 29

2.5.4 H.264/MPEG-4 AVC 31

2.6 Assessment of Video Quality 31

2.6.1 Subjective Performance Evaluation 31

2.6.2 Objective Performance Evaluation 32

2.7 Conclusions 35

References 36


3 Scalable Video Coding 39

3.1 Introduction 39

3.1.1 Applications and Scenarios 40

3.2 Overview of the State of the Art 41

3.2.1 Scalable Coding Techniques 42

3.2.2 Multiple Description Coding 45

3.2.3 Stereoscopic 3D Video Coding 47

3.3 Scalable Video Coding Techniques 48

3.3.1 Scalable Coding for Shape, Texture, and Depth for 3D Video 48

3.3.2 3D Wavelet Coding 68

3.4 Error Robustness for Scalable Video and Image Coding 74

3.4.1 Correlated Frames for Error Robustness 74

3.4.2 Odd–Even Frame Multiple Description Coding for Scalable H.264/AVC 82

3.4.3 Wireless JPEG 2000: JPWL 91

3.4.4 JPWL Simulation Results 94

3.4.5 Towards a Theoretical Approach for Optimal Unequal Error Protection 96

3.5 Conclusions 98

References 99

4 Distributed Video Coding 105

4.1 Introduction 105

4.1.1 The Video Codec Complexity Balance 106

4.2 Distributed Source Coding 109

4.2.1 The Slepian–Wolf Theorem 109

4.2.2 The Wyner–Ziv Theorem 110

4.2.3 DVC Codec Architecture 111

4.2.4 Input Bitstream Preparation – Quantization and Bit Plane Extraction 112

4.2.5 Turbo Encoder 112

4.2.6 Parity Bit Puncturer 114

4.2.7 Side Information 114

4.2.8 Turbo Decoder 115

4.2.9 Reconstruction: Inverse Quantization 116

4.2.10 Key Frame Coding 117

4.3 Stopping Criteria for a Feedback Channel-based Transform Domain Wyner–Ziv Video Codec 118

4.3.1 Proposed Technical Solution 118

4.3.2 Performance Evaluation 120

4.4 Rate-distortion Analysis of Motion-compensated Interpolation at the Decoder in Distributed Video Coding 122

4.4.1 Proposed Technical Solution 122

4.4.2 Performance Evaluation 126

4.5 Nonlinear Quantization Technique for Distributed Video Coding 129

4.5.1 Proposed Technical Solution 129

4.5.2 Performance Evaluation 132


4.6 Symmetric Distributed Coding of Stereo Video Sequences 134

4.6.1 Proposed Technical Solution 134

4.6.2 Performance Evaluation 137

4.7 Studying Error-resilience Performance for a Feedback Channel-based Transform Domain Wyner–Ziv Video Codec 139

4.7.1 Proposed Technical Solution 139

4.7.2 Performance Evaluation 140

4.8 Modeling the DVC Decoder for Error-prone Wireless Channels 144

4.8.1 Proposed Technical Solution 145

4.8.2 Performance Evaluation 149

4.9 Error Concealment Using a DVC Approach for Video Streaming Applications 151

4.9.1 Proposed Technical Solution 152

4.9.2 Performance Evaluation 155

4.10 Conclusions 158

References 159

5 Non-normative Video Coding Tools 161

5.1 Introduction 161

5.2 Overview of the State of the Art 162

5.2.1 Rate Control 162

5.2.2 Error Resilience 164

5.3 Rate Control Architecture for Joint MVS Encoding and Transcoding 165

5.3.1 Problem Definition and Objectives 165

5.3.2 Proposed Technical Solution 166

5.3.3 Performance Evaluation 169

5.3.4 Conclusions 171

5.4 Bit Allocation and Buffer Control for MVS Encoding Rate Control 171

5.4.1 Problem Definition and Objectives 171

5.4.2 Proposed Technical Approach 172

5.4.3 Performance Evaluation 177

5.4.4 Conclusions 179

5.5 Optimal Rate Allocation for H.264/AVC Joint MVS Transcoding 179

5.5.1 Problem Definition and Objectives 179

5.5.2 Proposed Technical Solution 180

5.5.3 Performance Evaluation 181

5.5.4 Conclusions 182

5.6 Spatio-temporal Scene-level Error Concealment for Segmented Video 182

5.6.1 Problem Definition and Objectives 182

5.6.2 Proposed Technical Solution 183

5.6.3 Performance Evaluation 187

5.6.4 Conclusions 188

5.7 An Integrated Error-resilient Object-based Video Coding Architecture 189

5.7.1 Problem Definition and Objectives 189

5.7.2 Proposed Technical Solution 189


5.7.3 Performance Evaluation 195

5.7.4 Conclusions 195

5.8 A Robust FMO Scheme for H.264/AVC Video Transcoding 195

5.8.1 Problem Definition and Objectives 195

5.8.2 Proposed Technical Solution 195

5.8.3 Performance Evaluation 197

5.8.4 Conclusions 198

5.9 Conclusions 199

References 199

6 Transform-based Multi-view Video Coding 203

6.1 Introduction 203

6.2 MVC Encoder Complexity Reduction using a Multi-grid Pyramidal Approach 205

6.2.1 Problem Definition and Objectives 205

6.2.2 Proposed Technical Solution 205

6.2.3 Conclusions and Further Work 208

6.3 Inter-view Prediction using Reconstructed Disparity Information 208

6.3.1 Problem Definition and Objectives 208

6.3.2 Proposed Technical Solution 208

6.3.3 Performance Evaluation 210

6.3.4 Conclusions and Further Work 211

6.4 Multi-view Coding via Virtual View Generation 212

6.4.1 Problem Definition and Objectives 212

6.4.2 Proposed Technical Solution 212

6.4.3 Performance Evaluation 215

6.4.4 Conclusions and Further Work 216

6.5 Low-delay Random View Access in Multi-view Coding Using a Bit Rate-adaptive Downsampling Approach 216

6.5.1 Problem Definition and Objectives 216

6.5.2 Proposed Technical Solution 216

6.5.3 Performance Evaluation 219

6.5.4 Conclusions and Further Work 222

References 222

7 Introduction to Multimedia Communications 225

7.1 Introduction 225

7.2 State of the Art: Wireless Multimedia Communications 228

7.2.1 QoS in Wireless Networks 228

7.2.2 Constraints on Wireless Multimedia Communications 231

7.2.3 Multimedia Compression Technologies 234

7.2.4 Multimedia Transmission Issues in Wireless Networks 235

7.2.5 Resource Management Strategy in Wireless Multimedia Communications 239


7.3 Conclusions 244

References 244

8 Wireless Channel Models 247

8.1 Introduction 247

8.2 GPRS/EGPRS Channel Simulator 247

8.2.1 GSM/EDGE Radio Access Network (GERAN) 247

8.2.2 GPRS Physical Link Layer Model Description 250

8.2.3 EGPRS Physical Link Layer Model Description 252

8.2.4 GPRS Physical Link Layer Simulator 256

8.2.5 EGPRS Physical Link Layer Simulator 261

8.2.6 E/GPRS Radio Interface Data Flow Model 268

8.2.7 Real-time GERAN Emulator 270

8.2.8 Conclusion 271

8.3 UMTS Channel Simulator 272

8.3.1 UMTS Terrestrial Radio Access Network (UTRAN) 272

8.3.2 UMTS Physical Link Layer Model Description 279

8.3.3 Model Verification for Forward Link 290

8.3.4 UMTS Physical Link Layer Simulator 298

8.3.5 Performance Enhancement Techniques 307

8.3.6 UMTS Radio Interface Data Flow Model 309

8.3.7 Real-time UTRAN Emulator 312

8.3.8 Conclusion 313

8.4 WiMAX IEEE 802.16e Modeling 316

8.4.1 Introduction 316

8.4.2 WIMAX System Description 317

8.4.3 Physical Layer Simulation Results and Analysis 323

8.4.4 Error Pattern Files Generation 324

8.5 Conclusions 328

8.6 Appendix: Eb/No and DPCH_Ec/Io Calculation 329

References 330

9 Enhancement Schemes for Multimedia Transmission over Wireless Networks 333

9.1 Introduction 333

9.1.1 3G Real-time Audiovisual Requirements 333

9.1.2 Video Transmission over Mobile Communication Systems 335

9.1.3 Circuit-switched Bearers 339

9.1.4 Packet-switched Bearers 348

9.1.5 Video Communications over GPRS 350

9.1.6 GPRS Traffic Capacity 351

9.1.7 Error Performance 354

9.1.8 Video Communications over EGPRS 357

9.1.9 Traffic Characteristics 357

9.1.10 Error Performance 358

9.1.11 Voice Communication over Mobile Channels 359


9.1.12 Support of Voice over UMTS Networks 360

9.1.13 Error-free Performance 361

9.1.14 Error-prone Performance 362

9.1.15 Support of Voice over GPRS Networks 362

9.1.16 Conclusion 363

9.2 Link-level Quality Adaptation Techniques 365

9.2.1 Performance Modeling 365

9.2.2 Probability Calculation 367

9.2.3 Distortion Modeling 368

9.2.4 Propagation Loss Modeling 368

9.2.5 Energy-optimized UEP Scheme 369

9.2.6 Simulation Setup 370

9.2.7 Performance Analysis 372

9.2.8 Conclusion 373

9.3 Link Adaptation for Video Services 373

9.3.1 Time-varying Channel Model Design 374

9.3.2 Link Adaptation for Real-time Video Communications 379

9.3.3 Link Adaptation for Streaming Video Communications 389

9.3.4 Link Adaptation for UMTS 396

9.3.5 Conclusion 402

9.4 User-centric Radio Resource Management in UTRAN 403

9.4.1 Enhanced Call-admission Control Scheme 403

9.4.2 Implementation of UTRAN System-level Simulator 403

9.4.3 Performance Evaluation of Enhanced CAC Scheme 410

9.5 Conclusions 411

References 413

10 Quality Optimization for Cross-network Media Communications 417

10.1 Introduction 417

10.2 Generic Inter-networked QoS-optimization Infrastructure 418

10.2.1 State of the Art 418

10.2.2 Generic QoS for Heterogeneous Networks 420

10.3 Implementation of a QoS-optimized Inter-networked Emulator 422

10.3.1 Emulation System Physical Link Layer Simulation 426

10.3.2 Emulation System Transmitter/Receiver Unit 428

10.3.3 QoS Mapping Architecture 428

10.3.4 General User Interface 438

10.4 Performances of Video Transmission in Inter-networked Systems 442

10.4.1 Experimental Setup 442

10.4.2 Test for the EDGE System 443

10.4.3 Test for the UMTS System 445

10.4.4 Tests for the EDGE-to-UMTS System 445

10.5 Conclusions 452

References 453


11 Context-based Visual Media Content Adaptation 455

11.1 Introduction 455

11.2 Overview of the State of the Art in Context-aware Content Adaptation 457

11.2.1 Recent Developments in Context-aware Systems 457

11.2.2 Standardization Efforts on Contextual Information for Content Adaptation 467

11.3 Other Standardization Efforts by the IETF and W3C 476

11.4 Summary of Standardization Activities 479

11.4.1 Integrating Digital Rights Management (DRM) with Adaptation 480

11.4.2 Existing DRM Initiatives 480

11.4.3 The New ‘‘Adaptation Authorization’’ Concept 481

11.4.4 Adaptation Decision 482

11.4.5 Context-based Content Adaptation 488

11.5 Generation of Contextual Information and Profiling 492

11.5.1 Types and Representations of Contextual Information 492

11.5.2 Context Providers and Profiling 494

11.5.3 User Privacy 497

11.5.4 Generation of Contextual Information 498

11.6 The Application Scenario for Context-based Adaptation of Governed Media Contents 499

11.6.1 Virtual Classroom Application Scenario 500

11.6.2 Mechanisms using Contextual Information in a Virtual Collaboration Application 502

11.6.3 Ontologies in Context-aware Content Adaptation 503

11.6.4 System Architecture of a Scalable Platform for Context-aware and DRM-enabled Content Adaptation 504

11.6.5 Context Providers 507

11.6.6 Adaptation Decision Engine 510

11.6.7 Adaptation Authorization 514

11.6.8 Adaptation Engines Stack 517

11.6.9 Interfaces between Modules of the Content Adaptation Platform 544

11.7 Conclusions 552

References 553

Index 559


VISNET II Researchers

UniS

Omar Abdul-Hameed

Zaheer Ahmad

Hemantha Kodikara Arachchi

Murat Badem

Janko Calic

Safak Dogan

Erhan Ekmekcioglu

Anil Fernando

Christine Glaser

Banu Gunel

Huseyin Hacihabiboglu

Hezerul Abdul Karim

Ahmet Kondoz

Yingdong Ma

Marta Mrak

Sabih Nasir

Gokce Nur

Surachai Ongkittikul

Kan Ren

Daniel Rodriguez

Amy Tan

Eeriwarawe Thushara

Halil Uzuner

Stephane Villette

Rajitha Weerakkody

Stewart Worrall

Lasith Yasakethu

HHI

Peter Eisert

Jürgen Rurainsky

Anna Hilsmann

Benjamin Prestele

David Schneider

Philipp Fechteler

Ingo Feldmann

Jens Güther

Karsten Grüneberg

Oliver Schreer

Ralf Tanger

EPFL

Touradj Ebrahimi

Frederic Dufaux

Thien Ha-Minh

Michael Ansorge

Shuiming Ye

Yannick Maret

David Marimon

Ulrich Hoffmann

Mourad Ouaret

Francesca De Simone

Carlos Bandeirinha

Peter Vajda

Ashkan Yazdani

Gelareh Mohammadi

Alessandro Tortelli

Luca Bonardi

Davide Forzati

IST

Fernando Pereira

João Ascenso

Catarina Brites

Luis Ducla Soares

Paulo Nunes

Paulo Correia

José Diogo Areia

José Quintas Pedro

Ricardo Martins

UPC-TSC

Pere Joaquim Mindan

José Luis Valenzuela

Toni Rama

Luis Torres

Francesc Tarrés

UPC-AC

Jaime Delgado

Eva Rodríguez

Anna Carreras

Rubén Tous

TRT-UK

Chris Firth

Tim Masterton

Adrian Waller

Darren Price

Rachel Craddock

Marcello Goccia

Ian Mockford

Hamid Asgari

Charlie Attwood

Peter de Waard

Jonathan Dennis

Doug Watson

Val Millington

Andy Vooght

TUB

Thomas Sikora

Zouhair Belkoura

Juan Jose Burred


Michael Droese

Ronald Glasberg

Lutz Goldmann

Shan Jin

Mustafa Karaman

Andreas Krutz

Amjad Samour

TiLab

Giovanni Cordara

Gianluca Francini

Skjalg Lepsoy

Diego Gibellino

UPF

Enric Peig

Víctor Torres

Xavier Perramon

PoliMi

Fabio Antonacci

Calatroni Alberto

Marco Marcon

Matteo Naccari

Davide Onofrio

Giorgio Prandi

Riva Davide

Francesco Santagata

Marco Tagliasacchi

Stefano Tubaro

Giuseppe Valenzise

IPW

Stanisław Badura

Lilla Baginska

Jarosław Baszun

Filip Borowski

Andrzej Buchowicz

Emil Dmoch

Edyta Dabrowska

Grzegorz Galinski

Piotr Garbat

Krystian Ignasiak

Mariusz Jakubowski

Mariusz Leszczynski

Marcin Morgos

Jacek Naruniec

Artur Nowakowski

Adam Ołdak

Grzegorz Pastuszak

Andrzej Pietrasiewicz

Adam Pietrowcew

Sławomir Rymaszewski

Radosław Sikora

Władysław Skarbek

Marek Sutkowski

Michał Tomaszewski

Karol Wnukowicz

INESC Porto

Giorgiana Ciobanu

Filipe Sousa

Jaime Cardoso

Jaime Dias

Jorge Mamede

José Ruela

Luís Corte-Real

Luís Gustavo Martins

Luís Filipe Teixeira

Maria Teresa Andrade

Pedro Carvalho

Ricardo Duarte

Vítor Barbosa


Preface

VISNET II is a European Union Network of Excellence (NoE) in the 6th Framework Programme, which brings together 12 leading European organizations in the field of Networked Audiovisual Media Technologies. The consortium consists of organizations with a proven track record and strong national and international reputations in audiovisual information technologies. VISNET II integrates over 100 researchers who have made significant contributions to this field of technology through standardization activities, international publications, conference and workshop activities, patents, and many other prestigious achievements. The 12 integrated organizations represent 7 European states spanning a major part of Europe, thereby promising efficient dissemination and exploitation of the resulting technological developments to larger communities.

This book contains some of the research output of VISNET II in the area of Advanced Video Coding and Networking. It details video coding principles, which lead to advanced video coding developments in the form of scalable coding, distributed video coding, non-normative video coding tools, and transform-based multi-view coding. Having detailed the latest work in visual media coding, the second part of the book presents the networking aspects of video communication. Various wireless channel models are presented, to form the basis for the following chapters. Both link-level quality of service (QoS) and cross-network transmission of compressed visual data are considered. Finally, context-based visual media content adaptation is discussed with some examples.

It is hoped that this book will be used as a reference not only for some of the advanced video coding techniques, but also for the transmission of video across various wireless systems with well-defined channel models.

Ahmet Kondoz

University of Surrey

VISNET II Coordinator


Glossary of Abbreviations

3GPP 3rd Generation Partnership Project

AA Adaptation Authorizer

ADE Adaptation Decision Engine

ADMITS Adaptation in Distributed Multimedia IT Systems

ADTE Adaptation Decision Taking Engine

AE Adaptation Engine

AES Adaptation Engine Stack

AIR Adaptive Intra Refresh

API Application Programming Interface

AQoS Adaptation Quality of Service

ASC Aspect-Scale-Context

AV Audiovisual

AVC Advanced Video Coding

BLER Block Error Rate

BSD Bitstream Syntax Description

BSDL Bitstream Syntax Description Language

CC Convolutional Coding

CC Creative Commons

CC/PP Composite Capabilities/Preferences Profile

CD Coefficient Dropping

CDN Content Distribution Networks

CIF Common Intermediate Format

CoBrA Context Broker Architecture

CoDAMoS Context-Driven Adaptation of Mobile Services

CoOL Context Ontology Language

CoGITO Context Gatherer, Interpreter and Transformer using Ontologies

CPU Central Processing Unit

CROSLOCIS Creation of Smart Local City Services

CS/H.264/AVC Cropping and Scaling of H.264/AVC Encoded Video

CxP Context Provider


DAML DARPA Agent Markup Language

DANAE Dynamic and distributed Adaptation of scalable multimedia content

in a context-Aware Environment

dB Decibel

DB Database

DCT Discrete Cosine Transform

DI Digital Item

DIA Digital Item Adaptation

DID Digital Item Declaration

DIDL Digital Item Declaration Language

DIP Digital Item Processing

DistriNet Distributed Systems and Computer Networks

DPRL Digital Property Rights Language

DRM Digital Rights Management

DS Description Schemes

EC European Community

EIMS ENTHRONE Integrated Management Supervisor

FA Frame Adaptor

FD Frame Dropping

FMO Flexible Macroblock Ordering

FP Framework Program

gBS Generic Bitstream Syntax

HCI Human–Computer Interface

HDTV High-Definition Television

HP Hewlett Packard

HTML HyperText Markup Language

IEC International Electrotechnical Commission

IETF Internet Engineering Task Force

IBM International Business Machines Corporation

iCAP Internet Content Adaptation Protocol

IPR Intellectual Property Rights

IROI Interactive Region of Interest

ISO International Organization for Standardization

IST Information Society Technologies

ITEC Department of Information Technology, Klagenfurt University

JPEG Joint Photographic Experts Group

JSVM Joint Scalable Video Model

MDS Multimedia Description Schemes

MB Macroblock


MIT Massachusetts Institute of Technology

MOS Mean Opinion Score

MP3 Moving Picture Experts Group Layer-3 Audio (audio file format/extension)

MPEG Moving Picture Experts Group

MVP Motion Vector Predictor

NAL Network Abstraction Layer

NALU Network Abstraction Layer Unit

NoE Network of Excellence

ODRL Open Digital Rights Language

OIL Ontology Interchange Language

OMA Open Mobile Alliance

OSCRA Optimized Source and Channel Rate Allocation

OWL Web Ontology Language

P2P Peer-to-Peer

PDA Personal Digital Assistant

PSNR Peak Signal-to-Noise Ratio

QCIF Quarter Common Intermediate Format

QoS Quality of Service

QP Quantization Parameter

RD Rate Distortion

RDF Resource Description Framework

RDB Reference Data Base

RDD Rights Data Dictionary

RDOPT Rate Distortion Optimization

REL Rights Expression Language

ROI Region of Interest

SECAS Simple Environment for Context-Aware Systems

SNR Signal-to-Noise Ratio

SOAP Simple Object Access Protocol

SOCAM Service-Oriented Context-Aware Middleware

SVC Scalable Video Coding

TM5 Test Model 5

UaProf User Agent Profile

UCD Universal Constraints Descriptor

UED Usage Environment Descriptions

UEP Unequal Error Protection

UF Utility Function


UI User Item

UMA Universal Multimedia Access

UMTS Universal Mobile Telecommunications System

URI Uniform Resource Identifier

UTRAN UMTS Terrestrial Radio Access Network

VCS Virtual Collaboration System

VoD Video on Demand

VOP Video Object Plane

VQM Video Quality Metric

W3C World Wide Web Consortium

WAP Wireless Application Protocol

WCDMA Wideband Code Division Multiple Access

WDP Wireless Datagram Protocol

WLAN Wireless Local Area Network

WML Wireless Markup Language

WiFi Wireless Fidelity (IEEE 802.11b Wireless Networking)

XML eXtensible Markup Language

XrML eXtensible rights Markup Language

XSLT eXtensible Stylesheet Language Transformations


1

Introduction

Networked Audio-Visual Technologies form the basis for the multimedia communication systems that we currently use. The communication systems that must be supported are diverse, ranging from fixed wired to mobile wireless systems. In order to enable an efficient and cost-effective Networked Audio-Visual System, two major technological areas need to be investigated: first, how to process the content for transmission purposes, which involves various media compression processes; and second, how to transport it over the diverse network technologies that are currently in use or will be deployed in the near future. In this book, therefore, visual data compression schemes are presented first, followed by a description of various media transmission aspects, including various channel models, and content and link adaptation techniques.

Raw digital video signals are very large in size, making them very difficult to transmit or store. Video compression techniques are therefore essential enabling technologies for digital multimedia applications. Since 1984, a wide range of digital video codecs have been standardized, each of which represents a step forward either in terms of compression efficiency or in functionality. The MPEG-x and H.26x video coding standards adopt a hybrid coding approach, employing block-matching motion estimation/compensation in addition to the discrete cosine transform (DCT) and quantization. The reasons are: first, a significant proportion of the motion trajectories found in natural video can be approximately described with a rigid translational motion model; second, fewer bits are required to describe simple translational motion; and finally, the implementation is relatively straightforward and amenable to hardware solutions. These hybrid video systems have provided interoperability in heterogeneous network systems. Considering that transmission bandwidth is still a valuable commodity, ongoing developments in video coding seek scalability solutions to achieve a one-coding–multiple-decoding feature. To this end, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has standardized a scalability extension to the existing H.264/AVC codec. H.264-based Scalable Video Coding (SVC) allows partial transmission and decoding of the bitstream, resulting in various options in terms of picture quality and spatial-temporal resolutions.
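The block-matching idea behind this hybrid approach can be illustrated with a toy example. The function below is a minimal sketch, not drawn from this book: an exhaustive full search that, for each block of the current frame, finds the displacement in the reference frame minimizing the sum of absolute differences (SAD). All names and parameters are illustrative; standardized codecs use far more sophisticated fast-search strategies and sub-pixel accuracy.

```python
import numpy as np

def full_search_block_match(ref, cur, block=8, radius=4):
    """Exhaustive block-matching motion estimation (toy sketch).

    For every block of the current frame `cur`, search a +/-radius window
    in the reference frame `ref` for the displacement (dy, dx) that
    minimizes the sum of absolute differences (SAD).
    Returns one integer motion vector per block.
    """
    h, w = cur.shape
    vecs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block].astype(int)
            best_sad, best_v = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate falls outside the frame
                    cand = ref[y:y + block, x:x + block].astype(int)
                    sad = np.abs(target - cand).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_v = sad, (dy, dx)
            vecs[by // block, bx // block] = best_v
    return vecs
```

In a real hybrid encoder the matched reference block serves as the prediction: it is subtracted from the current block, and only the residual is DCT-transformed, quantized, and entropy-coded together with the motion vector.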

In this book, several advanced features and techniques relating to scalable video coding are further described, mostly to do with 3D scalable video coding applications. Applications and scenarios for scalable coding systems, advances in scalable video coding for 3D video applications, and a non-standardized scalable 2D model-based video coding scheme applied to the texture and depth coding of 3D video are all discussed. A scalable multiple description coding (MDC) application for stereoscopic 3D video is detailed. Multi-view coding and distributed video coding concepts, representing the latest advancements in video coding, are also covered in significant depth.

The definition of video coding standards is of the utmost importance because it guarantees that video coding equipment from different manufacturers will be able to interoperate. However, the definition of a standard also represents a significant constraint for manufacturers because it limits what they can do. Therefore, in order to minimize the restrictions imposed on manufacturers, only those tools that are essential for interoperability are typically specified in the standard: the normative tools. The remaining tools, which are not standardized but are also important in video coding systems, are referred to as non-normative tools, and this is where competition and evolution of the technology have been taking place. In fact, this strategy of specifying only the bare minimum that can guarantee interoperability ensures that the latest developments in the area of non-normative tools can be easily incorporated in video codecs without compromising their standard compatibility, even after the standard has been finalized. In addition, this strategy makes it possible for manufacturers to compete against each other and to distinguish their products in the market. A significant amount of research effort is being devoted to the development of non-normative video coding tools, with the target of improving the performance of standard video codecs. In particular, due to their importance, non-normative rate control and error resilience tools are being researched. In this book, therefore, the development of efficient tools for the modules that are non-normative in video coding standards, such as rate control and error concealment, is discussed. For example, multiple video sequence (MVS) joint rate control addresses the development of rate control solutions for encoding video scenes formed from a composition of video objects (VOs), as in the MPEG-4 standard, and can also be applied to the joint encoding and transcoding of multiple video sequences (VSs) to be transmitted over bandwidth-limited channels using the H.264/AVC standard.

The goal of wireless communication is to allow a user to access required services at any time with no regard to location or mobility. Recent developments in wireless communications, multimedia technologies, and microelectronics have created a new paradigm in mobile communications. Third- and fourth-generation (3G/4G) wireless communication technologies provide significantly higher transmission rates and service flexibility over a wide coverage area compared with second-generation (2G) wireless communication systems. High-compression, error-robust multimedia codecs have been designed to enable the support of multimedia applications over error-prone, bandwidth-limited channels. Advances in VLSI and DSP technologies are enabling lightweight, low-cost, portable devices capable of transmitting and viewing multimedia streams. These technological developments have shifted the service requirements of mobile communication from conventional voice telephony to business- and entertainment-oriented multimedia services. In order to successfully meet the challenges set by current and future audiovisual communication requirements, the International Telecommunication Union Radiocommunication (ITU-R) Sector has elaborated a framework for global 3G standards by recognizing a limited number of radio access technologies. These are: Universal Mobile Telecommunications System (UMTS), Enhanced Data rates for GSM Evolution (EDGE), and CDMA2000. UMTS is based on Wideband CDMA technology and is employed in Europe and Asia using the frequency band around 2 GHz. EDGE is based on TDMA technology and uses


the same air interface as the successful 2G mobile system GSM. General Packet Radio Service (GPRS) and High-Speed Circuit Switched Data (HSCSD) were introduced by Phase 2+ of the GSM standardization process. They support enhanced services with data rates up to 144 kbps in the packet-switched and circuit-switched domains, respectively. EDGE, which is the evolution of GPRS and HSCSD, provides 3G services up to 500 kbps within the GSM carrier spacing of 200 kHz. CDMA2000 is based on multi-carrier CDMA technology and provides the upgrade solution for existing IS-95 operators, mainly in North America. EDGE and UMTS are the most widely accepted 3G radio access technologies. They are standardized by the 3rd Generation Partnership Project (3GPP). Even though EDGE and UMTS are based on two different multiple-access technologies, both systems share the same core network. The evolved GSM core network serves as a common GSM/UMTS core network that supports GSM/GPRS/EDGE and UMTS access. In addition, Wireless Local Area Networks (WLANs) are becoming more and more popular for communication in homes, offices, and indoor public areas such as campus environments, airports, hotels, and shopping centres. IEEE 802.11 has a number of physical layer specifications with a common MAC operation. The original IEEE 802.11 standard includes two physical layers – a frequency-hopping spread-spectrum (FHSS) physical layer and a direct-sequence spread-spectrum (DSSS) physical layer – and operates at 2 Mbps. The widely deployed IEEE 802.11b standard provides an additional physical layer based on high-rate direct-sequence spread spectrum (HR/DSSS); it operates in the 2.4 GHz unlicensed band and provides bit rates up to 11 Mbps. The IEEE 802.11a standard, for the 5 GHz band, provides bit rates up to 54 Mbps using a physical layer based on orthogonal frequency division multiplexing (OFDM). The IEEE 802.11g standard has since been issued to achieve similarly high bit rates in the 2.4 GHz band.

The Worldwide Interoperability for Microwave Access (WiMAX) is a telecommunications technology aimed at providing wireless data over long distances in different ways, from point-to-point links to full mobile cellular access. It is based on the IEEE 802.16 standard, which is also called WirelessMAN. The name WiMAX was created by the WiMAX Forum, which was formed in June 2001 to promote conformance and interoperability of the standard. The forum describes WiMAX as "a standards-based technology enabling the delivery of last mile wireless broadband access as an alternative to cable and DSL". Mobile WiMAX (IEEE 802.16e) provides fixed, nomadic, and mobile broadband wireless access with superior throughput performance. It enables non-line-of-sight reception, and can also cope with high mobility of the receiving station. IEEE 802.16e enables nomadic capabilities for laptops and other mobile devices, allowing users to benefit from metro-area portability of an xDSL-like service.

Multimedia services by definition require the transmission of multiple media streams, such as video, still pictures, music, voice, and text data. A combination of these media types provides a number of value-added services, including video telephony, e-commerce services, multi-party video conferencing, virtual office, and 3D video. 3D video, for example, provides more natural and immersive visual information to end users than standard 2D video. In the near future, certain 2D video application scenarios are likely to be replaced by 3D video in order to achieve a more involving and immersive representation of visual information and to provide more natural methods of communication. 3D video transmission, however, requires more resources than conventional video communication applications.

Different media types have different quality-of-service (QoS) requirements and enforce conflicting constraints on the communication networks. Still picture and text data are categorized as background services and require high data rates but have no constraints on


the transmission delay. Voice services, on the other hand, are characterized by low delay. However, they can be coded using fixed low-rate algorithms operating in the 5–24 kbps range. In contrast to voice and data services, low-bit-rate video coding involves rates of tens to hundreds of kbps. Moreover, video applications are delay sensitive and impose tight constraints on system resources. Mobile multimedia applications, consisting of multiple signal types, play an important role in the rapid penetration of future communication services and the success of these communication systems. Even though high transmission rates and service flexibility have made wireless multimedia communication possible over 3G/4G wireless communication systems, many challenges remain to be addressed in order to support efficient communications in multi-user, multi-service environments. In addition to the high initial cost associated with the deployment of 3G systems, the move from telephony and low-bit-rate data services to bandwidth-consuming 3G services implies high system costs, as these services consume a large portion of the available resources. However, for rapid market evolution, these wideband services should not be substantially more expensive than the services offered today. Therefore, efficient utilization of system resources (mainly the bandwidth-limited radio resource) and QoS management are critical in 3G/4G systems.

Efficient resource management and the provision of QoS for multimedia applications are in sharp conflict with one another. Of course, it is possible to provide high-quality multimedia services by using a large amount of radio resources and very strong channel protection. However, this is clearly inefficient in terms of system resource allocation. Moreover, the perceptual multimedia quality received by end users depends on many factors, such as source rate, channel protection, channel quality, error resilience techniques, transmission/processing power, system load, and user interference. It is therefore difficult to obtain an optimal combination of source and network parameters for a given set of source and channel characteristics. The time-varying error characteristics of the radio access channel aggravate the problem. In this book, therefore, various QoS-based resource management systems are detailed. For comparison and validation purposes, a number of wireless channel models are described. The key QoS improvement techniques, including content- and link-adaptation techniques, are covered.

The future media Internet will allow new applications, with support for ubiquitous media-rich content service technologies, to be realized. Virtual collaboration, extended home platforms, augmented, mixed, and virtual realities, gaming, telemedicine, e-learning, and so on, in which users with possibly diverse geographical locations, terminal types, connectivity, usage environments, and preferences access and exchange pervasive yet protected and trusted content, are just a few examples. These multiple forms of diversity require content to be transported and rendered in different forms, which necessitates the use of context-aware content adaptation. This avoids the alternative of predicting, generating, and storing all the different forms required for every item of content. There is therefore a growing need to devise adequate concepts and functionalities for a context-aware content adaptation platform that suits the requirements of such multimedia application scenarios. This platform needs to be able to consume low-level contextual information to infer higher-level contexts, and thus decide the need for and type of adaptation operations to be performed upon the content. In this way, usage constraints can be met while the restrictions imposed by the Digital Rights Management (DRM) governing the use of protected content are satisfied.

In this book, comprehensive discussions are presented on the use of contextual information in adaptation decision operations, with a view to managing DRM and the authorization for adaptation, and consequently outlining the appropriate adaptation decision techniques and


adaptation mechanisms. The main challenges are found by identifying integrated tools and systems that support adaptive, context-aware, and distributed applications which react to the characteristics and conditions of the usage environment and provide transparent access to and delivery of content, where digital rights are adequately managed. The discussions focus on describing a scalable platform for context-aware and DRM-enabled adaptation of multimedia content. The platform has a modular architecture to ensure scalability, and well-defined interfaces based on open standards for interoperability as well as portability. The modules are classified into four categories, namely: (1) the Adaptation Decision Engine (ADE); (2) the Adaptation Authoriser (AA); (3) Context Providers (CxPs); and (4) Adaptation Engine Stacks (AESs), which comprise Adaptation Engines (AEs). During the adaptation decision-taking stage, the platform uses ontologies to enable semantic description of real-world situations. The decision-taking process is triggered by low-level contextual information and driven by rules provided by the ontologies. It supports a variety of adaptations, which can be dynamically configured. The overall objective of this platform is to enable the efficient gathering and use of context information, ultimately in order to build content adaptation applications that maximize user satisfaction.


2

Video Coding Principles

2.1 Introduction

Raw digital video signals are very large in size, making them very difficult to transmit or store. Video compression techniques are therefore essential enabling technologies for digital multimedia applications. Since 1984, a wide range of digital video codecs have been standardized, each of which represents a step forward either in compression efficiency or in functionality. This chapter describes the basic principles behind most standard block-based video codecs currently in use. It begins with a discussion of the types of redundancy present in most video signals (Section 2.2) and proceeds to describe some basic techniques for removing such redundancies (Section 2.3). Section 2.4 investigates enhancements to the basic techniques which have been used in recent video coding standards to provide improvements in video quality; it also discusses the effects of communication channel errors on decoded video quality. Section 2.5 provides a summary of the available video coding standards and describes some of the key differences between them. Section 2.6 gives an overview of how video quality can be assessed, including a description of objective and subjective assessment techniques.

2.2 Redundancy in Video Signals

Compression techniques are generally based upon the removal of redundancy in the original signal. In video signals, the redundancy can be classified as spatial, temporal, or source-coding redundancy. Most standard video codecs attempt to remove these types of redundancy, taking into account certain properties of the human visual system.

Spatial redundancy is present in areas of images or video frames where pixel values vary by small amounts. In the image shown in Figure 2.1, spatial redundancy is present in parts of the background, and in skin areas such as the shoulder.
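As a rough illustration (not a technique from any particular standard, and using a synthetic scan line rather than real image data), the following Python sketch compares the spread of raw pixel values along a smooth row with the spread of neighbour-to-neighbour differences. Where spatial redundancy is present, the differences cluster near zero, which is exactly what predictive coding exploits.

```python
import math

# Spatial redundancy sketch: neighbouring pixels in smooth regions are
# similar, so horizontal differences are small and cheap to encode.

def horizontal_differences(row):
    """First pixel as-is, then each pixel minus its left neighbour."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]

# A smooth synthetic luminance scan line (illustrative values only).
row = [int(100 + 80 * math.sin(i / 10)) for i in range(64)]
deltas = horizontal_differences(row)

print(max(row) - min(row))              # raw values span a wide range
print(max(abs(d) for d in deltas[1:]))  # the deltas stay small
```

Summing the deltas back up reconstructs the original row exactly, so nothing is lost; the gain comes purely from the deltas having a much narrower distribution than the raw samples.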

Temporal redundancy is present in video signals when there is significant similarity between successive video frames. Figure 2.2 shows two successive frames from a video sequence. It is clear that the difference between the two frames is small, indicating that it would be inefficient to simply compress a video signal as a series of independent images.
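The point can be sketched numerically. This is a toy example with synthetic 16×16 frames, not a codec algorithm: it simply shows that the difference between consecutive, similar frames leaves far less signal to represent than each raw frame does on its own.

```python
# Temporal redundancy sketch: the residual between similar consecutive
# frames has far less energy than the raw frame itself.

def sad(a, b):
    """Sum of absolute differences between two equal-size frames."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

W, H = 16, 16
frame1 = [[((x + y) * 7) % 256 for x in range(W)] for y in range(H)]
# frame2: frame1 shifted one pixel with a little noise, standing in for
# the small changes between successive frames of real video.
frame2 = [[frame1[y][(x - 1) % W] + (1 if (x + y) % 5 == 0 else 0)
           for x in range(W)] for y in range(H)]
zero = [[0] * W for _ in range(H)]

print(sad(frame1, zero))    # "intra" cost: energy of the raw frame
print(sad(frame1, frame2))  # "inter" cost: energy of the residual only
```

Real codecs go further and search for the best-matching block in the previous frame (motion estimation), shrinking the residual even more than this fixed one-pixel shift does.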


Source-coding redundancy is present if the symbols produced by the video codec are inefficiently mapped to a binary bitstream. Typically, entropy coding techniques are used to exploit the statistics of the output video data, where some symbols occur with greater probability than others.
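A small sketch of the idea (the symbol counts below are illustrative, chosen here rather than taken from any real codec): when the symbol distribution is skewed, the Shannon entropy of the stream falls well below the cost of a fixed-length code, and that gap is what entropy coders such as Huffman or arithmetic coding recover.

```python
import math
from collections import Counter

# Source-coding redundancy sketch: skewed symbol statistics mean a
# fixed-length code wastes bits; the entropy gives the achievable bound.

def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A skewed stream: zero-valued symbols dominate, as is typical of the
# prediction residuals produced by a video encoder.
stream = [0] * 900 + [1] * 60 + [2] * 30 + [3] * 10

fixed_bits = math.ceil(math.log2(len(set(stream))))
print(fixed_bits)                                 # 2 bits/symbol, fixed
print(round(entropy_bits_per_symbol(stream), 2))  # well under 1 bit
```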

2.3 Fundamentals of Video Compression

This section describes how spatial redundancy and temporal redundancy can be removed from a video signal. It also describes how a typical video codec combines the two techniques to achieve compression.

2.3.1 Video Signal Representation and Picture Structure

Video coding is usually performed with YUV 4:2:0 format video as an input. This format represents video using one luminance plane (Y) and two chrominance planes (Cb and Cr). The luminance plane represents black and white information, while the chrominance planes contain all of the color data. Because luminance data is perceptually more important than the

Figure 2.1 Spatial redundancy is present in areas of an image or video frame where the pixel values are very similar

Figure 2.2 Temporal redundancy occurs when there is a large amount of similarity between video frames
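As a back-of-the-envelope sketch of why raw video is so large (the resolution and frame rate here are illustrative, using the common QCIF format), the YUV 4:2:0 layout costs 1.5 bytes per pixel at 8-bit depth, since each chrominance plane is subsampled by two both horizontally and vertically:

```python
# YUV 4:2:0 sketch: Y at full resolution, Cb and Cr subsampled by two
# in each dimension, i.e. 1.5 bytes per pixel at 8 bits per sample.

def yuv420_frame_bytes(width, height):
    y = width * height                 # full-resolution luminance plane
    cb = (width // 2) * (height // 2)  # quarter-size chrominance plane
    cr = cb
    return y + cb + cr

# QCIF (176x144) at 30 frames per second, uncompressed:
frame_bytes = yuv420_frame_bytes(176, 144)
print(frame_bytes)            # 38016 bytes per frame
print(frame_bytes * 8 * 30)   # about 9.1 Mbit/s before any compression
```

Even at this small resolution the raw rate is orders of magnitude above the tens-to-hundreds-of-kbps budgets typical of mobile video, which is the gap the compression techniques in this chapter have to close.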
