INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N4676 Mar. 2002 – Jeju, Korea
Title: MPEG-7 Applications
Group/Subgroup: Requirements
Status: Approved
Editor: Neil Day
Structure of this document
The purpose of this document is to present a list of MPEG-7 applications, demos and projects that are currently available or under development. The list is broken down into three sections, namely:
• Multimedia
• Audio
• Visual
Each section lists representative MPEG-7 applications for the domain in question. The multimedia section represents
applications that combine audio, visual and textual functionalities. In some cases, an application may appear twice, since its developers may have produced functions which address more than one of the domains. Much work still needs to be
done on the consistency and presentation of the information here. Further updates of this document will improve the
presentation format.
Before this list of applications, there is a brief introduction describing the purpose of the MPEG-7 standard and providing an outline of the application domains to which it can be applied.
At the end of the document there is a reference section and contact list for readers seeking more information.
MPEG-7 Applications, Demos and Projects
Disclaimer:
While the utmost attention has been given to providing accurate information on this list of MPEG-7 applications, demos
and projects, please be advised that the author(s) of this document cannot accept responsibility for errors and
inaccuracies. Should the reader or owners of the applications, demos or projects listed herein seek correction or
updates, please notify the authors as soon as possible. The authors’ contact details are provided at the end of this
document. Finally, please note, this document is very much a draft and is expected to undergo many updates and
revisions.
Introduction to MPEG-7
How many times have you seen science fiction movies such as 2001: A Space Odyssey and thought, “Wow, we’re so far away from having any of the fancy gadgets depicted in these movies!” In 2001, HAL, the talking computer, intelligently navigates and retrieves information and runs complex operations instigated by spoken input. Or how about using an image-based query, say an image of the motorbike used by Arnold Schwarzenegger in the movie T2, to find images of similar-looking motorbikes? Dreams or reality?
As more and more audiovisual information becomes available from many sources around the world, many people
would like to use this information for various purposes. This challenging situation led to the need for a solution that
quickly and efficiently searches for and/or filters various types of multimedia material that’s interesting to the user.
For example, finding information by rich-spoken queries, hand-drawn images, and humming improves the user-
friendliness of computer systems and finally addresses what most people have been expecting from computers. For
professionals, a new generation of applications will enable high-quality information search and retrieval. For example,
TV program producers can search with “laser-like precision” for occurrences of famous events or references to certain
people, stored in thousands of hours of audiovisual records, in order to collect material for a program. This will reduce
program production time and increase the quality of its content.
MPEG-7 is a multimedia content description standard, (to be defined by September 2001), that addresses how humans
expect to interact with computer systems, since it develops rich descriptions that reflect those expectations. This
document gives an introductory overview of the MPEG-7 standard. More information about MPEG-7 can be found at
the MPEG-7 website http://drogo.cselt.it/mpeg/ and the MPEG-7 Industry Focus Group website http://www.mpeg-industry.com. These web pages contain links to a wealth of information about MPEG, including many publicly available documents, several lists of ‘Frequently Asked Questions’ and links to other MPEG-7 web pages.
What Are the MPEG Standards?
The Moving Picture Experts Group (MPEG) is a working group of the Geneva-based ISO/IEC standards
organization (International Organization for Standardization / International Electrotechnical Commission,
http://www.itscj.ipsj.or.jp/sc29/) in charge of the development of international standards for compression,
decompression, processing, and coded representation of moving pictures, audio, and a combination of the two. MPEG-7
then is an ISO/IEC standard being developed by MPEG, the committee that also developed the Emmy Award-winning
standards known as MPEG-1 and MPEG-2, and the 1999 MPEG-4 standard.
• MPEG-1: For the storage and retrieval of moving pictures and audio on storage media.
• MPEG-2: For digital television, it’s the timely response for the satellite broadcasting and cable television industries in
their transition from analog to digital formats.
• MPEG-4: Codes content as objects and enables those objects to be manipulated individually or collectively on an
audiovisual scene.
MPEG-1, -2, and -4 make content available. MPEG-7 lets you find the content you need.
Defining MPEG-7
MPEG-7 is a standard for describing features of multimedia content.
Qualifying MPEG-7:
MPEG-7 provides the world’s richest set of audio-visual descriptions
These descriptions are based on catalogue (e.g., title, creator, rights), semantic (e.g., the who, what, when and where information about objects and events) and structural (e.g., the colour histogram, a measurement of the amount of colour associated with an image, or the timbre of a recorded instrument) features of the AV content, and leverage the AV data representations defined by MPEG-1, -2 and -4.
Comprehensive Scope of Data Interoperability
MPEG-7 uses XML Schema as the language of choice for content description.
MPEG-7 will be interoperable with other leading standards such as the SMPTE Metadata Dictionary, Dublin Core, EBU
P/Meta, and TV Anytime.
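As a rough illustration of what such a description looks like in practice, the following sketch builds a tiny MPEG-7-style XML fragment with catalogue, semantic and structural elements. The element names are simplified stand-ins chosen for readability; a real description would use the normative types defined by the MPEG-7 DDL (XML Schema).

# Minimal sketch of an MPEG-7-style content description, serialized as XML.
# Element names are simplified stand-ins for the normative MPEG-7 schema;
# a real description would validate against the MPEG-7 DDL (XML Schema) definitions.
import xml.etree.ElementTree as ET

root = ET.Element("Mpeg7")
desc = ET.SubElement(root, "Description")

# Catalogue-style metadata (title, creator)
creation = ET.SubElement(desc, "CreationInformation")
ET.SubElement(creation, "Title").text = "Evening News, 12 March"
ET.SubElement(creation, "Creator").text = "Example Broadcaster"

# Semantic metadata (who/what/when/where)
semantic = ET.SubElement(desc, "Semantic")
ET.SubElement(semantic, "Event", {"what": "goal", "when": "00:12:40"}).text = "Player X scores"

# Structural, signal-level metadata (e.g. a colour histogram for one video segment)
segment = ET.SubElement(desc, "VideoSegment", {"start": "00:12:35", "duration": "PT15S"})
ET.SubElement(segment, "ColorHistogram").text = " ".join(str(v) for v in [12, 0, 3, 40, 25, 20])

print(ET.tostring(root, encoding="unicode"))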
The Key Role of MPEG-7
MPEG-7, formally named “Multimedia Content Description Interface,” is the standard that describes multimedia
content so users can search, browse, and retrieve that content more efficiently and effectively than they could using
today’s mainly text-based search engines. It’s a standard for describing the features of multimedia content.
However…
MPEG-7 will not standardize the (automatic) extraction of AV descriptions/features. Nor will it specify the search
engine (or any other program) that can make use of the description. It will be left to the creativity and innovation of
search engine companies, for example, to manipulate and massage the MPEG-7-described content into search indices
that can be used by their browser and retrieval tools.
A few application examples are:
• Digital libraries (image catalogue, musical dictionary,…)
• Multimedia directory services (e.g. yellow pages)
• Broadcast media selection (radio channel, TV channel,…)
• Multimedia editing (personalised electronic news service, media authoring)
The potential applications are spread over the following application domains:
• Education,
• Journalism (e.g. searching speeches of a certain politician using his name, his voice or his
face),
• Tourist information,
• Cultural services (history museums, art galleries, etc.),
• Entertainment (e.g. searching a game, karaoke),
• Investigation services (human characteristics recognition, forensics),
• Geographical information systems,
• Remote sensing (cartography, ecology, natural resources management, etc.),
• Surveillance (traffic control, surface transportation, non-destructive testing in hostile
environments, etc.),
• Bio-medical applications,
• Shopping (e.g. searching for clothes that you like),
• Architecture, real estate, and interior design,
• Social (e.g. dating services), and
• Film, video and radio archives.
Brief Overview of List of MPEG-7 Applications, Demos and Projects
The following chart shows the percentage breakdown of audio, visual and multimedia MPEG-7 applications (for the purpose of brevity, ‘MPEG-7 applications, demos and projects’ shall from here on be referred to as ‘MPEG-7 applications’).
To date, a total of 43 applications have been either found or made known to the authors of this document. It
can be assumed that there are in fact many more applications being developed and will be developed over time.
Consequently, this list is expected to grow rapidly. Application developers are encouraged to inform the authors of their
work so as to be included in this list.
Figure 1: Percentage breakdown of current MPEG-7 applications (Audio 12%, Visual 55%, Multimedia 33%).
MULTIMEDIA
1. MPEG-7 Visual Annotation Tool
The MPEG-7 Visual Annotation Tool enables users to interactively create MPEG-7 descriptions using MPEG-7
Description Schemes and Descriptors. The tool takes as input an MPEG-7 Schema definition file and an MPEG-7
package description file. The MPEG-7 Schema defines the structure of the MPEG-7 description components using the
MPEG-7 Description Definition Language (DDL). The Package description organizes the MPEG-7 description
components in order to improve the ease of navigation in the MPEG-7 Visual Annotation Tool. The tool provides
utilities for drag-and-drop copying and re-using of description elements and allows the output of the descriptions in
XML to files. The initial implementation centers on manual entry of description data; however, in future work we
plan to explore the integration of automatic and semi-automatic feature extraction methods with the goal of providing a
complete system for MPEG-7 multimedia content annotation and query building.
Contact: John Smith,
Email: jrsmith@watson.ibm.com
Manager, Pervasive Media Management
IBM T. J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532
(914) 784-7320;
2. Wireless Image Retrieval using a Speech Dialogue Agent
The agent in the client terminal recognizes the user's utterances in English/Japanese, with rather dedicated sentences, and sends a query profile to the server over a wireless transceiver channel (32 kbps). The server will retrieve the requested images and deliver the compressed video bitstream (H.263) to the client. Then the client agent will reply with a synthesized voice and display the images. At present the original format of the metadata is being used, but the MPEG-7 format will be used in the near future for all clients, servers, and channels.
Contact: Mikio Sasaki:
Email: msasaki@rlab.denso.co.jp
Research Laboratories, DENSO CORPORATION
500-1 Minamiyama, Komenoki-cho, Nisshin-shi, Aich-ken,470-0111 Japan
3. Internet Streaming Media Metadata Interchange using MPEG-7
Singingfish.com uses MPEG-7 description schemes to model the Internet streaming media metadata. This presentation
describes our use of MPEG-7 description schemes to define a schema for the XML interchange of Internet streaming
media metadata with several of our commercial content partners.
The goal of such metadata interchange is to populate our search index with the highest quality and most semantically
rich metadata possible, ultimately yielding superior relevance to the end user.
The presentation includes a short demonstration of the fidelity of a transformation from MSNBC's "Partner XML
Format" to an MPEG-7 XML description.
Contact: Eric Rehm:
Email: rehm@singingfish.com
Singingfish.com / Thomson Multimedia, Seattle, WA, USA.
4. The MPEG-7 Experimental Model (XM)
This presentation covers:
1. The Basic structure of the MPEG-7 XM Software
2. A Graphical User Interface for the MPEG-7 XM Software
3. Key Applications for the MPEG-7 XM
a) Search and retrieval
b) Transcoding
4. Combining visual low level descriptors in a search application
Contact: Mr. Stephan Hermann
Email: stephanh@lis.e-technik.tu-muenchen.de
Affiliation: Munich University of Technology,
Institute for Integrated Circuits
Munich, Germany
5. ASSAVID
The usefulness of archived audiovisual material is strongly dependent on the quality of the accompanying annotation.
Currently this is a labour-intensive process, which is therefore limited in the amount of detail that can be stored. In
particular, in real-time applications (such as live broadcast events) it is unrealistic to add much manual annotation. The
proposed information management system will automatically extract descriptive features, using MPEG-7 descriptors
where relevant, and associate these features with a small thesaurus relevant to the subject matter. In this project, the
subject matter will be limited to sports events. The features will be associated with the thesaurus by means of a training
process. In this way the user will be able to make text-based queries on the audiovisual material, using only the
automatically-extracted annotation.
Contact: Rex Dorricott
Organisation: Sony United Kingdom
Email: rd@adv.sonybpe.com
6. FAETHON
The overall objective of the FAETHON project is to develop an integrated information system offering enhanced search
and retrieval capabilities to users of audiovisual (a/v) archives. This novel system will exploit the advances in handling
a/v content and metadata, as introduced by MPEG-4 and MPEG-7, to provide access characterized by semantic
phrasing of the request (query), unified handling and personalized response. This will be achieved by developing
algorithms and software for,
(i) extracting high-level semantics out of syntactic and low-level semantic information contained in the a/v
archives,
(ii) filtering the responses of the latter on the basis of continuously updated profiles of individual users.
To this end, state of the art technologies will be used and new algorithms in the fields of fuzzy and hybrid systems will
be developed. Novel database schemes for multidimensional indexing will be employed.
Contact: Stavropoulou Olga, Prof. Anastasios Delopoulos
International Business Development
The ALTEC Group
Fragokklissias 4, Maroussi, 151 25 Athens, Greece
Tel: +301 61 09 746-7, Fax: +301 61 09 748
Email: ost@sysware.gr, adelo@image.ntua.gr
7. MUMIS
MUMIS intends to investigate and develop technology for the automatic creation of indexes into video material, using content-related data from several sources and languages. The project will investigate a number of issues, i.e. the extraction of formal representations in several languages, from spoken accounts in several languages as well as from image
understanding. Information extracted from these sources must be fused into a multi-tiered data structure, based on an
ontology for the demonstration domain (soccer matches). The data structure will consist of time markers pointing
directly towards events in the programme. The performance of the technology will be proved in the form of Internet-
accessible prototype demonstrators. To that end, a search engine will be designed and implemented which will allow
users to search for specific sets of events and retrieve the corresponding multi-media fragments.
Contact: Prof. Franciska de Jong,
University of Twente,
Centre for Telematics and Information Technology, NL.
Email: fedjong@cs.utwente.nl
8. PRIMAVERA
The project PRIMAVERA aims at building a Content Management System for broadcast applications that allows for an
intuitive, visual-aided annotation and retrieval of media information and provides a personalized view on the archive
content as well as a personalized filtering of incoming information. Advanced analysis and indexing functionality for
video as well as for audio content are the fundamental building blocks for new techniques that support efficient
querying and searching in the media repository, and exploration of the archive content. A central focus of the project is
put on the usability in a production environment, by integration of high-performance indexing and retrieval techniques
that fit well for large-scale broadcast archives into an already existing Content Management System, forming the basis
of the PRIMAVERA system
Contact: KUNKELMANN Thomas
TECMATH AG
Email: kunkelmann@medien.tecmath.de
9. SAMBITS
System for Advanced Multimedia Broadcast and Information Technology Services
SAMBITS will bring MPEG-4 and MPEG-7 technology to the broadcast industry and the related internet services. The
project will be able to provide multimedia services to a terminal that can display any type of general interest integrated
broadcast/internet services with local and remote interactivity. This is a cost effective solution that is of immediate
commercial interest because it uses the Internet and DVB broadcast infrastructure already in place.
SAMBITS will develop a multimedia studio system and demonstrate integrated (Internet and DVB Broadcast) services using a consumer-type terminal demonstrator. The technological basis for the system will be MPEG-2/-4/-7, where contributions will be made to the standards. Standardised systems are recognised to be advantageous for horizontal
markets (e.g. increased competition). SAMBITS will develop methods for service providers to integrate MPEG-2,
MPEG-4 and MPEG-7 data.
SAMBITS: http://www.cordis.lu/ist/projects/99-12605.htm
Contact: Gerhard Stoll,
Organisation : Institut fuer Rundfunktechnik GmbH
Floriansmuehlstrasse 60, 80939 Munchen, Germany
Tel : +49 89 32399347, Fax : +49 89 32399415
Email : stoll@irt.de , URL: http://www.irt.de/
10. SOLO
Project Name: The SOLO Project, University of Sydney
SOLO is intended to be an optimum search engine prototype for the MPEG-7 retrieval domain. SOLO will support multi-step searches across description databases (using MPEG-7 DSs) and content databases (using content-based features). SOLO is built on a meta-search engine (for rudimentary search), mobile code paradigms (for advanced
searches), and computational intelligence (for aiding search strategy composition, database selection, and back-end
filtering). The deployment of mobile code paradigms in the form of search agent technology extends the MPEG-7
enabled MSE to include specific content-based features which are often desirable in an advanced search across content
databases of audio visual archives.
SOLO: http://www.ee.usyd.edu.au/solo/main.html
Contact: Jose Lay,
Signal and Multimedia Processing Lab.
School of Electrical and Information Engineering
Building J-03, University of Sydney
NSW 2006, Australia.
Email: jlay@ee.usyd.edu.au
11. Video Editing & Production
• A tool to generate the instances of the MPEG-7 Structural Annotation DS from the annotation sentences of
shots in Japanese.
• A tool to generate an index for retrieving shots from the instances of the Structural Annotation DS, i.e., well-
formed XML documents. The tool uses an XML parser that has a DOM API.
• A retrieval tool that can match a user's queries written in natural language (Japanese) sentences with the shot
index.
• The basic evaluation of the tools was done using a collection of real data, i.e., 343 shot descriptions in
Japanese.
Contact: Masahiro Shibata,
NHK, Japan.
Email: shibata@strl.nhk.or.jp
12. INTERFACE
Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments
The objective of the project is to define new models and implement advanced tools for audio-video analysis, synthesis
and representation in order to provide essential technologies for the implementation of large-scale virtual and
augmented environments. The work is oriented to make man-machine interaction as natural as possible, based on
everyday human communication by speech, facial expressions and body gestures. Man-to-machine action will be based
on coherent analysis of audio-video channels to perform either low level tasks, or high level interpretation and data
fusion, speech emotion understanding or facial expression classification. Machine-to-man action, on the other hand, will
be based on human-like audio-video feedback simulating a "person in the machine". A common SW platform will be
developed by the project for the creation of Internet-based applications. A case study application will be developed,
demonstrated and evaluated.
The integrated SW platform will be developed progressively through upgraded releases, the first of which will come at the end of
the first year of project activity. Compliance with MPEG-4 and MPEG-7 will be guaranteed by deep project
commitment in the standardisation process. At project conclusion, the InterFace consortium will organise an
International Workshop for public demonstration and dissemination of the achieved results.
Project URL: http://www-dsp.com.dist.unige.it/
Contact Person:
Name: LAVAGETTO, Fabio
Tel: +39-010-3532208
Fax: +39-010-3532154
Email: fabio@dist.unige.it
13. PISTE
In the PISTE pre-production phase, the broadcasters create a schedule (e.g. the event name, location, and the participating athletes' names and CVs), according to which the capturing and creation of visual enhancements will take place. This schedule contains the information necessary to uniquely identify the content to be captured, as well as its proper storage location in the broadcaster's database. The metadata schema covers the information to be stored in a production multimedia repository, but also the part of the data to be transmitted with the multimedia content to the receiver.
PISTE aims at the definition of description metadata for multimedia sport. MPEG-7 descriptors and description schemes will be used to the farthest possible extent. It is expected that specific PISTE descriptors and description schemes will need to be developed using the MPEG-7 DDL. These new sport-specific developments will be fed back to MPEG-7.
The technology (Figure 1) used by PISTE allows additional information associated with "sensitive objects" to be carried. This means that the information is adequately generated and delivered, and its possible access signalled to the user.
Figure 1: Getting information on John Doe
Standardisation activities
14. ART.LIVE
The goal of the ART.LIVE project is to develop an architecture and a set of tools, both generic and application dependent, for the enhancement of narrative spaces thanks to the introduction of a mixed-reality environment. MPEG-7 is used to provide standardised content-based descriptions for the various types of audiovisual information existing in the system.
Foreseen contribution
The project will use descriptors related to images and image-segmented objects to pilot interactive scenarios.
The low-level descriptors, which are domain (scenario) independent, typically consist of the basic MPEG-7 descriptors. These are extracted from the image content (the natural objects provided as an image sequence along with an alpha map, i.e. the VOPs) thanks to various tools.
Ideally, this process should be extended to the synthetic objects as well. Special descriptors can
also be associated with some player behaviours. The project will develop a set of specific dedicated
descriptors.
For all the moving virtual objects, it is proposed to merely characterise them by a bounding box that surrounds any disconnected object(s), be it a single person or a group. Therefore, the following features will be extracted:
o Location of the gravity centre of the bounding box.
o Motion of the gravity centre of the bounding box.
o Single person versus group, based on the aspect ratio of the bounding box.
o Detection of the presence/absence of any object (there is at least one bounding box).
o Number of objects (number of bounding boxes).
In parallel, player interactions (typically mouse clicks) are also detected. All these features must be accompanied by some certainty (confidence) measures in order to enable the interpretation task to take appropriate decisions. Moreover, some time characterisation will be needed in order to check if a specified trigger occurs at a precise instant or out of fixed deadlines.
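The following sketch shows how the bounding-box features listed above could be computed, assuming each frame provides boxes as (x, y, w, h) tuples from an upstream segmentation stage; the aspect-ratio threshold and the returned field names are illustrative only.

# Sketch of the bounding-box features listed above, assuming a per-frame list of
# boxes (x, y, w, h) from some segmentation stage; names are illustrative only.
def box_features(prev_box, box, person_aspect_threshold=0.75):
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0              # location of the gravity centre
    if prev_box is not None:
        px, py, pw, ph = prev_box
        motion = (cx - (px + pw / 2.0), cy - (py + ph / 2.0))  # centre motion
    else:
        motion = (0.0, 0.0)
    aspect = w / float(h)
    single_person = aspect < person_aspect_threshold  # narrow box: single person, wide: group
    return {"centre": (cx, cy), "motion": motion, "single_person": single_person}

frame_boxes = [(40, 60, 30, 90), (200, 50, 120, 80)]
prev_frame_boxes = [(36, 60, 30, 90), (205, 50, 120, 80)]
features = [box_features(p, b) for p, b in zip(prev_frame_boxes, frame_boxes)]
print("number of objects:", len(frame_boxes))
print(features)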
Standardisation activities
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � �; ; ; � � � % � � 3 0 % � ! 0 � � � � � $ / � 0 � � � ! � � % � 2 � �
AUDIO
1. Spoken Content
"At the awareness event we will present the Spoken Content description scheme, along with a basic Web application to
illustrate the concept and its applications."
Canon Research Centre Europe (CRE), [along with our collaborators at IBM (Almaden)] have proposed the MPEG-7
Spoken Content description scheme. Searching and indexing audio-visual data using the speech in the sound track is,
perhaps, one of the most natural metadata retrievals and our metadata format is especially designed to store the
(sometimes erroneous) output of a speech recognition system in a manner most suited to robust retrieval. We are
performing research in a large range of potential applications of such data using textual and/or verbal querying.
Contact: Dr. Wilson Chiu, Phil Garner
Email: wilsonc@cre.canon.co.uk
Email: philg@cre.canon.co.uk
Canon Research Centre Europe Ltd
Tel: +44 1483 448844 Fax: +44 1483 448845
2. CUIDADO
Content-based retrieval of Music and Audio samples
Information overload, inability to quickly browse through audio, poor added-value to music via Internet distribution,
keyword dictatorship, inability to search for similarities among sounds: these are music consumer complaints addressed by IRCAM’s CUIDADO project. It aims at developing content-based technologies using and contributing to the MPEG-7 standard. Building reusable modules for audio feature extraction, music indexing, database management, networking
and constraint based navigation, CUIDADO targets two pilot applications:
1) The Music Browser features musical paths and automatic compilations according to user’s tastes, search for music
similarities, learning systems based on user’s profiles. One version is tied to Web music monitoring and another to Web
music sales and customized radios.
2) The Sound Palette involves musicians and studios for developing an authoring tool both online and in an existing
professional audio environment taking full advantage of the extracted audio features for innovative retrieval, editing and
processing.
CUIDADO is expected to bring Studio Online to a mature stage based on the MPEG-7 standard.
High impact on music providers and labels involved in Web distribution is expected. Assuming that the value of music in itself is currently decreasing, this application should give evidence that new services and interfaces for accessing music and sounds may bring more value than the music itself in the future context of Electronic Music Distribution (EMD). This project should also raise copyright societies' and music labels' awareness of their role in using new content-based tools for music promotion and music protection.
Contact: Vincent Puig (Managing Director),
Email: Vincent.Puig@ircam.fr
IRCAM, 1 place Igor Stravinsky, 75004 Paris.
3. Music Retrieval by Melodic Query
Identifying a musical work from a melodic fragment is a task that most people are able to accomplish with relative ease.
For some time now researchers have worked to give computers this ability as well, which has come to be known as the
"query-by-humming" problem. To accomplish this, it is reasonable to study how humans are able to perform this task,
and to assess what features we use to determine melodic similarity. Research has shown that melodic contour is an
important feature in determining melodic similarity, but it is also clear that rhythmic information is important as well.
The system to be demonstrated uses our proposed MPEG-7 description scheme for melody, which incorporates melodic
contour and rhythmic information as the primary representation for music search and retrieval.
Additional front-end processing (to process queries), a medium-sized database of music, and a search engine (for
finding appropriate matches) have also been implemented to complete the full query-by-humming system.
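A minimal sketch of a contour-plus-rhythm melody representation of the kind described above follows; the three-level contour coding (U/D/S) and the beat-based inter-onset intervals are simplifications chosen for illustration and do not reproduce the proposed description scheme exactly.

# Minimal sketch of a contour-plus-rhythm melody representation of the kind the
# description scheme above is built on; the three-level contour coding (U/D/S)
# and beat quantisation here are simplifications chosen for illustration.
def melody_signature(notes):
    """notes: list of (midi_pitch, onset_time_in_beats)."""
    contour, rhythm = [], []
    for (p0, t0), (p1, t1) in zip(notes, notes[1:]):
        contour.append("U" if p1 > p0 else "D" if p1 < p0 else "S")
        rhythm.append(round(t1 - t0, 2))           # inter-onset interval in beats
    return contour, rhythm

# Opening of "Twinkle Twinkle": C C G G A A G
notes = [(60, 0), (60, 1), (67, 2), (67, 3), (69, 4), (69, 5), (67, 6)]
print(melody_signature(notes))
# (['S', 'U', 'S', 'U', 'S', 'D'], [1, 1, 1, 1, 1, 1])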
Contact: Youngmoo Kim,
Email: moo@media.mit.edu
Machine Listening Group, MIT Media Lab., Boston, USA
http://sound.media.mit.edu/~moo
4. MIDLIB
As a part of the V3D2 digital library initiative, funded by Deutsche Forschungsgemeinschaft (DFG), the MiDiLiB-
project deals with two fundamental open problems in the field of digital music libraries (DMLs). The first part of our
project investigates techniques for automatic indexing and retrieval of score-based audio data (MIDI). More precisely,
we are interested in efficient strategies for processing content-based queries, e.g., queries resulting from given melody
fragments or rhythms. Typically, such kinds of queries are only rough or imprecise approximations of the actual tunes.
Therefore, the retrieval system has to generate a ranked list of approximate matchings. The second part of the MiDiLiB-
project deals with perceptually stable methods for cascaded audio coding, i.e., iterated en-and decoding of (CD-quality)
PCM-data.
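The sketch below illustrates the ranked approximate matching mentioned above for melody queries, using a plain edit distance over contour strings; the matching strategy actually used by the MiDiLiB project may differ.

# Sketch of ranked approximate matching of a hummed/entered query against a
# database of contour strings (see the melody sketch above); edit distance is
# one simple choice, not necessarily the matching used by the MiDiLiB project.
def edit_distance(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

database = {"Twinkle Twinkle": "SUSUSD", "Ode to Joy": "SUSDDSD"}
query = "SUSUS"  # rough, possibly imprecise user query
ranking = sorted(database, key=lambda title: edit_distance(query, database[title]))
print(ranking)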
A small part of MIDILIB is the development of a "music contents markup language" (MCML), which will be used for displaying query results and for browsing in pieces of music. MCML is an XML-based language, and the first applications will use this XML version. In addition, we are defining an "MPEG-7 draft" version of MCML to make it easy to adapt our applications to MPEG-7 when it becomes a standard.
MIDLIB: http://leon.cs.uni-bonn.de/forschungprojekte/midilib/english/
Contact: Frank Kurth, Jochen
frank@leon.cs.bonn.edu :: MIDILIB and MCML
schimmel@cs.bonn.edu :: MCML
5. Natural Language
The following tool was selected to be an MPEG-7 Core Experiment.
Title: Linguistic Access to Multimedia Contents
Short Description:
Authoring, retrieval, summarization, and presentation of multimedia contents semantically structured by linguistic data.
Function (in one sentence):
Benefit for Applications:
Authors and users are allowed to use natural language for structuring and accessing multimedia contents, which is
expected to be the most natural way of dealing with them.
Potential Users: General Public
Related DSs/Ds (Description Schemes and Descriptors, i.e. tools):
Linguistic DS, Structured Annotation DS, Segment DS.
Linguistic data/annotation and video data are aligned through data/data links.
Demos/Applications available:
Of Tool: available.
Of Application in Industry Domain: available.
Contact: Katashi Nagao, HASIDA Koiti:
Email: KNAGAO@jp.ibm.com
IBM Tokyo Research Laboratory
Email: hasida@etl.go.jp
Director of Information Science Division,
Electrotechnical Laboratory, (ETL), Ibaraki, Japan.
VISUAL
1. Search Engine Tool
Visualization of MPEG-7 Similarity Retrieval of 2D and 3D Data
At the upcoming awareness meeting, an application will be presented that allows visualization of similarity-based
retrieval results. This so-called Search Engine was applied for Core Experiments of visual descriptors. A graphical user
interface is used for a number of functionalities, e.g.
- Browsing of image databases
- Visualization of 3D data and image sequences
- Similarity Search for a number of visual descriptors
The SearchEngine is a Java-based application that incorporates the underlying functionality of C- or C++-based extraction and similarity matching algorithms. For sequence playback, an MPEG player is included. A small 3D viewer based on Java3D technology was also added. For comparable results within the MPEG-7 Core Experiments for visual descriptors, a
console application, called MPEG-7 XM was used among the participants. This XM-Software is also integrated into the
SearchEngine. Certain basic image features are analyzed for similarity-based retrieval by this GUI:
- Texture
- Color
- Contour/Shape
- 3D geometry, by analyzing a number of 2D projections of the 3D object
- Different motion in sequences (e.g. background motion from left to right)
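The sketch below shows one way such a similarity search can combine several visual descriptors: per-descriptor distances are normalised and merged with user-chosen weights. The weights and the L1 distance are illustrative assumptions, not the matching functions of the XM reference software.

# Sketch of the kind of combined similarity search the GUI exposes: per-descriptor
# distances are normalised and merged with user-chosen weights; the weights and
# distance functions are illustrative, not those of the XM reference software.
def combined_distance(query, item, weights):
    total = 0.0
    for name, w in weights.items():
        q, x = query[name], item[name]
        # simple L1 distance per descriptor (texture, colour, shape, ... vectors)
        total += w * sum(abs(a - b) for a, b in zip(q, x)) / len(q)
    return total

query = {"color": [0.2, 0.5, 0.3], "texture": [0.1, 0.9]}
db = [
    {"id": "img01", "color": [0.2, 0.4, 0.4], "texture": [0.2, 0.8]},
    {"id": "img02", "color": [0.9, 0.1, 0.0], "texture": [0.7, 0.3]},
]
weights = {"color": 0.6, "texture": 0.4}
results = sorted(db, key=lambda item: combined_distance(query, item, weights))
print([item["id"] for item in results])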
Contact: Karsten Müller,
Email: kmueller@hhi.de
Heinrich Hertz Institute
Einsteinufer 37, 10587 Berlin, Germany
Tel: +49 30 31002 225, Fax: +49 30 392 7200
2. Video Editing
To highlight the basic elements of the Video Editing DS, two applications have been developed to edit and browse the
description of the video temporal structure specified in the MPEG-7 format. This temporal structure describes various
types of temporal units: shots, rushes and composition segments. The way these units are edited is also described in
terms of transition or composition effects.
The browser offers some navigation functionalities to quickly access specific parts of a video document regarding the
way it has been built.
The editor allows the completion of a partial description of a video structure that could have been provided by a video-
to-shot segmentation algorithm.
Contact: Rosa Ruiloba, Philippe Joly
Email: rosa.ruiloba@lip6.fr, Philippe.Joly@lip6.fr
Indexation Multimedia
Laboratoire d'informatique de Paris 6 - LIP-6/UPMC
Bureau C1219 tel : (33).(0)1.44.27.88.48
8, rue du Capitaine Scott 75015 Paris
3. MPEG-7 Video Browser and Highlight Generation Tool
Background
As in the case of abstracts describing papers in the classical sense, a video summary is an ‘audiovisual’ abstract of a
video program, which allows for quick understanding of the underlying story of the program. We can capture the whole
story by glancing over the summary. The structure of the summary description is hierarchical so that coarse-to-fine
navigation is possible in order to access more detailed information (contents). Furthermore the MPEG-7 summary
structure allows for an event-based summary with which customized browsing and filtering is possible on the summary.
3.1. Video Summary Generator
A video summary generator creates video summaries of highlights automatically and/or semi-automatically, using low
level audiovisual features and high level semantics, assisted by content analysis and highlight detection rules,
respectively. It outputs description data that contain a set of highlights, composed of video summaries, that are derived
from the MPEG-7 Summarization DS (Description Scheme). The generated short video highlight summaries can be
used with an electronic program guide (EPG) or with a video-browsing tool in personal storage devices. The Video
Summary Generator also generates a CC (closed-caption) text DB, which consists of keywords extracted from CC text,
using text analysis and time codes to indicate ‘keyword-synchronized’ video locations obtained by speech recognition in
the audio track, in order to support text-based retrieval of news video clips.
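A small sketch of the closed-caption keyword index described above follows: keywords extracted from CC text are mapped to the time codes of the corresponding video locations so that a text query can jump straight to candidate clips. The data layout and stopword list are assumptions for illustration.

# Sketch of the closed-caption keyword index described above: keywords from the
# CC text are mapped to the time codes of the video locations where they occur,
# so a text query can jump straight to the matching clip; data is illustrative.
from collections import defaultdict

def build_cc_index(cc_entries, stopwords={"the", "a", "in", "of"}):
    """cc_entries: list of (time_code_seconds, caption_text)."""
    index = defaultdict(list)
    for t, text in cc_entries:
        for word in text.lower().split():
            if word not in stopwords:
                index[word].append(t)
    return index

cc = [(12.0, "Fire in the city centre"), (95.5, "The mayor visits the fire station")]
index = build_cc_index(cc)
print(index["fire"])   # [12.0, 95.5] -> candidate entry points for a "fire" query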
3.2. MPEG-7 Video Browser
The generated summary description data is fed into an MPEG-7 video browser. The MPEG-7 video browser allows for a quick overview utilizing audiovisual highlights with different time durations, efficient browsing through non-linear navigation (based on multi-level hierarchical highlights and associated key-frames), and a ‘highlights-view’ and browser based on particular events. It also provides CC-text-based retrieval of news video clips. The video browser can be used as a video-browsing/-retrieval tool in personal storage devices in digital broadcasting and internet environments.
Contact: Munchurl Kim
Email: mckim@etri.re.kr
Participants: Munchurl Kim, Hyun Sung Chang
Affiliation: Electronics and Telecommunications Research Institute
Country: Korea
4. Video-over-IP (VIP)
Full streaming over the internet of both content and MPEG-7 metadata.
The Video-over-IP project (VIP) is an integration project carried out in the Netherlands. Various partners are involved,
like the Telematica Instituut, NOB, SurfNet, IBM, and TNO. In general, the purpose of the VIP project is to allow for
the production, storage, management, retrieval, and exploration of video content for a specific set of users. Moreover,
these services should be interoperable on the Internet. The following general activities should be possible:
• The production of digitised video material (media objects), ready for distribution over the Internet
• The production of content (video material plus metadata), including the management of this production process
• Digitising video and other material in various formats
• Extending the video material with additional descriptions (metadata) for disclosure, either (semi-) automatically, or
manually. In order to search in the content, parts of the video should be properly described.
• Indexing and retrieval of content
• End users should be able to search in the content
• Search, retrieval, and browsing facilities, including a user interface
• Security against improper use (encryption and watermarking)
• Distribution of high-quality video to the end user over the IP network
• The realisation of a network architecture needed for offering these services with a high quality of service
• Charging the end users on the basis of the delivered content and services (content-based billing & accounting).
Contact: Erik Oltmans:
Email: oltmans@telin.nl
5. Image Search using Edge/Contours
Short Description: The edge histogram descriptor represents the local edge distribution over a 4x4 grid of sub-images. Five types of edges, namely four directional edges and one non-directional edge, are defined for each sub-image, so there are a total of 16x5 = 80 histogram bins.
Function (in one sentence): Image to image matching, especially for natural images with non-uniform edge
distribution.
Benefit for Applications: Since the descriptor is based on the edge information in the image, it is good for natural
image matching. Since edges play an important role for image perception, it can retrieve images with similar semantic
meaning.
Potential Users:
- Image search (retrieval) by example or by sketch
- Scene change detection
- Key frame clustering
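For illustration, the sketch below computes an 80-bin edge histogram along the lines described above: the image is divided into a 4x4 grid of sub-images and each 2x2 block is classified into one of the five edge types. The filter coefficients and the threshold follow the commonly published formulation and may differ in detail from the normative MPEG-7 extraction.

# Sketch of the 80-bin edge histogram described above: the image is split into a
# 4x4 grid and, per sub-image, small blocks are classified into one of five edge
# types; the 2x2 filter responses below follow the usual textbook formulation and
# may differ in detail from the normative MPEG-7 extraction.
EDGE_TYPES = ["vertical", "horizontal", "diag45", "diag135", "nondirectional"]

def classify_block(b):
    """b: 2x2 block of grey values [[a, b], [c, d]] -> index into EDGE_TYPES or None."""
    (p0, p1), (p2, p3) = b
    strengths = [abs(p0 + p2 - p1 - p3),            # vertical
                 abs(p0 + p1 - p2 - p3),            # horizontal
                 abs(1.414 * (p0 - p3)),            # 45-degree diagonal
                 abs(1.414 * (p1 - p2)),            # 135-degree diagonal
                 abs(2 * (p0 - p1 - p2 + p3))]      # non-directional
    best = max(range(5), key=lambda k: strengths[k])
    return best if strengths[best] > 11 else None   # threshold: treat flat blocks as no edge

def edge_histogram(image):
    """image: HxW list of grey values, H and W divisible by 8 -> 16*5 = 80 bins."""
    h, w = len(image), len(image[0])
    bins = [0] * 80
    for by in range(0, h, 2):
        for bx in range(0, w, 2):
            sub = 4 * (by * 4 // h) + (bx * 4 // w)   # which of the 4x4 sub-images
            e = classify_block([row[bx:bx + 2] for row in image[by:by + 2]])
            if e is not None:
                bins[sub * 5 + e] += 1
    return bins

img = [[(x + y) % 32 for x in range(16)] for y in range(16)]
print(sum(edge_histogram(img)))   # total number of edge blocks found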
Contact: Soo-Jun Park
Email: psj@etri.re.kr
Senior Member of Engineering Staff
ETRI-CSTL
161 Kajong-dong, Yoosung, Taejon, 305-350, Korea
URL: http://sir.etri.re.kr/~soop
(phone) +82-42-860-6899, (fax) +82-42-860-4889
6. Video Annotation and Summaries
6.1. Video annotation editor
The system can automatically generate video transcripts using speech recognition and make a correspondence between
video scenes and words. The system can also detect scene change boundaries. The user of this system can modify
automatically-generated transcripts and scene boundaries. The user can also annotate some keywords and comments on
objects in video frames. The system generates XML-formatted annotation data that contains all information created
through user interaction.
6.2. Video player with summarization function
The system can generate summaries of video clips with annotation data and play them. The user can input any keyword
that will contribute to customization of video summaries. The player can also show transcript text synchronized with
video, like closed captions. The user can also select any scene from the scene index window.
Contact: Katashi Nagao, HASIDA Koiti:
Email: KNAGAO@jp.ibm.com
IBM Tokyo Research Laboratory
Email: hasida@etl.go.jp
Director of Information Science Division,
Electrotechnical Laboratory, (ETL), Ibaraki, Japan.
7. MPEG-7 Video Object Segmentation and Retrieval
We will present a video object segmentation system, AMOS, and a video retrieval and visualization application.
Currently, fully automatic segmentation of semantic objects is only successful in constrained visual domains. The
AMOS system takes on a powerful approach in which automatic segmentation is integrated with user input to track
semantic objects in video sequences. For general video sources, the system allows users to define an approximate object
boundary by using a tracing interface. Given the approximate object boundary, the system automatically refines the
boundary and tracks the movement of the object in subsequent frames of the video. The system is robust enough to
handle many real world situations that are hard to model in existing approaches, including complex objects, fast and
intermittent motion, complicated backgrounds, multiple moving objects, and partial occlusion. For each video
sequence, the description generated by this system is a set of semantic objects with the associated regions and visual
features that can be manually annotated with text. Text annotations can also be assigned to the video sequence.
The video retrieval and visualization application developed during a Core Experiment within MPEG-7 uses the
descriptions generated by AMOS to retrieve and visualize videos based on the annotations and visual features. This
application supports (1) query by example based on any combination of visual features and text annotations (e.g.,
retrieve video sequences with similar objects based on color and texture); (2) query by keyword based on text
annotations (e.g., retrieve video sequences with “elephant”); and (3) advanced visualization of the retrieved results
based on panoramic views and segmented objects.
Contact: Ana Belen Benitez
Email: ana@ee.columbia.edu
Electrical Engineering Department
Columbia University, 1312 Mudd, #F6, 500 W. 120th St, MC 4712, New York, NY 10027
Voice: +1 212 854-7473 Fax: +1 212 932-9421
URL: http://www.ee.columbia.edu/~ana/
8. Hierarchical Summary Browser
Category: Application of the Summary DS.
Features: Summary Theme based Audio-Visual Summary Selection
Presentation Time based Audio-Visual Summary Selection
Abstract: Hierarchical Summary Browser is based on the Summary DS which is in the category of navigation
and access. The functionality of the proposed hierarchical summary browser includes dynamic audio-
visual summary generation following the user’s selection of the summary theme and summary length
in time. By allowing users to select preferred summary length, the hierarchical level of the provided
summary can be automatically selected so that the length of the summary is closest to the user’s
request. By allowing users to select the preferred theme of the summary, audio-visual summaries of various lengths with the selected theme can be dynamically generated, so that the user can select the length. The combined selection of theme and length is also available. Such a hierarchical
summary browser can also be used in accordance with the user preference, so that the preferred theme
and the length can be automatically selected based on the user preference.
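The length-driven selection described above can be sketched as follows, assuming the hierarchy is available as a list of pre-built summary levels with known durations; the data layout is an assumption for illustration.

# Sketch of the length-driven selection described above: given a hierarchy of
# pre-built summaries (coarse to fine), pick the level whose duration is closest
# to the duration the user asked for; the data layout is an assumption.
def select_summary(levels, requested_seconds):
    """levels: list of dicts like {"theme": ..., "duration": seconds, "segments": [...]}."""
    return min(levels, key=lambda lv: abs(lv["duration"] - requested_seconds))

levels = [
    {"theme": "goals", "duration": 30, "segments": ["s1"]},
    {"theme": "goals", "duration": 120, "segments": ["s1", "s4", "s7"]},
    {"theme": "goals", "duration": 300, "segments": ["s1", "s2", "s4", "s5", "s7"]},
]
print(select_summary(levels, requested_seconds=100)["duration"])   # -> 120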
9. Table of Contents (ToC) Browser
Category: Application of the Segment DS and Graph DS.
Features: ToC based Audio-Visual Content Navigation
Abstract/Detail* relation based Navigation
Cause/Effect** relation based Navigation
Abstract: The ToC browser is based on the Segment DS and the Graph DS. The ToC browser interface provides a tree-structured view of the selected content so that a user can select a segment of interest in the
content. Each segment is represented by a representative key frame, and the selected segment is
summarized by a list of key frames. Based on the abstract/detail and cause/effect relationships defined
using the Graph DS, a user can select segments in the abstract/detail/cause/effect relation. The
abstract/detail relation provides two segments, one of which is an abstract version of the other, the latter being a detailed version of the former segment. The cause/effect relation provides two events, one of which causes the other, the latter being the result of the former event.
*Abstract/detail are proposed normative types of relations
**The effect relation is equivalent to the result relation which is a proposed normative relation type and the cause
relation can be considered as the inverse relation of the result
10. SmartEye
Image Retrieval System with Relevance-Feedback based Image Characterization
Category: Application of the MatchingHint DS.
Features: Image retrieval using multiple descriptors with different weights. Automatic learning MatchingHints
by user’s feedback
Abstract: Generally, relevance feedback has been utilized only to refine the query conditions in image retrieval. However, in our application, the usage of relevance feedback is extended to image database categorization so as to accommodate user-independent image retrieval. In our approach, to guarantee user-satisfactory performance, the descriptors and the elements of the descriptors corresponding to features of each image are weighted using the relevance feedback. We use the MatchingHint DS for weighting descriptors and elements of each descriptor based on color and texture descriptors. In addition, our system uses an appropriate learning method based on a reliability scheme that prevents wrong learning from wrong feedback.
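The sketch below illustrates relevance-feedback weighting in the spirit of the MatchingHint usage described above: descriptors that separate relevant from non-relevant results well receive larger weights. The update rule and constants are illustrative assumptions, not the learning method actually used in SmartEye.

# Sketch of relevance-feedback weighting in the spirit of the MatchingHint usage
# described above: descriptors that separate relevant from non-relevant images
# well get larger weights; the update rule is a simple illustration, not LG's
# actual learning method.
def update_weights(weights, distances, relevant):
    """distances[name][i]: distance of result i from the query for descriptor `name`;
    relevant[i]: True if the user marked result i as relevant."""
    new = {}
    for name, w in weights.items():
        rel = [d for d, r in zip(distances[name], relevant) if r]
        irr = [d for d, r in zip(distances[name], relevant) if not r]
        # reward descriptors whose relevant results are closer than non-relevant ones
        gap = (sum(irr) / len(irr) - sum(rel) / len(rel)) if rel and irr else 0.0
        new[name] = max(0.05, w + 0.1 * gap)
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}

weights = {"color": 0.5, "texture": 0.5}
distances = {"color": [0.1, 0.8, 0.2], "texture": [0.5, 0.4, 0.6]}
relevant = [True, False, True]
print(update_weights(weights, distances, relevant))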
Applications 8, 9, 10. Contacts: Heon Jun Kim, Ph.D.,
Email: hjk@lge.co.kr
Senior MTS
Also: Kyoungro Yoon, Jin-Soo Lee, Jung-Min Song
MI Group, Information Technology Lab.
LG Corporate Institute of Technology, 16 Woomyeon-Dong, Seocho-Gu, Seoul, Korea 137-724
TEL: +82 526 4132, FAX: +82 526 4852
11. MPEG-7 Camera
In collaboration with the EPFL, FASTCOM Technology S.A. has developed, around its smart camera product, an MPEG-7 "standard" communication layer that makes the content of its video output searchable. The goal of this project is
to develop an MPEG-7 camera. Such a camera is able to interpret the scene at hand and to extract the relevant
information. This information is no longer dependent on the audio-visual data from which it has been extracted. This
modality independent (or pure) information can for instance be displayed on the receiver side in any given modality, be
it textual, audio or visual. The MPEG-7 camera project builds the full chain of extracting the pure information in the
scene, transmitting it, and displaying it according to the recipient’s preference. The extraction of the information is
performed using an intelligent camera, which analyses the visual data in real time.
The analysis is defined through the software application running inside the intelligent camera. The extracted
information is then packaged in an MPEG-7 compliant bitstream and transmitted to the recipient in XML. The latter
uses an XML-based interface to display the information. The display may be done either in a textual, an audio or a
visual mode.
Contact: Nicolas Pican
Email: pican@fastcom-technology.com
FASTCOM Technology S.A.
Contacts: Touradj Ebrahimi , Fabrice Moscheni
Email: Touradj.Ebrahimi@epfl.ch, , moscheni@fastcom-technology.com
Signal Processing Laboratory
Swiss Federal Institute of Technology EPFL, CH-1015 Lausanne, Switzerland
Direct Phone: +41 21 693 2606, Mobile Phone: +41 79 331 9993, Voice Mail: +1 (661) 420 5952
Direct Fax: +1 (603) 994 8508 or +1 (661) 420 5946, Office Fax: +41 21 693 7600
12. ALIVE
Architecture and authoring tools for prototype for Living Images and new Video Experiments.
The goal of the project is to develop an architecture and a set of tools, both generic and application dependent, for the
enhancement of narrative spaces. This will be achieved by testing the two aspects of this goal: telepresence (inclusion of real objects into virtual worlds) and augmented reality (inclusion of artificial objects into real worlds). ALIVE will bring together, towards these very specific goals, image processing engineers, AI computer scientists and multimedia authors.
The ALIVE trials will broadcast recent innovations in the field of intelligent distributed video processing among the
authors and artists community. Indeed, the world of comics authors shows a growing interest in the World Wide Web
which is a new medium able to broadcast their production in an easier way and to offer new narrative spaces. A particularly interesting, new and innovative prospect lies in the interaction between real-life and virtual worlds (telepresence and augmented reality). The very precise goal of ALIVE is to implement this interaction and to demonstrate it in specific experiments.
ALIVE: http://www.cordis.lu/ist/projects/99-10942.htm
Contact: Benoît Macq
Université catholique de Louvain
Place du Levant, 2 , 1348 Louvain-la-Neuve, Belgique
Tel : (32-10)47 22 71
Fax : (32-10)47 20 89
Email : macq@tele.ucl.ac.be
13. Image/Video Retrieval by Color
The following tool was selected to be an MPEG-7 Core Experiment.
Title: Color Layout
Short Description:
Compact description of the spatial distribution of color. It enables high-speed and high-performance image/video retrieval.
Function (in one sentence):
Image-to-image, video-segment-to-video-segment and sketch-to-image matching
Benefit for Applications:
This descriptor provides very high-speed image retrieval functionality with low cost
Potential Users:
Users who need an image-based retrieval system, especially those working on consumer products
Related DSs/Ds (Description Schemes and Descriptors, i.e. tools):
How are they used together? How do they differ?
Other color-related descriptors (color histogram, dominant color, color structure histogram) represent the distribution of color, so only the Color Layout descriptor is available when we need local color characteristics, which are important for a sketch-based query system.
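As an illustration of the kind of compact layout description characterised above, the sketch below reduces an image to an 8x8 grid of representative colours and keeps a few low-frequency DCT coefficients per channel; the plain averaging, the scan order and the number of kept coefficients are simplifications and may differ from the normative extraction.

# Sketch of a colour-layout style descriptor as characterised above: the image is
# reduced to an 8x8 grid of representative colours and a DCT keeps only a few
# low-frequency coefficients per channel; a plain average and a naive DCT are
# used here for clarity and may differ from the normative extraction.
import math

def dct_1d(v):
    n = len(v)
    return [sum(v[x] * math.cos(math.pi * (x + 0.5) * u / n) for x in range(n)) for u in range(n)]

def color_layout(grid, keep=6):
    """grid: 8x8 list of (Y, Cb, Cr) averages -> a few low-frequency coefficients per channel."""
    desc = {}
    for c, name in enumerate(("Y", "Cb", "Cr")):
        rows = [dct_1d([px[c] for px in row]) for row in grid]             # DCT along rows
        cols = [dct_1d([rows[y][x] for y in range(8)]) for x in range(8)]  # then along columns
        scan = [cols[x][y] for y in range(8) for x in range(8)]            # coarse low-to-high scan
        desc[name] = [round(v, 1) for v in scan[:keep]]
    return desc

grid = [[(128 + x * 4, 120, 130 + y) for x in range(8)] for y in range(8)]
print(color_layout(grid))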
Demos/Applications Available:
Of Tool:
Image and video retrieval systems have been implemented.
Of Application in Industry Domain: It might be possible.
Contact: Akio Yamada,
NEC Japan,
Email: a-yamada@da.jp.nec.com
14. ISTORAMA
In order to allow efficient search of the visual information available all over the Web, a highly efficient automated
system is needed that regularly traverses the Web, detects visual information and processes it in such a way as to allow for
effective search and retrieval.
In the ISTORAMA project, an intelligent image content-based search engine for the World Wide Web will be
developed. This system will offer a new form of media representation and access of content available in WWW.
Information Web Crawlers continuously traverse the Internet and collect images that are subsequently indexed based on
integrated feature vectors. These features along with additional information such as the URL location and the date of
index procedure are stored in a database in an MPEG-7 compliant format. The user can access and search this indexed
content through the Web with an advanced and user friendly interface. The output of the system is a set of links to the
content available in the WWW, ranked according to their similarity to the image submitted by the user.
ISTORAMA: http://uranus.ee.auth.gr/Istorama
Contact: Yiannis Kompatsiaris
Greek Secretariat for Research and Technology (GSRT)
Intrasoft S.A.
Informatics and Telematics Institute (ITI)
Lab: http://uranus.ee.auth.gr
Email: ikom@dion.ee.auth.gr
15. Movie Tool MovieTool is a description tool for video. MovieTool can read and write a content description file in the format
defined in the MPEG-7 MDS Working Draft 3.0. Using this tool, we can construct a structure for the content; this step
is performed manually or automatically, and some automatic segmentation methods based on content analysis are
provided. We then select a key frame, extract content-based features, and a syntactic structure is generated. After that,
we can enter values for the attributes of each segment and of the whole content.
MovieTool provides many attributes that are defined in the MPEG-7 description schemes.
MovieTool is a component of a Multimedia Content Retrieval System that we are developing. In this system, we
register a content description in a database through an MPEG-7 file. Through a simple Web interface, we supply
attribute values as search conditions, and the scenes or contents we want are listed. We can select some of them and
play them in a specified order.
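To illustrate the kind of structure MovieTool produces, the sketch below builds a list of video segments with key frames and attribute values and serializes it to a simplified XML description. The element names are invented for illustration; they are not the MPEG-7 MDS Working Draft 3.0 syntax that MovieTool actually reads and writes.

```python
# Illustrative sketch of a segment structure with key frames and attributes,
# serialized to a simplified (non-MPEG-7) XML description.
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Segment:
    start_frame: int
    end_frame: int
    key_frame: int
    attributes: dict = field(default_factory=dict)   # e.g. {"Keyword": "interview"}

def to_xml(title: str, segments: list[Segment]) -> str:
    root = ET.Element("VideoDescription", {"title": title})
    for seg in segments:
        s = ET.SubElement(root, "Segment",
                          {"start": str(seg.start_frame), "end": str(seg.end_frame)})
        ET.SubElement(s, "KeyFrame", {"frame": str(seg.key_frame)})
        for name, value in seg.attributes.items():
            ET.SubElement(s, "Attribute", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")

# Example: two manually created segments for a short clip
print(to_xml("Demo clip", [Segment(0, 120, 40, {"Keyword": "opening"}),
                           Segment(121, 300, 200, {"Keyword": "interview"})]))
```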
Contact: Takahiro Kunieda, Yuki Wakita
Ricoh Co. Ltd. Tokyo, Japan,
Tel: +81-3-3815-7261, Fax: +81-3-3818-0348
Email: kunieda@src.ricoh.co.jp, wakita@src.ricoh.co.jp
16. RETRIEVE The RETRIEVE project will demonstrate that advances in multimedia technology can be applied to CCTV schemes to
improve the quality of life in the UK, by reducing crime or increasing detection rates through the location of appropriate
digital video evidence. The project aims to maximise the automatic generation of metadata associated with images so
that it can be applied in real time as the surveillance video is captured, compressed and transmitted. The effectiveness of
CCTV systems will be enhanced if there are efficient search and retrieval mechanisms for the large digital archives.
The RETRIEVE project involves research and development in the converging worlds of high-speed digital
communications (ATM) and multimedia applications; the video images captured by the surveillance cameras will be
compressed and transmitted electronically to digital archives together with metadata which will facilitate searches of the
stored image databases. The main innovation of this project is the definition and generation of metadata relating to the
wavelet compressed video stream and the real-time automated analysis and tagging of compressed video fields. The
project will also investigate and refine digital image authentication mechanisms such as watermarking, audit trails and
digital signatures for surveillance applications. This work should lead to significant advances for the security industry
which is increasingly aware that in order to upgrade and increase the performance of expanding CCTV systems it has to
adopt digital technology.
Contact: Kate Grant Email: Nine_Tiles@psilink.co.uk
17. Sign Language Indexation Here, we address the issue of sign language indexation/recognition. The existing tools, such as on-line Web
dictionaries or other education-oriented applications, make exclusive use of textual annotations. However, keyword
indexing schemes have strong limitations due to the ambiguity of natural language and to the huge effort needed to
manually annotate a large amount of data. In order to overcome these drawbacks, we tackle the sign language
indexation issue within the MPEG-7 framework and propose an approach based on linguistic properties and
characteristics of sign language. The "shape" of a sign is characterized by performing advanced image processing
procedures. The method developed introduces the concept of a hand configuration that is stable over time, instantiated
on natural or synthetic prototypes. The prototypes are indexed by means of a shape descriptor defined as a translation-,
rotation- and scale-invariant Hough transform. We also show how meaningful descriptors such as finger configuration,
hand position and palm orientation, together with arm and hand motion, can be combined into a complete sign language
description scheme. A demonstration will be presented by applying the proposed approach to data sets consisting of
“Letters” and “Words”, respectively.
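The project's shape descriptor is a translation-, rotation- and scale-invariant Hough transform. As a simpler stand-in that only illustrates what those three invariances mean, the sketch below computes classical Fourier descriptors of a hand contour, which achieve the same invariances by a different route; it is not the method used in this work, and the contour input is an assumption for illustration.

```python
# Illustrative Fourier-descriptor sketch of a translation/rotation/scale-invariant
# shape description (a stand-in, NOT the Hough-transform descriptor of this project).
import numpy as np

def invariant_shape_descriptor(contour_xy: np.ndarray, n_coeffs: int = 16) -> np.ndarray:
    """contour_xy: N x 2 ordered boundary points of the hand silhouette."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex contour representation
    z = z - z.mean()                               # translation invariance
    spectrum = np.fft.fft(z)
    mags = np.abs(spectrum)                        # dropping phase gives rotation and
                                                   # start-point invariance
    mags = mags / (mags[1] + 1e-12)                # scale invariance: normalize by the
                                                   # first harmonic (entry 1 becomes 1)
    return mags[1:1 + n_coeffs]
```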
18. TV ANYTIME The ISO/MPEG group has identified a wide range of application scenarios for its emerging MPEG-7 standard on
audio-visual metadata. TV Anytime, with its vision of future digital TV services, encompasses a large number of them.
As TV Anytime has also identified metadata as one of the key requirements to realize that vision, MPEG-7 is the
natural candidate to fill that role. Here, we describe technically how metadata for the TV Anytime scenario can be
created using MPEG-7.
Digital broadcasting offers the opportunity to provide value-added interactive services that allow end users to
personalize and control the material of interest, evolving TV into an integrated entertainment/information
gateway. The MPEG-7 collection of descriptors and description schemes for multimedia is able to fulfill the metadata
requirements for TV Anytime.
Contact: Silvia Pfeiffer, Uma Srinivasan
CSIRO Mathematical and Information Sciences
Locked Bag 17, North Ryde NSW 1670, Australia
Phone: +61 2 9325 3144, Fax: +61 2 9325 3200
{ Silvia.Pfeiffer | Uma.Srinivasan }@cmis.csiro.au
19. Image Browsing System The following tool was selected to be an MPEG-7 Core Experiment.
Title: Class and image browsing system for an image collection
Short Description:
It provides browsing capabilities for semantic classes, images per class, individual images, and classes per image.
Function in one statement:
Class and image browser for image collections
Benefits to users:
The collection description allows the creation of a flexible browsing interface for the images and classes defined for the
collection, personalized for each user (see the data-structure sketch at the end of this entry).
How the DS is similar to or different from related Ds:
It uses the Collection Structure DS and the Model DS.
Demos/Applications available:
Yes: http://www.mpeg7.ee.columbia.edu/
(Collection Structure DS)
Contact: Ana Belen Benitez
Electrical Engineering Department, Columbia University
1312 Mudd, #F6, 500 W. 120th St, MC 4712, New York, NY 10027
Voice: +1 212 854-7473 Fax: +1 212 932-9421
Email: ana@ee.columbia.edu
URL: http://www.ee.columbia.edu/~ana/
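A minimal data-structure sketch of the two browsing views described in this entry (images per class and classes per image) is given below. The class and method names are assumptions for illustration and do not come from the Columbia demo; both views stay consistent because the many-to-many mapping is maintained in both directions.

```python
# Minimal sketch of a bidirectional image <-> class index for collection browsing.
from collections import defaultdict

class CollectionIndex:
    def __init__(self):
        self._images_by_class = defaultdict(set)
        self._classes_by_image = defaultdict(set)

    def assign(self, image_id: str, class_label: str) -> None:
        """Record that an image belongs to a semantic class (many-to-many)."""
        self._images_by_class[class_label].add(image_id)
        self._classes_by_image[image_id].add(class_label)

    def images_in_class(self, class_label: str) -> set:
        return set(self._images_by_class.get(class_label, ()))

    def classes_of_image(self, image_id: str) -> set:
        return set(self._classes_by_image.get(image_id, ()))

# Example
idx = CollectionIndex()
idx.assign("img001.jpg", "beach")
idx.assign("img001.jpg", "sunset")
idx.assign("img002.jpg", "beach")
print(idx.images_in_class("beach"))        # {'img001.jpg', 'img002.jpg'}
print(idx.classes_of_image("img001.jpg"))  # {'beach', 'sunset'}
```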
20. Scene retrieval for Sports Events
Aerobatic Grand Prix 2000
URL: http://ab-movie.ricoh.co.jp
Contact: Takahiro Kunieda, Yuki Wakita
Ricoh Co. Ltd. Tokyo, Japan,
Tel: +81-3-3815-7261, Fax: +81-3-3818-0348
Email: kunieda@src.ricoh.co.jp, wakita@src.ricoh.co.jp
21. COMVID
The scientific goal of this assessment project is to develop and test low bit-rate, high-quality video compression based
on new, innovative segmentation algorithms. These segmentation algorithms will make it possible to extract and track
objects across consecutive video frames. The plan is to set up a framework of tools for content-based multimedia
applications. Automatic interaction with the content, to achieve low bit-rate video compression, will be based on the
extracted segments. This will enable substantial improvements in coding efficiency and will also allow efficient coding
of multiple concurrent data streams. The proposed video codec will not require extensive computing
power.
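As a much-simplified stand-in for the segmentation step described above, the sketch below extracts a moving-object mask by differencing consecutive grayscale frames and tracks its bounding box from frame to frame. The threshold and the overall approach are illustrative assumptions only; COMVID's segmentation algorithms are considerably more sophisticated than frame differencing.

```python
# Simplified stand-in for object extraction and tracking across consecutive frames.
import numpy as np

def moving_object_mask(prev_frame: np.ndarray, frame: np.ndarray, thresh: float = 25.0):
    """Both frames: H x W grayscale arrays.  Returns a boolean mask of changed pixels."""
    return np.abs(frame.astype(float) - prev_frame.astype(float)) > thresh

def bounding_box(mask: np.ndarray):
    """Smallest (top, left, bottom, right) box containing the mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

def track(frames):
    """Yield one bounding box per frame transition; frames is an iterable of arrays."""
    frames = iter(frames)
    prev = next(frames)
    for frame in frames:
        yield bounding_box(moving_object_mask(prev, frame))
        prev = frame
```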
Objectives:
The objectives of this project are to develop and test a new, innovative framework for extracting visual objects that
will make it possible to achieve very low bit-rate, high-quality video compression that outperforms existing methods. It
could also be added later to the MPEG-4 standard, which recommends using object manipulation to achieve low bit-rate
compression but for which the extraction of these visual objects remains an open problem. The proposed methodology
can substantially enhance the performance of MPEG-4 and MPEG-7.
Typical applications for the proposed technology are:
• interpersonal real-time video communication with low bit rate,
• broadcast video distribution with low bit rate,
• content manipulation of video for home video production,
• mobile multimedia,
• content-based storage and retrieval,
• streaming video on the Internet/Intranet,
• digital set-top box and many more.
Contact Person:
Name: AVERBUCH, Amir
Tel: +972-3-6422020
Fax: +972-3-6422020
Email: amir@math.tau.ac.il
22. HEALTHSAT - Health Interactive Satellite Channel
R&D & feasibility study on the first interactive medium for distributing health and wellness programmes and
personalised services to European citizens, via satellite and Internet direct to home digital TV sets, computers displays
and mobile terminals. HEALTHSAT intends to develop advanced prototypes that will be mostly innovative in 3 years
time: an integrated, open satellite+Web+WAP/UMTS platform the first integrated telecommunications multi-platform
combining satellite, Internet and mobile technologies, fully compliant with all standards to come, allowing in the same
time streaming and interactivity, and any-where, any-time access to programmes and services, and new techniques for
producing programmes matching the requirements of broadcasting (digital TV), multicasting (satellite) and narrow
casting (Web). Interactivity will be introduced through the utilisation of both the Internet technologies and the satellite
Broadband Interactive System (BBI).
Work description:
The satellite platform will build on technical components that are currently considered state of the art. Further
developments conducted in the project will match these components with the requirements of both multicasting and
interactivity, and address issues such as newscasting, Webcasting, multiple transponder load sharing, quality of service,
bandwidth management and the Broadband Interactive return path. Two different systems will be developed for
building and integrating the Web platform into the satellite platform: a traffic system for analysing editorial needs and
programme traffic, with multi-format playlists, multi-play and end-user profiles; and a digitisation system for
automating encoding and playout. The development of XML applications for the satellite + Web platform will
allow testing of the integration of a WAP/UMTS platform into the former. Important normative work will be conducted
in the project (DVB/RCS, MPEG-4, MPEG-7). Throughout the project, innovative health services and tools will be
developed and offered on the Web; they will be linked to broadcasts and to videos in the database. New production
methodologies matching multicasting requirements will be developed and tested in the project, relating to
graphical design and on-air look definition, editing, production, dubbing, transmission and contribution.
Implementation of large-scale broadcasts via the digital satellite system will serve 9 5 million already-equipped
households (streaming, narrowcasting, Webcasting, newscasting). Moreover, 500 additional households will be
equipped for testing all aspects of interactivity. An advisory board will be established to ensure the quality and
relevance of the programmes produced and to participate in the editing of institutional programmes. Health information
will be designed to help citizens understand various health-related issues and maintain a healthy lifestyle.
Project URL: http://www.healthsat.org
Contact Person:
Name: PRUMMEL, Claire
Tel: +31-78-6310105
Fax: +31-78-6313563
Email: claire.prummel@mmc-europe.com
23. MASCOT - Metadata for Advanced Scalable Video Coding Tools
The explosion of multimedia applications leads to a great expansion of video transmission over heterogeneous channels.
These developments increase the need for highly flexible and scalable video compression systems. For all currently
available interactive multimedia applications, however, which are demanding in terms of video quality and coding
efficiency, the cost as well as the limited scalability remain unacceptable. Therefore, MASCOT seeks to design an
intrinsically scalable video-coding scheme providing fully progressive bit streams by exploiting novel morphological
and adaptive wavelet decomposition methods. Furthermore, the project aims to improve the quality and efficiency of
video coding systems by exploiting metadata information.
Objectives:
1. To develop scalable compression schemes, which fulfil the requirements of multimedia applications, by covering a
wide range of bit-rates, by yielding high compression ratios, and by providing a high level of bit stream embeddedness
to enable the adaptation of the compressed video data to a variety of networks and receivers.
2. To provide a breakthrough in the domain of video compression and to improve the quality of the reconstructed video
at low bit rates by exploiting non-linear (morphological, adaptive) wavelet decompositions, by using metadata during
the encoding step, and by development and optimisation of advanced prediction schemes.
3. To provide Europe with leadership in new video compression techniques.
4. To contribute to standardisation committees like ITU-T and MPEG, and to follow the activities of MPEG-7 (and
MPEG-21) as well as JPEG-2000.
Work description:
The work will consist of the following ingredients:
1. The analysis of the descriptors and description schemes included in the MPEG-7 standard or in other metadata
standards such as SMPTE. Can such descriptors lead to the development of new coding techniques? Besides new tools
that may be developed based on specific descriptors or description schemes, the access to metadata also opens the door
to the design of new encoding strategies.
2. Investigation of new spatiotemporal decompositions for image and video coding and the development of new
functionalities in the domain of video compression, enabling user-friendly and flexible multimedia applications as well
as facilitating the interoperation of heterogeneous systems. Towards this goal, new non-linear (morphological, adaptive)
wavelet representations for video compression will be designed (a small lifting-scheme sketch follows this list). Such
wavelet representations will be exploited for special applications, e.g. texture extraction, edge detection and motion estimation.
3. The investigation of various parameters describing changes in time between pictures, in order to capture the
temporal redundancy within a picture sequence and reduce the amount of data that needs to be coded. This investigation
will cover the nature of the parameters to use as well as the choice of the picture representation on which they act
(space domain or wavelet domain). It will also include the choice of adaptive representations that enable us to take
optimal advantage of this redundancy.
4. An encoder and decoder system will be developed according to the project's proposal. The focus of this codec will lie
on scalable coding and on the use of metadata (MPEG-7) to control the coding process. This new scheme will be
compared with H.26L and MPEG-4 and a demonstration at a major international exhibition will be given.
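The sketch below shows the lifting scheme for a 1-D Haar wavelet, the basic building block that the adaptive and morphological (non-linear) liftings mentioned in point 2 generalize by replacing the predict and update steps with data-dependent or rank-order operators. It is purely illustrative and is not the MASCOT codec.

```python
# Lifting-scheme sketch for a 1-D Haar wavelet (illustrative building block only).
import numpy as np

def haar_lifting_forward(signal: np.ndarray):
    """signal: 1-D array of even length.  Returns (approximation, detail)."""
    even, odd = signal[0::2].astype(float), signal[1::2].astype(float)
    detail = odd - even            # predict: odd samples predicted from even neighbours
    approx = even + detail / 2.0   # update: preserve the running average
    return approx, detail

def haar_lifting_inverse(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    even = approx - detail / 2.0
    odd = detail + even
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
a, d = haar_lifting_forward(x)
assert np.allclose(haar_lifting_inverse(a, d), x)   # lifting gives perfect reconstruction
```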
Project URL: http://www.cwi.nl/projects/mascot/
Contact Person:
Name: HEIJMANS, Henk (Doctor)
Tel: +31-20-5924057
Fax: +31-20-5924199
Email: Henk.Heijmans@cwi.nl
24. METAVISION To define, create and demonstrate a Universal Electronic Production system capable of meeting the demands of both the
Film and Television Industries. Film as a production format is flourishing, fuelled by the inertia of the feature film
industry and demand for High Definition TV programmes in both the United States and other parts of the world. The
film industry increasingly makes use of electronic post-production techniques and is seriously considering electronic
distribution now that TV production technology is approaching acceptable performance. We aim to create a
production chain to demonstrate that a completely electronic, high-resolution capture, editing, storage, distribution and
asset tracking system can now be devised and built. Propagation of metadata (conforming to standards currently under
development) through the system will ensure that each archived version will contain a history of the processing used
since its creation.
Objectives:
The goals of the METAVISION project are to revolutionise the way films and TV programmes are currently captured,
produced, stored and distributed. Its innovative electronic production system reduces the cost of film production, allows
more artistic flexibility in shooting and editing for film, and allows integration of real and virtual images at source quality
for film production and in the compressed domain for use in TV studios. Content may be readily converted between
existing distribution media (film, HDTV, SDTV) and existing compression formats (MPEG-2). The system will take
into account the future requirements of compression schemes and metadata carriage currently under consideration in the
standards bodies (SMPTE) for standards such as MPEG-4, MPEG-7, and will allow material to be archived at various
reference qualities depending on application.
Project URL: http://www.ist-metavision.com
Contact Person:
Name: WALLAND, Paul
Tel: +44-1730-818715
Fax: +44-1730-881199
Email: paul.walland@snellwilcox.com
REFERENCES
MPEG-7 Web Sites There are a number of documents available at the MPEG Home Page (http://drogo.cselt.it/mpeg/). Information more
focused on industry is also available at the MPEG-7 Alliance Web site (http://www.mpeg-industry.com).
MPEG-7 Personal Contacts
For Technical Issues:
Requirements: Fernando Pereira (fp@lx.it.pt)
Audio: Juergen Herre (hrr@iis.fhg.de)
Visual: Miroslaw Bober (miroslaw.bober@vil.ite.mee.com)
Multimedia DS: John Smith (jsmith@us.ibm.com)
Systems: Olivier Avaro (olivier.avaro@francetelecom.fr)
XM Software Implementation: Stephan Herrmann (stephanh@lis.e-technik.tu-muenchen.de)
MPEG-7 Alliance: Neil Day (neil@mpeg-industry.com)