INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N4676 Mar. 2002 – Jeju, Korea
Title: MPEG-7 Applications
Group/Subgroup: Requirements
Status: Approved
Editor: Neil Day
Structure of this document
The purpose of this document is to present a list of MPEG-7 applications, demos and projects that are currently available or under development. The list is broken down into three sections, namely:
• Multimedia
• Audio
• Visual
Each section lists representative MPEG-7 applications for the domain in question. The multimedia section represents
applications that combine audio, visual and textual functionalities. In some cases, an application may appear twice, since its developers may have produced functions which address more than one of the domains. Much work still needs to be
done on the consistency and presentation of the information here. Further updates of this document will improve the
presentation format.
Before this list of applications, there is a brief introduction describing the purpose of the MPEG-7 standard and providing an outline of the application domains to which it can be applied.
At the end of the document there is a reference section and contact list for readers seeking more information.
MPEG-7 Applications, Demos and Projects
Disclaimer:
While the utmost attention has been given to providing accurate information on this list of MPEG-7 applications, demos
and projects, please be advised that the author(s) of this document cannot accept responsibility for errors and
inaccuracies. Should the reader or owners of the applications, demos or projects listed herein seek correction or
updates, please notify the authors as soon as possible. The authors’ contact details are provided at the end of this
document. Finally, please note, this document is very much a draft and is expected to undergo many updates and
revisions.
Introduction to MPEG-7
How many times have you seen science fiction movies such as 2001: A Space Odyssey and thought, “Wow, we’re so far away from having any of the fancy gadgets depicted in these movies!” In 2001, HAL, the talking computer, intelligently navigates and retrieves information and runs complex operations instigated by spoken input. Or how about using an image-based query, say an image of the motorbike used by Arnold Schwarzenegger in the movie T2, to find images of similar-looking motorbikes? Dreams or reality?
As more and more audiovisual information becomes available from many sources around the world, many people
would like to use this information for various purposes. This challenging situation led to the need for a solution that
quickly and efficiently searches for and/or filters various types of multimedia material that’s interesting to the user.
For example, finding information by rich-spoken queries, hand-drawn images, and humming improves the user-
friendliness of computer systems and finally addresses what most people have been expecting from computers. For
professionals, a new generation of applications will enable high-quality information search and retrieval. For example,
TV program producers can search with “laser-like precision” for occurrences of famous events or references to certain
people, stored in thousands of hours of audiovisual records, in order to collect material for a program. This will reduce
program production time and increase the quality of its content.
MPEG-7 is a multimedia content description standard, (to be defined by September 2001), that addresses how humans
expect to interact with computer systems, since it develops rich descriptions that reflect those expectations. This
document gives an introductory overview of the MPEG-7 standard. More information about MPEG-7 can be found at
the MPEG-7 website http://drogo.cselt.it/mpeg/ and the MPEG-7 Industry Focus Group website http://www.mpeg-industry.com. These web pages contain links to a wealth of information about MPEG, including many publicly available documents, several lists of ‘Frequently Asked Questions’ and links to other MPEG-7 web pages.
What Are the MPEG Standards?
The Moving Picture Experts Group (MPEG) is a working group of the Geneva-based ISO/IEC standards
organization (International Organization for Standardization / International Electrotechnical Commission,
http://www.itscj.ipsj.or.jp/sc29/) in charge of the development of international standards for compression,
decompression, processing, and coded representation of moving pictures, audio, and a combination of the two. MPEG-7
then is an ISO/IEC standard being developed by MPEG, the committee that also developed the Emmy Award-winning
standards known as MPEG-1 and MPEG-2, and the 1999 MPEG-4 standard.
• MPEG-1: For the storage and retrieval of moving pictures and audio on storage media.
• MPEG-2: For digital television, it’s the timely response for the satellite broadcasting and cable television industries in
their transition from analog to digital formats.
• MPEG-4: Codes content as objects and enables those objects to be manipulated individually or collectively on an
audiovisual scene.
MPEG-1, -2, and -4 make content available. MPEG-7 lets you find the content you need.
Defining MPEG-7
MPEG-7 is a standard for describing features of multimedia content.
Qualifying MPEG-7:
MPEG-7 provides the world’s richest set of audio-visual descriptions
These descriptions are based on catalogue (e.g., title, creator, rights), semantic (e.g., the who, what, when and where information about objects and events) and structural (e.g., the colour histogram, a measurement of the amount of colour associated with an image, or the timbre of a recorded instrument) features of the AV content, and leverage the AV data representations defined by MPEG-1, -2 and -4.
Comprehensive Scope of Data Interoperability
MPEG-7 uses XML Schema as the language of choice for content description.
MPEG-7 will be interoperable with other leading standards such as the SMPTE Metadata Dictionary, Dublin Core, EBU
P/Meta, and TV Anytime.
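As a rough illustration of what such a description looks like in practice, the following sketch builds a tiny MPEG-7-style XML fragment with catalogue, semantic and structural elements. The element names are simplified stand-ins chosen for readability; a real description would use the normative types defined by the MPEG-7 DDL (XML Schema).

# Minimal sketch of an MPEG-7-style content description, serialized as XML.
# Element names are simplified stand-ins for the normative MPEG-7 schema;
# a real description would validate against the MPEG-7 DDL (XML Schema) definitions.
import xml.etree.ElementTree as ET

root = ET.Element("Mpeg7")
desc = ET.SubElement(root, "Description")

# Catalogue-style metadata (title, creator)
creation = ET.SubElement(desc, "CreationInformation")
ET.SubElement(creation, "Title").text = "Evening News, 12 March"
ET.SubElement(creation, "Creator").text = "Example Broadcaster"

# Semantic metadata (who/what/when/where)
semantic = ET.SubElement(desc, "Semantic")
ET.SubElement(semantic, "Event", {"what": "goal", "when": "00:12:40"}).text = "Player X scores"

# Structural, signal-level metadata (e.g. a colour histogram for one video segment)
segment = ET.SubElement(desc, "VideoSegment", {"start": "00:12:35", "duration": "PT15S"})
ET.SubElement(segment, "ColorHistogram").text = " ".join(str(v) for v in [12, 0, 3, 40, 25, 20])

print(ET.tostring(root, encoding="unicode"))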
The Key Role of MPEG-7
MPEG-7, formally named “Multimedia Content Description Interface,” is the standard that describes multimedia
content so users can search, browse, and retrieve that content more efficiently and effectively than they could using
today’s mainly text-based search engines. It’s a standard for describing the features of multimedia content.
However…
MPEG-7 will not standardize the (automatic) extraction of AV descriptions/features. Nor will it specify the search
engine (or any other program) that can make use of the description. It will be left to the creativity and innovation of
search engine companies, for example, to manipulate and massage the MPEG-7-described content into search indices
that can be used by their browser and retrieval tools.
A few application examples are:
• Digital libraries (image catalogue, musical dictionary,…)
• Multimedia directory services (e.g. yellow pages)
• Broadcast media selection (radio channel, TV channel,…)
• Multimedia editing (personalised electronic news service, media authoring)
The potential applications are spread over the following application domains:
• Education,
• Journalism (e.g. searching speeches of a certain politician using his name, his voice or his
face),
• Tourist information,
• Cultural services (history museums, art galleries, etc.),
• Entertainment (e.g. searching a game, karaoke),
• Investigation services (human characteristics recognition, forensics),
• Geographical information systems,
• Remote sensing (cartography, ecology, natural resources management, etc.),
• Surveillance (traffic control, surface transportation, non-destructive testing in hostile
environments, etc.),
• Bio-medical applications,
• Shopping (e.g. searching for clothes that you like),
• Architecture, real estate, and interior design,
• Social (e.g. dating services), and
• Film, video and radio archives.
Brief Overview of List of MPEG-7 Applications, Demos and Projects
The following chart shows the percentage breakdown of audio, visual and multimedia MPEG-7 applications (for the purpose of brevity, ‘MPEG-7 applications, demos and projects’ shall from here on be referred to as ‘MPEG-7 applications’).
To date, a total of 43 applications have been either found or made known to the authors of this document. It
can be assumed that there are in fact many more applications being developed and will be developed over time.
Consequently, this list is expected to grow rapidly. Application developers are encouraged to inform the authors of their
work so as to be included in this list.
Figure 1: Percentage breakdown of current MPEG-7 applications (Audio 12%, Visual 55%, Multimedia 33%).
MULTIMEDIA
1. MPEG-7 Visual Annotation Tool
The MPEG-7 Visual Annotation Tool enables users to interactively create MPEG-7 descriptions using MPEG-7
Description Schemes and Descriptors. The tool takes as input an MPEG-7 Schema definition file and an MPEG-7
package description file. The MPEG-7 Schema defines the structure of the MPEG-7 description components using the
MPEG-7 Description Definition Language (DDL). The Package description organizes the MPEG-7 description
components in order to improve the ease of navigation in the MPEG-7 Visual Annotation Tool. The tool provides
utilities for drag-and-drop copying and re-using of description elements and allows the output of the descriptions in
XML to files. The initial implementation centers on manual entry of description data; however, in future work we
plan to explore the integration of automatic and semi-automatic feature extraction methods with the goal of providing a
complete system for MPEG-7 multimedia content annotation and query building.
Contact: John Smith,
Email: jrsmith@watson.ibm.com
Manager, Pervasive Media Management
IBM T. J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532
(914) 784-7320;
2. Wireless Image Retrieval using a Speech Dialogue Agent
The agent in the client terminal recognizes the user's utterances in English/Japanese, with rather dedicated sentences, and sends a query profile to the server over a wireless transceiver channel (32 kbps). The server will retrieve the requested images and deliver the compressed video bitstream (H.263) to the client. Then the client agent will reply with a synthesized voice and display the images. At present the original format of the metadata is being used, but the MPEG-7 format will be used in the near future for all clients, servers, and channels.
Contact: Mikio Sasaki:
Email: msasaki@rlab.denso.co.jp
Research Laboratories, DENSO CORPORATION
500-1 Minamiyama, Komenoki-cho, Nisshin-shi, Aich-ken,470-0111 Japan
3. Internet Streaming Media Metadata Interchange using MPEG-7
Singingfish.com uses MPEG-7 description schemes to model the Internet streaming media metadata. This presentation
describes our use of MPEG-7 description schemes to define a schema for the XML interchange of Internet streaming
media metadata with several of our commercial content partners.
The goal of such metadata interchange is to populate our search index with the highest quality and most semantically
rich metadata possible, ultimately yielding superior relevance to the end user.
The presentation includes a short demonstration of the fidelity of a transformation from MSNBC's "Partner XML
Format" to an MPEG-7 XML description.
Contact: Eric Rehm:
Email: rehm@singingfish.com
Singingfish.com / Thomson Multimedia, Seattle, WA, USA.
4. The MPEG-7 Experimental Model (XM)
This presentation covers:
1. The Basic structure of the MPEG-7 XM Software
2. A Graphical User Interface for the MPEG-7 XM Software
3. Key Applications for the MPEG-7 XM
a) Search and retrieval
b) Transcoding
4. Combining visual low level descriptors in a search application
Contact: Mr. Stephan Hermann
Email: stephanh@lis.e-technik.tu-muenchen.de
Affiliation: Munich University of Technology,
Institute for Integrated Circuits
Munich, Germany
5. ASSAVID
The usefulness of archived audiovisual material is strongly dependent on the quality of the accompanying annotation.
Currently this is a labour-intensive process, which is therefore limited in the amount of detail that can be stored. In
particular, in real-time applications (such as live broadcast events) it is unrealistic to add much manual annotation. The
proposed information management system will automatically extract descriptive features, using MPEG-7 descriptors
where relevant, and associate these features with a small thesaurus relevant to the subject matter. In this project, the
subject matter will be limited to sports events. The features will be associated with the thesaurus by means of a training
process. In this way the user will be able to make text-based queries on the audiovisual material, using only the
automatically-extracted annotation.
Contact: Rex Dorricott
Organisation: Sony United Kingdom
Email: rd@adv.sonybpe.com
6. FAETHON
The overall objective of the FAETHON project is to develop an integrated information system offering enhanced search
and retrieval capabilities to users of audiovisual (a/v) archives. This novel system will exploit the advances in handling
a/v content and metadata, as introduced by MPEG-4 and MPEG-7, to provide access characterized by semantic
phrasing of the request (query), unified handling and personalized response. This will be achieved by developing
algorithms and software for,
(i) extracting high-level semantics out of syntactic and low-level semantic information contained in the a/v
archives,
(ii) filtering the responses of the latter on the basis of continuously updated profiles of individual users.
To this end, state of the art technologies will be used and new algorithms in the fields of fuzzy and hybrid systems will
be developed. Novel database schemes for multidimensional indexing will be employed.
Contact: Stavropoulou Olga, Prof. Anastasios Delopoulos
International Business Development
The ALTEC Group
Fragokklissias 4, Maroussi, 151 25 Athens, Greece
Tel: +301 61 09 746-7, Fax: +301 61 09 748
Email: ost@sysware.gr, adelo@image.ntua.gr
7. MUMIS
MUMIS intends to investigate and develop technology for the automatic creation of indexes into video material, using content-related data from several sources and languages. The project will investigate a number of issues, i.e. the extraction of formal representations in several languages, from spoken accounts in several languages as well as from image
understanding. Information extracted from these sources must be fused into a multi-tiered data structure, based on an
ontology for the demonstration domain (soccer matches). The data structure will consist of time markers pointing
directly towards events in the programme. The performance of the technology will be proved in the form of Internet-
accessible prototype demonstrators. To that end, a search engine will be designed and implemented which will allow
users to search for specific sets of events and retrieve the corresponding multi-media fragments.
Contact: Prof. Franciska de Jong,
University of Twente,
Centre for Telematics and Information Technology, NL.
Email: fedjong@cs.utwente.nl
8. PRIMAVERA
The project PRIMAVERA aims at building a Content Management System for broadcast applications that allows for an
intuitive, visual-aided annotation and retrieval of media information and provides a personalized view on the archive
content as well as a personalized filtering of incoming information. Advanced analysis and indexing functionality for
video as well as for audio content are the fundamental building blocks for new techniques that support efficient
querying and searching in the media repository, and exploration of the archive content. A central focus of the project is
put on the usability in a production environment, by integration of high-performance indexing and retrieval techniques
that fit well for large-scale broadcast archives into an already existing Content Management System, forming the basis
of the PRIMAVERA system
Contact: KUNKELMANN Thomas
TECMATH AG
Email: kunkelmann@medien.tecmath.de
9. SAMBITS
System for Advanced Multimedia Broadcast and Information Technology Services
SAMBITS will bring MPEG-4 and MPEG-7 technology to the broadcast industry and the related internet services. The
project will be able to provide multimedia services to a terminal that can display any type of general interest integrated
broadcast/internet services with local and remote interactivity. This is a cost effective solution that is of immediate
commercial interest because it uses the Internet and DVB broadcast infrastructure already in place.
SAMBITS will develop a multimedia studio system and demonstrate integrated (Internet and DVB Broadcast) services using a consumer-type terminal demonstrator. The technological basis for the system will be MPEG-2/-4/-7, where contributions will be made to the standards. Standardised systems are recognised to be advantageous for horizontal
markets (e.g. increased competition). SAMBITS will develop methods for service providers to integrate MPEG-2,
MPEG-4 and MPEG-7 data.
SAMBITS: http://www.cordis.lu/ist/projects/99-12605.htm
Contact: Gerhard Stoll,
Organisation : Institut fuer Rundfunktechnik GmbH
Floriansmuehlstrasse 60, 80939 Munchen, Germany
Tel : +49 89 32399347, Fax : +49 89 32399415
Email : stoll@irt.de , URL: http://www.irt.de/
10. SOLO
Project Name: The SOLO Project, University of Sydney
SOLO is intended to be an optimum search engine prototype for the MPEG-7 retrieval domain. SOLO will support multi-step searches across description databases (using MPEG-7 DSs) and content databases (using content-based features). SOLO is built on a meta-search engine (for rudimentary search), mobile code paradigms (for advanced
searches), and computational intelligence (for aiding search strategy composition, database selection, and back-end
filtering). The deployment of mobile code paradigms in the form of search agent technology extends the MPEG-7
enabled MSE to include specific content-based features which are often desirable in an advanced search across content
databases of audio visual archives.
SOLO: http://www.ee.usyd.edu.au/solo/main.html
Contact: Jose Lay,
Signal and Multimedia Processing Lab.
School of Electrical and Information Engineering
Building J-03, University of Sydney
NSW 2006, Australia.
Email: jlay@ee.usyd.edu.au
11. Video Editing & Production
• A tool to generate the instances of the MPEG-7 Structural Annotation DS from the annotation sentences of
shots in Japanese.
• A tool to generate an index for retrieving shots from the instances of the Structural Annotation DS, i.e., well-
formed XML documents. The tool uses an XML parser that has a DOM API.
• A retrieval tool that can match a user's queries written in natural language (Japanese) sentences with the shot
index.
• The basic evaluation of the tools was done using a collection of real data, i.e., 343 shot descriptions in
Japanese.
Contact: Masahiro Shibata,
NHK, Japan.
Email: shibata@strl.nhk.or.jp
12. INTERFACE
Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments
The objective of the project is to define new models and implement advanced tools for audio-video analysis, synthesis
and representation in order to provide essential technologies for the implementation of large-scale virtual and
augmented environments. The work is oriented to make man-machine interaction as natural as possible, based on
everyday human communication by speech, facial expressions and body gestures. Man-to-machine action will be based
on coherent analysis of audio-video channels to perform either low level tasks, or high level interpretation and data
fusion, speech emotion understanding or facial expression classification. Machine-to-man action, on the other hand, will
be based on human-like audio-video feedback simulating a "person in the machine". A common SW platform will be
developed by the project for the creation of Internet-based applications. A case study application will be developed,
demonstrated and evaluated.
The integrated SW platform will be developed progressively through upgraded releases, the first of which will come at the end of
the first year of project activity. Compliance with MPEG-4 and MPEG-7 will be guaranteed by deep project
commitment in the standardisation process. At project conclusion, the InterFace consortium will organise an
International Workshop for public demonstration and dissemination of the achieved results.
Project URL: http://www-dsp.com.dist.unige.it/
Contact Person:
Name: LAVAGETTO, Fabio
Tel: +39-010-3532208
Fax: +39-010-3532154
Email: fabio@dist.unige.it
13. PISTE
In the PISTE pre-production phase, the broadcasters create a schedule (e.g. the event name, location, and the participating athletes' names and CVs), according to which the capturing and creation of visual enhancements will take place. This schedule contains the information necessary to uniquely identify the content to be captured, as well as its proper storage location in the broadcaster's database. The metadata schema covers the information to be stored in a production multimedia repository, but also the part of the data to be transmitted with the multimedia content to the receiver.
PISTE aims at the definition of description metadata for multimedia sport. MPEG-7 descriptors and description schemes will be used to the farthest possible extent. It is expected that specific PISTE descriptors and description schemes will need to be developed using the MPEG-7 DDL. These new sport-specific developments will be fed back to MPEG-7.
The technology (Figure 1) used by PISTE allows additional information associated with "sensitive objects" to be carried. This means that the information is adequately generated and delivered, and its possible access signalled to the user.
Figure 1: Getting information on John Doe
Standardisation activities
14. ART.LIVE
The goal of the ART.LIVE project is to develop an architecture and a set of tools, both generic and application dependent, for the enhancement of narrative spaces thanks to the introduction of a mixed-reality environment. MPEG-7 is used to provide standardised content-based descriptions for the various types of audiovisual information existing in the system.
Foreseen contribution
The project will use descriptors related to images and image-segmented objects to pilot interactive scenarios.
The low-level descriptors, which are domain (scenario) independent, typically consist of the basic MPEG-7 descriptors. These are extracted from the image content (the natural objects provided as an image sequence along with an alpha map, i.e. the VOPs) thanks to various tools.
Ideally, this process should be extended to the synthetic objects as well. Special descriptors can
also be associated with some player behaviours. The project will develop a set of specific dedicated
descriptors.
For all the moving virtual objects, it is proposed to merely characterise them by a bounding box that surrounds any disconnected object(s), be it a single person or a group. Therefore, the following features will be extracted:
o Location of the gravity centre of the bounding box.
o Motion of the gravity centre of the bounding box.
o Single person versus group, based on the aspect ratio of the bounding box.
o Detection of the presence/absence of any object (there is at least one bounding box).
o Number of objects (number of bounding boxes).
In parallel, player interactions (typically mouse clicks) are also detected. All these features must be accompanied by some certainty (confidence) measures in order to enable the interpretation task to take appropriate decisions. Moreover, some time characterisation will be needed in order to check if a specified trigger occurs at a precise instant or out of fixed deadlines.
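The following sketch shows how the bounding-box features listed above could be computed, assuming each frame provides boxes as (x, y, w, h) tuples from an upstream segmentation stage; the aspect-ratio threshold and the returned field names are illustrative only.

# Sketch of the bounding-box features listed above, assuming a per-frame list of
# boxes (x, y, w, h) from some segmentation stage; names are illustrative only.
def box_features(prev_box, box, person_aspect_threshold=0.75):
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0              # location of the gravity centre
    if prev_box is not None:
        px, py, pw, ph = prev_box
        motion = (cx - (px + pw / 2.0), cy - (py + ph / 2.0))  # centre motion
    else:
        motion = (0.0, 0.0)
    aspect = w / float(h)
    single_person = aspect < person_aspect_threshold  # narrow box: single person, wide: group
    return {"centre": (cx, cy), "motion": motion, "single_person": single_person}

frame_boxes = [(40, 60, 30, 90), (200, 50, 120, 80)]
prev_frame_boxes = [(36, 60, 30, 90), (205, 50, 120, 80)]
features = [box_features(p, b) for p, b in zip(prev_frame_boxes, frame_boxes)]
print("number of objects:", len(frame_boxes))
print(features)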
Standardisation activities
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � �; ; ; � � � % � � 3 0 % � ! 0 � � � � � $ / � 0 � � � ! � � % � 2 � �
AUDIO
1. Spoken Content
"At the awareness event we will present the Spoken Content description scheme, along with a basic Web application to
illustrate the concept and its applications."
Canon Research Centre Europe (CRE), [along with our collaborators at IBM (Almaden)] have proposed the MPEG-7
Spoken Content description scheme. Searching and indexing audio-visual data using the speech in the sound track is,
perhaps, one of the most natural metadata retrievals and our metadata format is especially designed to store the
(sometimes erroneous) output of a speech recognition system in a manner most suited to robust retrieval. We are
performing research in a large range of potential applications of such data using textual and/or verbal querying.
Contact: Dr. Wilson Chiu, Phil Garner
Email: wilsonc@cre.canon.co.uk
Email: philg@cre.canon.co.uk
Canon Research Centre Europe Ltd
Tel: +44 1483 448844 Fax: +44 1483 448845
2. CUIDADO
Content-based retrieval of Music and Audio samples
Information overload, inability to quickly browse through audio, poor added-value to music via Internet distribution,
keyword dictatorship, inability to search for similarities among sounds: these are music consumer complaints addressed by IRCAM’s CUIDADO project. It aims at developing content-based technologies using and contributing to the MPEG-7 standard. Building reusable modules for audio feature extraction, music indexing, database management, networking
and constraint based navigation, CUIDADO targets two pilot applications:
1) The Music Browser features musical paths and automatic compilations according to user’s tastes, search for music
similarities, learning systems based on user’s profiles. One version is tied to Web music monitoring and another to Web
music sales and customized radios.
2) The Sound Palette involves musicians and studios for developing an authoring tool both online and in an existing
professional audio environment taking full advantage of the extracted audio features for innovative retrieval, editing and
processing.
CUIDADO is expected to bring Studio Online to a mature stage based on the MPEG-7 standard.
High impact on music providers and labels involved in Web distribution is expected. Assuming that the value of music in itself is currently decreasing, this application should give evidence that new services and interfaces for accessing music and sounds may bring more value than the music itself in the future context of Electronic Music Distribution (EMD). This project should also raise copyright societies' and music labels' awareness of their role in using new content-based tools for music promotion and music protection.
Contact: Vincent Puig (Managing Director),
Email: Vincent.Puig@ircam.fr
IRCAM, 1 place Igor Stravinsky, 75004 Paris.
3. Music Retrieval by Melodic Query
Identifying a musical work from a melodic fragment is a task that most people are able to accomplish with relative ease.
For some time now researchers have worked to give computers this ability as well, which has come to be known as the
"query-by-humming" problem. To accomplish this, it is reasonable to study how humans are able to perform this task,
and to assess what features we use to determine melodic similarity. Research has shown that melodic contour is an
important feature in determining melodic similarity, but it is also clear that rhythmic information is important as well.
The system to be demonstrated uses our proposed MPEG-7 description scheme for melody, which incorporates melodic
contour and rhythmic information as the primary representation for music search and retrieval.
Additional front-end processing (to process queries), a medium-sized database of music, and a search engine (for
finding appropriate matches) have also been implemented to complete the full query-by-humming system.
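A minimal sketch of a contour-plus-rhythm melody representation of the kind described above follows; the three-level contour coding (U/D/S) and the beat-based inter-onset intervals are simplifications chosen for illustration and do not reproduce the proposed description scheme exactly.

# Minimal sketch of a contour-plus-rhythm melody representation of the kind the
# description scheme above is built on; the three-level contour coding (U/D/S)
# and beat quantisation here are simplifications chosen for illustration.
def melody_signature(notes):
    """notes: list of (midi_pitch, onset_time_in_beats)."""
    contour, rhythm = [], []
    for (p0, t0), (p1, t1) in zip(notes, notes[1:]):
        contour.append("U" if p1 > p0 else "D" if p1 < p0 else "S")
        rhythm.append(round(t1 - t0, 2))           # inter-onset interval in beats
    return contour, rhythm

# Opening of "Twinkle Twinkle": C C G G A A G
notes = [(60, 0), (60, 1), (67, 2), (67, 3), (69, 4), (69, 5), (67, 6)]
print(melody_signature(notes))
# (['S', 'U', 'S', 'U', 'S', 'D'], [1, 1, 1, 1, 1, 1])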
Contact: Youngmoo Kim,
Email: moo@media.mit.edu
Machine Listening Group, MIT Media Lab., Boston, USA
http://sound.media.mit.edu/~moo
4. MIDLIB
As a part of the V3D2 digital library initiative, funded by Deutsche Forschungsgemeinschaft (DFG), the MiDiLiB-
project deals with two fundamental open problems in the field of digital music libraries (DMLs). The first part of our
project investigates techniques for automatic indexing and retrieval of score-based audio data (MIDI). More precisely,
we are interested in efficient strategies for processing content-based queries, e.g., queries resulting from given melody
fragments or rhythms. Typically, such kinds of queries are only rough or imprecise approximations of the actual tunes.
Therefore, the retrieval system has to generate a ranked list of approximate matchings. The second part of the MiDiLiB-
project deals with perceptually stable methods for cascaded audio coding, i.e., iterated en-and decoding of (CD-quality)
PCM-data.
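The sketch below illustrates the ranked approximate matching mentioned above for melody queries, using a plain edit distance over contour strings; the matching strategy actually used by the MiDiLiB project may differ.

# Sketch of ranked approximate matching of a hummed/entered query against a
# database of contour strings (see the melody sketch above); edit distance is
# one simple choice, not necessarily the matching used by the MiDiLiB project.
def edit_distance(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

database = {"Twinkle Twinkle": "SUSUSD", "Ode to Joy": "SUSDDSD"}
query = "SUSUS"  # rough, possibly imprecise user query
ranking = sorted(database, key=lambda title: edit_distance(query, database[title]))
print(ranking)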
A small part of MIDILIB is the development of a "music contents markup language" (MCML), which will be used for displaying query results and for browsing in pieces of music. MCML is an XML-based language, and the first applications will use this XML version. In addition, we are defining an "MPEG-7 draft" version of MCML to make it easy to adapt our applications to MPEG-7 when it becomes a standard.
MIDLIB: http://leon.cs.uni-bonn.de/forschungprojekte/midilib/english/
Contact: Frank Kurth, Jochen
frank@leon.cs.bonn.edu :: MIDILIB and MCML
schimmel@cs.bonn.edu :: MCML
5. Natural Language
The following tool was selected to be an MPEG-7 Core Experiment.
Title: Linguistic Access to Multimedia Contents
Short Description:
Authoring, retrieval, summarization, and presentation of multimedia contents semantically structured by linguistic data.
Function (in one sentence):
Benefit for Applications:
Authors and users are allowed to use natural language for structuring and accessing multimedia contents, which is
expected to be the most natural way of dealing with them.
Potential Users: General Public
Related DSs/Ds (Description Schemes and Descriptors, i.e. tools):
Linguistic DS, Structured Annotation DS, Segment DS.
Linguistic data/annotation and video data are aligned through data/data links.
Demos/Applications available:
Of Tool: available.
Of Application in Industry Domain: available.
Contact: Katashi Nagao, HASIDA Koiti:
Email: KNAGAO@jp.ibm.com
IBM Tokyo Research Laboratory
Email: hasida@etl.go.jp
Director of Information Science Division,
Electrotechnical Laboratory, (ETL), Ibaraki, Japan.
VISUAL
1. Search Engine Tool
Visualization of MPEG-7 Similarity Retrieval of 2D and 3D Data
At the upcoming awareness meeting, an application will be presented that allows visualization of similarity-based
retrieval results. This so-called Search Engine was applied for Core Experiments of visual descriptors. A graphical user
interface is used for a number of functionalities, e.g.
- Browsing of image databases
- Visualization of 3D data and image sequences
- Similarity Search for a number of visual descriptors
The SearchEngine is a Java-based application that incorporates the underlying functionality of C- or C++-based extraction and similarity matching algorithms. For sequence playback, an MPEG player is included. A small 3D viewer based on Java3D technology was also added. For comparable results within the MPEG-7 Core Experiments for visual descriptors, a
console application, called MPEG-7 XM was used among the participants. This XM-Software is also integrated into the
SearchEngine. Certain basic image features are analyzed for similarity-based retrieval by this GUI:
- Texture
- Color
- Contour/Shape
- 3D geometry, by analyzing a number of 2D projections of the 3D object
- Different motion in sequences (e.g. background motion from left to right)
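The sketch below shows one way such a similarity search can combine several visual descriptors: per-descriptor distances are normalised and merged with user-chosen weights. The weights and the L1 distance are illustrative assumptions, not the matching functions of the XM reference software.

# Sketch of the kind of combined similarity search the GUI exposes: per-descriptor
# distances are normalised and merged with user-chosen weights; the weights and
# distance functions are illustrative, not those of the XM reference software.
def combined_distance(query, item, weights):
    total = 0.0
    for name, w in weights.items():
        q, x = query[name], item[name]
        # simple L1 distance per descriptor (texture, colour, shape, ... vectors)
        total += w * sum(abs(a - b) for a, b in zip(q, x)) / len(q)
    return total

query = {"color": [0.2, 0.5, 0.3], "texture": [0.1, 0.9]}
db = [
    {"id": "img01", "color": [0.2, 0.4, 0.4], "texture": [0.2, 0.8]},
    {"id": "img02", "color": [0.9, 0.1, 0.0], "texture": [0.7, 0.3]},
]
weights = {"color": 0.6, "texture": 0.4}
results = sorted(db, key=lambda item: combined_distance(query, item, weights))
print([item["id"] for item in results])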
Contact: Karsten Müller,
Email: kmueller@hhi.de
Heinrich Hertz Institute
Einsteinufer 37, 10587 Berlin, Germany
Tel: +49 30 31002 225, Fax: +49 30 392 7200
2. Video Editing
To highlight the basic elements of the Video Editing DS, two applications have been developed to edit and browse the
description of the video temporal structure specified in the MPEG-7 format. This temporal structure describes various
types of temporal units: shots, rushes and composition segments. The way these units are edited is also described in
terms of transition or composition effects.
The browser offers some navigation functionalities to quickly access specific parts of a video document regarding the
way it has been built.
The editor allows the completion of a partial description of a video structure that could have been provided by a video-
to-shot segmentation algorithm.
Contact: Rosa Ruiloba, Philippe Joly
Email: rosa.ruiloba@lip6.fr, Philippe.Joly@lip6.fr
Indexation Multimedia
Laboratoire d'informatique de Paris 6 - LIP-6/UPMC
Bureau C1219 tel : (33).(0)1.44.27.88.48
8, rue du Capitaine Scott 75015 Paris
3. MPEG-7 Video Browser and Highlight Generation Tool
Background
As in the case of abstracts describing papers in the classical sense, a video summary is an ‘audiovisual’ abstract of a
video program, which allows for quick understanding of the underlying story of the program. We can capture the whole
story by glancing over the summary. The structure of the summary description is hierarchical so that coarse-to-fine
navigation is possible in order to access more detailed information (contents). Furthermore the MPEG-7 summary
structure allows for an event-based summary with which customized browsing and filtering is possible on the summary.
3.1. Video Summary Generator
A video summary generator creates video summaries of highlights automatically and/or semi-automatically, using low
level audiovisual features and high level semantics, assisted by content analysis and highlight detection rules,
respectively. It outputs description data that contain a set of highlights, composed of video summaries, that are derived
from the MPEG-7 Summarization DS (Description Scheme). The generated short video highlight summaries can be
used with an electronic program guide (EPG) or with a video-browsing tool in personal storage devices. The Video
Summary Generator also generates a CC (closed-caption) text DB, which consists of keywords extracted from CC text,
using text analysis and time codes to indicate ‘keyword-synchronized’ video locations obtained by speech recognition in
the audio track, in order to support text-based retrieval of news video clips.
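A small sketch of the closed-caption keyword index described above follows: keywords extracted from CC text are mapped to the time codes of the corresponding video locations so that a text query can jump straight to candidate clips. The data layout and stopword list are assumptions for illustration.

# Sketch of the closed-caption keyword index described above: keywords from the
# CC text are mapped to the time codes of the video locations where they occur,
# so a text query can jump straight to the matching clip; data is illustrative.
from collections import defaultdict

def build_cc_index(cc_entries, stopwords={"the", "a", "in", "of"}):
    """cc_entries: list of (time_code_seconds, caption_text)."""
    index = defaultdict(list)
    for t, text in cc_entries:
        for word in text.lower().split():
            if word not in stopwords:
                index[word].append(t)
    return index

cc = [(12.0, "Fire in the city centre"), (95.5, "The mayor visits the fire station")]
index = build_cc_index(cc)
print(index["fire"])   # [12.0, 95.5] -> candidate entry points for a "fire" query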
3.2. MPEG-7 Video Browser
The generated summary description data is fed into an MPEG-7 video browser. The MPEG-7 video browser allows for a quick overview utilizing audiovisual highlights with different time durations, efficient browsing through non-linear navigation (based on multi-level hierarchical highlights and associated key-frames), and a ‘highlights-view’ and browser based on particular events. It also provides CC-text-based retrieval of news video clips. The video browser can be used as a video-browsing/-retrieval tool in personal storage devices in digital broadcasting and internet environments.
Contact: Munchurl Kim
Email: mckim@etri.re.kr
Participants: Munchurl Kim, Hyun Sung Chang
Affiliation: Electronics and Telecommunications Research Institute
Country: Korea
4. Video-over-IP (VIP)
Full streaming over the internet of both content and MPEG-7 metadata.
The Video-over-IP project (VIP) is an integration project carried out in the Netherlands. Various partners are involved,
like the Telematica Instituut, NOB, SurfNet, IBM, and TNO. In general, the purpose of the VIP project is to allow for
the production, storage, management, retrieval, and exploration of video content for a specific set of users. Moreover,
these services should be interoperable on the Internet. The following general activities should be possible:
• The production of digitised video material (media objects), ready for distribution over the Internet
• The production of content (video material plus metadata), including the management of this production process
• Digitising video and other material in various formats
• Extending the video material with additional descriptions (metadata) for disclosure, either (semi-) automatically, or
manually. In order to search in the content, parts of the video should be properly described.
• Indexing and retrieval of content
• End users should be able to search in the content
• Search, retrieval, and browsing facilities, including a user interface
• Security against improper use (encryption and watermarking)
• Distribution of high-quality video to the end user over the IP network
• The realisation of a network architecture needed for offering these services with a high quality of service
• Charging the end users on the basis of the delivered content and services (content-based billing & accounting).
Contact: Erik Oltmans:
Email: oltmans@telin.nl
5. Image Search using Edge/Contours
Short Description: The edge histogram descriptor represents the local edge distribution over a 4x4 grid of sub-images. Five types of edges, namely four directional edges and one non-directional edge, are defined for each sub-image, so there are a total of 16x5 = 80 histogram bins.
Function (in one sentence): Image to image matching, especially for natural images with non-uniform edge
distribution.
Benefit for Applications: Since the descriptor is based on the edge information in the image, it is good for natural
image matching. Since edges play an important role for image perception, it can retrieve images with similar semantic
meaning.
Potential Users:
- Image search (retrieval) by example or by sketch
- Scene change detection
- Key frame clustering
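For illustration, the sketch below computes an 80-bin edge histogram along the lines described above: the image is divided into a 4x4 grid of sub-images and each 2x2 block is classified into one of the five edge types. The filter coefficients and the threshold follow the commonly published formulation and may differ in detail from the normative MPEG-7 extraction.

# Sketch of the 80-bin edge histogram described above: the image is split into a
# 4x4 grid and, per sub-image, small blocks are classified into one of five edge
# types; the 2x2 filter responses below follow the usual textbook formulation and
# may differ in detail from the normative MPEG-7 extraction.
EDGE_TYPES = ["vertical", "horizontal", "diag45", "diag135", "nondirectional"]

def classify_block(b):
    """b: 2x2 block of grey values [[a, b], [c, d]] -> index into EDGE_TYPES or None."""
    (p0, p1), (p2, p3) = b
    strengths = [abs(p0 + p2 - p1 - p3),            # vertical
                 abs(p0 + p1 - p2 - p3),            # horizontal
                 abs(1.414 * (p0 - p3)),            # 45-degree diagonal
                 abs(1.414 * (p1 - p2)),            # 135-degree diagonal
                 abs(2 * (p0 - p1 - p2 + p3))]      # non-directional
    best = max(range(5), key=lambda k: strengths[k])
    return best if strengths[best] > 11 else None   # threshold: treat flat blocks as no edge

def edge_histogram(image):
    """image: HxW list of grey values, H and W divisible by 8 -> 16*5 = 80 bins."""
    h, w = len(image), len(image[0])
    bins = [0] * 80
    for by in range(0, h, 2):
        for bx in range(0, w, 2):
            sub = 4 * (by * 4 // h) + (bx * 4 // w)   # which of the 4x4 sub-images
            e = classify_block([row[bx:bx + 2] for row in image[by:by + 2]])
            if e is not None:
                bins[sub * 5 + e] += 1
    return bins

img = [[(x + y) % 32 for x in range(16)] for y in range(16)]
print(sum(edge_histogram(img)))   # total number of edge blocks found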
Contact: Soo-Jun Park
Email: psj@etri.re.kr
Senior Member of Engineering Staff
ETRI-CSTL
161 Kajong-dong, Yoosung, Taejon, 305-350, Korea
URL: http://sir.etri.re.kr/~soop
(phone) +82-42-860-6899, (fax) +82-42-860-4889
6. Video Annotation and Summaries
6.1. Video annotation editor
The system can automatically generate video transcripts using speech recognition and make a correspondence between
video scenes and words. The system can also detect scene change boundaries. The user of this system can modify
automatically-generated transcripts and scene boundaries. The user can also annotate some keywords and comments on
objects in video frames. The system generates XML-formatted annotation data that contains all information created
through user interaction.
6.2. Video player with summarization function
The system can generate summaries of video clips with annotation data and play them. The user can input any keyword
that will contribute to customization of video summaries. The player can also show transcript text synchronized with
video, like closed captions. The user can also select any scene from the scene index window.
Contact: Katashi Nagao, HASIDA Koiti:
Email: KNAGAO@jp.ibm.com
IBM Tokyo Research Laboratory
Email: hasida@etl.go.jp
Director of Information Science Division,
Electrotechnical Laboratory, (ETL), Ibaraki, Japan.
7. MPEG-7 Video Object Segmentation and Retrieval
We will present a video object segmentation system, AMOS, and a video retrieval and visualization application.
Currently, fully automatic segmentation of semantic objects is only successful in constrained visual domains. The
AMOS system takes on a powerful approach in which automatic segmentation is integrated with user input to track
semantic objects in video sequences. For general video sources, the system allows users to define an approximate object
boundary by using a tracing interface. Given the approximate object boundary, the system automatically refines the
boundary and tracks the movement of the object in subsequent frames of the video. The system is robust enough to
handle many real world situations that are hard to model in existing approaches, including complex objects, fast and
intermittent motion, complicated backgrounds, multiple moving objects, and partial occlusion. For each video
sequence, the description generated by this system is a set of semantic objects with the associated regions and visual
features that can be manually annotated with text. Text annotations can also be assigned to the video sequence.
The video retrieval and visualization application developed during a Core Experiment within MPEG-7 uses the
descriptions generated by AMOS to retrieve and visualize videos based on the annotations and visual features. This
application supports (1) query by example based on any combination of visual features and text annotations (e.g.,
retrieve video sequences with similar objects based on color and texture); (2) query by keyword based on text
annotations (e.g., retrieve video sequences with “elephant”); and (3) advanced visualization of the retrieved results
based on panoramic views and segmented objects.
Contact: Ana Belen Benitez
Email: ana@ee.columbia.edu
Electrical Engineering Department
Columbia University, 1312 Mudd, #F6, 500 W. 120th St, MC 4712, New York, NY 10027
Voice: +1 212 854-7473 Fax: +1 212 932-9421
URL: http://www.ee.columbia.edu/~ana/
8. Hierarchical Summary Browser
Category: Application of the Summary DS.
Features: Summary Theme based Audio-Visual Summary Selection
Presentation Time based Audio-Visual Summary Selection
Abstract: Hierarchical Summary Browser is based on the Summary DS which is in the category of navigation
and access. The functionality of the proposed hierarchical summary browser includes dynamic audio-
visual summary generation following the user’s selection of the summary theme and summary length
in time. By allowing users to select preferred summary length, the hierarchical level of the provided
summary can be automatically selected so that the length of the summary is closest to the user’s
request. By allowing users to select the preferred theme of the summary, audio-visual summaries of various lengths with the selected theme can be dynamically generated, so that the user can select the length. The combined selection of theme and length is also available. Such a hierarchical
summary browser can also be used in accordance with the user preference, so that the preferred theme
and the length can be automatically selected based on the user preference.
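The length-driven selection described above can be sketched as follows, assuming the hierarchy is available as a list of pre-built summary levels with known durations; the data layout is an assumption for illustration.

# Sketch of the length-driven selection described above: given a hierarchy of
# pre-built summaries (coarse to fine), pick the level whose duration is closest
# to the duration the user asked for; the data layout is an assumption.
def select_summary(levels, requested_seconds):
    """levels: list of dicts like {"theme": ..., "duration": seconds, "segments": [...]}."""
    return min(levels, key=lambda lv: abs(lv["duration"] - requested_seconds))

levels = [
    {"theme": "goals", "duration": 30, "segments": ["s1"]},
    {"theme": "goals", "duration": 120, "segments": ["s1", "s4", "s7"]},
    {"theme": "goals", "duration": 300, "segments": ["s1", "s2", "s4", "s5", "s7"]},
]
print(select_summary(levels, requested_seconds=100)["duration"])   # -> 120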
9. Table of Contents (ToC) Browser
Category: Application of the Segment DS and Graph DS.
Features: ToC based Audio-Visual Content Navigation
Abstract/Detail* relation based Navigation
Cause/Effect** relation based Navigation
Abstract: The ToC browser is based on the Segment DS and the Graph DS. The ToC browser interface provides a tree-structured view of the selected content so that a user can select a segment of interest in the
content. Each segment is represented by a representative key frame, and the selected segment is
summarized by a list of key frames. Based on the abstract/detail and cause/effect relationships defined
using the Graph DS, a user can select segments in the abstract/detail/cause/effect relation. The
abstract/detail relation provides two segments, one of which is an abstract version of the other, the latter being a detailed version of the former segment. The cause/effect relation provides two events, one of which causes the other, the latter being the result of the former event.
*Abstract/detail are proposed normative types of relations
**The effect relation is equivalent to the result relation which is a proposed normative relation type and the cause
relation can be considered as the inverse relation of the result
10. SmartEye
Image Retrieval System with Relevance-Feedback based Image Characterization
Category: Application of the MatchingHint DS.
Features: Image retrieval using multiple descriptors with different weights. Automatic learning MatchingHints
by user’s feedback
Abstract: Generally, relevance feedback has been utilized only to refine the query conditions in image retrieval. However, in our application, the usage of relevance feedback is extended to image database categorization so as to accommodate user-independent image retrieval. In our approach, to guarantee user-satisfactory performance, the descriptors and the elements of the descriptors corresponding to features of each image are weighted using the relevance feedback. We use the MatchingHint DS for weighting descriptors and elements of each descriptor based on color and texture descriptors. In addition, our system uses an appropriate learning method based on a reliability scheme that prevents wrong learning from wrong feedback.
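The sketch below illustrates relevance-feedback weighting in the spirit of the MatchingHint usage described above: descriptors that separate relevant from non-relevant results well receive larger weights. The update rule and constants are illustrative assumptions, not the learning method actually used in SmartEye.

# Sketch of relevance-feedback weighting in the spirit of the MatchingHint usage
# described above: descriptors that separate relevant from non-relevant images
# well get larger weights; the update rule is a simple illustration, not LG's
# actual learning method.
def update_weights(weights, distances, relevant):
    """distances[name][i]: distance of result i from the query for descriptor `name`;
    relevant[i]: True if the user marked result i as relevant."""
    new = {}
    for name, w in weights.items():
        rel = [d for d, r in zip(distances[name], relevant) if r]
        irr = [d for d, r in zip(distances[name], relevant) if not r]
        # reward descriptors whose relevant results are closer than non-relevant ones
        gap = (sum(irr) / len(irr) - sum(rel) / len(rel)) if rel and irr else 0.0
        new[name] = max(0.05, w + 0.1 * gap)
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}

weights = {"color": 0.5, "texture": 0.5}
distances = {"color": [0.1, 0.8, 0.2], "texture": [0.5, 0.4, 0.6]}
relevant = [True, False, True]
print(update_weights(weights, distances, relevant))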
Applications 8, 9, 10. Contacts: Heon Jun Kim, Ph.D.,
Email: hjk@lge.co.kr
Senior MTS
Also: Kyoungro Yoon, Jin-Soo Lee, Jung-Min Song
MI Group, Information Technology Lab.
LG Corporate Institute of Technology, 16 Woomyeon-Dong, Seocho-Gu, Seoul, Korea 137-724
TEL: +82 526 4132, FAX: +82 526 4852
11. MPEG-7 Camera
In collaboration with the EPFL, FASTCOM Technology S.A. has developed, around its smart camera product, an MPEG-7 "standard" communication layer that makes the content of its video output searchable. The goal of this project is
to develop an MPEG-7 camera. Such a camera is able to interpret the scene at hand and to extract the relevant
information. This information is no longer dependent on the audio-visual data from which it has been extracted. This
modality independent (or pure) information can for instance be displayed on the receiver side in any given modality, be
it textual, audio or visual. The MPEG-7 camera project builds the full chain of extracting the pure information in the
scene, transmitting it, and displaying it according to the recipient’s preference. The extraction of the information is
performed using an intelligent camera, which analyses the visual data in real time.
The analysis is defined through the software application running inside the intelligent camera. The extracted
information is then packaged in an MPEG-7 compliant bitstream and transmitted to the recipient in XML. The latter
uses an XML-based interface to display the information. The display may be done either in a textual, an audio or a
visual mode.
Contact: Nicolas Pican
Email: pican@fastcom-technology.com
FASTCOM Technology S.A.
Contacts: Touradj Ebrahimi , Fabrice Moscheni
Email: Touradj.Ebrahimi@epfl.ch, , moscheni@fastcom-technology.com
Signal Processing Laboratory
Swiss Federal Institute of Technology EPFL, CH-1015 Lausanne, Switzerland
Direct Phone: +41 21 693 2606, Mobile Phone: +41 79 331 9993, Voice Mail: +1 (661) 420 5952
Direct Fax: +1 (603) 994 8508 or +1 (661) 420 5946, Office Fax: +41 21 693 7600
12. ALIVE
Architecture and authoring tools for prototype for Living Images and new Video Experiments.
The goal of the project is to develop an architecture and a set of tools, both generic and application dependent, for the
enhancement of narrative spaces. This will be achieved by testing the two aspects of this goal: telepresence (inclusion of real objects into virtual worlds) and augmented reality (inclusion of artificial objects into real worlds). ALIVE will bring together, towards these very specific goals, image processing engineers, AI computer scientists and multimedia authors.
The ALIVE trials will broadcast recent innovations in the field of intelligent distributed video processing among the
authors and artists community. Indeed, the world of comics authors shows a growing interest in the World Wide Web
which is a new medium able to broadcast their production in an easier way and to offer new narrative spaces. A particularly interesting, new and innovative prospect lies in the interaction between real-life and virtual worlds (telepresence and augmented reality). The very precise goal of ALIVE is to implement this interaction and to demonstrate it in specific experiments.
ALIVE: http://www.cordis.lu/ist/projects/99-10942.htm
Contact: Benoît Macq
Université catholique de Louvain
Place du Levant, 2 , 1348 Louvain-la-Neuve, Belgique
Tel : (32-10)47 22 71
Fax : (32-10)47 20 89
Email : macq@tele.ucl.ac.be
13. Image/Video Retrieval by Color
The following tool was selected to be an MPEG-7 Core Experiment.
Title: Color Layout
Short Description:
Compact description of the spatial distribution of color. It enables high-speed and high-performance image/video retrieval.
Function (in one sentence):
Image-to-image, video-segment-to-video-segment and sketch-to-image matching
Benefit for Applications:
This descriptor provides very high-speed image retrieval functionality with low cost
Potential Users:
Users who need an image-based retrieval system, especially those working on consumer products
Related DSs/Ds (Description Schemes and Descriptors, i.e. tools):
How are they used together? How do they differ?
Other color-related descriptors (color histogram, dominant color, color structure histogram) represent the distribution of color, so only the Color Layout descriptor is available when we need local color characteristics, which are important for a sketch-based query system.
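As an illustration of the kind of compact layout description characterised above, the sketch below reduces an image to an 8x8 grid of representative colours and keeps a few low-frequency DCT coefficients per channel; the plain averaging, the scan order and the number of kept coefficients are simplifications and may differ from the normative extraction.

# Sketch of a colour-layout style descriptor as characterised above: the image is
# reduced to an 8x8 grid of representative colours and a DCT keeps only a few
# low-frequency coefficients per channel; a plain average and a naive DCT are
# used here for clarity and may differ from the normative extraction.
import math

def dct_1d(v):
    n = len(v)
    return [sum(v[x] * math.cos(math.pi * (x + 0.5) * u / n) for x in range(n)) for u in range(n)]

def color_layout(grid, keep=6):
    """grid: 8x8 list of (Y, Cb, Cr) averages -> a few low-frequency coefficients per channel."""
    desc = {}
    for c, name in enumerate(("Y", "Cb", "Cr")):
        rows = [dct_1d([px[c] for px in row]) for row in grid]             # DCT along rows
        cols = [dct_1d([rows[y][x] for y in range(8)]) for x in range(8)]  # then along columns
        scan = [cols[x][y] for y in range(8) for x in range(8)]            # coarse low-to-high scan
        desc[name] = [round(v, 1) for v in scan[:keep]]
    return desc

grid = [[(128 + x * 4, 120, 130 + y) for x in range(8)] for y in range(8)]
print(color_layout(grid))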
Demos/Applications Available:
Of Tool:
Image and video retrieval systems have been implemented.
Of Application in Industry Domain: It might be possible.
Contact: Akio Yamada,
NEC Japan,
Email: a-yamada@da.jp.nec.com
14. ISTORAMA
In order to allow efficient search of the visual information available all over the Web, a highly efficient automated
system is needed that regularly traverses the Web, detects visual information and processes it in such a way as to allow for
effective search and retrieval.
In the ISTORAMA project, an intelligent image content-based search engine for the World Wide Web will be
developed. This system will offer a new form of media representation and access of content available in WWW.
Information Web Crawlers continuously traverse the Internet and collect images that are subsequently indexed based on
integrated feature vectors. These features along with additional information such as the URL location and the date of
index procedure are stored in a database in an MPEG-7 compliant format. The user can access and search this indexed
content through the Web with an advanced and user friendly interface. The output of the system is a set of links to the
content available in the WWW, ranked according to their similarity to the image submitted by the user.
ISTORAMA: http://uranus.ee.auth.gr/Istorama
Contact: Yiannis Kompatsiaris
Greek Secretariat for Research and Technology (GSRT)
Intrasoft S.A.
Informatics and Telematics Institute (ITI)
Lab: http://uranus.ee.auth.gr
Email: ikom@dion.ee.auth.gr
15. Movie Tool MovieTool is a description tool for video. MovieTool can read and write a content description file in the format
defined in the MPEG-7 MDS Working Draft 3.0. Using this tool, we can construct a structure for the content; this step
is performed manually or automatically, and some automatic segmentation methods based on content analysis are
provided. We then select a key frame, extract content-based features, and a syntactic structure is generated. After that,
we can enter values for the attributes of each segment and of the whole content.
MovieTool provides many attributes that are defined in the MPEG-7 description schemes.
MovieTool is a component of a Multimedia Content Retrieval System that we are developing. In this system, we
register a content description in a database through an MPEG-7 file. Through a simple Web interface, we supply
attribute values as search conditions, and the scenes or contents we want are listed. We can select some of them and
play them in a specified order.
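To illustrate the kind of structure MovieTool produces, the sketch below builds a list of video segments with key frames and attribute values and serializes it to a simplified XML description. The element names are invented for illustration; they are not the MPEG-7 MDS Working Draft 3.0 syntax that MovieTool actually reads and writes.

```python
# Illustrative sketch of a segment structure with key frames and attributes,
# serialized to a simplified (non-MPEG-7) XML description.
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Segment:
    start_frame: int
    end_frame: int
    key_frame: int
    attributes: dict = field(default_factory=dict)   # e.g. {"Keyword": "interview"}

def to_xml(title: str, segments: list[Segment]) -> str:
    root = ET.Element("VideoDescription", {"title": title})
    for seg in segments:
        s = ET.SubElement(root, "Segment",
                          {"start": str(seg.start_frame), "end": str(seg.end_frame)})
        ET.SubElement(s, "KeyFrame", {"frame": str(seg.key_frame)})
        for name, value in seg.attributes.items():
            ET.SubElement(s, "Attribute", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")

# Example: two manually created segments for a short clip
print(to_xml("Demo clip", [Segment(0, 120, 40, {"Keyword": "opening"}),
                           Segment(121, 300, 200, {"Keyword": "interview"})]))
```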
Contact: Takahiro Kunieda, Yuki Wakita
Ricoh Co. Ltd. Tokyo, Japan,
Tel: +81-3-3815-7261, Fax: +81-3-3818-0348
Email: kunieda@src.ricoh.co.jp, wakita@src.ricoh.co.jp
16. RETRIEVE The RETRIEVE project will demonstrate that advances in multimedia technology can be applied to CCTV schemes to
improve the quality of life in the UK, by reducing crime or increasing detection rates through the location of appropriate
digital video evidence. The project aims to maximise the automatic generation of metadata associated with images so
that it can be applied in real time as the surveillance video is captured, compressed and transmitted. The effectiveness of
CCTV systems will be enhanced if there are efficient search and retrieval mechanisms for the large digital archives.
The RETRIEVE project involves research and development in the converging worlds of high-speed digital
communications (ATM) and multimedia applications; the video images captured by the surveillance cameras will be
compressed and transmitted electronically to digital archives together with metadata which will facilitate searches of the
stored image databases. The main innovation of this project is the definition and generation of metadata relating to the
wavelet compressed video stream and the real-time automated analysis and tagging of compressed video fields. The
project will also investigate and refine digital image authentication mechanisms such as watermarking, audit trails and
digital signatures for surveillance applications. This work should lead to significant advances for the security industry
which is increasingly aware that in order to upgrade and increase the performance of expanding CCTV systems it has to
adopt digital technology.
Contact: Kate Grant Email: Nine_Tiles@psilink.co.uk
17. Sign Language Indexation Here, we address the issue of sign language indexation/recognition. The existing tools, such as on-line Web
dictionaries or other education-oriented applications, make exclusive use of textual annotations. However, keyword
indexing schemes have strong limitations due to the ambiguity of natural language and to the huge effort needed to
manually annotate a large amount of data. In order to overcome these drawbacks, we tackle the sign language
indexation issue within the MPEG-7 framework and propose an approach based on linguistic properties and
characteristics of sign language. The "shape" of a sign is characterized by performing advanced image processing
procedures. The method developed introduces the concept of a hand configuration that is stable over time, instantiated
on natural or synthetic prototypes. The prototypes are indexed by means of a shape descriptor defined as a translation-,
rotation- and scale-invariant Hough transform. We also show how meaningful descriptors such as finger configuration,
hand position and palm orientation, together with arm and hand motion, can be combined into a complete sign language
description scheme. A demonstration will be presented by applying the proposed approach to data sets consisting of
“Letters” and “Words”, respectively.
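The project's shape descriptor is a translation-, rotation- and scale-invariant Hough transform. As a simpler stand-in that only illustrates what those three invariances mean, the sketch below computes classical Fourier descriptors of a hand contour, which achieve the same invariances by a different route; it is not the method used in this work, and the contour input is an assumption for illustration.

```python
# Illustrative Fourier-descriptor sketch of a translation/rotation/scale-invariant
# shape description (a stand-in, NOT the Hough-transform descriptor of this project).
import numpy as np

def invariant_shape_descriptor(contour_xy: np.ndarray, n_coeffs: int = 16) -> np.ndarray:
    """contour_xy: N x 2 ordered boundary points of the hand silhouette."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex contour representation
    z = z - z.mean()                               # translation invariance
    spectrum = np.fft.fft(z)
    mags = np.abs(spectrum)                        # dropping phase gives rotation and
                                                   # start-point invariance
    mags = mags / (mags[1] + 1e-12)                # scale invariance: normalize by the
                                                   # first harmonic (entry 1 becomes 1)
    return mags[1:1 + n_coeffs]
```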
18. TV ANYTIME The ISO/MPEG group has identified a wide range of application scenarios for its emerging MPEG-7 standard on
audio-visual metadata. TV Anytime, with its vision of future digital TV services, encompasses a large number of them.
As TV Anytime has also identified metadata as one of the key requirements to realize that vision, MPEG-7 is the
natural candidate to fill that role. Here, we describe technically how metadata for the TV Anytime scenario can be
created using MPEG-7.
Digital broadcasting offers the opportunity to provide value-added interactive services that allow end users to
personalize and control the material of interest, evolving TV into an integrated entertainment/information
gateway. The MPEG-7 collection of descriptors and description schemes for multimedia is able to fulfill the metadata
requirements for TV Anytime.
Contact: Silvia Pfeiffer, Uma Srinivasan
CSIRO Mathematical and Information Sciences
Locked Bag 17, North Ryde NSW 1670, Australia
Phone: +61 2 9325 3144, Fax: +61 2 9325 3200
{ Silvia.Pfeiffer | Uma.Srinivasan }@cmis.csiro.au
19. Image Browsing System The following tool was selected to be an MPEG-7 Core Experiment.
Title: Class and image browsing system for an image collection
Short Description:
It provides browsing capabilities for semantic classes, images per class, individual images, and classes per image.
Function in one statement:
Class and image browser for image collections
Benefits to users:
The collection description allows the creation of a flexible browsing interface for the images and classes defined for the
collection, personalized for each user (see the data-structure sketch at the end of this entry).
How the DS is similar to or different from related Ds:
It uses the Collection Structure DS and the Model DS.
Demos/Applications available:
Yes: http://www.mpeg7.ee.columbia.edu/
(Collection Structure DS)
Contact: Ana Belen Benitez
Electrical Engineering Department, Columbia University
1312 Mudd, #F6, 500 W. 120th St, MC 4712, New York, NY 10027
Voice: +1 212 854-7473 Fax: +1 212 932-9421
Email: ana@ee.columbia.edu
URL: http://www.ee.columbia.edu/~ana/
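A minimal data-structure sketch of the two browsing views described in this entry (images per class and classes per image) is given below. The class and method names are assumptions for illustration and do not come from the Columbia demo; both views stay consistent because the many-to-many mapping is maintained in both directions.

```python
# Minimal sketch of a bidirectional image <-> class index for collection browsing.
from collections import defaultdict

class CollectionIndex:
    def __init__(self):
        self._images_by_class = defaultdict(set)
        self._classes_by_image = defaultdict(set)

    def assign(self, image_id: str, class_label: str) -> None:
        """Record that an image belongs to a semantic class (many-to-many)."""
        self._images_by_class[class_label].add(image_id)
        self._classes_by_image[image_id].add(class_label)

    def images_in_class(self, class_label: str) -> set:
        return set(self._images_by_class.get(class_label, ()))

    def classes_of_image(self, image_id: str) -> set:
        return set(self._classes_by_image.get(image_id, ()))

# Example
idx = CollectionIndex()
idx.assign("img001.jpg", "beach")
idx.assign("img001.jpg", "sunset")
idx.assign("img002.jpg", "beach")
print(idx.images_in_class("beach"))        # {'img001.jpg', 'img002.jpg'}
print(idx.classes_of_image("img001.jpg"))  # {'beach', 'sunset'}
```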
20. Scene retrieval for Sports Events
Aerobatic Grand Prix 2000
URL: http://ab-movie.ricoh.co.jp
Contact: Takahiro Kunieda, Yuki Wakita
Ricoh Co. Ltd. Tokyo, Japan,
Tel: +81-3-3815-7261, Fax: +81-3-3818-0348
Email: kunieda@src.ricoh.co.jp, wakita@src.ricoh.co.jp
21. COMVID
The scientific goal of this assessment project is to develop and test low bit-rate, high-quality video compression based
on new, innovative segmentation algorithms. These segmentation algorithms will make it possible to extract and track
objects across consecutive video frames. The plan is to set up a framework of tools for content-based multimedia
applications. Automatic interaction with the content, to achieve low bit-rate video compression, will be based on the
extracted segments. This will enable substantial improvements in coding efficiency and will also allow efficient coding
of multiple concurrent data streams. The proposed video codec will not require extensive computing
power.
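As a much-simplified stand-in for the segmentation step described above, the sketch below extracts a moving-object mask by differencing consecutive grayscale frames and tracks its bounding box from frame to frame. The threshold and the overall approach are illustrative assumptions only; COMVID's segmentation algorithms are considerably more sophisticated than frame differencing.

```python
# Simplified stand-in for object extraction and tracking across consecutive frames.
import numpy as np

def moving_object_mask(prev_frame: np.ndarray, frame: np.ndarray, thresh: float = 25.0):
    """Both frames: H x W grayscale arrays.  Returns a boolean mask of changed pixels."""
    return np.abs(frame.astype(float) - prev_frame.astype(float)) > thresh

def bounding_box(mask: np.ndarray):
    """Smallest (top, left, bottom, right) box containing the mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

def track(frames):
    """Yield one bounding box per frame transition; frames is an iterable of arrays."""
    frames = iter(frames)
    prev = next(frames)
    for frame in frames:
        yield bounding_box(moving_object_mask(prev, frame))
        prev = frame
```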
Objectives:
The objectives of this project are to develop and test a new, innovative framework for extracting visual objects that
will make it possible to achieve very low bit-rate, high-quality video compression that outperforms existing methods. It
could also be added later to the MPEG-4 standard, which recommends using object manipulation to achieve low bit-rate
compression but for which the extraction of these visual objects remains an open problem. The proposed methodology
can substantially enhance the performance of MPEG-4 and MPEG-7.
Typical applications for the proposed technology are:
• interpersonal real-time video communication with low bit rate,
• broadcast video distribution with low bit rate,
• content manipulation of video for home video production,
• mobile multimedia,
• content-based storage and retrieval,
• streaming video on the Internet/Intranet,
• digital set-top box and many more.
Contact Person:
Name: AVERBUCH, Amir
Tel: +972-3-6422020
Fax: +972-3-6422020
Email: amir@math.tau.ac.il
22. HEALTHSAT - Health Interactive Satellite Channel
R&D & feasibility study on the first interactive medium for distributing health and wellness programmes and
personalised services to European citizens, via satellite and Internet direct to home digital TV sets, computers displays
and mobile terminals. HEALTHSAT intends to develop advanced prototypes that will be mostly innovative in 3 years
time: an integrated, open satellite+Web+WAP/UMTS platform the first integrated telecommunications multi-platform
combining satellite, Internet and mobile technologies, fully compliant with all standards to come, allowing in the same
time streaming and interactivity, and any-where, any-time access to programmes and services, and new techniques for
producing programmes matching the requirements of broadcasting (digital TV), multicasting (satellite) and narrow
casting (Web). Interactivity will be introduced through the utilisation of both the Internet technologies and the satellite
Broadband Interactive System (BBI).
Work description:
The satellite platform will build on technical components that are currently considered state of the art. Further
developments conducted in the project will match these components with the requirements of both multicasting and
interactivity, and address issues such as newscasting, Webcasting, multiple transponder load sharing, quality of service,
bandwidth management and the Broadband Interactive return path. Two different systems will be developed for
building and integrating the Web platform into the satellite platform: a traffic system for analysing editorial needs and
programme traffic, with multi-format playlists, multi-play and end-user profiles; and a digitisation system for
automating encoding and playout. The development of XML applications for the satellite + Web platform will
allow testing of the integration of a WAP/UMTS platform into the former. Important normative work will be conducted
in the project (DVB/RCS, MPEG-4, MPEG-7). Throughout the project, innovative health services and tools will be
developed and offered on the Web; they will be linked to broadcasts and to videos in the database. New production
methodologies matching multicasting requirements will be developed and tested in the project, relating to
graphical design and on-air look definition, editing, production, dubbing, transmission and contribution.
Implementation of large-scale broadcasts via the digital satellite system will serve 9 5 million already-equipped
households (streaming, narrowcasting, Webcasting, newscasting). Moreover, 500 additional households will be
equipped for testing all aspects of interactivity. An advisory board will be established to ensure the quality and
relevance of the programmes produced and to participate in the editing of institutional programmes. Health information
will be designed to help citizens understand various health-related issues and maintain a healthy lifestyle.
Project URL: http://www.healthsat.org
Contact Person:
Name: PRUMMEL, Claire
Tel: +31-78-6310105
Fax: +31-78-6313563
Email: claire.prummel@mmc-europe.com
23. MASCOT - Metadata for Advanced Scalable Video Coding Tools
The explosion of multimedia applications leads to a great expansion of video transmission over heterogeneous channels.
These developments increase the need for highly flexible and scalable video compression systems. For all currently
available interactive multimedia applications, however, which are demanding in terms of video quality and coding
efficiency, the cost as well as the limited scalability remain unacceptable. Therefore, MASCOT seeks to design an
intrinsically scalable video-coding scheme providing fully progressive bit streams by exploiting novel morphological
and adaptive wavelet decomposition methods. Furthermore, the project aims to improve the quality and efficiency of
video coding systems by exploiting metadata information.
Objectives:
1. To develop scalable compression schemes, which fulfil the requirements of multimedia applications, by covering a
wide range of bit-rates, by yielding high compression ratios, and by providing a high level of bit stream embeddedness
to enable the adaptation of the compressed video data to a variety of networks and receivers.
2. To provide a breakthrough in the domain of video compression and to improve the quality of the reconstructed video
at low bit rates by exploiting non-linear (morphological, adaptive) wavelet decompositions, by using metadata during
the encoding step, and by development and optimisation of advanced prediction schemes.
3. To provide Europe with leadership in new video compression techniques.
4. To contribute to standardisation committees like ITU-T and MPEG, and to follow the activities of MPEG-7 (and
MPEG-21) as well as JPEG-2000.
Work description:
The work will consist of the following ingredients:
1. The analysis of the descriptors and description schemes included in the MPEG-7 standard or in other metadata
standards such as SMPTE. Can such descriptors lead to the development of new coding techniques? Besides new tools
that may be developed based on specific descriptors or description schemes, the access to metadata also opens the door
to the design of new encoding strategies.
2. Investigation of new spatiotemporal decompositions for image and video coding and the development of new
functionalities in the domain of video compression, enabling user-friendly and flexible multimedia applications as well
as facilitating the interoperation of heterogeneous systems. Towards this goal, new non-linear (morphological, adaptive)
wavelet representations for video compression will be designed (a small lifting-scheme sketch follows this list). Such
wavelet representations will be exploited for special applications, e.g. texture extraction, edge detection and motion estimation.
3. The investigation of various parameters describing changes in time between pictures, in order to capture the
temporal redundancy within a picture sequence and reduce the amount of data that needs to be coded. This investigation
will cover the nature of the parameters to use as well as the choice of the picture representation on which they act
(space domain or wavelet domain). It will also include the choice of adaptive representations that enable us to take
optimal advantage of this redundancy.
4. An encoder and decoder system will be developed according to the project's proposal. The focus of this codec will lie
on scalable coding and on the use of metadata (MPEG-7) to control the coding process. This new scheme will be
compared with H.26L and MPEG-4 and a demonstration at a major international exhibition will be given.
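The sketch below shows the lifting scheme for a 1-D Haar wavelet, the basic building block that the adaptive and morphological (non-linear) liftings mentioned in point 2 generalize by replacing the predict and update steps with data-dependent or rank-order operators. It is purely illustrative and is not the MASCOT codec.

```python
# Lifting-scheme sketch for a 1-D Haar wavelet (illustrative building block only).
import numpy as np

def haar_lifting_forward(signal: np.ndarray):
    """signal: 1-D array of even length.  Returns (approximation, detail)."""
    even, odd = signal[0::2].astype(float), signal[1::2].astype(float)
    detail = odd - even            # predict: odd samples predicted from even neighbours
    approx = even + detail / 2.0   # update: preserve the running average
    return approx, detail

def haar_lifting_inverse(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    even = approx - detail / 2.0
    odd = detail + even
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
a, d = haar_lifting_forward(x)
assert np.allclose(haar_lifting_inverse(a, d), x)   # lifting gives perfect reconstruction
```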
Project URL: http://www.cwi.nl/projects/mascot/
Contact Person:
Name: HEIJMANS, Henk (Doctor)
Tel: +31-20-5924057
Fax: +31-20-5924199
Email: Henk.Heijmans@cwi.nl
24. METAVISION To define, create and demonstrate a Universal Electronic Production system capable of meeting the demands of both the
Film and Television Industries. Film as a production format is flourishing, fuelled by the inertia of the feature film
industry and demand for High Definition TV programmes in both the United States and other parts of the world. The
film industry increasingly makes use of electronic post-production techniques and is seriously considering electronic
distribution now that TV production technology is approaching acceptable performance. We aim to create a
production chain to demonstrate that a completely electronic, high-resolution capture, editing, storage, distribution and
asset tracking system can now be devised and built. Propagation of metadata (conforming to standards currently under
development) through the system will ensure that each archived version will contain a history of the processing used
since its creation.
Objectives:
The goals of the METAVISION project are to revolutionise the way films and TV programmes are currently captured,
produced, stored and distributed. Its innovative electronic production system reduces the cost of film production, allows
more artistic flexibility in shooting and editing for film, and allows integration of real and virtual images at source quality
for film production and in the compressed domain for use in TV studios. Content may be readily converted between
existing distribution media (film, HDTV, SDTV) and existing compression formats (MPEG-2). The system will take
into account the future requirements of compression schemes and metadata carriage currently under consideration in the
standards bodies (SMPTE) for standards such as MPEG-4, MPEG-7, and will allow material to be archived at various
reference qualities depending on application.
Project URL: http://www.ist-metavision.com
Contact Person:
Name: WALLAND, Paul
Tel: +44-1730-818715
Fax: +44-1730-881199
Email: paul.walland@snellwilcox.com
REFERENCES
MPEG-7 Web Sites There are a number of documents available at the MPEG Home Page (http://drogo.cselt.it/mpeg/). Information more
focused on industry is also available at the MPEG-7 Alliance Web site (http://www.mpeg-industry.com).
MPEG-7 Personal Contacts
For Technical Issues:
Requirements: Fernando Pereira (fp@lx.it.pt)
Audio: Juergen Herre (hrr@iis.fhg.de)
Visual: Miroslaw Bober (miroslaw.bober@vil.ite.mee.com)
Multimedia DS: John Smith (jsmith@us.ibm.com)
Systems: Olivier Avaro (olivier.avaro@francetelecom.fr)
XM Software Implementation: Stephan Herrmann (stephanh@lis.e-technik.tu-muenchen.de)
MPEG-7 Alliance: Neil Day (neil@mpeg-industry.com)