Description and search of multimedia data
Digital Content Retrieval
Prof.ssa Maria Grazia Albanesi
Topics
The problem: searching in multimedia data
Why is it different from text search?
How to make a search in a database of multimedia (MM)
From data to information: what is the difference?
How to evaluate a system for finding information in MM data?
Case studies
References
Book:
H. Blanken, A. P. de Vries, H. E. Blok, L. Feng: “Multimedia
Retrieval”, Springer, 2007.
Web
http://labs.exalead.com/applications
The problem
Subtitle of the book: Data-Centric Systems and Applications.
What does this mean?
Purpose of the lesson: answer the following questions:
Why is searching multimedia data different from searching text?
How can the media content (the target of the search) be described?
What is the quality of the search process?
How can the user interact with the search system and under
what constraints?
Diagram of a system for storage and retrieval of MM data
Terminology
Indexing
Search (better: retrieval)
Querying (executing a search query)
Browsing (searching without criteria, or with only loosely binding
criteria, by examining data presented in an automatic or semi-automatic
way)
In which contexts is search a daily problem? Reference: textbook (introduction)
Journalism: a journalist must prepare an MM report on the
effects of alcohol on driving
Television: I am watching TV and want to find a program in the
broadcaster's archive.
Sports-related web search:
"Get information about U.S. tennis players, including video
clips of games that show a player going to the net"
Other examples?
Retrieval of text vs. retrieval of MM
Text is stored in a relational database with a rigid (but dynamic) structure
Do you see the difference?
Employees (Name: char(20), City: char(20), Photo: image)
SELECT Name FROM Employees WHERE City = 'Pavia'
There is a standard language, SQL, with slight variations (e.g. FQL)
But what if the target of the search were:
"I want the names of all the bald employees"?
First problem: the language and the relational structure cannot
analyze the semantics of multimedia information; they analyze only the DATA.
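A minimal sketch of this first problem, using Python's built-in sqlite3 module and an in-memory table modelled on the Employees example. The rows, the column name Photo and the photo bytes are made up for illustration. The query on City works because it compares stored data; no comparable predicate exists for "bald", because SQL sees the photo only as an opaque sequence of bytes.

```python
import sqlite3

def names_in_city(conn, city):
    """Exact-match query on an alphanumeric column: SQL handles this well."""
    rows = conn.execute(
        "SELECT Name FROM Employees WHERE City = ? ORDER BY Name", (city,)
    )
    return [name for (name,) in rows]

# Hypothetical in-memory database modelled on the Employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name CHAR(20), City CHAR(20), Photo BLOB)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Rossi", "Pavia", b"fake-image-bytes-1"),
        ("Bianchi", "Milano", b"fake-image-bytes-2"),
        ("Verdi", "Pavia", b"fake-image-bytes-3"),
    ],
)

print(names_in_city(conn, "Pavia"))
# The Photo column can only be stored, returned or compared byte by byte:
# nothing in the schema or in SQL says whether the person shown is bald.
```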
What is a semantic aspect?
"Semantics is the part of linguistics that studies the meaning of words
(lexical semantics), of sets of words and phrases (phrasal semantics) and of texts."
(Source: Wikipedia)
And the semantic web?
The term Semantic Web, coined by its inventor Tim Berners-Lee, denotes the
transformation of the World Wide Web into an environment where the
published documents (HTML pages, files, images, and so on) are associated
with information and data (metadata) that specify their semantic context in
a format suitable for querying, interpretation and, more generally,
automatic retrieval.
Example: Beethoven's Sixth Symphony and 3D TV glasses: what is the
connection?
Terminology
Multimedia = more than one medium (media)
The term medium can have multiple meanings:
Means of communicating information
Means of support (storage) for information: this is NOT our case
Type of structured data
Text, web pages, HTML, audio, video, still images (with or without audio),
graphics, topographic data, 3D views, virtual and augmented reality, ...
Multimedia: a collection of more than one type of media used together
At least one type of medium should be non-alphanumeric
Only digital media
Text can contain only alphanumeric characters. Is the converse also
true?
Another problem!
Compared to text, multimedia occupies more space
Storage problem
Search problem (the search space can be huge)
Taken from the textbook:
Storage:
  A book of 500 pages     2 MB
  100 color images        144 MB
  1 hour of CD audio      635 MB
  1 hour of video         68.4 GB
Comment: are these figures reliable?
This problem affects those who deal with storage and those who design
search algorithms, not those who measure the effectiveness of the
search.
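One way to check whether such figures are plausible is to compute uncompressed sizes directly. The sketch below does so in Python. All the parameters are assumptions chosen for illustration, not values stated in the textbook: 4000 one-byte characters per page, 800x600 images at 24 bits per pixel, CD audio at 44.1 kHz with 16-bit stereo samples, and 640x480 video at 25 frames per second.

```python
def text_bytes(pages, chars_per_page, bytes_per_char=1):
    """Plain text: one byte per character by default."""
    return pages * chars_per_page * bytes_per_char

def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed raster image (3 bytes per pixel = 24-bit colour)."""
    return width * height * bytes_per_pixel

def audio_bytes(seconds, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Uncompressed PCM audio; the defaults are the CD audio parameters."""
    return seconds * sample_rate * channels * bytes_per_sample

def video_bytes(seconds, width, height, fps, bytes_per_pixel=3):
    """Uncompressed video without audio: one raw image per frame."""
    return seconds * fps * image_bytes(width, height, bytes_per_pixel)

MB, GB = 10**6, 10**9
print("book      :", text_bytes(500, 4000) / MB, "MB")
print("100 images:", 100 * image_bytes(800, 600) / MB, "MB")
print("1 h audio :", audio_bytes(3600) / MB, "MB")
print("1 h video :", video_bytes(3600, 640, 480, 25) / GB, "GB")
```

Under these assumptions the first three results agree with a 2 MB book, 144 MB for 100 colour images and roughly 635 MB of audio. The video total depends strongly on frame size, frame rate and colour depth, none of which the list states, so the assumed parameters give a figure of the same order of magnitude rather than exactly 68.4 GB.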
How to reduce the problem to a simpler case
Metadata
Metadata describes some aspect of the meaning of multimedia data, turning it
into "information."
Photos of a painted apple
Metadata:
painting, apple, Cezanne
botany, apple, fruit, Stark
Metadata
Should be as descriptive as possible of the given MM data
Should not introduce too much storage overhead
The comparison between two metadata items must be "fast"
Descriptiveness:
Descriptive information: concerns the format or a fact connected with the
given MM data
Author, creation date, length of the MM data, representation technique
(Dublin Core)
Content annotations: give information on the content of the given MM data
Semantic annotations: give information on the meaning of the given MM
data
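The three levels of descriptiveness, and the requirement that comparing two metadata items be fast, can be sketched as a small record type. This is an illustration under assumptions: the class MMMetadata is hypothetical, the field names creator, date and format are borrowed from Dublin Core element names, and the Jaccard coefficient is just one cheap choice for comparing annotation sets.

```python
from dataclasses import dataclass, field

@dataclass
class MMMetadata:
    """Hypothetical metadata record for one multimedia object."""
    # Descriptive information (names borrowed from Dublin Core elements).
    creator: str = ""
    date: str = ""
    format: str = ""
    # Content annotations: what appears in the object.
    content: frozenset = field(default_factory=frozenset)
    # Semantic annotations: what the object means.
    semantic: frozenset = field(default_factory=frozenset)

def jaccard(a, b):
    """Set similarity |a & b| / |a | b|; 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Two annotated photos of an apple (tags taken from the example above).
painted = MMMetadata(format="image/jpeg",
                     content=frozenset({"painting", "apple", "Cezanne"}))
botanical = MMMetadata(format="image/jpeg",
                       content=frozenset({"botany", "apple", "fruit", "Stark"}))

print(jaccard(painted.content, botanical.content))
```

The two photos share only the tag "apple" out of six distinct tags, so a comparison on annotations alone rates them as weakly similar even though both show the same kind of object.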
Exercise
Which semantic annotations would you associate with a photo album of
historical agricultural tools?
Problems of metadata-annotations
Annotations (both semantic and content) are added manually or in a
semi-automatic way (dubious!)
The method is slow, expensive and subject to incompleteness and
subjectivity
Example:
One day, searching for Pavia on Google Images, I found in first place: [image]
In second place!! [image]
Problems of metadata-annotations
Annotations are often entered manually (high cost)
The criterion by which the data are annotated is
not clear
Problem of synonyms and ambiguity: different words can share a
meaning, and the same word can have different meanings
Updating costs are very high
The criteria for data entry are almost never uniform
…..
Second problem: how to extract the information to put in the annotations?
Low-level features
Statistical analysis (e.g. the recurrence of certain words)
Analysis of color (e.g. color histogram)
Extraction from video clips (e.g. motion analysis, lighting, ...)
Can feature extraction be automated? Low-level features are easy and
quick to extract, but can be very limiting
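As an illustration of a low-level feature that is easy to automate, here is a minimal sketch of a normalized color histogram in plain Python. It assumes the image is already available as a sequence of (r, g, b) tuples with values in 0..255; the function name and the number of bins per channel are arbitrary choices for this example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram of an iterable of (r, g, b) tuples in 0..255.

    Each channel is quantized into bins_per_channel levels, giving
    bins_per_channel ** 3 bins; the returned frequencies sum to 1.
    """
    n = bins_per_channel
    hist = [0] * n ** 3
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 value to a bin index in 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1
        count += 1
    return [h / count for h in hist] if count else [0.0] * len(hist)

# Toy "image": three pure red pixels and one pure blue pixel.
toy_image = [(255, 0, 0)] * 3 + [(0, 0, 255)]
print(color_histogram(toy_image, bins_per_channel=2))
```

The histogram is cheap to compute and to compare, but it says nothing about where the colors are or what they depict, which is why such features can be limiting.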
High-level features
Represent the meaning or content of the MM as seen from the
point of view of the user. Ex: automatic translators
Semantic gap
Between low-level and high-level features there is a semantic GAP
How vast is the semantic GAP?
It is the distance between the high-level description and the MM data
To understand how big the gap is, take the opposite route:
from annotation to MM data
Game: What's this?
It can have the simple wave
It can also be a man
It can be compressed
If you give it to someone hurts!
May be of milk
If watered is thrown
SOLUTION: ????????
In English it is completely meaningless!!
Semantic gap: the game in Italian
Può avere l’onda semplice
Lo può essere anche un uomo
Può essere compresso
Se lo dai a qualcuno fa male!
Può essere di latte
Se lo bagni si butta
Solution: ????
CONCLUSION: semantics is language dependent
Exercise: create a meaningful exercise in English (or French)
Content-Based Retrieval in databases
Meaning of the term Content-Based Retrieval
An attempt to bridge the semantic gap
Utility and application fields
Techniques for still images
Fundamental concepts to describe and evaluate the
algorithms
Architecture of a CBR-System
Storage: the most obvious aspect of the system, responsible for
delays and degradation of QoS. For this reason, the MM data
and the metadata are often kept on separate servers.
Indexing: features can be derived from the records, the content or the
semantics of the data. Features occupy space and must in
turn be indexed (in the classical database sense)
Metadata depend not only on the MM data: there can also be
dependencies among the metadata themselves.
Maintenance of a MIRS
One of the most underestimated aspects. Maintenance is incremental
Why must a system provide for maintenance?
MM objects can change. Their features must then be updated
accordingly. It is a recursive process.
The algorithms used to extract the features can change.
Dependencies between data can change.
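A minimal sketch of incremental maintenance, under stated assumptions: each index entry stores a content digest and the version of the extraction algorithm, and features are recomputed only when one of the two has changed. The names refresh_index, extract_features and EXTRACTOR_VERSION are hypothetical, and the extractor is a trivial placeholder.

```python
import hashlib

EXTRACTOR_VERSION = 1  # hypothetical: bump when the extraction algorithm changes

def extract_features(data):
    """Placeholder extractor (assumption): byte length and mean byte value."""
    return (len(data), sum(data) / len(data) if data else 0.0)

def refresh_index(objects, index, version=EXTRACTOR_VERSION):
    """Re-extract features only where the object or the extractor changed.

    objects: dict name -> bytes
    index:   dict name -> (digest, version, features), updated in place
    Returns the names whose features were (re)computed.
    """
    updated = []
    for name, data in objects.items():
        digest = hashlib.sha256(data).hexdigest()
        entry = index.get(name)
        if entry is None or entry[0] != digest or entry[1] != version:
            index[name] = (digest, version, extract_features(data))
            updated.append(name)
    # Drop index entries for objects that no longer exist.
    for name in list(index):
        if name not in objects:
            del index[name]
    return updated

store = {"a.jpg": b"abc", "b.jpg": b"xyz"}
idx = {}
print(refresh_index(store, idx))   # first pass: everything is extracted
store["a.jpg"] = b"abcd"           # one object changes
print(refresh_index(store, idx))   # only the changed object is recomputed
```

Dependencies between metadata are not modelled here; a fuller system would also have to propagate an update to every feature derived from a changed one, which is what makes the process recursive.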
Searching: the paradigms
What information do you use?
Data on perceptual content (vision, hearing)
Data on low/intermediate-level features
(content-dependent metadata, often perceptual)
Data on semantic content (metadata describing
the content, relationships between the entities and
attributes of the images and real-world objects)
Extraction procedure of the metadata
Descriptors are pre-computed for every MM object in the DB.
Queries are expressed in perceptual terms (visual, auditory)
The examples can be supplied by the user or taken from images
offered by the IR system
To satisfy a query, the system checks the similarity between the
descriptors of the visual content of the query and those of the DB
Iterative relevance feedback techniques are often used
Retrieval by content is based on the concept of similarity, which is very different from retrieval by exact matching:
Matching is a binary partitioning operation. Objective: to determine whether or not an item corresponds to a model (classification)
Similarity-based retrieval reorders the MM data in the DB according to their similarity to the query (ranking), even if none of the data has characteristics close to those of the given example.
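The contrast between matching and ranking can be sketched in a few lines of Python. The descriptors below are made-up three-component vectors, and the Euclidean distance is used only as one possible dissimilarity measure.

```python
def exact_match(db, query):
    """Binary partition: an object either equals the query descriptor or not."""
    return [name for name, descriptor in db.items() if descriptor == query]

def rank_by_similarity(db, query, k=None):
    """Order every object by increasing distance from the query descriptor."""
    def distance(descriptor):
        return sum((a - b) ** 2 for a, b in zip(descriptor, query)) ** 0.5
    ranked = sorted(db, key=lambda name: distance(db[name]))
    return ranked if k is None else ranked[:k]

# Hypothetical descriptors (for instance, coarse color proportions).
db = {
    "img1": (0.9, 0.1, 0.0),
    "img2": (0.2, 0.7, 0.1),
    "img3": (0.5, 0.4, 0.1),
}
query = (0.8, 0.2, 0.0)

print(exact_match(db, query))         # empty: no descriptor is identical
print(rank_by_similarity(db, query))  # always returns a full ordering
```

Exact matching finds nothing here, while ranking still returns an answer, ordered from the closest descriptor to the farthest.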
Similarity
How similar are two stimuli?
The similarity between perceptual stimuli is determined by measuring an appropriate distance in a metric space
An appropriate distance function (or metric) can be used to measure the distance between two stimuli
Given two vectors V1 and V2 in n-dimensional space, some commonly used distance functions are:
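Euclidean distance:
$$D_E = \sqrt{\sum_{i=1}^{n} \left( V_1(i) - V_2(i) \right)^2}$$
City-block distance:
$$D_C = \sum_{i=1}^{n} \left| V_1(i) - V_2(i) \right|$$
Weighted total distance:
$$D_T = \sum_{i} w_i D_i$$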
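A minimal Python sketch of three commonly used distance functions: the Euclidean distance, the city-block distance, and a weighted sum that combines several partial distances into a total. The function names are illustrative, and treating the partial distances as one value per feature is an assumption.

```python
import math

def euclidean(v1, v2):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def city_block(v1, v2):
    """Sum of the absolute component differences."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

def weighted_total(distances, weights):
    """Weighted sum of partial distances, e.g. one distance per feature."""
    return sum(w * d for w, d in zip(weights, distances))

v1, v2 = (0, 0), (3, 4)
d_e = euclidean(v1, v2)
d_c = city_block(v1, v2)
print(d_e, d_c, weighted_total([d_e, d_c], [0.5, 0.5]))
```

The weights let the system or the user decide how much each feature (color, shape, texture, ...) contributes to the overall ranking.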
Relevance Feedback
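One classical way to implement relevance feedback on feature vectors is a Rocchio-style query update, sketched below. This is a swapped-in illustration, not necessarily the scheme intended in the lecture: the query vector is moved toward the centroid of the results the user marked as relevant and away from the centroid of those marked as not relevant. The weights alpha, beta and gamma are conventional assumptions.

```python
def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Return a new query vector after one round of user feedback.

    query:        list of floats
    relevant:     list of vectors the user marked as relevant
    non_relevant: list of vectors the user marked as not relevant
    """
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel = centroid(relevant)
    non = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel, non)]

# One feedback round on a hypothetical two-component descriptor.
new_query = rocchio_update([1.0, 0.0], [[1.0, 1.0]], [[0.0, 1.0]],
                           alpha=1.0, beta=0.5, gamma=0.5)
print(new_query)
```

The updated query is then resubmitted, and the loop repeats until the user is satisfied with the ranking.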
Browsing vs. searching
Frequently, users cannot specify exactly what they are looking
for.
However, they are able to recognize it if it appears in the output
This phenomenon motivates relevance feedback, but it is also the basis of browsing. We need to find a starting point.
One can even make a query with approximate parameters:
The system is asked to propose a starting point
The MM data are classified, with further sub-classifications later.
Presentation in a MIRS
The system should present the user with an ordered list of
MM objects.
Does the user have the right to see them?
Icons or thumbnails (reduced versions of the objects) are used
Real-time and network constraints (streaming)
The interface must fulfill the criteria of usability.
Performance evaluation
                Relevant       Not relevant
Retrieved       A (correct)    B (incorrectly retrieved)
Not retrieved   C (missing)    D (correct)
$$\text{Precision} = \frac{A}{A + B}$$
$$\text{Recall} = \frac{A}{A + C}$$
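A minimal Python sketch of the two measures, using the cell names A, B and C from the table. The counts in the example are hypothetical and unrelated to the exercise.

```python
def precision(a, b):
    """Fraction of retrieved objects that are relevant: A / (A + B)."""
    return a / (a + b) if a + b else 0.0

def recall(a, c):
    """Fraction of relevant objects that were retrieved: A / (A + C)."""
    return a / (a + c) if a + c else 0.0

# Hypothetical outcome: 10 objects retrieved, 8 of them relevant (A = 8,
# B = 2), and 24 relevant objects left in the database (C = 24).
A, B, C = 8, 2, 24
print(precision(A, B), recall(A, C))
```

Returning 0.0 when the denominator is zero is a convention adopted here for simplicity; some evaluations leave the measure undefined in that case.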
Exercise
An image database contains 5000 images from a museum, divided into four classes as follows:
20th-century paintings: 1200 images
Baroque statues: 1120 images
Other paintings: 1440 images
Miscellaneous items: 1240 images
A query-by-example is made by submitting to the search engine a picture of an eighteenth-century painting; the engine returns the following 20 images:
3 Baroque statues, 2 20th-century paintings, 13 paintings from other eras, 2 pieces of jewelry.
What are the Precision and Recall?
Visual Query: what perceptual stimulus?
In the case of images, there are three perceptual stimuli that can be used for a search based on the content of images (still or moving):
color
shape
texture
Given their simplicity and the lack of a temporal dimension, they are generally used on still images.
Colour
Colour descriptors:
Histograms
Dominant colours
Statistical moments computed on colour distributions
Application fields: photorealism, art,…
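As an illustration of the statistical-moment descriptor, the sketch below computes three moments per colour channel: the mean, the standard deviation, and the signed cube root of the third central moment. This choice of moments is a common one but is an assumption here, as are the function names; pixels are (r, g, b) tuples and the input must not be empty.

```python
import math

def channel_moments(values):
    """Mean, standard deviation and signed cube root of the third
    central moment of a non-empty sequence of channel values."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return mean, std, skew

def colour_moments(pixels):
    """Nine-component descriptor: three moments for each of R, G and B."""
    descriptor = []
    for channel in zip(*pixels):
        descriptor.extend(channel_moments(channel))
    return tuple(descriptor)

print(colour_moments([(0, 10, 5), (2, 10, 5)]))
```

Nine numbers per image make this descriptor far more compact than a histogram, at the cost of a coarser description of the colour distribution.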
Content-based retrieval on texture
Content-based retrieval on shape
Classical methods of analysis of the shape of a region:
The description of shape presupposes the segmentation of the image into regions.
Techniques closely dependent on the application and type of images
The shape can be described through measures such as the area, the perimeter, the eccentricity, the circularity, the orientation and the size of the main diameter, the Fourier descriptors of the contour or of a part of it ...
These descriptions all give rise to one or more numeric values (indices), and can therefore be conveniently used for retrieval.
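A minimal sketch of three such indices, assuming the segmented region is approximated by a simple polygon given as an ordered list of (x, y) vertices. The area uses the shoelace formula, and circularity is taken here as 4πA/P², one common definition that equals 1 for a circle and decreases for less compact shapes.

```python
import math

def polygon_area(points):
    """Area of a simple polygon (shoelace formula)."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1]
        - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

def polygon_perimeter(points):
    """Sum of the lengths of the polygon's edges."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def circularity(points):
    """4 * pi * area / perimeter ** 2 (0.0 for a degenerate polygon)."""
    p = polygon_perimeter(points)
    return 4 * math.pi * polygon_area(points) / p ** 2 if p else 0.0

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_area(square), polygon_perimeter(square), circularity(square))
```

Each index is a single number, so a region can be described by a short vector and compared with the same distance functions used for colour.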