Multimedia Information Retrieval Modern Information Retrieval Course Computer Engineering Department Sharif University of Technology

Post on 19-Dec-2015


Page 1: Multimedia Information Retrieval Modern Information Retrieval Course Computer Engineering Department Sharif University of Technology Spring 2006

Multimedia Information Retrieval

Modern Information Retrieval Course

Computer Engineering Department

Sharif University of Technology, Spring 2006

Sharif University, Modern Information Retrieval Course, Spring 2006

Outline

- Introduction
- Text-Based MMIR
- Content-Based Retrieval
  - Multimedia IR Model
  - Image Retrieval
  - Audio Retrieval
  - Video Retrieval
- Conclusions

Introduction


Support a variety of data

Different kinds of media:
- Image: graphs, …
- Audio: music, speech, …
- Video


MMIR Motivations

Content, content, and more content… How do we get what is needed?

- Increasing availability of multimedia information
- Difficult to find, select, filter, and manage AV content
- More and more situations where it is necessary to have ‘information about the content’


Key Issues in MMIR


Goals

Make multimedia content as searchable as text, because the value of content depends on how easy it is to find, filter, manage, and use.

Need a content description method that goes beyond simple text annotation.


MMIR Approaches

- Text-based MMIR
- Content-based MMIR

Text-Based MMIR


Text-Based Retrieval

Based on text associated with the file, for example:
- URL: http://www.host.com/animals/dogs/poodle.gif
- Alt text: <img src=URL alt="picture of poodle">
- Hyperlink text: <a href=URL>Sally the poodle</a>
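The three text sources listed above can be gathered for each image with a short Python sketch that uses only the standard library. It collects words from the URL path, the alt text, and the text of hyperlinks that point to the image. The class name `ImageTextCollector` and the tokenization rule are illustrative assumptions. They do not describe how any particular search engine works.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse


def url_terms(url):
    """Split the path of an image URL into lowercase word tokens."""
    path = urlparse(url).path.lower()               # e.g. /animals/dogs/poodle.gif
    path = re.sub(r"\.[a-z0-9]+$", "", path)        # drop the file extension
    return [t for t in re.split(r"[^a-z0-9]+", path) if t]


class ImageTextCollector(HTMLParser):
    """Hypothetical helper: map each image URL to the text found around it."""

    def __init__(self):
        super().__init__()
        self.terms = defaultdict(list)   # image URL -> list of index terms
        self._open_link = None           # href of the <a> element we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            src = attrs["src"]
            self.terms[src] += url_terms(src)                       # URL words
            self.terms[src] += (attrs.get("alt") or "").lower().split()  # alt text
        elif tag == "a" and attrs.get("href"):
            self._open_link = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_link = None

    def handle_data(self, data):
        # Text inside <a href=...> describes the resource the link points to.
        if self._open_link is not None:
            self.terms[self._open_link] += data.lower().split()


url = "http://www.host.com/animals/dogs/poodle.gif"
page = f'<img src="{url}" alt="picture of poodle"><a href="{url}">Sally the poodle</a>'
c = ImageTextCollector()
c.feed(page)
# c.terms[url] -> ['animals', 'dogs', 'poodle', 'picture', 'of', 'poodle',
#                  'sally', 'the', 'poodle']
```

A keyword query such as "poodle" would then match this image through any of the three sources. The same design also shows the weakness discussed below: the image itself is never examined.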


Text-based Search Engines

Indexing is based on the text in the containing web page, e.g. Http://www.google.com, Http://www.ditto.com, …


Keyword-based System

[Diagram: the user's information need is expressed as keywords, which are matched against automatic annotations of the video database, including the filename, video title, caption, and related web page.]


Why does this happen?

- Most of these search engines are keyword-based.
- You have to express your idea in keywords.
- These keywords are expected to appear in the filename or the corresponding web page.


Image: The Google Approach

How does image search work? Google analyzes the text on the page adjacent to the image, the image caption and dozens of other factors to determine the image content. Google also uses sophisticated algorithms to remove duplicates and ensure that the highest quality images are presented first in your results.

Examples: Campanile tcd, Cliffs of Moher

Recall may not be great…


Google image search



Problems with Text-Based

- The text in the ALT tag has to be written manually, which is expensive and time-consuming.
- It is incomplete and subjective.
- Some features, such as texture or object shape, are difficult to describe in text.


Therefore…
- Unable to handle the semantic meaning of images
- Unable to handle visual position
- Unable to handle time information
- Unable to use images as queries
- …


So …

- Better for simple concepts, e.g. "a picture of a giraffe"
- Doesn't work for complex queries, e.g. "a picture of a brick home with black shutters and white pillars, with a pickup truck in front of it" (image)

Content-Based Retrieval


Architecture for Multimedia Retrieval

[Diagram components: Feature extraction (manual / automatic); AV description; Encoding (for transmission); Transmission; Decoding (for transmission); Storage; Browse; Search / query (pull); Filter (push); Conf. points; Human or machine user]


Query-retrieval matrix

[Matrix of query types against document types]

Query types: text, stills, sketch, speech, sound, humming, examples

Document types: text, video, images, speech, music, sketches, multimedia

Examples:
- Text query on text documents: conventional text retrieval
- Hum a tune and get a music piece
- You roar and get a wildlife documentary
- Type “floods” and get BBC radio news


Main Components

- Feature extraction & analysis
- Description schemes
- Searching & filtering

Examples:
- IBM's Query By Image Content (QBIC)
- Virage's VIR Image Engine
- Online: http://collage.nhil.com/


Internal representation

- Using attributes alone is not sufficient.
- Feature: information extracted from objects.
- A multimedia object is represented as a set of features.
- Features can be assigned manually, automatically, or using a hybrid approach.


Features for MMIR

- High-level features: words and phrases from text, speech recognition
- Medium-level features: face detectors, region classifiers, outdoor, etc.
- Low-level features: Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives


Internal representation

Values of some specific features are assigned to an object by comparing the object with previously classified objects.

Feature extraction cannot be precise, so a weight is usually assigned to each feature value. The weight represents the uncertainty of assigning that value to the feature.

Example: 80% sure that a shape is a square
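Here is a minimal Python sketch of this representation. It assumes that each object stores, for every feature, the candidate values together with a certainty weight in [0, 1]. The class and field names are hypothetical. Looking up the highest certainty for a value is one simple way to use the weights, not a prescribed method.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureValue:
    value: str          # e.g. "square"
    certainty: float    # weight in [0, 1]: how sure the extractor is


@dataclass
class MultimediaObject:
    object_id: str
    # feature name -> list of FeatureValue (hypothetical layout)
    features: dict = field(default_factory=dict)

    def add(self, name, value, certainty):
        if not 0.0 <= certainty <= 1.0:
            raise ValueError("certainty must be in [0, 1]")
        self.features.setdefault(name, []).append(FeatureValue(value, certainty))

    def certainty_of(self, name, value):
        """Certainty that feature `name` has `value` (0.0 if never assigned)."""
        return max((fv.certainty for fv in self.features.get(name, [])
                    if fv.value == value), default=0.0)


obj = MultimediaObject("img-17")
obj.add("shape", "square", 0.8)      # "80% sure that a shape is a square"
obj.add("shape", "rectangle", 0.6)   # a competing, less certain interpretation
```

Keeping several weighted candidates per feature lets a query match on a less certain interpretation as well, at a lower score.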

Multimedia IR Model


MMIR Model’s Main Components

Query Language

Indexing and Searching


Query languages

In designing a multimedia query language, two main aspects require attention:
- How the user enters his/her request to the system
- Which conditions on multimedia objects can be specified in the user request


Request specification

Interfaces:
- Browsing and navigation
- Specifying, by means of queries, the conditions that the objects of interest must satisfy

Queries can be specified in two different ways:
- Using a specific query language
- Query by example: using actual data (an object example)


Conditions on multimedia data: query predicates

- Attribute predicates: concern the attributes for which an exact value is supplied for each object. They support exact-match retrieval.
- Structural predicates: concern the structure of multimedia objects. They can be answered using metadata and information about the database schema. Example: “Find all multimedia objects containing at least one image and a video clip”


Conditions on multimedia data

- Semantic predicates: concern the semantic content of the required data. They depend on the features that have been extracted and stored for each multimedia object. Example: “Find all the red houses”. Exact match cannot be applied.
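The three kinds of predicates can be contrasted in a short Python sketch. The object layout is a hypothetical assumption: an `attributes` dict, a list of component media types, and a per-object `redness` score that stands in for an extracted colour feature. The first two predicates give yes/no answers. The semantic one can only give a graded score, which is then used for ranking.

```python
# Hypothetical collection of multimedia objects.
objects = [
    {"id": "doc1", "attributes": {"author": "Smith"},
     "components": ["image", "video", "text"], "redness": 0.9},
    {"id": "doc2", "attributes": {"author": "Jones"},
     "components": ["image", "text"], "redness": 0.4},
    {"id": "doc3", "attributes": {"author": "Smith"},
     "components": ["video"], "redness": 0.7},
]


def attribute_predicate(obj, name, value):
    # Exact-match retrieval on a stored attribute value.
    return obj["attributes"].get(name) == value


def structural_predicate(obj):
    # "Find all multimedia objects containing at least one image and a video clip"
    return "image" in obj["components"] and "video" in obj["components"]


def semantic_score(obj):
    # Graded answer from an extracted feature (hypothetical "redness"):
    # exact match is impossible, so objects are ranked by this score instead.
    return obj["redness"]


by_author = [o["id"] for o in objects if attribute_predicate(o, "author", "Smith")]
with_image_and_video = [o["id"] for o in objects if structural_predicate(o)]
ranked_red = [o["id"] for o in sorted(objects, key=semantic_score, reverse=True)]
# by_author -> ['doc1', 'doc3']; with_image_and_video -> ['doc1']
# ranked_red -> ['doc1', 'doc3', 'doc2']
```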


Indexing and searching

Searching for similar patterns
- Distance function: given two objects O1 and O2, the distance (i.e., dissimilarity) between them is denoted by D(O1, O2).
- Similarity queries: whole match, sub-pattern match, nearest neighbors, all pairs
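The following brute-force Python sketch covers the whole-match, nearest-neighbour and all-pairs queries. It assumes that each object has already been mapped to a numeric feature vector and that D is the Euclidean distance. That is one common choice; the slide does not fix D. Sub-pattern matching is left out because it depends on how parts of an object are extracted. A real system would use an index such as those on the next slide instead of scanning every object.

```python
import math
from itertools import combinations


def D(o1, o2):
    """Distance (dissimilarity) between two feature vectors; Euclidean here."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o1, o2)))


def whole_match(db, query, eps):
    """All objects whose distance to the whole query is at most eps."""
    return [name for name, vec in db.items() if D(vec, query) <= eps]


def nearest_neighbors(db, query, k):
    """The k objects closest to the query."""
    return sorted(db, key=lambda name: D(db[name], query))[:k]


def all_pairs(db, eps):
    """All pairs of objects within distance eps of each other."""
    return [(a, b) for a, b in combinations(sorted(db), 2) if D(db[a], db[b]) <= eps]


# Hypothetical 2-D feature vectors.
db = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0), "d": (5.0, 6.0)}
# whole_match(db, (0.5, 0.0), 0.6)   -> ['a', 'b']
# nearest_neighbors(db, (4.0, 5.0), 2) -> ['c', 'd']
# all_pairs(db, 1.0)                 -> [('a', 'b'), ('c', 'd')]
```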


Spatial access methods

Map objects into points in f-dimensional (f-D) space, and use multiattribute access methods (also referred to as spatial access methods, or SAMs) to cluster and search them.

Methods:
- R*-trees and the rest of the R-tree family
- Linear quadtrees
- Grid files

The size of linear quadtrees and grid files grows exponentially with the dimensionality.


R-tree

- Represent a spatial object by its minimum bounding rectangle (MBR).
- Data rectangles are grouped to form parent nodes (recursively grouped).
- The MBR of a parent node completely contains the MBRs of its children.
- MBRs are allowed to overlap.
- Nodes of the tree correspond to disk pages.
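This deliberately tiny Python sketch of an R-tree illustrates the properties above. Each node reports the MBR that encloses its children, sibling MBRs may overlap, and a range search descends only into children whose MBR intersects the query window. Several parts are simplified. Insertion and node splitting, where R-tree variants such as the R*-tree differ, are omitted. The tree is built by hand, and nodes live in memory rather than on disk pages.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


def mbr(rects):
    """Minimum bounding rectangle of a collection of rectangles."""
    rects = list(rects)
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


@dataclass
class Node:
    # Child Nodes for internal nodes, (name, Rect) pairs for leaves.
    children: list = field(default_factory=list)
    leaf: bool = True

    @property
    def rect(self):
        # A parent's MBR completely contains the MBRs of its children.
        return mbr(c.rect if isinstance(c, Node) else c[1] for c in self.children)


def search(node, window, hits=None):
    """Names of data rectangles that intersect the query window."""
    hits = [] if hits is None else hits
    for child in node.children:
        if node.leaf:
            name, r = child
            if r.intersects(window):
                hits.append(name)
        elif child.rect.intersects(window):   # prune subtrees whose MBR misses
            search(child, window, hits)
    return hits


# Hand-built example: the two leaf MBRs overlap, which R-trees allow.
leaf1 = Node([("house", Rect(0, 0, 2, 2)), ("tree", Rect(1, 1, 3, 4))])
leaf2 = Node([("car", Rect(6, 6, 7, 7)), ("lake", Rect(2, 3, 8, 9))])
root = Node([leaf1, leaf2], leaf=False)
# search(root, Rect(2.5, 2.5, 3, 3)) -> ['tree', 'lake']
# search(root, Rect(6.5, 6.5, 6.6, 6.6)) -> ['car', 'lake']  (leaf1 is pruned)
```

Overlap is the price paid for grouping. A window that falls in the overlap must visit both subtrees, as the first query shows.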


Image Retrieval


Visual Features ...

- Colour
- Shape
- Texture


Histograms

Greyscale histogram of an m × n image A, assuming 256 intensity levels:

$$h_A(l) = \#\{(i,j) \mid A(i,j) = l,\ i = 1,\dots,m,\ j = 1,\dots,n\}, \qquad l = 1,\dots,256$$

i.e. a count of the number of pixels at each level.
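The count can be computed directly from the definition. This minimal Python sketch assumes the image is a list of rows of integer intensities. It indexes levels 0-255, the usual storage convention, instead of the slide's 1-256. The only difference is that the index is shifted by one.

```python
def grey_histogram(A, levels=256):
    """h[l] = number of pixels (i, j) with A[i][j] == l, for l = 0 .. levels-1."""
    h = [0] * levels
    for row in A:
        for value in row:
            h[value] += 1
    return h


A = [[0, 0, 255],
     [128, 0, 255]]
h = grey_histogram(A)
# h[0] == 3, h[128] == 1, h[255] == 2, and sum(h) == 6 (= m * n)
```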


Colour Histogram

Describes the colours in an image and their percentages:

$$f_c(I) = \{(\mathrm{ColorValue}_j,\ P_j)\}, \qquad 0 \le P_j \le 1, \quad \sum_{j=1}^{N} P_j = 1, \quad 1 \le j \le N$$
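Here is a sketch of the colour-histogram descriptor above. Pixels are quantized into colour bins, and P_j is the fraction of pixels that fall in bin j, so each P_j lies in [0, 1] and all of them sum to 1. The quantization to 4 levels per RGB channel is an assumption. Histogram intersection is shown as one common way to compare two such histograms; it is associated with Swain & Ballard, who are cited later in these slides.

```python
from collections import Counter


def colour_histogram(pixels, levels=4):
    """Quantize (r, g, b) pixels (0-255 per channel) and return
    {colour_value_j: P_j}, where P_j is the fraction of pixels in bin j."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the probability mass the two histograms share."""
    return sum(min(p, h2.get(colour, 0.0)) for colour, p in h1.items())


img1 = [(255, 0, 0)] * 3 + [(0, 0, 255)]        # 75% red, 25% blue
img2 = [(250, 10, 5)] * 2 + [(0, 0, 250)] * 2    # 50% red, 50% blue
h1, h2 = colour_histogram(img1), colour_histogram(img2)
# histogram_intersection(h1, h2) -> 0.75
```

The descriptor ignores where colours occur in the image. That property makes it robust to position and orientation, but it is also why the following slides add texture and regional features.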


Texture Matching

- Texture characterizes small-scale regularity.
- Color describes pixels; texture describes regions.
- Texture is described by several types of features, e.g., smoothness, periodicity, directionality.
- Matching is performed as weighted vector-space matching, usually in combination with a color histogram.
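Here is a minimal sketch of weighted vector-space matching. It assumes that each image already has texture features (smoothness, periodicity, directionality) scaled to [0, 1] and a colour-histogram similarity in [0, 1]. The weighted cosine measure, the feature weights and the mixing factor `alpha` are illustrative assumptions, not values given on the slides.

```python
import math


def weighted_cosine(u, v, w):
    """Cosine similarity of u and v with dimension k weighted by w[k]."""
    num = sum(wk * a * b for a, b, wk in zip(u, v, w))
    nu = math.sqrt(sum(wk * a * a for a, wk in zip(u, w)))
    nv = math.sqrt(sum(wk * b * b for b, wk in zip(v, w)))
    return num / (nu * nv) if nu and nv else 0.0


def combined_score(tex_q, tex_d, weights, colour_sim, alpha=0.5):
    """Blend texture similarity with a precomputed colour-histogram similarity."""
    return alpha * weighted_cosine(tex_q, tex_d, weights) + (1 - alpha) * colour_sim


# (smoothness, periodicity, directionality): hypothetical feature values
query = (0.9, 0.1, 0.3)
weights = (1.0, 0.5, 2.0)   # hypothetical: directionality treated as most important
candidate = (0.8, 0.2, 0.4)
score = combined_score(query, candidate, weights, colour_sim=0.6)
```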


Texture Test Patterns


Image Retrieval using low level features

See IBM demos at:
- http://wwwqbic.almaden.ibm.com/
- http://mp7.watson.ibm.com/ (video)

Hermitage Museum: www.hermitagemuseum.org


Berkeley Blobworld



But… low-level features don't work in all cases.


Solution: Regional Low-level Image Feature

- Segment the image into objects (regions)
- Extract low-level features from each region
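This sketch assumes that a segmentation step has already produced a label map with one region id per pixel. Given that map, extracting low-level features per region is straightforward. The Python code below computes each region's size and mean grey level. The label map is hypothetical. Any other low-level feature, such as a colour histogram or texture measures, could be computed per region in the same loop.

```python
from collections import defaultdict


def region_features(image, labels):
    """Return {region_id: (pixel_count, mean_intensity)} for a greyscale image
    and a same-sized label map produced by some segmentation algorithm."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for img_row, lab_row in zip(image, labels):
        for value, region in zip(img_row, lab_row):
            sums[region] += value
            counts[region] += 1
    return {r: (counts[r], sums[r] / counts[r]) for r in counts}


image = [[10, 10, 200],
         [10, 20, 220]]
labels = [[0, 0, 1],          # hypothetical segmentation: dark object 0,
          [0, 0, 1]]          # bright object 1
feats = region_features(image, labels)
# feats -> {0: (4, 12.5), 1: (2, 210.0)}
```

A query can then be matched region by region instead of against one global descriptor that mixes the object with its background.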


Solution: High-level Image Feature

Objects: Persons, Roads, Cars, Skies…

Scenes: Indoors, Outdoors, Cityscape, Landscape, Water, Office, Factory…

Event: Parade, Explosion, Picnic, Playing Soccer…

Generated from low-level features

Audio Retrieval


Audio Genres

Important types of audio data:
- Speech-centered: radio programs, telephone conversations, recorded meetings
- Music-centered: instrumental, vocal
- Other sources: alarms, instrumentation, surveillance, …


Speech-based Documents

- Radio/TV news retrieval: search archival radio/news broadcasts.
- Video and audio email.
- Knowledge management: transfer of tacit knowledge to others; search audio archives of meetings, lectures, etc.


Preamble

Two utterances of the same words by the same person under the same conditions generate very different waveforms.

Variations due to loudness, pitch, brightness, bandwidth, harmonicity, and other properties are all continuous variables, analogous to color and texture in images.


Detectable Speech Features

Content: phonemes, one-best word recognition, n-best

Identity: speaker identification, speaker segmentation

Language: language, dialect, accent

Other measurable parameters: time, duration, channel, environment


How Speech Recognition Works

Three stages:
1. What sounds were made? Convert from the waveform to subword units (phonemes).
2. How could the sounds be grouped into words? Identify the most probable word segmentation points.
3. Which of the possible words were spoken? Decide based on the likelihood of possible multiword sequences.

All three stages are learned from training data, using hill climbing to fit a Hidden Markov Model.
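At recognition time, a Hidden Markov Model is used to find the most probable hidden-state sequence (e.g. phonemes) behind a sequence of observed acoustic symbols. The compact Viterbi decoder below, written in Python, shows that step. The two-state model and all of its probabilities are made-up toy values. Training those probabilities from data is the hill-climbing step mentioned above, typically done with the Baum-Welch (EM) procedure, and it is not shown here.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for the observations (log-space avoids underflow)."""
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            prev = max(states, key=lambda p: best[p][0] + math.log(trans_p[p][s]))
            score = best[prev][0] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


# Toy model (hypothetical numbers): two phoneme states emitting coarse acoustic classes.
states = ("/s/", "/a/")
start_p = {"/s/": 0.6, "/a/": 0.4}
trans_p = {"/s/": {"/s/": 0.7, "/a/": 0.3}, "/a/": {"/s/": 0.2, "/a/": 0.8}}
emit_p = {"/s/": {"noise": 0.8, "voiced": 0.2}, "/a/": {"noise": 0.1, "voiced": 0.9}}
path = viterbi(["noise", "noise", "voiced", "voiced"], states, start_p, trans_p, emit_p)
# path -> ['/s/', '/s/', '/a/', '/a/']
```

The same dynamic-programming search can run over words instead of phonemes, with transition probabilities taken from a word n-gram language model.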


Speech Recognition

[Diagram: Phoneme Detection → Word Construction → Word Selection, with labels: phoneme n-grams, phoneme lattices, phoneme transcription dictionary, word n-gram language model, one-best phoneme transcription, N-best phoneme sequences, words, one-best word transcript]


Music and audio analysis

Music is a large and extremely variable audio class.

The range of sounds is large, from music genres to animal cries to synthesizer samples.

Any of the above can and will occur in combination.


Audio retrieval by content
- Requires some measure of audio similarity.
- Most approaches to general audio retrieval take a perceptual approach, using measures such as loudness.
- A neural net can map a sound clip to a text description. An obvious drawback is the subjective nature of audio description.


Sample system: Muscle Fish

Analyzes sound files for a specific set of psychoacoustic features.

This results in a vector of attributes that includes loudness, pitch, bandwidth and harmonicity.

Given enough training samples, a Gaussian classifier can be constructed. The feature vectors can also be used for retrieval.
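Here is a minimal sketch of such a Gaussian classifier over psychoacoustic feature vectors. Two simplifications are assumptions made here, not details of Muscle Fish: features are treated as independent within each class (a diagonal covariance), and all classes get equal priors. The class labels and training values for loudness, pitch, bandwidth and harmonicity are made up.

```python
import math
from statistics import mean, pvariance


def fit_gaussians(training):
    """training: {label: [feature vectors]} -> {label: (means, variances)}."""
    model = {}
    for label, vectors in training.items():
        columns = list(zip(*vectors))
        means = [mean(c) for c in columns]
        variances = [max(pvariance(c), 1e-6) for c in columns]  # guard against 0
        model[label] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log density of x under independent per-feature Gaussians."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))


def classify(model, x):
    """Label whose Gaussian gives x the highest likelihood (equal priors assumed)."""
    return max(model, key=lambda label: log_likelihood(x, *model[label]))


# (loudness, pitch, bandwidth, harmonicity): hypothetical training samples
training = {
    "laughter": [(0.7, 0.5, 0.6, 0.3), (0.8, 0.6, 0.7, 0.2), (0.75, 0.55, 0.65, 0.25)],
    "bell":     [(0.5, 0.8, 0.2, 0.9), (0.4, 0.85, 0.25, 0.95), (0.45, 0.9, 0.2, 0.85)],
}
model = fit_gaussians(training)
label = classify(model, (0.72, 0.52, 0.62, 0.28))   # close to the "laughter" samples
```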


A Euclidean distance is used as the measure of similarity.

For retrieval, the distance is computed between a given sound example and all other sound examples (about 400 in the demonstration).

Sounds are ranked by distance, with closer ones being more similar.
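The retrieval step described above amounts to a linear scan. This short Python sketch assumes each sound has already been reduced to a psychoacoustic feature vector. The clip names and values are hypothetical. In practice the features would usually be normalized first so that no single feature dominates the distance.

```python
import math


def rank_by_distance(query, collection):
    """Return (name, distance) pairs sorted so the most similar sound comes first."""
    scored = [(name, math.dist(query, vec)) for name, vec in collection.items()]
    return sorted(scored, key=lambda pair: pair[1])


# (loudness, pitch, bandwidth, harmonicity): hypothetical feature vectors
collection = {
    "dog_bark": (0.9, 0.4, 0.7, 0.2),
    "church_bell": (0.5, 0.8, 0.2, 0.9),
    "applause": (0.8, 0.3, 0.9, 0.1),
}
ranking = rank_by_distance((0.88, 0.38, 0.75, 0.18), collection)
# [name for name, _ in ranking] -> ['dog_bark', 'applause', 'church_bell']
```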


Music and MIDI retrieval

Using archives of MIDI files, which are score-like representations of music intended for musical synthesizers or sequencers.

Given a melodic query, the MIDI files can be searched for similar melodies.


Polyphonic Music Indexing Technique

- n-grams: encode music as text strings using pitch and onsets
- Index the resulting text words with a text search engine
- Process the query in the same way
- Application: e.g., Query by Humming


Monophonic pitch n-gramming

Interval sequence: 0 +7 0 +2 0 -2 0 -2 0

Example: musical strings with an interval-only representation
[0 +7 0 +2] → ZGZB
[+7 0 +2 0] → GZBZ
[0 +2 0 -2] → ZBZb
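The example suggests a simple interval-to-character code. A unison (0) becomes `Z`, an upward interval of k semitones becomes the k-th capital letter (+2 → `B`, +7 → `G`), and a downward interval becomes the k-th lower-case letter (-2 → `b`). The Python sketch below uses that mapping, which is inferred from the example and therefore an assumption; intervals beyond ±25 are simply clamped. It turns a melody into overlapping n-gram "words" and indexes them in a small inverted index, which stands in for a text search engine. A query is processed the same way.

```python
import string
from collections import Counter, defaultdict


def encode_interval(i):
    """Inferred code: 0 -> 'Z', +k -> k-th capital letter, -k -> k-th lower-case letter."""
    if i == 0:
        return "Z"
    k = min(abs(i), 25)   # clamp so that 'Z' stays reserved for the unison
    return string.ascii_uppercase[k - 1] if i > 0 else string.ascii_lowercase[k - 1]


def interval_ngrams(intervals, n=4):
    """Overlapping n-grams of the interval sequence, each encoded as a text word."""
    return ["".join(encode_interval(i) for i in intervals[s:s + n])
            for s in range(len(intervals) - n + 1)]


def build_index(melodies, n=4):
    """Inverted index: n-gram word -> set of melody ids that contain it."""
    index = defaultdict(set)
    for melody_id, intervals in melodies.items():
        for word in interval_ngrams(intervals, n):
            index[word].add(melody_id)
    return index


def search_melodies(index, query_intervals, n=4):
    """Rank melodies by how many distinct query n-gram words they share."""
    hits = Counter()
    for word in set(interval_ngrams(query_intervals, n)):
        for melody_id in index.get(word, ()):
            hits[melody_id] += 1
    return hits.most_common()


melodies = {"tune_a": [0, 7, 0, 2, 0, -2, 0, -2, 0],   # the slide's sequence
            "tune_b": [0, 2, 2, 1, -5, 0, 0, 3]}       # hypothetical
index = build_index(melodies)
results = search_melodies(index, [7, 0, 2, 0, -2])     # a hummed fragment
# interval_ngrams(melodies["tune_a"])[:3] -> ['ZGZB', 'GZBZ', 'ZBZb']
# results -> [('tune_a', 2)]
```

Because only intervals are encoded, a query hummed in a different key produces the same words.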

Video Retrieval


Application

Increasing demand for visual information retrieval:
- Retrieve useful information from databases
- Share and distribute video data through computer networks

Example: BBC. The BBC archive has 500k+ queries plus 1M new items … per year. Example queries from the BBC:
- Police car with blue light flashing
- Government plan to improve reading standards
- Two shot of Kenneth Clarke and William Hague


Video Search: Active Research Area


Video Search: Features

Color
- Robust to background; independent of size and orientation
- Color histogram [Swain & Ballard]
- Cumulative histograms [Stricker & Orengo], since plain histograms are “sensitive to noise and sparse”
- Color moments
- Color sets: map RGB color space to Hue Saturation Value (HSV) and quantize [Smith, Chang]
- Color layout: local color features obtained by dividing the image into regions
- Color autocorrelograms

Texture
- One of the earliest image features [Haralick et al., 1970s]
- Co-occurrence matrix: orientation and distance on gray-scale pixels
- Contrast, inverse difference moment, and entropy [Gotlieb & Kreyszig]
- Human visual texture properties: coarseness, contrast, directionality, line-likeness, regularity and roughness [Tamura et al.]
- Wavelet transforms [90s]: [Smith & Chang] extracted the mean and variance from wavelet subbands
- Gabor filters, and so on

Region segmentation
- Partition the image into regions
- Strong segmentation: object segmentation is difficult
- Weak segmentation: region segmentation based on some homogeneity criteria

Scene segmentation
- Shot detection, scene detection: look for changes in color, texture, brightness
- Context-based scene segmentation, applied to certain categories such as broadcast news
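Here is a small Python sketch of the co-occurrence-matrix idea. It counts pairs of grey levels at a given displacement (distance and orientation) and derives three of the statistics named above: contrast, inverse difference moment and entropy. The sketch uses a single displacement and tiny 4-level test images chosen for illustration. Practical systems typically quantize grey levels and average over several orientations.

```python
import math
from collections import Counter


def cooccurrence(image, dx=1, dy=0):
    """Normalized co-occurrence matrix {(i, j): p} for pixel pairs at offset (dy, dx)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[(image[y][x], image[y2][x2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_stats(p):
    """Contrast, inverse difference moment and entropy of a co-occurrence matrix."""
    contrast = sum((i - j) ** 2 * v for (i, j), v in p.items())
    idm = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())
    entropy = -sum(v * math.log2(v) for v in p.values() if v > 0)
    return contrast, idm, entropy


flat = [[1, 1, 1, 1]] * 4          # perfectly smooth texture
stripes = [[0, 3, 0, 3]] * 4       # strong vertical stripes
# texture_stats(cooccurrence(flat))    -> contrast 0, IDM 1, entropy 0
# texture_stats(cooccurrence(stripes)) -> contrast ~9, IDM ~0.1, entropy ~0.918
```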


Video Search: Features

Face
- Face detection is highly reliable: neural networks [Rowley]; wavelet-based histograms of facial features [Schneiderman]
- Face recognition for video is still a challenging problem. EigenFaces: extract eigenvectors and use them as the feature space

OCR
- OCR is a fairly successful technology. It is accurate, especially with good matching vocabularies
- Script recognition is still an open problem

ASR
- Automatic speech recognition is fairly accurate for medium- to large-vocabulary broadcast-type data
- A large number of speech vendors are available
- Free conversational speech in noisy conditions is still an open problem

Shape
- Outer-boundary-based vs. region-based
- Fourier descriptors
- Moment invariants
- Finite Element Method (stiffness matrix: how each point is connected to the others; eigenvectors of the matrix)
- Turning-function based (similar to Fourier descriptors), for convex/concave polygons [Arkin et al.]
- Wavelet transforms, which leverage multiresolution [Chuang & Kao]
- Chamfer matching for comparing 2 shapes (linear dimension rather than area)
- 3-D object representations using similar invariant features
- Well-known edge detection algorithms
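Fourier descriptors are one of the boundary-based shape features listed above. The Python sketch below computes them with the standard library only. The closed boundary is treated as a sequence of complex numbers x + iy, and its discrete Fourier transform is taken. The DC term is dropped (translation invariance), the magnitudes are divided by the first harmonic's magnitude (scale invariance), and the phase is discarded (rotation and starting-point invariance). The naive O(N^2) DFT and the limit of 8 coefficients are simplifications for illustration.

```python
import cmath


def fourier_descriptors(boundary, n_coeffs=8):
    """Invariant shape signature from a closed boundary given as (x, y) points."""
    z = [complex(x, y) for x, y in boundary]
    N = len(z)
    F = [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / N) for t in range(N))
         for k in range(N)]
    scale = abs(F[1])          # first harmonic; F[0] only encodes the position
    return [abs(F[k]) / scale for k in range(1, min(n_coeffs, N - 1) + 1)]


def transform(boundary, scale=1.0, angle=0.0, shift=(0.0, 0.0)):
    """Scaled, rotated and translated copy of a boundary (to check invariance)."""
    rot = cmath.exp(1j * angle)
    out = []
    for x, y in boundary:
        w = complex(x, y) * scale * rot + complex(*shift)
        out.append((w.real, w.imag))
    return out


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
moved = transform(square, scale=3.0, angle=0.7, shift=(5.0, -2.0))
fd1, fd2 = fourier_descriptors(square), fourier_descriptors(moved)
# fd1 and fd2 agree up to floating-point error
```

Two shapes can then be compared with any vector distance, such as Euclidean distance, on their descriptor lists.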


Video Structures

Image structure: absolute positioning, relative positioning

Object motion: translation, rotation

Camera motion: pan, zoom, perspective change

Shot transitions: cut, fade, dissolve, …


Typical Retrieval Framework

User: provides query information that represents his/her information need

Database: stores a large collection of video data

Goal: find the most relevant shots in the database. Shots are the “paragraphs” of video, typically 20-40 seconds long, and are the basic unit of video retrieval.


Bridging the Gap

[Diagram: User, Video Database, Result]


Automatically Structure Video Data

The first step in video retrieval: video “programmes” are structured into logical scenes and physical shots

With text, the structure is obvious: paragraph, section, topic, page, etc.

All text-based indexing, retrieval, linking, etc. builds upon this structure

Automatic shot boundary detection and selection of representative keyframes are usually the first step
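The following is a minimal sketch of shot boundary detection and keyframe selection. It assumes greyscale frames given as lists of 0-255 values. A cut is declared wherever the L1 distance between consecutive normalised grey-level histograms exceeds a threshold. The middle frame of each shot is then taken as its keyframe. The threshold, bin count, middle-frame rule, and function names are all illustrative assumptions. This simple detector handles cuts but would generally miss gradual transitions such as fades and dissolves.

```python
def gray_histogram(frame, bins=16):
    """Normalised grey-level histogram of a frame given as a list of 0-255 ints."""
    hist = [0] * bins
    for value in frame:
        hist[value * bins // 256] += 1
    return [count / len(frame) for count in hist]


def detect_shot_boundaries(frames, threshold=0.5, bins=16):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    Uses the L1 distance between consecutive normalised histograms
    (range 0..2); the default threshold is an illustrative assumption.
    """
    boundaries = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        if sum(abs(p - c) for p, c in zip(prev, cur)) > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


def segment_into_shots(frames, threshold=0.5):
    """Split frames into shots; return (start, end, keyframe) per shot.

    end is exclusive, and the keyframe is simply the middle frame.
    """
    starts = [0] + detect_shot_boundaries(frames, threshold)
    ends = starts[1:] + [len(frames)]
    return [(s, e, (s + e) // 2) for s, e in zip(starts, ends)]


frames = [[10] * 4] * 3 + [[240] * 4] * 3  # three dark frames, then three bright
print(segment_into_shots(frames))  # [(0, 3, 1), (3, 6, 4)]
```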

Typical automatic structuring of video

[Diagram: a video document structured into a set of shots]

Keyframe browser combined with transcript or object-based search

Ideal solution

[Diagram: the system understands the semantic meaning of the User's Information Need and of the Video Structure in the Video Database, and retrieves the Result]

However:

1. It is hard to express the query in natural language in a way a computer can understand
2. Computers have no experience
3. There are other restrictions on the representation, such as position and time

Alternative Solution

[Diagram: the User's Information Need and the Video Structure of the Video Database are matched and combined to produce the Result]

Provide evidence of relevant information (text, image, audio)

Evidence-based Retrieval System

General framework for current video retrieval systems: retrieval is based on evidence from both the user and the database, including text information, image information, motion information, and audio information

Return a relevance score for each type of evidence, then combine the scores
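One common way to combine per-evidence scores is a weighted linear combination after normalising each source to a common range. The sketch below assumes min-max normalisation and hand-set weights. Neither is prescribed by the slides. The evidence source names, shot identifiers, and weights are hypothetical.

```python
def min_max_normalise(scores):
    """Rescale one evidence source's scores to [0, 1] so sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def combine_evidence(evidence, weights):
    """Rank shots by a weighted sum of normalised per-evidence scores.

    evidence: {source: {shot_id: raw_score}}; weights: {source: weight}.
    A shot a source did not score contributes 0 for that source.
    """
    combined = {}
    for source, scores in evidence.items():
        for shot, s in min_max_normalise(scores).items():
            combined[shot] = combined.get(shot, 0.0) + weights.get(source, 0.0) * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores from a text matcher and an image matcher.
evidence = {
    "text": {"shot1": 2.0, "shot2": 1.0, "shot3": 0.0},
    "image": {"shot1": 1, "shot2": 9, "shot3": 5},
}
weights = {"text": 0.5, "image": 0.5}
print(combine_evidence(evidence, weights))
# [('shot2', 0.75), ('shot1', 0.5), ('shot3', 0.25)]
```

Normalising first matters because raw scores from different matchers live on unrelated scales. Without it, whichever source happens to produce the largest numbers would dominate the ranking.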

Keyword-based System

[Diagram: the User's Information Need is expressed as Keywords; the Video Structure of the Video Database is indexed by Automatic Annotation]

Automatic annotation sources include the filename, video title, caption, and related web pages
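A keyword-based system of this kind can be sketched as an inverted index over the automatically harvested text. The sketch assumes that each video's annotation is a dictionary with hypothetical fields (`filename`, `title`, `caption`, `web_page`). It ranks videos by how many distinct query keywords their annotations contain. A real system would more likely use a weighted scheme such as tf-idf.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; also splits names like 'nist_interview.mpg'."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(videos):
    """videos: {video_id: {field_name: text}} -> inverted index {keyword: {video_id}}."""
    index = defaultdict(set)
    for video_id, fields in videos.items():
        for text in fields.values():
            for token in tokenize(text):
                index[token].add(video_id)
    return index


def keyword_search(index, query):
    """Rank videos by the number of distinct query keywords they contain."""
    hits = defaultdict(int)
    for token in set(tokenize(query)):
        for video_id in index.get(token, ()):
            hits[video_id] += 1
    return sorted(hits, key=lambda v: (-hits[v], v))


videos = {
    "v1": {"filename": "nist_interview.mpg", "title": "Quality Program",
           "caption": "Harry Hertz", "web_page": ""},
    "v2": {"filename": "news.mpg", "title": "Evening news",
           "caption": "Weather", "web_page": "Harry Potter premiere"},
}
index = build_keyword_index(videos)
print(keyword_search(index, "Harry Hertz"))  # ['v1', 'v2']
```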

Keyword-based System

[Diagram: as in the previous slide, with Manual Annotation added alongside Automatic Annotation as a source of Keywords]

Manual Annotation

Manually creating annotations/keywords for image/video data

Example: Gettyimage.com (image retrieval)

Pros: represents the semantic meaning of the video

Cons: time-consuming and labor-intensive; keywords are not enough to represent the information need

Speech and OCR Transcription

[Diagram: the User's Information Need is matched by Keyword against Annotation, Speech Transcription, and OCR Transcription of the Video Database]

Query using speech/OCR information

Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST

Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …

OCR: H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director
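OCR output like the line above is noisy, so exact word matching can miss query terms. One tolerant technique, assumed here rather than taken from the course, is character n-gram matching. It strips punctuation, then measures what fraction of a query term's character trigrams appear in the cleaned OCR text. The function names are hypothetical.

```python
import re


def normalise_ocr(text):
    """Lower-case and drop punctuation that OCR often inserts inside words."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower())


def char_ngrams(text, n=3):
    """Set of overlapping character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def term_match_score(term, ocr_text, n=3):
    """Fraction of the term's character n-grams found in the cleaned OCR text."""
    grams = char_ngrams(term.lower(), n)
    if not grams:
        return 0.0
    return len(grams & char_ngrams(normalise_ocr(ocr_text), n)) / len(grams)


def ocr_query_score(query, ocr_text):
    """Average term score over the words of a query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return 0.0
    return sum(term_match_score(t, ocr_text) for t in terms) / len(terms)


ocr = "H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director"
print(term_match_score("quality", ocr))     # 0.2
print(ocr_query_score("Harry Hertz", ocr))  # 1.0
```

On this OCR line, “Harry” and “Hertz” score 1.0. The word “quality”, garbled to “wa-,i,,ty”, still earns partial credit of 0.2 from the shared trigram “ity”.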

What do we lack?

[Diagram: text evidence (Annotation, Keyword, Speech Transcription, OCR Transcription) links the User's Information Need to the Video Database; Image Information is the missing piece]

Image-based Retrieval

[Diagram: Text Information is matched through Keywords; Query Images supplied by the User are matched through Image Features, which may be Low-level Features or High-level Features]
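A low-level image feature can be as simple as a colour histogram. The sketch below assumes keyframes given as lists of (r, g, b) tuples and quantises each channel to 4 levels (64 bins). It ranks keyframes by histogram intersection with the query image, after Swain and Ballard. The bin count, shot identifiers, and function names are illustrative.

```python
def colour_histogram(pixels, levels=4):
    """Normalised joint RGB histogram with `levels` bins per channel."""
    hist = [0.0] * levels ** 3

    def q(v):
        return v * levels // 256  # quantise 0..255 to 0..levels-1

    for r, g, b in pixels:
        hist[(q(r) * levels + q(g)) * levels + q(b)] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1, h2):
    """Swain-Ballard intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_keyframes(query_pixels, keyframes):
    """keyframes: {shot_id: list of (r, g, b)}; best-matching shots first."""
    q = colour_histogram(query_pixels)
    scores = {shot: histogram_intersection(q, colour_histogram(px))
              for shot, px in keyframes.items()}
    return sorted(scores, key=scores.get, reverse=True)


red, blue = (255, 0, 0), (0, 0, 255)
keyframes = {"sea": [blue] * 4, "sunset": [red] * 3 + [blue]}
print(rank_keyframes([red] * 4, keyframes))  # ['sunset', 'sea']
```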

More Evidence in Video Retrieval

[Diagram: the User's Information Need and the Video Structure of the Video Database are linked by four kinds of evidence: Text Information (Keyword), Image Information (Query Images), Motion Information (Motion), and Audio Information (Audio)]
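Motion evidence can be approximated, very crudely, by how much the picture changes within a shot. The sketch below assumes greyscale frames stored as equal-length lists. It uses the mean absolute difference between consecutive frames as a motion-activity score. This is a hypothetical stand-in for real motion features such as motion vectors or optical flow, and it also responds to lighting changes.

```python
def motion_activity(frames):
    """Mean absolute intensity change between consecutive frames of a shot.

    frames: list of equal-length lists of grey values. Returns 0.0 for a
    static shot (or one with fewer than two frames).
    """
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
    return total / (len(frames) - 1)


print(motion_activity([[0, 0], [10, 0], [10, 10]]))  # 5.0
```

A score like this could serve as one more evidence source, for example to favour high-motion shots for a query about sports action.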

MPEG-7: The Objective

Standardize object-based description tools for various types of audiovisual information, allowing fast and efficient content searching, filtering and identification, and addressing a large range of applications.

New objective for MPEG:

MPEG-1, -2 and -4 represent the content itself (‘the bits’)

MPEG-7 should represent information about the content (‘the bits about the bits’)

Scope of MPEG-7

[Diagram: Description creation → description → Description consumption]

Not the description creation

Not the description consumption

Just the description! This is the scope of MPEG-7

The goal is to define the minimum that enables interoperability.

MPEG-7 Terminology: Descriptor

Descriptor (D): a Descriptor is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.

Examples (Feature: Descriptor):

Color: histogram of Y, U, V components
Shape: ART moments
Motion: motion field, coefficients of a model
Audio frequency: average frequency components
Title: text
Annotation: text
Genre: text, index in a thesaurus
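To make the colour example concrete, the sketch below computes a descriptor from concatenated, normalised histograms of the three luma/chroma components. It assumes (r, g, b) pixels. It uses the full-range YCbCr conversion from JPEG/JFIF as the Y, U, V components. This is a plain illustration, not the normative syntax of any MPEG-7 colour descriptor, and the function names and bin count are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range (JFIF) conversion; all three outputs lie roughly in 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def yuv_histogram_descriptor(pixels, bins=8):
    """Concatenate normalised per-component histograms of Y, Cb (U) and Cr (V)."""
    hists = [[0] * bins for _ in range(3)]
    for r, g, b in pixels:
        for hist, value in zip(hists, rgb_to_ycbcr(r, g, b)):
            # Clamp so values at the very top of the range land in the last bin.
            index = max(0, min(int(value * bins / 256), bins - 1))
            hist[index] += 1
    n = len(pixels)
    return [count / n for hist in hists for count in hist]


d = yuv_histogram_descriptor([(0, 0, 0), (0, 0, 0)])
print(len(d), [i for i, v in enumerate(d) if v])  # 24 [0, 12, 20]
```

For black pixels, Y falls in the first bin. Cb and Cr both sit at the neutral value 128, which is the middle bin of their respective 8-bin histograms.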

Conclusions

Simple image retrieval is commercially available: color histograms, texture, limited shape information

Segmentation-based retrieval is still in the lab; keep an eye on the Berkeley group

Limited audio indexing is practical now: audio feature matching, answering-machine detection

Conclusions

Multimedia IR

Text: good solutions exist

Video, image, sound: a lot of work still to do

Conclusions

The goal of content-based video retrieval is to build more intelligent video retrieval engines based on semantic meaning

Many applications in daily life

Combine evidence from different aspects

Hot research topic, but few commercial systems

State-of-the-art performance is still unacceptable for normal users; there is room to improve

Conclusions

Problems with content-based MMIR:

Must have an example image

The example image is 2-D, hence only that view of the object will be returned

Large amount of image data

A similar colour histogram does not mean a similar image

Usually the best results come from a combination of both text and content searching
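Similar colour histograms do not imply similar images because a histogram discards spatial layout. The toy example below illustrates this with two hypothetical 4x4 greyscale images stored row by row. One is a half-dark, half-bright split and the other is a checkerboard. They have identical histograms even though half of their pixels differ.

```python
def grey_histogram(pixels, bins=4):
    """Grey-level histogram: counts values only, discarding pixel positions."""
    hist = [0] * bins
    for v in pixels:
        hist[v * bins // 256] += 1
    return hist


def pixels_that_differ(a, b):
    """Number of positions at which two equally sized images differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two hypothetical 4x4 images stored row by row (0 = black, 255 = white).
split = [0, 0, 255, 255] * 4                    # left half dark, right half bright
checker = [0, 255, 0, 255, 255, 0, 255, 0] * 2  # checkerboard

print(grey_histogram(split), grey_histogram(checker))  # [8, 0, 0, 8] [8, 0, 0, 8]
print(pixels_that_differ(split, checker))              # 8
```

A histogram-only matcher would rank these two images as a perfect match. This is one reason the slides recommend combining content features with text evidence.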

Conclusions

Combination of multi-modal results

Different modalities have different characteristics:

Text-based information: better for middle- and high-level queries

Image-based information: better for low- and middle-level queries

Combination of multi-modal information

Conclusions

Challenging research questions

Draws on computer vision, audio processing, natural language analysis, unstructured document analysis, information retrieval, information visualisation, computer-human interaction, artificial intelligence