TRANSCRIPT
1
Constantine Kotropoulos
Monday July 8, 2002
Visual Information Retrieval
Aristotle University of Thessaloniki
Department of Informatics
2
Fundamentals
Still image segmentation: Comparison of ICM and LVQ techniques
Shape retrieval based on Hausdorff distance
Video summarization: Detecting shots, cuts, and fades in video – Selection of key frames
MPEG-7: Standard for Multimedia Applications
Conclusions
Outline
3
About
Toward visual information retrieval
Data types associated with images or video
First generation systems
Second generation systems
Content-based interactivity
Representation of visual content
Similarity models
Indexing methods
Performance evaluation
Fundamentals
4
Visual information retrieval: – To retrieve images or image sequences from a database that are relevant to a query. – Extension of traditional information retrieval designed to include visual media.
Needs: Tools and interaction paradigms that permit searching for visual data by referring directly to its content.
– Visual elements (color, texture, shape, spatial relationships) related to perceptual aspects of image content.
– Higher-level concepts: clues for retrieving images with similar content from a database. Multidisciplinary field:
– Information retrieval
– Image/video analysis and processing
– Visual data modeling and representation
– Pattern recognition
– Multimedia database organization
– Computer vision
– User behavior modeling
– Multidimensional indexing
– Human-computer interaction
About
5
Databases
– allow a large amount of alphanumeric data to be stored in a local repository and accessed by content through appropriate query languages.
Information Retrieval Systems
– provide access to unstructured text documents
• Search engines working in the textual domain, using either keywords or full text.
The need for Visual Information Retrieval Systems became apparent when
– digital archives were released
– distribution of image and video data through large-bandwidth computer networks emerged
– and it becomes more prominent as we progress to the wireless era!
Toward visual information retrieval
6
Query by image content using the NOKIA 9210 Communicator
www.iva.cs.tut.fi/COST211
Iftikhar et al.
7
Content-independent metadata
– Data not directly related to image/video content (e.g., format, author's name, date, etc.)
Content-dependent metadata
– Low/intermediate-level features: color, texture, shape, spatial relationships, motion, etc.
– Data referring to content semantics (content-descriptive metadata)
These data types impact the internal organization of the retrieval system.
Data types associated with images or video
8
Answers to queries: Find
– All images of paintings of El Greco.
– All Byzantine ikons dated from the 13th century, etc.
Content-independent metadata: alphanumeric strings
Representation schemes: relational models, frame models, object-oriented
Content-dependent metadata: annotated keywords or scripts
Retrieval: Search engines working in the textual domain (SQL, full-text retrieval)
Examples: PICDMS (1984), PICQUERY (1988), etc.
Drawbacks:
– Difficult for text to capture the distinctive properties of visual features
– Text is not appropriate for modeling perceptual similarity
– Subjective
First generation systems
9
Supports full retrieval by visual content
– Conceptual level: keywords
– Perceptual level: objective measurements at pixel level
– Other sensory data (speech, sound) might help (e.g., video streams).
Image processing, pattern recognition, and computer vision are an integral part of architecture and operation.
Retrieval systems for
– 2-D still images
– Video
– 3-D images and video
– WWW
Second generation systems
10
Content
– Perceptual properties: color, texture, shape, and spatial relationships
– Semantic primitives: objects, roles, and scenes
– Impressions, emotions, and meaning associated with the combination of perceptual features
Basic retrieval paradigm: For each image, a set of descriptive features is pre-computed.
Queries by visual examples
– The user selects the features and ranges of model parameters, and chooses a similarity measure.
– The system checks the similarity between the visual content of the user's query and database images.
Objective: To keep the number of misses as low as possible. The number of false alarms?
Interaction: Relevance feedback
Retrieval systems for 2-D still images (1)
11
Similarity vs. matching
Matching is a binary partition operator: "Does the observed object correspond to a model or not?"
Uncertainties are managed during the process.
Similarity-based retrieval: re-orders the database of images according to how similar they are to a query example. Ranking, not classification.
The user is in the retrieval loop; hence the need for a flexible interface.
Retrieval systems for 2-D still images (2)
12
Video conveys information on multiple planes of communication
– How the frames are linked together using editing effects (cuts, fades, dissolves, etc.)
– What is in the frames (characters, story content, etc.)
Each type of video (commercials, news, movies, sports) has its own peculiar characteristics.
Basic terminology
– Frame: basic unit of information, usually sampled at 1/25 or 1/30 of a second.
– Shot: a set of frames between a camera turn-on and a camera turn-off.
– Clip: a set of frames with some semantic content.
– Episode: a hierarchy of shots.
– Scene: a collection of consecutive shots that share simultaneity in space, time, and action (e.g., a dialog scene).
Video is accessed through browsing and navigation.
Retrieval systems for video (1)
14
3-D images and video are available in
– biomedicine
– computer-aided design
– geographic maps
– painting
– games and the entertainment industry (immersive environments)
Expected to flourish in the current decade.
Retrieval on the WWW:
– Distributed problem
– Need for standardization (MPEG-7)
– Response time is critical (work in the compressed domain, summarization)
Retrieval systems for 3-D images and video / WWW
15
1. Visual interfaces
2. Standards for content representation
3. Database models
4. Tools for automatic extraction of features from images and video
5. Tools for extraction of semantics
6. Similarity models
7. Effective indexing
8. Web search and retrieval
9. Role of 3-D
Research directions
16
Browsing offers a panoramic view of the visual information space
Visualization
Content-based interactivity
www.virage.com
18
For still images:
To check whether the concepts expressed in a query match the concepts of database images:
– "find all Holy Ikons with a nativity"
– "find all Holy Ikons with Saint George" (object categories)
Treated with free-text or SQL-based retrieval engines (Google).
To verify spatial relations between spatial entities: "find all images with a car parked outside a house"
– topological queries (disjunction, adjacency, containment, overlapping)
– metric queries (distances, directions, angles)
Treated with SQL-like spatial query languages.
Querying by content (1)
19
To check the similarity of perceptual features (color, texture, edges, corners, and shapes)
– exact queries: "find all images of President Bush"
– range queries: "find all images with colors between green and blue"
– K-nearest-neighbor queries: "find the ten most similar images to the example"
For video:
– Concepts related to video content
– Motion, objects, texture, and color features of video: shot extraction, dominant colors, etc.
Querying by content (2)
22
Suited to express perceptual aspects of low/intermediate-level features of visual content.
The user provides a prototype image as a reference example.
Relevance feedback: the user analyses the responses of the system and indicates, for each item retrieved, the degree of relevance or the exactness of the ranking; the annotated results are fed back into the system to refine the query.
Types of querying:
– Iconic (PN): suitable for retrieval based on high-level concepts
– By painting: employed in color-based retrieval (NETRA)
– By sketch (PICASSO)
– By image (NETRA)
Querying by visual example
25
Representation of perceptual features of images and video is a fundamental problem in visual information retrieval.
Image analysis and pattern recognition algorithms provide the means to extract numeric descriptors.
Computer vision enables object and motion identification.
Representation of perceptual features:
– Color, texture, shape, structure, spatial relationships, motion
Representation of content semantics:
– Semantic primitives, semiotics
Representation of visual content
27
Human visual system: the cones are responsible for color perception.
From a psychological point of view, the perception of color is related to several factors, e.g.:
– color attributes (brightness, chromaticity, saturation)
– surrounding colors
– color spatial organization
– the observer's memory/knowledge/experience
Geometric color models (RGB, HSV, Lab, etc.)
Color histogram: describes the low-level color properties.
Representation of perceptual features: Color (2)
28
Image retrieval by color similarity (1)
Color spaces
Histograms; moments of the distribution
Quantization of the color space
Similarity measures
– L1 and L2 norms of the difference between the query histogram H(IQ) and the histogram of a database image H(ID)
29
Image retrieval by color similarity (2)
histogram intersection
weighted Euclidean distance
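The histogram measures named above can be sketched in a few lines. This is an illustrative sketch, not code from the lecture; the 3-bin histograms are hypothetical.

```python
# Sketch: comparing two normalized color histograms with the L1 norm
# and with histogram intersection, as discussed above.

def l1_distance(hq, hd):
    """L1 norm of the difference between query and database histograms."""
    return sum(abs(q - d) for q, d in zip(hq, hd))

def histogram_intersection(hq, hd):
    """Normalized histogram intersection: 1.0 means identical histograms."""
    return sum(min(q, d) for q, d in zip(hq, hd)) / sum(hq)

hq = [0.5, 0.3, 0.2]   # hypothetical 3-bin query histogram
hd = [0.4, 0.4, 0.2]   # hypothetical database-image histogram
print(l1_distance(hq, hd))             # ~ 0.2
print(histogram_intersection(hq, hd))  # ~ 0.9
```

Lower L1 distance and higher intersection both mean more similar color content.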
30
Texture: one level of abstraction above pixels.
Perceptual texture dimensions:
– uniformity, density, coarseness, roughness, regularity, linearity, directionality/direction, frequency, phase
Representation of perceptual features: Texture (1)
Brodatz album
31
Statistical methods:
– Autocorrelation function (coarseness, periodicity)
– Frequency content [rings, wedges]: coarseness, directionality, isotropic/non-isotropic patterns
– Moments
– Directional histograms and related features
– Run-lengths and related features
– Co-occurrence matrices
Structural methods (grammars and production rules)
Representation of perceptual features: Texture (2)
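A minimal sketch of the co-occurrence matrices mentioned in the statistical methods above (an assumption of this note, not the lecture's code): counts of gray-level pairs at horizontally adjacent pixels.

```python
# Sketch: gray-level co-occurrence matrix for a horizontal displacement
# of one pixel, one of the statistical texture methods listed above.

def cooccurrence(image, levels):
    """Count pairs (i, j) of gray levels at horizontally adjacent pixels."""
    C = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            C[a][b] += 1
    return C

img = [[0, 0, 1],
       [1, 1, 0]]          # tiny 2x3 image with gray levels {0, 1}
print(cooccurrence(img, 2))  # [[1, 1], [1, 1]]
```

Texture features such as contrast, energy, and homogeneity are then computed from the (normalized) matrix.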
32
Criteria of a good shape representation:
– Each shape possesses a unique representation, invariant to translation, rotation, and scaling.
– Similar shapes should have similar representations.
Methods to extract shapes and derive features stem from image processing:
– Chain codes
– Polygonal approximations
– Skeletons
– Boundary descriptors: contour length/diameter, shape numbers
– Fourier descriptors
– Moments
Representation of perceptual features: Shape (1)
33
Representation of perceptual features: Shape (2)
Chain codes
Polygonal approximation
(I. Pitas)
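The chain codes illustrated above can be sketched as follows; this is a hypothetical helper for a 4-directional code, not code from the slides.

```python
# Sketch: 4-directional chain code for a closed boundary given as
# consecutive pixel coordinates (x, y).

# direction of each unit step: 0 = right, 1 = up, 2 = left, 3 = down
STEPS = {(1, 0): 0, (0, -1): 1, (-1, 0): 2, (0, 1): 3}

def chain_code(boundary):
    """Encode consecutive boundary pixels as direction symbols."""
    code = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:] + boundary[:1]):
        code.append(STEPS[(x1 - x0, y1 - y0)])
    return code

# a unit square traversed in image coordinates
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(chain_code(square))  # [0, 3, 2, 1]
```

Taking the first difference of the code (and its minimum circular shift, the "shape number" mentioned above) makes the description rotation- and start-point-invariant.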
34
Representation of perceptual features: Shape (3)
Face segmentation: (a) original color image, (b) skin segmentation, (c) connected components, (d) best-fit ellipses.
35
Structure: provides a Gestalt impression of the shapes in the image.
– a set of edges and corners
– used to distinguish photographs from drawings, and to classify scenes: portrait, landscape, indoor
Spatial relationships
– Spatial entities: points, lines, regions, and objects
– Relationships:
  • Directional (include a distance/angle measure)
  • Topological (do not include distance, but capture set-theoretical concepts, e.g., disjunction)
– They are represented symbolically.
Representation of perceptual features: Structure/Spatial relationships
36
Motion is the main characterizing element in a sequence of frames.
Related to a change in the relative position of spatial entities or to a camera movement.
Methods:
– Detection of temporal changes of gray-level primitives (optical flow)
– Extraction of a set of sparse characteristic features of the objects, such as corners or salient points, and their tracking in subsequent frames.
Motion plays a crucial role in video.
Representation of perceptual features: Motion
Salient features (Kanade et al.)
37
Identification of objects, roles, actions, and events as abstractions of visual signs.
Achieved through recognition and interpretation.
Recognition
– Selects a set of low-level local features and uses statistical pattern recognition for object classification.
Interpretation is based on reasoning.
– Domain-dependent, e.g., Photobook (www-white.media.mit.edu)
– Retrieval systems including interpretation: facial database systems to compare facial expressions
Representation of content semantics: Semantic primitives
38
Grammar of color usage to formalize effects.
Association of color hue, saturation, etc. with psychological behaviors.
Semiotics identifies two distinct steps for the production of meaning:
– Abstract level, by narrative structures (e.g., camera breaks, colors, editing effects, rhythm, shot angle)
– Concrete level, by discourse structures: how the narrative elements create a story.
Representation of content semantics: Semiotics
39
Pre-attentive: perceived similarity between stimuli
– color/texture/shape; models close to human perception
Attentive: interpretation
– previous knowledge and a form of reasoning
– domain-specific retrieval applications (mugshots); need for models and similarity-criteria definition
Similarity models
40
Distance in a metric psychological space. Properties of a distance function d:
– d(x, y) ≥ 0, with d(x, y) = 0 if and only if x = y
– d(x, y) = d(y, x) (symmetry)
– d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
Commonly used distance functions: Euclidean, city-block, Minkowski
Metric model (1)
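The three distance functions named above can be written as one formula, since the Minkowski distance generalizes the other two (p = 2 gives Euclidean, p = 1 gives city-block). A minimal sketch:

```python
# Sketch: Minkowski distance of order p between feature vectors.

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x, y, 2))   # Euclidean: 5.0
print(minkowski(x, y, 1))   # city-block: 7.0
```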
41
Inadequacies: shape similarity
Advantages:
– similarity judgment of color stimuli
– consistent with pattern recognition and computer vision
– suitable for creating indices
Other similarity models:
– Virtual metric spaces
– Tversky's model: a function of two types of features, those common to the two stimuli and those that appear exclusively in only one stimulus.
– Transformational distances: elastic graph matching
User subjectivity?
Metric model (2)
42
Self-improving database browser and annotator based on user interaction.
Similarity is presented with groupings.
The system chooses, in tree hierarchies, the nodes that most efficiently represent the positive examples.
A set-covering algorithm removes all positive examples covered.
The process iterates.
Four Eyes approach
43
To avoid sequential scanning.
Retrieved images are ranked in order of similarity to a query.
Compound measure of similarity between visual features and text attributes.
Indexing of string attributes; commonly used indexing techniques:
– Hashing tables and signatures
– Cosine similarity function
Indexing methods (1)
44
Triangle inequality (Barros et al.)
– When the query item q is presented, d(q, r) is computed, where r is a reference item; the distances d(i, r) for all database items i are precomputed.
– Maximum threshold l = d(q, r); r is treated as the most similar item so far.
– Search for items whose d(i, r) is closest to d(q, r); by the triangle inequality, d(q, i) ≥ |d(i, r) − d(q, r)|.
– If an item i with a distance to q smaller than l is found, item i is regarded as the most similar item, and l is updated.
– Continue until |d(i, r) − d(q, r)| > l.
Indexing methods (2)
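The pruning idea above can be sketched as follows. This is an illustrative reading of the scheme, not the authors' code; Euclidean feature vectors are assumed, and in practice the d(i, r) values are precomputed off-line.

```python
# Sketch: triangle-inequality pruning with one reference item r.
# Since d(q, i) >= |d(i, r) - d(q, r)|, items are visited in order of
# that lower bound; once the bound exceeds the best distance found,
# no remaining item can be closer and the search stops.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def search(query, items, ref):
    d_qr = dist(query, ref)
    d_ir = [dist(i, ref) for i in items]      # in practice precomputed
    order = sorted(range(len(items)), key=lambda i: abs(d_ir[i] - d_qr))
    best, l = None, float("inf")
    for i in order:
        if abs(d_ir[i] - d_qr) >= l:          # bound exceeds best: stop
            break
        d = dist(query, items[i])
        if d < l:
            best, l = i, d
    return best, l

items = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0)]
print(search((1.2, 1.0), items, ref=(0.0, 0.0)))  # item 2 is closest
```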
45
Fixed grids: non-hierarchical index structure that organizes the space into buckets.
Grid files: fixed grids with buckets of unequal size.
K-d trees: binary tree; the value of one of the k features is checked at each node.
R-trees: partition the feature space into multidimensional rectangles.
SS-trees: weighted Euclidean distance; suitable for clustering; ellipsoidal clusters.
Index structures
46
Performance evaluation
Judgment by evaluator:
                Relevant                  Not relevant
Retrieved       A (correctly retrieved)   C (falsely retrieved)
Not retrieved   B (missed)                D (correctly rejected)

recall = A / (A + B)
precision = A / (A + C)
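The contingency table above translates directly into code; the counts in the example are hypothetical.

```python
# Sketch: recall and precision from the contingency-table counts.

def recall(A, B):
    """A = correctly retrieved, B = missed (relevant but not retrieved)."""
    return A / (A + B)

def precision(A, C):
    """A = correctly retrieved, C = falsely retrieved."""
    return A / (A + C)

# hypothetical outcome: 8 relevant images retrieved, 2 missed, 4 false alarms
print(recall(8, 2))     # 0.8
print(precision(8, 4))  # ~ 0.667
```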
47
Wrap-up
Visual information retrieval is a research topic at the intersection of digital image processing, pattern recognition, and computer vision (fields of our interest/expertise), but also information retrieval and databases. Related to the semantic web.
A challenging research topic dealing with many unsolved problems:
– segmentation
– machine similarity vs. human perception
– focused searching
48
Still Image Segmentation: Comparison of ICM and LVQ
Comparison
– Iterated Conditional Modes (ICM)
– Split-and-merge Learning Vector Quantizer (LVQ)
Ability to extract meaningful image parts based on the ground truth.
Evaluation of still image segmentation algorithms.
49
Iterated Conditional Modes (ICM)
The ICM method is based on the maximization of the probability density function of the image model given real image data.
The criterion function is:

p(x_s = q | y_s, x_{N8(s)}) ∝ exp( −(y_s − m_q)² / (2σ_q²) − Σ_{C: s∈C} V_C(x) )

where x_s is the region assignment and y_s the luminance value of pixel s; m_i and σ_i are the mean value and standard deviation of the luminance of region i; C is a clique containing pixel s; V_C(x) is the potential function of C; and N8(s) is the 8-neighborhood of pixel s.
50
How ICM works
Initial segmentation is obtained using the K-means clustering algorithm. Cluster-center initialization is based on the image intensity histogram.
At each iteration, the probability (the value of the criterion function) is calculated for each pixel. Pixels are assigned to the clusters/regions with maximum probability.
Given the new segmentation, the mean intensity value and the cluster variance are re-estimated. The iterative process stops when no change occurs in the clusters.
In the obtained segmentation, small regions are merged with the nearest ones. The output image contains the large regions, each assigned its mean luminance value.
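A highly simplified sketch of the loop described above. Assumptions of this sketch (not from the slides): two regions with fixed means and variance instead of re-estimation, a Potts-type clique potential `beta` per disagreeing 8-neighbor, and a nearest-mean initialization standing in for K-means.

```python
# Sketch: simplified ICM. Minimizing the energy below is equivalent to
# maximizing the criterion p ∝ exp(-energy) of the ICM slide.

def icm(image, means, var=1.0, beta=0.7, iters=5):
    H, W = len(image), len(image[0])
    # initial labeling: nearest mean (stand-in for K-means initialization)
    labels = [[min(range(len(means)), key=lambda q: (image[y][x] - means[q]) ** 2)
               for x in range(W)] for y in range(H)]
    for _ in range(iters):
        changed = False
        for y in range(H):
            for x in range(W):
                def energy(q):
                    e = (image[y][x] - means[q]) ** 2 / (2 * var)
                    for dy in (-1, 0, 1):          # 8-neighborhood clique term
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (dy or dx) and 0 <= ny < H and 0 <= nx < W:
                                e += beta * (labels[ny][nx] != q)
                    return e
                q = min(range(len(means)), key=energy)
                if q != labels[y][x]:
                    labels[y][x], changed = q, True
        if not changed:
            break
    return labels

img = [[10, 11, 50], [9, 12, 52], [10, 49, 51]]
print(icm(img, means=[10.0, 50.0]))  # two regions: dark vs. bright pixels
```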
51
Image features and parameters of the ICM algorithm
The ICM algorithm is applied on the luminance component of the image. Input for the algorithm is a gray-level image.
The parameter of the algorithm is the value of the potential function.
The parameter controls the roughness of the segment boundaries.
The value of the parameter is tuned experimentally.
53
Learning Vector Quantizer (1)
– neural network
– self-organizing
– competitive learning law
– unsupervised
– approximates the data pdf by adjusting the weights of the reference vectors
54
Learning Vector Quantizer (2)
– codebook: reference vectors representing their nearest data patterns
– number of reference vectors:
  • predefined
  • split and merge
55
Learning Vector Quantizer (3)
Minimal error for data representation:

E = ∫ ‖x − w_c‖^r f(x) dx

Iterative correction of reference vectors:

w_c(k+1) = w_c(k) + a(k) [x(k) − w_c(k)]

where w_c is the reference vector closest to x (the winner) and a(k) is the learning rate.
56
Learning Vector Quantizer (4)
Split-and-merge technique
– Find the winner reference vector w(k) for pattern x(k).
– If x(k) is not an outlier, proceed as in standard LVQ.
– If x(k) is an outlier:
  • split the cluster and include x(k) in one of the sub-clusters, or
  • create a new cluster with seed x(k).
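A minimal sketch of the competitive update and the outlier rule above. The distance-threshold outlier test and the scalar patterns are simplifying assumptions of this sketch; the lecture's split-and-merge criterion is not spelled out here.

```python
# Sketch: one LVQ step with an outlier rule that seeds a new cluster.

def lvq_step(codebook, x, lr=0.1, outlier_thresh=5.0):
    """Move the winning reference vector toward x, or seed a new cluster."""
    c = min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))
    if abs(codebook[c] - x) > outlier_thresh:
        codebook.append(x)                       # outlier: new cluster seed
    else:
        # w_c(k+1) = w_c(k) + a(k) [x(k) - w_c(k)]
        codebook[c] += lr * (x - codebook[c])
    return codebook

book = [0.0, 10.0]
lvq_step(book, 1.0)    # winner 0.0 moves toward the pattern
lvq_step(book, 20.0)   # outlier: a new reference vector is appended
print(book)            # [0.1, 10.0, 20.0]
```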
58
Experimental set-up (1)
Apply both methods on images provided by the Bridgeman Art Library (BAL).
Explore the ability of the algorithms to extract meaningful image parts based on the qualitative description of the ground truth.
59
Paintings from the Bridgeman Art Library
– sky, mountains, people, water (smpw)
– hammerhead cloud, reflection (cr)
– sky, buildings, trees, people, pavement (sbtpp)
– sky, people, hat (sph)
– sky, trees, water, sails (stws)
– horses, sledges, people, snow, sky (hspss)
60
Experimental set-up (2)
We denote by O = {O1, ..., OM} the set of objects given in the qualitative description of the ground truth, where M is the number of objects.
We denote by T = {T1, ..., TN} the set of uniquely labeled regions obtained in the segmented image, where N is the number of regions.
Three cases of the segmentation outcome, as compared to the ground truth, are possible.
61
Matching
Case 1, best match (BM): the region of the segmented image has a one-to-one correspondence with the ground-truth object.
Case 2, reasonable match (RM): the ground-truth object has a one-to-many correspondence with the regions of the segmented image.
Case 3, mismatch (MM): there is no correspondence between the ground-truth objects and the regions of the segmented image.
62
Three cases
For the jth ground-truth object Oj, denoting the cases by i and the segmented regions by Tk, the three cases occur as follows:

i = 1, when Oj = Tk (one-to-one correspondence),
i = 2, when Oj = Tk1 ∪ ... ∪ Tkn (one-to-many correspondence),
i = 3, when Oj corresponds to no Tk,   k = 1, ..., N.
63
Decision
The decision about the presence of the ground-truth object Oj in the segmented image, according to the cases, is:

r_ij = 1, if case i occurs for object Oj,
r_ij = 0, otherwise.

We record a decision for each object after visual examination of the segmented image according to the definition of the ground truth.
64
Assessment of results (1)
Ground truth: sky, buildings, trees, people, pavement
67
Assessment of results (4)
Ground truth: horses, sledges, people, snow, sky
70
Assessment of results (7)
Number of regions:

Image   Gr. truth   ICM   LVQ
smpw    5           7     7
cr      4           10    8
sbtpp   5           13    8
sph     3           14    8
stws    4           15    11
hspss   3           5     8
71
Assessment of results (8)
Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch)

         ICM                 LVQ
Image    BM    RM    MM      BM    RM    MM
smpw     0     0.75  0.25    0.5   0.5   0
cr       0     0.75  0.25    0.5   0.5   0
sbtpp    0.4   0.4   0.2     0.2   0.8   0
sph      0     1     0       0     1     0
stws     0.5   0.25  0.25    0.25  0.75  0
hspss    0.33  0     0.66    0     1     0
72
Assessment of results (9)
Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch)

         ICM                 LVQ
         BM    RM    MM      BM    RM    MM
Average  0.20  0.53  0.27    0.24  0.76  0
73
Evaluation of Image Segmentation Algorithms (1)
Cián Shaffrey, Univ. of Cambridge
74
Evaluation of Image Segmentation Algorithms (2)
Evaluation within the semantic space: impossible to ask the Average User to provide all possible h.
Compromise: evaluation in the indexing space; allows us to access S without explicitly defining σ.
Average User: to achieve a consensus on h.
Ask users to evaluate two proposed arrows π to obtain the Average User's response; implicitly characterize h and σ.
75
Evaluation of Image Segmentation Algorithms (3)
Unsupervised algorithms
1. Multiscale Image Segmentation (UCAM-MIS)
2. Blobworld (UC Berkeley-Blobworld)
3. Iterated Conditional Modes (AUTH-ICM)
4. Learning Vector Quantizer (AUTH-LVQ)
5. Double Markov Random Field (TCD-DMRF)
6. Complex Wavelet based Hidden Markov Tree (UCAM-CHMT)
76
Evaluation of Image Segmentation Algorithms (4)
Hard measurements.
Soft measurements: the speed of response of the user (time⁻¹), i.e., how much better the user prefers one scheme over the other.
– Faster response: the selected scheme provides a better semantic breakdown of the original image.
– Slower response: reflects the similarity of the two schemes.
Aims:
– To determine whether or not agreement exists in users' decisions.
– Do the pairwise rankings lead to consistent total orderings?
– Do hard and soft measurements coincide?
77
Evaluation of Image Segmentation Algorithms (5)
Cián Shaffrey, Univ. of Cambridge
78
Evaluation of Image Segmentation Algorithms (6)
Cián Shaffrey, Univ. of Cambridge
79
Wrap-up
ICM
– continuous, large-sized regions
– appropriate for homogeneous regions
LVQ
– spatially connected, small regions
– more detailed segmentation
Both provide good RM.
80
Image retrieval based on Hausdorff distance
Hausdorff distance definition
Advantages
How to speed up the computations
Experiments
81
Hausdorff distance definition
dH+(A, B) = sup { d(x, B) : x ∈ A }
dH−(A, B) = sup { d(y, A) : y ∈ B }
where d(v, W) = inf { d(v, w) : w ∈ W }
dH(A, B) = max( dH+(A, B), dH−(A, B) )
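For finite point sets the definitions above transcribe directly into code; this unoptimized sketch assumes 2-D points and the Euclidean point-to-point distance.

```python
# Sketch: Hausdorff distance between finite 2-D point sets.

def d(v, W):
    """d(v, W) = inf { d(v, w) : w in W }"""
    return min(((v[0] - w[0]) ** 2 + (v[1] - w[1]) ** 2) ** 0.5 for w in W)

def directed(A, B):
    """dH+(A, B) = sup { d(x, B) : x in A }"""
    return max(d(x, B) for x in A)

def hausdorff(A, B):
    return max(directed(A, B), directed(B, A))

A = [(0, 0), (1, 0)]
B = [(0, 0), (3, 0)]
print(directed(A, B))   # 1.0: point (1,0) is 1 away from B
print(hausdorff(A, B))  # 2.0: point (3,0) is 2 away from A
```

Note the asymmetry of the directed distances, which is exactly what makes dH+ and dH− useful for partially obscured objects.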
82
Hausdorff distance advantages
dH(A, B) = 0 ⇔ A = B (A, B: sets representing graphical objects, object contours, etc.)
Provides information about the parameters of the transformation (complex object recognition).
Predictable: simple, intuitive interpretation.
dH+ and dH− are useful for partially obscured or erroneously segmented objects.
Possibility of generalization: replacing max by quantiles.
Possibility of taking into consideration arbitrary object transformations.
83
How to speed up the computations for comparing one pair (1)
A. Replacing objects by their contours
The HD between the objects may be large although the HD between their contours is small (e.g., disk and ring) ⇒ possibility of false alarms,
but
contours of similar objects are always similar (small HD) ⇒ no possibility of omitting similar objects.
84
How to speed up the computations for comparing one pair (2)
B. Voronoi diagram or distance transform
C. Early scan termination
D. Pruning some parts of transformation space
85
How to speed up the computations – number of models considered
Idea: a matrix of distances between models (every pair) enables
1. Pruning some models (we know they will not match the query)
2. Database navigation in an optimal search order (possibility of an early finish)
86
How to speed up the computations
A. Excluding model objects from the search
– ref: any model object; given the distance to the closest model found so far, the model closest to the query object may lie only in the colored area of the figure.
87
How to speed up the computations
B. Pruning with many reference objects
89
How to speed up the computations
D. Introducing other criteria (pre-computation)
Moment invariants:
• M1 = (M20 + M02) / m00²
• M2 = (M20 M02 − M11²) / m00⁴
where
M_pq = Σ_i Σ_j (i − i0)^p (j − j0)^q f_ij   (central moments; (i0, j0) is the centroid)
m_pq = Σ_i Σ_j i^p j^q f_ij                 (raw moments)
Shape coefficients:
• Blair-Bliss coefficient: W_BB = S / √(2π Σ r²)
where S is the object area and r the distance of an object pixel from the shape centroid.
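The moment quantities above can be sketched for a binary image f (an illustrative implementation under that assumption, not the authors' code).

```python
# Sketch: raw moments m_pq, central moments M_pq, and the translation-
# invariant quantity M1 = (M20 + M02) / m00^2 for a binary image.

def raw_moment(f, p, q):
    return sum((i ** p) * (j ** q) * f[i][j]
               for i in range(len(f)) for j in range(len(f[0])))

def central_moment(f, p, q):
    m00 = raw_moment(f, 0, 0)
    i0 = raw_moment(f, 1, 0) / m00          # centroid row
    j0 = raw_moment(f, 0, 1) / m00          # centroid column
    return sum(((i - i0) ** p) * ((j - j0) ** q) * f[i][j]
               for i in range(len(f)) for j in range(len(f[0])))

def M1(f):
    m00 = raw_moment(f, 0, 0)
    return (central_moment(f, 2, 0) + central_moment(f, 0, 2)) / m00 ** 2

f = [[1, 1],
     [1, 1]]        # 2x2 square: M1 is the same wherever the square sits
print(M1(f))        # 0.125
```

Because the central moments are taken about the centroid and normalized by m00, M1 does not change when the object is translated, which is why such invariants work as cheap pre-computed pruning criteria.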
90
Experiments – database
Database: 76 islands, represented as *.bmp images
91
Experiment 1: map query
Image retrieval. Step 1: interactive segmentation of query object
92
Experiment 1: map query
Searching order: 8 / 76 model objects were checked
Loading model 1 / 76: "amorgos.bmp"     Hausdorff distance: 0.156709
Loading model 42 / 76: "ithaca.bmp"     Hausdorff distance: 0.143915
Loading model 27 / 76: "ikaria.bmp"     Hausdorff distance: 0.080666
Loading model 31 / 76: "kasos.bmp"      Hausdorff distance: 0.080551
Loading model 20 / 76: "sikinos.bmp"    Hausdorff distance: 0.121180
Loading model 52 / 76: "alonissos.bmp"  Hausdorff distance: 0.153914
Loading model 17 / 76: "rithnos.bmp"    Hausdorff distance: 0.103512
Loading model 61 / 76: "skopelos.bmp"   Hausdorff distance: 0.045430
93
Experiment 1: map query
The minimum of the Hausdorff distance identifies the model closest to the query object.
94
Experiment 2: mouse-drawing query
Query: Santorini. Criteria: HD, position for min HD, HD + M1 + M2 + W_BB.
closest:  HD = 0.112, MCD = 1.024
second:   HD = 0.143, MCD = 1.771
furthest: max HD = 0.3072, max MCD = 3.4326
(Poros, Elafonisos among the retrieved models)
95
Wrap-up
The Hausdorff distance is better for shape recognition than feature-based criteria.
The big computational cost of image retrieval based on HD can be reduced by:
• decreasing the cost of computation for a pair of objects
  – replacing objects by their contours
  – using a Voronoi diagram
• off-line database processing – calculating the matrix of distances between model objects
  – reducing the number of model objects to be compared
  – optimal searching order
• using features as auxiliary similarity criteria
97
Outline
Entropy, joint entropy, and mutual information
Shot cut detection based on mutual information
Fade detection based on joint entropy
Key frame selection
Comparison with other methods
Wrap-up
98
Entropy – Joint Entropy
• Entropy of a random variable (RV) X: a measure of the information content or the "uncertainty" about X.
• Joint entropy of RVs X and Y.
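The two definitions above, H(X) = −Σ p(x) log2 p(x) and H(X, Y) = −Σ p(x, y) log2 p(x, y), as a minimal sketch:

```python
# Sketch: entropy and joint entropy in bits.
import math

def entropy(p):
    """H = -sum p_i log2 p_i over a probability distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def joint_entropy(pxy):
    """Joint entropy: the entropy of the flattened joint distribution."""
    return entropy([p for row in pxy for p in row])

print(entropy([0.5, 0.5]))               # 1.0 bit (fair coin)
print(joint_entropy([[0.25, 0.25],
                     [0.25, 0.25]]))     # 2.0 bits (two independent coins)
```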
99
Mutual Information
It measures the average reduction in uncertainty about X that results from learning the value of Y.
It measures the amount of information that X conveys about Y.
100
For each pair of successive frames f_t and f_{t+1}, whose gray levels vary from 0 to N−1:
• Calculate three N×N co-occurrence matrices, one for each chromatic component R, G, and B, whose (i, j) element is the joint probability of observing a pixel with the ith gray level in f_t and the jth gray level in f_{t+1}.
• Calculate the mutual information of the gray levels for the three components R, G, and B independently and sum them:
C_{t,t+1} = C^R_{t,t+1} + C^G_{t,t+1} + C^B_{t,t+1}
Algorithm for detecting abrupt cuts (1)
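One R/G/B term of the measure above can be sketched as follows: the joint probability matrix of gray levels in two successive frames, and the mutual information of that matrix. This is an illustrative sketch with a tiny N, not the authors' implementation; the two toy "frames" are hypothetical.

```python
# Sketch: mutual information of gray levels between frames f_t and f_{t+1}.
import math

def mutual_information(ft, ft1, N):
    pixels = [(a, b) for ra, rb in zip(ft, ft1) for a, b in zip(ra, rb)]
    total = len(pixels)
    C = [[0.0] * N for _ in range(N)]
    for a, b in pixels:
        C[a][b] += 1.0 / total                 # joint probability p(i, j)
    px = [sum(row) for row in C]               # marginal of f_t
    py = [sum(C[i][j] for i in range(N)) for j in range(N)]
    return sum(C[i][j] * math.log2(C[i][j] / (px[i] * py[j]))
               for i in range(N) for j in range(N) if C[i][j] > 0)

same = [[0, 1], [1, 0]]
print(mutual_information(same, same, 2))             # high MI: same shot
print(mutual_information(same, [[0, 0], [0, 0]], 2)) # MI drops: cut-like
```

In the full method this quantity is computed per chromatic component and summed; a sharp drop of the summed value signals an abrupt cut.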
101
– Apply a robust estimator of the mean value to the time series of mutual-information values, by defining a time window around each time instant t0.
– An abrupt cut is detected if the mutual-information value falls sufficiently below this robust local estimate.
Algorithm for detecting abrupt cuts (2)
102
• Mutual-information pattern from the "star" video sequence, which depicts cuts.
Mutual information pattern (1)
104
Performance evaluation
GT denotes the ground truth; Seg denotes the segmented (correct and false) shots using our methods.
Recall corresponds to the probability of detection.
Precision corresponds to the accuracy of the method, accounting for false detections.
Overlap (for fades).
107
Alternative technique for shot cut detection
– Features that could be used to define a distance measure:
  • Successive color frame differences
  • Successive color vector bin-wise HS histogram differences (invariant to brightness changes)
– Fusion of the two differences
– Shot cut detection by adaptive local thresholding
108108
results using mutualinformation
results using the combined method
Comparison of abrupt cut detection methods
109
If G(x,y,t) is a gray-scale sequence, then chromatic scaling of G(x,y,t) can be modeled as G'(x,y,t) = α(t) G(x,y,t).
Therefore, a fade-out of duration T can be modeled as G'(x,y,t) = G(x,y,t) (1 - t/T), t ∈ [0,T],
and a fade-in as G'(x,y,t) = G(x,y,t) (t/T), t ∈ [0,T].
Fades (1)
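The linear scaling model can be illustrated directly (a pure-Python sketch; `duration` stands for the fade length T in frames):

```python
def fade_out(frame, t, duration):
    """Fade-out: scale every pixel by (1 - t/duration), t in [0, duration]."""
    return [[p * (1 - t / duration) for p in row] for row in frame]

def fade_in(frame, t, duration):
    """Fade-in: scale every pixel by t/duration, t in [0, duration]."""
    return [[p * (t / duration) for p in row] for row in frame]
```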
110
part of video sequence showing fade-in
part of video sequence showing fade-out
Fades (2)
111
• Mutual information pattern from the "basketball" video sequence, showing cuts and a fade
Mutual information pattern (2)
112
For each pair of successive frames f_t and f_{t+1}, calculate the joint entropy of the basic chromatic components.
Determine the values of the joint entropy that are close to zero.
Detect the fade-out (fade-in):
• The first (last) zero value defines the end (start) of the fade-out (fade-in).
• Find the start (end) of the fade-out (fade-in).
A fade should last at least 2 frames.
Algorithms for detecting fades (1)
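The steps above can be sketched as follows (pure Python; the near-zero threshold and the minimum run length are illustrative parameters, not values from the slides):

```python
import math

def joint_entropy(frame_a, frame_b):
    """Joint entropy H(X,Y) of the gray levels of two successive frames."""
    counts, pixels = {}, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for x, y in zip(row_a, row_b):
            counts[(x, y)] = counts.get((x, y), 0) + 1
            pixels += 1
    return -sum((c / pixels) * math.log2(c / pixels)
                for c in counts.values())

def detect_fades(entropy_series, threshold=0.05, min_len=2):
    """Return (start, end) index pairs of runs where the joint entropy
    stays below `threshold` for at least `min_len` transitions."""
    fades, start = [], None
    for t, h in enumerate(entropy_series):
        if h < threshold and start is None:
            start = t                      # run of near-zero entropy begins
        elif h >= threshold and start is not None:
            if t - start >= min_len:
                fades.append((start, t - 1))
            start = None
    if start is not None and len(entropy_series) - start >= min_len:
        fades.append((start, len(entropy_series) - 1))
    return fades
```

During a fade the frames approach a constant (e.g. black) image, so the joint entropy of successive frames drops toward zero; a run of such values marks the fade interval.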
113
(joint-entropy plot annotated with a fade-out and a cut)
frame 1765, frame 1770, frame 1775, frame 1780
frame 1785, frames 1791-1802, frame 1803, frame 1805
Joint entropy pattern (1)
114
(joint-entropy plot with the detection threshold and two fades marked)
frame 4420, frame 4425, frame 4426, frame 4430, frame 4440
Cut to the dark frame
Joint entropy pattern (2)
115
results using the joint entropy
results using the average frame value
Comparison of fade detection methods (1)
116
results using the joint entropy
results using the average frame value
Comparison of fade detection methods (2)
117
Split-and-merge algorithm:
– based on the time series of mutual information of gray levels between successive frames within the shot
– choose clusters of large size
– select the first frame of each cluster as a potential key frame
– test the similarity of the potential key frames using mutual information
Algorithm for key frame selection (1)
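A simplified stand-in for the clustering step (this sketch splits the mutual-information series at low values instead of performing a full split-and-merge pass; the thresholds are illustrative):

```python
def select_key_frames(mi_series, split_thresh=1.0, min_cluster=3):
    """mi_series[t] is the mutual information between frames t and t+1
    within a shot.  Group consecutive frames whose transitions stay above
    `split_thresh`, keep clusters of at least `min_cluster` frames, and
    take the first frame of each as a key-frame candidate."""
    clusters, start = [], 0
    for t, mi in enumerate(mi_series):
        if mi < split_thresh:             # low MI => content changes here
            if t + 1 - start >= min_cluster:
                clusters.append((start, t))        # frames start..t
            start = t + 1
    if len(mi_series) + 1 - start >= min_cluster:   # trailing cluster
        clusters.append((start, len(mi_series)))
    return [c[0] for c in clusters]       # first frame of each large cluster
```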
121
frame 314 frame 2026 frame 2904 frame 4344
key frames selected from different shots
two key frames selected from one shot
frame 2607 frame 2637
Key frame selection (4)
122
Wrap-up
New methods for detecting cuts and fades with high precision have been described.
Accurate detection of fade borders (starting and ending point) has been achieved.
Comparisons with other methods demonstrate the accuracy of the proposed techniques.
Satisfactory results for key frame selection by performing clustering on the mutual information series have been reported.
123
Introduction
Applications
Standard
Description elements
Visual structural elements
Description schemes for still images and video
Wrap-up
MPEG-7: Standard for Multimedia Information Systems
124
MPEG-7 annotates:
– data in
  • MPEG-4 object-based representations (interactive representations)
  • MPEG-2 and MPEG-1 (frame-based encoding of waveforms)
– analog data (e.g., VHS)
– photo prints
– artistic pictures
It is not about compression. Aim: description of audiovisual content through
– Descriptors
– Description Schemes
– Description Definition Language
Introduction
125
Provides a generic description of audiovisual and multimedia content for
– systematic access to audiovisual information sources
– re-usability of descriptions and annotations
– management and linking of content, events, and user interaction
(Jens-Rainer Ohm, HHI)
Applications
126
o MPEG-7 consists of:
– Descriptors (D) with Descriptor Values (DV)
– Description Schemes (DS)
– Description Definition Language (DDL)
(Jens-Rainer Ohm, HHI)
Standard
127
Structural (can be extracted automatically)
o signal-based features
o regions and segments
Semantic/conceptual (mostly manual annotation)
o objects
o scenes
o events
Metadata (manual or non-signal-based annotation)
o acquisition & production
o high-level content description
o intellectual property, usage
Description elements
128
Examples of low-level visual features: color, texture, shape, motion
Examples of MPEG-7 visual descriptors:
  Color – color histogram, dominant color
  Texture – frequency layout, edge histogram
  Shape – Zernike moments, curvature peaks
  Motion – motion trajectory, parametric motion
Examples of MPEG-7 Visual Description Schemes: still region, moving region, video segment
Visual structural elements
129
Layouts for description schemes:
– hierarchical (tree)
– relational (entity-relationship graph)
Description Schemes
132
Description Definition Language
Based on the Extensible Markup Language (XML)
133
MPEG-7: Generic description interface for audiovisual and multimedia content
MPEG-7 can be used for:
– search/filtering and manipulation of audiovisual information
– multimedia browsing and navigation
– data organization, archiving, and authoring
– interpretation and understanding of multimedia content
Key technology
Wrap-up