Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual
Classifiers
Abhinav Gupta and Larry S. Davis, University of Maryland, College Park
Proceedings of ECCV 2008
Presented by: Debaleena Chattopadhyay
Presentation Outline
- The Problem Definition
- The Novelty
- The Problem Solution
- The Results
The Problem Definition
To learn visual classifiers for object recognition from weakly labeled data
Labels: city, mountain, sky, sun
Input: an image annotated only with the labels above
Expected Output: image regions labeled city, mountain, sky, sun
Novelty
To learn visual classifiers for object recognition from weakly labeled data utilizing additional language
constructs
Labels:
(Nouns) city, mountain, sky, sun
(Relations) below(mountain, sky), below(mountain, sun), above(sky, city), above(sun, city), brighter(sun, mountain), brighter(sun, city), behind(mountain, city), convex(sun, city), in(sun, sky), smaller(sun, sky)
Input: an image annotated with the nouns and relations above
Expected Output: image regions labeled city, mountain, sky, sun
Related Work
Some previous works:
• Learn classifiers for visual attributes from a training dataset of positive and negative images using a generative model [Ferrari et al.]
• Learn adjectives and nouns in two steps (adjectives in the first step, nouns in the second) using a latent model [Barnard et al.]
Some later works:
• Mining Discriminative Adjectives and Prepositions for Natural Scene Recognition [Fei-Fei Li et al., CVPR 2009]
• Joint learning of visual attributes, object classes and visual saliency [Forsyth et al., ICCV 2009]
Overview
Nouns: SEA, SUN, SKY
Pairs of nouns: (SEA, SUN), (SEA, SKY), (SKY, SEA), (SKY, SUN), (SUN, SKY), (SUN, SEA)
Relationships: in, above, below
Proposed Algorithm
• Dataset: training set annotated with nouns and binary relationships (prepositions and comparative adjectives)
• Algorithm:
o Each image is segmented into a set of regions.
o Each region is represented by a set of features.
o Classifiers for nouns are based on these features (CA).
o Classifiers for relationships are based on differential features extracted from pairs of regions (CR).
o An EM approach is used to learn the noun and relationship models simultaneously:
E-step: update the assignments of nouns to image regions, given CA and CR.
M-step: update the model parameters (CA and CR), given the updated assignments.
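The alternation between the two steps can be sketched in a toy form. This is a hedged illustration, not the paper's implementation: noun models are reduced to 1-D Gaussian means over a single region feature, and the relationship classifier CR is omitted for brevity; all names and values are illustrative.

```python
def e_step(images, means):
    """E-step sketch: assign each annotated noun to the region that best fits it."""
    assignments = []
    for labels, regions in images:
        per_image = {}
        for noun in labels:
            # pick the region with the highest Gaussian likelihood
            # (equivalently, the smallest squared distance to the noun mean)
            per_image[noun] = max(
                range(len(regions)),
                key=lambda j: -((regions[j] - means[noun]) ** 2),
            )
        assignments.append(per_image)
    return assignments

def m_step(images, assignments, nouns):
    """M-step sketch: refit each noun model from its currently assigned regions."""
    means = {}
    for noun in nouns:
        vals = [regions[a[noun]]
                for (labels, regions), a in zip(images, assignments)
                if noun in labels]
        means[noun] = sum(vals) / len(vals)
    return means

# Toy weakly labeled data: each image is (labels, per-region feature values).
images = [({"sky", "sea"}, [0.9, 0.1]),
          ({"sky", "sea"}, [0.2, 0.8])]
nouns = {"sky", "sea"}
means = {"sky": 0.6, "sea": 0.4}   # rough initialization
for _ in range(5):                 # alternate E- and M-steps
    assignments = e_step(images, means)
    means = m_step(images, assignments, nouns)
```

Even in this stripped-down form, the loop resolves the correspondence ambiguity: "sky" locks onto the bright regions and "sea" onto the dark ones, without any per-region supervision.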
The Generative Model
[Graphical model for image annotation: nouns ns and np generate the region features Ij and Ik through the noun classifiers CA; the relationship r generates the differential features Ijk through the relationship classifier CR.]
Learning the Model
EM-approach: Simultaneously solve for the correspondence problem and learn the parameters of classifiers (noun and relationship)
E-step: Compute the noun assignments using the parameters from the previous iteration:

P(noun l assigned to region j) = P(A_j^l | I, C_A^old, C_R^old) / Σ_k P(A_k^l | I, C_A^old, C_R^old)

where

P(A^l | I, C_A^old, C_R^old) ∝ P(A^l | I, C_A^old) · P(A^l | I, C_R^old)
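The normalized assignment in the E-step is simple to sketch: combine the noun-classifier and relationship-classifier scores per region, then normalize over regions. The function name and toy scores below are illustrative.

```python
def assignment_posterior(p_noun, p_rel):
    """Normalized probability that a noun maps to each region j,
    combining per-region noun-classifier scores (CA) with
    relationship-classifier scores (CR), as in the E-step."""
    scores = [a * r for a, r in zip(p_noun, p_rel)]
    total = sum(scores)
    return [s / total for s in scores]

# With the noun classifier undecided, the relationship scores break the tie.
post = assignment_posterior([0.5, 0.5], [0.8, 0.2])
```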
Learning the Model
M-step: Update the model parameters (CA and CR) given the updated assignments from the E-step. The maximum-likelihood parameters depend upon the classifier used.

To utilize contextual information when labeling test images, priors on relationships, P(r | ns, np), are also learned from a co-occurrence table after the relationship annotations are generated.
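Estimating P(r | ns, np) from a co-occurrence table amounts to counting how often each relationship occurs for a given noun pair and normalizing. A hedged sketch, with a hypothetical triple format (relationship, subject noun, object noun):

```python
from collections import Counter

def relationship_priors(annotations):
    """Estimate P(r | ns, np) from generated relationship annotations
    by normalizing a co-occurrence count table per noun pair."""
    counts = Counter()
    pair_totals = Counter()
    for r, ns, np_ in annotations:
        counts[(r, ns, np_)] += 1
        pair_totals[(ns, np_)] += 1
    return {key: counts[key] / pair_totals[(key[1], key[2])]
            for key in counts}

# Toy annotations: (sky, sea) appears three times, twice as "above".
priors = relationship_priors([
    ("above", "sky", "sea"),
    ("above", "sky", "sea"),
    ("brighter", "sky", "sea"),
    ("in", "sun", "sky"),
])
```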
Inference - Labeling
• Test images are divided into regions. Region j is associated with features Ij and a noun nj.
• We observe Ij and must estimate nj.
• The labeling problem is constrained by priors on relationships between pairs of nouns.
• A Bayesian network represents the labeling problem, and belief propagation is used for inference.
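For a small number of regions, the effect of the relationship priors can be shown with brute-force enumeration instead of belief propagation; this is a stand-in for the slides' inference procedure, not the paper's method, and all names and scores are illustrative.

```python
from itertools import permutations

def label_regions(region_scores, rel_prior, observed_rel):
    """Choose the noun-per-region assignment maximizing the product of
    per-region noun likelihoods and the prior on the observed
    relationship between regions 0 and 1. Brute-force stand-in for
    belief propagation; only feasible for tiny label sets."""
    nouns = list(region_scores[0])
    best, best_score = None, -1.0
    for labels in permutations(nouns, len(region_scores)):
        score = 1.0
        for j, noun in enumerate(labels):
            score *= region_scores[j][noun]
        score *= rel_prior.get((observed_rel, labels[0], labels[1]), 0.0)
        if score > best_score:
            best, best_score = labels, score
    return best

# The noun classifiers alone cannot decide, but the prior on "above" can.
region_scores = [{"sky": 0.5, "sea": 0.5}, {"sky": 0.5, "sea": 0.5}]
rel_prior = {("above", "sky", "sea"): 0.9, ("above", "sea", "sky"): 0.1}
labels = label_regions(region_scores, rel_prior, "above")
```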
Experimental Results
Dataset:
• Subset of the Corel5k training and test dataset
• For training, 850 images with nouns and hand-labelled relationships between a subset of the pairs of nouns
• Nearest-neighbor and Gaussian-classifier based likelihood models for nouns
• Decision-stump based likelihood model for relationships
• 173 nouns
• 19 relationships: above, behind, below, beside, more textured, brighter, in, greener, larger, left, near, far from, on top of, more blue, right, similar, smaller, taller, shorter
• Image features used (30): area, x, y, boundary/area, convexity, moment of inertia, RGB (3), RGB stdev (3), L*a*b (3), L*a*b stdev (3), mean oriented energy in 30-degree increments (12)
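The decision-stump relationship classifiers operate on differential features, i.e. differences of per-region features across a pair. A hedged sketch with made-up feature names (y-centroid, area, brightness) standing in for the 30 features listed above:

```python
def differential_features(region_a, region_b):
    """Differential features for a region pair: per-feature differences
    that relationship classifiers such as above/below, larger/smaller,
    or brighter could threshold on. Feature names are illustrative."""
    return {
        "dy": region_a["y"] - region_b["y"],                 # above/below cue
        "darea": region_a["area"] - region_b["area"],        # larger/smaller cue
        "dbright": region_a["bright"] - region_b["bright"],  # brighter cue
    }

def stump(feature, threshold=0.0):
    """A decision stump: a single threshold on one differential feature."""
    return lambda d: d[feature] > threshold

# Toy regions (y grows downward, so the sun sits above the sea).
sun = {"y": 0.2, "area": 0.05, "bright": 0.95}
sea = {"y": 0.8, "area": 0.40, "bright": 0.30}
d = differential_features(sun, sea)
brighter = stump("dbright")
```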
Experimental Results
Resolution of Correspondence Ambiguities
• On 150 images randomly sampled from the training dataset
• Compared with human labeling
• Performance measures:
- Range of semantics identified: both algorithms give similar performance (left)
- Frequency correct: the latter algorithm identifies nouns correctly more often (right)
[Charts compare three models - nouns only, nouns & relationships (learned), nouns & relationships (human) - for the proposed EM algorithm bootstrapped by IBM Model 1 and bootstrapped by Duygulu et al.]
Experimental Results: Reducing Correspondence Ambiguity
[Side-by-side example labelings: Duygulu et al. vs. Beyond Nouns]
Experimental Results: Labeling New Images
• Dataset: subset of 500 images from the Corel5k dataset, selected randomly from images annotated with words present in the learned vocabulary
• Performance measures:
- Missed labels (left): compute St \ Sg, where St = the set of annotations provided by the Corel dataset and Sg = the set of annotations generated by the algorithm. Using the proposed Bayesian model, missed labels decrease by 24% (IBM Model 1) and 17% (Duygulu et al.)
- False labels (right): compared with human observers
Experimental Results: Image Labeling with the Constrained Bayesian Model
[Side-by-side example labelings: Duygulu et al. vs. Beyond Nouns]
Experimental Results
Precision-Recall:
• Precision ratio: the ratio of the number of images correctly annotated with a word to the number of images the algorithm annotated with that word (with respect to human observers).
• Recall ratio: the ratio of the number of images correctly annotated with a word by the algorithm to the number of images that should have been annotated with that word (with respect to the Corel annotations).
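The two ratios reduce to simple counts per word. A minimal sketch, with hypothetical counts for one vocabulary word:

```python
def precision_recall(correct, predicted, relevant):
    """Precision = correct annotations / annotations the algorithm made;
    recall = correct annotations / images that should carry the word,
    matching the ratio definitions above."""
    return correct / predicted, correct / relevant

# Hypothetical counts for one word: 8 correct out of 10 predicted,
# with 16 images that should have carried the word.
p, r = precision_recall(correct=8, predicted=10, relevant=16)
```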
Conclusion
• Most approaches to learning visual classifiers from weakly labeled data use a "bag of nouns" model and try to find correspondences using co-occurrence of image features and nouns. However, correspondence ambiguity remains.
• This paper proposes an EM-based method to simultaneously learn visual classifiers for nouns, prepositions, and comparative adjectives.
• Experimental results show that using relationship words helps reduce correspondence ambiguity, and that the constrained Bayesian model leads to better labeling performance.
Thank you