Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers Abhinav Gupta and Larry S. Davis University of Maryland, College Park Proceedings of ECCV 2008 Presented by: Debaleena Chattopadhyay


TRANSCRIPT

Page 1: Beyond nouns eccv_2008

Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual

Classifiers

Abhinav Gupta and Larry S. Davis, University of Maryland, College Park

Proceedings of ECCV 2008

Presented by: Debaleena Chattopadhyay

Page 2

Presentation Outline

- The Problem Definition

- The Novelty

- The Problem Solution

- The Results

Page 3

The Problem Definition

To learn visual classifiers for object recognition from weakly labeled data

Labels: city, mountain, sky, sun

Input: an image annotated only with the label set above

Expected Output: image regions labeled city, mountain, sky, sun

Page 4

Novelty

To learn visual classifiers for object recognition from weakly labeled data utilizing additional language constructs

Labels:
(Nouns) city, mountain, sky, sun
(Relations) below(mountain, sky), below(mountain, sun), above(sky, city), above(sun, city), brighter(sun, mountain), brighter(sun, city), behind(mountain, city), convex(sun, city), in(sun, sky), smaller(sun, sky)

Input: an image annotated with the nouns and relations above

Expected Output: image regions labeled city, mountain, sky, sun

Page 5

Related Work

Some previous work:
• Learn classifiers for visual attributes from a training dataset of positive and negative images using a generative model [Ferrari et al.]
• Learn adjectives and nouns in two steps (adjectives in the first step, nouns in the second) using a latent model [Barnard et al.]

Some later work:
• Mining Discriminative Adjectives and Prepositions for Natural Scene Recognition [Fei-Fei Li et al., CVPR 2009]
• Joint learning of visual attributes, object classes and visual saliency [Forsyth et al., ICCV 2009]

Page 6

Overview

Nouns: SEA, SUN, SKY

Pairs of Nouns: (SEA, SUN), (SEA, SKY), (SKY, SEA), (SKY, SUN), (SUN, SKY), (SUN, SEA)

Relationships: in, above, below
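The six pairs above are simply the ordered pairs of distinct nouns; as a quick sketch in Python (the function name is ours, not the paper's):

```python
from itertools import permutations

def noun_pairs(nouns):
    """All ordered pairs of distinct nouns, as enumerated on this slide."""
    return list(permutations(nouns, 2))

pairs = noun_pairs(["SEA", "SUN", "SKY"])
# six ordered pairs, e.g. ('SEA', 'SUN'), ('SEA', 'SKY'), ...
```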

Page 7

Proposed Algorithm

• Dataset: training set annotated with nouns and binary relationships (prepositions and comparative adjectives)
• Algorithm:
  o Each image is represented as a set of image regions.
  o Each image region is represented by a set of features.
  o Classifiers for nouns (CA) are based on these features.
  o Classifiers for relationships (CR) are based on differential features extracted from pairs of regions.
  o An EM approach is used to learn the noun and relationship models simultaneously:
    E-step: update the assignments of nouns to image regions, given CA and CR.
    M-step: update the model parameters (CA and CR), given the updated assignments.
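As a rough illustration of the E/M loop above, here is a toy sketch in Python. It keeps only a nearest-mean noun model on scalar region features and omits the relationship classifier CR entirely, so it is a caricature of the idea, not an implementation of the paper's algorithm:

```python
def em_label_regions(images, vocab, n_iters=20):
    """Toy EM sketch: learn a mean feature per noun from weakly
    labeled images (relationship classifier CR omitted for brevity).

    images: list of (features, bag_of_nouns) pairs, one per image,
    where features is a list of scalar region features.
    """
    # init: seed each noun's mean at a distinct value
    means = {n: float(i) for i, n in enumerate(vocab)}
    assign = []
    for _ in range(n_iters):
        # E-step: assign each region the closest noun from its image's bag
        assign = [[min(labels, key=lambda n: abs(f - means[n]))
                   for f in feats]
                  for feats, labels in images]
        # M-step: re-fit each noun's mean from its assigned regions
        for n in vocab:
            vals = [f for (feats, _), a in zip(images, assign)
                    for f, an in zip(feats, a) if an == n]
            if vals:
                means[n] = sum(vals) / len(vals)
    return means, assign

# invented toy data: two images, two regions each
images = [([0.1, 5.0], ["sky", "sun"]), ([0.2, 4.8], ["sun", "sky"])]
means, assign = em_label_regions(images, ["sky", "sun"])
```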

Page 8

The Generative Model

[Figure: graphical model for image annotation, with noun nodes ns and np, relationship node r, region features Ij and Ik, differential features Ijk, and classifier parameters CA and CR]

Page 9

Learning the Model

EM-approach: Simultaneously solve the correspondence problem and learn the parameters of the noun and relationship classifiers.

E-step: Compute the noun assignments using the parameters from the previous iteration:

  P(noun i assigned to region j in image l)
    = P(A^l_ij | I^l, CA_old, CR_old) / sum_k P(A^l_ik | I^l, CA_old, CR_old)

where the assignment likelihood P(A^l | I^l, CA_old, CR_old) factorizes into a product of noun likelihoods P(n | Ij, CA_old) and relationship likelihoods P(r | Ijk, CR_old).
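In code, the normalization in this E-step is just dividing one unnormalized assignment likelihood by the sum over the candidate regions; a minimal sketch (the scores below are made up):

```python
def assignment_posterior(scores):
    """Normalize unnormalized likelihoods P(A_ij | I, C_old) over the
    candidate regions k, per the E-step equation on this slide."""
    z = sum(scores.values())
    return {region: s / z for region, s in scores.items()}

post = assignment_posterior({"region1": 6.0, "region2": 3.0, "region3": 1.0})
# → {'region1': 0.6, 'region2': 0.3, 'region3': 0.1}
```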

Page 10

Learning the Model

Page 11

Learning the Model

EM-approach: Simultaneously solve for the correspondence problem and learn the parameters of classifiers (noun and relationship)

M-step: Update the model parameters given the updated assignments from the E-step. The maximum-likelihood parameters depend on the classifier used.

To utilize contextual information when labeling test images, priors on relationships, P(r | ns, np), are also learned from a co-occurrence table after the relationship annotations are generated.
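The co-occurrence table for P(r | ns, np) can be sketched as normalized counts over generated (relationship, noun, noun) triples; the example triples below are illustrative only:

```python
from collections import Counter, defaultdict

def relationship_priors(annotations):
    """Estimate P(r | ns, np) from generated relationship annotations
    via a co-occurrence table, as described on this slide.

    annotations: iterable of (r, ns, np) triples,
    e.g. ('above', 'sky', 'city').
    """
    counts = defaultdict(Counter)
    for r, ns, np_ in annotations:
        counts[(ns, np_)][r] += 1
    # normalize each noun pair's counts into a conditional distribution
    return {pair: {r: c / sum(cnt.values()) for r, c in cnt.items()}
            for pair, cnt in counts.items()}

priors = relationship_priors([
    ("above", "sky", "city"), ("above", "sky", "city"),
    ("below", "sea", "sky"),
])
```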

Page 12

Inference: Labeling

• Test images are divided into regions. Region j is associated with features Ij and noun nj.
• We know Ij and have to estimate nj.
• The labeling problem is constrained by the priors on relationships between pairs of nouns.
• A Bayesian network represents the labeling problem, and belief propagation is used for inference.
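To make the constrained-labeling idea concrete, here is a brute-force sketch that scores every joint labeling by noun likelihoods times relationship priors and keeps the best. The paper instead runs belief propagation on a Bayesian network, which computes this far more efficiently; all numbers and names below are invented:

```python
from itertools import product

def map_labeling(unary, pair_prior, rel_evidence):
    """Exhaustive version of the constrained labeling problem.

    unary: list of {noun: P(noun | I_j)} dicts, one per region.
    pair_prior: {(ns, np): {r: P(r | ns, np)}} learned priors.
    rel_evidence: {(j, k): r} observed relationship between regions j, k.
    """
    best, best_score = None, -1.0
    for labeling in product(*[list(u) for u in unary]):
        score = 1.0
        for j, noun in enumerate(labeling):
            score *= unary[j][noun]           # noun likelihood term
        for (j, k), r in rel_evidence.items():
            # relationship prior term; tiny floor for unseen pairs
            score *= pair_prior.get((labeling[j], labeling[k]), {}).get(r, 1e-6)
        if score > best_score:
            best, best_score = labeling, score
    return best

unary = [{"sky": 0.5, "sea": 0.5}, {"sky": 0.5, "sea": 0.5}]
priors = {("sky", "sea"): {"above": 0.9}, ("sea", "sky"): {"above": 0.1}}
best = map_labeling(unary, priors, {(0, 1): "above"})
# the relationship prior breaks the tie: region 0 = sky, region 1 = sea
```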

Page 13

Experimental Results

Dataset:
• Subset of the Corel5k training and test dataset
• For training, 850 images with nouns and hand-labelled relationships between a subset of pairs of nouns
• Nearest-neighbor and Gaussian-classifier likelihood models for nouns
• Decision-stump likelihood model for relationships
• 173 nouns
• 19 relationships: above, behind, below, beside, more textured, brighter, in, greener, larger, left, near, far from, on top of, more blue, right, similar, smaller, taller, shorter
• Image features used (30): area, x, y, boundary/area, convexity, moment of inertia, RGB (3), RGB stdev (3), L*a*b (3), L*a*b stdev (3), mean oriented energy in 30-degree increments (12)
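A decision-stump likelihood for a relationship like "above" operates on a single differential feature, e.g. the difference of two regions' y-centroids; a minimal sketch (the coordinate convention and zero threshold are assumptions, not from the paper):

```python
def stump_above(y_j, y_k, threshold=0.0):
    """Decision stump on the differential feature y_k - y_j: with image
    y growing downward, region j is 'above' region k when its centroid
    has the smaller y value."""
    return (y_k - y_j) > threshold
```

A trained stump would pick the feature and threshold that best separate positive from negative region pairs; here both are fixed for illustration.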

Page 14

Experimental Results

Resolution of Correspondence Ambiguities
• On 150 images randomly sampled from the training dataset
• Compared with human labeling
• Performance measures:
  Range of semantics identified: both algorithms give similar performance (left plot)
  Frequency correct: the latter algorithm identifies a noun correctly more often (right plot)

Conditions compared: nouns only; nouns & relationships (learned); nouns & relationships (human), for the proposed EM algorithm bootstrapped by IBM Model 1 and bootstrapped by Duygulu et al.

Page 15

Experimental Results: Reducing Correspondence Ambiguity

[Figure: example annotations, Duygulu et al. vs. Beyond Nouns]

Page 16

Experimental Results: Labeling New Images

• Dataset: subset of 500 images from the Corel5k dataset, selected randomly from images annotated with words present in the learned vocabulary.
• Performance measures:
  Missed labels (left): compute St/Sg, where St = the set of annotations provided by the Corel dataset and Sg = the set of annotations generated by the algorithm. Using the proposed Bayesian model, missed labels decrease by 24% (IBM Model 1) and 17% (Duygulu et al.).
  False labels (right): compared against human observers.
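One way to read the missed-labels measure is as the fraction of the dataset's annotations St that never appear in the generated set Sg; a hedged sketch (the paper's exact ratio may differ):

```python
def missed_label_ratio(st, sg):
    """Fraction of dataset annotations St absent from the generated
    annotations Sg (one reading of the slide's St/Sg measure)."""
    st, sg = set(st), set(sg)
    return len(st - sg) / len(st)

missed = missed_label_ratio(["sky", "sun", "sea", "city"], ["sky", "sun"])
# → 0.5
```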

Page 17

Experimental Results: Image Labeling with the Constrained Bayesian Model

[Figure: example labelings, Duygulu et al. vs. Beyond Nouns]

Page 18

Experimental Results

Precision-Recall:

Precision ratio: the number of images correctly annotated with a word divided by the number of images the algorithm annotated with that word (correctness judged with respect to human observers).

Recall ratio: the number of images correctly annotated with a word divided by the number of images that should have been annotated with that word (with respect to the Corel annotations).
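The two ratios above are straightforward per-word counts; a minimal sketch with invented numbers:

```python
def precision_ratio(n_correct, n_annotated_by_algo):
    """Correctly annotated images / images the algorithm annotated
    with the word (correctness judged against human observers)."""
    return n_correct / n_annotated_by_algo

def recall_ratio(n_correct, n_should_have):
    """Correctly annotated images / images that should carry the word
    (per the Corel annotations)."""
    return n_correct / n_should_have

p = precision_ratio(8, 10)  # 8 of 10 produced annotations are correct
r = recall_ratio(8, 16)     # 8 of 16 expected annotations are produced
```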

Page 19

Conclusion

• Most approaches to learn visual classifiers from weakly labeled data use a “bag” of nouns model and try to find correspondence using co-occurrence of image features and the nouns. However, correspondence ambiguity remains.

• This paper proposes an EM-based method to simultaneously learn visual classifiers for nouns, prepositions and comparative adjectives.

• Experimental results show that using relationship words helps reduce correspondence ambiguity, and that using the constrained model leads to better labeling performance.

Page 20

Thank you