supervised learning of semantic classes for image annotation and retrieval

SUPERVISED LEARNING OF SEMANTIC CLASSES FOR IMAGE ANNOTATION AND RETRIEVAL

G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos

by: Lukáš Tencer

ECSE626 2012

Outline

• Introduction
• Prior techniques
  • Supervised OVA Labeling
  • Unsupervised Labeling
• Methodology
  • Supervised Multiclass Labeling
  • Semantic Distribution Estimation
  • Density Estimation
• Algorithm
  • Learning, Annotation, Retrieval
• Results
  • Quantitative
  • Qualitative
• Conclusion

Introduction

• Task
  • Assign labels to unknown images
  • Retrieve relevant images given labels
• Supervised Learning
  • Learning from labeled training data
  • Training data consist of pairs $\{x_i, l_i\}$, $i = 1, \dots, n$
  • Multiple instance learning
• Semantic Classes
  • Labels representing common concepts (sky, bear, snow…)
• Image Annotation and Retrieval
  • Annotation: given the image D, which labels are present in the image?
  • Retrieval: given the label, what are the top n matching images?

Introduction

Datasets:
• Corel5K – 5000 images, 272 classes
• Corel30K – 30000 images, 1120 classes
• MIRFLICKR – 25000 images, 37 classes
• (PSU) – not available anymore

ImageCLEF – the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track:
• Medical Image Retrieval
• Photo Annotation
• Plant Identification
• Wikipedia Retrieval
• Patent Image Retrieval and Classification

Introduction

[Figure: example images from Corel 5K (Bear), Corel 30K (New Zealand), and MIRFLICKR (Urban)]

Prior Techniques

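Supervised OVA
• Binary decision problem: concept present / absent
• Hidden variable $Y_i$
• Decision rule:

$$P_{X|Y_i}(X|1)\,P_{Y_i}(1) \ge P_{X|Y_i}(X|0)\,P_{Y_i}(0)$$

Unsupervised Learning
• Modeling the dependency between text labels and image features, expressed through a hidden variable $L$
• Considering just positive examples, densities for $Y_i = 1$

$$P_{X,W}(x,w) = \sum_{l=1}^{D} P_{X|L}(x|l)\,P_{W|L}(w|l)\,P_L(l)$$

[Figure: graphical model in which the hidden variable L generates the words W (W1, W2, W3) and the features X; example: L = bear, W = polar, grizzly, X = features]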

Methodology

Supervised Multiclass Labeling (SML)
• Elements of the semantic vocabulary (W) are explicitly made into semantic classes (L)!
• Random variable W: $W = i$, $i \in \{1, \dots, T\}$, if and only if $x$ is a sample from $w_i$, with class-conditional density $P_{X|W}(x|i)$
• Annotation and retrieval are then easy to do:

$$P_{W|X}(i|x) = \frac{P_{X|W}(x|i)\,P_W(i)}{P_X(x)}$$

Annotation: $i^*(X) = \arg\max_i P_{W|X}(i|X)$

Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(X_j|i)$
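The two decision rules above can be sketched with toy numbers. This is only an illustration: it assumes one-dimensional Gaussian class-conditional densities and a uniform prior, and the class names, parameters, and helper functions (`posterior`, `annotate`, `retrieve`) are hypothetical, not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Hypothetical class-conditional densities P_{X|W}(x|i): (mean, variance) per class.
CLASSES = {"sky": (0.0, 1.0), "bear": (4.0, 1.0), "snow": (8.0, 1.0)}
# Uniform prior P_W(i); this is an assumption made for the example.
PRIOR = {name: 1.0 / len(CLASSES) for name in CLASSES}

def posterior(x):
    """Bayes rule: P_{W|X}(i|x) = P_{X|W}(x|i) P_W(i) / P_X(x)."""
    joint = {c: gaussian_pdf(x, m, v) * PRIOR[c] for c, (m, v) in CLASSES.items()}
    evidence = sum(joint.values())  # P_X(x)
    return {c: p / evidence for c, p in joint.items()}

def annotate(x):
    """Annotation: i*(x) = argmax_i P_{W|X}(i|x)."""
    post = posterior(x)
    return max(post, key=post.get)

def retrieve(label, features):
    """Retrieval: j*(w_i) = argmax_j P_{X|W}(x_j|i) over a list of features."""
    mean, var = CLASSES[label]
    return max(range(len(features)), key=lambda j: gaussian_pdf(features[j], mean, var))

print(annotate(3.5))                      # label with the highest posterior
print(retrieve("snow", [0.2, 3.9, 7.5]))  # index of the best-matching feature
```

Annotation maximizes over labels for a fixed image, while retrieval maximizes over images for a fixed label, so the same densities serve both tasks.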


Methodology

Estimation of Semantic Class Distributions

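• Given the training set $D_i$ of images for class $i$, estimate $P_{X|W}(x|i)$
• Assumption: Gaussian distribution
• How to estimate?
  • Direct estimation
  • Model averaging
  • Naive averaging

Model averaging:

$$P_{X|W}(x|i) = \frac{1}{|D_i|}\sum_{l=1}^{|D_i|} P_{X|L,W}(x|l,i)$$

GMM model:

$$P_{X|L,W}(x|l,i) = \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$$

Averaged:

$$P_{X|W}(x|i) = \frac{1}{|D_i|}\sum_{l=1}^{|D_i|}\sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$$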
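As a rough illustration of the averaged model, the sketch below averages per-image mixtures and checks that the result equals a single pooled mixture whose weights are scaled by $1/|D_i|$. It assumes one-dimensional Gaussians; the example mixtures and the helper names (`mixture_pdf`, `averaged_class_pdf`, `pooled_mixture`) are made up for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, components):
    """Density of one mixture; components is a list of (weight, mean, var)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def averaged_class_pdf(x, image_mixtures):
    """Naive model averaging: mean of the per-image mixture densities."""
    return sum(mixture_pdf(x, mix) for mix in image_mixtures) / len(image_mixtures)

def pooled_mixture(image_mixtures):
    """Equivalent single mixture: pool all components, scale weights by 1/|D_i|."""
    n = len(image_mixtures)
    return [(w / n, m, v) for mix in image_mixtures for w, m, v in mix]

# Two hypothetical images of one class, each modeled by a 2-component mixture.
images = [
    [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)],
    [(0.3, 0.5, 2.0), (0.7, 2.5, 0.5)],
]
pooled = pooled_mixture(images)
print(len(pooled))  # number of components in the averaged class model
print(abs(averaged_class_pdf(1.0, images) - mixture_pdf(1.0, pooled)) < 1e-12)
```

The pooled model has one component per image component, so its size grows with the number of training images. The mixture hierarchies in the next slides address this by fitting a fixed number of class components instead.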

Methodology

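Mixture hierarchies
• First step: get a GMM from each image with regular soft EM

$$P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_{I}^{k}\, G(x, \mu_{I}^{k}, \Sigma_{I}^{k})$$

EM procedure:
• Initialization (Euclidean distance, Mahalanobis distance)
• Initial parameter estimate
• Expectation
• Maximization
• Stop after at most 200 iterations, or when the change from $P(x,z|\theta^{t})$ to $P(x,z|\theta^{t+1})$ is too small relative to $P(x,z|\theta^{t})$

Likelihood:

$$P(x,z|\theta) = \prod_{i=1}^{n}\sum_{j=1}^{2} \mathbb{1}(z_i = j)\,\pi_j\, G(x_i; \mu_j, \Sigma_j)$$

E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}\left[\log F(\theta; X, Z)\right]$

M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$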
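The E and M steps above can be sketched for a toy case. This is a minimal sketch under simplifying assumptions: one-dimensional data, two components instead of eight, initial means passed in by hand rather than chosen by a distance-based initialization, and a relative log-likelihood threshold `tol` that is a made-up value.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture."""
    params = list(zip(weights, means, variances))
    return sum(math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in params)) for x in data)

def em_gmm(data, init_means, max_iter=200, tol=1e-6, min_var=1e-6):
    """Soft EM for a 1-D Gaussian mixture, stopped by iteration count or small change."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    previous = log_likelihood(data, weights, means, variances)
    for _ in range(max_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(spread, min_var)  # floor keeps the variance positive
        current = log_likelihood(data, weights, means, variances)
        if abs(current - previous) < tol * abs(previous):
            break  # change in likelihood is too small
        previous = current
    return weights, means, variances

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances = em_gmm(data, init_means=[0.0, 6.0])
print([round(m, 2) for m in means])
```

The variance floor `min_var` is a common safeguard against collapsing components and is not something the slides specify.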

Methodology

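Mixture hierarchies for labels
• Second step: get the HGMM for each label

$$P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_{w}^{k}\, G(x, \mu_{w}^{k}, \Sigma_{w}^{k})$$

EM procedure:
• Initialization (Bhattacharyya distance)
• Initial parameter estimate
• Expectation
• Maximization
• Stop after at most 200 iterations, or when the change from $P(x,z|\theta^{t})$ to $P(x,z|\theta^{t+1})$ is too small relative to $P(x,z|\theta^{t})$

Likelihood:

$$P(x,z|\theta) = \prod_{i=1}^{n}\sum_{j=1}^{2} \mathbb{1}(z_i = j)\,\pi_j\, G(x_i; \mu_j, \Sigma_j)$$

E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}\left[\log F(\theta; X, Z)\right]$

M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$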

E and M step for HGMM

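Input: $\{\pi_j^k, \mu_j^k, \Sigma_j^k\}$, $j = 1, \dots, D_i$, $k = 1, \dots, K$

Output: $\{\pi_c^m, \mu_c^m, \Sigma_c^m\}$, $m = 1, \dots, M$

E-step:

$$h_{jk}^{m} = \frac{\left[G(\mu_j^k, \mu_c^m, \Sigma_c^m)\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^m)^{-1}\Sigma_j^k\}}\right]^{\pi_j^k N} \pi_c^m}{\sum_{l}\left[G(\mu_j^k, \mu_c^l, \Sigma_c^l)\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^l)^{-1}\Sigma_j^k\}}\right]^{\pi_j^k N} \pi_c^l}$$

M-step:

$$(\pi_c^m)^{new} = \frac{\sum_{jk} h_{jk}^{m}}{D_i K}$$

$$(\mu_c^m)^{new} = \sum_{jk} w_{jk}^{m}\, \mu_j^k, \quad \text{where } w_{jk}^{m} = \frac{h_{jk}^{m}\, \pi_j^k}{\sum_{jk} h_{jk}^{m}\, \pi_j^k}$$

$$(\Sigma_c^m)^{new} = \sum_{jk} w_{jk}^{m}\left[\Sigma_j^k + (\mu_j^k - \mu_c^m)(\mu_j^k - \mu_c^m)^T\right]$$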
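One way to read the E and M steps above is the following sketch of a single H-EM iteration. It assumes one-dimensional Gaussians, so covariances are scalar variances and the trace term reduces to a ratio of variances. The function name `hem_step`, the parameter `n_virtual` (standing in for $N$), and the toy mixtures are illustrative assumptions. The variance update uses the newly computed mean, which is one possible reading of the formula.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def hem_step(children, parents, n_virtual=1.0):
    """One H-EM iteration.

    children: image-level components (pi, mu, var), flattened over j and k.
    parents:  current class-level components (pi, mu, var), one per m.
    """
    # E-step: h[jk][m], computed in the log domain for numerical stability.
    resp = []
    for pi_jk, mu_jk, var_jk in children:
        logs = [
            pi_jk * n_virtual * (log_gaussian(mu_jk, mu_m, var_m) - 0.5 * var_jk / var_m)
            + math.log(pi_m)
            for pi_m, mu_m, var_m in parents
        ]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    # M-step: update each class-level component from the weighted children.
    new_parents = []
    for m in range(len(parents)):
        h = [r[m] for r in resp]
        pi_new = sum(h) / len(children)  # len(children) equals D_i * K
        norm = sum(h_jk * c[0] for h_jk, c in zip(h, children))
        w = [h_jk * c[0] / norm for h_jk, c in zip(h, children)]
        mu_new = sum(w_jk * c[1] for w_jk, c in zip(w, children))
        var_new = sum(w_jk * (c[2] + (c[1] - mu_new) ** 2) for w_jk, c in zip(w, children))
        new_parents.append((pi_new, mu_new, var_new))
    return new_parents

# Two hypothetical image mixtures (K = 2 each), flattened into one list.
children = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0), (0.5, 0.4, 1.0), (0.5, 9.6, 1.0)]
parents = [(0.5, 1.0, 4.0), (0.5, 9.0, 4.0)]  # hand-picked initial class mixture (M = 2)
for _ in range(20):
    parents = hem_step(children, parents)
print([(round(p, 2), round(mu, 2), round(var, 2)) for p, mu, var in parents])
```

Note that the update works only on mixture parameters, never on the original feature vectors, which is what makes the hierarchical step cheaper than refitting on pooled data.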

Algorithm - learning

Training
• For each image I in the training set for label w:
  • Decompose the image (192 px × 128 px) into 8×8 regions with a sliding window that moves 2 pixels at a time
  • Calculate the DCT for each window (8 × 8 × 3), giving a 192-d feature vector
  • Calculate a mixture of 8 Gaussians for each image using EM

$$P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_{I}^{k}\, G(x, \mu_{I}^{k}, \Sigma_{I}^{k})$$

• Calculate a mixture of 64 Gaussians for each label using H-EM

$$P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_{w}^{k}\, G(x, \mu_{w}^{k}, \Sigma_{w}^{k})$$
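The feature extraction step can be sketched as follows. This is a minimal pure-Python illustration that assumes an image stored as three channel planes, an orthonormal DCT-II, and windows that lie fully inside the image (no padding). The data layout and the function names are assumptions, not the authors' implementation.

```python
import math

BLOCK, STEP = 8, 2  # window size and sliding step from the training slide

# Orthonormal DCT-II basis: DCT[u][n] = a(u) * cos(pi * (2n + 1) * u / (2 * BLOCK)).
DCT = [
    [
        math.sqrt((1 if u == 0 else 2) / BLOCK)
        * math.cos(math.pi * (2 * n + 1) * u / (2 * BLOCK))
        for n in range(BLOCK)
    ]
    for u in range(BLOCK)
]
DCT_T = [list(row) for row in zip(*DCT)]  # transpose of the basis

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def dct2(block):
    """2-D DCT of an 8x8 block: DCT * block * DCT^T."""
    return matmul(matmul(DCT, block), DCT_T)

def extract_features(image):
    """image[c][y][x] with 3 channels; returns one 192-d vector per 8x8 window."""
    height, width = len(image[0]), len(image[0][0])
    features = []
    for top in range(0, height - BLOCK + 1, STEP):
        for left in range(0, width - BLOCK + 1, STEP):
            vector = []
            for channel in image:
                block = [row[left:left + BLOCK] for row in channel[top:top + BLOCK]]
                vector.extend(v for row in dct2(block) for v in row)
            features.append(vector)
    return features

# Small constant test image: 3 channels, 10 rows, 12 columns.
image = [[[1.0] * 12 for _ in range(10)] for _ in range(3)]
features = extract_features(image)
print(len(features), len(features[0]))
```

Under the no-padding assumption used here, a 192 × 128 image gives 93 × 61 = 5673 windows; a different border treatment would change that count.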

Algorithm – annotation, retrieval

Annotation
• Get the n (= 5) best labels for image I
• Get features from the image ((192*128/2)*192)
• Get the log-likelihood for each label and choose the best n

$$\log P_{X|W}(I|w_i) = \sum_{x \in I} \log P_{X|W}(x|w_i)$$

Retrieval
• For images $I_T$ and label w: annotate $I_T$ and rank the images by decreasing posterior score, computed from $P_{X|W}(I_T|w_i)$
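The annotation and retrieval steps can be sketched with toy label models. The example assumes one-dimensional features and hand-written mixtures. The label models, feature values, and function names are hypothetical, and scores are plain summed log-likelihoods with no label prior.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def image_log_likelihood(features, components):
    """Sum over feature vectors x of log P_{X|W}(x|w) for one label mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in components))
        for x in features
    )

def annotate(features, label_models, n=5):
    """Return the n labels with the highest summed log-likelihood."""
    scores = {w: image_log_likelihood(features, mix) for w, mix in label_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

def retrieve(images, label, label_models):
    """Return image indices ordered by decreasing score for the given label."""
    scores = [image_log_likelihood(f, label_models[label]) for f in images]
    return sorted(range(len(images)), key=lambda j: scores[j], reverse=True)

# Hypothetical label mixtures: lists of (weight, mean, variance).
label_models = {
    "sky": [(1.0, 0.0, 1.0)],
    "bear": [(1.0, 5.0, 1.0)],
    "snow": [(0.5, 9.0, 1.0), (0.5, 11.0, 1.0)],
}
print(annotate([4.5, 5.5, 5.0], label_models, n=2))
print(retrieve([[0.1, -0.2], [5.1, 4.8], [9.5, 10.5]], "bear", label_models))
```

Summing log-likelihoods treats the feature vectors of an image as independent samples, which matches the sum in the formula above.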

Results-quantitative

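Database: Corel 5k (4000 training images, 1000 testing images)

Precision: $\text{precision} = \dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$

Recall: $\text{recall} = \dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$

Per label w:

$$\text{recall} = \frac{w_C}{w_H}, \qquad \text{precision} = \frac{w_C}{w_{auto}}$$

• $w_H$: human annotated
• $w_{auto}$: automatic annotated
• $w_C$: correctly annotated images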
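The per-label measures can be computed as in the sketch below, which assumes annotations stored as a mapping from image id to a set of labels and evaluates over the ground-truth vocabulary only. The toy annotations and the function name `per_word_scores` are made up for the example.

```python
def per_word_scores(human, auto):
    """Return {label: (recall, precision)} from image-id -> label-set mappings."""
    vocabulary = set().union(*human.values())  # labels used in the ground truth
    scores = {}
    for w in sorted(vocabulary):
        w_h = sum(w in labels for labels in human.values())      # human annotated
        w_auto = sum(w in labels for labels in auto.values())    # automatic annotated
        w_c = sum(w in human[i] and w in auto.get(i, set()) for i in human)  # correct
        recall = w_c / w_h if w_h else 0.0
        precision = w_c / w_auto if w_auto else 0.0
        scores[w] = (recall, precision)
    return scores

# Hypothetical ground truth and automatic annotations for three images.
human = {1: {"sky", "plane"}, 2: {"bear", "snow"}, 3: {"sky", "water"}}
auto = {1: {"sky", "clouds"}, 2: {"bear", "sky"}, 3: {"water", "sky"}}

scores = per_word_scores(human, auto)
nonzero = sum(1 for recall, _ in scores.values() if recall > 0)
mean_recall = sum(r for r, _ in scores.values()) / len(scores)
mean_precision = sum(p for _, p in scores.values()) / len(scores)
print(nonzero, round(mean_recall, 2), round(mean_precision, 2))
```

The three printed quantities correspond to the rows reported in the annotation results: labels with recall > 0, mean recall per label, and mean precision per label.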


Annotation
[Chart legend: non-zero recall, mean recall, mean precision]

                         1      2      3      4      5      6
w with Recall > 0        140    121    110    125    90     131
Mean Recall per w        0.27   0.25   0.25   0.26   0.23   0.27
Mean Precision per w     0.25   0.24   0.23   0.23   0.20   0.23

Retrieval
[Chart legend: recall > 0 precision, all precision]

                         1      2      3      4      5      6
Mean Recall all w        0.23   0.21   0.20   0.21   0.19   0.24
Mean Recall per w R>0    0.45   0.40   0.40   0.41   0.37   0.41

Results-qualitative


Example annotations (first label set / second label set, separated by a rule on the slide):
• plane, jet, f-14, sky / sky, plane, clouds, smoke, snow
• coast, waves, water, hills / water, sky, ocean, mountain, clouds
• polar, bear, bars, cage / bear, snow, texture, sunrise, closeup
• people, cheese, market, street / people, wall, sand, flower, bird

[Figure: example images for the labels Blooms, Mountain, Pool, Smoke, Woman]

Conclusions

Pros
• Nice segmentation as a byproduct of annotation
• Great for general concepts with lots of samples
• Only weakly annotated data is required (multi-instance learning)
• Allows hierarchical representation (adding images, speed)

Cons
• Fixed number of labels per image
• Learning is time consuming
• Parameter tuning is time consuming
• Weakly represented classes could be associated with wrong concepts

Resources

Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 29, 394–410 (2007).

Gudivada, V.N., Raghavan, V.V.: Content based image retrieval systems. Computer. 28, 18–22 (1995).

Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color-and texture-based image segmentation using EM and its application to content-based image retrieval. Computer Vision, 1998. Sixth International Conference on. pp. 675–682. IEEE (1998).

Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 71, 593–613 (2009).

Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys. 40, 1-60 (2008).

Thank you for your attention

Questions?

tencer.hustej.net
@lukastencer
accuratelyrandom.blogspot.com
facebook.com/lukas.tencer

Google labeling game
