Supervised Learning of Semantic Classes for Image Annotation and Retrieval
DESCRIPTION
This is a presentation I gave for ECSE626 "Statistical Computer Vision" at McGill University. It presents a project inspired by the paper "Supervised Learning of Semantic Classes for Image Annotation and Retrieval" (IEEE PAMI, 2007), describing my implementation of the paper and the results I achieved.

TRANSCRIPT
![Page 1: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/1.jpg)
SUPERVISED LEARNING OF SEMANTIC CLASSES FOR IMAGE ANNOTATION AND RETRIEVAL
G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos
by: Lukáš Tencer
ECSE626 2012
![Page 2: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/2.jpg)
Outline
• Introduction
• Prior techniques
  • Supervised OVA labeling
  • Unsupervised labeling
• Methodology
  • Supervised multiclass labeling
  • Semantic distribution estimation
  • Density estimation
• Algorithm
  • Learning, annotation, retrieval
• Results
  • Quantitative
  • Qualitative
• Conclusion
![Page 3: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/3.jpg)
Introduction
• Task
  • Assign labels to unknown images
  • Retrieve relevant images given labels
• Supervised learning
  • Learning from labeled training data
  • Training data consist of pairs $\{x_i, l_i\},\ i = 1, \dots, n$
  • Multiple instance learning
• Semantic classes
  • Labels representing common concepts (sky, bear, snow, …)
• Image annotation and retrieval
  • Annotation: given an image, which labels are present in it?
  • Retrieval: given a label, which are the top n matching images?
![Page 4: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/4.jpg)
Introduction
Datasets:
• Corel5K: 5000 images, 272 classes
• Corel30K: 30000 images, 1120 classes
• MIRFLICKR: 25000 images, 37 classes
• PSU: no longer available
• ImageCLEF: the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track, with tasks including
  • Medical image retrieval
  • Photo annotation
  • Plant identification
  • Wikipedia retrieval
  • Patent image retrieval and classification
![Page 5: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/5.jpg)
Introduction
(Figure: example images from Corel 5K ("Bear"), Corel 30K ("New Zealand") and MIRFLICKR ("Urban").)
![Page 6: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/6.jpg)
Prior Techniques
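Supervised OVA (one-vs-all):
• Binary decision problem: concept present / absent
• Hidden variable $Y_i$
• Decision rule: label $i$ is present if

$$P_{X|Y_i}(x|1)\, P_{Y_i}(1) \ge P_{X|Y_i}(x|0)\, P_{Y_i}(0)$$

Unsupervised learning:
• Modeling the dependency between text labels and image features, expressed through a hidden variable $L$:

$$P_{X,W}(x,w) = \sum_{l=1}^{D} P_{X|L}(x|l)\, P_{W|L}(w|l)\, P_L(l)$$

Considering just positive examples: densities for $Y_i = 1$

(Figure: graphical models with hidden variable L, words W (W1, W2, W3) and image features X; example: the label "bear" with the words "polar", "grizzly" and the image features.)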
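The two prior formulations can be contrasted with a small numeric sketch. It assumes 1-D Gaussian class densities purely for illustration; `gauss_pdf`, `ova_label_present` and `unsupervised_joint` are hypothetical helper names, not code from the paper or from this project. The first function applies the OVA decision rule, the second evaluates the unsupervised joint density by summing over the hidden states $l$.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density G(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def ova_label_present(x, pos, neg, prior_pos):
    """OVA decision rule for one label i:
    present iff P(x|Y_i=1) P(Y_i=1) >= P(x|Y_i=0) P(Y_i=0).
    pos / neg are (mean, variance) of hypothetical 1-D class densities."""
    return bool(gauss_pdf(x, *pos) * prior_pos >= gauss_pdf(x, *neg) * (1 - prior_pos))

def unsupervised_joint(x, w, feat_params, word_probs, state_priors):
    """P(x, w) = sum_l P(x|l) P(w|l) P(l) over the D hidden states l.
    feat_params[l] = (mean, variance) of P(x|l); word_probs[l, w] = P(w|l)."""
    return sum(gauss_pdf(x, mu, var) * word_probs[l, w] * state_priors[l]
               for l, (mu, var) in enumerate(feat_params))

# Toy setting: two hidden states, two words (rows of word_probs sum to 1)
feat_params = [(0.0, 1.0), (5.0, 1.0)]
word_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
state_priors = [0.5, 0.5]
print(ova_label_present(1.0, pos=(1.0, 1.0), neg=(-1.0, 1.0), prior_pos=0.5))  # True
```

Note that the OVA rule needs a density for the negative class ($Y_i = 0$), while the unsupervised model never separates positive from negative examples; it only ties words and features through the shared hidden state.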
![Page 7: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/7.jpg)
Methodology
Supervised Multiclass Labeling (SML)
• The elements of the semantic vocabulary (W) are explicitly made the semantic classes (L)
• Random variable W: $W = i,\ i \in \{1, \dots, T\}$, if and only if x is a sample from the concept $w_i$, drawn from $P_{X|W}(x|i)$
• Annotation and retrieval are then straightforward:

$$P_{W|X}(i|x) = \frac{P_{X|W}(x|i)\, P_W(i)}{P_X(x)}$$

Annotation: $i^*(X) = \arg\max_i P_{W|X}(i|X)$
Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(X_j|i)$
How to estimate $P_{X|W}(x|i)$?
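Once the class-conditional log-likelihoods are available, the SML rules above reduce to Bayes' rule and two argmax operations. The sketch below assumes a precomputed matrix `loglik[j, i]` $= \log P_{X|W}(X_j|i)$; the function names are hypothetical and the normalisation is done in log space for numerical stability.

```python
import numpy as np

def sml_posterior(loglik, log_prior):
    """P_{W|X}(i|X_j) by Bayes' rule.
    loglik: (n_images, T) array of log P_{X|W}(X_j|i); log_prior: (T,) log P_W(i)."""
    joint = loglik + log_prior                     # log P(X_j|i) + log P(i)
    joint -= joint.max(axis=1, keepdims=True)      # stabilise before exponentiating
    post = np.exp(joint)
    return post / post.sum(axis=1, keepdims=True)  # dividing by P_X(X_j)

def annotate(loglik, log_prior):
    """i*(X_j) = argmax_i P_{W|X}(i|X_j) for every image j."""
    return np.argmax(sml_posterior(loglik, log_prior), axis=1)

def retrieve(loglik, i):
    """j*(w_i) = argmax_j P_{X|W}(X_j|i): the best-matching image for label i."""
    return int(np.argmax(loglik[:, i]))

loglik = np.array([[-1.0, -3.0], [-4.0, -0.5], [-2.0, -2.5]])
log_prior = np.log([0.5, 0.5])
print(annotate(loglik, log_prior))  # [0 1 0]
print(retrieve(loglik, 1))          # 1
```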
![Page 8: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/8.jpg)
Methodology
Estimation of Semantic Class Distributions
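• Given the $D_i$ training images for class $i$, estimate $P_{X|W}(x|i)$
• Assumption: Gaussian distributions
• How to estimate?
  • Direct estimation
  • Model averaging
  • Naive averaging

Model averaging:

$$P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} P_{X|L,W}(x|l,i)$$

GMM model:

$$P_{X|L,W}(x|l,i) = \sum_k \pi_{i,l}^k\, \mathcal{G}(x, \mu_{i,l}^k, \Sigma_{i,l}^k)$$

Averaged:

$$P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} \sum_k \pi_{i,l}^k\, \mathcal{G}(x, \mu_{i,l}^k, \Sigma_{i,l}^k)$$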
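The averaged form is itself a single flat mixture: every component of every image GMM is kept, and its weight is divided by $D_i$. A minimal sketch, assuming diagonal covariances stored as variance vectors (`naive_average` and `mixture_pdf` are illustrative names):

```python
import numpy as np

def diag_gauss_logpdf(x, mu, var):
    """log G(x, mu, diag(var)) for every row of x (shape (N, d))."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var)) + np.sum((x - mu) ** 2 / var, axis=1))

def naive_average(image_gmms):
    """Pool the per-image GMMs of class i into one flat mixture.
    image_gmms: list of D_i tuples (weights (K,), means (K, d), variances (K, d)).
    Each component keeps its parameters; its weight becomes pi_{i,l}^k / D_i."""
    D = len(image_gmms)
    weights = np.concatenate([w / D for w, _, _ in image_gmms])
    means = np.concatenate([m for _, m, _ in image_gmms])
    variances = np.concatenate([v for _, _, v in image_gmms])
    return weights, means, variances

def mixture_pdf(x, weights, means, variances):
    """Evaluate sum_k weight_k * G(x, mu_k, diag(var_k)) at each row of x."""
    return sum(w * np.exp(diag_gauss_logpdf(x, m, v))
               for w, m, v in zip(weights, means, variances))

rng = np.random.default_rng(0)
g1 = (np.array([0.3, 0.7]), rng.normal(size=(2, 2)), np.ones((2, 2)))
g2 = (np.array([0.5, 0.5]), rng.normal(size=(2, 2)), np.full((2, 2), 2.0))
pooled = naive_average([g1, g2])
x = rng.normal(size=(4, 2))
```

With $D_i$ images of $K$ components each, the pooled mixture has $D_i K$ components, which is what motivates compressing it with a mixture hierarchy.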
![Page 9: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/9.jpg)
Methodology
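Mixture hierarchies
• First step: get a GMM for each image with regular soft EM

$$P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^k\, \mathcal{G}(x, \mu_I^k, \Sigma_I^k)$$

Procedure:
• Initialization (Euclidean distance / Mahalanobis distance)
• Initial parameter estimate
• Expectation
• Maximization
• Stop after at most 200 iterations, or when the change in likelihood is too small

EM, with the complete-data likelihood (two-component example):

$$L(\theta; x, z) = P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{1}(z_i = j)\, \tau_j\, \mathcal{G}(x_i; \mu_j, \Sigma_j)$$

E: $$Q(\theta|\theta^{t}) = E_{Z|X,\theta^{t}}\left[\log L(\theta; X, Z)\right], \qquad P(z|x,\theta^{t}) = \frac{P(x,z|\theta^{t})}{\sum_{z'} P(x,z'|\theta^{t})}$$

M: $$\theta^{t+1} = \arg\max_{\theta} Q(\theta|\theta^{t})$$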
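The first step can be sketched as soft EM for a diagonal-covariance GMM with the two stopping criteria from the slide (at most 200 iterations, or a small change in log-likelihood). This is an illustrative sketch, not the project's code: initialisation here simply picks random data points as means, which is an assumption standing in for the distance-based initialisation listed above.

```python
import numpy as np

def log_gauss_diag(x, mu, var):
    """log G(x_n; mu_k, diag(var_k)) for all N rows of x and K components -> (N, K)."""
    diff2 = (x[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    return -0.5 * (np.log(2 * np.pi * var).sum(axis=1)[None, :] + diff2.sum(axis=2))

def em_gmm(x, K=8, max_iter=200, tol=1e-6, seed=0):
    """Soft EM for a diagonal-covariance GMM; returns (pi, mu, var, loglik)."""
    rng = np.random.default_rng(seed)
    N, d = x.shape
    mu = x[rng.choice(N, K, replace=False)].copy()  # assumed init: random samples
    var = np.tile(x.var(axis=0) + 1e-6, (K, 1))
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities P(z = k | x_n, theta^t), computed in log space
        log_p = log_gauss_diag(x, mu, var) + np.log(pi)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        ll = log_norm.sum()
        # M-step: closed-form maximiser of Q(theta | theta^t)
        nk = resp.sum(axis=0) + 1e-10
        pi = nk / N
        mu = resp.T @ x / nk[:, None]
        var = np.maximum(resp.T @ (x ** 2) / nk[:, None] - mu ** 2, 1e-6)
        if ll - prev_ll < tol * abs(ll):  # change in likelihood is too small
            break
        prev_ll = ll
    return pi, mu, var, ll

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-5, 1, size=(200, 2)), rng.normal(5, 1, size=(200, 2))])
pi, mu, var, ll = em_gmm(x, K=2)
```

The variance floor of `1e-6` is an added safeguard against components collapsing onto a single point; it is not part of the method described on the slide.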
![Page 10: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/10.jpg)
Methodology
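Mixture hierarchies for labels
• Second step: get a hierarchical GMM (HGMM) for each label

$$P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^k\, \mathcal{G}(x, \mu_w^k, \Sigma_w^k)$$

Procedure:
• Initialization (Bhattacharyya distance)
• Initial parameter estimate
• Expectation
• Maximization
• Stop after at most 200 iterations, or when the change in likelihood is too small

EM, with the complete-data likelihood (two-component example):

$$L(\theta; x, z) = P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{1}(z_i = j)\, \tau_j\, \mathcal{G}(x_i; \mu_j, \Sigma_j)$$

E: $$Q(\theta|\theta^{t}) = E_{Z|X,\theta^{t}}\left[\log L(\theta; X, Z)\right], \qquad P(z|x,\theta^{t}) = \frac{P(x,z|\theta^{t})}{\sum_{z'} P(x,z'|\theta^{t})}$$

M: $$\theta^{t+1} = \arg\max_{\theta} Q(\theta|\theta^{t})$$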
![Page 11: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/11.jpg)
E and M step for HGMM
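Input: $\{\pi_k^j, \mu_k^j, \Sigma_k^j\},\ j = 1, \dots, D_i,\ k = 1, \dots, K$

Output: $\{\pi_m^c, \mu_m^c, \Sigma_m^c\},\ m = 1, \dots, M$

E-step:

$$h_{jk}^m = \frac{\left[\mathcal{G}(\mu_k^j, \mu_m^c, \Sigma_m^c)\, e^{-\frac{1}{2}\operatorname{trace}\{(\Sigma_m^c)^{-1}\Sigma_k^j\}}\right]^{\pi_k^j N} \pi_m^c}{\sum_l \left[\mathcal{G}(\mu_k^j, \mu_l^c, \Sigma_l^c)\, e^{-\frac{1}{2}\operatorname{trace}\{(\Sigma_l^c)^{-1}\Sigma_k^j\}}\right]^{\pi_k^j N} \pi_l^c}$$

M-step:

$$(\pi_m^c)^{new} = \frac{\sum_{jk} h_{jk}^m}{K D_i}$$

$$(\mu_m^c)^{new} = \sum_{jk} w_{jk}^m\, \mu_k^j, \qquad \text{where } w_{jk}^m = \frac{h_{jk}^m\, \pi_k^j}{\sum_{jk} h_{jk}^m\, \pi_k^j}$$

$$(\Sigma_m^c)^{new} = \sum_{jk} w_{jk}^m \left[\Sigma_k^j + (\mu_k^j - \mu_m^c)(\mu_k^j - \mu_m^c)^T\right]$$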
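The E- and M-steps above can be sketched with diagonal covariances, where the trace term becomes a sum of variance ratios and $(\mu_k^j - \mu_m^c)(\mu_k^j - \mu_m^c)^T$ reduces to its diagonal. This is a hedged illustration: `hem` is a hypothetical name, initialisation from randomly chosen child components is an assumption, `N` is treated as a user-set parameter, and the loop runs a fixed number of iterations rather than testing the likelihood change.

```python
import numpy as np

def hem(child_pi, child_mu, child_var, D, M=64, N=100, n_iter=200, seed=0):
    """Hierarchical EM (diagonal covariances): reduce the D*K image-level
    components to M label-level components. child_pi holds pi_k^j for all
    D*K children (each image's weights sum to 1); child_mu/var are (D*K, d)."""
    rng = np.random.default_rng(seed)
    C, d = child_mu.shape
    K = C // D
    idx = rng.choice(C, M, replace=False)  # assumed init: random children
    mu, var = child_mu[idx].copy(), child_var[idx].copy()
    pi = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        # E-step: log of [G(mu_k^j; mu_m, S_m) exp(-1/2 tr(S_m^-1 S_k^j))]^(pi_k^j N) pi_m
        diff2 = (child_mu[:, None, :] - mu[None]) ** 2
        log_g = -0.5 * (np.log(2 * np.pi * var).sum(1)[None] + (diff2 / var[None]).sum(2))
        tr = (child_var[:, None, :] / var[None]).sum(2)
        log_h = (child_pi * N)[:, None] * (log_g - 0.5 * tr) + np.log(pi)[None]
        log_h -= log_h.max(axis=1, keepdims=True)
        h = np.exp(log_h)
        h /= h.sum(axis=1, keepdims=True)             # h_{jk}^m, shape (C, M)
        # M-step
        pi = h.sum(axis=0) / (K * D)
        w = h * child_pi[:, None]
        w /= w.sum(axis=0, keepdims=True) + 1e-300    # w_{jk}^m, columns sum to 1
        mu = w.T @ child_mu
        # sum_jk w [S_k^j + (mu_k^j - mu_m)^2] = w.T S + w.T mu_k^2 - mu_m^2
        var = np.maximum(w.T @ child_var + w.T @ (child_mu ** 2) - mu ** 2, 1e-6)
    return pi, mu, var

rng = np.random.default_rng(2)
D, K, d = 3, 4, 2
child_mu = rng.normal(size=(D * K, d))
child_var = rng.uniform(0.5, 1.5, size=(D * K, d))
child_pi = np.tile(np.full(K, 1.0 / K), D)
pi, mu, var = hem(child_pi, child_mu, child_var, D, M=3, N=50)
```

Because HEM only touches the $D_i K$ child parameters and never the raw feature vectors, the label model can be rebuilt cheaply when new images are added.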
![Page 12: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/12.jpg)
Algorithm - learning
Training: for each training set of images I for label w:
• Decompose each image (192 px × 128 px) into 8 × 8 regions using a sliding window that moves by 2 pixels
• Compute the DCT of each window (8 × 8 × 3), giving a 192-dimensional feature vector
• Fit a mixture of 8 Gaussians to each image using EM

$$P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^k\, \mathcal{G}(x, \mu_I^k, \Sigma_I^k)$$

• Fit a mixture of 64 Gaussians for each label using H-EM

$$P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^k\, \mathcal{G}(x, \mu_w^k, \Sigma_w^k)$$
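The feature extraction step can be sketched with an orthonormal 2-D DCT-II computed as $C B C^T$ on each 8 × 8 block and each of the three channels. The slide does not state the colour space or coefficient ordering, so this sketch assumes channels are used as given and coefficients are flattened row by row; `dct_matrix` and `dct_features` are illustrative names.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C (n x n), so that C @ C.T = I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct_features(img, win=8, step=2):
    """Slide a win x win window with the given step over an (H, W, 3) image and
    return one 2-D DCT per channel per window, concatenated: 8*8*3 = 192 values."""
    C = dct_matrix(win)
    H, W, ch = img.shape
    feats = []
    for r in range(0, H - win + 1, step):
        for s in range(0, W - win + 1, step):
            block = img[r:r + win, s:s + win, :]
            coeffs = [C @ block[:, :, c] @ C.T for c in range(ch)]
            feats.append(np.concatenate([cf.ravel() for cf in coeffs]))
    return np.array(feats)

img = np.ones((128, 192, 3))  # height 128, width 192, three channels
f = dct_features(img)
print(f.shape)                # (5673, 192)
```

For a 192 × 128 image, a step of 2 gives 93 × 61 = 5673 windows, each yielding one 192-dimensional vector.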
![Page 13: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/13.jpg)
Algorithm – annotation, retrieval
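Annotation:
• Get the n (= 5) best labels for image I
• Extract the features from the image ((192 × 128 / 2) × 192)
• Compute the log-likelihood of the image under each label model and choose the best n:

$$\log P_{X|W}(\mathcal{X}|w_i) = \sum_{x \in \mathcal{X}} \log P_{X|W}(x|w_i)$$

Retrieval:
• For test images $I_T$ and label w: annotate each $I_T$ and rank the images by decreasing posterior $P_{W|X}(w_i|\mathcal{X})$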
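Both procedures can be sketched on top of per-label diagonal GMMs: annotation sums per-vector log-likelihoods and keeps the top n labels, and retrieval ranks images by the log-posterior of the query label. A uniform label prior is assumed here for the posterior; the function names are hypothetical.

```python
import numpy as np

def gmm_loglik(x, pi, mu, var):
    """sum over feature vectors x (N, d) of log P_{X|W}(x|w) for a diagonal GMM."""
    diff2 = (x[:, None, :] - mu[None]) ** 2 / var[None]
    log_comp = np.log(pi)[None] - 0.5 * (np.log(2 * np.pi * var).sum(1)[None] + diff2.sum(2))
    return np.logaddexp.reduce(log_comp, axis=1).sum()

def annotate(x, label_gmms, n=5):
    """Indices of the n labels with the highest log-likelihood, plus all scores."""
    scores = np.array([gmm_loglik(x, *g) for g in label_gmms])
    return np.argsort(-scores)[:n].tolist(), scores

def retrieve(images, label_gmms, i):
    """Rank images by decreasing log P_{W|X}(w_i|X), assuming a uniform label prior."""
    log_post = []
    for x in images:
        s = np.array([gmm_loglik(x, *g) for g in label_gmms])
        log_post.append(s[i] - np.logaddexp.reduce(s))
    return np.argsort(-np.array(log_post)).tolist()

# Two one-component 1-D label models centred at 0 and 5
gmms = [(np.array([1.0]), np.array([[0.0]]), np.array([[1.0]])),
        (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))]
A = np.array([[0.1], [-0.2], [0.0]])
B = np.array([[5.1], [4.9]])
```

Ranking by the log-posterior gives the same order as the posterior itself but avoids underflow when an image contains thousands of feature vectors.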
![Page 14: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/14.jpg)
Results-quantitative
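Database: Corel 5K (4000 training images, 1000 test images)

Precision and recall:

$$\text{precision} = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}, \qquad \text{recall} = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$$

Per-word annotation metrics:

$$\text{recall} = \frac{|w_C|}{|w_H|}, \qquad \text{precision} = \frac{|w_C|}{|w_{auto}|}$$

where $|w_H|$ is the number of images annotated with w by humans, $|w_{auto}|$ the number of images annotated with w automatically, and $|w_C|$ the number of images annotated with w correctly.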
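The per-word metrics can be computed directly from the human and automatic label sets. This is a sketch; the convention of scoring a word as 0 when its denominator is zero (never annotated by humans, or never predicted) is an assumption, and `per_word_metrics` is a hypothetical name.

```python
def per_word_metrics(human, auto, vocab):
    """human, auto: lists of label sets, one per test image.
    Returns (mean recall, mean precision, number of words with recall > 0)."""
    recalls, precisions = [], []
    for w in vocab:
        w_h = sum(w in h for h in human)                            # |w_H|
        w_auto = sum(w in a for a in auto)                          # |w_auto|
        w_c = sum(w in h and w in a for h, a in zip(human, auto))   # |w_C|
        recalls.append(w_c / w_h if w_h else 0.0)          # assumed 0 if undefined
        precisions.append(w_c / w_auto if w_auto else 0.0) # assumed 0 if undefined
    n = len(vocab)
    return sum(recalls) / n, sum(precisions) / n, sum(r > 0 for r in recalls)

human = [{"sky", "plane"}, {"water"}]
auto = [{"sky", "clouds"}, {"water", "sky"}]
print(per_word_metrics(human, auto, ["sky", "plane", "water", "clouds"]))  # (0.5, 0.375, 2)
```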
![Page 15: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/15.jpg)
Results-quantitative
Annotation

| Experiment | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Words w with recall > 0 | 140 | 121 | 110 | 125 | 90 | 131 |
| Mean recall per w | 0.27 | 0.25 | 0.25 | 0.26 | 0.23 | 0.27 |
| Mean precision per w | 0.25 | 0.24 | 0.23 | 0.23 | 0.20 | 0.23 |
![Page 16: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/16.jpg)
Results-quantitative
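Retrieval (chart series: precision for words with recall > 0; precision over all words)

| Experiment | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Mean recall, all w | 0.23 | 0.21 | 0.20 | 0.21 | 0.19 | 0.24 |
| Mean recall per w with R > 0 | 0.45 | 0.40 | 0.40 | 0.41 | 0.37 | 0.41 |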
![Page 17: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/17.jpg)
Results-qualitative
![Page 18: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/18.jpg)
Results-qualitative
| Human annotation | Automatic annotation |
|---|---|
| plane, jet, f-14, sky | sky, plane, clouds, smoke, snow |
| coast, waves, water, hills | water, sky, ocean, mountain, clouds |
| polar, bear, bars, cage | bear, snow, texture, sunrise, closeup |
| people, cheese, market, street | people, wall, sand, flower, bird |
![Page 19: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/19.jpg)
Results-qualitative
![Page 20: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/20.jpg)
Results-qualitative
Query labels: Blooms, Mountain, Pool, Smoke, Woman
![Page 21: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/21.jpg)
Results-qualitative
![Page 22: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/22.jpg)
Conclusions
Pros
• Nice segmentation as a byproduct of annotation
• Great for general concepts with lots of samples
• Only weakly annotated data is required (multiple instance learning)
• Allows a hierarchical representation (adding images, speed)

Cons
• Fixed number of labels per image
• Learning is time-consuming
• Parameter tuning is time-consuming
• Weakly represented classes could be associated with the wrong concepts
![Page 23: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/23.jpg)
Resources
Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 29, 394–410 (2007).
Gudivada, V.N., Raghavan, V.V.: Content based image retrieval systems. Computer. 28, 18–22 (1995).
Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color- and texture-based image segmentation using EM and its application to content-based image retrieval. Computer Vision, 1998. Sixth International Conference on. pp. 675–682. IEEE (1998).
Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 71, 593–613 (2009).
Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys. 40, 1–60 (2008).
![Page 24: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/24.jpg)
Thank you for your attention. Questions?
• tencer.hustej.net
• @lukastencer
• accuratelyrandom.blogspot.com
• facebook.com/lukas.tencer
![Page 25: Supervised Learning of Semantic Classes for Image Annotation and Retrieval](https://reader036.vdocument.in/reader036/viewer/2022062418/554fb962b4c9050e7d8b478e/html5/thumbnails/25.jpg)
Google labeling game