Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification

Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

Post on 07-Feb-2016


Page 1: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification

Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

Page 2

Goal

Page 3

Nearest neighbor classification

D(image, image)
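A minimal Python sketch of nearest-neighbor classification with a pluggable distance D, for illustration only. The function name `nearest_neighbor_label`, the toy 1-D "images", and the absolute-difference distance are assumptions and do not come from the slides.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def nearest_neighbor_label(
    query: T,
    train_items: Sequence[T],
    train_labels: Sequence[str],
    distance: Callable[[T, T], float],
) -> str:
    """Return the label of the training item closest to `query` under `distance`."""
    if not train_items:
        raise ValueError("training set is empty")
    if len(train_items) != len(train_labels):
        raise ValueError("items and labels must have the same length")
    # Index of the training item with the smallest distance to the query.
    best_index = min(
        range(len(train_items)),
        key=lambda n: distance(query, train_items[n]),
    )
    return train_labels[best_index]


# Toy data (hypothetical): numbers stand in for images,
# absolute difference stands in for the distance D.
toy_items = [0.0, 1.0, 5.0]
toy_labels = ["cat", "cat", "dog"]
toy_prediction = nearest_neighbor_label(
    4.2, toy_items, toy_labels, lambda a, b: abs(a - b)
)
```

Everything the classifier knows about similarity sits in `distance`, which is why the rest of the talk concentrates on learning D.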

Page 4

Learning a Distance Metric from Relative Comparisons

[Schultz & Joachims, NIPS ’03]

D(image, image) compared with D(image, image)

D(x, y) = (x - y)^T (x - y), with x and y standing for the two images

Page 5
Page 6

Approach

image j

image i

Page 7

Approach

image j

image i

d_{ji,m}

Page 8

Approach

image j

image i

D_{ji} = \sum_m w_{j,m} d_{ji,m}

image k
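A small Python sketch of the weighted sum above, D_{ji} = \sum_m w_{j,m} d_{ji,m}. The function name `image_distance` and the numbers are made up for illustration. How the elementary distances d_{ji,m} are computed is not covered here.

```python
from typing import Sequence


def image_distance(w_j: Sequence[float], d_ji: Sequence[float]) -> float:
    """D_ji: weighted sum of the elementary distances from image j to image i.

    w_j[m]  -- weight of feature m of image j
    d_ji[m] -- elementary distance for feature m of image j, measured against image i
    """
    if len(w_j) != len(d_ji):
        raise ValueError("need one weight per elementary distance")
    return sum(w * d for w, d in zip(w_j, d_ji))


# Hypothetical values: image j has three features.
w_j = [0.5, 0.0, 2.0]
d_ji = [1.0, 3.0, 0.25]
D_ji = image_distance(w_j, d_ji)  # 0.5*1.0 + 0.0*3.0 + 2.0*0.25
```

Because D_{ji} uses the weights of image j while D_{ij} would use those of image i, the two values need not agree, so the function is not symmetric in general.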

Page 9

Approach

image j

image i

image k

D_{ji} < D_{ki}

Page 10

Core

image j

w_{j,m} ?

image j

image i

image k

D_{ji} < D_{ki}

Page 11

Derivations

• Notation
• Large-margin formulation
• Dual problem
• Solution

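Page 12

Notation

D_{ji} = \sum_m w_{j,m} d_{ji,m}  \Rightarrow  D_{ji} = w_j \cdot d_{ji}

D_{ki} > D_{ji}  \Rightarrow  w_k \cdot d_{ki} > w_j \cdot d_{ji}   for triplet (i, j, k)

w_k \cdot d_{ki} - w_j \cdot d_{ji} \ge 1

W = [w_1, w_2, \ldots, w_k, \ldots, w_j, \ldots]

X_{ijk} = [0, 0, \ldots, d_{ki}, \ldots, -d_{ji}, \ldots]

w_k \cdot d_{ki} - w_j \cdot d_{ji} \ge 1  \Rightarrow  W \cdot X_{ijk} \ge 1

\sum_{i,j,k} [1 - W \cdot X_{ijk}]_+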
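The stacking trick on this slide can be checked numerically. The Python sketch below builds W and X_{ijk} for a toy case and confirms that W \cdot X_{ijk} equals w_k \cdot d_{ki} - w_j \cdot d_{ji}. The helper names (`dot`, `stack_weights`, `triplet_vector`) and all numbers are assumptions for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(a, b))


def stack_weights(weights: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-image weight vectors w_1, w_2, ... into one vector W."""
    return [value for w in weights for value in w]


def triplet_vector(
    feature_counts: Sequence[int],
    j: int,
    k: int,
    d_ji: Sequence[float],
    d_ki: Sequence[float],
) -> List[float]:
    """Build X_ijk: d_ki in image k's block, -d_ji in image j's block, zeros elsewhere."""
    if j == k:
        raise ValueError("j and k must be different images")
    if len(d_ji) != feature_counts[j] or len(d_ki) != feature_counts[k]:
        raise ValueError("distance vectors must match the feature counts")
    # offsets[n] is where image n's block starts inside the stacked vector.
    offsets = [0]
    for count in feature_counts:
        offsets.append(offsets[-1] + count)
    x = [0.0] * offsets[-1]
    for m, value in enumerate(d_ki):
        x[offsets[k] + m] = value
    for m, value in enumerate(d_ji):
        x[offsets[j] + m] = -value
    return x


# Toy case (hypothetical): images 0, 1, 2 with 2, 3 and 2 features; i=0, j=1, k=2.
weights = [[1.0, 1.0], [0.5, 0.0, 2.0], [1.0, 2.0]]
feature_counts = [len(w) for w in weights]
d_ji = [1.0, 3.0, 0.25]
d_ki = [2.0, 1.5]

W = stack_weights(weights)
X_ijk = triplet_vector(feature_counts, j=1, k=2, d_ji=d_ji, d_ki=d_ki)
lhs = dot(weights[2], d_ki) - dot(weights[1], d_ji)  # w_k . d_ki - w_j . d_ji
rhs = dot(W, X_ijk)                                  # W . X_ijk
```

Written this way, every triplet becomes a single linear constraint on one long vector W, which is the form an SVM-style solver expects.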

Page 13

Large-margin formulation

\sum_{i,j,k} [1 - W \cdot X_{ijk}]_+

\frac{1}{2} \|W\|^2 + C \sum_{i,j,k} [1 - W \cdot X_{ijk}]_+
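As a sketch of what this objective measures, the Python snippet below evaluates \frac{1}{2} \|W\|^2 + C \sum [1 - W \cdot X_{ijk}]_+ for given triplet vectors. It only evaluates the objective and does not minimize it. The function names and the toy numbers are assumptions.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(a, b))


def hinge(value: float) -> float:
    """[value]_+ : the positive part of value."""
    return max(0.0, value)


def large_margin_objective(
    W: Sequence[float],
    triplet_vectors: Sequence[Sequence[float]],
    C: float,
) -> float:
    """0.5 * ||W||^2 + C * sum over triplets of [1 - W . X_ijk]_+ ."""
    regularizer = 0.5 * dot(W, W)
    total_hinge = sum(hinge(1.0 - dot(W, X)) for X in triplet_vectors)
    return regularizer + C * total_hinge


# Toy values (hypothetical): the first triplet meets the margin, the second does not.
W_toy = [1.0, 0.0]
X_toy = [[2.0, 0.0], [0.5, 1.0]]
objective_value = large_margin_objective(W_toy, X_toy, C=10.0)
```

In this example only the second triplet contributes to the loss, since W \cdot X = 0.5 falls short of the margin of 1.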

Page 14

SVM

Page 15

SVM

Page 16

SVM

Page 17

SVM

Page 18

Soft-margin SVM

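Page 19

Derivation

(The Greek letters did not survive extraction; the names \xi, \alpha, \lambda, \mu below are assumed, following the usual soft-margin SVM convention.)

\alpha_{ijk} \ge 0, \lambda_{ijk} \ge 0, \mu \ge 0

L(W, \xi, \alpha, \lambda, \mu) = \frac{1}{2} \|W\|^2 + C \sum_{ijk} \xi_{ijk} - \sum_{ijk} \alpha_{ijk} (W \cdot X_{ijk} - 1 + \xi_{ijk}) - \sum_{ijk} \lambda_{ijk} \xi_{ijk} - \mu \cdot W

\partial L / \partial \xi_{ijk} = C - \alpha_{ijk} - \lambda_{ijk} = 0  \Rightarrow  0 \le \alpha_{ijk} \le C

\partial L / \partial W = W - \sum_{ijk} \alpha_{ijk} X_{ijk} - \mu = 0  \Rightarrow  W = \sum_{ijk} \alpha_{ijk} X_{ijk} + \mu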

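Page 20

Dual

[Equations: the dual objective F written as a function of the dual variable of a single triplet (a, b, c), with the variables of all other triplets (i, j, k) held fixed, followed by a closed-form update whose numerator has the form 1 - (\ldots) \cdot X_{abc} and whose denominator is \|X_{abc}\|^2. The full expressions are not recoverable from the extracted text.]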

Page 21

Details – Features and descriptors

• Find ~400 features per image
• Compute geometric blur descriptor

Page 22

Descriptors

• Geometric blur

Page 23

Descriptors

• Two sizes of geometric blur (42 pixels and 70 pixels)
  – Each is 204 dimensions (4 orientations and 51 samples each)

• HSV histograms of 42-pixel patches

Page 24

Choosing triplets

• Caltech101, at 15 images per class
  – 31.8 million triplets
  – Many are easy to satisfy

• For each image j, for each feature
  – Find the N images I with closest features
  – For each negative example i in I, form triplets (j, k, i)

• Eliminates ~half of triplets
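One possible reading of the pruning heuristic above, sketched in Python. The slide does not say what k ranges over. This sketch assumes k is any other image with the same class label as j, and that a "negative example" is an image with a different label. The data layout `elementary[j][i][m]` (elementary distance for feature m of image j against image i), the function name, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# elementary[j][i][m]: distance for feature m of image j, measured against image i.
Elementary = Dict[int, Dict[int, List[float]]]


def select_triplets(
    elementary: Elementary,
    labels: Sequence[str],
    n_closest: int,
) -> List[Tuple[int, int, int]]:
    """Keep triplets (j, k, i) only when negative i is among the n_closest
    images to j for at least one feature of j."""
    triplets: Set[Tuple[int, int, int]] = set()
    num_images = len(labels)
    for j in range(num_images):
        others = [i for i in range(num_images) if i != j]
        if not others:
            continue
        # Assumption: k is every other image of the same class as j.
        positives = [k for k in others if labels[k] == labels[j]]
        num_features = len(elementary[j][others[0]])
        for m in range(num_features):
            # Rank the other images by how close feature m of image j is to them.
            ranked = sorted(others, key=lambda i: elementary[j][i][m])
            for i in ranked[:n_closest]:
                if labels[i] != labels[j]:  # i is a negative example for j
                    for k in positives:
                        triplets.add((j, k, i))
    return sorted(triplets)


# Toy data (hypothetical): four images, two classes, one feature per image.
labels = ["a", "a", "b", "b"]
elementary = {
    0: {1: [0.1], 2: [0.2], 3: [0.9]},
    1: {0: [0.1], 2: [0.8], 3: [0.9]},
    2: {0: [0.7], 1: [0.8], 3: [0.1]},
    3: {0: [0.9], 1: [0.3], 2: [0.1]},
}
selected = select_triplets(elementary, labels, n_closest=2)
```

The idea is that negatives which are far away on every feature already satisfy their constraints easily, so dropping them shrinks the training set without removing the hard cases.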

Page 25

Choosing C

Page 26

Choosing C

• Train with multiple values of C, testing on a held-out part of the training set
• Choose whichever gives the best results

• For each C, run online version of the training algorithm
  – Make one sweep through training triplets
  – For each misclassified triplet (i,j,k), update weights for the three images
  – Choose C which gets the most right answers
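A hedged Python sketch of the selection loop described above. The slides do not give the online update rule, so this sketch swaps in a passive-aggressive-style step capped at C. That rule is an assumption, not the authors' algorithm. "Misclassified" is read as W \cdot X_{ijk} \le 0, triplets are represented by their stacked vectors X_{ijk} (so an update only touches the blocks where X_{ijk} is non-zero), and no sign constraint on W is enforced. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def online_sweep(
    triplet_vectors: Sequence[Sequence[float]], C: float, dim: int
) -> List[float]:
    """One pass over the training triplets, starting from W = 0."""
    W = [0.0] * dim
    for X in triplet_vectors:
        score = dot(W, X)
        norm_sq = dot(X, X)
        if score <= 0.0 and norm_sq > 0.0:  # triplet is ordered wrongly (or tied)
            # Assumed update: step toward the margin, with step size capped at C.
            tau = min(C, (1.0 - score) / norm_sq)
            W = [w + tau * x for w, x in zip(W, X)]
    return W


def count_correct(W: Sequence[float], triplet_vectors: Sequence[Sequence[float]]) -> int:
    """Number of triplets that W orders correctly, i.e. W . X_ijk > 0."""
    return sum(1 for X in triplet_vectors if dot(W, X) > 0.0)


def choose_C(
    train: Sequence[Sequence[float]],
    held_out: Sequence[Sequence[float]],
    candidates: Sequence[float],
    dim: int,
) -> Tuple[float, int]:
    """Return the candidate C with the most correct held-out triplets (first wins ties)."""
    best_C, best_correct = candidates[0], -1
    for C in candidates:
        W = online_sweep(train, C, dim)
        correct = count_correct(W, held_out)
        if correct > best_correct:
            best_C, best_correct = C, correct
    return best_C, best_correct


# Toy triplet vectors (hypothetical).
train_X = [[1.0, 0.0], [0.0, 1.0]]
held_out_X = [[1.0, 1.0], [0.5, -0.2]]
chosen_C, chosen_correct = choose_C(train_X, held_out_X, [0.01, 0.1, 1.0], dim=2)
```

A single online sweep is much cheaper than solving the full problem for every candidate C, which is presumably why the slide uses it for model selection.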

Page 27

Results

• At 15 training examples per class: 63.2% (~3% improvement)
• At 20 training examples per class: 66.6% (~5% improvement)

Page 28

Results

• Confusion matrix

Hardest categories: crocodile, cougar_body, cannon, bass

Page 29

Questions

• Is there any disadvantage to a non-metric distance function?

• Could the images be embedded in a metric space?

• Why not learn everything?
  – Include a feature for each image pixel
  – Include multiple types of descriptors

• Could this be used for unsupervised learning on sets of tagged images (e.g., for image segmentation)?

• Can you learn a single distance per class?