
Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification

Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

Goal

Nearest neighbor classification

D(·, ·)

Learning a Distance Metric from Relative Comparisons
[Schultz & Joachims, NIPS ’03]

D(a, b) < D(a, c)

D(x, y) = (x − y)^T M (x − y)
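The parameterized metric above, D(x, y) = (x − y)^T M (x − y), can be sketched in a few lines. This is a minimal plain-Python illustration, not the authors' code; the function name is mine.

```python
# Sketch of the Mahalanobis-style metric D(x, y) = (x - y)^T M (x - y).
# Names here are illustrative; M is a square matrix given as a list of rows.

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    # Compute M @ diff, one row at a time.
    Md = [sum(row[j] * diff[j] for j in range(len(diff))) for row in M]
    # Dot the difference vector with M @ diff.
    return sum(d * m for d, m in zip(diff, Md))
```

With M set to the identity this reduces to the squared Euclidean distance; learning M is what lets relative comparisons reshape the metric.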

Approach

For each feature m of image j, compute an elementary patch distance d_ji,m to image i.

Combine these with per-image weights into an exemplar distance:

D_ji = Σ_m w_j,m d_ji,m

Compute D_ki the same way for a third image k, and compare:

D_ji < D_ki
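The weighted per-exemplar distance above is just a dot product between the learned weights and the patch distances. A minimal sketch (variable names are mine, not the paper's):

```python
# Sketch of the per-exemplar distance D_ji = sum_m w_{j,m} * d_{ji,m}:
# each training image j carries its own weight vector w_j over its features.

def exemplar_distance(w_j, d_ji):
    """D_ji = w_j . d_ji, a weighted sum of patch-to-image distances."""
    return sum(w * d for w, d in zip(w_j, d_ji))

# Training will ask that a same-class image j end up closer to i than a
# different-class image k does: exemplar_distance(w_j, d_ji) < exemplar_distance(w_k, d_ki).
```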

Core

How do we set the weights w_j,m?

Learn them so that, whenever image j is similar to image i and image k is dissimilar,

D_ji < D_ki

Derivations

• Notation
• Large-margin formulation
• Dual problem
• Solution

Notation

D_ji = Σ_m w_j,m d_ji,m,  i.e.  D_ji = w_j · d_ji

D_ki > D_ji  ⇔  w_k · d_ki > w_j · d_ji

For triplet (i, j, k), require the margin constraint

w_k · d_ki − w_j · d_ji ≥ 1

Stack all per-image weight vectors into W = (w_1, w_2, …, w_k, …, w_j, …), and let X_ijk = (0, 0, …, d_ki, …, −d_ji, …) hold d_ki in image k’s block and −d_ji in image j’s block. Then

w_k · d_ki − w_j · d_ji ≥ 1  ⇔  W · X_ijk ≥ 1
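The block structure of W and X_ijk is easy to verify numerically: stacking d_ki into image k’s block and −d_ji into image j’s block makes W · X_ijk equal w_k · d_ki − w_j · d_ji. A small sketch (the dense layout and names are mine):

```python
# Sketch of the flattened notation: W stacks all per-image weight vectors;
# X_ijk is zero everywhere except +d_ki in image k's block and -d_ji in
# image j's block. Block layout is illustrative.

def build_X(n_images, m, j, k, d_ji, d_ki):
    """Triplet vector X_ijk as a dense list of n_images blocks of size m."""
    X = [0.0] * (n_images * m)
    X[k * m:(k + 1) * m] = d_ki                 # +d_ki in image k's block
    X[j * m:(j + 1) * m] = [-d for d in d_ji]   # -d_ji in image j's block
    return X

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

For example, with W the concatenation of w_0, w_1, w_2, the dot product W · X_ijk recovers w_k · d_ki − w_j · d_ji exactly.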

Large-margin formulation

Hinge loss over all triplets:

Σ_(i,j,k) [1 − W · X_ijk]_+

Regularized objective:

min_W  (1/2) ||W||² + C Σ_(i,j,k) [1 − W · X_ijk]_+
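The objective is straightforward to evaluate: an L2 regularizer plus one hinge term per triplet, nonzero only when the margin constraint W · X_ijk ≥ 1 is violated. A minimal sketch (function and variable names are mine):

```python
# Sketch evaluating (1/2)||W||^2 + C * sum_ijk [1 - W . X_ijk]_+
# over a list of triplet vectors Xs. [z]_+ means max(0, z).

def objective(W, Xs, C):
    reg = 0.5 * sum(w * w for w in W)
    hinge = sum(max(0.0, 1.0 - sum(w * x for w, x in zip(W, X))) for X in Xs)
    return reg + C * hinge
```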

Soft-margin SVM

Derivation

Lagrangian, with multipliers α_ijk ≥ 0, γ_ijk ≥ 0 and slacks ξ_ijk ≥ 0:

L(W, ξ, α, γ) = (1/2) ||W||² + C Σ_ijk ξ_ijk − Σ_ijk α_ijk (W · X_ijk − 1 + ξ_ijk) − Σ_ijk γ_ijk ξ_ijk

∂L/∂ξ_ijk = 0:  C − α_ijk − γ_ijk = 0  ⇒  0 ≤ α_ijk ≤ C

∂L/∂W = 0:  W − Σ_ijk α_ijk X_ijk = 0  ⇒  W = Σ_ijk α_ijk X_ijk

Dual

Substituting W = Σ_ijk α_ijk X_ijk back into the Lagrangian gives the dual problem

max_α  F(α) = Σ_ijk α_ijk − (1/2) Σ_ijk Σ_abc α_ijk α_abc (X_ijk · X_abc)
            = Σ_ijk α_ijk − (1/2) ||Σ_ijk α_ijk X_ijk||²

subject to  0 ≤ α_ijk ≤ C
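The two forms of the dual objective (double sum over pairwise dot products vs. squared norm of the weighted sum) are algebraically identical, which is easy to confirm numerically. A small sketch (names are mine):

```python
# Sketch checking the two forms of the dual objective agree:
# F(alpha) = sum alpha - 1/2 * sum_ijk sum_abc alpha_ijk alpha_abc (X_ijk . X_abc)
#          = sum alpha - 1/2 * || sum_ijk alpha_ijk X_ijk ||^2

def dual_pairwise(alpha, Xs):
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    quad = sum(ai * aj * dot(Xi, Xj)
               for ai, Xi in zip(alpha, Xs)
               for aj, Xj in zip(alpha, Xs))
    return sum(alpha) - 0.5 * quad

def dual_norm(alpha, Xs):
    n = len(Xs[0])
    # Weighted sum of the triplet vectors, then its squared norm.
    s = [sum(a * X[i] for a, X in zip(alpha, Xs)) for i in range(n)]
    return sum(alpha) - 0.5 * sum(v * v for v in s)
```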

Details – Features and descriptors

• Find ~400 features per image
• Compute geometric blur descriptor

Descriptors

• Geometric blur
• Two sizes of geometric blur (42 pixels and 70 pixels)
  – Each is 204 dimensions (4 orientations and 51 samples each)
• HSV histograms of 42-pixel patches

Choosing triplets

• Caltech101 – at 15 images per class
  – 31.8 million triplets
  – Many are easy to satisfy
• For each image j, for each feature
  – Find the N images I with closest features
  – For each negative example i in I, form triplets (j, k, i)
• Eliminates ~half of triplets
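The pruning idea above amounts to ranking a query image’s negatives by feature distance and keeping only the N closest, since distant negatives yield constraints that are already satisfied. A minimal sketch under a hypothetical data layout (the dict-of-distances input is mine, not the authors’):

```python
# Sketch of triplet pruning: keep only the N nearest negatives per image,
# since far-away negatives give easy, uninformative constraints.
# Input layout is hypothetical: {negative_image_id: feature distance}.

def prune_negatives(dists_to_negatives, N):
    """Return the N negative-image ids closest in feature distance."""
    ranked = sorted(dists_to_negatives, key=dists_to_negatives.get)
    return ranked[:N]
```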

Choosing C

• Train with multiple values of C, testing on a held-out part of the training set
• Choose whichever gives the best results
• For each C, run online version of the training algorithm
  – Make one sweep through training triplets
  – For each misclassified triplet (i, j, k), update weights for the three images
  – Choose C which gets the most right answers
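The single-sweep model-selection idea can be sketched as follows. Note the update rule here (add C · X_ijk on a violated margin) is a simplified perceptron-style stand-in of my own, since the slides do not spell out the actual update:

```python
# Sketch of the online sweep used for choosing C: one pass over the triplet
# vectors, counting margin violations and applying a simplified
# perceptron-style update (illustrative, not the authors' exact rule).

def online_sweep(W, Xs, C):
    W = list(W)
    mistakes = 0
    for X in Xs:
        if sum(w * x for w, x in zip(W, X)) < 1.0:   # misclassified triplet
            mistakes += 1
            W = [w + C * x for w, x in zip(W, X)]    # simplified update
    return W, mistakes
```

Model selection then runs one sweep per candidate C and keeps the C whose resulting weights satisfy the most held-out triplets.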

Results

• At 15 training examples per class: 63.2% (~3% improvement)
• At 20 training examples per class: 66.6% (~5% improvement)

Results

• Confusion matrix (figure)
• Hardest categories: crocodile, cougar_body, cannon, bass

Questions

• Is there any disadvantage to a non-metric distance function?

• Could the images be embedded in a metric space?
• Why not learn everything?
  – Include a feature for each image pixel
  – Include multiple types of descriptors
• Could this be used to do unsupervised learning for sets of tagged images (e.g., for image segmentation)?

• Can you learn a single distance per class?
