Minimal Loss Hashing for Compact Binary Codes

Page 1: Minimal Loss Hashing for Compact Binary Codes

Minimal Loss Hashing for Compact Binary Codes

Mohammad Norouzi

David Fleet

University of Toronto

Page 2: Minimal Loss Hashing for Compact Binary Codes

Near Neighbor Search

Page 5: Minimal Loss Hashing for Compact Binary Codes

Similarity-Preserving Binary Hashing

Why binary codes?

Sub-linear search using hash indexing

(even exhaustive linear search is fast)

Binary codes are storage-efficient
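To illustrate why even an exhaustive scan over binary codes is fast, here is a minimal sketch that packs each code into an integer and compares codes with XOR plus a bit count. The function names and the toy 8-bit database are assumptions made for illustration; they are not from the slides.

```python
def hamming_distance(code_a, code_b):
    """Number of bit positions where two integer-packed codes differ."""
    return bin(code_a ^ code_b).count("1")


def linear_scan(query, database, radius):
    """Indices of database codes within `radius` bits of the query."""
    return [i for i, code in enumerate(database)
            if hamming_distance(query, code) <= radius]


# Toy database of 8-bit codes (hypothetical values).
database = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query = 0b10110010
neighbors = linear_scan(query, database, radius=1)
```

Each code here occupies a single machine word, which is the storage argument on this slide in miniature.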

Page 6: Minimal Loss Hashing for Compact Binary Codes

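Similarity-Preserving Binary Hashing

Hash function: $b(x; W) = \mathrm{thr}(Wx)$

$x$: input vector

$W$: parameter matrix

$\mathrm{thr}(\cdot)$: binary quantization

kth bit: $b_k(x; W) = \mathrm{thr}(w_k^\top x)$, where $w_k$ is the kth row of $W$

Random projections used by locality-sensitive hashing (LSH) and related techniques [Indyk & Motwani ’98; Charikar ’02; Raginsky & Lazebnik ’09]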
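A minimal sketch of a hash function of this form, with a random Gaussian parameter matrix in the spirit of LSH. The threshold at zero, the Gaussian entries, and the helper names are assumptions for illustration rather than details taken from the slides.

```python
import random


def random_projection_matrix(num_bits, dim, seed=0):
    """Parameter matrix W with one random Gaussian row per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(W, x):
    """Bit k is 1 when the kth row of W has a positive dot product with x."""
    return [1 if sum(w * v for w, v in zip(row, x)) > 0 else 0 for row in W]


W = random_projection_matrix(num_bits=8, dim=3)
x = [0.5, -1.2, 2.0]
code = hash_code(W, x)
```

Because only the sign of each projection is kept, rescaling `x` by a positive constant leaves the code unchanged.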

Page 7: Minimal Loss Hashing for Compact Binary Codes

Learning Binary Hash Functions

Reasons to learn hash functions:

to find more compact binary codes

to preserve general similarity measures

Previous work

boosting [Shakhnarovich et al. ’03]

neural nets [Salakhutdinov & Hinton ’07; Torralba et al. ’07]

spectral methods [Weiss et al. ’08]

loss-based methods [Kulis & Darrell ’09]

Page 8: Minimal Loss Hashing for Compact Binary Codes

Formulation

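Input data: $x \in \mathbb{R}^p$

Similarity labels: $s \in \{0, 1\}$ for each training pair

Hash function: $b(x; W) = \mathrm{thr}(Wx)$

Binary codes: $h = b(x; W) \in \{0, 1\}^q$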

Page 9: Minimal Loss Hashing for Compact Binary Codes

Loss Function

Hash code quality measured by a loss function: $L(h, g, s)$

$h$: binary code for item 1

$g$: binary code for item 2

$s$: similarity label

The cost $L(h, g, s)$ measures consistency between the pair of codes and the similarity label:

Similar items should map to nearby hash codes

Dissimilar items should map to very different codes

Page 10: Minimal Loss Hashing for Compact Binary Codes

Hinge Loss

Similar items should map to codes within a radius of $\rho$ bits

Dissimilar items should map to codes no closer than $\rho$ bits
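The two requirements above can be sketched as a hinge-like function of the Hamming distance `m` between two codes. The form below follows the definition in the accompanying paper as best recalled; the exact margin offsets (the `+ 1` terms) and the weight `lam` on dissimilar pairs should be treated as assumptions, since the slide's formula did not survive in the transcript.

```python
def hinge_loss(m, s, rho, lam=1.0):
    """Hinge-like loss on Hamming distance m for a pair with label s.

    s == 1 (similar): cost grows once the codes are rho or more bits apart.
    s == 0 (dissimilar): cost grows once the codes are rho or fewer bits apart.
    """
    if s == 1:
        return max(m - rho + 1, 0)
    return lam * max(rho - m + 1, 0)


# A similar pair 5 bits apart with rho = 3 is penalized ...
similar_cost = hinge_loss(m=5, s=1, rho=3)
# ... while a dissimilar pair at the same distance is not.
dissimilar_cost = hinge_loss(m=5, s=0, rho=3)
```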

Page 11: Minimal Loss Hashing for Compact Binary Codes

Empirical Loss

Given training pairs $(x_i, x_j)$ with similarity labels $s_{ij}$:

$\sum_{(i,j)} L\big(b(x_i; W),\, b(x_j; W),\, s_{ij}\big)$

Good:

incorporates quantization and Hamming distance

Not so good:

discontinuous, non-convex objective function
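As a sketch of what the empirical loss computes, the snippet below hashes both items of each training pair, takes the Hamming distance between the codes, and sums a hinge-like loss over all pairs. The helper names, the toy data, and the specific hinge form are assumptions for illustration.

```python
def hash_code(W, x):
    """Threshold each projection of x onto a row of W at zero."""
    return [1 if sum(w * v for w, v in zip(row, x)) > 0 else 0 for row in W]


def hamming(h, g):
    """Number of positions where two bit lists differ."""
    return sum(a != b for a, b in zip(h, g))


def hinge_loss(m, s, rho, lam=1.0):
    """Assumed hinge-like loss on Hamming distance m and label s."""
    if s == 1:
        return max(m - rho + 1, 0)
    return lam * max(rho - m + 1, 0)


def empirical_loss(W, pairs, rho, lam=1.0):
    """Sum of pairwise losses; pairs is a list of (x_i, x_j, s_ij)."""
    total = 0.0
    for x_i, x_j, s in pairs:
        m = hamming(hash_code(W, x_i), hash_code(W, x_j))
        total += hinge_loss(m, s, rho, lam)
    return total


W = [[1.0, 0.0], [0.0, 1.0]]
pairs = [
    ([1.0, 1.0], [2.0, 0.5], 1),    # similar, identical codes
    ([1.0, 1.0], [-1.0, -1.0], 0),  # dissimilar, codes differ in 2 bits
    ([1.0, 1.0], [1.0, -1.0], 1),   # similar, codes differ in 1 bit
]
loss = empirical_loss(W, pairs, rho=1)
```

Small changes to `W` leave this value untouched until some projection changes sign and a bit flips, which is the discontinuity noted on the slide.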

Page 12: Minimal Loss Hashing for Compact Binary Codes

We minimize an upper bound on empirical loss, inspired by structural SVM formulations [Taskar et al. ’03; Tsochantaridis et al. ’04; Yu & Joachims ’09]

Page 13: Minimal Loss Hashing for Compact Binary Codes

Bound on loss

LHS = RHS

Page 14: Minimal Loss Hashing for Compact Binary Codes

Bound on loss

Remarks:

piecewise linear in W

convex-concave in W

relates to structural SVM with latent variables [Yu & Joachims ’09]
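The bound itself survives only as a slide title in this transcript. As a hedged reconstruction based on the formulation in the accompanying paper, and assuming $g, g', h, h'$ denote candidate binary codes and $x, x'$ the two items of a pair, it can be written as:

```latex
L\big(b(x; W),\, b(x'; W),\, s\big)
\;\le\;
\max_{g,\, g'} \Big[ L(g, g', s) + g^\top W x + g'^\top W x' \Big]
\;-\;
\max_{h,\, h'} \Big[ h^\top W x + h'^\top W x' \Big]
```

The inequality holds because the second maximum is attained at $h = b(x; W)$, $h' = b(x'; W)$, and the first maximum is at least its value at those same codes. Each maximum is piecewise linear and convex in $W$, so their difference is convex-concave, matching the remarks above.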

Page 15: Minimal Loss Hashing for Compact Binary Codes

Bound on Empirical Loss

Loss-adjusted inference

Exact

Efficient
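One way loss-adjusted inference can be both exact and efficient is sketched below, assuming the loss depends on the two codes only through their Hamming distance. For each candidate distance `m`, the best pair of codes makes the `m` most profitable bits differ and the remaining bits agree. The function names, the hinge form, and the brute-force checker are assumptions for illustration, not the authors' implementation.

```python
from itertools import product


def hinge_loss(m, s, rho, lam=1.0):
    """Assumed hinge-like loss on Hamming distance m and label s."""
    if s == 1:
        return max(m - rho + 1, 0)
    return lam * max(rho - m + 1, 0)


def loss_adjusted_inference(a, b, s, rho, lam=1.0):
    """Maximize loss(m, s) + g.a + gp.b over binary codes g, gp.

    a and b are the real-valued projections W x_i and W x_j;
    m is the Hamming distance between g and gp.
    """
    q = len(a)
    # Best contribution of bit k when the codes agree / disagree there.
    same = [max(0.0, ak + bk) for ak, bk in zip(a, b)]
    diff = [max(ak, bk) for ak, bk in zip(a, b)]
    # Bits ordered by how much is gained by letting them differ.
    order = sorted(range(q), key=lambda k: diff[k] - same[k], reverse=True)
    base, gain = sum(same), 0.0
    best_score, best_m = None, 0
    for m in range(q + 1):
        if m > 0:
            k = order[m - 1]
            gain += diff[k] - same[k]
        score = hinge_loss(m, s, rho, lam) + base + gain
        if best_score is None or score > best_score:
            best_score, best_m = score, m
    differing = set(order[:best_m])
    g, gp = [], []
    for k in range(q):
        if k in differing:
            # Exactly one bit is on: the one with the larger projection.
            g.append(1 if a[k] >= b[k] else 0)
            gp.append(0 if a[k] >= b[k] else 1)
        else:
            bit = 1 if a[k] + b[k] > 0 else 0
            g.append(bit)
            gp.append(bit)
    return g, gp, best_score


def brute_force(a, b, s, rho, lam=1.0):
    """Reference maximum over all pairs of codes (exponential in q)."""
    best = None
    for g in product([0, 1], repeat=len(a)):
        for gp in product([0, 1], repeat=len(a)):
            m = sum(u != v for u, v in zip(g, gp))
            score = (hinge_loss(m, s, rho, lam)
                     + sum(u * ak for u, ak in zip(g, a))
                     + sum(v * bk for v, bk in zip(gp, b)))
            if best is None or score > best:
                best = score
    return best


a = [0.3, -1.2, 0.7]
b = [-0.5, 0.4, 0.9]
g, gp, score = loss_adjusted_inference(a, b, s=1, rho=2)
```

The sorted scan costs O(q log q) for q-bit codes, whereas `brute_force` enumerates all 4^q code pairs and is included only to check the result on tiny inputs.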

Page 16: Minimal Loss Hashing for Compact Binary Codes

Perceptron-like Learning

Initialize with LSH

Iterate over pairs

• Compute the codes of the pair given by the current $W$

• Solve loss-adjusted inference

• Update $W$

[McAllester et al., 2010]
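The update step can be sketched as a gradient step on the upper bound: move $W$ toward the codes the current hash function produces and away from the codes found by loss-adjusted inference. The step size `eta`, the argument names, and the omission of any regularization or normalization are assumptions for illustration.

```python
def perceptron_update(W, x_i, x_j, h_i, h_j, g_i, g_j, eta=0.1):
    """Return W + eta * [(h_i - g_i) x_i^T + (h_j - g_j) x_j^T].

    h_i, h_j: codes given by the current hash function.
    g_i, g_j: codes returned by loss-adjusted inference.
    """
    return [
        [w + eta * ((h_i[k] - g_i[k]) * x_i[d] + (h_j[k] - g_j[k]) * x_j[d])
         for d, w in enumerate(row)]
        for k, row in enumerate(W)
    ]


W = [[0.0, 0.0], [0.0, 0.0]]
W_new = perceptron_update(
    W,
    x_i=[1.0, 2.0], x_j=[3.0, 4.0],
    h_i=[1, 0], h_j=[0, 1],
    g_i=[0, 0], g_j=[0, 1],
    eta=0.5,
)
```

When loss-adjusted inference returns the same codes as the current hash function, the two terms cancel and `W` is left unchanged, as in a perceptron that only updates on mistakes.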

Page 17: Minimal Loss Hashing for Compact Binary Codes

Experiment: Euclidean ANN

Similarity based on Euclidean distance

Datasets:

LabelMe (GIST)

MNIST (pixels)

PhotoTourism (SIFT)

Peekaboom (GIST)

Nursery (8D attributes)

10D Uniform

Page 18: Minimal Loss Hashing for Compact Binary Codes

Experiment: Euclidean ANN

22K LabelMe

512D GIST

20K training

2K testing

~1% of pairs are similar

Evaluation

Precision: #hits / number of items retrieved

Recall: #hits / number of similar items
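The two evaluation measures can be computed from the set of retrieved items and the set of truly similar items, as in this minimal sketch. Returning 0.0 when a set is empty is a convention assumed here, and the item indices are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / #retrieved, recall = hits / #similar items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 items retrieved, 3 items are truly similar, 2 overlap.
precision, recall = precision_recall(retrieved=[0, 1, 3, 7], relevant=[1, 3, 4])
```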

Page 19: Minimal Loss Hashing for Compact Binary Codes

Techniques of interest

MLH – minimal loss hashing (this work)

LSH – locality-sensitive hashing (Charikar ’02)

SH – spectral hashing (Weiss, Torralba & Fergus ’09)

SIKH – shift-invariant kernel hashing (Raginsky & Lazebnik ’09)

BRE – binary reconstructive embedding (Kulis & Darrell ’09)

Page 20: Minimal Loss Hashing for Compact Binary Codes

Euclidean LabelMe – 32 bits

Page 23: Minimal Loss Hashing for Compact Binary Codes

Euclidean LabelMe – 64 bits

Page 25: Minimal Loss Hashing for Compact Binary Codes

Euclidean LabelMe – 128 bits

Page 26: Minimal Loss Hashing for Compact Binary Codes

Euclidean LabelMe – 256 bits

Page 27: Minimal Loss Hashing for Compact Binary Codes

Experiment: Semantic ANN

Semantic similarity measure based on annotations (object labels) from the LabelMe database:

512D GIST, 20K training, 2K testing

Techniques of interest

MLH – minimal loss hashing

NN – nearest neighbor in GIST space

NNCA – multilayer network with RBM pre-training and nonlinear NCA fine-tuning [Torralba et al. ’09; Salakhutdinov & Hinton ’07]

Page 28: Minimal Loss Hashing for Compact Binary Codes

Semantic LabelMe

Page 33: Minimal Loss Hashing for Compact Binary Codes

Summary

A formulation for learning binary hash functions based on:

structured prediction with latent variables

hinge-like loss function for similarity search

Experiments show that with minimal loss hashing

binary codes can be made more compact

semantic similarity based on human labels can be preserved

Page 34: Minimal Loss Hashing for Compact Binary Codes

Thank you!

Questions?