Indexing Techniques
Mei-Chen Yeh

TRANSCRIPT

Page 1: Indexing Techniques

Indexing Techniques

Mei-Chen Yeh

Page 2: Indexing Techniques

Last week

• Matching two sets of features
  – Strategy 1
    • Convert to a fixed-length feature vector (bag-of-words)
    • Use a conventional proximity measure
  – Strategy 2
    • Build point correspondences

Page 3: Indexing Techniques

Last week: bag-of-words

(Figure: bag-of-words histogram: the frequency of each codeword from the visual vocabulary)

Page 4: Indexing Techniques

Matching local features: building patch correspondences


To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD)

Image 1 Image 2

Slide credits: Prof. Kristen Grauman

Page 5: Indexing Techniques

Matching local features: building patch correspondences


Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)

Image 1 Image 2

Slide credits: Prof. Kristen Grauman

Page 6: Indexing Techniques

Indexing local features

• Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).

Descriptor’s feature space

Database images

Page 7: Indexing Techniques

Indexing local features

• When we see close points in feature space, we have similar descriptors, which indicates similar local content.

Descriptor’s feature space

Database images

Query image

Page 8: Indexing Techniques

Problem statement

• With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?

Page 9: Indexing Techniques

50 thousand images

Slide credit: Nistér and Stewénius


Page 10: Indexing Techniques

110 million images?

Page 11: Indexing Techniques
Page 12: Indexing Techniques
Page 13: Indexing Techniques

Scalability matters!

Page 14: Indexing Techniques

The Nearest-Neighbor Search Problem

• Given
  – a set S of n points in d dimensions
  – a query point q
• Which point in S is closest to q?

Time complexity of linear scan: O(dn)
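To make the O(dn) cost concrete, here is a minimal linear-scan baseline in Python/NumPy (a sketch; the random data merely stands in for a set of descriptors):

```python
import numpy as np

def linear_scan_nn(S, q):
    """Return the index of the point in S (n x d) closest to query q (d,).

    Cost is O(dn): each of the n points needs a d-dimensional distance computation.
    """
    diffs = S - q                                # (n, d) differences
    dists = np.einsum('ij,ij->i', diffs, diffs)  # squared Euclidean distances
    return int(np.argmin(dists))

# Example: 10,000 random 128-d descriptors and one query
rng = np.random.default_rng(0)
S = rng.standard_normal((10_000, 128))
q = rng.standard_normal(128)
print(linear_scan_nn(S, q))
```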

Page 15: Indexing Techniques

The Nearest-Neighbor Search Problem

Page 16: Indexing Techniques

The Nearest-Neighbor Search Problem

• r-nearest neighbor
  – for any query q, returns a point p ∈ S s.t. ||p - q|| ≤ r

• c-approximate r-nearest neighbor
  – for any query q, returns a point p' ∈ S s.t. ||p' - q|| ≤ cr

Page 17: Indexing Techniques

Today

• Indexing local features
  – Inverted file
  – Vocabulary tree
  – Locality-sensitive hashing

Page 18: Indexing Techniques

Indexing local features:

inverted file

Page 19: Indexing Techniques

Indexing local features: inverted file

• For text documents, an efficient way to find all pages on which a word occurs is to use an index.

• We want to find all images in which a feature occurs.
  – page ~ image
  – word ~ feature

• To use this idea, we’ll need to map our features to “visual words”.

Page 20: Indexing Techniques

Text retrieval vs. image search

• What makes the two problems similar, and what makes them different?

Page 21: Indexing Techniques

Visual words

• Extract some local features from a number of images …

e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister, CVPR 2006

Page 22: Indexing Techniques

Visual words

Page 23: Indexing Techniques

Visual words

Page 24: Indexing Techniques

Visual words

Page 25: Indexing Techniques

Each point is a local descriptor, e.g. SIFT vector.

Page 26: Indexing Techniques

Example: Quantize into 3 words

Page 27: Indexing Techniques

Visual words

• Map high-dimensional descriptors to tokens/words by quantizing the feature space.
• Quantize via clustering; let the cluster centers be the prototype "words".
• Determine which word to assign to each new image region by finding the closest cluster center.

(Figure: the descriptor's feature space partitioned into regions, each labeled with a word, e.g., Word #2.)
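A minimal sketch of this quantization step in Python with scikit-learn; the random descriptors only stand in for real SIFT vectors, and the 3-word vocabulary mirrors the toy example from the previous slide:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((5000, 128))   # stand-in for SIFT descriptors

# Cluster the descriptor space; the cluster centers act as the visual words.
k = 3                                            # vocabulary size (toy example)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
vocabulary = kmeans.cluster_centers_             # (k, 128) prototype "words"

# Assign a new region's descriptor to the closest cluster center, i.e. its visual word.
new_descriptor = rng.standard_normal((1, 128))
word_id = kmeans.predict(new_descriptor)[0]
print("assigned to visual word", word_id)
```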

Page 28: Indexing Techniques

• Each group of patches belongs to the same visual word!

Figure from Sivic & Zisserman, ICCV 2003

Visual words

Page 29: Indexing Techniques

Visual vocabulary formation

Issues:
• Sampling strategy: where to extract features? Fixed locations or interest points?
• Clustering / quantization algorithm
• What corpus provides features (universal vocabulary?)
• Vocabulary size, number of words
• Weight of each word?

Page 30: Indexing Techniques

Inverted file index

The index maps each visual word to the ids of the images that contain it.

Why does the index give us a significant gain in efficiency?

Page 31: Indexing Techniques

A query image is matched to database images that share visual words.

Inverted file index
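A toy inverted file in Python (the image ids and word sets are made up): the efficiency gain comes from the query touching only the lists of the visual words it actually contains, rather than scanning every database image.

```python
from collections import defaultdict

# Bag-of-words for each database image: {image_id: set of visual-word ids}
database = {
    "img_i": {1, 5, 9},
    "img_j": {2, 6},
    "img_k": {1, 2, 5, 7},
}

# Build the inverted file: visual word -> ids of the images containing it.
inverted_file = defaultdict(set)
for image_id, words in database.items():
    for w in words:
        inverted_file[w].add(image_id)

def candidate_matches(query_words):
    """Count, per database image, how many visual words it shares with the query."""
    votes = defaultdict(int)
    for w in query_words:
        for image_id in inverted_file.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

print(candidate_matches({2, 5, 7}))   # img_k shares the most words with this query
```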

Page 32: Indexing Techniques

tf-idf weighting
• Term frequency – inverse document frequency
• Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database.
  – discriminative words / regions (e.g., economic, trade, …): w ↗
  – common words / regions (e.g., the, most, we, …): w ↘

Page 33: Indexing Techniques

tf-idf weighting
• Term frequency – inverse document frequency
• Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database.

Quantities in the weighting formula:
  – n_id: number of occurrences of word i in document d
  – n_d: number of words in document d
  – n_i: number of documents in the whole database in which word i occurs
  – N: total number of documents in the database
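Assembling these quantities gives the standard tf-idf weight for word i in document (image) d, the form used in Sivic and Zisserman's Video Google work (symbol names follow the list above):

```latex
t_{i,d} \;=\; \underbrace{\frac{n_{id}}{n_d}}_{\text{term frequency}}
          \;\underbrace{\log \frac{N}{n_i}}_{\text{inverse document frequency}}
```

The first factor measures how much of image d the word accounts for; the second downweights words that occur in many database images.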

Page 34: Indexing Techniques

Slide credit: Xin Yang

Bag-of-Words + Inverted file

(Figure: the bag-of-words + inverted-file pipeline. Local descriptors extracted from training images are clustered in feature space to form a vocabulary of K visual words (VW1, VW2, …, VWk). Each image is then represented by a frequency histogram over the visual words (its bag-of-words representation), and an inverted file maps each visual word to the images that contain it, e.g., VW1 -> image i, image k, …; VW2 -> image i, image j, image k, …; VW3 -> image m, image n, …. A matching score between a query and the database images is computed from the shared words.)

http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
http://people.cs.ubc.ca/~lowe/keypoints/

Page 35: Indexing Techniques

D. Nistér and H. Stewenius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.

Page 36: Indexing Techniques
Page 37: Indexing Techniques

Visualize as a tree

Page 38: Indexing Techniques

Vocabulary Tree
• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 39: Indexing Techniques

Vocabulary Tree
• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 40: Indexing Techniques

Vocabulary Tree
• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 41: Indexing Techniques

Vocabulary Tree
• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 42: Indexing Techniques

Vocabulary Tree
• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 43: Indexing Techniques

Vocabulary Tree
• Recognition

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Retrieved (or perform geometric verification)

Page 44: Indexing Techniques

Think about the computational advantage of the hierarchical tree vs. a flat vocabulary!
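A rough sketch of why the tree wins, in Python (the Node class and toy numbers are hypothetical, not the authors' code): with branching factor b and depth D, quantizing a descriptor costs about b·D comparisons against the tree versus b^D against the equivalent flat vocabulary.

```python
import numpy as np

class Node:
    """A node of a (pre-trained) vocabulary tree: b cluster centers plus child nodes.

    Building the tree is hierarchical k-means; here a tiny tree is wired up by hand
    just to make the lookup runnable.
    """
    def __init__(self, centers, children=None):
        self.centers = centers          # (b, d) cluster centers at this level
        self.children = children or []  # one child per center; empty at the leaves

def quantize(node, descriptor, path=()):
    """Descend the tree: at each level, compare against only the b centers of one node."""
    i = int(np.argmin(np.linalg.norm(node.centers - descriptor, axis=1)))
    if not node.children:
        return path + (i,)              # the leaf reached identifies the visual word
    return quantize(node.children[i], descriptor, path + (i,))

rng = np.random.default_rng(0)
leaf = Node(rng.standard_normal((3, 2)))
root = Node(rng.standard_normal((3, 2)), [leaf, leaf, leaf])
print(quantize(root, np.zeros(2)))      # branch index, then leaf index

# Cost of quantizing one descriptor: b * D comparisons vs. b ** D for a flat vocabulary
b, D = 10, 6
print("flat vocabulary:", b ** D, "comparisons")   # 1,000,000
print("vocabulary tree:", b * D, "comparisons")    # 60
```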

Page 45: Indexing Techniques

Hashing

Page 46: Indexing Techniques

Direct addressing

• Create a direct-address table with m slots

(Figure: a universe of keys U containing the actual keys K = {2, 3, 5, 8}; a direct-address table with slots 0-9, where slot k holds key k and its satellite data and the remaining slots are empty.)

Page 47: Indexing Techniques

Direct addressing

• Search operation: O(1)
• Problem: the range of keys can be large!
  – 64-bit numbers => 2^64 = 18,446,744,073,709,551,616 different keys
  – SIFT: 128 × 8 bits = 1024 bits, i.e., 2^1024 possible keys, far too many for a direct-address table

Page 48: Indexing Techniques

Hashing

• O(1) average-case time
• Use a hash function h to compute the slot from the key k

(Figure: keys k1, k3, k4, k5 from the universe U are mapped into a hash table T with slots 0 … m-1 at positions h(k1), h(k4), h(k5). When h(k1) = h(k3), the two keys share a bucket, so the entry found in a slot may not be k1 anymore.)

Page 49: Indexing Techniques

Hashing

• A good hash function
  – satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.

• How to design a hash function for indexing high-dimensional data?

Page 50: Indexing Techniques

(Figure: a 128-d descriptor and a hash table T. How do we map the descriptor to a slot?)

Page 51: Indexing Techniques

Locality-sensitive hashing

• Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.

Page 52: Indexing Techniques

Locality-sensitive hashing (LSH)

• Hash functions are locality-sensitive if, for any pair of points p, q, we have:
  – Pr[h(p) = h(q)] is "high" if p is close to q
  – Pr[h(p) = h(q)] is "low" if p is far from q

Pr_{h ∈ F}[h(x) = h(y)] = sim(x, y)

Page 53: Indexing Techniques

Locality Sensitive Hashing

• A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q:
  – if ||p - q|| ≤ r, then Pr[h(p) = h(q)] > P1
  – if ||p - q|| ≥ cr, then Pr[h(p) = h(q)] < P2

Page 54: Indexing Techniques

LSH Function: Hamming Space

• Consider binary vectors
  – points from {0, 1}^d
  – Hamming distance D(p, q) = # positions on which p and q differ

Example (d = 3):
  D(100, 011) = 3
  D(010, 111) = 2

Page 55: Indexing Techniques

LSH Function: Hamming Space

• Define the hash function h as hi(p) = pi, where pi is the i-th bit of p.

Example: select the 1st dimension
  h(010) = 0, h(111) = 1

Pr[h(p) = h(q)] = 1 - D(p, q)/d
Pr[h(p) ≠ h(q)] = D(p, q)/d
e.g., Pr[h(010) ≠ h(111)] = 2/3

Clearly, h is locality sensitive.

Page 56: Indexing Techniques

LSH Function: Hamming Space

• A k-bit locality-sensitive hash function is defined as g(p) = [h1(p), h2(p), …, hk(p)]^T
  – each hi is chosen randomly
  – each hi(p) results in a single bit

Pr(similar points collide) ≥ P1^k
Pr(dissimilar points collide) ≤ P2^k

Indyk and Motwani [1998]
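A minimal sketch of this bit-sampling scheme in Python (the dimension d, the number of bits k, and the toy vectors are arbitrary illustrative choices):

```python
import random

random.seed(0)
d, k = 16, 4                                 # vector length and number of sampled bits
bit_positions = random.sample(range(d), k)   # h_1 ... h_k: randomly chosen dimensions

def g(p):
    """k-bit hash g(p) = (h_1(p), ..., h_k(p)); each h_i simply reads one bit of p."""
    return tuple(p[i] for i in bit_positions)

p = [0, 1] * 8                   # two binary vectors with Hamming distance 1
q = [0, 1] * 7 + [1, 1]
print(g(p), g(q), g(p) == g(q))  # nearby vectors are likely to hash to the same k-bit code
```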

Page 57: Indexing Techniques

LSH Function: R2 space

• Consider 2-d vectors

Page 58: Indexing Techniques

LSH Function: R2 space

• The probability that a random hyperplane separates two unit vectors depends on the angle between them:
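The formula the slide's figure presumably illustrates is the well-known one for sign-of-random-projection hashing, h_a(p) = sign(a · p) with a drawn from a rotation-invariant (e.g., Gaussian) distribution; stated here for completeness:

```latex
\Pr[h_a(p) = h_a(q)] \;=\; 1 - \frac{\theta(p, q)}{\pi},
\qquad
\theta(p, q) = \cos^{-1}\!\left(\frac{p \cdot q}{\lVert p \rVert\, \lVert q \rVert}\right)
```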

Page 59: Indexing Techniques

LSH Pre-processing

• Each image is entered into L hash tables indexed by independently constructed g1, g2, …, gL

• Preprocessing Space: O(LN)

Page 60: Indexing Techniques

LSH Querying

• For each hash table, return the bin indexed by gi(q), 1 ≤ i ≤ L.

• Perform a linear search on the union of the bins.
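A compact sketch of this preprocessing and querying pattern in Python, reusing bit-sampling hashes for the g_i (all parameters and data here are illustrative, not tied to any particular system):

```python
import random
from collections import defaultdict

random.seed(1)
d, k, L = 32, 8, 4                      # dimension, bits per hash g_i, number of tables

# L independently constructed k-bit hash functions (bit sampling, as in the Hamming case).
gs = [random.sample(range(d), k) for _ in range(L)]
tables = [defaultdict(list) for _ in range(L)]

def g_key(g, p):
    return tuple(p[i] for i in g)

# Pre-processing: insert every database point into all L tables -> O(LN) space.
database = [[random.randint(0, 1) for _ in range(d)] for _ in range(200)]
for idx, p in enumerate(database):
    for g, table in zip(gs, tables):
        table[g_key(g, p)].append(idx)

def query(q):
    """Union the L bins indexed by g_i(q), then linearly search only that union."""
    candidates = set()
    for g, table in zip(gs, tables):
        candidates.update(table.get(g_key(g, q), []))
    return min(candidates,
               key=lambda i: sum(x != y for x, y in zip(database[i], q)),
               default=None)

print(query(database[0]))               # recovers index 0 (or an identical point)
```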

Page 61: Indexing Techniques

W.-T. Lee and H.-T. Chen. Probing the local-feature space of interest points, ICIP 2010.

Page 62: Indexing Techniques

Hash family

a : random vector sampled from a Gaussian distribution

b : real value chosen uniformly from the range [0 , r]

r : segment width

The dot product a‧v projects each vector v onto a line.
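A sketch of one such projection hash in Python/NumPy, following the a, b, r construction above (the dimension, segment width, and test vectors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 128, 4.0                          # descriptor dimension and segment width

a = rng.standard_normal(d)               # random vector sampled from a Gaussian distribution
b = rng.uniform(0.0, r)                  # real value chosen uniformly from [0, r]

def h(v):
    """Project v onto the line defined by a, shift by b, and cut the line into width-r segments."""
    return int(np.floor((np.dot(a, v) + b) / r))

v1 = rng.standard_normal(d)
v2 = v1 + 0.01 * rng.standard_normal(d)  # a small perturbation of v1
print(h(v1), h(v2))                      # nearby vectors usually land in the same segment
```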

Page 63: Indexing Techniques

Building the hash table

Page 64: Indexing Techniques

Building the hash table

Segment width: (max - min) / t

For each random projection, we get t buckets.

Page 65: Indexing Techniques

Building the hash table

• Generate K projections

How many buckets do we get? t^K

Combining them to get an index in the hash table:

Page 66: Indexing Techniques

Building the hash table

• Example
  – 5 projections (K = 5)
  – 15 segments (t = 15)
• 15^5 = 759,375 buckets in total!
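One way to combine the K per-projection segment ids into a single bucket index is to read them as digits of a base-t number (a sketch; clamping out-of-range projections to the valid segments is an assumption, not something specified on the slide):

```python
def table_index(segment_ids, t):
    """Combine K per-projection segment ids (each in 0 .. t-1) into one bucket index.

    Reading the ids as digits of a base-t number gives t**K distinct buckets.
    """
    index = 0
    for s in segment_ids:
        s = min(max(s, 0), t - 1)       # assumption: clamp out-of-range projections
        index = index * t + s
    return index

# Example from the slide: K = 5 projections, t = 15 segments -> 15**5 = 759,375 buckets
print(15 ** 5)                          # 759375
print(table_index([3, 14, 0, 7, 9], t=15))
```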

Page 67: Indexing Techniques

Sketching the Feature Space

• Collect image patches of three different sizes: 16×16, 32×32, and 64×64; each set consists of 200,000 patches.
  – Natural image patches (from the Berkeley segmentation database)
  – Noise image patches (randomly generated noise patches)

Page 68: Indexing Techniques

Patch distribution over buckets

Page 69: Indexing Techniques

Summary

• Indexing techniques are essential for organizing a database and for enabling fast matching.

• For indexing high-dimensional data:
  – Inverted file
  – Vocabulary tree
  – Locality-sensitive hashing

Page 70: Indexing Techniques

Resources and extended readings

• LSH Matlab Toolbox
  – http://www.cs.brown.edu/~gregory/download.html
• Yeh et al., "Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning," ICCV 2007.