Indexing Techniques
Mei-Chen Yeh
Last week
• Matching two sets of features
– Strategy 1: Convert to a fixed-length feature vector (bag-of-words), then use a conventional proximity measure
– Strategy 2: Build point correspondences
Last week: bag-of-words
[Figure: bag-of-words histogram — frequency of each codeword in the visual vocabulary]
Matching local features: building patch correspondences
To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD)
Image 1 Image 2
Slide credits: Prof. Kristen Grauman
Matching local features: building patch correspondences
Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)
Image 1 Image 2
Slide credits: Prof. Kristen Grauman
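As a concrete illustration of this brute-force strategy, here is a minimal NumPy sketch; the random descriptors, the SSD threshold, and all names are illustrative placeholders, not from the slides:

```python
import numpy as np

def match_brute_force(desc1, desc2, max_ssd=2.0):
    """Match each descriptor in image 1 to its lowest-SSD counterpart in image 2."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)  # SSD against every descriptor in image 2
        j = int(np.argmin(ssd))                 # take the closest
        if ssd[j] < max_ssd:                    # keep only sufficiently similar pairs
            matches.append((i, j))
    return matches

# Toy stand-ins for 128-d SIFT descriptors (L2-normalized random vectors)
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(50, 128)); desc1 /= np.linalg.norm(desc1, axis=1, keepdims=True)
desc2 = rng.normal(size=(60, 128)); desc2 /= np.linalg.norm(desc2, axis=1, keepdims=True)
print(len(match_brute_force(desc1, desc2)), "candidate matches")
```

The cost is linear in the product of the two set sizes, which is exactly what the indexing techniques below avoid.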
Indexing local features
• Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT)
[Figure: descriptors from database images as points in the descriptor’s feature space]
Indexing local features
• When we see close points in feature space, we have similar descriptors, which indicates similar local content.
[Figure: a query image’s descriptors falling close to database descriptors in the feature space]
Problem statement
• With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?
[Figure: live retrieval demo on a database of 50 thousand images — what about 110 million images?]
Slide credit: Nistér and Stewénius
Scalability matters!
The Nearest-Neighbor Search Problem
• Given
– A set S of n points in d dimensions
– A query point q
• Which point in S is closest to q?
• Time complexity of linear scan: O(dn)
The Nearest-Neighbor Search Problem
• r-nearest neighbor
– for any query q, returns a point p ∈ S s.t. ‖p − q‖ ≤ r
• c-approximate r-nearest neighbor
– for any query q, returns a point p′ ∈ S s.t. ‖p′ − q‖ ≤ cr
Today
• Indexing local features
– Inverted file
– Vocabulary tree
– Locality-sensitive hashing
Indexing local features: inverted file
Indexing local features: inverted file
• For text documents, an efficient way to find all pages on which a word occurs is to use an index.
• We want to find all images in which a feature occurs.
– page ~ image
– word ~ feature
• To use this idea, we’ll need to map our features to “visual words”.
Text retrieval vs. image search
• What makes the two problems similar, and what makes them different?
Visual words
• Extract some local features from a number of images …
e.g., SIFT descriptor space: each point is 128-dimensional
Slide credit: D. Nister, CVPR 2006
Visual words
Each point is a local descriptor, e.g. SIFT vector.
Example: quantize into 3 words
• Map high-dimensional descriptors to tokens/words by quantizing the feature space
• Quantize via clustering; let the cluster centers be the prototype “words”
• Determine which word to assign to each new image region by finding the closest cluster center.
[Figure: descriptor’s feature space partitioned into 3 words, e.g. “Word #2”]
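As a rough sketch of this clustering step, here is a minimal Python example using scikit-learn’s KMeans; the random descriptors, the 3-word vocabulary size, and all variable names are toy choices for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-ins for SIFT descriptors pooled from many training images
rng = np.random.default_rng(1)
train_descriptors = rng.normal(size=(5000, 128))

# Quantize the feature space: the cluster centers become the prototype "words"
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train_descriptors)

# Assign a new image region to a word by finding the closest cluster center
new_descriptor = rng.normal(size=(1, 128))
word_id = int(kmeans.predict(new_descriptor)[0])
print("region assigned to visual word", word_id)
```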
Visual words
• Each group of patches belongs to the same visual word!
Figure from Sivic & Zisserman, ICCV 2003
Visual vocabulary formation
Issues:
• Sampling strategy: where to extract features? Fixed locations or interest points?
• Clustering / quantization algorithm
• What corpus provides features (universal vocabulary?)
• Vocabulary size, number of words
• Weight of each word?
Inverted file index
The index maps each visual word to the ids of the images containing it.
Why does the index give us a significant gain in efficiency?
A query image is matched only against database images that share visual words with it (see the sketch below).
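A minimal sketch of such an index in Python, assuming images have already been quantized into lists of visual-word ids; the image ids and word ids are made up for illustration:

```python
from collections import defaultdict

# Inverted file: visual-word id -> set of ids of images containing that word
index = defaultdict(set)

def add_image(image_id, word_ids):
    """Register every visual word occurring in an image."""
    for w in word_ids:
        index[w].add(image_id)

def candidate_images(query_word_ids):
    """Only images sharing at least one word with the query need to be scored."""
    candidates = set()
    for w in query_word_ids:
        candidates |= index[w]
    return candidates

add_image("img1", [3, 17, 42])
add_image("img2", [17, 99])
print(candidate_images([17, 5]))   # -> {'img1', 'img2'}
```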
tf-idf weighting
• Term frequency – inverse document frequency
• Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database
– weight ↗ for discriminative regions (cf. “economic”, “trade”, …)
– weight ↘ for common regions (cf. “the”, “most”, “we”, …)
tf-idf weighting
• The weight of visual word i in image (document) d:

  t_i = (n_id / n_d) × log(N / n_i)

– n_id: number of occurrences of word i in document d
– n_d: number of words in document d
– n_i: number of documents word i occurs in, in the whole database
– N: total number of documents in the database
Slide credit: Xin Yang
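A direct transcription of this formula into Python (a minimal sketch; the toy word ids and document frequencies are invented for illustration):

```python
import math
from collections import Counter

def tfidf_vector(word_ids, doc_freq, n_docs):
    """t_i = (n_id / n_d) * log(N / n_i) for every word i in one image."""
    counts = Counter(word_ids)      # n_id: occurrences of each word in this image
    n_d = len(word_ids)             # n_d: total number of words in this image
    return {w: (c / n_d) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()}

# Toy statistics: doc_freq[w] = number of database images containing word w
doc_freq = {3: 1, 17: 2, 42: 1}
print(tfidf_vector([3, 17, 17, 42], doc_freq, n_docs=2))
# word 17 occurs in every image, so log(2/2) = 0 zeroes out its weight
```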
Bag-of-Words + Inverted file
[Figure: the full pipeline — local descriptors from training images are clustered in feature space into a vocabulary of K visual words (VW1, VW2, VW3, …, VWk); each local descriptor is mapped to its nearest word, giving a frequency histogram over visual words per image; the inverted file maps each word to the images containing it (e.g., VW1 → image i, image k, …; VW2 → image i, image j, image k, …; VW3 → image m, image n, …), from which a matching score is accumulated]
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
[Figure: a bag-of-words representation and the corresponding inverted file]
http://people.cs.ubc.ca/~lowe/keypoints/
D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
Visualize as a tree
Vocabulary Tree
• Training: Filling the tree
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
• Recognition
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Retrieve the candidate images; optionally perform geometric verification on them.
Think about the computational advantage of the hierarchical tree vs. a flat vocabulary!
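With branching factor b and depth D, quantizing a descriptor costs only b·D comparisons walking down the tree, versus b^D comparisons against a flat vocabulary of the same size. Below is a minimal hierarchical k-means sketch in Python; the branching factor, depth, and random data are arbitrary stand-ins, not the authors’ settings:

```python
import numpy as np
from sklearn.cluster import KMeans

class VocabularyTreeNode:
    """Hierarchical k-means: each node splits its descriptors into `branching` children."""
    def __init__(self, descriptors, branching=4, depth=3):
        self.children, self.kmeans = [], None
        if depth > 0 and len(descriptors) >= branching:
            self.kmeans = KMeans(n_clusters=branching, n_init=4).fit(descriptors)
            for c in range(branching):
                subset = descriptors[self.kmeans.labels_ == c]
                self.children.append(VocabularyTreeNode(subset, branching, depth - 1))

    def quantize(self, d, path=()):
        """Descend the tree: b comparisons per level instead of b**depth in total."""
        if self.kmeans is None:
            return path                                   # leaf = visual word id
        c = int(self.kmeans.predict(d.reshape(1, -1))[0])
        return self.children[c].quantize(d, path + (c,))

rng = np.random.default_rng(2)
tree = VocabularyTreeNode(rng.normal(size=(2000, 128)), branching=4, depth=3)
print("leaf (visual word) path:", tree.quantize(rng.normal(size=128)))
```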
Hashing
Direct addressing
• Create a direct-address table with m slots
[Figure: a direct-address table with slots 0–9 — the actual keys K = {2, 3, 5, 8} drawn from the universe U each occupy their own slot, storing the key and its satellite data; the remaining slots are empty]
Direct addressing
• Search operation: O(1)
• Problem: the range of keys can be large!
– 64-bit numbers => 18,446,744,073,709,551,616 different keys
– SIFT: 128 × 8 bits
Hashing
• O(1) average-case time
• Use a hash function h to compute the slot from the key k
[Figure: keys k1, k3, k4, k5 from the universe U hashed into a table T with slots 0 … m−1; the slot h(k1) may not contain k1 alone — e.g., k1 and k3 may share a bucket when h(k1) = h(k3)]
Hashing
• A good hash function
– satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
• How to design a hash function for indexing high-dimensional data (e.g., a 128-d SIFT descriptor)?
Locality-sensitive hashing
• Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.
Locality-sensitive hashing (LSH)
• Hash functions are locality-sensitive, if, for any pair of points p, q we have:– Pr[h(p)=h(q)] is “high” if p is close to q– Pr[h(p)=h(q)] is “low” if p is far from q
For a family F of such functions: Pr[h(x) = h(y)] = sim(x, y), where h is drawn at random from F.
Locality Sensitive Hashing
• A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q:
– if ‖p − q‖ ≤ r, then Pr[h(p) = h(q)] > P1
– if ‖p − q‖ ≥ cr, then Pr[h(p) = h(q)] < P2
LSH Function: Hamming Space
• Consider binary vectors
– points from {0, 1}^d
– Hamming distance D(p, q) = # positions on which p and q differ
Example (d = 3):
D(100, 011) = 3
D(010, 111) = 2
LSH Function: Hamming Space
• Define hash function h as h_i(p) = p_i, where p_i is the i-th bit of p
Example: select the 1st dimension
h(010) = 0, h(111) = 1
Pr[h(010) ≠ h(111)] = ⅔ = D(p, q)/d
In general: Pr[h(p) = h(q)] = 1 − D(p, q)/d
Clearly, h is locality sensitive.
LSH Function: Hamming Space
• A k-bit locality-sensitive hash function is defined as g(p) = [h1(p), h2(p), …, hk(p)]^T
– Each h_i(p) is chosen randomly
– Each h_i(p) results in a single bit
• Pr(similar points collide) ≥ 1 − (1 − P1^k)^L (over the L hash tables introduced below)
• Pr(dissimilar points collide) ≤ P2^k
Indyk and Motwani [1998]
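A minimal sketch of such a k-bit bit-sampling function in Python; the vectors, d, k, and the seed are toy choices for illustration:

```python
import random

def make_g(d, k, seed):
    """g(p) = (h1(p), ..., hk(p)): k randomly chosen bit positions of p."""
    rng = random.Random(seed)
    bits = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in bits)

p, q = "0101110", "0101010"    # d = 7, Hamming distance D(p, q) = 1
g = make_g(d=7, k=3, seed=42)
print(g(p), g(q), "collide:", g(p) == g(q))
```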
LSH Function: R^2 space
• Consider 2-d vectors
• Let h(v) indicate on which side of a random hyperplane v falls
• The probability that a random hyperplane separates two unit vectors depends on the angle between them: Pr[h(p) ≠ h(q)] = θ(p, q)/π
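A quick empirical check of this angle relation (a sketch; the angle of 0.3 rad and the trial count are arbitrary):

```python
import numpy as np

def hyperplane_hash(a, v):
    """h(v) = sign(a·v): which side of the hyperplane with normal a the point v lies on."""
    return int(np.dot(a, v) >= 0)

rng = np.random.default_rng(3)
theta = 0.3                                   # angle between the two unit vectors
p = np.array([1.0, 0.0])
q = np.array([np.cos(theta), np.sin(theta)])

# Estimate Pr[h(p) != h(q)] over many random hyperplanes; it approaches theta/pi
hits = sum(hyperplane_hash(a, p) != hyperplane_hash(a, q)
           for a in rng.normal(size=(100_000, 2)))
print(hits / 100_000, "vs theta/pi =", theta / np.pi)
```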
LSH Pre-processing
• Each image is entered into L hash tables indexed by independently constructed g1, g2, …, gL
• Preprocessing Space: O(LN)
LSH Querying
• For each hash table, return the bin indexed by gi(q), 1 ≤ i ≤ L.
• Perform a linear search on the union of the bins.
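Putting the pieces together, a minimal sketch of LSH preprocessing and querying with bit-sampling hashes in Python; D, K, L, and all helper names are illustrative choices, not prescribed values:

```python
from collections import defaultdict
import random

D, K, L = 128, 12, 8           # bits per descriptor, bits per g, number of tables

def make_g(seed):
    """One K-bit bit-sampling function over binary descriptors of length D."""
    bits = random.Random(seed).sample(range(D), K)
    return lambda p: tuple(p[i] for i in bits)

gs = [make_g(s) for s in range(L)]              # independently constructed g1 ... gL
tables = [defaultdict(list) for _ in range(L)]  # one hash table per g

def insert(image_id, p):
    """Preprocessing: enter p into all L tables -- O(L*N) space overall."""
    for g, table in zip(gs, tables):
        table[g(p)].append(image_id)

def query(q):
    """Union of the L bins indexed by g_i(q); linear-scan only this short list."""
    return set().union(*(set(table[g(q)]) for g, table in zip(gs, tables)))

p = [random.Random(0).getrandbits(1) for _ in range(D)]
insert("img1", p)
print(query(p))   # -> {'img1'}
```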
W.-T. Lee and H.-T. Chen. Probing the local-feature space of interest points, ICIP 2010.
Hash family
h(v) = ⌊(a·v + b) / r⌋
– a: random vector sampled from a Gaussian distribution
– b: real value chosen uniformly from the range [0, r]
– r: segment width
• The dot product a·v projects each vector v onto a line
Building the hash table
• Segment width r = (max − min) / t
• For each random projection, we get t buckets.
Building the hash table
• Generate K projections
• How many buckets do we get? t^K
• Combining them gives an index into the hash table.
Building the hash table
• Example
– 5 projections (K = 5)
– 15 segments (t = 15)
• 15^5 = 759,375 buckets in total!
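A sketch of this construction in Python. All constants are illustrative, and the modulo clamp that keeps each digit in [0, t) is my simplification — the slides instead derive r from the data range so that every projection lands in one of t segments directly:

```python
import numpy as np

D, K, t, r = 128, 5, 15, 0.8    # dims, projections, segments per projection, width

rng = np.random.default_rng(7)
A = rng.normal(size=(K, D))      # a: Gaussian random projection vectors
B = rng.uniform(0, r, size=K)    # b: offsets chosen uniformly from [0, r]

def bucket_index(v):
    """h_k(v) = floor((a_k . v + b_k) / r); K digits in base t give one of t**K buckets."""
    digits = np.floor((A @ v + B) / r).astype(int) % t   # clamp each digit into [0, t)
    return int(sum(d * t**k for k, d in enumerate(digits)))

v = rng.normal(size=D)
print("bucket", bucket_index(v), "of", t**K)   # 15**5 = 759,375 buckets in total
```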
• Collect three sets of image patches of different sizes: 16×16, 32×32, 64×64. Each set consists of 200,000 patches.
– Natural image patches (from the Berkeley segmentation database)
– Noise image patches (randomly generated noise patches)
Sketching the Feature Space
[Figure: patch distribution over buckets]
Summary
• Indexing techniques are essential for organizing a database and for enabling fast matching.
• For indexing high-dimensional data– Inverted file– Vocabulary tree– Locality sensitive hashing
Resources and extended readings
• LSH Matlab Toolbox
– http://www.cs.brown.edu/~gregory/download.html
• Yeh et al., “Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning,” ICCV 2007.