Data Science Research Day (Talk)


DESCRIPTION

Edinburgh University Informatics Data Science Research Day talk.

TRANSCRIPT

Page 1: Data Science Research Day (Talk)

Variable Bit Quantisation for Large Scale Search

Sean Moran

Final year PhD student, Institute of Language, Cognition and Computation

12th September 2014

Page 2: Data Science Research Day (Talk)

Variable Bit Quantisation for Large Scale Search

Nearest Neighbour Search

Variable Bit Quantisation for LSH

Evaluation

Summary


Page 4: Data Science Research Day (Talk)

Nearest Neighbour Search

▶ Given a query q, find its nearest neighbour NN(q) from the database X = {x1, x2, …, xN}, where NN(q) = argmin_{x ∈ X} dist(q, x)

▶ dist(q, x) is a distance measure, e.g. Euclidean or cosine

[Figure: a query q and its nearest neighbour (1-NN) among database points in the 2D plane (axes x, y)]

▶ Generalised variant: K-nearest neighbour search, KNN(q)

▶ Naive approach: compare the query to all N database items, i.e. O(N) query time (see the sketch below)
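A minimal NumPy sketch of this brute-force search (the data and names below are illustrative, not from the talk): it compares the query against every database item, which is exactly the O(N) cost that hashing will avoid.

    import numpy as np

    def nearest_neighbour(q, X):
        # Exact 1-NN: scan all N database items -> O(N) per query
        dists = np.linalg.norm(X - q, axis=1)   # Euclidean dist(q, x) for every x
        idx = int(np.argmin(dists))
        return idx, float(dists[idx])

    # Toy data: five 2D points and a query closest to point 3
    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.2, 0.1], [5.0, 5.0]])
    q = np.array([0.1, 0.1])
    print(nearest_neighbour(q, X))   # -> (3, 0.1)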

Page 5: Data Science Research Day (Talk)

Example: First Story Detection in Twitter

▶ Real-time detection of first stories in the Twitter stream

▶ The Haiti earthquake struck at 21:53; the first story appeared at 22:17 UTC:

▶ 22:17:43 justinholtweb NOT expecting Tsunami on east coast after haiti earthquake. good news.

▶ State-of-the-art FSD uses NN search under the bonnet

▶ Problems: dimensionality (1 million+) and data volume (250 GB/day)

▶ Hashing-based approximate NN search operates in O(1) time [1]

[1] Real-Time Detection, Tracking and Monitoring of Discovered Events in Social Media. S. Moran et al. In ACL’14.

Page 6: Data Science Research Day (Talk)

Example: First Story Detection in Twitter

[Figure: tweet volume over time; at time T1 a story breaks: “Rapper Lil Wayne ends up in hospital after a skateboarding accident”]

Page 7: Data Science Research Day (Talk)

Example: First Story Detection in Twitter

[Figure: tweet volume over time; at time T2: “Thor Hushovd wins stage 13 of Tour de France”]

Page 8: Data Science Research Day (Talk)

Example: First Story Detection in Twitter

[Figure: tweet volume over time; at time T3: “Magnitude 5.4 earthquake hits western Japan”]

Page 9: Data Science Research Day (Talk)

Example: First Story Detection in Twitter

[Figure: tweet volume over time; at time T8: “USGS and Nat Weather Service NOT expecting Tsunami on east coast after haiti earthquake.”]

Page 10: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: tweets from times T1–T7 are passed through the hash function H into the database]

Page 11: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: each database tweet is hashed to a bitcode (110101, 010111, 010101, 111101, …) and stored in the corresponding hash table bucket]

Page 12: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: the query tweet from time T8 is hashed with the same hash function H]

Page 13: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: the query's bitcode indexes the hash table, selecting one bucket of candidate tweets]

Page 14: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: similarity is computed between the query and only the candidates found in its bucket]

Page 15: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: the most similar candidates are returned as the query's approximate nearest neighbours]

Page 16: Data Science Research Day (Talk)

Hashing-based approximate NN search

[Diagram: if no sufficiently similar neighbour exists, the query tweet is flagged as a first story!]
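The pipeline in the diagrams above can be sketched in a few lines of Python. This is a minimal illustration, not the system from [1]: the random Gaussian hyperplanes, dimensionality and code length are arbitrary choices, and a real deployment would use several hash tables to boost recall.

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)

    def bitcode(x, hyperplanes):
        # One bit per hyperplane: the sign of each projection (see the LSH slides)
        return tuple(int(b) for b in (hyperplanes @ x > 0))

    # Index: hash every database item into the bucket keyed by its bitcode
    dim, n_bits = 100, 6
    H = rng.standard_normal((n_bits, dim))        # random hyperplane normals
    database = rng.standard_normal((1000, dim))   # stand-in for tweet vectors
    table = defaultdict(list)
    for i, x in enumerate(database):
        table[bitcode(x, H)].append(i)

    # Query: score only the candidates in the query's bucket, not all N items
    q = database[42] + 0.01 * rng.standard_normal(dim)
    candidates = table[bitcode(q, H)]             # may be empty if bits flip
    nearest = max(candidates, key=lambda i: float(database[i] @ q))
    print(nearest)   # almost certainly 42, the item the query was derived from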

Page 17: Data Science Research Day (Talk)

Locality Sensitive Hashing (LSH)

[Figure: data points in the 2D plane (axes x, y)]

Page 18: Data Science Research Day (Talk)

Locality Sensitive Hashing (LSH)

[Figure: two random hyperplanes h1 and h2, with normal vectors n1 and n2, partition the plane into four regions with bitcodes 00, 01, 10, 11]

Page 19: Data Science Research Day (Talk)

Locality Sensitive Hashing (LSH)

[Figure: the query q falls in the region with bitcode 11, i.e. h1(q) = 1 and h2(q) = 1]

Page 20: Data Science Research Day (Talk)

Step 1: Projection

[Figure: the query q is projected onto the hyperplane normals n1 and n2]

Page 21: Data Science Research Day (Talk)

Step 2: Single Bit Quantisation (SBQ)

[Figure: the projection of q onto n2 is quantised to 0 or 1 by a threshold t]

▶ The threshold is typically zero, i.e. the sign function: h2(q) = sgn(n2 · q)

▶ Generate the full 2-bit hash key (bitcode) by concatenation:

g(q) = h1(q) ⊕ h2(q) = sgn(n1 · q) ⊕ sgn(n2 · q) = 1 ⊕ 1 = 11
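As a quick sanity check, the hash key construction above can be run directly; the vectors n1, n2 and q below are made up for illustration:

    import numpy as np

    def sbq_key(q, normals):
        # Single bit quantisation: threshold each projection at zero (sign)
        bits = (normals @ q > 0).astype(int)   # h_i(q) = sgn(n_i . q) in {0, 1}
        return ''.join(map(str, bits))         # concatenate: g(q) = h1(q) h2(q)

    n1, n2 = np.array([1.0, 0.5]), np.array([-0.3, 1.0])
    q = np.array([0.8, 0.6])
    print(sbq_key(q, np.stack([n1, n2])))      # -> '11', as in the slide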

Page 22: Data Science Research Day (Talk)

Many more methods exist...

▶ Very active area of research:

  ▶ Kernel methods [3]

  ▶ Spectral methods [4] [5]

  ▶ Neural networks [6]

  ▶ Loss-based methods [7]

▶ Commonality: all use single bit quantisation (SBQ)

[3] M. Raginsky and S. Lazebnik. Locality-Sensitive Binary Codes from Shift-Invariant Kernels. In NIPS ’09.
[4] Y. Weiss, A. Torralba and R. Fergus. Spectral Hashing. In NIPS ’08.
[5] J. Wang, S. Kumar and S.-F. Chang. Semi-Supervised Hashing for Large-Scale Search. PAMI ’12.
[6] R. Salakhutdinov and G. Hinton. Semantic Hashing. In NIPS ’08.
[7] B. Kulis and T. Darrell. Learning to Hash with Binary Reconstructive Embeddings. In NIPS ’09.

Page 23: Data Science Research Day (Talk)

Variable Bit Quantisation for Large Scale Search

Nearest Neighbour Search

Variable Bit Quantisation for LSH

Evaluation

Summary

Page 24: Data Science Research Day (Talk)

Problem 1: SBQ leads to high quantisation errors

▶ A threshold at zero can separate many related tweets:

[Figure: histogram of the projected values: count (0–6000) vs projected value (−1.5 to 1.5)]

Page 25: Data Science Research Day (Talk)

Problem 2: some hyperplanes are better than others

[Figure: the hyperplane partition from before shown in the (x, y) plane, together with the data projected onto projected dimension 1 (normal n1) and projected dimension 2 (normal n2); the two hyperplanes differ in how well their projections separate the data]


Page 28: Data Science Research Day (Talk)

Threshold Positioning

▶ Multiple bits per hyperplane require multiple thresholds [8]

▶ F-score optimisation: maximise the number of related tweets falling inside the same thresholded regions (a scoring sketch follows this slide):

[Figure: a projected dimension n2 with three thresholds t1, t2, t3 defining the regions 00, 01, 10, 11; a good threshold placement keeps related tweets in the same region (F1 score: 1.00), a poor placement splits them (F1 score: 0.44)]

[8] Neighbourhood Preserving Quantisation for LSH. S. Moran et al. In SIGIR’13.
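A rough sketch of the scoring idea behind this slide, assuming labelled neighbour pairs are available at training time (as in [8]). The pairwise F1 below is an illustrative stand-in for the paper's exact F-measure, and the random search is only a toy substitute for its threshold optimisation:

    import numpy as np

    def f1_of_thresholds(proj, pairs, thresholds):
        # proj: projected value of each item on one hyperplane's normal
        # pairs: ((i, j), is_true_neighbour) labels from the training set
        region = np.digitize(proj, np.sort(thresholds))   # region per item
        tp = fp = fn = 0
        for (i, j), is_nn in pairs:
            same = region[i] == region[j]
            tp += bool(same and is_nn)
            fp += bool(same and not is_nn)
            fn += bool(not same and is_nn)
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Toy search over threshold triples (3 thresholds -> 2-bit regions 00..11)
    rng = np.random.default_rng(0)
    proj = np.array([-1.2, -1.1, -0.1, 0.1, 0.9, 1.0])
    pairs = [((0, 1), True), ((2, 3), True), ((4, 5), True), ((0, 5), False)]
    best = max((rng.uniform(-1.5, 1.5, 3) for _ in range(1000)),
               key=lambda t: f1_of_thresholds(proj, pairs, t))
    print(np.sort(best), f1_of_thresholds(proj, pairs, best))  # F1 reaches 1.0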

Page 29: Data Science Research Day (Talk)

Variable Bit Allocation

▶ The F-score is a measure of neighbourhood preservation [9]:

[Figure: hyperplane n1 scores F1 = 0.25 and is assigned 0 bits: zero thresholds do just as well as one or more thresholds. Hyperplane n2 scores F1 = 1.00 and is assigned 2 bits: its 3 thresholds t1, t2, t3 (regions 00, 01, 10, 11) perfectly preserve the neighbourhood structure]

▶ Compute the bit allocation that maximises the cumulative F-score

▶ The bit allocation is solved as a binary integer linear program (BILP)

[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL’13

Page 30: Data Science Research Day (Talk)

Variable Bit Allocation

max ‖F ◦ Z‖
subject to ‖Z_h‖ = 1, h ∈ {1 … B}
           ‖Z ◦ D‖ ≤ B
           Z is binary

▶ F contains the F-scores per hyperplane, per bit count

▶ Z is an indicator matrix specifying the bit allocation

▶ D is a constraint matrix (the bit cost of each allocation)

▶ B is the bit budget

▶ ‖·‖ denotes the Frobenius L1 norm

▶ ◦ denotes the Hadamard (elementwise) product

[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL’13


Page 34: Data Science Research Day (Talk)

Variable Bit Allocation

max ‖F ◦ Z‖
subject to ‖Z_h‖ = 1, h ∈ {1 … B}
           ‖Z ◦ D‖ ≤ B
           Z is binary

Worked example (two hyperplanes h1, h2; row b gives the value when b bits are allocated):

F        h1     h2
b0      0.25   0.25
b1      0.35   0.50
b2      0.40   1.00

D        h1     h2
b0       0      0
b1       1      1
b2       2      2

Z        h1     h2
b0       1      0
b1       0      0
b2       0      1

▶ Sparse solution: lower-quality hyperplanes are discarded [9] (here h1 receives 0 bits; a sketch reproducing this example follows below).

[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL’13
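This instance is small enough to solve by exhaustive enumeration, so here is a sketch reproducing the example above without a BILP solver, assuming a bit budget of B = 2 (which the Z shown satisfies); the BILP of [9] solves the same maximisation at scale:

    import numpy as np
    from itertools import product

    F = np.array([[0.25, 0.25],   # F[b, h]: F-score of hyperplane h given b bits
                  [0.35, 0.50],
                  [0.40, 1.00]])
    cost = np.array([0, 1, 2])    # bits consumed by rows b0, b1, b2 (matrix D)
    B = 2                         # assumed total bit budget

    # One bit count per hyperplane (the ||Z_h|| = 1 constraint); among feasible
    # allocations (total cost <= B, i.e. ||Z o D|| <= B) keep the best F-score
    feasible = (a for a in product(range(3), repeat=2)
                if cost[list(a)].sum() <= B)
    best = max(feasible, key=lambda a: sum(F[b, h] for h, b in enumerate(a)))
    print(best)   # -> (0, 2): h1 discarded (0 bits), h2 gets 2 bits, matching Z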

Page 35: Data Science Research Day (Talk)

Variable Bit Quantisation for Large Scale Search

Nearest Neighbour Search

Variable Bit Quantisation for LSH

Evaluation

Summary

Page 36: Data Science Research Day (Talk)

Evaluation Protocol

▶ Task: text and image retrieval

▶ Projections: LSH [2], Shift-Invariant Kernel Hashing (SIKH) [3], Spectral Hashing (SH) [4] and PCA-Hashing (PCAH) [5]

▶ Baselines: Single Bit Quantisation (SBQ), Manhattan Hashing (MQ) [10], Double-Bit Quantisation (DBQ) [11]

▶ Evaluation: how well do we retrieve the true NNs of each query? (AUPRC; a sketch follows the references below)

[2] P. Indyk and R. Motwani. Approximate Nearest Neighbors: Removing the Curse of Dimensionality. In STOC ’98.
[3] M. Raginsky and S. Lazebnik. Locality-Sensitive Binary Codes from Shift-Invariant Kernels. In NIPS ’09.
[4] Y. Weiss, A. Torralba and R. Fergus. Spectral Hashing. In NIPS ’08.
[5] J. Wang, S. Kumar and S.-F. Chang. Semi-Supervised Hashing for Large-Scale Search. PAMI ’12.
[10] W. Kong, W. Li and M. Guo. Manhattan Hashing for Large-Scale Image Retrieval. In SIGIR ’12.
[11] W. Kong and W. Li. Double Bit Quantisation for Hashing. In AAAI ’12.
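For concreteness, a minimal sketch of per-query AUPRC as it is commonly estimated in the hashing literature: average precision over a Hamming-distance ranking of the database. The toy data is invented, and the cited papers may differ in details such as tie-breaking:

    import numpy as np

    def average_precision(query_code, db_codes, is_true_nn):
        # Rank the database by Hamming distance to the query's bitcode, then
        # average the precision at the rank of each true nearest neighbour
        hamming = np.count_nonzero(db_codes != query_code, axis=1)
        rel = is_true_nn[np.argsort(hamming, kind='stable')].astype(float)
        prec_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        return float((prec_at_k * rel).sum() / rel.sum())

    # Toy usage: 4-bit codes for five items; items 0 and 2 are the true NNs
    db = np.array([[1,0,1,1], [0,0,0,0], [1,0,1,0], [0,1,0,0], [1,1,1,1]])
    ap = average_precision(np.array([1,0,1,1]), db, np.array([1,0,1,0,0], bool))
    print(ap)   # per-query AUPRC; the mean over queries gives tables like below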

Page 37: Data Science Research Day (Talk)

AUPRC across different projections (variable # bits)

            Images (32 bits)                  Text (128 bits)
        SBQ     MQ     DBQ    VBQ        SBQ     MQ     DBQ    VBQ
SIKH   0.042  0.046  0.047  0.161       0.102  0.112  0.087  0.389
LSH    0.119  0.091  0.066  0.207       0.276  0.201  0.175  0.538
SH     0.051  0.144  0.111  0.202       0.033  0.028  0.030  0.154
PCAH   0.036  0.132  0.107  0.219       0.095  0.034  0.027  0.154

▶ Variable bit allocation yields substantial gains in retrieval accuracy

▶ VBQ is an effective quantisation scheme across modalities (text and images)

Page 38: Data Science Research Day (Talk)

AUPRC for LSH across a broad bit range

[Figure: AUPRC against number of bits for SBQ, MQ and VBQ. Panel (a) Text: 32–256 bits, AUPRC axis 0–0.7; panel (b) Images: 8–64 bits, AUPRC axis 0–0.35]

▶ VBQ is effective for both long and short bitcodes (hash keys)

Page 39: Data Science Research Day (Talk)

Variable Bit Quantisation for Large Scale Search

Nearest Neighbour Search

Variable Bit Quantisation for LSH

Evaluation

Summary

Page 40: Data Science Research Day (Talk)

Summary

▶ Proposed a novel data-driven scheme (VBQ) that adaptively assigns a variable number of bits per LSH hyperplane

▶ Hyperplanes that better preserve the neighbourhood structure are afforded more bits from the budget

▶ VBQ substantially increased LSH retrieval performance across text and image datasets

▶ Current work: a method to couple the quantisation and projection stages of LSH

Page 41: Data Science Research Day (Talk)

Thank you for your attention!

▶ FSD live system: goo.gl/Q7WQOk [1]

▶ Running over the live Twitter stream in real time

▶ Sub-second detection latency via ANN search (on 1 CPU!)

▶ Papers: www.seanjmoran.com [1][8][9]

▶ Contact: [email protected]

[1] Real-Time Detection, Tracking and Monitoring of Discovered Events in Social Media. S. Moran et al. In ACL’14.

[8] Neighbourhood Preserving Quantisation for LSH. S. Moran et al. In SIGIR’13.

[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL’13.