

Learning to Hash with its Application to Big Data Retrieval and Mining

Wu-Jun Li

Department of Computer Science and Engineering
Shanghai Jiao Tong University
Shanghai, China

Joint work with collaborators

Dec 21, 2013

Li (http://www.cs.sjtu.edu.cn/~liwujun)  Learning to Hash  CSE, SJTU  1 / 49


Outline

1 Introduction
   Problem Definition
   Existing Methods

2 Isotropic Hashing
   Model
   Learning
   Experiment

3 Multiple-Bit Quantization
   Double-Bit Quantization
   Manhattan Quantization

4 Conclusion

5 Reference


Introduction


Introduction Problem Definition

Nearest Neighbor Search (Retrieval)

Given a query point q, return the points in the database (e.g., images) that are closest (most similar) to q.

Nearest neighbor search underlies many machine learning, data mining, and information retrieval problems.

Challenge in Big Data Applications:

Curse of dimensionality

Storage cost

Query speed

Similarity Preserving Hashing


Reduce Dimensionality and Storage Cost


Querying

Hamming distance:

||01101110, 00101101||_H = 3
||11011, 01011||_H = 1
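These Hamming distances can be verified directly; on bit strings this is a mismatch count, and on packed integer codes it is XOR followed by popcount:

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# The two examples above:
print(hamming("01101110", "00101101"))  # 3
print(hamming("11011", "01011"))        # 1

# Equivalent on integers: XOR, then count set bits.
print(bin(0b01101110 ^ 0b00101101).count("1"))  # 3
```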


Fast Query Speed

By using a hashing scheme, we can achieve constant or sub-linear search time.

Even exhaustive search becomes acceptable, because computing distances between binary codes is cheap.


Two Stages of Hash Function Learning

Projection Stage (Dimension Reduction)

Given a point x, each projected dimension i is associated with a real-valued projection function f_i(x) (e.g., f_i(x) = w_i^T x).

Quantization Stage

Turn the real-valued projections into binary codes.
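As a minimal sketch of the two stages (here with an LSH-style random projection standing in for a learned one; the dimensions and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 16                          # input dimension, code length (illustrative)
W = rng.standard_normal((d, m))        # projection stage: f_i(x) = w_i^T x

def hash_codes(X):
    """Quantization stage: threshold the projections at zero -> bits in {0, 1}."""
    return (X @ W > 0).astype(np.uint8)

X = rng.standard_normal((5, d))        # five random database points
codes = hash_codes(X)
print(codes.shape)                     # (5, 16)
```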


Introduction Existing Methods

Data-Independent Methods

The hashing function family is defined independently of the training dataset:

Locality-sensitive hashing (LSH) (Gionis et al., 1999; Andoni and Indyk, 2008) and its extensions (Datar et al., 2004; Kulis and Grauman, 2009; Kulis et al., 2009).

SIKH: Shift-invariant kernel hashing (Raginsky and Lazebnik, 2009).

Hashing function: random projections.


Data-Dependent Methods

Hashing functions are learned from a given training dataset.

Relatively short codes

Seminal papers: (Salakhutdinov and Hinton, 2007, 2009; Torralba et al., 2008; Weiss et al., 2008)

Two categories:

Unimodal
   Supervised methods: given labels y_i or triplets (x_i, x_j, x_k)
   Unsupervised methods

Multimodal
   Supervised methods
   Unsupervised methods


(Unimodal) Unsupervised Methods

No labels to denote the categories of the training points.

PCAH: principal component analysis.

SH: Spectral hashing (Weiss et al., 2008): eigenfunctions computed from the data similarity graph.

ITQ: Iterative quantization (Gong and Lazebnik, 2011): an orthogonal rotation matrix to refine the initial projection matrix learned by PCA.

AGH: Graph-based hashing (Liu et al., 2011).


(Unimodal) Supervised (semi-supervised) Methods

Class labels or pairwise constraints:

SSH: Semi-supervised hashing (Wang et al., 2010a,b) exploits both labeled and unlabeled data for hash function learning.

MLH: Minimal loss hashing (Norouzi and Fleet, 2011), based on the latent structural SVM framework.

KSH: Kernel-based supervised hashing (Liu et al., 2012)

LDAHash: Linear discriminant analysis based hashing (Strecha et al., 2012)

Triplet-based methods:

Hamming Distance Metric Learning (HDML) (Norouzi et al., 2012)

Column generation based hashing (CGHash) (Li et al., 2013)


Multimodal Methods

Multi-Source Hashing

Cross-Modal Hashing


Multi-Source Hashing

Aims to learn better codes than unimodal hashing by leveraging auxiliary views.

Assumes that all views are provided for each query, which is typically not feasible in many multimedia applications.

Multiple Feature Hashing (Song et al., 2011)

Composite Hashing (Zhang et al., 2011)


Cross-Modal Hashing

Given a query of either image or text, return images or texts similar to it.

Cross View Hashing (CVH) (Kumar and Udupa, 2011)

Multimodal Latent Binary Embedding (MLBE) (Zhen and Yeung,2012a)

Co-Regularized Hashing (CRH) (Zhen and Yeung, 2012b)

Inter-Media Hashing (IMH) (Song et al., 2013)

Relation-aware Heterogeneous Hashing (RaHH) (Ou et al., 2013)


Related Research Groups in China

FDU: Yugang Jiang, Xuanjing Huang

HKUST: Dit-Yan Yeung

IA-CAS: Cheng-Lin Liu, Yan-Ming Zhang

ICT-CAS: Hong Chang

MSRA: Kaiming He, Jian Sun, Jingdong Wang

NUST: Fumin Shen

SYSU: Weishi Zheng

Tsinghua: Peng Cui, Shiqiang Yang, Wenwu Zhu

ZJU: Jiajun Bu, Deng Cai, Xiaofei He, Yueting Zhuang, ...


Isotropic Hashing


Motivation

Problem: all existing methods use the same number of bits for different projected dimensions, even though the dimensions have different variances.

Possible Solutions:

Use a different number of bits for different dimensions (unfortunately, no effective way has been found).

Isotropic (equal) variances for all dimensions


Contribution

Isotropic hashing (IsoHash) (Kong and Li, 2012b): hashing with isotropic variances for all dimensions.

Multiple-bit quantization:
(1) Double-bit quantization (DBQ) (Kong and Li, 2012a): Hamming distance driven.
(2) Manhattan hashing (MH) (Kong et al., 2012): Manhattan distance driven.


PCA Hash

To generate a code of m bits, PCAH performs PCA on X and then uses the top m eigenvectors of the matrix XX^T as the columns of the projection matrix W ∈ R^{d×m}. Here, the top m eigenvectors are those corresponding to the m largest eigenvalues {λ_k}_{k=1}^m, arranged in non-increasing order λ_1 ≥ λ_2 ≥ · · · ≥ λ_m. Let λ = [λ_1, λ_2, · · · , λ_m]^T. Then

Λ = W^T X X^T W = diag(λ)

Define the hash function

h(x) = sgn(W^T x)
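The PCAH construction can be sketched in NumPy (a minimal illustration on synthetic zero-centered data; the variable names are ours), and the diagonal property Λ = W^T X X^T W = diag(λ) can be checked directly:

```python
import numpy as np

def pcah(X, m):
    """PCAH sketch: W = top-m eigenvectors of X X^T; h(x) = sgn(W^T x)."""
    vals, vecs = np.linalg.eigh(X @ X.T)    # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :m]                # top-m eigenvectors as columns (d x m)
    lam = vals[::-1][:m]                    # lambda_1 >= ... >= lambda_m
    codes = (W.T @ X > 0).astype(np.uint8)  # sgn(W^T x), mapped to {0, 1}
    return W, lam, codes

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))           # d = 10 features, n = 50 points
X -= X.mean(axis=1, keepdims=True)          # zero-center, as PCA assumes
W, lam, codes = pcah(X, m=4)
print(np.allclose(W.T @ X @ X.T @ W, np.diag(lam)))  # Lambda = diag(lambda)
```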


Weakness of PCA Hash

Using the same number of bits for different projected dimensions is unreasonable, because dimensions with larger variance carry more information.

Solve it by making variances equal (isotropic)!


Isotropic Hashing Model

Idea of IsoHash

Learn an orthogonal matrix Q ∈ R^{m×m} which makes Q^T W^T X X^T W Q a matrix with equal diagonal values.

Effect of Q: make each projected dimension have the same variance while keeping the Euclidean distances between any two points unchanged.


Problem Definition

tr(Q^T W^T X X^T W Q) = tr(W^T X X^T W) = tr(Λ) = Σ_{i=1}^m λ_i

Let a = [a_1, a_2, · · · , a_m] with a_i = a = (Σ_{i=1}^m λ_i) / m,

and T(z) = {T ∈ R^{m×m} | diag(T) = diag(z)}.

Problem

The problem of IsoHash is to find an orthogonal matrix Q making Q^T W^T X X^T W Q ∈ T(a).


IsoHash Formulation

Because Q^T Λ Q = Q^T [W^T X X^T W] Q, let

M(Λ) = {Q^T Λ Q | Q ∈ O(m)},

where O(m) is the set of all orthogonal matrices in R^{m×m}.

Then, the IsoHash problem is equivalent to finding T and Z with

||T − Z||_F = 0,

where T ∈ T(a), Z ∈ M(Λ), and || · ||_F denotes the Frobenius norm.


Existence Theorem

Lemma

[Schur-Horn Lemma (Horn, 1954)] Let c = {c_i} ∈ R^m and b = {b_i} ∈ R^m be real vectors in non-increasing order, i.e., c_1 ≥ c_2 ≥ · · · ≥ c_m and b_1 ≥ b_2 ≥ · · · ≥ b_m. There exists a Hermitian matrix H with eigenvalues c and diagonal values b if and only if

Σ_{i=1}^k b_i ≤ Σ_{i=1}^k c_i, for any k = 1, 2, ..., m,

Σ_{i=1}^m b_i = Σ_{i=1}^m c_i.

So we can prove: there exists a solution to the IsoHash problem, and this solution lies in the intersection of T(a) and M(Λ).


Isotropic Hashing Learning

Learning Methods

Two methods (Chu, 1995):

Lift and projection (LP)

Gradient Flow (GF)


Lift and projection (LP)


Gradient Flow

Objective function:

min_{Q ∈ O(m)} F(Q) = (1/2) ||diag(Q^T Λ Q) − diag(a)||_F^2.

The gradient ∇F at Q:

∇F(Q) = 2ΛQβ(Q),

where β(Q) = diag(Q^T Λ Q) − diag(a).

The projection of ∇F(Q) onto O(m):

g(Q) = Q[Q^T Λ Q, β(Q)],

where [A, B] = AB − BA is the Lie bracket.


The vector field Q̇ = −g(Q) defines a steepest descent flow on the manifold O(m) for the function F(Q). Letting Z = Q^T Λ Q and α(Z) = β(Q), we get

Ż = [Z, [α(Z), Z]],

an isospectral flow that moves to reduce the objective function F(Q).
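As an illustration (a toy forward-Euler discretization with a hypothetical spectrum and step size, not the solver used in the paper), the flow drives the diagonal of Z toward a while preserving the trace, and hence the sum of the eigenvalues:

```python
import numpy as np

def isospectral_step(Z, a, h):
    """One Euler step of Zdot = [Z, [alpha(Z), Z]], alpha(Z) = diag(Z) - diag(a)."""
    alpha = np.diag(np.diag(Z) - a)
    B = alpha @ Z - Z @ alpha              # B = [alpha(Z), Z]
    return Z + h * (Z @ B - B @ Z)         # Z + h [Z, B]

lam = np.array([4.0, 2.0, 1.0])            # spectrum (illustrative)
a = np.full(3, lam.mean())                 # target: equal diagonal values
rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Z = Q0.T @ np.diag(lam) @ Q0               # random start on the isospectral manifold

F = lambda M: 0.5 * np.sum((np.diag(M) - a) ** 2)
f0 = F(Z)
for _ in range(5000):
    Z = isospectral_step(Z, a, h=1e-4)

print(F(Z) < f0, abs(np.trace(Z) - lam.sum()) < 1e-6)
```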


Isotropic Hashing Experiment

Accuracy (mAP)

Method    CIFAR (# bits)
          32      64      96      128     256
IsoHash   0.2249  0.2969  0.3256  0.3357  0.3651
PCAH      0.0319  0.0274  0.0241  0.0216  0.0168
ITQ       0.2490  0.3051  0.3238  0.3319  0.3436
SH        0.0510  0.0589  0.0802  0.1121  0.1535
SIKH      0.0353  0.0902  0.1245  0.1909  0.3614
LSH       0.1052  0.1907  0.2396  0.2776  0.3432


Training Time

Figure: training time (s) versus number of training data (up to 6 × 10^4), comparing IsoHash-GF, IsoHash-LP, ITQ, SH, SIKH, LSH, and PCAH.


Multiple-Bit Quantization


Multiple-Bit Quantization Double-Bit Quantization

Double-Bit Quantization

Figure: histogram of the projected values with points A, B, C, D marked; code rows: (a) 0 1; (b) 01 00 10 11; (c) 01 00 10.

Point distribution of the real values computed by PCA on the 22K LabelMe data set, and different coding results based on the distribution:

(a) single-bit quantization (SBQ);
(b) hierarchical hashing (HH) (Liu et al., 2011);
(c) double-bit quantization (DBQ).

The popular coding strategy SBQ, which adopts zero as the threshold, is shown in Figure (a). Due to the thresholding, the intrinsic neighboring structure in the original space is destroyed.

The HH strategy (Liu et al., 2011) is shown in Figure (b). If we use d(A, B) to denote the Hamming distance between A and B, we find that d(A, D) < d(A, C) for HH, which is obviously not reasonable.

With our DBQ code, d(A, D) = 2, d(A, B) = d(C, D) = 1, and d(B, C) = 0, which preserves the similarity relationships in the original space.
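Reading the region codes off the figure (our reading: A → 01, B and C → 00, D → 10; DBQ uses only three of the four 2-bit codes, so 11 is never emitted), the claimed distances follow directly:

```python
codes = {"A": "01", "B": "00", "C": "00", "D": "10"}  # region codes as read from the figure

def d(p, q):
    """Hamming distance between the DBQ codes of two points."""
    return sum(x != y for x, y in zip(codes[p], codes[q]))

print(d("A", "D"), d("A", "B"), d("C", "D"), d("B", "C"))  # 2 1 1 0
```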


Experiment I

Precision-recall curve on 22K LabelMe data set

(Four panels: SH-SBQ, SH-HH, and SH-DBQ compared at 32, 64, 128, and 256 bits.)


Experiment II

mAP on LabelMe data set

# bits        32                      64
          SBQ     HH      DBQ     SBQ     HH      DBQ
ITQ       0.2926  0.2592  0.3079  0.3413  0.3487  0.4002
SH        0.0859  0.1329  0.1815  0.1071  0.1768  0.2649
PCA       0.0535  0.1009  0.1563  0.0417  0.1034  0.1822
LSH       0.1657  0.1050  0.1227  0.2594  0.2089  0.2577
SIKH      0.0590  0.0712  0.0772  0.1132  0.1514  0.1737

# bits        128                     256
          SBQ     HH      DBQ     SBQ     HH      DBQ
ITQ       0.3675  0.4032  0.4650  0.3846  0.4251  0.4998
SH        0.1730  0.2034  0.3403  0.2140  0.2468  0.3468
PCA       0.0323  0.1083  0.1748  0.0245  0.1103  0.1499
LSH       0.3579  0.3311  0.4055  0.4158  0.4359  0.5154
SIKH      0.2792  0.3147  0.3436  0.4759  0.5055  0.5325


Multiple-Bit Quantization Manhattan Quantization

Quantization Stage


Natural Binary Code (NBC)


Manhattan Distance

Let x = [x_1, x_2, · · · , x_d]^T and y = [y_1, y_2, · · · , y_d]^T. The Manhattan distance between x and y is defined as:

d_m(x, y) = Σ_{i=1}^d |x_i − y_i|,

where |x| denotes the absolute value of x.


Manhattan Distance Driven Quantization

We divide each projected dimension into 2^q regions and then use q bits of natural binary code to encode the index of each region.

For example, if q = 3, the region indices are {0, 1, 2, 3, 4, 5, 6, 7} and the natural binary codes are {000, 001, 010, 011, 100, 101, 110, 111}.
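Region encoding can be sketched as follows (the thresholds here are hypothetical, evenly spaced ones; in Manhattan hashing they would be learned from the data):

```python
import numpy as np

def nbc_encode(x, thresholds, q):
    """Encode x by the index of its region among 2^q regions, in natural binary code."""
    region = int(np.searchsorted(thresholds, x))   # 2^q regions need 2^q - 1 thresholds
    return format(region, f"0{q}b")

thr = np.linspace(-3, 3, 7)                        # 7 thresholds -> 8 regions for q = 3
print(nbc_encode(-10.0, thr, 3), nbc_encode(0.1, thr, 3), nbc_encode(10.0, thr, 3))
# 000 100 111
```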



Manhattan quantization (MQ) with q bits per dimension is denoted q-MQ.

For example, if q = 2,

d_m(000100, 110000) = d_d(00, 11) + d_d(01, 00) + d_d(00, 00)
                    = 3 + 1 + 0
                    = 4,

where d_d denotes the decimal distance between the per-dimension codes.
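The computation above can be reproduced with a small helper (a sketch; q and the code strings are those of the example):

```python
def mq_distance(a: str, b: str, q: int) -> int:
    """Manhattan distance between two NBC code strings, q bits per dimension."""
    assert len(a) == len(b) and len(a) % q == 0
    # Sum of decimal distances d_d between corresponding q-bit chunks.
    return sum(abs(int(a[i:i + q], 2) - int(b[i:i + q], 2)) for i in range(0, len(a), q))

print(mq_distance("000100", "110000", q=2))  # 4, as in the example
```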


Experiment I

(Eight panels: SH and SIKH, each with SBQ, HQ, and 2-MQ, at 32, 64, 128, and 256 bits.)

Figure: Precision-recall curve on 22K LabelMe data set


Experiment II

Table: mAP on the ANN SIFT1M data set. The best mAP among SBQ, HQ, and 2-MQ under the same setting is shown in bold face.

# bits        32                      64                      96
          SBQ     HQ      2-MQ    SBQ     HQ      2-MQ    SBQ     HQ      2-MQ
ITQ       0.1657  0.2500  0.2750  0.4641  0.4745  0.5087  0.5424  0.5871  0.6263
SIKH      0.0394  0.0217  0.0570  0.2027  0.0822  0.2356  0.2263  0.1664  0.2768
LSH       0.1163  0.0961  0.1173  0.2340  0.2815  0.3111  0.3767  0.4541  0.4599
SH        0.0889  0.2482  0.2771  0.1828  0.3841  0.4576  0.2236  0.4911  0.5929
PCA       0.1087  0.2408  0.2882  0.1671  0.3956  0.4683  0.1625  0.4927  0.5641


Conclusion


Hashing can significantly improve search speed and reduce storage cost.

Projections with isotropic variances are better than those with anisotropic variances (IsoHash).

The quantization stage is at least as important as the projection stage (DBQ/MQ).


Q & A

Thanks!

Questions?

Code available at http://www.cs.sjtu.edu.cn/~liwujun


References

A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008.

M. Chu. Constructing a Hermitian matrix from its diagonal entries and eigenvalues. SIAM Journal on Matrix Analysis and Applications, 16(1):207–217, 1995.

M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, 2004.

A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of International Conference on Very Large Data Bases, 1999.

Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In Proceedings of Computer Vision and Pattern Recognition, 2011.

A. Horn. Doubly stochastic matrices and the diagonal of a rotation matrix. American Journal of Mathematics, 76(3):620–630, 1954.


W. Kong and W.-J. Li. Double-bit quantization for hashing. InProceedings of the Twenty-Sixth AAAI Conference on ArtificialIntelligence (AAAI), 2012a.

W. Kong and W.-J. Li. Isotropic hashing. In Proceedings of the 26thAnnual Conference on Neural Information Processing Systems (NIPS),2012b.

W. Kong, W.-J. Li, and M. Guo. Manhattan hashing for large-scale image retrieval. In The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2012.

B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Proceedings of International Conference on Computer Vision, 2009.

B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. IEEE Trans. Pattern Anal. Mach. Intell., 31(12):2143–2157, 2009.

S. Kumar and R. Udupa. Learning hash functions for cross-view similarity search. In IJCAI, pages 1360–1365, 2011.



X. Li, G. Lin, C. Shen, A. van den Hengel, and A. R. Dick. Learning hash functions using column generation. In ICML, 2013.

W. Liu, J. Wang, S. Kumar, and S. Chang. Hashing with graphs. In Proceedings of International Conference on Machine Learning, 2011.

W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In CVPR, pages 2074–2081, 2012.

M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In Proceedings of International Conference on Machine Learning, 2011.

M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS, pages 1070–1078, 2012.

M. Ou, P. Cui, F. Wang, J. Wang, W. Zhu, and S. Yang. Comparing apples to oranges: a scalable solution with heterogeneous hashing. In KDD, pages 230–238, 2013.

M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Proceedings of Neural Information Processing Systems, 2009.



R. Salakhutdinov and G. Hinton. Semantic hashing. In SIGIR Workshop on Information Retrieval and Applications of Graphical Models, 2007.

R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969–978, 2009.

J. Song, Y. Yang, Z. Huang, H. T. Shen, and R. Hong. Multiple feature hashing for real-time large scale near-duplicate video retrieval. In ACM Multimedia, pages 423–432, 2011.

J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD Conference, pages 785–796, 2013.

C. Strecha, A. A. Bronstein, M. M. Bronstein, and P. Fua. LDAHash: Improved matching with smaller descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 34(1):66–78, 2012.

A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of Computer Vision and Pattern Recognition, 2008.

J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for hashing with compact codes. In Proceedings of International Conference on Machine Learning, 2010a.

J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale image retrieval. In Proceedings of Computer Vision and Pattern Recognition, 2010b.

Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proceedings of Neural Information Processing Systems, 2008.

D. Zhang, F. Wang, and L. Si. Composite hashing with multiple information sources. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011.

Y. Zhen and D.-Y. Yeung. A probabilistic model for multimodal hash function learning. In KDD, pages 940–948, 2012a.

Y. Zhen and D.-Y. Yeung. Co-regularized hashing for multimodal data. In NIPS, pages 1385–1393, 2012b.

December 24, 2013