
Sequential Projection Learning for Hashing with Compact Codes

Jun Wang†, Sanjiv Kumar‡, Shih-Fu Chang†

† Columbia University, New York, USA
‡ Google Research, New York, USA

Background and Related Work

Nearest neighbor search for large databases
- Exhaustive nearest neighbor search is not practical for large-scale databases with high-dimensional points
- Approximate nearest neighbor (ANN) search achieves sub-linear query time
- Popular ANN search techniques include tree-based approaches and hashing approaches

Related work

- Different choices of projections:
  - Random projections: locality sensitive hashing (LSH, Indyk et al. 98); shift-invariant kernel-based hashing (SIKH, Raginsky et al. 09)
  - Principal projections: spectral hashing (SH, Weiss et al. 08)
- Different forms of hash functions (see the generalized form below):
  - Identity function: LSH & Boosted SSC (BSSC, Shakhnarovich 05)
  - Sinusoidal function: SH & SIKH
- Other related work: restricted Boltzmann machines (RBMs, Hinton et al. 06); Jain et al. 08 (metric learning); Kulis et al. 09 & Mu et al. 10 (kernelized)
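The two families above fit one generalized form, written here as a sketch (the exact arguments of the sinusoid in SH and SIKH differ per method and are omitted):

    h_k(x) = \mathrm{sgn}\big( f(w_k^\top x + b_k) \big),
    \quad f(t) = t \ \text{(identity: LSH, BSSC)},
    \quad f(t) = \sin(t)\text{-type} \ \text{(sinusoidal: SH, SIKH)}

The methods differ only in how the projection w_k is chosen (random vs. principal) and which transfer function f is applied.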

Main issues
- Existing hashing methods mostly rely on random or principal projections, which are not very compact or accurate
- Simple metrics are usually not enough to express semantic similarity

Semi-Supervised Hashing - Formulation

Setting and notations

[Figure: illustration of semi-supervised hashing, contrasting a poor partition with a good one on similar/dissimilar pairs. The projections are learned to achieve a partitioning with maximum empirical fitness and entropy.]

General form of hash functions (cf. the generalized form under Related work)

Empirical Fitness: agreement of the hash bits with the given similar/dissimilar pairs

Variance as Regularizer
- Information-theoretic term: a maximum-variance partition is equivalent to a balanced partition
- Lower bound on the maximum bit variance
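To make the formulation panel concrete, here is a sketch of the relaxed objective following the companion paper (Wang, Kumar & Chang, ICML 2010); the symbols X (all points, zero-centered), X_l (labeled points), S (pairwise label matrix), and η (regularization weight) are reconstructed from that paper:

    h_k(x) = \mathrm{sgn}(w_k^\top x)

    J(W) = \tfrac{1}{2}\,\mathrm{tr}\big(W^\top X_l S X_l^\top W\big)
         + \tfrac{\eta}{2}\,\mathrm{tr}\big(W^\top X X^\top W\big)
         = \tfrac{1}{2}\,\mathrm{tr}\big(W^\top M W\big),
    \qquad M = X_l S X_l^\top + \eta\, X X^\top

The first trace is the relaxed empirical fitness (S_ij = +1 for similar pairs, −1 for dissimilar pairs, 0 otherwise); the second is the variance regularizer, a lower bound on the summed bit variances, and for zero-centered data a maximum-variance bit is a balanced bit. Maximizing J(W) subject to W^\top W = I is solved by the top eigenvectors of the "adjusted" covariance matrix M.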

Semi-Supervised Sequential Projection Learning (S3PLH) - Solution

Final objective function after relaxation
- The orthogonal hash codes can be obtained by an eigen-decomposition of the "adjusted" covariance matrix (see the formulation sketch above)

Sequential solution via updating the label matrix, as sketched below
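A minimal runnable sketch of the sequential solution. The update rule below (boost the weight of pairs the current bit gets wrong so the next projection focuses on them) follows the spirit of the poster's "updating label matrix" step, but the exact weighting scheme is simplified, and the names s3plh, alpha, and eta are illustrative:

import numpy as np

def s3plh(X, X_l, S, n_bits, eta=1.0, alpha=0.5):
    """Simplified S3PLH sketch.

    X      -- d x n matrix of all (zero-centered) training points
    X_l    -- d x l matrix of the labeled points
    S      -- l x l pairwise label matrix (+1 similar, -1 dissimilar, 0 unknown)
    Returns an n_bits x d matrix of projection directions.
    """
    S_k = S.astype(float).copy()
    cov_all = X @ X.T                        # variance (regularizer) term
    W = []
    for _ in range(n_bits):
        # "Adjusted" covariance matrix for the current residual labels.
        M = X_l @ S_k @ X_l.T + eta * cov_all
        # The best single projection is the top eigenvector of M.
        _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
        w = vecs[:, -1]
        W.append(w)
        # Bits this projection assigns to the labeled points.
        h = np.sign(w @ X_l)
        # Pairs the bit violates: the product h_i * h_j disagrees with S.
        violated = (S_k * np.outer(h, h)) < 0
        # Error correction: increase the weight of violated pairs.
        S_k = S_k + alpha * np.sign(S_k) * violated
    return np.array(W)

Codes for a new point x are then the signs of W @ x, one bit per learned projection.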

Experiment and Results

Datasets
- MNIST Digits: 70K points; 1K training labels; 1K query tests (semi-supervised)
- SIFT features: 1 million points; 2K pseudo labels; 10K query tests (unsupervised)

Experimental protocol
- Hash lookup: precision within Hamming radius 2; an empty return is treated as a failed query with zero precision
- Hamming ranking: mean average precision (MAP)
- Compared methods: LSH, SH, SIKH, RBMs, BSSC, PCAH
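As a concrete reading of the hash-lookup protocol, a small sketch (array shapes and names are mine; codes are ±1 bits):

import numpy as np

def hash_lookup_precision(query_codes, db_codes, true_neighbors, radius=2):
    """Precision within a Hamming radius, averaged over queries.

    query_codes    -- q x K matrix of +/-1 bits for the queries
    db_codes       -- n x K matrix of +/-1 bits for the database
    true_neighbors -- list of index arrays, ground-truth neighbors per query
    An empty return is counted as a failed query with zero precision.
    """
    K = db_codes.shape[1]
    precisions = []
    for q, code in enumerate(query_codes):
        # Hamming distance between +/-1 codes: (K - dot product) / 2.
        hamming = (K - db_codes @ code) // 2
        returned = np.flatnonzero(hamming <= radius)
        if returned.size == 0:
            precisions.append(0.0)           # failed query
            continue
        hits = np.intersect1d(returned, true_neighbors[q]).size
        precisions.append(hits / returned.size)
    return float(np.mean(precisions))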

Performance evaluation and comparison

[Figures: Hamming ranking, hash lookup, and 48-bit recall curves, one row for the MNIST data and one for the SIFT data]

Computational evaluation

Unsupervised Sequential Projection Learning (USPLH) - Extension

- Generate pseudo pairwise label data from the existing hash bits
- Build a pseudo label matrix from the kth hash bit
- Use the pseudo label matrices to learn the next projection from the "adjusted" covariance matrix with pseudo labels, as sketched below
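A sketch of the unsupervised extension. The poster only states that pseudo pairwise labels come from the existing hash bits; the specific rule below (treat near-boundary points that a bit separates as pseudo neighbors, since those assignments are least confident) is an illustrative assumption, and any per-bit weighting of the accumulated pseudo labels is omitted:

import numpy as np

def usplh(X, n_bits, eta=1.0, n_boundary=200):
    """Simplified USPLH sketch (pseudo-label rule is an assumption).

    X -- d x n matrix of zero-centered, unlabeled points.
    """
    d, n = X.shape
    pseudo_term = np.zeros((d, d))   # accumulated X S X^T with pseudo labels
    cov_all = X @ X.T
    W = []
    for _ in range(n_bits):
        # "Adjusted" covariance matrix with the pseudo labels so far.
        M = pseudo_term + eta * cov_all
        _, vecs = np.linalg.eigh(M)
        w = vecs[:, -1]
        W.append(w)
        # Points closest to this bit's hyperplane were assigned least confidently.
        proj = w @ X
        near = np.argsort(np.abs(proj))[:n_boundary]
        left, right = near[proj[near] < 0], near[proj[near] >= 0]
        # Assumed pseudo labels: near-boundary points split by the bit are
        # marked as neighbors (S_ij = 1).  For this rank-one label pattern,
        # X S X^T collapses to u v^T + v u^T using the per-side sums u and v.
        u = X[:, left].sum(axis=1, keepdims=True)
        v = X[:, right].sum(axis=1, keepdims=True)
        pseudo_term += u @ v.T + v @ u.T
    return np.array(W)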

Conclusion
- A semi-supervised paradigm for hash learning: empirical fitness with an information-theoretic regularizer
- Sequential learning idea for error correction
- Extension to the unsupervised case
- Easy implementation and highly scalable

Future work
- Theoretical analysis of performance guarantees
- Weighted Hamming embedding

© July 2010