![Page 1: An Approximate Nearest Neighbor Retrieval Scheme for Computationally Intensive Distance Measures Pratyush Bhatt MS by Research(CVIT)](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/1.jpg)
An Approximate Nearest Neighbor Retrieval Scheme for
Computationally Intensive Distance Measures
Pratyush Bhatt, MS by Research (CVIT)
![Page 2](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/2.jpg)
Nearest Neighbor Retrieval
• Representation of an object
  – Fixed length
  – Variable length
• (Dis)similarity function (distance measure)
• Neighborhood
• The nearest neighbor retrieval problem can then be formalized as retrieving the objects most similar to a given query object, where similarity is measured by a given similarity function.
![Page 3](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/3.jpg)
Need for NN retrieval schemes
• If the search space is small, sequential search can be used to obtain exact results.
• As memory capacity has grown, so has the volume of data stored online.
  – Sequential search is time consuming
  – Data must be indexed for fast retrieval
  – This led to the development of NN search algorithms
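The formalization above, together with the sequential-search baseline, can be sketched as follows. This is a minimal illustration, assuming objects sit in a Python sequence and the (dis)similarity function is passed in as `dist`; the name `knn_sequential` is hypothetical, not from the slides. Each query costs one `dist` call per stored object, which is what becomes prohibitive when `dist` is expensive.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knn_sequential(
    query: T,
    data: Sequence[T],
    dist: Callable[[T, T], float],
    k: int,
) -> List[Tuple[float, int]]:
    """Exact k-NN by sequential scan: calls dist() once per stored object.

    Returns (distance, index) pairs sorted by increasing distance.
    """
    # heapq.nsmallest keeps only the k best (distance, index) pairs while scanning;
    # the index breaks ties between equal distances.
    return heapq.nsmallest(k, ((dist(query, x), i) for i, x in enumerate(data)))


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
    l1 = lambda a, b: sum(abs(u - v) for u, v in zip(a, b))
    print(knn_sequential((0.4, 0.1), points, l1, k=2))
```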
![Page 4](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/4.jpg)
Computationally Expensive Distance Measures
• Visually, X1 is more similar to X2 than to X3
• Yet the L1 distance between X1 and X2 is larger than that between X1 and X3
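The slide's figure of X1, X2 and X3 is not recoverable here, but the effect can be sketched with a hypothetical 1-D toy example: a peak shifted by one step has visually the same shape, yet a point-wise L1 distance prefers a flat sequence. Dynamic time warping (DTW), the measure used later for the UNIPEN experiments, aligns the shifted peak at zero cost. Note its full dynamic-programming table, which is the kind of super-linear cost discussed on the next slide. The sequences and function names are illustrative only.

```python
from typing import List, Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Point-wise L1 distance; assumes equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    D: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


x1 = [0, 0, 1, 0, 0]  # a single peak
x2 = [0, 1, 0, 0, 0]  # the same peak, shifted by one step
x3 = [0, 0, 0, 0, 0]  # no peak at all

print(l1(x1, x2), l1(x1, x3))    # 2 1: L1 ranks x3 closer to x1
print(dtw(x1, x2), dtw(x1, x3))  # 0.0 1.0: DTW ranks x2 closer to x1
```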
![Page 5](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/5.jpg)
• Time complexity is super-linear in the length of the input.
• Edit distance: O(d²), where d is the input length
• Chamfer distance
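To make the cost concrete, here is a sketch of the two measures named above, assuming plain strings for edit distance and 2-D point sets for chamfer distance. The edit-distance table has (d+1)×(d+1) cells for two strings of length d, hence O(d²). Chamfer distance has several variants in the literature; the symmetric mean-nearest-point form used here is an assumption, and this naive version costs O(|A|·|B|) point distances.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard O(len(s) * len(t)) DP table."""
    prev = list(range(len(t) + 1))  # row for the empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


def chamfer(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Symmetric chamfer distance between two point sets (naive O(|A| * |B|)).

    For each point, take the distance to its nearest point in the other set,
    average over the set, and add both directions.
    """
    def one_way(P: Sequence[Point], Q: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in Q) for p in P) / len(P)

    return one_way(A, B) + one_way(B, A)


print(edit_distance("kitten", "sitting"))  # 3
```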
![Page 6](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/6.jpg)
Nearest Neighbor Classification
• Computationally expensive distance functions
  – Nearest neighbor classifiers are often impractical for real applications
  – Classifying a single object can take over 20 minutes on a modern PC, even with an optimized C++ implementation
  – More training data gives better accuracy, but at the cost of higher computation time
![Page 7](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/7.jpg)
Motivation (Approx. NN)
• Why compute similarity to all samples when the decision is based only on the top K?
• How can the top K be found without computing similarity to all samples?
• Solution: compute K1 > K approximate NN and find the best K in that list.
• Compute the K1 approximate NN using fewer explicit matches.
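The filter-and-refine idea can be sketched as follows, assuming some inexpensive proxy `cheap` picks the K1 candidates (in HLM that role is played by the approximate NN search; here it is only a stand-in) and `exact` is the expensive measure. Only K1 exact distances are computed per query. Function and parameter names are illustrative, not from the thesis.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def filter_and_refine(
    query: T,
    data: Sequence[T],
    cheap: Callable[[T, T], float],
    exact: Callable[[T, T], float],
    k: int,
    k1: int,
) -> List[Tuple[float, int]]:
    """Return the best k of k1 approximate candidates (k1 >= k).

    Filter: rank all objects with the cheap proxy and keep k1 candidates.
    Refine: re-rank only those k1 candidates with the expensive distance.
    """
    assert k1 >= k
    candidates = heapq.nsmallest(k1, range(len(data)), key=lambda i: cheap(query, data[i]))
    return heapq.nsmallest(k, ((exact(query, data[i]), i) for i in candidates))
```

Choosing K1 trades accuracy for cost: a larger K1 makes it more likely that the true top K survive the filter, at the price of more exact calls.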
![Page 8](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/8.jpg)
Problem Statement
• Improve nearest neighbor retrieval and classification performance in spaces with computationally expensive distance measures.
• Generate an expansible approximate nearest neighbor list for a given query.
• Handle points in non-metric spaces.
![Page 9](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/9.jpg)
Metric Space
![Page 10](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/10.jpg)
Metric Space
• Tree-based (KD-tree, R-tree)
• Hashing-based
![Page 11](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/11.jpg)
KD Tree for Metric Space
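A textbook KD-tree, sketched here for Euclidean points (it is not taken from the slides), shows why tree-based indexes are tied to metric spaces: the search skips the far subtree only when the splitting plane is farther away than the current best match, a bound that holds because Euclidean distance is a metric. The names `build` and `nearest` are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int                      # coordinate used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split on the median along axis = depth mod dimension."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def nearest(node: Optional[Node], q: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the Euclidean nearest neighbor of q."""
    if node is None:
        return best
    d = math.dist(q, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, q, best)
    return best
```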
![Page 12](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/12.jpg)
Non-Metric Space
![Page 13](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/13.jpg)
Wrist Rotation
![Page 14](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/14.jpg)
![Page 15](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/15.jpg)
Applications
• Classification based on K-NN
  – K is determined empirically
• Identification
  – Stop when similarity exceeds a fixed threshold
• Retrieval applications
  – Optimizing network usage in peer-to-peer computer networks
  – Content-based retrieval systems
• The concept of similarity is abstract
  – Need to generate an expansible list
  – Learn from user feedback
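The usage patterns above can be sketched as consumers of one lazily expanded candidate list. Assumptions: `order` stands in for whatever approximate-NN ranking produces the list (hypothetical), `exact` is the expensive measure, and for identification "similarity above a threshold" is expressed as distance at or below a threshold. Each exact distance is computed only when the consumer asks for the next candidate.

```python
from collections import Counter
from itertools import islice
from typing import Callable, Iterator, Optional, Sequence, Tuple

Item = Tuple[float, int]  # (exact distance, index into data)


def expansible_list(query, data: Sequence, exact: Callable, order: Sequence[int]) -> Iterator[Item]:
    """Yield candidates lazily in a given (hypothetical) approximate-NN order."""
    for i in order:
        yield exact(query, data[i]), i


def identify(stream: Iterator[Item], threshold: float) -> Optional[int]:
    """Identification: stop at the first candidate within the distance threshold."""
    for d, i in stream:
        if d <= threshold:
            return i
    return None


def knn_classify(stream: Iterator[Item], labels: Sequence[str], k: int, k1: int) -> str:
    """K-NN: take k1 >= k approximate candidates, keep the best k, majority vote."""
    top = sorted(islice(stream, k1))[:k]
    return Counter(labels[i] for _, i in top).most_common(1)[0][0]
```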
![Page 16](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/16.jpg)
Challenges
• Accuracy depends on the similarity function used to compute the approximate NN
• The list cannot be precomputed; it must be generated on the fly
• Should be incremental to support scalability
• Non-metric space
  – Prohibits use of the triangle inequality
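Why non-metric spaces rule out triangle-inequality pruning can be checked directly with DTW. The sketch below uses toy sequences chosen for illustration. It shows that DTW violates d(x, z) ≤ d(x, y) + d(y, z), and that the derived lower bound |d(q, p) - d(p, x)| ≤ d(q, x), which metric indexes use to discard x without computing d(q, x), overshoots the true distance, so pruning with it could discard a genuine neighbor.

```python
from typing import Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping with |x - y| local cost (O(len(a) * len(b)))."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[-1][-1]


def lower_bound(d_qp: float, d_px: float) -> float:
    """|d(q,p) - d(p,x)|: a valid lower bound on d(q,x) only in a metric space."""
    return abs(d_qp - d_px)


x, y, z = [0], [1, 2], [2, 2, 2]
print(dtw(x, z), dtw(x, y) + dtw(y, z))  # 6.0 4.0: triangle inequality violated

q, p, target = [0], [2, 2, 2], [1, 2]
print(lower_bound(dtw(q, p), dtw(p, target)), dtw(q, target))  # 5.0 3.0: bound overshoots
```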
![Page 17](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/17.jpg)
Manifold Theory
• Objects: $O = \{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$
• Embedding: $E = \{y_1, y_2, \ldots, y_n\}$, $y_i \in \mathbb{R}^p$ $(p \le d)$
![Page 18](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/18.jpg)
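• Run MDS on the $n \times n$ similarity (squared distance) matrix $\Delta_n$ to get the embedding $E$
• The embedding of a new sample $a$ is given by

$$\vec{y}_a = -\frac{1}{2} L_k^{\#} \left( \vec{\delta}_a - \vec{\delta}_\mu \right), \qquad L_k^{\#} = \begin{bmatrix} v_1^T / \sqrt{\lambda_1} \\ v_2^T / \sqrt{\lambda_2} \\ \vdots \\ v_k^T / \sqrt{\lambda_k} \end{bmatrix}$$

  – $\vec{\delta}_\mu$ is the mean of the columns of the squared distance matrix $\Delta_n$
  – $\vec{\delta}_a \in \mathbb{R}^n$ holds the squared distances from $a$ to the $n$ points, so it requires computing the similarity of the new sample to those points
  – $(\lambda_i, v_i)$ are the top $k$ eigenvalue/eigenvector pairs from MDS (here $k = p$, the embedding dimension)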
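The out-of-sample formula above can be checked numerically. This sketch reads the slide as classical MDS followed by the landmark-MDS triangulation step; `classical_mds` and `embed_new` are hypothetical names, and NumPy is assumed. For points that truly live in a Euclidean space, the new sample's embedded distances to the n points match its real distances.

```python
import numpy as np


def classical_mds(D2: np.ndarray, k: int):
    """Classical MDS on an n x n matrix of *squared* distances.

    Returns the n x k embedding Y (one row per point) and L_k^# (k x n).
    """
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D2 @ J                        # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]          # top-k eigenpairs
    lam, V = evals[order], evecs[:, order]
    keep = lam > 1e-10 * np.abs(evals).max()     # drop (near-)non-positive eigenvalues
    lam, V = lam[keep], V[:, keep]
    Y = V * np.sqrt(lam)                         # coordinates sqrt(lambda_i) * v_i
    L_pinv = (V / np.sqrt(lam)).T                # rows v_i^T / sqrt(lambda_i)
    return Y, L_pinv


def embed_new(L_pinv: np.ndarray, D2: np.ndarray, delta_a: np.ndarray) -> np.ndarray:
    """y_a = -1/2 L_k^# (delta_a - delta_mu); delta_a needs n exact distances from a."""
    delta_mu = D2.mean(axis=1)                   # mean of the columns of D2
    return -0.5 * L_pinv @ (delta_a - delta_mu)
```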
![Page 19](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/19.jpg)
Related Work
• FastMap, random reference objects, random line projections, VP-trees
  – Find an embedding of the query by computing only a few exact distances
  – Assumption: the triangle inequality holds
• BoostMap: uses AdaBoost to combine many simple 1-D embeddings
Figures: FastMap, Lipschitz Embedding
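Two of the embeddings named above can be sketched in a few lines, assuming a generic `dist` callable. A Lipschitz embedding maps x to its distances to reference sets A_i; a FastMap coordinate projects x onto the line through two pivot objects using the cosine law. Both embed a query with only a handful of exact distances. FastMap's projection is only geometrically exact for Euclidean distances, which is where the metric assumption enters. Pivot and reference-set selection, normally the interesting part, is left out, and the function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def lipschitz_embed(x: T, ref_sets: Sequence[Sequence[T]],
                    dist: Callable[[T, T], float]) -> List[float]:
    """Coordinate i is d(x, A_i) = min over a in A_i of dist(x, a)."""
    return [min(dist(x, a) for a in A) for A in ref_sets]


def fastmap_coord(x: T, a: T, b: T, dist: Callable[[T, T], float]) -> float:
    """FastMap projection of x onto the line through pivots a and b (cosine law).

    Needs dist(a, x) and dist(b, x) per query; dist(a, b) could be precomputed.
    """
    d_ab = dist(a, b)
    return (dist(a, x) ** 2 + d_ab ** 2 - dist(b, x) ** 2) / (2 * d_ab)


# The query (1, 3) projects to about 1.0 on the x-axis spanned by pivots (0, 0) and (4, 0).
print(fastmap_coord((1.0, 3.0), (0.0, 0.0), (4.0, 0.0), math.dist))
```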
![Page 20](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/20.jpg)
Problem Formulation
• Goal: compute the approximate NN of a query point q from a set S of N points, according to a similarity function F.
• Solution:
  – Split the data into a multi-level hierarchy
  – Exploit the local similarity property to direct the search from top to bottom
![Page 21](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/21.jpg)
Hierarchical Local Maps (HLM)
![Page 22](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/22.jpg)
1. Find the similarity of the query with the points at the topmost level
2. Identify the nearest neighbors
3. Get the children of the nearest neighbors
   How can the similarity of the query with these samples be found without explicitly computing it?
4. Project the points onto the manifold
5. Project the query onto the manifold
6. Find the nearest neighbors of the query in this metric space
7. Identify these points in the tree
8. Traverse down in the same fashion
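The eight steps can be sketched as a two-level search. This is a simplified illustration, not the thesis implementation: the hierarchy has only one level of children, top-level points are sampled at random, the first few children of each node serve as landmarks for its local map (landmark MDS, as on the Manifold Theory slide), and all names and defaults (`build_index`, `search`, `n_branch`, `k1`) are hypothetical. At query time, exact distances are limited to the top-level points, the landmarks of the expanded branches, and the final k1 candidates.

```python
import numpy as np


def local_map(D2_land):
    """Landmark MDS on the landmarks' squared distances; returns (L_k^#, delta_mu)."""
    m = D2_land.shape[0]
    J = np.eye(m) - 1.0 / m                      # centering matrix I - 11^T/m
    evals, evecs = np.linalg.eigh(-0.5 * J @ D2_land @ J)
    keep = evals > 1e-10 * max(np.abs(evals).max(), 1e-12)
    L_pinv = (evecs[:, keep] / np.sqrt(evals[keep])).T
    return L_pinv, D2_land.mean(axis=1)


def triangulate(L_pinv, delta_mu, d_to_landmarks):
    """y = -1/2 L_k^# (delta - delta_mu) from exact distances to the landmarks."""
    return -0.5 * L_pinv @ (np.asarray(d_to_landmarks) ** 2 - delta_mu)


def build_index(data, dist, n_clusters=10, n_landmarks=5, seed=0):
    """Offline: top-level points, their children, and one local map per node."""
    rng = np.random.default_rng(seed)
    tops = [int(t) for t in rng.choice(len(data), n_clusters, replace=False)]
    children = {t: [] for t in tops}
    for i, x in enumerate(data):
        children[min(tops, key=lambda t: dist(x, data[t]))].append(i)
    maps = {}
    for t, members in children.items():
        land = members[:n_landmarks]             # hypothetical landmark choice
        D2 = np.array([[dist(data[a], data[b]) ** 2 for b in land] for a in land])
        L_pinv, mu = local_map(D2)
        # Step 4, done offline: every child gets coordinates in its parent's local map.
        coords = {i: triangulate(L_pinv, mu, [dist(data[i], data[l]) for l in land])
                  for i in members}
        maps[t] = (land, L_pinv, mu, coords)
    return tops, maps


def search(index, data, q, dist, k=3, n_branch=2, k1=10):
    """Online: exact distances only to top-level points, landmarks and k1 finalists."""
    tops, maps = index
    # Steps 1-3: exact matches at the top level, then expand the best n_branch nodes.
    best_tops = sorted(tops, key=lambda t: dist(q, data[t]))[:n_branch]
    ranked = []
    for t in best_tops:
        land, L_pinv, mu, coords = maps[t]
        # Step 5: project the query onto this node's local map.
        y_q = triangulate(L_pinv, mu, [dist(q, data[l]) for l in land])
        # Step 6: rank children by Euclidean distance in the map (no exact calls);
        # distances from different maps are compared directly, a simplification.
        ranked += [(float(np.linalg.norm(y - y_q)), i) for i, y in coords.items()]
    # Steps 7-8 would recurse into deeper levels; with two levels, refine instead.
    finalists = [i for _, i in sorted(ranked)[:k1]]
    return sorted((dist(q, data[i]), i) for i in finalists)[:k]
```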
![Page 23](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/23.jpg)
Results
• UNIPEN handwriting database
  – 15953 digit examples
  – Divided into training and testing sets in a 2:1 ratio
  – Distance measure: DTW
![Page 24](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/24.jpg)
Number of DTW Computations for K nearest neighbor retrieval
![Page 25](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/25.jpg)
Number of DTW Computations for K nearest neighbor retrieval
![Page 26](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/26.jpg)
Number of DTW Computations for K nearest neighbor retrieval
![Page 27](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/27.jpg)
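Classification accuracy (%) on the UNIPEN dataset using exact and approximate k-NN

| K | 1 | 5 | 10 | 15 | 20 |
|---|---|---|---|---|---|
| DTW (exact k-NN) | 98.1 | 97.73 | 97.41 | 96.99 | 96.83 |
| HLM (approximate k-NN) | 97.65 | 97.27 | 97.08 | 96.69 | 96.44 |
| Difference (percentage points) | 0.45 | 0.46 | 0.33 | 0.30 | 0.39 |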
• Average number of DTW computations done by HLM: 160
• Expected number of DTW computations done by brute force: 5315
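Taken at face value, the two averages above correspond to the following savings:

$$\frac{5315}{160} \approx 33.2 \times \text{ fewer DTW computations}, \qquad \frac{160}{5315} \approx 3.0\% \text{ of the brute-force work}$$

This comes at a cost of 0.30 to 0.46 percentage points of classification accuracy in the table above.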
![Page 28](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/28.jpg)
Biometric (Special Case)
• High Inter-Class Variation
• Low Intra-Class Variation
• Low variation in inter-class distances
• Indexing for identification
• How can HLM be applied in such cases?
  – The local similarity structure becomes degenerate, destroying any manifold structure
![Page 29](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/29.jpg)
Biometric Data
![Page 30](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/30.jpg)
High-dimensional data
Relative contrast:
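The slide's relative-contrast formula did not survive extraction. Assuming the common definition (D_max - D_min) / D_min, that is, the query's distance to its farthest point minus that to its nearest, divided by the nearest, the sketch below shows how contrast collapses as dimension grows for uniformly random data. The function name and data are illustrative.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(D_max - D_min) / D_min for one random query against uniform random points."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(dim)]
    dists = [math.dist(q, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))  # contrast shrinks as d grows
```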
![Page 31](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/31.jpg)
Iris
• Feature vector length: 1000 bits
• Same class: fewer than 100 bits differ
• Different classes: each bit is equally likely to match or not match
  – On average, 500 bits differ
• Impostor scores (normalized Hamming distance) are therefore around 0.5
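The numbers above follow from a quick simulation, assuming 1000-bit codes, a genuine sample with 60 flipped bits (any count below 100 behaves alike), and impostor codes drawn with independent fair bits. Impostor scores cluster tightly around 0.5: their standard deviation is about sqrt(0.25/1000) ≈ 0.016.

```python
import random


def hamming_score(a: int, b: int, n_bits: int) -> float:
    """Normalized Hamming distance between two n-bit codes stored as ints."""
    return bin(a ^ b).count("1") / n_bits


N_BITS = 1000
rng = random.Random(0)
enrolled = rng.getrandbits(N_BITS)

# Genuine sample: the same code with 60 distinct bits flipped (fewer than 100).
genuine = enrolled
for pos in rng.sample(range(N_BITS), 60):
    genuine ^= 1 << pos

# Impostors: independent codes, so each bit matches with probability 1/2.
impostor_scores = [hamming_score(enrolled, rng.getrandbits(N_BITS), N_BITS) for _ in range(500)]

print(hamming_score(enrolled, genuine, N_BITS))      # 0.06
print(sum(impostor_scores) / len(impostor_scores))   # close to 0.5
print(min(impostor_scores), max(impostor_scores))    # a narrow band around 0.5
```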
![Page 32](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/32.jpg)
Biometric Data
• Such a distribution is bad for indexing: impostor distances fall in a narrow band, so they give little information to guide the search
![Page 33](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/33.jpg)
Softness/Hardness
• Measure of overlap between genuine and impostor classes
• Soft biometric (face, body silhouette)
  – Poor classification accuracy
  – Better indexing
  – Corresponds to a multi-dimensional point
• Hard biometric (iris, fingerprint)
  – Good classification accuracy
  – Poor indexing
  – Corresponds to a high-dimensional point
• Need a balance between the two
![Page 34](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/34.jpg)
Dataset
• CASIA Iris Image Database V3.0
  – 855 images corresponding to 285 users, split into training and testing sets
  – 3 samples per eye
• Similarity function
  – Hamming distance
  – Euclidean distance (softer metric)
    • The average gray value of each block gives a 160-D feature vector
• Penetration rate
  – Percentage of the data on which the biometric matcher is run
• False reject rate
  – Percentage of identification instances in which a false rejection occurs
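The two evaluation measures can be sketched as below. Penetration rate follows the slide's definition directly; for the false reject rate it is assumed that, in an indexing setting, a false rejection means the query's true identity is missing from the candidate list passed to the matcher. Function names and the toy numbers in any usage are hypothetical.

```python
from typing import Sequence


def penetration_rate(n_matched: Sequence[int], db_size: int) -> float:
    """Average percentage of the database passed to the biometric matcher per query."""
    return 100.0 * sum(n_matched) / (len(n_matched) * db_size)


def false_reject_rate(candidate_lists: Sequence[Sequence[str]], true_ids: Sequence[str]) -> float:
    """Percentage of identification attempts whose true identity is missing
    from the retrieved candidate list (assumed meaning of a false rejection)."""
    misses = sum(tid not in cands for cands, tid in zip(candidate_lists, true_ids))
    return 100.0 * misses / len(true_ids)
```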
![Page 35](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/35.jpg)
HLM on CASIA Iris Dataset
![Page 36](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/36.jpg)
Synthetic Dataset
• Each class-center coordinate is sampled from a 1-D Gaussian N(0, 1)
• A d-dimensional center is generated by sampling d times
• Points of the same class are sampled from a Gaussian with the class center as mean and varying variance
• Total number of classes: 500
• Points per class
  – Training: 10
  – Testing: 5
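A generator for this synthetic setup, assuming isotropic within-class noise with a single standard deviation `within_std` (the slide only says the variance is varied) and NumPy. Because every center coordinate has unit variance, the within-class to between-class variance ratio studied in the next experiment is simply `within_std**2` under these assumptions. The function name is hypothetical.

```python
import numpy as np


def make_synthetic(d, within_std, n_classes=500, n_train=10, n_test=5, seed=0):
    """Synthetic classes: centers ~ N(0, I_d), members ~ N(center, within_std^2 I_d)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 1.0, size=(n_classes, d))  # d independent 1-D draws each

    def sample(per_class):
        X = np.repeat(centers, per_class, axis=0)         # each center repeated per_class times
        X = X + rng.normal(0.0, within_std, size=X.shape)
        y = np.repeat(np.arange(n_classes), per_class)
        return X, y

    return sample(n_train), sample(n_test)


(X_train, y_train), (X_test, y_test) = make_synthetic(d=16, within_std=0.3)
print(X_train.shape, X_test.shape, "variance ratio:", 0.3 ** 2)
```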
![Page 37](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/37.jpg)
Indexing performance with varying number of dimensions
![Page 38](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/38.jpg)
Indexing performance with varying within-class to between-class variance ratio
![Page 39](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/39.jpg)
Contributions
• A representation scheme for objects in a dataset that allows fast retrieval of approximate nearest neighbors in non-Euclidean spaces.
• A search mechanism, combined with a filter-and-refine approach, that minimizes the number of exact distance computations for computationally expensive distance measures.
• A study of the scheme's performance on biometric data and of the parameters that affect it.
![Page 40](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/40.jpg)
Conclusion and Future Work
• HLM exploits the local similarity property well
• Incremental and scalable
• A softer biometric in the filtering step combined with a hard biometric in the refine step would drastically reduce computation time
• Optimal construction of HLM
• Defining a measure for similarity functions that allow a hierarchical representation
• Learning a function to estimate the degree of indexability
  – Extract parameters from the data distribution and the similarity function
![Page 41](https://reader035.vdocument.in/reader035/viewer/2022062315/5697bf8b1a28abf838c8b01b/html5/thumbnails/41.jpg)
Thank You
Related publication:
• Pratyush Bhatt and Anoop Namboodiri, "Hierarchical Local Maps for Robust Approximate Nearest Neighbour Computation," Proceedings of the 7th International Conference on Advances in Pattern Recognition (ICAPR 2009), Feb. 4-6, 2009, Kolkata, India.