Post on 20-Dec-2015
Indexing and Binning Large Databases
Abstract
Problems with large databases
• Biometric identification (1:N matching) does not scale well with database size
• There is no established way to organize high-dimensional biometric data
Proposed solution
• Reduce the search space before 1:N matching
• Divide the database using clustering techniques
Contributions
• We analyze the effect of implementing a binning scheme on search performance and accuracy
• We present binning and pruning approaches using multiple biometrics
• Using hand geometry and signature, we have achieved a search space reduction of 95% with an FRR of 0%
Background
• Only biometric identification (1:N matching) can prevent duplicate enrollments and double dipping
• Biometrics are being deployed for immigration and national ID applications
   • US-VISIT program
   • Voter ID and national ID programs [3]
• Potential database sizes can run into the millions
• Current research is focused only on accuracy
• Apart from accuracy, scalability, speed, and efficiency also become important at this scale
Challenges
Textual/Numeric Data
• Data is scalar (1D)
• Textual/numeric data can be linearly ordered and therefore easily indexed
Biometric Data
• Biometric templates are high dimensional
• No linear ordering or sorting method exists for biometric data
Search space analysis
• As the number of stored templates increases, template density (TD) also increases
• $N_C$: cluster count
• $D_{IC}$: average intra-cluster distance
• $TD \propto N_C \cdot \frac{1}{D_{IC}}$
Identification problem
• The number of false positives grows quadratically (as $N^2$) with the size of the database
• Let FAR and FRR be the False Acceptance Rate (probability) and False Reject Rate (probability) for 1:1 matching
• For 1:N matching,
  $FAR_N = 1 - (1 - FAR)^N \approx N \cdot FAR$
  $FRR_N = FRR$
• The total number of false accepts is given by
  $\text{False accepts} = N \cdot FAR_N = N \, (1 - (1 - FAR)^N) \approx N^2 \cdot FAR$
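The 1:N error formulas on this slide can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names and the FAR value of 1e-6 are illustrative assumptions.

```python
def far_identification(far: float, n: int) -> float:
    """Probability of at least one false accept when one probe is
    compared against n stored templates: 1 - (1 - FAR)^N."""
    return 1.0 - (1.0 - far) ** n


def expected_false_accepts(far: float, n: int) -> float:
    """Expected false accepts when each of the n enrolled users is
    identified once against the whole database: N * FAR_N."""
    return n * far_identification(far, n)


far = 1e-6  # assumed 1:1 false accept rate, for illustration only
for n in (1_000, 100_000, 1_000_000):
    exact = far_identification(far, n)
    approx = n * far  # first-order approximation N * FAR
    print(f"N={n:>9}: FAR_N={exact:.4f} (approx {approx:.4f}), "
          f"false accepts={expected_false_accepts(far, n):.1f}")
```

The approximation $N \cdot FAR$ is only accurate while $N \cdot FAR$ is small; for larger databases it overestimates the exact value, which saturates at 1.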
State of the Art
| Biometrics | State of the art | Research Problems |
|---|---|---|
| Fingerprint | 0.15% FRR at 1% FAR (FVC 2002) | Fingerprint enhancement; partial fingerprint matching |
| Face Recognition | 10% FRR at 1% FAR (FRVT 2002) | Improving accuracy; face alignment variation; handling lighting variations |
| Hand Geometry | 4% FRR at 0% FAR (Transportation Security Administration tests) | Developing reliable models; identification problem |
| Signature Verification | 1.5% (IBM Israel) | Developing offline verification systems; handling skillful forgeries |
| Voice Verification | <1% FRR (current research) | Handling channel normalization; user habituation; text and language independence |
State of the Art
| Biometrics | State of the art | Research Problems |
|---|---|---|
| Fingerprint | 0.15% FRR at 1% FAR (FVC 2002) | Fingerprint enhancement; partial fingerprint matching |
| Face Recognition | 10% FRR at 1% FAR (FRVT 2002) | Improving accuracy; face alignment variation; handling lighting variations |
| Hand Geometry | 2.6% FRR at 0.02% FAR (CUBS, SUNY-Buffalo) | Developing reliable models; identification problem |
| Signature Verification | 1.5% (IBM Israel) | Developing offline verification systems; handling skillful forgeries |
| Voice Verification | <1% FRR (current research) | Handling channel normalization; user habituation; text and language independence |
Identification problem (contd.)
• Even if FAR = 0.0001%, the identification false accept rate is about 1 in 10 for N = 100,000 (lower bound)
• No single biometric is capable of meeting this security requirement individually
• Ways to reduce identification errors:
   • Reduce FAR
      • FAR is limited by the feature representation and the recognition algorithm
      • Cannot be reduced indefinitely
   • Reduce N
      • Classify or index the biometric database (e.g., the Henry classification system for fingerprints)
      • Index the records based on metadata
• Can we do better?
Fingerprint Features
Fingerprints can be classified based on the ridge flow pattern
Fingerprints can be distinguished based on the ridge characteristics
65% of fingerprints belong to the Loop class
Henry Classification of Fingerprints
• [Ratha et al., 1996] used Henry classification on a database of 1800 templates, tested on 100 templates. Search space: 25%; FRR: 10%
• [Jain, Pankanti, 2000] ran a similar experiment on a database of 700 templates and achieved FRR: 7.4% (focus on classification only)
• The state-of-the-art fingerprint classification system [Cappelli, Maio, Maltoni, Nanni, 2003] has an FRR of 4.8% for the 5-class problem and 3.7% for the 4-class problem
• Although natural classes exist, classification is still non-trivial
• Natural classes do not exist for biometrics like hand geometry
• More sophisticated methods are needed to partition the database
Analysis of search space reduction
• We can improve performance by reducing the search space during identification
• Let $P_{SYS}$ be the penetration rate (between 0.0 and 1.0): the average fraction of the database searched during identification
• Effective size = $N \cdot P_{SYS}$
• For 1:N matching,
  $FAR_{N P_{SYS}} = 1 - (1 - FAR)^{N P_{SYS}} \approx N \cdot P_{SYS} \cdot FAR$
  $FRR_{N P_{SYS}} = FRR$
• The total number of false accepts is given by
  $\text{False accepts} = N \cdot FAR_{(N P_{SYS})} \approx P_{SYS} \cdot N^2 \cdot FAR$
• State-of-the-art fingerprint systems have $P_{SYS} = 0.5$
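To see how the penetration rate scales the error count, the sketch below evaluates the false-accept expression with an effective database size of N times PSYS. It is illustrative only: the function name and the FAR and N values are assumptions, not figures from the slides.

```python
def false_accepts_with_binning(far: float, n: int, p_sys: float) -> float:
    """Expected false accepts when each of n users is identified against
    only a fraction p_sys of the database (effective size n * p_sys)."""
    effective_size = n * p_sys
    far_effective = 1.0 - (1.0 - far) ** effective_size
    return n * far_effective


far = 1e-8       # assumed 1:1 false accept rate
n = 100_000      # assumed database size
for p_sys in (1.0, 0.75, 0.5, 0.3, 0.1):
    fa = false_accepts_with_binning(far, n, p_sys)
    print(f"P_SYS={p_sys:4.2f}: expected false accepts = {fa:.2f}")
```

While N times PSYS times FAR stays small, the count is close to linear in the penetration rate, so halving the searched fraction roughly halves the false accepts.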
Effect of binning on accuracy
• For $P_{SYS}$ < 0.2, the number of false accepts is almost constant
• Query response time improves by a factor of $P_{SYS}$
• Capabilities of a low-FAR system
   • Will allow us to screen immigrants at airports
   • Will make biometric systems more user-friendly by eliminating the need to remember PINs and IDs
[Figure: false accepts (0 to 10 x 10^7) versus N (0 to 10 x 10^5), one curve for each penetration rate 1.0, 0.75, 0.5, 0.3, and 0.1]
Binning
• Binning can be used to achieve a smaller $P_{SYS}$
• Partition the feature space
• Each bin is represented by a cluster center $C_K$
• Records are compared with only $N_B$ cluster centers
• Bin representatives are computed offline during training
• Challenges
   • How to handle clustering of large databases?
   • How to handle additions and deletions?
Tradeoff
• Although binning reduces the search space, it introduces another source of identification error: the bin miss
• If the bin containing the user's record is not searched, a false reject occurs no matter how good the matcher is
• Let P(B) be the probability of searching the correct bin; a bin miss then occurs with probability 1 - P(B)
• Binning increases the probability of false rejects
   • Not tolerable in security and screening applications
• Solution:
   • Use K-means clustering to find K bins
   • Check the $N_S$ nearest bins for the record, such that P(B) = 1
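The proposed solution can be sketched as follows: cluster the enrolled templates with K-means, then find the smallest number of nearest bins that must be searched so that no query misses its true bin. This is a minimal illustration on synthetic data, not the authors' implementation; the function names, the random 5-D vectors, and the noise level are all assumptions.

```python
import numpy as np


def assign(data, centers):
    """Index of the nearest center for every row of data."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(data, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(data, centers)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a bin goes empty
                centers[j] = members.mean(axis=0)
    return centers, assign(data, centers)


def nearest_bins(query, centers, n_s):
    """Indices of the n_s bins whose centers are closest to the query."""
    return np.argsort(np.linalg.norm(centers - query, axis=1))[:n_s]


def min_bins_for_no_miss(queries, true_bins, centers):
    """Smallest N_S for which every query's true bin is searched (P(B) = 1)."""
    worst = 1
    for q, b in zip(queries, true_bins):
        order = nearest_bins(q, centers, len(centers))
        worst = max(worst, int(np.where(order == b)[0][0]) + 1)
    return worst


# Synthetic stand-in data: 200 enrolled templates and one noisy query per user.
rng = np.random.default_rng(1)
enrolled = rng.normal(size=(200, 5))
queries = enrolled + rng.normal(scale=0.3, size=enrolled.shape)

centers, enrolled_bins = kmeans(enrolled, k=6)
n_s = min_bins_for_no_miss(queries, enrolled_bins, centers)
print("bins to search for zero bin misses on this data:", n_s)
```

Choosing the number of searched bins this way trades penetration rate for a zero bin-miss rate on the evaluation data; a value found on one test set does not guarantee P(B) = 1 on unseen queries.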
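Formal definition of Binning
• In general, a biometric template may be represented as a vector
  $x_i = [x_{i1}, x_{i2}, x_{i3}, x_{i4}, \dots, x_{ik}] \in \mathbb{R}^k$
• Vectors are grouped into N distinct clusters, each represented by a "code book vector"
  $[Y_1, Y_2, Y_3, Y_4, \dots, Y_N], \quad Y_j \in \mathbb{R}^k$
• The code book vectors divide the feature space into N distinct Voronoi regions
  $V_i = \{\, x \in \mathbb{R}^k : \|x - Y_i\|^2 \le \|x - Y_j\|^2 \;\; \forall j \ne i \,\}$
  $\bigcup_i V_i = \mathbb{R}^k$ and $\bigcap_i V_i = \emptyset$
• Every template is closest to the mean (code book vector) of the region it belongs to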
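The Voronoi-region definition amounts to a nearest-code-book-vector lookup. The short sketch below illustrates it; the 2-D code book and the template are made-up values, and the function name is an assumption.

```python
import numpy as np


def voronoi_region(x, codebook):
    """Index i of the region V_i containing x, i.e. the code book
    vector Y_i that minimizes ||x - Y_i||^2."""
    sq_dists = np.sum((codebook - x) ** 2, axis=1)  # ||x - Y_j||^2 for every j
    return int(np.argmin(sq_dists))


codebook = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # hypothetical Y_1..Y_3
template = np.array([3.0, 1.0])                            # hypothetical x

region = voronoi_region(template, codebook)
print("template falls in region", region)
```

Points that are equidistant from two code book vectors lie on a region boundary; `argmin` resolves such ties by taking the lowest index.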
Search Space Partition: Voronoi Regions
Hand Geometry Template
Feature extraction stages
• Image capture
• Binarization
• Contour extraction
• Noise removal
35 features are extracted
• 25 directly measured features
• 10 ratio and perimeter features
Signature Template
11 features extracted
• Regression constants b0, b1
• Compactness
• Signature length
• Major stroke length
• Major stroke angle
• Connected components
• Hole count
• Hole area
• Stroke count
• Signing time
Results
Dataset: training set of 250, testing set of 250
Signature Binning
[Figure: penetration rate (0 to 80) versus number of bins (3 to 30)]
• 11-dimensional signature data
• Best penetration: 35.57% for 6 bins
• FRR = 0%
Hand Geometry Binning
[Figure: penetration rate (0 to 70) versus number of bins (3 to 30)]
• 35-dimensional hand geometry data
• Best penetration: 35.8% for 6 bins
• FRR = 0%
Penetration rate
$P_{SYS} = \frac{N_S \cdot T_B + N_B}{N_D}$
$N_D$: size of the database, $N_B$: number of bins, $N_S$: number of bins to search, $T_B$: average number of templates in a bin
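The penetration-rate formula on this slide counts the comparisons against all bin centers plus the templates in the searched bins, divided by the database size. The sketch below evaluates it; the function name is an assumption, and the number of searched bins used in the example (2) is a hypothetical value, since the slides do not report it.

```python
def penetration_rate(n_s: int, t_b: float, n_b: int, n_d: int) -> float:
    """P_SYS = (N_S * T_B + N_B) / N_D.

    n_s: number of bins searched, t_b: average templates per bin,
    n_b: number of bins, n_d: size of the database.
    """
    return (n_s * t_b + n_b) / n_d


n_d = 250            # database size, from the dataset line on the slide
n_b = 6              # number of bins
t_b = n_d / n_b      # average templates per bin if bins are evenly filled (assumption)
n_s = 2              # hypothetical number of bins searched

p_sys = penetration_rate(n_s, t_b, n_b, n_d)
print(f"penetration rate: {p_sys:.2%}")
```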
Multi-modal approach
• Resulting bins have very high template densities
• A different biometric modality should be used to classify templates within a bin
• Multimodal biometrics
   • Using multiple biometrics improves accuracy
   • It is difficult to forge multiple biometrics
   • Composite templates reduce template density
   • Statistical independence ensures that individual binning results are diverse
   • The search space (intersection of bins) is reduced due to low commonality between the individual binning results
Multi-Modal Approach
Search space: 5% of the original database size; FRR: 0%
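The reduction comes from intersecting the candidate sets produced by each modality's binning. The sketch below shows the idea on a toy database of 12 users; the bin contents, the searched bins, and the function names are all made up for illustration.

```python
def candidates(bins, searched):
    """Union of the user ids stored in the searched bins of one modality."""
    return set().union(*(bins[b] for b in searched))


def combined_search_space(hand_bins, hand_searched, sig_bins, sig_searched):
    """Users that survive both the hand-geometry and the signature binning."""
    return candidates(hand_bins, hand_searched) & candidates(sig_bins, sig_searched)


# Hypothetical bin contents (bin id -> set of user ids) for 12 enrolled users.
hand_bins = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}, 2: {8, 9, 10, 11}}
sig_bins = {0: {0, 4, 8}, 1: {1, 5, 9}, 2: {2, 6, 10}, 3: {3, 7, 11}}

survivors = combined_search_space(hand_bins, [0, 1], sig_bins, [1])
print("templates left to match:", sorted(survivors))
print("combined penetration:", len(survivors) / 12)
```

In this toy example the two binnings are independent by construction, so the intersection is much smaller than either candidate set alone; correlated modalities would give a smaller gain.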
Results of Combination
[Figure: penetration rate (0 to 45) versus number of bins (5 to 30) for signature, hand geometry, and their combination]
• Best combined penetration rate of 5%
$P_{SYS} = \frac{N_S \cdot T_B}{N_D}$
$N_D$: size of the database, $N_S$: number of bins to search, $T_B$: average number of templates in a bin
Dataset: training set of 250, testing set of 250
Binning v/s Indexing
• Applications can have frequent insertions of new templates
• Binning works well when the database is static
   • Insertions will require re-partitioning the entire database
• Indexing can be used in both static and dynamic database scenarios
• Trees are commonly used for indexing
• Extend the concept of indexing relational databases to indexing biometric databases
   • Much more challenging: no concept of a primary key exists in biometric templates!
Pyramid Technique: spatial hashing
• Determine the pyramid (i) within which the template lies
• Determine the height (h) of the template from the apex
• The 1-D value = pyramid number (i) + height (h)
• Indexing is done using B+ trees
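The mapping described above can be sketched as follows, assuming templates are normalized to the unit hypercube with the shared apex of all pyramids at its center (0.5, ..., 0.5). A sorted Python list stands in for the B+ tree, and the lookup only retrieves entries in the query's own pyramid, which is a simplification of a full range query. All names and sample vectors are illustrative.

```python
import bisect


def pyramid_value(point):
    """Map a point in [0, 1]^d to the 1-D value i + h."""
    d = len(point)
    # Dimension along which the point is farthest from the center.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # Pyramids 0..d-1 lie below the center, d..2d-1 above it.
    i = j_max if point[j_max] < 0.5 else j_max + d
    h = abs(0.5 - point[j_max])  # height measured from the apex
    return i + h


def build_index(templates):
    """Sorted (pyramid value, template id) pairs; stand-in for a B+ tree."""
    return sorted((pyramid_value(vec), tid) for tid, vec in templates.items())


def same_pyramid(index, query):
    """Template ids whose pyramid value lies in the query's pyramid [i, i + 0.5]."""
    keys = [pv for pv, _ in index]
    i = int(pyramid_value(query))
    lo = bisect.bisect_left(keys, i)
    hi = bisect.bisect_right(keys, i + 0.5)
    return [tid for _, tid in index[lo:hi]]


templates = {"a": (0.2, 0.7, 0.5), "b": (0.6, 0.9, 0.5), "c": (0.55, 0.95, 0.4)}
index = build_index(templates)
print(same_pyramid(index, (0.5, 0.8, 0.5)))
```

Because heights never exceed 0.5, the value ranges of different pyramids do not overlap, which is what lets a one-dimensional ordered index serve high-dimensional data.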
Various Indexing Techniques
• Grid Files
• KD Tree
• R Tree
• R+ Tree
• X Tree
• Pyramid Technique
Comparative Study
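| Method | Scalable | Order Invariant | Dynamic | Range Query | No Overlap |
|---|---|---|---|---|---|
| Grid File | Y | Y | N | N | Y |
| R Tree | Y | N | N | N | N |
| R* Tree | Y | N | N | N | N |
| R+ Tree | Y | N | N | N | Y |
| KD Tree | Y | N | N | N | Y |
| X Tree | Y | N | Y | Y | Y |
| Pyramid Tech | Y | Y | Y | Y | Y |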
Results of Indexing
• 35-dimensional hand geometry data
• Best penetration: 27%
• FRR = 0%
• Dataset: training set of 450, testing set of 450
Parallel combination with signature will further reduce the search space
Multimodal Biometrics
2D Biometric: Signature & Fingerprint Fusion
[Figure: impostor score pairs and true match score pairs]
Optimal Fusion Algorithm: Signature Fused With Fingerprint
[Figure: true match score pairs and impostor score pairs are passed to the fusion algorithm. Optimal fusion ROC: accuracy (1-FRR) versus false accept rate (FAR), separating the unrealizable performance area from the suboptimal performance area]
The optimal ROC is the boundary between unrealizable and suboptimal performance.
Optimal Fusion Algorithm Decision Regions
99.04% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; 1st biometric score axis versus 2nd biometric score axis]
The decision region boundary is irregular due to the finite sample size; the more data, the smoother the boundaries.
RSS Fusion Algorithm for Fingerprint & Signature Provides a Suboptimal Performance ROC
[Figure: true match score pairs and impostor score pairs are passed to RSS fusion. Accuracy (1-FRR) versus false accept rate (FAR), comparing the RSS fusion ROC with the optimal ROC]
RSS Fusion Decision Regions
96.11% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; 1st biometric score axis versus 2nd biometric score axis]
OR Fusion Algorithm for Fingerprint & Signature Provides a Suboptimal Performance ROC
[Figure: true match score pairs and impostor score pairs are passed to OR fusion. Accuracy (1-FRR) versus false accept rate (FAR), comparing the OR fusion ROC with the optimal ROC]
OR Fusion Decision Regions
96.85% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; 1st biometric score axis versus 2nd biometric score axis]
AND Fusion Algorithm for Fingerprint & Signature Provides a Suboptimal Performance ROC
[Figure: true match score pairs and impostor score pairs are passed to AND fusion. Accuracy (1-FRR) versus false accept rate (FAR), comparing the AND fusion ROC with the optimal ROC]
AND Fusion Decision Regions
62.91% accuracy at a specified FAR of 1 in a million
[Figure: match zone and no-match zone; 1st biometric score axis versus 2nd biometric score axis]
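The three suboptimal fusion rules compared on these slides can be written as simple decision functions over a pair of match scores. The sketch below is illustrative only: it assumes that RSS stands for root sum of squares, that larger scores mean better matches, and that the thresholds are free parameters; none of the numeric values come from the slides. The optimal fusion algorithm is not specified in the transcript and is not sketched here.

```python
import math


def and_fusion(s1: float, s2: float, t1: float, t2: float) -> bool:
    """Match only if both biometrics clear their own thresholds."""
    return s1 >= t1 and s2 >= t2


def or_fusion(s1: float, s2: float, t1: float, t2: float) -> bool:
    """Match if either biometric clears its threshold."""
    return s1 >= t1 or s2 >= t2


def rss_fusion(s1: float, s2: float, t: float) -> bool:
    """Match if the root sum of squares of the two scores clears one threshold."""
    return math.hypot(s1, s2) >= t


# Hypothetical score pair: strong first biometric, weak second biometric.
s1, s2 = 0.9, 0.3
print("AND:", and_fusion(s1, s2, 0.5, 0.5))
print("OR: ", or_fusion(s1, s2, 0.5, 0.5))
print("RSS:", rss_fusion(s1, s2, 0.8))
```

The rules carve out differently shaped match zones in the score plane: a rectangle in the upper-right corner for AND, an L-shaped region for OR, and the area outside a quarter circle for RSS.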
ROC
Thank You