Posted on 21-Dec-2015
Keystroke Biometric Studies
Keystroke Biometric Identification and Authentication on Long-Text Input
Book chapter in Behavioral Biometrics for Human Identification (2009), edited by Liang Wang and Xin Geng
Authors: Charles C. Tappert, Mary Villani, and Sung-Hyuk Cha
Summarizes keystroke biometric work from 2005-2008:
3 DPS dissertations (2 on identification, 1 on missing/incomplete data)
About 6 master's-level projects
New material: authentication, longitudinal study, touch-type model
Major Chapter Sections
Introduction
Keystroke Biometric System
Experimental Design and Data Collection
Experimental Results
Conclusions and Future Work
Introduction: Build a Case for Usefulness of Study
Validate importance of study: applications
Define keystroke biometric
Appeal of keystroke over other biometrics
Previous work on the keystroke biometric
No direct study comparisons on same data
Feature measurements
Make case for using data over the internet, long text input, and free (arbitrary) text input
Extends previous work by authors
Summary of scope and methodology
Summary of paper organization
Introduction: Validate Importance of Study (Applications)
Internet authentication application: authenticate (verify) student test-takers
Internet identification application: identify perpetrators of inappropriate email
Internet security for other applications: important as more businesses move toward e-commerce
Introduction: Define Keystroke Biometric
The keystroke biometric is one of the less-studied behavioral biometrics
Based on the idea that typing patterns are unique to individuals and difficult to duplicate
Introduction: Appeal of Keystroke Biometric
Not intrusive: data is captured as users type, and users type frequently for business and pleasure
Inexpensive: keyboards are common, so no special equipment is necessary
Continuous checking: identity can still be checked from keystrokes after initial authentication, as users continue to type
Introduction: Previous Work on Keystroke Biometric
One early study goes back to typewriter input
Identification versus authentication
Most studies were on authentication; two commercial products harden passwords
Few were on identification (a more difficult problem)
Short versus long text input
Most studies used short input: passwords, names
Few used long text input: copy or free text
Other keystroke problems studied
One study detected fatigue, stress, etc.
Another detected a change of user (ID change) via monitoring
Introduction: No Direct Study Comparisons on Same Data
No comparisons on a standard data set (desirable, and available for many biometric and pattern recognition problems)
Rather, researchers collect their own data
Nevertheless, the literature is optimistic about the keystroke biometric's potential for security
Introduction: Feature Measurements
Features are derived from raw data: key press times and key release times
Each keystroke provides a small amount of data
Data varies with the keyboard, the conditions, and the entered text
Using long text input allows:
use of good (statistical) feature measurements
generalization over keyboards, conditions, etc.
Introduction: Make Case for Using
Data over the internet: required by applications
Long text input: more and better features, hence higher accuracy
Free text input: required by applications, since predefined copy texts are unacceptable
Introduction: Extends Previous Work by Authors
Previous keystroke identification study used ideal conditions: fixed text, and the same keyboard for enrollment and testing
This work adds less ideal conditions: free text input, and different keyboards for enrollment and testing
Introduction: Summary of Scope and Methodology
Determine distinctiveness of keystroke patterns
Two application types: identification (1-of-n problem) and authentication (yes/no problem)
Two independent variables (4 data quadrants): keyboard type (desktop versus laptop) and entry mode (copy versus free text)
Keystroke Biometric System
Raw keystroke data capture
Feature extraction
Classification for identification
Classification for authentication
Keystroke Biometric System: Feature Extraction
Mostly statistical features: averages and standard deviations of
key press times (durations)
transition times between keystroke pairs
for individual keys and groups of keys (hierarchy)
Percentage features: percentage use of non-letter keys, percentage use of mouse clicks
Input rates: average time per keystroke
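The feature list above can be sketched roughly as follows. This is a minimal illustration, not the authors' extractor: the raw-event format `(key, press_ms, release_ms)`, the single-character key labels, the function name `extract_features`, and the feature names are all hypothetical, and the mouse-click percentage is omitted because no mouse events are modeled.

```python
from statistics import mean, pstdev

# Assumed raw-event format (hypothetical, not the chapter's file format):
# (key, press_ms, release_ms) per keystroke, in typing order, with
# single-character key labels (" " for the space bar).
Keystroke = tuple[str, float, float]


def extract_features(events: list[Keystroke]) -> dict[str, float]:
    """Per-key duration mean/std, percentage of non-letter keys, and input rate."""
    if not events:
        raise ValueError("need at least one keystroke")

    # Key press duration = release time - press time
    durations: dict[str, list[float]] = {}
    for key, press, release in events:
        durations.setdefault(key, []).append(release - press)

    features: dict[str, float] = {}
    for key, values in sorted(durations.items()):
        features[f"dur_mean[{key}]"] = mean(values)
        # Population standard deviation; a key seen only once gets 0.0
        features[f"dur_std[{key}]"] = pstdev(values)

    # Percentage feature: share of keystrokes that are not letters
    non_letters = sum(1 for key, _, _ in events if not key.isalpha())
    features["pct_non_letter"] = 100.0 * non_letters / len(events)

    # Input rate: time from first press to last release, per keystroke
    elapsed = events[-1][2] - events[0][1]
    features["ms_per_keystroke"] = elapsed / len(events)
    return features
```

With long text input each key appears many times, which is what makes the per-key mean and standard deviation meaningful; on a short password most keys would contribute only one or two samples.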
Keystroke Biometric System: Feature Extraction
Figure: A two-key sequence (th) showing the two transition measures
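Since the figure itself is not reproduced here, the sketch below assumes the two transition measures are (t1) release of the first key to press of the second, and (t2) press of the first key to press of the second. Treat these definitions, the event format, and the name `transition_times` as assumptions rather than the chapter's exact specification.

```python
# Assumed raw-event format (hypothetical): (key, press_ms, release_ms), in typing order.
Keystroke = tuple[str, float, float]


def transition_times(
    events: list[Keystroke], first: str, second: str
) -> tuple[list[float], list[float]]:
    """Collect two transition measures for each occurrence of the digraph
    first -> second (for example "t" -> "h").

    Assumed definitions:
      t1 = press of second key - release of first key (negative if the keys overlap)
      t2 = press of second key - press of first key
    """
    t1: list[float] = []
    t2: list[float] = []
    # Walk consecutive keystroke pairs
    for (k1, p1, r1), (k2, p2, _r2) in zip(events, events[1:]):
        if k1 == first and k2 == second:
            t1.append(p2 - r1)
            t2.append(p2 - p1)
    return t1, t2
```

A negative t1 captures a fast typist pressing "h" before releasing "t", which is one reason two measures carry more information than one. The lists returned here would then be summarized by their mean and standard deviation like the duration features.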
Keystroke Biometric System: Feature Extraction
Figure: Hierarchy tree for the 39 duration categories
Keystroke Biometric System: Feature Extraction
Figure: Hierarchy tree for the 35 transition categories
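Keystroke Biometric System: Feature Extraction
Fallback procedure for few/missing samples: when the number of samples is less than a fallback threshold, take the weighted average of the key’s mean and the fallback mean
$$\mu_i'(k) = \frac{n_i(k)\,\mu_i(k) + \mathrm{weight}_{\mathrm{fallback}}\,\mu_i(\mathrm{fallback})}{n_i(k) + \mathrm{weight}_{\mathrm{fallback}}}$$
where n_i(k) and μ_i(k) are the number of samples and the mean for key k, and μ_i(fallback) is the mean of the fallback category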
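The fallback formula can be sketched as below. The one-level hierarchy `PARENT`, its group names, the function name `fallback_mean`, and the default threshold and weight are hypothetical placeholders; the chapter's actual tree has 39 duration categories, and a full version would climb higher in the tree when a whole group is sparse.

```python
# Hypothetical one-level hierarchy: each key's fallback is its parent group.
# The chapter's tree has 39 duration categories; these names are illustrative only.
PARENT = {"a": "vowels", "e": "vowels", "h": "consonants", "t": "consonants"}


def fallback_mean(samples: dict[str, list[float]], key: str,
                  threshold: int = 5, weight: float = 5.0) -> float:
    """Mean for `key`, blended with its parent group's mean when the key has
    fewer than `threshold` samples (threshold and weight values are assumptions)."""
    own = samples.get(key, [])
    n = len(own)
    if n >= threshold:
        return sum(own) / n  # enough samples: use the key's own mean

    # Fallback mean: pooled over every key that shares this key's parent.
    # (A fuller version would climb higher in the tree if this pool is empty.)
    group = PARENT[key]
    pooled = [x for k, xs in samples.items() if PARENT.get(k) == group for x in xs]
    mu_fallback = sum(pooled) / len(pooled)

    mu_key = sum(own) / n if n else 0.0  # with n == 0 only the fallback term counts
    # mu'(k) = (n * mu(k) + weight * mu(fallback)) / (n + weight)
    return (n * mu_key + weight * mu_fallback) / (n + weight)
```

As n approaches the threshold the key's own mean dominates, and with no samples at all the estimate falls back entirely to the group mean, so a missing key still yields a usable feature.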
Keystroke Biometric System: Feature Extraction
Two preprocessing steps:
Outlier removal: remove duration and transition times greater than a threshold
Feature standardization: convert features into the range 0-1
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
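Both preprocessing steps are sketched below. Because the later results vary the number of outlier-removal passes and the removal distance in sigma, this version assumes a repeated mean ± sigma·std cutoff rather than a fixed time threshold; the function names and default parameter values are hypothetical. The min-max step is meant to be applied to one feature at a time across all samples.

```python
from statistics import mean, pstdev


def remove_outliers(times: list[float], sigma: float = 2.0, passes: int = 1) -> list[float]:
    """Drop times farther than `sigma` standard deviations from the mean,
    repeated for `passes` passes (sigma and passes defaults are assumptions)."""
    kept = list(times)
    for _ in range(passes):
        if len(kept) < 2:
            break
        mu, sd = mean(kept), pstdev(kept)
        kept = [t for t in kept if abs(t - mu) <= sigma * sd]
    return kept


def min_max(column: list[float]) -> list[float]:
    """Standardize one feature column: x' = (x - x_min) / (x_max - x_min).
    A constant column maps to all zeros to avoid dividing by zero."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

Outlier removal runs on the raw times before the means and standard deviations are computed, so a single long pause (for example, the typist stopping to think) does not inflate a key's duration statistics.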
Keystroke Biometric System: Classification for Identification
Nearest neighbor using Euclidean distance
Compare a test sample against the training samples; the author of the nearest training sample is identified as the author of the test sample
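A minimal nearest-neighbor identifier along these lines might look like this; the function name `identify` and the `(author, feature_vector)` training format are assumptions for illustration.

```python
import math


def identify(test_vec: list[float], training: list[tuple[str, list[float]]]) -> str:
    """Return the author of the training sample closest to `test_vec`
    in Euclidean distance (1-nearest-neighbor identification)."""
    best_author, best_dist = "", math.inf
    for author, vec in training:
        d = math.dist(test_vec, vec)  # Euclidean distance between feature vectors
        if d < best_dist:
            best_author, best_dist = author, d
    return best_author
```

Since the features are standardized to 0-1 beforehand, no single feature dominates the Euclidean distance merely because it is measured in larger units.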
Keystroke Biometric System: Classification for Authentication
Cha’s vector-distance (dichotomy) model
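The dichotomy model turns the many-class problem into a two-class one: pairs of feature vectors are mapped to their component-wise absolute differences, labeled "same person" or "different person", and a yes/no classifier works in that difference space. The sketch below uses a 1-nearest-neighbor classifier in difference space; that classifier choice, the function names, and the data format are assumptions, not details taken from the slides.

```python
import math
from itertools import combinations


def diff_vector(x: list[float], y: list[float]) -> list[float]:
    """Map a pair of feature vectors to their component-wise absolute difference."""
    return [abs(a - b) for a, b in zip(x, y)]


def build_dichotomy_set(samples: list[tuple[str, list[float]]]) -> list[tuple[list[float], bool]]:
    """Pair every two samples; within-person pairs are labeled True,
    between-person pairs False."""
    return [(diff_vector(x, y), p == q)
            for (p, x), (q, y) in combinations(samples, 2)]


def authenticate(claimed_ref: list[float], test_vec: list[float],
                 dichotomy_set: list[tuple[list[float], bool]]) -> bool:
    """Accept if the nearest labeled difference vector (Euclidean 1-NN,
    an assumed classifier choice) comes from a within-person pair."""
    d = diff_vector(claimed_ref, test_vec)
    _, same = min(dichotomy_set, key=lambda item: math.dist(item[0], d))
    return same
```

Because the classifier learns what within-person and between-person differences look like rather than who each person is, it can be trained on people other than the test users, which is the weak-enrollment setting the authentication results use.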
Experimental Design and Data Collection: Design
Two independent variables:
Keyboard type: desktop (all Dell) or laptop (90% Dell, plus IBM, Compaq, Apple, HP, Toshiba)
Input mode: copy task (predefined text) or free text input (e.g., arbitrary email)
Experimental Design and Data Collection: Data Collection
Subjects provided samples in at least two quadrants
Five samples per quadrant per subject
Summary of subject demographics:
Age        Female  Male  Total
Under 20       15    19     34
20-29          12    23     35
30-39           5    10     15
40-49           7    11     18
50+            11     5     16
All            50    68    118
Experimental Results
Identification experimental results
Authentication experimental results
Longitudinal study results
System hierarchical model and parameters:
hierarchical fallback model
outlier parameters
number of enrollment samples
input text length
probability distributions of statistical features
Experimental Results: Identification Experimental Results
Figure: Identification performance under ideal conditions (same keyboard type and input mode, leave-one-out procedure). Axes: percent accuracy (90%-100%) versus number of subjects (0-100); series: Desk-Copy, Lap-Copy, Desk-Free, Lap-Free
Experimental Results: Identification Experimental Results
Figure: Identification performance under non-ideal conditions (train on one file, test on another). Axes: percent accuracy (0%-100%) versus number of subjects (0-100); series: Groups 1-6
Experimental Results: Authentication Experimental Results
Figure: Authentication performance under ideal conditions (weak enrollment: train on 18 subjects and test on 18 different subjects). Bars: Performance, FRR, and FAR (percent, 0%-100%) for the DeskCopy, LapCopy, DeskFree, and LapFree conditions
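The figure reports FRR (false rejection rate) and FAR (false acceptance rate). The sketch below shows how those two rates follow from a list of authentication decisions; the input format and the function name `error_rates` are hypothetical, and reading the figure's "Performance" bar as overall decision accuracy is an assumption.

```python
def error_rates(decisions: list[tuple[bool, bool]]) -> tuple[float, float, float]:
    """decisions: (is_genuine, accepted) per authentication attempt.

    FRR = genuine attempts rejected / genuine attempts
    FAR = impostor attempts accepted / impostor attempts
    Third value: fraction of all decisions that were correct (assumed
    reading of "Performance")."""
    genuine = [acc for gen, acc in decisions if gen]
    impostor = [acc for gen, acc in decisions if not gen]
    frr = sum(1 for acc in genuine if not acc) / len(genuine)
    far = sum(1 for acc in impostor if acc) / len(impostor)
    correct = sum(1 for gen, acc in decisions if gen == acc)
    return frr, far, correct / len(decisions)
```

Any change to the accept threshold trades FRR against FAR, which is why a single operating point is only a partial picture and ROC curves are listed as future work.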
Experimental Results: Longitudinal Study Results
Identification, 13 subjects at 2-week intervals (average over the 6 arrow groups): 90% -> 85% -> 83%
Authentication, 13 subjects at 2-week intervals (average over the 6 arrow groups): 90% -> 87% -> 85%
Identification, 8 subjects at a 2-year interval (average over the 6 arrow groups): 84% -> 67%
Authentication, 8 subjects at a 2-year interval (average over the 6 arrow groups): 94% -> 92%
(All results above are under non-ideal conditions)
Experimental Results: System Hierarchical Model and Parameters
Figure: Touch-type hierarchy tree for durations
Experimental Results: System Hierarchical Model and Parameters
Figure: Identification accuracy versus outlier removal passes
Experimental Results: System Hierarchical Model and Parameters
Figure: Identification accuracy versus outlier removal distance (sigma)
Experimental Results: System Hierarchical Model and Parameters
Figure: Identification accuracy versus enrollment samples. Axes: percent accuracy (70-100) versus number of enrollment samples (1-4)
Experimental Results: System Hierarchical Model and Parameters
Figure: Identification accuracy versus input text length
Experimental Results: System Hierarchical Model and Parameters
Figure: Distributions of “u” duration times for each entry mode
Conclusions
Results are important and timely as more people become involved in the applications of interest: authenticating online test-takers and identifying senders of inappropriate email
High performance (accuracy) results if users provide 2 or more enrollment samples each and use the same keyboard type
Future Work
Focus on user authentication
Focus on Cha’s dichotomy model
Develop strong/weak enrollment concepts:
strong: system trained on actual users
weak: system trained on other (non-test) users
Develop strategies to obtain ROC curves
Run actual test-taker experiments