Keystroke Biometrics Studies on a Variety of Short and Long Text and Numeric Input
Ned Bakelman, DPS Candidate
Charles C. Tappert, PhD, Advisor
Seidenberg School of Computer Science and Information Systems Pace University
White Plains, NY 10606, USA
DPS Defense
April 11, 2014
Research Questions
This study focuses on biometric authentication using long bursts of arbitrary input and short bursts of fixed input with an improved classification system
• Long Input: 100 – 1500 characters (paragraph, couple of sentences, etc.)
• Short Input: 10 – 15 characters (password, pass code, etc.)
• Arbitrary Input: Open unrestricted text (of the user's choosing)
Research Questions (continued)
Purpose of the Study
Long Input – Unauthorized User Detection
1) Can we accurately detect an intruder's use of a computer system in an office environment?
2) How does the use of standard applications such as word processing, spreadsheet, and browser affect intruder detection?
3) Is an intruder still detectable when using a web browser (a low-text environment)?
Short Keypad Input – Detection Accuracy
1) What is the detection accuracy of short fixed numeric keypad input?
2) Does the use of specific keypad features improve detection accuracy?
Classifier Comparison – Multi Match vs. Single Match
1) What is the difference in accuracy between the two?
2) Which performs better on long input?
3) Which performs better on short input?
Background
T. Olzak, Keystroke Dynamics: Low Impact Biometric Verification, Sep, 2006
• Derived from raw timing data
• Based on key press duration and transition times
• Also known as dwell and flight times
• Statistical in nature, mainly means and standard deviations
• Pre-processing to remove outliers and standardize between 0 – 1
• Fallback procedure
(Source of Features or Attributes)
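The feature pipeline named above (dwell and flight times, outlier removal, standardization to 0 – 1) can be sketched as follows. This is a minimal illustration, not the Pace system's implementation: the event format, the function names, the 2-sigma outlier rule, and the choice of release-to-press as the flight time are all assumptions, and the fallback procedure is not shown.

```python
from statistics import mean, pstdev

def dwell_and_flight(events):
    """events: list of (key, press_ms, release_ms) tuples in typing order (assumed format)."""
    # Dwell (duration): how long each key is held down.
    dwell = [release - press for _, press, release in events]
    # Flight (transition): here, release of one key to press of the next.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell, flight

def clip_outliers(values, n_sigma=2.0):
    """Drop values more than n_sigma standard deviations from the mean (assumed rule)."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]

def min_max(value, lo, hi):
    """Standardize a feature value into the range 0 - 1, clamping at both ends."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

# Three hypothetical keypad keystrokes with made-up timestamps in milliseconds.
events = [("9", 0, 90), ("1", 150, 230), ("4", 300, 395)]
dwell, flight = dwell_and_flight(events)
mean_dwell = mean(clip_outliers(dwell))
```

A feature vector would then hold the standardized means and standard deviations of such measurements for each key or key group.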
Background (continued)
Wikipedia.org http://en.wikipedia.org/wiki/Computer_keyboard, last updated: March 6, 2012
[Figure: keyboard layout showing the QWERTY section and the Numeric Keypad]
• Separate features for QWERTY and Keypad
• Durations and transitions for individual keys, groups of keys, etc.
• QWERTY: each letter, each number, vowels, consonants, all letters, etc.
• Keypad: each digit, each operator (+ - * /), all digits, all operators, etc.
(Target of Features or Attributes)
Background (continued)
(Pace Classifier: Single Match)
• Dichotomy Model
• Uses vector differences
• Transforms a multi-class problem to a two-class problem
• K-Nearest Neighbor (k-NN) is used for classification
[Figure, left panel: Feature Vector Space with 3 subjects (S1, S2, S3), 4 samples each]
[Figure, right panel: Feature Difference Space with 18 within distances and 48 between distances]
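The within and between counts in the figure follow from pairing every sample with every other sample. The sketch below shows that transformation under stated assumptions: the two-dimensional vectors are made up, and the element-wise absolute difference is one common choice for a dichotomy model rather than a confirmed detail of the Pace classifier.

```python
from itertools import combinations

def difference_space(samples):
    """samples: dict mapping subject -> list of feature vectors (hypothetical data)."""
    labeled = [(subject, vec) for subject, vecs in samples.items() for vec in vecs]
    within, between = [], []
    for (s1, v1), (s2, v2) in combinations(labeled, 2):
        # Element-wise absolute difference of the two feature vectors (assumed metric).
        diff = [abs(a - b) for a, b in zip(v1, v2)]
        # Same subject -> "within" class, different subjects -> "between" class.
        (within if s1 == s2 else between).append(diff)
    return within, between

# 3 subjects with 4 samples each, as in the figure; the values are illustrative only.
samples = {
    "S1": [[0.10, 0.05], [0.12, 0.06], [0.11, 0.04], [0.13, 0.05]],
    "S2": [[0.40, 0.15], [0.42, 0.14], [0.41, 0.16], [0.43, 0.15]],
    "S3": [[0.70, 0.10], [0.72, 0.11], [0.71, 0.09], [0.73, 0.10]],
}
within, between = difference_space(samples)
print(len(within), len(between))  # 18 48
```

Each subject contributes C(4,2) = 6 within pairs (18 in total), and each of the 3 subject pairs contributes 4 × 4 = 16 between pairs (48 in total), so the multi-class problem becomes a two-class one.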
Background (continued)
(Pace Classifier: Multi Match)
Authentication Process
• User Focused Reduction Method (reduces the training space)
• System performance obtained using the Leave-One-Out method
• “Left out” test sample is used to create difference vectors
• Each test difference is classified (k-NN)
• Results are grouped together
• Authentication decision based on all
[Figure: Feature Reduction Space for S1 with 6 within distances and 32 between distances, shown alongside the Feature Vector Space (3 subjects, 4 samples) and the Feature Difference Space (18 within, 48 between)]
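The reduction from 18/48 to 6/32 is consistent with keeping only the differences that involve the claimed user's own samples. The sketch below illustrates that reading; the function names and data are hypothetical, and the slides do not spell out the reduction rule beyond the counts.

```python
from itertools import combinations

def abs_diff(u, v):
    """Element-wise absolute difference of two feature vectors (assumed metric)."""
    return [abs(a - b) for a, b in zip(u, v)]

def user_focused_space(samples, user):
    """Keep only the difference vectors that involve the claimed user's samples."""
    own = samples[user]
    # Within: pairs of the user's own samples.
    within = [abs_diff(u, v) for u, v in combinations(own, 2)]
    # Between: each of the user's samples against every sample of every other subject.
    between = [
        abs_diff(u, v)
        for u in own
        for other, vecs in samples.items() if other != user
        for v in vecs
    ]
    return within, between

# Illustrative data only: 3 subjects, 4 samples each.
samples = {
    "S1": [[0.10, 0.05], [0.12, 0.06], [0.11, 0.04], [0.13, 0.05]],
    "S2": [[0.40, 0.15], [0.42, 0.14], [0.41, 0.16], [0.43, 0.15]],
    "S3": [[0.70, 0.10], [0.72, 0.11], [0.71, 0.09], [0.73, 0.10]],
}
within, between = user_focused_space(samples, "S1")
print(len(within), len(between))  # 6 32
```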
Background (continued)
(Performance: ROC Curves, Equal Error Rate)
Receiver Operating Characteristic (ROC) Curves
• Historically used in signal detection, such as radar, to distinguish an actual signal from noise
• Used in biometrics to plot the FAR (False Acceptance Rate) and FRR (False Rejection Rate) at various operating points (thresholds)
Equal Error Rate (EER)
• The point on the ROC curve where the FAR and FRR are equal
• The operating point on the ROC curve where the FAR and FRR intersect
[Figures: ROC Curve; FAR / FRR Intersection]
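As a rough illustration of how an EER is read off, the sketch below sweeps a decision threshold over match scores and picks the operating point where FAR and FRR are closest. The scores are invented, "higher score means more likely genuine" is an assumption, and real systems usually interpolate between thresholds instead of taking the nearest one.

```python
def far_frr(genuine, impostor, threshold):
    """Accept when score >= threshold (assumed convention)."""
    # FAR: fraction of impostor attempts that are wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine attempts that are wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (threshold, EER estimate) at the point where |FAR - FRR| is smallest."""
    thresholds = sorted(set(genuine) | set(impostor))

    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    best = min(thresholds, key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return best, (far + frr) / 2

# Hypothetical match scores for genuine users and for impostors.
genuine = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor = [0.5, 0.3, 0.2, 0.1, 0.65]
threshold, eer = equal_error_rate(genuine, impostor)
```

With these made-up scores the two error rates meet at a threshold of 0.6, where one of five impostors is accepted and one of five genuine attempts is rejected.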
Data Collection
(Numeric Keypad)
• Only “perfect” samples were used (no mistakes)
• Rest period of at least one day between sessions
• Data entered into a spreadsheet using right hand
30 Subjects
Input: 914 193 7761
Number of Sessions: 4
Samples per Subject: 20
Features
(Feature Attribute Summary)

                                   Attributes  Mean (µ)  Standard Deviation (σ)  Total
QWERTY (Non-Numeric)
  Durations                        53          53        53                      106
  Transitions (per Type I and II)  35          70        70                      140
QWERTY (Numeric)
  Durations                        27          27        27                      54
  Transitions (per Type I and II)  26          52        52                      104
Keypad
  Durations                        29          29        29                      58
  Transitions (per Type I and II)  128         256       256                     512
Totals                             298         487       487                     974
Features
(Keypad Durations)
All Keys
• Numeric Keypad
• Digits with Decimal: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, .
• Arithmetic Operators with Num Lock and Enter: NumLock, Enter, /, *, -, +
• Print Screen, Sys Rq, Scroll Lock, Pause, Break
• Centerpad: Home, Page Up, Page Dn, End, Del, Ins
• Four Arrows
Features (continued)
(Keypad Transitions)
• Any Key -> Any Key
• keypad -> keypad
• any digit -> any digit
• each digit (1, 2, 3 … 0) -> each digit (1, 2, 3 … 0)
• each digit (1, 2, 3 … 0) -> digits
• Any Digit -> Arithmetic Operators
• each digit (1, 2, 3 … 0) -> Arithmetic Operators
• Arithmetic Operator -> any digit
• div -> digits, mult -> digits, sub -> digits, add -> digits
Results – Short Input Experiments
(Equal Error Rate for each keypad experiment per Classifier)
[Figure: Equal Error Rate charts for the 10 Subject, 20 Subject, and 30 Subject experiments, each comparing Multi Match and Single Match]
Results – Short Input Experiments (continued)
(ROC Curve for each keypad experiment per Classifier)
[Figure: ROC curves in two panels, Multi Match Classifier and Single Match Classifier]
• 10 - 20: 10 Subjects, 20 samples each
• 20 - 20: 20 Subjects, 20 samples each
• 30 - 20: 30 Subjects, 20 samples each
Results – Short Input Experiments (continued)
(Independent Variables for the short input experiments)

Numeric Keypad
Subjects                      10      20      30
Samples per Subject           20      20      20
Total Samples (All Subjects)  200     400     600
EER % (Multi Match)           5.50%   5.65%   6.14%
EER % (Single Match)          15.56%  15.72%  14.95%
EER Improvement %             64.65%  64.06%  58.93%

• Independent Variable 1: Number of Subjects
• Independent Variable 2: Classifier
• Conclusion 1: EER increases as Number of Subjects increases (but not by much) *
• Conclusion 2: New Classifier (Multi Match) much better than Old Classifier (Single Match)
* Except for old Classifier
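The EER Improvement row matches the relative reduction in EER when moving from Single Match to Multi Match. The short check below reproduces the reported percentages from the two EER rows; the function name is ours, and the formula is inferred from the numbers rather than stated on the slide.

```python
def eer_improvement(old_eer, new_eer):
    """Relative EER reduction in percent: (old - new) / old * 100."""
    return (old_eer - new_eer) / old_eer * 100.0

# (subjects, Multi Match EER %, Single Match EER %) taken from the results table.
results = [(10, 5.50, 15.56), (20, 5.65, 15.72), (30, 6.14, 14.95)]
improvements = {
    subjects: round(eer_improvement(single, multi), 2)
    for subjects, multi, single in results
}
print(improvements)  # {10: 64.65, 20: 64.06, 30: 58.93}
```

The same formula gives 41.36% for the feature-set comparison reported later (10.47% with CMU features vs. 6.14% with PU features).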
CMU Experiment - Keypad
914 193 7761 + Enter Key = 11 Characters
Carnegie Mellon Features (from their numeric keypad study *)
• 10 key-down ---> key-down
• 10 key-up ---> key-down
• 11 dwell times
• 31 Features
Pace University Features (from our numeric keypad study)
• (10 key-down ---> key-down) per µ, per σ = 20
• (10 key-up ---> key-down) per µ, per σ = 20
• (7 dwell) per µ, per σ = 14
• 54 Timing Features
(Features Set Comparison – CMU vs. PaceU)
* R. Maxion and K. Killourhy, "Keystroke Biometrics with Number-Pad Input," 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Chicago, IL, 2010, pp. 201-210.
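The feature counts on this slide can be checked against the input string itself. In the sketch below, the variable names are ours, and reading "7 dwell" as the number of distinct keys in the string (six different digits plus Enter) is an observation that fits the numbers, not something the slides state.

```python
# The fixed input: ten digits followed by the Enter key.
sequence = list("9141937761") + ["Enter"]

keystrokes = len(sequence)           # 11 characters including Enter
transitions = keystrokes - 1         # 10 consecutive key pairs
distinct_keys = len(set(sequence))   # 7: digits 1, 3, 4, 6, 7, 9 plus Enter

# CMU set: one value per transition (two transition types) plus one dwell per keystroke.
cmu_features = transitions + transitions + keystrokes

# Pace set: a mean and a standard deviation per transition (two types) and per dwell feature.
pace_features = 2 * transitions + 2 * transitions + 2 * distinct_keys

print(cmu_features, pace_features)  # 31 54
```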
CMU Experiment – Keypad (continued)
(Equal Error Rate and ROC Curves only using Multi Match)
PU Features vs. CMU Features (PU Data with CMU Features)
[Figure: Equal Error Rate chart and ROC curves plotting FAR (%) against FRR (%), both axes 0 to 100; series: Keypad 30 - 20 CMU Features, Keypad 30 - 20 PU Features]
CMU Experiment – Keypad (continued)
(Independent Variable for the CMU Keypad experiment)

Numeric Keypad (30 – 20)
Features Set                  CMU     PU
Subjects                      30      30
Samples per Subject           20      20
Total Samples (All Subjects)  600     600
EER % (Multi Match)           10.47%  6.14%
EER Improvement %             41.36%

• Independent Variable: Feature Set
• Conclusion: PU Feature Set outperformed CMU Feature Set
Conclusions
Long Input Experiments – Intruder Detection Accuracy
• Keystroke Biometrics can be effective at detecting the unauthorized use of a computer system in a closed environment (government office, school, business office, etc.)
• Performance Varied with Input Type:
• Spreadsheet: Good Performance (EER: 8.1%)
• Text: Very Good Performance (EER: 5.8%)
• Browser: Fair Performance (EER: 15.7%)
Short Input Experiments – Detection Accuracy
• Numeric Keypad yields very good performance (EER Range: 5.5% - 6.2%)
• PaceU Features Set is Effective: CMU features performed much worse (10.5% vs. 6.2%)
Classifier Comparison – Multi Match vs. Single Match
1) Multi Match outperformed Single Match significantly (EER Improvement from 50% - 64%)
2) Multi Match outperformed detector study from CMU using their data and features (EER: 7.6%)
Conclusions (continued)
Long Input Performance: Weaker Performance compared to previous studies at PU… Why?
• Less optimal samples
• No designated entry window for sample collection (less control over quality of entry)
• Large fluctuations in the number of keystrokes
• Input types most likely had substantial mouse activity that “interrupts” keystroke entry
• Possible sparseness of keystrokes (meaning less concentrated and spread out, especially with browser entry)
Future Considerations: Do keystroke counts tell the whole story?
• Propose that correlating performance simply to Number of Keystrokes is not sufficient
• Need to factor in the density of the keystrokes as well
• Simply stated: It may take a lot more keystrokes to maintain an effective level of performance if the sparseness is high
Suggestions for Future Work
• Further studies on numeric entry from QWERTY
• Compare performance to numeric entry from keypad
• Study free text entry from keypad
• Feature Analysis
• Which features contributed to performance from the keypad?
• How do equivalent numeric features from QWERTY perform compared to keypad?
• Perform mixed mode experiments
• Collect input that combines spreadsheet, browser, and text
• Collect spreadsheet input which includes all numeric entry from keypad
• Incorporate Multi Biometric
• Keystroke + Mouse Movement + Stylometry
Backup Slides
Generate ROC Curves from kNN Data
(vary m from 0 to k [m is the controlling or threshold parameter])
R. Zack, C. Tappert, and S. Cha, "Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method," IEEE 4th Int Conf Biometrics (BTAS 2010), Washington D.C., 2010.
The m-kNN procedure with k = 9 and m = 5
For each Q (questioned) test sample:
• Examine the top k nearest neighbors
• Count the number of within-class matches
• If the number of within-class matches >= a threshold of matches (m), the user is authenticated; otherwise the user is rejected
Generate the ROC curve as follows:
• Vary m from 0 to k
• Calculate FAR / FRR in each of the following cases:
• m = 0: authenticate if 0 or more of the k choices are within
• m = 1: authenticate if 1 or more of the k choices are within
• and so on until m = 9 in this case
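The m-kNN decision and the threshold sweep described above can be sketched as follows. The representation of each test as a list of neighbor labels, nearest first, is an assumption made for illustration, and the tiny data set is invented.

```python
def m_knn_accept(neighbor_labels, m):
    """Accept when at least m of the k nearest neighbors are 'within' class."""
    return sum(label == "within" for label in neighbor_labels) >= m

def roc_points(genuine_tests, impostor_tests, k):
    """Return one (m, FAR, FRR) operating point for each threshold m = 0..k."""
    points = []
    for m in range(k + 1):
        # FRR: genuine tests that fail to reach m within-class neighbors.
        frr = sum(not m_knn_accept(t, m) for t in genuine_tests) / len(genuine_tests)
        # FAR: impostor tests that still reach m within-class neighbors.
        far = sum(m_knn_accept(t, m) for t in impostor_tests) / len(impostor_tests)
        points.append((m, far, frr))
    return points

# The slide's setting: k = 9 and m = 5; six within-class neighbors are enough to accept.
example = ["within"] * 6 + ["between"] * 3
accepted = m_knn_accept(example, 5)

# A toy sweep with k = 3 (hypothetical neighbor labels for two genuine and two impostor tests).
genuine_tests = [["within", "within", "between"], ["within", "between", "between"]]
impostor_tests = [["between", "between", "between"], ["within", "between", "between"]]
points = roc_points(genuine_tests, impostor_tests, k=3)
```

At m = 0 every test is accepted (FAR = 1, FRR = 0), and raising m trades false acceptances for false rejections, which traces out the ROC curve.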
Linear Rank Weighting Method:
• 1st choice weight = k, 2nd choice weight = k-1, … weight = 1
• Authenticate a user if the sum of the weighted within-class choices >= the m threshold
• Threshold varies from 0 to k(k+1)/2 (maximum score)
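A minimal sketch of the linear rank weighting score, assuming the neighbors are given as a list of labels ordered from nearest to farthest; the function names and example labels are hypothetical.

```python
def weighted_within_score(neighbor_labels):
    """Sum the rank weights of the within-class neighbors (nearest first)."""
    k = len(neighbor_labels)
    # The nearest neighbor gets weight k, the next k-1, and the last gets weight 1.
    return sum(
        k - rank
        for rank, label in enumerate(neighbor_labels)
        if label == "within"
    )

def max_score(k):
    """Largest possible score: k + (k-1) + ... + 1 = k(k+1)/2."""
    return k * (k + 1) // 2

def weighted_accept(neighbor_labels, m):
    """Authenticate when the weighted within-class score reaches the threshold m."""
    return weighted_within_score(neighbor_labels) >= m

labels = ["within", "between", "within"]   # k = 3: weights 3, 2, 1
score = weighted_within_score(labels)      # 3 + 1
```

Compared with plain m-kNN, this gives a finer threshold scale (0 to 45 for k = 9) and lets a close within-class match count for more than a distant one.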
Equal Error Rates (From the Literature)
Long Input:
• Ferreira and Santos: 1.4%
• Monaco using data from Villani: 1.7%
Multi Biometrics for Intrusion Detection
• Motor Control Level: keystroke + mouse movement
• Linguistic Level: stylometry (char, word, syntax)
• Semantic Level: target likely intruder commands
[Figure: Intruder detection across the Motor Control Level (Keystroke + Mouse), the Linguistic Level (Stylometry), and the Semantic Level]
Future Work (continued)
Intruder Experiment Design (continued)
• Authenticate user on various window sizes, beginning with 300-keystroke windows
• Window Type 1: use overlapping windows to:
• Minimize the “wait” period for the next authentication
• Maximize fast intruder detection
[Figure: keystroke axis from 1 to 1800 with 300-keystroke (300KS) windows starting every 300 keystrokes, overlapped by a second row of 300KS windows offset by 150 keystrokes (150, 450, 750, 1050, 1350, 1650)]
Figure 1.5-1 Overlapping Window Burst Authentication (EISIC 2012)
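The overlapping window scheme can be sketched as a sliding window over the keystroke stream. The 300-keystroke window size comes from the slide; the 150-keystroke step is read off the figure's tick marks and is an assumption, as are the function name and the 0-based indexing.

```python
def overlapping_windows(total_keystrokes, size=300, step=150):
    """Return (start, end) index pairs, 0-based and half-open, for each full window."""
    return [
        (start, start + size)
        for start in range(0, total_keystrokes - size + 1, step)
    ]

windows = overlapping_windows(1800)
print(len(windows))              # 11
print(windows[0], windows[-1])   # (0, 300) (1500, 1800)
```

With 1800 keystrokes this yields 11 windows, so an authentication decision is available every 150 keystrokes instead of every 300, which shortens the wait before an intruder can be flagged.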
Continuous vs Continual Authentication with Data Capture Windows
• Continuous (ongoing) burst authentication
[Figure: timeline marked 0, 5 min, 10 min with three 1 min bursts (Burst 1, Burst 2, Burst 3)]
• Continual burst authentication with pauses
[Figure: timeline marked 0, 8 min, 30 min with three 1 min bursts (Burst 1, Burst 2, Burst 3) and Pause Threshold markers between bursts]
Background (continued)
• DARPA (Defense Advanced Research Projects Agency), through its Cyber Genome Program, is funding research for the development of new software-based biometric authentication modalities
• These include keystrokes, and the program targets a desktop environment running Microsoft Office applications as the standard computer system platform
DARPA. Active Authentication Program. https://www.fbo.gov/index?s=opportunity&mode=form&id=c7968647352f0276fc1b28817c581d86&tab=core&_cview=0, accessed 2014.
• The 2008 United States Higher Education Opportunity Act requires institutions of higher learning to make greater online access control efforts by adopting ubiquitous identification technologies
HEOA. Higher Education Opportunity Act (HEOA) of 2008. http://www2.ed.gov/policy/highered/leg/hea08/index.html, accessed 2014.
Spreadsheet Template
                                    2011    2010    2009
Assets
  Cash
  Investments:
    Cash
    Equity Securities
    Corporate debt securities
    US government securities
    Private equity
    Real estate
  Total Investments                 0       0       0
  Other Assets
  Total Assets                      $0      $0      $0
Liabilities and Net Assets
  Liabilities:
    Penalties
    Accounts Payable
    Advance from Lender
    Federal excise tax
  Total Liabilities                 0       0       0
  Net Assets:
    Tangible
    Non Tangible
  Total Net Assets                  0       0       0
  Total Net Assets and Liabilities  $0      $0      $0
Special Journal Entries
  Enter Journal Entry name here
  Enter Journal Entry name here
  Enter Journal Entry name here
  Total Journal Entries             $0.00   $0.00   $0.00