Keystroke recognition using Android devices

João Paulo Sim-Sim Lopes

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisor: Prof. Paulo Luís Serras Lobato Correia

Examination Committee

Chairperson: Prof. José Eduardo Charters Ribeiro da Cunha Sanguino

Supervisor: Prof. Paulo Luís Serras Lobato Correia

Members of the committee: Prof. Rui Jorge Henrique Calado Lopes

April 2015


Abstract

The term “biometrics” is derived from the Greek words “bio” (life) and “metrics” (to measure). Biometric recognition is therefore concerned with recognizing people based on their characteristics. Automatic biometric recognition systems have become available over the last decades, thanks to significant advances in computation. Until recently, however, dedicated devices were needed for biometric recognition. Nowadays, smartphones have considerable processing power, which allows some biometric algorithms to be implemented on them. The growing market penetration of smartphones has created a demand for biometric recognition, since these devices hold sensitive personal information. Securing access to that information requires some form of protection against illegitimate users. Biometric security is therefore a must on these devices, given that traditional PINs (Personal Identification Numbers) can be stolen, forgotten or cracked. Personal characteristics, on the other hand, are unique, cannot be forgotten and are hard to steal, which makes biometric validation superior to PIN usage and creates a demand for biometric validation applications that secure people’s information. To increase security, a PIN is commonly used together with biometric identification. The system proposed in this dissertation monitors mobile phone users for a pattern in the way they type on keyboards (keystroke dynamics) and then uses this pattern to protect the mobile phone from unauthorized users. For the classification stage, the proposed system can use an algorithm based on either Euclidean distances or Support Vector Machines (SVM). Encouraging results were obtained using the SVM classifier.

Keywords

Biometric recognition, personal identification, keystroke dynamics, smartphone, Euclidean distances,

Support Vector Machine.


Resumo

The term "biometrics" is derived from the Greek words "bio" (life) and "metrics" (to measure). Biometric recognition is concerned with recognizing people based on their characteristics. Automatic biometric recognition systems have become available over the last decades, thanks to significant advances in computing. Until recently, however, dedicated devices were required for biometric recognition. Nowadays, smartphones have considerable processing power, which allows some biometric algorithms to be implemented. The increasing market penetration of smartphones has created a demand for biometric recognition, since these devices contain sensitive personal information. Secure access to sensitive information requires some kind of protection against illegitimate users. Biometric security is therefore a must on these devices, given that traditional PINs (Personal Identification Numbers) can be stolen, forgotten or discovered. Personal characteristics, on the other hand, are unique, cannot be forgotten and are hard to steal, which makes biometric validation superior to PIN usage and creates a demand for biometric validation applications that keep people's information secure. To increase the security of a PIN, it is common to use biometric identification techniques together with it. The system proposed in this dissertation aims to monitor the pattern of mobile phone users while they type on keyboards (keystroke) and then use that pattern to protect the mobile phone from unauthorized users. For the classification stage, the proposed system can use an algorithm based on either Euclidean distances or Support Vector Machines (SVM). Encouraging results were obtained using the SVM classifier.

Palavras-chave (Keywords)

Biometric validation, personal identification, keystroke dynamics, smartphones, Euclidean distances, Support Vector Machine


Table of Contents

Abstract
Resumo
1 Introduction
1.1 Context
1.2 Biometric systems
1.3 Objectives
1.4 Contributions
1.5 Organization of the text
2 Mobile biometrics state of the art
2.1 Biometric recognition techniques
2.1.1 Keystroke dynamics
2.1.2 Face recognition
2.1.3 Iris scan
2.1.4 Voice recognition
2.1.5 Hand geometry
2.1.6 Gait
2.1.7 Handwritten biometric signatures
2.1.8 Choosing the technique
2.2 Keystroke dynamics as a biometric trait
2.2.1 Input sensor
2.2.2 Features
2.2.3 Classification techniques
2.2.4 Keystroke models
2.2.5 Conclusion
3 Proposed keystroke dynamics recognition application
3.1 Architecture
3.2 Capturing user input
3.3 Classification and decision
4 Results
4.1.1 Average key timing measures
4.1.2 Euclidean distances
4.1.3 SVM
4.1.4 Conclusion
5 Using the application
6 Conclusions and further work
6.1 Summary and conclusion
6.2 Further work
7 References


Index of Figures

Figure 1 – Generic biometric system
Figure 2 – Identification vs. Verification (griaulebiometrics, 2014)
Figure 3 – Classification of user authentication approaches
Figure 4 – FRR, FAR and CER
Figure 5 – Generic architecture of a biometric recognition system
Figure 6 – Timing intervals between consecutive key presses (McLoughlin & Mohanavel, 2009)
Figure 7 – Face preprocessing (Tao & Veldhuis, 2006)
Figure 8 – Voice recognition process (Shabeer & Suganthi, 2007)
Figure 9 – Gait cycle phases (physio-pedia, 2015)
Figure 10 – Different types of input
Figure 11 – Various keystroke features
Figure 12 – Classification techniques for keystroke dynamics (Support Vector Machine, Back-Propagation Neural Network, Predictive Adaptive Resonance Theory, Radial Basis Function Network)
Figure 13 – Digraph
Figure 14 – Monograph
Figure 15 – User enrollment process (Awad & Traore, 2013)
Figure 16 – Verification process (Awad & Traore, 2013)
Figure 17 – System architecture
Figure 18 – Dwell time
Figure 19 – Flight time
Figure 20 – Soft keyboard
Figure 21 – Illustrative configuration of hashmap from training labels
Figure 22 – Illustrative configuration of hashmap from train and test features
Figure 23 – SVM linear kernel illustration (Ranga, 2015)
Figure 24 – SVM RBF kernel illustration (openclassroom.stanford.edu, 2015)
Figure 25 – Average dwell time from all users (mxplayer)
Figure 26 – Average flight time from all users (mxplayer)
Figure 27 – Average dwell time from all users (Lisboa2014)
Figure 28 – Average flight time from all users (Lisboa2014)
Figure 29 – Average dwell time from all users (tecnicoLisboa)
Figure 30 – Average flight time from all users (tecnicoLisboa)
Figure 31 – ROC curve for ‘mxplayer’ (Euclidean distances)
Figure 32 – ROC curve for ‘Lisboa2014’ (Euclidean distances)
Figure 33 – ROC curve for ‘tecnicoLisboa’ (Euclidean distances)
Figure 34 – Probability of the claimed user be the true user (mxplayer)
Figure 35 – Probability of the claimed user be the true user (Lisboa2014)
Figure 36 – Probability of the claimed user be the true user (tecnicoLisboa)
Figure 37 – ROC curve for ‘mxplayer’ (SVM)
Figure 38 – ROC curve for ‘Lisboa2014’ (SVM)
Figure 39 – ROC curve for ‘tecnicoLisboa’ (SVM)
Figure 40 – Login screen
Figure 41 – Password choices
Figure 42 – Confirmation box for a user that already exists
Figure 43 – Main screen
Figure 44 – Training screen
Figure 45 – Training accepted screen
Figure 46 – Box to choose an algorithm to proceed with verification
Figure 47 – Imposter message
Figure 48 – Verification screen after a user is approved


Index of Tables

Table 1 – ROC evaluation for the best accuracy achieved (Euclidean distances)
Table 2 – ROC evaluation for the best accuracy achieved (SVM)


List of Acronyms

ARTMAP Predictive Adaptive Resonance Theory

AUC Area Under the Curve

BPNN Back-Propagation Neural Networks

CER Crossover Error Rate

DTW Dynamic Time Warping

EER Equal Error Rate

FRR False Rejection Rate

FAR False Acceptance Rate

FP False Positive

FN False Negative

GPS Global Positioning System

ID Identity

MCS Multiple Classifier System

OS Operating System

PIN Personal Identification Number

RBFN Radial Basis Function Network

RBF Radial Basis Function

ROC Receiver Operating Characteristic

SVM Support Vector Machine

TP True Positive

TN True Negative


1 Introduction

1.1 Context

Mobile phones have a central role in everyday life. Worldwide, the number of active cellphones now exceeds the world population, and the same penetration growth trend is observed in Portugal. Among these, smartphones are taking an increasing share of the market. Smartphones are in fact small computers, with increasingly powerful processors and considerable amounts of memory and storage. They also include touch-sensitive displays capable of providing friendly graphical interfaces. This allows the development of advanced applications covering all aspects of life: from voice communications to internet access, personal entertainment, or even mobile payments and other financial applications. Like any computer, a smartphone is governed by an operating system. Today the dominant operating system in the market is Android, with more than 50% market share, followed by iOS with 42%, with the remainder distributed between Microsoft, BlackBerry and Symbian, according to (Mobile Markting, 2015).

Since smartphones appeared and took charge of our information and communications, there has been a need to enhance the security of these devices. For example, there are applications to track smartphones through their GPS unit and control the device remotely, as well as antivirus and backup applications. Besides this, most people use a pattern (a combination of movements that unlocks the phone screen) or a PIN to access the device; however, both are easy to observe or crack. Nowadays, a few smartphones already offer biometric recognition, such as face (Nexus phones) and fingerprint (iPhone 5S). However, adding a biometric trait to PINs has not been commercially explored. This work explores the way a person types on a smartphone as a biometric trait in order to increase access security.

1.2 Biometric systems

User authentication approaches can be knowledge based, object based or biometric based. Knowledge based authentication establishes the identity of the user when accessing a service (e.g., a website). Object based authentication consists of comparing the attributes of the presented object to what is known about the original object. Finally, biometric based authentication relies on a unique characteristic of the user (e.g., hand, iris, face, keystroke), from which a template is created and stored. When a user later wants to authenticate, the user's traits are compared to the ones stored in the template.

Figure 3 represents these three types of user authentication. The knowledge based type seeks to identify the user by requiring personal information. A good example is the security questions that some websites use for password recovery. Object based authentication is a little different, as it requires a person other than the user to identify the object. Finally, the most important type, biometric authentication, is divided into two categories: physiological, which is associated with physical characteristics of the user, and behavioral, which describes how a user behaves during authentication.

Biometric systems are automated methods that verify or recognize the identity (ID) of a person based on physical, physiological or behavioral characteristics. When combined with traditional security methods, they provide an extra level of security. Examples of biometric characteristics are fingerprints, face, iris, and others that are enumerated in the section entitled ‘Mobile biometrics state of the art’.

Enrollment is the process of collecting biometric data from a user and storing it in the system. Authentication is the identification or verification of the user’s identity by matching the data provided by the user with the data stored in the system. During enrollment, the biometric system stores the user's biometric traits; during authentication, these traits are used to recognize a user who provides a biometric sample. Depending on how the biometric system is designed, it can operate in two different modes, verification and identification, as shown in Figure 2. In identification mode, the user is not required to claim an identity, so the biometric trait provided is matched against the templates of all enrolled users in order to find a match or reject the user. In verification mode, on the other hand, the biometric trait is matched only against the enrollment template of the claimed user.
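The difference between the two modes can be illustrated with a minimal sketch. It assumes that each template is a plain feature vector and that matching uses a Euclidean distance with a fixed threshold; the user names, timing values and threshold below are hypothetical and are not taken from this work.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(sample, claimed_id, templates, threshold):
    """Verification (1:1): compare only against the claimed user's template."""
    return distance(sample, templates[claimed_id]) <= threshold

def identify(sample, templates, threshold):
    """Identification (1:N): search every enrolled template for the closest one."""
    best_id = min(templates, key=lambda uid: distance(sample, templates[uid]))
    if distance(sample, templates[best_id]) <= threshold:
        return best_id
    return None  # nobody is close enough: the user is rejected

# Hypothetical templates: average key timings (ms) for two enrolled users.
templates = {"alice": [110.0, 95.0, 130.0], "bob": [80.0, 140.0, 100.0]}
sample = [108.0, 99.0, 127.0]

print(verify(sample, "alice", templates, threshold=15.0))  # True
print(verify(sample, "bob", templates, threshold=15.0))    # False
print(identify(sample, templates, threshold=15.0))         # alice
```

Verification costs one comparison per attempt, whereas identification grows with the number of enrolled users, which is one reason a verification design suits a single-owner device.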

Enrollment is an important step for the accuracy of the template, so it should not be limited to a one-time step: the user template should keep being updated. As shown in Figure 1, a generic biometric system is composed of five major components: sensor, feature extraction, matcher, decision module and template. The biometric sensor is responsible for scanning the biometric trait of the user, acting as the interface between the user and the authentication system. Next, feature extraction extracts the salient data that distinguishes between different users. During enrollment, the extracted data is stored in a template. The matcher is a module that compares the input with the template and indicates the similarity between the two. Finally, the decision module makes the authentication decision.

Figure 1 – Generic biometric system (block diagram with the blocks Sensor, Feature extraction, Matcher, Template, Decision and Application device)


Figure 2 – Identification vs. Verification (griaulebiometrics, 2014)

Any biometric system will exhibit occasional false acceptance of intruders and false rejection of legitimate users. The corresponding False Accept Rate (FAR) and False Reject Rate (FRR), as well as the Equal Error Rate (EER, also called Crossover Error Rate, CER), the operating point where FAR equals FRR, are important metrics for validating the system. FAR ought to be low, as it specifies the probability that an impostor can use the device; FRR should also be low, since a high value is inconvenient for legitimate users. These parameters are illustrated in Figure 4, which shows the trade-off between FAR and FRR: as the decision threshold is varied, lowering one raises the other.
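As a rough illustration of how these rates depend on the decision threshold, the sketch below computes FAR and FRR for a set of match scores and searches for the threshold where they are closest, which approximates the EER. It assumes that scores are distances (lower means more similar, so an attempt is accepted when its score is at or below the threshold); the score lists are made up for the example and are not results from this work.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when attempts with score <= threshold are accepted."""
    far = sum(s <= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s > threshold for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr

def approximate_eer(genuine, impostor):
    """Scan the observed scores as thresholds; keep the one where FAR and FRR are closest."""
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)
    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return best, far, frr

# Hypothetical distance scores for genuine and impostor attempts.
genuine = [0.8, 1.1, 1.3, 1.9, 2.6]
impostor = [1.7, 2.4, 3.0, 3.5, 4.2]

print(far_frr(genuine, impostor, 1.0))     # strict threshold: (0.0, 0.8)
print(far_frr(genuine, impostor, 3.2))     # lenient threshold: (0.6, 0.0)
print(approximate_eer(genuine, impostor))  # (1.9, 0.2, 0.2)
```

A strict threshold rejects every impostor at the cost of rejecting most genuine attempts, a lenient one does the opposite, and the crossover lies between the two.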

Figure 3 – Classification of user authentication approaches

Figure 4 – FRR, FAR and CER

Besides the FRR and FAR metrics, there are others such as sensitivity, specificity and accuracy. These five metrics are used to assess the obtained results in section 4. Sensitivity, also called true positive rate, measures the proportion of actual positives that are correctly identified, and is complementary to the false negative rate. Specificity, also called true negative rate, measures the proportion of actual negatives that are correctly identified, and is complementary to the false positive rate. Sensitivity and specificity can be calculated according to formulas (1) and (2), respectively. The acronyms TP, TN, FP and FN stand for true positives, true negatives, false positives and false negatives, respectively. Accuracy, which can be calculated using equation (3), assesses how well the system behaves overall, and allows choosing the optimal operating threshold for a system.

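\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{1} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{2} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3} \]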
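The three measures can be computed directly from the confusion counts. The sketch below assumes the standard confusion-matrix definitions of sensitivity, specificity and accuracy, and treats a "positive" as an attempt accepted as the genuine user, so that FAR coincides with the false positive rate and FRR with the false negative rate. The counts are invented for illustration only.

```python
def evaluate(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "far": fp / (fp + tn),                       # 1 - specificity
        "frr": fn / (fn + tp),                       # 1 - sensitivity
    }

# Hypothetical counts: 50 genuine attempts and 100 impostor attempts.
metrics = evaluate(tp=45, tn=90, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts, sensitivity, specificity and accuracy all come out at 0.90, and FAR and FRR at 0.10, which makes the complementarity described above easy to check by hand.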

1.3 Objectives

This work focuses on biometric based authentication using keystroke dynamics on mobile devices, which uses a person’s unique typing pattern to aid in identifying that person. This pattern is difficult to observe, yet it is produced by anyone who is able to press a keypad. Keystroke dynamics is a biometric trait usable for authentication that has not been much exploited so far. However, there is considerable interest in developing it for mobile phones, since smartphone usage is growing rapidly and these devices carry a lot of personal information, currently secured by PINs or patterns that can be stolen or forgotten.

Given the rapid increase observed in the adoption of smartphones and in the corresponding applications market, there is also an opportunity for the development of biometric security applications. Three main types of mobile keyboards need to be considered: numeric, thumb-based (QWERTY) and soft keyboards (touch keyboards). Due to the variety of available keyboards, adapting the keystroke analysis methods originally developed for PC/traditional keyboards to the mobile phone case raises some challenges. The main aspects to take into consideration include:

• Usage of small keys – mobile devices are limited in size, leading to the usage of smaller keyboards. The user tends to make more typing errors and may stop in the middle of a sentence, which raises the challenge of identifying which keystrokes are valid.

• Key shape and response to the applied pressure make keystroke analysis for mobile handsets significantly different from the one performed on traditional keyboards.

• Mobile devices have limited memory and CPU capability, so the algorithms used ought to be simple.

1.4 Contributions

The main objective of this work is to develop an application for Android OS smartphones that performs biometric verification of the user based on keystroke dynamics when entering a password. The developed biometric recognition system follows the general architecture represented in Figure 5. Based on that architecture, the author has designed, implemented and tested a biometric verification system capable of recognizing users. However, as the standard smartphone keyboard lacks some of the functionality required for this work, a custom keyboard had to be developed before proceeding with the remaining work. The majority of the corresponding software implementation has been developed by the author. A deeper explanation of all these steps is provided in section 3.

Figure 5 – Generic architecture of a biometric recognition system (stages: Input, Feature extraction, Classification, Decision)
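The four stages of this architecture can be sketched end to end as follows. This is only an illustration under stated assumptions, not the implementation described in section 3: the input is assumed to be a list of (key, press time, release time) events in milliseconds, dwell time is taken as release minus press of the same key, flight time as the next press minus the previous release, and classification is a Euclidean distance to a stored template. All function names, timings and the threshold are hypothetical.

```python
import math

def extract_features(events):
    """Feature extraction: dwell times followed by flight times (ms)."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + flight

def classify(features, template):
    """Classification: Euclidean distance between the sample and the template."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))

def decide(score, threshold):
    """Decision: accept the user when the distance is small enough."""
    return score <= threshold

# Input: hypothetical key events for the password "pin".
events = [("p", 0, 90), ("i", 150, 230), ("n", 310, 405)]
template = [100, 85, 90, 70, 75]  # hypothetical enrolled timings

features = extract_features(events)
score = classify(features, template)
print(features)                        # [90, 80, 95, 60, 80]
print(decide(score, threshold=25.0))   # True
```

Note that flight time can be negative under this definition when the next key is pressed before the previous one is released.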


1.5 Organization of the text

This work follows in detail the development of a biometric keystroke application for an Android powered mobile device. The text is organized in six sections. The current one introduces the context, the demand for biometric security, biometric systems, the objectives and the organization of the work. Each of the remaining sections focuses on a different step, all of them necessary to achieve the proposed application.

Section 2 presents a general overview of the biometric techniques in order to better understand their behavior. After that, the author provides a more detailed overview of the technique chosen for this work.

Section 3 describes in detail the approach taken to develop the application, covering its architecture, the capture of user input, and classification and decision. The architecture section explains all the essential steps of the coding process, although it does not replace the code itself.

Section 4 targets the evaluation of the data with the proposed algorithms. After that, the performance results are presented and discussed.

Section 5 presents a walkthrough of the application usage.

Section 6 is reserved for conclusions and some further work that can be done.


2 Mobile biometrics state of the art

From law enforcement to military forces, public transportation, border control and commercial shipping authorities, mobile biometrics are quickly becoming essential to speed up the processing of people and goods. Access to business data from mobile devices requires secure authentication, but traditional password schemes based on a mix of alphanumeric characters and symbols are cumbersome and unpopular, leading users to avoid accessing business data on their personal devices altogether (Trewin, et al., 2015).

This section gives an overview of the main solutions currently available for biometric recognition, such as face recognition, iris scan, voice recognition, hand geometry, gait and handwritten signature. It discusses some of the most used biometric recognition approaches, as well as their main advantages and disadvantages.

Section 2.1 is dedicated to presenting the main biometric recognition techniques employed in mobile devices, while Section 2.2 focuses on the main theme of this work, mobile keystroke dynamics, providing more detail about this biometric trait.

2.1 Biometric recognition techniques

Biometric techniques for recognition can rely on information extracted from different

modalities, or traits. The usage of several biometric traits in a mobile environment is discussed in this

section.

2.1.1 Keystroke dynamics

Keystroke dynamics recognition consists of recognizing an individual based on the way he or she types on a mobile keyboard. This is the goal of this work, and the topic is further elaborated in the final part of this chapter. This subsection just introduces the problem and defines the main concepts. In particular, when employing keystroke dynamics as a biometric trait, two major authentication strategies can be employed: static or continuous.

In static biometric authentication, each participant provides his or her biometric features during enrollment. These features are stored in a template. Whenever a person tries to authenticate, he or she provides a new sample of the same biometric feature, and this new input is compared to the ones previously stored in the template. If they are similar, the input matches and the user is validated.

Authentication using static keystroke dynamics is based on measuring the duration of each key press and the time latency between consecutive keystrokes. For enrollment, the user is asked to type a fixed text a number of times, and each time the measurements are stored in a template. When attempting authentication, the user types the text once again; the duration and latency timings are measured and compared against the stored values (Crawford, 2010).
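As a minimal illustration of the static scheme described above, the following Python sketch builds a template as the per-feature mean of a few enrollment samples and accepts a new attempt when its mean absolute deviation from the template stays within a tolerance. The function names, the sample values and the 20 ms tolerance are assumptions made for illustration; they are not taken from (Crawford, 2010) or from the application developed in this work.

```python
from statistics import mean


def enroll(samples):
    """Build a template as the per-feature mean of the enrollment samples."""
    return [mean(column) for column in zip(*samples)]


def verify(template, attempt, tolerance_ms):
    """Accept when the mean absolute deviation from the template is small."""
    deviation = mean(abs(t - a) for t, a in zip(template, attempt))
    return deviation <= tolerance_ms


# Hypothetical timing vectors in milliseconds:
# [duration of key 1, latency between key 1 and key 2, duration of key 2]
samples = [[100, 150, 90], [110, 140, 100], [90, 160, 110]]
template = enroll(samples)

genuine_ok = verify(template, [105, 145, 95], tolerance_ms=20)
impostor_ok = verify(template, [200, 60, 180], tolerance_ms=20)
```

In practice the tolerance would have to be tuned on real data, since it controls how strict the comparison is.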


In continuous biometric authentication, instead of typing a fixed text, the user provides unconstrained textual input (free text), typically over a longer period of time. During enrollment, information is collected over that time on how the user types on the keyboard. During authentication, the features computed from the input text are compared to the ones stored in the template (Awad & Traore, 2013).

Figure 6 represents the timing of a keystroke: its duration is the time elapsed between the instant the key is pressed and the instant it is released.

Figure 6 – Timing intervals between consecutive key presses (McLoughlin & Mohanavel, 2009)

2.1.2 Face recognition

The face is probably the biometric trait most frequently used for recognition purposes. Systems implementing face recognition algorithms are usually composed of face preprocessing, face authentication and information fusion modules.

The face preprocessing module is responsible for the segmentation of an adequate facial

image from the available photo or video footage. This process typically includes three steps, namely

face detection, face registration and illumination normalization.

Face detection can consist of a simple scheme using rectangular binary features and the integral image. There are two classes of methods to achieve face detection: heuristic-based and classification-based. The first class comprises skin color and facial geometry methods. Heuristic-based methods are simple to implement but are not reliable, as they are vulnerable to external changes. Classification-based methods, on the other hand, treat face detection as a pattern classification problem, so they benefit from existing pattern classification resources and can deal with more complex scenarios. However, the patterns to be classified have to cover the exhaustive set of image patches at any location and scale of the input image, so classification-based methods typically have a high computational load (Tao & Veldhuis, 2010).

The next step is to register the face, which is done by combining facial features (Tao & Veldhuis, 2006). It is common to use either holistic or local methods. Holistic methods take advantage of both the global face texture information and the local facial feature information, but they have a relatively high computational load. A local registration method, in contrast, is more direct and faster, as it only uses the locations of local facial features to calculate the transformation (Tao & Veldhuis, 2010).


Illumination normalization, in turn, can follow two methods: one studies the illumination problem itself, while the other works directly on the image pixel values (Tao & Veldhuis, 2010). Finally, after detecting, registering and normalizing the image, comes the verification, which considers two classes: the user class and the impostor class. The overlapping regions of the two classes are classified with the minimum possible error, using the likelihood ratio in the Neyman-Pearson sense.

An example of this process is represented in Figure 7.

Figure 7 – Face preprocessing (Tao & Veldhuis, 2006)

2.1.3 Iris scan

The iris is a biometric trait that exhibits good recognition properties, notably because it does not change with aging.

Most iris recognition methods require infrared illumination to highlight the characteristics of the iris. On mobile phones, however, that is not possible, as their cameras work in the visible spectrum. There are two main approaches to cope with the resulting noisy images in a color iris recognition system: applying image enhancement techniques, or extracting multiple types of features and applying a fusion mechanism (Radu P., 2012).

2.1.4 Voice recognition

Voice activity detection plays an important role in an efficient voice interface between humans and mobile devices. As the user records his or her voice, the speech is digitized and the frequency spectrum of the speech signal is encoded and stored (Shabeer & Suganthi, 2007). An illustration of this process is shown in Figure 8.

When a person later starts using the cell phone, the speech spectrum can be coded for recognition purposes and compared with the stored coded spectrum. To save computational power, a voice trigger system can be implemented using a keyword-dependent speaker recognition technique (Lee, Chang, Yook, & Kim, 2009). The goal of this component is to avoid false activations of the voice recognition.

Figure 8 - Voice recognition process (Shabeer & Suganthi, 2007)

2.1.5 Hand geometry

Hand geometry recognition not only performs well in identifying the user but is also known to be a non-invasive biometric technique. This approach works well with low-resolution cameras, which is an advantage for mobile phones. However, it can be difficult to distinguish the hand from the background, due to illumination, lack of contrast between hand and background, or even blur effects within the image, which makes image segmentation a challenge on mobile phones. The segmentation step is essential in hand biometrics, given that the subsequent feature extraction depends on an accurate and precise isolation of the hand; otherwise, template features could be inappropriately extracted, reducing the identification accuracy (Sierra, Casanova, Ávila, & Vera, 2009).

In (Franzgrote, et al., 2011), flash illumination is used to enhance the hand silhouette while darkening the background, together with an effective method for the extraction and representation of palm line orientation information.

2.1.6 Gait

Gait corresponds to the particular manner of walking of a subject. It has two basic components, the swing phase and the stance phase. The stance phase is when a foot is in contact with the ground, and the swing phase is when that foot is in the air for limb advancement. Each person has a specific stride, which makes gait a signature unique to that person. The most basic form of gait analysis is step detection and characterization. An illustrative image of a gait cycle is included in Figure 9.


Figure 9 - Gait cycle phases (physio-pedia, 2015)

As smartphones are equipped with various sensors, such as gyroscopes and accelerometers, the gait cycle is often measured with a combination of sensors. This is the case in (Minh Thang, Quang Viet, Dinh Thuc, & Choi, 2012), where data acquisition is performed with a built-in accelerometer while the user walks naturally. However, to save battery power the sampling rate is low, and the time intervals between two consecutive acceleration values are not equal. Moreover, the gait of each individual differs from day to day (D.S., M.S., S., & J.N., 2012), which makes this method not very reliable for biometric validation. Finally, gait analysis on consumer devices must overcome several difficulties that specialized gait sensors do not face, for example, compensating for the different positions in which the mobile device may be placed during motion. For the reading to be accurate, the mobile phone always has to be kept in the same position.

2.1.7 Handwritten biometric signatures

Although handwritten signatures are well deployed on dedicated devices, they are not on mobile devices. The growth of smartphones and tablets with touch screens has created a new opportunity to migrate handwritten signature authentication to mobile devices. However, some of the signals captured by traditional on-line signature systems are not available on portable devices, which makes algorithm implementation a challenge.

In (Blanco-Gonzalo, Miguel-Hurtado, Mendaza-Ormaza, & Sanchez-Reillo, 2012), 7 different devices with different characteristics were tested. Only the time, X and Y signals were used. This restriction comes from the fact that some devices do not capture pressure, which makes the recognition less robust. Regarding the results, the authors concluded that visual feedback from the signature is a major performance parameter, and that small devices offer the best CERs, on average. Capacitive screens performed better than resistive screens.

Moreover, in (Mendaza-Ormaza, Miguel-Hurtado, Blanco-Gonzalo, & Jose Diez-Jimeno, 2011), 4 different devices were used, all running the Android OS and including both resistive and capacitive screens. Due to hardware constraints, azimuth and inclination angles were not captured. In addition, the precision of the pressure values obtained from resistive screens was really low, with a great variation between devices, making them useless for the implementation of an algorithm. Capacitive screens, on the other hand, provide information about the size of the surface in contact with the touch screen. Given that, 3 different temporal signals were used for each type of screen: x-axis, y-axis and size signals for capacitive screens, and x-axis, y-axis and pressure signals for resistive screens. SVM and DTW algorithms were used for signature classification.

2.1.8 Choosing the technique

After this brief overview of several biometric traits that may be used to achieve a stronger authentication of smartphone users, the author developed a particular interest in techniques using keystroke dynamics as a biometric trait. The reason for this choice is that people often have to write messages on their smartphones. Moreover, adding an extra layer of security while typing passwords or PINs seemed an excellent way to protect the information within the device.

2.2 Keystroke dynamics as a biometric trait

As discussed above, this work is about using keystroke dynamics as a biometric trait. Therefore, this section provides a more detailed overview of the published work on this topic, to lay the foundations for the subsequent development of a smartphone-based authentication system based on keystroke dynamics analysis.

2.2.1 Input sensor

A system relying on the analysis of keystroke dynamics needs to capture the relevant

information by accessing the user typing characteristics. For that purpose the nature of the keyboard

used as an input sensor needs to be known. As summarized in Figure 10, the input can be done via a

soft or a hard keyboard. A hard keyboard is an external input device used to type data into some sort

of computer system whether it be a mobile device, a personal computer, or another electronic

machine. A keyboard usually includes alphabetic, numerical and common symbols used in everyday

writing. On the other hand, a soft keyboard is a system that replaces the hard keyboard on a

computing device with an on-screen image map. These keyboards are typically used to enable input

on a handheld device so that a keyboard does not have to be carried with it.

For the purpose of this work, only soft keys are considered, as smartphones have software keyboards. The touch screens used in these devices to display the keyboard and receive the corresponding input can be either resistive or capacitive. Resistive screens work based on finger pressure, so the pressure applied on the screen can be read. Capacitive screens, in turn, detect anything that is conductive, so they can read the size of the finger surface in contact with the screen.


[Figure 10 diagram: Input is divided into hard keys (desktop keyboard, hard mobile keyboard) and soft keys (mobile keyboard on a capacitive or resistive screen).]

Figure 10 – Different types of input

2.2.2 Features

A feature is a distinctive attribute or characteristic of something. Every human has different characteristics or attributes, and hence it is possible to distinguish, i.e., identify, them. With that in mind, the features considered in this work are attributes such as the typing rhythm of the user.

There are different methods and metrics upon which keystroke analysis can be based (Shanmugapriya & Padmavathi, 2009):

• Static at login: a known keyword, phrase or predetermined text is captured and then compared against stored typing patterns.
• Periodic dynamic: the user typing pattern is captured during a part of a logged session and then compared against stored typing patterns to determine deviations.
• Continuous dynamic: similar to periodic dynamic, but the authentication is done during the entire logged session.
• Keyword specific: an extension of continuous or periodic dynamic, but related to specific keywords.
• Application specific: continuous or periodic dynamic applied to a specific application.
• Keyword latency: considers the overall latency for a complete word.

Some additional features can be considered when using smartphones and tablets:

• Pressure during typing.
• Fingertip size.
• Physics of the mobile device: how the user holds the device and which is the preferred hand.

Figure 11 proposes a compact representation organizing the information presented above.


[Figure 11 diagram: Feature extraction is divided into Static (template, specific times) and Continuous (free text, whole session). The labels under these branches include static at login, periodic dynamic, keyword specific, keyword latency, continuous dynamic, trigraph latency, key press duration, key dwell time, pressure during typing and finger size.]

Figure 11 – Various keystroke features

2.2.3 Classification techniques

Classification techniques enable a classifier to identify to which of a set of categories a new observation belongs. This is possible by training the algorithm on a training set of data containing observations whose category membership is known. A category is characterized by the set of features that defines it. An example is assigning a password input to the true user class or to the false user class.

In this context, there are two main classification approaches followed for keystroke analysis: statistical techniques and neural network techniques, or a combination of both (Karman & Krishnaraj, 2010). Both need a matcher and stored data to allow the processing of the keystroke timings. Figure 12 illustrates some classification techniques.

Some of the methods commonly applied to keystroke analysis are:

• Euclidean or Manhattan distance measures between two vectors of typing characteristics. In addition, total time periods and pressure can be measured and stored as a template.
• SVM, which separates vector samples in $\mathbb{R}^n$. Each feature corresponds to one coordinate $x_i$ of the sample vectors. The goal is to design a hyperplane that classifies all training vectors into two or more classes while leaving the maximum margin to all classes.
• Pattern recognition and neural network methods, which comprise fuzzy ARTMAP (Predictive Adaptive Resonance Theory), RBFN (Radial Basis Function Network), BPNN (Back-Propagation Neural Networks) and Bayes’ rule algorithms.
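The two distance measures mentioned in the list above can be sketched in a few lines of Python. The vectors below are hypothetical timing vectors in milliseconds, chosen only to make the arithmetic easy to follow.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared per-feature differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute per-feature differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical stored template and new attempt (milliseconds).
template = [100.0, 150.0, 100.0]
attempt = [103.0, 146.0, 100.0]

d_euclidean = euclidean_distance(template, attempt)  # sqrt(9 + 16 + 0)
d_manhattan = manhattan_distance(template, attempt)  # 3 + 4 + 0
```

The Euclidean distance penalizes a single large deviation more heavily than the Manhattan distance, which weighs all deviations linearly.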

As an example, one can consider the bio password approach (Karman & Krishnaraj, 2010), where 3 statistical measurements (mean, standard deviation and median) are submitted to a feature selection algorithm. After that, a classification step aims to find the class closest to the pattern being classified.


The essential features to be used in the classification step are keystroke timings, i.e., the timings of the press and release events of successive keystrokes. The time between the press and release events of the same key is called dwell time, while the time between the release event of a key and the press event of the next key is called flight time. The template to be used for recognition is constructed based on these concepts. Template selection refers to the process of determining, from a given set of available biometric acquisitions, which are the best suited to represent the collected data and the statistics of the considered users’ biometrics (Maiorana, Campisi, González-Carballo, & Neri, 2011). For example, the mean is used because typing biometrics are behavioral, so the collected samples will not be consistent. As a result, the trend of the samples is used for authentication. This can be done following different approaches.
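The dwell and flight time definitions above translate directly into code. The sketch below assumes that each keystroke is available as a (key, press time, release time) tuple in milliseconds; this event layout and the values are illustrative assumptions.

```python
def dwell_times(events):
    """Dwell time: release time minus press time of the same key."""
    return [release - press for _key, press, release in events]


def flight_times(events):
    """Flight time: press time of the next key minus release time of the
    current key. It can be negative if the next key is pressed before the
    current one is released."""
    return [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]


# Hypothetical events for the first three keys of a password:
# (key, press time in ms, release time in ms)
events = [("m", 0, 95), ("x", 180, 260), ("p", 340, 430)]

dwells = dwell_times(events)
flights = flight_times(events)
```

A password with n characters yields n dwell times and n - 1 flight times, which together can form the timing vector used for classification.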

A simple approach is used in (McLoughlin & Mohanavel, 2009), where the variance is used to determine the statistical dispersion of the samples. Fixed weights are assigned to each variance value, with the highest weight assigned to the smallest variance. Three verifiers are used to determine the authenticity of the user: press timings, release timings and overall timings. Each verifier has mean values, time durations and weighted variances that are used for authentication.
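One possible reading of the weighting rule above is sketched below in Python: features are ranked by variance, and a fixed set of weights is handed out so that the smallest variance receives the largest weight. The weight values and sample data are assumptions, and the sketch is not a reproduction of the procedure in (McLoughlin & Mohanavel, 2009).

```python
from statistics import pvariance


def assign_fixed_weights(variances, weights):
    """Give the largest fixed weight to the feature with the smallest variance."""
    if len(variances) != len(weights):
        raise ValueError("one weight is needed per variance value")
    # Feature indices sorted from smallest to largest variance.
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    # Fixed weights sorted from largest to smallest.
    ranked = sorted(weights, reverse=True)
    assigned = [0] * len(variances)
    for rank, index in enumerate(order):
        assigned[index] = ranked[rank]
    return assigned


# Hypothetical enrollment samples: one row per attempt, one column per feature.
samples = [[100, 150, 90], [102, 170, 95], [98, 130, 85]]
variances = [pvariance(column) for column in zip(*samples)]
weights = assign_fixed_weights(variances, [1, 2, 3])
```

Here the first feature is the most stable and receives weight 3, while the second, most dispersed feature receives weight 1.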

A more complex approach is used in (Karman & Krishnaraj, 2010) and (Maiorana, Campisi, González-Carballo, & Neri, 2011). In the first reference, the mean, standard deviation and median are calculated for the features. The next step is to select a subset of features through stochastic algorithms, which is then classified to find the best match. In the second reference, Euclidean and Manhattan distances are used to compute the distances between keystrokes. Then, to characterize the keystroke variability of each user, four statistical values are computed for each latency feature. After that, an algorithm is used to select the template to apply to keystroke dynamics. The authentication step is done by comparing the current acquisition with the reference ones (stored and computed during training).
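The per-feature statistics mentioned above (mean, standard deviation and median) can be computed with the Python standard library, as in the following sketch. The sample data and the choice of the sample standard deviation are assumptions; the feature selection and classification stages of the cited works are not reproduced here.

```python
from statistics import mean, median, stdev


def feature_statistics(samples):
    """Compute mean, sample standard deviation and median for each feature.

    `samples` holds one row per typing attempt and one column per feature.
    """
    stats = []
    for column in zip(*samples):
        stats.append(
            {"mean": mean(column), "std": stdev(column), "median": median(column)}
        )
    return stats


# Hypothetical attempts: [dwell time of key 1, flight time from key 1 to key 2]
samples = [[100, 150], [110, 140], [90, 160], [100, 150]]
stats = feature_statistics(samples)
```

The resulting list holds one dictionary per feature, which could then feed a feature selection step.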

Even though these models performed well, they are only suitable for alphanumeric passwords, not for free text. Due to the many word combinations possible in free text, it would be necessary to enroll all words before testing.

Some recent studies have begun to use neural networks as a pattern classification method. Common approaches include Feed-Forward Multilayer Perceptron Networks, Radial Basis Function Networks and Generalized Regression Networks. However, some mobile devices lack the computing power necessary to employ neural networks when the processing is done on the device itself. These networks work with monographs, digraphs and, in general, n-graphs, each describing the human behavior in performing the corresponding action. A monograph represents the action of pressing a key on the keyboard. A digraph represents a typing action performed by the user from one specific key to another key on the keyboard. As shown in Figure 14, the monograph network consists of two layers, with one input node representing the mapped key code and one output node representing the fly time associated with the input key. In Figure 13, the digraph network consists of two layers, with two input nodes representing the from and to mapped keys, while the output layer consists of one node which represents the fly time of the input digraph (Awad & Traore, 2013).


[Figure 12 diagram: Classification, with the labels ARTMAP, SVM, BPNN, RBFN, Bayes algorithms, standard measures, Euclidean, fixed weights and Manhattan distances.]

Figure 12 – Classification techniques for keystroke dynamics (SVM: Support Vector Machine; BPNN: Back-Propagation Neural Network; ARTMAP: Predictive Adaptive Resonance Theory; RBFN: Radial Basis Function Network)

[Figure 13 diagram: neural network with an input layer (From and To keys), a hidden layer (nodes 1, 2, 3, ..., n) and an output layer (Time).]

Figure 13 – Digraph


[Figure 14 diagram: neural network with an input layer (Key), a hidden layer (nodes 1, 2, ..., n) and an output layer (Time).]

Figure 14 – Monograph

2.2.4 Keystroke models

Keystroke authentication models can be classified as either static or continuous. As mentioned previously, static authentication refers to keystroke analysis performed only at specific times, such as during a login process. An example is the PIN model (Karman & Krishnaraj, 2010), where the PIN is entered by the user several times during enrollment. The user timing vector is captured and enrolled during keystroke acquisition. Other keystroke features are extracted, and their mean, standard deviation and median are calculated and given as input to the feature subset selection.

Continuous authentication performs the same analysis but during the whole session. This method also makes it possible to detect user substitution after a successful login. The free-text model is a continuous authentication system, which continuously looks for the presence of the authorized user. This is done by analyzing the typing rhythms that users show during their normal interaction with a computer. Data collection takes a long time, due to the many possible combinations of words. In (Sim & Janakiraman, 2007), word-specific digraphs are constructed from the most commonly used words, due to sample dispersion. Nevertheless, the achieved results are not very accurate: for the best sequences the accuracy is 80% or more.

On the other hand, very good accuracy was achieved in (Awad & Traore, 2013), using monograph and digraph analysis and a neural network to predict missing digraphs. The technique assumes that it is possible to enroll the user by covering the most frequently occurring keys (or most frequently occurring monographs), but not all expected digraphs. It further assumes that it is possible to approximate the remaining digraphs based on the relation between the monitored ones. However, all missing monographs are ignored during the analysis. In addition, the model is valid for a diversity of keyboards. A key mapping technique sorts the key codes based on their associated average time and maps them in the corresponding order. This assists the neural network in building and approximating the relation between the keys, based on the behavioral distance between them expressed in time. Peirce’s criterion is used for the elimination of outliers. Two neural networks (Mono and Di) are trained for each of the enrolled users. The enrollment process is shown in Figure 15 and the verification process in Figure 16.
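The key mapping step described above (sorting key codes by their associated average time and mapping them to their position in that order) can be sketched in Python as follows. The function name, the tie-breaking rule and the observed times are assumptions for illustration; the actual procedure in (Awad & Traore, 2013) may differ in its details.

```python
from statistics import mean


def key_order_mapping(times_by_key):
    """Map each key to its rank when keys are sorted by average time.

    Ties are broken by the key itself so that the result is deterministic.
    """
    averages = {key: mean(times) for key, times in times_by_key.items()}
    ordered = sorted(averages, key=lambda k: (averages[k], k))
    return {key: rank for rank, key in enumerate(ordered, start=1)}


# Hypothetical observed times (ms) for three keys of one user.
observed = {"a": [90, 110], "e": [70, 80], "t": [120, 130]}
mapping = key_order_mapping(observed)
```

With this mapping, keys that behave similarly in time receive neighboring order values, which is the property the neural network is said to exploit.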

The flight time for a missing digraph is obtained from the output of the trained digraph neural network. The digraph neural network takes as input the DKO (Digraph Key Order) of the ‘from’ and ‘to’ keys of a missing digraph and returns as output an estimate of the corresponding fly time. The neural network architecture remains the same for all users, although the weights for each key are user specific. For 53 users, in a heterogeneous experiment, a FAR of 0.0152% and a FRR of 4.82% were achieved. Neural networks require a high level of data processing, which is difficult for a mobile device, although the fast advance in technology may make it feasible on a mobile phone. Moreover, this approach does not have an adaptive enrollment scheme, so the method cannot know whether the user has changed his or her behavior.

Figure 15 - User enrollment process (Awad & Traore, 2013)


Figure 16- Verification process (Awad & Traore, 2013)

Finally, a decision has to be made on whether the classification is accepted or not. To do that, a compromise has to be reached between the FAR and the FRR. These two metrics can be visualized on a ROC curve, which shows the trade-off between them. In general, the decision is based on a threshold, which determines how close to a template the input needs to be for it to be considered a match. Taking the threshold as the minimum similarity score required for a match, if the threshold is reduced the FAR increases while the FRR decreases. Conversely, if the threshold is raised the opposite happens.
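The trade-off between FAR and FRR can be made concrete with a small Python sketch. It assumes that the matcher outputs a similarity score and that an attempt is accepted when the score is greater than or equal to the threshold; with a distance-based matcher the comparison would be reversed. The scores below are invented for illustration.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for similarity scores and an acceptance threshold.

    An attempt is accepted when its score is >= threshold.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Hypothetical similarity scores in the range [0, 1].
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.5, 0.4, 0.65, 0.3]

loose = far_frr(genuine, impostor, threshold=0.6)
strict = far_frr(genuine, impostor, threshold=0.75)
```

Evaluating the function over a range of thresholds gives the points of the ROC curve, from which an operating point can be chosen.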

2.2.5 Conclusion

In general, neural networks tend to produce better results than statistical methods, although their performance is highly variable, since the number of layers and the number of neurons per layer directly influence the quality of the results. As the number of layers and neurons increases, so does the complexity of the network and thus the amount of time required to process results. Also, each time a user is added the neural network must be retrained, which further increases the processing power required to use these methods. So, despite the lower-quality results of statistical methods, there are still areas where they may be superior, such as when the available processing power is limited. However, statistical classifiers do not provide a strong enough level of pattern classification to support the needs of some authentication systems.


3 Proposed keystroke dynamics recognition application

An application, or app, is software designed to run on smartphones and other mobile devices. In this case, a new app is created for the Android operating system. It is developed in the Java programming language using the Android Software Development Kit (SDK), which includes a comprehensive set of development tools, including a debugger, a set of libraries, an emulator, documentation, sample code and tutorials.

3.1 Architecture

As illustrated in Figure 2, a biometric system can be designed for identification or verification. In this work, as the smartphone is a personal device, the application is developed for verification. In verification mode, the features extracted from the selected biometric trait are matched only against the enrollment template of the corresponding user. The architecture of the designed system is illustrated in Figure 17.

[Figure 17 diagram: Input (fixed text), Feature extraction (key times), Classification (Euclidean measures, SVM) and Decision (threshold), with two Database blocks.]

Figure 17 – System architecture

In the first step, the input is an alphanumeric password, since the text written on smartphones is relatively short, which makes free-text input less suitable for these devices. As the app deals with keystroke dynamics as a biometric trait, the algorithms used in the classification step need training. For that reason, the developed application has two operation modes. The first is the training mode, in which the user enters the chosen password to allow the system to capture the corresponding features and store them in a database. For the purpose of this work, to be able to compare different types of passwords, three passwords have been chosen: mxplayer, Lisboa2014 and tecnicoLisboa. These passwords have different characteristics: the first has only lowercase letters, the second has uppercase and lowercase letters as well as numbers, and the third has uppercase and lowercase letters. The second mode, verification, is the mode where the actual input (of the true user or of an intruder) is compared against the data gathered during training. The app allows different users to register and choose one password. Figure 18 and Figure 19 illustrate the two time features that are analyzed, dwell time and flight time.


[Figure 18 diagram: dwell time, spanning from key press to key release.]

Figure 18 – Dwell time

[Figure 19 diagram: flight time, between key i and key i+1.]

Figure 19 – Flight time

3.2 Capturing user input

Each time a key is pressed or released, the time metrics are measured and stored in
a SQLite database, which is available through an Android library. Each training operation has a
different id, and each id is associated with the set of key codes, dwell times and flight times for the
chosen password. As the application records key timings while the user is typing the password, an
attempt containing a typing mistake is not considered valid: it would corrupt the training data, since
the key timings would be greater than expected.
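The storage step can be sketched with Python's built-in `sqlite3` module. The table layout, the column names and the `store_attempt` helper are assumptions made for illustration only; the schema used by the Android application is not given in the text.

```python
import sqlite3

# Hypothetical schema: one row per key of each training attempt.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE keystrokes (
        training_id INTEGER NOT NULL,
        position    INTEGER NOT NULL,
        key_code    INTEGER NOT NULL,
        dwell_ms    INTEGER NOT NULL,
        flight_ms   INTEGER NOT NULL,
        PRIMARY KEY (training_id, position)
    )
    """
)


def store_attempt(conn, training_id, typed, expected, key_codes, dwell, flight):
    """Store one training attempt; reject it if the typed text has a mistake."""
    if typed != expected:
        return False
    rows = [
        (training_id, i, k, d, f)
        for i, (k, d, f) in enumerate(zip(key_codes, dwell, flight))
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO keystrokes VALUES (?, ?, ?, ?, ?)", rows)
    return True


ok = store_attempt(conn, 1, "abc", "abc", [97, 98, 99], [85, 80, 81], [0, 175, 135])
bad = store_attempt(conn, 2, "abx", "abc", [97, 98, 120], [90, 70, 60], [0, 300, 150])
count = conn.execute("SELECT COUNT(*) FROM keystrokes").fetchone()[0]
print(ok, bad, count)  # True False 3
```

Only the correctly typed attempt is stored, mirroring the rule that attempts with typing mistakes are discarded.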

To be able to capture key times, the developed application implements two functions,
OnKeyDown and OnKeyUp, which handle key down and key up events, respectively, when they occur.
However, in conjunction with the text editor from the Android library, the standard keyboard only
generates key events for number keys, while alphabetic keys do not generate any events.

To circumvent this, a new soft keyboard had to be developed and included in the application.
This keyboard generates the events with the key timings and provides information about the
pressed key codes. While the keyboard is active, it is always ready to receive a key press from any
key. When a key is pressed, it is identified and the necessary action is applied. These actions are
applied through the binding of the OnKeyboardActionListener interface to the keyboard view, which
is illustrated in Figure 20. This interface defines a listener for virtual keyboard events, with methods
such as onPress, onRelease and onKey. The first two, as the names suggest, are called when a key
is pressed or released. These methods are responsible for sending a key event with the system
clock time, in milliseconds, and the key code of the pressed/released key. Then, onKey is
responsible for sending a key press to the listener, which results in the character being written in the
EditText.


Through the steps described above, the OnKeyDown and OnKeyUp functions are able to
capture the key timings and key codes necessary for keystroke dynamics analysis. In addition, the
screen also detects motion. When the keyboard pops up that functionality is not available by default,
but in the same way that it is possible to bind the OnKeyboardActionListener interface to the
keyboard view, it is also possible to bind the onTouchListener interface. This allows implementing the
onTouch method, which is called when a touch event is dispatched to the keyboard, making it
possible to capture pressure and size. Capacitive touch screens detect size much more than
pressure, returning a normalized value between 0 and 1. However, in this work this method is
implemented but not tested due to lack of time: it was only achieved after all the data had been
analyzed, and testing it would require creating new databases and repeating the whole analysis.
Moreover, when a key event is detected by OnKeyDown, the interface follows the event until
OnKeyUp, so that the release time of the key is not missed.

Figure 20 – Soft keyboard

3.3 Classification and decision

When the user enters the verification mode, one of two algorithms can be chosen: Euclidean
distances or SVM. With either of them, the user has to enter the same password one more time, so
that the system can verify whether the user being tested is the true user.

When using Euclidean distances, the key times of the password entered after choosing the
algorithm, that is, those of the user attempting authentication, are loaded into a vector with twice the
length of the given password, in which the odd positions contain dwell times and the even positions
contain flight times. Another vector contains the corresponding values of a training password from
the registered user.

Each time a Euclidean distance is calculated, according to the formula in (4), the result is
stored in a vector that holds all the distances between the entered password and each of the
training passwords from that user. Then, based on a threshold, the algorithm decides whether the
user is valid or not. In section 4, the choice of the threshold is explained for both algorithms.


$$ d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (4) $$
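A minimal sketch of the Euclidean verification step follows. The text does not say how the distances to the individual training passwords are combined into one decision, so the mean distance is used here purely as an assumption; the function names and the threshold value are also illustrative.

```python
import math
from typing import List, Sequence


def feature_vector(dwell: Sequence[float], flight: Sequence[float]) -> List[float]:
    """Interleave dwell and flight times: [d1, f1, d2, f2, ...]."""
    vec: List[float] = []
    for d, f in zip(dwell, flight):
        vec.extend((d, f))
    return vec


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equally long feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def verify(attempt, templates, threshold) -> bool:
    """Accept when the mean distance to the templates is below the threshold.

    Assumption: the mean is only one possible way of combining the
    per-template distances; the minimum or median would also work.
    """
    distances = [euclidean(attempt, t) for t in templates]
    return sum(distances) / len(distances) < threshold


templates = [
    feature_vector([80, 90], [0, 150]),
    feature_vector([84, 86], [0, 170]),
]
genuine = feature_vector([83, 90], [0, 154])
impostor = feature_vector([60, 120], [0, 400])
print(verify(genuine, templates, threshold=50.0))   # True
print(verify(impostor, templates, threshold=50.0))  # False
```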

Regarding SVM, the implemented algorithm uses the ‘libsvm’ library for training and testing of
the features. This approach is very different from the first, as it needs samples from the true user and
samples from a false user, typed with the same password. To do that, the data is loaded into one
simple hashmap for the training labels, and two nested hashmaps for the train and test features. A
hashmap has a number of “boxes” which are used to store key-value pairs. Each “box” has a unique
number, and when a key-value pair is stored into the map, the hashmap looks at the hash code of
the key and stores the pair in the “box” whose identifier is the hash code of the key. Figure 21
represents the hashmap of training labels, where the value of each “box” represents the assigned
class: number 1 represents the true user class and ‘–n’ the false user(s) class(es).

There is a correspondence between the keys of Figure 21 and Figure 22. The same key in each
of the hashmaps corresponds to the same trained password; in other words, if the key equal to zero
holds a true user sample in Figure 22, the same key in Figure 21 has the value 1, which represents
the true user. In Figure 22, each key is associated with another hashmap, which holds the dwell and
flight times for that password.

Figure 21 - Illustrative configuration of hashmap for training labels (HashMap with keys 0, 1, 2, ... mapped to the value 1 and keys 26, 27, ... mapped to the value -1)


Figure 22 - Illustrative configuration of hashmap for train and test features (outer HashMap with keys 0, 1, 2, ..., n; each key holds an inner HashMap whose keys 1, 2, 3, 4, ..., n map to D time 1, F time 1, D time 2, F time 2, ...)
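The relation between the two structures can be sketched with Python dictionaries standing in for the hashmaps. The sample values and the `to_training_set` helper are hypothetical; only the layout (a label per sample key, and a nested map from feature index to time value under the same key) follows the description above.

```python
# Key = index of a training sample; the same key is used in both maps.
labels = {0: 1, 1: 1, 2: -1}  # 1 = true user, -1 = false user

# Inner dict: feature index (starting at 1) -> time in ms,
# ordered as dwell 1, flight 1, dwell 2, flight 2, ...
features = {
    0: {1: 85.0, 2: 0.0, 3: 79.0, 4: 175.0},
    1: {1: 88.0, 2: 0.0, 3: 81.0, 4: 160.0},
    2: {1: 66.0, 2: 0.0, 3: 78.0, 4: 358.0},
}


def to_training_set(labels, features):
    """Pair each label with its feature vector, matched by sample key."""
    ys, xs = [], []
    for key in sorted(labels):
        ys.append(labels[key])
        xs.append([features[key][i] for i in sorted(features[key])])
    return ys, xs


ys, xs = to_training_set(labels, features)
print(ys)     # [1, 1, -1]
print(xs[2])  # [66.0, 0.0, 78.0, 358.0]
```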

Before the data is copied into the hashmap, it has to be normalized, which is a common
procedure in machine learning. Here the normalization consists in converting each feature vector
into a unit vector, so that its components lie between 0 and 1. This trains the SVM on relative values
of the features, not on their magnitudes. Normalization is needed because of the way the SVM
optimization problem is defined: features with higher variance have a greater effect on the margin.
Usually this is not desirable, as the classifier should be 'unit invariant'. The procedure is done by
dividing each value by the norm of the vector. The norm is calculated using the formula in (5).

$$ \lVert \mathbf{x} \rVert = \sqrt{\sum_{i=1}^{n} x_i^2} \qquad (5) $$
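As a small worked example of this normalization, assuming the norm is the usual Euclidean norm, the sketch below divides each component of a vector by that norm. The handling of an all-zero vector is an added safeguard, not something described in the text.

```python
import math


def normalize(vec):
    """Scale a vector to unit length by dividing by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return list(vec)  # an all-zero vector is returned unchanged
    return [v / norm for v in vec]


unit = normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

After normalization the sum of the squared components is 1, so only the proportions between the key times remain.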

After this, the SVM nodes are created, where each key-value pair from Figure 22 represents
the index and the value of a node. Moreover, each training vector, which is composed of the nodes
that belong to one password, is labeled with the corresponding value from Figure 21, in order to
distinguish the classes. For training, there are some parameters that can be set: the kernel type,
parameter C and gamma. The kernel can be linear or nonlinear, as illustrated in Figure 23 and Figure
24, respectively. The decision on which one to use takes some facts into consideration.
Typically, the best possible predictive performance of a nonlinear kernel is better than, or at least as
good as, that of the linear one. It has been shown that the linear kernel is a degenerate version of the
RBF (Gaussian) kernel, which is nonlinear, hence the linear kernel is never more accurate than a
properly tuned RBF kernel. In practice, the exception is when the number of features is large relative
to the number of samples (Ng., 2015). In that case it is good enough to use the linear kernel,
because nonlinear kernels do not score better than the linear one. In the case of this work, the
number of features is small, as is the number of samples. So, to sum up, the RBF kernel is the
chosen one. With this kernel, there are


two parameters that can be selected, C and gamma. Parameter C tells the SVM optimization
how much to avoid misclassifying each training example. For large values of C, the optimization
chooses a smaller-margin hyperplane if that hyperplane does a better job of getting all the training
points classified correctly. The opposite happens for small values of C: the optimization looks for a
larger-margin hyperplane, even if that hyperplane misclassifies more points. On the other hand,
parameter gamma should be chosen according to the magnitude of the pairwise distances between
the data points. If gamma is very small, the RBF kernel is very wide, meaning that all the data points
could fall into one class. However, if gamma is very large, the RBF kernel is very narrow, meaning
that, probably, all training vectors will end up as support vectors. Neither of these extreme situations
is desirable, so a compromise between the two should be found. The chosen parameters for each
dataset are specified in section 4.1.3.
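The effect of gamma can be seen directly on the RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2). The sketch below evaluates it for one pair of vectors and three gamma values; the vectors and the gamma values are arbitrary examples, not the ones used in this work.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


a = [0.60, 0.80]
b = [0.80, 0.60]
for gamma in (0.01, 1.0, 100.0):
    # Small gamma: value close to 1 (wide kernel, points look alike).
    # Large gamma: value close to 0 (narrow kernel, points look unrelated).
    print(gamma, rbf_kernel(a, b, gamma))
```

A kernel value close to 1 for every pair of points makes the classes hard to separate, while a value close to 0 for every pair makes each training vector its own island, which is why an intermediate gamma is usually sought.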

The final results are obtained with a function from “libsvm” which predicts a probability for each
of the classes. Finally, based on a threshold, the user is validated or rejected.

Figure 23 - SVM linear kernel illustration (Ranga, 2015)


Figure 24- SVM RBF kernel illustration (openclassroom.stanford.edu, 2015)

4 Results

In this section the results from the two implemented algorithms are presented, together with
some discussion of the final results. One drawback of the analysis is that the database does
not have many users, which makes generalization rather limited. However, there are 3 different
passwords. These passwords were chosen carefully, so that each one has different characteristics:
one is lowercase only, another includes lowercase, uppercase and numbers, while the third one
includes lowercase and uppercase.

To analyze the performance of the algorithm and calculate the best operation thresholds for
each user, a ROC curve should be plotted. A ROC curve is a graphical plot that illustrates the
performance of a classifier as its threshold varies. The curve is obtained by plotting the true
positive rate against the false positive rate at various threshold settings.

In the case of this work, thresholds are represented by distances or by the probability of the
claimed user being the true user. To plot these curves it is necessary to enroll different users and
test them against each other, to cover the true and false cases. All ROC curves were plotted using
the XLSTAT software, which is an add-on for Microsoft Excel. This allows Excel to plot ROC curves
from the correct and incorrect values of the users. Subsequently, the threshold is chosen based
on the best accuracy for that ROC curve. However, the threshold should be chosen according to the
goal of the verification. If the goal is to secure sensitive information, a distance threshold should be
lower, to avoid false positives (impostors being accepted). On the other hand, the threshold might be
higher, so that the user does not have to enter the password more than once. If a system has its
ROC curve below or along the line on which the true positive rate equals the false positive rate, it is
no better than a random system. On the other hand, a perfect system would have an AUC (Area
Under the Curve) equal to 1. This means that the ‘curve’ goes along the true positive rate axis and,
when reaching the value 1, continues until the end of the false positive rate axis.
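The ROC curves in this work were produced with XLSTAT; the sketch below only illustrates the underlying computation. It assumes scores for which a higher value means "more likely the true user" (such as the SVM probability); for Euclidean distances the scores would have to be negated first. The score lists are made-up examples.

```python
def roc_points(genuine, impostor):
    """Return (threshold, FPR, TPR) tuples, accepting when score >= threshold."""
    thresholds = sorted(set(genuine) | set(impostor), reverse=True)
    points = [(float("inf"), 0.0, 0.0)]  # nothing is accepted
    for t in thresholds:
        tpr = sum(s >= t for s in genuine) / len(genuine)
        fpr = sum(s >= t for s in impostor) / len(impostor)
        points.append((t, fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


genuine = [0.9, 0.8, 0.7, 0.4]   # scores of true-user attempts
impostor = [0.6, 0.3, 0.2, 0.1]  # scores of false-user attempts
points = roc_points(genuine, impostor)
print(auc(points))  # 0.9375
```

In this example one of the 16 genuine/impostor pairs is ranked in the wrong order, which gives an AUC of 15/16.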

4.1.1 Average key timing measures

As the key timings are the main tool for the algorithms, it makes sense to look at them to make

a superficial prediction for the final output values. The way they behave influences the performance of

the algorithms. A good method to have a general look at the key timings is to make an average of the

values. The average timings are represented in Figure 25 to Figure 30.

Figure 25 and Figure 26 correspond to the mxplayer password. From the dwell time it is hard to
draw any conclusion, since it is not consistent between users; however, that does not happen with
the flight time. There, the letter ‘x’ tends to have the highest flight time, while the letter ‘l’ has one of
the lowest. The letter ‘x’ is not commonly typed, which can explain its higher flight time. Regarding
the letter ‘l’, it is very close to ‘p’ on the keyboard, so the flight time is significantly reduced.


Figure 27 and Figure 28 correspond to the Lisboa2014 password. With this password, dwell
time has its highest value in the first letter. Flight time behaves similarly between users, and the
transition between letters and numbers has the highest values, due to the shift key being pressed.

Finally, Figure 29 and Figure 30 correspond to the tecnicoLisboa password. Dwell time is
very unstable, but it clearly increases for all users in the letter ‘L’. This also occurs with the
Lisboa2014 password. Flight time shows the same behavior as dwell time: it is very unstable, except
for the letter ‘L’.

Figure 25 - Average dwell time from all users (mxplayer), dwell time in ms

User     m       x        p       l       a       y       e       r
jpl      85.16   79.32    81.16   88.76   69.12   77.16   74.84   86.84
carla1   97.20   97.48    91.80   94.48   78.88   98.48   98.00   114.32
kiwi1    66.80   78.12    67.00   58.52   73.08   56.48   76.64   70.76

Figure 26 - Average flight time from all users (mxplayer), flight time in ms

User     m       x        p       l       a       y       e       r
jpl      0.00    175.60   136.64  138.44  98.96   206.88  181.04  131.40
carla1   0.00    1028.56  720.00  288.48  665.56  632.60  469.72  231.56
kiwi1    0.00    358.84   238.72  122.88  268.32  337.20  283.96  182.48


Figure 27 - Average dwell time from all users (Lisboa2014), dwell time in ms

User   L       i       s       b      o      a      2      0      1       4
joao   97.81   69.81   78.58   79.31  89.31  76.27  90.08  94.77  83.19   75.15
mj     59.88   43.46   58.96   63.92  55.23  68.62  57.08  56.54  64.73   61.12
susy   114.48  101.92  100.12  80.64  99.00  97.12  98.48  99.52  102.16  67.64

Figure 28 - Average flight time from all users (Lisboa2014), flight time in ms

User   L     i      s      b      o      a      2      0      1      4
joao   0.00  257.6  303.2  122.4  162.6  40.42  727.3  36.12  35.19  208.3
mj     0.00  466.0  847.8  739.3  851.6  682.2  1660.  803.0  536.4  406.0
susy   0.00  377.9  536.8  489.9  429.1  437.7  1350.  488.4  426.3  489.5


Figure 29 - Average dwell time from all users (tecnicoLisboa), dwell time in ms

User     t     e     c     n     i     c     o     L     i     s     b     o     a
jpssl    83.0  84.5  77.7  85.6  90.0  74.6  71.4  91.9  53.0  82.9  91.9  85.2  77.3
kiwi     56.2  79.2  53.2  55.3  51.2  55.9  55.3  71.8  55.6  75.1  52.8  52.5  72.2
vanessa  82.2  123.  128.  110.  83.2  114.  80.7  114.  94.9  123.  115.  88.8  113.
mary     71.4  68.7  65.3  70.1  81.2  61.9  71.2  96.6  67.1  64.9  64.9  74.1  65.4

Figure 30 - Average flight time from all users (tecnicoLisboa), flight time in ms

User     t     e     c     n     i     c     o     L     i     s     b     o     a
jpssl    0.00  122.  222.  92.3  148.  46.3  67.1  403.  152.  216.  118.  162.  54.2
kiwi     0.00  273.  331.  411.  141.  302.  203.  1247  131.  209.  240.  167.  184.
vanessa  0.00  319.  362.  225.  177.  255.  175.  1023  228.  244.  211.  283.  178.
mary     0.00  244.  367.  217.  228.  205.  81.1  932.  179.  243.  202.  263.  62.0

4.1.2 Euclidean distances

In mathematics, the Euclidean distance is a distance between two points in a space with two
or more dimensions. To plot ROC curves when having a decision module based on Euclidean
distances, intra and inter user distances have to be calculated. Intra user distances are measured by

comparing one password sample with each of the remaining ones inserted by the same user, until all
combinations have been made. Inter user distances use the same method, but between samples
from different users. With that data, it is possible to plot ROC curves. In Table 1, each row presents
the threshold for the best accuracy achieved. With this algorithm the threshold represents a distance:
the user is accepted when the distance of the inserted password is lower than the calculated
threshold.
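The intra and inter user comparisons can be sketched as follows. The sample vectors are invented for illustration, and `accuracy_at` is a hypothetical helper that scores one candidate threshold under the rule "accept when the distance is lower than the threshold".

```python
import math
from itertools import combinations, product


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def intra_user_distances(samples):
    """Distances between every pair of samples from the same user."""
    return [euclidean(a, b) for a, b in combinations(samples, 2)]


def inter_user_distances(samples_a, samples_b):
    """Distances between every sample of one user and every sample of another."""
    return [euclidean(a, b) for a, b in product(samples_a, samples_b)]


def accuracy_at(threshold, intra, inter):
    """Fraction of correct decisions when 'distance < threshold' means accept."""
    correct = sum(d < threshold for d in intra) + sum(d >= threshold for d in inter)
    return correct / (len(intra) + len(inter))


user_a = [[0, 0], [3, 4], [6, 8]]
user_b = [[30, 40], [33, 44]]
intra = intra_user_distances(user_a)
inter = inter_user_distances(user_a, user_b)
print(intra)                          # [5.0, 10.0, 5.0]
print(min(inter))                     # 40.0
print(accuracy_at(20, intra, inter))  # 1.0
```

Scanning `accuracy_at` over a range of candidate thresholds and keeping the one with the highest value corresponds to the "best accuracy" criterion used for the tables in this section.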

Table 1 presents the figures for the best accuracy achieved. Although the thresholds
indicated in the table are optimal in terms of accuracy, other thresholds can be used depending on
the goal of the application. If security is important the threshold should be lower; on the other hand,
a higher threshold value might be chosen if the user prefers to minimize the number of access
attempts required by the system, at the expense of an increased false accept rate.

Table 1 - ROC evaluation for the best accuracy achieved (Euclidean distances)

Password       Threshold  Sensitivity  Specificity  TP     TN     FP     FN     Accuracy
mxplayer       460        0.769        0.758        76.9%  75.8%  24.2%  23.1%  76.4%
Lisboa2014     707        0.64         0.874        64%    87.4%  12.6%  36%    76.3%
tecnicoLisboa  363        0.542        0.946        54.2%  94.6%  5.4%   45.8%  77.1%

To be able to evaluate the performance of the system that uses Euclidean distances as a

metric, and set an optimal threshold, the corresponding ROC curves were plotted – see

Figure 31, Figure 32 and Figure 33. From Table 1, it is possible to see that using the password

tecnicoLisboa leads to the best performance, while the other two passwords present similar results.

Figure 31 - ROC curve for 'mxplayer' (Euclidean distances); true positive rate (sensitivity) against false positive rate (1 - specificity)

Figure 32 - ROC curve for ‘Lisboa2014’ (Euclidean distances); true positive rate (sensitivity) against false positive rate (1 - specificity)

Figure 33 - ROC curve for ‘tecnicoLisboa’ (Euclidean distances); true positive rate (sensitivity) against false positive rate (1 - specificity)

4.1.3 SVM

In machine learning, an SVM is a learning model with associated learning algorithms that
analyze data and recognize patterns, used for classification and regression analysis. To learn how
to analyze data and recognize patterns, it needs training data consisting of a set of training
examples, each marked as belonging to one of two categories. Given that, an SVM training algorithm
builds a model that assigns new examples to one category or the other, making it a non-probabilistic
binary classifier. Those two categories are divided by a clear gap that is as wide as possible. This is
achieved through the construction of a hyperplane, which can be linear or nonlinear. A good
separation is achieved by the hyperplane that has the largest distance to the nearest training data
point of any class, since in general the larger the margin, the lower the generalization error of the
classifier.

To be able to train and test the data, parameters C and gamma had to be chosen for each set
of users. The parameters were chosen based on the best output result: a combination of values was
tested and the best output values were taken into account. When training, regardless of whether the
test user was the true user or a false user, the password sample being tested was not included in the
training data. This way the tested sample does not correspond to any sample in the training data. To
analyze the performance values for each user, histograms were used. Each histogram represents
the probability of the claimed user being the true user. For each test, the output is a probability,
which is counted in the corresponding bin of the histogram. After some preliminary tests, parameter
C was set to 10000 for every password. Regarding parameter gamma, it was set to 1 for mxplayer
and to 10 for Lisboa2014 and tecnicoLisboa.

Figure 34, Figure 35 and Figure 36 illustrate the recognition rate, in percentage, for each user

of the three passwords, mxplayer, Lisboa2014 and tecnicoLisboa, respectively. The algorithm returns


a probability of recognition between 0 and 1, where the higher the number, the higher the probability
of the claimed user being the true user. So, for the claimed user to be accepted, the probability
obtained when the password is inserted should be higher than the threshold.

Figure 34 - Probability of the claimed user being the true user (mxplayer); histogram of frequency against probability [%] for users jpl, carla1 and kiwi1

Figure 35 - Probability of the claimed user being the true user (Lisboa2014); histogram of frequency against probability [%] for users joao, susy and mj

Figure 36 - Probability of the claimed user being the true user (tecnicoLisboa); histogram of frequency against probability [%] for users jpssl, mary, kiwi and vanessa

To be able to evaluate the performance of the system when using the SVM classifier, and set
an optimal operation threshold, the corresponding ROC curves were plotted (see Figure 37, Figure
38 and Figure 39). These curves also allow comparing the performance of the two approaches,
using either the SVM classifier or the Euclidean distance metric. When producing the data for these
figures, a true user was selected, the samples under test were left out of the enrollment data, and the
model was then tested against other captures from the same user as well as from false users, with
the output probability being recorded. The XLSTAT software was used to produce the ROC curves.

In Table 2, the figures for the best accuracy achieved are presented. Although the thresholds
indicated in the table correspond to an optimal solution, there are other possible threshold values
that can be used, depending on the goal of the application. If security is important the threshold
should be higher.

Table 2 - ROC evaluation for the best accuracy achieved (SVM)

Password       Threshold  Sensitivity  Specificity  TP     TN     FP    FN    Accuracy
mxplayer       0.65       0.92         0.973        92%    97.3%  2.7%  8%    95.6%
Lisboa2014     0.39       0.987        0.973        98.6%  97.3%  2.7%  1.4%  97.8%
tecnicoLisboa  0.3        0.93         0.958        93%    95.8%  4.2%  7%    95%

The password Lisboa2014 has the best performance, while the other two have a similar
performance. However, it is clear that using the SVM classifier allowed achieving a big
improvement over the results obtained with the Euclidean distance metric, as it is able to distinguish
each user's characteristics much better, which is reflected in the classification and consequently in
the recognition results.

Figure 37 - ROC curve for ‘mxplayer’ (SVM); true positive rate (sensitivity) against false positive rate (1 - specificity)

Figure 38 - ROC curve for ‘Lisboa2014’ (SVM); true positive rate (sensitivity) against false positive rate (1 - specificity)

Figure 39 - ROC curve for ‘tecnicoLisboa’ (SVM); true positive rate (sensitivity) against false positive rate (1 - specificity)

4.1.4 Conclusion

From the analysis above it is clear that SVM has better performance than Euclidean
distances. In the ROC analysis, tecnicoLisboa performed better with Euclidean distances, while
Lisboa2014 performed better with SVM. As stated above, a password should not have only
lowercase or correlated letters. A good password would have at least one uppercase letter and at
least one number. The password Lisboa2014 is a very good example, as it is the one which
performed better because it fulfills all the characteristics stated in the beginning of this section.

5 Using the application

The purpose of this section is to explain to the user how to use the developed application. The
description guides the user on how to create a new account or use an existing one, how to train the
algorithms and how to test them.

To be able to run the application it is necessary to install the apk files. To do that, the
smartphone should be connected to the computer. After that, go to ‘Computer’ in the ‘Start Menu’
and, in ‘Portable Devices’, click on your phone icon to open the internal storage and drag
Keystroke.apk and SoftKeyboard.apk to the internal storage root. Then you can unplug the phone.
Next, open a file manager on your phone (if you do not have one, Astro File Manager is a good
option) and click on the apk files that were previously copied into the phone. After the installation,
the SoftKeyboard should be selected. For that it is necessary to go to settings, language & input,
current keyboard, choose keyboards and select Soft Keyboard. This way, when the keyboard pops
up, the user is able to switch between the standard keyboard and the Soft Keyboard.


After completing the steps described above, the next step is to create a new user when the

login screen appears, as seen in Figure 40. In the case of an existing user, a confirmation box

appears, Figure 42, where the user can confirm his identity or reject it. After inserting the username, a

box pops up with the three passwords allowing the user to choose the one to be used, as shown in

Figure 41. Of course, in a real deployment of the application the password should be freely chosen by

the user.

Figure 40 - Login screen

 

Figure 41 - Password choices

Figure 42 - Confirmation box for a user that already exists


After a successful login comes the main screen, Figure 43, where the user can choose
between data training and verification. The app does not allow verification when the user has fewer
than 5 training attempts, in order to allow creating the database template for that user.

When the user presses the training button for the first time, a screen as shown in Figure 44
appears. For each new user, 25 training attempts are recommended, and the counter keeps track
of how many attempts are left until the user reaches 25. It is possible to see the password while
writing it; for that it is enough to press the ‘show’ button in the upper right corner. The chosen
password is also shown below the text box, to avoid mistakes while writing it. If the user makes a
mistake while writing the password, that attempt is not valid: as the application deals with key
timings, it does not make sense to accept writing errors, which would alter the normal key timings. If
the user makes a mistake, it is enough to press send, which resets the text box. If the password is
correctly written a new screen pops up, shown in Figure 45. Then, by pressing the ‘Go Back’ button,
the main screen appears again to proceed with another training attempt.

Figure 43 – Main screen

 


Figure 44 - Training screen

Figure 45 - Training accepted screen


 

After completing the recommended training, the user should proceed to the verification step.
When the verification button is pressed, a pop-up window appears, Figure 46, where the user may
choose between the two algorithms or return to the previous screen. Then, a screen similar to the
training screen appears. After the user writes the password and presses the ‘send’ button, the
algorithm analyzes the writing sample. Two outcomes are possible. If the user is considered an
impostor, a message appears in the main screen, Figure 47. Otherwise, a new screen appears,
Figure 48, with the name of the user in question and two buttons: one to go back to the main screen
and another to erase the user and all the training data from the database.

Figure 46 - Box to choose an algorithm to proceed with verification

Figure 47 - Impostor message


Figure 48 - Verification screen after a user is approved

   


6 Conclusions and further Work

This section summarizes the work presented, highlights the main conclusions and provides
guidelines for the development of future work.

6.1 Summary and conclusion

The first section has presented the dependency on smartphones in everyday life. They carry
personal information and their security is easy to break. So, adding a biometric trait to the
already existing security mechanisms is a good way to improve security. This section also overviews
the theoretical basis of biometric system solutions. Finally, the work objectives and its structure are
presented by summarizing the purpose of each section.

Section 2 starts by reviewing some of the main biometric techniques and their structure. After that, the technique adopted in this work is chosen and studied in more depth. This gives the reader a better understanding of the relevant state of the art and, in turn, of the choices made by the author in the design and implementation of the developed keystroke recognition solution.

Section 3 presents the detailed architecture of the implemented solution and describes the interaction between the various steps. Next, the capture of the user input, the classification and the decision are described separately, notably how the key timing is captured and how the presented algorithms are trained.

Section 4 presents the performance analysis. First, the influence of the time samples collected for each password is estimated, since it affects the evaluation of the results. Then, for each algorithm, the test conditions are presented together with the test results and the respective discussion.

Section 5 describes the installation of the application on the smartphone and provides a walkthrough showing the user how to use it.

In the last few years there has been some research on mobile keystroke dynamics, and this Thesis is one more effort to enrich that research. Since all the coding was done by the author within a limited time, there is still room to improve the application performance and to expand the database. Even with these implementation limitations, it may be concluded that the application has a good performance and should allow increasing the security when entering alphanumeric passwords. In particular, the analysis of the system performance, notably of the ROC curves, indicates that with the SVM classifier the application can improve the password security for a user. A system based on the Euclidean distance metric, on the other hand, would still need further improvement before being applied to help improve security.

6.2 Further work

The author hopes that the developed app will be revisited in the future and that better performance may be achieved through the addition of new features. In this context, some features that could enhance the classification performed by the algorithms are presented here:

Pressure: adding the pressure of the user's finger on the key may improve the classification performance of the algorithms.

Fixed weights: adding fixed weights based on the variance of the samples. A sample that is much more statistically dispersed naturally indicates a less reliable mean than a sample with a smaller variance, so the variance is calculated in order to determine the statistical dispersion of the samples. Fixed weights would then be assigned to each value in the template according to the inverse ranking of its variance.

Phone orientation: analyzing, through the gyroscope, how the user holds the phone while entering the training data may help identify the true user.
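
The fixed-weights proposal can be illustrated with a short sketch. It is not part of the developed application: the function names (`rank_weights`, `weighted_distance`), the linear mapping from variance rank to weight, the normalization of the weights and the timing values are all assumptions made here for illustration. Each template value receives a weight according to the inverse ranking of its variance across the training samples, and the weights are then used in a weighted Euclidean distance.

```python
from math import sqrt
from statistics import mean, pvariance


def rank_weights(samples):
    """Build a template (per-feature mean) and rank-based fixed weights.

    `samples` is a list of equal-length timing vectors (hypothetical format).
    The feature with the largest variance gets the smallest weight.
    """
    n_features = len(samples[0])
    columns = [[s[i] for s in samples] for i in range(n_features)]
    template = [mean(c) for c in columns]
    variances = [pvariance(c) for c in columns]

    # Feature indices ordered from most dispersed to least dispersed.
    order = sorted(range(n_features), key=lambda i: variances[i], reverse=True)

    weights = [0.0] * n_features
    for rank, idx in enumerate(order, start=1):
        weights[idx] = float(rank)  # most dispersed -> 1, most stable -> n

    total = sum(weights)
    weights = [w / total for w in weights]  # assumed normalization: sum to 1
    return template, weights


def weighted_distance(template, weights, attempt):
    """Weighted Euclidean distance between a template and a new attempt."""
    return sqrt(sum(w * (a - t) ** 2
                    for t, w, a in zip(template, weights, attempt)))


# Illustrative timings in milliseconds (made-up values).
training = [[100, 200, 150], [110, 260, 151], [90, 140, 149]]
template, weights = rank_weights(training)
print(template, weights)
print(weighted_distance(template, weights, [100, 210, 150]))
```

With such weights, a deviation on a timing that the user reproduces consistently increases the distance more than the same deviation on a timing that already varied widely during training.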

7 References

(2014, March 4). Retrieved from wikipedia: http://en.wikipedia.org/ 

(2014, June 17). Retrieved from whatsnext.nuance: http://whatsnext.nuance.com/biometrics-smartphone-future-mobile-authentication/

(2014, June 17). Retrieved from zdnet: http://www.zdnet.com/30-percent-of-companies-will-use-biometric-identification-by-2016-7000025942/

(2014, July 8). Retrieved from griaulebiometrics: http://www.griaulebiometrics.com/en-us/book/understanding-biometrics/introduction/model/types

(2015, January). Retrieved from www.sibelle.info/oped4.htm

(2015, January). Retrieved from openclassroom.stanford.edu: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html

Awad, A., & Traore, I. (2013). Biometric Recognition Based on Free-Text Keystroke Dynamics. (p. 1). Cybernetics, IEEE Transactions on (Volume: PP, Issue: 99).

Blanco-Gonzalo, R., Miguel-Hurtado, O., Mendaza-Ormaza, A., & Sanchez-Reillo, R. (2012). Handwritten signature recognition in mobile scenarios: Performance evaluation. Security Technology (ICCST), 2012 IEEE International Carnahan Conference on, (pp. 174-179). Boston, MA.

Cho, D.-h., Ryoung Park, K., Woong Rhee, D., Kim, Y., & Yang, J. (2006). Pupil and Iris Localization for Iris Recognition in Mobile Phones. Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2006. SNPD 2006. Seventh ACIS International Conference on, (pp. 197-201). Las Vegas, NV.

Crawford, H. (2010). Keystroke dynamics: characteristics and opportunities. Eighth annual international conference on privacy, security and trust.

D.S., M., M.S., N., S., M., & J.N., C. (2012). The Effect of Time on Gait Recognition Performance. Information Forensics and Security, IEEE Transactions on (Volume: 7, Issue: 2), (pp. 543-552).

Franzgrote, M., Borg, C., Ries, B. J., Bussemaker, S., Jiang, X., Fieseler, M., & Zhang, L. (2011). Palmprint Verification on Mobile Phones Using Accelerated Competitive Code. Hand-Based Biometrics (ICHB), 2011 International Conference on, (pp. 1-6). Hong Kong.

Karman, M., & Krishnaraj, N. (2010). Bio password - Keystroke dynamic approach to secure mobile devices. (pp. 1-4). Coimbatore: Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on.

Kurkovsky, S., Carpenter, T., & MacDonald, C. (2010). Experiments with Simple Iris Recognition for Mobile Phones. (pp. 1293-1294). Las Vegas, NV: Information Technology: New Generations (ITNG), 2010 Seventh International Conference on.

Lee, H., Chang, S., Yook, D., & Kim, Y. (2009). A Voice Trigger System using Keyword and Speaker Recognition. Consumer Electronics, IEEE Transactions on (Volume: 55, Issue: 4), (pp. 2377-2384).

Maiorana, E., Campisi, P., González-Carballo, N., & Neri, A. (2011). Keystroke dynamics authentication for mobile phones. SAC '11 Proceedings of the 2011 ACM Symposium on Applied Computing (pp. 21-26). New York: ACM New York, NY, USA ©2011.

McLoughlin, & Mohanavel. (2009). Keypress biometrics for user validation in mobile consumer devices. (pp. 280-284). Kyoto: Consumer Electronics, 2009. ISCE '09. IEEE 13th International Symposium on.

Mendaza-Ormaza, A., Miguel-Hurtado, O., Blanco-Gonzalo, R., & Jose Diez-Jimeno, F. (2011). Analysis of handwritten signature performances using mobile devices. Security Technology (ICCST), 2011 IEEE International Carnahan Conference on, (pp. 1-6). Barcelona.

Minh Thang, H., Quang Viet, V., Dinh Thuc, N., & Choi, D. (2012). Gait Identification Using Accelerometer on Mobile Phone. Control, Automation and Information Sciences (ICCAIS), 2012 International Conference on, (pp. 344-348). Ho Chi Minh City.

Mobile Markting. (2015, February). Retrieved from Marketing land: http://marketingland.com/nearing-75-percent-smartphone-penetration-year-end-94903

Ng, P. A. (2015, January). Retrieved from https://www.youtube.com/watch?v=i25MEJeX0Eg

Ng., P. A. (2015, January). Retrieved from opencourseonline: https://www.youtube.com/watch?v=i25MEJeX0Eg

Oner, M., Pulcifer-Stump, J., Seeling, P., & Kaya, T. (2012). Towards the run and walk activity classification through step detection - an android application. 34th Annual International Conference of the IEEE EMBS. San Diego, California, USA.

physio-pedia. (2015, February). Retrieved from http://www.physio-pedia.com/images/b/b0/Figure2.jpg

Radu, P. (2012). Image Enhancement vs Feature Fusion in Colour Iris Recognition. (pp. 53-57). Lisbon: Emerging Security Technologies (EST), 2012 Third International Conference on.

Radu, R., Sirlantzis, K., Howells, W., Hoque, S., & Deravi, F. (2012). Image Enhancement vs Feature Fusion in Colour Iris Recognition. (pp. 53-57). Lisbon: Emerging Security Technologies (EST), 2012 Third International Conference on.

Ranga, A. (2015, January). Retrieved from https://amitranga.wordpress.com/machine-learning/support-vector-machines/

Ranga, A. (2015, January 20). amitranga. Retrieved from wordpress: https://amitranga.wordpress.com/machine-learning/support-vector-machines/

Ritchie, R., Rubino, D., Michaluk, K., & Nickison, P. (2013, September 24). Retrieved from android central: http://www.androidcentral.com/talk-mobile/future-authentication-biometrics-multi-factor-and-co-dependency-talk-mobile

Shabeer, H., & Suganthi, P. (2007). Mobile Phones Security Using Biometrics. Conference on Computational Intelligence and Multimedia Applications, 2007. International Conference on, (pp. 270-274). Sivakasi, Tamil Nadu.

Shanmugapriya, D., & Padmavathi, G. (2009). A Survey of Biometric keystroke Dynamics: Approaches, Security and Challenges. (IJCSIS) International Journal of Computer Science and Information Security, (Vol. 5, No. 1).

Sierra, A. d., Ávila, C. S., del Pozo, G. B., & Casanova, J. G. (2011). Gaussian multiscale aggregation oriented to hand biometric segmentation in mobile devices. Nature and Biologically Inspired Computing (NaBIC), 2011 Third World Congress on, (pp. 237-242). Salamanca.

Sierra, A. d., Casanova, J. G., Ávila, C. S., & Vera, V. J. (2009). Silhouette-based hand recognition on mobile devices. Security Technology, 2009. 43rd Annual 2009 International Carnahan Conference on, (pp. 160-166). Zurich.

Sim, T., & Janakiraman, R. (2007). Are Digraphs Good for Free-Text Keystroke Dynamics? (pp. 1-6). Minneapolis, MN: Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on.

Tao, Q., & Veldhuis, R. (2006). Biometric Authentication for a Mobile Personal Device. (pp. 1-3). San Jose, CA: Mobile and Ubiquitous Systems: Networking & Services, 2006 Third Annual International Conference on.

Tao, Q., & Veldhuis, R. (2010). Biometric Authentication System on Mobile Personal Devices. (pp. 763-779). Instrumentation and Measurement, IEEE Transactions on (Volume: 59, Issue: 4).

Trewin, S., Swart, C., Koved, L., Martino, J., Singh, K., & Ben-David, S. (2015, January). Biometric Authentication on a Mobile Device: A Study of User Effort, Error and Task Disruption. http://researcher.ibm.com/. Retrieved from http://researcher.ibm.com/researcher/files/us-kapil/ACSAC12.pdf

Trojahn, M., & Ortmeier, F. (2013). Toward mobile authentication with keystroke dynamics on mobile phones and tablets. Advanced Information Networking and Applications Workshops (WAINA), 2013 27th International Conference on, (pp. 697-702). Barcelona.