
Towards Practical Driver Cognitive Workload Monitoring via Electroencephalography

by

Vipin Bakshi

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering

University of Toronto

Copyright © 2018 by Vipin Bakshi

Abstract

Towards Practical Driver Cognitive Workload Monitoring via Electroencephalography

Vipin Bakshi

Master of Applied Science

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering

University of Toronto

2018

Monitoring of Driver Cognitive Workload is an active area of research that has gained
traction in recent years. This study works towards the development of an Automated
Driver Cognitive Workload Prediction System using the Electroencephalography (EEG)
modality. In this experiment, the driver cognitive workload is modelled using a secondary
n-back task. The study asks whether a consumer-grade 2-channel EEG modality can be used
to discriminate between the granular cognitive workloads induced on the driver, as
measured by the n-back test. Statistical Learning Models are generated while taking into
account practical data-partitioning schemes that would be feasible in a real-world
implementation.

It is found that the individual Beta and Gamma bands provide good discriminating
performance, while a combined set of 32 features provides the best overall performance.
Non-linear classifiers outperform linear classifiers, and dimensionality reduction
techniques assist in producing practical prediction models.


Acknowledgements

First and foremost, I would like to thank Professor Konstantinos (Kostas) for providing
me with the opportunity to perform this study. Above all, the kindness and patience shown
in providing timely assistance and encouragement towards this study made the journey
that much sweeter.

I would also like to thank some special people for their support over the duration of
this study. To my mom and dad: Thank you for the constant love and encouragement
and for making sure everything was fine back home. I am so lucky to have you as parents
and I appreciate all that you have done through the years to make sure that I can achieve
my numerous goals. You inspire me to be a better person each day and to strive towards
achieving my full potential. To my brother: Thank you for your support over the years
and for always giving the alternative perspective that was needed. I am so proud of all
that you have accomplished and the exciting things to come.

Thank you to the fantastic friends and researchers at the Multimedia Lab who assisted

with presentations, ideas and write-ups. It was a privilege to work with all of you.


Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
Glossary
1 Introduction
1.1 Background and Motivation
1.1.1 Technical Challenges
1.2 Research Objective
1.2.1 Modeling using Low-Channel Consumer Grade EEG
1.3 Thesis Contributions
1.4 Thesis Organization
2 eDREAM EEG Modality
2.1 Quantifying Cognitive Load
2.2 Modeling Methods
2.3 n-Back Secondary Task
2.3.1 n-Back for the Driving Task
2.4 eDREAM Experiment
2.4.1 n-Back Drive Labeling
2.5 EEG Data Collection
2.5.1 Time Synchronization
2.6 EEG Headband Measurements
2.6.1 Wireless EEG
2.6.2 Power Spectral Density Feature
2.6.3 Absolute Band Power Features
2.6.4 Relative Band Power Features
2.6.5 βγ Features
2.6.6 β-Relative Features
2.6.7 Additional / Artifact Information
2.7 Data Exploration
2.7.1 PSD Response
2.7.2 Statistical Featurespace Overview
2.8 Dimensionality Reduction
2.8.1 Dimensionality Reduction for Individual Participants
2.9 Dimensionality Reduction for all Participants
2.10 Chapter Summary
3 Related Works and Review
3.1 Related Works
3.1.1 Drowsiness and Fatigue studies
3.1.2 Cognitive Workload and Distraction
3.2 State-of-the-Art Studies
3.2.1 Wireless EEG System for Driver Vigilance (Lin et al.)[38]
3.2.2 Wireless EEG for Cognitive Workload (Wang et al.)[53]
3.3 Chapter Summary
4 Estimating Cognitive Load with Wireless EEG
4.1 Experiment Overview
4.1.1 Data Selection
4.2 Feature Extraction
4.2.1 Cumulative Feature Space
4.3 Feature Processing
4.3.1 Windowed-Averaging
4.3.2 Standardization
4.3.3 Subject-Level Normalization
4.4 Machine Learning Algorithms
4.4.1 Support Vector Machine with Linear Kernel (LSVM)
4.4.2 Logistic Regression (LR)
4.4.3 k-Nearest Neighbours (kNN)
4.4.4 Support Vector Machine with Radial Basis Kernel (RBSVM)
4.4.5 Shallow Artificial Neural Network (ANN)
4.5 Performance Evaluation
4.5.1 Training and Testing
4.5.2 Performance Metrics
4.6 Experiment Results
4.6.1 Individualized Subject-Specific Performance
4.6.2 Generalized Subject-Partitioned Classification
4.6.3 Generalized Time-Partitioned Performance
4.6.4 Grouped Power Spectral Sub-Features
4.7 Discussion
4.8 Chapter Summary
5 Conclusion
5.1 Summary of Contributions
5.2 Future Works
5.2.1 Improvements to Current Work
5.2.2 Extension of Work
Bibliography


List of Tables

2.1 Granular Cognitive Workload States

2.4 Group-wise (28) PSD Band Sensitivities
4.1 PSD Features
4.2 16-dimensional Feature Vector (per channel)
4.3 Medians of Individualized Scores using All-Features (32)
4.4 Medians of Individualized Scores using Sub-Features
4.5 Generalized Subject-Partitioned Binary Classification
4.6 Generalized Subject-Partitioned Ternary Classification
4.7 Time-Partitioned Binary Classification using 32-Features
4.8 Time-Partitioned Ternary Classification using 32-Features
4.9 Time-Partitioned Classification using Single Absolute Features
4.10 Time-Partitioned Classification using Single Relative Features
4.11 Time-Partitioned Classification using Single Relative Features
4.12 Time-Partitioned Ternary Classification using Sub-Features
4.13 Time-Partitioned Binary Classification using Sub-Features

List of Figures

1.1 Proposed top-level design pipeline
2.1 Empirical Cognitive Modeling
2.2 Yerkes-Dodson: Arousal vs Performance
2.3 n-Back Drive for Participant 8
2.4 Muse Headband electrode placements
2.5 Muse Headband gel foam temporal electrodes and frontal dry electrodes
2.6 EEG Signal Labeling Procedure
2.7 220Hz EEG for User 8
2.8 FFT Response User 8
2.9 Sensor Conductivity at Frontal Site
2.10 Sensor Conductivity at Temporal Site
2.11 Groupwise PSD across the 3 nBack tasks
2.12 Absolute Power Feature vs nBack Task for group of 28 users at Frontal Sites
2.13 Relative Power Feature vs nBack Task for group of 28 users at Frontal Sites
2.14 β-Relative Power Feature vs nBack Task for group of 28 users at Frontal Sites
2.15 Single Participant PCA on 32-features
2.16 Single Participant PCA on 32-features
2.17 Single Participant LDA on 32-features
2.18 Principal Component Analysis (PCA) applied to a 32-Dimensional Featurespace (28 Participants)
2.19 PCA applied to a 32-Dimensional Featurespace (28 Participants)
2.20 Linear Discriminant Analysis (LDA) applied to a 32-Dimensional Featurespace (28 Participants)
2.21 PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)
2.22 PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)
2.23 LDA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)
2.24 PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)
2.25 PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)
2.26 LDA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)
2.27 PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)
2.28 PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)
2.29 LDA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)
3.1 Power Spectral sensitivities across cognitive tasks
3.2 State-of-Art Studies employing EEG and nBack Tasks
4.1 Top Level Design Pipeline
4.2 Top Level Design Pipeline
4.3 Windowing Applied to a time-series Absolute γ signal
4.4 Standardization operation applied to a time-series 3s Averaged Absolute γ signal
4.5 Normalization operation applied to a time-series 3s Averaged and Standardized Absolute γ signal
4.6 Complete Feature Processing methodology to transform original Time-Series signal into a processed Normalized Signal prepared for the Classification Stage
4.7 Machine Learning Algorithm and evaluated Hyper-Parameters
4.8 k-Fold Cross-Validation
4.9 Individualized Models median binary classification performance
4.10 Individualized Models median ternary classification performance
4.11 Individualized Models sub-featurespace performance
4.12 2-dimensional LDA Space for Subject 37 on original dataset
4.13 2-dimensional LDA Space for Subject 37 on feature-processed dataset
4.14 3-dimensional PCA Space for Subject 37 on feature-processed dataset
4.15 Generalized Subject-Partitioned binary classification
4.16 Generalized Subject-Partitioned ternary classification
4.17 Generalized Time-Partitioned ternary classification
4.18 Generalized Time-Partitioned binary classification
4.19 Generalized Time-Partitioned ternary classification (Individual Features)
4.20 Generalized Time-Partitioned binary classification (Individual Features)
4.21 Generalized Time-Partitioned ternary classification (Sub-Features)
4.22 Generalized Time-Partitioned binary classification (Sub-Features)

Glossary

α Frequency Band: 7.5 Hz-13 Hz.
β Frequency Band: 13 Hz-30 Hz.
βγ Frequency Band: 20 Hz-40 Hz.
δ Frequency Band: 1 Hz-4 Hz.
γ Frequency Band: 30 Hz-44 Hz.
θ Frequency Band: 4 Hz-8 Hz.
ANN Feed-forward Neural Network with backpropagation.
DSP Digital Signal Processing.
EDD Multi-modal Driver Cognitive Workload Monitoring dataset.
EEG Electroencephalography.
FFT Fast Fourier Transform.
Fp Frontal Polar Sites.
IVIS In-Vehicular Intelligence Systems.
kNN k-Nearest Neighbours.
LDA Linear Discriminant Analysis.
LR Logistic Regression.
LSVM Support Vector Machine with Linear Kernel.
PCA Principal Component Analysis.
PSD Power Spectral Density.
RBSVM Support Vector Machine with Radial Basis Kernel.
TP Temporal-Parietal Sites.

Chapter 1

Introduction

1.1 Background and Motivation

The automotive industry is entering a new era of technological transformation. These
advancements are led by developments in autonomous vehicular technologies aimed at the
safe and automated navigation of vehicles. However, the advancements are not restricted
to the navigation of the vehicle. Vast changes inside the vehicle, in the form of
In-Vehicle Intelligence Systems (IVIS), sophisticated Vehicular Control / Navigation
Systems and increased integration with smartphones, are drastically changing the
behavior of a driver inside a vehicle. Studies have shown driver inattention to be a
leading cause of automotive accidents.[18][1][5][2] In addition, with the gradual
progression of vehicles towards full automation, it is imperative that drivers maintain
maximal attention on the road during this transitory period. In fact, several recent
cases have been reported in which driver inattention in semi-autonomous vehicles was a
factor in fatal accidents.[21][10][43] A new generation of vehicles is considered to be
semi-autonomous: these vehicles are able to navigate autonomously under controlled
environments, but as conditions deteriorate, drivers must regain control of them.
Therefore, newer internal and external vehicle technologies have given rise to added
complexities in driving, which require maximal driver attention on the road.

Studies have shown driver inattention to be prevalent in the form of visual, manual or
cognitive distractions.[5][2] While visual and manual distractions can be easily
observed through video modalities, detecting cognitive distraction is much more
challenging. Complex cognitive workload states can arise from a variety of factors such
as emotional state, fatigue, drowsiness or external stimuli, to name a few. Studies have
shown the cognitively distracted state to be a leading cause of driving accidents,
whereby a driver may seem visually and manually attentive yet be cognitively distracted
due to the aforementioned conditions.[18][1] Driver Cognitive Workload Monitoring is
therefore of urgent importance, and various approaches have been discussed towards
building automated predictive systems that prevent distraction-related
repercussions.[12][22][32]

1.1.1 Technical Challenges

Successfully modeling the Driver Cognitive Workload in order to build a practical and
automated system presents a variety of challenges:

1. Quantifying a subjective measure such as Cognitive Workload is an interdisciplinary
problem spanning Psychology and Human Factors studies.[39] A variety of modeling
methods have been researched, proposed and implemented, and careful consideration
must be given to the generation of a commonly agreed 'ground truth' metric as a
confidence measure for further modeling work. The challenge lies in the experimental
design, data collection and modeling methodologies across the various modalities, so
as to ensure that a properly quantified cognitive workload model is indeed being
generated.

2. A wide variety of system modalities and paradigms exist, such as Driving
Performance, Eye Tracking, Video and Physiological modalities, to name a few.
Each modality has its own set of challenges and advantages related to the
balance between precision, performance and practicality.[30]

3. A major goal in the development of a practical and automated driver cognitive
workload system is to study the trade-off between individualized and generalized
model performance. Participants show high inter-individual variability in their
responses to cognitive workload demands, and the practical and performance
trade-offs between individualized and generalized models must be addressed.

4. A successful cognitive workload detection system must be practical, reproducible,
reliable and responsive.

5. Finally, there is a lack of standardized procedures and datasets with which to
collectively study and tackle this problem.

1.2 Research Objective

Electroencephalography (EEG) is one of the most prominent Physiological modalities
used for Driver Cognitive Load Monitoring.[17] While Heart Rate and Galvanic sensors
are other options for physiological monitoring, EEG has consistently been shown to be
a more reproducible and reliable source for monitoring.[17] In part, the advantage lies
with the recording site: the brain, the center of all cognitive actions and resources.
EEG allows for the recording of the induced electro-physiological behavior of the brain
while driving under varying cognitive conditions. The close proximity of the sensor
to the brain allows for the finest temporal resolution of recording, capturing the most
responsive changes induced by external factors such as an increased workload during
driving. Significant correlations have been shown between EEG measurements and
cognitive workload changes.[24][47][51] In particular, studies have been performed to
detect correlations between related mental states such as alertness, attention, fatigue,
drowsiness and cognitive workload.[34][31][25][44][19] The research objective of this
study is now stated:

In this experiment, the driver cognitive workload is modelled using a secondary
n-back task (Section 2.3). This study aims to answer whether a consumer-grade
2-channel EEG modality can be used to discriminate between the granular cognitive
workloads induced on the driver, as measured by the n-back test.

1.2.1 Modeling using Low-Channel Consumer Grade EEG

Most studies have used high-resolution medical-grade EEG sensors, which can require at
least 32 electrodes (channels) to cover the neural activity across the entire scalp.
While the added number of channels increases the performance of the monitoring system,
it also imposes ergonomic challenges. In particular, a wired medical-grade EEG sensor is
too cumbersome and intrusive to be worn in a practical real-world driving monitoring
system. In addition, a drawback of high-resolution recording is the absence of digital
processing of the signals at the recording site. The recorded signals are often
processed at a secondary site such as a remote computer, which makes such sensors
impractical for a real-time driving monitoring system. To mitigate these issues, newer
research has been performed using consumer-grade wireless EEG devices, which are much
more ergonomically desirable and provide a rich set of on-device processing
capabilities. As a drawback, these devices contain a much lower number of recording
channels (4-16) and focus on concentrated regions of the scalp, with a limited number of
sensors from which to derive patterns and behaviors.[9][7][8]

An objective of this study is to use one such consumer-grade wireless EEG sensor,
the Muse, and to determine its feasibility in the development of an automated Driver
Cognitive Monitoring System. In particular, the Power Spectral Density (PSD) features
generated by the sensor are evaluated to determine the optimal feature(s) or
feature-set(s) which provide the maximal discriminant behavior in distinguishing
between differing cognitive workloads. Figure 1.1 describes the top-level system design
to answer this question.

Figure 1.1: Proposed top-level design pipeline
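As a rough illustration of the kind of PSD band features referred to above, the following Python sketch sums a power spectrum over the frequency bands listed in the Glossary and derives a relative power per band. It is not the Muse's on-device computation. The names `BANDS_HZ`, `absolute_band_power` and `band_features` are hypothetical, and the relative-power definition (band power divided by the sum over all bands) is an assumption made for the example.

```python
# Band edges in Hz, copied from the Glossary. The dictionary keys and the
# helper names below are hypothetical and exist only for this sketch.
BANDS_HZ = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (7.5, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 44.0),
}


def absolute_band_power(freqs, psd, band):
    """Sum linear-scale PSD values whose frequency lies in [low, high)."""
    low, high = band
    return sum(p for f, p in zip(freqs, psd) if low <= f < high)


def band_features(freqs, psd):
    """Return (absolute, relative) band powers as two dictionaries."""
    absolute = {
        name: absolute_band_power(freqs, psd, band)
        for name, band in BANDS_HZ.items()
    }
    total = sum(absolute.values())
    # Assumption: relative power = band power / sum of all band powers.
    # The theta and alpha bands overlap between 7.5 Hz and 8 Hz, so a bin in
    # that range contributes to both bands.
    relative = {
        name: (value / total if total > 0 else 0.0)
        for name, value in absolute.items()
    }
    return absolute, relative


if __name__ == "__main__":
    # Synthetic flat spectrum with one bin per integer frequency (1-43 Hz).
    freqs = [float(f) for f in range(1, 44)]
    psd = [1.0] * len(freqs)
    absolute, relative = band_features(freqs, psd)
    print(absolute)
    print(relative)
```

With a flat synthetic spectrum, each band's absolute power is simply the number of bins it contains, which makes the sketch easy to check by hand.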

1.3 Thesis Contributions

In order to study the feasibility of using a consumer-grade EEG modality towards
the development of an automated driver cognitive monitoring system, the eDREAM
dataset (EDD) is considered.[6] The EDD is a multimodal dataset that consists of
Vehicle Performance, Visual-Video and Physiological (Heart Rate, Galvanic, Respiration
and EEG) modalities, which are time-synchronized with ground-truth cognitive level
measurements. Prior studies have been performed with the Visual-Video modality;
however, a detailed analysis of the EEG modality has not been performed. Using the EEG
modality (consumer-grade Muse sensor) of the EDD, the following contributions are made
towards the development of a driver cognitive monitoring system:

1. A careful description of the EDD as it pertains to the EEG modality is provided.
The hardware and software capabilities of the primary Muse sensor are reviewed with
respect to the original EEG measurements and the derived features. Attention is
paid to the on-device measurement technicalities versus the information that is
transmitted wirelessly and recorded as part of the dataset. In summary, this
contribution describes the featurespace that is generated, based on an understanding
of the recording technicalities of the Muse sensor and of how these recordings are
time-synchronized with the secondary task to create the labeled dataset used in the
design of this monitoring system. Finally, a statistical analysis of the
discriminant sensitivity across the features is discussed.

2. A standardization and normalization feature-processing step using historical
participant data is proposed, which aims to reduce artifacts and inter-individual
differences towards building successful generalizable models. Performance comparisons
between the original and feature-processed datasets are discussed.

3. Dimensionality reduction using PCA and LDA is performed. Performance on the PCA,
LDA and original featurespaces is compared to determine the practical implications
of improving modeling performance via dimensionality reduction techniques.

4. A comparison between individualized and generalized models is performed to
understand the performance and practical trade-offs of using both methodologies for
estimating cognitive load.

5. In order to achieve successful modeling of driver cognitive load, it is important
to identify the optimal feature or set of features that generates the best prediction
performance. In this study, various Machine Learning models are generated on
multiple featurespaces, and a performance evaluation is used to identify the
featurespace(s) or feature(s) best suited to a real-time monitoring system. This
contribution therefore comprises the many simulations performed and the accompanying
results used to generate predictive models for the EEG-based cognitive monitoring
task.
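The feature-processing idea in contribution 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it z-scores one participant's current feature values against the mean and standard deviation of that same participant's earlier (historical) values. The function name `standardize_with_history` and the choice of a population standard deviation are hypothetical; the procedure actually used in this study is described in Section 4.3.

```python
from statistics import mean, pstdev


def standardize_with_history(history, current):
    """Z-score `current` values using statistics of `history` only.

    Assumption for this sketch: `history` holds earlier values of one feature
    for one participant, and `current` holds newer values of the same feature.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the history
    if sigma == 0:
        # A constant history carries no scale information; return zeros.
        return [0.0 for _ in current]
    return [(x - mu) / sigma for x in current]


if __name__ == "__main__":
    history = [2.0, 4.0]        # mean 3.0, population std 1.0
    current = [3.0, 4.0, 5.0]
    print(standardize_with_history(history, current))
```

Because only past data enters the statistics, a step of this shape could in principle run online, which is relevant to the real-time setting discussed above.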

1.4 Thesis Organization

The Thesis is organized as follows:

• Chapter 1 summarizes the motivation behind the study and gives an overview of the
various modalities used towards automated driver cognitive workload monitoring.
It concludes with the contributions of the study pertaining to the use of the EEG
modality of the EDD.

• Chapter 2 provides the experimental overview and the various features available
as part of the EDD and the EEG sensor as they pertain to the experimental
procedure. The statistical properties of the available feature-set are presented,
and a pre-experimental analysis is performed and discussed. Dimensionality
reduction using PCA and LDA is also discussed.

• Chapter 3 describes the prior works and the state-of-the-art studies performed for
both high-resolution and low-resolution EEG sensors. The Feature Extraction
techniques and the accompanying results are discussed.

• Chapter 4 provides the Machine Learning Performance Evaluation to identify the
optimal featurespaces for the estimation task. The experiment pipeline, feature
processing methodology and simulation results are presented and interpreted to
identify the optimal models and feature(s).

• Chapter 5 provides a conclusion and summary of the study and discusses future
works that can succeed this preliminary study.

Chapter 2

eDREAM EEG Modality

2.1 Quantifying Cognitive Load

In order to provide a quantifiable experimental premise for this study, it is very
important to precisely define Cognitive Workload and the various associated modeling
paradigms. A comprehensive review by Bin and Salvendy surveys the literature on
Cognitive Workload and defines it as follows: "Amount of Mental Work or Effort necessary
for a person or group to complete a task over a given number of time."[54] Cognitive
Workload cannot be measured directly; instead, it needs to be modeled via other
subjective or quantitative means.[54][46] Cognitive Workload is therefore a multivariate
measurement with temporal, physical (resources) and psychological (stress, anxiety,
etc.) attributes. A subjective methodology called the NASA-TLX index extends this to six
aspects: mental, physical, temporal, performance, frustration and effort levels, and is
one of the subjective measures employed in this study.[29] A proper modeling methodology
must take all of these attributes into account and allow them to be quantified for
analysis. Further constraints are added once an automated monitoring system is
desired, as post-hoc subjective paradigms are impractical in such scenarios. Bin and
Salvendy propose a modified taxonomy of mental-workload techniques, one of which is an
Empirical Modeling methodology, as shown in Figure 2.1. In this modeling method,
subjective, performance and psychophysiological methods are used in unison to determine
an empirically feasible, quantifiable model.[54]

Figure 2.1: Empirical Cognitive Modeling

Cognitive Workload during driving is related to the workload associated with
driving-related tasks such as vehicle control, navigation and rule-following. This can
be categorized as the primary task. When the driver is in a cognitively overloaded
state, it can be due to sub-optimal conditions in one or all of the aforementioned
cognitive workload attributes (temporal, physical (resource) and psychological demands).
In experimental studies, a secondary task is carefully controlled to generate a 'ground
truth' for the Cognitive Load Level by adding weights to the temporal, physical and
psychological demands.[30]

The Yerkes-Dodson summarization of 'performance vs mental-arousal' can be used
in a driving context to describe the driver cognitive workload, as shown in Figure 2.2.
Low arousal, resulting in underloaded cognitive performance, can be related to low
temporal, physical and psychological demands. Some studies have associated this state
with drowsiness.[57] On the other hand, high temporal, physical and psychological stress
demands correspond to high arousal on the Yerkes-Dodson curve, which is associated with
heightened alertness and anxiety.[57]

Figure 2.2: Yerkes-Dodson: Arousal vs Performance

As can be seen in the Yerkes-Dodson curve, cognitive workload can manifest along a
continuous spectrum of levels. Any modeling attempt therefore needs a controlled
methodology to induce accurate, granular levels of cognitive workload. Various studies
have proposed highly controlled secondary tasks that supplement the primary task in
order to achieve granular cognitive monitoring.[30][54][46][55] Table 2.1 shows granular
cognitive workloads as may be experienced by a driver. This study uses a secondary
n-back task (Section 2.3), which creates a 3-level cognitive workload when added to the
primary driving task.

Various modalities are now discussed which provide real-time performance metrics to monitor and model the quantifiable Cognitive Workload during driving.

Table 2.1: Granular Cognitive Workload States

Driving Difficulty   Multi-Tasking   Mental State   Cognitive Workload   Example
0                    0               0              Low                  Low Traffic, No Distractions, No Fatigue
0                    0               1              Medium               Low Traffic, No Distractions, Fatigue
0                    1               0              Medium               Low Traffic, Navigating, No Fatigue
0                    1               1              Medium               Low Traffic, Navigating, Fatigue
1                    0               0              Medium-High          High Traffic, No Distractions, No Fatigue
1                    0               1              Medium-High          High Traffic, No Distractions, Fatigue
1                    1               0              Medium-High          High Traffic, Navigating, No Fatigue
1                    1               1              High                 High Traffic, Navigating, Fatigue

2.2 Modeling Methods

Studies have been performed using Vehicular, Video and a wide assortment of Physiological Modalities to model the driver cognitive workload. Each of these modalities presents its own set of advantages and challenges, which in turn affect the reliability, generalization and practicality of the proposed solution. Greater reliability, generalization and practicality make it easier to build proactiveness into the solution, which is a requirement for accident prevention.

Vehicular Modality

In vehicular modality studies, driver performance in physical tasks such as speed management, steering, braking, lane deviations and response timings is extensively used to generate models to determine the cognitive workload.[30] It is often impractical to test such maneuvers in real-world scenarios due to safety concerns, and therefore indoor mechanical emulations are often designed to collect experimental data. In addition, the modality is less temporally responsive than Physiological measures and may not provide the necessary proactive response when handling time-critical distraction stages. However, most vehicles already contain the necessary hardware (steering, brakes etc.) and minimal additions are required to implement such a monitoring system. As a result, various vehicles are already equipped with such systems to monitor driving behaviors.[30][32][26][11]

Video Modality

The video modality uses facial derived information such as blinks, gaze, head movements

and facial-expressions to model the cognitive workload. Video allows for excellent tem-

poral tracking of the behavioral response of the driver via movement based gestures.

However, implementing such a system requires the installation of camera(s) and raises privacy concerns. Additionally, the computation requirements to process the high-data-rate streams are quite significant, and image processing and proactive predictive algorithms need to be aligned to maintain the high temporal response required in an

overloaded scenario. Finally, the success of this modality is dependent on the ambi-

ent recording conditions which can be a challenge due to varying external and internal

lighting conditions.[30][32][26][11]

Physiological Modality

Various types of bio-medical modalities such as Heart Rate, Electroencephalography,

Respiration Rate and Skin Conductance are used to model the cognitive workload. Phys-

iological sensors provide the best temporal resolution and are able to most accurately

detect the changes in the physiology of the subjects. In addition, this data can be quan-

tized to accomplish lower bandwidth for processing and therefore compute requirements

are lowered when compared to the video modality. Monitoring using this modality is able

to accomplish the proactiveness required in time-critical periods of distraction. However,


physiological modalities are among the most intrusive as external hardware is worn by the user. In recent years, less intrusive wearable sensors have been developed, which is an encouraging sign for the practical implementation of this modality as a monitoring system.[7][9] In 2018, Nissan implemented a practical EEG-based monitoring system to monitor driver vigilance.

Subjective Modality

While direct quantitative measures are the preferred mechanism to create a reliable and

testable monitoring system, subjective measures such as the NASA Task-Load-Index and

Karolinska Sleepiness Scale can be used as supplementary methodologies to receive post-

hoc subjective human feedback such as the difficulty of the task or self-perceived levels

of workload.[29][52] Using the subjective modalities as the sole measurement modality is

not feasible due to high cross-trial and individual variability in self-perception of tasks,

the non-quantitative nature of the measurements and their impracticality for an automated cognitive monitoring system. Instead, subjective measures are very useful as a secondary measure for validation and comparison, in particular when the ground truth must be verified and confidence is needed that a particular task difficulty is indeed being perceived by the user.[54]

2.3 n-Back Secondary Task

The n-back task is a commonly used working-memory task that tests memory recall and cognitive processing.[42] In this study, an auditory version of the n-back task is performed by each participant. A sequence of letters is played to the participant, who is required to keep track of the letters played ’n’ steps ago. If the latest letter played matches the letter played ’n’ steps ago, the participant has encountered an ’n-back’ event. The participant is asked to report the total number of ’n-back’ events experienced at the end of the task. In this modified task, the participant not only has to store the letters from ’n’ steps ago but also has to maintain a running count of n-back events, which adds to the difficulty of this modified auditory n-back task. An example is showcased next: [40]

Sequence played: A B C D E F G H G I J J

1-back region: the final J matches the J played one step earlier (... J J).

2-back region: the second G matches the G played two steps earlier (... G H G ...).
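The counting rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in the text, not the scoring code used in the eDREAM experiment; the function name `count_nback_events` is hypothetical, and the sketch assumes n is at least 1 (the N0 level is not covered).

```python
def count_nback_events(letters, n):
    """Count positions whose letter equals the letter played n steps earlier."""
    if n < 1:
        raise ValueError("n must be at least 1 in this sketch")
    return sum(1 for k in range(n, len(letters)) if letters[k] == letters[k - n])


# The example sequence from the text.
sequence = "A B C D E F G H G I J J".split()
one_back = count_nback_events(sequence, 1)  # the final J repeats the previous J
two_back = count_nback_events(sequence, 2)  # the second G matches the G two steps back
```

The value the participant reports at the end of a drive corresponds to the returned count for the announced n-back level.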

2.3.1 n-Back for the Driving Task

As described before, in order to successfully model Cognitive Load, a secondary task can

be introduced to create a granular cognitive workload variable. One method to do that

is to design experiment scenarios where the driving task is controlled and treated as the

primary task and a secondary cognitive task is varied to generate scenarios where the

effects of varying secondary cognitive loads can be examined. The n-back task is a common

choice as a secondary task across various studies to induce a granular cognitive workload

environment.[53][35] A combination of a controlled primary task and a varied secondary

n-back task (by varying ’n’) allows for the creation of a cognitive workload environment

where drives performed can be labeled with an associated cognitive workload level.

In this study using the EDD,[40] a modified n-back ’audio’ task was presented to the

participants. The outline of the task is as follows:

1. Each participant performs 3 separate drives associated with 3 N-back levels (N0,

N1 and N2)

2. Participant is told about which n-back drive is being performed

3. An audio version playing 10 randomly selected ’letters’ is presented to the partici-

pant for each drive


4. The participant is required to keep a ’count’ of the number of specified n-back

occurrences and report the answer at the end of the drive.

5. This is repeated for all 3 distinct n-back drives

The main modification is that the driver is not required to continuously verbalize the n-back occurrences; instead, an extra memory dimension is added by requiring the driver to remember the total number of n-back occurrences. While this adds to the complexity of the task,

it also serves as an added benefit since motion based artifacts from speaking during the

recording process are reduced. Details of the recording and experiment design decisions

can be gathered from the eDREAM Data Collection document. [40]

2.4 eDREAM Experiment

In the eDREAM experiment[40], a myriad of sensory data is collected: (1) Vehicle-Based Measures, (2) Physiological Measures (EEG, ECG, Galvanic Skin Response, Respiration) and (3) Video and Eye Tracking measures. The data is recorded while a driver drives in a high-performance driving simulator called the NADS miniSIM. Drivers perform the primary driving

task in the presence of a secondary auditory-recall nBack task as described in Section 2.3.

The secondary n-back task is used to induce a controlled and granular secondary cognitive load. A three-level granularity is chosen in the eDREAM experiment, such that three drives are performed by each participant, each with a different n-back level. The primary and secondary tasks together induce a level of cognitive workload, which allows the EEG recordings to be labeled. Details of the data collection campaign can be found in [40]; the following sections provide a summary and then focus on the EEG modality as it pertains to this study.

Participant Demographics

• 36 Participants (18 Male and 18 Female)


• Age ≤ 35 years (27.6 ± 4.45 years)

• Consistent drivers with a valid license for at least 3 years

• No Vision-Correction Glasses (contacts allowed)

2.4.1 n-Back Drive Labeling

Three incremental n-back task drives are performed by the participants in random order.

Each drive is approximately 5-10 minutes in length; however, the audio n-back task is only employed for 2 minutes within the drive, denoted as the critical audio section. Therefore, there are periods before and after the n-back audio task which are recorded but are not labeled with any cognitive workload. During the critical audio regions, the

driver performs the n-back task while driving in straight sections of road with controlled

non-distracting driving conditions. Each critical audio region consists of 3 sets of same

level n-back exercises (consisting of 10 letters). A break of approximately 45 seconds is

provided after the first critical section allowing for a relaxation of the cognitive state after

the mental exercises. Figure 2.3 shows a visual illustration of the n-Back drive timings.

During the drive, the participants are expected to follow a lead vehicle at a constant speed

and are not expected to make turns or change lanes. This ensures that the driver is mostly

impacted cognitively via the secondary n-back task and a controlled cognitive modeling

environment is implemented. The participants also undergo a training and preparation

procedure to ensure minimal errors and variability during the data collection phase.

Upon the completion of each drive, the results for the n-back task are recorded. The self-perceived subjective workload scores are recorded using the NASA-TLX methodology.

This allows for the creation of a subjective measure which could be used in unison with

the other modeling methods as a confidence measure during evaluation.


Figure 2.3: n-Back Drive for Participant 8

2.5 EEG Data Collection

EEG Data collection is performed via a consumer-grade 4 channel sensor: Muse devel-

oped by Interaxon.[9] The device used is a prototype first-generation Muse device which

consists of two dry electrodes in the frontal sites and two gel foam electrodes in the two

temporal sites behind the ears as shown in Figure 2.4 and Figure 2.5. A remote computer using the Interaxon development software, Muselab, is used to record the wireless data that is transmitted. Muselab also provides a graphical user interface to analyze the EEG data transmission in real time and observe any artifacts. Timestamped data for

each drive is originally saved in ’.muse’ format and then converted to ’.csv’ and ’.mat’

formats for post-hoc analysis.

2.5.1 Time Synchronization

A critical step required for the labeling of EEG data from the Muse headset is to correctly synchronize the timestamps of the EEG recordings with the real-time timeline of the drive sequence. This is shown in Figure 2.6.


Figure 2.4: Muse Headband electrode placements.

Figure 2.5: Muse Headband gel foam temporal electrodes and frontal dry electrodes.


Figure 2.6: EEG Signal Labeling Procedure

This is achieved by taking the visual frames generated by the miniSim Driving Simulator and forwarding them to the Muselab recording software. By matching the frame numbers in the miniSim recordings and the EEG recordings, it is possible to determine the exact timestamps in the EEG domain which relate to drive events such as the start/stop of the critical audio tasks. In this way, the independent timing of the EEG recording is synchronized with the real-time timing of the miniSim simulator, and the EEG data is successfully labeled for each drive. The detailed procedure is described in the eDREAM data collection document.[40]
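The frame-matching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the eDREAM data collection document: it assumes the EEG recording contains a log of (frame number, EEG timestamp) pairs sorted by frame number, and the names `frame_to_eeg_time` and `label_samples` are hypothetical.

```python
from bisect import bisect_left


def frame_to_eeg_time(frame_log, frame):
    """Map a simulator frame number to an EEG timestamp.

    frame_log: list of (frame_number, eeg_timestamp) pairs sorted by frame
    number (assumed layout). Returns the timestamp of the first logged frame
    at or after the requested frame.
    """
    frames = [f for f, _ in frame_log]
    k = bisect_left(frames, frame)
    if k == len(frames):
        raise ValueError("frame occurs after the end of the EEG recording")
    return frame_log[k][1]


def label_samples(sample_times, start_time, stop_time, level):
    """Give samples inside [start_time, stop_time] the workload level, else None."""
    return [level if start_time <= t <= stop_time else None for t in sample_times]


# Toy data: four forwarded frames and a critical audio task from frame 160 to 270.
frame_log = [(100, 10.0), (160, 11.0), (220, 12.0), (280, 13.0)]
start = frame_to_eeg_time(frame_log, 160)
stop = frame_to_eeg_time(frame_log, 270)
labels = label_samples([10.5, 11.0, 12.5, 13.5], start, stop, "N1")
```

Samples that fall outside the critical audio section stay unlabeled, which mirrors the recorded but unlabeled periods before and after the n-back task.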

2.6 EEG Headband Measurements

The Muse consists of 4 sensors: two located on the forehead at the corresponding Frontal Polar Site (Fp) of the brain, denoted as Fp1 and Fp2, and two located behind the ears at the corresponding Temporal-Parietal Site (TP), denoted as TP9 and TP10. Data is processed on-site at the Muse headset and the information is then transmitted wirelessly to the remote recording computer. In addition to the sensory EEG data, a variety of ancillary information is also transmitted related to the quality and recording specifications of the measurements. Table 2.2 shows the various measurements and their recording specifications.

Table 2.2: EEG Headband Measurements

Measurement           Units   Sampling Rate
EEG                   uV      220Hz
FFT                   dB      10Hz
Absolute Power        Bels    10Hz
Relative Power        None    10Hz
Blink Artifact        0/1     10Hz
Jaw Artifact          0/1     10Hz
Acceleration          mG      50Hz
Headband Connection   0/1     10Hz

In-Device Headband Recording

A Digital Signal Processing (DSP) enabled embedded system is present inside the head-

band which interfaces with the raw electrodes and performs DSP operations to generate

the information that is transmitted wirelessly as shown in Table 2.2. While the wirelessly transmitted information is used for this study, it is important to consider the original recording specifications to understand the extra capabilities available on the device itself compared to what is available off-device. The in-device configuration used in this study is as follows:

• The EEG data is collected at a native sampling rate of 3520Hz.

• Downsampling by a factor of 16 generates a new EEG signal at 220Hz, which is transmitted wirelessly via Bluetooth.

• A digital notch filter is applied at 60Hz to reduce ambient lighting and power

artifacts.

• Each EEG measurement is recorded at a 10-bit resolution.

• An analog noise reduction technique using the Driven Right Leg Circuit is im-

plemented to reduce the environmental electromagnetic artifacts collected by the

human body.

2.6.1 Wireless EEG

The raw EEG signal at the four sites is originally sampled at 3520Hz at a 10-bit resolution. However, to reduce processing requirements and allow wireless transmission, the signal is downsampled by a factor of 16 to 220Hz and transmitted via Bluetooth. The recording is in the range of 0.0 - 1682.815 uV and each measurement is a vector of length 4, corresponding to the measurement at each site (TP9, Fp1, Fp2, TP10). The power spectral features are derived from this 220Hz EEG signal; it can therefore be considered the base signal from which all features are derived.
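The figures quoted above can be cross-checked with a short calculation. The sketch below assumes a uniform 10-bit quantizer whose codes span the full reported range; that is an assumption made for illustration and is not confirmed by the source.

```python
NATIVE_RATE_HZ = 3520      # in-device sampling rate
DECIMATION = 16            # downsampling factor
BITS = 10                  # resolution of each EEG measurement
FULL_SCALE_UV = 1682.815   # upper end of the reported range in uV

wireless_rate_hz = NATIVE_RATE_HZ / DECIMATION   # rate of the transmitted signal
levels = 2 ** BITS                               # number of distinct codes
# Assumed uniform quantizer: codes 0..1023 spread evenly over 0..1682.815 uV.
step_uv = FULL_SCALE_UV / (levels - 1)

print(wireless_rate_hz, levels, round(step_uv, 3))
```

Under that assumption each code step is roughly 1.645 uV, which gives a sense of the smallest amplitude change the transmitted signal can represent.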

2.6.2 Power Spectral Density Feature

While the EEG data stream may be used in its raw form for analysis, the Muse headset performs an on-device Fast Fourier Transform (FFT) of the EEG data to produce the logarithm of the Power Spectral Density (PSD). This operation transforms the time-domain signal into its frequency-domain representation so that Power Spectral analysis of the signal can be performed.

Figure 2.7: 220Hz EEG for User 8

An FFT generates both Amplitude and Phase components; however, Muse only transmits the Amplitude components of the transformation, as 129 Amplitude coefficients. In order to generate a temporal representation of the spectral response, Muse performs the FFT on the 220Hz EEG signal every 100ms, thereby generating a 10Hz temporal recording of 129 FFT Amplitude coefficients. The following steps describe how this time-series representation of the PSD response is created. This is the algorithm as described by Interaxon for the inner workings of the headband:[9]

1. A Hamming Window of 256 samples is used to perform an FFT on the EEG signal sampled at 220Hz.

2. A 90 percent overlapping window is used, whereby the Hamming window is slid by 22 samples (1/10th of a second @ 220Hz) and the FFT is performed again. This generates the 129 FFT Amplitude coefficients at the aforementioned 10Hz rate.

3. Since the FFTs are calculated over a 256-sample Hamming Window, the transform generates 256 FFT components that are symmetric about the origin. The ’Negative’ frequency components are dropped, and 128 components along with the component at the origin are retained, thereby generating 129 Power Spectral Amplitude components.

4. The FFT Amplitudes are transformed to the log scale with units of decibels.
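The four steps above can be sketched as a sliding-window transform. This is an illustrative reimplementation, not the Muse firmware: it uses a plain discrete Fourier transform from the standard library so that it runs without dependencies (a library routine such as `numpy.fft.rfft` would be the usual choice), and the decibel convention (20·log10 of the magnitude) and the small floor added before the logarithm are assumptions. The function names are hypothetical.

```python
import cmath
import math

FS = 220       # wireless EEG sampling rate in Hz
WINDOW = 256   # Hamming window length in samples
HOP = 22       # window shift: 1/10 of a second at 220 Hz


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def one_sided_db(frame, window):
    """Return the log magnitudes of bins 0..len(frame)//2 for one frame."""
    n = len(frame)
    tapered = [x * w for x, w in zip(frame, window)]
    out = []
    for b in range(n // 2 + 1):  # keep the origin plus the positive frequencies
        acc = sum(x * cmath.exp(-2j * math.pi * b * k / n) for k, x in enumerate(tapered))
        out.append(20 * math.log10(abs(acc) + 1e-12))  # assumed dB convention
    return out


def spectrogram(signal):
    """Slide the window by HOP samples and transform each full frame."""
    window = hamming(WINDOW)
    return [one_sided_db(signal[s:s + WINDOW], window)
            for s in range(0, len(signal) - WINDOW + 1, HOP)]


# Toy input: a sine placed exactly on bin 12 (12 * 220 / 256 Hz), 300 samples long.
tone_hz = 12 * FS / WINDOW
signal = [math.sin(2 * math.pi * tone_hz * t / FS) for t in range(300)]
spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda b: spec[0][b])
```

With 300 input samples the window fits at offsets 0, 22 and 44, so three rows of 129 coefficients are produced, one for every 100ms step.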

The 129 coefficients are of significance as each coefficient corresponds to the Power Amplitude of a frequency bin of width 0.86Hz. Therefore, for every 100ms step of the EEG recording, a Power Spectral Representation at a resolution of 0.86Hz from 0 to 110Hz is provided. Since most of the EEG Spectral information is contained between 0Hz-50Hz,[9] a maximal frequency of 110Hz is more than sufficient for generating the subsequent features in our analysis.
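The bin width and frequency range quoted above follow directly from the sampling rate and the window length. As a short worked calculation, with $f_s$ the 220Hz sampling rate, $N$ the 256-sample window and $k$ the coefficient index (symbols introduced here for illustration only):

```latex
\[
\Delta f = \frac{f_s}{N} = \frac{220\,\mathrm{Hz}}{256} \approx 0.859\,\mathrm{Hz},
\qquad
f_k = k\,\Delta f,\quad k = 0, 1, \dots, 128,
\qquad
f_{128} = \frac{f_s}{2} = 110\,\mathrm{Hz}
\]
\[
\text{update interval} = \frac{22\ \text{samples}}{220\,\mathrm{Hz}} = 0.1\,\mathrm{s}
\]
```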

Power Spectral Bands

5 main EEG Power Spectral Bands of interest are defined in Table 2.3.

Figure 2.8 shows the median PSD at Fp1 site for a drive performed by user 8.

2.6.3 Absolute Band Power Features

As described in Section 2.6.2, the 5 main EEG Power Spectral Bands of interest are

the δ, θ, α, β and γ Bands. The 129 Fourier Amplitude Coefficients derived in Section

2.6.2 are used to generate these cumulative Absolute Power Features. It is defined as the

Logarithm of the sum of the Power Spectral Density over the desired Frequency Range

as shown in equation 2.1 with unit ’Bels’.

\[ P_{abs} = \log_{10}\left( \sum_{n=i}^{j} X_n \right), \quad \text{s.t. } 0 \le i, j \le 128 \tag{2.1} \]


Table 2.3: EEG Headband Measurements

Power Band   Frequency Range (Hz)
δ            1-4
θ            4-8
α            7.5-13
β            13-30
γ            30-44
βγ           20-40

where Pabs is the Logarithm of sum of the PSD of the entire band, Xn is the FFT

Amplitude Coefficient of the nth selected frequency bin, i is the starting frequency-bin

and j is the ending frequency bin for the desired frequency-band.
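Equation 2.1 can be illustrated with a short sketch. Two points are assumptions rather than details given in the source: the coefficients are taken to be linear-scale power values (not the decibel values that are transmitted), and a band is taken to cover every bin whose centre frequency lies inside the band limits. The helper names `band_bins` and `absolute_band_power` are hypothetical.

```python
import math

FS = 220             # sampling rate in Hz
NFFT = 256           # window length in samples
BIN_HZ = FS / NFFT   # about 0.859 Hz per bin


def band_bins(low_hz, high_hz):
    """Return (i, j): first and last bin whose centre lies in [low_hz, high_hz]."""
    i = math.ceil(low_hz / BIN_HZ)
    j = math.floor(high_hz / BIN_HZ)
    return i, j


def absolute_band_power(psd, low_hz, high_hz):
    """Equation 2.1: log10 of the summed PSD over bins i..j, in Bels."""
    i, j = band_bins(low_hz, high_hz)
    return math.log10(sum(psd[i:j + 1]))


# Toy spectrum: 129 bins of unit power, evaluated over the beta band (13-30 Hz).
flat_psd = [1.0] * 129
beta_bins = band_bins(13, 30)
beta_abs = absolute_band_power(flat_psd, 13, 30)
```

For the flat toy spectrum the β band spans 19 bins, so the result is simply log10(19) Bels.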

2.6.4 Relative Band Power Features

Using the Absolute Band Power Features, the Relative Power features can be generated

via equation 2.2.

\[ P_{rel} = \frac{10^{P_{abs}}}{10^{P_{\delta}} + 10^{P_{\theta}} + 10^{P_{\alpha}} + 10^{P_{\beta}} + 10^{P_{\gamma}}} \tag{2.2} \]

where Prel is the ratio of the selected frequency band to the total power in all of the

bands and Pabs is the Absolute Power of the band.

The Relative Band Power of a selected frequency band is the ratio of the energy of the selected frequency band to the total energy of the five defined frequency bands. It is dimensionless.

Figure 2.8: FFT Response User 8
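As a small illustration of Equation 2.2, the sketch below converts absolute band powers (in Bels) back to linear scale and forms the ratio. The function name and the dictionary layout are hypothetical, and the input values are toy numbers rather than measurements.

```python
def relative_band_power(p_abs, band_powers):
    """Equation 2.2.

    p_abs: absolute power (Bels) of the band of interest.
    band_powers: absolute powers (Bels) of the delta, theta, alpha, beta and gamma bands.
    """
    total = sum(10 ** p for p in band_powers.values())
    return 10 ** p_abs / total


# Toy values: delta carries ten times the linear power of each other band.
bands = {"delta": 1.0, "theta": 0.0, "alpha": 0.0, "beta": 0.0, "gamma": 0.0}
delta_rel = relative_band_power(bands["delta"], bands)
all_rel = [relative_band_power(p, bands) for p in bands.values()]
```

Because the denominator sums the same five bands, their relative powers add up to one.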

2.6.5 βγ Features

Studies have shown increased discriminative behavior in the 20-40Hz range.[53][48] Therefore, using Equation 2.1 and Equation 2.2, two new features are generated. βγ Absolute corresponds to the Absolute Band Power at 20-40Hz and βγ Relative corresponds to the Relative Band Power at 20-40Hz.

Therefore, in total, 6 Absolute Power Features and 6 Relative Power Features are

generated at each channel.

2.6.6 β-Relative Features

Studies have shown the importance of using Low and High Frequency ratios in detecting

cognitive states.[31][17] The following four β-relative features are generated:

• (θ + α) / β


• α / β

• (θ + α) / (α + β)

• θ / β
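The four ratios listed above can be computed directly from the band powers. The source does not state whether the ratios are formed from linear or logarithmic powers; this sketch assumes linear-scale band powers, and the function name and dictionary keys are hypothetical.

```python
def beta_relative_features(theta, alpha, beta):
    """Return the four ratio features from linear-scale band powers (assumed)."""
    return {
        "(theta+alpha)/beta": (theta + alpha) / beta,
        "alpha/beta": alpha / beta,
        "(theta+alpha)/(alpha+beta)": (theta + alpha) / (alpha + beta),
        "theta/beta": theta / beta,
    }


# Toy band powers chosen so the ratios are easy to verify by hand.
ratios = beta_relative_features(theta=2.0, alpha=4.0, beta=8.0)
```

Each ratio grows when the low-frequency bands gain power relative to the β band.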

2.6.7 Additional / Artifact Information

Table 2.2 shows the artifact information that is provided. EEG recordings are prone to

artifacts such as eye blinks, movement, jaw clenches, skin conductance etc. The accelerometer can be used to determine the head movement of the participant as well as any drastic movements causing a noisy EEG signal. Frontal Sites (Fp1 and Fp2) experience more blink artifacts and these are reported via the blink artifact indicator. Temporal Sites (TP9 and TP10) experience greater effects of jaw clenches, which are reported via the Jaw Clench binary indicator. Most importantly, a Headband Status Indicator is present which

provides binary measurements of ’Good’ and ’Bad’ conductive status. This indicator

provides a strict description of the quality of the signal and is the primary indicator of

the integrity of the recorded EEG sample.

Figure 2.9 shows the EEG signal at the Frontal site synchronized with the artifact

recordings. Figure 2.10 shows the EEG signal at the Temporal Site synchronized with the

artifact indicators. As can be seen, Temporal Sites exhibit extreme noise, as indicated by the Headband Status Indicator. This was observed across all trials and can be attributed to the usage of gel-based electrodes at the TP contact points. Due to the high noise artifacts present in the temporal sites, recordings from TP9 and TP10 were omitted in this study.
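With only the two frontal channels retained, the features defined in Sections 2.6.3 to 2.6.6 combine into the 32-dimensional featurespace (Absolute, Relative and Ratio features across 2 frontal channels) used in Section 2.8. The sketch below simply enumerates that layout; the feature names and their ordering are hypothetical and chosen only for illustration.

```python
from itertools import product

CHANNELS = ["Fp1", "Fp2"]  # the temporal sites TP9 and TP10 are omitted
BANDS = ["delta", "theta", "alpha", "beta", "gamma", "beta-gamma"]
RATIOS = ["(theta+alpha)/beta", "alpha/beta", "(theta+alpha)/(alpha+beta)", "theta/beta"]


def feature_names():
    """List the per-channel features for every retained channel."""
    per_channel = (
        [f"abs_{band}" for band in BANDS]        # 6 Absolute Power features
        + [f"rel_{band}" for band in BANDS]      # 6 Relative Power features
        + [f"ratio_{ratio}" for ratio in RATIOS]  # 4 beta-relative ratio features
    )
    return [f"{channel}:{name}" for channel, name in product(CHANNELS, per_channel)]


names = feature_names()
```

Sixteen features per channel across two channels give the 32 dimensions.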


Figure 2.9: Sensor Conductivity at Frontal Site

Figure 2.10: Sensor Conductivity at Temporal Site


2.7 Data Exploration

Since the two temporal sites are omitted from the analysis due to excessive noise artifacts,

the two frontal sites are fully taken into consideration. A group-wise data-exploration

using 28 chosen participants is now described.

2.7.1 PSD Response

The group-wise median PSD responses are generated across the 3 datasets. The frequency range from 0Hz-44Hz, corresponding to the 5 bands of interest, is evaluated.

Using the experimental data-collection notes and observing the artifact information across all 37 users, 28 out of the 37 users (15 male, 13 female) are chosen for this study. Users were excluded due to significant movement, poor skin conductance, lack of effort or failure to complete tasks. Due to the sensitivity of, and inherent inter-individual differences in, the EEG responses, users showing the aforementioned artifacts/challenges were dropped from the analysis to allow for maximal control of the effects of the secondary n-back task.

Figure 2.11 shows the group median PSD response for the 28 specified users. Table

2.4 summarizes the findings:

Table 2.4: Group-wise (28) PSD Band Sensitivities

Experiment             Sensitivity   Prior Work
Cognitive Workload ↑   θ↑, αβγ↓      [31][35]


Figure 2.11: Groupwise PSD across the 3 nBack tasks.

2.7.2 Statistical Featurespace Overview

The three sets of Power Spectral Features (Absolute, Relative and β-Relative) can be statistically analyzed for pre-experimental snooping to determine any underlying patterns as the n-back task is varied. The Featurespace is first standardized such that every feature has zero mean and unit standard deviation, so each value is expressed as a number of standard deviations away from the mean. This ensures that during the visualization no features are over-weighted and a uniform comparison can be made. This process is performed by using the Z-Score statistic as shown in Equation 2.3.

\[ X_{score} = \frac{X_{sample} - \mu}{\sigma} \tag{2.3} \]

where Xscore is the normalized feature value, Xsample is the original feature value, µ is

the mean of the feature vector and σ is the standard deviation of the feature vector.
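Equation 2.3 applied to a single feature vector can be sketched as follows. The source does not say whether the population or the sample standard deviation is used; the population form (`statistics.pstdev`) is assumed here, and the function name `z_score` is hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize one feature vector to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Toy feature vector.
scores = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```

After standardization every feature is on the same scale, which is what allows the features to be compared side by side in the visualizations.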

Figure 2.12: Absolute Power Feature vs nBack Task for group of 28 users at Frontal Sites

Figure 2.13: Relative Power Feature vs nBack Task for group of 28 users at Frontal Sites

Figure 2.14: βRelative Power Feature vs nBack Task for group of 28 users at Frontal Sites

The following observations are made by analyzing the Feature Distribution before the experimental stages. This allows us to understand the distribution and first-order statistical behavior of the data as it pertains to the samples present in the three distinct workloads:

1. No single Feature showcases clear linear separability between the three classes.

2. All Absolute Power Features showcase decreasing trends as cognitive workload is

increased. α, β and γ bands show the most significant decrease.[31]

3. Low Frequency Relative Power Features (δ, θ and α) showcase increasing trends as

Cognitive Workload is increased. β, γ and βγ relative bands showcase decreasing

trends.

4. All βRelative Band Features showcase increasing trends as Cognitive Workload is

increased. This indicates that θ and α bands contain more power than the β band

as cognitive workload is increased. [31]

2.8 Dimensionality Reduction

Principal Component Analysis (PCA) and Linear Discriminant Analysis(LDA) are two

techniques used in this study to perform dimensionality reduction. Dimensionality reduction offers the following advantages:

1. It allows multi-dimensional data to be projected into a 2D or 3D space, thereby generating a new featurespace that retains informative energy.

2. In various machine learning studies, dimensionality reduction is a key pre-processing stage that allows complex models to be trained successfully when training samples are scarce and the dataset is high-dimensional. The difficulty of learning under these conditions is called the "Curse of Dimensionality". [3]

3. Reducing the featurespace to a 2D or 3D space allows for the visualization of

higher dimensional featurespaces and allows us to make inferences as to the possible

separability of the featurespace.


4. In real-time environments, it may be more computationally efficient to execute and

generate models in a low-dimensional space.

In this section, the combined and individual Absolute, Relative and β-Relative fea-

turespaces are reduced via PCA and LDA, and the data visualizations generated are

discussed.
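To make the PCA step concrete, the sketch below finds only the first principal component of a small dataset. It is a dependency-free illustration, not the implementation used in this study: it swaps in power iteration on the covariance matrix in place of a full eigendecomposition, it does not cover LDA, and the function name is hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length feature vectors. Power iteration on the sample
    covariance matrix converges to the direction of greatest variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[c] for row in rows) / n for c in range(d)]
    centered = [[row[c] - means[c] for c in range(d)] for row in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data variance")
        v = [c / norm for c in w]
    scores = [sum(x[c] * v[c] for c in range(d)) for x in centered]
    return v, scores


# Toy data lying exactly on the line y = 2x.
rows = [[float(t), 2.0 * t] for t in range(5)]
direction, scores = first_principal_component(rows)
```

For the toy data all variance lies along the direction (1, 2), so the scores on the first component carry all of the information and the second dimension can be dropped without loss.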

2.8.1 Dimensionality Reduction for Individual Participants

PCA and LDA can be performed on datasets generated from individual participants to

generate reduced featurespaces and benefit from the aforementioned advantages. In par-

ticular, during the generation of individualized models for Machine Learning, there can

often be a scarcity of training samples as compared to the number of samples present for

generalized multi-user models. This scarcity of samples, combined with a high-dimensional space, makes the usage of dimensionality reduction techniques highly beneficial. The following figures showcase various PCA and LDA featurespaces applied to a 32-dimensional featurespace (Absolute, Relative and Ratio Features across 2 Frontal EEG channels) for Participant 8:

The following observations can be made after dimensionality reduction for participant

8:

1. In the PCA space with 3 dimensions, it can be seen that linear separability is possible between the n0 and n1 datasets. Looking closer at the PCA1 vs PCA2 space (Figure 2.16), it can be seen that it may be possible to linearly separate all three classes. Therefore, the components with the greatest informative energies may also be the most discriminant.

2. Figure 2.17 showcases the 2-D LDA space. It is very clear in this representation that a simple linear classifier can be trained to very accurately discriminate between the three classes. Therefore, Linear Discriminant Analysis is successful in maximizing the distances between the three classes.

Figure 2.15: Single Participant PCA on 32-features

Figure 2.16: Single Participant PCA on 32-features

Figure 2.17: Single Participant LDA on 32-features

However, it should be noted that these are individualized spaces for Participant 8 only

and empirical evaluation and experimentation as performed in Chapter 4 are required to

determine the performance of all individual participants in this dataset.

2.9 Dimensionality Reduction for all Participants

The same dimensionality reduction techniques applied to the individualized sets can be applied to the combined dataset. The following figures show the LDA and PCA spaces for the cumulative dataset of 28 participants considered in this experiment. The following generalized observations can be made across the complete featurespace and all sub-featurespaces:

1. As the datasets from individual users are concatenated to generate a cumulative

dataset, the PCA and LDA techniques by themselves are unable to generate linearly

separable featurespaces.

2. Significant overlap between the three classes is present across all featurespaces, and feature-processing techniques are required to take into account inter-individual and outlier effects on the datasets.

3. It should be noted that no windowing, standardization or normalization schemes have been performed on these datasets and these visualizations are part of the original data generated by amalgamating the labeled data from all users.

4. The dimensionality reduction techniques showcase the 2D and 3D representations of the datasets and present the challenges associated with performing a data-mining approach to learn from data. The feature-processing techniques in Chapter 4 aim to deal with these challenges to improve the performance of cognitive workload prediction using the eDREAM dataset.

Figure 2.18: PCA applied to a 32-Dimensional Featurespace (28 Participants)

Figure 2.19: PCA applied to a 32-Dimensional Featurespace (28 Participants)

Figure 2.20: LDA applied to a 32-Dimensional Featurespace (28 Participants)

Figure 2.21: PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)

Figure 2.22: PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)

Figure 2.23: LDA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)

Figure 2.24: PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)

Figure 2.25: PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)

Figure 2.26: LDA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)

Figure 2.27: PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)

Figure 2.28: PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)

Figure 2.29: LDA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)

2.10 Chapter Summary

This chapter introduced Cognitive Modeling using a mixture of Subjective, Psychophysiological and Performance Measures. It introduced the advantages of using the EEG modality as a Psychophysiological measure. The eDREAM experiment was briefly introduced with a focus on the n-back task as a secondary cognitive task to accompany the primary driving task. The EEG modality of the eDREAM dataset, recorded using the low-channel Muse sensor, was discussed, and the Power Spectral Feature Extraction methodologies employed were introduced. The statistical behavior of the Features was then observed and analyzed to perform pre-experimental snooping of the patterns of the features. Finally, individualized and groupwise Dimensionality Reduction using PCA and LDA were compared, and Dimensionality Reduction was discussed both as a visualization tool and as a Machine Learning tool.

Chapter 3

Related Works and Review

This chapter summarizes the related works that informed this study. Attention is paid to the EEG modality in particular as it is the system modality in this study. There have been two approaches towards the estimation of the Driver Cognitive Workload: 1) using specific Digital Signal Processing algorithms to develop fine-tuned solutions, and 2) using data mining and statistical learning approaches to perform top-down or bottom-up feature selection processes. This study employs a bottom-up statistical learning approach, and the following sections summarize the related and state-of-the-art studies which have used similar techniques to study the relationship between Cognitive Workload and EEG sensitivities.

This chapter is organized as follows. Section 3.1 discusses Power Spectral sensitivity
findings in the related domains of Drowsiness / Fatigue, Cognitive Workload / Distraction
and Meditation. Section 3.2 discusses the state-of-the-art works using consumer-grade
low-channel EEG sensors.


3.1 Related Works

Many studies employ feature extraction techniques that operate in the power spectral
domain and report statistically significant activity in the five major bands employed
in this study: δ, θ, α, β and γ. While many studies deal with driving-related cognitive
workload estimation tasks, others deal with variations of cognitive states such as
fatigue, drowsiness, sleepiness and meditation. In this literature review, studies
relating EEG features to cognitive workload and to the aforementioned states are both
considered in order to understand the techniques and frameworks used in the development
of estimation models.

3.1.1 Drowsiness and Fatigue studies

Numerous studies showcase the relationship between Power Spectral sensitivities and the
onset of Drowsiness / Fatigue.[34][31][33][38][25][56][50] In terms of the Yerkes-Dodson
summarization, the Drowsiness / Fatigue states can be related to low arousal or a
cognitively under-loaded condition.[57] The following takeaways relating to power
spectral sensitivities are presented from the respective literature:

• Lal, Craig and Jap have performed prominent research related to drowsiness de-

tection in driving conditions which primarily deals with all the bands except γ.

Primary findings relate to a significant increase in δ, θ and a decrease in α and

β power bands during onset of fatigue.[34] In another study to understand the

onset of fatigue during long monotonous driving, a significant decrease in α was

observed along with a consistent increase of β-Relative Features (as employed in

this study).[31]

• It is also interesting to discuss the relationship between Cognitive Workload and
Fatigue. In a comprehensive review of aircraft-pilot cognitive studies by Borghini et
al., it is noted that a focused and sustained task often leads to fatigue over
time.[16] Both activities consist of an increase in θ and β activities and a decrease
in α activities. However, during the onset of fatigue, α spindle activity (bursts of α
bands) is observed, leading to increased δ, θ and α activities and a decrease in β
activity.[49] An abrupt change from a cognitively overloaded to an underloaded scenario
(and the need to distinguish between focused and fatigued states) can be very
challenging to detect when only single bands are selected for modeling.

Driver Drowsiness Monitoring

To generate monitoring systems, a classification stage is necessary to train and test the
various feature extraction methodologies. Khushaba et al. perform feature extraction
using the Wavelet Packet Transform and select frequency bands by inspecting and mining
frequencies from the entire spectrum. These selected features are used to generate
individualized models and are tested with Linear Support Vector Machine (LSVM), Radial
Basis Kernel Support Vector Machine (RBSVM), LDA and kNN statistical learning algorithms.
Owing to the comprehensive feature selection process and individualized models, an
accuracy of .94 is achieved using an LDA classifier.[33] In contrast, Yeo et al.
investigate the use of an SVM for classification and use the traditional FFT mechanism
for feature selection. An accuracy of .99 is achieved for detecting the onset of
drowsiness.[56] It is important to note that both studies use medical-grade EEG sensors
with high channel counts in a highly controlled environment, which may not be practical
for real driving situations.

3.1.2 Cognitive Workload and Distraction

A breadth of studies have also attempted to study the sensitivities associated with
Cognitive Workload. While Cognitive Workload and Cognitive Distraction are two distinct
phenomena, they are both associated with the presence of a secondary task. Therefore,
for this review, both distraction and workload based studies are discussed under the
same umbrella. A key finding consistently shown in most studies is an increase in θ band
activity and a decrease in α and β band activity as cognitive workload is systematically
increased.[24][47][51][14][59][48][23][35][41][17]

1. In a non-driving, single-task study requiring recall of up to 6 objects, Lundqvist et
al. showed a consistent increase in θ and γ band power as each new item is added to the
working memory task. Conversely, α and low β show a decreasing trend.[41] Related to
this, although not discussed as often, studies by Harmony et al. have focused on the δ
band and showcased an increase in δ band activity as similar memory retention tasks are
engaged.[27][28]

2. Ryu et al. tracked the α band power as a mental arithmetic task was performed with
increasing difficulty and observed a decrease in band power. Using factor analysis to
fuse the α band, ECG and EOG, it was determined that the combined measure could better
distinguish between the varying workloads.[45] Similar fusion techniques have been
applied with EEG, where features from other sensors or modalities can boost the
performance of the combined measure.[4]

3. A comprehensive study by Almahasneh et al. investigated the frontal lobes and the
associated hemispheric performance on cognitive memory and processing tasks. The effects
on the right frontal lobe were more pronounced than on the left when arithmetic tasks
were used as the secondary task to induce distraction during driving. Increases in θ and
low-α activity were both observed in the right frontal lobe after analysis across 40
subjects.[13]

4. In a prominent study performed by Gevins, Smith et al., a visuo-spatial recall primary
task was performed in which subjects were asked to recall the shape and position of
objects on a computer screen. A significant increase in θ activity at the frontal sites
was observed as the recall difficulty was increased, together with a decrease in α
activity at the parietal sites.[48][23]

5. In studies performed on meditation, which focus on attentiveness while in a relaxed
state, the γ band was considered to be of importance. Various studies observed an
increase in γ band activity in the pre-frontal cortex of experienced meditators,
corresponding to increased attentiveness.[44][15] Another meditation study described an
increase in α and a decrease in β activity during meditative practices focused on
attentiveness in a non-distracting environment.[19]

Figure 3.1 summarizes the findings.

Figure 3.1: Power Spectral sensitivities across cognitive tasks


3.2 State-of-the-Art Studies

In this section, studies that employed practical cognitive monitoring using the EEG

modality are discussed.

3.2.1 Wireless EEG System for Driver Vigilance (Lin et al.)[38]

In this practical study, a custom wireless EEG sensor is first designed using 4 dry
electrodes concentrated along the occipital region (back of the head). A mobile app is
designed which communicates wirelessly with the EEG sensor (Mindo). Experiments and
evaluation are performed on 15 users performing an immersive and focused 90 minute
driving challenge in a virtual simulator. The challenge entails correcting the vehicle
as it veers off its lane. The response time to perform the correction is recorded and
the duration of the response time is used as the labeling mechanism. Unlike other
studies, this study does not generate classification models, but rather generates
Support Vector Regression models to predict the response time of corrections based on
EEG states.

Hardware

A wireless headset with 4 channels is developed with analog amplification circuitry to

boost the low voltage EEG signals from the hairy scalp region of the occipital lobe. The

analog signal is digitized and sampled at 256Hz which serves as the original sampling

rate of the sensor. Data is transmitted wirelessly every two seconds to the mobile unit

upon which all the signal processing takes place.

Pre-Processing

Each two-second block of data, comprising 512 samples, is broken down into 128-sample
segments and, using a 256-point Hanning window, the PSD is generated using Welch's
method. PSD components are therefore generated on 0.5s sliding windows with .50 overlap.
30 frequency bins are subsequently generated covering a spectral range of 1-30Hz. These
are in turn used to create the corresponding δ, θ, α and β logarithmic band powers.
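The windowing arithmetic described above can be sketched as follows. This is a minimal NumPy illustration and not the code used by Lin et al.: the function name `welch_psd`, the synthetic 10Hz test signal, the simplified power scaling and the 8-13Hz α band edges are all assumptions. The 256Hz sampling rate, 128-sample segments, 50 percent overlap and 1-30Hz range come from the description; applying the Hanning window to each 128-sample segment and zero-padding it to 256 points is one reading of the "256" window length, chosen here because it yields 1Hz bin spacing.

```python
import numpy as np

def welch_psd(x, fs=256, seg_len=128, overlap=0.5, nfft=256):
    """Average windowed periodograms over overlapping segments (Welch's method)."""
    step = int(seg_len * (1 - overlap))           # 64 samples = 0.25 s hop
    window = np.hanning(seg_len)
    starts = range(0, len(x) - seg_len + 1, step)
    # Zero-padding each 128-sample segment to 256 points gives 1 Hz bin spacing.
    spectra = [np.abs(np.fft.rfft(window * x[s:s + seg_len], n=nfft)) ** 2
               for s in starts]
    # Simplified scaling (no one-sided doubling); adequate for relative comparisons.
    psd = np.mean(spectra, axis=0) / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, psd

# Synthetic 2 s block (512 samples) with a 10 Hz alpha-like component.
fs = 256
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

freqs, psd = welch_psd(x)
bins_1_30 = psd[1:31]                              # thirty 1 Hz bins covering 1-30 Hz
# Logarithmic alpha band power; the 8-13 Hz band edges are an assumption.
log_alpha = np.log10(psd[(freqs >= 8) & (freqs <= 13)].sum())
n_segments = len(range(0, len(x) - 128 + 1, 64))   # segments averaged per 2 s block
```

With these settings each two-second block is averaged over seven half-overlapping segments, which trades frequency resolution for a lower-variance spectral estimate.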

Further, the data received is divided into training, testing and baseline recordings.
The baseline recordings are the two seconds of data before the onset of the driving
correction challenge. The EEG data from the baseline is used as a normalization factor
for the training and test sets to gauge the true change in band behavior.

Classification

Support Vector Regression models using linear, sigmoid, polynomial and Radial Basis

Kernels are all explored. Individualized models are generated using a 50 - 50 split 2-Fold

cross validation procedure repeated 100 times. Final reporting is presented on remaining

test samples.
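The repeated 50 - 50 split described above can be illustrated with a short index generator. This is a sketch under stated assumptions, not the procedure used by Lin et al.: the function name `repeated_two_fold`, the sample count of 40 and the random seed are hypothetical, and any held-out test set is assumed to have been set aside before the splits are drawn.

```python
import numpy as np

def repeated_two_fold(n_samples, n_repeats=100, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated 50-50 two-fold cross validation."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)     # fresh shuffle for every repetition
        first, second = perm[:half], perm[half:]
        yield first, second                   # train on first half, validate on second
        yield second, first                   # swap the roles of the two halves

# Hypothetical example: 40 samples from one subject, 100 repetitions.
splits = list(repeated_two_fold(n_samples=40, n_repeats=100))
```

Each repetition contributes two train/validation pairs, so 100 repetitions give 200 model fits per subject and kernel.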

Results

α Band Power was highly correlated to the reaction time as has been shown in vari-

ous studies. The lack of vigilance as showcased by increased reaction times to vehicle

corrections corresponded with higher values of alpha. 6 feature-sets were experimented

with: [δ], [θ], [α], [β], [δ,θ,α,β], [30 1Hz bands from 1-30Hz]. Amongst the single bands,

α obtained the lowest RMSE and concatenating all bands further reduced the RMSE.

However, the best result was obtained by using 30 features from the 1-30Hz band region.

The Radial Basis Function SVR obtains the best regression results for all feature-sets.

Discussion and Learnings

In this study, individualized models are successfully developed and used to predict the
reaction times to lane deviations. Based on the predicted reaction times, it can be
inferred whether the driver is in a low-vigilance state. The following similarities and
differences between this study and ours are noted:


1. The idea of using baseline values as a normalization factor is also adopted in our
study, where the 30 second drive data before each nBack audio task is used as the
baseline data over which Z-Standardization takes place.

2. In our study, a similar cross-validation procedure is performed. However, instead of
reporting on the test set, the average performance across all validation sets is
reported.

3. A low-channel (4-channel) sensor is developed, similar to the sensor in our study,
which also has 4 channels (of which only 2 are used). However, the stark difference in
the location of the channels on the scalp should be observed. While this study places
sensors over the occipital scalp region, our study places them over the frontal scalp
region to monitor cognitive workload rather than vigilance.

4. A very similar feature extraction method is used, where a sliding window based
approach is taken by both methodologies to perform a zero-padded FFT. In our study, a
Hamming sliding window over a 220Hz signal is used with 90 percent overlap to generate a
reporting rate of 10Hz. This work instead uses a Hanning window over 128-sample
sub-windows with an overlap of 50 percent for a reporting rate of 2Hz.

5. The feature-set generation and comparison methodology of this study, using individual
power spectral features, grouped power spectral features and FFT power components, is
also employed in our study.

6. At the feature-processing level, once the PSD features have been generated, a 2
second sliding window is employed in this study to ensure good reporting resolution. In
our study, a similar 3 second sliding window with .95 overlap is employed for similar
reasons.

A limitation of this study is the lack of discussion or evaluation of the generalization
of the solution. The reported results are a mean of the individualized model prediction
results. In addition, because the problem is framed as a regression challenge, it is
difficult to compare it with the classification metrics prevalent in most studies on
distraction / cognitive workload estimation. Another limitation is the small sample of
participants over which the study has been conducted (15).

3.2.2 Wireless EEG for Cognitive Workload (Wang et al.)[53]

In this study, a computer-based visuo-spatial n-back game is played by 9 participants and
the EEG data is collected using a 14-channel Emotiv EEG headset. The 14 channels cover
the temporal, occipital and frontal scalp regions. Labeling of the EEG data is performed
simply by noting the n-back task being performed. 4 levels of the n-back task are
presented, labeled n0 - n3.

Feature Extraction

The original EEG recording is performed at 2048Hz and then downsampled to 128Hz for
wireless transmission. The EEG signal undergoes four types of Feature Extraction: 1)
FFT based Power Spectral generation in 2Hz intervals between 4-40Hz (18 features), 2)
Statistical Features: mean, variance, skewness and kurtosis of the signal, 3)
Morphological Features: curve length, number of peaks, average non-linear energy, 4)
Time-Frequency Features: Wavelet Packet Transform. A large set of features is therefore
generated across the 14 channels.

Pre-Processing

A Personal Standardization approach is introduced in this study which employs an
Inter-Quartile Range based feature scaling approach. In this approach, all features are
scaled to the range [0,1]. By looking at the historical data of a feature, a statistical
analysis similar to that performed for 'boxplots' is carried out, where outlier values
are identified and saturated to values of 0 or 1. The goal of this process is to remove
the influence of outliers from the dataset and to ensure that all participant feature
values are normalized onto the same scale. Following this, each epoch has 648
dimensions, and therefore a top-10 feature selection is performed using the
dimensionality reduction technique of Minimum Redundancy Maximum Relevance (mRMR), which
uses feature mutual information as the distance measure to choose the top features for
the classification stage.
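The boxplot-style scaling described above can be sketched as below. This is a minimal illustration rather than the implementation of Wang et al.: the function name `iqr_scale` and the example values are hypothetical, and the 1.5 × IQR fences are assumed because that is the conventional boxplot rule. The sketch also assumes the feature is not constant, since a zero range would make the scaling undefined.

```python
import numpy as np

def iqr_scale(x):
    """Scale a 1-D feature to [0, 1], saturating boxplot-style outliers at 0 or 1."""
    x = np.asarray(x, dtype=float)
    q_l, q_u = np.percentile(x, [25, 75])      # lower and upper quartiles
    iqr = q_u - q_l
    upper = min(x.max(), q_u + 1.5 * iqr)      # upper fence, capped at the observed maximum
    lower = max(x.min(), q_l - 1.5 * iqr)      # lower fence, capped at the observed minimum
    scaled = (x - lower) / (upper - lower)
    return np.clip(scaled, 0.0, 1.0)           # values beyond the fences saturate

# Hypothetical feature history containing one extreme value.
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
scaled = iqr_scale(feature)
```

In this example the extreme value saturates at 1 instead of compressing the remaining samples into a narrow band near 0, which is the behavior the scaling is intended to provide.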

Classification

A Radial Basis Function SVM was employed using 5-Fold cross validation to produce

binary classification results using one-vs-all and one-vs-one classifiers. The Top Results

are now stated for a dataset generated from those trials which contained correct nBack

responses:

1. When using the top 10 features from the entire featurespace of power, morphological,
statistical and time-frequency features, an accuracy of 0.82 is achieved for the n0 vs
all and n0 vs n3 classification challenges. n0 vs n2 achieves 0.71 accuracy. A ternary
comparison is not performed. All other binary classification comparisons lead to
accuracies between 0.65 and 0.70.

2. When using the top 10 features only from the Power Band set (4-40Hz), a top accuracy
of .71 is achieved for n0 vs n1 and n0 vs n3. Oddly, an accuracy of only .55 is achieved
for n0 vs n2. In general, all binary classification results are between 0.6 and 0.71
when using the top features from this feature-set.

Discussion and Learnings

Similar to our study, this study aims to build a generalizable model, and the Personal
Standardization technique it discusses is used extensively in our study. The same
Feature Scaling approach is used to perform subject-level normalization, as will be
shown in Chapter 4. The usage of the nBack task for Cognitive Load modeling with 4
classes is also similar to our study, where a granularity of three levels is chosen.
Dimensionality Reduction is also a common theme between the two studies; however, PCA
and LDA based reduction techniques are used in our study. An advantage of the mRMR
approach employed in this study is that the original features are used for
classification, which makes the model much easier to interpret. When using PCA or LDA,
a transformation into a secondary space takes place, leading to loss of the original
feature information.

Figure 3.2: State-of-Art Studies employing EEG and nBack Tasks

A disadvantage of this study lies in the way the Classification stage is performed.
Since the data is collected a priori, data snooping occurs in the form of the Personal
Standardization approach. During the outlier removal process, the entire dataset is
inspected to understand its distribution, and the same dataset is then used for the
cross validation process. This is not feasible in practical applications, where 'test'
data cannot be used for statistical purposes in the training stages.


3.3 Chapter Summary

In this chapter, the literature used to understand the research subject is discussed.
Findings in the Power Spectral Domain related to Cognitive Distraction, Cognitive
Workload, Drowsiness and Fatigue are presented. The importance of the Power Spectral
Bands and individual windowed frequency featurespaces is discussed. Finally, two
state-of-the-art studies that are drawn on extensively in our study are reviewed, with
the goal of critiquing them and identifying the techniques adapted from them in our
study.

Chapter 4

Estimating Cognitive Load with

Wireless EEG

Using data collected from the two frontal channels of the Muse EEG headset, the goal of
the estimation analysis is to identify the ideal feature(s) for estimating the cognitive
load of drivers. To that end, a data-driven experiment is performed in which Statistical
Learning Models are developed and implemented to explore the various featuresets
generated from the eDREAM dataset. The experimental methodology used in many machine
learning studies consists of well-defined practices pertaining to the training,
validation, testing and evaluation stages. In this study, these practices are followed
closely and are described with a view to generating generalizable models that estimate
cognitive workload while driving. In particular, the labeled eDREAM dataset lends itself
naturally to the development of supervised learning based Classification models.
Traditional algorithms such as LSVM, kNN, RBSVM and ANN are used to evaluate the various
feature spaces.

It should be noted, however, that the implementation and evaluation of the learning
models is highly dependent on the data-partitioning schemes and the details of the
feature extraction techniques. The goal of this study is to build a robust, real-time
implementable system that aims to be generalizable among subjects. To that end, the
experiment takes into account details of data partitioning, feature processing and
dimensionality reduction techniques (PCA and LDA). Recommendations based on the results,
and the implications of design decisions for the real-time implementation of the
proposed system, are provided.

Figure 4.1: Top Level Design Pipeline

The major goal of this experiment is to identify the top performing models and thereby
identify the best performing feature-set(s). The complexity of the learning algorithms
is also identified and real-time implications for learning and testing are discussed.
Figure 4.1 shows the design pipeline. Each of the stages is now described in detail.

4.1 Experiment Overview

The overview of the experiment is provided in Figure 4.1. Details about the labeling
procedure and experimental data collection are provided in Chapter 2. FFT coefficients
and Power Spectral based Absolute and Relative features are derived by the sensor from
the raw EEG recordings. Data is recorded concurrently across the two frontal EEG
channels. Table 4.1 describes the feature-set provided by the sensor.

Table 4.1: PSD Features

Measurement       Units   Sampling Rate
EEG               µV      220Hz
FFT               dB      10Hz
Absolute Power    Bels    10Hz
Relative Power    None    10Hz

An objective of this study is to understand the top featurespaces that allow for opti-

mal estimation of cognitive load. To that extent, various datasets consisting of varying

featurespaces and dimensionalities are generated to identify and perform an evaluation

based feature-selection. The underlying nature of this experiment is to learn from the

data and find top feature(s) that maximally discriminate between the differing cognitive

workloads.

A major challenge in building a practical generalizable model is addressing the various
system-level challenges associated with the implementation. EEG signals pose various
challenges due to the non-stationary characteristics of the signal, highly variable
inter-individual electrophysiological responses and high prevalence of recording
artifacts.[53][38] While the data collection campaign took care to ensure a controlled,
low-artifact environment, steps in this experiment deal with issues pertaining to the
non-stationary characteristics of the EEG recordings, and emphasis is placed on reducing
the effects of the high inter-individual differences in order to build generalizable
estimation models. The Standardization, Normalization and Data-Partitioning schemes
described in Sections 4.2 and 4.3 tackle these challenges. In addition, the performance
and practical implications of using these Feature-Processing techniques are discussed.

Using the cumulative eDREAM dataset, various sub-datasets can be derived by varying the
featurespace, performing dimensionality reduction (PCA and LDA) and generating
feature-processed datasets. These sub-datasets serve as the inputs to the Classification
stage, for which different data-partitioning schemes (Subject-Partitioned, Time
Partitioned and Individualized Subject-Specific) are discussed and evaluated.
Well-defined performance metrics are used to quantify the evaluation results and an
assessment of the findings is discussed (Section 4.5.2).

4.1.1 Data Selection

Using the experimental data-collection notes and observing the artifact information
across all 37 users, 28 of the 37 users (15 male, 13 female) are chosen for the
analysis. Users are excluded due to significant movement artifacts during
experimentation, poor skin conductance, lack of effort as per the experimental notes, or
incomplete tasks. Due to the sensitivity of EEG responses and their inherent
inter-individual differences, users showing the aforementioned artifacts/challenges were
dropped from the analysis to allow for maximal control of the effects of the secondary
nBack task.

4.2 Feature Extraction

The goal of this study is to understand the ideal Power Spectral feature-set(s) for
estimating cognitive workload. The basic time-domain data provided is the raw EEG signal
as a voltage representation. Relevant studies have used features derived from the raw
EEG signals. These features range from statistical and morphological to, most often,
Power Spectral.[53][37][4][38][33] This study focuses on the use of Power Spectral
derived featurespaces. Table 4.2 outlines 3 groups of Power Spectral derived features:
Absolute Power, Relative Power and β-Relative Power. All features are derived from the
FFT of the raw EEG signal and the resultant 129 Power Spectral Amplitude Coefficients,
as described in Section 2.6.2.

4.2.1 Cumulative Feature Space

A 16-dimensional feature vector is generated with 6 Absolute Power, 6 Relative Power and
4 β-Relative features. Since there are two channels of recording, a 16 features × 2
channels = 32-dimensional Featurespace is obtained. It is possible to fuse the channels
and generate an averaged 16-dimensional space. However, the 16 features from each side
are retained because of the distinct diversification of brain functions between the
hemispheres [13] and the observed non-uniformity in spectral behavior across the two
sites.

Table 4.2: 16-dimensional Feature Vector (per channel)

Feature group             Features                                            Sampling Rate
Absolute Power Bands      δ, θ, α, β, γ, βγ                                   10Hz
Relative Power Bands      δ, θ, α, β, γ, βγ                                   10Hz
β-Relative Power Bands    (θ + α) / β, α / β, (θ + α) / (α + β), θ / β        10Hz
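The assembly of the feature vector in Table 4.2 can be sketched as follows. The function names, the channel names `left` and `right`, and all numeric values are placeholders introduced for illustration. The sketch also assumes that the θ, α and β inputs to the ratios are positive, linear-scale band powers; which power scale the ratios are actually computed from is not stated in this section.

```python
import numpy as np

def beta_relative_features(theta, alpha, beta):
    """Return the four β-Relative ratios listed in Table 4.2."""
    return np.array([
        (theta + alpha) / beta,
        alpha / beta,
        (theta + alpha) / (alpha + beta),
        theta / beta,
    ])

def channel_features(absolute, relative, theta, alpha, beta):
    """Stack 6 absolute + 6 relative + 4 β-Relative values into 16 features."""
    return np.concatenate([absolute, relative,
                           beta_relative_features(theta, alpha, beta)])

# Placeholder values only, not measured data.
left = channel_features(np.ones(6), np.full(6, 0.2), theta=0.3, alpha=0.2, beta=0.1)
right = channel_features(np.ones(6), np.full(6, 0.2), theta=0.2, alpha=0.3, beta=0.25)

# Channels are concatenated rather than averaged, giving 32 dimensions.
feature_vector = np.concatenate([left, right])
```

Concatenation keeps any hemispheric asymmetry visible to the classifier, whereas averaging the two channels would halve the dimensionality at the cost of that information.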

4.3 Feature Processing

In order to mitigate the effects of noise artifacts and inter-user differences, it is
recommended to perform feature processing operations on the time-series feature vectors.
It should be noted, however, that much of the signal processing on the raw EEG signal is
pre-performed by the sensor, and processing at this stage is at the Featurespace level.
To that end, the pre-processing stage is applied to the 32-dimensional Featurespace
reported at a temporal resolution of 10Hz (100ms). Figure 4.2 shows the Feature
Processing pipeline and subsequent processing on the signal from Subject 37.

Figure 4.2: Top Level Design Pipeline

4.3.1 Windowed-Averaging

Windowed Averaging of the time-domain feature-set is used to provide adaptive noise
reduction and smoothing as a pre-processing step before the classification stage. Since
the featurespace is reported at 10Hz, consecutive samples are strongly correlated.
However, samples may also be contaminated with artifacts, and individual noisy samples
may overpower or distort the 'informative' signal present in nearby noiseless samples.
Windowed Averaging reduces the effect of such artifacts while preserving the informative
trends of the windowed signal. Furthermore, the artifact information transmitted by the
sensor is used to pinpoint samples where artifacts were reported, and these samples are
actively removed from consideration. After thorough analysis, the users selected for the
experimental analysis exhibited minimal artifact samples. Active artifact removal and
windowing are therefore both used to provide adaptive noise reduction and reduce the
effect of outliers in the classification training stages. The three major goals of
Windowed-Averaging as pertaining to this experiment are now summarized:

• Adaptive Noise Reduction.

• Ensure that the sample count is preserved and sufficient samples are available for
the Classification stages.

• Ensure reporting responsiveness in a real-time environment.

The two windowing schemes that were experimented with are discussed below, along with
the decision to use the Overlapping Moving Window scheme for subsequent analysis:

Non-Overlapping Moving-Average-Window

A non-overlapping moving-average-window operates by averaging every n consecutive
samples, where n is set by the window duration. For example, at 10Hz with a 1 second
Non-Overlapping Window, a new windowed measurement is available every 1 second. The
non-overlapping nature of the scheme is desirable as it allows for flexible
granularization of time segments where independent averages may provide more meaningful
insights. For example, in cases where changes in discrete cognitive workloads may occur
every few seconds, such as playing an action video game, it may be desirable to have no
overlap between short windows in order to analyze the distinct loads more robustly and
prevent contamination due to overlap. However, since the driving workloads pertaining to
the nBack tasks are much longer in duration (30s tasks), the non-overlapping window does
not offer much benefit.

A disadvantage of this scheme lies in the reduced responsiveness of the measurement. The
length of the window is also the real-time reporting period, as samples must be
collected and averaged before moving to the successive processing stage. If long windows
are desired for optimal performance, significant latency may be present between
recording and reporting. This can be a particular issue during High Cognitive Workload
periods where instant feedback is desired.

Another disadvantage is the drastic reduction in the number of samples, as many samples
are consumed by the averaging process. For example, at a sampling rate of 10Hz and using
a 1s non-overlapping moving window, 10 samples are averaged to generate a single sample.
This reduces the sample count by a factor of 10. This reduction can be a detriment to
the learning process, where it is desirable to have as many samples as possible to
facilitate the learning of complex, high-dimensional models.[3] These challenges can be
mitigated with an Overlapping Window.
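The sample-count reduction of the non-overlapping scheme can be illustrated with a short sketch. The function name `block_average` and the placeholder ramp signal are assumptions made for illustration; the 10Hz rate, the 1s window and the 30s task length come from the text.

```python
import numpy as np

def block_average(x, n):
    """Average consecutive, non-overlapping blocks of n samples (trailing remainder dropped)."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // n
    return x[:n_blocks * n].reshape(n_blocks, n).mean(axis=1)

fs = 10                                       # feature reporting rate in Hz
signal = np.arange(300, dtype=float)          # 30 s of placeholder samples
averaged = block_average(signal, n=1 * fs)    # 1 s non-overlapping window
```

A 30s task that yields 300 feature samples at 10Hz is reduced to 30 averaged samples, one per second.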

Overlapping Moving-Average-Window

A simple Dirichlet (Boxcar) window is used for the analysis. The overlapping scheme is
generated by using a sliding window of size n which is slid by 1 sample for each new
sample. The windowed result is generated by averaging the new sample with the n − 1
previous samples. A major advantage of this scheme is the preservation of the total
sample count for the succeeding classification stage. As noted earlier, having more
samples for the learning stages allows for the development of more robust and
generalizable learning models. Another advantage is the responsive real-time reporting,
which allows for practical and proactive reporting of cognitive workload. Because the
window slides by 1 sample, a new windowed measurement can be made with each successive
sample and no latency in reporting is observed. However, a practical consideration is
the requirement to maintain a buffer of n previous samples, which may be of concern in
resource-constrained computational environments such as a headband computer. In this
study, a 3s sliding window with 90 percent overlap was used.
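A sliding boxcar average that advances by one sample can be sketched as below. The function name `sliding_average`, the placeholder ramp signal and the warm-up handling (averaging whatever samples exist before a full window is available) are assumptions for illustration; the window length of 30 samples assumes a 3s window at the 10Hz reporting rate.

```python
import numpy as np

def sliding_average(x, n):
    """Causal boxcar average: mean of the current sample and the n-1 previous samples."""
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))    # csum[i] = sum of x[:i]
    out = np.empty_like(x)
    # Warm-up: fewer than n samples exist, so average what is available (an assumption).
    k = min(n - 1, len(x))
    out[:k] = csum[1:k + 1] / np.arange(1, k + 1)
    # Steady state: the difference of cumulative sums gives each full-window sum.
    out[k:] = (csum[n:] - csum[:-n]) / n
    return out

fs = 10                                       # feature reporting rate in Hz
signal = np.arange(300, dtype=float)          # 30 s of placeholder samples
smoothed = sliding_average(signal, n=3 * fs)  # 3 s window, slid by one sample
```

The output has the same number of samples as the input, in contrast to the non-overlapping scheme, and in a streaming setting only the most recent n samples need to be buffered.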

4.3.2 Standardization

Figure 4.3: Windowing Applied to a time-series Absolute γ signal

A major challenge with EEG signals is the high inter-individual differences in the
recordings. These differences arise not only from experimental artifacts present in
individual recordings, but also from differences in electrophysiological behavior
between participants, resulting in varying baselines between individuals.[53][38]
Inter-user differences and the presence of artifacts in the recorded signals are a major
deterrent to building generalizable models to predict workload. Therefore, a
standardization process is recommended and performed in most studies to mitigate these
concerns and improve the generalization of models.[53][38] A commonly used
standardization statistic is the Z-Score, which expresses the data as the number of
standard deviations from the mean. This allows for inter-subject standardization, as the
baseline differences between individual users are mitigated once each individual
recording is described as a relative z-score.

$$Z = \frac{X - \mu(X)}{\sigma(X)} \tag{4.1}$$

where Z is the Z-Score, X is the original data, µ(X) is the mean of the dataset and

σ(X) is the standard deviation of the dataset.


Subject-Standardization

In order to achieve subject-level standardization, historical data of the participant is

required. In this study, the first 50 seconds of each drive which correspond to the time

before the nBack audio task is activated is taken as the baseline data for standardization

statistics. The mean and standard deviation is computed from the baseline data and the

labeled nBack task data is standardized as shown in Equation 4.2.

$$X_{std} = \frac{X_{raw} - \mu_{base}}{\sigma_{base}} \tag{4.2}$$

where Xstd is the standardized labeled dataset, Xraw is the original labeled dataset,

µbase is the mean of the baseline data and σbase is the standard deviation of the

baseline data. Xstd expresses how many baseline standard deviations each labeled

sample lies from the mean of the baseline data.
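Equation 4.2 can be illustrated with a short Python sketch. The function name `standardize_with_baseline` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def standardize_with_baseline(baseline, labeled):
    """Apply Equation 4.2: express each labeled sample as a z-score
    relative to the mean and standard deviation of the baseline data."""
    mu_base = mean(baseline)
    sigma_base = pstdev(baseline)  # population std; an assumption
    if sigma_base == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu_base) / sigma_base for x in labeled]


# Toy values: baseline mean is 2.0 and baseline std is 1.0.
x_std = standardize_with_baseline([1.0, 3.0], [2.0, 4.0, 0.0])
```

In a deployed system the baseline list would be filled during the calibration period (here, the first 50 seconds of a drive) and then frozen for the rest of the session.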

A subject-standardized cumulative dataset is created by concatenating the

subject-standardized datasets of all users. This cumulative dataset shows a

reduction in the inter-individual differences between participants. A disadvantage of the

subject-level standardization process is the requirement of the collection of historical

data of a participant to learn the statistical properties required for the standardization.

Implementation of a real-time system using a standardization approach can lead to addi-

tional steps such as calibration and increased computational requirements. However, in

this study, while the practical implications are discussed, results for both standardized

and original datasets are compared.

4.3.3 Subject-Level Normalization

A Feature Scaling approach is used as proposed by Wang et al., which is adept at removing

outlier samples as well as scaling the measurements to a fixed range [0,1]. Removing

the outliers prevents such samples from contaminating and biasing the learning models.


Figure 4.4: Standardization operation applied to a time-series 3s Averaged Absolute γ

signal

Scaling the features to a fixed range ensures that no single feature is overweighted during

the learning process. Equations 4.3, 4.4 and 4.5 outline the Feature Normalization procedure,

which is applied to the subject-standardized datasets.

$$X_{scaled} = \frac{X_{raw} - X_l}{X_u - X_l} \tag{4.3}$$

$$X_u = \min\left(X_{max},\; Q_u + 1.5\,(Q_u - Q_l)\right) \tag{4.4}$$

$$X_l = \max\left(X_{min},\; Q_l - 1.5\,(Q_u - Q_l)\right) \tag{4.5}$$

where Xscaled is the normalized feature, Xraw is the original sample value, Xu is the

upper limit, Xl is the lower limit, Qu is the upper quartile of X, Ql is the lower quartile

of X, Xmax is the maximum value of X and Xmin is the lowest.
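The normalization in Equations 4.3 to 4.5 can be sketched as follows. Two points are assumptions rather than details taken from the text: samples outside [Xl, Xu] are dropped (clipping them to the limits would be an alternative reading of "removing outlier samples"), and quartiles are computed with the inclusive method of Python's `statistics.quantiles`. The function name `scale_feature` is hypothetical.

```python
from statistics import quantiles


def scale_feature(values):
    """Scale one feature to [0, 1] using quartile-based limits."""
    q_l, _, q_u = quantiles(values, n=4, method="inclusive")
    iqr = q_u - q_l
    x_u = min(max(values), q_u + 1.5 * iqr)  # Equation 4.5 upper limit (4.4)
    x_l = max(min(values), q_l - 1.5 * iqr)  # Equation 4.5 lower limit
    if x_u == x_l:
        raise ValueError("feature has no spread between the limits")
    # Assumption: samples outside the limits are treated as outliers
    # and removed before scaling.
    kept = [x for x in values if x_l <= x <= x_u]
    return [(x - x_l) / (x_u - x_l) for x in kept]  # Equation 4.3


# Toy feature with one extreme value (100) that falls outside the limits.
scaled = scale_feature([0, 1, 2, 3, 4, 5, 6, 7, 100])
```

For this toy input the quartiles are 2 and 6, so the upper limit is 12 and the lower limit is 0; the value 100 is discarded and the remaining samples are divided by 12.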


Figure 4.5: Normalization operation applied to a time-series 3s Averaged and Standard-

ized Absolute γ signal

Figure 4.6: Complete Feature Processing methodology to transform original Time-Series

signal into a processed Normalized Signal prepared for the Classification Stage


4.4 Machine Learning Algorithms

The feature-processing stage generates many datasets, which require a modeling

methodology to identify the top featurespaces. Given the estimation goal of the proposed

system and the availability of a labeled dataset, a natural approach is to use Supervised

Machine Learning algorithms. The task can be phrased as a classification problem

based on the ’ground-truth’ labels associated with

each labeled dataset. Equation 4.6 formalizes this classification problem:

$$y = \begin{cases} 0, & \text{no n-back task during driving} \\ 1, & \text{1-back task during driving} \\ 2, & \text{2-back task during driving} \end{cases} \tag{4.6}$$

An advantage of a data-driven approach is that it avoids the development of specific algorithms

to determine the underlying patterns of the various featurespaces for the classification

problem. In addition, the feature extraction and feature-processing stages are

completely decoupled from the classification stage. This allows various feature-processing

techniques (as described in previous section) to generate numerous datasets which can

be evaluated by controlled processes in the classification stage. A labeled dataset, X, is

generated by the Feature Extraction / Pre-Processing stage and passed to the Classification

stage. The Machine Learning algorithms, data-partitioning techniques, validation-based

hyper-parameter optimization and evaluation criteria used in this experiment are discussed

in this section.

Simulations were performed using a mixture of Python and MATLAB programs. In

particular, Python was used for SVM hyper-parameter tuning using libSVM[20], and

the development of a shallow neural network using TensorFlow. While hyper-parameter

tuning was primarily performed using Python based programs, final evaluation was per-

formed using MATLAB due to greater data visualization flexibility and ease of use of


parallel computing capabilities.

Most studies use ’classical’ machine learning algorithms for the estimation

challenge and devote greater attention to the feature extraction processes.

In particular, studies evaluate both linear and non-linear classifiers on their respective

datasets. Some newer studies have explored Deep Learning architectures; however,

features are then often not extracted in a separate processing stage but are learned

during training.[59][38] In this study, because a strictly defined

feature-set family (the Spectral Power Components) is used, shallow-learning algorithms

are evaluated to identify the feature or feature-set that optimizes the classification

task. As a secondary goal, evaluating the following five algorithms provides a preliminary

comparison between the various classification algorithms.

4.4.1 Support Vector Machine with Linear Kernel (lSVM)

The lSVM is a maximum-margin linear classifier which generates an optimal separating

hyperplane using soft constraints.[20][3] No transformation into a

higher-dimensional space is required, so only the penalty

parameter C needs to be optimized. Varying C introduces regularization into the

model to balance in-sample and out-of-sample performance.

A larger C moves towards a hard constraint and overfitted models, while a smaller

C moves towards underfitted models; a validation scheme is therefore required to determine

the optimal value of the regularization parameter. [3][20] This experiment uses the Grid

Search methodology provided in the libSVM library and the accompanying guidelines to

perform this parameter tuning. The lSVM is inherently a binary classifier, so

a One-Vs-All approach is used to build n models, where n is the number of classes. The

following hyper-parameters are considered for this experiment:

• Penalty Factor C: Exhaustive search through libSVM grid search algorithm


4.4.2 Logistic Regression (LR)

Logistic Regression is another linear classification algorithm, where the model produces

probabilistic outputs specifying the confidence of each class being the correct output. [3]

The maximum probability class is chosen as the prediction output. Once again, in order

to optimally generalize between in-sample and out-of-sample, a regularization penalty

term C is used in a manner very similar to that of lSVM described above. The following

Hyperparameters are considered for this experiment:

• Penalty Factor C: 10, 1, 0.01, 0.001

4.4.3 k-Nearest Neighbors (kNN)

The k-Nearest Neighbors classifier is not a trained model; rather, it memorizes the labeled

training samples and uses a voting scheme based on k neighbours to assign a label to a new

test sample. [3] The Euclidean distance metric is used to determine

the k training samples closest to the test sample, and a majority vote is taken over

the labels of these k neighbours. In case of ties, the label of the closest sample is chosen.

Potential hyper-parameters that can be chosen are various forms of Distance Metrics,

weighting of neighbours and number of neighbours. The following Hyperparameters are

considered for this experiment:

• Distance Weights: Inverse, Equal-Weighted

• Number of Neighbors: 1,10,50,100
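The voting rule described above can be sketched in plain Python. This is an illustrative sketch, not the code used in the study: the function name `knn_predict` and the `weights` argument values are hypothetical, and ties are resolved here by taking the closest neighbour among the tied classes, which is one reading of the tie-break rule.

```python
import math
from collections import defaultdict


def knn_predict(train_x, train_y, query, k, weights="equal"):
    """Predict a label for `query` by voting among its k nearest
    training samples under the Euclidean distance."""
    # Sort (distance, label) pairs so the nearest samples come first.
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    neighbours = ranked[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        if weights == "inverse":
            votes[label] += 1.0 / (dist + 1e-12)  # avoid division by zero
        else:
            votes[label] += 1.0
    best = max(votes.values())
    tied = {label for label, v in votes.items() if v == best}
    # Tie-break: the label of the closest neighbour among the tied classes.
    for _, label in neighbours:
        if label in tied:
            return label


# Toy two-class data in a 2-dimensional featurespace.
train_x = [(0, 0), (1, 0), (5, 5), (6, 5)]
train_y = [0, 0, 2, 2]
label = knn_predict(train_x, train_y, (0.4, 0.0), k=3)
```

With inverse weighting, closer neighbours contribute more to the vote, so ties become rare; with equal weighting and an even k they are common, which is why the tie-break rule matters.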

4.4.4 Support Vector Machine with Radial Basis Kernel (rbSVM)

The rbSVM is an extension of the lSVM where the ’kernel trick’ is applied to perform a

transformation of the input space into higher dimensions. [3] This is useful when the data


is not linearly separable and a higher dimensional featurespace is desired to compute the

optimal hyperplane.[3] Due to the kernel transformation, an additional hyper-parameter

γ, the kernel width parameter (the sensitivity to individual training samples), needs to be

optimized. Higher γ values lead to overfitting and lower γ values lead to underfitted models.

Therefore, a validation procedure is required

to find the optimal hyper-parameters. This experiment uses the Grid Search methodology

provided in the libSVM library and the accompanying guidelines to perform this

parameter tuning. The following hyper-parameters are considered for this experiment:

• Penalty Cost C : Exhaustive Grid search by libSVM algorithm

• Sample Sensitivity γ: Exhaustive search by libSVM grid search algorithm

4.4.5 Shallow Artificial Neural Network (ANN)

A shallow Feed-Forward Neural Network trained with the backpropagation algorithm is developed

for this experiment. The Cross Entropy Loss is used as the objective function for

training. ReLU activation functions are used in the hidden layers and a softmax function is

used in the output layer to generate class probabilities. Regularization via Penalty Factor λ

and early stopping are applied during training. The Adam Optimizer is used

to minimize the objective function. The following hyper-parameters are

optimized:

• Learning rate α: 0.1,0.01,0.001

• Regularization Parameter λ: 0.005, 0.01, 0.1

• Architecture (Layers-Nodes): 2-10, 1-10, 1-100, 2-100
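The search space implied by the three lists above can be enumerated as a simple grid. The sketch below only builds the candidate configurations and does not train a network; the dictionary keys are hypothetical names, and reading "Layers-Nodes" as (number of hidden layers, nodes per layer) is an assumption.

```python
from itertools import product

learning_rates = [0.1, 0.01, 0.001]
penalties = [0.005, 0.01, 0.1]
# Assumed reading of "Layers-Nodes": (hidden layers, nodes per layer).
architectures = [(2, 10), (1, 10), (1, 100), (2, 100)]

grid = [
    {
        "learning_rate": lr,
        "penalty": lam,
        "hidden_layers": layers,
        "nodes_per_layer": nodes,
    }
    for lr, lam, (layers, nodes) in product(
        learning_rates, penalties, architectures
    )
]

print(len(grid))  # 3 * 3 * 4 candidate configurations
```

Each configuration would be evaluated with the validation scheme of Section 4.5, so the size of the grid multiplies directly into training time.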

4.5 Performance Evaluation

The goal is to build a practical, real-time implementable cognitive

workload estimation system. To that end, careful attention must be paid to


Figure 4.7: Machine Learning Algorithm and evaluated Hyper-Parameters

the evaluation and data partitioning schemes to account for the practical assessment of the

solution. In a practical scenario, systems are either built by using collected data to build

customised solutions which are tested on new data, or as in our approach, a data mining

approach is followed and statistical learning models are developed to learn the inherent

patterns in the data. In both approaches, the dataset is identical, however, the goal is

to use the data to perform well on new unseen data. Ideally, the new unseen data would

be test data collected after the experiment and then evaluated on the models generated from

the original dataset. However, such an approach is infeasible, as cost, controllability and

time constrain a secondary data collection campaign. Instead,

a data partitioning approach is used where the collected and labeled dataset is divided

into training, validation and test sets and well established methodologies are followed to

optimize the in-sample performance with the out-of-sample generalization.[3] The Ma-

chine Learning training and evaluation procedure is shown in Figure 4.8 and described


Figure 4.8: k-Fold Cross-Validation

in the following sections:

4.5.1 Training and Testing

Performance evaluation is established by training models on the training set (in-sample)

and evaluating on samples outside of the training set (out-of-sample). Training and testing

on the same set will certainly lead to overly optimistic results. However, testing on

a random test set is also not the ideal indicator of out-of-sample performance as the

selected test set may be biased by overly optimistic/pessimistic samples. Performance is

affected not only by the selection of the training / test sets from selections within the

dataset, but also by the partitioning ratios of the various sets. For example, performing

a fifty-fifty split of training and test sets will drastically reduce the samples available for

training and affect the optimization of the in-sample performance. Inadequate

optimization of the in-sample performance will lead to unsatisfactory out-of-sample

performance accordingly. In particular, it is desired to have a large number of training

samples when training complex models or dealing with a dataset in high dimensions.[3]

The machine learning challenge is to minimize the in-sample error Ein while also minimiz-

ing the generalization error (Eout−Ein). In essence, it is desired to have an out-of-sample

error very close to the in-sample error to ensure that trained models provide realistic and

reproducible results when tested with new unseen samples. Throughout this process of

optimizing the two measures, model hyper-parameters are constantly adapted until a set

of hyper-parameters are found which accomplish the two aforementioned optimizations.

An established methodology to accomplish this is the K-Fold Cross Validation technique

described below. [3]

K-Fold Cross-Validation

In order to optimize the minimization of Ein and (Eout − Ein), a balance between the

choice and ratio of test and training sets is required. Too many training points can lead

to an unpredictable out of sample behavior while too many test points can lead to the

generation of an inadequate model. To mitigate these issues, a technique called K-Fold

Cross-Validation is used which introduces a third set, called the Validation Set.[3] K-Fold

validation as used in this experiment is now described:

1. Dataset is divided into a 90 percent Training Set and 10 percent Test Set.

2. Training Set is divided into 5 Folds (The partitioning of data will be discussed in

the next section)

3. An initial set of model hyper-parameters is chosen and a model is trained on 4 of the 5 folds. The

resulting model is tested on the remaining 5th fold, called the ’validation’ set. This

process is repeated until each fold has served as the validation set once. This

ensures that all samples are used as training samples multiple times but are used

as validation samples only once.

4. Repeat Step 3 until all desired hyper-parameters have been evaluated.

5. Choose the hyper-parameter set

which achieved the lowest average validation error.

6. Train on the entire training set (all 5 folds) with the chosen hyper-parameters.

7. Use the generated model to evaluate the Test Set.
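The seven steps above can be sketched as a single generic routine. This is a minimal sketch under stated assumptions: `fit(x, y, params)` and `error(model, x, y)` are hypothetical callables supplied by the caller, hyper-parameter candidates must be hashable, and the random shuffle used to form the sets is a placeholder for the partitioning schemes discussed in the next section.

```python
import random


def k_fold_select(samples, labels, candidates, fit, error,
                  k=5, test_frac=0.1, seed=0):
    """Select hyper-parameters by k-fold cross-validation and report
    the error of the final model on a held-out test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # assumption: random partitioning
    n_test = max(1, round(test_frac * len(idx)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]      # step 1
    folds = [train_idx[i::k] for i in range(k)]           # step 2

    def subset(ids):
        return [samples[i] for i in ids], [labels[i] for i in ids]

    mean_err = {}
    for params in candidates:                             # step 4
        errs = []
        for v in range(k):                                # step 3
            fit_ids = [i for f in range(k) if f != v for i in folds[f]]
            model = fit(*subset(fit_ids), params)
            errs.append(error(model, *subset(folds[v])))
        mean_err[params] = sum(errs) / k
    best = min(mean_err, key=mean_err.get)                # step 5
    final_model = fit(*subset(train_idx), best)           # step 6
    return best, error(final_model, *subset(test_idx))    # step 7
```

The test set is touched exactly once, after the hyper-parameters are fixed, which is what keeps the reported test error an honest out-of-sample estimate.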

Data Partitioning

In order to reflect how such a system would be deployed in a real driving

scenario, attention must be paid to the paradigms which would facilitate a practical data

collection, training and evaluation scenario. Some obvious choices for practical usability

of a trainable system are as follows:

1. An individualized model is generated for each driver and then tested on the same

driver. While this option is ideal due to the fine tuning to individual

responses, it is often time consuming and impractical to gather sufficient individual

data to deliver a graceful out-of-the-box experience. A better experience can

be achieved if pre-trained models from other users’ data can be used to evaluate a

new user’s data as described next.

2. A generalized pre-trained model is generated by multi-user collected data and is

then tested on new users. This allows for the best out-of-box experience as no

individualized training is required and pre-trained models can be used to predict

workload of new individuals


3. A generalized model is generated by multi-user collected data and is then tested

on the same users. This differs from (1), in that, the models generated using this

methodology are not individualized and instead are generated from a multitude

of users. A downside of this methodology is the requirement for individual data

collection to retrain the models.

Approach (1) is denoted as the Individualized Subject-Specific partitioning scheme

where individual models for each user are generated and tested.

Approach (2) is denoted as Generalized Subject-Partitioned scheme where test sub-

jects are not used as part of the training procedure.

Approach (3) is denoted as Generalized Time-Partitioned scheme where subjects are

used both in the train and test sets. It is very important to note that even though the

same users are used for both the training and testing, the training and test sets are two

distinct sets from the same users where test samples are never included in the training

procedure to mitigate sampling bias.
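The difference between the partitioning schemes can be made concrete with a small sketch. The record layout (a dict with `subject` and `t` keys) and both function names are hypothetical, and holding out the last portion of each subject's time-ordered samples is only one possible way to realize a time-based split; the text does not specify the exact rule.

```python
def subject_partitioned(records, test_subjects):
    """Approach (2): hold out whole subjects, so test subjects never
    contribute to training."""
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


def time_partitioned(records, test_frac=0.1):
    """Approach (3): every subject appears in both sets, but the test
    samples are a distinct, held-out portion of each subject's data.
    Applied to a single subject's records, this gives approach (1)."""
    by_subject = {}
    for r in sorted(records, key=lambda r: (r["subject"], r["t"])):
        by_subject.setdefault(r["subject"], []).append(r)
    train, test = [], []
    for rows in by_subject.values():
        # Assumption: the last `test_frac` of each recording is held out.
        cut = len(rows) - max(1, round(test_frac * len(rows)))
        train += rows[:cut]
        test += rows[cut:]
    return train, test


# Toy dataset: 3 subjects with 10 time-ordered samples each.
records = [{"subject": s, "t": t} for s in (1, 2, 3) for t in range(10)]
sp_train, sp_test = subject_partitioned(records, {3})
tp_train, tp_test = time_partitioned(records, test_frac=0.2)
```

In both functions the training and test lists are disjoint; what changes is whether a subject's identity is shared across them.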

4.5.2 Performance Metrics

Most studies use Accuracy as the primary evaluation criterion. Accuracy is described in

Equation 4.7.

$$A(p, y) = \frac{1}{n_{count}} \sum_{i=1}^{n_{count}} [p_i = y_i] \tag{4.7}$$

where A(p, y) is the Accuracy, ncount is the number of samples, p and y are the predicted

and correct classes of a sample, and [ ] is the indicator function.

In cases where the datasets are unbalanced (one class is overweighted in representa-

tion), it is much more useful to use the related metrics of Precision and Recall as has been

used by various studies. Precision and Recall are described in the following equations[16]:

$$P = \frac{TP}{TP + FP} \tag{4.8}$$

$$R = \frac{TP}{TP + FN} \tag{4.9}$$

where TP is the number of samples correctly predicted as the positive class, FP is the

number of samples incorrectly predicted as the positive class and FN is the number of

samples incorrectly predicted as the negative class.

Precision can be described as the ’rate’ of correctly predicting the positive class.

That is, of all samples predicted as the positive class, it is the fraction that are correct

predictions. Recall can be described as the ratio of correctly predicted positive samples to

the total number of actual positive samples. While Precision and Recall can each be used

individually as evaluation metrics, a weighted harmonic mean of Precision and Recall,

called the F-Score, takes both into account and

allows for a single numeric representation of the performance. The F-Score is described by

Equation 4.10.[16]

$$F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R} \tag{4.10}$$

where Fβ is the F-Score, β is the recall weight, P is the precision and R is the recall. By

increasing the value of β, the recall is given greater importance. In this experiment, both

F1 and F2 scores are computed. F1 weights precision and recall equally,

while F2 places greater weight on recall.

In the final evaluation reporting, the Accuracy, the average F1 score and the average

F2 score over the classes of the binary and ternary tasks are reported.
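Equations 4.7 to 4.10 can be computed directly from lists of predicted and correct labels. The sketch below is illustrative; the function names are hypothetical, and averaging the per-class F-Scores with equal weight (a macro average) is an assumption about how "the average" is taken.

```python
def accuracy(pred, true):
    """Equation 4.7: fraction of samples whose prediction is correct."""
    return sum(p == y for p, y in zip(pred, true)) / len(true)


def precision_recall(pred, true, positive):
    """Equations 4.8 and 4.9, treating `positive` as the positive class."""
    pairs = list(zip(pred, true))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """Equation 4.10: weighted harmonic mean of precision and recall."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


def macro_f_beta(pred, true, classes, beta=1.0):
    """Assumed averaging: unweighted mean of the per-class F-Scores."""
    scores = [f_beta(*precision_recall(pred, true, c), beta) for c in classes]
    return sum(scores) / len(scores)


# Toy ternary example.
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(pred, true)
f1 = macro_f_beta(pred, true, classes=(0, 1, 2), beta=1.0)
f2 = macro_f_beta(pred, true, classes=(0, 1, 2), beta=2.0)
```

For class 1 in this toy example, precision is 2/3 and recall is 1, so F2 (10/11) exceeds F1 (0.8): the higher β rewards the perfect recall more heavily.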

4.6 Experiment Results

A comprehensive set of simulation results is presented which describes the performance of

featuresets that would be applicable in the aforementioned practical driving and modeling

scenarios. Both individualized and generalized model results are presented for both

binary and ternary performance.


4.6.1 Individualized Subject-Specific Performance

Individualized subject-specific models are generated for single users using the time-based

partitioning scheme. kNN and lSVM algorithms are used and 5-fold cross-validation is performed.

Results presented are the averaged validation scores of the top hyper-parameter.

Discussion

Figures 4.9, 4.10 and 4.11 show the individualized classification performance using the accuracy

metric. Tables 4.3 and 4.4 show the median performance of the 28 individualized

models generated for the 28 participants for the 32-feature and sub-featurespaces. A

comparison between modeling on the original dataset and the proposed feature-processed

dataset is shown. The following observations are made:

1. The standardization and normalization procedures of the feature-processing

stage boost the performance of the individual models. With

subject-level standardization and normalization, a linear classifier

(lSVM) clearly discriminates between the classes, with .991 accuracy in the binary

task and .976 in the ternary task (Table 4.3).

This is expected as fine-tuned individual models are generated for each participant.

2. In the reduced featurespaces, Absolute Power shows the best discriminant behavior

across both binary and ternary datasets with performance accuracy of .923 and .890

respectively. Absolute Power feature-set is the best choice for individual subject-

specific models, however, significantly better performance is achieved using any of

the sub-featurespaces when compared to performance in the original (no feature-

processed) datasets.

3. PCA is performed on the original 32-dimensional feature-set and the featurespace

is reduced to 3-dimensions. In the binary classification task for the original (no

feature-processing) dataset, no significant performance improvements are observed


Figure 4.9: Individualized Models median binary classification performance

and .780 accuracy is achieved. For the same classification task on the Feature-

Processed dataset, an accuracy of .871 is achieved. This is a degradation from the

performance when using all 32 features (.991). For the ternary classification task,

a significant performance drop is observed due to the dimensionality reduction in

the feature-processed dataset.

4. LDA is also performed on the original 32-dimensional dataset, reducing it to 1

and 2 dimensions for the binary and ternary datasets respectively. In the feature-processed

dataset, for both classification tasks, performance is improved as LDA is

successful in maximizing the inherent linear separability of the data. Figures 4.12

and 4.13 visually showcase this observation. Reducing the 32-dimensional space

to 1 and 2 dimensions results in equivalent or superior performance when

compared to the performance of the complete 32-dimensional featurespace. LDA

as a dimensionality reduction technique is recommended for individualized datasets

when the proposed feature-processing scheme is performed. This result also points

towards the inherent linear separability of the individualized datasets, where simple

linear classifiers can be used to generate optimal models for discriminating between

cognitive workloads.


Figure 4.10: Individualized Models median ternary classification performance

Figure 4.11: Individualized Models sub-featurespace performance


Table 4.3: Medians of Individualized Scores using All-Features(32)

Algorithm    Classes        Original            Feature Processing
                            ACC   F1    F2      ACC   F1    F2
lSVM         0 vs 2         .764  .754  .758    .991  .991  .991
lSVM(LDA)    0 vs 2         .767  .761  .761    1     1     1
lSVM(PCA)    0 vs 2         .780  .781  .784    .871  .894  .894
lSVM         0 vs 1 vs 2    .662  .654  .656    .976  .975  .974
lSVM(LDA)    0 vs 1 vs 2    .572  .582  .589    .950  .986  .986
lSVM(PCA)    0 vs 1 vs 2    .595  .618  .625    .666  .775  .778

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM

Table 4.4: Medians of Individualized Scores using Sub-Features

Algorithm  Classes        Absolute            Relative            β-Relative
                          ACC   F1    F2      ACC   F1    F2      ACC   F1    F2
lSVM       0 vs 2         .923  .959  .959    .880  .889  .889    .792  .858  .857
lSVM       0 vs 1 vs 2    .790  .884  .884    .746  .745  .752    .666  .792  .798

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM


Figure 4.12: 2-dimensional LDA Space for Subject 37 on original dataset

Figure 4.13: 2-Dimensional LDA Space for Subject 37 on feature-processed dataset


Figure 4.14: 3-dimensional PCA Space for Subject 37 on feature-processed dataset

4.6.2 Generalized Subject-Partitioned Classification

In this grouping scheme, one set of participants is used to train the models and the

generated models are then tested on a new set of participants. This is the most ambitious

experimental condition, as the test set participants may not have statistical properties

related to the training set participants (Figures 4.15 and 4.16).

Discussion

For both the binary and ternary tasks, only near chance-level performance is achieved (chance

being 50 percent and 33 percent respectively). Dimensionality reduction into

PCA and LDA featurespaces does not boost performance. The best performance for the

binary classification task (.551) is achieved in the PCA space using a kNN model with 10 nearest

neighbours. A kNN model with 10 neighbours also achieves the best performance in the

ternary classification task (.394) in the PCA space. However, the best performance is not


Figure 4.15: Generalized Subject-Partitioned binary classification

Figure 4.16: Generalized Subject-Partitioned ternary classification


Table 4.5: Generalized Subject-Partitioned Binary Classification

Algorithm (Featurespace)  Classes   Dimensions   ACC   F1    F2
lSVM                      0 vs 2    32           .509  .505  .509
LR                        0 vs 2    32           .511  .507  .511
kNN                       0 vs 2    32           .538  .535  .537
rbSVM                     0 vs 2    32           .539  .375  .440
ANN                       0 vs 2    32           .487  .486  .488
lSVM (PCA)                0 vs 2    3            .509  .505  .509
LR (PCA)                  0 vs 2    3            .506  .508  .512
kNN (PCA)                 0 vs 2    3            .551  .546  .549
rbSVM (PCA)               0 vs 2    3            .505  .500  .501
ANN (PCA)                 0 vs 2    3            .487  .479  .481
lSVM (LDA)                0 vs 2    1            .510  .506  .510
LR (LDA)                  0 vs 2    1            .512  .507  .511
kNN (LDA)                 0 vs 2    1            .499  .497  .498
rbSVM (LDA)               0 vs 2    1            .459  .458  .458
ANN (LDA)                 0 vs 2    1            .488  .477  .484

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM, LR = Logistic Regression, kNN = k-Nearest Neighbors, rbSVM = Radial Basis SVM, ANN = Artificial Neural Network


Table 4.6: Generalized Subject-Partitioned Ternary Classification

Algorithm (Featurespace)  Classes        Dimensions   ACC   F1    F2
lSVM                      0 vs 1 vs 2    32           .331  .322  .328
LR                        0 vs 1 vs 2    32           .333  .327  .328
kNN                       0 vs 1 vs 2    32           .363  .358  .360
rbSVM                     0 vs 1 vs 2    32           .355  .206  .260
ANN                       0 vs 1 vs 2    32           .359  .348  .351
lSVM (PCA)                0 vs 1 vs 2    3            .333  .326  .330
LR (PCA)                  0 vs 1 vs 2    3            .332  .326  .330
kNN (PCA)                 0 vs 1 vs 2    3            .394  .349  .351
rbSVM (PCA)               0 vs 1 vs 2    3            .378  .375  .375
ANN (PCA)                 0 vs 1 vs 2    3            .359  .312  .316
lSVM (LDA)                0 vs 1 vs 2    2            .325  .313  .320
LR (LDA)                  0 vs 1 vs 2    2            .333  .327  .330
kNN (LDA)                 0 vs 1 vs 2    2            .378  .367  .371
rbSVM (LDA)               0 vs 1 vs 2    2            .281  .270  .271
ANN (LDA)                 0 vs 1 vs 2    2            .333  .323  .328

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM, LR = Logistic Regression, kNN = k-Nearest Neighbors, rbSVM = Radial Basis SVM, ANN = Artificial Neural Network


significantly better than chance and therefore a subject-partitioned scheme,

where models are trained and tested with separate participants, is not recommended for

an automated driver cognitive workload monitoring system. Similar near-chance results

are also observed in a study by Lockheed Martin. [59]

4.6.3 Generalized Time-Partitioned Performance

In this grouping scheme, models are trained and tested on the data from the same

participants. It should be noted that using the 5-fold partitioning scheme ensures that

no data leakage between the training and validation sets takes place and a true out of

sample performance evaluation is presented. Evaluation results using the Accuracy, F1

and F2 metrics are provided for the original dataset as well as the dataset with the

proposed feature processing methodology.

32-Dimensional Featurespace

The complete set of Absolute, Relative and β-Relative features across the two frontal sites

are used for the evaluation. A comparison between the original and feature processed

datasets is provided in Tables 4.7 and 4.8.

The following results are achieved in the binary classification task between Low Work-

load and High Workload conditions:

1. rbSVM models achieve the best accuracy of .675 in the original 32-Featurespace.

By using the proposed feature-processed dataset, an improved .864 accuracy is

achieved using a single layer ANN (100 hidden nodes).

2. In the reduced featurespace using PCA with 3-features, no significant performance

differences are observed in the original dataset compared to the original 32-featurespace.

rbSVM achieves the best performance in the original dataset with an accuracy of

.671 (similar to .675 in the 32-featurespace). However, in the feature-processed


dataset, severe performance degradation takes place when compared to the feature-processed

dataset in the 32-featurespace. The best performance is achieved using kNN

with 10 nearest neighbours with an accuracy of .623. This is a stark reduction

from the .864 achieved in the 32-dimensional featurespace. PCA-based

dimensionality reduction is not recommended for the proposed feature-processing

methodology. However, dimensionality reduction using PCA in the original

space achieved the same performance as the 32-featurespace.

3. In the 1-dimensional LDA space, performance degradation was observed in both

datasets. The best accuracy of .579 was achieved using ANN in the original dataset.

In the proposed feature-processed dataset, kNN achieved the top accuracy of

.678; however, this is a significant degradation from the accuracy of .864 using all

32 features. LDA outperforms PCA-based dimensionality reduction in the feature-processed

dataset.

4. In general, across both datasets, more complex classifiers such as kNN, ANN and

rbSVM performed significantly better than the linear models of lSVM and Logistic

Regression. This points towards the datasets not being inherently linearly separable

and the requirement of more complex models for acceptable performance.

The following results are achieved for the ternary classification task:

1. Using all 32-features in the original dataset, an accuracy of .619 is achieved using

rbSVM. In the proposed feature-processed dataset, an accuracy of .790 is achieved

using ANN which is significantly superior to the performance of all other models. In

general, all models generated on the feature-processed dataset perform better

than those generated on the original dataset.

2. In the 3-dimensional PCA space, a slight performance degradation takes place in

the original dataset evaluation. Top performance in the PCA space is an accuracy


of .568 using an ANN model compared to .619 using all 32-features. In the proposed

feature processed dataset with PCA based dimensionality reduction, best accuracy

of .490 is achieved using ANN. This is a severe degradation compared to the .790

achieved using ANN in the 32-feature space. PCA is recommended for use in the

original dataset, however, severe performance degradation is observed in the feature

processed dataset.

3. In the 2 dimensional LDA space, a performance degradation takes place in both

datasets. Best performance of .445 is achieved using ANN in the original dataset.

This is a significant degradation from an accuracy of .619 using all 32 features

with the same dataset. In the proposed feature-processed dataset, a performance

degradation is also observed with the best performance of .540 achieved compared

to .790 using all 32 features. LDA outperforms PCA in the feature processed

dataset and is recommended if dimensionality reduction is essential for real-time

performance and training requirements.

4. Once again, non-linear classifiers outperform the linear classifiers, which suggests

that non-linearity is present in the underlying patterns of the dataset.

Absolute Power Featurespace

Table 4.9 showcases the performance of individual features in the feature-processed

datasets using individual Absolute Power Features. Due to the low dimensional space of

this feature-set, (2-dimensions across 2 sites) dimensionality-reduction is not performed

and only lSVM and kNN are analyzed for performance metrics. The following is observed:

1. High Frequency Absolute Bands β and γ exhibit greater discriminant behavior

across both binary and ternary classification tasks compared to lower frequency

bands.

2. kNN outperforms the Linear SVM classifier across both classification tasks.


Figure 4.17: Generalized Time-Partitioned ternary classification

Figure 4.18: Generalized Time-Partitioned binary classification


Table 4.7: Time-Partitioned Binary Classification using 32-Features

Algorithm     Classes   Original            Feature Processing
                        ACC   F1    F2      ACC   F1    F2
lSVM          0 vs 2    .543  .539  .595    .677  .676  .676
LR            0 vs 2    .542  .541  .541    .681  .681  .681
kNN           0 vs 2    .675  .675  .675    .797  .797  .797
rbSVM         0 vs 2    .659  .653  .654    .771  .726  .739
ANN           0 vs 2    .649  .649  .649    .864  .864  .864
lSVM(PCA)     0 vs 2    .542  .536  .538    .562  .546  .553
LR(PCA)       0 vs 2    .542  .542  .542    .564  .552  .557
kNN(PCA)      0 vs 2    .663  .663  .663    .623  .623  .633
rbSVM(PCA)    0 vs 2    .671  .671  .671    .619  .617  .617
ANN(PCA)      0 vs 2    .625  .624  .624    .592  .592  .592
lSVM(LDA)     0 vs 2    .541  .539  .539    .677  .678  .677
LR(LDA)       0 vs 2    .542  .541  .541    .678  .678  .677
kNN(LDA)      0 vs 2    .573  .572  .572    .678  .677  .678
rbSVM(LDA)    0 vs 2    .577  .572  .572    .677  .676  .676
ANN(LDA)      0 vs 2    .579  .578  .578    .671  .670  .670

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM, LR = Logistic Regression, kNN = k-Nearest Neighbors, rbSVM = Radial Basis Function SVM, ANN = Artificial Neural Network


Table 4.8: Time-Partitioned Ternary Classification using 32-Features

                             Original dataset       Feature-processed dataset
Algorithm     Classes        ACC    F1     F2       ACC    F1     F2
lSVM          0 vs 1 vs 2    .411   .404   .407     .533   .533   .533
LR            0 vs 1 vs 2    .412   .403   .407     .520   .520   .520
kNN           0 vs 1 vs 2    .589   .589   .589     .692   .692   .691
rbSVM         0 vs 1 vs 2    .619   .619   .618     .661   .634   .629
ANN           0 vs 1 vs 2    .612   .611   .612     .790   .790   .790
lSVM(PCA)     0 vs 1 vs 2    .367   .335   .350     .373   .372   .372
LR(PCA)       0 vs 1 vs 2    .362   .327   .343     .364   .364   .364
kNN(PCA)      0 vs 1 vs 2    .540   .540   .540     .465   .465   .465
rbSVM(PCA)    0 vs 1 vs 2    .560   .560   .560     .455   .456   .456
ANN(PCA)      0 vs 1 vs 2    .568   .568   .568     .490   .487   .487
lSVM(LDA)     0 vs 1 vs 2    .378   .367   .378     .533   .532   .532
LR(LDA)       0 vs 1 vs 2    .381   .373   .377     .532   .531   .531
kNN(LDA)      0 vs 1 vs 2    .376   .374   .378     .522   .522   .522
rbSVM(LDA)    0 vs 1 vs 2    .414   .12    .411     .540   .540   .540
ANN(LDA)      0 vs 1 vs 2    .445   .445   .445     .527   .527   .527

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM, LR = Logistic Regression, kNN = k-Nearest Neighbors, rbSVM = Radial Basis Function SVM, ANN = Artificial Neural Network


3. The Absolute γ Power band exhibits the best performance, with .436 accuracy for the ternary and .623 accuracy for the binary classification task (kNN, Table 4.9). Both results are clearly better than guess performance, which makes this band a strong candidate as an isolated feature.

Relative Power Featurespace


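Table 4.10 showcases the performance of individual features in the feature-processed datasets using individual Relative Power Features. Due to the low dimensional space of this feature-set (2 dimensions across 2 sites), dimensionality reduction is not performed and only lSVM and kNN are analyzed for performance metrics. The following is observed: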


1. No significant difference in performance is observed between the lower and higher frequency bands in the relative power space.

2. kNN and lSVM achieve similar performance across both classification tasks.

3. The Relative γ Power band exhibits best performance with .371 accuracy for ternary and .554 accuracy for binary classification tasks. This is around guess performance for both classification tasks.

4. Single Relative High Band features exhibit inferior performance compared to single Absolute High Band features.
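The distinction between the Absolute and Relative power featurespaces can be illustrated with a minimal sketch. The band edges, the direct-DFT periodogram and the function names below are assumptions made for illustration only; they are not the extraction pipeline of the eDREAM headset or of this study. Relative power is taken here as each band's absolute power divided by the sum over the five bands.

```python
import cmath
import math

# Assumed band edges in Hz (illustrative, not taken from the study).
BANDS = {
    "delta": (1, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 44),
}


def power_spectrum(signal, fs):
    """Unnormalized one-sided periodogram computed with a direct DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_powers(signal, fs):
    """Return (absolute, relative) band power dictionaries for one channel."""
    freqs, power = power_spectrum(signal, fs)
    absolute = {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(absolute.values())
    relative = {name: value / total for name, value in absolute.items()}
    return absolute, relative


# Synthetic 1 s window: a 10 Hz (alpha) tone plus a weaker 20 Hz (beta) tone.
fs = 128
signal = [
    math.sin(2 * math.pi * 10 * i / fs) + 0.5 * math.sin(2 * math.pi * 20 * i / fs)
    for i in range(fs)
]
absolute, relative = band_powers(signal, fs)
```

Because relative power divides out the total, it removes overall amplitude differences between recordings, which is one possible reason why a single relative band carries less discriminative information than a single absolute band.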

β-Relative Power Featurespace


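Table 4.11 showcases the performance of individual features in the feature-processed datasets using individual β-Relative Power Features. Due to the low dimensional space of this feature-set (2 dimensions across 2 sites), dimensionality reduction is not performed and only lSVM and kNN are analyzed for performance metrics. The following is observed: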



Table 4.9: Time-Partitioned Classification using Single Absolute Features

                              lSVM                   kNN
Feature        Classes        ACC    F1     F2       ACC    F1     F2
Absolute δ     0 vs 1 vs 2    .352   .353   .353     .352   .351   .352
Absolute θ     0 vs 1 vs 2    .345   .342   .341     .342   .341   .341
Absolute α     0 vs 1 vs 2    .362   .331   .332     .356   .356   .356
Absolute β     0 vs 1 vs 2    .380   .323   .355     .424   .423   .422
Absolute γ     0 vs 1 vs 2    .375   .309   .344     .436   .434   .437
Absolute βγ    0 vs 1 vs 2    .375   .359   .367     .372   .369   .370
Absolute δ     0 vs 2         .525   .522   .523     .525   .525   .525
Absolute θ     0 vs 2         .514   .512   .513     .512   .511   .511
Absolute α     0 vs 2         .540   .539   .541     .528   .527   .527
Absolute β     0 vs 2         .573   .572   .572     .599   .599   .599
Absolute γ     0 vs 2         .566   .564   .563     .623   .622   .621
Absolute βγ    0 vs 2         .562   .562   .562     .607   .607   .604

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM, kNN = k-Nearest Neighbors


Table 4.10: Time-Partitioned Classification using Single Relative Features

                              lSVM                   kNN
Feature        Classes        ACC    F1     F2       ACC    F1     F2
Relative δ     0 vs 1 vs 2    .341   .300   .316     .354   .354   .355
Relative θ     0 vs 1 vs 2    .362   .323   .341     .357   .357   .357
Relative α     0 vs 1 vs 2    .340   .305   .319     .354   .354   .354
Relative β     0 vs 1 vs 2    .335   .306   .318     .346   .347   .345
Relative γ     0 vs 1 vs 2    .356   .330   .340     .366   .367   .366
Relative βγ    0 vs 1 vs 2    .343   .342   .342     .371   .370   .372
Relative δ     0 vs 2         .513   .492   .491     .539   .538   .538
Relative θ     0 vs 2         .537   .525   .529     .533   .534   .533
Relative α     0 vs 2         .521   .459   .484     .514   .514   .514
Relative β     0 vs 2         .517   .512   .514     .526   .525   .525
Relative γ     0 vs 2         .537   .526   .530     .554   .554   .554
Relative βγ    0 vs 2         .525   .526   .522     .542   .541   .544




Table 4.11: Time-Partitioned Classification using Single Relative Features

                              lSVM                   kNN
Feature        Classes        ACC    F1     F2       ACC    F1     F2
(θ+α)/β        0 vs 1 vs 2    .342   .344   .346     .355   .355   .356
α/β            0 vs 1 vs 2    .355   .354   .355     .355   .356   .358
(θ+α)/(α+β)    0 vs 1 vs 2    .347   .348   .349     .361   .364   .365
θ/β            0 vs 1 vs 2    .354   .352   .352     .363   .362   .367
(θ+α)/β        0 vs 1         .530   .524   .526     .548   .548   .547
α/β            0 vs 1         .528   .521   .523     .541   .541   .541
(θ+α)/(α+β)    0 vs 1         .535   .532   .533     .541   .542   .541
θ/β            0 vs 1         .534   .526   .528     .544   .544   .544



1. No significant difference in performance is observed between any of the β-Relative power bands, and around guess performance is achieved using any of the individual features.

2. kNN and lSVM achieve similar performance across both classification tasks.

3. (θ + α) / β exhibits the best performance for the binary classification task (.548), while θ / β exhibits the best performance for the ternary task (.363). This is around guess performance for both classification tasks.

4. Single β-Relative features exhibit inferior performance compared to single Absolute High Band features.
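For reference, the β-Relative indices discussed above can be written as simple ratios of band powers. The sketch below assumes that the four indices are (θ+α)/β, α/β, (θ+α)/(α+β) and θ/β, computed per site from already extracted band powers; the function name and the input values are hypothetical.

```python
def beta_relative_indices(theta, alpha, beta):
    """Ratio indices built from theta, alpha and beta band power at one site."""
    return {
        "(theta+alpha)/beta": (theta + alpha) / beta,
        "alpha/beta": alpha / beta,
        "(theta+alpha)/(alpha+beta)": (theta + alpha) / (alpha + beta),
        "theta/beta": theta / beta,
    }


# Hypothetical band powers for one window at one frontal site.
indices = beta_relative_indices(theta=2.0, alpha=4.0, beta=8.0)
```

Computing the four indices at each of the two frontal sites gives the 8-dimensional β-Relative featurespace used in the grouped analysis.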


Figure 4.19: Generalized Time-Partitioned ternary classification (Individual Features)

Figure 4.20: Generalized Time-Partitioned binary classification (Individual Features)


4.6.4 Grouped Power Spectral Sub-Features

Table 4.12 and Table 4.13 show the group-wise comparison between the Absolute, Relative and β-Relative datasets after the proposed feature-processing mechanism. The Absolute and Relative featurespaces each have a dimensionality of 12 across the two frontal sites, while the β-Relative featurespace has a dimensionality of 8. Each of these datasets, generated by applying the proposed feature-processing mechanism to the specified featurespace, undergoes dimensionality reduction via PCA and LDA, and sub-datasets in the PCA and LDA spaces are generated. The resulting 9 datasets are compared across the two classification tasks to identify the best performing featurespace.

The following discussion is presented for the Ternary Classification task:

1. In the original featurespace, Relative Power bands achieve the best performance

at an accuracy of .565 using rbSVM models. This is followed by the β-Relative

bands at an accuracy of .523 using ANN. In general, the relative band features-

paces outperform the absolute band featurespace. Non-linear classifiers significantly

outperform the linear classifiers across all three featurespaces.

2. In the PCA space with 3 dimensions explaining .95 of the variance, the Relative featurespace outperforms both the Absolute and β-Relative featurespaces with a best accuracy of .529 using rbSVM models. This is a slight decrease in performance from .565 when using all 12 features. Across all 3 featurespaces, PCA-induced dimensionality reduction leads to a reduction in performance. The Absolute Power featurespace is the worst performing featurespace.

3. In the LDA space with 2-dimensions, the β-Relative featurespace achieves the best

performance accuracy of .423 using kNN with 5 nearest neighbors. The Relative

featurespace achieves very similar performance accuracy of .420. Absolute fea-

turespace achieves the worst performance as is the case in all featurespaces. The

performance of the LDA datasets is significantly worse than the performance when


using the PCA or original featurespace datasets. This points towards the inherent

non-linearity present in the classification tasks for all datasets.

4. ANN and rbSVM models outperform the lSVM and kNN models across all datasets.

5. All of the generated datasets produce significantly better than guess performance across all featurespaces using linear or complex classification models.

6. Although the Relative sub-featurespace achieves the best performance (.565) when compared to the other sub-featurespaces, it should be noted that, using the combined 32-Feature space, a best accuracy of .790 is achieved using ANN. However, applying dimensionality reduction to the original 32-featurespace achieves inferior performance compared to applying dimensionality reduction to the reduced featurespaces.
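The choice of a 3-dimensional PCA space "explaining .95 of variance" corresponds to keeping the smallest number of principal components whose cumulative explained-variance ratio reaches the target. A minimal sketch of that selection rule is given below; the eigenvalues are hypothetical and are not taken from the datasets of this study.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share >= target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical covariance eigenvalues of a 5-dimensional featurespace.
eigenvalues = [6.0, 3.0, 2.5, 0.3, 0.2]
n_components = components_for_variance(eigenvalues, target=0.95)
```

With these illustrative values, the first three components carry 11.5 of the 12.0 units of total variance, so three dimensions are retained.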

The following discussion is presented for the Binary Classification task:

1. In the original featurespace, Relative Power bands achieve the best performance at an accuracy of .683 using rbSVM models. This is followed by the β-Relative bands at an accuracy of .672 using rbSVM (Table 4.13). In general, the relative band featurespaces outperform the absolute band featurespace. However, there is no significant difference in performance between the Absolute, Relative and β-Relative featurespaces.

2. In the PCA space with 3 dimensions explaining .95 of the variance, the Relative featurespace outperforms both the Absolute and β-Relative featurespaces with a best accuracy of .671 using rbSVM models. This is a slight decrease in performance from .683 when using all 12 features. Across all 3 featurespaces, PCA-induced dimensionality reduction causes only an insignificant reduction in performance.


Figure 4.21: Generalized Time-Partitioned ternary classification (Sub-Features)

3. In the LDA space with 2 dimensions, the Relative featurespace achieves the best accuracy of .621 using ANN (Table 4.13). The Absolute featurespace achieves the worst performance with an accuracy of .583. The performance of the LDA datasets is significantly worse than the performance when using the PCA or original featurespace datasets. This points towards the inherent non-linearity present in the classification tasks for all datasets.

4. kNN, ANN and rbSVM models outperform the lSVM models across all datasets.

5. All of the generated datasets produce significantly better than guess performance across all featurespaces using linear or complex classification models.

6. Although the Relative sub-featurespace achieves the best performance (.683) when compared to the other sub-featurespaces, it should be noted that, using the combined 32-Feature space, a best accuracy of .864 is achieved using ANN. However, applying dimensionality reduction to the original 32-featurespace achieves inferior performance compared to applying dimensionality reduction to the reduced featurespaces.


Table 4.12: Time-Partitioned Ternary Classification using Sub-Features

                             Absolute               Relative               β-Relative
Algorithm     Classes        ACC    F1     F2       ACC    F1     F2       ACC    F1     F2
lSVM          0 vs 1 vs 2    .407   .405   .406     .420   .418   .419     .391   .391   .391
kNN           0 vs 1 vs 2    .496   .495   .495     .550   .550   .550     .444   .444   .444
rbSVM         0 vs 1 vs 2    .498   .498   .498     .565   .565   .564     .501   .502   .503
ANN           0 vs 1 vs 2    .502   .498   .503     .558   .558   .558     .523   .528   .528
lSVM(PCA)     0 vs 1 vs 2    .389   .387   .389     .374   .369   .371     .363   .344   .353
kNN(PCA)      0 vs 1 vs 2    .438   .438   .438     .530   .529   .531     .480   .479   .479
rbSVM(PCA)    0 vs 1 vs 2    .438   .440   .438     .539   .538   .538     .464   .464   .465
ANN(PCA)      0 vs 1 vs 2    .440   .438   .440     .512   .512   .512     .449   .448   .448
lSVM(LDA)     0 vs 1 vs 2    .400   .398   .399     .418   .414   .415     .394   .376   .384
kNN(LDA)      0 vs 1 vs 2    .388   .388   .388     .402   .401   .401     .423   .425   .425
rbSVM(LDA)    0 vs 1 vs 2    .409   .406   .407     .417   .415   .413     .419   .418   .418
ANN(LDA)      0 vs 1 vs 2    .411   .406   .411     .420   .417   .417     .421   .417   .418

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, kNN = k-Nearest Neighbors, rbSVM = Radial Basis Function SVM, ANN = Artificial Neural Network, lSVM = Linear SVM


Table 4.13: Time-Partitioned Binary Classification using Sub-Features

                          Absolute               Relative               β-Relative
Algorithm     Classes     ACC    F1     F2       ACC    F1     F2       ACC    F1     F2
lSVM          0 vs 2      .550   .550   .550     .598   .593   .594     .602   .597   .598
kNN           0 vs 2      .643   .643   .643     .670   .670   .670     .606   .606   .606
rbSVM         0 vs 2      .652   .650   .650     .683   .683   .683     .672   .671   .670
ANN           0 vs 2      .645   .645   .645     .657   .656   .656     .671   .671   .671
lSVM(PCA)     0 vs 2      .562   .562   .562     .586   .582   .583     .553   .547   .549
kNN(PCA)      0 vs 2      .637   .637   .637     .671   .671   .671     .638   .638   .638
rbSVM(PCA)    0 vs 2      .584   .583   .583     .672   .671   .671     .599   .597   .598
ANN(PCA)      0 vs 2      .621   .620   .620     .665   .663   .663     .607   .589   .596
lSVM(LDA)     0 vs 2      .552   .550   .551     .593   .584   .586     .602   .598   .599
kNN(LDA)      0 vs 2      .546   .544   .544     .585   .581   .582     .585   .581   .582
rbSVM(LDA)    0 vs 2      .556   .551   .552     .589   .581   .583     .565   .571   .568
ANN(LDA)      0 vs 2      .583   .576   .578     .621   .611   .614     .581   .577   .578

Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, kNN = k-Nearest Neighbors, rbSVM = Radial Basis Function SVM, ANN = Artificial Neural Network, lSVM = Linear SVM
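The ACC, F1 and F2 columns reported in the tables can be related to one another with a short sketch. The F-beta score is the weighted harmonic mean of precision and recall, with beta = 1 giving F1 and beta = 2 giving F2 (recall weighted more heavily). The macro-averaging over classes and the toy labels below are assumptions for illustration; the averaging scheme used to produce the tables is not restated here.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fbeta_score(y_true, y_pred, positive, beta=1.0):
    """F-beta score with one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_fbeta(y_true, y_pred, beta=1.0):
    """Unweighted mean of the per-class F-beta scores (assumed averaging)."""
    classes = sorted(set(y_true))
    return sum(fbeta_score(y_true, y_pred, c, beta) for c in classes) / len(classes)


# Toy binary task (0-back vs 2-back) with one misclassified 0-back sample.
y_true = [0, 0, 0, 2, 2, 2]
y_pred = [0, 0, 2, 2, 2, 2]
acc = accuracy(y_true, y_pred)
f1_class2 = fbeta_score(y_true, y_pred, positive=2, beta=1.0)
f2_class2 = fbeta_score(y_true, y_pred, positive=2, beta=2.0)
```

For roughly balanced classes and symmetric errors the three metrics are nearly equal, which is consistent with most rows of the tables; rows where they diverge indicate a classifier that favours one class.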


Figure 4.22: Generalized Time-Partitioned binary classification (Sub-Features)

4.7 Discussion

A variety of featurespaces and datasets have been analyzed and the results are now

summarized and used to answer: What are the optimal feature(s) or featurespace(s) that

allow for the maximal discriminant behavior between the three cognitive loads?

Individualized Subject-Specific Performance

Linear SVM is able to satisfactorily discriminate in the original and feature-processed

individual datasets. This finding points towards an inherent linear separability when

individual participants are trained and tested on their own data. In the analysis, the

combined featurespace (absolute, relative, β-relative) as well as the grouped sub-spaces

are compared.

1. By applying Feature Processing to the datasets, 1.0 accuracy is achieved for both tasks when using either all 32 dimensions or the 1-dimensional LDA space. This exceeds the performance obtained on the original datasets.

2. A downside of the feature processing methodology for individualized datasets is the requirement for a calibration procedure before each drive, since the standardization and normalization statistics need to be computed for each drive and then fed into the trained model for inference. This is a design tradeoff that is up to the system designer to accept if the distinctly high performance is desired.

3. While dimensionality reduction via LDA into 1 or 2 dimensions performs as well as the 32-dimensional space, the Absolute Power Bands featurespace also achieves .900 accuracy. An advantage of this approach in a real-time monitoring scenario is that no real-time transformation from a multi-dimensional to a low-dimensional space via LDA is required. Absolute Power bands showcased the best performance among the three sub-featuresets, followed by Relative Power Bands and lastly the β-Relative Features.

Top Featurespaces: 32-dimensional Space, LDA space applied to 32-dimensional

space, 12-Absolute Power Featurespace

Top Learning Algorithm: Simple Linear Classifiers sufficient. Tested on lSVM
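The calibration requirement described above can be sketched as follows. It is assumed here, for illustration only, that the stored statistics of a feature are its mean, standard deviation, minimum and maximum over the calibration recording; the function names are hypothetical and the way this study chains the standardization and normalization steps is not reproduced.

```python
import statistics


def fit_calibration(history):
    """Compute the per-feature statistics from a driver's calibration data."""
    return {
        "mean": statistics.mean(history),
        "std": statistics.pstdev(history),
        "min": min(history),
        "max": max(history),
    }


def standardize(value, stats):
    """Z-score a new feature value using the stored mean and deviation."""
    return (value - stats["mean"]) / stats["std"]


def normalize(value, stats):
    """Min-max scale a new feature value using the stored range."""
    return (value - stats["min"]) / (stats["max"] - stats["min"])


# Hypothetical calibration values of one feature for one driver.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = fit_calibration(history)
z = standardize(5.0, stats)
scaled = normalize(2.0, stats)
```

At inference time only the stored statistics are needed, but they must exist before the first prediction, which is the source of the calibration step before each drive.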

Generalized Subject-Partitioned Classification

The Subject-Partitioned scheme, where participants are tested on models generated from other participants, is the most challenging to model. Around guess performance is achieved for all featurespaces, with or without feature processing and dimensionality reduction techniques. A subject-partitioned scheme is perhaps the most practical system design methodology, as prediction models may be pre-trained on anonymous subjects and a new user may experience good out-of-the-box performance. However, with a low channel EEG sensor such as the Muse, this is not a feasible solution since only around guess performance is achieved.

Top Featurespaces (in order of ranking): PCA Applied to 32 Dimensional Space

Top Learning Algorithm: kNN with 50 nearest neighbours
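The difference between the subject-partitioned and time-partitioned schemes can be made concrete with a small sketch. The sample layout (a dictionary with "subject" and "t" keys), the chronological cut within each subject and the 80/20 fraction are assumptions for illustration; they are not the exact partitioning code or ratios used in this study.

```python
def subject_partitioned_split(samples, test_subjects):
    """Hold out whole subjects: no subject appears in both train and test."""
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


def time_partitioned_split(samples, train_fraction=0.8):
    """Split each subject's samples by time: every subject is in both sets."""
    by_subject = {}
    for s in samples:
        by_subject.setdefault(s["subject"], []).append(s)
    train, test = [], []
    for rows in by_subject.values():
        rows = sorted(rows, key=lambda r: r["t"])
        cut = int(len(rows) * train_fraction)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Three hypothetical subjects with ten time-ordered samples each.
samples = [{"subject": s, "t": t} for s in ("A", "B", "C") for t in range(10)]
subj_train, subj_test = subject_partitioned_split(samples, test_subjects={"C"})
time_train, time_test = time_partitioned_split(samples, train_fraction=0.8)
```

In the first split the model never sees data from the test subject, so inter-individual differences directly limit performance; in the second split each test subject has contributed earlier data to training.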


Generalized Time-Partitioned Classification

The following list identifies the top features and featurespaces in the time-partitioned classification task. The results are comparable to prior studies, and it can be deduced that, by training and testing on the same participants, models can be practically used towards automated driver cognitive workload modeling.

1. In the original dataset with 32 features, kNN, rbSVM and ANN all outperform the Linear SVM and Logistic Regression models.

Top Featurespaces: 32-Features, 3-Dimensional PCA

Top Learning Algorithms: ANN, PCA, kNN10

2. The Feature Processed dataset achieves .86 and .79 accuracy for the binary and ternary classification tasks. This is an improvement in accuracy of approximately .19 and .17, respectively, over the best results on the original dataset (.675 and .619 in Tables 4.7 and 4.8).

Top Featurespaces: 32-Features

Top Learning Algorithms: ANN

3. Analyzing the results of the Absolute, Relative and β-Relative Sub-Featurespaces, the Relative Power Bands outperform the other sub-featurespaces.

Top Featurespaces: Relative Power (Original, PCA and LDA)

Top Learning Algorithms: ANN, kNN10, rbSVM

4. The top 5 individual features are listed in order of ranking.

Top Featurespaces: Absolute γ, Absolute β, Absolute βγ, Relative γ, Relative β, α, αθ

Top Learning Algorithms: kNN


4.8 Chapter Summary

In this chapter, the experiment and results associated with generating statistical learning models to estimate driver cognitive workload are presented. First, the experiment methodology is outlined, followed by an explanation of how the various datasets are generated by adapting the featurespaces and applying Feature Processing. The Feature Processing methodology, consisting of a sliding window, standardization and normalization process, is also described. Next, the machine learning algorithms are described and the performance evaluation metrics are established. Finally, individualized and generalized subject-partitioned and time-partitioned datasets are evaluated and a myriad of simulation results are presented to identify the top feature(s) and featurespace(s). Individual High Frequency Band features (β and γ) show superior performance in discriminating between the three workloads, the Relative Power sub-feature-set showcases the best performance as a sub-group, and the overall best performance is achieved by combining the sub-featurespaces and modeling with ANN. Individual models generate the best results using a simple linear classifier, followed by the Time-Partitioned models using non-linear classifiers. Subject-Partitioned models are very difficult to model.

Chapter 5

Conclusion

The goal of this study is the estimation of Driver Cognitive Workload using a Low Channel EEG modality. To that end, the eDREAM dataset, consisting of labeled EEG data from a Low Channel wireless headset, is used as the system modality, and a system design consisting of Feature Extraction, Feature Processing and Classification is implemented. Through the generation of a system pipeline, this study showcases the details and technicalities of each stage and its effect on the statistical learning performance achieved when estimating the driver cognitive workload. As a result, the top feature(s) and featureset(s) are identified as the top candidates for this classification challenge. It is shown that, using a Low Channel EEG modality, it is possible to use the biophysiological responses of the Pre-Frontal Cortex to capture, process and extract features that are able to successfully discriminate between the granular cognitive workloads induced via the n-back task.

In Chapter 2, attention is paid to the modeling of the cognitive workload and an overview is given of the features present in the eDREAM EEG modality. The Feature Extraction process and the generation of the Spectral Power Features are described in detail. A top-level statistical summary of the data is presented and the feature trends across varying cognitive loads are observed. Dimensionality Reduction and its advantages for visualization and for performance in Machine Learning systems are discussed. PCA and LDA are used as the dimensionality reduction techniques in this work. Chapter 3 reviews the relevant works used for understanding the area, and two state-of-the-art studies are followed closely in this study. Finally, a data-mining approach of feature selection based on classification performance across a breadth of featurespaces is performed in Chapter 4. Chapter 4 generates multi-dimensional individualized and generalized datasets across 28 subjects to carry out the feature selection process. A feature processing methodology is implemented using windowing, standardization and normalization schemes. The empirical advantages of such a scheme are demonstrated across individualized and time-partitioned datasets. Finally, datasets from individualized, time-partitioned and subject-partitioned schemes are generated and evaluated using a k-Fold cross validation methodology, with the goal of identifying the top featurespace(s) and learning algorithms for the task. A summary of the top features and the practical implications are discussed.

5.1 Summary of Contributions

This pilot study explores the eDREAM dataset’s EEG modality towards building an

automated driver cognitive workload monitoring system. The sensory and ancillary in-

formation is explored and recommendations are made towards the best methodologies to

use the available data. Generation of additional features from the provided information

is demonstrated. Featurespaces used in literature are generated for this study and an

approach to learn from the myriad of featurespaces is demonstrated.

The primary goal of this study is to understand if the eDREAM EEG modality can be

used for discriminating between Cognitive Workload levels as induced by the secondary

n-back task. This is demonstrated via a bottom-up Data Mining Feature Selection pro-

cess where a breadth of datasets with varying featuresets are generated for both binary


and ternary classification tasks. Models generated for individual, grouped and permu-

tations with dimensionality reduction techniques of LDA and PCA are demonstrated

and evaluated. In this dataset, it is shown that β and γ Absolute and Relative indi-

vidual features showcase the highest discriminant behavior in the single feature domain.

The Relative Grouped Bands showcase the highest discriminant behavior in the grouped

domain. Best overall performance is achieved when all 32 Features are used and an ac-

ceptable performance for practical considerations is also achieved when PCA is applied

to this featurespace. Top performance of .79 for the ternary classification task and .86

for the binary classification is obtained when using all 32-features in a feature-processed

(standardized and normalized) dataset.

Another contribution is towards an in-depth comparison performed when using stan-

dardization / normalization techniques (and thereby using subject historical data) to-

wards reducing the inter-subject differences between the participants to improve model

generalization. A significant improvement in performance is observed when the partici-

pants undergo a Feature Processing stage where their historical data is used for calibration

and standardization purposes. The practical implications of using a standardization process are also discussed, and it is left to the system designer to decide between higher performance and user experience. Through this process, one of the goals of this study, generating generalized models across participants, is achieved when using time-partitioned models. A myriad of data partitioning techniques are also evaluated.

In a subject-partitioned dataset, training and model generation is performed on one set of participants and testing on a disjoint set of participants. Around guess performance is observed, which shows that significant inter-individual differences exist and that, in order to achieve high performance, participants need to be part of both the training and testing stages. In the time-partitioned validation methodology, the same participants are part of both the training and testing stages and much improved performance is achieved.

Finally, an analysis of individualized models with and without feature-processing is performed and the performance advantages of using a feature-processed model are discussed. Best performance, as expected, is achieved in individualized feature-processed datasets.

5.2 Future Works

5.2.1 Improvements to Current Work

1. Due to the availability of data from all users, the individual performance of all users was observed and it was possible to identify users who showcased outlier performance. In some cases, in order to create stable models, these users were excluded from the model generation process. This can be considered a form of data snooping. In order to build generalized models in a practical scenario, it is essential that outlier participants are studied and used as part of the evaluation, since such users will inevitably be present in the population.

2. By using standardization and normalization, four pieces of historical information are required to be stored, and therefore a calibration process is required before each drive to compute these four parameters. This can be a detriment in driving scenarios where the user may want to step in and start driving without undergoing a controlled calibration process. In essence, the requirement for the historical data of a driver may pose challenges in the real world.

3. A similar challenge exists for dimensionality reduction techniques, where PCA and LDA coefficients need to be stored in order to be applied to the unseen test data. The decision to use either group or individualized PCA/LDA coefficients poses challenges in the real world, and the evaluation performance of both methodologies should be investigated. This study stores a set of group PCA/LDA coefficients for evaluation, which are applied to each test subject individually.

4. The effect of using differing sliding window schemes should be evaluated. In this study, sliding windows of size 1-3 seconds are evaluated and, due to the short 75 s n-back regions, larger windows were not evaluated. It would be beneficial to study the performance as window sizes are increased.
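The constraint that the 75 s n-back regions place on the window size can be seen from a simple count of the windows available per region. The sketch below assumes a hypothetical feature rate of 10 samples per second and a 1 s step between windows; neither value is taken from this study.

```python
def sliding_windows(n_samples, fs, window_s, step_s):
    """Return (start, end) sample indices of every full window in a region."""
    size = int(round(window_s * fs))
    step = int(round(step_s * fs))
    return [(start, start + size) for start in range(0, n_samples - size + 1, step)]


fs = 10                      # assumed feature rate in samples per second
region = 75 * fs             # one 75 s n-back region
windows_3s = sliding_windows(region, fs, window_s=3, step_s=1)
windows_30s = sliding_windows(region, fs, window_s=30, step_s=1)
```

Under these assumptions a 3 s window yields 73 training examples per region, while a 30 s window yields only 46 heavily overlapping ones, so longer windows trade the number of examples for smoother features.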

5.2.2 Extension of Work

1. Phase information generated from the FFT can be evaluated in addition to the amplitude response. More generally, more comprehensive temporal-spectral extraction techniques such as the Wavelet Packet Transform and Empirical Mode Decomposition can be applied, as in the study by Li et al. [36].

2. Deep Learning methods have become very popular in recent years, and a richer set of features can be learned iteratively using deep networks such as Recurrent Neural Networks or Stacked Denoising AutoEncoder networks, as performed by Yin et al. [58].

3. Fusion techniques that add features from other modalities should also be investigated. The eDREAM dataset contains a myriad of modalities such as video, eye-tracking and EEG, on which fusion techniques can be implemented.

4. The availability of labeled gender and age information may also be used to investigate the effect of gender and age on model performance. This can allow for a higher-level human-factors study of the effect of age and gender on model generation.
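As an illustration of the first extension, the amplitude and the phase at a given frequency bin are both available from the same complex DFT coefficient. The sketch below uses a direct DFT on a synthetic cosine; the signal, its length and the function names are hypothetical, and a practical implementation would use an FFT routine instead.

```python
import cmath
import math


def dft_bin(signal, k):
    """Complex DFT coefficient of the signal at bin k."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))


def amplitude_and_phase(signal, k):
    """Amplitude and phase (radians) of bin k, valid for 0 < k < n/2."""
    coeff = dft_bin(signal, k)
    return 2 * abs(coeff) / len(signal), cmath.phase(coeff)


# Synthetic test signal: unit-amplitude cosine in bin 5 with a pi/4 phase offset.
n = 64
signal = [math.cos(2 * math.pi * 5 * i / n + math.pi / 4) for i in range(n)]
amplitude, phase = amplitude_and_phase(signal, k=5)
```

The spectral power features used in this work keep only the magnitude of such coefficients; the phase returned here is the additional information that the proposed extension would exploit.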

Bibliography

[1] “Drowsy driving and automobile crashes: Report and recommendation,” National

Heart, Lung, and Blood Institute, Tech. Rep., 1998.

[2] “National motor vehicle crash causation survey,” US Department of Transportation,

Tech. Rep., 2008.

[3] Learning from Data - A Short Course. AMLBook, 2012.

[4] “Combining eeg with pupilometry to improve cognitive workload detection,” Phys-

iological Computing, 2015.

[5] “Critical reasons for crashes investigated in the national motor vehicle crash causa-

tion survey,” US Department of Transportation, Tech. Rep., 2015.

[6] (2017) edream dataset. [Online]. Available: http://www.dsp.utoronto.ca/projects/eDREAM/

[7] (2017) Emotiv 5-channel eeg headset. [Online]. Available: http://www.emotiv.com/

[8] (2017) Mobita 32-channel eeg headset. [Online]. Available: http://www.biopac.com/

[9] (2017) Muse 4-channel eeg headset. [Online]. Available: http://dev.choosemuse.com/tools/available-data

[10] (2018) Tesla and gm self-drive cars involved in road collisions. [Online]. Available:

http://www.bbc.com/news/technology-42801772


[11] K. S. A. Sahayadhas and M. Murugappan, “Detecting driver drowsiness based on

sensors: A review,” Sensors, 2012.

[12] A. S. Aghaei, B. Donmez, C. C. Liu, D. He, G. Liu, K. N. Plataniotis, H.-Y. W.

Chen, and Z. Sojoudi, “Smart driver monitoring: When signal processing meets

human factors: In the driver’s seat,” IEEE Signal Processing Magazine, vol. 33,

no. 6, pp. 35–48, 2016.

[13] H. Almahasneh, W. Chooi, N. Kamel, and A. Malik, “Deep in thought while driving:

An eeg study on driver cognitive distraction,” Transportation Research, vol. 26, pp.

218–226, 2014.

[14] C. Berka, D. Lavendowski, M. Limicao, A. Yau, G. Davis, V. Zivkovic, R. Olmstead,

D. Tremoulet, and P. Craven, “Eeg correlates of task engagement and mental work-

load in vigilance, learning and memory tasks,” Aviation, Space and Environmental

Medicine, vol. 78, pp. 234–244, 2007.

[15] A. Berkivich-Ohana, J. Glicksohn, and A. Goldstein, “Mindfulness-induced changes in gamma band activity - implications for the default mode network, self-reference and attention,” Clinical Neurophysiology, 2011.

[16] B. Borghetti and C. Rusnock, “Introduction to real-time state assessment,” Air

Force Institute of Technology, USA, Tech. Rep., 2016.

[17] G. Borghini, L. Astolfi, G. Vecchiato, D. Mattie, and F. Babiloni, “Measuring neu-

rophysiological signals in aircraft pilots and car drivers for the assessment of mental

workload, fatigue and drowsiness,” Neuroscience and Biobehavioral Reviews, vol. 44,

pp. 58–75, 2012.

[18] A. Campagne, T. Pebayle, and A. Muzet, “Correlation between driving errors and

vigilance level: influence of the driver’s age,” Physiology and Behavior, vol. 80, pp.

512 – 524, 2004.


[19] S. Chandra and S. Sharma, “Workload regulation by sudarshan kriya: an eeg and ecg perspective,” vol. 4, pp. 13–25, 2017.

[20] C.-C. Chang and C.-J. Lin, “Libsvm: a library for support vector machines,” ACM

Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27,

2011.

[21] D. T. D. Yadron. (2016) Tesla driver dies in first fatal crash while using autopilot mode. [Online]. Available: https://www.theguardian.com/technology/2016/jun/30/tesla-autopilot-death-self-driving-car-elon-musk

[22] Y. Dong, Z. Hu, K. Uchimura, and N. Murayama, “Driver inattention monitoring

system for intelligent vehicles: A review,” IEEE Transactions on Intelligent Trans-

portation Systems, vol. 12, no. 2, pp. 596–614, 2011.

[23] A. Gevins and M. Smith, “Monitoring working memory load during computer-based

tasks with eeg pattern recognition methods,” SAM Technology and EEG Systems

Laboratory, Tech. Rep., 1998.

[24] M. Gevins, A Smith, “Monitoring working memory load during computer-based

tasks with eeg pattern recognition methods,” Human factors and Ergonomics Soci-

ety, vol. 40, no. 1, pp. 78–91, 1998.

[25] M. Gillberg, G. Kercklund, and T. Akerstedt, “Sleepiness and performance of professional drivers in a truck simulator - comparisons between day and night driving,” Journal of Sleep Research, vol. 6, pp. 12–15, 1996.

[26] J. L. Harbluk, Y. I. Noy, P. L. Trbovich, and M. Eizenman, “An on-road assessment

of cognitive distraction: Impacts on drivers’ visual behavior and braking perfor-

mance,” Accident Analysis & Prevention, vol. 39, no. 2, pp. 372–379, 2007.


[27] T. Harmony, “The functional significance of delta oscillations in cognitive process-

ing,” Frontiers in Integrative Neuroscience, 2013.

[28] T. Harmony and T. Fernandez, “Eeg delta activity: an indicator of attention to

internal processing during performance of mental tasks,” International Journal of

Psychophysiology, 1996.

[29] S. Hart, “Nasa task load index (tlx),” NASA, Tech. Rep., 1986.

[30] J. Son, M. Park, and H. Oh, “Sensitivity of multiple cognitive workload measures: A field

study considering environmental factors,” Daegu Gyeongbuk Institute of Science

and Technology, Daegu, South Korea, Tech. Rep., 2014.

[31] B. Jap, S. Lal, P. Fischer, and E. Bekiaris, “Using eeg spectral components to assess

algorithms for detecting fatigue,” Expert Systems with Applications, vol. 36, pp.

2352–2359, 2009.

[32] H.-B. Kang, “Various approaches for driver and driving behavior monitoring: A

review,” in Proceedings of the IEEE International Conference on Computer Vision

Workshops, 2013, pp. 616–623.

[33] R. Khushaba, S. Lal, and G. Dissanayake, “Driver drowsiness classification using fuzzy

wavelet-packet-based feature-extraction algorithm,” IEEE Transactions on Biomedical

Engineering, vol. 58, pp. 121–131, 2011.

[34] S. Lal and A. Craig, “Driver fatigue: Eeg and psychological assessment,”

Psychophysiology, vol. 39, pp. 313–321, 2002.

[35] S. Lei and M. Roetting, “Influence of task combination on eeg spectrum modulation

for driver workload estimation,” Berlin Institute of Technology, Tech. Rep., 2011.

[36] D. Li, W. Pedrycz, and N. Pizzi, “Fuzzy wavelet packet based feature extraction

method and its applications to biomedical signal classification,” IEEE Transactions

on Biomedical Engineering, vol. 52, pp. 1132–1139, 2005.

[37] Y. Liang, M. Reyes, and J. Lee, “Real-time detection of driver cognitive distraction

using svm,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, pp.

341–350, 2007.

[38] F. Lin, L. Ko, C. Chuang, T. Su, and C. Lin, “Generalized eeg-based drowsiness

prediction system by using a self-organizing neural fuzzy system,” IEEE Transactions

on Circuits and Systems, vol. 59, pp. 2044–2055, 2012.

[39] C. C. Liu, “Towards practical driver cognitive load detection based on visual atten-

tion information,” 2017.

[40] H. D. Liu, Cheng Chen, B. Donmez, and K. N. Plataniotis, “edream data collection

report,” Tech. Rep., 2016.

[41] M. Lundqvist, P. Herman, and A. Lansner, “Theta and gamma power increases

and alpha/beta power decreases with memory load in an attractor network model,”

Journal of Cognitive Neuroscience, vol. 23, pp. 3008–3020, 2011.

[42] B. Mehler, B. Reimer, and J. Dusek, “Mit agelab delayed digit recall

task (n-back),” Massachusetts Institute of Technology, Cambridge, MA, Tech.

Rep. 2011-3B, 2011. [Online]. Available:

http://agelab.mit.edu/system/files/Mehler_et_al_n-back-white-paper_2011_B.pdf

[43] R. Mitchell. (2018) Tesla crash highlights a problem: When cars are

partly self-driving, humans don’t feel responsible. [Online]. Available:

http://www.latimes.com/business/autos/la-fi-hy-tesla-autopilot-20180125-story.html

[44] G. Nolfe, “Eeg and medicine,” Clinical Neurophysiology, vol. 123, pp. 631–632, 2012.

[45] K. Ryu and R. Myung, “Evaluation of mental workload with a combined measure

based on physiological indices during a dual task of tracking and mental arithmetic,”

International Journal of Industrial Ergonomics, vol. 35, pp. 991–1009, 2005.

[46] S. Liu and C. S. Nam, “Quantitative modeling of user performance in

multitasking environments,” 2018.

[47] M. Smith and A. Gevins, “Monitoring task loading with multivariate eeg mea-

sures during complex forms of human-computer interaction,” Human factors and

Ergonomics Society, vol. 43, no. 3, pp. 366–380, 2001.

[48] ——, “Monitoring task loading with multivariate eeg measures during complex forms

of human-computer interaction,” Brain Research Institute and SAM Technology, San

Francisco, California, Tech. Rep., 2001.

[49] A. Sonnleitner, M. Treder, M. Simon, S. Willmann, A. Ewald, A. Buchner, and

M. Schrauf, “Eeg alpha spindles and prolonged brake reaction times during auditory

distraction in an on-road driving study,” Accident Analysis & Prevention, vol. 62,

pp. 110–118, 2014.

[50] N. Sriraam, T. Padmashri, and U. Maheshwari, “Recognition of wake-sleep stage 1

multichannel eeg patterns using spectral entropy features for drowsiness detection,”

Australasian College of Physical Scientists and Engineers in Medicine, vol. 39, pp.

797–806, 2016.

[51] J. Sweller, “Cognitive load during problem solving: Effects on learning,” Cognitive

science, vol. 12, no. 2, pp. 257–285, 1988.

[52] T. Akerstedt, A. Anund, J. Axelsson, and G. Kecklund, “Subjective sleepiness is a sensitive indicator of

insufficient sleep and impaired waking function,” Journal of Sleep Research, vol. 23, pp.

12–58, 2014.

[53] S. Wang, J. Gwizdka, and W. Chaovalitwongse, “Using wireless eeg signals to as-

sess memory workload in the n-back task,” IEEE Transactions on Human-Machine

Systems, vol. 46, pp. 424–435, 2016.

[54] B. Xie and G. Salvendy, “Review and reappraisal of modelling and predicting mental

workload in single- and multitask environments,” Work and Stress, 2000.

[55] t. S. Y. Zheng, Y. Jie, “Workload functions distribution method: A workload

measurement based on pilot’s behaviors,” Shanghai Aircraft Airworthiness Certification

Center of CAAC, Shanghai, People’s Republic of China, Tech. Rep., 2016.

[56] V. Yeo, X. Li, K. Shen, and E. Wilder-Smith, “Can svm be used for automatic eeg

detection of drowsiness during car driving?” Safety Science, vol. 47, pp. 115–124,

2009.

[57] R. M. Yerkes and J. D. Dodson, “The relation of strength of stimulus to rapidity of

habit-formation,” Journal of Comparative Neurology and Psychology, vol. 18, no. 5,

pp. 459–482, 1908.

[58] Z. Yin and J. Zhang, “Cross-session classification of mental workload levels using eeg

and an adaptive deep learning model,” Biomedical Signal Processing and Control,

vol. 33, pp. 30–47, 2017.

[59] M. Ziegler, A. Kraft, M. Krein, L. Lo, B. Hatfield, W. Casebeer, and B. Russell,

“Sensing and assessing cognitive workload across multiple tasks,” Lockheed Martin

Advanced Technology Lab, Arlington, VA, USA, Tech. Rep., 2016.
