towards practical driver cognitive workload monitoring via … · 2018-07-18 · abstract towards...
Towards Practical Driver Cognitive Workload Monitoring via Electroencephalography
by
Vipin Bakshi
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2018 by Vipin Bakshi
Abstract
Towards Practical Driver Cognitive Workload Monitoring via Electroencephalography
Vipin Bakshi
Master of Applied Science
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering
University of Toronto
2018
Monitoring of Driver Cognitive Workload is an active area of research that has gained
traction in recent years. This study concerns the development of an Automated
Driver Cognitive Workload Prediction System using the Electroencephalography (EEG)
modality. In this experiment, the driver cognitive workload is modelled using a secondary
n-back task. The study asks whether a consumer-grade 2-channel EEG modality can
discriminate between the granular cognitive workloads induced on the driver, as
measured by the n-back test. Statistical Learning Models are generated while taking
into account practical data-partitioning schemes that would be feasible in a
real-world implementation.
Individual Beta and Gamma bands are found to provide good discriminating performance,
while a combined set of 32 features provides the best overall performance.
Non-linear classifiers outperform linear classifiers, and dimensionality reduction
techniques assist in producing practical prediction models.
Acknowledgements
First and foremost, I would like to thank Professor Konstantinos (Kostas) for providing
me with the opportunity to perform this study. Above all, the kindness and patience
shown in providing timely assistance and encouragement towards this study made the
journey that much sweeter.
I would also like to thank some special people for the support over the duration of
this study. To my mom and dad: Thank you for the constant love and encouragement
and making sure everything was fine back home. I am so lucky to have you as parents
and I appreciate all that you have done through the years to make sure that I can achieve
my numerous goals. You inspire me to be a better person each day and to strive towards
achieving my full potential. To my brother: Thank you for your support over the years
and for always giving the alternative perspective that was needed. I am so proud of all that
you have accomplished and the exciting things to come.
Thank you to the fantastic friends and researchers at the Multimedia Lab who assisted
with presentations, ideas and write-ups. It was a privilege to work with all of you.
Table of Contents
Acknowledgements iii
Table of Contents iv
List of Tables viii
List of Figures ix
Glossary xii
1 Introduction 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Research Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Modeling using Low-Channel Consumer Grade EEG . . . . . . . 4
1.3 Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 eDREAM EEG Modality 8
2.1 Quantifying Cognitive Load . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Modeling Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 n-Back Secondary Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 n-Back for the Driving Task . . . . . . . . . . . . . . . . . . . . . 14
2.4 eDREAM Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1 n-Back Drive Labeling . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 EEG Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Time Synchronization . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 EEG Headband Measurements . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6.1 Wireless EEG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6.2 Power Spectral Density Feature . . . . . . . . . . . . . . . . . . . 21
2.6.3 Absolute Band Power Features . . . . . . . . . . . . . . . . . . . 23
2.6.4 Relative Band Power Features . . . . . . . . . . . . . . . . . . . . 24
2.6.5 βγ Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.6 β-Relative Features . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.7 Additional / Artifact Information . . . . . . . . . . . . . . . . . . 26
2.7 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7.1 PSD Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7.2 Statistical Featurespace Overview . . . . . . . . . . . . . . . . . . 29
2.8 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8.1 Dimensionality Reduction for Individual Participants . . . . . . . 34
2.9 Dimensionality Reduction for all Participants . . . . . . . . . . . . . . . 36
2.10 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3 Related Works and Review 44
3.1 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.1.1 Drowsiness and Fatigue studies . . . . . . . . . . . . . . . . . . . 45
3.1.2 Cognitive Workload and Distraction . . . . . . . . . . . . . . . . . 46
3.2 State-of-the-Art Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2.1 Wireless EEG System for Driver Vigilance (Lin et al.) [38] . . . . 49
3.2.2 Wireless EEG for Cognitive Workload (Wang et al.) [53] . . . . . . 52
3.3 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4 Estimating Cognitive Load with Wireless EEG 56
4.1 Experiment Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1.1 Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.1 Cumulative Feature Space . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Feature Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 Windowed-Averaging . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3.2 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3.3 Subject-Level Normalization . . . . . . . . . . . . . . . . . . . . . 65
4.4 Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4.1 Support Vector Machine with Linear Kernel (LSVM) . . . . . . . 69
4.4.2 Logistic Regression (LR) . . . . . . . . . . . . . . . . . . . . . 70
4.4.3 k-Nearest Neighbours (kNN) . . . . . . . . . . . . . . . . . . . . 70
4.4.4 Support Vector Machine with Radial Basis Kernel (RBSVM) . . . 70
4.4.5 Shallow Artificial Neural Network (ANN) . . . . . . . . . . . . . 71
4.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5.1 Training and Testing . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5.2 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.6 Experiment Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.6.1 Individualized Subject-Specific Performance . . . . . . . . . . . . 78
4.6.2 Generalized Subject-Partitioned Classification . . . . . . . . . . . 83
4.6.3 Generalized Time-Partitioned Performance . . . . . . . . . . . . . 87
4.6.4 Grouped Power Spectral Sub-Features . . . . . . . . . . . . . . . 98
4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5 Conclusion 107
5.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.2.1 Improvements to Current Work . . . . . . . . . . . . . . . . . . . 110
5.2.2 Extension of Work . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography 112
List of Tables
2.1 Granular Cognitive Workload States . . . . . . . . . . . . . . . . . . . . 11
2.2 EEG Headband Measurements . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 EEG Headband Measurements . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Group-wise(28) PSD Band Sensitivities . . . . . . . . . . . . . . . . . . . 28
4.1 PSD Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 16-dimensional Feature Vector (per channel) . . . . . . . . . . . . . . . . 60
4.3 Medians of Individualized Scores using All-Features(32) . . . . . . . . . . 81
4.4 Medians of Individualized Scores using Sub-Features . . . . . . . . . . . . 81
4.5 Generalized Subject-Partitioned Binary Classification . . . . . . . . . . . 85
4.6 Generalized Subject-Partitioned Ternary Classification . . . . . . . . . . 86
4.7 Time-Partitioned Binary Classification using 32-Features . . . . . . . . . 91
4.8 Time-Partitioned Ternary Classification using 32-Features . . . . . . . . 92
4.9 Time-Partitioned Classification using Single Absolute Features . . . . . . 94
4.10 Time-Partitioned Classification using Single Relative Features . . . . . . 95
4.11 Time-Partitioned Classification using Single Relative Features . . . . . . 96
4.12 Time-Partitioned Ternary Classification using Sub-Features . . . . . . . . 101
4.13 Time-Partitioned Binary Classification using Sub-Features . . . . . . . . 102
List of Figures
1.1 Proposed top-level design pipeline . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Empirical Cognitive Modeling . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Yerkes-Dodson: Arousal vs Performance . . . . . . . . . . . . . . . . . . 10
2.3 n-Back Drive for Participant 8 . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Muse Headband electrode placements. . . . . . . . . . . . . . . . . . . . 18
2.5 Muse Headband gel foam temporal electrodes and frontal dry electrodes. 18
2.6 EEG Signal Labeling Procedure . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 220Hz EEG for User 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 FFT Response User 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.9 Sensor Conductivity at Frontal Site . . . . . . . . . . . . . . . . . . . . . 27
2.10 Sensor Conductivity at Temporal Site . . . . . . . . . . . . . . . . . . . . 27
2.11 Groupwise PSD across the 3 nBack tasks. . . . . . . . . . . . . . . . . . 29
2.12 Absolute Power Feature vs nBack Task for group of 28 users at Frontal Sites 30
2.13 Relative Power Feature vs nBack Task for group of 28 users at Frontal Sites 31
2.14 β-Relative Power Feature vs nBack Task for group of 28 users at Frontal Sites . . . 32
2.15 Single Participant PCA on 32-features . . . . . . . . . . . . . . . . . . . 35
2.16 Single Participant PCA on 32-features . . . . . . . . . . . . . . . . . . . 35
2.17 Single Participant LDA on 32-features . . . . . . . . . . . . . . . . . . . 36
2.18 Principal Component Analysis (PCA) applied to a 32-Dimensional Featurespace (28 Participants) . . . 37
2.19 PCA applied to a 32-Dimensional Featurespace (28 Participants) . . . . . 38
2.20 Linear Discriminant Analysis (LDA) applied to a 32-Dimensional Featurespace (28 Participants) . . . 38
2.21 PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants) . . . 39
2.22 PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants) . . . 39
2.23 LDA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants) . . . 40
2.24 PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants) . . . 40
2.25 PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants) . . . 41
2.26 LDA applied to a 12-Dimensional Relative Power Featurespace (28 Participants) . . . 41
2.27 PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants) . . . 42
2.28 PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants) . . . 42
2.29 LDA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants) . . . 43
3.1 Power Spectral sensitivities across cognitive tasks . . . . . . . . . . . . . 48
3.2 State-of-Art Studies employing EEG and nBack Tasks . . . . . . . . . . 54
4.1 Top Level Design Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Top Level Design Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Windowing Applied to a time-series Absolute γ signal . . . . . . . . . . . 64
4.4 Standardization operation applied to a time-series 3s Averaged Absolute γ signal . . . 66
4.5 Normalization operation applied to a time-series 3s Averaged and Standardized Absolute γ signal . . . 67
4.6 Complete Feature Processing methodology to transform original Time-Series signal into a processed Normalized Signal prepared for the Classification Stage . . . 67
4.7 Machine Learning Algorithm and evaluated Hyper-Parameters . . . . . . 72
4.8 k-Fold Cross-Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.9 Individualized Models median binary classification performance . . . . . 79
4.10 Individualized Models median ternary classification performance . . . . . 80
4.11 Individualized Models sub-featurespace performance . . . . . . . . . . . . 80
4.12 2-dimensional LDA Space for Subject 37 on original dataset . . . . . . . 82
4.13 2-Dimensional LDA Space for Subject 37 on feature-processed dataset . . 82
4.14 3-dimensional PCA Space for Subject 37 on feature-processed dataset . . 83
4.15 Generalized Subject-Partitioned binary classification . . . . . . . . . . . . 84
4.16 Generalized Subject-Partitioned ternary classification . . . . . . . . . . . 84
4.17 Generalized Time-Partitioned ternary classification . . . . . . . . . . . . 90
4.18 Generalized Time-Partitioned binary classification . . . . . . . . . . . . . 90
4.19 Generalized Time-Partitioned ternary classification (Individual Features) 97
4.20 Generalized Time-Partitioned binary classification (Individual Features) . 97
4.21 Generalized Time-Partitioned ternary classification (Sub-Features) . . . . 100
4.22 Generalized Time-Partitioned binary classification (Sub-Features) . . . . 103
Glossary
α Frequency Band: 7.5Hz-13Hz. xi, 24
β Frequency Band: 13Hz-30Hz. xi, 24
βγ Frequency Band: 20Hz-40Hz. xi, 24
δ Frequency Band: 1Hz-4Hz. xi, 24
γ Frequency Band: 30Hz-44Hz. xi, 24
θ Frequency Band: 4Hz-8Hz. xi, 24
ANN Feed-forward Neural Network with backpropagation. vi, xi, 56, 71
DSP Digital Signal Processing. xi, 20
EDD Multi-modal Driver Cognitive Workload Monitoring dataset. xi, 5–7, 14
EEG Electroencephalography. xi, 3
FFT Fast Fourier Transform. xi, 21
Fp Frontal Polar Sites. xi, 19
IVIS In-Vehicular Intelligence Systems. xi, 1
kNN k-Nearest Neighbours. vi, xi, 46, 56, 70
LDA Linear Discriminant Analysis. x, xi, 6, 7, 33, 34, 36, 38, 40, 41, 43, 46, 54, 57, 59,
79, 81–83, 85, 86, 88, 89, 91, 92, 98, 100–105, 109
LR Logistic Regression. vi, xi, 70
LSVM Support Vector Machine with Linear Kernel. xi, 46, 56
PCA Principal Component Analysis. x, xi, 6, 7, 33, 34, 36–43, 54, 57, 59, 78, 81, 83,
85–89, 91, 92, 98–102, 104, 105, 108–110
PSD Power Spectral Density. xi, 4
RBSVM Support Vector Machine with Radial Basis Kernel. xi, 46, 56
TP Temporal-Parietal Sites. xi, 20
Chapter 1
Introduction
1.1 Background and Motivation
The automotive industry is undergoing a new era of technological transformation. These
advancements are led by developments in autonomous vehicular technologies towards the
safe and automated navigation of vehicles. However, the advancements are not restricted
to the navigation of the vehicle. Vast changes inside the vehicle, in the form of
In-Vehicle Intelligence Systems (IVIS), sophisticated Vehicular Control / Navigation
Systems and increased integration with smartphones, are drastically changing the
behavior of a driver inside a vehicle. Studies have shown driver inattention to be a
leading cause of automotive accidents.[18][1][5][2] In addition, with the gradual progression
of vehicles towards full automation, it is imperative that drivers maintain maximal
attention on the road during this transitory period. In fact, various recent cases have
been reported where driver inattention in semi-autonomous vehicles has been a factor
in fatal accidents.[21][10][43] A new generation of vehicles is considered to be
semi-autonomous: these vehicles are able to navigate autonomously under controlled
environments; however, as conditions deteriorate, it is imperative for drivers to regain
control of these vehicles. Therefore, newer internal and external vehicle technologies have
given rise to added complexities in driving which require maximal driver attention
on the road.
Studies have shown driver inattention to be prevalent in the form of visual, manual or
cognitive distractions.[5][2] While visual and manual distractions can be easily
observed through video modalities, detecting cognitive distraction is much more
challenging. Complex cognitive workload states can be generated by a variety of factors such
as emotional state, fatigue, drowsiness or external stimuli, to name a few. Studies have
shown the cognitively distracted state to be a leading cause of driving accidents, whereby
a driver may seem to be visually and manually attentive yet be cognitively distracted
due to the aforementioned conditions.[18][1] Driver Cognitive Workload Monitoring is
therefore of urgent importance, and various approaches have been discussed towards
building automated predictive systems that prevent distraction-related
repercussions.[12][22][32]
1.1.1 Technical Challenges
In order to successfully model the Driver Cognitive Workload towards building a practical
and automated system, a variety of challenges must be addressed:
1. Quantifying a subjective measure such as Cognitive Workload is an interdisciplinary
subject spanning Psychology and Human Factors studies.[39] A variety of modeling
methods have been researched, proposed and implemented, and careful consideration
must be paid to the generation of a commonly agreed 'ground truth'
metric as a confidence measure for further modeling work. The challenge lies in
designing the experiment, data collection and modeling methodologies across the various
modalities so as to ensure that a properly quantified cognitive workload model is indeed
being generated.
2. A wide variety of system modalities and paradigms are available, such as Driving
Performance, Eye Tracking, Video and Physiological modalities, to name a few.
Each modality has its own set of challenges and advantages related to the
balance between precision, performance and practicality.[30]
3. A major goal in the development of a practical and automated driver cognitive
workload system is to study the trade-off between individualized and generalized
model performance. Participants have high inter-individual variability in their
responses to cognitive workload demands, and the practical and performance
trade-offs between individualized and generalized models must be
addressed.
4. A successful cognitive workload detection system must be practical, reproducible,
reliable and responsive.
5. Finally, there is a lack of standardized procedures and datasets with which to
collectively study and tackle this problem.
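The contrast between individualized and generalized models in challenge 3 comes down to how labeled samples are partitioned into training and test sets. The following is a minimal sketch, assuming each sample is a dictionary with hypothetical `subject`, `t` (time index) and `label` keys; the function names and the toy data are illustrative only and are not taken from the thesis.

```python
from collections import defaultdict


def subject_partitioned_split(samples, test_subjects):
    """Generalized split: whole participants are held out for testing."""
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


def time_partitioned_split(samples, train_fraction=0.5):
    """Chronological split: for every participant, the earlier part of the
    recording is used for training and the later part for testing."""
    by_subject = defaultdict(list)
    for s in samples:
        by_subject[s["subject"]].append(s)
    train, test = [], []
    for subject_samples in by_subject.values():
        ordered = sorted(subject_samples, key=lambda s: s["t"])
        cut = int(len(ordered) * train_fraction)
        train.extend(ordered[:cut])
        test.extend(ordered[cut:])
    return train, test


# Toy data (hypothetical): 3 participants with 10 time-ordered samples each.
samples = [{"subject": p, "t": t, "label": t % 3}
           for p in (1, 2, 3) for t in range(10)]
```

An individualized model corresponds to applying the chronological split to a single participant's samples, whereas the subject-partitioned split tests how well a model transfers to participants it has never seen.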
1.2 Research Objective
Electroencephalography (EEG) is one of the most prominent Physiological modalities
used for Driver Cognitive Load Monitoring.[17] While Heart Rate and Galvanic sensors
are other options for physiological monitoring, EEG has consistently been shown to be a
more reproducible and reliable source for monitoring.[17] In part, the advantage lies
with the recording site: the brain, the center of all cognitive actions and resources.
EEG allows for the recording of the induced electro-physiological behavior of the brain
while driving under varying cognitive conditions. The close proximity of the sensor
to the brain allows for a very fine temporal resolution, so that recordings capture
rapid changes induced by external factors such as an increased workload during
driving. Significant correlations have been shown between EEG measurements and
cognitive workload changes.[24][47][51] In particular, studies have been performed to
detect correlations with related mental states such as alertness, attention, fatigue,
drowsiness and cognitive workload.[34][31][25][44][19] The research objective of this study
is now stated:
In this experiment, the driver cognitive workload has been modelled using a secondary
n-back task (Section 2.3). This study aims to answer the question of whether a
consumer-grade 2-channel EEG modality can be used to discriminate between the granular
cognitive workloads induced on the driver, as measured by the n-back test.
1.2.1 Modeling using Low-Channel Consumer Grade EEG
Most studies have used high-resolution medical-grade EEG sensors, which can require at
least 32 electrodes (channels) to cover the neural activity across the entire scalp. While
the added number of channels increases the performance of the monitoring system, it
also imposes ergonomic challenges. In particular, a wired medical-grade EEG sensor is
too cumbersome and intrusive to be worn in a practical real-world driving monitoring
system. In addition, a drawback of high-resolution recording is the absence of digital
processing of the signals at the recording site. The recorded signals are often processed at
a secondary processing site such as a remote computer, which makes the approach
impractical for a real-time driving monitoring system. To mitigate these issues, newer research
has been performed using consumer-grade wireless EEG devices, which are much more
ergonomically desirable and provide a rich set of on-device processing capabilities. As a
drawback, these devices contain a much lower number of recording channels (4-16) and
focus on concentrated regions of the scalp with a limited number of sensors to derive
patterns and behaviors.[9][7][8]
An objective of this study is to use one such consumer-grade wireless EEG sensor,
the Muse, and to determine its feasibility in the development of an automated Driver
Cognitive Monitoring System. In particular, the Power Spectral Density (PSD) features
generated by the sensor are evaluated to determine the optimal feature(s) or feature-set(s) which
provide the maximal discriminant behavior in distinguishing between differing cognitive
workloads. Figure 1.1 describes the top-level system design to answer this question.
Figure 1.1: Proposed top-level design pipeline
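To make the idea of PSD-derived band features concrete, the sketch below sums a power spectrum over the frequency bands listed in the Glossary to obtain absolute and relative band powers. It is illustrative only: the band edges come from the Glossary, but the function names, the toy spectrum and the choice of normalizing by the sum over the five bands are assumptions, and the on-device computation performed by the Muse is not described in this section and may differ (for example in scaling).

```python
# Band edges in Hz, as listed in the thesis Glossary.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (7.5, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 44.0),
}


def absolute_band_power(freqs, psd, band):
    """Sum the PSD bins whose frequency lies in [low, high) for the band."""
    low, high = BANDS[band]
    return sum(p for f, p in zip(freqs, psd) if low <= f < high)


def relative_band_power(freqs, psd, band):
    """Band power as a fraction of the summed power of all five bands.

    Assumption: theta and alpha overlap between 7.5 Hz and 8 Hz in the
    Glossary, so bins in that range would be counted in both bands.
    """
    total = sum(absolute_band_power(freqs, psd, b) for b in BANDS)
    return absolute_band_power(freqs, psd, band) / total


# Toy flat spectrum (hypothetical): one bin per integer Hz from 1 to 43.
freqs = [float(f) for f in range(1, 44)]
psd = [1.0] * len(freqs)
```

With the flat toy spectrum, each band's relative power is simply its share of the bins, which makes the normalization easy to check by hand.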
1.3 Thesis Contributions
In order to study the feasibility of using a consumer-grade EEG modality towards
the development of an automated driver cognitive monitoring system, the eDREAM
dataset (EDD) is considered.[6] The EDD is a multimodal dataset that consists of Vehicle
Performance, Visual-Video and Physiological (Heart Rate, Galvanic, Respiration and
EEG) modalities which are time-synchronized with ground truth cognitive level
measurements. Prior studies have been performed with the Visual-Video modality; however, a
detailed analysis of the EEG modality has not been performed. Using the EEG modality
(consumer-grade Muse sensor) of the EDD, the following contributions are made towards
the development of a driver cognitive monitoring system:
1. The EDD is described as it pertains to the EEG modality.
The hardware and software capabilities of the primary Muse
sensor are reviewed with respect to the original EEG measurements and the derived
features. Attention is paid to the on-device measurement technicalities versus the
information that is transmitted wirelessly and recorded as part of the dataset. In
summary, this contribution describes the featurespace that is generated, based on an
understanding of the recording technicalities of the Muse sensor
and of how these recordings are time-synchronized with the secondary task to create
the labeled dataset that is used in the design of this monitoring system. Finally, a
statistical analysis of the discriminant sensitivity across the features is discussed.
2. A standardization and normalization feature-processing step using historical
participant data is proposed, which aims to reduce artifacts and inter-individual
differences towards building successful generalizable models. Performance comparisons
between the original and feature-processed datasets are discussed.
3. The dimensionality reduction techniques of PCA and LDA are applied. The
performance of the PCA, LDA and original featurespaces is compared
to determine the practical implications of improving modeling performance via
dimensionality reduction.
4. A comparison between individualized and generalized models is performed to
understand the performance and practical trade-offs of using both methodologies for
estimating cognitive load.
5. In order to achieve successful modeling of driver cognitive load, it is important
to identify the optimal feature or set of features that generates the best prediction
performance. In this study, various Machine Learning models are generated on
multiple featurespaces and a performance evaluation is used to identify the
featurespace(s) or feature(s) that are best suited for use in a real-time monitoring
system. This contribution therefore comprises the many simulations
performed and the accompanying results used to generate predictive models for the
EEG-based cognitive monitoring task.
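As a rough illustration of contribution 2, the sketch below standardizes a participant's feature values using statistics estimated from that participant's earlier (historical) data. This is a generic z-score under stated assumptions, not the thesis's exact procedure, which is detailed in Chapter 4; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def fit_subject_stats(history):
    """Estimate a participant's mean and standard deviation from earlier data."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for a constant history
    return mu, sigma


def standardize(values, mu, sigma):
    """Express each value as a number of standard deviations from the mean."""
    return [(v - mu) / sigma for v in values]


# Hypothetical historical band-power values for one participant.
history = [1.0, 3.0]
mu, sigma = fit_subject_stats(history)

# New values are rescaled with statistics from the history only.
new_values = standardize([2.0, 4.0], mu, sigma)
```

Because the statistics come only from past data, the same transformation could in principle be applied to incoming samples in a real-time setting without looking ahead.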
1.4 Thesis Organization
The Thesis is organized as follows:
• Chapter 1 summarizes the motivation behind the study and an overview of the
various modalities used towards automated driver cognitive workload monitoring.
It concludes with the contributions of the study pertaining to the use of the EEG
modality of the EDD.
• Chapter 2 provides the experimental overview and the various features available
as part of the EDD and the EEG sensor as they pertain to the experimental
procedure. The statistical properties of the available feature-set are presented,
and a pre-experimental analysis is performed and discussed. The dimensionality
reduction techniques of PCA and LDA are also discussed.
• Chapter 3 describes the prior works and the state-of-the-art studies performed with
both high-resolution and low-resolution EEG sensors. The Feature
Extraction techniques and the accompanying results are discussed.
• Chapter 4 provides the Machine Learning Performance Evaluation to identify the
optimal featurespaces for the estimation task. The experiment pipeline, feature
processing methodology and simulation results are presented and interpreted to
identify the optimal models and feature(s).
• Chapter 5 provides a conclusion and summary of the study and discusses future
works that can succeed this preliminary study.
Chapter 2
eDREAM EEG Modality
2.1 Quantifying Cognitive Load
In order to provide a quantifiable experimental premise for this study, it is very important
to precisely define Cognitive Workload and the various associated modeling paradigms. A
comprehensive review by Bin and Salvendy reviews a consensus of the literature on Cognitive
Workload and defines it as follows: "Amount of Mental Work or Effort necessary
for a person or group to complete a task over a given number of time."[54] Cognitive
Workload cannot be measured directly; instead, it needs to be modeled via other
subjective or quantitative means.[54][46] Cognitive Workload is therefore a multivariate
measurement with temporal, physical (resources) and psychological (stress, anxiety etc.)
attributes. A subjective methodology called the NASA-TLX index extends this to 6
aspects: mental, physical, temporal, performance, frustration and effort levels, and is one
of the subjective measures employed in this study.[29] A proper modeling methodology
must take all of these attributes into account and allow them to be quantified for analysis.
Further constraints arise once an automated monitoring system is
desired, as post-hoc subjective paradigms are impractical in such scenarios. Bin and
Salvendy propose a modified taxonomy of mental-workload techniques, one of which is an
Empirical Modeling methodology as shown in Figure 2.1. In this modeling method,
subjective, performance and psychophysiological methods are used in unison to determine
an empirically feasible quantifiable model.[54]
Figure 2.1: Empirical Cognitive Modeling
Cognitive Workload during driving is related to the workload associated with
driving-related tasks such as vehicle control, navigation, rule-following etc. This can be
categorized as the primary task. When the driver is in a cognitively overloaded state, it can be
due to sub-optimal conditions in one or all of the aforementioned cognitive workload
attributes (temporal, physical (resource) and psychological demands). In experimental studies,
the secondary task is carefully controlled to generate a 'ground truth' for the Cognitive
Load Level by adding weights to the temporal, physical and psychological demands.[30]
The Yerkes-Dodson summarization of 'performance vs mental-arousal' can be used
in a driving context to describe the driver cognitive workload, as shown in Figure 2.2.
Low arousal, resulting in an underloaded cognitive performance, can be related to low
temporal, physical and psychological demands. Some studies have associated this state
with drowsiness.[57] On the other hand, high temporal, physical and psychological stress
demands correspond to high arousal on the Yerkes-Dodson curve, which is associated with
high alertness and anxiety.[57]
Figure 2.2: Yerkes-Dodson: Arousal vs Performance
As can be seen in the Yerkes-Dodson curve, cognitive workload manifests as a continuous
spectrum of varying levels. Any modeling attempt therefore needs a controlled
methodology to induce cognitive workloads at accurately measurable granular levels.
Various studies have proposed highly controlled secondary tasks that supplement
the primary task so that granular cognitive monitoring can be achieved.[30][54][46][55]
Table 2.1 shows granular cognitive workloads as may be experienced by a driver. This
study uses a secondary n-back task (Section 2.3), which creates a 3-level cognitive
workload when combined with the primary driving task.
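For readers unfamiliar with the n-back paradigm, the sketch below shows the general idea under a delayed-recall assumption: at each step the participant must report the item presented n steps earlier, so a larger n demands more working memory. The exact protocol used in the experiment is described in Section 2.3; the function name and the digit sequence here are hypothetical.

```python
def expected_responses(stimuli, n):
    """Return, for each position, the item presented n steps earlier.

    Positions with fewer than n preceding items have no expected response
    and are marked with None.
    """
    return [stimuli[i - n] if i >= n else None for i in range(len(stimuli))]


# Hypothetical digit sequence presented to the participant.
digits = [3, 7, 1, 9]

zero_back = expected_responses(digits, 0)  # repeat the current digit
one_back = expected_responses(digits, 1)   # repeat the previous digit
two_back = expected_responses(digits, 2)   # repeat the digit two steps back
```

Three difficulty settings of this kind are one way to obtain the 3-level workload labeling mentioned above.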
Various modalities are now discussed which provide real-time performance metrics to
monitor and model the quantifiable Cognitive Workload during driving.
Table 2.1: Granular Cognitive Workload States
Driving Difficulty | Multi-Tasking | Mental State | Cognitive Workload | Example
0 | 0 | 0 | Low | Low Traffic, No Distractions, No Fatigue
0 | 0 | 1 | Medium | Low Traffic, No Distractions, Fatigue
0 | 1 | 0 | Medium | Low Traffic, Navigating, No Fatigue
0 | 1 | 1 | Medium | Low Traffic, Navigating, Fatigue
1 | 0 | 0 | Medium-High | High Traffic, No Distractions, No Fatigue
1 | 0 | 1 | Medium-High | High Traffic, No Distractions, Fatigue
1 | 1 | 0 | Medium-High | High Traffic, Navigating, No Fatigue
1 | 1 | 1 | High | High Traffic, Navigating, Fatigue
2.2 Modeling Methods
Studies have been performed using Vehicular, Video and a wide assortment of Physiological
Modalities to model the driver cognitive workload. Each of these modalities presents
its own set of advantages and challenges, which in turn affect the reliability,
generalization and practicality of the proposed solution. Greater reliability,
generalization and practicality make it easier to build proactive behavior into the
proposed solution, which is a requirement for accident prevention.
Vehicular Modality
In vehicular modality studies, driver performance in physical tasks such as speed
management, steering, braking, lane deviations and response timings is extensively used to
generate models that determine the cognitive workload.[30] It is often impractical to test
such maneuvers in real-world scenarios due to safety concerns, and therefore indoor
mechanical emulations are often designed to collect experimental data. In addition, the
modality is less temporally responsive than Physiological measures and
may not provide the necessary proactive response when handling time-critical distraction
stages. However, most vehicles already contain the necessary hardware (steering,
brakes etc.) and minimal additions are required to implement such a monitoring system.
As a result, various vehicles are already equipped with such systems to monitor driving
behaviors.[30][32][26][11]
Video Modality
The video modality uses facially derived information such as blinks, gaze, head movements
and facial expressions to model the cognitive workload. Video allows for excellent temporal
tracking of the behavioral response of the driver via movement-based gestures.
However, such a system requires the installation of camera(s) and raises privacy concerns.
Additionally, the computational requirements to process the high-rate data streams are
significant, and the image processing and proactive predictive algorithms need to be
aligned to maintain the high temporal response required in an overloaded scenario.
Finally, the success of this modality depends on the ambient recording conditions, which
can be a challenge due to varying external and internal lighting conditions.[30][32][26][11]
Physiological Modality
Various types of bio-medical modalities such as Heart Rate, Electroencephalography,
Respiration Rate and Skin Conductance are used to model the cognitive workload.
Physiological sensors provide the best temporal resolution and are able to most accurately
detect changes in the physiology of the subjects. In addition, this data can be quantized
to reduce the bandwidth needed for processing, so the compute requirements are lower
than for the video modality. Monitoring using this modality is able
to accomplish the proactiveness required in time-critical periods of distraction. However,
physiological modalities are among the most intrusive, as external hardware is worn by
the user. In recent years, less intrusive wearable sensors have been developed, which is
an encouraging sign for the practical implementation of this modality as a monitoring
system.[7][9] In 2018, Nissan implemented a practical EEG-based monitoring system to
monitor driver vigilance.
Subjective Modality
While direct quantitative measures are the preferred mechanism to create a reliable and
testable monitoring system, subjective measures such as the NASA Task-Load-Index and the
Karolinska Sleepiness Scale can be used as supplementary methodologies to collect post-hoc
subjective human feedback, such as the difficulty of the task or self-perceived levels
of workload.[29][52] Using the subjective modalities as the sole measurement modality is
not feasible due to high cross-trial and individual variability in the self-perception of
tasks, the non-quantitative nature of the measurements and their impracticality for an
automated cognitive monitoring system. Instead, subjective measures are useful as a
secondary measure for validation and comparison, in particular when ground truth
verification is to be made and confidence is required that a particular
task difficulty is indeed being perceived by the user.[54]
2.3 n-Back Secondary Task
The n-back task is a commonly used working memory task that tests the memory
recall and cognitive processing of the brain.[42] In this study, an auditory version of
the n-back task is performed by a participant. A sequence of letters is played to the
participant, who is required to keep track of the letters played 'n' steps
ago. If the latest letter played matches the letter played 'n' steps ago, the participant has
encountered an 'n-back' event. The participant is asked to report the total number of
'n-back' events experienced at the end of the task. In this modified task, the participant
not only has to store the letters from 'n' steps ago but also has to keep a running count
of the n-back events, which adds to the difficulty of this modified
n-back auditory task. An example is showcased next: [40]
A B C D E F G H G I J J
A B C D E F G H G I [J J] - 1-back region
A B C D E F [G H G] I J J - 2-back region
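The counting rule in the example above can be sketched in a few lines. This is a minimal illustration rather than the scoring code used in the experiment; the function name `count_nback_events` is hypothetical, and the 0-back level (N0) is left out because its matching rule is not defined in this section.

```python
def count_nback_events(letters, n):
    """Count positions where a letter matches the one played n steps earlier."""
    if n < 1:
        # Assumption: only match-based levels (n >= 1) are handled here.
        raise ValueError("n must be at least 1 for a match-based count")
    return sum(1 for k in range(n, len(letters)) if letters[k] == letters[k - n])


sequence = "A B C D E F G H G I J J".split()
one_back = count_nback_events(sequence, 1)   # the trailing "J J" pair
two_back = count_nback_events(sequence, 2)   # the "G H G" run
print(one_back, two_back)
```

The value a participant reports at the end of a drive corresponds to the return value for that drive's `n`.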
2.3.1 n-Back for the Driving Task
As described before, in order to successfully model Cognitive Load, a secondary task can
be introduced to create a granular cognitive workload variable. One method is
to design experiment scenarios where the driving task is controlled and treated as the
primary task, and a secondary cognitive task is varied to generate scenarios where the
effects of varying secondary cognitive loads can be examined. The n-back task is a common
choice of secondary task across various studies to induce a granular cognitive workload
environment.[53][35] A combination of a controlled primary task and a varied secondary
n-back task (by varying 'n') allows for the creation of a cognitive workload environment
where each drive can be labeled with an associated cognitive workload level.
In this study using the EDD,[40] a modified n-back 'audio' task was presented to the
participants. The outline of the task is as follows:
1. Each participant performs 3 separate drives associated with 3 n-back levels (N0,
N1 and N2).
2. The participant is told which n-back drive is being performed.
3. An audio recording of 10 randomly selected letters is played to the participant
for each drive.
4. The participant is required to keep a 'count' of the number of specified n-back
occurrences and report the answer at the end of the drive.
5. This is repeated for all 3 distinct n-back drives.
The main modification is that the driver is not required to continuously verbalize
the n-back occurrences; instead, an extra memory dimension is added by having the driver
remember the total number of n-back occurrences. While this adds to the complexity of the
task, it also has the benefit of reducing motion-based artifacts from speaking during the
recording process. Details of the recording and experiment design decisions
can be found in the eDREAM Data Collection document.[40]
2.4 eDREAM Experiment
In the eDREAM experiment[40], a range of sensory data, namely (1) Vehicle-Based Measures,
(2) Physiological Measures (EEG, ECG, Galvanic Skin Response, Respiration) and (3)
Video and Eye Tracking measures, is collected while a driver drives in a high-performance
driving simulator called the NADS miniSim. Drivers perform the primary driving
task in the presence of a secondary auditory-recall n-back task as described in Section 2.3.
The secondary n-back task is used to induce a controlled and granular secondary cognitive
load. A three-level granularity is chosen in the eDREAM experiment, such that three
drives are performed by each participant, with each drive using a different n-back
level. The primary and secondary tasks together induce a level of cognitive workload,
which allows the EEG recordings to be labeled. Details of the data collection
campaign can be found in [40]; the following sections provide a summary and
then focus on the EEG modality as it pertains to this study:
Participant Demographics
• 36 Participants (18 Male and 18 Female)
• Age ≤ 35 years (27.6 ± 4.45 years)
• Consistent drivers with a valid license for at least 3 years
• No Vision-Correction Glasses (contacts allowed)
2.4.1 n-Back Drive Labeling
Three incremental n-back task drives are performed by the participants in random order.
Each drive is approximately 5-10 minutes in length; however, the audio n-back task is only
employed for 2 minutes within the drive, denoted as the critical audio section. Therefore,
there are periods before and after the n-back audio task which are recorded but
are not labeled with any cognitive workload. During the critical audio regions, the
driver performs the n-back task while driving on straight sections of road with controlled,
non-distracting driving conditions. Each critical audio region consists of 3 sets of
same-level n-back exercises (each consisting of 10 letters). A break of approximately 45
seconds is provided after the first critical section, allowing for a relaxation of the
cognitive state after the mental exercises. Figure 2.3 shows a visual illustration of the
n-back drive timings.
During the drive, the participants are expected to follow a lead vehicle at a constant speed
and are not expected to make turns or change lanes. This ensures that the driver is mostly
impacted cognitively via the secondary n-back task and that a controlled cognitive modeling
environment is implemented. The participants also undergo a training and preparation
procedure to ensure minimal errors and variability during the data collection phase.
Upon the completion of each drive, the results for the n-back task are recorded. The
self-perceived subjective workload scores are recorded using the NASA-TLX methodology.
This allows for the creation of a subjective measure which could be used in conjunction with
the other modeling methods as a confidence measure during evaluation.
Figure 2.3: n-Back Drive for Participant 8
2.5 EEG Data Collection
EEG data collection is performed via a consumer-grade 4-channel sensor: the Muse, developed
by Interaxon.[9] The device used is a prototype first-generation Muse device which
consists of two dry electrodes at the frontal sites and two gel foam electrodes at the two
temporal sites behind the ears, as shown in Figure 2.4 and Figure 2.5. A remote computer
running the Interaxon development software, Muselab, is used to record the wireless data
that is transmitted. Muselab also provides a graphical user interface to analyze the
EEG data transmission in real time and observe any artifacts. Timestamped data for
each drive is originally saved in '.muse' format and then converted to '.csv' and '.mat'
formats for post-hoc analysis.
2.5.1 Time Synchronization
A critical step required for the labeling of EEG data from the Muse headset is to correctly
synchronize the timestamps of the EEG recordings with the real-time timing of the drive
sequence. This is shown in Figure 2.6.
Figure 2.4: Muse Headband electrode placements.
Figure 2.5: Muse Headband gel foam temporal electrodes and frontal dry electrodes.
Figure 2.6: EEG Signal Labeling Procedure
This is achieved by using the visual frames generated by the miniSim Driving Simulator
and forwarding these frames to the Muselab recording software. By matching the frame
numbers in the miniSim recordings and the EEG recordings, it is possible to determine the
exact timestamps in the EEG domain which relate to drive events such as the start/stop
of the critical audio tasks. Therefore, the independent timing of the EEG recording is
synchronized with the real-time timing of the miniSim simulator, and successful labeling
of the EEG data is achieved for each drive. The detailed procedure is described in the
eDREAM data collection document.[40]
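The frame-matching idea can be sketched as follows. This is only an illustration under stated assumptions: the data layout (a list of `(frame_number, eeg_timestamp)` pairs), the 60 frames-per-second rate, the 5-second clock offset and the function names are all hypothetical, and the actual procedure is the one described in [40].

```python
from bisect import bisect_left


def frame_to_eeg_time(frame_marks, frame):
    """Return the EEG timestamp of the first forwarded frame >= `frame`.

    frame_marks: list of (frame_number, eeg_timestamp), sorted by frame_number.
    """
    frames = [f for f, _ in frame_marks]
    k = bisect_left(frames, frame)
    if k == len(frames):
        raise ValueError("frame is later than the last forwarded frame")
    return frame_marks[k][1]


def label_samples(eeg_times, frame_marks, start_frame, stop_frame, label):
    """Label EEG samples inside the critical audio region; None elsewhere."""
    t0 = frame_to_eeg_time(frame_marks, start_frame)
    t1 = frame_to_eeg_time(frame_marks, stop_frame)
    return [label if t0 <= t <= t1 else None for t in eeg_times]


# Hypothetical data: every 10th frame is forwarded, the simulator runs at an
# assumed 60 frames per second, and the EEG clock is offset by 5 seconds.
frame_marks = [(f, 5.0 + f / 60.0) for f in range(0, 601, 10)]
eeg_times = [5.0 + k / 220.0 for k in range(2200)]   # 10 s of 220 Hz samples
labels = label_samples(eeg_times, frame_marks, 120, 360, "N1")
```

Samples outside the critical audio region keep the value `None`, mirroring the unlabeled periods before and after the n-back audio task.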
2.6 EEG Headband Measurements
The Muse consists of 4 sensors: two located on the forehead at the corresponding Frontal
Polar Site (Fp) of the brain, denoted as Fp1 and Fp2, and two located at the
back of the ears at the corresponding Temporal-Parietal Site (TP), denoted as TP9 and
TP10. Data is processed on-site at the Muse headset and the information is then transmitted
wirelessly to the remote recording computer. In addition to the sensory EEG data, a
variety of ancillary information related to the quality and recording
specifications of the measurements is also transmitted. Table 2.2 shows the various
measurements and their recording specifications.
Table 2.2: EEG Headband Measurements
Measurement | Units | Sampling Rate
EEG | uV | 220Hz
FFT | dB | 10Hz
Absolute Power | Bels | 10Hz
Relative Power | None | 10Hz
Blink Artifact | 0/1 | 10Hz
Jaw Artifact | 0/1 | 10Hz
Acceleration | mG | 50Hz
Headband Connection | 0/1 | 10Hz
In-Device Headband Recording
A Digital Signal Processing (DSP) enabled embedded system is present inside the headband.
It interfaces with the raw electrodes and performs DSP operations to generate
the information that is transmitted wirelessly, as shown in Table 2.2. While the wirelessly
transmitted information is used for this study, it is important to consider the original
recording specifications and technicalities to understand the extra capabilities
available on the device compared to those available off the device. The in-device
configurations used in this study are as follows:
• The EEG data is collected at a native sampling rate of 3520Hz.
• The signal is downsampled by a factor of 16, which generates a new EEG signal at 220Hz to
be transmitted wirelessly over Bluetooth.
• A digital notch filter is applied at 60Hz to reduce ambient lighting and power
artifacts.
• Each EEG measurement is recorded at a 10-bit resolution.
• An analog noise reduction technique using the Driven Right Leg Circuit is im-
plemented to reduce the environmental electromagnetic artifacts collected by the
human body.
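The rate arithmetic in the list above (3520Hz / 16 = 220Hz) can be illustrated with a short sketch. Block averaging is used here purely as a simple stand-in for an anti-aliasing filter; the filter actually used on the device is not described in this section, and the function name is hypothetical.

```python
def decimate_by_averaging(samples, factor=16):
    """Average consecutive blocks of `factor` samples (crude anti-aliasing)."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[k:k + factor]) / factor for k in range(0, usable, factor)]


native_rate_hz = 3520
one_second = [float(k % 16) for k in range(native_rate_hz)]  # synthetic ramp signal
downsampled = decimate_by_averaging(one_second)
output_rate_hz = native_rate_hz // 16
print(len(downsampled), output_rate_hz)
```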
2.6.1 Wireless EEG
The raw EEG signals at the four sites are originally sampled at 3520Hz and at a 10-bit
resolution. However, to reduce processing requirements and allow wireless transmission, the
signal is downsampled by a factor of 16 to 220Hz and transmitted over Bluetooth.
The recording is in the range of 0.0 - 1682.815 uV and each measurement is a
vector of length 4, corresponding to the measurement at each site (TP9, Fp1, Fp2, TP10).
The power spectral features are derived from this 220Hz EEG signal; therefore
it can be considered the base signal from which all features may be derived.
2.6.2 Power Spectral Density Feature
While the EEG data stream may be used in its raw form for analysis, the Muse headset
performs an on-device Fast Fourier Transform (FFT) of the EEG data to produce the
logarithm of the PSD. This operation transforms the time-domain
signal into its frequency-domain representation so that Power Spectral analysis of the
signal can be performed.
Figure 2.7: 220Hz EEG for User 8
An FFT generates both Amplitude and Phase components; however, Muse only transmits
the Amplitude components of the transformation, as 129 Amplitude coefficients. In
order to generate a temporal representation of the spectral response, Muse performs the
FFT on the 220Hz EEG signal every 100ms, thereby generating a 10Hz temporal
recording of 129 FFT Amplitude coefficients. The following steps describe this procedure
towards the creation of a time-series representation of the PSD response. This is the
algorithm as described by Interaxon to describe the inner workings of the headband:[9]
1. A Hamming Window of 256 samples is used to perform an FFT on the EEG
signal sampled at 220Hz.
2. A 90 percent overlapping window is used, whereby the Hamming window is slid 22
samples (1/10th of a second @ 220Hz) and the FFT is performed. This generates
the 129 FFT Amplitude coefficients at the aforementioned 10Hz frequency.
3. Since the FFTs are calculated over a 256 sample Hamming Window, the transform
generates 256 FFT components that are symmetric about the origin. 'Negative' components
are dropped, and 128 components along with the component at the origin are retained,
thereby generating 129 Power Spectral Amplitude components.
4. The FFT Amplitudes are transformed to the log scale with units in decibels.
The 129 coefficients are of significance as each coefficient corresponds to the Power
Amplitude of a frequency bin 0.86Hz wide (220Hz / 256). Therefore, for every 100ms of EEG
recording, a Power Spectral Representation at a resolution of 0.86Hz from 0 to 110Hz is
provided. Since most of the EEG Spectral information is contained between 0Hz-50Hz, [9] a
maximal frequency of 110Hz is highly satisfactory for our analysis to generate subsequent
features.
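The windowing procedure described above can be sketched with NumPy as follows. This is an off-device approximation and not Interaxon's implementation: the decibel scaling (20·log10 of the amplitude), the small constant that avoids log(0), and the function name are assumptions made for illustration.

```python
import numpy as np

FS = 220          # wireless EEG sampling rate (Hz)
WINDOW = 256      # Hamming window length (samples)
HOP = 22          # 1/10th of a second at 220 Hz


def sliding_log_spectrum(eeg, fs=FS, window=WINDOW, hop=HOP):
    """Return (freqs, spectra): one row of 129 log-amplitudes per 100 ms step."""
    taper = np.hamming(window)
    starts = range(0, len(eeg) - window + 1, hop)
    frames = np.stack([eeg[s:s + window] * taper for s in starts])
    amplitude = np.abs(np.fft.rfft(frames, axis=1))      # keeps bins 0..128
    spectra = 20.0 * np.log10(amplitude + 1e-12)         # dB; scaling is assumed
    freqs = np.fft.rfftfreq(window, d=1.0 / fs)
    return freqs, spectra


t = np.arange(10 * FS) / FS
eeg = np.sin(2 * np.pi * 10.0 * t)       # synthetic 10 Hz tone, 10 seconds
freqs, spectra = sliding_log_spectrum(eeg)
peak_hz = freqs[np.argmax(spectra[0])]
print(spectra.shape, freqs[1], freqs[-1], peak_hz)
```

`np.fft.rfft` returns only the non-negative frequency components, which matches step 3: 128 components plus the component at the origin. The bin spacing `freqs[1]` is 220/256 Hz and the last bin sits at 110Hz.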
Power Spectral Bands
5 main EEG Power Spectral Bands of interest are defined in Table 2.3.
Figure 2.8 shows the median PSD at Fp1 site for a drive performed by user 8.
2.6.3 Absolute Band Power Features
As described in Section 2.6.2, the 5 main EEG Power Spectral Bands of interest are
the δ, θ, α, β and γ Bands. The 129 Fourier Amplitude Coefficients derived in Section
2.6.2 are used to generate these cumulative Absolute Power Features. It is defined as the
Logarithm of the sum of the Power Spectral Density over the desired Frequency Range
as shown in equation 2.1 with unit ’Bels’.
\[ P_{abs} = \log_{10}\left( \sum_{n=i}^{j} X_n \right), \quad \text{s.t. } 0 \le i, j \le 128 \tag{2.1} \]
where $P_{abs}$ is the logarithm of the sum of the PSD over the entire band, $X_n$ is the FFT
Amplitude Coefficient of the nth selected frequency bin, $i$ is the starting frequency bin
and $j$ is the ending frequency bin for the desired frequency band.
Table 2.3: EEG Power Spectral Bands
Power Band | Frequency Range (Hz)
δ | 1-4
θ | 4-8
α | 7.5-13
β | 13-30
γ | 30-44
βγ | 20-40
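Equation 2.1 can be sketched in plain Python as below. Two points are assumptions rather than details given in the text: the coefficients `x` are taken to be on a linear power scale before summation, and a bin is assigned to a band when its centre frequency lies inside the band limits. The names `band_bins` and `absolute_band_power` are hypothetical.

```python
import math

BIN_HZ = 220 / 256   # about 0.86 Hz per FFT bin

BANDS_HZ = {"delta": (1, 4), "theta": (4, 8), "alpha": (7.5, 13),
            "beta": (13, 30), "gamma": (30, 44), "beta_gamma": (20, 40)}


def band_bins(lo_hz, hi_hz, bin_hz=BIN_HZ):
    """Indices i..j of the FFT bins whose centre frequency lies in [lo, hi]."""
    i = math.ceil(lo_hz / bin_hz)
    j = math.floor(hi_hz / bin_hz)
    return i, j


def absolute_band_power(x, lo_hz, hi_hz):
    """Equation 2.1: log10 of the summed coefficients X_i..X_j, in Bels."""
    i, j = band_bins(lo_hz, hi_hz)
    return math.log10(sum(x[i:j + 1]))


flat_spectrum = [1.0] * 129            # synthetic linear-power coefficients
p_abs = {name: absolute_band_power(flat_spectrum, lo, hi)
         for name, (lo, hi) in BANDS_HZ.items()}
print(band_bins(7.5, 13), p_abs["alpha"])
```

With a flat spectrum each band power reduces to the logarithm of the number of bins in the band, which makes the bin assignment easy to check by hand.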
2.6.4 Relative Band Power Features
Using the Absolute Band Power Features, the Relative Power features can be generated
via equation 2.2.
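\[ P_{rel} = \frac{10^{P_{abs}}}{10^{P_{\delta}} + 10^{P_{\theta}} + 10^{P_{\alpha}} + 10^{P_{\beta}} + 10^{P_{\gamma}}} \tag{2.2} \]
where $P_{rel}$ is the ratio of the power in the selected frequency band to the total power in
all of the bands and $P_{abs}$ is the Absolute Power of the selected band.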
The Relative Band Power of a selected frequency band is thus the ratio of the energy of
the selected frequency band to the total energy of the five defined frequency bands. It is
dimensionless.
Figure 2.8: FFT Response User 8
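Equation 2.2 can be sketched as follows, taking absolute band powers in Bels as input. The band names and the example values are hypothetical and serve only to show the arithmetic.

```python
def relative_band_power(p_abs, band):
    """Equation 2.2: linear power of `band` over the five-band total."""
    five = ("delta", "theta", "alpha", "beta", "gamma")
    total = sum(10 ** p_abs[name] for name in five)
    return 10 ** p_abs[band] / total


# Hypothetical absolute band powers in Bels.
p_abs = {"delta": 1.0, "theta": 0.0, "alpha": 0.0, "beta": 0.0, "gamma": 0.0}
p_rel = {name: relative_band_power(p_abs, name) for name in p_abs}
print(p_rel["delta"], sum(p_rel.values()))
```

Because the denominator is the five-band total, the five relative powers sum to one.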
2.6.5 βγ Features
Studies have shown increased discriminative behavior in the 20-40Hz range.[53][48]
Therefore, using Equation 2.1 and Equation 2.2, two new features are generated.
βγ Absolute corresponds to the Absolute Band Power at 20-40Hz and βγ Relative
corresponds to the Relative Band Power at 20-40Hz.
Therefore, in total, 6 Absolute Power Features and 6 Relative Power Features are
generated at each channel.
2.6.6 β-Relative Features
Studies have shown the importance of using Low and High Frequency ratios in detecting
cognitive states.[31][17] The following four β-relative features are generated:
• (θ + α) / β
• α / β
• (θ + α) / (α + β)
• θ / β
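The four ratios can be sketched as below. Whether the ratios are formed from linear band powers or directly from the logarithmic values is not stated in this section; the sketch assumes the absolute powers (in Bels) are first converted back to a linear scale, and the function name is hypothetical.

```python
def beta_relative_features(p_abs):
    """Four low/high frequency ratios from absolute band powers given in Bels."""
    # Assumption: convert Bels back to linear power before forming ratios.
    theta, alpha, beta = (10 ** p_abs[name] for name in ("theta", "alpha", "beta"))
    return {
        "(theta+alpha)/beta": (theta + alpha) / beta,
        "alpha/beta": alpha / beta,
        "(theta+alpha)/(alpha+beta)": (theta + alpha) / (alpha + beta),
        "theta/beta": theta / beta,
    }


ratios = beta_relative_features({"theta": 1.0, "alpha": 0.0, "beta": 0.0})
print(ratios)
```

Together with the 6 Absolute and 6 Relative Power Features, these 4 ratios give 16 features per channel, or 32 across the two frontal channels.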
2.6.7 Additional / Artifact Information
Table 2.2 shows the artifact information that is provided. EEG recordings are prone to
artifacts such as eye blinks, movement, jaw clenches, skin conductance etc. The accelerometer
can be used to determine the head movement of the participant as well as any drastic
movements causing a noisy EEG signal. Frontal Sites (Fp1 and Fp2) experience more
blink artifacts, and these are reported via the blink artifact indicator. Temporal Sites (TP9
and TP10) experience greater effects of Jaw Clenches, which are reported via the Jaw
Clench binary indicator. Most importantly, a Headband Status Indicator is present which
provides binary measurements of 'Good' and 'Bad' conductive status. This indicator
provides a strict description of the quality of the signal and is the primary indicator of
the integrity of the recorded EEG sample.
Figure 2.9 shows the EEG signal at the Frontal site synchronized with the artifact
recordings. Figure 2.10 shows the EEG signal at the Temporal Site synchronized with the
artifact indicators. As can be seen, Temporal Sites exhibit extreme noise, as indicated by
the Headband Status Artifact. This was observed across all trials and can be attributed
to the usage of a gel-based electrode at the TP contact points. Due to the high noise
artifacts present in the temporal sites, recordings from TP9 and TP10 were omitted in
this study.
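One way the 10Hz Headband Status Indicator could be used to screen the 220Hz EEG samples is sketched below. The sample-and-hold alignment (22 EEG samples per status reading) and the function name are assumptions for illustration; this section does not specify how the indicator was applied sample by sample.

```python
def clean_sample_mask(n_samples, status_good, eeg_hz=220, flag_hz=10):
    """Sample-and-hold the 10 Hz headband status onto the 220 Hz EEG samples."""
    per_flag = eeg_hz // flag_hz           # 22 EEG samples per status reading
    mask = []
    for k in range(n_samples):
        idx = min(k // per_flag, len(status_good) - 1)
        mask.append(bool(status_good[idx]))
    return mask


status = [1, 1, 0, 1]                      # hypothetical: third reading is 'Bad'
mask = clean_sample_mask(88, status)
kept = sum(mask)
print(kept)
```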
Figure 2.9: Sensor Conductivity at Frontal Site
Figure 2.10: Sensor Conductivity at Temporal Site
2.7 Data Exploration
Since the two temporal sites are omitted from the analysis due to excessive noise artifacts,
the two frontal sites are fully taken into consideration. A group-wise data-exploration
using 28 chosen participants is now described.
2.7.1 PSD Response
The group-wise median PSD responses are generated across the 3 datasets. The frequency
range from 0Hz-44Hz, corresponding to the 5 bands of interest, is evaluated.
Using the experimental data-collection notes and observing the artifact information
across all 37 users, 28 out of the 37 users (15 male, 13 female) are chosen for this study.
The remaining users are excluded due to significant movement, poor skin conductance, lack
of effort or failure to complete tasks. Due to the sensitivity and inherent inter-individual
differences in the EEG responses, users showing the aforementioned artifacts/challenges were
dropped from the analysis to allow for maximal control of the effects of the secondary
n-back task.
Figure 2.11 shows the group median PSD response for the 28 specified users. Table
2.4 summarizes the findings:
Table 2.4: Group-wise (28) PSD Band Sensitivities
Experiment | Sensitivity | Prior Work
Cognitive Workload ↑ | θ↑, αβγ↓ | [31][35]
Figure 2.11: Groupwise PSD across the 3 nBack tasks.
2.7.2 Statistical Featurespace Overview
The three sets of Power Spectral Features (Absolute, Relative and β-Relative) can be
statistically analyzed for pre-experimental snooping to determine any underlying patterns
as the n-back task is varied. The Featurespace is first standardized such that all features
have zero mean and are expressed as the number of standard deviations from that mean.
This ensures that during the visualization no features are over-weighted and a uniform
comparison can be made. This process is performed by using the Z-Score statistic as
shown in Equation 2.3.
\[ X_{score} = \frac{X_{sample} - \mu}{\sigma} \tag{2.3} \]
where $X_{score}$ is the normalized feature value, $X_{sample}$ is the original feature value, $\mu$ is
the mean of the feature vector and $\sigma$ is the standard deviation of the feature vector.
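Equation 2.3 applied to a single feature column can be sketched as follows. The use of the population standard deviation is an assumption, since the text does not state which estimator was used, and the example values are synthetic.

```python
from statistics import mean, pstdev


def z_score(column):
    """Equation 2.3 applied to one feature column."""
    mu = mean(column)
    sigma = pstdev(column)          # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(x - mu) / sigma for x in column]


scores = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scores)
```

After standardization every feature has zero mean and unit standard deviation, so no single feature dominates a plot purely because of its units.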
Figure 2.12: Absolute Power Feature vs nBack Task for group of 28 users at Frontal Sites
Figure 2.13: Relative Power Feature vs nBack Task for group of 28 users at Frontal Sites
Figure 2.14: βRelative Power Feature vs nBack Task for group of 28 users at Frontal Sites
The following observations are made by analyzing the Feature Distribution before the
experimental stages. This allows us to understand the distribution and first-order
statistical behavior of the data as it pertains to the samples present in the three distinct
workloads:
1. No single Feature showcases clear linear separability between the three classes.
2. All Absolute Power Features showcase decreasing trends as cognitive workload is
increased. α, β and γ bands show the most significant decrease.[31]
3. Low Frequency Relative Power Features (δ, θ and α) showcase increasing trends as
Cognitive Workload is increased. β, γ and βγ relative bands showcase decreasing
trends.
4. All βRelative Band Features showcase increasing trends as Cognitive Workload is
increased. This indicates that the θ and α bands gain power relative to the β band
as cognitive workload is increased. [31]
2.8 Dimensionality Reduction
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two
techniques used in this study to perform dimensionality reduction. Dimensionality
reduction offers the following advantages:
1. It allows multi-dimensional data to be presented in a 2D or 3D space,
thereby generating a new featurespace that contains informative energy.
2. In various machine learning studies, dimensionality reduction is a key pre-processing
stage that allows complex models to be trained successfully when training samples
are scarce and the dataset is high-dimensional. This problem is called the "Curse
of Dimensionality". [3]
3. Reducing the featurespace to a 2D or 3D space allows for the visualization of
higher-dimensional featurespaces and allows us to make inferences as to the possible
separability of the featurespace.
4. In real-time environments, it may be more computationally efficient to execute and
generate models in a low-dimensional space.
In this section, the combined and individual Absolute, Relative and β-Relative
featurespaces are reduced via PCA and LDA, and the data visualizations generated are
discussed.
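As a rough illustration of the PCA step, the projection can be sketched with NumPy via the singular value decomposition. This is not the thesis implementation: the synthetic data merely stands in for a 32-feature dataset, and the function name is hypothetical.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project rows of `features` onto the top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular ** 2 / np.sum(singular ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
synthetic = rng.normal(size=(300, 32))      # stand-in for a 32-feature dataset
synthetic[:, 0] *= 10.0                     # give one direction most of the variance
projected, explained = pca_project(synthetic, n_components=3)
print(projected.shape, explained)
```

The `explained` fractions correspond to the "informative energy" retained by each component. PCA ignores the class labels, so a high explained fraction does not by itself imply class separability.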
2.8.1 Dimensionality Reduction for Individual Participants
PCA and LDA can be performed on datasets generated from individual participants to
generate reduced featurespaces and benefit from the aforementioned advantages. In
particular, during the generation of individualized models for Machine Learning, there can
often be a scarcity of training samples compared to the number of samples available for
generalized multi-user models. This scarcity of samples, combined with a high-dimensional
space, makes the usage of dimensionality reduction techniques highly beneficial. The
following figures showcase various PCA and LDA featurespaces applied to a 32-dimensional
featurespace (Absolute, Relative and Ratio Features across 2 Frontal EEG channels)
for Participant 8:
Figure 2.15: Single Participant PCA on 32-features
Figure 2.16: Single Participant PCA on 32-features
Figure 2.17: Single Participant LDA on 32-features
The following observations can be made after dimensionality reduction for participant
8:
1. In the PCA space with 3 dimensions, it can be seen that linear separability is
possible between the n0 and n1 datasets. Looking closer at the PCA1 vs PCA2 space
(Figure 2.16), it can be seen that it may be possible to linearly separate all three
classes. Therefore, the components with the greatest informative energies may also
be the most discriminant.
2. Figure 2.17 showcases the 2-D LDA space. It is very clear in this representation
that a simple linear classifier can be trained to very accurately discriminate
between the three classes. Therefore, Linear Discriminant Analysis is successful in
maximizing the distances between the three classes.
However, it should be noted that these are individualized spaces for Participant 8 only
and empirical evaluation and experimentation as performed in Chapter 4 are required to
determine the performance of all individual participants in this dataset.
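For comparison with PCA, a textbook Fisher LDA projection can be sketched with NumPy as follows. This is an illustrative formulation on synthetic data, not the implementation used in this study; the function name, the class shift and the use of a pseudo-inverse are assumptions.

```python
import numpy as np


def lda_project(features, labels, n_components=2):
    """Fisher LDA: directions maximizing between- over within-class scatter."""
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for cls in np.unique(labels):
        rows = features[labels == cls]
        centered = rows - rows.mean(axis=0)
        s_within += centered.T @ centered
        shift = (rows.mean(axis=0) - overall_mean)[:, None]
        s_between += len(rows) * (shift @ shift.T)
    # Eigenvectors of pinv(Sw) @ Sb; at most (classes - 1) are informative.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return (features - overall_mean) @ eigvecs.real[:, order]


rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 100)          # stand-ins for N0, N1, N2
features = rng.normal(size=(300, 32))
features[:, 0] += 4.0 * labels              # hypothetical class-dependent shift
embedded = lda_project(features, labels)
print(embedded.shape)
```

With three classes, at most two discriminant directions carry information, which is consistent with the 2-D LDA space shown for Participant 8.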
2.9 Dimensionality Reduction for all Participants
The same dimensionality reduction techniques applied to the individualized sets can be
applied to the cumulative dataset. The following figures show the LDA and PCA spaces for
the cumulative dataset of 28 participants considered in this experiment. The following
generalized observations can be made across all complete and sub-featurespaces:
1. As the datasets from individual users are concatenated to generate a cumulative
dataset, the PCA and LDA techniques by themselves are unable to generate linearly
separable featurespaces.
2. Significant overlap between the three classes is present across all featurespaces, and
feature-processing techniques are required to take into account inter-individual and
outlier effects on the datasets.
3. It should be noted that no windowing, standardization or normalization schemes
have been applied to these datasets; these visualizations show the
original data generated by amalgamating the labeled data from all users.
4. The dimensionality reduction techniques showcase the 2D and 3D representations of
the datasets and present the challenges associated with taking a data-mining
approach to learn from data. The feature-processing techniques in Chapter 4 aim
to deal with these challenges to improve the performance of cognitive workload
prediction using the eDREAM dataset.
Figure 2.18: PCA applied to a 32-Dimensional Featurespace (28 Participants)
2.10 Chapter Summary
This chapter introduced Cognitive Modeling using a mixture of Subjective,
Psychophysiological and Performance Measures. It introduced the advantages of using the EEG
modality as a Psychophysiological measure. The eDREAM experiment was briefly introduced
with a focus on the n-back task as a secondary cognitive task to accompany the
primary driving task. The EEG modality of the eDREAM dataset, recorded with the low-channel
Muse sensor, was discussed, and the Power Spectral Feature Extraction
methodologies employed were introduced. The statistical behavior of the features was
then observed and analyzed to perform pre-experimental snooping of the patterns of the
features. Finally, individualized and groupwise Dimensionality
Reduction using PCA and LDA were compared, and Dimensionality Reduction was discussed
both as a visualization tool and as a Machine Learning tool.
Figure 2.19: PCA applied to a 32-Dimensional Featurespace (28 Participants)
Figure 2.20: LDA applied to a 32-Dimensional Featurespace (28 Participants)
Figure 2.21: PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)
Figure 2.22: PCA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)
Figure 2.23: LDA applied to a 12-Dimensional Absolute Power Featurespace (28 Participants)
Figure 2.24: PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)
Figure 2.25: PCA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)
Figure 2.26: LDA applied to a 12-Dimensional Relative Power Featurespace (28 Participants)
Figure 2.27: PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)
Figure 2.28: PCA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)
Figure 2.29: LDA applied to a 12-Dimensional β-Relative Power Featurespace (28 Participants)
Chapter 3
Related Works and Review
This chapter summarizes the related works that informed this study. Particular attention
is paid to the EEG modality, as it is the sensing modality used in this study. There have
been two approaches towards Driver Cognitive Workload estimation: 1) using specific
Digital Signal Processing algorithms to develop fine-tuned solutions, and 2) using data
mining and statistical learning approaches to perform top-down or bottom-up feature
selection. This study employs a bottom-up statistical learning approach, and the
following sections summarize the related and state-of-the-art studies which have used
similar techniques to study the relationship between Cognitive Workload and EEG
sensitivities.
This chapter is organized as follows: Section 3.1 discusses Power Spectral sensitivity
findings in the related domains of Drowsiness / Fatigue, Cognitive Workload / Distraction
and Meditation, and Section 3.2 discusses state-of-the-art works using consumer-grade
low-channel EEG sensors.
Chapter 3. Related Works and Review 46
3.1 Related Works
Many studies employ feature extraction techniques that operate in the power spectral
domain and report statistically significant activity in the five major bands employed
in this study: δ, θ, α, β and γ. While many of these studies deal with driving-related
cognitive workload estimation, others deal with related cognitive states such
as fatigue, drowsiness, sleepiness and meditation. In this literature review,
studies relating EEG features to cognitive workload and to the aforementioned states
are both considered in order to understand the techniques and frameworks used
in the development of estimation models.
3.1.1 Drowsiness and Fatigue studies
Numerous studies showcase the relationship between Power Spectral sensitivities and
the onset of Drowsiness / Fatigue.[34][31][33][38][25][56][50] In terms of the Yerkes-Dodson
summarization, the Drowsiness / Fatigue states can be related to low arousal or a
cognitively under-loaded condition.[57] The following takeaways on power spectral
sensitivities are drawn from the respective literature:
• Lal, Craig and Jap have performed prominent research on drowsiness detection
in driving conditions, dealing with all the bands except γ. Their
primary findings are a significant increase in δ and θ and a decrease in α and
β band power during the onset of fatigue.[34] In another study on the
onset of fatigue during long monotonous driving, a significant decrease in α was
observed along with a consistent increase in the β-Relative Features (as employed in
this study).[31]
• It is also interesting to discuss the relationship between Cognitive Workload and
Fatigue. In a comprehensive review by Borghini et al. of cognitive studies on aircraft
pilots, it is noted that a focused and sustained task often leads to fatigue
over time.[16] Both activities involve an increase in θ and β activity and a
decrease in α activity. However, during the onset of fatigue, α spindle activity
(bursts of α bands) is observed, leading to increased δ, θ and α activity and a
decrease in β activity.[49] An abrupt change from a cognitively overloaded to an
underloaded scenario (and the resulting need to distinguish between focused and
fatigued states) can be very challenging to detect when only single bands are selected
for modeling.
Driver Drowsiness Monitoring
To generate monitoring systems, a classification stage is necessary to train and test the
various feature extraction methodologies. Khushaba et al. perform feature extraction
using the Wavelet Packet Transform and select frequency bands by inspecting and mining
frequencies from the entire spectrum. The selected features are used to generate
individualized models, which are tested with Linear Support Vector Machine (LSVM), Radial
Basis Kernel Support Vector Machine (RBSVM), LDA and kNN statistical learning
algorithms. Owing to the comprehensive feature selection process and individualized models,
an accuracy of .94 is achieved using an LDA classifier.[33] Conversely, Yeo et al.
investigate the use of an SVM for classification and use the traditional FFT mechanism for
feature selection. An accuracy of .99 is achieved for detecting the onset of drowsiness.[56] It
is important to note that both studies use medical grade EEG sensors with high channel
counts in a highly controlled environment, which may not be practical for real driving
situations.
3.1.2 Cognitive Workload and Distraction
A breadth of studies have also attempted to characterize the sensitivities associated with
Cognitive Workload. While Cognitive Workload and Cognitive Distraction are two distinct
phenomena, they are both associated with the presence of a secondary task. Therefore,
for this review, both distraction and workload based studies are discussed under the
same umbrella. A key finding consistently shown in most studies is an increase in θ band
activity and a decrease in α and β band activity as cognitive workload is systematically
increased.[24][47][51][14][59][48][23][35][41][17]
1. In a non-driving, single-task study requiring recall of up to 6 objects,
Lundqvist et al. showed a consistent increase in θ and γ band power as each new item was
added to the working memory task. Conversely, α and low β showed a decreasing
trend.[41] Relatedly, although not discussed as often, studies by Harmony et al.
have focused on the δ band and showcased an increase in δ band activity as
similar memory retention tasks are engaged.[27][28]
2. Ryu et al. tracked the α band power as a mental arithmetic task was performed
with increasing difficulty and observed a decrease in band power. Using factor
analysis to fuse the α band, ECG and EOG, it was determined that a
fusion of the sensors could better distinguish between the varying workloads.[45]
Similar fusion techniques have been applied with EEG, where features from
other sensors or modalities boost the performance of the combined measure.[4]
3. A comprehensive study by Almahasneh et al. investigated the frontal lobes and
the associated hemispheric performance on cognitive memory and processing tasks.
With arithmetic tasks used as the secondary task to induce distraction during driving,
the effects on the frontal right lobe were more pronounced than on the left lobe. An
increase in both θ and low-α activity was observed in the frontal right lobe after
analysis across 40 subjects.[13]
4. In a prominent study performed by Gevins, Smith et al., a visuospatial recall
primary task was performed in which subjects were asked to recall the shape and
position of objects on a computer screen. A significant increase in θ activity at the
frontal sites was observed as the recall difficulty was increased, together with
a decrease in α band activity at the parietal sites.[48][23]
Figure 3.1: Power Spectral sensitivities across cognitive tasks
5. In studies on meditation, which focus on attentiveness while in a relaxed
state, the γ band was considered to be of importance. Various studies observed
an increase in γ band activity in the pre-frontal cortex of experienced
meditators, corresponding to increased attentiveness.[44][15] Another meditation
study described an increase in α and a decrease in β activity during meditative
practices focused on attentiveness in a non-distracting environment.[19]
Figure 3.1 summarizes the findings.
3.2 State-of-the-Art Studies
In this section, studies that employed practical cognitive monitoring using the EEG
modality are discussed.
3.2.1 Wireless EEG System for Driver Vigilance (Lin et al.)[38]
In this practical study, a custom wireless EEG sensor is first designed using 4 dry
electrodes concentrated along the occipital region (back of the head). A mobile app is
designed which communicates wirelessly with the EEG sensor (Mindo). Experiments
and evaluation are performed on 15 users performing an immersive and focused 90 minute
driving challenge in a virtual simulator. The challenge entails correcting the vehicle
as it veers out of its lane. The response time to perform the correction is recorded and
used as the labeling mechanism. Unlike other studies,
this study does not generate classification models, but rather generates Support Vector
Regression models to predict the response time of corrections based on EEG states.
Hardware
A wireless headset with 4 channels is developed with analog amplification circuitry to
boost the low voltage EEG signals from the hairy scalp region of the occipital lobe. The
analog signal is digitized and sampled at 256Hz which serves as the original sampling
rate of the sensor. Data is transmitted wirelessly every two seconds to the mobile unit
upon which all the signal processing takes place.
Pre-Processing
Each two second block of data, comprising 512 samples, is broken down into 128-sample
chunks, and the PSD is generated using Welch's method with a 256-point Hanning window. The
PSD components are therefore generated on 0.5s sliding windows with .50 overlap. 30
frequency bins are subsequently generated covering a spectral range of 1-30Hz. These
are in turn used to create the corresponding δ, θ, α and β logarithmic band powers.
Further, the data received is divided into training, testing and baseline recordings.
The baseline recording is the two seconds of data before the onset of the driving correction
challenge. The EEG data from the baseline is used as a normalization factor for the
training and test sets to gauge the true change in band behaviors.
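The pre-processing described above can be illustrated with a small sketch. This is not the implementation of Lin et al.; the function names, the synthetic 10Hz test tone and the choice of a 256-point zero-padded transform (which yields 1Hz bins at a 256Hz sampling rate) are assumptions made purely for illustration.

```python
import cmath
import math

FS = 256      # sampling rate in Hz (from the text)
SEG = 128     # samples per sub-window (0.5s)
HOP = 64      # 50 percent overlap between sub-windows
NFFT = 256    # assumed zero-padded transform length, giving 1Hz bins


def hann(m):
    """Symmetric Hann (Hanning) window of length m."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]


def welch_log_power(block, f_lo=1, f_hi=30):
    """Average windowed periodograms of one block; return {freq_hz: log10 power}."""
    window = hann(SEG)
    starts = range(0, len(block) - SEG + 1, HOP)
    result = {}
    for k in range(f_lo, f_hi + 1):          # bin k is k Hz because FS == NFFT
        total = 0.0
        for s in starts:
            # Direct DFT of the windowed segment; zero-padding adds no terms.
            coeff = sum(window[n] * block[s + n]
                        * cmath.exp(-2j * math.pi * k * n / NFFT)
                        for n in range(SEG))
            total += abs(coeff) ** 2
        result[k] = math.log10(total / len(starts) + 1e-12)
    return result


# Synthetic two second block (512 samples) containing a 10Hz tone.
block = [math.sin(2 * math.pi * 10 * i / FS) for i in range(2 * FS)]
log_power = welch_log_power(block)
peak_hz = max(log_power, key=log_power.get)
```

In this sketch the 30 logarithmic bin powers play the role of the 1-30Hz feature-set, and band powers would be obtained by grouping bins; the exact band edges used by Lin et al. are not stated here and are therefore left out.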
Classification
Support Vector Regression models using linear, sigmoid, polynomial and Radial Basis
Kernels are all explored. Individualized models are generated using a 50 - 50 split 2-Fold
cross validation procedure repeated 100 times. Final reporting is presented on remaining
test samples.
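A repeated 50 - 50 split of this kind can be sketched as follows. The generator name, the seed and the use of plain index lists are hypothetical, and the regression model itself is omitted; the sketch only shows how the two folds swap roles within each repetition.

```python
import random


def repeated_two_fold(n_samples, repeats=100, seed=0):
    """Yield (train_idx, validation_idx) pairs for repeated 2-fold cross validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                 # new random 50 - 50 split each repetition
        half = n_samples // 2
        fold_a, fold_b = idx[:half], idx[half:]
        yield fold_a, fold_b             # train on A, validate on B
        yield fold_b, fold_a             # then swap the roles of the folds


splits = list(repeated_two_fold(10, repeats=3))
```

Each repetition contributes two train/validation pairs, so 100 repetitions would give 200 fitted models per feature-set.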
Results
α Band Power was highly correlated to the reaction time as has been shown in vari-
ous studies. The lack of vigilance as showcased by increased reaction times to vehicle
corrections corresponded with higher values of alpha. 6 feature-sets were experimented
with: [δ], [θ], [α], [β], [δ,θ,α,β], [30 1Hz bands from 1-30Hz]. Amongst the single bands,
α obtained the lowest RMSE and concatenating all bands further reduced the RMSE.
However, the best result was obtained by using 30 features from the 1-30Hz band region.
The Radial Basis Function SVR obtains the best regression results for all feature-sets.
Discussion and Learnings
In this study, individualized models are successfully developed and used to predict the
reaction times to lane deviations. Based on the predicted reaction times, it can be inferred
whether the driver is in a low-vigilance state. The similarities to, and approaches shared
with, our study are as follows:
1. The idea of using baseline values as a normalization factor is also adopted in
our study, where the 30 second drive data before each nBack audio task is
used as the baseline data over which Z-Standardization takes place.
2. In our study, a similar cross-validation procedure is performed. However, instead of
reporting on a test set, the average performance over all validation sets
is reported.
3. A low-channel (4-channel) sensor is developed, which is similar to our study, where the
sensor also has 4 channels (only 2 used). However, the stark difference in the location of
the channels on the scalp should be noted. While this study places sensors over the occipital
scalp region, our study places them over the frontal scalp regions to monitor cognitive
workload rather than vigilance.
4. A very similar feature extraction method is used: both methodologies take a sliding
window based approach to perform a zero-padded FFT transformation. In our study, a
Hamming sliding window over a 220Hz signal is used
with 90 percent overlap to generate a reporting rate of 10Hz. This work instead
uses a Hanning window over a 128-sample sub-window with an overlap of 50 percent for
a reporting rate of 2Hz.
5. The feature-set generation and comparison methodology of this study, using individual
power spectral features, grouped power spectral features and FFT power
components, is also employed in our study.
6. At the feature-processing level, once the PSD features have been generated, a 2
second sliding window is employed in this study to ensure good reporting resolution.
In our study, a 3 second sliding window with .95 overlap is employed for
similar reasons.
A limitation of this study is the lack of discussion or evaluation of the generalization
of the solution: the reported results are a mean of the individualized model prediction
results. In addition, because the problem is framed as a regression challenge,
it is difficult to compare with the classification metrics prevalent in most studies on
distraction / cognitive workload estimation. Another limitation is the small number
of participants in the study (15).
3.2.2 Wireless EEG for Cognitive Workload (Wang et al.)[53]
In this study, a computer based visuospatial n-back game is played by 9 participants and
the EEG data is collected using a 14 channel Emotiv EEG Headset. The 14 channels cover
the temporal, occipital and frontal scalp regions. Labeling of the EEG data is performed
simply by noting the n-back task being performed. 4 levels of the nBack task are presented,
labeled n0 - n3.
Feature Extraction
The original EEG recording is performed at 2048Hz and then downsampled to 128Hz for
wireless transmission. The EEG signal undergoes four types of Feature Extraction: 1)
FFT based Power Spectral generation in 2Hz intervals between 4-40Hz (18 features), 2)
Statistical Features: mean, variance, skewness and kurtosis of the signal, 3) Morphological
Features: curve length, number of peaks, average non-linear energy, 4) Time-Frequency
Features: Wavelet Packet Transform. A large set of features is therefore
generated across the 14 channels.
Pre-Processing
A Personal Standardization approach is introduced in this study which employs an Inter
Quartile Range based feature scaling approach. In this approach, all features are
scaled to the range [0,1]. By looking at the historical data of each feature, a
statistical analysis similar to that underlying 'boxplots' is performed, whereby outlier
values are identified and saturated to 0 or 1. The goal of this process is to
remove outliers from the dataset and to ensure that all participant
feature values are normalized onto the same scale. Following this, each epoch has a
dimensionality of 648, and therefore a top 10 feature selection is performed using the
dimensionality reduction technique of minimum Redundancy Maximum Relevance (mRMR), which
uses feature mutual information as the measure for choosing the top features for the
classification stage.
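A minimal sketch of such an Inter Quartile Range based scaling is given below. It is not taken from Wang et al.: the function names are hypothetical, and the whisker factor of 1.5 is the conventional boxplot rule, assumed here because the exact bounds are not stated. The bounds are fitted on historical data only and then applied to new values.

```python
import statistics


def fit_iqr_bounds(history, whisker=1.5):
    """Boxplot-style lower and upper bounds learned from historical feature values."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr   # whisker=1.5 is an assumption


def iqr_scale(values, bounds):
    """Scale values to [0, 1] between the bounds; saturate outliers at 0 or 1."""
    low, high = bounds
    if high == low:                      # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - low) / (high - low))) for v in values]


history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]   # 100.0 is an outlier
bounds = fit_iqr_bounds(history)
scaled = iqr_scale(history, bounds)
new_scaled = iqr_scale([-50.0, 4.0], bounds)            # unseen values, same bounds
```

Separating the fitting of the bounds from their application makes it explicit which data the statistics were learned from.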
Classification
A Radial Basis Function SVM was employed using 5-Fold cross validation to produce
binary classification results using one-vs-all and one-vs-one classifiers. The Top Results
are now stated for a dataset generated from those trials which contained correct nBack
responses:
1. When using the top 10 features from the entire featurespace of power, morphological,
statistical and time-frequency features, an accuracy of 0.82 is achieved for the n0
vs all and n0 vs n3 classification challenges. n0 vs n2 achieves 0.71 accuracy. A
ternary comparison is not performed. All other binary classification comparisons
lead to accuracies between 0.65 and 0.70.
2. When using the top 10 features only from the Power Band set (4-40Hz), a top accuracy
of .71 is achieved for n0 vs n1 and n0 vs n3. It is odd that an accuracy of only .55 is
achieved for n0 vs n2. In general, all binary classification results are between 0.6
and 0.71 when using the top features from this feature-set.
Discussion and Learnings
Similar to our study, this study aims to build a generalizable model, and the Personal
Standardization technique it introduces is used extensively in our study. The same
Feature Scaling approach is used to perform subject-level normalization, as will be shown
in Chapter 4. The use of the nBack task for Cognitive Load modeling with 4 classes
is also similar to our study, where a granularity of three levels is chosen. Dimensionality
Reduction is also a common theme between the two studies; however, PCA and LDA
based reduction techniques are used in our study. An advantage of the mRMR approach
employed in this study is that the original features are used for classification, which makes
the model more interpretable. When using PCA or LDA, a transformation
into a secondary space takes place, leading to loss of the original feature information.
Figure 3.2: State-of-Art Studies employing EEG and nBack Tasks
A disadvantage of this study lies in the way the Classification stage is
performed. Since the data is collected a priori, data snooping occurs in the form of the
Personal Standardization approach: during the outlier
removal process, the entire dataset is inspected to understand its distribution,
and the same dataset is then used for cross validation. This is not feasible
in practical applications, where 'test' data cannot be used for statistical purposes in the
training stages.
3.3 Chapter Summary
In this chapter, the literature used to understand the research subject was
discussed. Findings in the Power Spectral Domain related to Cognitive Distraction,
Cognitive Workload, Drowsiness and Fatigue were presented, along with the importance of the
Power Spectral Bands and individual windowed frequency featurespaces. Finally,
two state-of-the-art studies that are drawn on extensively in our study were discussed with
the goal of critiquing them and sharing the techniques adapted from them.
Chapter 4
Estimating Cognitive Load with
Wireless EEG
Using data collected from the two frontal channels of the Muse EEG headset, the goal of the
estimation analysis is to identify the feature(s) best suited to estimating
the cognitive load of drivers. To that end, a data-driven experiment
is performed in which Statistical Learning Models are developed and implemented to
explore the various feature-sets generated from the eDREAM dataset. The experimental
methodology used in many machine learning studies consists of well defined practices
pertaining to the training, validation, testing and evaluation stages. In this study, these
practices are followed closely and are described with a view to generating generalizable
models that estimate cognitive workload while driving. In particular, because the
eDREAM dataset is labeled, the problem naturally lends itself to the development of
supervised learning based Classification models. Traditional algorithms such as LSVM, kNN,
RBSVM and ANN are used for the evaluation of the various feature spaces.
It should be noted, however, that the implementation and evaluation of the learning
models is highly dependent on the data-partitioning schemes and the details
of the feature extraction techniques. The goal of this study is to build a robust,
real-time implementable system that generalizes across subjects. To that
end, the experiment takes into account the details of data partitioning, feature processing
and dimensionality reduction techniques (PCA and LDA). Recommendations based on
the results, and the implications of design decisions for the real-time implementation of the
proposed system, are provided.
Figure 4.1: Top Level Design Pipeline
The major goal of this experiment is to identify the top performing models and
thereby identify the best performing feature-set(s). The complexity of the learning
algorithms is also identified and real-time implications for learning and testing are
discussed. Figure 4.1 shows the design pipeline. Each of the stages is now described
in detail.
4.1 Experiment Overview
The overview of the experiment is provided in Figure 4.1. Details about the labeling
procedure and experimental data collection are provided in Chapter 2. FFT coefficients and
Power Spectral based Absolute and Relative features are derived by the sensor from the
raw EEG recordings. Data is recorded concurrently across the two frontal EEG channels.
Table 4.1 describes the feature-set provided by the sensor.
Table 4.1: PSD Features
Measurement       Units    Sampling Rate
EEG               uV       220Hz
FFT               dB       10Hz
Absolute Power    Bels     10Hz
Relative Power    None     10Hz
An objective of this study is to identify the featurespaces that allow for optimal
estimation of cognitive load. To that end, various datasets consisting of varying
featurespaces and dimensionalities are generated to perform an evaluation
based feature selection. The underlying aim of this experiment is to learn from the
data and find the top feature(s) that maximally discriminate between the differing cognitive
workloads.
A major challenge in building a practical generalizable model is addressing the
various system-level challenges associated with the implementation. EEG signals pose
difficulties due to the non-stationary characteristics of the signal, highly variable
inter-individual electrophysiological responses and a high prevalence of recording artifacts.
[53][38] While the data collection campaign took care to ensure
a controlled and low artifact environment, steps in this experiment deal with issues
pertaining to the non-stationary characteristics of the EEG recordings, and emphasis is placed
on reducing the effects of the high inter-individual differences in order to build
generalizable estimation models. The Standardization, Normalization and Data-Partitioning
schemes described in Sections 4.2 and 4.3 tackle these challenges. In addition, the performance
and practical implications of using these Feature-Processing techniques are discussed.
Using the cumulative eDREAM dataset, various sub-datasets can be derived by varying
the featurespace, performing dimensionality reduction (PCA and LDA) and generating
feature-processed datasets. These sub-datasets serve as the inputs to the Classification
stage, where different data-partitioning schemes (Subject-Partitioned, Time
Partitioned and Individualized Subject-Specific) are discussed and evaluated. Well defined
performance metrics are used to quantify the evaluation results and an assessment
of the findings is discussed (Section 4.5.2).
4.1.1 Data Selection
Using the experimental data-collection notes and the artifact information across
all 37 users, 28 of the 37 users (15 male, 13 female) are chosen for the analysis. Users
were excluded due to significant movement artifacts during experimentation, poor
skin conductance, lack of effort as per the experimental notes, or non-completion of tasks.
Due to the sensitivity of, and inherent inter-individual differences in, the EEG responses,
users showing the aforementioned artifacts/challenges were dropped from the analysis to
allow for maximal control over the effects of the secondary nBack task.
4.2 Feature Extraction
The goal of this study is to identify the ideal Power Spectral feature-set(s) for
estimating cognitive workload. The basic time-domain data provided is the raw EEG
signal as a voltage representation. Relevant studies have used features derived from the
raw EEG signals, ranging from statistical and
morphological to, most often, Power Spectral features.[53][37][4][38][33] This study focuses on
the use of Power Spectral derived featurespaces. Table 4.2 outlines 3 groups of Power
Spectral derived features: Absolute Power, Relative Power and β-Relative Power. All
features are derived from the FFT transformation of the raw EEG signal and the resultant
129 Power Spectral Amplitude Coefficients, as described in Section 2.6.2.
4.2.1 Cumulative Feature Space
A 16 dimensional feature vector is generated with 6 Absolute Power, 6 Relative Power
and 4 β-Relative Features. Since there are two channels of recording, a 16-features ×
2-channels = 32-dimensional Featurespace is obtained. It is possible to fuse the channels
and generate an averaged 16-dimensional space. However, the
16 features from both sides are retained because of the distinct diversification of brain
functions among the hemispheres [13] and the observed non-uniformity in
spectral behavior across the two sites.
Table 4.2: 16-dimensional Feature Vector (per channel)
Feature Group             Features                                                  Sampling Rate
Absolute Power Bands      δ, θ, α, β, γ, βγ                                         10Hz
Relative Power Bands      δ, θ, α, β, γ, βγ                                         10Hz
β-Relative Power Bands    ((θ + α) / β), (α / β), ((θ + α) / (α + β)), (θ / β)      10Hz
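The assembly of the 32-dimensional feature vector can be sketched as follows. The band keys, function names and example values are hypothetical, and two assumptions are made for illustration only: the band powers passed in are on a linear scale, and the β-Relative ratios are computed from the absolute band powers.

```python
BANDS = ("delta", "theta", "alpha", "beta", "gamma", "beta_gamma")


def beta_relative(power):
    """Four β-Relative ratios from a mapping of band name -> linear band power."""
    t, a, b = power["theta"], power["alpha"], power["beta"]
    return [(t + a) / b, a / b, (t + a) / (a + b), t / b]


def channel_features(absolute, relative):
    """16 features per channel: 6 absolute, 6 relative and 4 β-Relative."""
    return ([absolute[k] for k in BANDS]
            + [relative[k] for k in BANDS]
            + beta_relative(absolute))      # assumption: ratios use absolute power


def feature_vector(left, right):
    """Concatenate both frontal channels (no averaging): 2 x 16 = 32 features."""
    return channel_features(*left) + channel_features(*right)


# Hypothetical band powers for one 10Hz sample on one channel.
absolute = {"delta": 4.0, "theta": 2.0, "alpha": 2.0,
            "beta": 1.0, "gamma": 1.0, "beta_gamma": 2.0}
relative = {"delta": 0.4, "theta": 0.2, "alpha": 0.2,
            "beta": 0.1, "gamma": 0.1, "beta_gamma": 0.2}
vector = feature_vector((absolute, relative), (absolute, relative))
```

Keeping the two channels as separate blocks of 16 features, rather than averaging them, mirrors the decision to retain hemisphere-specific information.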
4.3 Feature Processing
In order to mitigate the effects of noise artifacts and inter-user differences, it is
recommended to perform feature processing operations on the time-series feature vectors.
It is to be noted, however, that much of the signal processing on the raw EEG signal is
pre-performed by the sensor, and processing at this stage is at the Featurespace Level.
The pre-processing stage is therefore applied to the 32 dimensional Featurespace
reported at a rate of 10Hz (one sample every 100ms). Figure 4.2 shows the Feature
Processing pipeline and the subsequent processing of the signal from Subject 37.
Figure 4.2: Top Level Design Pipeline
4.3.1 Windowed-Averaging
Windowed Averaging of the time-domain feature-set is used to provide adaptive noise
reduction and smoothing as a pre-processing step before the classification stage. Since the
featurespace is reported at 10Hz, consecutive samples are strongly correlated.
However, samples may also be contaminated with
artifacts, and individual noisy samples may overpower or distort the 'informative' signal
present in nearby noiseless samples. Windowed Averaging reduces the
effect of artifacts while preserving the informative trends of the windowed signal.
Furthermore, the artifact information transmitted by the sensor is used to pinpoint samples
where artifacts were reported and to actively remove those
samples from consideration. After thorough analysis, the users selected for the experimental
analysis exhibited minimal artifact samples. Active artifact removal and
windowing are both used to allow for adaptive noise reduction and to reduce the effect of
outliers in the classification training stages. The major goals of Windowed-Averaging
as pertaining to this experiment are now summarized:
• Adaptive Noise Reduction.
• Ensure that the sample count is preserved and sufficient samples are available for
the Classification stages.
• Ensure reporting responsiveness in a real-time environment.
The two windowing schemes that were experimented with are discussed below, along with the
decision to use the Overlapping Moving Window scheme for subsequent analysis:
Non-Overlapping Moving-Average-Window
A non-overlapping moving-average-window operates by averaging every n consecutive
samples, where n is set by the chosen window duration. For example, at 10Hz with
a 1 second Non-Overlapping Window, a new windowed measurement becomes available
every 1 second. The non-overlapping nature of the scheme is desirable as it allows for
flexible granularization of time segments where independent averages may provide more
meaningful insights. For example, in cases where discrete cognitive workloads
change every few seconds, such as when playing an action video game, it may be desirable
to have no overlap between short windows in order to analyze the
distinct loads more robustly and prevent contamination due to overlap. However, since the
driving workloads associated with the nBack tasks are much longer in duration (30s tasks),
the non-overlapping window offers little benefit.
A disadvantage of this scheme is the reduced responsiveness of the reported
measurement. The length of the window is also the real-time reporting period, as samples
must be collected and averaged before moving to the successive processing stage. If
long windows are desired for optimal performance, significant latency may be present
between recording and reporting. This can be a particular issue when operating in
High Cognitive Workload periods where instant feedback is desired.
Another disadvantage is the drastic reduction in the number of samples, as the averaging
process consumes many samples. For example, at a sampling rate of
10Hz with a 1s non-overlapping moving window, 10 samples are averaged
to generate a single sample, reducing the sample count by a factor of
10. This reduction can be a detriment to the learning process, where
it is desirable to have as many samples as possible to facilitate the learning of complex,
high-dimensional models.[3] These challenges can be mitigated with an
Overlapping Window.
Overlapping Moving-Average-Window
A simple Dirichlet (Boxcar) window is used for the analysis. The overlapping scheme
uses a sliding window of size n which is slid by 1 for each new sample. The
windowed result is generated by averaging the new sample with the n − 1 previous samples.
A major advantage of this scheme is the preservation of the total sample count
available to the succeeding classification stage. As noted earlier, a larger
number of samples for the learning stages allows for the development of more robust and
generalizable learning models. Another advantage is the responsive real-time
reporting, allowing for practical and proactive reporting of cognitive workload. Because
the window slides by 1 sample, a new windowed measurement can be made
with each successive sample and no latency in reporting is observed. A practical
consideration, however, is the requirement to maintain a buffer of the n − 1 previous
samples, which may be of concern in resource constrained computational environments such as
a headband computer. In this study, a 3s sliding window with 90 percent overlap was used.
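The two windowing schemes can be contrasted with a short sketch. The function names are hypothetical, and the sliding variant assumes a step of exactly one sample, as described for the overlapping scheme; the window lengths are taken from the examples in the text (1s and 3s at 10Hz).

```python
def non_overlapping_average(samples, n):
    """Average consecutive blocks of n samples; yields len(samples) // n outputs."""
    return [sum(samples[i:i + n]) / n
            for i in range(0, len(samples) - n + 1, n)]


def sliding_average(samples, n):
    """Boxcar window slid by one sample; one output per input once n are buffered."""
    out = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= n:
            running -= samples[i - n]    # drop the sample leaving the window
        if i >= n - 1:
            out.append(running / n)
    return out


# A 30s nBack task reported at 10Hz gives 300 samples of one feature.
task = [float(i % 7) for i in range(300)]
blocks = non_overlapping_average(task, 10)    # 1s non-overlapping window
smooth = sliding_average(task, 30)            # 3s sliding window, step of 1 sample
```

With these settings the non-overlapping scheme leaves 30 samples, whereas the sliding scheme leaves 271, that is, every input sample except the first n − 1 needed to fill the buffer.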
4.3.2 Standardization
Figure 4.3: Windowing Applied to a time-series Absolute γ signal
A major challenge with EEG signals is the high inter-individual differences in the
recordings. These differences arise not only from experimental artifacts present in individual
recordings, but also from differences in electrophysiological behavior between
participants, resulting in varying baselines between individuals.[53][38] Inter-user
differences and the presence of artifacts in the recorded signals are a major deterrent
to building generalizable models to predict workload. Therefore, a standardization
process is recommended and performed in most studies to mitigate these concerns and
improve the generalization of models.[53][38] A commonly used standardization statistic
is the Z-Score, which expresses the data in standard deviations
from the mean. This allows for inter-subject standardization, as the baseline differences
between individual users are mitigated when each individual recording is described as
a relative z-score.
\[ Z = \frac{X - \mu(X)}{\sigma(X)} \tag{4.1} \]
where Z is the Z-Score, X is the original data, µ(X) is the mean of the dataset and
σ(X) is the standard deviation of the dataset.
Subject-Standardization
In order to achieve subject-level standardization, historical data of the participant is
required. In this study, the first 50 seconds of each drive, which correspond to the time
before the n-back audio task is activated, are taken as the baseline data for the
standardization statistics. The mean and standard deviation are computed from the
baseline data and the labeled n-back task data is standardized as shown in Equation 4.2.
\[ X_{std} = \frac{X_{raw} - \mu_{base}}{\sigma_{base}} \tag{4.2} \]
where Xstd is the standardized labeled dataset, Xraw is the original labeled dataset,
µbase is the mean of the baseline data and σbase is the standard deviation of the
baseline data. Xstd can be described as a relative measurement, in units of the baseline
standard deviation, of how far the labeled data lies from the mean of the baseline data.
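Equation 4.2 can be illustrated with a short sketch. The function name and the sample values below are hypothetical, and the use of the population standard deviation (pstdev) is an assumption; the sample standard deviation would be an equally plausible reading of the text.

```python
import statistics


def standardize_to_baseline(task_samples, baseline_samples):
    """Express task samples as z-scores relative to a subject's baseline."""
    mu_base = statistics.mean(baseline_samples)
    # Assumption: population standard deviation of the baseline segment.
    sigma_base = statistics.pstdev(baseline_samples)
    if sigma_base == 0:
        raise ValueError("baseline has zero variance; cannot standardize")
    return [(x - mu_base) / sigma_base for x in task_samples]


# Made-up values: the baseline has mean 5 and population std 2.
baseline = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
task = [5.0, 7.0, 9.0, 3.0]
z = standardize_to_baseline(task, baseline)
print(z)  # [0.0, 1.0, 2.0, -1.0]
```

Because the statistics come only from the pre-task baseline, the same function could in principle be applied sample by sample in a streaming setting once the baseline segment has been recorded.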
A subject-standardized cumulative dataset is created by concatenating the
subject-standardized datasets of all users. This cumulative dataset exhibits reduced
inter-individual differences between participants. A disadvantage of the subject-level
standardization process is that historical data of a participant must be collected to
learn the statistical properties required for the standardization. A real-time system
that uses a standardization approach may therefore need additional steps such as
calibration, along with increased computational resources. In this study, these practical
implications are discussed and results for both the standardized and the original
datasets are compared.
4.3.3 Subject-Level Normalization
A Feature Scaling approach proposed by Wang et al. is used, which is adept at removing
outlier samples as well as scaling the measurements to a fixed range [0,1]. Removing
the outliers prevents such samples from contaminating and biasing the learning models.
Scaling the features to a fixed range ensures that no single feature is overweighted during
the learning process. Equations 4.3, 4.4 and 4.5 outline the Feature Normalization
procedure, which is applied to the subject-standardized datasets.
\[ X_{scaled} = \frac{X_{raw} - X_l}{X_u - X_l} \tag{4.3} \]
\[ X_u = \min\left(X_{max},\; Q_u + 1.5\,(Q_u - Q_l)\right) \tag{4.4} \]
\[ X_l = \max\left(X_{min},\; Q_l - 1.5\,(Q_u - Q_l)\right) \tag{4.5} \]
where Xscaled is the normalized feature, Xraw is the original sample value, Xu is the
upper limit, Xl is the lower limit, Qu is the upper quartile of X, Ql is the lower quartile
of X, Xmax is the maximum value of X and Xmin is the minimum value of X.
Figure 4.4: Standardization operation applied to a time-series 3s Averaged Absolute γ
signal
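A minimal sketch of Equations 4.3 to 4.5 is given below. The function name fence_scale and the data are hypothetical. Two details are assumptions, since the text does not specify them: the quartile interpolation method ("inclusive"), and the way outliers are handled (here they are only flagged by a boolean mask, leaving removal or clipping to the caller).

```python
import statistics


def fence_scale(values):
    """Scale one feature to [0, 1] using quartile-based limits (Eqs. 4.3-4.5)."""
    # Assumption: "inclusive" quartile interpolation.
    q_l, _, q_u = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q_u - q_l
    x_u = min(max(values), q_u + 1.5 * iqr)  # Eq. 4.4: upper limit
    x_l = max(min(values), q_l - 1.5 * iqr)  # Eq. 4.5: lower limit
    if x_u == x_l:
        raise ValueError("feature is constant; cannot scale")
    scaled = [(x - x_l) / (x_u - x_l) for x in values]  # Eq. 4.3
    # Samples outside [x_l, x_u] fall outside [0, 1] and are flagged as outliers.
    inlier = [x_l <= x <= x_u for x in values]
    return scaled, inlier


# Made-up feature values with one obvious outlier.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]
scaled, inlier = fence_scale(feature)
kept = [s for s, ok in zip(scaled, inlier) if ok]
print(kept)
```

Every retained value lies in [0, 1], while the flagged sample would map above 1 and can be dropped before the classification stage.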
Figure 4.5: Normalization operation applied to a time-series 3s Averaged and
Standardized Absolute γ signal
Figure 4.6: Complete Feature Processing methodology to transform original Time-Series
signal into a processed Normalized Signal prepared for the Classification Stage
4.4 Machine Learning Algorithms
The feature-processing stage generates a myriad of datasets, which require a modeling
methodology to identify the top featurespaces. Given the estimation goal of the proposed
system and the availability of a labeled dataset, a natural approach is to use Supervised
Machine Learning algorithms. The task can be phrased as a classification problem based on
the 'ground-truth' labels associated with each labeled dataset. Equation 4.6 formalizes
this classification problem:
\[ y = \begin{cases} 0, & \text{no n-back task during driving} \\ 1, & \text{1-back task during driving} \\ 2, & \text{2-back task during driving} \end{cases} \tag{4.6} \]
An advantage of a data-driven approach is that it avoids the development of specific
algorithms to determine the underlying patterns of the various featurespaces for the
classification problem. In addition, the feature extraction and feature-processing stages
are completely decoupled from the classification stage. This allows various
feature-processing techniques (as described in the previous section) to generate numerous
datasets which can be evaluated by controlled processes in the classification stage. A
labeled dataset, X, is generated by the Feature Extraction / Pre-Processing stage and
passed to the Classification stage. Machine Learning algorithms, data-partitioning
techniques, validation-based hyper-parameter optimization and evaluation criteria are
discussed further in this section as they pertain to this experiment.
Simulations were performed using a mixture of Python and MATLAB programs. In
particular, Python was used for SVM hyper-parameter tuning using libSVM[20] and for
the development of a shallow neural network using TensorFlow. While hyper-parameter
tuning was primarily performed using Python-based programs, final evaluation was
performed using MATLAB due to its greater data visualization flexibility and the ease of
use of its parallel computing capabilities.
Most studies use 'classical' machine learning algorithms for the estimation
challenge, while paying greater attention to the feature extraction processes.
In particular, studies evaluate both linear and non-linear classifiers with the respective
datasets. Some newer studies have explored Deep Learning architectures; however,
features are then often not extracted in a separate processing stage but are instead
learned through the training process.[59][38] In this study, because a strictly identified
feature-set family is used, i.e. the Spectral Power Components, shallow-learning algorithms
are evaluated to identify the feature or feature-set that optimizes the classification
task. As a secondary goal, evaluating the following 5 algorithms provides a preliminary
comparison between the various classification algorithms.
4.4.1 Support Vector Machine with Linear Kernel (lSVM)
The lSVM is a maximum-margin linear classifier which generates an optimal separating
hyperplane using soft constraints for the classification tasks.[20][3] No transformation
into a higher-dimensional space is required, so only the penalty parameter C needs to be
optimized. Varying the C parameter introduces regularization into the model generation
for optimal generalization between in-sample and out-of-sample performance. A larger C
moves towards a hard constraint and overfitted models, while a smaller C moves towards
underfitted models; a validation scheme is therefore required to determine the optimal
regularization parameter value.[3][20] This experiment uses the Grid Search methodology
provided in the libSVM library and the accompanying guidelines to perform this parameter
tuning. The lSVM is inherently a binary classification technique, so a One-Vs-All approach
is used to build n models, where n is the number of classes. The following
hyper-parameters are considered for this experiment:
• Penalty Factor C: Exhaustive search through libSVM grid search algorithm
4.4.2 Logistic Regression (LR)
Logistic Regression is another linear classification algorithm, in which the model produces
probabilistic outputs specifying the confidence that each class is the correct output.[3]
The class with the maximum probability is chosen as the prediction. Once again, in order
to generalize optimally between in-sample and out-of-sample performance, a regularization
penalty term C is used in a manner very similar to that of the lSVM described above. The
following hyper-parameters are considered for this experiment:
• Penalty Factor C: 10, 1, 0.01, 0.001
4.4.3 k-Nearest Neighbors (kNN)
The k-Nearest Neighbors algorithm is not a trainable model; rather, it memorizes the labeled
training samples and uses a voting scheme based on k neighbours to assign a label to a new
test sample.[3] The Euclidean distance metric is used to determine the k training samples
closest to the test sample, and a majority vote is taken over the labels of these k
neighbours. In case of ties, the label of the closest sample is chosen.
Potential hyper-parameters are the form of the distance metric, the weighting of
neighbours and the number of neighbours. The following hyper-parameters are
considered for this experiment:
• Distance Weights: Inverse, Equal-Weighted
• Number of Neighbors: 1,10,50,100
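The voting scheme above can be sketched in a few lines. The function name knn_predict and the toy points are hypothetical. The tie-break shown (the closest neighbour among the tied labels decides) is one reading of the rule stated in the text, and the small constant added to the distance in the inverse weighting is an assumption to avoid division by zero.

```python
import math
from collections import defaultdict


def knn_predict(train_x, train_y, query, k, weights="equal"):
    """Label a query point by a vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query.
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    neighbours = ranked[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        # "equal": one vote per neighbour; "inverse": vote weighted by 1/distance.
        votes[label] += 1.0 if weights == "equal" else 1.0 / (d + 1e-12)
    best = max(votes.values())
    tied = {label for label, v in votes.items() if v == best}
    # Tie-break (assumed reading): the closest neighbour among tied labels wins.
    for _, label in neighbours:
        if label in tied:
            return label


# Made-up 2-D points for two workload classes (0 and 2).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 2, 2, 2]
print(knn_predict(train_x, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_x, train_y, (5.0, 5.2), k=3, weights="inverse"))
```

With k = 1 the vote reduces to copying the label of the single nearest sample, which is why small k values are more sensitive to noisy training samples.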
4.4.4 Support Vector Machine with Radial Basis Kernel (rbSVM)
The rbSVM is an extension of the lSVM where the 'kernel trick' is applied to perform a
transformation of the input space into higher dimensions.[3] This is useful when the data
is not linearly separable and a higher-dimensional featurespace is desired to compute the
optimal hyperplane.[3] Due to the kernel transformation, an additional hyper-parameter
γ, which sets the width of the radial basis kernel and hence the sensitivity of the model
to individual samples, needs to be optimized. A high γ leads to overfitting and a low γ
leads to underfitted models. Therefore, a validation procedure is required
to find the optimal hyper-parameters. This experiment uses the Grid Search methodology
provided in the libSVM library and the accompanying guidelines to perform this
parameter tuning. The following hyper-parameters are considered for this experiment:
• Penalty Cost C: Exhaustive search by libSVM grid search algorithm
• Sample Sensitivity γ: Exhaustive search by libSVM grid search algorithm
4.4.5 Shallow Artificial Neural Network (ANN)
A shallow Feed-Forward Neural Network trained with the backpropagation algorithm is
developed for this experiment. The Cross Entropy Loss is used as the training objective.
ReLU activation functions are used in the hidden layers and a softmax function is
used in the output layer to generate class probabilities. Regularization via a Penalty
Factor λ and early stopping are applied during training. The Adam Optimizer is used
for objective function optimization during training. The following hyper-parameters are
optimized:
• Learning rate α: 0.1,0.01,0.001
• Regularization Parameter λ: 0.005, 0.01, 0.1
• Architecture (Layers-Nodes): 2-10, 1-10, 1-100, 2-100
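To give a sense of the size of this search space, the candidate configurations can be enumerated as below. This sketch assumes that every combination of the listed values is a candidate (the text does not state whether the search was exhaustive), and the dictionary keys are hypothetical names. The reading of "2-10" as 2 hidden layers of 10 nodes follows the "Layers-Nodes" convention of the list above.

```python
from itertools import product

learning_rates = [0.1, 0.01, 0.001]
reg_lambdas = [0.005, 0.01, 0.1]
# (hidden layers, nodes per layer), following the "Layers-Nodes" notation.
architectures = [(2, 10), (1, 10), (1, 100), (2, 100)]

# Assumption: the full Cartesian product of the listed values is searched.
grid = [
    {
        "learning_rate": lr,
        "reg_lambda": lam,
        "hidden_layers": layers,
        "nodes_per_layer": nodes,
    }
    for lr, lam, (layers, nodes) in product(learning_rates, reg_lambdas, architectures)
]
print(len(grid))  # 36
```

Under this assumption, each of the 36 configurations would be trained once per validation fold, which is the main driver of the tuning cost.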
4.5 Performance Evaluation
The goal is to build a practical cognitive workload estimation system that can be
implemented in real time. To that end, careful attention must be paid to the evaluation
and data partitioning schemes so that the solution is assessed under practical conditions.
In a practical scenario, systems are built either by using collected data to build
customised solutions which are tested on new data or, as in our approach, by following
a data mining approach in which statistical learning models are developed to learn the
inherent patterns in the data. In both approaches the dataset is identical, and the goal
is to use the data to perform well on new unseen data. Ideally, the new unseen data would
be test data collected after the experiment and then evaluated with the models generated
from the original dataset. However, such an approach is infeasible, as cost,
controllability and time all weigh against a secondary data collection campaign. Instead,
a data partitioning approach is used where the collected and labeled dataset is divided
into training, validation and test sets, and well-established methodologies are followed
to balance the in-sample performance against the out-of-sample generalization.[3] The
Machine Learning training and evaluation procedure is shown in Figure 4.8 and described
in the following sections.
Figure 4.7: Machine Learning Algorithm and evaluated Hyper-Parameters
Figure 4.8: k-Fold Cross-Validation
4.5.1 Training and Testing
Performance evaluation is established by training models on the training set (in-sample)
and evaluating them on samples outside of the training set (out-of-sample). Training and
testing on the same set will certainly lead to overly optimistic results. However, testing
on a single random test set is also not an ideal indicator of out-of-sample performance,
as the selected test set may be biased towards overly optimistic or pessimistic samples.
Performance is affected not only by which samples are selected for the training and test
sets, but also by the partitioning ratios of the various sets. For example, a
fifty-fifty split of training and test sets will drastically reduce the number of samples
available for training and impair the optimization of the in-sample performance.
Inadequate optimization of the in-sample performance will in turn lead to unsatisfactory
out-of-sample performance. In particular, it is desired to have a large number of training
samples when training complex models or dealing with a dataset in high dimensions.[3]
The machine learning challenge is to minimize the in-sample error Ein while also
minimizing the generalization error (Eout − Ein). In essence, it is desired to have an
out-of-sample error very close to the in-sample error to ensure that trained models
provide realistic and reproducible results when tested with new unseen samples. Throughout
this process of optimizing the two measures, model hyper-parameters are repeatedly
adapted until a set of hyper-parameters is found which accomplishes both of the
aforementioned optimizations. An established methodology to accomplish this is the K-Fold
Cross-Validation technique described below.[3]
K-Fold Cross-Validation
In order to optimize the minimization of Ein and (Eout − Ein), a balance between the
choice and ratio of test and training sets is required. Too many training points can lead
to an unpredictable out of sample behavior while too many test points can lead to the
generation of an inadequate model. To mitigate these issues, a technique called K-Fold
Cross-Validation is used which introduces a third set, called the Validation Set.[3] K-Fold
validation as used in this experiment is now described:
1. Dataset is divided into a 90 percent Training Set and 10 percent Test Set.
2. Training Set is divided into 5 Folds (The partitioning of data will be discussed in
the next section)
3. Initial model hyper-parameters are chosen and a model is trained on 4 of the 5 folds.
The resulting model is tested on the remaining 5th fold, called the 'validation' set.
This process is repeated until each fold has served as the validation set once. This
ensures that all samples are used as training samples multiple times but are used
as validation samples only once.
4. Repeat Step 3 until all desired hyper-parameters have been evaluated.
5. Choose the best performing hyper-parameters based on the hyper-parameter set
which achieved the best average validation error
6. Train entire training set (all 5-folds) with chosen hyper-parameters
7. Use the generated model to evaluate the Test Set
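The seven steps above can be sketched as a small, self-contained procedure. All names here (select_and_test, make_folds, the toy 1-D nearest-neighbour classifier and its data) are hypothetical and serve only to exercise the procedure. The random shuffling before splitting is an assumption; the partitioning actually used in this experiment is discussed in the next section.

```python
import random


def make_folds(indices, n_folds):
    """Split a list of indices into n_folds disjoint, roughly equal folds."""
    return [indices[i::n_folds] for i in range(n_folds)]


def error_rate(model, predict, x, y, idx):
    return sum(predict(model, x[i]) != y[i] for i in idx) / len(idx)


def select_and_test(x, y, fit, predict, param_grid, n_folds=5, test_frac=0.1, seed=0):
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)  # assumption: random sample-level split
    n_test = max(1, round(test_frac * len(idx)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]  # step 1
    folds = make_folds(train_idx, n_folds)  # step 2
    val_err = {}
    for params in param_grid:  # steps 3 and 4
        errs = []
        for v, val_fold in enumerate(folds):
            fit_idx = [i for f, fold in enumerate(folds) if f != v for i in fold]
            model = fit([x[i] for i in fit_idx], [y[i] for i in fit_idx], params)
            errs.append(error_rate(model, predict, x, y, val_fold))
        val_err[params] = sum(errs) / len(errs)
    best = min(val_err, key=val_err.get)  # step 5
    final = fit([x[i] for i in train_idx], [y[i] for i in train_idx], best)  # step 6
    return best, val_err, error_rate(final, predict, x, y, test_idx)  # step 7


# Toy 1-D k-nearest-neighbour classifier used only to exercise the procedure.
def fit_knn(xs, ys, k):
    return xs, ys, k


def predict_knn(model, q):
    xs, ys, k = model
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - q))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


x = [i / 100 for i in range(100)]
y = [0 if v < 0.5 else 1 for v in x]
best_k, val_err, test_err = select_and_test(x, y, fit_knn, predict_knn, [1, 5, 15])
print(best_k, val_err, test_err)
```

The test set is touched exactly once, after the hyper-parameters have been fixed, so the reported test error is not influenced by the selection in step 5.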
Data Partitioning
In order to reflect how such a system would be deployed in a real driving scenario,
attention must be paid to the paradigms that would facilitate practical data
collection, training and evaluation. Some obvious choices for practical usability
of a trainable system are as follows:
1. An individualized model is generated for each driver and then tested on the same
driver. While this option is ideal because the model is fine-tuned to individual
responses, it is often time-consuming and impractical to gather sufficient
individual data to deliver a graceful out-of-the-box experience. A better experience can
be achieved if models pre-trained on other users' data can be used to evaluate a
new user's data, as described next.
2. A generalized pre-trained model is generated by multi-user collected data and is
then tested on new users. This allows for the best out-of-box experience as no
individualized training is required and pre-trained models can be used to predict
workload of new individuals
3. A generalized model is generated by multi-user collected data and is then tested
on the same users. This differs from (1), in that, the models generated using this
methodology are not individualized and instead are generated from a multitude
of users. A downside of this methodology is the requirement for individual data
collection to retrain the models.
Approach (1) is denoted as the Individualized Subject-Specific partitioning scheme
where individual models for each user are generated and tested.
Approach (2) is denoted as Generalized Subject-Partitioned scheme where test sub-
jects are not used as part of the training procedure.
Approach (3) is denoted as Generalized Time-Partitioned scheme where subjects are
used both in the train and test sets. It is very important to note that even though the
same users are used for both the training and testing, the training and test sets are two
distinct sets from the same users where test samples are never included in the training
procedure to mitigate sampling bias.
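The three partitioning schemes can be contrasted with a short sketch. The record layout (dictionaries with "subject", "t" and "label" keys) and the function names are hypothetical. Holding out the latest samples of each subject as the test portion is an assumption made for illustration; the text only requires that the training and test sets of a subject are disjoint.

```python
from collections import defaultdict


def subject_partitioned(records, test_subjects):
    """Approach (2): test subjects never appear in the training set."""
    test_subjects = set(test_subjects)
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


def time_partitioned(records, test_frac=0.1):
    """Approach (3): every subject contributes disjoint train and test samples."""
    by_subject = defaultdict(list)
    for r in records:
        by_subject[r["subject"]].append(r)
    train, test = [], []
    for recs in by_subject.values():
        recs = sorted(recs, key=lambda r: r["t"])
        # Assumption: the latest samples of each subject form the test portion.
        n_test = max(1, round(test_frac * len(recs)))
        train.extend(recs[:-n_test])
        test.extend(recs[-n_test:])
    return train, test


def individualized(records, subject, test_frac=0.1):
    """Approach (1): time-partitioned split restricted to a single subject."""
    own = [r for r in records if r["subject"] == subject]
    return time_partitioned(own, test_frac)


# Made-up records: 3 subjects with 10 time-ordered samples each.
records = [
    {"subject": s, "t": t, "label": t % 3} for s in ("A", "B", "C") for t in range(10)
]
train, test = subject_partitioned(records, ["C"])
print(len(train), len(test))
```

Only the subject-partitioned split leaves an entire participant unseen during training, which is what makes it the strictest test of out-of-the-box generalization.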
4.5.2 Performance Metrics
Most studies use Accuracy as the primary evaluation criterion. Accuracy is described in
Equation 4.7.
\[ A(p, y) = \frac{\sum_{i=1}^{n_{count}} [p_i = y_i]}{n_{count}} \tag{4.7} \]
where A(p, y) is the Accuracy, ncount is the number of samples, p is the predicted class, y
is the correct condition class and [ ] is the indicator function.
In cases where the datasets are unbalanced (one class is over-represented), it is much
more useful to use the related metrics of Precision and Recall, as various studies have
done. Precision and Recall are described in the following equations[16]:
\[ P = \frac{TP}{TP + FP} \tag{4.8} \]
\[ R = \frac{TP}{TP + FN} \tag{4.9} \]
where TP is the number of samples correctly predicted as the positive class, FP is the
number of samples incorrectly predicted as the positive class and FN is the number of
samples incorrectly predicted as the negative class.
Precision can be described as the rate of correct predictions among the positive
predictions. That is, of all samples predicted as the positive class, what fraction is
predicted correctly. Recall can be described as the ratio of correctly predicted positive
samples to the total number of actual positive samples. While Precision and Recall can
each be used individually as evaluation metrics, their weighted harmonic mean, called the
F-Score, takes both Precision and Recall into account and
allows for a singular numeric representation of the performance. F-Score is described by
Equation 4.10[16]
\[ F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R} \tag{4.10} \]
where Fβ is the F-Score, β is the recall weight, P is the precision and R is the recall. By
increasing the value of β, the recall is given greater importance. In this experiment, both
F1 and F2 scores are computed. F1 weighs precision and recall equally,
while F2 gives greater weight to recall.
In the final evaluation reporting, Accuracy, the average of F1 score and the average
of the F2 scores over the binary and ternary classes are reported.
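Equations 4.7 to 4.10 can be checked on a small example. The function names and the label vectors below are hypothetical. Averaging the per-class F-scores with equal weight (a macro average) is an assumed reading of "the average of the F1 score" in the text; returning 0 when a denominator is zero is likewise a convention chosen for this sketch.

```python
def accuracy(pred, true):
    """Eq. 4.7: fraction of samples whose predicted class matches the label."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def precision_recall(pred, true, positive):
    """Eqs. 4.8 and 4.9 with one class treated as the positive class."""
    pairs = list(zip(pred, true))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 if undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(pred, true, positive, beta=1.0):
    """Eq. 4.10: F-score with recall weight beta."""
    p, r = precision_recall(pred, true, positive)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


def mean_f_beta(pred, true, beta=1.0):
    """Assumed macro average: unweighted mean of the per-class F-scores."""
    classes = sorted(set(true))
    return sum(f_beta(pred, true, c, beta) for c in classes) / len(classes)


# Made-up ternary labels and predictions.
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(accuracy(pred, true))
print(f_beta(pred, true, positive=1, beta=1.0))
print(f_beta(pred, true, positive=1, beta=2.0))
print(mean_f_beta(pred, true, beta=1.0))
```

For class 1 in this example the precision is 2/3 and the recall is 1, so F2 exceeds F1, which illustrates how a larger β rewards recall.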
4.6 Experiment Results
A comprehensive set of simulation results is presented, describing the performance of the
featuresets that would be applicable in the aforementioned practical driving and modeling
scenarios. Results for both individualized and generalized models are presented for the
binary and ternary classification tasks.
4.6.1 Individualized Subject-Specific Performance
Individualized subject-specific models are generated for single users using the time-based
partitioning scheme. The kNN and lSVM algorithms are used and a 5-fold cross validation is
performed. Results presented are the averaged validation scores of the top hyper-parameter.
Discussion
Figures 4.9, 4.10 and 4.11 show the individualized classification performance using the
accuracy metric. Tables 4.3 and 4.4 show the median performance of the 28
individualized models generated for the 28 participants, for the 32-feature space and the
sub-featurespaces. A comparison between modeling on the original dataset and on the
proposed feature-processed dataset is shown. The following observations are made:
1. The standardization and normalization procedures of the feature-processing
stage boost the performance of the individual models. With subject-level
standardization and normalization, a linear classifier (lSVM) is able to clearly
discriminate between the classes, with .991 accuracy in the binary task and .976 in
the ternary task (Table 4.3). This is expected, as fine-tuned individual models are
generated for each participant.
2. In the reduced featurespaces, Absolute Power shows the best discriminant behavior
across both binary and ternary datasets, with accuracies of .923 and .890
respectively. The Absolute Power feature-set is the best choice for individual
subject-specific models; however, any of the sub-featurespaces achieves significantly
better performance than the original (no feature-processing) datasets.
3. PCA is performed on the original 32-dimensional feature-set and the featurespace
is reduced to 3 dimensions. In the binary classification task on the original (no
feature-processing) dataset, no significant performance improvement is observed
and .780 accuracy is achieved. For the same classification task on the
feature-processed dataset, an accuracy of .871 is achieved. This is a degradation from
the performance when using all 32 features (.991). For the ternary classification task,
the dimensionality reduction causes a significant performance drop in
the feature-processed dataset.
4. LDA is also performed on the original 32-dimensional dataset, reducing it to 1
and 2 dimensions for the binary and ternary datasets respectively. In the
feature-processed dataset, for both classification tasks, performance is improved as
LDA is successful in maximizing the inherent linear separability of the data. Figures
4.12 and 4.13 visually showcase this observation. Reducing the 32-dimensional space
to 1 and 2 dimensions results in equivalent or superior performance when
compared to the performance of the complete 32-dimensional featurespace. LDA
as a dimensionality reduction technique is recommended for individualized datasets
when the proposed feature-processing scheme is performed. This result also points
towards the inherent linear separability of the individualized datasets, where simple
linear classifiers can be used to generate optimal models for discriminating between
cognitive workloads.
Figure 4.9: Individualized Models median binary classification performance
Figure 4.10: Individualized Models median ternary classification performance
Figure 4.11: Individualized Models sub-featurespace performance
Table 4.3: Medians of Individualized Scores using All-Features (32)
Algorithm    Classes        Original               Feature Processing
                            ACC    F1     F2       ACC    F1     F2
lSVM         0 vs 2         .764   .754   .758     .991   .991   .991
lSVM(LDA)    0 vs 2         .767   .761   .761     1.000  1.000  1.000
lSVM(PCA)    0 vs 2         .780   .781   .784     .871   .894   .894
lSVM         0 vs 1 vs 2    .662   .654   .656     .976   .975   .974
lSVM(LDA)    0 vs 1 vs 2    .572   .582   .589     .950   .986   .986
lSVM(PCA)    0 vs 1 vs 2    .595   .618   .625     .666   .775   .778
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM
Table 4.4: Medians of Individualized Scores using Sub-Features
Algorithm    Classes        Absolute              Relative              β-Relative
                            ACC    F1     F2      ACC    F1     F2      ACC    F1     F2
lSVM         0 vs 2         .923   .959   .959    .880   .889   .889    .792   .858   .857
lSVM         0 vs 1 vs 2    .790   .884   .884    .746   .745   .752    .666   .792   .798
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM
Figure 4.12: 2-dimensional LDA Space for Subject 37 on original dataset
Figure 4.13: 2-Dimensional LDA Space for Subject 37 on feature-processed dataset
Figure 4.14: 3-dimensional PCA Space for Subject 37 on feature-processed dataset
4.6.2 Generalized Subject-Partitioned Classification
In this grouping scheme, one set of participants is used to train the models and the
generated models are then tested on a different set of participants. This is the most
ambitious experimental condition, as the test set participants may not share statistical
properties with the training set participants (Figures 4.15 and 4.16).
Discussion
For both the binary and ternary tasks, performance close to guessing is achieved (50
percent and 33 percent respectively). Dimensionality reduction to the PCA and LDA
featurespaces does not boost performance. The best performance for the binary
classification task (.551) is achieved in the PCA space by a kNN model with 10 nearest
neighbours. A kNN model with 10 neighbours also achieves the best performance in the
ternary classification task (.394) in the PCA space. However, the best performance is not
significantly better than guess performance and therefore a subject-partitioned scheme,
where models are trained and tested with separate participants, is not recommended for
an automated driver cognitive workload monitoring system. Similar near-guess results
are also observed in a study by Lockheed Martin.[59]
Figure 4.15: Generalized Subject-Partitioned binary classification
Figure 4.16: Generalized Subject-Partitioned ternary classification
Table 4.5: Generalized Subject-Partitioned Binary Classification
Algorithm (Featurespace)   Classes   Dimensions   ACC    F1     F2
lSVM                       0 vs 2    32           .509   .505   .509
LR                         0 vs 2    32           .511   .507   .511
kNN                        0 vs 2    32           .538   .535   .537
rbSVM                      0 vs 2    32           .539   .375   .440
ANN                        0 vs 2    32           .487   .486   .488
lSVM (PCA)                 0 vs 2    3            .509   .505   .509
LR (PCA)                   0 vs 2    3            .506   .508   .512
kNN (PCA)                  0 vs 2    3            .551   .546   .549
rbSVM (PCA)                0 vs 2    3            .505   .500   .501
ANN (PCA)                  0 vs 2    3            .487   .479   .481
lSVM (LDA)                 0 vs 2    1            .510   .506   .510
LR (LDA)                   0 vs 2    1            .512   .507   .511
kNN (LDA)                  0 vs 2    1            .499   .497   .498
rbSVM (LDA)                0 vs 2    1            .459   .458   .458
ANN (LDA)                  0 vs 2    1            .488   .477   .484
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score,
lSVM = Linear SVM, LR = Logistic Regression, kNN =
k-Nearest Neighbors, rbSVM = Radial Basis SVM, ANN
= Artificial Neural Network
Table 4.6: Generalized Subject-Partitioned Ternary Classification
Algorithm (Featurespace)   Classes       Dimensions   ACC    F1     F2
lSVM                       0 vs 1 vs 2   32           .331   .322   .328
LR                         0 vs 1 vs 2   32           .333   .327   .328
kNN                        0 vs 1 vs 2   32           .363   .358   .360
rbSVM                      0 vs 1 vs 2   32           .355   .206   .260
ANN                        0 vs 1 vs 2   32           .359   .348   .351
lSVM (PCA)                 0 vs 1 vs 2   3            .333   .326   .330
LR (PCA)                   0 vs 1 vs 2   3            .332   .326   .330
kNN (PCA)                  0 vs 1 vs 2   3            .394   .349   .351
rbSVM (PCA)                0 vs 1 vs 2   3            .378   .375   .375
ANN (PCA)                  0 vs 1 vs 2   3            .359   .312   .316
lSVM (LDA)                 0 vs 1 vs 2   2            .325   .313   .320
LR (LDA)                   0 vs 1 vs 2   2            .333   .327   .330
kNN (LDA)                  0 vs 1 vs 2   2            .378   .367   .371
rbSVM (LDA)                0 vs 1 vs 2   2            .281   .270   .271
ANN (LDA)                  0 vs 1 vs 2   2            .333   .323   .328
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score,
lSVM = Linear SVM, LR = Logistic Regression, kNN =
k-Nearest Neighbors, rbSVM = Radial Basis SVM, ANN
= Artificial Neural Network
4.6.3 Generalized Time-Partitioned Performance
In this grouping scheme, models are trained and tested on data from the same
participants. It should be noted that the 5-fold partitioning scheme ensures that
no data leakage takes place between the training and validation sets, so that a true
out-of-sample performance evaluation is presented. Evaluation results using the Accuracy,
F1 and F2 metrics are provided for the original dataset as well as for the dataset with
the proposed feature processing methodology.
32-Dimensional Featurespace
The complete set of Absolute, Relative and β-Relative features across the two frontal sites
are used for the evaluation. A comparison between the original and feature processed
datasets is provided in Table 4.7 and 4.8.
The following results are achieved in the binary classification task between Low
Workload and High Workload conditions:
1. rbSVM models achieve the best accuracy of .675 in the original 32-featurespace.
With the proposed feature-processed dataset, an improved accuracy of .864 is
achieved using a single-layer ANN (100 hidden nodes).
2. In the reduced featurespace using PCA with 3 features, no significant performance
difference is observed in the original dataset compared to the original 32-featurespace.
rbSVM achieves the best performance in the original dataset with an accuracy of
.671 (similar to .675 in the 32-featurespace). However, in the feature-processed
dataset, severe performance degradation takes place when compared to the
feature-processed dataset in the 32-featurespace. The best performance, an accuracy
of .623, is achieved using kNN with 10 nearest neighbours. This is a stark reduction
from the .864 achieved in the 32-dimensional featurespace. PCA-based
dimensionality reduction is therefore not recommended for the proposed feature
processing methodology. However, dimensionality reduction using PCA in the original
space achieved essentially the same performance as the 32-featurespace.
3. In the 1-dimensional LDA space, performance degradation was observed in both
datasets. The best accuracy of .579 was achieved using ANN in the original dataset.
In the proposed feature-processed dataset, kNN achieved the top accuracy of
.678; however, this is a significant degradation from the accuracy of .864 using all
32 features. LDA outperforms PCA-based dimensionality reduction in the
feature-processed dataset.
4. In general, across both datasets, more complex classifiers such as kNN, ANN and
rbSVM performed significantly better than the linear models, lSVM and Logistic
Regression. This indicates that the datasets are not inherently linearly separable
and that more complex models are required for acceptable performance.
The following results are achieved for the ternary classification task,
1. Using all 32-features in the original dataset, an accuracy of .619 is achieved using
rbSVM. In the proposed feature-processed dataset, an accuracy of .790 is achieved
using ANN which is significantly superior to the performance of all other models. In
general, all models generated in the feature-processed dataset perform superiority
compared to the original dataset.
2. In the 3-dimensional PCA space, a slight performance degradation takes places in
the original dataset evaluation. Top performance in the PCA space is an accuracy
Chapter 4. Estimating Cognitive Load with Wireless EEG 90
of .568 using an ANN model compared to .619 using all 32-features. In the proposed
feature processed dataset with PCA based dimensionality reduction, best accuracy
of .490 is achieved using ANN. This is a severe degradation compared to the .790
achieved using ANN in the 32-feature space. PCA is recommended for use in the
original dataset, however, severe performance degradation is observed in the feature
processed dataset.
3. In the 2-dimensional LDA space, a performance degradation takes place in both
datasets. The best performance of .445 is achieved using ANN in the original dataset.
This is a significant degradation from the accuracy of .619 using all 32 features
with the same dataset. In the proposed feature-processed dataset, a performance
degradation is also observed, with a best performance of .540 (rbSVM) compared
to .790 using all 32 features. LDA outperforms PCA in the feature-processed
dataset and is recommended if dimensionality reduction is essential for real-time
performance and training requirements.
4. Once again, non-linear classifiers outperform the linear classifiers, which suggests
that the underlying patterns of the dataset are non-linear.
Absolute Power Featurespace
Table 4.9 showcases the performance of the individual Absolute Power features in the
feature-processed datasets. Because each of these feature-sets is already low
dimensional (2 dimensions across the 2 sites), dimensionality reduction is not performed
and only lSVM and kNN are analyzed for performance metrics. The following is observed:
1. High Frequency Absolute Bands β and γ exhibit greater discriminant behavior
across both binary and ternary classification tasks compared to lower frequency
bands.
2. kNN outperforms the Linear SVM classifier across both classification tasks.
Figure 4.17: Generalized Time-Partitioned ternary classification
Figure 4.18: Generalized Time-Partitioned binary classification
Table 4.7: Time-Partitioned Binary Classification using 32-Features
Algorithm Classes Original (ACC F1 F2) Feature-Processed (ACC F1 F2)
lSVM 0 vs 2 .543 .539 .595 .677 .676 .676
LR 0 vs 2 .542 .541 .541 .681 .681 .681
kNN 0 vs 2 .675 .675 .675 .797 .797 .797
rbSVM 0 vs 2 .659 .653 .654 .771 .726 .739
ANN 0 vs 2 .649 .649 .649 .864 .864 .864
lSVM(PCA) 0 vs 2 .542 .536 .538 .562 .546 .553
LR(PCA) 0 vs 2 .542 .542 .542 .564 .552 .557
kNN(PCA) 0 vs 2 .663 .663 .663 .623 .623 .633
rbSVM(PCA) 0 vs 2 .671 .671 .671 .619 .617 .617
ANN(PCA) 0 vs 2 .625 .624 .624 .592 .592 .592
lSVM(LDA) 0 vs 2 .541 .539 .539 .677 .678 .677
LR(LDA) 0 vs 2 .542 .541 .541 .678 .678 .677
kNN(LDA) 0 vs 2 .573 .572 .572 .678 .677 .678
rbSVM(LDA) 0 vs 2 .577 .572 .572 .677 .676 .676
ANN(LDA) 0 vs 2 .579 .578 .578 .671 .670 .670
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM,
LR = Logistic Regression, kNN = k-Nearest Neighbors, rbSVM = Radial Basis
Function SVM, ANN = Artificial Neural Network
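As a minimal sketch of how the three reported metrics relate, the following Python fragment computes accuracy and the F-beta score (F1 for beta = 1, F2 for beta = 2) with respect to a single positive class. The label values, the choice of positive class and the function names are illustrative assumptions; how the scores are averaged across classes in the ternary task is not stated in this section and is not reproduced here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f_beta(y_true, y_pred, positive, beta=1.0):
    """F-beta score for one positive class; beta > 1 weights recall more heavily."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical labels: 0 = 0-back, 2 = 2-back ("0 vs 2" binary task).
y_true = [2, 2, 2, 2, 0, 0, 0, 0]
y_pred = [2, 2, 0, 0, 2, 0, 0, 0]

acc = accuracy(y_true, y_pred)             # 5 of 8 correct
f1 = f_beta(y_true, y_pred, 2, beta=1.0)   # balances precision and recall
f2 = f_beta(y_true, y_pred, 2, beta=2.0)   # penalizes missed 2-back windows more
```

In this toy case recall (2/4) is lower than precision (2/3), so F2 falls below F1; when the three columns of a table row are nearly identical, precision and recall are close to each other.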
Table 4.8: Time-Partitioned Ternary Classification using 32-Features
Algorithm Classes Original (ACC F1 F2) Feature-Processed (ACC F1 F2)
lSVM 0 vs 1 vs 2 .411 .404 .407 .533 .533 .533
LR 0 vs 1 vs 2 .412 .403 .407 .520 .520 .520
kNN 0 vs 1 vs 2 .589 .589 .589 .692 .692 .691
rbSVM 0 vs 1 vs 2 .619 .619 .618 .661 .634 .629
ANN 0 vs 1 vs 2 .612 .611 .612 .790 .790 .790
lSVM(PCA) 0 vs 1 vs 2 .367 .335 .350 .373 .372 .372
LR(PCA) 0 vs 1 vs 2 .362 .327 .343 .364 .364 .364
kNN(PCA) 0 vs 1 vs 2 .540 .540 .540 .465 .465 .465
rbSVM(PCA) 0 vs 1 vs 2 .560 .560 .560 .455 .456 .456
ANN(PCA) 0 vs 1 vs 2 .568 .568 .568 .490 .487 .487
lSVM(LDA) 0 vs 1 vs 2 .378 .367 .378 .533 .532 .532
LR(LDA) 0 vs 1 vs 2 .381 .373 .377 .532 .531 .531
kNN(LDA) 0 vs 1 vs 2 .376 .374 .378 .522 .522 .522
rbSVM(LDA) 0 vs 1 vs 2 .414 .12 .411 .540 .540 .540
ANN(LDA) 0 vs 1 vs 2 .445 .445 .445 .527 .527 .527
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM,
LR = Logistic Regression, kNN = k-Nearest Neighbors, rbSVM = Radial Basis
Function SVM, ANN = Artificial Neural Network
3. The Absolute γ Power band exhibits the best performance, with .436 accuracy for the
ternary and .623 accuracy for the binary classification task (kNN, Table 4.9). Both
results are significantly better than guess performance, making this band a strong
candidate as an isolated feature.
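The comparison against "guess performance" can be made explicit with a short sketch. It assumes balanced classes, so that uniform random guessing yields an expected accuracy of 1/3 for the ternary task and 1/2 for the binary task; the class balance is an assumption here, and the function names are illustrative.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes (assumption)."""
    return 1.0 / num_classes


def margin_over_chance(accuracy, num_classes):
    """Absolute accuracy gain over the balanced-class guess level."""
    return accuracy - chance_accuracy(num_classes)


# Best single Absolute gamma accuracies reported for kNN in Table 4.9.
ternary_margin = margin_over_chance(0.436, 3)  # about 0.10 above the 1/3 guess level
binary_margin = margin_over_chance(0.623, 2)   # about 0.12 above the 1/2 guess level
```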
Relative Power Featurespace
Table 4.10 showcases the performance of the individual Relative Power features in the
feature-processed datasets. Because each of these feature-sets is already low
dimensional (2 dimensions across the 2 sites), dimensionality reduction is not performed
and only lSVM and kNN are analyzed for performance metrics. The following is observed:
1. No significant difference in performance is observed between the lower and higher
frequency bands in the relative power space.
2. kNN and lSVM achieve similar performance across both classification tasks.
3. The Relative βγ feature exhibits the best ternary accuracy (.371) and the Relative γ
Power band the best binary accuracy (.554), both using kNN (Table 4.10). This is around
the guess performance for both classification tasks.
4. Single Relative High Band features exhibit inferior performance compared to single
Absolute High Band features.
β-Relative Power Featurespace
Table 4.11 showcases the performance of the individual β-Relative Power features in the
feature-processed datasets. Because each of these feature-sets is already low
dimensional (2 dimensions across the 2 sites), dimensionality reduction is not performed
and only lSVM and kNN are analyzed for performance metrics. The following is observed:
Table 4.9: Time-Partitioned Classification using Single Absolute Features
Feature Classes lSVM (ACC F1 F2) kNN (ACC F1 F2)
Absolute δ 0 vs 1 vs 2 .352 .353 .353 .352 .351 .352
Absolute θ 0 vs 1 vs 2 .345 .342 .341 .342 .341 .341
Absolute α 0 vs 1 vs 2 .362 .331 .332 .356 .356 .356
Absolute β 0 vs 1 vs 2 .380 .323 .355 .424 .423 .422
Absolute γ 0 vs 1 vs 2 .375 .309 .344 .436 .434 .437
Absolute βγ 0 vs 1 vs 2 .375 .359 .367 .372 .369 .370
Absolute δ 0 vs 2 .525 .522 .523 .525 .525 .525
Absolute θ 0 vs 2 .514 .512 .513 .512 .511 .511
Absolute α 0 vs 2 .540 .539 .541 .528 .527 .527
Absolute β 0 vs 2 .573 .572 .572 .599 .599 .599
Absolute γ 0 vs 2 .566 .564 .563 .623 .622 .621
Absolute βγ 0 vs 2 .562 .562 .562 .607 .607 .604
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM =
Linear SVM, kNN = k-Nearest Neighbors
Table 4.10: Time-Partitioned Classification using Single Relative Features
Feature Classes lSVM (ACC F1 F2) kNN (ACC F1 F2)
Relative δ 0 vs 1 vs 2 .341 .300 .316 .354 .354 .355
Relative θ 0 vs 1 vs 2 .362 .323 .341 .357 .357 .357
Relative α 0 vs 1 vs 2 .340 .305 .319 .354 .354 .354
Relative β 0 vs 1 vs 2 .335 .306 .318 .346 .347 .345
Relative γ 0 vs 1 vs 2 .356 .330 .340 .366 .367 .366
Relative βγ 0 vs 1 vs 2 .343 .342 .342 .371 .370 .372
Relative δ 0 vs 2 .513 .492 .491 .539 .538 .538
Relative θ 0 vs 2 .537 .525 .529 .533 .534 .533
Relative α 0 vs 2 .521 .459 .484 .514 .514 .514
Relative β 0 vs 2 .517 .512 .514 .526 .525 .525
Relative γ 0 vs 2 .537 .526 .530 .554 .554 .554
Relative βγ 0 vs 2 .525 .526 .522 .542 .541 .544
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM =
Linear SVM, kNN = k-Nearest Neighbors
Table 4.11: Time-Partitioned Classification using Single β-Relative Features
Feature Classes lSVM (ACC F1 F2) kNN (ACC F1 F2)
(θ+α)/β 0 vs 1 vs 2 .342 .344 .346 .355 .355 .356
α/β 0 vs 1 vs 2 .355 .354 .355 .355 .356 .358
(θ+α)/(α+β) 0 vs 1 vs 2 .347 .348 .349 .361 .364 .365
θ/β 0 vs 1 vs 2 .354 .352 .352 .363 .362 .367
(θ+α)/β 0 vs 1 .530 .524 .526 .548 .548 .547
α/β 0 vs 1 .528 .521 .523 .541 .541 .541
(θ+α)/(α+β) 0 vs 1 .535 .532 .533 .541 .542 .541
θ/β 0 vs 1 .534 .526 .528 .544 .544 .544
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM =
Linear SVM, kNN = k-Nearest Neighbors
1. No significant difference in performance is observed between the β-Relative power
features, and only around guess performance is achieved using any of the individual
features.
2. kNN and lSVM achieve similar performance across both classification tasks.
3. (θ + α) / β exhibits the best performance for the binary classification task (.548), while
θ / β exhibits the best performance for the ternary task (.363). This is around guess
performance for both classification tasks.
4. Single β-Relative features exhibit inferior performance compared to single Absolute
High Band features.
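To make the three single-feature families concrete, the sketch below derives relative powers and the four β-relative ratios listed in Table 4.11 from a set of absolute band powers at one site. The numeric band powers are invented for illustration, and normalizing relative power by the sum over the five listed bands is an assumption; the definitions used in this study are those given in the feature extraction description.

```python
BANDS = ("delta", "theta", "alpha", "beta", "gamma")


def relative_powers(absolute):
    """Each band's share of the total power over the five bands (assumed definition)."""
    total = sum(absolute[band] for band in BANDS)
    return {band: absolute[band] / total for band in BANDS}


def beta_relative_ratios(absolute):
    """The four ratio features named in Table 4.11, computed from absolute powers."""
    theta, alpha, beta = absolute["theta"], absolute["alpha"], absolute["beta"]
    return {
        "(theta+alpha)/beta": (theta + alpha) / beta,
        "alpha/beta": alpha / beta,
        "(theta+alpha)/(alpha+beta)": (theta + alpha) / (alpha + beta),
        "theta/beta": theta / beta,
    }


# Hypothetical absolute band powers for a single frontal site (arbitrary units).
site_powers = {"delta": 4.0, "theta": 2.0, "alpha": 2.0, "beta": 1.0, "gamma": 1.0}

relative = relative_powers(site_powers)
ratios = beta_relative_ratios(site_powers)
```

Computing the four ratios at each of the two sites gives the 8 dimensions of the β-Relative featurespace; evaluating a single feature at both sites gives the 2-dimensional inputs used in Tables 4.9 to 4.11.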
Figure 4.19: Generalized Time-Partitioned ternary classification (Individual Features)
Figure 4.20: Generalized Time-Partitioned binary classification (Individual Features)
4.6.4 Grouped Power Spectral Sub-Features
Table 4.12 and Table 4.13 show the group-wise comparison between the Absolute, Relative
and β-Relative datasets after the proposed feature-processing mechanism. The
Absolute and Relative Featurespaces have a dimensionality of 12 across the two frontal
sites, while the β-Relative Featurespace has a dimensionality of 8. Each of these three
feature-processed datasets also undergoes dimensionality reduction via PCA and LDA,
generating sub-datasets in the PCA and LDA spaces. The resulting 9 datasets are compared
across the two classification tasks to identify the optimal performing featurespace.
The following discussion is presented for the Ternary Classification task:
1. In the original featurespace, Relative Power bands achieve the best performance
at an accuracy of .565 using rbSVM models. This is followed by the β-Relative
bands at an accuracy of .523 using ANN. In general, the relative band featurespaces
outperform the absolute band featurespace. Non-linear classifiers significantly
outperform the linear classifiers across all three featurespaces.
2. In the PCA space with 3 dimensions explaining .95 of the variance, the Relative
Featurespace outperforms both the Absolute and β-Relative featurespaces with a best
accuracy of .539 using rbSVM models (Table 4.12). This is a slight decrease in
performance from .565 when using all 12 features. Across all 3 featurespaces,
PCA-induced dimensionality reduction lowers performance. The Absolute Power
Featurespace is the worst performing featurespace.
3. In the LDA space with 2 dimensions, the β-Relative featurespace achieves the best
accuracy of .423 using kNN with 5 nearest neighbors. The Relative featurespace
achieves a very similar accuracy of .420. The Absolute featurespace achieves the
worst performance, as it does in the original and PCA spaces. The
performance of the LDA datasets is significantly worse than the performance when
using the PCA or original featurespace datasets. This points towards the inherent
non-linearity present in the classification tasks for all datasets.
4. ANN and rbSVM models generally outperform the lSVM and kNN models across the
datasets.
5. All of the generated datasets produce significantly better than guess performance
across all featurespaces using linear or complex classification models.
6. Although the Relative sub-featurespace achieves the best performance (.565) when
compared to other sub-featurespaces, it should be noted that, using the combined
32-Feature space, a best accuracy of .790 is achieved using ANN. However,
applying dimensionality reduction to the original 32-feature space achieves
inferior performance compared to applying dimensionality reduction to the reduced
featurespaces.
The following discussion is presented for the Binary Classification task:
1. In the original featurespace, Relative Power bands achieve the best performance at
an accuracy of .683 using rbSVM models. This is followed by the β-Relative bands
at an accuracy of .672 using rbSVM (Table 4.13). In general, the relative band
featurespaces outperform the absolute band featurespace; however, the differences
in performance between the Absolute, Relative and β-Relative featurespaces are not
significant.
2. In the PCA space with 3 dimensions explaining .95 of the variance, the Relative
Featurespace outperforms both the Absolute and β-Relative featurespaces with a best
accuracy of .672 using rbSVM models (Table 4.13). This is a slight decrease in
performance from .683 when using all 12 features. Across all 3 featurespaces,
PCA-induced dimensionality reduction causes only an insignificant reduction in
performance.
Figure 4.21: Generalized Time-Partitioned ternary classification (Sub-Features)
3. In the LDA space with 2-dimensions, the Relative featurespace achieves the best
accuracy of .621 using ANN (Table 4.13). The Absolute featurespace achieves the worst
performance, with a best accuracy of .583. The performance of the LDA datasets is
significantly worse than the performance when using the PCA or original featurespace
datasets. This points towards the inherent non-linearity present in the classification
tasks for all datasets.
4. kNN, ANN and rbSVM models generally outperform the lSVM models across the
datasets.
5. All of the generated datasets produce significantly better than guess performance
across all featurespaces using linear or complex classification models.
6. Although the Relative sub-featurespace achieves the best performance (.683) when
compared to other sub-featurespaces, it should be noted that, using the combined
32-Feature space, a best accuracy of .864 is achieved using ANN. However,
applying dimensionality reduction to the original 32-feature space achieves
inferior performance compared to applying dimensionality reduction to the reduced
featurespaces.
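The phrase "3 dimensions explaining .95 of variance" corresponds to keeping the smallest number of principal components whose cumulative explained-variance ratio reaches the target. A minimal sketch of that selection rule follows; the eigenvalues are invented for illustration and are not taken from the eDREAM data, and the function name is hypothetical.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical covariance eigenvalues of a 6-dimensional featurespace.
eigenvalues = [6.0, 3.0, 0.6, 0.2, 0.1, 0.1]

k = components_for_variance(eigenvalues, target=0.95)  # first 3 components carry 0.96
```

The same rule applied to a featurespace whose variance is spread more evenly would retain more components, which is one reason a fixed variance target can behave differently across the original and feature-processed datasets.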
Table 4.12: Time-Partitioned Ternary Classification using Sub-Features
Algorithm Classes Absolute (ACC F1 F2) Relative (ACC F1 F2) β-Relative (ACC F1 F2)
lSVM 0 vs 1 vs 2 .407 .405 .406 .420 .418 .419 .391 .391 .391
kNN 0 vs 1 vs 2 .496 .495 .495 .550 .550 .550 .444 .444 .444
rbSVM 0 vs 1 vs 2 .498 .498 .498 .565 .565 .564 .501 .502 .503
ANN 0 vs 1 vs 2 .502 .498 .503 .558 .558 .558 .523 .528 .528
lSVM(PCA) 0 vs 1 vs 2 .389 .387 .389 .374 .369 .371 .363 .344 .353
kNN(PCA) 0 vs 1 vs 2 .438 .438 .438 .530 .529 .531 .480 .479 .479
rbSVM(PCA) 0 vs 1 vs 2 .438 .440 .438 .539 .538 .538 .464 .464 .465
ANN(PCA) 0 vs 1 vs 2 .440 .438 .440 .512 .512 .512 .449 .448 .448
lSVM(LDA) 0 vs 1 vs 2 .400 .398 .399 .418 .414 .415 .394 .376 .384
kNN(LDA) 0 vs 1 vs 2 .388 .388 .388 .402 .401 .401 .423 .425 .425
rbSVM(LDA) 0 vs 1 vs 2 .409 .406 .407 .417 .415 .413 .419 .418 .418
ANN(LDA) 0 vs 1 vs 2 .411 .406 .411 .420 .417 .417 .421 .417 .418
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM,
kNN = k-Nearest Neighbors, rbSVM = Radial Basis Function SVM,
ANN = Artificial Neural Network
Table 4.13: Time-Partitioned Binary Classification using Sub-Features
Algorithm Classes Absolute (ACC F1 F2) Relative (ACC F1 F2) β-Relative (ACC F1 F2)
lSVM 0 vs 2 .550 .550 .550 .598 .593 .594 .602 .597 .598
kNN 0 vs 2 .643 .643 .643 .670 .670 .670 .606 .606 .606
rbSVM 0 vs 2 .652 .650 .650 .683 .683 .683 .672 .671 .670
ANN 0 vs 2 .645 .645 .645 .657 .656 .656 .671 .671 .671
lSVM(PCA) 0 vs 2 .562 .562 .562 .586 .582 .583 .553 .547 .549
kNN(PCA) 0 vs 2 .637 .637 .637 .671 .671 .671 .638 .638 .638
rbSVM(PCA) 0 vs 2 .584 .583 .583 .672 .671 .671 .599 .597 .598
ANN(PCA) 0 vs 2 .621 .620 .620 .665 .663 .663 .607 .589 .596
lSVM(LDA) 0 vs 2 .552 .550 .551 .593 .584 .586 .602 .598 .599
kNN(LDA) 0 vs 2 .546 .544 .544 .585 .581 .582 .585 .581 .582
rbSVM(LDA) 0 vs 2 .556 .551 .552 .589 .581 .583 .565 .571 .568
ANN(LDA) 0 vs 2 .583 .576 .578 .621 .611 .614 .581 .577 .578
Note: ACC = accuracy, F1 = F1-score, F2 = F2-score, lSVM = Linear SVM,
kNN = k-Nearest Neighbors, rbSVM = Radial Basis Function SVM,
ANN = Artificial Neural Network
Figure 4.22: Generalized Time-Partitioned binary classification (Sub-Features)
4.7 Discussion
A variety of featurespaces and datasets have been analyzed and the results are now
summarized and used to answer: What are the optimal feature(s) or featurespace(s) that
allow for the maximal discriminant behavior between the three cognitive loads?
Individualized Subject-Specific Performance
Linear SVM is able to satisfactorily discriminate in the original and feature-processed
individual datasets. This finding points towards an inherent linear separability when
individual participants are trained and tested on their own data. In the analysis, the
combined featurespace (absolute, relative, β-relative) as well as the grouped sub-spaces
are compared.
1. By applying Feature Processing to the datasets, 1.0 accuracy is achieved when
using 32 dimensions or the 1-dimensional LDA space for both tasks. This outperforms
the results obtained in the original datasets.
2. A downside to the feature processing methodology for individualized datasets is the
requirement for a calibration procedure before each drive, since the standardization
and normalization statistics need to be computed for each drive and then fed into
the trained model for inference. However, this is a design tradeoff that is up to the
system designer to make if the distinctly high performance is desired.
3. While dimensionality reduction via LDA into 1 or 2 dimensions performs as well
as the 32-dimensional space, using the Absolute Power Bands featurespace also
achieves .900 accuracy. An advantage of this approach in a real-time monitoring
scenario is that no real-time transformation from a multi-dimensional to a low
dimensional space is required via LDA. Absolute Power bands showcased the best
performance among the three sub-featuresets, followed by Relative Power Bands
and lastly the β-Relative Features.
Top Featurespaces: 32-dimensional Space, LDA space applied to 32-dimensional
space, 12-Absolute Power Featurespace
Top Learning Algorithm: Simple Linear Classifiers sufficient. Tested on lSVM
Generalized Subject-Partitioned Classification
The Subject-Partitioned scheme, where participants are tested on models generated from
other participants, is the most challenging to model. Around guess performance is
achieved for all featurespaces, even with feature processing and dimensionality
reduction techniques. A subject-partitioned scheme is perhaps the most practical system
design methodology, as prediction models may be pre-trained on anonymous subjects and an
elegant out-of-box experience may be offered to a new test user. However, using a low
channel EEG sensor such as a Muse, this is not a feasible solution as only around guess
performance is achieved.
Top Featurespaces (in order of ranking): PCA Applied to 32 Dimensional Space
Top Learning Algorithm: kNN with 50 nearest neighbours
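The difference between the two generalized partitioning schemes can be sketched as follows. The sample layout (a `subject` id and a window index `t`), the chronological 80/20 cut and the function names are assumptions made for illustration; the evaluation in this study uses k-Fold cross validation, which is not reproduced here. The point of the sketch is only that a subject-partitioned split keeps test participants entirely unseen, while a time-partitioned split places every participant in both the training and testing sets.

```python
def subject_partitioned_split(samples, test_subjects):
    """Hold out whole participants: no test subject contributes training data."""
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


def time_partitioned_split(samples, train_fraction=0.8):
    """Split each participant's own windows so that every participant appears
    in both sets (chronological cut assumed for illustration)."""
    by_subject = {}
    for s in samples:
        by_subject.setdefault(s["subject"], []).append(s)
    train, test = [], []
    for subject_samples in by_subject.values():
        ordered = sorted(subject_samples, key=lambda s: s["t"])
        cut = int(len(ordered) * train_fraction)
        train.extend(ordered[:cut])
        test.extend(ordered[cut:])
    return train, test


# Hypothetical data: 3 participants with 10 feature windows each.
samples = [{"subject": sid, "t": t} for sid in ("s1", "s2", "s3") for t in range(10)]

sp_train, sp_test = subject_partitioned_split(samples, test_subjects={"s3"})
tp_train, tp_test = time_partitioned_split(samples, train_fraction=0.8)
```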
Generalized Time-Partitioned Classification
The following list identifies the top features and featurespaces in the time-partitioned
classification task. The results are comparable to prior studies and it can be deduced
that, by training and testing on the same participants, models can be practically used
towards automated driver cognitive modeling.
1. In the original dataset with 32 features, kNN, rbSVM and ANN all outperform the
Linear SVM and Logistic Regression models.
Top Featurespaces: 32-Features, 3-Dimensional PCA
Top Learning Algorithms: ANN, PCA, kNN10
2. The Feature Processed dataset achieves .86 and .79 accuracy for the binary and
ternary classification tasks. This is an improvement of approximately .19 and .17 in
accuracy (19 and 17 percentage points) over the best results in the original dataset
(.675 and .619 in Tables 4.7 and 4.8).
Top Featurespaces: 32-Features
Top Learning Algorithms: ANN
3. Analyzing the results of the Absolute, Relative and β-Relative Sub-Featurespaces,
the Relative Power Bands outperform the other sub-featurespaces.
Top Featurespaces: Relative Power (Original, PCA and LDA)
Top Learning Algorithms: ANN, kNN10, rbSVM
4. The top 5 individual features are listed in order of ranking.
Top Featurespaces: Absolute γ, Absolute β, Absolute βγ, Relative γ, Relative
β, α, αθ
Top Learning Algorithms: kNN
4.8 Chapter Summary
In this chapter, the experiment and results associated with generating statistical
learning models to estimate driver cognitive workload are presented. First, the
experiment methodology is outlined, followed by an explanation of how the various
datasets are generated by adapting the featurespaces and applying Feature Processing.
The Feature Processing methodology, consisting of a sliding window, standardization
and normalization process, is also described. Next, the machine learning
algorithms are described and the performance evaluation metrics are established.
Finally, individualized and generalized subject-partitioned and time-partitioned datasets
are evaluated and a myriad of simulation results are presented to describe the top
feature(s) and featurespace(s). Individual High Frequency Band features (β and γ) show
superior performance in discriminating between the three workloads, the Relative Power
sub-feature-set showcases the best performance as a sub-group, and the overall best
performance is achieved by combining the sub-featurespaces and modeling with ANN.
Individual models generate the best results using a simple linear classifier, followed by
the Time-Partitioned models using non-linear classifiers. Subject-Partitioned models are
very difficult to model.
Chapter 5
Conclusion
The goal of this study is the estimation of Driver Cognitive Workload
using a Low Channel EEG modality. To that end, the eDREAM dataset, consisting of
labeled EEG data from a Low Channel wireless headset, is used as the system modality
and a system design consisting of Feature Extraction, Feature Processing and
Classification is implemented. Through the generation of a system pipeline, this study
showcases the details and technicalities of each stage and its effect on the statistical
learning performance achieved in estimating the driver cognitive workload. As a result,
the top feature(s) and featureset(s) are identified as the top candidates for
this classification challenge. It is shown that, using a Low Channel EEG modality,
it is possible to use the information and biophysiological responses of the Pre-Frontal
Cortex to capture, process and extract features that are able to successfully discriminate
between the granular cognitive workloads induced via the n-back task.
In Chapter 2, attention is paid to the modeling of the cognitive workload and
an overview is given of the features present in the eDREAM EEG modality. The
Feature Extraction process and the generation of the Spectral Power
Features are described in detail. A top-level statistical summary of the data is
presented and the feature trends across varying cognitive loads are observed.
Dimensionality Reduction and its advantages for visualization and performance in
Machine Learning systems are discussed. PCA and LDA are used as the dimensionality
reduction techniques in this work. Chapter 3 reviews the relevant works used for
understanding the area, and two state-of-the-art studies are followed closely in this
study. Finally, a data-mining approach of feature selection based on classification
performance across a breadth of featurespaces is performed in Chapter 4. Chapter 4
generates multi-dimensional individualized and generalized datasets across 28 subjects
to carry out the feature selection process. A feature processing methodology is
implemented using windowing, standardization and normalization schemes. The empirical
advantages of such a scheme are demonstrated across individualized and time-partitioned
datasets. Finally, datasets from individualized, time-partitioned and subject-partitioned
schemes are generated and, using a k-Fold cross validation methodology, are evaluated
with the goal of identifying the top featurespace(s) and learning algorithms for the
task. A summary of the top features and the practical implications are discussed.
5.1 Summary of Contributions
This pilot study explores the eDREAM dataset’s EEG modality towards building an
automated driver cognitive workload monitoring system. The sensory and ancillary in-
formation is explored and recommendations are made towards the best methodologies to
use the available data. Generation of additional features from the provided information
is demonstrated. Featurespaces used in literature are generated for this study and an
approach to learn from the myriad of featurespaces is demonstrated.
The primary goal of this study is to understand if the eDREAM EEG modality can be
used for discriminating between Cognitive Workload levels as induced by the secondary
n-back task. This is demonstrated via a bottom-up Data Mining Feature Selection process
where a breadth of datasets with varying featuresets are generated for both binary
and ternary classification tasks. Models are generated and evaluated for individual
features, grouped features and their permutations with the dimensionality reduction
techniques of LDA and PCA. In this dataset, it is shown that the β and γ Absolute and
Relative individual features showcase the highest discriminant behavior in the single
feature domain. The Relative Grouped Bands showcase the highest discriminant behavior
in the grouped domain. The best overall performance is achieved when all 32 features
are used, and an acceptable performance for practical considerations is also achieved
when PCA is applied to this featurespace. Top performance of .79 for the ternary
classification task and .86 for the binary classification task is obtained when using
all 32 features in a feature-processed (standardized and normalized) dataset.
Another contribution is an in-depth comparison of standardization / normalization
techniques (which use subject historical data) for reducing the inter-subject
differences between the participants to improve model generalization. A significant
improvement in performance is observed when the participants undergo a Feature
Processing stage where their historical data is used for calibration
and standardization purposes. The practical implications of using a standardization
process are also discussed, and it is left to the system designer to decide between
higher performance and user experience. Through this process, one of the goals
of this study, generating generalized models across participants, is achieved when using
time-partitioned models. Several data partitioning techniques are also evaluated.
In a subject-partitioned dataset, training and model generation are performed on one
set of participants and testing on a disjoint set of participants. Around guess
performance is observed, which shows that significant inter-individual differences exist
and that, in order to achieve high performance, it is important to have participants be
part of both the training and testing stages. In the time-partitioned validation
methodology, the same participants are part of both the training and testing stages and
much improved performance is achieved.
Finally, an analysis of individualized models with and without feature processing is
performed and the performance advantages of using a feature-processed model are
discussed. The best performance, as expected, is achieved in individualized
feature-processed datasets.
5.2 Future Works
5.2.1 Improvements to Current Work
1. Because data was available for all users, the individual performance of every user
was observed and it was possible to identify users who showcased outlier performance.
In some cases, in order to create stable models, these users were excluded
from the model generation process. This can be considered a form of
data snooping. In order to build generalized models in a practical scenario, it is
essential that outlier participants are studied and used as part of the evaluation, as
such users will inevitably be present in the population.
2. By using standardization and normalization, four pieces of historical information
are required to be stored and therefore before each drive, a calibration process is
required to compute these four parameters. This can be a detriment in driving
scenarios where the user may want to step in and start driving and not undergo a
controlled calibration process. In essence, the requirement of the historical data of
a driver may pose challenges in the real world.
3. A challenge similar to the previous point exists for dimensionality reduction
techniques, where PCA and LDA coefficients need to be stored and applied to the unseen
test data. The decision to use either group or individualized PCA/LDA coefficients
poses challenges in the real world, and the evaluation performance of both
methodologies should be investigated. This study stores a set of group PCA/LDA
coefficients for evaluation, which are applied to each test subject individually.
4. The effect of using differing sliding window schemes should be evaluated. In this
study, sliding windows of size 1-3 seconds are evaluated and, due to the short 75s
n-back regions, larger windows were not evaluated. It would be beneficial to study
the performance as window sizes are increased.
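The calibration cost discussed in items 2 and 4 can be illustrated with a minimal sketch of a sliding-window, standardization and normalization chain for a single feature. The four stored parameters are assumed here to be the calibration mean, standard deviation, minimum and maximum; the text states only that four pieces of historical information are required, so this choice, the windowed mean and all function names are illustrative assumptions rather than the pipeline used in this study.

```python
from statistics import mean, pstdev


def sliding_window_means(values, window, step=1):
    """Smooth a feature stream by averaging over overlapping windows."""
    return [mean(values[i:i + window])
            for i in range(0, len(values) - window + 1, step)]


def calibrate(values):
    """Compute the four per-feature statistics stored for a driver (assumed set)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("calibration data has no variance")
    z = [(v - mu) / sigma for v in values]
    return {"mean": mu, "std": sigma, "min": min(z), "max": max(z)}


def process(values, params):
    """Standardize with the stored mean/std, then min-max normalize the z-scores."""
    z = [(v - params["mean"]) / params["std"] for v in values]
    span = params["max"] - params["min"]
    return [(x - params["min"]) / span for x in z]


# Hypothetical calibration recording and new drive samples for one feature.
calibration = [1.0, 2.0, 3.0, 4.0, 5.0]
params = calibrate(calibration)

smoothed = sliding_window_means([1, 2, 3, 4], window=2)  # three overlapping windows
scaled = process([1.0, 3.0, 5.0], params)                # mapped onto the calibration range
```

Values outside the calibration range map outside [0, 1], which is one way a short or unrepresentative calibration drive could degrade the stored statistics.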
5.2.2 Extension of Work
1. Phase information generated from the FFT transformation can be evaluated in
addition to the amplitude response. In general, more comprehensive temporal-spectral
extraction techniques such as the Wavelet Packet Transform and Empirical
Mode Decomposition can be applied, as in the study by Li et al. [36].
2. Deep Learning methods have become very popular in recent years, and a richer set of
features can be learned iteratively using deep networks such as Recurrent Neural
Networks or Stacked Denoising AutoEncoder networks, as performed by Yin et al.
[58].
3. Fusion techniques that add features from other modalities should also be
investigated. The eDREAM dataset has a myriad of modalities such as video,
eye-tracking and EEG, to which fusion techniques can be applied.
4. The availability of labeled gender and age information may also be used to
investigate the effect of gender and age on model performance. This can allow for
a higher-level human-factors study of the effect of age and gender on
model generation.
Bibliography
[1] “Drowsy driving and automobile crashes: Report and recommendation,” National
Heart, Lung, and Blood Institute, Tech. Rep., 1998.
[2] “National motor vehicle crash causation survey,” US Department of Transportation,
Tech. Rep., 2008.
[3] Learning from Data - A Short Course. AMLBook, 2012.
[4] “Combining eeg with pupilometry to improve cognitive workload detection,”
Physiological Computing, 2015.
[5] “Critical reasons for crashes investigated in the national motor vehicle crash
causation survey,” US Department of Transportation, Tech. Rep., 2015.
[6] (2017) edream dataset. [Online]. Available: http://www.dsp.utoronto.ca/projects/eDREAM/
[7] (2017) Emotiv 5-channel eeg headset. [Online]. Available: http://www.emotiv.com/
[8] (2017) Mobita 32-channel eeg headset. [Online]. Available: http://www.biopac.com/
[9] (2017) Muse 4-channel eeg headset. [Online]. Available:
http://dev.choosemuse.com/tools/available-data
[10] (2018) Tesla and gm self-drive cars involved in road collisions. [Online]. Available:
http://www.bbc.com/news/technology-42801772
[11] K. S. A. Sahayadhas and M. Murugappan, “Detecting driver drowsiness based on
sensors: A review,” Sensors, 2012.
[12] A. S. Aghaei, B. Donmez, C. C. Liu, D. He, G. Liu, K. N. Plataniotis, H.-Y. W.
Chen, and Z. Sojoudi, “Smart driver monitoring: When signal processing meets
human factors: In the driver’s seat,” IEEE Signal Processing Magazine, vol. 33,
no. 6, pp. 35–48, 2016.
[13] H. Almahasneh, W. Chooi, N. Kamel, and A. Malik, “Deep in thought while driving:
An eeg study on driver cognitive distraction,” Transportation Research, vol. 26, pp.
218–226, 2014.
[14] C. Berka, D. Levendowski, M. Lumicao, A. Yau, G. Davis, V. Zivkovic, R. Olmstead,
D. Tremoulet, and P. Craven, “Eeg correlates of task engagement and mental workload
in vigilance, learning and memory tasks,” Aviation, Space and Environmental
Medicine, vol. 78, pp. 234–244, 2007.
[15] A. Berkovich-Ohana, J. Glicksohn, and A. Goldstein, “Mindfulness-induced changes
in gamma band activity - implications for the default mode network, self-reference
and attention,” Clinical Neurophysiology, 2011.
[16] B. Borghetti and C. Rusnock, “Introduction to real-time state assessment,” Air
Force Institute of Technology, USA, Tech. Rep., 2016.
[17] G. Borghini, L. Astolfi, G. Vecchiato, D. Mattia, and F. Babiloni, “Measuring
neurophysiological signals in aircraft pilots and car drivers for the assessment of mental
workload, fatigue and drowsiness,” Neuroscience and Biobehavioral Reviews, vol. 44,
pp. 58–75, 2012.
[18] A. Campagne, T. Pebayle, and A. Muzet, “Correlation between driving errors and
vigilance level: influence of the driver’s age,” Physiology and Behavior, vol. 80, pp.
512–524, 2004.
[19] S. Chandra and S. Sharma, “Workload regulation by sudarshan kriya: an eeg and
ecg perspective,” vol. 4, pp. 13–25, 2017.
[20] C.-C. Chang and C.-J. Lin, “Libsvm: a library for support vector machines,” ACM
Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27,
2011.
[21] D. Yadron and D. Tynan. (2016) Tesla driver dies in first fatal crash while using autopilot
mode. [Online]. Available:
https://www.theguardian.com/technology/2016/jun/30/tesla-autopilot-death-self-driving-car-elon-musk
[22] Y. Dong, Z. Hu, K. Uchimura, and N. Murayama, “Driver inattention monitoring
system for intelligent vehicles: A review,” IEEE Transactions on Intelligent Trans-
portation Systems, vol. 12, no. 2, pp. 596–614, 2011.
[23] A. Gevins and M. Smith, “Monitoring working memory load during computer-based
tasks with eeg pattern recognition methods,” SAM Technology and EEG Systems
Laboratory, Tech. Rep., 1998.
[24] A. Gevins and M. Smith, “Monitoring working memory load during computer-based
tasks with eeg pattern recognition methods,” Human factors and Ergonomics Soci-
ety, vol. 40, no. 1, pp. 78–91, 1998.
[25] M. Gillberg, G. Kecklund, and T. Akerstedt, “Sleepiness and performance of
professional drivers in a truck simulator - comparisons between day and night driving,”
Journal of Sleep Research, vol. 6, pp. 12–15, 1996.
[26] J. L. Harbluk, Y. I. Noy, P. L. Trbovich, and M. Eizenman, “An on-road assessment
of cognitive distraction: Impacts on drivers’ visual behavior and braking perfor-
mance,” Accident Analysis & Prevention, vol. 39, no. 2, pp. 372–379, 2007.
[27] T. Harmony, “The functional significance of delta oscillations in cognitive process-
ing,” Frontiers in Integrative Neuroscience, 2013.
[28] T. Harmony and T. Fernandez, “Eeg delta activity: an indicator of attention to
internal processing during performance of mental tasks,” International Journal of
Psychophysiology, 1996.
[29] S. Hart, “Nasa task load index (tlx),” NASA, Tech. Rep., 1986.
[30] J. Son, H. Oh, and M. Park, “Sensitivity of multiple cognitive workload measures: A field
study considering environmental factors,” Daegu Gyeongbuk Institute of Science
and Technology, Daegu, South Korea, Tech. Rep., 2014.
[31] B. Jap, S. Lal, P. Fischer, and E. Bekiaris, “Using eeg spectral components to assess
algorithms for detecting fatigue,” Expert Systems with Applications, vol. 36, pp.
2352–2359, 2009.
[32] H.-B. Kang, “Various approaches for driver and driving behavior monitoring: A
review,” in Proceedings of the IEEE International Conference on Computer Vision
Workshops, 2013, pp. 616–623.
[33] R. Khushaba, S. Lal, and G. Dissanayake, “Driver drowsiness classification using fuzzy
wavelet-packet-based feature-extraction algorithm,” IEEE Transactions on Biomedical
Engineering, vol. 58, pp. 121–131, 2011.
[34] S. Lal and A. Craig, “Driver fatigue: Eeg and psychological assessment,”
Psychophysiology, vol. 39, pp. 313–321, 2002.
[35] S. Lei and M. Roetting, “Influence of task combination on eeg spectrum modulation
for driver workload estimation,” Berlin Institute of Technology, Tech. Rep., 2011.
[36] D. Li, W. Pedrycz, and N. Pizzi, “Fuzzy wavelet packet based feature extraction
method and its applications to biomedical signal classification,” IEEE Transactions
on Biomedical Engineering, vol. 52, pp. 1132–1139, 2005.
[37] Y. Liang, M. Reyes, and J. Lee, “Real-time detection of driver cognitive distraction
using svm,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, pp.
341–350, 2007.
[38] F. Lin, L. Ko, C. Chuang, T. Su, and C. Lin, “Generalized eeg-based drowsiness
prediction system by using a self-organizing neural fuzzy system,” IEEE Transactions
on Circuits and Systems, vol. 59, pp. 2044–2055, 2012.
[39] C. C. Liu, “Towards practical driver cognitive load detection based on visual atten-
tion information,” 2017.
[40] H. D. Liu, Cheng Chen, B. Donmez, and K. N. Plataniotis, “edream data collection
report,” Tech. Rep., 2016.
[41] M. Lundqvist, P. Herman, and A. Lansner, “Theta and gamma power increases
and alpha/beta power decreases with memory load in an attractor network model,”
Journal of Cognitive Neuroscience, vol. 23, pp. 3008–3020, 2011.
[42] B. Mehler, B. Reimer, and J. Dusek, “Mit agelab delayed digit recall
task (n-back),” Massachusetts Institute of Technology, Cambridge, MA, Tech.
Rep. 2011-3B, 2011. [Online]. Available:
http://agelab.mit.edu/system/files/Mehler_et_al_n-back-white-paper_2011_B.pdf
[43] R. Mitchell. (2018) Tesla crash highlights a problem: When cars are
partly self-driving, humans don’t feel responsible. [Online]. Available:
http://www.latimes.com/business/autos/la-fi-hy-tesla-autopilot-20180125-story.html
[44] G. Nolfe, “Eeg and medicine,” Clinical Neurophysiology, vol. 123, pp. 631–632, 2012.
[45] K. Ryu and R. Myung, “Evaluation of mental workload with a combined measure
based on physiological indices during a dual task of tracking and mental arithmetic,”
International Journal of Industrial Ergonomics, vol. 35, pp. 991–1009, 2005.
[46] E. P. F. Shijing Liu, Chang S. Nam, “Quantitative modeling of user performance in
multitasking environments,” 2018.
[47] M. Smith and A. Gevins, “Monitoring task loading with multivariate eeg mea-
sures during complex forms of human-computer interaction,” Human factors and
Ergonomics Society, vol. 43, no. 3, pp. 366–380, 2001.
[48] ——, “Monitoring task loading with multivariate eeg measures during complex forms
of human-computer interaction,” Brain Research Institute and SAM Technology, San
Francisco, California, Tech. Rep., 2001.
[49] A. Sonnleitner, M. Treder, M. Simon, S. Willmann, A. Ewald, A. Buchner, and
M. Schrauf, “Eeg alpha spindles and prolonged brake reaction times during auditory
distraction in an on-road driving study,” Accident Analysis and Prevention, vol. 62,
pp. 110–118, 2014.
[50] N. Sriraam, T. Padmashri, and U. Maheshwari, “Recognition of wake-sleep stage 1
multichannel eeg patterns using spectral entropy features for drowsiness detection,”
Australasian College of Physical Scientists and Engineers in Medicine, vol. 39, pp.
797–806, 2016.
[51] J. Sweller, “Cognitive load during problem solving: Effects on learning,” Cognitive
science, vol. 12, no. 2, pp. 257–285, 1988.
[52] T. Akerstedt, A. Anund, J. Axelsson, and G. Kecklund, “Subjective sleepiness is a
sensitive indicator of insufficient sleep and impaired waking function,” Journal of Sleep Research, vol. 23, pp.
12–58, 2014.
[53] S. Wang, J. Gwizdka, and W. Chaovalitwongse, “Using wireless eeg signals to as-
sess memory workload in the n-back task,” IEEE Transactions on Human-Machine
Systems, vol. 46, pp. 424–435, 2016.
[54] B. Xie and G. Salvendy, “Review and reappraisal of modelling and predicting mental
workload in single- and multitask environments,” Work and Stress, 2000.
[55] t. S. Y. Zheng, Y.Jie, “Workload functions distribution method: A workload mea-
surement based on pilot’s behaviors,” Shanghai Aircraft Airworthiness Certification
Center of CAAC, Shanghai, People’s Republic of China, Tech. Rep., 2016.
[56] V. Yeo, X. Li, K. Shen, and E. Wilder-Smith, “Can svm be used for automatic eeg
detection of drowsiness during car driving?” Safety Science, vol. 47, pp. 115–124,
2009.
[57] R. M. Yerkes and J. D. Dodson, “The relation of strength of stimulus to rapidity of
habit-formation,” Journal of Comparative Neurology and Psychology, vol. 18, no. 5,
pp. 459–482, 1908.
[58] Z. Yin and J. Zhang, “Cross-session classification of mental workload levels using eeg
and an adaptive deep learning model,” Biomedical Signal Processing and Control,
vol. 33, pp. 30–47, 2017.
[59] M. Ziegler, A. Kraft, M. Krein, L. Lo, B. Hatfield, W. Casebeer, and B. Russel,
“Sensing and assessing cognitive workload across multiple tasks,” Lockheed Martin
Advanced Technology Lab, Arlington, VA, USA, Tech. Rep., 2016.