
LANGUAGE MODEL ASSISTED EEG-BASED BRAIN COMPUTER INTERFACE FOR TYPING

A Dissertation Presented by

Mohammad Moghadamfalahi

to

The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Electrical Engineering

Northeastern University
Boston, Massachusetts

December 2016


Contents

List of Figures

Abstract

1 Introduction

1.1 Noninvasive BBCIs for Communication and Control

1.2 Overview of BCI components

1.2.1 Input modalities to the BCI

1.2.2 Signal Processing and Inference in BCI for Communication

1.2.3 Classification

1.2.4 Output Components

1.3 Manuscript organisation

2 Language-Model Assisted Brain Computer Interface for Typing: A Comparison of Matrix and Rapid Serial Visual Presentation

2.1 Abstract

2.2 Introduction


2.3 General system specifications

2.3.1 Presentation Component

2.3.2 Feature Extraction Component

2.3.3 Decision Making Component

2.3.4 System Operation Modes

2.4 Experimental Results

2.4.1 Experiment

2.4.2 Results

2.5 Conclusion

3 Spatio-Temporal EEG Models for BCIs

3.1 Abstract

3.2 Introduction

3.3 RSVPKeyboard™

3.3.1 Presentation Paradigms

3.3.2 Probabilistic Inference Mechanism

3.3.3 Signal Processing and Feature Extraction

3.4 Signal Models and Covariance Matrix Structures

3.4.1 Linear Forward Model

3.4.2 ARMA model for the multichannel EEG signal

3.5 Covariance Estimation Flip-Flop Algorithm

3.6 Results


3.6.1 Study 1: Analysis of Required Calibration Session Length

3.6.2 Study 2: Effect of different covariance structures on the EEG-based BCI classification performance

3.7 Conclusions

4 Active Recursive Bayesian State Estimation for Multimodal Noninvasive Body/Brain Computer Interface Design

4.1 Noninvasive BBCIs for Communication and Control

4.2 Active Recursive Bayesian State Estimation

4.2.1 Active learning for RBSE

4.2.2 Submodular monotone set functions for set optimization problems

4.2.3 On the objective functions for Query optimization

4.3 Illustrative BCI Design Example

4.3.1 ERP-based BCI for letter-by-letter typing

4.3.2 Decision Making Component

4.3.3 Application of proposed cost function in BCI example

4.3.4 Combinatorial Optimization

4.3.5 Experimental Results and Discussions

5 Conclusion

5.1 Work accomplished

5.2 Suggested future works

A Error-Related Potentials for EEG-based Typing Systems


A.1 Introduction

A.2 Materials

A.3 Method

A.4 Results

A.5 Discussion & Significance


List of Figures

2.1 The in-house BCI block diagram.

2.2 Probabilistic graphical model of the fusion rule.

2.3 Bar charts of average AUC with error bars. Sub-figures (a), (b) and (c) demonstrate the accuracy statistics for each ITI, respectively for the RCP, SCP and RSVP paradigms. Sub-figure (d) reports the AUC statistics for the different presentation paradigms at ITI = 150 ms.

2.4 Average ERP response to target and non-target stimuli, for each presentation paradigm and ITI pair, for user "U8". From top to bottom the ITI increases monotonically.

2.5 Average ERP response to target and non-target stimuli, for each presentation paradigm and ITI pair, for user "U12". From top to bottom the ITI increases monotonically.

2.6 Topography map of (1 − p) resulting from paired t-tests for each channel's AUC between each paradigm pair and across users for ITI = 150 ms. Here red denotes 1 − p = 1 and blue represents 1 − p = 0.

2.7 Topography of channel-based AUCs for each user at ITI = 150 ms.


2.8 Typing speed analysis results. Average number of sequences per (typed) target character (lower means faster typing) and probability of phrase completion (higher means more accuracy) are shown. Simulation results are used to define the shaded 90% confidence area shown. The dashed line shows the expected value from simulation for each variable and the solid line shows actual typing outcomes in a single experimental run that follows.

2.9 Number of sequences utilized by users U7, U3, and U9 to type each target character using the RSVP, SCP and RCP paradigms. Red bars show the sequence counts for epochs that typed a wrong character and yellow bars show the number of sequences used to fix the error before typing the correct target. Green bars show the number of sequences in epochs that resulted in correct selection of the target symbols (lower means faster typing).

2.10 Scatter plot of the average number of sequences for correctly typing a target character. The x-axis shows the mean number of sequences per target character when no language model is used; the y-axis shows the mean number of sequences required per target character when a 6-gram language model is utilized. Each point on the figure shows the average of the mean number of sequences per target from 10 Monte-Carlo simulations. The horizontal extent of each box around a point is the standard deviation of the number of sequences per target character when no language model was used, and the vertical extent is the standard deviation in the presence of the language model.

3.1 Normalized root-MSE and LB as a function of the sample size for three different covariance structures.

3.2 Bar charts of the median area under the receiver operating characteristic (ROC) curve (AUC), with bar range indicating the maximum and minimum value for twelve users, calculated by use of different signal models for every presentation paradigm and ITI combination when the classifiers are trained with all training data.


3.3 The median AUC among twelve BCI users for all ITIs and presentation paradigms as a function of the model order complexity (the number of parameters to be estimated) of signal models with different covariance structures. Complexity numbers 136, 137, 200, 2216 and 524800 correspond to models with KSI, KSAR(1), KST, GKS, and non-structured (NS) covariances, respectively.

3.4 Bar charts of the median area under the receiver operating characteristic (ROC) curve (AUC), with bar range indicating the maximum and minimum value for twelve users, calculated by use of different signal models for every presentation paradigm at ITI = 150 ms and different calibration lengths ({10, 20, 30, 40, 50} sequences) used to train the classifiers.

4.1 Hidden Markov Model of order n (HMM-n).

4.2 PGM of the system during inference cycle k.

4.3 HMM-n while prior states are observed.

4.4 PGM of the system at epoch k.

4.5 PGM of the system at epoch k while multiple sequences are presented.

4.6 Typical BCI block diagram [1].

4.7 Scatter plot of average TTD in minutes from 20 Monte-Carlo simulations for RSVP and ARSVP [2].

4.8 Average probability of phrase completion with 90% confidence intervals for the RSVP and ARSVP paradigms [2].

4.9 Scatter plot of total typing duration of 10 phrases, in minutes, for RCP and ALP.

4.10 Average probability of phrase completion with 90% confidence intervals for RCP and ALP.

4.11 Scatter plot of TTD of 10 phrases, in minutes, for SCP and ASCP.


4.12 Average probability of phrase completion with 90% confidence intervals for SCP and ASCP.

A.1 Average time to successfully complete the typing task for two users.


Acknowledgements

This work would not have been possible without the support of many people who encouraged and helped me during my doctoral program. I was fortunate to have one of the best advisors, both in his scientific abilities and, perhaps more importantly, in being a caring human being. So, I wish to express my gratitude to my advisor, Prof. Deniz Erdogmus, for his continuous support during my Ph.D. study and research. He supported me through the challenging, and sometimes stressful, situations of my lifetime. His positive attitude, understanding, caring, and immense knowledge have helped me a lot to shape my professional career.

A special thanks to Dr. Murat Akcakaya for his constant support and constructive suggestions, which were instrumental in the accomplishment of this work. I would like to thank and acknowledge my dissertation committee members, Prof. Dana Brooks, Prof. Jennifer Dy, and Dr. Murat Akcakaya, for all of their guidance and constructive comments throughout this process.

I thank my family, especially my mom, dad, and sisters Mana and Ghazaleh, who have educated me with lots of affection and taught me to be who I am. They did whatever they could to help me through my scientific (and non-scientific) endeavors.

I would also like to thank my many amazing friends all around the world for helping me during this journey and for creating so many unforgettable memories. From the long list of friends, I would especially like to thank Umut Orhan, Paula Gonzales-navarro, Hooman Nezamfar, Jamshid Sourati, Asieh Ahani, Marzieh Haghighi, Matt Higger, Fernando Quivira, and Bruna Girvent from the cognitive systems lab (CSL), and Ardalan Alizadeh and his wife Ronak Moradi, Mohammadhasan Safaei, and Mohammadreza Doustmohammadi.

I acknowledge and appreciate the Department of Electrical and Computer Engineering at Northeastern University and my advisor for the financial support provided through a research assistantship at CSL.


Abstract

LANGUAGE MODEL ASSISTED EEG-BASED BRAIN COMPUTER INTERFACE FOR TYPING

by

Mohammad Moghadamfalahi

Doctor of Philosophy in Electrical Engineering

Northeastern University, December 2016
Dr. Deniz Erdogmus, Adviser

Brain computer interfaces (BCIs) promise to provide a novel access channel for assistive technologies, including augmentative and alternative communication (AAC) systems, to people with severe speech and physical impairments (SSPI). Research on the subject has been accelerating significantly in the last decade, and the research community has taken great strides towards making BCI-AAC a practical reality for individuals with SSPI. Nevertheless, the end goal has still not been reached, and much work remains to produce real-world-worthy systems that can be comfortably, conveniently, and reliably used by individuals with SSPI with help from their families and caregivers, who will need to maintain, set up, and debug the systems at home.

In an earlier development we introduced a minimally gaze-dependent, noninvasive electroencephalography (EEG) based BCI known as RSVPKeyboard™. To improve the system and address a wider target population, we have added matrix presentation paradigms, which require a higher level of gaze control but are known to provide better performance. In this manuscript, through an experimental study, we assess the speed, recorded signal quality and system accuracy of a language-model-assisted BCI typing system using three different presentation paradigms: a 4 × 7 matrix paradigm of a 28-character alphabet with row-column presentation (RCP) and single character presentation (SCP), and rapid serial visual presentation (RSVP) of the same. Our analyses show that signal quality and classification accuracy are comparable between the two visual stimulus presentation paradigms. In addition, we observe that while the matrix-based paradigm can generally be employed with lower inter-trial-interval (ITI) values, the best presentation paradigm and ITI configuration is user dependent. This potentially warrants offering both presentation paradigms and variable ITI options to users of BCI typing systems.

Multichannel EEG is widely used in noninvasive BCIs for user intent inference. EEG can be assumed to be a Gaussian process with unknown mean and autocovariance, and estimation of these parameters is required for BCI inference. However, the relatively high dimensionality of the EEG feature vectors with respect to the number of labeled observations leads to rank-deficient covariance matrix estimates for multivariate Gaussian class-conditional feature density models. Typically, this problem is tackled by applying regularization to maximum likelihood covariance matrix estimators, resulting in regularized discriminant analysis (RDA) for discriminative dimension reduction.

Here, we build a spatio-temporal signal model for EEG and show that under certain assumptions this model leads to a Kronecker product structure for these covariance matrices. Our underlying hypothesis is that the structure imposed on the covariance matrices will improve estimation accuracy and accordingly will result in BCI performance improvements. Using Cramer-Rao bound analysis on simulated data, we demonstrate that a model with structured covariance matrices achieves the same estimation error as a model with no covariance structure using fewer labeled EEG observations. In practice, this corresponds to shorter calibration sessions, an important benefit for BCI users. Results obtained using EEG data from 12 healthy participants in the context of a language-model-assisted typing BCI show improvement in classification accuracy and reduction in model order compared to an approach that does not impose any structure.

Moreover, presentation paradigm design in current ERP-based typing BCIs typically queries the user with an arbitrary subset or the entire set of characters. However, typing accuracy and typing speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Rather than showing all the possible stimuli or randomly choosing a subset, the developed framework optimally selects a subset prior to presentation based on a query function. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model-assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also utilize standard presentation paradigms, namely the RCP paradigm and random RSVP paradigms. The results show that utilization of active-RBSE can significantly enhance the online performance of the system, both in terms of typing accuracy and speed.


Chapter 1

Introduction

1.1 Noninvasive BBCIs for Communication and Control

The digital divide between healthy individuals and millions of individuals with severe speech and physical impairments (SSPI) is rapidly growing. Individuals with reduced speech and motor ability due to injury or motor neuron diseases, including cerebral palsy (CP), multiple sclerosis (MS), amyotrophic lateral sclerosis (ALS), spinal muscular atrophy (SMA), locked-in syndrome (LIS), spinal cord injury (SCI), stroke, and traumatic brain injury (TBI), currently have several forms of alternative input devices or software solutions that provide access to computer applications. Speech/gesture recognition, on-screen keyboards, touch screens/pads, electronic pointing devices, mouth/head sticks, eye-trackers, and joysticks/trackballs are among the computer access modalities exploited by commercial assistive technology (AT) solutions. These ATs perform well if the user has maintained some form of accurate muscle control, but in late stages of ALS or for people with LIS this assumption does not always hold.

Brain computer interfaces (BCIs) have shown promising capacity to mitigate the dependency on muscle control for reliable performance [3], by inferring user intent from some form of brain activity recording. Brain activity can be captured through different recording techniques such as noninvasive electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), electrocorticography (ECoG), and near infrared spectroscopy (NIRS) [4, 5, 6, 7]. Among these, EEG has been the subject of many studies on developing BCIs, as a safe, cost-effective, and portable solution [3].

In the next section of this chapter, with a focus on a particular class of BCIs, we provide an overview of the different building blocks of systems existing in the field.

1.2 Overview of BCI components

The typical components of a noninvasive BCI system are: (1) a stimulus presentation paradigm (e.g., auditory, visual, tactile), (2) signal acquisition (EEG data or other modalities such as an eye tracker), (3) preprocessing (signal filtering, artifact removal, etc.), (4) dimensionality reduction, (5) EEG evidence (feature extraction), (6) contextual evidence (e.g., a language model or word completion), and (7) joint inference (system decision by classification).

1.2.1 Input modalities to the BCI

EEG-based BCI is one recent development that relies on monitoring the electrical activity of the brain [8], and it is now considered a possible access method for communication and control that allows individuals with SSPI to maintain their daily life activities. EEG-based BCIs have become increasingly popular due to their portability, cost-effectiveness, high temporal resolution, and demonstrated reliability. A number of EEG signals have been used in noninvasive BCIs to detect user intent. Most popularly, BCI systems have exploited:

∙ Auditory and visual event related potentials (A-ERP/V-ERP): In response to infrequent novel/target stimuli, the brain generates a P300 response, a positive deflection in centro-parietal scalp voltage with a typical latency just over 300 ms [9], and other accompanying waves. This natural novelty detection or target matching response of the brain allows designers to detect user intent from EEG signals, using either auditory or visual stimuli to elicit this response.

∙ Volitional cortical potentials (VCP): Volitional synchronization and desynchronization of cortical electrical activity have been utilized in numerous BCI systems that control external devices, including cursors, avatars, and robotic agents, to perform simple activities of daily living, as well as to control typing interfaces for communication.

∙ Steady-state evoked potentials (SSEP): Fluctuating auditory or flickering visual stimuli (following periodic or other structured patterns) elicit steady state auditory/visual evoked potentials (SSAEP/SSVEP) in the auditory and visual cortex areas, respectively. Focusing auditory or visual attention on one of several such stimuli causes temporally matching electrical oscillations in the cortex. Time-frequency features can be analyzed to identify with high accuracy which stimulus the attention is placed on.

In this manuscript, we aim to present the design of a V-ERP BCI for communication through letter-by-letter typing. Hence, throughout the rest of this section we specifically focus on the components of these BCIs.

1.2.1.1 Event Related Potentials

The pioneering example of these systems is the matrix speller of Farwell and Donchin, which demonstrates how to design a presentation paradigm for inducing a P300¹ in response to user intent as a control signal for BCI-based communication [10]. In this study, the subjects observe a 6 × 6 matrix containing the letters of the English alphabet, the numbers from 1 to 9, and a space symbol distributed on the screen. While the user focuses on the intended character, rows and columns of the matrix are flashed randomly. This work led to extensive efforts to design different configurations or algorithms to improve the communication speed and accuracy of the matrix speller, as well as other audio, visual, and tactile stimulus presentation techniques for eliciting P300 responses. In the following, we review some of these stimulus presentation techniques.

Visuospatial Presentation: We categorize different visuospatial presentation techniques into the following groups:

Matrix Presentation: Generally, matrix spellers use an R × C matrix of symbols with R rows and C columns. Traditionally, in these systems each row and column of the matrix is intensified in a pseudo-random fashion, while the participants count the number of highlighted rows or columns (or, in general, subsets) that include the desired symbol. Among all rows and columns in the matrix, only two contain the target symbol; hence it is proposed that they will induce a P300 response. By detecting this signature in the EEG, the BCI system can identify the target letter to enable typing.

The accuracy of BCIs depends highly on the signal-to-noise ratio (SNR). Consequently, due to the low SNR of EEG, matrix speller systems must sacrifice speed by cuing the user with multiple sequences of flashes to achieve acceptable accuracy. It was demonstrated that the matrix speller can achieve 7.8 characters/minute with 80% accuracy, using bootstrapping and averaging of the trials in different sequences [11]. Many signal processing and machine learning techniques have been proposed by researchers in the field to improve the matrix speller performance in terms of speed and accuracy [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34].

¹P300 is an ERP elicited in response to a desired, unpredictable, and rare event. This evidence is characterized by a positive peak around 300 ms after the onset of the desired stimulus.


Considering the target population, matrix-based presentation paradigms might not offer a suitable solution for BCIs, since they perform well in overt attention mode but their performance degrades significantly in covert attention mode [35]. BCI researchers have proposed minimally gaze-dependent stimulus presentation techniques, such as rapid serial visual presentation and balanced-tree visual presentation, to overcome such performance drops.

Rapid Serial Visual Presentation (RSVP): RSVP is a technique in which stimuli are presented one at a time at a fixed location on the screen, in pseudorandom order, with a short time gap in between. Within a sequence of RSVP stimuli, each symbol is shown only once; hence, the user's intended symbol is a rare event, which can induce an ERP containing the P300 wave as a consequence of the target matching process that takes place in the brain. RSVP aims to be less dependent on gaze control by utilizing temporal separation of symbols instead of the spatial separation used in the matrix speller [36, 37, 38, 39, 40].

Usually, the inference speed of RSVP-based BCIs is lower than that of matrix spellers: the binary tree that leads to symbol selections in a matrix speller can reduce the expected number of bits needed to select a symbol (determined by entropy) by highlighting subsets of symbols, while RSVP is constrained to a highly structured right-sided binary tree, which can only offer a larger expected number of bits per symbol. Letter-by-letter typing RSVP-BCIs designed by the Berlin BCI and RSVPKeyboard™ groups have achieved up to 5 characters/minute [36, 37, 38, 40]. Utilization of color cues and language models has offered some enhancements in typing speeds with RSVP [37, 40].

Balanced-Tree Visual Presentation Paradigms: In the balanced-tree visual presentation technique, visual stimuli are distributed spatially into multiple presentation groups with balanced numbers of elements. For example, in a system from Berlin BCI known as Hex-o-Spell, a set of 30 symbols is distributed among 6 presentation groups, each containing 5 symbols. Presentation groups are flashed in a random fashion to induce an ERP in response to the group that contains the intended symbol. Upon selection of a group, the symbols of that set are distributed individually to different presentation groups, typically with one group containing a command symbol for moving back to the first presentation stage. The system then utilizes the same flash paradigm to decide on the user's desired symbol [35, 41]. In a similar system known as Geospell, 12 groups of 6 symbols, corresponding to the rows and columns of a 6 × 6 matrix speller, are arranged in a circular fashion [42, 43]. In another study, these 12 overlapping subsets of symbols are presented in an RSVP manner [44]. In these systems, the intersection of the selected groups gives the desired symbol.

Other Visual Presentation Paradigms: The visual presentation paradigms explained above do not exhaustively cover all the possible presentation techniques that could be (and have been) used in an ERP-based BCI system for communication. Various alternatives have been proposed and tested for limited communication. Here, we categorize systems whose vocabulary extent varies from a few icons all the way down to binary (yes/no) communication as limited communication systems. Examples include: (1) Icon-based limited communication - for example, (i) systems for appliance or gadget control in which icons flash one at a time in sequences of random order [45, 46], and (ii) a system for expressing basic needs and emotions by answering yes/no questions [47]. RSVP iconCHAT (unpublished at the time of submission) is a variation of RSVPKeyboard™ that uses limited-vocabulary icon representations (based on Rupal Patel's iconCHAT system). (2) Cursor control - for example, a system in which four flashing stimuli map to movements of the cursor in one of four directions (up, down, left, right) [48, 49, 50]. Exogenous-icon (four arrows or four icons flashing on the sides of the screen) and endogenous-letter (letters representing directions) paradigms were tested on users with ALS, revealing that the endogenous paradigm provides better performance for a gaze-independent BCI [50]. Qualitatively, results were similar when the signal processing approach was improved [49]. (3) Web browser - for example, (i) the Virtual Keyboard (RoBIK) project, which employs a matrix-speller paradigm to provide the user with different tags that are mapped to elements of the web browser [51]; and (ii) a system that employs a matrix speller paradigm to allow complete keyboard and mouse control to navigate through web browser options [47].

1.2.2 Signal Processing and Inference in BCI for Communication

The signal processing and inference techniques used for BCI-based communication systems can be used with little or no modification for other applications of BCI. However, this particular application also presents some customization opportunities that can be exploited by designers of BCI-based communication systems.

1.2.2.1 Preprocessing and Dimension Reduction for EEG Evidence Extraction

EEG signals acquired as a response to presented stimuli are not only noisy, with very low signal-to-noise ratio, but also nonstationary due to various factors such as physiological or environmental artifacts, sensor failure, and subject fatigue. To design an effective inference method for BCI, it is essential that the most salient EEG signal features are extracted as evidence. Preprocessing and dimension reduction are steps aimed at such feature extraction. In ERP-based BCIs the P300 is of primary interest, and statistical preprocessing spatiotemporal filters with priors that favor these components can be designed. In all designs, the removal of DC drift (baseline fluctuations due to frequencies well below 1 Hz) and possibly artifact-related high frequency components in EEG is partially achieved with a properly designed bandpass filter. This initial bandpass filtering is a common step in all BCI systems. It is recommended that linear-phase FIR (finite impulse response) filters be used to prevent phase-response-induced distortions to waves and rhythms, as well as to make accounting for group delay easy for downstream operations in the signal processing and inference pipeline. In particular, for visually evoked potentials the group delay of the bandpass filter must be considered when aligning (unfiltered) event markers to the filtered EEG. This also means that for real-time operation the bandpass filter group delay should be kept as small as possible (considering the tradeoff between a high quality magnitude response for desired and undesired frequencies and the delay introduced to the inference process and the closed-loop control dynamics; the latter consideration is relevant in robotic agent control applications).
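For illustration, the sketch below designs a linear-phase FIR bandpass filter with SciPy and adds its group delay of (numtaps − 1)/2 samples to the (unfiltered) event markers before alignment. The sampling rate, passband edges, and filter length are arbitrary assumptions, not values used in this work.

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs = 256.0                        # assumed sampling rate (Hz)
numtaps = 101                     # odd filter length -> integer group delay
band = [1.0, 40.0]                # assumed passband edges (Hz)

# Linear-phase FIR bandpass; its group delay is (numtaps - 1) / 2 samples.
taps = firwin(numtaps, band, pass_zero=False, fs=fs)
group_delay = (numtaps - 1) // 2

eeg = np.random.randn(8, 4 * int(fs))            # 8 channels, 4 s of synthetic EEG
filtered = lfilter(taps, 1.0, eeg, axis=1)

# Shift the (unfiltered) event markers by the group delay so stimulus onsets
# line up with the filtered signal.
marker_samples = np.array([100, 400, 700])
aligned_markers = marker_samples + group_delay
```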

After the initial bandpass filtering, time-windowed data from different EEG channels are usually concatenated to obtain the EEG feature vector. Based on the sampling frequency and the number of channels used, this vector can have a high dimensionality. Several methods are employed, before or after concatenation as suitable, for feature dimension reduction and further noise and artifact reduction: grand averaging over all trials [47, 16, 11, 52], downsampling [12, 11, 45, 53, 26, 27, 41, 54], discrete/continuous wavelet transforms [13, 15, 11], feature selection by stepwise linear discriminant analysis [22], decimation by moving average filtering [16, 17, 21, 22, 29, 34], channel selection [14, 21, 22, 26, 27], artifact removal through independent component analysis (ICA) [48, 50, 49, 30], enhancing the P300 response by adaptive spatial filtering including common spatial patterns (CSP) and the xDAWN algorithm [14, 15, 31, 28], and dimensionality reduction through principal component analysis (PCA) [34, 40]. In the following, we describe the most common preprocessing methods in more detail.

Downsampling: From each EEG channel, after bandpass filtering, discrete signals x[n], n = 1, ..., N are obtained through the discretization of the continuous signal x_c(nT_s), with T_s = 1/f_s as the sampling period and f_s as the sampling frequency. To detect a possible change in EEG, usually a time-windowed portion of the EEG signal time-locked to the presentation of each stimulus is extracted. Then, based on the sampling frequency, a high dimensional data vector is obtained from each channel. A very common way to decrease the dimensionality is downsampling, i.e., x_d[n] = x[nM], where M is the reduction factor. M is chosen to prevent aliasing, based on the cut-off frequency f_c of the bandpass filter, such that M f_c / f_s ≤ 1/2.
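A minimal sketch of this anti-aliasing rule, with an assumed sampling rate and cut-off frequency:

```python
import numpy as np

fs = 256.0    # assumed sampling frequency (Hz)
fc = 40.0     # assumed bandpass cut-off frequency (Hz)

# Largest reduction factor that still satisfies M * fc / fs <= 1/2.
M = int(np.floor(fs / (2.0 * fc)))               # M = 3 for these assumed values

x = np.random.randn(1000)                        # bandpass-filtered single-channel EEG
x_d = x[::M]                                     # x_d[n] = x[nM]
```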

Moving average filtering: An alternative or additional dimensionality reduction technique to downsampling is moving average filtering. For every channel, the signal x[n], n = 1, ..., N, is partitioned into equal non-overlapping segments of, for example, length K (usually N/K is an integer), such that the ith segment is x[(i − 1)K + n] for n = 1, ..., K. Then, decimation is obtained by taking the average of each segment, ending up with N/K data points to represent the data.
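A short sketch of this segment-averaging decimation (the segment length K and the data are assumptions):

```python
import numpy as np

K = 8                                  # assumed segment length
x = np.random.randn(512)               # single-channel EEG after bandpass filtering
N = (len(x) // K) * K                  # trim so that N / K is an integer

# Average each non-overlapping length-K segment, leaving N / K points.
x_dec = x[:N].reshape(-1, K).mean(axis=1)
```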

Independent component analysis (ICA): Assuming that the measured EEG data is a linear combination (mixture) of signals of interest, artifacts, noise, and other brain activity irrelevant to the task, blind source separation techniques such as ICA are used to separate sources of interest from other contributing signals [30, 28, 50]. Assuming statistical independence between mixed sources, ICA tackles the problem of source separation by optimizing an objective function that is appropriate even with limited assumptions on source statistics, including non-Gaussianity, non-whiteness, or nonstationarity. Statistical properties of separated source estimates commonly used in objectives include kurtosis (the fourth-order cumulant), negentropy (the difference between the differential entropy of a multivariate Gaussian random variable that has the same covariance as the source estimate vector and the differential entropy of the source estimate vector), mutual information, and maximum likelihood fit under the parametric density-mixing model (with Infomax providing one possible realization) [16].
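As a rough illustration, the sketch below uses scikit-learn's FastICA (a negentropy-based realization) to unmix synthetic multichannel data, zeroes out a hypothetically flagged artifact component, and projects back to the sensor space; the component index and all data are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 8))     # synthetic mixed EEG: (samples, channels)

ica = FastICA(n_components=8, random_state=0, max_iter=500)
sources = ica.fit_transform(X)         # estimated independent components
mixing = ica.mixing_                   # estimated mixing matrix

# After flagging artifact components (e.g., by visual inspection), zero them
# out and project back to the sensor space.
artifact_idx = [0]                     # hypothetical component flagged as ocular artifact
sources_clean = sources.copy()
sources_clean[:, artifact_idx] = 0.0
X_clean = sources_clean @ mixing.T + ica.mean_
```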

Channel selection: Another common way to decrease the dimensionality of the EEG data is to choose which EEG channels to use in the BCI setup. Using a limited number of sensors has other practical benefits, such as reduced preparation time, which is an important consideration for in-home use of BCI systems. One common way to choose the set of channels to retain is to use channels previously shown in the literature to support event detection. For example, in addition to the Fz, Cz, and Pz locations of the International 10-20 system, posterior sites and occipital regions are shown to improve BCI performance for ERP/P300 detection [12, 22]. Rather than using pre-selected sets of channels, adaptive channel selection methods have also been developed to account for possible performance changes across different users. Recursive [27, 26] and backward-forward [14] channel selection methods that optimize typing accuracy, and a channel selection method based on maximizing the mutual information between class labels and channel features [31], are shown to improve BCI performance.
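A simple data-driven variant of channel selection is sketched below: each channel is scored by the cross-validated AUC of an LDA classifier trained on that channel alone, and the highest-scoring channels are retained. The data, channel budget, and scoring choice are assumptions for illustration only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_time = 200, 16, 64
X = rng.standard_normal((n_trials, n_channels, n_time))   # synthetic epoched EEG
y = rng.integers(0, 2, n_trials)                           # target / non-target labels

# Score each channel by the cross-validated AUC of an LDA trained on that
# channel alone, then keep the highest-scoring channels.
aucs = []
for ch in range(n_channels):
    scores = cross_val_score(LinearDiscriminantAnalysis(),
                             X[:, ch, :], y, cv=5, scoring="roc_auc")
    aucs.append(scores.mean())

keep = np.argsort(aucs)[-8:]           # assumed budget of 8 channels
X_selected = X[:, keep, :]
```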

Common spatial patterns (CSP): CSP is a commonly used spatial filtering method that attempts to exploit the high spatial correlations in extracting common underlying responses for a trial in the BCI presentation paradigm. Obtained by determining the linear projection that maximizes the signal-to-noise power ratio, CSP leads to an explicit generalized eigenvalue type solution that can be easily obtained. In a binary classification problem, let the recorded EEG signal for the kth trial be X_k (an N_c × N_t matrix, where N_c is the number of channels and N_t is the number of temporal samples following stimulus/cue onset), and define index sets I_1 and I_0, where k ∈ I_1 or I_0 if the kth trial belongs to class C_1 or C_0. Then, for c ∈ {0, 1} the class-conditional sample covariance estimates are

\[ S_c = \sum_{k \in I_c} \frac{X_k X_k^T}{\mathrm{trace}(X_k X_k^T)} \]

and the CSP filter coefficients W are calculated by solving

\[ \arg\max_W \; \mathrm{trace}(W^T S_1 W) \quad \text{subject to} \quad W^T (S_1 + S_0) W = I. \]


By equating the gradient of the Lagrangian for this equality-constrained optimization problem to zero and solving for the parameters, it is found that the generalized eigenvectors of the matrix pair (pencil) (S_1, S_1 + S_0) are candidates in this first order analysis. Relating the generalized eigenvalues to the objective being optimized reveals that projection vectors can be selected by sorting the eigenvalues and selecting the corresponding eigenvectors.
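The sketch below mirrors this construction: trace-normalized class covariances are accumulated as in the text, the generalized eigenvalue problem for the pencil (S_1, S_1 + S_0) is solved with SciPy, and filters are taken from both ends of the eigenvalue spectrum. The number of retained filters and the log-variance features are common conventions assumed here, not specifications from the text.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n_trials, n_ch, n_t = 100, 8, 128
X = rng.standard_normal((n_trials, n_ch, n_t))   # synthetic trials (channels x time)
y = rng.integers(0, 2, n_trials)

def class_cov(trials):
    # Sum of trace-normalized trial covariances, as in the text.
    S = np.zeros((n_ch, n_ch))
    for Xk in trials:
        C = Xk @ Xk.T
        S += C / np.trace(C)
    return S

S0, S1 = class_cov(X[y == 0]), class_cov(X[y == 1])

# Generalized eigenvectors of the pencil (S1, S1 + S0), sorted by eigenvalue.
vals, W = eigh(S1, S1 + S0)
W = W[:, np.argsort(vals)[::-1]]

# Keep a few filters from both ends of the spectrum (assumed choice of 2 + 2)
# and compute the log-variance features commonly paired with CSP.
filters = np.hstack([W[:, :2], W[:, -2:]])
features = np.array([np.log(np.var(filters.T @ Xk, axis=1)) for Xk in X])
```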

xDAWN algorithm: This algorithm specifically aims to provide an unsupervised spatiotemporal filter design method to project raw EEG onto the estimated ERP (P300) subspace by maximizing the signal-to-signal-plus-noise ratio (SSNR) [14, 28]. Let the number of sensors be denoted by N_s, the total number of temporal samples by N_t, and the number of temporal samples corresponding to an ERP by N_e (which is typically chosen to extend over 600 ms to 1 s long post-stimulus intervals – a longer than necessary interval, in our opinion, for the pure P300 response, possibly with the purpose of capturing potentially useful motor activity in the brain in case the user engages in motor responses for each target stimulus). Assume that the target stimuli elicit P300 evoked potentials and the measurement model is written as X = DA + N, where X is an N_t × N_s matrix, A is an N_e × N_s matrix of ERP signals, D is an N_t × N_e Toeplitz matrix (first column elements all null, but D(τ_k, 1) = 1 with τ_k as the stimulus onset time of the kth stimulus, 1 ≤ k ≤ K, with K denoting the total number of target stimuli), and N is an N_t × N_s noise matrix (other brain and artifact activity). A = A_1 + A_2 is assumed to contain a response common to all ERPs, A_1, and a random spatiotemporal pattern A_2. The aim of the algorithm is then to estimate a spatial filter U, an N_s × N_f matrix, with N_f denoting the number of spatial filters, by solving the optimization problem

\[ U = \arg\max_V \, \mathrm{SSNR}(V) = \arg\max_V \frac{\mathrm{trace}(V^T A_1^T D^T D A_1 V)}{\mathrm{trace}(V^T X^T X V)} \]

after which the filtered signals are obtained as XU.
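A rough sketch of this idea under the stated model is given below: the common ERP pattern A_1 is estimated by least squares from the Toeplitz design matrix D, and the SSNR objective is maximized through a generalized eigendecomposition. The stimulus onsets, the number of retained filters, and the least-squares estimation step are assumptions of this sketch, not a full xDAWN implementation.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
Nt, Ns, Ne = 2000, 8, 64                   # samples, sensors, ERP window length
X = rng.standard_normal((Nt, Ns))          # synthetic continuous EEG (time x sensors)
onsets = np.arange(100, 1800, 200)         # hypothetical target stimulus onsets

# Toeplitz-structured design matrix D with D[tau_k + j, j] = 1 for each onset tau_k.
D = np.zeros((Nt, Ne))
for tau in onsets:
    D[tau + np.arange(Ne), np.arange(Ne)] = 1.0

# Least-squares estimate of the common ERP pattern A1, then the generalized
# eigenvectors that maximize the SSNR objective stated above.
A1 = np.linalg.lstsq(D, X, rcond=None)[0]              # Ne x Ns
vals, V = eigh(A1.T @ (D.T @ D) @ A1, X.T @ X)
U = V[:, np.argsort(vals)[::-1][:3]]       # keep Nf = 3 spatial filters (assumed)
X_filtered = X @ U
```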

Principal component analysis: The dimension of EEG evidence (feature) vectors obtained upon concatenation of data from each channel can be reduced using PCA, which projects the feature vectors onto the subspace spanned by the largest eigenvectors of the feature covariance matrix in order to preserve high-power bands (since EEG is made zero-mean by bandpass filtering). Note that PCA applied to time-delay vectors acts as a set of energy-selective FIR bandpass filters. Eigenvectors corresponding to eigenvalues smaller than a predefined threshold are discarded in this process. It should be noted that PCA may be used for regularization purposes with care as described, but it should not be used with the intent of finding discriminant projections in general.
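A minimal sketch of this eigenvalue-thresholded PCA reduction (the threshold and the data are assumed):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.standard_normal((300, 512))   # concatenated, zero-mean EEG features

pca = PCA().fit(features)
threshold = 0.5                               # assumed eigenvalue cut-off
n_keep = int(np.sum(pca.explained_variance_ > threshold))

# Project onto the retained leading eigenvectors only.
reduced = pca.transform(features)[:, :n_keep]
```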


1.2.3 Classification

The purpose of the classifier in ERP-based systems is to detect the existence of an ERP (especially the P300) in the EEG response following each stimulus (e.g., intensification of rows/columns/subsets in the matrix speller, presentation of letters/symbols in the RSVP paradigm, or finger tapping events in a tactile stimulation paradigm). In SSVEP/SSAEP-based systems, the classifier uses temporal or frequency domain features to detect which stimulus the user is attending to (e.g., flickering arrows or textures on the screen for SSVEP/codeVEP, or tones/clips in SSAEP paradigms). In VCP, the classifier attempts to identify which imagery-induced brain rhythm is prominent in EEG, especially over motor cortical areas for motor imagery paradigms, using spatiotemporal filtering and feature extraction. We will survey the most commonly used classification approaches, which include (1) linear discriminant analysis (LDA) based classifiers (e.g., Fisher LDA (FLDA), stepwise LDA (SWLDA), and Bayesian LDA), and (2) the support vector machine (SVM). Other classifiers for BCI systems include genetic algorithms [48], logistic linear regression, neural networks, matched filters, Pearson's correlation method, and regularized discriminant analysis (RDA) and its special cases.

In addition, unsupervised and semisupervised methods have also been employed, including methods that assume hierarchical Gaussian distribution models for EEG [19, 20], that are based on co-training of FLDA and BLDA [55], and that are based on offline learning of the ERP classifier from EEG using data from a pool of subjects followed by online adaptation to different individuals [25]. Semisupervised classifier adaptation promises to reduce calibration data collection duration and possibly to provide adaptability against nonstationarities in EEG during the test phase.

A BCI system's performance depends not only on the choice of classifier, but also on preprocessing methods, selected features, the users who participate in the study, and a multitude of other factors. Therefore, a comparison among different studies to choose the "best" classifier for a BCI speller system is not feasible. However, within individual studies, comparisons among classifiers have been attempted. For example, using offline EEG data, it was demonstrated that SWLDA and FLDA provided better overall classification performance compared to Pearson's correlation method, linear SVM, and Gaussian kernel SVM [21], that a matched filter based classifier outperformed a maximum likelihood based classifier [30], and that BLDA outperformed LDA, SWLDA, and neural networks.

LDA based classifiers: LDA is a supervised method for classification. For two classes C_0 and C_1, consider samples (EEG features) given in the form X = {x_t, r_t} such that r_t = 1 if x_t ∈ C_1 and r_t = 0 if x_t ∈ C_0. LDA finds the vector w that maximizes some measure of class separation for the projected data. A typical approach is to maximize Fisher's discriminant [56]

\[ J(w) = \frac{(m_1 - m_0)^2}{s_1^2 + s_0^2}. \]


Here, m_1 = w^T (Σ_t x_t r_t)/(Σ_t r_t) = w^T μ_1 and m_0 = w^T (Σ_t x_t (1 − r_t))/(Σ_t (1 − r_t)) = w^T μ_0, with μ_1 and μ_0 denoting the class-conditional mean vectors of features from C_1 and C_0, respectively. Also, s_1^2 = Σ_t (w^T x_t − m_1)^2 r_t and s_0^2 = Σ_t (w^T x_t − m_0)^2 (1 − r_t) indicate the class-conditional variances of the projected samples from C_1 and C_0. Noticing that (m_1 − m_0)^2 = w^T S_B w and s_1^2 + s_0^2 = w^T S_W w, with S_W = S_1 + S_0, where S_1 and S_0 denote the class-conditional covariances of the feature vectors and S_B = (μ_1 − μ_0)(μ_1 − μ_0)^T, the optimal FLDA projection vector is found as the generalized eigenvector of the matrix pencil (S_W, S_B) corresponding to the largest generalized eigenvalue. After some simplifications, the resulting vector is w_FLDA = S_W^{−1} (μ_1 − μ_0) [56]. The discriminant score is then simply

\[ w^T x + w_0 \qquad (1.1) \]

where w_0 is a threshold, and it specifies a hyperplane classification boundary along with w. Note that the FLDA solution is minimum-risk optimal under the assumption of equal-covariance Gaussian class distributions, which is typically reasonable for EEG if one assumes EEG is a superposition of background brain activity and stimulus/event-related brain activity with a wide-sense stationary Gaussian background process model; it is also a special case of linear regression [57].

In (1.1), x is the feature vector and w is the vector of feature weights. In P300 matrix speller applications, to combine multiple trials in a sequence, it is assumed that the user is focusing on a single symbol during a sequence, and this symbol is inferred by the intersection of the predicted row and the predicted column. Denoting by T_{i_row} and T_{i_col} the index sets of the trials (row and column highlights) where the ith symbol is highlighted, the following equations are used to obtain the predicted row and column indices:

\[ \text{Predicted Row} = \arg\max_{i_{\mathrm{row}}} \sum_{t \in T_{i_{\mathrm{row}}}} w^T x_t \qquad \text{Predicted Column} = \arg\max_{i_{\mathrm{col}}} \sum_{t \in T_{i_{\mathrm{col}}}} w^T x_t \]
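A compact sketch of how (1.1) and the row/column fusion above could be realized is given below; the synthetic calibration data, the midpoint threshold w_0, and the fixed flash-to-row/column mapping are simplifying assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 24
x1 = rng.standard_normal((60, d)) + 0.5       # target (C1) calibration features
x0 = rng.standard_normal((240, d))            # non-target (C0) calibration features

mu1, mu0 = x1.mean(axis=0), x0.mean(axis=0)
Sw = np.cov(x1, rowvar=False) + np.cov(x0, rowvar=False)
w = np.linalg.solve(Sw, mu1 - mu0)            # w_FLDA = Sw^{-1} (mu1 - mu0)
w0 = -0.5 * (w @ mu1 + w @ mu0)               # threshold midway between projected means

# One epoch of a 6 x 6 matrix speller with several sequences; for simplicity we
# assume flash index i always corresponds to row i (i < 6) or column i - 6.
n_seq = 4
seq_feats = rng.standard_normal((n_seq, 12, d))
scores = seq_feats @ w + w0                   # discriminant scores, shape (n_seq, 12)
summed = scores.sum(axis=0)                   # sum the scores over sequences
predicted_row = int(np.argmax(summed[:6]))
predicted_col = int(np.argmax(summed[6:]))
```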

SWLDA is an extension of LDA that chooses the feature values to be used in (1.1). The significant features are chosen using a combination of forward and backward stepwise regression. SWLDA has an inherent automatic feature selection property and is commonly used in P300-based BCI systems and other BCI designs. SWLDA consists of two loops: one for forward selection and one for backward elimination (see Algorithm 1).

In BLDA [45], to design a separating hyperplane as shown in (1.1), a prior distribution is assumed for the weight vector w. Then, a predictive feature distribution is obtained using the posterior distribution of the weight vector, and this predictive distribution is used to make an inference on the stimuli/options. The targets t_{r_t} for r_t ∈ {0, 1} and the feature vectors x_t are assumed to be linearly related in the presence of additive white Gaussian noise n, such that

\[ t_c = w^T x + n. \qquad (1.2) \]


Algorithm 1: Stepwise Linear Discriminant Analysis (SWLDA)

/* Initializations */
1. S ← ∅; S^c ← {1, ⋯, K} ∖ S; r_S ← r. Define f(k, l) ← F_{(k,l), 1−α_f}, with α_f ∈ [0, 1], as the confidence threshold for forward selection, and b(k, l) ← F_{(k,l), 1−α_b}, with α_b ∈ [0, 1], as the confidence threshold for backward elimination.
/* Start the iterations */
2. S_f ← S.
/* Forward selection */
3. For all s ∈ S^c: using linear least squares, fit a line from x_{S∪{s}} to r_S. Let w_{S∪{s}} be the regression coefficient vector, r̂_{S∪{s}} the predicted label, and N the number of samples in the training set. The sum of squared errors over the samples is SS_{err, S∪{s}} = Σ_{i=1}^{N} (r_{S,i} − r̂_{S∪{s},i})². Let r̄ be the average of the label r over the sample set and SS_{reg, S∪{s}} = Σ_{i=1}^{N} (r̄ − r̂_{S∪{s},i})². Given these, the F statistic is computed as

\[ F^{S\cup\{s\}}_{(|S\cup\{s\}|,\, N−|S\cup\{s\}|−1)} = \frac{(N − |S\cup\{s\}| − 1)\, SS_{reg,\,S\cup\{s\}}}{|S\cup\{s\}| \; SS_{err,\,S\cup\{s\}}}. \]

4. If F^{S∪{s}}_{(|S∪{s}|, N−|S∪{s}|−1)} > f(|S∪{s}|, N − |S∪{s}| − 1), then S_f ← S ∪ {s}.
/* Backward elimination */
5. S_b ← S_f.
6. For all s ∈ S_b, compute

\[ F^{S_b\setminus\{s\}}_{(1,\, N−|S_b|−1)} = \frac{(N − |S_b| − 1)\,\bigl(SS_{reg,\,S_b\setminus\{s\}} − SS_{reg,\,S_b}\bigr)}{SS_{reg,\,S_b}}. \]

7. If F^{S_b∖{s}}_{(1, N−|S_b|−1)} > b(1, N − |S_b| − 1), then S_b ← S_b ∖ {s}.
/* Check for convergence */
If S_b ≠ S, then S ← S_b; r_{S,i} ← r_{S,i} − r̂_{S,i} for i ∈ {1, 2, ⋯, N}; S^c ← {1, ⋯, K} ∖ S; go to step 2.
8. Stop.
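The sketch below implements a simplified stepwise selection in the spirit of Algorithm 1, using standard partial F-tests from scipy.stats; it omits the residualization step and does not reproduce the algorithm's exact thresholds or bookkeeping, so it should be read as an illustration of forward/backward selection rather than a transcription of Algorithm 1.

```python
import numpy as np
from scipy.stats import f as f_dist

def fit_ss(X, r, S):
    """Least-squares fit of r on the columns in S; return (SS_reg, SS_err)."""
    A = np.column_stack([X[:, sorted(S)], np.ones(len(r))])
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    r_hat = A @ w
    return np.sum((r_hat - r.mean()) ** 2), np.sum((r - r_hat) ** 2)

def stepwise_select(X, r, alpha_f=0.05, alpha_b=0.10, max_iter=20):
    N, K = X.shape
    S = set()
    for _ in range(max_iter):
        S_old = set(S)
        # Forward selection: add candidates whose F statistic is significant.
        for s in set(range(K)) - S:
            cand = S | {s}
            ss_reg, ss_err = fit_ss(X, r, cand)
            df1, df2 = len(cand), N - len(cand) - 1
            F = (df2 * ss_reg) / (df1 * ss_err)
            if F > f_dist.ppf(1 - alpha_f, df1, df2):
                S.add(s)
        # Backward elimination: drop features whose partial F-test is not significant.
        for s in list(S):
            ss_reg_full, ss_err_full = fit_ss(X, r, S)
            ss_reg_red, _ = fit_ss(X, r, S - {s})
            df2 = N - len(S) - 1
            F = df2 * (ss_reg_full - ss_reg_red) / ss_err_full
            if F < f_dist.ppf(1 - alpha_b, 1, df2):
                S.discard(s)
        if S == S_old:
            break
    return sorted(S)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
r = (X[:, 3] - 0.8 * X[:, 7] + 0.3 * rng.standard_normal(200) > 0).astype(float)
print(stepwise_select(X, r))
```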

Here t_1 = N_1/N for C_1 and t_0 = −N_0/N for C_0, with N_0 and N_1 denoting the number of calibration samples corresponding to C_0 and C_1, respectively, and N = N_0 + N_1.

Using (1.2) and considering all feature vectors for both classes, the conditional distribution of the targets, p(t_c | w, X, β), with β denoting the noise distribution parameter vector, can be calculated, where X = {x_t, r_t} is defined as above. In addition, assuming a prior distribution for the weight vector w, p(w | α), with α denoting the weight prior parameters, the posterior distribution for the weight vector w is computed using Bayes' rule as

\[ p(w \mid t_c, X, β, α) \propto p(t_c \mid w, X, β)\, p(w \mid α). \]

Usually, the prior distribution for w is chosen as the conjugate prior to the assumed noise model, such that p(w | t_c, X, β, α) has a closed form solution. Then a predictive distribution for the target variable for a new input x can be calculated as in (1.3) for inference on the class label r corresponding to this new input:

\[ p(t \mid x, X, β, α) = \int_w p(t \mid w, x, β)\, p(w \mid t_c, X, β, α)\, dw. \qquad (1.3) \]
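As an illustration of this conjugate-prior construction, the sketch below uses scikit-learn's BayesianRidge, which places a Gaussian prior on w and learns the noise and prior precisions by evidence maximization; this is a stand-in for the Bayesian linear model described above, and the regression targets and data are assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n1, n0, d = 60, 140, 32
X = np.vstack([rng.standard_normal((n1, d)) + 0.4,     # class C1 features
               rng.standard_normal((n0, d))])          # class C0 features
N = n1 + n0

# Regression targets following the convention above: t1 = N1/N, t0 = -N0/N.
t = np.concatenate([np.full(n1, n1 / N), np.full(n0, -n0 / N)])

# BayesianRidge places a Gaussian prior on w and estimates the noise and prior
# precision hyperparameters by evidence maximization.
model = BayesianRidge().fit(X, t)

x_new = rng.standard_normal((1, d))
t_mean, t_std = model.predict(x_new, return_std=True)  # predictive mean and std
decision = "C1" if t_mean[0] > 0 else "C0"
```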

Support vector machine: SVM classifiers provide the optimum separating hyperplane in the feature space (linear SVM) or in a transformed feature space (kernel SVM) by not only requiring that the separated features lie on different sides of the hyperplane (similar to LDA), but also maximizing the distance between the features closest to the hyperplane and the separating hyperplane (this distance is called the margin). In the event of non-separable classes, the misclassified samples are penalized by their distance to the boundary.

For two classes C_1 and C_{−1} (changing label values), given labeled samples (EEG features) X = {x_t, r_t} such that r_t = 1 if x_t ∈ C_1 and r_t = −1 if x_t ∈ C_{−1}, the solution to the following problem provides the optimal separating hyperplane in SVM:

\[ \min_w \; \frac{1}{2} \|w\|_2^2 \quad \text{subject to} \quad r_t (w^T x_t + w_0) \ge 1 − ξ_t \]

where ξ_t ≥ 0 are slack variables storing the variation from the margin. The Lagrangian for this optimization problem can be written as

\[ L = \frac{1}{2} \|w\|_2^2 + C \sum_t ξ_t − \sum_t α_t \left[ r_t (w^T x_t + w_0) − 1 + ξ_t \right] − \sum_t μ_t ξ_t \]

where α_t and μ_t are the Lagrange multipliers, and C is the complexity parameter penalizing the boundary violations by nonseparable points. This is a quadratic convex optimization problem that should be minimized with respect to w and w_0 and maximized with respect to α_t and μ_t. The solution is obtained by maximizing the dual problem in terms of α_t, and then setting w = Σ_t α_t r_t x_t. By calculating g(x) = w^T x + w_0, one decides on C_1 if g(x) > 0 and C_{−1} otherwise. This classifier is commonly referred to as linear SVM.

Kernel SVM is a generalization in which the feature vectors are first transformed, z = φ(x), from a finite dimensional space to a possibly infinite dimensional space through basis functions. Then, using w = Σ_t α_t r_t z_t = Σ_t α_t r_t φ(x_t), the discriminant is

\[ g(x) = w^T φ(x) = \sum_t α_t r_t φ^T(x_t) φ(x) = \sum_t α_t r_t K(x_t, x) \]

where the kernel function K(x_t, x) = φ^T(x_t) φ(x) is the inner product of the basis function vectors. Different kernel functions are used to design SVM classifiers, most popularly the Gaussian kernel or higher order polynomials.
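A brief sketch of both variants with scikit-learn's SVC, on synthetic data; the value of C and the Gaussian-kernel width are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((80, 16)) + 0.6,    # class C1
               rng.standard_normal((80, 16))])         # class C-1
r = np.hstack([np.ones(80), -np.ones(80)])

# Linear SVM: g(x) = w^T x + w_0, decide C1 if g(x) > 0.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, r)
g = linear_svm.decision_function(X[:5])

# Kernel SVM: the inner product is replaced by K(x_t, x), here a Gaussian kernel.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, r)
labels = rbf_svm.predict(X[:5])
```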

The presence of artifacts, sensor failure, or other effects such as BCI user fatigue causes nonstationarity in EEG signals. These nonstationarities change the underlying distribution of the EEG data; therefore a classifier designed on a training data set may not always work with the predicted accuracy or speed. To overcome such issues, two SVM-based classifiers have been proposed.

An ensemble of SVMs was proposed to classify EEG data [26, 27]. In this method, the training data is separated into multiple parts, and for each part a separate linear SVM is trained. The score for each row/column is then calculated as the summation of the scores of the ensemble of SVMs. The authors show that with fewer sequence repetitions they achieve results similar to an LDA-based classifier tested on the same data set [13].
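A sketch of this ensemble idea, with the number of parts and all data assumed:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((200, 20)) + 0.4,
               rng.standard_normal((200, 20))])
y = np.hstack([np.ones(200), np.zeros(200)])

# Split the training data into parts and train one linear SVM per part; the
# score of a new trial is the sum of the ensemble's decision values.
n_parts = 5
parts = np.array_split(rng.permutation(len(y)), n_parts)
ensemble = [LinearSVC(C=1.0, dual=False).fit(X[idx], y[idx]) for idx in parts]

x_new = rng.standard_normal((3, 20))
score = np.sum([clf.decision_function(x_new) for clf in ensemble], axis=0)
```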

A self-training SVM has been proposed to deal with nonstationarities of the EEG data [23]. A linear SVM is first designed using the training data set. Then, during the testing phase of the BCI system, each decision made by the classifier is assumed to be correctly labeled EEG data. Using these new labeled data, the SVM classifier is retrained. It was shown that, for a desired communication accuracy, this method significantly reduces the training session length.
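A schematic sketch of such a self-training loop is given below, assuming batches of test trials arrive sequentially; the batch size and refit schedule are illustrative, not the settings of [23].

import numpy as np
from sklearn.svm import LinearSVC

def self_training_svm(X_calib, y_calib, X_stream, refit_every=20):
    """Retrain a linear SVM as new, self-labeled trials arrive during online use."""
    X, y = X_calib.copy(), y_calib.copy()
    clf = LinearSVC(C=1.0).fit(X, y)
    for start in range(0, len(X_stream), refit_every):
        batch = X_stream[start:start + refit_every]
        pseudo = clf.predict(batch)           # classifier decisions assumed to be correct labels
        X = np.vstack([X, batch])
        y = np.concatenate([y, pseudo])
        clf = LinearSVC(C=1.0).fit(X, y)      # retrain with the augmented data set
    return clf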

Regularized discriminant analysis: RDA is a supervised quadratic classification algorithm [58] that assumes multivariate normal distributions as the class-conditional distributions. To alleviate the rank deficiency of the maximum likelihood estimates of the class-conditional covariance matrices, caused by the low number of samples collected during calibration relative to the feature dimensionality, shrinkage and regularization operations are applied, respectively, as

Σ_r(λ) = [(1 − λ) S_r + λ S] / [(1 − λ) N_r + λ N]   and   Σ_r(λ, γ) = (1 − γ) Σ_r(λ) + (γ/p) trace[Σ_r(λ)] I

where λ and γ are hyperparameters that need to be optimized, for instance using cross validation. The shrinkage operation makes the class covariances closer to an overall covariance matrix (suitable for EEG assuming equal covariances for classes, for reasons explained in the LDA section) and regularization makes them more circular and, primarily, nonsingular.


1.2.3.1 Factors that affect speller performance

Odd-ball effect: The standard presentation setup in matrix spellers consists of a 6 × 6 matrix with rows or columns intensified one at a time. As mentioned above, a sequence includes (6 + 6 =) 12 flashes when all the rows and columns are intensified. The 6 × 6 matrix structure presents 36 symbols, including the 26 English letters and 10 more choices, which can contain digits or other choices like delete or space. With the assumption of one target item in each sequence, there are only 2 flashes containing the desired symbol; hence the probability of the oddball stimulus is 2/12 ≈ 0.17. This probability is sufficiently low for generating a P300 response [10]. Many criteria have been considered to increase the ERP detectability.

Inter symbol interval (ISI): ISI (including a related measure, the target to target interval (TTI)) is one of the most influential factors that has been studied. Short intervals between target flashes result in repetition blindness (attentional blink) and habituation, which decrease ERP amplitude and hence its detectability. Many papers have studied this factor along with other parameters like matrix size [59, 60] or different presentation paradigms [61, 62, 53, 63, 64, 65]. In the matrix speller, the optimal ISI varies depending on the matrix size and presentation paradigm; for example, [60] reported the best performance with an ISI of 175 ms for a 3 × 3 matrix and the row/column paradigm (RCP), and [24] showed that lower flash rates in the range of 8 to 32 Hz result in the best performance for an 8 × 9 matrix with flashes of 6 items at a time. They also demonstrated that variation in stimulus-on and stimulus-off time does not affect performance.

Matrix spellers are typically set up to avoid the possibility of consecutive target flashes. Similarly, in the RSVP paradigm, one would avoid consecutive presentations of the same symbol for the same reason. Lu and colleagues studied BCI performance as a function of stimulus-off time, ISI, flash duration and flash rate as four timing parameters [55]. They suggested that BCI accuracy is a function of the number of trial repetitions and that BCI performance is enhanced when stimulus-off time and ISI are increased. These studies suggest that the optimal ISI depends on the number of non-target flashes between targets. Jin et al. [53] studied the effect of TTI on BCI performance. They employed a 7 × 12 matrix of characters with 16, 18, and 21 flashes in each sequence, with a flash pattern optimized to minimize TTI while avoiding repetition blindness. To avoid repetition blindness, a minimum of one (for 16 flashes), two (for 18 flashes) and three (for 21 flashes) non-similar symbol presentations between two flashes of the same item has been proposed. Here, the 18-flash pattern showed the best performance in terms of classification accuracy and information transfer rate.

Different matrix and stimuli/flash organizations: The unpredictability of the target letter and the physical arrangement of items on the presentation screen are other factors which can affect ERP amplitude. Changing the size of a matrix will alter the location of items on the screen, as


well as the number of items displayed, resulting in changes to the probability of the target item [60]. Increasing matrix size decreases the probability of the target letter and hence enhances the ERP's SNR. However, the required time for highlighting all the columns and rows will increase, so this does not necessarily lead to improved typing speed [59]. Smaller matrix sizes flashing with shorter ISI seem to yield better typing speeds in a typical RCP [60]. Remodeling the flash paradigm from an RCP to a group-based paradigm is another approach that has been analyzed. In the matrix speller, a non-row/column subset-based flash paradigm has been studied on a 12 × 7 matrix [53]. Subsets are selected such that each sequence contains 9, 12, 14, or 16 flashes. The 16-flash paradigm shows better performance than the other subset-based options and RCP. Townsend and colleagues proposed the checkerboard paradigm (CBP) to avoid adjacency distraction errors [64]. This paradigm is a special case of the previous flash paradigm in which subsets of symbols in an 8 × 9 matrix are flashed by alternatingly selecting a row or column from one of two 6 × 6 matrices of symbols, forming a checkerboard pattern for each flashing subset. CBP demonstrates a significant improvement in accuracy compared to RCP. Another flash paradigm known as C(m, n) has been introduced, in which m is the number of flashes per sequence and n is the number of flashes per item [65]. Specifically, the C(36, 5) paradigm, known as the 5-flash paradigm (FFP), has been compared against CBP. Both have high accuracy, but the FFP offered a higher information transfer rate.

To consider an error correction code approach, Hill and colleagues assume a noisy communication channel and assign a code word to each item with a length equal to the number of flashes in each sequence [62]. Code words are all zeros except for ones at the times corresponding to that item's flashes. Extra flashes are employed to generate redundancy, and the codebook is optimized to have a maximal minimum Hamming distance between pairs of codes. The TTI is constrained to be larger than a threshold. Results indicate that RCP demonstrates better performance than one would expect according to its Hamming distance and TTI. Moreover, the optimal stimulus type is a subject-specific parameter. Imposing transparent familiar or well-known faces (like those of family members) on matrix elements is another method which can lead to increased SNR [52].
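The following toy sketch computes the minimum pairwise Hamming distance of a small hypothetical codebook, the quantity that such a codebook design maximizes; the code words shown are purely illustrative and not taken from [62].

import itertools
import numpy as np

def min_hamming_distance(codebook):
    """Smallest pairwise Hamming distance of a binary codebook (rows are code words)."""
    return min(int(np.sum(a != b)) for a, b in itertools.combinations(codebook, 2))

# Toy example: 4 items, 8 flashes per sequence, each item flashed twice (two ones per code word)
codebook = np.array([[1, 1, 0, 0, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 0, 0, 1, 1]])
print(min_hamming_distance(codebook))   # 4 for this disjoint flash assignment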

Reshaping the fixed matrix arrangement of items into various forms has been another strategy for matrix spellers. One proposed method is the hierarchical region-based flash paradigm [61]. In this setup, 49 items equally distributed in 7 groups are positioned in different regions of the screen. At the first level, each region is intensified one by one. Then the letters in the (inferred) intended region are distributed over 7 locations on the screen, and the user can proceed by making further selections to reach the intended item. In a similar paradigm, one can use a language model to decide on the hierarchy of characters to be used in the presentation layout. The lateral single-character paradigm (LSCP) is another proposed technique in which items are arranged in a circular layout on the screen. Only one item flashes at a time, and two consecutive flashes cannot be from the same side (left or right) to reduce cross-talk from nontarget flashes.


Gaze dependence: The P300 matrix speller is a gaze-control dependent design. Hence, users with limited gaze control will experience significant difficulty. To address this, a new presentation paradigm called the gaze independent block speller (GIBS) has been proposed to reduce the dependency on gaze control. Here, 36 items are distributed into four groups, one block at the center of the screen and three blocks at three corners. Central block items flash one by one, and the other blocks flash as a group. If the intended character is in another block, the user should aim for that block, and if selected, that block will move to the center. Results indicate that without eye movements (fixating at the center) this system offers a bit rate similar to the standard RCP. In contrast, for SSVEP stimuli, selective attention to a flicker pattern even with overlapping stimuli groups may provide sufficiently discriminative signals for BCI [59]. In a similar observation for auditory BCIs, Hohne and colleagues observed that discriminating different pitches was easier than discriminating direction of arrival [66].

Feature attention: This corresponds to the attention of a BCI user to different properties of the presented stimuli, and has been shown to affect BCI performance. The original ERP-based Hex-o-Spell has been compared to its variants, Cake Speller and Center Speller, which feature different colors and forms for the visual stimuli. Cake Speller is similar to Hex-o-Spell in terms of design except that the symbol groups are located in triangles rather than circles, and these triangular groups form a hexagon. In Center Speller, symbol groups are presented within various shapes of various colors in the center of the screen, in RSVP fashion [41]. The results showed that the Center Speller yields a higher P300 response and higher classification accuracy. In the matrix speller, a green/blue color change during highlighting was shown to be superior to a white/gray color change [67]. A visual stimulus scheme based on color change and movement of the stimuli has been employed in matrix speller design. This scheme induces the P300 and the motion onset visual evoked potential, and was shown to outperform a scheme based only on color or motion [53]. In RSVP-based BCIs, assigning colors or different capitalization to the cues led to an increase in spelling rate [37].

Error related potentials (ErrPs): ErrPs are EEG potentials induced by the user's recognition of an error. These potentials originate in the anterior cingulate cortex and are detectable over the fronto-central regions of the scalp when the decided action shown on the interface is not the user's intended symbol [16, 17]. Detection of ErrPs in EEG, and their integration into P300-based intent classifiers by error correction after P300 detection, can improve the accuracy and speed of BCI systems [46, 33].

Context information: Context information refers to evidence from non-EEG sources that complements EEG data in inference. Word completion and the use of language models are well-known examples. BCI communication systems specifically designed for typing benefit greatly from probabilistic language models. Various predictive word completion methods integrated into the intent detection process [29, 52, 68], and Bayesian fusion methods that combine probabilistic n-gram language models with different classifiers, as in the RSVP Keyboard™ [69, 38, 39] and other


systems [70, 32, 71], have been demonstrated to enhance the accuracy and speed of communication.

1.2.4 Output Components

BCI communication systems have three options for output: text, text-to-speech, and speech. The output option most often referred to in the non-invasive BCI literature is text, but off-the-shelf text-to-speech modules can be appended with relative ease. The widely researched P300 Speller [60], which is also used by the BCI2000 system, has been validated for text output tasks like spelling, email, or internet browsing. Text-to-speech requires a speech synthesizer for conversion of normal language text into artificial verbal production; such synthesizers are available on virtually all modern personal computers. To employ this output method, the user must simply enable this feature on his or her computer and have a way to interface with it. Various groups report people with advanced ALS effectively using BCI-controlled text-to-speech applications in their daily lives. The option of direct speech output has been investigated by a group working with an invasive BCI; initial results indicate the potential to use speech motor imagery to produce vowel sounds, and the researchers' eventual goal is to develop a BCI capable of producing synthetic speech in real time.

Although excellent advances have been made since P300 and SSVEP BCIs for communication were introduced in the late 1980s [10, 72], researchers agree that slow information transfer rates continue to plague the technology. Even so, the field remains hopeful about emerging communication applications [61].

1.3 Manuscript organisation

In this manuscript, each chapter consists of a published or submitted paper or book chapter.

As discussed in Section 1.2.3.1, among many factors, the performance of BCIs can vary dramatically based on the gaze dependency of the presentation component, the incorporation of contextual information, and the inter-trial interval. In Chapter 2, after providing an overview of the design of our system, we consider these effects through an experimental study executed on 12 healthy users.

The performance of BCIs which rely on supervised data for system parameter estimation can be degraded due to the relatively high dimensionality of feature vectors in contrast to the number of data points collected during a supervised session. The estimators that are generally used for these estimations are asymptotically optimal under some assumptions. However, it is not feasible to collect enough samples to satisfy the asymptotic properties when the system is in use by a human participant. Hence, many


dimensionality reduction and regularization methods have been proposed by researchers in the field to overcome this problem (please see Section 1.2.2 for more details). In Chapter 3 of this manuscript, we introduce a method that introduces some bias in the covariance estimator to reduce its variance, by reducing the number of quantities to be estimated through structural assumptions.

EEG can only offer a low signal-to-noise ratio (SNR); hence, in the recent decade researchers have increasingly considered combining non-BCI assistive technologies (ATs) with BCIs to develop reliable systems. The resulting body/brain computer interfaces (BBCIs) can enhance the detection accuracy and inference speed. For instance, combining electromyography (EMG) or electrooculography (EOG) signals – which record the electrical activity of muscles and of eye movements, respectively – with EEG has shown great improvement over standalone inference from each modality separately [73]. Based on these observations, in Chapter 4 of this manuscript, we introduce a probabilistic framework for incorporating different physiological measurements along with contextual information for more accurate inference in real time. This framework also utilizes the active learning concept for informative sequence design, which can lead to significant improvement of system performance.

Despite all efforts for improving the system performance, sometimes the inference algorithm might lead to an incorrect decision. In our system, a backspace symbol is offered to the user for deleting mistyped symbols. However, based on the workflow of the system, each error correction needs two back-to-back correct selections. Hence, error corrections can lead to long cycles, especially when the classifier is not very accurate. We propose to mitigate this problem by fusing the probability of user agreement with the system decision into the inference. We estimate the likelihood of user confirmation or disagreement from error related potentials (ErrPs), which are induced in response to the system's wrong decisions. Experimental results show a significant performance improvement when ErrP evidence is used in the inference mechanism.

Abstracts of the remaining chapters are listed below:

Chapter 2: Language-Model Assisted Brain Computer Interface for Typing: A Comparison of Matrix and Rapid Serial Visual Presentation: Non-invasive electroencephalography (EEG) based brain computer interfaces (BCIs) popularly utilize event related potentials (ERPs) for intent detection. Specifically, for EEG-based BCI typing systems, different symbol presentation paradigms have been utilized to induce ERPs.

In this manuscript, through an experimental study, we assess the speed, recorded signal quality and system accuracy of a language-model-assisted BCI typing system using three different presentation paradigms: a 4 × 7 matrix paradigm of a 28-character alphabet with row-column presentation (RCP) and single character presentation (SCP), and rapid serial visual presentation (RSVP) of the same. Our analyses show that signal quality and classification


accuracy are comparable between the two visual stimulus presentation paradigms. In addition, we observe that while the matrix based paradigm can generally be employed with lower inter-trial-interval (ITI) values, the best presentation paradigm and ITI value configuration is user dependent. This potentially warrants offering both presentation paradigms and variable ITI options to users of BCI typing systems.

Chapter 3: Spatio-Temporal EEG Models for BCIs: Multichannel electroencephalography (EEG) is widely used in non-invasive brain computer interfaces (BCIs) for user intent inference. EEG can be assumed to be a Gaussian process with unknown mean and autocovariance, and the estimation of these parameters is required for BCI inference. However, the relatively high dimensionality of the EEG feature vectors with respect to the number of labeled observations leads to rank-deficient covariance matrix estimates. In this manuscript, to overcome ill-conditioned covariance estimation, we propose a structure for the covariance matrices of the multichannel EEG signals. Specifically, we assume that these covariances can be modeled as a Kronecker product of temporal and spatial covariances. Our results on the experimental data collected from the users of a letter-by-letter typing BCI show that, with a smaller number of parameters to estimate, the system can achieve higher classification accuracies compared to a method that uses full unstructured covariance estimation. Moreover, in order to illustrate that the proposed Kronecker product structure could enable shortening the BCI calibration data collection sessions, using Cramer-Rao bound analysis on simulated data, we demonstrate that a model with structured covariance matrices will achieve the same estimation error as a model with no covariance structure using fewer labeled EEG observations.

Chapter 4: Active Recursive Bayesian State Estimation for Multimodal Noninvasive Body/Brain Computer Interface Design: EEG can only offer a low signal to noise ratio (SNR); hence, in the recent decade researchers have increasingly considered combining non-BCI ATs with BCIs to develop reliable systems. The resulting body/brain computer interfaces (BBCIs) can enhance the detection accuracy and inference speed. For instance, combining electromyography (EMG) or electrooculography (EOG) signals – which record the electrical activity of muscles and of eye movements, respectively – with EEG has shown great improvement over standalone inference from each modality separately [73].

In this chapter, we introduce a framework for the design of a multi-modal noninvasive BBCI. This framework, in addition to having the capability of employing various combinations of EEG potentials as listed above, also combines EEG with any combination of different physiological measurement modalities (such as EMG, fNIRS, fMRI, etc.) to jointly infer the user intent. The introduced framework processes and communicates the large amount of data that is streamed from multiple modalities to make an inference in a short period of time, so that these BBCIs are usable in on-line settings. The framework, as explained in the next section, satisfies such on-line performance requirements by introducing certain conditions on


the streamed data and by enabling parallel processing before the fusion, making the entire process feasible in real time.

Appendix A: Error-Related Potentials for EEG-based Typing Systems: Event related potential (ERP) based typing systems can provide a means of communication for people with severe neuromuscular impairments. During visual presentation of the letters of the alphabet, detection of ERPs in EEG corresponding to a target stimulus can be used to detect user intent. However, EEG has a very low signal-to-noise ratio, and making confident decisions becomes a challenge. To increase accuracy, repeated stimuli are normally used to detect the user intent, which decreases the typing speed. In addition, a backspace symbol is incorporated for error correction. However, it also considerably decreases the speed since it requires at least two selective actions: correctly detecting the backspace symbol and reselecting the intended user symbol. Alternatively, we propose to use the detection of error related potentials (ErrPs) in the EEG response and propose different probabilistic approaches to incorporate ErrP evidence in decision making and auto-correction. With simulations on prerecorded real EEG calibration data using our BCI typing system, the RSVP Keyboard™, we show that our auto-correction method can improve typing speed without sacrificing accuracy.


Chapter 2

Language-Model Assisted Brain Computer Interface for Typing: A Comparison of Matrix and Rapid Serial Visual Presentation

Mohammad Moghadamfalahi1, Student Member, IEEE, Umut Orhan1, Member, IEEE, Murat Akcakaya2, Member, IEEE, Hooman Nezamfar1, Student Member, IEEE, Melanie Fried-Oken3, and Deniz Erdogmus1, Senior Member, IEEE

1Northeastern University, Boston, MA 02115; 2University of Pittsburgh, Pittsburgh, PA 15260; 3Oregon Health and Science University, Portland, OR 97239
E-mails: {moghadam,orhan,nezamfar,erdogmus}@ece.neu.edu, [email protected], [email protected]; Tel: +1-617-3733021

This work was supported by NIH grant R01DC009834 and NSF grants CNS-1136027, IIS-1149570, SMA-0835976. The package including the code and data associated with this paper can be found at "https://repository.lib.neu.edu/collections/neu:rx913r029".


2.1 abstract

Non-invasive electroencephalography (EEG) based brain computer interfaces (BCIs) popularly utilize event related potentials (ERPs) for intent detection. Specifically, for EEG-based BCI typing systems, different symbol presentation paradigms have been utilized to induce ERPs.

In this manuscript, through an experimental study, we assess the speed, recorded signal quality and system accuracy of a language-model-assisted BCI typing system using three different presentation paradigms: a 4 × 7 matrix paradigm of a 28-character alphabet with row-column presentation (RCP) and single character presentation (SCP), and rapid serial visual presentation (RSVP) of the same. Our analyses show that signal quality and classification accuracy are comparable between the two visual stimulus presentation paradigms. In addition, we observe that while the matrix based paradigm can generally be employed with lower inter-trial-interval (ITI) values, the best presentation paradigm and ITI value configuration is user dependent. This potentially warrants offering both presentation paradigms and variable ITI options to users of BCI typing systems.

Keywords–Brain computer interface, Matrix Speller, RSVP Keyboard™, Event Related Potential, P300.

2.2 Introduction

Noninvasive brain computer interfaces (BCIs), specifically those based on electroencephalography (EEG), have become popular to safely enable people with severe motor and speech impairments to communicate with their social networks and interact with their environments [3, 74, 75]. Typing is one of the most widely explored applications for EEG-based BCI systems [3]. Event related potentials (ERPs), specifically the P300 component of these EEG responses, are commonly exploited by such typing interfaces for user intent detection [10, 38, 76, 60].

The pioneering work of Farwell and Donchin showed that ERPs containing the P300 response can be used to design EEG-based BCI typing systems [10]. They distributed 36 symbols, consisting of the 26 letters in the English alphabet and 10 numerical digits, across a 6 × 6 matrix. The rows and columns of the matrix are flashed in a random fashion to generate an oddball paradigm such that when the row or column that includes the symbol that the user intends to select is flashed, an ERP containing the P300 component is elicited. This ERP is then used for target symbol detection. The P300 is a positive deflection in the scalp voltage with a typical latency of around 300 ms after the onset of an infrequent target stimulus [9].


Despite the practice being the benchmark in matrix spellers, flashing rows and columns for the presentation of a symbol may result in poor P300 signal quality, and a single character flashing paradigm enhances the P300 response [77]. Studies have also demonstrated that the performance of a BCI typing system that employs a matrix presentation paradigm depends on the gaze of the user [35, 44]. Many potential users from the target population, unfortunately, lack precise gaze control, and for these users it is anticipated that matrix paradigms will suffer from reduced performance. To overcome this dependency in BCI typing systems, different presentation schemes have been explored and shown to have comparable performance with the matrix presentation paradigm in terms of speed and accuracy [44, 41, 78]. Rapid serial visual presentation (RSVP) is one of these paradigms, in which symbols are presented sequentially in time, at a predefined fixed location on the screen and in a pseudorandom order [79, 38, 39, 40, 36, 37].

BCI typing systems can benefit greatly from a language model in order to enhance typing speed. A probabilistic language model can be employed to incorporate predictive word completion during the intent detection process [29, 52, 68], or to define a prior on potential target characters during the classification task [70, 32, 71]. Our system, the RSVP Keyboard™, originally developed based on the RSVP paradigm, now also features the matrix presentation paradigm and probabilistically fuses context evidence with physiological evidence to infer user intent. A symbol n-gram language model trained on a large corpus provides probabilities for each character in the alphabet, which are fused tightly in a Bayesian fashion with EEG evidence [38, 39, 40].

In this paper, we utilize two different matrix schemes (row-column flash and single symbol flash) and one RSVP scheme in a BCI typing interface and compare the differences in measured signal quality, typing speed, and accuracy. In a similar study, Chennu et al., through an offline study, have shown that the classification accuracy is comparable between RSVP and matrix based paradigms, but that without a language model the typing speed is relatively low while utilizing the RSVP paradigm [78]. In this study, we also compare the typing performance during online typing of both RSVP and matrix paradigms, using the aforementioned language-model-assisted BCI.

The contributions of this paper are: (1) building a unified framework for different presentation paradigms that utilize EEG and language model evidence for joint decision making, (2) conducting real-time and offline comparisons among different presentation schemes, and (3) analyzing the effect of different presentation paradigms on the EEG signal quality.


2.3 General system specifications

The complete operational flowchart of the language-model-assisted BCI typing system is illustrated in Figure 2.1. The system has the following main components: (A) a presentation component that controls the presentation scheme, (B) a feature extraction component that converts raw EEG evidence into a likelihood for Bayesian fusion and (C) a decision making component that fuses EEG (physiology) and language evidence to infer user intent. In the following, we describe these components in some more detail.

Figure 2.1: The in-house BCI block diagram, showing the presentation component (stimuli/decision), the feature extraction component (EEG preprocessing and dimensionality reduction) and the decision making component (joint inference over EEG and contextual evidence).

2.3.1 Presentation Component

2.3.1.1 Definitions

Let A = {a_1, a_2, a_3, ..., a_N} be the set of all possible symbols, typically including the letters in the (English) alphabet, numerical symbols, and the space and backspace symbols (represented here by _ and < respectively). Let F = {f_1, f_2, ..., f_{2^{|A|}}} be the set of all subsets of A; f_i ⊂ A. |A| represents the cardinality of A.

A "trial" in the matrix based presentation scheme flashes a subset fi that can contain multiplecharacters i.e. |fi| ≥ 1, and in RSVP it presents a single symbol; i.e., |fi| = 1. A "flash" is thepresentation of a trial. A "sequence" is a series of consecutive flashes of trials with no gap inbetween. After presenting each sequence, the system updates the posterior probabilities of everysymbol in the alphabet using the new EEG evidence and tries to make an inference about userintent. However, a decision is not made until a predefined confidence level is reached1. Therefore,the system may need to present multiple sequences before a decision can be made. We define thecollection of sequences, at the end of which one symbol is selected, as an "epoch".

2.3.1.2 Matrix presentation

Typically, in non-invasive EEG based typing BCIs with the matrix presentation paradigm, symbols are arranged in an R × C matrix with R rows and C columns [3]. Subsets of these symbols are intensified, usually in pseudorandom order, to produce an oddball paradigm that induces ERP responses.

Trials f(1), f(2), ..., f(n) in a sequence typically cover all the symbols in the matrix, that is ∪_{i=1}^{n} f(i) = A. When each trial f(i) contains exactly all the symbols in a row or a column of the matrix layout, with n = (R + C) [10], this setup is known as the row-and-column presentation (RCP) paradigm. RCP requires that all the symbols in A be flashed twice and that |f_i ∩ f_j| ≤ 1, i ≠ j. In this study, we utilize a matrix of size 4 × 7, which leads to the best coverage of the wide screen monitors used in our experiments. It has been claimed that the probability of the target character in each sequence's flash set should be lower than 25% to induce the P300 response [10]. In this grid setup for RCP, each sequence contains 11 flashes, 2 of which include the target symbol. So the probability of each target trial in each sequence is 2/11 ≈ 0.18, which satisfies the threshold suggested above.

The single character presentation (SCP) paradigm is also a widely used scheme. SCP was shown to increase the P300 signal quality compared to RCP [77]. In this paradigm, each trial contains a single symbol, i.e. |f_i| = 1, and, assuming there is no repetition in a sequence, f_i ∩ f_j = ∅, i ≠ j. With a sufficient number of flashes (n ≥ 5) in a sequence, we can satisfy the suggested condition for the target probability.

¹In the current implementation, confidence is measured by the maximum posterior probability over A; this corresponds to using Renyi entropy of order ∞ as the measure of uncertainty. Other entropy definitions such as Shannon's could also be used.


2.3.1.3 Rapid Serial Visual Presentation (RSVP)

RSVP is a presentation technique in which trials are presented one at a time at a fixed predefined location on the screen, at a rapid rate and in a pseudorandom order [38, 3]. If a BCI user's desired symbol exists in a sequence of trials presented in RSVP fashion, a P300 response is elicited by the target in the EEG signal. RSVP is similar to SCP in that each presentation subset includes only a single symbol; however, RSVP decreases the dependency on gaze control. Presenting 28 symbols in the RSVP paradigm is time consuming, therefore a typical RSVP based BCI system can only achieve a speed of 5 symbols/minute if each sequence contains the entire alphabet [36, 37, 38, 40]. However, recent efforts to speed up typing with this presentation paradigm showed that using context information (such as a language model) and careful selection of subsets of A in each sequence may significantly improve typing speed and accuracy [38, 40, 39, 29, 68].

2.3.2 Feature Extraction Component

The EEG signals are acquired using a g.USBamp biosignal amplifier with active g.Butterfly electrodes at a sampling rate of 256 Hz, from 16 EEG sites (according to the International 10/20 configuration): Fp1, Fp2, F3, F4, Fz, Fc1, Fc2, Cz, P1, P2, C1, C2, Cp3, Cp4, P5 and P6. To improve the signal to noise ratio (SNR) and to eliminate drifts, signals were filtered by an FIR linear-phase bandpass filter passing [1.5, 42] Hz with zero DC gain and a notch filter at 60 Hz.

In order to capture the P300 while omitting possible motor EEG [9], EEG from a time window of [0, 500) ms after each flash's onset is processed as the corresponding raw data for each trial. As we explain later in Section 2.4, we test our system with healthy users; therefore the window length is chosen short to avoid any discriminative contributions of motor activity related EEG response, if any. EEG data processing continues with: (i) down-sampling by 2, (ii) projection to a lower dimensional space using principal component analysis (PCA) to remove directions with negligible variance, and (iii) concatenation of data from all channels corresponding to the same trial to form a feature vector for each trial.
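A minimal sketch of this preprocessing chain, using SciPy and scikit-learn, is given below. The FIR filter length, the per-channel application of PCA and the retained-variance threshold are assumptions made for illustration; the actual system parameters are as described in the text.

import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch
from sklearn.decomposition import PCA

fs = 256                                    # sampling rate in Hz

def preprocess(eeg, onsets, n_taps=101, var_keep=0.99):
    """eeg: (n_channels, n_samples); onsets: flash onset sample indices (with >= 0.5 s of data after each).
    Returns one concatenated feature vector per trial."""
    # Band-pass 1.5-42 Hz linear-phase FIR and 60 Hz notch, applied channel-wise
    bp = firwin(n_taps, [1.5, 42.0], pass_zero=False, fs=fs)
    b_notch, a_notch = iirnotch(60.0, Q=30.0, fs=fs)
    filtered = filtfilt(bp, [1.0], eeg, axis=1)
    filtered = filtfilt(b_notch, a_notch, filtered, axis=1)

    # [0, 500) ms windows after each onset, downsampled by 2
    win = int(0.5 * fs)
    trials = np.stack([filtered[:, t:t + win:2] for t in onsets])    # (n_trials, n_ch, win/2)

    # PCA per channel to drop near-zero-variance directions, then concatenate channels
    n_trials, n_ch, _ = trials.shape
    feats = []
    for ch in range(n_ch):
        pca = PCA(n_components=var_keep).fit(trials[:, ch, :])
        feats.append(pca.transform(trials[:, ch, :]))
    return np.hstack(feats)                                          # (n_trials, feature_dim)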

2.3.3 Decision Making Component

Evidence from EEG is supported with evidence from language structure. These two information sources are fused using a Naïve Bayes assumption to make a joint decision using MAP inference. Optimal classifier parameters for target detection are learned using the calibration data.


2.3.3.1 EEG feature extraction and classification

To improve intent detection performance, the EEG feature vectors computed as described above are projected into a one dimensional space which attempts to maximize the separation between the target and non-target classes according to a measure. Specifically, assuming that, in each class, the feature vectors follow a multivariate Gaussian distribution², quadratic discriminant analysis (QDA) is used to project the data to minimize the expected risk. QDA requires the inverse of the empirical covariance for each class. Estimating an invertible covariance is not feasible in the practical usage of the typing system due to the high dimensionality of the EEG feature vectors and the low number of calibration samples in each class. This issue has been addressed by employing regularized discriminant analysis (RDA), which provides full-rank covariance estimates for each class [80].

RDA uses shrinkage and regularization. Shrinkage is a linear combination of each class covariance matrix and the overall class-mean-subtracted covariance. Considering x_i ∈ R^p as a p-dimensional feature vector and l_i as its label, which can take values of 0 and 1 for the non-target and target classes respectively, the maximum likelihood estimators for the mean and covariance of each class are

µ_k = (1/N_k) Σ_{i=1}^{N} x_i δ_{l_i,k}
Σ_k = (1/N_k) Σ_{i=1}^{N} (x_i − µ_k)(x_i − µ_k)ᵀ δ_{l_i,k}   (2.1)

where k ∈ {0, 1}, N_k is the number of training feature vectors in class k, and thus N, the total number of feature vectors, will be N_0 + N_1, and δ_{·,·} is the Kronecker delta. The shrinkage procedure manipulates the covariance matrices by

Σ_k(λ) = [(1 − λ) N_k Σ_k + λ Σ_{k'=0}^{1} N_{k'} Σ_{k'}] / [(1 − λ) N_k + λ Σ_{k'=0}^{1} N_{k'}]   (2.2)

Here λ ∈ [0, 1] is the shrinkage parameter which defines the similarity of the two classes' covariances. λ = 1 leads to equal covariance matrices for both classes, which turns RDA into linear discriminant analysis (LDA). The regularization procedure is as follows:

Σ_k(λ, γ) = (1 − γ) Σ_k(λ) + γ (1/p) tr[Σ_k(λ)] I_p   (2.3)

²The Gaussian distribution assumption here is a direct consequence of the assumption that filtered EEG is a Gaussian random process.


tr[·] is the trace operator, I_p is a p × p identity matrix and γ ∈ [0, 1] is the regularization parameter which determines the circularity of the covariance matrix.

Correspondingly, the discriminant score function is defined as

d_RDA(x) = log [ f(x; µ_1, Σ_1(λ, γ)) π_1 / ( f(x; µ_0, Σ_0(λ, γ)) π_0 ) ]   (2.4)

where f(x; µ, Σ) is the Gaussian probability density function when x ∼ N(µ, Σ) and π_k is the prior probability of class k. In our system, we use π_1 = π_0. To find the class conditional probability distributions of the RDA scores, we use kernel density estimation (KDE) [40]. Each class conditional KDE is calculated over the RDA scores of the EEG evidence recorded for the representative trials of that class in the calibration data set. Finally, the conditional probability density function for each class is defined as:

f(x = y | l = k) = f_KDE(d_RDA(x) = d_RDA(y) | l = k) = (1/N_k) Σ_{i=1}^{N} K_{h_k}(d_RDA(x_i), d_RDA(y)) δ_{l_i,k}   (2.5)

Here K_{h_k}(·, ·) is a suitable kernel function with bandwidth h_k. A Gaussian kernel is used in our system, and accordingly the kernel bandwidth h_k for each class is calculated using the Silverman rule of thumb [81] over the RDA scores for the corresponding class.
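The following sketch implements equations (2.1)–(2.5) in NumPy for illustration; the shrinkage/regularization values and the Silverman constant are placeholders, and the helper names (rda_fit, rda_score, kde_density) are hypothetical, not part of the actual system code.

import numpy as np

def rda_fit(X, y, lam=0.5, gam=0.5):
    """Class means and shrunk/regularized covariances (eqs. 2.1-2.3). X: (N, p), y in {0, 1}."""
    p = X.shape[1]
    Ns = {k: np.sum(y == k) for k in (0, 1)}
    means = {k: X[y == k].mean(axis=0) for k in (0, 1)}
    covs = {k: np.cov(X[y == k].T, bias=True) for k in (0, 1)}        # ML covariances
    pooled_num = sum(Ns[k] * covs[k] for k in (0, 1))
    stats = {}
    for k in (0, 1):
        shrunk = ((1 - lam) * Ns[k] * covs[k] + lam * pooled_num) / ((1 - lam) * Ns[k] + lam * sum(Ns.values()))
        reg = (1 - gam) * shrunk + gam * np.trace(shrunk) / p * np.eye(p)
        stats[k] = (means[k], reg)
    return stats

def rda_score(x, stats):
    """Discriminant score d_RDA(x) (eq. 2.4) with equal class priors."""
    def logpdf(x, mu, S):                       # Gaussian log-density up to a constant
        d = x - mu
        _, logdet = np.linalg.slogdet(S)
        return -0.5 * (logdet + d @ np.linalg.solve(S, d))
    return logpdf(x, *stats[1]) - logpdf(x, *stats[0])

def kde_density(score, train_scores):
    """Gaussian KDE of 1-D RDA scores with Silverman's bandwidth (eq. 2.5)."""
    n = len(train_scores)
    h = 1.06 * np.std(train_scores) * n ** (-1 / 5)
    return np.mean(np.exp(-0.5 * ((score - train_scores) / h) ** 2) / (h * np.sqrt(2 * np.pi)))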

2.3.3.2 Language Model

The system utilizes a letter n-gram model in an iterative Bayesian framework to increase the typing speed by prioritizing the symbols to be presented in each sequence and by providing a prior context for intent detection. A letter n-gram model estimates the conditional probability of every letter in the alphabet based on the n − 1 previously typed letters in a Markov model framework [82].

Therefore, in a letter n-gram model, the conditional probability of each character, according to Bayes' rule, is given by

p(a_t = a | ā_t = ā) = p(a_t = a, ā_t = ā) / p(ā_t = ā)   (2.6)

where a_t is the symbol (yet) to be typed at epoch t and ā_t is the string of the previously written n − 1 symbols. In our system, we use a 6-gram letter model, which is trained on the NY Times portion of


the English Gigaword corpus [82].
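As a toy illustration of equation (2.6), the sketch below estimates letter n-gram conditional probabilities by simple counting over a small corpus string; the real model is trained on the Gigaword corpus and uses proper smoothing/backoff, which this sketch only hints at.

from collections import Counter

def letter_ngram_probs(corpus, n=6):
    """Conditional probabilities p(a_t | previous n-1 letters) by maximum likelihood (eq. 2.6)."""
    context_counts, joint_counts = Counter(), Counter()
    for i in range(len(corpus) - n + 1):
        context, nxt = corpus[i:i + n - 1], corpus[i + n - 1]
        context_counts[context] += 1
        joint_counts[(context, nxt)] += 1
    def prob(next_letter, context):
        c = context[-(n - 1):]                  # keep only the last n-1 typed letters
        if context_counts[c] == 0:
            return 0.0                          # a real model would back off / smooth here
        return joint_counts[(c, next_letter)] / context_counts[c]
    return prob

prob = letter_ngram_probs("the_quick_brown_fox_jumps_over_the_lazy_dog_" * 20)
print(prob("e", "over_th"))                     # probability of "e" given the typed context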

2.3.3.3 Fusion

Assume x_{t,r,a_i} represents the EEG feature vector of a trial which contains a_i ∈ A, at repetition r ∈ {1, 2, ..., R_{a_i}} in epoch t, where R_{a_i} represents the total number of repetitions of trials containing the character a_i in the same epoch. Moreover, define l_{t,a_i} as the class label for a_i ∈ A in epoch t. The probabilistic graphical model that we use for fusion is shown in Figure 2.2.

Figure 2.2: Probabilistic graphical model of the fusion rule.

Let X_{t,a_i} = [x_{t,1,a_i}, x_{t,2,a_i}, ..., x_{t,R_{a_i},a_i}] represent a (p × R_{a_i}) matrix of observed EEG feature vectors in epoch t. Here, p is the length of each feature vector. Accordingly, assume X_t = [X_{t,a_1}, X_{t,a_2}, ..., X_{t,a_{|A|}}] is a (p × N) matrix, where N is the total number of flashes in epoch t. Define x̄ as a possible outcome for the matrix X_t. Using Bayes' rule, we can define the posterior probability conditioned on the previously typed text and the observed EEG feature vectors as

Q = p(a_t = a | X_t = x̄, ā_t = ā) ∝ p(X_t = x̄, ā_t = ā | a_t = a) P(a_t = a)   (2.7)

Using the proposed graphical model, given the intended symbol a, the EEG evidence and the previously typed text are conditionally independent. Moreover, given a, the EEG evidences for each trial, x_{t,a_i,1}, x_{t,a_i,2}, ..., x_{t,a_i,R_{a_i}}, are independent. Therefore,

Q ∝ ( Π_{a_i ∈ A} Π_{r=1}^{R_{a_i}} f(x_{t,a_i,r} = x_{a_i,r} | a_t = a) ) P(a_t = a | ā_t = ā)   (2.8)


Here x_{a_i,r} is the possible EEG evidence for the r-th repetition of character a_i. Also, for a given a, the labels l_{t,a_i} are deterministically defined. With this assumption, eq. (2.8) can be simplified as

Q ∝ ( Π_{r=1}^{R_a} f(x_{t,a,r} = x_{a,r} | l_{t,a} = 1) / f(x_{t,a,r} = x_{a,r} | l_{t,a} = 0) ) P(a_t = a | ā_t = ā)   (2.9)

At the end of each sequence, p(a_t = a | X_t = x̄, ā_t = ā) is calculated for all symbols; if the maximum of these posterior probabilities is higher than a predefined confidence threshold, a decision to type the corresponding symbol is made. Otherwise, sequences are repeated until the required confidence level is reached. If the confidence level is not reached within a predefined maximum number of sequence repetitions, the symbol with the maximum a posteriori probability is chosen as the desired symbol.
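A compact sketch of the fusion rule (2.9) and the confidence-threshold decision, working in the log domain for numerical stability, is given below; the per-trial log-likelihood ratios and language-model priors are assumed to be supplied by the components described above, and the threshold value is illustrative.

import numpy as np

def epoch_posterior(log_lr_per_trial, lm_prior):
    """Posterior over symbols after a sequence (eq. 2.9).
    log_lr_per_trial: dict symbol -> list of log f(x|l=1) - log f(x|l=0) for its trials so far;
    lm_prior: dict symbol -> p(a_t = a | previously typed text)."""
    symbols = list(lm_prior)
    log_q = np.array([np.sum(log_lr_per_trial.get(a, [])) + np.log(lm_prior[a]) for a in symbols])
    q = np.exp(log_q - log_q.max())
    q /= q.sum()
    return dict(zip(symbols, q))

def decide(posterior, threshold=0.9):
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else None   # None -> present another sequence

An epoch then simply keeps accumulating log-likelihood ratios over repeated sequences until decide() returns a symbol or the repetition bound is hit.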

2.3.4 System Operation Modes

The developed typing interface can currently be utilized in the following modes.

(i) Calibration mode: During calibration, the users are asked to attend to predefined target symbols within randomly ordered sequences to record labeled EEG data. The data acquired in this mode are then used in the estimation of the classifier parameters to be used in the other system operation modes. The shrinkage and regularization parameters are optimized during calibration using k-fold cross-validation to maximize the area under the ROC curve (a brief sketch of this search is given after the list of modes below).

(ii) Copy phrase task mode: In this task, the users are given a set of predefined phrases. Each phrase includes a missing word and the users are asked to complete these words. This task is designed to assess the system and/or user performance in terms of speed and accuracy in the presence of a language model.

(iii) Mastery task mode: Users are trained to use the system in this mode. It is similar to the copy phrase task mode in that the users are asked to type a set of predefined phrases. In contrast, the phrases used in this task have been carefully selected and divided into 5 difficulty levels based on their predictability by the language model. As the user completes the phrases in a level, the task continues with the next level with more difficult sentences.³

(iv) Free spelling mode: This mode allows the users to type their desired text.

³Lower levels consist of copying phrases that have letters which are assigned high probabilities by the language model. As the level increases, the language model probabilities become increasingly adversarial. Level 3 is neutral on average.


(v) Simulation mode: In this mode, the copy phrase task is completed using samples drawn from the KDE of the class conditional EEG feature distributions as computed in (2.5). These samples simulate EEG evidence and are fused with the language model probabilities for decision making as in regular operation [40]. The probability of completing the task and the expected task completion durations are reported as estimated performance measures using Monte-Carlo simulations.
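The sketch below illustrates the calibration-time search mentioned in mode (i): a grid over (λ, γ) scored by cross-validated AUC. It reuses the hypothetical rda_fit and rda_score helpers sketched in Section 2.3.3.1; the grid and fold count are illustrative, not the values used by the system.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def select_rda_hyperparams(X, y, grid=np.linspace(0.1, 0.9, 5), n_folds=10):
    """Pick (lambda, gamma) maximizing cross-validated AUC on calibration data."""
    best = (None, -np.inf)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for lam in grid:
        for gam in grid:
            aucs = []
            for tr, te in skf.split(X, y):
                stats = rda_fit(X[tr], y[tr], lam, gam)       # hypothetical helpers from the earlier sketch
                scores = [rda_score(x, stats) for x in X[te]]
                aucs.append(roc_auc_score(y[te], scores))
            if np.mean(aucs) > best[1]:
                best = ((lam, gam), np.mean(aucs))
    return best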

In this paper, we use all modes of the system for the following experiments, except free spelling.

2.4 Experimental Results

2.4.1 Experiment

In this study we assess the system performance in three presentation scenarios:

∙ 4 × 7 Matrix row and column presentation (RCP) paradigm

∙ 4 × 7 Matrix single character presentation (SCP) paradigm

∙ Rapid serial visual presentation (RSVP) paradigm

The comparison is based on three dependent variables: signal quality, system accuracy and typing speed. Following a group based analysis, we utilize paired t-tests to determine if the system performance varies significantly due to changes in the presentation paradigm or inter-trial interval (ITI) values. In addition, we perform paired t-tests within each user to assess the variations in P300 responses due to different ITIs.
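For reference, the following sketch shows how such a paired comparison could be run with SciPy; the per-user AUC arrays are entirely made-up placeholder values, not results from this study.

import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-user AUCs for two ITI values (one entry per participant, paired by user)
auc_iti_150 = np.array([0.82, 0.78, 0.85, 0.74, 0.80, 0.79, 0.83, 0.77, 0.81, 0.76, 0.84, 0.79])
auc_iti_100 = np.array([0.76, 0.75, 0.80, 0.70, 0.78, 0.74, 0.79, 0.73, 0.77, 0.72, 0.81, 0.75])

t_stat, p_value = ttest_rel(auc_iti_150, auc_iti_100)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")   # reject the null hypothesis if p < 0.05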

Twelve healthy volunteers, nine males and three females, between the ages of 24 and 38 years, consented to participate in this study, which was conducted following an IRB-approved protocol. Each user participated in three sessions, each session on a different day and with a different presentation paradigm. It is possible for a participant to gradually obtain skills to handle the system more efficiently, thereby introducing learning effects from session to session. To control for this effect, we relied on quasi-randomization; we distributed the presentation paradigms over the experimental sessions such that the number of users who attended a session with a specific presentation paradigm at a specific position in the session order was kept the same (balanced). Every session that a user attended included calibration tasks with 4 different ITI values of {200, 150, 100, 85} ms. These values are chosen to be compatible with a 60 Hz monitor refresh rate and cover the range of possible optimum inter-trial


durations. To account for the effect of user fatigue on typing performance, we randomized the order of ITI values for each presentation scenario and among all users. We used a duty cycle of 75% for each flash.

After calibration, each session proceeded with the mastery task [83] followed by the copy phrase task with 8 sentences. We use the level 1 mastery task to familiarize the users with the copy phrase task. To prevent long sessions, the system marks a phrase as unsuccessful if more than 4 wrong letter selections occur in a row, and the next phrase is presented to the user.

2.4.2 Results

2.4.2.1 Signal Quality

In their work, Sellers et al. show that ITI effectively modifies the shape of the P300 response [84]. To investigate the effect of ITI on the P300 response, we analyzed the signal quality for every presentation scheme and ITI combination using the calibration data collected for different ITI values. For such combinations we computed the area under the curve (AUC) of the ROC as the classification accuracy measure. Within each presentation paradigm, we applied a paired t-test over these accuracy values. The results are reported in Figures 2.3(a), 2.3(b), 2.3(c) and Table 2.1 for each paradigm. The three sub-figures correspond to the different presentation paradigms and, in each sub-figure, the average accuracies for different ITIs are presented using bar graphs with error bars. Table 2.1 summarizes the paired t-test results between every ITI pair for each presentation paradigm.

Table 2.1: Hypothesis testing results between different ITIs within each paradigm. The null hypothesis is that the expected AUC difference of the two considered ITIs is zero. Here we used α = 0.05.

ITI_1 vs. ITI_2      P-values in RCP   P-values in SCP   P-values in RSVP
85 vs. 100 ms        0.068             0.593             0.157
85 vs. 150 ms        0.803             0.927             0.001
85 vs. 200 ms        0.550             0.240             0.009
100 vs. 150 ms       0.075             0.673             0.0008
100 vs. 200 ms       0.053             0.053             0.027
150 vs. 200 ms       0.570             0.236             0.693

Figure 2.3: Bar charts of average AUC with error bars. Sub-figures (a), (b) and (c) demonstrate the accuracy statistics for each ITI, respectively for the RCP, SCP and RSVP paradigms. Sub-figure (d) reports the AUC statistics for the different presentation paradigms at ITI = 150 ms.

From Table 2.1, we observe that the group based hypothesis testing does not show significant

variations among classification accuracies due to changes in ITI values for the RCP paradigm. The results also suggest that the ITI value of 85 ms is the best candidate for the matrix RCP paradigm. This ITI offers shorter sequence times and a consistently higher average AUC (averaged across users), as shown in Figure 2.3(a). Our observations suggest that ERP responses in the RCP paradigm are more robust to changes in ITI values.

The SCP paradigm with an ITI of 200 ms demonstrates the highest average AUC with the lowest variance (Figure 2.3(b)). Although the average AUCs across users show an increasing trend from an ITI of


85 ms to 200 ms, pairwise comparisons between different ITIs do not show statistically significant variations in population AUCs (see Table 2.1).

Generally, in matrix based presentation paradigms, variations in ITI values seem to have a negligible effect on system AUC. The usage of smaller ITIs might be preferable due to a possible decrease in the sequence duration, which might improve the speed of the typing interface. Moreover, it might be viable to optimize the matrix subset flashes based on context information to obtain shorter sequence lengths and higher classification confidence by increasing the number of flashes of probable characters, which can lead to faster target detections.

On the other hand, accuracies with the RSVP paradigm tend to be more sensitive to changes in ITI values (as shown in Table 2.1). The most significant increase in AUC happens from ITI = 100 ms to ITI = 150 ms. The accuracy deviations between ITI = 85 ms and ITI = 100 ms, and also between ITI = 150 ms and ITI = 200 ms, are not significant, as reported in Table 2.1. Consequently, among the ITI values tested with the RSVP paradigm, ITI = 150 ms is the best choice for system design, since the accuracies between ITIs of 150 and 200 ms do not change significantly while ITI = 150 ms provides better speed. This is consistent with our previous work using RSVP for image search [79]. In contrast with matrix based presentation paradigms, in the RSVP paradigm users need to recognize the target symbols, which induces weaker P300 signals, especially at lower ITIs, as shown in Figure 2.3(c).

Figure 2.4: Average ERP responses to target and non-target stimuli for each presentation paradigm ((a) RCP, (b) SCP, (c) RSVP) and each ITI (85, 100, 150 and 200 ms, increasing from top to bottom) for user "U8", with the 16 recorded channels (Fp1, Fp2, F3, F4, Fz, Fc1, Fc2, Cz, P1, P2, C1, C2, Cp3, Cp4, P5, P6) overlaid.

To investigate signal quality variations due to ITI changes in each presentation scheme, we extract the P300 peak values for every target stimulus at all channels per user, for the different combinations of ITI values and presentation paradigms. In this process, we filter the EEG signal using a Gaussian low pass filter (σ = 5 samples) to increase the signal to noise ratio (SNR). For each target trial, we define a (16 × 1) dimensional feature vector with the i-th element containing the peak value of the EEG at channel i in the time window [250, 350] ms after stimulus onset. For every user and presentation paradigm, we use these feature vectors in a multivariate paired t-test to investigate the significance of the P300 peak value differences between ITIs; the results are reported in Table 2.2.
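A minimal sketch of this peak-feature extraction is given below, assuming epochs already time-locked to target flashes; the function name and array layout are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter1d

fs = 256                        # Hz; 16 channels as in the recording setup

def p300_peak_features(target_epochs):
    """target_epochs: (n_trials, 16, n_samples) EEG time-locked to target flashes.
    Returns a (n_trials, 16) matrix of per-channel peak amplitudes in the [250, 350] ms window."""
    smoothed = gaussian_filter1d(target_epochs, sigma=5, axis=-1)    # Gaussian low-pass, sigma = 5 samples
    lo, hi = int(0.250 * fs), int(0.350 * fs)                        # window in samples after onset
    return smoothed[:, :, lo:hi].max(axis=-1)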



Table 2.2: Multivariate paired t-test results on P300 peak values for each subject among different ITIs within each presentation paradigm. The null hypothesis is that the expected difference of the P300 peak feature vectors between the two considered ITIs is zero within each paradigm. Here we used α = 0.05, i.e., H = 1 if P < 0.05 and H = 0 otherwise.

Users                U1      U2      U3      U4      U5      U6      U7      U8      U9      U10     U11     U12

RCP (H / P)
85 vs. 100 ms        0/0.18  0/0.09  0/0.19  0/0.27  1/0.01  0/0.85  0/0.16  1/0     1/0.02  0/0.16  1/0.02  1/0.04
85 vs. 150 ms        0/0.61  1/0     1/0.03  0/0.89  1/0     0/0.28  1/0.01  1/0     1/0     0/0.32  0/0.18  0/0.73
85 vs. 200 ms        1/0.02  0/0.27  1/0     1/0     1/0     1/0     1/0     1/0     1/0     1/0.03  1/0.02  0/0.79
100 vs. 150 ms       0/0.13  1/0     1/0.01  0/0.25  1/0     0/0.31  1/0.01  1/0     1/0     1/0.03  0/0.16  0/0.11
100 vs. 200 ms       0/0.07  1/0     1/0     1/0     1/0     1/0     1/0     0/0.06  1/0     1/0     1/0     0/0.11
150 vs. 200 ms       1/0     0/0.15  1/0     1/0     1/0     1/0     1/0     1/0     0/0.25  0/0.14  1/0     0/0.11

SCP (H / P)
85 vs. 100 ms        0/0.35  0/0.13  1/0     0/0.31  0/0.88  0/0.1   1/0     1/0     0/0.75  1/0.04  1/0     0/0.08
85 vs. 150 ms        0/0.47  1/0     0/0.95  0/0.15  0/0.72  0/0.11  1/0     1/0     0/0.33  1/0.03  1/0     0/0.06
85 vs. 200 ms        0/0.18  1/0     1/0     0/0.33  1/0     1/0     1/0     1/0     1/0.03  0/0.16  1/0     0/0.08
100 vs. 150 ms       0/0.05  1/0     1/0     1/0.05  0/0.85  1/0.04  1/0.01  1/0     0/0.38  0/0.29  0/0.93  0/0.28
100 vs. 200 ms       1/0.02  1/0     1/0     0/0.55  0/0.1   1/0     1/0     1/0     0/0.47  0/0.39  0/0.09  1/0.03
150 vs. 200 ms       1/0     1/0     0/0.36  1/0.04  1/0.01  1/0     1/0     1/0     1/0.05  0/0.18  0/0.48  0/0.2

RSVP (H / P)
85 vs. 100 ms        0/0.06  0/0.69  0/0.1   0/0.38  1/0.02  1/0     0/0.42  0/0.14  0/0.57  0/0.64  0/0.24  0/0.07
85 vs. 150 ms        0/0.63  0/0.09  1/0.02  0/0.07  0/0.36  1/0     1/0     1/0.04  1/0.05  0/0.19  0/0.37  0/0.05
85 vs. 200 ms        1/0     0/0.15  1/0     1/0     1/0     1/0     1/0     1/0     1/0     1/0     0/0.08  0/0.09
100 vs. 150 ms       0/0.16  1/0.01  0/0.08  1/0.01  0/0.21  1/0     0/0.12  1/0.01  0/0.1   0/0.13  0/0.24  0/0.82
100 vs. 200 ms       1/0     0/0.28  1/0     1/0     0/0.06  1/0     1/0     1/0     1/0.01  1/0.02  0/0.93  0/0.37
150 vs. 200 ms       1/0     0/0.2   1/0     1/0     1/0.03  1/0     1/0.02  1/0     1/0     0/0.45  0/0.1   0/0.62

[Figure 2.5 appears here: three panels, (a) RCP, (b) SCP, and (c) RSVP, for user U12, with the same layout as Figure 2.4 (average target and non-target ERP waveforms in µV over 0-500 ms at the 16 recorded channels, for ITIs of 85, 100, 150, and 200 ms).]

Figure 2.5: Average ERP responses to target and non-target stimuli for each presentation paradigm and ITI pair for user "U12". From top to bottom, the ITI increases monotonically.

P300 amplitude deviations across different ITI values. We report the results in Table 2.2. Comparing the results for the different paradigms, we do not observe a consistent change in P300 amplitude across ITI values. For instance, variations in ITI significantly change the P300 peak values of user U8 in every presentation paradigm, as illustrated in Figure 2.4, while this is not true for user U12 (see Figure 2.5). Consequently, to acquire the best performance, we recommend that the optimum ITI be determined individually for each user.
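As an illustration, the following is a minimal sketch of this per-user analysis, assuming the target-locked epochs of the two compared ITI conditions are stored as (trials × channels × samples) arrays truncated to the same number of target trials so that a paired test applies; the function names and the Hotelling's T² formulation of the multivariate paired t-test are ours and not the exact analysis code used in this study.

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import f as f_dist

def p300_peak_features(target_epochs, fs, win=(0.250, 0.350), sigma=5):
    # target_epochs: (n_trials, n_channels, n_samples) target-locked EEG
    # returns (n_trials, n_channels) smoothed peak amplitudes in the P300 window
    smoothed = gaussian_filter1d(target_epochs, sigma=sigma, axis=-1)
    lo, hi = int(win[0] * fs), int(win[1] * fs)
    return smoothed[:, :, lo:hi].max(axis=-1)

def multivariate_paired_ttest(x, y):
    # Hotelling's T^2 test on paired (n_trials, n_channels) feature matrices
    d = x - y
    n, p = d.shape
    dbar = d.mean(axis=0)
    S = np.cov(d, rowvar=False)
    t2 = n * dbar @ np.linalg.solve(S, dbar)
    F = (n - p) / (p * (n - 1)) * t2            # F-distributed under H0
    return t2, f_dist.sf(F, p, n - p)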




2.4.2.2 System Accuracy based on Presentation Paradigm

We analyzed the changes in system classification accuracy across different presentation paradigms. Similar to the signal quality analysis, we employed AUC values as the measure of accuracy. We set ITI = 150 ms, which provides good performance for all paradigms and is close to the ITI value typically used for matrix-based presentation paradigms (125 ms) [10]. We analyzed the changes

Table 2.3: Hypothesis testing results between different presentation paradigms at ITI = 150 ms. The null hypothesis is that the expected AUC difference of the two considered paradigms is zero. Here we used α = 0.05.

Paired t-test results between different presentation paradigms

Compared paradigms    H0 rejected    P-value
RCP vs. RSVP          No             0.362
RCP vs. SCP           No             0.453
RSVP vs. SCP          No             0.188

in AUC values using a paired t-test between each pair of presentation paradigms. We report the results in Table 2.3. These results do not indicate significant differences between presentation paradigms. Moreover, in Figure 2.3(d), we plot the AUC values averaged over all users for the different presentation schemes. This figure shows that the average AUC values in matrix-based paradigms are higher than in the RSVP paradigm; however, the paired t-test outcomes do not confirm a statistically significant separation among these average AUC values. Based on these results, we propose that, at ITI = 150 ms, system accuracy might depend more on the user than on the presentation paradigm.

In Figure 2.6, we plot the channel-by-channel significance levels for the paired t-tests between different presentation paradigms. From this figure, we first observe that there is no statistically significant difference between the presentation paradigms. This result is consistent with the results reported in Table 2.3.

We also observe that there is no consistent electrode subset that shows a significant difference among the presentation paradigms. In addition, we plot the AUC values calculated using each channel separately, for each user and presentation paradigm, in Figure 2.7. For a specific presentation paradigm, this figure does not show a consistent scalp region with high accuracy across users.4 Based on these results, we suggest that the optimum presentation paradigm is user dependent.

4 One may need to optimize the best electrode locations for each paradigm-user combination to maximize performance.




[Figure 2.6 appears here: three scalp topography maps, (a) RCP, (b) SCP, and (c) RSVP, with a color scale from 0 to 1.]

Figure 2.6: Topography map of (1 − p) resulting from paired t-tests of each channel's AUC between each paradigm pair, across users, for ITI = 150 ms. Red denotes 1 − p = 1 and blue denotes 1 − p = 0.

2.4.2.3 Typing Speed

We analyze the differences in typing speed across presentation paradigms, employing the average number of sequences per target trial as an inverse measure of speed (time spent per letter). Conventionally, in the RCP paradigm, all rows and columns are flashed once during each sequence (which results in 11 flashes per sequence in this study). On the other hand, the RSVP paradigm has previously demonstrated near-optimal performance with 8 trials per sequence [39]. Accordingly, to keep the analysis equitable, we use the average number of sequences per target trial as the measure of time spent. We set ITI = 150 ms. In addition to the experimental results, we also perform 20 Monte Carlo simulations of the copy-phrase task for every user under each presentation paradigm, using the corresponding calibration EEG data to generate simulated EEG evidence.

We report both the simulation and experimental results in Figure 2.8. For each presentation paradigm, this figure shows the average number of sequences per target trial and the task completion probability versus the AUC values of different users. We observe that both the minimum and maximum user AUCs are smaller in the RSVP paradigm than in the matrix-based presentation schemes. In the RCP paradigm, each symbol is presented twice in a sequence; consequently, the number of target-class data points in the EEG recorded during the calibration task is twice that of the other paradigms. This can lead to more accurate estimation of the classifier parameters, which in turn leads to smaller average numbers of sequences per target trial and higher task completion probability.

In general, actual typing performance in the SCP paradigm shows behavior consistent with the simulation results. In all paradigms, the simulation results are reasonably predictive of the actual typing task statistics for larger AUCs. Mismatch between the simulation results and actual user typing speeds




is more frequent in the RSVP paradigm. This may be because user AUCs are generally lower for the RSVP paradigm, since the requirement to recognize the target symbol might impose more cognitive load and require more attention from the user.5 Nevertheless, some participants still show faster typing performance with RSVP than with the matrix-based presentation schemes; see Table 2.4.

Table 2.4: Typing speed results for each user and paradigm combination. The "average ± standard deviation" of the sequence count per correctly typed target symbol is reported.

User    RSVP            SCP             RCP
U1      3.98 ± 2.7      1.29 ± 0.26     2.73 ± 0.85
U2      12.74 ± 5.74    10.96 ± 4.9     5.67 ± 4.53
U3      3.05 ± 0.69     1.48 ± 0.32     3.1 ± 0.83
U4      10.74 ± 3.35    3.55 ± 1.78     8.92 ± 3.83
U5      5.6 ± 1.46      5.54 ± 1.97     7.48 ± 3.55
U6      3.35 ± 1.48     3.85 ± 2.04     4.1 ± 1.32
U7      2.04 ± 0.35     2.21 ± 0.62     3.38 ± 1.68
U8      8.64 ± 5.15     5.08 ± 2.44     2.09 ± 0.57
U9      2.44 ± 0.76     3.1 ± 1.36      2.09 ± 0.57
U10     10.89 ± 4.3     2.52 ± 0.95     3.01 ± 1.11
U11     6.59 ± 3.76     2.03 ± 0.42     1.93 ± 0.5
U12     5.84 ± 3.87     8.55 ± 4.26     7.14 ± 4.09

From this table, user U7 shows better typing performance with the RSVP paradigm (see sub-figures 2.9(a), 2.9(b), and 2.9(c)), while users U3 and U9 spelled the target phrases with a lower average number of sequences when using the SCP and RCP paradigms, respectively (see sub-figures 2.9(d), 2.9(e), 2.9(f) and 2.9(g), 2.9(h), 2.9(i)). Accordingly, the choice of the best presentation scheme should be user dependent.

2.4.2.4 Effect of Language Model on Typing Duration

We employ the simulation mode of the system to assess the effect of the language model on the (estimated) performance of each presentation paradigm. We perform 10 Monte Carlo simulations (of the copy-phrase task) with and without the language model to estimate the typing speed under both conditions, using the calibration EEG data from each user. We represent the typing speed as the average number of sequences required to correctly type a character, Navg. The results shown in Figures 2.10(a), 2.10(b), and 2.10(c) indicate that the language model significantly improves the performance of all three presentation paradigms. This is seen as a reduction in the average sequence count required to type a target symbol correctly, as well as a reduction in its variance: without a language model, the mean values of Navg are larger for all users, and the standard deviations of Navg are larger for most participants. For RCP, users with lower AUC (larger sequence counts on the without-LM axis) seem to benefit increasingly from the assistance of the language model in this task (see Figure 2.10(a)). In the case of SCP (Figure 2.10(b)) and RSVP (Figure 2.10(c)), the same trend is observed for high to moderately good AUCs, but for the users with the lowest AUCs (appearing on the right-most side of their respective plots) the language model assistance is less consistent than in the case of RCP. This inconsistent behavior appears to occur at low AUCs (AUC < 0.74). This suggests that, for some users with low

5 This claim is mainly based on the users' feedback after each session. They described it as more challenging to spot the desired character in the RSVP paradigm, especially in the sessions with smaller ITIs.




classification performance, we may need to collect more training samples in the calibration session so that they can benefit from the language model assistance.

2.5 Conclusion

In this manuscript we compared three different presentation paradigms, (i) 4 × 7 matrix row-and-column, (ii) 4 × 7 matrix single character, and (iii) rapid serial visual presentation, within a language-model-assisted EEG-based letter-by-letter typing BCI. The underlying intent inference engine used tight fusion of language and EEG evidence, as described in earlier papers on RSVP Keyboard™ [40, 38, 39]. Twelve participants were recruited to use the system at four different ITIs of {85, 100, 150, 200} ms for each presentation scheme. The order of paradigm presentations for each session and each user was quasi-randomized. The same classifier, language model, and fusion rule were used for all paradigm and ITI combinations.

Through this study, we showed that the best presentation paradigm and ITI combination among those considered here should be identified for each user individually to achieve the best performance. We also showed that the performance of the RSVP paradigm is comparable to matrix-based presentation paradigms for healthy users. Based on our results, we argue that BCI typing systems should be capable of employing multiple presentation schemes, including both RSVP and matrix presentation paradigms. Such a system, after individual clinical assessments, should be able to determine the best presentation option and the best ITI value for each user according to user preferences, capabilities, EEG signal statistics, and simulations. Moreover, the length of the calibration session might need to be increased based on the classification performance of a user in each presentation paradigm.

A side product of this work is that we now have a unified BCI typing interface that has bothRSVP and matrix presentation options along with a MAP intent inference engine that tightly fusesn-gram symbol and EEG evidence. It is an open vocabulary typing interface with the potential to beindividualized by personal language models and the incorporation of supplementary physiologicaland behavioral evidence about intent, for instance via EMG or switches. Other open problemsinclude improved signal models for more accurate performance simulations and run-time intentinference, optimized dynamic selection of stimulus subsets to be presented in each trial for theupcoming sequence, and rigorous field testing to compare RSVP and matrix presentation paradigmson potential user populations.




Figure 2.7: Topography of channel-based AUCs for each user at ITI = 150 ms.



[Figure 2.8 appears here: three panels, (a) RCP, (b) SCP, and (c) RSVP. Each panel plots the number of sequences per target symbol and the probability of phrase completion against user AUC (0.7-0.95), showing a shaded 90% area, the expected value from simulation, and the actual value.]

Figure 2.8: Typing speed analysis results. The average number of sequences per (typed) target character (lower means faster typing) and the probability of phrase completion (higher means more accuracy) are shown. Simulation results are used to define the shaded 90% confidence area. The dashed line shows the expected value from simulation for each variable and the solid line shows the actual typing outcome in the single experimental run that followed.




[Figure 2.9 appears here: nine panels, (a) U7 RSVP, (b) U7 SCP, (c) U7 RCP, (d) U3 RSVP, (e) U3 SCP, (f) U3 RCP, (g) U9 RSVP, (h) U9 SCP, and (i) U9 RCP. Each panel shows the number of sequences (0-22) used for each character of the target phrases "WILL", "RUN", "AND", "BETWEEN", "ADULT", "PLEASE", "ARE", and "MAKEUP", split into Correct, Fix, and Error bars.]

Figure 2.9: Number of sequences utilized by users U7, U3, and U9 to type each target character using the RSVP, SCP, and RCP paradigms. Red bars show the sequence counts for epochs that typed a wrong character, and yellow bars show the number of sequences used to fix the error before typing the correct target. Green bars show the number of sequences in epochs that resulted in correct selection of the target symbol (lower means faster typing).




[Figure 2.10 appears here: three scatter plots, (a) RCP, (b) SCP, and (c) RSVP.]

Figure 2.10: Scatter plot of the average number of sequences required to correctly type a target character. The x-axis shows the mean number of sequences per target character when no language model is used; the y-axis shows the mean number of sequences per target character when a 6-gram language model is used. Each point is the average of the mean number of sequences per target over 10 Monte Carlo simulations. The horizontal extent of the box around each point is the standard deviation of the number of sequences per target character without the language model, and the vertical extent is the standard deviation with the language model.



Chapter 3

Spatio-Temporal EEG Models for BCIs

Paula Gonzalez-Navarro1, Mohammad Moghadamfalahi1, Student Member, IEEE,

Murat Akcakaya2, Member, IEEE, and Deniz Erdogmus1, Senior Member, IEEE

1Northeastern University, Boston, MA 02115, 2University of Pittsburgh, Pittsburgh, PA 15260

E-mails: {gonzaleznavarro,moghadam,erdogmus}@ece.neu.edu,[email protected]

Phone: +1-617-3733021

3.1 abstract

Multichannel electroencephalography (EEG) is widely used in non-invasive brain computer interfaces (BCIs) for user intent inference. EEG can be assumed to be a Gaussian process with unknown mean and autocovariance, and estimation of these parameters is required for BCI inference. However, the relatively high dimensionality of the EEG feature vectors with respect to the number of labeled observations leads to rank-deficient covariance matrix estimates. In this manuscript, to overcome ill-conditioned covariance estimation, we propose a structure for the covariance matrices

This work is supported by NIH 2R01DC009834, NIDRR H133E140026, NSF CNS-1136027, IIS-1118061, IIS-1149570, CNS-1544895, SMA-0835976. For supplemental materials, please visit "http://hdl.handle.net/2047/D20199232" for the CSL Collection in the Northeastern University Digital Repository System.




of the multichannel EEG signals. Specifically, we assume that these covariances can be modeled as a Kronecker product of temporal and spatial covariances. Our results on experimental data collected from users of a letter-by-letter typing BCI show that, with fewer parameters to estimate, the system can achieve higher classification accuracies compared to a method that uses a full unstructured covariance estimate. Moreover, to illustrate that the proposed Kronecker product structure could enable shorter BCI calibration data collection sessions, we use a Cramer-Rao bound analysis on simulated data to demonstrate that a model with structured covariance matrices achieves the same estimation error as a model with no covariance structure using fewer labeled EEG observations.

Keywords–Structured covariance matrices, Kronecker product, brain computer interface, multi-channel electroencephalogram (EEG), auto-regressive (AR) model, linear mixture.

3.2 Introduction

Electroencephalography (EEG)-based brain computer interfaces (BCIs) offer people with severe speech and physical impairments (SSPI) an alternative communication method [3]. Event related potentials (ERPs), steady state visually evoked potentials (SSVEPs), and voluntarily controlled cortical potentials are commonly employed by EEG-based BCIs to detect user intent [10, 3, 1, 36, 38, 72, 85, 86].

In most EEG-based BCI systems, the signal recorded from multiple channels along the scalp is assumed to be a Gaussian process with unknown covariance and mean [87, 88, 89]. In this setup, quadratic discriminant analysis (QDA) can achieve the minimum-risk classification performance. To estimate the mean and the covariance for QDA, supervised data are collected through a calibration session prior to operation of the BCI system [90, 91, 1, 92, 93]. Typically, the multivariate EEG signal recorded from multiple channels in a time window is stacked into a feature vector, and feature vectors extracted from the supervised EEG data in this fashion are used to estimate the covariance and the mean of the Gaussian process. In such an approach, due to the high dimensionality of the EEG feature vectors, an invertible covariance estimate requires many supervised data samples, that is, a longer calibration session. However, a long calibration session is not always possible, since it might decrease the system's practicality and lead to user frustration. In consequence, the estimated sample covariance is rank-deficient and over-fitted to the data [94, 95]. In our earlier work on the development of a language-model-assisted letter-by-letter typing BCI, which we call RSVPKeyboard™, we utilized regularized discriminant analysis (RDA) to overcome this issue [38, 40]. In a classification setup, RDA applies regularization and shrinkage in the estimation of the covariances [96]. This approach has shown promising results [38, 40]. However, such




an approach ignores any inherent structure of the covariance matrices and may require more labeled data for estimation compared to methods that impose a covariance structure.

In this manuscript, we hypothesize that imposing a Kronecker product (KP) structure on the covariance matrix of the EEG signal model results in a smaller number of parameters to be estimated and, hence, a smaller training set from the calibration session. This structure implies that the multichannel EEG signal is assumed to have a covariance matrix that is the KP of temporal and spatial covariances; that is, the KP structure assumes spatiotemporal stationarity over the multichannel EEG. In this paper, we describe two signal modeling approaches for the multichannel EEG signal which lead to a KP covariance structure under certain assumptions. In our approach, we use a spatio-temporal linear forward model from unknown brain sources to the EEG electrode sites to model the multichannel EEG. We also show that this is equivalent to using an auto-regressive moving average (ARMA) model over time and channels. For this model, we show that under certain independence and stationarity assumptions the full covariance matrix can be factorized as a Kronecker product of temporal and spatial covariance matrices. We then introduce further temporal assumptions on the signal to achieve specific structures on the temporal covariance matrix: (i) a Toeplitz matrix structure, (ii) an auto-regressive model of order 1 (AR(1)) structure, and (iii) an identity matrix structure. These additional structures further decrease the number of parameters to be estimated. We also compute the Cramer-Rao bound (CRB) on the covariance estimation error and use it as a performance metric to analyze the number of observations required to achieve a desired estimation accuracy; that is, the CRB indicates a calibration session length. We have collected data from 12 healthy participants using RSVPKeyboard™ and compared their typing performance under different covariance structures. Our results show that, with much lower model complexity, the structured covariance assumption provides the same or better typing accuracy compared to a model with an unstructured covariance.

3.3 RSVPKeyboard™

In this section, we briefly describe the main components of RSVPKeyboard™. Interested readerscould refer to [97] for further information on RSVPKeyboard™.

3.3.1 Presentation Paradigms

RSVPKeyboard™ employs three different presentation paradigms.




a. Rapid Serial Visual Presentation (RSVP): a set of pseudo-randomly ordered stimuli is presented at a fixed location on the screen in a rapid serial manner. In RSVP each stimulus is a trial, and a set of trials presented with no time gap in between is called a sequence. Every sequence contains only a single target stimulus.

b. Single Character Presentation (SCP): consists of sequential flashes of individual characters placed on an R × C matrix background in a pseudo-random order. In SCP each stimulus is a trial, and a set of trials presented with no time gap in between is called a sequence. Every sequence contains a single target stimulus.

c. Row and Column Presentation (RCP): consists of sequential flashes of sets of characters placed on a row or a column of an R × C matrix background in a pseudo-random order. In RCP each row or column flash is a trial, and a full set of row and column flashes with no time gap in between is called a sequence. Every sequence in RCP contains two target stimuli.

3.3.2 Probabilistic Inference Mechanism

RSVPKeyboard™ utilizes maximum a-posteriori (MAP) inference for intent detection. Let $\mathcal{D}$ denote the dictionary of all candidates for the user intent, with $s \in \mathcal{D}$. We define $s^*$ as the random variable representing the actual user intent and $\hat{s}$ as the system's estimate. The decision rule is then

$$\hat{s} = \underset{s \in \mathcal{D}}{\arg\max}\; P(s^* = s \mid \mathcal{E};\, C) \qquad (3.1)$$

Here $\mathcal{E}$ represents the EEG recorded during the presentation of a sequence of symbols, and $C$ denotes the non-EEG contextual evidence (for detailed information please see [1]). Prior to inference, the system performs EEG preprocessing and feature extraction, followed by a quadratic projection to improve classification performance.
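As a small illustration, the following is a minimal sketch of how the decision rule in (3.1) can be evaluated once the EEG evidence has been summarized as a per-symbol log-likelihood ratio and the contextual evidence as a language-model log prior; the argument names (lm_log_prior, eeg_llr) are illustrative and not the system's actual interfaces.

import numpy as np

def map_intent(symbols, lm_log_prior, eeg_llr):
    # MAP rule of equation (3.1): maximize log P(E | s) + log P(s | C),
    # where the n-gram language model supplies the context prior P(s | C)
    # and eeg_llr holds the accumulated EEG log-likelihood ratio per symbol.
    scores = {s: eeg_llr.get(s, 0.0) + lm_log_prior[s] for s in symbols}
    return max(scores, key=scores.get)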

3.3.3 Signal Processing and Feature Extraction

The EEG time signal is recorded from $N_{ch}$ channels along the scalp at a particular sampling rate (here 256 Hz). We utilize a bandpass filter to increase the signal-to-noise ratio (SNR) and eliminate slow drifts. Based on the high cut-off frequency of the filter, one can reduce




the sampling rate without introducing aliasing. The multivariate EEG signal recorded from multiple channels in a time window of $[0, w)$ ms from the onset of every trial is assigned as the EEG feature of that trial. We concatenate the multivariate EEG time samples to form the feature vector. Let $\mathbf{v}^i[n]$, the multivariate measurement at time instant $n$, be defined as

$$\mathbf{v}^i[n] = \begin{bmatrix} v^i_1[n] & v^i_2[n] & \dots & v^i_{N_{ch}}[n] \end{bmatrix}^T \in \mathbb{R}^{N_{ch}} \qquad (3.2)$$

where $v^i_{ch}[n]$ is the $n$-th time sample recorded at channel $ch$ for trial $i$ (note that $n$ is measured with respect to each trial's onset time). Then we can define $\mathbf{y}^i$ as the feature vector for trial $i$:

$$\mathbf{y}^i = \begin{bmatrix} \mathbf{v}^i[1]^T & \mathbf{v}^i[2]^T & \dots & \mathbf{v}^i[N_t]^T \end{bmatrix}^T \in \mathbb{R}^{N_{ch} \cdot N_t} \qquad (3.3)$$

Here $N_t$ is the number of time samples in the window of $[0, w)$ ms.
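A minimal sketch of this feature construction is given below, assuming the trial has already been band-pass filtered, downsampled, and windowed into an (N_ch × N_t) array; the function name is ours.

import numpy as np

def trial_feature_vector(epoch):
    # epoch: (N_ch, N_t) EEG response of one trial in the [0, w) ms window.
    # Stacks v[1], v[2], ..., v[N_t] as in equation (3.3); column-major
    # flattening gives exactly that ordering (channel index varies fastest).
    return epoch.flatten(order='F')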

We further perform a quadratic projection of these feature vectors onto a one-dimensional evidence space. Quadratic discriminant analysis (QDA) attains the minimum expected classification risk for this projection. In QDA, the log-likelihood ratio of an observation is defined as

$$l_i(\mathbf{y}^i) = \log \frac{f(\mathbf{y}^i;\, \mu_1, \Sigma_1)\, \pi_1}{f(\mathbf{y}^i;\, \mu_0, \Sigma_0)\, \pi_0} \qquad (3.4)$$

where $f(\cdot\,;\mu_c,\Sigma_c)$ is the multivariate normal density function with mean $\mu_c$ and covariance $\Sigma_c$, and $\pi_c$ is the class prior. The log-likelihood ratio $l_i(\mathbf{y}^i)$ of equation (3.4) is the EEG evidence for the $i$-th trial, where $c = 1$ denotes the target class and $c = 0$ the non-target class.
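A sketch of this projection, assuming the class means and covariances have already been estimated (in practice the regularized or Kronecker-structured covariances described below would be used), could look as follows; the function name is illustrative.

import numpy as np
from scipy.stats import multivariate_normal

def eeg_evidence(y, mu1, cov1, pi1, mu0, cov0, pi0):
    # Log-likelihood ratio of equation (3.4): target (class 1) vs. non-target (class 0).
    log_num = multivariate_normal.logpdf(y, mean=mu1, cov=cov1) + np.log(pi1)
    log_den = multivariate_normal.logpdf(y, mean=mu0, cov=cov0) + np.log(pi0)
    return log_num - log_den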

In order to perform the projection in (3.4), one needs to estimate the class conditional probability distributions of the target ($c = 1$) and non-target ($c = 0$) classes. RSVPKeyboard™ utilizes a set of supervised data collected in a calibration session to estimate these class conditional probability distributions in a maximum likelihood framework. Typically, a calibration session consists of 100 sequences of several trials each. Before every sequence, the user is shown a target stimulus which is expected to be the user's intent during the presentation of that sequence.

The parameters of the class conditional probability density functions (PDFs) $f(\cdot\,;\mu_c,\Sigma_c)$ are




estimated as follows:

$$\hat{\mu}_c = \frac{1}{N_c} \sum_{i=1}^{N} \mathbf{y}^i\, \delta(\ell_i, c), \qquad
\hat{\Sigma}_c = \frac{1}{N_c} \sum_{i=1}^{N} (\mathbf{y}^i - \hat{\mu}_c)(\mathbf{y}^i - \hat{\mu}_c)^T\, \delta(\ell_i, c) \qquad (3.5)$$

where $\ell_i \in \{0, 1\}$ is the label of $\mathbf{y}^i$, $\delta(\cdot,\cdot)$ is the Kronecker delta function, $c \in \{0, 1\}$ is the class label, $N$ is the total number of observations, and $N_c = \sum_{i=1}^{N} \delta(\ell_i, c)$ is the number of observations in class $c$.

Estimating an invertible covariance matrix $\hat{\Sigma}_c$ from equation (3.5) requires at least $N_{ch} \cdot N_t + 1$ observations in each class. However, this can lead to extremely long calibration sessions, which is not practically feasible. This issue can be relaxed by regularizing the estimated covariances, as in regularized discriminant analysis (RDA), which is a generalization of QDA [96]. RDA regularizes the class conditional covariance matrices in two steps, (i) shrinkage and (ii) regularization, as presented in (3.6):

$$\hat{\Sigma}_c(\lambda) = \frac{(1-\lambda)\, N_c\, \hat{\Sigma}_c + \lambda \sum_{c'=0}^{1} N_{c'}\, \hat{\Sigma}_{c'}}{(1-\lambda)\, N_c + \lambda \sum_{c'=0}^{1} N_{c'}}, \qquad
\hat{\Sigma}_c(\lambda, \gamma) = (1-\gamma)\, \hat{\Sigma}_c(\lambda) + \frac{\gamma}{p}\, \mathrm{tr}\!\left[\hat{\Sigma}_c(\lambda)\right] I_p \qquad (3.6)$$

Here, $\lambda, \gamma \in [0, 1]$ are the shrinkage and regularization parameters, $\mathrm{tr}[\cdot]$ is the trace operator, and $I_p$ is the identity matrix of size $p \times p$. In our system, we typically utilize k-fold cross-validation to choose the optimal values of the hyper-parameters $\lambda$ and $\gamma$.
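The two-step procedure in (3.6) can be sketched as follows, assuming the class-conditional maximum likelihood estimates of (3.5) are already available; the hyper-parameters lam and gam would be chosen by cross-validation as described above, and the function name is ours.

import numpy as np

def rda_covariances(covs, counts, lam, gam):
    # Shrinkage and regularization of equation (3.6).
    # covs:   [Sigma_hat_0, Sigma_hat_1] class-conditional ML covariance estimates
    # counts: [N_0, N_1] class sample counts; lam, gam in [0, 1]
    pooled_num = sum(n * S for n, S in zip(counts, covs))
    pooled_den = sum(counts)
    out = []
    for n, S in zip(counts, covs):
        # shrink each class covariance toward the pooled covariance
        S_lam = ((1 - lam) * n * S + lam * pooled_num) / ((1 - lam) * n + lam * pooled_den)
        p = S_lam.shape[0]
        # regularize toward a scaled identity to guarantee invertibility
        out.append((1 - gam) * S_lam + (gam / p) * np.trace(S_lam) * np.eye(p))
    return out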

In addition, as we show in this manuscript, an EEG signal model under specific assumptions factorizes the class conditional covariance matrices as the following Kronecker product:

$$\Sigma_c = \Sigma^t_c \otimes \Sigma^{ch}_c \qquad (3.7)$$

In this equation, $\Sigma^t_c$ and $\Sigma^{ch}_c$ are the temporal and spatial covariance matrices, respectively. Such a structure dramatically reduces the number of parameters to be estimated and results in an invertible covariance estimate from the calibration data. Next, we describe the models that we propose for the EEG signal and introduce the different covariance structures.
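Before proceeding, a small illustrative sketch of this factorization is given below, using random positive definite placeholder matrices and the dimensions used later in Section 3.6.2; it only demonstrates the block structure implied by the Kronecker product in (3.7).

import numpy as np

N_ch, N_t = 16, 64                                   # dimensions used in Section 3.6.2
rng = np.random.default_rng(0)
A = rng.standard_normal((N_ch, N_ch)); Sigma_ch = A @ A.T + N_ch * np.eye(N_ch)
B = rng.standard_normal((N_t, N_t));   Sigma_t  = B @ B.T + N_t * np.eye(N_t)

Sigma = np.kron(Sigma_t, Sigma_ch)                   # full covariance of equation (3.7)

# the (m, n) channel-block of Sigma is the spatial covariance scaled by the
# temporal covariance entry between time samples m and n
m, n = 2, 5
block = Sigma[m*N_ch:(m+1)*N_ch, n*N_ch:(n+1)*N_ch]
assert np.allclose(block, Sigma_t[m, n] * Sigma_ch)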




3.4 Signal Models and Covariance Matrix Structures

In this section, we first model the multichannel EEG as a linear forward model from unknown brain sources and then show that this modeling approach is equivalent to using an ARMA model over the multichannel EEG under certain assumptions. The linear forward model describes the EEG recorded from $N_{ch}$ channels along the scalp as a linear combination of signals originating from $N_s$ brain sources. In Section 3.4.1, we first show that certain assumptions on this linear forward model lead to a Kronecker product structure as described above. We then model the brain source signals as (i) a stationary signal, (ii) an auto-regressive process of order one, AR(1), and (iii) an auto-regressive process of order zero, AR(0), to impose more structure on the temporal covariance and hence reduce the number of parameters even further. Then, in Section 3.4.2, we show that an ARMA model over the multichannel EEG also leads to the same spatiotemporal KP structure under certain stationarity assumptions imposed over time and EEG channels.

3.4.1 Linear Forward Model

We propose to define the multichannel EEG signal as a linear combination of unknown brain sources. Assume

$$\mathbf{s}^i[n] = \begin{bmatrix} s^i_1[n] & s^i_2[n] & \dots & s^i_{N_s}[n] \end{bmatrix}^T \in \mathbb{R}^{N_s}$$

is the vector of brain source signals, where $N_s$ is the unknown number of brain sources and $s^i_j[n]$ is the signal generated by source $j$ at time sample $n$ for the $i$-th trial. Then the linear model can be defined as

$$\mathbf{v}^i[n] = H\, \mathbf{s}^i[n] \in \mathbb{R}^{N_{ch}} \qquad (3.8)$$

where

$$H = \begin{bmatrix} h_{1,1} & \dots & h_{1,N_s} \\ h_{2,1} & \dots & h_{2,N_s} \\ \vdots & & \vdots \\ h_{N_{ch},1} & \dots & h_{N_{ch},N_s} \end{bmatrix} \in \mathbb{R}^{N_{ch} \times N_s} \qquad (3.9)$$

is the forward combination matrix, in which every $h_{c,j}$, $\forall\, 0 < c \leq N_{ch},\, 0 < j \leq N_s$, is a real coefficient. In this manuscript we assume that the number of brain sources is greater than or equal to the number of channels, which means that $H$ is a full row rank matrix.

We then define $\mathbf{x}^i$ as the feature vector of the brain sources such that

$$\mathbf{x}^i = \begin{bmatrix} \mathbf{s}^i[1]^T & \mathbf{s}^i[2]^T & \dots & \mathbf{s}^i[N_t]^T \end{bmatrix}^T \in \mathbb{R}^{N_s \cdot N_t} \qquad (3.10)$$




In this model, assuming that the matrix $H$ is constant through time, the linear model of the EEG feature vector $\mathbf{y}^i$, defined in equation (3.3), becomes

$$\mathbf{y}^i = H_f\, \mathbf{x}^i \qquad (3.11)$$

where $H_f$ is a time-invariant block diagonal matrix of dimensions $(N_t N_{ch}) \times (N_t N_s)$ whose $N_{ch} \times N_s$ diagonal blocks are equal to the matrix $H$:

$$H_f = \begin{bmatrix} H & 0 & \dots & 0 \\ 0 & H & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & H \end{bmatrix} \in \mathbb{R}^{(N_t \cdot N_{ch}) \times (N_t \cdot N_s)} \qquad (3.12)$$

We further assume that $\mathbf{s}^1[n], \mathbf{s}^2[n], \dots, \mathbf{s}^i[n], \dots \sim \mathcal{N}_{N_s}(\mu_s[n], \Sigma_s[n,n])$ are i.i.d. samples, where

$$\mu_s[n] = E[\mathbf{s}[n]], \qquad \Sigma_s[n,n] = \mathrm{Cov}[\mathbf{s}^i[n], \mathbf{s}^i[n]] \qquad (3.13)$$

Accordingly, from (3.8), $\mathbf{v}^i[n] \sim \mathcal{N}_{N_{ch}}(\mu_v[n], \Sigma_v[n,n])$ with

$$\mu_v[n] = E[\mathbf{v}[n]], \qquad \Sigma_v[n,n] = \mathrm{Cov}[\mathbf{v}^i[n], \mathbf{v}^i[n]] = H\, \Sigma_s[n,n]\, H^T \qquad (3.14)$$

Therefore, under these assumptions, $\mathbf{x}^i \sim \mathcal{N}_{N_t \cdot N_s}(\mu_x, \Sigma_x)$ and $\mathbf{y}^i \sim \mathcal{N}_{N_{ch} \cdot N_t}(\mu_y, \Sigma_y)$.

3.4.1.1 Assumptions on the Spatial Characteristics of the Brain Sources

In our approach, the factorization in (3.7) is achieved using the following assumptions. We first assume that the unknown brain sources are statistically independent from each other. This implies that $\Sigma_s[m,n]$, $\forall\, 0 \leq m, n \leq N_t$, is a diagonal matrix:

$$\Sigma_s[m,n] = \begin{bmatrix} c_s[m,n]_{1,1} & 0 & \dots & 0 \\ 0 & c_s[m,n]_{2,2} & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & c_s[m,n]_{N_s,N_s} \end{bmatrix} \qquad (3.15)$$




where $c_s[m,n]_{j,j}$ is a scalar representing the correlation between the $m$-th and $n$-th time samples of the $j$-th brain source. Furthermore, assuming that all brain sources have the same covariance structure, we have

$$\Sigma_s[m,n] = \begin{bmatrix} c_s[m,n] & 0 & \dots & 0 \\ 0 & c_s[m,n] & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & c_s[m,n] \end{bmatrix} = c_s[m,n] \cdot I_{N_s} \qquad (3.16)$$

Then, following equation (3.11), we can write $\Sigma_y$ as

$$\Sigma_y = H_f\, \Sigma_x\, H_f^T
= \big[\, H\,\Sigma_s[m,n]\,H^T \,\big]_{m,n=1}^{N_t}
= \big[\, c_s[m,n]\, H H^T \,\big]_{m,n=1}^{N_t}
= \begin{bmatrix} c_s[1,1] & \dots & c_s[1,N_t] \\ c_s[2,1] & \dots & c_s[2,N_t] \\ \vdots & & \vdots \\ c_s[N_t,1] & \dots & c_s[N_t,N_t] \end{bmatrix} \otimes H H^T \qquad (3.17)$$

Note that equation (3.17) is in the same factorized form as (3.7). Throughout this manuscript, the matrix $H H^T \in \mathbb{R}^{N_{ch} \times N_{ch}}$ is taken as the spatial covariance matrix $\Sigma_{ch}$. For this spatial covariance to be invertible, we assume that the number of brain sources is at least the number of channels. The temporal covariance matrix is defined as

$$\Sigma_t = \begin{bmatrix} c_s[1,1] & \dots & c_s[1,N_t] \\ c_s[2,1] & \dots & c_s[2,N_t] \\ \vdots & & \vdots \\ c_s[N_t,1] & \dots & c_s[N_t,N_t] \end{bmatrix} \in \mathbb{R}^{N_t \times N_t} \qquad (3.18)$$




Accordingly, we model the full covariance matrix of the EEG feature vectors as the Kronecker product of temporal and spatial covariance matrices:

$$\Sigma_y = \Sigma_t \otimes \Sigma_{ch} \qquad (3.19)$$

Under this covariance structure, the number of parameters that need to be estimated is

$$N_{p1} = \frac{N_{ch}(N_{ch}+1)}{2} + \frac{N_t(N_t+1)}{2} \qquad (3.20)$$

Different assumptions for specific temporal covariance structures are described next.

3.4.1.2 Assumptions on the Temporal Characteristics of the Brain Sources

Brain sources as stationary processes:
If we assume that the brain sources are stationary in time, with $\Sigma_s[m,n] = \Sigma_s[m-n]$, that is, $c_s[m,n] = c_s[m-n]$, then the temporal covariance matrix becomes a Toeplitz matrix

$$\Sigma_t = \begin{bmatrix} c_s[0] & \dots & c_s[N_t-1] \\ \vdots & \ddots & \vdots \\ c_s[N_t-1] & \dots & c_s[0] \end{bmatrix} \qquad (3.21)$$

and accordingly the full covariance matrix is characterized by

$$N_{p2} = \frac{N_{ch}(N_{ch}+1)}{2} + N_t \qquad (3.22)$$

number of parameters.

Brain sources as an autoregressive (AR) model:
The $p$-th order AR model of the multivariate signal $\mathbf{s}^i[n]$ can be written as

$$\mathbf{s}^i[n] = \sum_{k=1}^{p} A_k\, \mathbf{s}^i[n-k] + \mathbf{e}^i[n] \qquad (3.23)$$

where $A_k$ is the $N_s \times N_s$ weight matrix at time lag $k$, and $\mathbf{e}^i[n]$ is zero-mean additive Gaussian noise at time $n$ that is independent of the sources and independent and identically distributed in time. Further, we assume not only that the brain sources are statistically independent,




but also that the AR model is the same across all sources. Then, we rewrite (3.23) as

$$\mathbf{s}^i[n] = \sum_{k=1}^{p} a_k\, \mathbf{s}^i[n-k] + \mathbf{e}^i[n] \qquad (3.24)$$

where $a_k$ is a scalar weight at time lag $k$. In our approach, we consider $p = 1$ here and $p = 0$ later in this section. For $p = 1$, (3.24) becomes

$$\mathbf{s}^i[n] = a_1\, \mathbf{s}^i[n-1] + \mathbf{e}^i[n] \qquad (3.25)$$

Then, assuming that

$$\mathbf{s}[0] \sim \mathcal{N}_{N_s}(\mu_s[0], \Sigma_s[0,0]) \qquad (3.26)$$

from (3.25), the mean and the covariance satisfy

$$E[\mathbf{s}^i[n]] = a_1\, E[\mathbf{s}^i[n-1]] = (a_1)^n\, \mu_s[0] \qquad (3.27)$$

and, for $m \leq n$,

$$\Sigma_s[m,n] = E\{\mathbf{s}^i[n]\,\mathbf{s}^i[m]^T\} - E\{\mathbf{s}^i[n]\}\,E\{\mathbf{s}^i[m]\}^T
= a_1^{|n-m|}\,\big(E\{\mathbf{s}^i[m]\,\mathbf{s}^i[m]^T\} - \mu_s[m]\,\mu_s[m]^T\big)
= a_1^{|n-m|}\, \Sigma_s[m,m]
= a_1^{|n-m|}\, \Sigma_s[0,0] \qquad (3.28)$$

One should note that $\Sigma_s[n,m] = \Sigma_s[m,n]$, i.e., a symmetric matrix. Then, substituting (3.28) into (3.10), the covariance matrix of the brain source signals, $\Sigma_x$, becomes

$$\Sigma_x = \begin{bmatrix} \Sigma_s[1,1] & \dots & \Sigma_s[1,N_t] \\ \Sigma_s[2,1] & \dots & \Sigma_s[2,N_t] \\ \vdots & & \vdots \\ \Sigma_s[N_t,1] & \dots & \Sigma_s[N_t,N_t] \end{bmatrix}
= \begin{bmatrix} 1 & a_1^{(1)} & \dots & a_1^{(N_t-1)} \\ a_1^{(1)} & 1 & \dots & a_1^{(N_t-2)} \\ \vdots & \vdots & \ddots & \vdots \\ a_1^{(N_t-1)} & a_1^{(N_t-2)} & \dots & 1 \end{bmatrix} \otimes \Sigma_s[0,0] \qquad (3.29)$$

Considering the assumptions introduced in subsection 3.4.1.1, we have $\Sigma_s[m,n] = c_s[m,n] \cdot$




$I_{N_s}$ $\forall\, m, n$; then, from equation (3.28), we have

$$\Sigma_s[0,0] = c_s[0] \cdot I_{N_s} \qquad (3.30)$$

$$\Sigma_s[m,n] = a_1^{(|n-m|)} \cdot \Sigma_s[0,0] = a_1^{(|n-m|)} \cdot c_s[0] \cdot I_{N_s} \qquad (3.31)$$

Using the linear forward model and equation (3.31), we compute the full covariance matrix $\Sigma_y$ as

$$\Sigma_y = H_f\, \Sigma_x\, H_f^T
= \big[\, a_1^{(|n-m|)}\, c_s[0]\, H H^T \,\big]_{m,n=1}^{N_t}
= c_s[0] \cdot \begin{bmatrix} 1 & a_1^{(1)} & \dots & a_1^{(N_t-1)} \\ a_1^{(1)} & 1 & \dots & a_1^{(N_t-2)} \\ \vdots & \vdots & \ddots & \vdots \\ a_1^{(N_t-1)} & a_1^{(N_t-2)} & \dots & 1 \end{bmatrix} \otimes H H^T \qquad (3.32)$$

where $H H^T$ is the spatial covariance matrix and the temporal covariance matrix is defined as

$$\Sigma_t = c_s[0] \cdot \begin{bmatrix} 1 & a_1^{(1)} & \dots & a_1^{(N_t-1)} \\ a_1^{(1)} & 1 & \dots & a_1^{(N_t-2)} \\ \vdots & \vdots & \ddots & \vdots \\ a_1^{(N_t-1)} & a_1^{(N_t-2)} & \dots & 1 \end{bmatrix} \qquad (3.33)$$

Here $c_s[0]$ is a constant that can be absorbed into the spatial covariance; therefore, the $c_s[0]$ in equation (3.33) is estimated through the estimation of the spatial covariance, and the number of temporal parameters in $N_{p3}$ is 1. Under these assumptions, the full covariance matrix can be fully characterized by

$$N_{p3} = \frac{N_{ch}(N_{ch}+1)}{2} + 1 \qquad (3.34)$$

number of parameters.
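As a small sketch, the AR(1)-structured temporal factor of (3.33) can be built directly from the Toeplitz form; the value of a1 below is illustrative, while in practice it would be estimated from the calibration data (e.g., by the method in [98]).

import numpy as np
from scipy.linalg import toeplitz

def ar1_temporal_cov(a1, n_t, c0=1.0):
    # AR(1)-structured temporal covariance of equation (3.33): c_s[0] * a1^{|n-m|}
    return c0 * toeplitz(a1 ** np.arange(n_t))

Sigma_t = ar1_temporal_cov(a1=0.8, n_t=64)   # a1 = 0.8 is only a placeholder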

Brain sources as temporally independent processes:

Under this assumption, we set $p = 0$ in equation (3.24); then $\Sigma_s[m,n] = 0$ $\forall\, m \neq n$ and the




temporal covariance $\Sigma_t$ is

$$\Sigma_t = I_{N_t} \qquad (3.35)$$

This structure requires

$$N_{p4} = \frac{N_{ch}(N_{ch}+1)}{2} \qquad (3.36)$$

number of parameters to characterize the full covariance.

Note that the number of parameters needed to characterize the covariance matrix decreases as more restrictive structures are assumed on the EEG signal model. Letting $N_{p0}$ denote the number of parameters of the non-structured full covariance,

$$N_{p0} = \frac{N_{ch} N_t (N_{ch} N_t + 1)}{2} > N_{p1} > N_{p2} > N_{p3} > N_{p4} \qquad (3.37)$$

3.4.2 ARMA model for the multichannel EEG signal

We define an ARMA model for the multichannel EEG of the $i$-th trial, $\mathbf{v}^i[n]$ as defined in (3.2), as follows:

$$\mathbf{v}^i[n] = \sum_{k=1}^{p} R_k\, \mathbf{v}^i[n-k] + \sum_{j=0}^{q} b_j\, \mathbf{w}^i[n-j] \qquad (3.38)$$

In (3.38), $R_k$ is the $N_{ch} \times N_{ch}$ signal weight matrix at lag $k$, $b_j$ is a scalar noise weight at lag $j$, and $\mathbf{w}^i[n]$ is multivariate wide-sense stationary Gaussian noise at the $n$-th time sample, independent of the previous measurements of $\mathbf{v}^i$.

3.4.2.1 Assumptions on the Spatiotemporal Characteristics of the EEG signal

If we assume that the EEG signal is stationary across channels, then one can write (3.38) as

$$\mathbf{v}^i[n] = \sum_{k=1}^{p} r_k \cdot \mathbf{v}^i[n-k] + \sum_{j=0}^{q} b_j\, \mathbf{w}^i[n-j] \qquad (3.39)$$

in which $r_k$ is a scalar weight for the signal at time lag $k$. Let us further assume $p = 1$, $b_0 = 1$, and $b_j = 0$ $\forall\, j = 1, \dots, q$, such that $\mathbf{w}^i$ is wide-sense stationary Gaussian noise that is independent of the previous measurement $\mathbf{v}^i[n-1]$. Accordingly, we can write




$$\mathbf{v}^i[n] = r_1 \cdot \mathbf{v}^i[n-1] + \mathbf{w}^i[n] \qquad (3.40)$$

Let us assume $m < n$; with the wide-sense stationarity assumption over channels we have

$$\Sigma_v[m,n] = E\{\mathbf{v}^i[n]\,\mathbf{v}^i[m]^T\} - E\{\mathbf{v}^i[n]\}\,E\{\mathbf{v}^i[m]\}^T
= r_1^{(n-m)} \cdot \big(E\{\mathbf{v}^i[m]\,\mathbf{v}^i[m]^T\} - \mu_v[m]\,\mu_v[m]^T\big)
= r_1^{(n-m)} \cdot \Sigma_v[m,m]
= r[|n-m|] \cdot \Sigma_v[0,0], \quad \text{where } r[|n-m|] = r_1^{(|n-m|)} \qquad (3.41)$$

According to the definition of the multichannel EEG feature vector in (3.3), one can define the covariance matrix of $\mathbf{y}$ as

$$\Sigma_y = \begin{bmatrix} \Sigma_v[1,1] & \dots & \Sigma_v[1,N_t] \\ \Sigma_v[2,1] & \dots & \Sigma_v[2,N_t] \\ \vdots & & \vdots \\ \Sigma_v[N_t,1] & \dots & \Sigma_v[N_t,N_t] \end{bmatrix}
= \begin{bmatrix} r[0] & \dots & r[N_t-1] \\ r[1] & \dots & r[N_t-2] \\ \vdots & & \vdots \\ r[N_t-1] & \dots & r[0] \end{bmatrix} \otimes \Sigma_v[0,0] \qquad (3.42)$$

where $\Sigma_v[0,0]$ is the spatial covariance matrix defined over the channels. Note that in writing (3.42) we use the form obtained in (3.41). Following the representation in (3.42), we generalize (3.40) such that

$$\mathbf{v}^i[n] = r[n,m] \cdot \mathbf{v}^i[n-1] + \mathbf{w}^i[n], \qquad n = 1, \dots, N_t, \quad m = 1, \dots, N_t \qquad (3.43)$$




Then we can show that

$$\Sigma_y = \begin{bmatrix} \Sigma_v[1,1] & \dots & \Sigma_v[1,N_t] \\ \vdots & & \vdots \\ \Sigma_v[N_t,1] & \dots & \Sigma_v[N_t,N_t] \end{bmatrix}
= \begin{bmatrix} r[1,1] & \dots & r[1,N_t] \\ \vdots & & \vdots \\ r[N_t,1] & \dots & r[N_t,N_t] \end{bmatrix} \otimes \Sigma_v[0,0] \qquad (3.44)$$

Note that when the number of brain sources is at least the number of channels, (3.17) is equivalent to (3.44). If we further assume that the EEG signal is independent in time, i.e., $r[|n-m|] = 0$ for all $m \neq n$, then we have

$$\Sigma_y = r[0] \cdot I_{N_t} \otimes \Sigma_v[0,0] \qquad (3.45)$$

Here $I_{N_t}$ is the $N_t \times N_t$ identity matrix.

3.5 Covariance Estimation Flip-Flop Algorithm

In the proposed approach, we estimate the parameters of the covariance matrices using maximum likelihood estimation. Given a set of $N$ i.i.d. samples $\mathbf{y}^1, \mathbf{y}^2, \dots, \mathbf{y}^N$ from a zero-mean multivariate Gaussian distribution with a Kronecker product covariance structure, the likelihood function is

$$L(Y, \Sigma_t \otimes \Sigma_{ch}) = \frac{\exp\!\left(-\tfrac{1}{2}\sum_{i=1}^{N} (\mathbf{y}^i)^T \left(\Sigma_t^{-1} \otimes \Sigma_{ch}^{-1}\right) \mathbf{y}^i\right)}{(2\pi)^{N N_t N_{ch}/2}\; \left|\Sigma_t \otimes \Sigma_{ch}\right|^{N/2}} \qquad (3.46)$$

Using (3.46), we iterate between the estimation of the temporal and spatial components. Define the multivariate EEG time feature vector $\mathbf{t}^i_{ch}$ recorded from the $ch$-th channel as

$$\mathbf{t}^i_{ch} = \begin{bmatrix} t^i_{ch}[1] & t^i_{ch}[2] & \dots & t^i_{ch}[N_t] \end{bmatrix}^T \in \mathbb{R}^{N_t} \qquad (3.47)$$

where $t^i_{ch}[n]$ is the $n$-th time sample recorded at channel $ch$ for the $i$-th trial. According to the definition of the EEG feature vector $\mathbf{y}^i$ in Section 3.3.3, $\mathbf{t}^i_{ch}$ collects the $N_t$ time measurements of channel $ch$ within the feature vector $\mathbf{y}^i$. The maximum-likelihood estimate of the




Algorithm 2: Flip-Flop algorithm for estimating a full covariance matrix with Kronecker structure

Inputs: a set of N samples; maximum number of iterations Kmax; convergence criterion.
Output: estimated full covariance Σ.

/* Initialization */
1  Σ_ch ← Σ_ch,initial
2  iterate ← true
3  k ← 0
/* Compute the initial temporal estimate */
4  Σ_t^0 ← Σ_t(Σ_ch,initial)
/* Start the iterations */
5  while (k ≤ Kmax ∧ iterate) do
6      k ← k + 1
       /* Compute and update the loop variables */
7      Σ_ch^k ← Σ_ch(Σ_t^(k−1))
8      Σ_t^k ← Σ_t(Σ_ch^k)
       /* Check convergence */
       if converged then iterate ← false
9  return Σ = Σ_t^k ⊗ Σ_ch^k

temporal covariance matrix for a fixed estimate of the spatial covariance matrix is

$$\hat{\Sigma}_t = \frac{1}{N \cdot N_{ch}} \sum_{i=1}^{N} \sum_{j=1}^{N_{ch}} \sum_{p=1}^{N_{ch}} \sigma^{ch}_{jp}\; \mathbf{t}^i_p \left(\mathbf{t}^i_j\right)^T \qquad (3.48)$$

where $\sigma^{ch}_{jp}$ is the $(j,p)$-th element of the estimate of $\Sigma_{ch}^{-1}$. Similarly, for a fixed estimate of the temporal covariance matrix, we compute an estimate of the spatial covariance matrix as

$$\hat{\Sigma}_{ch} = \frac{1}{N \cdot N_t} \sum_{i=1}^{N} \sum_{j=1}^{N_t} \sum_{p=1}^{N_t} \sigma^{t}_{jp}\; \mathbf{v}^i[p]\, \mathbf{v}^i[j]^T \qquad (3.49)$$

where $\sigma^{t}_{jp}$ is the $(j,p)$-th element of the estimate of $\Sigma_t^{-1}$, and $\mathbf{v}^i[j]$ is the multivariate EEG measurement vector of (3.2) in Section 3.3.3.

This iterative approach is called the Flip-Flop algorithm and it is summarized in Algorithm (2).The Flip-Flop algorithm was shown to converge to the maximum likelihood estimate of the fullcovariance matrix [55].
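A minimal sketch of this procedure is shown below, assuming the zero-mean (class-conditional) observations are arranged as a (trials × time × channels) array; it also applies the determinant normalization of the spatial factor discussed next, and its ordering of the two updates differs slightly from Algorithm 2 without changing the fixed point.

import numpy as np

def flip_flop(trials, max_iter=50, tol=1e-6):
    # ML estimation of a Kronecker-structured covariance (cf. Algorithm 2).
    # trials: (N, N_t, N_ch) array of zero-mean observations, trials[i, n, ch]
    # being time sample n of channel ch for trial i.
    N, n_t, n_ch = trials.shape
    Sigma_ch = np.eye(n_ch)
    Sigma_t = np.eye(n_t)
    for _ in range(max_iter):
        Sigma_t_old = Sigma_t
        inv_ch = np.linalg.inv(Sigma_ch)
        Sigma_t = sum(X @ inv_ch @ X.T for X in trials) / (N * n_ch)   # eq. (3.48)
        inv_t = np.linalg.inv(Sigma_t)
        Sigma_ch = sum(X.T @ inv_t @ X for X in trials) / (N * n_t)    # eq. (3.49)
        scale = np.linalg.det(Sigma_ch) ** (1.0 / n_ch)                # resolve the
        Sigma_ch, Sigma_t = Sigma_ch / scale, Sigma_t * scale          # scale ambiguity
        if np.linalg.norm(Sigma_t - Sigma_t_old) < tol * np.linalg.norm(Sigma_t_old):
            break
    return Sigma_t, Sigma_ch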

The covariance estimates computed through the Flip-Flop algorithm are ambiguous up to a




scaling factor, because

$$\Sigma = \Sigma_t \otimes \Sigma_{ch} = (\alpha \cdot \Sigma_t) \otimes (\alpha^{-1} \cdot \Sigma_{ch}) \qquad (3.50)$$

for any $\alpha > 0$.

Hence, when estimating the parameters of the covariance matrices for the general Kronecker-product structure, the Kronecker product with Toeplitz temporal covariance, and the Kronecker product with AR(1) temporal covariance, we normalize the spatial covariance matrix to have determinant one at each iteration. Moreover, for the structured covariance with diagonal (identity) temporal covariance, we fix the temporal covariance to the identity matrix and perform a one-time estimation of the spatial covariance matrix using the Flip-Flop algorithm. To estimate the AR(1) coefficient of the temporal covariance, we employ the method presented by Kay in [98]. For more details, interested readers are encouraged to refer to our technical report [99].

3.6 Results

We perform two studies to analyze the effect of different covariance structures on the calibrationsession length and classification accuracy, respectively.

3.6.1 Study 1: Analysis of Required Calibration Session Length

The sample size required to achieve an invertible covariance estimate with a desired estimation accuracy is an indication of the calibration session length for EEG-based BCI systems. As the calibration session length increases, more samples are collected and better parameter estimates are obtained. However, as explained above, shorter calibration sessions are preferred for practical BCI operation. Using Monte Carlo simulations on synthetic data and computing the Cramer-Rao bound on the covariance estimation error, we performed an error versus calibration-length analysis for the following Kronecker product covariance structures with full spatial covariances and different temporal covariances: (i) full (GKS), (ii) Toeplitz (KST), and (iii) identity (KSI) temporal covariance matrices. Assume that the $n_{ch} \times 1$ vector $\theta_{ch}$ and the $n_t \times 1$ vector $\theta_t$ are the vectorized parameters of the spatial and temporal covariances, respectively. Then define

$$\Psi_0 = \begin{pmatrix} \theta_t \otimes I_{n_t} & \quad I_{n_{ch}} \otimes \theta_{ch} \end{pmatrix} \qquad (3.51)$$

where $I_m$ is the identity matrix of size $m \times m$. Then, if the full covariance matrix can be represented as a linear function of the parameters, one can define a mapping matrix $P$ such that

$$\mathrm{vec}\{\Sigma_t \otimes \Sigma_{ch}\} = P \cdot \mathrm{vec}\{\theta_t\, \theta_{ch}^T\} \qquad (3.52)$$




Accordingly, the Cramer-Rao bound is computed from [100]:

$$\mathrm{CRB} = P\,\Psi_0 \left(\Psi_0^T P^{*} \left(\Sigma^{-T} \otimes \Sigma^{-1}\right) P\, \Psi_0\right)^{\dagger} \Psi_0^T P^{*} \qquad (3.53)$$

We compute the RMSE as a function of the sample size for three different covariance structures: (i) full temporal covariance (GKS), (ii) Toeplitz temporal covariance (KST), and (iii) identity temporal covariance (KSI). We fix three true covariance values, one from each structure, and during the Monte Carlo simulations we generate data by randomly choosing among these three covariance structures. Every data set generated in this fashion is then used to estimate the covariances under the three models. For each sample size, the number of times we generate different data from each chosen covariance is $N_R = 5000$. CRBs for each model under the chosen true covariance structures are also computed. We denote the underlying true covariances by $\Sigma_r$ for $r = 1, 2, 3$. Note that for each $r$ we keep the "true covariance" constant while comparing the models with each other on the same synthetically generated data. Here $\hat{\Sigma}_k$ is the estimated full covariance at simulation run $k$, $\Sigma_r$ is the true covariance for Monte Carlo setting $r$, $\|\cdot\|_F$ is the Frobenius norm, and $N_R$ is the total number of Monte Carlo simulations. For these simulations, we assume that the number of channels equals the number of time samples, $N_t = N_{ch} = 4$. The RMSE is calculated as

$$\mathrm{RMSE} = \frac{1}{3} \sum_{r=1}^{3} \sqrt{\frac{1}{N_R} \sum_{k=1}^{N_R} \frac{\|\Sigma_r - \hat{\Sigma}_k\|_F^2}{\|\Sigma_r\|_F^2}} \qquad (3.54)$$

Moreover, using the CRB on the covariance estimation error, we compute a lower bound on the RMSE as

$$\mathrm{LB} = \sqrt{\frac{\mathrm{tr}\{\mathrm{CRB}\}}{N \cdot \|\Sigma_r\|_F^2}} \qquad (3.55)$$

where N is the sample size.
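The RMSE computation of (3.54) can be sketched as below, reusing the flip_flop sketch above; the true covariances, the sample size, and the number of Monte Carlo runs are illustrative placeholders (far fewer runs than the N_R = 5000 used in the study), and only the GKS-style estimator is exercised.

import numpy as np

def normalized_rmse(true_cov, estimates):
    # normalized root-MSE of equation (3.54) for one true covariance
    errs = [np.linalg.norm(true_cov - est, 'fro') ** 2 / np.linalg.norm(true_cov, 'fro') ** 2
            for est in estimates]
    return np.sqrt(np.mean(errs))

rng = np.random.default_rng(1)
Sigma_t_true = np.eye(4)
Sigma_ch_true = np.eye(4) + 0.5 * np.ones((4, 4))
Sigma_true = np.kron(Sigma_t_true, Sigma_ch_true)        # N_t = N_ch = 4 as in the simulations
estimates = []
for _ in range(100):                                      # Monte Carlo runs (illustrative)
    y = rng.multivariate_normal(np.zeros(16), Sigma_true, size=50)   # N = 50 samples
    St, Sch = flip_flop(y.reshape(50, 4, 4))              # back to (N, N_t, N_ch) trials
    estimates.append(np.kron(St, Sch))
print(normalized_rmse(Sigma_true, estimates))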

Simulation results for the root-MSE and LB are presented in Figure 3.1. From this figure, we observe that for small sample sizes, which correspond to shorter calibration sessions, the estimation error is smallest under KSI. However, as the calibration length increases, the estimation error becomes smallest for the models with higher complexity. We believe that when the number of samples is low, the more complex covariance structure models introduce higher variance into the estimation error; as the number of samples increases, this variance decreases, and the model with the least complexity instead introduces more bias. This is especially the case in our simulations because the full Kronecker product structure, GKS, is the asymptotically unbiased estimator for all the proposed true structures. This analysis can be used to guide calibration session design and to indicate when to switch from one model to another.



CHAPTER 3. SPATIO-TEMPORAL EEG MODELS FOR BCIS

[Figure 3.1 plot: normalized root-MSE and LB (log-log axes) versus sample size N, with curves for the KSI, GKS, and KST models.]

Figure 3.1: Normalized root-MSE and LB as a function of the sample size for three different covariance structures.

3.6.2 Study 2: Effect of different covariance structures on the EEG-based BCI classification performance

For this study, we collected data from 12 healthy participants using the language-model-assisted EEG-based typing BCI, RSVPKeyboard™, to assess the effect of different covariance structures on the BCI user intent inference, namely the classification performance.

3.6.2.1 Data Collection

Calibration data were collected from 12 healthy users who had consented to participate according to the IRB-approved protocol (IRB130107) [1]. Each user performed 12 calibration sessions, one for each possible combination of 4 inter-trial-interval (ITI) values ({200, 150, 100, 85} ms), where the ITI is defined as the time interval between consecutive trials, and 3 presentation paradigms (RCP, SCP and RSVP), as described in Section 3.3.1.

EEG data were collected according to the International 10/20 EEG configuration from 16 EEG locations: Fp1, Fp2, F3, F4, Fz, Fc1, Fc2, Cz, P1, P2, C1, C2, Cp3, Cp4, P5 and P6. The sampling


frequency for those recordings was set to 256 Hz, and the recorded signals were band-pass filtered with a pass-band of [1.5, 42] Hz. The preprocessed data were down-sampled by a factor of 2. The EEG signal recorded from multiple channels was windowed in a time window of [0, 500) ms from the onset of every stimulus and assigned to that trial as its EEG response. Finally, the windowed data from every channel were concatenated to form the feature vector y_i for the i-th trial, as defined in equation (3.3) in Section 3.3.3. For every trial, N_t = 64 and N_ch = 16.
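The feature-vector construction just described can be summarized with a short sketch. The code below is illustrative only (array shapes and variable names are assumptions, not the RSVPKeyboard™ implementation); it windows the already filtered and downsampled multichannel EEG at each stimulus onset and concatenates the channels:

    import numpy as np

    FS = 128                     # sampling rate after decimating 256 Hz by a factor of 2
    WIN = int(0.5 * FS)          # 64 time samples per trial (N_t), i.e. the [0, 500) ms window

    def trial_features(eeg, onsets_sec):
        """eeg: (n_channels, n_samples) filtered and downsampled recording;
        onsets_sec: stimulus onset times in seconds.
        Returns an (n_trials, N_ch * N_t) matrix of concatenated per-channel windows."""
        feats = []
        for t0 in onsets_sec:
            start = int(round(t0 * FS))
            window = eeg[:, start:start + WIN]    # (N_ch, N_t) response window for the trial
            feats.append(window.reshape(-1))      # concatenate the channels into one vector
        return np.asarray(feats)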

3.6.2.2 Classification performance assessment

We calculated the area under the receiver operating characteristic (ROC) curve (AUC) for every calibration data set using 10-fold cross validation, assuming signal models with different covariance structures. We use the AUC as a classification accuracy indicator for models with four different covariance structures: (i) GKS, (ii) KST, (iii) KSI, and (iv) a Kronecker-product structure with AR(1) temporal covariance (KSAR(1)). Moreover, we compute the class-conditional non-structured (NS) full covariance matrices for every data set to be compared with the 4 structured covariance models.
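For readers who want to reproduce this style of analysis, the following sketch computes a 10-fold cross-validated AUC. It uses scikit-learn's shrinkage LDA purely as a stand-in scorer (an assumption made here for brevity); the dissertation's actual classifiers are the RDA-based models with the covariance structures discussed in the text:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    def cv_auc(X, y, n_splits=10, seed=0):
        """X: (n_trials, N_ch * N_t) features; y: (n_trials,) 0/1 labels. Returns mean AUC."""
        aucs = []
        for tr, te in StratifiedKFold(n_splits, shuffle=True, random_state=seed).split(X, y):
            clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X[tr], y[tr])
            aucs.append(roc_auc_score(y[te], clf.decision_function(X[te])))
        return float(np.mean(aucs))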

For each particular ITI and presentation paradigm combination, using range bars, we employed all the calibration data and compared the median of the AUC values for the NS estimator, GKS, KST, KSI and KSAR(1) in Figures 3.2(a), 3.2(b), and 3.2(c). Since the number of supervised observations in every data set is substantially lower than the number of parameters of every covariance structure, we apply RDA to regularize the estimated class-conditional covariance matrices.

The results of sub-figure 3.2(a) show that for the RSVP paradigm, the maximum performance improvement is observed at ITI = 150 ms, which has previously been shown to provide the best performance for the NS estimator among the four ITIs [1]. For the RSVP paradigm and ITI = 150 ms, one can observe a trend in the median AUCs which is inversely related to the model complexity; that is, a trend of improvement in the median AUC is observed as the model order (the number of parameters to be estimated) decreases. For this presentation paradigm, although the assumed structures do not demonstrate significant performance improvements, the models with KSAR(1) and KSI covariance structures performed equal to or better than the NS estimator with lower model order complexity. That is, as the results of Section 3.6.1 indicate, under the KSAR(1) and KSI models, shorter BCI calibration sessions can be designed to achieve the same performance as a model that uses no covariance structure.

Moreover, the results presented in sub-figure 3.2(b) illustrate a similar trend among the median AUC values obtained from signal models with different covariance structures, for all ITI values. Based on these results, for the SCP paradigm, one can improve the classification performance by use of


signal models with KSAR(1) and KSI covariance structures.

On the other hand, unlike the other presentation paradigms, in the results presented for the RCP paradigm in sub-figure 3.2(c), we observe that all the classifiers developed based on signal models with different covariance structures perform almost similarly (similar median AUC values). This can be due to the fact that for this presentation paradigm, the number of observations in the target class is twice that of the other presentation paradigms. The increase in the sample size can improve the performance of more complex models and relax the necessity of restricting covariance structures. However, we can still observe that classifiers built under the signal model assumptions with KSAR(1) and KSI covariance structures perform better than the other models in median for most ITIs.

[Figure 3.2 panels: (a) RSVP, (b) SCP, (c) RCP; AUC (0 to 1) versus ITI in ms (85, 100, 150, 200) for the NS, GKS, KST, KSAR(1) and KSI models.]

Figure 3.2: Bar charts of the median area under the receiver operating characteristic (ROC) curve (AUC), with bar ranges indicating the maximum and minimum values among the twelve users, calculated by use of different signal models for every presentation paradigm and ITI combination, when the classifiers are trained with all training data.

Furthermore, we illustrate the AUC values of the different classifiers built under different covariance structure assumptions as a function of the model order complexity (the number of covariance parameters to be estimated) in Figures 3.3(a), 3.3(b) and 3.3(c). As the plots suggest, the signal models with KSAR(1) and KSI covariance structures not only show a trend of improvement in the median AUC, they also have significantly lower model complexity (the numbers of parameters to be estimated for KSI and KSAR(1) are 136 and 137, respectively) compared to the model that does not assume a structured covariance (for which the number of parameters to be estimated is 524800). As the results of Section 3.6.1 suggest, this decrease in model complexity also results in a decrease in calibration session length, which is a very desirable feature for BCI systems that are designed for real-life applications.

Finally, we evaluate the performance of the proposed method on the system accuracy when the classifiers are trained with calibration sets of different sizes. We use this study to see how the performance of each model varies with respect to the size of the training set. As described in Section


[Figure 3.3 panels: (a) RSVP, (b) SCP, (c) RCP.]

Figure 3.3: The median of the AUC among twelve BCI users for all ITIs and presentation paradigms as a function of the model order complexity (the number of parameters to be estimated) of signal models with different covariance structures. Complexity numbers 136, 137, 200, 2216 and 524800 correspond to the models with KSI, KSAR(1), KST, GKS, and non-structured (NS) covariances, respectively.

3.3.3, each calibration session of RSVPKeyboard™ consists of a number of sequences which are used to estimate the class-conditional distributions. The number of sequences shown in calibration affects the duration of the calibration session. It is desired that calibration sessions be as short as possible to prevent user fatigue and nonstationarities in the EEG. We compute the median AUC for different calibration lengths from the 12 healthy subjects, for every calibration data set, using 10-fold cross validation and assuming signal models with different covariance structures. In this results section, we present the results for ITI = 150 ms; the results for the other ITIs are given in the Appendix, see Figure 3.4 and Tables 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12.

Figure 3.4 compares the medians of the AUC values for the NS estimator, GKS, KST, KSI and KSAR(1) for different calibration lengths of {10, 20, 30, 40, 50} sequences, for all presentation paradigms and a fixed inter-trial interval (ITI) of 150 ms (the current RSVP Keyboard uses 150 ms as the optimal ITI [1]), using range bars. We also performed a statistical test to compare the differences among the AUC values, see Tables 3.1, 3.2 and 3.3. With a significance level of α = 0.1, we observe significant differences between KSAR(1) and NS, and between KSI and NS, for the sequence length of 10 in the RSVP paradigm and for sequence lengths of 10 and 20 in the SCP and RCP paradigms. As shown in Figure 3.4(a), in the case of the RSVP presentation paradigm the median classification AUC can reach up to 70% within 20 sequences when the KSAR(1) or KSI (identity temporal covariance matrix) models are used. On the other hand, the same level of classification performance for RDA can be achieved with at least 40 sequences. A similar trend can be observed in Figure 3.4(b): in the case of the SCP paradigm, we obtain higher AUCs with all the models, and the median classification AUC can reach up to 80% within 20 sequences when the


KSAR(1) or KSI models are used, while to reach the same level of classification AUC with the RDA model 40 sequences are required. Moreover, as we can see in Figure 3.4(c), in the RCP paradigm 80% accuracy can be reached with 40 sequences under the KSAR(1), KSI and KST models, while the RDA (NS) model requires more than 50 sequences to reach the same level of classification accuracy. This trend shows that it is possible to shorten the calibration session length by choosing the KSI, KSAR(1) or KST covariance structures instead of the NS structure. Here the best results are obtained for the SCP presentation paradigm.

A similar calibration-session-length improvement trend can be observed for the other ITI values, as shown in the Appendix Tables 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12. Here we observe that when ITI = 85 ms, under the SCP paradigm and with a significance level of α = 0.1, the AUC values for the KSAR(1) and KSI structures are significantly higher than the AUC values for the NS covariance when the calibration sequence lengths are 10 and 20. When the ITI is increased to 100 ms, the same trend is observed for sequence lengths of 10, 20 and 30. However, when ITI = 200 ms, significance is observed only for the sequence length of 10. On the other hand, for the RCP paradigm, under the significance level of α = 0.1, we observe that for all ITI values the KSI and KSAR(1) models are significantly better than the NS covariance model for sequence lengths of 10, 20 and 30. Among all the results, RSVP shows the least improvement.

[Figure 3.4 panels: (a) RSVP, (b) SCP, (c) RCP; AUC (0 to 1) versus number of sequences (10 to 50) for the NS, GKS, KST, KSAR(1) and KSI models.]

Figure 3.4: Bar charts of the median area under the receiver operating characteristic (ROC) curve (AUC), with bar ranges indicating the maximum and minimum values among the twelve users, calculated by use of different signal models for every presentation paradigm, ITI = 150 ms, and different calibration lengths ({10, 20, 30, 40, 50} sequences) used to train the classifiers.

3.7 Conclusions

In this paper, we developed signal models for multichannel EEG, leading to different covariance structures for the class-conditional densities of multivariate EEG features to be used in intent inference engines for EEG-based brain computer interfaces.


SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.7612   0.4662   0.5338   0.6006   0.5563
KST v.s. NS      0.6224   0.6006   0.5786   0.6851   0.6647
AR(1) v.s. NS    0.0298   0.1888   0.2388   0.3994   0.3563
KSI v.s. NS      0.0503   0.1593   0.3149   0.3563   0.2214

Table 3.1: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RSVP paradigm with ITI 150 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.1888   0.3563   0.5113   0.5338   0.6006
KST v.s. NS      0.1208   0.4887   0.5338   0.5786   0.5786
AR(1) v.s. NS    0.0028   0.0638   0.2214   0.3149   0.3776
KSI v.s. NS      0.0034   0.0800   0.2214   0.3776   0.3776

Table 3.2: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the SCP paradigm with ITI 150 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.1888   0.2388   0.2949   0.5786   0.5338
KST v.s. NS      0.2388   0.2949   0.2048   0.5113   0.4887
AR(1) v.s. NS    0.0298   0.0638   0.1737   0.2388   0.3149
KSI v.s. NS      0.0259   0.0568   0.1737   0.2388   0.3563

Table 3.3: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RCP paradigm with ITI 150 ms.

We showed that under certain stationarity assumptions, concatenated EEG feature vectors follow a multivariate Gaussian distribution with a Kronecker product covariance structure, which factorizes into spatial and temporal covariance matrices. We also showed that different temporal dependence and independence assumptions on the temporal propagation of the signals result in further structure on the temporal covariance.

We performed two studies, one with synthetic and one with real EEG data collected through


a letter-by-letter typing BCI, to analyze the effect of the structured covariance models for EEG on the BCI calibration session length and the user intent detection accuracy, respectively. The results of the first study show that as the model order complexity decreases (stronger assumptions on the covariance structure resulting in fewer degrees of freedom in the covariance), the session length can be shortened compared to a signal model that assumes no specific covariance structure, while achieving similar covariance estimation accuracy. The results of the second study show that the models with structured covariance not only show a trend of improvement in user intent detection accuracy, but also have significantly lower model order complexity compared to a signal model with no specific covariance structure. These results imply that more structured covariances could be used to potentially shorten BCI calibration sessions while preserving system performance.

In future work, we will extend our approaches to relax the stationarity assumptions and to select the model orders optimally, in order to model the EEG signals more realistically. We will also investigate further structural constraints on head models in the forward linear source model to represent different spatial covariance structures. Appropriately structured Gaussian covariance models, coupled with growing EEG data from the same user over time and from multiple users across the user population, could result in significantly improved BCI system performance through statistical modeling of the multichannel EEG within the BCI context.

Appendix

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.4662   0.2949   0.4662   0.5113   0.4437
KST v.s. NS      0.6647   0.2949   0.3563   0.5113   0.4214
AR(1) v.s. NS    0.1328   0.2048   0.2949   0.3563   0.2949
KSI v.s. NS      0.1593   0.2048   0.2756   0.3149   0.2949

Table 3.4: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RSVP paradigm with ITI 85 ms.


SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.4662   0.3353   0.3353   0.6006   0.5338
KST v.s. NS      0.3149   0.2388   0.5786   0.4214   0.7244
AR(1) v.s. NS    0.0194   0.0989   0.1208   0.1737   0.2214
KSI v.s. NS      0.0444   0.0716   0.1457   0.2048   0.2214

Table 3.5: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the SCP paradigm with ITI 85 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.0989   0.2214   0.2388   0.4662   0.4887
KST v.s. NS      0.1095   0.2949   0.4887   0.2756   0.4437
AR(1) v.s. NS    0.0028   0.0121   0.0225   0.0638   0.2388
KSI v.s. NS      0.0015   0.0060   0.0166   0.0503   0.2048

Table 3.6: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RCP paradigm with ITI 85 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.3563   0.2214   0.3353   0.5563   0.5113
KST v.s. NS      0.1737   0.3563   0.4437   0.6006   0.4214
AR(1) v.s. NS    0.1457   0.1208   0.1737   0.3563   0.2569
KSI v.s. NS      0.0891   0.0891   0.1737   0.3353   0.2214

Table 3.7: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RSVP paradigm with ITI 100 ms.


SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.0503   0.3994   0.0800   0.2756   0.3149
KST v.s. NS      0.0390   0.3994   0.1457   0.6006   0.5113
AR(1) v.s. NS    0.0001   0.0716   0.0298   0.1457   0.1593
KSI v.s. NS      0.0003   0.0716   0.0298   0.1328   0.2048

Table 3.8: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the SCP paradigm with ITI 100 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.1208   0.2569   0.5563   0.5563   0.5113
KST v.s. NS      0.0444   0.3149   0.5113   0.4662   0.4887
AR(1) v.s. NS    0.0028   0.0342   0.1208   0.2388   0.1593
KSI v.s. NS      0.0034   0.0568   0.1593   0.3149   0.2214

Table 3.9: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RCP paradigm with ITI 100 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.7051   0.4437   0.4887   0.6006   0.6224
KST v.s. NS      0.9702   0.6437   0.6006   0.6851   0.8263
AR(1) v.s. NS    0.0259   0.1888   0.2388   0.2756   0.2388
KSI v.s. NS      0.0390   0.2949   0.2214   0.2949   0.2569

Table 3.10: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RSVP paradigm with ITI 200 ms.


SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.4437   0.4887   0.5113   0.6006   0.4662
KST v.s. NS      0.5113   0.6006   0.5338   0.5786   0.6851
AR(1) v.s. NS    0.0194   0.1593   0.2048   0.3149   0.2214
KSI v.s. NS      0.0225   0.0891   0.2569   0.2949   0.2569

Table 3.11: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the SCP paradigm with ITI 200 ms.

SM v.s. NS       10       20       30       40       50
GKS v.s. NS      0.0800   0.2569   0.3563   0.4662   0.6224
KST v.s. NS      0.6437   0.2214   0.6224   0.5338   0.4887
AR(1) v.s. NS    0.0072   0.0166   0.0989   0.1888   0.1888
KSI v.s. NS      0.0041   0.0342   0.1208   0.2048   0.1888

Table 3.12: p-values of a right-sided Wilcoxon rank sum test for the different covariance models versus NS for different calibration lengths ({10, 20, 30, 40, 50} sequences) in the RCP paradigm with ITI 200 ms.


Chapter 4

Active Recursive Bayesian State Estimation for Multimodal Noninvasive Body/Brain Computer Interface Design

4.1 Noninvasive BBCIs for Communication and Control

EEG can only offer a low signal-to-noise ratio (SNR); hence, in the recent decade researchers have increasingly considered combining non-BCI ATs with BCIs to develop reliable systems. The resulting body/brain computer interfaces (BBCIs) can enhance the detection accuracy and the inference speed. For instance, combining electromyography (EMG) signals, which record the electrical activity of muscles, or electrooculography (EOG) signals with EEG has shown great improvement over standalone inference from each modality separately [73].

In this chapter, we introduce a framework for the design of a multi-modal noninvasive BBCI. In addition to having the capability of employing various combinations of EEG potentials as listed above, this framework also combines EEG with any combination of different physiological measurement modalities (such as EMG, fNIRS, fMRI, etc.) to jointly infer the user intent. The introduced framework processes and communicates the large amount of data that is streamed from

This work was supported by: NSF CNS-1136027, IIS1149570; NIH 2R01DC009834-06A1; NIDRR H133E140026.


multiple modalities to make an inference in a short period of time, in order to make these BBCIs usable in on-line settings. The framework, as explained in the next section, satisfies such on-line performance requirements by introducing certain conditions on the streamed data and enables parallel processing before the fusion, making the entire process feasible in real time.

4.2 Active Recursive Bayesian State Estimation

We represent our fusion and joint inference architecture as a hidden Markov model of order n (HMM-n). In this dynamic system setup, recursive Bayesian state estimation (RBSE) is used to extract information about parameters, or states, of the system in real time given the noisy measurements of the system output.

Throughout this section we build a probabilistic graphical model of our system in four steps. At each step, based on the structure of the problem, we impose a set of assumptions on the HMM-n that make the model more restricted and specific to Active-RBSE. First we start with the abstract PGM of the proposed HMM-n, as shown in Figure 4.1.


Figure 4.1: Hidden Markov Model of order n (HMM-n).

In this figure, x_k represents the system state at time k and O_k is the system output measurement. In a BCI, we assume that the system state represents the user intent, which belongs to a finite discrete space, while we let the measurement space be continuous. O_k is the multi-modal evidence, such that O_k = {O_k^1, O_k^2, ..., O_k^m}, where m represents the number of measurement modalities. The evidence that we consider here can be divided into two types:


a. Internally driven (Type I, abbreviated as T-I throughout this chapter) are the set of measurements that are generated by the user based on the desired intent, without external stimulation. Two examples of this type of evidence are:

∙ Volitional cortical potentials (VCP)

∙ Eye-tracking signal: Eye-tracker measurements can be used to define the direction of focus in the visual field. Using these measurements, the BBCI can identify the user intent by defining the visual space on which the user is focusing.

b. Externally cued (Type II, abbreviated as T-II throughout this chapter) are the set of measurements that are generated by the user's physiology as a function of the intent, in response to a set of stimuli (or questions) presented to the user. Two examples of this type of evidence are:

∙ Auditory and visual event related potentials (A-ERP/V-ERP)

∙ Steady-state evoked potentials (SSEP)

Consequently, the system output measurements can be partitioned into two sets, O_{1,k} = {O^1_{1,k}, O^2_{1,k}, ..., O^{m_1}_{1,k}} for T-I and O_{2,k} = {O^1_{2,k}, O^2_{2,k}, ..., O^{m_2}_{2,k}} for T-II evidence, such that O_k = O_{1,k} ∪ O_{2,k} and thus m = m_1 + m_2. Next we describe our assumptions on the probabilistic relationships among the different measurements originating from the various modalities. Accordingly, the abstract graphical model presented in Figure 4.1 is detailed in the probabilistic graphical model (PGM) illustrated in Figure 4.2, which represents our assumptions on the interdependency among the observations.

In this chapter, using the assumptions imposed through the graphical model in Figure 4.2, we employ a maximum a-posteriori (MAP) inference method to estimate the user intent. To compute the posterior PMF over the state space, we use the Bayes rule, posterior ∝ prior × likelihood. According to the PGM shown in Figure 4.2, for a given state value, the likelihoods corresponding to different modalities can be calculated, up to a normalization factor, independently from each other. Hence, for the rest of the chapter we focus on estimating the posterior from one type of evidence, as the other likelihoods can be calculated likewise and easily fused with each other.

The PGMs illustrated in Figures 4.1 and 4.2 correspond to real-time causal systems, and they are designed to infer the user intent and execute certain tasks when a confidence threshold is attained. In this chapter, we refer to the time window in which the system reaches a confident decision as an epoch. In this setup, when the output measurements up to epoch k are observed, the goal is to estimate the current system state. Note that this setup represents a dynamic system in which the state, the user intent, might change during the operation of the BBCI system. For instance, in a letter-by-letter typing scenario, the state at epoch k represents only a character, while the user needs to type a sequence of characters to form words and sentences and eventually communicate the desired message. As a result, it seems inevitable to adaptively update the (probabilistic or deterministic)



Figure 4.2: PGM of the system during inference cycle k.

system belief about the state space over time. Upon intent detection, the BCI executes a command, and hence the system states at all past epochs can be assumed to be observed. Generally, the user action at epoch k is a function of the previously estimated system states, and the human user in the loop acts as a controller of this closed-loop system to perform a task. Moreover, the user interest can be affected by some environmental factors z_k. The high-level graphical model in Figure 4.1 can then be updated as in Figure 4.3.


Figure 4.3: HMM-n while prior states are observed.


In the rest of this chapter, we focus on the intent inference mechanism from T-II-based measurement outputs, e.g., EEG-ERP evidence. Typically a BCI queries the user with a set of questions to obtain a set of noisy measurements of the user intent. These queries may contain sets of state values presented to the user, and the user responds to these questions with a yes/no answer through some voluntarily or involuntarily generated physiological evidence. Let us define C_k = {x_{k−n}, ..., x_{k−1}, z_k} and the query set as Φ_k; then we update the PGM as in Figure 4.4. Throughout this chapter we refer to C_k as the context information.


Figure 4.4: PGM of the system at epoch k.

In Figure 4.4, A_i is a subset of the state space which we call a trial, y_x(A_i) ∈ {0, 1} with 1 = yes and 0 = no, and e(A_{i,j}) represents the physiological measurement in response to A_{i,j}, the j-th element of the subset A_i. Due to the noisy nature of the measurements, the BCI might need to query the user with multiple sequences of trials to obtain a confident estimate.

In this setup, the system queries the user iteratively and updates the posterior probability mass function (PMF) over the state space until the probability of the most likely state value reaches a predefined confidence threshold. The updated PGM, which allows for multiple sequences, is shown in Figure 4.5.

In this model (see Figure 4.5), we have set an upper bound m_s on the number of sequences within each epoch to discard the possibility of extremely long decision cycles. This setup allows us to update the posterior PMF recursively after every sequence. Assume 1 ≤ s ≤ m_s sequences have been shown to the user and define ℰ_s = {E_i}_{i=1}^{s}, where E_i = {e(A^i_j) | j = 1, ..., |Φ^i_k|}. Similarly, take Y^i_{x_k} = {y_{x_k}(A^i_j) | j = 1, ..., |Φ^i_k|} and then define 𝒴^s_{x_k} = {Y^i_{x_k}}_{i=1}^{s}. The MAP framework estimates



Figure 4.5: PGM of the system at epoch k when multiple sequences are presented.

the user intent by solving the following optimization problem:

\hat{x}_k = \arg\max_{x} P\!\left(x_k = x \mid \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s}\right) \qquad (4.1)

The posterior probability defined in (4.1) can be factorized in terms of the likelihood and the context prior using the assumptions imposed in Figure 4.5.

P\!\left(x_k = x \mid \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s}\right)
= \frac{ p\!\left(x_k = x, \mathcal{E}_s \mid C_k, \{\Phi_k^i\}_{i=1}^{s}\right) }{ p\!\left(\mathcal{E}_s \mid C_k, \{\Phi_k^i\}_{i=1}^{s}\right) }
\propto p\!\left(\mathcal{E}_s \mid x_k = x, \{\Phi_k^i\}_{i=1}^{s}\right) \cdot P\!\left(x_k = x \mid C_k\right) \qquad (4.2)

But, for a given x_k, the label set 𝒴^s_{x_k} for {Φ^i_k}_{i=1}^s is deterministically defined. Hence, according to the conditional independence of ℰ_s and the context information defined in the PGM, we obtain:

p\!\left(\mathcal{E}_s \mid x_k = x;\, \{\Phi_k^i\}_{i=1}^{s}\right)
= p\!\left(\mathcal{E}_s \mid \mathcal{Y}^{s}_{x_k};\, \{\Phi_k^i\}_{i=1}^{s}\right)
= \prod_{\substack{i=1,\dots,s \\ j=1,\dots,|\Phi_k^i|}} p\!\left(e(A^i_j) \mid y_{x_k}(A^i_j);\, \{\Phi_k^i\}_{i=1}^{s}\right) \qquad (4.3)


Then we can rewrite (4.1) as

P\!\left(x_k = x \mid \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s}\right)
\propto \prod_{\substack{i=1,\dots,s \\ j=1,\dots,|\Phi_k^i|}} p\!\left(e(A^i_j) \mid y_{x_k}(A^i_j);\, \{\Phi_k^i\}_{i=1}^{s}\right) \cdot P\!\left(x_k = x \mid C_k\right)
\propto \prod_{\substack{i=1,\dots,s \\ \{j \,\mid\, y_{x_k}(A^i_j)=1\}}} \frac{ p\!\left(e(A^i_j) \mid y_{x_k}(A^i_j)=1;\, \{\Phi_k^i\}_{i=1}^{s}\right) }{ p\!\left(e(A^i_j) \mid y_{x_k}(A^i_j)=0;\, \{\Phi_k^i\}_{i=1}^{s}\right) } \cdot P\!\left(x_k = x \mid C_k\right) \qquad (4.4)
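A minimal sketch of the recursive update implied by (4.4) is given below (assumed data structures and names, not the system implementation): the context prior over the state space is multiplied, for every trial that contains a hypothesized symbol, by the likelihood ratio of the corresponding EEG score, and the result is renormalized.

    import numpy as np

    def map_update(prior, trials, lr_scores):
        """prior: dict symbol -> P(x_k = x | C_k); trials: list of symbol sets (the A_j);
        lr_scores: list of likelihood ratios p(e | y=1)/p(e | y=0), one per trial."""
        post = dict(prior)
        for A_j, lr in zip(trials, lr_scores):
            for x in post:
                if x in A_j:                 # y_x(A_j) = 1 only if the trial contains x
                    post[x] *= lr
        z = sum(post.values())               # normalize to a proper PMF
        return {x: p / z for x, p in post.items()}

    # Example: three symbols, two trials flashed in one sequence.
    prior = {"A": 0.5, "B": 0.3, "C": 0.2}
    posterior = map_update(prior, trials=[{"A", "B"}, {"C"}], lr_scores=[2.0, 0.4])
    x_hat = max(posterior, key=posterior.get)   # MAP estimate of the user intent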

4.2.1 Active learning for RBSE

T-II-based measurements for a BCI are obtained in response to a set of labeling questions queried from the user. For a small set of actions, it is possible to query the user for the labels of all possible state values in every sequence. However, if the state space is large, depending on the level of abstraction offered by the questions, this can potentially lead to long sequences with a large amount of non-informative multivariate measurements to be processed. In addition, obtaining human answers to these questions can be very expensive in terms of time and cognitive frustration, especially when the user has severe disabilities. To mitigate these problems, one could propose to select a random subset of the state space to be presented to the user. Instead, we propose to use active learning (AL) to intelligently select samples for annotation, which enables efficiently learning an accurate posterior with as few questions as possible. Here, the implicit assumptions are that the labeling costs (in terms of time or cognitive load) are the same for all of the queries and are also significantly larger than the computational cost of the querying algorithms. The latter assumption leads us towards AL algorithms that generate suboptimal batches of queries, even though they might not be the optimal solutions of the defined objective functions. Later in this chapter, through experimental results, we show that this mechanism outperforms cheap passive learning with random queries.

Accordingly, we define Active-RBSE for inference and query optimization. Within this framework, a generic AL and MAP inference loop iterates by alternating between the following two steps:

\text{Query:} \quad \Phi_k^{s+1} = \arg\max_{\Phi_k^{s+1}} g\!\left(\Phi_k^{s+1}\right) \quad \text{s.t.} \quad \Phi_k^{s+1} \in \mathcal{F}_k \subseteq 2^{\mathcal{X}} \qquad (4.5)

\text{Inference:} \quad \hat{x}_k = \arg\max_{x} P\!\left(x_k = x \mid \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s}\right) \qquad (4.6)

Here, Φ_k^{s+1} is a potential query set restricted to the set of feasible queries at time k, ℱ_k, which is a subset of all possible queries, 2^𝒳, the power set of 𝒳. The quality of a query from the perspective of


AL is measured by the set function g.

An AL setting typically starts with an initial model (which we obtain from the context information); then samples are selected for label querying. Performing active learning in batch mode (sequence by sequence) introduces new challenges. Since we need to select a set of queries, one should also make sure that the samples are non-redundant, to maximize the amount of information that they provide. Another related challenge is that optimally selecting a subset of samples based on a given objective function defined over sets is in general an NP-hard combinatorial optimization problem and can easily lead to intractable solutions.

4.2.2 Submodular monotone set functions for set optimization problems

Submodular set functions offer various mathematical properties that can be exploited to define tractable solutions to combinatorial optimization problems. Submodular set functions are discrete analogs of concave or convex real-valued functions [101]. Next we introduce certain definitions and theorems about submodular functions.

Definition 1 . (Discrete derivative)

Assume a set function f : 2^𝒳 → ℝ, B ⊆ 𝒳, and w ∈ 𝒳; then Δ_f(w|B) := f(B ∪ {w}) − f(B) is the “discrete derivative” of f at B with respect to w.

Now we can define a submodular function as follows.

Definition 2 . (Submodular set function)

A function f : 2^𝒳 → ℝ is “submodular” if for every B_1 ⊆ B_2 ⊆ 𝒳 and w ∈ 𝒳 \ B_2,

Δ(w|B_1) ≥ Δ(w|B_2),

or, equivalently, the function f : 2^𝒳 → ℝ is “submodular” if for every B_1, B_2 ⊆ 𝒳,

f(B_1 ∩ B_2) + f(B_1 ∪ B_2) ≤ f(B_1) + f(B_2).

In particular, one can use a greedy forward algorithm to find a solution within a guaranteed bound around the global optimum when the objective is a monotone submodular set function [102]. To provide the proof, we first need to define monotone set functions.

Definition 3 . (Monotone set function)


A set function f : 2^𝒳 → ℝ is “monotone” if for every B_1 ⊆ B_2 ⊆ 𝒳, we have f(B_1) ≤ f(B_2).

In a maximization problem, the greedy forward algorithm starts with B_0 = ∅ and iteratively adds the element that maximizes the discrete derivative of the function at the set from the prior iteration, with respect to that element. Accordingly, the subproblem for iteration i is:

B_i = B_{i-1} \cup \left\{ \arg\max_{w} \Delta(w \mid B_{i-1}) \right\} \qquad (4.7)
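The following sketch implements the greedy forward selection of (4.7) for a generic set function under a cardinality budget (illustrative only; the modular example objective, the total prior mass of the selected symbols, is an assumption used purely for demonstration):

    def greedy_forward(ground_set, f, budget):
        """ground_set: iterable of candidate elements; f: set function taking a frozenset;
        budget: maximum number of elements to select."""
        selected = frozenset()
        remaining = set(ground_set)
        for _ in range(budget):
            # pick the element with the largest discrete derivative at the current set
            best = max(remaining, key=lambda w: f(selected | {w}) - f(selected))
            selected = selected | {best}
            remaining.remove(best)
        return selected

    # Example with a modular objective: total prior probability mass of the selected symbols.
    prior = {"A": 0.4, "B": 0.25, "C": 0.2, "D": 0.15}
    f = lambda B: sum(prior[x] for x in B)
    print(greedy_forward(prior.keys(), f, budget=2))   # -> frozenset({'A', 'B'})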

Theorem 1. [101] Assume a nonnegative monotone submodular set function f : 2^𝒳 → ℝ_+. Also define {B_i}_{i≥0} to be the greedily selected sets according to equation (4.7). Then for all positive integers k and l we have

f(B_l) \ge \left(1 - e^{-l/k}\right) \max_{B : |B| \le k} f(B).

Proof. Fix k and l, and let B^* ∈ argmax{f(B) : |B| ≤ k} be an optimal set with |B^*| ≤ k. Since f is a monotone set function, we can assume |B^*| = k without loss of generality and define B^* = {w^*_1, w^*_2, ..., w^*_k}. Then for all i ≤ l,

\begin{aligned}
f(B^*) &\le f(B^* \cup B_i) \\
&= f(B_i) + \sum_{j=1}^{k} \Delta\!\left(w^*_j \mid B_i \cup \{w^*_1, \dots, w^*_{j-1}\}\right) \\
&\le f(B_i) + \sum_{w^* \in B^*} \Delta(w^* \mid B_i) \\
&\le f(B_i) + \sum_{w^* \in B^*} \left( f(B_{i+1}) - f(B_i) \right) \\
&\le f(B_i) + k \left( f(B_{i+1}) - f(B_i) \right).
\end{aligned}

Hence we have

f(B^*) - f(B_i) \le k \left( f(B_{i+1}) - f(B_i) \right).

Now define s_i = f(B^*) − f(B_i); then we get

s_i \le k\,(s_i - s_{i+1}) \;\Rightarrow\; s_{i+1} \le \left(1 - \tfrac{1}{k}\right) s_i \;\Rightarrow\; s_l \le \left(1 - \tfrac{1}{k}\right)^{l} s_0.

We know that s_0 = f(B^*) − f(∅) ≤ f(B^*), since f is nonnegative by assumption. Consequently, by use of


the well-known inequality 1 − x ≤ e^{−x}, ∀x ∈ ℝ, we get

s_l \le \left(1 - \tfrac{1}{k}\right)^{l} s_0 \le e^{-l/k} f(B^*) \;\Rightarrow\; f(B^*) - f(B_l) \le e^{-l/k} f(B^*) \;\Rightarrow\; f(B_l) \ge \left(1 - e^{-l/k}\right) f(B^*).

More interestingly, for a modular monotone objective function, the greedy forward algorithm leads to the globally optimal solution. The proof of this proposition follows easily from the following definition, since the contribution of each element does not depend on the set size.

Definition 4 . (Modular set function)

A function f : 2^𝒳 → ℝ is “modular” if for every B_1 ⊆ B_2 ⊆ 𝒳 and w ∈ 𝒳 \ B_2,

Δ(w|B_1) = Δ(w|B_2),

or, equivalently, the function f : 2^𝒳 → ℝ is “modular” if for every B_1, B_2 ⊆ 𝒳,

f(B_1 ∩ B_2) + f(B_1 ∪ B_2) = f(B_1) + f(B_2).

4.2.3 On the objective functions for Query optimization

System parameter/state learning can be more efficient if we can query the oracle (in our case the user) to obtain the labels of the state values which convey the most salient information. Such querying can be achieved through careful selection of the objective functions in the Active-RBSE inference framework. It is important to note here that, for efficient solution of the subset selection through a greedy forward optimization in an online setting, in addition to being informative about the state estimation, the objective functions need to be either monotone modular set functions or upper or lower bounded by such set functions.

Here, we consider a g(·) to be used in the Active-RBSE framework, as specified in (4.5). Let us assume that the actual user intent for the current epoch (i.e., epoch k) is given as x^*_k, and that s sequences of stimuli have already been presented to the user. The goal is to optimize the query set for sequence s + 1, under the assumption that the s prior sequences have not led to a confident decision. Then, although


the measurements for that sequence have not been observed yet, one can predict the posterior probability of x^*_k by introducing and marginalizing the random variable for the measurements, when Φ_k^{s+1} is given. We define a function g : 𝒳 × 2^𝒳 → ℝ as:

\begin{aligned}
g\!\left(x, \Phi_k^{s+1}\right) &= P\!\left(x_k = x \mid \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s+1}, x^*_k = x\right) \\
&= \int_{E_{s+1}} P\!\left(x_k = x, E_{s+1} \mid \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s+1}, x^*_k = x\right) d(E_{s+1}) \\
&= \mathbb{E}_{E_{s+1} \mid \Phi_k^{s+1},\, x^*_k}\!\left[ P\!\left(x_k = x \mid E_{s+1}, \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s}, x^*_k\right) \right] \\
&= \mathbb{E}_{E_{s+1} \mid \Phi_k^{s+1},\, x^*_k}\!\left[ \frac{ \Pi_{s+1}(x)\, p\!\left(E_{s+1} \mid x_k = x, \Phi_k^{s+1}\right) }{ \sum_{v \in \mathcal{X}} \Pi_{s+1}(v)\, p\!\left(E_{s+1} \mid x_k = v, \Phi_k^{s+1}\right) } \right]
\end{aligned} \qquad (4.8)

where Π_{s+1}(x) = P(x_k = x | ℰ_s, C_k, {Φ^i_k}_{i=1}^s) represents the prior probability of x ∈ 𝒳 before observing sequence s + 1. Moving from the third line to the fourth line of (4.8), we use the following,

P\!\left(x_k = x \mid E_{s+1}, \mathcal{E}_s, C_k, \{\Phi_k^i\}_{i=1}^{s+1}\right) = \frac{ \Pi_{s+1}(x)\, p\!\left(E_{s+1} \mid x_k = x, \Phi_k^{s+1}\right) }{ \sum_{v \in \mathcal{X}} \Pi_{s+1}(v)\, p\!\left(E_{s+1} \mid x_k = v, \Phi_k^{s+1}\right) } \qquad (4.9)

for which the denominator is the normalization constant.

Note that g(x, Φ_k^{s+1}) computes the posterior probability of the hypothesized target for a particular Φ_k^{s+1}, given the previously observed measurements and the context information. But note that during the current epoch, x is yet to be estimated and hence is not known. Consequently, we can marginalize out the dependency on this unobserved random variable by computing the expected value of g(x, Φ_k^{s+1}) with respect to the most recent estimate of the state space posterior PMF, Π_{s+1}(x).

Accordingly, the objective function for query set selection is then defined as follows.

\Phi_k^{s+1} = \arg\max_{\Phi_k^{s+1}} \mathbb{E}_{\Pi_{s+1}(x)}\!\left[ g\!\left(x, \Phi_k^{s+1}\right) \right] \qquad (4.10)

The function defined in (4.10) is the expected value of the predicted target posterior probability with respect to the current probability distribution over the state space, which was obtained from the evidence gained up to sequence s + 1. Optimizing this function with respect to Φ_k^{s+1} exploits our current knowledge in order to maximize our belief in the inference of the unknown state x_k.

Next, we illustrate the usage of this objective function in stimulus subset selection for a language-model-assisted BCI for letter-by-letter typing. In the following subsections, after a short introduction of the typing BCI, we approximate the proposed objective function with a modular monotone set function and provide the algorithm for stimulus subset selection. Then, through an experimental


study, we demonstrate the benefit of the AL component of Active-RBSE in terms of typing speed and accuracy.

4.3 Illustrative BCI Design Example

A safe and portable class of BCIs utilizes non-invasively recorded electroencephalography (EEG) for inference. Among many, a class of these BCIs employs external cues to induce event related potentials (ERPs) in response to the user intent. Most commonly, ERP-based BCI systems which rely on visual stimulation can utilize various presentation paradigms. The pioneering example of these systems is the matrix speller of Farwell and Donchin, which demonstrates how to design a presentation paradigm for inducing a P300 (an ERP elicited in response to a desired, unpredictable and rare event, characterized by a positive peak around 300 ms after the onset of the desired stimulus) in response to user intent as a control signal for BCI-based communication [10]. In this study, the subjects observe a 6x6 matrix of the letters of the English alphabet, the numbers from 1 to 9 and a space symbol distributed on the screen. While the user focuses on the intended character, the rows and columns of the matrix are flashed randomly. This work led to extensive efforts in designing different configurations and algorithms to improve the communication speed and accuracy with the matrix speller, as well as other audio, visual, and tactile stimulus presentation techniques for eliciting P300 responses. In the following, we review some of these stimulus presentation techniques.

Visuospatial Presentation: We categorize the different visuospatial presentation techniques into the following groups:

Matrix Presentation: Generally the matrix spellers use an R × C matrix of symbols with R rows and C columns. Traditionally, in these systems each row and column of the matrix is intensified in a pseudo-random fashion, while the participant counts the number of highlighted rows or columns (or, in general, subsets) that include the desired symbol. Among all rows and columns in the matrix, only two contain the target symbol, hence it is proposed that they will induce a P300 response. By detecting this signature in the EEG, the BCI system can identify the target letter to enable typing.

The accuracy of BCIs depends highly on the signal-to-noise ratio (SNR). Consequently, due to the low SNR of EEG, matrix speller systems need to sacrifice speed by cuing the user with multiple sequences of flashes to achieve an acceptable accuracy. It was demonstrated that the matrix speller can achieve 7.8 characters/minute with 80% accuracy, using bootstrapping and averaging the trials in different sequences [11]. Many signal processing and machine learning techniques have been proposed by researchers in the field to improve the matrix speller performance in terms of speed


and accuracy [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34].

Considering the target population, matrix-based presentation paradigms might not offer a suitable solution for BCIs: they perform well in overt attention mode; however, in covert attention mode their performance degrades significantly [35]. BCI researchers have proposed minimally gaze-dependent stimulus presentation techniques, such as rapid serial visual presentation and balanced-tree visual presentation, to overcome such performance drops.

Rapid Serial Visual Presentation (RSVP): RSVP is a technique in which stimuli are presented one at a time at a fixed location on the screen, in pseudorandom order, with a short time gap in between. Within a sequence of RSVP stimuli, each symbol is shown only once; hence, the user intent is considered a rare event which can induce an ERP containing the P300 wave as a consequence of the target matching process that takes place in the brain. RSVP aims to be less dependent on gaze control by utilizing temporal separation of the symbols instead of the spatial separation used in the matrix speller [36, 37, 38, 39, 40].

Usually, the inference speed in RSVP-based BCIs is lower than in matrix spellers: the binary tree that leads to symbol selections in a matrix speller can reduce the expected number of bits required to select a symbol (determined by entropy) by exploiting the opportunity of highlighting a subset of symbols, while RSVP is constrained to a highly structured right-sided binary tree which can only offer a larger expected number of bits per symbol. Letter-by-letter typing RSVP-BCIs designed by the Berlin BCI and RSVP Keyboard™ groups have achieved up to 5 characters/minute [36, 37, 38, 40]. Utilization of color cues and language models has offered some enhancements in typing speeds with RSVP [37, 40].

Balanced-Tree Visual Presentation Paradigms: In the balanced-tree visual presentation technique, visual stimuli are distributed spatially into multiple presentation groups with balanced numbers of elements. For example, in a system from Berlin BCI known as Hex-o-Spell, a set of 30 symbols is distributed among 6 presentation groups each containing 5 symbols. The presentation groups are flashed in a random fashion to induce an ERP in response to the group that contains the intended symbol. Upon selection of a group, the symbols of that set are distributed individually to different presentation groups, typically with one group containing a command symbol for moving back to the first presentation stage. Then the system utilizes the same flash paradigm to decide on the user's desired symbol [35, 41]. In a similar system known as Geospell, 12 groups of 6 symbols, corresponding to the rows and columns of a 6 × 6 matrix speller, are arranged in a circular fashion [42, 43]. In another study these 12 overlapping subsets of symbols are presented in RSVP manner [44]. In these systems, the intersection of the selected groups gives the desired symbol.

In this section we introduce a language-model-assisted EEG-based BCI which can utilize either of the two well-known matrix presentation paradigms (matrix row and column presentation (RCP) and matrix single


character presentation (SCP)) or the rapid serial visual presentation (RSVP) paradigm to cue the user for the inference of the desired symbol.

4.3.1 ERP based BCI for letter-by-letter typing

Figure 4.6 represents the complete flow chart of the BCI in this example. The system can be segmented into 3 main components: (A) a presentation component that controls the presentation scheme, (B) a feature extraction component that extracts the likelihoods from raw EEG evidence for Bayesian fusion, and (C) a decision making component that combines the EEG (physiology) and context information to estimate the user intent. In the next subsections these components are described in more detail.

[Figure 4.6 blocks: presentation component (stimuli/decision), feature extraction component (EEG preprocessing, dimensionality reduction), decision making component (joint inference of EEG evidence and contextual evidence).]

Figure 4.6: Typical BCI block diagram [1].


4.3.1.1 Presentation Component

Definitions: Let 𝒳 = {x_1, x_2, x_3, ..., x_{|𝒳|}} be the vocabulary set, i.e., the state space. In this example, for a letter-by-letter typing application, 𝒳 consists of the letters of the (English) alphabet, numerical symbols, and the space and backspace symbols (represented here by _ and < respectively). Define 2^𝒳 = {A_1, A_2, ..., A_{2^{|𝒳|}}} as the power set of 𝒳, with A_i ⊆ 𝒳.

As a reminder, we define a “trial” as a subset A_i which is highlighted during the presentation. In the RCP paradigm, each trial consists of multiple characters, i.e., |A_i| ≥ 1, but in the RSVP and SCP paradigms each trial is a singleton, i.e., |A_i| = 1. A “sequence” is a series of consecutive flashes of trials with a predefined short inter trial interval (ITI) in between. Among many definitions, here we use the ITI as the time gap between the onsets of two consecutive trials in a sequence. After every sequence, the system fuses the likelihoods obtained from the EEG recorded in response to that sequence to compute the posterior PMF over the vocabulary set and tries to estimate the user intent through MAP inference. However, a final decision is not made until a predefined confidence level is reached (in the current implementation, confidence is measured by the maximum posterior probability over 𝒳; this corresponds to using Rényi entropy of order ∞ as the measure of uncertainty, and other entropy definitions, such as Shannon's, could also be used). Therefore, the system may need to query the user with multiple sequences before committing to a decision. In this chapter a set of sequences which leads to a decision is referred to as an “epoch”.

In matrix-based presentation paradigms, symbols are spatially distributed on an R × C grid with R rows and C columns [3]. To cue the user for the inference of the desired character, subsets of these symbols are intensified, typically in pseudorandom order.

In every sequence, trials A_1, A_2, ..., A_n are usually selected such that ∪_{i=1}^{n} A_i = 𝒳. RCP is a paradigm in which a trial A_i is constrained to contain exactly the symbols in a row or a column of the matrix of symbols, with n = R + C [10]. Accordingly, in RCP every symbol in 𝒳 is flashed twice in every sequence, since |A_i ∩ A_j| ≤ 1 for i ≠ j. For this example, to obtain the best coverage of the wide-screen monitors used in our experiments, we utilize a matrix of size 4 × 7. Researchers in the field have suggested that a target probability of less than 25% can lead to a detectable P300 wave in response to the desired symbol [10]. In our setup for RCP, each sequence contains 11 flashes, among which only 2 contain the target symbol; hence, the probability of a target-containing trial in a sequence is 2/11 ≈ 0.18 ≤ 0.25.

In contrast to RCP, the single character presentation (SCP) paradigm was shown to increase the P300 signal quality, which leads to more accurate target detection [77]. In SCP, each trial is a singleton, i.e., |A_i| = 1. Moreover, for the trials in a sequence it is required that A_i ∩ A_j = ∅ for i ≠ j. We choose a sufficient number of flashes (n ≥ 5) in a sequence to keep the target probability below 25%.
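A small sketch of how one sequence of trials can be generated for each paradigm over the 4 × 7 grid is given below (illustrative only; the symbol ordering and the singleton subset size are assumptions):

    import random

    SYMBOLS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ_<")     # 26 letters + space + backspace
    R, C = 4, 7
    GRID = [SYMBOLS[r * C:(r + 1) * C] for r in range(R)]

    def rcp_sequence():
        """One RCP sequence: the R + C row/column trials in pseudorandom order."""
        rows = [set(GRID[r]) for r in range(R)]
        cols = [{GRID[r][c] for r in range(R)} for c in range(C)]
        trials = rows + cols
        random.shuffle(trials)
        return trials

    def singleton_sequence(n_trials=7):
        """One SCP/RSVP-style sequence: n_trials singleton trials (target prob. <= 1/n)."""
        return [{s} for s in random.sample(SYMBOLS, n_trials)]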


Similar to SCP, each trial in RSVP includes only a single symbol. It has been shown that RSVP-BCI systems which present all 28 symbols in every sequence can only achieve a speed of 5 symbols/minute [36, 37, 38, 40]. Instead, one may choose to present a subset of the vocabulary in each sequence to improve the typing speed and accuracy. Active-RBSE offers a principled mechanism for selecting this subset, not only for RSVP but also for the RCP and SCP paradigms, to achieve an optimal solution based on an objective function that is associated with gaining information about the user intent, more specifically for state estimation as described in Section 4.2.1. The results of the experimental study described in Section 4.3.5 show that this framework can improve both the typing speed and accuracy significantly.

4.3.2 Decision Making Component

The inference engine of the BCI in this example uses language structure information and provides a context prior to be fused with the EEG likelihoods. The joint decision from the context information and EEG evidence is estimated through MAP inference. The system parameters are optimized for each user individually, with the data obtained from a calibration session collected in a supervised fashion.

4.3.2.1 EEG feature extraction and classification

More reliable inference from EEG can be achieved by preprocessing the data and enhancing the signal quality. The processed signal is used in this system to form EEG feature vectors, and the details of this procedure are provided later in Section 4.3.5. These feature vectors are computed by applying certain linear operations on the EEG time series. EEG is widely considered a Gaussian process in the field [103, 104, 1]. Hence, it seems natural to use quadratic discriminant analysis (QDA) (the Gaussian distribution assumption here is a direct consequence of the assumption that filtered EEG is a Gaussian random process) to project these vectors onto a one dimensional space with minimum expected classification risk. But QDA requires invertible class-conditional covariance matrix estimates, which are not feasible in practical usage of the BCI due to the high dimensionality of the EEG feature vectors and the low number of calibration samples in each class (in BCI design, a limited number of calibration samples is obtained in order to keep the calibration sessions short). This problem can be mitigated by employing regularized discriminant analysis (RDA), which provides full-rank class-conditional covariance estimates [80].

Rank-deficient covariance matrices are obtained using maximum likelihood estimation from the calibration data. RDA then converts these estimates to full-rank covariance matrices using shrinkage and regularization. Shrinkage is defined as a convex combination of each class covariance


matrix and the overall class-mean-subtracted covariance. Define $f_i \in \mathbb{R}^p$ as a $p$-dimensional feature vector, and $l \in \{0, 1\}$ as the class label, where 0 and 1 represent the non-target and target classes respectively; then the maximum likelihood estimators for the mean and covariance matrices are defined as in (4.11).

$$\hat{\mu}_l = \frac{1}{N_l}\sum_{i=1}^{N} f_i\,\delta(y_i, l), \qquad \hat{\Sigma}_l = \frac{1}{N_l}\sum_{i=1}^{N} (f_i - \hat{\mu}_l)(f_i - \hat{\mu}_l)^T\,\delta(y_i, l) \tag{4.11}$$

Here $N_l$ is the number of trials in class $l$, so $N = N_0 + N_1$ is the total number of feature vectors, and $\delta(\cdot,\cdot)$ is the Kronecker delta. The shrinkage step of RDA is defined as in (4.12).

$$\hat{\Sigma}_l(\lambda) = \frac{(1-\lambda)\,N_l\,\hat{\Sigma}_l + \lambda \sum_{j=0}^{1} N_j\,\hat{\Sigma}_j}{(1-\lambda)\,N_l + \lambda \sum_{j=0}^{1} N_j} \tag{4.12}$$

where $\lambda \in [0, 1]$ is known as the shrinkage parameter. Note that $\lambda = 1$ yields equal class-conditional covariance matrices and reduces the problem to linear discriminant analysis (LDA).

The regularization step of RDA is defined in (4.13).

$$\hat{\Sigma}_l(\lambda, \gamma) = (1-\gamma)\,\hat{\Sigma}_l(\lambda) + \frac{\gamma}{p}\,\mathrm{tr}\!\left[\hat{\Sigma}_l(\lambda)\right] I_p \tag{4.13}$$

In this equation, $\mathrm{tr}[\cdot]$ is the trace operator, $I_p$ denotes the $p \times p$ identity matrix, and $\gamma \in [0, 1]$ is the regularization parameter. The regularization step corresponds to diagonal loading of these matrices to make them invertible. The discriminant scores of RDA are then estimated similarly to QDA, but with the regularized and shrunk class-conditional covariance matrices.

$$e = \log\frac{f_{\mathcal{N}}\!\left(f;\,\hat{\mu}_1, \hat{\Sigma}_1(\lambda,\gamma)\right)\pi_1}{f_{\mathcal{N}}\!\left(f;\,\hat{\mu}_0, \hat{\Sigma}_0(\lambda,\gamma)\right)\pi_0} \tag{4.14}$$

In equation (4.14), $f_{\mathcal{N}}(f;\mu,\Sigma)$ denotes the Gaussian probability density function of $f \sim \mathcal{N}(\mu,\Sigma)$, and $\pi_l$ denotes the prior probability of class $l$. With no prior knowledge about a trial, we set $\pi_1 = \pi_0$. The EEG scores obtained from RDA are then treated as random variables whose class-conditional probability density functions (PDFs) are computed using kernel density estimation (KDE) as in (4.15) [40].


$$p(f \mid l) \;\approx\; f_{KDE}(e \mid l) = \frac{1}{N_l}\sum_{i=1}^{N} K_{h_l}(e, e_i)\,\delta(l_i, l) \tag{4.15}$$

In this equation, $K_{h_l}(\cdot,\cdot)$ is a suitable kernel function with bandwidth $h_l$. In this example, we use a Gaussian kernel and estimate the kernel bandwidth $h_l$ for each class using Silverman's rule of thumb [81], applied to the RDA scores of the corresponding class.
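As a concrete illustration of (4.11)-(4.15), the following is a minimal NumPy sketch (not the Matlab implementation used in this work) of RDA-regularized covariance estimation, the RDA discriminant score, and a Gaussian KDE with a Silverman-style bandwidth; the function and variable names are hypothetical.

import numpy as np

def rda_covariances(F, y, lam, gamma):
    # F: (N, p) matrix of EEG feature vectors, y: (N,) labels in {0, 1}.
    p = F.shape[1]
    mu, cov, N = {}, {}, {}
    for l in (0, 1):
        X = F[y == l]
        N[l] = X.shape[0]
        mu[l] = X.mean(axis=0)
        cov[l] = (X - mu[l]).T @ (X - mu[l]) / N[l]              # ML estimate, (4.11)
    pooled = N[0] * cov[0] + N[1] * cov[1]
    for l in (0, 1):
        shrunk = ((1 - lam) * N[l] * cov[l] + lam * pooled) \
                 / ((1 - lam) * N[l] + lam * (N[0] + N[1]))      # shrinkage, (4.12)
        cov[l] = (1 - gamma) * shrunk \
                 + gamma * np.trace(shrunk) / p * np.eye(p)      # regularization, (4.13)
    return mu, cov

def rda_score(f, mu, cov, priors=(0.5, 0.5)):
    # Log ratio of class-conditional Gaussians, as in (4.14); constant terms cancel.
    def log_gauss(x, m, S):
        d = x - m
        _, logdet = np.linalg.slogdet(S)
        return -0.5 * (logdet + d @ np.linalg.solve(S, d))
    return (log_gauss(f, mu[1], cov[1]) + np.log(priors[1])
            - log_gauss(f, mu[0], cov[0]) - np.log(priors[0]))

def kde_pdf(e, scores):
    # Gaussian KDE over the calibration scores of one class with a Silverman bandwidth, (4.15).
    scores = np.asarray(scores, dtype=float)
    bw = 1.06 * scores.std(ddof=1) * scores.size ** (-0.2)
    z = (e - scores) / bw
    return np.mean(np.exp(-0.5 * z ** 2)) / (np.sqrt(2 * np.pi) * bw)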

4.3.2.2 Language Model

In a letter-by-letter typing scenario, a letter n-gram language model can be used to obtain a prior PMF over the state space of user intent. This information can be used both for sequence optimization and for inference. A letter n-gram model corresponds to a Markov model of order $n-1$, which estimates the conditional probability of every letter in the alphabet based on the $n-1$ previously typed letters [82].

According to Bayes' rule and the conditional independence implied by an order-$(n-1)$ Markov model, the conditional probability of each character is computed as in (4.16).

$$p(x_k = x \mid \mathbf{x}_{k-1}) = \frac{p(x_k = x,\, \mathbf{x}_{k-1})}{p(\mathbf{x}_{k-1})} \tag{4.16}$$

As defined in Section 4.2, $x_k$ is the system state to be estimated during epoch $k$, and the string of the $n-1$ previously typed symbols is denoted by $\mathbf{x}_{k-1}$. In this particular example, we use a 6-gram letter model trained on the NY Times portion of the English Gigaword corpus [82].
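As a hedged illustration of (4.16), the sketch below computes a smoothed letter n-gram prior from raw n-gram counts; the count tables, the add-one smoothing, and all names are assumptions for the example and do not reproduce the Gigaword-trained model used here.

from collections import Counter

def ngram_prior(history, counts_n, counts_nm1, alphabet, n=6):
    # Return p(x_k = x | previous n-1 typed letters) for every x in the alphabet.
    # counts_n / counts_nm1: Counter objects of n-gram and (n-1)-gram occurrences.
    context = history[-(n - 1):]
    prior = {}
    for x in alphabet:
        num = counts_n[context + x] + 1                 # add-one smoothing (assumption)
        den = counts_nm1[context] + len(alphabet)
        prior[x] = num / den
    total = sum(prior.values())
    return {x: p / total for x, p in prior.items()}     # normalized PMF over the alphabet

# Toy usage with hypothetical counts: the context "_HELL" strongly favors "O".
counts_n = Counter({"_HELLO": 12, "_HELLP": 1})
counts_nm1 = Counter({"_HELL": 20})
print(ngram_prior("SAY_HELL", counts_n, counts_nm1, alphabet="OPQ"))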

4.3.3 Application of proposed cost function in BCI example

The ERP-based BCI of our example uses T-II-based measurements for user intent detection. Recall that, as described in Section 4.2, T-II measurements are obtained in response to external cues provided by the system. Hence, in contrast to T-I evidence, T-II measurements can be affected by the query set that the system employs in each sequence. Here, we use the objective function proposed in Section 4.2.3 to optimize the query set of each sequence.

In the function presented in (4.8), the argument inside the expectation is only a function of $E^{s+1}$


when $x$ and $\Phi_k^{s+1}$ are fixed. Hence, we define $\boldsymbol{\lambda} = \left[\lambda(A_1^{s+1}), \cdots, \lambda(A_{|\Phi_k^{s+1}|}^{s+1})\right]$, where

$$\lambda(A_j^i) = \frac{p\!\left(e(A_j^i) \mid 1\right)}{p\!\left(e(A_j^i) \mid 0\right)}.$$

Then we define a new function $h : \mathbb{R}^{|\Phi_k^{s+1}|} \to \mathbb{R}$ using (4.4) as follows:

$$h(\boldsymbol{\lambda}) = \frac{\Pi^{s+1}(x) \cdot \prod_{\{j \mid y_x(A_j^{s+1}) = 1\}} \lambda(A_j^{s+1})}{\sum_{v \in \mathcal{A}} \Pi^{s+1}(v) \cdot \prod_{\{j \mid y_v(A_j^{s+1}) = 1\}} \lambda(A_j^{s+1})} \tag{4.17}$$

The online operation mode of the BCI needs a time-efficient optimization mechanism. Therefore, we simplify the problem by approximating $g(x, \Phi_k^{s+1})$ using the Taylor series expansion of the function defined in (4.17):

$$g(x, \Phi_k^{s+1}) = E_{E^{s+1} \mid \Phi_k^{s+1}, x_k^*}\!\left[h(\boldsymbol{\lambda})\right] = E_{E^{s+1} \mid \Phi_k^{s+1}, x_k^*}\!\left[h(\mu_\lambda) + \left(\boldsymbol{\lambda} - \mu_\lambda\right) \cdot \nabla h(\mu_\lambda) + \cdots\right] \tag{4.18}$$

In (4.18), $\mu_\lambda = E_{E^{s+1} \mid \Phi_k^{s+1}, x_k^*}[\boldsymbol{\lambda}]$. We now use (4.18) to define a substitute objective function as in (4.19), which is the locally suboptimal linear approximation of the original objective function around $\mu_\lambda$. This type of approximation is commonly used in signal processing [105], especially for distributions with negligible higher-order central moments. The class-conditional KDEs estimated from experimental data are usually sharp unimodal PDFs with small variance, so we consider this approximation around the expected value of the distribution justifiable.

$$g(x, \Phi_k^{s+1}) \approx \tilde{g}(x, \Phi_k^{s+1}) = h(\mu_\lambda) + E_{E^{s+1} \mid \Phi_k^{s+1}, x_k^*}\!\left[\boldsymbol{\lambda} - \mu_\lambda\right] \cdot \nabla h(\mu_\lambda) = h(\mu_\lambda) \tag{4.19}$$

The Taylor expansion in (4.18) is done around $\mu_\lambda = E_{E^{s+1} \mid \Phi_k^{s+1}, x_k^*}[\boldsymbol{\lambda}]$; hence the second term in equation (4.19) is zero. We now need to compute $\mu_\lambda$. Here we use the conditional independence of trials for a given $x_k^*$, as presented in the proposed graphical model in Figure 4.5. This means that $\lambda(A_i^{s+1})$ is independent of $\lambda(A_j^{s+1})$ for all $i, j = 1, \ldots, |\Phi_k^{s+1}|$ with $i \neq j$. Note that $\lambda(A_j^{s+1})$ is evaluated at samples drawn from the following distributions:

$$e(A_j^{s+1}) \sim \begin{cases} p(e(\cdot) \mid 1), & \text{if } x_k^* \in A_j^{s+1} \\ p(e(\cdot) \mid 0), & \text{if } x_k^* \notin A_j^{s+1} \end{cases}$$


Accordingly, we define $\mu_\lambda = \left[\bar{\lambda}(A_1^{s+1}), \cdots, \bar{\lambda}(A_{|\Phi_k^{s+1}|}^{s+1})\right]$, such that

$$\bar{\lambda}(A_j^{s+1}) = \begin{cases} \lambda^+ = E_{e(\cdot) \mid 1}\!\left[\dfrac{p\!\left(e(A_j^{s+1}) \mid 1\right)}{p\!\left(e(A_j^{s+1}) \mid 0\right)}\right], & \text{if } x_k^* \in A_j^{s+1} \\[3ex] \lambda^- = E_{e(\cdot) \mid 0}\!\left[\dfrac{p\!\left(e(A_j^{s+1}) \mid 1\right)}{p\!\left(e(A_j^{s+1}) \mid 0\right)}\right], & \text{if } x_k^* \notin A_j^{s+1} \end{cases} \tag{4.20}$$
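For illustration, $\lambda^+$ and $\lambda^-$ in (4.20) could be estimated by a Monte-Carlo average of the likelihood ratio over class-conditional score samples. The sketch below uses scipy's gaussian_kde in place of the KDE described earlier; the score arrays are assumed to come from calibration data, and the function name is hypothetical.

import numpy as np
from scipy.stats import gaussian_kde

def lambda_plus_minus(target_scores, nontarget_scores):
    kde_t = gaussian_kde(target_scores)                     # estimate of p(e | target)
    kde_n = gaussian_kde(nontarget_scores)                  # estimate of p(e | non-target)
    ratio = lambda e: kde_t(e) / kde_n(e)
    lam_plus = float(np.mean(ratio(np.asarray(target_scores))))      # E_{e|1}[p(e|1)/p(e|0)]
    lam_minus = float(np.mean(ratio(np.asarray(nontarget_scores))))  # E_{e|0}[p(e|1)/p(e|0)]
    return lam_plus, lam_minus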

Then $\tilde{g}(x, \Phi_k^{s+1})$ can be written as follows (note that the approximation in (4.19) also corresponds to defining a point estimate of the EEG scores by computing their mean value as in (4.20)):

$$\tilde{g}(x, \Phi_k^{s+1}) = \frac{\Pi^{s+1}(x) \cdot (\lambda^+)^{c^+_{x,x}(\Phi_k^{s+1})}}{\sum_{v \in \mathcal{A}} \Pi^{s+1}(v) \cdot (\lambda^+)^{c^+_{x,v}(\Phi_k^{s+1})} \cdot (\lambda^-)^{c^-_{x,v}(\Phi_k^{s+1})}} \tag{4.21}$$

where

$$c^+_{x,v}(\Phi_k^{s+1}) = \sum_{j=1}^{|\Phi_k^{s+1}|} y_x(A_j^{s+1}) \cdot y_v(A_j^{s+1}) \quad \text{and} \quad c^-_{x,v}(\Phi_k^{s+1}) = \sum_{j=1}^{|\Phi_k^{s+1}|} \left(1 - y_x(A_j^{s+1})\right) \cdot y_v(A_j^{s+1})$$

for $y_x(A_j^{s+1}) \in \{0, 1\}$ (i.e., target and non-target classes). We can then use the approximate objective function $\tilde{g}(x, \Phi_k^{s+1})$ and redefine the optimization problem as follows:

$$\hat{\Phi}_k^{s+1} = \arg\max_{\Phi_k^{s+1}} E_{\Pi^{s+1}(x)}\!\left[\tilde{g}(x, \Phi_k^{s+1})\right] = \arg\max_{\Phi_k^{s+1}} \log\!\left(E_{\Pi^{s+1}(x)}\!\left[\tilde{g}(x, \Phi_k^{s+1})\right]\right) \tag{4.22}$$

Here we optimize the logarithm of the objective function, since the solution does not change under this monotonically increasing transformation. To solve the problem defined in (4.22), we use


Jensen's inequality to define a lower bound of the objective function as follows:

$$\begin{aligned} \log\!\left(E_{\Pi^{s+1}(x)}\!\left[\tilde{g}(x, \Phi_k^{s+1})\right]\right) &\ge E_{\Pi^{s+1}(x)}\!\left[\log\!\left(\tilde{g}(x, \Phi_k^{s+1})\right)\right] \\ &= E_{\Pi^{s+1}(x)}\!\left[\log\!\left(\frac{\Pi^{s+1}(x) \cdot (\lambda^+)^{c^+_{x,x}(\Phi_k^{s+1})}}{\sum_{v \in \mathcal{A}} \Pi^{s+1}(v) \cdot (\lambda^+)^{c^+_{x,v}(\Phi_k^{s+1})} \cdot (\lambda^-)^{c^-_{x,v}(\Phi_k^{s+1})}}\right)\right] \\ &= E_{\Pi^{s+1}(x)}\!\left[\log \Pi^{s+1}(x) + c^+_{x,x}(\Phi_k^{s+1}) \log \lambda^+\right] \\ &\quad - E_{\Pi^{s+1}(x)}\!\left[\log\!\left(\sum_{v \in \mathcal{A}} \Pi^{s+1}(v) \cdot (\lambda^+)^{c^+_{x,v}(\Phi_k^{s+1})} \cdot (\lambda^-)^{c^-_{x,v}(\Phi_k^{s+1})}\right)\right] \end{aligned} \tag{4.23}$$

Experimentally, the class-conditional PDFs are typically sharp and unimodal with sufficiently different mean values to assume $\lambda^+ > 1$ and $\lambda^- < 1$. In this example we impose an upper bound $|\Phi_k^{s+1}| \le m_t$ as a limit on the number of sequences in each epoch, to prevent extremely long state estimation cycles. We then have

$$(\lambda^+)^{c^+_{x,v}(\Phi_k^{s+1})} \le (\lambda^+)^{m_t} \quad \text{and} \quad (\lambda^-)^{c^-_{x,v}(\Phi_k^{s+1})} \le (\lambda^-)^0 = 1,$$

which leads to $(\lambda^+)^{c^+_{x,v}(\Phi_k^{s+1})} \cdot (\lambda^-)^{c^-_{x,v}(\Phi_k^{s+1})} \le (\lambda^+)^{m_t}$. Finally,

$$\begin{aligned} &E_{\Pi^{s+1}(x)}\!\left[\log \Pi^{s+1}(x) + c^+_{x,x}(\Phi_k^{s+1}) \log \lambda^+\right] - E_{\Pi^{s+1}(x)}\!\left[\log\!\left(\sum_{v \in \mathcal{A}} \Pi^{s+1}(v) \cdot (\lambda^+)^{c^+_{x,v}(\Phi_k^{s+1})} \cdot (\lambda^-)^{c^-_{x,v}(\Phi_k^{s+1})}\right)\right] \\ &\ge E_{\Pi^{s+1}(x)}\!\left[\log \Pi^{s+1}(x) + c^+_{x,x}(\Phi_k^{s+1}) \log \lambda^+\right] - E_{\Pi^{s+1}(x)}\!\left[\log\!\left((\lambda^+)^{m_t}\right)\right]. \end{aligned} \tag{4.24}$$

We can now exclude the terms that are independent of $\Phi_k^{s+1}$ in (4.24) and use (4.22) to define the optimization problem as follows:

$$\hat{\Phi}_k^{s+1} \approx \arg\max_{\Phi_k^{s+1}} Q(\Phi_k^{s+1}), \qquad Q(\Phi_k^{s+1}) = E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_k^{s+1}) \log \lambda^+\right] \tag{4.25}$$

Through this simplification we maximize a lower bound on the original cost function, and we show next that this simplification leads to a time-efficient solution to the optimization problem.


4.3.4 Combinatorial Optimization

The approximated objective function defined in (4.25) is a modular and monotone set function $Q : 2^{2^{\mathcal{A}}} \to \mathbb{R}$; therefore the optimization defined in (4.25) has guaranteed convergence properties [102]. Here we prove that $Q$ is a monotone modular set function.

Lemma 1. Take $\mathcal{D} = 2^{\mathcal{A}}$; then the function $Q : 2^{\mathcal{D}} \to \mathbb{R}$ as defined in (4.25) is a modular set function.

Proof. Assume $\Phi_1 \subseteq \Phi_2 \subseteq 2^{\mathcal{A}}$ and $A \in 2^{\mathcal{A}} \setminus \Phi_2$. Then

$$\begin{aligned} \Delta Q(A \mid \Phi_1) &= E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_1 \cup \{A\}) \log \lambda^+\right] - E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_1) \log \lambda^+\right] \\ &= E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_1 \cup \{A\}) \log \lambda^+ - c^+_{x,x}(\Phi_1) \log \lambda^+\right]. \end{aligned}$$

Since $A \notin \Phi_1$, we use the definition of $c^+_{x,x}(\cdot)$ to write $c^+_{x,x}(\Phi_1 \cup \{A\}) = c^+_{x,x}(\Phi_1) + c^+_{x,x}(\{A\})$, which gives

$$\Delta Q(A \mid \Phi_1) = E_{\Pi^{s+1}(x)}\!\left[\left(c^+_{x,x}(\Phi_1) + c^+_{x,x}(\{A\})\right) \log \lambda^+ - c^+_{x,x}(\Phi_1) \log \lambda^+\right] = E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\{A\}) \log \lambda^+\right].$$

Similarly, since $A \notin \Phi_2$, we have

$$\Delta Q(A \mid \Phi_2) = E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\{A\}) \log \lambda^+\right] \;\Rightarrow\; \Delta Q(A \mid \Phi_1) = \Delta Q(A \mid \Phi_2).$$

Lemma 2. Take $\mathcal{D} = 2^{\mathcal{A}}$; then the function $Q : 2^{\mathcal{D}} \to \mathbb{R}$ as defined in (4.25) is a monotone set function.

Proof. Assume $\Phi_1 \subseteq \Phi_2 \subseteq 2^{\mathcal{A}}$ and define $\Phi_3 = \Phi_2 \setminus \Phi_1$; then $\Phi_3 \cup \Phi_1 = \Phi_2$ and we can write

$$Q(\Phi_2) = E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_1 \cup \Phi_3) \log \lambda^+\right].$$


Moreover, $\Phi_3 \cap \Phi_1 = \emptyset$, so according to the definition of $c^+_{x,x}(\cdot)$ we have

$$\begin{aligned} Q(\Phi_2) &= E_{\Pi^{s+1}(x)}\!\left[\left(c^+_{x,x}(\Phi_1) + c^+_{x,x}(\Phi_3)\right) \log \lambda^+\right] \\ &= E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_1) \log \lambda^+\right] + E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_3) \log \lambda^+\right] \\ &= Q(\Phi_1) + E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_3) \log \lambda^+\right]. \end{aligned}$$

Based on our assumption, $\lambda^+ \ge 1 \Rightarrow \log \lambda^+ \ge 0$. Also, by definition, $c^+_{x,x}(\cdot) \ge 0$. Hence

$$E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_3) \log \lambda^+\right] \ge 0 \;\Rightarrow\; E_{\Pi^{s+1}(x)}\!\left[c^+_{x,x}(\Phi_3) \log \lambda^+\right] + Q(\Phi_1) \ge Q(\Phi_1) \;\Rightarrow\; Q(\Phi_2) \ge Q(\Phi_1).$$

As shown in Section 4.2.2, a greedy forward algorithm can provide a good approximation to the solution of an NP-hard set optimization problem when the objective function is a submodular monotone set function. It is easy to see that this algorithm provides the global optimum if the objective function is a modular monotone set function. Here we assume that the number of trials within each sequence is fixed and equal to $N_t$. Accordingly, the deterministic greedy algorithm is described in Algorithm 3. The subset selected by this algorithm is the global optimum of

Algorithm 3: Greedy algorithm for maximization of Q
Input: The sequence set size $N_t$.
Output: Estimated sequence set $\hat{\Phi}_k^{s+1}$.
1  $\hat{\Phi}_k^{s+1} \leftarrow \emptyset$   /* Initialization */
2  for $i = 1 \to N_t$ do   /* Add the next optimal $A \in 2^{\mathcal{A}} \setminus \hat{\Phi}_k^{s+1}$ */
3    $\hat{\Phi}_k^{s+1} \leftarrow \hat{\Phi}_k^{s+1} \cup \{\arg\max_{A \in 2^{\mathcal{A}} \setminus \hat{\Phi}_k^{s+1}} \Delta Q(A \mid \hat{\Phi}_k^{s+1})\}$
4  return $\hat{\Phi}_k^{s+1}$

the optimization problem defined in (4.25). In the next subsection we show the effect of this query selection strategy on the online performance of a standalone BCI in a letter-by-letter typing scenario.
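To make Algorithm 3 concrete for the singleton-trial case (RSVP and SCP), here is a small Python sketch under the modularity shown above: since $\Delta Q(\{v\} \mid \Phi) = \Pi^{s+1}(v)\log\lambda^+$ for a singleton trial $\{v\}$, the greedy loop reduces to querying the $N_t$ symbols with the largest prior mass. The function name and toy prior are illustrative assumptions.

import math

def greedy_query_selection(prior, log_lambda_plus, n_trials):
    # prior: dict mapping symbol -> Pi^{s+1}(symbol); returns a list of singleton trials.
    selected, remaining = [], dict(prior)
    for _ in range(n_trials):
        # Delta Q({v} | Phi) = prior[v] * log(lambda+) for every unselected singleton
        best = max(remaining, key=lambda v: remaining[v] * log_lambda_plus)
        selected.append({best})
        remaining.pop(best)
    return selected

# With a sharply peaked language-model prior, the most probable next letters are queried first.
prior = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
print(greedy_query_selection(prior, math.log(3.0), n_trials=2))   # [{'a'}, {'b'}]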


4.3.5 Experimental Results and Discussions

In this example, we employed a set of supervised data collected from 12 healthy participants following an IRB-approved protocol [1]. In this experiment we used 16 EEG electrode locations, namely Fp1, Fp2, F3, F4, Fz, Fc1, Fc2, Cz, P1, P2, C1, C2, Cp3, Cp4, P5 and P6, according to the International 10/20 configuration. The EEG was recorded at a sampling rate of 256 Hz using a g.USBamp bio-signal amplifier with active g.Butterfly electrodes. These data were collected on three separate days, one for each of the presentation paradigms RCP, SCP, and RSVP, at an ITI of 150 ms. The order of sessions was distributed uniformly among participants to exclude learning or frustration effects from the statistical analysis of the results. Each calibration session contains 100 sequences with 10 trials per sequence, among which one is the target. Prior to each sequence we show the target to the user, making the calibration task a supervised data collection session. These data sets were used to obtain class-conditional PDFs and other system parameters for each user and presentation paradigm combination. These PDFs were then used in Monte-Carlo simulations of the system in an online typing scenario.

20 Monte-Carlo simulations of the system were executed using samples drawn from the estimated class-conditional PDFs. In each simulation, the system types the missing words in 10 different phrases. Note that the prior suggested by the context information does not always help the user; for instance, the user might need to type a word that is not common in the English language. To account for such cases in our analysis we define 5 difficulty levels, with 1 the easiest and 5 the most difficult words to type. Lower levels consist of copying phrases whose letters are assigned high probabilities by the language model; as the level increases, the language model probabilities become increasingly adversarial, with level 3 neutral on average. The phrases in these simulations are selected uniformly across these five difficulty levels. Here we compare the simulated online performance of our system under the Active-RBSE framework against baseline methods that either use the entire vocabulary in every sequence or perform random stimulus subset selection.

The performance of the system is measured in terms of: (I) total typing duration (TTD) for typing 10 phrases, which is inversely proportional to typing speed; and (II) probability of phrase completion (PPC), which is a measure of typing accuracy. Next, we present the simulation results for the different presentation paradigms and for users with various BCI usage accuracy levels. These accuracy levels are represented by the area under the receiver operating characteristic curve (AUC) for each user. The AUC values are obtained through cross-validation over the supervised data collected in the calibration sessions.


4.3.5.1 RSVP Paradigm

To assess the effect of the proposed query set optimization on online system performance, we conducted two sets of Monte-Carlo simulations: (1) with random trial selection, and (2) with optimal query selection. Based on an earlier experimental study, the upper bound on the number of sequences in each epoch was set to $m_t = 8$ and the number of trials per sequence was selected as $k = 14$ [106, 1]. The scatter plot in Figure 4.7 shows the TTD for active RSVP (ARSVP) vs. the random RSVP

Figure 4.7: Scatter plot of average TTD in minutes from 20 Monte-Carlo simulations for RSVP and ARSVP [2].

paradigm. In this figure, the horizontal and vertical axes represent the TTD for random RSVP and ARSVP, respectively. The width and height of the box around each data point represent the standard deviation of the TTD from the 20 Monte-Carlo simulations in the corresponding dimensions. The figure shows that 9 out of 12 users benefit from optimal sequence selection and achieve higher typing speed. To quantify this result we used a Wilcoxon signed-rank test, which showed a statistically significant improvement with $P < 0.03$ at a significance level of $\alpha = 0.05$.


In these typing scenarios, we mark a phrase as incomplete if the system cannot type the correct phrase within a predefined duration or if more than five consecutive mistakes occur. Consequently, we define the PPC as the ratio of the number of completed phrases to the total number of phrases. The estimated PPCs from the simulation sets are presented in Figure 4.8. In this figure, the AUC values

Figure 4.8: Average probability of phrase completion with 90% confidence intervals for the RSVP and ARSVP paradigms [2].

(which measure the EEG classification performance for each user) are mapped on the x-axis and the PPCs are presented on the y-axis. The PPCs averaged over the 20 Monte-Carlo simulations are shown as green "∗" points for ARSVP and red "o" points for random RSVP. For each parameter set, a 90% interval of a beta distribution fitted to the PPCs is shown as error bars in the corresponding colors around the averaged PPCs.

These results suggest that the optimal query strategy improves typing accuracy, especially for AUCs in $[0.7, 0.9]$, a range that includes most users in the healthy population. Wilcoxon signed-rank test results ($P < 0.003$) applied to the averaged PPCs from both conditions show a significant improvement in typing accuracy due to optimizing the query set of each sequence according to the proposed objective function.


4.3.5.2 Matrix-based Presentation Paradigm with Overlapping Trials

In contrast to RSVP and SCP, trials in RCP are not constrained to be singletons. This can potentially lead to higher typing speed. Here we define a more relaxed search space for the optimization problem at hand.

Define a function $c : 2^{\mathcal{A}} \to \{0, 1\}^{|\mathcal{A}|}$ where $c(A_i) = [\mathbb{1}\{v_1 \in A_i\}, \cdots, \mathbb{1}\{v_{|\mathcal{A}|} \in A_i\}]^T$. Then define a $|\mathcal{A}| \times k$ binary code matrix as $C = [c(A_1), \cdots, c(A_k)]$. Each row of this code matrix $C$ associates a codeword of zeros and ones with a state space element, i.e. a symbol in the vocabulary. In this setup the length of the codewords equals the number of trials in a sequence, and the ones in each row mark the trials in which the corresponding character is flashed.

In particular for RCP, the trials overlap with $|A_i \cap A_j| \le 1$ for all $i, j \in \{1, \cdots, |\Phi_k^{s+1}|\}$, $i \neq j$. Under these assumptions, for an $R \times C$ matrix of characters, the RCP paradigm offers unique codewords of length $R + C$ with two nonzero elements. The visual presentation component of the system used in this example utilizes a $4 \times 7$ background matrix; hence the length of the codewords in RCP is $4 + 7 = 11$.
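As a small sketch of the code matrix just described, the snippet below builds the 28 x 11 RCP code matrix for a 4 x 7 symbol grid, where each codeword has exactly two ones (the symbol's row flash and its column flash); the function name and layout are assumptions made only for illustration.

import numpy as np

def rcp_code_matrix(R=4, C=7):
    # Rows index the symbols of the R x C grid; columns index the R row trials then the C column trials.
    code = np.zeros((R * C, R + C), dtype=int)
    for idx in range(R * C):
        r, c = divmod(idx, C)
        code[idx, r] = 1          # the trial that flashes this symbol's row
        code[idx, R + c] = 1      # the trial that flashes this symbol's column
    return code

code = rcp_code_matrix()
assert code.shape == (28, 11) and (code.sum(axis=1) == 2).all()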

In such an RCP setup, we propose to define $C$ such that each letter is assigned a unique codeword. For application in ERP-based BCIs, we need to impose certain constraints on the search space to control the frequency of letter presentation in each sequence, so that an ERP is induced in response to the intended symbol. It is important to note, however, that in contrast to RSVP-based paradigms, matrix-based presentation paradigms can also benefit from visual evoked potentials (VEPs). Hence, we can relax the constraints and allow more frequent flashes of the same character in each sequence [78]. This can improve typing speed by reducing the sequence length. More specifically, we define the feasible set such that there exists a unique codeword for each symbol while each symbol is presented with a frequency of less than 0.5 in each sequence.

Based on these propositions we set the codeword length to 6, to obtain at least as many codewords as the size of our vocabulary using codewords with 3 or fewer nonzero elements. With this number of trials we can produce $\binom{6}{3} + \binom{6}{2} + \binom{6}{1} = 20 + 15 + 6 = 41$ unique codewords to be assigned to the characters. The scatter plot of TTD for standard RCP (using 11 flashes corresponding to all the rows and columns in each sequence) and the active learning-based presentation (ALP) paradigm is presented in Figure 4.9. This plot suggests that every user can achieve a shorter TTD with the ALP paradigm. The effect is clearer for the lower AUCs, represented by the points towards the center of the figure. The statistical hypothesis testing result also confirms a significant improvement ($P < 0.0005$). On the other hand, as shown in Figure 4.10, the improvement in the PPCs is not as pronounced. Note that in the RCP paradigm the EEG classification AUCs are generally high, which leads to high PPCs (above 95%) even without sequence optimization. Thus, as expected,


Figure 4.9: Scatter plot of total typing duration of 10 phrases in terms of minutes for RCP and ALP.

we do not observe a statistically significant improvement in the PPCs ($P > 0.75$). In sum, from the results presented in Figures 4.9 and 4.10, we infer that ALP can significantly reduce the TTD while preserving the PPC.

We should mention that the proposed monotone modular lower bound on the approximated objective function is not a good objective for obtaining query sets with overlapping trials, as it omits the effect of the normalization factor in the predicted posterior PMF. This problem can be mitigated by employing tighter bounds on the objective and changing the optimization approach accordingly.

4.3.5.3 Matrix-based Presentation Paradigm with Single Character Trials

For the RSVP paradigm, it has been shown that the best typing performance is achieved when not all letters but a subset of the vocabulary is presented in each sequence [106]. Similar to RSVP, in SCP each trial consists of a single letter and each letter is presented at most once in each sequence.


Figure 4.10: Average probability of phrase completion with 90% confidence intervals for RCP and ALP.

Accordingly, we use the results of an RSVP study to propose that the best typing performance for SCP can be achieved with sequences of length 14 [106]. Hence, we constrain the search space of our objective function to sequence sets of length $k = 14$ while each trial is a singleton, i.e. $|A_i| = 1$. We compare our method to a typical SCP paradigm in which all the vocabulary elements are presented in random order in every sequence.

The results are summarized in Figures 4.11 and 4.12. The scatter plot of TTD is shown in Figure 4.11, where the horizontal axis shows the TTD for the standard SCP paradigm and the vertical axis shows the TTD when the active SCP (ASCP) paradigm is used. The results suggest that the typing speed of the SCP paradigm can be significantly improved when optimized sequence sets of length 14 are used ($P < 0.01$). Moreover, as shown in Figure 4.12, users achieve significantly improved ($P < 0.008$) typing accuracy when the ASCP paradigm is used rather than the standard SCP paradigm that presents all the symbols in the vocabulary in every sequence.


Figure 4.11: Scatter plot of TTD of 10 phrases in terms of minutes for SCP and ASCP.


Figure 4.12: Average probability of phrase completion with 90% confidence intervals for SCP and ASCP.


Conclusion

5.1 Work accomplished

People with severe speech and muscle impairments need alternative and augmentative solutions for communication and environmental control. In the most severe conditions, such as the late stages of ALS, assistive technologies that rely on some level of muscle control fail to be effective.

Brain computer interfaces (BCIs) have proved to be effective, at least for a portion of the target population, in the lab environment. In particular, noninvasive EEG-based BCIs are considered safe and portable assistive technologies. A class of EEG-based BCIs utilizes event related potentials (ERPs) to detect user intent. These BCIs have received considerable attention for communication, the most important and vital need of the target population. The P300 matrix speller is a well-known example of an ERP-based BCI for letter-by-letter typing. In this system, characters are arranged in a 6 × 6 matrix on the screen, and the rows and columns of this matrix are flashed rapidly in random order to induce an ERP in response to the user's intent. It has been indicated that this system offers good performance only for users with overt attention, i.e. with gaze control. Consequently, presentation techniques such as rapid serial visual presentation (RSVP) are employed in systems such as RSVPKeyboard™ to reduce the level of gaze dependence.

In this manuscript, we introduced an ERP-based BCI that can employ a wide range of matrix-based presentation paradigms while also offering RSVP as an option. We showed that for healthy users there is no statistically significant benefit in using matrix-based presentation paradigms, such as the row-and-column or single character presentation paradigms, rather than the RSVP paradigm. Moreover, we demonstrated the effect of language model fusion in improving


typing accuracy and speed. The main causes of performance degradation in EEG-based BCIs are the low signal-to-noise ratio (SNR) of EEG and the scarcity of labeled data for system parameter optimization. To mitigate this problem, we proposed a signal model that leads to a significant reduction in the number of parameters. This method reduces estimation variance while introducing estimation bias. However, the system performance did not improve in a statistically meaningful manner.

To enhance the system performance (typing speed and accuracy), we employed the active learning concept to optimize the stimulus set. We defined an objective function that exploits the observed evidence for faster decision making, and then derived a lower bound on the proposed NP-hard set optimization problem which is a monotone modular set function and can be maximized efficiently. This approach improved system performance significantly; the largest improvement occurs when the trials in each sequence do not overlap. Adding this stimulus optimization to the inference engine of our system led to a framework which we call active-RBSE.

Despite all efforts to improve system performance, the inference algorithm sometimes leads to an incorrect decision. In our system, a backspace symbol is offered to the user for deleting mistyped symbols. However, based on the workflow of the system, each error correction needs two back-to-back correct selections. Hence, error corrections can lead to long cycles, especially when the classifier is not very accurate.

We proposed to mitigate this problem by fusing the probability of user agreement with the system decision into the inference. We estimate the likelihood of user confirmation or disagreement from error-related potentials (ErrPs), which are induced in response to the system's wrong decisions. Experimental results showed a significant performance improvement when ErrP evidence was used in the inference mechanism.

5.2 Suggested future works

In this section, we suggest some ideas for improving the current system performance.

∙ The current design of our system can employ different stimulation techniques to cue the user for her/his intent. One possible way to enhance performance is to use hybrid stimulation paradigms; for example, frequency-based responses (such as SSVEP) and ERPs can be induced simultaneously in response to the user's intent.

∙ The decision making component of our system is designed based on a naive independence assumption among the EEG feature vectors of different trials, even when the trials overlap. A possible improvement is to use a more realistic signal model that exploits this dependency information to gain accuracy.


∙ Due to the nature of the RSVP paradigm, it is possible that the user confuses similar characters and generates an ERP in response to an undesired symbol. It might be possible to learn a matrix of confusion probabilities and use it during inference.

∙ The sequence optimization method for active RBSE is currently optimal only for presentation paradigms in which the trials of a sequence do not share symbols. To solve the general problem, a different technique is necessary when the trials are not mutually exclusive.

∙ Currently, the prospective candidate for inducing an ErrP is selected based on maximizing the information transfer rate by reducing the posterior entropy. One could instead frame the problem in a reinforcement learning setup and use different objective functions to obtain a better performance metric.


Appendix A

Error-Related Potentials for EEG-based

Typing Systems

Paula Gonzalez-Navarro1,Mohammad Moghadamfalahi1, Student Member, IEEE,

Murat Akcakaya2, Member, IEEE,and Deniz Erdogmus1, Senior Member, IEEE

1Northeastern University, Boston, MA 02115,2University of Pittsburgh, Pittsburgh, PA 15260,

E-mails: {gonzaleznavarro,moghadam,erdogmus}@ece.neu.edu,[email protected]

Phone: +1-617-3733021

A.1 Introduction:

Event related potential (ERP)-based typing systems can provide a means of communication for people with severe neuromuscular impairments. During visual presentation of the letters of the alphabet, detection of ERPs in the EEG corresponding to a target stimulus can be used to detect user intent. However, EEG has a very low signal-to-noise ratio, and making confident decisions becomes

This work was supported by NIH grant R01DC009834 and NSF grants CNS-1136027, IIS-1149570, SMA-0835976. The package including the code and data associated with this paper can be found at "https://repository.lib.neu.edu/collections/neu:rx913r029".


a challenge. To increase accuracy, repeated stimuli are normally used to detect the user intent, which decreases the typing speed. In addition, a backspace symbol is incorporated for error correction; however, this also considerably decreases the speed since it requires at least two selective actions: correctly detecting the backspace symbol and reselecting the intended symbol. Alternatively, we propose to use the detection of error related potentials (ErrP) in the EEG response and propose different probabilistic approaches to incorporate ErrP evidence in decision making and auto-correction. With simulations on prerecorded real EEG calibration data using our BCI typing system RSVPKeyboard™ [1], we show that our auto-correction method can improve typing speed without sacrificing accuracy.

A.2 Materials:

RSVP Keyboard™ currently uses the g.USBamp biosignal amplifier from g.Tec (Graz, Austria) together with its EEG cap and electrodes: the g.GAMMAcap with g.Butterfly active electrodes. The software implementation uses Matlab.

A.3 Method:

Sequences of symbols are presented to the user, who is asked to pay attention to the desired symbol to be typed. The EEG signal collected during the visual stimulation is employed in a decision-making procedure that uses maximum a posteriori (MAP) inference. During this procedure, the context information from a language model (LM) is probabilistically fused with the EEG evidence. After each sequence, the posterior probability of every symbol is computed. Sequences are repeated until a predefined confidence threshold or a user-defined limit on the maximum number of sequences (NMAX) is reached. We call this procedure an epoch, and at the end of an epoch the most likely symbol is typed. An ErrP response should be elicited if the typed symbol is different from the one the user intended. Since we would like to benefit from ErrP evidence before the system commits to a symbol, we introduced the concept of a prospective symbol, which presents the most likely symbol at the end of a sequence once this symbol becomes sufficiently likely. This prospective symbol presentation elicits an ErrP response, and the posterior probabilities of all symbols are updated taking this additional piece of evidence into account. Type Prospective Symbol: If NMAX is reached, or the ErrP-informed posterior probability distribution maintains the prospective symbol as the most likely in the alphabet and the revised posterior of this top candidate exceeds the required confidence threshold for typing, this symbol is typed. Do Not Type


Prospective Symbol Yet: If the typing conditions are not met for the current prospective symbol, it is not typed. We simulated two strategies under this outcome: [Case 1] discard all evidence in the current epoch and restart the epoch; [Case 2] keep all evidence and continue the epoch by presenting a new sequence of symbols. The first case is analogous to using an ErrP classifier to approve or delete a typed symbol after a hard decision is made.
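A hedged sketch of the ErrP-informed posterior update described above: after the prospective symbol is shown, the posterior over symbols is re-weighted by the likelihood of the recorded EEG under "no error" (if the user's intent equals the prospective symbol) versus "error" (otherwise). The likelihood values, names, and toy numbers are illustrative assumptions, not the RSVPKeyboard™ implementation.

import numpy as np

def errp_update(posterior, prospective_idx, lik_no_errp, lik_errp):
    # posterior: PMF over symbols after ERP/LM fusion; returns the revised PMF.
    likelihood = np.full(len(posterior), lik_errp)   # if intent differs, the display is an error
    likelihood[prospective_idx] = lik_no_errp        # if intent matches, no ErrP is expected
    revised = posterior * likelihood
    return revised / revised.sum()

posterior = np.array([0.55, 0.25, 0.20])             # prospective symbol is index 0
print(errp_update(posterior, 0, lik_no_errp=0.8, lik_errp=0.3))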

A.4 Results:

We performed 25 Monte Carlo simulations of a task that involves copying 10 predetermined phrases, with synthetic EEG evidence generated from models obtained using existing EEG calibration data from two users with different EEG accuracy levels for the ERP and ErrP features. In these simulations, the maximum number of sequences was set to NMAX = 8. The estimated average time to successfully complete the task (Test) for the two proposed strategies, Case 1 and Case 2, as well as the baseline ERP-based inference (Case 0; no ErrP evidence or prospective symbol presentation), is obtained from the simulations. The results are summarized in Figure A.1.

Figure A.1: Average time to successfully complete the typing task for two users.

As expected, Case 2 performs best, and both proposed strategies with ErrP evidence outperform the baseline that did not use ErrP.


A.5 Discussion & Significance:

We will test these strategies extensively in human-in-the-loop real-time typing experiments and report results in a journal publication. Appropriate use of additional ErrP evidence will improve BCI performance.

References

[1] M. Moghadamfalahi, U. Orhan, M. Akcakaya, H. Nezamfar, M. Fried-Oken, and D. Er-dogmus, “Language-model assisted brain computer interface for typing: A comparisonof matrix and rapid serial visual presentation,” IEEE Transactions on Neural Systems andRehabilitation Engineering, vol. PP, no. 99, pp. 1–1, 2015.

[2] M. Moghadamfalahi, J. Sourati, M. Akcakaya, H. Nezamfar, M. Haghighi, and D. Erdogmus,“Active learning for efficient querying from a human oracle with noisy response in a language-model assisted brain computer interface,” in 2015 IEEE 25th International Workshop onMachine Learning for Signal Processing (MLSP). IEEE, 2015, pp. 1–6.

[3] M. Akcakaya, B. Peters, M. Moghadamfalahi, A. Mooney, U. Orhan, B. Oken, D. Erdogmus,and M. Fried-Oken, “Noninvasive brain computer interfaces for augmentative and alternativecommunication,” Biomedical Engineering, IEEE Reviews in, vol. 7, no. 1, pp. 31–49, 2014.

[4] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain–computer interfaces for communication and control,” Clinical neurophysiology, vol. 113,no. 6, pp. 767–791, 2002.

[5] N. Weiskopf, R. Veit, M. Erb, K. Mathiak, W. Grodd, R. Goebel, and N. Birbaumer, “Physio-logical self-regulation of regional brain activity using real-time functional magnetic resonanceimaging (fmri): methodology and exemplary data,” Neuroimage, vol. 19, no. 3, pp. 577–586,2003.

[6] S. Waldert, H. Preissl, E. Demandt, C. Braun, N. Birbaumer, A. Aertsen, and C. Mehring,“Hand movement direction decoded from meg and eeg,” The Journal of neuroscience, vol. 28,no. 4, pp. 1000–1008, 2008.

[7] S. Coyle, T. Ward, C. Markham, and G. McDarby, “On the suitability of near-infrared (nir)systems for next-generation brain–computer interfaces,” Physiological measurement, vol. 25,no. 4, p. 815, 2004.

[8] S. Fager, D. R. Beukelman, M. Fried-Oken, T. Jakobs, and J. Baker, “Access interfacestrategies,” Assistive Technology, vol. 24, no. 1, pp. 25–33, 2012.

[9] S. Sutton, M. Braren, J. Zubin, and E. John, “Evoked-potential correlates of stimulusuncertainty,” Science, vol. 150, no. 3700, pp. 1187–1188, 1965.


[10] L. Farwell and E. Donchin, “Talking off the top of your head: Toward a mental prosthesis uti-lizing event-related brain potentials,” Electroencephalography and clinical Neurophysiology,vol. 70, pp. 510–523, 1988.

[11] E. Donchin, K. M. Spencer, and R. Wijesinghe, “The mental prosthesis: assessing the speedof a p300-based brain-computer interface,” IEEE transactions on rehabilitation engineering,vol. 8, no. 2, pp. 174–179, 2000.

[12] L. Bianchi, S. Sami, A. Hillebrand, I. P. Fawcett, L. R. Quitadamo, and S. Seri, “Whichphysiological components are more suitable for visual erp based brain–computer interface? apreliminary meg/eeg study,” Brain topography, vol. 23, no. 2, pp. 180–185, 2010.

[13] V. Bostanov, “Bci competition 2003-data sets ib and iib: feature extraction from event-relatedbrain potentials with the continuous wavelet transform and the t-value scalogram,” IEEETransactions on Biomedical engineering, vol. 51, no. 6, pp. 1057–1061, 2004.

[14] H. Cecotti, B. Rivet, M. Congedo, C. Jutten, O. Bertrand, E. Maby, and J. Mattout, “A robustsensor-selection method for p300 brain–computer interfaces,” Journal of neural engineering,vol. 8, no. 1, p. 016001, 2011.

[15] A. Combaz, N. V. Manyakov, N. Chumerin, J. A. Suykens, and M. M. Van Hulle, “Featureextraction and classification of eeg signals for rapid p300 mind spelling,” in Machine Learningand Applications, 2009. ICMLA’09. International Conference on. IEEE, 2009, pp. 386–391.

[16] A. Combaz, N. Chumerin, N. V. Manyakov, A. Robben, J. A. Suykens, and M. M. Van Hulle,“Error-related potential recorded by eeg in the context of a p300 mind speller brain-computerinterface,” in 2010 IEEE International Workshop on Machine Learning for Signal Processing.IEEE, 2010, pp. 65–70.

[17] ——, “Towards the detection of error-related potentials and its integration in the context of ap300 speller brain–computer interface,” Neurocomputing, vol. 80, pp. 73–82, 2012.

[18] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter, “Bci competition2003-data set iib: support vector machines for the p300 speller paradigm,” IEEE Transactionson Biomedical Engineering, vol. 51, no. 6, pp. 1073–1076, 2004.

[19] P.-J. Kindermans, D. Verstraeten, and B. Schrauwen, “A bayesian model for exploitingapplication constraints to enable unsupervised training of a p300-based bci,” PloS one, vol. 7,no. 4, p. e33758, 2012.

[20] P.-J. Kindermans, H. Verschore, D. Verstraeten, and B. Schrauwen, “A p300 bci for themasses: Prior information enables instant unsupervised spelling,” in Advances in NeuralInformation Processing Systems, 2012, pp. 710–718.

[21] D. J. Krusienski, E. W. Sellers, F. Cabestaing, S. Bayoudh, D. J. McFarland, T. M. Vaughan,and J. R. Wolpaw, “A comparison of classification techniques for the p300 speller,” Journalof neural engineering, vol. 3, no. 4, p. 299, 2006.

[22] D. J. Krusienski, E. W. Sellers, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, “Towardenhanced p300 speller performance,” Journal of neuroscience methods, vol. 167, no. 1, pp.


15–21, 2008.

[23] Y. Li, C. Guan, H. Li, and Z. Chin, "A self-training semi-supervised svm algorithm and its application in an eeg-based brain computer interface speller system," Pattern Recognition Letters, vol. 29, no. 9, pp. 1285–1294, 2008.

[24] D. J. McFarland, W. A. Sarnacki, and J. R. Wolpaw, “Should the parameters of a bcitranslation algorithm be continually adapted?” Journal of neuroscience methods, vol. 199,no. 1, pp. 103–107, 2011.

[25] R. C. Panicker, S. Puthusserypady, and Y. Sun, “Adaptation in p300 brain–computer inter-faces: A two-classifier cotraining approach,” IEEE Transactions on Biomedical Engineering,vol. 57, no. 12, pp. 2927–2935, 2010.

[26] A. Rakotomamonjy, V. Guigue, G. Mallet, and V. Alvarado, “Ensemble of svms for improvingbrain computer interface p300 speller performances,” in International conference on artificialneural networks. Springer, 2005, pp. 45–50.

[27] A. Rakotomamonjy and V. Guigue, “Bci competition iii: dataset ii-ensemble of svms for bcip300 speller,” IEEE transactions on biomedical engineering, vol. 55, no. 3, pp. 1147–1154,2008.

[28] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, “xdawn algorithm to enhance evokedpotentials: application to brain–computer interface,” IEEE Transactions on BiomedicalEngineering, vol. 56, no. 8, pp. 2035–2043, 2009.

[29] D. B. Ryan, G. Frye, G. Townsend, D. Berry, S. Mesa-G, N. A. Gates, and E. W. Sellers,“Predictive spelling with a p300-based brain–computer interface: Increasing the rate ofcommunication,” Intl. Journal of Human–Computer Interaction, vol. 27, no. 1, pp. 69–84,2010.

[30] H. Serby, E. Yom-Tov, and G. F. Inbar, “An improved p300-based brain-computer interface,”IEEE Transactions on neural systems and rehabilitation engineering, vol. 13, no. 1, pp.89–98, 2005.

[31] Y. Shahriari and A. Erfanian, “Improving the performance of p300-based brain–computerinterface through subspace-based filtering,” Neurocomputing, vol. 121, pp. 434–441, 2013.

[32] W. Speier, C. Arnold, J. Lu, R. K. Taira, and N. Pouratian, “Natural language processingwith dynamic classification improves p300 speller accuracy and bit rate,” Journal of neuralengineering, vol. 9, no. 1, p. 016004, 2012.

[33] M. Spüler, W. Rosenstiel, and M. Bogdan, “Online adaptation of a c-vep brain-computerinterface (bci) based on error-related potentials and unsupervised learning,” PloS one, vol. 7,no. 12, p. e51077, 2012.

[34] M. Thulasidas, C. Guan, and J. Wu, “Robust classification of eeg signal for brain-computerinterface,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 14,no. 1, p. 24, 2006.

[35] M. S. Treder and B. Blankertz, “(C)overt attention and visual speller design in an ERP-based


brain-computer interface.” Behavioral and brain functions : BBF, vol. 6, no. 1, p. 28, Jan.2010. [Online]. Available: http://www.behavioralandbrainfunctions.com/content/6/1/28

[36] L. Acqualagna, M. S. Treder, M. Schreuder, and B. Blankertz, “A novel brain-computerinterface based on the rapid serial visual presentation paradigm,” in Engineering in Medicineand Biology Society (EMBC), 2010 Annual International Conference of the IEEE. IEEE,2010, pp. 2686–2689.

[37] L. Acqualagna and B. Blankertz, “Gaze-independent bci-spelling using rapid serial visualpresentation (rsvp),” Clinical Neurophysiology, vol. 124, no. 5, pp. 901–908, 2013.

[38] U. Orhan, K. E. Hild, D. Erdogmus, B. Roark, B. Oken, and M. Fried-Oken, “Rsvp keyboard:An eeg based typing interface,” Acoustics, Speech and Signal Processing (ICASSP), 2012IEEE International Conference on, pp. 645 – 648, 2012.

[39] U. Orhan, D. Erdogmus, B. Roark, B. Oken, S. Purwar, K. E. Hild, A. Fowler, and M. Fried-Oken, “Improved accuracy using recursive bayesian estimation based language model fusionin erp-based bci typing systems,” in Engineering in Medicine and Biology Society (EMBC),2012 Annual International Conference of the IEEE. IEEE, 2012, pp. 2497–2500.

[40] U. Orhan, D. Erdogmus, B. Roark, B. Oken, and M. Fried-Oken, “Offline analysis of contextcontribution to erp-based typing bci performance,” Journal of neural engineering, vol. 10,no. 6, p. 066003, 2013.

[41] M. S. Treder, N. M. Schmidt, and B. Blankertz, “Gaze-independent brain–computer interfacesbased on covert attention and feature attention,” Journal of neural engineering, vol. 8, no. 6,p. 066003, 2011.

[42] P. Aricò, F. Aloise, F. Schettini, A. Riccio, S. Salinari, F. Babiloni, D. Mattia, and F. Cin-cotti, “Geospell: an alternative p300-based speller interface towards no eye gaze required,”International Journal of Bioelectromagnetism, vol. 13, no. 3, pp. 152–153, 2011.

[43] F. Schettini, F. Aloise, P. Arico, S. Salinari, D. Mattia, and F. Cincotti, “Control or no-control?reducing the gap between brain-computer interface and classical input devices,” in 2012Annual International Conference of the IEEE Engineering in Medicine and Biology Society.IEEE, 2012, pp. 1815–1818.

[44] Y. Liu, Z. Zhou, and D. Hu, “Gaze independent brain–computer speller with covert visualsearch tasks,” Clinical Neurophysiology, vol. 122, no. 6, pp. 1127–1136, 2011.

[45] U. Hoffmann, J.-M. Vesin, T. Ebrahimi, and K. Diserens, “An efficient p300-based brain–computer interface for disabled subjects,” Journal of Neuroscience methods, vol. 167, no. 1,pp. 115–125, 2008.

[46] J. D. Bayliss, S. A. Inverso, and A. Tentler, “Changing the p300 brain computer interface,”CyberPsychology & Behavior, vol. 7, no. 6, pp. 694–704, 2004.

[47] S. Blain-Moraes, R. Schaff, K. L. Gruis, J. E. Huggins, and P. A. Wren, “Barriers to andmediators of brain–computer interface user acceptance: focus group findings,” Ergonomics,vol. 55, no. 5, pp. 516–525, 2012.


[48] M. Marchetti, F. Piccione, S. Silvoni, and K. Priftis, “Exogenous and endogenous orientingof visuospatial attention in p300-guided brain computer interfaces: A pilot study on healthyparticipants,” Clinical Neurophysiology, vol. 123, no. 4, pp. 774–779, 2012.

[49] M. Marchetti, F. Piccione, S. Silvoni, L. Gamberini, and K. Priftis, “Covert visuospatialattention orienting in a brain-computer interface for amyotrophic lateral sclerosis patients,”Neurorehabilitation and neural repair, p. 1545968312471903, 2013.

[50] M. Marchetti, F. Onorati, M. Matteucci, L. Mainardi, F. Piccione, S. Silvoni, and K. Priftis,“Improving the efficacy of erp-based bcis using different modalities of covert visuospatialattention and a genetic algorithm-based classifier,” PloS one, vol. 8, no. 1, p. e53946, 2013.

[51] L. Mayaud, S. Filipe, L. Petegnief, O. Rochecouste, and M. Congedo, “Robust brain-computer interface for virtual keyboard (robik): project results,” IRBM, vol. 34, no. 2, pp.131–138, 2013.

[52] T. Kaufmann, S. Völker, L. Gunesch, and A. Kübler, “Spelling is just a click away–auser-centered brain–computer interface including auto-calibration and predictive text entry,”Frontiers in neuroscience, vol. 6, 2012.

[53] J. Jin, B. Z. Allison, E. W. Sellers, C. Brunner, P. Horki, X. Wang, and C. Neuper, “Optimizedstimulus presentation patterns for an event-related potential EEG-based brain–computerinterface,” Medical & biological engineering & computing, vol. 49, no. 2, pp. 181–191, 2011.

[54] M. van der Waal, M. Severens, J. Geuze, and P. Desain, “Introducing the tactile speller: anerp-based brain–computer interface for communication,” Journal of Neural Engineering,vol. 9, no. 4, p. 045002, 2012.

[55] N. Lu and D. Zimmerman, “On the Likelihood-based Inference for a Separable CovarianceMatrix,” Department Of Statistics and Actuarial Science, Univ. of Iowa, Iowa City, Iowa,Tech. Rep. 337, 2004.

[56] E. Alpaydin, Introduction to machine learning. MIT press, 2014.

[57] C. C. Bishop and N. Nasrabadi, Pattern recognition and machine learning. Springer, 2006.

[58] J. H. Friedman, "Regularized Discriminant Analysis," Journal of the American Statistical Association, vol. 84, no. 405, pp. 165–175, 1989.

[59] B. Z. Allison, J. Pineda et al., "ERPs evoked by different matrix sizes: implications for a brain computer interface (BCI) system," Neural Systems and Rehabilitation Engineering, IEEE Transactions on, vol. 11, no. 2, pp. 110–113, 2003.

[60] E. W. Sellers and E. Donchin, “A p300-based brain–computer interface: initial tests by alspatients,” Clinical neurophysiology, vol. 117, no. 3, pp. 538–548, 2006.

[61] R. Fazel-Rezai and K. Abhari, “A comparison between a matrix-based and a region-basedp300 speller paradigms for brain-computer interface,” in Engineering in Medicine andBiology Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE.IEEE, 2008, pp. 1147–1150.

[62] N. Hill, T.Navin.Lal, K.Bierig, N.Birbaumer, and B.Cholkopf, “An Auditory Paradigm for


Brain-Computer Interfaces.” Advance in Neural Information Processing Systems., vol. 17,pp. 569–576, 2005.

[63] J. Jin, E. W. Sellers, S. Zhou, Y. Zhang, X. Wang, and A. Cichocki, “A p300 brain–computerinterface based on a modification of the mismatch negativity paradigm,” International journalof neural systems, vol. 25, no. 03, p. 1550011, 2015.

[64] G. Townsend, B. LaPallo, C. Boulay, D. Krusienski, G. Frye, C. Hauser, N. Schwartz,T. Vaughan, J. Wolpaw, and E. Sellers, “A novel P300-based brain–computer interface stimu-lus presentation paradigm: moving beyond rows and columns,” Clinical Neurophysiology,vol. 121, no. 7, pp. 1109–1120, 2010.

[65] G. Townsend, J. Shanahan, D. B. Ryan, and E. W. Sellers, “A general P300 brain–computerinterface presentation paradigm based on performance guided constraints,” Neuroscienceletters, vol. 531, no. 2, pp. 63–68, 2012.

[66] J. Höhne, M. Schreuder, B. Blankertz, and M. Tangermann, “A novel 9-class auditory ERPparadigm driving a predictive text entry system,” Frontiers in Neuroscience, vol. 5, no. AUG,pp. 1–10, 2011.

[67] K. Takano, T. Komatsu, N. Hata, Y. Nakajima, and K. Kansaku, “Visual stimuli for thep300 brain–computer interface: a comparison of white/gray and green/blue flicker matrices,”Clinical neurophysiology, vol. 120, no. 8, pp. 1562–1566, 2009.

[68] S. Lee and H.-S. Lim, “Predicting text entry for brain-computer interface,” in Future Infor-mation Technology. Springer, 2011, pp. 309–312.

[69] U. Orhan, D. Erdogmus, K. E. Hild, B. Roark, B. Oken, and M. Fried-Oken, “Contextinformation significantly improves brain computer interface performance-a case study ontext entry using a language model assisted bci,” in Asilomar Conference on Signals, Systemsand Computers (ASILOMAR). IEEE, 2011, pp. 132–136.

[70] E. Samizo, T. Yoshikawa, and T. Furuhashi, “A study on application of rb-arq consideringprobability of occurrence and transition probability for p300 speller,” in Foundations ofAugmented Cognition. Springer, 2013, pp. 727–733.

[71] C. Ulas and M. Cetin, “Incorporation of a language model into a brain computer interfacebased speller through hmms,” in Acoustics, Speech and Signal Processing (ICASSP), 2013IEEE International Conference on. IEEE, 2013, pp. 1138–1142.

[72] E. E. Sutter, “The brain response interface: Communication through visually-inducedelectrical brain responses,” J. Microcomput. Appl., vol. 15, no. 1, pp. 31–45, Jan. 1992.[Online]. Available: http://dx.doi.org/10.1016/0745-7138(92)90045-7

[73] S. Amiri, R. Fazel-Rezai, and V. Asadpour, “A review of hybrid brain-computer interface systems,” Advances in Human-Computer Interaction, vol. 2013, p. 1, 2013.

[74] C.-T. Lin, L.-W. Ko, M.-H. Chang, J.-R. Duann, J.-Y. Chen, T.-P. Su, and T.-P. Jung, “Review of wireless and wearable electroencephalogram systems and brain-computer interfaces - a mini-review,” Gerontology, vol. 56, no. 1, pp. 112–119, 2010.


[75] S. Moghimi, A. Kushki, A. M. Guerguerian, and T. Chau, “A review of EEG-based brain-computer interfaces as access pathways for individuals with severe disabilities,” Assistive Technology: The Official Journal of RESNA, vol. 25, no. 2, pp. 99–110, 2012.

[76] E. Sellers, G. Schalk, and E. Donchin, “The P300 as a typing tool: tests of brain computer interface with an ALS patient,” Psychophysiology, vol. 40, p. 77, 2003.

[77] C. Guan, M. Thulasidas, and J. Wu, “High performance P300 speller for brain-computer interface,” in Biomedical Circuits and Systems, 2004 IEEE International Workshop on. IEEE, 2004, pp. S3–5.

[78] S. Chennu, A. Alsufyani, M. Filetti, A. M. Owen, and H. Bowman, “The cost of space independence in P300-BCI spellers,” Journal of NeuroEngineering and Rehabilitation, vol. 10, no. 82, pp. 1–13, 2013.

[79] Y. Huang, “Event-related potentials in electroencephalography: Characteristics and single-trial detection for rapid object search,” Ph.D. dissertation, Oregon Health & Science University, Portland, Oregon, USA, 2010.

[80] J. H. Friedman, “Regularized discriminant analysis,” Journal of the American Statistical Association, vol. 84, no. 405, pp. 165–175, 1989.

[81] B. W. Silverman, Density Estimation for Statistics and Data Analysis. CRC Press, 1986, vol. 26.

[82] B. Roark, J. D. Villiers, C. Gibbons, and M. Fried-Oken, “Scanning methods and language modeling for binary switch typing,” in Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies. Association for Computational Linguistics, 2010, pp. 28–36.

[83] B. S. Oken, U. Orhan, B. Roark, D. Erdogmus, A. Fowler, A. Mooney, B. Peters, M. Miller, and M. B. Fried-Oken, “Brain–computer interface with language model–electroencephalography fusion for locked-in syndrome,” Neurorehabilitation and Neural Repair, p. 1545968313516867, 2013.

[84] E. W. Sellers, D. J. Krusienski, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, “A P300 event-related potential brain–computer interface (BCI): the effects of matrix size and interstimulus interval on performance,” Biological Psychology, vol. 73, no. 3, pp. 242–252, 2006.

[85] H. Nezamfar, U. Orhan, D. Erdogmus, K. Hild, S. Purwar, B. Oken, and M. Fried-Oken, “On visually evoked potentials in EEG induced by multiple pseudorandom binary sequences for brain computer interface design,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, May 2011, pp. 2044–2047.

[86] M. Xu, H. Qi, B. Wan, T. Yin, Z. Liu, and D. Ming, “A hybrid BCI speller paradigm combining P300 potential and the SSVEP blocking feature,” Journal of Neural Engineering, vol. 10, no. 2, p. 026001, 2013.

[87] B. Properties, “Gaussian processes,” Encyclopedia of Environmetrics, Section on Stochastic Modeling and Environmental Change, 2001.


[88] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.

[89] F. J. Massey Jr., “The Kolmogorov-Smirnov test for goodness of fit,” Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, 1951.

[90] D. B. Ryan, G. E. Frye, G. Townsend, D. R. Berry, S. Mesa-G, N. A. Gates, and E. W. Sellers, “Predictive spelling with a P300-based brain–computer interface: Increasing the rate of communication,” International Journal of Human-Computer Interaction, vol. 27, no. 1, pp. 69–84, 2010.

[91] B. Blankertz, F. Losch, M. Krauledat, G. Dornhege, G. Curio, and K.-R. Müller, “The Berlin brain–computer interface: Accurate performance from first-session in BCI-naïve subjects,” Biomedical Engineering, IEEE Transactions on, vol. 55, no. 10, pp. 2452–2462, Oct 2008.

[92] M. Moghadamfalahi, U. Orhan, M. Akcakaya, and D. Erdogmus, “Bayesian Priors for Classifier Design in RSVP Keyboard,” 2013.

[93] M. S. Treder and B. Blankertz, “(C)overt attention and visual speller design in an ERP-based brain-computer interface,” Behavioral and Brain Functions, vol. 6, no. 1, p. 28, Jan. 2010. [Online]. Available: http://www.behavioralandbrainfunctions.com/content/6/1/28

[94] P. Stoica and M. Viberg, “Maximum likelihood parameter and rank estimation in reduced-rank multivariate linear regressions,” Signal Processing, IEEE Transactions on, vol. 44, no. 12, pp. 3069–3078, Dec 1996.

[95] O. Ledoit and M. Wolf, “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection,” Journal of Empirical Finance, vol. 10, no. 5, pp. 603–621, 2003.

[96] J. H. Friedman, “Regularized discriminant analysis,” Journal of the American Statistical Association, vol. 84, no. 405, pp. 165–175, 1989.

[97] U. Orhan, “RSVP keyboard: an EEG based BCI typing system with context information fusion,” 2014.

[98] S. Kay, “Recursive maximum likelihood estimation of autoregressive processes,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 31, no. 1, pp. 56–65, Feb 1983.

[99] P. Gonzalez-Navarro, “Spatio-temporal EEG Models for BCI,” Northeastern University, Tech. Rep. BSPIRAL-150724-R0, 2015. [Online]. Available: http://hdl.handle.net/2047/d20194049

[100] K. Werner, M. Jansson, and P. Stoica, “On Estimation of Covariance Matrices With Kronecker Product Structure,” Signal Processing, IEEE Transactions on, vol. 56, no. 2, pp. 478–491, 2008.

[101] A. Krause and D. Golovin, “Submodular function maximization,” Tractability: Practical Approaches to Hard Problems, vol. 3, p. 19, 2012.

[102] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions—I,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.


[103] S. Faul, G. Gregorcic, G. Boylan, W. Marnane, G. Lightbody, and S. Connolly, “Gaussian process modeling of EEG for the detection of neonatal seizures,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 12, pp. 2151–2162, 2007.

[104] M. Zhong, F. Lotte, M. Girolami, and A. Lécuyer, “Classifying EEG for brain computer interfaces using Gaussian processes,” Pattern Recognition Letters, vol. 29, no. 3, pp. 354–359, 2008.

[105] S. Kay, Fundamentals of Statistical Signal Processing, Volume II: Detection Theory, 2008.

[106] M. Moghadamfalahi, P. Gonzalez-Navarro, M. Akcakaya, U. Orhan, and D. Erdogmus, “The Effect of Limiting Trial Count in Context Aware BCIs: A Case Study with Language Model Assisted Spelling,” in Foundations of Augmented Cognition. Springer, 2015, pp. 281–292.
