

Proceedings of the 4th International Symposium on Communications, Control and Signal Processing, ISCCSP 2010, Limassol, Cyprus, 3-5 March 2010

The Bilinear Brain: Towards Subject-Invariant Analysis

Christoforos Christoforou, Robert M. Haralick, Paul Sajda and Lucas C. Parra

Abstract-A major challenge in single-trial electroencephalography (EEG) analysis and Brain Computer Interfacing (BCI) is the so-called inter-subject/inter-session variability, i.e., large variability in measurements obtained during different recording sessions. This variability restricts the number of samples available for single-trial analysis to the limited number that can be obtained during a single session. Here we propose a novel method that distinguishes between subject-invariant features and subject-specific features, based on a bilinear formulation. The method allows one to combine multiple EEG recordings to estimate the subject-invariant parameters, hence addressing the issue of inter-subject variability, while reducing the complexity of estimation for the subject-specific parameters. The method is demonstrated on 34 datasets from two different experimental paradigms: a perception categorization task and a Rapid Serial Visual Presentation (RSVP) task. We show significant improvements in classification performance over state-of-the-art methods. Further, our method extracts neurological components never before reported on the RSVP paradigm, thus demonstrating its ability to extract novel neural signatures from the data.

I. INTRODUCTION

Traditionally, the term Brain Computer Interfaces (BCI) refers to algorithms that aim to decode brain activity, on a single-trial basis, in order to provide a direct control pathway between a user's intentions and a computer [1], [2], [3], [4]. Such an interface could give locked-in patients more direct and natural control over a neuroprosthesis or other computer applications [2]. Further, by providing an additional communication channel for healthy individuals, BCI systems can be used to increase productivity and efficiency in high-throughput tasks [5], [6].

Single-trial discriminant analysis has also been used as a research tool to study the neural correlates of behavior. The low signal-to-noise ratio (SNR) of EEG can be overcome by extracting activity that differs maximally between two experimental conditions. The resulting discriminant components can be used to identify the spatial origin and time course of stimulus/response-specific activity. Further, the improved SNR can be leveraged to correlate variability of neural activity across trials with behavioral variability and behavioral performance [7], [5], [8]. In essence, discriminant analysis adds to the existing set of multivariate statistical tools commonly used in neuroscience research (ANOVA, Hotelling's $T^2$, Wilks' $\Lambda$ test).

Christoforos Christoforou is with R.K.I Leaders Limited, Agias Triados 26A, Aradippou, Cyprus (e-mail: [email protected])

Robert M. Haralick is with the Department of Computer Science, The Graduate Center of The City University of New York, New York, NY 10011, USA (e-mail: [email protected])

Paul Sajda is with the Department of Biomedical Engineering, Columbia University, New York, NY 10027, USA (e-mail: [email protected])

Lucas C. Parra is with the Department of Biomedical Engineering, The City College of The City University of New York, New York, NY 10031, USA (e-mail: [email protected])

978-1-4244-6287-2/10/$26.00 ©2010 IEEE

A major challenge in single-trial analysis of EEG is the so-called inter-subject/inter-session variability. Different sessions of EEG recordings vary dramatically, even when the same subject is used. The causes of this phenomenon include the experimental setup procedure, variations in electrode placement and conductivity, the anatomical structure of the brain, and the individual's alertness at the time of the experiment, to name a few.

Due to inter-subject variability, BCI algorithms proposed thus far are restricted to applying single-trial EEG analysis separately to data obtained from each session. Further, this variability constrains the number of training-set samples available to single-trial discriminant models, since there is a limit on the number of trials that can be obtained from a subject in a single session.

In this paper, we propose a novel method that distinguishes between subject-invariant features and subject-specific features, based on a bilinear formulation. The method allows one to combine multiple recordings of EEG, hence addressing the issue of inter-subject variability while reducing the complexity of estimation for the subject-specific parameters.

The following hypothesis motivates our approach:

Hypothesis: Spatio-temporal characteristics of EEG signals vary greatly among subjects or repetitions of EEG experiments. However, for any specific experimental paradigm, the underlying neural rhythmic activity (the cause of oscillatory and evoked related features in EEG) associated with a task of interest remains invariant.

This hypothesis is implicitly assumed in most current single-trial analysis algorithms. For example, in the BCI task where a subject is asked to imagine a movement of the left or the right hand, a preprocessing step involves band-pass filtering the data in the $\alpha$-band (8 Hz-13 Hz) [9]. The selection of this band was motivated by neurological studies and is tied to this specific experimental paradigm, independent of the subject performing the task. Another example is the case of Rapid Serial Visual Presentation (RSVP) experiments, where a series of images is shown to a subject. The goal is for the subject to identify images that contain a target of interest. In such experimental paradigms one looks for an evoked potential at around 300 ms after the stimulus (called the P300 signal) [6]. Again, this signal is assumed to occur in this specific experimental paradigm independent of the subject.

In the next section, we define the single-trial classification problem across multiple EEG recordings. We then introduce our proposed algorithm, termed Bilinear-Feature Based Discriminant (BFBD). In the last section, we demonstrate the performance of our method on 34 real EEG datasets from two different experimental paradigms: the perception categorization task and the Rapid Serial Visual Presentation (RSVP) task. We show significant improvements in classification performance over state-of-the-art methods. Further, our method extracts neurological components never before reported on an RSVP paradigm, thus demonstrating the ability of our method to extract novel neural signatures from the data.

II. METHOD

Our goal in the proposed method is to take advantage of the fact that certain characteristics of EEG signals are subject-invariant and session-invariant, while others are subject-specific and session-specific. By distinguishing between subject-invariant parameters and subject-specific parameters, we can use EEG recordings from many subjects to estimate the subject-invariant parameters, while using a recording from an individual subject to fine-tune the subject-specific parameters. The advantage of this approach is that, while the number of trials in a single session is limited (e.g., the subject gets tired after a couple of hours), there is no limit on repeating the experiment over multiple subjects/sessions. Hence, part of the model can be trained with many more trials, reducing the problem of over-training.

A. Problem definition

We represent an EEG trial by a matrix $X \in \mathbb{R}^{D \times T}$, where $D$ denotes the number of sensors and $T$ the number of temporal samples of the EEG. Multiple trials from a single experiment are represented as a set of such matrices. Each EEG trial is associated with some underlying mental state, which constitutes the label of the trial. Further, we now have multiple experiments, denoted by a superset of sets. We can then define the following classification problem:

Classification Problem: Let $\mathcal{D}_1 = \{X_n, y_n\}_{n=1}^{N}$, $X \in \mathbb{R}^{D \times T}$, $y \in \{-1, 1\}$, be a set of training examples, where $X_n$ corresponds to the EEG signal of $D$ channels and $T$ sample points and $y_n$ indicates one of two conditions or classes (e.g., right or left hand imaginary movement, stimulus versus control conditions, etc.). Given $M$ more such datasets $\{\mathcal{D}_m\}_{m=2}^{M+1}$, the task is then to predict the class label $y$ for an unobserved trial $X$ sampled from the same distribution as the samples from the session $\mathcal{D}_1$.

Note that in the above formulation, we have information available from multiple datasets that, due to inter-subject and inter-session variability, cannot be treated as one large dataset. Further, the observations are given as matrices and not as vectors, as is the norm in classification problems. In the sections that follow, we formulate a new classification algorithm that uses this extra information to estimate the subject-invariant parameters and fine-tune the subject-specific parameters in a bilinear formulation. We term our algorithm Bilinear-Feature Based Discriminant (BFBD).

B. Bilinear Feature Pool

Before we present the new algorithm, we define the concept of a Bilinear Feature Pool. Let $\Phi$ be a given set of functions with elements $\phi_k : \mathbb{R}^{D \times T} \to \mathbb{R}^{D_k \times T_k}$. Note that $\phi_k$ denotes a function from a matrix space to a matrix space. For EEG signal analysis, we design this set of functions to be relevant to the features of interest. Examples of functions relevant to EEG analysis are listed in Appendix I.

For each such function $\phi_k \in \Phi$, we can define the bilinear projection of the resulting transformation as follows:

$$t_k = \mathrm{Trace}\{U_k^{\top} \phi_k(X) V_k\}$$

where $U_k \in \mathbb{R}^{D \times R}$ and $V_k \in \mathbb{R}^{T \times R}$ are parameters, $X \in \mathbb{R}^{D \times T}$ is the EEG signal and $R$ is a user-specified integer. The resulting scalar $t_k$ is a linear combination of the elements of the transformed matrix $\phi_k(X)$. Note that the parameters are tied to the selection of the $k$th function $\phi_k$.

C. Bilinear-Feature Based Discriminant Model

Consider the set of functions $\{\phi_1, \ldots, \phi_K\} \subset \Phi$. For each $\phi_k \in \Phi$ we define the following discriminant model:

$$g_k(X; U_k, V_k, \phi_k) = \mathrm{Trace}\{U_k^{\top} \phi_k(X) V_k\} \tag{1}$$

where $U_k$ and $V_k$ are the model parameters. Let $Q = \{(U_k, V_k)\}_{k=1}^{K}$ be the list of $K$ model parameters, each associated with the $k$th feature function. Define the parameter index set $\mathcal{I} \subset \{1, \ldots, K\}$, whose elements are in ascending order. We denote by $\mathcal{I}_i$ and $Q_i$ the $i$th elements of $\mathcal{I}$ and $Q$ respectively. We use $Q(\mathcal{I})$ to denote the subset of $Q$ whose elements are determined by the index set $\mathcal{I}$, i.e., $\{Q_{\mathcal{I}_1}, \ldots, Q_{\mathcal{I}_{|\mathcal{I}|}}\}$. We can then express the feature vector $t(X; Q, \mathcal{I})$ as

$$t(X; Q, \mathcal{I}) = \begin{pmatrix} g_{\mathcal{I}_1}(X; Q_{\mathcal{I}_1}, \phi_{\mathcal{I}_1}) \\ \vdots \\ g_{\mathcal{I}_{|\mathcal{I}|}}(X; Q_{\mathcal{I}_{|\mathcal{I}|}}, \phi_{\mathcal{I}_{|\mathcal{I}|}}) \end{pmatrix}$$

Note that the dimension of the space, $t(X; Q, \mathcal{I}) \in \mathbb{R}^{|\mathcal{I}|}$, depends on the cardinality $|\mathcal{I}|$ of the index set. We identify $\mathcal{I}$ as the subject-invariant parameter of our model. Finally, we define the classification of an observation $X$ as:

$$f(X) = \mathrm{sign}(w^{\top} t(X; Q, \mathcal{I})) \tag{2}$$

where the parameter vector $w \in \mathbb{R}^{|\mathcal{I}|}$ defines a linear combination of the elements of $t(X; Q, \mathcal{I})$. We identify the parameters $Q(\mathcal{I})$ and $w$ as the subject-specific parameters of our model. In the following sections, we give an interpretation of the model and define the optimization problem to estimate the model parameters $\mathcal{I}$, $Q(\mathcal{I})$, and $w$.
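The bilinear discriminant model of Eq. (1) and the decision rule of Eq. (2) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names, the two toy feature functions, and the random parameters are assumptions made only for illustration, and both toy feature functions preserve the $D \times T$ shape so that the same parameter shapes apply to each.

```python
import numpy as np

def bilinear_feature(X, U, V, phi=lambda Z: Z):
    """Scalar feature Trace(U^T phi(X) V), as in Eq. (1)."""
    Z = phi(X)
    # Trace(U^T Z V) equals the sum of the elementwise product of U and Z V.
    return float(np.sum(U * (Z @ V)))

def feature_vector(X, Q, index_set, pool):
    """Stack the features selected by the index set into the vector t(X; Q, I)."""
    return np.array([bilinear_feature(X, Q[k][0], Q[k][1], pool[k])
                     for k in index_set])

def classify(X, w, Q, index_set, pool):
    """Decision rule sign(w^T t(X; Q, I)), as in Eq. (2)."""
    return int(np.sign(w @ feature_vector(X, Q, index_set, pool)))

# Toy setup (all values hypothetical): D sensors, T samples, R components.
rng = np.random.default_rng(0)
D, T, R = 4, 6, 2
pool = [lambda Z: Z, lambda Z: Z ** 2]   # illustrative feature functions
Q = [(rng.standard_normal((D, R)), rng.standard_normal((T, R))) for _ in pool]
X = rng.standard_normal((D, T))          # one simulated EEG trial
index_set = [0, 1]                       # zero-based stand-in for the set I
w = np.array([1.0, -0.5])
label = classify(X, w, Q, index_set, pool)
```

Computing the trace through an elementwise product avoids forming the $R \times R$ matrix explicitly; it is a convenience of this sketch and changes nothing about the model.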

Fig. 1. ROC performance (Az) of HDCA, BDCA and BFBD, panels (a) and (b). (a) Performance comparison for 20 data sets collected from 10 subjects on the perception categorization task. The Az performance for the three algorithms is (mean ± std.): 0.68 ± 0.07, 0.70 ± 0.08 and 0.73 ± 0.07. Statistical significance (p < 0.009) was computed using a paired t-test. (b) Performance comparison for 14 data sets collected from 5 subjects on the target detection task. The Az performance for the three algorithms is (mean ± std.): 0.76 ± 0.07, 0.83 ± 0.08 and 0.91 ± 0.07. Statistical significance (p < 0.0005) was computed using a paired t-test. Each scatter plot compares the area under the ROC curve (Az) achieved by the BDCA and HDCA classifiers vs. BFBD.

D. Model interpretation

As explained in the previous section, the functions in the feature set $\Phi$ transform an input EEG matrix $X$ to a potentially more informative matrix $\phi(X)$. The motivation is that some of these functions can decrease the dimensions of the space while increasing the signal-to-noise ratio of the observation. Note that we place no restriction on the type of functions in the set; they can be either linear or non-linear transformations. The function $g_k$ is associated with the function $\phi_k$ and can be thought of as a parametric feature extractor. It implements a bilinear combination of the elements of $\phi_k(X)$. The output of $g_k$ is considered a single feature obtained from the EEG observation. The effectiveness of each feature depends on its parameters $U_k$ and $V_k$, as well as on the selection of the corresponding $\phi_k$. We determine the proper values of the parameters $U_k$ and $V_k$ by means of an optimization procedure, which we present in the next section. The index set $\mathcal{I}$ specifies a selection of features from the feature pool $\Phi$. Depending on the experimental paradigm, different functions in $\Phi$ might be informative (i.e., increase the signal-to-noise ratio). We identify $\mathcal{I}$ as the subject-invariant parameter of our model because we obtain it using recordings from multiple datasets, contrary to $Q(\mathcal{I})$ and $w$, which are determined using a single-subject/single-session recording. Finally, the parameter $w$ defines a linear discriminant in the feature space.

III. OPTIMIZATION

The model involves a number of parameters that need to be optimized: the index set $\mathcal{I}$, the vector $w$ and the parameters $Q(\mathcal{I})$. We first formulate the optimization of the subject-specific parameters $Q(\mathcal{I})$ and $w$ for a fixed index set $\mathcal{I}$. We then formulate a combinatorial optimization problem and present an algorithm to find an approximate solution for the index set $\mathcal{I}$.

A. Estimating the subject-specific parameters $Q(\mathcal{I})$

Given a dataset $\{X_n, y_n\}_{n=1}^{N}$, fix an index set $\mathcal{I}$. We can then express the log-likelihood of the parameters $(U, V) \in Q(\mathcal{I})$ given the corresponding $\phi$ by:

$$L((U, V); \{X_n, y_n\}_{n=1}^{N}, \phi) = -\sum_{n=1}^{N} \log\left(1 + \exp\left(-y_n\, g(X_n; U, V, \phi)\right)\right) \tag{3}$$

We further introduce regularization terms for each of the parameters $U$ and $V$, which we denote here as $\mathrm{reg}(U, V)$:

$$\mathrm{reg}(U, V) = \sum_{r=1}^{R} u_r^{\top} K_u u_r + v_r^{\top} K_v v_r \tag{4}$$

where $u_r$ and $v_r$ denote the $r$th columns of $U$ and $V$, and $K_u$ and $K_v$ are covariance matrices. For details on selecting $K_u$ and $K_v$ we refer to the work of [10]. The parameters $Q(\mathcal{I})$ are the solutions of the following optimization problems:

$$\arg\max_{U_{\mathcal{I}_1}, V_{\mathcal{I}_1}} \; L((U_{\mathcal{I}_1}, V_{\mathcal{I}_1}); \phi_{\mathcal{I}_1}) + \mathrm{reg}(U_{\mathcal{I}_1}, V_{\mathcal{I}_1})$$

$$\arg\max_{U_{\mathcal{I}_2}, V_{\mathcal{I}_2}} \; L((U_{\mathcal{I}_2}, V_{\mathcal{I}_2}); \phi_{\mathcal{I}_2}) + \mathrm{reg}(U_{\mathcal{I}_2}, V_{\mathcal{I}_2})$$

We obtained analytic formulas for the gradients of $L$ and $\mathrm{reg}$ with respect to the parameters $(U, V)$.¹ We solve the above optimization using the gradient descent algorithm. For details on bilinear optimization we refer to the work of [10].

¹ Note that in the formulation above we drop the dependence of $L$ on the data, $L((U_{\mathcal{I}_1}, V_{\mathcal{I}_1}); \{X_n, y_n\}_{n=1}^{N}, \phi_{\mathcal{I}_1})$.

B. Estimating the parameter vector $w$

For an optimal set of parameters $\hat{Q} = \{(U_k, V_k)\}_{k=1}^{K}$, obtained by the optimization in Section III-A, we transform the input dataset $\{X_n, y_n\}_{n=1}^{N}$ to $\{t(X_n; \hat{Q}, \mathcal{I}), y_n\}_{n=1}^{N}$. For simplicity of notation, we denote $t_n = t(X_n; \hat{Q}, \mathcal{I})$. In this new space we assume a normal class-conditional distribution on $t \mid y$ with equal covariance matrix:

$$p(t \mid y = +1) = N(\mu_1, \Sigma), \qquad p(t \mid y = -1) = N(\mu_2, \Sigma)$$

where $N(\mu, \Sigma)$ denotes the normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Under this normality assumption, the Fisher Discriminant gives the Bayes optimal classifier with an analytic solution for the parameter $w$ given as:

$$w = \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_2) \tag{5}$$

where $\hat{\mu}_1$ and $\hat{\mu}_2$ are estimates of the mean for the two classes, and $\hat{\Sigma}$ is the estimated pooled covariance matrix.

A new observation $X_{new}$ can then be classified by evaluating the function:

$$f(X_{new}) = \mathrm{sign}(w^{\top} t(X_{new}; \hat{Q}, \mathcal{I})) \tag{6}$$

C. Estimation of the subject-invariant parameter $\mathcal{I}$

In the section above we defined the discriminant model and expressed the corresponding optimization problems to estimate the subject-specific parameters $Q(\mathcal{I})$ and $w$ for a fixed $\mathcal{I}$. In this section we define the optimization of the subject-invariant parameter set $\mathcal{I}$ and suggest a heuristic algorithm to find satisfactory solutions.

According to the problem definition in Section II-A, a superset of datasets $\{\mathcal{D}_2, \ldots, \mathcal{D}_{M+1}\}$ is given, where each $\mathcal{D}_m = \{(X_n, y_n)\}_{n=1}^{N}$ corresponds to observations from a specific subject and a specific EEG session. The subject-specific parameters of the model are trained on individual datasets. Our goal in estimating the subject-invariant parameter set $\mathcal{I}$ is to take advantage of the information provided by all datasets combined.

We proceed in formulating the optimization problem by defining the following functions. Let $cv(\mathcal{I}; \mathcal{D}_m)$ be the function that calculates the five-fold cross-validation² performance in terms of the area under the ROC curve, evaluated for a fixed parameter set $\mathcal{I}$ on dataset $\mathcal{D}_m$, using the optimization procedures described in the previous section. We define the average performance of index set $\mathcal{I}$ on the given datasets as

$$\mathrm{mean\_cv}(\mathcal{I}; \{\mathcal{D}_m\}_{m=2}^{M+1}) = \frac{1}{M} \sum_{m=2}^{M+1} cv(\mathcal{I}; \mathcal{D}_m) \tag{7}$$

We can now define the optimization procedure as:

$$\hat{\mathcal{I}} = \arg\max_{\mathcal{I}} \; \mathrm{mean\_cv}(\mathcal{I}; \{\mathcal{D}_m\}_{m=2}^{M+1}) \tag{8}$$

The optimization problem specified in equation (8) falls into the category of combinatorial problems. There are many algorithms proposed in the literature to solve this type of optimization. These algorithms either employ different heuristics to search the parameter lattice for solutions or exploit different properties of the optimization function using branch and bound techniques [11]. Since in our formulation it is desired for $\mathcal{I}$ to have small cardinality (i.e., $|\mathcal{I}| < 4$), we employ a stochastic search strategy that promotes sparseness. In Appendix II we provide the pseudo-code for our proposed algorithm, which approximates the solution to the combinatorial problem. Alternatively, we can use a generic algorithm for feature selection proposed by [12].

² Five-fold cross validation is a standard technique in pattern recognition where a training set is partitioned into five blocks, with approximately an equal number of observations in each block. Four of the blocks are used for training a classifier, and the fifth for applying the classifier. Blocks are rotated five times until the classifier has been applied to all data.

IV. EVALUATION

We evaluated our method on 20 real EEG datasets obtained from 10 subjects on a Perceptual Categorization task [13]. Further, we report results on 14 real EEG datasets obtained from 5 subjects on a Rapid Serial Visual Presentation task [6]. For each subject $s$, the datasets corresponding to all subjects except the $s$th one were used in the estimation procedure of the subject-invariant set $\hat{\mathcal{I}}$. Having obtained $\hat{\mathcal{I}}$, the set of subject-specific parameters is estimated using the datasets associated with the $s$th subject. Performance on each dataset is measured in terms of the area under the ROC curve (which we refer to as Az values) using five-fold cross validation within each dataset.

We compare our method against Bilinear Discriminant Component Analysis (BDCA) [10] and Hierarchical Fisher Discriminant Analysis (HDCA) [6]. The results are summarized in Figure 1 for each of the two tasks. In both paradigms the proposed method outperforms the other two methods in terms of Az values. Specifically, in the perception categorization task, the BFBD mean performance across datasets is 0.73 ± 0.07, the BDCA mean performance is 0.70 ± 0.07 and the HDCA mean performance is 0.68 ± 0.07. The difference in performance was found to be significant using a paired t-test with p-values < 0.009. In the RSVP task the mean performance across datasets is 0.93 ± 0.07 for the proposed method, 0.83 ± 0.08 for BDCA and 0.76 ± 0.07 for HDCA. The improvement in terms of Az value difference is 0.10 and 0.17 over BDCA and HDCA respectively. This difference was found to be significant using a paired t-test with p-values < 0.0005.

The performance of each algorithm in comparison to one another is summarized using the scatter-plots in Figure 2. From the two scatter-plots in Figure 2, we see that BFBD achieves higher Az values in comparison to BDCA and HDCA on 85% of the datasets. We note that in the 15% of the datasets where BDCA and HDCA perform better, the difference in performance is small, since most of those points are very close to the equi-performance manifold.

The two most important features extracted by our method are the conventional linear features (i.e., raw EEG data) and an estimate of power in higher frequencies (20-40 Hz). The resulting bilinear coefficients indicate that images of interest elicited increased power in this frequency band following image presentation, with a non-uniform spatial distribution. This finding is interesting because traditionally, analysis in RSVP paradigms only involved Evoked Related Potentials (i.e., changes in signal amplitude). Thus we demonstrate that our method can extract novel components solely from the data.

Fig. 2. Scatter plots "BFBD vs BDCA" (left, horizontal axis BDCA) and "BFBD vs HDCA" (right, horizontal axis HDCA), with axes ranging from 0.5 to 0.9 and proportions of 85% and 15% marked in each plot. Comparison of Az values achieved by BFBD, BDCA and HDCA. Az values obtained by BDCA and the proposed BFBD are shown as a circle for each dataset in the left scatter-plot. Similarly, Az values obtained by HDCA and BFBD are shown in the right scatter-plot. The proportion of datasets lying above/below the diagonal is shown at the top-left/bottom-right corners of each plot.
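Performance in this section is reported as Az, the area under the ROC curve, measured with five-fold cross validation. The sketch below shows one way these two ingredients could be computed, assuming labels in {-1, +1} and real-valued classifier scores. It uses the rank-based identity that Az equals the probability that a randomly chosen +1 trial scores higher than a randomly chosen -1 trial; the function names and the interleaved block assignment are illustrative choices, not taken from the paper.

```python
def roc_area(scores, labels):
    """Az: fraction of (+1, -1) trial pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def five_fold_indices(n):
    """Yield (train, test) index lists for five blocks of near-equal size."""
    blocks = [list(range(start, n, 5)) for start in range(5)]
    for k in range(5):
        train = [i for j, block in enumerate(blocks) if j != k for i in block]
        yield train, blocks[k]

# Hypothetical scores for four trials: three of the four (+1, -1) pairs
# are ordered correctly.
az = roc_area([0.9, 0.2, 0.3, 0.1], [1, 1, -1, -1])
```

In practice each fold would train the classifier on the `train` indices and score the `test` indices, and Az would then be computed over the held-out scores.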

V. CONCLUSION

In this paper, we proposed a novel method that distinguishes between subject-invariant features and subject-specific features, based on a bilinear formulation that addresses inter-subject and inter-session variability. This allowed our method to utilize multiple EEG recordings in the analysis. We demonstrated our method on 34 datasets from two different experimental paradigms and showed significant improvements in classification performance over state-of-the-art methods. Further, we showed that our method is capable of extracting novel neurological components never before reported on the RSVP paradigm, thus demonstrating the ability of our method to extract novel neural signatures from the data.

APPENDIX I

List of possible families of functions that could be included in the Bilinear Feature Pool, and their relevance to EEG analysis.

Spatial filtering: Spatial filtering includes channel subset selection, whereby a small number of channels are selected. This reduces the number of dimensions to learn. An alternative is the grouping and averaging of channels based on their location. Spatial filtering has been used extensively in EEG.

Hilbert Transform: The Hilbert transform is useful for analyzing the instantaneous power content of an oscillatory signal as a function of time. The analytic signal obtained via the Hilbert transform of a real-valued signal is a complex signal whose real part can be thought of as the original signal, while the imaginary part is the original signal with a $\pi/2$ phase shift, i.e., every sinusoidal component of the signal is shifted. The squared magnitude of its complex coefficients defines the instantaneous power of the signal over time. By filtering the signal for a specific frequency band before applying the Hilbert transform, we can obtain the instantaneous power over that specific band. In terms of EEG signals, this change in power is useful since it captures modulation of oscillatory features of interest.

Multitaper Spectrum Estimation: Multitaper is a spectrum estimation method that minimizes the problem of spectral leakage [14]. It uses Slepian sequences, a set of temporal basis functions designed to concentrate the maximum energy in a narrow frequency band around zero, under the assumption of signals with unit variance. By multiplying the basis with a sinusoid of some center frequency $f$, the basis can capture the power in the same bandwidth centered on $f$. Time-varying windows are used to capture the power of the signal in a narrow frequency band and at various center frequencies and temporal resolutions.

Temporal downsampling: EEG is usually recorded at high temporal resolution, with sampling rates ranging from 1024 Hz to 2048 Hz. However, this high sampling rate unnecessarily increases the dimensionality of the parameter space. Temporal downsampling helps reduce the number of dimensions without compromising the signal quality.

Identity Transform: The identity function applied to a matrix returns the exact same matrix. We include the identity function in our pool of features for uniformity of notation.
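Three of the function families in this appendix can be sketched in a few lines: instantaneous power via the Hilbert transform, temporal downsampling, and the identity. The code below is illustrative only and makes several assumptions: trials are stored as channels × samples arrays, the analytic signal is built with the standard FFT construction (zero the negative frequencies, double the positive ones), any band-pass filtering is assumed to have been applied beforehand, and the downsampling keeps every `factor`-th sample without the anti-alias filter a real pipeline would normally apply first. All function names are hypothetical.

```python
import numpy as np

def analytic_signal(X):
    """Analytic signal of each row of X (channels x samples), built via the FFT."""
    T = X.shape[-1]
    spectrum = np.fft.fft(X, axis=-1)
    gain = np.zeros(T)
    gain[0] = 1.0                       # keep the DC term once
    if T % 2 == 0:
        gain[T // 2] = 1.0              # keep the Nyquist term once
        gain[1:T // 2] = 2.0            # double the positive frequencies
    else:
        gain[1:(T + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain, axis=-1)

def instantaneous_power(X):
    """Squared magnitude of the analytic signal at every time sample."""
    return np.abs(analytic_signal(X)) ** 2

def downsample(X, factor):
    """Keep every `factor`-th temporal sample (no anti-alias filter here)."""
    return X[:, ::factor]

def identity(X):
    """Identity transform, included for uniformity of notation."""
    return X

# Toy check: a pure oscillation of amplitude 2 has constant power 4.
T = 256
time = np.arange(T)
X = 2.0 * np.cos(2 * np.pi * 5 * time / T + np.array([[0.0], [1.0]]))
power = instantaneous_power(X)
```

Each function maps a matrix to a matrix, so any of them could serve as an element $\phi_k$ of a feature pool; only `downsample` changes the temporal dimension.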

APPENDIX II

Below we provide the code listing of the proposed combinatorial optimization algorithm, which solves the optimization problem formulated in equation (8). Note that the optimization is performed off-line, so it can afford the additional complexity introduced by our method. Once the optimal parameter values are obtained and fixed, applying the classifier involves calculating an inner product of the parameters with the observation, which is fast enough to apply in real time.


Algorithm 1 Combinatorial optimization algorithm

Input: {V}_{n=1}^{N}, Φ = {φ_1, ..., φ_K}, J = {1, ..., K}
Choose MaxNumOfDimensions ∈ {1, ..., 4}
Choose RepetitionsPerDimension ≤ K
Initialize bestFeaturePool = {}
for k = 1:MaxNumOfDimensions do
    Initialize currentFeaturePool = {}
    for i = 1:RepetitionsPerDimension do
        I_cur1 ← randomly sample from bestFeaturePool a tuple <Az, I>, assign I to I_cur1
        I_cur2 ← randomly sample from J
        I_cur ← I_cur1 ∪ I_cur2
        Az ← mean_cv(I_cur, {V}_{n=1}^{N})
        Add <Az, I_cur> to currentFeaturePool
    end for
    bestFeaturePool ← the five tuples <Az, I> ∈ {currentFeaturePool ∪ bestFeaturePool} with highest Az value
end for
return <Az, I> ∈ bestFeaturePool with highest Az value
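The search in Algorithm 1 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code. The scorer `mean_cv` is passed in by the caller and stands in for the mean cross-validated Az over the datasets. Indices are zero-based. On the first pass, when bestFeaturePool is still empty, the sketch starts from the empty index set. Duplicate index sets are merged before the top five are kept. The last two choices, like the toy scorer `toy_mean_cv`, are assumptions made here for illustration.

```python
import random


def combinatorial_search(mean_cv, num_features, max_dims=3,
                         reps_per_dim=None, keep=5, seed=0):
    """Randomized greedy search over feature index sets (after Algorithm 1).

    mean_cv(index_set) -> float is a caller-supplied scorer (hypothetical
    here) returning the mean cross-validated Az for the given index set.
    """
    rng = random.Random(seed)
    all_indices = list(range(num_features))  # J, zero-based in this sketch
    reps = num_features if reps_per_dim is None else reps_per_dim
    best_pool = []  # list of (Az, frozenset of indices)

    for _ in range(max_dims):
        current_pool = []
        for _ in range(reps):
            # Grow a retained index set (empty on the first pass) by one
            # randomly drawn feature index.
            base = rng.choice(best_pool)[1] if best_pool else frozenset()
            candidate = base | {rng.choice(all_indices)}
            current_pool.append((mean_cv(candidate), candidate))
        # Keep the `keep` distinct index sets with the highest Az.
        merged = {idx: az for az, idx in current_pool + best_pool}
        ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
        best_pool = [(az, idx) for idx, az in ranked[:keep]]

    return max(best_pool, key=lambda pair: pair[0])


def toy_mean_cv(index_set):
    """Toy scorer (illustrative only): indices 1 and 4 are informative."""
    useful = {1, 4}
    hits = len(index_set & useful)
    misses = len(index_set - useful)
    return 0.5 + 0.2 * hits - 0.01 * misses


best_az, best_indices = combinatorial_search(toy_mean_cv, num_features=8,
                                             max_dims=3, reps_per_dim=8)
```

Because the best pool is carried over between passes, the best score never decreases as the dimension grows. The number of scorer evaluations is bounded by MaxNumOfDimensions × RepetitionsPerDimension, which is what keeps the off-line search tractable.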

ACKNOWLEDGMENT

This work was funded by Government Contract #NBCHC080029.

REFERENCES

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan. Brain-computer interfaces for communication and control. Clin Neurophysiol, 113(6):767-791, June 2002.

[2] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. Kübler, J. Perelmouter, E. Taub, and H. Flor. A spelling device for the paralysed. Nature, 398(6725):297-298, March 1999.

[3] B. Blankertz, G. Curio, and K. Müller. Classifying single trial EEG: Towards brain computer interfacing. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, 2002.

[4] B. Blankertz, G. Dornhege, C. Schäfer, R. Krepki, J. Kohlmorgen, K. Müller, V. Kunzmann, F. Losch, and G. Curio. Boosting bit rates and error detection for the classification of fast-paced motor commands based on single-trial EEG analysis. IEEE Trans. Neural Sys. Rehab. Eng., 11(2):127-131, 2003.

[5] Adam D. Gerson, Lucas C. Parra, and Paul Sajda. Cortically-coupled computer vision for rapid image search. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14:174-179, June 2006.

[6] Lucas C. Parra, Christoforos Christoforou, Adam D. Gerson, Mads Dyrholm, An Luo, Mark Wagner, Marios G. Philiastides, and Paul Sajda. Spatio-temporal linear decoding of brain state: Application to performance augmentation in high-throughput tasks. Signal Processing Magazine, Special Issue on Brain Computer Interfaces, 2007 (to appear).

[7] M.G. Philiastides, R. Ratcliff, and P. Sajda. Neural representation of task difficulty and decision making during perceptual categorization: A timing diagram. Journal of Neuroscience, 26(35):8965-8975, August 2006.

[8] M.G. Philiastides and P. Sajda. Temporal characterization of the neural correlates of perceptual decision making in the human brain. Cerebral Cortex, 16(4), April 2006.

[9] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. Rehab. Eng., 8:441-446, December 2000.

[10] Mads Dyrholm, Christoforos Christoforou, and Lucas C. Parra. Bilinear discriminant component analysis. J. Mach. Learn. Res., 8:1097-1111, 2007.

[11] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, January 2000. ISBN 3540654313.

[12] A.K. Jain and D. Zongker. Feature-selection: Evaluation, application, and small sample performance. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19:153-158, 1997.

[13] M.G. Philiastides, R. Ratcliff, and P. Sajda. Neural representation of task difficulty and decision making during perceptual categorization: A timing diagram. Journal of Neuroscience, 26(35):8965-8975, August 2006.

[14] J.W. Pitton. Time-frequency spectrum estimation: an adaptive multitaper method. Time-Frequency and Time-Scale Analysis, 1998. Proceedings of the IEEE-SP International Symposium on, pages 665-668, Oct 1998.