
7/28/2019 30-JCIT2-753041JE


A Detective Method for Multi-class EEG-based Motor Imagery Classification Based on OCSVM

Yanlei Gu, Jianhua Dai, Bian Wu, Nenggan Zheng, Weidong Chen, Xiaoxiang Zheng

Journal of Convergence Information Technology, Volume 6, Number 1. January 2011


Yanlei Gu (1,3), Jianhua Dai (1,2), Bian Wu (1,3), Nenggan Zheng (1), Weidong Chen (1,2), Xiaoxiang Zheng (1,3,*)

(1) Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou 310027, China
(2) College of Computer Science, Zhejiang University, Hangzhou 310027, China
(3) College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China

doi:10.4156/jcit.vol6.issue1.30

Abstract

The aim of a brain-computer interface (BCI) is to translate brain activity into commands that control an external device, for example to support communication. To achieve this goal, the various patterns of brain activity must be recognized, so classification accuracy is essential in BCI. In this paper, a detection method, the one-class support vector machine (OCSVM), is applied for the first time to the classification of three electroencephalogram (EEG) motor imagery (MI) tasks. The EEG signals were recorded from 9 subjects performing left-hand, right-hand and feet MI. For comparison, we also use other classification methods: LDA, PNN and multi-class SVM. The results of the three-class classification problem show that OCSVM improves classification accuracy, giving the highest mean accuracy for eight of the nine subjects.

1. Introduction

Over the past years, many studies have shown that brain signals recorded from the scalp or from within the brain can control external devices to complete tasks that people with disabilities cannot otherwise perform. The signals commonly used to communicate through a computer include the P300, event-related desynchronization (ERD) and visual evoked potentials (VEP).

The aim of a BCI is to translate brain activity into commands that control an external device to accomplish communication [1,2]. To achieve this goal, we need to recognize the patterns of brain activity. In EEG MI classification, much current work focuses on different feature-extraction methods; in this paper, we instead discuss classification methods. The most common ones include the multi-class support vector machine (M-SVM) and artificial neural networks (ANN). SVMs have been applied to multi-class BCI problems using the one-versus-rest (OVR) strategy, and ANNs have been applied to both binary and multi-class mental-task classification.

Schalk et al. proposed a new concept in BCI: detection instead of classification. They introduced and validated signal detection, which does not require preliminary analyses to identify the brain-signal features best suited for communication [3].

OCSVM, introduced by Schölkopf et al. [4], is an extension of SVMs that estimates the support of a distribution. It has been applied in various areas: image retrieval [5], geometry-invariant texture retrieval [6], clustering [7], communication signal modulation scheme recognition [8], etc. In EEG signal processing, OCSVM is usually used for epilepsy detection [3,9], where the number of samples of one class is much smaller than that of the other.

In this paper, OCSVM is introduced as a detection method for EEG MI classification in a setting where the three classes have almost the same number of samples, and we use it to classify three MI tasks. First, the characteristics of the feature samples of one class are modeled with this algorithm, and one such model is constructed for every class. Then each new feature sample is assigned to the class of the nearest model, that is, the model giving the largest decision value.

The proposed method is tested on brain-pattern recognition data, where it must distinguish three MI tasks. First, we use the common spatial pattern (CSP) algorithm to derive the feature samples for every class. The method was then compared with linear discriminant analysis (LDA), the multi-class support vector machine (M-SVM) and the probabilistic neural network (PNN). The results show that the OCSVM method improves classification performance.

*Corresponding author.

2. Methods

2.1. Feature extraction

In this paper, we use the well-known Common Spatial Patterns (CSP) algorithm to derive the feature vectors for training. The main idea of CSP is to use a linear transform that projects the multi-channel EEG data into a spatial subspace with a projection matrix, each row of which consists of the weights of the channels. This transformation maximizes the variance of one class while minimizing the variance of the other. The algorithm is based on the simultaneous diagonalization of the covariance matrices of both classes [11]. CSP was originally applied to two motor imagery tasks [12, 13], but in this paper three kinds of tasks need to be distinguished. We therefore use the one-versus-the-rest (OVR) algorithm, an extension of CSP to the multi-class case [14, 15].

Suppose there are three classes of tasks, A, B and C [14].

Step 1: Estimate the covariance matrices using Equation (1):

$$R_A = X_A X_A^{T}, \qquad R_B = X_B X_B^{T}, \qquad R_C = X_C X_C^{T} \qquad (1)$$

$X_A$, $X_B$ and $X_C$ are $N \times T$ matrices ($N$ channels by $T$ samples, $T > N$); each represents one trial of the all-channel signals of class A, class B and class C, respectively.

Step 2: Separate one class from the others (here, we separate feet from hands). Let $R_{\bar{A}} = R_B + R_C$ and factorize the sum of the covariance matrices $R_A$, $R_B$ and $R_C$ using Equation (2):

$$R = R_A + R_{\bar{A}} = R_A + R_B + R_C = U \Lambda U^{T} \qquad (2)$$

$U$ is the $N \times N$ matrix of eigenvectors, which is also the unitary matrix of principal components, and $\Lambda$ is the $N \times N$ diagonal matrix of eigenvalues. The signals of the two classes can then be modeled as [16]:

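$$X_A = \begin{bmatrix} C_A & C_C \end{bmatrix} \begin{bmatrix} S_A \\ S_C \end{bmatrix} \qquad (3)$$

$$X_{\bar{A}} = \begin{bmatrix} C_{\bar{A}} & C_C \end{bmatrix} \begin{bmatrix} S_{\bar{A}} \\ S_C \end{bmatrix} \qquad (4)$$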

where $S_A$ and $S_{\bar{A}}$ are the specific source components of class $A$ and class $\bar{A}$, and $C_A$ and $C_{\bar{A}}$ are the corresponding spatial patterns; $S_C$ is the common source component and $C_C$ is its spatial pattern. CSP yields two spatial filters, which can be used to extract the source components $S_A$ and $S_{\bar{A}}$.

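Step 3: Construct the whitening transformation matrix using Equation (5):

$$P = \Lambda^{-1/2} U^{T} \qquad (5)$$

Step 4: Transform the covariance matrices:

$$S_A = P R_A P^{T} \qquad (6)$$

$$S_{\bar{A}} = P R_{\bar{A}} P^{T} \qquad (7)$$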





Step 5: Find the maximum and minimum eigenvalues of $S_A$; the corresponding eigenvectors give the spatial filters $SF_1$ and $SF_2$, which make the two classes maximally separable. Then use the filters to process the signals as in Equation (8):

$$S_1 = SF_1 X, \qquad S_2 = SF_2 X \qquad (8)$$

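$X$ is the data matrix of preprocessed multi-channel EEG. The feature vector corresponding to the source activity is defined as:

$$\text{Feature} = \left[ \log\left(\frac{\operatorname{var}(S_1)}{\operatorname{var}(S_1) + \operatorname{var}(S_2)}\right),\ \log\left(\frac{\operatorname{var}(S_2)}{\operatorname{var}(S_1) + \operatorname{var}(S_2)}\right) \right] \qquad (9)$$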
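Steps 1 to 5 and Equations (8) and (9) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, each class covariance is the average of $X X^{T}$ over that class's trials (no trace normalization), and only the eigenvectors of $S_A$ with the largest and smallest eigenvalues are kept as $SF_1$ and $SF_2$.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, not from the paper.


def ovr_csp_filters(trials, labels, target):
    """One-versus-rest CSP (Steps 1-5), separating class `target` from the rest.

    trials: array (n_trials, N, T); labels: array (n_trials,).
    Returns (SF1, SF2): filters for the largest and smallest eigenvalues of S_A.
    """
    classes = np.unique(labels)
    # Step 1: R = X X^T, averaged over the trials of each class (an assumption)
    cov = {c: np.mean([x @ x.T for x in trials[labels == c]], axis=0) for c in classes}
    r_a = cov[target]
    r_abar = sum(cov[c] for c in classes if c != target)  # R_Abar = R_B + R_C
    # Step 2: composite covariance R = R_A + R_Abar = U Lambda U^T
    lam, u = np.linalg.eigh(r_a + r_abar)
    # Step 3: whitening matrix P = Lambda^{-1/2} U^T
    p = np.diag(lam ** -0.5) @ u.T
    # Step 4: whitened covariance S_A = P R_A P^T
    s_a = p @ r_a @ p.T
    # Step 5: eigenvectors of S_A (eigh returns eigenvalues in ascending order)
    _, b = np.linalg.eigh(s_a)
    filters = b.T @ p  # each row is a spatial filter over the N channels
    return filters[-1], filters[0]


def csp_feature(trial, sf1, sf2):
    """Equations (8)-(9): project one trial and return normalized log-variances."""
    v1 = np.var(sf1 @ trial)  # var(S_1)
    v2 = np.var(sf2 @ trial)  # var(S_2)
    return np.log(np.array([v1, v2]) / (v1 + v2))
```

Calling `ovr_csp_filters` with each class in turn as `target` gives one filter pair per one-versus-rest split; how these pairs are combined into the 4-dimensional feature vector of the paper is not specified in this sketch.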

The computational steps and a detailed description of CSP can be found in [14, 15]. A key issue in this process is the selection of the feature frequency band. In this paper, the preprocessed multi-channel EEG signals were divided into 17 overlapping frequency bands, each 4 Hz wide, with an overlap of 2 Hz between neighboring bands. We then select the single band with the best separability, so the feature vectors are 4-dimensional.
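The band layout can be checked with a few lines of Python. The lowest band edge is not stated in the text; this sketch assumes the bands start at 4 Hz, which makes the 17th band end at 40 Hz, the upper limit of the preprocessing filter. The function name is hypothetical.

```python
def overlapping_bands(f_start=4.0, width=4.0, overlap=2.0, n_bands=17):
    """Return (low, high) edges in Hz of n_bands equally spaced, overlapping bands.

    Hypothetical helper; f_start = 4 Hz is an assumption, not given in the paper.
    """
    step = width - overlap  # 4 Hz wide bands overlapping by 2 Hz start every 2 Hz
    return [(f_start + k * step, f_start + k * step + width) for k in range(n_bands)]


bands = overlapping_bands()
print(bands[0], bands[1], bands[-1])  # (4.0, 8.0) (6.0, 10.0) (36.0, 40.0)
```

Each band would then be used to band-pass the EEG before CSP, and the band with the best separability is kept; the separability measure itself is not specified in the text.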

In this paper, the results of several types of classifiers are compared. We define left-hand MI as class 1, right-hand MI as class 2 and feet MI as class 3.

2.2. Linear discriminant analysis (LDA)

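LDA defines two measures: the within-class scatter matrix and the between-class scatter matrix.

Within-class scatter matrix:

$$S_W = \sum_{j=1}^{c} \sum_{i=1}^{N_j} \left(X_i^{j} - m_j\right)\left(X_i^{j} - m_j\right)^{T} \qquad (10)$$

Between-class scatter matrix:

$$S_b = \sum_{j=1}^{c} \left(m_j - m\right)\left(m_j - m\right)^{T} \qquad (11)$$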

where $c$ is the number of classes, $X_i^{j}$ is the $i$th sample of class $j$, $N_j$ is the number of samples of class $j$, $m_j$ is the mean of class $j$ and $m$ is the mean of all samples. The basic idea of LDA is to take the direction at which the Fisher criterion function reaches its extremum as the best projection vector, that is, to maximize the between-class measure while minimizing the within-class measure [17].
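Equations (10) and (11) and the Fisher criterion $J(w) = \frac{w^{T} S_b w}{w^{T} S_W w}$ can be sketched with NumPy as follows; the best projection direction is the leading eigenvector of $S_W^{-1} S_b$. This is an illustrative sketch with hypothetical function names, using the unweighted between-class scatter exactly as written in Equation (11).

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, not from the paper.


def scatter_matrices(x, y):
    """Equation (10) within-class S_W and Equation (11) between-class S_b.

    x: array (n_samples, d), one sample per row; y: class labels.
    """
    m = x.mean(axis=0)  # mean of all samples
    d = x.shape[1]
    s_w, s_b = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        xc = x[y == c]
        mc = xc.mean(axis=0)  # class mean m_j
        s_w += (xc - mc).T @ (xc - mc)  # sum of (X_i^j - m_j)(X_i^j - m_j)^T
        s_b += np.outer(mc - m, mc - m)  # (m_j - m)(m_j - m)^T, unweighted
    return s_w, s_b


def fisher_direction(x, y):
    """Unit vector maximizing J(w) = (w^T S_b w) / (w^T S_W w)."""
    s_w, s_b = scatter_matrices(x, y)
    vals, vecs = np.linalg.eig(np.linalg.solve(s_w, s_b))  # eig of S_W^{-1} S_b
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / np.linalg.norm(w)
```

For two classes that differ only along the first axis, `fisher_direction` returns a vector close to that axis (up to sign).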

In our case, we combine three LDA classifiers to make a decision, as shown in Figure 1. First, two of the three classes are combined into one class with the label -1, and the remaining class is labeled 1. An LDA classifier then gives a result that we interpret as the score of the class labeled 1. After repeating the process for the other two classes, as shown in Fig. 1, we get three results, result1, result2 and result3 (corresponding to the original class 1, class 2 and class 3, respectively). We then compare the three results and choose the maximum as the classification result.
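A minimal sketch of this one-versus-rest scheme in Python with NumPy follows. It assumes a two-class Fisher LDA with weight $w = S_W^{-1}(m_{+} - m_{-})$, a threshold halfway between the projected class means, and the signed projection as each classifier's result; the text does not state how the authors scaled or thresholded the LDA outputs, and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and the midpoint threshold are assumptions.


def train_binary_lda(x, is_pos):
    """Two-class Fisher LDA: w = S_W^{-1}(m_pos - m_neg), threshold at the midpoint."""
    xp, xn = x[is_pos], x[~is_pos]
    mp, mn = xp.mean(axis=0), xn.mean(axis=0)
    s_w = (xp - mp).T @ (xp - mp) + (xn - mn).T @ (xn - mn)
    w = np.linalg.solve(s_w, mp - mn)
    w /= np.linalg.norm(w)
    b = -0.5 * (w @ mp + w @ mn)  # boundary halfway between projected means
    return w, b


def train_ovr_lda(x, y, classes=(1, 2, 3)):
    """One classifier per class: that class is labeled 1, the other two -1."""
    return {c: train_binary_lda(x, y == c) for c in classes}


def predict_ovr_lda(models, x):
    """Compute result1..result3 and choose the class with the maximum result."""
    classes = list(models)
    scores = np.column_stack([x @ models[c][0] + models[c][1] for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]
```

Because every weight vector is normalized, the three results are signed distances on a common scale, which is one simple way to make the "choose the maximum" step meaningful.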





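[Figure 1: three LDA classifiers, LH vs. RH and F, RH vs. LH and F, and F vs. RH and LH (target class labeled 1, the rest -1), whose outputs Result1 to Result3 are compared to make the decision]

Figure 1. Three-class classification using three LDA classifiers. LH: left hand, RH: right hand, F: feet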

2.3. Probabilistic neural network (PNN)

A PNN is a feedforward neural network with two hidden layers and a typical nonlinear classifier that uses the minimum Bayesian risk criterion. It has a four-layer structure: input layer, pattern layer, summation layer and decision layer. The input layer receives and normalizes the input vector, which is the feature vector extracted from the EEG signal by CSP. For our three-class MI problem, the pattern units are grouped into 3 classes, and every unit in the pattern layer represents one training vector. Each unit computes the Euclidean distance between the input vector and its training vector and then applies a nonlinear mapping with a Gaussian kernel, the spherical Gaussian radial basis function, which gives the Parzen probability density estimator of Equation (12):

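$$f_i(X) = \frac{1}{(2\pi)^{q/2}\sigma^{q}} \frac{1}{M_i} \sum_{k=1}^{M_i} \exp\left[-\frac{(X - X_{ik})^{T}(X - X_{ik})}{2\sigma^{2}}\right] \qquad (12)$$

where $q$ is the dimension of the feature vector, $\sigma$ is the smoothing parameter, $M_i$ is the number of training vectors of class $i$ and $X_{ik}$ is the $k$th training vector of class $i$.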

The summation layer sums the outputs of the pattern units of each class and multiplies the sum by the loss factor. The decision layer selects the class with the largest value in the summation layer as the classification result.

Figure 2. Probabilistic neural network structure
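Equation (12) together with the summation and decision layers can be sketched in Python with NumPy as below. The smoothing parameter `sigma` is an assumed value (the text does not report it), the normalization performed by the input layer is omitted, and equal priors and loss factors are assumed so that the decision layer reduces to picking the largest density. The function name is hypothetical.

```python
import numpy as np

# Illustrative sketch; the function name and sigma = 0.5 are assumptions.


def pnn_predict(x_train, y_train, x_test, sigma=0.5):
    """Classify the rows of x_test with a Parzen-window PNN (Equation 12)."""
    classes = np.unique(y_train)
    q = x_train.shape[1]  # dimension of the feature vector
    norm = 1.0 / ((2 * np.pi) ** (q / 2) * sigma ** q)
    densities = []
    for c in classes:
        patterns = x_train[y_train == c]  # pattern units of class c
        # squared Euclidean distance from each test vector to each training vector
        d2 = ((x_test[:, None, :] - patterns[None, :, :]) ** 2).sum(axis=2)
        # summation layer: average Gaussian kernel response, i.e. f_c(X)
        densities.append(norm * np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1))
    # decision layer: class with the largest estimated density
    return classes[np.argmax(np.column_stack(densities), axis=1)]


x_train = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
y_train = np.array([1, 1, 2, 2])
print(pnn_predict(x_train, y_train, np.array([[0.2, 0.0], [4.8, 5.0]])))  # [1 2]
```

Unequal loss factors could be added by multiplying each class density by its factor before the `argmax`.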

2.4. Multi-class support vector machine (M-SVM)

For comparison with OCSVM, we also use M-SVM, implemented with LIBSVM, a freely available library of SVM tools developed at National Taiwan University by Chih-Jen Lin and colleagues. LIBSVM can solve classification, regression and distribution-estimation problems, among others. In this paper, we use it for classification, with a Gaussian radial basis function as the kernel.
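For reference, an RBF-kernel multi-class SVM can be set up in Python through scikit-learn, whose `SVC` class is built on LIBSVM; this is a swapped-in interface, not the authors' LIBSVM invocation. `SVC` handles the three classes internally with a one-versus-one scheme, and the values of `C` and `gamma` below are assumptions, since the text does not report them. The synthetic data are only a placeholder for the CSP features.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder 4-dimensional features for classes 1-3 (synthetic, for illustration only)
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0, 0, 0], [0, 2.0, 0, 0], [0, 0, 2.0, 0]])
x = np.vstack([rng.normal(c, 0.5, (50, 4)) for c in centers])
y = np.repeat([1, 2, 3], 50)

# Gaussian RBF kernel as in the paper; C and gamma are assumed values
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(x, y)
train_acc = np.mean(clf.predict(x) == y)
print(f"training accuracy: {train_acc:.2f}")
```

In practice `C` and `gamma` would typically be tuned, for example by a grid search evaluated on the training folds only.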

2.5. One-class support vector machine (OCSVM)





SVM is a machine learning method developed from statistical learning theory; it finds the optimal separating hyperplane by learning in the feature space. This method can overcome the shortcomings of rule-based classification algorithms and achieve higher classification accuracy with less training data. However, SVM was originally proposed for the binary case. Solving multi-class problems requires constructing multiple classifiers (one-versus-one, one-versus-rest, etc.), training their models and determining complex parameters. In this paper, we select an extended method, OCSVM, to solve the problem; as Section 3 shows, it achieves the highest accuracy for most subjects.

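We consider the training data

$$x_1, x_2, \ldots, x_l \in \mathcal{X} \subseteq \mathbb{R}^{N}$$

where $l$ is the number of training samples. First, the data are mapped into the feature space; then, through learning in the feature space, a smallest hyper-sphere is found that contains as many training samples as possible. The sphere should be as small as possible while at the same time including as many training samples as possible. This problem can be formulated as the following optimization problem [4]:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w \rVert^{2} + \frac{1}{\nu l}\sum_{i=1}^{l} \xi_i - \rho \qquad (13)$$

$$\text{s.t.}\quad \left(w \cdot \Phi(x_i)\right) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, l$$

where $w$ and $\rho$ are the parameters of the separating boundary, $\xi_i$ are slack variables and $\Phi$ is the map from the input space to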

feature space. By setting the parameter (0





$$f_3(x) = \sum_{i} \alpha_i^{3} K(x_i^{3}, x) - \rho_3^{*}, \qquad \alpha_i^{3} \ge 0 \qquad (19)$$

The decision is made by finding the maximum of $f_1(x)$, $f_2(x)$ and $f_3(x)$: a sample $x$ is assigned to the class

$$\hat{c}(x) = \arg\max_{j \in \{1, 2, 3\}} f_j(x) \qquad (20)$$
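The per-class OCSVM scheme of Equations (13) to (20) can be sketched with scikit-learn's `OneClassSVM`, which is also built on LIBSVM and whose `decision_function` is positive for inliers and negative for outliers. This is a sketch under assumptions, not the authors' code: `nu` and `gamma` are illustrative values not taken from the paper, all three models share one fixed `gamma` so that their decision values are on a comparable scale, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative sketch; helper names, nu and gamma are assumptions.


def train_ocsvm_models(x, y, nu=0.1, gamma=0.5):
    """Fit one RBF-kernel OCSVM per class, using only that class's samples."""
    return {c: OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(x[y == c])
            for c in np.unique(y)}


def predict_ocsvm(models, x):
    """Equation (20): choose the class whose model gives the largest f_j(x)."""
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(x) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]


# Usage with placeholder features (synthetic, for illustration only)
rng = np.random.default_rng(0)
centers = {1: [3.0, 0, 0, 0], 2: [0, 3.0, 0, 0], 3: [0, 0, 3.0, 0]}
x = np.vstack([rng.normal(centers[c], 0.5, (30, 4)) for c in (1, 2, 3)])
y = np.repeat([1, 2, 3], 30)
models = train_ocsvm_models(x, y)
print("training accuracy:", np.mean(predict_ocsvm(models, x) == y))
```

Each model only ever sees its own class, which is what makes this a detection scheme: a sample far from every training class still receives the label of the least negative model.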

3. Results

3.1. Data description and processing

We use dataset 2a of BCI Competition IV (2008). The data set consists of EEG data from 9 subjects. The cue-based BCI paradigm consisted of four different motor imagery tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3) and tongue (class 4). Two sessions were recorded on different days for each subject. Each session consists of 6 runs separated by short breaks, and one run consists of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session. We use the first three classes, and for every subject we combine the two sessions into one data set. Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG. All signals were recorded monopolarly, with the left mastoid serving as reference and the right mastoid as ground. The signals were sampled at 250 Hz and band-pass filtered between 0.5 Hz and 100 Hz, the sensitivity of the amplifier was set to 100 μV, and an additional 50 Hz notch filter was used to suppress line noise. In this paper, the signals were further band-pass filtered between 0.5 Hz and 40 Hz.
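The additional 0.5-40 Hz band-pass step can be sketched in Python with SciPy. The filter type and order are not given in the text; a zero-phase fourth-order Butterworth filter in second-order sections is an assumption here, as is the function name. The trial count in the comment follows from the figures above.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250.0  # sampling rate of the dataset, Hz

# 2 sessions x 6 runs x 12 trials per class x 3 classes used = 432 trials per subject
TRIALS_PER_SUBJECT = 2 * 6 * 12 * 3


def bandpass_0p5_40(eeg, fs=FS, order=4):
    """Zero-phase Butterworth band-pass (0.5-40 Hz) along the last (time) axis.

    Hypothetical helper; filter family and order are assumptions.
    """
    sos = butter(order, [0.5, 40.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)


# Usage: a 10 Hz component passes, an 80 Hz component is strongly attenuated
t = np.arange(0, 8.0, 1.0 / FS)
clean = np.sin(2 * np.pi * 10 * t)
filtered = bandpass_0p5_40(clean + np.sin(2 * np.pi * 80 * t))
print(np.max(np.abs(filtered[500:1500] - clean[500:1500])))
```

Second-order sections are used instead of transfer-function coefficients because a 0.5 Hz edge at 250 Hz sampling is a very low normalized frequency, where the coefficient form can become numerically fragile.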

3.2. Results of the four algorithms

For the feature vectors of every subject, we use ten-fold cross-validation to obtain the mean classification accuracy (MCA) of every algorithm, using the same feature samples extracted by CSP. The results are shown in Table 1, where the 9 subjects are denoted A01 to A09.

Table 1. Mean classification accuracy (MCA) of the four methods

Subject   LDA     OCSVM   PNN     SVM
A01       0.786   0.660   0.621   0.333
A02       0.538   0.626   0.548   0.307
A03       0.485   0.590   0.563   0.333
A04       0.678   0.741   0.698   0.650
A05       0.697   0.714   0.698   0.581
A06       0.542   0.605   0.540   0.333
A07       0.652   0.674   0.669   0.610
A08       0.595   0.636   0.610   0.555
A09       0.619   0.645   0.555   0.600
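The values in Table 1 can be summarized with a short Python script that computes each algorithm's mean MCA over the nine subjects and counts the subjects on which it is best. The script uses only the numbers in Table 1; the variable names are illustrative.

```python
# MCA per subject (A01-A09), copied from Table 1
mca = {
    "LDA":   [0.786, 0.538, 0.485, 0.678, 0.697, 0.542, 0.652, 0.595, 0.619],
    "OCSVM": [0.660, 0.626, 0.590, 0.741, 0.714, 0.605, 0.674, 0.636, 0.645],
    "PNN":   [0.621, 0.548, 0.563, 0.698, 0.698, 0.540, 0.669, 0.610, 0.555],
    "SVM":   [0.333, 0.307, 0.333, 0.650, 0.581, 0.333, 0.610, 0.555, 0.600],
}

means = {alg: sum(v) / len(v) for alg, v in mca.items()}
wins = {alg: 0 for alg in mca}
for i in range(9):
    best = max(mca, key=lambda alg: mca[alg][i])  # best algorithm for subject i
    wins[best] += 1

for alg in mca:
    print(f"{alg}: mean MCA {means[alg]:.3f}, best for {wins[alg]} of 9 subjects")
```

Running it gives a mean MCA of 0.655 for OCSVM, 0.621 for LDA, 0.611 for PNN and 0.478 for SVM, with OCSVM best for 8 of the 9 subjects and LDA best for the remaining one (A01).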

In addition, for every subject we calculate the variance of the ten classification accuracies, which is also plotted in Figure 3.
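The evaluation protocol, ten-fold cross-validation reporting the mean and the variance of the ten fold accuracies, can be sketched in Python with NumPy. The random, non-stratified fold split, the population variance (`ddof=0`), the hypothetical `fit_predict` callback and the toy nearest-mean classifier are all assumptions; the text does not say how the folds were formed.

```python
import numpy as np

# Illustrative sketch; function names and fold construction are assumptions.


def ten_fold_mca(x, y, fit_predict, n_folds=10, seed=0):
    """Return (mean, variance) of per-fold accuracy over n_folds folds.

    fit_predict(x_train, y_train, x_test) -> predicted labels for x_test.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = fit_predict(x[train_idx], y[train_idx], x[test_idx])
        accuracies.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accuracies)), float(np.var(accuracies))


def nearest_mean(x_train, y_train, x_test):
    """Toy classifier used only to demonstrate the evaluation loop."""
    classes = np.unique(y_train)
    means = np.array([x_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((x_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


rng = np.random.default_rng(3)
x = np.vstack([rng.normal(m, 0.5, (40, 2)) for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([1, 2, 3], 40)
mca, acc_var = ten_fold_mca(x, y, nearest_mean)
print(f"MCA = {mca:.3f}, variance = {acc_var:.4f}")
```

Any of the classifiers above can be plugged in as `fit_predict`, so all four methods are evaluated on identical folds of the same CSP features.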





[Figure 3: nine panels, A01 to A09, one per subject]

Figure 3. Classification accuracy of the nine subjects using the four algorithms (A01 to A09 are the results of subjects 1 to 9). Black lines represent the variance of the results of the ten-fold cross-validation.





Figure 4. Line chart of the combined results of the nine subjects. The abscissa shows the nine subjects; the ordinate shows the classification accuracy.

4. Discussion and Conclusion

The results show that the detection-based OCSVM method has distinct advantages in EEG-based three-class MI classification: it gives the highest mean accuracy for eight of the nine subjects. Based on detection, OCSVM constructs a model for every class; when new samples arrive, every model detects and selects the samples that belong to its own class. This detection approach provides a new way to solve the classification problem.

5. Acknowledgement

This work was supported in part by the National Science Foundation of China (61031002, 60873125, 30800287, 60703038, 61070074).

6. References

[1] McFarland D, Wolpaw J, "Sensorimotor rhythm-based brain-computer interface (BCI): feature selection by regression improves performance", IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 13, no. 3, 2005.

[2] Penny W, Roberts S, "EEG-based communication: a pattern recognition approach", IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 2, pp. 214-215, 2000.

[3] Schalk G, Brunner P, "Brain-computer interfaces (BCIs): detection instead of classification", Journal of Neuroscience Methods, vol. 167, no. 1, pp. 51-62, 2008.

[4] Schölkopf B, Platt J, Williamson R, "Estimating the support of a high-dimensional distribution", Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.

[5] Chen Y, Zhou X, "One-class SVM for learning in image retrieval", Proceedings of the IEEE International Conference on Image Processing, pp. 34-39, 2001.

[6] Ma YD, Liu L, "Pulse-coupled neural networks and one-class support vector machines for geometry invariant texture retrieval", Pattern Recognition, vol. 28, no. 11, pp. 1524-1529, 2010.

[7] Huang X, Chen X, "A novel clustering algorithm based on one-class SVM", IEEE Computer Society, pp. 486-490, 2009.

[8] Zhendong Y, "Research of communication signal modulation scheme recognition based on one-class SVM Bayesian algorithm", Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 739-742, 2009.

[9] Gardner A, Krieger A, "One-class novelty detection for seizure analysis from intracranial EEG", Journal of Machine Learning Research, vol. 7, pp. 1025-1044, 2006.

[10] Sun S, Zhang C, "Adaptive feature extraction for EEG signal classification", Medical and Biological Engineering and Computing, vol. 44, no. 10, pp. 931-935, 2006.

[11] Ramoser H, Müller-Gerking J, "Optimal spatial filtering of single trial EEG during imagined hand movement", IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp. 441-446, 2000.





[12] Wang Y, Berg P, "Common spatial subspace decomposition applied to analysis of brain responses under multiple task conditions: a simulation study", Clinical Neurophysiology, vol. 110, no. 4, pp. 604-614, 1999.

[13] Müller-Gerking J, Pfurtscheller G, "Designing optimal spatial filters for single-trial EEG classification in a movement task", Clinical Neurophysiology, vol. 110, no. 5, pp. 787-798, 1999.

[14] Wu W, Gao X, "One-versus-the-rest (OVR) algorithm: an extension of common spatial patterns (CSP) algorithm to multi-class case", 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2387-2390, 2005.

[15] Dornhege G, Blankertz B, "Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms", IEEE Transactions on Biomedical Engineering, vol. 51, no. 6, pp. 993-1002, 2004.

[16] Quigguo W, Fei M, "Feature combination for classifying single-trial ECoG during motor imagery of different sessions", Progress in Natural Science, vol. 17, no. 7, pp. 851-858, 2007.

[17] Martínez A, Kak A, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
