A Detective Method for Multi-class EEG-based Motor Imagery Classification Based on OCSVM
Yanlei Gu, Jianhua Dai, Bian Wu, Nenggan Zheng, Weidong Chen, Xiaoxiang Zheng
Journal of Convergence Information Technology, Volume 6, Number 1. January 2011
Yanlei Gu1,3, Jianhua Dai1,2, Bian Wu1,3, Nenggan Zheng1, Weidong Chen1,2, Xiaoxiang Zheng1,3,*
1 Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou 310027, China
2 College of Computer Science, Zhejiang University, Hangzhou 310027, China
3 College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
[email protected]
doi:10.4156/jcit.vol6.issue1.30
Abstract
The aim of a brain-computer interface (BCI) is to translate brain activity into commands that control an external device and thereby accomplish communication. Achieving this goal requires recognizing the various patterns of brain activity, so improving classification accuracy is essential in BCI. In this paper, a detective (detection-based) method, the one-class support vector machine (OCSVM), is applied for the first time to the classification of three electroencephalography (EEG) motor imagery (MI) tasks. The EEG signals were recorded from 9 subjects performing left hand, right hand, and feet MI. For comparison, we also use other classification methods: LDA, PNN, and multi-class SVM. The results for the 3-class classification problem show that classification accuracy is significantly improved when using OCSVM.
1. Introduction
Over the past years, many research results have shown that brain signals recorded from the scalp or from within the brain can control an external device to complete tasks that people with disabilities cannot perform themselves. The signals used to communicate through a computer usually include the P300, event-related desynchronization (ERD), the visual evoked potential (VEP), etc.
The aim of BCI is to translate brain activity into commands that control an external device and thereby accomplish communication [1,2]. To achieve this goal, we need to recognize the patterns of the brain. Currently, much of the work on EEG MI classification goes into methods for extracting different features. In this paper, we discuss classification methods instead. The most common methods include the multi-class support vector machine (M-SVM) and artificial neural networks (ANN). SVM has been applied to multi-class BCI problems using the one-versus-rest (OVR) strategy. ANNs have been applied to binary and multi-class mental task classification.
Schalk et al. proposed a new concept in BCI: detection instead of classification. They introduced and validated signal detection, which does not require preliminary analyses to identify the brain signal features best suited for communication [3].
OCSVM, introduced by Schölkopf et al. [4], is an extension of SVM that estimates the support of a distribution. OCSVM has been applied in various areas: image retrieval [5], geometry-invariant texture retrieval [6], clustering [7], communication signal modulation scheme recognition [8], etc. In EEG-based signal processing, OCSVM is usually used to detect epilepsy [3,9], in cases where one class has far fewer samples than the other.
In this paper, OCSVM is introduced as a detective method for EEG MI classification, where the three classes have almost the same number of samples. We choose OCSVM to classify three MI tasks. First, the characteristics of the feature samples in each class are modeled using this algorithm, so that one model is constructed for every class. Then, for every new feature sample, we select the nearest model as its class.
The introduced method is tested on brain pattern recognition data to distinguish three MI tasks. First, we use the common spatial pattern (CSP) algorithm to derive the feature samples for every class. The method is then compared to linear discriminant analysis (LDA), the multi-class support vector machine (M-SVM), and the probabilistic neural network (PNN). The results show that the classification performance can be significantly improved by the OCSVM method.

*Corresponding author.
2. Methods
2.1. Feature extraction
In this paper, we use the well-known Common Spatial Patterns (CSP) algorithm to derive the feature vectors for training. The main idea of CSP is to use a linear transform to project the multi-channel EEG data into a spatial subspace with a projection matrix, each row of which consists of the weights for the channels. This transformation maximizes the difference in variance between the two classes of signal matrices. The algorithm is based on the simultaneous diagonalization of the covariance matrices of both classes [11]. CSP was originally applied to two-class imagery tasks [12, 13], but in this paper three kinds of tasks need to be distinguished. We therefore use the one-versus-the-rest (OVR) algorithm, an extension of the CSP algorithm to the multi-class case [14, 15].
Suppose there are three classes of tasks, A, B, and C [14].
Step 1: Estimate the covariance matrices using equation (1):
$$R_A = X_A X_A^T, \quad R_B = X_B X_B^T, \quad R_C = X_C X_C^T \qquad (1)$$
$X_A$, $X_B$, and $X_C$ are matrices of dimension N (channels) by T (samples), with T > N. Each represents one trial of the signals from all channels for class A, class B, and class C, respectively.
Step 2: Separate one class from the others. Here, we separate feet from hands. Let
$$R_{\bar{A}} = R_B + R_C$$
and factorize the sum of the covariance matrices $R_A$, $R_B$, and $R_C$ using equation (2):
$$R = R_A + R_{\bar{A}} = R_A + R_B + R_C = U \Lambda U^T \qquad (2)$$
$U$ is the N by N matrix of eigenvectors, which is also the unitary matrix of principal components. $\Lambda$ is the N by N diagonal matrix of eigenvalues.
The signals of the two classes can then be modeled as [16]
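$$X_A = \begin{bmatrix} C_A & C_C \end{bmatrix} \begin{bmatrix} S_A \\ S_C \end{bmatrix} \qquad (3)$$
$$X_{\bar{A}} = \begin{bmatrix} C_{\bar{A}} & C_C \end{bmatrix} \begin{bmatrix} S_{\bar{A}} \\ S_C \end{bmatrix} \qquad (4)$$
where $S_A$ and $S_{\bar{A}}$ are the specific source components for class A and class $\bar{A}$ (the remaining classes), and $C_A$ and $C_{\bar{A}}$ are the corresponding spatial patterns; $S_C$ is the common source component, and $C_C$ is its corresponding spatial pattern. By CSP, we can get two spatial filters, which can be used to extract the source components $S_A$ and $S_{\bar{A}}$.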
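Step 3: Construct the whitening transformation matrix using equation (5):
$$P = \Lambda^{-1/2} U^T \qquad (5)$$
Step 4: Transform the covariance matrices of the two classes with $P$:
$$S_A = P R_A P^T \qquad (6)$$
$$S_{\bar{A}} = P R_{\bar{A}} P^T \qquad (7)$$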
Step 5: Find the maximum and minimum eigenvalues of the whitened covariance matrix of class A ($S_A$ in equation (6)). The corresponding eigenvectors give the spatial filters ($SF_1$ and $SF_2$) that make the two classes maximally separable. Then use the filters to process the signals as in equation (8):
$$S_1 = SF_1 X, \qquad S_2 = SF_2 X \qquad (8)$$
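$X$ is a data matrix of preprocessed multi-channel EEG. The feature vector corresponding to one source activity is defined as:
$$\text{Feature} = \left[ \log\left( \frac{\operatorname{var}(S_1)}{\operatorname{var}(S_1) + \operatorname{var}(S_2)} \right), \; \log\left( \frac{\operatorname{var}(S_2)}{\operatorname{var}(S_1) + \operatorname{var}(S_2)} \right) \right] \qquad (9)$$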
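The log-variance feature of equation (9) can be illustrated with a minimal sketch. It assumes that each filtered signal $S_1$, $S_2$ is a single time series (one spatial filter each); the function name `csp_log_variance_features` and the sample values are hypothetical and only serve the illustration.

```python
import math
from statistics import pvariance


def csp_log_variance_features(s1, s2):
    """Return the two normalized log-variance features of equation (9).

    s1, s2: the spatially filtered signals S_1 and S_2, each given as a
    sequence of samples (assumption: one time series per filter).
    """
    v1 = pvariance(s1)
    v2 = pvariance(s2)
    total = v1 + v2
    return [math.log(v1 / total), math.log(v2 / total)]


# Hypothetical filtered signals, chosen only so the variances are easy to check.
s1 = [2.0, -2.0, 2.0, -2.0]  # variance 4
s2 = [1.0, -1.0, 1.0, -1.0]  # variance 1
features = csp_log_variance_features(s1, s2)
```

Because each variance is divided by the sum of both, the two ratios add up to 1, and the logarithm spreads the values out so that they are better suited to the classifiers that follow.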
The computation steps and a detailed description of CSP can be found in [14, 15]. A key issue in this process is the selection of the feature frequency band. In this paper, the preprocessed multi-channel EEG signals were divided into 17 overlapping frequency bands, each with a bandwidth of 4 Hz and an overlap of 2 Hz with the next band. We then select the single frequency band with the best separability, so the feature vectors are 4-dimensional.
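As a rough illustration of the band layout, the sketch below generates 17 bands of width 4 that each overlap the next by 2. The starting frequency is not stated in the text; the value of 4 Hz used here is an assumption, chosen because it makes the last band end at the 40 Hz cutoff of the preprocessing filter. The helper `overlapping_bands` is hypothetical.

```python
def overlapping_bands(start_hz, width_hz, step_hz, count):
    """Return `count` (low, high) band edges; consecutive bands are shifted by step_hz."""
    return [
        (start_hz + k * step_hz, start_hz + k * step_hz + width_hz)
        for k in range(count)
    ]


# Assumption: the first band starts at 4 Hz (not given in the paper).
# A step of 2 Hz with a width of 4 Hz gives an overlap of 2 Hz.
bands = overlapping_bands(start_hz=4.0, width_hz=4.0, step_hz=2.0, count=17)
```

Under this assumption the bands run from 4-8 Hz up to 36-40 Hz; a different starting frequency would shift every band by the same amount.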
In this paper, the results of four types of classifiers are compared. We define left hand MI as class 1, right hand MI as class 2, and feet MI as class 3.
2.2. Linear discriminant analysis (LDA)
LDA defines two measures: the within-class scatter matrix and the between-class scatter matrix.
Within-class scatter matrix:
$$S_W = \sum_{j=1}^{c} \sum_{i=1}^{N_j} (X_i^j - m_j)(X_i^j - m_j)^T \qquad (10)$$
Between-class scatter matrix:
$$S_b = \sum_{j=1}^{c} (m_j - m)(m_j - m)^T \qquad (11)$$
The number of classes is $c$, $N_j$ is the number of samples in class $j$, $X_i^j$ represents the $i$th sample of class $j$, $m_j$ is the mean of class $j$, and $m$ is the mean of all classes. The basic idea of LDA is to take, as the best projection direction, the vector for which the Fisher criterion function reaches its extremum, that is, to maximize the between-class measure while minimizing the within-class measure [17].
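The Fisher criterion mentioned above is not written out in the text. In its standard form, stated here with the scatter matrices of equations (10) and (11) and with $w$ denoting the projection direction (a symbol introduced only for this illustration), it reads:

```latex
J(w) = \frac{w^T S_b w}{w^T S_W w}, \qquad w^{*} = \arg\max_{w} J(w)
```

A large $J(w)$ means that, after projection onto $w$, the class means are far apart relative to the spread within each class.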
In our case, we combine three LDA classifiers to make the decision, as described in Figure 1. First, we combine two of the three classes into one class with the label -1, and the remaining class is labeled 1. Using an LDA classifier, we get a result that we regard as the score for belonging to the class labeled 1. After repeating the process for the other two classes, as shown in Figure 1, we get three results: result1, result2, and result3 (corresponding to the original class 1, class 2, and class 3, respectively). We then compare the three results and choose the class with the maximum result as the classification result.
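The relabeling and the final comparison can be sketched as follows. The LDA training itself is omitted; the scores in `results` are hypothetical stand-ins for result1, result2, and result3, and both helper functions are names introduced only for this illustration.

```python
def one_vs_rest_labels(labels, positive_class):
    """Label the chosen class 1 and the two remaining classes -1."""
    return [1 if y == positive_class else -1 for y in labels]


def decide(results):
    """Pick the class whose one-versus-rest classifier gave the largest result."""
    return max(results, key=results.get)


# Original class labels of a few training trials (1: left hand, 2: right hand, 3: feet).
labels = [1, 2, 3, 1, 3]
relabelled = {c: one_vs_rest_labels(labels, c) for c in (1, 2, 3)}

# Hypothetical outputs of the three LDA classifiers for one test trial.
results = {1: -0.4, 2: 0.1, 3: 0.7}
predicted_class = decide(results)
```

Taking the maximum rather than thresholding each result separately guarantees that every trial is assigned to exactly one class, even when none or several of the three classifiers give a positive result.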
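Figure 1. Three-class classification using three LDA classifiers. LH: left hand, RH: right hand, F: feet. [Diagram: each classifier labels one class 1 and the other two -1 (RH and F vs. LH; LH and F vs. RH; RH and LH vs. F); the outputs Result1, Result2, and Result3 are compared to make the decision.]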
2.3. Probabilistic neural network (PNN)
A PNN is a feedforward neural network with 2 hidden layers and is a typical nonlinear classifier that uses the minimum Bayesian risk criterion. The PNN has a 4-layer structure: input layer, pattern layer, summation layer, and decision layer.
The input layer receives and normalizes the input vector, which is the feature extracted from the EEG signal using CSP. We set the number of neurons in the pattern layer to 3 for our 3-class MI classification. Every unit in the pattern layer represents a training vector. The layer computes the Euclidean distance between the input vector and every training vector, then realizes a nonlinear mapping with a Gaussian kernel: the spherical Gaussian radial basis function, which is a Parzen probability density function estimator, as in equation (12):
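$$f_i(X) = \frac{1}{(2\pi)^{q/2} \sigma^q} \, \frac{1}{M_i} \sum_{k=1}^{M_i} \exp\left[ -\frac{(X - X_{ik})^T (X - X_{ik})}{2\sigma^2} \right] \qquad (12)$$
where $M_i$ is the number of training vectors of class $i$, $X_{ik}$ is the $k$th of them, $q$ is the dimension of the feature vector, and $\sigma$ is the smoothing parameter.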
The summation layer sums the pattern-layer outputs for each class and multiplies the sum by the loss factor. The decision layer selects the class with the largest value in the summation layer as the classification result.
Figure 2. Probabilistic neural network structure
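The pattern, summation, and decision layers can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the paper: the loss factor is taken to be equal for all classes (so it drops out of the comparison), the smoothing parameter `sigma` is set to an arbitrary value, and the training vectors are hypothetical 2-dimensional points.

```python
import math


def parzen_density(x, training_vectors, sigma):
    """Equation (12): spherical Gaussian Parzen estimate for one class."""
    q = len(x)
    norm = (2.0 * math.pi) ** (q / 2.0) * sigma ** q
    total = 0.0
    for xk in training_vectors:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xk))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / (norm * len(training_vectors))


def pnn_classify(x, training_by_class, sigma=0.5):
    """Summation layer (one density per class) followed by the decision layer."""
    densities = {
        label: parzen_density(x, vectors, sigma)
        for label, vectors in training_by_class.items()
    }
    return max(densities, key=densities.get), densities


# Hypothetical training vectors for the three MI classes.
training_by_class = {
    1: [[0.0, 0.0], [0.2, 0.1]],
    2: [[2.0, 2.0], [2.1, 1.9]],
    3: [[-2.0, 2.0], [-1.9, 2.2]],
}
label, densities = pnn_classify([1.8, 2.1], training_by_class)
```

The only free parameter in this sketch is `sigma`: a small value makes each training vector influence only its immediate neighbourhood, while a large value smooths the class densities toward each other.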
2.4. Multi-class support vector machine (M-SVM)
For comparison with OCSVM, we also use M-SVM. We use LIBSVM, a freely available library of SVM tools developed by Dr. Chih-Jen Lin of National Taiwan University. It can solve classification, regression estimation, and distribution estimation problems, among others. In this paper, we use LIBSVM to solve the classification problem, with a Gaussian radial basis function selected as the kernel function.
2.5. One-class support vector machine (OCSVM)
SVM is a machine learning method developed from statistical learning theory; it finds the optimal separating hyperplane through learning in the feature space. This method can overcome the shortcomings of rule-based classification algorithms and achieve higher classification accuracy with less training data. However, SVM was originally proposed for the binary case. Solving multi-class problems requires constructing multiple classifiers (one-versus-one, one-versus-rest, etc.), training the models, and determining complex parameters. In this paper, we select an extension of SVM, OCSVM, to solve the problem, and it achieves good results.
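We consider the training data
$$x_1, x_2, x_3, \ldots, x_n \in X \subseteq R^N$$
where $n$ is the number of training samples. First, the data are mapped into the feature space; then the smallest separating hyper-sphere that contains as many samples as possible is found through learning in the feature space. We want the sphere to be as small as possible while including as many of the training samples as possible. This problem can be transformed into the following optimization problem:
$$\min \left( \frac{1}{2} \|w\|^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i - \rho \right) \qquad (13)$$
$$\text{s.t.} \quad (w \cdot \Phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, l$$
where $w$ and $\rho$ are the hyper-sphere parameters, $\xi_i$ are slack variables, $l$ is the number of training samples (the $n$ above), and $\Phi$ is the map from the input space to the feature space. By setting the parameter $\nu$ (0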
$$f_3(x) = \sum_i \alpha_{3i} K(x_{3i}, x) - \rho_3^*, \quad \alpha_{3i} \ge 0 \qquad (19)$$
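The discriminant function makes the decision by finding the maximum of $f_1(x)$, $f_2(x)$, and $f_3(x)$:
$$\max(f_1(x), f_2(x), f_3(x)) \qquad (20)$$
The class whose model gives the maximum value is taken as the class of the sample $x$.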
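The decision rule of equation (20) can be sketched as below, assuming that the three one-class models have already been trained. The coefficients, support vectors, offsets, and the kernel width `gamma` are hypothetical values chosen for illustration; in practice they would come from an SVM library (LIBSVM, for example, offers a one-class SVM mode).

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian radial basis function K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def decision_value(x, model, gamma=1.0):
    """f_k(x) = sum_i alpha_ki * K(x_ki, x) - rho_k for one class model."""
    kernel_sum = sum(
        alpha * rbf_kernel(sv, x, gamma)
        for alpha, sv in zip(model["alphas"], model["support_vectors"])
    )
    return kernel_sum - model["rho"]


def classify(x, models, gamma=1.0):
    """Equation (20): choose the class whose model gives the largest f_k(x)."""
    values = {label: decision_value(x, m, gamma) for label, m in models.items()}
    return max(values, key=values.get), values


# Hypothetical trained one-class models, one per MI class.
models = {
    1: {"alphas": [0.5, 0.5], "support_vectors": [[0.0, 0.0], [0.5, 0.0]], "rho": 0.4},
    2: {"alphas": [0.5, 0.5], "support_vectors": [[3.0, 0.0], [3.5, 0.0]], "rho": 0.4},
    3: {"alphas": [0.5, 0.5], "support_vectors": [[0.0, 3.0], [0.0, 3.5]], "rho": 0.4},
}
label, values = classify([3.2, 0.1], models)
```

A positive $f_k(x)$ means the sample falls inside the region learned for class $k$; comparing the three values resolves the cases where a sample is accepted by several models or by none.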
3. Results
3.1. Data description and processing
We use dataset 2a of BCI Competition IV (2008). The competition data set consists of EEG data from 9 subjects. The cue-based BCI paradigm consisted of four different motor imagery tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). Two sessions on different days were recorded for each subject. Each session comprises 6 runs separated by short breaks. One run consists of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session. We process only the data of the first three classes. For every subject, we combine the two sessions into one data set. Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG. All signals were recorded monopolarly with the left mastoid serving as reference and the right mastoid as ground. The signals were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz. The sensitivity of the amplifier was set to 100 μV. An additional 50 Hz notch filter was used to suppress line noise. In this paper, the signals were further bandpass-filtered between 0.5 Hz and 40 Hz.
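The trial counts implied by this description can be checked with a short calculation. It assumes that no trials are discarded (for example, because of artifacts), which the text does not state; the constant names are introduced only for this sketch.

```python
RUNS_PER_SESSION = 6
TRIALS_PER_CLASS_PER_RUN = 12
CLASSES_RECORDED = 4   # left hand, right hand, feet, tongue
CLASSES_USED = 3       # tongue is left out in this paper
SESSIONS_PER_SUBJECT = 2

trials_per_run = TRIALS_PER_CLASS_PER_RUN * CLASSES_RECORDED
trials_per_session = trials_per_run * RUNS_PER_SESSION

# Assumption: every trial of the three used classes is kept.
used_trials_per_subject = (
    TRIALS_PER_CLASS_PER_RUN * CLASSES_USED * RUNS_PER_SESSION * SESSIONS_PER_SUBJECT
)
used_trials_per_class = used_trials_per_subject // CLASSES_USED
```

Under this assumption each subject contributes 432 trials, 144 per class, to the combined data set.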
3.2. Results of the four algorithms
For the feature vectors of every subject, we use ten-fold cross-validation to obtain the mean classification accuracy (MCA) of every algorithm (Alg), using the same feature samples extracted through CSP. The results are shown in Table 1. The 9 subjects are denoted A01-A09.
Table 1. Mean classification accuracy (MCA) of the four methods for each subject (Sub)

Sub   LDA     OCSVM   PNN     SVM
A01   0.786   0.660   0.621   0.333
A02   0.538   0.626   0.548   0.307
A03   0.485   0.590   0.563   0.333
A04   0.678   0.741   0.698   0.650
A05   0.697   0.714   0.698   0.581
A06   0.542   0.605   0.540   0.333
A07   0.652   0.674   0.669   0.610
A08   0.595   0.636   0.610   0.555
A09   0.619   0.645   0.555   0.600
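The values in Table 1 can be summarized per algorithm with a short script. The sketch below copies the table as printed and computes two summaries that the paper itself does not report: the mean over the nine subjects and the number of subjects for which each algorithm is best. The function names are hypothetical.

```python
ALGORITHMS = ("LDA", "OCSVM", "PNN", "SVM")

# Mean classification accuracy per subject, copied from Table 1.
TABLE_1 = {
    "A01": (0.786, 0.660, 0.621, 0.333),
    "A02": (0.538, 0.626, 0.548, 0.307),
    "A03": (0.485, 0.590, 0.563, 0.333),
    "A04": (0.678, 0.741, 0.698, 0.650),
    "A05": (0.697, 0.714, 0.698, 0.581),
    "A06": (0.542, 0.605, 0.540, 0.333),
    "A07": (0.652, 0.674, 0.669, 0.610),
    "A08": (0.595, 0.636, 0.610, 0.555),
    "A09": (0.619, 0.645, 0.555, 0.600),
}


def mean_per_algorithm(table):
    """Average each algorithm's accuracy over all subjects."""
    n = len(table)
    return {
        alg: sum(row[k] for row in table.values()) / n
        for k, alg in enumerate(ALGORITHMS)
    }


def best_algorithm_counts(table):
    """Count, per algorithm, the subjects for which it has the highest accuracy."""
    counts = {alg: 0 for alg in ALGORITHMS}
    for row in table.values():
        counts[ALGORITHMS[row.index(max(row))]] += 1
    return counts


means = mean_per_algorithm(TABLE_1)
wins = best_algorithm_counts(TABLE_1)
```

With the table as printed, OCSVM has the highest accuracy for eight of the nine subjects; the exception is A01, where LDA is higher.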
In addition, for every subject, we calculate the variance of the ten classification accuracies, which is also plotted in Figure 3.
Figure 3. Classification accuracy of the nine subjects using the four algorithms (panels A01-A09 show the results of subjects 1-9). Black lines represent the variance of the results over the 10 cross-validation folds.
Figure 4. Line chart of the combined results for the nine subjects. The abscissa is the subject; the ordinate is the classification accuracy.
4. Discussion and Conclusion
According to the results, the detective method OCSVM has distinct advantages in the EEG-based classification of three MI tasks: it gives the highest mean classification accuracy for eight of the nine subjects in Table 1. Being based on detection, OCSVM constructs a model for every class. When new samples arrive, every model detects and selects the ones that belong to its own class. This detective method provides a new way to solve the classification problem.
5. Acknowledgement
This work was supported in part by the National Science Foundation of China (61031002, 60873125, 30800287, 60703038, 61070074).
6. References
[1] McFarland D, Wolpaw J, "Sensorimotor rhythm-based brain-computer interface (BCI): feature selection by regression improves performance", IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 13, no. 3, 2005.
[2] Penny W, Roberts S, "EEG-based communication: a pattern recognition approach", IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 2, pp.214-215, 2000.
[3] Schalk G, Brunner P, "Brain-computer interfaces (BCIs): detection instead of classification", Journal of Neuroscience Methods, vol. 167, no. 1, pp.51-62, 2008.
[4] Schölkopf B, Platt J, Williamson R, "Estimating the support of a high-dimensional distribution", Neural Computation, vol. 13, no. 7, pp.1443-1471, 2001.
[5] Yunqiang Chen, Xiang Zhou, "One-class SVM for learning in image retrieval", Proceedings of IEEE International Conference on Image Processing, pp.34-39, 2001.
[6] Ma YD, Liu L, "Pulse-coupled neural networks and one-class support vector machines for geometry invariant texture retrieval", Pattern Recognization, vol. 28, no. 11, pp.1524-1529, 2010.
[7] Huang X, Chen X, "A Novel Clustering Algorithm Based on One-Class SVM", IEEE Computer Society, pp.486-490, 2009.
[8] Zhendong Y, "Research of communication signal modulation scheme recognition based on one-class SVM bayesian algorithm", Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing, pp.739-742, 2009.
[9] Gardner A, Krieger A, "One-class novelty detection for seizure analysis from intracranial EEG", The Journal of Machine Learning Research, vol. 7, pp.1025-1044, 2006.
[10] Sun S, Zhang C, "Adaptive feature extraction for EEG signal classification", Medical and Biological Engineering and Computing, vol. 44, no. 10, pp.931-935, 2006.
[11] Ramoser H, Muller-Gerking J, "Optimal spatial filtering of single trial EEG during imagined hand movement", IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp.441-446, 2000.
[12] Wang Y, Berg P, "Common spatial subspace decomposition applied to analysis of brain responses under multiple task conditions: a simulation study", Clinical Neurophysiology, vol. 110, no. 4, pp.604-614, 1999.
[13] Müller-Gerking J, Pfurtscheller G, "Designing optimal spatial filters for single-trial EEG classification in a movement task", Clinical Neurophysiology, vol. 110, no. 5, pp.787-798, 1999.
[14] Wu W, Gao X, "One-versus-the-rest (OVR) algorithm: An extension of common spatial patterns (CSP) algorithm to multi-class case", 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp.2387-2390, 2005.
[15] Dornhege G, Blankertz B, "Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms", IEEE Transactions on Biomedical Engineering, vol. 51, no. 6, pp.993-1002, 2004.
[16] Quigguo W, Fei M, "Feature combination for classifying single-trial ECoG during motor imagery of different sessions", Progress in Natural Science, vol. 17, no. 7, pp.851-858, 2007.
[17] Martínez A, Kak A, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp.228-233, 2001.