
Brainwave Feature Extraction, Classification & Prediction

www.oliviamoran.me

Cognitive Computing

About The Author

Olivia Moran is a leading training specialist who specialises in E-Learning instructional design and is a certified Moodle expert. She has been working as a trainer and course developer for 3 years, developing and delivering training courses for traditional classroom, blended learning and E-Learning.

Courses Olivia Moran Has Delivered: ● MOS ● ECDL ● Internet Marketing ● Social Media ● Google [Getting Irish Businesses Online] ● Web Design [FETAC Level 5] ● Adobe Dreamweaver ● Adobe Flash ● Moodle

Specialties: ★ Moodle [MCCC Moodle Certified Expert] ★ E-Learning Tools/Technologies [Commercial & Open Source] ★ Microsoft Office Specialist ★ Web Design & Online Content Writer ★ Adobe Dreamweaver, Flash & Photoshop


1. ABSTRACT

This document will examine issues pertaining to feature extraction, classification and prediction. It will consider the application of these techniques to unlabelled Electroencephalogram (E.E.G.) data in an attempt to discriminate between left and right hand imagery movements. It will briefly reflect on the need for brainwave signal preprocessing. The feature extraction and classification process will be examined in depth and the results obtained using various classifiers will be illustrated. Classification algorithms will be given some thought, namely Linear Discriminant Analysis (L.D.A.), K-Nearest Neighbour (K.N.N.) and Neural Network (N.N.) analysis. This document will explore prediction and highlight its effect on accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, these are briefly addressed. The way in which biology and nature inspire the design of feature extraction, classification and prediction systems will be explored. Finally, future work will be touched on.

2. INTRODUCTION

The study of E.E.G. data is a very important field that, according to Ebrahimi et al (2003), has been "motivated by the hope of creating new communication channels for persons with severe motor disabilities". Advances in this area of research cater for the construction of more advanced Brain Computer Interfaces (B.C.I.'s). Wolpaw et al (2002) describe such an interface as a "non-muscular channel for sending messages and commands to the external world". The impact that such technologies could have on the quality of everyday life, particularly for those who have some form of physical disability, is enormous. "Brain-Computer Interfacing is an interesting emerging technology that translates intentional variations in the Electroencephalogram into a set of particular commands in order to control a real world machine" Atry et al (2005). Improvements to these systems are often made through an increased understanding of the human body and the way in which it operates. Feature extraction, classification and prediction are all processes that our bodies carry out on a daily basis, with or without our knowledge. Studying such activities will undoubtedly lead researchers to the creation of more biologically plausible B.C.I. solutions. It is not only individuals who will benefit from further studies and understanding of these processes, as feature extraction, classification and prediction have many other applications. Take, for example, the world of business. Companies everywhere have to deal with a constant bombardment of information from both their internal and external environments. There seems to be an endless amount of both useful and useless information. As one can imagine, it is often very difficult to find exactly what you are looking for, and when people eventually locate what they have been seeking it may be in a format that does not suit them. This is where feature extraction, classification and prediction play their part. These processes are often the only way in which a business can locate information gems in a sea of data.



This document explores the various issues pertaining to feature extraction, classification and prediction. The application of these techniques to unlabelled E.E.G. data is examined in an attempt to discriminate between left and right hand imagery movements. It briefly looks at brainwave signal preprocessing. An in-depth study of the feature extraction and classification process is carried out, focusing on numerous classifiers. The L.D.A., K.N.N. and N.N. classification algorithms are examined. This document gives thought to prediction and how it could be used to improve accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, these methods are mentioned in this document. Biology and nature often inspire the computing industry to produce feature extraction, classification and prediction systems that operate in the same or a similar way as the human body does. This issue of inspiration is briefly addressed and examples from nature are given. Finally, areas for future work are considered.

3. BRAINWAVE SIGNAL PREPROCESSING

E.E.G. data is commonly used for tasks such as discrimination between left and right hand imagery movements. "An E.E.G. is a recording of the very weak electrical potentials generated by the brain on the scalp" Ebrahimi et al (2003). The collection of such signals is non-invasive and they can be "easily recorded and processed with inexpensive equipment" Ebrahimi et al (2003). It also offers many advantages over other methods, as "it is based on a much simpler technology and is characterized by much smaller time constants when compared to other noninvasive approaches such as M.E.G., P.E.T. and F.M.R.I." Ebrahimi et al (2003).

The E.E.G. data used as input for the analysis carried out during the course of this assignment had been preprocessed. Ebrahimi et al (2003) point out that "some preprocessing is generally performed due to the high levels of noise and interference usually present". Artifacts such as motor movements, eye blinking and electrode movement are removed, as these are not required; all the essential data needed to carry out classification is left behind. The E.E.G. data was recorded on two different channels, C3 and C4. These correspond to the left and right hemispheres of the motor cortex and would have been recorded by placing electrodes over the left and right sides of the motor cortex, as shown in figure 1 below.

Figure 1. – Showing the placing of the electrodes at channels C3 and C4 over the motor cortex.
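As a rough illustration of this kind of preprocessing, the sketch below band-pass filters a raw E.E.G. channel with SciPy. The 8–30 Hz band, the 256 Hz sampling rate and all variable names are assumptions made for illustration, not details taken from the recordings used here.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, low_hz=8.0, high_hz=30.0, fs=256.0, order=4):
    """Band-pass filter one E.E.G. channel (assumed band and sampling rate)."""
    nyquist = fs / 2.0
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    # filtfilt runs the filter forwards and backwards, so it adds no phase shift
    return filtfilt(b, a, signal)

# Hypothetical raw traces from electrodes C3 and C4
c3_raw = np.random.randn(2560)   # stand-in for a real C3 recording
c4_raw = np.random.randn(2560)
c3, c4 = bandpass(c3_raw), bandpass(c4_raw)
```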


It is important to record signals at these two channels due to the fact that "when people execute or imagine the movement of left and right hand, E.E.G. features differs in two brain hemispheres corresponding to sensorimotor hand representation area" Pei & Zheng (2004). Subsequently, when an imagined left hand movement is made there are essentially two signals recorded, at C3 and C4, both labelled as left, and vice versa for right hand imagery movements.

4. FEATURE EXTRACTION

A feature is described by Sriraja (2002) as "any structural characteristic, transform, structural description or graph, extracted from a signal or a part of it, for use in pattern recognition or interpretation. It is a representation of the signal or pattern, containing only the salient information". Ripley (1996) goes on to argue that a "feature is a measurement on an example, so the training set of examples has measured features and a class for each". Feature extraction is concerned with the identification of features that are unique or specific to a particular type of E.E.G. data, such as all imagined left hand movements. The aim of this process is the formation of useful new features by combining existing ones. Using such features facilitates the process of data classification. There are many such features; some provide useful information while others provide none. The next logical step is therefore the elimination of features that produce the lowest accuracy.

For each test run, the accuracy of the classifier used was calculated. This was important as it allowed the author to determine which classifiers gave the best results for the data being examined. Kohavi (1995) points out that "estimating the accuracy of a classifier is important not only to predict its future prediction accuracy, but also for choosing a classifier from a given set (model selection), or combining classifiers".
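To make such accuracy comparisons concrete, here is a minimal sketch, assuming hypothetical trial arrays and labels (0 = imagery left, 1 = imagery right): one feature vector is built per trial and a classifier's accuracy is estimated on held-out trials with scikit-learn.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical data: 100 trials x 2 channels (C3, C4) x 1024 samples
trials = np.random.randn(100, 2, 1024)
labels = np.random.randint(0, 2, size=100)   # 0 = left, 1 = right imagery

# One feature per channel (here the mean), giving a 100 x 2 feature matrix
features = trials.mean(axis=2)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)
clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```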

5. THE CLASSIFICATION PROCESS

5. 1. Descriptive Classifiers

In an effort to find the most appropriate type of classifier for the analysis of the E.E.G. data used in this assignment, the author turned to descriptive methods. These included basic features like the mean, standard deviation and kurtosis. Using this descriptive approach allows for the summarisation of the test and training data. This is useful where the sample contains a large number of variables.

5. 1. 1. Mean

The mean is "short for arithmetic mean: in descriptive statistics, the average value, calculated for a finite set of scores by adding the scores together and then dividing the total by the number of scores" Coleman (2003). During ‘Descriptive Features – Test 1’ an accuracy of 64% was obtained using the mean feature. It performed slightly better than the standard deviation, which reached 61% accuracy.


5. 1. 2. Standard Deviation

Standard deviation is defined by Coleman (2003) as "a measure of the degree of dispersion, variability or scatter in a set of scores, expressed in the same units as the scores themselves, defined as the square root of the variance". ‘Descriptive Features – Test 2’ attempted to classify the E.E.G. data utilising the standard deviation feature. An accuracy of 61% was achieved.

5. 1. 3. Kurtosis

Kurtosis is useful in that it "provides information about the ‘peakedness’ of the distribution. If the distribution is perfectly normal you would obtain a skewness and kurtosis value of 0" Pallant (2001). The results obtained during ‘Descriptive Features – Test 3’ using the kurtosis feature were disappointing, with an accuracy of 49%. Kurtosis in this instance was not able to offer a higher level of separability than either the mean or the standard deviation. Kurtosis is usually more appropriate for larger samples, with which more satisfactory results could be accomplished. As noted by Tabachnick & Fidell (1996), "kurtosis can result in an underestimate of the variance, however, this risk is also reduced with a large sample".

5. 1. 4. Combination Of Mean, Standard Deviation And Kurtosis Features

In some instances the combination of features can allow for greater accuracy; however, this was not the case for the E.E.G. data examined using the mean, standard deviation and kurtosis. Test results from ‘Descriptive Features – Test 4’ showed accuracy to be in the region of 49%, a much lower performance than that of the mean and standard deviation features used individually. A sketch of how these three features can be computed follows.
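This is a minimal sketch, assuming hypothetical trial arrays; note that scipy.stats.kurtosis returns excess kurtosis, which is 0 for a normal distribution, matching Pallant's definition above.

```python
import numpy as np
from scipy.stats import kurtosis

def descriptive_features(trial):
    """trial: array of shape (channels, samples) -> one feature vector."""
    return np.concatenate([
        trial.mean(axis=1),         # mean per channel
        trial.std(axis=1),          # standard deviation per channel
        kurtosis(trial, axis=1),    # excess kurtosis per channel
    ])

trial = np.random.randn(2, 1024)    # stand-in for one (C3, C4) trial
print(descriptive_features(trial))  # 6 features: 3 per channel
```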

5. 1. 5. Conclusion Drawn From Mean, Standard Deviation And Kurtosis Feature Tests

The accuracy of the mean as a classifier was substantially higher than that of the standard deviation and kurtosis, as well as of a combination of all three. On the other hand, it still did not offer a satisfactory level of separation between the imagery left and right signals. It seems these three features are not appropriate for E.E.G. data and are better suited to simpler forms of data. With this in mind the author turned to the Hjorth features.

5. 2. Hjorth Features

A number of Hjorth parameters were drawn upon during the course of this assignment. "In 1970, Bo Hjorth derived certain features that described the E.E.G. signal by means of simple time domain analysis. These parameters, namely Activity, Mobility and Complexity, together characterize the E.E.G. pattern in terms of amplitude, time scale and complexity" Sriraja (2002). These were used in an attempt to achieve a separation between imagery left and right hand signals.

The Hjorth approach involves the measurement of the E.E.G. signal "for successive epochs (or windows) of one to several seconds. Two of the attributes are obtained from the first and second time derivatives of the amplitude fluctuations in the signal. The first derivative is the rate of change of the signal's amplitude. At peaks and troughs the first derivative is zero. At other points it will be positive or negative depending on whether the amplitude is increasing or decreasing with time. The steeper the slope of the wave, the greater will be the amplitude of the first derivative. The second derivative is determined by taking the first derivative of the first derivative of the signal. Peaks and troughs in the first derivative, which correspond to points of greatest slope in the original signal, result in zero amplitude in the second derivative, and so forth" Miranda & Brouse (2005).

According to Sriraja (2002), if $x_1, x_2, \ldots, x_n$ are the $n$ E.E.G. data values and the consecutive differences $x_i - x_{i-1}$ are denoted $d_i$, the parameters can be written as

$\text{Activity} = \operatorname{var}(x), \qquad \text{Mobility} = \sqrt{\frac{\operatorname{var}(d)}{\operatorname{var}(x)}}, \qquad \text{Complexity} = \frac{\text{Mobility}(d)}{\text{Mobility}(x)}$
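A direct NumPy translation of these three definitions follows; the signal and the window extents are placeholders, the latter echoing the windowing used in ‘Hjorth Features – Test 7’.

```python
import numpy as np

def hjorth(x):
    """Return (activity, mobility, complexity) for one signal window."""
    d = np.diff(x)                    # first differences d_i = x_i - x_(i-1)
    activity = np.var(x)              # variance of the amplitude fluctuations
    mobility = np.sqrt(np.var(d) / np.var(x))
    # complexity = mobility of the first derivative / mobility of the signal
    complexity = np.sqrt(np.var(np.diff(d)) / np.var(d)) / mobility
    return activity, mobility, complexity

x = np.random.randn(1024)             # stand-in for one E.E.G. epoch
s, e = 680, 700                       # example window extents, as in the text
print(hjorth(x))                      # whole epoch
print(hjorth(x[s:e]))                 # specified window, cf. Test 7
```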

5. 2. 1. Activity Feature

Activity is defined by Miranda & Brouse (2005) as "the variance of the amplitude fluctuations in the epoch". During ‘Hjorth Features – Test 1’ this feature was able to achieve an accuracy of only 44% and therefore offered very poor separability. ‘Hjorth Features – Test 2’ used the same classifier, however the time interval for sampling was changed from the 6th second to the 7th. This change resulted in an accuracy of 55%, an increase of 11% on the previous test. ‘Hjorth Features – Test 3’ was also carried out using the activity feature. This test aimed to determine whether or not changing the number of neurons used in the N.N. would have a notable effect on the accuracy of the classification. A change in the number of neurons did not have a significant impact on performance in this instance.

5. 2. 2. Mobility Feature

"Mobility is calculated by taking the square root of the variance of the first derivative divided by the variance of the primary signal" Miranda & Brouse (2005). ‘Hjorth Features – Test 4’ utilised the mobility feature for classification purposes. Results from this test showed that accuracy using this feature stands at 52%.

5. 2. 3. Complexity Feature

Complexity is described as "the ratio of the mobility of the first derivative of the signal to the mobility of the signal itself" Miranda & Brouse (2005). ‘Hjorth Features – Test 5’ examined the complexity feature and its effect on accuracy. Results for this test showed the level of accuracy using this feature to be 64%.

5. 2. 4. Combination Of Activity, Mobility And Complexity Features

‘Hjorth Features – Test 6’ combined the activity, mobility and complexity features in the hope of increasing accuracy further. This test showed very mediocre results, with accuracy at 56%. However, when the data windows were specified, as in ‘Hjorth Features – Test 7’, more promising results were recorded. An accuracy of 74% was achieved, with a greater level of separability of the imagery left and right hand signals than in all previous results.


Combining multiple features is useful as it can often lead to improved accuracy. Lotte et al (2007) highlights this point, arguing, "A combination of similar classifiers is very likely to outperform one of the classifiers on its own. Actually, combining classifiers is known to reduce the variance and thus the classification error".

6. CLASSIFICATION ALGORITHMS

Kohavi (1995) defines a classifier as "a function that maps an unlabelled instance to a label using internal data structures". Three different types of algorithms were used for classification: the L.D.A., K.N.N. and N.N. classification algorithms.

6.1. L.D.A. Classification

L.D.A., also known as Fisher's L.D.A., is "often used to investigate the difference between various groups when their relationship is not clear. The goal of a discriminant analysis is to find a set of features or discriminants whose values are such that the different groups are separated as much as possible" Sriraja (2002). Lotte et al (2007) describes the aim of L.D.A. as being to "use hyperplanes to separate the data representing the different classes. For a two-class problem, the class of a feature vector depends on which side of the hyperplane the vector is". The L.D.A. is concerned with finding the features that maximise the distance between the two classes while minimising the distance within each class. This concept is illustrated in figure 2 below.

Figure 2. – Shows a hyperplane used to illustrate graphically the separation of the classes, i.e. the separability of the imagery left hand data from the imagery right hand data.
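In scikit-learn terms, this hyperplane separation looks like the sketch below; the feature matrix and labels are hypothetical stand-ins, not the assignment's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.randn(100, 2)              # hypothetical 2-D feature vectors
y = np.random.randint(0, 2, size=100)    # 0 = imagery left, 1 = imagery right

lda = LinearDiscriminantAnalysis().fit(X, y)
# coef_ and intercept_ define the separating hyperplane w . x + b = 0;
# the predicted class depends on which side of it a feature vector falls
print("hyperplane normal:", lda.coef_, "offset:", lda.intercept_)
print("predicted classes:", lda.predict(X[:5]))
```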

The equation for L.D.A. can be denoted in mathematical terms. Sriraja (2002) discusses the equation of L.D.A. and the principles on which it works. "First, a linear combination of the features x are projected into a new feature, y. The idea is to have a projection such that the y's from the two classes would be as much separated as possible. The measure of separation between the two sets of y's is evaluated in terms of the respective means and the variances of the projected classes . . . The objective is therefore to have a linear combination such that the following ratio is maximised."

$J = \frac{(\bar{y}_1 - \bar{y}_2)^2}{s_{y_1}^2 + s_{y_2}^2}$

where $\bar{y}_1$ and $\bar{y}_2$ are the means of the two sets of y's, $y_1$ and $y_2$ respectively, $s_{y_1}^2$ and $s_{y_2}^2$ are their variances, and $n_1$ and $n_2$ are the sample sizes used to estimate them.

During testing the author utilised scatter graphs like figure 3 below to display graphically the results from the tests. Figure 3 shows the scatter graph that was constructed as part of the test which attempted classification of the E.E.G. data using the mean feature. The accuracy achieved using this feature was 64%.

Figure 3. – Mean Scatter Graph

The next graph, figure 4, illustrates the results of a test examining standard deviation, with the accuracy of this feature standing at 61%.



Figure 4. – Standard Deviation Scatter Graph

Scatter graphs are described by Fisher & Holtom (1999) as useful for the presentation of "the relationship between two different types of information plotted on horizontal, x, and vertical, y, axis. You simply plot the point at which the values meet, to get an idea of the overall distribution of your data". Pallant (2001) is keen to point out that "the scatter graph also provides a general indication of the strength of the relationship between your two variables. If the relationship is weak, the points will be all over the place, in a blob type arrangement. For a strong relationship the points will form a vague cigar shape with a definite clumping of scores around an imaginary straight line".

6.2. K.N.N. Classification

The K.N.N. function is concerned with the computation of the minimum distance between the test data and the data used for training. Ripley (1996) defines test data as a "set of examples used only to assess the performance of a fully specified classifier", while training data is a "set of examples used for learning, that is to fit the parameters of the classifier". The K.N.N. belongs to the family of discriminative nonlinear classifiers. According to Lotte et al (2007), the main objective of this method is "to assign to an unseen point the dominant class among its k nearest neighbours within the training set". A metric distance may be used to find the nearest neighbour. "With a sufficiently high value of k and enough training samples, K.N.N. can approximate any function which enables it to produce nonlinear decision boundaries" Lotte et al (2007).
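A minimal K.N.N. sketch follows; the value k = 5, the Euclidean metric and the data are illustrative choices, not taken from the original tests.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.randn(80, 2)          # hypothetical training features
y_train = np.random.randint(0, 2, 80)     # 0 = left, 1 = right imagery
X_test = np.random.randn(20, 2)

# Each unseen point is assigned the dominant class among its k nearest
# training points, measured here with the Euclidean metric
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict(X_test))
```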

6.3. N.N. Classification

N.N.'s are widely used for classification "due to their non-linear model and parallel computation capabilities" Sriraja (2002). N.N.'s are described by Lotte et al (2007) as "an assembly of several artificial neurons which enables us to produce nonlinear decision boundaries". The N.N. used for the classification tests was the Multilayer Perceptron (M.L.P.), one of the more popular N.N.'s. It used 10 linear neurons for the input layer and 12 for the hidden layer. In an M.L.P. N.N., "each neuron's input is connected with the output of the previous layer's neurons whereas the neurons of the output layer determine the class of the input feature vector" Lotte et al (2007).
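As a sketch of such an architecture, the snippet below builds an M.L.P. with a 12-neuron hidden layer; scikit-learn infers the input layer from the 10 feature columns, loosely mirroring the 10-input, 12-hidden network described above. The solver settings and data are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.randn(100, 10)             # 10 input features per trial
y = np.random.randint(0, 2, size=100)    # 0 = left, 1 = right imagery

# One hidden layer of 12 neurons; the input layer size (10) is implied
# by the width of X
mlp = MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```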



M.L.P.'s are useful for classification: provided they have a satisfactory number of neurons and layers, "they can approximate any continuous function" Lotte et al (2007). They are commonly used as they can quickly adapt to different problems and situations. However, it must be noted that "the fact that M.L.P. are universal approximators makes these classifiers sensitive to overtraining, especially with such noisy and non-stationary data as E.E.G., therefore, careful architecture selection and regularization is required" Lotte et al (2007). The greater the number of neurons available or used, the greater the ability of the N.N. to learn; however, N.N.'s are susceptible to over-learning, so a lower number of neurons sometimes gives greater accuracy. Cross validation is useful as it is concerned with preventing the N.N. from learning too much and consequently ignoring new data when it is inputted. Usually training sets are small in size, as it is very time consuming and costly to collect "known cases for training and testing" Masters (1995). These small sets are often broken down further into relatively small subsets for both training and testing; however, this is not a desirable approach. Instead, one can avail of cross validation, a process which "combines training and validation into one operation" Masters (1995).

When constructing a prediction rule, reducing the error rate where possible is an important task. Efron (1983) describes an error rate as the "probability of incorrectly classifying a randomly selected future case", in other words the exception to the rule. Cross validation is often used to reduce this error rate and "provides a nearly unbiased estimate, using only the original data" Efron (1983).
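A sketch of k-fold cross validation, which combines training and validation by rotating the held-out fold; 5 folds, the classifier and the data are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.randn(100, 6)              # hypothetical feature matrix
y = np.random.randint(0, 2, size=100)

# Each of the 5 folds takes one turn as validation data while the other
# 4 train the model, yielding a nearly unbiased accuracy estimate
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("per-fold accuracy:", scores, "mean:", scores.mean())
```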

6. 3. 1. Euclidean Distance

Part of the N.N. algorithm examines the Euclidean distance, which is computed from the squared differences between the coordinates (i.e. locations) of a pair of objects. The Euclidean distance between two points $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$ can be denoted as

$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
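In NumPy the same distance is a one-liner (the points are hypothetical):

```python
import numpy as np

p = np.array([0.2, -0.1, 0.4])
q = np.array([0.1, 0.3, 0.0])
print(np.sqrt(np.sum((p - q) ** 2)))   # Euclidean distance d(p, q)
```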

7. PREDICTION

Frank et al (2001) defines a time series as "a sequence of vectors, x(t), t = 0, 1, …, where t represents elapsed time. Theoretically, x may be a value which varies continuously with t, such as a temperature". This method can be used in what is known as time series prediction, which involves the examination of past performance to predict future performance. This, according to Coyle et al (2004), can be used to improve classification accuracy. Their work uses a "novel feature extraction procedure which carries out self-organising fuzzy neural network based time series prediction, performing feature extraction in the time domain only". Using such a method in their studies allowed for classification accuracies in the region of 94%. They argue that the main advantage of this approach is that "the problem of specifying the neural network architecture does not have to be considered". Instead of adapting the parameters for individual users, the system can "self-organise the network architecture, adding and pruning neurons as required", just as the human body does.

The author carried out a number of tests using 6-step ahead prediction. The parameters for these tests were set as follows, unless otherwise stated:

● Data was trained and tested with x (trl3)
● Embedding Dimension = 6
● Time Lag = 1
● Cross Validation was not used
● Number of neurons available to the neural network = one layer of 6

The sketch after this list illustrates what these parameters mean in practice.
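This is a minimal sketch, assuming a synthetic stand-in signal: with embedding dimension 6 and time lag 1, each network input is a window of 6 lagged samples and the target is the value 6 steps ahead. The single-hidden-layer regressor only loosely mirrors the one-layer network and is not the author's implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

signal = np.sin(np.linspace(0, 60, 3000))   # stand-in for an E.E.G. trace
dim, lag, ahead = 6, 1, 6                   # embedding dim, time lag, horizon

# Build (window of dim lagged samples) -> (value `ahead` steps later) pairs
n = len(signal) - (dim - 1) * lag - ahead
X = np.array([signal[t : t + dim * lag : lag] for t in range(n)])
y = np.array([signal[t + (dim - 1) * lag + ahead] for t in range(n)])

split = int(0.7 * n)                        # train on the past, test on the rest
net = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000, random_state=0)
net.fit(X[:split], y[:split])
pred = net.predict(X[split:])
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
print("testing root mean square error:", rmse)
```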

All results were graphically displayed on a chart like that seen in figure 5 below.

Figure 5. – Shows the training data in blue and the test data in red, plotted as target and output values against time step t. The difference between these two lines is referred to as the root mean square error, or simply the error rate.

7. 1. One Layer Neural Network

The first test examined accuracy using a neural network with one layer of 6 neurons. This test was run 10 times and the average training and testing root mean square errors were calculated. The training root mean square error was recorded at 0.0324 and the testing root mean square error at 0.0313.

7. 2. Multi Layer Neural Network

The next test was conducted using the exact same parameters, except that the neural network was changed from a single layer network with 6 neurons to one that also has a hidden layer of 8 neurons. The results from this test were slightly worse than the previous, with training and testing root mean square errors of 0.0326 and 0.0314. The difference between the figures from test 1 and test 2 was extremely minute.



7. 3. Cross Validation

The next test was exactly the same as test 1, except that cross validation was used, to determine whether it would have a positive or negative effect. The training data scored slightly better with cross validation, at 0.0293 compared to the 0.0324 obtained in test 1. On the other hand, the testing data performed better in test 1, with 0.0313 rather than the 0.0317 found with cross validation.

7. 4. Left E.E.G. Data

A test was carried out which used trl3 to train the network and trl4 to test it. The training root mean square error was much the same as in previous experiments using the same parameters for the training data. The testing root mean square error, however, was much improved, with a result of 0.0240 compared to the 0.0313 obtained when trl3 was used for both training and testing.

7. 5. Right E.E.G. Data

Tests were also conducted using the right hand data. The N.N. was trained and tested with trr3. The error was considerably lower than that found in the tests on the left data using the same parameters: 0.0292 was recorded for the training root mean square error and 0.0281 for the testing root mean square error. The right data was also tested to see what effect testing the N.N. with trr4 instead of trr3 would have on performance. The training root mean square error stayed more or less the same and the testing root mean square error increased slightly to 0.0293.

8. OTHER METHODS THAT COULD BE USED FOR FEATURE EXTRACTION

There are many other methods that could be used and that offer satisfactory performance when it comes to feature extraction for B.C.I.'s.

8. 1. Amplitude And Phase Coupling Measure

One such approach, created by Wei et al (2007), is known as the ‘Amplitude and Phase Coupling Measure’. This method is concerned with "using amplitude and phase coupling measures, quantified by a nonlinear regressive coefficient and phase locking value respectively". Wei and his colleagues carried out studies utilising this approach, and the results obtained from the application of this feature extraction method were promising. The "averaged classification accuracies of the five subjects ranged from 87.4% to 92.9%" and the "best classification accuracies ranged between 84.4% and 99.6%". The conclusion reached from these studies is that "the combination of coupling and autoregressive features can effectively improve the classification accuracy due to their complementarities" Wei et al (2007).


8. 2. Combination Of Classifiers

In an effort to improve performance and accuracy, some researchers have begun using multiple classifiers to achieve the desired results. The author attempted this approach with the combination of mean, standard deviation and kurtosis, as well as activity, mobility and complexity; however, there are various other strategies that can be followed. These include boosting, voting and stacking, to name but a few. Boosting operates on the principle of cooperation, with "each classifier focusing on the errors committed by the previous ones" Lotte et al (2007).

Voting, on the other hand, works like a voting system. The different modules of the N.N. are "modeled as multiple voters electing one candidate in a single ballot election assuming the availability of votes' preferences and intensities. All modules are considered as candidates as well as voters. Voting bids are the output-activations of the modules forming the cooperative modular structure" Auda et al (1995). The candidate with the majority vote wins. According to Lotte et al (2007), "voting is the most popular way of combining classifiers in B.C.I. research, probably because it is simple and efficient". Another strategy used for combining classifiers is what is known as ‘stacking’. This method, according to Ghorbani & Owrangh (2001), "improves classification performance and generalization accuracy over single level cross-validation model".
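A sketch of the voting strategy using scikit-learn's VotingClassifier, where each member classifier casts one vote and the majority class wins; the three member classifiers and the data are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X = np.random.randn(100, 6)              # hypothetical feature matrix
y = np.random.randint(0, 2, size=100)

# Hard voting: each classifier casts one vote; the majority class wins
vote = VotingClassifier(
    estimators=[("lda", LinearDiscriminantAnalysis()),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("mlp", MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000))],
    voting="hard")
vote.fit(X, y)
print(vote.predict(X[:5]))
```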

8. 3. Multivariate Autoregressive Analysis (M.V.A.R.)

Studies have been conducted in the past based on the M.V.A.R. model. Pei & Zheng (2004) carried out such a study and report a classification accuracy of 88.57%. They describe the M.V.A.R. model as "the extension form of univariate A.R. model" and argue that "using the coefficients of M.V.A.R. model as EEG features is feasible".
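To give a flavour of autoregressive features, the sketch below fits a univariate A.R. model by least squares and uses its coefficients as a feature vector; the M.V.A.R. model extends the same idea to several channels jointly. The order, signal and function name are illustrative assumptions, not the method of Pei & Zheng.

```python
import numpy as np

def ar_coefficients(x, order=6):
    """Least-squares fit of x[t] = a1*x[t-1] + ... + a_order*x[t-order]."""
    rows = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    targets = x[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return coeffs                     # usable directly as E.E.G. features

x = np.random.randn(1024)             # stand-in for one channel
print(ar_coefficients(x))
```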

9. INSPIRATION FROM BIOLOGY

There is no doubt that inspiration for some of the classification and prediction techniques that we use today came from the world of biology. Shadbolt (2004) points out that "we see complexity all around us in the natural world – from the cytology and fine structures of cells to the organization of the nervous system . . . Biological systems cope with and glory in complexity – they seem to scale, to be robust and inherently adaptable at the system level . . . Nature might provide the most direct inspiration". The author shares the view of Bamford et al (2006) that "an attempt to imitate a biological phenomenon is spawning innovative system designs in an emerging alternative computational paradigm with both specific and yet unexplored potential".

9. 1. Classification And Object Recognition

Our brains are constantly classifying things in our everyday environment, whether we are aware of it or not. Classification is the process that is responsible for letting us determine what the objects around us are, i.e. a chair, a car, a person. It even allows us to recognise the different faces of the people with whom we come in contact. The brain is able to distinguish each specific object by examining its numerous features and does so with great speed and accuracy. Many systems seek to reproduce a similar means of classifying data, and these can be useful in nearly every kind of industry. Take, for example, the medical industry, in which classification plays a crucial role. Classification is used extensively for the identification of almost every kind of disease and illness. The process of diagnosis would be much more complex and time consuming if classification techniques were not applied to it.

9. 2. Self-Organisation

Computer systems, i.e. neural networks, can be constructed on the same principles and concepts of self-organisation found in humans. The term self-organisation is used to describe the process by which "internal structures can evolve without the intervention of an external designer or the presence of some centralised form of internal control. If the capacities of the system satisfy a number of constraints, it can develop a distributed form of internal structure through a process of self-organisation" Cilliers (1998). Self-organising maps are widely used as a method for feature extraction and data mapping as well as prediction. Self-organising neural networks can encompass a time series prediction element, often with huge success. These can be extremely useful for predicting trends in different areas such as weather forecasting and marketing; the list is endless.

The various prediction algorithms available work in the same way as the human nervous system. These programs aim to replicate the ‘anticipatory neural activity’ that occurs in the body and reproduce it in a system. Take, for example, a recently developed financial decision system. This system looked at how taking the ‘anticipatory neural activity’ element into consideration could help people make decisions that are more likely to be successful and thus less risky. When people are making financial decisions, they can often opt for an option that seems like the irrational one, and the reasons for this irrational thought had not previously been known. Kuhnen & Knutson (2005) examined "whether anticipatory neural activity would predict optimal and suboptimal choices in a financial decision-making task". They observed that the nucleus accumbens was more active when risky choices were being made and the anterior insula when riskless options were being followed. From their findings they concluded that particular neural circuits linked to anticipatory affect would either hinder or encourage an individual to go for either a risky or riskless choice. They also found that over-activation of these circuits is more likely to cause investing mistakes and that "thus, consideration of anticipatory neural mechanisms may add predictive power to the rational actor model of economic decision making". The system was able to replicate relatively successfully the way in which humans make investment decisions.

10. FUTURE WORK

The combination of classifiers is gaining popularity and becoming more widely used as a means of improving accuracy and performance. From researching this topic one can see that most publications deal with one particular classifier, with little effort being made to compare one classifier to the next. Studies could be undertaken to compare classifiers against particular criteria. There is also considerable room for improvement in the algorithms that are available at the moment. A deeper understanding of the human brain and how it classifies and predicts should lead to the creation of more biologically plausible solutions.


11. CONCLUSION

This document addressed the various issues pertaining to feature extraction, classification and prediction. It focused on the application of these techniques to unlabelled E.E.G. data in an effort to discriminate between left and right hand imagery movements. It briefly reflected on the need for brainwave signal preprocessing. An in-depth analysis of the feature extraction and classification process was carried out and the results highlighted. Classification algorithms were examined, namely L.D.A., K.N.N. and N.N. This document looked at prediction and its effect on accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, a number of the methods not tested were discussed. This document also highlighted the fact that inspiration for the design of feature extraction, classification and prediction systems often comes from nature. Finally, thought was given to future work.

From studying the E.E.G. data and carrying out various tests using numerous parameters and classifiers, it was concluded that a combination of the three Hjorth features, activity, mobility and complexity, gives the highest level of accuracy. The author found that the descriptive classifiers drawn upon are not suitable for E.E.G. data, as they do not provide a satisfactory level of separation; they work better with simple data. It was found that feature extraction and classification enjoyed more success using cross validation and a multiple layer N.N., in contrast to prediction, which was best suited to a single layer N.N. without cross validation. The greatest level of accuracy recorded using the combined Hjorth features was 74%. Separability of the left hand imagery motor signal from the right was greater at 7 seconds than at 6. Accuracy was improved by specifying the data window extents of s=680 and e=700. Prediction tests indicated that left hand data is more easily separated and classified than right hand data. The author also found that the N.N. performed better when different data was used for training and testing.

New methods of feature extraction, classification and prediction will undoubtedly be discovered as the understanding of the human body evolves. Research on this topic extends over multiple disciplines, and it is therefore likely that "insights from one subject will inform the thinking in another" Shadbolt (2004). Advances made in the field of science often result in complementary gains in the area of computing, and vice versa. All the processes discussed in this document can have a huge impact on the lives of individuals, businesses and society at large. Many people suffering from motor impairments rely heavily on B.C.I. technologies that incorporate classification and prediction techniques for everyday living, and further advances will undoubtedly help create a safer and more inclusive society. Classification and prediction can also be an integral part of business decisions: a manager may consult a computer system when making risky decisions such as whether to invest in a new product or how much stock to buy. Society at large benefits too, as these processes are widely used for disease and illness diagnosis as well as for weather forecasting and storm prediction, to name but a few applications. Consequently, it is safe to assume that this field of study will remain a popular one in the years to come and will make many more advances.


BIBLIOGRAPHY

Atry, F., Omidvarnia, A. H. & Setarehdan, S. K. (2005) "Model Based E.E.G. Signal Purification to Improve the Accuracy of the B.C.I. Systems" Proceedings of the 13th European Signal Processing Conference.

Auda, G., Kamel, M. & Raafat, H. (1995) "Voting Schemes for Cooperative Neural Network Classifiers" Neural Networks 3(3), pp. 1240-1243, Proceedings of the IEEE International Conference on Neural Networks.

Bamford, S., Murray, A. & Willshaw, D. J. (2006) "Synaptic Rewiring in Neuromorphic VLSI for Topographic Map Formation" [Internet], Date Accessed: 15 April 2007, Available From: http://www.see.ed.ac.uk/~s0454958/interimreport.pdf.

Cilliers, P. (1998) "Complexity and Postmodernism: Understanding Complex Systems" London: Routledge.

Coleman, A. M. (2003) "Oxford Dictionary of Psychology" Oxford: Oxford University Press.

Coyle, D., Prasad, G. & McGinnity, T. M. (2004) "Extracting Features for a Brain-Computer Interface by Self-Organising Fuzzy Neural Network-Based Time Series Prediction" Proceedings of the 26th Annual International Conference of the IEEE EMBS.

Ebrahimi, T., Vesin, J. M. & Garcia, G. (2003) "Brain-Computer Interface in Multimedia Communication" IEEE Signal Processing Magazine 20(1), pp. 14-24.

Efron, B. (1983) "Estimating the Error Rate of Prediction Rules: Improvement on Cross-Validation" Journal of the American Statistical Association 78(382), pp. 316-331.

Fisher, E. & Holtom, D. (1999) "Enjoy Writing Your Science Thesis or Dissertation – A Step by Step Guide to Planning and Writing" London: Imperial College Press.

Frank, R. J., Davey, N. & Hunt, S. P. (2001) "Time Series Prediction and Neural Networks" Journal of Intelligent and Robotic Systems 31(1-3), pp. 91-103.

Ghorbani, A. A. & Owrangh, K. (2001) "Stacked Generalization in Neural Networks: Generalization on Statistically Neutral Problems" Neural Networks 3, pp. 1715-1720, Proceedings of the IJCNN International Joint Conference on Neural Networks.

Kohavi, R. (1995) "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection" Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).

Kuhnen, C. M. & Knutson, B. (2005) "The Neural Basis of Financial Risk Taking" Neuron 47(5), pp. 763-770.

Lotte, F., Congedo, M., Lecuyer, A., Lamarche, F. & Arnaldi, B. (2007) "A Review of Classification Algorithms for EEG-Based Brain-Computer Interfaces" Journal of Neural Engineering 4, pp. R1-R13.

Masters, T. (1995) "Neural, Novel & Hybrid Algorithms for Time Series Prediction" New York: John Wiley & Sons Inc.

Miranda, E. & Brouse, A. (2005) "Toward Direct Brain-Computer Musical Interfaces" Proceedings of the 2005 Conference on New Interfaces for Musical Expression, pp. 216-219.

Pallant, J. (2001) "S.P.S.S. Survival Manual – A Step By Step Guide To Data Analysis Using S.P.S.S." Berkshire: Open University Press.

Pei, X. M. & Zheng, C. X. (2004) "Feature Extraction and Classification of Brain Motor Imagery Task Based on MVAR Model" Machine Learning and Cybernetics 6, pp. 3726-3730, Proceedings of the 3rd International Conference on Machine Learning and Cybernetics.

Ripley, B. D. (1996) "Pattern Recognition and Neural Networks" Cambridge: Cambridge University Press.

Shadbolt, N. (2004) "From the Editor in Chief: Nature-Inspired Computing" IEEE Intelligent Systems 19(1), pp. 2-3.

Sriraja, Y. (2002) "E.E.G. Signal Analysis for Detection of Alzheimer's Disease" Thesis, Texas Tech University, Date Accessed: 11 April 2007, Available From: http://webpages.acs.ttu.edu/ysriraja/MSthesis/Thesis.pdf.

Tabachnick, B. G. & Fidell, L. S. (1996) "Using Multivariate Statistics" 3rd ed. New York: Harper Collins.

Wei, Q., Wang, Y., Gao, X. & Gao, S. (2007) "Amplitude and Phase Coupling Measures for Feature Extraction in an E.E.G.-Based Brain-Computer Interface" Journal of Neural Engineering 4, pp. 120-129.

Wolpaw, J. R., Birbaumer, N., McFarland, D. J., Pfurtscheller, G. & Vaughan, T. M. (2002) "Brain-Computer Interfaces for Communication and Control" Clinical Neurophysiology 113(6), pp. 767-791.

Wolpert, D. H. (1992) "Stacked Generalization" Neural Networks 5(2), pp. 241-259.