

Automatic Spike Sorting For Real-time Applications

Daniel J. Sebald¹ and Almut Branner²
¹Consultant, Madison, WI, USA
²Cyberkinetics Neurotechnology Systems, Inc., Salt Lake City, UT, USA

Abstract— Real-time applications of spike sorting, e.g., neural decoding, generally require high numbers of channels, and manual spike sorting methods are extremely time consuming, subjective and, generally, do not perform well for low signal-to-noise ratio (SNR) signals. Hence, an automatic method is sought which is efficient and robust in both detecting neural spikes and constructing a classification model of spikes arriving with underlying statistics that are time-varying.

We present such a system under study for application with a microelectrode array of 96 channels with typically three or four units (i.e., neurons) per channel. There are several novel elements of the system including filtering the neural signal to a frequency band having better SNR for spike detection, a fixed feature space for simple implementation yet adequate resolving capabilities, a Gaussian statistics model also for simple implementation as a log-likelihood classifier, a systematic approach to determining the number of clusters in a pattern recognition problem, and a robust linear discriminant, histogram-based technique for determining boundaries between feature space clusters.

Index Terms— Neural spike sorting, unsupervised classification, pattern recognition, clustering methods, brain-machine interface.

I. INTRODUCTION

Penetrating microelectrodes used to extracellularly record neural activity in the brain or other parts of the nervous system frequently record spikes from multiple cells or units [1]. Because of the information contained in the spatial and temporal firing patterns of individual neurons, separating mixtures of neurons—a process called spike sorting, which is a form of classification—is essential for neurophysiological studies as well as for other neural prosthetic, diagnostic or therapeutic applications. Much of the automatic spike sorting research has been done either offline or has yet to be integrated efficiently and effectively into online, real-time processing systems [1–4]. Furthermore, one eventual goal is to create wireless, fully implantable devices for long term research as well as human clinical applications. Portable and implantable devices have limited computational resources and memory. For this reason, seeking efficient signal processing and lower bandwidth solutions is critical. As with many signal processing problems, there is a balance sought between reduced signal processing and adequate information retention.

Spike sorting is the process of discerning neural action potential waveforms from multiple sources (neurons) detected by a single electrode based on waveform shape and amplitude [5, 6]. There is typically significant noise present on microelectrodes because of the small electric field potential a single neuron generates upon discharge. The SNR generally can vary from 0 to 20 dB over the duration of one action potential, which is approximately 1–2 ms in duration.

Fig. 1. Gaussian mixture model.

The process of sorting involves building a model with little a priori information about what lies near the tip of an electrode. We are only equipped with the notion of a typical neural spike waveform. The number of neurons and statistical firing patterns are unknown. We have sought an efficient method that we believe is adequate for classifying neural waveforms in such a noisy setting and have decided upon a fixed feature-space based, Gaussian mixture model.

    II. GAUSSIAN MIXTURE MODEL

Figure 1 illustrates the feature space (base) and the Gaussian mixture model (top) [7, 8]. The x/y-plane of the feature space holds parameter values representing some discernible aspect of the neural spikes, discussed later. The z-dimension is the joint probability density function for the sample space, composed of a finite sum of conditional Gaussian densities

    f_n(x) = \sum_{i=1}^{N} P_n(c_i) \, f_n(x|c_i),    (1)

where n is the time index, x the feature space variable, N the number of units, P_n(c_i) the probability that a neural spike originated from class (i.e., neuron or unit) c_i, and f_n(x|c_i) the conditional probability density of feature-space transformed neural spikes for class c_i. The probabilities and densities are time-varying and, in fact, P_n(c_i) conveys a portion of the information sought for decoding neural activity, the other important information being firing rate [9].


Fig. 2. Subsystem for detecting spikes in noise. All filters used are second order Butterworth filters for efficiency.

In this unsupervised and time-varying setting, we have little option but to assume equiprobable outcomes when classifying, i.e., P_n(c_i) = P(c_i) = 1/N and

    f_n(x|c_i) = f(x|c_i) = \frac{1}{(2\pi)^{N/2} |\Sigma_i|^{1/2}} \, e^{-(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)/2}.    (2)

Further information may be available in the form of intended movement in the case of brain-machine interfaces via training exercises or long term statistics, but these topics are for future research.

The method we use to classify feature space vector samples x_k is the log-likelihood of (2), which results in a fairly efficient squared Mahalanobis distance problem

    \hat{c}_k = \arg\max_i f(x_k|c_i)    (3)
            = \arg\max_i \left( -\tfrac{N}{2}\log(2\pi) - \tfrac{1}{2}\log|\Sigma_i| - \tfrac{1}{2}(x_k-\mu_i)^T \Sigma_i^{-1} (x_k-\mu_i) \right)    (4)
            = \arg\min_i \left( \log|\Sigma_i| + (x_k-\mu_i)^T \Sigma_i^{-1} (x_k-\mu_i) \right),    (5)

where k indexes spike triggers (i.e., not the same as time index n). Based on second order statistics, (5) implies decision boundaries which are piecewise conic sections. The classifier output \hat{c}_k in conjunction with timing information can be used for neurophysiological studies and applications such as real-time neural decoding and device control [9].
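As a minimal illustration only (not the authors' embedded implementation), the decision rule (5) can be evaluated per spike with a few lines of NumPy; the function name and data layout are our own:

```python
import numpy as np

def classify_spike(x_k, means, covs):
    """Assign a 2-D feature-space sample to the unit minimizing
    log|Sigma_i| + squared Mahalanobis distance, as in (5).
    `means` and `covs` hold the per-unit statistics of (6) and (7)."""
    scores = []
    for mu_i, sigma_i in zip(means, covs):
        d = x_k - mu_i
        # Solve Sigma_i * v = d rather than explicitly inverting the 2x2 covariance.
        maha = float(d @ np.linalg.solve(sigma_i, d))
        scores.append(np.log(np.linalg.det(sigma_i)) + maha)
    return int(np.argmin(scores))
```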

With model parameters known (or assumed), further estimates of the Gaussian mean and covariance are computed after collecting a block of K samples for individual model components

    \mu_i[m_i] = \frac{1}{|B_{m_i}|} \sum_{k \in B_{m_i}} x_k    (6)

and

    \Sigma_i[m_i] = \frac{1}{|B_{m_i}| - 1} \sum_{k \in B_{m_i}} \left( x_k - \mu_i[m_i] \right) \left( x_k - \mu_i[m_i] \right)^T,    (7)

where m_i indexes the evolution of the block statistics for class i, B_{m_i} is the set of feature space samples belonging to class i for block m_i, and |B_{m_i}| is the cardinality of set B_{m_i}. We use the general notation for statistics in (6) and (7) because the set B_{m_i} may not necessarily be the exact group of K block samples for unit i as output by classifier (5). This is the case because cluster boundaries may move as new samples are gathered and model statistics are updated according to (6) and (7). Hence, the algorithm will adapt to small changes in waveform averages due to electrode movement over time and will rectify the incorrect placement of the boundaries that may occur occasionally, as seen in Fig. 7 between the red and green clusters. The histogram technique discussed later is the global data algorithm which achieves this adaptive mechanism.
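A minimal sketch of the block update (6)–(7), assuming the block's feature samples and their (possibly reassigned) labels are available as NumPy arrays; names are illustrative:

```python
import numpy as np

def update_block_statistics(samples, labels, n_units):
    """Recompute per-unit mean and covariance from one block of feature
    samples, following (6) and (7). `samples` is a (K, 2) array and
    `labels` the (possibly reassigned) unit index of each sample."""
    means, covs = [], []
    for i in range(n_units):
        block = samples[labels == i]                      # B_{m_i}
        means.append(block.mean(axis=0))                  # (6)
        covs.append(np.cov(block, rowvar=False, ddof=1))  # (7), unbiased
    return means, covs
```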

    III. DETECTION AND TRANSFORMATION

A. High Frequency Triggering

The method of detecting neural spikes chosen here is different from conventional thresholding methods of triggering upon the waveform passing below or above a predetermined amplitude value. Figure 2 shows the detection subsystem components, which include elements similar to envelope detection used in communication systems.

Typically there is very strong local field potential (LFP) energy in the monitored neural signal below 250 Hz. Above 250 Hz there is important waveform content useful for classification, yet there remains noticeable LFP noise which tapers downward with increasing frequency. Above 1 kHz there remains transient neural spike energy up to approximately 5 kHz while the LFP noise energy continues its decreasing trend with respect to increasing frequency. Naturally there is a white noise component given the large amplification of the signals.

Because of the above spectral characteristics, the initial filtering stage is between 250 Hz and 5 kHz, followed by decimation for the sake of computation reduction. The neural signal is then split into a high frequency subband, used for spike detection and subsequent classification, and a low frequency subband, which is only used for spike classification. This is done for two reasons: 1) The higher frequency range has improved SNR for triggering, i.e., we seek to trigger on high frequency transients at the leading edge of the nominal neural spike. 2) Removing the lower frequency transient produces shorter duration spikes for improved time resolution when there is the possibility of temporal overlap of spikes from multiple units.

In addition to the restricted frequency band for triggering, we have employed envelope detection as a means to capture both negative and positive leading edge spikes. Another feature of the subsystem in Fig. 2 (the lower path) is an estimate of l1 signal power used to generate a trigger threshold that is adapted every four seconds. Additional multirate processing is done to decrease computation consumption in these stages.
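The sketch below, using SciPy's Butterworth routines, illustrates the kind of chain Fig. 2 describes. The decimation factor, threshold multiplier, and omission of the four-second adaptation schedule are our assumptions, not the authors' settings.

```python
import numpy as np
from scipy.signal import butter, sosfilt, decimate

def detect_spikes(raw, fs=30000):
    """Sketch of a Fig. 2-style chain: bandpass, decimate, take a
    high-frequency subband, rectify and lowpass for an envelope, and
    trigger on threshold crossings derived from an l1 power estimate."""
    # 250 Hz - 5 kHz bandpass (second order Butterworth, as in the text)
    sos_bp = butter(2, [250, 5000], btype="bandpass", fs=fs, output="sos")
    x = decimate(sosfilt(sos_bp, raw), 2)        # reduce computation
    fs_d = fs / 2

    # High-frequency subband used for triggering
    sos_hp = butter(2, 1000, btype="highpass", fs=fs_d, output="sos")
    hi = sosfilt(sos_hp, x)

    # Envelope: full-wave rectification followed by a 1.5 kHz lowpass
    sos_env = butter(2, 1500, btype="lowpass", fs=fs_d, output="sos")
    env = sosfilt(sos_env, np.abs(hi))

    # Base threshold from the mean absolute value (l1 power); the multiplier
    # and the four-second adaptation of the paper are not reproduced here.
    threshold = 3.0 * np.mean(np.abs(hi))

    # Trigger on upward threshold crossings of the envelope
    crossings = np.flatnonzero((env[1:] >= threshold) & (env[:-1] < threshold)) + 1
    return crossings, fs_d
```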

B. Fixed Feature Space

Several researchers have sought the best feature space for discerning between spike waveforms [2–4].


Fig. 3. Subsystem for correlating bandpass filtered spike waveform with nominal waveform filtered to same band.

Here we take a less optimal but more efficient approach, which is to seek a fixed, two-dimensional feature space that in most cases is adequate for separating units into clusters. The most obvious advantage of a fixed feature space is that there is no need for constructing the projections on each channel separately and adaptively. This reduces processing requirements as well as eliminates an optimization algorithm. Also, a fixed projection space is something that a layman (e.g., clinician) can grow accustomed to and can have confidence that the fixed projection is in fact adequate in most cases, whereas an always changing feature space may introduce doubt about whether the projection is over- or under-sensitive in one direction or another.

The two dimensional projection chosen utilizes some processing we have already done. That is, we have filtered the waveform to a high frequency band to increase SNR and temporal resolution, so we choose this as one signal basis of our feature space, and the low frequency waveform serves as another basis of our feature space, as seen in Fig. 3. The two dimensional, random feature space vector is constructed as x_k = [\phi_1[k] \ \phi_2[k]]^T, where

    \phi_i[k] = \sum_{j=0}^{J-1} x_i[n_k + j] \, w_i[j]    (8)

is the projection (correlation) of the captured, filtered, and aligned neural signal x_i, starting at time n_k, onto the component basis w_i. In this case, fixed kernels, Fig. 4, were calculated as the ensemble average of previously recorded, filtered waveforms. Note the shorter temporal duration of the high frequency kernel w_1 compared to w_2 and its association with the triggering mechanism mentioned earlier.
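As an illustration of (8), a sketch that forms the two-dimensional feature vector from the high- and low-frequency subband snippets of one aligned spike; the kernels w1 and w2 are assumed to be precomputed ensemble averages, and the function is our own naming, not the authors' code:

```python
import numpy as np

def feature_vector(spike_hi, spike_lo, w1, w2):
    """Form the 2-D feature sample x_k = [phi_1, phi_2]^T of (8) from the
    high- and low-frequency subband snippets of one aligned spike.
    w1 and w2 are fixed kernels as in Fig. 4, assumed precomputed as
    ensemble averages of previously recorded, filtered waveforms."""
    phi_1 = float(np.dot(spike_hi[:len(w1)], w1))   # correlation at the aligned lag
    phi_2 = float(np.dot(spike_lo[:len(w2)], w2))
    return np.array([phi_1, phi_2])
```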

    C. Alignment

One cannot overemphasize the importance of proper alignment when computing \phi_1 and \phi_2, for otherwise small vestigial clusters appear in the feature space. Ideally, one would use a matched filter (correlation) and select the index at peak output in the trigger region. However, this is wasteful, computation-wise. Hence, we approximate with a three point correlation near the peak in the captured high frequency component of the waveform. This approach eliminates vestigial clusters that we have otherwise seen.
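One possible reading of the three point correlation is sketched below: search a one-sample neighborhood around the detected peak for the shift maximizing correlation with the high-frequency kernel. The search window, tie-breaking, and indexing are assumptions about details the paper leaves open.

```python
import numpy as np

def align_spike(hi_snippet, w1, search=3):
    """Search a small neighborhood (here three shifts) around the detected
    peak of the high-frequency snippet for the offset maximizing the
    correlation with kernel w1, and return the aligned start index."""
    peak = int(np.argmax(np.abs(hi_snippet)))
    best_shift, best_score = 0, -np.inf
    for shift in range(-(search // 2), search // 2 + 1):
        start = peak + shift
        seg = hi_snippet[start:start + len(w1)]
        if start < 0 or len(seg) < len(w1):
            continue                      # shift falls outside the capture
        score = float(np.dot(seg, w1))
        if score > best_score:
            best_shift, best_score = shift, score
    return peak + best_shift
```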

Fig. 4. Kernels used to transform captured, aligned waveforms into the feature space; a high frequency component and a low frequency component of a nominal neural spike as passed through frequency splitting, Butterworth filters.

IV. CLASSIFICATION AND MODEL CONSTRUCTION

A. Pairwise Cluster Dividing and Combining

To determine the number of clusters in an unsupervised fashion, we propose a novel approach called pairwise cluster dividing and combining (PCDC), which systematically examines a single spike cluster and divides it into two clusters, if warranted, and vice versa examines a pair of spike clusters and combines them into one cluster, if warranted. The technique includes building a histogram based on projections onto either principal component or linear discriminant axes. The concept is to systematically examine individual clusters, i.e., peaks in the histogram, and eventually converge to an appropriate number of clusters.

The inherent questions of the PCDC method are 1) Does the data presented constitute one cluster, or more than one cluster? and 2) If more than one cluster, where is a good place to split the data for an acceptable boundary? Given methods for addressing both questions, as described later in this section, we may add/remove a unit to/from the model and compute its associated statistics \mu_i and \Sigma_i in (6) and (7) based upon the reclassification according to the boundary.

We begin by assuming only one cluster, or detectable firing neuron, is present, simply because it is an easy place to start in terms of building an initial mean and covariance for a group of clusters. After the collection of a block of feature samples, typically between 200 and 300, the initial single cluster may be split as a result of the histogram peak hypothesis test. The resulting subclusters may later split themselves and so on. At some point in time a random, aberrant splitting may occur, resulting in too many clusters. However, a combining mechanism, which acts analogously to the division mechanism, will eventually correct such an occurrence.

For the combination test, we pick the cluster nearest the one of interest and consider their combined histogram structure. The inherent question then is once again, does the data presented constitute one cluster, or more than one cluster? If the histogram peak profile suggests only one cluster is present, then the algorithm combines those clusters and computes statistics for the single unit.
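A schematic of this divide-then-combine control flow is sketched below, assuming split and combine tests built on the histogram technique of Section IV-B; the greedy nearest-neighbor merge order and the callable interfaces are our assumptions, not the authors' implementation.

```python
import numpy as np

def pcdc_pass(clusters, split_cluster, should_combine):
    """One pass of a dividing-then-combining loop. `clusters` is a list of
    per-unit sample arrays; `split_cluster(data)` returns one or two arrays
    (split at a histogram valley, if warranted); `should_combine(a, b)`
    returns True when the combined histogram of two clusters is unimodal."""
    # Dividing step: try to split every cluster.
    divided = []
    for data in clusters:
        divided.extend(split_cluster(data))

    # Combining step: greedily merge a cluster with its nearest neighbor
    # whenever the pair looks like a single cluster.
    merged = list(divided)
    changed = True
    while changed and len(merged) > 1:
        changed = False
        centers = [c.mean(axis=0) for c in merged]
        for i in range(len(merged)):
            others = [j for j in range(len(merged)) if j != i]
            dists = [np.linalg.norm(centers[i] - centers[j]) for j in others]
            j = others[int(np.argmin(dists))]
            if should_combine(merged[i], merged[j]):
                merged[i] = np.vstack([merged[i], merged[j]])
                del merged[j]
                changed = True
                break
    return merged
```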


PCDC is an adaptable technique, constantly readjusting the number of units. Some applications, e.g., neural decoding algorithms, may require the number and order of units to be constant, in which case the occasional appearance of a new unit poses a problem. To deal with this, we have the option to freeze the number of units, or additionally freeze the update of statistics, after a training period which depends on the firing rate characteristics of the data. For example, decoding and control applications utilizing movement related signals generally require two to three minutes of training, while epilepsy applications utilizing interictal signals from infrequently firing neurons may require up to 20 minutes of training.

B. Clusters as Peaks in Histogram

The question we have posed in the previous section about the number of clusters is reminiscent of work originally done by Fisher [10] in choosing the best direction to project multidimensional data onto a linear subspace. That is, for the linear discriminant inner product

    y = w^T x,    (9)

Fisher had shown, given two classes known a priori, the optimum cluster separation projection is [11]

    w_f = S_W^{-1} (\mu_1 - \mu_2),    (10)

where the scatter matrix S_W \equiv \Sigma_1 + \Sigma_2. Our problem is similar, but involves data not classified a priori and possibly more than two clusters. Initially, when the number of clusters is unknown, we turn to principal components, which gives the direction of most variation, on the premise that if the group of data represents separate clusters then most variation will appear along the direction connecting the means of individual clusters, provided the variances of feature space projections \phi_1 and \phi_2 are of approximately equal scale.

Let v_{maj} be the principal (major) eigenvector of the eigendecomposition of the covariance matrix \Sigma, i.e., that which is associated with the larger of the two eigenvalues of the decomposition for two dimensional data. Simply choose

    w_p = v_{maj}    (11)

as the projection direction, and construct univariate data y_k = w_p^T x_k for k \in B_{m_i}. Figure 5 illustrates the idea of projecting feature space data onto the principal component lying along the line between clusters.
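A sketch of the projection in (11) using an eigendecomposition of the sample covariance; this is standard principal component computation, not code from the paper:

```python
import numpy as np

def principal_projection(samples):
    """Project 2-D feature samples onto the major eigenvector of their
    covariance matrix, as in (11), yielding the univariate data y_k that
    is subsequently histogrammed."""
    cov = np.cov(samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    w_p = eigvecs[:, -1]                     # v_maj, the major eigenvector
    return samples @ w_p                     # y_k = w_p^T x_k
```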

Given the univariate data, the next step is to normalize the data along the linear subspace so that it maps to the resolution of the histogram, N_{bin}. We leave out the outer four samples (two on each end) as potential outliers, then map according to the affine relationship

    z_k = m y_k + b,    (12)

where the scale m = N_{bin}/(y_{max} - y_{min}) is the inverse of the bin width, and the offset b = (N_{bin} - m(y_{min} + y_{max}))/2. With this translation, rounding downward to the nearest integer gives the bin with which the vector sample x_k should be associated. The resulting histogram for the data of Fig. 5 appears in Fig. 6, where multiple peaks are evident.

Fig. 5. Projection direction and histogram span based upon principal component linear discriminant.

Fig. 6. Resulting 20 bin histogram for data projected onto principal component.

The hypothesis test for determining whether a histogram represents one or more than one cluster needs to be robust to a variety of possible distribution characteristics, such as overlapping Gaussian clusters or well separated clusters. For example, the currently implemented level-crossing based strategy is to declare a valley when the bin count falls at least, say, 30% below a peak, and a peak when the bin count rises, say, 130% above a valley. If a valley descends to zero, the next peak has to rise above at least, say, 1.6% of the total number of samples in the histogram.

The histogram allows choosing an inherent boundary as the valley separating peaks, as shown in Fig. 6. Herein lies the means, alluded to in Section II, of reclassifying sample points before adjusting the Gaussian mixture model statistics, and why (6) and (7) have general notation.
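A possible implementation of the level-crossing valley test, with the 30%, 130%, and 1.6% figures from the text as defaults; exactly how these thresholds interact is our interpretation, not a specification from the paper.

```python
import numpy as np

def find_valley(counts, drop=0.30, rise=1.30, zero_floor=0.016):
    """Level-crossing scan over histogram bin counts: declare a valley once
    the count falls at least `drop` (30%) below the running peak, and accept
    a second peak once the count rises `rise` (130%) above that valley; when
    the valley is zero, the next peak must instead exceed `zero_floor` (1.6%)
    of all samples. Returns the valley bin index, or None for one cluster."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    peak = counts[0]
    valley, valley_idx = None, None
    for i, c in enumerate(counts):
        if valley is None:
            peak = max(peak, c)               # still climbing the first peak
            if c <= (1.0 - drop) * peak:
                valley, valley_idx = c, i     # dropped far enough: a valley
        else:
            if c < valley:
                valley, valley_idx = c, i     # valley keeps deepening
            needed = zero_floor * total if valley == 0 else (1.0 + rise) * valley
            if c >= needed and c > valley:
                return valley_idx             # second peak confirmed
    return None
```

The returned valley bin maps back to a boundary in the projected feature space by inverting (12).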

V. PRELIMINARY RESULTS

The methods presented in this paper have been implemented on a commercially available real-time neural processor and have been tested using 30 kHz sampled human data acquired from a microelectrode array placed in the motor cortex [9]. Figure 7 shows the resulting clusters for a very active channel having four units. Settings were configured for a 250 sample point block size, and the system was run for 60 s, at which point the number of clusters was frozen.

The boundaries of Fig. 7 are as we might expect for such a mixture given the feature space data. Some related captured spikes for this model are shown in Figs. 8 and 9, which illustrate some interesting details. Several points are worth noting:


Fig. 7. Pattern space for a single electrode channel after sorting. The circle represents a user-defined noise cluster.

Fig. 8. Individual, sorted unit traces corresponding to cluster model illustrated in Fig. 7.

1) The temporal waveforms of Fig. 8 illustrate how very active channels often have spikes landing in the tail of previous spikes. Yet, detection and classification do not seem to be significantly affected by such occurrences.

2) Figure 9 shows how similar unit 3 (blue) and unit 4 (purple) are and how difficult this problem might be with temporal techniques like time amplitude windows. Furthermore, the fixed feature space we have chosen yields reasonable cluster separation, as illustrated in Fig. 7.

3) In Fig. 7, unit 2 (green) extends slightly into what we assume should be the cluster of unit 1 (red). This is a consequence of the fact that unit 2 includes several noisy waveforms resembling outliers and does not precisely fit a Gaussian model in the pattern space. A different classification model, such as [2], could be used with the same automated techniques afforded by PCDC and histogram analysis with little alteration.

VI. CONCLUSIONS

The real-time automatic spike sorting methods presented here have shown robust classification on actual human cortical action potentials as implemented on the Cyberkinetics NeuroPort(TM) System. The method does suffer from the fact that successive dividing occurs only as new blocks of data appear.

Fig. 9. Ensemble of sorted unit traces corresponding to cluster model illustrated in Fig. 7. (Axes: amplitude x in µV versus time t in ms.)

For low firing rate neurons this can mean slow model building. Yet, this training period is fast compared to manually constructing a model for 96 channels. Furthermore, future research can explore reusing the block of previously captured data if too much time has elapsed. The histogram technique is essential to the algorithm, as it introduces global analysis of data rather than sample-by-sample adaptation. Computing histograms is as efficient as, say, computing statistics. The PCDC method is novel as far as we are aware and may have applications in pattern recognition scenarios outside of spike sorting.

REFERENCES

[1] M. S. Lewicki, "A review of methods for spike sorting: the detection and classification of neural action potentials," Network: Comput. Neural Syst., vol. 9, pp. R53–R78, 1998.

[2] S. Shoham, M. R. Fellows, and R. A. Normann, "Robust, automatic spike sorting using mixtures of multivariate t-distributions," Journal of Neuroscience Methods, vol. 127, pp. 111–122, 2003.

[3] M. Sahani, "Latent variable models for neural data analysis," Ph.D. dissertation, Computation and Neural Systems, California Institute of Technology, Pasadena, 1999.

[4] G. Santhanam, M. Sahani, S. I. Ryu, and K. V. Shenoy, "An extensible infrastructure for fully automated spike sorting during online experiments," in Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, 2004, pp. 4380–4384.

[5] C. M. Gruner and D. H. Johnson, "Comparison of optimal and suboptimal spike sorting algorithms to theoretical limits," Neurocomputing, vol. 38–40, pp. 1663–1669, Jun. 2001.

[6] S. M. Bierer and D. J. Anderson, "Multi-channel spike detection and sorting using an array processing technique," Neurocomputing, vol. 26–27, pp. 947–956, Jun. 1999.

[7] G. J. McLachlan and D. Peel, Finite Mixture Models. New York: Wiley, 2000.

[8] K. H. Kim, "A fully-automated neural spike sorting based on projection pursuit and Gaussian mixture model," in IEEE Conf. on Eng. in Med. & Bio., Arlington, VA, March 16–19, 2005, pp. 151–154.

[9] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn, and J. P. Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia," Nature, vol. 442, no. 7099, pp. 164–171, July 13, 2006.

[10] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179–188, 1936.

[11] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: John Wiley & Sons, 2001.
