information-based modeling of event-related brain dynamicsscott/pdf/onton_makeig_pbr06.pdf ·...

Neuper & Klimesch (Eds.)Progress in Brain Research, Vol. 159ISSN 0079-6123Copyright r 2006 Elsevier B.V. All rights reserved

CHAPTER 7

Information-based modeling of event-related braindynamics

Julie Onton! and Scott Makeig

Swartz Center for Computational Neuroscience, University of California at San Diego, La Jolla, CA 92093-0961, USA

Abstract: We discuss the theory and practice of applying independent component analysis (ICA) toelectroencephalographic (EEG) data. ICA blindly decomposes multi-channel EEG data into maximallyindependent component processes (ICs) that typically express either particularly brain generated EEGactivities or some type of non-brain artifacts (line or other environmental noise, eye blinks and other eyemovements, or scalp or heart muscle activity). Each brain and non-brain IC is identified with an activitytime course (its ‘activation’) and a set of relative strengths of its projections (by volume conduction) to therecording electrodes (its ‘scalp map’). Many non-articraft IC scalp maps strongly resemble the projection ofa single dipole, allowing the location and orientation of the best-fitting equivalent dipole (or other sourcemodel) to be easily determined. In favorable circumstances, ICA decomposition of high-density scalp EEGdata appears to allow concurrent monitoring, with high time resolution, of separate EEG activities intwenty or more separate cortical EEG source areas. We illustrate the differences between ICA and tra-ditional approaches to EEG analysis by comparing time courses and mean event related spectralperturbations (ERSPs) of scalp channel and IC data. Comparing IC activities across subjects necessitatesclustering of similar Ics based on common dynamic and/or spatial features. We discuss and illustrate such acomponent clustering strategy. In sum, continued application of ICA methods in EEG research shouldcontinue to yield new insights into the nature and role of the complex macroscopic cortical dynamicscaptured by scalp electrode recordings.

Keywords: independent component analysis (ICA); event-related potentials (ERPs); event-related spectralperturbation (ERSP); EEG source localization; independent component (IC) clustering

Introduction

In the past decade, an explosion of advances inhuman brain imaging techniques have pushedanalysis of metabolic or blood-oxygenation levelsin the brain to the forefront of human neurosci-ence research. In contrast, the popularity of elect-roencepholography (EEG) has waned, largelybecause the majority of EEG studies still analyzeaverage electrical potential time series from single

scalp electrodes which, in isolation, reveal littleabout the number, type and spatial distribution ofthe brain potentials that generate them. By basicbiophysics, if temporally coherent activity in anybrain area creates far-field potentials on the scalp,these potentials are distributed widely, by passivevolume conduction, across the scalp surface. EEGsignals arriving at each electrode are the sum ofactivities in all such EEG source areas, as well aselectrical artifacts from muscles, eyes, electrodes,movements, and the electrical environment.

An additional cause of complexity forelectromagnetic brain imaging, as compared with!Corresponding author.E-mail: [email protected]

DOI: 10.1016/S0079-6123(06)59007-7 99

metabolic imaging, is that electromagnetic fieldactivity within a cortical patch has a net sourceorientation as well as location. Typical differencesin cortical folding patterns between individualbrains may therefore produce differences in theorientations of spatially equivalent source areas,producing large differences in their projections tothe scalp electrodes.

For this reason, comparing EEG activities atequivalent scalp locations across subjects may notbe as accurate as comparing hemodynamic activ-ities of equivalent 3-D locations in their magneticresonance (MR) images. The mixture of sourceactivities reaching a given scalp location, e.g., thevertex (Cz), in different subjects may depend onthe relative amplitudes, distances, and orientationsof subject cortical source areas across most of thecortex. Therefore, a first signal processing step inusing scalp EEG (or its magnetic equivalent, mag-netoencephalographic or MEG) data for dynamicbrain imaging should be able to spatially filter therecorded data so that the outputs of each spatialfilter may be identified with activity in a particularcortical source area or domain.

Unfortunately, the widespread projection ofsource activities across the scalp surface alsomeans that the ‘EEG inverse problem’ of locatingthe brain sources of the recorded data is mathe-matically ill-posed and is in fact not resolvablewithout additional constraints and assumptions.The EEG inverse problem is even more difficultthan the corresponding inverse problem for MEGdata, since propagation of electrical potentialthrough the brain by volume conduction is in partanisotropic (meaning it varies with the direction ofpropagation). However, direct non-invasive meas-ures of this anisotropy are not currently available.For these reasons, as well as the considerablecomputational complexity involved, adequatebrain location-based spatial filtering of EEG sig-nals has been considered difficult to impossible,leading to the widespread but largely inaccurateperception that EEG brain imaging is doomed tohaving low spatial resolution.

In recent years, an alternative approach hasbeen developed for generating spatial filters thatallow simultaneous monitoring of field activitiesin different cortical areas, using a recent signal

processing approach known as independent com-ponent analysis (ICA) (Comon, 1994; Bell andSejnowski, 1995). Eleven years ago, the senior au-thor and colleagues first discovered that ICA is auseful tool for decomposing EEG signals intomaximally independent activity patterns that inmany cases are compatible with activity in a singleactive cortical area (Makeig et al., 1996).

The ICA approach to dynamic brain imaging isto separate the independent EEG activities in eachsubject’s data, not by direct spatial filtering foractivities generated in a set of pre-defined corticallocations, but by using the information content ofthe data itself to separate portions of the recordedscalp data from each active cortical and artifactsource area based on the deceptively simple butstatistically and physiologically plausible assump-tion that over time, these activities should benearly independent of each other.

The major advantage of this approach is thatthe locally coherent activity constituting a singleEEG source will be grouped together into a singleindependent component (IC) that includes its pro-jections to all the scalp channels, while the activ-ities of unrelated EEG sources will be rejectedfrom this IC and isolated into other ICs. In thisway, under favorable circumstances ICA willtransform the recorded high-density scalp datainto a set of cortical and artifact source recordings— thereby discovering what distinct signals arecontained in the data before asking directly wherethese signals are generated.

This indirect or even (to some) backwards-seem-ing approach to spatial source filtering has madethe ICA approach difficult to accept for some re-searchers with physical science backgrounds. Onthe other hand researchers accustomed to usingcomputationally simpler measures of activity atsingle scalp electrodes might balk at the apparentincrease in complexity of ICA-based analysis. Al-though skepticism between (at least) these twoclasses of researchers may have slowed widespreadinvestigation of the utility of ICA for EEG anal-ysis, more and more students and investigators aretaking advantage of freely available and commer-cial software tools for performing ICA analysis, inthe process understanding more of its benefits andpitfalls. Here, we first review the theory, promise,

100

and practical use of ICA decomposition of EEGdata, then exemplify its benefits for event-relatedtime/frequency analysis. Finally, we will describean important problem ICA poses the identifying ofequivalent component processes across subjectsand sessions. We discuss approaches to solvingthis problem by ICA component clustering.

ICA history

The concept of ICA was first developed in the fieldof signal processing around 1990 (Comon, 1994)as part of a larger class of ‘blind source separation’problems that aim to separate individual sourcesignals from multi-dimensional data in which theyare mixed. ICA applied to high-density scalp EEGdata produces a strictly linear and invertible de-composition of the data, meaning that the activityof every resulting IC is simply a weighted sum ofthe signals recorded at all of the input scalp chan-nels, and every scalp channel signal is simply aweighted sum of the projected activities of all theICs. Technically, ICA finds a set of fixed spatialfilters that together constitute the most distinct(i.e., temporally near-independent) signals availa-ble in the input data.

Currently, several related Matlab (The Math-works, Inc.) algorithms for performing ICA arereadily available on the internet. Several relatedalgorithms for performing ICA have been devel-oped for Matlab (The Mathworks, Inc.) which arereadily available on the internet. These includeJADE (Cardoso and Laheld, 1996), infomax ICA(Bell and Sejnowski, 1995), and so-called FastICA(Hyvarinen et al., 2001), as well as variants ofsecond-order blind identification (SOBI)(Molgedey and Schuster, 1994) that also factor inrelationships between multiple time points usingautoregressive models. We have found thatinfomax ICA, in particular, gives reliable resultsfor data of sufficient quantity and quality havingalmost any number of channels.

In its original formulation, infomax ICA couldonly find sources that have super-Gaussian activitydistributions, meaning roughly that sourceprocesses are only intermittently active. Over suffi-cient time, this fairly well describes most EEG

phenomena. However, ‘extended’ mode infomax,introduced by Lee et al. (1999), can also learnfilters for sources such as line noise that have sub-Gaussian activity distributions (roughly speaking,activities that are mostly ‘on’). This may be im-portant when EEG data are obscured by (sub-Gaussian) 50 or 60-Hz line noise from environ-mental AC power sources. Matlab and binary im-plementations of infomax ICA, in particular, aswell as many other tools for analyzing EEG datawith this or other ICA algorithms, are freelyavailable in the open source EEGLAB analysisprogramming environment (http://www.sccn.ucsd.edu/eeglab).

In our hands, infomax ICA produces useful re-sults from decomposition of EEG datasets with31–256 channels. Decomposing data with fewer channels is also possible and should be use-ful for some purposes. Below, we compare someresults of ICA decomposition with results of tra-ditional scalp channel analysis to demonstrate thepromise of ICA methods for EEG research.

ICA theory

ICA model assumptions

A general and physiologically plausible assump-tion underlying most EEG analysis is that most ofthe far-field potentials detected at the scalp aregenerated not in the scalp itself, but within spatialdomains (or most simply, patches) of similarlyoriented cortical pyramidal neurons. EEG record-ings are time series of measured potential differ-ences between two scalp electrodes, usuallythought of as potential differences between an ‘ac-tive’ electrode and a second (‘passive’) referenceelectrode — though in fact both electrodes areequally receptive to nearly all cortical and artifactsource signals.

A given cortical patch can only produce a far-field potential and thus become an EEG sourcedomain, however, if the local fields surrounding itspyramidal cells become partially (not necessarilycompletely) synchronized. There are severalbiophysical properties of the brain that encouragesuch synchronization. These include inherent

101

rhythmic proclivities of pyramidal and other cor-tical cells, the high-speed, non-synaptic, electro-tonic gap junction connections between (mainly)non-pyramidal inhibitory cells, bidirectional cou-pling between inhibitory and excitatory (includingpyramidal) cells, and between cortex and thala-mus. However, both the strength and frequencycontent of the local synchronization is highly var-iable and may not normally have sufficient coher-ency and/or may not extend over a sufficient areato produce appreciable far-field scalp potentials.Many biophysical systems and properties modu-late the emergence and time course of local-fieldsynchronization. As a result of all this physiolog-ical complexity, EEG signals themselves havehigh-spatiotemporal complexity. The simple bio-physical fact remains, however, that potentials re-corded between any electrode pair will sumactivities from nearly all the active cortical EEGsource domains as well as from nearly all the activeartifact signal sources.

Temporally, EEG signals are comparatively wideband (at least 1-100 Hz, or more than seven octaves1–50Hz, more than six-octaves) and highly variablein amplitude, frequency content, and time course.Recent studies have even reported that amplitudemodulation of the posterior resting alpha activityexhibits self-similar or fractal complexity over manyoctaves (seconds to hours) (Linkenkaer-Hansen etal., 2004). In addition to exhibiting complex pat-terns of amplitude modulation, EEG signals oftenexhibit both small frequency shifts and large fre-quency jumps — phenomenally (if not functionally)akin to a European driver up- and down-shiftingwhile negotiating tight corners on a mountain road(Onton et al., 2005). The static ICA modeling dis-cussed in this chapter does not itself attempt tomodel the temporal properties of EEG source sig-nals. Here, we focus on specific examples of EEGprocesses and dynamics revealed by ICA-basedspatial filtering.

The spatially static ICA model discussed hereassumes that the (‘far-field’) activities recorded atthe scalp are produced in cortical EEG source do-mains that project near-instantly to the scalp elec-trodes via volume conduction. This means thatsignals recorded at scalp electrodes are the sum ofpotentials originating in nearly all cortical and ar-

tifact source domains. For electrical frequencies inthe EEG range, the basic ICA assumptions thatvolume conduction is linear and practically instan-taneous are confirmed by biophysics (Nunez andSrinivasan, 2005). Thus, scalp electrode signals canbe modeled as instantaneous linear mixtures ofcortical source plus non-brain artifact signals. Themechanisms by which the local synchronies appearin the cortical source domains and are modulatedacross time are of course highly non-linear. ICAonly seeks to cancel the volume conduction andlinear summation of distinct cortical (plus artifact)signals at the scalp electrodes — opening the prob-lem of characterizing the (highly non-linear) gener-ation and modulation of the EEG source activitiesthemselves to further analysis.

To separate EEG source signals, ICA makes akey assumption: that the far-field signals producedby the cortical and non-cortical EEG sources aretemporally distinct and, over sufficient input data,near temporally independent of one another. Isthis a physiologically plausible assumption? Sev-eral factors suggest that in many cases cortical andartifactual source signals may indeed be nearly in-dependent. It is important to realize, first, that es-tablishing the absolute independence of a numberof signals would require infinite data. Thus, inde-pendence, measured by any approximation alwaysmeans near independence. That said, why shouldthe far-field signals produced by different corticaland artifactual sources be near independent?

A simple constraint on cortical signal depend-ence comes from cortical connectivity, which isvery highly weighted toward short (o500 mm)connections. The largest class of inhibitory cellsin cortex, for example, has only short-range con-nections (Budd and Kisvarday, 2001). Thus, sync-hrony (or partial synchrony) between local-fieldactivities should very likely spread through a con-tiguous cortical area, rather than jumping betweendistant and very weakly connected cortical terri-tories. In fact, this is generally what has been ob-served in recordings from closely-spaced corticalelectrode and optical grids (Arieli et al., 1996;Freeman, 2004a).

Finally, the spatially static ICA model assumesthat the cortical domains or patches, as well as theartifact sources that together constitute EEG

102

‘sources,’ remain spatially fixed over time andtherefore project to the scalp channels with fixedweights or proportions. Most likely, the synchronyof distributed field activity across each source do-main is only approximate. For example, invasiveoptical and electrical recordings using closelyspaced sensors reveal moving sub-millimeter scalepotential gradients with traveling wave patterns(Arieli et al., 1996), leading some to speculate thattraveling waves occur regularly in cortex at largerspatial scales. However, if the cortical ‘patch’ thatproduces a synchronous far-field EEG source sig-nal is on the centimeter square scale, progressiveor radially expanding traveling wave activity(Freeman, 2004a) within the source domain willproduce a signal that appears nearly spatially con-stant on the scalp.

For example, consider Freeman’s model of EEGsource dynamics, based on his observations ofmammal brains with small (sub-millimeter spaced)electrode grids, of circular wave patterns thatspread across small areas of cortex like pond rip-ples produced by throwing a small rock into apond (Freeman, 2004b). What field dynamics onthe scalp should be produced by such activity ac-tive at, e.g., 10Hz? At a nominal traveling velocityof 2m/s, and assuming a cortical domain diameterof as much as 3 cm, the 10-Hz phase differencebetween the focal center of the ‘pond rippling’potentials and the edge of the active ‘ripple’ area(1.5 cm from the center) would be only

1:5 cm

0:002 m=ms! 100 cm=m=100 ms=cycle

! 360"=cycle # 27"

Thus, the outer edge of the pond-ripple patternwould lead (or follow) the center by less than a13th of a 10-Hz cycle, and mean local-field poten-tials within the patch (and at the scalp electrodes)would change from positive to negative and backagain nearly synchronously. Unless the corticaldomain involved were (a) quite close to the skulland (b) folded (e.g., straddling the edge of asulcus), the projection pattern on the scalp wouldbe very difficult to distinguish from the far-fieldpotential of a cortical source domain behaving asan ideal, fully-synchronous cortical source. Thus,

for resolving signals from most centimeter-scalesource domains similar to those observed by Free-man, ICA may indeed be adequate, at least forEEG frequencies lower than the (30-Hz andabove) gamma band. However, more advancedICA approaches including complex ICA (ICAperformed on Fourier or wavelet transformedEEG data) might indeed be able to recover, insome cases, evidence of near centimeter-scale po-tential flow patterns within individual cortical al-pha source domains (Anemueller et al., 2003).

Certain macroscopic EEG phenomena are alsoknown to exhibit large-scale traveling wave prop-erties — including epileptic seizures, slow-spreadingdepressions associated with migraine headaches(Lauritzen, 1994), and sleep spindles (Massimini etal., 2004). At best, ICA can only model such phe-nomena as active within a component subspace,i.e., a set of components each accounting for aspatial (and temporal) phase of the moving activitypattern. One can think of individual componentactivities in such cases as ‘overlapping movieframes’ that, in combination, can capture spatiallyshifting phenomena that reoccur in the data. Inmany cases, the resulting separation of the movingsignal into a relevant component subspace separatefrom the near-independent activities of other, spa-tially static EEG sources may be useful for manyanalysis purposes. There are, however, many pos-sible extensions of the static ICA model to moregeneral blind source separation methods that mayalso be able to identify and isolate particular sortsof spatially fluid patterns in scalp EEG data — ifand when the particular moving-source model be-ing used does in fact fit the phenomena of interest(Anemueller et al., 2003).

Questions about spatial EEG source stabilityalso extend over longer time scales and, as well,across changes in task and state. In general, it isnot known if in the normal awake brain, majorsources of synchronized activity tend to move be-tween cortical source domains, or whether corticalsource domains slowly change shape or location,e.g., during performance of a single task. Therelative stability of the results of infomax ICA de-composition applied to waking EEG data, bothwithin and between subjects (Makeig et al.,2002, 2004b; Onton et al., 2005) suggests that

103

many major sources of EEG activity are suffi-ciently spatially stable to be reproducibly resolvedinto temporally near-independent sources withfixed scalp maps.

Preliminary analysis, however, suggests that theset of cortical source domains separated by ICAmay well change when subjects perform a new task(Onton and Makeig, 2005). Nonetheless, the spa-tially static ICA model, applied to data from twotasks performed consecutively in one session,might still find components common to both taskperiods, weighting their activities during each taskperiod appropriately. In summary, the subject ofspatial non-stationarity of EEG cortical sourcedomains is still a largely open, interesting, andpotentially important frontier of EEG research.

ICA model basics

The data submitted to ICA are simply the EEGchannel recordings arranged in a matrix of nchannels (rows) by t time points (columns). Unlikedirect spatial filtering methods, no channel loca-tion information is used in the analysis. ICA per-forms a blind separation of the data matrix (X)based only on the criterion that resulting sourcetime courses (U) are maximally independent. Spe-cifically, ICA finds a component ‘unmixing’ matrix(W) that, when multiplied by the original data (X),yields the matrix (U) of IC time courses.

U # WX (1)

where X and U are n! t matrices, and W isn! n. By simple matrix algebra, Eq. (1) impliesthat

X # W$1U (2)

Here, W$1 (the inverse of W) is the n! n com-ponent ‘mixing’ matrix whose columns contain therelative weights with which the component pro-jects to each of the scalp channels, i.e., the IC scalpmap. The portion of the original data (X) thatforms the ith IC (Xi) is the (outer) product of twovectors, the ith column of the mixing matrix, W$1,and the ith row of U,

Xi # W$1i Ui (3)

and the whole data (X) are the sum of the ICs (Xi)

X #X

Xi where i # 1; 2; . . . n (4)

Again, each column of the (W$1) mixing matrixrepresents, for a single-component source, the rel-ative projection weight at each electrode. Mappingthese weights to corresponding electrodes on a car-toon head model allows visualization of the scalpprojection or scalp map of each source. The sourcelocations of the components are presumed to bestationary for the duration of the training data.That is, the brain source locations and projectionmaps (W$1) are assumed to be spatially fixed, whiletheir ‘activations’ (U) reveal their activity timecourses throughout the input data. Thus, the ICactivations (U) can be regarded as the EEG wave-forms of single sources, although obtaining theiractual amplitudes at the scalp channels requiresmultiplication by the inverse of the unmixing ma-trix (W$1) to return to microvolt units.

Neither the IC scalp maps nor the IC activationsare themselves in original recorded units. Rather,the original activity units (mV) and polarities(+/$) are distributed between the two factors —the IC scalp map and activation time series. Forexample, reversing the polarities of the activationand inverse weight matrices, then back projectingby multiplying these two matrices (as in Eq. (3)above) recovers the original component activitiesin their native microvolt units. Thus, neither thesign of the scalp maps nor the sign of the activat-ions are meaningful in themselves, but it is theirproduct that determines the sign of the potentialaccounted for at each scalp channel. However, ICactivation magnitudes may be normalized by mul-tiplying each by the root-mean square (RMS) am-plitude of the corresponding IC scalp map. Theactivation units are then RMS microvolts acrossthe scalp array.

The style of ICA decomposition considered hereis said to be complete, i.e., a decomposition inwhich the number of ICA components recovered isthe same as the number of channel inputs. Thus,30-channel data will be decomposed by ICA into30 ICs, whereas 60-channel data will be decom-posed into 60 ICs. Methods for overcomplete ICAdecomposition also exists, though these require

104

additional assumptions. Frequently asked ques-tions about ICA include: (1) are there really only afixed number of data sources? (2) What are theeffects of recording and decomposing differentnumbers of data channels? Although full answersto these questions are mathematically difficult andpossibly intractable, in general the number ofnear-independent brain sources of EEG datashould theoretically be nearly unlimited, althoughour power to resolve them from any fixed numberof scalp channels is limited.

Results of ICA decomposition of high-density(e.g., 256-channel) data acquired from normalsubjects during performance of cognitive taskssuggest that some dozens of distinct EEG sourcesare large and/or distinct enough to be separatedinto ICs with physiologically interpretable scalpmaps and activations. The remainder of the (e.g.,200+) ICs found by ICA in such data must beeither ICs that clearly account for non-brain arti-facts, or else mixtures of lower energy sourcesthat are combined to satisfy the requirement thatthe component activities sum to the whole data.In our experience, applying ICA decompositionto 31-channel data typically yields 5–15 non-artifactEEG components comparable with thoseobtained from high-density recordings.

ICA practice

Example data

In the following sections, we will illustrate the ad-vantages of using ICA to isolate EEG source ac-tivities in multi-channel data. In these examples,we will use data from a single subject performing astandard ‘two-back’ working memory task. Thesubject was presented with a letter (B, H, J, C, F,or K) at roughly 1500-ms intervals, and respondedto each letter by pressing one of two buttons usinghis or her their right or left thumb, respectively, toindicate whether the current letter was the same as(match) or different from (non-match) the letterthat had been presented two back in the sequence.At each letter offset, an auditory feedback stimu-lus indicated whether the subject response wascorrect or incorrect. Letter presentation duration

was adjusted after each trial block to induce per-formance to be as close as possible to 75% correct.

Correct responses each earned the subject 1point, while incorrect responses cost 1 point. Atthe end of the experiment, the volunteer subjectswere paid a 1b (US) bonus for every point ac-crued. Total bonus money earned was in the rangeof $10 (US) in addition to the regular hourly rateof compensation. To introduce occasional height-ened experience of reward and punishment, on10% of the correct and incorrect trials, respec-tively, upward and downward gliding tones, re-spectively, were delivered to indicate that thenumber of points earned or lost on that trial werefive times as large as usual. Thus, infrequent ‘bo-nus’ signals added 5 points, while infrequent ‘pun-ishment’ signals lost 5 points from the subjectpoint total, which was displayed on the subjectscreen between task bouts.

Figure 1 illustrates typical results of ICA decom-position performed on 1917 s of 100-channel EEGdata digitized with 24-bit resolution at 256Hz. Thetop half of the figure displays EEG data from asubset of the 100 electrodes over the course of 15 s,while the bottom half of the figure shows the (‘ac-tivation’) time courses of several ICs during thesame period. The gray bars show when a letter wasdisplayed on the computer screen. Vertical coloredlines indicate the type of auditory performancefeedback signal delivered at each letter offset.

During the illustrated data period, the subjectblinked after each performance feedback signal (asclearly revealed by the time courses and scalpmaps for IC1 and IC3, and as visible in severalfrontal scalp channels). Other types of EEG arti-facts were also isolated by ICA, including lefttemporal scalp muscle activity (IC55) and cardiacpulse artifact (IC12). ICA found several ICs pre-dominantly projecting to posterior scalp with apeak in the (8–12Hz) alpha band (e.g., IC5 andIC8), as well as at least three ICs with spectralpeaks in the (4–7Hz) theta band (IC4, IC6, andIC7). Alpha peak components tend to be associ-ated with scalp maps suggesting projection of one,or else of two symmetric equivalent dipole(s) (seeSection ‘‘Example data’’) in posterior brain, whilefrontal components typically have a mean spectralpeak in the theta range.

105

Figure 1 also shows the concatenated averageevent-related potential (ERP) waveform time (themean for all 308 letter presentations) locked tovisual letter onsets for IC5. Although the ERP isconvenient to summarize event-related data, com-parison of its waveform with that of the unaver-aged IC5 activity provides a dramatic example ofhow the average ERP captures little of the actual

process activity. In particular, the IC5 alpha burststhat regularly follow letter offsets in this time spanare nearly absent from the ERP because, thoughthey clearly follow most stimulus presentations,they are not precisely phase locked to letter onsets(i.e., the phase of their alpha activity, relative tostimulus onset, varies near randomly). Thus, theselarge alpha activity bursts are not prominent in the

Fig. 1. Fifteen seconds of EEG data at 9 (of 100) scalp channels (top panel) plus simultaneous activities of 9 (of 100) independentcomponents (ICs, bottom panel). While nearby electrodes (upper panel) record highly similar mixtures of brain and non-brainactivities, ICA component activities (lower panel) are temporally distinct (i.e., maximally independent over time), even when their scalpmaps are overlapping — compare, e.g., IC1 and IC3, accounting for different phases of eye-blink artifacts produced by this subjectafter each visual letter presentation (gray background) and ensuing auditory performance feedback signal (colored lines). Compare,also, IC4 and IC7, which account for overlapping frontal (4–8Hz) theta-band activities appearing during a stretch of correct per-formance (7 through 15 s). Typical ECG and EMG artifact ICs are also shown, as well as overlapping posterior (8–12Hz) alpha bandbursts that appear when the subject waits for the next letter presentation (white background). For comparison, the repeated averagevisual-evoked response of a bilateral occipital IC process (IC5) is shown (in red) on the same (relative) scale. Clearly the unaveragedactivity dynamics of this IC process are not well summarized by its averaged response — a dramatic illustration of the independence ofphase-locked and phase-incoherent activity.

106

average ERP waveform, having been removedfrom it by phase cancellation.

Component source modeling

Although each EEG channel recording is associ-ated with a specific location on the scalp, as wehave seen, electrode locations are at best quitecrude indicators of the locations of even thestrongest underlying cortical sources. Thus, EEGrecordings are typically and traditionally con-sidered to have ‘low spatial resolution.’ The stand-ard approach to EEG source localization is todirectly attempt inverse modeling of EEG sourcelocations from one or more observed scalp poten-tial distributions. However, since nearly all re-corded EEG scalp maps sum activities frommultiple brain and non-brain sources, this has ap-peared to be a nearly intractable problem.

Originally, some researchers hoped that thenumber of active EEG sources could be mini-mized by computing the average scalp distributionat some latency or latencies following significanttask events, reasoning that all but one EEG proc-ess might be cancelled out during response-lockedaveraging. Unfortunately, more than one corticalEEG process reacts to a stimulus, and further re-search has shown that such ERP averaging canonly isolate activity from a single brain area if it isthe earliest responder to a sensory event. In gen-eral, the changes in EEG dynamics induced bysignificant events quickly involve several corticalareas (Klopp et al., 2000).

ICA decomposition, however, offers a new andmore promising approach to solving the EEG in-verse problem — namely, performing inversemodeling of the individual IC scalp maps them-selves. The simplest ‘realistic’ EEG inverse modelattempts to match each observed scalp map withthe best-fitting projection pattern of a single activeequivalent dipole placed in a 3-D head model atsome location and orientation (Scherg and VonCramon, 1985). Applying such standard inversesource modeling methods to the IC scalp mapsfrom the example session of Fig. 1, using a best-fitting four-shell spherical head model, we foundsingle equivalent dipole models for about 20 ICs

whose scalp projection patterns adequately fit theobserved IC maps (e.g., with no more than 15%residual variance between the IC scalp map andthe dipole scalp projection).

Figure 2 illustrates the scalp projections (W$1)of nine ICs from one subject linked to the loca-tions of their respective best-fitting equivalent di-poles in a common 3-D (Talairach) brain space.The indicated percent residual variance specifiesthe percent difference between the scalp projectionof the model dipole and the actual IC scalp map.For all nine pictured ICs, median residual variancewas 2.8%. This is near to the minimum level oferror expected from imperfectly modeling the headas a set of conductive spheres (representing brain,pial surface, skull, and scalp). Each dipole is as-sociated with a particular orientation that deter-mines the pattern of its scalp distribution. Forexample, IC7 is oriented radial to the scalp surface(see the sagittal brain projection) and thus projectsmost strongly to the forehead. The other end ofthis dipole (e.g., the red end, in this depiction)might project near the throat, but was not re-corded by the electrode montage used in this ex-periment. In contrast, IC5 can be accuratelymodeled (with residual variance near 1%) onlyusing two occipital dipoles with bilaterally sym-metric locations and roughly tangential orientat-ions with respect to the scalp surface, thussimultaneously contributing both positive andnegative potentials to the scalp recordings.

If an IC scalp map cannot be well modeled by asingle dipole, we typically do not attempt to modelit. However, when an IC scalp map appears to bebilaterally symmetrical, it may be reasonablymodeled by two dipoles symmetrically located inleft and right hemispheres, though possibly withdiffering dipole orientations. An example of thissituation is illustrated by IC5 in Fig. 2. This sourceconfiguration might arise from (1) synchronousevoked responses in left and right visual cortices inresponse to central visual stimuli and/or (2) syn-chronized alpha-band activity bi-directionally cou-pled through dense callosal connections. In thelatter case, it might be possible to observe ICsgenerated in other pairs of cortical areas also bi-directionally coupled by dense white matter tracts,e.g., the arcuate fasciculus that connects frontal

107

cortex to the ipsilateral temporoparietal junction,though we do not yet have convincing evidenceconfirming this possibility.

In general, ICs tending to account for mostevent-related dynamics are likely to be dipolar (oreye artifacts), whereas non-dipolar and noisy-appearing components tend to make little contri-bution to event-related dynamics (unpublishedobservations). Non-dipolar ICs may represent ac-tivity that does not fit the ICA spatial stationarityassumption, or may represent mixtures of smalland inconsistent sources. Given the reasonablepresumption that the total number of quasi-inde-

pendent brain and non-brain sources (mostly quitesmall) may be unlimited, whereas the number ofscalp channels is limited, some ICs will containinformation from more than one source. However,in our experience, such components tend to ac-count for relatively little of the recorded EEG sig-nals, and even less of the time or time/frequencydomain event-related dynamics.

Modeling ICs using single or bilateral equivalentdipole models not only helps localize activities ofinterest to a particular brain region; it is also aconvenient way to assess and visualize the spatialhomogeneity of clusters of functionally similar or

Fig. 2. Independent component scalp maps modeled as single dipolar sources. The location and orientation of dipoles within the headmodel determine the theoretical scalp projection of the dipole’s electric field. For the nine IC scalp maps pictured here (from thedecomposition shown in Fig. 1), there is a median 2.8% residual variance between the IC scalp map (W$1) and the model projection ofthe best-fit equivalent dipole, likely close to the error inherent to fitting sources within a spherical head model. Thus, each of the ICscalp maps pictured (except IC5) are highly compatible with a cortical source domain consisting of a single cortical patch of unknownextent. IC5 is well modeled by two dipoles located symmetrically across the occipital midline and is likely tightly coupled through thecorpus callosum.

108

equivalent IC processes across subjects (see Sec-tion ‘‘Do different subjects have equivalent ICs?’’).However, more advanced methods of distributedsource modeling incorporating structural informa-tion from subject MR head images, should be stillmore informative.

Practical considerations

Decomposing EEG data by ICA is relativelystraightforward, though the quality of the resultsis highly dependent on two major factors. First,the number of time points of n-channel data usedin the decomposition must be sufficient to learnthe n2 weights in the (n! n) ICA unmixing matrix.Technically, the number of independent degrees offreedom in the data is a more relevant (though lessaccessible) measure of data quantity than thenumber of time points per se, and is always lowerthan the number of time points since the data arenot white. As a rule of thumb, at comparable datacollection rates the amount of data needed for a‘clean’ decomposition is related to the number ofelectrodes squared (i.e., the number of weights)times a factor, k. In our experience (and for ourdata and procedure), when the number of elec-trodes is relatively high k may need to be 25 orlarger. But to perform a full-rank (256-compo-nent) decomposition with k # 25 would require2562! 25 # 1,638,400 or more data points. At a256-Hz sampling rate, this would require nearly2 h of data. (Note that our k # 25 rule of thumbmay depend on algorithm, data quality, frequencyrange, etc).

For smaller numbers of channels, the amount ofdata required should be much smaller. For exam-ple, for one quarter the number of channels (64),only a 16th the amount of data (%7min) would berequired to give the same k factor. However, ICAdecompositions using still more data (k425) tendto be more regular and produce more dipolarcomponent maps. Thus, more data is better — solong as the EEG source locations do not shift. Forexample, jointly decomposing data from awakeand sleeping conditions might not be optimal if theEEG source locations in these portions of the datadiffered.

Second, the universal rule of signal processing,‘garbage in, garbage out’ (GIGO), applies to ICAdecomposition as well. Two classes of artifactsmust be considered to decide what constitutes un-desirable (‘garbage’) data for ICA. EEG artifactsarising from eye movements, eye blinks, and mus-cle tension have stereotyped scalp projections(since the positions of the eyes and muscles donot change throughout the session), although eyemovements in different directions, or blinks ofeach eye separately, etc., may introduce multiplescalp projection patterns into the data and thusrequire more than one IC to account for their ar-tifacts.

Another class of artifacts is more problematicfor ICA decomposition. These include large mus-cle movements such as clenching the jaw, talking,swallowing, clearing the throat, or scratching thescalp. Because these activities involve many musclegroups and possible electrode movements, thescalp potential maps they create are likely to beunique, so their inclusion in data submitted to ICAcould require as many ICs to separately modeltheir large- amplitude activities. Even when a largenumber of channels and ICA dimensions, areavailable, relatively few ICs will then be left toaccount for independent cortical sources of inter-est. Therefore, it is important to remove such ‘ir-regular’ artifact periods from the data before finaldecomposition.

Artifact prototypes

One of the simplest applications of ICA to EEGdata is to remove stereotyped artifacts from thedata. As mentioned above, eye blinks are highlyspatially stereotyped and are usually separatedinto one or more ICs that together neatly accountfor the entire eye-blink artifact. Once a componenthas been identified as artifactual, it may be re-moved from the data by reversing the ICA linearunmixing process. To remove the activity of the kth

IC from the data, simply replace the entire kth rowof the component activation matrix U withzeros and then multiply the modified activationmatrix, Uk0, by the ICA mixing matrix W$1.Else, equivalently, the map weights for unwanted

109

components in W$1 may be zeroed. The resultingback-projected data will be the same size and unitsas the original data matrix, though its intrinsic di-mensionality (or ‘rank’) will have been reduced byone and the data will no longer contain activityfrom IC k.

The process of determining whether an IC ac-counts for an artifact or a cortical process may inmany cases be simple for the trained eye, but typ-ically requires some evaluation of its activity pat-tern. Figure 3 demonstrates two types of data weuse to determine whether an IC accounts for cor-tical or artifact activity. Each panel shows the ac-tivity power spectrum (left) and the actual activitytime course in the single experimental trials (right)of four ICs from the same session as Figs. 1–3.Panel A shows a typical spectrum of eye-blink ac-tivity, consisting of relatively high power at lowfrequencies and no spectral peaks. This subjectblinked consistently following each stimulus pair(A, right). While this is unusual, and so cannot beconsidered a criterion for eye-blink identification,it was not unreasonable behavior for a subject inthis demanding task. Eye blinks are one of theeasiest types of components to recognize by scalpmap alone because (assuming the electrode mon-tage includes sites below as well as above the eyes)the scalp map of a blink-related IC will be nearlyidentical to that shown in panel A.

Panel B displays properties of another commontype of EEG artifact, sporadic muscle tension inscalp or neck muscles. This is usually well modeledby ICA. Unlike eye-blink components, musclecomponent scalp maps vary depending on themuscle they represent. The equivalent dipole for amuscle component is oriented parallel to, and ide-ally within the scalp, and the component scalpmap exhibits a sharp polarity reversal at the mus-cle’s point of insertion into the skull (e.g., the shiftfrom red to blue in the scalp map in panel B).Muscle component spectra typically have highestpower at frequencies above 20Hz. Other frequen-cies may be present (as, here, near 10Hz), thoughthe most prominent feature of electromygraphic(EMG) activity is the dominant power in thegamma range. Muscle tension is typically notmaintained throughout an entire experiment, in-stead tending to switch on and off for stretches of

time (likely without explicit awareness of the sub-ject). As an illustration, a red bar to the right ofthe ERP-image plot in panel B indicates a block oftrials when this muscle’s activity was relativelyquiet. Thus, ICs accounting for muscle activitymay be identified using multiple criteria and sub-sequently removed from the rest of the data, ifdesired.

For comparison, panels C and D display datafrom two types of cortical IC processes with strongalpha (C) and theta (D) rhythms, respectively, asindicated by the strong spectral peaks in their ac-tivity spectra. Furthermore, the ERP-image plots(right) (Makeig et al., 1999; Jung et al., 2001) ofthe activities of these components in single trialsshow consistent patterns of activity time-locked toexperimental stimuli of interest. Other criteria canbe used to determine which ICs are putative cor-tical components, such as the fit of each IC scalpmap to a best-fitting single (or sometimes bilateral)dipole model (see Section ‘‘Example data’’). Theestimated location of an IC can be another helpfultool for discovering whether or not a componentexplains artifact activity. For example, eye blinkand muscle ICs will typically localize outside of thebrain volume (given a reasonably good co-regis-tration between the head model and the actualelectrode locations).

ICA applications

Time-domain analysis of IC activities

Traditionally, EEG analysis has focused on theaveraged scalp potentials in relation to a time-locking event. These average ERPs are interpretedas if they captured the relevant event-related EEGactivity. It is important to realize that evoked po-tentials, even at relatively early latencies, sum pro-jections to the scalp of the net activities in manyEEG sources that survive averaging. In addition,ERPs capture only the portions of the single-trialsignals that are both time-locked and phase-lockedto the set of time-locking events. Often, this maybe only 1% of the single-trial activities. It cannotbe concluded from this that the other 99% of therecorded signal that does not survive averaging is

110

Fig. 3. Power spectra and ERP-image plots of single-trial activities time locked to auditory feedback stimuli for four selectedindependent components (ICs). (A) Eye blinks create high power at low frequencies and characteristic blink-like deviations in singletrials that may (as here) or may not be regularly time-locked to experimental stimuli. (B) Muscle tension (here, in a left temporalmuscle) is associated with high power above 20Hz.The roll-off of power above 40Hz here was produced by 50-Hz low-pass filtering ofthe original data. Note the absence of this muscle’s activity during the middle portion of the experiment (red bar on right). (C and D)IC processes accounting for cortical EEG activity during cognitive tasks typically show a mean spectral peak in either the (8–12Hz)alpha band (posterior ICs) or (4–7Hz) theta band (anterior ICs), and produce both phase-locked evoked and phase-random inducedspectral responses to significant stimuli (C). They may also exhibit partial phase resetting of their ongoing activity at their characteristicfrequencies (D).

111

unaffected by the time-locking events. In particu-lar, event-related modulation of the power spec-trum in one or more parts of cortex may occurwhich is generally time-locked to the time-lockingevents but whose signal phase distribution withrespect to event onsets at relevant frequencies israndom. Measuring such event-related modula-tions in signal spectral power can be accomplishedusing time/frequency analysis.

Time/frequency analysis of IC activities

As the red trace in Fig. 1 (above) shows, and con-trary to naıve assumptions often gradually accruedin the minds of researchers who come to view dataaverages as if they were the data themselves, theaveraged evoked response typically retains little ofthe complexity of the event-related EEG dynamicsthat follow (or, sometimes, anticipate) significantevents in cognitive task paradigms. As Pfurtschel-ler and Aranibar (1977) realized over 25 years ago,changes in amplitude of EEG oscillations time-locked but not phase-locked to a set of similarevents are not evident in ERP averages. Pfurtschel-ler’s event-related desynchronization (ERD) andlater synchronization (ERS) measures capturemean event-related dynamics of oscillatory activ-ity in a selected narrow band (originally, thealpha band).

In 1993, the senior author generalized this anal-ysis to consider a wider range of frequencies atonce, producing a mean latency-by-frequency im-age, the event-related spectral perturbation(ERSP) image (Makeig, 1993). The time/frequencymeasures and equivalent 2-D images we call ER-SPs reveal the frequencies and latencies whenmean changes in log power occur from some meanpower baseline, time-locked to a class of experi-mental events. Subtracting log mean baselinepower (or equivalently, dividing mean power bymean baseline power and then taking the log of theresult) measures how strongly mean, event-relatedpower at different frequencies either increased ordecreased relative to the baseline spectrum. Thisnormalization models spectral perturbations bymultiplicative influences by which ongoing activityis either augmented or reduced near experimental

events. Converting to (log) dB scale after takingmeans regularizes the measure, avoiding extremenegative values that would be produced by takingthe log of individual power spectra containingnear-zero values. The normalization also mini-mized the dominance of low frequencies in corticalEEG (see Fig. 3, left column), making a commoncolor scale applicable for all frequencies.

When interpreting ERSP images, it should beremembered that an ERSP is a statisticalmeasure–the mean of a distribution of single-trialtime/frequency transforms. EEG activity at bothsource and scalp levels is typically highly variablefrom second-to-second and trial-to-trial. Thus, amean event-related increase at some frequency by,say, 6 dB does not typically imply that a constanttrain of oscillatory activity at that frequency con-tinues following the event, only doubled in ampli-tude. More likely, either the frequency ofoccurrence of bursts at that frequency doublesfollowing the events in question, or the mean am-plitude of the bursts doubles — or some combi-nation of the above. In fact, the mean change inlog power revealed by the mean ERSP may besmall compared with the ongoing variability in therecorded EEG power from trial-to-trial. In gen-eral, trial-mean measures (including ERPs andERSPs) do not characterize the variability of thequantity measured, nor other aspects of its distri-bution across trials — for this, further analysis isrequired. Arguably, understanding the place andfunction of the trial-to-trial variability may givemore insight into dynamic brain function thanmeasuring smaller mean changes across trials. Forthis, however, new analysis methods are required(Onton et al., 2005a).

The purpose of computing (log) mean ERSPs isto identify stable features of the event-related data.However, all features of a computed ERSP maynot be equally stable or reliable across sessions orsubjects. To estimate which ERSP features aresignificant, we have implemented non-parametrictesting using data permutation methods to createdistributions of surrogate data whereby statisticalsignificance of the observed mean changes can beevaluated (Delorme and Makeig, 2004). Specifi-cally, for each time/frequency point we generate adistribution of pseudo or surrogate data points in

112

which all features of the actual data except one arepreserved. For example, to assess the significanceof a non-zero ERSP value at a given time/fre-quency point, we could repeatedly shuffle the late-ncies in a selected time window from each trialused in the mean ERSP computation, thereby col-lecting a distribution of some hundreds of surro-gate mean ERSP values representing randomvariations in mean spectral power at the selectedfrequency that occurred during the trials.

If the actual mean ERSP value at the time/fre-quency point is found to be outside the distribu-tion of surrogate values (e.g., for po0.01, outsideits 99% percentile), then the observed mean ERSPdifference at this time/frequency point may be saidto differ significantly from expected power fluctu-ations at this frequency in the data. That is, thereis likely to be some influence at the selected latencythat increases (or decreases) power at this fre-quency. It is important, of course, to recognize theproblem of multiple comparisons when interpret-ing results of permutation tests. For example, at apixel significance threshold of pp0.01, 1% of evenrandom data transforms should be expected to bemarked as ‘significant.’

To test the significance of differences betweenmean ERSPs in two conditions across a group ofsubjects, one may simply shuffle the assignmentsof ERSPs to the two conditions and make a dis-tribution of surrogate mean difference ERSPswhose limits at each time/frequency point definethe bounds of expected variation of the ERSPcontrast of interest (Blair and Karniski, 1993). Forcomparing ERSP features across subjects, a com-putationally simpler though possibly less reliablemethod of evaluating the reliability of the ob-served ERSP differences is to quantify the numberof significant power perturbations at a given time/frequency point for all subjects, and then to rejectobserved differences whose significance across sub-jects is less than a pre-defined binomial probabilitylevel. This procedure often yields strong signifi-cance levels that minimize the chance of significantresults arising simply from the problem of multiplecomparisons.

Analysis of both scalp channel data and IC ac-tivities shows that in many cases, the EEG sourcescontributing to ERP features may be the same as

those contributing to concurrent event-relatedpower (ERSP) changes. Further, ICA spatial fil-tering reveals that oscillatory activity of EEGsource processes contributing to average ERPs isusually only partially phase reset (or, alternatively,phase locked or phase constrained) relative to thetime locking events (Makeig et al., 2002). That is,mean spectral amplitude shifts and degrees ofphase locking at various frequencies and latencieswith respect to events of interest form a multi-dimensional space of possibilities for event-relateddynamics of cortical source processes (Makeig etal., 2004a). That is, ERP features and event-relatedspectral power shifts or perturbations are usuallynot, in fact, distinct phenomena. Instead, eachpresent a limited view of more complex event-re-lated brain dynamics that involve changes in bothspectral power and phase at one or more frequen-cies (Makeig et al., 2004a). Naturally, their appli-cation to data recorded at single scalp channelsconfounds the multiple and often partially cancel-ing contributions of the underlying EEG sources,thus complicating the interpretation of event-re-lated phase consistency in scalp channel records.

Another specific pitfall of both ERP and ERSPanalysis is that the typically flat pre-stimulus ‘base-line’ of average ERPs and ERSPs suggests that theactivity captured in the ERP or ERSP is producedby cortical sources that are silent (ERP) or activein some steady state (ERSP) before the events ofinterest. Neither average ERPs nor average ERSPsare sufficient measures of event-related dynamicsunless all the single-trial data exhibit the samesource dynamics plus other (‘noise’) processes thatare unaffected by the events of interest — an as-sumption that a recent detailed analysis, shows tobe inadequate (Onton et al., 2005). In general, thesources of ongoing EEG activity isolated by ICAshow highly variable activity patterns in pre-stim-ulus periods, and complex transformations of theirjoint amplitude and phase statistics following sig-nificant stimuli in event-related paradigms.

A next important step in ICA analysis of EEGdata is to analyze the IC activity (or ‘activation’)time courses themselves. This involves, first, per-forming an ICA decomposition of data ofadequate quantity and quality to yield physiolog-ically plausible components. Next, physiologically

113

plausible ICs should be selected for detailed anal-ysis. Typically a workable criterion is that the ICbe reasonably well fit by a single (or bilateral)equivalent dipole(s) (e.g., within %15% residualvariance; see Section ‘‘Example data’’) and thatthis dipole be localized inside the head volume. If adipole cannot be fit to an IC scalp projection, thenthe homogeneity of its activity may be questioned.As mentioned in Section ‘‘Practical considera-tions’’, ICs that also express physiological rhythmssuch as alpha or theta activities are likely to rep-resent both anatomically and functionally distinctcortical source activities (though Fig. 3B shows acounterexample).

Note, however, that IC activities represent theresult of instantaneous and therefore broad-bandfiltering of the scalp data. Each IC activation timeseries apparently represents the synchronous por-tion of activity within one (or sometimes twolinked) patches of cortex, and should therefore havea broad colored-noise spectrum with or withoutsingle or multiple spectral peaks. If multiple peaksare present, higher peaks may represent harmonicsof lower-frequency peaks (e.g., harmonics of alphaor theta rhythms arising from the non-sinusoidal,more triangular wave shape of the oscillatory ac-tivity), and/or bursts of unrelated higher-frequencyactivity interspersed between periods of dominantfrequency activity (Onton et al., 2005).

Figure 4 directly compares time/frequency anal-ysis for a mid-frontal scalp channel and for threecontributing ICs. It makes visible the ambiguitiesinherent in time/frequency analysis of scalp channeldata alone. Panels A and E show the ERSP andinter-trial coherence (ITC) images, respectively, fora frontal midline scalp channel indicated by thegray disc on the model head. Here, artifacts includ-ing vertical and lateral eye movements, pulse arti-fact, and muscle activity were removed from thedata before time/frequency transformation. Theside panels display ERSP and ITC images for threedipolar components contributing to the channelERSP, as shown by the relation of the channel lo-cation (gray disc) to each of the IC scalp maps.

The ERSP transform of the frontal channel data(A) is quite similar in pattern to that of IC5 ac-tivity (D), though the blocking of (10-Hz) alpha-band activity in this bilateral occipital component

is stronger (%12 dB vs. %6 dB). In contrast, thesmall event-related alpha power increase (ERS) ofmid-frontal component IC7 (B) is not visible in thechannel data transform (A). Further, the post-stimulus increase in theta-band activity of mid-frontal IC4 (C) is not visible in the scalp channeldata transform (A). Either another theta-bandcomponent activity projecting to this channel mayhave decreased during this time interval, thus bal-ancing the increase in IC4, or the summed mixtureof many source activities at the scalp channel hadmore trial-to-trial variability, making the alphaincrease from IC7 insignificant in the channel data.Figure 4 illustrates that not only scalp channelERSP measures may be blind to some of the ER-SPs in the underlying cortical sources, but alsothey may suggest incorrect conclusions aboutsource locations. Here, the strongest IC contribu-tor to this frontal electrode ERSP is in fact from anoccipital source. Thus, results of time/frequencyanalysis applied to single-channel data should alsobe interpreted with caution.

In itself, the average ERP says nothing aboutthe portion of the energy in the whole EEG signalthat it captures. ITC, first introduced as ‘phase-locking factor’ by Tallon-Baudry et al. (1996), is ameasure of trial-to-trial phase consistency at eachfrequency and latency relative to a set of timelocking events. Its theoretic range is from 0 (uni-form phase distribution across trials) to 1 (identi-cal phase in each trial). Most EEG processescontributing to ERPs do so via partial rather thancomplete phase locking, i.e., with an ITC 51. InFigure 4, the marked theta-band ITC differencesbetween the contributing ICs are unavailable inthe scalp channel ITC. As with scalp channelERSP analysis, scalp channel ITC measures maymask differences in underlying phase consistencyof the distinct cortical source activities whose pro-jections are mixed at the scalp electrodes.

Comparing IC activities across subjects

Do different subjects have equivalent ICs?

In exchange for the benefits that ICA offersto EEG analysis in both spatial and temporal

114

resolution of separable source-level activities, italso introduces a new level complexity into EEGanalysis. In traditional scalp channel signal anal-ysis, clustering of event-related EEG phenomenaacross subjects is straightforward, as each scalpelectrode is assumed to be comparable with resultsfrom equivalently placed electrodes for the allsubjects. Comparing ICA results across subjects,in contrast, requires that, if possible, ICs fromdifferent subjects should likewise be grouped intoclusters of ICs that are functionally equivalent de-spite differences in their scalp maps.

If clustering ICs across subjects seems like animprecise process, it should be considered thatdata recorded at a single-scalp channel within

each subject is heterogeneous, so 5the idea ofgrouping channel activity across subjects may alsobe a risky proposition. In particular, clear phys-ical differences between subjects in the locationsand, particularly, the orientations of cortical gyriand sulci mean that even exactly equivalent cor-tical sources may project, across subjects, withvarying relative strengths to any single-scalpchannel location, no matter how exactly repro-duced across subjects. Thus, the basic assumptionin nearly all EEG research, that activity at a givenscalp location should be equivalent in every sub-ject, is itself questionable. In contrast, changingthe basis of EEG evaluation from scalp channelrecordings to IC activities necessitates an extra

Fig. 4. Comparison of average event-related spectral power (ERSP) and inter-trial coherence (ITC) measures time-locked to letteronsets in the two-back task, at a single frontal midline scalp channel (A and E, center) and for three ICs projecting to this channel(ERSP: B–D; ITC: F–H). Green portions of the ERSP and ITC images are non-significant (p40.01) by surrogate data testing. ERSPand ITC measures of event-related IC signal dynamics represent different aspects of event-related perturbations of ongoing oscillatoryactivity. Component ERSP and ITC features may conceal differences in the summed contributions of different sources to individualscalp channels, as is the case here, for both theta and low-alpha band power and phase locking. The far left panels of A–D (greentraces) show the mean component activity spectra, and the lower panels (blue and green traces) show the maximum and minimumspectral power perturbations. The green and blue traces of the left panels in E–H show the mean and maximum ITC values at eachfrequency, while the lower panels in F–H show the activity ERPs.

115

step compared with channel analysis — that ofcombining and/or comparing results across sub-jects through identifying equivalent IC processes,if any, in their data.

Approaches to IC clustering

The process of identifying sets of equivalent ICsacross subjects, or even across sessions from thesame subject, can proceed in many ways depend-ing on the measures and experimental questions ofinterest. An appealing approach to clustering ICsis by their scalp map (W$1) characteristics. Suchclustering can be attempted by eye, by correlation,or by an algorithm that searches for common fea-tures of IC scalp maps. The disadvantage of thismethod is that, as shown in Fig. 2, slight differ-ences across subjects in the orientation of equiv-alent dipoles for a set of equivalent ICs canproduce quite different IC scalp maps.

Clustering ICs based on the 3-D locations oftheir equivalent dipoles may avoid this problem.Using this method, it is possible to describe typicalevent-related or other activities in cortical areas ofinterest, or at least in cortical areas with sufficientdensity of IC equivalent dipoles across subjects orsessions. Common clustering algorithms such asK-means and other distance-based algorithms canbe used to cluster ICs based on the 3-D locationsof their equivalent dipoles quickly and easily.However, clustering on estimated cortical locationalone may introduce similar confounds as cluster-ing by scalp channel location, since subjects mayhave multiple types of IC processes in the samegeneral cortical regions.

For one, comparing cortical locations acrosssubjects raises the same spatial normalizationquestions as arise in functional magnetic reso-nance imaging (fMRI) analysis. Since brain shapesdiffer across subjects, true comparison of 3-Dequivalent dipole locations should be performedonly after spatially normalizing each set of subjectIC locations to his or her normalized individualstructural MR brain image. This requires MR im-ages to be obtained for each EEG subject, a re-quirement that may greatly increase the resourcesrequired for EEG data acquisition.

A simpler method normalizes the 3-D equivalentdipole locations via normalizing the subject headshape, as learned from the recorded 3-D locationsof the scalp electrodes, to a standard head model.When 3-D electrode location information is notavailable, the expected functional specificity ofequivalent dipole clusters based on estimated equiv-alent dipole locations in a standard head modelmust be reduced. In this case, some IC processesestimated to be located in the same cortical areamay not express the same functional activities. De-spite this drawback, our results show that clusteringcomponent dipole locations in a standard sphericalhead model still allows for meaningful conclusionsabout differences in regional EEG activities acrossone or more subject groups, assuming sufficientstatistical testing is applied to the data, and thelimitations of the analysis are acknowledged.

Because homogeneity of an IC cluster is mostaccurately assessed and characterized by the ac-tivities of its constituent ICs, a more direct route toobtaining functionally consistent clusters may be togroup ICs from experimental event-related studiesaccording to their event-related activity patterns.For example, a recent EEG/fMRI study (Debeneret al., 2005) clustered components contributingmost strongly to the event-related negativity(ERN) feature of the average ERP time-lockedto incorrect button presses in a speeded choicemanual response task. Remarkably, the authorsshowed that trial-to-trial variations in the strengthof the activity underlying the ERN correlated withchanges in the fMRI blood oxygen level-depend-ent (BOLD) signal only in the immediate vicinityof the equivalent dipole source for the componentcluster. In some cases, therefore, clustering ICs onsimilarities in their ERP contributions can be asimple but powerful approach to discoveringsources of well-documented ERP peaks.

If the measure of primary interest is not the av-erage ERP but, instead, the average ERSP meas-uring mean event-related fluctuations in spectralpower of the ongoing EEG across frequencies andlatencies, then component ERSP characteristicsmay similarly be used as a basis for IC clustering.Given a small number of subjects and a simpleexperimental design, it might be possible to groupcomponent ERSPs across subjects by eye, though

116

this quickly becomes discouraging as the numberof subjects and/or task conditions rise. In any case,an objective approach is more desirable.

As an example, let us consider data from thesame two-back task described earlier and illustratedfor one subject. Assume there are 20 subjects, eachwith a mean of 15 dipolar cortical ICs. To preparethe data, each 2-D (latencies, frequencies) compo-nent ERSP image for one or more task conditions(correct and incorrect) must be concatenated andthen reshaped into a 1-D (1, latencies! frequen-cies! conditions) vector. Thereafter, the vectorizedcomponent ERSPs from the 20 subjects can beconcatenated to form a large 2-D matrix of size(]ICs, latencies! frequencies! conditions) or, inthis case, (300, latencies! frequencies! conditions).A number of options are now available.

A simple approach is to use standard clusteringalgorithms such as K-means to cluster on Euclideandistances between the rows of the component ma-trix, whose dimensionality can be made manageableby preliminary PCA reduction. It is also possible tocombine dissimilar IC activity and/or locationmeasures in computing component ‘distance’ meas-ures. The open source EEG analysis toolbox (EEG-LAB, http://www.sccn.ucsd.edu/eeglab) includes aclustering interface that implements this method.The ICACLUST facility enables component clus-tering across subjects or sessions using a variable setof IC features: ERPs, ERSPs, scalp maps, meanspectra, and/or equivalent dipole locations.

Figure 5 illustrates preliminary clustering resultson 368 near-dipolar ICs from 29 subjects perform-ing the two-back task described earlier. In the fig-ure, equivalent dipoles of the same colors wereclustered by computing a Euclidean ‘distance’measure between the concatenated average com-ponent ERSPs time-locked to auditory feedbacktones signaling ‘correct’ and ‘wrong’ responses, aswell as the significant ERSP difference between thetwo, and also 3-D dipole location of each IC. TheERSPs plotted for each cluster represent themeans over all the cluster components, after zero-ing out spectral perturbations not significant(po0.00001) by binomial probability across theset of clustered components.

Note that although IC equivalent dipole loca-tions were here only a portion of the data used in the

clustering, ICs with similar event-related activitypatterns proven to be naturally associated with dis-tinct cortical regions. From the ‘difference’ activity(A, light blue) central midline cluster, it is clear thatthis cortical area produced a different activity pat-tern following wrong responses, namely a 400-mstheta band burst that began before the auditoryfeedback during the period of the motor response.This result is in line with our previous findings (Luuet al., 2004; Makeig et al., 2004b), and neatly re-produces the recent result of Debener et al. (2005)who used time-domain analysis of simultaneousEEG and fMRI data to show that trial-to-trialvariations in post-error activity of a very similarlylocated IC cluster were correlated with trial-to-trialvariations in fMRI BOLD signal only directly be-low the cortical projection of the component clus-ter, and highly coincident with the location of theequivalent dipole cluster in Fig. 5.

ERSP and equivalent dipole locations for threeother activity-derived but spatially ‘tight’ compo-nent clusters are shown in Fig. 5. IC clusters lo-cated in/near left and right hand somatomotorcortex (B, yellow and C, magenta, respectively)exhibited significantly stronger alpha activity for ahalf-second after receiving Correct feedback thanafter receiving Wrong feedback, as confirmed bypermutation-based statistical testing. Analysis ofresponses to matching versus non-matching letters(not shown) revealed the expected dominance ofspectral perturbations contralateral to the actualresponse hand. Finally, event-related perturbat-ions in spectral power in the bilateral occipital (D,green) IC cluster differed little following Correctversus Wrong feedback.

In contrast, the same clustering of ICs on dipolelocations, condition ERSPs, and ERSP differencesproduced two more spatially diffuse and highlyoverlapping occipital dipole clusters (red andblue). Although the spatial distributions of thesetwo clusters cannot be distinguished, the differenceERSPs between the two clusters do differ. Muchlike the left somatomotor (yellow) cluster, compo-nents in the blue cluster exhibit increased alphaactivity following Correct auditory feedback. Ofcourse, interpretation of the clustered componentactivities is a problem separate from clustering.However, once successful clustering of IC activities

117

Fig. 5. Clustering ICs from 29 subjects by common properties of their mean event-related activity time courses can be an efficientmethod for finding homogeneous groups of independent processes across sessions or subjects. Here, ICs were clustered by similaritiesin 3-D dipole location as well as features of their mean ERSPs time-locked to auditory performance feedback signals within two taskconditions (following Correct and Wrong button presses) and by the significant ERSP difference (when any) between them (thissignificance estimated by non-parametric binomial statistics, po1e$5). Colored spheres show the locations of the equivalent dipolesfor the clustered components. Colored lines connect these clusters to the respective cluster-mean ERSP and ERSP-difference images.Although IC equivalent dipole locations were only a portion of the data used in the clustering algorithm, the equivalent dipole modelsfor four of the obtained clusters (A–D) are spatially distinct. Two other, spatially intermingled clusters (E–F) illustrate how activity-based clustering can differentiate spatially similar components that would not be separated in clustering based on location alone.

118

has been accomplished, meaningful conclusionsabout brain function may be approached withmore confidence.

Summary and conclusions

This review is intended to provide an overview ofhow ICA is currently applied to EEG data de-composition and to time/frequency analysis ofEEG data in particular. As we have tried to illus-trate through sample results and explanations,ICA is a powerful tool for EEG analysis. Becauseof the numerous cortical EEG sources, as well asthe considerable variety of stereotyped non-brainEEG artifacts, experimental results for either av-eraged or unaveraged scalp channel data are in-herently ambiguous and may become even more sowhen data are pooled across subjects.

Nearly 11 years after the utility of ICA for EEGanalysis was first discovered by the senior authorand colleagues (Makeig et al., 1996), quite a fewEEG researchers have accepted ICA as an effectivemethod for removing stereotyped data artifacts in-cluding eye blinks and lateral eye movements, mus-cle activities, electrode or line noise, and pulseartifacts. ICs accounting for and isolating these ar-tifacts can be recognized easily. The value of sep-arately studying the scalp maps and activity timecourses of the non-artifact cortical components ex-tracted by ICA from high-density EEG data is stillless generally understood or adopted. We hope,however, that the benefits of this approach to EEG(and MEG) analysis will gradually be more under-stood, particularly as student researchers exploretheir own data using freely available analysis toolssuch as are being made available in EEGLAB orother available analysis environments.

We have discussed briefly how scalp channel av-erage ERPs, long enjoying a starring role in psy-chophysiological research, are in fact composed ofsignals from many cortical (as well as artifactual)sources. Without a method for spatially filtering thescalp-channel data to separate the underlying sourcesignals, they must be summed and thus confoundedin scalp channel averages. In fact, a peak in a scalpchannel ERP may not occur at a time point atwhich any of the contributing cortical signals actu-

ally reach a peak. This point is important to keep inmind when specific labels for ERP peaks (N100,P100, etc.) are supposed to represent latencies whensome ‘thing’ occurred in the subject’s brain. In fact,in most cases ERP peak latencies are only momentsat which the sum of the means of the underlyingcortical source signals create a net peak in theirsummed scalp potentials. In addition, it is well-demonstrated by research employing event-relatedspectral measures that ERP peaks often do not cor-respond to moments when mean EEG power peaks,either at the scalp channels or at the cortical sources.

While event-related average ERD and ERS, andmore generally ERSP measures provide more in-formation about event-related EEG dynamicsthan is available in average ERP measures, thesespectral power measurements are subject to similarspatiotemporal confounds as ERPs. A single-scalpelectrode receives projections from many corticalareas, thus spectral power measures are the resultof arbitrary summation and cancellation of dis-tinct source signals. Similarly, event-related aver-age ITC measures can be highly affected by sourcemixing, as was the case in Fig. 5 where the strengthof event-related ITC in at least one of the con-tributing IC processes was underestimated in theITC computed at a supervening scalp channel.

Inevitably, with the added benefits of ICA comesome additional complexity and inconvenience. Thiscomplexity, however, reflects the actual and likelyirreducible complexity both of the brain itself and ofthe recorded EEG signals. This complexity cannot bereduced, but instead only hidden and/or confoundedby modeling the data using simpler measures.

Chief among the problems introduced by ICAdecomposition is the issue of how to cluster ICsacross subjects and/or sessions. This is an evolvingresearch area that will likely become more widelyexplored as researchers discover new ways of work-ing with independent signal components. Our ownresults in this direction have convinced us thatcomponent clustering, when successfully accom-plished, can increase the amount, consistency, andutility of information about macroscopic event-re-lated brain dynamics that can be extracted fromhigh-dimensional EEG (or other electromagnetic)brain signals (Makeig et al., 2002, 2004b; Delormeand Makeig, 2003; Onton et al., 2005). We look

119

forward to the results that will come from nextdecade of research in this direction and beyond.

References

Anemueller, J., Sejnowski, T.J. and Makeig, S. (2003) Complexindependent component analysis of frequency-domainelectroencephalographic data. Neural Netw., 16: 1311–1323.

Arieli, A., Sterkin, A., Grinvald, A. and Aertsen, A. (1996)Dynamics of ongoing activity: explanation of the large var-iability in evoked cortical responses. Science, 273: 1868–1871.

Bell, A.J. and Sejnowski, T.J. (1995) An information-maxim-ization approach to blind separation and blind deconvolu-tion. Neural Comput., 7: 1129–1159.

Blair, R.C. and Karniski, W. (1993) An alternative method forsignificance testing of waveform difference potentials. Psy-chophysiology, 30: 518–524.

Budd, J.M. and Kisvarday, Z.F. (2001) Local lateral connec-tivity of inhibitory clutch cells in layer 4 of cat visual cortex(area 17). Exp. Brain Res., 140: 245–250.

Cardoso, J.-F. and Laheld, B. (1996) Equivariant adaptive sourceseparation. IEEE Trans. Signal Process., 44: 3017–3030.

Comon, P. (1994) Independent component analysis, a new con-cept. Signal Process., 36: 287–314.

Debener, S., Ullsperger, M., Siegel, M., Fiehler, K., von Cra-mon, D.Y. and Engel, A.K. (2005) Trial-by-trial coupling ofconcurrent electroencephalogram and functional magneticresonance imaging identifies the dynamics of performancemonitoring. J. Neurosci., 25: 11730–11737.

Delorme, A. and Makeig, S. (2003) EEG changes accompany-ing learned regulation of 12-Hz EEG activity. IEEE Trans.Neural Syst. Rehabil. Eng., 11: 133–137.

Delorme, A. and Makeig, S. (2004) Eeglab: an open sourcetoolbox for analysis of single-trial EEG dynamics. J. Neuro-sci. Method., 134: 9–21.

Freeman, W.J. (2004a) Origin, structure, and role of back-ground EEG activity. Part 1. Analytic amplitude. Clin. Ne-urophysiol., 115: 2077–2088.

Freeman, W.J. (2004b) Origin, structure, and role of back-ground EEG activity. Part 2. Analytic phase. Clin. Ne-urophysiol., 115: 2089–2107.

Hyvarinen, A., Karhunen, J. and Oja, E. (2001) IndependentComponent Analysis. John Wiley & Sons, New York.

Jung, T.P., Makeig, S., Westerfield, M., Townsend, J., Co-urchesne, E. and Sejnowski, T.J. (2001) Analysis and visu-alization of single-trial event-related potentials. Hum. BrainMapp., 14: 166–185.

Klopp, J., Marinkovic, K., Chauvel, P., Nenov, V. and Halgren,E. (2000) Early widespread cortical distribution of coherentfusiform face selective activity. Hum. BrainMapp., 11: 286–293.

Lauritzen, M. (1994) Pathophysiology of the migraine aura.The spreading depression theory. Brain, 117: 199–210.

Lee, T.W., Girolami, M. and Sejnowski, T.J. (1999) Independ-ent component analysis using an extended infomax algorithm

for mixed subgaussian and supergaussian sources. NeuralComput., 11: 417–441.

Linkenkaer-Hansen, K., Nikulin, V.V., Palva, S., Ilmoniemi,R.J. and Palva, J.M. (2004) Prestimulus oscillations enhancepsychophysical performance in humans. J. Neurosci., 24:10186–10190.

Luu, P., Tucker, D.M. and Makeig, S. (2004) Frontal midlinetheta and the error-related negativity: neurophysiologicalmechanisms of action regulation. Clin. Neurophysiol., 115:1821–1835.

Makeig, S. (1993) Auditory event-related dynamics of the EEGspectrum and effects of exposure to tones. Electroencepha-logr. Clin. Neurophysiol., 86: 283–293.

Makeig, S., Bell, A.J., Jung, T.P. and Sejnowski, T.J. (1996)Independent component analysis of electroencephalographicdata. Adv. Neural Inf. Process. Syst., 8: 145–151.

Makeig, S., Debener, S., Onton, J. and Delorme, A. (2004a)Mining event-related brain dynamics. Trends Cogn. Sci., 8:204–210.

Makeig, S., Delorme, A., Westerfield, M., Jung, T.P., Town-send, J., Courchesne, E. and Sejnowski, T.J. (2004b)Electroencephalographic brain dynamics following manuallyresponded visual targets. PLoS Biol., 2: E176.

Makeig, S., Westerfield, M., Jung, T.P., Covington, J., Town-send, J., Sejnowski, T.J. and Courchesne, E. (1999) Func-tionally independent components of the late positive event-related potential during visual spatial attention. J. Neurosci.,19: 2665–2680.

Makeig, S., Westerfield, M., Jung, T.P., Enghoff, S., Townsend,J., Courchesne, E. and Sejnowski, T.J. (2002) Dynamic brainsources of visual evoked responses. Science, 295: 690–694.

Massimini, M., Huber, R., Ferrarelli, F., Hill, S. and Tononi,G. (2004) The sleep slow oscillation as a traveling wave. J.Neurosci., 24: 6862–6870.

Molgedey, L. and Schuster, H.G. (1994) Separation of a mix-ture of independent signals using time delayed correlations.Phys. Rev. Lett., 72: 3634–3637.

Nunez, P. and Srinivasan, R. (2005) Electric Fields of the Brain:The Neurophysics of EEG. Oxford University Press, NewYork.

Onton, J., Delorme, A. and Makeig, S. (2005a) Frontal midlineEEG dynamics during working memory. Neuroimage, 27:341–356.

Onton, J. and Makeig, S. (2005b). Independent componentanalysis (ICA) source locations vary according to task de-mands. Org. Hum. Brain Mapp., Abstracts.

Pfurtscheller, G. and Aranibar, A. (1977) Event-related corticaldesynchronization detected by power measurements of scalpEEG. Electroencephalogr. Clin. Neurophysiol., 42: 817–826.

Scherg, M. and Von Cramon, D. (1985) Two bilateral sourcesof the late AEP as identified by a spatio-temporal dipolemodel. Electroencephalogr. Clin. Neurophysiol., 62: 32–44.

Tallon-Baudry, C., Bertrand, O., Delpuech, C. and Pernier, J.(1996) Stimulus specificity of phase-locked and non-phase-locked 40 Hz visual responses in human. J. Neurosci., 16:4240–4249.

120

information-based modeling of event-related brain dynamicsscott/pdf/onton_makeig_pbr06.pdf ·...

Documents