a computational model of how the basal ganglia produce sequences

A Computational Model of How the BasalGanglia Produce Sequences

Gregory S BernsUniversity of Pittsburgh

Terrence J SejnowskiHoward Hughes Medical Institute The Salk Institute andUniversity of California San Diego

Abstract

We propose a systems-level computational model of thebasal ganglia based closely on known anatomy and physiologyFirst we assume that the thalamic targets which relay ascend-ing information to cortical action and planning areas are toni-cally inhibited by the basal ganglia Second we assume that theoutput stage of the basal ganglia the internal segment of theglobus pallidus (GPi) selects a single action from several com-peting actions via lateral interactions Third we propose that aform of local working memory exists in the form of reciprocalconnections between the external globus pallidus (GPe) andthe subthalamic nucleus (STN) As a test of the model thesystem was trained to learn a sequence of states that requiredthe context of previous actions The striatum which was as-sumed to represent a conjunction of cortical states directlyselected the action in the GP during training The STN-to-GPconnection strengths were modied by an associative learning

rule and came to encode the sequence after 20 to 40 iterationsthrough the sequence Subsequently the system automaticallyreproduced the sequence when cued to the rst action Thebehavior of the model was found to be sensitive to the ratioof the striatal-nigral learning rate to the STN-GP learning rateAdditionally the degree of striatal inhibition of the globuspallidus had a signicant inuence on both learning and theability to select an action Low learning rates which would behypothesized to reect low levels of dopamine as in Parkin-sonrsquos disease led to slow acquisition of contextual informationHowever this could be partially offset by modeling a lesion ofthe globus pallidus that resulted in an increase in the gain ofthe STN units The parameter sensitivity of the model is dis-cussed within the framework of existing behavioral and lesiondata

INTRODUCTION

The basal ganglia are a collection of subcortical struc-tures that are relatively large in primates particularly inhumans Although much is now known about both thetypes of neurons that comprise these structures andtheir connectivity relatively little is known about theoverall function of the basal ganglia Lesion studies bothin lower primates and in humans consistently point to arole in motor function yet it is known that several partsof the basal ganglia receive massive projections from theprefrontal cortex suggesting a role in planning and cog-nition In this paper we study an anatomically motivatedcomputational model that integrates experimental datafrom the molecular to the behavioral levels and suggestsa function for each part of the basal ganglia

Approximately 80 of both the striato-pallidal andpallido-thalamic neurons are GABAergic a prevalence ofinhibition that is uncommon in the central nervous sys-tem yet no satisfactory explanation exists for this nd-ing In particular the direct striato-pallidal-thalamic

copy 1998 Massachusetts Institute of Technology Journal of Cognitive Neuroscience 101 pp 108ndash121

pathway is comprised of two GABA neurons in serieswith each other A fundamental question is what advan-tage this arrangement confers over a single excitatorysynapse Findings suggest the maintenance of corticaltopography throughout the basal ganglia which hasraised the possibility that parallel streams of informationproject through the structure but with relatively littleintegration being performed (Alexander DeLong ampStrick 1986) However it is also known that the inputstage of the basal ganglia the caudate and putamen(collectively referred to as the striatum) receives inputsfrom almost the entire cortex There is also a conver-gence in neuron number from the striatum to the outputstage the globus pallidus These two levels of massiveconvergence suggest that the basal ganglia is involved inintegrating many types of information within eachstream to either plan or select an action from the manycompeting possibilities represented in the cortex

The primary source of dopamine in the brain is foundin the substantia nigra pars compacta (SNc) and ventraltegmental area (VTA) both of which have close ties with

the other basal ganglia structures and themselves areoften considered part of the basal ganglia Dopamine hasbeen implicated in reward-driven learning (SchultzApicella amp Ljungberg 1993 Schultz Apicella Scarnati ampLjungberg 1992) and the VTA is known to be a self-stimulation site (Cooper Bloom amp Roth 1996) The roleof dopamine seems to be closely related to motor behav-ior and the need to perform an action in so-called oper-ant tasks where rewards are contingent upon acting Yetdopamine has also been implicated in cognitive decitsespecially in regards to schizophrenia where one predic-tor of pharmacological efcacy of an antipsychotic drugis its dopamine-receptor afnity In this paper we pro-pose a model for the role of dopamine that integrates itsrole in reinforcement with that of the aforementionedmotor planning

While the connectivity of the basal ganglia and itsventral extensions the nucleus accumbens and ventraltegmental area has been well described there is noconsensus regarding the types of computations thesestructures perform Although the overall function is be-lieved to be related to planning and executing actionsespecially sequences of actions it is not clear how thecircuitry could accomplish this Previous attempts toassign function to the basal ganglia circuit have primarilyrelied upon heuristic arguments without any analyticalor computational analysis to test these ideas Swerdlowand Koob (1987) attempted to understand certain as-pects of psychiatric disease by proposing a model basedon nested loops of activity through the ventral parts ofthe basal ganglia However their model did not attemptto characterize the behavior of the proposed circuitSeveral models have been developed that attempt tointegrate the role of cellular reinforcement with actionselection in the basal ganglia (Barto 1995 Berns ampSejnowski 1996 Graybiel Aosaki Flaherty amp Kimura1994 Houk Adams amp Barto 1995 Montague Dayan ampSejnowski 1996 Wickens amp Kotter 1995) although com-pelling at the level of classical conditioning these mod-els have not yet shown how a complex sequence ofactions could be implemented by the basal ganglia Wepropose in this paper that the basal ganglia project tocortical areas that implement actions and that they ltermultimodal information by selecting previously learnedoptimal actions based on the instantaneous cortical stateWe further show how the connectivity is ideally suitedto the production of optimal action sequences An earliermodel which was restricted to action selection hasappeared elsewhere (Berns amp Sejnowski 1996)

MODEL

We modeled the basal ganglia as groups of simpliedneurons that corresponded to the various divisions ofthe structure The primary assumption was that the basalganglia facilitate the production of action sequencesbased on cortical states (for review of this hypothesis

see Cromwell amp Berridge 1996 Marsden amp Obeso1994) In order to do this the basal ganglia must haveboth a mechanism that selects competing actions and amechanism by which to learn sequences Thus ourmodel demonstrates how the architecture of the basalganglia is well suited to the production of action se-quences and how diffuse reinforcement mechanismsallow for learning

Selection Model

The output stage of the basal ganglia the internal seg-ment of the globus pallidus (GPi) is known to be almostwholly GABAergic and tonically active Thus the GPitonically inhibits the target thalamic nuclei (VLmVLpcmc VLo CM) Because these nuclei may also gateascending information to their cortical targets (motorsupplementary motor prefrontal cortices) it is reason-able that they should be tonically inhibited until theascending information is required for action Further-more release from tonic inhibition in the thalamus leadsto a very rapid postinhibitory rebound Inhibition of thecorresponding GPi neurons may thus lead to a morerapid response in the thalamus than if they were directlyexcited Such a rapid response would be necessary toproduce rapid sequences of activity In our model weassumed that a mechanism exists in the GPi so thatwithin a pool of neurons only one is inhibited at a timea form of ldquoloser-take-allrdquo (Berns amp Sejnowski 1996)

Each structure within the basal ganglia was con-structed of a set of units each of which representedsome locally distributed function (Figure 1) The striatalunits represented pools of spiny neurons that reect thecortical state which itself is comprised of both externalrepresentations (eg parietal areas) and internal plan-ning representations (eg frontal cortex) The striatummapped the cortical state onto a nite set of actions asrepresented in the globus pallidus internus (GPi) oroutput layer The striatal-GPi or ldquodirectrdquo (Alexander ampCrutcher 1990) pathway thus allows for the direct map-ping of the striatal state onto a particular action How-ever if one assumes that the striatum reects the corticalstate the direct pathway really maps the cortical activityonto a nite set of possible actions in the globus palliduswithout any automatic sequencing occurring in the basalganglia

It has been hypothesized that the basal ganglia areinvolved in the production of action sequences (Crom-well amp Berridge 1996) In order to produce a sequenceefciently either a short-term memory must exist or theoutput state must be fed back to the input layer Thereis evidence for the latter in the form of segregatedcortical-subcortical loops (Alexander et al 1986 Hed-reen amp DeLong 1991) that presumably feed back infor-mation from the globus pallidus to the cortical area fromwhich the inputs originated However the time courseof activity propagation through this loop is probably too

Berns and Sejnowski 109

long to allow for action sequences on the millisecondtime scale Thus if the sequencing hypothesis is truethere must be either local short-term memory or localfeedback within the basal ganglia In fact the presenceof local feedback itself could result in short-term mem-ory

The STN-GP loop contains recurrent projectionsthrough the GPe (Smith Wichmann amp DeLong 1994a)and we propose that this is a putative site of localshort-term memory This was modeled by diffuse excita-tory projections from STN to GP and specic inhibitoryconnections from GP to STN (Figure 2) The activity ofeach STN unit represented the average ring frequencyof a pool of neurons and was modeled as a leaky inte-grator

t dBi(t)

dt = Bi(t) a Gi(t d ) (1)

where Bi(t) is the activity of the i-th STN unit Gi(t) isthe activity of the corresponding GP unit (the sign isnegative because it is an inhibitory synapse via the GPe)a is a constant specifying the magnitude of the inhibi-tory synapse t is the time constant and d is the synapticdelay between the GP and STN units With a gt 1 theinhibition overrides the STN excitation The time con-stant t in Equation 1 represents both the actual mem-brane time constant and the time constant associatedwith activity in the recurrent STN-GPe loop For thesimulations a nonlinear discretized version of Equation 1was used and the membrane potentials were convertedto normalized representations of ring frequency by asigmoidal function s For each STN unit this was deter-mined by

Bi(t) = s [ l Bi(t 1) (1 l ) a Gi(t n)] (2)

Figure 1 Overall schematic of the basal ganglia model Thestriatum (STR) was modeled as two types of neurons a set of ma-trix neurons projecting to the globus pallidus and a set of strioso-mal (or patch) neurons projecting to the substantia nigra parscompacta (SNc) and ventral tegmental area (VTA) The striatal toglobus pallidus pathway diverges with inhibitory projections toboth the external (GPe) and internal (GPi) segments Similarly thesubthalamic nucleus (STN) has divergent excitatory projections toboth segments The GPe also has a recurrent inhibitory projectionto the STN which allows for the short-term storage of activitypatterns The GPi is assumed to be arranged topographically accord-ing to action and projects to the corresponding thalamic (Thal)neurons which are tonically inhibited The SNcVTA signal repre-sented a diffuse synaptic reinforcement signal which was givenby the difference between convergent projections from thestriatum and the GPi The striatal projection represented a predic-tion of reinforcement the summed GPi projection represented anestimate of the degree of match between the STR-GPi andSTN-GPi signals and the SNcVTA signal was the differencebetween the two

Figure 2 Detailed schematic of the network model The globus pal-lidus has been simplied to one set of units Each unit represented apool of related neurons in a particular structure and correspondedto some discrete action The GPe was represented by a xed projec-tion to corresponding STN units and the STN-GPi projection was amatrix of weights (wij) which was modied by the learning rulegiven in Equation 6 The striatum was the input layer and repre-sented a conjunction of cortical states and specied which action toselect during learning by inhibiting one of the GP units The GPlayer specied the action to perform and the GP unit with the mini-mum activity was assumed to be the one that selected the action Ad-ditionally the GP layer projected topographically to the subthalamicnucleus via the GPe-STN pathway The STN layer had two units foreach action one with a short time constant (7 msec) and one witha long time constant (90 msec) The time constants specied thetime interval over which the STN activities changed signicantly

110 Journal of Cognitive Neuroscience Volume 10 Number 1

where

l = t

t + D t(3)

and the sigmoid function with a gain of g and a bias ofb is given by

s [x] = 1

1 + e g (x b )(4)

and t now represents the time-step number n representsthe number of time steps corresponding to the synapticdelay and D t is the length of the time step l ranges from0 to 1 As shown in Figure 2 the activity of the GP unitswas determined by both striatal and STN inputs Thestriatal input was represented by a xed unit-to-unitinhibitory connection and the STN input was weightedby the strength of the modiable connections w Thesum of these weighted inputs was then transformed bya sigmoid function to give the representation of ringfrequency

Gi(t) = s aeligccedilegravearing

j

wij (t)Bj(t) a Si(t) + h ioumldivideoslash

(5)

where wij(t) is the matrix of connection strengths be-tween STN unit j and GP unit i at time t Si(t) is theactivity of striatal unit i at time step t and h i is the levelof noise drawn from a uniform distribution The actionultimately selected was assumed to be represented bythe GP unit with the minimum activity a form of loser-take-all (Berns amp Sejnowski 1996) Because the GP inhib-ited the thalamus the pool of GP neurons with thelowest activity corresponded to the thalamic pool withthe maximum activity

Synaptic Plasticity

Several models of cellular reinforcement have been pro-posed all of which rely on the use of a scalar signal tomodify synaptic strengths (Houk et al 1995 MontagueDayan Person amp Sejnowski 1995 Montague et al 1996Sutton amp Barto 1990 Wickens amp Kotter 1995) Thesemodels have typically proposed that extrinsic rewardsare mapped onto a structure such as the ventral tegmen-tal area (VTA) and that dopamine is released in propor-tion to the extrinsic reward (Houk et al 1995 Wickensamp Kotter 1995) Dopamine is then proposed to modifysynaptic strengths If dopamine modies synapticstrength it must be released in close temporal proximityto the activity that precedes it Problems with previousmodels arise because extrinsic reward occurs at times

remote from the activity that produced the action Thisproblem can be partially overcome by using a temporal-difference model in which dopamine is released in pro-portion to errors in prediction of reward (Houk et al1995 Montague et al 1995 Sutton 1988 Sutton amp Barto1990) but this still requires either the storage of activitytraces for subsequent modication of synaptic strengthor the use of a diffusible messenger that regulates localdopamine release (Montague amp Sejnowski 1994)

We propose an alternative learning model Because ofthe dichotomy of information ow through both theinternal and external segments of the globus pallidus itis possible for the striatum to ldquotrainrdquo a sequence Thestriatal activity represents a mapping of the cortical stateonto a nite set of possible actions which are imple-mented in the GPi and in the untrained state thestriatum can directly select the action via the directpathway However the striatum also projects to the GPeas does the STN Because the GPe-STN loop can alsostore short-term activity traces the connection strengthscan be modied through associative learning mecha-nisms such that the GPe-STN can produce a sequence ofstates that are learned from the striatum

We used a modied Hebbian learning rule (Hebb1949) given by

D wij (t) = r (e(t)Gi(t) Si(t))Bj(t) (6)

where D wij(t) is the weight change in the connectionfrom STN unit j to GP unit i at time t r is the learningrate e(t) is a scalar error signal from the SNcVTA at timet Gi(t) is the activity of GP unit i and Bj(t) is the activityof STN unit j The rst term r e(t)Gi(t)Bj(t) represents aclassical Hebbian synapse with presynaptic activity fromSTN and postsynaptic activity from GP modulated by theerror term e(t) The second term r Si(t)Bj(t) representsa different form of plasticity that decreases the synapticefcacy when the postsynaptic cell is actively inhibitedby a GABAergic synapse This is an important differencebecause it distinguishes the situations when a postsynap-tic cell is inactive because of lack of presynaptic activityfrom that in which the postsynaptic cell is inactivebecause it has been actively inhibited in this case by thestriatum In the latter case any concurrently active exci-tatory synapse decreases in strength whereas in theformer case no change occurs

The error signal e(t) was generated by the mismatchbetween the direct and indirect pathways and was as-sumed to originate from the dopaminergic neurons inthe substantia nigra (SNc) and ventral tegmental area(VTA) As shown in Figure 3 the striatum sends a con-vergent projection to the error unit and the GP alsosends a convergent projection to the error unit via ahypothesized inhibitory neuron During the trainingphase the striatal inhibition overrides the STN excitationof GP which results for extreme levels of activity in thelogic table shown in Table 1 Thus only when the STN


input to a GP cell is high and there is no striatal inputis there signicant GP activity which functions as a typeof inverse coincidence detector By summing all the GPactivities a scalar estimate of the total degree of coin-cidence is produced A striatal estimate is produced bysumming the vector of striatal activities weighted by theconnections vi(t) (Figure 3) Ideally this would matchthe estimate obtained from the GP and the differencebetween the two represents the error signal at timestep t

e(t) = aring i

(Gi(t) vi(t)Si(t)) (7)

where vi(t) is the connection strength from striatal uniti to the error unit Furthermore the connectionstrengths vi are also modied by a rule similar to thatof Equation 6 so eventually the error signal tends towardzero

D vi(t) = r e(t)Si(t) (8)

RESULTS

During the training phase in the simulation with veunits the activities of the GP units gradually increasedas the STN inputs to them increased (Figure 4) Thewinning unit was the one with lowest activation andduring training the striatum directly selected the win-ning unit Hence at each time step one and only oneunit was off The STN weights associated with the GPunits increased throughout training until the error haddeclined to zero As shown in Figure 5 the weight of theconnection from the STN unit that corresponded to theaction preceding the one in the GP grew at a slower ratethan the others This reected the fact that on averageboth the pre- and postsynaptic activities for this synapsewere lower than the others

During training the internal reward by which wemean the scalar measure of the match between thestriatal and GP activities and computed by the sum ofGP activities steadily increased (Figure 6A) Most of thiseffect resulted from the fact that the connectionstrengths from the STN were increasing driving the GPunits to higher activities Part of the effect was also dueto the increasingly better match between the striatal andSTN inputs to the GP as illustrated in Table 1 The errorsignal which represented the difference between thereward predicted by the striatal patch and that of theactual internal reward (from the sum of the GP activi-ties) increased during the initial learning phase in whichthere was a rapid change in connection strengths Thissteadily decreased as the striatal weights vi becamemodied to predict the amount of reward associatedwith a particular striatal state (Figure 6B) As the errortended toward zero further changes in connectionstrength ceased

In order to test the modelrsquos ability to reproduce asequence trained in the aforementioned manner thestriatal activity vector corresponding to the initial statein the training sequence was loaded for one time stepinto the striatal units As shown in Figure 7 the sub-sequent GP activities reproduced this sequence wellhowever the noise led to an imperfect reproduction as

Figure 3 Schematic of the errorreinforcement mechanism Thestriatum (STR) sends a convergent inhibitory projection to thedopamine containing neurons of the substantia nigra compacta(SNc) and ventral tegmental area (VTA) It is assumed that this pro-jection arises from the striosomal compartment and these connec-tions (vi) were modiable in strength The globus pallidus (GP) unitsalso projected convergently to the SNcVTA but via an inhibitory in-terneuron which effectively reversed the sign of the signal The GPprojection represented the sum total of the matches between theGP activity and the striatal activity According to Table 1 the highestmatch occurred when maximal subthalamic nucleus (STN) activa-tion coincided with minimal striatal inhibition The output of theerror unit (e) computed as the difference between the weightedstriatal signal and the GP signal was projected back to both theSTR-SN (vi) and STN-GP (wij) weights where it modulated weightchanges according to Equations 8 and 6 respectively

Table 1 Logic table for the GP units given hypotheticallyextreme values of input from both the STN and striatumStriatal inhibition overrides STN excitation so maximal GPactivation occurs with both maximal STN activity andminimal striatal activity

STN Striatal GP

0 0 0

1 0 1

0 1 0

1 1 0


evidenced by the error at time step 12 The ambiguityassociated with unit 2 is resolved by the differentialactivations between the short and long STN units Morecomplex sequences typically of length greater than 10exceeded the modelrsquos ability to reliably reproduce themMore complex ambiguities also resulted in incorrectreproductions

The parameter sensitivity revealed that the model isrobust but beyond certain limits various degradationsoccurred The ratio of STRGP learning rates signicantlyaffected performance Decreasing the ratio to 1 resultedin persistently high error signals because the STR pre-diction was slow to learn which in turn resulted in STNweights that continued to increase longer With persist-ently increasing weights the model lost the ability todisambiguate the context of certain activity patternsyielding the sequence 1 2 5 1 2 5 With a learningratio of 4 the same sequence was produced but this wasdue to the rapid cessation of learning as the striatalweights rapidly adjusted and the error went to zero

before many of the STN to GP weights had achievedtheir correct values Diminishing the degree of inhibitoryoverride by decreasing a to 1 resulted in maximal acti-vation of all the GP units during training because thestriatum had insufcient inhibition to directly select anaction The end result after training was a uniformweight matrix with all weights close to 1 With thisweight matrix the sequence could not be produced atall Changing the gain ( g ) and bias ( b ) parameters withgains ranging from 2 to 8 and biases ranging from 01to 02 did not signicantly affect the production ofsequences however certain combinations of gain andbias yielded GP activities that were subtly different

The aforementioned sequence demonstrated how themodel learned a sequence requiring the disambiguationof context We also tested the modelrsquos ability to shiftbetween a random sequence and a repeating 10-itemsequence This was done in part to test the model on awell-studied behavioral task of procedural learning (Will-ingham Nissen amp Bullemer 1989) The model was pre-

Figure 4 Unit activities during learning the sequence 1 2 3 4 2 5 With layers of ve units each activities are shown from 0 (black) to 1(white) for striatum (STR) globus pallidus (GP) and the two subthalamic nucleus layers with short time constant (STN Short) and long timeconstant (STN Long) Panel A shows the activity patterns during the initial 20 time steps of training and Panel B shows the activity patterns af-ter 200 time steps Using the parameters given in Table 2 the striatum trained the globus pallidus to produce a sequence of actions Initiallythe GP activities were low and disorganized because of minimal excitation from the STN Subsequently the weights and hence the GP activi-ties increased except for those corresponding to the action that was actively inhibited by the striatum


sented with 100 trials of randomly ordered stimuli (1 23 or 4) then 40 repetitions of the sequence 4-2-3-1-3-2-4-3-2-1 followed by another 100 trials of random stimuliIn order to compare the GP output to previously re-ported reaction times the GP output was linearly trans-formed by

R(t) = 1 1N

aring i=1

N

Gi(t) (9)

where N was the number of GP units (4 in this case)R(t) represented a normalized reaction time at time stept and ranged from 0 to 1 This linearly scaled the matchbetween the direct and indirect pathways with the bet-ter the match the lower the reaction time As shown inFigure 8A the reaction time initially declined even witha random sequence and then rapidly reached a stablelevel with the introduction of the repeating sequence Itstabilized at the value 025 because the inherent struc-ture of the sequence allowed for maximal activation ofall GP units except the one being selected When therandom sequence was reintroduced the normalized re-action time became slightly longer

We also used this paradigm to model the effects ofParkinsonrsquos disease and the subsequent improvement ofsymptoms from pallidotomy (Figure 8 parts B and C)Parkinsonrsquos disease was modeled by decreasing thelearning rate ( r in Equation 6) from 0025 to 0005reecting the overall decline in dopamine that is found

in Parkinsonrsquos disease This resulted in substantiallyslower learning as evidenced by the lower slope inFigure 8B but because the GP activations were generallylower the effect of noise was also more prominent Theeffects of the decreased learning rate could largely beameliorated by increasing the gain of both the GP andSTN units from 4 to 8 As the gain was increased unitsthat were previously marginally active became maxi-mally active and thus the rst term in Equation 6 in-creased partially offsetting the decreased learning rateThis suggests that a potential mechanism for the efcacyof pallidotomy is in the alteration of the gain of pools ofneurons in both the STN and GP One prediction is thateven though the rate of learning is partially restored theeffect of noise still remains

Figure 5 Changes in connection strengths wij from learning the se-quence 1 2 3 4 2 5 The ve weights from the ve STN units withshort time constants to GP unit 2 are shown The three weights thatincreased to saturation levels were from STN units 2 3 and 5 (iethose STN units that were not active prior to GP unit 2 being ac-tive) Conversely the weights from STN units 1 and 4 did not in-crease signicantly because when these units were active GP unit 2was inhibited by the striatum

Figure 6 Levels of the ldquorewardrdquo (A) from the GP and the error sig-nal from the SNcVTA (B) during learning the sequence 1 2 3 4 25 The reward was computed as the sum of the GP activities andwas proportional to how well the GP activity vector matched the in-verse of the striatal activity vector As the system learned to producethe sequence the match and hence the reward increased The errorsignal which was computed by Equation 7 represented the differ-ence between a weighted sum of the striatal activity and the sum ofthe GP activity The weights on the striatal activities were modiedby the error signal and thus the difference ultimately converged tozero Note that the variance also decreased


DISCUSSION

We have presented a model of the function of the basalganglia that was based closely on known neuroanatomyand we have demonstrated how such a neuron-levelmodel can both learn and reproduce action sequencesThis differs from other connectionist models based onback-propagation (Cleeremans amp McClelland 1991) Al-though these have been quite successful at modelingbehavioral data they are not anatomically or physiologi-cally motivated Our model attempted to incorporateseveral proposed functions in the basal ganglia Actionselection was assumed to occur through lateral inu-ences in the globus pallidus whereas sequences wereproduced via recurrent projections in the GPe-STN loopFurthermore both these proposals were based on the

Figure 7 Globus pallidus activities after learning the sequence 1 23 4 2 5 and being cued to the beginning of the sequence by in-hibiting GP unit 1 for one time step Activities are shown from 0(black) to 1 (white) for striatum (STR) globus pallidus (GP) andthe two subthalamic nucleus layers with short time constant (STNShort) and long time constant (STN Long) Because of the STN-GPweights that were learned the system subsequently reproduced thesequence without further input from the striatum The GP layer al-though noisy correctly produces the sequence with the correct unithaving minimal activity at each time step

Figure 8 Modeling the effects of Parkinsonrsquos disease (PD) and sub-sequent pallidotomy The model was presented with 100 trials of ran-dom responses followed by 400 trials of stimuli from a xed re-peating sequence (4-2-3-1-3-2-4-3-2-1) (Willingham et al 1989) andthen by another 100 random trials The output of the globus pal-lidus (GP) units was transformed to represent a normalized reactiontime by a linear mapping and ranged from 0 to 1 (see text) In theNormal model ( r = 0025 gain = 4 bias = 02) the GP output be-came more organized and consequently reaction time improvedeven with the random sequence When the repeating sequence wasintroduced the GP activation became even more organized and reac-tion time improved further until the nonselected GP units weremaximally activated When the random sequence was reintroduced(t = 500) the activation pattern was no longer maximal and the re-action time worsened slightly Parkinsonrsquos disease was modeled bydecreasing the overall learning rate ( r = 0005 gain = 4 bias = 02)With these parameters the rate of improvement in reaction timewas slower and showed a greater variance of response for the samelevel of noise in the system Pallidotomy was modeled by increasingthe gain of the GP and STN units ( r = 0005 gain = 8 bias = 02) Al-though the variance still remained high the rate of reaction time im-provement was partially improved


assumption that segregated streams of information owthrough the basal ganglia (Alexander et al 1986) Weformalized this segregation by representing each poten-tial action by a separate unit a so-called grandmothercell however this was chosen for computationalefciency Each of the individual units in our model morerealistically represents a pool of neurons devoted to aparticular action but the segregation requirement forpools of neurons remains

Although there is good evidence for cortical topogra-phy being maintained throughout the basal ganglia (Al-exander amp Crutcher 1990 Alexander et al 1986Goldman-Rakic amp Selemon 1986 Parent 1990) it is alsoknown that the segregation is not complete The massiveconvergence from striatum to globus pallidus (Wilson1990) alone requires that inputs cannot remain com-pletely segregated (Flaherty amp Graybiel 1994 Hedreenamp DeLong 1991) However our model is consistent withthis convergence We modeled the striatum as an inputstage in which diverse areas of the cortex map ontosubsets of neuron pools In this manner diverse sensori-motor modalities are combined with higher repre-sentations of both context and timing information fromthe prefrontal areas It is the functional mapping fromstriatum to globus pallidus and thalamus that remainssegregated In other words the segregation may reectthe nal cortical targets not the afferents Work withretrograde transneuronal transport of herpes simplexvirus injected into the cortical motor areas suggests thatthe output stages of the basal ganglia are indeed organ-ized into discrete channels that correspond to theirtargets (Hoover amp Strick 1993)

The basal ganglia have long been assumed to facilitateaction sequencing (Cromwell amp Berridge 1996) anddespite the fact that there is not a great deal of experi-mental evidence for the sequencing hypothesis ourmodel does support this notion In fact a review ofhuman behavioral syndromes associated with variousbasal ganglia lesions did not nd any reports of sequenc-ing disturbance (apraxia) (Bhatia amp Marsden 1994) Analternative but related hypothesis is that the basal gan-glia monitor automatic behavior and alter it in novelcontexts (Marsden amp Obeso 1994) Our model focusedon the role of the pallidal-subthalamic loop in facilitatingsequence production but the output of this networkmay be used in other ways For example the sequenceinformation could be used to monitor the reliability ofsequential predictions maintained elsewhere such as inthe cortex or the striatum In our model this actuallyoccurs in the degree of match between the direct andindirect pathways

The primary requirement for automatic sequence pro-duction was the presence of recurrent connections thatboth relay information regarding previous actions andresult in short-term memory Until recently the presenceof recurrent projections has not been a prominentnding in basal ganglia anatomy however such path-

ways exist locally in the primate pallidal-subthalamicloop (Smith et al 1994a Smith Wichmann amp DeLong1994b) It has previously been argued that the mostobvious recurrent pathway is the thalamo-cortical pro-jection that presumably closes the cortical-subcorticalloop (Alexander et al 1986 Berns amp Sejnowski 1996Graybiel et al 1994 Houk et al 1995 Middleton ampStrick 1994) but the time taken for impulses to traversethis loop appears to be on the order of 100 msec (Wil-son 1990) which would be too long for rapid automaticsequencing Within the basal ganglia the striatum itselfmay have memory properties and there is evidence thatstriatal lesions impair innate grooming sequences (Crom-well amp Berridge 1996) In slice preparations striatal cellsshow bistable activity existing in either a low state or ahigh subthreshold state from which action potentialsirregularly appear (Kawaguchi Wilson amp Emson 1989Wilson 1986) and may represent context information forthe conjunction of either cortical states or in the caseof grooming other brainstem inputs Our model does notdirectly address the role of the striatum in action se-quencing other than representing a conjunction of in-puts but rather the model demonstrates how theGP-STN loop could automate sequence learning whichmay be different from the mechanisms for innate se-quences

Similarly we have not attempted to incorporate therole of the prefrontal cortex (PFC) in the model eventhough it is closely connected to the basal ganglia Alarge body of literature has implicated the prefrontalcortex especially the lateral areas in tasks requiring themaintenance of information for short periods of time(Fuster 1973 Goldman-Rakic 1987) Both the architec-ture and size of the prefrontal areas suggest that the PFCperforms far more complex operations than sequencingalthough this may be part of its function As noted abovethe contribution of the basal ganglia may be in a moni-toring role following along with highly automated se-quences and detecting deviations from the expectedpredictions Another important difference is the timescale on which these two structures may operate PFCneurons can hold information on the order of seconds(Fuster 1993) whereas the proposed local memory inthe basal ganglia may be operating on the scale of 10 to100 msec

We have proposed that the GPe-STN loop stores short-term traces of activity and functions as working memoryfor the production of action sequences In a study ofpallidal activity during sequential arm movementsMushiake and Strick (1995) found subsets of pallidalneurons that only displayed activity changes generallydecreases during specic phases of the movement se-quence These neurons did not display the same changeswhen the same task was explicitly guided suggesting arole in both sequencing and memory The function of thesubthalamic nucleus has remained elusive and has tradi-tionally been conceived in terms of tonic excitation of


the globus pallidus to prevent unwanted activity frombeing initiated Although no direct evidence exists forthe STNrsquos role in working memory both in vivo record-ings and lesion data can be interpreted within this frame-work Studies by Wichmann Bergman and DeLong(1994) demonstrated that STN neurons re around theonset of movement which would be considered a rela-tively late increase in activity This would however beconsistent with a storage function Similarly pallidal ac-tivity changes have been found to occur too late formovement initiation (Mink amp Thach 1991) STN lesionshave not consistently been shown to decrease the ringrate of GP neurons which one would expect if its func-tion were solely tonic excitation but instead they regu-larize the GP ring pattern (Ryan amp Sanders 1993)Conversely lesioning the globus pallidus results in anincrease in STN bursting cells (Ryan Sanders amp Clark1992) The latter lesion may represent a large-scale inhi-bition of the GP somewhat analagous to the effect of thestriatum on the GP but on a local scale The appearanceof bursting would be consistent with a memory storagefunction as activity oscillates through the remaining cellsin the GPe-STN loop STN lesions have been found tocause a condition in which animals are subject to invol-untary large movements of whole limbs (hemiballismus)Conceptualizing this within the framework of a localmemory function one would hypothesize that the lossof local memory that occurs with a subthalamic lesionresults in the loss of context for movements In this casestriatal representations cannot be disambiguated with-out the context of preceding actions and thus may resultin the activation of large pools of neurons with theeffect being a large involuntary movement when theintention was a small specic one

A major assumption of our model is that thedopamine-containing neurons of both the substantia ni-gra pars compacta (SNc) and the ventral tegmental area(VTA) modulate synaptic efcacy in response to intrinsicreward There is strong evidence that dopamine plays animportant role in at least extrinsic reward-driven learn-ing In studies by Schultz et al (Ljungberg Apicella ampSchultz 1992 Schultz et al 1993 Schultz et al 1992) ithas been demonstrated that dopamine neurons of theVTA re transiently during the learning of an operanttask when a reward is given but that after the task islearned the dopamine neurons do not re in responseto reward Furthermore when reward is withheld afterthe task is learned the dopamine neurons show a tran-sient depression in activity (Schultz et al 1993) Thisresult is consistent with the existence of a projection tothese neurons that contains predictive information re-garding future reward In a previous model of me-sencephalic dopamine systems it was proposed thatdopamine is released in proportion to a temporal-differ-ence of reward prediction but this model required acomplete temporal representation of preceding stimuliin the form of a tapped-delay line (Montague et al 1996)

A similar type of representation appears in the presentmodel in the form of multiple time constants within theGP-STN loop Furthermore we propose that in additionto ring in response to extrinsic rewards the dopamineneurons also re in response to intrinsic rewards bywhich we mean internal representations of how well thebasal ganglia are performing and which we have mod-eled as originating from the GPi We hypothesize that thestriatonigral (or in the ventral striatum the accumbal-VTA projection) carries a prediction of reward (Barto1995 Sutton amp Barto 1990) and because of its knownprojections to the dopamine-containing structures thisarises from the striosomal compartment (Graybiel 1990Houk et al 1995)

As implemented in our model the dopamine activitycan be either positive or negative We have modeled therole of dopamine as modulatory rather than as either anexcitatory or inhibitory inuence From Equation 7 thequantity e(t) represents this signal and species both themagnitude and direction by which the synapse efcacychanges Although it is not formalized in the equationthe tacit assumption in allowing e(t) to be either positiveor negative is that a tonic level of dopamine activity isrequired to maintain the synaptic efcacy Thus a depres-sion from maintained activity would lead to a decreasein synaptic efcacy whereas an increase in dopamineactivity would lead to an increase in synaptic efcacyassuming the correlation of both pre- and postsynapticactivities There is evidence that the release of dopaminein the striatum is biphasic and depends on the learnedavailability of actions In an experiment in which micewere allowed to escape foot shock there was an in-crease in the accumbal concentration of the dopaminemetabolite 3-methoxytyramine (3-MT) but a decrease in3-MT if the mice were not allowed to escape (Cabib ampPuglisi-Allegra 1994) indicating that dopamine releaseboth increases and decreases according to learned be-haviors

The further requirement imposed by Equation 6 isthat the dopamine-modulated synaptic change only oc-curs at those synapses where there was a precedingcorrelation between pre- and postsynaptic activity Thusin our model dopamine modulates a classical Hebbiansynapse Presently both long-term potentiation (LTP) andlong-term depression (LTD) are the best candidates foractivity-dependent changes in synaptic efcacy althoughthere may be other mechanisms The focus for LTP hascentered on the hippocampus and its role in memorybut both LTP and LTD have also been reported in theventral striatum (Calabresi Maj Pisani Mercuri amp Ber-nardi 1992 Kombian amp Malenka 1994) Kombian andMalenka reported that LTP was produced in the core ofthe nucleus accumbens by tetanic stimulation of thecortical input and that the LTP was mediated mainly bynon-NMDA receptors They further showed that LTDoccurred in the same neurons via the NMDA receptorsuggesting that depending on the intracellular calcium


concentration either LTP or LTD could occur in a singlestriatal cell Dopamine is known to act on intracellularcalcium (Cooper et al 1996) and the areas of the brainwith the highest calmodulin-dependent phosphodi-esterase concentration correspond to those areas withheavy dopaminergic innervation (Polli amp Kincaid 1994)It may be that dopamine modulates the LTPLTD behav-ior of both striatal and pallidal cells by altering theconcentration of intracellular calcium but it is likely thatother second messenger systems are also involved

The second component of synaptic plasticity in themodel distinguished between active inhibition of thepostsynaptic cell and inactivity from lack of presynapticexcitation The classical Hebbian synapse requires onlycoincident pre- and postsynaptic activity without regardto the different contributors to the postsynaptic poten-tial In order to differentiate active inhibition one mustpostulate the existence of a second messenger thatweakens any synapse that is concurrently active Thereis experimental evidence for long-term depression (LTD)of synaptic activity when presynaptic excitatory im-pulses are coupled with GABAergic activity (Thiels Bar-rionuevo amp Berger 1994 Wagner amp Alger 1995 YangConnor amp Faber 1994) The underlying mechanism couldinvolve decreases in intracellular calcium In our modelthis effect became signicant for reorganization ofweights when changing from one sequence to anotherThe model predicts that any process that weakens thestriatal inhibition of the globus pallidus should lead toan impairment in sequence shifting Processes that mightdo this include striatal degeneration (eg Huntingtonrsquosdisease and striatal-nigral degeneration) Conversely aprocess that facilitates GABAergic transmission (egbenzodiazepines) would lead to an increase in sequenceshifting possibly manifested as an inability to concen-trate on a task

The learning and reproduction of a sequence of statesin our model suggests mechanisms by which the primatebasal ganglia may automate action sequences Beginningfrom a state in which the weights were all zero themodel rapidly converged to the proper set of connec-tions necessary to reproduce the sequence Howeveraltering the ratio of striatal to pallidal learning rateswhich can be roughly interpreted as a differential effectof dopamine at these sites led either to a saturation ofthe weights (low ratio) or a rapid cessation of learning(high ratio) In both cases this impaired the sequencelearning Diminishing the degree of inhibitory overridefrom the striatum which would be analagous to the lossof spiny neurons in Huntingtonrsquos disease led to thesaturation of all the STN-GP weights This resulted notonly in the inability to produce any sequence but itgenerated random actions that were driven by the levelof presynaptic noise in the system This could beanalagous to the choreiform movements typical of Hun-tingtonrsquos disease

In the model of Parkinsonrsquos disease a dopamine decit

was modeled by decreasing the overall learning rate Asshown in Figure 8 this resulted in slower weightchanges and consequently an impairment in procedurallearning Parkinsonrsquos disease patients have been thoughtto suffer from decits in both motor initiation and shift-ing as well as cognitive shifting (Cools Van Den BerckenHorstink Van Spaendonck amp Berger 1984 Marsden ampObeso 1994) although it is not clear whether shiftingdecits are separate from the overall slowing observedon motor tasks The specic sequence shifting task thatwe modeled was selected to test the model againstexisting experimental data in Parkinsonrsquos disease (Pas-cual-Leone et al 1993) Using a linear mapping for R(Equation 9) the model results closely match the decitsobserved in Parkinsonrsquos disease Not only does it capturethe slower rate of improvement but it also displays thesame increase in variance in R In the model the ldquosignalrdquowas relatively attenuated because of the lower connec-tion weights and their slow rate of change and thus thesame level of noise in the system resulted in a largervariance of R This model suggests that so-called switch-ing decits may simply be an aspect of slow learningwhich is consistent with studies nding no basic switch-ing decit (Brown amp Marsden 1988 Downes SharpCostall Sagar amp Howe 1993) By slow learning we reallymean a decrease in plasticity at the neuron level whichwould be particularly evident with tasks requiring re-sponses to new information This could manifest itself asa switching decit if the task involved new material orwas designed to require rapid learning or as a predictiondecit if external feedback was absent (Flowers 1978)Our model also suggests that there is a differential effectbetween unlearning a previous response set and learn-ing a new one Because striatal inhibition overrides pal-lidal excitation connection weights will generallydecrease faster than they increase and this effect will bemore pronounced when the overall learning rate is de-creased as in the model of Parkinsonrsquos disease This isconsisent with previous ndings of Parkinsonian impair-ments of both shifting away from a previously learneddimension and shifting to a new dimension (Owen et al1993)

It is curious then that a focal lesion of the globuspallidus can alleviate some of the symptoms of Parkin-sonrsquos disease but there is substantial evidence for theefcacy of the pallidotomy procedure in Parkinsonrsquosdisease It has been proposed that the procedure worksby lesioning a hyperactive globus pallidus thus restoringbalance (Iacono et al 1995 Marsden amp Obeso 1994)Our model suggests that the pallidotomy lesion ratherthan simply restoring normal tonic inhibition alters theinput-output relationship in pools of both pallidal andSTN neurons This partially offsets the decreased learningrate because of an increased signal-to-noise ratio (iebetter separation of the units that should or should notbe active) Thus those units that are activated with theincreased gain have maximal activity which maximizes


the rst term in Equation 6 and offsets the decreasedlearning rate By lesioning neurons within a pallidal poolthe effective gain of the corresponding STN neuronsmay be increased by effectively uncoupling them fromeach other If one assumes that each of the neuronscontributing to the pool is fundamentally a binary(offon) unit uncoupling them from each other allowseach individual cell to have a greater signal-to-noise ratio(ie each STN cell is better able to detect signicantinputs because they are not being averaged with othercells) In an animal model of Parkinsonrsquos disease whenthe STN was lesioned the percentage of pallidal cellsresponding to an external stimulus decreased (Wich-mann Bergman amp DeLong 1994b)

We have proposed a systems-level model of the basalganglia that attempts to bridge the gap between theanatomy and the function of automatic sequence pro-duction Local working memory was postulated to existin the form of activity patterns in the GPe-subthalamicfeedback loop to which damage is predicted to causeincorrect movements because of a loss of ability todisambiguate the action context on a short time scaleFurthermore diseases that alter the striatal inhibition ofthe globus pallidus are predicted to either decrease theability to shift sequences or cause false shifting In orderto learn sequences we have proposed that the striatumtrains the globus pallidus and subthalamic nucleus toproduce sequences of states and dopamine modulatesthe synaptic efcacy to achieve this In order toefciently shift sequences a mechanism by which synap-tic efcacy is weakened depends on the GABAergicmodulation of long-term depression Although many de-tails of the model are based on assumptions the resultsof the model t a large body of behavioral and physi-

ological data and it suggests a framework for conceptu-alizing basal ganglia function

METHODS

All simulations were programmed in Mathematica ver-sion 22 (Wolfram 1991) running on a 486ndash33 Pentium-90 or a Sparc 10 For 100 iterations the simulation timeranged from approximately 10 sec on the Sparc 10 to 1min on the 486ndash33 Several simulations were performedwith varying numbers of units and sequence complexityValues for the equation parameters are shown in Table2 The learning rate for the STR reg VTA weights was 2times that of the STN reg GP weights Two sets of STNunits were used one with a time constant t of 7 msec(Nambu amp Llinas 1994) which according to Equation 3and a time-step interval of 10 msec corresponded to al of 04 the other set of STN units had a t of 90 mseccorresponding to a l of 09 The synaptic delay was takento be one time step (n = 1) Each unit in the simulationrepresented a pool of neurons with each pool corre-sponding to a particular action As a test of the modelthe sequence (1 2 3 4 2 5) was learned using 5 unitsin the STR and GP layers and 10 units in the STN layer(two integration times for each unit) This sequencerequires short-term memory in order to disambiguatethe action following Unit 2 All weights were constrainedto the range 0 to 1 Larger simulations with up to 10units were also performed

Acknowledgments

We thank Francis Crick for discussion during the developmentof this model and comments on early drafts of this paper andP Read Montague and Peter Dayan for comments on the manu-script This work was supported by funds from the AmericanPsychiatric Association and Ely Lilly to G S B and the WasieFoundation to T J S

Reprint requests should be sent to Gregory S Berns WesternPsychiatric Institute amp Clinic University of Pittsburgh 3811OrsquoHara St Pittsburgh PA 15213 or via e-mail berns+pittedu

REFERENCES

Alexander G E amp Crutcher M D (1990) Functional archi-tecture of basal ganglia circuits Neural substrates of paral-lel processing Trends in Neuroscience 13 266ndash271

Alexander G E DeLong M R amp Strick P L (1986) Parallelorganization of functionally segregated circuits linking ba-sal ganglia and cortex Annual Review of Neuroscience9 357ndash381

Barto A G (1995) Adaptive critics and the basal ganglia InJ C Houk J L Davis amp D G Beiser (Eds) Models of in-formation processing in the basal ganglia (pp 215ndash232)Cambridge MA MIT Press

Berns G S amp Sejnowski T J (1996) How the basal gangliamake decisions In A Damasio H Damasio amp Y Christen(Eds) The neurobiology of decision making (pp 101ndash113) Berlin Springer-Verlag

Bhatia K P amp Marsden C D (1994) The behavioural and

Table 2 Parameter values for the model

Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05

a r = overall learning rate g = gain b = bias D t = length of simu-lated time step n = number of time steps corresponding to synapticdelay l is dened by Equation 3 a = relative effect of an inhibitorysynapse to an excitatory one h = magnitude of random synapticnoise


motor consequences of focal lesions of the basal gangliain man Brain 117 859ndash876

Brown R G amp Marsden C D (1988) Internal versus exter-nal cues and the control of attention in Parkinsonrsquos dis-ease Brain 111 323ndash345

Cabib S amp Puglisi-Allegra S (1994) Opposite responses ofmesolimbic dopamine system to controllable and uncon-trollable aversive experiences Journal of Neuroscience14 3333ndash3340

Calabresi P Maj R Pisani A Mercuri N B amp Bernardi G(1992) Long-term synaptic depression in the striatumPhysiological and pharmacological characterization Jour-nal of Neuroscience 12 4224ndash4233

Cleeremans A amp McClelland J L (1991) Learning the struc-ture of event sequences Journal of Experimental Psychol-ogy General 120 235ndash253

Cools A R Van Den Bercken J H L Horstink M W I VanSpaendonck K P M amp Berger H J C (1984) Cognitiveand motor shifting aptitude disorder in Parkinsonrsquos dis-ease Journal of Neurology Neurosurgery and Psychia-try 47 443ndash453

Cooper J R Bloom F E amp Roth R H (1996) The bio-chemical basis of neuropharmacology 7th ed New YorkOxford University Press

Cromwell H C amp Berridge K C (1996) Implementation ofaction sequences by a neostriatal site A lesion mappingstudy of grooming syntax Journal of Neuroscience 163444ndash3458

Downes J J Sharp H M Costall B M Sagar H J ampHowe J (1993) Alternating uency in Parkinsonrsquos diseaseAn evaluation of the attentional control theory of cogni-tive impairment Brain 116 887ndash902

Flaherty A W amp Graybiel A M (1994) Input-output organiza-tion of the sensorimotor striatum in the squirrel monkeyJournal of Neuroscience 14 599ndash610

Flowers K (1978) Lack of prediction in the motor behaviorof Parkinsonism Brain 101 35ndash52

Fuster J M (1973) Unit activity in prefrontal cortex duringdelayed-response performance neuronal correlates of tran-sient memory Journal of Neurophysiology 36 61ndash78

Fuster J M (1993) Frontal lobes Current Opinion in Neuro-biology 3 160ndash165

Goldman-Rakic P S (1987) Circuitry of primate prefrontalcortex and regulation of behavior by representationalmemory In V B Mountcastle (Ed) Handbook of physiol-ogy Section 1 The nervous system (Vol V pp 373ndash417)Bethesda MD American Physiological Society

Goldman-Rakic P S amp Selemon L D (1986) Topography ofcorticostriatal projections in nonhuman primates and im-plications for functional parcellation of the neostriatum InE G Jones amp A Peters (Eds) Cerebral cortex Sensory-motor areas and aspects of cortical connectivity (Vol 5pp 447ndash466) New York Plenum

Graybiel A M (1990) Neurotransmitters and neuromodula-tors in the basal ganglia Neuroscience 13 244ndash254

Graybiel A M Aosaki T Flaherty A W amp Kimura M (1994)The basal ganglia and adaptive motor control Science265 1826ndash1831

Hebb D O (1949) The organization of behavior New YorkWiley

Hedreen J C amp DeLong M R (1991) Organization of stria-topallidal striatonigral and nigrostriatal projections in themacaque Journal of Comparative Neurology 304 569ndash595

Hoover J E amp Strick P L (1993) Multiple output channelsin the basal ganglia Science 259 819ndash821

Houk J C Adams J L amp Barto A G (1995) A model ofhow the basal ganglia generate and use neural signals that

predict reinforcement In J C Houk J L Davis amp D GBeiser (Eds) Models of information processing in the ba-sal ganglia (pp 249ndash270) Cambridge MA MIT Press

Iacono R P Shima F Lonser R R Kuniyoshi S Maeda Gamp Yamada S (1995) The results indications and physiol-ogy of posteroventral pallidotomy for patients with Parkin-sonrsquos disease Neurosurgery 36 1118ndash1127

Kawaguchi Y Wilson C J amp Emson P C (1989) Intracellu-lar recording of identied neostriatal patch and matrixspiny cells in a slice preparation preserving cortical in-puts Journal of Neurophysiology 62 1052ndash1068

Kombian S B amp Malenka R C (1994) Simultaneous LTP ofnon-NMDA- and LTD of NMDA-receptor-mediated re-sponses in the nucleus accumbens Nature 368 242ndash246

Ljungberg T Apicella P amp Schultz W (1992) Responses ofmonkey dopamine neurons during learning of behavioralreactions Journal of Neurophysiology 67 145ndash163

Marsden C D amp Obeso J A (1994) The functions of the ba-sal ganglia and the paradox of stereotaxic surgery of Park-insonrsquos disease Brain 117 877ndash897

Middleton F A amp Strick P L (1994) Anatomical evidencefor cerebellar and basal ganglia involvement in higher cog-nitive function Science 266 458ndash461

Mink J W amp Thach W T (1991) Basal ganglia motor controlII Late pallidal timing relative to movement onset and in-consistent pallidal coding of movement parameters Jour-nal of Neurophysiology 65 301ndash329

Montague P R Dayan P Person C amp Sejnowski T J(1995) Bee foraging in an uncertain environment usingpredictive Hebbian learning Nature 376 725ndash728

Montague P R Dayan P amp Sejnowski T J (1996) A frame-work for mesencephalic dopamine systems based on pre-dictive Hebbian learning Journal of Neuroscience 161936ndash1947

Montague P R amp Sejnowski T J (1994) The predictivebrain Temporal coincidence and temporal order in synap-tic learning mechanisms Learning amp Memory 1 1ndash33

Mushiake H amp Strick P L (1995) Pallidal activity during se-quential arm movements Journal of Neurophysiology 742754ndash2758

Nambu A amp Llinas R (1994) Electrophysiology of globuspallidus neurons in vitro Journal of Neurophysiology 721127ndash1139

Owen A M Roberts A C Hodges J R Summers B APolkey C E amp Robbins T W (1993) Contrasting mecha-nisms of impaired attentional set-shifting in patients withfrontal lobe damage or Parkinsonrsquos disease Brain 1161159ndash1175

Parent A (1990) Extrinsic connections of the basal gangliaTrends in Neuroscience 13 254ndash258

Pascual-Leone A Grafman J Clark K Stewart M Mas-saquoi S Lou J-S amp Hallet M (1993) Procedural learn-ing in Parkinsonrsquos disease and cerebellar degenerationAnnals of Neurology 34 594ndash602

Polli J W amp Kincaid R L (1994) Expression of a cal-modulin-dependent phosphodiesterase isoform (PDE1B1)correlates with brain regions having extensive dopaminer-gic innvervation Journal of Neuroscience 14 1251ndash1261

Ryan L J amp Sanders D J (1993) Subthalamic nucleus le-sion regularizes ring patterns in globus pallidus and sub-stantia nigra pars reticulata neurons in rats BrainResearch 626 327ndash331

Ryan L J Sanders D J amp Clark K B (1992) Auto- andcross-correlation analyis of subthalamic nucleus neuronalactivity in neostriatal- and globus pallidal-lesioned ratsBrain Research 583 253ndash261

Schultz W Apicella P amp Ljungberg T (1993) Responses ofmonkey dopamine neurons to reward and conditioned


stimuli during successive steps of learning a delayed re-sponse task Journal of Neuroscience 13 900ndash913

Schultz W Apicella P Scarnati E amp Ljungberg T (1992)Neuronal activity in monkey ventral striatum related tothe expectation of reward Journal of Neuroscience 124595ndash4610

Smith Y Wichmann T amp DeLong M R (1994a) The exter-nal pallidum and the subthalamic nucleus send conver-gent synaptic inputs onto single neurones in the internalpallidal segment in monkey anatomical organization andfunctional signicance In G Percheron J S McKenzie ampJ Feger (Eds) The basal ganglia IV (Vol 41 pp 51ndash62)New York Plenum

Smith Y Wichmann T amp DeLong M R (1994b) Synaptic in-nervation of neurones in the internal pallidal segment bythe subthalamic nucleus and the external pallidum in mon-keys Journal of Comparative Neurology 343 297ndash318

Sutton R S (1988) Learning to predict by the methods oftemporal differences Machine Learning 3 9ndash44

Sutton R S amp Barto A G (1990) Time-derivative models ofPavlovian reinforcement In M Gabriel amp J Moore (Eds)Learning and computational neuroscience Foundationsof adaptive networks (pp 497ndash538) Cambridge MA MITPress

Swerdlow N R amp Koob G F (1987) Dopamine schizophre-nia mania and depression Toward a unied hypothesis ofcortico-striato-pallido-thalamic function Behavioral andBrain Sciences 10 197ndash245

Thiels E Barrionuevo G amp Berger T W (1994) Excitatorystimulation during postsynaptic inhibition induces long-term depression in hippocampus in vivo Journal ofNeurophysiology 72 3009ndash3016

Wagner J J amp Alger B E (1995) GABAergic and develop-

mental inuences on homosynaptic LTD and depotentia-tion in rat hippocampus Journal of Neuroscience 151577ndash1586

Wichmann T Bergman H amp DeLong M R (1994a) The pri-mate subthalamic nucleus I Functional properties in in-tact animals Journal of Neurophysiology 72 494ndash506

Wichmann T Bergman H amp DeLong M R (1994b) The pri-mate subthalamic nucleus III Changes in motor behaviorand neuronal activity in the internal pallidum induced bysubthalamic inactivation in the MPTP model of parkinson-ism Journal of Neurophysiology 72 521ndash530

Wickens J amp Kotter R (1995) Cellular models of reinforce-ment In J C Houk J L Davis amp D G Beiser (Eds) Mod-els of information processing in the basal ganglia(pp 187ndash214) Cambridge MA MIT Press

Willingham D B Nissen M J amp Bullemer P (1989) On thedevelopment of procedural knowledge Journal of Experi-mental Psychology Learning Memory and Cognition15 1047ndash1060

Wilson C J (1986) Postsynaptic potentials evoked in spinyneostriatal projection neurons by stimulation of ipsilateraland contralateral neocortex Brain Research 367 201ndash213

Wilson C J (1990) Basal ganglia In G M Shepherd (Ed)The synaptic organization of the brain (pp 279ndash316)New York Oxford University Press

Wolfram S (1991) Mathematica A system for doing mathe-matics by computer (2nd ed) Redwood City CA Addison-Wesley

Yang X-D Connor J A amp Faber D S (1994) Weak ex-citation and simultaneous inhibition induce long-term de-pression in hippocampal CA1 neurons Journal ofNeurophysiology 71 1586ndash1590


the other basal ganglia structures and themselves areoften considered part of the basal ganglia Dopamine hasbeen implicated in reward-driven learning (SchultzApicella amp Ljungberg 1993 Schultz Apicella Scarnati ampLjungberg 1992) and the VTA is known to be a self-stimulation site (Cooper Bloom amp Roth 1996) The roleof dopamine seems to be closely related to motor behav-ior and the need to perform an action in so-called oper-ant tasks where rewards are contingent upon acting Yetdopamine has also been implicated in cognitive decitsespecially in regards to schizophrenia where one predic-tor of pharmacological efcacy of an antipsychotic drugis its dopamine-receptor afnity In this paper we pro-pose a model for the role of dopamine that integrates itsrole in reinforcement with that of the aforementionedmotor planning

While the connectivity of the basal ganglia and itsventral extensions the nucleus accumbens and ventraltegmental area has been well described there is noconsensus regarding the types of computations thesestructures perform Although the overall function is be-lieved to be related to planning and executing actionsespecially sequences of actions it is not clear how thecircuitry could accomplish this Previous attempts toassign function to the basal ganglia circuit have primarilyrelied upon heuristic arguments without any analyticalor computational analysis to test these ideas Swerdlowand Koob (1987) attempted to understand certain as-pects of psychiatric disease by proposing a model basedon nested loops of activity through the ventral parts ofthe basal ganglia However their model did not attemptto characterize the behavior of the proposed circuitSeveral models have been developed that attempt tointegrate the role of cellular reinforcement with actionselection in the basal ganglia (Barto 1995 Berns ampSejnowski 1996 Graybiel Aosaki Flaherty amp Kimura1994 Houk Adams amp Barto 1995 Montague Dayan ampSejnowski 1996 Wickens amp Kotter 1995) although com-pelling at the level of classical conditioning these mod-els have not yet shown how a complex sequence ofactions could be implemented by the basal ganglia Wepropose in this paper that the basal ganglia project tocortical areas that implement actions and that they ltermultimodal information by selecting previously learnedoptimal actions based on the instantaneous cortical stateWe further show how the connectivity is ideally suitedto the production of optimal action sequences An earliermodel which was restricted to action selection hasappeared elsewhere (Berns amp Sejnowski 1996)

MODEL

We modeled the basal ganglia as groups of simpliedneurons that corresponded to the various divisions ofthe structure The primary assumption was that the basalganglia facilitate the production of action sequencesbased on cortical states (for review of this hypothesis

see Cromwell amp Berridge 1996 Marsden amp Obeso1994) In order to do this the basal ganglia must haveboth a mechanism that selects competing actions and amechanism by which to learn sequences Thus ourmodel demonstrates how the architecture of the basalganglia is well suited to the production of action se-quences and how diffuse reinforcement mechanismsallow for learning

Selection Model

The output stage of the basal ganglia the internal seg-ment of the globus pallidus (GPi) is known to be almostwholly GABAergic and tonically active Thus the GPitonically inhibits the target thalamic nuclei (VLmVLpcmc VLo CM) Because these nuclei may also gateascending information to their cortical targets (motorsupplementary motor prefrontal cortices) it is reason-able that they should be tonically inhibited until theascending information is required for action Further-more release from tonic inhibition in the thalamus leadsto a very rapid postinhibitory rebound Inhibition of thecorresponding GPi neurons may thus lead to a morerapid response in the thalamus than if they were directlyexcited Such a rapid response would be necessary toproduce rapid sequences of activity In our model weassumed that a mechanism exists in the GPi so thatwithin a pool of neurons only one is inhibited at a timea form of ldquoloser-take-allrdquo (Berns amp Sejnowski 1996)

Each structure within the basal ganglia was con-structed of a set of units each of which representedsome locally distributed function (Figure 1) The striatalunits represented pools of spiny neurons that reect thecortical state which itself is comprised of both externalrepresentations (eg parietal areas) and internal plan-ning representations (eg frontal cortex) The striatummapped the cortical state onto a nite set of actions asrepresented in the globus pallidus internus (GPi) oroutput layer The striatal-GPi or ldquodirectrdquo (Alexander ampCrutcher 1990) pathway thus allows for the direct map-ping of the striatal state onto a particular action How-ever if one assumes that the striatum reects the corticalstate the direct pathway really maps the cortical activityonto a nite set of possible actions in the globus palliduswithout any automatic sequencing occurring in the basalganglia

It has been hypothesized that the basal ganglia areinvolved in the production of action sequences (Crom-well amp Berridge 1996) In order to produce a sequenceefciently either a short-term memory must exist or theoutput state must be fed back to the input layer Thereis evidence for the latter in the form of segregatedcortical-subcortical loops (Alexander et al 1986 Hed-reen amp DeLong 1991) that presumably feed back infor-mation from the globus pallidus to the cortical area fromwhich the inputs originated However the time courseof activity propagation through this loop is probably too




t dBi(t)



Bi(t) = s [ l Bi(t 1) (1 l ) a Gi(t n)] (2)




where

l = t

t + D t(3)


s [x] = 1

1 + e g (x b )(4)



j


(5)


Synaptic Plasticity










e(t) = aring i




RESULTS






STN Striatal GP

0 0 0

1 0 1

0 1 0

1 1 0









R(t) = 1 1N

aring i=1

N

Gi(t) (9)







DISCUSSION





























METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05


































































t dBi(t)



Bi(t) = s [ l Bi(t 1) (1 l ) a Gi(t n)] (2)




where

l = t

t + D t(3)


s [x] = 1

1 + e g (x b )(4)



j


(5)


Synaptic Plasticity










e(t) = aring i




RESULTS






STN Striatal GP

0 0 0

1 0 1

0 1 0

1 1 0









R(t) = 1 1N

aring i=1

N

Gi(t) (9)







DISCUSSION





























METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05
































































where

l = t

t + D t(3)


s [x] = 1

1 + e g (x b )(4)



j


(5)


Synaptic Plasticity










e(t) = aring i




RESULTS






STN Striatal GP

0 0 0

1 0 1

0 1 0

1 1 0









R(t) = 1 1N

aring i=1

N

Gi(t) (9)







DISCUSSION





























METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05

































































e(t) = aring i




RESULTS






STN Striatal GP

0 0 0

1 0 1

0 1 0

1 1 0









R(t) = 1 1N

aring i=1

N

Gi(t) (9)







DISCUSSION





























METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05







































































R(t) = 1 1N

aring i=1

N

Gi(t) (9)







DISCUSSION





























METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05
































































DISCUSSION





























METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05
























































































METHODS


Acknowledgments



REFERENCES







Parametera Value

r 005 (w) or 01 (v)

g 4

b 01

D t 10 msec

n 1

l 04 or 09

a 10

h 05
































































a computational model of how the basal ganglia produce sequences

Documents