the formation of habits - inria · initial formation and the later expression of habits. the model...

THE FORMATION OF HABITS The implicit supervision of the basal ganglia

MEROPI TOPALIDOU

12e Colloque de Société des Neurosciences

Montpellier May 19-22, 2015

Goal-Directed Actions VS Habits

Belin et al. (2008), Yin (2008), Foerde & Shohamy (2011), Doll et al. (2012)


Goal-directed actions

→ initiation of response is under direct control of the current value of outcome
→ sensitive to devaluation of the outcome
→ behavior adjusts to reflect the new value of the outcome that the action would obtain

Habits

→ direct initiation of responding by stimulus and/or context presentation
→ resistant to devaluation of the outcome
→ habits persist even if the reward becomes less attractive or if the action is not necessary to earn the reward
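The contrast above can be made concrete with a toy example. This is only a sketch under stated assumptions: the action names, outcome names, values, and the two helper functions are all hypothetical and are not taken from the cited studies. The goal-directed controller consults the current outcome value, whereas the habitual controller only consults a stimulus-response strength.

```python
# Toy contrast between a goal-directed and a habitual controller.
# Every name and number below is an illustrative assumption.

def goal_directed_choice(action_outcome, outcome_value):
    """Pick the action whose outcome currently has the highest value."""
    return max(action_outcome, key=lambda a: outcome_value[action_outcome[a]])

def habitual_choice(stimulus, habit_strength):
    """Pick the action most strongly associated with the stimulus."""
    strengths = habit_strength[stimulus]
    return max(strengths, key=strengths.get)

action_outcome = {"press_lever": "sucrose", "pull_chain": "pellet"}
outcome_value = {"sucrose": 1.0, "pellet": 0.5}
habit_strength = {"light": {"press_lever": 0.9, "pull_chain": 0.1}}

before = (goal_directed_choice(action_outcome, outcome_value),
          habitual_choice("light", habit_strength))

# Devaluation: the sucrose outcome loses its value.
outcome_value["sucrose"] = 0.0

after = (goal_directed_choice(action_outcome, outcome_value),
         habitual_choice("light", habit_strength))
```

In this sketch, only the goal-directed choice changes after devaluation; the habitual choice is still driven by the stimulus.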


Cortex Basal Ganglia

Novel behaviors require attention and flexible thinking and are therefore dependent on cortex, whereas automatic behaviors have been assumed to be primarily mediated by subcortical structures. Much evidence suggests, however, that subcortical structures such as the striatum make significant contributions to initial learning. More recently, evidence has been accumulating that neurons in the associative striatum are selectively activated during early learning, whereas those in the sensorimotor striatum are more active after automaticity has developed. At the same time, other recent reports suggest that automatic behaviors are striatum- and dopamine-independent, and may be mediated entirely within cortex. Resolving this apparent conflict should be a major goal of future research.

These ideas led to the theory that dominated the 20th century: Novel behaviors require attention and flexible thinking and therefore are dependent on cortex, whereas automatic behaviors require neither of these and so are not mediated primarily by cortex. Instead, it has long been assumed that automatic behaviors are primarily mediated by subcortical structures.

Cortex Basal Ganglia

Ashby, Turner & Horvitz (2010)

Habits go there

Goal Directed actions go here

Cortex leads decision once learned

BG “teach” cortex during learning phase

Daw, Niv & Dayan (2005)


Outline

• Experiment

• Computational model

• Results

Experimental setup

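[Figure: trial timeline (Trial start, Cue presentation, Go signal, Decision, Reward, Trial stop), with three intervals of 1.0s - 1.5s each along the time axis. Habitual condition: pre-learned cues with reward probabilities 0.75 and 0.25. Novel condition: novel cues (every day) with reward probabilities 0.75 and 0.25.]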

Two monkeys, simple two-armed bandit task with P=0.75 and P=0.25.

→ Habitual condition (known stimuli pair, same every day)
→ Novel condition (unfamiliar stimuli pair, new every day)

Piron et al. (submitted)
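The task structure can be sketched as a two-armed bandit. The following is a minimal illustration, not the experimental code: the cue labels `"A"` and `"B"`, the function names, the 120-trial session length (read off the plot axis), and the definition of a "success" as choosing the cue with the higher reward probability are all assumptions.

```python
import random

# Reward probabilities from the slide; the cue labels are hypothetical.
REWARD_PROBABILITY = {"A": 0.75, "B": 0.25}

def run_trial(choice, rng):
    """Return 1 if the chosen cue is rewarded on this trial, else 0."""
    return 1 if rng.random() < REWARD_PROBABILITY[choice] else 0

def run_session(policy, n_trials=120, seed=0):
    """Run one session and return per-trial successes and rewards.

    A success is assumed to mean choosing the better cue ("A").
    """
    rng = random.Random(seed)
    successes, rewards = [], []
    for _ in range(n_trials):
        choice = policy(rng)
        successes.append(1 if choice == "A" else 0)
        rewards.append(run_trial(choice, rng))
    return successes, rewards

def random_policy(rng):
    """Chance-level behavior: pick either cue with equal probability."""
    return rng.choice(["A", "B"])

def always_best_policy(rng):
    """Idealized habitual behavior: always pick the better cue."""
    return "A"

best_successes, best_rewards = run_session(always_best_policy)
chance_successes, chance_rewards = run_session(random_policy)
```

Note that even a policy that always picks the better cue is rewarded on only about three quarters of trials, which is why success and reward are tracked separately here.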

Experimental results

[Figure: mean success rate (0.0 to 1.0) versus number of trials (0 to 120) for HC and NC under saline. Bar plots: mean of first 25 trials and mean of last 25 trials, HC versus NC under saline (significance marker: **).]
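The two summary statistics named in the figure panels (mean of first 25 trials, mean of last 25 trials) are straightforward to compute from a per-trial success sequence. The helper below is a hypothetical sketch; the function name and the 0/1 encoding of successes are assumptions.

```python
def window_means(successes, window=25):
    """Return (mean of first `window` trials, mean of last `window` trials)."""
    if len(successes) < window:
        raise ValueError("session is shorter than the averaging window")
    first = sum(successes[:window]) / window
    last = sum(successes[-window:]) / window
    return first, last

# Hypothetical session: 25 failures followed by 95 successes.
example = [0] * 25 + [1] * 95
first_mean, last_mean = window_means(example)
```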

Experimental results

Muscimol injection in GPi disrupts learning in novel conditions (NC), but performance remains intact (although slower) in habitual conditions (HC).

[Figure: mean success rate (0.0 to 1.0) versus number of trials (0 to 120) for HC and NC, saline versus muscimol. Bar plot: mean of last 25 trials for HC and NC under saline and under muscimol (significance markers: ***, *).]


Experimental conclusion

If habits were stored in the basal ganglia, monkeys would not achieve peak performance under muscimol for familiar stimuli.

If habits were learned in the cortex, monkeys would be able to reach peak performance under muscimol for unfamiliar stimuli.

The data show the opposite in both cases.

Piron et al. (submitted)

[Figure: mean success rate versus number of trials (0 to 120) for HC and NC, saline versus muscimol, with the mean of last 25 trials (significance markers: ***, *).]

Computational model

Two segregated loops:

→ Cognitive loop allows to choose a shape
→ Motor loop allows to reach a shape

[Figure: model architecture. Cortex, striatum, STN, GPe, GPi and thalamus each have a cognitive and a motor population (4 units each); cortex and striatum also have an associative population (4x4 units). External current inputs are shown. The direct, indirect and hyperdirect pathways link the structures through excitatory (+) and inhibitory (-) connections. Numbers 1, 2 and 3 mark three sites in the diagram.]

Topalidou et al. (in prep.)

Neural network; neuron: rate model

Cortico-basal competition

Cognitive decision has to intervene in motor decision.

Thanks to lateral competition, cortex can take a decision without interaction with BG.

Cortical decision

Cortico-Basal decision
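How lateral competition alone can produce a decision may be illustrated with a small rate-model sketch. It is not the model from the slides: the linear-threshold dynamics, the uniform lateral inhibition, every parameter value, and the function name `settle` are assumptions chosen for illustration. Each of the 4 units follows tau * dU/dt = -U + I - w * (sum of the other units' rates), with rate V = max(U, 0).

```python
def settle(inputs, lateral_inhibition=2.0, tau=0.01, dt=0.0001, steps=5000):
    """Euler-integrate a linear-threshold rate model with lateral inhibition.

    Returns the final firing rates. With inhibition stronger than 1,
    the all-active state is unstable and a single unit stays active.
    """
    u = [0.0] * len(inputs)
    for _ in range(steps):
        v = [max(x, 0.0) for x in u]          # firing rates
        total = sum(v)
        u = [x + (dt / tau) * (-x + i - lateral_inhibition * (total - vi))
             for x, i, vi in zip(u, inputs, v)]
    return [max(x, 0.0) for x in u]

# "Cortical decision": cortex settles on its own, on the strongest input.
cortical = settle([1.0, 1.1, 1.0, 1.0])

# "Cortico-basal decision": a hypothetical extra bias of 0.3 on unit 2,
# standing in for basal ganglia influence, changes the winner.
cortico_basal = settle([1.0, 1.1, 1.0 + 0.3, 1.0])
```

In this sketch the competition is winner-take-all, so a modest bias from another structure is enough to change which unit wins.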




Acting is learning

Learning occurs at three different places simultaneously.

① & ② Hebbian learning

③ Reinforcement learning

Cortex learns to reproduce previous repertoires, regardless of whether or not they are appropriate (HL).

Fast basal ganglia trial-and-error learning (RL) biases the slow cortical learning (HL), ensuring that the correct behavior is produced. Hélie et al. (2014)
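The difference between the two kinds of learning can be sketched with two generic update rules. These are textbook forms chosen for illustration, not the rules used in the model: the delta rule for RL, the bounded Hebbian rule, the learning rates, and all names are assumptions. The point is that the Hebbian update has no reward term, so it strengthens whatever is repeated.

```python
def rl_update(value, reward, rate=0.1):
    """Delta rule: move the value estimate toward the obtained reward (fast)."""
    return value + rate * (reward - value)

def hebbian_update(weight, pre, post, rate=0.01, w_max=1.0):
    """Strengthen a connection when pre and post are co-active (slow).

    There is no reward term: repetition alone drives the change.
    """
    return weight + rate * pre * post * (w_max - weight)

def repeat_action(reward, n_trials=50):
    """Repeat the same action n_trials times with a fixed reward."""
    value, weight = 0.0, 0.5
    for _ in range(n_trials):
        value = rl_update(value, reward)
        weight = hebbian_update(weight, pre=1.0, post=1.0)
    return value, weight

rewarded_value, rewarded_weight = repeat_action(reward=1.0)
unrewarded_value, unrewarded_weight = repeat_action(reward=0.0)
```

Here the Hebbian weight grows identically whether or not the action is rewarded, while the RL value separates the two cases. In such a scheme the RL signal is what has to steer which action gets repeated.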




Computational results

Intact model
→ peak performances on familiar conditions
→ can learn novel conditions

Lesioned model (GPi)
→ peak performances on familiar conditions
→ cannot learn novel conditions
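The qualitative pattern above can be reproduced with a deliberately abstract toy, given here only as a sketch. It is not the neural network model from the slides: the two-cue setting, the initial weights, the noise level, the learning rates, and modeling the lesion as simply removing the basal ganglia bias and its learning are all assumptions.

```python
import random

REWARD_P = (0.75, 0.25)  # cue 0 is the better cue

def session(familiar, lesioned, rng, n_trials=120):
    """Simulate one session and return per-trial successes (1 = cue 0 chosen)."""
    # Hypothetical cortical weights: a familiar pair starts with a preference.
    cortex = [0.8, 0.5] if familiar else [0.5, 0.5]
    # Hypothetical basal ganglia value estimates, one per cue.
    bg = [0.5, 0.5]
    successes = []
    for _ in range(n_trials):
        bias = [0.0, 0.0] if lesioned else bg
        drive = [cortex[i] + bias[i] + rng.gauss(0.0, 0.1) for i in range(2)]
        choice = 0 if drive[0] >= drive[1] else 1
        reward = 1.0 if rng.random() < REWARD_P[choice] else 0.0
        if not lesioned:
            bg[choice] += 0.1 * (reward - bg[choice])    # fast RL (delta rule)
        cortex[choice] += 0.01 * (1.0 - cortex[choice])  # slow Hebbian, no reward term
        successes.append(1 if choice == 0 else 0)
    return successes

def mean_last(familiar, lesioned, n_sessions=200, window=25, seed=1):
    """Average success rate over the last `window` trials across sessions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sessions):
        s = session(familiar, lesioned, rng)
        total += sum(s[-window:]) / window
    return total / n_sessions

results = {
    ("familiar", "intact"): mean_last(True, False),
    ("familiar", "lesioned"): mean_last(True, True),
    ("novel", "intact"): mean_last(False, False),
    ("novel", "lesioned"): mean_last(False, True),
}
```

In this toy, the lesioned novel condition stays near chance on average because the reward-free Hebbian rule reinforces whichever cue happens to be chosen, with nothing to steer it toward the better one.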

[Figure (monkey results, for comparison): mean success rate (0.0 to 1.0) versus number of trials (0 to 120) for HC and NC, saline versus muscimol, with the mean of last 25 trials (significance markers: ***, *).]




Sensitivity to reward devaluation

Conclusion

The acquisition and the expression of habits are two entangled processes that can be dissociated experimentally.

This experimental dissociation sheds light on the nature of the interaction between the basal ganglia and the cortex and their respective role in the initial formation and the later expression of habits.

The model suggests that the basal ganglia implicitly supervise the cortex, where habits are actually stored, and that the cortex cannot learn them on its own.

In the future, the model will be tested in different protocols in order to ensure the accuracy of its predictions.

Acknowledgements

• Nicolas Rougier

• T. Boraud

• C. Piron

• D. Kase

• A. Leblois

[Figure: reaction period (ms), axis from 400 to 550, for HC and NC under saline and muscimol (significance markers: **, **).]
