Goal Directed Reaching with the Motor Cortex Model
Cheol Han, Feb 20, 2007
Introduction
Goal: a computational model for goal-directed reaching movement with a biologically plausible motor cortex model, which can explain:
1. neural coding in the motor cortex
2. the relationship between skill learning and map formation
3. reorganization of the motor cortex after a lesion, with improvement of movement
Overview
- Dual map: motor output map, motor input map
- Models: arm model with Hill-type muscles, cortex model, reinforcement learning framework
- Results
- Discussion
Directional coding (Georgopoulos, 1986)
Dual Map
Two views of neural coding in the motor cortex:
- Low-level, muscle coding (Evarts…)
- High-level, kinematic coding (Georgopoulos, 1986)
- Both, or intermediate (joint) coding
We hypothesized:
- Motor output map: mainly encodes low-level muscle coding
- Motor input map: high-level kinematic coding
Learning goal-directed movements with Actor-Critic
- Learning a feed-forward controller using temporal difference learning and the actor-critic architecture (Sutton, 1984; Barto et al., 1983)
- Biologically plausible (dopamine and/or acetylcholine modulation of LTP in the motor cortex)
- Continuous time and space (Doya, 2000)
- Similar approaches: Bissmarck et al., 2005; Izawa et al., 2004
[Diagram: Trajectory Planner (kinematic coding) → Motor Cortex Model → Motoneurons (spinal cord) → Arm Model with muscles; a Critic (basal ganglia / dopamine neurons) drives temporal difference learning, and competitive Hebbian learning shapes the cortex model]
Motor output map
- ICMS may exhibit characteristics of corticospinal projections
- Monosynaptic projections from some M1 neurons to motoneurons (Fetz and Cheney, 1980; Lemon et al., 1986)
- Todorov (2003); the Donoghue group's work
[Diagram: Motor Cortex Model → Motoneurons (spinal cord)]
Motor input map
- Motor cortex neural recording during voluntary movements (e.g., Georgopoulos)
- Activation during voluntary movement tends to match the high-level coding, i.e., kinematic coding
[Diagram: Kinematic Coding → Motor Cortex Model]
Models
- Motor output map: competitive Hebbian learning with a motor cortex model; reversed feature extraction (a sketch of the competitive rule follows)
- Motor input map: temporal difference reinforcement learning
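As a concrete illustration of the competitive Hebbian idea, here is a minimal SOM-like sketch in Python. The 20 x 20 grid matches the maps in the results; the Gaussian neighborhood, learning rate, and the `train_step` helper are assumptions for illustration, not the author's exact rule.

```python
# A SOM-like competitive Hebbian rule (an illustrative assumption):
# the best-matching cortical unit and its neighbors move their weights
# toward the current 6-dimensional muscle-activation input.
import numpy as np

rng = np.random.default_rng(0)
GRID = 20
W = rng.random((GRID * GRID, 6))                       # weights to the 6 muscles
coords = np.array([(i, j) for i in range(GRID) for j in range(GRID)], dtype=float)

def train_step(x, lr=0.1, radius=2.0):
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition: best-matching unit
    d = np.linalg.norm(coords - coords[winner], axis=1)
    h = np.exp(-d**2 / (2 * radius**2))                # cooperative neighborhood
    W[:] += lr * h[:, None] * (x - W)                  # Hebbian move toward the input

for _ in range(1000):
    train_step(rng.random(6))                          # random muscle patterns
```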
Arm model
- 2 links on the horizontal plane
- 6 muscles with a Hill-type muscle model: shoulder extensor (E), shoulder flexor (F), elbow extensor (O), elbow flexor (C), biarticular extensor (B) and flexor (T)
- An accurate arm model is important: Todorov (2002) noted that characteristics may propagate from the bottom up
- Ning Lan (2002), Zajac (1989), Katayama (1993), Cheng et al. (2000), Spoelstra et al. (2000)
(figure from Spoelstra et al., 2000)
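To make the muscle geometry concrete, here is a minimal sketch of how six muscle activations could map to the two joint torques. The `MOMENT_ARMS` and `MAX_FORCE` values are illustrative placeholders, and the Hill-type force dynamics are reduced to a static gain; the cited models use full length- and velocity-dependent dynamics.

```python
# Sketch (not the author's exact model): six muscle activations -> two
# joint torques via a moment-arm matrix. Signs encode extensor/flexor
# action; the biarticular muscles (B, T) act on both joints.
import numpy as np

# Columns: E, F, O, C, B, T (muscle labels above). Values are placeholders.
MOMENT_ARMS = np.array([
    [-0.04, 0.04,  0.00,  0.00,  -0.03, 0.03],   # shoulder (m)
    [ 0.00, 0.00, -0.025, 0.025, -0.02, 0.02],   # elbow (m)
])
MAX_FORCE = np.array([700., 700., 500., 500., 600., 600.])  # N, illustrative

def joint_torques(activation: np.ndarray) -> np.ndarray:
    """Map muscle activations in [0, 1] to shoulder/elbow torques (N*m)."""
    force = MAX_FORCE * np.clip(activation, 0.0, 1.0)
    return MOMENT_ARMS @ force

print(joint_torques(np.array([0.0, 0.5, 0.0, 0.5, 0.0, 0.0])))  # flexes both joints
```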
Motor cortex model (Chernjavsky and Moody, 1990)
- 2 layers with GABA neurons; shunting inhibitory GABA neurons
- Mexican-hat activation (sketched below)
- Shunting inhibition (Douglas et al., 1995; Prescott et al., 2003)
[Diagram: PYR and GABA populations]
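A minimal sketch of the Mexican-hat interaction, approximated here by a difference of Gaussians rather than the explicit two-layer PYR/GABA shunting circuit of Chernjavsky and Moody (1990); the widths and inhibitory gain are assumptions.

```python
# Mexican-hat lateral profile: short-range excitation minus broader
# inhibition. Parameters are illustrative, not the model's values.
import numpy as np

def mexican_hat(distance: np.ndarray,
                sigma_exc: float = 1.0,
                sigma_inh: float = 3.0,
                gain_inh: float = 0.5) -> np.ndarray:
    exc = np.exp(-distance**2 / (2 * sigma_exc**2))
    inh = gain_inh * np.exp(-distance**2 / (2 * sigma_inh**2))
    return exc - inh

positions = np.arange(-10, 11)
print(mexican_hat(np.abs(positions)))  # positive center, negative surround
```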
Model diagram
Our motor cortex model includes the inverse dynamics and the inverse muscle model. How do we learn it in a biologically plausible manner? Using reinforcement learning:
- It provides an evaluation of the movement
- Implemented with temporal difference learning based on the actor-critic structure
- Similar approaches: Bissmarck et al., 2005; Izawa et al., 2004
[Diagram: Trajectory Generator (joint static-level planning) → Inverse Dynamics (joint force-level planning) → Inverse Muscle Model (muscle-level planning) → Motoneurons → Arm; the Evaluator of Movement (CRITIC) sends a TD error to the ACTOR]
Actor-Critic Model (Sutton, 1984)
- The "actor" produces a motor command
- The motor command feeds into the plant
- The "critic" evaluates how good the movement was compared with previous expectations (the TD error)
- The actor is updated based on the critic's evaluation (a minimal loop is sketched after the diagram below)
- The critic is updated too: if the actor has improved, the critic comes to expect better movements
- A movement worse than what the critic expected is discarded
[Diagram: Trajectory Generator → MOTOR CORTEX (ACTOR) → Arm; Evaluator of Movement (CRITIC) sends the TD error to the actor]
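The loop above can be made concrete on a toy 1-D reaching task. Everything here (the linear actor and critic, the Gaussian action perturbation, the toy plant, the learning rates) is an illustrative assumption, not the author's arm model; the reward is the one given on the critic slides below.

```python
# Minimal actor-critic loop: the actor emits a perturbed command, the
# critic computes the TD error, and both are updated from it.
import numpy as np

rng = np.random.default_rng(0)
goal, gamma, sigma = 1.0, 0.95, 0.2
alpha_v, alpha_w = 0.01, 0.005
v = np.zeros(2)                                   # critic weights over [x, 1]
w = np.zeros(2)                                   # actor weights over [x, 1]

def feat(x):
    return np.array([x, 1.0])

for episode in range(300):
    x = 0.0
    for t in range(50):
        phi = feat(x)
        noise = sigma * rng.standard_normal()               # action perturbation
        a = w @ phi + noise                                 # actor: motor command
        x_new = float(np.clip(x + 0.1 * a, -2.0, 2.0))      # toy plant
        r = 6.0 * np.exp(-20.0 * abs(x_new - goal)) - 0.3   # reward (critic slides)
        delta = r + gamma * (v @ feat(x_new)) - v @ phi     # TD error
        v += alpha_v * delta * phi                          # critic: better prediction
        w += alpha_w * delta * noise * phi                  # actor: keep helpful noise
        x = x_new

print("command at x=0 after training:", w @ feat(0.0))  # tends positive, toward the goal
```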
Actor: computes the motor commands
Example of an actor: Bissmarck et al., 2005
- Coding of kinematic variables; distributed coding
- Action pool of preferred torques: the layer contains action units, each tuned to a "preferred torque"
- Competition between these preferred torques using softmax; p_i is the probability of unit i being chosen, shown as a bar in the diagram
- Modifiable weights w exist between the kinematic planning signal and the preferred torques
- Exploration using action perturbation
[Diagram: kinematic planning → torque (joint force)]
$p_i = \dfrac{\exp(a_i)}{\sum_{j \,\in\, \text{all actions}} \exp(a_j)}$, with the modifiable weights updated in proportion to the TD error: $\Delta w \propto \delta_{\mathrm{TD}}$
[Diagram: preferred torque layer]
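A minimal sketch of the softmax competition over an action pool of preferred torques, following the formula above. The pool size, temperature `beta`, and weight shapes are assumptions for illustration.

```python
# Softmax competition over "preferred torque" action units, as in the
# actor of Bissmarck et al. (2005) described above. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_inputs, beta = 8, 4, 2.0
preferred_torques = rng.uniform(-1, 1, size=(n_actions, 2))  # (shoulder, elbow)
w = rng.normal(0, 0.1, size=(n_actions, n_inputs))           # modifiable weights

def select_torque(kinematic_plan: np.ndarray):
    """Each action unit scores the plan; softmax picks one stochastically."""
    a = w @ kinematic_plan
    p = np.exp(beta * a) / np.sum(np.exp(beta * a))  # p_i from the formula above
    i = rng.choice(n_actions, p=p)                   # stochastic choice = exploration
    return preferred_torques[i], i, p

torque, i, p = select_torque(np.array([0.2, -0.1, 0.5, 0.0]))
# After the movement, the chosen unit's weights would be updated in
# proportion to the TD error: w[i] += alpha * td_error * kinematic_plan
```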
Critic: provides the reward prediction error for actor learning
Temporal difference learning:
- The critic learns the reward prediction by temporal difference learning
- The reward is generally delayed; the prediction of reward helps generate correct action choices before the reward is received (the temporal credit assignment problem)
- Doya (2000): in continuous time and space
Critic: the basal ganglia and dopamine neurons
- Dopamine neurons carry the TD error (Schultz, 1998)
- The reward prediction error is learned in the basal ganglia (O'Doherty, Science, 2004)
Critic: immediate reward
- A large reward is given at the goal
- The reward function over space does not have to be continuous; however, if it is continuous, it helps to find a good movement
- The reward function below is from Bissmarck et al. (2005)
[Figure: immediate reward surface and contour plot over (x, y) when the target = (0, 0)]
$r = 6\exp(-20 \cdot d) - 0.3$, where $d$ is the distance between the end-effector and the target
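The reward above transcribes directly into code; the test points are illustrative.

```python
# Immediate reward of Bissmarck et al. (2005): a sharp exponential peak
# (~5.7) at the target, slightly negative (-0.3) far from it, so time
# spent away from the goal is penalized.
import numpy as np

def immediate_reward(end_effector: np.ndarray, target: np.ndarray) -> float:
    d = np.linalg.norm(end_effector - target)
    return 6.0 * np.exp(-20.0 * d) - 0.3

print(immediate_reward(np.array([0.0, 0.0]), np.array([0.0, 0.0])))  # 5.7 at the goal
print(immediate_reward(np.array([0.2, 0.0]), np.array([0.0, 0.0])))  # ~ -0.19 away
```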
Critic: reward prediction error
- The total predicted reward at the current state includes discounted future rewards
- The critic learns this predicted reward at the current state
- Delta (the TD error) measures how much the action changed the outcome relative to the expected reward
- If it is positive, the action was better than expected
$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma \le 1$ (if $\gamma = 0$, only the immediate reward counts)

$V_t = E\!\left[\sum_{k=0}^{\infty}\gamma^k r_{t+k+1}\right] = E\!\left[r_{t+1} + \gamma\sum_{k=0}^{\infty}\gamma^k r_{t+k+2}\right] = E\!\left[r_{t+1} + \gamma V_{t+1}\right]$

$\delta_t = r_{t+1} + \gamma V_{t+1} - V_t$
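A minimal tabular TD(0) sketch of this update on an assumed 5-state chain with a single terminal reward: the critic's estimate $V$ is nudged toward $r_{t+1} + \gamma V_{t+1}$.

```python
# TD(0) on a toy chain (states 0..4, reward 1 on the final transition).
# Chain task and learning rate are illustrative assumptions.
import numpy as np

gamma, alpha = 0.9, 0.1
V = np.zeros(5)                      # value estimates for states 0..4
rewards = [0.0, 0.0, 0.0, 1.0]       # reward on each transition; goal at the end

for episode in range(500):
    for s in range(4):
        delta = rewards[s] + gamma * V[s + 1] - V[s]   # reward prediction error
        V[s] += alpha * delta                          # critic update
print(V)  # approx [gamma^3, gamma^2, gamma, 1, 0]: discounted future reward
```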
Critic: reward prediction error
Example: dopamine neurons
$\delta_t = r_{t+1} + \gamma V_{t+1} - V_t$
[Figure: dopamine responses at the CS and US, with reward given vs. no reward]
A well-trained critic predicts the reward just before it is expected to be given: $V_t = r_{t+1} + \gamma V_{t+1}$, so when the reward arrives, $\delta_t = (r_{t+1} + \gamma V_{t+1}) - (r_{t+1} + \gamma V_{t+1}) = 0$.
If there is no reward, because the well-trained critic expected $V_t = r_{t+1} + \gamma V_{t+1}$, delta becomes negative (a small simulation follows).
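A toy simulation of this behavior (all task details are assumptions): after training, the positive delta appears at the cue and vanishes at the predicted reward, and omitting the reward yields a negative delta at the expected time.

```python
# Cue at t=0, reward on entering state 5 (step index 4). The pre-cue
# state is treated as unpredictable (V[0] is never updated), which is
# what lets the positive delta transfer to the cue after training.
import numpy as np

gamma, alpha, T, t_reward = 1.0, 0.1, 6, 5
V = np.zeros(T + 1)

def run_trial(reward_given, learn):
    deltas = np.zeros(T)
    for t in range(T):
        r = 1.0 if (reward_given and t + 1 == t_reward) else 0.0
        deltas[t] = r + gamma * V[t + 1] - V[t]
        if learn and t > 0:          # cue onset itself stays unpredicted
            V[t] += alpha * deltas[t]
    return deltas

for _ in range(300):
    run_trial(reward_given=True, learn=True)

print(run_trial(True, learn=False))   # delta ~ +1 at index 0 (cue), ~0 at index 4 (reward)
print(run_trial(False, learn=False))  # delta ~ -1 at index 4: expected reward omitted
```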
Results (1): Motor output map
- Motor output map of the cortex model
- The map representation is the muscle coding
[Figure: two sets of 20 x 20 cortical weight maps, one panel per muscle (labeled E, F, B, D, O, C); color scale 10 to 60]
Results (2): Motor output map
- 50 ms random stimulation of the motor cortex
- The motoneuron pattern shows a "determined" preferred direction
- Strictly, each motoneuron is tuned to a preferred "torque"; however, at a fixed starting posture, a preferred torque implies a preferred direction
[Figure: histogram of preferred-direction angles (radians) vs. number of neurons having that preferred direction]
Results (3): Motor input map
NOT FINISHED; NEEDS TUNING OF THE REINFORCEMENT LEARNING (movement is not fully learned)
Motor input map:
- Activation of the motor cortex during a voluntary movement
- Broad activation (over the first 20% of movement time)
- Similar directions have similar activation patterns
[Figure: end-effector trajectories in the workspace and eight 20 x 20 cortical activation maps, one per movement direction]
Results (4): Motor input map
- Population code during the first 20% of movement time
- Insignificantly tuned neurons were excluded (about half of the 400 neurons)
[Figure: workspace trajectories with population vectors; panel title: Population code regarding "UNIT" preferred torque]
Short discussion
- Neural coding and regression
- Tuning curves over directions: cosine, sharper than cosine, truncated cosine
- Advantages of population coding
- Two ways of neural coding
Neural coding and regression
- The cricket detects wind direction with four neurons: c_i is the pre-tuned (preferred) wind direction of the i-th neuron, and r_i is its firing rate (see the sketch below)
- The regression error is smallest where a preferred direction exists (the tuning curve is a truncated cosine function)
- "Inference and computation with population codes" (Pouget A, Dayan P, Zemel RS, 2003)
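A minimal sketch of the cricket example: four neurons with truncated-cosine tuning and a population-vector readout $\sum_i r_i c_i$. The preferred directions 90 degrees apart follow the standard cercal-system description (Pouget et al., 2003); the helper names are assumptions.

```python
# Four wind-direction neurons with truncated-cosine tuning, decoded by
# the population vector sum(r_i * c_i).
import numpy as np

pref = np.deg2rad([45, 135, 225, 315])               # preferred directions c_i
c = np.stack([np.cos(pref), np.sin(pref)], axis=1)   # unit vectors

def firing_rates(wind_angle: float) -> np.ndarray:
    """Truncated cosine tuning: r_i = max(0, cos(theta - pref_i))."""
    return np.maximum(0.0, np.cos(wind_angle - pref))

def decode(rates: np.ndarray) -> float:
    """Population vector: direction of sum_i r_i * c_i."""
    v = rates @ c
    return np.arctan2(v[1], v[0])

theta = np.deg2rad(60.0)
print(np.rad2deg(decode(firing_rates(theta))))  # recovers ~60 degrees
```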
Tuning curve
- If the tuning curve is a cosine function, as in Georgopoulos (1986): perfect reconstruction using the basis
- If the tuning curve is sharper than cosine: distortion (regression error) exists; sharper tuning curves have recently been reported (Paninski et al., 2004; Scott et al., 2001). See the numeric check below.
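A small numeric check of this claim under stated assumptions: projecting a cosine tuning curve onto the cosine basis leaves no residual, while a sharper-than-cosine curve (cos^3 here, an illustrative choice) leaves distortion.

```python
# Least-squares projection onto the constant/cosine/sine basis: zero
# residual for a cosine tuning curve, nonzero for a sharper one.
import numpy as np

theta = np.linspace(-np.pi, np.pi, 360, endpoint=False)
basis = np.stack([np.ones_like(theta), np.cos(theta), np.sin(theta)], axis=1)

def residual(f: np.ndarray) -> float:
    coef, *_ = np.linalg.lstsq(basis, f, rcond=None)
    return float(np.linalg.norm(f - basis @ coef))

print(residual(np.cos(theta - 0.7)))     # ~0: perfect reconstruction
print(residual(np.cos(theta - 0.7)**3))  # > 0: distortion (regression error)
```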
Advantages of population coding
- Low regression error: ideally, if preferred directions exist for all directions (Pouget et al., 2003)
- Robust to noisy input (Pouget et al., 2003)
- Less variability in motor control: assuming signal-dependent noise (SDN), using more muscles yields less variability (Todorov, 2002); a numeric check follows
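A minimal numeric sketch of the signal-dependent-noise argument: if each muscle's noise std is proportional to its own command (constant `k`, an assumption), splitting a force across N muscles scales the total-force std as 1/sqrt(N).

```python
# Monte-Carlo check of the SDN scaling: total std ~ k*F/sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
F, k, trials = 10.0, 0.2, 100_000

for n_muscles in (1, 4, 16):
    cmd = F / n_muscles                                      # per-muscle command
    noise = k * cmd * rng.standard_normal((trials, n_muscles))
    total = (cmd + noise).sum(axis=1)
    print(n_muscles, total.std())                            # ~2.0, ~1.0, ~0.5
```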
Future work
- Fine tuning of the reinforcement learning
- Cerebellum
- Concurrent learning of the motor input map and the motor output map
- Sensory cortex, which may be related to feedback control
- "Premotor cortex" for inverse kinematic coding (action sensory coding, currently implemented with a SOM)