Goal Directed Reaching with the Motor Cortex Model
Cheol Han, Feb 20, 2007
Introduction
Goal: a computational model for goal-directed reaching movement with a biologically plausible motor cortex model, which can explain:
1. neural coding in the motor cortex
2. the relationship between skill learning and map formation
3. reorganization of the motor cortex after a lesion, with improvement of movement
Overview
- Dual map: motor output map, motor input map
- Models: arm model with Hill-type muscles, cortex model, reinforcement learning framework
- Results
- Discussion
Directional coding (Georgopoulos, 1986)
Dual Map
Two views of neural coding in the motor cortex:
- Low-level, muscle coding (Evarts…)
- High-level, kinematic coding (Georgopoulos, 1986)
- Both, or intermediate (joint) coding
We hypothesized:
- Motor output map: mainly encodes low-level muscle coding
- Motor input map: high-level kinematic coding
Learning goal-directed movements with Actor-Critic
- Learning a feed-forward controller using temporal difference learning and the actor-critic architecture (Sutton, 1984; Barto et al., 1983)
- Biologically plausible (dopamine and/or acetylcholine modulation of LTP in the motor cortex)
- Continuous time and space (Doya, 2000)
- Similar approaches: Bissmarck et al., 2005; Izawa et al., 2004
[Diagram: Trajectory Planner (kinematic coding) → Motor Cortex Model → Motoneurons (spinal cord) → Arm Model with muscles; a Critic (basal ganglia / dopamine neurons) drives temporal difference learning, and competitive Hebbian learning shapes the cortex model]
Motor output map
- ICMS may exhibit characteristics of corticospinal projections
- Monosynaptic projections from some M1 neurons to motoneurons (Fetz and Cheney, 1980; Lemon et al., 1986)
- Todorov (2003); the Donoghue group's work
[Diagram: Motor Cortex Model → Motoneurons (spinal cord)]
Motor input map
- Motor cortex neural recording during voluntary movements (e.g., Georgopoulos)
- Activation during voluntary movement tends to match the high-level coding, i.e., kinematic coding
[Diagram: Kinematic Coding → Motor Cortex Model]
Models
- Motor output map: competitive Hebbian learning with a motor cortex model; reversed feature extraction (a sketch of the competitive rule follows)
- Motor input map: temporal difference reinforcement learning
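As a concrete illustration of the competitive Hebbian idea, here is a minimal SOM-like sketch in Python. The 20 x 20 grid matches the maps in the results; the Gaussian neighborhood, learning rate, and the `train_step` helper are assumptions for illustration, not the author's exact rule.

```python
# A SOM-like competitive Hebbian rule (an illustrative assumption):
# the best-matching cortical unit and its neighbors move their weights
# toward the current 6-dimensional muscle-activation input.
import numpy as np

rng = np.random.default_rng(0)
GRID = 20
W = rng.random((GRID * GRID, 6))                       # weights to the 6 muscles
coords = np.array([(i, j) for i in range(GRID) for j in range(GRID)], dtype=float)

def train_step(x, lr=0.1, radius=2.0):
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition: best-matching unit
    d = np.linalg.norm(coords - coords[winner], axis=1)
    h = np.exp(-d**2 / (2 * radius**2))                # cooperative neighborhood
    W[:] += lr * h[:, None] * (x - W)                  # Hebbian move toward the input

for _ in range(1000):
    train_step(rng.random(6))                          # random muscle patterns
```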
Arm model
- 2 links on the horizontal plane
- 6 muscles with a Hill-type muscle model: shoulder extensor (E), shoulder flexor (F), elbow extensor (O), elbow flexor (C), biarticular extensor (B) and flexor (T)
- An accurate arm model is important: Todorov (2002) noted that characteristics may propagate from the bottom up
- Ning Lan (2002), Zajac (1989), Katayama (1993), Cheng et al. (2000), Spoelstra et al. (2000)
(figure from Spoelstra et al., 2000)
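To make the muscle geometry concrete, here is a minimal sketch of how six muscle activations could map to the two joint torques. The `MOMENT_ARMS` and `MAX_FORCE` values are illustrative placeholders, and the Hill-type force dynamics are reduced to a static gain; the cited models use full length- and velocity-dependent dynamics.

```python
# Sketch (not the author's exact model): six muscle activations -> two
# joint torques via a moment-arm matrix. Signs encode extensor/flexor
# action; the biarticular muscles (B, T) act on both joints.
import numpy as np

# Columns: E, F, O, C, B, T (muscle labels above). Values are placeholders.
MOMENT_ARMS = np.array([
    [-0.04, 0.04,  0.00,  0.00,  -0.03, 0.03],   # shoulder (m)
    [ 0.00, 0.00, -0.025, 0.025, -0.02, 0.02],   # elbow (m)
])
MAX_FORCE = np.array([700., 700., 500., 500., 600., 600.])  # N, illustrative

def joint_torques(activation: np.ndarray) -> np.ndarray:
    """Map muscle activations in [0, 1] to shoulder/elbow torques (N*m)."""
    force = MAX_FORCE * np.clip(activation, 0.0, 1.0)
    return MOMENT_ARMS @ force

print(joint_torques(np.array([0.0, 0.5, 0.0, 0.5, 0.0, 0.0])))  # flexes both joints
```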
Motor cortex model (Chernjavsky and Moody, 1990)
- 2 layers with GABA neurons; shunting inhibitory GABA neurons
- Mexican-hat activation (sketched below)
- Shunting inhibition (Douglas et al., 1995; Prescott et al., 2003)
[Diagram: PYR and GABA populations]
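A minimal sketch of the Mexican-hat interaction, approximated here by a difference of Gaussians rather than the explicit two-layer PYR/GABA shunting circuit of Chernjavsky and Moody (1990); the widths and inhibitory gain are assumptions.

```python
# Mexican-hat lateral profile: short-range excitation minus broader
# inhibition. Parameters are illustrative, not the model's values.
import numpy as np

def mexican_hat(distance: np.ndarray,
                sigma_exc: float = 1.0,
                sigma_inh: float = 3.0,
                gain_inh: float = 0.5) -> np.ndarray:
    exc = np.exp(-distance**2 / (2 * sigma_exc**2))
    inh = gain_inh * np.exp(-distance**2 / (2 * sigma_inh**2))
    return exc - inh

positions = np.arange(-10, 11)
print(mexican_hat(np.abs(positions)))  # positive center, negative surround
```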
Model diagram
Our motor cortex model includes the inverse dynamics and the inverse muscle model. How do we learn it in a biologically plausible manner? Using reinforcement learning:
- It provides an evaluation of the movement
- Implemented with temporal difference learning based on the actor-critic structure
- Similar approaches: Bissmarck et al., 2005; Izawa et al., 2004
[Diagram: Trajectory Generator (joint static-level planning) → Inverse Dynamics (joint force-level planning) → Inverse Muscle Model (muscle-level planning) → Motoneurons → Arm; the Evaluator of Movement (CRITIC) sends a TD error to the ACTOR]
Actor-Critic Model (Sutton, 1984)
- The "actor" produces a motor command
- The motor command feeds into the plant
- The "critic" evaluates how good the movement was compared with previous expectations (the TD error)
- The actor is updated based on the critic's evaluation (a minimal loop is sketched after the diagram below)
- The critic is updated too: if the actor has improved, the critic comes to expect better movements
- A movement worse than what the critic expected is discarded
[Diagram: Trajectory Generator → MOTOR CORTEX (ACTOR) → Arm; Evaluator of Movement (CRITIC) sends the TD error to the actor]
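The loop above can be made concrete on a toy 1-D reaching task. Everything here (the linear actor and critic, the Gaussian action perturbation, the toy plant, the learning rates) is an illustrative assumption, not the author's arm model; the reward is the one given on the critic slides below.

```python
# Minimal actor-critic loop: the actor emits a perturbed command, the
# critic computes the TD error, and both are updated from it.
import numpy as np

rng = np.random.default_rng(0)
goal, gamma, sigma = 1.0, 0.95, 0.2
alpha_v, alpha_w = 0.01, 0.005
v = np.zeros(2)                                   # critic weights over [x, 1]
w = np.zeros(2)                                   # actor weights over [x, 1]

def feat(x):
    return np.array([x, 1.0])

for episode in range(300):
    x = 0.0
    for t in range(50):
        phi = feat(x)
        noise = sigma * rng.standard_normal()               # action perturbation
        a = w @ phi + noise                                 # actor: motor command
        x_new = float(np.clip(x + 0.1 * a, -2.0, 2.0))      # toy plant
        r = 6.0 * np.exp(-20.0 * abs(x_new - goal)) - 0.3   # reward (critic slides)
        delta = r + gamma * (v @ feat(x_new)) - v @ phi     # TD error
        v += alpha_v * delta * phi                          # critic: better prediction
        w += alpha_w * delta * noise * phi                  # actor: keep helpful noise
        x = x_new

print("command at x=0 after training:", w @ feat(0.0))  # tends positive, toward the goal
```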
Actor: computes the motor commands
Example of an actor: Bissmarck et al., 2005
- Coding of kinematic variables; distributed coding
- Action pool of preferred torques: the layer contains action units, each tuned to a "preferred torque"
- Competition between these preferred torques using softmax; p_i is the probability of unit i being chosen, shown as a bar in the diagram
- Modifiable weights w exist between the kinematic planning signal and the preferred torques
- Exploration using action perturbation
[Diagram: kinematic planning → torque (joint force)]
$p_i = \dfrac{\exp(a_i)}{\sum_{j \,\in\, \text{all actions}} \exp(a_j)}$, with the modifiable weights updated in proportion to the TD error: $\Delta w \propto \delta_{\mathrm{TD}}$
[Diagram: preferred torque layer]
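A minimal sketch of the softmax competition over an action pool of preferred torques, following the formula above. The pool size, temperature `beta`, and weight shapes are assumptions for illustration.

```python
# Softmax competition over "preferred torque" action units, as in the
# actor of Bissmarck et al. (2005) described above. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_inputs, beta = 8, 4, 2.0
preferred_torques = rng.uniform(-1, 1, size=(n_actions, 2))  # (shoulder, elbow)
w = rng.normal(0, 0.1, size=(n_actions, n_inputs))           # modifiable weights

def select_torque(kinematic_plan: np.ndarray):
    """Each action unit scores the plan; softmax picks one stochastically."""
    a = w @ kinematic_plan
    p = np.exp(beta * a) / np.sum(np.exp(beta * a))  # p_i from the formula above
    i = rng.choice(n_actions, p=p)                   # stochastic choice = exploration
    return preferred_torques[i], i, p

torque, i, p = select_torque(np.array([0.2, -0.1, 0.5, 0.0]))
# After the movement, the chosen unit's weights would be updated in
# proportion to the TD error: w[i] += alpha * td_error * kinematic_plan
```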
Critic: provides the reward prediction error for actor learning
Temporal difference learning:
- The critic learns the reward prediction by temporal difference learning
- The reward is generally delayed; the prediction of reward helps generate correct action choices before the reward is received (the temporal credit assignment problem)
- Doya (2000): in continuous time and space
Critic: the basal ganglia and dopamine neurons
- Dopamine neurons carry the TD error (Schultz, 1998)
- The reward prediction error is learned in the basal ganglia (O'Doherty, Science, 2004)
Critic: immediate reward
- A large reward is given at the goal
- The reward function over space does not have to be continuous; however, if it is continuous, it helps to find a good movement
- The reward function below is from Bissmarck et al. (2005)
[Figure: immediate reward surface and contour plot over (x, y) when the target = (0, 0)]
$r = 6\exp(-20 \cdot d) - 0.3$, where $d$ is the distance between the end-effector and the target
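The reward above transcribes directly into code; the test points are illustrative.

```python
# Immediate reward of Bissmarck et al. (2005): a sharp exponential peak
# (~5.7) at the target, slightly negative (-0.3) far from it, so time
# spent away from the goal is penalized.
import numpy as np

def immediate_reward(end_effector: np.ndarray, target: np.ndarray) -> float:
    d = np.linalg.norm(end_effector - target)
    return 6.0 * np.exp(-20.0 * d) - 0.3

print(immediate_reward(np.array([0.0, 0.0]), np.array([0.0, 0.0])))  # 5.7 at the goal
print(immediate_reward(np.array([0.2, 0.0]), np.array([0.0, 0.0])))  # ~ -0.19 away
```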
Critic: reward prediction error
- The total predicted reward at the current state includes discounted future rewards
- The critic learns this predicted reward at the current state
- Delta (the TD error) measures how much the action changed the outcome relative to the expected reward
- If it is positive, the action was better than expected
$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma \le 1$ (if $\gamma = 0$, only the immediate reward counts)

$V_t = E\!\left[\sum_{k=0}^{\infty}\gamma^k r_{t+k+1}\right] = E\!\left[r_{t+1} + \gamma\sum_{k=0}^{\infty}\gamma^k r_{t+k+2}\right] = E\!\left[r_{t+1} + \gamma V_{t+1}\right]$

$\delta_t = r_{t+1} + \gamma V_{t+1} - V_t$
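A minimal tabular TD(0) sketch of this update on an assumed 5-state chain with a single terminal reward: the critic's estimate $V$ is nudged toward $r_{t+1} + \gamma V_{t+1}$.

```python
# TD(0) on a toy chain (states 0..4, reward 1 on the final transition).
# Chain task and learning rate are illustrative assumptions.
import numpy as np

gamma, alpha = 0.9, 0.1
V = np.zeros(5)                      # value estimates for states 0..4
rewards = [0.0, 0.0, 0.0, 1.0]       # reward on each transition; goal at the end

for episode in range(500):
    for s in range(4):
        delta = rewards[s] + gamma * V[s + 1] - V[s]   # reward prediction error
        V[s] += alpha * delta                          # critic update
print(V)  # approx [gamma^3, gamma^2, gamma, 1, 0]: discounted future reward
```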
Critic: reward prediction error
Example: dopamine neurons
$\delta_t = r_{t+1} + \gamma V_{t+1} - V_t$
[Figure: dopamine responses at the CS and US, with reward given vs. no reward]
A well-trained critic predicts the reward just before it is expected to be given: $V_t = r_{t+1} + \gamma V_{t+1}$, so when the reward arrives, $\delta_t = (r_{t+1} + \gamma V_{t+1}) - (r_{t+1} + \gamma V_{t+1}) = 0$.
If there is no reward, because the well-trained critic expected $V_t = r_{t+1} + \gamma V_{t+1}$, delta becomes negative (a small simulation follows).
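A toy simulation of this behavior (all task details are assumptions): after training, the positive delta appears at the cue and vanishes at the predicted reward, and omitting the reward yields a negative delta at the expected time.

```python
# Cue at t=0, reward on entering state 5 (step index 4). The pre-cue
# state is treated as unpredictable (V[0] is never updated), which is
# what lets the positive delta transfer to the cue after training.
import numpy as np

gamma, alpha, T, t_reward = 1.0, 0.1, 6, 5
V = np.zeros(T + 1)

def run_trial(reward_given, learn):
    deltas = np.zeros(T)
    for t in range(T):
        r = 1.0 if (reward_given and t + 1 == t_reward) else 0.0
        deltas[t] = r + gamma * V[t + 1] - V[t]
        if learn and t > 0:          # cue onset itself stays unpredicted
            V[t] += alpha * deltas[t]
    return deltas

for _ in range(300):
    run_trial(reward_given=True, learn=True)

print(run_trial(True, learn=False))   # delta ~ +1 at index 0 (cue), ~0 at index 4 (reward)
print(run_trial(False, learn=False))  # delta ~ -1 at index 4: expected reward omitted
```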
Results (1): Motor output map
- Motor output map of the cortex model
- The map representation is the muscle coding
[Figure: two sets of 20 x 20 cortical weight maps, one panel per muscle (labeled E, F, B, D, O, C); color scale 10 to 60]
Results (2): Motor output map
- 50 ms random stimulation of the motor cortex
- The motoneuron pattern shows a "determined" preferred direction
- Strictly, each motoneuron is tuned to a preferred "torque"; however, at a fixed starting posture, a preferred torque implies a preferred direction
[Figure: histogram of preferred-direction angles (radians) vs. number of neurons having that preferred direction]
Results (3): Motor input map
NOT FINISHED; NEEDS TUNING OF THE REINFORCEMENT LEARNING (movement is not fully learned)
Motor input map:
- Activation of the motor cortex during a voluntary movement
- Broad activation (over the first 20% of movement time)
- Similar directions have similar activation patterns
[Figure: end-effector trajectories in the workspace and eight 20 x 20 cortical activation maps, one per movement direction]
Results (4): Motor input map
- Population code during the first 20% of movement time
- Insignificantly tuned neurons were excluded (about half of the 400 neurons)
[Figure: workspace trajectories with population vectors; panel title: Population code regarding "UNIT" preferred torque]
Short discussion
- Neural coding and regression
- Tuning curves over directions: cosine, sharper than cosine, truncated cosine
- Advantages of population coding
- Two ways of neural coding
Neural coding and regression
- The cricket detects wind direction with four neurons: c_i is the pre-tuned (preferred) wind direction of the i-th neuron, and r_i is its firing rate (see the sketch below)
- The regression error is smallest where a preferred direction exists (the tuning curve is a truncated cosine function)
- "Inference and computation with population codes" (Pouget A, Dayan P, Zemel RS, 2003)
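A minimal sketch of the cricket example: four neurons with truncated-cosine tuning and a population-vector readout $\sum_i r_i c_i$. The preferred directions 90 degrees apart follow the standard cercal-system description (Pouget et al., 2003); the helper names are assumptions.

```python
# Four wind-direction neurons with truncated-cosine tuning, decoded by
# the population vector sum(r_i * c_i).
import numpy as np

pref = np.deg2rad([45, 135, 225, 315])               # preferred directions c_i
c = np.stack([np.cos(pref), np.sin(pref)], axis=1)   # unit vectors

def firing_rates(wind_angle: float) -> np.ndarray:
    """Truncated cosine tuning: r_i = max(0, cos(theta - pref_i))."""
    return np.maximum(0.0, np.cos(wind_angle - pref))

def decode(rates: np.ndarray) -> float:
    """Population vector: direction of sum_i r_i * c_i."""
    v = rates @ c
    return np.arctan2(v[1], v[0])

theta = np.deg2rad(60.0)
print(np.rad2deg(decode(firing_rates(theta))))  # recovers ~60 degrees
```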
Tuning curve
- If the tuning curve is a cosine function, as in Georgopoulos (1986): perfect reconstruction using the basis
- If the tuning curve is sharper than cosine: distortion (regression error) exists; sharper tuning curves have recently been reported (Paninski et al., 2004; Scott et al., 2001). See the numeric check below.
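A small numeric check of this claim under stated assumptions: projecting a cosine tuning curve onto the cosine basis leaves no residual, while a sharper-than-cosine curve (cos^3 here, an illustrative choice) leaves distortion.

```python
# Least-squares projection onto the constant/cosine/sine basis: zero
# residual for a cosine tuning curve, nonzero for a sharper one.
import numpy as np

theta = np.linspace(-np.pi, np.pi, 360, endpoint=False)
basis = np.stack([np.ones_like(theta), np.cos(theta), np.sin(theta)], axis=1)

def residual(f: np.ndarray) -> float:
    coef, *_ = np.linalg.lstsq(basis, f, rcond=None)
    return float(np.linalg.norm(f - basis @ coef))

print(residual(np.cos(theta - 0.7)))     # ~0: perfect reconstruction
print(residual(np.cos(theta - 0.7)**3))  # > 0: distortion (regression error)
```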
Advantages of population coding
- Low regression error: ideally, if preferred directions exist for all directions (Pouget et al., 2003)
- Robust to noisy input (Pouget et al., 2003)
- Less variability in motor control: assuming signal-dependent noise (SDN), using more muscles yields less variability (Todorov, 2002); a numeric check follows
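A minimal numeric sketch of the signal-dependent-noise argument: if each muscle's noise std is proportional to its own command (constant `k`, an assumption), splitting a force across N muscles scales the total-force std as 1/sqrt(N).

```python
# Monte-Carlo check of the SDN scaling: total std ~ k*F/sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
F, k, trials = 10.0, 0.2, 100_000

for n_muscles in (1, 4, 16):
    cmd = F / n_muscles                                      # per-muscle command
    noise = k * cmd * rng.standard_normal((trials, n_muscles))
    total = (cmd + noise).sum(axis=1)
    print(n_muscles, total.std())                            # ~2.0, ~1.0, ~0.5
```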
Future work
- Fine tuning of the reinforcement learning
- Cerebellum
- Concurrent learning of the motor input map and the motor output map
- Sensory cortex, which may be related to feedback control
- "Premotor cortex" for inverse kinematic coding (action sensory coding, currently implemented with a SOM)