[email protected] NSF ADP 2006

Jennie Si

Department of Electrical Engineering

Arizona State University

Gradient Algorithms, Robustness, and Partial Observability - In the Context of Cortical Neural Control Using Rat Model

Motivation/Challenge/Societal Impact

• Introduce an interesting platform to study the higher function of the brain (the frontal cortical area and the motor area) in decision and control using designed control tasks

• Use systems tools (ADP, MDP, CI…) to understand some fundamental science questions

• Need to develop new tools: technology centered designs and theory centered analysis

• Inspire new ways of thinking about complex systems

Background on cortical motor control

• Center-out task and preferred direction

• Population coding of movement direction and speed

• Motor cortical neural activity as a predictive signal, preceding movement onset

• Brain-machine interface: open-loop vs. closed-loop solutions
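The preferred-direction and population-coding bullets can be illustrated with a minimal sketch. It assumes cosine tuning and evenly spaced preferred directions, which are common textbook simplifications and are not stated on the slide; the function names, baseline, and gain are hypothetical.

```python
import math

def cosine_rate(theta, preferred, baseline=10.0, gain=8.0):
    """Firing rate of a cosine-tuned neuron (hypothetical tuning parameters)."""
    return baseline + gain * math.cos(theta - preferred)

def population_vector(rates, preferred_dirs, baseline=10.0):
    """Sum each neuron's preferred-direction unit vector, weighted by its
    baseline-subtracted rate, and return the angle of the resulting vector."""
    x = sum((r - baseline) * math.cos(p) for r, p in zip(rates, preferred_dirs))
    y = sum((r - baseline) * math.sin(p) for r, p in zip(rates, preferred_dirs))
    return math.atan2(y, x)

# Eight neurons with evenly spaced preferred directions (center-out layout).
preferred_dirs = [2 * math.pi * k / 8 for k in range(8)]
true_direction = math.radians(60)
rates = [cosine_rate(true_direction, p) for p in preferred_dirs]
decoded = population_vector(rates, preferred_dirs)
print(round(math.degrees(decoded), 1))
```

With noiseless rates and uniformly spaced preferred directions, the decoded angle matches the movement direction; real recordings would add noise and uneven coverage.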

Cortical neural signal extraction: non-invasive vs. invasive recording

• EEG

– Rhythms β and μ, P300, slow cortical potential (SCP)

– Sampling rate 200-1000 Hz

– Number of channels: from 1 or 2 up to 128 or 256

• Electrodes

– Bioactive (allowing growth of nerve) or bio-inactive; multiple microwires or multichannel electrode arrays

– Superficial motor areas or deep brain structures

– Primary motor, parietal, premotor, frontoparietal, basal ganglia

Cortical neural signal extraction: ECoG

[Figure: ECoG results. Panel (d): imagery is associated with a decrease in the µ (8–12 Hz) and β (18–26 Hz) bands; the conditions shown are imagining saying the word ‘move’ and resting. Electrodes for online control are circled. Spectral correlations of ECoG with target location (color encodes patients).]

Leuthardt et al. 2004, A brain–computer interface using electrocorticographic signals in humans, J. Neural Eng. 1: 63-71

Chapin, J.K.; Moxon, K.A.; Markowitz, R.S.; and Nicolelis, M.A.L. (1999) Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nature Neurosci., 2:664-670.

• Motor and thalamic regions

• Used a large number (40-60) of neurons

• Regressed the position of a water dripper arm

• Used a recurrent neural network

• Serruya, Hatsopoulos, Paninski, Fellows & Donoghue. Instant neural control of a movement signal, Nature 416 (6877): 141-142, Mar 14 2002

– Monkey, Utah array, motor cortex

– 2D cursor position and velocity, linear and Kalman filters

– A few (7–30) MI neurons

– Careful calibration can lead to reasonable control without excessive training

a, b, Trial examples showing the movement by hand (green) and by neural reconstruction (blue) of a cursor to a target (red). Dotted outlines represent the actual circumference of the target and cursor on the screen. In a, hand motion resembles the neurally controlled cursor path; in b, no manipulandum motion occurred, but the neurally controlled cursor reached the target. Each dot represents an estimate of position, updated at 50-ms intervals. Axes are in x, y screen coordinates (1,000 units corresponds to a visual angle of 3.5°); note that the two trials take place in different parts of the workspace.

• Taylor, Dawn M., Tillery, Stephen I. Helms, Schwartz, Andrew B., Direct Cortical Control of 3D Neuroprosthetic Devices, Science 2002 296: 1829-1832

– Monkey, microwire, motor and pre-motor cortex

– 3D cursor velocity, adaptive version of population vectors

– Showed that small numbers of neurons can be used to control a three-dimensional cursor, and that neurons trained to control a cursor can control a real robot for feeding

• Carmena JM, Lebedev MA, Crist RE, et al., Learning to control a brain-machine interface for reaching and grasping by primates, PLoS Biology 1 (2): 193-208, Nov 2003

– Monkey

– High-density array of 128 microwires; motor, premotor, supplementary motor, posterior parietal, and sensory cortex

– 2D cursor position and velocity and gripping force, linear filters

Musallam, S., Corneil, B. D., Greger, B., Scherberger, H., and Andersen, R. A. (2004). "Cognitive Control Signals for Neural Prosthetics", Science, Vol 305, Issue 5681, 258-262

- Parietal reach region (PRR)

- Cognition-based prosthetic goal rather than trajectory

- Performance improved over a period of weeks.

- Expected value signals related to fluid preference, the expected magnitude, or probability of reward were decoded simultaneously with the intended goal.

Driving tasks

• The arena for training rats to drive the robot toward one of the lights

Question asked

• How does the rat develop a control strategy to complete the driving tasks (under different time scales and levels of spatial complexity)?

Neuroscientific evidence

• Multimodal association area - anterior association area (prefrontal cortex) integrating different sensory modalities and linking them to action

• Macaque and rat prefrontal cortex receives multimodal cortico-cortical projections from motor, somatosensory, visual, auditory, gustatory, and limbic cortices

• Prefrontal areas provide cognitive, sensory or motivational inputs for motor behavior (rostral region in rat)

• Motor areas are concerned with more concrete aspects of movement (caudal region in rat)

One step at a time…

First, a directional control task with only high level control commands

Directional control

The Brain-Controlled Vehicle

[Block diagram with the following blocks and signals: Neural Interface; Neural Signals; Signal Processing Algorithms/Command Extraction; Control Command; Vehicle; Sensors; Vehicle State Signal; Environmental Feedback.]

Goals

• To decode the directional control decision as a predictive signal from motor cortical neural activities

• To associate motor neural activities with motor behavior, and thus to develop models that may help interpret the neural mechanism of cortical motor directional control

• Male Sprague-Dawley rats

• 2×4 arrays of 50 µm tungsten wires coated with polyimide

• Wires spaced 500 µm apart, for an array size of approximately 1.5 mm × 0.5 mm

• The implant site targets the rostral region

From Kolb, The Cerebral Cortex of the Rat, 1990

Brain Control Diagram

[Block diagram with the following blocks and signals: Neural Signals; Recording System; Spike times; Binned Data; Neural Activity Vector (NAV); Computation of Directional Control Decision; Decision (Left or Right); Task Execution; Feedback - Visual, Auditory & Reward.]

Binned data: for each of Neuron 1, ..., Neuron L, spike counts in Bin 1, ..., Bin K.

NAV: an $LK \times 1$ dimensional vector.
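The step from spike times to the NAV can be sketched as follows. This is a minimal illustration, assuming fixed-width bins and a flat concatenation of per-neuron counts; the function name, window start, and bin width are hypothetical and not taken from the slides.

```python
def neural_activity_vector(spike_times, t_start, bin_width, n_bins):
    """Bin each neuron's spike times into n_bins counts and concatenate them.

    spike_times: list of L lists, one list of spike times (seconds) per neuron.
    Returns a flat list of length L * n_bins (the NAV).
    """
    nav = []
    for neuron_spikes in spike_times:
        counts = [0] * n_bins
        for t in neuron_spikes:
            k = int((t - t_start) // bin_width)
            if 0 <= k < n_bins:  # ignore spikes outside the analysis window
                counts[k] += 1
        nav.extend(counts)
    return nav

# Two neurons (L = 2), five 100 ms bins (K = 5): the NAV has 10 entries.
spikes = [[0.05, 0.12, 0.31], [0.02, 0.26, 0.27, 0.45]]
nav = neural_activity_vector(spikes, t_start=0.0, bin_width=0.1, n_bins=5)
print(nav)
```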

Perievent Histograms, Rdar36

[Figure: perievent histograms for two conditions, Left Hits and Right Hits, with one panel per unit (sig001a, sig002a, sig003a, sig003b, sig004a, sig004b, sig005a, sig005b, sig006a, sig007a, sig007b, sig008a). x-axis: Time (sec), from -2 to 2; y-axis: counts/bin.]

CV accuracy, Calibration and Brain control, all neurons

Cross-validation accuracy boxplots for manual and brain control respectively, 5 rats, 8 data sets

• Each box shows the 25-75 quartile range and the median value of accuracy.

• For R3, R5/1, and R5/2, there are fewer than 30 trials in each brain control data set.

[Figure: boxplots of CV accuracy (y-axis, 0.4 to 1) by Rat/Day (x-axis: R1, R2, R3, R4/1, R4/2, R4/3, R5/1, R5/2). Legend: Calib 25/75, Calib median, Brain 25/75, Brain median.]

Typically 20 runs of randomized 5-fold cross-validation were performed for each data set.
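The evaluation protocol (20 runs of randomized 5-fold cross-validation) can be sketched as below. The slides do not name the classifier, so a nearest-centroid rule is used here purely as a stand-in; the synthetic data and all function names are hypothetical.

```python
import random

def nearest_centroid_fit(X, y):
    """Return {label: centroid} from feature vectors X and labels y."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Label of the centroid closest to x in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))

def repeated_kfold_accuracy(X, y, n_folds=5, n_runs=20, seed=0):
    """Accuracy of each run of randomized k-fold cross-validation."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_runs):
        order = list(range(len(X)))
        rng.shuffle(order)  # a fresh random partition for every run
        folds = [order[f::n_folds] for f in range(n_folds)]
        correct = 0
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in order if i not in held_out]
            model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
            correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(X))
    return accuracies

# Synthetic two-class data standing in for NAVs from left and right trials.
data_rng = random.Random(1)
X = [[data_rng.gauss(3.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(30)] + \
    [[data_rng.gauss(-3.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(30)]
y = ["left"] * 30 + ["right"] * 30
accs = repeated_kfold_accuracy(X, y)
print(min(accs), max(accs))
```

The spread of the 20 per-run accuracies is what a boxplot of the kind described above would summarize.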

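Modeling rat’s directional control using MDP?

MDPs:

• Finite state space: $S = \{1, 2, \ldots, n\}$

• Finite action space: $A_i = \{a_i^1, a_i^2, \ldots, a_i^{m_i}\}$, $i \in S$

• Infinite decision horizon: $T = \{0, 1, 2, 3, \ldots\}$

• Cost function $c(i, a)$, discount factor $\gamma$ $(0 < \gamma < 1)$

• Action mapping $\mathbf{a}$: $\mathbf{a}(i) \in A_i$, $i \in S$

• Stationary controller policy: $\pi^s = (\mathbf{a}, \mathbf{a}, \ldots)$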
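The ingredients listed above map naturally onto a small container type. The sketch below is illustrative only: the class and field names are hypothetical, states are indexed from 0 rather than 1, and the numbers in the toy example are made up.

```python
from dataclasses import dataclass

@dataclass
class FiniteMDP:
    """Finite-state, finite-action, discounted-cost MDP (illustrative container)."""
    n_states: int
    actions: list      # actions[i] = list of actions available in state i
    cost: dict         # cost[(i, a)] = one-step cost c(i, a)
    transition: dict   # transition[(i, a)] = list of next-state probabilities
    discount: float    # discount factor, strictly between 0 and 1

def evaluate_policy(mdp, policy, tol=1e-12):
    """Discounted cost of the stationary policy (policy[i] = action in state i),
    obtained by iterating v <- c + discount * P v until it stops changing."""
    v = [0.0] * mdp.n_states
    while True:
        new_v = [
            mdp.cost[(i, policy[i])]
            + mdp.discount * sum(p * v[j] for j, p in enumerate(mdp.transition[(i, policy[i])]))
            for i in range(mdp.n_states)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v

# Toy two-state example with made-up numbers.
mdp = FiniteMDP(
    n_states=2,
    actions=[["a1", "a2"], ["a1", "a2"]],
    cost={(0, "a1"): 1.0, (0, "a2"): 2.0, (1, "a1"): 3.0, (1, "a2"): 4.0},
    transition={(0, "a1"): [0.5, 0.5], (0, "a2"): [1.0, 0.0],
                (1, "a1"): [0.0, 1.0], (1, "a2"): [0.5, 0.5]},
    discount=0.9,
)
v = evaluate_policy(mdp, policy=["a1", "a1"])
print(v)
```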

Manual lever press following cue

Brain control - “imaginary lever press” following cue

Possible implementation

Define 6 possible states:

• Idle – between two trials

• Ready – right before trial start

• Reward – success of a trial

• No-Reward – failure of a trial

• Left experiment state – left cue experiment

• Right experiment state – right cue experiment

The action (control) is the rat’s volition represented by corresponding neural activities

Going from one state to another depends on the current state as well as the action taken.

• The reward can be stated as r(L\L) = 1; r(L\R) = -1; …; r(R\R) = 1; r(R\L) = -1; …
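The six states and the four reward values listed above can be written down directly. This is only a sketch: the enum, the "L"/"R" action labels, and the deterministic `next_state` rule are hypothetical, and reward entries elided on the slide are not filled in.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"            # between two trials
    READY = "ready"          # right before trial start
    LEFT_CUE = "left cue"    # left cue experiment
    RIGHT_CUE = "right cue"  # right cue experiment
    REWARD = "reward"        # success of a trial
    NO_REWARD = "no reward"  # failure of a trial

def reward(cue_state, action):
    """+1 when the decoded action matches the cue, -1 when it is the opposite one.
    Only the four (cue, action) pairs given on the slide are covered."""
    if action not in ("L", "R"):
        raise ValueError("only the actions 'L' and 'R' are modeled here")
    target = {State.LEFT_CUE: "L", State.RIGHT_CUE: "R"}[cue_state]
    return 1 if action == target else -1

def next_state(cue_state, action):
    """Hypothetical deterministic outcome of a cued trial."""
    return State.REWARD if reward(cue_state, action) == 1 else State.NO_REWARD

print(reward(State.LEFT_CUE, "L"), reward(State.LEFT_CUE, "R"))
print(next_state(State.RIGHT_CUE, "R"))
```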

Does this tell us more?

• “Open loop” discrimination and CV analysis provide a baseline of relating neural activity (spike trains) to behavioral parameters (left/right decision)

• As a decoding tool, can an MDP model tell us more than “open loop” analysis?

• MDP model to explain the experiment as a decision process

Technicalities

• How to represent control (start/stop and bin size)

Trial and error, hard to formulate theoretically

• How to compute the transition matrix given uncertainty, partially observed sequences of spike trains

We can try to formulate this theoretically…

• Uncertain transition matrices

– Robust value iteration (Nilim & El Ghaoui, 2005)

– Robust policy iteration (Satia & Lave, 1973)

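Problem formulation

• Classification of uncertain transition matrices

– Expression of uncertain transition matrices

Each row of the transition matrix is a function of an uncertain parameter $U$:

$$P = \begin{bmatrix} P_1^{a_1^1} \\ \vdots \\ P_i^{a_i^j} \\ \vdots \\ P_n^{a_n^{m_n}} \end{bmatrix} = \begin{bmatrix} f_1^{a_1^1}(U) \\ \vdots \\ f_i^{a_i^j}(U) \\ \vdots \\ f_n^{a_n^{m_n}}(U) \end{bmatrix}, \qquad P^{\mathbf{a}} = \begin{bmatrix} P_1^{\mathbf{a}(1)} \\ \vdots \\ P_n^{\mathbf{a}(n)} \end{bmatrix} = \begin{bmatrix} f_1^{\mathbf{a}(1)}(U) \\ \vdots \\ f_n^{\mathbf{a}(n)}(U) \end{bmatrix}$$

$$\mathcal{P} = \{P : U \in \mathcal{U}\}$$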

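Problem formulation

• Classification of uncertain transition matrices

– Definition of uncertain transition matrices

$\mathcal{P}_i^{a_i^j}$, $i \in S$, $a_i^j \in A_i$, is the projection of $\mathcal{P}$ on the direction of $P_i^{a_i^j}$.

$\mathcal{P}^{\mathbf{a}}$ is the projection of $\mathcal{P}$ on the direction of $P^{\mathbf{a}} = [P_1^{\mathbf{a}(1)}; P_2^{\mathbf{a}(2)}; \ldots; P_n^{\mathbf{a}(n)}]$.

The transition matrix is correlated if

$$\mathcal{P} \subset \mathcal{P}_1^{a_1^1} \times \cdots \times \mathcal{P}_i^{a_i^j} \times \cdots \times \mathcal{P}_n^{a_n^{m_n}}$$

The transition matrix is independent if

$$\mathcal{P} = \mathcal{P}_1^{a_1^1} \times \cdots \times \mathcal{P}_i^{a_i^j} \times \cdots \times \mathcal{P}_n^{a_n^{m_n}}$$

[Figure: two sets in the (x, y) plane with projections $I_1$ and $I_2$ on the axes: $S_1 = I_1 \times I_2$ and $S_2 \subset I_1 \times I_2$.]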
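For a finite uncertainty set, the independent/correlated distinction can be checked mechanically: the set is independent exactly when it equals the Cartesian product of its row projections. The sketch below assumes matrices are stored as tuples of row tuples; the helper names and the two-row toy sets are hypothetical.

```python
from itertools import product

def row_projections(matrices):
    """Project a finite set of matrices onto each row:
    the set of values that row takes across the whole uncertainty set."""
    n_rows = len(next(iter(matrices)))
    return [{m[r] for m in matrices} for r in range(n_rows)]

def is_independent(matrices):
    """True if the set equals the Cartesian product of its row projections."""
    matrices = set(matrices)
    full_product = set(product(*row_projections(matrices)))
    return matrices == full_product

def row(u):
    """Transition row of the form [1 - u, u]."""
    return (1.0 - u, u)

# Rows chosen freely of each other: the set is the full product.
independent_set = {(row(u1), row(u2)) for u1 in (0.0, 1.0) for u2 in (0.0, 1.0)}
# Rows tied to the same parameter: a strict subset of the product.
correlated_set = {(row(u), row(u)) for u in (0.0, 1.0)}

print(is_independent(independent_set), is_independent(correlated_set))
```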

Problem formulation

• Classification of MDPs

– MDPs with independent transition matrices

– MDPs with correlated transition matrices

• Optimality criterion

– Minimizing maximum value function for any initial state

$$\min_{\pi^s} \max_{P \in \mathcal{P}} v_P^{\pi^s}(i) = v^*(i), \quad \forall i \in S$$

• Stationary optimal policy pair

$(\pi^*, P^*)$ is optimal if, for any initial state $i \in S$,

$$v_{P^*}^{\pi^*}(i) = \max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi^s} \max_{P \in \mathcal{P}} v_P^{\pi^s}(i)$$

Problem formulation

• MDPs with independent transition matrices

– An optimal policy pair exists

– Robust value iteration and robust policy iteration are applicable

• MDPs with correlated transition matrices

– An optimal policy pair exists and both iterations are applicable

– An optimal policy pair exists but both iterations are no longer applicable

– An optimal policy pair does not exist

Questions to be answered

• Sufficient conditions to guarantee that robust value iteration and robust policy iteration are applicable;

• An optimality criterion under which a stationary optimal policy pair exists under a weak condition;

• Efficient algorithm.

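Sufficient conditions

Lemma

For any given $\pi^s = (\mathbf{a}, \mathbf{a}, \ldots)$ and any given $q \in \mathbb{R}^n$, $q > 0$:

$$\max_{v} \Big\{ q^T v : v(i) \le g_{\mathbf{a}}(v)(i) := c(i, \mathbf{a}(i)) + \gamma \max_{P_i^{\mathbf{a}(i)} \in \mathcal{P}_i^{\mathbf{a}(i)}} P_i^{\mathbf{a}(i)} v, \ i \in S \Big\} \qquad (1)$$

For any given $q \in \mathbb{R}^n$, $q > 0$:

$$\max_{v} \Big\{ q^T v : v(i) \le g(v)(i) := \min_{a \in A_i} \Big[ c(i, a) + \gamma \max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v \Big], \ i \in S \Big\} \qquad (2)$$

The functions $g_{\mathbf{a}}$ and $g$ are monotone non-decreasing and contractive.

The problems (1) and (2) have the unique optimal solutions denoted as $v_{\mathbf{a}}$ and $v^*$, which are the unique solutions to the fixed-point equations $v = g_{\mathbf{a}}(v)$ and $v = g(v)$, respectively.

The optimal transition probability rows are given by

$$P_i^{\mathbf{a}(i)*} = \arg\max_{P_i^{\mathbf{a}(i)} \in \mathcal{P}_i^{\mathbf{a}(i)}} P_i^{\mathbf{a}(i)} v_{\mathbf{a}}, \quad i \in S, \text{ which constitute } P^{\mathbf{a}*} \qquad (3)$$

$$P_i^{a*} = \arg\max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v^*, \quad i \in S,\ a \in A_i, \text{ which constitute } P^* \qquad (4)$$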

Sufficient conditions

Iterations for obtaining $v_{\mathbf{a}}$

(1) select $v_0 \in \mathbb{R}^n$ and set $k = 0$;

(2) compute $v_{k+1}$ by $v_{k+1} = g_{\mathbf{a}}(v_k)$;

(3) terminate if $v_{k+1} = v_k$ and output $v_{\mathbf{a}} = v_{k+1}$; otherwise, set $k = k + 1$ and go to (2)

Iterations for obtaining $v^*$

(1) select $v_0 \in \mathbb{R}^n$ and set $k = 0$;

(2) compute $v_{k+1}$ by $v_{k+1} = g(v_k)$;

(3) terminate if $v_{k+1} = v_k$ and output $v^* = v_{k+1}$; otherwise, set $k = k + 1$ and go to (2)
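The fixed-policy iteration above can be sketched for the case where each row's uncertainty set is a finite list of candidate rows. Two assumptions depart from the slide and are labeled as such: the inner maximization is done by enumeration, and the exact-equality stopping test is replaced by a numerical tolerance. Function and argument names are hypothetical.

```python
def robust_policy_evaluation(cost, row_sets, discount, tol=1e-10, max_iter=100000):
    """Iterate v <- g_a(v), where
    g_a(v)(i) = cost[i] + discount * max over candidate rows p of sum_j p[j] * v[j].

    cost[i]: cost of the policy's action in state i.
    row_sets[i]: finite list of candidate transition rows for that state-action pair.
    """
    v = [0.0] * len(cost)
    for _ in range(max_iter):
        new_v = [
            cost[i] + discount * max(sum(p * x for p, x in zip(row, v)) for row in row_sets[i])
            for i in range(len(cost))
        ]
        # Tolerance-based stop (assumption); the slide states exact equality.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("fixed-point iteration did not converge")

# Two states; every row is [1 - u, u] with u in {0, 0.5, 1} (made-up numbers).
rows = [(1.0 - u, u) for u in (0.0, 0.5, 1.0)]
v_a = robust_policy_evaluation(cost=[1.0, 3.0], row_sets=[rows, rows], discount=0.9)
print(v_a)
```

Because the update is a contraction with modulus equal to the discount factor, the iteration converges from any starting vector.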

Sufficient conditions

Theorem

Suppose that, for any $\pi^s$, $P^{\mathbf{a}*}$ defined by (3) is in the set $\mathcal{P}^{\mathbf{a}}$, and $P^*$ defined by (4) is in the set $\mathcal{P}$. Then:

i) A stationary optimal policy pair exists under the optimality criterion of minimizing maximum value function for any initial state;

ii) Robust value iteration is applicable;

iii) Robust policy iteration is applicable.

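Robust value iteration

1. Select $v_0 \in \mathbb{R}^n$ and set $k = 0$;

2. Compute $v_{k+1}$ by

$$v_{k+1}(i) = \min_{a \in A_i} \Big\{ c(i, a) + \gamma \max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v_k \Big\}, \quad i \in S;$$

3. If $v_{k+1} = v_k$, then go to 4; otherwise increment $k$ by 1 and go to 2;

4. Compute $\pi^* = (\mathbf{a}^*, \mathbf{a}^*, \ldots)$ and $P^*$ defined by

$$\mathbf{a}^*(i) = \arg\min_{a \in A_i} \Big\{ c(i, a) + \gamma \max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v_k \Big\}, \qquad P_i^{a*} = \arg\max_{P_i^{a} \in \mathcal{P}_i^{a}} \{ P_i^{a} v_k \};$$

5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$; otherwise, the algorithm cannot be applied.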
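A minimal sketch of robust value iteration follows, assuming independent, finite row-wise uncertainty sets so that the inner maximization is an enumeration and the final membership check of step 5 holds by construction. The tolerance-based stop, the function names, and the toy costs and rows are assumptions made for illustration.

```python
def robust_value_iteration(costs, row_sets, discount, tol=1e-10, max_iter=100000):
    """costs[i][a]: one-step cost; row_sets[i][a]: finite list of candidate rows.
    Returns (v, policy, worst_rows)."""
    n = len(costs)

    def worst_row(i, a, v):
        # Nature's choice: the candidate row with the largest expected value.
        return max(row_sets[i][a], key=lambda row: sum(p * x for p, x in zip(row, v)))

    def q_value(i, a, v):
        row = worst_row(i, a, v)
        return costs[i][a] + discount * sum(p * x for p, x in zip(row, v))

    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [min(q_value(i, a, v) for a in range(len(costs[i]))) for i in range(n)]
        converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    else:
        raise RuntimeError("value iteration did not converge")
    policy = [min(range(len(costs[i])), key=lambda a: q_value(i, a, v)) for i in range(n)]
    worst = [[worst_row(i, a, v) for a in range(len(costs[i]))] for i in range(n)]
    return v, policy, worst

# Toy problem: two states, two actions, rows [1 - u, u] with u on a grid.
grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
rows = [(1.0 - u, u) for u in grid]
costs = [[1.0, 2.0], [3.0, 4.0]]
row_sets = [[rows, rows], [rows, rows]]
v, policy, worst = robust_value_iteration(costs, row_sets, discount=0.9)
print(v, policy, worst[0][0])
```

In this toy problem the second state is the more expensive one, so the worst-case rows all push probability mass toward it.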

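Robust policy iteration

1. Initialization: select $\pi_0 = (\mathbf{a}_0, \mathbf{a}_0, \ldots)$ and set $k = 0$;

2. Policy evaluation: do iteration for $v_{\mathbf{a}_k}$;

3. Policy improvement: find $\pi_{k+1} = (\mathbf{a}_{k+1}, \mathbf{a}_{k+1}, \ldots)$ by

$$\mathbf{a}_{k+1}(i) = \arg\min_{a \in A_i} \Big\{ c(i, a) + \gamma \max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v_{\mathbf{a}_k} \Big\};$$

4. If $\pi_{k+1} = \pi_k$, compute $P^*$ by

$$P_i^{a*} = \arg\max_{P_i^{a} \in \mathcal{P}_i^{a}} \{ P_i^{a} v_{\mathbf{a}_k} \}, \quad i \in S,\ a \in A_i$$

and go to 5; otherwise increment $k$ by 1 and go to 2;

5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$, where $\pi^* = \pi_k$; otherwise, the algorithm cannot be applied.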
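Robust policy iteration can be sketched in the same setting, again assuming independent, finite row-wise uncertainty sets (so the final membership check is automatic) and a tolerance-based evaluation loop. The tie-breaking rule, the `initial_policy` argument, and all names and numbers are hypothetical.

```python
def expected(row, v):
    """Expected next-state value under one transition row."""
    return sum(p * x for p, x in zip(row, v))

def evaluate(policy, costs, row_sets, discount, tol=1e-10):
    """Worst-case value of a fixed policy by iterating v <- g_a(v)."""
    v = [0.0] * len(costs)
    while True:
        new_v = [
            costs[i][policy[i]] + discount * max(expected(r, v) for r in row_sets[i][policy[i]])
            for i in range(len(costs))
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v

def robust_policy_iteration(costs, row_sets, discount, initial_policy):
    n = len(costs)
    policy = list(initial_policy)                        # step 1: initialization
    while True:
        v = evaluate(policy, costs, row_sets, discount)  # step 2: policy evaluation
        improved = []                                    # step 3: policy improvement
        for i in range(n):
            q = [costs[i][a] + discount * max(expected(r, v) for r in row_sets[i][a])
                 for a in range(len(costs[i]))]
            best = min(q)
            # Keep the current action on (near-)ties so the loop cannot cycle.
            improved.append(policy[i] if q[policy[i]] <= best + 1e-9 else q.index(best))
        if improved == policy:                           # step 4: policy is stable
            worst = [[max(row_sets[i][a], key=lambda r: expected(r, v))
                      for a in range(len(costs[i]))] for i in range(n)]
            return policy, v, worst
        policy = improved

grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
rows = [(1.0 - u, u) for u in grid]
costs = [[1.0, 2.0], [3.0, 4.0]]
row_sets = [[rows, rows], [rows, rows]]
policy, v, worst = robust_policy_iteration(costs, row_sets, 0.9, initial_policy=[1, 1])
print(policy, v)
```

Starting from the more expensive action in both states, one improvement step reaches the stable policy in this toy problem.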

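Sufficient conditions

Example

$S = \{1, 2\}$, $A_1 = A_2 = \{a_1, a_2\}$

Each of the four transition rows (one per state-action pair) has the form $[\,1 - u_j \quad u_j\,]$, $j = 1, 2, 3, 4$. The costs are 1 and 2 for the two actions in state 1, and 3 and 4 for the two actions in state 2.

$U = \{u_1, u_2, u_3, u_4\}$, $W = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$

Correlated transition matrix: a constraint on $\mathcal{U}$ that couples $u_1$ with $u_3$ and $u_2$ with $u_4$, with $u_1, u_4 \in W$ (the relation symbols are not legible in this transcript)

Independent transition matrix for the projected set

Optimal controller policy: $\pi^* = (\mathbf{a}^*, \mathbf{a}^*, \ldots)$ with $\mathbf{a}^*(1) = a_1$, $\mathbf{a}^*(2) = a_1$

Optimal nature policy: every row of $P^*$ equals $[\,0 \quad 1\,]$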

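New optimality criterion

Minimizing maximum squared total value function

$$\min_{\pi^s} \max_{P \in \mathcal{P}} \|V_P^{\pi^s}\|^2 \qquad (5)$$

where the total value function is $V_P^{\pi^s} = [\,v_P^{\pi^s}(1), \ldots, v_P^{\pi^s}(i), \ldots, v_P^{\pi^s}(n)\,]^T$.

Stationary optimal policy pair

$(\pi^*, P^*)$ is optimal if

$$\|V_{P^*}^{\pi^*}\|^2 = \max_{P \in \mathcal{P}} \|V_P^{\pi^*}\|^2 = \min_{\pi^s} \max_{P \in \mathcal{P}} \|V_P^{\pi^s}\|^2$$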
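For small problems, the squared-total-value criterion can be checked by brute force: enumerate every stationary policy and every candidate transition model, and keep the policy whose worst-case squared norm is smallest. The sketch assumes a finite, explicitly listed uncertainty set (here a correlated one, with all rows tied to a single parameter); names and numbers are hypothetical.

```python
from itertools import product

def policy_value(policy, costs, matrix, discount, tol=1e-12):
    """Value of a stationary policy under one fixed transition model.
    matrix[i][a] is the next-state distribution for state i and action a."""
    v = [0.0] * len(costs)
    while True:
        new_v = [costs[i][policy[i]]
                 + discount * sum(p * x for p, x in zip(matrix[i][policy[i]], v))
                 for i in range(len(costs))]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v

def min_max_squared_total_value(costs, matrices, discount):
    """Brute-force version of the criterion: min over policies of the
    max over models of the squared norm of the value vector."""
    best = None
    action_ranges = [range(len(c)) for c in costs]
    for policy in product(*action_ranges):
        worst_value, worst_matrix = max(
            ((sum(x * x for x in policy_value(policy, costs, m, discount)), m) for m in matrices),
            key=lambda pair: pair[0],
        )
        if best is None or worst_value < best[0]:
            best = (worst_value, policy, worst_matrix)
    return best

def row(u):
    return (1.0 - u, u)

# Correlated set: every row shares the same parameter u.
matrices = [[[row(u), row(u)], [row(u), row(u)]] for u in (0.0, 0.5, 1.0)]
costs = [[1.0, 2.0], [3.0, 4.0]]
best_value, best_policy, best_matrix = min_max_squared_total_value(costs, matrices, discount=0.5)
print(best_value, best_policy)
```

Enumeration grows exponentially with the number of states, so this is only useful as a reference check on tiny examples.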

New optimality criterion

Existence of stationary optimal policy pair

Theorem: Assuming that, for any $\pi^s$, $\max_{P \in \mathcal{P}} \|V_P^{\pi^s}\|^2$ exists, a stationary optimal policy pair $(\pi^*, P^*)$ exists in terms of (5).

Relationship between the two optimality criteria

The optimality criterion of minimizing the maximum squared total value function generalizes the optimality criterion of minimizing the maximum value function for any initial state.

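Robust policy iteration under total value function

• Policy evaluation

– Direct method

$$\max_{P \in \mathcal{P}} \|V_P^{\pi^s}\|^2 = \max_{P \in \mathcal{P}} \big[(I - \gamma P^{\mathbf{a}})^{-1} C^{\mathbf{a}}\big]^T \big[(I - \gamma P^{\mathbf{a}})^{-1} C^{\mathbf{a}}\big]$$

– Iterative method: iteration for $v_{\mathbf{a}}$

• Policy improvement

– Policy improvement in robust policy iteration

$$\mathbf{a}_{k+1}(i) = \arg\min_{a \in A_i} \Big\{ c(i, a) + \gamma \max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v_{\mathbf{a}_k} \Big\}$$

– Controller policy elimination

Necessary condition for optimal policy at k-th iteration: $\|V_{P_k}^{\pi^*}\|^2 \le \|V_{P_k}^{\pi_k}\|^2$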
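The direct method amounts to a linear solve for each candidate transition matrix. The sketch below evaluates one fixed matrix, using a small hand-written Gaussian elimination so that it needs no external library; the helper names and the toy numbers are hypothetical, and maximizing over an uncertainty set would wrap this in a loop or an optimizer.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def total_value(P, C, discount):
    """Direct method: V = (I - discount * P)^{-1} C for one fixed matrix P."""
    n = len(C)
    A = [[(1.0 if i == j else 0.0) - discount * P[i][j] for j in range(n)] for i in range(n)]
    return solve_linear(A, C)

def squared_norm(V):
    return sum(x * x for x in V)

P = [[0.0, 1.0], [0.0, 1.0]]   # both states move to the second state
C = [1.0, 3.0]                 # costs of the policy's actions in each state
V = total_value(P, C, discount=0.9)
print(V, squared_norm(V))
```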

Algorithm of robust policy iteration under total value function

1. Initialization: set $k = 0$ and select $\pi_0 = (\mathbf{a}_0, \mathbf{a}_0, \ldots)$ (the slide also initializes a quantity $M_0$; its definition and update rule are not legible in this transcript)

2. Policy evaluation:

If the condition of iteration for $v$ is satisfied

(a) use "iterative method" to compute $V$ and $P_k$ such that $\|V_{P_k}^{\pi_k}\|^2 = \max_{P \in \mathcal{P}} \|V_P^{\pi_k}\|^2$

Else

(b) use "direct method"

3. Policy improvement:

(a) eliminate controller policies

If the condition in Theorem is satisfied

(b) Set … and … and select $\pi_{k+1} = (\mathbf{a}_{k+1}, \mathbf{a}_{k+1}, \ldots)$ by

$$\mathbf{a}_{k+1}(i) = \arg\min_{a \in A_i} \Big\{ c(i, a) + \gamma \max_{P_i^{a} \in \mathcal{P}_i^{a}} P_i^{a} v_{\mathbf{a}_k} \Big\}$$

If …, go to 4; otherwise, set $k = k + 1$ and go to 2

Else

(c) If …, set … and … and then select … and set $k = k + 1$ and go to 2; otherwise, select … and set … and … and go to 2

Else

(d) go to 4

4. Termination: output $(\pi_k, P_k)$ as a stationary optimal policy pair

Remaining issues toward MDP model of the rat’s neural control strategy

How can uncertain stationary transition matrices in Markov decision processes be estimated from the experimental data collected from the rat’s cortical motor areas while it performed its control tasks?

Proposed solution: the Dempster-Shafer (D-S) theory of evidence is proposed as a new model for obtaining a set estimate of the stationary transition matrix

The mathematics has been worked out; it still needs to be implemented as algorithms and compared with existing models

Is a POMDP model more feasible? How?

More work is needed to give the rat’s cortical neural control mechanism a reasonable mathematical model

Acknowledgement

• Support by NSF under ECS-0002098 and ECS-0233529, and partially by General Dynamics

• Support by ASU infrastructural funds

• Byron Olson and Jing Hu for work on rat experiment and analysis

• Baohua Li for robust dynamic programming results

• Jiping He for help with experiments

• Useful discussions with many (Dankert, L. Yang, C. Yang, Raghunathan …)

• Lab support by many (Silver, Scanlan, Tian…)