Reasons to be careful about reward
Unresolved questions in motor control: A UCL-JHU workshop

Upload: agatha-gaines

Post on 14-Jan-2016


TRANSCRIPT

Page 1

Reasons to be careful about reward

• A flow (policy) cannot be specified with a scalar function of states: the fundamental theorem of vector calculus – aka the Helmholtz decomposition

• Any (curl free) flow specified with reward can only have a fixed point attractor: reward cannot specify itinerant movement or policies

• Value is produced by flow – not its cause: reward is a consequence of (defined by) behaviour not its cause
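The first bullet can be checked numerically. A minimal sketch, with toy two-dimensional vector fields of my own choosing (not from the slides): a flow defined as the gradient of a scalar function of states always has zero curl, so a rotational flow of the sort needed for cyclic, itinerant movement cannot be the gradient of any scalar value function.

```python
# Numerical check of the Helmholtz point: gradient flows are curl-free,
# so a rotational flow cannot be written as grad V for any scalar V.

def grad_flow(x, y):
    # f = -grad V for V(x, y) = (x**2 + y**2) / 2: flows to a fixed point.
    return -x, -y

def rotational_flow(x, y):
    # Pure rotation about the origin: supports cyclic (itinerant) motion.
    return -y, x

def curl_z(f, x, y, h=1e-5):
    # z-component of the curl of a 2-D field: d f_y / d x - d f_x / d y.
    _, fy_xp = f(x + h, y)
    _, fy_xm = f(x - h, y)
    fx_yp, _ = f(x, y + h)
    fx_ym, _ = f(x, y - h)
    return (fy_xp - fy_xm) / (2 * h) - (fx_yp - fx_ym) / (2 * h)

print(curl_z(grad_flow, 0.3, 0.7))        # ~0: gradient flows have no curl
print(curl_z(rotational_flow, 0.3, 0.7))  # ~2: no scalar V generates this flow
```

The non-zero curl of the second field is exactly the component that a scalar reward or value function cannot specify.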

The inherent tautology of reward: explaining behaviour in terms of maximising reward is like explaining the evolution of the eye by saying it maximises adaptive value

Unresolved questions in motor control: A UCL-JHU workshop

Page 2

A Physicist, an Engineer, an Economist

Page 3

Active inference and its ergodic underpinnings:

    \dot{x} = f(x, a) + \omega          (flow of hidden states)
    s = g(x, a) + \omega_s              (sensory mapping)
    a = \arg\min_a F(s, \mu)            (action minimises free energy)
    F(t) = -\ln p(s(t) \mid m) + D[q \| p] \geq -\ln p(s(t) \mid m)

Random dynamical systems: (\Omega, \mathcal{F}, P, \vartheta(t)) with flow \varphi(t, \omega) : X \to X

Random attractors with small measure: H(X \mid m) \leq \ln \lambda(A(\omega))

Kolmogorov forward equation: \dot{p}(x, t \mid m) = \nabla \cdot (\Gamma \nabla p - f p), with ergodic (steady-state) solution \dot{p}(x \mid m) = 0

Free-energy formulation (ergodic theorem): H(X \mid m) = H[p(x \mid m)] = \lim_{T \to \infty} \frac{1}{T} \int_0^T F(t) \, dt

Value and reward: p(x \mid m) = \exp(V(x)), so that V(x(t)) = \int_t^\infty R(x(\tau)) \, d\tau and \dot{V}(x(t)) = \nabla V \cdot f = -R(x)

Helmholtz decomposition: f = \Gamma \nabla V + \nabla \times W

Free energy upper bounds expected cost: F(t) \geq -V(x(t))
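The ergodic theorem invoked here, equating the long-run time average of free energy (surprise) with the entropy of the ergodic density, can be illustrated for a one-dimensional Ornstein-Uhlenbeck process, whose stationary density is a standard normal. A sketch under that assumption, scoring surprise against the known stationary density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ornstein-Uhlenbeck process dx = -x dt + sqrt(2) dW: its ergodic density is
# the standard normal p(x | m) = N(0, 1), with entropy 0.5 * ln(2*pi*e).
dt, n = 0.01, 200_000
noise = np.sqrt(2 * dt) * rng.standard_normal(n)

x, surprise = 0.0, 0.0
for w in noise:
    x += -x * dt + w                                   # Euler-Maruyama step
    surprise += 0.5 * np.log(2 * np.pi) + 0.5 * x**2   # -ln p(x | m)

time_avg = surprise / n                     # time average of surprise
entropy = 0.5 * np.log(2 * np.pi * np.e)    # H[p(x | m)] for N(0, 1)
print(time_avg, entropy)                    # the two agree for long runs
```

The agreement improves with simulation length, as the ergodic theorem requires.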

Page 4

    \dot{x} = f(x, u)                   (controlled flow)
    s = g(x)                            (sensory mapping)
    a(t) = u(x(t))                      (action prescribed by the control law)

Value and reward: p(x \mid m) = \exp(V(x)); V(x(t)) = \int_t^\infty R(x(\tau)) \, d\tau; \dot{V}(x(t)) = -R(x)

Helmholtz decomposition: f = \Gamma \nabla V + \nabla \times W

Optimal control theory: u(x) = \arg\max_u \nabla V(x) \cdot f(x, u)
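The optimal-control scheme on this slide (value as the expected integral of future reward, with control chosen to ascend the value gradient) has a familiar discrete analogue in value iteration. A hypothetical toy, not from the slides: a five-state chain with reward only in the final state, a discount factor standing in for the infinite horizon.

```python
import numpy as np

# Value iteration on a 5-state chain: V(s) = r(s) + gamma * max_a V(next(s, a)),
# the discrete analogue of V(x(t)) = integral of future reward.
n_states, gamma = 5, 0.9
reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
actions = (-1, +1)
step = lambda s, a: min(max(s + a, 0), n_states - 1)   # chain with walls

V = np.zeros(n_states)
for _ in range(100):                        # Bellman backups to convergence
    V = np.array([reward[s] + gamma * max(V[step(s, a)] for a in actions)
                  for s in range(n_states)])

# Greedy policy: move to the neighbour of highest value, the analogue of
# u(x) = argmax_u grad V(x) . f(x, u).
policy = [max(actions, key=lambda a: V[step(s, a)]) for s in range(n_states)]
print(V.round(3))   # value rises monotonically toward the rewarded state
print(policy)       # [1, 1, 1, 1, 1]: always step up the value gradient
```

Note that the resulting flow has a single fixed-point attractor at the rewarded state, which is exactly the limitation the bullets on Page 1 point to.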

Page 6

Forward models in motor control

[Schematic: optimal control in an intrinsic frame of reference; the plant and sensory mapping in an extrinsic frame; hidden states x, control u, sensations s; motor commands with efference copy]

    Optimal control: u = \arg\min_u \int c(\hat{x}, u) \, dt
    Cost function: c(x, u)
    Sensory mapping: s = g(x)
    Plant kinetics: \dot{x} = f(x, u)
    Forward model (state estimation): \dot{\hat{x}} = f(\hat{x}, u) + K(s - g(\hat{x}))
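The state-estimation rule on this slide, \dot{\hat{x}} = f(\hat{x}, u) + K(s - g(\hat{x})), is a standard observer: predict with the forward model, correct with the sensory residual. A minimal sketch for a scalar linear plant; the plant, mapping and gain are toy choices of my own, not from the slides.

```python
# Forward-model state estimation for a scalar plant:
#   plant kinetics:  x' = f(x, u) = -x + u
#   sensory mapping: s  = g(x)    = x
#   estimator:       xhat' = f(xhat, u) + K * (s - g(xhat))
f = lambda x, u: -x + u
g = lambda x: x

dt, K, u = 0.01, 5.0, 1.0
x, xhat = 2.0, 0.0            # true state and an initially wrong estimate
for _ in range(2000):         # 20 units of simulated time (Euler integration)
    s = g(x)                                        # sensation
    xhat += dt * (f(xhat, u) + K * (s - g(xhat)))   # predict + correct
    x += dt * f(x, u)                               # plant

print(x, xhat)   # both near the fixed point x = u; the estimate tracks the state
```

The estimation error decays at rate (1 + K), so a larger gain K trades noise sensitivity for faster tracking.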

Page 7

Predictive coding in motor control

[Schematic: as for the forward-model scheme, but the forward model issues top-down predictions and receives bottom-up prediction errors; intrinsic and extrinsic frames of reference; sensations and control]

    Optimal control: u = \arg\min_u \int c(\hat{x}, u) \, dt
    Cost function: c(x, u)
    Sensory mapping: s = g(x)
    Plant kinetics: \dot{x} = f(x, u)
    Forward model: \dot{\hat{x}} = f(\hat{x}, u) + K \varepsilon
    Proprioceptive prediction error: \varepsilon_p = s_p - g_p(\hat{x})
    Exteroceptive prediction error: \varepsilon_e = s_e - g_e(\hat{x})

Page 8

Active inference

[Schematic: prior beliefs v(t) generate descending proprioceptive predictions; movements arise from a classical reflex arc, with corollary discharge; intrinsic and extrinsic frames of reference; sensations and action a(t)]

    Action (classical reflex): a = \arg\min_a \varepsilon_p^T \varepsilon_p
    Sensory mapping: s = g(x)
    Plant kinetics: \dot{x} = f(x, a)
    Forward model (with prior beliefs v): \dot{\hat{x}} = f(\hat{x}, v) + K \varepsilon
    Bottom-up (proprioceptive) prediction error: \varepsilon_p
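The reflex arc on this slide, where action minimises squared proprioceptive prediction error, can be sketched for a scalar plant. The prior belief v plays the role of a descending proprioceptive prediction, and action descends the error gradient (treating the sensitivity of sensation to action as positive, as a reflex arc does), pulling the plant toward the predicted state. The plant, gain and prior are toy choices of my own.

```python
# Active inference as a reflex: a = argmin_a eps_p' eps_p, implemented as
# gradient descent on the proprioceptive prediction error eps_p = s_p - v.
dt, k = 0.01, 10.0            # integration step and reflex gain
v = 1.5                       # prior belief: predicted proprioceptive input
x, a = 0.0, 0.0               # plant state and action
for _ in range(5000):
    s_p = x                   # proprioceptive sensation, s_p = g(x) = x
    eps_p = s_p - v           # bottom-up prediction error
    a += dt * (-k * eps_p)    # action descends the squared-error gradient
    x += dt * (-x + a)        # plant kinetics, x' = f(x, a) = -x + a

print(x, eps_p)   # the plant is drawn to the prior belief v; the error vanishes
```

No cost function appears anywhere: the "goal" is carried entirely by the prior belief v, which is the slide's point about replacing reward with priors.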

Page 9

Action with point attractors (cf. the equilibrium-point hypothesis)

[Figure: a two-joint arm with joint states x_1, x_2 and Jacobians J_1, J_2, reaching from (0, 0) toward a target; visual input V = (v_1, v_2, v_3) and proprioceptive input x_s; causes v^{(1)}, v^{(2)} and hidden states x^{(1)}; descending proprioceptive predictions and exteroceptive predictions; action \dot{a} = -\partial_a s^T \varepsilon_v^{(1)}]

Page 10

Action with heteroclinic cycles

[Figure: 'action' and 'observation' panels plotting position (y) against position (x) over the range 0 to 1.4, tracing a heteroclinic cycle; hidden states x^{(1)}, x^{(2)}; descending proprioceptive predictions; action \dot{a} = -\partial_a s^T \varepsilon_v^{(1)}]

Page 11

Unresolved questions in motor control: A UCL-JHU workshop

Reasons to be careful about reward

• A flow (policy) cannot be specified with a scalar function of states: the fundamental theorem of vector calculus – aka the Helmholtz decomposition

• Any (curl free) flow specified with reward can only have a fixed point attractor: reward cannot specify itinerant movement or policies

• Value is produced by flow – not its cause: reward is a consequence of (defined by) behaviour not its cause

The inherent tautology of reward: explaining behaviour in terms of maximising reward is like explaining the evolution of the eye by saying it maximises adaptive value (cf. intelligent design).