Page 1:

Goal-Directed Feature and Memory Learning

Cornelius Weber

Frankfurt Institute for Advanced Studies (FIAS)

Sheffield, 3rd November 2009

Collaborators: Sohrab Saeb and Jochen Triesch

Page 2:

for taking action, we need only the relevant features

(figure: illustration with features labeled x, y and z)

Page 3:

unsupervised learning in cortex

reinforcement learning in basal ganglia

state space → actor

Doya, 1999

Page 4:

background:

- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)

- reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)

- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)

- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...

- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)

Page 5:

reinforcement learning

go up? go right? go down? go left?

Page 6:

reinforcement learning

(diagram: input s → weights → action a)

Page 7:

reinforcement learning

minimizing the value estimation error:

Δq(s,a) ≈ 0.9 q(s',a') - q(s,a)   (moving target value)

Δq(s,a) ≈ 1 - q(s,a)   (fixed at the goal)

q(s,a): value of a state-action pair (coded in the weights)

repeated running to the goal: in state s, the agent performs the best action a (with some randomness), yielding s' and a'

--> values and action choices converge

(diagram: input s → weights → action a)
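A minimal sketch of this update rule in Python. The grid size, the reward of 1 at the goal and the ε-greedy action choice are assumptions; only the 0.9 discount and the two update cases are taken from the slide:

    import numpy as np

    n_states, n_actions = 16, 4              # assumed small grid world
    q = np.zeros((n_states, n_actions))      # q(s,a), here a table instead of weights
    alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration

    def choose_action(s):
        # best action with a little randomness (epsilon-greedy, an assumption)
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(q[s]))

    def sarsa_update(s, a, s_next, a_next, at_goal):
        # at the goal the target is fixed:      dq(s,a) = 1 - q(s,a)
        # otherwise the target keeps moving:    dq(s,a) = 0.9 q(s',a') - q(s,a)
        target = 1.0 if at_goal else gamma * q[s_next, a_next]
        q[s, a] += alpha * (target - q[s, a])

With repeated runs to the goal, the values and the action choices converge.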

Page 8:

reinforcement learning

with a simple input (state space), the actor knows what to do: "go right!"

with a complex input, the actor can't handle this: "go right? go left?"

Page 9:

(diagram: sensory input, reward and action for the complex-input scenario)

scenario: the bars are controlled by the actions 'up', 'down', 'left' and 'right'; reward is given if the horizontal bar is at a specific position
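A sketch of this scenario in Python. Which action moves which bar, the target row and the role of the second bar are assumptions; only the action set, the reward condition and the 12x12 size used later come from the slides:

    import numpy as np

    SIZE = 12            # 12x12 input, as in the 'short bars' results later
    GOAL_ROW = 3         # assumed target position of the horizontal bar

    def render(h_row, v_col):
        # input image with one horizontal bar and one vertical bar
        I = np.zeros((SIZE, SIZE))
        I[h_row, :] = 1.0        # horizontal bar: relevant for the reward
        I[:, v_col] = 1.0        # vertical bar: irrelevant for the reward (assumption)
        return I.ravel()

    def step(h_row, v_col, action):
        # the assignment of actions to bars is an assumption
        if action == 'up':      h_row = max(h_row - 1, 0)
        elif action == 'down':  h_row = min(h_row + 1, SIZE - 1)
        elif action == 'left':  v_col = max(v_col - 1, 0)
        elif action == 'right': v_col = min(v_col + 1, SIZE - 1)
        reward = 1.0 if h_row == GOAL_ROW else 0.0   # reward only at a specific position
        return h_row, v_col, reward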

Page 10:

need another layer(s) to pre-process complex data

feature detection

action selection

network definition:

s = softmax(W I)

P(a=1) = softmax(Q s)

q = a Q s

I: input, s: state, a: action; W, Q: weight matrices

(diagram: W acts as a feature detector; the state s encodes the position of the relevant bar; the weights Q encode q)
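A minimal sketch of this forward pass in Python; the vector shapes and sampling the action from the softmax probabilities are assumptions:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def forward(I, W, Q):
        # s = softmax(W I): the state layer, a feature representation of the input
        s = softmax(W @ I)
        # P(a=1) = softmax(Q s): action probabilities; sample one action as a one-hot vector
        a = np.random.multinomial(1, softmax(Q @ s)).astype(float)
        # q = a Q s: value of the chosen state-action pair
        q = a @ Q @ s
        return s, a, q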

Page 11:

feature detection

action selection

network training:

E = (0.9 q(s',a') - q(s,a))² = δ²   (minimize the error w.r.t. the current target)

ΔQ ≈ dE/dQ = δ a s   (reinforcement learning)

ΔW ≈ dE/dW = δ Q s I + ε   (δ-modulated unsupervised learning)

I: input, s: state, a: action; W, Q: weight matrices
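A sketch of one training step in Python. The learning rate, the handling of the goal case and the exact shape of the ΔW term are assumptions (the slide's "δ Q s I" is read here as the top-down feedback Qᵀa, times the state activity, times the input); the non-negativity constraint mentioned on the next slide is implemented as a simple clip:

    import numpy as np

    def train_step(W, Q, I, s, a, q, q_next, at_goal, eta=0.01, gamma=0.9):
        # s, a, q come from the forward pass for the current input; q_next from the next step
        target = 1.0 if at_goal else gamma * q_next
        delta = target - q                         # E = (target - q)^2 = delta^2
        dQ = delta * np.outer(a, s)                # Delta Q ~ delta a s  (reinforcement learning)
        dW = delta * np.outer((Q.T @ a) * s, I)    # one reading of Delta W ~ delta Q s I
                                                   # (delta-modulated unsupervised learning)
        Q += eta * dQ
        W += eta * dW
        np.clip(Q, 0.0, None, out=Q)               # non-negativity constraint on the weights
        np.clip(W, 0.0, None, out=W)
        return delta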

Page 12:

note: non-negativity constraint on weights

Details: network training minimizes error w.r.t. target Vπ

identities used:

Page 13:

SARSA with WTA input layer

(v should be q here)
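A winner-take-all state layer can be sketched as below (a one-hot vector at the strongest feature unit, replacing the softmax of the earlier slides); with only one state unit active, the δ-feedback reaches only the winner and the updates become local:

    import numpy as np

    def wta(x):
        # winner-take-all: one-hot vector at the maximally activated unit
        s = np.zeros_like(x, dtype=float)
        s[np.argmax(x)] = 1.0
        return s

    # used as s = wta(W @ I) in place of s = softmax(W @ I)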

Page 14:

learning the 'short bars' data

(figure: the data, the learned feature weights and the RL action weights, with reward and action indicated)

Page 15:

short bars in a 12x12 input; average number of steps to the goal: 11

Page 16:

learning the 'long bars' data

(figure: the data (input and reward; 2 actions not shown), the learned feature weights and the RL action weights)

Page 17:

WTA, non-negative weights

SoftMax, non-negative weights

SoftMax, no weight constraints

Page 18:

extension to memory ...

Page 19:

if there are detection failures of features ...

... it would be good to have memory or a forward model

grey bars are invisible to the network

Page 20:

(diagram: the state layer additionally receives the previous state s(t-1) and the previous action a(t-1) through memory weights)

network training by gradient descent as previously; softmax function used, no weight constraint

I: input, s: state, a: action; W: feature weights, Q: action weights
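A sketch of this state layer with memory in Python; the names W_s and W_a for the memory weights from s(t-1) and a(t-1) are assumptions, since the slide shows only the connections:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def state_with_memory(I, s_prev, a_prev, W, W_s, W_a):
        # the state layer sees the current input plus the previous state and action;
        # softmax activation, no weight constraint
        return softmax(W @ I + W_s @ s_prev + W_a @ a_prev)

If a feature is not detected in the current input (the grey, invisible bars), the contributions from s(t-1) and a(t-1) can still drive the state layer, which is how the network can update its trajectory internally (Page 22).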

Page 21:

learnt feature detectors

Page 22:

the network updates its trajectory internally

Page 23:

network performance

Page 24:

discussion

- two-layer SARSA RL performs gradient descent on value estimation error

- approximation with winner-take-all leads to local rule with δ-feedback

- learns only action-relevant features

- non-negative coding aids feature extraction

- memory weights develop into a forward model

- link between unsupervised and reinforcement learning

- demonstration with more realistic data still needed

Page 25:

video

Page 26:
Page 27:

Thank you!

Sponsors:

Bernstein Focus Neurotechnology, BMBF grant 01GQ0840

EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3

Frankfurt Institute for Advanced Studies, FIAS

Collaborators:

Sohrab Saeb and Jochen Triesch