On Linking Reinforcement Learning with Unsupervised Learning. Cornelius Weber, FIAS. Presented at Honda HRI, Offenbach, 17th March 2009.


TRANSCRIPT

Page 1:

On Linking Reinforcement Learning with Unsupervised Learning

Cornelius Weber, FIAS

presented at Honda HRI, Offenbach, 17th March 2009

Page 2:

For taking action, we need only the relevant features.

[Figure: input variables x, y, z]

Page 3:

[Diagram, after Doya (1999): unsupervised learning in cortex; reinforcement learning in basal ganglia; state space and actor]

Page 4:

[Diagram: state space feeding an actor with actions 'go left?' / 'go right?']

A 1-layer RL model of the basal ganglia ... is too simple to handle complex input.

Page 5:

With complex input (as in cortex), we need another layer (or layers) to pre-process the data.

[Diagram: complex input (cortex); feature detection; state space; actor; action selection]

Page 6:

Models’ background:

- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)

- reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)

- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)

- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...

- RL models learn a partitioning of the input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

Page 7:

[Diagram: agent-environment loop with sensory input, reward, and action]

Scenario: bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position.
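A minimal Python sketch of such a bars environment, assuming a 12x12 grid with one horizontal and one vertical bar; the class name, grid size, and reward placement are illustrative assumptions, not details taken from the talk:

import numpy as np

class BarsEnv:
    """Toy 'bars' world: actions shift a horizontal bar up/down and a
    vertical bar left/right; reward is given when the horizontal bar
    reaches a target row.  All specifics here are assumed for illustration."""

    ACTIONS = ('up', 'down', 'left', 'right')

    def __init__(self, size=12, target_row=0):
        self.size = size
        self.target_row = target_row
        self.reset()

    def reset(self):
        self.row = np.random.randint(self.size)   # horizontal bar position
        self.col = np.random.randint(self.size)   # vertical bar position
        return self.observe()

    def observe(self):
        """Return the image with both bars drawn, as a flat input vector."""
        img = np.zeros((self.size, self.size))
        img[self.row, :] = 1.0    # horizontal bar (relevant for reward)
        img[:, self.col] = 1.0    # vertical bar (irrelevant for reward)
        return img.ravel()

    def step(self, action):
        if action == 'up':
            self.row = max(self.row - 1, 0)
        elif action == 'down':
            self.row = min(self.row + 1, self.size - 1)
        elif action == 'left':
            self.col = max(self.col - 1, 0)
        elif action == 'right':
            self.col = min(self.col + 1, self.size - 1)
        reward = 1.0 if self.row == self.target_row else 0.0
        return self.observe(), reward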

Page 8:

Model that learns the relevant features:

- top layer: SARSA RL
- lower layer: winner-take-all feature learning
- both layers: learning modulated by δ

[Network diagram: input connected via feature weights to the state layer, which connects via RL weights to the action layer]
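A rough Python sketch of this two-layer model: an epsilon-greedy SARSA actor on top of a winner-take-all feature layer, with the TD error δ modulating the weight updates in both layers. The talk derives the exact update rules from an energy function (see the later slides); the simplified δ-modulated Hebbian rule for the feature weights below, and all hyperparameters, are illustrative assumptions.

import numpy as np

class WTASarsaAgent:
    """Lower layer: winner-take-all feature units (weights W, non-negative).
    Top layer: SARSA action values (weights Q).  Both layers learn with the
    same TD error delta; details are a sketch, not the original model."""

    def __init__(self, n_input, n_features, n_actions,
                 lr=0.05, gamma=0.9, epsilon=0.1):
        self.W = np.random.rand(n_features, n_input) * 0.1    # feature weights
        self.Q = np.random.rand(n_actions, n_features) * 0.1  # RL (action) weights
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon

    def encode(self, x):
        """Winner-take-all state: only the best-matching feature unit is active."""
        s = np.zeros(self.W.shape[0])
        s[np.argmax(self.W @ x)] = 1.0
        return s

    def select_action(self, s):
        """Epsilon-greedy choice over the state-action values Q s."""
        q = self.Q @ s
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(q))
        return int(np.argmax(q))

    def update(self, x, s, a, r, s_next, a_next):
        """SARSA TD error modulates learning in both layers."""
        delta = r + self.gamma * (self.Q @ s_next)[a_next] - (self.Q @ s)[a]
        self.Q[a] += self.lr * delta * s                   # top layer: SARSA
        k = int(np.argmax(s))                              # winning feature unit
        self.W[k] += self.lr * delta * (x - self.W[k])     # lower layer: delta-modulated
        np.clip(self.W, 0.0, None, out=self.W)             # non-negativity constraint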

Page 9:

SARSA with WTA input layer
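The equations on this slide are not legible in the transcript. In standard notation, SARSA on top of a winner-take-all state representation would read roughly as follows; the symbols w (feature weights) and q (action weights) are assumptions:

s_k = \begin{cases} 1, & k = \arg\max_j \sum_i w_{ji}\, x_i \\ 0, & \text{otherwise} \end{cases}
\qquad
Q(s,a) = \sum_k q_{ak}\, s_k

\delta = r + \gamma\, Q(s',a') - Q(s,a),
\qquad
\Delta q_{ak} \propto \delta\, s_k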

Page 10:

Note: non-negativity constraint on the weights.

Energy function: the estimation error of the state-action value.

Identities used: [equations not legible in the transcript]
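From the description (estimation error of the state-action value), the energy function is presumably the squared SARSA error, minimised by gradient descent with respect to both weight layers while treating the target as constant (semi-gradient); a hedged reconstruction:

E = \tfrac{1}{2}\,\bigl(r + \gamma\, Q(s',a') - Q(s,a)\bigr)^2 = \tfrac{1}{2}\,\delta^2,
\qquad
\Delta q_{ak} \propto -\frac{\partial E}{\partial q_{ak}} \propto \delta\, s_k,
\qquad
\Delta w_{ki} \propto -\frac{\partial E}{\partial w_{ki}}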

Page 11:

Learning the 'short bars' data.

[Figure: data, learned feature weights, RL action weights, reward, action]

Page 12:

Short bars in a 12x12 grid; average number of steps to goal: 11.

Page 13:

Learning the 'long bars' data.

[Figure: data, learned feature weights, RL action weights; input, reward, and the 2 actions not shown]

Page 14:

[Figure: learned weights under three conditions: WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints]
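For reference, the two activation rules compared here differ only in how the hidden state is derived from the feature activations; a minimal sketch, where the temperature parameter beta is an assumption:

import numpy as np

def wta_state(activation):
    """Winner-take-all: a one-hot state vector."""
    s = np.zeros_like(activation)
    s[np.argmax(activation)] = 1.0
    return s

def softmax_state(activation, beta=1.0):
    """SoftMax: a graded, normalised state vector."""
    e = np.exp(beta * (activation - activation.max()))
    return e / e.sum()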

Page 15:

Discussion

- simple model: SARSA on a winner-take-all network with δ-feedback

- learns only the features that are relevant for the action strategy

- theory behind it: derived from an (approximate) estimation of the state-action value function

- non-negative coding aids feature extraction

- a link between unsupervised and reinforcement learning

- demonstration with more realistic data is needed

Sponsors:

- Bernstein Focus Neurotechnology, BMBF grant 01GQ0840
- EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3
- Frankfurt Institute for Advanced Studies, FIAS

Page 16:

Page 17:

Page 18:

thank you ...