Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents
Vinay Papudesi and Manfred Huber


TRANSCRIPT

Page 1: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

LEARNING BEHAVIOURALLY GROUNDED STATE REPRESENTATIONS FOR REINFORCEMENT LEARNING AGENTS

Warning: Long title…

Vinay Papudesi and Manfred Huber

Page 2: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

INTRODUCTION
Staged skill learning involves:
To Begin: “Skills” are innate reflexes and a raw representation of the world.
The Process:
- Abstract away the details of learnt skills
- Use these abstractions as part of a higher-level representation: behavioural results, affordances
- Rinse and repeat

Page 3: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

THE DEVELOPMENTAL LEARNER
The state representation encodes only those aspects of the environmental state that have behavioural and reward implications in the context of the agent’s current capabilities.
- A compact representation
- Becomes more and more abstract over time
But how to model this?...

Page 4: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

STATE-SPACES
Three yummy flavours:
- External (World) State Space (…maps to…)
- Internal State Space (…composed of…)
- Action State Spaces
Internal and External spaces are good friends: Si ← I(Se), where Si = internal state, Se = external state, and I = mapping function.
Objective: don’t hard-code the mapping function, automate it!
The Internal State Space is a vector of Action Spaces, one for each action the agent provides…
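A minimal sketch of that structure, assuming a dictionary-style world observation and one learned mapping per action; the names below (ExternalState, internal_state_for) are illustrative, not the paper’s API.

# Hypothetical sketch of Si <- I(Se), with the internal state built as a
# vector of per-action outcome conditions. Names are illustrative only.
from typing import Callable, Dict, List

ExternalState = Dict[str, float]        # raw world observation, Se
ActionOutcome = str                      # symbolic outcome condition for one action
InternalState = List[ActionOutcome]      # Si: one entry per action the agent provides

def internal_state_for(
    se: ExternalState,
    action_mappings: List[Callable[[ExternalState], ActionOutcome]],
) -> InternalState:
    """Evaluate, for each action, the learned mapping from the world state to
    that action's outcome condition, and stack the results into Si."""
    return [mapping(se) for mapping in action_mappings]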

Page 5: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

ACTION SPACE
An action space is defined as a vector of paired (indicator, predicator) conditions.
Conditions are task-agnostic:
- Can be reused for learning different tasks
- An improvement over previous work
When an action is performed, it:
- Signals a transition between internal states, S1 → S2.
- Observes an outcome from the world, oʹ.
Two conditions are then constructed:
- Indicator: Cind(S2) = oʹ
- Predicator: Cpre(S1) = oʹ
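A hedged sketch of how one such transition could be recorded; the ActionSpace class and its fields are assumptions made for illustration, not the paper’s implementation.

# Illustrative sketch: after the action runs from S1 and lands in S2 with
# observed world outcome o', record an indicator on the destination state and
# a predicator on the source state.
from dataclasses import dataclass, field
from typing import Dict, Tuple

State = Tuple[str, ...]   # an internal state such as S1 or S2
Outcome = str             # the observed world outcome o'

@dataclass
class ActionSpace:
    indicator: Dict[State, Outcome] = field(default_factory=dict)   # Cind(S2) = o'
    predicator: Dict[State, Outcome] = field(default_factory=dict)  # Cpre(S1) = o'

    def record_transition(self, s1: State, s2: State, outcome: Outcome) -> None:
        """Construct the paired conditions for one observed transition."""
        self.indicator[s2] = outcome
        self.predicator[s1] = outcome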

Page 6: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

OUTCOMES, GENETIC ALGORITHMS, NON-DETERMINISM, OH MY!
The world state space is potentially vast, yet the outcome must be measured somehow.
Genetic Algorithms (GAs) are used to train hierarchical, rule-based classifiers.
What if an outcome cannot be accurately measured? The classifiers simply flag the world state as non-deterministic.
An outcome is thus a triple: (success%, failure%, undetermined)
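One way such a triple could be tallied from repeated trials is sketched below; the class and counters are hypothetical, and treating the undetermined component as a percentage is an assumption on top of the slide’s notation.

# Hypothetical tally of action outcomes into (success%, failure%, undetermined%).
from dataclasses import dataclass

@dataclass
class OutcomeEstimate:
    successes: int = 0
    failures: int = 0
    undetermined: int = 0   # trials the classifier flagged as non-deterministic

    def as_triple(self) -> tuple:
        """Return the outcome triple over all recorded trials."""
        total = self.successes + self.failures + self.undetermined
        if total == 0:
            return (0.0, 0.0, 0.0)
        return (100.0 * self.successes / total,
                100.0 * self.failures / total,
                100.0 * self.undetermined / total)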

Page 7: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

‘FIND’ ACTION: “Rotate 360° or until an object is visible”
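A toy sketch of that behaviour, assuming hypothetical object_visible and rotate_by primitives supplied by the robot platform.

# Toy sketch of the quoted 'FIND' behaviour: rotate in small steps until an
# object is visible or a full 360 degrees has been swept. The two callables
# stand in for platform-specific perception and motor primitives.
from typing import Callable

def find(object_visible: Callable[[], bool],
         rotate_by: Callable[[float], None],
         step_deg: float = 15.0) -> bool:
    """Return True if an object became visible before completing one full turn."""
    turned = 0.0
    while turned < 360.0:
        if object_visible():
            return True
        rotate_by(step_deg)
        turned += step_deg
    return object_visible()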

Page 8: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

TASKS
With the abstract state space constructed, the agent can now learn optimal policies for completing tasks.
Treat the problem as a Markov Decision Process (MDP): from some internal state the agent must select an appropriate action to progress toward completing the task optimally.
Reinforcement learning is used to compute such policies:
- Select the policy which maximises the expected future return.
- Future reward is estimated from prior experience.
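The slides do not name the particular reinforcement learning algorithm, so the sketch below uses generic tabular Q-learning over the abstract internal states purely to illustrate “maximise the expected future return, estimated from prior experience”.

# Generic tabular Q-learning sketch over (internal state, action) pairs.
from collections import defaultdict
import random

Q = defaultdict(float)   # estimated expected future return per (state, action)

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Temporal-difference update: move Q(s, a) toward the observed reward plus
    the discounted best estimate of future return from the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def select_action(Q, s, actions, epsilon=0.1):
    """Pick the action maximising expected return, with occasional exploration."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])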

Page 9: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

THE TASK MODEL
The agent must acquire a task model:
- The agent interacts with the environment, recording experiences as it does so.
- The internal source and destination states get updated with new conditions.
- The reward function is re-computed as the average reinforcement value over all the recorded experiences pertaining to the chosen action.
- The estimate eventually converges on the true model.
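A minimal sketch of the reward re-computation only (the condition updates on source and destination states are omitted), assuming experiences are keyed by (internal state, action); the class and method names are hypothetical.

# Running-average reward model: the reward estimate for a (state, action) pair
# is the mean reinforcement over all experiences recorded for that pair.
from collections import defaultdict

class TaskModel:
    def __init__(self):
        self.reward_sum = defaultdict(float)
        self.reward_count = defaultdict(int)

    def record_experience(self, state, action, reinforcement):
        """Store one experience so the averaged reward estimate stays current."""
        key = (state, action)
        self.reward_sum[key] += reinforcement
        self.reward_count[key] += 1

    def reward(self, state, action):
        """Average reinforcement over recorded experiences; with enough samples
        this estimate converges on the true expected reward."""
        key = (state, action)
        n = self.reward_count[key]
        return self.reward_sum[key] / n if n else 0.0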

Page 10: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

TASK-SPECIFIC CONDITIONS
Not all tasks can be optimally represented with this approach: actions are individually encapsulated, so the knowledge contained within them is not shared among them. E.g. ‘GOTO’ and ‘PICK’.
The solution is to build ‘bipartition’ states:
- Allow the GOTO task a condition on whether the item can be PICKed.
- … but only if the reward for doing so is significant, and the condition is statistically stable (low variance) and deterministic.
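A hedged sketch of such an acceptance test; the particular statistics and thresholds below are assumptions for illustration, not values from the paper.

# Admit a task-specific (bipartition) condition only if it buys significant
# reward, is statistically stable (low variance), and is (near-)deterministic.
from statistics import mean, pvariance
from typing import Sequence

def accept_condition(rewards_with: Sequence[float],
                     rewards_without: Sequence[float],
                     determinism: float,
                     min_reward_gain: float = 1.0,
                     max_variance: float = 0.5,
                     min_determinism: float = 0.95) -> bool:
    """Return True if the candidate condition passes all three tests."""
    if not rewards_with or not rewards_without:
        return False
    significant = mean(rewards_with) - mean(rewards_without) >= min_reward_gain
    stable = pvariance(rewards_with) <= max_variance
    deterministic = determinism >= min_determinism
    return significant and stable and deterministic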

Page 11: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

RESULTS – FORAGING
Left: a hard-coded, expert-designed state space and policy.
Right: the dynamically acquired equivalent.

Page 12: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

RESULTS – STATE SPACE SIZE
As the agent interacts with the environment, the proposed algorithm maintains a near-constant state-space complexity.

The representation is continually abstracted.

Page 13: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

RESULTS – POLICY PERFORMANCE
The presented technique is comparable to manually designed behaviour.
Domain-specific models are slow to converge: their state spaces are more complex, and therefore harder to learn.

Page 14: Learning Behaviourally Grounded State Representations for Reinforcement Learning Agents

CONCLUDING SENTIMENTS
The paper describes an approach that constructs an abstract internal state space grounded in the set of actions that the agent provides. Reinforcement learning aids in selecting actions to complete tasks.
By applying an inherently epigenetic design, the authors have devised a developmental learner that produces results comparable to hand-rolled solutions.
Task learning is performed in a bottom-up fashion (actions to tasks), but the representation of new tasks thereafter can be constructed top-down, using previously acquired state abstractions.