
Page 1: Life-long Learning in Sociable Agents

A Hierarchical Reinforcement Learning Approach

Professor Andrea Thomaz, Peng Zhou

Page 2: Life-long Learning in Sociable Agents

Sociable Agents

What are sociable agents?

Essentially, agents that must interact with humans in a social manner

Why sociable agents?

Page 3: Life-long Learning in Sociable Agents

Major Issues

Natural language processing: required for talking systems

Activity recognition: not just in the real world

User interface: agent-human communication, non-linguistic

Life-long learning: teach, explore, revise

The role of emotions: not just fluff

Page 4: Life-long Learning in Sociable Agents

My Focus (for the moment)

How to build persistent agents that accumulate concepts and skills “opportunistically” from their environment

Environment includes humans (usually non-expert)

Socially guided learning

Page 5: Life-long Learning in Sociable Agents

Background: Teaching Agents Through Social Interaction

Human input is a long-standing topic in machine learning (e.g., supervised learning, learning by demonstration)

Many existing techniques for “teaching” the robot

Psychological benefits: ease of use (“how humans want to teach”), increased believability, personal investment

Page 6: Life-long Learning in Sociable Agents

Previous Work: Sophie’s Kitchen

Reinforcement learning, domain of ~1000 states

Autonomous exploration

Human input: guidance & state rewards (see the sketch below)

Communication channel: gazing, explicit actions

Conducted user studies

Results: improved learning speed, insight into how humans like to teach, fun for the human
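As a rough illustration of the guidance channel mentioned above, here is a minimal sketch (not the published Sophie’s Kitchen code) of how a teacher’s guidance might bias an epsilon-greedy learner’s action choice. The tabular Q representation and the names guided_actions and select_action are assumptions for illustration only.

```python
# Hedged sketch: human guidance biases action selection.
# If the teacher has suggested one or more actions, choose among those;
# otherwise fall back to ordinary explore/exploit behavior.
import random

def select_action(Q, state, actions, guided_actions=None, epsilon=0.1):
    if guided_actions:                                   # teacher's suggestion wins
        return random.choice(list(guided_actions))
    if random.random() < epsilon:                        # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit current values
```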

Page 7: Life-long Learning in Sociable Agents

Reinforcement Learning

Basic idea: finding an optimal policy

Act in the environment, receive rewards, modify the policy accordingly

Typical formulation: an MDP defined by (S, A, R, T) (a minimal learning sketch follows below)

Advantages: desirable statistical properties; unsupervised, autonomous learning

Limitations: the curse of scale, poor transfer of knowledge, rewards can be hard to define
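For concreteness, a minimal tabular Q-learning sketch over an MDP (S, A, R, T): act, receive a reward, and adjust the policy via Q-values. The environment interface (env.reset / env.step returning next state, reward, done) is an assumed convention, not something defined in the slides.

```python
# Minimal tabular Q-learning sketch (assumed env interface, illustrative only).
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:        # explore
                a = random.choice(actions)
            else:                                # exploit current estimates
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)        # reward from R, next state from T
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```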

Page 8: Life-long Learning in Sociable Agents

Hierarchical Reinforcement Learning

Tackles the scaling and transfer problems

May more closely resemble human cognitive processes, and therefore inform humans’ expectations for the agent (“I’m trying to teach you how to open doors, darn it!”)

Two main components: hierarchical task structure, state abstraction

Learning the hierarchy (as opposed to handcrafting it): U-trees, HEXQ, diverse density approaches, …

Page 9: Life-long Learning in Sociable Agents

My Approach: Extend Sophie’s Kitchen to HRL

Basic idea behind Sophie’s Kitchen: unsupervised learning is great, but if non-expert supervision is available, why not make use of it?

Humans typically have insights into the domain; HRL could make very good use of those structures

Challenges in extending this to HRL: adapting non-expert, ambiguous input; modifying existing HRL algorithms to use the adapted input; skill reuse and retention; evaluation of human suggestions; improvement through practice; personality and trust issues

Page 10: Life-long Learning in Sociable Agents

Current Research Status

Extended the Sophie’s Kitchen domain to a tool-use grid-world domain: Sophie’s Adventure

Basic features: navigation, tool use, hierarchical structure, transferable skills, large number of states

Page 11: Life-long Learning in Sociable Agents

Current Research Status: Options

Sutton, Precup, and Singh (1999)

HRL method that addresses hierarchical task structure

Temporally extended actions consisting of (I, π, β), where the initiation set I is a subset of S, π is a local policy, and β is a termination condition mapping states in S to [0, 1] (see the sketch below)

Learning options is a natural extension of RL learning

Primitive actions can be thought of as one-step options; the options framework preserves optimality when options augment (rather than replace) the set of primitive actions
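A minimal sketch of the option tuple (I, π, β) as described on this slide. The concrete representation (a set of discrete states for I, plain callables for π and β) is an assumption for illustration, not the Sutton, Precup, and Singh implementation.

```python
# Illustrative representation of an option (I, pi, beta).
from dataclasses import dataclass
from typing import Callable, Set, Hashable

State = Hashable
Action = Hashable

@dataclass(eq=False)  # eq=False keeps identity hashing so options can be dict keys
class Option:
    initiation_set: Set[State]             # I: states where the option may start
    policy: Callable[[State], Action]      # pi: local policy over primitive actions
    termination: Callable[[State], float]  # beta: S -> [0, 1], probability of terminating

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

# A primitive action a can be viewed as a one-step option: it can start
# anywhere, always chooses a, and always terminates after one step.
def primitive_option(a: Action, all_states: Set[State]) -> Option:
    return Option(initiation_set=set(all_states),
                  policy=lambda s: a,
                  termination=lambda s: 1.0)
```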

Page 12: Life-long Learning in Sociable Agents

Current Research Status: Learning Options

Feature-based

“Clapping” reward channel

Multi-step guidance

Intra-option learning (sketched below)

Keep track of successes and failures

Practice when the user is not around

Aggregate similar options
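A hedged sketch of what an intra-option backup with the “clapping” reward channel could look like, reusing the Option sketch above. The simple additive combination of environment and human reward, and all variable names, are assumptions rather than the system’s actual design.

```python
# Illustrative one-step intra-option Q-learning backup with a human reward channel.
def intra_option_update(Q, s, a, env_reward, clap_reward, s_next, options,
                        alpha=0.1, gamma=0.9):
    # Combine the environment reward with the human's "clapping" reward (assumption).
    r = env_reward + clap_reward
    # Value of the best option available in the next state.
    best_next = max((Q.get((s_next, o), 0.0) for o in options), default=0.0)
    for o in options:
        # Simplified intra-option condition: any option whose (deterministic)
        # local policy would have chosen `a` in `s` learns from this transition.
        if o.policy(s) != a:
            continue
        beta = o.termination(s_next)
        backup = r + gamma * ((1.0 - beta) * Q.get((s_next, o), 0.0)
                              + beta * best_next)
        q = Q.get((s, o), 0.0)
        Q[(s, o)] = q + alpha * (backup - q)
```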

Page 13: Life-long Learning in Sociable Agents

In Progress: Formalize Reward Types

State rewards: “doing good”

Object-specific rewards: “look at this…”

Special rewards: “that’s the way to do it”

Extracting state abstractions from rewards: object-specific reward -> make the object a feature? (a speculative sketch follows below)
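A deliberately speculative sketch of the idea above: when the teacher gives an object-specific reward, promote that object into the feature set used for the current skill’s state abstraction. The representation (a set of feature names per skill) and the helper name are hypothetical.

```python
# Speculative illustration: object-specific reward -> add object to state features.
def handle_object_reward(skill_features: set, obj_name: str, reward: float,
                         threshold: float = 0.0) -> set:
    """Return a copy of the skill's feature set, extended with the rewarded object."""
    features = set(skill_features)
    if reward > threshold:
        features.add(obj_name)
    return features

# Example: the teacher rewards Sophie while she is holding the hammer.
features = handle_object_reward({"agent_pos"}, "hammer", reward=1.0)
# features is now {"agent_pos", "hammer"}
```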

Page 14: Life-long Learning in Sociable Agents

Planned Future Work

Option-level state abstraction (MAXQ, HAM, etc.)

Learning option-level state abstraction: U-trees

Involving human input, e.g., pointing out salient features of the environment

The “trust” issue: extending the user evaluation process to formulate “trust” for certain users

Page 15: Life-long Learning in Sociable Agents

Planned Future Work

Actual transfer learning experiments, and exploring how humans could facilitate the process

Carry out user studies on the system

Agent transparency in HRL – how to communicate internal state to the human

Ambiguous user signals: should the agent ask for clarification?

Page 16: Life-long Learning in Sociable Agents

Conclusion

Sociable agents are, or will be, ubiquitous

These agents should be able to learn from humans

Socially guided learning can both improve learning speed and “personalize” the agent

Higher-order learning likely necessary for realistic applications

Interesting inquiry into our own social expectations and desires