
Page 1: Life-long Learning in Sociable Agents

A Hierarchical Reinforcement Learning Approach

Professor Andrea Thomaz, Peng Zhou

Page 2: Life-long Learning in Sociable Agents

Sociable Agents

What are sociable agents?

Essentially, agents that must interact with humans in a social manner

Why sociable agents?

Page 3: Life-long Learning in Sociable Agents

Major Issues

Natural language processing: required for talking systems

Activity recognition: not just in the real world

User interface: agent-human communication, non-linguistic

Life-long learning: teach, explore, revise

The role of emotions: not just fluff

Page 4: Life-long Learning in Sociable Agents

My Focus (for the moment)

How to build persistent agents that accumulate concepts and skills “opportunistically” from their environment

Environment includes humans (usually non-expert)

Socially guided learning

Page 5: Life-long Learning in Sociable Agents

Background: Teaching Agents Through Social Interaction

Human input is a long-standing topic in machine learning (e.g., supervised learning, learning by demonstration)

Many existing techniques for “teaching” the robot

Psychological benefits: ease of use (“how humans want to teach”), increased believability, personal investment

Page 6: Life-long Learning in Sociable Agents

Previous Work: Sophie’s Kitchen

Reinforcement learning, domain of ~1000 states

Autonomous exploration

Human input: guidance & state rewards (see the sketch below)

Communication channel: gazing, explicit actions

Conducted user studies

Results: improved learning speed, insight into how humans like to teach, fun for the human
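As a rough illustration of the guidance channel mentioned above, here is a minimal sketch (not the published Sophie’s Kitchen code) of how a teacher’s guidance might bias an epsilon-greedy learner’s action choice. The tabular Q representation and the names guided_actions and select_action are assumptions for illustration only.

```python
# Hedged sketch: human guidance biases action selection.
# If the teacher has suggested one or more actions, choose among those;
# otherwise fall back to ordinary explore/exploit behavior.
import random

def select_action(Q, state, actions, guided_actions=None, epsilon=0.1):
    if guided_actions:                                   # teacher's suggestion wins
        return random.choice(list(guided_actions))
    if random.random() < epsilon:                        # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit current values
```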

Page 7: Life-long Learning in Sociable Agents

Reinforcement Learning

Basic idea: finding an optimal policy

Act in the environment, receive rewards, modify the policy accordingly

Typical formulation: an MDP defined by (S, A, R, T) (a minimal learning sketch follows below)

Advantages: desirable statistical properties; unsupervised, autonomous learning

Limitations: the curse of scale, poor transfer of knowledge, rewards can be hard to define
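For concreteness, a minimal tabular Q-learning sketch over an MDP (S, A, R, T): act, receive a reward, and adjust the policy via Q-values. The environment interface (env.reset / env.step returning next state, reward, done) is an assumed convention, not something defined in the slides.

```python
# Minimal tabular Q-learning sketch (assumed env interface, illustrative only).
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:        # explore
                a = random.choice(actions)
            else:                                # exploit current estimates
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)        # reward from R, next state from T
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```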

Page 8: Life-long Learning in Sociable Agents

Hierarchical Reinforcement Learning

Tackles the scaling and transfer problems

May more closely resemble human cognitive processes, and therefore inform humans’ expectations for the agent (“I’m trying to teach you how to open doors, darn it!”)

Two main components: hierarchical task structure, state abstraction

Learning the hierarchy (as opposed to handcrafting it): U-trees, HEXQ, diverse density approaches, …

Page 9: Life-long Learning in Sociable Agents

My Approach: Extend Sophie’s Kitchen to HRL

Basic idea behind Sophie’s Kitchen: unsupervised learning is great, but if non-expert supervision is available, why not make use of it?

Humans typically have insights into the domain; HRL could make very good use of those structures

Challenges in extending this to HRL: adapting non-expert, ambiguous input; modifying existing HRL algorithms to use the adapted input; skill reuse and retention; evaluation of human suggestions; improvement through practice; personality and trust issues

Page 10: Life-long Learning in Sociable Agents

Current Research Status

Extended the Sophie’s Kitchen domain to a tool-use grid-world domain: Sophie’s Adventure

Basic features: navigation, tool use, hierarchical structure, transferable skills, large number of states

Page 11: Life-long Learning in Sociable Agents

Current Research Status: Options

Sutton, Precup, and Singh (1999)

HRL method that addresses hierarchical task structure

Temporally extended actions consisting of (I, π, β), where the initiation set I is a subset of S, π is a local policy, and β is a termination condition mapping states in S to [0, 1] (see the sketch below)

Learning options is a natural extension of RL learning

Primitive actions can be thought of as one-step options; the options framework preserves optimality when options augment (rather than replace) the set of primitive actions
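A minimal sketch of the option tuple (I, π, β) as described on this slide. The concrete representation (a set of discrete states for I, plain callables for π and β) is an assumption for illustration, not the Sutton, Precup, and Singh implementation.

```python
# Illustrative representation of an option (I, pi, beta).
from dataclasses import dataclass
from typing import Callable, Set, Hashable

State = Hashable
Action = Hashable

@dataclass(eq=False)  # eq=False keeps identity hashing so options can be dict keys
class Option:
    initiation_set: Set[State]             # I: states where the option may start
    policy: Callable[[State], Action]      # pi: local policy over primitive actions
    termination: Callable[[State], float]  # beta: S -> [0, 1], probability of terminating

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

# A primitive action a can be viewed as a one-step option: it can start
# anywhere, always chooses a, and always terminates after one step.
def primitive_option(a: Action, all_states: Set[State]) -> Option:
    return Option(initiation_set=set(all_states),
                  policy=lambda s: a,
                  termination=lambda s: 1.0)
```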

Page 12: Life-long Learning in Sociable Agents

Current Research Status: Learning Options

Feature-based

“Clapping” reward channel

Multi-step guidance

Intra-option learning (sketched below)

Keep track of successes and failures

Practice when the user is not around

Aggregate similar options
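A hedged sketch of what an intra-option backup with the “clapping” reward channel could look like, reusing the Option sketch above. The simple additive combination of environment and human reward, and all variable names, are assumptions rather than the system’s actual design.

```python
# Illustrative one-step intra-option Q-learning backup with a human reward channel.
def intra_option_update(Q, s, a, env_reward, clap_reward, s_next, options,
                        alpha=0.1, gamma=0.9):
    # Combine the environment reward with the human's "clapping" reward (assumption).
    r = env_reward + clap_reward
    # Value of the best option available in the next state.
    best_next = max((Q.get((s_next, o), 0.0) for o in options), default=0.0)
    for o in options:
        # Simplified intra-option condition: any option whose (deterministic)
        # local policy would have chosen `a` in `s` learns from this transition.
        if o.policy(s) != a:
            continue
        beta = o.termination(s_next)
        backup = r + gamma * ((1.0 - beta) * Q.get((s_next, o), 0.0)
                              + beta * best_next)
        q = Q.get((s, o), 0.0)
        Q[(s, o)] = q + alpha * (backup - q)
```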

Page 13: Life-long Learning in Sociable Agents

In Progress: Formalize Reward Types

State rewards: “doing good”

Object-specific rewards: “look at this…”

Special rewards: “that’s the way to do it”

Extracting state abstractions from rewards: object-specific reward -> make the object a feature? (a speculative sketch follows below)
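A deliberately speculative sketch of the idea above: when the teacher gives an object-specific reward, promote that object into the feature set used for the current skill’s state abstraction. The representation (a set of feature names per skill) and the helper name are hypothetical.

```python
# Speculative illustration: object-specific reward -> add object to state features.
def handle_object_reward(skill_features: set, obj_name: str, reward: float,
                         threshold: float = 0.0) -> set:
    """Return a copy of the skill's feature set, extended with the rewarded object."""
    features = set(skill_features)
    if reward > threshold:
        features.add(obj_name)
    return features

# Example: the teacher rewards Sophie while she is holding the hammer.
features = handle_object_reward({"agent_pos"}, "hammer", reward=1.0)
# features is now {"agent_pos", "hammer"}
```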

Page 14: Life-long Learning in Sociable Agents

Planned Future Work

Option-level state abstraction (MAXQ, HAM, etc.)

Learning option-level state abstraction: U-trees

Involving human input, e.g., pointing out salient features of the environment

The “trust” issue: extending the user evaluation process to formulate “trust” for certain users

Page 15: Life-long Learning in Sociable Agents

Planned Future Work

Actual transfer learning experiments, and exploring how humans could facilitate the process

Carry out user studies on the system

Agent transparency in HRL – how to communicate internal state to the human

Ambiguous user signals: should the agent ask for clarification?

Page 16: Life-long Learning in Sociable Agents

Conclusion

Sociable agents are, or will be, ubiquitous

These agents should be able to learn from humans

Socially guided learning can both improve learning speed and “personalize” the agent

Higher-order learning likely necessary for realistic applications

Interesting inquiry into our own social expectations and desires