A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces
Dan Bohus (www.cs.cmu.edu/~dbohus, [email protected])
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15217

TRANSCRIPT

Page 1: A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces

A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces

Dan Bohus
www.cs.cmu.edu/~dbohus
[email protected]

Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15217

Page 2

problem

spoken language interfaces lack robustness when faced with understanding errors.

Page 3

more concretely …

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm, arrives Seoul at ………

Page 4

problem source

stems mostly from speech recognition
spans most domains and interaction types
exacerbated by operating conditions:
  spontaneous speech
  medium / large vocabularies
  large, varied, and changing user populations

Page 5

speech recognition impact

typical word-error-rates:
  10-20% for native speakers (novice users)
  40% and above for non-native speakers

significant negative impact on performance [Walker, Sanders]

[plot: task success drops as word-error-rate increases]

Page 6

approaches for increasing robustness

fix recognition

gracefully handle errors through interaction:
  detect the problems
  develop a set of recovery strategies
  know how to choose between them (policy)

a closer look : RL in spoken dialog systems : current challenges : RL for error handling

Page 7

outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling


Page 8

non- and misunderstandings

[slide annotations: NONunderstanding and MISunderstanding labels point at turns in the example dialog]

(the example dialog from page 3 is repeated here, annotated to show which turns are non-understandings and which are misunderstandings)


Page 9

six not-so-easy pieces

misunderstandings:
  detection: recognition or semantic confidence scores
  strategies: explicit confirmation (Did you say 10am?); implicit confirmation (Starting at 10am… until what time?); accept; reject
  policy: confidence threshold model (thresholds on [0, 1]: reject | explicit confirm | implicit confirm | accept)

non-understandings:
  detection: typically trivial [some exceptions may apply]
  strategies: Sorry, I didn't catch that… Can you repeat that?; Can you rephrase that? You can say something like "at 10 a.m."; [MoveOn]
  policy: handcrafted heuristics (first notify, then ask repeat, then give help, then give up)
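The confidence threshold model above can be sketched as a simple mapping from a confidence score in [0, 1] to one of the four actions. The cut-point values below are purely illustrative, not thresholds used in any deployed system:

```python
# Hypothetical sketch of a confidence-threshold policy for misunderstandings.
# The three cut-points are assumed values for illustration only.
REJECT_T, EXPLICIT_T, IMPLICIT_T = 0.3, 0.6, 0.9

def handle_misunderstanding(confidence: float) -> str:
    """Map a recognition/semantic confidence score to an error-handling action."""
    if confidence < REJECT_T:
        return "reject"            # discard the hypothesis and re-ask
    elif confidence < EXPLICIT_T:
        return "explicit_confirm"  # "Did you say 10am?"
    elif confidence < IMPLICIT_T:
        return "implicit_confirm"  # "Starting at 10am... until what time?"
    else:
        return "accept"            # trust the hypothesis
```

Tuning where the cut-points fall is exactly the policy question the talk addresses: set them by hand, or learn them from data.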


Page 10

outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling


Page 11

spoken dialog system architecture

[diagram: pipeline — Speech Recognition → Language Understanding → Dialog Manager ↔ Domain Back-end; Dialog Manager → Language Generation → Speech Synthesis]


Page 12

reinforcement learning in dialog systems

[diagram: the same pipeline — Speech Recognition → Language Understanding → Dialog Manager ↔ Domain Back-end → Language Generation → Speech Synthesis — with the Dialog Manager receiving noisy semantic input and producing actions (semantic output)]

debate over design choices → learn choices using reinforcement learning

agent interacting with an environment:
  noisy inputs
  temporal / sequential aspect
  task success / failure


Page 13

NJFun

“Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System”

[Singh, Litman, Kearns, Walker]

provides information about “fun things to do in New Jersey”

slot-filling dialog: type-of-activity, location, time

provide information from a database


Page 14

NJFun as an MDP

define state-space
define action-space
define reward structure
collect data for training & learn policy
evaluate learned policy


Page 15

NJFun as an MDP: state-space

internal system state: 14 variables
state for RL → vector of 7 variables:

greet: has the system greeted the user
attribute: which attribute the system is currently querying
confidence: recognition confidence level (binned)
value: has a value been obtained for the current attribute
tries: how many times the current attribute was asked
grammar: non-restrictive or restrictive grammar was used
history: was there any trouble on previous attributes

62 different states
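The 7-variable state vector described above can be encoded as a small immutable record. This is an illustrative sketch: the field names follow the slide, but the value ranges are assumptions, not the exact binnings used in NJFun:

```python
from dataclasses import dataclass

# Illustrative encoding of NJFun's 7-variable RL state.
# Field names follow the slide; value ranges are assumed for illustration.
@dataclass(frozen=True)
class NJFunState:
    greet: int       # 0/1: has the system greeted the user
    attribute: int   # which attribute is currently being queried
    confidence: int  # binned recognition confidence level
    value: int       # 0/1: value obtained for the current attribute
    tries: int       # how many times the current attribute was asked
    grammar: int     # 0 = non-restrictive, 1 = restrictive grammar
    history: int     # 0/1: trouble on previous attributes

s = NJFunState(greet=1, attribute=2, confidence=0, value=0,
               tries=1, grammar=0, history=1)
```

Because each variable takes only a handful of values, the reachable state space stays tiny (62 states here), which is what makes tabular learning feasible.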


Page 16

NJFun as an MDP: actions & rewards

type of initiative (3 types):
  system initiative
  mixed initiative
  user initiative

confirmation strategy (2 types):
  explicit confirmation
  no confirmation

resulting MDP has only 2 action choices / state

reward: binary task success


Page 17

NJFun as an MDP: learning a policy

training data: 311 complete dialogs collected using an exploratory policy

learned the policy using value iteration:
  begin with user initiative
  back-off to mixed or system initiative when re-asking for an attribute
  specific type of back-off is different for different attributes
  confirm when confidence is low


Page 18

NJFun as an MDP: evaluation

evaluated policy on 124 testing dialogs:
  task success rate: 52% → 64%
  weak task completion: 1.72 → 2.18
  subjective evaluation: no significant improvements, but a move-to-the-mean effect

learned policy better than hand-crafted policies (policies comparatively evaluated on the learned MDP)


Page 19

outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling


Page 20

challenge 1: scalability

contrast NJFun with RoomLine:
  conference room reservation and scheduling
  mixed-initiative, task-oriented interaction
  system obtains a list of rooms matching initial constraints
  system negotiates with the user to identify the room that best matches their needs
  37 concepts (slots), 25 questions that can be asked

another example: LARRI

full-blown MDP is intractable
not clear how to do state-abstraction


Page 21

challenge 2: reusability

underlying MDP is system-specific
MDP design still requires a lot of human expertise
new MDP for each system
new training & new evaluation
are we really saving time & expertise?

maybe we’re asking for too much?


Page 22

addressing the scalability problem

approach 1: user models / simulations
  costly to obtain real data → simulate
  simplistic simulators [Eckert, Levin]
  more complex, task-specific simulators [Scheffler & Young]
  real-world evaluation becomes paramount

approach 2: value function approximation
  data-driven state abstraction / state aggregation [Denecke]


Page 23

outline

a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling


Page 24

reinforcement learning in dialog systems

[diagram: the spoken dialog system pipeline again — Speech Recognition → Language Understanding → Dialog Manager ↔ Domain Back-end → Language Generation → Speech Synthesis — with semantic input and actions / semantic output at the Dialog Manager]

Focus RL only on the difficult decisions!


Page 25

task-decoupled approach

decouple:
  error handling decisions → use reinforcement learning
  domain-specific dialog control decisions → use your favorite DM framework

advantages:
  reduces the size of the learning problem
  favors reusability of learned policies
  lessens system authoring effort


Page 26

RavenClaw

[diagram: RavenClaw's two-tier architecture — a Dialogue Task (Specification) executed by a Domain-Independent Dialogue Engine. The RoomLine task tree: RoomLine → Login (Welcome, AskRegistered, AskName, GreetUser), GetQuery (DateTime, Location, Properties: Network, Projector, Whiteboard), GetResults, DiscussResults; concepts: user_name, registered, query, results. At runtime the engine maintains a Dialogue Stack (RoomLine, Login, AskRegistered) and an Expectation Agenda mapping grammar slots to concepts, e.g. registered: [No] → false, [Yes] → true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network]. An Error Handling Decision Process, driven by Error Indicators, invokes Strategies such as ExplicitConfirm.]


Page 27

decision process architecture

[diagram: the RoomLine task tree (RoomLine → Login → Welcome, AskRegistered, AskName, GreetUser; concepts user_name, registered) feeds a Gating Mechanism that consults one small MDP per concept (Concept-MDP) and per topic (Topic-MDP); each MDP suggests an action such as No Action or Explicit Confirmation, under an independence assumption between the models]

small-size models
parameters can be tied across models
accommodates dynamic task generation
favors reusability of policies
initial policies can be easily handcrafted
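The gating idea above can be sketched as follows: each concept gets its own small error-handling model, and a gate polls them and forwards the first non-trivial suggestion to the dialog engine. Class and function names here are illustrative, not RavenClaw's actual API:

```python
# Sketch of the gated, per-concept decision process described above.
# Names (ConceptMDP, gate, "no_action", ...) are hypothetical.
class ConceptMDP:
    """One small error-handling model scoped to a single concept."""
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # maps a belief indicator to an action label

    def suggest(self, belief):
        return self.policy(belief)  # e.g. "explicit_confirm" or "no_action"

def gate(mdps, beliefs):
    """Poll each concept model in turn; forward the first non-trivial
    error-handling action, otherwise let the domain dialog manager proceed."""
    for m in mdps:
        action = m.suggest(beliefs[m.name])
        if action != "no_action":
            return m.name, action
    return None, "no_action"
```

Because each model sees only its own concept, the models stay tiny and their parameters (or whole policies) can be shared across concepts and across systems, which is exactly the reusability argument on this slide.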


Page 28

reward structure & learning

[diagram: two reward configurations for the gated set of MDPs]

option 1 — global, post-gate rewards:
  a single reward signal arrives after the gating mechanism
  an atypical, multi-agent reinforcement learning setting

option 2 — local rewards:
  each MDP receives its own reward signal
  multiple, standard RL problems
  risk: solving the local problems, but not the global one

rewards can be based on any dialogue performance metric
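Under the local-rewards option, each small MDP can be trained with an entirely standard update rule on its own reward signal. A minimal sketch, with illustrative state and action labels and default step-size/discount values:

```python
# One tabular Q-learning step for a single concept/topic MDP under the
# "local rewards" option. State/action labels and constants are illustrative.
def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.95):
    """Update Q[(s, a)] toward r + gamma * max_a' Q[(s2, a')]; return new value."""
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

The appeal is that each model's learning problem is small and standard; the risk, as noted above, is that optimizing each local reward need not optimize the global dialogue outcome.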


Page 29

conclusion

reinforcement learning: a very appealing approach for dialog control

in practical systems, scalability is a big issue:
  how to leverage the knowledge we have?
  state-space design
  solutions that account for or handle sparse data
  bounds on policies
  hierarchical models

Page 30

thank you!

Page 31

Structure of Individual MDPs

[diagram: a concept MDP with belief states HC (high confidence), MC (medium confidence), LC (low confidence), and a 0/empty state; in each confidence state the available actions are ExplConf, ImplConf, and NoAct; in the 0/empty state, only NoAct]

concept MDPs:
  state-space: belief indicators
  action-space: concept-scoped system actions

topic MDPs:
  state-space: non-understanding, dialogue-on-track indicators
  action-space: non-understanding actions, topic-level actions
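The per-MDP structure above can be written down as a small state-to-actions table. The concept-MDP entries follow the diagram's labels (HC/MC/LC plus the empty state); the topic-MDP action names are assumptions drawn from the non-understanding strategies listed earlier in the talk, not names from the slide:

```python
# Illustrative encoding of the individual MDP structures sketched above.
# Concept-MDP state/action labels follow the diagram; the topic-MDP action
# names are assumed (based on the non-understanding strategies slide).
CONCEPT_MDP = {
    "HC":    ["ExplConf", "ImplConf", "NoAct"],  # high-confidence belief
    "MC":    ["ExplConf", "ImplConf", "NoAct"],  # medium-confidence belief
    "LC":    ["ExplConf", "ImplConf", "NoAct"],  # low-confidence belief
    "empty": ["NoAct"],                          # no value obtained yet (state 0)
}

TOPIC_MDP = {
    "non_understanding": ["AskRepeat", "AskRephrase", "GiveHelp", "MoveOn"],
    "on_track":          ["NoAct"],
}
```

Keeping every individual MDP this small is what makes the overall approach tractable: the policy for a 4-state, 3-action model can be learned from modest data, or simply handcrafted as a starting point.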