PLANNING, OPTIMIZING, AND CHARACTERIZING
Three approaches to dialogue management
Presented by Lee Becker, October 21, 2009
“The real art of conversation is not only to say the right thing at the right place but to leave unsaid the wrong thing at the tempting moment.”– Dorothy Neville
Introduction
Sample Dialogue
1 Tutor: Hi, how are you?
2 Student: Good
3 Tutor: Excellent, let's talk a bit about your experiences with science. Tell me about what you have been doing in science most recently.
4 Student: We've been learning about circuits and how to work light bulbs
5 Tutor: Circuits and working light bulbs, cool. Tell me more about circuits
The Dialogue Management Problem
Giving an appropriate response requires:
  Understanding what was said and how it fits into the overall conversational context
  Responding with intention
  Obeying social norms: turn-taking, feedback / acknowledgment
“Words are also actions, and actions are a kind of words.”
– Ralph Waldo Emerson
Dialogue as Planning
System View of Dialogue Management
(Diagram: the user's utterance flows into the Dialogue Manager, which produces the system's response.)
Planning Agents
Maintain state of the world (beliefs)
Predetermined wants (desires) specify how the world should look
Select goals (intentions)
Build and execute plans; monitor beliefs (the BDI architecture)
Blocks World Example
Init(On(A, Table) ^ On(B, Table) ^ On(C, Table) ^ Block(A) ^ Block(B) ^ Block(C) ^ Clear(A) ^ Clear(B) ^ Clear(C))
Goal(On(A, B) ^ On(B, C))
Action(Move(b, x, y))
  Precondition: On(b, x) ^ Clear(b) ^ Clear(y) ^ Block(b) ^ (b != x) ^ (b != y) ^ (x != y)
  Effect: On(b, y) ^ Clear(x) ^ ~On(b, x) ^ ~Clear(y)
Action(MoveToTable(b, x))
  Precondition: On(b, x) ^ Clear(b) ^ Block(b) ^ (b != x)
  Effect: On(b, Table) ^ Clear(x) ^ ~On(b, x)
(Figure: blocks A, B, and C start unstacked on the table; the goal is the stack A on B on C.)
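To make the operators above concrete, here is a minimal STRIPS-style sketch in Python: states are sets of ground literals, the two operators generate successor states, and a breadth-first search recovers a plan. The encoding and search are illustrative assumptions, not part of the original slides.

```python
from collections import deque

# STRIPS-style blocks world mirroring the slide's Move / MoveToTable operators.
# A state is a frozenset of ground literals such as ("On", "A", "Table") or ("Clear", "A").
BLOCKS = ["A", "B", "C"]
INIT = frozenset([("On", b, "Table") for b in BLOCKS] + [("Clear", b) for b in BLOCKS])
GOAL = {("On", "A", "B"), ("On", "B", "C")}

def successors(state):
    """Yield (action_name, next_state) for every applicable Move / MoveToTable."""
    on = {(lit[1], lit[2]) for lit in state if lit[0] == "On"}
    clear = {lit[1] for lit in state if lit[0] == "Clear"}
    for b, x in on:
        if b not in clear:                      # precondition Clear(b)
            continue
        for y in BLOCKS:                        # Move(b, x, y) onto a clear block y
            if y in (b, x) or y not in clear:
                continue
            nxt = set(state)
            nxt.discard(("On", b, x))
            nxt.discard(("Clear", y))
            nxt.add(("On", b, y))
            if x != "Table":                    # the table never needs a Clear literal
                nxt.add(("Clear", x))
            yield (f"Move({b},{x},{y})", frozenset(nxt))
        if x != "Table":                        # MoveToTable(b, x)
            nxt = set(state)
            nxt.discard(("On", b, x))
            nxt.add(("On", b, "Table"))
            nxt.add(("Clear", x))
            yield (f"MoveToTable({b},{x})", frozenset(nxt))

def plan(init, goal):
    """Breadth-first search over states; returns a shortest list of action names."""
    frontier, seen = deque([(init, [])]), {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for act, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + [act]))

print(plan(INIT, GOAL))   # ['Move(B,Table,C)', 'Move(A,Table,B)']
```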
Speech acts and planning
Planning is intuitive for physical actions
How can utterances fit into a plan?
  "Can you give me the directions to The Med?"
  "Did you take out the trash?"
  "I will try my best to be at home for dinner"
  "I name this ship the 'Queen Elizabeth'"
Speech Acts (Austin, Searle): illocutionary force; performative action
Speech Acts Meet AI
Allen, Cohen, and Perrault: speech acts expressed in terms of preconditions and effects
These relate to changes in the agents' mental states
Plans are sequences of speech acts
Example Speech Acts
REQUEST(speaker, hearer, act): effect: speaker WANT hearer DO act
INFORM(speaker, hearer, proposition): effect: KNOW(hearer, proposition)
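As a loose illustration of how these operators could be used in a plan, the sketch below encodes REQUEST and INFORM as functions that add the slide's effect literals to a mental-state set; the literal encoding and the omission of preconditions are simplifying assumptions, not the original formalism.

```python
# Mental states as sets of literals; each speech act adds its effect literals.
# Preconditions (e.g. that the speaker actually knows the proposition) are omitted for brevity.

def request(speaker, hearer, act, state):
    """REQUEST(speaker, hearer, act) -- effect from the slide: speaker WANT hearer DO act."""
    return state | {("WANT", speaker, ("DO", hearer, act))}

def inform(speaker, hearer, proposition, state):
    """INFORM(speaker, hearer, proposition) -- effect from the slide: KNOW(hearer, proposition)."""
    return state | {("KNOW", hearer, proposition)}

# A two-step "plan" of speech acts: inform the student, then request an explanation.
state = set()
state = inform("tutor", "student", "circuits_require_a_closed_loop", state)
state = request("tutor", "student", "explain_circuits", state)
print(("KNOW", "student", "circuits_require_a_closed_loop") in state)   # True
```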
TRAINS
A descendant of the Allen, Cohen, and Perrault BDI+Speech Acts tradition
A conversational agent for logistics and planning
Users converse with a “Manager” to develop a plan of action in the TRAINS domain.
Sample TRAINS Scenario
User: (1) We have to make OJ. (2) There are oranges at I. (3) And an OJ factory at B. (4) Engine E3 is scheduled to arrive at I at 3PM. (5) Shall we ship the oranges?
System: (6) Yes. (7) Shall I start loading the oranges in the empty car at I?
User: (8) Yes, and we'll have E3 pick it up. (9) OK?
System: (10) OK.
(Map of the TRAINS domain: City B with the OJ factory, City G with the banana source, City I with the orange source and an empty car, and Engine E3.)
(Diagram: the TRAINS dialogue manager combines a deliberative agent with a communicative agent.)
Discourse Obligations
BDI does not account for what compels one speaker to answer another
Two Strangers Example:
  A: Do you know the time?
  B: Sure. It's half past two.
Answering Person A's question does not help Person B attain his own goals.
Discourse Obligations
Obligations, like speech acts, produce observable effects on the speaker.

Source of obligation → Obliged action
S1 Accept or Promise A → S1 achieve A
S1 Request A → S2 address Request: accept or reject A
S1 Y/N-Question whether P → S2 Answer-if P
S1 Wh-Question about x → S2 Inform-ref x
Utterance not understood or incorrect → Repair utterance
S1 Initiate discourse unit → S2 acknowledge discourse unit
Request Repair of P → Repair P
Request Acknowledgement of P → Acknowledge P
Discourse Obligations
Inherent tension between Obligations and Goals
Approaches:
  Perform all obligated actions
  Perform only actions that will lead to a desired state
  A blend of the other two approaches
TRAINS Discourse Obligations
loop
  if system has obligations then
    address obligations
  else if system has performable intentions then
    perform action
  else
    deliberate on goals
  end if
end loop
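A small Python rendering of that control loop, assuming simple queues for obligations, intentions, and goals; the class and handler names are placeholders, not the TRAINS-93 implementation.

```python
from collections import deque

class DialogueAgent:
    """Obligation-first deliberation loop in the spirit of the TRAINS slide."""

    def __init__(self):
        self.obligations = deque()   # e.g. pending requests to address, pushed by interpretation
        self.intentions = deque()    # performable actions the agent has already adopted
        self.goals = []              # longer-term desires to deliberate over

    def step(self):
        if self.obligations:                        # obligations take priority...
            return self.address(self.obligations.popleft())
        if self.intentions:                         # ...then already-adopted intentions...
            return self.perform(self.intentions.popleft())
        return self.deliberate()                    # ...otherwise deliberate on goals

    def address(self, obligation):
        return ("address", obligation)

    def perform(self, intention):
        return ("perform", intention)

    def deliberate(self):
        # Pick a goal and turn it into new intentions (stubbed out in this sketch).
        return ("deliberate", self.goals[:1])
```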
TRAINS Discourse Obligations
Ensures system cooperation even if the response is in conflict with the user's goals
Aids in developing mixed initiative:
  Goal-driven actions → speaker-led initiative
  Obligation-driven actions → other-led initiative
Mutual Belief and Grounding
Conversational agents do not act in isolation
Mental states should account for both private beliefs and shared beliefs
In TRAINS, shared belief is needed for:
  Modeling the domain plan under construction
  Common understanding
Mutual Belief and Grounding: Extended Conversation Acts

Act Type         | Description                                         | Sample Acts
Turn-Taking      | Maintain, release, or take the turn in the dialogue | take-turn, keep-turn
Grounding        | Establish shared knowledge about the dialogue       | repair, acknowledge
Core Speech Acts | The original illocutionary acts                     | inform, yes-no-question, accept, request
Argumentation    | Characterize the relationship between utterances    | elaborate, Q&A
The TRAINS Approach
Attempts to capture the processes underlying dialogue via speech acts, discourse obligations, and mutual belief / grounding
Potentially rigid: rules and logic are handcrafted
“When one admits that nothing is certain, one must, I think, also admit that some things are much more nearly certain than others.” – Bertrand Russell
Dialogue as a Markov Decision Process
Flexible Dialogue
Qualities of robust dialogue: flexible conversation flow; adapted to users' preferences and skill levels; resilient to errors in understanding
The dialogue author's dilemma: robustness vs. effort
Other issues: noisy channels (ASR, NLU); evaluation
What is an optimal decision?
Dialogue with uncertainty
Markov Decision Process (MDP)
A probabilistic framework
Able to model planning and decision-making over time
Based on the Markov assumption: future states depend only on the current state and are independent of all earlier states
Markov Decision Processes
Markov chains with choice!
Markov Decision Processes
An agent-based process defined by a 4-tuple:
S: a set of states describing the agent's world
A: a set of actions the agent may take
T: a set of transition probabilities, P_a(s, s') = P(s'|s, a) = P(s_{t+1}|s_t, a_t)
R: a set of rewards, where r(s, a) is the expected immediate reward the agent receives for taking action a in state s
Markov Decision Processes
Policy function π(s): a mapping from states to actions
The optimal policy π*(s) yields the highest possible cumulative reward
An MDP with a fixed policy is a Markov chain

Rewards
Cumulative discounted reward:
Q = Σ_{t=0}^{∞} λ^t · r(s_t, a_t)
Optimal action-value recurrence:
Q(s, a) = r(s, a) + λ · Σ_{s'} P(s'|s, a) · max_{a'} Q(s', a')
Solving an MDP
Approaches: value iteration, policy iteration, Q-learning
Ideally: encode the state space with relevant features and rewards, and compute state transition and reward probabilities directly from a corpus of annotated dialogues
In practice: reduce the state space and do random exploration, or simulate a user to produce a corpus
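As a concrete, if toy, example of the "ideal" route, the sketch below runs value iteration over a hand-invented transition and reward table for a few dialogue-like states; the states, probabilities, and rewards are made up for illustration, not drawn from any corpus.

```python
# Value iteration: V(s) <- max_a [ r(s,a) + lambda * sum_s' P(s'|s,a) V(s') ]
STATES = ["greet", "ask_slot", "confirm", "done"]
ACTIONS = ["system_init", "user_init", "confirm_it"]
LAMBDA = 0.9

# P[(s, a)] = list of (next_state, probability); R[(s, a)] = immediate reward.  Toy numbers.
P = {("greet", "system_init"): [("ask_slot", 1.0)],
     ("greet", "user_init"): [("ask_slot", 0.7), ("greet", 0.3)],
     ("ask_slot", "system_init"): [("confirm", 0.8), ("ask_slot", 0.2)],
     ("ask_slot", "user_init"): [("confirm", 0.5), ("ask_slot", 0.5)],
     ("confirm", "confirm_it"): [("done", 0.9), ("ask_slot", 0.1)]}
R = {key: -1.0 for key in P}           # every turn costs -1 ...
R[("confirm", "confirm_it")] = 5.0     # ... except successfully closing the dialogue

def value_iteration(theta=1e-6):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            q_values = [R[(s, a)] + LAMBDA * sum(p * V[s2] for s2, p in P[(s, a)])
                        for a in ACTIONS if (s, a) in P]
            if not q_values:           # terminal state with no actions
                continue
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

print(value_iteration())
```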
Reinforcement Learning for Dialogue Policy Design: the NJFun system [Singh et al., 2002]
NJFun Sample Dialogue
S1: Welcome to NJFun. How may I help you?
U1: I'd like to find um winetasting in Lambertville in the morning. (ASR: I'd like to find out wineries the in the Lambertville in the morning)
S2: Did you say you are interested in Lambertville?
U2: Yes.
S3: Did you say you want to go in the morning?
U3: Yes.
S4: I found a winery near Lambertville that is open in the morning. It is the Poor Richard's Winery in Lambertville. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'.
U4: Good.
S5: Goodbye and have a nice day.
NJFun – State Spaces
Full state space: encode everything, but state explosion quickly makes optimization intractable
State-space estimator: encode just enough information to learn good decisions
NJFun – State Space Estimator
Limited policy optimization for two types of decisions:
Initiative – direct vs. open-ended
  System initiative: "Please say the name of the town or city that you are interested in."
  User initiative: "Please give me more information."
Confirmation – verify vs. assume
  Verify: "Did you say you are interested in <location>?"
NJFun State Features & Values

Feature                    | Values        | Explanation
Greet (G)                  | 0, 1, 2       | Whether the system has greeted the user
Attribute (A)              | 1, 2, 3, 4    | Which attribute is being worked on
Confidence / Confirmed (C) | 0, 1, 2, 3, 4 | 0, 1, 2 for low, medium, and high ASR confidence; 3, 4 for explicitly confirmed and disconfirmed
Value (V)                  | 0, 1          | Whether a value has been obtained for the current attribute
Tries (T)                  | 0, 1, 2       | How many times the current attribute has been asked
Grammar (M)                | 0, 1          | Whether a non-restrictive or restrictive grammar was used
History (H)                | 0, 1          | Whether there was trouble on any previous attribute
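One possible encoding of this seven-feature state vector as a Python value object; the field names and ranges follow the table above, but the class itself is purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NJFunState:
    """State vector from the table: (Greet, Attribute, Confidence/Confirmed, Value, Tries, Grammar, History)."""
    greet: int       # 0-2: whether the system has greeted the user
    attribute: int   # 1-4: which attribute is being worked on
    confidence: int  # 0-2 low/medium/high ASR confidence; 3/4 confirmed/disconfirmed
    value: int       # 0/1: value obtained for the current attribute
    tries: int       # 0-2: times the current attribute has been asked
    grammar: int     # 0/1: non-restrictive vs. restrictive grammar
    history: int     # 0/1: trouble on any previous attribute

# Example: working on attribute 2, with a medium-confidence, unconfirmed value on the first try.
s = NJFunState(greet=1, attribute=2, confidence=1, value=1, tries=0, grammar=0, history=0)
```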
State Space Estimator
The features yield 62 possible dialogue states
42 of them are choice states with 2 possible actions each (confirm vs. not confirm, or system vs. user initiative)
In total, 2^42 distinct dialogue policies
Finding an Optimal Policy
Gathering training data:
  New system built with a randomized dialogue policy
  Deployed to 54 users, each assigned 6 tasks; 311 dialogues in total
Reward measure: binary task completion
  +1: dialogues that queried for the exact set of attributes (activity type, location, time of day, etc.)
  -1: otherwise
The resulting corpus was used for reinforcement learning
Finding an Optimal Policy
RL learned policy:
  Initiative: begin with user initiative; back off to mixed or system initiative
  Confirmation: in general, confirm at lower ASR confidence values
  Other features describe more complex interventions
Evaluating the Optimal Policy
System with the learned policy tested on an additional 21 users; 124 test dialogues
Did not significantly outperform the baseline on the binary completion measure (p = 0.059)
Statistically significant improvements using the weak completion and ASR measures
Limited Observability
MDPs assume the world is fully observable
However:
  Not all errors or states are directly observable
  Undetected errors may propagate
  Evidence may not indicate the error
Limited Observability
Turn / Utterance [ASR output]:
S1: Are you a registered user?
U2: No, I'm not [NO I'M NOT]
S3: What is your full name?
U4: <Username> [ATHENS IN AKRON]
S5: There is an Athens in Georgia and in Greece. What destination did you want?
U6: Neither [NEITHER]
S7: I'm sorry, I did not catch that. There is an Athens in Georgia and in Greece. Which destination do you want?
U8: Georgia [GEORGIA]
S9: A flight from Athens... Where do you want to go?
U10: Start over [START OVER]
Partially Observable Markov Decision Processes (POMDPs)
Intuition:
  Maintain parallel hypotheses of what was said
  Backpedal or switch strategies when a hypothesis becomes sufficiently unlikely
POMDP Example
(Figure: each turn shows the POMDP belief state b as a distribution over pizza sizes {S, M, L}, alongside the single order hypothesis kept by a traditional method.)

S: How can I help you?
U: A small pepperoni pizza. [A small pepperoni pizza]
S: Ok, what toppings?
U: A small pepperoni [A small pepperoni]
S: And what type of crust?
U: Uh just normal [large normal]

Traditional method: Order {size: <null>} → {size: small} → {size: small} → {size: large?}
POMDP: the belief over sizes concentrates on "small" after the first two user turns and shifts only partially toward "large" after the final misrecognition.
A Comparison of Markov Models

                            | No control over transitions | Control over transitions (choice of action)
States fully observable     | Markov chain                | MDP
States partially observable | HMM                         | POMDP
Table courtesy of http://www.cassandra.org/pomdp/pomdp-faq.shtml
POMDPs
Extends the MDP model with:
  O: a set of observations the agent can receive about the world
  Z: observation probabilities
  b(s): the belief state, the probability of being in state s
The agent is not in a fixed known state; instead it maintains a probability distribution over all possible states
POMDPs
Belief monitoring: shift probability mass to match observations
The optimal action depends only on the agent's current belief state

Belief update:
b'(s') = p(o'|s', a) · Σ_{s∈S} p(s'|s, a) · b(s) / p(o'|a, b)
       = k · p(o'|s', a) · Σ_{s∈S} p(s'|s, a) · b(s)
where k = 1 / p(o'|a, b) is a normalizing constant.
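A direct, tabular translation of that update; the dictionary-based transition and observation models are assumptions about representation, and the normalizing constant k is recovered by renormalizing at the end.

```python
def belief_update(b, a, o, states, T, Z):
    """b'(s') ∝ Z(o | s', a) * sum_s T(s' | s, a) * b(s), then renormalize.

    b:  dict state -> probability (current belief)
    T:  dict (s, a, s_next) -> transition probability
    Z:  dict (s_next, a, o) -> observation probability
    """
    b_new = {}
    for s_next in states:
        prior = sum(T.get((s, a, s_next), 0.0) * b[s] for s in states)
        b_new[s_next] = Z.get((s_next, a, o), 0.0) * prior
    norm = sum(b_new.values())            # equals p(o | a, b)
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in b_new.items()}
```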
POMDPs
(Influence diagram: states S, actions A, observations O, and rewards R, unrolled over time steps.)
POMDPs for Spoken Dialogue Systems
SDS-POMDP [Williams and Young, 2007]
Claim: POMDPs perform better for spoken dialogue systems because they maintain parallel dialogue hypotheses and can incorporate ASR confidence scores directly into the belief-state update

SDS-POMDP Architecture
(Diagram legend: A = the user's action, Y = its audio signal, Ã = the recognized action, C = the ASR confidence score.)
SDS-POMDP Components

                     | Standard POMDP | SDS-POMDP
State set            | S              | (s_u, a_u, s_d)
Observation set      | O              | (ã_u, c)
Action set           | A              | (a_m)
Transition function  | p(s'|s,a)      | p(s'_u|s_u,a_m) · p(a'_u|s'_u,a_m) · p(s'_d|a'_u,s_d,a_m)
Observation function | p(o'|s',a)     | p(ã'_u, c'|a'_u)
Reward function      | r(s,a)         | r(s_u, a_u, s_d, a_m)
Belief state         | b(s)           | b(s_u, a_u, s_d)

Here s_u is the user's goal, a_u the user's action, s_d the dialogue history, a_m the machine action, ã_u the recognized user action, and c the ASR confidence score.
SDS-POMDP Experiments
Travel-domain test-bed simulation
Users are asked a series of questions and then finalize a ticket purchase
16 available actions (greet, ask-from, ask-to, confirm-to-x, confirm-from-x, submit-x-y, ...), plus a fail action to start over
1945 total dialogue states
SDS-POMDP Experiments
Reward function based on task completion and "dialog appropriateness":
  Confirming an item before the user has referenced it: -3
  Aborting the dialogue: -5
  Issuing the correct submit-x-y query: +10
  Issuing an incorrect submit-x-y query: -10
  All other actions: -1
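The reward scheme above as a small function; the action-name strings and the flags carried in `state` are placeholders for illustration rather than the paper's actual action set.

```python
def reward(action, state):
    """Reward shaping from the slide; `state` carries flags such as whether the user
    has already referenced the item and whether the submitted query matches the goal."""
    if action.startswith("confirm-") and not state.get("item_referenced", False):
        return -3            # confirming before the user has referenced the item
    if action == "fail":
        return -5            # aborting the dialogue
    if action.startswith("submit-"):
        return 10 if state.get("submit_matches_goal", False) else -10
    return -1                # every other action costs one turn
```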
SDS-POMDP Experiments
Finding a policy:
  Created user simulations
    Handcrafted: probabilities chosen to make the user cooperative but varied
    Model-based: estimated from real data (10,000 turns)
  Trained using Perseus [Spaan and Vlassis, 2005], a variant of point-based value iteration
  The resulting policy significantly outperformed an MDP trained for the same domain
Markov Decision Processes
MDPs and derivatives provide a rich representation for automatic dialogue planning
Probabilistic underpinnings accommodate uncertainty
Challenges with scaling to more complex scenarios
“Every discourse, even a poetic or oracular sentence, carries with it a system of rules for producing analogous things and thus an outline of methodology.” – Jacques Derrida
Learning Dialogue Structure
Learning Dialogue Structure
Approaches thus far:
  Do not address the effort needed to author dialogue states
  Are highly tuned to a specific task
  Have not addressed similarities from dialogue to dialogue or task to task
Learning Dialogue Structure [Bangalore et al., 2006]
Characterize dialogue structure
Move toward data-driven creation of SDS components
Learn models for predicting dialogue acts and sub-task structure from several corpora of task-driven human-human dialogues
Dialogue Tree Structure
Dialogue is an incremental process
Utterance Analysis and Classification
The lowest level of the dialogue structure hierarchy
A segmenter splits utterances into individual clauses
Syntactic annotation via a supertagger [Bangalore and Joshi, 1999] and Tree-Adjoining Grammar (TAG) [Joshi, 1987] operations, yielding a dependency analysis and predicate-argument structure
Utterance Analysis and Classification
Dialogue act annotation:
  DAMSL [Core, 1998] is too general; acts specific to the customer-service domain are used instead
  Dialogue acts: ask, explain, conversational, request
  Sub-types: info, order_info, product_info, hello, thanks, repeat, order_status, ...
Several corpora were used to train a dialogue act tagger
Features: speaker information; current and previous utterances (word trigrams); supertagged utterance
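A rough sketch of that feature set feeding a generic classifier (here scikit-learn's DictVectorizer plus logistic regression, which is an assumption on my part; the original work does not prescribe this learner), with placeholder supertags and toy labels.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def trigrams(tokens):
    return ["_".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def features(utterance, prev_utterance, speaker, supertags):
    """Feature dict for one clause: speaker, current/previous word trigrams, supertag sequence."""
    feats = {"speaker=" + speaker: 1}
    feats.update({"cur_tri=" + t: 1 for t in trigrams(utterance.lower().split())})
    feats.update({"prev_tri=" + t: 1 for t in trigrams(prev_utterance.lower().split())})
    feats.update({"stag=" + t: 1 for t in supertags})
    return feats

# Toy training examples; a real tagger would be trained on the annotated corpora mentioned above.
X = [features("i would like to check my order status", "how may i help you",
              "customer", ["NP", "VP"]),
     features("your order shipped yesterday", "i would like to check my order status",
              "agent", ["NP", "VP"])]
y = ["request(order_status)", "explain(order_info)"]
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000)).fit(X, y)
```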
Modeling Subtask Structure
Dialogue acts and utterances hint at the conversational context
Knowing how an utterance fits into the overall flow and sequencing of tasks can help in deciding the next action
Two approaches to deriving structure: chunking and parsing
Modeling Subtask Structure
(Figure: the same dialogue analyzed by the Chunk Model, which marks flat subtask spans, and the Parse Model, which builds a subtask tree.)
Modeling Subtask Structure
Chunk Model
  A BIO (beginning, inside, outside) sequence classifier
  Given a sequence of utterances U = u_1, u_2, ..., u_n, find the best subtask labels ST ∈ {st_B, st_I, st_O}:
  ST* = argmax_ST P(ST|U)
Parse Model
  Like incremental parsing
  Given a sequence of utterances U = u_1, u_2, ..., u_n, find the most likely plan-tree PT:
  PT* = argmax_PT P(PT|U)
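A minimal rendition of the chunk model's decoding step: a caller-supplied classifier assigns each utterance a B-/I-/O-prefixed subtask label, and contiguous spans are grouped into subtasks. The greedy per-utterance decoding is a simplification of the argmax over whole label sequences.

```python
def chunk_subtasks(utterances, classify):
    """Greedy BIO decoding.

    classify(utterance, prev_label) -> label such as 'B-order_item', 'I-order_item', or 'O'.
    Returns a list of (subtask_name, [utterances]) spans.
    """
    spans, current, prev = [], None, "O"
    for u in utterances:
        label = classify(u, prev)
        if label.startswith("B-"):            # a new subtask begins
            if current:
                spans.append(current)
            current = (label[2:], [u])
        elif label.startswith("I-") and current:
            current[1].append(u)              # continue the open subtask
        else:                                 # 'O', or an I- label with no open chunk
            if current:
                spans.append(current)
                current = None
        prev = label
    if current:
        spans.append(current)
    return spans
```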
Modeling Subtask Structure
Sequence-prediction performance of the chunk and parse models was comparable
The chunk model's efficiency is better suited to dialogue's real-time demands
The parse model's extra structure provides little additional information
Data-Driven Dialogue Characterization
Advantages:
  Does not suffer from issues of scalability or tractability
  Enables rapid prototyping and domain adaptation
Drawbacks:
  Cost of collecting and annotating large corpora
  Generality of the technique is still unknown
“We demand rigidly defined areas of doubt and uncertainty.”– Douglas Adams
Conclusions
Conclusions
Takeaways
Planning / TRAINS: a breakdown of conversational tasks and units; preconditions and effects of utterances
MDPs: a statistical, mathematically grounded mechanism for optimizing dialogue behavior
Data-driven dialogue characterization: an attempt to understand the global flow and structure of conversation
Conclusions
Drawbacks
Planning: handcrafted and rigid
MDPs: the state space quickly explodes; training data is problematic; rewards are handcrafted
Data-driven dialogue characterization: does not address the decision-making process; annotation may be expensive
Conclusions
Moving forward
Perhaps sub-optimal policies are sufficient
Can we move beyond fine-tuning individual dialogue systems?
Do the lessons learned from these approaches extend beyond task-oriented dialogue?
How does the semantic content of the utterance influence decision making?
References
[Allen et al., 1995] Allen, J. F., Schubert, L. K., Ferguson, G., Heeman, P., Hwang, C. H., Kato, T., Light, M., Martin, N. G., Miller, B. W., Poesio, M., and Traum, D. R. (1995). The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental and Theoretical AI, 7.
[Bangalore et al., 2006] Bangalore, S., Di Fabbrizio, G., and Stent, A. (2006). Learning the structure of task-driven human-human dialogs. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL), pages 201–208, Morristown, NJ, USA. Association for Computational Linguistics.
[Singh et al., 2002] Singh, S., Litman, D., Kearns, M., and Walker, M. (2002). Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research, 16:105–133.
[Traum, 1996] Traum, D. R. (1996). Conversational agency: The TRAINS-93 dialogue manager. In Susann LuperFoy, Anton Nijholt, and Gert Veldhuijzen van Zanten, editors, Proceedings of the Twente Workshop on Language Technology (TWLT-11), pages 1–11.
[Williams and Young, 2007] Williams, J. D. and Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2):393–422.
References
[Allen and Perrault, 1980] Allen, J. F. and Perrault, C. R. (1980). Analyzing intention in utterances. Artificial Intelligence, 15(3):143–178.
[Austin, 1962] Austin, J. (1962). How to Do Things with Words. Harvard University Press.
[Bangalore and Joshi, 1999] Bangalore, S. and Joshi, A. K. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 25(2):237–265.
[Bratman et al., 1988] Bratman, M. E., Israel, D., and Pollack, M. (1988). Plans and resource-bounded practical reasoning. Computational Intelligence, 4:349–355.
[Cohen and Perrault, 1979] Cohen, P. R. and Perrault, C. R. (1979). Elements of a plan-based theory of speech acts. Cognitive Science, 3:177–212.
[Core, 1998] Core, M. G. (1998). Analyzing and predicting patterns of DAMSL utterance tags. In Working Notes of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pages 18–24.
[Joshi, 1987] Joshi, A. K. (1987). Introduction to tree adjoining grammars. In Manaster-Ramer, A., editor, Mathematics of Language. John Benjamins, Amsterdam.
[Russell and Norvig, 2003] Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Pearson Education, Inc.
[Searle, 1975] Searle, J. R. (1975). A Taxonomy of Illocutionary Acts.
[Traum and Allen, 1994] Traum, D. R. and Allen, J. F. (1994). Discourse obligations in dialogue processing. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 1–8.
[Turing, 1950] Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236):433–460.
[Weizenbaum, 1966] Weizenbaum, J. (1966). ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45.
SDS-POMDP Components: Observation Function
The observation function is impossible to calculate directly from data:
p(o'|s', a) = p(ã'_u, c'|a'_u)
Instead, estimate it with p_err, the probability of a speech recognition error:
p(ã'_u | a'_u) = 1 − p_err             if ã'_u = a'_u
p(ã'_u | a'_u) = p_err / (|A_u| − 1)   if ã'_u ≠ a'_u
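That confusion model as a small function, assuming (as the formula above does) that recognition errors are spread uniformly over the |A_u| − 1 incorrect user actions.

```python
def obs_prob(recognized, actual, user_actions, p_err):
    """p(ã'_u | a'_u): probability the recognizer outputs `recognized` given the user said `actual`."""
    if recognized == actual:
        return 1.0 - p_err
    return p_err / (len(user_actions) - 1)   # errors spread uniformly over the other actions
```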