PLANNING, OPTIMIZING, AND CHARACTERIZING
Three approaches to dialogue management
Presented by Lee Becker, October 21, 2009
“The real art of conversation is not only to say the right thing at the right place but to leave unsaid the wrong thing at the tempting moment.”– Dorothy Neville
Introduction
Sample Dialogue
1 Tutor: Hi, how are you?
2 Student: Good
3 Tutor: Excellent, let's talk a bit about your experiences with science. Tell me about what you have been doing in science most recently.
4 Student: We've been learning about circuits and how to work light bulbs
5 Tutor: Circuits and working light bulbs, cool. Tell me more about circuits
The Dialogue Management Problem
Giving an appropriate response requires:
  Understanding what was said and how it fits into the overall conversational context
  Responding with intention
  Obeying social norms: turn-taking, feedback / acknowledgment
“Words are also actions, and actions are a kind of words.”
– Ralph Waldo Emerson
Dialogue as Planning
System View of Dialogue Management
(Diagram: the user's utterance flows into the Dialogue Manager, which produces the system's response.)
Planning Agents
Maintain state of the world (beliefs)
Predetermined wants (desires) specify how the world should look
Select goals (intentions)
Build and execute plans; monitor beliefs (the BDI architecture)
Blocks World Example
Init(On(A, Table) ^ On(B, Table) ^ On(C, Table) ^ Block(A) ^ Block(B) ^ Block(C) ^ Clear(A) ^ Clear(B) ^ Clear(C))
Goal(On(A, B) ^ On(B, C))
Action(Move(b, x, y))
  Precondition: On(b, x) ^ Clear(b) ^ Clear(y) ^ Block(b) ^ (b != x) ^ (b != y) ^ (x != y)
  Effect: On(b, y) ^ Clear(x) ^ ~On(b, x) ^ ~Clear(y)
Action(MoveToTable(b, x))
  Precondition: On(b, x) ^ Clear(b) ^ Block(b) ^ (b != x)
  Effect: On(b, Table) ^ Clear(x) ^ ~On(b, x)
(Figure: blocks A, B, and C start unstacked on the table; the goal is the stack A on B on C.)
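To make the operators above concrete, here is a minimal STRIPS-style sketch in Python: states are sets of ground literals, the two operators generate successor states, and a breadth-first search recovers a plan. The encoding and search are illustrative assumptions, not part of the original slides.

```python
from collections import deque

# STRIPS-style blocks world mirroring the slide's Move / MoveToTable operators.
# A state is a frozenset of ground literals such as ("On", "A", "Table") or ("Clear", "A").
BLOCKS = ["A", "B", "C"]
INIT = frozenset([("On", b, "Table") for b in BLOCKS] + [("Clear", b) for b in BLOCKS])
GOAL = {("On", "A", "B"), ("On", "B", "C")}

def successors(state):
    """Yield (action_name, next_state) for every applicable Move / MoveToTable."""
    on = {(lit[1], lit[2]) for lit in state if lit[0] == "On"}
    clear = {lit[1] for lit in state if lit[0] == "Clear"}
    for b, x in on:
        if b not in clear:                      # precondition Clear(b)
            continue
        for y in BLOCKS:                        # Move(b, x, y) onto a clear block y
            if y in (b, x) or y not in clear:
                continue
            nxt = set(state)
            nxt.discard(("On", b, x))
            nxt.discard(("Clear", y))
            nxt.add(("On", b, y))
            if x != "Table":                    # the table never needs a Clear literal
                nxt.add(("Clear", x))
            yield (f"Move({b},{x},{y})", frozenset(nxt))
        if x != "Table":                        # MoveToTable(b, x)
            nxt = set(state)
            nxt.discard(("On", b, x))
            nxt.add(("On", b, "Table"))
            nxt.add(("Clear", x))
            yield (f"MoveToTable({b},{x})", frozenset(nxt))

def plan(init, goal):
    """Breadth-first search over states; returns a shortest list of action names."""
    frontier, seen = deque([(init, [])]), {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for act, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + [act]))

print(plan(INIT, GOAL))   # ['Move(B,Table,C)', 'Move(A,Table,B)']
```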
Speech acts and planning
Planning is intuitive for physical actions
How can utterances fit into a plan?
  "Can you give me the directions to The Med?"
  "Did you take out the trash?"
  "I will try my best to be at home for dinner"
  "I name this ship the 'Queen Elizabeth'"
Speech Acts (Austin, Searle): illocutionary force; performative action
Speech Acts Meet AI
Allen, Cohen, and Perrault: speech acts expressed in terms of preconditions and effects
These relate to changes in the agents' mental states
Plans are sequences of speech acts
Example Speech Acts
REQUEST(speaker, hearer, act): effect: speaker WANT hearer DO act
INFORM(speaker, hearer, proposition): effect: KNOW(hearer, proposition)
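As a loose illustration of how these operators could be used in a plan, the sketch below encodes REQUEST and INFORM as functions that add the slide's effect literals to a mental-state set; the literal encoding and the omission of preconditions are simplifying assumptions, not the original formalism.

```python
# Mental states as sets of literals; each speech act adds its effect literals.
# Preconditions (e.g. that the speaker actually knows the proposition) are omitted for brevity.

def request(speaker, hearer, act, state):
    """REQUEST(speaker, hearer, act) -- effect from the slide: speaker WANT hearer DO act."""
    return state | {("WANT", speaker, ("DO", hearer, act))}

def inform(speaker, hearer, proposition, state):
    """INFORM(speaker, hearer, proposition) -- effect from the slide: KNOW(hearer, proposition)."""
    return state | {("KNOW", hearer, proposition)}

# A two-step "plan" of speech acts: inform the student, then request an explanation.
state = set()
state = inform("tutor", "student", "circuits_require_a_closed_loop", state)
state = request("tutor", "student", "explain_circuits", state)
print(("KNOW", "student", "circuits_require_a_closed_loop") in state)   # True
```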
TRAINS
A descendant of the Allen, Cohen, and Perrault BDI+Speech Acts tradition
A conversational agent for logistics and planning
Users converse with a “Manager” to develop a plan of action in the TRAINS domain.
Sample TRAINS Scenario
User: (1) We have to make OJ. (2) There are oranges at I. (3) And an OJ factory at B. (4) Engine E3 is scheduled to arrive at I at 3PM. (5) Shall we ship the oranges?
System: (6) Yes. (7) Shall I start loading the oranges in the empty car at I?
User: (8) Yes, and we'll have E3 pick it up. (9) OK?
System: (10) OK.
(Map of the TRAINS domain: City B with the OJ factory, City G with the banana source, City I with the orange source and an empty car, and Engine E3.)
(Diagram: the TRAINS dialogue manager combines a deliberative agent with a communicative agent.)
Discourse Obligations
BDI does not account for what compels one speaker to answer another
Two Strangers Example:
  A: Do you know the time?
  B: Sure. It's half past two.
Answering Person A's question does not help Person B attain his own goals.
Discourse Obligations
Obligations, like speech acts, produce observable effects on the speaker.

Source of obligation → Obliged action
S1 Accept or Promise A → S1 achieve A
S1 Request A → S2 address Request: accept or reject A
S1 Y/N-Question whether P → S2 Answer-if P
S1 Wh-Question about x → S2 Inform-ref x
Utterance not understood or incorrect → Repair utterance
S1 Initiate discourse unit → S2 acknowledge discourse unit
Request Repair of P → Repair P
Request Acknowledgement of P → Acknowledge P
Discourse Obligations
Inherent tension between Obligations and Goals
Approaches:
  Perform all obligated actions
  Perform only actions that will lead to a desired state
  A blend of the other two approaches
TRAINS Discourse Obligations
loop
  if system has obligations then
    address obligations
  else if system has performable intentions then
    perform action
  else
    deliberate on goals
  end if
end loop
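A small Python rendering of that control loop, assuming simple queues for obligations, intentions, and goals; the class and handler names are placeholders, not the TRAINS-93 implementation.

```python
from collections import deque

class DialogueAgent:
    """Obligation-first deliberation loop in the spirit of the TRAINS slide."""

    def __init__(self):
        self.obligations = deque()   # e.g. pending requests to address, pushed by interpretation
        self.intentions = deque()    # performable actions the agent has already adopted
        self.goals = []              # longer-term desires to deliberate over

    def step(self):
        if self.obligations:                        # obligations take priority...
            return self.address(self.obligations.popleft())
        if self.intentions:                         # ...then already-adopted intentions...
            return self.perform(self.intentions.popleft())
        return self.deliberate()                    # ...otherwise deliberate on goals

    def address(self, obligation):
        return ("address", obligation)

    def perform(self, intention):
        return ("perform", intention)

    def deliberate(self):
        # Pick a goal and turn it into new intentions (stubbed out in this sketch).
        return ("deliberate", self.goals[:1])
```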
TRAINS Discourse Obligations
Ensures system cooperation even if the response is in conflict with the user's goals
Aids in developing mixed initiative:
  Goal-driven actions → speaker-led initiative
  Obligation-driven actions → other-led initiative
Mutual Belief and Grounding
Conversational agents do not act in isolation
Mental states should account for both private beliefs and shared beliefs
In TRAINS, shared belief is needed for:
  Modeling the domain plan under construction
  Common understanding
Mutual Belief and Grounding: Extended Conversation Acts

Act Type         | Description                                         | Sample Acts
Turn-Taking      | Maintain, release, or take the turn in the dialogue | take-turn, keep-turn
Grounding        | Establish shared knowledge about the dialogue       | repair, acknowledge
Core Speech Acts | The original illocutionary acts                     | inform, yes-no-question, accept, request
Argumentation    | Characterize the relationship between utterances    | elaborate, Q&A
The TRAINS Approach
Attempts to capture the processes underlying dialogue via speech acts, discourse obligations, and mutual belief / grounding
Potentially rigid: rules and logic are handcrafted
“When one admits that nothing is certain, one must, I think, also admit that some things are much more nearly certain than others.” – Bertrand Russell
Dialogue as a Markov Decision Process
Flexible Dialogue
Qualities of robust dialogue: flexible conversation flow; adapted to users' preferences and skill levels; resilient to errors in understanding
The dialogue author's dilemma: robustness vs. effort
Other issues: noisy channels (ASR, NLU); evaluation
What is an optimal decision?
Dialogue with uncertainty
Markov Decision Process (MDP)
A probabilistic framework
Able to model planning and decision-making over time
Based on the Markov assumption: future states depend only on the current state and are independent of all earlier states
Markov Decision Processes
Markov chains with choice!
Markov Decision Processes
An agent-based process defined by a 4-tuple:
S: a set of states describing the agent's world
A: a set of actions the agent may take
T: a set of transition probabilities, P_a(s, s') = P(s'|s, a) = P(s_{t+1}|s_t, a_t)
R: a set of rewards, where r(s, a) is the expected immediate reward the agent receives for taking action a in state s
Markov Decision Processes
Policy function π(s): a mapping from states to actions
The optimal policy π*(s) yields the highest possible cumulative reward
An MDP with a fixed policy is a Markov chain

Rewards
Cumulative discounted reward:
Q = Σ_{t=0}^{∞} λ^t · r(s_t, a_t)
Optimal action-value recurrence:
Q(s, a) = r(s, a) + λ · Σ_{s'} P(s'|s, a) · max_{a'} Q(s', a')
Solving an MDP
Approaches: value iteration, policy iteration, Q-learning
Ideally: encode the state space with relevant features and rewards, and compute state transition and reward probabilities directly from a corpus of annotated dialogues
In practice: reduce the state space and do random exploration, or simulate a user to produce a corpus
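As a concrete, if toy, example of the "ideal" route, the sketch below runs value iteration over a hand-invented transition and reward table for a few dialogue-like states; the states, probabilities, and rewards are made up for illustration, not drawn from any corpus.

```python
# Value iteration: V(s) <- max_a [ r(s,a) + lambda * sum_s' P(s'|s,a) V(s') ]
STATES = ["greet", "ask_slot", "confirm", "done"]
ACTIONS = ["system_init", "user_init", "confirm_it"]
LAMBDA = 0.9

# P[(s, a)] = list of (next_state, probability); R[(s, a)] = immediate reward.  Toy numbers.
P = {("greet", "system_init"): [("ask_slot", 1.0)],
     ("greet", "user_init"): [("ask_slot", 0.7), ("greet", 0.3)],
     ("ask_slot", "system_init"): [("confirm", 0.8), ("ask_slot", 0.2)],
     ("ask_slot", "user_init"): [("confirm", 0.5), ("ask_slot", 0.5)],
     ("confirm", "confirm_it"): [("done", 0.9), ("ask_slot", 0.1)]}
R = {key: -1.0 for key in P}           # every turn costs -1 ...
R[("confirm", "confirm_it")] = 5.0     # ... except successfully closing the dialogue

def value_iteration(theta=1e-6):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            q_values = [R[(s, a)] + LAMBDA * sum(p * V[s2] for s2, p in P[(s, a)])
                        for a in ACTIONS if (s, a) in P]
            if not q_values:           # terminal state with no actions
                continue
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

print(value_iteration())
```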
Reinforcement Learning for Dialogue Policy Design: the NJFun system [Singh et al., 2002]
NJFun Sample Dialogue
S1: Welcome to NJFun. How may I help you?
U1: I'd like to find um winetasting in Lambertville in the morning. (ASR: I'd like to find out wineries the in the Lambertville in the morning)
S2: Did you say you are interested in Lambertville?
U2: Yes.
S3: Did you say you want to go in the morning?
U3: Yes.
S4: I found a winery near Lambertville that is open in the morning. It is the Poor Richard's Winery in Lambertville. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'.
U4: Good.
S5: Goodbye and have a nice day.
NJFun – State Spaces
Full state space: encode everything, but state explosion quickly makes optimization intractable
State-space estimator: encode just enough information to learn good decisions
NJFun – State Space Estimator
Limited policy optimization for two types of decisions:
Initiative – direct vs. open-ended
  System initiative: "Please say the name of the town or city that you are interested in."
  User initiative: "Please give me more information."
Confirmation – verify vs. assume
  Verify: "Did you say you are interested in <location>?"
NJFun State Features & Values

Feature                    | Values        | Explanation
Greet (G)                  | 0, 1, 2       | Whether the system has greeted the user
Attribute (A)              | 1, 2, 3, 4    | Which attribute is being worked on
Confidence / Confirmed (C) | 0, 1, 2, 3, 4 | 0, 1, 2 for low, medium, and high ASR confidence; 3, 4 for explicitly confirmed and disconfirmed
Value (V)                  | 0, 1          | Whether a value has been obtained for the current attribute
Tries (T)                  | 0, 1, 2       | How many times the current attribute has been asked
Grammar (M)                | 0, 1          | Whether a non-restrictive or restrictive grammar was used
History (H)                | 0, 1          | Whether there was trouble on any previous attribute
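One possible encoding of this seven-feature state vector as a Python value object; the field names and ranges follow the table above, but the class itself is purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NJFunState:
    """State vector from the table: (Greet, Attribute, Confidence/Confirmed, Value, Tries, Grammar, History)."""
    greet: int       # 0-2: whether the system has greeted the user
    attribute: int   # 1-4: which attribute is being worked on
    confidence: int  # 0-2 low/medium/high ASR confidence; 3/4 confirmed/disconfirmed
    value: int       # 0/1: value obtained for the current attribute
    tries: int       # 0-2: times the current attribute has been asked
    grammar: int     # 0/1: non-restrictive vs. restrictive grammar
    history: int     # 0/1: trouble on any previous attribute

# Example: working on attribute 2, with a medium-confidence, unconfirmed value on the first try.
s = NJFunState(greet=1, attribute=2, confidence=1, value=1, tries=0, grammar=0, history=0)
```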
State Space Estimator
The features yield 62 possible dialogue states
42 of them are choice states with 2 possible actions each (confirm vs. not confirm, or system vs. user initiative)
In total, 2^42 distinct dialogue policies
Finding an Optimal Policy
Gathering training data:
  New system built with a randomized dialogue policy
  Deployed to 54 users, each assigned 6 tasks; 311 dialogues in total
Reward measure: binary task completion
  +1: dialogues that queried for the exact set of attributes (activity type, location, time of day, etc.)
  -1: otherwise
The resulting corpus was used for reinforcement learning
Finding an Optimal Policy
RL learned policy:
  Initiative: begin with user initiative; back off to mixed or system initiative
  Confirmation: in general, confirm at lower ASR confidence values
  Other features describe more complex interventions
Evaluating the Optimal Policy
System with the learned policy tested on an additional 21 users; 124 test dialogues
Did not significantly outperform the baseline on the binary completion measure (p = 0.059)
Statistically significant improvements using the weak completion and ASR measures
Limited Observability
MDPs assume the world is fully observable
However:
  Not all errors or states are directly observable
  Undetected errors may propagate
  Evidence may not indicate the error
Limited Observability
Turn / Utterance [ASR output]:
S1: Are you a registered user?
U2: No, I'm not [NO I'M NOT]
S3: What is your full name?
U4: <Username> [ATHENS IN AKRON]
S5: There is an Athens in Georgia and in Greece. What destination did you want?
U6: Neither [NEITHER]
S7: I'm sorry, I did not catch that. There is an Athens in Georgia and in Greece. Which destination do you want?
U8: Georgia [GEORGIA]
S9: A flight from Athens... Where do you want to go?
U10: Start over [START OVER]
Partially Observable Markov Decision Processes (POMDPs)
Intuition:
  Maintain parallel hypotheses of what was said
  Backpedal or switch strategies when a hypothesis becomes sufficiently unlikely
POMDP Example
(Figure: each turn shows the POMDP belief state b as a distribution over pizza sizes {S, M, L}, alongside the single order hypothesis kept by a traditional method.)

S: How can I help you?
U: A small pepperoni pizza. [A small pepperoni pizza]
S: Ok, what toppings?
U: A small pepperoni [A small pepperoni]
S: And what type of crust?
U: Uh just normal [large normal]

Traditional method: Order {size: <null>} → {size: small} → {size: small} → {size: large?}
POMDP: the belief over sizes concentrates on "small" after the first two user turns and shifts only partially toward "large" after the final misrecognition.
A Comparison of Markov Models

                            | No control over transitions | Control over transitions (choice of action)
States fully observable     | Markov chain                | MDP
States partially observable | HMM                         | POMDP
Table courtesy of http://www.cassandra.org/pomdp/pomdp-faq.shtml
POMDPs
Extends the MDP model with:
  O: a set of observations the agent can receive about the world
  Z: observation probabilities
  b(s): the belief state, the probability of being in state s
The agent is not in a fixed known state; instead it maintains a probability distribution over all possible states
POMDPs
Belief monitoring: shift probability mass to match observations
The optimal action depends only on the agent's current belief state

Belief update:
b'(s') = p(o'|s', a) · Σ_{s∈S} p(s'|s, a) · b(s) / p(o'|a, b)
       = k · p(o'|s', a) · Σ_{s∈S} p(s'|s, a) · b(s)
where k = 1 / p(o'|a, b) is a normalizing constant.
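A direct, tabular translation of that update; the dictionary-based transition and observation models are assumptions about representation, and the normalizing constant k is recovered by renormalizing at the end.

```python
def belief_update(b, a, o, states, T, Z):
    """b'(s') ∝ Z(o | s', a) * sum_s T(s' | s, a) * b(s), then renormalize.

    b:  dict state -> probability (current belief)
    T:  dict (s, a, s_next) -> transition probability
    Z:  dict (s_next, a, o) -> observation probability
    """
    b_new = {}
    for s_next in states:
        prior = sum(T.get((s, a, s_next), 0.0) * b[s] for s in states)
        b_new[s_next] = Z.get((s_next, a, o), 0.0) * prior
    norm = sum(b_new.values())            # equals p(o | a, b)
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in b_new.items()}
```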
POMDPs
(Influence diagram: states S, actions A, observations O, and rewards R, unrolled over time steps.)
POMDPs for Spoken Dialogue Systems
SDS-POMDP [Williams and Young, 2007]
Claim: POMDPs perform better for spoken dialogue systems because they maintain parallel dialogue hypotheses and can incorporate ASR confidence scores directly into the belief-state update

SDS-POMDP Architecture
(Diagram legend: A = the user's action, Y = its audio signal, Ã = the recognized action, C = the ASR confidence score.)
SDS-POMDP Components

                     | Standard POMDP | SDS-POMDP
State set            | S              | (s_u, a_u, s_d)
Observation set      | O              | (ã_u, c)
Action set           | A              | (a_m)
Transition function  | p(s'|s,a)      | p(s'_u|s_u,a_m) · p(a'_u|s'_u,a_m) · p(s'_d|a'_u,s_d,a_m)
Observation function | p(o'|s',a)     | p(ã'_u, c'|a'_u)
Reward function      | r(s,a)         | r(s_u, a_u, s_d, a_m)
Belief state         | b(s)           | b(s_u, a_u, s_d)

Here s_u is the user's goal, a_u the user's action, s_d the dialogue history, a_m the machine action, ã_u the recognized user action, and c the ASR confidence score.
SDS-POMDP Experiments
Travel-domain test-bed simulation
Users are asked a series of questions and then finalize a ticket purchase
16 available actions (greet, ask-from, ask-to, confirm-to-x, confirm-from-x, submit-x-y, ...), plus a fail action to start over
1945 total dialogue states
SDS-POMDP Experiments
Reward function based on task completion and "dialog appropriateness":
  Confirming an item before the user has referenced it: -3
  Aborting the dialogue: -5
  Issuing the correct submit-x-y query: +10
  Issuing an incorrect submit-x-y query: -10
  All other actions: -1
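The reward scheme above as a small function; the action-name strings and the flags carried in `state` are placeholders for illustration rather than the paper's actual action set.

```python
def reward(action, state):
    """Reward shaping from the slide; `state` carries flags such as whether the user
    has already referenced the item and whether the submitted query matches the goal."""
    if action.startswith("confirm-") and not state.get("item_referenced", False):
        return -3            # confirming before the user has referenced the item
    if action == "fail":
        return -5            # aborting the dialogue
    if action.startswith("submit-"):
        return 10 if state.get("submit_matches_goal", False) else -10
    return -1                # every other action costs one turn
```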
SDS-POMDP Experiments
Finding a policy:
  Created user simulations
    Handcrafted: probabilities chosen to make the user cooperative but varied
    Model-based: estimated from real data (10,000 turns)
  Trained using Perseus [Spaan and Vlassis, 2005], a variant of point-based value iteration
  The resulting policy significantly outperformed an MDP trained for the same domain
Markov Decision Processes
MDPs and derivatives provide a rich representation for automatic dialogue planning
Probabilistic underpinnings accommodate uncertainty
Challenges with scaling to more complex scenarios
“Every discourse, even a poetic or oracular sentence, carries with it a system of rules for producing analogous things and thus an outline of methodology.” – Jacques Derrida
Learning Dialogue Structure
Learning Dialogue Structure
Approaches thus far:
  Do not address the effort needed to author dialogue states
  Are highly tuned to a specific task
  Have not addressed similarities from dialogue to dialogue or task to task
Learning Dialogue Structure [Bangalore et al., 2006]
Characterize dialogue structure
Move toward data-driven creation of SDS components
Learn models for predicting dialogue acts and sub-task structure from several corpora of task-driven human-human dialogues
Dialogue Tree Structure
Dialogue is an incremental process
Utterance Analysis and Classification
The lowest level of the dialogue structure hierarchy
A segmenter splits utterances into individual clauses
Syntactic annotation via a supertagger [Bangalore and Joshi, 1999] and Tree-Adjoining Grammar (TAG) [Joshi, 1987] operations, yielding a dependency analysis and predicate-argument structure
Utterance Analysis and Classification
Dialogue act annotation:
  DAMSL [Core, 1998] is too general; acts specific to the customer-service domain are used instead
  Dialogue acts: ask, explain, conversational, request
  Sub-types: info, order_info, product_info, hello, thanks, repeat, order_status, ...
Several corpora were used to train a dialogue act tagger
Features: speaker information; current and previous utterances (word trigrams); supertagged utterance
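A rough sketch of that feature set feeding a generic classifier (here scikit-learn's DictVectorizer plus logistic regression, which is an assumption on my part; the original work does not prescribe this learner), with placeholder supertags and toy labels.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def trigrams(tokens):
    return ["_".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def features(utterance, prev_utterance, speaker, supertags):
    """Feature dict for one clause: speaker, current/previous word trigrams, supertag sequence."""
    feats = {"speaker=" + speaker: 1}
    feats.update({"cur_tri=" + t: 1 for t in trigrams(utterance.lower().split())})
    feats.update({"prev_tri=" + t: 1 for t in trigrams(prev_utterance.lower().split())})
    feats.update({"stag=" + t: 1 for t in supertags})
    return feats

# Toy training examples; a real tagger would be trained on the annotated corpora mentioned above.
X = [features("i would like to check my order status", "how may i help you",
              "customer", ["NP", "VP"]),
     features("your order shipped yesterday", "i would like to check my order status",
              "agent", ["NP", "VP"])]
y = ["request(order_status)", "explain(order_info)"]
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000)).fit(X, y)
```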
Modeling Subtask Structure
Dialogue acts and utterances hint at the conversational context
Knowing how an utterance fits into the overall flow and sequencing of tasks can help in deciding the next action
Two approaches to deriving structure: chunking and parsing
Modeling Subtask Structure
(Figure: the same dialogue analyzed by the Chunk Model, which marks flat subtask spans, and the Parse Model, which builds a subtask tree.)
Modeling Subtask Structure
Chunk Model
  A BIO (beginning, inside, outside) sequence classifier
  Given a sequence of utterances U = u_1, u_2, ..., u_n, find the best subtask labels ST ∈ {st_B, st_I, st_O}:
  ST* = argmax_ST P(ST|U)
Parse Model
  Like incremental parsing
  Given a sequence of utterances U = u_1, u_2, ..., u_n, find the most likely plan-tree PT:
  PT* = argmax_PT P(PT|U)
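A minimal rendition of the chunk model's decoding step: a caller-supplied classifier assigns each utterance a B-/I-/O-prefixed subtask label, and contiguous spans are grouped into subtasks. The greedy per-utterance decoding is a simplification of the argmax over whole label sequences.

```python
def chunk_subtasks(utterances, classify):
    """Greedy BIO decoding.

    classify(utterance, prev_label) -> label such as 'B-order_item', 'I-order_item', or 'O'.
    Returns a list of (subtask_name, [utterances]) spans.
    """
    spans, current, prev = [], None, "O"
    for u in utterances:
        label = classify(u, prev)
        if label.startswith("B-"):            # a new subtask begins
            if current:
                spans.append(current)
            current = (label[2:], [u])
        elif label.startswith("I-") and current:
            current[1].append(u)              # continue the open subtask
        else:                                 # 'O', or an I- label with no open chunk
            if current:
                spans.append(current)
                current = None
        prev = label
    if current:
        spans.append(current)
    return spans
```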
Modeling Subtask Structure
Sequence-prediction performance of the chunk and parse models was comparable
The chunk model's efficiency is better suited to dialogue's real-time demands
The parse model's extra structure provides little additional information
Data-Driven Dialogue Characterization
Advantages:
  Does not suffer from issues of scalability or tractability
  Enables rapid prototyping and domain adaptation
Drawbacks:
  Cost of collecting and annotating large corpora
  Generality of the technique is still unknown
“We demand rigidly defined areas of doubt and uncertainty.”– Douglas Adams
Conclusions
Conclusions
Takeaways
Planning / TRAINS: a breakdown of conversational tasks and units; preconditions and effects of utterances
MDPs: a statistical, mathematically grounded mechanism for optimizing dialogue behavior
Data-driven dialogue characterization: an attempt to understand the global flow and structure of conversation
Conclusions
Drawbacks
Planning: handcrafted and rigid
MDPs: the state space quickly explodes; training data is problematic; rewards are handcrafted
Data-driven dialogue characterization: does not address the decision-making process; annotation may be expensive
Conclusions
Moving forward
Perhaps sub-optimal policies are sufficient
Can we move beyond fine-tuning individual dialogue systems?
Do the lessons learned from these approaches extend beyond task-oriented dialogue?
How does the semantic content of the utterance influence decision making?
References
[Allen et al., 1995] Allen, J. F., Schubert, L. K., Ferguson, G., Heeman, P., Hwang, C. H., Kato, T., Light, M., Martin, N. G., Miller, B. W., Poesio, M., and Traum, D. R. (1995). The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental and Theoretical AI, 7.
[Bangalore et al., 2006] Bangalore, S., Di Fabbrizio, G., and Stent, A. (2006). Learning the structure of task-driven human-human dialogs. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL), pages 201–208, Morristown, NJ, USA. Association for Computational Linguistics.
[Singh et al., 2002] Singh, S., Litman, D., Kearns, M., and Walker, M. (2002). Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research, 16:105–133.
[Traum, 1996] Traum, D. R. (1996). Conversational agency: The TRAINS-93 dialogue manager. In Susann LuperFoy, Anton Nijholt, and Gert Veldhuijzen van Zanten, editors, Proceedings of the Twente Workshop on Language Technology (TWLT-11), pages 1–11.
[Williams and Young, 2007] Williams, J. D. and Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2):393–422.
References
[Allen and Perrault, 1980] Allen, J. F. and Perrault, C. R. (1980). Analyzing intention in utterances. Artificial Intelligence, 15(3):143–178.
[Austin, 1962] Austin, J. (1962). How to Do Things with Words. Harvard University Press.
[Bangalore and Joshi, 1999] Bangalore, S. and Joshi, A. K. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 25(2):237–265.
[Bratman et al., 1988] Bratman, M. E., Israel, D., and Pollack, M. (1988). Plans and resource-bounded practical reasoning. Computational Intelligence, 4:349–355.
[Cohen and Perrault, 1979] Cohen, P. R. and Perrault, C. R. (1979). Elements of a plan-based theory of speech acts. Cognitive Science, 3:177–212.
[Core, 1998] Core, M. G. (1998). Analyzing and predicting patterns of DAMSL utterance tags. In Working Notes of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pages 18–24.
[Joshi, 1987] Joshi, A. K. (1987). Introduction to tree adjoining grammars. In Manaster-Ramer, A., editor, Mathematics of Language. John Benjamins, Amsterdam.
[Russell and Norvig, 2003] Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Pearson Education, Inc.
[Searle, 1975] Searle, J. R. (1975). A Taxonomy of Illocutionary Acts.
[Traum and Allen, 1994] Traum, D. R. and Allen, J. F. (1994). Discourse obligations in dialogue processing. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 1–8.
[Turing, 1950] Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236):433–460.
[Weizenbaum, 1966] Weizenbaum, J. (1966). ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45.
SDS-POMDP Components: Observation Function
The observation function is impossible to calculate directly from data:
p(o'|s', a) = p(ã'_u, c'|a'_u)
Instead, estimate it with p_err, the probability of a speech recognition error:
p(ã'_u | a'_u) = 1 − p_err             if ã'_u = a'_u
p(ã'_u | a'_u) = p_err / (|A_u| − 1)   if ã'_u ≠ a'_u
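That confusion model as a small function, assuming (as the formula above does) that recognition errors are spread uniformly over the |A_u| − 1 incorrect user actions.

```python
def obs_prob(recognized, actual, user_actions, p_err):
    """p(ã'_u | a'_u): probability the recognizer outputs `recognized` given the user said `actual`."""
    if recognized == actual:
        return 1.0 - p_err
    return p_err / (len(user_actions) - 1)   # errors spread uniformly over the other actions
```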