Emergence of Gricean Maxims from Multi-agent Decision Theory
![Page 1: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/1.jpg)
Emergence of Gricean Maxims from Multi-agent Decision Theory
Adam Vogel
Stanford NLP Group
Joint work with Max Bodoia, Chris Potts, and Dan Jurafsky
![Page 2: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/2.jpg)
Decision-Theoretic Pragmatics
Gricean cooperative principle:
Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.
![Page 3: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/3.jpg)
Decision-Theoretic Pragmatics
Gricean Maxims:
• Be truthful: speak with evidence
• Be relevant: speak in accordance with goals
• Be clear: be brief and avoid ambiguity
• Be informative: say exactly as much as needed
![Page 4: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/4.jpg)
Emergence of Gricean Maxims
Co-operative principle
• Be truthful
• Be relevant
• Be clear
• Be informative
???
Approach: Operationalize the co-operative principle
Tool: Multi-agent decision theory
Goal: Maxims emerge from rational behavior
Joint utility Rationality
![Page 5: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/5.jpg)
Related Work
• One-shot reference tasks
– Generating spatial referring expressions [Golland et al. 2010]
– Predicting pragmatic reasoning in language games [Stiller et al. 2011]
• Interpreting natural language instructions
– Learning to read help guides [Branavan et al. 2009]
– Learning to follow navigational directions [Vogel and Jurafsky 2010] [Artzi and Zettlemoyer 2013] [Chen and Mooney 2011] [Tellex et al. 2011]
![Page 6: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/6.jpg)
CARDS Task
![Page 7: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/7.jpg)
Outline
• Spatial semantics
• ListenerBot: single-agent advice taker
– Can accept advice, never gives it
• DialogBot: multi-agent decision maker
– Gives advice by tracking the other player’s beliefs
![Page 8: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/8.jpg)
Spatial Semantics
“in the top left of the board” → BOARD(top;left)
“on the left side” → BOARD(left)
“right in the middle” → BOARD(middle)
MaxEnt Classifier w/ Bag of Words
Estimated from Corpus Data
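A MaxEnt classifier over bag-of-words features is a multinomial logistic regression, so the mapping from utterances to BOARD(...) meanings can be sketched in a few lines. The version below is only an illustration: the six training utterances are invented (the slide's model was estimated from corpus data), the optimizer is plain stochastic gradient ascent, and the names `featurize`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter

# Toy training data (hypothetical; the talk's model was estimated from a corpus).
TRAIN = [
    ("in the top left of the board", "BOARD(top;left)"),
    ("top left corner", "BOARD(top;left)"),
    ("on the left side", "BOARD(left)"),
    ("left side of the board", "BOARD(left)"),
    ("right in the middle", "BOARD(middle)"),
    ("in the middle of the board", "BOARD(middle)"),
]

def featurize(utterance):
    """Bag-of-words features: word -> count."""
    return Counter(utterance.lower().split())

def predict_proba(weights, labels, feats):
    """Softmax over per-label linear scores (the MaxEnt distribution)."""
    scores = {
        y: sum(weights.get((y, w), 0.0) * c for w, c in feats.items())
        for y in labels
    }
    top = max(scores.values())                      # subtract max for stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(data, epochs=300, lr=0.1):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    labels = sorted({y for _, y in data})
    weights = {}
    for _ in range(epochs):
        for text, gold in data:
            feats = featurize(text)
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                err = (1.0 if y == gold else 0.0) - probs[y]
                for w, c in feats.items():
                    weights[(y, w)] = weights.get((y, w), 0.0) + lr * err * c
    return weights, labels

def classify(weights, labels, utterance):
    """Return the most probable meaning for an utterance."""
    probs = predict_proba(weights, labels, featurize(utterance))
    return max(probs, key=probs.get)

weights, labels = train(TRAIN)
print(classify(weights, labels, "right in the middle"))
```

The probabilities from `predict_proba`, rather than the single best label, are what a listener model would need if messages are treated as noisy observations.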
![Page 9: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/9.jpg)
Complexity Ahoy
• Approximate decision making only feasible for problems with <10k states!
[Figure: log-scale axis with tick marks from 100 to 10,000,000,000]
![Page 10: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/10.jpg)
Semantic State Representation
• Divide board into 16 regions
• Cluster squares based on meanings
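To see why a coarser representation helps, here is a minimal sketch that maps board squares to 16 regions and compares state counts. It assumes a uniform 4x4 partition and a 20x20 board, both of which are hypothetical: the slide clusters squares by meaning, which this simple grid does not reproduce, and the function names are invented for illustration.

```python
def region_of(row, col, n_rows, n_cols, grid=4):
    """Map a board square to one of grid*grid regions, numbered row by row.

    Assumption: a uniform partition, not the meaning-based clustering
    mentioned on the slide.
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("square is off the board")
    return (row * grid // n_rows) * grid + (col * grid // n_cols)

def state_count(n_locations):
    """Number of (player location, card location) pairs."""
    return n_locations * n_locations

# Hypothetical 20x20 board: raw squares versus 16 regions.
raw_states = state_count(20 * 20)      # 160000
region_states = state_count(16)        # 256
print(raw_states, region_states)
```

Under these assumptions the region-based state space falls well below the 10k-state limit mentioned earlier, while the raw one does not.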
![Page 11: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/11.jpg)
Outline
• Spatial semantics
• ListenerBot: single-agent advice taker
– Can accept advice, never gives it
• DialogBot: multi-agent decision maker
– Gives advice by tracking the other player’s beliefs
![Page 12: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/12.jpg)
Partially Observable Markov Decision Process (POMDP)
Or: An HMM you get to drive!
![Page 13: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/13.jpg)
State space S: hidden configuration of the world
• Location of card
• Location of player
![Page 14: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/14.jpg)
Action space A: what we can do
• Move around the board
• Search for the card
![Page 15: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/15.jpg)
Observations: sensor information + messages
• Whether we are on top of the card
• BOARD(right;top) etc.
![Page 16: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/16.jpg)
Observation Model: sensor model
• We see the card if we search for it and are on it
• For messages
![Page 17: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/17.jpg)
Reward R(s,a): value of an action in a state
• Large reward if in the same square as the card
• Every action adds small negative reward
![Page 18: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/18.jpg)
Transition T(s’|a,s): dynamics of the world
• Travel actions change player location
• Card never moves
![Page 19: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/19.jpg)
Initial belief state: distribution over S
• Uniform distribution over card location
• Known initial player location
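The components listed on the preceding slides (states, actions, reward, transition, initial belief) can be written down for a toy version of the task. This is a sketch under stated assumptions, not the talk's implementation: the 2x2 board, the action names, the deterministic moves, and the reward values (+10 and -1) are all hypothetical.

```python
from itertools import product

REGIONS = [(r, c) for r in range(2) for c in range(2)]   # hypothetical 2x2 board
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
ACTIONS = list(MOVES) + ["SEARCH"]

# A state is (player location, card location); the card location is hidden.
STATES = list(product(REGIONS, REGIONS))

def transition(state, action):
    """Deterministic T(s'|a,s): moves change the player location, the card never moves."""
    player, card = state
    if action in MOVES:
        dr, dc = MOVES[action]
        target = (player[0] + dr, player[1] + dc)
        if target in REGIONS:          # moving off the board leaves the player in place
            player = target
    return (player, card)

def reward(state, action, found=10.0, step_cost=-1.0):
    """R(s,a): every action costs a little; sharing a square with the card pays off.

    The magnitudes are assumed values, not taken from the talk.
    """
    player, card = state
    return step_cost + (found if player == card else 0.0)

def initial_belief(start):
    """Known player location, uniform distribution over card locations."""
    return {(start, card): 1.0 / len(REGIONS) for card in REGIONS}

b0 = initial_belief((0, 0))
print(len(STATES), len(b0))
```

Keeping the player location inside the state, even though it is known, keeps the belief a distribution over S as on the slide.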
![Page 20: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/20.jpg)
Belief Update:
Action: SEARCH
Observation: (Card not here, )
![Page 21: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/21.jpg)
Belief Update:
![Page 22: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/22.jpg)
Belief Update:
Action: SEARCH
Observation: (Card not here, “left side”)
![Page 23: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/23.jpg)
Belief Update:
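The two updates above follow the usual Bayes filter: the new belief is proportional to the old belief times the likelihood of the observation. The sketch below tracks only the card location (the player location is known and SEARCH does not move the player). The 2x2 board, the 10% noise level for messages, and all function names are assumptions made for illustration.

```python
SQUARES = [(r, c) for r in range(2) for c in range(2)]   # hypothetical 2x2 board

def normalize(dist):
    total = sum(dist.values())
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {k: v / total for k, v in dist.items()}

def sensor_likelihood(card_here, player, card):
    """After SEARCH we see the card exactly when we stand on it."""
    return 1.0 if card_here == (player == card) else 0.0

def message_likelihood(message, card, noise=0.1):
    """Hypothetical P(message | card location); None means nothing was said."""
    if message is None:
        return 1.0                     # silence is treated as uninformative here
    if message == "left side":
        return 1.0 - noise if card[1] == 0 else noise
    raise ValueError(f"unknown message: {message!r}")

def update_after_search(belief, player, card_here, message):
    """b'(card) is proportional to P(sensor | card) * P(message | card) * b(card)."""
    unnormalized = {
        card: p
        * sensor_likelihood(card_here, player, card)
        * message_likelihood(message, card)
        for card, p in belief.items()
    }
    return normalize(unnormalized)

uniform = {sq: 1.0 / len(SQUARES) for sq in SQUARES}
# Search the top-right square, find nothing, hear nothing.
b1 = update_after_search(uniform, player=(0, 1), card_here=False, message=None)
# Same search result, but the other player says "left side".
b2 = update_after_search(b1, player=(0, 1), card_here=False, message="left side")
print(b1)
print(b2)
```

In the first update the searched square drops to zero and the rest share the mass equally; in the second, the message shifts most of the remaining mass to the left column.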
![Page 24: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/24.jpg)
Decision Making
Choose policy
Goal: Maximize expected reward
Solution: Perseus, an approximate value iteration algorithm [Spaan and Vlassis 2005]
Computational complexity: PSPACE!
Immediate reward + Expected future reward
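The "immediate reward plus expected future reward" decomposition matches the standard Bellman optimality equation over belief states. The formula itself did not survive in the transcript, so what follows is the textbook form rather than a copy of the slide. It reuses R(s,a) and T(s'|a,s) from the earlier slides; the discount factor \gamma, the observation model O(o | s', a), and the notation b^{a,o} for the updated belief are assumed symbols.

```latex
\begin{align*}
V^{*}(b) &= \max_{a \in A} \Big[ \underbrace{\sum_{s \in S} b(s)\, R(s,a)}_{\text{immediate reward}}
  + \underbrace{\gamma \sum_{o} P(o \mid b, a)\, V^{*}\!\big(b^{a,o}\big)}_{\text{expected future reward}} \Big] \\
P(o \mid b, a) &= \sum_{s' \in S} O(o \mid s', a) \sum_{s \in S} T(s' \mid a, s)\, b(s) \\
b^{a,o}(s') &= \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid a, s)\, b(s)}{P(o \mid b, a)}
\end{align*}
```

A policy then picks, for each belief b, an action that attains the maximum in the first line.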
![Page 25: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/25.jpg)
Outline
• Spatial semantics
• ListenerBot: single-agent advice taker
– Can accept advice, never gives it
• DialogBot: multi-agent decision maker
– Gives advice by tracking the other player’s beliefs
![Page 26: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/26.jpg)
DialogBot
• (Approximately) tracks beliefs of other player
• Speech actions change beliefs of other player
• Model: Decentralized POMDP (Dec-POMDP)
– Problem: NEXP-hard!!
Top!
![Page 27: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/27.jpg)
Each agent selects its own action
![Page 28: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/28.jpg)
Each agent receives its own observation
![Page 29: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/29.jpg)
Transition depends on both actions
![Page 30: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/30.jpg)
Reward is shared between agents
Formalization of the co-operative principle
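The four Dec-POMDP properties from the last few slides (own actions, own observations, joint transition, shared reward) can be made concrete with a toy joint step function. This is a sketch under stated assumptions: the 1-D board of length 4, the movement-only action set, and the reward values are hypothetical, and speech actions are left out for brevity.

```python
from typing import NamedTuple, Tuple

class State(NamedTuple):
    players: Tuple[int, int]   # positions of agent 0 and agent 1 on a 1-D board
    card: int                  # hidden card position

BOARD_SIZE = 4                 # hypothetical board length
STEP = {"LEFT": -1, "RIGHT": 1, "STAY": 0}

def joint_transition(state, joint_action):
    """The next state depends on both agents' actions; the card never moves."""
    moved = tuple(
        min(BOARD_SIZE - 1, max(0, pos + STEP[act]))
        for pos, act in zip(state.players, joint_action)
    )
    return State(moved, state.card)

def observations(state):
    """Each agent receives its own observation: whether it stands on the card."""
    return tuple(pos == state.card for pos in state.players)

def shared_reward(state, found=10.0, step_cost=-1.0):
    """A single reward for the team (assumed values), so helping the partner pays."""
    return step_cost + (found if state.card in state.players else 0.0)

def step(state, joint_action):
    """Apply one joint action and return (next state, observations, shared reward)."""
    nxt = joint_transition(state, joint_action)
    return nxt, observations(nxt), shared_reward(nxt)

start = State(players=(0, 3), card=1)
nxt, obs, r = step(start, ("RIGHT", "STAY"))
print(nxt, obs, r)
```

Because both agents receive the same `shared_reward`, an agent that cannot reach the card still gains when its partner finds it, which is the sense in which shared reward formalizes cooperation.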
![Page 31: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/31.jpg)
Exact Multi-agent Belief Update
Time
![Page 32: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/32.jpg)
Approximate Multi-agent Belief Update
Time
![Page 33: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/33.jpg)
Single-agent POMDP Approximation
Other agent belief transition model
World transition model
Resulting POMDP has states
![Page 34: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/34.jpg)
What to say?
![Page 35: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/35.jpg)
“Top”
![Page 36: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/36.jpg)
“Middle”
![Page 37: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/37.jpg)
“Right”
![Page 38: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/38.jpg)
“Right”
![Page 39: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/39.jpg)
Return to Grice
• Be truthful
• Be relevant
• Be clear
• Be informative
![Page 40: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/40.jpg)
Cooperating DialogBots
Middle of the board
![Page 41: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/41.jpg)
Cooperating DialogBots
Middle of the board
![Page 42: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/42.jpg)
Adolescent DialogBots
Top
![Page 43: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/43.jpg)
Return to Grice
• Be truthful: DialogBot speaks with evidence
• Be relevant: DialogBot gives advice to help win the game
• Be clear
• Be informative
![Page 44: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/44.jpg)
Experimental Results
• Evaluate pairs of agents from 197 random initial states
• Agents have 50 high-level moves to find the card

| Bots | % Success | Average High-Level Actions |
| --- | --- | --- |
| ListenerBot & ListenerBot | 84.4% | 19.8 |
| ListenerBot & DialogBot | 87.2% | 17.5 |
| DialogBot & DialogBot | 90.6% | 16.6 |
![Page 45: Emergence of Gricean Maxims from Multi-agent Decision Theory](https://reader035.vdocument.in/reader035/viewer/2022062500/56815143550346895dbf60c5/html5/thumbnails/45.jpg)
Emergent Gricean Behavior
• Be truthful: DialogBot speaks with evidence
• Be relevant: DialogBot gives advice to help win
• Be clear: need variable costs on messages
• Be informative: requires levels of specificity
ACL 2013: Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
From joint reward, not hard coded
Future Work: intentions, joint plans, deeper belief nesting
Thanks!