Emergence of Gricean Maxims from Multi-agent Decision Theory

Adam Vogel
Stanford NLP Group

Joint work with Max Bodoia, Chris Potts, and Dan Jurafsky

Decision-Theoretic Pragmatics

Gricean cooperative principle:

Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.

Decision-Theoretic Pragmatics

Gricean Maxims:
• Be truthful: speak with evidence
• Be relevant: speak in accordance with goals
• Be clear: be brief and avoid ambiguity
• Be informative: say exactly as much as needed

Emergence of Gricean Maxims

Co-operative principle

• Be truthful
• Be relevant
• Be clear
• Be informative

???

Approach: Operationalize the co-operative principle
Tool: Multi-agent decision theory
Goal: Maxims emerge from rational behavior

Joint utility Rationality

Related Work

• One-shot reference tasks
  – Generating spatial referring expressions [Golland et al. 2010]
  – Predicting pragmatic reasoning in language games [Stiller et al. 2011]
• Interpreting natural language instructions
  – Learning to read help guides [Branavan et al. 2009]
  – Learning to follow navigational directions [Vogel and Jurafsky 2010] [Artzi and Zettlemoyer 2013] [Chen and Mooney 2011] [Tellex et al. 2011]

CARDS Task

Outline

• Spatial semantics
• ListenerBot: single-agent advice taker
  – Can accept advice, never gives it
• DialogBot: multi-agent decision maker
  – Gives advice by tracking the other player’s beliefs

Spatial Semantics

“in the top left of the board” → BOARD(top;left)
“on the left side” → BOARD(left)
“right in the middle” → BOARD(middle)

MaxEnt Classifier w/ Bag of Words

Estimated from Corpus Data
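As a rough illustration of the idea, a maximum-entropy classifier over bag-of-words features can be sketched as multinomial logistic regression trained by stochastic gradient ascent. The training pairs, the hyperparameters, and the function names below are all assumptions made for this sketch; they are not the corpus or the model used in the talk.

```python
import math
from collections import defaultdict

# Hypothetical toy training pairs (not the real corpus data).
TRAIN = [
    ("in the top left of the board", "BOARD(top;left)"),
    ("top left corner", "BOARD(top;left)"),
    ("on the left side", "BOARD(left)"),
    ("over on the left", "BOARD(left)"),
    ("right in the middle", "BOARD(middle)"),
    ("in the middle of the board", "BOARD(middle)"),
]

def bag_of_words(utterance):
    """Map an utterance to word counts, ignoring word order."""
    counts = defaultdict(float)
    for token in utterance.lower().split():
        counts[token] += 1.0
    return counts

def predict_proba(weights, labels, utterance):
    """Softmax over per-label linear scores of the word counts."""
    feats = bag_of_words(utterance)
    scores = {y: sum(weights[(y, w)] * c for w, c in feats.items()) for y in labels}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train(data, epochs=200, lr=0.1, l2=0.01):
    """Stochastic gradient ascent on the L2-regularized log-likelihood."""
    labels = sorted({y for _, y in data})
    weights = defaultdict(float)
    for _ in range(epochs):
        for utterance, gold in data:
            probs = predict_proba(weights, labels, utterance)
            for word, count in bag_of_words(utterance).items():
                for y in labels:
                    grad = ((1.0 if y == gold else 0.0) - probs[y]) * count
                    weights[(y, word)] += lr * (grad - l2 * weights[(y, word)])
    return weights, labels

weights, labels = train(TRAIN)
probs = predict_proba(weights, labels, "somewhere in the middle")
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Unseen words such as "somewhere" simply contribute a zero score, so the prediction is driven by the words the classifier has evidence for.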

Complexity Ahoy

• Approximate decision making only feasible for problems with <10k states!

[Figure: log-scale plot, axis ticks from 100 to 10,000,000,000]

Semantic State Representation
• Divide board into 16 regions
• Cluster squares based on meanings



Outline

Partially Observable Markov Decision Process (POMDP)

Or: An HMM you get to drive!

State space S: hidden configuration of the world
• Location of card
• Location of player

Action space A: what we can do
• Move around the board
• Search for the card

Observations: sensor information + messages
• Whether we are on top of the card
• BOARD(right;top) etc.

Observation Model: sensor model
• We see the card if we search for it and are on it
• For messages

Reward R(s,a): value of an action in a state
• Large reward if in the same square as the card
• Every action adds small negative reward

Transition T(s’|a,s): dynamics of the world
• Travel actions change player location
• Card never moves

Initial belief state: distribution over S
• Uniform distribution over card location
• Known initial player location
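The components listed above can be written down for a toy board as follows. This is only a sketch: the 3×3 board size, the reward magnitudes (100 and -1), and all function names are assumptions chosen for illustration, and messages are left out of the observation model.

```python
from itertools import product

WIDTH, HEIGHT = 3, 3  # hypothetical tiny board
SQUARES = list(product(range(WIDTH), range(HEIGHT)))

# State space S: (player location, card location).
STATES = [(player, card) for player in SQUARES for card in SQUARES]

# Action space A: travel actions plus SEARCH.
MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
ACTIONS = list(MOVES) + ["SEARCH"]

def transition(state, action):
    """Deterministic T: travel moves the player, the card never moves."""
    (x, y), card = state
    if action in MOVES:
        dx, dy = MOVES[action]
        nx, ny = x + dx, y + dy
        if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
            return ((nx, ny), card)
    return state  # SEARCH, or a move into a wall, leaves the state unchanged

def reward(state, action):
    """R(s, a): small cost per action, large bonus on the card's square."""
    player, card = state
    bonus = 100.0 if player == card else 0.0  # assumed magnitude
    return bonus - 1.0                        # assumed per-action cost

def observation_prob(saw_card, state, action):
    """Sensor model: the card is seen only when searching on its square."""
    player, card = state
    sees = action == "SEARCH" and player == card
    return 1.0 if saw_card == sees else 0.0

def initial_belief(player_start):
    """Uniform over card location, known player location."""
    return {
        (player, card): (1.0 / len(SQUARES) if player == player_start else 0.0)
        for player, card in STATES
    }

b0 = initial_belief((0, 0))
print(len(STATES), sum(b0.values()))
```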

Belief Update:
Action: SEARCH
Observation: (Card not here, no message)

Belief Update:

Belief Update:
Action: SEARCH
Observation: (Card not here, “left side”)

Belief Update:
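The two updates can be sketched as a Bayes filter over the card's location. Because SEARCH does not move the player and the card never moves, the prediction step is the identity here and only the observation likelihoods matter. The 4×4 board, the 0.9/0.1 message likelihood, and the function names are assumptions for this sketch; in the talk the message model is estimated from corpus data.

```python
from itertools import product

WIDTH, HEIGHT = 4, 4  # hypothetical board size
SQUARES = list(product(range(WIDTH), range(HEIGHT)))

def sensor_likelihood(saw_card, player, card):
    """After SEARCH, the card is seen exactly when the player is on it."""
    return 1.0 if saw_card == (player == card) else 0.0

def message_likelihood(message, card):
    """P(message | card location); the 0.9 / 0.1 values are assumed."""
    if message is None:
        return 1.0  # no message carries no information
    if message == "left side":
        return 0.9 if card[0] < WIDTH // 2 else 0.1
    raise ValueError(f"unknown message: {message}")

def update_belief(belief, player, saw_card, message=None):
    """Posterior over card location: prior times likelihoods, renormalized."""
    unnorm = {
        card: p
        * sensor_likelihood(saw_card, player, card)
        * message_likelihood(message, card)
        for card, p in belief.items()
    }
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {card: p / total for card, p in unnorm.items()}

prior = {sq: 1.0 / len(SQUARES) for sq in SQUARES}
player = (0, 0)

after_search = update_belief(prior, player, saw_card=False)
after_message = update_belief(prior, player, saw_card=False, message="left side")
left_mass = sum(p for sq, p in after_message.items() if sq[0] < WIDTH // 2)
print(after_search[player], round(left_mass, 3))
```

A failed search only removes the searched square from consideration, while the added message shifts most of the remaining probability mass to the left half of the board.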

Decision Making

Choose policy

Goal: Maximize expected reward

Solution: Perseus, an approximate value iteration algorithm [Spaan et al. 2005]

Computational complexity: PSPACE!

Immediate reward + Expected future reward
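The decomposition into immediate reward plus expected future reward corresponds to the standard value recursion over belief states. The form below is the textbook one rather than a formula taken from the slides; the discount factor γ and the notation b^{a,o} for the belief after taking action a and observing o are assumptions of this sketch.

```latex
V(b) = \max_{a \in A} \Big[
    \underbrace{\sum_{s \in S} b(s)\, R(s, a)}_{\text{immediate reward}}
    \; + \;
    \underbrace{\gamma \sum_{o} P(o \mid b, a)\, V\big(b^{a,o}\big)}_{\text{expected future reward}}
\Big]
```

Approximate solvers such as Perseus back up this recursion only at a sampled set of belief points instead of over the whole belief space.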



Outline

DialogBot

• (Approximately) tracks beliefs of other player
• Speech actions change beliefs of other player
• Model: Decentralized POMDP (Dec-POMDP)
  – Problem: NEXP-hard!!

Top!

Each agent selects its own action

Each agent receives its own observation

Transition depends on both actions

Reward is shared between agents

Formalization of the co-operative principle
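One step of such a model can be sketched for two agents as below. The one-dimensional board, the rule that the bonus is paid when both players stand on the card, the reward magnitudes, and the function names are all assumptions made for illustration, not details from the talk.

```python
N = 5  # hypothetical one-dimensional board with N squares

def move(position, action):
    """Apply one agent's action to its own position."""
    if action == "LEFT":
        return max(0, position - 1)
    if action == "RIGHT":
        return min(N - 1, position + 1)
    return position  # SEARCH does not move the agent

def transition(state, joint_action):
    """The next state depends on both agents' actions; the card stays put."""
    p1, p2, card = state
    a1, a2 = joint_action
    return (move(p1, a1), move(p2, a2), card)

def observe(state, joint_action, agent):
    """Each agent gets only its own sensor reading."""
    positions, card = state[:2], state[2]
    return joint_action[agent] == "SEARCH" and positions[agent] == card

def shared_reward(state, joint_action):
    """A single reward for the team: one cost per action, one joint bonus."""
    p1, p2, card = state
    bonus = 100.0 if p1 == card and p2 == card else 0.0  # assumed goal
    return bonus - 1.0 * len(joint_action)               # assumed cost

def step(state, joint_action):
    """Advance the world by one joint action."""
    r = shared_reward(state, joint_action)
    next_state = transition(state, joint_action)
    observations = tuple(observe(next_state, joint_action, i) for i in range(2))
    return next_state, observations, r

print(step((0, 4, 2), ("RIGHT", "LEFT")))
```

Because neither agent has a private payoff, anything one agent does to help the other, including speaking, is valued only through its effect on the shared reward.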

Exact Multi-agent Belief Update

Time

Approximate Multi-agent Belief Update

Time

Single-agent POMDP Approximation

Other agent belief transition model

World transition model

Resulting POMDP has states

What to say?

“Top”

“Middle”

“Right”
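A heavily simplified way to picture the choice among these messages: the speaker simulates how each message would change the listener's belief and picks the one that puts the most probability on where the speaker believes the card is. This one-step proxy stands in for the full planning described in the talk; the 3×3 board, the region definitions, the `trust` parameter, and the function names are all assumptions of this sketch.

```python
from itertools import product

SIZE = 3  # hypothetical board size
SQUARES = list(product(range(SIZE), range(SIZE)))

# Hypothetical denotations for the three messages.
REGIONS = {
    "top": lambda sq: sq[1] == 0,
    "middle": lambda sq: sq == (1, 1),
    "right": lambda sq: sq[0] == SIZE - 1,
}

def listener_update(belief, message, trust=0.9):
    """Simulated listener: reweight squares inside vs. outside the region."""
    in_region = REGIONS[message]
    unnorm = {
        sq: p * (trust if in_region(sq) else 1.0 - trust)
        for sq, p in belief.items()
    }
    total = sum(unnorm.values())
    return {sq: p / total for sq, p in unnorm.items()}

def choose_message(speaker_belief, listener_belief):
    """Pick the message maximizing the listener's expected posterior
    probability on the card location, under the speaker's own belief."""
    def score(message):
        updated = listener_update(listener_belief, message)
        return sum(p * updated[sq] for sq, p in speaker_belief.items())
    return max(REGIONS, key=score)

uniform = {sq: 1.0 / len(SQUARES) for sq in SQUARES}
knows_top = {sq: (1.0 if sq == (1, 0) else 0.0) for sq in SQUARES}
knows_middle = {sq: (1.0 if sq == (1, 1) else 0.0) for sq in SQUARES}

print(choose_message(knows_top, uniform))
print(choose_message(knows_middle, uniform))
```

Nothing in this sketch forbids a false message; an accurate one simply scores higher because it moves the listener's belief toward the card.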

Return to Grice

• Be truthful
• Be relevant
• Be clear
• Be informative

Cooperating DialogBots

Middle of the board

Adolescent DialogBots

Top

Return to Grice

• Be truthful: DialogBot speaks with evidence
• Be relevant: DialogBot gives advice to help win the game
• Be clear
• Be informative

Experimental Results
• Evaluate pairs of agents from 197 random initial states
• Agents have 50 high-level moves to find the card

Bots                        % Success   Average High-Level Actions
ListenerBot & ListenerBot   84.4%       19.8
ListenerBot & DialogBot     87.2%       17.5
DialogBot & DialogBot       90.6%       16.6

Emergent Gricean Behavior

• Be truthful: DialogBot speaks with evidence
• Be relevant: DialogBot gives advice to help win
• Be clear: need variable costs on messages
• Be informative: requires levels of specificity

ACL 2013: Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

From joint reward, not hard coded

Future Work: intentions, joint plans, deeper belief nesting

Thanks!
