

Page 1

Graphical Models for Online Solutions to Interactive POMDPs

Prashant Doshi (University of Georgia, USA)
Yifeng Zeng (Aalborg University, Denmark)
Qiongyu Chen (National Univ. of Singapore)

International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007)

Page 2

Decision-Making in Multiagent Settings

[Figure: agents i and j share the state (S). Agent i takes actions (Ai) and receives observations (Oi); agent j takes actions (Aj) and receives observations (Oj).]

- Agent i: belief over state and model of j
- Agent j: belief over state and model of i
- Act to optimize preferences given beliefs

Page 3

Finitely Nested I-POMDP (Gmytrasiewicz & Doshi, 05)

A finitely nested I-POMDP of agent i with a strategy level l:

$\text{I-POMDP}_{i,l} = \langle IS_{i,l}, A, T_i, \Omega_i, O_i, R_i \rangle$

- Interactive states: $IS_{i,l} = S \times M_{j,l-1}$
  - Beliefs about physical environments: $S$
  - Beliefs about other agents in terms of their preferences, capabilities, and beliefs: $M_{j,l-1}$
  - Type: $M_{j,l-1} : \{ b_j;\ A, \Omega_j, T_j, O_j, R_j, OC_j \}$
- $A$: Joint actions
- $\Omega_i$: Possible observations
- $T_i$: Transition function, $S \times A \times S \to [0,1]$
- $O_i$: Observation function, $S \times A \times \Omega_i \to [0,1]$
- $R_i$: Reward function, $S \times A \to \mathbb{R}$
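As a rough illustration of the interactive state space, the sketch below builds the interactive states as pairs of a physical state and a model of j. The `Model` class, the state names, and the belief values are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for one model of agent j."""
    name: str
    belief: tuple  # j's belief over the physical states, in the order of S


def interactive_states(physical_states, models_of_j):
    """Return the interactive states as (physical state, model of j) pairs."""
    return list(product(physical_states, models_of_j))


# Assumed toy values, not taken from the slides.
S = ["tiger_left", "tiger_right"]
M_j = [Model("j_thinks_left", (0.9, 0.1)), Model("j_thinks_right", (0.1, 0.9))]

IS = interactive_states(S, M_j)
# Agent i's belief is a distribution over interactive states; start uniform.
belief_i = {pair: 1.0 / len(IS) for pair in IS}
print(len(IS), sum(belief_i.values()))
```

Because agent i's belief ranges over pairs rather than over physical states alone, its size grows with the number of candidate models of j.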

Page 4

Belief Update

Page 5

Forget It!

Different approach
- Use the language of Influence Diagrams (IDs) to represent the problem more transparently (belief update)
- Use standard ID algorithms to solve it (solution)

Page 6

Challenges

- Representation of nested models for other agents
  - Influence diagram is a single agent oriented language
- Update beliefs on models of other agents
  - New models of other agents
  - Over time agents revise beliefs over the models of others as they receive observations

Page 7

Related Work

- Multiagent Influence Diagrams (MAIDs) (Koller & Milch, 2001)
  - Use IDs to represent incomplete information games
  - Compute Nash equilibrium solutions efficiently by exploiting conditional independence
- Network of Influence Diagrams (NIDs) (Gal & Pfeffer, 2003)
  - Allows uncertainty over the game
  - Allows multiple models of an individual agent
  - Solution involves collapsing models into a MAID or ID
- Both model static single play games
  - Do not consider agent interactions over time (sequential decision-making)

Page 8

Introduce Model Node and Policy Link

A generic level l Interactive-ID (I-ID) for agent i situated with one other agent j

- Model node: $M_{j,l-1}$
  - Models of agent j at level l-1
- Policy link: dashed line
  - Distribution over the other agent’s actions given its models
- Beliefs on $M_{j,l-1}$
  - $P(M_{j,l-1} \mid s)$
  - Update?

[Figure: level l I-ID with nodes $A_i$, $R_i$, $O_i$, $S$, $A_j$ and the model node $M_{j,l-1}$]

Page 9

Details of the Model Node

- Members of the model node
  - Different chance nodes are solutions of models $m_{j,l-1}$
  - $Mod[M_j]$ represents the different models of agent j
- CPT of the chance node $A_j$ is a multiplexer
  - Assumes the distribution of each of the action nodes ($A_j^1$, $A_j^2$) depending on the value of $Mod[M_j]$

[Figure: model node $M_{j,l-1}$ containing $Mod[M_j]$, $S$, the models $m_{j,l-1}^1$ and $m_{j,l-1}^2$ with their action nodes $A_j^1$ and $A_j^2$, and the chance node $A_j$]

$m_{j,l-1}^1$, $m_{j,l-1}^2$ could be I-IDs or IDs
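The multiplexer behaviour of the chance node Aj can be sketched in a few lines of Python. This is a minimal illustration under assumed values: the model labels `m1` and `m2`, the action names, and the probabilities are all hypothetical, and each model is assumed to have been solved already so that its action distribution is available.

```python
def multiplexer_cpt(action_dists):
    """Build P(Aj | Mod[Mj]) from the per-model action distributions.

    action_dists maps each model label to the distribution of its action
    node, i.e. the solution of that model. The multiplexer copies the
    distribution of whichever model Mod[Mj] selects.
    """
    return {model: dict(dist) for model, dist in action_dists.items()}


def marginal_action_dist(cpt, belief_over_models):
    """P(Aj) = sum over m of P(Aj | Mod[Mj]=m) * P(Mod[Mj]=m)."""
    marginal = {}
    for model, p_model in belief_over_models.items():
        for action, p_action in cpt[model].items():
            marginal[action] = marginal.get(action, 0.0) + p_model * p_action
    return marginal


# Hypothetical solved models: m1 prescribes "listen", m2 prescribes "open_right".
action_dists = {
    "m1": {"listen": 1.0, "open_right": 0.0},
    "m2": {"listen": 0.0, "open_right": 1.0},
}
cpt = multiplexer_cpt(action_dists)
p_aj = marginal_action_dist(cpt, {"m1": 0.7, "m2": 0.3})
print(p_aj)
```

With this structure, agent i's uncertainty over j's models turns directly into a distribution over j's actions, which is what the policy link conveys.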

Page 10

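Whole I-ID

[Figure: level l I-ID with the model node expanded: $A_i$, $R_i$, $O_i$, $S$, $A_j$, $Mod[M_j]$, action nodes $A_j^1$, $A_j^2$, and models $m_{j,l-1}^1$, $m_{j,l-1}^2$]

$m_{j,l-1}^1$, $m_{j,l-1}^2$ could be I-IDs or IDs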

Page 11

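Interactive Dynamic Influence Diagrams (I-DIDs)

[Figure: two time slices of a level l I-DID. Slice t has nodes $A_i^t$, $R_i$, $O_i^t$, $S^t$, $A_j^t$, $M_{j,l-1}^t$; slice t+1 has nodes $A_i^{t+1}$, $R_i$, $O_i^{t+1}$, $S^{t+1}$, $A_j^{t+1}$, $M_{j,l-1}^{t+1}$. The model update link connects $M_{j,l-1}^t$ to $M_{j,l-1}^{t+1}$.]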

Page 12

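Semantics of Model Update Link

[Figure: the model node $M_{j,l-1}^t$ holds $Mod[M_j^t]$, $s^t$, the models $m_{j,l-1}^{t,1}$ and $m_{j,l-1}^{t,2}$ with action nodes $A_j^1$ and $A_j^2$, and the chance node $A_j^t$. Observation nodes $O_j^1$, $O_j^2$ and $O_j$ lead to the model node $M_{j,l-1}^{t+1}$, which holds $Mod[M_j^{t+1}]$, $s^{t+1}$, the models $m_{j,l-1}^{t+1,1}$ to $m_{j,l-1}^{t+1,4}$ with action nodes $A_j^1$ to $A_j^4$, and the chance node $A_j^{t+1}$.]

These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations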
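The way the model set grows across the update link can be sketched as an enumeration over models, optimal actions, and observations. The code below is a simplified illustration, not the authors' implementation: each model is reduced to a single number (j's probability that the tiger is on the left), the `listen_update` helper and its accuracy of 0.85 are assumed, and j's optimal action in both models is assumed to be `listen`.

```python
def update_models(models, optimal_actions, observations, belief_update):
    """Enumerate the models at t+1 reachable from the models at t.

    models: dict label -> j's belief at time t
    optimal_actions: dict label -> list of j's optimal actions in that model
    observations: list of j's possible observations
    belief_update: function (belief, action, observation) -> new belief
    """
    updated = {}
    for label, belief in models.items():
        for action in optimal_actions[label]:
            for obs in observations:
                new_label = f"{label}|{action},{obs}"
                updated[new_label] = belief_update(belief, action, obs)
    return updated


def listen_update(p_left, action, obs, accuracy=0.85):
    """Toy Bayes update of P(tiger left) after a growl (assumed accuracy)."""
    like_left = accuracy if obs == "growl_left" else 1.0 - accuracy
    like_right = 1.0 - accuracy if obs == "growl_left" else accuracy
    return like_left * p_left / (like_left * p_left + like_right * (1.0 - p_left))


# Hypothetical models at time t: j leans left (m1) or leans right (m2).
models_t = {"m1": 0.9, "m2": 0.1}
optimal_actions = {"m1": ["listen"], "m2": ["listen"]}
observations = ["growl_left", "growl_right"]

updated = update_models(models_t, optimal_actions, observations, listen_update)
for label, belief in updated.items():
    print(label, round(belief, 3))
```

Two models with one optimal action each and two observations yield four updated models, matching the four models shown at time t+1 in the figure.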

Page 13

Notes

- Updated set of models at time step (t+1) will have at most $|M_{j,l-1}^t|\,|A_j|\,|\Omega_j|$ models
  - $|M_{j,l-1}^t|$: number of models at time step t
  - $|A_j|$: largest space of actions
  - $|\Omega_j|$: largest space of observations
- New distribution over the updated models uses
  - original distribution over the models
  - probability of the other agent performing the action, and receiving the observation that led to the updated model
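The two notes can be made concrete with a small Python sketch that weights each updated model by the prior probability of its parent model, the probability of the action, and the probability of the observation. This is a simplification under stated assumptions: the observation probabilities are given directly per model and action (in the full I-DID they would also depend on the next state), and all labels and numbers are hypothetical.

```python
def updated_model_distribution(prior, action_prob, obs_prob):
    """Weight each updated model by prior * P(action) * P(observation).

    prior: dict model -> P(model at time t)
    action_prob: dict model -> dict action -> P(action | model)
    obs_prob: dict (model, action) -> dict obs -> P(obs | model, action)
    Returns dict (model, action, obs) -> probability of that updated model.
    """
    posterior = {}
    for m, p_m in prior.items():
        for a, p_a in action_prob[m].items():
            if p_a == 0.0:
                continue  # actions j would never take create no new model
            for o, p_o in obs_prob[(m, a)].items():
                posterior[(m, a, o)] = p_m * p_a * p_o
    return posterior


# Assumed toy inputs, not taken from the slides.
prior = {"m1": 0.5, "m2": 0.5}
action_prob = {"m1": {"listen": 1.0}, "m2": {"listen": 1.0}}
obs_prob = {
    ("m1", "listen"): {"GL": 0.8, "GR": 0.2},
    ("m2", "listen"): {"GL": 0.2, "GR": 0.8},
}

posterior = updated_model_distribution(prior, action_prob, obs_prob)

# Upper bound from the note: |models at t| * |largest action space| * |largest observation space|.
n_actions = max(len(dist) for dist in action_prob.values())
n_obs = max(len(dist) for dist in obs_prob.values())
bound = len(prior) * n_actions * n_obs
print(len(posterior), bound, sum(posterior.values()))
```

The bound is tight here because every model, action, and observation combination has nonzero probability; zero-probability actions are skipped, so the actual count can be smaller.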

Page 14

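[Figure: level l I-DID for agent i over time steps t and t+1 with the model update link expanded. Slice t: $A_i^t$, $R_i$, $O_i^t$, $S^t$, $Mod[M_j^t]$, $A_j^t$, and the models $m_{j,l-1}^{t,1}$, $m_{j,l-1}^{t,2}$ with action nodes $A_j^1$, $A_j^2$ and observation nodes $O_j^1$, $O_j^2$, $O_j$. Slice t+1: $A_i^{t+1}$, $R_i$, $O_i^{t+1}$, $S^{t+1}$, $Mod[M_j^{t+1}]$, $A_j^{t+1}$, and the models $m_{j,l-1}^{t+1,1}$ to $m_{j,l-1}^{t+1,4}$ with action nodes $A_j^1$ to $A_j^4$.]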

Page 15

Example Applications: Emergence of Social Behaviors

- Followership and Leadership in the persistent multiagent tiger problem
- Altruism and Reciprocity in the public good problem with punishment
- Strategies in a simple version of two-player Poker

Page 16

Followership and Leadership in Multiagent Persistent Tiger

Experimental Setup:
- Agent j has a better hearing capability (95% accurate) compared to i’s (65% accuracy)
- Agent i does not have initial information about the tiger’s location
- Agent i considers two models of agent j which differ in j’s level 0 initial beliefs
  - Agent j likely thinks that the tiger is behind the left door
  - Agent j likely thinks that the tiger is behind the right door
- Solve the corresponding level 1 I-DID expanded over three time steps and get the normative behavioral policy of agent i
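To see why the difference in hearing accuracy matters, the following sketch applies a single Bayes update to a uniform prior after one growl from the left. It is a simplified single-agent calculation that ignores door creaks and the other agent entirely; the function name and the uniform prior are assumptions made for illustration.

```python
def belief_after_growl(p_left, heard_left, accuracy):
    """Posterior P(tiger behind left door) after one listen action."""
    like_left = accuracy if heard_left else 1.0 - accuracy
    like_right = 1.0 - accuracy if heard_left else accuracy
    evidence = like_left * p_left + like_right * (1.0 - p_left)
    return like_left * p_left / evidence


prior = 0.5  # no initial information about the tiger's location

post_i = belief_after_growl(prior, True, 0.65)     # agent i's hearing
post_j = belief_after_growl(prior, True, 0.95)     # agent j's hearing
post_deaf = belief_after_growl(prior, True, 0.5)   # uninformative hearing

print(round(post_i, 3), round(post_j, 3), round(post_deaf, 3))
```

A single growl moves j's belief much further than i's, so j's subsequent action carries information that i cannot obtain from its own hearing alone. With an accuracy of 0.5 the growl leaves the belief at the prior.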

Page 17

Level 1 I-ID in the Tiger Problem

Expand over three time steps

Mapping decision nodes to chance nodes

Page 18

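Policy Tree 1: Agent i has hearing accuracy of 65%

[Figure: three-step policy tree for agent i. Actions: L (listen), OL (open left door), OR (open right door). Each observation pairs a growl (GL, GR) with a creak or silence (CL, CR, S); * matches any value.]

L
  GL,* -> L
    GL,CR -> OR
    GL,S/CL -> L
    GR,* -> L
  GR,* -> L
    GL,* -> L
    GR,S/CR -> L
    GR,CL -> OL

Conditional Followership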

Page 19

Policy Tree 2: Agent i loses hearing ability (accuracy is 0.5)

[Figure: three-step policy tree for agent i]

L
  *,* -> L
    *,CR -> OR
    *,S -> L
    *,CL -> OL

Unconditional (Blind) Followership

Page 20

Example 2: Altruism and Reciprocity in the Public Good Problem

Public Good Game
- Two agents initially endowed with $X_T$ amount of resources
- Each agent may choose to
  - contribute (C) a fixed amount of the resources to a public pot
  - not contribute, i.e. defect (D)
- Agents’ actions and pot are not observable, but agents receive an observation symbolizing the state of the public pot
  - plenty (PY)
  - meager (MR)
- Value of resources in the public pot is discounted by $c_i$ (< 1) for each agent i, where $c_i$ is the marginal private return
- In order to encourage contributions, the contributing agents punish free riders (P) but incur a small cost $c_p$ for administering the punishment
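A one-shot payoff consistent with the description above might look like the sketch below. The slides do not give the payoff formula, so everything here is an assumption made for illustration: a contributor is assumed to put its entire endowment into the pot, every contributor is assumed to punish a defector, and the numeric values of the endowment, marginal return, punishment, and punishment cost are hypothetical.

```python
def one_shot_payoff(my_action, other_action, x_t, c_i, punishment, c_p):
    """Assumed one-shot payoff for one agent in the public good game.

    my_action, other_action: "C" (contribute) or "D" (defect)
    x_t: initial endowment; c_i: marginal private return (< 1)
    punishment: amount P deducted from a free rider
    c_p: cost a contributor pays to administer the punishment
    """
    # Assumption: a contributor puts its whole endowment into the pot.
    pot = x_t * ((my_action == "C") + (other_action == "C"))
    kept = x_t if my_action == "D" else 0.0
    payoff = kept + c_i * pot
    if my_action == "D" and other_action == "C":
        payoff -= punishment  # the free rider is punished
    if my_action == "C" and other_action == "D":
        payoff -= c_p  # the contributor pays to punish
    return payoff


# Hypothetical parameter values.
params = dict(x_t=1.0, c_i=0.6, punishment=0.5, c_p=0.1)
cc = one_shot_payoff("C", "C", **params)
cd = one_shot_payoff("C", "D", **params)
dc = one_shot_payoff("D", "C", **params)
dd = one_shot_payoff("D", "D", **params)
print(cc, cd, dc, dd)
```

Under these assumed numbers, contributing is the better reply to a contributor only because the punishment outweighs the endowment that a defector keeps, which is the incentive the punishment is meant to create.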

Page 21

Agent Types

- Altruistic and Non-altruistic types
  - Altruistic agent has a high marginal private return ($c_i$ is close to 1) and does not punish others who defect
- Optimal Behavior
  - One action remaining: both types of agents choose to contribute to avoid being punished
  - Two actions to go: altruistic type chooses to contribute, while the other defects. Why?
  - Three steps to go: the altruistic agent contributes to avoid punishment and the non-altruistic type defects
  - Greater than three steps: altruistic agent continues to contribute to the public pot depending on how close its marginal return is to 1; the non-altruistic type defects

Page 22

Level 1 I-ID in the Public Good Game

Expand over three time steps

Page 23

Policy Tree 1: Altruism in PG

- If agent i (altruistic type) believes with probability 1 that j is altruistic, i chooses to contribute for each of the three steps.
- This behavior persists when i is unaware of whether j is altruistic, and when i assigns a high probability to j being the non-altruistic type.

[Figure: three-step policy tree for agent i; * matches any observation]

C
  * -> C
    * -> C

Page 24

Policy Tree 2: Reciprocal Agents

- Reciprocal Type
  - The reciprocal type’s marginal private return is lower, and it obtains a greater payoff when its action is similar to that of the other
- Experimental Setup
  - Consider the case when the reciprocal agent i is unsure of whether j is altruistic and believes that the public pot is likely to be half full
- Optimal Behavior
  - From this prior belief, i chooses to defect
  - On receiving an observation of plenty, i decides to contribute, while an observation of meager makes it defect
  - With one action to go, i, believing that j contributes, chooses to contribute too to avoid punishment regardless of its observations

[Figure: three-step policy tree for agent i; * matches any observation]

D
  PY -> C
    * -> C
  MR -> D
    * -> C

Page 25

Conclusion and Future Work

- I-DIDs: A general ID-based formalism for sequential decision-making in multiagent settings
  - Online counterparts of I-POMDPs
- Solving I-DIDs approximately for computational efficiency (see AAAI ’07 paper on model clustering)
- Apply I-DIDs to other application domains

Visit our poster on I-DIDs today for more information

Page 26

Thank You!