Generalized Point Based Value Iteration for Interactive POMDPs

Prashant Doshi

Dept. of Computer Science and AI Institute

University of [email protected]

Twenty Third Conference on AI (AAAI 2008)

Dennis P. Perez

AI Institute

University of [email protected]

UGA

Outline

• Background

– Point Based Value Iteration (PBVI) for POMDPs

– Overview of Interactive POMDPs

• Interactive PBVI: Generalization of PBVI to Multiagent Settings

• Experimental Results

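Decision Making in Single Agent Setting

[Figure: POMDP. Agent i acts on the physical state (Loc, Orient, ...) through action $a_i^{t-1}$, the state undergoes a transition, and i receives observation $o_i^t$.]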

Solving POMDPs

• Exact solutions are difficult to compute in general

– PSPACE-complete for finite horizon (Papadimitriou & Tsitsiklis ’87, Mundhenk et al. ’00)

– Undecidable for infinite or indefinite horizon (Madani et al. ’03)

• Approximations that always guarantee a near-optimal policy quickly are not possible (Lusena et al. ’01)

– However, for many instances, algorithms in use or being developed will be fast and close to optimal

Point Based Value Iteration (PBVI)

• Potentially scalable approach for solving POMDPs approximately (Pineau et al. ’03, ’06)

[Figure: PBVI illustrated in panels: exact solution; select belief points; prune alpha vectors; expansion of belief points; next run; prune alpha vectors.]
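The select-and-prune cycle in the panels above can be sketched as a single point-based backup. This is a minimal illustration, not the authors' implementation: the two-state model, the array layouts `T[s][a][s']`, `O[a][s'][o]`, `R[s][a]`, and all function names are assumptions made for the example.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def value(b: Vector, alphas: List[Vector]) -> float:
    """Value of belief b under a set of alpha vectors."""
    return max(dot(alpha, b) for alpha in alphas)


def pbvi_backup(beliefs, alphas, T, O, R, gamma):
    """One point-based backup: at most one new alpha vector per belief point."""
    n_states, n_actions, n_obs = len(R), len(R[0]), len(O[0][0])
    new_alphas: List[Vector] = []
    for b in beliefs:
        best_vec, best_val = None, float("-inf")
        for a in range(n_actions):
            vec = [R[s][a] for s in range(n_states)]
            for o in range(n_obs):
                # Back-project every current alpha vector through (a, o).
                projected = [
                    [gamma * sum(T[s][a][s2] * O[a][s2][o] * alpha[s2]
                                 for s2 in range(n_states))
                     for s in range(n_states)]
                    for alpha in alphas
                ]
                # Keep only the projection that is best at this belief point.
                best_proj = max(projected, key=lambda g: dot(g, b))
                vec = [v + g for v, g in zip(vec, best_proj)]
            val = dot(vec, b)
            if val > best_val:
                best_vec, best_val = vec, val
        if best_vec not in new_alphas:  # drop duplicate vectors
            new_alphas.append(best_vec)
    return new_alphas


# Hypothetical 2-state, 2-action, 2-observation model (illustrative numbers).
T = [[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]]]  # T[s][a][s']
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]  # O[a][s'][o]
R = [[1.0, 0.0], [0.0, 2.0]]                              # R[s][a]
GAMMA = 0.9

beliefs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
alphas0 = [[0.0, 0.0]]  # lower bound: minimum reward / (1 - gamma)
alphas1 = pbvi_backup(beliefs, alphas0, T, O, R, GAMMA)
alphas2 = pbvi_backup(beliefs, alphas1, T, O, R, GAMMA)
```

Because only the vector that is best at each belief point survives, the set never holds more vectors than there are belief points, which is where the approach gets its scalability.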

Point Based Value Iteration

• Many different belief expansion strategies

– Stochastic trajectory generation

– Greedy error minimization

– Gain based methods (Samples et al. ’07)

• Improvements on PBVI

– Randomly backing up vectors at select points (Perseus; Spaan & Vlassis ’05)

– Prioritized vector backup (Shani et al. ’06)
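As a rough sketch of the first expansion strategy listed above, stochastic trajectory generation can simulate one step per action from each belief point and keep the resulting belief that lies farthest (in L1 distance) from the current set. The toy model, the L1 criterion, and the function names are assumptions for illustration only.

```python
import random
from typing import List

Belief = List[float]


def belief_update(b: Belief, a: int, o: int, T, O) -> Belief:
    """Standard POMDP Bayes update with T[s][a][s'] and O[a][s'][o]."""
    n = len(b)
    unnorm = [O[a][s2][o] * sum(b[s] * T[s][a][s2] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def sample(probs, rng: random.Random) -> int:
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def l1(u: Belief, v: Belief) -> float:
    return sum(abs(x - y) for x, y in zip(u, v))


def expand_beliefs(beliefs: List[Belief], T, O, rng: random.Random) -> List[Belief]:
    """Add at most one new belief per existing point."""
    n_actions = len(O)
    expanded = list(beliefs)
    for b in beliefs:
        candidates = []
        for a in range(n_actions):
            # Simulate one step: state, next state, then observation.
            s = sample(b, rng)
            s2 = sample(T[s][a], rng)
            o = sample(O[a][s2], rng)
            candidates.append(belief_update(b, a, o, T, O))
        # Keep the candidate farthest from everything collected so far.
        farthest = max(candidates, key=lambda c: min(l1(c, e) for e in expanded))
        if min(l1(farthest, e) for e in expanded) > 1e-9:
            expanded.append(farthest)
    return expanded


# Hypothetical 2-state model (illustrative numbers).
T = [[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]]]  # T[s][a][s']
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]  # O[a][s'][o]

initial = [[0.5, 0.5], [1.0, 0.0]]
expanded = expand_beliefs(initial, T, O, random.Random(0))
```

The belief set can at most double per expansion, so the number of points stays under the caller's control.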

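Decision Making in Multiagent Setting

[Figure: Interactive POMDP. Agents i and j act on the physical state (Loc, Orient, ...) through actions $a_i^{t-1}$ and $a_j^{t-1}$ and receive observations $o_i^t$ and $o_j^t$. Agent i reasons over the interactive state $IS_i = S \times \Theta_j$ (see JAIR ’05).]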

Interactive POMDPs

Key ideas:

• Integrate game theoretic concepts into a decision theoretic framework

– Include possible models of other agents in decision making: intentional (types) and subintentional models

– Address uncertainty by maintaining beliefs over the state and models of other agents: Bayesian learning

– Beliefs over intentional models give rise to interactive belief systems: interactive epistemology, recursive modeling

– Computable approximation of the interactive belief system: finitely nested belief system

– Compute best responses to your beliefs: subjective rationality

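Interactive POMDPs

• Interactive state space

– Include models of other agents into the state space: $IS_{i,l} = S \times \Theta_{j,l-1}$, where $\theta_{j,l-1} = \langle b_{j,l-1}, A_j, \Omega_j, T_j, O_j, R_j, OC_j \rangle$

• Beliefs in I-POMDPs could be nested: $b_{i,l} \in \Delta(IS_{i,l}) = \Delta(S \times \Theta_{j,l-1}) = \Delta(S \times B_{j,l-1} \times \hat{\Theta}_j)$ (computable)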

Interactive PBVI (I-PBVI)

• Hypothesis: Extending the PBVI approach to I-POMDPs results in a scalable approximation for I-POMDPs

• Generalizing PBVI to multiagent settings is not trivial

– Research challenges:

1. Space of agent models is countably infinite

2. Parameterized representation of nested beliefs is difficult

3. Other agents’ actions need to be predicted, suggesting a recursive implementation

Issue 1: Space of agent models is infinite

Approach

• Analogous to PBVI in POMDPs, select a few initial models of the other agent

– Need to ensure that the true model is within this set, otherwise the belief update is inconsistent

• Select models so that the Absolute Continuity Condition (ACC) is satisfied

– Subjective distribution over future observations (paths of play) should not rule out the observation histories considered possible by the true distribution

• How to satisfy ACC?

– Cautious beliefs

– Select a finite set of models with the partial (domain) knowledge that the true or an equivalent model is one of them
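A small sketch of the ACC check described above: the subjective distribution is the prior-weighted mixture of what each candidate model predicts, and ACC holds if every history the true distribution allows also has positive subjective probability. The model names, observation histories, and probabilities below are hypothetical and chosen only to show the two outcomes.

```python
from typing import Dict, Tuple

History = Tuple[str, ...]
Dist = Dict[History, float]


def subjective_distribution(prior: Dict[str, float],
                            predictions: Dict[str, Dist]) -> Dist:
    """Mix each candidate model's predicted histories by its prior weight."""
    mixed: Dist = {}
    for model, weight in prior.items():
        for history, p in predictions[model].items():
            mixed[history] = mixed.get(history, 0.0) + weight * p
    return mixed


def satisfies_acc(true_dist: Dist, subjective: Dist) -> bool:
    """True if no history possible under true_dist is ruled out subjectively."""
    return all(subjective.get(h, 0.0) > 0.0
               for h, p in true_dist.items() if p > 0.0)


# Hypothetical candidate models of agent j and the histories each predicts.
PREDICTIONS: Dict[str, Dist] = {
    "listens": {("silence", "silence"): 1.0},
    "opens_left": {("silence", "creak_left"): 1.0},
    "opens_right": {("silence", "creak_right"): 1.0},
}
true_dist = PREDICTIONS["opens_left"]  # assume this is j's true model

# Cautious belief: every candidate, including the true one, gets some weight.
cautious = {"listens": 1 / 3, "opens_left": 1 / 3, "opens_right": 1 / 3}
# Narrow belief: the true model is left out of the candidate set.
narrow = {"listens": 0.5, "opens_right": 0.5}

acc_cautious = satisfies_acc(true_dist, subjective_distribution(cautious, PREDICTIONS))
acc_narrow = satisfies_acc(true_dist, subjective_distribution(narrow, PREDICTIONS))
```

With the narrow prior, the history that actually occurs has zero subjective probability, so a Bayesian update on it is undefined; this is the inconsistency the slide warns about.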

Issue 2: Representing nested beliefs is difficult

• Level 0 beliefs are standard discrete distributions (vectors of probabilities that sum to 1)

• Level 1 beliefs could be represented as probability density functions over level 0 beliefs

• Probability density functions over level 1 beliefs may not be computable in general

– Parameters of level 1 beliefs may not be bounded (e.g., a polynomial of any degree)

– Level 2 beliefs are strictly partial recursive functions

Approach

• We previously limited the set of models

• Level l belief becomes a discrete probability distribution: $\widetilde{IS}_{i,l} = S \times \tilde{\Theta}_{j,l-1}$, $\tilde{b}_{i,l} \in \Delta(\widetilde{IS}_{i,l})$
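Once the model set is finite, a level 1 belief is just a table of probabilities over (state, model) pairs. The sketch below shows one possible representation; the class name, the `frame` label standing in for the rest of agent j's frame, and the tiger-style state names are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Level0Model:
    """A level 0 model of agent j: a belief over physical states plus a frame."""
    belief: Tuple[float, ...]  # distribution over physical states
    frame: str                 # hypothetical label for j's frame


InteractiveState = Tuple[str, Level0Model]


def uniform_level1_belief(states: List[str],
                          models: List[Level0Model]) -> Dict[InteractiveState, float]:
    """Discrete level 1 belief: uniform over S x (finite set of models)."""
    pairs = list(product(states, models))
    return {pair: 1.0 / len(pairs) for pair in pairs}


def marginal_over_states(belief: Dict[InteractiveState, float]) -> Dict[str, float]:
    """Sum out agent j's model to recover i's belief over physical states."""
    out: Dict[str, float] = {}
    for (state, _model), p in belief.items():
        out[state] = out.get(state, 0.0) + p
    return out


states = ["tiger_left", "tiger_right"]
models = [
    Level0Model((0.5, 0.5), "tiger_frame"),
    Level0Model((0.85, 0.15), "tiger_frame"),
    Level0Model((0.15, 0.85), "tiger_frame"),
]
b_i1 = uniform_level1_belief(states, models)
marginal = marginal_over_states(b_i1)
```

Making the model hashable (a frozen dataclass holding a tuple) lets interactive states serve directly as dictionary keys, so the nested belief needs no density function at all.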

Issue 3: Predict other agent’s actions

Approach

• Candidate agent models grow over time and must be tracked

– Define a complete interactive state space: $\widetilde{IS}_{i,l} = S \times \mathrm{Reach}(\tilde{\Theta}_{j,l-1}, H)$

– $\mathrm{Reach}(\tilde{\Theta}_{j,l-1}, 0) = \tilde{\Theta}_{j,l-1}$

– $\mathrm{Reach}(\tilde{\Theta}_{j,l-1}, H)$ = set of models of agent j in the course of H steps

• Solve other agent’s models at each level to predict actions

– Recursively invoke I-PBVI to solve models
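One way to picture Reach is as a breadth-first closure of the initial models under belief updates. The sketch below assumes level 0 models that are plain beliefs over physical states, a tiger-style toy model, and rounding of beliefs to six decimals so that equal beliefs hash alike; none of these choices come from the slides.

```python
from typing import Iterable, Optional, Set, Tuple

Belief = Tuple[float, ...]


def update(b: Belief, a: int, o: int, T, O) -> Optional[Belief]:
    """Bayes update with T[s][a][s'] and O[a][s'][o]; None if o is impossible."""
    n = len(b)
    unnorm = [O[a][s2][o] * sum(b[s] * T[s][a][s2] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        return None
    # Round so numerically equal beliefs collapse to one set element.
    return tuple(round(u / z, 6) for u in unnorm)


def reach(models: Iterable[Belief], horizon: int, T, O) -> Set[Belief]:
    """All models agent j may hold within `horizon` steps of the initial set."""
    n_actions, n_obs = len(O), len(O[0][0])
    reached = set(models)
    frontier = set(models)
    for _ in range(horizon):
        successors: Set[Belief] = set()
        for b in frontier:
            for a in range(n_actions):
                for o in range(n_obs):
                    b2 = update(b, a, o, T, O)
                    if b2 is not None:
                        successors.add(b2)
        # Only beliefs not seen before need expanding in the next step.
        frontier = successors - reached
        reached |= successors
    return reached


# Hypothetical tiger-style model: action 0 listens, action 1 opens a door.
T = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]]      # T[s][a][s']
O = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]]]  # O[a][s'][o]

initial = [(0.5, 0.5)]
reach_0 = reach(initial, 0, T, O)
reach_1 = reach(initial, 1, T, O)
reach_2 = reach(initial, 2, T, O)
```

The set can grow by a factor of up to |A||Ω| per step, which is why limiting the models kept in Reach matters for scalability.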

Interactive PBVI

• Back project alpha vectors for I-POMDPs (see paper)

• Retain alpha vectors optimal at selected belief points

• Computational Savings

vs

Experimental Results

• Measured the least time taken in reaching a particular performance in terms of the rewards

– Function of belief points, number of models and horizons

– Compared with Interactive Particle Filter (I-PF; Doshi & Gmytrasiewicz ’05)

[Figure: Multiagent Tiger Problem, plots for Level 1 and Level 2 (Dual Processor Xeon, 3.4GHz, 4GB RAM, Linux).]

Experimental Results

• I-PBVI is able to solve for larger horizons than the previous approximation technique

Level 1    H     Time (secs)
                 I-PBVI    I-PF
Tiger      7     322       496
           20    2,290     *
           40    3,043     *
MM         8     498       752
           20    2,303     *
           40    3,360     *

Discussion

• Interactive PBVI generalizes PBVI to multiagent settings

– The generalization is not trivial

• I-PBVI demonstrates scalable results on toy problems

– Further testing on realistic applications is within reach

• Further improvement is possible by carefully limiting the set of models in Reach()

– True or equivalent model must remain in the set, otherwise the belief update may become inconsistent

Thank You

Questions

Interactive POMDPs

• Background

– Well-known framework for decision making in single agent partially observable settings: POMDP

– Traditional analysis of multiagent interactions: Game theory

• Problem

– “... there is currently no good way to combine game theoretic and POMDP control strategies.” (Russell and Norvig, AI: A Modern Approach, 2nd Ed.)

Interactive POMDPs: Formal Definition and Relevant Properties

$\text{I-POMDP}_i = \langle IS_i, A, \Omega_i, T_i, O_i, R_i \rangle$

• Belief Update: The belief update function for I-POMDP$_i$ involves:

– Use the other agent’s model to predict its action(s)

– Anticipate the other agent’s observations and how it updates its model

– Use your own observations to correct your beliefs

Prediction:

Correction:
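For orientation, the two steps can be written together as one update. This is a sketch based on the standard I-POMDP belief update (the JAIR ’05 formulation cited earlier), not a transcription of the slide. The symbols $\beta$ (normalizing constant), $\Pr(a_j^{t-1} \mid \theta_j^{t-1})$ (probability that j's model prescribes the action), and $\tau$ (1 if $b_j^t$ is j's updated belief given $b_j^{t-1}$, $a_j^{t-1}$, $o_j^t$, and 0 otherwise) are assumptions introduced here. The outer sum ranges over previous interactive states whose model of j has the same frame as in $is^t$.

```latex
b_i^t(is^t) = \beta \sum_{is^{t-1}} b_i^{t-1}(is^{t-1})
  \sum_{a_j^{t-1}} \Pr(a_j^{t-1} \mid \theta_j^{t-1})\,
  O_i(s^t, a^{t-1}, o_i^t)\, T_i(s^{t-1}, a^{t-1}, s^t)
  \sum_{o_j^t} O_j(s^t, a^{t-1}, o_j^t)\,
  \tau(b_j^{t-1}, a_j^{t-1}, o_j^t, b_j^t)
```

Read against the three bullets on belief update: the terms $\Pr(a_j^{t-1} \mid \theta_j^{t-1})$, $T_i$, $O_j$, and $\tau$ carry out the prediction, while $O_i$ and the normalizer $\beta$ apply the correction from agent i's own observation.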

• Policy Computation

– Analogously to POMDPs (given the new belief update)
