Endgame Logistics

Posted on 23-Feb-2016



1

Endgame Logistics

- Final Project Presentations
  - Tuesday, March 19, 3-5, KEC2057
  - Powerpoint suggested (email to me before class)
    - Can use your own laptop if necessary (e.g. demo)
  - 10 minutes of presentation per project
    - Not including questions
- Final Project Reports
  - Due: Friday, March 22, 12 noon

2

[Diagram: an agent exchanging Percepts and Actions with the World, with an Objective and a "????" marking the technique in question. Dimensions of the decision problem:]

- perfect vs. noisy
- fully observable vs. partially observable
- instantaneous vs. durative
- deterministic vs. stochastic
- sole source of change vs. other sources
- concurrent actions vs. single action
- goal satisfaction vs. general reward
- known world model vs. unknown
- numeric vs. discrete

3

[Dimensions diagram repeated, labeled "STRIPS Planning"; this version reads "concurrent actions (but …) vs. single action" and "known world model vs. unknown vs. partial model".]

4

[Dimensions diagram repeated, labeled "MDP Planning".]

5

[Dimensions diagram repeated, labeled "Reinforcement Learning".]

6

[Dimensions diagram repeated, labeled "Simulation-Based Planning"; this version reads "known world model vs. unknown vs. simulator".]

7

[Dimensions diagram repeated, unlabeled.]

8

Numeric States

- In many cases states are naturally described in terms of numeric quantities
- Classical control theory typically studies MDPs with real-valued continuous state spaces
  - Typically assumes linear dynamical systems
  - Quite limited for most of the AI applications we are interested in (often a mix of discrete and numeric)
- Typically we deal with this via feature encodings of the state space
- Simulation-based methods, function-approximation RL, and policy gradient are agnostic about whether the state is numeric or not
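As a concrete sketch of such a feature encoding, the toy state below mixes numeric quantities (speed, fuel) with a discrete one (gear); all variable names, ranges, and bin counts here are invented for illustration:

```python
import numpy as np

def encode_state(speed, fuel, gear, n_bins=10, n_gears=5):
    """Encode a mixed numeric/discrete state as a binary feature vector
    by one-hot binning the numeric quantities (ranges assumed known)."""
    speed_bin = min(int(speed / 100.0 * n_bins), n_bins - 1)  # speed in [0, 100)
    fuel_bin = min(int(fuel * n_bins), n_bins - 1)            # fuel in [0, 1)
    phi = np.zeros(2 * n_bins + n_gears)
    phi[speed_bin] = 1.0                  # one-hot speed bin
    phi[n_bins + fuel_bin] = 1.0          # one-hot fuel bin
    phi[2 * n_bins + gear] = 1.0          # gear is already discrete
    return phi

phi = encode_state(speed=42.0, fuel=0.5, gear=2)  # 25-dim vector with three 1s
```

Linear function approximation or policy-gradient methods can then operate on phi without caring which components came from numeric variables.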

9

[Dimensions diagram repeated.]

10

Partial Observability

- In reality we only observe percepts of the world, not the actual state
- Partially Observable MDPs (POMDPs) extend MDPs to handle partial observability
  - Start with an MDP and add an observation distribution P(o | s): the probability of observation o given state s
  - We see a sequence of observations rather than a sequence of states
- POMDP planning is much harder than MDP planning; scalability is poor
- Can often apply RL in practice using features of the observations
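The observation distribution is what drives belief updates over the hidden state. A minimal Bayes-filter sketch for a toy two-state POMDP (the transition matrix T and observation matrix O are illustrative, for one fixed action):

```python
import numpy as np

T = np.array([[0.9, 0.1],   # T[s, s'] = P(s' | s, a) for one fixed action a
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],   # O[s, o] = P(o | s)
              [0.3, 0.7]])

def belief_update(b, obs):
    """Predict the belief through T, weight by the observation
    likelihood, and renormalize."""
    predicted = b @ T                  # P(s' | b, a)
    weighted = predicted * O[:, obs]   # multiply by P(obs | s')
    return weighted / weighted.sum()

b = belief_update(np.array([0.5, 0.5]), obs=0)  # belief shifts toward state 0
```

The belief vector b plays the role of the state for POMDP planning; in practice, RL methods often work directly on observation features instead of maintaining b exactly.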

11

[Dimensions diagram repeated.]

12

Other Sources of Change

- In many cases the environment changes even if no actions are selected by the agent
- Sometimes this is due to exogenous events, e.g. 911 calls coming in at random
- Sometimes it is due to other agents
  - Adversarial agents try to decrease our reward
  - Cooperative agents may be trying to increase our reward, or may have their own objectives
- Decision making in the context of other agents is studied in the area of game theory
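The exogenous-event case can be sketched as a world step in which new events arrive regardless of what the agent does; the dispatch action, call counts, and arrival probability below are all invented for illustration:

```python
import random

def world_step(pending_calls, action, rng):
    """One step of a hypothetical dispatch world: the agent's action may
    serve a call, but new 911 calls arrive at random either way."""
    if action == "dispatch" and pending_calls > 0:
        pending_calls -= 1
    # Exogenous change: each of 10 micro-intervals may produce a new call.
    arrivals = sum(1 for _ in range(10) if rng.random() < 0.03)
    return pending_calls + arrivals

result = world_step(5, "dispatch", random.Random(0))
```

The point is that the transition distribution folds in change the agent does not control, so even a "do nothing" action has stochastic outcomes.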

13

[Dimensions diagram repeated.]

14

Durative Actions

- Generally, different actions have different durations
  - Often durations are stochastic
- Semi-Markov MDPs (SMDPs) are an extension of MDPs that accounts for actions with probabilistic durations
  - The transition distribution changes to P(s', t | s, a), which gives the probability of ending up in state s' t time steps after taking action a in state s
- Planning and learning algorithms are very similar to those for standard MDPs; the equations are just a bit more complex to account for time
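The main change this makes to the Bellman backup is that each outcome is discounted by gamma**t for its sampled duration t. A toy backup sketch (the model, reward, and values are invented for illustration):

```python
gamma = 0.95

def smdp_q_backup(model, reward, V):
    """One SMDP Bellman backup: model maps (s_next, t) -> P(s_next, t | s, a),
    and each outcome is discounted by gamma**t for its duration t."""
    return reward + sum(p * gamma**t * V[s_next]
                        for (s_next, t), p in model.items())

V = {0: 0.0, 1: 10.0}
model = {(1, 1): 0.5, (1, 3): 0.5}  # reach state 1 after 1 or 3 time steps
q = smdp_q_backup(model, reward=1.0, V=V)
```

With all durations fixed at t = 1 this reduces to the ordinary MDP backup, which is why the planning and learning algorithms carry over so directly.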

15

[Dimensions diagram repeated.]

16

Durative Actions (repeat of slide 14)

17

[Dimensions diagram repeated.]

18

Concurrent Durative Actions

- In many problems we need to form plans that direct the actions of a team of agents
  - This typically requires planning over the space of concurrent activities, where the different activities can have different durations
- Can treat these problems as one huge MDP (SMDP) whose action space is the cross-product of the individual agents' action sets
  - Standard MDP algorithms break down, since this joint action space grows exponentially with the number of agents
- There are multi-agent or concurrent-action extensions to most of the formalisms we studied in class
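The cross-product blowup is easy to see concretely; the agent action set and team size below are invented for illustration:

```python
from itertools import product

agent_actions = ["north", "south", "east", "west", "wait"]
n_agents = 4

# The joint action space is the cross-product of each agent's actions:
# |A|**n grows exponentially in the number of agents.
joint_actions = list(product(agent_actions, repeat=n_agents))
print(len(joint_actions))  # 5**4 = 625 joint actions
```

Even four agents with five actions each already give 625 joint actions to consider in every state, which is why the standard algorithms stop scaling.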

19

[Dimensions diagram repeated.]

20

[Dimensions diagram repeated.]

21

[Dimensions diagram repeated, labeled "AI Planning".]

22

[Dimensions diagram repeated.]

23

[Dimensions diagram repeated, labeled "AI Planning".]

24

[Dimensions diagram repeated, labeled "AI Planning".]
