1 endgame logistics final project presentations tuesday, march 19, 3-5, kec2057 powerpoint...


Upload: mervyn-kennedy

Post on 28-Dec-2015



Endgame Logistics

Final Project Presentations
  Tuesday, March 19, 3-5, KEC2057
  Powerpoint suggested (email to me before class)
  Can use your own laptop if necessary (e.g. demo)
  10 minutes of presentation per project
    Not including questions

Final Project Reports
  Due: Friday, March 22, 12 noon


[Diagram labels: Percepts, Actions, World, Objective, and an element marked "????". The diagram lists the dimensions along which planning problems vary:]

  perfect vs. noisy
  fully observable vs. partially observable
  instantaneous vs. durative
  deterministic vs. stochastic
  sole source of change vs. other sources
  concurrent actions vs. single action
  goal satisfaction vs. general reward
  known world model vs. unknown
  numeric vs. discrete


STRIPS Planning

[Diagram: the same agent/world dimensions, labeled "STRIPS Planning". On this slide two entries read differently: "concurrent actions (but …) vs. single action" and "known world model vs. unknown vs. partial model".]


MDP Planning

[Diagram: the same agent/world dimensions, labeled "MDP Planning".]


Reinforcement Learning

[Diagram: the same agent/world dimensions, labeled "Reinforcement Learning".]


Simulation-Based Planning

[Diagram: the same agent/world dimensions, labeled "Simulation-Based Planning". On this slide the model entry reads "known world model vs. unknown vs. simulator".]


Numeric States

In many cases states are naturally described in terms of numeric quantities

Classical control theory typically studies MDPs with real-valued continuous state spaces
  Typically assume linear dynamical systems
  Quite limited for most applications of interest in AI (often a mix of discrete and numeric)

Typically we deal with this via feature encodings of the state space

Simulation-based methods are agnostic about whether the state is numeric or discrete
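
As a rough illustration of the feature-encoding idea above, the sketch below maps a mixed discrete/numeric state to a fixed-length vector and scores it with a linear value estimate. The State fields, the MODES tuple, and the function names are hypothetical, chosen only for this example; nothing here comes from the course materials.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mixed state: one discrete component and two numeric ones.
@dataclass(frozen=True)
class State:
    mode: str        # discrete, e.g. "idle" or "moving"
    position: float  # numeric
    fuel: float      # numeric

MODES = ("idle", "moving")  # assumed set of discrete values

def features(s: State) -> List[float]:
    """Encode a state as [bias, one-hot(mode)..., position, fuel]."""
    one_hot = [1.0 if s.mode == m else 0.0 for m in MODES]
    return [1.0] + one_hot + [s.position, s.fuel]

def value(weights: List[float], s: State) -> float:
    """Linear value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(s)))

s = State(mode="moving", position=2.0, fuel=0.5)
phi = features(s)
v = value([1.0, 0.0, 0.0, 1.0, 2.0], s)
```

The one-hot block handles the discrete part and the raw numbers handle the numeric part, so the same learner can consume both kinds of state component.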


Partial Observability

In reality we only observe percepts of the world, not the actual state

Partially-Observable MDPs (POMDPs) extend MDPs to handle partial observability
  Start with an MDP and add an observation distribution
  P(o | s): probability of observation o given state s
  We see a sequence of observations rather than a sequence of states

POMDP planning is much harder than MDP planning. Scalability is poor.

Can often apply RL in practice using features of observations
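
One standard way to use the observation distribution P(o | s) is to maintain a belief, a probability distribution over states, and update it after each action and observation. The sketch below is a minimal version of that update, assuming small discrete state and observation sets stored in dictionaries. The two-state "listen" example and all names are hypothetical and not taken from the slides.

```python
def belief_update(belief, action, obs, trans, obs_model):
    """Return the new belief after taking `action` and observing `obs`.

    belief:    dict state -> probability
    trans:     dict (state, action) -> dict next_state -> probability
    obs_model: dict state -> dict observation -> probability, i.e. P(o | s)
    """
    # Prediction step: push the belief through the transition model.
    predicted = {}
    for s, p in belief.items():
        for s2, pt in trans[(s, action)].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * pt
    # Correction step: weight each state by P(obs | state), then normalize.
    unnorm = {s2: obs_model[s2].get(obs, 0.0) * p for s2, p in predicted.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s2: p / total for s2, p in unnorm.items()}

# Hypothetical example: listening does not change the state,
# and the observation is correct 85% of the time.
trans = {("left", "listen"): {"left": 1.0}, ("right", "listen"): {"right": 1.0}}
obs_model = {
    "left": {"hear_left": 0.85, "hear_right": 0.15},
    "right": {"hear_left": 0.15, "hear_right": 0.85},
}
b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "hear_left", trans, obs_model)  # b1["left"] is about 0.85
```

The belief is a continuous object even when the underlying states are discrete, which is one reason exact POMDP planning scales poorly.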


Other Sources of Change

In many cases the environment changes even if no actions are selected by the agent

Sometimes due to exogenous events, e.g. 911 calls come in at random

Sometimes due to other agents
  Adversarial agents try to decrease our reward
  Cooperative agents may be trying to increase our reward or have their own objectives

Decision making in the context of other agents is studied in the area of game theory


Durative Actions

Generally different actions have different durations
  Often durations are stochastic

Semi-Markov decision processes (SMDPs) are an extension to MDPs that account for actions with probabilistic durations
  Transition distribution changes to P(s’, t | s, a), which gives the probability of ending up in state s’ in t time steps after taking action a in state s

Planning and learning algorithms are very similar to standard MDPs. The equations are just a bit more complex to account for time.
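
To make "a bit more complex to account for time" concrete, here is a hedged sketch of a one-step SMDP backup. It assumes a discount factor gamma, a value table V, and a lump-sum reward R(s, a) received when the action starts; none of these is specified on the slide. Under those assumptions the backup is Q(s, a) = R(s, a) + sum over (s’, t) of P(s’, t | s, a) * gamma^t * V(s’), so the only change from the usual MDP backup is that the discount depends on the duration t.

```python
def smdp_q_value(s, a, trans, reward, V, gamma):
    """One-step SMDP backup.

    trans:  dict (s, a) -> dict (next_state, duration) -> probability,
            i.e. the distribution P(s', t | s, a)
    reward: dict (s, a) -> lump-sum reward (an assumption of this sketch)
    V:      dict state -> current value estimate
    """
    expected_future = sum(
        p * (gamma ** t) * V[s2] for (s2, t), p in trans[(s, a)].items()
    )
    return reward[(s, a)] + expected_future

# Hypothetical example: "go" reaches s1 after 1 or 3 time steps, equally likely.
trans = {("s0", "go"): {("s1", 1): 0.5, ("s1", 3): 0.5}}
reward = {("s0", "go"): 1.0}
V = {"s1": 8.0}
q = smdp_q_value("s0", "go", trans, reward, V, gamma=0.5)  # 1 + 0.5*0.5*8 + 0.5*0.125*8 = 3.5
```

Setting every duration to t = 1 recovers the ordinary MDP backup.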


Concurrent Durative Actions

In many problems we need to form plans that direct the actions of a team of agents
  Typically requires planning over the space of concurrent activities, where the different activities can have different durations

Can treat these problems as a huge MDP (SMDP) where the action space is the cross-product of the individual agent actions
  Standard MDP algorithms will break, since the joint action space grows exponentially with the number of agents

There are multi-agent or concurrent-action extensions to most of the formalisms we studied in class
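
The cross-product construction above can be sketched in a few lines. The example is only an illustration with hypothetical agents and action names; it shows how quickly the joint action space grows when each agent's actions are combined.

```python
from itertools import product

def joint_actions(per_agent_actions):
    """Return every joint action: one individual action per agent."""
    return list(product(*per_agent_actions))

# Hypothetical team: 3 agents, each with the same 4 movement actions.
moves = ["north", "south", "east", "west"]
joint = joint_actions([moves] * 3)
num_joint = len(joint)  # 4 ** 3 = 64 joint actions
```

With k actions per agent and n agents there are k ** n joint actions, so any algorithm that enumerates actions at every state becomes impractical for even moderately sized teams.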


AI Planning

[Diagram: the same agent/world dimensions, labeled "AI Planning". The "numeric vs. discrete" entry does not appear on this slide.]
