1
Endgame Logistics
Final Project Presentations: Tuesday, March 19, 3-5, KEC2057
PowerPoint suggested (email to me before class)
Can use your own laptop if necessary (e.g. demo)
10 minutes of presentation per project, not including questions
Final Project Reports Due: Friday, March 22, 12 noon
2
[Figure: diagram with the labels Percepts, Actions, World, Objective, and "????", annotated with the problem dimensions listed below.]
perfect vs. noisy
fully observable vs. partially observable
instantaneous vs. durative
deterministic vs. stochastic
sole source of change vs. other sources
concurrent actions vs. single action
goal satisfaction vs. general reward
known world model vs. unknown
numeric vs. discrete
3
[Figure: the dimensions diagram from slide 2, labeled "STRIPS Planning". On this slide two entries read "concurrent actions (but …) vs. single action" and "known world model vs. unknown vs. partial model".]
4
[Figure: the dimensions diagram from slide 2, labeled "MDP Planning".]
5
[Figure: the dimensions diagram from slide 2, labeled "Reinforcement Learning".]
6
[Figure: the dimensions diagram from slide 2, labeled "Simulation-Based Planning". On this slide one entry reads "known world model vs. unknown vs. simulator".]
7
[Figure: the dimensions diagram from slide 2, repeated.]
8
Numeric States
In many cases states are naturally described in terms of numeric quantities
Classical control theory typically studies MDPs with real-valued continuous state spaces
It typically assumes linear dynamical systems
This is quite limited for most applications we are interested in in AI (often a mix of discrete and numeric)
Typically we deal with this via feature encodings of the state space
Simulation-based methods are agnostic about whether the state is numeric or discrete
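To make "feature encodings of the state space" concrete, here is a minimal sketch of encoding a mixed discrete/numeric state as a feature vector and scoring it with a linear value function. The `State` fields, the `LOCATIONS` list, and the scaling ranges are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical mixed state: one discrete and two numeric components."""
    location: str   # discrete component
    fuel: float     # numeric component, assumed to lie in [0, 100]
    speed: float    # numeric component, assumed to lie in [0, 50]


# Assumed set of discrete values, used for the one-hot part of the encoding.
LOCATIONS = ["depot", "road", "site"]


def features(s):
    """Return [bias, one-hot location..., scaled fuel, scaled speed]."""
    one_hot = [1.0 if s.location == loc else 0.0 for loc in LOCATIONS]
    return [1.0] + one_hot + [s.fuel / 100.0, s.speed / 50.0]


def linear_value(weights, s):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(s)))


s = State(location="road", fuel=50.0, speed=25.0)
print(features(s))
print(linear_value([1.0, 0.0, 2.0, 0.0, 4.0, -2.0], s))
```

The same encoding can feed a learned value function or policy, so the planner or learner never needs to treat discrete and numeric components differently.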
9
[Figure: the dimensions diagram from slide 2, repeated.]
10
Partial Observability
In reality we only observe percepts of the world, not the actual state
Partially-Observable MDPs (POMDPs) extend MDPs to handle partial observability
Start with an MDP and add an observation distribution P(o | s): the probability of observation o given state s
We see a sequence of observations rather than a sequence of states
POMDP planning is much harder than MDP planning. Scalability is poor.
Can often apply RL in practice using features of observations
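Since the agent sees observations rather than states, a common way to work with a POMDP is to track a belief (a probability distribution over states) and update it after each action and observation. The sketch below uses the standard Bayes-filter update with the observation distribution P(o | s) from the slide. The two-state model, its names, and its probabilities are made up for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes-filter belief update.

    b'(s2) is proportional to O[s2][o] * sum_s T[s][a][s2] * b(s), where
    T[s][a][s2] = P(s2 | s, a) and O[s2][o] = P(o | s2).
    """
    states = list(belief)
    # Prediction step: push the belief through the transition model.
    predicted = {
        s2: sum(T[s][action][s2] * belief[s] for s in states) for s2 in states
    }
    # Correction step: weight by the likelihood of the observation.
    unnorm = {s2: O[s2][observation] * predicted[s2] for s2 in states}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s2: p / z for s2, p in unnorm.items()}


# Hypothetical model: the hidden state does not change when we listen,
# and the observation is correct 85% of the time.
T = {
    "left": {"listen": {"left": 1.0, "right": 0.0}},
    "right": {"listen": {"left": 0.0, "right": 1.0}},
}
O = {
    "left": {"hear_left": 0.85, "hear_right": 0.15},
    "right": {"hear_left": 0.15, "hear_right": 0.85},
}

b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "hear_left", T, O)
print(b1)
```

Planning over beliefs is what makes POMDPs expensive: the belief space is continuous even when the underlying state space is small.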
11
[Figure: the dimensions diagram from slide 2, repeated.]
12
Other Sources of Change
In many cases the environment changes even if no actions are selected by the agent
Sometimes due to exogenous events, e.g. 911 calls come in at random
Sometimes due to other agents
Adversarial agents try to decrease our reward
Cooperative agents may be trying to increase our reward or have their own objectives
Decision making in the context of other agents is studied in the area of game theory
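As a small illustration of reasoning about an adversarial agent, the sketch below picks the action with the best worst-case reward (a maximin choice) in a one-shot game. It is restricted to pure strategies, whereas game theory in general also considers randomized (mixed) strategies. The payoff table and action names are invented for the example.

```python
def maximin_action(payoff):
    """Pick our action with the best worst-case reward.

    payoff[a][b] is our reward when we choose a and the adversary chooses b.
    Only pure (non-randomized) strategies are considered here.
    """
    worst_case = {a: min(row.values()) for a, row in payoff.items()}
    best = max(worst_case, key=worst_case.get)
    return best, worst_case[best]


# Hypothetical payoff table for illustration only.
payoff = {
    "patrol_north": {"attack_north": 2, "attack_south": -3},
    "patrol_south": {"attack_north": -1, "attack_south": 1},
}

print(maximin_action(payoff))
```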
13
[Figure: the dimensions diagram from slide 2, repeated.]
14
Durative Actions
Generally different actions have different durations
Often durations are stochastic
Semi-Markov MDPs (SMDPs) are an extension to MDPs that account for actions with probabilistic durations
The transition distribution changes to P(s', t | s, a), which gives the probability of ending up in state s' in t time steps after taking action a in state s
Planning and learning algorithms are very similar to standard MDPs. The equations are just a bit more complex to account for time.
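To show how the equations change to account for time, here is a minimal sketch of value iteration for a discounted SMDP. The only difference from the standard MDP backup is that the next-state value is discounted by gamma^t, where t is the sampled duration. The `model` format, the assumption that each outcome's reward is received as a lump sum when the action starts, and the tiny example problem are all illustrative choices, not taken from the slides.

```python
def smdp_value_iteration(model, gamma=0.9, iters=200):
    """Value iteration with duration-dependent discounting.

    model[s][a] is a list of (prob, next_state, duration, reward) tuples,
    i.e. a tabular form of P(s', t | s, a) plus a reward per outcome.
    Backup: V(s) = max_a sum_{s',t} P(s',t|s,a) * (r + gamma**t * V(s')).
    States with no actions are treated as terminal with value 0.
    """
    V = {s: 0.0 for s in model}
    for _ in range(iters):
        V = {
            s: max(
                (
                    sum(p * (r + gamma ** t * V[s2]) for p, s2, t, r in outcomes)
                    for outcomes in actions.values()
                ),
                default=0.0,
            )
            for s, actions in model.items()
        }
    return V


# Hypothetical problem: walking is free but slow, driving costs 1 but is
# faster, with a stochastic duration of 1 or 2 steps.
model = {
    "A": {
        "walk": [(1.0, "B", 4, 0.0)],
        "drive": [(0.5, "B", 1, -1.0), (0.5, "B", 2, -1.0)],
    },
    "B": {"finish": [(1.0, "end", 1, 10.0)]},
    "end": {},
}

V = smdp_value_iteration(model)
print(V)
```

Setting every duration to 1 recovers the ordinary MDP backup, which is why the algorithms carry over with little change.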
17
[Figure: the dimensions diagram from slide 2, repeated.]
18
Concurrent Durative Actions
In many problems we need to form plans that direct the actions of a team of agents
This typically requires planning over the space of concurrent activities, where the different activities can have different durations
Can treat these problems as a huge MDP (SMDP) where the action space is the cross-product of the individual agent actions
Standard MDP algorithms will break, since the joint action space grows exponentially with the number of agents
There are multi-agent or concurrent-action extensions to most of the formalisms we studied in class
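A quick sketch of why the cross-product action space is a problem: with k actions per agent and n agents there are k^n joint actions, and a standard MDP backup maximizes over all of them in every state. The action names below are placeholders.

```python
from itertools import product


def joint_actions(agent_actions):
    """All concurrent action combinations for a team (the cross-product).

    agent_actions is a list with one list of available actions per agent.
    """
    return list(product(*agent_actions))


# Hypothetical per-agent action set.
per_agent = ["move", "load", "unload", "wait"]

# Small teams can be enumerated explicitly.
print(len(joint_actions([per_agent] * 3)))

# For larger teams, just count: the size is k ** n.
for n in (1, 2, 5, 10):
    print(n, len(per_agent) ** n)
```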
19
[Figure: the dimensions diagram from slide 2, repeated.]
21
[Figure: the dimensions diagram from slide 2, labeled "AI Planning".]