BAMS 517 Decision Analysis: A Dynamic Programming Perspective
Martin L. Puterman, UBC Sauder School of Business, Winter Term 2011

TRANSCRIPT

Page 1: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

1

BAMS 517 Decision Analysis: A Dynamic Programming Perspective

Martin L. Puterman

UBC Sauder School of Business

Winter Term 2011

Page 2: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

2

Introduction to Decision Analysis - outline
Course info
Dynamic decision problem introduction
Decision problems and decision trees
  Single decision problems
  Multiple decision problems
Probability
Expected value decision making
Value of Perfect Information
Value of Imperfect Information
Utility and Prospect Theory
Finite Horizon Dynamic Programming

Page 3: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

3

Some dynamic decision problems

Assigning customers to tables in a restaurant
Deciding when to release an auction on eBay
Choosing the quantity to produce (inventory models)
Deciding when to start a medical treatment or accept an organ transplant
Playing Tetris
Deciding when to add capacity to a system
Advanced patient scheduling
Managing a bank of elevators
Deciding when to replace a car
Managing a portfolio
Deciding when to stop a clinical trial
Guiding a robot to a target
Playing golf

In each case there is a trade-off between immediate reward and uncertain long-term gain.

Page 4: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

4

Common ingredients of these dynamic decision problems
The problem persists over time.
The problem structure remains the same every period.
Current decisions impact future system behavior probabilistically.
A current decision may result in immediate costs or rewards.

These problems are all examples of Markov decision problems (MDPs), also called stochastic dynamic programs.
They were first formulated in the 1940s for problems in reservoir management (Masse) and sequential statistical estimation problems (Wald).
They were formalized in the 1950s by Bellman and Howard.
Theory was developed between 1960 and 1990.
Rediscovered in the 1990s by computer scientists:
  Reinforcement learning
  Approximate dynamic programming

Page 5: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

5

Basic Decision Analysis

Page 6: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

6

Decision Analysis
Goal: to understand how to properly structure, and then solve, decision problems of nearly any type.
Structuring the decision problem and obtaining the inputs is usually the hard part.
Once the right structure has been found, solving for the best course of action is usually straightforward.
We will be guided by mathematical and scientific principles. These principles will ensure that:
  Our decision-making is rational and logically coherent.
  We choose the best course of action based on our preferences for outcomes and the knowledge available at the time of the decision.
  We might not always be satisfied with the outcome, but we will be confident that the process we used was the best available.

Page 7: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

7

Decision Analysis
Our analysis will tell us what decision ought to be taken by a rational person, not what decision people actually tend to make in the same situation.
This is a normative (or prescriptive) analysis, rather than a descriptive analysis.
Many studies have shown that people do not always act rationally.
The methods we introduce provide a framework that translates your preferences for outcomes and your assessments of the likelihood of each consequence into a recipe for action.
  They place minimal requirements on your preferences and assessments.
  They do not impose someone else's values in place of your own.
We begin by exploring how to assess the likelihood of outcomes. We will discuss how to determine your preferences for outcomes in a few classes.

Page 8: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

8

Simple decision problems
The basic problem is to select an action from a finite set without knowing which outcome will occur.
In order to decide on the proper action, we need to:
  Quantify the uncertainty of future events: assign probabilities to the events.
  Evaluate and compare the "goodness" of the possible outcomes: assign utilities to the outcomes.
Once these are in place, we have fully specified the decision problem.

Page 9: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

Assessing Probabilities Through Decision Trees

9

Page 10: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

10

The election stock market problem

Suppose we are faced with the following opportunity on September 8, 2008. You can pay $.56, and if Obama wins the election you receive $1; if he loses you receive $0.
http://iemweb.biz.uiowa.edu/graphs/graph_Pres08_WTA.cfm

Decision: Invest $.56 or do not.
Uncertain event: Obama wins. Suppose this has probability q.

The election stock market problem is perhaps the simplest decision problem we will study. It contains, however, all the basic elements of many more complex problems.

Page 11: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

11

The election problem on September 8

Payoff (Gain)    Obama wins    Obama loses
Buy 1 share      1 (+.44)      0 (-.56)
Do not           0             0

Page 12: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

12

A decision tree for the election problem

Do not invest: $0
Buy 1 share:
  Obama wins (probability q): +$0.44
  Obama loses (probability 1-q): -$0.56

Page 13: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

13

Valuing gambles

Under certain conditions (to be discussed in class 3) it is advantageous to evaluate gambles by their mathematical expectation.

For the previous problem the expected value of the gamble would be

.44 q - .56 (1-q) = q - .56
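The expected-value calculation above can be sketched in a few lines of Python; this is an illustrative helper (the function name is mine, not from the slides).

```python
def expected_gamble_value(q):
    """Expected payoff of buying one share at $0.56:
    win +$0.44 with probability q, lose -$0.56 with probability 1-q."""
    return 0.44 * q - 0.56 * (1 - q)  # algebraically equal to q - 0.56
```

The break-even point, where the gamble is worth exactly $0, is q = 0.56, which is the calculation the next slide relies on.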

Page 14: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

14

Solving the election problem - a reduced problem

Do not invest: $0
Buy 1 share: $(q - .56)

We replace the gamble by its expectation - later we use the expected utility of the gamble.

Page 15: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

15

The election problem solution
Assume you will choose the decision which maximizes your expected payoff.
If you invest, your expected payoff is q - .56; if you do not, your expected payoff is 0.
Thus if you thought (on September 8) that q, the probability Obama wins, exceeded .56, you would invest; if not, you would not invest.
You would be indifferent when q = .56.
On September 8, the consensus (among investors in the Iowa Electronic Stock Market) probability of Obama winning was 0.56. Why?
Thus an electronic stock market provides an alternative to polls when predicting outcomes of random events, and a method for assessing probabilities.
Current markets: Wikipedia article (gives a comparison of accuracy compared to polls).

Page 16: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

16

Odds - definitions
If p is the probability an event occurs,

  o = p/(1-p)

is called the odds of the event occurring.

Often we consider l = ln(o) = ln(p/(1-p)), which is called the log-odds or logit of p.
Aside: this is a key ingredient in a logistic regression model:

  ln(p/(1-p)) = β0 + β1x

Thus the odds (on September 8) of Obama winning were

  o = .56/.44 = 1.27 (to one)
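The odds and logit definitions above translate directly into code; a small sketch (helper names are mine):

```python
import math

def odds(p):
    """Odds o = p / (1 - p) of an event with probability p."""
    return p / (1 - p)

def logit(p):
    """Log-odds l = ln(p / (1 - p)), the logit of p."""
    return math.log(odds(p))
```

For the September 8 market price, odds(0.56) gives about 1.27 (to one), and logit(0.5) is 0: even odds correspond to log-odds of zero.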

Page 17: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

17

Odds - Examples

On December 30, 2008, The Globe and Mail gave the following odds for various teams winning the Super Bowl:
  NY Giants 2 to 1
  Tennessee Titans 4 to 1
  Arizona Cardinals 40 to 1

They have the following meaning in this context: if you bet $1 on the Cardinals (on Dec 30) and they win the Super Bowl, you get back $41 for a net gain of $40.

The relation of these odds to probabilities can be determined using decision analysis. What is the implied odds-maker's probability q that Arizona wins the Super Bowl?

Page 18: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

18

A decision tree for the Super Bowl problem

Do not bet: $0
Bet on Arizona:
  Arizona wins (probability q): +$40
  Arizona loses (probability 1-q): -$1

Page 19: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

19

Solving the Super Bowl problem

Do not bet: $0
Bet on Arizona: $(41q - 1)

You would be indifferent between the two decisions if q = 1/41, or 1-q = 40/41.

Page 20: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

20

Odds and Bookmaker's Odds
Based on the decision tree and expectations, the probability of winning is 1/41.
So using the above definition of odds, the odds of winning are oW = (1/41)/(1 - 1/41) = 1/40 (to 1).
The odds of losing are oL = (40/41)/(1 - 40/41) = 40 (to 1).
Thus quoted odds for sports events are the odds of losing. Hence the odds (on December 30) that the Giants don't win the Super Bowl are 2 to 1, and the odds they win the Super Bowl are 1 to 2. The implied probability the Giants win the Super Bowl is 1/3.

Another interpretation (courtesy Wikipedia): "Generally, 'odds' are not quoted to the general public in the format (p/(1-p)) because of the natural confusion with the chance of an event occurring being expressed fractionally as a probability."

Example: suppose that you are told to pick a digit from 0 to 9. Then the odds are 9 to 1 against you choosing a 7. One way to think about this interpretation is that there are 10 outcomes, in 1 of which you succeed in picking a 7 and in 9 of which you don't.

This interpretation doesn't work for one-time events like the Super Bowl. I'll refer to these as "bookmaker's odds".
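The conversion from quoted (bookmaker's) odds to an implied win probability can be sketched as a one-line function; the name is mine, not from the slides.

```python
def implied_win_probability(against, in_favour=1.0):
    """Quoted sports odds 'a to b' are the odds of LOSING,
    so the implied probability of winning is b / (a + b)."""
    return in_favour / (against + in_favour)
```

For the Giants at 2 to 1 this gives 1/3, and for the Cardinals at 40 to 1 it gives 1/41, matching the decision-tree solution above.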

Page 21: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

21

Games of Chance and Odds

The payout on a successful bet on a single number is 35 to 1, plus the amount bet. The true bookmaker's odds are 37 to 1 on an American roulette wheel (with 0 and 00), assuming a fair wheel.

Page 22: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

22

A decision tree for a single number bet in roulette

Do not bet: $0
Bet $1 on 7:
  Ball stops on 7 (probability 1/38): +$35
  Lose (probability 37/38): -$1

Page 23: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

23

Solving the roulette problem

Do not bet: $0
Bet $1 on 7: -$0.0526
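The same expected-value computation applies to every bet in the table on the next slide; a small sketch (the function is mine) for a $1 bet on an American wheel with 38 spaces:

```python
def roulette_expected_value(payout_to_one, winning_spaces, total_spaces=38):
    """Expected value of a $1 bet: win `payout_to_one` dollars with
    probability winning_spaces/38, lose $1 otherwise."""
    p_win = winning_spaces / total_spaces
    return payout_to_one * p_win - (1 - p_win)
```

The straight-up bet gives 35*(1/38) - 37/38 = -2/38, about -$0.0526; the five-number bet (payout 6 to 1, five winning spaces) gives -3/38, about -$0.079, which is why it stands out in the table.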

Page 24: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

24

Bet name | Winning spaces | Payout | Odds against winning | Expected value (on a $1 bet)
0 | 0 | 35 to 1 | 37 to 1 | −$0.053
00 | 00 | 35 to 1 | 37 to 1 | −$0.053
Straight up | any single number | 35 to 1 | 37 to 1 | −$0.053
Row 00 | 0, 00 | 17 to 1 | 18 to 1 | −$0.053
Split | any two adjoining numbers, vertical or horizontal | 17 to 1 | 18 to 1 | −$0.053
Trio | 0, 1, 2 or 00, 2, 3 | 11 to 1 | 11.667 to 1 | −$0.053
Street | any three numbers horizontal (1, 2, 3 or 4, 5, 6 etc.) | 11 to 1 | 11.667 to 1 | −$0.053
Corner | any four adjoining numbers in a block (1, 2, 4, 5 or 17, 18, 20, 21 etc.) | 8 to 1 | 8.5 to 1 | −$0.053
Five Number Bet | 0, 00, 1, 2, 3 | 6 to 1 | 6.6 to 1 | −$0.079
Six Line | any six numbers from two horizontal rows (1, 2, 3, 4, 5, 6 or 28, 29, 30, 31, 32, 33 etc.) | 5 to 1 | 5.33 to 1 | −$0.053
1st Column | 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34 | 2 to 1 | 2.167 to 1 | −$0.053
2nd Column | 2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35 | 2 to 1 | 2.167 to 1 | −$0.053
3rd Column | 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36 | 2 to 1 | 2.167 to 1 | −$0.053
1st Dozen | 1 through 12 | 2 to 1 | 2.167 to 1 | −$0.053
2nd Dozen | 13 through 24 | 2 to 1 | 2.167 to 1 | −$0.053
3rd Dozen | 25 through 36 | 2 to 1 | 2.167 to 1 | −$0.053
Odd | 1, 3, 5, ..., 35 | 1 to 1 | 1.111 to 1 | −$0.053
Even | 2, 4, 6, ..., 36 | 1 to 1 | 1.111 to 1 | −$0.053
Red | 1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36 | 1 to 1 | 1.111 to 1 | −$0.053
Black | 2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35 | 1 to 1 | 1.111 to 1 | −$0.053
1 to 18 | 1, 2, 3, ..., 18 | 1 to 1 | 1.111 to 1 | −$0.053
19 to 36 | 19, 20, 21, ..., 36 | 1 to 1 | 1.111 to 1 | −$0.053

Page 25: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

25

Roulette

Based on the previous table, every $1 bet in roulette has an expected value of about negative $0.053 (the Five Number Bet is even worse, at −$0.079).

Thus roulette is an unfavorable game. Note there is research, based on dynamic programming, on how to play unfavorable games optimally.

But if you play many times, and the wheel is fair, you will lose money.

Why do people play?

Page 26: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

26

Money Lines or "odds sets"
Another way of expressing odds, used frequently for hockey and baseball betting.
The Globe and Mail (December 31, 2008): in an NHL game, the favorite Calgary has a line of -175 and the underdog Edmonton has a line of +155.
This means that if you want to bet on Calgary, you must bet $175 to win $100, and if you want to bet on Edmonton, you must bet $100 to win $155.
This implies P(Calgary) = 175/275 = 7/11 = .636 and P(Edmonton) = 100/255 = 20/51 = .392.
What's happening? What about ties?
Suppose the Calgary probability is correct (and ties are not possible). Then the probability of Edmonton winning should be 1 - .636 = .364, and the money line on Edmonton should be 100 × .636/.364 ≈ 175!
So "the House" is taking about $20 off the payout on a winning Edmonton bet. The same argument for Calgary implies ?
So again, as in roulette, "the House" takes a premium on every bet by reducing the payoff below the expected value of the gamble.
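The break-even probability implied by a money line follows from the same expected-value argument; a sketch (the function name is mine):

```python
def money_line_implied_probability(line):
    """Break-even win probability implied by a money line.
    Negative line -L: bet $L to win $100, so p = L / (L + 100).
    Positive line +L: bet $100 to win $L, so p = 100 / (100 + L)."""
    if line < 0:
        return -line / (-line + 100)
    return 100 / (100 + line)
```

Calgary at -175 gives 175/275 ≈ .636 and Edmonton at +155 gives 100/255 ≈ .392; the two implied probabilities sum to more than 1, and that excess is the house's premium.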

Page 27: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

27

Assigning probabilities to events

The uncertainty of an event will be measured according to its probability of occurrence.
For events that have been repeated several times and regularly observed, it's easy to assign a probability:
  Outcomes of gambling games: tossing coins, rolling dice, spinning roulette wheels, etc.
  Actuarial and statistical events: a 30-year-old female driver having an accident in the next year; the chance of rain tomorrow, given today's weather conditions; the number of cars driving over the Lion's Gate bridge tomorrow between 8 and 9 AM; the number of admits to the emergency room at VGH on January 7, 2009.

Page 28: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

28

Assigning probabilities to events
However, not all events occur with statistical regularity:
  General Motors will be bankrupt by July 1, 2010.
  A Democrat will win the 2012 US presidential election.
The uncertainty of an event often derives from a lack of precise knowledge:
  How many jellybeans are in a jar?
  Was W.L. MacKenzie King Prime Minister of Canada in 1936?
Or there is not much data available:
  Will a new medical treatment be effective in a specific patient?
Since these events cannot be repeated in any meaningful way, how can we assign a probability to their occurrence? We can rely on election stock markets or odds if they are available. What if they're not?

Page 29: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

29

Assigning probabilities to events
It is important to recognize that two different people in the same situation might assign two different probabilities to the same event.
A probability assignment reflects your personal assessment of the likelihood of an event - the uncertainty being measured is your uncertainty.
  Different people may have different knowledge about the event in question.
  Even people with the same knowledge could still differ in their opinion of the likelihood of an event.
  Someone could coherently assign a probability of ¼ to a coin coming up heads, if he/she had reason to believe the coin is not fair.
These are often called subjective probabilities. The assessment of subjective probabilities is a key topic in research on decision analysis (and forecasting).

Page 30: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

30

Assigning probabilities to events

Example: suppose we wished to assign a probability to the event "A thumbtack lands with its point up".
How could we find this probability? We could guess. Or we can gauge our belief in the likelihood of the event by comparing it to a set of "standard" statistical probabilities through a reference lottery.
We can compare the following two gambles to assess this probability:
  Choice A: Toss the thumbtack. If it lands point up, you win $1; otherwise you receive $0.
  Choice B: Spin the spinner. If it ends on blue, you win $1; otherwise you receive $0.
We can adjust the portion of the spinner that is blue until we are indifferent between the two choices.
This probability spinner provides a way of varying the blue portion systematically.
Spinner.exe

Page 31: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

31

The implied decision tree

Choice A:
  Thumbtack lands point up: +1
  Thumbtack lands "tip down": 0
Choice B:
  Spinner blue: +1
  Spinner red: 0

Page 32: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

Implications of using a reference lottery
If the spinner is set so that the probability of blue is .5 and you prefer A to B, then you believe the probability of "thumbtack up" is greater than .5.
If the spinner is set so that the probability of blue is .9 and you prefer B to A, then you believe the probability of "thumbtack up" is less than .9.
Repeating this can give a plausible range for the probability of "thumbtack up".
This is hard to do! There is a big literature on biases of such assignments.
Alternatively, we could construct a distribution of plausible values for this probability, and the likelihood of each of these values, instead of assigning one number. Or we could input our assessment into the decision problem and do sensitivity analysis.

32

Page 33: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

33

Another option - acquire information

Suppose you were faced with Choice A only. What would this gamble be worth?
One approach: provide a prior distribution on the probability of the event, p. Example: uniformly distributed on [0,1]. Base the decision on the mean, median or mode of this distribution.
Toss the thumbtack once, and use Bayes' theorem to update this probability.
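The update described above can be sketched with the standard conjugate (Beta-Bernoulli) shortcut: a Uniform[0,1] prior on p is a Beta(1,1) distribution, and one toss updates the Beta parameters by simple counting. This helper is mine, not from the slides.

```python
def beta_update(a, b, lands_up):
    """Conjugate Beta-Bernoulli update of a Beta(a, b) belief about
    p = P(thumbtack lands point up) after one observed toss."""
    return (a + 1, b) if lands_up else (a, b + 1)

a, b = 1, 1                      # Beta(1,1) = Uniform[0,1] prior
a, b = beta_update(a, b, True)   # one toss lands point up
posterior_mean = a / (a + b)     # Beta(2,1) has mean 2/3
```

A single point-up toss moves the mean estimate of p from 1/2 up to 2/3; each further toss shifts it by progressively less.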

Page 34: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

34

Assigning probabilities to events
Let E be an event, and let H represent the knowledge and background information used to make a probability judgment.
We denote the assigned probability as P(E | H), "the probability of event E given information H".
We do not consider probabilities as separate from the information used to assess them. This reflects the fact that we consider all probabilities to be based on the judgment of an individual and the individual's knowledge at the time of the assessment.
Even though we consider probabilities to be based on an individual's judgment, they cannot be arbitrarily assigned. Certain rules must be obeyed for the assignments to be coherent, and using the method outlined above to assign probabilities avoids incoherent assignments.

Page 35: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

35

Axioms of Probability

The probability assignments P(E | H) must obey the following basic axioms:
  0 ≤ P(E | H) ≤ 1
  (Addition law) Suppose that E1 and E2 are two events that could not both occur together (they are mutually exclusive). Then
    P(E1 or E2 | H) = P(E1 | H) + P(E2 | H)
  If E1 and E2 are mutually exclusive and collectively exhaustive, then
    P(E1 or E2 | H) = P(E1 | H) + P(E2 | H) = 1
  (Multiplication law) For any two events E1 and E2,
    P(E1 and E2 | H) = P(E1 | H) P(E2 | E1 and H)
  If E1 and E2 are independent (i.e., P(E2 | E1 and H) = P(E2 | H)), then
    P(E1 and E2 | H) = P(E1 | H) P(E2 | H)

These rules can be used to compute probability assignments for complex events based on those for simpler events.

Page 36: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

36

The law of total probability

This law can be derived from the axioms and the definition of conditional probability. It says that for any two events A and E,
  P(A | H) = P(A and E | H) + P(A and Ec | H)
           = P(A | E and H) P(E | H) + P(A | Ec and H) P(Ec | H)
This law is useful because it allows one to divide a complex event into subparts for which it may be easier to assess probabilities.
It also generalizes beyond conditioning on E and Ec: we can replace them by any set (or continuum) of events that partitions the sample space.
It is used widely in probability theory to compute complex probabilities and is fundamental for evaluating Markov chains.
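The partition form of the law is a one-line sum; a sketch with a hypothetical example (the function name and the rain/temperature numbers are mine, chosen for illustration):

```python
def total_probability(p_a_given_parts, p_parts):
    """P(A) = sum over a partition {E_i} of P(A | E_i) * P(E_i)."""
    assert abs(sum(p_parts) - 1.0) < 1e-9, "the E_i must partition the sample space"
    return sum(pa * pe for pa, pe in zip(p_a_given_parts, p_parts))

# Hypothetical: P(rain | cold) = .6, P(rain | warm) = .2, P(cold) = .3
p_rain = total_probability([0.6, 0.2], [0.3, 0.7])  # .6(.3) + .2(.7) = .32
```

The same function handles any finite partition, not just an event and its complement.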

Page 37: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

37

Bayes' rule
This is a very important rule that we will use extensively. It is a way to systematically include information in assessing probabilities.
To simplify notation, let's drop the conditioning on H and assume it is understood that all probabilities are conditional on this history.
Bayes' rule can be written as

  P(A | B) = P(A and B) / P(B) = P(B | A) P(A) / [ P(B | A) P(A) + P(B | Ac) P(Ac) ]

It is derived using the definition of conditional probability and the law of total probability.
It generalizes to any set of events that partitions the sample space.
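Bayes' rule for an event and its complement can be sketched as a small function (the name is mine); it is the same formula used on the jellybean slides that follow.

```python
def bayes(p_b_given_a, p_a, p_b_given_ac):
    """P(A | B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|Ac) P(Ac) ]."""
    p_ac = 1 - p_a
    return p_b_given_a * p_a / (p_b_given_a * p_a + p_b_given_ac * p_ac)
```

As a quick sanity check: if a coin is equally likely to be fair or two-headed and it shows heads, bayes(1.0, 0.5, 0.5) gives 2/3 for the two-headed hypothesis.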

Page 38: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

38

Updating probability assessments
Suppose that you can't see inside a jellybean jar containing only red and black beans, but I tell you that either 25% of the beans are red or 25% are black. You think these possibilities are equally likely.
Suppose I pick a bean at random. What is the probability it is red?
Now you draw 5 jellybeans from the jar with replacement, and find that 4 of them are red. How should you revise your belief in the probability that 25% of the beans are red, in light of this information?
Let A be the event "25% of beans in the jar are red" and let E be the event of drawing 5 beans and obtaining 4 red and 1 black.
We want to find P(A | E), and we know:
  P(A) = P(Ac) = 0.5
  P(E | A) = .75 (.25)^4 = 0.00293
  P(E | Ac) = .25 (.75)^4 = 0.0791
(These likelihoods are for one particular ordering of the draws; the binomial coefficient is the same for both and cancels in Bayes' rule.)

Page 39: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

39

Updating probability assessments
Using Bayes' rule, we now compute
  P(A | E) = .00293(0.5) / [.00293(0.5) + .0791(0.5)] = .0357
Thus, you should now believe that there is about a 3.5% chance that 25% of the jellybeans are red. You also think there is about a 96.5% chance that 25% of the beans are black.
Obviously, we have received strong evidence regarding the contents of the jar, since our beliefs have gone from complete uncertainty (50%) to high probability (96.5%).
Let's look at some of the terms involved in Bayes' rule:
  The expression P(A | E) is known as the "posterior" probability of A, i.e., the assessed probability of A after we learn that E has occurred.
  P(A) is known as the "prior" probability of A, i.e., the assessment of the probability of A before the new information was received.
  P(E | A) is known as the "likelihood" of E occurring given that A is true.

Page 40: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

40

Probabilities for single events
To follow on from the previous example, let's now ask what we think the probability is that the next bean drawn from the jar is red.
We assign a probability of .965 to there being 75% red beans in the jar, and a probability of .035 to there being 25% red beans in the jar.
  Let A = "75% of the beans in the jar are red"
  Let Ac = "25% of the beans in the jar are red"
  Let B = "The next bean drawn from the jar is red"
We use the law of total probability to compute P(B):
  P(B) = P(B | A) P(A) + P(B | Ac) P(Ac) = .75(.965) + .25(.035) = .733
P(B) is called a marginal probability. What was this probability before sampling?
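The marginal probability above, and its pre-sampling counterpart, are one-line total-probability sums (variable names are mine):

```python
p_a = 0.965                                   # posterior P(75% of beans are red)
p_b_after  = 0.75 * p_a + 0.25 * (1 - p_a)    # = .7325, about .733
p_b_before = 0.75 * 0.5 + 0.25 * 0.5          # prior marginal, before sampling
```

Before sampling the marginal is exactly .5, the average of .75 and .25 under equal prior weights, which answers the question the slide ends on.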

Page 41: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

41

The “Monty Hall” problem

Monty Hall was the host of the once-popular game show "Let's Make a Deal". In the show, contestants were shown three doors, behind each of which was a prize. The contestant chose a door and received the prize behind that door.
This setup was behind one of the most notorious problems in probability.
Suppose you are the contestant, and Monty tells you that there is a car behind one of the doors, and a goat behind each of the other doors. (Of course, Monty knows where the car is.)
Suppose you choose door #1.

Page 42: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

42

The “Monty Hall” problem

Before revealing what’s behind door #1, Monty says “Now I’m going to reveal to you one of the other doors you didn’t choose” and opens door #3 to show that there is a goat behind the door.

Monty now says: “Before I open door #1, I’m going to allow you to change your choice. Would you rather that I open door #2 instead, or do you want to stick with your original choice of door #1?”

What do you do?

Page 43: BAMS 517 Decision Analysis:  A Dynamic Programming Perspective

43

Summary

Sequential decision problems
Decision trees
Probability assessment, odds and gambling
Probability updating
Monty Hall and Hatton Realty for next time.