The Flaw of Averages 2018 - ASQ Mid-Hudson Section #302…
HOW OUR METHODS DOOM OUR ESTIMATES AND WHAT WE CAN DO ABOUT IT
DAVE NORTHCUTT, PRESIDENT
CATSKILL ANALYTICS
09-May-2018 COPYRIGHT © 2018, CATSKILL ANALYTICS, LLC 1
Agenda
◦ Overview
◦ How we often do estimation
◦ An example that hits close to home
◦ Understanding (and acknowledging) variation
◦ Creating estimates that reflect variation
◦ Examples
◦ Summary & Getting Started
Overview
Estimation is a necessary part of virtually every important business endeavor
◦ Time estimates
◦ Cost estimates
◦ Resource estimates
◦ Revenue estimates
In spite of its importance, we often do a poor job of estimating; examples are numerous in virtually every business
This raises a fundamental question: Is it just our estimates that are bad, or are the methods that we use to generate them flawed?
How We Often Do Estimation
Many of our estimation methods rely on using averages of historical data for input
Little or no formal analysis typically goes into the generation of these averages, which are often adversely affected by outliers or erratic properties in the underlying data
These averages are then typically used as input to spreadsheet models, which, by their very nature, are static analytical tools
Often, only a single scenario is generated to arrive at the required estimate
When multiple scenarios are generated, the number of scenarios is almost always limited to a “best case,” “average case,” and “worst case”
Unfortunately, plans based on average assumptions are rarely correct – real-world variability is the culprit
An Example That Hits Close to Home*
Suppose you have $200,000 in your retirement fund, and you want to know how much you can withdraw each year and have it last 20 years
Since its inception in 1952 until 2000, the Standard and Poor’s 500 Index has varied but has averaged about 14% per year, so let’s use 14% as the average growth
Using any standard annuity tool (like in Excel!), you can easily figure out that you should be able to withdraw about $32,000 per year
Clearly, the return will fluctuate over this period, but as long as it averages 14% you should be OK, right?
* - From The Flaw of Averages by Sam Savage, Oct. 8 2000 in the San Jose Mercury News (http://www.stanford.edu/~savage/flaw/Article.htm)
Wrong! Given typical levels of volatility in the stock market, there are only slim odds that your retirement fund will last 20 years!
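The effect can be sketched with a small simulation. This is only an illustration under stated assumptions, not Savage's model: yearly returns are drawn independently from a normal distribution with a 14% mean and a hypothetical 17% standard deviation, returns are credited before an end-of-year withdrawal, and the function names are made up for this sketch. Under these conventions the level withdrawal that a constant 14% return supports comes out near $30,000 rather than the slide's $32,000; the slide does not state its compounding or timing conventions.

```python
import random


def level_withdrawal(balance, rate, years):
    """End-of-year withdrawal that exactly exhausts `balance` after `years`
    when the fund earns the same `rate` every year (standard annuity formula)."""
    return balance * rate / (1 - (1 + rate) ** -years)


def years_fund_lasts(balance, withdrawal, returns):
    """Apply each year's return, then withdraw. Return how many full
    withdrawals were paid before the money ran out."""
    for year, r in enumerate(returns):
        balance *= 1 + r
        if balance < withdrawal - 1e-6:  # small tolerance for float rounding
            return year
        balance -= withdrawal
    return len(returns)


def chance_of_lasting(balance, withdrawal, years, mean, sd, trials=20_000, seed=1):
    """Share of simulated futures in which every withdrawal is paid.
    Assumption: independent Normal(mean, sd) yearly returns."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        returns = [max(-1.0, rng.gauss(mean, sd)) for _ in range(years)]
        if years_fund_lasts(balance, withdrawal, returns) == years:
            survived += 1
    return survived / trials


plan = level_withdrawal(200_000, 0.14, 20)
print(f"Withdrawal supported by a constant 14% return: {plan:,.0f}")
print("Years the plan lasts at a constant 14%:",
      years_fund_lasts(200_000, plan, [0.14] * 20))
print("Share of volatile futures that last 20 years:",
      chance_of_lasting(200_000, plan, 20, mean=0.14, sd=0.17))
```

The plan works by construction when every year returns exactly 14%, yet a large share of the simulated futures with the same average return run out of money early, because poor early years shrink the balance that later good years can act on.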
An Example That Hits Close to Home (too close?)
Simply using actual historical data and different starting periods illustrates the problem:
Start: 1973
◦ Average return 14%
◦ Lasts only 8 years
Start: 1974
◦ Average return 15.4%
◦ Still solvent after 20 years
Start: 1975
◦ Average return 15.4%
◦ Lasts only 13 years
Start: 1976
◦ Average return 15.3%
◦ Lasts only 10 years
The average return for all of these periods was at least 14%
The average return for a given 20-year interval is not a good predictor of success
What really drives the result is getting off to a good start, which is what sets 1974 apart from the rest
So, why doesn’t this work?
Why Don’t Average Inputs Yield Average Results?
Any result that depends on random variable inputs is known as a function of random variables
In general, using average values for uncertain inputs in a function of random variables does not result in the average value of the function itself. In mathematical terms:
$F(E(x)) \neq E(F(x))$
Using average values is only guaranteed to yield the average result when F is a linear function of random variables; this is almost never the case in real-world situations and complex spreadsheets
Even when the problem is linear, simply knowing the average result is of little value if we don’t also know the likely deviation from that average; it is very unlikely that the final outcome will be exactly the average value even in a linear system
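A quick numerical check makes the inequality concrete. The sketch below is a hypothetical illustration, not taken from the slides: demand is assumed uniform between 50 and 150 units, `sales` is a nonlinear function capped at an assumed capacity of 100, and `cost` is a linear function included for contrast.

```python
import random
from statistics import fmean

CAPACITY = 100  # hypothetical: the most units we can ship


def sales(demand):
    """Nonlinear in demand: we cannot sell more than capacity."""
    return min(demand, CAPACITY)


def cost(demand):
    """Linear in demand."""
    return 3 * demand + 2


rng = random.Random(42)
demands = [rng.uniform(50, 150) for _ in range(100_000)]
avg_demand = fmean(demands)

sales_at_avg = sales(avg_demand)               # F(E(x))
avg_sales = fmean(sales(d) for d in demands)   # E(F(x))
cost_at_avg = cost(avg_demand)                 # F(E(x)) for the linear case
avg_cost = fmean(cost(d) for d in demands)     # E(F(x)) for the linear case

print(f"Nonlinear: F(E(x)) = {sales_at_avg:.1f}, E(F(x)) = {avg_sales:.1f}")
print(f"Linear:    F(E(x)) = {cost_at_avg:.1f}, E(F(x)) = {avg_cost:.1f}")
```

Plugging average demand into the capped function overstates average sales, because demand above capacity cannot offset demand below it. The linear function gives the same answer either way.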
Understanding (and Acknowledging) Variation
Before we can account for the effects of variability in our estimates, we must acknowledge and understand the variability in our historical data
◦ Real-world data are variable, a fact that we too often choose to ignore
◦ There are two types of variation, common-cause and special-cause, and they must be correctly identified, as they have very different effects on a system
◦ Complex systems contain many sources of variation that may need to be accounted for
Statistical Process Control (SPC) has been shown to be an effective method for understanding variation and separating common-cause from special-cause variation
SPC can be used to:
◦ Determine if your data are predictable; if not, your estimates will be little more than guesses
◦ Determine the expected amount of variability in your data
◦ Determine if your data can be represented by a simple probability distribution or a more complex one
SPC Example
Are the following data “predictable,” and if so, what is the extent of the likely variability?
23.79092  13.93506  11.49281  19.57453  24.1235   23.6208
19.82282  12.16534  12.93379  13.20001  17.55429  15.01488
10.05668  17.70389  24.29298  22.09277  18.26579  18.14635
20.65692  20.89657  13.59006  18.08601  11.54941  22.37475
Creating an SPC chart can answer these questions.
[Chart: SPC chart of Duration (vertical axis, 0 to 40) by Sample (horizontal axis, 1 to 24)]
The data are predictable
The average is 17.7 and the likely range is between 5.9 and 29.5.
The chart virtually forces us to acknowledge and deal with the variability in the data!
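The slide does not say which kind of SPC chart was used. A minimal sketch, assuming an individuals (XmR) chart with the usual 2.66 scaling constant and assuming the 24 values are read row by row in time order, gives limits close to the 17.7, 5.9 and 29.5 quoted above. The function name `xmr_limits` is made up for this illustration.

```python
from statistics import fmean

# The 24 durations from the slide, read row by row (assumed time order).
durations = [
    23.79092, 13.93506, 11.49281, 19.57453, 24.1235, 23.6208,
    19.82282, 12.16534, 12.93379, 13.20001, 17.55429, 15.01488,
    10.05668, 17.70389, 24.29298, 22.09277, 18.26579, 18.14635,
    20.65692, 20.89657, 13.59006, 18.08601, 11.54941, 22.37475,
]


def xmr_limits(values):
    """Natural process limits for an individuals (XmR) chart:
    mean +/- 2.66 * average moving range between consecutive points."""
    center = fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * fmean(moving_ranges)
    return center - spread, center, center + spread


lower, center, upper = xmr_limits(durations)
outside = [v for v in durations if not lower <= v <= upper]

print(f"Average: {center:.1f}")
print(f"Likely range: {lower:.1f} to {upper:.1f}")
print("Points outside the limits:", outside)
```

This sketch only checks for points outside the limits; a fuller assessment of predictability would also examine the moving ranges themselves and look for runs or trends.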
Creating Estimates That Reflect Variation
Once the historical variation for each of the input factors is understood, it is possible to use that insight to create better estimates
Monte Carlo simulation allows us to build dynamic models—often from existing static models—that replace fixed assumptions with random inputs that we specify and control
These random inputs can be chosen from a variety of distributions that represent our best estimates of the shapes and ranges of those inputs
Completed models are then “run” for many iterations, and data are collected for all of the outcomes of interest.
These data allow us to compute the likelihood of any outcome from the model
Most Monte Carlo tools are simply add-ins to Microsoft® Excel, making it easy to enhance existing static models or create new ones
The Monte Carlo Method
1. Determine the Inputs & Outcomes
2. Consider the Variation (Time, Money, Effort, Quality)
3. Define the Relationships (Profit = Revenue - Cost)
4. Try it! 10,000 Times
5. Review the Results (Time, Effort)
[Chart: cumulative frequency (0% to 100%) versus Project Time (13 to 21), with the mean of Project Time marked]
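The five steps can be sketched in a few lines of plain Python. Only the relationship Profit = Revenue - Cost comes from the slide. The triangular distributions and their parameters are hypothetical placeholders, and the function names are made up for this illustration.

```python
import random
from statistics import fmean


def simulate_profit(trials=10_000, seed=7):
    """Steps 1 to 4: inputs are revenue and cost, the outcome is profit."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        # Step 2: hypothetical variation. Note the argument order used by
        # random.triangular: (low, high, mode).
        revenue = rng.triangular(80, 130, 100)
        cost = rng.triangular(60, 110, 80)
        # Step 3: the relationship between inputs and outcome.
        profits.append(revenue - cost)
    return profits


def chance_at_most(outcomes, threshold):
    """Step 5: share of simulated outcomes at or below a threshold."""
    return sum(o <= threshold for o in outcomes) / len(outcomes)


profits = simulate_profit()
print(f"Mean profit: {fmean(profits):.1f}")
print(f"Chance of a loss: {chance_at_most(profits, 0):.1%}")
print(f"Chance profit is 30 or less: {chance_at_most(profits, 30):.1%}")
```

Calling `chance_at_most` across a range of thresholds traces out a cumulative frequency curve like the one on the slide.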
Monte Carlo Insight
With a single point estimate, there is little insight into the nature of the expected results
With Monte Carlo Simulation, the possible outcomes and the respective likelihoods are generated through repeated sampling, giving a view of the range of possibilities
[Chart: Retirement Plan Outcomes. Cumulative frequency (0% to 100%) versus Years (4 to 20), with the mean of Years marked]
An Example: Estimating Project Timeline Risk
Project timelines are a common place where we often use static or near-static assumptions: each task is given an expected duration and, perhaps, a worst-case duration
In reality, the time to complete every task has a range of outcomes, and all of the task ranges must be considered to understand the risk—some tasks are related (correlated) as well
To compound the problem, task estimates are often optimistic, and when aggregated, give an overall estimate that is very unlikely
Replacing static estimates with probabilistic ones and using Monte Carlo simulation allows us to estimate the likelihood of the project being completed within any given timeframe
Such information is invaluable when trying to manage multiple projects as a portfolio
Example: Estimating Project Timeline Risk
Task A: 5 Weeks
Task B: 3 Weeks
Task C: 2 Weeks
Task D: 4 Weeks
Task E: 3 Weeks
Total: 15 Weeks
This simple project had a total time with static estimates of 15 weeks
I replaced each static estimate with a distribution of times, with the estimates above being most likely but somewhat optimistic
The total time exceeds 15 weeks most of the time
The project would almost certainly be late using the static estimate
[Chart: cumulative frequency (0% to 100%) versus Predicted Project Time (13.3 to 20.3), with the mean of Project Time marked]
How Does This Work?
The Excel Add-In allows us to replace standard “deterministic” formulas with “probabilistic” formulas. For example, instead of saying that the total time in the previous example is always
A + B + D + E, where A, B, C, D, and E are all cells with static values,
we can say that the total time is
max(A + B + C + E, A + B + D + E), where A, B, C, D, and E are all cells with probabilistic values as follows:
A: Triangular(4, 5, 6.5)
B: Triangular(2.5, 3, 4)
C: Triangular(1.5, 2, 4)
D: Triangular(2, 4, 6.5)
E: Triangular(2.5, 3, 4)
The software then “runs” the model thousands of times, keeping track of each result, to generate a cumulative distribution of the outcomes.
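The same model can be sketched outside Excel. The version below is an illustration rather than the add-in's implementation. It assumes each Triangular(a, b, c) above means (minimum, most likely, maximum) in weeks, and the function names are made up for this sketch. Python's `random.triangular` takes its arguments in the order (low, high, mode), so the values are reordered when sampling.

```python
import random
from statistics import fmean

# (minimum, most likely, maximum) in weeks, as listed on the slide.
TASKS = {
    "A": (4, 5, 6.5),
    "B": (2.5, 3, 4),
    "C": (1.5, 2, 4),
    "D": (2, 4, 6.5),
    "E": (2.5, 3, 4),
}


def sample_total(rng):
    """One run of the model: draw every task duration, then apply the formula."""
    d = {name: rng.triangular(low, high, mode)
         for name, (low, mode, high) in TASKS.items()}
    return max(d["A"] + d["B"] + d["C"] + d["E"],
               d["A"] + d["B"] + d["D"] + d["E"])


def simulate(trials=10_000, seed=2018):
    """Run the model many times and return the sorted project totals."""
    rng = random.Random(seed)
    return sorted(sample_total(rng) for _ in range(trials))


def chance_within(totals, weeks):
    """Share of simulated projects finished within `weeks`."""
    return sum(t <= weeks for t in totals) / len(totals)


totals = simulate()
print(f"Mean project time: {fmean(totals):.1f} weeks")
for weeks in (15, 16, 17, 18):
    print(f"Chance of finishing within {weeks} weeks: {chance_within(totals, weeks):.0%}")
```

Reading off `chance_within` for each candidate deadline gives the cumulative distribution that the software reports, and it shows how rarely the 15-week static estimate is met under these assumptions.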
Distribution of Project Times
[Chart: cumulative frequency (0% to 100%) versus Predicted Project Time (13.3 to 20.3), with the mean of Project Time marked]
Isn’t This Overly Simplistic?
Any model is, by its very nature, an abstraction of the reality it claims to represent
The goal is to capture the essential factors that have the biggest effect on the system in question – some detail will necessarily be lost
Verification (ensuring the model works as advertised) and Validation (ensuring the model gives accurate predictions) are necessary but often overlooked steps in any modeling activity
Making the variability in the process and the resulting estimations explicit is one of the biggest values of this method – we don’t tend to think probabilistically!
“All Models are Wrong, Some Are Useful.” - Dr. George E. P. Box
Summary
Because we often fail to identify and account for natural variability in our estimation process, our estimates fail to provide us with the insight we need to make sound business decisions
Using Statistical Process Control can help us to better understand the nature of the variation that is present in our historical data – the historical data are one of the primary determinants of the quality of our estimates
Using Monte Carlo Simulation can allow us to incorporate the variation present in all of our systems into the estimation process
The output from the simulation is a range of estimates and associated likelihoods rather than simply point estimates which are very unlikely to be correct
The result is significantly more insight into the nature of the situation and the opportunity to make informed decisions
How Can You Implement Monte Carlo Simulation?
To get started, you need a tool that allows you to build Monte Carlo simulations.
There are several software packages available for Excel; three popular ones are:
◦ SIPmath™ – This is the current version of Sam Savage’s toolset. It’s free and the models you build are shareable. www.probabilitymanagement.org
◦ Crystal Ball – Commercial, licensed product. Good feature set. www.oracle.com/applications/crystalball
◦ @Risk – Commercial, licensed product. Good feature set. http://www.palisade.com/risk
How Can You Implement Monte Carlo Simulation?
There are a lot of helpful tutorials on the internet for all of these packages. The www.probabilitymanagement.org site has numerous examples as well.
Some good introductory references:
◦ Understanding Variation: The Key to Managing Chaos, 2nd ed. by Donald J. Wheeler, 2000 – good basic understanding of SPC
◦ The Flaw of Averages by Sam L. Savage, 2012 – lots of examples and easy to read
◦ Introduction to Simulation and Risk Analysis, 2nd ed. by James R. Evans & David Louis Olson, 2001 – a more academic treatment, but very valuable reference
Thanks for Attending!
Any Questions?
If you have questions later, please feel free to contact me:
Dave Northcutt, President
Catskill Analytics
(908) 500-1196