Risk Management and Reliable Forecasting Using Un-Reliable Data

First Presented at Lean Kanban Central Europe, Hamburg. November 2014

Troy Magennis (Twitter: @t_magennis)

Get Slides: http://bitly.com/1E9Hh8l

Uploaded by focusedobjective, 30 June 2015


DESCRIPTION

To meet expectations and optimize flow, managing risk is an important part of Kanban. Anticipating and adapting to things that "go wrong", and the uncertainty they cause, is the topic of this session. We look at techniques for quantifying which risks should be considered important enough to deal with. Although discouraged, forecasting size, effort, staff and cost is sometimes necessary. Of course we should do as little of this as possible, but when we do, we have to do it well with the data we have available. Forecasting is made difficult by un-reliable information as inputs to our process: the amount of work is uncertain, the historical data we base our forecasts on is biased and tainted, and the situation seems hopeless. But it isn't. Good decisions can be made on imperfect data, and this session discusses how. It shows immediately usable, simple techniques to capture, analyze, cleanse and assess data, and then use that data for reliable forecasting. Second (and hopefully final) draft of the LKCE 2014 talk.

TRANSCRIPT

Page 1: Risk Management and Reliable Forecasting using Un-reliable Data (magennis) - LKCE 2014

Risk Management and Reliable Forecasting Using Un-Reliable Data

First Presented at Lean Kanban Central Europe, Hamburg. November 2014

Troy Magennis Twitter: @t_magennis

Get Slides: http://bitly.com/1E9Hh8l

Page 2

Don’t Follow the Light

Page 3

Question Current Approaches to…

• Estimation
• Forecasting
• Risk

Page 4

Sources of Forecast Risk

• Work
• Throughput
• Dependencies

Page 5

People

Page 6

People

• People are biased – intentionally and/or unintentionally

• In order to forecast and manage risk:
– We need good expert opinions
– We need to confirm these opinions against reality
– We need to learn from our forecast errors

• Often we get opinions based on only a fractional understanding of the problem eventually solved

Page 7

Not Getting Data (At All, or Early Enough)

Page 8

Getting Reliable Data from People

• Why would people take the time?
– We tell them (rarely works as intended)
– We politely ask them (works sometimes)
– We make it part of their self-interest (most likely)

• Gamification
• Challenge their view on fairness

• NEVER: Embarrass a team or individual
– you will totally destroy reliable data capture…

Page 9

Strategy 1 – “Gamify” Presentation

• Interactive charts get attention; vibrant colors for teams with good data

• Teams don’t like being “Red” (default to red; teams will make them green)

• Coloring teams dull (grey) based on poor-quality data capture often gets action

• Make it sexy. Show how “my” metric connects to strategy

[Diagram: Teams → Strategies → Features]

Page 10

Strategy 2 – Visibility to Decisions

• Operations Reviews! Giving meaning to data!
• Make it clear when data has led to decisions
– “Based on the data and analysis presented, this is clearly an opportunity we will pursue.”
– “Let’s track the first month’s actuals against the model and fully invest if it is tracking well.”

• Make it clear when more data would have “won”
– “If I could clearly see the impact of giving you those extra team members, this would be easy.”

• Promote lively debate around data
– React quickly if data presented is gamed or teams repeatedly fail against THEIR models

Page 11

Strategy 3 – Perceived Fairness

• One team gets some “extra” attention based on an argument supported by data
– Extra resources, more investment
– More time to demo

• With just a few examples, there is often an avalanche of willing metric support from others

• Make it clear why the data swayed a decision

Page 12

Uncertain Data Quality

Page 13

Checking for Gaming & Errors

• We can ask tougher questions
– What assumptions are built into this forecast?
• Why would we be 2x better than we ever have been before?
– Walk me through the logic supporting your analysis
– Looking at historical data, we predict very poorly when there are 3 or more dependent teams. Have you considered this?

• We can test for unlikely patterns
– Distribution analysis
– Benford’s Law

Page 14

Throughput per week

Evidence of data quality is a well-formed and explainable distribution shape

Customer: “Our data is crap. You can’t use any of it”

Page 15

Distribution Shape & Outliers

• Plot visually using a histogram
• Set a rule, e.g. >10 times the mode? (state it)

Mode is 3

50 & 100 are outliers worth discussing.
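The slide’s rule of thumb is simple enough to script. A minimal Python sketch (the function name is mine, not from the talk; the >10x-the-mode threshold is the slide’s example):

```python
from collections import Counter

def flag_outliers(samples, multiple=10):
    """Apply the slide's rule of thumb: flag any value more than
    `multiple` times the mode as an outlier worth discussing."""
    counts = Counter(samples)
    mode = counts.most_common(1)[0][0]  # most frequent value
    outliers = [v for v in samples if v > multiple * mode]
    return mode, outliers

# Lead times matching the slide: the mode is 3, so 50 and 100 are flagged
mode, outliers = flag_outliers([1, 2, 3, 3, 3, 3, 4, 5, 5, 7, 50, 100])
```

State the rule up front (“anything over 10 times the mode gets discussed”) so flagged values trigger a conversation, not automatic deletion.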

Page 16

Benford’s Law

• Benford's Law, also called the First-Digit Law, refers to the frequency distribution of digits in many real-life sources of data.

• Known to apply to: electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, …, and processes described by power laws.

Source: Wikipedia

Common in story counts per epic in software projects. Also probable in lead time / cycle time values.

Page 17

Benford’s Law Applied to Story Count

• Story count estimate for 48 randomly picked epics

• The frequency of the first digits was computed

• These were compared to Benford’s prediction (green = within 1.5%)

d    Benford’s Prediction P(d)    Actual Data P(d)
1    30.1%                        31.3%
2    17.6%                        18.8%
3    12.5%                        20.8%
4    9.7%                         8.3%
5    7.9%                         8.3%
6    6.7%                         8.3%
7    5.8%                         0%
8    5.1%                         4.2%
9    4.6%                         0%

Based on real data n = 48
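Benford’s Law predicts P(d) = log10(1 + 1/d) for leading digit d. The table’s comparison can be reproduced with a short Python sketch (function names are mine; the 1.5% tolerance is the slide’s):

```python
import math
from collections import Counter

def benford_expected(d):
    """Benford's predicted frequency for leading digit d (1..9)."""
    return math.log10(1 + 1 / d)

def benford_check(values, tolerance=0.015):
    """Compare observed first-digit frequencies against Benford's Law.
    Returns one (digit, expected, actual, within_tolerance) tuple per digit."""
    digits = [int(str(int(v))[0]) for v in values if v > 0]
    counts = Counter(digits)
    n = len(digits)
    return [
        (d, benford_expected(d), counts.get(d, 0) / n,
         abs(benford_expected(d) - counts.get(d, 0) / n) <= tolerance)
        for d in range(1, 10)
    ]
```

With n = 48 the observed frequencies are noisy (each epic shifts a digit’s share by about 2%), so treat large deviations as a prompt for questions, not proof of gaming.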

Page 18

Data Analysis Spreadsheet: https://github.com/FocusedObjective/FocusedObjective.Resources

Page 19

Data

Page 20

Forecasting using data without considering context

Page 21

Throughput Trend by Week

[Chart: throughput per week, W2-2012 through W15-2014; y-axis 0 to 1600; series: All, Enabling Spec, Bugs, NFRs]

Page 22

Throughput Trend by Week

[Chart: throughput per week, W2-2012 through W15-2014; y-axis 0 to 1600; series: All, Enabling Spec, Bugs, NFRs]

Page 23

Throughput Trend by Week

[Chart: throughput per week, W2-2012 through W15-2014; y-axis 0 to 1600; series: All, Enabling Spec, Bugs, NFRs; annotations: High Volatility, Decline?, Restructure?, Training? Coaches added, end-of-year break]

Page 24

Good Contextual Forecasting

• Know the past
– Track the date of significant company events
• Reorgs, releases, competitor releases, …
– Track reference data that may show context
• Staff numbers by date, national holidays
– Mark up all charts and data with context labels

• Consider the future
– What events are likely over the forecast period?
– Draw samples considering these contexts

Page 25

Some Context Events…

• Internal differences in team skills
• Any change (Hawthorne Effect)
• Change of risk profile
• Unstable WIP
• Poor quality
• Unstable test environment
• Seasons / vacations
• Executive re-org
• Natural disasters
• Exceptional sickness
• Changes in staff
• Team changes
• Location
• Environmental disturbance
• Morale shifts
• Process change
• Architectural change
• Fatigue (low work morale)
• Change of demand for different classes of service
• Accounting for expedites
• Changes in how we measure
• Poor record keeping
• Delivery frequency / cadence
• Org changes / staffing
• Gaming the system
• Mergers and acquisitions
• Multi-tasking
• High attrition rates
• Staff availability due to production issues
• Critical specialists not available
• Introducing new technology
• Technical architectural changes
• Legal requirements (date fixed)
• Beginning the project
• User stories too large
• Dependency identification
• Technical complexity
• External spot demands
• Changing prioritization
• Expedited work
• External dependencies
• Better coffee
• Relevant training
• Process changes
• Process problems moving tickets
• New management policy

Page 26

Forecasting using poor estimates from “Experts”

“Uncertain Uncertainty”

Page 27

Improving Estimates

Stop:
• Point estimates
• Ignoring uncertainty
• Thinking it’s easy
• “Never speak of this again”
• Inventing units (points)
• Rewarding gaming
• Tolerating ambiguity

Start:
• Using range estimates
• Expressing uncertainty
• Training & practicing estimation
• Learning with feedback
• Using dollars, time, counts
• Rewarding honesty
• Presenting unbiased data

Page 28

http://ccnss.org/materials/pdf/sigman/callibration_probabilities_lichtenstein_fischoff_philips.pdf

Page 29

Estimation Training

• How sure are you about guesses?
• This can be practiced
• Calibration – Trivia Game
– Ask a question about a known actual
– Ask people to guess the range
• “True or False: A hockey puck fits in a golf hole”
• “Confidence: Choose the probability that best represents your chance of getting this question right… 50% 60% 70% 80% 90% 100%”
– Disclose the result
– 50% (no idea) should get 50% of the questions right by guessing alone

Source: http://en.wikipedia.org/wiki/Calibrated_probability_assessment
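Scoring such a trivia game is straightforward: group answers by stated confidence and compare the observed hit rate. A hypothetical Python sketch (my construction, not from the talk):

```python
from collections import defaultdict

def calibration_report(answers):
    """answers: iterable of (stated_confidence, was_correct) pairs.
    Returns {confidence: observed fraction correct}. A well-calibrated
    estimator's observed fraction matches the stated confidence."""
    buckets = defaultdict(lambda: [0, 0])  # confidence -> [total, hits]
    for confidence, correct in answers:
        buckets[confidence][0] += 1
        buckets[confidence][1] += int(correct)
    return {c: hits / total for c, (total, hits) in sorted(buckets.items())}
```

Someone answering “90%” should be right about 9 times in 10; a persistent shortfall at high confidence is exactly the overconfidence the calibration literature linked above documents.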

Page 30

No Lead Time Data?

• No team yet? No history?
• We need two estimates with probabilities
– 1 in 5 tasks should take less than 1 day
– 4 in 5 tasks should take less than 5 days

• We need to solve the curve that fits these two probabilities (and hopefully the others)

Page 31

http://bit.ly/1tC1Phy

• Why lead time is Weibull, and why you care…

Page 32

80% <= 5 Days (4 in 5)

20% <= 1 Day (1 in 5)

How do we get experts to estimate ranges and predict higher order percentiles from two estimates?

Page 33

80% <= 5 Days

20% <= 1 Day

Two known points on the curve: (x1, p1) and (x2, p2). Solve for the Weibull shape and scale that fit both.

See detailed paper on the mathematics: http://www.johndcook.com/quantiles_parameters.pdf


Page 34

https://github.com/FocusedObjective/FocusedObjective.Resources

Excel formulas:

Shape: =(LN(-LN(1-p2_param))-LN(-LN(1-p1_param)))/(LN(x2_param)-LN(x1_param))

Scale: =x1_param/(POWER((-LN(1-p1_param)),(1/Shape_result)))

Percentile: =Scale_result*POWER(-LN(1-A27),1/Shape_result)
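The spreadsheet formulas translate directly to Python. A minimal sketch (function names are mine): given the deck’s two estimates (1 in 5 tasks within 1 day, 4 in 5 within 5 days) it solves shape ≈ 1.23 and scale ≈ 3.39, and puts the 95th percentile at about 8.29 days, the figure quoted a few slides later.

```python
import math

def weibull_from_percentiles(x1, p1, x2, p2):
    """Solve Weibull shape and scale from two quantile estimates,
    mirroring the spreadsheet formulas above. E.g. p1=0.20 of tasks
    take <= x1=1 day and p2=0.80 take <= x2=5 days."""
    shape = (
        (math.log(-math.log(1 - p2)) - math.log(-math.log(1 - p1)))
        / (math.log(x2) - math.log(x1))
    )
    scale = x1 / ((-math.log(1 - p1)) ** (1 / shape))
    return shape, scale

def weibull_quantile(p, shape, scale):
    """Inverse CDF: the value below which a fraction p of samples fall."""
    return scale * (-math.log(1 - p)) ** (1 / shape)

shape, scale = weibull_from_percentiles(1, 0.20, 5, 0.80)
p95 = weibull_quantile(0.95, shape, scale)  # about 8.29 days
```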

Page 35

Missing HUGE delays and workload beyond the 95th Percentile

Page 36

http://connected-knowledge.com/

Page 37

Long Tail Distribution Sampling

[Chart: long-tail distribution; good chance of samples in the body, low chance of samples in the tail]

Page 38

Hard to sample high-end percentiles…

• You find the high end quickly for a uniform distribution
– 12 samples (50% certain of finding the 90% range)

• Not so for a long-tail distribution (e.g. Weibull, shape 1.5)
– 88% never found after 1,000 trials; avg. 425 trials if lucky


From samples (likely in practice)

By Formula (NOT likely in practice)
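The point can be checked with a small simulation. The sketch below is my construction (it illustrates the effect rather than reproducing the slide’s exact figures): it draws Weibull samples with the slide’s shape 1.5 and counts how often a small sample ever touches the true 95th percentile. With 12 draws, roughly 54% of the time it never does.

```python
import math
import random

def weibull_sample(shape, scale, rng):
    """One Weibull draw via the inverse-CDF transform."""
    return scale * (-math.log(1 - rng.random())) ** (1 / shape)

def tail_hit_rate(shape, scale, n_samples, trials, seed=1):
    """Fraction of trials in which any of n_samples draws reaches the
    distribution's true 95th percentile (theoretically 1 - 0.95**n)."""
    rng = random.Random(seed)
    p95 = scale * (-math.log(1 - 0.95)) ** (1 / shape)
    hits = sum(
        any(weibull_sample(shape, scale, rng) >= p95 for _ in range(n_samples))
        for _ in range(trials)
    )
    return hits / trials
```

The miss probability 0.95**n holds for any continuous distribution; what makes long tails dangerous is the magnitude of what you miss. For Weibull shape 1.5 the 95th percentile is about 2.65x the median, so the unseen region is where the big risks live.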

Page 39

What is Risk?

95% <= 8.29 Days

Big Risks

How can we identify these?

Page 40

The RISK is out there…

Lazy

Page 41

Contact Details

www.FocusedObjective.com
Download the latest software, videos, presentations and articles on forecasting and applied predictive analytics

[email protected]
Email address for all questions and comments

@t_magennis
Twitter feed from Troy Magennis

Page 42

CASE STUDY: ESTIMATING TOTAL STORY COUNT

Do we have to break down EVERY epic to estimate story counts?

Page 43

Problem: Getting a high-level time and cost estimate for a proposed business strategy

Approach: Randomly sample epics from the 328 proposed and perform story breakdown. Then use throughput history to estimate time and costs.

Page 44

[Diagram: slips of paper with story counts for the sampled epics (9, 5, 13, 13, 11, 9, 13, …); each trial draws counts at random and sums the number of stories: Trial 1, Trial 2, … Trial 100]

Sample with replacement. Remember to put the piece of paper back in after each draw!
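The paper-slip exercise is a bootstrap, and it is easy to automate. A hypothetical Python sketch (names, seed and trial count are my choices, not from the talk): each trial draws one story count per epic, with replacement, from the epics actually broken down, sums the trial, and percentiles are read off the sorted trial sums.

```python
import random

def forecast_total_stories(sampled_counts, total_epics, trials=1000, seed=7):
    """Monte Carlo forecast of total story count: each trial draws
    `total_epics` story counts (with replacement) from the epics that
    were actually broken down, and sums them."""
    rng = random.Random(seed)
    sums = sorted(
        sum(rng.choice(sampled_counts) for _ in range(total_epics))
        for _ in range(trials)
    )
    def percentile(p):
        return sums[min(int(p * trials), trials - 1)]
    return percentile(0.50), percentile(0.75), percentile(0.95)
```

This is how the confidence-interval table on the next slide is produced: re-run it with only 24, 12 or 6 sampled epics to see how little breakdown work is needed for a usable range.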

Page 45

Epic Breakdown – Sample Count

Process          50% CI   75% CI   95% CI
MC 48 samples    261      282      315
MC 24 samples    236      257      292
MC 12 samples    223      239      266
MC 6 samples     232      247      268

Actual sum: 262

Facilitated by a well-known consulting company, the team performed story breakdown (counts) of epics. 48 (out of 328) epics were analyzed.

Page 46

PROBLEMS WITH NON-LINEAR SCALES

Page 47

Fibonacci Bias…

1 2 3 5 8 13 … 21

Team (3 of 130, 82% Median 5)        Median   Mean   SD
Team A: Process Change Team          5        4.4    3
Team B: UI Software Dev Team         5        5.4    6
Team C: Library Software Dev Team    5        5.7    5.5

Question: What is the middle value for this scale?
Perceived: 5. Mathematical: 10.5.

Being < 0 at MEAN – 1 SD should be an indicator something is wrong!

Page 48

Normal?

[Histogram vs. normal curve labels: Expect ~50% (below the mean), Expect ~35% (mean to +1 SD), Expect ~15% (beyond +1 SD)]

Page 49

Paper: Does the use of Fibonacci numbers in Planning Poker affect effort estimates?

“Conclusion: The use of a Fibonacci scale, and possibly other non-linear scales, is likely to affect the effort estimates towards lower values compared to linear scales. A possible explanation for this scale-induced effect is that people tend to be biased toward the middle of the provided scale, especially when the uncertainty is substantial. The middle value is likely to be perceived as lower for the Fibonacci than for the linear scale.”

https://www.simula.no/publications/Simula.simula.1282

R. Tamrakar and M. Jørgensen (2012)

Page 50

Really, really, know the question…

• What is the goal or question being asked?
• How is this question answered now?
– Good enough? Is it believed?
– Current cost OK?

• What data would be necessary to answer this question slightly better?
– Is the cost justified?
– Would the result be more reliable?

Page 51

Import/Cleaning Tools

• Re-runnable / automation
• Machine learning
• Importing
• Normalizing
• Imputing (estimating missing values)
• Visualization

Page 52

Spurious Correlations: http://tylervigen.com/

Page 53

Page 54

Correlation != Causation

• Criteria for causality
– The cause precedes the effect in sequence
– The cause and effect are empirically correlated and have a plausible interaction
– The correlation is not spurious

Sources: Kan, 2003, p. 80 and Babbie, 1986
(http://xkcd.com/552/, Creative Commons Attribution-NonCommercial 2.5 License)