Virtual Site Event: Predictive Analytics: What Managers Need to Know. Presented by: Paul Arnest, MS, MBA, PMP. February 11, 2015


Page 1: Virtual Site Event Predictive Analytics: What Managers ...pmibaltimore.org/pmi/events/attachments/9678200.pdf · Predictive Analytics: What Managers Need to Know Presented by: Paul

Virtual Site Event

1

Predictive Analytics:

What Managers Need to Know

Presented by:

Paul Arnest, MS, MBA, PMP

February 11, 2015

Virtual Site Ground Rules

The PMI Code of Conduct applies to this virtual presentation.

The Virtual Attendees are expected to:

Participate for a minimum of 40 minutes. Login information will be verified.

Answer the question pertaining to the presentation correctly in the survey in order to obtain the PDU credit (1).

Respond to the survey within 48 hours of participation (by Friday, February 13, 2015) in order to obtain the PDU credit.

2

Predictive Analytics

What Managers Need to Know

3

A NEW ENVIRONMENT

Predictive Analytics

4

Definition

Predictive Analytics: Techniques that

quantify potential outcomes or events

based on past data

Not descriptive analysis and descriptive

statistics

Not techniques that enable end-users to

perform individual data discovery or to

customize reports

5

Convergence

Once restricted to specialized statistics

organizations, advanced modeling

techniques are moving into the IT

mainstream

6

[Diagram labels: Stat/Analytics Shop, IT]

Concepts/Buzzwords

Machine learning ▫ Supervised learning ▫

Unsupervised learning

Response variable ▫ Target variable ▫

Dependent variable ▫ Left hand side

variable

Explanatory variable ▫ Independent

variable ▫ Right hand side variable

Logistic regression ▫ Random forest, etc.

Sensitivity ▫ Specificity

7

Tool independence

Predictive techniques use mathematical

algorithms that are independent of

particular tools

SAS, R, Stata, SPSS, many more

Use specialized tools for model

development

It is possible to implement models using general software tools, e.g., Java, .NET

8

Don’t be intimidated

Your stat/analysis package is

programmed to do the heavy math

You’ll discover that most internal stat shops

are using a small set of models and

techniques over and over again

Most of the work:

Understanding what you want to accomplish

Understanding the data

Organizing the data

9

Understand the results

Predictive analytics produce a probability

of a characteristic or behavior based on a

detailed analysis of past characteristics or

behaviors

Probability is ≤ 100% ≠ Certainty

Model accuracy depends on similarity of past

conditions to present

10

HOW IT WORKS

AND WHAT TO EXPECT

Predictive Analytics

11

Logistic regression

Workhorse procedure for predictive

analytics

Supervised technique

12

Step 1

Identify a known population that exhibits

the characteristic you want to predict —

‘dependent’, ‘target’ or ‘response’ variable

— plus a known population that does not

You may take the whole population (‘big

data’) or a sample

Use 80% or 90% of the sample as the

training data set

Withhold the remainder for validation
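
The training/validation split described in Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_train_validation`, the fixed seed, and the placeholder claim IDs are hypothetical and do not come from the presentation.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=42):
    """Shuffle the records and split them into training and validation sets.

    train_fraction=0.8 mirrors the slide's "80% or 90%" guidance; the seed
    only makes the shuffle repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample: 100 claim IDs standing in for real records.
claims = list(range(100))
train, validation = split_train_validation(claims)
print(len(train), "training records,", len(validation), "withheld for validation")
```

Shuffling before the cut matters when the records arrive in a meaningful order (for example by date), since an unshuffled split could put all recent cases in the validation set.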

13

Step 2

Construct a hypothesis (to be tested against the ‘null hypothesis’ that the selected variables have no effect)

Select variables expected to distinguish

target population — ‘independent’ or

‘explanatory’ variables

14

Step 3

Run a logistic regression against the

variables

Logistic regression will estimate a coefficient for each independent variable: the change in the log-odds of the dependent variable associated with that variable

15

Step 4

Test the hypothesis on the withheld

sample and the broader population

Caution:

It’s critical to identify the target

characteristics accurately

16

Logistic regression: targets

17

Target: Workers’ Compensation Fraudsters

Name       Target  High Incidence  Dr on CMS        High Risk   Psychological  Imperceptible
                   Organization    Ineligible List  Occupation  Impairment     Physical Impairment
Linda      1       1               1                1           1              1
Rebecca    1       1               1                1           0              1
Samuel     1       1               0                1           1              0
Stephen    1       0               0                0           1              1
Amanda     1       1               0                0           1              0
Hugh       1       0               1                0           0              1
Francesco  1       0               1                1           0              1
Allen      1       1               0                0           1              0
Eric       1       1               0                0           1              1
Gail       1       0               1                0           0              1
Joseph     1       1               1                1           0              0
Derek      1       1               1                0           1              0
Kevin      1       1               0                1           1              1

Logistic regression: general

18

General population of covered workers

Name       Target  High Incidence  Dr on CMS        High Risk   Psychological  Imperceptible
                   Organization    Ineligible List  Occupation  Impairment     Physical Impairment
Linda      0       1               1                1           1              1
Rebecca    0       0               0                1           0              1
Samuel     0       0               0                0           0              0
Stephen    0       0               0                0           0              1
Amanda     0       1               0                0           1              0
Hugh       0       0               1                0           0              1
Francesco  0       0               0                0           0              0
Allen      0       0               0                0           1              0
Eric       0       0               0                0           1              1
Gail       0       0               1                0           0              1
Joseph     0       0               0                1           1              0
Derek      0       0               1                0           0              0
Kevin      0       1               0                1           1              1
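
To make Step 3 concrete, the following Python sketch fits a logistic regression to the 26 rows of the two tables above, using plain gradient ascent on the log-likelihood. It is a hedged illustration, not the presenter's procedure: the function names, learning rate, and step count are assumptions, a statistics package would normally do this work, and the fitted values need not match the estimates reported in the deck.

```python
import math

# Factor columns, in table order: High Incidence Organization,
# Dr on CMS Ineligible List, High Risk Occupation,
# Psychological Impairment, Imperceptible Physical Impairment.
FRAUD = [
    (1, 1, 1, 1, 1), (1, 1, 1, 0, 1), (1, 0, 1, 1, 0), (0, 0, 0, 1, 1),
    (1, 0, 0, 1, 0), (0, 1, 0, 0, 1), (0, 1, 1, 0, 1), (1, 0, 0, 1, 0),
    (1, 0, 0, 1, 1), (0, 1, 0, 0, 1), (1, 1, 1, 0, 0), (1, 1, 0, 1, 0),
    (1, 0, 1, 1, 1),
]
GENERAL = [
    (1, 1, 1, 1, 1), (0, 0, 1, 0, 1), (0, 0, 0, 0, 0), (0, 0, 0, 0, 1),
    (1, 0, 0, 1, 0), (0, 1, 0, 0, 1), (0, 0, 0, 0, 0), (0, 0, 0, 1, 0),
    (0, 0, 0, 1, 1), (0, 1, 0, 0, 1), (0, 0, 1, 1, 0), (0, 1, 0, 0, 0),
    (1, 0, 1, 1, 1),
]
ROWS = FRAUD + GENERAL
LABELS = [1] * len(FRAUD) + [0] * len(GENERAL)


def sigmoid(z):
    """Convert a logit (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """weights[0] is the intercept; weights[1:] pair with the factors in x."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def log_likelihood(weights, rows, labels):
    """Log-likelihood of the labels under the model (higher is better)."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = predict(weights, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logistic(rows, labels, learning_rate=0.5, steps=5000):
    """Maximum likelihood fit by full-batch gradient ascent (assumed settings)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = y - predict(weights, x)
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        weights = [w + learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


weights = fit_logistic(ROWS, LABELS)
print("intercept and coefficients:", [round(w, 4) for w in weights])
```

Each fitted coefficient is on the log-odds scale, which is why the later slides convert the sum of terms to a probability before interpreting it.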

Results

Maximum Likelihood Estimates:

logit(fraud likelihood) = −1.9884 (intercept) + 2.1370 (multiple cases) + 1.2356 (CMS ineligible) + 0.3784 (rep disciplined) + 0.1877 (psychological) + 0.4805 (imperceptible physical)

19

Interpretation

Positive coefficients mean all factors

contribute to likelihood of fraud

Coefficients reflect the actual weight the

model places on each factor

Intercept (−1.9884) means this model

predicts a 12% likelihood of fraud if no

modeled factors present

20

Test of model accuracy

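C-statistic = 0.814: the probability that the model scores a randomly chosen target case higher than a randomly chosen non-target case (0.5 is no better than chance)

≥0.70 indicates an acceptable model

≥0.80 indicates a strong model; the closer to 1 the better

Equal to the area under the ROC curve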
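
As a rough illustration of what the c-statistic measures, the sketch below computes it by brute force: compare every target case's score with every non-target case's score and count how often the target scores higher. The function name and the six scores are hypothetical and chosen only for illustration.

```python
def c_statistic(target_scores, non_target_scores):
    """Share of (target, non-target) pairs in which the target scores higher.

    Ties count as half a win. The result equals the area under the ROC curve.
    """
    wins = 0.0
    for target in target_scores:
        for other in non_target_scores:
            if target > other:
                wins += 1.0
            elif target == other:
                wins += 0.5
    return wins / (len(target_scores) * len(non_target_scores))


# Hypothetical model scores for three known targets and three non-targets.
fraud_scores = [0.9, 0.8, 0.4]
clean_scores = [0.7, 0.3, 0.2]
print(round(c_statistic(fraud_scores, clean_scores), 3))
```

Here eight of the nine pairs are ranked correctly; only the target scored 0.4 falls below the non-target scored 0.7.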

21

Considerations

Accuracy only as good as the target

population sample

Sum of the terms = ‘logit’ (log-odds) of the model's predicted probability; exponentiating it gives the odds that a claim is fraudulent

Conversion of the summed terms, logit(p), to a probability:

$$p = \frac{1}{1 + e^{-\operatorname{logit}(p)}}$$
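
The conversion between probability and logit can be written as a pair of small Python functions. The names `logit` and `inv_logit` are illustrative choices, not something the presentation specifies.

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(value):
    """Inverse of logit: p = 1 / (1 + e^(-value))."""
    return 1.0 / (1.0 + math.exp(-value))


print(round(logit(0.25), 4))   # log-odds when p = 0.25
print(round(inv_logit(0.0), 2))  # a logit of 0 is even odds
```

The two functions undo each other, so `inv_logit(logit(p))` returns `p` up to floating-point rounding.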

22

Logit transformation

p     logit(p)   p     logit(p)   p     logit(p)   p     logit(p)
0.01  -4.5951    0.26  -1.0460    0.51  0.0400     0.76  1.1527
0.02  -3.8918    0.27  -0.9946    0.52  0.0800     0.77  1.2083
0.03  -3.4761    0.28  -0.9445    0.53  0.1201     0.78  1.2657
0.04  -3.1781    0.29  -0.8954    0.54  0.1603     0.79  1.3249
0.05  -2.9444    0.30  -0.8473    0.55  0.2007     0.80  1.3863
0.06  -2.7515    0.31  -0.8001    0.56  0.2412     0.81  1.4500
0.07  -2.5867    0.32  -0.7538    0.57  0.2819     0.82  1.5163
0.08  -2.4423    0.33  -0.7082    0.58  0.3228     0.83  1.5856
0.09  -2.3136    0.34  -0.6633    0.59  0.3640     0.84  1.6582
0.10  -2.1972    0.35  -0.6190    0.60  0.4055     0.85  1.7346
0.11  -2.0907    0.36  -0.5754    0.61  0.4473     0.86  1.8153
0.12  -1.9924    0.37  -0.5322    0.62  0.4895     0.87  1.9010
0.13  -1.9010    0.38  -0.4895    0.63  0.5322     0.88  1.9924
0.14  -1.8153    0.39  -0.4473    0.64  0.5754     0.89  2.0907
0.15  -1.7346    0.40  -0.4055    0.65  0.6190     0.90  2.1972
0.16  -1.6582    0.41  -0.3640    0.66  0.6633     0.91  2.3136
0.17  -1.5856    0.42  -0.3228    0.67  0.7082     0.92  2.4423
0.18  -1.5163    0.43  -0.2819    0.68  0.7538     0.93  2.5867
0.19  -1.4500    0.44  -0.2412    0.69  0.8001     0.94  2.7515
0.20  -1.3863    0.45  -0.2007    0.70  0.8473     0.95  2.9444
0.21  -1.3249    0.46  -0.1603    0.71  0.8954     0.96  3.1781
0.22  -1.2657    0.47  -0.1201    0.72  0.9445     0.97  3.4761
0.23  -1.2083    0.48  -0.0800    0.73  0.9946     0.98  3.8918
0.24  -1.1527    0.49  -0.0400    0.74  1.0460     0.99  4.5951
0.25  -1.0986    0.50  0.0000     0.75  1.0986

23

If all factors present, logit(p) = −1.9884 + 2.1370 + 1.2356 + 0.3784 + 0.1877 + 0.4805 = 2.4308, which corresponds to about a 92% probability of fraud
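
The same arithmetic can be scripted so that any combination of factors is scored, not only the all-factors case. This sketch uses the intercept and coefficients from the Results slide with the factor labels as printed there; the function and variable names are hypothetical.

```python
import math

# Estimates as reported on the Results slide (labels copied from that slide).
INTERCEPT = -1.9884
COEFFICIENTS = {
    "multiple cases": 2.1370,
    "CMS ineligible": 1.2356,
    "rep disciplined": 0.3784,
    "psychological": 0.1877,
    "imperceptible physical": 0.4805,
}


def claim_logit(factors_present):
    """Intercept plus the coefficient of every factor present on the claim."""
    return INTERCEPT + sum(COEFFICIENTS[name] for name in factors_present)


def claim_probability(factors_present):
    """Convert the claim's logit to a probability: p = 1 / (1 + e^(-logit))."""
    return 1.0 / (1.0 + math.exp(-claim_logit(factors_present)))


all_factors = list(COEFFICIENTS)
print(round(claim_logit(all_factors), 4), round(claim_probability(all_factors), 2))
print(round(claim_probability([]), 2))  # no modeled factors present
```

Scoring an empty list of factors leaves only the intercept, which is the baseline case discussed on the Interpretation slide.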

LR weaknesses

All potential fraud factors combined into a

single equation

With many independent predictor variables,

characteristics can cancel each other out

Logistic regression does not capture interactions between individual variables on its own

Interaction terms must be programmed explicitly

Requires additional data manipulation
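
"Programmed explicitly" usually means adding a new column that is the product of two existing ones before fitting the model. A minimal sketch, assuming binary 0/1 factors stored as tuples; the helper name `add_interaction` and the sample rows are hypothetical.

```python
def add_interaction(rows, first, second):
    """Append a column equal to the product of two existing columns.

    For 0/1 factors the new column is 1 only when both factors are present.
    """
    return [tuple(row) + (row[first] * row[second],) for row in rows]


# Hypothetical rows with three binary factors each.
rows = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(add_interaction(rows, 0, 1))
```

The model then estimates a separate coefficient for the combined column, which is the extra data manipulation the slide refers to.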

24

LR weaknesses (ctd)

In rare-event modeling with a large

number of predictive variables, logistic

regression can produce many false

positives

Difficult to differentiate rare events from

normal events when the rare events occur

with extremely low frequency

Boosting the sensitivity of the model is a bad solution: it catches more rare events but adds even more false positives

25

Other supervised methods

Decision tree mitigates the problem of

numerous weak predictors overwhelming

a strong predictor (logistic regression)

Sorts observations of the dependent

variable into buckets corresponding to its

available classification values

Conditional selection into paths (‘branches’)

Priority determined by frequency of

characteristics

26

Decision tree example

27

[Figure: decision tree. The root splits on High Incidence Organization: 4F/10N where the characteristic is absent, 9F/3N where it is present (F = fraud, N = no fraud). Lower levels split on Doctor on CMS Ineligible List, High Risk Occupation, Psychological Impairment, and Imperceptible Physical Impairment; nodes are marked as purity, imperfect purity, or tie. The counts at the lower levels are not recoverable from the extracted text.]

Left-Facing Arrows: Value = Characteristic is absent

Right-Facing Arrows: Value = Characteristic is present

0 = No Fraud

1 = Fraud

Misclassification Rate = 23.08%
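
To show how a split is judged, the sketch below scores the first split of the tree (4F/10N when High Incidence Organization is absent, 9F/3N when present) using Gini impurity and a majority-vote misclassification rate. Both measures are common choices but are assumptions here; the deck does not say which criterion its tool used.

```python
def gini(fraud, non_fraud):
    """Gini impurity of a node: 0.0 is perfectly pure, 0.5 is an even tie."""
    total = fraud + non_fraud
    if total == 0:
        return 0.0
    share = fraud / total
    return 2.0 * share * (1.0 - share)


def misclassification_rate(branches):
    """Error rate if every branch predicts its majority class."""
    total = sum(fraud + non_fraud for fraud, non_fraud in branches)
    errors = sum(min(fraud, non_fraud) for fraud, non_fraud in branches)
    return errors / total


# (fraud, non-fraud) counts on each side of the first split.
root_split = [(4, 10), (9, 3)]
for fraud, non_fraud in root_split:
    print(fraud, non_fraud, round(gini(fraud, non_fraud), 4))
print(round(misclassification_rate(root_split), 4))
```

On its own the first split misclassifies 7 of the 26 observations. The full tree's reported 23.08% corresponds to 6 of 26, so the lower levels correct one more case.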

Beyond decision tree

Decision tree may overweight high-

frequency but insignificant characteristics

Boosted decision tree and random forest

are techniques to improve on the results

of the basic algorithm based on

misclassification rates

Neural networks learn weighted combinations of the input variables, adjusting the weights to reduce misclassification

28

Unsupervised methods

K-means cluster

Loosely comparable to logistic regression, but with no target variable: the procedure groups observations rather than predicting a known outcome

Identify a set of independent variables

Transformations likely required, as above

Procedure tries to identify a set of distinct ‘clusters’ based on the selected variables

Can tease out meaningful characteristics
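
A bare-bones version of the k-means procedure is sketched below: assign each observation to its nearest cluster center, move each center to the mean of its members, and repeat. The starting centers, the iteration count, and the six two-variable points are hypothetical; real packages choose starting points and stopping rules more carefully.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing cluster centers."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Update step: each center moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's center
        centroids = updated
    return labels, centroids


# Hypothetical observations described by two numeric variables.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

Because distance drives the grouping, variables on very different scales usually need rescaling first, which is one reason the slide notes that transformations are likely required.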

29

SOME BEST PRACTICES

IN DATA MANAGEMENT

Predictive Analytics

30

Data best practices

Understand your data

What does it represent

How does it enter your data warehouse

Check data for suitability

Missing values?

Do target and individual predictors correlate?

Ensure that data cleansing and

transformation steps are documented

and repeatable for model re-estimation
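
Two of the suitability checks above, missing values and correlation between the target and a predictor, can be scripted so they are repeatable each time the model is re-estimated. A minimal sketch, assuming records held as dictionaries with `None` marking a missing value; the field names and sample records are hypothetical.

```python
import math


def missing_counts(records, columns):
    """Number of records with a missing (None) value in each column."""
    return {
        column: sum(1 for record in records if record.get(column) is None)
        for column in columns
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical claim records; None marks a missing value.
records = [
    {"fraud": 1, "high_risk_occupation": 1},
    {"fraud": 1, "high_risk_occupation": 1},
    {"fraud": 0, "high_risk_occupation": 0},
    {"fraud": 0, "high_risk_occupation": None},
]
print(missing_counts(records, ["fraud", "high_risk_occupation"]))

# Correlate only the records where both values are present.
complete = [r for r in records if None not in r.values()]
print(pearson([r["fraud"] for r in complete],
              [r["high_risk_occupation"] for r in complete]))
```

Keeping checks like these in a script rather than running them by hand is one way to meet the slide's requirement that cleansing steps be documented and repeatable.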

31

Counterintuitive-ness

The more independent variables, the less

predictive value each individual variable,

or characteristic, has, on average

32

Counterintuitive-ness (ctd)

In rare event modeling, even a very accurate model can produce a disproportionately large number of false positives

Example: Target population 1% in a

population of 1,000,000 (10,000 targets).

If predictive model has a 10% false positive rate (90%

accurate):

33

Target                    General population
10,000                    990,000
True positives: 9,000     True negatives: 891,000
False negatives: 1,000    False positives: 99,000
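
The arithmetic behind this example can be reproduced in a short Python sketch, which also shows what share of flagged cases are real targets. The function name and the use of whole-number percentages are illustrative choices; the population figures are the ones in the example.

```python
def confusion_counts(population, target_pct, sensitivity_pct, false_positive_pct):
    """Expected counts of true/false positives and negatives.

    All rates are whole-number percentages so the arithmetic stays exact.
    """
    targets = population * target_pct // 100
    non_targets = population - targets
    true_positives = targets * sensitivity_pct // 100
    false_negatives = targets - true_positives
    false_positives = non_targets * false_positive_pct // 100
    true_negatives = non_targets - false_positives
    return true_positives, false_negatives, false_positives, true_negatives


tp, fn, fp, tn = confusion_counts(1_000_000, target_pct=1,
                                  sensitivity_pct=90, false_positive_pct=10)
sensitivity = tp / (tp + fn)   # share of real targets the model catches
specificity = tn / (tn + fp)   # share of non-targets correctly cleared
precision = tp / (tp + fp)     # share of flagged cases that are real targets
print(tp, fn, fp, tn)
print(round(precision, 4))
```

Although the model is right 90% of the time on both groups, fewer than one in ten of the cases it flags is a true target, because the non-target group is 99 times larger.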

Takeaways for success

1. Clearly identify target variable

2. Limit predictor variables

3. Know the model data and manage it; data management is most of the work

4. Know how to measure model performance

5. Set goals and expectations for the model

6. Monitor model performance and adjust/re-estimate as necessary

34

Thank you/Questions

Paul Arnest

35