1 chapter 4: introduction to predictive modeling: regressions 4.1 introduction 4.2 selecting...

101
1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

Upload: elinor-douglas

Post on 04-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

1

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 2: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

2

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 3: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

3

Model Essentials – Regressions

Predict new cases.

Select useful inputs.

Optimize complexity.

...

Page 4: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

4

Model Essentials – Regressions

Best modelfrom sequence

Sequentialselection

Predict new cases.

Select useful inputs

Optimize complexity

Select useful inputs.

Optimize complexity.

...

Page 5: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

6

Linear Regression Prediction Formula

parameterestimate

inputmeasurement

interceptestimate

= w0 + w1 x1 + w2 x2 ^ ^ ^y · · prediction

estimate^

Choose intercept and parameter estimates to minimize:

∑( yi – yi )2

trainingdata

^squared error function

...

Page 6: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

7

Linear Regression Prediction Formula

parameterestimate

inputmeasurement

interceptestimate

= w0 + w1 x1 + w2 x2 ^ ^ ^y · · prediction

estimate^

Choose intercept and parameter estimates to minimize.

∑( yi – yi )2

trainingdata

^squared error function

...

Page 7: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

8

Logistic Regression Prediction Formula

= w0 + w1 x1 + w2 x2 ^ ^ ^· · logit scores

...

^log

p

1 – p( )^

Page 8: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

9

Logit Link Function

= w0 + w1 x1 + w2 x2 ^ ^ ^· ·

...

logitlink function

0 1

5

-5

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).

^log

p

1 – p( )^

logit scores

Page 9: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

10

Logit Link Function

= w0 + w1 x1 + w2 x2 ^ ^ ^· · logit scores

...

logitlink function

0 1

5

-5

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).

^log

p

1 – p( )^

Page 10: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

11

Logit Link Function

= w0 + w1 x1 + w2 x2 ^ ^ ^· ·

...

^log

p

1 – p( )^

1

1 + e-logit( p )p = ^^

^logit( p )

To obtain prediction estimates, the logit equation is solved for p. ^

=

Page 11: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

12

Logit Link Function

= w0 + w1 x1 + w2 x2 ^ ^ ^· ·

...

^log

p

1 – p( )^

1

1 + e-logit( p )p = ^^

^logit( p )

To obtain prediction estimates, the logit equation is solved for p. ^

=

Page 12: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

13

Logit Link Function

...

Page 13: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

14

Simple Prediction Illustration – Regressions Predict dot color for each x1 and x2.

You need intercept and parameter estimates.

...

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 14: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

15

Simple Prediction Illustration – Regressions

You need intercept and parameter estimates.

...

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 15: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

16

Simple Prediction Illustration – Regressions

log-likelihood function

Find parameter estimates by maximizing

...

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 16: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

17

Simple Prediction Illustration – Regressions

log-likelihood function

Find parameter estimates by maximizing

...

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 17: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

18

Simple Prediction Illustration – Regressions

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.

...

Page 18: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

19

Page 19: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

20

4.01 Multiple Choice PollWhat is the logistic regression prediction for the indicated point?

a. 0.243

b. 0.56

c. yellow

d. It depends.

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 20: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

21

4.01 Multiple Choice Poll – Correct AnswerWhat is the logistic regression prediction for the indicated point?

a. 0.243

b. 0.56

c. yellow

d. It depends.

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 21: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

22

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 22: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

23

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 23: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

24

Missing Values and Regression Modeling

Training Datatargetinputs

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

...

Page 24: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

25

Consequence: missing values can significantly reduce your amount of training data for regression modeling!

Missing Values and Regression Modeling

Training Datatargetinputs

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

...

Page 25: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

26

Missing Values and Regression Modeling

Consequence: Missing values can significantly reduce your amount of training data for regression modeling!

Training Datatargetinputs

...

Page 26: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

27

Missing Values and the Prediction Formula

Predict: (x1, x2) = (0.3, ? )

Problem 2: Prediction formulas cannot score cases with missing values.

...

Page 27: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

28

Missing Values and the Prediction Formula

Predict: (x1, x2) = (0.3, ? )

Problem 2: Prediction formulas cannot score cases with missing values.

...

Page 28: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

29

Missing Values and the Prediction Formula

...

Problem 2: Prediction formulas cannot score cases with missing values.

Page 29: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

30

Missing Values and the Prediction Formula

...

Problem 2: Prediction formulas cannot score cases with missing values.

Page 30: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

31

Missing Value Issues

Manage missing values.

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

...

Problem 2: Prediction formulas cannot score cases with missing values.

Page 31: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

32

Missing Value Issues

Manage missing values.

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

...

Problem 2: Prediction formulas cannot score cases with missing values.

Page 32: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

33

Missing Value Causes

Manage missing values.

Non-applicable measurement

No match on merge

Non-disclosed measurement

...

Page 33: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

34

Missing Value Remedies

Manage missing values.

xi = f(x1, … ,xp)

Non-applicable measurement

No match on merge

Non-disclosed measurement

...

Page 34: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

35

Managing Missing Values

This demonstration illustrates how to impute synthetic data values and create missing value indicators.

Page 35: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

36

Running the Regression Node

This demonstration illustrates using the Regression tool.

Page 36: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

37

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 37: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

38

Predictionformula

Model Essentials – Regressions

Best modelfrom sequence

Sequentialselection

Predict new cases.

Select useful inputs

Optimize complexity.

Select useful inputs.

Page 38: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

39

Sequential Selection – Forward

Entry CutoffInput p-value

...

Page 39: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

40

Sequential Selection – Forward

Entry CutoffInput p-value

...

Page 40: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

41

Sequential Selection – Forward

Entry CutoffInput p-value

...

Page 41: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

42

Sequential Selection – Forward

Entry CutoffInput p-value

...

Page 42: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

43

Sequential Selection – Forward

Entry CutoffInput p-value

Page 43: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

44

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 44: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

45

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 45: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

46

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 46: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

47

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 47: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

48

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 48: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

49

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 49: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

50

Sequential Selection – Backward

Stay CutoffInput p-value

...

Page 50: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

51

Sequential Selection – Backward

Stay CutoffInput p-value

Page 51: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

52

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

...

Page 52: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

53

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

...

Page 53: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

54

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

...

Page 54: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

55

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

...

Page 55: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

56

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

...

Page 56: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

57

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

...

Page 57: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

58

Sequential Selection – StepwiseInput p-value Entry Cutoff

Stay Cutoff

Page 58: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

59

Page 59: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

60

4.02 PollThe three sequential selection methods for building regression models can never lead to the same model for the same set of data.

True

False

Page 60: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

61

4.02 Poll – Correct AnswerThe three sequential selection methods for building regression models can never lead to the same model for the same set of data.

True

False

Page 61: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

62

Selecting Inputs

This demonstration illustrates using stepwise selection to choose inputs for the model.

Page 62: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

63

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 63: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

64

Model Essentials – Regressions

Predict new cases.

Select useful inputs.

Optimize complexity.

Predictionformula

Sequentialselection

...

Page 64: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

65

Model Fit versus Complexity

1 2 3 4 5 6

Model fit statistic

training

validation

...

Page 65: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

66

Select Model with Optimal Validation Fit

1 2 3 4 5 6

Model fit statistic

Evaluate eachsequence step.

...

Page 66: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

67

Optimizing Complexity

This demonstration illustrates tuning a regression model to give optimal performance on the validation data.

Page 67: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

68

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 68: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

69

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 69: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

70

Beyond the Prediction Formula

Manage missing values

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 70: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

71

Logistic Regression Prediction Formula

...

= w0 + w1 x1 + w2 x2 ^ ^ ^· ·

^log

p

1 – p( )^

logit scores

Page 71: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

72

Odds Ratios and Doubling Amounts

Odds ratio: Amount odds change with unit change in input.Doubling amount:

How much does an input have to change to double the odds?

1 odds exp(wi)

odds 20.69wi

Δxi consequence

...

= w0 + w1 x1 + w2 x2 ^ ^ ^· ·

^log

p

1 – p( )^

logit scores

Page 72: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

73

Interpreting a Regression Model

This demonstration illustrates interpreting a regression model using odds ratios.

Page 73: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

74

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 74: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

75

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 75: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

76

Extreme Distributions and Regressions

high leverage pointsskewed inputdistribution

standard regression

true association

standard regression

true association

Original Input Scale

...

Page 76: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

77

Extreme Distributions and Regressions

high leverage pointsskewed inputdistribution

standard regression

true association

standard regression

true association

Original Input Scale

more symmetricdistribution

Regularized Scale

...

Page 77: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

78

Original Input Scale

Regularizing Input Transformations

more symmetricdistribution

Regularized Scale

standard regression

standard regression

...

Original Input Scale

high leverage pointsskewed inputdistribution

Page 78: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

79

Regularizing Input TransformationsRegularized Scale

standard regression

standard regression

...

Original Input ScaleOriginal Input Scale

regularized estimate

regularized estimate

Page 79: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

80

Regularizing Input TransformationsRegularized Scale

standard regression

standard regression

...

Original Input Scale

regularized estimate

regularized estimate

true association

true association

Page 80: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

81

Page 81: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

82

4.03 Multiple Choice PollWhich statement below is true about transformations of input variables in a regression analysis?

a. They are never a good idea.

b. They help model assumptions match the assumptions of maximum likelihood estimation.

c. They are performed to reduce the bias in model predictions.

d. They typically are done on nominal (categorical) inputs.

Page 82: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

83

4.03 Multiple Choice Poll – Correct AnswerWhich statement below is true about transformations of input variables in a regression analysis?

a. They are never a good idea.

b. They help model assumptions match the assumptions of maximum likelihood estimation.

c. They are performed to reduce the bias in model predictions.

d. They typically are done on nominal (categorical) inputs.

Page 83: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

84

Transforming Inputs

This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.

Page 84: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

85

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

Page 85: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

86

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 86: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

87

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 87: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

88

Nonnumeric Input Coding

Level DI

1 0 0 0 0 0 0 0

DA DB DC DD DE DF DG DH

0

0 0 0 1 0 0 0 0

0 1 0 0 0 0 0 00 0 1 0 0 0 0 0

0 0 0 0 1 0 0 00 0 0 0 0 1 0 00 0 0 0 0 0 1 00 0 0 0 0 0 0 10 0 0 0 0 0 0 0

00000001

ABCDEFGHI

...

Page 88: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

89

DI

000000001

DI

000000001

Coding Redundancy

Level

1 0 0 0 0 0 0 0

DA DB DC DD DE DF DG DH

0 0 0 1 0 0 0 0

0 1 0 0 0 0 0 00 0 1 0 0 0 0 0

0 0 0 0 1 0 0 00 0 0 0 0 1 0 00 0 0 0 0 0 1 00 0 0 0 0 0 0 10 0 0 0 0 0 0 0

ABCDEFGHI

...

Page 89: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

90

DI

000000001

Coding Consolidation

Level

1 0 0 0 0 0 0 0

DA DB DC DD DE DF DG DH

0 0 0 1 0 0 0 0

0 1 0 0 0 0 0 00 0 1 0 0 0 0 0

0 0 0 0 1 0 0 00 0 0 0 0 1 0 00 0 0 0 0 0 1 00 0 0 0 0 0 0 10 0 0 0 0 0 0 0

ABCDEFGHI

...

Page 90: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

91

DI

000000001

Coding Consolidation

Level

1 0 0 0 0 0 0 0

DABCD DB DC DD DEF DF DGH DH

1 0 0 1 0 0 0 0

1 1 0 0 0 0 0 01 0 1 0 0 0 0 0

0 0 0 0 1 0 0 00 0 0 0 1 1 0 00 0 0 0 0 0 1 00 0 0 0 0 0 1 10 0 0 0 0 0 0 0

ABCDEFGHI

Page 91: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

92

Recoding Categorical Inputs

This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.

Page 92: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

93

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)4.7 Polynomial Regressions (Self-Study)

Page 93: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

94

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 94: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

95

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

Page 95: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

96

Standard Logistic Regression

= w0 + w1 x1 + w2 x2 ^

^ ^ ^log p

1 – p( )^ ·

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40

0.50

0.60

0.70

Page 96: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

97

Polynomial Logistic Regression

= w0 + w1 x1 + w2 x2 ^

^ ^ ^log p

1 – p( )^ · ·

quadratic terms

+ w3 x1 + w4 x2 2 2^ ^

+ w5 x1 x2

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

0.40 0.50 0.60 0.700.30

0.60

0.70

0.80

...

^

Page 97: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

98

Adding Polynomial Regression Terms Selectively

This demonstration illustrates how to add polynomial regression terms selectively.

Page 98: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

99

Adding Polynomial Regression Terms Autonomously (Self-Study)

This demonstration illustrates how to add polynomial regression terms autonomously.

Page 99: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

100

Exercises

This exercise reinforces the concepts discussed previously.

Page 100: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

101

Regression Tools ReviewReplace missing values for interval (means) and categorical data (mode). Create a unique replacement indicator.

Create linear and logistic regression models. Select inputs with a sequential selection method and appropriate fit statistic. Interpret models with odds ratios.

Regularize distributions of inputs. Typical transformations control for input skewness via a log transformation.

continued...

Page 101: 1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4

102

Regression Tools Review

Consolidate levels of a nonnumeric input using the Replacement Editor window.

Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.