Presentation and Data: http://www.lisa.stat.vt.edu (Short Courses). Regression Analysis Using JMP. Download Data to Desktop.



TRANSCRIPT

Page 1

Presentation and Data

http://www.lisa.stat.vt.edu

Short Courses

Regression Analysis Using JMP

Download Data to Desktop


Page 2

Mark Seiss, Dept. of Statistics

Regression Analysis Using JMP

February 28, 2012

Page 3

Presentation Outline

1. Simple Linear Regression

2. Multiple Linear Regression

3. Regression with Binary and Count Response Variables

Page 4

Presentation Outline

Questions/Comments

Individual Goals/Interests

Page 5

Simple Linear Regression

1. Definition
2. Correlation
3. Model and Estimation
4. Coefficient of Determination (R2)
5. Assumptions
6. Example

Page 6

Simple Linear Regression

• Simple Linear Regression (SLR) is used to study the relationship between a variable of interest and another variable.
• Both variables must be continuous.
• The variable of interest is known as the Response or Dependent Variable.
• The other variable is known as the Explanatory or Independent Variable.

• Objectives
• Determine the significance of the explanatory variable in explaining the variability in the response (not necessarily causation).
• Predict values of the response variable for given values of the explanatory variable.

Page 7

Simple Linear Regression

• Scatterplots are used to graphically examine the relationship between two quantitative variables.
• Linear or Non-linear
• Positive or Negative

Page 8

Simple Linear Regression

[Four scatterplots of Y against X: Positive Linear Relationship, Negative Linear Relationship, Non-Linear Relationship, No Relationship]

Page 9

Simple Linear Regression

• Correlation
• Measures the strength of the linear relationship between two quantitative variables.
• Pearson Correlation Coefficient
• Assumption of normality
• Calculation: r = Σ(xi - x̄)(yi - ȳ) / sqrt[ Σ(xi - x̄)² Σ(yi - ȳ)² ]
• Spearman's Rho and Kendall's Tau are used for non-normal quantitative variables.
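As an illustration (not part of the slides), the Pearson calculation above can be sketched directly from its definition; the example data are made up:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sum of cross-deviations scaled by the
    square root of the product of the sums of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear increasing relationship gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

In practice one would use a library routine (e.g. scipy.stats.pearsonr), but the hand computation makes the -1 to 1 bounds concrete.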

Page 10

Simple Linear Regression

• Properties of the Pearson Correlation Coefficient
• -1 ≤ r ≤ 1
• Positive values of r: as one variable increases, the other increases
• Negative values of r: as one variable increases, the other decreases
• Values close to 0 indicate no linear relationship between the two variables
• Values close to +1 or -1 indicate strong linear relationships
• Important note: Correlation does not imply causation

Page 11

Simple Linear Regression

• Pearson Correlation Coefficient: General Guidelines
• 0 ≤ |r| < 0.2 : Very Weak linear relationship
• 0.2 ≤ |r| < 0.4 : Weak linear relationship
• 0.4 ≤ |r| < 0.6 : Moderate linear relationship
• 0.6 ≤ |r| < 0.8 : Strong linear relationship
• 0.8 ≤ |r| ≤ 1.0 : Very Strong linear relationship

Page 12

Simple Linear Regression

• The Simple Linear Regression Model
• Basic Model: response = deterministic + stochastic
• Deterministic: model of the linear relationship between X and Y
• Stochastic: variation, uncertainty, and miscellaneous factors

• Model: yi = β0 + β1xi + εi

yi = value of the response variable for the ith observation
xi = value of the explanatory variable for the ith observation
β0 = y-intercept
β1 = slope
εi = random error, iid Normal(0, σ2)

Page 13

Simple Linear Regression

• Least Squares Estimation: b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)², b0 = ȳ - b1x̄
• Predicted Values: ŷi = b0 + b1xi
• Residuals: ei = yi - ŷi
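The closed-form least squares estimates, fitted values, and residuals can be computed directly; this is an illustrative sketch (the data values are made up), not the JMP workflow:

```python
def fit_slr(x, y):
    """Closed-form least squares for y = b0 + b1*x:
    b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2), b0 = ybar - b1*xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    b0 = ybar - b1 * xbar
    predicted = [b0 + b1 * a for a in x]
    residuals = [b - p for b, p in zip(y, predicted)]
    return b0, b1, predicted, residuals

# Exactly linear data: intercept 1, slope 2, all residuals 0
b0, b1, yhat, resid = fit_slr([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```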

Page 14

Simple Linear Regression

• Interpretation of Parameters
• β0: Value of Y when X = 0
• β1: Change in the value of Y with an increase of 1 unit of X (also known as the slope of the line)

• Hypothesis Testing
• β0: Test whether the true y-intercept is different from 0
Null Hypothesis: β0 = 0
Alternative Hypothesis: β0 ≠ 0
• β1: Test whether the slope is different from 0
Null Hypothesis: β1 = 0
Alternative Hypothesis: β1 ≠ 0

Page 15

Simple Linear Regression

• Analysis of Variance (ANOVA) for Simple Linear Regression

Source   Df     Sum of Squares   Mean Square       F Ratio         P-value
Model    1      SSR              MSR = SSR/1       F1 = MSR/MSE    P(F(1, n-2) > F1)
Error    n-2    SSE              MSE = SSE/(n-2)
Total    n-1    SST

Page 16

Simple Linear Regression

Page 17

Simple Linear Regression

• Coefficient of Determination (R2)
• Percent variation in the response variable (Y) that is explained by the least squares regression line
• 0 ≤ R2 ≤ 1
• Calculation: R2 = SSR/SST = 1 - SSE/SST
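The R2 calculation can be sketched numerically (an illustration, not from the slides; the data values are made up):

```python
def r_squared(y, predicted):
    """Coefficient of determination: 1 - SSE/SST, the fraction of the
    variation in y explained by the fitted values."""
    ybar = sum(y) / len(y)
    sse = sum((a - p) ** 2 for a, p in zip(y, predicted))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1 - sse / sst

print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0 (perfect fit)
```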

Page 18

Simple Linear Regression

• Assumptions of Simple Linear Regression
1. Independence
Residuals are independent of each other
Related to the method in which the data were collected or to time-related data
Tested by plotting time collected vs. residuals
Parametric test: Durbin-Watson Test
2. Constant Variance
Variance of the residuals is constant
Tested by plotting predicted values vs. residuals
Parametric test: Brown-Forsythe Test

Page 19

Simple Linear Regression

• Assumptions of Simple Linear Regression
3. Normality
Residuals are normally distributed
Tested by evaluating histograms and normal-quantile plots of residuals
Parametric test: Shapiro-Wilk test

Page 20

Simple Linear Regression

• Constant Variance: Plot of Fitted Values vs. Residuals

[Two plots of Residuals Y against Predicted Values. Good Residual Plot: No Pattern. Bad Residual Plot: Variability Increasing.]

Page 21

Simple Linear Regression

• Normality: Histogram and Q-Q Plot of Residuals

[Two panels: Normal Assumption Appropriate; Normal Assumption Not Appropriate]

Page 22

Simple Linear Regression

• Some Remedies
• Non-Constant Variance: Weighted Least Squares
• Non-Normality: Box-Cox Transformation
• Dependence: Auto-Regressive Models

Page 23

Simple Linear Regression

• Example Dataset: Chirps of Ground Crickets
• Pierce (1949) measured the frequency (the number of wing vibrations per second) of chirps made by a ground cricket at various ground temperatures.
• Filename: chirp.jmp

Page 24

Simple Linear Regression

• Questions/Comments about Simple Linear Regression

Page 25

Multiple Linear Regression

1. Definition
2. Categorical Explanatory Variables
3. Model and Estimation
4. Adjusted Coefficient of Determination
5. Assumptions
6. Model Selection
7. Example

Page 26

Multiple Linear Regression

• Explanatory Variables
• Two Types: Continuous and Categorical
• Continuous Predictor Variables
• Examples: Time, Grade Point Average, Test Score, etc.
• Coded with one parameter: β#x#
• Categorical Predictor Variables
• Examples: Sex, Political Affiliation, Marital Status, etc.
• Actual value assigned to a category is not important
• Ex) Sex: Male/Female, M/F, 1/2, 0/1, etc.
• Coded differently than continuous variables

Page 27

Multiple Linear Regression

• Categorical Explanatory Variables
• Consider a categorical explanatory variable with L categories
• One category selected as the reference category
• Assignment of the reference category is arbitrary
• Variable represented by L-1 dummy variables
• Model Identifiability
• Effect Coding (Used in JMP)
• xk = 1 if the explanatory variable is equal to category k, 0 otherwise
• xk = -1 for all k if the explanatory variable equals the reference category
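The effect-coding scheme above can be sketched as follows; this is an illustration (the category labels are hypothetical), not JMP's internal implementation:

```python
def effect_code(value, categories):
    """Effect coding with the last category as the reference:
    an L-category variable becomes L-1 dummies; the matching dummy is 1,
    every dummy is -1 for the reference category, and 0 otherwise."""
    reference = categories[-1]
    if value == reference:
        return [-1] * (len(categories) - 1)
    return [1 if value == cat else 0 for cat in categories[:-1]]

cats = ["red", "green", "blue"]  # hypothetical labels; "blue" is the reference
print(effect_code("red", cats))   # [1, 0]
print(effect_code("blue", cats))  # [-1, -1]
```

Using L-1 dummies (rather than L) is what keeps the model identifiable.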

Page 28

Multiple Linear Regression

• Similar to simple linear regression, except now there is more than one explanatory variable, which may be quantitative and/or qualitative.

• Model: yi = β0 + β1x1i + β2x2i + … + βpxpi + εi

yi = value of the response variable for the ith observation
x#i = value of explanatory variable # for the ith observation
β0 = y-intercept
β# = parameter corresponding to explanatory variable #
εi = random error, iid Normal(0, σ2)

Page 29

Multiple Linear Regression

• Least Squares Estimation: b = (X'X)⁻¹X'y
• Predicted Values: ŷi = b0 + b1x1i + … + bpxpi
• Residuals: ei = yi - ŷi

Page 30

Multiple Linear Regression

• Interpretation of Parameters
• β0: Value of Y when all X# = 0
• β#: Change in the value of Y with an increase of 1 unit of X#, in the presence of the other explanatory variables

• Hypothesis Testing
• β0: Test whether the true y-intercept is different from 0
Null Hypothesis: β0 = 0
Alternative Hypothesis: β0 ≠ 0
• β#: Test whether the change in Y with an increase of 1 unit in X# is different from 0, in the presence of the other explanatory variables
Null Hypothesis: β# = 0
Alternative Hypothesis: β# ≠ 0

Page 31

Multiple Linear Regression

• Adjusted Coefficient of Determination (R2)
• Percent variation in the response variable (Y) that is explained by the least squares regression line with explanatory variables x1, x2, …, xp
• Calculation of R2: R2 = 1 - SSE/SST
• The R2 value will increase as explanatory variables are added to the model
• The adjusted R2 introduces a penalty for the number of explanatory variables: Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1)
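The penalty can be made concrete with a small sketch (illustrative only; n is the number of observations and p the number of explanatory variables, and the example values are made up):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    For fixed R^2, adding predictors (larger p) lowers the adjusted value."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With 52 observations and 5 predictors, R^2 = 0.80 shrinks slightly:
print(adjusted_r2(0.80, n=52, p=5))  # about 0.778
```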

Page 32

Multiple Linear Regression

• Other Model Evaluation Statistics
• Akaike Information Criterion (AIC or AICc)
• Schwarz Information Criterion (SIC)
• Bayesian Information Criterion (BIC)
• Mallows' Cp
• Prediction Sum of Squares (PRESS)

Page 33

Multiple Linear Regression

• Model Selection
• Two Goals: Complex enough to fit the data well; simple to interpret and does not overfit the data
• Study the effect of each explanatory variable on the response Y
• Continuous Variable: Graph Y versus X
• Categorical Variable: Boxplot of Y for categories of X

Page 34

Multiple Linear Regression

• Model Selection cont.
• Multicollinearity
• Correlations among explanatory variables resulting in an increase in variance
• Reduces the significance value of the variable
• Occurs when several explanatory variables are used in the model

Page 35

Multiple Linear Regression

• Algorithmic Model Selection
• Backward Selection: Start with all explanatory variables in the model and remove those that are insignificant
• Forward Selection: Start with no explanatory variables in the model and add the best explanatory variables one at a time
• Stepwise Selection: Start with two forward selection steps, then alternate backward and forward selection steps until there are no variables to add or remove
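Forward selection can be sketched as a greedy loop; this is only an illustration on synthetic data, not JMP's stepwise platform (which uses p-value or criterion-based entry rules), and the min_gain threshold here is invented for the sketch:

```python
import numpy as np

def r2_of(cols, X, y):
    """R^2 of y regressed on an intercept plus the selected columns of X."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, min_gain=0.01):
    """Greedily add the column that most improves R^2 until the gain is small."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        top_r2, top_j = max((r2_of(selected + [j], X, y), j) for j in remaining)
        if top_r2 - best < min_gain:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_r2
    return selected, best

# Synthetic data: only columns 1 and 3 actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 1] - 3 * X[:, 3] + rng.normal(scale=0.1, size=100)
sel, r2v = forward_select(X, y)
print(sorted(sel))  # [1, 3]
```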

Page 36

Multiple Linear Regression

• Example Dataset: Discrimination in Salaries
• A researcher was interested in whether there was discrimination in the salaries of tenure-track professors at a small college. The researcher collected six variables from 52 professors.
• Filename: Salary.xls
• Reference: S. Weisberg (1985). Applied Linear Regression, Second Edition. New York: John Wiley and Sons. Page 194.

Page 37

Multiple Linear Regression

• Other Multiple Linear Regression Issues
• Outliers
• Interaction Terms
• Higher Order Terms

Page 38

Multiple Linear Regression

• Questions/Comments about Multiple Linear Regression

Page 39

Regression with Non-Normal Response

1. Logistic Regression with Binary Response

2. Poisson Regression with Count Response

Page 40

Logistic Regression

• Consider a binary response variable.
• Variable with two outcomes
• One outcome represented by a 1 and the other represented by a 0
• Examples:
Does the person have a disease? Yes or No
Who is the person voting for? McCain or Obama
Outcome of a baseball game? Win or Loss

Page 41

Logistic Regression

• Consider the linear probability model: π(xi) = β0 + β1xi

where yi = response for observation i
xi = quantitative explanatory variable

• Predicted values represent the probability of Y = 1 given X
• Issue: Predicted probabilities for some subjects fall outside of the [0, 1] range.

Page 42

Logistic Regression

• Consider the logistic regression model: π(x) = exp(β0 + β1x) / (1 + exp(β0 + β1x)), i.e., logit(π) = log[π / (1 - π)] = β0 + β1x

• Predicted values from the regression equation fall between 0 and 1
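A minimal numeric sketch of the logistic form (the coefficients here are made up for illustration, not fitted values):

```python
import math

def predict_prob(b0, b1, x):
    """Logistic model prediction: P(Y = 1 | x) = exp(eta) / (1 + exp(eta)),
    where eta = b0 + b1*x is the linear predictor."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))

# A linear predictor of 0 maps to probability 0.5
print(predict_prob(0.0, 1.0, 0.0))  # 0.5
```

However large or small the linear predictor gets, the prediction stays strictly inside (0, 1), which is the point of the transformation.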

Page 43

Logistic Regression

• Interpretation of Coefficient β: Odds Ratio
• The odds ratio is a statistic that measures the odds of an event compared to the odds of another event.
• Say the probability of Event 1 is π1 and the probability of Event 2 is π2. Then the odds ratio of Event 1 to Event 2 is:
Odds Ratio = Odds1 / Odds2 = [π1 / (1 - π1)] / [π2 / (1 - π2)]
• Values of the odds ratio range from 0 to infinity
• A value between 0 and 1 indicates the odds of Event 2 are greater
• A value between 1 and infinity indicates the odds of Event 1 are greater
• A value equal to 1 indicates the events are equally likely
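The odds ratio calculation can be checked numerically; a minimal sketch (the probabilities are made up):

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds ratio of Event 1 (probability p1) to Event 2 (probability p2)."""
    return odds(p1) / odds(p2)

# Event 1 has odds 3 (0.75/0.25); Event 2 has odds 1 (0.5/0.5)
print(odds_ratio(0.75, 0.5))  # 3.0
```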

Page 44

Logistic Regression

• Example Dataset: A researcher is interested in how GRE exam scores, GPA, and the prestige of a student's undergraduate institution affect admission into graduate school.

Filename: Admittance.csv

Important Note: JMP models the probability of the 0 category

Page 45

Poisson Regression

• Consider a count response variable.
• Response variable is the number of occurrences in a given time frame.
• Outcomes equal to 0, 1, 2, …
• Examples:
Number of penalties during a football game.
Number of customers shopping at a store on a given day.
Number of car accidents at an intersection.

Page 46

Poisson Regression

• Consider the linear model: μi = E(yi | xi) = β0 + β1xi

where yi = response for observation i
xi = quantitative explanatory variable for observation i

• Issue: Predicted values range from -∞ to +∞

Page 47

Poisson Regression

• Consider the Poisson log-linear model:
E(Yi | xi) = μi = exp(β0 + β1xi), i.e., log(μi) = β0 + β1xi

• Predicted response values fall between 0 and +∞
• In the case of a single predictor, an increase of one unit of x multiplies μ by exp(β)
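The multiplicative interpretation of the log-linear link can be verified numerically (the coefficients are hypothetical, chosen for the illustration):

```python
import math

def expected_count(b0, b1, x):
    """Poisson log-linear model: mu = exp(b0 + b1*x), so log(mu) is linear in x."""
    return math.exp(b0 + b1 * x)

# Moving x from 1 to 2 multiplies the expected count by exp(b1)
mu0 = expected_count(0.5, 0.2, x=1)
mu1 = expected_count(0.5, 0.2, x=2)
print(mu1 / mu0)  # equals exp(0.2)
```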

Page 48

Poisson Regression

• Example Data Set: Researchers are interested in the number of awards earned by students at a high school. Other variables measured as possible explanatory variables include the type of program in which the student was enrolled (vocational, general, or academic) and the final score on the student's math exam.

Filename: Awards.csv

Page 49

Attendee Questions

If time permits