Multiple Linear Regression
Multiple Regression

In multiple regression we have multiple predictors X1, X2, …, Xp and we are interested in modeling the mean of the response Y as a function of these predictors, i.e. we wish to estimate E(Y|X1, X2, …, Xp) or E(Y|X). In linear regression we use a linear function of the model parameters, e.g.
E(Y|X1, X2) = β0 + β1X1 + β2X2 + β12X1X2
E(Y|X1, X2, X3) = β0 + β1·ln(X1) + β2X2² + β3X3
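"Linear" here means linear in the parameters, not in the predictors. A quick numerical sketch of the second mean function above; the coefficient values are made up purely for illustration, not estimates from any data:

```python
import math

# E(Y | X1, X2, X3) = b0 + b1*ln(X1) + b2*X2^2 + b3*X3
# The model is linear in the b's even though X1 is logged and X2 is squared.
# Coefficients are illustrative only.
def mean_response(x1, x2, x3, b0=1.0, b1=2.0, b2=0.5, b3=-1.0):
    return b0 + b1 * math.log(x1) + b2 * x2 ** 2 + b3 * x3

print(mean_response(math.e, 2.0, 1.0))  # 1 + 2*1 + 0.5*4 - 1 = 4.0
```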
Example 1: NC Birth Weight Data
Y = birth weight of infant (g)

Consider the following potential predictors:
X1 = mother's age (yrs.)
X2 = father's age (yrs.)
X3 = mother's education (yrs.)
X4 = father's education (yrs.)
X5 = mother's smoking status (1 = yes, 0 = no)
X6 = weight gained during pregnancy (lbs.)
X7 = gestational age (weeks)
X8 = number of prenatal visits
X9 = race of child (White, Black, Other)
Dichotomous Categorical Predictors

• In this study smoking status (X5) is an example of a dichotomous (2-level) categorical predictor. How do we use a predictor like this in a regression model?
• There are two approaches that get used:
  One approach is to code smoking status as 0 or 1 and treat it as a numeric predictor (this is called "0-1 coding").
  The other is to code smoking status as -1 or +1 and treat it as a numeric predictor (this is called "contrast coding").
Example 1: NC Birth Weight Data

We first consider 0-1 coding and fit the model E(Y|X5) = β0 + β5X5.

E(Y|Smoker) = 3287.66 – 214.85(1) = 3072.80 g
E(Y|Non-smoker) = 3287.66 – 214.85(0) = 3287.66 g
X5 = Smoking Status = 1 if smoker, 0 if non-smoker
Example 1: NC Birth Weight Data

Compare to a pooled t-test:
E(Y|Smoker) = 3072.80 g
E(Y|Non-smoker) = 3287.66 g
Regression Output (0-1 coding)
95% CI for β5: -214.85 ± 1.96·57.84 = (-328.86, -101.34)
Punchline: Two-sample t-test is equivalent to regression!!
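This equivalence is easy to verify numerically: with a 0-1 coded group indicator, the OLS slope is exactly the difference in group means and the intercept is the mean of the 0-coded group. A sketch with made-up toy weights, not the NC data:

```python
# OLS of y on a 0-1 group indicator, computed by hand.
# The weights below are made up for illustration.
smokers = [3000.0, 3100.0, 3050.0]
nonsmokers = [3300.0, 3250.0, 3350.0]

y = smokers + nonsmokers
x = [1.0] * len(smokers) + [0.0] * len(nonsmokers)  # 0-1 coding

n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
intercept = ybar - slope * xbar

mean_smoker = sum(smokers) / len(smokers)           # 3050.0
mean_nonsmoker = sum(nonsmokers) / len(nonsmokers)  # 3300.0

print(slope)      # -250.0 = mean_smoker - mean_nonsmoker
print(intercept)  # 3300.0 = mean of the 0-coded (non-smoker) group
```

The two-sample t-test of the mean difference is then the same test as the t-test of the slope in this regression.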
Example 1: NC Birth Weight Data

Now consider -1/+1 coding and fit the model E(Y|X5) = β0 + β5X5.

E(Y|Smoker) = 3180.18 + 107.38(-1) = 3072.80 g
E(Y|Non-smoker) = 3180.18 + 107.38(+1) = 3287.66 g
X5 = +1 if non-smoker, -1 if smoker
Example 1: NC Birth Weight Data

Compare to a pooled t-test:
E(Y|Smoker) = 3072.80 g
E(Y|Non-smoker) = 3287.66 g
Regression Output (-1/+1 coding)
2 × (95% CI for β5): 2(107.38 ± 1.96·28.90) = (101.34, 328.36)
Punchline: Two-sample t-test is equivalent to regression!!
Factors with more than two levels

Consider Race of the child coded as: W = white, B = black, O = other.
E(Birth Weight|Race) = ?

E(Birth Weight|White) = 3226.33 – 159.52(-1) + 56.74(-1) = 3329.11 g
E(Birth Weight|Black) = 3226.33 – 159.52(+1) = 3066.81 g
E(Birth Weight|Other) = 3226.33 + 56.74(+1) = 3283.08 g
White, the alphabetically last level, is the "reference group": it is coded -1 on both indicators, while each remaining group is coded +1 on its own indicator (and 0 on the other).
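The three-level coding can be checked the same way; the sketch below plugs the slide's fitted coefficients into the effect-coded design (the 0.01 g discrepancy for "Other" is rounding in the slide's coefficients):

```python
# Effect (-1/+1) coding for a 3-level factor. The reference level
# (White, alphabetically last) gets -1 on both indicator columns.
# Coefficients are the fitted values from the slide.
b0, b_black, b_other = 3226.33, -159.52, 56.74

codes = {"White": (-1, -1), "Black": (+1, 0), "Other": (0, +1)}

fitted = {race: b0 + b_black * zb + b_other * zo
          for race, (zb, zo) in codes.items()}
print(fitted)  # White ~3329.11, Black ~3066.81, Other ~3283.07
```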
Factors with more than two levels

E(Birth Weight|White) = 3329.11 g
E(Birth Weight|Black) = 3066.81 g
E(Birth Weight|Other) = 3283.08 g
Tukey's HSD vs. Regression
The mean birth weight of black infants significantly differs from that of white infants, as white infants are the reference group (p < .0001). However, non-black minority infants do not significantly differ from white infants in terms of mean birth weight (p = .2729).

Black infants have a significantly lower mean birth weight than both white and non-black minority infants.
ANOVA = Regression!

One-way ANOVA is equivalent to regression on the {-1, +1} coded levels of the factor, with one of the k populations to be compared being viewed as the reference group.
Example: NC Birth Weights

We have evidence that the mean birth weight of infants born to the population of smoking mothers is between 102.5 and 327.06 g less than the mean birth weight of infants born to non-smokers. Does this mean that if we compared the populations of full-term babies, the mean birth weight of babies born to smokers would be lower than that for those born to non-smokers?

Not necessarily; maybe smoking leads to earlier births, and that is the reason for the overall difference above.
Example: NC Birth Weights

One way to explore this possibility is to add gestational age as a covariate to a regression model already containing smoking status, i.e.

E(Weight|Smoking, Gest.Age) = β0 + β5X5 + β7X7

where

X5 = Smoking Status = +1 if non-smoker, -1 if smoker
X7 = Gestational Age (weeks)

and

E(Weight|Nonsmoker, Gest.Age) = β0 + β5 + β7X7
E(Weight|Smoker, Gest.Age) = β0 – β5 + β7X7
Example: NC Birth Weights

The estimated equation is

Ê(Weight|Smoking, Gest.Age) = 1671.03 + 89.13·Smoking + 126.32·Gest.Age

thus for smokers and non-smokers we have

Ê(Weight|Nonsmoker, Gest.Age) = 1760.16 + 126.32·Gest.Age
Ê(Weight|Smoker, Gest.Age) = 1581.90 + 126.32·Gest.Age

The difference between non-smokers and smokers is 2 × 89.13 = 178.26 g, holding gestational age constant.
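A small numerical check of the fitted ANCOVA equation, using the slide's coefficients (intercept 1671.03, smoking effect 89.13, gestational-age slope 126.32); the gestational age chosen below is arbitrary:

```python
# Predictions from the fitted ANCOVA model on the slide,
# with Smoking coded -1 (smoker) / +1 (non-smoker).
def predicted_weight(smoking, gest_age):
    return 1671.03 + 89.13 * smoking + 126.32 * gest_age

ga = 38.0  # weeks, chosen arbitrarily for illustration
diff = predicted_weight(+1, ga) - predicted_weight(-1, ga)
print(round(diff, 2))  # 178.26 g, the same at every gestational age
```

Because the model has no interaction, the 178.26 g gap is constant in gestational age: the two fitted lines are parallel.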
Example: NC Birth Weights

The 95% CI for the "Smoking Effect" for infants with a given gestational age is 2 × (89.13 ± 1.96·24.12) = 2 × (41.85, 136.41) = (83.70 g, 272.82 g).

Thus, adjusting for gestational age, we estimate that the mean birth weight of infants born to smoking mothers is between 83.70 g and 272.82 g lower than the mean birth weight of infants born to non-smoking mothers.
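The doubling step is worth spelling out: under -1/+1 coding, β5 is half the smoker/non-smoker difference, so the CI for the difference is twice the CI for β5. Using the estimate and standard error from the slide:

```python
# CI for the smoking effect: beta5 is half the smoker/non-smoker
# difference under -1/+1 coding, so double the CI for beta5.
est, se, z = 89.13, 24.12, 1.96

lo, hi = round(est - z * se, 2), round(est + z * se, 2)
print(lo, hi)          # 41.85 136.41
print(2 * lo, 2 * hi)  # 83.7 272.82
```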
Q: What if the effect of gestational age is different for smokers and non-smokers? For example, maybe for smokers an additional week of gestational age does not translate to the same increase in birth weight as it does for non-smokers. What should we do?

A: Add a smoking and gestational age interaction term, Smoking*Gest.Age, which will allow the lines for smokers and non-smokers to have different slopes.
Example: NC Birth Weights
The lines here look very parallel, so there is little evidence of a significant interaction in the form of different slopes.
The interaction is not statistically significant (p = .9564). So the parallel lines model is sufficient.
Example 2: Birth Weight, Gestational Age & Hospital
Study of premature infants born at three hospitals.
Variables are:
• Birth weight (g)
• Gest. Age (wks.)
• Hospital (A,B,C)
Example 2: Birth Weight, Gestational Age & Hospital
Do the mean birth weights significantly differ across the three hospitals in this study?
Using one-way ANOVA we find that the means significantly differ (p = .0022).
We conclude that the mean birth weight of infants born at Hospital A is significantly lower than that of infants born at Hospital B; we estimate it is between 128.1 g and 611.0 g lower.
Example 2: Birth Weight, Gestational Age & Hospital
What role does gestational age play in these differences? Perhaps gestational age differs across hospitals and that helps explain the birth weight differences.
One-way ANOVA yields p = .1817 for comparing the mean gestational ages of infants born at the three hospitals.
Example 2: Birth Weight, Gestational Age & Hospital
This is a scatter plot of birth weight vs. gestational age with the points color coded by hospital. Is there evidence that the weight gain per week differs between the hospitals?
The lines seem to suggest that the weight gain per week differs across the hospitals.
Example 2: Birth Weight, Gestational Age & Hospital

We fit a model with hospital effects, gestational age, and hospital-by-age interactions:

E(Weight|Gest.Age, Hospital) = β0 + β1[Hospital A] + β2[Hospital B] + β3·Gest.Age + β4[Hospital A]·Gest.Age + β5[Hospital B]·Gest.Age

where

[Hospital A] = +1 if hospital A, -1 if hospital C, 0 otherwise
[Hospital B] = +1 if hospital B, -1 if hospital C, 0 otherwise
Example 2: Birth Weight, Gestational Age & Hospital

The fitted lines for the three hospitals are:

E(Weight|Gest.Age, Hospital A) = -392.54 + 48.76·Gest.Age
E(Weight|Gest.Age, Hospital B) = -1959.94 + 108.52·Gest.Age
E(Weight|Gest.Age, Hospital C) = -1136.05 + 76.49·Gest.Age
The intercepts are meaningless for these data. For hospital A the weight gain for premature babies is 48.76 g/week, versus 108.52 g/week for hospital B and 76.49 g/week for hospital C. As a result, the differences between the mean birth weights as a function of gestational age are larger for infants that are closer to full term.
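Since the slopes differ, the between-hospital gaps widen with gestational age; the sketch below quantifies that using the per-week gains stated above (the 30-to-36-week window is an arbitrary illustration, not from the slides):

```python
# Weekly weight-gain slopes by hospital, from the fitted lines.
# With different slopes, the gap relative to hospital A grows each week;
# here we accumulate it over an illustrative 6-week span.
slopes = {"A": 48.76, "B": 108.52, "C": 76.49}

for h in ("B", "C"):
    per_week = slopes[h] - slopes["A"]
    print(h, round(per_week, 2), round(per_week * 6, 2))
# B gains 59.76 g/week more than A -> 358.56 g over 6 weeks
# C gains 27.73 g/week more than A -> 166.38 g over 6 weeks
```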
Analysis of Covariance (ANCOVA)

These two examples are analysis of covariance models, where we are primarily interested in potential differences between populations defined by a nominal variable (e.g. smoking status) and we adjust that comparison for other factors such as gestational age. The variables that we are adjusting for are called covariates.
Example 1: NC Birth Data (cont'd)

We now consider comparing smoking and non-smoking mothers adjusting for the "full set" of potential confounding factors.
X1 = mother's age (yrs.)
X2 = father's age (yrs.)
X3 = mother's education (yrs.)
X4 = father's education (yrs.)
X5 = mother's smoking status (1 = yes, 0 = no)
X6 = weight gained during pregnancy (lbs.)
X7 = gestational age (weeks)
X8 = number of prenatal visits
X9 = race of child (White, Black, Other)
Example 1: NC Birth Data (cont'd)
Example 1: NC Birth Data (cont'd)

E(Birthweight|X̃) = β0 + β1·Father's Age + β2·Mother's Age + β3·Father's Educ. + β4·Mother's Educ. + β5·Smoking + β6·Wt. Gain + β7·Gest. Age + β8·Num. prenatal + β9·[Race B] + β10·[Race O]

Every term other than Smoking is a covariate here.
Example 1: NC Birth Data (cont'd)

Effect Tests

These covariates are not significant, but they are also fairly correlated, and thus contain much the same information. We might consider removing some, or potentially all, of these predictors from the model.
Example 1: NC Birth Data (cont'd)

The ages of the mother and father are quite correlated (r = .7539), thus it is unlikely both pieces of information would be needed in the same regression model. When this happens we say there is multicollinearity amongst the predictors.

Also, when building regression models we wish them to be parsimonious, i.e. simple but effective.
Stepwise Model Selection

When building regression models, one of the simplest strategies is stepwise model selection. There are two main types of stepwise methods: forward selection and backward elimination.

Forward Selection
1. Fit the model with intercept only, E(Y|X) = β0.
2. Fit models adding the "best" predictor amongst those available. This could be done by looking at the one with maximum R², for example.
3. Continue adding predictors one at a time, maximizing the R² at each step, until no more predictors can be added that have p-values < α. Generally α is chosen to be .10 or potentially higher.
Stepwise Model Selection

Backward Elimination
1. Fit the model with all potential predictors added.
2. Remove the worst predictor, usually judged by the highest p-value.
3. Continue removing predictors one at a time until all p-values for included predictors are < α. Again, α is generally chosen to be .10 or potentially higher.
This is the approach I usually take.
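The backward-elimination loop can be sketched in a few lines. This is schematic: the `fit` argument below is a stand-in that returns a p-value per predictor, and the p-values are made up; a real analysis would refit the regression at every step:

```python
# Schematic backward elimination: repeatedly drop the predictor with the
# largest p-value until everything remaining is significant at level alpha.
def backward_eliminate(predictors, fit, alpha=0.10):
    model = list(predictors)
    while model:
        pvals = fit(model)  # p-value for each predictor in current model
        worst = max(model, key=lambda v: pvals[v])
        if pvals[worst] <= alpha:
            break  # all remaining predictors are significant
        model.remove(worst)
    return model

# Stand-in "fit": fixed, made-up p-values regardless of the model.
fake_pvalues = {"father_educ": 0.81, "father_age": 0.52,
                "mother_age": 0.04, "gest_age": 0.001}
final = backward_eliminate(fake_pvalues, lambda m: fake_pvalues)
print(final)  # ['mother_age', 'gest_age']
```

With these made-up p-values the loop reproduces the pattern of the slide's Steps 1-3: the two non-significant predictors are dropped one at a time, then the procedure stops.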
Example 1: NC Birth Data (Backward Elimination)

Step 1: Remove Father's Education
Step 2: Remove Father's Age
Step 3: Stop, no p-values > .10.
Example 1: NC Birth Data (cont'd)
Fitted Model

Ê(Weight|X̃) = 1700.3 + 69.43·Mother's Age + 10.22·Mother's Educ. + 85.87·Smoking Status + 6.41·Wt. Gain + 116.45·Gest. Age + 13.04·Number prenatal – 96.64·[Race B] + 41.40·[Race O]

R² = 35.62% of the variation in birth weight is explained by our model.
Interpretation of Smoking Status

Adjusting for mother's age & education, weight gain during pregnancy, gestational age & race of the infant, and number of prenatal visits, we find that infants of smoking mothers have a mean birth weight that is 2 × 85.87 = 171.74 g less than that for mothers who do not smoke during pregnancy.
95% CI for Difference in Means
After adjusting for mother’s age & years of education, weight gain during pregnancy, gestational age & race of the infant, and number of prenatal visits, we estimate that the mean birth weight of infants born to women who smoke during pregnancy is between 77 g and 266 g less than that for women who do not smoke during pregnancy.
(non-smoker – smoker)

2(β̂5 ± 1.96·SE(β̂5)) = 2(85.87 ± 1.96·24.13) = 2(38.58, 133.16) = (77.16 g, 266.33 g)

This can also be obtained directly from the parameter estimates.
Checking Assumptions

Assumptions:
1. The specified functional form for E(Y|X) is adequate.
2. The Var(Y|X) or SD(Y|X) is constant.
3. The random errors are normally distributed.
4. The errors are independent.

Basic plots:
• Residuals vs. Fitted Values (checks 1, 2, 4)
• Normal Quantile Plot of Residuals (checks 3)

Note: These are the same plots used in simple linear regression to check model assumptions.
Checking Assumptions

With the exception of a few mild outliers and one fairly extreme outlier, there are no obvious violations of the model assumptions: there is no evidence of curvature, and the variation looks constant.

The residuals are approximately normally distributed, with the exception of a few extreme outliers on the low end.
Example 3: Factors Related to Job Performance of Nurses

A nursing director would like to use nurses' personal characteristics to develop a regression model for predicting job performance (JOBPER). The following potential predictors are available:

• X1 = assertiveness (ASSERT)
• X2 = enthusiasm (ENTHUS)
• X3 = ambition (AMBITION)
• X4 = communication skills (COMM)
• X5 = problem-solving skills (PROB)
• X6 = initiative (INITIATIVE)
• Y = job performance (JOBPER)
Example 3: Factors Related to Job Performance of Nurses
Correlations and Scatter Plot Matrix

We can see that ambition has the strongest correlation with performance (r = .8787, p < .0001) and problem-solving skills the weakest (r = .1555, p = .4118). It is also interesting to note that initiative has a negative correlation with performance (r = -.5777, p = .0008).

What we would really like to see is the correlation between job performance and each variable adjusting for the other variables, because we can clearly see that the predictors themselves are related.
Partial Correlations

The partial correlation between a response/dependent variable (Y) and a predictor/independent variable (Xi) is a measure of the strength of linear association between Y and Xi adjusted for the other independent variables being considered.

Taking the other variables into account, we see that ambition (partial corr. = .8023) and initiative (partial corr. = -.4043) have the strongest adjusted relationships with job performance. We would therefore expect these variables to be in a "final" regression model for job performance.
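One standard way to compute a partial correlation: regress Y on the adjustment variable(s), regress Xi on the same variable(s), and correlate the two sets of residuals. A minimal sketch with a single adjustment variable and made-up data (not the nurses data):

```python
# Partial correlation of y and x adjusting for z, via residuals.
def residuals(y, x):
    """Residuals from the simple linear regression of y on x."""
    n = len(y)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
        sum((xi - xb) ** 2 for xi in x)
    a = yb - b * xb
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def corr(u, v):
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    num = sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
    den = (sum((ui - ub) ** 2 for ui in u) *
           sum((vi - vb) ** 2 for vi in v)) ** 0.5
    return num / den

def partial_corr(y, x, z):
    return corr(residuals(y, z), residuals(x, z))

# Made-up data: x and y are both driven mostly by z, so their marginal
# correlation is high but their partial correlation can be much weaker.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1]
y = [1.2, 2.2, 2.9, 4.1, 5.0]
print(corr(y, x), partial_corr(y, x, z))
```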
Example 3: Factors Related to Job Performance of Nurses

E(JOBPER|X̃) = β0 + β1·ASSERT + β2·ENTHUS + β3·AMBITION + β4·COMM + β5·PROB + β6·INITIATIVE
Several predictors appear to be unimportant and could be removed from the model; we will again use backward elimination to do this.

R² = 84.8% of the variation in job performance is explained by the model. The adjusted R² penalizes for having too many predictors in the model. Every predictor added to a model will increase the R², but we generally reach a point of diminishing returns as we continue to add predictors. Here the adjusted R² = 80.9%.
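The penalty is a simple function of R², the sample size n, and the number of predictors p. The n = 30 below is an assumption about the size of this nurses data set, not a value stated on the slide, and the rounded inputs make the result land near (not exactly on) the slide's 80.9%:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
# Inputs: slide's full-model R^2 = 0.848 with p = 6 predictors;
# n = 30 is an assumed sample size for illustration.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.848, 30, 6), 3))  # 0.808, close to the slide's 80.9%
```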
Added Variable (Leverage) Plots

These plots are a visualization of the partial correlation: they show the relationship between the response Y and each of the predictors adjusted for the other predictors. The correlation exhibited in each is the partial correlation.

Ambition and initiative exhibit the strongest adjusted relationships with job performance.
Example 3: Factors Related to Job Performance of Nurses

Using backward elimination:
Step 1: Drop Problem-Solving
Step 2: Drop Communication
Step 3: Drop Enthusiasm
Step 4: Drop Assertiveness
R² = 80.7% of the variation in job performance is explained by the regression on ambition and initiative. Notice this is not much different from the adjusted R² for the full model.
Checking Assumptions

The residuals vs. fitted plot and the normal quantile plot show no problems here.
"Final" Regression Model

Ê(JOBPER|AMBITION, INITIATIVE) = 31.96 + 0.787·AMBITION – 0.446·INITIATIVE
Summary

• Two-sample t-tests, one-way, and two-way ANOVA are all really just regression models with nominal predictors.
• Analysis of Covariance (ANCOVA) is also just regression, where we are interested in making population/treatment comparisons adjusting for the potential effects of other factors/covariates.
• Multiple regression in general is the process of estimating the mean response of a variable (Y) using multiple predictors/independent variables, E(Y|X1, …, Xp).
Summary

• Partial correlation and added variable (leverage) plots help us understand the relationship between the response and an individual independent variable adjusting for the other independent variables being considered.
• Assumption checking is basically the same as it was for simple linear regression.
Summary

• When problems are evident, general remedies include:
  Transforming the response (Y)
  Transforming the predictors
  Adding nonlinear terms to the model, like squared terms (Xi²), or including interaction terms.
• We still need to be aware of "strange" observations, i.e. outliers and influential points.