6. Multiple regression - PROC GLM
Karl B Christensen
http://192.38.117.59/~kach/SAS
Contents
Analysis of covariance (ANCOVA)
The general linear model
Interaction
Multiple regression
Automatic variable selection
Data example: lung capacity
Data from 32 patients subject to a heart/lung transplantation. TLC (Total Lung Capacity) is determined from whole-body plethysmography. Are men and women different with respect to total lung capacity?
OBS SEX AGE HEIGHT TLC
1 F 35 149 3.40
2 F 11 138 3.41
3 M 12 148 3.80
. . . . .
. . . . .
29 F 20 162 8.05
30 M 25 180 8.10
31 M 22 173 8.70
32 M 25 171 9.45
Box plots for comparison of sex groups
PROC GPLOT DATA=TLCdata;
PLOT tlc*sex / HAXIS=AXIS1 VAXIS=AXIS2;
AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET =(6,6)CM;
AXIS2 LABEL=(H=3 A=90) VALUE =(H=2);
SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;
(Figure: box plots of TLC for each sex.)
Group comparisons
Using t-tests [1]
PROC TTEST DATA=TLCdata;
CLASS sex;
VAR tlc height;
RUN;
Output
T-Tests
Variable Method Variances DF t Value Pr > |t|
TLC Pooled Equal 30 -3.67 0.0009
TLC Satterthwaite Unequal 29.7 -3.67 0.0009
Height Pooled Equal 30 -3.73 0.0008
Height Satterthwaite Unequal 29.5 -3.73 0.0008
There is a clear sex difference for TLC as well as for Height.
[1] Note that we can specify more than one variable in the VAR statement.
Confounding when comparing groups
Occurs if the distributions of some other relevant explanatory variables differ between the groups. Here “relevant” means things we would have liked to be the same (or at least very similar) for everybody, because we think of it as noise or distortion.
Can be reduced by performing a regression analysis with the relevant variables as covariates.
Confounding could be a problem in the current example, if we intended to compare the lung function between men and women of similar height.
Relation between TLC and HEIGHT
PROC GPLOT DATA=TLCdata;
PLOT tlc*height=sex / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1;
AXIS1 LABEL=(H=4) VALUE =(H=3) MINOR=NONE;
AXIS2 LABEL=(A=90 H=4) VALUE=(H=3) ORDER =(3 TO 10) MINOR=NONE;
SYMBOL1 C=RED V=DOT H=2 I=SM75S L=1 W=3 MODE=INCLUDE;
SYMBOL2 C=BLUE V=CIRCLE H=2 I=SM75S L=41 W=3 MODE=INCLUDE;
LEGEND1 LABEL =(H=2.5) VALUE =(H=2 JUSTIFY=LEFT);
RUN; QUIT;
(Figure: TLC plotted against Height, with a regression line for each sex. Plotted using I=RL.)
Analysis of covariance
Comparison of parallel regression lines
Model: $y_{gi} = \alpha_g + \beta x_{gi} + \varepsilon_{gi}$, $g = 1, 2$; $i = 1, \dots, n_g$
Here $\alpha_2 - \alpha_1$ is the expected difference in the response between the two groups for a fixed value of the covariate, that is, when comparing any two subjects who have the same value of (match on) the covariate $x$ (“adjusted for $x$”).
But what if the lines are not parallel?
More general model: $y_{gi} = \alpha_g + \beta_g x_{gi} + \varepsilon_{gi}$
If $\beta_1 \neq \beta_2$ there is an interaction between Height and Sex.
Interaction
Interaction between Height and Sex
The effect of height depends on sex
The difference between men and women depends on height
Model with interaction
PROC REG only handles quantitative covariates (it has no CLASS statement). Group variables can be handled directly in PROC GLM by specifying the group variable as a CLASS variable.
PROC GLM DATA=TLCdata;
CLASS sex;
MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;
The option SOLUTION is needed if we want to see the regression parameter estimates.
PROC GLM output
The GLM Procedure
Class Level Information
Class Levels Values
Sex 2 F M
Number of Observations Read 32
Number of Observations Used 32
Dependent Variable: TLC
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 3 42.81845030 14.27281677 10.28 <.0001
Error 28 38.89354970 1.38905535
Corrected Total 31 81.71200000
R-Square Coeff Var Root MSE TLC Mean
0.524017 19.36069 1.178582 6.087500
PROC GLM output
Source DF Type I SS Mean Square F Value Pr > F
Sex 1 25.31161250 25.31161250 18.22 0.0002
Height 1 17.48233164 17.48233164 12.59 0.0014
Height*Sex 1 0.02450616 0.02450616 0.02 0.8953
Source DF Type III SS Mean Square F Value Pr > F
Sex 1 0.07951043 0.07951043 0.06 0.8127
Height 1 17.36061701 17.36061701 12.50 0.0014
Height*Sex 1 0.02450616 0.02450616 0.02 0.8953
The interaction is not significant.
The Type III p-values for the two main effects should never be used for anything in a model including the interaction!
PROC GLM output
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept -5.827971333 B 4.97706299 -1.17 0.2515
Sex F -1.727664141 B 7.22116113 -0.24 0.8127
Sex M 0.000000000 B . . .
Height 0.073564647 B 0.02854339 2.58 0.0155
Height*Sex F 0.005743619 B 0.04324220 0.13 0.8953
Height*Sex M 0.000000000 B . . .
These are the regression parameters
Where are the two lines in the output?
Line for males (the reference group):
TLC = −5.828 + 0.07356 × Height
Line for females:
TLC = −5.828 + (−1.728) + (0.07356 + 0.00574) × Height
= −7.556 + 0.07930 × Height
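The two lines can also be requested directly with ESTIMATE statements. This is a minimal sketch rather than code from the slides: the quoted labels are illustrative, and the coefficient order for sex (F first, then M) assumes the level ordering shown under Class Level Information.

```sas
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
  /* Coefficients for sex and sex*height are given in level order: F, M */
  ESTIMATE 'Intercept, females' INTERCEPT 1 sex 1 0;
  ESTIMATE 'Slope, females'     height 1 sex*height 1 0;
  ESTIMATE 'Intercept, males'   INTERCEPT 1 sex 0 1;
  ESTIMATE 'Slope, males'       height 1 sex*height 0 1;
RUN; QUIT;
```

Each ESTIMATE row should correspond to one of the hand calculations above, now with a standard error and a t-test attached.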
Same model, new parameterization
PROC GLM DATA=TLCdata;
CLASS sex;
MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;
Output (edited)
Standard
Parameter Estimate Error t Value Pr > |t|
Sex F -7.555635475 5.23201797 -1.44 0.1598
Sex M -5.827971333 4.97706299 -1.17 0.2515
Height*Sex F 0.079308266 0.03248326 2.44 0.0212
Height*Sex M 0.073564647 0.02854339 2.58 0.0155
Two different parameterizations
PROC GLM DATA=TLCdata;
CLASS sex;
MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;
The (extrapolated) level at Height=0 for the reference group
The (extrapolated) difference between the groups at Height=0
The effect of Height (the slope) for the reference group
The difference between the slopes for the two sexes
PROC GLM DATA=TLCdata;
CLASS sex;
MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;
The (extrapolated) level at Height=0 for each group
The effect of Height (the slope) for each group
Model without interaction
There is no indication of interaction, so we omit the term.
PROC GLM DATA=TLCdata;
CLASS sex;
MODEL tlc=sex height / SOLUTION CLPARM;
RUN; QUIT;
Are there also other possible parameterizations in this model? (And which one should we use?)
Source DF Type I SS Mean Square F Value Pr > F
Sex 1 25.31161250 25.31161250 18.86 0.0002
Height 1 17.48233164 17.48233164 13.03 0.0011
Source DF Type III SS Mean Square F Value Pr > F
Sex 1 3.24523555 3.24523555 2.42 0.1308
Height 1 17.48233164 17.48233164 13.03 0.0011
Note: The effect of sex seen in the group comparison has disappeared!
Model without interaction - results
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept -6.263569903 B 3.67983781 -1.70 0.0994
Sex F -0.770859760 B 0.49571132 -1.56 0.1308
Sex M 0.000000000 B . . .
Height 0.076067188 0.02107532 3.61 0.0011
Parameter 95% Confidence Limits
Intercept -13.78968328 1.262543472
Sex F -1.784703236 0.242983716
Sex M . .
Height 0.032963311 0.119171065
Confounding?
In this example it seems that
1. The observed difference in lung capacity between men and women can be explained by height differences.
2. However, there may still be a sex difference for persons of the same height (women vs. men), estimated as −0.77 with 95% confidence limits (−1.78, 0.24), roughly −0.77 ± 2 × 0.50.
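The exact limits printed by CLPARM use a t-quantile rather than the factor 2. As a worked check, assuming the tabulated value $t_{0.975,\,29} \approx 2.045$ (the error degrees of freedom are 32 − 3 = 29):

```latex
\[
\hat\beta_{\text{Sex F}} \pm t_{0.975,\,29}\cdot \mathrm{SE}
  = -0.771 \pm 2.045 \times 0.496
  = -0.771 \pm 1.014
  = (-1.785,\; 0.243)
\]
```

This agrees with the 95% Confidence Limits for Sex F in the output up to rounding.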
But...
what if we did not have the two very short men to pull the line for the men? Let us look at the subjects above 152 cm (using the statement WHERE height>152; in PROC GLM). Test of interaction:
Source DF Type I SS Mean Square F Value Pr > F
Sex 1 22.10748067 22.10748067 19.92 0.0002
Height 1 0.25519165 0.25519165 0.23 0.6361
Height*Sex 1 2.76108429 2.76108429 2.49 0.1284
Estimated additive effects
Parameter Estimate 95% Confidence Limits Pr > |t|
Intercept 10.53318707 B -3.47974795 24.54612210 0.1339
Sex F -2.04829071 B -3.40956535 -0.68701607 0.0048
Sex M 0.00000000 B . . .
Height -0.01778053 -0.09665451 0.06109345 0.6459
A somewhat different conclusion...
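The restricted analysis can be sketched as follows, assuming the same data set TLCdata as before. The first step tests the interaction and the second fits the additive model. The slides show only the output, so the exact options used there are an assumption.

```sas
/* Interaction model, restricted to subjects taller than 152 cm */
PROC GLM DATA=TLCdata;
  WHERE height>152;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;

/* Additive model on the same subset, with confidence limits */
PROC GLM DATA=TLCdata;
  WHERE height>152;
  CLASS sex;
  MODEL tlc=sex height / SOLUTION CLPARM;
RUN; QUIT;
```

The WHERE statement filters the observations before the model is fitted, so no new data set is needed.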
Plots for model checking in the HTML output:
ODS GRAPHICS ON;
PROC GLM DATA=TLCdata PLOTS=(DIAGNOSTICS RESIDUALS(SMOOTH));
CLASS sex;
MODEL tlc=sex height sex*height / SOLUTION;
OUTPUT OUT=WithResid RSTUDENT=NormResidWithoutCurrent;
RUN; QUIT;
PROC GPLOT DATA=WithResid;
PLOT NormResidWithoutCurrent * sex;
SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;
In addition to the ODS GRAPHICS plots for PROC GLM, residuals should be plotted against each of the CLASS variables (here sex) in order to check variance homogeneity.
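If SAS/GRAPH is not available, the same check can be sketched with PROC SGPLOT. This is an alternative to the PROC GPLOT step, not the code used on the slides; it assumes the data set WithResid created by the OUTPUT statement.

```sas
PROC SGPLOT DATA=WithResid;
  /* One box of studentized residuals per sex */
  VBOX NormResidWithoutCurrent / CATEGORY=sex;
  /* Horizontal reference line at zero */
  REFLINE 0 / AXIS=Y;
RUN;
```

Boxes of clearly different height would suggest that the residual variance differs between the sexes.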
Exercise: Another look at the Juul data
1. Get the data into SAS using a libname statement.
2. Create a new data set including only individuals above 25 years, and make a new variable with log-transformed SIGF1.
3. Use PROC GPLOT to plot the relationship between age and log-transformed SIGF-I.
4. Make separate regression lines for men and women.
5. Do a regression analysis to explore if slopes are equal in men and women.
6. Give an estimate for the difference in slopes, with 95% confidence interval.
Multiple regression. General linear model (GLM).
Data: n sets of observations, made on the same ’unit’:
unit  x1  .... xp   y
1     x11 .... x1p  y1
2     x21 .... x2p  y2
3     x31 .... x3p  y3
.     .   .... .    .
n     xn1 .... xnp  yn
The linear regression model with p explanatory variables (covariates) is written:
$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
Interpretation of regression coefficients
Model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i$
where $\varepsilon_i \sim N(0, \sigma^2)$. Consider two subjects:
A has covariate values $(X_1, X_2, \dots, X_p)$
B has covariate values $(X_1 + 1, X_2, \dots, X_p)$
Expected difference in the response (B − A):
$[\beta_0 + \beta_1 (X_1 + 1) + \beta_2 X_2 + \cdots] - [\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots] = \beta_1$
This means that $\beta_1$ is the effect of one unit’s difference in $X_1$ for fixed levels of the other variables $(X_2, \dots, X_p)$.
School-age obesity data
School-age obesity score versus height and weight measured at 1 year of age
Obs Obesity Height1 Weight1
1 -0.06967 79 11.70
2 -0.79982 72 9.55
3 2.67337 76 9.95
. . . .
. . . .
196 0.47968 78 10.60
197 -0.61818 77 10.10
(Figure: plot of the school-age obesity data.)
SAS code
PROC REG DATA=SchoolObesity;
MODEL Obesity = Height1 Weight1 / CLB;
RUN; QUIT;
(part of the) output
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.18668 1.16769 0.16 0.8731
Height1 1 -0.06644 0.02163 -3.07 0.0024
Weight1 1 0.47653 0.07097 6.71 <.0001
Parameter Estimates
Variable DF 95% Confidence Limits
Intercept 1 -2.11631 2.48967
Height1 1 -0.10910 -0.02379
Weight1 1 0.33656 0.61650
Interpretation of regression parameters
Remember that βj is the effect of the j’th explanatory variable, corrected for the effect of the other explanatory variables, that is, when comparing any two subjects who match on all the other variables.
The effect of Height1 corrected for the effect of Weight1 is found to be β̂1 = −0.066 (95% CI: −0.109 to −0.024), p = 0.0024.
In the univariate model without correction for Weight1 we got β̂1 = +0.048 (95% CI: +0.019 to +0.077), p = 0.0014.
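The unadjusted estimate quoted here comes from a model with Height1 as the only covariate. The slides do not show that code; a minimal sketch, assuming the same data set SchoolObesity, would be:

```sas
PROC REG DATA=SchoolObesity;
  /* Height1 alone: no correction for Weight1 */
  MODEL Obesity = Height1 / CLB;
RUN; QUIT;
```

Comparing the Height1 row of this output with the adjusted fit shows the change of sign directly.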
Interpretation of regression parameters
The parameter for height answers two different questions depending on whether or not it is adjusted for weight:
Unadj. ’Are big 1-year-old children generally fatter during school age?’
Adj. ’Are slim 1-year-old children generally slimmer during school age?’
Both questions are relevant and both answers are valid!
Relative effects and products or ratios of covariates
Both issues are solved by log-transforming the covariate(s)!
Example: $\mathrm{BMI} = \mathrm{Weight}/\mathrm{Height}^2$ is a ratio measure. Logarithmic rules give
$\log(\mathrm{BMI}) = \log(\mathrm{Weight}) - 2\log(\mathrm{Height})$
so $\beta\log(\mathrm{BMI}) = \beta\log(\mathrm{Weight}) - 2\beta\log(\mathrm{Height})$
Choice of log-transformation of covariates
Use of log10 means that the regression parameter shows the effect of two subjects differing by a factor 10. Do not use log10 unless it is likely for two subjects to differ by a factor 10!
Use log2 [SAS code: LOG2(·)] when doubling is likely.
Use a covariate calculated as
XX=LOG(·)/LOG(1.1)
if 10% differences are likely.
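To see why dividing by LOG(1.1) gives a “per 10%” scale, compare two subjects A and B whose covariate values differ by 10%, $x_B = 1.1\,x_A$ (the names A and B are used here only for illustration):

```latex
\[
\frac{\log(x_B)}{\log(1.1)} - \frac{\log(x_A)}{\log(1.1)}
  = \frac{\log(1.1\,x_A) - \log(x_A)}{\log(1.1)}
  = \frac{\log(1.1)}{\log(1.1)}
  = 1
\]
```

So a 10% difference in the original covariate is exactly one unit of XX, and the regression parameter for XX is the effect of a 10% difference.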
Is BMI at age 1 an appropriate predictor for school-age obesity?
1. BMI is a ratio measure involving weight and height, so we should investigate log-transformed weight and height.
2. Doubling is not a realistic difference, so we look at “per 10%”.
DATA School1;
SET SchoolObesity;
HeightPer10pct = LOG(Height1)/LOG(1.1);
WeightPer10pct = LOG(Weight1)/LOG(1.1);
RUN;
PROC REG DATA=School1;
MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
TestBMI: TEST HeightPer10pct = -2*WeightPer10pct;
RUN; QUIT;
Part of output
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| 95% Conf. Limits
Intercept 1 9.80370 5.95742 1.65 0.1015 -1.94593 21.55334
HeightPer10pct 1 -0.45993 0.15673 -2.93 0.0037 -0.76904 -0.15082
WeightPer10pct 1 0.45679 0.06714 6.80 <.0001 0.32437 0.58922
Test TestBMI Results for Dependent Variable Obesity
Mean
Source DF Square F Value Pr > F
Numerator 1 15.72218 20.34 <.0001
Denominator 194 0.77312
Conclusion:
1. 10% higher weight increases the expected school-age obesity score by 0.457 (95% CI: 0.324 to 0.589).
2. 10% lower height increases the expected school-age obesity score by 0.460 (95% CI: 0.151 to 0.769).
3. BMI at age 1 year is not an appropriate choice: the hypothesis that the height coefficient equals −2 times the weight coefficient is rejected (p < 0.0001).
4. Since the regression parameters for the log-transformed weight and height are of the same size, but with opposite signs, an appropriate predictor would be the ratio weight/height.
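The suggestion in the last point can itself be tested in the same way as BMI. As a sketch that is not taken from the slides: the ratio weight/height corresponds to the two log-covariates having coefficients that sum to zero. The label TestRatio is a hypothetical name.

```sas
PROC REG DATA=School1;
  MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
  /* Ratio weight/height: coefficient for height = -(coefficient for weight) */
  TestRatio: TEST HeightPer10pct + WeightPer10pct = 0;
RUN; QUIT;
```

A large p-value for TestRatio would be consistent with using weight/height as a single predictor.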
Model selection
Lung function - 25 patients with cystic fibrosis [2]
[2] O’Neill et al (1983).
Which covariates have a univariate effect on the outcome PEmax?
Are these the variables to be included in the model?
Model with all covariates
PROC REG DATA=pemax;
MODEL pemax=age sex height weight bmp fev1 rv frc tlc;
RUN; QUIT;
Output
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 176.05821 225.89116 0.78 0.4479
age 1 -2.54196 4.80170 -0.53 0.6043
sex 1 -3.73678 15.45982 -0.24 0.8123
height 1 -0.44625 0.90335 -0.49 0.6285
weight 1 2.99282 2.00796 1.49 0.1568
bmp 1 -1.74494 1.15524 -1.51 0.1517
fev1 1 1.08070 1.08095 1.00 0.3333
rv 1 0.19697 0.19621 1.00 0.3314
frc 1 -0.30843 0.49239 -0.63 0.5405
tlc 1 0.18860 0.49974 0.38 0.7112
No significant effects...
Automatic variable selection: Forward selection
Start with no covariates. In every step, add the most significant variable.
PROC REG DATA=pemax;
MODEL pemax=age sex height weight bmp fev1 rv frc tlc
/ SELECTION=FORWARD;
RUN; QUIT;
Final model: Weight BMP FEV1
Automatic variable selection: Backward elimination
Start with all covariates. At each step, omit the least significant variable.
PROC REG DATA=pemax;
MODEL pemax=age sex height weight bmp fev1 rv frc tlc
/ SELECTION=BACKWARD;
RUN; QUIT;
Final model: Weight BMP FEV1
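The significance levels used for entering and removing variables can be stated explicitly with the MODEL options SLENTRY= and SLSTAY=. The value 0.05 below is an assumption chosen for illustration; the slides do not say which thresholds produced the final models, so the selected variables may differ.

```sas
PROC REG DATA=pemax;
  /* Forward selection: a variable enters only if its p-value is below 0.05 */
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc
        / SELECTION=FORWARD SLENTRY=0.05;
RUN; QUIT;

PROC REG DATA=pemax;
  /* Backward elimination: a variable stays only if its p-value is below 0.05 */
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc
        / SELECTION=BACKWARD SLSTAY=0.05;
RUN; QUIT;
```

Stating the thresholds in the code makes the selection reproducible and easier to report.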
But...
There is no guarantee that these automatic methods will give us the same result:
Had observation no. 25 not been in the data set, backward elimination would have excluded Height as the first variable, while forward selection would have included Height as the first variable!
A ’best’ automatic method has not been identified, but backward elimination is often recommended over forward selection.
WARNING: Output from the selected model does not take model selection uncertainty into account: The output (regression coefficients and p-values) is identical to what would have been obtained had we fitted the final model without doing any model selection. The importance of the selected covariates is over-estimated!
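To see what the warning means in practice, the selected model can be fitted directly. A sketch, assuming the final model Weight BMP FEV1 reported for both selection methods:

```sas
PROC REG DATA=pemax;
  /* Fit only the covariates kept by the automatic selection */
  MODEL pemax=weight bmp fev1 / CLB;
RUN; QUIT;
```

The estimates and p-values from this step are the ones the selection procedure reports for its final model, with no adjustment for the search that led to it.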