
A Casual Tutorial on Sample Size Planning for Multiple Regression Models

D. Keith Williams, M.P.H., Ph.D.
Department of Biostatistics

[Figures: area in the rejection region (power) by noncentrality value: 1.00 gives Area = 0.16; 2.00 gives Area = 0.47; 3.00 gives Area = 0.81; 3.87 gives Area = 0.955.]

Buzzwords

• Beta (β) = P(Type II error) = P(Conclude the experimental groups are the same when they really are different)

• Power = 1 - β = P(Conclude the experimental groups are different when they really are!)

The Noncentrality Parameter: Two-Group t-test

$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$

An Example Scenario

• Alpha = 0.05, sigma = 2

• |mu1 – mu2| = 2, that is, a two-unit difference in population means

• Propose n1 = 10 and n2 = 10

$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{2}{2\sqrt{\frac{1}{10} + \frac{1}{10}}} = 2.236$$

[Figure: rejection region for the two-tailed t-test, alpha = 0.05, df = 18]

Noncentrality value = 2.236; critical value = |2.101|.
Table B.5 (alpha = 0.05, df = 18): the noncentrality value lies between 2.0 and 3.0, so power lies between 0.47 and 0.81.
SAS calculation: power = 0.56195.
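As a rough cross-check of the 0.56 figure, the example scenario can be simulated. The sketch below is a Monte Carlo approximation in Python, not the noncentral t calculation that SAS performs. The function names, the seed, and the number of replications are illustrative assumptions; the critical value 2.101 is taken from the slide.

```python
import math
import random


def noncentrality(mu_diff, sigma, n1, n2):
    """Noncentrality parameter for the two-group t-test."""
    return abs(mu_diff) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


def sample_mean_var(xs):
    """Return the sample mean and the unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


def simulated_power(mu_diff=2.0, sigma=2.0, n1=10, n2=10,
                    t_crit=2.101, reps=20000, seed=2008):
    """Estimate two-sided pooled t-test power by simulation (assumed setup)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        g1 = [rng.gauss(0.0, sigma) for _ in range(n1)]
        g2 = [rng.gauss(mu_diff, sigma) for _ in range(n2)]
        m1, v1 = sample_mean_var(g1)
        m2, v2 = sample_mean_var(g2)
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        t = (m2 - m1) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
        if abs(t) > t_crit:  # two-tailed rejection region, df = 18
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(round(noncentrality(2.0, 2.0, 10, 10), 3))  # 2.236
    print(simulated_power())  # should land near the SAS value of about 0.56
```

The simulated proportion carries Monte Carlo error of a few thousandths at 20,000 replications, so it should agree with the SAS value only to about two decimals.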


The Key Point of the Review

• One conjectures the difference in means to estimate power in studies that compare means.

• In regression models, one conjectures the difference in R-square between a model that includes predictors of interest and a model without these predictors.


Regression Power and Sample Size

• Power for specific predictors in the presence of other covariates in a model.

• More complex to conceptualize than testing differences among means.


Example Data Set

The Hypothetical Scenario: A Model with 4 Terms

Predictors for PSA of interest that we choose to power:

1. SVI
2. c_volume

Two covariates to be included: cpen, gleason

Approaches in Estimating the Parameters to Calculate Power

Plan A
• Complete specification of the parts for the expression:

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}}$$

Details

The full model:

$$y = \beta_0 + \beta_1\,\mathrm{SVI} + \beta_2\,\mathrm{C\_vol} + \beta_3\,\mathrm{cpen} + \beta_4\,\mathrm{gleason}$$

The reduced model:

$$y = \beta_0 + \beta_3\,\mathrm{cpen} + \beta_4\,\mathrm{gleason}$$

We want to power the test that a model with these 2 predictors is statistically better than a model excluding them.

Full Model

Root MSE          30.98987     R-Square    0.4467
Dependent Mean    23.73013     Adj R-Sq    0.4226
Coeff Var         130.59291

Note: c_volume and svi are the predictors of interest.

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     -40.76878             33.24420          -1.23      0.2232
c_volume     1     2.02821               0.58404           3.47       0.0008
svi          1     17.85690              10.75049          1.66       0.1001
cpen         1     1.10381               1.32538           0.83       0.4071
gleason      1     6.39294               5.02522           1.27       0.2065

Reduced Model

Root MSE          33.42074     R-Square    0.3424
Dependent Mean    23.73013     Adj R-Sq    0.3285
Coeff Var         140.83671

Note: R-Square difference = 0.45 – 0.34 = 0.11

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     -71.59827             34.91893          -2.05      0.0431
cpen         1     4.82868               1.01632           4.75       <.0001
gleason      1     12.28661              5.19873           2.36       0.0202

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 .977 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                  Exact
Model                                   Fixed X
Number of Predictors in Full Model      4
Number of Test Predictors               2
Alpha                                   0.05
R-square of Full Model                  0.45
Difference in R-square                  0.11

Computed Power

Index    N Total    Power
1        97         0.979
2        80         0.949
3        70         0.916
4        60         0.864
5        50         0.787
6        40         0.677
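For readers without SAS, the fixed-X calculation behind this table can be sketched in plain Python. This is an illustration under stated assumptions, not PROC POWER's own code: the noncentrality is taken as N × (R-square difference) / (1 − R-square full), the degrees of freedom as (number of test predictors, N − number of full predictors − 1), and the noncentral F distribution is evaluated as a Poisson mixture of incomplete beta functions. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction, applied in turn.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2, lam=0.0):
    """CDF of the F distribution; lam > 0 gives the noncentral F."""
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    weight = math.exp(-lam / 2.0)  # Poisson(lam / 2) probability of j = 0
    total, cum, j = 0.0, 0.0, 0
    while cum < 1.0 - 1e-12 and j < 1000:
        total += weight * reg_inc_beta(d1 / 2.0 + j, d2 / 2.0, x)
        cum += weight
        j += 1
        weight *= (lam / 2.0) / j
    return total


def f_critical(alpha, d1, d2):
    """Upper-alpha critical value of the central F distribution by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def multreg_power(n_total, n_full, n_test, rsq_full, rsq_diff, alpha=0.05):
    """Fixed-X power for testing n_test predictors in a model with n_full."""
    d1, d2 = n_test, n_total - n_full - 1
    lam = n_total * rsq_diff / (1.0 - rsq_full)
    return 1.0 - f_cdf(f_critical(alpha, d1, d2), d1, d2, lam)


if __name__ == "__main__":
    for n in (97, 80, 70, 60, 50, 40):
        print(n, round(multreg_power(n, 4, 2, 0.45, 0.11), 3))
```

The printed values can be compared with the Computed Power column above; small differences in the last digit would point to a difference in numerical method rather than in the formula.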

[Figure: power versus total sample size (40 to 100) for R-square difference 0.11. Power 0.8 is crossed at N = 51.45 and power 0.98 at N = 95.14.]

Great, but I don’t have a dataset

Use the Correlation Matrix

Pearson Correlation Coefficients, N = 97
(Prob > |r| under H0: Rho=0 is <.0001 for every off-diagonal entry)

            psa        c_volume   svi        cpen       gleason
psa         1.00000    0.62415    0.52862    0.55079    0.42958
c_volume    0.62415    1.00000    0.58174    0.69290    0.48144
svi         0.52862    0.58174    1.00000    0.68028    0.42857
cpen        0.55079    0.69290    0.68028    1.00000    0.46157
gleason     0.42958    0.48144    0.42857    0.46157    1.00000

Piece 1: Correlation of Y with All Predictors

            c_volume   svi        cpen       gleason
psa         0.62415    0.52862    0.55079    0.42958

Piece 2: Correlation of All Predictors with Each Other

            c_volume   svi        cpen       gleason
c_volume    1.00000    0.58174    0.69290    0.48144
svi         0.58174    1.00000    0.68028    0.42857
cpen        0.69290    0.68028    1.00000    0.46157
gleason     0.48144    0.42857    0.46157    1.00000

Piece 3: Correlation of Y with Reduced Model Predictors

            cpen       gleason
psa         0.55079    0.42958

Piece 4: Correlation of All Reduced Predictors with Each Other

            cpen       gleason
cpen        1.00000    0.46157
gleason     0.46157    1.00000

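Matrix Arithmetic with Correlation Matrix

$$R^2_{Full} = \begin{pmatrix} .6 & .5 & .6 & .4 \end{pmatrix}
\begin{pmatrix} 1 & .6 & .7 & .5 \\ .6 & 1 & .7 & .4 \\ .7 & .7 & 1 & .5 \\ .5 & .4 & .5 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} .6 \\ .5 \\ .6 \\ .4 \end{pmatrix} = 0.45$$

$$R^2_{Reduced} = \begin{pmatrix} .6 & .4 \end{pmatrix}
\begin{pmatrix} 1 & .5 \\ .5 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} .6 \\ .4 \end{pmatrix} = 0.34$$

$$R^2_{Full} - R^2_{Reduced} = 0.45 - 0.34 = 0.11$$

(The matrix entries are shown rounded to one decimal; the results 0.45 and 0.34 come from the unrounded correlations.)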

Hold on, we will find out how to do this arithmetic later
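In the meantime, the arithmetic can be previewed with a small standard-library Python sketch. It computes R-square as r_yx' × inverse(R_xx) × r_yx by solving a linear system rather than forming the inverse. The helper names are hypothetical, and the inputs are the unrounded PSA correlations from the slides above, in the order c_volume, svi, cpen, gleason.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def r_squared(r_yx, r_xx):
    """R^2 = r_yx' * inverse(r_xx) * r_yx."""
    weights = solve(r_xx, r_yx)
    return sum(r * w for r, w in zip(r_yx, weights))


# Piece 1 and Piece 2: correlations for the full model.
R_YX_FULL = [0.62415, 0.52862, 0.55079, 0.42958]
R_XX_FULL = [
    [1.00000, 0.58174, 0.69290, 0.48144],
    [0.58174, 1.00000, 0.68028, 0.42857],
    [0.69290, 0.68028, 1.00000, 0.46157],
    [0.48144, 0.42857, 0.46157, 1.00000],
]
# Piece 3 and Piece 4: keep only cpen and gleason for the reduced model.
R_YX_RED = R_YX_FULL[2:]
R_XX_RED = [row[2:] for row in R_XX_FULL[2:]]

r2_full = r_squared(R_YX_FULL, R_XX_FULL)
r2_red = r_squared(R_YX_RED, R_XX_RED)

if __name__ == "__main__":
    print(round(r2_full, 4), round(r2_red, 4), round(r2_full - r2_red, 4))
```

The two results should sit close to the R-Square values reported in the Full Model and Reduced Model regression output (0.4467 and 0.3424), which is the point of the slide: the correlation matrix alone is enough to recover them.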

Different Rsquare Reductions

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11 .10 .09 .08
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 .977 crossref=yes);
run;

[Figure: power versus total sample size (40 to 100), one curve per R-square Diff (0.11, 0.1, 0.09, 0.08). Crossing values printed on the plot: 51.45, 56.25, 62.11, and 69.44 at the 0.8 reference line; 95.14 at the 0.98 reference line.]

Matrix Arithmetic with Compound Correlation Matrix

$$R^2_{Full} = \begin{pmatrix} .35 & .35 & .2 & .2 \end{pmatrix}
\begin{pmatrix} 1 & .2 & .2 & .2 \\ .2 & 1 & .2 & .2 \\ .2 & .2 & 1 & .2 \\ .2 & .2 & .2 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} .35 \\ .35 \\ .2 \\ .2 \end{pmatrix} = 0.22$$

$$R^2_{Reduced} = \begin{pmatrix} .2 & .2 \end{pmatrix}
\begin{pmatrix} 1 & .2 \\ .2 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} .2 \\ .2 \end{pmatrix} = 0.07$$

$$R^2_{Full} - R^2_{Reduced} = 0.22 - 0.07 = 0.15$$

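proc iml;
  %let phi = 0.35;   /* correlation of Y with each predictor of interest */
  %let rx  = 0.2;    /* common correlation among the predictors; also reused below as the correlation of Y with each covariate */

  phi_yx_full = {&phi, &phi, .2, .2};
  rxx_full = {1   &rx &rx &rx,
              &rx 1   &rx &rx,
              &rx &rx 1   &rx,
              &rx &rx &rx 1};

  phi_yx_red = {&rx, &rx};
  rxx_red = {1   &rx,
             &rx 1};

  /* R-square = (correlations of Y with X)` * inverse(correlations among X) * (correlations of Y with X) */
  r2_full = (phi_yx_full)` * (rxx_full**(-1)) * (phi_yx_full);
  r2_red  = phi_yx_red` * rxx_red**(-1) * phi_yx_red;

  r2diff  = r2_full - r2_red;
  partial = (r2diff / (1 - r2_red))**.5;

  print r2_full r2_red r2diff partial;
run;
quit;

R2_FULL      R2_RED       R2DIFF       PARTIAL
0.2171875    0.0666667    0.1505208    0.4015873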
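Because the conjectured matrix is exchangeable (every off-diagonal entry equals the same value), the printed results can also be checked by hand. The following is a sketch of that check; the symbols k (number of predictors), r (the vector of Y-with-X correlations), ρ (the common correlation), I (identity matrix), and J (matrix of ones) are notation introduced here, not taken from the slides.

```latex
% Inverse of a k x k exchangeable correlation matrix R = (1-\rho) I + \rho J
R^{-1} = \frac{1}{1-\rho}\left[ I - \frac{\rho}{1+(k-1)\rho}\, J \right]
\quad\Longrightarrow\quad
R^2 = r' R^{-1} r
    = \frac{1}{1-\rho}\left[ \sum_i r_i^2 - \frac{\rho \left(\sum_i r_i\right)^2}{1+(k-1)\rho} \right]

% Full model: k = 4, \rho = 0.2, r = (.35, .35, .2, .2)
R^2_{Full} = \frac{1}{0.8}\left[ 0.325 - \frac{0.2 \times 1.21}{1.6} \right]
           = 1.25 \times 0.17375 = 0.2171875

% Reduced model: k = 2, \rho = 0.2, r = (.2, .2)
R^2_{Reduced} = \frac{1}{0.8}\left[ 0.08 - \frac{0.2 \times 0.16}{1.2} \right]
              = 1.25 \times 0.0533333 = 0.0666667

% Difference and multiple partial correlation
R^2_{Full} - R^2_{Reduced} = 0.1505208, \qquad
\sqrt{\frac{0.1505208}{1 - 0.0666667}} \approx 0.4016
```

These agree with the R2_FULL, R2_RED, R2DIFF, and PARTIAL values printed by the IML step.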

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.22
    rsqdiff = 0.15 .16
    ntotal = 40 50 60 70
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                  Exact
Model                                   Fixed X
Number of Predictors in Full Model      4
Number of Test Predictors               2
Alpha                                   0.05
R-square of Full Model                  0.22

Computed Power

Index    R-square Diff    N Total    Power
1        0.15             40         0.659
2        0.15             50         0.770
3        0.15             60         0.850
4        0.15             70         0.905
5        0.16             40         0.689
6        0.16             50         0.798
7        0.16             60         0.873
8        0.16             70         0.923

[Figure: power versus total sample size (40 to 100). Curves: R-square Diff = 0.15 and R-square Diff = 0.16. Crossing values printed at the 0.8 reference line: 53.38 and 50.27.]

Plan B

• Specify the typical value of the multiple partial correlation coefficient between Y and X.

• The multiple partial correlation coefficient describes the overall relationship between Y and 2 or more predictors, controlling for still other variables.

Using Our Example

• Say that we conjecture the partial correlation between our Y and the X’s of interest.

• For our example this value was 0.408.

Recall the R-square difference between the full and reduced models:

$$\sqrt{\frac{R^2_{full} - R^2_{red}}{1 - R^2_{red}}} = \sqrt{\frac{0.45 - 0.34}{1 - 0.34}} = 0.408$$

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = .408 .35
    ntotal = 97 80 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=.8 .85 .977 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                  Exact
Model                                   Fixed X
Number of Predictors in Full Model      4
Number of Test Predictors               2
Alpha                                   0.05

Computed Power

Index    Partial Corr    N Total    Power
1        0.408           97         0.979
2        0.408           80         0.949
3        0.408           60         0.864
4        0.408           50         0.787
5        0.408           40         0.677
6        0.350           97         0.910
7        0.350           80         0.843
8        0.350           60         0.713
9        0.350           50         0.623
10       0.350           40         0.514

Note: the rule of thumb n = 4 × 10 = 40 (10 per predictor) underpowers the study.
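The rows for partial correlation 0.408 reproduce the Plan A powers exactly, and a short derivation shows why. This is a sketch: λ for the noncentrality parameter and ρ for the multiple partial correlation are notation assumed here, following the Plan A expression.

```latex
% Definition of the squared multiple partial correlation
\rho^2 = \frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Reduced}}
\quad\Longrightarrow\quad
1 - \rho^2 = \frac{1 - R^2_{Full}}{1 - R^2_{Reduced}}

% Dividing the two gives the ratio used in the noncentrality parameter
\frac{\rho^2}{1 - \rho^2} = \frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}}
\quad\Longrightarrow\quad
\lambda = N\,\frac{\rho^2}{1 - \rho^2}

% Example: \rho = 0.408, N = 97
\lambda = 97 \times \frac{0.1665}{0.8335} \approx 19.4
```

So specifying the partial correlation alone (Plan B) carries the same information for the power calculation as specifying both R-square values (Plan A).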

[Figure: power versus total sample size (40 to 100). Curves as labeled in the plot legend: Partial Corr = 0.408 and Partial Corr = 0.36. Reference lines at power 0.8, 0.85, and 0.98; crossing values printed on the plot: 51.51, 57.92, 67.92, 76.55, and 95.27.]

Plan C
Use the Table from Gatsonis and Sampson (1989)

U: the number of predictors of interest = 2
p: the total number of predictors in the model = 4
N = table value + p + 1
For 80% power, N = 72 + 4 + 1 = 77

Proc Power and the Table

proc power;
  multreg
    model = random
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = .35 .40
    ntotal = 77
    power = .;
  plot x=n min=60 max=120
    key = oncurves
    yopts=(ref=.8 .90 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                  Exact
Model                                   Random X
Number of Predictors in Full Model      4
Number of Test Predictors               2
Alpha                                   0.05
Total Sample Size                       77

Computed Power

Index    Partial Corr    Power
1        0.35            0.802
2        0.40            0.908

[Figure: power versus total sample size (60 to 120). Curves: Partial Corr = 0.35 and Partial Corr = 0.4. Reference lines at power 0.8 and 0.9; crossing values printed on the plot: 75.02, 76.73, and 99.27.]

Comments

• Power and sample size planning is ‘tricky.’

• The rule of n = 10 for each predictor will almost always underpower a study.

• Plan A or B using the matrix multiplication is likely the best. One can specify regular correlations instead of partial correlations.

• This talk was developed with fixed effects; arguably one should plan for random effects unless the study is an experiment. SAS can easily calculate this. The Gatsonis tables provide power for random-effect settings. (Usually the n’s are close.)

Further Work for Somebody

• A corresponding multiple logistic regression approach, that is, powering more than one predictor of interest with additional covariates in the model.

An Algorithm for Estimating Power and Sample Size for Logistic Models with One or More Independent Variables of Interest

Jay Northern
D. Keith Williams, PhD
Zoran Bursac, PhD

Joint Statistical Meetings, Denver, CO
August 3 – August 7, 2008

Background

• Existing tools are based on the Hsieh, Bloch, and Larsen (1998) paper and the Agresti (1996) text:
  – PASS
  – %powerlog macro

Macro Details

• Fit the full and the reduced model.
  – In the reduced model one can exclude one or more covariates of interest in order to test them simultaneously in the presence of other covariates.

• Perform the likelihood ratio test with the appropriate chi-square critical value, based on the correct number of degrees of freedom.

Results

[Figure: power (0 to 100) versus N (sample size: 50, 75, 100, 150). Legend: (1.5,1,1,1,1,1); rho=0.1 / (2,2,1,1,1,1); rho=0 / (2,2,1,1,1,1); rho=0.1]

End

Plan C
Exchangeable Matrix in Plan A

$$R_{xx}^{Full} = \begin{pmatrix} 1 & .2 & .2 & .2 \\ .2 & 1 & .2 & .2 \\ .2 & .2 & 1 & .2 \\ .2 & .2 & .2 & 1 \end{pmatrix},
\qquad
\rho_{xy}^{Full\,\prime} = \begin{pmatrix} .35 & .35 & .2 & .2 & .2 \end{pmatrix}$$

$$\rho_{xy}^{Reduced\,\prime} = \begin{pmatrix} .2 & .2 & .2 & .2 & .2 \end{pmatrix},
\qquad
R_{xx}^{Reduced} = \begin{pmatrix} 1 & \cdot & \cdot & \cdot & \cdot \\ .2 & 1 & \cdot & \cdot & \cdot \\ .2 & .2 & 1 & \cdot & \cdot \\ .2 & .2 & .2 & 1 & \cdot \\ .2 & .2 & .2 & .2 & 1 \end{pmatrix}$$

Pearson Correlation Coefficients, N = 97

            psa        c_volume   svi        cpen       gleason    c_wt       age        bph
psa         1.00000    0.62415    0.52862    0.55079    0.42958    0.02621    0.01720    -0.01649
c_volume    0.62415    1.00000    0.58174    0.69290    0.48144    0.00511    0.03909    -0.13321
svi         0.52862    0.58174    1.00000    0.68028    0.42857    -0.00241   0.11766    -0.11955
cpen        0.55079    0.69290    0.68028    1.00000    0.46157    0.00158    0.09956    -0.08301
gleason     0.42958    0.48144    0.42857    0.46157    1.00000    -0.02421   0.22585    0.02683
c_wt        0.02621    0.00511    -0.00241   0.00158    -0.02421   1.00000    0.16432    0.32185
age         0.01720    0.03909    0.11766    0.09956    0.22585    0.16432    1.00000    0.36634
bph         -0.01649   -0.13321   -0.11955   -0.08301   0.02683    0.32185    0.36634    1.00000

Full Correlation Matrix

psa c_volume svi cpen gleason c_wt age bph

psa 1 0.624151 0.528619 0.550793 0.42958 0.026213 0.017199 -0.01649

c_volume 0.624151 1 0.581742 0.692897 0.481438 0.005107 0.039094 -0.13321

svi 0.528619 0.581742 1 0.680284 0.428573 -0.00241 0.117658 -0.11955

cpen 0.550793 0.692897 0.680284 1 0.461566 0.001579 0.099555 -0.08301

gleason 0.42958 0.481438 0.428573 0.461566 1 -0.02421 0.225852 0.026826

c_wt 0.026213 0.005107 -0.00241 0.001579 -0.02421 1 0.164324 0.321849

age 0.017199 0.039094 0.117658 0.099555 0.225852 0.164324 1 0.366341

bph -0.01649 -0.13321 -0.11955 -0.08301 0.026826 0.321849 0.366341 1

The Correlation of Y with All X’s: Full Model

            c_volume   svi        cpen       gleason    c_wt       age        bph
psa         0.624151   0.528619   0.550793   0.42958    0.026213   0.017199   -0.01649

Correlation Matrix of X’s: Full Model

            c_volume   svi        cpen       gleason    c_wt       age        bph
c_volume    1          0.581742   0.692897   0.481438   0.005107   0.039094   -0.13321
svi         0.581742   1          0.680284   0.428573   -0.00241   0.117658   -0.11955
cpen        0.692897   0.680284   1          0.461566   0.001579   0.099555   -0.08301
gleason     0.481438   0.428573   0.461566   1          -0.02421   0.225852   0.026826
c_wt        0.005107   -0.00241   0.001579   -0.02421   1          0.164324   0.321849
age         0.039094   0.117658   0.099555   0.225852   0.164324   1          0.366341
bph         -0.13321   -0.11955   -0.08301   0.026826   0.321849   0.366341   1

The Correlation of Y with All X’s: Reduced Model

            cpen       gleason    c_wt       age        bph
psa         0.550793   0.42958    0.026213   0.017199   -0.01649

Correlation Matrix of X’s: Reduced Model

            cpen       gleason    c_wt       age        bph
cpen        1          0.461566   0.001579   0.099555   -0.08301
gleason     0.461566   1          -0.02421   0.225852   0.026826
c_wt        0.001579   -0.02421   1          0.164324   0.321849
age         0.099555   0.225852   0.164324   1          0.366341
bph         -0.08301   0.026826   0.321849   0.366341   1

Regular Correlations Versus Partial Correlations

3 Variables: psa c_volume svi

Pearson Correlation Coefficients, N = 97
(Prob > |r| under H0: Rho=0 in parentheses)

            psa                 c_volume            svi
psa         1.00000             0.62415 (<.0001)    0.52862 (<.0001)
c_volume    0.62415 (<.0001)    1.00000             0.58174 (<.0001)
svi         0.52862 (<.0001)    0.58174 (<.0001)    1.00000

5 Partial Variables: cpen gleason c_wt age bph
3 Variables: psa c_volume svi

Pearson Partial Correlation Coefficients, N = 97
(Prob > |r| under H0: Partial Rho=0 in parentheses)

            psa                 c_volume            svi
psa         1.00000             0.36564 (0.0003)    0.23248 (0.0257)
c_volume    0.36564 (0.0003)    1.00000             0.16518 (0.1156)
svi         0.23248 (0.0257)    0.16518 (0.1156)    1.00000
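The partial correlations can be recovered from the ordinary correlation matrix alone, which is useful when only published correlations are available. The sketch below, in standard-library Python, adjusts the correlations among psa, c_volume, and svi for the five partial variables and then rescales. The helper names are hypothetical, and the input is the 8 × 8 Pearson matrix shown earlier in the deck, rounded to five decimals.

```python
import math

# Order: psa, c_volume, svi, cpen, gleason, c_wt, age, bph
CORR = [
    [1.00000, 0.62415, 0.52862, 0.55079, 0.42958, 0.02621, 0.01720, -0.01649],
    [0.62415, 1.00000, 0.58174, 0.69290, 0.48144, 0.00511, 0.03909, -0.13321],
    [0.52862, 0.58174, 1.00000, 0.68028, 0.42857, -0.00241, 0.11766, -0.11955],
    [0.55079, 0.69290, 0.68028, 1.00000, 0.46157, 0.00158, 0.09956, -0.08301],
    [0.42958, 0.48144, 0.42857, 0.46157, 1.00000, -0.02421, 0.22585, 0.02683],
    [0.02621, 0.00511, -0.00241, 0.00158, -0.02421, 1.00000, 0.16432, 0.32185],
    [0.01720, 0.03909, 0.11766, 0.09956, 0.22585, 0.16432, 1.00000, 0.36634],
    [-0.01649, -0.13321, -0.11955, -0.08301, 0.02683, 0.32185, 0.36634, 1.00000],
]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def partial_correlations(corr, keep, control):
    """Correlations among `keep` after partialling out `control`."""
    r_zz = [[corr[i][j] for j in control] for i in control]
    # For each kept variable b, solve R_ZZ * w = r_Zb once.
    weights = {b: solve(r_zz, [corr[z][b] for z in control]) for b in keep}
    # Adjusted (conditional) matrix: r_ab - r_aZ * inverse(R_ZZ) * r_Zb.
    cond = [[corr[a][b] - sum(corr[a][z] * w
                              for z, w in zip(control, weights[b]))
             for b in keep] for a in keep]
    k = len(keep)
    return [[cond[i][j] / math.sqrt(cond[i][i] * cond[j][j])
             for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Keep psa, c_volume, svi; control for cpen, gleason, c_wt, age, bph.
    result = partial_correlations(CORR, keep=[0, 1, 2], control=[3, 4, 5, 6, 7])
    for row in result:
        print([round(v, 4) for v in row])
```

The off-diagonal entries should come out close to the SAS partial correlations above (0.36564, 0.23248, 0.16518), up to the rounding of the input matrix.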


Correlation Matrix

Obs psa c_volume svi cpen gleason c_wt age bph

1 1.00000 0.62415 0.52862 0.55079 0.42958 0.02621 0.01720 -0.01649

2 0.62415 1.00000 0.58174 0.69290 0.48144 0.00511 0.03909 -0.13321

3 0.52862 0.58174 1.00000 0.68028 0.42857 -0.00241 0.11766 -0.11955

4 0.55079 0.69290 0.68028 1.00000 0.46157 0.00158 0.09956 -0.08301

5 0.42958 0.48144 0.42857 0.46157 1.00000 -0.02421 0.22585 0.02683

6 0.02621 0.00511 -0.00241 0.00158 -0.02421 1.00000 0.16432 0.32185

7 0.01720 0.03909 0.11766 0.09956 0.22585 0.16432 1.00000 0.36634

8 -0.01649 -0.13321 -0.11955 -0.08301 0.02683 0.32185 0.36634 1.00000

Highlighted regions of the matrix:
• Full Rxy
• Reduced Rxy
• X’s of interest
• Covariates in reduced model Rxx


The Gold Standard Approach: Some Matrix Algebra

=0.35


The Calculations

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}} = 97 \times \frac{0.25 - 0.11}{1 - 0.25} = 18.05$$

(The value 18.05 is computed from the unrounded R-square values 0.2505682 and 0.1111111.)

Power = 0.97

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 7
    ntestpredictors = 2
    rsqfull = 0.2505682
    rsqreduced = 0.1111111   /* the slide shows rsqdiff=, but the output below reports this value as the reduced-model R-square */
    ntotal = 50 60 70 80 97
    power = .;
  plot x=n min=60 max=100
    key = oncurves
    yopts=(ref=.8 .85 .9 .95 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                  Exact
Model                                   Fixed X
Number of Predictors in Full Model      7
Number of Test Predictors               2
Alpha                                   0.05
R-square of Full Model                  0.250568
R-square of Reduced Model               0.111111

Computed Power

Index    N Total    Power
1        50         0.753
2        60         0.836
3        70         0.894
4        80         0.933
5        97         0.970

[Figure: power versus total sample size (60 to 100). Reference lines at power 0.85, 0.9, and 0.95; crossing values printed on the plot: 62.09, 71.35, and 86.28.]

Pearson Correlation Coefficients, N = 97
Prob > |r| under H0: Rho=0

psa c_volume svi cpen gleason

psa 1.00000 .35 .35 .2 .2

c_volume .35 1.00000 .2 .2 .2

svi .35 .2 1.00000 .2 .2

cpen .2 .2 .2 1.00000 .2

gleason .2 .2 .2 .2 1.00000


Calculations

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}}$$

$$\text{Power} = P\left[\,F(p_1,\ N - p - 1,\ \lambda) \ge F_{1-\alpha}(p_1,\ N - p - 1)\,\right]$$

p1: the number of predictors of interest = 2

p: the total number of predictors in the model = 4

Approaches in Estimating the Parameters to Calculate Power

Plan A
• Complete specification of the parts for the expression:

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}} = 97 \times \frac{0.45 - 0.34}{1 - 0.45} = 19.4$$

where R-square (reduced) = 0.34 and R-square (full) = 0.45.

[Figure: densities of the central F(2,92) distribution and the noncentral F(2,92,19.4) distribution, plotted over 0 to 50. Critical value for alpha = .05: 3.07. Noncentrality parameter: 19.4. Power = 0.97, the total area shaded in blue.]