
A Casual Tutorial on Sample Size Planning for Multiple Regression Models

D. Keith Williams, M.P.H., Ph.D.
Department of Biostatistics

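[Figure: power shown as the shaded area beyond the critical value for four noncentrality values. Noncentrality 1.00: area = 0.16; 2.00: area = 0.47; 3.00: area = 0.81; 3.87: area = 0.955.]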

Buzzwords

• Beta (β) = P(Type II error) = P(Conclude the experimental groups are the same when they really are different)

• Power = 1 − β = P(Conclude experimental groups are different when they really are!)

The Noncentrality Parameter
Two-Group t-test

$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

$$df = n_1 + n_2 - 2$$

An Example Scenario

• Alpha =0.05, sigma=2

• |mu1 – mu2| = 2, that is, a two-unit difference between the population means

• Propose n1 = 10 and n2 = 10

$$\delta = \frac{|2|}{2\sqrt{\frac{1}{10} + \frac{1}{10}}} = 2.236$$

Rejection region for two-tailed t-test, alpha = 0.05, df = 18

Noncentrality value = 2.236, critical value = |2.101|
Table B.5: values between 2.0 and 3.0, alpha = 0.05, df = 18
Power between 0.47 and 0.81; SAS calculation: 0.56195
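The table lookup above can also be checked directly. The following DATA step is a minimal sketch, not the author's own code: it assumes the power is taken from the noncentral t distribution using the Base SAS functions TINV and PROBT, with the scenario values from the example. The result should come out close to the 0.56195 quoted above.

```sas
/* Sketch: power of the two-sided, two-group t-test (values from the example) */
data _null_;
  alpha = 0.05;  sigma = 2;  diff = 2;
  n1 = 10;  n2 = 10;
  df    = n1 + n2 - 2;                              /* 18 */
  delta = abs(diff) / (sigma * sqrt(1/n1 + 1/n2));  /* noncentrality value */
  tcrit = tinv(1 - alpha/2, df);                    /* two-tailed critical value */
  /* P(|T| > tcrit) when T is noncentral t with noncentrality delta */
  power = 1 - probt(tcrit, df, delta) + probt(-tcrit, df, delta);
  put delta= tcrit= power=;
run;
```

The values are written to the SAS log; changing n1 and n2 shows how quickly the power moves with group size.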

The Key Point of the Review

• One conjectures the difference in means to estimate power in studies that compare means.

• In regression models, one conjectures the difference in R-square between a model that includes predictors of interest and a model without these predictors.

Regression Power and Sample Size

• Power for specific predictors in the presence of other covariates in a model.

• More complex to conceptualize than testing differences among means.

Example Data Set

The Hypothetical Scenario
A model with 4 terms

Predictors for PSA of interest that we choose to power:

1. SVI
2. c_volume

Two covariates to be included: cpen, gleason

Approaches in Estimating the Parameters to Calculate Power

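Plan A
• Complete specification of the parts for the expression:

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}}$$

Details

The full model:

$$y = \beta_0 + \beta_1\,SVI + \beta_2\,C\_vol + \beta_3\,cpen + \beta_4\,gleason$$

The reduced model:

$$y = \beta_0 + \beta_3\,cpen + \beta_4\,gleason$$

We want to power the test that a model with these 2 predictors is statistically better than a model excluding them.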

Full Model

Root MSE          30.98987    R-Square    0.4467
Dependent Mean    23.73013    Adj R-Sq    0.4226
Coeff Var        130.59291

Note the R-Square. The predictors of interest are c_volume and svi.

Parameter Estimates

Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1        -40.76878           33.24420       -1.23     0.2232
c_volume      1          2.02821            0.58404        3.47     0.0008
svi           1         17.85690           10.75049        1.66     0.1001
cpen          1          1.10381            1.32538        0.83     0.4071
gleason       1          6.39294            5.02522        1.27     0.2065

Reduced Model

Root MSE          33.42074    R-Square    0.3424
Dependent Mean    23.73013    Adj R-Sq    0.3285
Coeff Var        140.83671

Note the R-Square difference: 0.45 – 0.34 = 0.11

Parameter Estimates

Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1        -71.59827           34.91893       -2.05     0.0431
cpen          1          4.82868            1.01632        4.75     <.0001
gleason       1         12.28661            5.19873        2.36     0.0202

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 .977 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements

Method                                 Exact
Model                                  Fixed X
Number of Predictors in Full Model     4
Number of Test Predictors              2
Alpha                                  0.05
R-square of Full Model                 0.45
Difference in R-square                 0.11

Computed Power

Index   N Total   Power
1       97        0.979
2       80        0.949
3       70        0.916
4       60        0.864
5       50        0.787
6       40        0.677

[Figure: power (0.65 to 1.00) versus total sample size (40 to 100). The reference lines at power 0.8 and 0.98 cross the curve at N = 51.45 and N = 95.14.]
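Instead of reading the required N off the plot, PROC POWER can be asked to solve for it. This is a small sketch using the same scenario as above; the only change is that `ntotal` is set to missing and target power values are supplied. The target of 0.9 is an illustrative choice, not from the slides. PROC POWER reports a whole-number N, so the answer for 0.8 should land just above the 51.45 crossing shown in the plot.

```sas
/* Sketch: solve for the total sample size rather than the power */
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11
    ntotal = .          /* missing value: solve for N */
    power = 0.8 0.9;    /* target power values (0.9 is illustrative) */
run;
```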

Great, but I don’t have a dataset

Pearson Correlation Coefficients, N = 97
Prob > |r| under H0: Rho=0 (every off-diagonal p-value is <.0001)

            psa       c_volume   svi       cpen      gleason
psa         1.00000   0.62415    0.52862   0.55079   0.42958
c_volume    0.62415   1.00000    0.58174   0.69290   0.48144
svi         0.52862   0.58174    1.00000   0.68028   0.42857
cpen        0.55079   0.69290    0.68028   1.00000   0.46157
gleason     0.42958   0.48144    0.42857   0.46157   1.00000

Use the Correlation Matrix

Piece 1: Correlation of Y with All Predictors

            c_volume   svi       cpen      gleason
psa         0.62415    0.52862   0.55079   0.42958

Piece 2: Correlation of All Predictors with Each Other

            c_volume   svi       cpen      gleason
c_volume    1.00000    0.58174   0.69290   0.48144
svi         0.58174    1.00000   0.68028   0.42857
cpen        0.69290    0.68028   1.00000   0.46157
gleason     0.48144    0.42857   0.46157   1.00000

Piece 3: Correlation of Y with Reduced Model Predictors

            cpen      gleason
psa         0.55079   0.42958

Piece 4: Correlation of All Reduced Predictors with Each Other

            cpen      gleason
cpen        1.00000   0.46157
gleason     0.46157   1.00000

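Matrix Arithmetic with Correlation Matrix

(Entries are displayed rounded to one decimal; the results come from the unrounded correlations.)

$$R^2_{Full} = \begin{bmatrix} .6 & .5 & .6 & .4 \end{bmatrix} \begin{bmatrix} 1 & .6 & .7 & .5 \\ .6 & 1 & .7 & .4 \\ .7 & .7 & 1 & .5 \\ .5 & .4 & .5 & 1 \end{bmatrix}^{-1} \begin{bmatrix} .6 \\ .5 \\ .6 \\ .4 \end{bmatrix} = 0.45$$

$$R^2_{Reduced} = \begin{bmatrix} .6 & .4 \end{bmatrix} \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix}^{-1} \begin{bmatrix} .6 \\ .4 \end{bmatrix} = 0.34$$

$$R^2_{Full} - R^2_{Reduced} = 0.45 - 0.34 = 0.11$$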

Hold on, we will find out how to do this arithmetic later.
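As a preview, the four pieces of the correlation matrix can be plugged straight into PROC IML. This is a sketch under the assumption that the unrounded Pearson correlations reported earlier are used; the matrix names are illustrative. Up to rounding, the two R-square values should match those in the regression output (0.4467 and 0.3424).

```sas
/* Sketch: R-square of the full and reduced models from the correlation pieces */
proc iml;
  /* Piece 1: correlations of psa with c_volume, svi, cpen, gleason */
  r_yx_full = {0.62415, 0.52862, 0.55079, 0.42958};
  /* Piece 2: correlations among the four predictors */
  rxx_full = {1.00000 0.58174 0.69290 0.48144,
              0.58174 1.00000 0.68028 0.42857,
              0.69290 0.68028 1.00000 0.46157,
              0.48144 0.42857 0.46157 1.00000};
  /* Piece 3: correlations of psa with cpen, gleason */
  r_yx_red = {0.55079, 0.42958};
  /* Piece 4: correlation between cpen and gleason */
  rxx_red = {1.00000 0.46157,
             0.46157 1.00000};

  r2_full = r_yx_full` * inv(rxx_full) * r_yx_full;
  r2_red  = r_yx_red`  * inv(rxx_red)  * r_yx_red;
  r2diff  = r2_full - r2_red;
  print r2_full r2_red r2diff;
quit;
```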

Different R-square Reductions

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11 .10 .09 .08
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 .977 crossref=yes);
run;

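[Figure: power (0.5 to 1.0) versus total sample size (40 to 100), one curve per R-square difference (0.11, 0.10, 0.09, 0.08). The 0.8 reference line crosses the curves at N = 51.45, 56.25, 62.11, and 69.44; the 0.98 reference line crosses the R-square Diff = 0.11 curve at N = 95.14.]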




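Matrix Arithmetic with Compound Correlation Matrix

$$R^2_{Full} = \begin{bmatrix} .35 & .35 & .2 & .2 \end{bmatrix} \begin{bmatrix} 1 & .2 & .2 & .2 \\ .2 & 1 & .2 & .2 \\ .2 & .2 & 1 & .2 \\ .2 & .2 & .2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} .35 \\ .35 \\ .2 \\ .2 \end{bmatrix} = 0.22$$

$$R^2_{Reduced} = \begin{bmatrix} .2 & .2 \end{bmatrix} \begin{bmatrix} 1 & .2 \\ .2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} .2 \\ .2 \end{bmatrix} = 0.07$$

$$R^2_{Full} - R^2_{Reduced} = 0.22 - 0.07 = 0.15$$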

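%let phi = 0.35;   /* correlation of y with each predictor of interest */
%let rx  = 0.2;    /* common correlation among the predictors */

proc iml;
  phi_yx_full = {&phi, &phi, .2, .2};
  rxx_full = {1   &rx &rx &rx,
              &rx 1   &rx &rx,
              &rx &rx 1   &rx,
              &rx &rx &rx 1};

  /* covariates correlate .2 with y, which here equals &rx */
  phi_yx_red = {&rx, &rx};
  rxx_red = {1   &rx,
             &rx 1};

  r2_full = phi_yx_full` * inv(rxx_full) * phi_yx_full;
  r2_red  = phi_yx_red`  * inv(rxx_red)  * phi_yx_red;

  r2diff  = r2_full - r2_red;
  partial = sqrt(r2diff / (1 - r2_red));   /* multiple partial correlation */

  print r2_full r2_red r2diff partial;
quit;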

R2_FULL R2_RED R2DIFF PARTIAL

0.2171875 0.0666667 0.1505208 0.4015873

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.22
    rsqdiff = 0.15 .16
    ntotal = 40 50 60 70
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 crossref=yes);
run;



Method                                 Exact
Model                                  Fixed X
Number of Predictors in Full Model     4
Number of Test Predictors              2
Alpha                                  0.05
R-square of Full Model                 0.22

Computed Power

Index   R-square Diff   N Total   Power
1       0.15            40        0.659
2       0.15            50        0.770
3       0.15            60        0.850
4       0.15            70        0.905
5       0.16            40        0.689
6       0.16            50        0.798
7       0.16            60        0.873
8       0.16            70        0.923

[Figure: power (0.65 to 1.00) versus total sample size (40 to 100) for R-square differences 0.15 and 0.16. The 0.8 reference line crosses the curves at N = 53.38 and N = 50.27.]



Plan B

• Specify the typical value of the multiple partial correlation coefficient between Y and X.

• The multiple partial correlation coefficient describes the overall relationship between Y and 2 or more predictors, controlling for still other variables.

Using Our Example

• Say that we conjecture that the partial correlation between our Y and X’s of interest is:

$$R_{partial} = \sqrt{\frac{R^2_{full} - R^2_{red}}{1 - R^2_{red}}}$$

• For our example this value was 0.408

Recall the R-square difference in the full and reduced models:

$$\sqrt{\frac{0.45 - 0.34}{1 - 0.34}} = 0.408$$
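A short derivation, offered as a sketch rather than something taken from the slides, shows why specifying the partial correlation (Plan B) leads to the same noncentrality as specifying the two R-square values (Plan A). Here $\rho_p$ denotes the multiple partial correlation and $\lambda$ the noncentrality parameter of the F test.

```latex
\rho_p^2 = \frac{R^2_{full} - R^2_{red}}{1 - R^2_{red}}
\quad\Longrightarrow\quad
\frac{\rho_p^2}{1 - \rho_p^2} = \frac{R^2_{full} - R^2_{red}}{1 - R^2_{full}}

\lambda = N\,\frac{\rho_p^2}{1 - \rho_p^2}
        \approx 97 \times \frac{0.408^2}{1 - 0.408^2}
        \approx 97 \times 0.20
        \approx 19.4
```

This is why a partial correlation of 0.408 with N = 97 gives the same power (0.979) as the Plan A run with R-square full = 0.45 and R-square difference = 0.11.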

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = .408 .35
    ntotal = 97 80 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=.8 .85 .977 crossref=yes);
run;



Method                                 Exact
Model                                  Fixed X
Number of Predictors in Full Model     4
Number of Test Predictors              2
Alpha                                  0.05

Computed Power

Index   Partial Corr   N Total   Power
1       0.408          97        0.979
2       0.408          80        0.949
3       0.408          60        0.864
4       0.408          50        0.787
5       0.408          40        0.677
6       0.350          97        0.910
7       0.350          80        0.843
8       0.350          60        0.713
9       0.350          50        0.623
10      0.350          40        0.514

Note: n = 4*10 = 40 (10 per predictor) underpowers the study.

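[Figure: power (0.5 to 1.0) versus total sample size (40 to 100) for partial correlations 0.408 and 0.35, with reference lines at power 0.8, 0.85, and 0.98. Crossing points labelled on the plot: 51.51, 57.92, 67.92, 76.55, 95.27.]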


Plan C
Use the Table from Gatsonis and Sampson (1989)

U: the number of predictors of interest = 2
p: the total number of predictors in the model = 4
N = table value + p + 1
For 80% power, N = 72 + 4 + 1 = 77

Proc Power and the Table

proc power;
  multreg
    model = random
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = .35 .40
    ntotal = 77
    power = .;
  plot x=n min=60 max=120
    key = oncurves
    yopts=(ref=.8 .90 crossref=yes);
run;



Method                                 Exact
Model                                  Random X
Number of Predictors in Full Model     4
Number of Test Predictors              2
Alpha                                  0.05
Total Sample Size                      77

Computed Power

Index   Partial Corr   Power
1       0.35           0.802
2       0.40           0.908

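[Figure: power (0.65 to 1.00) versus total sample size (60 to 120) for partial correlations 0.35 and 0.40. The 0.8 reference line crosses the 0.35 curve at N = 76.73; the 0.9 reference line crosses the curves at N = 99.27 (0.35) and N = 75.02 (0.40).]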



Comments

• Power and sample size planning is ‘tricky.’

• The n = 10 for each predictor rule will almost always underpower a study.

• Plan A or B using the matrix multiplication is likely the best. One can specify regular correlations instead of partial correlations.

• This talk was developed with fixed effects. Arguably one should plan for random effects unless the study is an experiment. SAS can easily calculate this. The Gatsonis tables provide power for random-effect settings (usually the n’s are close).

Further Work for Somebody

• A corresponding multiple logistic regression approach, that is, powering more than one predictor of interest with additional covariates in the model.

An Algorithm for Estimating Power and Sample Size for Logistic Models with One or More Independent Variables of Interest

Jay Northern

D. Keith Williams, PhD

Zoran Bursac, PhD

Joint Statistical Meetings, Denver, CO
August 3 – August 7, 2008

Background

• Existing tools are based on the Hsieh, Block, and Larsen (1998) paper and the Agresti (1996) text.
  – PASS
  – %powerlog macro

Macro Details

• Fit the full and the reduced model
  – In the reduced model one can exclude one or more covariates of interest in order to test them simultaneously in the presence of other covariates

• Perform the likelihood ratio test with the appropriate chi-square critical value based on the correct number of degrees of freedom

Results

[Figure: power (0 to 100) versus N (sample size: 50, 75, 100, 150) for three scenarios: (1.5,1,1,1,1,1); rho=0.1, (2,2,1,1,1,1); rho=0, and (2,2,1,1,1,1); rho=0.1.]

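Plan C
Exchangeable Matrix in Plan A

$$R_{xx}^{Full} = \begin{bmatrix} 1 & .2 & .2 & .2 \\ .2 & 1 & .2 & .2 \\ .2 & .2 & 1 & .2 \\ .2 & .2 & .2 & 1 \end{bmatrix}$$

$$\rho'^{\,Full}_{xy} = \begin{bmatrix} .35 & .35 & .2 & .2 & .2 \end{bmatrix}$$

$$\rho'^{\,Reduced}_{xy} = \begin{bmatrix} .2 & .2 & .2 & .2 & .2 \end{bmatrix}$$

$$R_{xx}^{Reduced} = \begin{bmatrix} 1 & .2 & .2 & .2 & .2 \\ \cdot & 1 & .2 & .2 & .2 \\ \cdot & \cdot & 1 & .2 & .2 \\ \cdot & \cdot & \cdot & 1 & .2 \\ \cdot & \cdot & \cdot & \cdot & 1 \end{bmatrix}$$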

Pearson Correlation Coefficients, N = 97

            psa        c_volume   svi        cpen       gleason    c_wt       age        bph
psa         1.00000    0.62415    0.52862    0.55079    0.42958    0.02621    0.01720   -0.01649
c_volume    0.62415    1.00000    0.58174    0.69290    0.48144    0.00511    0.03909   -0.13321
svi         0.52862    0.58174    1.00000    0.68028    0.42857   -0.00241    0.11766   -0.11955
cpen        0.55079    0.69290    0.68028    1.00000    0.46157    0.00158    0.09956   -0.08301
gleason     0.42958    0.48144    0.42857    0.46157    1.00000   -0.02421    0.22585    0.02683
c_wt        0.02621    0.00511   -0.00241    0.00158   -0.02421    1.00000    0.16432    0.32185
age         0.01720    0.03909    0.11766    0.09956    0.22585    0.16432    1.00000    0.36634
bph        -0.01649   -0.13321   -0.11955   -0.08301    0.02683    0.32185    0.36634    1.00000

Full Correlation Matrix


psa 1 0.624151 0.528619 0.550793 0.42958 0.026213 0.017199 -0.01649

c_volume 0.624151 1 0.581742 0.692897 0.481438 0.005107 0.039094 -0.13321

svi 0.528619 0.581742 1 0.680284 0.428573 -0.00241 0.117658 -0.11955

cpen 0.550793 0.692897 0.680284 1 0.461566 0.001579 0.099555 -0.08301

gleason 0.42958 0.481438 0.428573 0.461566 1 -0.02421 0.225852 0.026826

c_wt 0.026213 0.005107 -0.00241 0.001579 -0.02421 1 0.164324 0.321849

age 0.017199 0.039094 0.117658 0.099555 0.225852 0.164324 1 0.366341

bph -0.01649 -0.13321 -0.11955 -0.08301 0.026826 0.321849 0.366341 1

The Correlation of Y with All X’s (Full Model)


psa 1 0.624151 0.528619 0.550793 0.42958 0.026213 0.017199 -0.01649

Correlation Matrix of X’s (Full Model)

c_volume 1 0.581742 0.692897 0.481438 0.005107 0.039094 -0.13321

svi 0.581742 1 0.680284 0.428573 -0.00241 0.117658 -0.11955

cpen 0.692897 0.680284 1 0.461566 0.001579 0.099555 -0.08301

gleason 0.481438 0.428573 0.461566 1 -0.02421 0.225852 0.026826

c_wt 0.005107 -0.00241 0.001579 -0.02421 1 0.164324 0.321849

age 0.039094 0.117658 0.099555 0.225852 0.164324 1 0.366341

bph -0.13321 -0.11955 -0.08301 0.026826 0.321849 0.366341 1

The Correlation of Y with All X’s (Reduced Model)


psa 0.550793 0.42958 0.026213 0.017199 -0.01649

Correlation Matrix of X’s (Reduced Model)

cpen 1 0.461566 0.001579 0.099555 -0.08301

gleason 0.461566 1 -0.02421 0.225852 0.026826

c_wt 0.001579 -0.02421 1 0.164324 0.321849

age 0.099555 0.225852 0.164324 1 0.366341

bph -0.08301 0.026826 0.321849 0.366341 1

Regular Correlations Versus Partial Correlations

3 Variables: psa c_volume svi

Pearson Correlation Coefficients, N = 97
Prob > |r| under H0: Rho=0 (shown in parentheses)

            psa                c_volume           svi
psa         1.00000            0.62415 (<.0001)   0.52862 (<.0001)
c_volume    0.62415 (<.0001)   1.00000            0.58174 (<.0001)
svi         0.52862 (<.0001)   0.58174 (<.0001)   1.00000

5 Partial Variables: cpen gleason c_wt age bph
3 Variables: psa c_volume svi

Pearson Partial Correlation Coefficients, N = 97
Prob > |r| under H0: Partial Rho=0 (shown in parentheses)

            psa                c_volume           svi
psa         1.00000            0.36564 (0.0003)   0.23248 (0.0257)
c_volume    0.36564 (0.0003)   1.00000            0.16518 (0.1156)
svi         0.23248 (0.0257)   0.16518 (0.1156)   1.00000
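Listings like the two above can be requested from PROC CORR, whose PARTIAL statement names the variables to be partialled out. The following is a sketch only: the data set name `prostate` is a hypothetical placeholder, since the slides do not name the data set.

```sas
/* "prostate" is a hypothetical data set name; substitute your own */

/* Regular (marginal) correlations */
proc corr data=prostate;
  var psa c_volume svi;
run;

/* Partial correlations, controlling for the five covariates */
proc corr data=prostate;
  var psa c_volume svi;
  partial cpen gleason c_wt age bph;
run;
```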

Correlation Matrix

Obs psa c_volume svi cpen gleason c_wt age bph

1 1.00000 0.62415 0.52862 0.55079 0.42958 0.02621 0.01720 -0.01649

2 0.62415 1.00000 0.58174 0.69290 0.48144 0.00511 0.03909 -0.13321

3 0.52862 0.58174 1.00000 0.68028 0.42857 -0.00241 0.11766 -0.11955

4 0.55079 0.69290 0.68028 1.00000 0.46157 0.00158 0.09956 -0.08301

5 0.42958 0.48144 0.42857 0.46157 1.00000 -0.02421 0.22585 0.02683

6 0.02621 0.00511 -0.00241 0.00158 -0.02421 1.00000 0.16432 0.32185

7 0.01720 0.03909 0.11766 0.09956 0.22585 0.16432 1.00000 0.36634

8 -0.01649 -0.13321 -0.11955 -0.08301 0.02683 0.32185 0.36634 1.00000

[Slide annotations on the matrix mark: the full Rxy, the reduced Rxy, the X’s of interest, and the covariates in the reduced-model Rxx.]

The Gold Standard Approach
Some Matrix Algebra

Correlation matrix of the full-model predictors:

            c_volume   svi       cpen      gleason
c_volume    1.00000    0.58174   0.69290   0.48144
svi         0.58174    1.00000   0.68028   0.42857
cpen        0.69290    0.68028   1.00000   0.46157
gleason     0.48144    0.42857   0.46157   1.00000

Correlation matrix of the reduced-model predictors:

            cpen      gleason
cpen        1.00000   0.46157
gleason     0.46157   1.00000

Correlation of psa with the full-model predictors:

            c_volume   svi       cpen      gleason
psa         0.62415    0.52862   0.55079   0.42958

Correlation of psa with the reduced-model predictors:

            cpen      gleason
psa         0.55079   0.42958


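The Calculations

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}} = 97 \times \frac{0.25 - 0.11}{1 - 0.25} = 18.05$$

(The R-square values are displayed rounded; 18.05 comes from the unrounded values.)

Power = 0.97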

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 7
    ntestpredictors = 2
    rsqfull = 0.2505682
    rsqreduced = 0.1111111
    ntotal = 50 60 70 80 97
    power = .;
  plot x=n min=60 max=100
    key = oncurves
    yopts=(ref=.8 .85 .9 .95 crossref=yes);
run;



Method                                 Exact
Model                                  Fixed X
Number of Predictors in Full Model     7
Number of Test Predictors              2
Alpha                                  0.05
R-square of Full Model                 0.250568
R-square of Reduced Model              0.111111

Computed Power

Index   N Total   Power
1       50        0.753
2       60        0.836
3       70        0.894
4       80        0.933
5       97        0.970

[Figure: power (0.825 to 0.975) versus total sample size (60 to 100). The reference lines at power 0.85, 0.9, and 0.95 cross the curve at N = 62.09, 71.35, and 86.28.]



            psa       c_volume   svi       cpen      gleason
psa         1.00000   .35        .35       .2        .2
c_volume    .35       1.00000    .2        .2        .2
svi         .35       .2         1.00000   .2        .2
cpen        .2        .2         .2        1.00000   .2
gleason     .2        .2         .2        .2        1.00000


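Calculations

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}}$$

$$Power = P\left[F(p_1,\; N-p-1,\; \lambda) \ge F_{1-\alpha}(p_1,\; N-p-1)\right]$$

$p_1$: the number of predictors of interest = 2

$p$: the total number of predictors in the model = 4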



expression:

$$\lambda = N\,\frac{R^2_{Full} - R^2_{Reduced}}{1 - R^2_{Full}} = 97 \times \frac{0.45 - 0.34}{1 - 0.45} = 19.4$$

$R^2_{Reduced}$ = 0.34

$R^2_{Full}$ = 0.45



[Figure: density curves (vertical axis 0.0 to 1.0, horizontal axis 0 to 50) of the central F(2,92) and the noncentral F(2,92,19.4) distributions. Critical value for alpha = .05: 3.07. Noncentrality parameter: 19.4. Total area in blue: power = 0.97.]
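The calculation pictured above can be sketched in a DATA step. This is an illustration rather than the author's code: it assumes the power is the upper-tail area of the noncentral F distribution beyond the central F critical value, computed with the Base SAS functions FINV and PROBF. The variable names are illustrative, and the result should agree closely with the PROC POWER figure for N = 97.

```sas
/* Sketch: power of the F test for 2 predictors of interest out of 4 */
data _null_;
  alpha = 0.05;
  n = 97;  p = 4;  p1 = 2;
  r2_full = 0.45;  r2_red = 0.34;
  lambda = n * (r2_full - r2_red) / (1 - r2_full);  /* noncentrality, about 19.4 */
  ddf    = n - p - 1;                               /* denominator df = 92 */
  fcrit  = finv(1 - alpha, p1, ddf);                /* central F critical value */
  power  = 1 - probf(fcrit, p1, ddf, lambda);       /* area beyond fcrit */
  put lambda= ddf= fcrit= power=;
run;
```

Replacing `n` with a candidate sample size gives a quick check of a planned study without rerunning PROC POWER.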
