Statistics 262: Intermediate Biostatistics · statweb.stanford.edu/.../spring.2004/notes/week2.pdf
Statistics 262: Intermediate Biostatistics
Regression, ANOVA, Random Effects
Jonathan Taylor & Kristin Cobb
Statistics 262: Intermediate Biostatistics – p.1/??
Overview of today’s class
Multiple regression models
Analysis of Variance: Fixed effects
Analysis of Variance: Random effects
Example: mortality rates
Pollution study: for $n = 59$ metropolitan areas, we record median education, $X_1$; % nonwhite, $X_2$; median income, $X_3$; and pollution, $X_4$.
The aim is to predict mortality rates $Y$ based on $X_1, \dots, X_4$.
“Simplest model”:
$$Y \mid X_1, \dots, X_4 = \beta_0 + \sum_{i=1}^{4} \beta_i X_i + \varepsilon.$$
Example: mortality rates
As in the simple regression model, the $\varepsilon$'s are assumed independent (conditional on all of the observed $X$'s) and homoskedastic (equal variance).
The model is fit using least squares, as usual.
Question of possible interest: is pollution correlated with mortality?
Or, $H_0 : \beta_4 = 0$?
Testing hypotheses:
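Assuming $\varepsilon \sim N(0, \sigma^2)$, the least squares estimates satisfy
$$\hat\beta = (X^T X)^{-1} X^T Y \sim N\left(\beta,\ \sigma^2 (X^T X)^{-1}\right),$$
where $X$ is the design matrix
$$X = \begin{pmatrix} 1 & X_{1,1} & \cdots & X_{1,4} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{59,1} & \cdots & X_{59,4} \end{pmatrix}.$$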
Distributions
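For any linear combination of the $\beta$'s,
$$\langle a, \hat\beta \rangle \sim N\left(\langle a, \beta \rangle,\ \sigma^2\, a (X^T X)^{-1} a^T\right).$$
We also “know” that
$$\hat\sigma^2 = \frac{1}{54} \sum_{i=1}^{59} \Bigl( Y_i - \bigl( \hat\beta_0 + \sum_{j=1}^{4} \hat\beta_j X_{i,j} \bigr) \Bigr)^2 \sim \sigma^2 \cdot \frac{\chi^2_{54}}{54}.$$
Why “54”? Because we estimated 5 parameters, leaving $59 - 5 = 54$ degrees of freedom.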
t-tests
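These facts tell us that
$$\frac{\langle a, \hat\beta \rangle - \langle a, \beta \rangle}{\sqrt{\hat\sigma^2\, a (X^T X)^{-1} a^T}} \sim t_{54}.$$
This gives a way to test whether $\beta_4 = 0$ (and to get a CI for $\beta_4$): compute
$$T = \frac{\hat\beta_4}{\sqrt{\hat\sigma^2 \left[(X^T X)^{-1}\right]_{5,5}}}.$$
If $|T| > t_{54,\,0.975} \approx 2.00$, then we reject $H_0$ at level 0.05.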
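The recipe above (least squares fit, residual variance, then $T$) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `invert` and `ols_t_stats` are invented here, the four data points are made up and are not the pollution data, and a real analysis would use a statistics package rather than hand-rolled linear algebra.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment the matrix with the identity on the right.
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ols_t_stats(rows, y):
    """rows: predictor values per observation (no intercept column).
    Returns (beta_hat, sigma2_hat, t_statistics), intercept first."""
    X = [[1.0] + [float(v) for v in r] for r in rows]   # design matrix
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    inv = invert(xtx)                                   # (X^T X)^{-1}
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)        # SSE / (n - #parameters)
    t = [beta[j] / math.sqrt(sigma2 * inv[j][j]) for j in range(p)]
    return beta, sigma2, t

# Toy data (hypothetical): one predictor, four observations.
beta_hat, sigma2_hat, t_stats = ols_t_stats([[0], [1], [2], [3]], [1, 2, 4, 5])
print(beta_hat, sigma2_hat, t_stats)
```

Each $T$ would then be compared with the $t_{n-p}$ quantile, where $p$ counts the intercept; for the pollution model that is $t_{54}$.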
F-tests
A t-test is a “partial” regression test because it tests for the effect of one variable “allowing for” the effects of the remaining variables.
Sometimes, it may be of interest to see whether we can “safely” drop more than one variable from the model.
For instance, suppose we want to see if % nonwhite and median income ($X_2$ and $X_3$) can be dropped from the model.
The test is based on the difference in SSE between the two models and the SSE of the “full” model.
Error sums of squares
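Consider the two models
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_4 X_{i,4} + \varepsilon_i \quad \text{(reduced)}$$
$$Y_i = \beta_0 + \sum_{j=1}^{4} \beta_j X_{i,j} + \varepsilon_i \quad \text{(full)}$$
Each model has an SSE: say $SSE(R)$ (reduced) and $SSE(F)$ (full).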
F-test procedure
If the $\varepsilon$'s are normally distributed and homoskedastic, then under $H_0$
$$F = \frac{\bigl(SSE(R) - SSE(F)\bigr) / (df_R - df_F)}{SSE(F) / df_F} \sim F_{df_R - df_F,\ df_F}.$$
Reject $H_0 : \beta_2 = \beta_3 = 0$ if $F > F_{df_R - df_F,\ df_F,\ 0.95}$.
That is, reject if the full model explains significantly more variability than the reduced model.
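As a small worked illustration of the statistic above, the function below computes $F$ from the two error sums of squares and their degrees of freedom. The name `partial_f` and the SSE values are hypothetical, chosen only so the arithmetic is easy to follow; the degrees of freedom match the example ($n = 59$, so $df_F = 54$ and, with two coefficients dropped, $df_R = 56$).

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Return (F, (numerator df, denominator df)) for a reduced-vs-full model comparison."""
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more error degrees of freedom")
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator, (df_reduced - df_full, df_full)

# Hypothetical sums of squares, not taken from the pollution data.
f_stat, dfs = partial_f(sse_reduced=66000.0, df_reduced=56,
                        sse_full=54000.0, df_full=54)
print(f_stat, dfs)  # 6.0 (2, 54)
```

The resulting value would be compared with the 0.95 quantile of the $F_{2,54}$ distribution, which has to come from a table or a statistics library.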
SAS code
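PROC REG DATA=DATADIR.pollution;
  /* PARTIAL requests partial regression leverage plots */
  MODEL MORTALITY = INCOME POLLUTION
        NONWHITE EDUCATION / PARTIAL;
  /* test of a single coefficient */
  POLLUTION: TEST POLLUTION=0;
  /* joint F-test that two coefficients are both zero (here INCOME and EDUCATION) */
  FTEST: TEST INCOME=0, EDUCATION=0;
RUN;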
Diagnostics
QQ-plot as in simple regression.
Partial residual plots: for each variable
Measures of influence: Cook’s distance.
Variance inflation factors.
Cook’s distance, VIF
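Cook’s distance is a measure of how much a particular observation influences the regression model.
It measures the difference in the predicted means when the $i$-th observation is deleted from the dataset.
The variance inflation factor of a predictor measures how much collinearity with the other predictors inflates the variance of its estimated coefficient: $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the $R^2$ from regressing $X_j$ on the remaining predictors. A large VIF means the coefficient is estimated less accurately.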
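To make the variance inflation factor concrete, here is a minimal sketch for the special case of exactly two predictors, where the usual definition $VIF_j = 1/(1 - R_j^2)$ reduces to $1/(1 - r^2)$ with $r$ the correlation between the two predictors. The function names and the data are made up for illustration; with more than two predictors, $R_j^2$ has to come from a regression of $X_j$ on all the others.

```python
import math

def pearson_r(x, y):
    """Sample correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Uncorrelated predictors: no inflation (VIF = 1).
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))
# Nearly collinear predictors: large inflation.
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]))
```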
SAS code: VIF, Cook’s distance
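PROC REG DATA=DATADIR.pollution;
  /* VIF prints variance inflation factors; INFLUENCE prints influence diagnostics */
  MODEL MORTALITY = INCOME POLLUTION NONWHITE
        EDUCATION / VIF INFLUENCE;
  /* save residuals, Cook distances and fitted values */
  OUTPUT OUT=DATADIR.pdiag R=RESID
         COOKD=COOK P=YHAT;
RUN;
/* plot the Cook distance against the fitted values */
PROC PLOT DATA=DATADIR.pdiag;
  PLOT COOK*YHAT;
RUN;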
Using PROC GLM
PROC GLM DATA=DATADIR.pollution;
  MODEL MORTALITY = INCOME POLLUTION
        NONWHITE EDUCATION;
  /* test of a single coefficient */
  CONTRAST 'Pollution' POLLUTION 1;
  /* two-row contrast: joint test of INCOME and EDUCATION */
  CONTRAST 'Income & Education' INCOME 1,
                                EDUCATION 1;
RUN;
Analysis of Variance
All variables in the pollution dataset are continuous.
In clinical settings, there are often categorical variables.
Simplest example: comparing two populations (two-sample t-test).
One-way ANOVA
First generalization: more than two levels.
One-way ANOVA model: observations $(Y_{ij})$, $1 \le i \le r$, $1 \le j \le n_i$: $r$ groups and $n_i$ samples in the $i$-th group.
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2).$$
Constraint: $\sum_{i=1}^{r} \alpha_i = 0$.
Simplest question: is there any group effect?
$$H_0 : \alpha_1 = \dots = \alpha_r = 0.$$
ANOVA tables: One-way
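| Source | SS | df | E(MS) |
|---|---|---|---|
| Treatments | $SSTR = \sum_{i=1}^{r} n_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2$ | $r - 1$ | $\sigma^2 + \frac{\sum_{i=1}^{r} n_i \alpha_i^2}{r - 1}$ |
| Error | $SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot})^2$ | $\sum_{i=1}^{r} n_i - r$ | $\sigma^2$ |

Notation: $\bar Y_{i\cdot}$ is the $i$-th group mean, $\bar Y_{\cdot\cdot}$ is the overall mean, and each mean square is MS = SS/df.
By looking at the ANOVA table, we can construct tests very easily.
For instance, we see that under $H_0 : \alpha_1 = \dots = \alpha_r = 0$, the mean squares $MSTR = SSTR/(r-1)$ and $MSE = SSE/(\sum_i n_i - r)$ both have expected value $\sigma^2$.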
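The entries of the one-way table can be computed directly from grouped data. The sketch below is an illustration only: `one_way_anova` is a name invented here and the three small groups are made-up numbers, not the course data.

```python
def one_way_anova(groups):
    """groups: list of lists, one list of observations per group.
    Returns the sums of squares, degrees of freedom and F = MSTR / MSE."""
    r = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) sum of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) sum of squares.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_tr, df_e = r - 1, n_total - r
    f_stat = (sstr / df_tr) / (sse / df_e)
    return {"SSTR": sstr, "SSE": sse, "df_TR": df_tr, "df_E": df_e, "F": f_stat}

# Made-up data: three groups of three observations.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)  # SSTR is 26 and SSE is 6 up to rounding, so F is about 13 on (2, 6) df
```

Under $H_0$ both mean squares estimate $\sigma^2$, so an $F$ far above 1 is evidence of a group effect.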
Example: rehab surgery
How does prior fitness affect recovery from surgery? Observations: 24 subjects’ recovery times.
Three fitness levels: below average, average, above average.
If you are in better shape before surgery, does it take less time to recover?
Group effect
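Full model:
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
Reduced model:
$$Y_{ij} = \mu + \varepsilon_{ij}.$$
$F$-statistic:
$$F = \frac{\Bigl[\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{\cdot\cdot})^2 - \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot})^2\Bigr] / 2}{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot})^2 / 21} = \frac{SSTR / df_{TR}}{SSE / df_E}$$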
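The last equality rests on the standard decomposition of the total sum of squares, spelled out here as a short worked step. The reduced model fits only the overall mean, so its SSE is the total sum of squares; the full model fits each group mean.

```latex
\begin{aligned}
\underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{\cdot\cdot})^2}_{SSE(R)}
  &= \underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot})^2}_{SSE(F) \,=\, SSE}
   + \underbrace{\sum_{i=1}^{r} n_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2}_{SSTR}, \\
\text{since the cross term}\quad
  & 2 \sum_{i=1}^{r} (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot}) \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot}) = 0 .
\end{aligned}
```

Hence $SSE(R) - SSE(F) = SSTR$. With $r = 3$ fitness levels and 24 subjects, the reduced model has $24 - 1 = 23$ error degrees of freedom and the full model has $24 - 3 = 21$, which gives the divisors $23 - 21 = 2$ and $21$.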
Test for trend
Does increased fitness correspond to a decrease in recovery time?
One way to test this: test
$$H_0 : \sum_{j=1}^{3} (j - 2) \cdot (\mu_j - \mu) = 0.$$
Rationale: if the means $\mu_j$ are of definite order, then they will be correlated with the vector $(1, 2, 3)$. If the means are “symmetric” around $\mu_2$, then the test “should not” reject $H_0$.
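Expanding the sum shows what this hypothesis says about the three group means; the step is added here only as a worked illustration.

```latex
\sum_{j=1}^{3} (j - 2)(\mu_j - \mu)
  = (-1)(\mu_1 - \mu) + 0 \cdot (\mu_2 - \mu) + (+1)(\mu_3 - \mu)
  = \mu_3 - \mu_1 .
```

The weights $(-1, 0, 1)$ sum to zero, so the overall mean $\mu$ drops out and $H_0$ states that the lowest and highest fitness groups have the same mean recovery time. Weights $(1, 0, -1)$ define the same contrast with the sign reversed, so they lead to the same test.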
SAS code
PROC GLM DATA=DATADIR.rehab;
  CLASS FITNESS;
  MODEL TIME = FITNESS;
  /* contrast between the first and last fitness levels */
  ESTIMATE 'TREND' FITNESS 1 0 -1;
RUN;
Two-way ANOVA
Second generalization: more than one grouping variable.
Two-way ANOVA model (equal sample sizes): observations $(Y_{ijk})$, $1 \le i \le r$, $1 \le j \le m$, $1 \le k \le n$: $r$ groups in the first grouping variable, $m$ groups in the second, and $n$ samples per “cell.”
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2).$$
Again: just a regression model.
Constraints on the parameters
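$$\sum_{i=1}^{r} \alpha_i = 0$$
$$\sum_{j=1}^{m} \beta_j = 0$$
$$\sum_{j=1}^{m} (\alpha\beta)_{ij} = 0, \quad 1 \le i \le r$$
$$\sum_{i=1}^{r} (\alpha\beta)_{ij} = 0, \quad 1 \le j \le m.$$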
Questions of interest
Are there main effects for the grouping variables?
$$H_0 : \alpha_1 = \dots = \alpha_r = 0, \qquad H_0 : \beta_1 = \dots = \beta_m = 0.$$
Are there interaction effects?
$$H_0 : (\alpha\beta)_{ij} = 0, \quad 1 \le i \le r,\ 1 \le j \le m.$$
ANOVA tables: Two-way (fixed)
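| SS | df | E(MS) |
|---|---|---|
| $SSA = nm \sum_{i=1}^{r} (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2$ | $r - 1$ | $\sigma^2 + nm \frac{\sum_{i=1}^{r} \alpha_i^2}{r - 1}$ |
| $SSB = nr \sum_{j=1}^{m} (\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot})^2$ | $m - 1$ | $\sigma^2 + nr \frac{\sum_{j=1}^{m} \beta_j^2}{m - 1}$ |
| $SSAB = n \sum_{i=1}^{r} \sum_{j=1}^{m} (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot})^2$ | $(m - 1)(r - 1)$ | $\sigma^2 + n \frac{\sum_{i=1}^{r} \sum_{j=1}^{m} (\alpha\beta)_{ij}^2}{(r - 1)(m - 1)}$ |
| $SSE = \sum_{i=1}^{r} \sum_{j=1}^{m} \sum_{k=1}^{n} (Y_{ijk} - \bar Y_{ij\cdot})^2$ | $(n - 1)mr$ | $\sigma^2$ |

For instance, we see that under $H_0 : (\alpha\beta)_{ij} = 0\ \forall i, j$, the mean squares $MSAB = SSAB/((m-1)(r-1))$ and $MSE = SSE/((n-1)mr)$ both have expected value $\sigma^2$; use their ratio for an $F$-test.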
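For a balanced design, the four sums of squares in the table and the fixed-effects $F$ ratios can be computed as in the sketch below. The function name `two_way_anova` and the data layout (`cells[i][j]` holds the $n$ observations for level $i$ of the first factor and level $j$ of the second) are assumptions made for this illustration, and the numbers are invented rather than taken from any course dataset.

```python
def two_way_anova(cells):
    """Balanced two-way fixed-effects ANOVA.
    cells[i][j] is the list of n observations in cell (i, j)."""
    r, m, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(c) / n for c in row] for row in cells]
    a_mean = [sum(cell_mean[i]) / m for i in range(r)]                       # Ybar_i..
    b_mean = [sum(cell_mean[i][j] for i in range(r)) / r for j in range(m)]  # Ybar_.j.
    grand = sum(a_mean) / r                                                  # Ybar_...
    ssa = n * m * sum((a - grand) ** 2 for a in a_mean)
    ssb = n * r * sum((b - grand) ** 2 for b in b_mean)
    ssab = n * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                   for i in range(r) for j in range(m))
    sse = sum((y - cell_mean[i][j]) ** 2
              for i in range(r) for j in range(m) for y in cells[i][j])
    mse = sse / ((n - 1) * m * r)
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
        "F_A": (ssa / (r - 1)) / mse,
        "F_B": (ssb / (m - 1)) / mse,
        "F_AB": (ssab / ((r - 1) * (m - 1))) / mse,
    }

# Made-up data: r = 2 levels, m = 3 levels, n = 2 observations per cell.
cells = [[[1, 3], [2, 4], [6, 8]],
         [[3, 5], [4, 6], [8, 10]]]
print(two_way_anova(cells))
```

In this toy data the cell means are exactly additive, so SSAB is zero and only the two main-effect ratios are nonzero.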
Example: kidney failure
Time of stay in hospital depends on weight gain between treatments and duration of treatment.
Two levels of duration, three levels of weight gain.
Is there an interaction? Main effects?
SAS code
PROC GLM DATA=DATADIR.kidney;
CLASS DURATION WEIGHT;
MODEL DAYS = DURATION WEIGHT DURATION*WEIGHT;
MEANS DURATION WEIGHT / LSD CLDIFF;
RUN;
Random vs. fixed effects
In the two ANOVA examples, the categorical variables are well-defined categories: below average fitness, long duration, etc.
In some designs, there is a categorical variable that identifies each subject.
Simplest example: repeated measures, where more than one (identical) measurement is taken on the same individual.
In this case, the “group” effect $\alpha_i$ is best thought of as random.
When to use random effects?
Example (two-way): suppose we were studying the variability of an assay to determine viral load in HIV+ patients. We are also interested in the variability across subtypes.
We might collect data from many different centers on a few of the most prevalent subtypes.
Ignoring possible confounding, the “center” effect can be thought of as a random effect, and subtype as a fixed effect.
Example: sodium content in beer
How much sodium is there in North American beer? How much does this vary by brand?
Observations: for 6 brands of beer, we recorded the sodium content of eight 12-ounce bottles.
Questions of interest: what is the “grand mean” sodium content? How much variability is there from brand to brand?
“Individuals” in this case are brands; the repeated measures are the 8 bottles.
One-way random effects model
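$$Y_{ij} = \mu_\cdot + \alpha_i + \varepsilon_{ij}, \quad 1 \le i \le r,\ 1 \le j \le n$$
$$\varepsilon_{ij} \sim N(0, \sigma^2), \quad 1 \le i \le r,\ 1 \le j \le n$$
$$\alpha_i \sim N(0, \sigma^2_\mu), \quad 1 \le i \le r.$$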
One-way random (equal sample sizes)
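| Source | SS | df | E(MS) |
|---|---|---|---|
| Treatments | $SSTR = \sum_{i=1}^{r} n (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2$ | $r - 1$ | $\sigma^2 + n\sigma^2_\mu$ |
| Error | $SSE = \sum_{i=1}^{r} \sum_{j=1}^{n} (Y_{ij} - \bar Y_{i\cdot})^2$ | $(n - 1)r$ | $\sigma^2$ |

The only change here is the expected mean square for treatments, which reflects the randomness of the $\alpha_i$'s.
The ANOVA table is still useful for setting up tests: the same $F$-statistics used for fixed effects will work here.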
Inference for µ·
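We know that $E(\bar Y_{\cdot\cdot}) = \mu_\cdot$, and can show that
$$\mathrm{Var}(\bar Y_{\cdot\cdot}) = \frac{n\sigma^2_\mu + \sigma^2}{rn}.$$
Therefore,
$$\frac{\bar Y_{\cdot\cdot} - \mu_\cdot}{\sqrt{SSTR / \bigl((r - 1)\, rn\bigr)}} \sim t_{r-1}.$$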
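The $t_{r-1}$ result translates into a confidence interval for $\mu_\cdot$ of the form $\bar Y_{\cdot\cdot} \pm t_{r-1,\,1-\alpha/2} \sqrt{SSTR/((r-1)rn)}$. The sketch below assumes equal group sizes; `grand_mean_interval` is a name invented here, the data are made up, and the $t$ quantile is passed in by the caller because the Python standard library does not provide one.

```python
import math

def grand_mean_interval(groups, t_quantile):
    """CI for the grand mean in a balanced one-way random effects model.
    groups: r lists of n observations each; t_quantile: t_{r-1, 1-alpha/2}."""
    r, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / r
    sstr = n * sum((m - grand) ** 2 for m in means)
    se = math.sqrt(sstr / ((r - 1) * r * n))   # estimated sd of the grand mean
    return grand, se, (grand - t_quantile * se, grand + t_quantile * se)

# Made-up data with r = 3 groups; t_{2, 0.975} is about 4.303.
print(grand_mean_interval([[1, 2, 3], [2, 3, 4], [5, 6, 7]], 4.303))
```

The standard error depends on the data only through the $r$ group means, which is why the reference distribution has $r - 1$ degrees of freedom.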
Inference for µ·
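Why $r - 1$ degrees of freedom? Imagine we could record an infinite number of observations for each individual, so that $\bar Y_{i\cdot} \to \mu_i$, where $\mu_i = \mu_\cdot + \alpha_i$.
To learn anything about $\mu_\cdot$ we still only have $r$ observations $(\mu_1, \dots, \mu_r)$.
Sampling more within an individual shrinks only the $\sigma^2/(rn)$ part of $\mathrm{Var}(\bar Y_{\cdot\cdot})$; it cannot narrow the CI for $\mu_\cdot$ beyond what $r$ individuals allow.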
Inference for $\sigma^2_\mu / (\sigma^2_\mu + \sigma^2)$
This quantity describes the relative contribution of the random effects variance to the total variance of one observation.
We use the fact that
$$F = \frac{\sigma^2}{\sigma^2 + n\sigma^2_\mu} \times \frac{SSTR/(r-1)}{SSE/((n-1)r)} \sim F_{r-1,\,(n-1)r}.$$
Manipulate the inequalities:
$$P\bigl(F_{r-1,(n-1)r,\alpha/2} \le F \le F_{r-1,(n-1)r,1-\alpha/2}\bigr) = 1 - \alpha.$$
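One way to carry out the manipulation is sketched below. It uses the fact that $MSTR/(\sigma^2 + n\sigma^2_\mu)$ divided by $MSE/\sigma^2$ has an $F_{r-1,(n-1)r}$ distribution. The shorthand $R$, $\theta$, $F_L$, $F_U$, $L$ and $U$ is introduced here for convenience and is not the slides' notation.

```latex
\begin{aligned}
&\text{Let } R = \frac{SSTR/(r-1)}{SSE/((n-1)r)}, \qquad
  \theta = \frac{\sigma^2_\mu}{\sigma^2}, \qquad
  F_L = F_{r-1,(n-1)r,\alpha/2}, \qquad
  F_U = F_{r-1,(n-1)r,1-\alpha/2}. \\[4pt]
&\frac{R}{1 + n\theta} \sim F_{r-1,(n-1)r}
  \;\Longrightarrow\;
  P\!\left(F_L \le \frac{R}{1 + n\theta} \le F_U\right) = 1 - \alpha . \\[4pt]
&F_L \le \frac{R}{1 + n\theta} \le F_U
  \iff
  \underbrace{\frac{1}{n}\left(\frac{R}{F_U} - 1\right)}_{L}
  \le \theta \le
  \underbrace{\frac{1}{n}\left(\frac{R}{F_L} - 1\right)}_{U} . \\[4pt]
&\frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2} = \frac{\theta}{1 + \theta}
  \text{ is increasing in } \theta
  \;\Longrightarrow\;
  \frac{L}{1 + L} \le \frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2} \le \frac{U}{1 + U} .
\end{aligned}
```

If $L$ turns out negative, a common convention (an assumption here, not something stated in the slides) is to replace it with 0, since $\theta$ cannot be negative.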
Estimating $\sigma^2_\mu$
From the ANOVA table,
$$\sigma^2_\mu = \frac{E\bigl(SSTR/(r - 1)\bigr) - E\bigl(SSE/((n - 1)r)\bigr)}{n}.$$
Natural estimate:
$$S^2_\mu = \frac{SSTR/(r - 1) - SSE/((n - 1)r)}{n}.$$
Problem: this estimate can be negative! This is one of the difficulties with random effects models.
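The natural estimate, and the way it can go negative, is easy to see numerically. This is a minimal sketch for balanced data: `variance_components` is a hypothetical helper name, the data are made up, and truncating a negative estimate at zero is one common convention that the slides do not prescribe.

```python
def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random effects model.
    Returns (MSE, S2_mu, S2_mu truncated at zero)."""
    r, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / r
    mstr = n * sum((m - grand) ** 2 for m in means) / (r - 1)
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / ((n - 1) * r)
    s2_mu = (mstr - mse) / n
    return mse, s2_mu, max(s2_mu, 0.0)   # truncation at 0 is a convention

# Groups that differ a lot: positive estimate of the between-group variance.
print(variance_components([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
# Identical group means but noisy observations: the raw estimate is negative.
print(variance_components([[1, 5], [2, 4]]))
```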
SAS code
PROC GLM DATA=DATADIR.beer;
CLASS BRAND BOTTLE;
MODEL SODIUM = BRAND;
MEANS BRAND / LSD CLDIFF;
RUN;
Two-way random effects model
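$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad 1 \le i \le r,\ 1 \le j \le m,\ 1 \le k \le n$$
$$\varepsilon_{ijk} \sim N(0, \sigma^2), \quad 1 \le i \le r,\ 1 \le j \le m,\ 1 \le k \le n$$
$$\alpha_i \sim N(0, \sigma^2_\alpha), \quad 1 \le i \le r$$
$$\beta_j \sim N(0, \sigma^2_\beta), \quad 1 \le j \le m$$
$$(\alpha\beta)_{ij} \sim N(0, \sigma^2_{\alpha\beta}), \quad 1 \le j \le m,\ 1 \le i \le r.$$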
ANOVA tables: Two-way (random)
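| SS | df | E(MS) |
|---|---|---|
| $SSA = nm \sum_{i=1}^{r} (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2$ | $r - 1$ | $\sigma^2 + nm\sigma^2_\alpha + n\sigma^2_{\alpha\beta}$ |
| $SSB = nr \sum_{j=1}^{m} (\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot})^2$ | $m - 1$ | $\sigma^2 + nr\sigma^2_\beta + n\sigma^2_{\alpha\beta}$ |
| $SSAB = n \sum_{i=1}^{r} \sum_{j=1}^{m} (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot})^2$ | $(m - 1)(r - 1)$ | $\sigma^2 + n\sigma^2_{\alpha\beta}$ |
| $SSE = \sum_{i=1}^{r} \sum_{j=1}^{m} \sum_{k=1}^{n} (Y_{ijk} - \bar Y_{ij\cdot})^2$ | $(n - 1)mr$ | $\sigma^2$ |

To test $H_0 : \sigma^2_\alpha = 0$, use SSA and SSAB: $F = MSA/MSAB$.
To test $H_0 : \sigma^2_{\alpha\beta} = 0$, use SSAB and SSE: $F = MSAB/MSE$.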