ANCOVA
Group 4, AMS 572
Table of Contents
1. Introduction and History
1.1 Part 1: Ahram Woo
1.2 Part 2: Jingwen Zhu
2. Theoretical Background
2.1 Part 1: Xin Yu
2.2 Part 2: Unjung Lee
3. Application of ANCOVA and Summary
3.1 Part 1: Xiaojuan Shang
3.2 Part 2: Younga Choi
3.3 Part 3: Qiao Zhang
1. Introduction and History
Group 4 by Ahram Woo
Individual by Ahram Woo
Ahram Woo
Jingwen Zhu
Xiaojuan Shang
Younga Choi
Qiao Zhang
Unjung Lee
Xin Yu
• Analysis of covariance (ANCOVA): an extension of ANOVA in which main effects and interactions are assessed on dependent variable (DV) scores after the DV has been adjusted for its relationship with one or more covariates (CVs).
1.1 Introduction to ANCOVA by Ahram Woo
• ANCOVA = ANOVA + Linear Regression
• R. A. Fisher is credited with introducing ANCOVA in the paper "Studies in crop variation. IV. The experimental determination of the value of top dressings with cereals", published in the Journal of Agricultural Science, vol. 17, pp. 548-562, in 1927.
• ANOVA was described by R. A. Fisher to assist in the analysis of data from agricultural experiments.
• ANOVA compares the means of any number of experimental conditions without inflating the Type I error rate.
• ANOVA is a way of determining whether the average scores of groups differ significantly.
Linear regression models the relationship between explanatory and dependent variables by fitting a linear equation to observed data (e.g., Y = a + bX).
1.2 Introduction to Linear Regression by Jingwen Zhu
Is there a relationship or not?
Does one variable cause the other?
Scatter Plot & Correlation Coefficient
The concept of "regression" was first studied in depth by the 19th-century scientist Sir Francis Galton.
Geographer, psychologist, statistician, meteorologist, eugenicist
Galton studied data on relative heights of fathers and their sons
Conclusions: A taller-than-average father tends to produce a taller-than-average son.
However, the son is likely to be less tall than the father in terms of his relative position within his own population.
ANCOVA is a merger of ANOVA and regression.
ANCOVA allows one to compare a variable across two or more groups while taking into account (correcting for) the variability of other variables, called covariates.
The inclusion of covariates can increase statistical power because it accounts for some of the variability.
Example: Are MCAT scores significantly different among medical students who had different types of undergraduate majors, when adjusted for year of matriculation?
• Dependent variable (continuous): MCAT total (most recent)
• Fixed factor (categorical variable): Undergraduate major
  – 1 = Biology/Chemistry
  – 2 = Other science/health
  – 3 = Other
• Covariate: Year of matriculation
One factor with k levels or groups, e.g., 3 treatment groups in a drug study.
The main objective is to examine the equality of means of different groups.
Total variation of observations (SST) can be split in two components: variation between groups (SSA) and variation within groups (SSE).
1.2 Introduction to One-way Analysis of Variance by Jingwen Zhu
Consider the layout of a study with 16 subjects that is intended to compare 4 treatment groups (G1-G4). Each group contains four subjects (S1-S4).
      S1    S2    S3    S4
G1    Y11   Y12   Y13   Y14
G2    Y21   Y22   Y23   Y24
G3    Y31   Y32   Y33   Y34
G4    Y41   Y42   Y43   Y44
Model: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
Assumptions:
– Observations $y_{ij}$ are independent.
– Errors $\varepsilon_{ij}$ are normally distributed with mean zero and constant standard deviation.
Hypothesis
H0: The means of all groups are equal.
Ha: At least one group mean differs from the others.
ANOVA Table
Source of Variance   Sum of Squares   Degrees of Freedom   Mean Square        F
Treatment            SSA              a-1                  MSA = SSA/(a-1)    MSA/MSE
Error                SSE              N-a                  MSE = SSE/(N-a)
Total                SST              N-1
SSA (variation between groups) is due to the differences between groups, e.g., different treatment groups or different doses of the same treatment.
[Table: observations under treatments 1, 2, ..., a, with the sample mean of each treatment]
SSE (variation within groups) is the inherent variation among the observations within each group.
[Table: observations under treatments 1, 2, ..., a, with the sample mean of each treatment]
• SST (total sum of squares) is the sum of SSA and SSE: SST = SSA + SSE.
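The split of SST into SSA and SSE can be written out explicitly. The following is a sketch of the standard one-way ANOVA formulas, not a reproduction of the slide images: it assumes $a$ groups with $n_i$ observations each, $N = \sum_i n_i$, group means $\bar{y}_{i.}$ and grand mean $\bar{y}_{..}$ (these symbols are assumptions chosen to match the ANOVA table).

```latex
\begin{align*}
SST &= \sum_{i=1}^{a}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2, \\
SSA &= \sum_{i=1}^{a} n_i\,(\bar{y}_{i.} - \bar{y}_{..})^2, \\
SSE &= \sum_{i=1}^{a}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2, \\
SST &= SSA + SSE, \qquad
F = \frac{MSA}{MSE} = \frac{SSA/(a-1)}{SSE/(N-a)}.
\end{align*}
```

A large F means the between-group variation is large relative to the within-group variation, which is evidence against equal group means.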
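2. Theoretical Background
2.1 Model of ANOVA by Xin Yu
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
• $Y_{ij}$: data, the jth observation of the ith group
• $\mu$: grand mean of Y
• $\alpha_i$: effect of the ith group (we mainly focus on whether $\alpha_i = 0$, $i = 1, \dots, a$)
• $\varepsilon_{ij}$: error, $\varepsilon_{ij} \sim N(0, \sigma^2)$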
2.1 Model of Linear Regression by Xin Yu
• Data: the (ij)th observation
• Predictor
• Error
• Slope and intercept (we mainly focus on the estimate)
2.1 ANCOVA: ANOVA Merged With Linear Regression by Xin Yu
$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
• $\alpha_i$: effect of the ith group (we still focus on whether $\alpha_i = 0$, $i = 1, \dots, a$)
• $X_{ij}$: known covariate
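2.1 How to Perform ANCOVA by Xin Yu
$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
Adjust the response for the covariate:
$\tilde{Y}_{ij} = Y_{ij} - \hat{\beta} (X_{ij} - \bar{X}_{..})$
The adjusted values $\tilde{Y}_{ij}$ can then be analyzed with the ANOVA model.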
2.1 How do we get $\hat{\beta}$? by Xin Yu
$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
Within each group, treat $\alpha_i$ as a constant, and note that we only need the estimate of the slope $\beta$, not the intercept.
(*) Within each group, do least squares to estimate that group's slope.
(*) Assume that $\beta_1 = \dots = \beta_i = \dots = \beta_a$.
(*) This means that $\alpha_i$ and $\beta$ are independent; in other words, the covariate has nothing to do with the group effect.
We use the POOLED ESTIMATE of β.
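The slide's own formula for the pooled estimate did not survive extraction. As a sketch, the standard pooled within-group least squares slope is shown below, assuming $\bar{X}_{i.}$ and $\bar{Y}_{i.}$ denote the covariate and response means of group $i$ (symbols introduced here for illustration).

```latex
\begin{align*}
\hat{\beta}_i &= \frac{\sum_{j} (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}
                      {\sum_{j} (X_{ij} - \bar{X}_{i.})^2}
  && \text{(slope fitted within group } i\text{)}, \\
\hat{\beta} &= \frac{\sum_{i=1}^{a}\sum_{j} (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}
                    {\sum_{i=1}^{a}\sum_{j} (X_{ij} - \bar{X}_{i.})^2}
  && \text{(pooled slope)}.
\end{align*}
```

The pooled slope is a weighted average of the within-group slopes, with each group weighted by its covariate sum of squares, so it is only meaningful when the group slopes are equal.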
Y = β0 + β1 X+ ε
Y : dependent (response) variable
X : independent (predictor) variable
β0 : the intercept
β1 : the slope
ε : error term ~ N(0, σ^2)
E(Y) = β0 + β1X
2.2.A The Simple Linear Regression Model by Unjung Lee
[Figure: the regression line E(Y) = β0 + β1x plotted against X, showing the intercept β0, the slope β1 (the change in E(Y) per unit increase in x), and the error as the vertical distance between an observed y and the line]
[Figure: identical normal distributions of errors, N(μ_y|x, σ^2_y|x), all centered on the regression line E(Y) = β0 + β1x]
The relationship between X and Y is a straight-line relationship.
The errors have a common variance σ^2 at every value of X.
The errors are normally distributed.
The errors are independent.
2.2.A Assumptions of the Simple Linear Regression Model by Unjung Lee
2.2.A The least squares(LS) method by Unjung Lee
The fitted values and residuals
We can obtain these from the normal equations.
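The least squares formulas on these slides were images and are not recoverable from the transcript. As a sketch, the standard derivation for the model Y = β0 + β1X + ε is given below, assuming observations $(x_i, y_i)$, $i = 1, \dots, n$, and the shorthand $S_{xy}$ and $S_{xx}$ (names introduced here for illustration).

```latex
\begin{align*}
Q(\beta_0, \beta_1) &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
  && \text{(criterion to minimize)}, \\
\sum_{i=1}^{n} y_i &= n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i, \qquad
\sum_{i=1}^{n} x_i y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2
  && \text{(normal equations)}, \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \\
\hat{y}_i &= \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad e_i = y_i - \hat{y}_i
  && \text{(fitted values and residuals)}.
\end{align*}
```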
[Figure, three panels of Y against X: the data; three errors from a fitted line; three errors from the least squares regression line]
The least squares regression line minimizes the sum of squared errors.
2.2.A Fitting a Regression Line by Unjung Lee
[Figure: Y against X with the fitted regression line and one observation $(x_i, y_i)$]
Error: $e_i = y_i - \hat{y}_i$
$\hat{y}$: the predicted value of Y for x
$\hat{y} = a + bx$: the fitted regression line
2.2.A Errors in Regression by Unjung Lee
A statistical model that uses two or more quantitative and qualitative explanatory variables (x1, ..., xp) to predict a quantitative dependent variable Y.
Caution: have at least two quantitative explanatory variables (rule of thumb).
2.2.A Multiple Linear Regression by Unjung Lee
• Involves a categorical X variable with two levels
  – e.g., female-male, employed-not employed, etc.
• Variable levels are coded 0 and 1
• Assumes only the intercept is different
  – Slopes are constant across categories
2.2.A Dummy-Variable Regression Model by Unjung Lee
[Figure: Y against X1 for females and males, drawn as two parallel lines with the same slope b1 and intercepts b0 and b0 + b2]
2.2.A Dummy-Variable Model Relationships by Unjung Lee
• Permits use of qualitative data (e.g., seasonal, class standing, location, gender).
• 0, 1 coding (nominal data)
• As part of diagnostic checking: incorporate outliers (i.e., large residuals) and influence measures.
2.2.A Dummy Variables by Unjung Lee
• Hypothesizes interaction between pairs of X variables
  – Response to one X variable varies at different levels of another X variable
• Contains two-way cross-product terms: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
• Can be combined with other models, e.g., dummy-variable models
2.2.A Interaction Regression Model by Unjung Lee
• Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without the interaction term, the effect of X1 on Y is measured by $\beta_1$
• With the interaction term, the effect of X1 on Y is measured by $\beta_1 + \beta_3 X_2$
  – Effect increases as $X_{2i}$ increases (when $\beta_3 > 0$)
2.2.A Effect of Interaction by Unjung Lee
[Figure: Y (axis 0 to 12) against X1 (axis 0 to 1.5), with one line for X2 = 0 and one for X2 = 1]
Y = 1 + 2X1 + 3X2 + 4X1X2
X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
The effect (slope) of X1 on Y does depend on the value of X2.
2.2.A Interaction Example by Unjung Lee
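The two slopes in the example follow from differentiating the model with respect to X1. A short worked check, using only the coefficients given on the slide:

```latex
\begin{align*}
Y &= 1 + 2X_1 + 3X_2 + 4X_1X_2, \\
\frac{\partial Y}{\partial X_1} &= 2 + 4X_2
  = \begin{cases} 2, & X_2 = 0, \\ 6, & X_2 = 1. \end{cases}
\end{align*}
```

This matches the general rule that the effect of X1 is β1 + β3X2, here with β1 = 2 and β3 = 4.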
2.2.A The two-way ANOVA by Unjung Lee
Source           df            SS          MS
Factor A         a-1           SS(A)       MS(A) = SS(A)/(a-1)
Factor B         b-1           SS(B)       MS(B) = SS(B)/(b-1)
Interaction AB   (a-1)(b-1)    SS(AB)      MS(AB) = SS(AB)/[(a-1)(b-1)]
Error            ab(r-1)       SSE         MSE = SSE/[ab(r-1)]
Total            abr-1         SS(Total)
2.2.A The two-way ANOVA table by Unjung Lee
2.2.A Test homogeneity of variance by Unjung Lee
2.2.B Test Whether H0: $\beta_1 = \dots = \beta_a$ by Xin Yu
(1) Define the sum of squares of errors within groups: $SSE_G = \sum_{i=1}^{a} SSE_i$
$SSE_i$ is calculated based on $\hat{\beta}_i$, the slope estimated within group i, and is generated by the random error ε.
(2) SSE is generated by:
(*) random error ε
(*) differences between distinct $\beta_i$
We can calculate SSE based on a common $\hat{\beta}$.
(3) Let $SSA = SSE - SSE_G$, the sum of squares between groups. SSA is constituted by the differences between the different $\beta_i$.
$df_e = a(n-1) - 1, \quad df_G = a(n-2)$
$df_A = df_e - df_G = [a(n-1) - 1] - a(n-2) = a - 1$
$MSA = \dfrac{SSA}{df_A} = \dfrac{SSA}{a-1}, \quad MSE_G = \dfrac{SSE_G}{df_G} = \dfrac{SSE_G}{a(n-2)}$
MSA: mean square between groups. $MSE_G$: mean square within groups.
Do an F test on MSA and $MSE_G$ to see whether we can reject H0: $F = MSA / MSE_G$
2.2.C Test Linear Relationship by Xin Yu
Assumption 3: Test for a linear relationship between the dependent variable and the covariate.
H0: β = 0
How do we proceed? Use an F test on SSR (sum of squares of regression) and SSE.
How to calculate SSR and MSR? From each
$SST = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2$
SST is obtained by summing the squared differences between $y_{ij}$ and $\bar{y}_{..}$.
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$MSR = SSR / 1$
How to calculate SSE and MSE? From each
$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_i)^2$
SSE is obtained by summing the squared differences between $y_{ij}$ and $\hat{y}_i$.
$MSE = \dfrac{SSE}{n - a}$
Based on the test statistic $F = MSR / MSE$, we decide whether or not to reject H0 (β = 0).
Assume Assumptions 1 and 2 have already been checked.
(*) If H0 is not rejected (β = 0), we do ANOVA.
(*) Otherwise, we do ANCOVA.
So, any time we want to use ANCOVA, we need to test the three assumptions first!
3.1 Case Introduction by Xiaojuan Shang
3. Application of ANCOVA
Analysis of covariance (ANCOVA) is a statistical procedure that allows you to include both categorical and continuous variables in a single model. ANCOVA assumes that the regression coefficients are homogeneous (the same) across the categorical variable. Violation of this assumption can lead to incorrect conclusions.
Here is an example data file we will use. It contains 30 subjects who used one of three diets: diet 1 (diet=1), diet 2 (diet=2) and a control group (diet=3). Before the start of the study, the height of the subject was measured, and after the study the weight of the subject was measured.
3.1 Data Structure by Xiaojuan Shang
3.1 Case Concerns by Xiaojuan Shang
• Difference between three diet groups
• Correlation between height and weight
• Difference between control group and the other two groups
3.1 Case Data: Compare with ANOVA by Xiaojuan Shang
/* One-way ANOVA of weight on diet, ignoring height */
PROC GLM DATA=htwt;
  CLASS diet;
  MODEL weight = diet;
  MEANS diet / DEPONLY;
  /* Diets 1 and 2 combined versus the control group (diet 3) */
  CONTRAST 'compare 1&2 with control' diet 1 1 -2;
  /* Diet 1 versus diet 2 */
  CONTRAST 'compare diet 1 with 2' diet 1 -1 0;
RUN;
QUIT;
1. Description of data
2. Investigation of equality of slopes for the groups through the traditional ANCOVA model (homogeneity of regression assumption)
3. When the homogeneity of regression assumption is violated: examination of the effect of the group variable (diet group) at different levels of the covariate (height levels)
3.2 SAS Codes for ANCOVA model: Outline by Younga Choi
• N = 30
• IVs:
  (1) Diet (three levels)
    - diet 1 (diet=1, n=10)
    - diet 2 (diet=2, n=10)
    - diet 3, control group (diet=3, n=10)
  (2) Height
• DV: weight of the subject, measured after the study
3.2 Data Description by Younga Choi
Comparing means of diet groups
3.2 Reading the Data & Traditional ANCOVA model by Younga Choi
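The SAS code on this slide was an image and is not in the transcript. A minimal sketch of a traditional (common-slope) ANCOVA step, assuming the data set htwt and the variables weight, height and diet used in the deck's one-way ANOVA step, might look like this. The SOLUTION, STDERR and PDIFF options are choices made here for illustration, not necessarily the authors'.

```sas
/* Sketch only: standard ANCOVA with one common slope for height.   */
/* Assumes data set htwt with variables weight, height and diet.    */
PROC GLM DATA=htwt;
  CLASS diet;
  MODEL weight = diet height / SOLUTION;  /* group effect plus covariate */
  LSMEANS diet / STDERR PDIFF;            /* diet means adjusted for height */
RUN;
QUIT;
```

The LSMEANS statement reports the diet means evaluated at the overall mean height, which is what "adjusted for the covariate" means in this model.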
3.2 Homogeneity of Regression Assumption by Younga Choi
Checking on the Homogeneity of Regression Assumption:
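A common way to check homogeneity of regression is to add the group-by-covariate interaction to the model and test it. The following is a sketch under the same assumptions as before (data set htwt, variables weight, height, diet); the slide's own code is not available in the transcript.

```sas
/* Sketch only: test equality of the height slopes across diet groups. */
/* A significant diet*height effect suggests the slopes differ.        */
PROC GLM DATA=htwt;
  CLASS diet;
  MODEL weight = diet height diet*height;
RUN;
QUIT;
```

If the diet*height term is not significant, the common-slope ANCOVA model is reasonable; otherwise the group effect has to be examined at specific values of height.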
Checking on the Homogeneity of Regression Assumption: Pairwise Comparisons
When the Homogeneity of Regression Assumption is Violated
Comparing the Slope of Diet 1 with the Slopes of Diet 2 and Diet 3 Combined
Overall mean value of height
3.3 SAS Output- One Way ANOVA Model by Qiao Zhang
The results are consistent with those of the ANOVA
3.3 Standard ANCOVA Model by Qiao Zhang
3.3 Assumptions (Homogeneity of Regression) by Qiao Zhang
[SAS output: regression of weight (dependent variable) on height, fitted separately for diet=1, diet=2 and diet=3]
There is a significant linear relationship between weight and height in both the diet 2 and diet 3 groups, but not in the diet 1 group.
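Separate regressions per diet group, like those summarized here, could be produced with BY-group processing. This is a sketch assuming the htwt data set and variable names from the deck; the output data set name htwt_sorted is hypothetical.

```sas
/* Sketch only: fit weight on height separately within each diet group. */
/* BY-group processing requires the data to be sorted by diet first.    */
PROC SORT DATA=htwt OUT=htwt_sorted;   /* htwt_sorted is a hypothetical name */
  BY diet;
RUN;

PROC REG DATA=htwt_sorted;
  BY diet;
  MODEL weight = height;
RUN;
QUIT;
```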
The diet*height effect is indeed significant, indicating that the slopes do differ across the three diet groups.
These results indicate a significant difference between diet 1 and diet 2 for those 59 inches tall, and a significant difference for those 64 inches tall. For those who are tall (i.e., 68 inches), diet 1 and diet 2 are about equally effective.
3.3 Tests: Comparing diet 1 with diet 2 by Qiao Zhang
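Comparisons of two diets at chosen heights can be expressed as ESTIMATE statements in the separate-slopes model. The following is a sketch, assuming the htwt data set and that diet levels are ordered 1, 2, 3; the labels are illustrative and the authors' exact statements are not in the transcript.

```sas
/* Sketch only: diet 1 minus diet 2 at heights of 59, 64 and 68 inches.    */
/* The difference at height h is (a1 - a2) + (b1 - b2)*h, so the diet      */
/* coefficients are 1 -1 0 and the diet*height coefficients are h -h 0.    */
PROC GLM DATA=htwt;
  CLASS diet;
  MODEL weight = diet height diet*height;
  ESTIMATE 'diet 1 vs 2 at height 59' diet 1 -1 0 diet*height 59 -59 0;
  ESTIMATE 'diet 1 vs 2 at height 64' diet 1 -1 0 diet*height 64 -64 0;
  ESTIMATE 'diet 1 vs 2 at height 68' diet 1 -1 0 diet*height 68 -68 0;
RUN;
QUIT;
```

Because the slopes differ, the diet difference is a function of height, which is why a single overall contrast is replaced by one estimate per height.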
The difference in weight between diet groups 1 and 2 combined and the control group is significant at different heights.
3.3 Comparing diets 1 and 2 to the control group by Qiao Zhang
The test comparing the slope of diet group 1 with the slopes of groups 2 and 3 was significant, and the test comparing the slopes of diet groups 2 and 3 was not significant.
We can therefore pool the slopes for diet groups 2 and 3.
3.3 Testing to pool slopes by Qiao Zhang
Pooled slopes model
Unpooled slopes model
3.3 Overall analysis: diet groups 2 and 3 by Qiao Zhang
Comparing diet groups 1 and 2 when pooling slopes for diet groups 2 and 3
Comparing diet groups 2 and 3 when pooling slopes for diet groups 2 and 3
3.3 Overall analysis by Qiao Zhang
• The homogeneity of regression assumption is violated in this data set.
• We then estimated models that have separate slopes across groups.
• When comparing the control group to diets 1 and 2, we found the control group weighed more at 3 different levels of height (59 inches, 64 inches and 68 inches).
• When comparing diets 1 and 2, we found diet 2 to be more effective at 59 and 64 inches, but there was no difference at 68 inches.
3.3 Summary of Outputs by Qiao Zhang