One-Factor Experiments & ANCOVA
Group 3
Jesse Colton; Junyan Song; Kan He; Lijuan Kang; Minqin Chen; Xiaotong Li; Xin Li; Yaqi Xue
Outline:
ANOVA
• History and Introduction
• Model and Overall F Test
• Pairwise Test for Group Means
• ANOVA Linear Model and Tests
ANCOVA
• Theoretical Background
• Do ANCOVA by Hand
• Check Assumptions
• Do ANCOVA by SAS
What Is ANCOVA?
Definition
• ANOVA stands for Analysis of Variance.
• ANCOVA stands for Analysis of Covariance.
• ANCOVA combines aspects of ANOVA and linear regression to compare samples with each other when outside variables (covariates) are involved.
• “One-Factor Experiment” means the experiment has a single treatment factor.
History
Like many important topics in statistical analysis, elements of ANOVA/ANCOVA come from the work of R.A. Fisher, and some from Francis Galton.
History
Ronald Aylmer Fisher, 1890-1962
• British statistician, eugenicist, evolutionary biologist and geneticist
• Fisher “pioneered the principles of the design of experiments and elaborated his studies of analysis of variance.” (Wikipedia)
• He also developed the method of maximum likelihood, and is known for “Fisher’s exact test”
History
Sir Francis Galton, 1822-1911
• Established the concept of correlation
• He “invented the use of the regression line and was the first to describe and explain the common phenomenon of regression toward the mean.” (Wikipedia)
Uses
• ANOVA is used to compare the means of two or more groups.
• ANCOVA is used in situations where another variable affects the experiment.
• While we normally use the t-test to compare two group means, there are many situations where it is not applicable or not as useful:
  - More than 2 samples
  - Samples with additional variables
  - Other factors leading to skewed experimental results
Uses
• When conducting an experiment, there is often an initial difference between test groups.
• ANCOVA “provides a way of measuring and removing the effects of such initial systematic differences between the samples.” (http://vassarstats.net/textbook/ch17pt2.html)
• If you only compare the means, you are not taking into account any previous advantages one group may have.
Uses
Example: Two methods of teaching a topic are tested on two different groups (A and B). However, the preliminary data show that group A has a higher IQ than group B. The fact that group A scored higher after learning by one method does not prove that the method is better. ANCOVA seeks to remove the difference that existed between the groups before the experiment in order to test which method is better.
Uses
• By merging ANOVA with linear regression, ANCOVA controls for the effects that covariates we are not studying may have on the outcomes.
ANOVA + Linear Regression = ANCOVA
Aims of ‘ANOVA’ Models
• Linear models with a continuous response and one or more categorical predictors
• Description: the relation between the response variable (Y) and the predictor variable(s) (X)
• Explanation: how much of the variation in Y is explained by different sources of variation (factors or combinations of factors)
Completely Randomized Designs
• Experimental designs where there is no restriction on the random allocation of experimental/sampling units to groups or treatments
  - single factor and factorial designs
Single factor model
Completely randomized design
Terminology
• Factor (categorical predictor variable): usually designated factor A
• Number of observations within each group: $n_i$
• Each observation: $y_{ij}$
Data layout
Estimating Model Parameters
Least Squares (LS) Estimate
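The formulas on these slides did not survive in the transcript. As a sketch of what is usually meant here, assuming the standard one-way fixed-effects model with the constraint $\sum_i n_i \tau_i = 0$ (the constraint and the symbols $Q$, $\bar{y}$ for the grand mean and $N = \sum_i n_i$ are assumptions, not taken from the slides), the least squares estimates would be:

```latex
\begin{align*}
y_{ij} &= \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1,\dots,a,\quad j = 1,\dots,n_i,\quad \varepsilon_{ij} \sim N(0,\sigma^2) \\
Q(\mu,\tau_1,\dots,\tau_a) &= \sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij}-\mu-\tau_i\right)^2 \quad \text{(minimized by LS)} \\
\hat{\mu} &= \bar{y} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}, \qquad
\hat{\tau}_i = \bar{y}_i - \bar{y}, \qquad
\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_i \\
\hat{\sigma}^2 &= MSE = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2
\end{align*}
```

Whatever constraint is chosen, the fitted value for every observation in group $i$ is the group mean $\bar{y}_i$; only the split between $\hat{\mu}$ and $\hat{\tau}_i$ changes.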
Analysis of Variance
• Test the hypothesis
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
$H_a$: Not all $\mu_i$ are equal
Analysis of Variance
• Test the hypothesis
$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
$H_1$: At least some $\tau_i \neq 0$
Analysis of Variance
Test Statistic
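The test statistic itself is missing from the transcript. A sketch of the usual one-way ANOVA decomposition and F statistic follows; the labels $SSA$, $MSA$ and the critical-value notation $f_{a-1,N-a,\alpha}$ are assumptions chosen to agree with the $MSE$ used in the multiple-comparison slides:

```latex
\begin{align*}
\underbrace{\sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2}_{SST}
&= \underbrace{\sum_{i=1}^{a} n_i(\bar{y}_i-\bar{y})^2}_{SSA}
 + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2}_{SSE} \\
F &= \frac{MSA}{MSE} = \frac{SSA/(a-1)}{SSE/(N-a)}, \qquad
\text{reject } H_0 \text{ if } F > f_{a-1,\,N-a,\,\alpha}
\end{align*}
```

Written this way, the decomposition holds for unequal $n_i$ as well, because each group mean is weighted by its own sample size.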
Unequal sample sizes
• The sums-of-squares equations provided only work for equal sample sizes
  - they can be modified for unequal sample sizes, but the result is very clumsy
  - the model comparison approach is simpler (and is used by statistical software)
Unequal sample sizes
• F-ratio tests are less reliable if sample sizes are different, especially if variances are also different
  - the bigger the difference in sample sizes, the less reliable the tests become
• Use equal or similar sample sizes if possible
• But don’t omit data to balance sample sizes!
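ANOVA: Multiple Comparisons of Means
Reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$, where $a$ is the number of groups. Not all means are equal.
But which means are significantly different from each other?
We need a more detailed comparison: multiple tests!
ANOVA: Multiple Comparisons of Means
Test all pairwise equality hypotheses:
$H_{0ij}: \mu_i = \mu_j$ vs. $H_{aij}: \mu_i \neq \mu_j$
Number of pairs: $\binom{a}{2} = \frac{a(a-1)}{2}$
Using a two-sided t-test at level $\alpha$: reject $H_{0ij}$ if
$$|T_{ij}| = \frac{|\bar{y}_i - \bar{y}_j|}{S\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} > t_{N-a,\,\alpha/2}$$
where
$$S^2 = MSE = \frac{\sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}{N - a}, \qquad N = \sum_{i=1}^{a} n_i,$$
$n_i$ is the number of observations in group $i$, and $\bar{y}_i$ is the mean of the observed values of group $i$. Because $S^2 = MSE$ pools all $a$ groups, the t distribution has $N - a$ degrees of freedom.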
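ANOVA: Multiple Comparisons of Means
Least Significant Difference (LSD): the critical value that the difference $|\bar{y}_i - \bar{y}_j|$ must exceed in order to be significant at level $\alpha$.
$$|T_{ij}| = \frac{|\bar{y}_i - \bar{y}_j|}{S\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} > t_{N-a,\,\alpha/2}
\iff |\bar{y}_i - \bar{y}_j| > t_{N-a,\,\alpha/2}\, S\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$
$$LSD = t_{N-a,\,\alpha/2}\, S\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$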
ANOVA: Multiple Comparisons of Means
Familywise Error Rate (FWE): the Type I error probability of falsely declaring at least one pairwise difference significant.
FWE = P{Reject at least one true null hypothesis}
If each test is done at level $\alpha$, then the FWE will exceed $\alpha$. Why?
ANOVA: Multiple Comparisons of Means
Let $A_i$ denote rejecting the true null hypothesis in the $i$th test, where the total number of tests is $k = \binom{a}{2}$.
$P(A_i) = \alpha$ = Type I error probability of a single test.
$FWE = P(A_1 \cup \cdots \cup A_k) = P\left(\bigcup_{i=1}^{k} A_i\right) \geq P(A_i) = \alpha$
If the $A_i$ are independent of each other, $FWE = 1 - (1-\alpha)^k$; in general, $FWE \leq \sum_{i=1}^{k} P(A_i) = k\alpha$.
Our goal is to control $FWE \leq \alpha$.
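To give a sense of scale, here is a small worked example under the simplifying assumption that the $k$ tests are independent (pairwise tests that share groups are not exactly independent, so the first and third lines are illustrations only). The values $\alpha = 0.05$ and $k = 10$ correspond to $a = 5$ groups:

```latex
\begin{align*}
\text{FWE} &= 1 - (1-\alpha)^k = 1 - (0.95)^{10} \approx 0.401
  && (\alpha = 0.05,\ k = 10,\ \text{independent tests}) \\
\text{FWE} &\le k\alpha = 0.5
  && (\text{union bound, no independence needed}) \\
\text{FWE} &= 1 - (1 - 0.005)^{10} \approx 0.049 \le 0.05
  && (\text{each test at level } \alpha/k = 0.005)
\end{align*}
```

The last line shows why splitting $\alpha$ evenly across the $k$ tests keeps the familywise rate at or below the target.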
ANOVA: Multiple Comparisons of Means
Two Methods:
• Bonferroni Method
• Tukey Method
ANOVA: Multiple Comparisons of Means
Bonferroni Method
• Idea: To perform k tests simultaneously, divide the FWE α among the k tests. If the error rate is allocated equally among the k tests, then each test is done at level α/k.
For example: with α = 0.05 and k = 10, each test is done at level 0.05/10 = 0.005.
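ANOVA: Multiple Comparisons of Means
Bonferroni Method
• Test: $H_{0ij}: \mu_i = \mu_j$ vs. $H_{aij}: \mu_i \neq \mu_j$
At FWE = $\alpha$, we reject $H_{0ij}$ if
$$|T_{ij}| = \frac{|\bar{y}_i - \bar{y}_j|}{s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} > t_{N-a,\,\alpha/(2k)}$$
where
$$s^2 = MSE = \frac{\sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}{N - a}$$
and $k$ is the number of pairwise tests.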
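ANOVA: Multiple Comparisons of Means
Tukey Method
$H_{0ij}: \mu_i = \mu_j$ vs. $H_{aij}: \mu_i \neq \mu_j$
At FWE = $\alpha$, we reject $H_{0ij}$ if
$$|t_{ij}| = \frac{|\bar{y}_i - \bar{y}_j|}{s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} > \frac{q_{a,\,N-a,\,\alpha}}{\sqrt{2}}$$
where $q_{a,\,N-a,\,\alpha}$ is the upper $\alpha$ critical point of the studentized range distribution with $a$ and $N-a$ degrees of freedom.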
Dummy Variable:
A dummy variable is an artificial variable created to represent an attribute with two or more distinct categories/levels.
How to create a Dummy Variable:
The number of dummy variables necessary to represent a single attribute variable is equal to the number of levels (categories) k in that variable minus one: k − 1.
Gender: Male & Female
Categories D1
Male 1
Female 0
Rank: Assistant & Associate & Full
Categories D1 D2
Assistant 1 0
Associate 0 1
Full 0 0
ANOVA Models (a multiple regression with all categorical predictors)
General Linear Model, with dummy variables as the predictors
Relationship between these models: it depends on the constraint
Note: $\mu$ is the grand mean, but in the last case it is the mean of Group 3.
The $\tau_i$ are also different from those in the last case.
The interpretation differs depending on which constraint we apply.
$\beta_1$: Group 1 mean − Group 3 mean
$\beta_2$: Group 2 mean − Group 3 mean
$\beta_1 - \beta_2$: Group 1 mean − Group 2 mean
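The model equations for this slide are missing from the transcript. The sketch below assumes $a = 3$ groups with Group 3 as the reference level, which is what the three interpretations above imply; the symbols $D_1$, $D_2$, $\beta_0$, $\beta_1$, $\beta_2$ and $\mu_i$ (the mean of group $i$) are assumptions:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \varepsilon, \quad
  D_1 = 1 \text{ for Group 1},\ D_2 = 1 \text{ for Group 2},\ \text{both } 0 \text{ for Group 3} \\
\mu_1 &= \beta_0 + \beta_1, \qquad \mu_2 = \beta_0 + \beta_2, \qquad \mu_3 = \beta_0 \\
\beta_1 &= \mu_1 - \mu_3, \qquad \beta_2 = \mu_2 - \mu_3, \qquad \beta_1 - \beta_2 = \mu_1 - \mu_2 \\
\text{Constraint } \tau_3 = 0: &\quad \mu = \mu_3,\ \tau_1 = \beta_1,\ \tau_2 = \beta_2 \\
\text{Constraint } \textstyle\sum_i \tau_i = 0: &\quad \mu = \tfrac{1}{3}(\mu_1 + \mu_2 + \mu_3),\ \tau_i = \mu_i - \mu
\end{align*}
```

Both constraints describe the same three group means; only the meaning of the intercept and of the individual effects changes.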
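How do we test ANOVA in terms of the General Linear Model?
1. Overall F-Test
H0 (ANOVA model): $\tau_1 = \tau_2 = \cdots = \tau_a = 0$
H0 (General Linear Model): $\beta_1 = \beta_2 = \cdots = \beta_{a-1} = 0$
Recall the test for multiple regression coefficients, which compares a reduced model with a full model.
p: number of parameters set to 0 in H0; here p = a − 1.
Recall the test for ANOVA in terms of the ANOVA model and in terms of the General Linear Model:
We reject H0 when $F > f_{a-1,\,N-a,\,\alpha}$, so the overall tests of ANOVA under both models are consistent.
2. Test for an individual regression coefficient (Pairwise Test for Group Means)
H0 differs depending on the coding of the dummy variables.
For example, H0: $\beta_1 = 0$ can be tested with a t test, or with an F test that compares the full model with the reduced model that drops the corresponding dummy variable.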
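The full-versus-reduced comparison can be written out as follows. This is a sketch under the assumption that the reduced model contains only the intercept and the full model contains the $a - 1$ dummy variables; $SSE_R$, $SSE_F$, $SST$, $MSA$ and $\hat{\beta}_1$ are labels introduced here, not taken from the slides:

```latex
\begin{align*}
F &= \frac{(SSE_R - SSE_F)/p}{SSE_F/(N-a)}, \qquad p = a - 1 \\
SSE_R &= \sum_{i}\sum_{j}(y_{ij}-\bar{y})^2 = SST
  \quad (\text{reduced model: } y_{ij} = \beta_0 + \varepsilon_{ij}) \\
SSE_F &= \sum_{i}\sum_{j}(y_{ij}-\bar{y}_i)^2 = SSE
  \quad (\text{full model: fitted values are the group means}) \\
F &= \frac{(SST - SSE)/(a-1)}{SSE/(N-a)} = \frac{MSA}{MSE}
\end{align*}
```

For a single coefficient the same comparison has one numerator degree of freedom, and the resulting F equals the square of the t statistic $\hat{\beta}_1 / SE(\hat{\beta}_1)$.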
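ANCOVA Models (a multiple regression with continuous predictors and dummy-coded factors)
The predictors are continuous variables (covariates) and dummy variables.
Overall Test for ANCOVA in terms of the Linear Model:
H0: all treatment effects are 0, i.e. the coefficients of all the dummy variables are 0.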
What is Analysis of Covariance?
• An analysis procedure for looking at group effects on a continuous outcome when some other continuous explanatory variable also has an effect on the outcome.
• Generally, ANCOVA has one or more categorical independent variables and one or more covariates. It can be seen as multiple regression with 1+ covariates and 1+ dummy-variable-coded factors.
Why include Covariates in ANOVA
• To reduce within-group error variance: covariates explain part of the otherwise unexplained variance, which reduces the error variance and increases the statistical power.
• Elimination of confounds: if a variable that influences the dependent variable can be measured, ANCOVA is a good way to partial out its effect.
Assumptions of ANCOVA
• Normality of residuals
• Homogeneity of variances
• Independence of error terms
• Linearity of regression
• Homogeneity of regression slopes
• Independence of covariates and treatment effect
Homogeneity of Regression
Test the Homogeneity of Regression
• Run the ANCOVA model including the independent variables and the interaction term (factor × covariate).
• If the interaction term is significant, the assumption is violated.
• If the interaction term is not significant, rerun the model without the interaction term.
General Linear Model of ANCOVA
• Yij = GMY + αi + [βi(Ci – Mij) + …… ] + εij
  Yij: a continuous dependent variable
  GMY: grand mean of the dependent variable
  αi: treatment effect
  βi: regression coefficient for the ith covariate
  Ci: known covariate
  εij: error, N(0, σ²)
General Linear Model of ANCOVA
• Yij - [βi(Ci – Mij) + …… ] = GMY + αi + εij
  Adjusted Yij = GMY + αi + εij (same as the ANOVA model)
• “Adjusted dependent variable” means the relationship between the dependent variable and the covariates has been partialed out of the dependent variable.
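How to calculate the regression coefficient?
$$\hat{\beta} = \frac{\sum_{i}\sum_{j}(x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i)}{\sum_{i}\sum_{j}(x_{ij}-\bar{x}_i)^2} = \frac{SC_{wg}}{SS_{wg}(X)}$$
• The numerator is the sum of co-deviates (covariation) of X and Y within the groups
• The denominator is the sum of squared deviates of X within the groups
• Both sums are taken over all groups, which gives the pooled estimate $\hat{\beta}$ of the regression coefficient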
F test in ANCOVA
• The F test in ANCOVA is the same as in ANOVA; the only difference is that we now use the adjusted values of SSbg(Y) and SSwg(Y), along with an adjusted value of df.
• If it is significant, the group means differ statistically after controlling for the effect of 1+ covariates.
Abbreviation
• SS: sum of squared deviates
• SC: sum of co-deviates
• SST: total sum of squared deviates
• SSWG: sum of squared deviates within groups
• SSBG: sum of squared deviates between groups
• SCT: total sum of co-deviates
• SCWG: sum of co-deviates within groups
• SCBG: sum of co-deviates between groups
ANCOVA
http://vassarstats.net/textbook/ch17pt2.html
Example:
Comparing two methods of Hypnotic Induction
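Items to calculate
For the Dependent Variable Y
$SS_T(Y) = \sum Y^2 - \frac{(\sum Y)^2}{N}$
$SS_{wg}(Y) = \left(\sum Y_a^2 - \frac{(\sum Y_a)^2}{N_a}\right) + \left(\sum Y_b^2 - \frac{(\sum Y_b)^2}{N_b}\right)$
$SS_{bg}(Y) = SS_T(Y) - SS_{wg}(Y)$
Items to calculate
For the Covariate X
$SS_T(X) = \sum X^2 - \frac{(\sum X)^2}{N}$
$SS_{wg}(X) = \left(\sum X_a^2 - \frac{(\sum X_a)^2}{N_a}\right) + \left(\sum X_b^2 - \frac{(\sum X_b)^2}{N_b}\right)$
Items to calculate
For the Covariance of X and Y (sum of the co-deviates)
$SC = \sum (X - M_X)(Y - M_Y) = \sum XY - \frac{(\sum X)(\sum Y)}{N}$ (general form)
$SC_T$: the general form applied to all $N$ observations
$SC_{wg} = SC_{wg(a)} + SC_{wg(b)}$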
4. The Final Set of Calculations
A summary of the values we obtained so far:
X:           SST(X) = 908.9    SSwg(X) = 788.9
Y:           SST(Y) = 668.5    SSwg(Y) = 662.5    SSbg(Y) = 6.0
Covariance:  SCT = 625.9       SCwg = 652.8
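4a. Adjustment of SST(Y)
The overall correlation between X and Y:
$r_T = \frac{SC_T}{\sqrt{SS_T(X)\,SS_T(Y)}} = \frac{625.9}{\sqrt{908.9 \times 668.5}} = +.803$
The proportion of the total variability of Y attributable to its covariance with X is accordingly $(r_T)^2 = (+.803)^2 = .645$.
We adjust SST(Y) by removing from it this proportion of covariance. Since SST(Y) = 668.5:
$\text{[adj]}SS_T(Y) = SS_T(Y) - \frac{(SC_T)^2}{SS_T(X)} = 668.5 - \frac{(625.9)^2}{908.9} = 237.5$
4b. Adjustment of SSwg(Y)
The overall correlation between X and Y within the two groups:
$r_{wg} = \frac{SC_{wg}}{\sqrt{SS_{wg}(X)\,SS_{wg}(Y)}} = \frac{652.8}{\sqrt{788.9 \times 662.5}} = +.903$
The proportion of the within-groups variability of Y attributable to covariance with X is therefore $(r_{wg})^2 = (+.903)^2 = .815$.
We adjust SSwg(Y) by removing from it this proportion of covariance. Since SSwg(Y) = 662.5:
$\text{[adj]}SS_{wg}(Y) = SS_{wg}(Y) - \frac{(SC_{wg})^2}{SS_{wg}(X)} = 662.5 - \frac{(652.8)^2}{788.9} = 122.3$
4c. Adjustment of SSbg(Y)
The adjusted value of SSbg(Y) can then be obtained through simple subtraction as
$\text{[adj]}SS_{bg}(Y) = \text{[adj]}SS_T(Y) - \text{[adj]}SS_{wg}(Y) = 237.5 - 122.3 = 115.2$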
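4d. Adjustment of the Means of Y for Groups A and B
Purpose: adjust the group means of Y to the same starting point, using the aggregate correlation between X and Y within the two groups.
Recall for linear regression: by the least squares method, the within-groups slope is
$b_{wg} = \frac{SC_{wg}}{SS_{wg}(X)} = \frac{652.8}{788.9} = .83$
An increase of 1 unit of X is associated with an average increase of .83 units of Y.
Original and adjusted means ($b_{wg} = .83$; grand mean of X = 15.55):
Group   Mx     Shift to 15.55   My (original)   My (adjusted)
A       13.1   +2.45            29.2            29.2 + 2.45(.83) = 31.23
B       18.0   -2.45            28.1            28.1 - 2.45(.83) = 26.07
Linear Model for ANCOVA:
$Y_{ij} = \mu + \tau_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
$[\text{adjusted } Y_{ij}] = Y_{ij} - \beta(X_{ij} - \bar{X}_{..}) = \mu + \tau_i + \varepsilon_{ij}$ (Linear Model for ANOVA)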
Thus, as with the corresponding one-way ANOVA, the final step in a one-way analysis of covariance involves the calculation of an F-ratio of the general form $F = \frac{MS_{bg}}{MS_{wg}} = \frac{SS_{bg}/df_{bg}}{SS_{wg}/df_{wg}}$.
4e. Analysis of Covariance Using Adjusted Values of SS
We have to use the adjusted values of SSbg(Y) and SSwg(Y), along with one adjusted value of df:
adjusted $df_{wg}(Y) = N - k - 1$, where $N$ is the total number of Y values, $k$ is the number of groups, and 1 is the number of covariates (continuous independent variables).
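The numerical result of this step is not preserved in the transcript, so the following is a reconstruction rather than the slide's own figure. It assumes $N = 20$ (10 subjects per group), which is inferred from the summary values: $SS_T(X) - SS_{wg}(X) = 120.0 = n\,[(13.1-15.55)^2 + (18.0-15.55)^2] \approx 12.0\,n$. It uses $\text{[adj]}SS_{wg}(Y) = 662.5 - (652.8)^2/788.9 \approx 122.3$ and $\text{[adj]}SS_{bg}(Y) \approx 115.2$, the latter by subtraction from $\text{[adj]}SS_T(Y) = 668.5 - (625.9)^2/908.9 \approx 237.5$:

```latex
\begin{align*}
df_{bg} &= k - 1 = 2 - 1 = 1, \qquad
df_{wg}^{adj} = N - k - 1 = 20 - 2 - 1 = 17 \\
F &= \frac{\text{[adj]}SS_{bg}(Y)/df_{bg}}{\text{[adj]}SS_{wg}(Y)/df_{wg}^{adj}}
   = \frac{115.2/1}{122.3/17} \approx \frac{115.2}{7.19} \approx 16.0
\end{align*}
```

With 1 and 17 degrees of freedom the 5% critical value of F is about 4.45, so under these assumptions the adjusted group means differ significantly, even though the unadjusted SSbg(Y) = 6.0 is tiny relative to SSwg(Y) = 662.5.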
Interpretation:
Summary:
ANCOVA begins → four sets of calculations → remove the covariate from SS(Y) and Mean(Y) → ANOVA F test → interpretation
ANCOVA Assumptions
ANCOVA GLM
ANCOVA Assumptions
Full model: the model involving all x’s
Reduced model: the model involving only those x’s from the full model whose β coefficients are not hypothesized to be 0
Test statistic (T.S.): compares the SSE of the full model with the SSE of the reduced model
ANCOVA Assumptions
• No interaction between the factor and the covariate. Interaction between 2 independent variables is present when the effect of one on the outcome depends on the value of the other.
• The slope terms of the within-group regressions do not differ.
• The regression lines of the different groups are parallel: Group 1 and Group 2 have the same slope and may differ only in their intercepts.
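ANCOVA Assumptions
With interaction, the regression lines of Group 1 and Group 2 have different slopes. For the group whose dummy variable equals 1:
$E(y_i) = \beta_0 + \beta_1 + (\beta_2 + \beta_3)c_i$
ANCOVA Assumptions
Testing the Interaction for Significance
Interaction of interest: the interaction between the covariate and the dummy variable.
FULL MODEL: includes the interaction term
REDUCED MODEL: omits the interaction term
k = 3, g = 2 (the numbers of predictors in the full and reduced models)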
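The two models and the test statistic are missing from the transcript. A sketch that is consistent with $k = 3$ and $g = 2$ follows, assuming one two-level factor coded by a dummy variable $d_i$ and one covariate $c_i$; the symbols $d_i$, $SSE_F$, $SSE_R$ and the sample size $n$ are assumptions:

```latex
\begin{align*}
\text{Full model } (k = 3):\quad & y_i = \beta_0 + \beta_1 d_i + \beta_2 c_i + \beta_3 d_i c_i + \varepsilon_i \\
\text{Reduced model } (g = 2):\quad & y_i = \beta_0 + \beta_1 d_i + \beta_2 c_i + \varepsilon_i \\
d_i = 1:\quad & E(y_i) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)c_i \\
d_i = 0:\quad & E(y_i) = \beta_0 + \beta_2 c_i \\
H_0:\ \beta_3 = 0 \quad & \text{vs.} \quad H_1:\ \beta_3 \neq 0 \\
F = \frac{(SSE_R - SSE_F)/(k-g)}{SSE_F/\,[n-(k+1)]} &= \frac{SSE_R - SSE_F}{SSE_F/(n-4)}
  \ \sim\ F_{1,\,n-4} \text{ under } H_0
\end{align*}
```

Rejecting $H_0$ means the two slopes differ, so the equal-slopes assumption of ANCOVA does not hold for those data.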
ANCOVA Assumptions
Example:
• Response: anxiety score
• Covariate: the drug dose
• Factor, coded as a dummy variable: Drug A vs. Drug B
ANCOVA Assumptions
Reject H0 (no interaction): anxiety level increases at different rates for drug A and drug B as the drug dose is increased.
SAS Implementation
SAS code 1. Initial data exploration
proc contents data=Instruction;
run;

proc means data=Instruction N MEAN STD MAXDEC=2;
  class method;
  var prescore postscore;
run;

proc freq data=instruction;
  tables method;
run;

proc sgplot data=instruction;
  reg x=Prescore y=PostScore / group=method;
run;
SAS code 2. ANOVA Model
PROC GLM DATA=Instruction;
  CLASS method;  /* PROC REG has no CLASS statement */
  MODEL PostScore=method / solution;
  MEANS method / deponly;
RUN;
QUIT;

We can also use:
PROC ANOVA data=instruction;
  class method;
  model postscore=method;
  means method / Tukey;
run;
quit;
SAS code. Another way: creating dummy variables for Method
data instruction_dummy;
  set instruction;
  /** create dummy variables **/
  if method="A" then do; dummy1=0; dummy2=0; end;
  if method="B" then do; dummy1=1; dummy2=0; end;
  if method="C" then do; dummy1=0; dummy2=1; end;
run;

TITLE "Regression model for Instruction method dataset";
/* Now we don't need a CLASS statement in PROC GLM */
PROC GLM DATA=Instruction_dummy;
  MODEL PostScore=dummy1 dummy2 / solution;
RUN;
ANOVA output
Fail to reject the null hypothesis H0: $\mu_1 = \mu_2 = \mu_3$
One line is missing because there are only two dummy variables.
Fail to reject the null hypotheses H0: $\beta_1 = 0$ and H0: $\beta_2 = 0$
SAS code 3. ANCOVA Model
ods graphics on;
proc glm data=instruction plots=meanplot(cl);
  class method;
  model PostScore = method PreScore / solution;  /* include covariate x: PreScore */
  lsmeans method / pdiff;
  output out=out p=yhat r=resid stdr=eresid;
run;
quit;
ods graphics off;
ANCOVA output
Reject H0 in favor of the alternative hypothesis H1: at least one group effect is not 0.
$\beta_2 \neq 0$, so $\mu_2 \neq \mu_3$
is almost 0, we may expect
Covariate X is significant
ANCOVA output
Adjusted Means:
SAS code 4. Checking the homogeneity of slopes
/** 1) perform an analysis that shows the slopes of each of the lines **/
PROC SORT DATA=instruction;
  BY method;
RUN;

PROC GLM DATA=instruction;
  BY method;
  MODEL PostScore = PreScore / SOLUTION;
RUN;
QUIT;

/** 2) the method*PreScore effect (interaction term) tests if the three slopes are equal **/
PROC GLM DATA=instruction;
  CLASS method;
  MODEL PostScore = method PreScore method*PreScore;
RUN;
QUIT;
Interaction term is not significant: assumption met
Comparing before and after adjusted means
Acknowledgements:
• http://www.ats.ucla.edu/stat/sas/library/hetreg.htm
• http://www.unt.edu/rss/class/mike/6810/ANCOVA.pdf
• http://www.stat.cmu.edu/~hseltman/309/Book/chapter10.pdf
• SAS/STAT(R) 9.22 User's Guide
• Textbook: Statistics and Data Analysis from Elementary to Intermediate