
ANCOVA

Group 4, AMS 572

Table of Contents

1. Introduction and History

1.1 Part 1: Ahram Woo

1.2 Part 2: Jingwen Zhu

2. Theoretical Background

2.1 Part 1: Xin Yu

2.2 Part 2: Unjung Lee

3. Application of ANCOVA and Summary

3.1 Part 1: Xiaojuan Shang

3.2 Part 2: Younga Choi

3.3 Part 3: Qiao Zhang

1. Introduction and History

Group 4 and individual members, by Ahram Woo

Ahram Woo, Jingwen Zhu, Xiaojuan Shang, Younga Choi, Qiao Zhang, Unjung Lee, Xin Yu

• Analysis of covariance (ANCOVA): an extension of ANOVA in which main effects and interactions are assessed on dependent variable (DV) scores after the DV has been adjusted for its relationship with one or more covariates (CVs).

1.1 Introduction to ANCOVA by Ahram Woo

• ANCOVA = ANOVA + Linear Regression

• R. A. Fisher is credited with introducing ANCOVA in "Studies in crop variation. IV. The experimental determination of the value of top dressings with cereals," published in 1927 in the Journal of Agricultural Science, vol. 17, 548-562.

• ANOVA was described by R. A. Fisher to assist in the analysis of data from agricultural experiments.

• ANOVA compares the means of any number of experimental conditions without any increase in the Type I error rate.

• ANOVA is a way of determining whether the average scores of groups differ significantly.

Linear regression models the relationship between explanatory and dependent variables by fitting a linear equation to observed data (e.g., Y = a + bX).

1.2 Introduction to Linear Regression by Jingwen Zhu

Is there a relationship or not?

Does one variable cause the other?

Scatter plot and correlation coefficient

Regression was first studied in depth by the 19th-century scientist Sir Francis Galton, who was a geographer, psychologist, statistician, meteorologist, and eugenicist.

Galton studied data on the relative heights of fathers and their sons.

Conclusions:

• A taller-than-average father tends to produce a taller-than-average son.

• The son is likely to be less tall than the father in terms of his relative position within his own population.

ANCOVA is a merger of ANOVA and regression.

ANCOVA allows us to compare one variable across two or more groups while taking into account (correcting for) the variability of other variables, called covariates.

The inclusion of covariates can increase statistical power because it accounts for some of the variability in the dependent variable.

Example: are MCAT scores significantly different among medical students who had different types of undergraduate majors, when adjusted for year of matriculation?

• Dependent variable (continuous): MCAT total (most recent)

• Fixed factor (categorical variable): undergraduate major
  1 = Biology/Chemistry
  2 = Other science/health
  3 = Other

• Covariate: year of matriculation

One factor with k levels or groups, e.g., 3 treatment groups in a drug study.

The main objective is to examine the equality of the means of the different groups.

The total variation of the observations (SST) can be split into two components: variation between groups (SSA) and variation within groups (SSE).

1.2 Introduction to One-way Analysis of Variance by Jingwen Zhu

Consider the layout of a study with 16 subjects that is intended to compare 4 treatment groups (G1-G4). Each group contains four subjects.

      S1    S2    S3    S4
G1    Y11   Y12   Y13   Y14
G2    Y21   Y22   Y23   Y24
G3    Y31   Y32   Y33   Y34
G4    Y41   Y42   Y43   Y44

Model: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

Assumptions:

– Observations $y_{ij}$ are independent.

– Errors $\varepsilon_{ij}$ are normally distributed with mean zero and constant standard deviation.

Hypotheses

H0: The means of all groups are equal.

Ha: At least one group mean is not equal to the others.

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
Treatment             SSA              a-1                  MSA = SSA/(a-1)     MSA/MSE
Error                 SSE              N-a                  MSE = SSE/(N-a)
Total                 SST              N-1

SSA (variation between groups) is due to the differences between the groups, e.g., different treatment groups or different doses of the same treatment.

[Table: observations listed by treatment 1, 2, …, a, with the sample mean of each treatment]

SSE (variation within groups) is the inherent variation among the observations within each group.

[Table: observations listed by treatment 1, 2, …, a, with the sample mean of each treatment]

• SST (total sum of squares) is the sum of SSA and SSE: SST = SSA + SSE
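The decomposition of the total variation can be checked numerically. The following is a minimal sketch in plain Python, not taken from the slides: the function name `one_way_anova` and the data are hypothetical, and the sums of squares use the usual one-way ANOVA definitions (between-group, within-group, total) with the degrees of freedom from the ANOVA table (a-1 and N-a).

```python
def one_way_anova(groups):
    """Return (SSA, SSE, SST, F) for a list of groups of observations."""
    all_obs = [y for g in groups for y in g]
    n_total = len(all_obs)          # N, total number of observations
    a = len(groups)                 # number of groups
    grand_mean = sum(all_obs) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group means around the grand mean.
    ssa = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean.
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)
    # Total variation: observations around the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_obs)

    msa = ssa / (a - 1)
    mse = sse / (n_total - a)
    return ssa, sse, sst, msa / mse


# Hypothetical data: 3 treatment groups with 4 subjects each.
groups = [[4.0, 5.0, 6.0, 5.0],
          [7.0, 8.0, 6.0, 7.0],
          [9.0, 10.0, 11.0, 10.0]]
ssa, sse, sst, f_stat = one_way_anova(groups)
print(ssa, sse, sst, f_stat)
```

For these made-up numbers SSA + SSE equals SST up to rounding, which is the identity stated on the slide.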


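2. Theoretical Background

2.1 Model of ANOVA by Xin Yu

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

• $Y_{ij}$: data, the jth observation of the ith group
• $\mu$: grand mean of Y
• $\alpha_i$: effect of the ith group (we mainly focus on whether $\alpha_i = 0$, $i = 1, \dots, a$)
• $\varepsilon_{ij}$: error, distributed as $N(0, \sigma^2)$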

2.1 Model of Linear Regression by Xin Yu

$Y_{ij} = \beta_0 + \beta X_{ij} + \varepsilon_{ij}$

• $Y_{ij}$: data, the (ij)th observation
• $X_{ij}$: predictor
• $\varepsilon_{ij}$: error
• $\beta$ and $\beta_0$: slope and intercept (we mainly focus on the estimate)

2.1 ANCOVA: ANOVA Merged With Linear Regression by Xin Yu

$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

• $\alpha_i$: effect of the ith group (we still focus on whether $\alpha_i = 0$, $i = 1, \dots, a$)
• $X_{ij}$: known covariate

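2.1 How to Perform ANCOVA by Xin Yu

$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

Adjust Y for the covariate using the estimated slope $\hat{\beta}$:

$\tilde{Y}_{ij} = Y_{ij} - \hat{\beta} (X_{ij} - \bar{X}_{..})$

The adjusted values $\tilde{Y}_{ij}$ are then analyzed with the ANOVA model!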

2.1 How do we get $\hat{\beta}$? by Xin Yu

$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

Within each group, treat $\alpha_i$ as a constant, and notice that we only need the estimate of the slope $\beta$, not the intercept.

(*) Within each group, do least squares to obtain a slope estimate $\hat{\beta}_i$.

(*) Assume that $\beta_1 = \dots = \beta_i = \dots = \beta_a$.

(*) This means that $\alpha_i$ and $\beta$ are independent; in other words, the covariate has nothing to do with the group effect.

We use the POOLED ESTIMATE of $\beta$.
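The formula for the pooled estimate did not survive in the transcript. A minimal sketch follows, assuming the standard pooled within-group least squares estimator: the sum over groups of the within-group cross-products of X and Y, divided by the sum over groups of the within-group sums of squares of X. The function names and the data are hypothetical. The sketch also applies the adjustment step, subtracting the estimated slope times the covariate's deviation from its grand mean.

```python
def pooled_slope(x_groups, y_groups):
    """Pooled within-group least squares slope (assumes a common slope)."""
    sxy = 0.0
    sxx = 0.0
    for xs, ys in zip(x_groups, y_groups):
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        # Deviations are taken from each group's own means.
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx += sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def adjust_for_covariate(x_groups, y_groups):
    """Return (beta_hat, adjusted Y values grouped like the input)."""
    beta_hat = pooled_slope(x_groups, y_groups)
    all_x = [x for xs in x_groups for x in xs]
    x_grand = sum(all_x) / len(all_x)   # grand mean of the covariate
    adjusted = [
        [y - beta_hat * (x - x_grand) for x, y in zip(xs, ys)]
        for xs, ys in zip(x_groups, y_groups)
    ]
    return beta_hat, adjusted


# Hypothetical data: two groups lying on parallel lines with slope 2.
x_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
y_groups = [[4.0, 6.0, 8.0], [9.0, 11.0, 13.0]]
beta_hat, adjusted = adjust_for_covariate(x_groups, y_groups)
print(beta_hat, adjusted)
```

In this noise-free illustration the adjusted values are constant within each group, so an ANOVA on them compares the groups with the covariate's influence removed.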


Y = β0 + β1 X+ ε

Y : dependent (response) variable

X : independent (predictor) variable

β0 : the intercept

β1 : the slope

ε : error term ~ N(0, σ²)

E(Y) = β0 + β1X

2.2.A The Simple Linear Regression Model by Unjung Lee

[Figure: the regression line E(Y) = β0 + β1 x plotted against X, showing the intercept β0, the slope β1, and the error as the vertical distance between an observed y and the line]

[Figure: identical normal distributions of errors, $N(\mu_{y|x}, \sigma_{y|x}^2)$, all centered on the regression line E(Y) = β0 + β1 x]

The relationship between X and Y is a straight-line relationship.

The errors have a common variance σ² at every value of X.

The errors are normally distributed.

The errors are independent.

2.2.A Assumptions of simple linear regression model by Unjung Lee

 

2.2.A The least squares(LS) method by Unjung Lee

The fitted values and residuals

We can obtain the least squares estimates by solving the normal equations.
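The normal equations for the simple linear regression model can be solved in closed form. The sketch below is an illustration only: the function name `fit_line` and the data are hypothetical, and b0 and b1 denote the least squares estimates of β0 and β1.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 from the normal equations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:
    #   n * b0     + sum_x * b1  = sum_y
    #   sum_x * b0 + sum_xx * b1 = sum_xy
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b0, b1 = fit_line(xs, ys)

fitted = [b0 + b1 * x for x in xs]               # fitted values
residuals = [y - f for y, f in zip(ys, fitted)]  # observed minus fitted
print(b0, b1, residuals)
```

A useful check is that the residuals from a least squares line with an intercept sum to zero, up to rounding.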


[Figure, four panels of Y against X: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized]

2.2.A Fitting a Regression Line by Unjung Lee

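Error: $e_i = y_i - \hat{y}_i$

$\hat{y}$: the predicted value of Y for x

$\hat{y} = a + bx$: the fitted regression line

[Figure: Y against X with the fitted line, marking an observed point $(x_i, y_i)$ and its fitted value $\hat{y}_i$]

2.2.A Errors in Regression by Unjung Lee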

A statistical model that uses two or more quantitative and qualitative explanatory variables (x1, ..., xp) to predict a quantitative dependent variable Y.

Caution: have at least two quantitative explanatory variables (rule of thumb).

2.2.A Multiple linear regression by Unjung Lee

• Involves a categorical X variable with two levels
  – e.g., female-male, employed-not employed, etc.
• Variable levels are coded 0 and 1
• Assumes only the intercept is different
  – Slopes are constant across categories

2.2.A Dummy-Variable Regression Model by Unjung Lee

[Figure: Y against X1 with two parallel lines for females and males, sharing the same slope b1, with intercepts b0 and b0 + b2]

2.2.A Dummy-Variable Model Relationships by Unjung Lee

• Permits the use of qualitative data (e.g., seasonal, class standing, location, gender).

• 0, 1 coding (nominal data)

• As part of diagnostic checking, can incorporate outliers (i.e., large residuals) and influence measures.

2.2.A Dummy Variables by Unjung Lee

• Hypothesizes interaction between pairs of X variables
  – Response to one X variable varies at different levels of another X variable
• Contains two-way cross-product terms: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
• Can be combined with other models, e.g., dummy-variable models

2.2.A Interaction Regression Model by Unjung Lee

• Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$

• Without the interaction term, the effect of X1 on Y is measured by $\beta_1$

• With the interaction term, the effect of X1 on Y is measured by $\beta_1 + \beta_3 X_2$
  – The effect increases as $X_{2i}$ increases (when $\beta_3 > 0$)

2.2.A Effect of Interaction by Unjung Lee

Y = 1 + 2X1 + 3X2 + 4X1X2

When X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1

When X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1

The effect (slope) of X1 on Y does depend on the value of X2.

[Figure: Y (0 to 12) against X1 (0 to 1.5), showing the two lines above]

2.2.A Interaction Example by Unjung Lee
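The interaction example above can be reproduced in a few lines. This is an illustrative sketch: the function names `predict` and `slope_in_x1` are hypothetical, and the default coefficients are the ones from the example, Y = 1 + 2X1 + 3X2 + 4X1X2.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=3.0, b3=4.0):
    """Mean response for the interaction model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1=2.0, b3=4.0):
    """Effect of a one-unit increase in x1, which depends on x2."""
    return b1 + b3 * x2


for x2 in (0, 1):
    # Intercept and slope of the line in x1 at this level of x2.
    print(x2, predict(0.0, x2), slope_in_x1(x2))
```

At X2 = 0 the line in X1 has intercept 1 and slope 2, and at X2 = 1 it has intercept 4 and slope 6, matching the two equations in the example.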

 

2.2.A The two-way ANOVA by Unjung Lee

Source            df            SS          MS
Factor A          a-1           SS(A)       MS(A) = SS(A)/(a-1)
Factor B          b-1           SS(B)       MS(B) = SS(B)/(b-1)
Interaction AB    (a-1)(b-1)    SS(AB)      MS(AB) = SS(AB)/[(a-1)(b-1)]
Error             ab(r-1)       SSE         MSE = SSE/[ab(r-1)]
Total             abr-1         SS(Total)

2.2.A The two-way ANOVA table by Unjung Lee

 

2.2.A Test homogeneity of variance by Unjung Lee

2.2.B Test Whether $H_0: \beta_1 = \dots = \beta_a$ by Xin Yu

(1) Define the sum of squares of errors within groups:

$SSE_G = \sum_{i=1}^{a} SSE_i$

$SSE_G$ is calculated based on the separate slope estimates $\hat{\beta}_i$, and it is generated by the random error ε.

(2) SSE is generated by
  (*) the random error ε
  (*) the difference between distinct $\beta_i$
We can calculate SSE based on a common $\hat{\beta}$.

(3) Let $SSA = SSE - SSE_G$, the sum of squares between groups. SSA is constituted by the difference between different $\beta_i$.

Degrees of freedom:

$df_G = a(n-2)$
$df_e = a(n-1) - 1$
$df_a = df_e - df_G = [a(n-1) - 1] - a(n-2) = a - 1$

$MSA = SSA / df_a = SSA / (a-1)$: mean square between groups
$MSE_G = SSE_G / df_G = SSE_G / [a(n-2)]$: mean square within groups

Do an F test on MSA and $MSE_G$ to see whether we can reject our $H_0$: $F = MSA / MSE_G$
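The slope-homogeneity test above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name and data are hypothetical, the within-group error sum of squares comes from a separate least squares line in each group, the common-slope error sum of squares comes from one pooled slope with separate intercepts, and the within-group degrees of freedom are the sum of (n_i - 2), which equals a(n-2) when every group has n observations.

```python
def slope_homogeneity_f(x_groups, y_groups):
    """Return (SSA, SSE_G, df_a, df_G, F) for the equal-slopes test."""
    sxx_list, sxy_list, syy_list = [], [], []
    for xs, ys in zip(x_groups, y_groups):
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx_list.append(sum((x - x_bar) ** 2 for x in xs))
        sxy_list.append(sum((x - x_bar) * (y - y_bar)
                            for x, y in zip(xs, ys)))
        syy_list.append(sum((y - y_bar) ** 2 for y in ys))

    # Error SS with a separate slope in each group.
    sse_g = sum(syy - sxy ** 2 / sxx
                for sxx, sxy, syy in zip(sxx_list, sxy_list, syy_list))
    # Error SS with one common (pooled) slope.
    sse = sum(syy_list) - sum(sxy_list) ** 2 / sum(sxx_list)
    ssa = sse - sse_g            # variation due to differing slopes

    a = len(x_groups)
    df_a = a - 1
    df_g = sum(len(xs) - 2 for xs in x_groups)
    f_stat = (ssa / df_a) / (sse_g / df_g)
    return ssa, sse_g, df_a, df_g, f_stat


# Hypothetical data: two groups with clearly different slopes.
x_groups = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
y_groups = [[2.0, 4.0, 5.0, 8.0], [3.0, 3.0, 4.0, 4.0]]
ssa, sse_g, df_a, df_g, f_stat = slope_homogeneity_f(x_groups, y_groups)
print(ssa, sse_g, df_a, df_g, f_stat)
```

The resulting F statistic would be compared with an F distribution on (df_a, df_G) degrees of freedom; a large value is evidence against equal slopes.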

2.2.C Test Linear Relationship by Xin Yu

Assumption 3: test for a linear relationship between the dependent variable and the covariate.

$H_0: \beta = 0$

How do we do it? Use an F test on SSR (the sum of squares of regression) and SSE.

How to calculate SSR and MSR?

SST is obtained from the summation of the squares of the differences between $y_{ij}$ and $\bar{y}_{..}$:

$SST = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2$

$SSR = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (\hat{y}_{ij} - \bar{y}_{..})^2$

$MSR = SSR / 1$

How to calculate SSE and MSE?

SSE is the error obtained from the summation of the squares of the differences between $y_{ij}$ and $\hat{y}_{ij}$:

$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_{ij})^2$

$MSE = SSE / (n - a)$

Test statistic: $F = MSR / MSE$

Based on the test statistic, we determine whether to reject $H_0$ ($\beta = 0$) or not.

Assume Assumptions 1 and 2 have already been checked.

(*) If $H_0$ is true ($\beta = 0$), we do ANOVA.

(*) Otherwise, we do ANCOVA.

So, any time we want to use ANCOVA, we need to test the three assumptions first!

3.1 Case Introduction by Xiaojuan Shang

3. Application of ANCOVA

Analysis of covariance (ANCOVA) is a statistical procedure that allows you to include both categorical and continuous variables in a single model. ANCOVA assumes that the regression coefficients are homogeneous (the same) across the categorical variable. Violation of this assumption can lead to incorrect conclusions.

Here is an example data file we will use. It contains 30 subjects who used one of three diets: diet 1 (diet=1), diet 2 (diet=2), and a control group (diet=3). Before the start of the study, the height of each subject was measured, and after the study the weight of each subject was measured.

3.1 Data Structure by Xiaojuan Shang

3.1 Case Concerns by Xiaojuan Shang

• Difference between the three diet groups

• Correlation between height and weight

• Difference between the control group and the other two groups

3.1 Case Data: Compare with ANOVA by Xiaojuan Shang

* One-way ANOVA of weight by diet, for comparison with ANCOVA;
PROC GLM DATA=htwt;
  CLASS diet;
  MODEL weight = diet;
  MEANS diet / DEPONLY;
  * Contrast: diets 1 and 2 versus the control group (diet 3);
  CONTRAST 'compare 1&2 with control' diet 1 1 -2;
  * Contrast: diet 1 versus diet 2;
  CONTRAST 'compare diet 1 with 2 ' diet 1 -1 0;
RUN;
QUIT;

1. Description of data

2. Investigation of equality of slopes for the groups through the traditional ANOVA model (homogeneity of regression assumption)

3. When the homogeneity of regression assumption is violated: examination of the effect of the group variable (diet group) at different levels of the covariate (height levels)

3.2 SAS Codes for ANCOVA model: Outline by Younga Choi

• N = 30

• IV:
  (1) Diet (three levels)
    - diet 1 (diet=1, n=10)
    - diet 2 (diet=2, n=10)
    - diet 3, control group (diet=3, n=10)
  (2) Height

• DV: weight of the subject, measured after the study

3.2 Data Description by Younga Choi

Comparing means of diet groups

3.2 Reading the Data & Traditional ANCOVA model by Younga Choi

3.2 Homogeneity of Regression Assumption by Younga Choi

Checking on the Homogeneity of Regression Assumption

Checking on the Homogeneity of Regression Assumption: Pairwise Comparisons

When the Homogeneity of Regression Assumption is Violated

Comparing the slope of diet 1 with diets 2 and 3 combined

Overall mean value of height

3.3 SAS Output- One Way ANOVA Model by Qiao Zhang

The results are consistent with those of the ANOVA.

3.3 Standard ANCOVA Model by Qiao Zhang

3.3 Assumptions (Homogeneity of Regression) by Qiao Zhang

[SAS output: regression of weight (dependent variable) on height, fitted separately for diet = 1, diet = 2, and diet = 3]

There is a significant linear relationship between weight and height in both the diet 2 and diet 3 groups, but not in the diet 1 group.

The diet*height effect is indeed significant, indicating that the slopes do differ across the three diet groups.

These results indicate a significant difference between diet 1 and diet 2 for those 59 inches tall, and a significant difference for those 64 inches tall. For those who are tall (i.e., 68 inches), diet 1 and diet 2 are about equally effective.

3.3 Tests: Comparing diet 1 with diet 2 by Qiao Zhang

The difference in weight between diet groups 1 and 2 combined and the control group is significant at each of the heights examined.

3.3 Comparing diets 1 and 2 to the control group by Qiao Zhang

The test comparing the slopes of diet group 1 versus 2 and 3 was significant, and the test comparing the slopes for diet groups 2 versus 3 was not significant.

We can combine the slopes for diet groups 2 and 3.

3.3 Testing to pool slopes by Qiao Zhang

Pooled slopes model

Unpooled slopes model

3.3 Overall analysis: diet groups 2 and 3 by Qiao Zhang


Comparing diet groups 1 and 2 when pooling slopes for diet groups 2 and 3

Comparing diet groups 2 and 3 when pooling slopes for diet groups 2 and 3

3.3 Overall analysis by Qiao Zhang


• The homogeneity of regression assumption is violated in this data set.

• We then estimated models that have separate slopes across groups.

• When comparing the control group to diets 1 and 2, we found that the control group weighed more at 3 different levels of height (59 inches, 64 inches, and 68 inches).

• When comparing diets 1 and 2, we found diet 2 to be more effective at 59 and 64 inches, but there was no difference at 68 inches.

3.3 Summary of Outputs by Qiao Zhang
