group #4 ams 572 – data analysis

Post on 25-Feb-2016


DESCRIPTION

ANCOVA. Group #4 AMS 572 – Data Analysis. Professor : Wei Zhu. Team 4. Lin Wang (Lana). Xian Lin (Ben). Zhide Mo (Jeff). Miao Zhang. Juan E. Mojica. Yuan Bian. Ruofeng Wen. Hemal Khandwala. Lei Lei. Xiaochen Li ( Joe ). Team 4. Introduction to ANCOVA. What is ANCOVA. - PowerPoint PPT Presentation

TRANSCRIPT

Group #4 AMS 572 – Data Analysis

ANCOVA

Professor: Wei Zhu

1/85

Team 4

Lin Wang (Lana)

Zhide Mo (Jeff)

Juan E. Mojica

Yuan Bian

Hemal Khandwala

Xiaochen Li (Joe)

Ruofeng Wen

Miao Zhang

Xian Lin (Ben)

Lei Lei

2/85

Team 4

3/85

Introduction to ANCOVA

4/85

What is ANCOVA

ANCOVA: Analysis of Covariance

ANCOVA: a merge of ANOVA (Analysis of Variance) and Linear Regression

5/85

Development and Application of ANOVA

6/85

• ANOVA was described by R. A. Fisher to assist in the analysis of data from agricultural experiments.

• It compares the means of any number of experimental conditions without any increase in Type 1 error (H0 is rejected when it is true).

7/85

ANOVA: a way of determining whether the average scores of groups differ significantly.

In psychology: assess the average effect of different experimental conditions on subjects in terms of a particular dependent variable.

8/85

Ronald Aylmer Fisher (Feb. 17, 1890 – July 29, 1962)

An English statistician, evolutionary biologist, and geneticist.

Contributions: Analysis of Variance (ANOVA), maximum likelihood, F-distribution, etc.

9/85

Development and Application of Linear Regression

10/85

• Linear regression was developed and applied in different areas from those of ANOVA.

• It was developed in biology and psychology.

• The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon.

Linear Regression

11/85

Regression toward the Mean

Francis Galton studied the height of parents and their adult children.

Conclusion: short parents' children are usually shorter than average, but still taller than their parents.

[Figure: example heights 5'6'', 5'4'', 5'8'' and 5'9'' compared with the average height.]

12/85

Regression is applied to data obtained from correlational or non-experimental research.

Regression analysis helps us understand the effect of changing one independent variable on the value of the dependent variable.

13/85

Francis Galton (Feb. 16, 1822 – Jan. 17, 1911)

English anthropologist, eugenicist, and statistician.

Contributions:

• widely promoted regression toward the mean

• created the statistical concept of correlation

• a pioneer in eugenics, coined the term in 1883

• the first to apply statistical methods to the study of human differences

14/85

What is ANCOVA

• a statistical technique that combines regression and ANOVA (analysis of variance)

• originally developed by R. A. Fisher to increase the precision of experimental analysis

• applied most frequently in quasi-experimental research, which involves variables that cannot be controlled directly

15/85

16/85

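One-Way Layout Experiment

Factor A has $a$ levels (treatments); treatment $i$ provides a sample of $n_i$ observations $y_{i1}, y_{i2}, \ldots, y_{in_i}$.

Treatment: 1, 2, …, $a$

Sample Mean: $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_a$

Sample SD: $s_1, s_2, \ldots, s_a$

Balanced design, if $n_1 = n_2 = \cdots = n_a = n$

17/85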

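• $Y_{ij} = \mu_i + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma^2)$

• $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $\mu$ is the grand mean

$i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, n_i$

This is a linear model to represent $Y_{ij}$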

18/85

ESTIMATORS

(grand mean)

19/85

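What is SSA?

[Table: treatments 1, 2, …, $a$ with their sample means and sample SDs, and the quantity $n_i(\bar{y}_i - \bar{y})^2$ computed for each treatment.]

20/85

What is SSA?

• $SSA = \sum_{i=1}^{a} n_i(\bar{y}_i - \bar{y})^2$: the factor A sum of squares

• $MSA = SSA/(a-1)$: the factor A mean square, with $a - 1$ d.f.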

21/85

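What is SSE?

[Table: treatments 1, 2, …, $a$ with their sample means and sample SDs, and the quantity $(y_{ij} - \bar{y}_i)^2$ computed for each observation.]

22/85

What is SSE?

• $SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$: the error sum of squares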

23/85

What is SST?

[Table: treatments 1, 2, …, $a$ with their sample means and sample SDs.]

$SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2$

24/85

What is SST?

• $SST$: the total sum of squares

• ANOVA identity: $SST = SSA + SSE$

25/85

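ANOVA TABLE

Source of Variance | Sum of Squares | Degrees of Freedom | Mean Square | F

Treatments | SSA | $a - 1$ | $MSA = SSA/(a-1)$ | $F = MSA/MSE$

Error | SSE | $N - a$ | $MSE = SSE/(N-a)$ |

Total | SST | $N - 1$ | |

Here $N = \sum_{i=1}^{a} n_i$ is the total number of observations.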
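
The sums of squares in the one-way layout can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual definitions of SSA, SSE and SST for a one-way layout; the function name `one_way_anova` and the small data set are made up for illustration and do not come from the slides.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SSA, SSE, SST and F for a one-way layout.

    groups: list of lists, one list of observations per treatment.
    """
    a = len(groups)                               # number of treatments
    n_total = sum(len(g) for g in groups)         # total sample size N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # SSA: variation of the treatment means about the grand mean
    ssa = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # SSE: variation of the observations about their own treatment mean
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)
    # SST: variation of the observations about the grand mean
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)

    msa = ssa / (a - 1)
    mse = sse / (n_total - a)
    return {"SSA": ssa, "SSE": sse, "SST": sst, "F": msa / mse}


# Hypothetical data: three treatments, three observations each
example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(example)
print(result)
```

For these made-up numbers the identity SST = SSA + SSE can be verified directly from the returned values. The F statistic would then be compared with an F critical value on (a - 1, N - a) degrees of freedom, which this sketch does not look up.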

26/85

Theoretical Background

27/85

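Model of ANOVA

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

• $Y_{ij}$: data, the jth observation of the ith group

• $\mu$: grand mean of Y

• $\alpha_i$: effects of the ith group (we focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)

• $\varepsilon_{ij}$: error, $N(0, \sigma^2)$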

28/85

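Model of Linear Regression

$Y_{ij} = \beta X_{ij} + \beta_0 + \varepsilon_{ij}$

• $Y_{ij}$: data, the (ij)th observation

• $X_{ij}$: predictor

• $\beta$, $\beta_0$: slope and intercept (we focus on the estimate)

• $\varepsilon_{ij}$: error

29/85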

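ANCOVA is ANOVA merged with Linear Regression

$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

• $X_{ij}$: known covariate (What is this guy doing here?)

• $\alpha_i$: effects of the ith group (we still focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)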

30/85

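How to perform ANCOVA

$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

$\Downarrow$

$Y_{ij}(adjust) = Y_{ij} - \beta(X_{ij} - \bar{X}_{..}) = \mu + \alpha_i + \varepsilon_{ij}$

(This is just the ANOVA Model!)

31/85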

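How do we get $\beta$, then?

$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

$Y_{ij} = \beta X_{ij} + \beta_0 + \varepsilon_{ij}$

Within each group, consider $\alpha_i$ a constant, and notice that we actually only want the estimate of the slope $\beta$, not the intercept.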

32/85

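How do we get $\beta$, then? (2)

• Within each group, do Least Squares:

$\hat{\beta}_i = \frac{\sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_j (X_{ij} - \bar{X}_{i.})^2}$

• Assume that $\beta_1 = \beta_2 = \cdots = \beta_a = \beta$

33/85

How do we get $\beta$, then? (3)

• We use the Pooled Estimate of $\beta$:

$\hat{\beta} = \frac{\sum_i \hat{\beta}_i \sum_j (X_{ij} - \bar{X}_{i.})^2}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2} = \frac{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2}$

where $\hat{\beta}_i = \frac{\sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_j (X_{ij} - \bar{X}_{i.})^2}$

34/85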

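ANCOVA begins: $Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

1. In each group, find the slope estimate via Linear Regression: $\hat{\beta}_i = \frac{\sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_j (X_{ij} - \bar{X}_{i.})^2}$

2. Pool them together: $\hat{\beta} = \frac{\sum_i \hat{\beta}_i \sum_j (X_{ij} - \bar{X}_{i.})^2}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2}$

3. Get rid of the covariate: $\tilde{Y}_{ij}(adjust) = Y_{ij} - \hat{\beta}(X_{ij} - \bar{X}_{..})$

4. Do ANOVA on the model $\tilde{Y}_{ij}(adjust) = \mu + \alpha_i + \varepsilon_{ij}$

5. Go home and have dinner (a yummy cheeseburger and an iced Coke?).

35/85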
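
The pooling and adjustment steps above can be sketched in a few lines of plain Python. This is only an illustration under the slides' model: the function names `pooled_slope` and `adjust_responses` and the toy data are assumptions, not part of the original presentation.

```python
from statistics import mean


def pooled_slope(xs, ys):
    """Pooled within-group slope: sum of S_xy over groups / sum of S_xx."""
    sxy = 0.0
    sxx = 0.0
    for x, y in zip(xs, ys):
        xbar, ybar = mean(x), mean(y)
        sxy += sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx += sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def adjust_responses(xs, ys):
    """Remove the covariate: Y_adj = Y - beta_hat * (X - grand mean of X)."""
    beta_hat = pooled_slope(xs, ys)
    n_total = sum(len(x) for x in xs)
    x_grand = sum(sum(x) for x in xs) / n_total
    adjusted = [[yi - beta_hat * (xi - x_grand) for xi, yi in zip(x, y)]
                for x, y in zip(xs, ys)]
    return beta_hat, adjusted


# Hypothetical data: both groups follow slope 2 with different intercepts
xs = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
ys = [[3.0, 5.0, 7.0], [9.0, 11.0, 13.0]]
beta_hat, adjusted = adjust_responses(xs, ys)
print(beta_hat, adjusted)
```

In this noise-free toy example the adjusted responses are constant within each group, so all remaining variation is between groups. When a formal F test is run on adjusted values, the error degrees of freedom are usually reduced by one (N - a - 1) because the slope was estimated from the same data.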

ANCOVA, ANOVA and Regression

36/85

ANOVA /ANCOVA

Regression

General Linear Model

Simple Linear Regression

$Y = \beta_0 + \beta_1 X + \varepsilon$

• $Y$: response variable

• $X$: predictor

• $\beta_0$, $\beta_1$: intercept and slope

• $\varepsilon$: error

All of them are scalars!

37/85

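Multiple Linear Regression

$Y = X\beta + \varepsilon$

$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,(n-1)} \\ \vdots & \vdots & & \vdots \\ 1 & x_{m1} & \cdots & x_{m,(n-1)} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_m \end{pmatrix}$

38/85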

ANOVA: Dummy Variable Regression

$Y_i = \beta_0 + \beta_1 Z_i + \varepsilon_i$

• $Y_i$: outcome of the ith unit

• $Z_i$: categorical variable (binary)

• $\varepsilon_i$: residual for the ith unit

• $\beta_0$: coefficient for the intercept

• $\beta_1$: coefficient for the slope

More about the $Z_i$: $Z_i = 1$ if the unit is in the treatment group; $Z_i = 0$ if the unit is in the control group.
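
To see why a dummy-variable regression reproduces a two-group comparison, the short sketch below fits the regression by least squares on a 0/1 indicator. The helper name `simple_ols` and the numbers are hypothetical; the point is only that the fitted intercept equals the control-group mean and the fitted slope equals the difference in group means.

```python
from statistics import mean


def simple_ols(z, y):
    """Least-squares intercept and slope for y = b0 + b1 * z."""
    zbar, ybar = mean(z), mean(y)
    sxy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    sxx = sum((zi - zbar) ** 2 for zi in z)
    b1 = sxy / sxx
    b0 = ybar - b1 * zbar
    return b0, b1


# Hypothetical data: Z = 0 for control units, Z = 1 for treatment units
z = [0, 0, 0, 1, 1, 1]
y = [4.0, 5.0, 6.0, 7.0, 9.0, 11.0]

b0, b1 = simple_ols(z, y)
control_mean = mean(yi for zi, yi in zip(z, y) if zi == 0)
treatment_mean = mean(yi for zi, yi in zip(z, y) if zi == 1)
print(b0, b1, control_mean, treatment_mean)
```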

39/85

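40/85

Two-way ANOVA

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

• $Y_{ijk}$: response variable

• $\mu$: overall mean response

• $\alpha_i$: effect due to the ith level of factor A

• $\beta_j$: effect due to the jth level of factor B

• $(\alpha\beta)_{ij}$: the effect due to any interaction between the ith level of A and the jth level of B

• $\varepsilon_{ijk}$: residual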

General Linear Model

$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$

• $y_i$: the ith response variable

• $X_{1i}, \ldots, X_{pi}$: categorical variables and continuous variables

• $\varepsilon_i$: random error

The above formula can be simply denoted as: $Y = X\beta + \varepsilon$

41/85

What can this X be?

Before we see an example of X, we have learned that the General Linear Model covers (1) Simple Linear Regression; (2) Multiple Linear Regression; (3) ANOVA; (4) 2-way/n-way ANOVA.

X: Interaction Between Random Variables

X in the GLM might be expanded as

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$

where $X_3$ in the above formula could be the INTERACTION between $X_1$ and $X_2$:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 * X_2 + \varepsilon$

Did you see the tricks? Next, let us see what assumptions shall be satisfied before using ANCOVA.

42/85

Test the Three Assumptions

1. Test the homogeneity of variance.

2. Test the homogeneity of regression, i.e. whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$ holds.

3. Test whether there is a linear relationship between the dependent variable and covariate.

43/85

Before using ANCOVA…

1. Test the Homogeneity of Variance (1)

For each $i$, calculate the MSE:

$MSE_i = SSE_i / df_i = SSE_i / (n_i - 2)$

44/85

Utilize $\max_i(MSE_i)$ and $\min_i(MSE_i)$ to do an $F_{max}$ test to make sure $\sigma^2$ is constant under each of the different levels:

$F = \max_i(MSE_i) / \min_i(MSE_i)$
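
The ratio of the largest to the smallest within-group MSE can be computed as in the sketch below. It assumes each group is fitted with its own simple regression of the dependent variable on the covariate, so that each MSE has n_i - 2 degrees of freedom as on the slide; the names `regression_sse` and `f_max` and the data are hypothetical.

```python
from statistics import mean


def regression_sse(x, y):
    """Residual sum of squares of a simple linear regression of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return syy - sxy ** 2 / sxx


def f_max(xs, ys):
    """Return max(MSE_i) / min(MSE_i) and the list of MSE_i."""
    mses = [regression_sse(x, y) / (len(x) - 2) for x, y in zip(xs, ys)]
    return max(mses) / min(mses), mses


# Hypothetical data: the second group has responses twice as spread out
xs = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
ys = [[1.0, 3.0, 2.0, 4.0], [2.0, 6.0, 4.0, 8.0]]
ratio, mses = f_max(xs, ys)
print(ratio, mses)
```

The resulting ratio would then be compared with a critical value (for example from a table of Hartley's F-max distribution), which this sketch does not include.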

2. Test Whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$ (1)

45/85

2. Test Whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$ (2)

(1) Define $SSE_G = \sum_{i=1}^{a} SSE_i$, the Sum of Squares of Errors within Groups.

$SSE_i$ is calculated based on $\hat{\beta}_i$, and $SSE_G$ is generated by the random error $\varepsilon$.

46/85

2. Test Whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$ (3)

(2) We can calculate SSE based on a common $\beta$. This SSE is generated by:

• Random error $\varepsilon$

• Difference between distinct $\beta_i$

(3) Let $SSB = SSE - SSE_G$.

SSB, the Sum of Squares between Groups, is constituted by the difference between different $\beta_i$.

47/85

2. Test Whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$ (4)

$df_b = df_e - df_e^G = [a(n-1) - 1] - a(n-2) = a - 1$

$MSB = SSB / df_b = SSB / (a - 1)$: Mean Square between Groups

$MSE_G = SSE_G / df_e^G = SSE_G / [a(n-2)]$: Mean Square within Groups

Do an F test on MSB and $MSE_G$ to see whether we can reject our H0:

$F = MSB / MSE_G$
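
The four steps of the homogeneity-of-regression test can be put together as follows. This is a sketch under the slides' definitions (SSE_G from separate slopes, SSE from one common slope, SSB as their difference); the function names and the data are assumptions made for illustration, and the within-group degrees of freedom are written as the sum of n_i - 2 so that unbalanced groups are also covered.

```python
from statistics import mean


def centered_sums(x, y):
    """Return S_xx, S_xy and S_yy for one group."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxx, sxy, syy


def homogeneity_of_regression_f(xs, ys):
    """F = MSB / MSE_G for H0: all group slopes are equal."""
    parts = [centered_sums(x, y) for x, y in zip(xs, ys)]
    a = len(parts)

    # SSE_G: every group is fitted with its own slope
    sse_g = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)

    # SSE: all groups share one common (pooled) slope
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    sse = syy_w - sxy_w ** 2 / sxx_w

    ssb = sse - sse_g                       # due to differences between slopes
    df_b = a - 1
    df_g = sum(len(x) - 2 for x in xs)      # equals a(n - 2) when balanced
    f_stat = (ssb / df_b) / (sse_g / df_g)
    return f_stat, df_b, df_g


# Hypothetical data: slopes 0.8 and 1.6 in the two groups
xs = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
ys = [[1.0, 3.0, 2.0, 4.0], [2.0, 6.0, 4.0, 8.0]]
f_stat, df_b, df_g = homogeneity_of_regression_f(xs, ys)
print(f_stat, df_b, df_g)
```

A small F (relative to the F critical value on df_b and df_g degrees of freedom) gives no evidence against equal slopes, which is what ANCOVA needs.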

48/85

3. Test Linear Relationship (1)

Assumption 3: Test a linear relationship between the dependent variable and covariate.

H0: $\beta = 0$

How to do it? F test on SSR and SSE.

SSR: Sum of Squares of Regression

49/85

3. Test Linear Relationship (2)

How to calculate SSR and MSR?

From each $x_i$ we get $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.

SSR is obtained by summing the squares of the differences between $\hat{y}_i$ and $\bar{y}$:

$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

$MSR = SSR / 1$

50/85

3. Test Linear Relationship (3)

How to calculate SSE and MSE?

From each $x_i$ we get $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.

SSE is the error obtained by summing the squares of the differences between $y_i$ and $\hat{y}_i$:

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

$MSE = SSE / (n - 2)$

51/85

3. Linear Relationship Test (4)

$F = MSR / MSE$

Based on the test statistic (T.S.) we determine whether to reject H0 ($\beta = 0$) or not.

Assume Assumptions 1 and 2 have already passed.

• If H0 is true ($\beta = 0$), we do ANOVA.

• Otherwise, we do ANCOVA.

So, anytime we want to use ANCOVA, we need to test the three assumptions first!
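
The F statistic for the linear-relationship test can be computed directly from the definitions of SSR and SSE. The sketch below is a minimal plain-Python illustration; the function name `regression_f_test` and the four data points are made up, and no critical value or p-value is computed.

```python
from statistics import mean


def regression_f_test(x, y):
    """Return SSR, SSE and F = MSR / MSE for a simple linear regression."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                    # slope estimate
    b0 = ybar - b1 * xbar             # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((f - ybar) ** 2 for f in fitted)              # regression SS
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))    # error SS
    msr = ssr / 1
    mse = sse / (n - 2)
    return ssr, sse, msr / mse


# Hypothetical covariate and dependent-variable values
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
ssr, sse, f_stat = regression_f_test(x, y)
print(ssr, sse, f_stat)
```

The statistic is referred to an F distribution with 1 and n - 2 degrees of freedom; a large value argues against H0 and therefore in favour of keeping the covariate, i.e. using ANCOVA.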

52/85

Application of ANCOVA

53/85

Our case

• In this hypothetical study, a sample of 36 teams (id in the data set) of 12-year-old children attending a summer camp participated in a study to determine which one of three different tree-watering techniques worked best to promote tree growth.

Technique | Frequency | Code

Watering the base with a hose | 10 minutes once per day | 1

Watering the ground surrounding (drip system) | 2 hours each day | 2

Deep watering (sunk pipe) | 10 minutes every 3 days | 3

54/85

Conditions for the experiment

• From a large set of equally sized and equally healthy fast-growing trees, each team was given a tree to plant at the start of the camp.

• Each team was responsible for the watering and general care of their trees.

• At the end of the summer, the height of each tree was measured.

60/85

Concerns

• that some children might have had more gardening experience than others, and

• that any knowledge gained as a result of that prior experience might affect the way the tree was planted and perhaps even the way in which the children cared for the tree and carried out the watering regime.

How to approach?

Create an indicator for that knowledge (i.e. a 40-point scale of gardening experience).

61/85

Data Structure

Real Data

id | watering technique (grouping: 1, 2, 3) | tree growth (dv, dependent variable) | gardening exp (cov, covariate variable)

1 | 1 | 39 | 24

2 | 1 | 36 | 18

3 | 1 | 30 | 21

4 | 1 | 42 | 24

… | … | … | …

32 | 3 | 36 | 15

33 | 3 | 30 | 18

34 | 3 | 39 | 18

35 | 3 | 27 | 9

36 | 3 | 24 | 6

62/85

Data Structure

[Same data table as on the previous slide: id, watering technique (grouping 1, 2, 3), tree growth (dependent variable), gardening exp (covariate variable).]

$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$

• $\mu$: overall mean response

• $\beta$: regression coefficient parameter

• $\varepsilon_{ij}$: residual error

63/85

Model Assumptions (ANCOVA, SAS)

• Linearity of Regression

• Homogeneity of Regression

• Homogeneity of Variance, and dv is Normal

64/85

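The Pearson correlation coefficient between the covariate and the dependent variable is .81150.

$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$

$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$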

Assumptions

Clearly a strong linear component to the relationship.

The linearity of regression assumption appears to be met by the data set.

66/85

Assumptions (Homogeneity of Regression)

The assumption of homogeneity of regression is tested by examining the interaction of the covariate and the independent variable. If it is not statistically significant, as is the case here, then the assumption is met.

67/85

Output

The Model contains the effects of both the covariate and the independent variable.

The effects of the covariate and the independent variable are separately evaluated in this summary table.

68/85

Output

69/85

Output

Watering techniques coded as 1 (hose watering) and 3 (deep watering) are the only two groups whose means differ significantly

78/85

Experiment Conclusions

• We can assert that prior gardening experience and knowledge was quite influential in how well the trees fared under the attention of the young campers.

• When we statistically control for or equate the gardening experience and knowledge of the children, the watering technique was a relatively strong factor in how much growth was seen in the trees.

• On the basis of the adjusted means, we may therefore conclude that, when we statistically control for gardening experience, deep watering is more effective than hose watering but is not significantly more effective than drip watering.

79/85

SAS Code for ANCOVA

GROUP VARIABLE, DEPENDENT VARIABLE and COVARIATE

THIS IS ANCOVA!!!!!

80/85

ENTERPRISE GUIDE APPROACH

81/85

ENTERPRISE GUIDE APPROACH

Tasks->Graph->Scatter Plot

82/85

ENTERPRISE GUIDE APPROACH

Tasks->ANOVA->Linear Models

83/85

ENTERPRISE GUIDE APPROACH

84/85

QUESTIONS? THANK YOU!

85/85
