BIOL 582 Lecture Set 6 Tests for multiple groups: Part 2 One-Way ANOVA via the General Linear Model (GLM)

Upload: osgood

Post on 15-Jan-2016

Page 1: BIOL 582

BIOL 582

Lecture Set 6

Tests for multiple groups: Part 2

One-Way ANOVA via the General Linear Model (GLM)

Page 2: BIOL 582

A simple linear model

$$y_i = b_0 + b_1 x_i + \varepsilon_i$$

$y_i$: The ith observation from a sample, i.e., the observation of one of the subjects in a sample of size n taken from a population of size N (observational study) or randomly assigned (experimental study). Y is called the dependent variable.

$b_0$: The y-intercept.

$b_1$: The slope.

$x_i$: The value of the independent variable for the ith subject (e.g., sex, size, rank, category). X is called the independent variable.

$\varepsilon_i$: The portion unexplained by the model (error).

BIOL 582 Intro to Linear Models

Page 3: BIOL 582

A simple linear model

BIOL 582 Intro to Linear Models

$$H_0: b_1 = 0 \qquad H_A: b_1 \neq 0$$

Page 4: BIOL 582

In $y_i = b_0 + b_1 x_i + \varepsilon_i$, the part $b_0 + b_1 x_i$ is the model. The model “predicts” or “estimates” a value of y for a given value of x.

The part $\varepsilon_i$ is the error, or uncertainty, of a prediction of y. The model is not perfect, and this is a measure of its imperfection. For any single value, it is called a residual (“error” refers more to a treatment of all residuals).
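The split between model and residual can be sketched numerically. The example below is a minimal illustration, assuming ordinary least squares as the fitting method; the height and mass values and the helper name `fit_line` are hypothetical and are not taken from the lecture.

```python
# Hypothetical data (not from the lecture): height in cm, mass in kg.
height = [150.0, 160.0, 165.0, 172.0, 180.0, 188.0]
mass = [52.0, 58.0, 63.0, 70.0, 77.0, 85.0]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(height, mass)

# The model part: predicted y for each observed x.
predicted = [b0 + b1 * xi for xi in height]

# The error part: one residual per observation.
residuals = [yi - pi for yi, pi in zip(mass, predicted)]

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("residuals:", [round(r, 3) for r in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero, which is one way to check the fit.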

BIOL 582 Intro to Linear Models

A simple linear model

Page 5: BIOL 582

[Figure: plot of Mass (kg), light to heavy, against Height (cm), short to tall]

BIOL 582 Intro to Linear Models

A simple linear model

Page 6: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

For a simple linear model,

$$y_i = b_0 + b_1 x_i + \varepsilon_i$$

there are two null hypotheses (the two are rather similar):

H0: b0 = 0

H0: b1 = 0

There are a few cases in biological research where testing null hypotheses for intercepts is valuable. We will focus on the slope because the slope has more direct biological meaning (it measures a rate of change). Plus, the intercept is largely an artifact of the slope in most cases (a significant negative slope may come with a rather large y-intercept). Nevertheless, because the methods for testing both hypotheses are the same, one can easily test the intercept too.
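To illustrate that the intercept and the slope can be tested in the same way, here is a minimal sketch that computes a t statistic for each coefficient. It assumes the standard least-squares standard errors, which the slides do not spell out; the data are hypothetical. Each statistic would be compared against a t distribution with n − 2 degrees of freedom.

```python
import math

# Hypothetical data (not from the lecture): height in cm, mass in kg.
height = [150.0, 160.0, 165.0, 172.0, 180.0, 188.0]
mass = [52.0, 58.0, 63.0, 70.0, 77.0, 85.0]

n = len(height)
x_bar = sum(height) / n
y_bar = sum(mass) / n
sxx = sum((x - x_bar) ** 2 for x in height)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(height, mass))

# Least-squares estimates of slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual variance (mean squared error) with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(height, mass))
mse = sse / (n - 2)

# Standard errors of the two coefficients.
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

# Same recipe for both hypotheses: estimate divided by its standard error.
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

print("t for H0: b0 = 0 ->", round(t_b0, 3))
print("t for H0: b1 = 0 ->", round(t_b1, 3))
```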

Page 7: BIOL 582

BIOL 582 Linear Models and comparison of group means

The general linear model,

$$y_i = b_0 + b_1 x_i + \varepsilon_i$$

can also be written as

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

which indicates that the value of subject j in group i is equal to some expected value, μ (the overall mean or the group 1 mean), plus some effect, τ, that measures the difference of the ith group from the expected value, plus error. The values of x are 0s and 1s to indicate group association. X is a “dummy variable”.

This is easy to see with a plot.
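A small numerical sketch of the dummy-variable idea, using hypothetical masses for two groups A and B (not data from the lecture). When x is coded 0 for group A and 1 for group B, a least-squares fit returns the group A mean as the intercept and the difference between group means as the slope.

```python
# Hypothetical masses (kg) for two groups; not data from the lecture.
group_a = [60.0, 62.0, 65.0, 59.0]
group_b = [70.0, 73.0, 68.0, 75.0]

# Dummy variable: 0 marks group A, 1 marks group B.
x = [0.0] * len(group_a) + [1.0] * len(group_b)
y = group_a + group_b

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept of y on the dummy variable.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print("intercept b0 =", b0, "| group A mean =", mean_a)
print("slope b1 =", b1, "| mean B - mean A =", mean_b - mean_a)
```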

Page 8: BIOL 582

[Figure: plot of Mass (kg), light to heavy, against Group, x, with group A at x = 0 and group B at x = 1]

BIOL 582 Intro to Linear Models

The general linear model for groups


Page 11: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

To avoid unnecessary redundancy, we will try to use this terminology:

$$y_i = b_0 + b_1 x_i + \varepsilon_i$$

Here $b_0 + b_1 x_i$ is the model and $\varepsilon_i$ is the error (model + error). To be consistent with the way R uses dummy variables, we will assume that the intercept is the first group mean.

Thus the model above indicates that groups differ from the mean of group 1, based on the slope (effect) of b1 for group 2.

The equation $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, however, means that τ represents a suite of coefficients for g − 1 groups.

This will make more sense soon.
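The "suite of coefficients for g − 1 groups" can be sketched with three hypothetical groups. The example builds a design matrix with an intercept column and g − 1 dummy columns, with the first group as the reference, which is an assumption chosen to mirror the convention described above. It then checks that the coefficients (first group mean, then each other group's difference from it) satisfy the least-squares normal equations. All names and values are illustrative.

```python
# Hypothetical masses (kg) for g = 3 groups; not data from the lecture.
groups = {
    "A": [60.0, 62.0, 65.0, 59.0],
    "B": [70.0, 73.0, 68.0, 75.0],
    "C": [66.0, 64.0, 69.0, 67.0],
}
labels = sorted(groups)      # the first label ("A") is the reference group
reference = labels[0]

# Design matrix: an intercept column plus g - 1 dummy (0/1) columns.
X = []
y = []
for name in labels:
    for value in groups[name]:
        row = [1.0] + [1.0 if name == other else 0.0 for other in labels[1:]]
        X.append(row)
        y.append(value)

means = {name: sum(v) / len(v) for name, v in groups.items()}

# Intercept = reference group mean; each effect = difference from that mean.
coef = [means[reference]] + [means[name] - means[reference] for name in labels[1:]]

fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Least-squares check: residuals are orthogonal to every design column.
normal_eq = [sum(row[j] * r for row, r in zip(X, residuals)) for j in range(len(coef))]

print("coefficients:", coef)
print("normal-equation checks (all should be about 0):", normal_eq)
```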

Page 12: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

H0: b1 = 0

Testing the null hypothesis for a parameter estimate is the same as comparing two different models! (One model contains the parameter; one lacks it.)

“Full” Model: $y_i = b_0 + b_1 x_i + \varepsilon_i$ (model + error)

“Reduced” Model (also called a null model): $y_i = b_0 + \varepsilon_i$

The reduced model is the same as a model that only produces the overall mean. This is the same as saying that group means are not different, such that they are equal to each other and equal to the overall mean.
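As a rough sketch of the two models, the example below fits both to hypothetical two-group data (not from the lecture). The reduced model predicts the overall mean for everyone; the full model predicts each subject's own group mean. The sum of squared residuals (SSE) of each model measures what it leaves unexplained.

```python
# Hypothetical masses (kg) for two groups; not data from the lecture.
group_a = [60.0, 62.0, 65.0, 59.0]
group_b = [70.0, 73.0, 68.0, 75.0]
everyone = group_a + group_b


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values, center):
    """Sum of squared deviations of values from a given center."""
    return sum((v - center) ** 2 for v in values)


# Reduced (null) model: every subject is predicted by the overall mean.
sse_reduced = sum_sq_dev(everyone, mean(everyone))

# Full model: every subject is predicted by its own group mean.
sse_full = sum_sq_dev(group_a, mean(group_a)) + sum_sq_dev(group_b, mean(group_b))

print("SSE reduced:", sse_reduced)
print("SSE full:   ", sse_full)
```

The full model can never have a larger SSE than the reduced model; the question for a test is whether the reduction is larger than expected by chance.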

Page 13: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

H0: b1 = 0

One-way ANOVA is a comparison of the error variances from full and reduced models!

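“Full” Model: $y_i = b_0 + b_1 x_i + \varepsilon_{i,\text{full}}$

“Reduced” Model (also called a null model): $y_i = b_0 + \varepsilon_{i,\text{reduced}}$

Rearranged for the error terms, this is the same as saying

$$\varepsilon_{i,\text{full}} = y_i - (b_0 + b_1 x_i) \qquad \varepsilon_{i,\text{reduced}} = y_i - b_0$$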

Page 14: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

H0: b1 = 0

One-way ANOVA is a comparison of the error variances from full and reduced models!

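Rearranging the full model, $y_i = b_0 + b_1 x_i + \varepsilon_{i,\text{full}}$, and the reduced model, $y_i = b_0 + \varepsilon_{i,\text{reduced}}$, for their error terms is the same as saying

$$\varepsilon_{i,\text{full}} = y_i - (b_0 + b_1 x_i) \qquad \varepsilon_{i,\text{reduced}} = y_i - b_0$$

These are residuals.

Therefore, ANOVA compares the residual (error) variation left by the reduced model with the residual variation left by the full model.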
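The comparison of residual variation can be sketched as an F statistic. The form below, the drop in SSE per added parameter divided by the full model's error variance, is the standard model-comparison F and is an assumption here, since the slide's own equation did not survive. The data are hypothetical. The sketch also checks that this F equals the classical between-group over within-group mean square.

```python
# Hypothetical masses (kg) for three groups; not data from the lecture.
groups = [
    [60.0, 62.0, 65.0, 59.0],
    [70.0, 73.0, 68.0, 75.0],
    [66.0, 64.0, 69.0, 67.0],
]
everyone = [v for g in groups for v in g]
n = len(everyone)


def mean(values):
    return sum(values) / len(values)


grand_mean = mean(everyone)

# Residual sums of squares for the reduced and full models.
sse_reduced = sum((v - grand_mean) ** 2 for v in everyone)
sse_full = sum((v - mean(g)) ** 2 for g in groups for v in g)

k_reduced = 1            # the reduced model estimates only the overall mean
k_full = len(groups)     # the full model estimates one mean per group

# Model-comparison F: drop in SSE per added parameter over full-model error variance.
f_models = ((sse_reduced - sse_full) / (k_full - k_reduced)) / (sse_full / (n - k_full))

# Classical one-way ANOVA F: between-group mean square over within-group mean square.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
f_classic = (ss_between / (k_full - 1)) / (sse_full / (n - k_full))

print("F from model comparison:", round(f_models, 4))
print("F from classical ANOVA: ", round(f_classic, 4))
```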

Page 15: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

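WARNING: This may hurt!

Model likelihood, $L(\hat{\beta} \mid Y)$: the likelihood of model parameters (β), given the data (Y)

$$L(\hat{\beta} \mid Y) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} (Y_i - \mu)^t\, \Sigma^{-1} (Y_i - \mu)\right]$$

The right side of the equation is the multivariate normal probability density function (Σ is the error covariance matrix, p is the number of response variables, and μ is the value expected under the model, estimated by $\hat{Y}$), and is maximized when Y = μ (i.e., the exponent is 0). For univariate response data, Σ is simply SSE/n.

$$\log L(\hat{\beta}) = -\frac{1}{2}\, n \log|\Sigma| - \frac{n}{2}\log 2\pi - \frac{n}{2}$$

The latter part, $-\frac{n}{2}\log 2\pi - \frac{n}{2}$, is a constant.

The smaller the error, the larger the likelihood of the model.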
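For univariate data, the log-likelihood can be computed directly from SSE. The sketch below uses hypothetical two-group data and assumes the error variance is estimated as SSE/n, as stated above. It evaluates the closed form −(n/2) log(SSE/n) − (n/2) log 2π − n/2 and checks it against a direct sum of normal log-densities of the residuals. The function names are illustrative.

```python
import math

# Hypothetical masses (kg) for two groups; not data from the lecture.
group_a = [60.0, 62.0, 65.0, 59.0]
group_b = [70.0, 73.0, 68.0, 75.0]
everyone = group_a + group_b
n = len(everyone)


def mean(values):
    return sum(values) / len(values)


def loglik_from_sse(sse, n):
    """Closed-form normal log-likelihood with error variance fixed at SSE / n."""
    return -0.5 * n * math.log(sse / n) - 0.5 * n * math.log(2 * math.pi) - 0.5 * n


def loglik_direct(residuals):
    """Sum of normal log-densities of the residuals, with variance SSE / n."""
    var = sum(r ** 2 for r in residuals) / len(residuals)
    return sum(-0.5 * math.log(2 * math.pi * var) - r ** 2 / (2 * var) for r in residuals)


# Residuals under the full model (group means) and reduced model (overall mean).
res_full = [v - mean(group_a) for v in group_a] + [v - mean(group_b) for v in group_b]
res_reduced = [v - mean(everyone) for v in everyone]

ll_full = loglik_from_sse(sum(r ** 2 for r in res_full), n)
ll_reduced = loglik_from_sse(sum(r ** 2 for r in res_reduced), n)

print("log-likelihood, full model:   ", round(ll_full, 4))
print("log-likelihood, reduced model:", round(ll_reduced, 4))
```

The full model has the smaller SSE and therefore the larger log-likelihood.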

Page 16: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

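WARNING: This may hurt!

• Consider the following:

$$\log L(\hat{\beta}_{full}) - \log L(\hat{\beta}_{reduced}) = -\frac{1}{2}\, n \log|\Sigma_{full}| + \frac{1}{2}\, n \log|\Sigma_{reduced}| = -\frac{1}{2}\, n \left(\log|\Sigma_{full}| - \log|\Sigma_{reduced}|\right)$$

• Therefore, this difference, which is the log of the ratio of the two model likelihoods, is a likelihood ratio test (LRT) stat *

* Note: it is common convention to express the LRT as −2 times the log of the reduced-to-full likelihood ratio,

$$LRT = -2\left[\log L(\hat{\beta}_{reduced}) - \log L(\hat{\beta}_{full})\right] = n\left(\log|\Sigma_{reduced}| - \log|\Sigma_{full}|\right)$$

which approximately follows a Chi-square distribution with Δk df, where Δk = k_full − k_reduced and k_i is the number of model parameters for the ith model.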
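A minimal numerical sketch of the LRT for hypothetical two-group data. With univariate data and error variance SSE/n, the statistic reduces to n log(SSE_reduced / SSE_full). Because the full model adds one parameter here (Δk = 1), the Chi-square tail probability with 1 df can be written with the complementary error function, which keeps the example within the standard library. The data and variable names are illustrative.

```python
import math

# Hypothetical masses (kg) for two groups; not data from the lecture.
group_a = [60.0, 62.0, 65.0, 59.0]
group_b = [70.0, 73.0, 68.0, 75.0]
everyone = group_a + group_b
n = len(everyone)


def mean(values):
    return sum(values) / len(values)


sse_reduced = sum((v - mean(everyone)) ** 2 for v in everyone)
sse_full = (sum((v - mean(group_a)) ** 2 for v in group_a)
            + sum((v - mean(group_b)) ** 2 for v in group_b))

# Univariate error variance estimates (SSE / n) for each model.
var_reduced = sse_reduced / n
var_full = sse_full / n

# LRT = n * (log variance of reduced model - log variance of full model).
lrt = n * (math.log(var_reduced) - math.log(var_full))

# Chi-square upper tail for 1 df: P(X > x) = erfc(sqrt(x / 2)).
delta_k = 1
p_value = math.erfc(math.sqrt(lrt / 2.0))

print("LRT statistic:", round(lrt, 4), "with", delta_k, "df")
print("approximate P-value:", p_value)
```

The Chi-square approximation is asymptotic, so with a sample this small the P-value is only a rough guide.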

Page 17: BIOL 582

BIOL 582 Linear Models and Hypothesis tests

So what is ANOVA?

ANOVA is a form of likelihood ratio test.

Consider three ways to test a null hypothesis for a linear model:

[Diagram: the Reduced Model compared with the Full Model]

Page 18: BIOL 582

BIOL 582 Comments

It is wise to learn how to perform matrix operations to fully understand linear models

You are not required to do this, but a supplemental slideshow is provided

The assumptions for ANOVA using F are

① Normal error

② Homoscedastic variance

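The assumptions for ANOVA using randomization are… nothing about the form of the error distribution (only that observations are exchangeable under the null hypothesis)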
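A randomization version of the test can be sketched as follows, using hypothetical two-group data. Group labels are shuffled many times, F is recomputed for each shuffle, and the P-value is the proportion of F values (including the observed one) at least as large as the observed F. The number of permutations, the seed, and the function name `f_statistic` are arbitrary choices for illustration.

```python
import random

# Hypothetical masses (kg) and group labels; not data from the lecture.
values = [60.0, 62.0, 65.0, 59.0, 70.0, 73.0, 68.0, 75.0]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]


def f_statistic(values, labels):
    """F from comparing the reduced (overall mean) and full (group means) models."""
    names = sorted(set(labels))
    n = len(values)
    grand_mean = sum(values) / n
    sse_reduced = sum((v - grand_mean) ** 2 for v in values)
    sse_full = 0.0
    for name in names:
        members = [v for v, lab in zip(values, labels) if lab == name]
        group_mean = sum(members) / len(members)
        sse_full += sum((v - group_mean) ** 2 for v in members)
    k_full, k_reduced = len(names), 1
    return ((sse_reduced - sse_full) / (k_full - k_reduced)) / (sse_full / (n - k_full))


observed_f = f_statistic(values, labels)

rng = random.Random(582)      # fixed seed so the sketch is repeatable
n_perm = 999
count = 0
shuffled = labels[:]
for _ in range(n_perm):
    rng.shuffle(shuffled)     # break any association between label and value
    if f_statistic(values, shuffled) >= observed_f - 1e-12:
        count += 1

# The observed arrangement counts as one of the possible random outcomes.
p_value = (count + 1) / (n_perm + 1)

print("observed F:", observed_f)
print("randomization P-value:", p_value)
```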

The assumptions for the LRT depend on the data used – there are different LRTs, as there are different likelihood functions