Analysis of Variance ST 511

Upload: alyson-kelley

Post on 12-Jan-2016


Page 1: Analysis of Variance ST 511 Introduction n Analysis of variance compares two or more populations of quantitative data. n Specifically, we are interested

Analysis of Variance

ST 511


Introduction

Analysis of variance compares two or more populations of quantitative data.

Specifically, we are interested in determining whether differences exist between the population means.

The procedure works by analyzing the sample variance.


§1 One Way Analysis of Variance

The analysis of variance is a procedure that tests to determine whether differences exist between two or more population means.

To do this, the technique analyzes the sample variances.


One Way Analysis of Variance: Example

A magazine publisher wants to compare three different styles of covers for a magazine that will be offered for sale at supermarket checkout lines. She assigns 60 stores at random to the three styles of covers and records the number of magazines that are sold in a one-week period.


One Way Analysis of Variance: Example

How do five bookstores in the same city differ in the demographics of their customers? A market researcher asks 50 customers of each store to respond to a questionnaire. One variable of interest is the customer’s age.


Idea Behind ANOVA: two types of variability

1. Within-group variability
2. Between-group variability


[Figure: two dot plots of Treatment 1, Treatment 2, and Treatment 3. In both plots the sample means are x̄1 = 10, x̄2 = 15, and x̄3 = 20; the plots differ only in how widely the observations scatter around those means.]

A small variability within the samples makes it easier to draw a conclusion about the population means.

The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
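The effect of within-sample variability can be checked numerically. The sketch below is only an illustration: the two data sets are hypothetical values chosen to have sample means 10, 15, and 20 (they are not the data behind the figure), and `f_statistic` is a helper name introduced here. It computes the ratio of between-group to within-group variability, which is the F statistic developed later in these slides.

```python
from statistics import mean


def f_statistic(groups):
    """Ratio of between-group to within-group mean squares (one-way ANOVA F)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: sample means versus the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations versus their own sample mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssg / (k - 1)) / (sse / (n - k))


# Hypothetical data; both sets have sample means 10, 15, and 20.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
spread = [[1, 10, 19], [7, 15, 23], [12, 20, 28]]

f_tight = f_statistic(tight)
f_spread = f_statistic(spread)
print(f_tight, f_spread)
```

The tightly clustered data give a far larger ratio than the widely scattered data, even though the sample means are identical.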


Idea behind ANOVA: recall the two-sample t-statistic

Difference between 2 means, pooled variances, sample sizes both equal to n:

$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{n}}} = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{2}{n}}}, \qquad t^2 = \frac{\frac{n}{2}(\bar{x} - \bar{y})^2}{s_p^2}$$

Numerator of t^2: measures variation between the groups in terms of the difference between their sample means.

Denominator: measures variation within groups by the pooled estimator of the common variance.

If the within-group variation is small, the same variation between groups produces a larger statistic and a more significant result.
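The identity between the pooled t statistic and its squared form can be verified on a small example. This is a minimal sketch: the two samples are hypothetical numbers picked for easy arithmetic, and the variable names are introduced here for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical samples of equal size n (illustration only).
x = [12, 15, 11, 14, 13]
y = [17, 16, 19, 15, 18]
n = len(x)

# Pooled variance; with equal sample sizes it is the average of the two sample variances.
sp2 = (variance(x) + variance(y)) / 2

# Two-sample pooled t statistic.
t = (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1 / n + 1 / n))

# The same quantity squared, written as between-group variation over within-group variation.
between = (n / 2) * (mean(x) - mean(y)) ** 2
t_squared = between / sp2

print(t, t ** 2, t_squared)
```

Both expressions for t^2 agree, which is the sense in which ANOVA generalizes the pooled two-sample t test to more than two groups.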


One Way Analysis of Variance: Example

Example 1

– An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
– The marketing manager has to decide how to market the new product.
– Three strategies are considered:
    1. Emphasize convenience of using the product.
    2. Emphasize the quality of the product.
    3. Emphasize the product’s low price.


One Way Analysis of Variance

Example 1 (continued)

– An experiment was conducted as follows:
    • In three cities an advertisement campaign was launched.
    • In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
    • The weekly sales were recorded for twenty weeks following the beginning of the campaigns.


One Way Analysis of Variance

Weekly sales by advertising emphasis:

Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532


One Way Analysis of Variance

Solution

– The data are quantitative.
– The problem objective is to compare sales in three cities.
– We hypothesize that the three population means are equal.


Defining the Hypotheses

• Solution

H0: μ1 = μ2 = μ3

H1: At least two means differ

To build the statistic needed to test the hypotheses, use the following notation:


Notation

Independent samples are drawn from k populations (treatment groups).

               Sample 1     Sample 2     ...   Sample k
               x_{11}       x_{12}       ...   x_{1k}
               x_{21}       x_{22}       ...   x_{2k}
               ...          ...                ...
               x_{n_1,1}    x_{n_2,2}    ...   x_{n_k,k}
Sample size    n_1          n_2          ...   n_k
Sample mean    \bar{x}_1    \bar{x}_2    ...   \bar{x}_k

Here x_{11} is the first observation of the first sample, and x_{22} is the second observation of the second sample.

X is the “response variable”. The variable’s values are called “responses”.


Terminology

In the context of this problem:

Response variable – weekly sales
Responses – actual sale values
Experimental unit – weeks in the three cities when we record sales figures
Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
Factor levels – the population (treatment) names. In this problem the factor levels are the 3 marketing strategies: 1) convenience, 2) quality, 3) price.


The rationale of the test statistic

Two types of variability are employed when testing for the equality of the population means:

1. Within-sample variability
2. Between-sample variability


The rationale behind the test statistic – I

If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).

If the alternative hypothesis is true, at least some of the sample means would differ.

Thus, we measure variability between sample means.

H0: μ1 = μ2 = μ3

H1: At least two means differ


Variability between sample means

• The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean.

This sum is called the Sum of Squares for Groups (SSG).

In our example, treatments are represented by the different advertising strategies.


Sum of squares for treatment groups (SSG)

$$SSG = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$$

Here there are k treatments, n_j is the size of sample j, \bar{x}_j is the mean of sample j, and \bar{\bar{x}} is the grand mean.

Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SSG. Thus, large SSG indicates large variation between sample means, which supports H1.


Sum of squares for treatment groups (SSG)

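Solution (continued): calculate SSG

$$\bar{x}_1 = 577.55, \quad \bar{x}_2 = 653.00, \quad \bar{x}_3 = 608.65$$

The grand mean is calculated by

$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

which is 613.07 for these data. Then

$$SSG = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$$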
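The SSG arithmetic can be reproduced from the three sample means and sample sizes alone. The following is a minimal sketch using the summary figures quoted on this slide; the variable names are chosen here for illustration.

```python
# Summary statistics from the example: 20 weekly sales figures per city.
sample_sizes = [20, 20, 20]
sample_means = [577.55, 653.00, 608.65]

# Grand mean: sample means weighted by their sample sizes.
total_n = sum(sample_sizes)
grand_mean = sum(n_j * xbar_j for n_j, xbar_j in zip(sample_sizes, sample_means)) / total_n

# Sum of squares for groups: weighted squared distances from the grand mean.
ssg = sum(n_j * (xbar_j - grand_mean) ** 2 for n_j, xbar_j in zip(sample_sizes, sample_means))

print(round(grand_mean, 2), round(ssg, 2))
```

Carrying the unrounded grand mean through the calculation gives the slide's value of SSG to two decimals.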


Sum of squares for treatment groups (SSG)

Is SSG = 57,512.23 large enough to reject H0 in favor of H1?


The rationale behind test statistic – II

Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.

Therefore, even though sample means may markedly differ from one another, SSG must be judged relative to the “within samples variability”.


Within samples variability

The variability within samples is measured by adding all the squared distances between observations and their sample means.

This sum is called the Sum of Squares for Error (SSE).

In our example, this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).


Sum of squares for errors (SSE)

Solution (continued): calculate SSE

$$s_1^2 = 10{,}775.00, \quad s_2^2 = 7{,}238.11, \quad s_3^2 = 8{,}670.24$$

$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$

$$= (20 - 1)(10{,}775.00) + (20 - 1)(7{,}238.11) + (20 - 1)(8{,}670.24) = 506{,}983.50$$
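As a quick check, SSE can be rebuilt from the sample variances. This sketch uses the variances as printed, rounded to two decimals, so the result (506,983.65) differs from the slide's 506,983.50 by a small rounding amount; the slide's figure matches what the raw sales data give. Variable names are illustrative.

```python
# Sample sizes and sample variances as printed in the example (rounded to two decimals).
sample_sizes = [20, 20, 20]
sample_variances = [10775.00, 7238.11, 8670.24]

# Each sample contributes (n_j - 1) * s_j^2, its own sum of squared deviations.
sse = sum((n_j - 1) * s2_j for n_j, s2_j in zip(sample_sizes, sample_variances))

print(round(sse, 2))
```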


Sum of squares for errors (SSE)

Is SSG = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal?


The mean sum of squares

To perform the test we need to calculate the mean squares as follows:

Calculation of MSG (Mean Square for treatment Groups):

$$MSG = \frac{SSG}{k - 1} = \frac{57{,}512.23}{3 - 1} = 28{,}756.12$$

Calculation of MSE (Mean Square for Error):

$$MSE = \frac{SSE}{n - k} = \frac{506{,}983.50}{60 - 3} = 8{,}894.45$$


Calculation of the test statistic

$$F = \frac{MSG}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

with the following degrees of freedom:

ν1 = k − 1 and ν2 = n − k

Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
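The step from the sums of squares to the test statistic is short enough to script. A minimal sketch, taking SSG and SSE as given in the example and using illustrative variable names:

```python
# Sums of squares and design sizes from the example.
ssg = 57512.23   # sum of squares for groups
sse = 506983.50  # sum of squares for error
k = 3            # number of treatments
n = 60           # total number of observations

df_groups = k - 1  # numerator degrees of freedom
df_error = n - k   # denominator degrees of freedom

msg = ssg / df_groups  # mean square for groups
mse = sse / df_error   # mean square for error
f_stat = msg / mse

print(df_groups, df_error, round(f_stat, 2))
```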


The F test rejection region

And finally the hypothesis test:

H0: μ1 = μ2 = … = μk

H1: At least two means differ

Test statistic:

$$F = \frac{MSG}{MSE}$$

Rejection region (R.R.):

$$F > F_{\alpha,\,k-1,\,n-k}$$


The F test

H0: μ1 = μ2 = μ3

H1: At least two means differ

Test statistic:

$$F = \frac{MSG}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

Rejection region (R.R.):

$$F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,3-1,\,60-3} \approx 3.15$$

Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of H1, and to argue that at least one of the mean sales is different from the others.


The F test p-value

[Figure: plot of the F distribution, horizontal axis from 0 to 4, vertical axis from 0 to 0.1.]

Use Excel to find the p-value: fx > Statistical > F.DIST.RT(3.23, 2, 57) = .0469

p-value = P(F > 3.23) = .0469
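Without Excel, the same tail probability can be obtained in closed form, but only because the numerator has 2 degrees of freedom here: in that special case P(F > f) = (1 + 2f/ν2)^(−ν2/2). The sketch below relies on that special case and is not a general F-distribution routine; the function names are introduced here for illustration.

```python
def f_upper_tail_df2(f, d2):
    """P(F > f) for an F distribution with 2 and d2 degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f / d2) ** (-d2 / 2)


def f_critical_df2(alpha, d2):
    """Value f with P(F > f) = alpha, again for numerator df = 2 only."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


p_value = f_upper_tail_df2(3.23, 57)
f_crit = f_critical_df2(0.05, 57)
print(round(p_value, 4), round(f_crit, 2))
```

The p-value comes out at about 0.047, in line with the Excel result, and the critical value for 2 and 57 degrees of freedom is about 3.16.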


Excel single factor ANOVA

Anova: Single Factor

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775.00
Quality       20      13060   653.00    7238.11
Price         20      12173   608.65    8670.24

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        57512    2    28756   3.23   0.0468    3.16
Within Groups         506984   57   8894
Total                 564496   59

SS(Total) = SSG + SSE
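The Excel table can be rebuilt from the raw weekly sales with a few lines of plain Python. This is a sketch rather than a replacement for a statistics package: it computes the sums of squares, mean squares, and F, and leaves the p-value and critical value to other tools. The dictionary layout and variable names are choices made here for illustration.

```python
from statistics import mean

# Weekly sales from the example, one list per advertising emphasis.
sales = {
    "Convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "Quality": [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "Price": [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
              691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

k = len(sales)
n = sum(len(v) for v in sales.values())
grand_mean = sum(sum(v) for v in sales.values()) / n

# Between-groups, within-groups, and total sums of squares.
ssg = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in sales.values())
sse = sum((x - mean(v)) ** 2 for v in sales.values() for x in v)
ss_total = sum((x - grand_mean) ** 2 for v in sales.values() for x in v)

msg = ssg / (k - 1)
mse = sse / (n - k)
f_stat = msg / mse

print(f"Between Groups  SS={ssg:.0f}  df={k - 1}  MS={msg:.0f}  F={f_stat:.2f}")
print(f"Within Groups   SS={sse:.0f}  df={n - k}  MS={mse:.0f}")
print(f"Total           SS={ss_total:.0f}  df={n - 1}")
```

The total sum of squares is computed independently here, so comparing it with SSG + SSE confirms the decomposition shown in the table.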


Multiple Comparisons

When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and at what ranking order.

Two statistical inference procedures geared at doing this are presented:
– “regular” confidence interval calculations
– Bonferroni adjustment


Multiple Comparisons

Two means are considered different if the confidence interval for the difference between the corresponding sample means does not contain 0. In this case the larger sample mean is believed to be associated with a larger population mean.

How do we calculate the confidence intervals?


“Regular” Method

This method builds on the equal variances confidence interval for the difference between two means.

The CI is improved by using MSE rather than s_p^2 (we use ALL the data to estimate the common variance instead of only the data from 2 samples):

$$(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2,\,n-k}\; s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}, \qquad \text{d.f.} = n - k, \quad s = \sqrt{MSE}$$


Experiment-wise Type I error rate (the effective Type I error)

The preceding “regular” method may result in an increased probability of committing a Type I error.

The experiment-wise Type I error rate is the probability of committing at least one Type I error when each comparison is made at significance level α. It is calculated by:

experiment-wise Type I error rate = 1 − (1 − α)^g

where g is the number of pairwise comparisons, i.e. g = kC2 = k(k−1)/2.

For example, if α = .05 and k = 4, then g = 6 and the experiment-wise Type I error rate = 1 − .735 = .265.

The Bonferroni adjustment determines the required Type I error probability per pairwise comparison (α*) to secure a pre-determined overall α.
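The growth of the experiment-wise error rate, and the effect of dividing α by the number of comparisons, can be seen in a short calculation. This sketch applies the formula on this slide; the function name `experimentwise_error` is introduced here for illustration.

```python
from math import comb


def experimentwise_error(alpha, k):
    """1 - (1 - alpha)^g, with g = k(k-1)/2 pairwise comparisons among k means."""
    g = comb(k, 2)
    return 1 - (1 - alpha) ** g


k = 4
alpha = 0.05
g = comb(k, 2)

unadjusted = experimentwise_error(alpha, k)     # each comparison run at alpha
adjusted = experimentwise_error(alpha / g, k)   # each comparison run at alpha / g

print(g, round(unadjusted, 3), round(adjusted, 3))
```

With four means the unadjusted rate is about .265, while running each comparison at α/g keeps the overall rate just under .05.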


Bonferroni Adjustment

The procedure:

– Compute the number of pairwise comparisons g = k(k−1)/2, where k is the number of populations.
– Set α* = α/g, where α is the true probability of making at least one Type I error (called experiment-wise Type I error).
– Calculate the following CI for μi − μj:

$$(\bar{x}_i - \bar{x}_j) \pm t_{\alpha^*/2,\,n-k}\; s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}, \qquad \text{d.f.} = n - k, \quad s = \sqrt{MSE}$$


Bonferroni Method

Example (continued)

– Rank the effectiveness of the marketing strategies (based on mean weekly sales).
– Use the Bonferroni adjustment method.

Solution

– The sample mean sales were 577.55, 653.0, 608.65.
– We calculate g = k(k−1)/2 to be 3(2)/2 = 3.
– We set α* = .05/3 = .0167, thus t_{.0167/2, 60−3} = 2.467 (Excel).
– Note that s = √8894.447 = 94.31.

$$\bar{x}_1 - \bar{x}_2 = 577.55 - 653 = -75.45$$

$$\bar{x}_1 - \bar{x}_3 = 577.55 - 608.65 = -31.10$$

$$\bar{x}_2 - \bar{x}_3 = 653 - 608.65 = 44.35$$

$$t_{\alpha^*/2}\; s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}} = 2.467(94.31)\sqrt{\frac{1}{20} + \frac{1}{20}} = 73.57$$


Bonferroni Method: The Three Confidence Intervals

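$$(\bar{x}_i - \bar{x}_j) \pm t_{\alpha^*/2,\,n-k}\; s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}, \qquad \text{d.f.} = n - k, \quad s = \sqrt{MSE}$$

$$t_{\alpha^*/2}\; s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}} = 2.467(94.31)\sqrt{\frac{1}{20} + \frac{1}{20}} = 73.57$$

$$\bar{x}_1 - \bar{x}_2 = 577.55 - 653 = -75.45$$

$$\bar{x}_1 - \bar{x}_3 = 577.55 - 608.65 = -31.10$$

$$\bar{x}_2 - \bar{x}_3 = 653 - 608.65 = 44.35$$

μ1 − μ2: −75.45 ± 73.57 = (−149.02, −1.88)
μ1 − μ3: −31.10 ± 73.57 = (−104.67, 42.47)
μ2 − μ3: 44.35 ± 73.57 = (−29.22, 117.92)

There is a significant difference between μ1 and μ2.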
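The three intervals follow mechanically from the sample means, MSE, and the critical t value. In the sketch below the t value 2.467 is taken from the slides as given (it is not computed), and the dictionary keyed by group number is an illustrative choice.

```python
from itertools import combinations
from math import sqrt

# Group number -> sample mean (1 = Convenience, 2 = Quality, 3 = Price).
means = {1: 577.55, 2: 653.00, 3: 608.65}
n_per_group = 20
mse = 8894.447
t_crit = 2.467  # t for alpha*/2 = .0167/2 with 57 d.f., as quoted in the slides

# Half-width shared by all intervals, since every group has the same size.
margin = t_crit * sqrt(mse) * sqrt(1 / n_per_group + 1 / n_per_group)

intervals = {}
for i, j in combinations(means, 2):
    diff = means[i] - means[j]
    intervals[(i, j)] = (diff - margin, diff + margin)

# A pair differs significantly when its interval excludes 0.
significant = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 2), round(hi, 2))
print("significant pairs:", significant)
```

Only the interval for groups 1 and 2 excludes 0, matching the conclusion stated on the slide.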


Bonferroni Method: Conclusions Resulting from Confidence Intervals

Do we have evidence to distinguish two means?

Group 1 Convenience: sample mean 577.55
Group 2 Quality: sample mean 653
Group 3 Price: sample mean 608.65

μ1 − μ3: −31.10 ± 73.57 = (−104.67, 42.47)
μ1 − μ2: −75.45 ± 73.57 = (−149.02, −1.88)
μ2 − μ3: 44.35 ± 73.57 = (−29.22, 117.92)

List the group numbers in increasing order of their sample means; connecting overhead lines mean no significant difference:

1 3 2 (one overhead line joins 1 and 3, another joins 3 and 2; no line joins 1 and 2)
