
Copyright © 2013, 2009, and 2007, Pearson Education, Inc.

Chapter 14 Comparing Groups: Analysis of Variance Methods

Section 14.2

Estimating Differences in Groups for a Single Factor


Confidence Intervals Comparing Pairs of Means

Follow Up to an ANOVA F-Test:

When an analysis of variance F-test has a small P-value, the test does not specify which means are different or how different they are.

We can estimate differences between population means with confidence intervals.


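SUMMARY: Confidence Interval Comparing Means

For two groups i and j, with sample means $\bar{y}_i$ and $\bar{y}_j$ having sample sizes $n_i$ and $n_j$, the 95% confidence interval for $\mu_i - \mu_j$ is:

$$(\bar{y}_i - \bar{y}_j) \pm t_{.025}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

The t-score has $df = N - g$ = total sample size - # groups.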
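The interval in this summary can be sketched in a few lines of Python. This is a minimal illustration, not output from any statistics package: the function name `pairwise_ci` and all numbers passed to it are hypothetical, and the t-score is supplied by the caller (from software or a table) rather than computed.

```python
import math

def pairwise_ci(mean_i, mean_j, n_i, n_j, s, t_score):
    """Interval (mean_i - mean_j) +/- t_score * s * sqrt(1/n_i + 1/n_j).

    s is the pooled standard deviation estimate from the ANOVA,
    and t_score is the t-score with df = N - g, looked up separately.
    """
    diff = mean_i - mean_j
    margin = t_score * s * math.sqrt(1.0 / n_i + 1.0 / n_j)
    return diff - margin, diff + margin

# Hypothetical summary statistics, chosen only for illustration.
lower, upper = pairwise_ci(mean_i=10.0, mean_j=7.0, n_i=400, n_j=400,
                           s=15.0, t_score=2.0)
print(round(lower, 2), round(upper, 2))
```

Passing the t-score in as an argument keeps the sketch free of third-party dependencies; in practice the value would come from a t table or statistical software.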


Confidence Intervals Comparing Pairs of Means

When this confidence interval is used to compare a pair of means in follow-up analyses after the ANOVA F test, some software (such as MINITAB) refers to the method as the Fisher method.

When the confidence interval does not contain 0, we can infer that the population means are different. The interval shows just how different the means may be.


Example: Number of Good Friends and Happiness

A recent GSS study asked: “About how many good friends do you have?”

The study also asked each respondent to indicate whether they were ‘very happy,’ ‘pretty happy,’ or ‘not too happy.’


Let the response variable y = number of good friends

Let the categorical explanatory variable x = happiness level



Table 14.3 Summary of ANOVA for Comparing Mean Number of Good Friends for Three Happiness Categories. The analysis is based on GSS data.



Construct a 95% CI to compare the population mean number of good friends for the three pairs of happiness categories—very happy with pretty happy, very happy with not too happy, and pretty happy with not too happy.

95% CI formula: $(\bar{y}_i - \bar{y}_j) \pm t_{.025}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$



First, use the output to find s, the square root of the error mean square in the ANOVA table.

With df = 828, use software or a table to find the t-score $t_{.025} = 1.963$.
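The estimate s is the square root of the within-groups (error) mean square, with df equal to the total sample size minus the number of groups. A minimal sketch of that calculation from raw data follows; the function name and the tiny data set are hypothetical and are not the GSS data.

```python
import math
from statistics import mean

def pooled_standard_deviation(groups):
    """Return (s, df), where s = sqrt(within-groups SS / (N - g))."""
    within_ss = 0.0
    for values in groups:
        group_mean = mean(values)
        # Squared deviations of each observation from its own group mean.
        within_ss += sum((y - group_mean) ** 2 for y in values)
    total_n = sum(len(values) for values in groups)
    df = total_n - len(groups)  # N - g
    return math.sqrt(within_ss / df), df

# Hypothetical observations for three groups (not the GSS data).
groups = [[1, 2, 3], [2, 4, 6], [5, 5, 8]]
s, df = pooled_standard_deviation(groups)
print(round(s, 3), df)
```

ANOVA output from software reports the error mean square directly, so in practice s is usually read from the table rather than recomputed.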



To compare the very happy and pretty happy categories, form the confidence interval for $\mu_1 - \mu_2$.

Since the resulting CI contains only positive numbers, this suggests that, on average, people who are very happy have more good friends than people who are pretty happy.


The Effects of Violating Assumptions

The t confidence intervals have the same assumptions as the ANOVA F test:

1. normal population distributions

2. with identical standard deviations

3. data obtained from randomization

When the sample sizes are large and the ratio of the largest standard deviation to the smallest is less than 2, these procedures are robust to violations of these assumptions.

If the ratio of the largest standard deviation to the smallest exceeds 2, use the confidence interval formulas that use separate standard deviations for the groups.
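The rule of thumb about the ratio of standard deviations can be checked mechanically. The sketch below is illustrative only: the helper names and data are hypothetical, and the samples are far too small for the "large sample sizes" part of the rule, which the code does not check.

```python
from statistics import stdev

def sd_ratio(groups):
    """Ratio of the largest sample standard deviation to the smallest."""
    sds = [stdev(values) for values in groups]
    return max(sds) / min(sds)

def pooled_interval_reasonable(groups, limit=2.0):
    """True when the largest-to-smallest SD ratio is below the limit."""
    return sd_ratio(groups) < limit

# Hypothetical data: similar spreads versus clearly unequal spreads.
similar = [[1, 2, 3], [2, 4, 5], [5, 5, 8]]
unequal = [[1, 2, 3], [2, 4, 6], [0, 3, 6]]

print(pooled_interval_reasonable(similar))  # pooled-s interval is plausible
print(pooled_interval_reasonable(unequal))  # prefer separate standard deviations
```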


Controlling Overall Confidence with Many Confidence Intervals

The confidence interval method just discussed is mainly used when g is small or when only a few comparisons are of main interest.

The confidence level of 0.95 applies to any particular confidence interval that we construct.


How can we construct the intervals so that the 95% confidence extends to the entire set of intervals rather than to each single interval?

Methods that control the probability that all confidence intervals will contain the true differences in means are called multiple comparison methods.

For these methods, all intervals are designed to contain the true parameters simultaneously with an overall fixed probability.
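To see why separate 95% intervals do not give 95% confidence for the whole set, it helps to count the comparisons. The sketch below uses the Bonferroni inequality, which is a simple general bound and not the Tukey method: with k intervals each at level 0.95, the chance that all of them cover is at least 1 - k(0.05). The function names are hypothetical.

```python
def number_of_pairs(g):
    """Number of pairwise comparisons among g groups: g(g - 1)/2."""
    return g * (g - 1) // 2

def bonferroni_overall_lower_bound(g, level=0.95):
    """Lower bound on the chance that every pairwise interval covers."""
    miss = (1.0 - level) * number_of_pairs(g)
    # The bound is uninformative once it drops below zero.
    return max(0.0, 1.0 - miss)

for g in (3, 5, 10):
    print(g, number_of_pairs(g), round(bonferroni_overall_lower_bound(g), 2))
```

The guaranteed overall level falls quickly as g grows, which is the motivation for methods built to control it directly.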



The method that we will use is called the Tukey method.

It is designed to give overall confidence level very close to the desired value (such as 0.95).

This method is available in most software packages.



Example: Number of Good Friends

Table 14.4 Multiple Comparisons of Mean Good Friends for Three Happiness Categories. An asterisk * indicates a significant difference, with the confidence interval not containing 0.


ANOVA and Regression

ANOVA can be presented as a special case of multiple regression by using indicator variables to represent the factors. For example, with 3 groups we need 2 indicator variables to indicate group membership:

The first indicator variable is

$x_1 = 1$ for observations from the first group,

$x_1 = 0$ otherwise.

The second indicator variable is

$x_2 = 1$ for observations from the second group,

$x_2 = 0$ otherwise.

The indicator variables identify the group to which an observation belongs as follows:

Group 1: $x_1 = 1$ and $x_2 = 0$

Group 2: $x_1 = 0$ and $x_2 = 1$

Group 3: $x_1 = 0$ and $x_2 = 0$
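The coding of group membership into two indicator variables can be written as a small function. This is a sketch with a hypothetical function name; group 3 serves as the baseline category, with both indicators equal to 0.

```python
def indicator_variables(group):
    """Return (x1, x2) for group 1, 2, or 3; group 3 is the baseline."""
    if group not in (1, 2, 3):
        raise ValueError("group must be 1, 2, or 3")
    x1 = 1 if group == 1 else 0
    x2 = 1 if group == 2 else 0
    return x1, x2

for group in (1, 2, 3):
    print(group, indicator_variables(group))
```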


The multiple regression equation for the mean of y is $\mu_y = \alpha + \beta_1 x_1 + \beta_2 x_2$.

Table 14.5 Interpretation of Coefficients of Indicator Variables in Regression Model. The indicator variables represent a categorical predictor with three categories specifying three groups.



Using Regression for the ANOVA Comparison of Means

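For three groups, the null hypothesis for the ANOVA F test is $H_0: \mu_1 = \mu_2 = \mu_3$.

If $H_0$ is true, then $\mu_1 - \mu_3 = 0$ and $\mu_2 - \mu_3 = 0$.

In the multiple regression model

$\mu_y = \alpha + \beta_1 x_1 + \beta_2 x_2$

with $\beta_1 = \mu_1 - \mu_3$ and $\beta_2 = \mu_2 - \mu_3$.

Thus, the ANOVA hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ is equivalent to $H_0: \beta_1 = \beta_2 = 0$ in the regression model.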
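The link between the regression coefficients and the group means can be checked numerically. The sketch below fits the indicator-variable regression by least squares using only the standard library. The names `alpha`, `beta1`, `beta2` for the intercept and the two indicator coefficients, the helper functions, and the data are all assumptions made for illustration. With group 3 as the baseline, the fitted intercept should equal the third sample mean and each indicator coefficient should equal that group's sample mean minus the third.

```python
from statistics import mean

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_indicator_regression(groups):
    """Least squares fit of y on (1, x1, x2) for exactly three groups."""
    rows, ys = [], []
    for index, values in enumerate(groups):
        x1 = 1.0 if index == 0 else 0.0
        x2 = 1.0 if index == 1 else 0.0
        for y in values:
            rows.append([1.0, x1, x2])
            ys.append(float(y))
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical observations for three groups.
groups = [[1, 2, 3], [2, 4, 6], [5, 5, 8]]
alpha, beta1, beta2 = fit_indicator_regression(groups)
m1, m2, m3 = (mean(g) for g in groups)
print(alpha, beta1, beta2)   # fitted coefficients
print(m3, m1 - m3, m2 - m3)  # baseline mean and differences from it
```

When all three sample means are equal, both fitted indicator coefficients are 0, which mirrors the equivalence of the two null hypotheses.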
