
© 2010 Pearson Prentice Hall. All rights reserved

The Chi-Square Goodness-of-Fit Test


Characteristics of the Chi-Square Distribution

1. It is not symmetric.

2. The shape of the chi-square distribution depends on the degrees of freedom, just like Student’s t-distribution.

3. As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric.

4. The values of χ² are nonnegative, i.e., the values of χ² are greater than or equal to 0.
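Characteristics 1 and 3 can be checked numerically. The sketch below relies on the standard result that a chi-square distribution with df degrees of freedom has skewness sqrt(8/df); that formula and the function name `chi_square_skewness` are not from the slides and are used here only for illustration.

```python
import math

def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with `df` degrees of freedom.

    Uses the closed form sqrt(8 / df); a value of 0 would mean perfect symmetry.
    """
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    return math.sqrt(8 / df)

# The skewness is always positive (right-skewed) and shrinks as df grows.
for df in (1, 3, 10, 30, 100):
    print(f"df = {df:>3}: skewness = {chi_square_skewness(df):.3f}")
```

The printed values decrease toward 0 as df increases, which is the "more nearly symmetric" behavior described in item 3.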

A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a specific distribution.


Expected Counts

Suppose that there are n independent trials of an experiment with k ≥ 3 mutually exclusive possible outcomes. Let p1 represent the probability of observing the first outcome and E1 represent the expected count of the first outcome; p2 represent the probability of observing the second outcome and E2 represent the expected count of the second outcome; and so on. The expected counts for each possible outcome are given by

$$E_i = \mu_i = n p_i \quad \text{for } i = 1, 2, \ldots, k$$

Parallel Example 1: Finding Expected Counts

A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents had been responsible for their grandchildren for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. If the sociologist randomly selects 1,000 care-giving grandparents, compute the expected number within each category assuming the distribution has not changed from 2000.

Solution

Step 1: The probabilities are the relative frequencies from the 2000 distribution:

p<1yr = 0.228    p1-2yr = 0.239

p3-4yr = 0.176    p≥5yr = 0.357

Step 2: There are n = 1,000 trials of the experiment, so the expected counts are:

E<1yr = np<1yr = 1000(0.228) = 228

E1-2yr = np1-2yr = 1000(0.239) = 239

E3-4yr = np3-4yr = 1000(0.176) = 176

E≥5yr = np≥5yr = 1000(0.357) = 357
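The same arithmetic can be sketched in a few lines of Python. The proportions and sample size come from the example; the dictionary keys and the helper name `expected_counts` are illustrative choices, not part of the original material.

```python
# Proportions from the 2000 distribution, as stated in the example.
proportions_2000 = {"<1": 0.228, "1-2": 0.239, "3-4": 0.176, ">=5": 0.357}
n = 1000  # number of grandparents sampled

def expected_counts(n_trials, probabilities):
    """Return E_i = n * p_i for each category."""
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return {category: n_trials * p for category, p in probabilities.items()}

expected = expected_counts(n, proportions_2000)
for category, count in expected.items():
    print(f"{category:>4} years: {count:.0f}")
```

Checking that the probabilities sum to 1 is a small safeguard: the categories must be mutually exclusive and cover every outcome for the expected counts to add up to n.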


Test Statistic for Goodness-of-Fit Tests

Let Oi represent the observed counts of category i, Ei represent the expected counts of category i, k represent the number of categories, and n represent the number of independent trials of an experiment. Then the formula

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with k - 1 degrees of freedom, provided that

1. all expected frequencies are greater than or equal to 1 (all Ei ≥ 1) and

2. no more than 20% of the expected frequencies are less than 5.
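As a minimal sketch, the test statistic can be computed directly from the definition. The function name `chi_square_statistic` and the die-rolling counts are hypothetical and serve only to illustrate the formula.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: a six-sided die rolled 60 times, fair under H0.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # k - 1
print(f"chi-square = {statistic:.3f} with {degrees_of_freedom} degrees of freedom")
```

Each term is a squared deviation scaled by its expected count, so a category with a small expected count contributes more for the same absolute deviation.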

The Goodness-of-Fit Test

To test the hypotheses regarding a distribution, we use the steps that follow.

Step 1: Determine the null and alternative hypotheses.

H0: The random variable follows a certain distribution

H1: The random variable does not follow a certain distribution

The Goodness-of-Fit Test (hypotheses cont.)

Typically the hypotheses can be symbolically represented as:

$$H_0: \hat{p}_1 = p_1,\ \hat{p}_2 = p_2,\ \ldots,\ \hat{p}_k = p_k$$

vs. the alternative:

$$H_a: \text{At least one } \hat{p}_i \text{ does not conform}$$

Step 2: Decide on a level of significance, α, depending on the seriousness of making a Type I error.

Step 3:

a) Calculate the expected counts for each of the k categories. The expected counts are Ei = npi for i = 1, 2, …, k, where n is the number of trials and pi is the probability of the ith category, assuming that the null hypothesis is true.

b) Verify that the requirements for the goodness-of-fit test are satisfied.

1. All expected counts are greater than or equal to 1 (all Ei ≥ 1).

2. No more than 20% of the expected counts are less than 5.

c) Compute the test statistic:

$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Note: Oi is the observed count for the ith category.
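The two requirements in Step 3(b) are easy to check mechanically. The following is a small sketch; the function name `requirements_satisfied` and the second and third sets of expected counts are hypothetical.

```python
def requirements_satisfied(expected):
    """Check the two conditions on expected counts from Step 3(b)."""
    # Requirement 1: every expected count is at least 1.
    all_at_least_one = all(e >= 1 for e in expected)
    # Requirement 2: no more than 20% of expected counts are below 5.
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20

print(requirements_satisfied([228, 239, 176, 357]))  # True
print(requirements_satisfied([40, 30, 20, 6, 4]))    # True (exactly 20% below 5)
print(requirements_satisfied([50, 40, 4, 3, 3]))     # False (60% below 5)
```

Note that the check uses the expected counts, not the observed counts; a small observed count does not by itself violate the requirements.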


CAUTION!

If the requirements in Step 3(b) are not satisfied, one option is to combine two or more of the low-frequency categories into a single category.
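One way to carry out the combining step is sketched below. It pools every category whose expected count falls below a threshold into a single category; the function name `pool_low_categories`, the threshold of 5, and the data are all assumptions made for illustration, and in practice the pooled categories should also make sense together in the context of the problem.

```python
def pool_low_categories(labels, observed, expected, minimum=5):
    """Merge every category whose expected count is below `minimum` into one.

    Returns new (labels, observed, expected) lists. The input is returned
    unchanged when fewer than two categories fall below the threshold.
    """
    low = [i for i, e in enumerate(expected) if e < minimum]
    if len(low) < 2:
        return list(labels), list(observed), list(expected)
    keep = [i for i in range(len(expected)) if i not in low]
    new_labels = [labels[i] for i in keep] + ["+".join(labels[i] for i in low)]
    new_observed = [observed[i] for i in keep] + [sum(observed[i] for i in low)]
    new_expected = [expected[i] for i in keep] + [sum(expected[i] for i in low)]
    return new_labels, new_observed, new_expected

# Hypothetical data with three sparse categories (C, D, E).
labels = ["A", "B", "C", "D", "E"]
observed = [48, 43, 5, 2, 2]
expected = [50.0, 40.0, 4.0, 3.0, 3.0]

pooled_labels, pooled_observed, pooled_expected = pool_low_categories(
    labels, observed, expected
)
print(pooled_labels, pooled_observed, pooled_expected)
```

After pooling, k is the number of remaining categories, so the degrees of freedom k - 1 drop accordingly (here from 4 to 2).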

P-Value Approach

Step 4: Use Table VII to obtain an approximate P-value by determining the area under the chi-square distribution with k - 1 degrees of freedom to the right of the test statistic.
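When a table is not at hand, the right-tail area can also be computed. The sketch below uses only the Python standard library and a known recurrence for the regularized upper incomplete gamma function, valid for whole-number degrees of freedom; the function name `chi_square_p_value` is an illustrative choice, and this is not the method the slides prescribe.

```python
import math

def chi_square_p_value(statistic, df):
    """Area to the right of `statistic` under a chi-square curve with `df` degrees of freedom.

    With z = statistic / 2, the tail area is Q(df / 2, z), built up from
    Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z) using the recurrence
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if statistic <= 0:
        return 1.0
    z = statistic / 2
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(z))
    else:
        a, tail = 1.0, math.exp(-z)
    while a < df / 2:
        # Add z**a * exp(-z) / Gamma(a + 1), computed on the log scale.
        tail += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1
    return tail

# 7.815 is the usual table critical value for 3 degrees of freedom at the
# 0.05 level, so the area to its right should be close to 0.05.
print(round(chi_square_p_value(7.815, 3), 4))
```

A computed P-value will generally be more precise than one read from a table, which only brackets the area between tabulated values.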

Step 5: If the P-value < α, reject the null hypothesis. If the P-value ≥ α, fail to reject the null hypothesis.
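The decision rule is a single comparison, shown here as a short sketch. The function name `decide` and the sample P-values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the P-value decision rule from Step 5."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.01))  # reject H0
print(decide(0.20))  # fail to reject H0
print(decide(0.05))  # fail to reject H0 (a P-value equal to alpha does not reject)
```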


Step 6: State the conclusion in the context of the problem.

Note: in many cases, when the null hypothesis is rejected at the conclusion of the test, we will have to attempt to explain what the non-conformity was.

Parallel Example 2: Conducting a Goodness-of-Fit Test

A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000.

According to the United States Census Bureau, in 2000, 22.8% of grandparents had been responsible for their grandchildren for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. The sociologist randomly selects 1,000 care-giving grandparents and obtains the following data.

Number of Years    Observed Counts
<1                 252
1-2                255
3-4                162
≥5                 331

Test the claim that the distribution is different today than it was in 2000 at the α = 0.05 level of significance.

Solution

Step 1: We want to know if the distribution today is different than it was in 2000. The hypotheses are then:

H0: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is the same today as it was in 2000

H1: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000

Or alternatively:

$$H_0: \hat{p}_1 = 0.228,\ \hat{p}_2 = 0.239,\ \hat{p}_3 = 0.176,\ \hat{p}_4 = 0.357$$

$$H_a: \text{At least one } \hat{p}_i \text{ does not conform}$$

Step 2: The level of significance is α = 0.05.

Step 3:

(a) The expected counts were computed in Example 1.

Number of Years    Observed Counts    Expected Counts
<1                 252                228
1-2                255                239
3-4                162                176
≥5                 331                357

(b) Since all expected counts are greater than or equal to 5, the requirements for the goodness-of-fit test are satisfied.

(c) The test statistic is

$$\chi^2_0 = \frac{(252-228)^2}{228} + \frac{(255-239)^2}{239} + \frac{(162-176)^2}{176} + \frac{(331-357)^2}{357} \approx 6.605$$
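The arithmetic for the test statistic can be reproduced term by term. The sketch below uses the observed and expected counts from the example; the variable names and the "more/fewer than expected" labels are illustrative additions.

```python
categories = ["<1", "1-2", "3-4", ">=5"]
observed = [252, 255, 162, 331]
expected = [228, 239, 176, 357]

# One (O - E)^2 / E term per category.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

for category, o, e, c in zip(categories, observed, expected, contributions):
    direction = "more" if o > e else "fewer"
    print(f"{category:>4} years: contribution = {c:.3f} ({direction} than expected)")

statistic = sum(contributions)
print(f"test statistic = {statistic:.3f}")
```

Listing the individual terms shows which categories drive the statistic; here the largest single term comes from the less-than-1-year category.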

Solution: P-Value Approach

Step 4: There are k = 4 categories. The P-value is the area under the chi-square distribution with 4 - 1 = 3 degrees of freedom to the right of χ₀² = 6.605. Thus, P-value ≈ 0.09.

Step 5: Since the P-value ≈ 0.09 is greater than the level of significance α = 0.05, we fail to reject the null hypothesis.

Step 6: There is insufficient evidence to conclude that the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 at the α = 0.05 level of significance.
