© 2010 pearson prentice hall. all rights reserved the chi-square goodness-of-fit test
Post on 19-Dec-2015
218 views
TRANSCRIPT
© 2010 Pearson Prentice Hall. All rights reserved
The Chi-Square Goodness-of-Fit Test
12-2
Characteristics of the Chi-Square Distribution
1. It is not symmetric.
12-3
1. It is not symmetric.2. The shape of the chi-square distribution
depends on the degrees of freedom, just like Student’s t-distribution.
Characteristics of the Chi-Square Distribution
12-4
1. It is not symmetric.2. The shape of the chi-square distribution
depends on the degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric.
Characteristics of the Chi-Square Distribution
12-5
1. It is not symmetric.2. The shape of the chi-square distribution
depends on the degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric.
4. The values of 2 are nonnegative, i.e., the values of 2 are greater than or equal to 0.
Characteristics of the Chi-Square Distribution
12-6
12-7
A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a specific distribution.
12-8
Expected Counts
Suppose that there are n independent trials of anexperiment with k ≥ 3 mutually exclusive possibleoutcomes. Let p1 represent the probability of observing
the first outcome and E1 represent the expected count of
the first outcome; p2 represent the probability of
observing the second outcome and E2 represent the
expected count of the second outcome; and so on. Theexpected counts for each possible outcome are given by
Ei = i = npi for i = 1, 2, …, k
12-9
A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents have been responsible for their grandchildren less than 1 year; 23.9% of grandparents have been responsible for their grandchildren for 1 or 2 years; 17.6% of grandparents have been responsible for their grandchildren 3 or 4 years; and 35.7% of grandparents have been responsible for their grandchildren for 5 or more years. If the sociologist randomly selects 1,000 care-giving grandparents, compute the expected number within each category assuming the distribution has not changed from 2000.
Parallel Example 1: Finding Expected Counts
12-10
Step 1: The probabilities are the relative frequencies from the 2000 distribution:
p<1yr = 0.228 p1-2yr = 0.239
p3-4yr = 0.176 p ≥5yr = 0.357
Solution
12-11
Step 2: There are n=1,000 trials of the experiment so the expected counts are:
E<1yr = np<1yr = 1000(0.228) = 228
E1-2yr = np1-2yr = 1000(0.239) = 239
E3-4yr = np3-4yr =1000(0.176) = 176
E≥5yr = np ≥5yr = 1000(0.357) = 357
Solution
12-12
Test Statistic for Goodness-of-Fit Tests
Let Oi represent the observed counts of category i, Ei
represent the expected counts of category i, k representthe number of categories, and n represent the number ofindependent trials of an experiment. Then the formula
approximately follows the chi-square distribution withk-1 degrees of freedom, provided that1. all expected frequencies are greater than or equal to 1
(all Ei ≥ 1) and2. no more than 20% of the expected frequencies are less
than 5.
2 Oi E i 2
E i
i 1, 2,, k
12-13
Step 1: Determine the null and alternative hypotheses. H0: The random variable follows a
certain distribution H1: The random variable does not
follow a certain distribution
The Goodness-of-Fit Test
To test the hypotheses regarding a distribution, we use the steps that follow.
12-14
Typically the hypotheses can be symbolically represented as:
The Goodness-of-Fit Test(hypotheses cont.)
kk ppppppH ˆ...ˆ,ˆ: 22110
vs. the alternative:
conformnot does ˆ oneleast At : ia pH
12-15
Step 2: Decide on a level of significance, , depending on the seriousness of making a Type I error.
12-16
Step 3: a) Calculate the expected counts for each
of the k categories. The expected counts are Ei=npi for i = 1, 2, … , k where n is the number of trials and pi is the probability of the ith category, assuming that the null hypothesis is true.
12-17
Step 3: b) Verify that the requirements for the goodness-
of-fit test are satisfied.1. All expected counts are greater than or
equal to 1 (all Ei ≥ 1).2. No more than 20% of the expected counts
are less than 5.c) Compute the test statistic:
Note: Oi is the observed count for the ith category.
02
Oi E i 2
E i
12-18
CAUTION!
If the requirements in Step 3(b) are not satisfied, one option is to combine two or more of the low-frequency categories into a single category.
12-19
Step 4: Use Table VII to obtain an approximate P-value by determining the area under the chi-square distribution with k-1 degrees of freedom to the right of the test statistic.
P-Value Approach
12-20
Step 5: If the P-value < , reject the null hypothesis.If the P-value ≥ α, fail to reject the null hypothesis.
P-Value Approach
12-21
Step 6: State the conclusion in the context of the problem.
Note: in many cases, when the null hypothesis is rejected at the conclusion of the test, we will have to attempt to explain what the non-conformity was.
12-22
A sociologist wishes to determine whether the distribution forthe number of years care-giving grandparents are responsiblefor their grandchildren is different today than it was in 2000.
According to the United States Census Bureau, in 2000, 22.8%of grandparents have been responsible for their grandchildrenless than 1 year; 23.9% of grandparents have been responsiblefor their grandchildren for 1 or 2 years; 17.6% of grandparentshave been responsible for their grandchildren 3 or 4 years; and35.7% of grandparents have been responsible for theirgrandchildren for 5 or more years. The sociologist randomlyselects 1,000 care-giving grandparents and obtains thefollowing data.
Parallel Example 2: Conducting a Goodness-of -Fit Test
12-23
Test the claim that the distribution is different today than it was in 2000 at the = 0.05 level of significance.
12-24
Step 1: We want to know if the distribution today is different than it was in 2000. The hypotheses are then:
H0: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is the same today as it was in 2000
H1: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000
Solution
12-25
Or alternatively:
357.ˆ,176.ˆ
,239.ˆ,228.ˆ:
43
210
pp
ppH
conformnot does ˆ oneleast At : ia pH
12-26
Step 2: The level of significance is =0.05.
Step 3:
(a) The expected counts were computed in Example 1.
Solution
Number of Years
Observed Counts
Expected Counts
<1 252 228
1-2 255 239
3-4 162 176
≥5 331 357
12-27
Step 3:
(b) Since all expected counts are greater than or equal to 5, the requirements for the goodness-of-fit test are satisfied.
(c) The test statistic is
Solution
02
252 228 2
228
255 239 2
239
162 176 2
176
331 357 2
3576.605
12-28
Step 4: There are k = 4 categories. The P-value is the area under the chi-square distribution with 4-1=3 degrees of freedom to the right of . Thus, P-value ≈ 0.09.
Solution: P-Value Approach
02 6.605
12-29
Step 5: Since the P-value ≈ 0.09 is greater than the level of significance = 0.05, we fail to reject the null hypothesis.
Solution: P-Value Approach
12-30
Step 6: There is insufficient evidence to conclude that the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 at the = 0.05 level of significance.
Solution