![Page 1: Goodness of Fit Tests Categorical Data and The Analysis ofbfhskasten2.weebly.com/.../copy_of_chapter_12.ppt.pdf · Chapter 12 The Analysis of Categorical Data and Goodness of Fit](https://reader033.vdocument.in/reader033/viewer/2022050113/5f4a07df91bb81620f671af9/html5/thumbnails/1.jpg)
Chapter 12: The Analysis of Categorical Data and Goodness-of-Fit Tests
Suppose we wanted to determine whether the proportions of the different colors in a large bag of M&M candies match the proportions that the company claims are in its candies.
We could record the color of each candy in the bag.
There are six colors, so k = 6.
M&M Candies Continued . . .
We could count how many candies of each color are in the bag.

| Red | Blue | Green | Yellow | Orange | Brown |
| --- | --- | --- | --- | --- | --- |
| 23 | 28 | 21 | 19 | 22 | 25 |
Goodness-of-Fit Test Procedure
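Null Hypothesis:
H0: p1 = hypothesized proportion for Category 1
. . .
pk = hypothesized proportion for Category k
Alternative Hypothesis:
Ha: H0 is not true
Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$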
Goodness-of-Fit Test Procedure Continued . . .
P-values: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = k − 1. Therefore, the P-value associated with the computed test statistic value is the area to the right of X² under the df = k − 1 chi-square curve.
Assumptions:
1) Observed cell counts are based on a random sample.
2) The sample size is large enough as long as every expected cell count is at least 5.
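The "area to the right of X²" can be computed directly. The sketch below is one way to do it with only the Python standard library; the function name `chi2_upper_tail` and the example value 11.07 are assumptions made for illustration, and the function handles integer df only (it uses the standard recurrence for the regularized incomplete gamma function).

```python
import math

def chi2_upper_tail(x2, df):
    """Area to the right of x2 under the chi-square curve with integer df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    # Closed-form starting points: df = 2 (exponential tail) or df = 1 (erfc).
    if df % 2 == 0:
        area, k = math.exp(-half), 2
    else:
        area, k = math.erfc(math.sqrt(half)), 1
    # Step up two degrees of freedom at a time until the requested df is reached.
    while k < df:
        area += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return area

# Hypothetical example: k = 6 categories gives df = k - 1 = 5.
p_value = chi2_upper_tail(11.07, 5)
print(round(p_value, 3))
```

A statistics library would normally be used for this; the hand-rolled version is shown only so that the calculation behind the P-value is visible.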
Facts About χ² Distributions
• Different df have different curves
• χ² curves are skewed right
• As df increases, the χ² curve shifts toward the right and becomes more like a normal curve
A common urban legend is that more babies than expected are born during certain phases of the lunar cycle, especially near the full moon. The table below shows the number of days in each of the eight lunar phases, with the number of births in each phase, for 24 lunar cycles.

| Lunar Phase | Number of Days | Number of Births |
| --- | --- | --- |
| New Moon | 24 | 7680 |
| Waxing Crescent | 152 | 48,442 |
| First Quarter | 24 | 7579 |
| Waxing Gibbous | 149 | 47,814 |
| Full Moon | 24 | 7711 |
| Waning Gibbous | 150 | 47,595 |
| Last Quarter | 24 | 7733 |
| Waning Crescent | 152 | 48,230 |
Lunar Phases Continued . . .
Let:
p1 = proportion of births that occur during the new moon
p2 = proportion of births that occur during the waxing crescent moon
p3 = proportion of births that occur during the first quarter moon
p4 = proportion of births that occur during the waxing gibbous moon
p5 = proportion of births that occur during the full moon
p6 = proportion of births that occur during the waning gibbous moon
p7 = proportion of births that occur during the last quarter moon
p8 = proportion of births that occur during the waning crescent moon
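Lunar Phases Continued . . .
If births are unrelated to lunar phase, the proportion of births in each phase should equal that phase's share of the 699 days (for example, 24/699 ≈ .0343 for the new moon).
H0: p1 = .0343, p2 = .2175, p3 = .0343, p4 = .2132, p5 = .0343, p6 = .2146, p7 = .0343, p8 = .2175
Ha: H0 is not true
Each expected count is the total number of births (222,784) times the hypothesized proportion.

| Lunar Phase | Observed Number of Births | Expected Number of Births |
| --- | --- | --- |
| New Moon | 7680 | 7641.49 |
| Waxing Crescent | 48,442 | 48,455.52 |
| First Quarter | 7579 | 7641.49 |
| Waxing Gibbous | 47,814 | 47,497.55 |
| Full Moon | 7711 | 7641.49 |
| Waning Gibbous | 47,595 | 47,809.45 |
| Last Quarter | 7733 | 7641.49 |
| Waning Crescent | 48,230 | 48,455.52 |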
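The hypothesized proportions and expected counts for the lunar data can be reproduced with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and rounding the proportions to four decimal places is an assumption chosen to match the values listed in H0.

```python
# Days and births per lunar phase, in the order listed in the lunar phase table.
days = [24, 152, 24, 149, 24, 150, 24, 152]
births = [7680, 48442, 7579, 47814, 7711, 47595, 7733, 48230]

total_days = sum(days)      # 699
total_births = sum(births)  # 222784

# Hypothesized proportion for a phase = its share of the days,
# rounded to four decimal places (assumption: matches the H0 values).
proportions = [round(d / total_days, 4) for d in days]

# Expected count = total births * hypothesized proportion.
expected = [round(total_births * p, 2) for p in proportions]

for p, e in zip(proportions, expected):
    print(p, e)
```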
Lunar Phases Continued . . .
Test Statistic: X² ≈ 6.56 (the sum of (observed − expected)²/expected over the eight phases)
df = 7, P-value > .10, α = .05
Since the P-value > α, we fail to reject H0. There is not sufficient evidence to conclude that lunar phases and number of births are related.
Because we failed to reject H0, the only error we could have made is a Type II error.
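The whole goodness-of-fit calculation for the lunar data can be sketched as follows. The helper `chi2_upper_tail` is a hypothetical name for a standard-library stand-in for a chi-square table (integer df only), and the expected counts use the rounded proportions from H0.

```python
import math

def chi2_upper_tail(x2, df):
    """Area to the right of x2 under the chi-square curve (integer df >= 1)."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if df % 2 == 0:
        area, k = math.exp(-half), 2
    else:
        area, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        area += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return area

# Observed births per phase and the hypothesized proportions from H0.
observed = [7680, 48442, 7579, 47814, 7711, 47595, 7733, 48230]
proportions = [0.0343, 0.2175, 0.0343, 0.2132, 0.0343, 0.2146, 0.0343, 0.2175]

n = sum(observed)
expected = [n * p for p in proportions]

# The large-sample condition: every expected count is at least 5.
assert all(e >= 5 for e in expected)

# X^2 = sum of (observed - expected)^2 / expected over all categories.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_upper_tail(x2, df)

alpha = 0.05
print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```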
A study was conducted to determine whether collegiate soccer players had an increased risk of concussions over other athletes or students. The two-way frequency table below displays the number of previous concussions for students in independently selected random samples of 91 soccer players, 96 non-soccer athletes, and 53 non-athletes.

Number of Concussions

| | 0 | 1 | 2 | 3 or more | Total |
| --- | --- | --- | --- | --- | --- |
| Soccer Players | 45 | 25 | 11 | 10 | 91 |
| Non-Soccer Players | 68 | 15 | 8 | 5 | 96 |
| Non-Athletes | 45 | 5 | 3 | 0 | 53 |
| Total | 158 | 45 | 22 | 15 | 240 |
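X² Test for Homogeneity
Null Hypothesis:
H0: the true category proportions are the same for all the populations or treatments
Alternative Hypothesis:
Ha: the true category proportions are not all the same for all the populations or treatments
Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$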
X² Test for Homogeneity Continued . . .
Expected Counts: (assuming H0 is true)
$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$
P-value: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = (number of rows − 1)(number of columns − 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.
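For a two-way table, the usual expected count for a cell under H0 is (row total × column total) / grand total. The sketch below applies that rule to the soccer concussion counts from the earlier slide; the function name `expected_counts` is an assumption made for this example.

```python
def expected_counts(table):
    """Expected cell counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: soccer players, non-soccer athletes, non-athletes.
# Columns: 0, 1, 2, 3 or more concussions.
observed = [
    [45, 25, 11, 10],
    [68, 15, 8, 5],
    [45, 5, 3, 0],
]

expected = expected_counts(observed)

# df = (number of rows - 1)(number of columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for row in expected:
    print([round(e, 1) for e in row])
print("df =", df)
```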
X² Test for Homogeneity Continued . . .
Assumptions:
1) Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups.
2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.
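Combining columns to satisfy the expected-count condition might look like the sketch below, which uses the soccer concussion counts from the earlier slide. The helper names `expected_counts` and `combine_columns` are hypothetical, and merging the last two columns is just one reasonable choice.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def combine_columns(table, i, j):
    """Return a new table with column j added into column i (requires i < j)."""
    return [row[:i] + [row[i] + row[j]] + row[i + 1:j] + row[j + 1:] for row in table]

def smallest_expected(table):
    return min(min(row) for row in expected_counts(table))

# Columns: 0, 1, 2, 3 or more concussions.
observed = [
    [45, 25, 11, 10],  # soccer players
    [68, 15, 8, 5],    # non-soccer athletes
    [45, 5, 3, 0],     # non-athletes
]

min_before = smallest_expected(observed)

# Merge "2" and "3 or more" into a single "2 or more" column.
combined = combine_columns(observed, 2, 3)
min_after = smallest_expected(combined)

print(round(min_before, 1), round(min_after, 1))
```

Combining categories reduces the degrees of freedom, so df must be recomputed from the combined table.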
Soccer Players Continued . . .
H0: Proportions in each response category (number of concussions) are the same for all three groups
Ha: Category proportions are not all the same for all three groups
df = (3 − 1)(4 − 1) = (2)(3) = 6
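Observed counts, with expected counts in parentheses:

Number of Concussions

| | 0 | 1 | 2 | 3 or more | Total |
| --- | --- | --- | --- | --- | --- |
| Soccer Players | 45 (59.9) | 25 (17.1) | 11 (8.3) | 10 (5.7) | 91 |
| Non-Soccer Players | 68 (63.2) | 15 (18.0) | 8 (8.8) | 5 (6.0) | 96 |
| Non-Athletes | 45 (34.9) | 5 (10.0) | 3 (4.9) | 0 (3.3) | 53 |
| Total | 158 | 45 | 22 | 15 | 240 |

Two expected counts (4.9 and 3.3) are less than 5, so the "2" and "3 or more" columns are combined:

Number of Concussions

| | 0 | 1 | 2 or more | Total |
| --- | --- | --- | --- | --- |
| Soccer Players | 45 (59.9) | 25 (17.1) | 21 (14.0) | 91 |
| Non-Soccer Players | 68 (63.2) | 15 (18.0) | 13 (14.8) | 96 |
| Non-Athletes | 45 (34.9) | 5 (10.0) | 3 (8.2) | 53 |
| Total | 158 | 45 | 37 | 240 |

Soccer Players Continued . . .
df = (3 − 1)(3 − 1) = 4
Test Statistic: X² ≈ 20.6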
Soccer Players Continued . . .
Since the P-value < α, we reject H0. There is strong evidence to suggest that the category proportions for the number of concussions are not the same for the 3 groups.
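The soccer example can be checked end to end with the sketch below, which works from the combined table (0, 1, 2 or more concussions). The helper names are hypothetical, `chi2_upper_tail` is a standard-library stand-in for a chi-square table (integer df only), and because the slides do not state α for this example the code reports the P-value without assuming one.

```python
import math

def chi2_upper_tail(x2, df):
    """Area to the right of x2 under the chi-square curve (integer df >= 1)."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if df % 2 == 0:
        area, k = math.exp(-half), 2
    else:
        area, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        area += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return area

def chi_square_two_way(table):
    """Return (X^2, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            x2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

# Columns: 0, 1, 2 or more concussions.
combined = [
    [45, 25, 21],  # soccer players
    [68, 15, 13],  # non-soccer athletes
    [45, 5, 3],    # non-athletes
]

x2, df = chi_square_two_way(combined)
p_value = chi2_upper_tail(x2, df)
print(f"X^2 = {x2:.1f}, df = {df}, P-value = {p_value:.5f}")
```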
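X² Test for Independence
Null Hypothesis:
H0: The two variables are independent
Alternative Hypothesis:
Ha: The two variables are not independent
Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$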
X² Test for Independence Continued . . .
Expected Counts: (assuming H0 is true)
$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$
P-value: When H0 is true and the assumptions for the X² test are satisfied, X² has approximately a chi-square distribution with df = (number of rows − 1)(number of columns − 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.
X² Test for Independence Continued . . .
Assumptions:
1) The observed counts are based on data from a random sample.
2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.
The paper “Contemporary College Students and Body Piercing” (Journal of Adolescent Health, 2004) described a survey of 450 undergraduate students at a state university in the southwestern region of the United States. Each student in the sample was classified according to class standing (freshman, sophomore, junior, senior) and body art category (body piercing only, tattoos only, both tattoos and body piercing, no body art). Is there evidence that there is an association between class standing and response to the body art question? Use α = .01.

| | Body Piercing Only | Tattoos Only | Both Body Piercing and Tattoos | No Body Art |
| --- | --- | --- | --- | --- |
| Freshman | 61 | 7 | 14 | 86 |
| Sophomore | 43 | 11 | 10 | 64 |
| Junior | 20 | 9 | 7 | 43 |
| Senior | 21 | 17 | 23 | 54 |
Body Art Continued . . .
H0: class standing and body art category are independent
Ha: class standing and body art category are not independent
df = (4 − 1)(4 − 1) = 9

Observed counts, with expected counts in parentheses:

| | Body Piercing Only | Tattoos Only | Both Body Piercing and Tattoos | No Body Art |
| --- | --- | --- | --- | --- |
| Freshman | 61 (49.7) | 7 (15.1) | 14 (18.5) | 86 (84.7) |
| Sophomore | 43 (37.9) | 11 (11.5) | 10 (14.1) | 64 (64.5) |
| Junior | 20 (23.4) | 9 (7.1) | 7 (8.7) | 43 (39.8) |
| Senior | 21 (34.0) | 17 (10.3) | 23 (12.7) | 54 (58.0) |
Body Art Continued . . .
Test Statistic: X² ≈ 29.5
P-value < .001, α = .01
Body Art Continued . . .
Since the P-value < α, we reject H0. There is sufficient evidence to suggest that class standing and the body art category are associated.
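The body art test of independence can be reproduced with the sketch below. It computes the expected counts, X², df, and the P-value from the observed table; the helper names are hypothetical and `chi2_upper_tail` is a standard-library stand-in for a chi-square table (integer df only).

```python
import math

def chi2_upper_tail(x2, df):
    """Area to the right of x2 under the chi-square curve (integer df >= 1)."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if df % 2 == 0:
        area, k = math.exp(-half), 2
    else:
        area, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        area += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return area

# Rows: freshman, sophomore, junior, senior.
# Columns: body piercing only, tattoos only, both, no body art.
observed = [
    [61, 7, 14, 86],
    [43, 11, 10, 64],
    [20, 9, 7, 43],
    [21, 17, 23, 54],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_upper_tail(x2, df)

alpha = 0.01
print(f"X^2 = {x2:.1f}, df = {df}, P-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```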