chapter 11 inference for tables: chi-square procedures 11.1 target goal:i can compute expected...
TRANSCRIPT
Chapter 11 Inference for Tables: Chi-Square Procedures
11.1
Target Goal: I can compute expected counts, conditional distributions, and contributions to the chi-square statistic.
h.w: pg. 621: 1, 3, 5, 9, 11
Test for Goodness of Fit
To analyze categorical data, we construct two-way tables and examine the counts or percents of the explanatory and response variables.
Count and record M&M colors per bag.
Expected count:
M&Ms Color Distribution (%), according to their website

                       Brown  Yellow  Red  Blue  Orange  Green
Plain                     13      14   13    24      20     16
Peanut                    12      15   12    23      23     15
Peanut Butter/Almond      10      20   10    20      20     20
We want to compare the observed counts to the expected counts.
The null hypothesis is that there is no difference between the observed and expected counts.
The alternative hypothesis is that there is a difference between the observed and expected counts.
Simulate the count of an M&M's bag or use your own M&M's bag.
Label: 1-13 Brown, 14-27 Yellow, 28-40 Red, 41-64 Blue, 65-84 Orange, 85-00 Green
MATH: PRB: randInt(0,99,50), sto in L1. Sort in ascending order and tally.
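The calculator simulation can also be sketched in Python. This is only an illustration of the labeling scheme on the slide: the function names `color_for` and `simulate_bag` and the `seed` argument are made up for this sketch, and 0 is assumed to stand in for the label 00.

```python
import random
from collections import Counter

# Two-digit labels from the slide (Plain M&Ms percentages).
LABELS = [
    (range(1, 14), "Brown"),    # 1-13
    (range(14, 28), "Yellow"),  # 14-27
    (range(28, 41), "Red"),     # 28-40
    (range(41, 65), "Blue"),    # 41-64
    (range(65, 85), "Orange"),  # 65-84
]

def color_for(number):
    """Map a random integer 0-99 to a color label."""
    for digits, color in LABELS:
        if number in digits:
            return color
    return "Green"  # 85-99, plus 0 standing in for 00 (assumption)

def simulate_bag(size=50, seed=None):
    """Draw `size` random integers in 0-99, sort them, and tally colors."""
    rng = random.Random(seed)
    draws = sorted(rng.randint(0, 99) for _ in range(size))
    return Counter(color_for(n) for n in draws)

tally = simulate_bag(size=50, seed=1)
print(tally)
```

The tally plays the role of the observed counts for one simulated bag of 50.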
Chi-square statistic
It measures how well the observed counts fit the expected counts, assuming that the null hypothesis is true.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
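As a rough sketch of the arithmetic, the statistic can be computed in Python. The expected counts use the Plain percentages from the table and a bag of 50; the observed counts are hypothetical, invented only to show the calculation, and `chi_square_statistic` is a name chosen for this sketch.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Brown, Yellow, Red, Blue, Orange, Green (Plain M&Ms, from the table)
percents = [13, 14, 13, 24, 20, 16]
bag_size = 50  # same size as the simulated bag

# Expected count = bag size x claimed percentage
expected = [bag_size * p / 100 for p in percents]

# Hypothetical observed counts, NOT real data; they sum to 50
observed = [8, 5, 7, 11, 12, 7]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

A statistic of 0 would mean the observed counts match the expected counts exactly; larger values mean a worse fit.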
Go to Blank student notes.
The distribution of the chi-square statistic is called the chi-square distribution, χ².
This distribution is a density curve. The total area under the curve is 1. The curve begins at zero on the horizontal axis and is skewed right. As the degrees of freedom increase, the shape of the curve becomes more symmetric.
Pg. 703
“Goodness of Fit Test.”
Using the M&M Minis® chi-square statistic, find the probability of obtaining a χ² value at least this extreme, assuming the null hypothesis is true.
Use your chi-square statistic and df = 6 - 1 = 5.
P-value = χ²cdf(lower bound, upper bound, df)
CONDITIONS for Individual Expected Counts:
The Goodness of Fit Test may be used when all expected counts are at least 1 and no more than 20% of the expected counts are less than 5.
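The two-part condition above can be checked mechanically. The sketch below is one possible way to write that check; `expected_counts_ok` is a hypothetical helper name, and the example counts are the expected counts for a bag of 50 Plain M&Ms.

```python
def expected_counts_ok(expected):
    """True when every expected count is at least 1 and
    no more than 20% of the expected counts are below 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_at_least_one and share_below_five <= 0.20

# Expected counts for a bag of 50 Plain M&Ms (percentages from the table)
plain_expected = [6.5, 7.0, 6.5, 12.0, 10.0, 8.0]
print(expected_counts_ok(plain_expected))
```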
Following the Goodness of Fit Test, check which component made the greatest contribution to the chi-square statistic to see where the biggest changes occurred.
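One way to carry out this follow-up is to list each category's term (O - E)²/E and pick the largest. The observed counts below are hypothetical and serve only to illustrate the step; `contributions` is a name chosen for this sketch.

```python
def contributions(observed, expected):
    """Per-category terms (O - E)^2 / E of the chi-square statistic."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

colors = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
expected_counts = [6.5, 7.0, 6.5, 12.0, 10.0, 8.0]  # bag of 50 Plain M&Ms
observed_counts = [8, 5, 7, 11, 12, 7]              # hypothetical, NOT real data

parts = contributions(observed_counts, expected_counts)
for color, part in zip(colors, parts):
    print(f"{color:7s} {part:.4f}")

# The category with the largest term is where the counts differ most
largest_color, largest_part = max(zip(colors, parts), key=lambda pair: pair[1])
print("Largest contribution:", largest_color)
```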
Conditions for Chi-Square Test
Random: The data come from a random sample or a randomized experiment.
Large sample size: All expected counts are at least 5.
Independent: Individual observations are independent. When sampling without replacement, check the 10% condition.
Ex: The Graying of America
It is believed that with better medicine and healthier lifestyles, people are living longer and consequently a larger percentage of the population is of retirement age. Compare the distribution of the 1980 population to the 1996 population.
Step 1: State - Identify the population of interest and the parameter you want to draw a conclusion about.
State the hypotheses in words and symbols.
We want to determine if the distribution of age groups in the United States in 1996 has changed significantly from the 1980 distribution.
H0: the age group dist. in 1996 is the same as the 1980 dist.
Ha: the age group dist. in 1996 is different from the 1980 dist.
Or, state the hypotheses as proportions.
H0: p_{0-24} = 0.4139, p_{25-44} = 0.2768, p_{45-64} = 0.1964, p_{65+} = 0.1128.
Ha: at least one of the proportions differs from the stated values.
Goal of “Goodness of Fit Tests”
The more the observed counts differ from the expected counts, the more evidence we have to reject H0 and thus conclude that the population dist. in 1996 is significantly different from 1980.
It is always a good idea to plot the data.
Step 2: Plan - Choose the appropriate inference procedure. Verify the conditions for using the selected procedure.
If the conditions are met, conduct a chi-square goodness of fit test.
Random: We must assume the age-group data come from random samples.
Calculate expected counts in each age category and verify that they are large enough (see conditions). Yes, all > 5; proceed with chi-square calculations.
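The expected counts can be sketched as sample size times the 1980 proportion for each age group. The sample size of 500 is an assumption here, taken from the "10(500)" figure in the 10% check for 1996; the variable names are illustrative.

```python
# 1980 proportions from the null hypothesis
proportions_1980 = {
    "0-24": 0.4139,
    "25-44": 0.2768,
    "45-64": 0.1964,
    "65+": 0.1128,
}

sample_size_1996 = 500  # assumed from the 10(500) figure in the 10% check

# Expected count under H0 = sample size x 1980 proportion
expected_1996 = {
    group: sample_size_1996 * p for group, p in proportions_1980.items()
}

for group, count in expected_1996.items():
    print(f"{group:6s} {count:.2f}")

# Large sample size condition: every expected count is at least 5
print(all(count >= 5 for count in expected_1996.values()))
```

Under this assumption the smallest expected count is the 65+ group, and it is still well above 5.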
Independent: We clearly have two independent groups, one from 1980 and one from 1996. We must check the 10% condition.
There were at least 10(286,598) U.S. citizens in 1980 and at least 10(500) U.S. citizens in 1996.
Step 3: Do - If the conditions are met, carry out the inference procedure.
Calculate the χ² statistic to measure how much the observed counts (O) differ from the expected counts (E) under H0.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
A large value of χ² shows more evidence against H0 and also results in a small P-value.
Calculate P-value
df: use n - 1 degrees of freedom, where n is the number of categories. This is because the χ² family of curves is used to assess evidence against H0.
Since we are using percentages, 3 of the 4 percentages are allowed to vary; the 4th is not.
df = 4 - 1 = 3
Table C: for a tail probability of 0.05 and df = 3, the critical value is 7.81.
Calc: 2nd VARS: χ²cdf(8.2275, E99, 3) = 0.0415
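The calculator result can be cross-checked with a short standard-library sketch. This is not how the calculator computes χ²cdf; it is one possible approach that uses the closed form of the upper-tail area for whole-number degrees of freedom, and `chi_square_upper_tail` is a name invented for this sketch.

```python
import math

def chi_square_upper_tail(x, df):
    """Area to the right of x under the chi-square curve,
    for a positive whole-number df (the P-value of the test)."""
    if x <= 0:
        return 1.0
    t = x / 2
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-t)              # upper tail for df = 2
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(t))   # upper tail for df = 1
    # Each pass raises df by 2, adding t^a * e^(-t) / Gamma(a + 1)
    while a < df / 2:
        tail += math.exp(a * math.log(t) - t - math.lgamma(a + 1))
        a += 1
    return tail

p_value = chi_square_upper_tail(8.2275, 3)
print(round(p_value, 4))
```

If SciPy is available, `scipy.stats.chi2.sf(8.2275, 3)` gives the same upper-tail area.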
Step 4: Conclude - Interpret the results in the context of the problem.
Since our value of 8.2275 is more extreme than 7.81, we reject H0 and conclude that the population dist. in 1996 is significantly different from the 1980 dist. at the 5% level.
To be cont.