(Please grab a Chromebook)
Warm Up: Breakfast time
Researchers designed a survey to compare the
proportions of children who come to school
without eating breakfast in two low-income
elementary schools. An SRS of 80 students from
School 1 found that 19 had not eaten breakfast.
At School 2, an SRS of 150 students included 26
who had not had breakfast. More than 1500
students attend each school. Do these data
give convincing evidence at the α = 0.05 level
of a difference in population proportions?
Step 1:
p1 = the true proportion of students at School 1 who did not eat breakfast
p2 = the true proportion of students at School 2 who did not eat breakfast
H0: p1 - p2 = 0
HA: p1 - p2 ≠ 0
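Step 2 Solution
If conditions are met, we should perform a two-sample z test for the difference of proportions.
1. Random?
Yes. Each sample is an SRS, and the two samples were taken independently at the two schools.
2. 10%:
Sampling without replacement, so check: $80 \le \frac{1}{10}N_1$ and $150 \le \frac{1}{10}N_2$. More than 1500 students attend each school, so both inequalities hold.
3. Large counts:
The counts of “successes” and “failures” in each sample or group are all at least 10:
19 ≥ 10 and 61 ≥ 10; 26 ≥ 10 and 124 ≥ 10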
Step 3 Solution
$n_1 = 80$, $\hat{p}_1 = 0.2375$, $n_2 = 150$, $\hat{p}_2 = 0.1733$
$\hat{p}_1 - \hat{p}_2 = 0.0642$, $\hat{p}_C = 0.1957$
Test statistic: $z = 1.17$
P-value: 0.2427
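The Step 3 numbers can be checked with a short Python sketch. The function name `two_prop_z_test` is ours, not a library call, and the sketch assumes the standard error is built from the pooled (combined) proportion, which is consistent with the combined proportion of 0.1957 reported above. The normal CDF is computed from `math.erf` so that no extra packages are needed.

```python
from math import erf, sqrt


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided, two-sample z test for p1 - p2 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: all "successes" over all individuals sampled.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    # Standard normal CDF evaluated at |z|, then doubled tail area (two-sided).
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return {
        "p1_hat": p1_hat,
        "p2_hat": p2_hat,
        "p_pooled": p_pooled,
        "z": z,
        "p_value": p_value,
    }


# Breakfast data: 19 of 80 at School 1, 26 of 150 at School 2.
result = two_prop_z_test(19, 80, 26, 150)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Rounded, the sketch should agree with the slide: z is about 1.17 and the P-value is about 0.243.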
Step 4 Solution
Because our P-value, 0.2427, is greater than
α = 0.05, we fail to reject the null hypothesis.
There is not convincing evidence that the
true proportions of students at the two
schools who didn’t eat breakfast are
different.
What IS the true difference between
the proportions of students at each
school who have not eaten breakfast?
To answer this, we will use a confidence
interval for a difference in proportions.
Conditions for constructing a confidence interval about a difference in proportions
1. Random?
The data come from two independent random samples or from two groups in a randomized experiment.
2. 10%:
When sampling without replacement, check that $n_1 \le \frac{1}{10}N_1$ and $n_2 \le \frac{1}{10}N_2$.
Do NOT need to check 10% in an experiment (unless the individuals are randomly sampled).
3. Large counts:
The counts of “successes” and “failures” in each sample or group are all at least 10:
$n_1\hat{p}_1 \ge 10$ and $n_1(1 - \hat{p}_1) \ge 10$; $n_2\hat{p}_2 \ge 10$ and $n_2(1 - \hat{p}_2) \ge 10$
Proceeding with Confidence Intervals
Because we do not know the values of the
parameters p1 and p2 in the standard
deviation formula, they are replaced by
$\hat{p}_1$ and $\hat{p}_2$. The result is the standard error of
the statistic $\hat{p}_1 - \hat{p}_2$:
$$SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
What Standard Error tells us
Remember, standard error tells us how far
the statistic (in this case, the difference
in sample proportions) will typically be
from the difference in population
proportions if we repeat the random
sampling/assignment many times.
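One way to see this "typical distance" idea is to simulate the repeated sampling. The sketch below is only an illustration: the population proportions 0.24 and 0.17 are hypothetical values we made up, the sample sizes 80 and 150 are borrowed from the breakfast warm-up, and each student is drawn independently (reasonable when the 10% condition holds). Because the true proportions are plugged in here, the formula gives the standard deviation of the statistic; the standard error is the same formula with the sample proportions in their place.

```python
import random
from math import sqrt
from statistics import stdev

# Hypothetical population values, chosen only for illustration.
TRUE_P1, TRUE_P2 = 0.24, 0.17
N1, N2 = 80, 150


def sample_proportion(p, n, rng):
    """Proportion of successes in n independent draws with success chance p."""
    return sum(rng.random() < p for _ in range(n)) / n


def simulate_differences(reps=2000, seed=1):
    """Repeat the two-sample process many times, keeping each difference."""
    rng = random.Random(seed)
    return [
        sample_proportion(TRUE_P1, N1, rng) - sample_proportion(TRUE_P2, N2, rng)
        for _ in range(reps)
    ]


diffs = simulate_differences()
simulated_spread = stdev(diffs)
formula_sd = sqrt(TRUE_P1 * (1 - TRUE_P1) / N1 + TRUE_P2 * (1 - TRUE_P2) / N2)

print(f"spread of simulated differences: {simulated_spread:.4f}")
print(f"value from the formula:          {formula_sd:.4f}")
```

The two printed values should be close to each other, which is what the formula promises: it describes how much the difference in sample proportions typically varies from sample to sample.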
Confidence interval for p1 - p2
The format is still
Statistic ± (critical value) × (standard error)
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Ending interval should look like…
___ < p1 - p2 < ___
Example 1
Back to the Breakfast problem
Construct a 95% confidence interval for the true
difference in proportions of students between
School 1 and School 2 who did not have
breakfast on a given morning.
We are 95% confident that the interval from -0.047 to 0.175
captures the true difference in proportions of
students from School 1 and School 2 who did
not have breakfast on a given morning.
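The interval above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper name `two_prop_interval` is ours, the 95% level is taken from the interpretation sentence, and the critical value comes from the standard library's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2: statistic ± z* × standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Unpooled standard error: each sample keeps its own proportion.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical value z* that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Breakfast data: 19 of 80 at School 1, 26 of 150 at School 2.
low, high = two_prop_interval(19, 80, 26, 150)
print(f"{low:.3f} < p1 - p2 < {high:.3f}")
```

Note that the interval uses the unpooled standard error, while the significance test in the warm-up used the pooled proportion, because the test assumes p1 = p2 and the interval does not.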
Significance test conclusions and confidence intervals
We are always looking for zero in the confidence interval. This matches a two-sided test at the corresponding significance level (for example, a 95% interval and α = 0.05).
• If zero is in the confidence interval, there isn’t a significant difference between the proportions of the two populations. We would fail to reject the null hypothesis.
• If zero is not in the confidence interval, there is a significant difference between the proportions of the two populations. We would reject the null hypothesis.
• An interval bounded by two negatives tells us that the second population’s proportion is larger than the first.
• An interval bounded by two positives tells us that the first population’s proportion is larger.
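The rules in this list amount to a small decision procedure, sketched below in Python. The function name `describe_interval` and the wording of its messages are ours; the example call uses the breakfast interval from Example 1.

```python
def describe_interval(low, high):
    """Summarize what an interval for p1 - p2 suggests about the two proportions."""
    if low > high:
        raise ValueError("low must not exceed high")
    if low <= 0 <= high:
        return "zero is in the interval: no convincing evidence of a difference"
    if high < 0:
        return "entirely negative: evidence that p2 is larger than p1"
    return "entirely positive: evidence that p1 is larger than p2"


# Breakfast interval from Example 1.
print(describe_interval(-0.047, 0.175))
```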
Example 2
Many news organizations conduct polls asking adults in the United States if they approve of the job the president is doing. How did President Obama’s approval rating change from October 2012 to October 2013? According to a Gallup poll of 1500 randomly selected U.S. adults on October 2-4, 2012, 52% approved of Obama’s job performance. A Gallup poll of 1500 randomly selected U.S. adults on October 5-7, 2013, showed that 46% approved of Obama’s job performance.
a. Calculate the standard error of the sampling distribution of the difference in the sample proportions (2013 - 2012). Interpret this value.
b. Use the results of these polls to construct and interpret a 90% confidence interval for the change in Obama’s approval rating among all U.S. adults from October 2012 to October 2013.
c. Based on your interval, is there convincing evidence that Obama’s job approval rating has changed?
Penguin Study
Are metal bands used for tagging harmful to penguins? Researchers (Saraux et al., 2011) investigated this question with a sample of 20 penguins near Antarctica. All of these penguins had already been tagged with RFID chips, and the researchers randomly assigned 10 of them to receive a metal band on their flippers in addition to the RFID chip. The other 10 penguins did not receive a metal band. Researchers then kept track of which penguins survived for the 4.5-year study and which did not.
a. Identify and classify the explanatory and response variables in this study.
Explanatory: whether or not the penguin was “banded”
Response: whether or not the penguin survived
b. Why did the researchers include a comparison group in this study? Why didn’t they just see how many penguins survived while wearing a metal band?
Hypothesis Test
Let pm = the proportion of penguins who survived from the metal-band group.
Let pc = the proportion of penguins who survived from the control group.
H0: the survival rate of the banded penguins was the same as that of those without bands:
pm - pc = 0
HA: the survival rate of the banded penguins was less than that of those without bands:
pm - pc < 0
Results
                   No metal band (control)   Metal band   Total
Survived                      6                   3          9
Did not survive               4                   7         11
Total                        10                  10         20
The researchers found that 9 of the 20 penguins survived, of whom 3 had a metal band and 6 did not.
Complete the table.
Is it possible that this difference could have happened even if the metal band had no
effect, i.e., due simply to the random
nature of assigning penguins to groups
(the luck of the draw)?
Sure, this is possible. But how unlikely is it? To answer this question we will once again turn to simulation. As always, we will perform our simulation assuming that the null hypothesis is true: that the metal band has no effect on penguin survival. More specifically, we will assume that the 9 penguins who survived would have done so with or without the metal band, and that the 11 penguins who did not survive would not have survived either way. This situation is too complex to be modeled by flipping a coin, but we can use cards.
Simulation
e) You will need 20 cards: 11 black and 9 red. The black cards represent the 11 penguins who would have died whether or not they had a metal band, and the red cards represent the 9 penguins who would have lived no matter what.
f) Shuffle the cards and deal them into two groups of 10, one for the control group and one for the metal-band group. Then fill in the table of simulated results.
g) Calculate the difference in proportions who survived (control minus metal band group).
h) Repeat this shuffling and dealing a second time. Again fill in the table of simulated results.
i) Calculate the difference in proportions who survived (control minus metal band group).
j) Plot both of your differences in proportions on a class dotplot of the differences in success proportions (control minus metal band group). Be careful to label the axis clearly and correctly.
Using the Applet
We really need to carry out this simulated random assignment process hundreds, preferably thousands, of times. This would be very tedious and time-consuming by hand, so let’s turn to the Two Proportions applet found at: http://www.rossmanchance.com/ISIapplets.html.
l) Enter the observed counts for this study into the 2×2 table in the upper left of the applet.
• Click Use Table.
• Check Show Shuffle Options and then press the Shuffle button.
• Check the Show table box on the
• Press the Shuffle button more times.
• Increase the Number of Shuffles so that the applet “shuffles” at least 1000 total hands.
m) Describe the resulting (null) distribution of
the differences in sample proportions of
survival between the two groups.
Comment on its shape (does this look
familiar?), center (why does the center make
sense?), and variability.
n) Determine the proportion of your 1000 simulated random assignments for which the results are at least as extreme as the actual study (which, you’ll recall, saw a 0.30 difference in success proportions between the groups). Report and interpret this (approximate) p-value, including what is meant by the phrase “by chance alone” in this context.
The P-value describes how often a difference in survival rates at least as large as the one observed in the study would occur, assuming that there is no difference in survival rates between the banded and non-banded penguins. In other words, it describes how often results this extreme would occur by chance alone (through the random assignment), with no influence of the bands on survival.
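The card shuffling can also be sketched in Python as a stand-in for the applet. This is an illustration only, not the applet's code: the deck holds 9 "red" cards (the 9 penguins who survived) and 11 "black" cards (the 11 who did not), the first 10 cards dealt play the role of the control group, and the seed and the name `shuffle_once` are our own choices.

```python
import random
from collections import Counter

OBSERVED_DIFF = 6 / 10 - 3 / 10  # control minus metal band in the actual study


def shuffle_once(rng):
    """Shuffle the 20 cards, deal 10 to each group, return the difference."""
    deck = ["red"] * 9 + ["black"] * 11  # red = survived, black = did not
    rng.shuffle(deck)
    control, metal = deck[:10], deck[10:]
    return control.count("red") / 10 - metal.count("red") / 10


def simulate(reps=1000, seed=2011):
    rng = random.Random(seed)
    return [shuffle_once(rng) for _ in range(reps)]


diffs = simulate()

# Text version of the dotplot: each possible difference and how often it occurred.
for value, count in sorted(Counter(round(d, 1) for d in diffs).items()):
    print(f"{value:+.1f}  {count}")

# Approximate p-value: share of shuffles at least as extreme as the study.
# The small tolerance guards against floating-point rounding in the comparison.
p_value = sum(d >= OBSERVED_DIFF - 1e-9 for d in diffs) / len(diffs)
print(f"approximate p-value: {p_value:.3f}")
```

Only the positive direction counts as "at least as extreme" here because the alternative hypothesis is one-sided: the banded penguins are suspected of surviving at a lower rate.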
o) Is your simulated p-value small enough so that you would consider a difference in the observed proportion of successes of 0.30 or more to be surprising under the null model that the metal band has no effect on penguins’ survival?
p) Explain why it would not be appropriate to conclude that this experiment provides strong evidence that the metal band has no effect on penguins’ survival.
Full Study
We have a confession to make: The results described above were actually just a small part of the study. We chose to start with partial results to make the by-hand simulation with cards fairly manageable. But the actual study involved 100 penguins, with 50 randomly assigned to each group. The researchers found that 16 of 50 survived in the metal band group, and 31 of 50 survived in the control group.
q) Record the actual results from the full study in the 2×2 table:
r) Determine the conditional proportions who survived in each group. Also calculate the difference in these proportions (control − metal band). Is the value of this difference similar to what you calculated earlier for the partial study?
                   No metal band (control)   Metal band   Total
Survived                     31                  16         47
Did not survive              19                  34         53
Total                        50                  50        100
s) Describe how you could use cards to perform a simulation analysis based on data from the full study. Indicate what would be different from your earlier simulation analysis, and also indicate what would remain the same.
t) Use the applet to conduct a simulation analysis based on data from the full study. Conduct at least 1000 repetitions of the random assignment. Again describe the (null) distribution of the difference in conditional proportions of survival between the two groups (shape, center, variability).
u) Use the applet to determine the (approximate) p-value from your simulation analysis. Interpret what this p-value means in this context.
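For the full study, the same random-assignment idea can be written as a general function. This is again a sketch, not the applet's implementation: `simulated_p_value` is a hypothetical helper name, survivors are coded as 1 and non-survivors as 0, and the seed is arbitrary.

```python
import random


def simulated_p_value(surv_control, n_control, surv_band, n_band,
                      reps=1000, seed=50):
    """Re-randomize the outcomes and count results as extreme as observed."""
    observed = surv_control / n_control - surv_band / n_band
    total = n_control + n_band
    survivors = surv_control + surv_band
    # Under the null hypothesis each penguin's outcome is fixed in advance:
    # 1 = would survive either way, 0 = would not survive either way.
    outcomes = [1] * survivors + [0] * (total - survivors)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        control = outcomes[:n_control]
        band = outcomes[n_control:]
        diff = sum(control) / n_control - sum(band) / n_band
        if diff >= observed - 1e-9:  # tolerance for floating-point rounding
            extreme += 1
    return observed, extreme / reps


# Full study: 31 of 50 survived in the control group, 16 of 50 with a metal band.
observed_diff, p_value = simulated_p_value(31, 50, 16, 50)
print(f"observed difference (control - metal band): {observed_diff:.2f}")
print(f"approximate p-value: {p_value:.3f}")
```

What changes from the 20-penguin simulation is only the size of the "deck" (47 survivors and 53 non-survivors, dealt 50 and 50); the logic of shuffling, dealing, and comparing with the observed difference stays the same.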