StatCrunch review:
Confidence intervals & hypothesis testing (test statistic, P-value): Stat menu
Critical values: Stat --> Calculators
p (proportion, percent): Proportion Stats; normal distribution
μ (means): T Stats; t distribution
σ (standard deviation & variance): Variance Stats; chi-square for one variance/SD, F for two variances/SDs
Chapter 11 Inferences on Two Samples
Ch 11.1 Inference about Two Population Proportions
Objective A: Distinguish between Independent and Dependent Sampling
Example 1: Determine whether each sampling method is independent or dependent.
(a) Test scores of the same students in English and Math.
Dependent: The English and Math scores are recorded for the same individuals.
(b) The effectiveness of two different diets on two different groups of
individuals.
Independent: The two different diet results are recorded for different individuals.
Objective B: Test Hypotheses or Confidence Intervals Regarding Two Proportions from Independent Samples
Example 1:
The drug Prevnar is a vaccine meant to prevent certain types of bacterial meningitis. It is
typically administered to infants starting around 2 months of age. In randomized, double-blind
clinical trials of Prevnar, infants were randomly divided into two groups. Subjects in group 1
received Prevnar, while subjects in group 2 received a control vaccine (also referred to as a
placebo). After the second dose, 137 of 452 subjects in the experimental group (group 1)
experienced drowsiness as a side effect. After the second dose, 31 of 99 subjects in the control
group (group 2) experienced drowsiness as a side effect. Does the evidence suggest that a
lower proportion of subjects in group 1 experienced drowsiness as a side effect than subjects in
group 2 at the 0.05 level of significance?
Note: Double-blind means that neither the subjects nor the doctors knew which treatment was being given, which avoids bias in behavior.
Dependent samples or Independent Samples for Two Proportions? Independent
Group 1 sample proportion (infants who received the vaccine Prevnar): $\hat{p}_1 = \frac{x_1}{n_1} = \frac{137}{452} \approx 0.3031$
Group 2 sample proportion (control group): $\hat{p}_2 = \frac{x_2}{n_2} = \frac{31}{99} \approx 0.3131$
(a) Setup
Ho: The proportion of infants who experienced drowsiness is the same for both groups. (There
is no difference between the two groups.)
H1: The proportion of infants who experienced drowsiness for the Prevnar group is less than
the control group.
Or
Ho: $p_1 = p_2$ ---> $p_1 - p_2 = 0$
H1: $p_1 < p_2$ ---> $p_1 - p_2 < 0$
(b) P-value using StatCrunch
We must verify the requirements for a hypothesis test comparing two population proportions.
Stat --> Proportion Stats --> Two Sample --> With Summary --> Input the following --> Compute
StatCrunch Results:
P-value = 0.4221, which is not unusual: it is not less than α = 0.05 (5%). Do not reject the null hypothesis, Ho.
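As a cross-check on the StatCrunch output, here is a minimal Python sketch (standard library only) of the pooled two-proportion z test. It assumes the usual textbook formula, where the pooled proportion p-hat = (x1 + x2)/(n1 + n2) is used in the standard error because Ho says the two proportions are equal. The function names are hypothetical and are not part of StatCrunch.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_prop_z_test(x1: int, n1: int, x2: int, n2: int, tail: str = "left"):
    """Pooled two-proportion z test of Ho: p1 - p2 = 0 (assumed textbook formula)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)              # pooled estimate under Ho
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se                 # test statistic
    if tail == "left":                          # H1: p1 < p2
        p_value = normal_cdf(z0)
    elif tail == "right":                       # H1: p1 > p2
        p_value = 1 - normal_cdf(z0)
    else:                                       # any other value: two-tailed, H1: p1 != p2
        p_value = 2 * normal_cdf(-abs(z0))
    return z0, p_value

# Prevnar example: 137 of 452 (Prevnar) vs 31 of 99 (control), left-tailed
z0, p = two_prop_z_test(137, 452, 31, 99, tail="left")
print(f"z0 = {z0:.2f}, P-value = {p:.4f}")
```

With the Prevnar counts this gives z0 of about -0.20 and a P-value close to the 0.4221 reported by StatCrunch; any tiny difference would come from rounding.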
(c) Conclusion
There is not enough evidence to support the claim that a lower proportion of infants who received Prevnar experienced drowsiness as a side effect compared to infants who received the placebo. Thus, there is no significant difference between the groups: taking Prevnar made no significant difference in the rate of drowsiness.
(d) Find the 95% confidence interval. Does the CI support the same conclusion as the hypothesis test?
Options --> Edit --> select Confidence interval --> Compute
(-0.111, 0.091): We are 95% confident that the true difference between the two population proportions, $p_1 - p_2$, is between -0.111 and 0.091.
Since zero is in the CI, this supports the conclusion of the hypothesis test that there is no significant difference between the two groups.
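The interval can be reproduced with a short Python sketch. It assumes the standard large-sample interval for p1 - p2, which (unlike the test) uses the unpooled standard error sqrt(p1-hat(1 - p1-hat)/n1 + p2-hat(1 - p2-hat)/n2) and the critical value z_{α/2}. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1: int, n1: int, x2: int, n2: int, level: float = 0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}, e.g. 1.96 for 95%
    diff = p1_hat - p2_hat                               # point estimate
    return diff - z_crit * se, diff + z_crit * se

lo, hi = two_prop_ci(137, 452, 31, 99, level=0.95)      # Prevnar example
print(f"({lo:.3f}, {hi:.3f})")
```

The same function with the Gallup counts (203 of 750 men, 270 of 750 women) and level=0.90 gives the 90% interval used in the next example.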
Example 2: The body mass index (BMI) of an individual is one measure that is used to
judge whether an individual is at a healthy weight. A BMI between 20 and
25 indicates that one is at a normal weight. In a survey of 750 men and 750
women, the Gallup organization found that 203 men and 270 women were
normal weight. Construct a 90% confidence interval to gauge whether there
is a difference in the proportion of men and women who are normal weight.
Interpret the interval.
We must verify the requirements for constructing a confidence interval for the
difference between two population proportions.
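Part of that check can be sketched in Python. This assumes the common textbook rule of thumb n·p-hat·(1 - p-hat) ≥ 10 for each sample; independence, random sampling, and the 5% condition (each sample is no more than 5% of its population) still have to be judged from how the data were collected. The function name is hypothetical.

```python
def large_enough(x: int, n: int, threshold: float = 10.0) -> bool:
    """Rule of thumb (assumed here): n * p_hat * (1 - p_hat) >= 10."""
    p_hat = x / n
    return n * p_hat * (1 - p_hat) >= threshold

# Gallup BMI survey counts from Example 2
samples = {"men": (203, 750), "women": (270, 750)}
for label, (x, n) in samples.items():
    p_hat = x / n
    value = n * p_hat * (1 - p_hat)
    print(f"{label}: n*p_hat*(1-p_hat) = {value:.1f}, ok = {large_enough(x, n)}")
```

Both values (about 148 for men and about 173 for women) are far above 10, so the normal approximation is reasonable here.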
a. Is this a dependent or an independent sampling of two proportions?
Independent
Group 1 Men: $\hat{p}_m = \frac{x_m}{n_m} = \frac{203}{750} \approx 0.271$
Group 2 Women: $\hat{p}_w = \frac{x_w}{n_w} = \frac{270}{750} = 0.360$
b. Find the CI for $p_1 - p_2$ (that is, $p_m - p_w$) at a 90% confidence level.
Stat --> Proportion Stats --> Two Sample --> With Summary --> Input the following --> Compute
90% confidence interval for $p_m - p_w$ is (-0.129, -0.050).
We are 90% confident that the true difference in the proportions of men and women who are normal weight is between -12.9% and -5.0%.
Since zero is not in the CI, we can conclude that there is a significant difference between the two groups.
What do the negative values in the CI mean? Every value in the CI is negative, so $p_m - p_w < 0$: the proportion of women who are at a normal weight is higher than the proportion of men (which we already saw from the sample proportions).
c) Conduct a hypothesis test. How does this support the conclusion from the CI?
Set up:
Ho: There is no difference in proportions between genders who are at a normal weight.
H1: There is a difference in proportions between genders who are at a normal weight.
Or
Ho: $p_1 = p_2$ ---> $p_1 - p_2 = 0$
H1: $p_1 \neq p_2$ ---> $p_1 - p_2 \neq 0$
P-value
Options --> Edit --> select Hypothesis test with ≠ --> Compute
P-value = 0.0002, which is unusual since it is less than α = 0.10 (the significance level that matches 90% confidence).
If the P-value is low, the null must go. Reject the null hypothesis, Ho.
Conclusion: There is enough evidence to support the claim that there is a difference in
proportions between genders who are at a normal weight.
This is the same conclusion obtained from the confidence interval.
HW 11.1 #10: (a) Stat --> Tables --> Frequency --> select column: Response, group by: Gender --> Compute. Write down the proportions and the number of 'yes' responses out of the 'total'.
Then do a regular two-sample proportion test with summary data.
Interpreting the P-value: If the population proportions are equal, one would expect a difference in sample proportions at least as large, in absolute value, as the one observed in about ___ out of 100 repetitions of this experiment.
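The "out of 100 repetitions" wording can be made concrete with a simulation. The sketch below assumes Ho is true (both groups share the pooled proportion), re-runs the Prevnar experiment many times with Python's random module, and counts how often the simulated difference p1-hat - p2-hat is at least as extreme as the observed one. The function name, seed, and number of repetitions are arbitrary choices for illustration.

```python
import random

def simulated_p_value(x1, n1, x2, n2, reps=5000, tail="left", seed=1):
    """Estimate a P-value by re-running the experiment under Ho: p1 = p2."""
    rng = random.Random(seed)               # fixed seed so results are repeatable
    p_pool = (x1 + x2) / (n1 + n2)          # common proportion if Ho is true
    observed = x1 / n1 - x2 / n2
    extreme = 0
    for _ in range(reps):
        # Count how many subjects in each simulated group show the side effect
        sim1 = sum(rng.random() < p_pool for _ in range(n1))
        sim2 = sum(rng.random() < p_pool for _ in range(n2))
        diff = sim1 / n1 - sim2 / n2
        if tail == "left":
            extreme += diff <= observed               # as small as observed, or smaller
        else:
            extreme += abs(diff) >= abs(observed)     # as far from 0 as observed, or farther
    return extreme / reps

p_sim = simulated_p_value(137, 452, 31, 99)
print(f"about {round(100 * p_sim)} out of 100 repetitions")
```

The estimate should land in the neighborhood of the z-test P-value of 0.4221 (roughly 42 out of 100), though not exactly, because it simulates the counts directly instead of using the normal approximation.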
Ch 11.2 Inference about Two Means: Dependent Samples
Objective A: Test Hypotheses or Confidence Intervals about the Population Mean Difference of Matched-Pairs Data
Example 1: In an experiment conducted online at the University of Mississippi, study
participants are asked to react to a stimulus. In one experiment, the
participant must press a key on seeing a blue screen. Reaction time (in
seconds) to press the key is measured. The same person is then asked
to press a key on seeing a red screen, again with reaction time measured.
The results for six randomly sampled study participants are as follows:
(a) Why are these matched-pairs data?
(Dependent) The same participants are measured under both conditions (blue and red).
(b) Is the reaction time to the blue stimulus different from the reaction
time to the red stimulus at the 0.01 level of significance?
Note: A normal probability plot and boxplot of the data indicate that
the differences are approximately normally distributed with no outliers.
Setup:
Ho: There is no difference in reaction time to the blue and red stimuli.
H1: There is a difference in reaction time to the blue and red stimuli.
or
Ho: $\mu_1 = \mu_2$ ---> $\mu_d = 0$
H1: $\mu_1 \neq \mu_2$ ---> $\mu_d \neq 0$
P-value:
Input the reaction times for seeing a blue screen in column 1 and the reaction times for seeing a red screen in column 2.
Stat --> T Stats --> Paired --> Sample 1: var1, Sample 2: var2. Select Save: differences --> Compute (see below)
HW 11.2 #2 asks for the differences, so if you select Save: differences you won't have to compute them by hand. The mean difference shows up in the hypothesis test output under 'Mean'. You can get the standard deviation from Summary Stats on the difference column.
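To see what Save: differences and the paired t test do behind the scenes, here is a small Python sketch using the standard statistics module. The six (blue, red) reaction times are assumed values for illustration only, since the data table is not reproduced in these notes; the differences are taken as sample 1 minus sample 2 (blue - red), which is assumed to match StatCrunch's var1 - var2.

```python
import math
from statistics import mean, stdev

# Assumed reaction times in seconds (illustrative only, not the notes' data)
blue = [0.582, 0.481, 0.841, 0.267, 0.685, 0.450]
red = [0.408, 0.407, 0.542, 0.402, 0.456, 0.533]

# Differences d = blue - red, like the column "Save: differences" creates
diffs = [b - r for b, r in zip(blue, red)]
n = len(diffs)
d_bar = mean(diffs)                  # mean difference
s_d = stdev(diffs)                   # sample SD of the differences (n - 1 denominator)
t0 = d_bar / (s_d / math.sqrt(n))    # paired t statistic with df = n - 1
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.4f}, t0 = {t0:.3f}, df = {n - 1}")
```

The P-value then comes from a t distribution with n - 1 = 5 degrees of freedom, which StatCrunch looks up for you.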
StatCrunch Results:
P-value = 0.2466, which is not unusual: 0.2466 is not less than 0.01 (the 1% significance level).
Do not reject the null hypothesis, Ho.
Conclusion:
There is not enough evidence to support the claim that the reaction time is different between
the blue and red stimulus.
(c) Construct a 99% confidence interval about the population mean difference. Interpret your
results.
Options --> Edit --> select Confidence interval, input 0.99 --> Compute
99% confidence interval for $\mu_d$ is (-0.193, 0.379).
We are 99% confident that the true mean difference in reaction time between the blue stimulus and the red stimulus is between -0.193 second and 0.379 second.
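For reference, the interval reported here is presumably the standard matched-pairs t interval, where $\bar{d}$ is the mean of the differences, $s_d$ is their standard deviation, and n is the number of pairs:

$$\bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}, \qquad df = n - 1$$

With n = 6 pairs and 99% confidence, $t_{0.005}$ with 5 degrees of freedom is about 4.032.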
(d) How does the confidence interval support the conclusion from the hypothesis test?
Since the confidence interval includes zero, there is no significant difference in reaction time to the two stimuli. This is the same conclusion as the hypothesis test.
Homework:
11.2 #2 (raw data given), part (a): Stat --> T Stats --> Paired --> select Save: differences. This way StatCrunch computes the differences.
11.2 #3 (summary of data given), part (b): Since the differences have already been calculated, we only have one sample of differences. Stat --> T Stats --> One Sample --> With Summary
If time, do 11.2 #4 together.
Ch 11.3 Inference about Two Means: Independent Samples
Objective A: Test Hypotheses or Confidence Intervals regarding the Difference of Two Independent Means
If we can assume $\sigma_1 = \sigma_2$, use the t distribution with a POOLED standard error.
In general, we use the t distribution without a POOLED standard error (Welch's approach) unless instructed otherwise.
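The two options differ only in the standard error and degrees of freedom. The sketch below shows both in Python, assuming the standard pooled-variance formula and the Welch-Satterthwaite degrees of freedom that statistical software commonly uses for the unpooled test (by hand, some textbooks instead use the smaller of n1 - 1 and n2 - 1). The function names and the example numbers are hypothetical.

```python
import math

def welch_se_df(s1, n1, s2, n2):
    """Unpooled (Welch) standard error and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2        # squared standard error of each sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

def pooled_se_df(s1, n1, s2, n2):
    """Pooled standard error; appropriate only if sigma1 = sigma2 can be assumed."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return se, n1 + n2 - 2

# Hypothetical summary statistics, for illustration only
print(welch_se_df(1.5, 12, 0.6, 20))
print(pooled_se_df(1.5, 12, 0.6, 20))
```

With equal sample sizes and equal sample SDs the two standard errors coincide; when the SDs differ a lot, as in the hypothetical numbers above, Welch's df drops well below n1 + n2 - 2.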
Project: Is there a difference in the means between males and females?
Ho: There is no difference in the mean……. between genders.
H1: There is a difference in the mean ………. between genders.
or
Ho: $\mu_f = \mu_m$ ---> $\mu_f - \mu_m = 0$
H1: $\mu_f \neq \mu_m$ ---> $\mu_f - \mu_m \neq 0$
Example 1:
(a) The normal probability plots indicate that the samples come from normally distributed populations, and the boxplots show no outliers in either sample. Assuming the samples were randomly selected and each sample size is no more than 5% of its population size, Welch's t-test can be used.
(b) Hypothesis test: Independent Samples for Two Means
Setup:
Ho: There is no difference between genders in mean reaction times.
H1: There is a difference between genders in mean reaction times.
Or
Ho: $\mu_f = \mu_m$ ---> $\mu_f - \mu_m = 0$
H1: $\mu_f \neq \mu_m$ ---> $\mu_f - \mu_m \neq 0$
P-value
Enter the raw data into column 1 (female) and column 2 (male).
Stat --> T Stats --> Two Sample --> With Data --> Sample 1: female (var1), Sample 2: male (var2) --> Hypothesis test --> Compute
P-value = 0.5568, which is not unusual: 0.5568 is not less than the significance level of 0.05 (5%). Do not reject the null hypothesis, Ho.
Conclusion
There is not enough evidence to support the claim that there is a significant difference in the
reaction times between genders.
Construct the CI: Options --> Edit --> select Confidence interval, 0.95 --> Compute
(-0.061, 0.110)
We are 95% confident that the true difference in means between the genders is between -0.061 and 0.110 seconds.
How does this support the conclusion from the hypothesis test?
Since zero is in the confidence interval, there is no significant difference in reaction times between the genders. This supports the conclusion from the hypothesis test.
(c) Graph --> Boxplot --> select Female Students, hold the Control key, then select Male Students --> Input the following --> Compute
The medians are similar for both genders. However, the female data had more variability, whereas the male data was more consistent. No outliers were present for either gender. The overall spread is similar as well.
Example 2:
a) What is the typical time it takes to chill glass and aluminum?
Glass: 133.8 ± 9.9 minutes
Glass takes between 123.9 and 143.7 minutes to chill a bottle of beer.
Aluminum: 92.4 ± 7.3 minutes
Aluminum takes between 85.1 and 99.7 minutes to chill a bottle of beer.
b) Is this a dependent or independent sampling? Independent
c) Construct and interpret a 90% confidence interval for $\mu_G - \mu_A$.
Stat --> T Stats --> Two Sample --> With Summary --> Input the following
and select confidence interval 0.90--> Compute
CI (38.1, 44.7)
One can be 90% confident that the true difference in mean cooling time between glass and aluminum is between 38.1 and 44.7 minutes.
Since zero is not in the confidence interval, we can conclude that there is a significant difference in mean cooling times between glass and aluminum. It appears that glass takes longer to chill.
d) Perform a hypothesis test:
set up:
Ho: There is no difference between cooling times for glass and aluminum.
H1: There is a difference between cooling times for glass and aluminum.
Or
Ho: $\mu_G = \mu_A$ ---> $\mu_G - \mu_A = 0$
H1: $\mu_G \neq \mu_A$ ---> $\mu_G - \mu_A \neq 0$
P-value:
Options --> Edit --> Hypothesis test --> Compute
P-value < 0.0001, which is unusual: it is less than α = 0.10 (0.01% is less than 10%).
If the P-value is low, the null must go. Reject the null hypothesis.
Conclusion:
There is evidence to support the claim that there is a significant difference between cooling
times for glass and aluminum.
This supports the conclusion from the CI.
Homework 11.3 #8: Conclusions from a hypothesis test cannot imply causation, that is, they cannot claim that X caused Y to happen.
Ch 11.4 Inference about Two Population Standard Deviations
Objective A: Fisher's F distribution
Objective B: Test Hypotheses regarding Two Population Standard Deviations
Example 1:
Assume that the populations are normally distributed.
Concepts for Hypothesis Tests with Two Variances
$F = \frac{s_1^2}{s_2^2}$, where $s_1^2$ is the larger of the two sample variances. Sample 1 is the larger, sample 2 is the smaller.
If the two populations really do have equal variances, then the ratio $s_1^2 / s_2^2$ should be close to 1 because $s_1^2$ and $s_2^2$ tend to be close in value. If the two populations do not have equal variances, then the ratio $s_1^2 / s_2^2$ tends to be much bigger than 1 (it is always at least 1, since $s_1^2$ is chosen to be the larger sample variance). Consequently, a large value of F is evidence against $\sigma_1^2 = \sigma_2^2$.
First, we need to identify which of the two given standard deviations will be used for $s_1^2$ ($s_1^2$ is the larger of the two sample variances). Take the standard deviations and square them to get the variances.
In this problem, $s_1^2 = 9.2^2 = 84.64$ and $s_2^2 = 8.6^2 = 73.96$ (9.2 is the SD, 84.64 is the variance).
Stat --> Variance Stats --> Two Sample --> With Summary --> Input the following --> Compute
StatCrunch Results:
Since the P-value (0.7659) is not less than α = 0.1, do not reject Ho: $\sigma_1^2 = \sigma_2^2$.
There is not sufficient evidence to conclude that $\sigma_1^2 \neq \sigma_2^2$ <--> $\sigma_1 \neq \sigma_2$
Or
There is not sufficient evidence to conclude that the standard deviations are different.
Example 2:
Identify $s_1^2$: $s_1^2$ is the larger of the two sample variances. Take the standard deviations and find the variances (by squaring the SDs).
$s_1^2 = 53^2 = 2809$ ---> $n_1 = 2601$
$s_2^2 = 34^2 = 1156$ ---> $n_2 = 2692$
Stat --> Variance Stats --> Two Sample --> With Summary --> Input the following --> Compute
Since the P-value (<0.0001) is less than α = 0.05, reject Ho: $\sigma_1^2 = \sigma_2^2$.
There is sufficient evidence to conclude that the standard deviation of walking speed is different between the two groups.