week11 1
The t distribution
• Suppose that a SRS of size n is drawn from a N(μ, σ) population. Then the one-sample t statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has a t distribution with n - 1 degrees of freedom.
• The t distribution has mean 0 and is symmetric.
• There is a different t distribution for each sample size.
• A particular t distribution is specified by its degrees of freedom, which come from the sample standard deviation s.
week11 2
Tests for the population mean when σ is unknown
• Suppose that a SRS of size n is drawn from a population having unknown mean μ and unknown standard deviation σ. To test the hypothesis H0: μ = μ0, we first estimate σ by s, the sample standard deviation, then compute the one-sample t statistic given by
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
• In terms of a random variable T having the t(n - 1) distribution, the P-value for the test of H0 against
Ha: μ > μ0 is P(T ≥ t)
Ha: μ < μ0 is P(T ≤ t)
Ha: μ ≠ μ0 is 2·P(T ≥ |t|)
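The three P-value rules above can be sketched in Python. This is an illustration only and not part of the original slides: the helper names `t_density`, `t_upper_tail` and `p_value` are hypothetical, and the tail area is approximated with Simpson's rule instead of being read from a t table or from MINITAB.

```python
import math

def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for T ~ t(df); steps must be even (Simpson's rule)."""
    a = abs(t)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3        # P(0 <= T <= |t|)
    upper = 0.5 - area          # P(T >= |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

def p_value(t, df, alternative):
    """P-value of the one-sample t test for the stated alternative."""
    if alternative == "greater":      # Ha: mu > mu0
        return t_upper_tail(t, df)
    if alternative == "less":         # Ha: mu < mu0
        return 1 - t_upper_tail(t, df)
    if alternative == "two-sided":    # Ha: mu != mu0
        return 2 * t_upper_tail(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Hypothetical input: an observed t of 1.38 with 5 degrees of freedom
print(p_value(1.38, 5, "greater"))
```

In practice a statistics package or a t table would normally be used; the numerical integration here only keeps the sketch free of third-party libraries.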
week11 3
Example
• In a metropolitan area, the concentration of cadmium (Cd) in leaf lettuce was measured in 6 representative gardens where sewage sludge was used as fertilizer. The following measurements (in mg/kg of dry weight) were obtained.
Cd: 21 38 12 15 14 8
Is there strong evidence that the mean concentration of Cd is higher than 12?
Descriptive Statistics
Variable   N   Mean    Median   TrMean   StDev   SE Mean
Cd         6   18.00   14.50    18.00    10.68   4.36
• The hypotheses to be tested are: H0: μ = 12 vs Ha: μ > 12.
week11 4
• The test statistic is:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{18 - 12}{10.68/\sqrt{6}} = 1.38$$
The degrees of freedom are df = 6 - 1 = 5.
Since t = 1.38 < 2.015, we cannot reject H0 at the 5% level, so there is no strong evidence that the mean Cd concentration is higher than 12.
The P-value satisfies 0.10 < P(T(5) ≥ 1.38) < 0.15, so it is greater than 0.05, indicating a non-significant result.
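The arithmetic of this test can be checked with a short Python sketch. The variable names are illustrative, the data are the six Cd measurements from the example, and the critical value 2.015 is the one quoted on the slide for t(5) at the 5% level.

```python
import math
import statistics

cd = [21, 38, 12, 15, 14, 8]    # Cd measurements (mg/kg of dry weight)
mu0 = 12                         # hypothesized mean under H0

n = len(cd)
xbar = statistics.mean(cd)       # sample mean
s = statistics.stdev(cd)         # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)            # standard error of the mean
t = (xbar - mu0) / se            # one-sample t statistic

t_crit = 2.015                   # upper 5% critical value of t(5), as quoted on the slide
reject = t > t_crit              # one-sided test of Ha: mu > 12

print(f"mean={xbar:.2f} s={s:.2f} se={se:.2f} t={t:.2f} reject H0: {reject}")
```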
week11 5
CIs for the population mean when σ is unknown
• Suppose that a SRS of size n is drawn from a population having unknown mean μ. A level C CI for μ when σ is unknown is an interval of the form
$$\left(\bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}}\right)$$
where t* is the value for the t(n - 1) density curve with area C between -t* and t*.
• Example:
Give a 95% CI for the mean Cd concentration.
week11 6
• MINITAB commands: Stat > Basic Statistics > 1-Sample t
• MINITAB outputs for the above problem:
T-Test of the Mean
Test of mu = 12.00 vs mu > 12.00
Variable N Mean StDev SE Mean T P
Cd 6 18.00 10.68 4.36 1.38 0.11
T Confidence Intervals
Variable N Mean StDev SE Mean 95.0 % CI
Cd 6 18.00 10.68 4.36 (6.79, 29.21)
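The 95% interval in the MINITAB output can be reproduced by hand. The sketch below is an illustration with assumed variable names; the value t* = 2.571 is the standard table value for t(5) with area 0.95 between -t* and t*, and it does not appear on the slides.

```python
import math
import statistics

cd = [21, 38, 12, 15, 14, 8]    # Cd measurements from the example
n = len(cd)
xbar = statistics.mean(cd)
se = statistics.stdev(cd) / math.sqrt(n)

t_star = 2.571                   # assumption: t(5) table value for 95% confidence
margin = t_star * se             # margin of error t* s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```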
week11 7
Question 3 Final exam Dec 2000
• In order to test H0: μ = 60 vs Ha: μ ≠ 60 a random sample of 9 observations (normally distributed) is obtained, yielding x̄ = 55 and s = 5. What is the P-value of the test for this sample?
a) greater than 0.10.
b) between 0.05 and 0.10.
c) between 0.025 and 0.05.
d) between 0.01 and 0.025.
e) less than 0.01.
week11 8
Question
A manufacturing company claims that its new floodlight will last 1000 hours. After collecting a simple random sample of size ten, you determine that a 95% confidence interval for the true mean number of hours that the floodlights will last, μ, is (970, 995). Which of the following are true? (Assume all tests are two-sided.)
I) At any α < .05, we can reject the null hypothesis that the true mean is 1000.
II) If a 99% confidence interval for the mean were determined here, the numerical value 972 would certainly lie in this interval.
III) If we wished to test the null hypothesis H0: μ = 988, we could say that the P-value must be < 0.05.
week11 9
Questions
1. Alpha (level of significance α) is
a) the probability of rejecting H0 when H0 is true.
b) the probability of supporting H0 when H0 is false.
c) supporting H0 when H0 is true.
d) rejecting H0 when H0 is false.
2. Confidence intervals can be used to do hypothesis tests for
a) left-tailed tests.
b) right-tailed tests.
c) two-tailed tests.
3. The Type II error is supporting a null hypothesis that is false. T/F
week11 10
Robustness of the t procedures
• Robust procedures
A statistical inference procedure is called robust if the probability calculations required are insensitive to violations of the assumptions made.
• t-procedures are quite robust against nonnormality of the population except in the case of outliers or strong skewness.
week11 11
Simulation study
• Let's generate 100 samples of size 10 from a moderately skewed distribution (chi-square distribution with 5 df) and calculate the 95% t-intervals to see how many of them contain the true mean μ = 5.
• First let's have a look at the histogram of the 1000 values generated from this distribution.
Variable N Mean Median TrMean StDev
C1 1000 4.9758 4.2788 4.7329 3.1618
[Figure: histogram of C1; horizontal axis C1 (0 to 30), vertical axis Frequency (0 to 400)]
week11 12
T Confidence Intervals
Variable   N    Mean    StDev   SE Mean   95.0 % CI
C1         10   5.21    3.89    1.23      ( 2.43, 7.99)
. . .
C4         10   4.449   1.593   0.504     ( 3.309, 5.589)
C5         10   5.33    4.23    1.34      ( 2.31, 8.36)
C6         10   3.267   2.312   0.731     ( 1.612, 4.921)*
C7         10   4.981   2.988   0.945     ( 2.844, 7.118)
C8         10   3.725   1.520   0.481     ( 2.638, 4.812)*
C9         10   4.487   2.332   0.738     ( 2.819, 6.155)
. . .
C14        10   4.650   1.854   0.586     ( 3.324, 5.977)
C15        10   2.973   2.163   0.684     ( 1.425, 4.520)*
C16        10   4.685   2.254   0.713     ( 3.072, 6.297)
C26        10   5.594   2.984   0.944     ( 3.459, 7.728)
C27        10   3.468   2.078   0.657     ( 1.982, 4.955)*
C28        10   5.59    3.84    1.22      ( 2.84, 8.34)
. . .
C62        10   5.689   3.113   0.984     ( 3.462, 7.916)
C63        10   3.724   1.741   0.551     ( 2.479, 4.970)*
C64        10   4.387   2.157   0.682     ( 2.843, 5.930)
. . .
C87        10   7.01    3.44    1.09      ( 4.55, 9.47)
C88        10   3.281   2.265   0.716     ( 1.661, 4.902)*
C89        10   4.78    3.20    1.01      ( 2.49, 7.06)
. . .
C99        10   6.52    4.24    1.34      ( 3.49, 9.56)
C100       10   3.614   2.198   0.695     ( 2.042, 5.186)
The number of intervals not capturing the true mean (μ = 5) is 6/100.
week11 13
Example
• 100 samples of size 15 were drawn from a very skewed distribution (chi-square distribution with 1 df).
Variable N Mean Median TrMean StDev
C1 1500 0.9947 0.4766 0.8059 1.3647
• The 95% CIs (t-intervals) for these 100 samples are given below.
[Figure: histogram of C1; horizontal axis C1 (0 to 15), vertical axis Frequency (0 to 1500)]
week11 14
T Confidence Intervals
Variable   N    Mean    StDev   SE Mean   95.0 % CI
C1         15   0.773   0.939   0.242     ( 0.253, 1.293)
C2         15   1.093   1.491   0.385     ( 0.268, 1.919)
C3         15   0.553   0.735   0.190     ( 0.146, 0.960)*
C4         15   0.387   0.732   0.189     ( -0.019, 0.792)*
C5         15   1.239   2.146   0.554     ( 0.051, 2.427)
...
C23        15   0.491   0.619   0.160     ( 0.148, 0.834)*
C24        15   0.582   1.088   0.281     ( -0.020, 1.184)
C25        15   0.550   0.660   0.170     ( 0.184, 0.915)*
C26        15   0.634   0.769   0.199     ( 0.208, 1.060)
C27        15   0.508   0.528   0.136     ( 0.216, 0.800)*
...
C51        15   1.122   1.292   0.334     ( 0.406, 1.837)
C52        15   0.519   0.664   0.171     ( 0.151, 0.887)*
C53        15   1.666   2.028   0.524     ( 0.543, 2.789)
...
C59        15   1.208   2.297   0.593     ( -0.065, 2.480)
C60        15   0.644   0.525   0.136     ( 0.353, 0.935)*
C61        15   1.088   1.122   0.290     ( 0.466, 1.709)
week11 15
T Confidence Intervals (continuation)
...
C79 15 0.895 0.931 0.240 ( 0.379, 1.411)
C80 15 0.391 0.767 0.198 ( -0.034, 0.816)*
C81 15 1.038 0.992 0.256 ( 0.488, 1.587)
C82 15 0.952 1.407 0.363 ( 0.173, 1.732)
C83 15 0.2763 0.2999 0.0774 ( 0.1102, 0.4424)*
C84 15 1.237 1.999 0.516 ( 0.130, 2.345)
...
C99 15 0.921 0.865 0.223 ( 0.442, 1.400)
C100 15 0.813 1.437 0.371 ( 0.018, 1.609)
The number of intervals not capturing the true mean (μ = 1) is 9/100.
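The two simulation studies above can be imitated with a short Python sketch. It is not the MINITAB session used for the slides: the function name `count_misses` and the seed are hypothetical, chi-square(df) values are drawn as Gamma(df/2, scale 2) variates, and the t* values 2.262 for t(9) and 2.145 for t(14) are standard table values assumed here. The miss counts will differ from the 6/100 and 9/100 reported above because the random samples differ.

```python
import math
import random
import statistics

def count_misses(df, n, t_star, n_samples=100, seed=0):
    """Count 95% t-intervals that fail to contain the true mean of chi-square(df)."""
    rng = random.Random(seed)
    true_mean = df                      # the mean of a chi-square(df) distribution is df
    misses = 0
    for _ in range(n_samples):
        # chi-square(df) is a gamma distribution with shape df/2 and scale 2
        sample = [rng.gammavariate(df / 2, 2) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        if not (xbar - margin <= true_mean <= xbar + margin):
            misses += 1
    return misses

# Moderately skewed case: chi-square with 5 df, samples of size 10
print(count_misses(df=5, n=10, t_star=2.262))
# Very skewed case: chi-square with 1 df, samples of size 15
print(count_misses(df=1, n=15, t_star=2.145))
```

Increasing `n_samples` gives a steadier estimate of the actual coverage than 100 intervals can.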
week11 16
Matched pairs t-test
• In a matched pairs study, subjects are matched in pairs and the outcomes are compared within each matched pair. The experimenter can toss a coin to assign the two treatments to the two subjects in each pair. Matched pairs are also common when randomization is not possible. One situation calling for matched pairs is when observations are taken on the same subjects under different conditions.
• A matched pairs analysis is needed when there are two measurements or observations on each individual and we want to examine the difference.
• For each individual (pair), we find the difference d between the measurements from that pair. Then we treat the d_i as one sample and use the one-sample t statistic to test for no difference between the treatment effects.
• Example: similar to exercise 7.41 on page 446 in IPS.
week11 17
Data Display
Row   Student   Pretest   Posttest   improvement
1     1         30        29         -1
2     2         28        30         2
3     3         31        32         1
4     4         26        30         4
5     5         20        16         -4
6     6         30        25         -5
7     7         34        31         -3
8     8         15        18         3
9     9         28        33         5
10    10        20        25         5
11    11        30        32         2
12    12        29        28         -1
13    13        31        34         3
14    14        29        32         3
15    15        34        32         -2
16    16        20        27         7
17    17        26        28         2
18    18        25        29         4
19    19        31        32         1
20    20        29        32         3
week11 18
• One-sample t-test for the improvement
T-Test of the Mean
Test of mu = 0.000 vs mu > 0.000
Variable   N    Mean    StDev   SE Mean   T      P
improvem   20   1.450   3.203   0.716     2.02   0.029
• MINITAB commands for the paired t-test: Stat > Basic Statistics > Paired t
Paired T-Test and Confidence Interval
Paired T for Posttest - Pretest
             N    Mean    StDev   SE Mean
Posttest     20   28.75   4.74    1.06
Pretest      20   27.30   5.04    1.13
Difference   20   1.450   3.203   0.716
95% CI for mean difference: (-0.049, 2.949)
T-Test of mean difference=0 (vs > 0): T-Value = 2.02 P-Value = 0.029
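The paired analysis reduces to a one-sample t calculation on the differences, which the following Python sketch illustrates with the 20 pretest and posttest scores from the data display. The variable names are illustrative, and t* = 2.093 is the standard t(19) table value for 95% confidence, assumed here since it is not shown on the slides.

```python
import math
import statistics

pretest  = [30, 28, 31, 26, 20, 30, 34, 15, 28, 20,
            30, 29, 31, 29, 34, 20, 26, 25, 31, 29]
posttest = [29, 30, 32, 30, 16, 25, 31, 18, 33, 25,
            32, 28, 34, 32, 32, 27, 28, 29, 32, 32]

# Differences within each pair (posttest minus pretest)
d = [post - pre for pre, post in zip(pretest, posttest)]

n = len(d)
dbar = statistics.mean(d)               # mean improvement
sd = statistics.stdev(d)                # standard deviation of the differences
se = sd / math.sqrt(n)                  # standard error of the mean difference
t = (dbar - 0) / se                     # one-sample t statistic for H0: mean difference = 0

t_star = 2.093                          # assumption: t(19) table value for 95% confidence
ci = (dbar - t_star * se, dbar + t_star * se)

print(f"mean={dbar:.3f} sd={sd:.3f} se={se:.3f} t={t:.2f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The two-sided 95% interval contains 0 even though the one-sided test is significant at the 5% level; the two results answer different questions and do not contradict each other.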
week11 19
Character Stem-and-Leaf Display
Stem-and-leaf of improvement N = 20
Leaf Unit = 1.0
  2    -0 54
  4    -0 32
  6    -0 11
  8     0 11
 (7)    0 2223333
  5     0 4455
  1     0 7
[Figure: histogram of improvement; horizontal axis improvement (-4 to 8), vertical axis Frequency (0 to 6)]
week11 20
Two-sample problems
• The goal of inference is to compare the response in two groups.
• Each group is considered to be a sample from a distinct population.
• The responses in each group are independent of those in the other group.
• A two-sample problem can arise from a randomized comparative experiment or from comparing random samples separately selected from two populations.
• Example:
A medical researcher is interested in the effect of added calcium in our diet on blood pressure. She conducted a randomized comparative experiment in which one group of subjects receives a calcium supplement and a control group gets a placebo.
week11 21
Comparing two means (with two independent samples)
• Here we will look at the problem of comparing two population means when the population variances are known or the sample sizes are large. Suppose that a SRS of size n1 is drawn from an N(μ1, σ1) population and that an independent SRS of size n2 is drawn from an N(μ2, σ2) population. Then the two-sample z statistic for testing the null hypothesis H0: μ1 = μ2 is given by
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
where μ1 - μ2 = 0 under H0, and it has the standard normal N(0,1) sampling distribution.
• Using the standard normal tables, the P-value for the test of H0 against
Ha: μ1 > μ2 is P(Z ≥ z)
Ha: μ1 < μ2 is P(Z ≤ z)
Ha: μ1 ≠ μ2 is 2·P(Z ≥ |z|)
week11 22
Example
• A regional IRS auditor runs a test on a sample of returns filed by March 15 to determine whether the average return this year is larger than last year. The sample data are shown here for a random sample of returns from each year.
• Assume that the std. deviation of returns is known to be about 100 for both years. Test whether the average return is larger this year than last year.
              Last Year   This Year
Mean          380         410
Sample size   100         120
week11 23
Solution
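• The hypotheses to be tested are: H0: μ1 = μ2 vs Ha: μ1 < μ2, where population 1 is last year and population 2 is this year.
• The test statistic is:
$$z = \frac{380 - 410 - 0}{\sqrt{\dfrac{100^2}{100} + \dfrac{100^2}{120}}} = -2.22 < -1.645$$
• The P-value = P(Z < -2.22) = 0.0132 < 0.05, therefore we can reject H0 and conclude that, at the 5% significance level, the average return is larger this year than last year.
• A 95% CI for the difference is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Taking the difference as this year minus last year (μ2 - μ1), this gives
$$30 \pm 1.96 \sqrt{\frac{100^2}{100} + \frac{100^2}{120}} = 30 \pm 26.5 = (3.5,\ 56.5)$$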
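The z test and interval for the IRS example can be checked with Python's standard library. This is a sketch with illustrative variable names; it uses the summary figures given in the example, and the P-value is computed from the unrounded z, so it differs slightly in the fourth decimal from a value read off a table at z = -2.22.

```python
import math
from statistics import NormalDist

xbar1, n1 = 380, 100     # last year: sample mean and sample size
xbar2, n2 = 410, 120     # this year: sample mean and sample size
sigma = 100              # standard deviation assumed known for both years

se = math.sqrt(sigma**2 / n1 + sigma**2 / n2)   # standard error of xbar1 - xbar2
z = (xbar1 - xbar2 - 0) / se                    # two-sample z statistic under H0
p = NormalDist().cdf(z)                         # Ha: mu1 < mu2, so P(Z <= z)

z_star = 1.96                                   # 95% critical value used on the slide
diff = xbar2 - xbar1                            # this year minus last year
ci = (diff - z_star * se, diff + z_star * se)

print(f"z={z:.2f} p={p:.4f} CI=({ci[0]:.1f}, {ci[1]:.1f})")
```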
week11 24
Comparing two population means (unknown std. deviations)
• Suppose that a SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. To test the null hypothesis H0: μ1 = μ2, we compute the two-sample t statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where μ1 - μ2 = 0 under H0.
• This statistic has a t distribution with df approximately equal to the smaller of n1 - 1 and n2 - 1. We can use this distribution to compute the P-value.
week11 25
Example
• The weight gains for n1 = n2 = 8 rats tested on diets 1 and 2 are summarized here. Test whether diet 2 has greater mean weight gain. Use the 5% significance level.
           Diet 1   Diet 2
n          8        8
Std dev.   0.033    0.070
mean       3.1      3.2
• The hypotheses to be tested are: H0: μ1 = μ2 vs Ha: μ1 < μ2.
• The test statistic is
$$t = \frac{3.1 - 3.2 - 0}{\sqrt{\dfrac{0.033^2}{8} + \dfrac{0.070^2}{8}}} = -3.65$$
week11 26
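• The P-value is P(T(7) ≤ -3.65) = P(T(7) ≥ 3.65). From Table D, 3.65 lies between the t(7) critical values 3.499 (upper tail 0.005) and 4.029 (upper tail 0.0025), so 0.0025 < P-value < 0.005. We therefore reject H0 and conclude that the mean weight gain from diet 2 is significantly greater than that from diet 1 (at both the 5% and 1% significance levels).
• A level C CI for the difference between the two means is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• For this example the 95% CI, taking the difference as diet 2 minus diet 1, is
$$3.2 - 3.1 \pm 2.365 \sqrt{\frac{0.033^2}{8} + \frac{0.070^2}{8}} = (0.0353,\ 0.165)$$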
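The two-sample t statistic and the interval for the diet example can be verified with a few lines of Python. The sketch uses only the summary statistics from the example; the variable names are illustrative, df is taken as the smaller of n1 - 1 and n2 - 1 as described above, and t* = 2.365 is the t(7) value used in the slide's interval.

```python
import math

n1, xbar1, s1 = 8, 3.1, 0.033    # diet 1: size, mean, standard deviation
n2, xbar2, s2 = 8, 3.2, 0.070    # diet 2: size, mean, standard deviation

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of xbar1 - xbar2
t = (xbar1 - xbar2 - 0) / se              # two-sample t statistic under H0
df = min(n1 - 1, n2 - 1)                  # conservative degrees of freedom

t_star = 2.365                            # t(7) value for 95% confidence, from the slide
diff = xbar2 - xbar1                      # diet 2 minus diet 1
ci = (diff - t_star * se, diff + t_star * se)

print(f"t={t:.2f} df={df} CI=({ci[0]:.4f}, {ci[1]:.4f})")
```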