week111 the t distribution suppose that a srs of size n is drawn from a n(μ, σ) population. then...

week11 1

The t distribution

• Suppose that an SRS of size n is drawn from a N(μ, σ) population. Then the one-sample t statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has a t distribution with n - 1 degrees of freedom.

• The t distribution has mean 0 and is symmetric.

• There is a different t distribution for each sample size.

• A particular t distribution is specified by its degrees of freedom, which come from the sample standard deviation s.

week11 2

Tests for the population mean when σ is unknown

• Suppose that an SRS of size n is drawn from a population having unknown mean μ and unknown standard deviation σ. To test the hypothesis H0: μ = μ0, we first estimate σ by s, the sample standard deviation, then compute the one-sample t statistic given by

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

• In terms of a random variable T having the t(n - 1) distribution, the P-value for the test of H0 against

Ha : μ > μ0 is P( T ≥ t )

Ha : μ < μ0 is P( T ≤ t )

Ha : μ ≠ μ0 is 2·P( T ≥ |t| )

week11 3

Example

• In a metropolitan area, the concentration of cadmium (Cd) in leaf lettuce was measured in 6 representative gardens where sewage sludge was used as fertilizer. The following measurements (in mg/kg of dry weight) were obtained.

Cd: 21 38 12 15 14 8

Is there strong evidence that the mean concentration of Cd is higher than 12 mg/kg?

Descriptive Statistics

Variable  N   Mean  Median  TrMean  StDev  SE Mean
Cd        6  18.00   14.50   18.00  10.68     4.36

• The hypotheses to be tested are: H0: μ = 12 vs Ha: μ > 12.

week11 4

• The test statistic is:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{18 - 12}{10.68/\sqrt{6}} = 1.38$$

The degrees of freedom are df = 6 - 1 = 5.

Since t = 1.38 < 2.015 (the upper 5% critical value of t(5)), we cannot reject H0 at the 5% level, so there is no strong evidence that the mean Cd concentration is higher than 12.

The P-value satisfies 0.10 < P(T(5) ≥ 1.38) < 0.15, so it is greater than 0.05, indicating a non-significant result.
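The cadmium test can also be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are illustrative (they do not come from the slides), and the tail area is approximated with Simpson's rule instead of being read from a t table.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3      # P(0 <= T <= |t|)
    upper = 0.5 - central        # P(T >= |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

cd = [21, 38, 12, 15, 14, 8]     # Cd measurements from the example
mu0 = 12                         # hypothesized mean under H0
n = len(cd)
se = stdev(cd) / math.sqrt(n)    # stdev() is the sample standard deviation s
t_stat = (mean(cd) - mu0) / se
p_value = t_upper_tail(t_stat, n - 1)   # one-sided, Ha: mu > mu0
print(f"t = {t_stat:.2f}, df = {n - 1}, one-sided P = {p_value:.3f}")
```

The printed P-value should fall inside the table bracket 0.10 < P < 0.15 used above.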

week11 5

CIs for the population mean when σ is unknown

• Suppose that an SRS of size n is drawn from a population having unknown mean μ. A level C CI for μ when σ is unknown is an interval of the form

$$\left( \bar{x} - t^{*}\frac{s}{\sqrt{n}},\ \bar{x} + t^{*}\frac{s}{\sqrt{n}} \right)$$

where t* is the value for the t(n - 1) density curve with area C between -t* and t*.

• Example:

Give a 95% CI for the mean Cd concentration.
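As a rough check of the interval formula on the cadmium data, here is a short Python sketch. The critical value t* = 2.571 for t(5) is an assumed input taken from a standard t table, not computed by the code.

```python
import math
from statistics import mean, stdev

cd = [21, 38, 12, 15, 14, 8]   # Cd measurements from the earlier example
t_star = 2.571                 # assumed: t(5) value with area 0.95 between -t* and t*

n = len(cd)
margin = t_star * stdev(cd) / math.sqrt(n)   # t* times s / sqrt(n)
ci = (mean(cd) - margin, mean(cd) + margin)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")  # prints 95% CI: (6.79, 29.21)
```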

week11 6

• MINITAB commands: Stat > Basic Statistics > 1-Sample t

• MINITAB outputs for the above problem:

T-Test of the Mean

Test of mu = 12.00 vs mu > 12.00

Variable N Mean StDev SE Mean T P

Cd 6 18.00 10.68 4.36 1.38 0.11

T Confidence Intervals

Variable N Mean StDev SE Mean 95.0 % CI

Cd 6 18.00 10.68 4.36 (6.79, 29.21)

week11 7

Question 3 Final exam Dec 2000

• In order to test H0: μ = 60 vs Ha: μ ≠ 60, a random sample of 9 observations (normally distributed) is obtained, yielding x̄ = 55 and s = 5. What is the p-value of the test for this sample?

a) greater than 0.10.
b) between 0.05 and 0.10.
c) between 0.025 and 0.05.
d) between 0.01 and 0.025.
e) less than 0.01.

week11 8

Question

A manufacturing company claims that its new floodlight will last 1000 hours. After collecting a simple random sample of size ten, you determine that a 95% confidence interval for the true mean number of hours that the floodlights will last, μ, is (970, 995). Which of the following are true? (Assume all tests are two-sided.)

I) At any α < .05, we can reject the null hypothesis that the true mean is 1000.

II) If a 99% confidence interval for the mean were determined here, the numerical value 972 would certainly lie in this interval.

III) If we wished to test the null hypothesis H0: μ = 988, we could say that the p-value must be < 0.05.

week11 9

Questions

1. Alpha (level of sig. α) is

a) the probability of rejecting H0 when H0 is true.
b) the probability of supporting H0 when H0 is false.
c) supporting H0 when H0 is true.
d) rejecting H0 when H0 is false.

2. Confidence intervals can be used to do hypothesis tests for

a) left-tailed tests.
b) right-tailed tests.
c) two-tailed tests.

3. The Type II error is supporting a null hypothesis that is false. T/F

week11 10

Robustness of the t procedures

• Robust procedures

A statistical inference procedure is called robust if the probability calculations required are insensitive to violations of the assumptions made.

• t-procedures are quite robust against nonnormality of the population except in the case of outliers or strong skewness.

week11 11

Simulation study

• Let’s generate 100 samples of size 10 from a moderately skewed distribution (Chi-square distribution with 5 df) and calculate the 95% t-intervals to see how many of them contain the true mean μ = 5.

• First let’s have a look at the histogram of the 1000 values generated from this distribution.

Variable     N    Mean  Median  TrMean   StDev
C1        1000  4.9758  4.2788  4.7329  3.1618

[Figure: histogram of the 1000 generated values (x-axis: C1, y-axis: Frequency)]

week11 12

T Confidence Intervals

Variable   N   Mean  StDev  SE Mean        95.0 % CI
C1        10   5.21   3.89     1.23  ( 2.43,  7.99)
. . .
C4        10  4.449  1.593    0.504  ( 3.309, 5.589)
C5        10   5.33   4.23     1.34  ( 2.31,  8.36)
C6        10  3.267  2.312    0.731  ( 1.612, 4.921)*
C7        10  4.981  2.988    0.945  ( 2.844, 7.118)
C8        10  3.725  1.520    0.481  ( 2.638, 4.812)*
C9        10  4.487  2.332    0.738  ( 2.819, 6.155)
. . .
C14       10  4.650  1.854    0.586  ( 3.324, 5.977)
C15       10  2.973  2.163    0.684  ( 1.425, 4.520)*
C16       10  4.685  2.254    0.713  ( 3.072, 6.297)
C26       10  5.594  2.984    0.944  ( 3.459, 7.728)
C27       10  3.468  2.078    0.657  ( 1.982, 4.955)*
C28       10   5.59   3.84     1.22  ( 2.84,  8.34)
. . .
C62       10  5.689  3.113    0.984  ( 3.462, 7.916)
C63       10  3.724  1.741    0.551  ( 2.479, 4.970)*
C64       10  4.387  2.157    0.682  ( 2.843, 5.930)
. . .
C87       10   7.01   3.44     1.09  ( 4.55,  9.47)
C88       10  3.281  2.265    0.716  ( 1.661, 4.902)*
C89       10   4.78   3.20     1.01  ( 2.49,  7.06)
. . .
C99       10   6.52   4.24     1.34  ( 3.49,  9.56)
C100      10  3.614  2.198    0.695  ( 2.042, 5.186)

The number of intervals not capturing the true mean (μ = 5) is 6/100.

week11 13

Example

• 100 samples of size 15 were drawn from a very skewed distribution (Chi-square distribution with 1 df).

Variable     N    Mean  Median  TrMean   StDev
C1        1500  0.9947  0.4766  0.8059  1.3647

[Figure: histogram of the 1500 generated values (x-axis: C1, y-axis: Frequency)]

• The 95% CIs (t-intervals) for these 100 samples are given below.

week11 14

T Confidence Intervals

Variable   N   Mean  StDev  SE Mean         95.0 % CI
C1        15  0.773  0.939    0.242  (  0.253, 1.293)
C2        15  1.093  1.491    0.385  (  0.268, 1.919)
C3        15  0.553  0.735    0.190  (  0.146, 0.960)*
C4        15  0.387  0.732    0.189  ( -0.019, 0.792)*
C5        15  1.239  2.146    0.554  (  0.051, 2.427)
...
C23       15  0.491  0.619    0.160  (  0.148, 0.834)*
C24       15  0.582  1.088    0.281  ( -0.020, 1.184)
C25       15  0.550  0.660    0.170  (  0.184, 0.915)*
C26       15  0.634  0.769    0.199  (  0.208, 1.060)
C27       15  0.508  0.528    0.136  (  0.216, 0.800)*
...
C51       15  1.122  1.292    0.334  (  0.406, 1.837)
C52       15  0.519  0.664    0.171  (  0.151, 0.887)*
C53       15  1.666  2.028    0.524  (  0.543, 2.789)
...
C59       15  1.208  2.297    0.593  ( -0.065, 2.480)
C60       15  0.644  0.525    0.136  (  0.353, 0.935)*
C61       15  1.088  1.122    0.290  (  0.466, 1.709)

week11 15

T Confidence Intervals (continuation)

...

C79 15 0.895 0.931 0.240 ( 0.379, 1.411)

C80 15 0.391 0.767 0.198 ( -0.034, 0.816)*

C81 15 1.038 0.992 0.256 ( 0.488, 1.587)

C82 15 0.952 1.407 0.363 ( 0.173, 1.732)

C83 15 0.2763 0.2999 0.0774 ( 0.1102, 0.4424)*

C84 15 1.237 1.999 0.516 ( 0.130, 2.345)

...

C99 15 0.921 0.865 0.223 ( 0.442, 1.400)

C100 15 0.813 1.437 0.371 ( 0.018, 1.609)

The number of intervals not capturing the true mean (μ = 1) is 9/100.
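The two simulation studies above can be re-run with a short script. This is a sketch, not the MINITAB procedure used for the slides: the function name `count_misses`, the seed, and the use of `random.gammavariate` (a chi-square variate with k df is a Gamma variate with shape k/2 and scale 2) are choices made here for illustration, and the t* values are assumed inputs taken from a t table. Counts will differ from 6/100 and 9/100 because the random samples differ.

```python
import math
import random
from statistics import mean, stdev

def count_misses(chi2_df, n, t_star, true_mean, n_samples=100, seed=0):
    """Count the 95% t-intervals (out of n_samples) that fail to contain true_mean."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_samples):
        # Chi-square(k) is Gamma(shape=k/2, scale=2).
        sample = [rng.gammavariate(chi2_df / 2, 2.0) for _ in range(n)]
        margin = t_star * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) > margin:
            misses += 1
    return misses

# Assumed table values: t* = 2.262 for t(9), t* = 2.145 for t(14).
moderate = count_misses(chi2_df=5, n=10, t_star=2.262, true_mean=5)
strong = count_misses(chi2_df=1, n=15, t_star=2.145, true_mean=1)
print(f"chi-square(5), n = 10: {moderate}/100 intervals miss the mean")
print(f"chi-square(1), n = 15: {strong}/100 intervals miss the mean")
```

Raising `n_samples` gives a steadier estimate of the miss rate, which can then be compared with the nominal 5%.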

week11 16

Matched pairs t-test

• In a matched pairs study, subjects are matched in pairs and the outcomes are compared within each matched pair. The experimenter can toss a coin to assign the two treatments to the two subjects in each pair. Matched pairs are also common when randomization is not possible. One situation calling for matched pairs is when observations are taken on the same subjects under different conditions.

• A matched pairs analysis is needed when there are two measurements or observations on each individual and we want to examine the difference.

• For each individual (pair), we find the difference d between the measurements from that pair. Then we treat the di as one sample and use the one-sample t statistic to test for no difference between the treatment effects.

• Example: similar to exercise 7.41 on page 446 in IPS.

week11 17

Data Display

Row  Student  Pretest  Posttest  improvement
  1        1       30        29           -1
  2        2       28        30            2
  3        3       31        32            1
  4        4       26        30            4
  5        5       20        16           -4
  6        6       30        25           -5
  7        7       34        31           -3
  8        8       15        18            3
  9        9       28        33            5
 10       10       20        25            5
 11       11       30        32            2
 12       12       29        28           -1
 13       13       31        34            3
 14       14       29        32            3
 15       15       34        32           -2
 16       16       20        27            7
 17       17       26        28            2
 18       18       25        29            4
 19       19       31        32            1
 20       20       29        32            3

week11 18

• One-sample t-test for the improvement

T-Test of the Mean

Test of mu = 0.000 vs mu > 0.000

Variable   N   Mean  StDev  SE Mean     T      P
improvem  20  1.450  3.203    0.716  2.02  0.029

• MINITAB commands for the paired t-test: Stat > Basic Statistics > Paired t

Paired T-Test and Confidence Interval

Paired T for Posttest – Pretest

             N   Mean  StDev  SE Mean
Posttest    20  28.75   4.74     1.06
Pretest     20  27.30   5.04     1.13
Difference  20  1.450  3.203    0.716

95% CI for mean difference: (-0.049, 2.949)
T-Test of mean difference = 0 (vs > 0): T-Value = 2.02  P-Value = 0.029
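The paired analysis reduces to a one-sample t computation on the differences. The sketch below recomputes it in Python from the pretest and posttest scores in the data display; the critical value 1.729 for t(19) is an assumed input from a t table, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

pretest = [30, 28, 31, 26, 20, 30, 34, 15, 28, 20,
           30, 29, 31, 29, 34, 20, 26, 25, 31, 29]
posttest = [29, 30, 32, 30, 16, 25, 31, 18, 33, 25,
            32, 28, 34, 32, 32, 27, 28, 29, 32, 32]

# Improvement = Posttest - Pretest, one difference per student.
diffs = [post - pre for pre, post in zip(pretest, posttest)]
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / math.sqrt(n)
t_stat = d_bar / se            # tests H0: mean difference = 0

t_crit = 1.729                 # assumed: upper 5% critical value of t(19)
print(f"mean diff = {d_bar:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
print("reject H0 at the 5% level (Ha: mean difference > 0):", t_stat > t_crit)
```

The mean difference, standard error, and t value should match the MINITAB rows for Difference (1.450, 0.716, 2.02) up to rounding.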

week11 19

Character Stem-and-Leaf Display

Stem-and-leaf of improvement  N = 20
Leaf Unit = 1.0

  2   -0 54
  4   -0 32
  6   -0 11
  8    0 11
 (7)   0 2223333
  5    0 4455
  1    0 7

[Figure: histogram of improvement (x-axis: improvement, y-axis: Frequency)]

week11 20

Two-sample problems

• The goal of inference is to compare the response in two groups.

• Each group is considered to be a sample from a distinct population.

• The responses in each group are independent of those in the other group.

• A two-sample problem can arise from a randomized comparative experiment or from comparing random samples separately selected from two populations.

• Example:

A medical researcher is interested in the effect of added calcium in our diet on blood pressure. She conducted a randomized comparative experiment in which one group of subjects receives a calcium supplement and a control group gets a placebo.

week11 21

Comparing two means (with two independent samples)

• Here we will look at the problem of comparing two population means when the population variances are known or the sample sizes are large. Suppose that an SRS of size n1 is drawn from an N(μ1, σ1) population and that an independent SRS of size n2 is drawn from an N(μ2, σ2) population. Then the two-sample z statistic for testing the null hypothesis H0: μ1 = μ2 is given by

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

and has the standard normal N(0,1) sampling distribution.

• Using the standard normal tables, the P-value for the test of H0 against

Ha : μ1 > μ2 is P( Z ≥ z )

Ha : μ1 < μ2 is P( Z ≤ z )

Ha : μ1 ≠ μ2 is 2·P( Z ≥ |z| )

week11 22

Example

• A regional IRS auditor runs a test on a sample of returns filed by March 15 to determine whether the average return this year is larger than last year. The sample data are shown here for a random sample of returns from each year.

• Assume that the std. deviation of returns is known to be about 100 for both years. Test whether the average return is larger this year than last year.

Last Year This Year

Mean 380 410

Sample size 100 120

week11 23

Solution

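• The hypotheses to be tested are: H0: μ1 = μ2 vs Ha: μ1 < μ2, where μ1 is last year's mean return and μ2 is this year's.

• The test statistic is:

$$z = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{380 - 410 - 0}{\sqrt{\dfrac{100^2}{100} + \dfrac{100^2}{120}}} = -2.22 < -1.645$$

• The P-value = P(Z < -2.22) = 0.0132 < 0.05, therefore we can reject H0 and conclude, at the 5% significance level, that the average return is larger this year than last year.

• A 95% CI for the difference is given by:

$$(\bar{x}_1 - \bar{x}_2) \pm z^{*}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$$

For the difference this year minus last year:

$$30 \pm 1.96\sqrt{\dfrac{100^2}{100} + \dfrac{100^2}{120}} = 30 \pm 26.5 = (3.5,\ 56.5)$$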
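The z test and interval for the tax-return example can be reproduced with Python's standard library, as in the sketch below. `statistics.NormalDist` (Python 3.8+) supplies the standard normal CDF and quantile; the variable names are illustrative, and the interval is computed for this year minus last year to match the worked numbers.

```python
import math
from statistics import NormalDist

mean_last, n_last = 380, 100   # last year (population 1)
mean_this, n_this = 410, 120   # this year (population 2)
sigma = 100                    # standard deviation assumed known for both years

se = math.sqrt(sigma**2 / n_last + sigma**2 / n_this)
z = (mean_last - mean_this) / se
p_value = NormalDist().cdf(z)              # P(Z <= z) for Ha: mu1 < mu2

z_star = NormalDist().inv_cdf(0.975)       # about 1.96 for a 95% CI
diff = mean_this - mean_last
ci = (diff - z_star * se, diff + z_star * se)

print(f"z = {z:.2f}, one-sided P = {p_value:.4f}")
print(f"95% CI for this year minus last year: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Because the script keeps z unrounded, its P-value can differ slightly in the fourth decimal from a table lookup at z = -2.22.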

week11 24

Comparing two population means (unknown std. deviations)

• Suppose that an SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. To test the null hypothesis H0: μ1 = μ2, we compute the two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

• This statistic has approximately a t distribution, with df conservatively taken as the smaller of n1 - 1 and n2 - 1. We can use this distribution to compute the P-value.

week11 25

Example

• The weight gains for n1 = n2 = 8 rats tested on diets 1 and 2 are summarized here. Test whether diet 2 has greater mean weight gain. Use the 5% significance level.

           Diet 1   Diet 2
n               8        8
Mean          3.1      3.2
Std dev.    0.033    0.070

• The hypotheses to be tested are: H0: μ1 = μ2 vs Ha: μ1 < μ2.

• The test statistic is

$$t = \frac{3.1 - 3.2 - 0}{\sqrt{\dfrac{0.033^2}{8} + \dfrac{0.070^2}{8}}} = -3.65$$

week11 26

• The P-value is P(T(7) ≤ -3.65) = P(T(7) ≥ 3.65). From table D we have 0.005 < P-value < 0.01, so we reject H0 and conclude that the mean weight gain from diet 2 is significantly greater than that from diet 1 (at both the 5% and 1% significance levels).

• A level C CI for the difference between the two means is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^{*}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$$

• For this example the 95% CI (for diet 2 minus diet 1) is

$$(3.2 - 3.1) \pm 2.365\sqrt{\dfrac{0.033^2}{8} + \dfrac{0.070^2}{8}} = (0.0353,\ 0.165)$$
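For the diet example, the two-sample t statistic and the conservative interval can be recomputed from the summary statistics alone. The following Python sketch assumes t* = 2.365 for t(7) from a t table and uses the conservative df (the smaller of n1 - 1 and n2 - 1) described earlier; the variable names are illustrative.

```python
import math

n1, mean1, sd1 = 8, 3.1, 0.033   # diet 1
n2, mean2, sd2 = 8, 3.2, 0.070   # diet 2

se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t_stat = (mean1 - mean2) / se     # tests H0: mu1 = mu2 against Ha: mu1 < mu2
df = min(n1, n2) - 1              # conservative degrees of freedom

t_star = 2.365                    # assumed: t(7) value with area 0.95 between -t* and t*
diff = mean2 - mean1              # diet 2 minus diet 1
ci = (diff - t_star * se, diff + t_star * se)

print(f"t = {t_stat:.2f}, df = {df}")
print(f"95% CI for diet 2 minus diet 1: ({ci[0]:.4f}, {ci[1]:.4f})")
```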
