Chapter 13 Inference About Comparing Two Populations

Upload: audra-norris

Post on 02-Jan-2016


Page 1: Chapter 13 Inference About Comparing Two Populations

Chapter 13: Inference About Comparing Two Populations

Page 2: Chapter 13 Inference About Comparing Two Populations

Comparing Two Populations

Previously we looked at techniques to estimate and test parameters for one population:

Population Mean $\mu$, Population Variance $\sigma^2$, and Population Proportion $p$

We will still consider these parameters when we are looking at two populations; however, our interest will now be:

The difference between two means, $\mu_1 - \mu_2$.

The ratio of two variances, $\sigma_1^2 / \sigma_2^2$.

The difference between two proportions, $p_1 - p_2$.

Page 3: Chapter 13 Inference About Comparing Two Populations

Difference of Two Means

In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another.

Population 1
Parameters: $\mu_1$, $\sigma_1^2$
Sample size: $n_1$
Statistics: $\bar{x}_1$, $s_1^2$

(Likewise, we consider $\mu_2$, $\sigma_2^2$, $n_2$, $\bar{x}_2$, and $s_2^2$ for Population 2.)

Page 4: Chapter 13 Inference About Comparing Two Populations

Difference of Two Means


Because we are comparing two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

Page 5: Chapter 13 Inference About Comparing Two Populations

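Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large ($n_1, n_2 > 30$).

2. The expected value of $\bar{x}_1 - \bar{x}_2$ is

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

3. The variance of $\bar{x}_1 - \bar{x}_2$ is

$$V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

and the standard error is:

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$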

Page 6: Chapter 13 Inference About Comparing Two Populations

Making Inferences About $\mu_1 - \mu_2$

Since $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large ($n_1, n_2 > 30$), then:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for $\mu_1 - \mu_2$…

Page 7: Chapter 13 Inference About Comparing Two Populations

Making Inferences About $\mu_1 - \mu_2$

…except that, in practice, the z statistic is rarely used, since the population variances are unknown.

Instead we use a t-statistic. We consider two cases for the unknown population variances: when we believe they are equal and conversely when they are not equal.


Page 8: Chapter 13 Inference About Comparing Two Populations

When are variances equal?

How do we know when the population variances are equal?

Since the population variances are unknown, we can’t know for certain whether they’re equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.

Page 9: Chapter 13 Inference About Comparing Two Populations

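Test Statistic for $\mu_1 - \mu_2$ (equal variances)

1) Calculate $s_p^2$, the pooled variance estimator, as:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

2) …and use it here:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

with $\nu = n_1 + n_2 - 2$ degrees of freedom.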
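The two steps of the equal-variances test can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_t` and the summary statistics passed to it are hypothetical and do not come from the slides.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic computed from summary statistics."""
    df = n1 + n2 - 2
    # Step 1: pooled variance estimator
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Step 2: t statistic built on the pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / se
    return sp2, t, df

# Hypothetical summary statistics, chosen only to exercise the function
sp2, t, df = pooled_t(10.0, 4.0, 10, 8.0, 6.0, 12)
print(round(sp2, 3), round(t, 3), df)
```

The returned t value would then be compared with a critical value from a t table with `df` degrees of freedom.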

Page 10: Chapter 13 Inference About Comparing Two Populations

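CI Estimator for $\mu_1 - \mu_2$ (equal variances)

The confidence interval estimator for $\mu_1 - \mu_2$ when the population variances are equal is given by:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $s_p^2$ is the pooled variance estimator and the degrees of freedom are $\nu = n_1 + n_2 - 2$.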

Page 11: Chapter 13 Inference About Comparing Two Populations

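Test Statistic for $\mu_1 - \mu_2$ (unequal variances)

The test statistic for $\mu_1 - \mu_2$ when the population variances are unequal is given by:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Likewise, the confidence interval estimator is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

In both cases the degrees of freedom are:

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$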
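As a rough sketch, the unequal-variances statistic and its degrees of freedom can be computed from summary statistics as follows. The function name `unequal_variances_t` and the inputs are hypothetical, picked only for illustration.

```python
import math

def unequal_variances_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Unequal-variances t statistic and its approximate degrees of freedom."""
    a = var1 / n1  # s1^2 / n1
    b = var2 / n2  # s2^2 / n2
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics (not from the slides)
t, df = unequal_variances_t(10.0, 4.0, 10, 8.0, 6.0, 12)
print(round(t, 3), round(df, 2))
```

For these inputs the degrees of freedom come out just under $n_1 + n_2 - 2 = 20$, in line with the point that the unequal-variances case never has more degrees of freedom than the equal-variances case.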

Page 12: Chapter 13 Inference About Comparing Two Populations

Which case to use? Equal variance or unequal variance?

Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal variances t-test.

This is so because, for any two given samples:

(the number of degrees of freedom for the equal variances case) ≥ (the number of degrees of freedom for the unequal variances case)

Larger numbers of degrees of freedom have the same effect as having larger sample sizes.

Page 13: Chapter 13 Inference About Comparing Two Populations

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person the number of calories consumed at lunch was recorded.

Page 14: Chapter 13 Inference About Comparing Two Populations

Example: Making an inference about $\mu_1 - \mu_2$

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
.           .
.           .

Solution:
• The data are interval.
• The parameter to be tested is the difference between two means.
• The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

Page 15: Chapter 13 Inference About Comparing Two Populations

Example: Making an inference about $\mu_1 - \mu_2$

• The hypotheses are:

H0: ($\mu_1 - \mu_2$) = 0
H1: ($\mu_1 - \mu_2$) < 0

– To check whether the population variances are equal, we use computer output (Xm13-01) to find the sample variances. We have $s_1^2$ = 4103 and $s_2^2$ = 10,670.

– It appears that the variances are unequal.

Page 16: Chapter 13 Inference About Comparing Two Populations

Example 13.1…

A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. For each person the number of calories consumed at lunch was recorded.

The data:
• Independent populations: either you eat high-fiber cereal or you don’t.
• $n_1 + n_2 = 150$
• There is reason to believe the population variances are unequal.

Recall H1: ($\mu_1 - \mu_2$) < 0

Page 17: Chapter 13 Inference About Comparing Two Populations

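Example 13.1…

Thus, our test statistic is:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The number of degrees of freedom is:

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Hence the rejection region is:

$$t < -t_{\alpha,\nu}$$

COMPUTE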

Page 18: Chapter 13 Inference About Comparing Two Populations

Example 13.1…

Our rejection region: $t < -t_{\alpha,\nu} = -1.658$

Our test statistic: $t = -2.09$

Since our test statistic (-2.09) is less than our critical value of t (-1.658), we reject H0 in favor of H1. That is, there is sufficient evidence to support the claim that high-fiber cereal eaters consume fewer calories at lunch.

COMPUTE

INTERPRET

Page 19: Chapter 13 Inference About Comparing Two Populations

Confidence Interval

Suppose we wanted to compute a 95% confidence interval estimate of the difference between mean caloric intake for consumers and non-consumers of high-fiber cereals:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -29.21 \pm 27.65$$

so that LCL = -56.86 and UCL = -1.56. That is, we estimate that non-consumers of high-fiber cereal eat between 1.56 and 56.86 more calories than consumers.

Page 20: Chapter 13 Inference About Comparing Two Populations

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.2
– An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
– The operations manager would like to know whether the assembly times under the two methods differ.

Page 21: Chapter 13 Inference About Comparing Two Populations

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.2
– Two samples are randomly and independently selected:
• A sample of 25 workers assembled the chair using Method A.
• A sample of 25 workers assembled the chair using Method B.
• The assembly times were recorded.
– Do the assembly times of the two methods differ?

Page 22: Chapter 13 Inference About Comparing Two Populations

Example 13.2: Making an inference about $\mu_1 - \mu_2$

Assembly times in minutes

Method A   Method B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
.          .
.          .

Solution

• The data are interval.

• The parameter of interest is the difference between two population means.

• The claim to be tested is whether a difference between the two methods exists.

Page 23: Chapter 13 Inference About Comparing Two Populations

Example 13.2: Making an inference about $\mu_1 - \mu_2$

• Compute: Manually

– The hypothesis test is:

H0: ($\mu_1 - \mu_2$) = 0

H1: ($\mu_1 - \mu_2$) ≠ 0

– To check whether the two unknown population variances are equal, we calculate $s_1^2$ and $s_2^2$ (Xm13-02).

Page 24: Chapter 13 Inference About Comparing Two Populations

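Example: Making an inference about $\mu_1 - \mu_2$

Manually

From the data:

$$\bar{x}_1 = 6.288, \quad \bar{x}_2 = 6.016, \quad s_1^2 = 0.8478, \quad s_2^2 = 1.3031$$

The pooled variance estimator is:

$$s_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$

– To calculate the t-statistic we have:

$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$

COMPUTE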

Page 25: Chapter 13 Inference About Comparing Two Populations

Example 13.2…

The assembly times for each of the two methods are recorded and preliminary data are prepared…

COMPUTE

The sample variances are similar, hence we will assume that the population variances are equal…

Page 26: Chapter 13 Inference About Comparing Two Populations

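Example 13.2…

Recall, we are doing a two-tailed test, hence the rejection region will be:

$$t < -t_{\alpha/2,\nu} \quad \text{or} \quad t > t_{\alpha/2,\nu}$$

The number of degrees of freedom is:

$$\nu = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$$

Hence our critical values of t (and our rejection region) become:

$$t < -t_{.025,48} = -2.0106 \quad \text{or} \quad t > t_{.025,48} = 2.0106$$

COMPUTE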

Page 27: Chapter 13 Inference About Comparing Two Populations

Example 13.2…

In order to calculate our t-statistic, we need to first calculate the pooled variance estimator ($s_p^2$ = 1.076), followed by the t-statistic ($t$ = 0.93).

COMPUTE

Page 28: Chapter 13 Inference About Comparing Two Populations

Example 13.2…

Since our calculated t-statistic does not fall into the rejection region, we cannot reject H0 in favor of H1, that is, there is not sufficient evidence to infer that the mean assembly times differ.
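The arithmetic behind this conclusion can be checked with a short script. The sketch below uses the summary statistics quoted for Example 13.2; the critical value 2.0106 is entered by hand as a t-table value for 48 degrees of freedom, and the 5% significance level is an assumption, since the slides do not state it for this example.

```python
import math

# Summary statistics quoted for Example 13.2
n_a = n_b = 25
mean_a, mean_b = 6.288, 6.016
var_a, var_b = 0.8478, 1.3031

# Assumption: alpha = .05 two-tailed, so the table value for 48 d.f. is 2.0106
t_critical = 2.0106

df = n_a + n_b - 2
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df         # pooled variance
t_stat = (mean_a - mean_b) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))
reject_h0 = abs(t_stat) > t_critical                        # two-tailed test

print(df, round(sp2, 3), round(t_stat, 2), reject_h0)
```

The statistic lands well inside the interval between the two critical values, which is why H0 is not rejected.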

INTERPRET

Page 29: Chapter 13 Inference About Comparing Two Populations

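Confidence Interval

We can compute a 95% confidence interval estimate for the difference in mean assembly times as:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.076\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.590$$

That is, we estimate that the difference between the mean assembly times of the two methods lies between –.32 and .86 minutes. Note: zero is included in this confidence interval…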

Page 30: Chapter 13 Inference About Comparing Two Populations

Checking the Required Conditions for the Equal Variances Case (Example 13.2)

The data appear to be approximately normal.

[Figure: histograms of the assembly times for Design A and Design B]

Page 31: Chapter 13 Inference About Comparing Two Populations

Identifying Factors I

Factors that identify the equal-variances t-test and estimator of $\mu_1 - \mu_2$:

Page 32: Chapter 13 Inference About Comparing Two Populations

Identifying Factors II

Factors that identify the unequal-variances t-test and estimator of $\mu_1 - \mu_2$:

Page 33: Chapter 13 Inference About Comparing Two Populations

13.4 Matched Pairs Experiment

• What is a matched pairs experiment?

• Why are matched pairs experiments needed?

• How do we deal with data produced in this way?

The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

Page 34: Chapter 13 Inference About Comparing Two Populations

Example 13.3

– To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.

– In particular, the salaries offered to finance majors were compared to those offered to marketing majors.

– Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one. The data are stored in file Xm13-03.

– Can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?

Page 35: Chapter 13 Inference About Comparing Two Populations

Example 13.3

• Solution
– Compare two populations of interval data.

The parameter tested is $\mu_1 - \mu_2$, where
$\mu_1$ = the mean of the highest salary offered to Finance MBAs
$\mu_2$ = the mean of the highest salary offered to Marketing MBAs

Finance   Marketing
61,228    73,361
51,836    36,956
20,620    63,627
73,356    71,069
84,186    40,203
.         .
.         .

– H0: ($\mu_1 - \mu_2$) = 0

H1: ($\mu_1 - \mu_2$) > 0

Page 36: Chapter 13 Inference About Comparing Two Populations

Example 13.3

• Solution – continued

From the data we have:

$\bar{x}_1$ = 65,624, $\bar{x}_2$ = 60,423, $s_1^2$ = 360,433,294, $s_2^2$ = 262,228,559

• Let us assume equal variances.

Equal Variances
                               Finance      Marketing
Mean                           65624        60423
Variance                       360433294    262228559
Observations                   25           25
Pooled Variance                311330926
Hypothesized Mean Difference   0
df                             48
t Stat                         1.04
P(T<=t) one-tail               0.1513
t Critical one-tail            1.6772
P(T<=t) two-tail               0.3026
t Critical two-tail            2.0106

There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.

Page 37: Chapter 13 Inference About Comparing Two Populations

The effect of a large sample variability

• Question
– The difference between the sample means is 65,624 – 60,423 = 5,201.
– So why could we not reject H0 in favor of H1 ($\mu_1 - \mu_2 > 0$)?

Page 38: Chapter 13 Inference About Comparing Two Populations

The effect of a large sample variability

• Answer:
– $s_p^2$ is large (because the sample variances are large): $s_p^2$ = 311,330,926.
– A large variance reduces the value of the t statistic, and it becomes more difficult to reject H0.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Page 39: Chapter 13 Inference About Comparing Two Populations

Reducing the variability

The values each sample consists of might markedly vary...

[Figure: the range of observations in sample A and the range of observations in sample B]

Page 40: Chapter 13 Inference About Comparing Two Populations

Reducing the variability

...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.

[Figure: the range of the differences, plotted around 0]

Page 41: Chapter 13 Inference About Comparing Two Populations

The matched pairs experiment

• Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$.

• This formulation has the benefit of a smaller variability.

Group 1   Group 2   Difference
10        12        -2
15        11        +4

Mean1 = 12.5, Mean2 = 11.5
Mean1 – Mean2 = 1; mean of the differences = 1
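The identity in the small table can be verified directly. This minimal Python check uses only the four values shown; the variable names are illustrative.

```python
from statistics import mean

group1 = [10, 15]
group2 = [12, 11]

# Pairwise differences, observation by observation
differences = [a - b for a, b in zip(group1, group2)]

diff_of_means = mean(group1) - mean(group2)
mean_of_diffs = mean(differences)

print(differences, diff_of_means, mean_of_diffs)
```

Both quantities agree, which is what allows the hypotheses to be restated in terms of the mean of the differences.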

Page 42: Chapter 13 Inference About Comparing Two Populations

The matched pairs experiment

• Example 13.4
– It was suspected that salary offers were affected by students’ GPA (which caused $s_1^2$ and $s_2^2$ to increase).
– To reduce this variability, the following procedure was used:
• 25 ranges of GPAs were predetermined.
• Students from each major were randomly selected, one from each GPA range.
• The highest salary offer for each student was recorded.
– From the data presented, can we conclude that Finance majors are offered higher salaries?

Page 43: Chapter 13 Inference About Comparing Two Populations

Example 13.4…

The numbers in black are the original starting salary data; the numbers in blue were calculated.

Although a student is either in Finance OR in Marketing (i.e. independent), the fact that the data are grouped in this fashion makes it a matched pairs experiment (i.e. the two students in group #1 are ‘matched’ by their GPA range).

The difference of the means is equal to the mean of the differences, hence we will consider the “mean of the paired differences” as our parameter of interest: $\mu_D$

Page 44: Chapter 13 Inference About Comparing Two Populations

Example 13.4…

Do Finance majors have higher salary offers than Marketing majors?

Since: $\mu_D = \mu_1 - \mu_2$

We want to research this hypothesis:

H1: $\mu_D > 0$

(and our null hypothesis becomes H0: $\mu_D = 0$)

IDENTIFY

Page 45: Chapter 13 Inference About Comparing Two Populations

Test Statistic for $\mu_D$

The test statistic for the mean of the population of differences ($\mu_D$) is:

$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}}$$

which is Student t distributed with $n_D - 1$ degrees of freedom, provided that the differences are normally distributed.

Thus our rejection region becomes: $t > t_{\alpha,\nu}$, where $\nu = n_D - 1$.
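A minimal sketch of the matched pairs statistic in Python follows. The function name `paired_t` is hypothetical, and the data are the two pairs from the small Group 1 / Group 2 table shown earlier in the deck, used only because they are easy to verify by hand.

```python
import math
from statistics import mean, stdev

def paired_t(sample1, sample2, hypothesized_mean_diff=0.0):
    """Matched pairs t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(diffs)
    x_bar_d = mean(diffs)   # sample mean of the differences
    s_d = stdev(diffs)      # sample standard deviation of the differences
    t = (x_bar_d - hypothesized_mean_diff) / (s_d / math.sqrt(n_d))
    return t, n_d - 1

t, df = paired_t([10, 15], [12, 11])
print(round(t, 4), df)
```

With only two pairs the test has a single degree of freedom, so this is purely a check of the arithmetic, not a meaningful test.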

Page 46: Chapter 13 Inference About Comparing Two Populations

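Example 13.4…

From the data, we calculate the sample mean $\bar{x}_D$ and the sample standard deviation $s_D$ of the differences…

…which in turn we use for our t-statistic:

$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = 3.81$$

…which we compare to our critical value of t: $t_{\alpha,\nu} = 1.711$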

COMPUTE

Page 47: Chapter 13 Inference About Comparing Two Populations

Example 13.4…

• Since our calculated value of t (3.81) is greater than our critical value of t (1.711), it falls in the rejection region, and we reject H0 in favor of H1. That is, there is overwhelming evidence (since the p-value = .0004) that Finance majors do obtain higher starting salary offers than their peers in Marketing.

INTERPRET

Compare…

Page 48: Chapter 13 Inference About Comparing Two Populations

Confidence Interval Estimator for $\mu_D$

We can derive the confidence interval estimator for $\mu_D$ algebraically as:

$$\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}}$$

Example 13.5

In the previous example, what is the 95% confidence interval estimate of the mean difference in salary offers between the two business majors?

$$\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}} = 5{,}065 \pm 2{,}744$$

That is, the mean of the population differences is between LCL = 2,321 and UCL = 7,809 dollars.

Page 49: Chapter 13 Inference About Comparing Two Populations

Identifying Factors

Factors that identify the t-test and estimator of $\mu_D$:

Page 50: Chapter 13 Inference About Comparing Two Populations

Inference about the ratio of two variances

So far we’ve looked at comparing measures of central location, namely the means of two populations.

When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is:

$$\frac{\sigma_1^2}{\sigma_2^2}$$

The sampling statistic:

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

is F distributed with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

Page 51: Chapter 13 Inference About Comparing Two Populations

Inference about the ratio of two variances

Our null hypothesis is always:

H0: $\sigma_1^2 / \sigma_2^2 = 1$

(i.e. the variances of the two populations will be equal, hence their ratio will be one)

Therefore, our statistic simplifies to:

$$F = \frac{s_1^2}{s_2^2}$$

Page 52: Chapter 13 Inference About Comparing Two Populations

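CI Estimator of $\sigma_1^2 / \sigma_2^2$

With algebraic manipulation we get

$$\text{LCL} = \left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}$$

$$\text{UCL} = \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.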

Page 53: Chapter 13 Inference About Comparing Two Populations

Example 13.6…

In Example 13.1, we looked at the variances of the samples of people who consumed high-fiber cereal and those who did not, and assumed they were not equal. We can use the ideas just developed to test whether this is in fact the case.

We want to show: H1: $\sigma_1^2 / \sigma_2^2 \neq 1$

(the variances are not equal to each other)

Hence we have our null hypothesis: H0: $\sigma_1^2 / \sigma_2^2 = 1$

IDENTIFY

Page 54: Chapter 13 Inference About Comparing Two Populations

Example 13.6…

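Since our research hypothesis is: H1: $\sigma_1^2 / \sigma_2^2 \neq 1$

We are doing a two-tailed test, and our rejection region is:

$$F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < F_{1-\alpha/2,\nu_1,\nu_2}$$

CALCULATE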

Page 55: Chapter 13 Inference About Comparing Two Populations

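Example 13.6…

Our test statistic is:

$$F = \frac{s_1^2}{s_2^2} = \frac{4103}{10{,}670} = .3845$$

This value is below the lower critical value of .58, so it falls in the rejection region. Hence there is sufficient evidence to reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.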
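The comparison can be reproduced with a few lines of Python. The sample variances are the ones quoted in Example 13.1; the two critical values (.58 and 1.61) are read off the slide's F-distribution figure and entered by hand, so treat them as table inputs rather than computed quantities.

```python
# Sample variances from Example 13.1 (consumers, non-consumers)
s1_sq = 4103
s2_sq = 10670

# Critical values taken from the slide's figure (assumed table look-ups)
f_lower = 0.58
f_upper = 1.61

f_stat = s1_sq / s2_sq
reject_h0 = f_stat < f_lower or f_stat > f_upper   # two-tailed rejection region

print(round(f_stat, 4), reject_h0)
```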

CALCULATE

[Figure: F distribution with the critical values .58 and 1.61 marked]

Page 56: Chapter 13 Inference About Comparing Two Populations

Example 13.7…

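If we wanted to determine the 95% confidence interval estimate of the ratio of the two population variances in Example 13.1, we would proceed as follows…

The confidence interval estimator for $\sigma_1^2 / \sigma_2^2$ is:

$$\text{LCL} = \left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}, \qquad \text{UCL} = \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$

CALCULATE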

Page 57: Chapter 13 Inference About Comparing Two Populations

Example 13.7…

The 95% confidence interval estimate of the ratio of the two population variances in Example 13.1 is:

LCL = .2388, UCL = .6614

That is, we estimate that $\sigma_1^2 / \sigma_2^2$ lies between .2388 and .6614.

Note that one (1.00) is not within this interval…
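As a rough check of these limits, the sketch below recomputes them from the Example 13.1 sample variances. The function name `variance_ratio_ci` is hypothetical. The upper-tail value 1.61 comes from the slide's figure; the second critical value $F_{\alpha/2,\nu_2,\nu_1}$ is not shown in the transcript, so it is approximated here as 1/.58 using the reciprocal relationship between lower- and upper-tail F values. Because .58 is rounded, the upper limit only matches to about two decimals.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper_12, f_upper_21):
    """Confidence limits for sigma1^2 / sigma2^2.

    f_upper_12: F_{alpha/2, nu1, nu2}; f_upper_21: F_{alpha/2, nu2, nu1}.
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper_12, ratio * f_upper_21

# Assumption: F_{.025, nu2, nu1} is approximated by 1 / .58
lcl, ucl = variance_ratio_ci(4103, 10670, 1.61, 1 / 0.58)
print(round(lcl, 4), round(ucl, 3))
```

Either way, the whole interval lies below one, which is consistent with the conclusion of the F-test.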

CALCULATE


Page 58: Chapter 13 Inference About Comparing Two Populations

Identifying Factors

Factors that identify the F-test and estimator of $\sigma_1^2 / \sigma_2^2$:

Page 59: Chapter 13 Inference About Comparing Two Populations

Difference Between Two Population Proportions

We will now look at procedures for drawing inferences about the difference between populations whose data are nominal (i.e. categorical).

As mentioned previously, with nominal data we calculate proportions of occurrences of each type of outcome. Thus, the parameter to be tested and estimated in this section is the difference between two population proportions: p1–p2.

Page 60: Chapter 13 Inference About Comparing Two Populations

Sampling from two populations of nominal data

              Population 1     Population 2
Parameter:    $p_1$            $p_2$
Sample size:  $n_1$            $n_2$
Statistic:    $\hat{p}_1$      $\hat{p}_2$

Page 61: Chapter 13 Inference About Comparing Two Populations

Statistic and Sampling Distribution…

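To draw inferences about the parameter p1–p2, we take a sample from each population, calculate the sample proportions, and look at their difference.

$\hat{p}_1 - \hat{p}_2$ is an unbiased estimator for p1–p2, where $\hat{p}_1 = x_1 / n_1$ for $x_1$ successes in a sample of size $n_1$ from population 1 (and likewise $\hat{p}_2 = x_2 / n_2$ for population 2).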

Page 62: Chapter 13 Inference About Comparing Two Populations

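Sampling Distribution

The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if the sample sizes are large enough so that $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$, and $n_2(1 - p_2)$ are all greater than or equal to 5.

Since it’s “approximately normal”, we can describe the normal distribution in terms of mean and variance:

$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2, \qquad V(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$

…hence this z-variable will also be approximately standard normally distributed:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$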

Page 63: Chapter 13 Inference About Comparing Two Populations

Testing and Estimating p1–p2…

Because the population proportions (p1 & p2) are unknown, the standard error:

$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

is unknown. Thus, we have two different estimators for the standard error of $\hat{p}_1 - \hat{p}_2$, which depend upon the null hypothesis. We’ll look at these cases on the next slide…

Page 64: Chapter 13 Inference About Comparing Two Populations

Test Statistic for p1–p2…

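There are two cases to consider…

Case 1: H0: (p1–p2) = 0. Use the pooled proportion estimate $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Case 2: H0: (p1–p2) = D, where D ≠ 0. No pooling is done:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$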
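The difference between the two cases lies entirely in how the standard error is estimated. The Python sketch below shows both versions side by side. The function names `z_case1` and `z_case2` and the counts are hypothetical; they are not the soap-packaging data, whose summary table did not survive in this transcript.

```python
import math

def z_case1(x1, n1, x2, n2):
    """Case 1: H0 is p1 - p2 = 0, so the standard error uses the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

def z_case2(x1, n1, x2, n2, d):
    """Case 2: H0 is p1 - p2 = d with d != 0, so no pooling is done."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d) / se

# Hypothetical counts: 120 of 500 successes versus 90 of 500
z1 = z_case1(120, 500, 90, 500)
z2 = z_case2(120, 500, 90, 500, 0.03)
print(round(z1, 3), round(z2, 3))
```

Each z value would be compared with a standard normal critical value, for example 1.645 for a one-tailed test at the 5% level.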

Page 65: Chapter 13 Inference About Comparing Two Populations

Example 13.8…

A consumer packaged goods (CPG) company is test marketing two new versions of soap packaging. Version one (bright colors) is distributed in one supermarket, while version two (simple colors) is in another. Since the first version is more expensive, it must outsell the other design; that is, its market share, p1, must be greater than that of the other soap package design, p2.

That is, we want to know, is p1 > p2? or, using the language of statistics:

H1: (p1–p2) > 0

Hence our null hypothesis will be H0: (p1–p2) = 0 [case 1]

IDENTIFY

Page 66: Chapter 13 Inference About Comparing Two Populations

Example 13.8…

Here is the summary data…

Our null hypothesis is H0: (p1–p2) = 0, i.e. this is a “case 1” type problem, hence we need to calculate the pooled proportion:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

IDENTIFY

Page 67: Chapter 13 Inference About Comparing Two Populations

Example 13.8…

At a 5% significance level, our rejection region is: $z > z_{\alpha} = z_{.05} = 1.645$

The value of our z-statistic is: $z = 2.90$

Since 2.90 > 1.645, we reject H0 in favor of H1; that is, there is enough evidence to infer that the brightly colored design is more popular than the simple design.

CALCULATE

Compare…

Page 68: Chapter 13 Inference About Comparing Two Populations

Example 13.9…

Suppose that, in our soap package test marketing scenario, instead of just showing a difference between the two package versions, the brightly colored design had to outsell the simple design by at least 3%.

Our research hypothesis now becomes:

H1: (p1–p2) > .03

And so our null hypothesis is: H0: (p1–p2) = .03

IDENTIFY

Since the r.h.s. of the H0 equation is not zero, it’s a “case 2” type problem.

Page 69: Chapter 13 Inference About Comparing Two Populations

Example 13.9…Same summary data as before:

Since this is a “case 2” type problem, we don’t need to calculate the pooled proportion; we can go straight to z:

IDENTIFY

Page 70: Chapter 13 Inference About Comparing Two Populations

Example 13.9…

Since our calculated z-statistic (1.15) does not fall into our rejection region (z > 1.645), there is not enough evidence to infer that the brightly colored design outsells the other design by 3% or more.

INTERPRET

Page 71: Chapter 13 Inference About Comparing Two Populations

Confidence Intervals…

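The confidence interval estimator for p1–p2 is given by:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

and as you may suspect, it’s valid when $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$ are all greater than or equal to 5.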
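A minimal sketch of this estimator in Python is shown below. The function name `proportion_diff_ci` and the counts are hypothetical, chosen for illustration only; 1.96 is the standard normal value $z_{.025}$ for a 95% interval.

```python
import math

def proportion_diff_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    margin = z_crit * se
    return diff - margin, diff + margin

# Hypothetical counts: 120 of 500 successes versus 90 of 500
lcl, ucl = proportion_diff_ci(120, 500, 90, 500)
print(round(lcl, 4), round(ucl, 4))
```

Note that the interval always uses the unpooled standard error, because it makes no assumption that the two proportions are equal.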

Page 72: Chapter 13 Inference About Comparing Two Populations

Example 13.10…Create a 95% confidence interval for the difference between the two proportions of packaged soap sales from Ex. 13.8:

COMPUTE

Page 73: Chapter 13 Inference About Comparing Two Populations

Identifying Factors…

•Factors that identify the z-test and estimator for p1–p2