Chapter 12

Inference About a Population

Introduction

• In this chapter we utilize the approach developed before to describe a population.
– Identify the parameter to be estimated or tested.
– Specify the parameter’s estimator and its sampling distribution.
– Construct a confidence interval estimator or perform a hypothesis test.

• We shall develop techniques to estimate and test three population parameters.
– The expected value $\mu$
– The variance $\sigma^2$
– The population proportion $p$ (for qualitative data)

12.1 Inference About a Population Mean When the Population Standard Deviation is Unknown

• Recall: By the central limit theorem, when $\sigma^2$ is known, $\bar{x}$ is (at least approximately) normally distributed if:
– the sample is drawn from a normal population, or
– the population is not normal but the sample is sufficiently large.

• When $\sigma^2$ is unknown, another random variable describes the distribution of $\bar{x}$.

The t-Statistic

When $\sigma^2$ is unknown, we use $s^2$ instead, and the Z statistic is then replaced by the t-statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad\Longrightarrow\quad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

When the sampled population is normally distributed, the statistic t is Student t distributed with $n - 1$ degrees of freedom.

The Student t distribution is mound-shaped and symmetrical around zero. The degrees of freedom determine the distribution shape.

[Figure: two Student t density curves centered at 0, one with n1 degrees of freedom and one with n2 degrees of freedom, where n1 < n2. Critical values are read from the t-table.]

Testing $\mu$ when $\sigma$ is unknown

• Example 12.1 - Productivity of newly hired trainees
– In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.
– It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.
– Can we conclude that this belief is correct, based on productivity observations of 50 trainees? (The raw data are in the file Xm12-01.)

• Example 12.1 – Solution
– The problem objective is to describe the population of the number of packages processed in one hour.
– The data are quantitative.
– The hypotheses are:
H0: $\mu = 450$
H1: $\mu > 450$ (We want to prove that the trainees reach 90% productivity of experienced workers.)
– The t statistic:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49$$

• Solution - continued

Observe: H1 has the form $\mu > \mu_0$, thus the rejection region is $\bar{x} > \bar{x}_L$.

After transforming $\bar{x}$ into a t-statistic, we express the rejection region in terms of the statistic t:

$$t > t_{\alpha,n-1}$$

• Solution continued (solving by hand)

The rejection region is $t > t_{\alpha,n-1}$.

$t_{\alpha,n-1} = t_{.05,49} \approx t_{.05,50} = 1.676$ (the critical value, taken from the table entry for 50 degrees of freedom).

You can use the Excel function =tinv to obtain the critical value. This function works with the two-tail probability: for a two-tail test with significance level $\alpha$, it returns the critical value $t_{\alpha/2,n-1}$. Since our test is one-tail, we use $2\alpha$ instead of $\alpha$, that is, 2(.05) = .1. Thus, type in =tinv(.1,49) to obtain the result 1.676551.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89$$

The test statistic is calculated based on the data provided in Xm12-01.

• Since 1.89 > 1.676 we reject the null hypothesis in favor of the alternative.

• Conclusion: There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.

[Figure: t distribution with the rejection region to the right of 1.676; the test statistic 1.89 falls inside the rejection region.]
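The hand calculation above can be checked with a few lines of Python. This is only a sketch that works from the summary statistics quoted on the slide, not from the raw file Xm12-01; the function name `t_statistic` is an illustrative choice, and the critical value is the table entry used in the text rather than something computed here.

```python
import math

def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Return t = (x_bar - mu_0) / (s / sqrt(n)); it has n - 1 degrees of freedom."""
    return (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))

# Summary statistics quoted for Example 12.1
t = t_statistic(sample_mean=460.38, hypothesized_mean=450, sample_sd=38.83, n=50)

# Table entry t_{.05,50}, used in the text as an approximation of t_{.05,49}
t_critical = 1.676

# Right-tail test: reject H0 when t exceeds the critical value
reject_h0 = t > t_critical
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

The comparison `t > t_critical` mirrors the one-tail rejection region; a left-tail test would flip the inequality and the sign of the critical value.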

Using Data Analysis Plus and the p-value approach to test the mean (Xm12-01.xls).

t-Test: Mean

                       Packages
Mean                   460.38
Standard Deviation     38.8271
Hypothesized Mean      450
df                     49
t Stat                 1.8904
P(T<=t) one-tail       0.0323
t Critical one-tail    1.6766

[Figure: t distribution with the p-value .0323 shown as the area to the right of t = 1.89, compared with the significance level .05.]

Since .0323 < .05, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.

Estimating $\mu$ when $\sigma$ is unknown

• Confidence interval estimator of $\mu$ when $\sigma^2$ is unknown

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$

• Example 12.2
– An investor is trying to estimate the return on investment in companies that won quality awards last year.
– A random sample of 83 such companies is selected, and the return he would have earned had he invested in them is calculated.
– Construct a 95% confidence interval for the mean return.

• Solution (solving by hand)
– The problem objective is to describe the population of annual returns from buying shares of quality award-winners.
– The data are quantitative.
– From the data we determine

$$\bar{x} = 15.02, \qquad s^2 = 68.98, \qquad s = \sqrt{68.98} = 8.31$$

– Using $t_{.025,82} \approx t_{.025,80} = 1.990$:

$$\bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} = 15.02 \pm 1.990\,\frac{8.31}{\sqrt{83}} = [13.205,\ 16.835]$$
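The interval above can be reproduced numerically. The sketch below assumes the rounded summary values from the slide (mean 15.02, s = 8.31, n = 83) and takes the critical value 1.990 from the t-table as given; `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(sample_mean, sample_sd, n, t_crit):
    """Return (LCL, UCL) for x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example 12.2: t_{.025,80} = 1.990 stands in for t_{.025,82}
lcl, ucl = t_interval(sample_mean=15.02, sample_sd=8.31, n=83, t_crit=1.990)
print(f"95% confidence interval: [{lcl:.3f}, {ucl:.3f}]")
```

Software that uses the unrounded statistics and the exact critical value for 82 degrees of freedom will report slightly different limits.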


Using Data Analysis Plus

t-Estimate: Mean

                       Returns
Mean                   15.0172
Standard Deviation     8.3054
LCL                    13.2037
UCL                    16.8307

Checking the required conditions

• We need to check that the population is normally distributed, or at least not extremely non-normal.

• There are statistical methods that can be used to test for normality (to be introduced later in the book, but not discussed here).

• From the sample histograms we see…

[Figure: A Histogram for XM-11-01 (Packages), with bins from 400 to 575 and "More"; A Histogram for XM-11-02 (Returns), with bins from -4 to 30 and "More".]

12.2 Inference About a Population Variance

• Sometimes we are interested in making inferences about the variability of processes.

• Examples:
– The consistency of a production process for quality control purposes.
– The risk associated with different investments.

• To draw inference about variability, the parameter of interest is $\sigma^2$.

• The population variance can be estimated, or its value tested, using the sample variance $s^2$.

• The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$.

• The inference about $\sigma^2$ is made by using a sample statistic that incorporates $s^2$ and $\sigma^2$.

• This statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1$$

• It has a distribution called Chi-squared, if the population is normally distributed.

[Figure: The Chi-squared distribution, showing density curves for DF = 5 and DF = 10. The degrees of freedom (df) determine the distribution shape.]

Testing the population variance – Left hand tail test

• Example 1 (operation management application)
– A container-filling machine is believed to fill 1 liter containers so consistently that the variance of the filling will be less than 1 cc (.001 liter).
– To test this belief a random sample of 25 1-liter fills was taken, and the results recorded (Xm12-03.xls).
– Do these data support the belief that the variance is less than 1 cc at the 5% significance level?

Testing the population variance

• Solution
– The problem objective is to describe the population of 1-liter fills from a filling machine.
– The data are quantitative, and we are interested in the variability of the fills.
– The two hypotheses are:
H0: $\sigma^2 = 1$
H1: $\sigma^2 < 1$ (We want to prove that the process is consistent.)
– The rejection region has the form $s^2 \le$ Critical Value.
– The rejection region in terms of $\chi^2$ is:

$$\chi^2 < \chi^2_{1-\alpha,n-1}$$

• Solving by hand
– Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2/n$.
– From the sample (data is presented in units of cc-1000 to avoid rounding) we can calculate $\sum x_i = 24{,}996.4$ and $\sum x_i^2 = 24{,}992{,}821.3$.
– Then $(n-1)s^2 = 24{,}992{,}821.3 - (24{,}996.4)^2/25 = 20.78$.

Using the $\chi^2$ table:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.78}{1} = 20.78, \qquad \chi^2_{1-\alpha,n-1} = \chi^2_{.95,25-1} = 13.8484$$

Since 20.78 > 13.8484, do not reject the null hypothesis. There is insufficient evidence to conclude that the variance is less than 1 cc.

[Figure: Chi-squared distribution with the rejection region to the left of 13.84; the test statistic 20.78 falls outside the rejection region.]
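The left-tail variance test can be traced step by step in code. The following sketch starts from the two sums quoted in the solution and uses the table value 13.8484 as the critical value; the helper name `sum_of_squared_deviations` is illustrative and not taken from the slides.

```python
def sum_of_squared_deviations(sum_x, sum_x_sq, n):
    """Shortcut formula: (n - 1) * s^2 = sum(x_i^2) - (sum(x_i))^2 / n."""
    return sum_x_sq - sum_x ** 2 / n

# Sums quoted in the solution for the 25 fills
ss = sum_of_squared_deviations(sum_x=24_996.4, sum_x_sq=24_992_821.3, n=25)

sigma2_0 = 1.0            # variance under H0
chi2 = ss / sigma2_0      # test statistic with n - 1 = 24 degrees of freedom
chi2_critical = 13.8484   # table value chi^2_{.95,24}

# Left-tail test: reject H0 only if the statistic falls below the critical value
reject_h0 = chi2 < chi2_critical
print(f"chi2 = {chi2:.2f}, reject H0: {reject_h0}")
```

The shortcut formula subtracts two large, nearly equal numbers, so with raw data it is usually safer to compute the deviations from the mean directly.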

Testing the population variance – Right hand tail test; Two tail test

• A right hand tail test:
– H0: $\sigma^2$ = value
H1: $\sigma^2$ > value
– Rejection region:

$$\chi^2 > \chi^2_{\alpha,n-1}$$

• A two tail test:
– H0: $\sigma^2$ = value
H1: $\sigma^2 \ne$ value
– Rejection region:

$$\chi^2 < \chi^2_{1-\alpha/2,n-1} \quad\text{or}\quad \chi^2 > \chi^2_{\alpha/2,n-1}$$

Estimating the population variance

From the following probability statement

$$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$$

we have (by substituting $\chi^2 = [(n-1)s^2]/\sigma^2$)

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

This is the confidence interval for $\sigma^2$ with $1 - \alpha$ confidence level.


• Example 2
– Estimate the variance of fills in example 12.3 with 99% confidence.
• Solution
– We have $(n-1)s^2 = 20.78$. From the Chi-squared table we have
$\chi^2_{\alpha/2,n-1} = \chi^2_{.005,24} = 45.5585$
$\chi^2_{1-\alpha/2,n-1} = \chi^2_{.995,24} = 9.88623$

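• The confidence interval is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

$$\frac{20.78}{45.5585} < \sigma^2 < \frac{20.78}{9.88623}$$

$$.46 < \sigma^2 < 2.10$$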
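As a quick numerical check of the interval, the sketch below divides the quoted value of (n - 1)s² by the two table entries. The function name `variance_interval` is a hypothetical choice, and the Chi-squared critical values are taken from the text rather than computed.

```python
def variance_interval(ss, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for sigma^2, where ss = (n - 1) * s^2.

    chi2_upper is chi^2_{alpha/2, n-1} and chi2_lower is chi^2_{1-alpha/2, n-1}.
    Dividing by the larger critical value gives the lower limit.
    """
    return ss / chi2_upper, ss / chi2_lower

# 99% interval with 24 degrees of freedom, table values as quoted
lcl, ucl = variance_interval(ss=20.78, chi2_upper=45.5585, chi2_lower=9.88623)
print(f"{lcl:.2f} < sigma^2 < {ucl:.2f}")
```

Note that the interval is not symmetric around s², because the Chi-squared distribution is skewed.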


12.4 Inference About a Population Proportion

• When the population consists of nominal or categorical data, the only inference we can make is about the proportion of occurrence of a certain value.

• The parameter “p” was used before to calculate these proportions under the binomial distribution.

• Statistic and sampling distribution
– The statistic used when making inference about ‘p’ is:

$$\hat{p} = \frac{x}{n}$$

where x = the number of successes and n = sample size.

– Under certain conditions, [np > 5 and n(1-p) > 5], $\hat{p}$ is approximately normally distributed, with $\mu = p$ and $\sigma^2 = p(1 - p)/n$.

Testing and estimating the Proportion

• Test statistic for p

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \qquad \text{where } np > 5 \text{ and } n(1-p) > 5$$

• Interval estimator for p ($1 - \alpha$ confidence level)

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \qquad \text{provided } n\hat{p} > 5 \text{ and } n(1-\hat{p}) > 5$$

Testing the Proportion

• Example 12.5 (Predicting the winner on election day)
– Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.
– Based on the data presented in Xm12.5.xls (where 1=Democrat, and 2=Republican), can the network conclude that the Republican candidate will win the state college vote?

• Solution
– The problem objective is to describe the population of votes in the state.
– The parameter to be tested is ‘p’.
– Success is defined as “Republican vote”.
– The hypotheses are:
H0: p = .5
H1: p > .5 (More than 50% vote Republican)

– Solving by hand
• The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.
• From file Xm12.5.xls we count 407 successes. The number of voters participating is 765.
• The sample proportion is $\hat{p} = 407/765 = .532$.
• The value of the test statistic is

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.532 - .5}{\sqrt{.5(1-.5)/765}} = 1.77$$

• The p-value is P(Z > 1.77) = .0382.
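The test statistic and p-value can be reproduced with Python's standard library. This is a minimal sketch that starts from the counts quoted above (407 successes out of 765); `proportion_z_test` is an illustrative name, and the normal approximation is assumed to hold because np and n(1-p) are both well above 5.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Right-tail z test for a proportion; returns (p_hat, z, p_value)."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p0, as in the test statistic
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value

p_hat, z, p_value = proportion_z_test(successes=407, n=765, p0=0.5)
reject_h0 = z > 1.645  # z_{.05}, the critical value quoted in the text
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the code keeps the unrounded sample proportion, the p-value it reports may differ in the last digit from one read off a table at z = 1.77.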


Using Data Analysis Plus we have:

z-Test: Proportion

Sample Proportion          0.5321
Observations               765
Hypothesized Proportion    0.5
z Stat                     1.7739
P(Z<=z) one-tail           0.0382
z Critical one-tail        1.6449
P(Z<=z) two-tail           0.0764
z Critical two-tail        1.96

Since the one-tail p-value .0382 < 0.05, there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level we can conclude that more than 50% voted Republican.

Estimating the Proportion

• Example (marketing application)
– In a survey of 2000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched “The Tonight Show”.
– Estimate the number of TVs tuned to the Tonight Show on a typical night, if there are 100 million potential television sets. Use 95% confidence level.
– Solution: $\hat{p} = 226/2000 = .113$ and $1 - \hat{p} = 1 - .113 = .887$, so

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96\sqrt{.113(.887)/2000} = .113 \pm .014$$

• Solution

Using Excel we have:

z-Estimate: Proportion

Sample Proportion    0.113
Observations         2000
LCL                  0.0991
UCL                  0.1269

A confidence interval for the number of viewers who watched the Tonight Show, based on 100 million potential television sets:

LCL = .0991(100,000,000) = 9.9 million
UCL = .1269(100,000,000) = 12.7 million
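The interval and its scaling to the number of television sets can be sketched as follows. The code assumes the survey counts from the example (226 of 2000) and the 100 million potential sets stated in the problem; `proportion_interval` is a hypothetical helper, and the critical value comes from `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Return (LCL, UCL) for p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lcl, ucl = proportion_interval(successes=226, n=2000, confidence=0.95)

# Scale the proportion limits by the number of potential television sets
tv_sets = 100_000_000
lcl_millions = lcl * tv_sets / 1e6
ucl_millions = ucl * tv_sets / 1e6
print(f"LCL = {lcl:.4f} ({lcl_millions:.1f} million), UCL = {ucl:.4f} ({ucl_millions:.1f} million)")
```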

Selecting the Sample Size to Estimate the Proportion

• Recall: The confidence interval for the proportion is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$

• Thus, to estimate the proportion to within W, we can write

$$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$

• Solving $W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ for n, the required sample size is

$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2$$

Sample Size to Estimate the Proportion

• Example
– Suppose we want to estimate the proportion of customers who prefer our company’s brand to within .03 with 95% confidence.
– Find the sample size needed to guarantee that this requirement is met.
– Solution: W = .03; $1 - \alpha = .95$, therefore $\alpha/2 = .025$, so $z_{.025} = 1.96$

$$n = \left(\frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$$

Since the sample has not yet been taken, the sample proportion is still unknown.

We proceed using either one of the following two methods:

• Method 1:
– There is no knowledge about the value of $\hat{p}$.
• Let $\hat{p} = .5$. This results in the largest possible n needed for a $1 - \alpha$ confidence interval of the form $\hat{p} \pm .03$.
• If the sample proportion does not equal .5, the actual W will be narrower than .03 with the n obtained by the formula below.

$$n = \left(\frac{1.96\sqrt{.5(1-.5)}}{.03}\right)^2 = 1{,}068$$

• Method 2:
– There is some idea about the value of $\hat{p}$.
• Use the value of $\hat{p}$ to calculate the sample size. For example, with $\hat{p} = .2$:

$$n = \left(\frac{1.96\sqrt{.2(1-.2)}}{.03}\right)^2 = 683$$
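Both methods use the same formula with a different planning value for the sample proportion, which a short function makes explicit. This is a sketch under the example's assumptions (W = .03, z = 1.96); the name `sample_size_for_proportion` is illustrative, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_for_proportion(p_guess, w, z):
    """Return the smallest integer n with z * sqrt(p_guess * (1 - p_guess) / n) <= w."""
    n_exact = (z * math.sqrt(p_guess * (1 - p_guess)) / w) ** 2
    return math.ceil(n_exact)

# Method 1: no prior knowledge, so use the conservative value p_hat = .5
n_conservative = sample_size_for_proportion(p_guess=0.5, w=0.03, z=1.96)

# Method 2: some idea about the proportion, here p_hat = .2 as in the example
n_informed = sample_size_for_proportion(p_guess=0.2, w=0.03, z=1.96)

print(n_conservative, n_informed)
```

The product p(1 - p) is largest at p = .5, which is why Method 1 always gives the larger sample size.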
