Chapter 12
Inference About a Population
Introduction
• In this chapter we utilize the approach developed before to describe a population.
– Identify the parameter to be estimated or tested.
– Specify the parameter’s estimator and its sampling distribution.
– Construct a confidence interval estimator or perform a hypothesis test.
• We shall develop techniques to estimate and test three population parameters.
– The expected value μ
– The variance σ²
– The population proportion p (for qualitative data)
12.1 Inference About a Population Mean When the Population Standard Deviation is Unknown
• Recall: when σ² is known, x̄ is (at least approximately) normally distributed if:
  • the sample is drawn from a normal population, or
  • the population is not normal but the sample is sufficiently large (by the central limit theorem).
• When σ² is unknown, another random variable describes the distribution of x̄.
The t-Statistic
When σ² is unknown, we use s² instead, and the Z statistic is then replaced by the t-statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad\longrightarrow\quad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

When the sampled population is normally distributed, the statistic t is Student t distributed.
The Student t distribution is mound-shaped and symmetrical around zero. The degrees of freedom determine the distribution shape.
[Figure: two Student t density curves centered at 0, one with n1 degrees of freedom and one with n2, where n1 < n2.]
Critical values of t are obtained using the t-table.
Testing μ when σ is unknown
• Example 12.1: Productivity of newly hired trainees
– In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.
– It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.
– Can we conclude that this belief is correct, based on productivity observations of 50 trainees? (The raw data are presented in the file Xm12-01.)
• Example 12.1 – Solution
– The problem objective is to describe the population of the number of packages processed in one hour.
– The data are quantitative.
– The hypotheses are:
H0: μ = 450
H1: μ > 450 (We want to prove that the trainees reach 90% productivity of experienced workers.)
– The t statistic:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49$$
• Solution - continued
Observe: H1 has the form μ > μ0, thus the rejection region is

$$\bar{x} > \bar{x}_L$$

After transforming x̄ into a t-statistic we express the rejection region in terms of the statistic t:

$$t > t_{\alpha,\,n-1}$$
• Solution continued (solving by hand)
The rejection region is $t > t_{\alpha,\,n-1}$, where $t_{\alpha,\,n-1} = t_{.05,49}$.
The critical value (table entry) is $t_{.05,50} = 1.676$; the table row for 50 degrees of freedom is used in place of 49.
You can use the Excel function =tinv to obtain the critical value. This function takes a two-tail probability: for a two-tail test with significance level α, it returns the critical value $t_{\alpha/2,\,n-1}$. Since our test is one-tail, we enter 2α instead of α, that is 2(.05) = .1. Thus, type in =tinv(.1,49) to obtain the result 1.676551.
The test statistic is calculated based on the data provided in Xm12-01:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89$$

• Since 1.89 > 1.676, the statistic falls in the rejection region and we reject the null hypothesis in favor of the alternative.
• Conclusion: There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.
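The by-hand calculation for Example 12.1 can be reproduced with a few lines of Python. This is only a sketch working from the summary statistics quoted on the slide; the function name `one_sample_t` is an illustrative choice, and the critical value is the table entry from the text rather than something computed here.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return the t statistic for H0: mu = mu0 from summary statistics."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Summary statistics quoted in Example 12.1
t_stat = one_sample_t(xbar=460.38, mu0=450, s=38.83, n=50)

# Table value t_{.05,50}, used in the text as a stand-in for 49 d.f.
t_crit = 1.676

# Right-tail test: reject H0 when t exceeds the critical value
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

With the raw data from Xm12-01 available, the mean and standard deviation would be computed from the observations first; only the summary values are assumed here.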
Using Data Analysis Plus and the p-value approach to test the mean (Xm12-01.xls).

t-Test: Mean (Packages)
Mean                  460.38
Standard Deviation    38.8271
Hypothesized Mean     450
df                    49
t Stat                1.8904
P(T<=t) one-tail      0.0323
t Critical one-tail   1.6766

[Figure: t distribution with the rejection region of area .05 to the right of the critical value; the area to the right of t = 1.89 is the p-value, .0323.]
Since .0323 < .05, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.
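For readers without Excel, the one-tail p-value can be approximated by integrating the Student t density numerically. The sketch below uses only the Python standard library and Simpson's rule; it is an assumption-laden illustration (the helper names `t_pdf` and `t_upper_tail` are hypothetical), not the method Data Analysis Plus uses internally.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    # Work on the log scale so the gamma functions do not overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    # steps must be even for Simpson's rule
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric around zero, so P(T > 0) = 0.5
    return 0.5 - area_0_to_t

# t Stat and df from the printout for Example 12.1
p_value = t_upper_tail(1.8904, 49)
print(f"one-tail p-value = {p_value:.4f}")
```

The result should agree with the printed one-tail p-value to about four decimal places; a statistics library would normally be preferred over hand-rolled integration.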
Estimating μ when σ is unknown
• Confidence interval estimator of μ when σ² is unknown

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$
• Example 12.2
– An investor is trying to estimate the return on investment in companies that won quality awards last year.
– A random sample of 83 such companies is selected, and the return he would have earned had he invested in them is calculated.
– Construct a 95% confidence interval for the mean return.
• Solution (solving by hand)
– The problem objective is to describe the population of annual returns from buying shares of quality award-winners.
– The data are quantitative.
– From the data we determine

$$\bar{x} = 15.02, \qquad s^2 = 68.98, \qquad s = \sqrt{68.98} = 8.31$$

– Using $t_{.025,82} \approx t_{.025,80} = 1.990$, the interval is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 15.02 \pm 1.990\,\frac{8.31}{\sqrt{83}} = [13.205,\ 16.835]$$
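The interval in Example 12.2 can be checked with a short Python sketch. It assumes the summary statistics and the table value 1.990 quoted in the solution; the function name `t_interval` is illustrative only.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (LCL, UCL) for the mean: xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Summary statistics from Example 12.2
s = math.sqrt(68.98)          # sample standard deviation from s^2 = 68.98
lcl, ucl = t_interval(xbar=15.02, s=s, n=83, t_crit=1.990)

print(f"95% CI for the mean return: [{lcl:.3f}, {ucl:.3f}]")
```

Small differences from the software printout are expected, since the table value for 80 degrees of freedom is used in place of 82 and the inputs are rounded.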
Using Data Analysis Plus

t-Estimate: Mean (Returns)
Mean                 15.0172
Standard Deviation   8.3054
LCL                  13.2037
UCL                  16.8307
Checking the required conditions
• We need to check that the population is normally distributed, or at least not extremely non-normal.
• There are statistical methods that can be used to test for normality (to be introduced later in the book, but not discussed here).
• From the sample histograms we see…
[Figure: A Histogram for XM-11-01 (Packages); horizontal axis bins from 400 to 575 in steps of 25, plus "More".]
[Figure: A Histogram for XM-11-02 (Returns); horizontal axis bins -4, 2, 8, 14, 22, 30, plus "More".]
12.2 Inference About a Population Variance
• Sometimes we are interested in making inferences about the variability of processes.
• Examples:
– The consistency of a production process for quality control purposes.
– The risk associated with different investments.
• To draw inference about variability, the parameter of interest is σ².
• The population variance can be estimated or its value tested using the sample variance s².
• The sample variance s² is an unbiased, consistent and efficient point estimator for σ².
• The inference about σ² is made by using a sample statistic that incorporates s² and σ².
• This statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1$$

• It has a distribution called Chi-squared, if the population is normally distributed.
[Figure: The Chi-squared distribution; density curves for DF = 5 and DF = 10 plotted over the range 0 to 25.]
The degrees of freedom (df) determine the distribution shape.
Testing the population variance – Left hand tail test
• Example 1 (operations management application)
– A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc² (1 cc = .001 liter).
– To test this belief a random sample of 25 1-liter fills was taken, and the results recorded (Xm12-03.xls).
– Do these data support the belief that the variance is less than 1 cc² at the 5% significance level?
• Solution
– The problem objective is to describe the population of 1-liter fills from a filling machine.
– The data are quantitative, and we are interested in the variability of the fills.
– The two hypotheses are:
H0: σ² = 1
H1: σ² < 1 (We want to prove that the process is consistent.)
– The rejection region has the form: s² < critical value.
– The rejection region in terms of χ² is:

$$\chi^2 < \chi^2_{1-\alpha,\,n-1}$$
• Solving by hand
– Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \left(\sum x_i\right)^2/n$.
– From the sample (data are presented in units of cc − 1000 to avoid rounding) we can calculate $\sum x_i = 24{,}996.4$ and $\sum x_i^2 = 24{,}992{,}821.3$.
– Then $(n-1)s^2 = 24{,}992{,}821.3 - (24{,}996.4)^2/25 = 20.78$.
Using the χ² table:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.78}{1} = 20.78, \qquad \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,25-1} = 13.8484$$

The rejection region is χ² < 13.8484. Since 20.78 > 13.8484, do not reject the null hypothesis.
There is insufficient evidence to infer that the variance is less than 1 cc².
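The left-tail variance test for the filling-machine example can be sketched in Python as follows. The sums and the critical value are the ones quoted in the text; the function name `sum_of_squared_deviations` is an illustrative choice, and the critical value is copied from the table rather than computed.

```python
def sum_of_squared_deviations(sum_x, sum_x2, n):
    """(n - 1) * s^2 via the shortcut: sum(x^2) - (sum x)^2 / n."""
    return sum_x2 - sum_x ** 2 / n

n = 25
ss = sum_of_squared_deviations(sum_x=24_996.4, sum_x2=24_992_821.3, n=n)

sigma2_0 = 1.0               # variance under H0
chi2_stat = ss / sigma2_0    # chi-squared statistic with n - 1 = 24 d.f.
chi2_crit = 13.8484          # table value chi^2_{.95,24} quoted in the text

# Left-tail test: reject H0 only when the statistic is below the critical value
reject_h0 = chi2_stat < chi2_crit

print(f"chi-squared = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

The shortcut formula subtracts two large, nearly equal numbers, so with raw data it is usually safer to sum the squared deviations from the mean directly.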
Testing the population variance – Right hand tail test; Two tail test
• A right hand tail test:
– H0: σ² = value
  H1: σ² > value
– Rejection region:

$$\chi^2 > \chi^2_{\alpha,\,n-1}$$

• A two tail test:
– H0: σ² = value
  H1: σ² ≠ value
– Rejection region:

$$\chi^2 < \chi^2_{1-\alpha/2,\,n-1} \quad\text{or}\quad \chi^2 > \chi^2_{\alpha/2,\,n-1}$$
Estimating the population variance
From the following probability statement

$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$

we have, by substituting $\chi^2 = (n-1)s^2/\sigma^2$ and solving for σ²,

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

This is the confidence interval for σ² with (1 − α)100% confidence level.
• Example 2
– Estimate the variance of fills in example 12.3 with 99% confidence.
• Solution
– We have (n − 1)s² = 20.78. From the Chi-squared table we have

$$\chi^2_{\alpha/2,\,n-1} = \chi^2_{.005,\,24} = 45.5585$$
$$\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{.995,\,24} = 9.88623$$
• The confidence interval is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{20.78}{45.5585} < \sigma^2 < \frac{20.78}{9.88623}$$
$$.46 < \sigma^2 < 2.10$$
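The 99% interval for the variance of the fills can be verified with a minimal Python sketch. The two chi-squared values are taken from the table entries quoted in the solution, and `variance_interval` is a hypothetical helper name.

```python
def variance_interval(ss, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for sigma^2, where ss = (n - 1) * s^2.

    chi2_upper is chi^2_{alpha/2, n-1} and chi2_lower is chi^2_{1-alpha/2, n-1}.
    Dividing by the larger value gives the lower limit, and vice versa.
    """
    return ss / chi2_upper, ss / chi2_lower

# (n - 1) s^2 and table values for 24 d.f. from the example
lcl, ucl = variance_interval(ss=20.78, chi2_upper=45.5585, chi2_lower=9.88623)

print(f"99% CI for the variance: {lcl:.2f} < sigma^2 < {ucl:.2f}")
```

Note that the interval is not symmetric around s², because the Chi-squared distribution is skewed.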
12.4 Inference About a Population Proportion
• When the population consists of nominal or categorical data, the only inference we can make is about the proportion of occurrence of a certain value.
• The parameter p was used before to calculate these proportions under the binomial distribution.
• Statistic and sampling distribution
– The statistic used when making inference about p is:

$$\hat{p} = \frac{x}{n}, \quad\text{where } x = \text{the number of successes and } n = \text{sample size.}$$

– Under certain conditions [np > 5 and n(1 − p) > 5], p̂ is approximately normally distributed, with μ = p and σ² = p(1 − p)/n.
Testing and estimating the Proportion
• Test statistic for p

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \quad\text{where } np > 5 \text{ and } n(1-p) > 5$$

• Interval estimator for p (1 − α confidence level)

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}, \quad\text{provided } n\hat{p} > 5 \text{ and } n(1-\hat{p}) > 5$$
Testing the Proportion
• Example 12.5 (Predicting the winner on election day)
– Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.
– Based on the data presented in Xm12.5.xls (where 1 = Democrat and 2 = Republican), can the network conclude that the Republican candidate will win the state college vote?
• Solution
– The problem objective is to describe the population of votes in the state.
– The parameter to be tested is p.
– Success is defined as “Republican vote”.
– The hypotheses are:
H0: p = .5
H1: p > .5 (more than 50% vote Republican)
– Solving by hand
• The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.
• From file Xm12.5.xls we count 407 successes. The number of voters participating is 765.
• The sample proportion is $\hat{p} = 407/765 = .532$.
• The value of the test statistic is

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.532 - .5}{\sqrt{.5(1-.5)/765}} = 1.77$$

• The p-value is P(Z > 1.77) = .0382.
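The exit-poll test can be reproduced with Python's standard library. This is a sketch under the counts quoted in the example (407 successes out of 765); the function name `proportion_z_test` is illustrative, and the normal approximation is assumed to apply because np and n(1 − p) both exceed 5.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Right-tail z test of H0: p = p0. Returns (p_hat, z, p_value)."""
    # The normal approximation is only reasonable under these conditions
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("normal approximation conditions are not met")
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)   # P(Z > z) for a standard normal
    return p_hat, z, p_value

p_hat, z, p_value = proportion_z_test(successes=407, n=765, p0=0.5)
reject_h0 = z > 1.645   # critical value z_.05 from the text

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The third decimal of z can differ slightly from a software printout depending on how the sample proportion is rounded before it is used.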
Using Data Analysis Plus we have:

z-Test: Proportion
Sample Proportion          0.5321
Observations               765
Hypothesized Proportion    0.5
z Stat                     1.7739
P(Z<=z) one-tail           0.0382
z Critical one-tail        1.6449
P(Z<=z) two-tail           0.0764
z Critical two-tail        1.96

Since the one-tail p-value .0382 < 0.05, there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level we can conclude that more than 50% voted Republican.
Estimating the Proportion
• Example (marketing application)
– In a survey of 2000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched “The Tonight Show”.
– Estimate the number of TVs tuned to the Tonight Show on a typical night, if there are 100 million potential television sets. Use a 95% confidence level.
– Solution: with p̂ = 226/2000 = .113 and 1 − p̂ = 1 − .113 = .887,

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96\sqrt{.113(.887)/2000} = .113 \pm .014$$
• Solution (continued)
Using Excel we have:

z-Estimate: Proportion
Sample Proportion   0.113
Observations        2000
LCL                 0.0991
UCL                 0.1269

A confidence interval for the number of viewers who watched the Tonight Show:
LCL = .0991(100,000,000) = 9.9 million
UCL = .1269(100,000,000) = 12.7 million
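The interval for the proportion, and its scaling to the 100 million television sets, can be sketched as follows. The code uses only the standard library; `proportion_interval` is a hypothetical helper, and the z value is computed from the normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Return (LCL, UCL) for p: p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lcl, ucl = proportion_interval(successes=226, n=2000, confidence=0.95)

# Scale the proportion limits to the 100 million potential television sets
population = 100_000_000
sets_low, sets_high = lcl * population, ucl * population

print(f"proportion: [{lcl:.4f}, {ucl:.4f}]")
print(f"sets: {sets_low / 1e6:.1f} to {sets_high / 1e6:.1f} million")
```

Multiplying both limits by the population size is valid here because the population size is treated as a known constant.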
Selecting the Sample Size to Estimate the Proportion
• Recall: The confidence interval for the proportion is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$

• Thus, to estimate the proportion to within W, we can write

$$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$
• The required sample size, obtained by solving the expression for W for n, is

$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2$$
• Example
– Suppose we want to estimate the proportion of customers who prefer our company’s brand to within .03 with 95% confidence.
– Find the sample size needed to guarantee that this requirement is met.
– Solution
W = .03; 1 − α = .95, therefore α/2 = .025, so $z_{.025} = 1.96$.

$$n = \left(\frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$$

Since the sample has not yet been taken, the sample proportion is still unknown.
We proceed using either one of the following two methods:
• Method 1:
– There is no knowledge about the value of p̂.
• Let p̂ = .5. This results in the largest possible n needed for a 1 − α confidence interval of the form p̂ ± .03.
• If the sample proportion does not equal .5, the actual W will be narrower than .03 with the n obtained by the formula below.

$$n = \left(\frac{1.96\sqrt{.5(1-.5)}}{.03}\right)^2 = 1{,}068$$

• Method 2:
– There is some idea about the value of p̂.
• Use the value of p̂ to calculate the sample size. With p̂ = .2:

$$n = \left(\frac{1.96\sqrt{.2(1-.2)}}{.03}\right)^2 = 683$$
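Both methods amount to plugging a planning value for p̂ into the sample-size formula and rounding up. A minimal Python sketch, with the illustrative name `sample_size_for_proportion` and the z value 1.96 taken from the example:

```python
import math

def sample_size_for_proportion(p_guess, width, z=1.96):
    """Smallest n for which z * sqrt(p(1 - p)/n) <= width, given a planning value p_guess."""
    n_exact = (z * math.sqrt(p_guess * (1 - p_guess)) / width) ** 2
    # Round up: rounding down would leave the interval slightly too wide
    return math.ceil(n_exact)

# Method 1: no prior knowledge, use the most conservative value p = .5
n_conservative = sample_size_for_proportion(p_guess=0.5, width=0.03)

# Method 2: a prior idea of the proportion, here .2 as in the example
n_informed = sample_size_for_proportion(p_guess=0.2, width=0.03)

print(n_conservative, n_informed)
```

Because p(1 − p) is largest at p = .5, Method 1 never yields a smaller n than Method 2 for the same W and confidence level.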