Distribution of total and sample mean
Sample Statistics & Data display
We can calculate statistics from a sample. These reflect what is happening in the population as a whole: the statistics of the sample estimate the parameters of the population.
Notation             Population (parameters)   Sample (statistics)
Mean                 μ                         x̄
Standard deviation   σ                         s
Variance             σ²                        s²
Example
25 people in a lift. They have a mean weight of 65kg and a SD of 7kg. Find the mean and SD of the load.
E(T) = nμ = 25 × 65 = 1625kg
SD(T) = σ√n = 7 × √25 = 7 × 5 = 35kg
If we repeat an experiment a certain number of times, then T is the sum of n independent random variables.
E(T) = nμ
VAR(T) = nσ²
SD(T) = σ√n
A fruit and vegetable market accepts deliveries of crates of apples. Each crate has a weight that is normally distributed with a mean of 21kg and a standard deviation of 0.4 kg. The crates are delivered in groups of 18 on pallets that weigh exactly 30kg.
a) Calculate the mean total weight of a pallet with 18 crates of apples
b) Calculate the standard deviation of the total weight of a pallet with 18 crates of apples.
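A minimal sketch of how the rules E(T) = nμ and SD(T) = σ√n apply to the pallet question. It assumes that the "total weight of a pallet" includes the 30kg pallet itself and that the crate weights are independent; the variable names are illustrative only.

```python
import math

n_crates = 18
crate_mean = 21.0      # kg
crate_sd = 0.4         # kg
pallet_weight = 30.0   # kg, a constant (assumed to be included in the total)

# A constant shifts the mean but adds no variability.
total_mean = n_crates * crate_mean + pallet_weight
# For a sum of n independent values the variances add, so SD(T) = sigma * sqrt(n).
total_sd = crate_sd * math.sqrt(n_crates)

print(round(total_mean, 1), "kg mean,", round(total_sd, 3), "kg SD")
```

The pallet adds 30kg to the mean only; because its weight is exact, it does not change the standard deviation.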
Central Limit Theorem
Consider a sample of size n from a population X with a mean of µ and a SD of σ.
Mean of the sample mean: E(X̄) = µ
Standard deviation of the sample mean: SD(X̄) = σ/√n
Variance of the sample mean: VAR(X̄) = σ²/n
σ/√n is sometimes called the standard error of the sample mean.
If n is large (>30) then the distribution of the sample means will be approximately a normal distribution.
The Central Limit Theorem states that values of the sample means could be expected to average out to the population mean. There is a certain amount of spread about the mean. This is the standard error, or standard deviation of the sample mean: σx̄ = σ/√n
Example
A sample of size 20 is taken from a box of beans. The mean length of the beans in the box is 19 cm with a SD of 2.5cm.
a) What would the expected value of the sample mean be?
b) What would the variance of the sample mean be?
c) What would the standard error of the sample mean be?
a) The expected value E(X̄) = μ = 19cm
b) The variance = σ²/n = (2.5)²/20 = 0.3125
c) The standard error is the standard deviation of the sample mean = σ/√n = 2.5/√20 = 0.559cm
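The three answers for the bean example can be checked with a short calculation. This is only a sketch; the variable names are made up for illustration.

```python
import math

mu = 19.0     # population mean length (cm)
sigma = 2.5   # population standard deviation (cm)
n = 20        # sample size

expected_sample_mean = mu                # E(X-bar) = mu
variance_of_mean = sigma ** 2 / n        # VAR(X-bar) = sigma^2 / n
standard_error = sigma / math.sqrt(n)    # SD(X-bar) = sigma / sqrt(n)

print(expected_sample_mean, variance_of_mean, round(standard_error, 3))
```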
Summary Table from p. 182
Random Variable          Mean   Variance   Standard Deviation
Population X             μ      σ²         σ
Total of n values T      nμ     nσ²        σ√n
Sample mean X̄            μ      σ²/n       σ/√n
We need to know the difference between the mean, variance and standard deviation of the population, of the total of n values and of the sample mean.
Probabilities for the total
When we deal with the sum of several independent variables we use:
E(T) = nμ
VAR(T) = nσ²
SD(T) = σ√n
The distribution of the sum is normal, so it will be shaped like the bell curve.
The probability that the sum will be within a certain given range (from Lower to Upper) is the area under the curve between those two values.
Example
A sample of 16 items is taken from a population X with a mean µ=34, and a SD σ=4 Calculate the probability that the total T of 16 items is below 530
n = 16, μ = 34, σ = 4
E(T) = nμ = 16 × 34 = 544
SD(T) = σ√n = 4 × √16 = 16
On the calculator:
Lower: -1E99
Upper: 530
σ = 16
μ = 544
P(T < 530) = 0.19079 = 0.1908 (4dp)
Example
A lift is licensed to carry a maximum of 25 passengers. It is overloaded when the total passenger loads exceeds 1700kg. The weight of single passengers chosen at random have a mean of 65kg and a standard deviation of 7kg. Calculate the probability that the lift is overloaded, assuming the lift is carrying 25 passengers.
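Both total-probability examples follow the same pattern, sketched below with Python's standard-library `statistics.NormalDist`. The helper name `total_distribution` is hypothetical, and the sketch assumes the individual values are independent and that the total is well modelled by a normal distribution.

```python
import math
from statistics import NormalDist

def total_distribution(n, mu, sigma):
    """Normal model for the total T of n independent values: mean n*mu, SD sigma*sqrt(n)."""
    return NormalDist(mu=n * mu, sigma=sigma * math.sqrt(n))

# 16 items with mu = 34 and sigma = 4: probability the total is below 530.
items_total = total_distribution(16, 34, 4)
p_below_530 = items_total.cdf(530)

# 25 passengers with mu = 65kg and sigma = 7kg: probability the load exceeds 1700kg.
lift_load = total_distribution(25, 65, 7)
p_overloaded = 1 - lift_load.cdf(1700)

print(round(p_below_530, 4), round(p_overloaded, 4))
```

`cdf(x)` gives the area to the left of x, which is what entering a lower bound of -1E99 does on the calculator. The complementary area, P(T > 530), is `1 - p_below_530`, about 0.8092.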
Probabilities for the sample mean
Sometimes we need to know the probability of where the sample mean is likely to be in relation to the population mean. The sample mean is likely to have a smaller spread as the standard deviation will be smaller
For this we use:
VAR(X̄) = σ²/n
SD(X̄) = σ/√n
Probability for samples
A sample size of 36 is taken from a normally distributed population with a mean of 40 and a standard deviation of 12. Calculate the probability that the sample mean is
a) Less than 41
b) Between 37 and 42
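A sketch of how this question could be checked, assuming the sample mean is modelled as normal with mean μ and standard deviation σ/√n. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, mu, sigma = 36, 40, 12

# Distribution of the sample mean: same mean, standard error sigma / sqrt(n) = 2.
sample_mean = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

p_less_than_41 = sample_mean.cdf(41)                       # part a)
p_between = sample_mean.cdf(42) - sample_mean.cdf(37)      # part b)

print(round(p_less_than_41, 4), round(p_between, 4))
```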
Confidence Intervals
Remember we calculate statistics from a sample to estimate the parameters of the population.
Each sample mean will be slightly different from every other sample mean, so it is better to give an interval that we are confident contains the population mean. How confident we are is our degree of confidence.
The spread of the values that the sample means take gives an idea of how accurate the estimate is. The interval built from this spread is called the confidence interval.
The standard deviation of the sample mean, which measures the spread on either side of the mean, is called the standard error.
Using the calculator to find confidence intervals
Construct a 95% confidence interval, given n=25, x̄=28.3, σ=4.38
[Diagram: normal curve with the central 95% between the confidence interval boundaries, an area of 0.475 on each side of the mean.]
The calculator only measures area from the far left, so for the calculator: Area = 0.5 + 0.475 = 0.975
We can use the calculator to find Z, the number of SDs.
Calculating the Sample Size
We may want a certain level of confidence that the sample mean of a sample we are going to take will lie within given boundaries of the population mean.
The margin of error e is the distance between one of the end points of the interval and the sample mean.
Eg For 30m<µ<34m, the confidence interval is 32±2m
The margin of error is 2m
e = z × σ/√n
A certain make of scientific calculator is known to have a voltage rating with a standard deviation of 0.05V. The mean voltage of 40 of these calculators is 3.02V.
a. Construct a 90% confidence interval for the average voltage.
b. Explain the meaning of this confidence interval
Construct a 95% confidence interval, given n=25, x̄=28.3, σ=4.38
95% gives an area of 0.475 on each side of the mean, so use 0.5 + 0.475 = 0.975.
From the calculator: Z = 1.96
x̄ − zσ/√n < μ < x̄ + zσ/√n
28.3 − (1.96)(4.38)/√25 < μ < 28.3 + (1.96)(4.38)/√25
28.3 − 1.717 < μ < 28.3 + 1.717
26.58 < μ < 30.02
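The same interval can be reproduced in a few lines. This is a sketch only: `mean_confidence_interval` is a hypothetical helper, and it assumes σ is known and the sample mean is normally distributed. `NormalDist().inv_cdf` plays the role of the calculator's inverse normal.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) for the population mean, given a known sigma."""
    # Area to the left of the upper boundary, e.g. 0.5 + 0.475 = 0.975 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = mean_confidence_interval(x_bar=28.3, sigma=4.38, n=25, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```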
Using the Calculator to check your answer
In Stats mode:
F4 INTR
F1 Z
F1 1-S (sample with one mean)
F2 Var
Eg #1 Ex14.1
Construct a 95% confidence interval, given n=25, x̄=28.3, σ=4.38
Enter values, then press EXE
26.58 < μ < 30.02
WB Eg 11
The time taken for an individual to walk to work is to be estimated. On 15 occasions the time in minutes were, 18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19
a) Find the sample mean and SD
b) Assuming normal distribution and that the sample is sufficiently large, calculate a 95% confidence interval for the mean time to walk to work.
a) Use the calculator: x̄ = 16.73 (2dp), s = 2.17 (2dp)
b) 95%: 0.475 + 0.5 = 0.975, so Z = 1.96
x̄ − zs/√n < μ < x̄ + zs/√n
16.73 − (1.96)(2.17)/√15 < μ < 16.73 + (1.96)(2.17)/√15
16.73 − 1.10 < μ < 16.73 + 1.10
15.6 < μ < 17.8 minutes
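A sketch that recomputes the walk-to-work example from the raw data. One point worth noting: the value 2.17 matches the divide-by-n standard deviation (`statistics.pstdev`), whereas the divide-by-(n-1) sample standard deviation (`statistics.stdev`) is about 2.25. The code uses the divide-by-n value so that it reproduces the worked answer; which version is intended is an assumption.

```python
import math
from statistics import NormalDist, mean, pstdev, stdev

times = [18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19]

x_bar = mean(times)
sd_divide_by_n = pstdev(times)          # about 2.17, the value used in the worked answer
sd_divide_by_n_minus_1 = stdev(times)   # about 2.25, shown for comparison only

z = NormalDist().inv_cdf(0.975)         # 95% confidence
margin = z * sd_divide_by_n / math.sqrt(len(times))
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 2), round(sd_divide_by_n, 2), round(lower, 1), round(upper, 1))
```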
Interpreting Confidence Intervals
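x̄ = 16.73 (2dp), s = 2.17 (2dp)
15.6 < μ < 17.8 minutes
[Diagram: number line showing the interval from 15.6 to 17.8, centred on 16.73.]
We are 95% confident that the interval 15.6-17.8 minutes contains the true mean: 95% of intervals constructed in this way contain the true mean.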
Ex P75 4.01
Confidence Intervals for Proportions
Another parameter of the population is the population proportion p or π.
This is the probability of success over a large number of trials, which should be similar to the proportion of successes in the population as a whole.
The best estimate of the proportion of successes for the population is the sample proportion:
p̂ = x/n = successes / number of trials
X, the random variable for the number of successes in the sample, has approximately a normal distribution.
E(X) = np
E(X) is the expected value of X, so the sample proportion p̂ = x/n estimates p.
Example
A random sample of 80 households showed that 30% owned PCs.
Construct a 95% confidence interval for p, the proportion of households that own a PC.
We are 95% confident that the interval 19.96%-40.04% contains the true population proportion, that is, the proportion of all households that own PCs.
In a sample of 210 people with high blood pressure a particular drug is found to be effective for 150 of them. Construct a 95% confidence interval for p, the proportion of all patients with high blood pressure for whom this drug is effective.
p̂ = 150/210 = 0.71429
q̂ = 60/210 = 0.28571
z = 1.96
p̂ − Z√(p̂q̂/n) < p < p̂ + Z√(p̂q̂/n)
0.71429 − (1.96)√((0.71429)(0.28571)/210) < p < 0.71429 + (1.96)√((0.71429)(0.28571)/210)
0.71429 − 0.0611 < p < 0.71429 + 0.0611
0.653 < p < 0.775
65.3% < p < 77.5%
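The proportion interval can be sketched as a small function. `proportion_confidence_interval` is a hypothetical name, and the sketch assumes the normal approximation to the number of successes is adequate.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Return (lower, upper) for the population proportion using the normal approximation."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Drug example: effective for 150 of 210 patients, 95% confidence.
lower, upper = proportion_confidence_interval(150, 210, 0.95)
print(round(lower, 3), round(upper, 3))
```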
The main purpose of a recent survey was to estimate the proportion of all adult NZers who are opposed to tipping for service in restaurants. The survey used a random sample of 663 adult New Zealanders, of whom 292 indicated that they are opposed to tipping for service.
a) State clearly the parameter of interest in this survey (A)
b) Calculate a 90% confidence interval for the proportion of all adult NZers who oppose tipping.(A)
c) Analyse the effect of increasing the number of adults surveyed on the width of this confidence interval. (E)
d) Suppose 50 independent random samples of adult NZers are taken and 90% confidence interval is constructed from the results of each sample. Analyse the phrase “90% confidence" by making reference to these 50 confidence intervals. (E)
On average, 90% of confidence intervals constructed in this way contain the true population proportion. That is, we would expect about 45 out of the 50 confidence intervals to contain the true population proportion.
Motel occupancy rates for July 1997 from a random sample of 35 motels gave the following statistics:
• Sample size: 35
• Sample mean: 0.572
• Sample standard deviation: 0.065
1) Calculate a 95% confidence interval for the mean occupancy rate for July 1997 for the population sampled. (A)
2) What would be the effect of increasing the level of confidence on the width of this confidence interval? (M)
3) The mean occupancy rate for the same population for July 1996 is 0.585. It is claimed that the mean occupancy rate for July 1997 is the same as the mean occupancy rate for July 1996. Using the confidence interval calculated in (1) at the 95% level of confidence, demonstrate whether the random sample gives us evidence against this claim. (M)
4) Calculate the number of motels needed to be sampled if the mean occupancy rate for July 1997 was to have been estimated to within 0.015 of its true value at the 95% level of confidence. (M)
Confidence interval for the difference between two means
If two populations are similar then we would expect the difference between their two means to be about zero.
If the populations are different then we would expect the means to be different.
So if the confidence interval for the difference between the two means does not contain 0, we have evidence that the two populations have different means.
Notation        Mean   SD    Sample size   Sample mean
Population 1    μ1     σ1    n1            x̄1
Population 2    μ2     σ2    n2            x̄2
We use x̄1 − x̄2 to estimate μ1 − μ2
E(D) = E(X̄1 − X̄2)
     = E(X̄1) − E(X̄2)
     = μ1 − μ2
VAR(D) = VAR(X̄1 − X̄2)
       = VAR(X̄1) + VAR(X̄2)
       = σ1²/n1 + σ2²/n2
SD(D) = √(σ1²/n1 + σ2²/n2)
Confidence Interval
(x̄1 − x̄2) ± Z√(σ1²/n1 + σ2²/n2)
On formula sheet
σ(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2)
Example
A random sample of 30 objects is taken from a normally distributed population with a SD of 6, another sample of 50 objects is taken from a population with a SD of 8. The mean of the first sample is 115, and that of the second is 108.
1) Construct a 96% confidence interval for µ1 − µ2.
2) Explain whether it is likely that the two groups have the same mean.
3.77 < µ1 − µ2 < 10.23 is the 96% confidence interval for the difference between the two means.
The interval does not contain 0, so it is not likely that the two means are equal. We can say this with at least 96% confidence.
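A sketch of the difference-of-means interval for this example. `difference_confidence_interval` is a hypothetical helper, and it assumes known population SDs and independent samples.

```python
import math
from statistics import NormalDist

def difference_confidence_interval(x1, sigma1, n1, x2, sigma2, n2, confidence):
    """Return (lower, upper) for mu1 - mu2 with known population SDs."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Variances of the two sample means add, even though the means are subtracted.
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    difference = x1 - x2
    return difference - z * standard_error, difference + z * standard_error

lower, upper = difference_confidence_interval(115, 6, 30, 108, 8, 50, 0.96)
contains_zero = lower <= 0 <= upper
print(round(lower, 2), round(upper, 2), contains_zero)
```

Checking `contains_zero` mirrors the interpretation step: when it is false, the interval gives evidence that the two means differ.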
Students are told to measure the area of the classroom. They provide estimates which are approximately normally distributed with SD = 0.15m². 31 students measured one classroom and obtained a mean of 29.76m², while 26 students measured another classroom and obtained a mean of 31.23m². What is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first?
1.392 < µ2 − µ1 < 1.548
This is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first.
We are 95% confident that the area of the second exceeds that of the first, as zero is not in the confidence interval.
Interpretation
If the confidence interval includes zero then we cannot say that there is a difference between the two population means.
If zero is not included then we are confident that there is a difference between the two population means.
We need to make the assumptions that the samples are large enough, that they are independently selected and that the populations they are selected from are normally distributed.
a< μ2– μ1 <b
• If both a and b are positive, it is reasonable to assume that μ2 is larger than μ1 by between a and b units. It’s unlikely two means are the same.
• If both a and b are negative, it is reasonable to assume that μ2 is smaller than μ1 by between -a and -b units. It’s unlikely two means are the same.
• If a and b have opposite signs, it is reasonable to assume that μ2 is smaller than μ1 by –a or μ2 is larger than μ1 by b units or somewhere in between. This includes the possibility that the two means are equal.
True or false
A 99% confidence interval for the difference between two means is calculated from sample data. -3.5< μ2– μ1 <9.4.
a. There is a 99% probability that the means are equal because the interval includes 0.
b. 99% of intervals calculated in the same way will include the difference of the two means.
Below are the summary statistics for a random sample of times for both male and female competitors to complete the annual Mountain Biking Race.
          Sample size   Mean     Standard deviation
Male      30            57min    10min
Female    30            65min    14min
a) Calculate a 95% confidence interval for the difference between the mean time for males to complete the race and the mean time for females to complete the race.
b) In last year’s race, a similar 95% confidence interval for the difference between µm and µf was calculated and found to be -6.25 < µm - µf < 1.36. Based on this confidence interval, demonstrate whether there is a significant difference between the mean race times for males and females.
0 is in the 95% interval (-6.25 < µm - µf < 1.36), so it can be concluded that there is no significant difference between the mean race times for males and females.
The summary statistics for the length (in mm) of the snapper surveyed in each region are shown in the table below.
              Sample size   Sample mean   Sample standard deviation
Reserve       897           360.18        94.48
Non-reserve   47            257.09        59.35
a) Calculate a 95% confidence interval for the difference between the mean length of snapper in the reserve and the non-reserve regions.
b) It is claimed that the ‘average snapper’ in the reserve is at least 130mm longer than the ‘average snapper’ in the non-reserve region. Use the 95% confidence interval from a) to analyse the validity of this claim.
The 95% confidence interval for the difference between the mean lengths of reserve and non-reserve snapper is from 85.03mm to 121.15mm. 130mm is not in this interval (the whole interval lies below it), so at the 95% level of confidence the claim is not supported.
Interpretation of Confidence Intervals
The company produces two different models of batteries, ‘power’ and ‘super’. 95 people who have used both ‘power’ and ‘super’ batteries were interviewed to find out which of the two models they prefer to use in their torches. Of the 95 people, 63 said that they prefer to use the ‘power’ model in their torches.
a) Find a 95% confidence interval for the proportion of all people who have used both ‘power’ and ‘super’ batteries and prefer to use the ‘power’ model of battery in their torches.
0.568<π<0.758
b) Write a clear description that gives the meaning of this confidence interval.
We are 95% confident that the interval from 0.568 to 0.758 contains the true proportion of people who prefer the ‘power’ model: 95% of confidence intervals constructed in this way contain the true proportion.
Calculating Sample Size
If we are given a particular level of confidence we can calculate the sample size (n) to give the required margin of error (e).
Example: 95% confidence interval, σ=4, margin of error e=2. How big is the sample size?
First we need to find Z, the number of SDs:
95%: 0.475 + 0.5 = 0.975 for calc, so Z = 1.96
e = z × σ/√n
2 = 1.96 × 4/√n
√n = 1.96 × 4/2
√n = 3.92
n = 15.366
n = 16 (round up)
A random variable is known to have a standard deviation of 14. What sample size would be required to be 90% confident that an estimate of the mean was within 2 units of its true value?
90%: 0.45 + 0.5 = 0.95 for the calc, so Z = 1.6448
e = z × σ/√n
2 = 1.6448 × 14/√n
√n = 1.6448 × 14/2
√n = 11.5136
n = 132.56
n = 133
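Rearranging e = z × σ/√n gives n = (zσ/e)², rounded up to the next whole number. The sketch below applies this to both sample-size examples; `sample_size_for_mean` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n for which z * sigma / sqrt(n) is no larger than the margin of error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Always round up: rounding down would give a margin larger than required.
    return math.ceil((z * sigma / margin) ** 2)

n_first = sample_size_for_mean(sigma=4, margin=2, confidence=0.95)
n_second = sample_size_for_mean(sigma=14, margin=2, confidence=0.90)
print(n_first, n_second)
```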
Calculate sample size for proportion
A pilot survey from a few tax returns has shown that approximately 12% of all taxpayers are in ‘high-income’ category. If the Inland Revenue Department wishes to estimate this percentage to within 1%, with 96% confidence, how many tax returns should it sample?
Calculating sample size for proportion
A market research company wishes to estimate the percentage of people in a certain age bracket who read a current-affairs magazine. The degree of confidence required for this estimate is 90%. What sample size should be taken to estimate the percentage to within 4%?
e = 4% = 0.04
Confidence = 90% = 0.9
p is unstated so use p = 0.5, q = 0.5
For Calculator: 0.5 + 0.45 = 0.95, so Z = 1.6448 ≈ 1.645
e = Z√(pq/n)
It is easier to rearrange the formula first:
e/Z = √(pq/n)
e²/Z² = pq/n
n = pqZ²/e²
n = (0.5)(0.5)(1.645)²/(0.04)²
n = 422.8, so n = 423
The minimum sample size is 423.
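The rearranged formula n = pqZ²/e² can be sketched as follows. `sample_size_for_proportion` is a hypothetical helper; defaulting p to 0.5 when no estimate is stated follows the worked example, since p = 0.5 makes pq as large as possible.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p=0.5):
    """Smallest n for which z * sqrt(p * q / n) is no larger than the margin of error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    q = 1 - p
    return math.ceil(p * q * z ** 2 / margin ** 2)

# Market research example: within 4% at 90% confidence, p unstated.
n_required = sample_size_for_proportion(margin=0.04, confidence=0.90)
print(n_required)
```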
Calculating Sample Size
What size of sample should be taken from a population of packets of butter, when the standard deviation of the weights of packets is 4 g, if the mean weight is to be estimated to within 0.5 g with 95% confidence?
1) Use inverse norm to find the Z value (σ=1, μ=0)
2) n = (σz/e)²
Sample size for population proportion
Radio Sport wishes to conduct an opinion poll on whether the captain of the New Zealand netball team should be replaced. The degree of confidence required for this poll is 95%. What size sample should be used to obtain the percentage to within 5% accuracy?
1) Use inverse norm to find the Z value (σ=1, μ=0)
2) n = pqz²/e²
Sample size for proportion and sample mean
• An opinion poll with a level of confidence of 95% and an estimated value of p of 0.5 has a margin of error of 4%.
How many people would have taken part in the poll?
• A sample of containers of car parts has a mean weight of 40kg and a standard deviation of 5 kg. How many containers would need to have been in the sample to ensure, at the 95% level of confidence, that the sample mean was within 0.5kg of the population mean?
Confidence Interval Revision
• Sample mean μ
• Sample proportion p
• Difference of means μ1 − μ2
• Margin of error is half of the width of the confidence interval
• Sample size for sample mean: n = (σz/e)²
• Sample size for sample proportion: n = pqz²/e²
• Sample size for difference of means when the two σ and n are the same: n = 2σ²z²/e²
Meaning of confidence interval
• Mean (99%): 99% of such intervals include the population mean.
• Proportion (99%): 99% of such intervals include the population proportion.
• Difference of means (99%): 99% of such intervals include the difference of the two population means.
• Confidence interval for difference of means: If 0 is included in the confidence interval, no difference between the two means is suggested. If 0 is not included in the confidence interval, a difference between the two means is suggested.
Confidence Interval Revision
• Mean
A sample of 120 wire cables is tested. The mean breaking strain was found to be 5.4 tonnes with a standard deviation of 1.3 tonnes. Calculate a 95% confidence interval for the breaking strain for this type of wire cable.
• Proportion
A sample opinion poll of 200 students is taken and 130 students are found to support the idea of extending opening hours of the library. Calculate a 99% confidence interval for the proportion of all students in the school in favour of extending the library hours.
• Difference between two means
A sample of 150 Longlife batteries showed a mean capability of 140 photos and a standard deviation of 12 photos. A sample of 200 Lastshot batteries showed a mean capability of 120 photos and a standard deviation of 8 photos. Find a 95% confidence interval for the difference in the mean lifetime between the two brands of batteries.
Sample size (use solver)
• The owner of a camera shop knows that 65% of the customers return to his store. How large a sample would the shop owner have to take to be 95% confident that the sample proportion is within 5% of the true value?
• What size of sample should be taken from a population of packets of butter, when the standard deviation of the weights of packets is 4 g, if the mean weight is to be estimated to within 0.5 g with 95% accuracy.
We need to know the difference between T = X1 + X2 and Y = 2X

T = X1 + X2
T is the sum of two random variables, which can take different values.
E(T) = E(X1) + E(X2) = 2μ
VAR(T) = VAR(X1) + VAR(X2) = 2σ²
SD(T) = √2 σ

Y = 2X
Y can represent the outcome of X multiplied by 2, ie 2 identical values.
E(Y) = E(2X) = 2E(X) = 2μ
VAR(Y) = VAR(2X) = 2²VAR(X) = 4σ²
SD(Y) = 2σ
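A small check of the difference between T = X1 + X2 and Y = 2X, using a fair six-sided die as an assumed example of X (the die is not part of the notes). Every outcome is listed once, so the population variance of each list is the exact variance of the random variable.

```python
import math
from itertools import product
from statistics import mean, pvariance

die = [1, 2, 3, 4, 5, 6]                                # X: one roll of a fair die
sums = [a + b for a, b in product(die, repeat=2)]       # T = X1 + X2: two independent rolls
doubles = [2 * a for a in die]                          # Y = 2X: one roll, doubled

var_x = pvariance(die)
var_t = pvariance(sums)      # expected to equal 2 * VAR(X)
var_y = pvariance(doubles)   # expected to equal 4 * VAR(X)

print(mean(sums), mean(doubles))                        # both means equal 2 * E(X)
print(round(math.sqrt(var_t), 3), round(math.sqrt(var_y), 3))
```

Both variables have the same mean, but doubling a single value spreads the results out more than adding two independent values.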
Normal Distribution
68% of the data is within 1 standard deviation either side of the mean. Data is likely to be in this region.
95% of the data is within 2 standard deviations either side of the mean. Data is very likely to be in this region.
99% of the data is within 3 standard deviations either side of the mean. Data is almost certain to be in this region.
T is the outcome of the same variable multiplied by n.
T = nX
E(T) = nμ
VAR(T) = n²σ²
SD(T) = nσ

T is the sum of n independent random variables which might take different values.
T = X1 + X2 + X3 + ... + Xn
E(T) = nμ
VAR(T) = nσ²
SD(T) = σ√n
The mean petrol usage for a car is 7 litres per day. The standard deviation is 0.3 litres. The cost of petrol is $1.96 per litre. What are the mean and SD of the cost of petrol per day?
25 people in a lift. They have a mean weight of 65kg and a SD of 7kg. Find the mean and SD of the load.
The apples in the baskets have a mean weight of 1.2g each and a SD of 0.3g each. Find the mean and SD of the weight of a basket of 20 apples.
1 kg of apples costs $1.2. A basket of apples produced from ABC factory has a mean weight of 2.5kg and a SD of 3 kg. What are the mean and SD of the cost of one basket of apples?