Sampling (Probability and Statistics)
8/13/2019 Sampling( probability and statistics)
FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTION
20th November
Previous lecture: Sampling, its methods and errors
Today's lecture:
1. Sample mean, sample variance, sample standard deviation
2. Data display and graphical methods:
   Box-and-whisker plot (boxplot)
   Quantile plot
   Detection of deviation from normality
   Normal quantile-quantile plot
   Normal probability plotting
3. Sampling distributions
   Sampling distribution of means
   Central Limit Theorem
   Sampling distribution of the difference between two averages
4. t-distribution
5. F-distribution
WHAT IS A SAMPLING DISTRIBUTION?
Sampling is defined as the process of selecting a number of observations (subjects) from all the observations (subjects) of a particular group or population.
A sampling distribution is the frequency (probability) distribution of a statistic computed over many samples of the same size drawn from the same population.
When the statistic is the sample mean, it is called the sampling distribution of the mean.
FEATURES OF A SAMPLING DISTRIBUTION
The four features of a sampling distribution are:
1) The statistic of interest (proportion, SD, or mean)
2) Random selection of the sample
3) Size of the random sample (very important)
4) The characteristics of the population being sampled
1. SAMPLE MEAN, SAMPLE VARIANCE, SAMPLE STANDARD DEVIATION
The sample mean $\bar{X}$ from a group of observations is an estimate of the population mean $\mu$. Given a sample of size n, consider n independent random variables $X_1, X_2, \ldots, X_n$, each corresponding to one randomly selected observation. Each of these variables has the distribution of the population, with mean $\mu$ and standard deviation $\sigma$.
By the properties of means and variances of random variables, the mean and variance of the sample mean are:
$$\mu_{\bar{X}} = \mu, \qquad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$
EXAMPLE
Although the mean of the distribution of $\bar{X}$ is identical to the mean of the population distribution, its variance, $\sigma^2/n$, is much smaller for large sample sizes.
For example, suppose the random variable X records a randomly selected student's score on a national test, where the population distribution of the score is normal with mean 70 and standard deviation 5 (N(70, 5)). Given a simple random sample (SRS) of 200 students, the distribution of the sample mean score has mean 70 and standard deviation $5/\sqrt{200} = 5/14.14 = 0.35$.
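The claim that $\bar{X}$ has mean 70 and standard deviation about 0.35 can be checked by simulation. This is a minimal Python sketch (standard library only), assuming independent draws from N(70, 5); the repetition count and the seed are arbitrary choices, not part of the example.

```python
import random
import statistics
from math import sqrt

MU, SIGMA, N = 70, 5, 200   # population N(70, 5) and sample size from the example
REPS = 2_000                # number of simulated samples (arbitrary choice)

rng = random.Random(42)     # fixed seed for a reproducible run

# Draw REPS independent samples of size N and keep each sample mean.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

theory_se = SIGMA / sqrt(N)  # sigma / sqrt(n), about 0.35
print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (theory {MU})")
print(f"SD of sample means:   {statistics.stdev(sample_means):.3f} (theory {theory_se:.3f})")
```

Increasing `N` shrinks the printed SD in proportion to $1/\sqrt{n}$, while the mean of the sample means stays near 70.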
CHARACTERISTICS OF SAMPLING DISTRIBUTION
Central Limit Theorem: When random samples of size n are taken from a population, the distribution of the sample means approaches the normal distribution as n increases.
When the sample size is 30 or more, the sampling distribution of the mean is usually treated as approximately normal. So as the sample size increases to 30 or more, the sampling distribution resembles a normal distribution; therefore, the size of the random sample is a very important feature of a sampling distribution.
CENTRAL LIMIT THEOREM
Given:
1. The random variable x has a distribution (which may or may not be normal) with mean $\mu$ and standard deviation $\sigma$.
2. Samples all of the same size n are randomly selected from the population of x values.
Conclusions:
1. The distribution of sample means $\bar{x}$ will, as the sample size increases, approach a normal distribution.
2. The mean of the sample means will be the population mean $\mu$.
3. The standard deviation of the sample means will be $\sigma/\sqrt{n}$.
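The three conclusions can be illustrated by simulation even when the population is far from normal. This sketch assumes, purely for illustration, an exponential population with rate 1 (so $\mu = \sigma = 1$); the sample sizes, repetition count and seed are arbitrary choices. For each n it reports the mean and SD of the simulated sample means next to $\sigma/\sqrt{n}$, plus the share of sample means within 1.96 standard errors of $\mu$, which would be 0.95 for an exactly normal sampling distribution.

```python
import random
import statistics
from math import sqrt

# Assumed population for illustration: exponential with rate 1,
# which is clearly non-normal (right-skewed) and has mu = sigma = 1.
MU = SIGMA = 1.0
REPS = 4_000
rng = random.Random(1)

def sample_means(n):
    """Means of REPS random samples, each of size n, from the exponential population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(REPS)]

results = {}
for n in (2, 10, 40):
    means = sample_means(n)
    se = SIGMA / sqrt(n)
    # Share of sample means within 1.96 standard errors of mu;
    # for an exactly normal sampling distribution this would be 0.95.
    coverage = sum(abs(m - MU) <= 1.96 * se for m in means) / REPS
    results[n] = (statistics.fmean(means), statistics.stdev(means), se, coverage)
    print(f"n={n:>2}: mean={results[n][0]:.3f}  SD={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={se:.3f}  within 1.96 SE={coverage:.3f}")
```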
PRACTICE QS.
A1. The mean of the sampling distribution of the mean is the mean of the population
from which the scores were sampled, in this case 14.
A2. The population has a mean of 30 and a standard deviation of 6. The sample size of your sampling distribution is N = 9. What is the variance of the sampling distribution of the mean? (Answer: $\sigma^2/N = 36/9 = 4$.)
A3. The standard error is the standard deviation of the population divided by the square root of N. In this case, 12/4 = 3.
A4. According to the central limit theorem, regardless of the shape of the parent
population, the sampling distribution of the mean approaches a normal distribution
as N increases. In this case, a sample size of 30 is sufficiently large to cause the
sampling distribution of the mean to look about normal.
A5. This problem is asking about the sampling distribution of the mean: mean = 75, SD = $10/\sqrt{25} = 10/5 = 2$, skew = about 0, because the central limit theorem states that the sampling distribution of the mean is about normal for a large enough N.
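The arithmetic in A2, A3 and A5 can be reproduced with a small helper. This is a sketch; the helper name `standard_error` is hypothetical, and the sample size N = 16 for A3 is an assumption inferred from the answer's divisor of 4.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# A2: sigma = 6, N = 9 -> variance of the sampling distribution = sigma**2 / N
a2_variance = 6**2 / 9
# A3: sigma = 12; N = 16 is assumed so that sqrt(N) = 4, as in the answer
a3_se = standard_error(12, 16)
# A5: mu = 75, sigma = 10, N = 25 -> mean 75, SD 10 / 5
a5_mean, a5_sd = 75, standard_error(10, 25)
print(a2_variance, a3_se, a5_mean, a5_sd)
```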
PRACTICAL RULES COMMONLY USED:
1. For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. The approximation gets better as the sample size n becomes larger.
2. If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size n (not just values of n larger than 30).
NOTATION
the mean of the sample means: $\mu_{\bar{x}} = \mu$
the standard deviation of the sample means (often called the standard error of the mean): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Figure 5-19: Distribution of 200 digits from Social Security numbers (last 4 digits from 50 students)
Figure 5-20: Distribution of 50 sample means for 50 students
As the sample size increases, the sampling distribution of sample means approaches a normal distribution.
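To see why averaging produces a bell shape, the exact distribution of the mean of 4 digits can be computed by convolution. This sketch assumes each digit is independent and uniformly distributed on 0 to 9, an idealization of the Social Security digits in Figure 5-19 rather than the actual class data; it prints a text histogram of the distribution of the mean.

```python
from collections import Counter

# Assumption for illustration: each digit is independent and uniform on 0..9.
digit_pmf = {d: 0.1 for d in range(10)}

def convolve(p, q):
    """Distribution of X + Y for independent discrete X ~ p and Y ~ q."""
    out = Counter()
    for x, px in p.items():
        for y, py in q.items():
            out[x + y] += px * py
    return dict(out)

# Sum of 4 digits, then divide by 4 to get the distribution of the sample mean.
sum_pmf = digit_pmf
for _ in range(3):
    sum_pmf = convolve(sum_pmf, digit_pmf)
mean_pmf = {s / 4: p for s, p in sorted(sum_pmf.items())}

# Text histogram: one '#' per 0.0025 of probability.
for m, p in mean_pmf.items():
    print(f"{m:5.2f} {'#' * round(p * 400)}")
```

A single digit has a flat distribution, yet the mean of only four digits is already peaked at 4.5 and roughly bell-shaped.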
EXAMPLE 1
Given that the population of men has normally distributed weights with a mean of 172 lb and a standard deviation of 29 lb:
a) If one man is randomly selected, find the probability that his weight is greater than 167 lb.
b) If 12 different men are randomly selected, find the probability that their mean weight is greater than 167 lb.
Solution, part a) (one man randomly selected):
$z = \frac{167 - 172}{29} = -0.17$
$P(x > 167) = P(z > -0.17) = 0.5675$
Solution, part b) (mean of 12 randomly selected men): the sample mean has standard deviation $\sigma_{\bar{x}} = 29/\sqrt{12} \approx 8.37$ lb, so
$z = \frac{167 - 172}{29/\sqrt{12}} = -0.60$
$P(\bar{x} > 167) = P(z > -0.60) = 0.7257$
It is much easier for an individual to deviate from the mean than it is for a group of 12 to deviate from the mean.
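Both parts of Example 1 can be checked with Python's `statistics.NormalDist`. This sketch rounds z to two decimals by default to mirror a printed z-table; the function name `prob_greater` and its `table_rounding` parameter are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal N(0, 1)

def prob_greater(threshold, mu, sigma, n=1, table_rounding=True):
    """P(mean of n draws > threshold) for a normal population N(mu, sigma)."""
    se = sigma / sqrt(n)          # standard deviation of the sample mean
    z = (threshold - mu) / se
    if table_rounding:
        z = round(z, 2)           # mimic a printed z-table (2 decimals)
    return z, 1 - std_normal.cdf(z)

z_a, p_a = prob_greater(167, 172, 29)        # a) one man
z_b, p_b = prob_greater(167, 172, 29, n=12)  # b) mean of 12 men
print(f"a) z = {z_a:.2f}, P = {p_a:.4f}")
print(f"b) z = {z_b:.2f}, P = {p_b:.4f}")
```

With `table_rounding=False`, the unrounded z-scores (about -0.172 and -0.597) give probabilities of roughly 0.568 and 0.725, slightly different from the table-based 0.5675 and 0.7257.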
NON-NORMAL POPULATIONS
What can we say about the shape of the sampling distribution of $\bar{x}$ when the population from which the sample is selected is not normal?
Figure: Baseball Salaries (histogram; horizontal axis: Salary ($1,000's), vertical axis: Frequency)
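The question above can be explored by simulation. This sketch uses a hypothetical right-skewed population (lognormal with parameters 0 and 0.5, an assumption standing in for salary-like data, not the baseball data itself) and measures the skewness of simulated sample means for n = 1, 5 and 30. Skewness near 0 indicates a more symmetric, more normal-looking sampling distribution.

```python
import random
import statistics

def skewness(xs):
    """Moment coefficient of skewness: mean((x - xbar)**3) / sd**3 (population form)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, mu=m)
    return statistics.fmean((x - m) ** 3 for x in xs) / sd**3

rng = random.Random(7)
REPS = 5_000

# Hypothetical right-skewed population: lognormal(mu=0, sigma=0.5).
# This is NOT the baseball salary data; it only mimics a long right tail.
def draw():
    return rng.lognormvariate(0.0, 0.5)

skew_by_n = {}
for n in (1, 5, 30):
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(REPS)]
    skew_by_n[n] = skewness(means)
    print(f"n={n:>2}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness of the sample mean shrinks roughly in proportion to $1/\sqrt{n}$, which is the CLT at work on a non-normal population.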
THE IMPORTANCE OF THE CENTRAL LIMIT THEOREM
When we select simple random samples of size n, the sample means we find will vary from sample to sample. We can model the distribution of these sample means with a probability model that is
$N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
HOW LARGE SHOULD n BE?
For the purpose of applying the central limit theorem, we will consider a sample size to be large when n > 30.
SUMMARY
Population: mean $\mu$; standard deviation $\sigma$; the shape of the population distribution is unknown; the value of $\mu$ is unknown; select a random sample of size n.
Sampling distribution of $\bar{x}$: mean $\mu$; standard deviation $\sigma/\sqrt{n}$ (always true).
By the Central Limit Theorem, the shape of the sampling distribution is approximately normal, that is,
$\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
EXAMPLE
A random sample of n = 64 observations is drawn from a population with mean $\mu = 15$ and standard deviation $\sigma = 4$.
a. $E(\bar{X}) = 15$; $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{64}} = \frac{4}{8} = .5$
b. The shape of the sampling distribution model for $\bar{X}$ is approximately normal (by the CLT) with mean $E(\bar{X}) = 15$ and $SD(\bar{X}) = .5$. The answer depends on the sample size, since $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
GRAPHICALLY
(Figure: shape of population distribution not known)
EXAMPLE (CONT.)
c. For $\bar{x} = 15.5$:
$z = \frac{\bar{x} - E(\bar{X})}{SD(\bar{X})} = \frac{15.5 - 15}{.5} = \frac{.5}{.5} = 1$
This means that $\bar{x} = 15.5$ is one standard deviation above the mean $E(\bar{X}) = 15$.
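Parts a to c of this example reduce to a few lines of arithmetic. This sketch also loops over a few other sample sizes (16, 64 and 256, chosen only for illustration) to show how $SD(\bar{X}) = \sigma/\sqrt{n}$ depends on n.

```python
from math import sqrt

mu, sigma, n = 15, 4, 64
e_xbar = mu                      # E(X-bar) = mu
sd_xbar = sigma / sqrt(n)        # SD(X-bar) = sigma / sqrt(n) = 4 / 8
xbar = 15.5                      # observed sample mean in part c
z = (xbar - e_xbar) / sd_xbar    # how many SDs x-bar lies above E(X-bar)
print(f"E = {e_xbar}, SD = {sd_xbar}, z = {z}")

# SD(X-bar) shrinks as n grows (illustrative sample sizes).
for n_alt in (16, 64, 256):
    print(n_alt, sigma / sqrt(n_alt))
```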
EXAMPLE 2
The probability distribution of 6-month
incomes of account executives has mean
$20,000 and standard deviation $5,000.
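a) A single executive's income is $20,000. Can it be said that this executive's income exceeds 50% of all account executive incomes?
ANSWER: No. P(X < $20,000) cannot be determined, because no information is given about the shape of the distribution of X; $20,000 is the mean, which equals the median (the 50th percentile) only for a symmetric distribution.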
EXAMPLE 3
A sample of size n=16 is drawn from a
normally distributed population with mean
E(x)=20 and SD(x)=8.
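$X \sim N(20, 8)$; $\bar{X} \sim N\left(20, \frac{8}{\sqrt{16}}\right) = N(20, 2)$
a) $P(\bar{X} > 24) = P\left(z > \frac{24 - 20}{2}\right) = P(z > 2) = 1 - .9772 = .0228$
b) $P(16 < \bar{X} < 24) = P\left(\frac{16 - 20}{2} < z < \frac{24 - 20}{2}\right) = P(-2 < z < 2) = .9772 - .0228 = .9544$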
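Because the population is normal, $\bar{X} \sim N(20, 2)$ exactly, and both parts can be computed with `statistics.NormalDist`. A short sketch:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 20, 8, 16
xbar_dist = NormalDist(mu, sigma / sqrt(n))   # X-bar ~ N(20, 2), exact because X is normal

p_a = 1 - xbar_dist.cdf(24)                   # a) P(X-bar > 24)
p_b = xbar_dist.cdf(24) - xbar_dist.cdf(16)   # b) P(16 < X-bar < 24)
print(f"a) {p_a:.4f}  b) {p_b:.4f}")
```

The exact CDF gives about 0.9545 for part b; the table answer .9544 differs in the last digit because it subtracts two rounded table entries.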
EXAMPLE 3 (CONT.)
c. Do we need the Central Limit Theorem to
solve part a or part b?
NO. We are given that the population is
normal, so the sampling distribution of the
mean will also be normal for any sample size n.
The CLT is not needed.
EXAMPLE 4
Battery life X ~ N(20, 10). Guarantee: the average battery life in a case of 24 exceeds 16 hrs. Find the probability that a randomly selected case meets the guarantee.
$E(\bar{x}) = 20$; $SD(\bar{x}) = \frac{10}{\sqrt{24}} = 2.04$; $\bar{X} \sim N(20, 2.04)$
$P(\bar{X} > 16) = P\left(z > \frac{16 - 20}{2.04}\right) = P(z > -1.96) = 1 - .0250 = .9750$
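A sketch of the same calculation with `statistics.NormalDist`, using the unrounded standard error $10/\sqrt{24} \approx 2.04$:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 20, 10, 24        # battery life N(20, 10); a case holds 24 batteries
se = sigma / sqrt(n)             # SD of the case average, about 2.04 hours
xbar_dist = NormalDist(mu, se)   # exactly normal because battery life is normal
p_meets = 1 - xbar_dist.cdf(16)  # P(X-bar > 16): the case meets the guarantee
print(f"SD(X-bar) = {se:.2f}, P(X-bar > 16) = {p_meets:.4f}")
```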
EXAMPLE 5
Cans of salmon are supposed to have a net weight of 6 oz. The canner says that the net weight is a random variable with mean $\mu = 6.05$ oz. and standard deviation $\sigma = .18$ oz. Suppose you take a random sample of 36 cans and calculate the sample mean weight to be 5.97 oz.
Find the probability that the mean weight of the sample is less than or equal to 5.97 oz.
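POPULATION X: AMOUNT OF SALMON IN A CAN
E(X) = 6.05 oz, SD(X) = .18 oz
$\bar{X}$ sampling distribution: $E(\bar{x}) = 6.05$, $SD(\bar{x}) = .18/\sqrt{36} = .18/6 = .03$
By the CLT, the $\bar{X}$ sampling distribution is approximately normal.
$P(\bar{X} \le 5.97) = P\left(z \le \frac{5.97 - 6.05}{.03}\right) = P\left(z \le \frac{-.08}{.03}\right) = P(z \le -2.67) = .0038$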
How could you use this answer?
Suppose you work for a consumer watchdog group. If you sampled the weights of 36 cans and obtained a sample mean $\bar{x} \le 5.97$ oz., what would you think? Since $P(\bar{x} \le 5.97) = .0038$, either:
1. you observed a rare event (recall: 5.97 oz is 2.67 standard deviations below the mean) and the mean fill $E(\bar{x})$ is in fact 6.05 oz. (the value claimed by the canner), or
2. the true mean fill is less than 6.05 oz. (the canner is lying).
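A sketch reproducing the salmon calculation; here the normal model for $\bar{X}$ is the CLT approximation, since the shape of the fill-weight distribution is not given.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.05, 0.18, 36    # canner's claimed mean and SD; 36 cans sampled
se = sigma / sqrt(n)             # 0.18 / 6 = 0.03
z = (5.97 - mu) / se             # about -2.67
p = NormalDist().cdf(z)          # P(X-bar <= 5.97) under the CLT normal approximation
print(f"z = {z:.2f}, P = {p:.4f}")
```

Using the unrounded z of about -2.667 gives roughly 0.0038, matching the table value.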
EXAMPLE 6
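X: weekly income. E(X) = 600, SD(X) = 100.
n = 25; $\bar{X}$ sampling distribution: $E(\bar{x}) = 600$, $SD(\bar{x}) = 100/\sqrt{25} = 100/5 = 20$
$P(\bar{X} \le 550) = P\left(z \le \frac{550 - 600}{20}\right) = P\left(z \le \frac{-50}{20}\right) = P(z \le -2.50) = .0062$
Suspicious of the claim that the average is $600; the evidence is that the average income is less.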
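The income calculation follows the same pattern; a minimal sketch using the CLT normal approximation for $\bar{X}$:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 100, 25      # weekly income: mean 600, SD 100; sample of 25
se = sigma / sqrt(n)             # 100 / 5 = 20
z = (550 - mu) / se              # -2.5
p = NormalDist(mu, se).cdf(550)  # P(X-bar <= 550), CLT approximation
print(f"z = {z}, P = {p:.4f}")
```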
EXAMPLE 7
12% of students at NCSU are left-handed. What is the probability that in a sample of 50 students, the sample proportion that are left-handed is less than 11%?
$E(\hat{p}) = p = .12$; $SD(\hat{p}) = \sqrt{\frac{.12 \times .88}{50}} = .046$
By the CLT, $\hat{p} \sim N(.12, .046)$.
$P(\hat{p} < .11) = P\left(\frac{\hat{p} - .12}{.046} < \frac{.11 - .12}{.046}\right) = P(z < -.22) = .4129$
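For a sample proportion, the standard deviation is $\sqrt{p(1-p)/n}$. A sketch of the normal approximation used above:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.12, 50                     # population proportion and sample size
se = sqrt(p * (1 - p) / n)          # SD of the sample proportion, about 0.046
prob = NormalDist(p, se).cdf(0.11)  # P(p-hat < 0.11), normal approximation via the CLT
print(f"SD(p-hat) = {se:.3f}, P(p-hat < .11) = {prob:.4f}")
```

The unrounded z (about -0.218) gives roughly 0.414; the slide's .4129 comes from rounding z to -.22.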