lecture on sampling distributions

Upload: shahidanahmad

Post on 09-Apr-2018


  • 8/8/2019 Lecture on Sampling Distributions

    1/31

    Sampling Distributions

    Stat 515 Lecture


    12/18/2010 2

    Inching Towards Inference

    Recall that one of our main goals is to make inference about the unknown parameters of the population or distribution, such as the mean $\mu$, the standard deviation $\sigma$, or other summary measures such as the median.

    We now have possible models for the population, provided by the probability distributions (Binomial, Poisson, Normal, Uniform, and others).

    We also know how to compute sample statistics, such as the sample mean and the sample standard deviation; these sample statistics will be used to make inferences about the parameters.


    Sampling as a Random Experiment

    To understand the notion of the sampling distribution of a sample statistic, it is important to realize that the process of taking a sample from a population can be viewed as a random experiment.

    To illustrate this idea, consider a population taking the three values 2, 4, and 5 according to the following probability distribution.

    Probability Function: p(2) = .4, p(4) = .5, p(5) = .1

    You may imagine that 40% of all the values in the population equal 2, 50% equal 4, and 10% equal 5.


    The Population

    [Figure: the population pictured as a collection of 2s, 4s, and 5s]


    Characteristics of the Population

    For this population, we have the parameters:

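    $$\mu = (2)(.4) + (4)(.5) + (5)(.1) = .8 + 2 + .5 = 3.3$$

    $$\sigma^2 = (2 - 3.3)^2(.4) + (4 - 3.3)^2(.5) + (5 - 3.3)^2(.1) = 1.21$$

    $$\sigma = (1.21)^{1/2} = 1.1$$

    Its shape is given by the bar graph below:

    [Figure: bar graph of p(y) at y = 2, 4, 5; horizontal axis from 2 to 5, vertical axis from 0 to 0.6]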
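    As a quick check on the arithmetic, the sketch below recomputes $\mu$, $\sigma^2$, and $\sigma$ for this population in Python. It is only an illustration: the dictionary name `population` and the helper `population_parameters` are our own choices, and exact fractions are used so that floating-point rounding does not obscure the results.

```python
from fractions import Fraction
import math

# Population from the lecture: values 2, 4, 5 with probabilities .4, .5, .1
# (Fractions keep the arithmetic exact).
population = {2: Fraction(4, 10), 4: Fraction(5, 10), 5: Fraction(1, 10)}

def population_parameters(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mu = sum(y * p for y, p in pmf.items())
    var = sum((y - mu) ** 2 * p for y, p in pmf.items())
    return mu, var, math.sqrt(float(var))

mu, var, sigma = population_parameters(population)
print(float(mu), float(var), round(sigma, 4))  # 3.3 1.21 1.1
```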


    Possible Outcomes of Sampling Process

    Now, consider the sampling process of taking n = 2 observations (with replacement) from this population or distribution. The table below lists the possibilities.

    Possible Sample   Probability of Sample   Sample Mean   Sample Variance
    (2, 2)            (.4)(.4) = .16          2             0
    (2, 4)            (.4)(.5) = .20          3             2
    (2, 5)            (.4)(.1) = .04          3.5           4.5
    (4, 2)            (.5)(.4) = .20          3             2
    (4, 4)            (.5)(.5) = .25          4             0
    (4, 5)            (.5)(.1) = .05          4.5           .5
    (5, 2)            (.1)(.4) = .04          3.5           4.5
    (5, 4)            (.1)(.5) = .05          4.5           .5
    (5, 5)            (.1)(.1) = .01          5             0
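    The table can also be generated mechanically. The following Python sketch, which assumes only the standard library, enumerates the 9 ordered samples of size 2, multiplies the probabilities of the two independent draws, and computes each sample's mean and variance (with the n - 1 divisor, via `statistics.variance`). The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, variance

# Population from the lecture: p(2) = .4, p(4) = .5, p(5) = .1
population = {2: Fraction(4, 10), 4: Fraction(5, 10), 5: Fraction(1, 10)}

rows = []
for sample in product(population, repeat=2):  # ordered samples (x1, x2)
    # Independent draws (with replacement): multiply the probabilities.
    prob = population[sample[0]] * population[sample[1]]
    # statistics.variance divides by n - 1, matching S^2 in the lecture.
    rows.append((sample, prob, mean(sample), variance(sample)))

for sample, prob, xbar, s2 in rows:
    print(sample, float(prob), xbar, s2)
```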


    Some Points about the Preceding Table

    Since we are sampling with replacement, the two observations are independent, so to obtain the probability of each possible sample we simply multiply the probabilities of the individual observations (think of a tree diagram!).

    The 9 possible samples represent the elementary events of the experiment of taking a sample of size 2 from the population or distribution.

    The sample mean ($\bar{X}$) and the sample variance ($S^2$) are computed the usual way. For example, for the second sample, (2, 4), we have

    $$S^2 = \frac{(2-3)^2 + (4-3)^2}{2-1} = \frac{1 + 1}{1} = 2$$


    Sample Statistics as Random Variables

    Since the sample mean and the sample variance are numerical characteristics of each of the possible samples, they can be viewed as random variables in this sampling experiment. Therefore, we can obtain the probability distributions of the sample mean and the sample variance.

    These probability distributions are called sampling distributions.

    Thus we will have the sampling distribution of the sample mean, as well as that of the sample variance.


    Sampling Distribution of the Sample Mean

    From the earlier table, we can construct the probability distribution of the sample mean, now called the sampling distribution of the sample mean. It is given by the following table.

    Xbar   P(Xbar)           Xbar * P(Xbar)   (Xbar - 3.3)^2 * P(Xbar)
    2      .16               0.32             .2704
    3      .20 + .20 = .40   1.20             .0360
    3.5    .04 + .04 = .08   0.28             .0032
    4      .25               1.00             .1225
    4.5    .05 + .05 = .10   0.45             .1440
    5      .01               0.05             .0289
    Sums   1.00              3.3              .6050
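    A minimal sketch of the same construction in Python: group the 9 samples by their sample mean, add up their probabilities, and then compute the mean, variance, and standard error of the resulting distribution. Names such as `dist` are hypothetical, and the calculation assumes sampling with replacement, as in the lecture.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
import math

population = {2: Fraction(4, 10), 4: Fraction(5, 10), 5: Fraction(1, 10)}
n = 2

# P(Xbar = v): add the probabilities of all samples whose mean is v.
dist = defaultdict(Fraction)
for sample in product(population, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]  # independent draws (sampling with replacement)
    dist[Fraction(sum(sample), n)] += prob

mean_xbar = sum(v * p for v, p in dist.items())
var_xbar = sum((v - mean_xbar) ** 2 * p for v, p in dist.items())
se = math.sqrt(float(var_xbar))  # standard error of the sample mean

for v in sorted(dist):
    print(float(v), float(dist[v]))
print(float(mean_xbar), float(var_xbar), round(se, 4))
```

    The exact variance comes out to 121/200, which is the population variance 121/100 divided by n = 2.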


    Graph of the Sampling Distribution of the Sample Mean

    Note that, compared with the original distribution, it is more concentrated near the population mean of 3.3.

    [Figure: bar graph of the sampling distribution of the sample mean based on a sample of size n = 2; horizontal axis: Xbar]


    Parameters of the Sampling Distribution

    Because the sampling distribution is just like any other probability distribution, we can also obtain its mean, variance, and standard deviation.

    Thus, for the sampling distribution of the sample mean, we find the mean to be 3.3, which coincides with the original population mean, while the variance of the sampling distribution of the sample mean turns out to be .605, which equals 1.21/2, the population variance divided by the sample size.

    The standard deviation of the sample mean, called the standard error (SE), is $(.605)^{1/2} = .7778$.


    Recapitulation

    Sampling from a probability distribution or population can be viewed as a random experiment, and the elementary outcomes are the possible samples.

    Sample statistics, such as the sample mean, can be viewed as random variables, and as such have their associated probability distributions, which are called sampling distributions.

    Like any probability distribution, a sampling distribution has a mean and a variance.

    The standard deviation of the sampling distribution is called the standard error (SE).


    Sampling Distribution of the Sample Mean

    The mean of the sampling distribution of the sample mean equals the population mean.

    The variance of the sampling distribution of the sample mean equals the population variance divided by the sample size.

    These two properties always hold for the sampling distribution of the sample mean when sampling with replacement.
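    For readers who want to see why these two properties hold, here is a short derivation. It assumes the observations $X_1, \dots, X_n$ are independent (as they are when sampling with replacement), each with mean $\mu$ and variance $\sigma^2$.

    $$
    \begin{aligned}
    E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu,\\
    \operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.
    \end{aligned}
    $$

    The first line uses only linearity of expectation; the second uses independence, since the variance of a sum of independent variables is the sum of their variances.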


    Obtaining Sampling Distributions

    In the example considered, we obtained the sampling distribution of the sample mean by enumerating all the possible samples that could arise.

    However, this method is not feasible if the sample size is large. For instance, if n = 10, there are $(3)(3)\cdots(3) = 3^{10} = 59{,}049$ possible samples, and complete enumeration is no longer practical.

    How, then, do we obtain sampling distributions?


    Some Methods for Obtaining Sampling Distributions of Statistics

    1. Complete enumeration, if possible.
    2. Computer simulation, or the Monte Carlo method. In this method the computer generates many, many samples and then constructs the probability histogram of the values of the statistic of interest. This provides an empirical approximation.
    3. Theoretical results; for instance, when sampling from a Bernoulli population, the number of successes is binomially distributed.
    4. Theoretical approximations, such as the Central Limit Theorem or the de Moivre approximation.


    Illustrating the Monte Carlo Method

    We illustrate the use of the simulation, or Monte Carlo, method by approximating the sampling distribution of the sample mean based on n = 10 observations from the population considered earlier, which has:

    p(2) = .4, p(4) = .5, p(5) = .1

    We generate 500 samples of size n = 10 from this population, and for each sample we compute the sample mean.

    This simulation was done using Minitab.
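    The lecture's simulation was run in Minitab. As an illustration only, a comparable Monte Carlo experiment can be sketched with Python's standard `random` module. The seed value 515 and the helper name `simulate_sample_means` are arbitrary assumptions, so the numbers this produces will differ somewhat from the Minitab results reported in this lecture.

```python
import random
import statistics

# Population from the lecture: p(2) = .4, p(4) = .5, p(5) = .1
values = [2, 4, 5]
weights = [0.4, 0.5, 0.1]

def simulate_sample_means(num_samples=500, n=10, seed=515):
    """Draw num_samples samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.mean(rng.choices(values, weights=weights, k=n))
            for _ in range(num_samples)]

means = simulate_sample_means()
print(round(statistics.mean(means), 4), round(statistics.stdev(means), 4))
```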


    First 10 of the 500 Generated Samples

    The table below shows the first 10 samples of size n = 10 that were generated from the population, together with their corresponding sample means. The population is:

    y   p(y)
    2   0.4
    4   0.5
    5   0.1

    Sample  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  Sample Mean
    1       4   2   2   5   4   2   4   2   2   4    3.1
    2       4   4   2   2   5   4   4   4   2   2    3.3
    3       4   2   2   2   2   2   4   4   4   5    3.1
    4       2   2   4   2   2   2   2   2   2   2    2.2
    5       2   5   2   4   4   4   2   2   5   2    3.2
    6       4   4   4   2   2   4   2   2   2   4    3.0
    7       4   4   4   4   2   2   5   2   2   4    3.3
    8       2   2   2   4   2   2   4   2   2   2    2.4
    9       2   4   2   2   4   5   5   2   2   2    3.0
    10      2   5   4   4   2   2   4   4   4   2    3.3


    Relative Frequency Histogram of the 500 Sample Means

    [Figure: "Simulated Sampling Distribution of the Sample Mean Based on 10 Observations when Sampling from the Population p(2) = .4, p(4) = .5, and p(5) = .1"; vertical axis: Relative Frequency (in %)]
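    To see how a relative frequency histogram like this one is built, the following Python sketch (a stand-in for the Minitab output, using an arbitrary seed of our own) tabulates the percentage of simulated sample means at each value and draws a crude text bar for each.

```python
import random
from collections import Counter

values, weights = [2, 4, 5], [0.4, 0.5, 0.1]
rng = random.Random(515)  # arbitrary seed, for reproducibility only

# 500 simulated sample means, each based on n = 10 draws with replacement
means = [sum(rng.choices(values, weights=weights, k=10)) / 10 for _ in range(500)]

# Relative frequency (in %) of each distinct sample mean, with a text bar
freq = Counter(round(m, 1) for m in means)
for value in sorted(freq):
    pct = 100 * freq[value] / len(means)
    print(f"{value:4.1f} {pct:5.1f}% " + "#" * round(pct))
```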


    Points to Ponder

    This relative frequency histogram of the simulated sample means serves as an approximation to the sampling distribution of the sample mean when n = 10 and when sampling from the given population.

    Notice that the sample means are now clustered around the population mean of 3.3, and furthermore, the histogram is almost bell-shaped.

    The histogram also shows that the chance of getting a sample of size n = 10 whose sample mean is less than 2.5 or greater than 4.5 is rather small.


    When the mean of the 500 sample means is computed, it turns out to be 3.3094. [Their median is exactly 3.30!] Recall that the population mean is 3.30.

    The standard deviation of the 500 sample means turns out to be 0.3497. Recall that the population standard deviation is $(1.21)^{1/2} = 1.1$, so

    $$\frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{10}} = \frac{1.1}{3.1623} \approx .3479.$$


    We therefore note that the mean of the simulated sample means is very close to the population mean, and the standard deviation of the simulated sample means is also very close to the population standard deviation divided by the square root of the sample size.

    Indeed, when sampling with replacement, we always have the theoretical results:

    $$\text{Mean: } \mu_{\bar{X}} = \mu, \qquad \text{Std. Error: } \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
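    The formula $\sigma/\sqrt{n}$ makes it easy to see how the standard error shrinks as the sample size grows. A small Python sketch, using $\sigma = 1.1$ from our population; the helper name `standard_error` is hypothetical:

```python
import math

sigma = 1.1  # population standard deviation from the lecture

def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n), for independent draws."""
    return sigma / math.sqrt(n)

for n in (1, 2, 10, 30, 100):
    print(n, round(standard_error(sigma, n), 4))
```

    For n = 2 and n = 10 this gives about .7778 and .348, in line with the values obtained by enumeration and by the formula above.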


    An Important Result About the Sampling Distribution of the Sample Mean

    When the population being sampled is a normal population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean is also normal, with mean $\mu$ and standard error $\sigma/\sqrt{n}$, for any sample size n.

    When the population is not normal, however, the sampling distribution of the sample mean need not be normal. But for large samples we have the following result.


    Central Limit Theorem

    If a random sample of size n is taken from a population or distribution with mean $\mu$ and standard deviation $\sigma$, and if the sample size is large (as a rule of thumb, n > 30), then the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation (or standard error) $\sigma/\sqrt{n}$. That is,

    $$\bar{X} \text{ is approx. } N\!\left(\mu, \frac{\sigma^2}{n}\right).$$


    Uses of the Central Limit Theorem

    Because of this approximation, when computing probabilities associated with the sample mean, we can use the approximation below, which is based on the standard normal distribution. Here $Z \sim N(0,1)$ denotes the standard normal variable.

    $$P(a \le \bar{X} \le b) \approx P\!\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right)$$
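    The standardization above translates directly into code. Here is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the function name `prob_mean_between` is our own, and the result is only as good as the normal approximation itself.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """CLT approximation to P(a <= Xbar <= b) using standardized endpoints."""
    se = sigma / sqrt(n)
    z = NormalDist()  # standard normal, Z ~ N(0, 1)
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)
```

    For example, with $\mu = 0$, $\sigma = 1$, and n = 1, the interval from -1.96 to 1.96 gives approximately 0.95.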


    Applications of the CLT

    Situation 1: Suppose we take a sample of size n = 30 from the population described by the probability function p(2) = 0.4, p(4) = 0.5, p(5) = 0.1. This is the population we were using earlier.

    Question 1: Find the approximate probability that the sample mean is between 3.1 and 3.5.

    Question 2: Find the approximate probability that the sample mean is less than 2.6.
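    One way to work Situation 1, sketched in Python under the CLT approximation, with $\mu = 3.3$, $\sigma = 1.1$, and n = 30 taken from the lecture. The sketch models $\bar{X}$ directly as a normal distribution with standard deviation $\sigma/\sqrt{n}$, which is equivalent to standardizing to Z. It is an illustration, not an official answer key.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.3, 1.1, 30             # population from the lecture, sample size 30
xbar = NormalDist(mu, sigma / sqrt(n))  # CLT: Xbar is approx. N(mu, sigma^2 / n)

q1 = xbar.cdf(3.5) - xbar.cdf(3.1)      # P(3.1 <= Xbar <= 3.5)
q2 = xbar.cdf(2.6)                      # P(Xbar < 2.6)
print(round(q1, 4), round(q2, 5))
```

    Under these assumptions, Question 1 comes out to roughly 0.68, and Question 2 to well under 0.001.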


    Applications continued

    Situation 2: The systolic blood pressure population data set has mean $\mu$ = 114.58 and standard deviation $\sigma$ = 14.06. Its distribution is not normal, as it is right-skewed. Suppose we take a random sample of n = 50 people and obtain the sample mean of their systolic blood pressures.

    Question 1: What is the approximate probability that this sample mean will exceed 120?


    Continued ...

    Question 2: What value of A makes the probability that the sample mean of the systolic blood pressures of a sample of size 50 is greater than A equal to 0.95?
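    A similar sketch for Situation 2, again relying on the CLT approximation and Python's `statistics.NormalDist`. For Question 2, P($\bar{X}$ > A) = 0.95 means that A is the 5th percentile of the approximate distribution of $\bar{X}$, which `inv_cdf(0.05)` returns.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 114.58, 14.06, 50
xbar = NormalDist(mu, sigma / sqrt(n))  # CLT approximation for the sample mean

q1 = 1 - xbar.cdf(120)   # P(Xbar > 120)
a = xbar.inv_cdf(0.05)   # P(Xbar > A) = 0.95  <=>  P(Xbar <= A) = 0.05
print(round(q1, 4), round(a, 2))
```

    With these inputs, Question 1 evaluates to about 0.003 and A to about 111.3.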


    Sampling a Bernoulli Population

    A Bernoulli population is one where there are only two possible values or outcomes: a Success, denoted by the value $X = 1$, and a Failure, denoted by the value $X = 0$. The probability of a Success is denoted by p.

    For such a population we have:

    Mean = $\mu = p$;

    Variance = $\sigma^2 = p(1-p)$.

    Consider now taking a sample of size n from this population and letting $\hat{p}$ equal the proportion of successes in the sample.


    Sample Proportion

    Because the Bernoulli observations are either 0 or 1 (with 1 representing success), the sample proportion can be written as the sample mean of the observations:

    $$\hat{p} = \frac{\text{Number of "Successes"}}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X}.$$
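    Since a 0/1 coding turns counting successes into summing, the sample proportion is literally a sample mean. A tiny Python illustration with made-up observations (the data are hypothetical):

```python
from statistics import mean

observations = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # hypothetical data: 1 = Success
successes = sum(observations)                  # count of 1s
p_hat = successes / len(observations)          # proportion of successes
print(successes, p_hat, mean(observations))    # 4 0.4 0.4
```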


    Sampling Distribution of the Sample Proportion

    Since the sample proportion is the sample mean of the observations from a Bernoulli population, it follows from the Central Limit Theorem that the sampling distribution of the sample proportion, when the sample size is large (that is, n > 30), is approximately normal with mean p and SE $[p(1-p)/n]^{1/2}$. Writing $q = 1 - p$:

    $$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}, \qquad \hat{p} \text{ is approx. } N\!\left(p, \frac{pq}{n}\right).$$


    An Application

    Situation: One of the ways most Americans relieve stress is to reward themselves with sweets. According to one study, 46% admit to overeating sweet foods when stressed. Suppose that the 46% figure is correct and we take a random sample of n = 100 Americans and ask them whether they overeat sweets when they are stressed out.

    Question 1: What is the probability that the proportion who overeat sweets in this sample exceeds 0.50?
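    A sketch of how the CLT approximation could be applied to this question in Python, assuming p = 0.46 and n = 100 as stated and using no continuity correction (the lecture does not mention one):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.46, 100  # assumed true proportion and sample size from the question

se = sqrt(p * (1 - p) / n)      # SE of the sample proportion
phat_dist = NormalDist(p, se)   # CLT: p-hat is approx. N(p, p(1 - p) / n)
prob = 1 - phat_dist.cdf(0.50)  # P(p-hat > 0.50), no continuity correction
print(round(se, 4), round(prob, 4))
```

    Under these assumptions the SE is about 0.0498 and the approximate probability is about 0.21.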