stats lecture 06 sampling distributions

Upload: katherine-sauer

Post on 06-Apr-2018


  • 8/3/2019 Stats Lecture 06 Sampling Distributions


    Sampling Distributions for Means and Proportions

    Quantitative Methods for Economics

    Dr. Katherine Sauer

    Metropolitan State College of Denver


    Chapter Overview:

    I. Sampling Distributions: Means

    II. The Central Limit Theorem

    III. The Normal Distribution

    IV. Sampling Distributions: Proportions

    V. Desirable Properties of Estimators


    We can use sample statistics to infer things about the population parameters.

    Sometimes the sample statistic (e.g., the mean) will be close to the population parameter; sometimes it will not.

    Recall: Greek letters are used for the population, and English letters are used for the corresponding sample characteristic (e.g., $\mu$ for the population mean, $\bar{x}$ for the sample mean).


    Let's start with some review.

    Suppose our population consists of five numbers.

    3, 1, 5, 6, 2

    Calculate the population mean, variance, and standard deviation.

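    $$\mu = \frac{3 + 1 + 5 + 6 + 2}{5} = 3.4$$

    $$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{17.2}{5} = 3.44$$

    $$\sigma = \sqrt{3.44} = 1.8547$$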
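    As a quick check of the review calculation, here is a minimal Python sketch using the standard `statistics` module. `pvariance` and `pstdev` divide by N, matching the population formulas above; `variance` and `stdev` would divide by n - 1 instead.

```python
from statistics import fmean, pstdev, pvariance

# The five-number population from the review example
population = [3, 1, 5, 6, 2]

mu = fmean(population)            # population mean
sigma2 = pvariance(population)    # population variance (divides by N)
sigma = pstdev(population)        # population standard deviation

print(f"mu = {mu}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.4f}")
```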


    I. Sampling Distributions: Means

    Suppose we want a sample of size 2. In the table, list all possible combinations of samples of size 2.

    Repeat for a sample of size 3.

        Sample of 2    Sample of 3
        3,1            3,1,5
        3,5            3,1,6
        3,6            3,1,2
        3,2            3,5,6
        1,5            3,5,2
        1,6            3,6,2
        1,2            1,5,6
        5,6            1,5,2
        5,2            1,6,2
        6,2            5,6,2
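    The list of samples can also be generated rather than written out by hand. This sketch assumes, as the table does, that samples are unordered and drawn without replacement, which is exactly what `itertools.combinations` produces; there are C(5, 2) = C(5, 3) = 10 samples of each size.

```python
from itertools import combinations

population = [3, 1, 5, 6, 2]

# combinations() yields each unordered sample exactly once (no repeated
# elements), in the same order as the rows of the table above
samples_of_2 = list(combinations(population, 2))
samples_of_3 = list(combinations(population, 3))

print(len(samples_of_2), len(samples_of_3))  # 10 10
for pair, triple in zip(samples_of_2, samples_of_3):
    print(pair, triple)
```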


    Now, calculate each sample's mean.

        Sample of 2   Mean   Sample of 3   Mean
        3,1           2      3,1,5         3
        3,5           4      3,1,6         3.33
        3,6           4.5    3,1,2         2
        3,2           2.5    3,5,6         4.67
        1,5           3      3,5,2         3.33
        1,6           3.5    3,6,2         3.67
        1,2           1.5    1,5,6         4
        5,6           5.5    1,5,2         2.67
        5,2           3.5    1,6,2         3
        6,2           4      5,6,2         4.33

    Notice that the sample means vary around the population mean of 3.4:

    - from 1.5 to 5.5 for samples of size n = 2
    - from 2 to 4.67 for samples of size n = 3

    Depending on the sample chosen, the sample mean may or may not be a good estimate of the population mean.


    Let's calculate the mean of all the sample means ($\mu_{\bar{x}}$) and the standard deviation of the sample means ($\sigma_{\bar{x}}$). Because each list contains every possible sample, the standard deviation divides by the number of samples (10).

        Sample size   Mean of sample means   Std. dev. of sample means
        n = 2         3.4                    1.1358
        n = 3         3.4                    0.7572

    The mean of all the sample means is the same as the population mean.

    The standard deviation of the sample means decreases as the sample size increases.
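    A sketch that reproduces both rows of the table by brute force, reusing the five-number population. `statistics.pstdev` is used because the list holds every possible sample mean, so it is treated as a complete population of means.

```python
from itertools import combinations
from statistics import fmean, pstdev

population = [3, 1, 5, 6, 2]

results = {}
for n in (2, 3):
    # Mean of every possible sample of size n; each sample is equally likely
    sample_means = [fmean(s) for s in combinations(population, n)]
    # Mean of all the sample means (should equal the population mean)
    mean_of_means = fmean(sample_means)
    # pstdev divides by the number of values: the list holds *every* sample mean
    sd_of_means = pstdev(sample_means)
    results[n] = (mean_of_means, sd_of_means)
    print(f"n = {n}: mean = {mean_of_means:.4f}, sd = {sd_of_means:.4f}")
```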


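    The standard deviation of all the sample means is called the standard error of the mean, $\sigma_{\bar{x}}$.

    It can be calculated directly from the samples (as we just did) or, when the population standard deviation is known, by using the formula:

    $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

    The term $\sqrt{\frac{N-n}{N-1}}$ is the finite population correction factor. When N is large relative to n, this factor is approximately 1.

    - if the sample size is less than 5% of the population size, you don't need the correction factor, and the formula reduces to $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ (the correction matters only for small, finite populations)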
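    Plugging the review values into the formula reproduces the standard errors found by listing every sample. A minimal sketch; `standard_error_mean` is a hypothetical helper name, and leaving out `N` drops the correction factor, as you would when n is less than 5% of N.

```python
from math import sqrt

def standard_error_mean(sigma, n, N=None):
    """sigma / sqrt(n), times the finite population correction when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

sigma = sqrt(3.44)  # population standard deviation from the review example (1.8547)
print(round(standard_error_mean(sigma, 2, N=5), 4))  # 1.1358
print(round(standard_error_mean(sigma, 3, N=5), 4))  # 0.7572
```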


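    The difference between a point estimate and the population parameter it estimates (for example, $\bar{x} - \mu$) is called the sampling error. For instance, the sample {3, 1} has $\bar{x} = 2$, so its sampling error is 2 - 3.4 = -1.4.

    If every point estimate were the same as the population parameter, there would be no sampling error and the standard error would be zero.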


    A probability distribution is a list of every possible outcome with its corresponding probability.

    For our example, there are 10 equally likely samples of size 2, so the probability of each sample being selected is 0.10.

    Let's plot the probability distribution of our sample means.

    Step 1: Construct a frequency distribution table.

    - 3 intervals are probably appropriate
    - 1.5 to less than 3, 3 to less than 4.5, 4.5 to less than 6

        Interval         Frequency
        1.5 ≤ x̄ < 3      3
        3 ≤ x̄ < 4.5      5
        4.5 ≤ x̄ < 6      2


    Step 2: Calculate the relative frequencies.

    Relative frequency = (probability of a particular sample) × frequency = 0.10 × frequency

        Interval         Frequency   Relative Frequency
        1.5 ≤ x̄ < 3      3           0.3
        3 ≤ x̄ < 4.5      5           0.5
        4.5 ≤ x̄ < 6      2           0.2

    Step 3: Plot the probability histogram.

    [Figure: Distribution of Sample Means, a histogram of relative frequency (vertical axis, 0 to 0.6) against sample mean interval (horizontal axis)]
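    The frequency and relative-frequency tables can be built in a few lines. A sketch assuming half-open intervals (lower bound included, upper bound excluded), which matches the "1.5 to less than 3" wording; each of the 10 equally likely samples contributes 0.10 to its interval.

```python
from itertools import combinations
from statistics import fmean

population = [3, 1, 5, 6, 2]
sample_means = [fmean(s) for s in combinations(population, 2)]

# Half-open intervals: "1.5 to less than 3", "3 to less than 4.5", ...
intervals = [(1.5, 3.0), (3.0, 4.5), (4.5, 6.0)]
p_each = 1 / len(sample_means)  # each of the 10 samples has probability 0.10

freqs = [sum(low <= m < high for m in sample_means) for low, high in intervals]
rel_freqs = [f * p_each for f in freqs]

for (low, high), f, r in zip(intervals, freqs, rel_freqs):
    print(f"{low} <= x-bar < {high}: frequency {f}, relative frequency {r:.1f}")
```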


    Notice that, even for this very small population and sample size, the probability distribution is already tending toward the bell shape of the Normal distribution.


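    II. The Central Limit Theorem

    The Central Limit Theorem says that the probability distribution of the sample means, for samples of size 30 or greater selected from any population whose mean $\mu$ and variance $\sigma^2$ are known, approaches a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

    The distribution of sample means for sample sizes of $n \ge 30$:

    $$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

    Here the second argument of N is the standard deviation, not the variance.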
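    A simulation can make the theorem concrete. This sketch assumes a skewed population that does not appear in the lecture (an exponential distribution with μ = σ = 2), draws many samples of n = 36, and compares the mean and standard deviation of the sample means with μ and σ/√n. The seed and the number of repetitions are arbitrary choices, so the printed values shift slightly if they change.

```python
import random
from statistics import fmean, pstdev

random.seed(42)  # fixed seed so the run is repeatable

# Assumed population for illustration (not from the slides): an exponential
# distribution, which is strongly skewed, with mean 2 and standard deviation 2
mu, sigma = 2.0, 2.0
n = 36                 # size of each sample
num_samples = 20_000   # how many samples to draw

sample_means = [
    fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

print(f"mean of sample means: {fmean(sample_means):.3f}  (CLT: {mu})")
print(f"sd of sample means:   {pstdev(sample_means):.3f}  (CLT: {sigma / n ** 0.5:.3f})")
```

    A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.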


    In addition, the same result applies to small samples from Normal populations when the population variance is known:

    $$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

    for samples of any size from a Normal distribution with known variance. (For a Normal population the result is exact, not an approximation.)

    The Central Limit Theorem allows us to calculate

    - probabilities regarding sample means
    - the limits that contain various percentages of sample means

    (Later, it will also help us construct confidence intervals.)


    III. The Normal probability distribution

    It has long been recognized that large numbers of measurements, when sorted and plotted in a histogram, tend to form a bell shape.

    This bell-shaped curve is the Normal probability distribution curve.

    Formula:

    $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

    This formula traces out a bell curve that is symmetrical around the mean, $\mu$.

    The total area under the curve is 1.

    - true of any probability distribution
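    The formula can be evaluated directly to confirm the two properties just stated. A sketch; `normal_pdf` is a hypothetical helper, μ = 500 and σ = 10 are arbitrary values (the same ones used in the email example), and the area is approximated numerically with the midpoint rule rather than integrated exactly.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the Normal curve with mean mu and standard deviation sigma at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Total area under the curve, approximated with the midpoint rule over
# mu +/- 8 sigma (the area beyond that is negligible)
mu, sigma = 500, 10
width = 0.01
start = mu - 8 * sigma
steps = round(16 * sigma / width)
area = width * sum(normal_pdf(start + (i + 0.5) * width, mu, sigma) for i in range(steps))

print(f"area = {area:.6f}")
print(normal_pdf(mu - 5, mu, sigma) == normal_pdf(mu + 5, mu, sigma))  # True: symmetric about mu
```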


    The probability that a random variable, X, has a value between x = a and x = b is given by the area under the curve between x = a and x = b:

    $$P(a \le X \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx$$

    However, we don't actually need to do the integration, because the Normal curve has some special characteristics that let us find the area from a single table.
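    In software, the table lookup is usually replaced by the Normal cumulative distribution function. A minimal sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later); `area_between` is a hypothetical helper. Results can differ from a printed table in the fourth decimal place because tables round or truncate (the exact area out to three standard deviations is about 0.49865).

```python
from statistics import NormalDist

def area_between(a, b, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve from a to b, i.e. P(a <= X <= b)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

# Area between the mean and a point k standard deviations above it
for k in (1, 2, 3):
    print(f"k = {k}: {area_between(0, k):.5f}")

# Share of the area within one standard deviation on either side
print(f"within 1 sd: {area_between(-1, 1):.4f}")
```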


    Special properties of the Normal distribution:

    1. Total area under the curve is one. (true of any probability distribution)

    2. The curve is symmetrical about the mean.

       - the area to the left of the mean is 0.5
       - the area to the right of the mean is 0.5

    3. The area under the curve between the mean and any point depends on the number of standard deviations between the point and the mean.

       - the Z-score is the number of standard deviations between the point and the mean


    The area between the mean and a point one standard deviation from the mean is 0.3413.

    - 68.26% of the total area is within one standard deviation of the mean (2 × 0.3413, by symmetry)

    The area between the mean and a point two standard deviations from the mean is 0.4772.

    - 95.44% of the total area is within two standard deviations of the mean

    The area between the mean and a point three standard deviations from the mean is 0.4986.

    - 99.72% of the total area is within three standard deviations of the mean


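    [Figure: Normal curve showing the area in each band on either side of the mean: 0.3413 (about 34%) between the mean and one standard deviation, 0.1359 (about 13.6%) between one and two standard deviations, 0.0214 (about 2.1%) between two and three standard deviations, and 0.0014 (about 0.1%) beyond three standard deviations]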


    The Z-score is calculated as

    $$Z = \frac{x - \mu}{\sigma}$$

    Z is the number of standard deviations between the point (x) and the mean.

    Calculate Z to two decimal places.

    Once you have Z, use a Normal probability distribution table to find the area under the curve.


    Here is an excerpt from the table.

    Ex: Z = 1.00

    Area in upper tail = 0.1587

    Area between $\mu$ and $\mu + 1\sigma$ = 0.5 - 0.1587 = 0.3413


    Example: Suppose the time it takes to process an email inquiry is

    normally distributed with a mean time of 500 seconds and a

    standard deviation of 10 seconds. What is the probability that a

    selected email will be processed in more than 505 seconds?

    Step 1: Sketch the curve and indicate relevant information.

    Step 2: Calculate Z.

    $$Z = \frac{505 - 500}{10} = 0.5$$

    Step 3: Look up in table.


    When Z = 0.5, the area in the upper tail is 0.3085.


    The probability that an email will take more than 505 seconds to

    process is 0.3085.


    What if instead we wanted to know the probability that processing

    an email will take less than 485 seconds?

    Step 1: Sketch the curve and indicate relevant information.

    Step 2: Calculate Z.

    $$Z = \frac{485 - 500}{10} = -1.5$$

    Step 3: Look up in table.

    By symmetry, the area below Z = -1.5 equals the area above Z = 1.5, which is 0.0668.

    The probability that processing an email will take less than 485 seconds is 0.0668.


    What if instead we wanted to know the probability that processing

    an email will take between 485 and 505 seconds?

    Step 1: Sketch the curve and indicate relevant information.

    Step 2: Calculate Z.

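    $$Z_1 = \frac{485 - 500}{10} = -1.5 \qquad Z_2 = \frac{505 - 500}{10} = 0.5$$

    Step 3: Look up in the table.

    For Z = -1.5, the area in the lower tail is 0.0668 (the same as the area above Z = 1.5, by symmetry).

    For Z = 0.5, the area in the upper tail is 0.3085.

    Subtract both tail areas from 1: 1 - 0.0668 - 0.3085 = 0.6247.

    The probability that processing an email will take between 485 and 505 seconds is 0.6247.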
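    The three email calculations can be reproduced with `statistics.NormalDist`, a sketch in which `cdf` stands in for the table lookup (it returns the area to the left of a point). `z_score` is a hypothetical helper that mirrors the Z formula.

```python
from statistics import NormalDist

mu, sigma = 500, 10          # processing time in seconds
times = NormalDist(mu, sigma)

def z_score(x, mu, sigma):
    """Number of standard deviations between x and the mean."""
    return (x - mu) / sigma

p_more_than_505 = 1 - times.cdf(505)            # upper tail beyond Z = 0.5
p_less_than_485 = times.cdf(485)                # lower tail below Z = -1.5
p_between = times.cdf(505) - times.cdf(485)     # area between the two points

print(z_score(505, mu, sigma), z_score(485, mu, sigma))    # 0.5 -1.5
print(round(p_more_than_505, 4), round(p_less_than_485, 4), round(p_between, 4))  # 0.3085 0.0668 0.6247
```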


    Example: An importer of Herbs and Spices claims that the average

    weight of packets of saffron is 20 grams. However, packets are

    actually filled to an average weight of 19.5 grams with a standard

    deviation of 1.8 grams. A random sample of 36 packets is selected.

    Find the probability that the average weight is 20 grams or more.

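    In this example we are dealing with a sample of size n = 36, which is greater than 30. We'll apply the CLT and calculate the mean and standard error of the distribution of sample means.

    For our sample,

    $$\mu_{\bar{x}} = \mu = 19.5 \qquad \text{and} \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{36}} = 0.3$$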


    Step 1: Sketch the curve and indicate relevant information.

    Step 2: Calculate Z (using the calculated mean and standard error).

    $$Z = \frac{20 - 19.5}{0.3} = 1.67$$

    Step 3: Look up Z in the table.

    For Z = 1.67, the area in the upper tail is 0.0475.

    This is the probability that the average weight is 20 grams or more.
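    A sketch of the saffron calculation, rounding Z to two decimals as the lecture does. Without that rounding (Z = 1.6667), the probability comes out at about 0.048 instead.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19.5, 1.8, 36          # grams; a sample of 36 packets
se = sigma / sqrt(n)                  # standard error of the mean

z = round((20 - mu) / se, 2)          # rounded to two decimals, as with a table
p_at_least_20 = 1 - NormalDist().cdf(z)   # upper-tail area
print(round(se, 4), z, round(p_at_least_20, 4))   # 0.3 1.67 0.0475
```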


    Instead, let's find the lower and upper limits within which the weights of 95% of all packets fall.

    In this case, we are dealing with the population, not the sample, so we use the population mean and standard deviation. (This step assumes that the weights of individual packets are Normally distributed; the Central Limit Theorem applies only to sample means.)

    Step 1: Sketch the curve and indicate relevant information.

    Step 2: Look up the Z that corresponds to a tail area of 0.025.

    [Figure: Normal curve centered at $\mu = 19.5$ with an area of 0.025 in each tail]


    When the area of the upper tail is 0.025, Z = 1.96.


    Step 3: Find the upper and lower limits.

    Z × σ = number of units from the mean

    1.96 × 1.8 = 3.528 grams

    19.5 + 3.528 = 23.028 grams is the upper limit

    19.5 - 3.528 = 15.972 grams is the lower limit

    95% of the packets of saffron weigh between 15.972 and 23.028 grams.


    Instead, let's calculate the two limits within which 95% of all average weights fall.

    Now we are dealing with samples of n = 36.

    The methodology is the same as when we use the entire population, except that we'll use the standard error of the mean instead of the population standard deviation.

    Step 1: Sketch the curve and indicate relevant information.

    $$\mu_{\bar{x}} = 19.5 \qquad \sigma_{\bar{x}} = \frac{1.8}{\sqrt{36}} = 0.3$$

    [Figure: Normal curve centered at $\mu_{\bar{x}} = 19.5$ with an area of 0.025 in each tail]


    Step 2: Look up the Z that corresponds to a tail area of 0.025.

    Z = 1.96

    Step 3: Find the upper and lower limits.

    Z × $\sigma_{\bar{x}}$ = number of units from the mean

    1.96 × 0.3 = 0.588 grams

    19.5 + 0.588 = 20.088 grams is the upper limit

    19.5 - 0.588 = 18.912 grams is the lower limit

    95% of the average weights of samples of 36 packets fall between 18.912 and 20.088 grams.
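    The two sets of 95% limits differ only in the spread used: the population standard deviation for individual packets and the standard error for sample means. A sketch that makes the contrast explicit; `limits` is a hypothetical helper, and the individual-packet limits assume packet weights are Normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19.5, 1.8, 36     # grams; samples of 36 packets

# Z that leaves 0.025 in the upper tail, rounded like a table value
z = round(NormalDist().inv_cdf(0.975), 2)   # 1.96

def limits(center, spread, z):
    """Lower and upper limits: center minus and plus z * spread."""
    return center - z * spread, center + z * spread

# Individual packets: spread is the population standard deviation
packet_limits = limits(mu, sigma, z)
# Sample means: spread is the standard error sigma / sqrt(n)
mean_limits = limits(mu, sigma / sqrt(n), z)

print([round(v, 3) for v in packet_limits])   # [15.972, 23.028]
print([round(v, 3) for v in mean_limits])     # [18.912, 20.088]
```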


    IV. Sampling Distributions: Proportions

    A proportion is the number of elements with a given characteristic divided by the total number of elements in the group. For example, the proportion of people who vote in an election is the number who vote divided by the number eligible to vote.

    X (in the population) or x (in the sample) is the number of elements with a given characteristic.

    Proportions are often quoted as percentages.

    The sample proportion, p, is a point estimate of the population proportion, $\pi$.

    $$\pi = \frac{X}{N} \qquad p = \frac{x}{n}$$


    Example: Suppose we have the following population of data.

    3, 1, 5, 6, 2

    Calculate the population proportion of even numbers.

    $$\pi = \frac{2}{5} = 0.4$$

    Referring back to our samples of size 2 and 3, calculate the sample

    proportion of even numbers.


        Sample of 2   Sample Proportion   Sample of 3   Sample Proportion
        3,1           0                   3,1,5         0
        3,5           0                   3,1,6         1/3 = 0.33
        3,6           1/2 = 0.5           3,1,2         1/3 = 0.33
        3,2           1/2 = 0.5           3,5,6         1/3 = 0.33
        1,5           0                   3,5,2         1/3 = 0.33
        1,6           1/2 = 0.5           3,6,2         2/3 = 0.67
        1,2           1/2 = 0.5           1,5,6         1/3 = 0.33
        5,6           1/2 = 0.5           1,5,2         1/3 = 0.33
        5,2           1/2 = 0.5           1,6,2         2/3 = 0.67
        6,2           2/2 = 1             5,6,2         2/3 = 0.67

    Calculate the mean of all sample proportions for each sample size.

    The mean of all the sample proportions is the same as the population proportion.

    For samples of size 2: $\mu_p = 0.4$

    For samples of size 3: $\mu_p = 0.4$
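    As with the means, the sample proportions can be enumerated. A sketch; `share_even` is a hypothetical helper, and the standard deviations it prints use `pstdev` because every possible sample is included.

```python
from itertools import combinations
from statistics import fmean, pstdev

population = [3, 1, 5, 6, 2]

def share_even(values):
    """Proportion of even numbers in a collection of values."""
    return sum(v % 2 == 0 for v in values) / len(values)

pop_prop = share_even(population)   # the population proportion (pi) = 2/5

results = {}
for n in (2, 3):
    props = [share_even(s) for s in combinations(population, n)]
    results[n] = (fmean(props), pstdev(props))
    print(f"n = {n}: mean of p = {results[n][0]:.2f}, sd of p = {results[n][1]:.2f}")
```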


    The standard deviation of all the sample proportions decreases as

    the sample size increases.

    The standard error of all sample proportions is given by

    $$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}\sqrt{\frac{N-n}{N-1}}$$

    (when N is large, we can omit the finite population correction factor)

    For samples of size 2:

    $$\sigma_p = \sqrt{\frac{0.4(1-0.4)}{2}}\sqrt{\frac{5-2}{5-1}} = (0.3464)(0.8660) = 0.29998 \approx 0.3$$


    For samples of size 3:

    $$\sigma_p = \sqrt{\frac{0.4(1-0.4)}{3}}\sqrt{\frac{5-3}{5-1}} = (0.2828)(0.7071) = 0.19997 \approx 0.2$$
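    The standard error formula for proportions, as a sketch mirroring the means case; `standard_error_prop` is a hypothetical helper, and omitting `N` drops the finite population correction. With the correction, it reproduces the values found by enumeration for n = 2 and n = 3.

```python
from math import sqrt

def standard_error_prop(pi, n, N=None):
    """sqrt(pi(1 - pi) / n), times the finite population correction when N is given."""
    se = sqrt(pi * (1 - pi) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

print(round(standard_error_prop(0.4, 2, N=5), 4))  # 0.3
print(round(standard_error_prop(0.4, 3, N=5), 4))  # 0.2
```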


    The list of every possible sample proportion with its probability

    is called the sampling distribution of proportions.

    Let's plot the probability distribution of our proportions for the samples of size 2.

    Step 1: Construct a frequency distribution table.

    - we only have 3 values for p (0, 0.5, 1)

        p      Frequency
        0      3
        0.5    6
        1      1


    Step 2: Calculate the relative frequency distribution (the probability distribution).

    Relative frequency = (probability of a particular sample) × frequency = 0.10 × frequency

        p      Frequency   Relative Frequency
        0      3           0.3
        0.5    6           0.6
        1      1           0.1

    Step 3: Plot the probability histogram.
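    Tallying the relative frequencies for the samples of size 2, as a sketch using `collections.Counter`; every sample is assumed equally likely, with probability 0.10.

```python
from collections import Counter
from itertools import combinations

population = [3, 1, 5, 6, 2]
samples = list(combinations(population, 2))

# Proportion of even numbers in each sample of size 2
props = [sum(v % 2 == 0 for v in s) / len(s) for s in samples]

freq = Counter(props)          # how many samples give each value of p
p_each = 1 / len(samples)      # probability of any one sample: 0.10

for value in sorted(freq):
    print(f"p = {value}: frequency {freq[value]}, relative frequency {freq[value] * p_each:.1f}")
```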


    Notice that, even for this very small population and sample size, the probability distribution is tending toward the bell shape of the Normal distribution.

    For samples of size 30 or greater, the distribution of sample proportions is approximately Normal with mean $\mu_p = \pi$ and standard deviation $\sigma_p = \sqrt{\pi(1-\pi)/n}$.

    The distribution of sample proportions for sample sizes of $n \ge 30$:

    $$p \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$


    Example: In a certain neighborhood, it is known that 12% of

    people age 16 to 24 are unemployed. If a random sample of 150

    people age 16 to 24 is selected, what is the probability that the

    sample contains at most 10% unemployed?

    Step 1: Calculate $\mu_p$ and $\sigma_p$.

    In this case, n = 150.

    If 12% of the population is unemployed, then $\pi = 0.12$.

    $$\mu_p = \pi = 0.12$$

    $$\sigma_p = \sqrt{\frac{0.12(1-0.12)}{150}} = 0.0265$$


    Step 2: Sketch the curve and indicate relevant information.

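    [Figure: Normal curve centered at $\mu_p = 0.12$, with p = 0.10 marked]

    Step 3: Calculate Z.

    $$Z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.12}{0.0265} = -0.7547 \approx -0.75$$

    Step 4: Look up Z in the table.

    The area in the lower tail is 0.2266.

    The probability that at most 10% of the sample is unemployed is 0.2266.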
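    A sketch of the unemployment example with `statistics.NormalDist`. It uses the unrounded $\sigma_p$, which gives Z of about -0.754 instead of -0.7547; both round to -0.75, so the probability matches the table result. `pop_prop` is simply a variable name for $\pi$.

```python
from math import sqrt
from statistics import NormalDist

pop_prop, n = 0.12, 150       # pi (share unemployed) and sample size
mu_p = pop_prop
sigma_p = sqrt(pop_prop * (1 - pop_prop) / n)

z = round((0.10 - mu_p) / sigma_p, 2)   # two decimals, as with a table
prob = NormalDist().cdf(z)              # lower-tail area: Pr(p <= 0.10)

print(round(sigma_p, 4), z, round(prob, 4))   # 0.0265 -0.75 0.2266
```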


    Instead, let's calculate the probability that the sample contains at most 25 unemployed people.

    Step 1: Convert the number into a proportion.

    25 / 150 = 0.16667

    Step 2: Calculate $\mu_p$ and $\sigma_p$.

    $\mu_p = \pi = 0.12$ and $\sigma_p = 0.0265$


    Step 3: Sketch the curve and indicate relevant information.

    Pr(p
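    The transcript cuts off at this step. The sketch below simply applies the same steps as the previous example to $\Pr(p \le 25/150)$; its result is computed here, not taken from the original slide.

```python
from math import sqrt
from statistics import NormalDist

pop_prop, n = 0.12, 150
p = 25 / n                                     # 0.16667
sigma_p = sqrt(pop_prop * (1 - pop_prop) / n)  # about 0.0265

z = round((p - pop_prop) / sigma_p, 2)         # two decimals, as with a table
prob = NormalDist().cdf(z)                     # lower-tail area: Pr(p <= 0.16667)
print(round(p, 5), z, round(prob, 4))
```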


    Often the value of the population proportion is unknown.

    We can then approximate the mean and standard error of the sample proportions by:

    $$\mu_p \approx p \qquad s_p = \sqrt{\frac{p(1-p)}{n}}$$
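    A sketch of the plug-in estimate when $\pi$ is unknown. The data (18 of 150 respondents) are hypothetical, chosen so that p matches the 0.12 used earlier; `estimated_se_prop` is a hypothetical helper.

```python
from math import sqrt

def estimated_se_prop(successes, n):
    """Sample proportion p and its estimated standard error s_p = sqrt(p(1 - p) / n)."""
    p = successes / n
    return p, sqrt(p * (1 - p) / n)

# Hypothetical sample: 18 of 150 people have the characteristic of interest
p, s_p = estimated_se_prop(18, 150)
print(round(p, 2), round(s_p, 4))   # 0.12 0.0265
```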


    V. Some desirable properties of estimators

    1. Estimators should be unbiased (accurate).

    An estimator is unbiased if the average value of all the point estimates is equal to the population parameter being estimated. In our five-number example, the mean of all the sample means equaled the population mean, 3.4.

    To prove that $\bar{x}$ is an unbiased estimator of $\mu$, we would need to show that the expected value of the sample mean is equal to the population mean:

    $$E(\bar{x}) = \mu$$
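    For a random sample $x_1, \dots, x_n$ in which each observation has expected value $\mu$, the proof is one application of the linearity of expectation. A sketch of the standard argument:

```latex
E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(x_i)
           = \frac{1}{n}\,(n\mu)
           = \mu
```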


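    2. Estimators should have minimum variance (precise).

    The values of sample statistics vary around the population parameter, and it is desirable to keep this variance to a minimum.

    An estimator is precise when the values of its estimates are close to one another. In our example, the standard deviation of the sample means fell from 1.1358 for n = 2 to 0.7572 for n = 3.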


    Concepts:

    - Central Limit Theorem
    - Normal distribution
    - desirable properties of estimators

    Skills:

    For both means and proportions:

    - calculate the mean of all the sample means (proportions) and the standard deviation of the sample means (proportions)
    - construct a probability distribution table
    - calculate the probability of an event