
Upload: benedict-weaver

Post on 16-Dec-2015


Normal Distribution; Sampling Distribution; Inference Using the Normal Distribution

● Continuous and discrete distributions; density curves

● The important normal distribution and its properties

– 68-95-99.7 empirical rules

– Z-scores; percentiles

● Distributions of sample statistics in repeated sampling

– In particular, distribution of the sample proportion and sample mean (normal)

● Inferring about population parameters: confidence interval construction for proportions and means

● [We will need to spend two lectures on these materials.]


Continuous and Discrete Distributions

● Recall the concept of probability distribution (or probability model)

● Some distributions are for discrete variables (e.g., coin toss outcome), others for continuous variables (e.g., income or weight data)

● For discrete distributions, we may list all the possible values of the random variable and the associated probabilities (e.g. coin toss)

● For continuous distributions this is not possible (why?). We instead describe them with a density curve (corresponding to a density function)

– The probability of the random variable taking values within any interval is given by the area under the density curve over that interval. (Can we talk about Pr(x=a), where a is a single number?)

● How do probability rules operate for continuous distributions?

– e.g. “All possible outcomes together must have probability 1” means the total area under the density curve is 1.
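The area idea can be checked numerically. The sketch below is only an illustration, not part of the original slides: it assumes the standard normal curve as the density (the lecture's main example) and approximates areas with a simple midpoint rule. The helper name `area_under_density` is made up for this example.

```python
from statistics import NormalDist

def area_under_density(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Assumed density for illustration: normal curve with mean 0 and standard deviation 1.
density = NormalDist(mu=0.0, sigma=1.0).pdf

prob_interval = area_under_density(density, -1.0, 1.0)   # Pr(-1 < x < 1)
prob_point = area_under_density(density, 0.5, 0.5)       # Pr(x = 0.5): interval of zero width
total_area = area_under_density(density, -10.0, 10.0)    # practically the whole curve

print(f"Pr(-1 < x < 1) = {prob_interval:.4f}")
print(f"Pr(x = 0.5)    = {prob_point:.4f}")
print(f"total area     = {total_area:.4f}")
```

A single point has zero width, so the area above it is zero. That is why, for a continuous variable, only probabilities of intervals are meaningful.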


The Normal Distribution

● The normal distribution is the most important type of continuous distribution

– The distributions of many real world variables (such as weights, heights, and some psychological test scores for a relatively homogeneous group) are approximately normal

– The distribution of the sample mean or sample proportion is approximately normal when the sample is large enough, even if the distribution of the population from which the samples are drawn is not normal. This fact is tremendously useful for statistical inference, as we shall see.

● The normal density curve is symmetric, bell-shaped, uni-modal, and is completely determined by two values: the expectation (or mean) and the standard deviation (defined in similar fashion as standard deviation in observed data)


Normal Density Curves

● Knowing the mean and the standard deviation of a normal density curve, we will be able to tell:

– The probability that the variable falls in any given range of values

– At what percentile a given value lies

– What value corresponds to a given percentile

● We can use tables or software to find the answers. See the normal curve applet at http://www.whfreeman.com/scc7e or this calculator: http://davidmlane.com/hyperstat/z_table.html

● In particular, we have these useful empirical rules:

– About 68% of the values fall within one standard deviation of the mean

– About 95% of the values fall within two standard deviations of the mean

– About 99.7% of the values fall within three standard deviations of the mean
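The three empirical rules can be verified with software. This is a minimal sketch using `NormalDist` from Python's standard library; the function name `share_within` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {share_within(k):.4f}")
```

The same shares hold for any normal curve, because only the distance from the mean, measured in standard deviations, matters.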


The 68-95-99.7 Rule for N(0,1)


Using the Empirical Rules: Example

The distribution of heights of young women aged 18-24 is approximately normal with mean 65in and standard dev. 2.5in.
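As a rough sketch of how the empirical rules apply to this example, the snippet below computes the height ranges that hold about 68%, 95%, and 99.7% of young women. The mean and standard deviation come from the slide. The variable and function names, and the follow-up question about women taller than 70 inches, are illustrative additions.

```python
mean_height = 65.0   # inches, from the slide
sd_height = 2.5      # inches, from the slide

# Approximate shares from the 68-95-99.7 rule, keyed by number of standard deviations.
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(k):
    """Heights within k standard deviations of the mean, as (low, high)."""
    return (mean_height - k * sd_height, mean_height + k * sd_height)

for k, share in empirical_rule.items():
    low, high = interval(k)
    print(f"about {share:.1%} of heights fall between {low} and {high} inches")

# By symmetry, half of the share outside two standard deviations lies in the upper tail.
share_taller_than_70 = (1 - empirical_rule[2]) / 2
print(f"about {share_taller_than_70:.1%} are taller than 70 inches")
```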


Using the Empirical Rules: Example

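SAT scores follow N(500, 100), a normal distribution with mean 500 and standard deviation 100. Where does Jenny's score of 600 stand? It is one standard deviation above the mean. About 68% of scores fall within one standard deviation of the mean, so about 16% lie above 600, which puts Jenny at about the 84th percentile.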
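The empirical-rule answer can be compared with the exact area under the curve. This is a small sketch with Python's standard library. It assumes, as the later Z-score calculation does, that the second number in N(500, 100) is the standard deviation.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # mean 500, standard deviation 100

# Empirical-rule estimate: 50% below the mean plus half of the central 68%.
rule_estimate = 0.50 + 0.68 / 2

# Exact area under the normal curve to the left of 600.
exact = sat.cdf(600)

print(f"empirical-rule estimate: {rule_estimate:.2%}")
print(f"exact percentile:        {exact:.2%}")
```

The two values agree to within a fraction of a percentage point, which is why the rounded rule is good enough for quick estimates.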


Standardized Scores (Z scores)

● There is one normal distribution for each pair of mean and standard deviation, so there are infinitely many possible normal distributions.

● All can be transformed into the “standard normal” distribution, which has mean 0 and standard deviation 1. (Statistical tables are made for this one)

● If x comes from a normal distribution with mean μ and standard deviation σ, then Z = (x − μ)/σ comes from the standard normal distribution.

● Z is called the standardized score, or Z score. It tells us how many standard deviations away the x score is from the mean.

– e.g. the 600 SAT score has a Z score of (600-500)/100=1: it is one standard deviation above the mean of 500.

● Use the normal table (such as the one on the next slide) to find the percentile of a given Z score, or the Z score that has a given percentile. Or better, use the calculator online
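A short sketch of standardization follows, assuming the SAT example above (mean 500, standard deviation 100). The helper `z_score` is a hypothetical name for this illustration. The point is that the percentile of x under its own normal curve equals the percentile of its Z score under the standard normal curve.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(600, mean=500, sd=100)

# Percentile computed directly on the original scale...
direct = NormalDist(mu=500, sigma=100).cdf(600)
# ...and on the standardized scale, which is the scale a printed table uses.
via_z = NormalDist().cdf(z)

print(f"z = {z}")
print(f"direct: {direct:.4f}, via z score: {via_z:.4f}")
```

Because every normal distribution reduces to the standard one this way, a single table is enough.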


Percentiles of the normal distribution: Pr(Z<z)

Standard              Standard              Standard
Score    Percentile   Score    Percentile   Score    Percentile
-3.4      0.03        -1.1     13.57         1.2     88.49
-3.3      0.05        -1.0     15.87         1.3     90.32
-3.2      0.07        -0.9     18.41         1.4     91.92
-3.1      0.10        -0.8     21.19         1.5     93.32
-3.0      0.13        -0.7     24.20         1.6     94.52
-2.9      0.19        -0.6     27.42         1.7     95.54
-2.8      0.26        -0.5     30.85         1.8     96.41
-2.7      0.35        -0.4     34.46         1.9     97.13
-2.6      0.47        -0.3     38.21         2.0     97.73
-2.5      0.62        -0.2     42.07         2.1     98.21
-2.4      0.82        -0.1     46.02         2.2     98.61
-2.3      1.07         0.0     50.00         2.3     98.93
-2.2      1.39         0.1     53.98         2.4     99.18
-2.1      1.79         0.2     57.93         2.5     99.38
-2.0      2.27         0.3     61.79         2.6     99.53
-1.9      2.87         0.4     65.54         2.7     99.65
-1.8      3.59         0.5     69.15         2.8     99.74
-1.7      4.46         0.6     72.58         2.9     99.81
-1.6      5.48         0.7     75.80         3.0     99.87
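Table lookups in both directions can also be done in software. The sketch below uses `NormalDist.cdf` and `NormalDist.inv_cdf` from Python's standard library. The function names `percentile_of` and `score_at` are illustrative, not from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the standard normal distribution

def percentile_of(z):
    """Percentile (0 to 100) of a standard score z, i.e. 100 * Pr(Z < z)."""
    return 100 * std_normal.cdf(z)

def score_at(percentile):
    """Standard score whose percentile (0 to 100) is the given value."""
    return std_normal.inv_cdf(percentile / 100)

# Reproduce a few rows of the table.
for z in (-1.0, 0.0, 0.5, 1.2):
    print(f"{z:5.1f}  {percentile_of(z):6.2f}")

# Reverse lookup: the standard score at the 90th percentile.
print(f"90th percentile is at z = {score_at(90):.2f}")
```

Software gives the score for any percentile directly, whereas the table only lists scores in steps of 0.1.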


Using the Z Score: More Examples

● Males (ages 18-24) have a mean height of 70 inches and a standard deviation of 2.8 in. Females (ages 18-24) have a mean height of 65 in. and a standard deviation of 2.5 in. What is the standardized score corresponding to your height? What is the percentile?

● What height value is the 90th percentile for men aged 18 to 24? From the table, the standard score closest to the 90th percentile is Z = 1.3 (percentile 90.32), so

X = 70 + Z × 2.8

= 70 + 1.3 × 2.8

= 70 + 3.64 = 73.64 in

● What proportion of men aged 18 to 24 have heights between 65in and 70in?

Z = (65 - 70)/2.8 = -5/2.8 = -1.79, so about 3.6% are below 65in (from the table, using Z = -1.8). Since 50% are below 70in, the proportion in between is about 50% - 3.6% = 46.4%.
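For comparison with the table-based answers, here is a sketch that computes the same two quantities without rounding Z to one decimal. It assumes only the mean and standard deviation given for men. The results differ slightly from the slide's figures because the table forces Z to be rounded.

```python
from statistics import NormalDist

men = NormalDist(mu=70.0, sigma=2.8)  # heights of men aged 18-24, in inches

# Height at the 90th percentile, using the exact standard score instead of Z = 1.3.
height_90th = men.inv_cdf(0.90)

# Proportion of men between 65 and 70 inches: area under the curve between the two values.
z_65 = (65.0 - 70.0) / 2.8
share_65_to_70 = men.cdf(70.0) - men.cdf(65.0)

print(f"90th percentile height: {height_90th:.2f} in")
print(f"z score of 65 in:       {z_65:.2f}")
print(f"share between 65-70 in: {share_65_to_70:.1%}")
```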


Sampling Distribution

● Tells what values a sample statistic (such as a sample proportion) takes and how often it takes those values in repeated sampling, i.e., it assigns probabilities to the values a statistic can take. These probabilities must satisfy Rules A-D.

● The sample statistic can take on many values in repeated sampling, so a sampling distribution is typically described by a continuous distribution such as the normal. The probability that the sample statistic falls in a given interval of values is the area under the density curve over that interval.

● Often this density curve is a normal curve

– So we can apply the “68-95-99.7 rule” or any other tricks we've learned about the normal distribution

● It can be proven that sample proportions and sample means are approximately normally distributed when the sample size is large enough.
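Repeated sampling is easy to imitate by simulation. The following is a minimal sketch, not the lecture's own demonstration: the population proportion (0.5), the sample size (100), the number of repetitions (2000), and the seed are all assumed values chosen for illustration. The formula sqrt(p(1-p)/n) is the standard result for the standard deviation of a sample proportion.

```python
import random
from statistics import fmean, stdev

def sample_proportion(p, n, rng):
    """Proportion of 'successes' in one simulated random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(0)   # fixed seed so the run is repeatable
true_p = 0.5             # assumed population proportion
n = 100                  # assumed sample size
repetitions = 2000       # number of repeated samples

proportions = [sample_proportion(true_p, n, rng) for _ in range(repetitions)]

sim_mean = fmean(proportions)
sim_sd = stdev(proportions)
theory_sd = (true_p * (1 - true_p) / n) ** 0.5   # standard formula sqrt(p(1-p)/n)

# Share of simulated proportions within two standard deviations of the true value.
within_2sd = sum(abs(ph - true_p) <= 2 * theory_sd for ph in proportions) / repetitions

print(f"mean of sample proportions: {sim_mean:.3f}")
print(f"sd of sample proportions:   {sim_sd:.3f} (formula: {theory_sd:.3f})")
print(f"share within 2 sd:          {within_2sd:.3f}")
```

The simulated proportions centre on the true value, and roughly 95% of them land within two standard deviations of it, as the 68-95-99.7 rule suggests.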


Sampling Distribution of a Sample Proportion


Sampling Distribution of the Sample Proportion (true p=.5)


Sampling Distribution for Proportion Who Voted

● 61.7% of registered voters actually voted in the 2008 presidential election.

● In a random sample of 1600 voters, the proportion who claimed to have voted was 63.7%

● Such sample proportions from repeated sampling would have a normal distribution with mean .617 and standard deviation .012

● What is the probability of observing a sample proportion as large or larger than .637?

Z = (.637 - .617) / .012 = 1.67. From the normal table, this corresponds to about the 95th percentile.

So there is only about a 5% chance of observing a sample proportion as large as or larger than .637.
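The calculation can be reproduced as follows. This sketch assumes the slide's standard deviation of .012 comes from the standard formula sqrt(p(1-p)/n) with p = .617 and n = 1600. The slides do not state the formula here, so treat that step as an assumption.

```python
from statistics import NormalDist

p_true = 0.617       # proportion of registered voters who voted (from the slide)
n = 1600             # sample size (from the slide)
p_observed = 0.637   # sample proportion who claimed to have voted (from the slide)

# Assumed source of the slide's .012: standard deviation of a sample proportion.
sd_formula = (p_true * (1 - p_true) / n) ** 0.5
sd_rounded = 0.012   # rounded value used on the slide

z = (p_observed - p_true) / sd_rounded
upper_tail = 1 - NormalDist().cdf(z)   # Pr(sample proportion >= .637)

print(f"sd from formula: {sd_formula:.4f}")
print(f"z score:         {z:.2f}")
print(f"upper tail:      {upper_tail:.3f}")
```

The upper-tail area comes out close to 5%, in line with the table-based answer.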