stat 111 introductory statistics lecture 9: inference and estimation june 2, 2004

STAT 111 Introductory Statistics

Lecture 9: Inference and Estimation

June 2, 2004

Today’s Topics

• Introduction to statistical inference

• Point Estimation

• Confidence Intervals

Introduction

• The application of the methods of probability to the analysis and interpretation of empirical data is known as statistical inference.

• More specifically, it is the process by which we generalize from a particular sample to the theoretical population from which the sample came.

Introduction

• The precise form of the generalization can vary considerably from situation to situation.

• Possible forms of statistical inference:

– Single numerical estimate

– Range of numerical estimates

– Simple “yes” or “no”

Example

• Suppose the chief programming executive at ABC is trying to decide which shows to cancel and which to renew.

• Data might be the day-by-day logs of programs that are watched by a random sample of families.

• Task: use sample information to estimate total number of viewers tuned to ABC programs.

Example

• Suppose instead that a zoologist would like to know whether a particular species of vampire bat prefers blood at room temperature or at body temperature.

• Equal numbers of similar bats are placed into 2 cages; cage A has blood at room temperature, cage B at body temperature.

• He finds that the bats in cage A consumed 3% more blood.

Estimation and Hypothesis Testing

• The two previous examples highlight the two broad areas into which statistical inference is traditionally divided.

• In the first example, inference is numerical. This is the area referred to as estimation.

• In the second example, the inference is instead a “yes” or “no” decision between two conflicting theories. This is what we call hypothesis testing.

• Both areas have wide applicability.

Parameter Estimation

• In many situations, the family of probability models describing a phenomenon may be known (or at least assumed to be known), but the particular member of the family that best describes that phenomenon may be unknown.

• Hence, estimating the unknown parameter or parameters of a presumed data model is usually one of the steps we have to take in an inference problem.

Parameter Estimation

• Our usual goal then is to estimate the value of the (unknown) population parameter based on an appropriate statistic we observe from a random sample of that population.

• Two types of estimation exist:

– Point estimation – This is what we meant by a single numerical estimate.

– Confidence interval – This is what we meant by a range of numerical estimates.

Point Estimation

• A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or a point.

[Figure: population distribution (parameter) and sampling distribution (point estimator)]

Point Estimation

• A point estimate summarizes the value of the population parameter using a single value.

• Naturally, then, there are some properties we would want a point estimator to have before we feel comfortable using it.

• What sort of properties should a good (point) estimator have?

Desirable Properties of Estimators

• Certainly, it seems reasonable to ask as a first condition that the sampling distribution of our estimator be somehow “centered” with respect to the population parameter.

• If this condition is not met, then our point estimator will tend to be consistently overestimating or underestimating the value of the parameter, something that we typically do not desire.


• This first condition is what we call unbiasedness.

• In other words, on the average, a good estimator will be equal to the population parameter it is estimating.

• Mathematically, if W is an estimator, and θ is the population parameter being estimated by W, then W is unbiased if

$E(W) = \theta$ for all $\theta$
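A small simulation can make the unbiasedness condition concrete. The sketch below is only an illustration under assumed values: the normal population, its mean 5 and standard deviation 2, the sample size 10, and the helper name `average_estimate` are all hypothetical choices, not part of the lecture. It averages two estimators over many samples: the sample mean, which is unbiased for µ, and the sample maximum, which consistently overestimates µ.

```python
import random
from statistics import fmean

def average_estimate(estimator, mu, sigma, n, reps, seed=0):
    """Average an estimator over many simulated normal samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    return fmean(estimates)

MU, SIGMA = 5.0, 2.0  # hypothetical population mean and standard deviation

# The sample mean is unbiased for mu: its long-run average sits near mu.
avg_mean = average_estimate(fmean, MU, SIGMA, n=10, reps=20000)

# The sample maximum, used as an estimator of mu, tends to overestimate it.
avg_max = average_estimate(max, MU, SIGMA, n=10, reps=20000)

print(f"average of sample means:  {avg_mean:.3f}")
print(f"average of sample maxima: {avg_max:.3f}")
```

The first average should land close to 5, while the second should sit well above it, which is the "consistently overestimating" behavior described above.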


• A second property of a good estimator is precision. An estimator is said to be precise if its distribution’s dispersion is small.

• The idea of precision leads to the concept of efficiency.

• Suppose we have multiple unbiased estimators for the population parameter. Which one should we use? Are they all equivalent, or are some better than others?


• Formally, let W1 and W2 be two unbiased estimators for a population parameter θ with variances Var(W1) and Var(W2), respectively.

• Then W1 is said to be more efficient than W2 if Var(W1) is less than Var(W2).

• We define the relative efficiency of W1 with respect to W2 as the ratio Var(W2) / Var(W1).

• Which is the more efficient estimator if this ratio is less than 1? Greater than 1?


• Unbiasedness and efficiency lead to the most basic characterizations of point estimates, but there are other properties of a statistic and its sampling distribution that merit examination.

• The first concerns the limiting behavior of the statistic as the sample size n gets large.

• In some cases, it is possible that the sampling distribution has some very desirable properties in the limit that it fails to possess for any finite n.


• Consistency is one such property of the sampling distribution that appears in the limit.

• Roughly speaking, an estimator is consistent if, as n gets large, the probability that our statistic W lies arbitrarily close to the parameter being estimated becomes arbitrarily close to 1.

• Two conditions that together are sufficient for consistency:

– W is asymptotically unbiased

– Var(W) converges to 0
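As a rough illustration of consistency, the sketch below estimates by simulation the probability that the sample mean lies within a small distance of the population mean, for increasing n. The Uniform(0, 1) population (mean 0.5), the tolerance 0.05, and the function name `prob_close` are assumptions made for this example only.

```python
import random
from statistics import fmean

def prob_close(n, eps, reps=1000, seed=1):
    """Estimate P(|sample mean - mu| < eps) for Uniform(0, 1), where mu = 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.random() for _ in range(n)])
        if abs(xbar - 0.5) < eps:
            hits += 1
    return hits / reps

# The estimated probability should climb toward 1 as n grows.
for n in (10, 100, 1000):
    print(n, prob_close(n, eps=0.05))
```

The printed probabilities should increase with n and be essentially 1 at n = 1000, which is what the rough definition of consistency describes.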


• The last property we might desire from an estimator is sufficiency.

• If we draw a sample of size n from some population with a given distribution, we know that the sample space is all possible n-tuples.

• An estimator W, then, has the effect of partitioning this sample space into a set of mutually exclusive subsets.


• As an example, suppose we draw two observations from a discrete distribution on the non-negative integers, and we define our statistic W as the mean of these two observations.

• Then, W is observed to be 3 for any one of the following pairs of observations: (0,6), (1,5), (2,4), (3,3), (4,2), (5,1), (6,0). And similarly, W will equal 2.5 if the outcome of our draws is (0,5), (1,4), (2,3), (3,2), (4,1), or (5,0).


• So, in this example, knowing the sample mean W of our outcome provides (for a suitable family of distributions, such as the Poisson) the same amount of information about the parameter as the actual outcome itself does.

• In other words, W is sufficient for the population parameter we are trying to estimate.

• A statistic is sufficient if knowing its value gives us just as much information about the parameter of interest as knowing the actual sample itself does.
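The partition in the two-observation example above can be enumerated directly. This is a minimal sketch: restricting the observations to the values 0 through 6 is an assumption made only to keep the enumeration finite, and `partition_by_mean` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import product

def partition_by_mean(max_value):
    """Group all ordered pairs from {0, ..., max_value} by their mean W."""
    groups = defaultdict(list)
    for pair in product(range(max_value + 1), repeat=2):
        groups[sum(pair) / 2].append(pair)
    return dict(groups)

groups = partition_by_mean(6)
print(groups[3.0])  # every pair whose mean is 3
print(groups[2.5])  # every pair whose mean is 2.5
```

Each pair lands in exactly one group, so the values of W split the sample space into mutually exclusive subsets.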

Example

• Let X1, …, Xn be a simple random sample from a population with mean µ and variance σ2.

• Suppose the sample size is larger than 1, and let m be an integer between 1 and n (i.e., 1 < m < n).

• Consider these three estimators for µ:

$X_1, \qquad \bar{X}_m = \frac{X_1 + \cdots + X_m}{m}, \qquad \bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$

Example

• Which of these estimators is unbiased for µ?

• What are the relative efficiencies of the three estimators (pairwise comparisons)?

– Based on these results, which estimator is the most efficient? The least?
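One way to check an answer to these questions is by simulation. The sketch below assumes the three estimators are the first observation alone, the mean of the first m observations, and the mean of all n observations (a reading of the slide's formulas, not a certainty). Under that assumption all three are unbiased, with variances σ², σ²/m and σ²/n, so the relative efficiency of the full sample mean with respect to the first observation would be n. The normal population, the values n = 20, m = 5, µ = 10, σ = 3, and the function name are hypothetical.

```python
import random
from statistics import fmean, pvariance

def simulate_variances(n=20, m=5, mu=10.0, sigma=3.0, reps=20000, seed=2):
    """Return empirical variances of X_1, mean of first m, and mean of all n."""
    rng = random.Random(seed)
    first, mean_m, mean_n = [], [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        first.append(sample[0])
        mean_m.append(fmean(sample[:m]))
        mean_n.append(fmean(sample))
    return pvariance(first), pvariance(mean_m), pvariance(mean_n)

var_first, var_m, var_n = simulate_variances()
print(f"Var(X_1)    ~ {var_first:.3f}  (theory: 9.0)")
print(f"Var(mean_m) ~ {var_m:.3f}  (theory: 1.8)")
print(f"Var(mean_n) ~ {var_n:.3f}  (theory: 0.45)")
# Relative efficiency of mean_n with respect to X_1 is Var(X_1) / Var(mean_n).
print(f"relative efficiency ~ {var_first / var_n:.1f}  (theory: 20)")
```

The empirical variances should fall in the order predicted by the theory, with the full sample mean the most efficient and the single observation the least.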

Interval Estimation

• An interval estimator draws inference about a population by estimating the value of an unknown parameter using an interval.

[Figure: population distribution (parameter) and sampling distribution (interval estimator)]

Confidence Intervals

• A confidence interval has the form

estimate ± margin of error

• The estimate is our guess for the value of the unknown population parameter.

• The margin of error shows how accurate we believe our guess is, based on the variability of the estimate.

Example

• The heights of American female students aged 18 to 24 are approximately normal with mean µ and standard deviation 2.5 inches. We repeatedly select 100 female students at random. The sample mean $\bar{X}$ follows the normal distribution with mean µ and standard deviation

$\sigma_{\bar{X}} = \frac{2.5}{\sqrt{100}} = 0.25$

Example

• According to the 68-95-99.7 rule, the probability is about 0.95 that $\bar{X}$ will be within 0.5 inches (two standard deviations) of the population mean µ.

• To say that $\bar{X}$ lies within 0.5 inches of µ is the same as saying that µ lies within 0.5 inches of $\bar{X}$.

• So 95% of all samples we take will capture the true µ in the interval from $\bar{X} - 0.5$ to $\bar{X} + 0.5$.

Example

• Suppose now we observe a sample with $\bar{X} = 63$.

• Then, for the interval [63 – 0.5, 63 + 0.5] = [62.5, 63.5], we have two possibilities:

– The interval between 62.5 and 63.5 contains the true µ.

– Our SRS was one of the few samples for which $\bar{X}$ is not within 0.5 inches of the true µ. Only 5% of all samples will give such inaccurate results.

Example

• We say that we are 95% confident that the unknown mean height of American female students lies between 62.5 and 63.5.

• This is shorthand for saying “we arrived at these numbers by a method that gives correct results 95% of the time.”

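• It is incorrect to say that there is probability 0.95 that the unknown mean height of American female students lies between 62.5 and 63.5. The mean µ is a fixed (though unknown) number, so this particular interval either contains it or does not.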
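The phrase "a method that gives correct results 95% of the time" can be checked with a simulation of the height example. The sketch below draws many samples of 100 heights with standard deviation 2.5 and counts how often the interval of sample mean ± 0.5 contains µ. The true mean of 64 inches is a hypothetical value chosen only so the simulation can run; `coverage` is likewise an illustrative name.

```python
import random
from statistics import fmean

def coverage(mu=64.0, sigma=2.5, n=100, margin=0.5, reps=5000, seed=3):
    """Fraction of simulated samples whose interval xbar +/- margin contains mu."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            captured += 1
    return captured / reps

rate = coverage()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The fraction should come out close to 0.95. The 95% describes the long-run behavior of the procedure across samples, not any single computed interval.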


• Recall that the sampling distribution of the sample mean is, for large enough sample sizes, always at least approximately normal regardless of the actual probability distribution.

• Suppose we choose an SRS of size n from a population with unknown mean µ and known standard deviation σ. A level C confidence interval for µ is

$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$


• Here, z* is the value on the standard normal curve with area C between –z* and z*.

• The confidence interval will be exact when the population distribution is normal, and thanks to the Central Limit Theorem, it will be approximately correct for large n in other cases.

Example

• Assume that the helium porosity (in percentage) of coal samples taken from any particular seam is normally distributed with true standard deviation σ = 0.75.

– Compute a 90% confidence interval for the true average porosity of a certain seam if the average porosity for 20 specimens from the seam was 4.85.

– Compute a 95% confidence interval for the true average porosity of that same seam using the information above.
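The two intervals in this exercise can be computed with a few lines of code. This is a sketch of the known-σ interval (sample mean ± z* σ/√n) using Python's standard library. The function name `z_interval` is an illustrative choice, and z* is obtained from the standard normal inverse CDF rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Level-C confidence interval for a mean when sigma is known."""
    # z* leaves area `level` between -z* and z* under the standard normal curve.
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

ci90 = z_interval(4.85, 0.75, 20, 0.90)
ci95 = z_interval(4.85, 0.75, 20, 0.95)
print(f"90% CI: ({ci90[0]:.3f}, {ci90[1]:.3f})")
print(f"95% CI: ({ci95[0]:.3f}, {ci95[1]:.3f})")
```

With these inputs the 90% interval is roughly (4.574, 5.126) and the 95% interval roughly (4.521, 5.179). The higher confidence level gives the wider interval.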


• Generally speaking, for a given σ and sample size n, the margin of error is determined by the choice of C for the confidence interval.

• High confidence and small margin of error are desirable.

• High confidence – method almost always gives correct answers.

• Small margin of error – parameter is pinned down quite precisely.


• Suppose you calculate a margin of error and decide that it is too large.

• How to reduce it:

– Use a lower level of confidence (smaller C)

– Increase the sample size (larger n)

– Reduce σ

• In our last example, how would the 95% confidence interval change if our sample consisted of 200 specimens instead of 20?
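A quick calculation suggests the answer. The sketch below compares the 95% margin of error for the porosity example (σ = 0.75) at n = 20 and n = 200, assuming the sample mean stays at 4.85 for the larger sample, which the question does not state. `margin_of_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Margin of error z* * sigma / sqrt(n) for a known-sigma interval."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return z_star * sigma / sqrt(n)

margin_20 = margin_of_error(0.75, 20)
margin_200 = margin_of_error(0.75, 200)
xbar = 4.85  # assumed unchanged for the larger sample

print(f"n = 20:  margin {margin_20:.3f}")
print(f"n = 200: margin {margin_200:.3f}")
print(f"n = 200 interval: ({xbar - margin_200:.3f}, {xbar + margin_200:.3f})")
```

Multiplying the sample size by 10 divides the margin of error by √10 (about 3.16), so the interval narrows to roughly (4.746, 4.954).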


• The confidence interval for a population mean will have a specified margin of error m when the sample size is

$n = \left( \frac{z^* \sigma}{m} \right)^2$

• In surveys for determining proportions, this helps explain why a survey sample of about 1000 people gives a margin of error of approximately 0.03.
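The survey remark can be checked numerically. For a proportion p the standard deviation of a single response is √(p(1 − p)), which is at most 0.5. The sketch below uses that worst-case value together with a 95% confidence level; both are assumptions about what the slide has in mind, and the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(margin, sigma, level=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (rounded up)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return ceil((z_star * sigma / margin) ** 2)

WORST_CASE_SIGMA = 0.5  # sqrt(p * (1 - p)) is largest at p = 0.5

n_needed = required_sample_size(0.03, WORST_CASE_SIGMA)
z_star = NormalDist().inv_cdf(0.975)
margin_1000 = z_star * WORST_CASE_SIGMA / sqrt(1000)

print(f"n needed for margin 0.03: {n_needed}")
print(f"margin of error at n = 1000: {margin_1000:.3f}")
```

Under these assumptions a margin of 0.03 requires 1068 respondents, and a sample of 1000 gives a margin of about 0.031, consistent with the "about 1000 people" rule of thumb.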


• Remember:

– Data must be an SRS from the population.

– Formula is incorrect for complex sampling designs.

– No correct method for inference from data haphazardly collected with unknown bias.

– Outliers can have a large effect on the interval.

– For small sample size and non-normal populations, the true confidence level is different from the value C.

– Standard deviation σ must be known.

– Margin of error covers only random sampling errors.
