TRANSCRIPT
BIOST 514/517: Biostatistics I / Applied Biostatistics I
Kathleen Kerr, Ph.D., Associate Professor of Biostatistics
University of Washington
Lecture 11: Properties of Estimates; Confidence Intervals; Standard Errors; Inference for Proportions
November 8 and 13, 2013
Lecture Outline
• Properties of Estimates (Inferential Statistics)
  – variability
  – bias
  – mean squared error
  – consistency
  – efficiency
• Confidence Intervals for Population Parameters
• Confidence Intervals for a Proportion
  – Asymptotic vs. exact methods
• Estimating Standard Errors
• Comparing proportions: risk difference, odds ratio
Inferential Statistic
• An inferential statistic or an estimate is computed on a sample and used to estimate a population parameter
  – sample mean used to estimate population mean
  – sample median used to estimate population median
  – proportion of a sample that is hypertensive used to estimate the proportion of the population that is hypertensive
  – etc.
Paradigm of Statistics
• Population parameters are real but unknown numbers
• Inferential statistics computed on samples are used to estimate population parameters
• We don't expect to estimate a population parameter exactly. We use statistical theory to understand the error in our estimate.
Error of Estimates
There are two kinds of error that estimates can have:
1. Variability
2. Bias
Another way to say this is: there are two desirable properties of estimates:
1. Precision
2. Accuracy
[Figure: dartboard analogy. The "target" is the true value of the population parameter; bias measures accuracy, variability measures precision.]
Estimates and Error
We like estimates that are precise and accurate. Said differently, we like estimates that have low variability and little or no bias.
Who is the best player?
[Figure: four sampling distributions of sample statistic values, each marked with the true parameter value.
  – Statistic A: low (actually no) bias and low variability, i.e. high precision.
  – Statistic B: low (actually no) bias and high variability, i.e. low precision.
  – Statistic C: high bias and low variability, i.e. high precision.
  – Statistic D: high bias and high variability, i.e. low precision.]
Estimates and Error
• In the preceding slide, the distributions represent the sampling distribution of the statistic
• Most of the statistics we use are unbiased – the expected value of the sampling distribution is the true value of the population parameter
• Unbiasedness is desirable but not an absolutely necessary property of good estimates
Example:
• Estimator of population mean μ:
  Sample mean: X̄ = (1/n) Σ Xⱼ, summing over j = 1, …, n
  – Expected value: E[X̄] = μ, so the sample mean is an unbiased estimator of the population mean.
  – Variance: Var[X̄] = σ²/n, so precision increases with the sample size.
  – Standard error: √Var[X̄] = σ/√n
Mean Squared Error
For an estimator T of a population parameter θ, the mean squared error of T is E[(T − θ)²].
MSE is related to the bias and variability of T. Specifically, MSE(T) = Var(T) + Bias²(T).
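This identity can be checked numerically. Below is a minimal simulation sketch (not from the lecture; the Normal data, the true mean of 5, and the deliberately biased estimator T = 0.9·X̄ are all made-up choices for illustration):

# Sketch: check MSE(T) = Var(T) + Bias^2(T) by simulation.
# The data model and the estimator T = 0.9 * Xbar are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 25, 200_000             # true mean, sample size, replications

samples = rng.normal(loc=theta, scale=2.0, size=(reps, n))
T = 0.9 * samples.mean(axis=1)                # a deliberately biased estimator of theta

mse  = np.mean((T - theta) ** 2)              # E[(T - theta)^2]
var  = np.var(T)                              # Var(T)
bias = np.mean(T) - theta                     # Bias(T)
print(mse, var + bias ** 2)                   # the two quantities agree up to simulation noise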
Consistency
Good estimators are consistent. Roughly, this means that as the sample size increases, the sampling distribution of the estimator becomes more concentrated around the true value of the population parameter.
• There are different precise mathematical definitions of "becomes more concentrated," corresponding to notions of weak and strong consistency.
• For example, an unbiased estimator whose variance shrinks to zero as n increases is consistent.
  – E.g., the sample mean
Consistency
An estimator can be biased and still be consistent.
• Some estimates of the variance are biased yet consistent (it depends on whether we divide by n or n − 1).
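As a quick check of this point, the sketch below (not from the lecture; Normal data with an arbitrary true variance of 4) compares the divide-by-n and divide-by-(n − 1) variance estimators as n grows:

# Sketch: the divide-by-n variance estimator is biased downward but consistent.
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0                                 # hypothetical true population variance

for n in (5, 50, 5000):
    x = rng.normal(0.0, np.sqrt(true_var), size=(100_000, n))
    v_n  = x.var(axis=1, ddof=0).mean()        # divide by n: expectation is (n-1)/n * sigma^2
    v_n1 = x.var(axis=1, ddof=1).mean()        # divide by n-1: unbiased
    print(n, round(v_n, 3), round(v_n1, 3))    # both approach 4.0 as n grows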
Efficiency
A desirable property of an estimator is that it is efficient. Roughly, this means that it uses as much of the information in the data as possible.
  – We won't get into the exact technical definitions of efficiency.
• For example, for a sample X₁, X₂, …, Xₙ, suppose we want to estimate the mean.
• X̄ = (1/n) Σ Xᵢ is unbiased and consistent.
• X̄_odd, the mean of only the odd-numbered observations, is also unbiased and consistent, but it is less efficient than X̄.
Efficiency
Suppose we know our variable is Normally distributed in the population. Then the mean and the median are the same parameter μ. The sample mean and the sample median are both unbiased estimates of μ.
However, the median is about 64% as efficient in estimating μ as the mean. Estimating μ using the median is like throwing out a random 1/3 of your data and then using the mean.
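This relative-efficiency figure can be checked by simulation. A rough sketch (assuming Normal data; the 64% figure is the large-sample value 2/π, so the simulated ratio will be close but not exact):

# Sketch: relative efficiency of the sample median vs. the sample mean for Normal data.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 100_000
x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))

var_mean   = x.mean(axis=1).var()
var_median = np.median(x, axis=1).var()
print(var_mean / var_median)                   # roughly 0.64, i.e. 2/pi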
Confidence Intervals
Confidence Intervals
Confidence Intervals answer questions of the following sort:
• For what values of the population parameter are the data fairly "typical"?
• With what values of the population parameter are the data consistent?
Confidence Intervals
"Fairly typical" is defined with respect to the sampling distribution:
• not in the upper extreme of the sampling distribution, or
• not in the lower extreme of the sampling distribution, or
• in neither of the tails of the sampling distribution.
Confidence Intervals: Thought Exercise
We want to estimate the mean LDL cholesterol level among senior citizens. The mean in our sample of 50 senior citizens is 132 mg/dL.
Ask: If the true mean in the population were # mg/dL, would a sample mean of 132 be surprising?
Ask: If the true mean in the population were 130 mg/dL, would a sample mean of 132 be surprising?
[Figure: sampling distribution of the mean of a sample of size 50 when the population mean is 130, with the observed sample mean of 132 marked.]
Ans: No. If the true mean in the population were 130 mg/dL, a sample mean of 132 would not be surprising.
Ask: If the true mean in the population were 140 mg/dL, would a sample mean of 132 be surprising?
[Figure: sampling distribution of the mean of a sample of size 50 when the population mean is 140, with 132 marked.]
Ans: No. If the true mean in the population were 140 mg/dL, a sample mean of 132 would not be very surprising.
Ask: If the true mean in the population were 100 mg/dL, would a sample mean of 132 be surprising?
[Figure: sampling distribution of the mean of a sample of size 50 when the population mean is 100, with 132 marked.]
Ans: Yes. If the true mean in the population were 100 mg/dL, a sample mean of 132 would be very surprising.
Confidence Intervals: Thought Exercise
We want to estimate the mean LDL cholesterol level among senior citizens. The mean in our sample of 50 senior citizens is 132 mg/dL.
In this example, we might report that “with 95% confidence, the mean cholesterol level among senior citizens is between 122 and 142”
or
"The data are consistent with a mean cholesterol level among senior citizens between 122 and 142"
Now the math: Confidence Interval for the Population Mean
• Question: When we do not know the population mean, how can we use the sample to estimate the population mean, and use our knowledge of probability to give a range of values consistent with the data?
• Parameter: μ
• Estimator: X̄
• Given adequate sample size, using the CLT, we can state:

  X̄ ~ N(μ, σ²/n)

  P( −1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96 ) = 0.95

Rearranging:

  P( μ − 1.96 σ/√n ≤ X̄ ≤ μ + 1.96 σ/√n ) = 0.95
  P( X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n ) = 0.95

The 95% confidence interval for μ is:

  ( X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n )
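A minimal sketch of this interval in Python (not the lecture's software). The sample mean and n echo the LDL example; σ = 36 is a made-up population SD chosen only so the result lands near the (122, 142) interval reported earlier:

# Sketch: 95% CI for a population mean, treating sigma as known.
import numpy as np

xbar, sigma, n = 132.0, 36.0, 50               # sample mean; hypothetical sigma; sample size
half_width = 1.96 * sigma / np.sqrt(n)         # 1.96 = 0.975 quantile of the standard Normal
print(xbar - half_width, xbar + half_width)    # approximately (122, 142)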
Interpretation
Correct: If we repeat the procedure of taking a sample of the same size and constructing a 95% confidence interval on the sample, about 95% of those confidence intervals will contain the true value.
Incorrect: “There is a 95% chance that the 95% confidence interval contains the true value”
Simulation Study: Confidence intervals
Simulation of 100 data sets: 95% CIs were computed for each data set. The true parameter value is in purple – 95% of these intervals contain the true value!
Confidence Interval: Population Mean
• We showed that the 95% confidence interval for μ is:

  ( X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n )

• Want a 100(1 − α)% confidence interval (note: α is in the interval (0,1)).
Develop:

  P( X̄ − z_{1−α/2} σ/√n ≤ μ ≤ X̄ + z_{1−α/2} σ/√n ) = 1 − α

The 100(1 − α)% confidence interval for μ is:

  ( X̄ − z_{1−α/2} σ/√n , X̄ + z_{1−α/2} σ/√n )
Confidence Interval: Population Mean
• A 100(1 − α)% confidence interval for μ is

  ( X̄ − z_{1−α/2} σ/√n , X̄ + z_{1−α/2} σ/√n )

• This formula requires knowledge of the population variance (σ²). In practice, we do not know the population variance.
Confidence Interval: Population Mean
• Usually, the population variance is unknown. We can estimate it with s² (the sample variance):

  s² = [1/(n − 1)] Σ (Xⱼ − X̄)², summing over j = 1, …, n

• The statistic

  T = (X̄ − μ) / (s/√n)

has a t-distribution with n − 1 degrees of freedom. We can use this distribution to obtain a confidence interval for μ when σ is not known.
[Figure: Normal and t distributions.]
Confidence Interval: Population Mean
• A 100(1 − α)% confidence interval for μ when the population variance is unknown is given by

  ( X̄ − t_{n−1, 1−α/2} s/√n , X̄ + t_{n−1, 1−α/2} s/√n )

where t_{n−1, 1−α/2} is the critical value (quantile) in the t-distribution with (n − 1) degrees of freedom.
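A sketch of the t-based interval using scipy (the data vector here is hypothetical):

# Sketch: 100(1-alpha)% CI for the mean when the population variance is unknown.
import numpy as np
from scipy import stats

x = np.array([128.0, 141.0, 119.0, 135.0, 150.0, 122.0, 138.0, 127.0])  # hypothetical data
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # t_{n-1, 1-alpha/2}
half_width = t_crit * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)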
t-distribution and Confidence Intervals
• Whenever we make a confidence interval for the mean of a continuous variable, we must also estimate the population variance.
• So technically, we should make confidence intervals with the t-distribution instead of the Normal distribution.
• A t-distribution is centered at 0 and has heavier tails than a Normal distribution. A t-distribution is parameterized by its degrees of freedom.
• Often, we are OK to gloss over this detail, since a t-distribution is very close to Normal for large degrees of freedom.
t-based critical values
df (n − 1)   Critical value for 95% CI
10           2.22
20           2.09
50           2.01
100          1.98
200          1.97
300          1.967
…            …
Normal       1.96
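These critical values can be reproduced with any statistical package; for instance, a short scipy sketch:

# Sketch: reproduce the 95% t-based critical values in the table above.
from scipy import stats

for df in (10, 20, 50, 100, 200, 300):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("Normal", round(stats.norm.ppf(0.975), 3))   # 1.96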
Confidence Interval: General form
A 100(1 − α)% confidence interval for a parameter of interest θ, with estimate θ̂:

  θ̂ ± (1 − α/2 critical value) × (std err of θ̂)

The standard error of the estimator is √Var[θ̂].
To find the critical value we need to know the sampling distribution of the estimator.
For many parameters that we estimate, we make confidence intervals via analogous methods.
Standard errors
• We have seen that the standard error of the mean is SD(X)/√n. It can be estimated by s/√n.
• We need standard errors for other estimates of other parameters.
Example: Difference in Means
• Suppose we have n1 observations in group 1 with standard deviation s1 and n2 observations in group 2 with standard deviation s2.
• There are two widely used estimates for the standard error of the difference in the two group means.
Example: Difference in Means
• SEequal estimates the standard error under the assumption that Groups 1 and 2 have the same variance.
  – Sometimes called a "pooled" variance estimate.
• SEunequal estimates the standard error without assuming that Groups 1 and 2 have the same variance.
  – This is the one we have seen already.
  – This is the one you should know (be able to derive).
  – Since we rarely know that the true population variances are the same in the two groups, it makes sense to use this one.
Example: Difference in Means
• SEunequal estimates the standard error without assuming that Groups 1 and 2 have the same variance.
  – The Central Limit Theorem tells us that as long as n1 and n2 are not too small, X̄₁ − X̄₂ will have approximately a Normal distribution centered at the true difference in population means. The standard deviation of this sampling distribution is consistently estimated with SEunequal.
  – We can make confidence intervals using critical values from the Normal distribution or, when the n's are small, a t-distribution.
• Which t-distribution? There are two ways of calculating the degrees of freedom: Satterthwaite and Welch. Extremely technical and uninteresting. Let your software do it.
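A sketch of the SEunequal calculation and the resulting Normal-approximation CI. The formula √(s1²/n1 + s2²/n2) is the standard unequal-variance standard error; the group summary statistics below are hypothetical:

# Sketch: CI for a difference in means using SE_unequal = sqrt(s1^2/n1 + s2^2/n2).
# The group summaries are hypothetical.
import numpy as np

xbar1, s1, n1 = 105.0, 15.0, 40
xbar2, s2, n2 = 98.0, 12.0, 55

diff = xbar1 - xbar2
se_unequal = np.sqrt(s1**2 / n1 + s2**2 / n2)
print(diff - 1.96 * se_unequal, diff + 1.96 * se_unequal)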
Example: Proportions
• A population proportion is often a parameter of interest. Since a proportion is also a mean, we could use what we already know.
• However, there is a mean-variance relationship for binary variables. For a binary variable with true population mean p, the true population standard deviation is √[p(1-p)]. It is conventional to estimate the standard error of our estimate of p using this formula rather than from s.
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke.
. gen is_smoker= smoke==1
. tab is_smoker
  is_smoker |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        589       90.06       90.06
          1 |         65        9.94      100.00
------------+-----------------------------------
      Total |        654      100.00
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke. One option is to treat is_smoker like a continuous variable, and use s to estimate the standard error of our estimate.
. ci is_smoker
    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
   is_smoker |    654    .0993884     .0117079      .0763987   .1223781
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke. A 2nd option is to acknowledge that is_smoker is a binary variable and use the mean-variance relationship to estimate the standard error of our estimate.
. ci is_smoker, binomial wald
                                                -- Binomial Wald ---
    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
   is_smoker |    654    .0993884      .011699      .0764588   .1223179
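The Wald interval above can be reproduced directly from the mean-variance relationship; a sketch in Python (not the lecture's Stata), which should match the output up to rounding:

# Sketch: Wald CI for a proportion, using SE = sqrt(p_hat*(1 - p_hat)/n).
import numpy as np

x, n = 65, 654                                 # smokers and total children in the FEV data
p_hat = x / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat, p_hat - 1.96 * se, p_hat + 1.96 * se)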
Exact Distribution
• Here, we do not have to rely on asymptotic theory
• A binary variable must be Bernoulli
• Sums of independent Bernoulli random variables must be binomial
• We can use the exact binomial distribution to compute our probabilities– (Well, computers can)
Binomial Distribution
• Probability theory provides a formula for the distribution of binomial random variables
Data: X₁, …, Xₙ iid ~ B(1, p)  (Bernoulli)
Y = X₁ + … + Xₙ ~ B(n, p)  (binomial)
For k = 0, 1, …, n:

  Pr(Y = k) = [ n! / (k!(n − k)!) ] p^k (1 − p)^(n−k)
Exact Point Estimate
• Still use the sample mean
Data: X₁, …, Xₙ iid ~ B(1, p), with E[X] = p and Var[X] = p(1 − p)
Point estimate:

  p̂ = X̄ = (1/n)(X₁ + … + Xₙ)
Exact Confidence Intervals
• Use the binomial distribution– (But let a computer do it for you)
An exact 100(1 − α)% confidence interval for p based on observing Y = k is (p̂_L, p̂_U), where an iterative search is used to find p̂_L and p̂_U satisfying:

  Pr(Y ≥ k; p̂_L) = Σ (i = k to n) [ n! / (i!(n − i)!) ] p̂_L^i (1 − p̂_L)^(n−i) = α/2

  Pr(Y ≤ k; p̂_U) = Σ (i = 0 to k) [ n! / (i!(n − i)!) ] p̂_U^i (1 − p̂_U)^(n−i) = α/2
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke. 3rd option: binomial exact confidence interval
. ci is_smoker, binomial
                                                -- Binomial Exact --
    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
   is_smoker |    654    .0993884      .011699      .0775451    .124923
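The exact (Clopper–Pearson) interval can be computed without the iterative search by using Beta-distribution quantiles, which is how many packages implement it. A sketch, which should agree with the exact interval above up to rounding:

# Sketch: exact (Clopper-Pearson) CI for a proportion via Beta quantiles.
from scipy import stats

x, n, alpha = 65, 654, 0.05
lower = stats.beta.ppf(alpha / 2, x, n - x + 1)        # defined as 0 when x == 0
upper = stats.beta.ppf(1 - alpha / 2, x + 1, n - x)    # defined as 1 when x == n
print(lower, upper)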
Example: Proportions
• The 1st option would be an unusual choice in practice.
  – It is valid, but it is better to use the mean-variance relationship when we have it.
• The 2nd option is commonly used.
• The 3rd option is commonly used. Exact confidence intervals are better when they are possible because we avoid making a distributional approximation. Option 3 is the preferred method, but it will be similar to Option 2 unless np or n(1 − p) is small.
  – Exact binomial confidence intervals are the default in STATA when the "binomial" option is used, but not in other software.
Proportions: 0 events in n trials
• Two-sided confidence intervals fail in the case where either 0 or n events are observed in n Bernoulli trials
• However, we can derive one-sided confidence bounds in these cases.
Proportions: 0 events in n trials: Upper Confidence Bound
• Exact upper confidence bound when there are 0 “successes” or “events” in n trials
Suppose Y ~ B(n, p) and Y = 0 is observed.
The exact 100(1 − α)% upper confidence bound for p is p̂_U, which satisfies

  Pr(Y = 0; p̂_U) = (1 − p̂_U)^n = α

so

  p̂_U = 1 − α^(1/n)
Large sample approximation
Starting from (1 − p̂_U)^n = α and taking logs:

  n log(1 − p̂_U) = log α

For small p̂_U, log(1 − p̂_U) ≈ −p̂_U, so for large n

  p̂_U ≈ −log(α) / n
Large sample approximation
• "Three over n rule"
  – log(0.05) = −2.9957
  – So for 0 events in n trials the upper 95% confidence bound is approximately 3/n
• 99% upper confidence bound
  – log(0.01) = −4.605
  – Use 4.6/n as the 99% upper confidence bound
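A sketch comparing the exact upper bound 1 − α^(1/n) with the 3/n and 4.6/n approximations; it should reproduce the rows of the table on the next slide:

# Sketch: exact upper confidence bound for p when 0 events are observed in n trials,
# versus the 3/n (95%) and 4.6/n (99%) approximations.
for n in (2, 5, 10, 20, 30, 50, 100):
    exact_95 = 1 - 0.05 ** (1 / n)
    exact_99 = 1 - 0.01 ** (1 / n)
    print(n, round(exact_95, 4), round(3 / n, 2), round(exact_99, 4), round(4.6 / n, 4))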
Approximation vs Exact
• When X = 0 events are observed in n Bernoulli trials:

  n     Exact 95% bound   3/n     Exact 99% bound   4.6/n
  2     .7764             1.50    .9000             2.3000
  5     .4507             .60     .6019             .9200
  10    .2589             .30     .3690             .4600
  20    .1391             .15     .2057             .2300
  30    .0950             .10     .1423             .1533
  50    .0582             .06     .0880             .0920
  100   .0295             .03     .0450             .0460
• Impress your friends! Compute confidence intervals during an elevator ride!
n events in n trials
• We can also use the "three over n rule" to find the lower confidence bound for p when every one of the n trials has an event
  – The lower 95% confidence bound is 1 − 3/n
Comparing proportions: risk difference
• Sometimes we are interested in comparing rates across groups.
• For example, in the FEV data we might be interested in comparing smoking rates for boys and girls.
• We estimate the proportion of girls and boys who smoke with their corresponding sample proportions. We estimate the difference in smoking rates with the difference in sample proportions, which has standard error √[ p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ ].
• In large samples, CIs can be computed from the Normal approximation.
Comparing proportions: risk difference
             Boys   Girls
Smoker         26      39
Not Smoker    310     279
Total         336     318
6060
• In the sample, 12.3% of girls smoke and 7.8% of boys.
• We estimate the difference in smoking rates between girls and boys is 4.5%, with 95% confidence interval −0.1% to 9.1%.
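A sketch reproducing these numbers from the 2×2 table, using the standard error formula above (results should match up to rounding):

# Sketch: risk difference (girls - boys) and its Normal-approximation 95% CI, FEV data.
import numpy as np

x_girls, n_girls = 39, 318
x_boys,  n_boys  = 26, 336

p_g, p_b = x_girls / n_girls, x_boys / n_boys
diff = p_g - p_b                                                     # about 0.045
se = np.sqrt(p_g * (1 - p_g) / n_girls + p_b * (1 - p_b) / n_boys)
print(diff, diff - 1.96 * se, diff + 1.96 * se)                      # about (-0.001, 0.091)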
Comparing proportions
• There are other ways to compare two proportions:
  – relative risk (risk ratio)
  – odds ratio
• These provide examples where we make CIs using transformations.
Transformation to improve CLT: OR
• For some statistical summaries, it is standard to calculate standard errors and confidence intervals for some transformation of the summary. One example is the odds ratio.
• The odds ratio can be shown to be ad/bc (HW6)
• Its standard error is estimated by OR̂ × √(1/a + 1/b + 1/c + 1/d)
              Exposed   Unexposed
Disease           a          b
Not Disease       c          d
Transformation to improve CLT: OR
• We usually work with the logarithm of the odds ratio, whose standard error is estimated by √(1/a + 1/b + 1/c + 1/d)
• Although the sampling distributions for both OR and log(OR) approach Normal distributions as the sample size increases, it happens faster for log(OR)
• This means that for a given sample size, a CI for the log(OR) is more reliable than for the OR.
Transformation to improve CLT: OR
• A 95% confidence interval for log(OR) is

  log(OR̂) ± 1.96 × √(1/a + 1/b + 1/c + 1/d)

• Since 95% of sample log odds ratios are in this interval, 95% of sample odds ratios are in the exponentiated interval. This confidence interval will not be symmetric around the point estimate.
• Similar results hold for the relative risk (RR). Best to make confidence intervals for the log(RR) and exponentiate.
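A sketch of the log-transform interval for the odds ratio, using the FEV 2×2 counts (a = 26, b = 39, c = 310, d = 279, with boys as the "exposed" column). Note this is the logit-based interval; the Stata output on the next slide reports a Cornfield interval, so the endpoints are close but not identical:

# Sketch: CI for the odds ratio via the log transformation.
# OR = ad/bc; SE[log(OR)] = sqrt(1/a + 1/b + 1/c + 1/d); exponentiate the endpoints.
import numpy as np

a, b, c, d = 26, 39, 310, 279
or_hat = (a * d) / (b * c)                     # 0.6
se_log = np.sqrt(1/a + 1/b + 1/c + 1/d)

lo = np.exp(np.log(or_hat) - 1.96 * se_log)
hi = np.exp(np.log(or_hat) + 1.96 * se_log)
print(or_hat, lo, hi)                          # interval is not symmetric around 0.6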
STATA
. csi 26 39 310 279

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26          39  |         65
        Noncases |       310         279  |        589
-----------------+------------------------+------------
           Total |       336         318  |        654
                 |                        |
            Risk |   .077381    .1226415  |   .0993884
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0452606       |   -.0912611      .00074
      Risk ratio |         .6309524       |    .3935794    1.011488
 Prev. frac. ex. |         .3690476       |    -.011488    .6064206
 Prev. frac. pop |         .1896024       |
                 +-------------------------------------------------
                               chi2(1) =     3.74  Pr>chi2 = 0.0532
STATA
. csi 26 39 310 279, or

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26          39  |         65
        Noncases |       310         279  |        589
-----------------+------------------------+------------
           Total |       336         318  |        654
                 |                        |
            Risk |   .077381    .1226415  |   .0993884
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0452606       |   -.0912611      .00074
      Risk ratio |         .6309524       |    .3935794    1.011488
 Prev. frac. ex. |         .3690476       |    -.011488    .6064206
 Prev. frac. pop |         .1896024       |
      Odds ratio |               .6       |    .3576171    1.006864  (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     3.74  Pr>chi2 = 0.0532