TRANSCRIPT
BIOST 514/517: Biostatistics I / Applied Biostatistics I
Kathleen Kerr, Ph.D., Associate Professor of Biostatistics
University of Washington
Lecture 11: Properties of Estimates; Confidence Intervals; Standard Errors; Inference for Proportions
November 8 and 13, 2013
Lecture Outline
• Properties of Estimates (Inferential Statistics)
  – variability
  – bias
  – mean squared error
  – consistency
  – efficiency
• Confidence Intervals for Population Parameters
• Confidence Intervals for a Proportion
  – Asymptotic vs. exact methods
• Estimating Standard Errors
• Comparing proportions: risk difference, odds ratio
Inferential Statistic
• An inferential statistic or an estimate is computed on a sample and used to estimate a population parameter
  – sample mean used to estimate population mean
  – sample median used to estimate population median
  – proportion of a sample that is hypertensive used to estimate the proportion of the population that is hypertensive
  – etc.
Paradigm of Statistics
• Population parameters are real but unknown numbers
• Inferential statistics computed on samples are used to estimate population parameters
• We don't expect to estimate a population parameter exactly. We use statistical theory to understand the error in our estimate.
Error of Estimates
There are two kinds of error that estimates can have:
1. Variability
2. Bias
Another way to say this is: there are two desirable properties of estimates:
1. Precision
2. Accuracy
[Figure: dartboard analogy. The "target" is the true value of the population parameter; bias measures accuracy, variability measures precision.]
Estimates and Error
We like estimates that are precise and accurate. Said differently, we like estimates that have low variability and little or no bias.
Who is the best player?
[Figure: four sampling distributions of sample statistic values, each marked with the true parameter value.
  – Statistic A: low (actually no) bias and low variability, i.e. high precision.
  – Statistic B: low (actually no) bias and high variability, i.e. low precision.
  – Statistic C: high bias and low variability, i.e. high precision.
  – Statistic D: high bias and high variability, i.e. low precision.]
Estimates and Error
• In the preceding slide, the distributions represent the sampling distribution of the statistic
• Most of the statistics we use are unbiased – the expected value of the sampling distribution is the true value of the population parameter
• Unbiasedness is desirable but not an absolutely necessary property of good estimates
Example:
• Estimator of population mean μ:
  Sample mean: X̄ = (1/n) Σ Xⱼ, summing over j = 1, …, n
  – Expected value: E[X̄] = μ, so the sample mean is an unbiased estimator of the population mean.
  – Variance: Var[X̄] = σ²/n, so precision increases with the sample size.
  – Standard error: √Var[X̄] = σ/√n
Mean Squared Error
For an estimator T of a population parameter θ, the mean squared error of T is E[(T − θ)²].
MSE is related to the bias and variability of T. Specifically, MSE(T) = Var(T) + Bias²(T).
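This identity can be checked numerically. Below is a minimal simulation sketch (not from the lecture; the Normal data, the true mean of 5, and the deliberately biased estimator T = 0.9·X̄ are all made-up choices for illustration):

# Sketch: check MSE(T) = Var(T) + Bias^2(T) by simulation.
# The data model and the estimator T = 0.9 * Xbar are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 25, 200_000             # true mean, sample size, replications

samples = rng.normal(loc=theta, scale=2.0, size=(reps, n))
T = 0.9 * samples.mean(axis=1)                # a deliberately biased estimator of theta

mse  = np.mean((T - theta) ** 2)              # E[(T - theta)^2]
var  = np.var(T)                              # Var(T)
bias = np.mean(T) - theta                     # Bias(T)
print(mse, var + bias ** 2)                   # the two quantities agree up to simulation noise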
Consistency
Good estimators are consistent. Roughly, this means that as the sample size increases, the sampling distribution of the estimator becomes more concentrated around the true value of the population parameter.
• There are different precise mathematical definitions of "becomes more concentrated," corresponding to notions of weak and strong consistency.
• For example, an unbiased estimator whose variance shrinks to zero as n increases is consistent.
  – E.g., the sample mean
Consistency
An estimator can be biased and still be consistent.
• Some estimates of the variance are biased yet consistent (it depends on whether we divide by n or n − 1).
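As a quick check of this point, the sketch below (not from the lecture; Normal data with an arbitrary true variance of 4) compares the divide-by-n and divide-by-(n − 1) variance estimators as n grows:

# Sketch: the divide-by-n variance estimator is biased downward but consistent.
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0                                 # hypothetical true population variance

for n in (5, 50, 5000):
    x = rng.normal(0.0, np.sqrt(true_var), size=(100_000, n))
    v_n  = x.var(axis=1, ddof=0).mean()        # divide by n: expectation is (n-1)/n * sigma^2
    v_n1 = x.var(axis=1, ddof=1).mean()        # divide by n-1: unbiased
    print(n, round(v_n, 3), round(v_n1, 3))    # both approach 4.0 as n grows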
Efficiency
A desirable property of an estimator is that it is efficient. Roughly, this means that it uses as much of the information in the data as possible.
  – We won't get into the exact technical definitions of efficiency.
• For example, for a sample X₁, X₂, …, Xₙ, suppose we want to estimate the mean.
• X̄ = (1/n) Σ Xᵢ is unbiased and consistent.
• X̄_odd, the mean of only the odd-numbered observations, is also unbiased and consistent, but it is less efficient than X̄.
Efficiency
Suppose we know our variable is Normally distributed in the population. Then the mean and the median are the same parameter μ. The sample mean and the sample median are both unbiased estimates of μ.
However, the median is about 64% as efficient in estimating μ as the mean. Estimating μ using the median is like throwing out a random 1/3 of your data and then using the mean.
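This relative-efficiency figure can be checked by simulation. A rough sketch (assuming Normal data; the 64% figure is the large-sample value 2/π, so the simulated ratio will be close but not exact):

# Sketch: relative efficiency of the sample median vs. the sample mean for Normal data.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 100_000
x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))

var_mean   = x.mean(axis=1).var()
var_median = np.median(x, axis=1).var()
print(var_mean / var_median)                   # roughly 0.64, i.e. 2/pi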
Confidence Intervals
Confidence Intervals
Confidence Intervals answer questions of the following sort:
• For what values of the population parameter are the data fairly "typical"?
• With what values of the population parameter are the data consistent?
Confidence Intervals
"Fairly typical" is defined with respect to the sampling distribution:
• not in the upper extreme of the sampling distribution, or
• not in the lower extreme of the sampling distribution, or
• in neither of the tails of the sampling distribution.
Confidence Intervals: Thought Exercise
We want to estimate the mean LDL cholesterol level among senior citizens. The mean in our sample of 50 senior citizens is 132 mg/dL.
Ask: If the true mean in the population were # mg/dL, would a sample mean of 132 be surprising?
Ask: If the true mean in the population were 130 mg/dL, would a sample mean of 132 be surprising?
[Figure: sampling distribution of the mean of a sample of size 50 when the population mean is 130, with the observed sample mean of 132 marked.]
Ans: No. If the true mean in the population were 130 mg/dL, a sample mean of 132 would not be surprising.
Ask: If the true mean in the population were 140 mg/dL, would a sample mean of 132 be surprising?
[Figure: sampling distribution of the mean of a sample of size 50 when the population mean is 140, with 132 marked.]
Ans: No. If the true mean in the population were 140 mg/dL, a sample mean of 132 would not be very surprising.
Ask: If the true mean in the population were 100 mg/dL, would a sample mean of 132 be surprising?
[Figure: sampling distribution of the mean of a sample of size 50 when the population mean is 100, with 132 marked.]
Ans: Yes. If the true mean in the population were 100 mg/dL, a sample mean of 132 would be very surprising.
Confidence Intervals: Thought Exercise
We want to estimate the mean LDL cholesterol level among senior citizens. The mean in our sample of 50 senior citizens is 132 mg/dL.
In this example, we might report that “with 95% confidence, the mean cholesterol level among senior citizens is between 122 and 142”
or
"The data are consistent with a mean cholesterol level among senior citizens between 122 and 142"
Now the math: Confidence Interval for the Population Mean
• Question: When we do not know the population mean, how can we use the sample to estimate the population mean, and use our knowledge of probability to give a range of values consistent with the data?
• Parameter: μ
• Estimator: X̄
• Given adequate sample size, using the CLT, we can state:

  X̄ ~ N(μ, σ²/n)

  P( −1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96 ) = 0.95

Rearranging:

  P( μ − 1.96 σ/√n ≤ X̄ ≤ μ + 1.96 σ/√n ) = 0.95
  P( X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n ) = 0.95

The 95% confidence interval for μ is:

  ( X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n )
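A minimal sketch of this interval in Python (not the lecture's software). The sample mean and n echo the LDL example; σ = 36 is a made-up population SD chosen only so the result lands near the (122, 142) interval reported earlier:

# Sketch: 95% CI for a population mean, treating sigma as known.
import numpy as np

xbar, sigma, n = 132.0, 36.0, 50               # sample mean; hypothetical sigma; sample size
half_width = 1.96 * sigma / np.sqrt(n)         # 1.96 = 0.975 quantile of the standard Normal
print(xbar - half_width, xbar + half_width)    # approximately (122, 142)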
Interpretation
Correct: If we repeat the procedure of taking a sample of the same size and constructing a 95% confidence interval on the sample, about 95% of those confidence intervals will contain the true value.
Incorrect: “There is a 95% chance that the 95% confidence interval contains the true value”
Simulation Study: Confidence intervals
Simulation of 100 data sets: 95% CIs were computed for each data set. The true parameter value is in purple – 95% of these intervals contain the true value!
Confidence Interval: Population Mean
• We showed that the 95% confidence interval for μ is:

  ( X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n )

• Want a 100(1 − α)% confidence interval (note: α is in the interval (0,1)).
Develop:

  P( X̄ − z_{1−α/2} σ/√n ≤ μ ≤ X̄ + z_{1−α/2} σ/√n ) = 1 − α

The 100(1 − α)% confidence interval for μ is:

  ( X̄ − z_{1−α/2} σ/√n , X̄ + z_{1−α/2} σ/√n )
Confidence Interval: Population Mean
• A 100(1 − α)% confidence interval for μ is

  ( X̄ − z_{1−α/2} σ/√n , X̄ + z_{1−α/2} σ/√n )

• This formula requires knowledge of the population variance (σ²). In practice, we do not know the population variance.
Confidence Interval: Population Mean
• Usually, the population variance is unknown. We can estimate it with s² (the sample variance):

  s² = [1/(n − 1)] Σ (Xⱼ − X̄)², summing over j = 1, …, n

• The statistic

  T = (X̄ − μ) / (s/√n)

has a t-distribution with n − 1 degrees of freedom. We can use this distribution to obtain a confidence interval for μ when σ is not known.
[Figure: Normal and t distributions.]
Confidence Interval: Population Mean
• A 100(1 − α)% confidence interval for μ when the population variance is unknown is given by

  ( X̄ − t_{n−1, 1−α/2} s/√n , X̄ + t_{n−1, 1−α/2} s/√n )

where t_{n−1, 1−α/2} is the critical value (quantile) in the t-distribution with (n − 1) degrees of freedom.
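A sketch of the t-based interval using scipy (the data vector here is hypothetical):

# Sketch: 100(1-alpha)% CI for the mean when the population variance is unknown.
import numpy as np
from scipy import stats

x = np.array([128.0, 141.0, 119.0, 135.0, 150.0, 122.0, 138.0, 127.0])  # hypothetical data
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # t_{n-1, 1-alpha/2}
half_width = t_crit * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)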
t-distribution and Confidence Intervals
• Whenever we make a confidence interval for the mean of a continuous variable, we must also estimate the population variance.
• So technically, we should make confidence intervals with the t-distribution instead of the Normal distribution.
• A t-distribution is centered at 0 and has heavier tails than a Normal distribution. A t-distribution is parameterized by its degrees of freedom.
• Often, we are OK to gloss over this detail, since a t-distribution is very close to Normal for large degrees of freedom.
t-based critical values
df (n − 1)   Critical value for 95% CI
10           2.22
20           2.09
50           2.01
100          1.98
200          1.97
300          1.967
…            …
Normal       1.96
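These critical values can be reproduced with any statistical package; for instance, a short scipy sketch:

# Sketch: reproduce the 95% t-based critical values in the table above.
from scipy import stats

for df in (10, 20, 50, 100, 200, 300):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("Normal", round(stats.norm.ppf(0.975), 3))   # 1.96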
Confidence Interval: General form
A 100(1 − α)% confidence interval for a parameter of interest θ, with estimate θ̂:

  θ̂ ± (1 − α/2 critical value) × (std err of θ̂)

The standard error of the estimator is √Var[θ̂].
To find the critical value we need to know the sampling distribution of the estimator.
For many parameters that we estimate, we make confidence intervals via analogous methods.
Standard errors
• We have seen that the standard error of the mean is SD(X)/√n. It can be estimated by s/√n.
• We need standard errors for other estimates of other parameters.
Example: Difference in Means
• Suppose we have n1 observations in group 1 with standard deviation s1 and n2 observations in group 2 with standard deviation s2.
• There are two widely used estimates for the standard error of the difference in the two group means.
Example: Difference in Means
• SEequal estimates the standard error under the assumption that Groups 1 and 2 have the same variance.
  – Sometimes called a "pooled" variance estimate.
• SEunequal estimates the standard error without assuming that Groups 1 and 2 have the same variance.
  – This is the one we have seen already.
  – This is the one you should know (be able to derive).
  – Since we rarely know that the true population variances are the same in the two groups, it makes sense to use this one.
Example: Difference in Means
• SEunequal estimates the standard error without assuming that Groups 1 and 2 have the same variance.
  – The Central Limit Theorem tells us that as long as n1 and n2 are not too small, X̄₁ − X̄₂ will have approximately a Normal distribution centered at the true difference in population means. The standard deviation of this sampling distribution is consistently estimated with SEunequal.
  – We can make confidence intervals using critical values from the Normal distribution or, when the n's are small, a t-distribution.
• Which t-distribution? There are two ways of calculating the degrees of freedom: Satterthwaite and Welch. Extremely technical and uninteresting. Let your software do it.
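A sketch of the SEunequal calculation and the resulting Normal-approximation CI. The formula √(s1²/n1 + s2²/n2) is the standard unequal-variance standard error; the group summary statistics below are hypothetical:

# Sketch: CI for a difference in means using SE_unequal = sqrt(s1^2/n1 + s2^2/n2).
# The group summaries are hypothetical.
import numpy as np

xbar1, s1, n1 = 105.0, 15.0, 40
xbar2, s2, n2 = 98.0, 12.0, 55

diff = xbar1 - xbar2
se_unequal = np.sqrt(s1**2 / n1 + s2**2 / n2)
print(diff - 1.96 * se_unequal, diff + 1.96 * se_unequal)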
Example: Proportions
• A population proportion is often a parameter of interest. Since a proportion is also a mean, we could use what we already know.
• However, there is a mean-variance relationship for binary variables. For a binary variable with true population mean p, the true population standard deviation is √[p(1-p)]. It is conventional to estimate the standard error of our estimate of p using this formula rather than from s.
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke.
. gen is_smoker= smoke==1
. tab is_smoker
  is_smoker |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        589       90.06       90.06
          1 |         65        9.94      100.00
------------+-----------------------------------
      Total |        654      100.00
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke. One option is to treat is_smoker like a continuous variable, and use s to estimate the standard error of our estimate.
. ci is_smoker
    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
   is_smoker |    654    .0993884     .0117079      .0763987   .1223781
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke. A 2nd option is to acknowledge that is_smoker is a binary variable and use the mean-variance relationship to estimate the standard error of our estimate.
. ci is_smoker, binomial wald
                                                -- Binomial Wald ---
    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
   is_smoker |    654    .0993884      .011699      .0764588   .1223179
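The Wald interval above can be reproduced directly from the mean-variance relationship; a sketch in Python (not the lecture's Stata), which should match the output up to rounding:

# Sketch: Wald CI for a proportion, using SE = sqrt(p_hat*(1 - p_hat)/n).
import numpy as np

x, n = 65, 654                                 # smokers and total children in the FEV data
p_hat = x / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat, p_hat - 1.96 * se, p_hat + 1.96 * se)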
Exact Distribution
• Here, we do not have to rely on asymptotic theory
• A binary variable must be Bernoulli
• Sums of independent Bernoulli random variables must be binomial
• We can use the exact binomial distribution to compute our probabilities– (Well, computers can)
Binomial Distribution
• Probability theory provides a formula for the distribution of binomial random variables
Data: X₁, …, Xₙ iid ~ B(1, p)  (Bernoulli)
Y = X₁ + … + Xₙ ~ B(n, p)  (binomial)
For k = 0, 1, …, n:

  Pr(Y = k) = [ n! / (k!(n − k)!) ] p^k (1 − p)^(n−k)
Exact Point Estimate
• Still use the sample mean
Data: X₁, …, Xₙ iid ~ B(1, p), with E[X] = p and Var[X] = p(1 − p)
Point estimate:

  p̂ = X̄ = (1/n)(X₁ + … + Xₙ)
Exact Confidence Intervals
• Use the binomial distribution– (But let a computer do it for you)
An exact 100(1 − α)% confidence interval for p based on observing Y = k is (p̂_L, p̂_U), where an iterative search is used to find p̂_L and p̂_U satisfying:

  Pr(Y ≥ k; p̂_L) = Σ (i = k to n) [ n! / (i!(n − i)!) ] p̂_L^i (1 − p̂_L)^(n−i) = α/2

  Pr(Y ≤ k; p̂_U) = Σ (i = 0 to k) [ n! / (i!(n − i)!) ] p̂_U^i (1 − p̂_U)^(n−i) = α/2
Example: Proportions
• FEV data. Estimating the proportion of kids who smoke. 3rd option: binomial exact confidence interval
. ci is_smoker, binomial
                                                -- Binomial Exact --
    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
   is_smoker |    654    .0993884      .011699      .0775451    .124923
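The exact (Clopper–Pearson) interval can be computed without the iterative search by using Beta-distribution quantiles, which is how many packages implement it. A sketch, which should agree with the exact interval above up to rounding:

# Sketch: exact (Clopper-Pearson) CI for a proportion via Beta quantiles.
from scipy import stats

x, n, alpha = 65, 654, 0.05
lower = stats.beta.ppf(alpha / 2, x, n - x + 1)        # defined as 0 when x == 0
upper = stats.beta.ppf(1 - alpha / 2, x + 1, n - x)    # defined as 1 when x == n
print(lower, upper)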
Example: Proportions
• The 1st option would be an unusual choice in practice.
  – It is valid, but it is better to use the mean-variance relationship when we have it.
• The 2nd option is commonly used.
• The 3rd option is commonly used. Exact confidence intervals are better when they are possible because we avoid making a distributional approximation. Option 3 is the preferred method, but it will be similar to Option 2 unless np or n(1 − p) is small.
  – Exact binomial confidence intervals are the default in STATA when the "binomial" option is used, but not in other software.
Proportions: 0 events in n trials
• Two-sided confidence intervals fail in the case where either 0 or n events are observed in n Bernoulli trials
• However, we can derive one-sided confidence bounds in these cases.
Proportions: 0 events in n trials: Upper Confidence Bound
• Exact upper confidence bound when there are 0 “successes” or “events” in n trials
Suppose Y ~ B(n, p) and Y = 0 is observed.
The exact 100(1 − α)% upper confidence bound for p is p̂_U, which satisfies

  Pr(Y = 0; p̂_U) = (1 − p̂_U)^n = α

so

  p̂_U = 1 − α^(1/n)
Large sample approximation
Starting from (1 − p̂_U)^n = α and taking logs:

  n log(1 − p̂_U) = log α

For small p̂_U, log(1 − p̂_U) ≈ −p̂_U, so for large n

  p̂_U ≈ −log(α) / n
Large sample approximation
• "Three over n rule"
  – log(0.05) = −2.9957
  – So for 0 events in n trials the upper 95% confidence bound is approximately 3/n
• 99% upper confidence bound
  – log(0.01) = −4.605
  – Use 4.6/n as the 99% upper confidence bound
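A sketch comparing the exact upper bound 1 − α^(1/n) with the 3/n and 4.6/n approximations; it should reproduce the rows of the table on the next slide:

# Sketch: exact upper confidence bound for p when 0 events are observed in n trials,
# versus the 3/n (95%) and 4.6/n (99%) approximations.
for n in (2, 5, 10, 20, 30, 50, 100):
    exact_95 = 1 - 0.05 ** (1 / n)
    exact_99 = 1 - 0.01 ** (1 / n)
    print(n, round(exact_95, 4), round(3 / n, 2), round(exact_99, 4), round(4.6 / n, 4))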
Approximation vs Exact
• When X = 0 events are observed in n Bernoulli trials:

  n     Exact 95% bound   3/n     Exact 99% bound   4.6/n
  2     .7764             1.50    .9000             2.3000
  5     .4507             .60     .6019             .9200
  10    .2589             .30     .3690             .4600
  20    .1391             .15     .2057             .2300
  30    .0950             .10     .1423             .1533
  50    .0582             .06     .0880             .0920
  100   .0295             .03     .0450             .0460
• Impress your friends! Compute confidence intervals during an elevator ride!
n events in n trials
• We can also use the "three over n rule" to find the lower confidence bound for p when every one of the n trials has an event
  – The lower 95% confidence bound is 1 − 3/n
Comparing proportions: risk difference
• Sometimes we are interested in comparing rates across groups.
• For example, in the FEV data we might be interested in comparing smoking rates for boys and girls.
• We estimate the proportion of girls and boys who smoke with their corresponding sample proportions. We estimate the difference in smoking rates with the difference in sample proportions, which has standard error √[ p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ ].
• In large samples, CIs can be computed from the Normal approximation.
Comparing proportions: risk difference
             Boys   Girls
Smoker         26      39
Not Smoker    310     279
Total         336     318
6060
• In the sample, 12.3% of girls smoke and 7.8% of boys.
• We estimate the difference in smoking rates between girls and boys is 4.5%, with 95% confidence interval −0.1% to 9.1%.
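A sketch reproducing these numbers from the 2×2 table, using the standard error formula above (results should match up to rounding):

# Sketch: risk difference (girls - boys) and its Normal-approximation 95% CI, FEV data.
import numpy as np

x_girls, n_girls = 39, 318
x_boys,  n_boys  = 26, 336

p_g, p_b = x_girls / n_girls, x_boys / n_boys
diff = p_g - p_b                                                     # about 0.045
se = np.sqrt(p_g * (1 - p_g) / n_girls + p_b * (1 - p_b) / n_boys)
print(diff, diff - 1.96 * se, diff + 1.96 * se)                      # about (-0.001, 0.091)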
Comparing proportions
• There are other ways to compare two proportions:
  – relative risk (risk ratio)
  – odds ratio
• These provide examples where we make CIs using transformations.
Transformation to improve CLT: OR
• For some statistical summaries, it is standard to calculate standard errors and confidence intervals for some transformation of the summary. One example is the odds ratio.
• The odds ratio can be shown to be ad/bc (HW6)
• Its standard error is estimated by OR̂ × √(1/a + 1/b + 1/c + 1/d)
              Exposed   Unexposed
Disease           a          b
Not Disease       c          d
Transformation to improve CLT: OR
• We usually work with the logarithm of the odds ratio, whose standard error is estimated by √(1/a + 1/b + 1/c + 1/d)
• Although the sampling distributions for both OR and log(OR) approach Normal distributions as the sample size increases, it happens faster for log(OR)
• This means that for a given sample size, a CI for the log(OR) is more reliable than for the OR.
Transformation to improve CLT: OR
• A 95% confidence interval for log(OR) is

  log(OR̂) ± 1.96 × √(1/a + 1/b + 1/c + 1/d)

• Since 95% of sample log odds ratios are in this interval, 95% of sample odds ratios are in the exponentiated interval. This confidence interval will not be symmetric around the point estimate.
• Similar results hold for the relative risk (RR). Best to make confidence intervals for the log(RR) and exponentiate.
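A sketch of the log-transform interval for the odds ratio, using the FEV 2×2 counts (a = 26, b = 39, c = 310, d = 279, with boys as the "exposed" column). Note this is the logit-based interval; the Stata output on the next slide reports a Cornfield interval, so the endpoints are close but not identical:

# Sketch: CI for the odds ratio via the log transformation.
# OR = ad/bc; SE[log(OR)] = sqrt(1/a + 1/b + 1/c + 1/d); exponentiate the endpoints.
import numpy as np

a, b, c, d = 26, 39, 310, 279
or_hat = (a * d) / (b * c)                     # 0.6
se_log = np.sqrt(1/a + 1/b + 1/c + 1/d)

lo = np.exp(np.log(or_hat) - 1.96 * se_log)
hi = np.exp(np.log(or_hat) + 1.96 * se_log)
print(or_hat, lo, hi)                          # interval is not symmetric around 0.6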
STATA
. csi 26 39 310 279

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26          39  |         65
        Noncases |       310         279  |        589
-----------------+------------------------+------------
           Total |       336         318  |        654
                 |                        |
            Risk |   .077381    .1226415  |   .0993884
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0452606       |   -.0912611      .00074
      Risk ratio |         .6309524       |    .3935794    1.011488
 Prev. frac. ex. |         .3690476       |    -.011488    .6064206
 Prev. frac. pop |         .1896024       |
                 +-------------------------------------------------
                               chi2(1) =     3.74  Pr>chi2 = 0.0532
STATA
. csi 26 39 310 279, or

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26          39  |         65
        Noncases |       310         279  |        589
-----------------+------------------------+------------
           Total |       336         318  |        654
                 |                        |
            Risk |   .077381    .1226415  |   .0993884
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0452606       |   -.0912611      .00074
      Risk ratio |         .6309524       |    .3935794    1.011488
 Prev. frac. ex. |         .3690476       |    -.011488    .6064206
 Prev. frac. pop |         .1896024       |
      Odds ratio |               .6       |    .3576171    1.006864  (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     3.74  Pr>chi2 = 0.0532