Lecture 2
Statistical Inference
An example of data: Mid-Heights of Parents (x) vs Heights of Adult Children (y)

Heights of      |                    Mid-heights of parents (inches)
adult children  | below  64.5  65.5  66.5  67.5  68.5  69.5  70.5  71.5  72.5  above |  sum
above           |     -     -     -     -     -     -     5     3     2     4     -  |   14
73.2            |     -     -     -     -     -     3     4     3     2     2     3  |   17
72.2            |     -     -     1     -     4     4    11     4     9     7     1  |   41
71.2            |     -     -     2     -    11    18    20     7     4     2     -  |   64
70.2            |     -     -     5     4    19    21    25    14    10     1     -  |   99
69.2            |     1     2     7    13    38    48    33    18     5     2     -  |  167
68.2            |     1     -     7    14    28    34    20    12     3     1     -  |  120
67.2            |     2     5    11    17    38    31    27     3     4     -     -  |  138
66.2            |     2     5    11    17    36    25    17     1     3     -     -  |  117
65.2            |     1     1     7     2    15    16     4     1     1     -     -  |   48
64.2            |     4     4     5     5    14    11    16     -     -     -     -  |   59
63.2            |     2     4     9     3     5     7     1     1     -     -     -  |   32
62.2            |     -     1     -     3     3     -     -     -     -     -     -  |    7
below           |     1     1     1     -     -     1     -     1     -     -     -  |    5
sum             |    14    23    66    78   211   219   183    68    43    19     4  |  928

F. Galton: Regression towards mediocrity in hereditary stature, Anthropological Miscellanea (1886)
Multi-dimensional data and data matrix

2-dimensional data:

No.   Father   Adult child
1     68.2     63.3
2     71.8     72.5
3     64.4     69.2
...   ...      ...
i     x_i      y_i
...   ...      ...
928   x_928    y_928

p-dimensional data: n samples of p variables. The i-th data is the vector

$$\boldsymbol{x}_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}$$

and the n x p data matrix stacks the n samples as rows:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ \vdots & & & \vdots \\ x_{i1} & x_{i2} & \cdots & x_{ip} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
Joint distribution

The Galton table above is the joint distribution of the pair (x, y), where x is the mid-height of parents and y is the height of the adult child: each cell counts the number of cases with the corresponding pair of values.
Marginal distribution (Heights of adult children)

population → samples → statistical inference

Inch        | below  62.2  63.2  64.2  65.2  66.2  67.2  68.2  69.2  70.2  71.2  72.2  73.2  above |   sum
Num.        |     5     7    32    59    48   117   138   120   167    99    64    41    17    14  |   928
Rel. Freq.  | 0.005 0.008 0.034 0.064 0.052 0.126 0.149 0.129 0.180 0.107 0.069 0.044 0.018 0.015  | 1.000

[Bar chart of the relative frequencies; vertical axis 0.000-0.200]
Point estimates of population parameters

The population distribution is our ultimate target. However, it is not feasible to obtain full information about it. Instead, we estimate some parameters of the population distribution, e.g., the mean value, variance, maximal value, correlation coefficient, ....

Point estimate: Let θ be a parameter of the population distribution. We wish to find a function

$$\hat{\theta} = \hat{\theta}(x_1, x_2, x_3, \dots, x_n)$$

such that θ̂ is a reasonable estimate of the unknown parameter θ computed from the sample data x_1, x_2, x_3, ..., x_n.

population distribution (parameter θ) → samples x_1, x_2, x_3, ..., x_n → statistical inference → θ̂
Unbiased estimates

Definition: θ̂ = θ̂(X_1, X_2, ..., X_n) is called an unbiased estimator of θ if E[θ̂] = θ.

population (unknown parameter θ) → samples x_1, x_2, x_3, ..., x_n → θ̂ = θ̂(x_1, x_2, ..., x_n)

Sample data x_1, x_2, x_3, ..., x_n are definite values once a sampling is performed. However, they vary from one sampling to another even though the population remains the same. Moreover, because each sample x_i is a result of random sampling, it is modeled by a random variable X_i obeying the population distribution. Thus, the sample data x_1, x_2, x_3, ..., x_n are regarded as realized values of the independent random variables X_1, X_2, X_3, ..., X_n. Therefore, θ̂ should itself be considered a random variable θ̂ = θ̂(X_1, X_2, ..., X_n).

As usual, we adopt random sampling with replacement.
Sample mean

For random samples X_1, X_2, X_3, ..., X_n, the sample mean is defined by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

Theorem: The sample mean X̄ is an unbiased estimator of the mean value of the population:

$$E[\bar{X}] = m,$$

where m is the mean value of the population.

PROOF: Since each X_i is a random sample from the population, its distribution coincides with the population distribution. In particular, E[X_i] = m. Then,

$$E[\bar{X}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = m.$$

population (mean m):
sample 1: x_1, x_2, x_3, ..., x_n → x̄
sample 2: y_1, y_2, y_3, ..., y_n → ȳ
⋮
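The theorem can be checked numerically. The sketch below is illustrative and not part of the lecture: the normal population, its parameters, and the sample size are arbitrary choices. It repeats the sampling many times (the x̄, ȳ, ... of the diagram) and averages the resulting sample means.

```python
import random

random.seed(0)
m, sigma = 5.0, 2.0    # population mean and standard deviation (arbitrary)
n = 10                 # sample size
trials = 20000         # number of repeated samplings

# Average the sample means over many samplings; since E[Xbar] = m,
# the grand average should settle near the population mean m.
grand_avg = sum(
    sum(random.gauss(m, sigma) for _ in range(n)) / n
    for _ in range(trials)
) / trials

print(grand_avg)  # close to 5.0
```

Each individual x̄ fluctuates from sampling to sampling, but their long-run average agrees with m, which is exactly what unbiasedness asserts.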
Better estimates

Let X_1, X_2, X_3, ..., X_n be random samples.

1. We have seen that the sample mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

is an unbiased estimator: E[X̄] = m.

2. The weighted mean

$$T = \sum_{i=1}^{n} a_i X_i, \qquad \sum_{i=1}^{n} a_i = 1,$$

is also an unbiased estimator. In fact,

$$E[T] = \sum_{i=1}^{n} a_i E[X_i] = m.$$

3. Which is better?

Definition: Let T_1 and T_2 be two unbiased estimators of θ, i.e., E[T_1] = E[T_2] = θ. We say that T_1 is better than T_2 if

$$E[(T_1 - \theta)^2] \le E[(T_2 - \theta)^2].$$

In general, E[(T − θ)²] is called the mean squared error; for an unbiased estimator it coincides with the variance V[T].
Sample mean is better than weighted mean

For the weighted mean $T = \sum_{i=1}^{n} a_i X_i$ we compute the mean squared error $E[(T - m)^2]$.

First note that

$$T^2 = \Big(\sum_{i=1}^{n} a_i X_i\Big)\Big(\sum_{j=1}^{n} a_j X_j\Big) = \sum_{i \ne j} a_i a_j X_i X_j + \sum_{i=1}^{n} a_i^2 X_i^2,$$

and, since the $X_i$ are independent with $E[X_i] = m$ and $E[X_i^2] = \sigma^2 + m^2$,

$$E[T^2] = \sum_{i \ne j} a_i a_j E[X_i] E[X_j] + \sum_{i=1}^{n} a_i^2 E[X_i^2] = \sum_{i \ne j} a_i a_j m^2 + \sum_{i=1}^{n} a_i^2 (\sigma^2 + m^2)$$

$$= m^2 \sum_{i,j=1}^{n} a_i a_j + \sigma^2 \sum_{i=1}^{n} a_i^2 = m^2 + \sigma^2 \sum_{i=1}^{n} a_i^2,$$

where we used $\sum_{i,j=1}^{n} a_i a_j = \big(\sum_{i=1}^{n} a_i\big)^2 = 1$. Hence

$$E[(T - m)^2] = E[T^2] - m^2 = \sigma^2 \sum_{i=1}^{n} a_i^2 = \sigma^2 \sum_{i=1}^{n} \Big(a_i - \frac{1}{n}\Big)^2 + \frac{\sigma^2}{n},$$

where the last equality follows from

$$\sum_{i=1}^{n} \Big(a_i - \frac{1}{n}\Big)^2 = \sum_{i=1}^{n} a_i^2 - \frac{2}{n} \sum_{i=1}^{n} a_i + n \cdot \frac{1}{n^2} = \sum_{i=1}^{n} a_i^2 - \frac{1}{n}.$$

This attains its minimum σ²/n when a_i = 1/n, i.e., for the sample mean.
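A numerical sanity check of the formula E[(T − m)²] = σ² Σ a_i² (an illustrative sketch; the weights and the population are arbitrary choices):

```python
import random

random.seed(1)
m, sigma = 0.0, 1.0          # population mean and std (arbitrary)
a = [0.5, 0.3, 0.1, 0.1]     # weights with sum 1 (arbitrary)
trials = 100000

def weighted_mean():
    # T = sum_i a_i X_i for one fresh sample X_1, ..., X_n
    return sum(ai * random.gauss(m, sigma) for ai in a)

# Empirical mean squared error vs the closed form sigma^2 * sum a_i^2
mse = sum((weighted_mean() - m) ** 2 for _ in range(trials)) / trials
predicted = sigma ** 2 * sum(ai ** 2 for ai in a)
minimum = sigma ** 2 / len(a)   # sigma^2 / n, attained when a_i = 1/n
```

Here `mse` and `predicted` agree (about 0.36 for these weights), and both exceed `minimum` = σ²/n = 0.25, the mean squared error of the sample mean.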
Exercise 3 (10 min)

(1) A coin is tossed n times. Let X be the number of heads. Show that the relative frequency X/n is an unbiased estimator of the probability that a head occurs. [Hint] X/n = (1/n) Σ_{i=1}^n X_i, where X_i = 1 or 0 according to whether the i-th toss shows heads or tails.

(2) An urn contains many white balls and a relatively small number of red balls. n balls are chosen randomly with replacement. Find an unbiased estimator of the ratio of red balls in the urn.
Exercise 4 (Challenge) [Median is an unbiased estimator]

There is a deck of N cards, and a unique number from 1 to N is assigned to each card as a reward. Three cards are drawn randomly with replacement, say X_1, X_2, X_3. Then we know that (X_1 + X_2 + X_3)/3 is an unbiased estimator of the mean of the reward. The median M of X_1, X_2, X_3 is the reward located at the middle rank among the three, i.e., in this case the second largest (equivalently, the second smallest).

(1) Find the distribution of M, i.e., P(M = m).
(2) Show that M is an unbiased estimator of the mean of the reward.
Unbiased variance

• Let X_1, X_2, X_3, ..., X_n be random samples with mean m and variance σ².
• The sample mean is defined by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

• The variance would naturally be defined by

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

• However, E[S²] ≠ σ². (In fact, E[S²] = (n − 1)σ²/n.)

Theorem: An unbiased estimator of σ² is given by

$$U^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

In other words, E[U²] = σ².

Definition: U² is called the unbiased variance (or the sample variance in some literature).

PROOF is by direct calculation of E[U²], expanding the right-hand side [Hoel: p. 220].
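A simulation makes the bias visible (an illustrative sketch; the population and the sizes are arbitrary choices): dividing by n − 1 averages to σ², while dividing by n undershoots by the factor (n − 1)/n.

```python
import random

random.seed(2)
m, sigma2 = 0.0, 4.0      # population mean and variance (arbitrary)
n, trials = 5, 50000

sum_u2 = sum_s2 = 0.0
for _ in range(trials):
    xs = [random.gauss(m, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_u2 += ss / (n - 1)   # unbiased variance U^2
    sum_s2 += ss / n         # naive variance S^2

avg_u2 = sum_u2 / trials     # ~ sigma^2 = 4.0
avg_s2 = sum_s2 / trials     # ~ (n-1)/n * sigma^2 = 3.2
```

With n = 5 the naive estimator is systematically about 20% too small, matching E[S²] = (n − 1)σ²/n.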
Maximum likelihood estimates

• It is desirable to use the best unbiased estimator.
• The best unbiased estimators are known in many concrete cases; however, it is not straightforward to obtain the best unbiased estimator in general.
• Instead, the maximum likelihood method is often employed, because the maximum likelihood estimator is rather easily derived, and in many cases it coincides with the best unbiased estimator.
Examples:

(1) Binomial population — opinion polls, etc. The population distribution is the Bernoulli distribution B(1, p), so the unknown parameter is p.

(2) Poisson population — counting rare events. The population distribution Po(λ) contains just one unknown parameter λ.

(3) Normal population — a population consisting of individuals sharing a lot of small fluctuations. The normal distribution N(m, σ²) contains two unknown parameters m and σ².

Formulation:

population distribution f(x, θ) with unknown parameter θ → samples x_1, x_2, x_3, ..., x_n → θ̂ = θ̂(x_1, x_2, ..., x_n)
Likelihood functions

Let f(x, θ) be the population distribution, where θ is unknown. The likelihood function is defined by

$$L(x_1, x_2, \dots, x_n; \theta) = \prod_{i=1}^{n} f(x_i, \theta).$$

Basic idea:
• Given a set of sample data x_1, x_2, ..., x_n, we determine the value θ which maximizes L(x_1, x_2, ..., x_n; θ).
• Roughly, we presume that the event that actually occurred (the data x_1, x_2, ..., x_n) is the one having the highest probability among the many candidates.

The maximum of L is found by solving

$$\frac{\partial L}{\partial \theta} = 0.$$

Such a value θ = θ̂ is called the maximum likelihood estimator.

It is often more convenient to consider the log-likelihood function:

$$\log L(x_1, x_2, \dots, x_n; \theta) = \sum_{i=1}^{n} \log f(x_i, \theta).$$

Then

$$\frac{\partial \log L}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(x_i, \theta).$$
Binomial population

We consider

$$f(x, p) = \begin{cases} p, & \text{if } x = 1, \\ 1 - p, & \text{if } x = 0 \end{cases} \;=\; p^x (1-p)^{1-x},$$

where p is unknown and to be estimated. The likelihood function is defined by

$$L(x_1, x_2, \dots, x_n; p) = \prod_{i=1}^{n} f(x_i, p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i},$$

and the log-likelihood function by

$$\log L = \sum_{i=1}^{n} \big( x_i \log p + (1 - x_i) \log(1 - p) \big).$$

Then we have

$$\frac{\partial}{\partial p} \log L = \sum_{i=1}^{n} \Big( \frac{x_i}{p} - \frac{1 - x_i}{1 - p} \Big) = \frac{\sum_{i=1}^{n} x_i - np}{p(1 - p)},$$

and solving (∂/∂p) log L = 0 we obtain the maximum likelihood estimator

$$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

which is known as the sample mean and is an unbiased estimator too.
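The maximization can also be carried out numerically. The sketch below (illustrative; the true p and the sample size are arbitrary choices) locates the maximizer of log L on a grid and compares it with the sample mean:

```python
import math
import random

random.seed(3)
p_true = 0.3
xs = [1 if random.random() < p_true else 0 for _ in range(500)]
s, n = sum(xs), len(xs)

def log_likelihood(p):
    # log L = sum_i [ x_i log p + (1 - x_i) log(1 - p) ]
    return s * math.log(p) + (n - s) * math.log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]   # p in (0, 1), step 0.001
p_hat = max(grid, key=log_likelihood)
sample_mean = s / n
```

Since log L is strictly concave in p and the sample mean s/n lies on the 0.001 grid here, `p_hat` reproduces `sample_mean` exactly, in agreement with the analytic result.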
Point estimates vs interval estimates

population (unknown parameter θ) → samples x_1, x_2, x_3, ..., x_n → point estimator θ̂ = θ̂(x_1, x_2, ..., x_n)

• θ̂ is calculated by using our good estimator.
• However, nobody knows whether or not θ̂ is actually close to θ.
• What we do know is that our estimating method is statistically reasonable, in the sense that θ̂ hits a good approximation of θ with high frequency if we apply it many times.

A probabilistic estimate is obtained using a confidence interval [Hoel: Chap. 5].
Problem 5

Consider the interval [0, A], where A > 0 is unknown and to be estimated (e.g., the ultimate world record). Let X_1, X_2, ..., X_n be random samples obeying the uniform distribution on [0, A]. Let X̄ be the sample mean.

(1) Show that 2X̄ is an unbiased estimator of A.
(2) This model is too simple for application to estimating the ultimate world record of the long jump. Discuss this point.
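For part (1), a simulation sketch (A and n are arbitrary choices fixed for the experiment) showing that the average of 2X̄ over many samplings approaches A:

```python
import random

random.seed(4)
A = 7.0               # the unknown upper bound, fixed here for the experiment
n, trials = 10, 50000

est_sum = 0.0
for _ in range(trials):
    xs = [random.uniform(0, A) for _ in range(n)]
    est_sum += 2 * sum(xs) / n     # the estimator 2 * Xbar

avg = est_sum / trials             # long-run average, close to A
```

Any single value of 2X̄ may over- or undershoot A, but the long-run average matches it, which is what unbiasedness claims.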
Problem 6

There are many runners on the street. Each runner wears a number cloth, numbered consecutively from 1 up to a certain unknown big number N. Snapshots are taken at random.

(1) In a snapshot there are 3 runners with (distinct) numbers X_1, X_2, X_3. Set M = max{X_1, X_2, X_3}. Prove that

$$P(M = m) = \frac{3(m-1)(m-2)}{N(N-1)(N-2)}.$$

(2) Calculating the mean value E[M], show that (4/3)M − 1 is an unbiased estimator of N.

(3) In a snapshot there are k runners with numbers X_1, X_2, ..., X_k. Show that (1 + 1/k)M − 1 is an unbiased estimator of N, where M = max{X_1, X_2, ..., X_k}.
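A simulation sketch for parts (1)-(2) (N and the number of snapshots are arbitrary choices): draw 3 distinct numbers from {1, ..., N}, apply the estimator (4/3)M − 1, and average over many snapshots.

```python
import random

random.seed(5)
N = 100               # true largest number (unknown in the problem)
k, trials = 3, 100000

est_sum = 0.0
for _ in range(trials):
    M = max(random.sample(range(1, N + 1), k))   # 3 distinct runner numbers
    est_sum += (1 + 1 / k) * M - 1               # the estimator (4/3) M - 1

avg = est_sum / trials    # long-run average, close to N
```

Note that `random.sample` draws without replacement, matching the fact that the runners in one snapshot carry distinct numbers.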
Problem 7

Recall Exercise 4. There is a deck of N cards, and a unique number from 1 to N is assigned to each card as a reward. Three cards are drawn randomly with replacement, say X_1, X_2, X_3. We know that both

$$\bar{X} = \frac{1}{3}(X_1 + X_2 + X_3), \qquad M = \mathrm{median}(X_1, X_2, X_3)$$

are unbiased estimators of the mean of the reward. By calculating the mean squared errors, determine which is the better estimator.
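A simulation sketch to check your analytic comparison (N and the trial count are arbitrary choices): estimate both mean squared errors around the true mean (N + 1)/2.

```python
import random
import statistics

random.seed(6)
N, trials = 101, 100000
true_mean = (N + 1) / 2

se_mean = se_median = 0.0
for _ in range(trials):
    draws = [random.randint(1, N) for _ in range(3)]        # with replacement
    se_mean += (sum(draws) / 3 - true_mean) ** 2
    se_median += (statistics.median(draws) - true_mean) ** 2

mse_mean = se_mean / trials
mse_median = se_median / trials
```

Comparing `mse_mean` and `mse_median` indicates which estimator is better; the analytic calculation asked for in the problem should agree.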
Problem 8

The exponential distribution with parameter λ > 0 is defined by the density function

$$f(x, \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Then the likelihood function is defined by

$$L(x_1, x_2, \dots, x_n; \lambda) = \prod_{i=1}^{n} f(x_i, \lambda).$$

Find the maximum likelihood estimator λ̂ of λ.
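Once λ̂ is derived analytically, it can be checked numerically. The sketch below (with an arbitrary true λ and sample size) maximizes the log-likelihood on a grid, without assuming the closed form:

```python
import math
import random

random.seed(7)
lam_true = 2.0
xs = [random.expovariate(lam_true) for _ in range(1000)]
n, sx = len(xs), sum(xs)

def log_likelihood(lam):
    # log L = n log(lambda) - lambda * sum_i x_i   (all x_i >= 0)
    return n * math.log(lam) - lam * sx

grid = [i / 1000 for i in range(1, 10001)]   # lambda in (0, 10], step 0.001
lam_hat = max(grid, key=log_likelihood)      # numerical MLE, near lam_true
```

Since log L is strictly concave in λ, the grid maximizer lies within one grid step of the analytic maximizer, so `lam_hat` can be compared directly with your answer.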
Problem 9

It is known [de Moivre–Laplace theorem] that the binomial distribution B(n, p) is well approximated by the normal distribution N(np, np(1−p)) when np ≥ 30 and n(1−p) ≥ 30. Then, for a random variable X ~ B(n, p) we have

$$P(X \le x) = P\left( \frac{X - np}{\sqrt{np(1-p)}} \le y \right) \approx \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2} \, dt, \qquad y = \frac{x - np}{\sqrt{np(1-p)}}.$$

On the other hand, by numerical computation, for Z ~ N(0, 1) we know that

$$P(|Z| \le 1.96) = \frac{1}{\sqrt{2\pi}} \int_{-1.96}^{1.96} e^{-t^2/2} \, dt \approx 0.95.$$

A fair coin is tossed n times (n is so large that n ≫ 100). Let S_n be the number of heads among the n trials. Find d_n such that

$$P\left( \left| \frac{S_n}{n} - \frac{1}{2} \right| \le d_n \right) = 0.95.$$
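Whatever d_n you derive, a simulation can check its coverage. The sketch below (n and the trial count are arbitrary choices) tests the candidate d_n = 1.96·sqrt(p(1−p)/n) with p = 1/2, the value the normal approximation suggests; compare it with your own derivation.

```python
import random

random.seed(8)
n, trials = 400, 20000
d_n = 1.96 * (0.25 / n) ** 0.5   # candidate: 1.96 * sqrt(p(1-p)/n), p = 1/2

hits = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    hits += abs(heads / n - 0.5) <= d_n

coverage = hits / trials   # should be close to 0.95
```

The observed coverage sits near 0.95 (slightly below, because of the discreteness of the binomial at moderate n).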
Solution to Exercise 3

(1) Let p be the probability that a head occurs and X the number of heads obtained in tossing a coin n times. Then X/n is an unbiased estimator of p. In fact, define

$$X_i = \begin{cases} 1, & \text{if a head occurs on the } i\text{-th toss,} \\ 0, & \text{otherwise.} \end{cases}$$

Then X_1, X_2, ..., X_n are random samples from the population with mean p (this is called a binomial population), and we know that the sample mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

is an unbiased estimator, i.e., E[X̄] = p. On the other hand, Σ_{i=1}^n X_i is the number of heads obtained in tossing a coin n times and coincides with X. Consequently, X/n = X̄ is an unbiased estimator of p.

(2) Let p be the ratio of the red balls in the urn. Then choosing a ball from the urn is equivalent to tossing a coin for which the probability of heads is p, so the situation is the same as in (1). Letting X be the number of red balls among the n trials, X/n is an unbiased estimator of p.
Solution to Exercise 4

First we find the distribution of M. The event {M = m} splits into the following four cases:
• X_1 < m, X_2 = m, X_3 > m, or a permutation thereof;
• X_1 = X_2 = m, X_3 > m, or a permutation thereof;
• X_1 < m, X_2 = X_3 = m, or a permutation thereof;
• X_1 = X_2 = X_3 = m.

Thus,

$$P(M = m) = \frac{1}{N^3} \big( 6(m-1)(N-m) + 3(N-m) + 3(m-1) + 1 \big) = \frac{1}{N^3} \big( -6m^2 + 6(N+1)m - (3N+2) \big).$$

The mean value of M is computed as follows:

$$E[M] = \sum_{m=1}^{N} m \, P(M = m) = \frac{1}{N^3} \sum_{m=1}^{N} \big( -6m^3 + 6(N+1)m^2 - (3N+2)m \big)$$

$$= \frac{1}{N^3} \left( -6 \cdot \frac{N^2(N+1)^2}{4} + 6(N+1) \cdot \frac{N(N+1)(2N+1)}{6} - (3N+2) \cdot \frac{N(N+1)}{2} \right) = \frac{N+1}{2},$$

which coincides with the mean of the reward, (N + 1)/2. Hence M is an unbiased estimator of the mean of the reward.
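Both the distribution formula and the mean can be verified by exhaustive enumeration for a small deck (N = 10 here, an arbitrary small choice that keeps all N³ ordered draws enumerable):

```python
from itertools import product

N = 10   # small deck size so all N**3 ordered draws can be enumerated

counts = [0] * (N + 1)
for draw in product(range(1, N + 1), repeat=3):
    counts[sorted(draw)[1]] += 1          # tally the median of each draw

# Check the closed form N^3 * P(M = m) = -6 m^2 + 6 (N+1) m - (3N + 2)
for m in range(1, N + 1):
    assert counts[m] == -6 * m * m + 6 * (N + 1) * m - (3 * N + 2)

mean_M = sum(m * counts[m] for m in range(1, N + 1)) / N ** 3
# mean_M equals (N + 1) / 2, the mean of the reward
```

Enumeration replaces the algebra: every ordered draw is counted once, so the tallies are the exact distribution of M rather than a simulation.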