CHAPTER 2
A Review of Some Basic Statistical Concepts
2.1 Variance and Covariance of Linear Combinations of Random Variables.
a. Let $Y = a + bX$; then $E(Y) = E(a + bX) = a + bE(X)$. Hence,
$$\mathrm{var}(Y) = E[Y - E(Y)]^2 = E[a + bX - a - bE(X)]^2 = E[b(X - E(X))]^2 = b^2 E[X - E(X)]^2 = b^2\,\mathrm{var}(X).$$
Only the multiplicative constant $b$ matters for the variance, not the additive constant $a$.
b. Let $Z = a + bX + cY$; then $E(Z) = a + bE(X) + cE(Y)$ and
$$\mathrm{var}(Z) = E[Z - E(Z)]^2 = E[a + bX + cY - a - bE(X) - cE(Y)]^2 = E[b(X - E(X)) + c(Y - E(Y))]^2$$
$$= b^2 E[X - E(X)]^2 + c^2 E[Y - E(Y)]^2 + 2bc\,E[X - E(X)][Y - E(Y)] = b^2\,\mathrm{var}(X) + c^2\,\mathrm{var}(Y) + 2bc\,\mathrm{cov}(X, Y).$$
c. Let $Z = a + bX + cY$ and $W = d + eX + fY$; then $E(Z) = a + bE(X) + cE(Y)$, $E(W) = d + eE(X) + fE(Y)$, and
$$\mathrm{cov}(Z, W) = E[Z - E(Z)][W - E(W)] = E[b(X - E(X)) + c(Y - E(Y))][e(X - E(X)) + f(Y - E(Y))]$$
$$= be\,\mathrm{var}(X) + cf\,\mathrm{var}(Y) + (bf + ce)\,\mathrm{cov}(X, Y).$$
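The identities in parts b and c hold for any joint distribution, so they can be verified exactly on a small finite one. A minimal Python sketch (the joint pmf and the constants a through f below are hypothetical, chosen only for the check):

```python
# Exact check of 2.1b and 2.1c on a small joint pmf (hypothetical values).
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}  # P(X=x, Y=y)

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
varX = E(lambda x, y: (x - EX) ** 2)
varY = E(lambda x, y: (y - EY) ** 2)
covXY = E(lambda x, y: (x - EX) * (y - EY))

a, b, c, d, e, f = 1.0, 2.0, -3.0, 0.5, 4.0, 1.5
EZ = E(lambda x, y: a + b * x + c * y)          # Z = a + bX + cY
EW = E(lambda x, y: d + e * x + f * y)          # W = d + eX + fY
varZ = E(lambda x, y: (a + b * x + c * y - EZ) ** 2)
covZW = E(lambda x, y: (a + b * x + c * y - EZ) * (d + e * x + f * y - EW))

# Formulas from 2.1b and 2.1c
assert abs(varZ - (b**2 * varX + c**2 * varY + 2 * b * c * covXY)) < 1e-12
assert abs(covZW - (b * e * varX + c * f * varY + (b * f + c * e) * covXY)) < 1e-12
```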
2.2 Independence and Simple Correlation.
a. Assume that X and Y are continuous random variables. The proof is similar if X and Y are discrete random variables and is left to the reader. If X and Y are independent, then $f(x, y) = f_1(x)f_2(y)$, where $f_1(x)$ is the marginal probability density function (p.d.f.) of X and $f_2(y)$ is the marginal p.d.f. of Y. In this case,
$$E(XY) = \iint xy\,f(x, y)\,dx\,dy = \iint xy\,f_1(x)f_2(y)\,dx\,dy = \left(\int x f_1(x)\,dx\right)\left(\int y f_2(y)\,dy\right) = E(X)E(Y).$$
Badi Baltagi
Hence,
$$\mathrm{cov}(X, Y) = E[X - E(X)][Y - E(Y)] = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0.$$
b. If $Y = a + bX$, then $E(Y) = a + bE(X)$ and $\mathrm{cov}(X, Y) = E[X - E(X)][Y - E(Y)] = E[X - E(X)][a + bX - a - bE(X)] = b\,\mathrm{var}(X)$, which takes the sign of $b$ since var(X) is always positive. Hence,
$$\mathrm{correl}(X, Y) = \rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\mathrm{var}(Y)}} = \frac{b\,\mathrm{var}(X)}{\sqrt{\mathrm{var}(X)\mathrm{var}(Y)}}$$
but $\mathrm{var}(Y) = b^2\,\mathrm{var}(X)$ from problem 2.1a. Hence,
$$\rho_{XY} = \frac{b\,\mathrm{var}(X)}{\sqrt{b^2(\mathrm{var}(X))^2}} = \pm 1$$
depending on the sign of $b$.
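The $\rho_{XY} = \pm 1$ result in part b can be checked exactly on any discrete distribution for X. A small sketch (the pmf and intercept below are hypothetical):

```python
# Check of 2.2b: for Y = a + bX, rho is +1 or -1 according to the sign of b.
pmf = {-1: 0.25, 0: 0.25, 2: 0.5}  # hypothetical P(X = x)

def rho(b, a=3.0):
    """Correlation between X and Y = a + bX under the pmf above."""
    EX = sum(p * x for x, p in pmf.items())
    EY = sum(p * (a + b * x) for x, p in pmf.items())
    varX = sum(p * (x - EX) ** 2 for x, p in pmf.items())
    varY = sum(p * (a + b * x - EY) ** 2 for x, p in pmf.items())
    cov = sum(p * (x - EX) * (a + b * x - EY) for x, p in pmf.items())
    return cov / (varX * varY) ** 0.5

assert abs(rho(b=2.0) - 1.0) < 1e-12   # positive slope -> rho = +1
assert abs(rho(b=-0.5) + 1.0) < 1e-12  # negative slope -> rho = -1
```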
2.3 Zero Covariance Does Not Necessarily Imply Independence.
Let X take the values −2, −1, 0, 1, 2, each with probability 1/5:

X      −2    −1    0     1     2
P(X)   1/5   1/5   1/5   1/5   1/5

$$E(X) = \sum_{X=-2}^{2} X\,P(X) = \tfrac{1}{5}[(-2) + (-1) + 0 + 1 + 2] = 0$$
$$E(X^2) = \sum_{X=-2}^{2} X^2 P(X) = \tfrac{1}{5}[4 + 1 + 0 + 1 + 4] = 2$$
and $\mathrm{var}(X) = 2$. For $Y = X^2$, $E(Y) = E(X^2) = 2$ and
$$E(X^3) = \sum_{X=-2}^{2} X^3 P(X) = \tfrac{1}{5}[(-2)^3 + (-1)^3 + 0 + 1^3 + 2^3] = 0.$$
In fact, any odd moment of X is zero. Therefore,
$$E(YX) = E(X^2\cdot X) = E(X^3) = 0$$
and
$$\mathrm{cov}(Y, X) = E(X - E(X))(Y - E(Y)) = E(X - 0)(Y - 2) = E(XY) - 2E(X) = E(XY) = E(X^3) = 0.$$
Hence,
$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\mathrm{var}(Y)}} = 0,$$
even though $Y = X^2$ is clearly dependent on X.
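The moments in problem 2.3 are easy to reproduce exactly:

```python
# Exact computation for 2.3: X uniform on {-2,-1,0,1,2}, Y = X^2.
xs = [-2, -1, 0, 1, 2]
p = 1 / 5

EX = sum(p * x for x in xs)                           # = 0
EY = sum(p * x**2 for x in xs)                        # = E(X^2) = 2
covXY = sum(p * (x - EX) * (x**2 - EY) for x in xs)   # = E(X^3) = 0

assert abs(EX) < 1e-12 and abs(EY - 2) < 1e-12
assert abs(covXY) < 1e-12
# Yet Y is a deterministic function of X, so X and Y are dependent:
# P(Y = 4 | X = 2) = 1 while unconditionally P(Y = 4) = 2/5.
```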
2.4 The Binomial Distribution.
a. $\Pr[X = 5 \text{ or } 6] = \Pr[X = 5] + \Pr[X = 6] = b(n = 20, X = 5, \theta = 0.1) + b(n = 20, X = 6, \theta = 0.1)$
$$= \binom{20}{5}(0.1)^5(0.9)^{15} + \binom{20}{6}(0.1)^6(0.9)^{14} = 0.0319 + 0.0089 = 0.0408.$$
This can be easily done with a calculator, on the computer, or using the Binomial tables; see Freund (1992).
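The calculator step can be reproduced with a few lines of Python:

```python
# Reproducing the computation in 2.4a with the binomial pmf.
from math import comb

def binom_pmf(n, k, theta):
    """P(X = k) for X ~ b(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

p = binom_pmf(20, 5, 0.1) + binom_pmf(20, 6, 0.1)
assert round(binom_pmf(20, 5, 0.1), 4) == 0.0319
assert round(binom_pmf(20, 6, 0.1), 4) == 0.0089
assert round(p, 4) == 0.0408
```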
b. $$\binom{n}{n - X} = \frac{n!}{(n - X)!\,(n - n + X)!} = \frac{n!}{(n - X)!\,X!} = \binom{n}{X}.$$
Hence,
$$b(n, n - X, 1 - \theta) = \binom{n}{n - X}(1 - \theta)^{n - X}(1 - 1 + \theta)^{n - n + X} = \binom{n}{X}(1 - \theta)^{n - X}\theta^X = b(n, X, \theta).$$
c. Using the MGF for the Binomial distribution given in problem 2.14a, we get
$$M_X(t) = [(1 - \theta) + \theta e^t]^n.$$
Differentiating with respect to t yields
$$M_X'(t) = n[(1 - \theta) + \theta e^t]^{n-1}\theta e^t.$$
Therefore, $M_X'(0) = n\theta = E(X)$.
Differentiating $M_X'(t)$ again with respect to t yields
$$M_X''(t) = n(n - 1)[(1 - \theta) + \theta e^t]^{n-2}(\theta e^t)^2 + n[(1 - \theta) + \theta e^t]^{n-1}\theta e^t.$$
Therefore, $M_X''(0) = n(n - 1)\theta^2 + n\theta = E(X^2)$. Hence
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = n\theta + n^2\theta^2 - n\theta^2 - n^2\theta^2 = n\theta(1 - \theta).$$
An alternative proof computes $E(X) = \sum_{X=0}^{n} X\,b(n, X, \theta)$ directly. This entails factorial moments, and the reader is referred to Freund (1992).
d. The likelihood function is given by
$$L(\theta) = f(X_1, \ldots, X_n; \theta) = \theta^{\sum_{i=1}^{n} X_i}(1 - \theta)^{n - \sum_{i=1}^{n} X_i}$$
so that
$$\log L(\theta) = \left(\sum_{i=1}^{n} X_i\right)\log\theta + \left(n - \sum_{i=1}^{n} X_i\right)\log(1 - \theta)$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = \frac{\sum_{i=1}^{n} X_i}{\theta} - \frac{n - \sum_{i=1}^{n} X_i}{1 - \theta} = 0.$$
Solving for $\theta$, one gets
$$\sum_{i=1}^{n} X_i - \theta\sum_{i=1}^{n} X_i - \theta n + \theta\sum_{i=1}^{n} X_i = 0$$
so that $\hat{\theta}_{mle} = \sum_{i=1}^{n} X_i/n = \bar{X}$.
e. $E(\bar{X}) = \sum_{i=1}^{n} E(X_i)/n = n\theta/n = \theta$. Hence, $\bar{X}$ is unbiased for $\theta$. Also, $\mathrm{var}(\bar{X}) = \mathrm{var}(X_i)/n = \theta(1 - \theta)/n$, which goes to zero as $n \to \infty$. Hence, the sufficient condition for $\bar{X}$ to be consistent for $\theta$ is satisfied.
f. The joint probability function in part d can be written as
$$f(X_1, \ldots, X_n; \theta) = \theta^{n\bar{X}}(1 - \theta)^{n - n\bar{X}} = h(\bar{X}, \theta)\,g(X_1, \ldots, X_n)$$
where $h(\bar{X}, \theta) = \theta^{n\bar{X}}(1 - \theta)^{n - n\bar{X}}$ and $g(X_1, \ldots, X_n) = 1$ for all $X_i$'s. The latter function is independent of $\theta$ in form and domain. Hence, by the factorization theorem, $\bar{X}$ is sufficient for $\theta$.
g. $\bar{X}$ was shown to be MVU for $\theta$ for the Bernoulli case in Example 2 in the text.
h. From part (d), $L(0.2) = (0.2)^{\sum_{i=1}^{n} X_i}(0.8)^{n - \sum_{i=1}^{n} X_i}$ while $L(0.6) = (0.6)^{\sum_{i=1}^{n} X_i}(0.4)^{n - \sum_{i=1}^{n} X_i}$, with the likelihood ratio
$$\frac{L(0.2)}{L(0.6)} = \left(\frac{1}{3}\right)^{\sum_{i=1}^{n} X_i} 2^{\,n - \sum_{i=1}^{n} X_i}.$$
The uniformly most powerful critical region C of size $\alpha \le 0.05$ is given by
$$\left(\frac{1}{3}\right)^{\sum_{i=1}^{n} X_i} 2^{\,n - \sum_{i=1}^{n} X_i} \le k \quad \text{inside } C.$$
Taking logarithms of both sides,
$$-\sum_{i=1}^{n} X_i(\log 3) + \left(n - \sum_{i=1}^{n} X_i\right)\log 2 \le \log k,$$
so that
$$-\left(\sum_{i=1}^{n} X_i\right)\log 6 \le K' \quad \text{or} \quad \sum_{i=1}^{n} X_i \ge K,$$
where K is determined by making the size of $C = \alpha \le 0.05$. In this case, $\sum_{i=1}^{n} X_i \sim b(n, \theta)$ and under $H_0$, $\theta = 0.2$. Therefore, $\sum_{i=1}^{n} X_i \sim b(n = 20, \theta = 0.2)$. Hence, $\alpha = \Pr[b(n = 20, \theta = 0.2) \ge K] \le 0.05$.
From the Binomial tables for $n = 20$ and $\theta = 0.2$, $K = 7$ gives $\Pr[b(n = 20, \theta = 0.2) > 7] = 0.0322$. Hence, $\sum_{i=1}^{n} X_i > 7$ (equivalently, $\sum_{i=1}^{n} X_i \ge 8$) is our required critical region.
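The table lookup can be checked directly by scanning binomial tail probabilities for the smallest cutoff of size at most 0.05:

```python
# Finding the critical value for a size-0.05 test of H0: theta = 0.2, n = 20.
from math import comb

def binom_tail(n, theta, K):
    """P(X >= K) for X ~ b(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k)
               for k in range(K, n + 1))

K = next(k for k in range(21) if binom_tail(20, 0.2, k) <= 0.05)
assert K == 8                                   # reject when sum X_i > 7
assert abs(binom_tail(20, 0.2, 8) - 0.0322) < 0.0005
assert binom_tail(20, 0.2, 7) > 0.05            # cutting at 7 is too large a size
```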
i. The likelihood ratio test is
$$\frac{L(0.2)}{L(\hat{\theta}_{mle})} = \frac{(0.2)^{\sum_{i=1}^{n} X_i}(0.8)^{n - \sum_{i=1}^{n} X_i}}{\bar{X}^{\sum_{i=1}^{n} X_i}\left(1 - \bar{X}\right)^{n - \sum_{i=1}^{n} X_i}}$$
so that
$$LR = -2\log L(0.2) + 2\log L(\hat{\theta}_{mle}) = -2\left[\sum_{i=1}^{n} X_i(\log 0.2 - \log\bar{X})\right] - 2\left[\left(n - \sum_{i=1}^{n} X_i\right)(\log 0.8 - \log(1 - \bar{X}))\right].$$
This is given in Example 5 in the text for a general $\theta_0$. The Wald statistic is given by
$$W = \frac{(\bar{X} - 0.2)^2}{\bar{X}(1 - \bar{X})/n}$$
and the LM statistic is given by
$$LM = \frac{(\bar{X} - 0.2)^2}{(0.2)(0.8)/n}.$$
Although the three statistics LR, LM, and W look different, they are all based on $|\bar{X} - 0.2| \ge k$, and for a finite n, the same exact critical value could be obtained from the binomial distribution.
2.5 d. The Wald, LR, and LM Inequality. This is based on Baltagi (1995). The likelihood is given by equation (2.1) in the text:
$$L(\mu, \sigma^2) = (1/2\pi\sigma^2)^{n/2} e^{-(1/2\sigma^2)\sum_{i=1}^{n}(X_i - \mu)^2}. \qquad (1)$$
It is easy to show that the score is given by
$$S(\mu, \sigma^2) = \begin{pmatrix} \dfrac{n(\bar{X} - \mu)}{\sigma^2} \\[2ex] \dfrac{\sum_{i=1}^{n}(X_i - \mu)^2 - n\sigma^2}{2\sigma^4} \end{pmatrix}, \qquad (2)$$
and setting $S(\mu, \sigma^2) = 0$ yields $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/n$. Under $H_0$, $\tilde{\mu} = \mu_0$ and
$$\tilde{\sigma}^2 = \sum_{i=1}^{n}(X_i - \mu_0)^2/n.$$
Therefore,
$$\log L(\tilde{\mu}, \tilde{\sigma}^2) = -\frac{n}{2}\log\tilde{\sigma}^2 - \frac{n}{2}\log 2\pi - \frac{n}{2} \qquad (3)$$
and
$$\log L(\hat{\mu}, \hat{\sigma}^2) = -\frac{n}{2}\log\hat{\sigma}^2 - \frac{n}{2}\log 2\pi - \frac{n}{2}. \qquad (4)$$
Hence,
$$LR = n\log\left[\frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]. \qquad (5)$$
It is also known that the information matrix is given by
$$I\begin{pmatrix}\mu \\ \sigma^2\end{pmatrix} = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{bmatrix}. \qquad (6)$$
Therefore,
$$W = (\hat{\mu} - \mu_0)^2\,\hat{I}_{11} = \frac{n^2(\bar{X} - \mu_0)^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad (7)$$
where $\hat{I}_{11}$ denotes the (1,1) element of the information matrix evaluated at the unrestricted maximum likelihood estimates. It is easy to show from (1) that
$$\log L(\tilde{\mu}, \hat{\sigma}^2) = -\frac{n}{2}\log\hat{\sigma}^2 - \frac{n}{2}\log 2\pi - \frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{2\hat{\sigma}^2}. \qquad (8)$$
Hence, using (4) and (8), one gets
$$-2\log\left[L(\tilde{\mu}, \hat{\sigma}^2)/L(\hat{\mu}, \hat{\sigma}^2)\right] = \frac{\sum_{i=1}^{n}(X_i - \mu_0)^2 - n\hat{\sigma}^2}{\hat{\sigma}^2} = W, \qquad (9)$$
and the last equality follows from (7). Similarly,
$$LM = S^2(\tilde{\mu}, \tilde{\sigma}^2)\,\tilde{I}^{11} = \frac{n^2(\bar{X} - \mu_0)^2}{\sum_{i=1}^{n}(X_i - \mu_0)^2}, \qquad (10)$$
where $\tilde{I}^{11}$ denotes the (1,1) element of the inverse of the information matrix evaluated at the restricted maximum likelihood estimates. From (1), we also get
$$\log L(\hat{\mu}, \tilde{\sigma}^2) = -\frac{n}{2}\log\tilde{\sigma}^2 - \frac{n}{2}\log 2\pi - \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{2\tilde{\sigma}^2}. \qquad (11)$$
Hence, using (3) and (11), one gets
$$-2\log\left[L(\tilde{\mu}, \tilde{\sigma}^2)/L(\hat{\mu}, \tilde{\sigma}^2)\right] = n - \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\tilde{\sigma}^2} = LM, \qquad (12)$$
where the last equality follows from (10). $L(\tilde{\mu}, \tilde{\sigma}^2)$ is the restricted maximum; therefore, $\log L(\tilde{\mu}, \hat{\sigma}^2) \le \log L(\tilde{\mu}, \tilde{\sigma}^2)$, from which we deduce that $W \ge LR$. Also, $L(\hat{\mu}, \hat{\sigma}^2)$ is the unrestricted maximum; therefore, $\log L(\hat{\mu}, \hat{\sigma}^2) \ge \log L(\hat{\mu}, \tilde{\sigma}^2)$, from which we deduce that $LR \ge LM$.
An alternative derivation of this inequality shows first that
$$\frac{LM}{n} = \frac{W/n}{1 + (W/n)} \quad \text{and} \quad \frac{LR}{n} = \log\left(1 + \frac{W}{n}\right).$$
Then one uses the fact that $y \ge \log(1 + y) \ge y/(1 + y)$ for $y = W/n$.
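The inequality and the two algebraic identities above can be illustrated numerically. A short sketch (the sample and the null value are hypothetical):

```python
# Numerical illustration of W >= LR >= LM for the normal mean test.
from math import log

X = [1.2, 0.4, 2.3, 1.8, 0.9, 1.5, 2.1, 0.7]   # hypothetical sample
mu0 = 1.0                                       # H0: mu = mu0
n = len(X)
xbar = sum(X) / n
s_hat = sum((x - xbar) ** 2 for x in X)         # n * sigma_hat^2 (unrestricted)
s_til = sum((x - mu0) ** 2 for x in X)          # n * sigma_tilde^2 (restricted)

W = n**2 * (xbar - mu0) ** 2 / s_hat            # equation (7)
LR = n * log(s_til / s_hat)                     # equation (5)
LM = n**2 * (xbar - mu0) ** 2 / s_til           # equation (10)

assert W >= LR >= LM
# The alternative derivation: LM/n = (W/n)/(1 + W/n) and LR/n = log(1 + W/n)
assert abs(LM / n - (W / n) / (1 + W / n)) < 1e-12
assert abs(LR / n - log(1 + W / n)) < 1e-12
```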
2.6 Poisson Distribution.
a. Using the MGF for the Poisson derived in problem 2.14c, one gets
$$M_X(t) = e^{\lambda(e^t - 1)}.$$
Differentiating with respect to t yields
$$M_X'(t) = e^{\lambda(e^t - 1)}\lambda e^t.$$
Evaluating $M_X'(t)$ at $t = 0$, we get $M_X'(0) = E(X) = \lambda$.
Similarly, differentiating $M_X'(t)$ once more with respect to t, we get
$$M_X''(t) = e^{\lambda(e^t - 1)}(\lambda e^t)^2 + e^{\lambda(e^t - 1)}\lambda e^t.$$
Evaluating it at $t = 0$ gives $M_X''(0) = \lambda^2 + \lambda = E(X^2)$, so that
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$
Hence, the mean and variance of the Poisson are both equal to $\lambda$.
b. The likelihood function is
$$L(\lambda) = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} X_i}}{X_1!\,X_2!\cdots X_n!}$$
so that
$$\log L(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} X_i\right)\log\lambda - \sum_{i=1}^{n}\log X_i!$$
$$\frac{\partial\log L(\lambda)}{\partial\lambda} = -n + \frac{\sum_{i=1}^{n} X_i}{\lambda} = 0.$$
Solving for $\lambda$ yields $\hat{\lambda}_{mle} = \bar{X}$.
c. The method of moments equates E(X) to $\bar{X}$, and since $E(X) = \lambda$ the solution is $\hat{\lambda} = \bar{X}$, the same as the ML method.
d. $E(\bar{X}) = \sum_{i=1}^{n} E(X_i)/n = n\lambda/n = \lambda$. Therefore, $\bar{X}$ is unbiased for $\lambda$. Also, $\mathrm{var}(\bar{X}) = \mathrm{var}(X_i)/n = \lambda/n$, which tends to zero as $n \to \infty$. Therefore, the sufficient condition for $\bar{X}$ to be consistent for $\lambda$ is satisfied.
e. The joint probability function can be written as
$$f(X_1, \ldots, X_n; \lambda) = e^{-n\lambda}\lambda^{n\bar{X}}\frac{1}{X_1!\cdots X_n!} = h(\bar{X}, \lambda)\,g(X_1, \ldots, X_n)$$
where $h(\bar{X}, \lambda) = e^{-n\lambda}\lambda^{n\bar{X}}$ and $g(X_1, \ldots, X_n) = \frac{1}{X_1!\cdots X_n!}$. The latter is independent of $\lambda$ in form and domain. Therefore, $\bar{X}$ is a sufficient statistic for $\lambda$.
f. $\log f(X; \lambda) = -\lambda + X\log\lambda - \log X!$
and
$$\frac{\partial\log f(X; \lambda)}{\partial\lambda} = -1 + \frac{X}{\lambda}, \qquad \frac{\partial^2\log f(X; \lambda)}{\partial\lambda^2} = -\frac{X}{\lambda^2}.$$
The Cramér-Rao lower bound for any unbiased estimator $\hat{\lambda}$ of $\lambda$ is given by
$$\mathrm{var}(\hat{\lambda}) \ge \frac{-1}{nE\left[\frac{\partial^2\log f(X;\lambda)}{\partial\lambda^2}\right]} = \frac{\lambda^2}{n\,E(X)} = \frac{\lambda}{n}.$$
But $\mathrm{var}(\bar{X}) = \lambda/n$; see part (d). Hence, $\bar{X}$ attains the Cramér-Rao lower bound.
g. The likelihood ratio is given by
$$\frac{L(2)}{L(4)} = \frac{e^{-2n}\,2^{\sum_{i=1}^{n} X_i}}{e^{-4n}\,4^{\sum_{i=1}^{n} X_i}}.$$
The uniformly most powerful critical region C of size $\alpha \le 0.05$ is given by
$$e^{2n}\left(\frac{1}{2}\right)^{\sum_{i=1}^{n} X_i} \le k \quad \text{inside } C.$$
Taking logarithms of both sides and rearranging terms, we get
$$-\left(\sum_{i=1}^{n} X_i\right)\log 2 \le K' \quad \text{or} \quad \sum_{i=1}^{n} X_i \ge K,$$
where K is determined by making the size of $C = \alpha \le 0.05$. In this case, $\sum_{i=1}^{n} X_i \sim \mathrm{Poisson}(n\lambda)$ and under $H_0$, $\lambda = 2$, so that with $n = 9$, $\sum_{i=1}^{n} X_i \sim \mathrm{Poisson}(n\lambda = 18)$. Hence, $\alpha = \Pr[\mathrm{Poisson}(18) \ge K] \le 0.05$.
From the Poisson tables, for $\lambda = 18$, $K = 26$ gives $\Pr[\mathrm{Poisson}(18) \ge 26] = 0.0446$. Hence, $\sum_{i=1}^{n} X_i \ge 26$ is our required critical region.
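The Poisson table lookup can be checked without tables by accumulating the pmf:

```python
# Checking the critical value in 2.6g: smallest K with P[Poisson(18) >= K] <= 0.05.
from math import exp

def poisson_tail(lam, K):
    """P(X >= K) for X ~ Poisson(lam), via the complement of the cdf."""
    term, cdf = exp(-lam), exp(-lam)
    for k in range(1, K):
        term *= lam / k          # recurrence p_k = p_{k-1} * lam / k
        cdf += term
    return 1 - cdf

assert poisson_tail(18, 26) <= 0.05 < poisson_tail(18, 25)  # K = 26 is the cutoff
assert abs(poisson_tail(18, 26) - 0.0446) < 0.004           # matches the table value
```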
h. The likelihood ratio test is
$$\frac{L(2)}{L(\bar{X})} = \frac{e^{-2n}\,2^{\sum_{i=1}^{n} X_i}}{e^{-n\bar{X}}\,\bar{X}^{\sum_{i=1}^{n} X_i}} = e^{n(\bar{X} - 2)}\left(\frac{2}{\bar{X}}\right)^{\sum_{i=1}^{n} X_i}$$
so that
$$LR = -2\log L(2) + 2\log L(\bar{X}) = -2n(\bar{X} - 2) - 2\sum_{i=1}^{n} X_i\left(\log 2 - \log\bar{X}\right).$$
In this case,
$$C(\lambda) = \left|\frac{\partial^2\log L(\lambda)}{\partial\lambda^2}\right| = \left|\frac{-\sum_{i=1}^{n} X_i}{\lambda^2}\right|$$
and
$$I(\lambda) = -E\left[\frac{\partial^2\log L(\lambda)}{\partial\lambda^2}\right] = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}.$$
The Wald statistic is based upon
$$W = (\bar{X} - 2)^2\,I(\hat{\lambda}_{mle}) = (\bar{X} - 2)^2\,\frac{n}{\bar{X}}$$
using the fact that $\hat{\lambda}_{mle} = \bar{X}$. The LM statistic is based upon
$$LM = S^2(2)\,I^{-1}(2) = \frac{n^2(\bar{X} - 2)^2}{4}\cdot\frac{2}{n} = \frac{n(\bar{X} - 2)^2}{2}.$$
Note that all three test statistics are based upon $|\bar{X} - 2| \ge K$, and for finite n, the same exact critical value could be obtained using the fact that $\sum_{i=1}^{n} X_i$ has a Poisson distribution; see part (g).
2.7 The Geometric Distribution.
a. Using the MGF for the Geometric distribution derived in problem 2.14d, one gets
$$M_X(t) = \frac{\theta e^t}{1 - (1 - \theta)e^t}.$$
Differentiating it with respect to t yields
$$M_X'(t) = \frac{\theta e^t[1 - (1 - \theta)e^t] + (1 - \theta)e^t\theta e^t}{[1 - (1 - \theta)e^t]^2} = \frac{\theta e^t}{[1 - (1 - \theta)e^t]^2}.$$
Evaluating $M_X'(t)$ at $t = 0$, we get
$$M_X'(0) = E(X) = \frac{\theta}{\theta^2} = \frac{1}{\theta}.$$
Similarly, differentiating $M_X'(t)$ once more with respect to t, we get
$$M_X''(t) = \frac{\theta e^t[1 - (1 - \theta)e^t]^2 + 2[1 - (1 - \theta)e^t](1 - \theta)e^t\theta e^t}{[1 - (1 - \theta)e^t]^4}.$$
Evaluating $M_X''(t)$ at $t = 0$, we get
$$M_X''(0) = E(X^2) = \frac{\theta^3 + 2\theta^2(1 - \theta)}{\theta^4} = \frac{2\theta^2 - \theta^3}{\theta^4} = \frac{2 - \theta}{\theta^2}$$
so that
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = \frac{2 - \theta}{\theta^2} - \frac{1}{\theta^2} = \frac{1 - \theta}{\theta^2}.$$
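The mean and variance just derived can be confirmed by summing the geometric series directly (truncated far enough that the remainder is negligible; the value of $\theta$ below is an arbitrary illustration):

```python
# Verifying E(X) = 1/theta and var(X) = (1-theta)/theta^2 by direct summation.
theta = 0.3
terms = [(x, theta * (1 - theta) ** (x - 1)) for x in range(1, 2000)]

EX = sum(x * p for x, p in terms)
EX2 = sum(x * x * p for x, p in terms)

assert abs(EX - 1 / theta) < 1e-9                        # E(X) = 1/theta
assert abs(EX2 - (2 - theta) / theta**2) < 1e-9          # E(X^2) = (2-theta)/theta^2
assert abs(EX2 - EX**2 - (1 - theta) / theta**2) < 1e-9  # var = (1-theta)/theta^2
```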
b. The likelihood function is given by
$$L(\theta) = \theta^n(1 - \theta)^{\sum_{i=1}^{n} X_i - n}$$
so that
$$\log L(\theta) = n\log\theta + \left(\sum_{i=1}^{n} X_i - n\right)\log(1 - \theta)$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = \frac{n}{\theta} - \frac{\sum_{i=1}^{n} X_i - n}{1 - \theta} = 0.$$
Solving for $\theta$, one gets
$$n(1 - \theta) - \theta\sum_{i=1}^{n} X_i + n\theta = 0 \quad \text{or} \quad n = \theta\sum_{i=1}^{n} X_i,$$
which yields
$$\hat{\theta}_{mle} = n\Big/\sum_{i=1}^{n} X_i = \frac{1}{\bar{X}}.$$
The method of moments estimator equates $E(X) = \bar{X}$, so that $1/\hat{\theta} = \bar{X}$ or $\hat{\theta} = 1/\bar{X}$, which is the same as the MLE.
2.8 The Uniform Density.
a. $E(X) = \int_0^1 x\,dx = \frac{1}{2}[x^2]_0^1 = \frac{1}{2}$ and $E(X^2) = \int_0^1 x^2\,dx = \frac{1}{3}[x^3]_0^1 = \frac{1}{3}$, so that
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = \frac{1}{3} - \frac{1}{4} = \frac{4 - 3}{12} = \frac{1}{12}.$$
b. $\Pr[0.1 < X < 0.3] = \int_{0.1}^{0.3} dx = 0.3 - 0.1 = 0.2$. It does not matter if we include the equality signs, $\Pr[0.1 \le X \le 0.3]$, since this is a continuous random variable. Note that this integral is the area of the rectangle with base running from X = 0.1 to X = 0.3 and height equal to 1, which is just the length of the base, i.e., $0.3 - 0.1 = 0.2$.
2.9 The Exponential Distribution.
a. Using the MGF for the exponential distribution derived in problem 2.14e, we get
$$M_X(t) = \frac{1}{1 - \theta t}.$$
Differentiating with respect to t yields
$$M_X'(t) = \frac{\theta}{(1 - \theta t)^2}.$$
Therefore, $M_X'(0) = \theta = E(X)$.
Differentiating $M_X'(t)$ with respect to t yields
$$M_X''(t) = \frac{2\theta^2(1 - \theta t)}{(1 - \theta t)^4} = \frac{2\theta^2}{(1 - \theta t)^3}.$$
Therefore, $M_X''(0) = 2\theta^2 = E(X^2)$. Hence
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = 2\theta^2 - \theta^2 = \theta^2.$$
b. The likelihood function is given by
$$L(\theta) = \left(\frac{1}{\theta}\right)^n e^{-\sum_{i=1}^{n} X_i/\theta}$$
so that
$$\log L(\theta) = -n\log\theta - \frac{\sum_{i=1}^{n} X_i}{\theta}$$
$$\frac{\partial\log L(\theta)}{\partial\theta} = -\frac{n}{\theta} + \frac{\sum_{i=1}^{n} X_i}{\theta^2} = 0.$$
Solving for $\theta$, one gets $\sum_{i=1}^{n} X_i - n\theta = 0$, so that $\hat{\theta}_{mle} = \bar{X}$.
c. The method of moments equates $E(X) = \bar{X}$. In this case, $E(X) = \theta$, hence $\hat{\theta} = \bar{X}$ is the same as the MLE.
d. $E(\bar{X}) = \sum_{i=1}^{n} E(X_i)/n = n\theta/n = \theta$. Hence, $\bar{X}$ is unbiased for $\theta$. Also, $\mathrm{var}(\bar{X}) = \mathrm{var}(X_i)/n = \theta^2/n$, which goes to zero as $n \to \infty$. Hence, the sufficient condition for $\bar{X}$ to be consistent for $\theta$ is satisfied.
e. The joint p.d.f. is given by
$$f(X_1, \ldots, X_n; \theta) = \left(\frac{1}{\theta}\right)^n e^{-\sum_{i=1}^{n} X_i/\theta} = e^{-n\bar{X}/\theta}\left(\frac{1}{\theta}\right)^n = h(\bar{X}; \theta)\,g(X_1, \ldots, X_n)$$
where $h(\bar{X}; \theta) = e^{-n\bar{X}/\theta}(1/\theta)^n$ and $g(X_1, \ldots, X_n) = 1$, independent of $\theta$ in form and domain. Hence, by the factorization theorem, $\bar{X}$ is a sufficient statistic for $\theta$.
f. $\log f(X; \theta) = -\log\theta - X/\theta$
and
$$\frac{\partial\log f(X; \theta)}{\partial\theta} = -\frac{1}{\theta} + \frac{X}{\theta^2} = \frac{X - \theta}{\theta^2}$$
$$\frac{\partial^2\log f(X; \theta)}{\partial\theta^2} = \frac{1}{\theta^2} - \frac{2X\theta}{\theta^4} = \frac{\theta - 2X}{\theta^3}.$$
The Cramér-Rao lower bound for any unbiased estimator $\hat{\theta}$ of $\theta$ is given by
$$\mathrm{var}(\hat{\theta}) \ge \frac{-1}{nE\left[\frac{\partial^2\log f(X;\theta)}{\partial\theta^2}\right]} = \frac{-\theta^3}{n\,E(\theta - 2X)} = \frac{\theta^2}{n}.$$
But $\mathrm{var}(\bar{X}) = \theta^2/n$; see part (d). Hence, $\bar{X}$ attains the Cramér-Rao lower bound.
g. The likelihood ratio is given by
$$\frac{L(1)}{L(2)} = \frac{e^{-\sum_{i=1}^{n} X_i}}{2^{-n}e^{-\sum_{i=1}^{n} X_i/2}} = 2^n e^{-\sum_{i=1}^{n} X_i/2}.$$
The uniformly most powerful critical region C of size $\alpha \le 0.05$ is given by
$$2^n e^{-\sum_{i=1}^{n} X_i/2} \le k \quad \text{inside } C.$$
Taking logarithms of both sides and rearranging terms, we get
$$n\log 2 - \sum_{i=1}^{n} X_i/2 \le K' \quad \text{or} \quad \sum_{i=1}^{n} X_i \ge K,$$
where K is determined by making the size of $C = \alpha \le 0.05$. In this case, $\sum_{i=1}^{n} X_i$ is distributed as a Gamma p.d.f. with $\beta = \theta$ and $\alpha = n$. Under $H_0$, $\theta = 1$. Therefore,
$$\sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(\alpha = n, \beta = 1).$$
Hence, $\Pr[\mathrm{Gamma}(\alpha = n, \beta = 1) \ge K] \le 0.05$, and K should be determined from the integral
$$\int_K^\infty \frac{1}{\Gamma(n)}x^{n-1}e^{-x}\,dx = 0.05 \quad \text{for } n = 20.$$
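The integral condition can be solved numerically without quadrature: for integer shape n, the Gamma(n, 1) upper tail equals a Poisson cdf, $\Pr[\mathrm{Gamma}(n,1) > K] = \Pr[\mathrm{Poisson}(K) \le n - 1]$, so K can be found by bisection. A sketch; the cross-check value uses the standard table entry $\chi^2_{0.05,40} = 55.76$ together with $2\sum X_i \sim \chi^2_{2n}$:

```python
# Solving P[Gamma(n=20, beta=1) > K] = 0.05 by bisection.
from math import exp

def gamma_tail(n, K):
    """P(X > K) for X ~ Gamma(shape=n, scale=1) with integer n (Erlang tail)."""
    term, total = exp(-K), exp(-K)
    for k in range(1, n):
        term *= K / k            # Poisson(K) pmf recurrence
        total += term
    return total

n = 20
lo, hi = 0.0, 100.0
for _ in range(60):              # bisect: gamma_tail is decreasing in K
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gamma_tail(n, mid) > 0.05 else (lo, mid)
K = (lo + hi) / 2

# Cross-check: 2*sum(X_i) ~ chi2 with 2n = 40 df, and chi2_{0.05,40} = 55.76
assert abs(K - 55.76 / 2) < 0.05
```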
h. The likelihood ratio test is
$$\frac{L(1)}{L(\bar{X})} = \frac{e^{-\sum_{i=1}^{n} X_i}}{(1/\bar{X})^n e^{-n}} = \bar{X}^n e^{\,n - \sum_{i=1}^{n} X_i}$$
so that
$$LR = -2\log L(1) + 2\log L(\bar{X}) = 2\sum_{i=1}^{n} X_i - 2n\log\bar{X} - 2n.$$
In this case,
$$C(\theta) = \left|\frac{\partial^2\log L(\theta)}{\partial\theta^2}\right| = \left|\frac{n}{\theta^2} - \frac{2\sum_{i=1}^{n} X_i}{\theta^3}\right| = \left|\frac{n\theta - 2\sum_{i=1}^{n} X_i}{\theta^3}\right|$$
and
$$I(\theta) = -E\left[\frac{\partial^2\log L(\theta)}{\partial\theta^2}\right] = \frac{n}{\theta^2}.$$
The Wald statistic is based upon
$$W = (\bar{X} - 1)^2\,I(\hat{\theta}_{mle}) = (\bar{X} - 1)^2\,\frac{n}{\bar{X}^2}$$
using the fact that $\hat{\theta}_{mle} = \bar{X}$. The LM statistic is based upon
$$LM = S^2(1)\,I^{-1}(1) = \left(\sum_{i=1}^{n} X_i - n\right)^2\frac{1}{n} = n(\bar{X} - 1)^2.$$
All three test statistics are based upon $|\bar{X} - 1| \ge k$ and, for finite n, the same exact critical value could be obtained using the fact that $\sum_{i=1}^{n} X_i$ is Gamma($\alpha = n$, $\beta = 1$) under $H_0$; see part (g).
2.10 The Gamma Distribution.
a. Using the MGF for the Gamma distribution derived in problem 2.14f, we get
$$M_X(t) = (1 - \beta t)^{-\alpha}.$$
Differentiating with respect to t yields
$$M_X'(t) = -\alpha(1 - \beta t)^{-\alpha - 1}(-\beta) = \alpha\beta(1 - \beta t)^{-\alpha - 1}.$$
Therefore, $M_X'(0) = \alpha\beta = E(X)$.
Differentiating $M_X'(t)$ with respect to t yields
$$M_X''(t) = -\alpha\beta(\alpha + 1)(1 - \beta t)^{-\alpha - 2}(-\beta) = \alpha\beta^2(\alpha + 1)(1 - \beta t)^{-\alpha - 2}.$$
Therefore, $M_X''(0) = \alpha^2\beta^2 + \alpha\beta^2 = E(X^2)$. Hence
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = \alpha\beta^2.$$
b. The method of moments equates
$$E(X) = \bar{X} = \alpha\beta \quad \text{and} \quad E(X^2) = \sum_{i=1}^{n} X_i^2/n = \alpha^2\beta^2 + \alpha\beta^2.$$
These are two non-linear equations in two unknowns. Substituting $\alpha = \bar{X}/\beta$ into the second equation, one gets
$$\sum_{i=1}^{n} X_i^2/n = \bar{X}^2 + \bar{X}\beta.$$
Hence,
$$\hat{\beta} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n\bar{X}} \quad \text{and} \quad \hat{\alpha} = \frac{n\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
c. For $\alpha = 1$ and $\beta = \theta$, we get
$$f(X; \alpha = 1, \beta = \theta) = \frac{1}{\Gamma(1)\theta}X^{1-1}e^{-X/\theta} = \frac{1}{\theta}e^{-X/\theta} \quad \text{for } X > 0 \text{ and } \theta > 0,$$
which is the exponential p.d.f.
d. For $\alpha = r/2$ and $\beta = 2$, the Gamma($\alpha = r/2$, $\beta = 2$) is $\chi^2_r$. Hence, from part (a), we get
$$E(X) = \alpha\beta = (r/2)(2) = r \quad \text{and} \quad \mathrm{var}(X) = \alpha\beta^2 = (r/2)(4) = 2r.$$
The expected value of a $\chi^2_r$ is r and its variance is 2r.
e. The joint p.d.f. for $\alpha = r/2$ and $\beta = 2$ is given by
$$f(X_1, \ldots, X_n; \alpha = r/2, \beta = 2) = \left(\frac{1}{\Gamma(r/2)2^{r/2}}\right)^n (X_1\cdots X_n)^{\frac{r}{2} - 1}e^{-\sum_{i=1}^{n} X_i/2} = h(X_1, \ldots, X_n; r)\,g(X_1, \ldots, X_n)$$
where
$$h(X_1, \ldots, X_n; r) = \left(\frac{1}{\Gamma(r/2)2^{r/2}}\right)^n (X_1\cdots X_n)^{\frac{r}{2} - 1}$$
and $g(X_1, \ldots, X_n) = e^{-\sum_{i=1}^{n} X_i/2}$, independent of r in form and domain. Hence, by the factorization theorem, $(X_1\cdots X_n)$ is a sufficient statistic for r.
f. Let $X_1, \ldots, X_m$ denote independent N(0, 1) random variables. Then $X_1^2, \ldots, X_m^2$ will be independent $\chi^2_1$ random variables and $Y = \sum_{i=1}^{m} X_i^2$ will be $\chi^2_m$. The sum of m independent $\chi^2_1$ random variables is a $\chi^2_m$ random variable.
2.12 The t-distribution with r Degrees of Freedom.
a. If $X_1, \ldots, X_n$ are IIN($\mu, \sigma^2$), then $\bar{X} \sim N(\mu, \sigma^2/n)$ and $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is N(0, 1).
b. $(n - 1)s^2/\sigma^2 \sim \chi^2_{n-1}$. Dividing our N(0, 1) random variable z from part (a) by the square root of our $\chi^2_{n-1}$ random variable divided by its degrees of freedom, we get
$$t = \frac{(\bar{X} - \mu)\big/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)s^2}{\sigma^2}\Big/(n - 1)}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}.$$
Using the fact that $\bar{X}$ is independent of $s^2$, this has a t-distribution with (n − 1) degrees of freedom.
c. The 95% confidence interval for $\mu$ would be based on the t-distribution derived in part (b) with $n - 1 = 15$ degrees of freedom:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{20 - \mu}{2/\sqrt{16}} = \frac{20 - \mu}{1/2} = 40 - 2\mu$$
$$\Pr[-t_{\alpha/2} < t < t_{\alpha/2}] = 1 - \alpha = 0.95.$$
From the t-tables with 15 degrees of freedom, $t_{0.025} = 2.131$. Hence,
$$\Pr[-2.131 < 40 - 2\mu < 2.131] = 0.95.$$
Rearranging terms, one gets
$$\Pr[37.869/2 < \mu < 42.131/2] = 0.95 \quad \text{or} \quad \Pr[18.9345 < \mu < 21.0655] = 0.95.$$
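The interval arithmetic is easy to reproduce, keeping the tabulated t-value as an input:

```python
# The 95% confidence interval computation in 2.12c.
t975 = 2.131                    # t_{0.025} with 15 degrees of freedom (from tables)
xbar, s, n = 20.0, 2.0, 16
half = t975 * s / n**0.5        # half-width = 2.131 * 2 / 4 = 1.0655

lower, upper = xbar - half, xbar + half
assert (round(lower, 4), round(upper, 4)) == (18.9345, 21.0655)
```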
2.13 The F-distribution.
$$(n_1 - 1)s_1^2/\sigma_1^2 = 24(15.6)/\sigma_1^2 \sim \chi^2_{24}$$
and also
$$(n_2 - 1)s_2^2/\sigma_2^2 = 30(18.9)/\sigma_2^2 \sim \chi^2_{30}.$$
Therefore, under $H_0$: $\sigma_1^2 = \sigma_2^2$,
$$F = s_2^2/s_1^2 = \frac{18.9}{15.6} = 1.2115 \sim F_{30,24}.$$
Using the F-tables with 30 and 24 degrees of freedom, we find $F_{.05,30,24} = 1.94$. Since the observed F-statistic 1.2115 is less than 1.94, we do not reject $H_0$ that the variances of the two shifts are the same.
2.14 Moment Generating Function (MGF).
a. For the Binomial Distribution,
$$M_X(t) = E(e^{Xt}) = \sum_{X=0}^{n}\binom{n}{X}e^{Xt}\theta^X(1 - \theta)^{n-X} = \sum_{X=0}^{n}\binom{n}{X}(\theta e^t)^X(1 - \theta)^{n-X} = [(1 - \theta) + \theta e^t]^n$$
where the last equality uses the binomial expansion
$$(a + b)^n = \sum_{X=0}^{n}\binom{n}{X}a^X b^{n-X}$$
with $a = \theta e^t$ and $b = (1 - \theta)$. This is the fundamental relationship underlying the binomial probability function and what makes it a proper probability function.
b. For the Normal Distribution,
$$M_X(t) = E(e^{Xt}) = \int_{-\infty}^{+\infty}e^{Xt}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(X - \mu)^2}dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{1}{2\sigma^2}\{X^2 - 2\mu X + \mu^2 - 2\sigma^2 tX\}}dx$$
$$= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{1}{2\sigma^2}\{X^2 - 2(\mu + t\sigma^2)X + \mu^2\}}dx.$$
Completing the square,
$$M_X(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{1}{2\sigma^2}\{[x - (\mu + t\sigma^2)]^2 - (\mu + t\sigma^2)^2 + \mu^2\}}dx = e^{-\frac{1}{2\sigma^2}[\mu^2 - \mu^2 - 2\mu t\sigma^2 - t^2\sigma^4]}.$$
The remaining integral integrates to 1 using the fact that the Normal density is proper and integrates to one. Hence, $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$ after some cancellations.
c. For the Poisson Distribution,
$$M_X(t) = E(e^{Xt}) = \sum_{X=0}^{\infty}e^{Xt}\frac{e^{-\lambda}\lambda^X}{X!} = \sum_{X=0}^{\infty}\frac{e^{-\lambda}(\lambda e^t)^X}{X!} = e^{-\lambda}\sum_{X=0}^{\infty}\frac{(\lambda e^t)^X}{X!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$
where the fifth equality follows from the fact that $\sum_{X=0}^{\infty}\frac{a^X}{X!} = e^a$ and in this case $a = \lambda e^t$. This is the fundamental relationship underlying the Poisson distribution and what makes it a proper probability function.
d. For the Geometric Distribution,
$$M_X(t) = E(e^{Xt}) = \sum_{X=1}^{\infty}\theta(1 - \theta)^{X-1}e^{Xt} = \theta\sum_{X=1}^{\infty}(1 - \theta)^{X-1}e^{(X-1)t}e^t = \theta e^t\sum_{X=1}^{\infty}\left[(1 - \theta)e^t\right]^{X-1} = \frac{\theta e^t}{1 - (1 - \theta)e^t}$$
where the last equality uses the fact that $\sum_{X=1}^{\infty}a^{X-1} = \frac{1}{1-a}$ for $|a| < 1$, and in this case $a = (1 - \theta)e^t$. This is the fundamental relationship underlying the Geometric distribution and what makes it a proper probability function.
e. For the Exponential Distribution,
$$M_X(t) = E(e^{Xt}) = \int_0^{\infty}\frac{1}{\theta}e^{-X/\theta}e^{Xt}dx = \frac{1}{\theta}\int_0^{\infty}e^{-X\left[\frac{1}{\theta} - t\right]}dx = \frac{1}{\theta}\int_0^{\infty}e^{-X\left[\frac{1 - \theta t}{\theta}\right]}dx$$
$$= \frac{1}{\theta}\cdot\frac{-\theta}{1 - \theta t}\left[e^{-X\left(\frac{1 - \theta t}{\theta}\right)}\right]_0^{\infty} = (1 - \theta t)^{-1}.$$
f. For the Gamma Distribution,
$$M_X(t) = E(e^{Xt}) = \int_0^{\infty}\frac{1}{\Gamma(\alpha)\beta^{\alpha}}X^{\alpha - 1}e^{-X/\beta}e^{Xt}dx = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty}X^{\alpha - 1}e^{-X\left(\frac{1}{\beta} - t\right)}dx = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty}X^{\alpha - 1}e^{-X\left(\frac{1 - \beta t}{\beta}\right)}dx.$$
The Gamma density is proper and integrates to one using the fact that $\int_0^{\infty}X^{\alpha - 1}e^{-X/\beta}dx = \Gamma(\alpha)\beta^{\alpha}$. Using this fundamental relationship for the last integral, we get
$$M_X(t) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\cdot\Gamma(\alpha)\left(\frac{\beta}{1 - \beta t}\right)^{\alpha} = (1 - \beta t)^{-\alpha}$$
where we substituted $\beta/(1 - \beta t)$ for the usual $\beta$. The $\chi^2_r$ distribution is Gamma with $\alpha = \frac{r}{2}$ and $\beta = 2$. Hence, its MGF is $(1 - 2t)^{-r/2}$.
g. This was already done in the solutions to problems 5, 6, 7, 9 and 10.
2.15 Moment Generating Function Method.
a. If $X_1, \ldots, X_n$ are independent Poisson distributed with parameters $(\lambda_i)$ respectively, then from problem 2.14c, we have
$$M_{X_i}(t) = e^{\lambda_i(e^t - 1)} \quad \text{for } i = 1, 2, \ldots, n.$$
$Y = \sum_{i=1}^{n} X_i$ has $M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$ since the $X_i$'s are independent. Hence,
$$M_Y(t) = e^{\sum_{i=1}^{n}\lambda_i(e^t - 1)},$$
which we recognize as a Poisson with parameter $\sum_{i=1}^{n}\lambda_i$.
b. If $X_1, \ldots, X_n$ are IIN($\mu_i, \sigma_i^2$), then from problem 2.14b, we have
$$M_{X_i}(t) = e^{\mu_i t + \frac{1}{2}\sigma_i^2 t^2} \quad \text{for } i = 1, 2, \ldots, n.$$
$Y = \sum_{i=1}^{n} X_i$ has $M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$ since the $X_i$'s are independent. Hence,
$$M_Y(t) = e^{\left(\sum_{i=1}^{n}\mu_i\right)t + \frac{1}{2}\left(\sum_{i=1}^{n}\sigma_i^2\right)t^2},$$
which we recognize as Normal with mean $\sum_{i=1}^{n}\mu_i$ and variance $\sum_{i=1}^{n}\sigma_i^2$.
c. If $X_1, \ldots, X_n$ are IIN($\mu, \sigma^2$), then $Y = \sum_{i=1}^{n} X_i$ is $N(n\mu, n\sigma^2)$ from part (b) using the equality of means and variances. Therefore, $\bar{X} = Y/n$ is $N(\mu, \sigma^2/n)$.
d. If $X_1, \ldots, X_n$ are independent $\chi^2$ distributed with parameters $(r_i)$ respectively, then from problem 2.14f, we get
$$M_{X_i}(t) = (1 - 2t)^{-r_i/2} \quad \text{for } i = 1, 2, \ldots, n.$$
$Y = \sum_{i=1}^{n} X_i$ has $M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$ since the $X_i$'s are independent. Hence,
$$M_Y(t) = (1 - 2t)^{-\sum_{i=1}^{n} r_i/2},$$
which we recognize as $\chi^2$ with degrees of freedom $\sum_{i=1}^{n} r_i$.
2.16 Best Linear Prediction. This is based on Amemiya (1994).
a. The mean squared error of the predictor is given by
$$MSE = E(Y - \alpha - \beta X)^2 = E(Y^2) + \alpha^2 + \beta^2 E(X^2) - 2\alpha E(Y) - 2\beta E(XY) + 2\alpha\beta E(X).$$
Minimizing this MSE with respect to $\alpha$ and $\beta$ yields the following first-order conditions:
$$\frac{\partial MSE}{\partial\alpha} = 2\alpha - 2E(Y) + 2\beta E(X) = 0$$
$$\frac{\partial MSE}{\partial\beta} = 2\beta E(X^2) - 2E(XY) + 2\alpha E(X) = 0.$$
Solving these two equations for $\alpha$ and $\beta$ yields $\hat{\alpha} = \mu_Y - \hat{\beta}\mu_X$ from the first equation, where $\mu_Y = E(Y)$ and $\mu_X = E(X)$. Substituting this in the second equation, one gets
$$\hat{\beta}E(X^2) - E(XY) + \mu_Y\mu_X - \hat{\beta}\mu_X^2 = 0$$
$$\hat{\beta}\,\mathrm{var}(X) = E(XY) - \mu_X\mu_Y = \mathrm{cov}(X, Y).$$
Hence, $\hat{\beta} = \mathrm{cov}(X, Y)/\mathrm{var}(X) = \sigma_{XY}/\sigma_X^2 = \rho\sigma_Y/\sigma_X$, since $\rho = \sigma_{XY}/\sigma_X\sigma_Y$. The best predictor is given by $\hat{Y} = \hat{\alpha} + \hat{\beta}X$.
b. Substituting $\hat{\alpha}$ into the best predictor, one gets
$$\hat{Y} = \hat{\alpha} + \hat{\beta}X = \mu_Y + \hat{\beta}(X - \mu_X) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X).$$
One clearly deduces that $E(\hat{Y}) = \mu_Y$ and $\mathrm{var}(\hat{Y}) = \rho^2\frac{\sigma_Y^2}{\sigma_X^2}\mathrm{var}(X) = \rho^2\sigma_Y^2$. The prediction error is $\hat{u} = Y - \hat{Y} = (Y - \mu_Y) - \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X)$ with $E(\hat{u}) = 0$ and
$$\mathrm{var}(\hat{u}) = E(\hat{u}^2) = \mathrm{var}(Y) + \rho^2\frac{\sigma_Y^2}{\sigma_X^2}\mathrm{var}(X) - 2\rho\frac{\sigma_Y}{\sigma_X}\sigma_{XY} = \sigma_Y^2 + \rho^2\sigma_Y^2 - 2\rho^2\sigma_Y^2 = \sigma_Y^2(1 - \rho^2).$$
This is the part of var(Y) that is not explained by the best linear predictor $\hat{Y}$.
c. $\mathrm{cov}(\hat{Y}, \hat{u}) = \mathrm{cov}(\hat{Y}, Y - \hat{Y}) = \mathrm{cov}(\hat{Y}, Y) - \mathrm{var}(\hat{Y})$.
But
$$\mathrm{cov}(\hat{Y}, Y) = E(\hat{Y} - \mu_Y)(Y - \mu_Y) = E\left[\rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X)(Y - \mu_Y)\right] = \rho\frac{\sigma_Y}{\sigma_X}\mathrm{cov}(X, Y) = \rho^2\sigma_Y^2.$$
Hence, $\mathrm{cov}(\hat{Y}, \hat{u}) = \rho^2\sigma_Y^2 - \rho^2\sigma_Y^2 = 0$.
2.17 The Best Predictor.
a. The problem is to minimize $E[Y - h(X)]^2$ with respect to h(X). Add and subtract $E(Y/X)$ to get
$$E\{[Y - E(Y/X)] + [E(Y/X) - h(X)]\}^2 = E[Y - E(Y/X)]^2 + E[E(Y/X) - h(X)]^2,$$
and the cross-product term $E[Y - E(Y/X)][E(Y/X) - h(X)]$ is zero because of the law of iterated expectations; see the Appendix to this chapter or Amemiya (1994). In fact, this says that expectations can be written as $E = E_X E_{Y/X}$, and the cross-product term given above, $E_{Y/X}[Y - E(Y/X)][E(Y/X) - h(X)]$, is clearly zero. Hence, $E[Y - h(X)]^2$ is expressed as the sum of two positive terms. The first term is not affected by our choice of h(X). The second term, however, is zero for $h(X) = E(Y/X)$. Clearly, this is the best predictor of Y based on X.
b. In the Appendix to this chapter, we considered the bivariate Normal distribution and showed that $E(Y/X) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X)$. In part (a), we showed that this is the best predictor of Y based on X. But, in this case, this is exactly the form of the best linear predictor of Y based on X derived in problem 2.16. Hence, for the bivariate Normal density, the best predictor is identical to the best linear predictor of Y based on X.
http://www.springer.com/978-3-642-03382-7