
    School of Mathematical Sciences

    MTH736U Mathematical Statistics Solutions to Exercises

Exercise 1.1 For any events $A, B, C$ defined on a sample space $\Omega$ we have

Commutativity
\[ A \cup B = B \cup A, \qquad A \cap B = B \cap A. \]

Associativity
\[ A \cup (B \cup C) = (A \cup B) \cup C, \qquad A \cap (B \cap C) = (A \cap B) \cap C. \]

Distributive Laws
\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \qquad A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \]

De Morgan's Laws
\[ (A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c. \]

Exercise 1.2 Here we will use general versions of De Morgan's Laws: for any events $A_1, A_2, \ldots \in \mathcal{A}$, where $\mathcal{A}$ is a $\sigma$-algebra defined on a sample space $\Omega$, we have
\[ \Big(\bigcup_{i=1}^{\infty} A_i\Big)^c = \bigcap_{i=1}^{\infty} A_i^c \qquad\text{and}\qquad \Big(\bigcap_{i=1}^{\infty} A_i\Big)^c = \bigcup_{i=1}^{\infty} A_i^c. \]
Now, by Definition 1.5, if $A_1, A_2, \ldots \in \mathcal{A}$, then also $A_1^c, A_2^c, \ldots \in \mathcal{A}$. Therefore $\bigcup_{i=1}^{\infty} A_i^c \in \mathcal{A}$. Then, by the first general De Morgan Law, we have
\[ \bigcup_{i=1}^{\infty} A_i^c = \Big(\bigcap_{i=1}^{\infty} A_i\Big)^c \in \mathcal{A}, \]
and so $\bigcap_{i=1}^{\infty} A_i \in \mathcal{A}$ as well. It means that the $\sigma$-algebra $\mathcal{A}$ is closed under countable intersections of its elements.


Exercise 1.3 Theorem 1.2

If $P$ is a probability function, then

(a) $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i)$ for any partition $C_1, C_2, \ldots$;
(b) $P\big(\bigcup_{i=1}^{\infty} A_i\big) \le \sum_{i=1}^{\infty} P(A_i)$ for any events $A_1, A_2, \ldots$ [Boole's Inequality].

Proof

(a) Since $C_1, C_2, \ldots$ form a partition of $\Omega$ we have that $C_i \cap C_j = \emptyset$ for all $i \ne j$ and $\Omega = \bigcup_{i=1}^{\infty} C_i$. Hence, by the Distributive Law, we can write
\[ A = A \cap \Omega = A \cap \Big(\bigcup_{i=1}^{\infty} C_i\Big) = \bigcup_{i=1}^{\infty} (A \cap C_i). \]
Then, since the $A \cap C_i$ are disjoint,
\[ P(A) = P\Big(\bigcup_{i=1}^{\infty} (A \cap C_i)\Big) = \sum_{i=1}^{\infty} P(A \cap C_i). \]

(b) First we construct a disjoint collection of events $A_1^*, A_2^*, \ldots$ such that $\bigcup_{i=1}^{\infty} A_i^* = \bigcup_{i=1}^{\infty} A_i$. Define
\[ A_1^* = A_1, \qquad A_i^* = A_i \setminus \bigcup_{j=1}^{i-1} A_j, \quad i = 2, 3, \ldots \]
Then
\[ P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P\Big(\bigcup_{i=1}^{\infty} A_i^*\Big) = \sum_{i=1}^{\infty} P(A_i^*) \]
since the $A_i^*$ are disjoint. Now, by construction $A_i^* \subseteq A_i$ for all $i = 1, 2, \ldots$. Hence $P(A_i^*) \le P(A_i)$ for all $i = 1, 2, \ldots$, and so
\[ \sum_{i=1}^{\infty} P(A_i^*) \le \sum_{i=1}^{\infty} P(A_i). \]

Exercise 1.4 Let $X \sim \mathrm{Bin}(8, 0.4)$, that is $n = 8$ and the probability of success $p = 0.4$. The pmf, shown in a mathematical, tabular and graphical way, and a graph of the c.d.f. of the variable $X$ follow.

Mathematical form:
\[ P(X = x) = \binom{8}{x}(0.4)^x(0.6)^{8-x}, \qquad x \in \mathcal{X} = \{0, 1, 2, \ldots, 8\}. \]

Tabular form:

x            0       1       2       3       4       5       6       7       8
P(X = x)   0.0168  0.0896  0.2090  0.2787  0.2322  0.1239  0.0413  0.0079  0.0007
P(X <= x)  0.0168  0.1064  0.3154  0.5941  0.8263  0.9502  0.9915  0.9993  1



Figure 1: Graphical representation of the mass function and the cumulative distribution function for $X \sim \mathrm{Bin}(8, 0.4)$.
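The pmf and cdf values in the table (and the two functions plotted in Figure 1) can be reproduced numerically. A minimal sketch in Python, assuming numpy and scipy are available:

```python
import numpy as np
from scipy.stats import binom

# X ~ Bin(8, 0.4): tabulate the pmf and the cdf over the support {0, 1, ..., 8}
n, p = 8, 0.4
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)
cdf = binom.cdf(x, n, p)

for xi, fx, Fx in zip(x, pmf, cdf):
    print(f"x = {xi}:  P(X = x) = {fx:.4f},  P(X <= x) = {Fx:.4f}")
```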

Exercise 1.5 1. To verify that $F(x)$ is a cdf we will check the conditions of Theorem 1.4.

(i) $F'(x) = \frac{2}{x^3} > 0$ for $x \in \mathcal{X} = (1, \infty)$. Hence, $F(x)$ is increasing on $(1, \infty)$. The function is equal to zero otherwise.

(ii) Obviously $\lim_{x \to -\infty} F(x) = 0$.

(iii) $\lim_{x \to \infty} F(x) = 1$ as $\lim_{x \to \infty} \frac{1}{x^2} = 0$.

(iv) $F(x)$ is continuous.

2. The pdf is
\[ f(x) = \begin{cases} 0, & \text{for } x \in (-\infty, 1); \\ \frac{2}{x^3}, & \text{for } x \in (1, \infty); \\ \text{not defined}, & \text{for } x = 1. \end{cases} \]

3. $P(3 \le X \le 4) = F(4) - F(3) = \big(1 - \tfrac{1}{16}\big) - \big(1 - \tfrac{1}{9}\big) = \tfrac{7}{144}$.

Exercise 1.6 Discrete distributions:

1. Uniform $U(n)$ (equal mass at each outcome): The support set and the pmf are, respectively, $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ and
\[ P(X = x_i) = \frac{1}{n}, \qquad x_i \in \mathcal{X}, \]
where $n$ is a positive integer. In the special case of $\mathcal{X} = \{1, 2, \ldots, n\}$ we have
\[ \mathrm{E}(X) = \frac{n+1}{2}, \qquad \mathrm{var}(X) = \frac{(n+1)(n-1)}{12}. \]
Examples:
(a) $X$: the first digit in a randomly selected sequence of 5 digits;
(b) $X$: a randomly selected student in a class of 15 students.

2. $\mathrm{Bern}(p)$ (only two possible outcomes, usually called success and failure): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1\}$ and
\[ P(X = x) = p^x(1-p)^{1-x}, \qquad x \in \mathcal{X}, \]
where $p \in [0, 1]$ is the probability of success.
\[ \mathrm{E}(X) = p, \qquad \mathrm{var}(X) = p(1-p). \]
Examples:
(a) $X$: the outcome of tossing a coin;


(b) $X$: detection of a fault in a tested semiconductor chip;
(c) $X$: a guessed answer in a multiple choice question.

3. $\mathrm{Bin}(n, p)$ (number of successes in $n$ independent trials): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1, 2, \ldots, n\}$ and
\[ P(X = x) = \binom{n}{x} p^x(1-p)^{n-x}, \qquad x \in \mathcal{X}, \]
where $p \in [0, 1]$ is the probability of success.
\[ \mathrm{E}(X) = np, \qquad \mathrm{var}(X) = np(1-p). \]
Examples:
(a) $X$: the number of heads in several tosses of a coin;
(b) $X$: the number of semiconductor chips, among several tested, in which a test finds a defect;
(c) $X$: the number of correctly guessed answers in a multiple choice test of $n$ questions.

4. $\mathrm{Geom}(p)$ (the number of independent Bernoulli trials until the first success): The support set and the pmf are, respectively, $\mathcal{X} = \{1, 2, \ldots\}$ and
\[ P(X = x) = p(1-p)^{x-1} = pq^{x-1}, \qquad x \in \mathcal{X}, \]
where $p \in [0, 1]$ is the probability of success and $q = 1 - p$.
\[ \mathrm{E}(X) = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1-p}{p^2}. \]
Examples include:
(a) $X$: the number of bits transmitted until the first error;
(b) $X$: the number of analyzed samples of air before a rare molecule is detected.

5. Hypergeometric$(n, M, N)$ (the number of outcomes of one kind in a random sample of size $n$ taken from a dichotomous population with $M$ and $N - M$ elements in the two groups): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1, \ldots, n\}$ and
\[ P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \qquad x \in \mathcal{X}, \]
where $M \ge x$ and $N - M \ge n - x$.
\[ \mathrm{E}(X) = \frac{nM}{N}, \qquad \mathrm{var}(X) = \frac{nM}{N}\cdot\frac{(N-M)(N-n)}{N(N-1)}. \]

6. Poisson$(\lambda)$ (the number of outcomes in a period of time or in a part of a space): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1, \ldots\}$ and
\[ P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \]
where $\lambda > 0$.
\[ \mathrm{E}(X) = \lambda, \qquad \mathrm{var}(X) = \lambda. \]
Examples:

(a) count of blood cells within a square of a haemocytometer slide;

(b) number of caterpillars on a leaf;

(c) number of plants of a rare variety in a square meter of a meadow;

(d) number of tree seedlings in a square meter around a big tree;


    (e) number of phone calls to a computer service within a minute.

    Exercise 1.7 Continuous distributions:

1. Uniform $U(a, b)$: The support set and the pdf are, respectively, $\mathcal{X} = [a, b]$ and
\[ f_X(x) = \frac{1}{b-a} I_{[a,b]}(x). \]
\[ \mathrm{E}(X) = \frac{a+b}{2}, \qquad \mathrm{var}(X) = \frac{(b-a)^2}{12}. \]

2. $\mathrm{Exp}(\lambda)$: The support set and the pdf are, respectively, $\mathcal{X} = [0, \infty)$ and
\[ f_X(x) = \lambda e^{-\lambda x} I_{[0,\infty)}(x), \]
where $\lambda > 0$.
\[ \mathrm{E}(X) = \frac{1}{\lambda}, \qquad \mathrm{var}(X) = \frac{1}{\lambda^2}. \]
Examples: $X$: the time between arrivals of e-mails on your computer; $X$: the distance between major cracks in a motorway; $X$: the life length of car voltage regulators.

3. Gamma$(\alpha, \lambda)$: An exponential rv describes the length of time or space between counts. The length until $r$ counts occur is a generalization of such a process and the respective rv is called the Erlang random variable. Its pdf is given by
\[ f_X(x) = \frac{x^{r-1}\lambda^r e^{-\lambda x}}{(r-1)!} I_{[0,\infty)}(x), \qquad r = 1, 2, \ldots \]
The gamma distribution generalizes the Erlang distribution by allowing the parameter $r$ to be any positive number, usually denoted by $\alpha$. In the gamma distribution we use the generalization of the factorial represented by the gamma function, as follows:
\[ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \qquad \text{for } \alpha > 0. \]
A recursive relationship that may be easily shown by integrating the above equation by parts is
\[ \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1). \]
If $\alpha$ is a positive integer, then $\Gamma(\alpha) = (\alpha - 1)!$ since $\Gamma(1) = 1$.

The pdf is
\[ f_X(x) = \frac{x^{\alpha-1}\lambda^{\alpha} e^{-\lambda x}}{\Gamma(\alpha)} I_{[0,\infty)}(x), \]
where $\alpha > 0$, $\lambda > 0$ and $\Gamma(\alpha)$ is the gamma function. The mean and the variance of the gamma rv are
\[ \mathrm{E}(X) = \frac{\alpha}{\lambda}, \qquad \mathrm{var}(X) = \frac{\alpha}{\lambda^2}. \]

4. $\chi^2(\nu)$: A rv $X$ has a chi-square distribution with $\nu$ degrees of freedom iff $X$ is gamma distributed with parameters $\alpha = \nu/2$ and $\lambda = \frac{1}{2}$. This distribution is used extensively in interval estimation and hypothesis testing. Its values are tabulated.


Exercise 1.8 We can write
\begin{align*}
\mathrm{E}(X - b)^2 &= \mathrm{E}\big(X - \mathrm{E}X + \mathrm{E}X - b\big)^2 = \mathrm{E}\big((X - \mathrm{E}X) + (\mathrm{E}X - b)\big)^2 \\
&= \mathrm{E}(X - \mathrm{E}X)^2 + (\mathrm{E}X - b)^2 + 2\,\mathrm{E}\big[(X - \mathrm{E}X)(\mathrm{E}X - b)\big] \\
&= \mathrm{E}(X - \mathrm{E}X)^2 + (\mathrm{E}X - b)^2 + 2(\mathrm{E}X - b)\,\mathrm{E}(X - \mathrm{E}X) \\
&= \mathrm{E}(X - \mathrm{E}X)^2 + (\mathrm{E}X - b)^2 \qquad \text{as } \mathrm{E}(X - \mathrm{E}X) = 0.
\end{align*}
This is minimized when $(\mathrm{E}X - b)^2 = 0$, that is, $\mathrm{E}X = b$.

Exercise 1.9
\begin{align*}
\mathrm{var}(aX + b) &= \mathrm{E}(aX + b)^2 - \{\mathrm{E}(aX + b)\}^2 \\
&= \mathrm{E}(a^2X^2 + 2abX + b^2) - \big(a^2\{\mathrm{E}X\}^2 + 2ab\,\mathrm{E}X + b^2\big) = a^2\,\mathrm{E}X^2 - a^2\{\mathrm{E}X\}^2 = a^2\,\mathrm{var}(X).
\end{align*}

Exercise 1.10 Let $X \sim \mathrm{Bin}(n, p)$. The pmf of $X$ is
\[ P(X = x) = \binom{n}{x} p^x(1-p)^{n-x}, \qquad x = 0, 1, \ldots, n. \]
To obtain the mgf of $X$ we will use the Binomial formula
\[ \sum_{x=0}^{n} \binom{n}{x} u^x v^{n-x} = (u + v)^n. \]
Hence, we may write
\[ M_X(t) = \mathrm{E}\,e^{tX} = \sum_{x=0}^{n} e^{tx}\binom{n}{x} p^x(1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x}(e^t p)^x(1-p)^{n-x} = \big(e^t p + (1-p)\big)^n. \]
Now, assume that $X \sim \mathrm{Poisson}(\lambda)$. Then
\[ P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots \]
Here we will use the Taylor series expansion of $e^u$ at zero, i.e.,
\[ e^u = \sum_{x=0}^{\infty} \frac{u^x}{x!}. \]
Hence, we have
\[ M_X(t) = \mathrm{E}\,e^{tX} = \sum_{x=0}^{\infty} e^{tx}\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}. \]
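Both closed-form mgfs can be checked against the defining sum $\mathrm{E}(e^{tX})$ evaluated term by term; a small sketch (the particular values of $n$, $p$, $\lambda$ and $t$ are arbitrary illustrative choices, not part of the exercise):

```python
import numpy as np
from scipy.stats import binom, poisson

# Compare E[exp(tX)] computed from the pmf with the closed-form mgfs derived above.
n, p, lam, t = 8, 0.4, 3.0, 0.7   # illustrative values only

x_b = np.arange(n + 1)
mgf_bin_sum = np.sum(np.exp(t * x_b) * binom.pmf(x_b, n, p))
mgf_bin_formula = (p * np.exp(t) + 1 - p) ** n

x_p = np.arange(200)              # truncate the Poisson sum; the tail is negligible here
mgf_pois_sum = np.sum(np.exp(t * x_p) * poisson.pmf(x_p, lam))
mgf_pois_formula = np.exp(lam * (np.exp(t) - 1))

print(mgf_bin_sum, mgf_bin_formula)    # the two binomial values agree
print(mgf_pois_sum, mgf_pois_formula)  # the two Poisson values agree
```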


Exercise 1.11 The pdf of a standard normal rv $Z$ is
\[ f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \qquad z \in \mathbb{R}. \]
To find the distribution of the transformed rv $Y = g(Z) = \mu + \sigma Z$ we will apply Theorem 1.11. The inverse mapping at $y$ is $z = g^{-1}(y) = \frac{y-\mu}{\sigma}$. Assuming $\sigma > 0$ the function $g(\cdot)$ is increasing. Then, we can write
\[ f_Y(y) = f_Z\big(g^{-1}(y)\big)\left|\frac{d}{dy} g^{-1}(y)\right| = \frac{1}{\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\cdot\frac{1}{\sigma} = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}. \]
This is the pdf of a normal distribution with parameters $\mu \in \mathbb{R}$ and $\sigma^2 \in \mathbb{R}_+$, i.e., $Y \sim N(\mu, \sigma^2)$.

Exercise 1.12

(a) Let $M_X(t)$ be the mgf of a random variable $X$. Then, for $Y = a + bX$, where $a$ and $b$ are constants, we can write
\[ M_Y(t) = \mathrm{E}\,e^{tY} = \mathrm{E}\,e^{t(a+bX)} = \mathrm{E}\big[e^{ta}e^{tbX}\big] = e^{ta}\,\mathrm{E}\,e^{tbX} = e^{ta} M_X(tb). \]

(b) First we derive the mgf of a standard normal rv $Z$.
\begin{align*}
M_Z(t) = \mathrm{E}\,e^{tZ} &= \int_{-\infty}^{\infty} e^{tz}\frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(z^2 - 2tz)}\,dz \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(z^2 - 2tz + t^2) + \frac{1}{2}t^2}\,dz = e^{\frac{t^2}{2}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(z-t)^2}{2}}\,dz = e^{\frac{t^2}{2}},
\end{align*}
as the integrand is the pdf of a rv with mean $t$ and variance 1. Now, we can calculate the mgf of $Y \sim N(\mu, \sigma^2)$ using the result of part (a), that is,
\[ M_Y(t) = \mathrm{E}\big(e^{t(\mu + \sigma Z)}\big) = e^{t\mu} M_Z(t\sigma) = e^{t\mu} e^{\frac{t^2\sigma^2}{2}} = e^{(2\mu t + t^2\sigma^2)/2}. \]

Exercise 1.13 Let us write the table of counts given for the smoking status and gender in terms of proportions, as follows:

              S
            1      2      3
G    0    0.20   0.32   0.08   0.60
     1    0.10   0.05   0.25   0.40
          0.30   0.37   0.33   1.00


where $S$ is a random variable denoting smoking status such that $S(s) = 1$, $S(q) = 2$, $S(n) = 3$, and $G$ is a random variable denoting gender such that $G(m) = 0$, $G(f) = 1$. In the table we have the distribution of $X = (G, S)$ as well as the marginal distributions of $G$ and of $S$.

1. $p_G(0) = 0.6$.

2. $p_X(0, 1) = 0.2$.

3. $p_S(1) + p_S(2) = 0.30 + 0.37 = 0.67$.

4. $p_X(1, 1) + p_X(1, 2) = 0.10 + 0.05 = 0.15$.

Exercise 1.14 Here we have conditional probabilities.

1. $P(S = 1 \mid G = 0) = \frac{0.20}{0.60} = \frac{1}{3}$.

2. $P(G = 0 \mid S = 1) = \frac{0.10}{0.30} = \frac{1}{3}$.
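The marginal and conditional probabilities in Exercises 1.13 and 1.14 follow directly from the table; a quick numerical check (a sketch assuming numpy):

```python
import numpy as np

# Joint pmf of (G, S) from the table above: rows are G = 0, 1; columns are S = 1, 2, 3
p = np.array([[0.20, 0.32, 0.08],
              [0.10, 0.05, 0.25]])

p_G = p.sum(axis=1)          # marginal of G: [0.60, 0.40]
p_S = p.sum(axis=0)          # marginal of S: [0.30, 0.37, 0.33]

print(p_G[0])                # P(G = 0) = 0.60
print(p[0, 0])               # P(G = 0, S = 1) = 0.20
print(p_S[0] + p_S[1])       # P(S = 1) + P(S = 2) = 0.67
print(p[1, 0] + p[1, 1])     # P(G = 1, S = 1) + P(G = 1, S = 2) = 0.15
print(p[0, 0] / p_G[0])      # P(S = 1 | G = 0) = 1/3
print(p[0, 0] / p_S[0])      # P(G = 0 | S = 1) = 1/3
```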

Exercise 1.15 The first equality is a special case of Lemma 1.2 with $g(Y) = Y$.

To show the second equality we will work on the RHS of the equation.
\[ \mathrm{var}(Y \mid X) = \mathrm{E}(Y^2 \mid X) - [\mathrm{E}(Y \mid X)]^2. \]
So
\[ \mathrm{E}[\mathrm{var}(Y \mid X)] = \mathrm{E}[\mathrm{E}(Y^2 \mid X)] - \mathrm{E}\{[\mathrm{E}(Y \mid X)]^2\} = \mathrm{E}(Y^2) - \mathrm{E}\{[\mathrm{E}(Y \mid X)]^2\}. \]
Also,
\[ \mathrm{var}(\mathrm{E}[Y \mid X]) = \mathrm{E}\{[\mathrm{E}(Y \mid X)]^2\} - \{\mathrm{E}[\mathrm{E}(Y \mid X)]\}^2 = \mathrm{E}\{[\mathrm{E}(Y \mid X)]^2\} - [\mathrm{E}(Y)]^2. \]
Hence, adding these two expressions we obtain
\[ \mathrm{E}[\mathrm{var}(Y \mid X)] + \mathrm{var}(\mathrm{E}[Y \mid X]) = \mathrm{E}(Y^2) - [\mathrm{E}(Y)]^2 = \mathrm{var}(Y). \]

Exercise 1.16 (See Example 1.25)

Denote by a success a message which gets into the server and by $X$ the number of successes in $Y$ trials. Then
\[ X \mid Y \sim \mathrm{Bin}(Y, p), \quad p = \tfrac{1}{3}, \qquad Y \sim \mathrm{Poisson}(\lambda), \quad \lambda = 21. \]
As in Example 1.25,
\[ X \sim \mathrm{Poisson}(\lambda p), \qquad \lambda p = 7. \]
Hence
\[ \mathrm{E}(X) = \lambda p = 7, \qquad \mathrm{var}(X) = \lambda p = 7. \]
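A quick simulation illustrates this thinning result (a sketch; the seed and number of replications are arbitrary, numpy assumed):

```python
import numpy as np

# Poisson thinning: Y ~ Poisson(21), X | Y ~ Bin(Y, 1/3), so marginally X ~ Poisson(7).
rng = np.random.default_rng(0)
lam, p, reps = 21, 1/3, 200_000

y = rng.poisson(lam, size=reps)
x = rng.binomial(y, p)

print(x.mean(), x.var())   # both should be close to 7
```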

Exercise 1.17 Here we have a two-dimensional rv $X = (X_1, X_2)$ which denotes the lengths of life of two components, with joint pdf equal to
\[ f_X(x_1, x_2) = \frac{1}{8}\,x_1 e^{-\frac{1}{2}(x_1 + x_2)}\, I_{\{(x_1, x_2):\, x_1 > 0,\, x_2 > 0\}}. \]


1. The probability that the length of life of each of the two components will be greater than 100 hours is:
\begin{align*}
P_X(X_1 > 1, X_2 > 1) &= \int_1^{\infty}\!\int_1^{\infty} \frac{1}{8}\,x_1 e^{-\frac{1}{2}(x_1 + x_2)}\,dx_1\,dx_2 \\
&= \frac{1}{8}\int_1^{\infty} e^{-\frac{1}{2}x_2}\underbrace{\int_1^{\infty} x_1 e^{-\frac{1}{2}x_1}\,dx_1}_{=\,6e^{-1/2}\ \text{(by parts)}}\,dx_2 \\
&= \frac{1}{8}\int_1^{\infty} e^{-\frac{1}{2}x_2}\,6e^{-\frac{1}{2}}\,dx_2 = \frac{3}{2}e^{-1}.
\end{align*}
That is, the probability that the length of life of each of the two components is greater than 100 hours is $\frac{3}{2}e^{-1}$.

2. Now we are interested in component II only, so we need to calculate the marginal pdf of the length of its life.
\[ f_{X_2}(x_2) = \int_0^{\infty} \frac{1}{8}\,x_1 e^{-\frac{1}{2}(x_1 + x_2)}\,dx_1 = \frac{1}{8} e^{-\frac{1}{2}x_2}\underbrace{\int_0^{\infty} x_1 e^{-\frac{1}{2}x_1}\,dx_1}_{=\,4\ \text{(by parts)}} = \frac{1}{2} e^{-\frac{1}{2}x_2}. \]
Now,
\[ P_{X_2}(X_2 > 2) = \int_2^{\infty} \frac{1}{2} e^{-\frac{1}{2}x_2}\,dx_2 = e^{-1}. \]
That is, the probability that the length of life of component II will be bigger than 200 hours is $e^{-1}$.

3. We can write
\[ f_X(x_1, x_2) = \Big(\frac{1}{8}\,x_1 e^{-\frac{1}{2}x_1}\Big)\Big(e^{-\frac{1}{2}x_2}\Big) = g(x_1)h(x_2). \]
The joint pdf can be written as a product of two functions, one depending on $x_1$ only and the other on $x_2$ only, for all pairs $(x_1, x_2)$, and the domains are independent. Hence, the random variables $X_1$ and $X_2$ are independent.

4. Write $g(X_1) = \frac{1}{X_1}$ and $h(X_2) = X_2$. Then, by part 2 of Theorem 1.13 we have
\[ \mathrm{E}[g(X_1)h(X_2)] = \mathrm{E}[g(X_1)]\,\mathrm{E}[h(X_2)] = \mathrm{E}\Big[\frac{1}{X_1}\Big]\,\mathrm{E}[X_2]. \]
The marginal pdfs for $X_1$ and $X_2$, respectively, are
\[ f_{X_1}(x_1) = \frac{1}{4}\,x_1 e^{-\frac{1}{2}x_1}, \qquad f_{X_2}(x_2) = \frac{1}{2} e^{-\frac{1}{2}x_2}. \]
Hence,
\[ \mathrm{E}\Big[\frac{1}{X_1}\Big] = \frac{1}{4}\int_0^{\infty} \frac{1}{x_1}\,x_1 e^{-\frac{1}{2}x_1}\,dx_1 = \frac{1}{4}\int_0^{\infty} e^{-\frac{1}{2}x_1}\,dx_1 = \frac{1}{2}, \]
\[ \mathrm{E}[X_2] = \frac{1}{2}\int_0^{\infty} x_2 e^{-\frac{1}{2}x_2}\,dx_2 = 2. \]
Finally,
\[ \mathrm{E}\Big[\frac{X_2}{X_1}\Big] = \mathrm{E}\Big[\frac{1}{X_1}\Big]\,\mathrm{E}[X_2] = \frac{1}{2}\cdot 2 = 1. \]
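Since the joint pdf factorizes as above, $X_1$ has a gamma distribution with shape 2 and rate $\frac{1}{2}$ and $X_2$ an exponential distribution with rate $\frac{1}{2}$, independently, so the answers can be checked by simulation (a sketch; seed and sample size are arbitrary):

```python
import numpy as np

# X1 ~ Gamma(shape=2, scale=2) and X2 ~ Exp(scale=2), independent, which matches the
# joint pdf (1/8) x1 exp(-(x1 + x2)/2) on x1 > 0, x2 > 0.
rng = np.random.default_rng(1)
reps = 500_000
x1 = rng.gamma(shape=2.0, scale=2.0, size=reps)
x2 = rng.exponential(scale=2.0, size=reps)

print(np.mean((x1 > 1) & (x2 > 1)), 1.5 * np.exp(-1))   # part 1: both about 0.552
print(np.mean(x2 > 2), np.exp(-1))                      # part 2: both about 0.368
print(np.mean(1 / x1), np.mean(x2), np.mean(x2 / x1))   # part 4: about 0.5, 2 and 1
```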


Exercise 1.18 Take the nonnegative function of $t$, $g(t) = \mathrm{var}(tX_1 + X_2) \ge 0$. Then,
\begin{align*}
g(t) = \mathrm{var}(tX_1 + X_2) &= \mathrm{E}\big(tX_1 + X_2 - (t\mu_1 + \mu_2)\big)^2 = \mathrm{E}\big(t(X_1 - \mu_1) + (X_2 - \mu_2)\big)^2 \\
&= \mathrm{E}\big((X_1 - \mu_1)^2 t^2 + 2(X_1 - \mu_1)(X_2 - \mu_2)t + (X_2 - \mu_2)^2\big) \\
&= \mathrm{var}(X_1)\,t^2 + 2\,\mathrm{cov}(X_1, X_2)\,t + \mathrm{var}(X_2).
\end{align*}

1. $g(t) \ge 0$, hence this quadratic function has at most one real root. The discriminant is
\[ \Delta = 4\,\mathrm{cov}^2(X_1, X_2) - 4\,\mathrm{var}(X_1)\,\mathrm{var}(X_2) \le 0. \]
Hence,
\[ \mathrm{cov}^2(X_1, X_2) \le \mathrm{var}(X_1)\,\mathrm{var}(X_2), \]
that is
\[ \frac{\mathrm{cov}^2(X_1, X_2)}{\mathrm{var}(X_1)\,\mathrm{var}(X_2)} \le 1 \]
and so
\[ -1 \le \rho(X_1, X_2) \le 1. \]

2. $|\rho(X_1, X_2)| = 1$ if and only if the discriminant is equal to zero. That is, $\rho(X_1, X_2) = \pm 1$ if and only if $g(t)$ has a single root. But since $\big(t(X_1 - \mu_1) + (X_2 - \mu_2)\big)^2 \ge 0$, the expected value $g(t) = \mathrm{E}\big(t(X_1 - \mu_1) + (X_2 - \mu_2)\big)^2 = 0$ if and only if
\[ P\Big(\big[t(X_1 - \mu_1) + (X_2 - \mu_2)\big]^2 = 0\Big) = 1. \]
This is equivalent to
\[ P\big(t(X_1 - \mu_1) + (X_2 - \mu_2) = 0\big) = 1. \]
It means that $P(X_2 = aX_1 + b) = 1$ with $a = -t$ and $b = \mu_1 t + \mu_2$, where $t$ is the root of $g(t)$, that is
\[ t = -\frac{\mathrm{cov}(X_1, X_2)}{\mathrm{var}(X_1)}. \]
Hence, $a = -t$ has the same sign as $\rho(X_1, X_2)$.

Exercise 1.19 We have
\[ f_{X,Y}(x, y) = \begin{cases} 8xy, & \text{for } 0 \le x < y \le 1; \\ 0, & \text{otherwise.} \end{cases} \]

(a) Variables $X$ and $Y$ are not independent because their ranges are not independent (the support is not a product set).

(b) We have $\mathrm{cov}(X, Y) = \mathrm{E}(XY) - \mathrm{E}(X)\,\mathrm{E}(Y)$.
\[ \mathrm{E}(XY) = \int_0^1\!\int_x^1 xy\,8xy\,dy\,dx = \frac{4}{9}. \]
To calculate $\mathrm{E}(X)$ and $\mathrm{E}(Y)$ we need the marginal pdfs for $X$ and for $Y$:
\[ f_X(x) = \int_x^1 8xy\,dy = 4x(1 - x^2) \ \text{on } (0, 1), \qquad f_Y(y) = \int_0^y 8xy\,dx = 4y^3 \ \text{on } (0, 1). \]
Now we can calculate the expectations, that is,
\[ \mathrm{E}(X) = \int_0^1 x\,4x(1 - x^2)\,dx = \frac{8}{15}, \qquad \mathrm{E}(Y) = \int_0^1 y\,4y^3\,dy = \frac{4}{5}. \]
Hence,
\[ \mathrm{cov}(X, Y) = \frac{4}{9} - \frac{8}{15}\cdot\frac{4}{5} = \frac{4}{225} \approx 0.01778. \]


(c) The transformation and the inverses are, respectively,
\[ u = \frac{x}{y}, \quad v = y \qquad\text{and}\qquad x = uy = uv, \quad y = v. \]
Then, the Jacobian of the transformation is
\[ J = \det\begin{pmatrix} v & u \\ 0 & 1 \end{pmatrix} = v. \]
So, by the theorem for a bivariate transformation we can write
\[ f_{U,V}(u, v) = f_{X,Y}(uv, v)\,|J| = 8uv^3. \]
The support of $(U, V)$ is
\[ D = \{(u, v) : 0 \le u < 1,\ 0 < v \le 1\}. \]

(d) The joint pdf of $(U, V)$ is a product of two functions: one depends on $u$ only and the other on $v$ only. Also, the ranges of $U$ and $V$ are independent. Hence, $U$ and $V$ are independent random variables.

(e) $U$ and $V$ are independent, hence $\mathrm{cov}(U, V) = 0$.
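The covariance in (b) and the conclusions in (d) and (e) can be checked by simulation. One way to sample from $f_{X,Y}$ is to draw $Y$ from its marginal ($F_Y(y) = y^4$) and then $X \mid Y = y$ from its conditional cdf $(x/y)^2$; a sketch with an arbitrary seed:

```python
import numpy as np

# Sample (X, Y) with joint pdf 8xy on 0 <= x < y <= 1:
# Y = U1**(1/4) since F_Y(y) = y^4, and X = y*sqrt(U2) since F_{X|Y}(x|y) = (x/y)^2.
rng = np.random.default_rng(2)
reps = 500_000
y = rng.random(reps) ** 0.25
x = y * np.sqrt(rng.random(reps))

u, v = x / y, y                        # the transformation of part (c)
print(np.cov(x, y)[0, 1], 4 / 225)     # part (b): both close to 0.0178
print(np.cov(u, v)[0, 1])              # part (e): close to 0
```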

Exercise 1.20 (a) Here $u = \frac{x}{x+y}$ and $v = x + y$. Hence, the inverses are
\[ x = uv \qquad\text{and}\qquad y = v - uv. \]
The Jacobian of the transformation is $J = v$.

Furthermore, the joint pdf of $(X, Y)$ (by independence) is
\[ f_{X,Y}(x, y) = \lambda^2 e^{-\lambda(x+y)}. \]
Hence, by the transformation theorem we get
\[ f_{U,V}(u, v) = \lambda^2 e^{-\lambda v} v \]
and the support for $(U, V)$ is
\[ D = \{(u, v) : 0 < u < 1,\ 0 < v < \infty\}. \]

(b) Random variables $U$ and $V$ are independent because their joint pdf can be written as a product of two functions, one depending on $v$ only and the other on $u$ only (this one is a constant), and their domains are independent.

(c) We will find the marginal pdf for $U$:
\[ f_U(u) = \int_0^{\infty} \lambda^2 e^{-\lambda v} v\,dv = 1 \]
on $(0, 1)$. That is, $U \sim \mathrm{Uniform}(0, 1)$.

(d) A sum of two identically distributed exponential random variables has the $\mathrm{Erlang}(2, \lambda)$ distribution.
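A simulation check of the conclusions in (c) and (d), assuming scipy is available (the value of $\lambda$ and the seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

# For X, Y iid Exp(lambda): U = X/(X+Y) should be Uniform(0,1) and
# V = X + Y should be Erlang(2, lambda), i.e. Gamma(shape=2, scale=1/lambda).
rng = np.random.default_rng(3)
lam, reps = 1.5, 100_000
x = rng.exponential(scale=1/lam, size=reps)
y = rng.exponential(scale=1/lam, size=reps)
u, v = x / (x + y), x + y

print(stats.kstest(u, "uniform").pvalue)                    # typically not small
print(stats.kstest(v, "gamma", args=(2, 0, 1/lam)).pvalue)  # typically not small
```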


Exercise 1.21 (a) By definition of expectation we can write
\begin{align*}
\mathrm{E}(Y^a) &= \int_0^{\infty} y^a\,\frac{\lambda^{\alpha} y^{\alpha-1} e^{-\lambda y}}{\Gamma(\alpha)}\,dy = \int_0^{\infty} \frac{\lambda^{\alpha} y^{a+\alpha-1} e^{-\lambda y}}{\Gamma(\alpha)}\cdot\frac{\lambda^a\,\Gamma(a+\alpha)}{\lambda^a\,\Gamma(a+\alpha)}\,dy \\
&= \frac{\Gamma(a+\alpha)}{\lambda^a\,\Gamma(\alpha)}\int_0^{\infty} \frac{\lambda^{a+\alpha} y^{a+\alpha-1} e^{-\lambda y}}{\Gamma(a+\alpha)}\,dy \quad\text{(the integrand is the pdf of a gamma rv)} \\
&= \frac{\Gamma(a+\alpha)}{\lambda^a\,\Gamma(\alpha)}.
\end{align*}
Now, the parameters of a gamma distribution have to be positive, hence $a + \alpha > 0$.

(b) To obtain $\mathrm{E}(Y)$ we apply the above result with $a = 1$. This gives
\[ \mathrm{E}(Y) = \frac{\Gamma(1+\alpha)}{\lambda\,\Gamma(\alpha)} = \frac{\alpha\,\Gamma(\alpha)}{\lambda\,\Gamma(\alpha)} = \frac{\alpha}{\lambda}. \]
For the variance we have $\mathrm{var}(Y) = \mathrm{E}(Y^2) - [\mathrm{E}(Y)]^2$. Hence we need to find $\mathrm{E}(Y^2)$. Again, we use the result of part (a), now with $a = 2$. That is,
\[ \mathrm{E}(Y^2) = \frac{\Gamma(2+\alpha)}{\lambda^2\,\Gamma(\alpha)} = \frac{(1+\alpha)\,\Gamma(1+\alpha)}{\lambda^2\,\Gamma(\alpha)} = \frac{(1+\alpha)\,\alpha\,\Gamma(\alpha)}{\lambda^2\,\Gamma(\alpha)} = \frac{\alpha(1+\alpha)}{\lambda^2}. \]
Hence, the variance is
\[ \mathrm{var}(Y) = \frac{\alpha(1+\alpha)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2}. \]

(c) $X \sim \chi^2_{\nu}$ is a special case of the gamma distribution when $\alpha = \nu/2$ and $\lambda = \frac{1}{2}$. Hence, from (b) we obtain
\[ \mathrm{E}(X) = \frac{\alpha}{\lambda} = \frac{\nu}{2}\cdot 2 = \nu, \qquad \mathrm{var}(X) = \frac{\alpha}{\lambda^2} = \frac{\nu}{2}\cdot 4 = 2\nu. \]

Exercise 1.22 Here we have $Y_i \overset{iid}{\sim} N(0, 1)$, $\bar{Y} = \frac{1}{5}\sum_{i=1}^{5} Y_i$, and $Y_6 \sim N(0, 1)$ independently of the other $Y_i$.

(a) $Y_i \overset{iid}{\sim} N(0, 1)$, hence $Y_i^2 \overset{iid}{\sim} \chi^2_1$. Then, by Corollary 1.1 we get
\[ W = \sum_{i=1}^{5} Y_i^2 \sim \chi^2_5. \]

(b) $U = \sum_{i=1}^{5} (Y_i - \bar{Y})^2 = \frac{4S^2}{\sigma^2}$, where $S^2 = \frac{1}{4}\sum_{i=1}^{5}(Y_i - \bar{Y})^2$ and $\sigma^2 = 1$. Hence, as shown in Example 1.34, we have
\[ U = \frac{4S^2}{\sigma^2} \sim \chi^2_4. \]

(c) $U \sim \chi^2_4$ and $Y_6^2 \sim \chi^2_1$. Hence, by Corollary 1.1 we get $U + Y_6^2 \sim \chi^2_5$.

(d) $Y_6 \sim N(0, 1)$ and $W \sim \chi^2_5$. Hence, by Theorem 1.19, we get
\[ \frac{Y_6}{\sqrt{W/5}} \sim t_5. \]


Exercise 1.23 Let $X = (X_1, X_2, \ldots, X_n)^T$ be a vector of mutually independent rvs with mgfs $M_{X_1}(t), M_{X_2}(t), \ldots, M_{X_n}(t)$ and let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ be fixed constants. Then the mgf of the random variable $Z = \sum_{i=1}^{n}(a_i X_i + b_i)$ is
\[ M_Z(t) = e^{t\sum b_i}\prod_{i=1}^{n} M_{X_i}(a_i t). \]

Proof. By definition of the mgf we can write
\begin{align*}
M_Z(t) = \mathrm{E}\,e^{tZ} &= \mathrm{E}\,e^{t\sum_{i=1}^{n}(a_i X_i + b_i)} = \mathrm{E}\,e^{t\sum_{i=1}^{n} a_i X_i + t\sum_{i=1}^{n} b_i} = \mathrm{E}\Big[e^{t\sum_{i=1}^{n} b_i}\,e^{t\sum_{i=1}^{n} a_i X_i}\Big] \\
&= e^{t\sum_{i=1}^{n} b_i}\,\mathrm{E}\prod_{i=1}^{n} e^{t a_i X_i} = e^{t\sum_{i=1}^{n} b_i}\prod_{i=1}^{n}\mathrm{E}\,e^{t a_i X_i} = e^{t\sum_{i=1}^{n} b_i}\prod_{i=1}^{n} M_{X_i}(t a_i).
\end{align*}

Exercise 1.24 Here we will use the result of Theorem 1.22. We know that the mgf of a normal rv $X$ with mean $\mu$ and variance $\sigma^2$ is
\[ M_X(t) = e^{(2\mu t + t^2\sigma^2)/2}. \]
Hence, by Theorem 1.22 with $a_i = \frac{1}{n}$ and $b_i = 0$ for all $i = 1, \ldots, n$, we can write
\[ M_{\bar{X}}(t) = \prod_{i=1}^{n} M_{X_i}\Big(\frac{1}{n}t\Big) = \prod_{i=1}^{n} e^{\big(2\mu\frac{1}{n}t + \frac{1}{n^2}t^2\sigma^2\big)/2} = \Big(e^{\big(2\mu\frac{1}{n}t + \frac{1}{n^2}t^2\sigma^2\big)/2}\Big)^n = e^{n\big(2\mu\frac{1}{n}t + \frac{1}{n^2}t^2\sigma^2\big)/2} = e^{\big(2\mu t + t^2\frac{\sigma^2}{n}\big)/2}. \]
This is the mgf of a normal rv with mean $\mu$ and variance $\sigma^2/n$, that is
\[ \bar{X} \sim N\Big(\mu, \frac{\sigma^2}{n}\Big). \]

Exercise 1.25

Here $g(X) = Y = AX$, hence $h(Y) = X = A^{-1}Y = BY$, where $B = A^{-1}$. The Jacobian is
\[ J_h(y) = \det\Big(\frac{\partial}{\partial y} h(y)\Big) = \det\Big(\frac{\partial}{\partial y} By\Big) = \det B = \det A^{-1} = \frac{1}{\det A}. \]
Hence, by Theorem 1.25 we have
\[ f_Y(y) = f_X\big(h(y)\big)\,\big|J_h(y)\big| = f_X\big(A^{-1}y\big)\,\frac{1}{|\det A|}. \]


Exercise 1.26

A multivariate normal rv has the pdf of the form
\[ f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det V}}\exp\Big(-\frac{1}{2}(x - \mu)^T V^{-1}(x - \mu)\Big). \]
Then, by the result of Exercise 1.25, for $X \sim N_n(\mu, V)$ and $Y = AX$ we have
\begin{align*}
f_Y(y) &= f_X\big(A^{-1}y\big)\,\frac{1}{|\det A|} \\
&= \frac{1}{|\det A|}\,\frac{1}{(2\pi)^{n/2}\sqrt{\det V}}\exp\Big(-\frac{1}{2}(A^{-1}y - \mu)^T V^{-1}(A^{-1}y - \mu)\Big) \\
&= \frac{1}{|\det A|}\,\frac{1}{(2\pi)^{n/2}\sqrt{\det V}}\exp\Big(-\frac{1}{2}\big(A^{-1}(y - A\mu)\big)^T V^{-1} A^{-1}(y - A\mu)\Big) \\
&= \frac{1}{|\det A|}\,\frac{1}{(2\pi)^{n/2}\sqrt{\det V}}\exp\Big(-\frac{1}{2}(y - A\mu)^T (A^{-1})^T V^{-1} A^{-1}(y - A\mu)\Big) \\
&= \frac{1}{(2\pi)^{n/2}\sqrt{\det V\,(\det A)^2}}\exp\Big(-\frac{1}{2}(y - A\mu)^T\big(AVA^T\big)^{-1}(y - A\mu)\Big) \\
&= \frac{1}{(2\pi)^{n/2}\sqrt{\det(AVA^T)}}\exp\Big(-\frac{1}{2}(y - A\mu)^T\big(AVA^T\big)^{-1}(y - A\mu)\Big).
\end{align*}
This is the pdf of a multivariate normal rv with mean $A\mu$ and variance-covariance matrix $AVA^T$.


    CHAPTER 2

Exercise 2.1 Suppose that $Y = (Y_1, \ldots, Y_n)$ is a random sample from an $\mathrm{Exp}(\lambda)$ distribution. Then we may write
\[ f_Y(y) = \prod_{i=1}^{n}\lambda e^{-\lambda y_i} = \underbrace{\lambda^n e^{-\lambda\sum_{i=1}^{n} y_i}}_{g_{\lambda}(T(y))}\cdot\underbrace{1}_{h(y)}. \]
It follows that $T(Y) = \sum_{i=1}^{n} Y_i$ is a sufficient statistic for $\lambda$.

Exercise 2.2 Suppose that $Y = (Y_1, \ldots, Y_n)$ is a random sample from an $\mathrm{Exp}(\lambda)$ distribution. Then the ratio of the joint pdfs at two different realizations of $Y$, $x$ and $y$, is
\[ \frac{f(x;\lambda)}{f(y;\lambda)} = \frac{\lambda^n e^{-\lambda\sum_{i=1}^{n} x_i}}{\lambda^n e^{-\lambda\sum_{i=1}^{n} y_i}} = e^{\lambda\big(\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i\big)}. \]
The ratio is constant (free of $\lambda$) iff $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} x_i$. Hence, by Lemma 2.1, $T(Y) = \sum_{i=1}^{n} Y_i$ is a minimal sufficient statistic for $\lambda$.

Exercise 2.3 The $Y_i$ are identically distributed, hence have the same expectation, say $\mathrm{E}(Y)$, for all $i = 1, \ldots, n$. Here, for $y \in [0, \theta]$, we have:
\[ \mathrm{E}(Y) = \frac{2}{\theta^2}\int_0^{\theta} y(\theta - y)\,dy = \frac{2}{\theta^2}\int_0^{\theta}(\theta y - y^2)\,dy = \frac{2}{\theta^2}\Big[\frac{\theta}{2}y^2 - \frac{1}{3}y^3\Big]_0^{\theta} = \frac{1}{3}\theta. \]

Bias:
\[ \mathrm{E}(T(Y)) = \mathrm{E}(3\bar{Y}) = 3\,\frac{1}{n}\sum_{i=1}^{n}\mathrm{E}(Y_i) = 3\,\frac{1}{n}\,n\,\frac{1}{3}\theta = \theta. \]
That is, $\mathrm{bias}(T(Y)) = \mathrm{E}(T(Y)) - \theta = 0$.

Variance: The $Y_i$ are identically distributed, hence have the same variance, say $\mathrm{var}(Y)$, for all $i = 1, \ldots, n$,
\[ \mathrm{var}(Y) = \mathrm{E}(Y^2) - [\mathrm{E}(Y)]^2. \]
We need to calculate $\mathrm{E}(Y^2)$:
\[ \mathrm{E}(Y^2) = \frac{2}{\theta^2}\int_0^{\theta} y^2(\theta - y)\,dy = \frac{2}{\theta^2}\int_0^{\theta}(\theta y^2 - y^3)\,dy = \frac{2}{\theta^2}\Big[\frac{\theta}{3}y^3 - \frac{1}{4}y^4\Big]_0^{\theta} = \frac{1}{6}\theta^2. \]
Hence $\mathrm{var}(Y) = \mathrm{E}(Y^2) - [\mathrm{E}(Y)]^2 = \frac{1}{6}\theta^2 - \frac{1}{9}\theta^2 = \frac{\theta^2}{18}$. This gives
\[ \mathrm{var}(T(Y)) = 9\,\mathrm{var}(\bar{Y}) = 9\,\frac{1}{n^2}\sum_{i=1}^{n}\mathrm{var}(Y_i) = 9\,\frac{1}{n^2}\,n\,\frac{\theta^2}{18} = \frac{\theta^2}{2n}. \]


Consistency:

$T(Y)$ is unbiased, so it is enough to check whether its variance tends to zero as $n$ tends to infinity. Indeed, as $n \to \infty$ we have $\frac{\theta^2}{2n} \to 0$, that is, $T(Y) = 3\bar{Y}$ is a consistent estimator of $\theta$.
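These bias, variance and consistency claims can be illustrated by simulation; $Y$ can be drawn by inversion since $F(y) = 1 - (1 - y/\theta)^2$ on $[0, \theta]$. A sketch ($\theta$, $n$ and the seed are arbitrary choices):

```python
import numpy as np

# Y has pdf f(y) = 2(theta - y)/theta^2 on [0, theta]; by inversion Y = theta*(1 - sqrt(1 - U)).
rng = np.random.default_rng(4)
theta, n, reps = 5.0, 20, 50_000

u = rng.random((reps, n))
y = theta * (1 - np.sqrt(1 - u))
t = 3 * y.mean(axis=1)                 # T(Y) = 3*Ybar, one value per simulated sample

print(t.mean(), theta)                 # unbiased: close to theta
print(t.var(), theta**2 / (2 * n))     # variance close to theta^2/(2n)
```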

Exercise 2.4 We have $X_i \overset{iid}{\sim} \mathrm{Bern}(p)$ for $i = 1, \ldots, n$. Also, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

(a) For an estimator $\hat{\theta}$ to be consistent for $\theta$ we require that $\mathrm{MSE}(\hat{\theta}) \to 0$ as $n \to \infty$. We have
\[ \mathrm{MSE}(\hat{\theta}) = \mathrm{var}(\hat{\theta}) + [\mathrm{bias}(\hat{\theta})]^2. \]
We will now calculate the variance and bias of $\hat{p} = \bar{X}$.
\[ \mathrm{E}(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n}\mathrm{E}(X_i) = \frac{1}{n}\,np = p. \]
Hence $\bar{X}$ is an unbiased estimator of $p$.
\[ \mathrm{var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{var}(X_i) = \frac{1}{n^2}\,npq = \frac{1}{n}\,pq. \]
Hence, $\mathrm{MSE}(\bar{X}) = \frac{1}{n}pq \to 0$ as $n \to \infty$, that is, $\bar{X}$ is a consistent estimator of $p$.

(b) The estimator $\hat{\theta}$ is asymptotically unbiased for $\theta$ if $\mathrm{E}(\hat{\theta}) \to \theta$ as $n \to \infty$, that is, the bias tends to zero as $n \to \infty$. Here we have
\[ \mathrm{E}(\hat{p}\hat{q}) = \mathrm{E}[\bar{X}(1 - \bar{X})] = \mathrm{E}[\bar{X} - \bar{X}^2] = \mathrm{E}[\bar{X}] - \mathrm{E}[\bar{X}^2]. \]
Note that we can write
\[ \mathrm{E}[\bar{X}^2] = \mathrm{var}(\bar{X}) + [\mathrm{E}(\bar{X})]^2 = \frac{1}{n}pq + p^2. \]
That is,
\[ \mathrm{E}(\hat{p}\hat{q}) = p - \frac{1}{n}pq - p^2 = \frac{pq(n-1)}{n} \to pq \quad\text{as } n \to \infty. \]
Hence, the estimator is asymptotically unbiased for $pq$.

Exercise 2.5 Here we have a single parameter $p$ and $g(p) = p$. By definition the $\mathrm{CRLB}(p)$ is
\[ \mathrm{CRLB}(p) = \frac{\Big(\frac{dg(p)}{dp}\Big)^2}{-\mathrm{E}\Big[\frac{d^2\log P(Y=y;\,p)}{dp^2}\Big]}, \qquad (1) \]
where the joint pmf of $Y = (Y_1, \ldots, Y_n)^T$, with $Y_i \sim \mathrm{Bern}(p)$ independently, is
\[ P(Y = y; p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = p^{\sum_{i=1}^{n} y_i}(1-p)^{n - \sum_{i=1}^{n} y_i}. \]
For the numerator of (1) we get $\frac{dg(p)}{dp} = 1$. Further, for brevity denote $P = P(Y = y; p)$. For the denominator of (1) we calculate
\[ \log P = \sum_{i=1}^{n} y_i\log p + \Big(n - \sum_{i=1}^{n} y_i\Big)\log(1 - p) \]


and
\[ \frac{d\log P}{dp} = \frac{\sum_{i=1}^{n} y_i}{p} + \frac{\sum_{i=1}^{n} y_i - n}{1 - p}, \qquad \frac{d^2\log P}{dp^2} = -\frac{\sum_{i=1}^{n} y_i}{p^2} + \frac{\sum_{i=1}^{n} y_i - n}{(1 - p)^2}. \]
Hence, since $\mathrm{E}(Y_i) = p$ for all $i$, we get
\[ \mathrm{E}\Big[\frac{d^2\log P}{dp^2}\Big] = -\frac{\mathrm{E}\big[\sum_{i=1}^{n} Y_i\big]}{p^2} + \frac{\mathrm{E}\big[\sum_{i=1}^{n} Y_i\big] - n}{(1 - p)^2} = -\frac{np}{p^2} - \frac{n - np}{(1 - p)^2} = -\frac{n}{p(1 - p)}. \]
Hence, $\mathrm{CRLB}(p) = \frac{p(1-p)}{n}$.

Since $\mathrm{var}(\bar{Y}) = \frac{p(1-p)}{n}$, it means that $\mathrm{var}(\bar{Y})$ achieves the bound and so $\bar{Y}$ has the minimum variance among all unbiased estimators of $p$.

Exercise 2.6 From lectures, we know that the joint pdf of independent normal r.v.s is
\[ f(y; \mu, \sigma^2) = (2\pi\sigma^2)^{-\frac{n}{2}}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\Big). \]
Denote $f = f(y; \mu, \sigma^2)$. Taking the log of the pdf we obtain
\[ \log f = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2. \]
Thus, we have
\[ \frac{\partial\log f}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) \qquad\text{and}\qquad \frac{\partial\log f}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2. \]
It follows that
\[ \frac{\partial^2\log f}{\partial\mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2\log f}{\partial\mu\,\partial\sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(y_i - \mu), \qquad \frac{\partial^2\log f}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^{n}(y_i - \mu)^2. \]
Hence, taking the expectation of each of the second derivatives we obtain the Fisher information matrix
\[ M = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}. \]
Now, let $g(\mu, \sigma^2) = \mu + \sigma^2$. Then we have $\partial g/\partial\mu = 1$ and $\partial g/\partial\sigma^2 = 1$. So
\[ \mathrm{CRLB}\big(g(\mu, \sigma^2)\big) = (1, 1)\begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1, 1)\begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{\sigma^2}{n}(1 + 2\sigma^2). \]
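The matrix calculation for the bound can also be done numerically; a small sketch with illustrative values of $n$ and $\sigma^2$ (these particular numbers are not taken from the exercise):

```python
import numpy as np

# CRLB for g(mu, sigma^2) = mu + sigma^2 via the Fisher information matrix above.
n, sigma2 = 25, 2.0                              # illustrative values only

M = np.array([[n / sigma2, 0.0],
              [0.0, n / (2 * sigma2**2)]])       # Fisher information for (mu, sigma^2)
grad = np.array([1.0, 1.0])                      # gradient of g

crlb = grad @ np.linalg.inv(M) @ grad
print(crlb, sigma2 / n * (1 + 2 * sigma2))       # the two values coincide
```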


Exercise 2.7 Suppose that $Y_1, \ldots, Y_n$ are independent $\mathrm{Poisson}(\lambda)$ random variables. Then we know that $T = \sum_{i=1}^{n} Y_i$ is a sufficient statistic for $\lambda$.

Now, we need to find out what the distribution of $T$ is. We showed in Exercise 1.10 that the mgf of a $\mathrm{Poisson}(\lambda)$ rv is
\[ M_Y(z) = e^{\lambda(e^z - 1)}. \]
Hence, we may write (we use $z$ so as not to confuse it with the values of $T$, denoted by $t$)
\[ M_T(z) = \prod_{i=1}^{n} M_{Y_i}(z) = \prod_{i=1}^{n} e^{\lambda(e^z - 1)} = e^{n\lambda(e^z - 1)}. \]
Hence, $T \sim \mathrm{Poisson}(n\lambda)$, and so its probability mass function is
\[ P(T = t) = \frac{(n\lambda)^t e^{-n\lambda}}{t!}, \qquad t = 0, 1, \ldots \]
Next, suppose that
\[ \mathrm{E}\{h(T)\} = \sum_{t=0}^{\infty} h(t)\frac{(n\lambda)^t e^{-n\lambda}}{t!} = 0 \]
for all $\lambda > 0$. Then we have
\[ \sum_{t=0}^{\infty} \frac{h(t)}{t!}(n\lambda)^t = 0 \]
for all $\lambda > 0$. Thus, every coefficient $h(t)/t!$ of this power series is zero, so that $h(t) = 0$ for all $t = 0, 1, 2, \ldots$.

Since $T$ takes on the values $t = 0, 1, 2, \ldots$ with probability 1, it means that
\[ P_{\lambda}\{h(T) = 0\} = 1 \]
for all $\lambda$. Hence, $T = \sum_{i=1}^{n} Y_i$ is a complete sufficient statistic.

Exercise 2.8 $S = \sum_{i=1}^{n} Y_i$ is a complete sufficient statistic for $\lambda$. We have seen that $T = \bar{Y} = S/n$ is an MVUE for $\lambda$. Now, we will find a unique MVUE of $\theta = \lambda^2$. We have
\[ \mathrm{E}(T^2) = \mathrm{E}\Big(\frac{1}{n^2}S^2\Big) = \frac{1}{n^2}\mathrm{E}(S^2) = \frac{1}{n^2}\big(\mathrm{var}(S) + [\mathrm{E}(S)]^2\big) = \frac{1}{n^2}\big(n\lambda + n^2\lambda^2\big) = \frac{1}{n}\lambda + \lambda^2 = \frac{1}{n}\mathrm{E}(T) + \lambda^2. \]
It means that
\[ \mathrm{E}\Big[T^2 - \frac{1}{n}T\Big] = \lambda^2, \]
i.e., $T^2 - \frac{1}{n}T = \bar{Y}^2 - \frac{1}{n}\bar{Y}$ is an unbiased estimator of $\lambda^2$. It is a function of a complete sufficient statistic, hence it is the unique MVUE of $\lambda^2$.
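A simulation check that $\bar{Y}^2 - \bar{Y}/n$ is indeed unbiased for $\lambda^2$ ($\lambda$, $n$ and the seed are arbitrary choices):

```python
import numpy as np

# For Y_1, ..., Y_n iid Poisson(lambda), E[Ybar^2 - Ybar/n] should equal lambda^2.
rng = np.random.default_rng(5)
lam, n, reps = 3.0, 10, 200_000

y = rng.poisson(lam, size=(reps, n))
ybar = y.mean(axis=1)
estimate = ybar**2 - ybar / n

print(estimate.mean(), lam**2)   # both close to 9
```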


Exercise 2.9 We may write
\[ P(Y = y; \lambda) = \frac{\lambda^y e^{-\lambda}}{y!} = \frac{1}{y!}\exp\{y\log\lambda - \lambda\} = \frac{1}{y!}\exp\{(\log\lambda)y - \lambda\}. \]
Thus, we have $a(\lambda) = \log\lambda$, $b(y) = y$, $c(\lambda) = -\lambda$ and $h(y) = \frac{1}{y!}$. That is, $P(Y = y; \lambda)$ has a representation of the form required by Definition 2.10.

Exercise 2.10 (a) Here, for $y > 0$, we have
\begin{align*}
f(y \mid \alpha, \lambda) &= \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,y^{\alpha-1}e^{-\lambda y} = \exp\Big(\log\frac{\lambda^{\alpha}}{\Gamma(\alpha)} + (\alpha - 1)\log y - \lambda y\Big) \\
&= \exp\Big(-\lambda y + (\alpha - 1)\log y + \log\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\Big).
\end{align*}
This has the required form of Definition 2.10, where $p = 2$ and
\[ a_1(\alpha, \lambda) = -\lambda, \quad a_2(\alpha, \lambda) = \alpha - 1, \quad b_1(y) = y, \quad b_2(y) = \log y, \quad c(\alpha, \lambda) = \log\frac{\lambda^{\alpha}}{\Gamma(\alpha)}, \quad h(y) = 1. \]

(b) By Theorem 2.8 (lecture notes) we have that
\[ S_1(Y) = \sum_{i=1}^{n} Y_i \qquad\text{and}\qquad S_2(Y) = \sum_{i=1}^{n}\log Y_i \]
are the joint complete sufficient statistics for $\alpha$ and $\lambda$.

Exercise 2.11 To obtain the Method of Moments estimators we compare the population and the sample moments. For a one-parameter distribution we obtain $\hat{\theta}$ as the solution of:
\[ \mathrm{E}(Y) = \bar{Y}. \qquad (2) \]
Here, for $y \in [0, \theta]$, we have:
\[ \mathrm{E}(Y) = \frac{2}{\theta^2}\int_0^{\theta} y(\theta - y)\,dy = \frac{2}{\theta^2}\int_0^{\theta}(\theta y - y^2)\,dy = \frac{2}{\theta^2}\Big[\frac{\theta}{2}y^2 - \frac{1}{3}y^3\Big]_0^{\theta} = \frac{1}{3}\theta. \]


Then by (2) we get the method of moments estimator of $\theta$:
\[ \hat{\theta} = 3\bar{Y}. \]

Exercise 2.12 (a) First, we will show that the distribution belongs to an exponential family. Here, for $y > 0$ and known $\alpha$, we have
\begin{align*}
f(y \mid \alpha, \lambda) &= \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,y^{\alpha-1}e^{-\lambda y} = y^{\alpha-1}\,\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,e^{-\lambda y} \\
&= y^{\alpha-1}\exp\Big(\log\frac{\lambda^{\alpha}}{\Gamma(\alpha)} - \lambda y\Big) = y^{\alpha-1}\exp\Big(-\lambda y + \log\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\Big).
\end{align*}
This has the required form of Definition 2.11, where $p = 1$ and
\[ a(\lambda) = -\lambda, \qquad b(y) = y, \qquad c(\lambda) = \log\frac{\lambda^{\alpha}}{\Gamma(\alpha)}, \qquad h(y) = y^{\alpha-1}. \]
By Theorem 2.8 (lecture notes) we have that
\[ S(Y) = \sum_{i=1}^{n} Y_i \]
is the complete sufficient statistic for $\lambda$.

(b) The likelihood function is
\[ L(\lambda; y) = \prod_{i=1}^{n}\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,y_i^{\alpha-1}e^{-\lambda y_i} = \Big(\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\Big)^n e^{(\alpha-1)\sum_{i=1}^{n}\log y_i}\,e^{-\lambda\sum_{i=1}^{n} y_i}. \]
Then the log-likelihood is
\[ l(\lambda; y) = \log L(\lambda; y) = n\alpha\log\lambda - n\log\Gamma(\alpha) + (\alpha - 1)\sum_{i=1}^{n}\log y_i - \lambda\sum_{i=1}^{n} y_i. \]
Then, we obtain the following derivative of $l(\lambda; y)$ with respect to $\lambda$:
\[ \frac{dl}{d\lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^{n} y_i. \]
This, set to zero, gives
\[ \hat{\lambda} = \frac{n\alpha}{\sum_{i=1}^{n} y_i} = \frac{\alpha}{\bar{y}}. \]


Hence, $\mathrm{MLE}(\lambda) = \alpha/\bar{Y}$. So, we get
\[ \mathrm{MLE}[g(\lambda)] = \mathrm{MLE}\Big[\frac{1}{\lambda}\Big] = \frac{1}{\mathrm{MLE}(\lambda)} = \frac{\bar{Y}}{\alpha} = \frac{1}{\alpha n}\sum_{i=1}^{n} Y_i = \frac{1}{\alpha n}S(Y). \]
That is, $\mathrm{MLE}[g(\lambda)]$ is a function of the complete sufficient statistic.

(c) To show that it is an unbiased estimator of $g(\lambda)$ we calculate:
\[ \mathrm{E}\big[\widehat{g(\lambda)}\big] = \mathrm{E}\Big[\frac{1}{\alpha n}\sum_{i=1}^{n} Y_i\Big] = \frac{1}{\alpha n}\sum_{i=1}^{n}\mathrm{E}(Y_i) = \frac{1}{\alpha n}\,n\,\frac{\alpha}{\lambda} = \frac{1}{\lambda}. \]
It is an unbiased estimator and a function of a complete sufficient statistic, hence, by Corollary 2.2 (given in Lectures), it is the unique $\mathrm{MVUE}(g(\lambda))$.

Exercise 2.13 (a) The likelihood is
\[ L(\beta_0, \beta_1; y) = (2\pi\sigma^2)^{-\frac{n}{2}}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\Big). \]
Now, maximizing this is equivalent to minimizing
\[ S(\beta_0, \beta_1) = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2, \]
which is the criterion we use to find the least squares estimators. Hence, the maximum likelihood estimators are the same as the least squares estimators.

(c) The estimates of $\beta_0$ and $\beta_1$ are
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x} = 94.123, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = -1.266. \]
Hence the estimate of the mean response at a given $x$ is
\[ \widehat{\mathrm{E}}(Y \mid x) = 94.123 - 1.266x. \]
For the temperature of $x = 40$ degrees we obtain the estimate of the expected hardness equal to $\widehat{\mathrm{E}}(Y \mid x = 40) = 43.483$.
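The least squares formulas in (c) are easy to evaluate once the data are at hand. The hardness/temperature data of the exercise are not reproduced here, so the arrays below are purely hypothetical placeholders; a sketch:

```python
import numpy as np

# Simple linear regression estimates; x and y are hypothetical placeholder data,
# not the hardness/temperature data of the exercise.
x = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0])
y = np.array([68.0, 62.5, 55.9, 50.1, 43.7, 37.2])

n = len(x)
b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)          # estimates of beta_0 and beta_1
print(b0 + b1 * 40)    # estimated mean response at x = 40
```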

Exercise 2.14 The LS estimator of $\beta_1$ is
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}. \]
We will see that it has a normal distribution and is unbiased, and we will find its variance. Now, normality is clear from the fact that we may write
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i - \bar{x}\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \sum_{i=1}^{n}\frac{(x_i - \bar{x})}{S_{xx}}\,Y_i, \]


where $S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$, so that $\hat{\beta}_1$ is a linear function of $Y_1, \ldots, Y_n$, each of which is normally distributed. Next, we have
\begin{align*}
\mathrm{E}(\hat{\beta}_1) &= \frac{1}{S_{xx}}\sum_{i=1}^{n}(x_i - \bar{x})\,\mathrm{E}(Y_i) = \frac{1}{S_{xx}}\Big(\sum_{i=1}^{n} x_i\,\mathrm{E}(Y_i) - \bar{x}\sum_{i=1}^{n}\mathrm{E}(Y_i)\Big) \\
&= \frac{1}{S_{xx}}\Big(\sum_{i=1}^{n} x_i(\beta_0 + \beta_1 x_i) - \bar{x}\sum_{i=1}^{n}(\beta_0 + \beta_1 x_i)\Big) \\
&= \frac{1}{S_{xx}}\Big(n\beta_0\bar{x} + \beta_1\sum_{i=1}^{n} x_i^2 - n\beta_0\bar{x} - n\beta_1\bar{x}^2\Big) \\
&= \frac{1}{S_{xx}}\Big(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\Big)\beta_1 = \frac{1}{S_{xx}}S_{xx}\beta_1 = \beta_1.
\end{align*}
Finally, since the $Y_i$ are independent, we have
\[ \mathrm{var}(\hat{\beta}_1) = \frac{1}{S_{xx}^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\,\mathrm{var}(Y_i) = \frac{1}{S_{xx}^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sigma^2 = \frac{1}{S_{xx}^2}\,S_{xx}\,\sigma^2 = \frac{\sigma^2}{S_{xx}}. \]
Hence, $\hat{\beta}_1 \sim N(\beta_1, \sigma^2/S_{xx})$ and a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
\[ \hat{\beta}_1 \mp t_{n-2,\,\frac{\alpha}{2}}\sqrt{\frac{S^2}{S_{xx}}}, \]
where $S^2 = \frac{1}{n-2}\sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$ is the MVUE for $\sigma^2$.

Exercise 2.15 (a) The likelihood is
\[ L(\theta; y) = \prod_{i=1}^{n}\theta y_i^{\theta-1} = \theta^n\Big(\prod_{i=1}^{n} y_i\Big)^{\theta-1}, \]
and so the log-likelihood is
\[ \ell(\theta; y) = n\log\theta + (\theta - 1)\log\Big(\prod_{i=1}^{n} y_i\Big). \]
Thus, solving the equation
\[ \frac{d\ell}{d\theta} = \frac{n}{\theta} + \log\Big(\prod_{i=1}^{n} y_i\Big) = 0, \]
we obtain the maximum likelihood estimator of $\theta$ as $\hat{\theta} = -n/\log\big(\prod_{i=1}^{n} Y_i\big)$.

(b) Since
\[ \frac{d^2\ell}{d\theta^2} = -\frac{n}{\theta^2}, \]


we have
\[ \mathrm{CRLB}(\theta) = \frac{1}{-\mathrm{E}\Big[\frac{d^2\ell}{d\theta^2}\Big]} = \frac{\theta^2}{n}. \]
Thus, for large $n$, $\hat{\theta} \sim \mathrm{AN}(\theta, \theta^2/n)$ approximately.

(c) Here we have to replace $\mathrm{CRLB}(\theta)$ with its estimator to obtain the approximate pivot
\[ Q(Y, \theta) = \frac{\hat{\theta} - \theta}{\sqrt{\hat{\theta}^2/n}} \overset{\text{approx.}}{\sim} \mathrm{AN}(0, 1). \]
This gives
\[ P\Big(-z_{\frac{\alpha}{2}} < \frac{\hat{\theta} - \theta}{\hat{\theta}/\sqrt{n}} < z_{\frac{\alpha}{2}}\Big) \approx 1 - \alpha. \]


Then, for $z_{\frac{\alpha}{2}}$ such that $P(|Z| < z_{\frac{\alpha}{2}}) = 1 - \alpha$, where $Z \sim N(0, 1)$, we may write


where $a$ is a constant chosen to give significance level $\alpha$. It means that we reject the null hypothesis if
\[ \Big(\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\Big)^{\sum_{i=1}^{n} y_i}\Big(\frac{1 - p_0}{1 - p_1}\Big)^n \le a, \]
which is equivalent to
\[ \Big(\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\Big)^{\sum_{i=1}^{n} y_i} \le b, \]
or, after taking logs of both sides, to
\[ \sum_{i=1}^{n} y_i\,\log\Big(\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\Big) \le c, \]
where $b$ and $c$ are constants chosen to give significance level $\alpha$. When $p_1 > p_0$ we have
\[ \log\Big(\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\Big) < 0. \]
Hence, the critical region can be written as
\[ R = \{y : \bar{y} \ge d\}, \]
for some constant $d$ chosen to give significance level $\alpha$.

By the central limit theorem, we have that (when the null hypothesis is true, i.e., $p = p_0$):
\[ \bar{Y} \sim \mathrm{AN}\Big(p_0, \frac{p_0(1 - p_0)}{n}\Big). \]
Hence,
\[ Z = \frac{\bar{Y} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \sim \mathrm{AN}(0, 1) \]
and we may write
\[ \alpha = P(\bar{Y} \ge d \mid p = p_0) = P(Z \ge z_{\alpha}), \]
where $z_{\alpha} = \dfrac{d - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$. Hence $d = p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}}$ and the critical region is
\[ R = \Big\{y : \bar{y} \ge p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}}\Big\}. \]

(b) The critical region does not depend on $p_1$, hence it is the same for all $p > p_0$ and so there is a uniformly most powerful test for $H_0 : p = p_0$ against $H_1 : p > p_0$.

The power function is
\begin{align*}
\pi(p) = P(Y \in R \mid p) &= P\Big(\bar{Y} \ge p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}}\,\Big|\,p\Big) \\
&= P\Bigg(\frac{\bar{Y} - p}{\sqrt{\frac{p(1-p)}{n}}} \ge \frac{p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} - p}{\sqrt{\frac{p(1-p)}{n}}}\Bigg) = 1 - \Phi\{g(p)\},
\end{align*}
where $g(p) = \dfrac{p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} - p}{\sqrt{\frac{p(1-p)}{n}}}$ and $\Phi$ denotes the cumulative distribution function of the standard normal distribution.


Question 2.18 (a) Let us denote by $Y_i \sim \mathrm{Bern}(p)$ the response of mouse $i$ to the drug candidate.

Then, from Question 1, we have the following critical region
\[ R = \Big\{y : \bar{y} \ge p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}}\Big\}. \]
Here $p_0 = 0.1$, $n = 30$, $\alpha = 0.05$, $z_{\alpha} = 1.6449$. It gives
\[ R = \{y : \bar{y} \ge 0.19\}. \]
From the sample we have $\hat{p} = \bar{y} = \frac{6}{30} = 0.2 \ge 0.19$, so there is evidence to reject the null hypothesis at the significance level $\alpha = 0.05$.

(b) The power function is
\[ \pi(p) = 1 - \Phi\{g(p)\}, \qquad\text{where } g(p) = \frac{p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} - p}{\sqrt{\frac{p(1-p)}{n}}}. \]
When $n = 30$, $p_0 = 0.1$ and $p = 0.2$ we obtain, for $z_{0.05} = 1.6449$,
\[ g(0.2) = -0.1356 \qquad\text{and}\qquad \Phi(-0.1356) = 1 - \Phi(0.1356) = 1 - 0.5539 = 0.4461. \]
It gives the power equal to $\pi(0.2) = 0.5539$. It means that the probability of a type II error is $0.4461$, which is rather high.

This is because the value of the alternative hypothesis is close to the null hypothesis and also the number of observations is not large.

To find what $n$ is needed to get the power $\pi(0.2) = 0.8$ we calculate:
\[ g(p) = \frac{0.1 + z_{0.05}\sqrt{\frac{0.09}{n}} - 0.2}{\sqrt{\frac{0.16}{n}}} = 0.75\,z_{0.05} - 0.25\sqrt{n}. \]
For $\pi(p) = 1 - \Phi\{g(p)\}$ to be equal to $0.8$ we need $\Phi\{g(p)\} = 0.2$. From statistical tables we obtain that $g(p) = -0.8416$. Hence, for $z_{0.05} = 1.6449$,
\[ n = (4\cdot 0.8416 + 3\cdot 1.6449)^2 = 68.9. \]
At least $n = 69$ mice are needed to obtain a test with power as high as 0.8 for detecting that the proportion is 0.2 rather than 0.1.
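The numbers in this question can be reproduced directly from the critical region and the power function above; a sketch assuming scipy:

```python
import numpy as np
from scipy.stats import norm

# Critical region, power at p = 0.2 and required sample size for power 0.8.
p0, p1, n, alpha = 0.1, 0.2, 30, 0.05
z = norm.ppf(1 - alpha)                               # 1.6449

d = p0 + z * np.sqrt(p0 * (1 - p0) / n)               # cut-off for ybar: about 0.19
g = (d - p1) / np.sqrt(p1 * (1 - p1) / n)             # about -0.1356
power = 1 - norm.cdf(g)                               # about 0.554

n_req = (4 * norm.ppf(0.8) + 3 * z) ** 2              # about 68.9, using the formula above
print(d, power, int(np.ceil(n_req)))                  # so at least 69 mice
```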