further inference topics

Upload: thrphys1940

Post on 02-Jun-2018


  • 8/11/2019 Further Inference Topics


    EC114 Introduction to Quantitative Economics

    10. Further Inference Topics

    Department of Economics, University of Essex

    13/15 December 2011


    Outline

    1 Correlations and Independence

    2 Tests About Two Populations

    Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, Sections 6.3–6.4.


    Correlations and Independence 3/31

    Although the covariance can tell us whether there is a positive or a negative linear association between X and Y, it tells us nothing about the strength of this association. For example, what constitutes a large linear association, whether positive or negative?

    The correlation coefficient, $\rho$, provides such information, and is defined as

    $$\rho = \frac{\mathrm{Cov}(X,Y)}{\sqrt{V(X)\,V(Y)}}.$$

    The usefulness of the correlation coefficient lies in the fact that, unlike the covariance, it can only take values within a definite, finite range.


    Correlations and Independence 4/31

    While a covariance can take any value between $-\infty$ and $+\infty$, the correlation is restricted to values within the range $-1$ to $+1$.

    When there is an exact (perfect) positive linear association between X and Y, the correlation takes the value $\rho = +1$.

    Similarly, when there is an exact (perfect) negative linear association between X and Y, the correlation is $\rho = -1$. Furthermore, when there is no linear association between X and Y at all, then $\rho = 0$.

    The correlation coefficient gives us a standard by which we can judge the strength of any linear association between two variables. Clearly, if $\rho$ were to take a value close to zero, we would judge the association to be a very weak one.

    However, values close to $+1$ or $-1$ would imply strong positive and negative linear associations, respectively.


    Correlations and Independence 5/31

    For the die-rolling example, we have already computed the covariance as $-1.0625$, whereas we found $V(X) = 17.19$ and $V(Y) = 0.94$. Hence, we obtain the correlation as:

    $$\rho = \frac{-1.0625}{\sqrt{17.19 \times 0.94}} = -0.26.$$

    The correlation is negative, as expected, but the value of $\rho$ is rather closer to 0 than to $-1$. So we can say that there is a fairly weak negative linear association between X and Y in this case.

    This is not unexpected because, intuitively, we would not

    expect a close relationship between X, the product of the

    two numbers on the two dice, and Y, their difference.
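    The population figures above can be reproduced with the short Python sketch below. The slides do not restate the experiment here, so the sketch assumes two fair four-sided dice, with X the product and Y the absolute difference of the two numbers. Under that assumption it recovers exactly the variances and covariance quoted above. It uses the shortcut Cov(X,Y) = E(XY) − E(X)E(Y) and exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Assumption: two fair four-sided dice; X = product, Y = absolute difference.
faces = range(1, 5)
outcomes = [(a * b, abs(a - b)) for a, b in product(faces, faces)]
prob = Fraction(1, len(outcomes))  # all 16 outcomes are equally likely


def expect(h):
    """Expected value of h(X, Y) over the equally likely outcomes."""
    return sum(prob * h(x, y) for x, y in outcomes)


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov_xy = expect(lambda x, y: x * y) - ex * ey  # Cov(X,Y) = E(XY) - E(X)E(Y)

# Correlation: the covariance scaled by the product of the standard deviations.
rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))

print(float(var_x), float(var_y), float(cov_xy), round(rho, 3))
# → 17.1875 0.9375 -1.0625 -0.265
```

    With unrounded variances the correlation is about −0.265, which rounds to the −0.26 obtained above.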


    Correlations and Independence 6/31

    Recalling that $\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$, we can write $\rho$ as

    $$\rho = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)\,V(Y)}}.$$

    It follows that, if $E(XY) = E(X)E(Y)$, then $\rho = 0$, i.e. the correlation between X and Y will be zero.

    But this refers to linear relationships, and in Economics, relationships are not always linear.

    Hence we often need to discover whether any non-linear relationships between variables are present.


    Correlations and Independence 7/31

    In the lecture on probability (Lecture 2), we saw that independence between two events A and B implied that $\Pr(A \text{ and } B) = \Pr(A)\Pr(B)$. That is, the joint probability of two independent events occurring is given by the product of the marginal probabilities of the individual events.

    Two random variables, X and Y, are said to be independent if the joint probabilities are the product of the relevant marginal probabilities for all possible combinations of X and Y.

    That is, X and Y are independent if and only if $p(X, Y) = f(X)\,g(Y)$ for all X and Y.


    Correlations and Independence 8/31

    Consider the following two independent random variables, X and Y, whose joint and marginal distributions are:

    Y\X       1      2      3      4      5   g(Y)
       5   0.06   0.04   0.04   0.04   0.02   0.20
      10   0.09   0.06   0.06   0.06   0.03   0.30
      15   0.15   0.10   0.10   0.10   0.05   0.50
    f(X)   0.30   0.20   0.20   0.20   0.10

    Notice that the relationship $p(X, Y) = f(X)g(Y)$ holds for all combinations of X and Y.

    For example, $p(4, 5) = f(4)g(5) = 0.20 \times 0.20 = 0.04$ and $p(1, 10) = f(1)g(10) = 0.30 \times 0.30 = 0.09$.

    If just one combination of X and Y were to fail to obey the condition $p(X, Y) = f(X)g(Y)$, then the variables could no longer be called independent.
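    The product rule can be checked cell by cell. The Python sketch below encodes the joint distribution from the table, rebuilds the marginals f(X) and g(Y) by summing over the other variable, and then tests $p(X, Y) = f(X)g(Y)$ for every combination. The helper names `marginals` and `independent` are illustrative only. A small tolerance is needed because decimal probabilities are not stored exactly in floating point.

```python
from math import isclose

xs = [1, 2, 3, 4, 5]
rows = {  # Y value -> p(X, Y) for X = 1..5, from the table
    5: [0.06, 0.04, 0.04, 0.04, 0.02],
    10: [0.09, 0.06, 0.06, 0.06, 0.03],
    15: [0.15, 0.10, 0.10, 0.10, 0.05],
}
joint = {(x, y): pr for y, row in rows.items() for x, pr in zip(xs, row)}


def marginals(joint):
    """Return f(X) (summing over Y) and g(Y) (summing over X)."""
    f, g = {}, {}
    for (x, y), pr in joint.items():
        f[x] = f.get(x, 0.0) + pr
        g[y] = g.get(y, 0.0) + pr
    return f, g


def independent(joint, tol=1e-9):
    """True if p(x, y) = f(x) g(y) holds for every cell, up to rounding."""
    f, g = marginals(joint)
    return all(isclose(pr, f[x] * g[y], abs_tol=tol) for (x, y), pr in joint.items())


# Move 0.01 of probability between two cells: the total is still 1,
# but the product rule now fails for at least one combination.
altered = dict(joint)
altered[(1, 5)] += 0.01
altered[(2, 5)] -= 0.01

print(independent(joint), independent(altered))  # → True False
```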


    Correlations and Independence 9/31

    While zero correlation ($\rho = 0$) implies the absence of any linear association between X and Y, independence is a stronger condition. Independence implies the absence of any association between X and Y, linear or nonlinear.

    Hence independence implies zero correlation, but zero correlation does not necessarily imply independence.

    Thus the condition $E(XY) = E(X)E(Y)$ implies $\mathrm{Cov}(X,Y) = 0$, and hence zero correlation, but does not necessarily imply independence.

    For independence we also require $p(X, Y) = f(X)g(Y)$ for all X and Y.

    Thus, if two variables are uncorrelated, they are not linearly associated, but they could still be dependent if some nonlinear (possibly weak) association were present.
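    A standard textbook illustration (not taken from these slides) makes the last point concrete. Let X take the values −1, 0 and 1 with equal probability and set Y = X². Y is completely determined by X, yet the covariance is zero because the relationship is symmetric rather than linear. The Python sketch below checks both facts with exact fractions.

```python
from fractions import Fraction

# Hypothetical example: X in {-1, 0, 1} with equal probability, Y = X**2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}  # all other cells have p = 0


def expect(h):
    """Expected value of h(X, Y) under the joint distribution."""
    return sum(pr * h(x, y) for (x, y), pr in joint.items())


cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Marginal probabilities f(1) = Pr(X = 1) and g(1) = Pr(Y = 1).
f1 = sum(pr for (x, _), pr in joint.items() if x == 1)
g1 = sum(pr for (_, y), pr in joint.items() if y == 1)

print(cov)                     # → 0   (so X and Y are uncorrelated)
print(joint[(1, 1)], f1 * g1)  # → 1/3 2/9   (p(1, 1) != f(1) g(1), so dependent)
```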


    Correlations and Independence 10/31

    Although the discussion here has been restricted to discrete variables, the concepts of independence and correlation apply equally well to continuous variables.

    However, it can be shown that the distinction between the two concepts disappears when the variables are jointly (bivariate) normally distributed.

    If two jointly normally distributed variables, X and Y, are uncorrelated, then they must be independent.

    Another useful property of normally distributed variables is

    given by the following theorem.

    Theorem

    Any linear function of a series of independently and normally

    distributed variables is itself normally distributed.

    Example: if X, Y and Z are all independent normally distributed random variables, then it follows that $W = 2X + 4Y - 3Z$ will also be normally distributed.
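    A quick simulation can illustrate the theorem, though it does not prove it. The Python sketch below draws independent normal values for X, Y and Z with hypothetical parameters chosen only for illustration, and forms $W = 2X + 4Y - 3Z$. It then compares the sample mean and variance with $E(W) = 2E(X) + 4E(Y) - 3E(Z)$ and $V(W) = 4V(X) + 16V(Y) + 9V(Z)$, which hold for independent variables. As a rough check of normality it also reports the share of draws within 1.96 standard deviations of the mean. That share should be close to 95%.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 100_000

# Hypothetical parameters: X ~ N(1, 1), Y ~ N(2, 1), Z ~ N(0, 1), all independent.
w = [2 * rng.gauss(1, 1) + 4 * rng.gauss(2, 1) - 3 * rng.gauss(0, 1) for _ in range(n)]

mean_w = sum(w) / n
var_w = sum((v - mean_w) ** 2 for v in w) / n
sd_w = var_w ** 0.5

# Theory: E(W) = 2(1) + 4(2) - 3(0) = 10 and V(W) = 4(1) + 16(1) + 9(1) = 29.
# If W is normal, about 95% of draws should lie within 1.96 SD of the mean.
share = sum(abs(v - mean_w) < 1.96 * sd_w for v in w) / n

print(round(mean_w, 2), round(var_w, 2), round(share, 3))
```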


    Correlations and Independence 11/31

    Correlations can also be computed for samples.

    The sample correlation coefficient, R, is defined as

    $$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}.$$

    As with the population correlation we find that

    $$-1 \le R \le 1,$$

    with the same interpretation of values, e.g. $R = -1$ implies a perfect negative correlation between X and Y, etc. To compute R we don't need to worry about normalising anything by $n$ or $n - 1$.


    Correlations and Independence 12/31

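    To see this, note that

    $$\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - n\bar{X}\bar{Y}, \qquad \sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2, \qquad \sum (Y - \bar{Y})^2 = \sum Y^2 - n\bar{Y}^2.$$

    As an example, a sample of 10 trials of the two-dice experiment yields the following values for X and Y:

    X    3    2   12    3    4    4    6   12    4    8
    Y    2    1    1    2    0    3    1    1    0    2

    From these values we obtain:

    $\sum X = 58$, $\sum X^2 = 458$, $\sum XY = 72$, $\sum Y = 13$, $\sum Y^2 = 25$.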


    Correlations and Independence 13/31


    Hence $\bar{X} = 58/10 = 5.8$, $\bar{Y} = 13/10 = 1.3$ and so

    $$\sum (X - \bar{X})(Y - \bar{Y}) = 72 - 10(5.8)(1.3) = -3.40,$$

    $$\sum (X - \bar{X})^2 = 458 - 10(5.8)^2 = 121.60,$$

    $$\sum (Y - \bar{Y})^2 = 25 - 10(1.3)^2 = 8.10.$$

    It follows that

    $$R = \frac{-3.40}{\sqrt{121.60 \times 8.10}} = -0.108,$$

    suggesting that there is a weak negative linear relationship between X and Y.

    (Note that the population correlation is $\rho = -0.26$.)
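    The calculation can be reproduced with the Python sketch below. It uses the ten observations listed above and the shortcut sums. R is a ratio, so any factor of 1/n or 1/(n − 1) would appear in both the numerator and the denominator and cancel. That is why no normalisation is needed.

```python
from math import sqrt

# Ten trials of the two-dice experiment (data from the slide).
X = [3, 2, 12, 3, 4, 4, 6, 12, 4, 8]
Y = [2, 1, 1, 2, 0, 3, 1, 1, 0, 2]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

# Shortcut forms of the sums of squares and cross-products.
sxy = sum(x * y for x, y in zip(X, Y)) - n * xbar * ybar  # sum of (X - Xbar)(Y - Ybar)
sxx = sum(x * x for x in X) - n * xbar ** 2               # sum of (X - Xbar)^2
syy = sum(y * y for y in Y) - n * ybar ** 2               # sum of (Y - Ybar)^2

R = sxy / sqrt(sxx * syy)
print(round(sxy, 2), round(sxx, 2), round(syy, 2), round(R, 3))  # → -3.4 121.6 8.1 -0.108
```

    On Python 3.10 or later, `statistics.correlation(X, Y)` computes the same Pearson coefficient directly.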


    Tests About Two Populations 14/31


    Often in statistics we need to compare parameter values relating to two or more different populations.

    Consider two cities, A and B. Suppose that a researcher suspects that mean annual income in city B is greater than in city A, and wishes to test whether this is actually the case.

    Let $\mu_1$ and $\mu_2$ denote the population mean incomes in cities A and B respectively.

    As always, we formulate a null hypothesis:

    $H_0: \mu_1 = \mu_2$ (no difference between mean incomes)

    and an alternative hypothesis:

    $H_A: \mu_1 < \mu_2$ (mean income is greater in city B).


    Tests About Two Populations 15/31


    How can we derive a suitable test statistic?

    Let $X_1$ be the annual income of a resident of city A, and $X_2$ the annual income of a resident of city B.

    We therefore have a population of very many values for $X_1$ from city A, with mean $\mu_1$ and variance $\sigma_1^2$.

    Similarly, we have a population of very many values for $X_2$ from city B, with mean $\mu_2$ and variance $\sigma_2^2$.

    Notice that the absolute sizes of the populations in the two cities are unimportant, provided both cities are large.

    We can now apply the Central Limit Theorem (CLT) to both populations in turn.


    Tests About Two Populations 16/31


    Suppose we take a sufficiently large sample of size $n_1$ from city A and compute the sample mean income $\bar{X}_1$. Then the CLT implies that

    $$\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right).$$

    Similarly, taking a sufficiently large sample of size $n_2$ from city B yields the following result for the sample mean $\bar{X}_2$:

    $$\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right).$$

    As we are interested in the difference between the two unknown population means, $\mu_1 - \mu_2$, it makes sense to base any test statistic on the quantity $\bar{X}_1 - \bar{X}_2$, the difference between the two sample means.


    Tests About Two Populations 17/31


    If it were possible to take very many samples, each yielding a value for $\bar{X}_1 - \bar{X}_2$, we would obtain a sampling distribution for $\bar{X}_1 - \bar{X}_2$, from which we could derive a suitable test statistic.

    We therefore need to find the sampling distribution for $\bar{X}_1 - \bar{X}_2$. As both $\bar{X}_1$ and $\bar{X}_2$ are normally distributed, it follows (from the Theorem on slide 10, provided the two samples are independent) that $\bar{X}_1 - \bar{X}_2$ is also normally distributed.

    We therefore need to find the mean and variance of $\bar{X}_1 - \bar{X}_2$.


    Tests About Two Populations 18/31


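    It is straightforward to show that

    $$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2.$$

    If we assume that the two samples are independent (not unreasonable) then $\mathrm{Cov}(\bar{X}_1, \bar{X}_2) = 0$ and so

    $$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

    Hence

    $$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$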

    This is summarised in the diagram on the next slide.
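    The sampling distribution can also be approximated by brute force, mimicking the thought experiment of taking very many samples. In the Python sketch below, the two populations are assumed normal with hypothetical means and standard deviations; these are not the slides' data. Each replication draws samples of size $n_1$ and $n_2$ and records the difference in sample means. The mean and variance of those differences are then compared with $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population parameters (illustrative only).
mu1, sigma1, n1 = 15_000, 1_600, 50
mu2, sigma2, n2 = 17_000, 2_100, 50
reps = 2_000  # number of repeated pairs of samples


def sample_mean(mu, sigma, n):
    """Mean of one simulated sample of size n from N(mu, sigma^2)."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n


diffs = [sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2) for _ in range(reps)]

mean_d = sum(diffs) / reps
var_d = sum((d - mean_d) ** 2 for d in diffs) / (reps - 1)

theory_mean = mu1 - mu2                       # -2000
theory_var = sigma1**2 / n1 + sigma2**2 / n2  # 51200 + 88200 = 139400

print(round(mean_d), round(var_d), theory_mean, theory_var)
```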


    Tests About Two Populations 19/31


    We can use this normal sampling distribution to derive an

    appropriate test statistic.


    Tests About Two Populations 20/31


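    We can now standardise $\bar{X}_1 - \bar{X}_2$ in the usual way by subtracting the mean and dividing by the standard deviation to obtain a $N(0, 1)$ distribution:

    $$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1).$$

    However, we require this distribution under $H_0: \mu_1 = \mu_2$, resulting in the test statistic

    $$TS = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1) \quad \text{under } H_0.$$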


    Tests About Two Populations 21/31


    Recall that $H_A: \mu_1 < \mu_2$, implying that $\mu_1 - \mu_2 < 0$. We therefore have a lower-sided one-tail test.

    Adopting a 5% level of significance, the critical value from the $N(0, 1)$ distribution is $-1.64$ and our test criterion becomes:

    reject $H_0$ if $TS < -1.64$ but reserve judgement if $TS > -1.64$.

    The test criterion is illustrated on the next slide.


    Tests About Two Populations 22/31


    Suppose we take samples of size $n_1 = n_2 = 200$ and obtain:

    $\bar{X}_1 = 14{,}860$, $s_1 = 1655$, $\bar{X}_2 = 17{,}230$, $s_2 = 2108$.

    As $\sigma_1^2$ and $\sigma_2^2$ are unknown we replace them with the unbiased estimators $s_1^2$ and $s_2^2$.


    Tests About Two Populations 23/31


    Then

    $$TS = \frac{14{,}860 - 17{,}230}{\sqrt{\dfrac{1655^2}{200} + \dfrac{2108^2}{200}}} = -12.51.$$

    Using the test criterion we find that $TS < -1.64$ and hence we reject $H_0: \mu_1 = \mu_2$ in favour of $H_A: \mu_1 < \mu_2$, i.e. there is evidence that the mean income in city A is below that in city B.
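    The calculation can be checked with the Python sketch below, which plugs the sample summaries into the large-sample test statistic. It takes the critical value from the standard library's `statistics.NormalDist` rather than from a table. That gives about −1.645, which the slide rounds to −1.64; the decision is the same either way. Unrounded, the statistic is about −12.51.

```python
from math import sqrt
from statistics import NormalDist

# Sample summaries from the slide: city A is sample 1, city B is sample 2.
n1, xbar1, s1 = 200, 14_860, 1_655
n2, xbar2, s2 = 200, 17_230, 2_108

# Large-sample statistic under H0: mu1 = mu2, with s^2 replacing sigma^2.
ts = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Lower-tail 5% critical value of N(0, 1), about -1.645.
critical = NormalDist().inv_cdf(0.05)

decision = "reject H0" if ts < critical else "reserve judgement"
print(round(ts, 2), round(critical, 3), decision)  # → -12.51 -1.645 reject H0
```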


    Tests About Two Populations 24/31


    The preceding results were based on large samples and

    the CLT.

    However, when samples are small we have to make use of the Student's t-distribution.

    Provided that both populations:

    1 are normally distributed, and

    2 have the same variance $\sigma^2$ (i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$),

    then

    $$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0, 1),$$

    even when samples are small.

    But this will not be the case when $\sigma^2$ is unknown and has to be replaced by an estimator of it.


    Tests About Two Populations 25/31


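    When $\sigma^2$ is unknown we can estimate it using

    $$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

    It then follows that

    $$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}.$$

    For example, if $H_0: \mu_1 = \mu_2$ then we can use the test statistic

    $$TS = \frac{\bar{X}_1 - \bar{X}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$

    under $H_0$ and apply the usual testing procedure.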
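    The pooled procedure can be sketched in Python as follows. The slides give no small-sample data here, so the two samples below are hypothetical. `statistics.variance` uses the $n - 1$ divisor, matching $s_1^2$ and $s_2^2$. The resulting TS would then be compared with the critical value of t with $n_1 + n_2 - 2$ d.f. from the tables.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical small samples (illustrative only).
a = [4, 6, 5, 7]
b = [8, 9, 7, 10]
n1, n2 = len(a), len(b)

# Pooled variance: each sample variance (divisor n - 1) weighted by its d.f.
s2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)

# Test statistic under H0: mu1 = mu2, with t distribution on n1 + n2 - 2 d.f.
ts = (mean(a) - mean(b)) / (sqrt(s2) * sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(round(ts, 3), df)  # → -3.286 6
```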


    Tests About Two Populations 27/31


    An example of an F-distribution with 20 d.f. for the

    numerator and 20 d.f. for the denominator is as follows:

    The distribution is strictly positive and not symmetric, so we have to find two critical values for a two-tail test from the following table:


    Tests About Two Populations 28/31


    Upper 2.5% critical values of the F-distribution with v1 degrees of freedom for the numerator and v2 degrees of freedom for the denominator

    v2\v1       1       2       3       4       5       6       7       8       9
        1  647.79  799.50  864.16  899.58  921.85  937.11  948.22  956.66  963.28
        2   38.51   39.00   39.17   39.25   39.30   39.33   39.36   39.37   39.39
        3   17.44   16.04   15.44   15.10   14.88   14.73   14.62   14.54   14.47
        4   12.22   10.65    9.98    9.60    9.36    9.20    9.07    8.98    8.90
        5   10.01    8.43    7.76    7.39    7.15    6.98    6.85    6.76    6.68
        6    8.81    7.26    6.60    6.23    5.99    5.82    5.70    5.60    5.52
        7    8.07    6.54    5.89    5.52    5.29    5.12    4.99    4.90    4.82
        8    7.57    6.06    5.42    5.05    4.82    4.65    4.53    4.43    4.36
        9    7.21    5.71    5.08    4.72    4.48    4.32    4.20    4.10    4.03
       10    6.94    5.46    4.83    4.47    4.24    4.07    3.95    3.85    3.78
       11    6.72    5.26    4.63    4.28    4.04    3.88    3.76    3.66    3.59
       12    6.55    5.10    4.47    4.12    3.89    3.73    3.61    3.51    3.44
       13    6.41    4.97    4.35    4.00    3.77    3.60    3.48    3.39    3.31
       14    6.30    4.86    4.24    3.89    3.66    3.50    3.38    3.29    3.21
       15    6.20    4.77    4.15    3.80    3.58    3.41    3.29    3.20    3.12
       16    6.12    4.69    4.08    3.73    3.50    3.34    3.22    3.12    3.05
       18    5.98    4.56    3.95    3.61    3.38    3.22    3.10    3.01    2.93
       20    5.87    4.46    3.86    3.51    3.29    3.13    3.01    2.91    2.84
       22    5.79    4.38    3.78    3.44    3.22    3.05    2.93    2.84    2.76
       24    5.72    4.32    3.72    3.38    3.15    2.99    2.87    2.78    2.70
       26    5.66    4.27    3.67    3.33    3.10    2.94    2.82    2.73    2.65
       28    5.61    4.22    3.63    3.29    3.06    2.90    2.78    2.69    2.61
       30    5.57    4.18    3.59    3.25    3.03    2.87    2.75    2.65    2.57
       40    5.42    4.05    3.46    3.13    2.90    2.74    2.62    2.53    2.45
       60    5.29    3.93    3.34    3.01    2.79    2.63    2.51    2.41    2.33
      120    5.15    3.80    3.23    2.89    2.67    2.52    2.39    2.30    2.22
        ∞    5.02    3.69    3.12    2.79    2.57    2.41    2.29    2.19    2.11

    NB: Entries are $F_u$ such that $\Pr(F_{v_1,v_2} > F_u) = 0.025$.


    Tests About Two Populations 29/31


    Table A.4 in Thomas provides additional d.f. for the numerator as well as additional significance levels.

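    The table gives the upper-tail critical value, $F_u$. The lower-tail critical value is the reciprocal of the upper-tail value with the degrees of freedom swapped, i.e. $F_l(v_1, v_2) = 1/F_u(v_2, v_1)$. This holds because $1/F$ has an $F_{v_2, v_1}$ distribution when $F$ has an $F_{v_1, v_2}$ distribution.

    For example, with 8 d.f. for the numerator and 30 d.f. for the denominator, the table gives $F_u = 2.65$. The matching lower-tail value is $F_l = 1/F_u(30, 8)$, the reciprocal of the upper 2.5% point with 30 d.f. for the numerator and 8 for the denominator.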

    The test criterion for the test is:

    reject $H_0$ if $TS \ge F_u$ or $TS \le F_l$, but reserve judgement if $F_l < TS < F_u$.


    30/31

    For example, suppose we have two samples yielding:

    $n_1 = 10$, $s_1^2 = 14.5$, $n_2 = 20$, $s_2^2 = 4.8$.

    The resulting test statistic is

    $$TS = \frac{s_1^2}{s_2^2} = \frac{14.5}{4.8} = 3.02$$

    and has an $F_{9,19}$ distribution under the null.

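    For a two-tail test at the 5% level (which puts 2.5% into each tail), the upper critical value is 2.88. The lower-tail value is $1/F_u(19, 9)$, the reciprocal of the upper 2.5% point with 19 d.f. for the numerator and 9 for the denominator.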

    As $TS > 2.88$ we reject the null in favour of the alternative that $\sigma_1^2 \neq \sigma_2^2$.
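    The variance-ratio calculation is easy to script. In this Python sketch, the critical value 2.88 is copied from the slide rather than computed, because the standard library has no F quantile function. The degrees of freedom are $n - 1$ for each sample, numerator first.

```python
# Sample summaries from the slide.
n1, s1_sq = 10, 14.5
n2, s2_sq = 20, 4.8

ts = s1_sq / s2_sq               # F statistic: ratio of the sample variances
df_num, df_den = n1 - 1, n2 - 1  # (9, 19) degrees of freedom

f_upper = 2.88  # upper 2.5% point of F(9, 19), taken from the slide
print(round(ts, 2), (df_num, df_den), ts > f_upper)  # → 3.02 (9, 19) True
```

    If SciPy is available, `scipy.stats.f.ppf(0.975, 9, 19)` should return the upper 2.5% point directly instead of relying on a table.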


    Summary 31/31


    Correlations and independence.

    Tests about two populations.

    Next term:

    Econometrics (but first enjoy the vacation. . . )
