Analogues Slides

Uploaded by christopher-gian, 07-Apr-2018

TRANSCRIPT

  • 8/6/2019 Analogues Slides

    1/44

    Probability and Statistics:

    A Sample Analogues Approach

    Charlie Gibbons

Economics 140, University of California, Berkeley

    Summer 2011


    Outline

    1 Populations and samples

    2 Probability

Simple probability
Joint probabilities
Conditional probability
Independence

3 Expectations

4 Dispersion
Variance
Covariance

    5 Appendix: Additional examples


    Populations and samples

The population is the universe of units that you care about. Example: American adults.

A sample is a subset of the population. Example: The observations in the Current Population Survey.

Econometrics uses a sample to make inferences about the population. Sample statistics have population analogues.


    Sample frequencies

We begin with some random quantity Y that takes on K possible values y_1, . . . , y_K. The value of this random variable for observation i is y_i; y_i is a realization of the random variable Y. Example: The roll of a die can take on values 1, . . . , 6.

We ask, what is the sample frequency of some y from the set y_1, . . . , y_K? That is, what fraction of our observations have an observed value of y?

All we do is count the number of observations that have the value y and divide by the number of observations:

f(y) = #{y_i = y} / N.
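This counting definition is easy to make concrete. A minimal Python sketch (not from the slides; the `rolls` data are made up):

```python
def sample_frequency(observations, y):
    """Sample frequency f(y) = #{y_i = y} / N."""
    return sum(1 for yi in observations if yi == y) / len(observations)

rolls = [1, 3, 3, 6, 2, 3, 5, 1]   # hypothetical die rolls
print(sample_frequency(rolls, 3))  # 3 of 8 observations equal 3, so 0.375
```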


    Probability mass function

We typically define the probability of y as the fraction of times that it arises if we had infinitely many observations, i.e., the sample frequency of y in an infinite sample.

We write this as Pr(Y = y). This is the probability that a random variable Y takes on the value y. Example: The probability of getting some value y ∈ {1, . . . , 6} when you roll a die is Pr(Y = y) = 1/6 for all y.

Terminology: y is a realization of the random variable Y and Pr(Y = y) is a probability mass function.


    Cumulative distribution function

We might care about the probability that Y takes on a value of y or less: Pr(Y ≤ y). This is called the cumulative distribution function (CDF) of Y.

To get this, we add up the probability of getting any value less than or equal to y:

F(y) ≡ Pr(Y ≤ y) = Σ_{y_j ≤ y} Pr(Y = y_j).

Example: When you roll a die, the probability of getting a 3 or less is

F(3) = Pr(Y ≤ 3) = Pr(Y = 1) + Pr(Y = 2) + Pr(Y = 3) = 1/2.
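The die CDF can be checked by summing the PMF directly; a small sketch (mine, not the slides'), using exact fractions to avoid rounding:

```python
from fractions import Fraction

# PMF of a fair die: Pr(Y = y) = 1/6 for y in 1..6
pmf = {y: Fraction(1, 6) for y in range(1, 7)}

def cdf(y):
    """F(y) = sum of Pr(Y = y_j) over all y_j <= y."""
    return sum(p for yj, p in pmf.items() if yj <= y)

print(cdf(3))  # prints 1/2
```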


    Continuous random variables

Life is pretty simple when we have a finite number of y values, but what if we have an infinite number?

The definition of the sample frequency doesn't change, but often the frequency of any particular value of y will be 1/N, i.e., only one observation will have that value.


    Probability density function

Instead of a probability mass function, we have a probability density function that is defined as the derivative of the CDF:

f(y) = d/dy F(y).

Intuition: The derivative of the CDF answers, how much does the total probability change if we consider a slightly bigger value of y? How much more probable is getting a value less than y if we make y a bit bigger? This additional contribution to probability from a small change in y is the probability density at y.

Note: For continuous random variables, the CDF is the integral of the PDF (cf. discrete random variables, where the CDF is the sum of the PMF).
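The derivative relationship can be verified numerically for the standard normal, whose CDF has a closed form via the error function. This check is my addition, not part of the slides:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_pdf(x):
    """Standard normal PDF."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

x, h = 0.7, 1e-6
# Central finite difference approximates d/dx F(x)
slope = (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)
print(abs(slope - normal_pdf(x)) < 1e-6)  # True: slope of CDF = height of PDF
```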


    CDF-PDF example


    Figure: Normal CDF and PDF; slope of CDF line is height of PDF line


    Joint probabilities

Suppose that we have two random variables, X and Y, and want to consider their joint frequency in the sample. Extending our previous definition, we have

f(x, y) = #{y_i = y and x_i = x} / N.

    These are often called cross tabs (tabulations).

    We have obvious extensions to a joint PMF, Pr(X = x, Y = y), joint PDF, f(x, y), and joint CDF, F(x, y).

Examples (see the appendix): joint PMF; joint CDF (discrete); joint normal PDF; joint normal CDF.
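A cross tab is just counting pairs. A minimal sketch of the joint-frequency formula (the paired data here are invented for illustration):

```python
from collections import Counter

data = [(1, "yes"), (1, "no"), (2, "yes"), (1, "yes"), (2, "no")]  # (x_i, y_i) pairs
N = len(data)
crosstab = Counter(data)  # counts #{x_i = x and y_i = y} for each (x, y) cell

def joint_frequency(x, y):
    return crosstab[(x, y)] / N

print(joint_frequency(1, "yes"))  # 2 of 5 pairs are (1, "yes"), so 0.4
```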


    Conditional frequencies

Suppose that we have two random variables, but we want to consider the distribution of one for some fixed value of the other. That is, what is the distribution of Y when X = x?

Note that we are limiting our sample: we only care about the observations such that x_i = x. Of this subgroup, what is the frequency of y? Example: What is the distribution of student heights given that they are male?

f(y | X = x) = #{y_i = y and x_i = x} / #{x_i = x}.

This is the sample frequency of y given, or conditional upon, X being x: the conditional sample frequency.
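The restriction to a subgroup is the whole trick. A sketch using the heights example (the data and the `"M"`/`"F"` coding are hypothetical):

```python
def conditional_frequency(pairs, y, x):
    """f(y | X = x) = #{y_i = y and x_i = x} / #{x_i = x}."""
    subgroup = [yi for xi, yi in pairs if xi == x]  # keep only observations with x_i = x
    return subgroup.count(y) / len(subgroup)

pairs = [("M", 72), ("M", 70), ("F", 65), ("M", 72), ("F", 64)]
print(conditional_frequency(pairs, 72, "M"))  # 2 of the 3 males have height 72
```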


    Conditional probability

The population analogue of the conditional frequency, the conditional probability of Y, forms the core of econometrics. The probability that Y takes the value y given that X takes the value x is

Pr(Y = y | X = x) = Pr(Y = y and X = x) / Pr(X = x).

We divide by the probability that X = x to account for the fact that we are only considering a subpopulation.

    Example: Conditional probabilities and dice
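A dice version of the definition can be computed by enumerating the 36 equally likely outcomes (my sketch; the particular event chosen is illustrative):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (x, y) pairs

def pr(event):
    """Probability as a count of favorable outcomes over 36."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Pr(sum = 7 | first die = 3) = Pr(sum = 7 and X = 3) / Pr(X = 3)
p = pr(lambda o: o[0] + o[1] == 7 and o[0] == 3) / pr(lambda o: o[0] == 3)
print(p)  # prints 1/6: only (3, 4) gives a sum of 7 among the six outcomes with X = 3
```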


    Dictatorships and growth

Example from Bill Easterly's "Benevolent Autocrats" (2011).

    Growth Commission Report, World Bank

    Growth at such a quick pace, over such a long period, requiresstrong political leadership.

    Thomas Friedman, NY Times

One-party autocracy certainly has its drawbacks. But when it is led by a reasonably enlightened group of people, as China is today, it can also have great advantages. That one party can just impose the politically difficult but critically important policies needed to move a society forward in the 21st century.


    Wrong question, wrong interpretation

                Autocracy   Democracy
Growth success          9           1

f(Autocracy | Success) = 9 / (9 + 1) = 90%

f(Democracy | Success) = 1 / (9 + 1) = 10%


    Typical question

Econometricians generally ask for

Pr(outcome | treatment and other predictors).


    Independence

X and Y are independent if and only if

F_{X,Y}(x, y) = F_X(x) F_Y(y)

(note: these are the population CDFs) and

f_{X,Y}(x, y) = f_X(x) f_Y(y).

We also see that X and Y are independent if and only if

f_{Y|X}(y | X = x) = f_Y(y) for all x.

Example: What's the probability of getting heads on a second coin toss if you got heads on the first?

This implies that knowing X gives you no additional ability to predict Y, an intuitive notion underlying independence. Example: Independence and dependence


    Sample average

We are all familiar with the sample average of Y: add up all the observed values and divide by N:

ȳ = (1/N) Σ_{i=1}^N y_i.

Alternatively, we can consider every possible value of Y, y_1, . . . , y_K, and multiply each by its sample frequency:

ȳ = Σ_{j=1}^K y_j · #{y_i = y_j} / N.
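The two formulas are algebraically the same thing: grouping identical observations before summing. A quick check on made-up data:

```python
from collections import Counter

ys = [2, 2, 5, 3, 2, 5]
N = len(ys)

mean_direct = sum(ys) / N                                        # (1/N) * sum of y_i
mean_weighted = sum(y * c / N for y, c in Counter(ys).items())   # y_j * #{y_i = y_j}/N

# Both equal 19/6, about 3.167 (up to floating-point rounding)
print(mean_direct, mean_weighted)
```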


    Expectations

The population version is the expectation: take each value that Y can take on and multiply by its probability (as opposed to its sample frequency):

E(Y) = Σ_{j=1}^K y_j Pr(Y = y_j).

For a continuous random variable, we turn sums into integrals:

E(Y) = ∫ y f(y) dy.
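For the fair die, the discrete formula gives the familiar 3.5; a sketch in exact arithmetic:

```python
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)                         # Pr(Y = y_j) = 1/6 for each face
expectation = sum(y * p for y in values)   # sum of y_j * Pr(Y = y_j)
print(expectation)                         # prints 7/2, i.e. 3.5
```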


    Expectations of functions

We can calculate expectations of functions of Y, g(Y). We have the equations

E[g(Y)] = Σ_{y ∈ Y} f(y) g(y)

E[g(Y)] = ∫ g(y) f(y) dy

for discrete and continuous random variables, respectively.


    Expectations of functions example

Note that, in general, E[g(Y)] ≠ g[E(Y)]. Using a die-rolling example,

E(Y²) = 1² · (1/6) + 2² · (1/6) + 3² · (1/6) + 4² · (1/6) + 5² · (1/6) + 6² · (1/6) = 91/6 ≈ 15.17,

while

[E(Y)]² = 3.5² = 12.25,

so E(Y²) ≠ [E(Y)]².
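The arithmetic above can be verified exactly; a short sketch of my own:

```python
from fractions import Fraction

p = Fraction(1, 6)
e_y = sum(y * p for y in range(1, 7))       # E(Y) = 7/2
e_y2 = sum(y * y * p for y in range(1, 7))  # E(Y^2) = 91/6

print(e_y2, e_y ** 2)    # prints 91/6 49/4
print(e_y2 == e_y ** 2)  # False: E(Y^2) != [E(Y)]^2
```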


    Expectations are linear

    Expectations are linear operators, i.e.,

    E(a g(Y) + b h(Y) + c) = a E[g(Y)] + b E[h(Y)] + c.


    Expectations and independence

    Recall that, for independent random variables X and Y,

f_{Y|X}(y | X = x) = f_Y(y) and f_{X|Y}(x | Y = y) = f_X(x).

Hence,

E(Y | X) = E(Y) and E(X | Y) = E(X).


    Conditional expectations

The conditional expectation E[Y | X = x] asks, what is the average value of Y given that X takes on the value x?

Conditional expectations hold X fixed at some x, and the value E[Y | X = x] varies depending upon which x we pick.

Since X is fixed, it isn't random and can come out of the expectation:

E[g(X)Y + h(X) | X = x] = g(x) E[Y | X = x] + h(x).


    Law of iterated expectations

The law of iterated expectations says that

E_Y[Y] = E_X[E[Y | X = x]];

the expectation of Y is the conditional expectation of Y at X = x, averaged over all possible values of X.
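The law can be checked by enumeration. Here, a sketch (not from the slides) with two dice, where Z is their sum: averaging the conditional means E[Z | X = x] over x recovers E[Z]:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))

def z(o):
    return o[0] + o[1]  # Z = sum of the two dice

# E[Z] directly over all 36 outcomes
e_z = Fraction(sum(z(o) for o in outcomes), len(outcomes))

def e_z_given_x(x):
    """Conditional mean E[Z | X = x] over the 6 outcomes with first die = x."""
    sub = [z(o) for o in outcomes if o[0] == x]
    return Fraction(sum(sub), len(sub))

# E_X[ E[Z | X = x] ]: weight each conditional mean by Pr(X = x) = 1/6
e_iterated = sum(Fraction(1, 6) * e_z_given_x(x) for x in range(1, 7))
print(e_z, e_iterated)  # prints 7 7
```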


    Variance

The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of Y:

σ²_Y ≡ Var(Y) = E[(Y − μ)²].

Multiplying this out yields:

Var(Y) = E[Y² − 2μY + μ²]
       = E[Y²] − 2μE(Y) + μ²
       = E[Y²] − [E(Y)]².
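Both forms of the variance agree; for the fair die each gives 35/12. A sketch in exact arithmetic (my addition):

```python
from fractions import Fraction

p = Fraction(1, 6)
mu = sum(y * p for y in range(1, 7))                  # E(Y) = 7/2
var1 = sum((y - mu) ** 2 * p for y in range(1, 7))    # E[(Y - mu)^2]
var2 = sum(y * y * p for y in range(1, 7)) - mu ** 2  # E(Y^2) - [E(Y)]^2

print(var1, var2)  # prints 35/12 35/12
```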


    Same mean, different variance

[Figure: densities with the same mean but different variances]


    Variance facts

The standard deviation, σ, of a random variable is the square root of its variance; i.e., σ = √Var(Y).

While the variance is in squared units, the standard deviation is in the same units as y.

See that Var(aY + b) = a² Var(Y).


    Sample analogue of variance

A candidate for the sample analogue of the variance of Y is

σ̂² = (1/N) Σ_{i=1}^N (y_i − ȳ)².

It turns out that this is a biased estimator of σ², so we use

s² = (1/(N − 1)) Σ_{i=1}^N (y_i − ȳ)²

instead to get an unbiased estimator.

The biased estimator is nonetheless consistent; its bias goes to 0 as N goes to ∞.
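The bias can be demonstrated exactly for a tiny case: enumerate every equally likely sample of size N = 2 from a fair die (true variance 35/12) and average each estimator over all samples. This enumeration is my illustration, not part of the slides:

```python
from fractions import Fraction
from itertools import product

samples = list(product(range(1, 7), repeat=2))  # all 36 samples of size 2

def estimate(sample, denom):
    """Sum of squared deviations from the sample mean, divided by denom."""
    ybar = Fraction(sum(sample), len(sample))
    return sum((y - ybar) ** 2 for y in sample) / denom

biased = sum(estimate(s, 2) for s in samples) / len(samples)    # divide by N
unbiased = sum(estimate(s, 1) for s in samples) / len(samples)  # divide by N - 1

print(biased, unbiased)  # prints 35/24 35/12: only the N-1 version hits the truth
```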


    Covariance

The covariance of random variables X and Y is defined as

Cov(X, Y) ≡ σ_{XY} = E[(X − E(X))(Y − E(Y))]
                   = E(XY) − μ_X μ_Y.

    We have

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

Note that covariance only measures the linear relationship between two random variables; we'll see just what this means later on.

    The covariance between two independent random variables is 0.
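Both facts can be checked by enumeration on the two independent dice, where the covariance is 0 and the Var(aX + bY) identity holds exactly (a sketch of my own, with a = 2 and b = 3 chosen arbitrarily):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
N = len(outcomes)

def mean(vals):
    return sum(vals, Fraction(0)) / N

def var(vals):
    m = mean(vals)
    return mean([(v - m) ** 2 for v in vals])

def cov(us, vs):
    return mean([u * v for u, v in zip(us, vs)]) - mean(us) * mean(vs)

xs = [Fraction(x) for x, y in outcomes]
ys = [Fraction(y) for x, y in outcomes]
a, b = 2, 3
lhs = var([a * x + b * y for x, y in zip(xs, ys)])
rhs = a ** 2 * var(xs) + b ** 2 * var(ys) + 2 * a * b * cov(xs, ys)
print(lhs == rhs, cov(xs, ys))  # True 0: identity holds, and independent dice have Cov = 0
```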


    Correlation

The correlation of random variables X and Y is defined as

ρ_{XY} = σ_{XY} / (σ_X σ_Y).

Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Both will have the same sign.


    Sample analogues for covariance and correlation

Of course, we can get an unbiased estimator for covariance:

σ̂_{XY} = (1/(N − 1)) Σ_{i=1}^N (x_i − x̄)(y_i − ȳ).
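Putting the two definitions together, here is a minimal sketch of sample covariance and correlation (my addition; the exactly linear data make the correlation come out at 1):

```python
import math

def sample_cov(xs, ys):
    """Unbiased sample covariance: divide by N - 1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # exactly linear in x
print(sample_corr(xs, ys))  # ~1.0, up to floating-point rounding
```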

The sample analogue of correlation can be calculated using the preceding definitions.


    Standardization

Suppose that we take Y, subtract off its mean μ, and divide by its standard deviation σ. We have

E[(Y − μ)/σ] = (E[Y] − μ)/σ = 0

and

Var[(Y − μ)/σ] = (1/σ²) Var(Y − μ) = (1/σ²) Var(Y) = 1.

    This is called standardizing a random variable.
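The sample version of standardization is a one-liner per observation; a sketch on made-up data, checking that the result has mean 0 and variance 1:

```python
import math

def standardize(ys):
    """(y - mean) / sd, using the population (divide-by-N) variance."""
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)
    return [(y - mu) / sigma for y in ys]

z = standardize([3.0, 7.0, 8.0, 10.0])
print(sum(z) / len(z))             # ~0: standardized mean
print(sum(v * v for v in z) / len(z))  # ~1: standardized variance
```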


    Appendix: Additional examples


    Example setup

Consider the roll of two dice and let X and Y be the outcomes on each die. Then the 36 (equally likely) possibilities are:

x\y   1    2    3    4    5    6
1    1,1  1,2  1,3  1,4  1,5  1,6
2    2,1  2,2  2,3  2,4  2,5  2,6
3    3,1  3,2  3,3  3,4  3,5  3,6
4    4,1  4,2  4,3  4,4  4,5  4,6
5    5,1  5,2  5,3  5,4  5,5  5,6
6    6,1  6,2  6,3  6,4  6,5  6,6


    Joint PMF example

The joint probability mass function (joint PMF), f_{X,Y}, is

f_{X,Y}(x, y) = Pr(X = x and Y = y).

What is f_{X,Y}(6, 5)?

x\y   1    2    3    4    5    6
1    1,1  1,2  1,3  1,4  1,5  1,6
2    2,1  2,2  2,3  2,4  2,5  2,6
3    3,1  3,2  3,3  3,4  3,5  3,6
4    4,1  4,2  4,3  4,4  4,5  4,6
5    5,1  5,2  5,3  5,4  5,5  5,6
6    6,1  6,2  6,3  6,4  6,5  6,6

f_{X,Y}(6, 5) = 1/36


    Joint CDF definition

The joint cumulative distribution function (joint CDF), F_{X,Y}(x, y), of the random variables X and Y is defined by

F_{X,Y}(x, y) = Pr(X ≤ x and Y ≤ y)
             = Σ_{s=−∞}^{x} Σ_{t=−∞}^{y} f_{X,Y}(s, t).


    Joint CDF example

What is F_{X,Y}(2, 3)?

x\y   1    2    3    4    5    6
1    1,1  1,2  1,3  1,4  1,5  1,6
2    2,1  2,2  2,3  2,4  2,5  2,6
3    3,1  3,2  3,3  3,4  3,5  3,6
4    4,1  4,2  4,3  4,4  4,5  4,6
5    5,1  5,2  5,3  5,4  5,5  5,6
6    6,1  6,2  6,3  6,4  6,5  6,6

F_{X,Y}(2, 3) = 6/36 = 1/6


    Joint PDF of independent normals

[Figure: 3-D plot of the joint density of independent normals; X and Y axes from −4 to 4, vertical axis labeled Density]


    Conditional probability example


What is f(Y = 3 | X ≤ 2)?

x\y   1    2    3    4    5    6
1    1,1  1,2  1,3  1,4  1,5  1,6
2    2,1  2,2  2,3  2,4  2,5  2,6

f_{Y|X}(y = 3 | X ≤ 2) = 2/12 = 1/6

Note how our table changed dimensions because conditioning is all about changing the range of values that we care about; here, we only care about what happens if X ≤ 2.

    Independence example


We showed in the two-dice example that F_{X,Y}(2, 3) = 1/6, which is equal to

F_X(2) · F_Y(3) = (2/6) · (3/6) = 1/6.

This is because the rolls of the two dice are intuitively independent: the result on one die has no bearing on that of the other.

    A new random variable


Imagine instead that X is the outcome on the first die and Z is the sum of the outcomes on the two dice. Then we have

x, z   1    2    3    4    5    6
1    1,2  1,3  1,4  1,5  1,6  1,7
2    2,3  2,4  2,5  2,6  2,7  2,8
3    3,4  3,5  3,6  3,7  3,8  3,9
4    4,5  4,6  4,7  4,8  4,9  4,10
5    5,6  5,7  5,8  5,9  5,10 5,11
6    6,7  6,8  6,9  6,10 6,11 6,12

As we would imagine, the result of X influences the value of Z, so they shouldn't be independent.

    Dependence example


Let's prove it: What is F_{X,Z}(2, 5)?

x, z   1    2    3    4    5    6
1    1,2  1,3  1,4  1,5  1,6  1,7
2    2,3  2,4  2,5  2,6  2,7  2,8
3    3,4  3,5  3,6  3,7  3,8  3,9
4    4,5  4,6  4,7  4,8  4,9  4,10
5    5,6  5,7  5,8  5,9  5,10 5,11
6    6,7  6,8  6,9  6,10 6,11 6,12

F_{X,Z}(2, 5) = 7/36 ≠ 5/54 = (2/6) · (10/36) = F_X(2) · F_Z(5)
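The failed factorization can be confirmed by brute-force enumeration of the 36 outcomes (a sketch of my own, not from the slides):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (first die, second die)

def pr(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# X = first die, Z = sum of the two dice
joint = pr(lambda o: o[0] <= 2 and o[0] + o[1] <= 5)              # F_{X,Z}(2, 5)
marginal = pr(lambda o: o[0] <= 2) * pr(lambda o: o[0] + o[1] <= 5)  # F_X(2) * F_Z(5)

print(joint, marginal)    # prints 7/36 5/54
print(joint == marginal)  # False: X and Z are dependent
```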