
  • Probability Review
    Applied Bayesian Statistics

    Dr. Earvin Balderama

    Department of Mathematics & Statistics
    Loyola University Chicago

    August 31, 2017

    Applied Bayesian Statistics
    Last edited September 8, 2017 by Earvin Balderama

  • Random Variables

    Mathematically, a random variable is a function that maps a sample space into the real numbers: X : S → R.

    The range of X may be:
    1 Countable (discrete).
    2 Uncountable (continuous).

    Example: 3 coin tosses

    S = {HHH, HHT, HTH, THH, THT, TTH, TTT}
    We may want to create a random variable, X, defined as the number of tails.
    X ∈ {0, 1, 2, 3}
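The coin-toss example can be checked by brute-force enumeration; a minimal Python sketch (the PMF values it produces are implied by the example, not stated on the slide):

```python
from itertools import product

# Enumerate the sample space of 3 coin tosses (H = heads, T = tails).
S = ["".join(t) for t in product("HT", repeat=3)]

# X maps each outcome to its number of tails; tally the induced PMF.
pmf = {}
for outcome in S:
    x = outcome.count("T")
    pmf[x] = pmf.get(x, 0) + 1 / len(S)
```

With 8 equally likely outcomes this gives Prob(X = 0) = Prob(X = 3) = 1/8 and Prob(X = 1) = Prob(X = 2) = 3/8.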


  • Probability

    Mathematically, a probability function assigns numbers (between 0 and 1) to subsets of a sample space: P : B → [0, 1], ∀B ⊆ S.

    Two interpretations:
    1 (Frequentist) Based on long-run relative frequencies of possible outcomes.
    2 (Bayesian) Based on belief about how likely each possible outcome is.

    Regardless of interpretation, the same basic probability laws apply, e.g.,
    P(A) ≥ 0,
    P(S) = 1,
    P(A ∪ B) = P(A) + P(B), for mutually exclusive A and B.


  • Probability distributions

    A probability distribution is a list of all possible values of a random variable and their corresponding probabilities.

    1 Discrete random variable: probability mass function (PMF)
      PMF: f(x) = Prob(X = x) ≥ 0
      Mean: E(X) = ∑_x x f(x)
      Variance: V(X) = ∑_x [x − E(X)]² f(x)

    2 Continuous random variable: probability density function (PDF)
      Prob(X = x) = 0 for all x
      PDF: f(x) ≥ 0, Prob(X ∈ B) = ∫_B f(x) dx
      Mean: E(X) = ∫ x f(x) dx
      Variance: V(X) = ∫ [x − E(X)]² f(x) dx
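The discrete mean and variance formulas can be computed directly from any PMF stored as a dictionary; a sketch using the coin-toss PMF (the numeric values come from the earlier example, not this slide):

```python
# Mean and variance of a discrete random variable from its PMF:
# E(X) = sum_x x f(x), V(X) = sum_x (x - E(X))^2 f(x).
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}  # number of tails in 3 tosses

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
```

For this PMF, E(X) = 1.5 and V(X) = 0.75.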


  • Parametric families of distributions

    A statistical analysis typically proceeds by selecting a PMF (or PDF) that seems to match the distribution of a sample. We rarely know the PMF (or PDF) exactly, but we may assume it is from a parametric family of distributions, and estimate the parameters.

    1 Discrete random variables
      Binomial (Bernoulli is a special case)
      Poisson
      NegativeBinomial

    2 Continuous random variables
      Normal
      Gamma (Exponential and χ² are special cases)
      InverseGamma
      Beta (Uniform is a special case)


  • X ∼ Bernoulli(θ)

    Only two outcomes (success/failure, 0/1, zero/nonzero, etc.), where θ is the probability of success.
    X ∈ {0, 1}

    PMF: f(x) = Prob(X = x) = 1 − θ, if x = 0; θ, if x = 1.

    Mean: E(X) = ∑_x x f(x) = 0(1 − θ) + 1θ = θ

    Variance: V(X) = ∑_x [x − θ]² f(x) = (0 − θ)²(1 − θ) + (1 − θ)²θ = θ(1 − θ)
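A quick numerical check of the Bernoulli mean and variance formulas (θ = 0.3 is an arbitrary choice, not from the slide):

```python
# Bernoulli(theta): verify E(X) = theta and V(X) = theta * (1 - theta).
theta = 0.3
pmf = {0: 1 - theta, 1: theta}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
```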


  • X ∼ Binomial(n, θ)

    X = number of “successes” in n independent “Bernoulli trials,” where θ is the probability of success on each trial.
    X ∈ {0, 1, . . . , n}

    PMF: f(x) = Prob(X = x) = (n choose x) θ^x (1 − θ)^(n−x).

    Mean: E(X) = nθ
    Variance: V(X) = nθ(1 − θ)
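Summing over the full support verifies the Binomial mean and variance formulas; a sketch (n = 10, θ = 0.4 are arbitrary choices):

```python
from math import comb

# Binomial(n, theta) PMF from the slide's formula;
# check E(X) = n*theta and V(X) = n*theta*(1 - theta) by direct summation.
n, theta = 10, 0.4

def binom_pmf(x):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

mean = sum(x * binom_pmf(x) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binom_pmf(x) for x in range(n + 1))
```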


  • X ∼ Poisson(λ)

    X = number of events that occur in a unit of time.
    X ∈ {0, 1, . . .}

    PMF: f(x) = Prob(X = x) = λ^x e^(−λ) / x!

    Mean: E(X) = λ
    Variance: V(X) = λ

    Note: Can be parameterized with λ = nθ, where θ is the expected number of events per unit time. Then E(X) = V(X) = nθ.
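The defining Poisson property E(X) = V(X) = λ can be checked numerically by summing over a truncated support (λ = 3 is an arbitrary choice):

```python
from math import exp, factorial

# Poisson(lam) PMF from the slide; verify E(X) and V(X) both equal lam.
lam = 3.0

def pois_pmf(x):
    return lam ** x * exp(-lam) / factorial(x)

support = range(60)  # truncate the infinite support; tail mass beyond 60 is negligible for lam = 3
mean = sum(x * pois_pmf(x) for x in support)
variance = sum((x - mean) ** 2 * pois_pmf(x) for x in support)
```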


  • X ∼ NegativeBinomial(r, θ)

    X = number of “failures” until r “successes” in a sequence of independent “Bernoulli trials,” where θ is the probability of success on each trial.
    X ∈ {0, 1, . . .}

    PMF: f(x) = Prob(X = x) = (x + r − 1 choose x) θ^r (1 − θ)^x.

    Mean: E(X) = r(1 − θ)/θ
    Variance: V(X) = r(1 − θ)/θ²

    Note: The geometric distribution is a special case: Geom(θ) = NB(1, θ).
    Note: There are MANY different ways to specify the NB distribution. The important thing to note is that NB is a discrete count distribution that is a more flexible model than the Poisson.
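Under this parameterization the mean and variance formulas can be checked by direct summation (r = 2, θ = 0.5 are arbitrary choices):

```python
from math import comb

# NegativeBinomial(r, theta) PMF from the slide;
# verify E(X) = r(1-theta)/theta and V(X) = r(1-theta)/theta^2.
r, theta = 2, 0.5

def nb_pmf(x):
    return comb(x + r - 1, x) * theta ** r * (1 - theta) ** x

support = range(200)  # truncate the infinite support
mean = sum(x * nb_pmf(x) for x in support)
variance = sum((x - mean) ** 2 * nb_pmf(x) for x in support)
```

Note the variance (4 here) exceeds the mean (2), which is exactly the extra flexibility over the Poisson.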


  • X ∼ Normal(µ, σ²)

    X ∈ (−∞, ∞)

    PDF: f(x) = [1 / (√(2π) σ)] exp[ −(1/2) ((x − µ)/σ)² ].

    Mean: E(X) = µ
    Variance: V(X) = σ²
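A numeric sanity check that this PDF integrates to 1 with mean µ, using a midpoint Riemann sum (µ = 1, σ = 2 are arbitrary choices):

```python
from math import exp, pi, sqrt

# Normal(mu, sigma^2) PDF from the slide; midpoint-rule integration on (-20, 20).
mu, sigma = 1.0, 2.0

def norm_pdf(x):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)

n_steps = 40000
dx = 40 / n_steps
xs = [-20 + (i + 0.5) * dx for i in range(n_steps)]
total = sum(norm_pdf(x) * dx for x in xs)       # should be ~1
mean = sum(x * norm_pdf(x) * dx for x in xs)    # should be ~mu
```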


  • X ∼ Gamma(a, b)

    X ∈ (0, ∞)

    PDF: f(x) = [b^a / Γ(a)] x^(a−1) e^(−bx).

    Mean: E(X) = a/b
    Variance: V(X) = a/b²

    Parameters: shape a > 0, rate b > 0.
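A quick check that the rate parameterization gives E(X) = a/b, via numeric integration (a = 3, b = 2 are arbitrary choices):

```python
from math import exp, gamma

# Gamma(a, b) PDF with shape a and rate b, as on the slide;
# midpoint-rule check that the density integrates to ~1 with mean ~a/b.
a, b = 3.0, 2.0

def gamma_pdf(x):
    return (b ** a / gamma(a)) * x ** (a - 1) * exp(-b * x)

n_steps = 60000
dx = 30 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]   # midpoints on (0, 30)
total = sum(gamma_pdf(x) * dx for x in xs)
mean = sum(x * gamma_pdf(x) * dx for x in xs)   # should be ~a/b = 1.5
```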


  • X ∼ InverseGamma(a, b)

    If Y ∼ Gamma(a, b), then X = 1/Y ∼ InverseGamma(a, b).
    X ∈ (0, ∞)

    PDF: f(x) = [b^a / Γ(a)] x^(−a−1) e^(−b/x).

    Mean: E(X) = b/(a − 1), for a > 1.
    Variance: V(X) = b² / [(a − 1)²(a − 2)], for a > 2.

    Parameters: shape a > 0, rate b > 0.
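The mean formula b/(a − 1) can likewise be checked numerically (a = 3, b = 2 are arbitrary choices, giving E(X) = 1):

```python
from math import exp, gamma

# InverseGamma(a, b) PDF from the slide; midpoint-rule check of E(X) = b/(a - 1).
a, b = 3.0, 2.0

def invgamma_pdf(x):
    return (b ** a / gamma(a)) * x ** (-a - 1) * exp(-b / x)

n_steps = 120000
dx = 60 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]   # midpoints on (0, 60)
mean = sum(x * invgamma_pdf(x) * dx for x in xs)
```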


  • X ∼ Beta(a, b)

    X ∈ [0, 1]

    PDF: f(x) = [Γ(a + b) / (Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1).

    Mean: E(X) = a/(a + b)

    Variance: V(X) = ab / [(a + b)²(a + b + 1)]

    Parameters: a > 0, b > 0.
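A numeric check of E(X) = a/(a + b) on the unit interval (a = 2, b = 5 are arbitrary choices):

```python
from math import gamma

# Beta(a, b) PDF from the slide; midpoint-rule check of E(X) = a / (a + b).
a, b = 2.0, 5.0

def beta_pdf(x):
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

n_steps = 100000
dx = 1 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]   # midpoints on (0, 1)
mean = sum(x * beta_pdf(x) * dx for x in xs)    # should be ~2/7
```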


  • Joint distributions

    A random vector of p random variables: X = (X1, X2, . . . , Xp).

    For now, suppose we have just p = 2 random variables, X and Y.
    (X, Y) can be discrete or continuous.


  • Joint distributions

    1 Discrete (X, Y)
      joint PMF: f(x, y) = Prob(X = x, Y = y)
      marginal PMF for X: f_X(x) = Prob(X = x) = ∑_y f(x, y)
      marginal PMF for Y: f_Y(y) = Prob(Y = y) = ∑_x f(x, y)

    2 Continuous (X, Y)
      joint PDF: f(x, y)
      Prob[(X, Y) ∈ B] = ∫_B f(x, y) dx dy
      marginal PDF for X: f_X(x) = ∫ f(x, y) dy
      marginal PDF for Y: f_Y(y) = ∫ f(x, y) dx


  • Discrete random variables

    Example

    Patients are randomly assigned a dose and followed to determine whether they develop a tumor.
    X ∈ {5, 10, 20} is the dose; Y ∈ {0, 1} is 1 if a tumor develops and 0 otherwise.

    The joint PMF is given by

        Y \ X      5      10     20
        0        0.469  0.124  0.049
        1        0.231  0.076  0.051


  • Discrete random variables

    Example

    Find the marginal PMFs of X and Y.

    f_Y(0) = ∑_x f(x, 0) = 0.469 + 0.124 + 0.049 = 0.642
    f_Y(1) = ∑_x f(x, 1) = 0.231 + 0.076 + 0.051 = 0.358
    f_X(5) = 0.7,  f_X(10) = 0.2,  f_X(20) = 0.1

        Y \ X      5      10     20    f_Y(y)
        0        0.469  0.124  0.049   0.642
        1        0.231  0.076  0.051   0.358
        f_X(x)   0.7    0.2    0.1     1
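The marginal sums above can be sketched in a few lines, storing the joint PMF as a dictionary keyed by (x, y):

```python
# Marginal PMFs from the dose/tumor joint PMF: sum the joint over the other variable.
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

fX = {}
fY = {}
for (x, y), p in joint.items():
    fX[x] = fX.get(x, 0) + p   # f_X(x) = sum_y f(x, y)
    fY[y] = fY.get(y, 0) + p   # f_Y(y) = sum_x f(x, y)
```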


  • Discrete random variables

    conditional PMF of Y given X:

    f(y|x) = Prob(Y = y | X = x) = Prob(X = x, Y = y) / Prob(X = x) = f(x, y) / f_X(x)

    conditional = joint / marginal

    Note: Here, x is treated as fixed, so f(y|x) is only a function of y.
    Note: This is not ∑_x f(x, y) = f_Y(y) nor ∑_y f(x, y) = f_X(x).
    Note: Showing that f(y|x) is a valid PMF,

    ∑_y f(y|x) = ∑_y f(y, x) / f_X(x) = [∑_y f(y, x)] / f_X(x) = f_X(x) / f_X(x) = 1



  • Discrete random variables

    Example

    Find f(y|x) and f(x|y).

    The joint PMF is given by

        Y \ X      5      10     20    f_Y(y)
        0        0.469  0.124  0.049   0.642
        1        0.231  0.076  0.051   0.358
        f_X(x)   0.7    0.2    0.1     1

    Prob(Y = 0 | X = 5) = 0.469 / 0.7 = 0.67
    Prob(Y = 1 | X = 5) = 0.231 / 0.7 = 0.33
    Prob(X = 5 | Y = 0) = 0.469 / 0.642 = 0.73
    . . .
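The same table gives every conditional by dividing the joint by the appropriate marginal; a sketch:

```python
# conditional = joint / marginal, on the dose/tumor table.
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

fX = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (5, 10, 20)}
fY = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

p_y0_given_x5 = joint[(5, 0)] / fX[5]   # 0.469 / 0.7
p_x5_given_y0 = joint[(5, 0)] / fY[0]   # 0.469 / 0.642
```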


  • Continuous random variables

    Example

    Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
    The joint PDF is given by

    f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

    Find Prob(X > 7, Y > 40) = ∫_40^50 ∫_7^10 f(x, y) dx dy = . . . = 0.25
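The elided double integral can be reproduced numerically with a midpoint rule over the rectangle x ∈ (7, 10), y ∈ (40, 50):

```python
from math import exp

# Numeric double integral of the joint PDF on the slide over x in (7, 10), y in (40, 50).
def f(x, y):
    return 0.26 * exp(-abs(x - 7) - abs(y - 40))

dx = dy = 0.01
prob = sum(f(7 + (i + 0.5) * dx, 40 + (j + 0.5) * dy) * dx * dy
           for i in range(300) for j in range(1000))
```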


  • Continuous random variables

    Example

    Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
    The joint PDF is given by

    f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

    Find f_X(x) = ∫_20^50 f(x, y) dy = . . . = 0.52 e^(−|x−7|)


  • Continuous random variables

    conditional PDF of Y given X:

    f(y|x) = f(x, y) / f_X(x)

    conditional = joint / marginal

    Note: Here, x is treated as fixed, so f(y|x) is only a function of y.
    Note: This is not ∫ f(x, y) dx = f_Y(y) nor ∫ f(x, y) dy = f_X(x).
    Note: Showing that f(y|x) is a valid PDF,

    ∫ f(y|x) dy = ∫ [f(y, x) / f_X(x)] dy = [∫ f(y, x) dy] / f_X(x) = f_X(x) / f_X(x) = 1


  • Continuous random variables

    Example

    Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
    The joint PDF is given by

    f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

    Find f(y|x).


  • Bivariate normal distribution

    The bivariate normal is the most common multivariate family. There are 5 parameters:

    1 µX is the marginal mean of X.
    2 µY is the marginal mean of Y.
    3 σX² is the marginal variance of X.
    4 σY² is the marginal variance of Y.
    5 ρXY is the correlation between X and Y.

    The joint PDF is

    f(x, y) = 1 / [2πσXσY√(1 − ρ²)] × exp{ −[((x − µX)/σX)² + ((y − µY)/σY)² − 2ρ((x − µX)/σX)((y − µY)/σY)] / [2(1 − ρ²)] }
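As a sanity check on the PDF, the density can be integrated numerically over a large square and should come out near 1 (the parameter values µ = 0, σ = 1, ρ = 0.5 are arbitrary choices):

```python
from math import exp, pi, sqrt

# Bivariate normal PDF as written on the slide; midpoint-rule check that it integrates to ~1.
mu_x = mu_y = 0.0
sd_x = sd_y = 1.0
rho = 0.5

def bvn_pdf(x, y):
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    quad = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * (1 - rho ** 2))
    return exp(-quad) / (2 * pi * sd_x * sd_y * sqrt(1 - rho ** 2))

d = 0.05  # midpoint rule on the square [-6, 6] x [-6, 6]
total = sum(bvn_pdf(-6 + (i + 0.5) * d, -6 + (j + 0.5) * d) * d * d
            for i in range(240) for j in range(240))
```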


  • Bivariate normal distribution

    Example

    Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1.
    Find the marginal distribution of X.


  • Bivariate normal distribution

    Example

    Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1.
    Find the conditional distribution of Y given X.


  • Bayes’ theorem

    Recall conditional distributions:

    f(y|x) = f(x, y) / f(x)

    conditional = joint / marginal

    Can be extended to

    f(y|x) = f(x, y) / f(x) = f(x|y) f(y) / f(x) = f(x|y) f(y) / ∑_{all y} f(x|y) f(y)

    This is the form of the famous “Bayes’ theorem” (or “Bayes’ rule”).
    Note: the denominator is simply a normalizing constant.
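In the discrete case, Bayes' theorem can be run directly on the earlier dose/tumor table, taking f(x) as the prior over dose and f(y = 1 | x) as the likelihood of a tumor:

```python
# Bayes' rule f(x|y) = f(y|x) f(x) / sum_x f(y|x) f(x),
# with values taken from the dose/tumor joint PMF in the earlier example.
prior = {5: 0.7, 10: 0.2, 20: 0.1}                        # f(x)
lik = {5: 0.231 / 0.7, 10: 0.076 / 0.2, 20: 0.051 / 0.1}  # f(y = 1 | x)

evidence = sum(lik[x] * prior[x] for x in prior)          # f(y = 1), the normalizing constant
posterior = {x: lik[x] * prior[x] / evidence for x in prior}
```

Note that the denominator only rescales the numerators so the posterior sums to 1.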


  • Bayes’ theorem

    In a Bayesian data analysis, we select:
    1 the prior f(θ),
    2 the likelihood f(y|θ).

    Based on these, we must compute
    3 the posterior f(θ|y).

    Bayes’ theorem
    The mathematical formula to convert the likelihood and prior to the posterior:

    f(θ|y) = f(y|θ) f(θ) / f(y)

    Posterior ∝ Likelihood × Prior
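A minimal sketch of the Posterior ∝ Likelihood × Prior recipe, using a grid approximation with a Binomial likelihood and a Beta(2, 2) prior (both hypothetical choices, with made-up data):

```python
from math import comb

# Grid approximation of the posterior for a success probability theta.
n, y = 10, 7                                    # made-up data: 7 successes in 10 trials
grid = [(i + 0.5) / 1000 for i in range(1000)]  # grid of theta values in (0, 1)

prior = [t * (1 - t) for t in grid]             # Beta(2, 2) kernel, up to a constant
lik = [comb(n, y) * t ** y * (1 - t) ** (n - y) for t in grid]

unnorm = [l * p for l, p in zip(lik, prior)]    # Posterior ∝ Likelihood × Prior
total = sum(unnorm)
posterior = [u / total for u in unnorm]         # normalize over the grid

post_mean = sum(t * p for t, p in zip(grid, posterior))
```

The exact posterior here is Beta(9, 5), whose mean is 9/14 ≈ 0.64; the grid estimate lands on the same value. Normalizing by the grid sum plays the role of f(y).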
