
  • Probability Review
    Applied Bayesian Statistics

    Dr. Earvin Balderama

    Department of Mathematics & Statistics
    Loyola University Chicago

    August 31, 2017

    Applied Bayesian Statistics
    Last edited September 8, 2017 by Earvin Balderama

  • Random Variables

    Mathematically, a random variable is a function that maps a sample space into the real numbers: X : S → R.

    The range of X may be:
    1 Countable (discrete).
    2 Uncountable (continuous).

    Example: 3 coin tosses

    S = {HHH, HHT, HTH, THH, THT, TTH, TTT}
    We may want to create a random variable, X, defined as the number of tails.
    X ∈ {0, 1, 2, 3}
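The coin-toss example can be checked by brute-force enumeration; a minimal Python sketch (the PMF values it produces are implied by the example, not stated on the slide):

```python
from itertools import product

# Enumerate the sample space of 3 coin tosses (H = heads, T = tails).
S = ["".join(t) for t in product("HT", repeat=3)]

# X maps each outcome to its number of tails; tally the induced PMF.
pmf = {}
for outcome in S:
    x = outcome.count("T")
    pmf[x] = pmf.get(x, 0) + 1 / len(S)
```

With 8 equally likely outcomes this gives Prob(X = 0) = Prob(X = 3) = 1/8 and Prob(X = 1) = Prob(X = 2) = 3/8.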


  • Probability

    Mathematically, a probability function assigns numbers (between 0 and 1) to subsets of a sample space: P : B → [0, 1], ∀B ⊆ S.

    Two interpretations:
    1 (Frequentist) Based on long-run relative frequencies of possible outcomes.
    2 (Bayesian) Based on belief about how likely each possible outcome is.

    Regardless of interpretation, the same basic probability laws apply, e.g.,
    P(A) ≥ 0,
    P(S) = 1,
    P(A ∪ B) = P(A) + P(B), for mutually exclusive A and B.


  • Probability distributions

    A probability distribution is a list of all possible values of a random variable and their corresponding probabilities.

    1 Discrete random variable: probability mass function (PMF)
      PMF: f(x) = Prob(X = x) ≥ 0
      Mean: E(X) = ∑_x x f(x)
      Variance: V(X) = ∑_x [x − E(X)]² f(x)

    2 Continuous random variable: probability density function (PDF)
      Prob(X = x) = 0 for all x
      PDF: f(x) ≥ 0, Prob(X ∈ B) = ∫_B f(x) dx
      Mean: E(X) = ∫ x f(x) dx
      Variance: V(X) = ∫ [x − E(X)]² f(x) dx
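The discrete mean and variance formulas can be computed directly from any PMF stored as a dictionary; a sketch using the coin-toss PMF (the numeric values come from the earlier example, not this slide):

```python
# Mean and variance of a discrete random variable from its PMF:
# E(X) = sum_x x f(x), V(X) = sum_x (x - E(X))^2 f(x).
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}  # number of tails in 3 tosses

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
```

For this PMF, E(X) = 1.5 and V(X) = 0.75.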


  • Parametric families of distributions

    A statistical analysis typically proceeds by selecting a PMF (or PDF) that seems to match the distribution of a sample. We rarely know the PMF (or PDF) exactly, but we may assume it is from a parametric family of distributions, and estimate the parameters.

    1 Discrete random variables
      Binomial (Bernoulli is a special case)
      Poisson
      NegativeBinomial

    2 Continuous random variables
      Normal
      Gamma (Exponential and χ² are special cases)
      InverseGamma
      Beta (Uniform is a special case)


  • X ∼ Bernoulli(θ)

    Only two outcomes (success/failure, 0/1, zero/nonzero, etc.), where θ is the probability of success.
    X ∈ {0, 1}

    PMF: f(x) = Prob(X = x) = 1 − θ, if x = 0; θ, if x = 1.

    Mean: E(X) = ∑_x x f(x) = 0(1 − θ) + 1θ = θ

    Variance: V(X) = ∑_x [x − θ]² f(x) = (0 − θ)²(1 − θ) + (1 − θ)²θ = θ(1 − θ)
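A quick numerical check of the Bernoulli mean and variance formulas (θ = 0.3 is an arbitrary choice, not from the slide):

```python
# Bernoulli(theta): verify E(X) = theta and V(X) = theta * (1 - theta).
theta = 0.3
pmf = {0: 1 - theta, 1: theta}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
```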


  • X ∼ Binomial(n, θ)

    X = number of “successes” in n independent “Bernoulli trials,” where θ is the probability of success on each trial.
    X ∈ {0, 1, . . . , n}

    PMF: f(x) = Prob(X = x) = (n choose x) θ^x (1 − θ)^(n−x).

    Mean: E(X) = nθ
    Variance: V(X) = nθ(1 − θ)
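Summing over the full support verifies the Binomial mean and variance formulas; a sketch (n = 10, θ = 0.4 are arbitrary choices):

```python
from math import comb

# Binomial(n, theta) PMF from the slide's formula;
# check E(X) = n*theta and V(X) = n*theta*(1 - theta) by direct summation.
n, theta = 10, 0.4

def binom_pmf(x):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

mean = sum(x * binom_pmf(x) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binom_pmf(x) for x in range(n + 1))
```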


  • X ∼ Poisson(λ)

    X = number of events that occur in a unit of time.
    X ∈ {0, 1, . . .}

    PMF: f(x) = Prob(X = x) = λ^x e^(−λ) / x!

    Mean: E(X) = λ
    Variance: V(X) = λ

    Note: Can be parameterized with λ = nθ, where θ is the expected number of events per unit time. Then E(X) = V(X) = nθ.
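The defining Poisson property E(X) = V(X) = λ can be checked numerically by summing over a truncated support (λ = 3 is an arbitrary choice):

```python
from math import exp, factorial

# Poisson(lam) PMF from the slide; verify E(X) and V(X) both equal lam.
lam = 3.0

def pois_pmf(x):
    return lam ** x * exp(-lam) / factorial(x)

support = range(60)  # truncate the infinite support; tail mass beyond 60 is negligible for lam = 3
mean = sum(x * pois_pmf(x) for x in support)
variance = sum((x - mean) ** 2 * pois_pmf(x) for x in support)
```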


  • X ∼ NegativeBinomial(r, θ)

    X = number of “failures” until r “successes” in a sequence of independent “Bernoulli trials,” where θ is the probability of success on each trial.
    X ∈ {0, 1, . . .}

    PMF: f(x) = Prob(X = x) = (x + r − 1 choose x) θ^r (1 − θ)^x.

    Mean: E(X) = r(1 − θ)/θ
    Variance: V(X) = r(1 − θ)/θ²

    Note: The geometric distribution is a special case: Geom(θ) = NB(1, θ).
    Note: There are MANY different ways to specify the NB distribution. The important thing to note is that NB is a discrete count distribution that is a more flexible model than the Poisson.
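Under this parameterization the mean and variance formulas can be checked by direct summation (r = 2, θ = 0.5 are arbitrary choices):

```python
from math import comb

# NegativeBinomial(r, theta) PMF from the slide;
# verify E(X) = r(1-theta)/theta and V(X) = r(1-theta)/theta^2.
r, theta = 2, 0.5

def nb_pmf(x):
    return comb(x + r - 1, x) * theta ** r * (1 - theta) ** x

support = range(200)  # truncate the infinite support
mean = sum(x * nb_pmf(x) for x in support)
variance = sum((x - mean) ** 2 * nb_pmf(x) for x in support)
```

Note the variance (4 here) exceeds the mean (2), which is exactly the extra flexibility over the Poisson.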


  • X ∼ Normal(µ, σ²)

    X ∈ (−∞, ∞)

    PDF: f(x) = [1 / (√(2π) σ)] exp[ −(1/2) ((x − µ)/σ)² ].

    Mean: E(X) = µ
    Variance: V(X) = σ²
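A numeric sanity check that this PDF integrates to 1 with mean µ, using a midpoint Riemann sum (µ = 1, σ = 2 are arbitrary choices):

```python
from math import exp, pi, sqrt

# Normal(mu, sigma^2) PDF from the slide; midpoint-rule integration on (-20, 20).
mu, sigma = 1.0, 2.0

def norm_pdf(x):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)

n_steps = 40000
dx = 40 / n_steps
xs = [-20 + (i + 0.5) * dx for i in range(n_steps)]
total = sum(norm_pdf(x) * dx for x in xs)       # should be ~1
mean = sum(x * norm_pdf(x) * dx for x in xs)    # should be ~mu
```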


  • X ∼ Gamma(a, b)

    X ∈ (0, ∞)

    PDF: f(x) = [b^a / Γ(a)] x^(a−1) e^(−bx).

    Mean: E(X) = a/b
    Variance: V(X) = a/b²

    Parameters: shape a > 0, rate b > 0.
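A quick check that the rate parameterization gives E(X) = a/b, via numeric integration (a = 3, b = 2 are arbitrary choices):

```python
from math import exp, gamma

# Gamma(a, b) PDF with shape a and rate b, as on the slide;
# midpoint-rule check that the density integrates to ~1 with mean ~a/b.
a, b = 3.0, 2.0

def gamma_pdf(x):
    return (b ** a / gamma(a)) * x ** (a - 1) * exp(-b * x)

n_steps = 60000
dx = 30 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]   # midpoints on (0, 30)
total = sum(gamma_pdf(x) * dx for x in xs)
mean = sum(x * gamma_pdf(x) * dx for x in xs)   # should be ~a/b = 1.5
```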


  • X ∼ InverseGamma(a, b)

    If Y ∼ Gamma(a, b), then X = 1/Y ∼ InverseGamma(a, b).
    X ∈ (0, ∞)

    PDF: f(x) = [b^a / Γ(a)] x^(−a−1) e^(−b/x).

    Mean: E(X) = b/(a − 1), for a > 1.
    Variance: V(X) = b² / [(a − 1)²(a − 2)], for a > 2.

    Parameters: shape a > 0, rate b > 0.
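The mean formula b/(a − 1) can likewise be checked numerically (a = 3, b = 2 are arbitrary choices, giving E(X) = 1):

```python
from math import exp, gamma

# InverseGamma(a, b) PDF from the slide; midpoint-rule check of E(X) = b/(a - 1).
a, b = 3.0, 2.0

def invgamma_pdf(x):
    return (b ** a / gamma(a)) * x ** (-a - 1) * exp(-b / x)

n_steps = 120000
dx = 60 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]   # midpoints on (0, 60)
mean = sum(x * invgamma_pdf(x) * dx for x in xs)
```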


  • X ∼ Beta(a, b)

    X ∈ [0, 1]

    PDF: f(x) = [Γ(a + b) / (Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1).

    Mean: E(X) = a/(a + b)

    Variance: V(X) = ab / [(a + b)²(a + b + 1)]

    Parameters: a > 0, b > 0.
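A numeric check of E(X) = a/(a + b) on the unit interval (a = 2, b = 5 are arbitrary choices):

```python
from math import gamma

# Beta(a, b) PDF from the slide; midpoint-rule check of E(X) = a / (a + b).
a, b = 2.0, 5.0

def beta_pdf(x):
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

n_steps = 100000
dx = 1 / n_steps
xs = [(i + 0.5) * dx for i in range(n_steps)]   # midpoints on (0, 1)
mean = sum(x * beta_pdf(x) * dx for x in xs)    # should be ~2/7
```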


  • Joint distributions

    A random vector of p random variables: X = (X1, X2, . . . , Xp).

    For now, suppose we have just p = 2 random variables, X and Y.
    (X, Y) can be discrete or continuous.


  • Joint distributions

    1 Discrete (X, Y)
      joint PMF: f(x, y) = Prob(X = x, Y = y)
      marginal PMF for X: f_X(x) = Prob(X = x) = ∑_y f(x, y)
      marginal PMF for Y: f_Y(y) = Prob(Y = y) = ∑_x f(x, y)

    2 Continuous (X, Y)
      joint PDF: f(x, y)
      Prob[(X, Y) ∈ B] = ∫_B f(x, y) dx dy
      marginal PDF for X: f_X(x) = ∫ f(x, y) dy
      marginal PDF for Y: f_Y(y) = ∫ f(x, y) dx


  • Discrete random variables

    Example

    Patients are randomly assigned a dose and followed to determine whether they develop a tumor.
    X ∈ {5, 10, 20} is the dose; Y ∈ {0, 1} is 1 if a tumor develops and 0 otherwise.

    The joint PMF is given by

        Y \ X      5      10     20
        0        0.469  0.124  0.049
        1        0.231  0.076  0.051


  • Discrete random variables

    Example

    Find the marginal PMFs of X and Y.

    f_Y(0) = ∑_x f(x, 0) = 0.469 + 0.124 + 0.049 = 0.642
    f_Y(1) = ∑_x f(x, 1) = 0.231 + 0.076 + 0.051 = 0.358
    f_X(5) = 0.7,  f_X(10) = 0.2,  f_X(20) = 0.1

        Y \ X      5      10     20    f_Y(y)
        0        0.469  0.124  0.049   0.642
        1        0.231  0.076  0.051   0.358
        f_X(x)   0.7    0.2    0.1     1
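The marginal sums above can be sketched in a few lines, storing the joint PMF as a dictionary keyed by (x, y):

```python
# Marginal PMFs from the dose/tumor joint PMF: sum the joint over the other variable.
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

fX = {}
fY = {}
for (x, y), p in joint.items():
    fX[x] = fX.get(x, 0) + p   # f_X(x) = sum_y f(x, y)
    fY[y] = fY.get(y, 0) + p   # f_Y(y) = sum_x f(x, y)
```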


  • Discrete random variables

    conditional PMF of Y given X:

    f(y|x) = Prob(Y = y | X = x) = Prob(X = x, Y = y) / Prob(X = x) = f(x, y) / f_X(x)

    conditional = joint / marginal

    Note: Here, x is treated as fixed, so f(y|x) is only a function of y.
    Note: This is not ∑_x f(x, y) = f_Y(y) nor ∑_y f(x, y) = f_X(x).
    Note: Showing that f(y|x) is a valid PMF,

    ∑_y f(y|x) = ∑_y f(y, x) / f_X(x) = [∑_y f(y, x)] / f_X(x) = f_X(x) / f_X(x) = 1



  • Discrete random variables

    Example

    Find f(y|x) and f(x|y).

    The joint PMF is given by

        Y \ X      5      10     20    f_Y(y)
        0        0.469  0.124  0.049   0.642
        1        0.231  0.076  0.051   0.358
        f_X(x)   0.7    0.2    0.1     1

    Prob(Y = 0 | X = 5) = 0.469 / 0.7 = 0.67
    Prob(Y = 1 | X = 5) = 0.231 / 0.7 = 0.33
    Prob(X = 5 | Y = 0) = 0.469 / 0.642 = 0.73
    . . .
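The same table gives every conditional by dividing the joint by the appropriate marginal; a sketch:

```python
# conditional = joint / marginal, on the dose/tumor table.
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

fX = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (5, 10, 20)}
fY = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

p_y0_given_x5 = joint[(5, 0)] / fX[5]   # 0.469 / 0.7
p_x5_given_y0 = joint[(5, 0)] / fY[0]   # 0.469 / 0.642
```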


  • Continuous random variables

    Example

    Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
    The joint PDF is given by

    f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

    Find Prob(X > 7, Y > 40) = ∫_40^50 ∫_7^10 f(x, y) dx dy = . . . = 0.25
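The elided double integral can be reproduced numerically with a midpoint rule over the rectangle x ∈ (7, 10), y ∈ (40, 50):

```python
from math import exp

# Numeric double integral of the joint PDF on the slide over x in (7, 10), y in (40, 50).
def f(x, y):
    return 0.26 * exp(-abs(x - 7) - abs(y - 40))

dx = dy = 0.01
prob = sum(f(7 + (i + 0.5) * dx, 40 + (j + 0.5) * dy) * dx * dy
           for i in range(300) for j in range(1000))
```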


  • Continuous random variables

    Example

    Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
    The joint PDF is given by

    f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

    Find f_X(x) = ∫_20^50 f(x, y) dy = . . . = 0.52 e^(−|x−7|)


  • Continuous random variables

    conditional PDF of Y given X:

    f(y|x) = f(x, y) / f_X(x)

    conditional = joint / marginal

    Note: Here, x is treated as fixed, so f(y|x) is only a function of y.
    Note: This is not ∫ f(x, y) dx = f_Y(y) nor ∫ f(x, y) dy = f_X(x).
    Note: Showing that f(y|x) is a valid PDF,

    ∫ f(y|x) dy = ∫ [f(y, x) / f_X(x)] dy = [∫ f(y, x) dy] / f_X(x) = f_X(x) / f_X(x) = 1


  • Continuous random variables

    Example

    Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
    The joint PDF is given by

    f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

    Find f(y|x).


  • Bivariate normal distribution

    The bivariate normal is the most common multivariate family. There are 5 parameters:

    1 µX is the marginal mean of X.
    2 µY is the marginal mean of Y.
    3 σX² is the marginal variance of X.
    4 σY² is the marginal variance of Y.
    5 ρXY is the correlation between X and Y.

    The joint PDF is

    f(x, y) = 1 / [2πσXσY√(1 − ρ²)] × exp{ −[((x − µX)/σX)² + ((y − µY)/σY)² − 2ρ((x − µX)/σX)((y − µY)/σY)] / [2(1 − ρ²)] }
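As a sanity check on the PDF, the density can be integrated numerically over a large square and should come out near 1 (the parameter values µ = 0, σ = 1, ρ = 0.5 are arbitrary choices):

```python
from math import exp, pi, sqrt

# Bivariate normal PDF as written on the slide; midpoint-rule check that it integrates to ~1.
mu_x = mu_y = 0.0
sd_x = sd_y = 1.0
rho = 0.5

def bvn_pdf(x, y):
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    quad = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * (1 - rho ** 2))
    return exp(-quad) / (2 * pi * sd_x * sd_y * sqrt(1 - rho ** 2))

d = 0.05  # midpoint rule on the square [-6, 6] x [-6, 6]
total = sum(bvn_pdf(-6 + (i + 0.5) * d, -6 + (j + 0.5) * d) * d * d
            for i in range(240) for j in range(240))
```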


  • Bivariate normal distribution

    Example

    Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1.
    Find the marginal distribution of X.


  • Bivariate normal distribution

    Example

    Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1.
    Find the conditional distribution of Y given X.


  • Bayes’ theorem

    Recall conditional distributions:

    f(y|x) = f(x, y) / f(x)

    conditional = joint / marginal

    Can be extended to

    f(y|x) = f(x, y) / f(x) = f(x|y) f(y) / f(x) = f(x|y) f(y) / ∑_{all y} f(x|y) f(y)

    This is the form of the famous “Bayes’ theorem” (or “Bayes’ rule”).
    Note: the denominator is simply a normalizing constant.
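In the discrete case, Bayes' theorem can be run directly on the earlier dose/tumor table, taking f(x) as the prior over dose and f(y = 1 | x) as the likelihood of a tumor:

```python
# Bayes' rule f(x|y) = f(y|x) f(x) / sum_x f(y|x) f(x),
# with values taken from the dose/tumor joint PMF in the earlier example.
prior = {5: 0.7, 10: 0.2, 20: 0.1}                        # f(x)
lik = {5: 0.231 / 0.7, 10: 0.076 / 0.2, 20: 0.051 / 0.1}  # f(y = 1 | x)

evidence = sum(lik[x] * prior[x] for x in prior)          # f(y = 1), the normalizing constant
posterior = {x: lik[x] * prior[x] / evidence for x in prior}
```

Note that the denominator only rescales the numerators so the posterior sums to 1.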


  • Bayes’ theorem

    In a Bayesian data analysis, we select:
    1 the prior f(θ),
    2 the likelihood f(y|θ).

    Based on these, we must compute
    3 the posterior f(θ|y).

    Bayes’ theorem
    The mathematical formula to convert the likelihood and prior to the posterior:

    f(θ|y) = f(y|θ) f(θ) / f(y)

    Posterior ∝ Likelihood × Prior
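A minimal sketch of the Posterior ∝ Likelihood × Prior recipe, using a grid approximation with a Binomial likelihood and a Beta(2, 2) prior (both hypothetical choices, with made-up data):

```python
from math import comb

# Grid approximation of the posterior for a success probability theta.
n, y = 10, 7                                    # made-up data: 7 successes in 10 trials
grid = [(i + 0.5) / 1000 for i in range(1000)]  # grid of theta values in (0, 1)

prior = [t * (1 - t) for t in grid]             # Beta(2, 2) kernel, up to a constant
lik = [comb(n, y) * t ** y * (1 - t) ** (n - y) for t in grid]

unnorm = [l * p for l, p in zip(lik, prior)]    # Posterior ∝ Likelihood × Prior
total = sum(unnorm)
posterior = [u / total for u in unnorm]         # normalize over the grid

post_mean = sum(t * p for t, p in zip(grid, posterior))
```

The exact posterior here is Beta(9, 5), whose mean is 9/14 ≈ 0.64; the grid estimate lands on the same value. Normalizing by the grid sum plays the role of f(y).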
