probability review - applied bayesian...
TRANSCRIPT
-
Probability Review
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics
Loyola University Chicago
August 31, 2017
Applied Bayesian Statistics
Last edited September 8, 2017 by Earvin Balderama
-
Random Variables
Mathematically, a random variable is a function that maps a sample space into the real numbers: X : S → R.
1 Countable (discrete).
2 Uncountable (continuous).
Example: 3 coin tosses
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
We may want to create a random variable, X, defined as the number of tails.
X ∈ {0, 1, 2, 3}
-
Probability
Mathematically, a probability function assigns numbers (between 0 and 1) to subsets of a sample space: P : B → [0,1], ∀B ⊆ S.

Two interpretations:
1 (Frequentist) Based on long-run relative frequencies of possible outcomes.
2 (Bayesian) Based on belief about how likely each possible outcome is.

Regardless of interpretation, the same basic probability laws apply, e.g.,
P(A) ≥ 0,
P(S) = 1,
P(A ∪ B) = P(A) + P(B), for mutually exclusive A and B.
-
Probability distributions
A probability distribution is a list of all possible values of a random variable and their corresponding probabilities.

1 Discrete random variable: probability mass function (PMF)
PMF: f(x) = Prob(X = x) ≥ 0
Mean: E(X) = ∑_x x f(x)
Variance: V(X) = ∑_x [x − E(X)]² f(x)

2 Continuous random variable: probability density function (PDF)
Prob(X = x) = 0 for all x
PDF: f(x) ≥ 0, Prob(X ∈ B) = ∫_B f(x) dx
Mean: E(X) = ∫ x f(x) dx
Variance: V(X) = ∫ [x − E(X)]² f(x) dx
-
Parametric families of distributions
A statistical analysis typically proceeds by selecting a PMF (or PDF) that seems to match the distribution of a sample.
We rarely know the PMF (or PDF) exactly, but we may assume it is from a parametric family of distributions, and estimate the parameters.

1 Discrete random variables
Binomial (Bernoulli is a special case)
Poisson
NegativeBinomial

2 Continuous random variables
Normal
Gamma (Exponential and χ2 are special cases)
InverseGamma
Beta (Uniform is a special case)
-
X ∼ Bernoulli(θ)
Only two outcomes (success/failure, 0/1, zero/nonzero, etc.), where θ is the probability of success.
X ∈ {0, 1}

PMF: f(x) = Prob(X = x) = { 1 − θ, if x = 0; θ, if x = 1 }
Mean: E(X) = ∑_x x f(x) = 0(1 − θ) + 1θ = θ
Variance: V(X) = ∑_x [x − θ]² f(x) = (0 − θ)²(1 − θ) + (1 − θ)²θ = θ(1 − θ)
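As a quick numerical sanity check of these formulas (a Python sketch using scipy.stats, not part of the original slides; θ = 0.3 is an arbitrary illustrative value):

```python
# Verify the Bernoulli mean and variance formulas against scipy.stats.
from scipy.stats import bernoulli

theta = 0.3
X = bernoulli(theta)

assert abs(X.mean() - theta) < 1e-12                # E(X) = theta
assert abs(X.var() - theta * (1 - theta)) < 1e-12   # V(X) = theta(1 - theta)
assert abs(X.pmf(0) - (1 - theta)) < 1e-12          # f(0) = 1 - theta
assert abs(X.pmf(1) - theta) < 1e-12                # f(1) = theta
```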
-
X ∼ Binomial(n, θ)
X = number of “successes” in n independent “Bernoulli trials,” where θ is the probability of success on each trial.
X ∈ {0, 1, . . . , n}

PMF: f(x) = Prob(X = x) = (n choose x) θ^x (1 − θ)^(n−x).
Mean: E(X) = nθ
Variance: V(X) = nθ(1 − θ)
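A numerical check of the Binomial formulas (a Python sketch with scipy.stats, not part of the original slides; n = 10 and θ = 0.4 are illustrative values):

```python
# Check that the Binomial(n, theta) PMF sums to 1 and matches the stated moments.
from scipy.stats import binom

n, theta = 10, 0.4
X = binom(n, theta)

total = sum(X.pmf(x) for x in range(n + 1))
assert abs(total - 1) < 1e-12                          # valid PMF
assert abs(X.mean() - n * theta) < 1e-12               # E(X) = n*theta
assert abs(X.var() - n * theta * (1 - theta)) < 1e-12  # V(X) = n*theta*(1-theta)
```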
-
X ∼ Poisson(λ)
X = number of events that occur in a unit of time.
X ∈ {0, 1, . . .}

PMF: f(x) = Prob(X = x) = λ^x e^(−λ) / x!.

Mean: E(X) = λ
Variance: V(X) = λ

Note: Can be parameterized with λ = nθ, where θ is the expected number of events per unit of time and n is the number of time units observed; then E(X) = V(X) = nθ.
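A numerical check (a Python sketch with scipy.stats, not part of the original slides; λ = 3.5 is an illustrative value), including the classical fact that a Poisson with λ = nθ approximates a Binomial(n, θ) when n is large and θ is small:

```python
# Check E(X) = V(X) = lambda, and the Poisson approximation to the Binomial.
from scipy.stats import binom, poisson

lam = 3.5
X = poisson(lam)
assert abs(X.mean() - lam) < 1e-12
assert abs(X.var() - lam) < 1e-12

# Poisson(n * theta) approximates Binomial(n, theta) for large n, small theta.
n, theta = 1000, lam / 1000
assert abs(binom.pmf(3, n, theta) - X.pmf(3)) < 0.01
```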
-
X ∼ NegativeBinomial(r , θ)
X = number of “failures” until r “successes” in a sequence of independent “Bernoulli trials,” where θ is the probability of success on each trial.
X ∈ {0, 1, . . .}

PMF: f(x) = Prob(X = x) = (x + r − 1 choose x) θ^r (1 − θ)^x.
Mean: E(X) = r(1 − θ)/θ
Variance: V(X) = r(1 − θ)/θ²

Note: The geometric distribution is a special case: Geom(θ) = NB(1, θ).
Note: There are many different ways to specify the NB distribution. The important thing to note is that NB is a discrete count distribution that is a more flexible model than the Poisson.
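A numerical check (a Python sketch with scipy.stats, not part of the original slides; r = 3 and θ = 0.25 are illustrative values). scipy's nbinom uses the same "failures before the r-th success" convention as the slide:

```python
# Check the NegativeBinomial moments and the geometric special case.
from scipy.stats import geom, nbinom

r, theta = 3, 0.25
X = nbinom(r, theta)
assert abs(X.mean() - r * (1 - theta) / theta) < 1e-9
assert abs(X.var() - r * (1 - theta) / theta**2) < 1e-9

# Geom(theta) = NB(1, theta); scipy's geom counts trials rather than
# failures, hence the shift by one below.
assert abs(nbinom.pmf(4, 1, theta) - geom.pmf(5, theta)) < 1e-12
```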
-
X ∼ Normal(µ, σ2)
X ∈ (−∞,∞)
PDF: f(x) = [1/(σ√(2π))] exp[ −(1/2) ((x − µ)/σ)² ].

Mean: E(X) = µ
Variance: V(X) = σ²
-
X ∼ Gamma(a,b)
X ∈ (0,∞)
PDF: f(x) = [b^a / Γ(a)] x^(a−1) e^(−bx).
Mean: E(X) = a/b
Variance: V(X) = a/b²
Parameters: shape a > 0, rate b > 0.
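A numerical check (a Python sketch with scipy.stats, not part of the original slides; a = 2, b = 0.5 are illustrative values). Note that scipy's gamma is parameterized by shape and scale, so the rate b becomes scale = 1/b:

```python
# scipy's gamma uses shape and scale; scale = 1/b for the slide's rate b.
from scipy.stats import gamma

a, b = 2.0, 0.5
X = gamma(a, scale=1 / b)
assert abs(X.mean() - a / b) < 1e-9    # E(X) = a/b
assert abs(X.var() - a / b**2) < 1e-9  # V(X) = a/b^2
```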
-
X ∼ InverseGamma(a,b)
If Y ∼ Gamma(a, b), then X = 1/Y ∼ InverseGamma(a, b).
X ∈ (0,∞)
PDF: f(x) = [b^a / Γ(a)] x^(−a−1) e^(−b/x).
Mean: E(X) = b/(a − 1), for a > 1.
Variance: V(X) = b²/[(a − 1)²(a − 2)], for a > 2.
Parameters: shape a > 0, rate b > 0.
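A numerical check (a Python sketch with scipy.stats, not part of the original slides; a = 3, b = 2 are illustrative values). scipy's invgamma with scale = b matches the slide's InverseGamma(a, b):

```python
# Check the InverseGamma mean and variance formulas.
from scipy.stats import invgamma

a, b = 3.0, 2.0
X = invgamma(a, scale=b)
assert abs(X.mean() - b / (a - 1)) < 1e-9                     # valid for a > 1
assert abs(X.var() - b**2 / ((a - 1)**2 * (a - 2))) < 1e-9    # valid for a > 2
```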
-
X ∼ Beta(a,b)
X ∈ [0,1]
PDF: f(x) = [Γ(a + b) / (Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1).
Mean: E(X) = a/(a + b)
Variance: V(X) = ab/[(a + b)²(a + b + 1)]
Parameters: a > 0, b > 0.
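A numerical check (a Python sketch with scipy.stats, not part of the original slides; a = 2, b = 5 are illustrative values), including the Uniform special case:

```python
# Check the Beta mean and variance, and that Beta(1, 1) = Uniform(0, 1).
from scipy.stats import beta

a, b = 2.0, 5.0
X = beta(a, b)
assert abs(X.mean() - a / (a + b)) < 1e-9
assert abs(X.var() - a * b / ((a + b)**2 * (a + b + 1))) < 1e-9
assert abs(beta(1, 1).pdf(0.42) - 1.0) < 1e-9  # flat density on (0, 1)
```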
-
Joint distributions
A random vector of p random variables: X = (X1, X2, . . . , Xp).

For now, suppose we have just p = 2 random variables, X and Y.
(X, Y) can be discrete or continuous.
-
Joint distributions
1 Discrete (X, Y)
joint PMF: f(x, y) = Prob(X = x, Y = y)
marginal PMF for X: fX(x) = Prob(X = x) = ∑_y f(x, y)
marginal PMF for Y: fY(y) = Prob(Y = y) = ∑_x f(x, y)

2 Continuous (X, Y)
joint PDF: f(x, y)
Prob[(X, Y) ∈ B] = ∫_B f(x, y) dx dy
marginal PDF for X: fX(x) = ∫ f(x, y) dy
marginal PDF for Y: fY(y) = ∫ f(x, y) dx
-
Discrete random variables
Example
Patients are randomly assigned a dose and followed to determine whether they develop a tumor.
X ∈ {5, 10, 20} is the dose; Y ∈ {0, 1} is 1 if a tumor develops and 0 otherwise.
The joint PMF is given by
Y \ X     5       10      20
0         0.469   0.124   0.049
1         0.231   0.076   0.051
-
Discrete random variables
Example
Find the marginal PMFs of X and Y .
fY(0) = ∑_x f(x, 0) = 0.469 + 0.124 + 0.049 = 0.642
fY(1) = ∑_x f(x, 1) = 0.231 + 0.076 + 0.051 = 0.358
fX(5) = 0.7, fX(10) = 0.2, fX(20) = 0.1

Y \ X     5       10      20      fY(y)
0         0.469   0.124   0.049   0.642
1         0.231   0.076   0.051   0.358
fX(x)     0.7     0.2     0.1     1
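The marginals above can be recomputed mechanically by summing the joint table along each axis (a Python sketch with numpy, not part of the original slides):

```python
# Recompute the marginal PMFs from the joint PMF table.
import numpy as np

joint = np.array([[0.469, 0.124, 0.049],   # Y = 0
                  [0.231, 0.076, 0.051]])  # Y = 1 (columns: X = 5, 10, 20)

fY = joint.sum(axis=1)  # sum over x, one entry per row
fX = joint.sum(axis=0)  # sum over y, one entry per column
assert np.allclose(fY, [0.642, 0.358])
assert np.allclose(fX, [0.7, 0.2, 0.1])
assert abs(joint.sum() - 1.0) < 1e-12   # the joint PMF sums to 1
```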
-
Discrete random variables
conditional PMF of Y given X:

f(y|x) = Prob(Y = y |X = x) = Prob(X = x, Y = y) / Prob(X = x) = f(x, y) / fX(x)

conditional = joint / marginal

Note: Here, x is treated as fixed, so f(y|x) is only a function of y.
Note: This is not ∑_x f(x, y) = fY(y) nor ∑_y f(x, y) = fX(x).
Note: Showing that f(y|x) is a valid PMF:
∑_y f(y|x) = ∑_y f(y, x)/fX(x) = [∑_y f(y, x)] / fX(x) = fX(x)/fX(x) = 1
-
Discrete random variables
Example
Find f (y |x) and f (x |y).
The joint PMF is given by
Y \ X     5       10      20      fY(y)
0         0.469   0.124   0.049   0.642
1         0.231   0.076   0.051   0.358
fX(x)     0.7     0.2     0.1     1

Prob(Y = 0 |X = 5) = 0.469/0.7 = 0.67
Prob(Y = 1 |X = 5) = 0.231/0.7 = 0.33
Prob(X = 5 |Y = 0) = 0.469/0.642 ≈ 0.73
. . .
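All of the conditional PMFs can be read off the table at once by dividing each row or column of the joint by the appropriate marginal (a Python sketch with numpy, not part of the original slides):

```python
# Conditional PMFs from the joint table: conditional = joint / marginal.
import numpy as np

joint = np.array([[0.469, 0.124, 0.049],   # Y = 0
                  [0.231, 0.076, 0.051]])  # Y = 1 (columns: X = 5, 10, 20)
fX = joint.sum(axis=0)
fY = joint.sum(axis=1)

f_y_given_x5 = joint[:, 0] / fX[0]   # distribution of Y given X = 5
assert np.allclose(f_y_given_x5, [0.67, 0.33])

f_x_given_y0 = joint[0, :] / fY[0]   # distribution of X given Y = 0
assert abs(f_x_given_y0[0] - 0.469 / 0.642) < 1e-12   # about 0.73
assert abs(f_x_given_y0.sum() - 1.0) < 1e-12          # valid PMF
```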
-
Continuous random variables
Example
Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
The joint PDF is given by
f (x , y) = 0.26 exp(−|x − 7| − |y − 40|).
Find Prob(X > 7, Y > 40)
= ∫_40^50 ∫_7^10 f(x, y) dx dy
= . . .
= 0.25
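The double integral can be checked numerically (a Python sketch with scipy, not part of the original slides); the exact value is 0.26(1 − e⁻³)(1 − e⁻¹⁰) ≈ 0.247, which rounds to 0.25:

```python
# Numerically verify Prob(X > 7, Y > 40) = 0.25 (to two decimals).
import numpy as np
from scipy.integrate import dblquad

f = lambda y, x: 0.26 * np.exp(-abs(x - 7) - abs(y - 40))

# dblquad takes the integrand as func(y, x); x-limits come first, then y-limits.
p, _ = dblquad(f, 7, 10, 40, 50)
assert abs(p - 0.26 * (1 - np.exp(-3)) * (1 - np.exp(-10))) < 1e-6
assert abs(p - 0.25) < 0.005
```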
-
Continuous random variables
Example
Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
The joint PDF is given by
f (x , y) = 0.26 exp(−|x − 7| − |y − 40|).
Find fX(x)
= ∫_20^50 f(x, y) dy
= . . .
= 0.52 e^(−|x−7|)
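The marginal can be checked by numerically integrating out y at a few values of x (a Python sketch with scipy, not part of the original slides; the x values are arbitrary test points):

```python
# Verify fX(x) = 0.52 * exp(-|x - 7|) by integrating the joint PDF over y.
import numpy as np
from scipy.integrate import quad

def fX(x):
    val, _ = quad(lambda y: 0.26 * np.exp(-abs(x - 7) - abs(y - 40)), 20, 50)
    return val

for x in (3.0, 7.0, 9.5):
    assert abs(fX(x) - 0.52 * np.exp(-abs(x - 7))) < 1e-4
```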
-
Continuous random variables
conditional PDF of Y given X:

f(y|x) = f(x, y) / fX(x)

conditional = joint / marginal

Note: Here, x is treated as fixed, so f(y|x) is only a function of y.
Note: This is not ∫ f(x, y) dx = fY(y) nor ∫ f(x, y) dy = fX(x).
Note: Showing that f(y|x) is a valid PDF:
∫ f(y|x) dy = ∫ [f(y, x)/fX(x)] dy = [∫ f(y, x) dy] / fX(x) = fX(x)/fX(x) = 1
-
Continuous random variables
Example
Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks.
The joint PDF is given by
f (x , y) = 0.26 exp(−|x − 7| − |y − 40|).
Find f (y |x).
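Dividing the joint PDF by the marginal fX(x) = 0.52 e^(−|x−7|) found earlier gives f(y|x) = 0.5 e^(−|y−40|), which does not depend on x (for this joint PDF, X and Y are independent). A quick check in Python (not part of the original slides; the x and y values are arbitrary test points):

```python
# Check f(y|x) = f(x, y) / fX(x) = 0.5 * exp(-|y - 40|) for several (x, y).
import numpy as np

f = lambda x, y: 0.26 * np.exp(-abs(x - 7) - abs(y - 40))  # joint PDF
fX = lambda x: 0.52 * np.exp(-abs(x - 7))                  # marginal of X

for x in (4.0, 7.0, 9.0):
    for y in (25.0, 40.0, 45.0):
        cond = f(x, y) / fX(x)
        assert abs(cond - 0.5 * np.exp(-abs(y - 40))) < 1e-12
```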
-
Bivariate normal distribution
The bivariate normal is the most common multivariate family.
There are 5 parameters:
1 µX is the marginal mean of X.
2 µY is the marginal mean of Y.
3 σ²X is the marginal variance of X.
4 σ²Y is the marginal variance of Y.
5 ρXY is the correlation between X and Y.
The joint PDF is
f(x, y) = [1 / (2πσXσY√(1 − ρ²))] exp{ −[ ((x − µX)/σX)² + ((y − µY)/σY)² − 2ρ((x − µX)/σX)((y − µY)/σY) ] / [2(1 − ρ²)] }
-
Bivariate normal distribution
Example
Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1.
Find the marginal distribution of X.
-
Bivariate normal distribution
Example
Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1.
Find the conditional distribution of Y given X.
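The standard result (not derived on the slide) is that in this case Y |X = x ∼ Normal(ρx, 1 − ρ²). A numerical check via conditional = joint/marginal (a Python sketch with scipy, not part of the original slides; ρ and the test points are illustrative):

```python
# For a standard bivariate normal, Y | X = x ~ Normal(rho * x, 1 - rho^2).
import numpy as np
from scipy.stats import multivariate_normal, norm

rho, x = 0.7, 1.3
mvn = multivariate_normal([0, 0], [[1, rho], [rho, 1]])

for y in (-1.0, 0.0, 0.8):
    cond = mvn.pdf([x, y]) / norm.pdf(x)   # joint / marginal of X
    assert abs(cond - norm(rho * x, np.sqrt(1 - rho**2)).pdf(y)) < 1e-12
```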
-
Bayes’ theorem
Recall conditional distributions:
f(y|x) = f(x, y) / f(x)

conditional = joint / marginal

Can be extended to

f(y|x) = f(x, y) / f(x) = f(x|y)f(y) / f(x) = f(x|y)f(y) / ∑_{all y} f(x|y)f(y)
This is the form of the famous “Bayes’ theorem” (or “Bayes’ rule”).
Note: the denominator is simply a normalizing constant.
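Bayes' theorem in this form can be applied to the earlier dose/tumor table, treating fX as the prior over dose and Prob(Y = 0 |X = x) as the likelihood (a Python sketch, not part of the original slides):

```python
# Bayes' theorem on the dose/tumor table: recover f(x|y) from f(y|x) and f(x).
fX = {5: 0.7, 10: 0.2, 20: 0.1}               # marginal (prior) over dose
fY0_given_x = {5: 0.67, 10: 0.62, 20: 0.49}   # Prob(Y = 0 | X = x) = f(x,0)/fX(x)

num = {x: fY0_given_x[x] * fX[x] for x in fX}  # f(x|y) numerator terms
denom = sum(num.values())                      # normalizing constant = fY(0)
post = {x: num[x] / denom for x in fX}

assert abs(denom - 0.642) < 1e-12
assert abs(post[5] - 0.469 / 0.642) < 1e-12    # matches f(x, 0)/fY(0) directly
assert abs(sum(post.values()) - 1.0) < 1e-12
```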
-
Bayes’ theorem
In a Bayesian data analysis, we select:
1 the prior f(θ),
2 the likelihood f(y|θ).
Based on these, we must compute
3 the posterior f(θ|y).
Bayes’ theorem
The mathematical formula to convert the likelihood and prior to the posterior.
f(θ|y) = f(y|θ)f(θ) / f(y)
Posterior ∝ Likelihood × Prior
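A minimal sketch of Posterior ∝ Likelihood × Prior on a grid, assuming an illustrative Beta(2, 2) prior for θ and a Binomial likelihood with y = 7 successes in n = 10 trials; by conjugacy (both distributions appear earlier in the review) the exact posterior is Beta(9, 5):

```python
# Grid approximation: multiply prior by likelihood pointwise, then normalize.
import numpy as np
from scipy.stats import beta, binom

y, n = 7, 10
theta = np.linspace(0.0001, 0.9999, 9999)    # grid over the parameter

prior = beta.pdf(theta, 2, 2)                # f(theta)
likelihood = binom.pmf(y, n, theta)          # f(y | theta)
post = prior * likelihood                    # unnormalized posterior
post /= post.sum() * (theta[1] - theta[0])   # divide by f(y), numerically

exact = beta.pdf(theta, 2 + y, 2 + n - y)    # conjugate answer: Beta(9, 5)
assert np.max(np.abs(post - exact)) < 0.01
```

The normalizing constant f(y) never needs to be computed in closed form, which is exactly why the proportional form of Bayes' theorem is so useful.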