Applied Bayesian Inference, KSU, April 29, 2012
§❷ An Introduction to Bayesian Inference
Robert J. Tempelman
Bayes Theorem
• Recall a basic axiom of probability: $f(\boldsymbol{\theta}, \mathbf{y}) = f(\mathbf{y}\,|\,\boldsymbol{\theta})\, f(\boldsymbol{\theta})$
• Also: $f(\boldsymbol{\theta}, \mathbf{y}) = f(\boldsymbol{\theta}\,|\,\mathbf{y})\, f(\mathbf{y})$
• Combine both expressions to get:
$$f(\boldsymbol{\theta}\,|\,\mathbf{y}) = \frac{f(\mathbf{y}\,|\,\boldsymbol{\theta})\, f(\boldsymbol{\theta})}{f(\mathbf{y})}$$
or
$$f(\boldsymbol{\theta}\,|\,\mathbf{y}) \propto f(\mathbf{y}\,|\,\boldsymbol{\theta})\, f(\boldsymbol{\theta})$$
Posterior $\propto$ Likelihood $\times$ Prior
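One step the slide leaves implicit (added here for completeness): the normalizing denominator $f(\mathbf{y})$ is the marginal likelihood, obtained by integrating $\boldsymbol{\theta}$ out of the joint density; since it does not involve $\boldsymbol{\theta}$, it can be absorbed into the proportionality constant:
$$f(\mathbf{y}) = \int f(\mathbf{y}\,|\,\boldsymbol{\theta})\, f(\boldsymbol{\theta})\, d\boldsymbol{\theta}$$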
Prior densities/distributions
• What can we specify for $f(\boldsymbol{\theta})$?
– Anything that reflects our prior beliefs.
– Common choice: a "conjugate" prior.
• $f(\boldsymbol{\theta})$ is chosen such that the posterior $f(\boldsymbol{\theta}\,|\,\mathbf{y})$ is recognizable and of the same form.
– "Flat" prior: $f(\boldsymbol{\theta}) \propto \text{constant}$. Then
$$f(\boldsymbol{\theta}\,|\,\mathbf{y}) \propto f(\mathbf{y}\,|\,\boldsymbol{\theta})\, f(\boldsymbol{\theta}) \propto f(\mathbf{y}\,|\,\boldsymbol{\theta})$$
– Flat priors can be dangerous… they can lead to an improper posterior $f(\boldsymbol{\theta}\,|\,\mathbf{y})$; i.e.,
$$\int f(\boldsymbol{\theta}\,|\,\mathbf{y})\, d\boldsymbol{\theta} \rightarrow \infty$$
Prior information / Objective?
• Introducing prior information may somewhat "bias" sample information; nevertheless, ignoring existing prior information is inconsistent with:
– 1) human rational behavior,
– 2) the nature of the scientific method.
– Memory property: past inference (a posterior) can be used as an updated prior in future inference.
• Nevertheless, many applied Bayesian data analysts try to be as "objective" as possible by using diffuse (e.g., flat) priors.
Example of conjugate prior
• Recall the binomial distribution:
$$\Pr(Y = y\,|\,n, p) = \frac{n!}{y!\,(n-y)!}\; p^{y}\, (1-p)^{n-y}$$
• Suppose we express prior belief on $p$ using a beta distribution:
$$f(p\,|\,\alpha, \beta) \propto p^{\alpha-1}\, (1-p)^{\beta-1}$$
– Denoted as Beta($\alpha, \beta$)
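As an illustrative sketch (my addition, not part of the original slides), the following SAS step tabulates the three beta prior densities plotted on the next slide using the built-in PDF function, then overlays them with PROC SGPLOT:

data beta_densities;
   do p = 0 to 1 by 0.01;
      dens_9_1  = pdf('beta', p, 9, 1);    /* Beta(9,1)               */
      dens_1_1  = pdf('beta', p, 1, 1);    /* Beta(1,1): flat, proper */
      dens_2_18 = pdf('beta', p, 2, 18);   /* Beta(2,18)              */
      output;
   end;
run;

proc sgplot data=beta_densities;
   series x=p y=dens_9_1;
   series x=p y=dens_1_1;
   series x=p y=dens_2_18;
run;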
Examples of different beta densities
[Figure: beta densities over $p \in (0,1)$; legend: $\alpha=9, \beta=1$; $\alpha=1, \beta=1$; $\alpha=2, \beta=18$. The Beta(1,1) case is a diffuse (flat) bounded prior (but it is proper since it is bounded!)]
$$E(p\,|\,\alpha, \beta) = \frac{\alpha}{\alpha + \beta}$$
$$\text{var}(p\,|\,\alpha, \beta) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
Posterior density of p
• Posterior $\propto$ Likelihood $\times$ Prior:
$$f(p\,|\,n, y, \alpha, \beta) \propto \Pr(Y = y\,|\,n, p)\; f(p\,|\,\alpha, \beta)$$
$$\propto p^{y}(1-p)^{n-y}\; p^{\alpha-1}(1-p)^{\beta-1}$$
$$= p^{y+\alpha-1}\,(1-p)^{n-y+\beta-1}$$
• i.e., Beta($y+\alpha,\; n-y+\beta$)
• The beta is conjugate to the binomial.
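A minimal SAS sketch (my addition) of this conjugate update: given $y$ successes in $n$ trials and a Beta($\alpha, \beta$) prior, the posterior parameters and posterior mean follow directly. The three priors are the ones used on the next slide:

data post;
   y = 10; n = 15;
   do prior = 1 to 3;
      if prior = 1 then do; alpha = 1; beta = 1;  end;
      else if prior = 2 then do; alpha = 9; beta = 1;  end;
      else do; alpha = 2; beta = 18; end;
      post_alpha = y + alpha;        /* posterior is Beta(y+alpha, n-y+beta) */
      post_beta  = n - y + beta;
      post_mean  = post_alpha/(post_alpha + post_beta);
      output;
   end;
run;

proc print data=post;
   var alpha beta post_alpha post_beta post_mean;
run;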
Suppose we observe data
• y = 10, n = 15.
• Consider three alternative priors:
– Beta(1,1)
– Beta(9,1)
– Beta(2,18)
• Posterior densities: Beta($y+\alpha,\; n-y+\beta$)
[Figure: the three posterior densities over $p$; legend: $\alpha=19, \beta=6$; $\alpha=11, \beta=6$; $\alpha=12, \beta=23$]
Suppose we observed a larger dataset
• y = 100, n = 150.
• Consider the same alternative priors:
– Beta(1,1)
– Beta(9,1)
– Beta(2,18)
• Posterior densities
[Figure: the three posterior densities over $p$; legend: $\alpha=109, \beta=51$; $\alpha=101, \beta=51$; $\alpha=102, \beta=68$]
Posterior information
• Given:
$$\ln f(\boldsymbol{\theta}\,|\,\mathbf{y}) = \text{constant} + \ln f(\mathbf{y}\,|\,\boldsymbol{\theta}) + \ln f(\boldsymbol{\theta})$$
$$-\frac{\partial^2 \ln f(\boldsymbol{\theta}\,|\,\mathbf{y})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'} = -\frac{\partial^2 \ln f(\mathbf{y}\,|\,\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'} - \frac{\partial^2 \ln f(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'}$$
• Posterior information = likelihood information + prior information.
• One option for a point estimate: the joint posterior mode of $\boldsymbol{\theta}$ using Newton-Raphson.
– Also called the MAP (maximum a posteriori) estimate of $\boldsymbol{\theta}$.
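For completeness (a standard detail the slide leaves implicit), the Newton-Raphson iteration for the posterior mode updates the current value using the gradient and Hessian of the log-posterior:

$$\boldsymbol{\theta}^{[t+1]} = \boldsymbol{\theta}^{[t]} - \left[ \frac{\partial^2 \ln f(\boldsymbol{\theta}\,|\,\mathbf{y})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'} \right]^{-1}_{\boldsymbol{\theta} = \boldsymbol{\theta}^{[t]}} \left. \frac{\partial \ln f(\boldsymbol{\theta}\,|\,\mathbf{y})}{\partial \boldsymbol{\theta}} \right|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{[t]}}$$

This is exactly the `theta = theta + firstder/(-secndder)` step coded in the SAS program two slides ahead.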
Recall the plant genetic linkage example
• Recall:
$$p(\mathbf{y}\,|\,\theta) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left( \frac{2+\theta}{4} \right)^{y_1} \left( \frac{1-\theta}{4} \right)^{y_2} \left( \frac{1-\theta}{4} \right)^{y_3} \left( \frac{\theta}{4} \right)^{y_4}$$
• Suppose
$$f(\theta\,|\,\alpha, \beta) \propto \theta^{\alpha-1} (1-\theta)^{\beta-1}$$
• Then
$$f(\theta\,|\,\mathbf{y}, \alpha, \beta) \propto p(\mathbf{y}\,|\,\theta)\, f(\theta\,|\,\alpha, \beta) \propto \left( \frac{2+\theta}{4} \right)^{y_1} \left( \frac{1-\theta}{4} \right)^{y_2+y_3} \left( \frac{\theta}{4} \right)^{y_4} \theta^{\alpha-1} (1-\theta)^{\beta-1}$$
$$\propto (2+\theta)^{y_1}\, (1-\theta)^{y_2+y_3+\beta-1}\, \theta^{y_4+\alpha-1}$$
• Almost as if you increased the number of plants in genotypes 2 and 3 by $\beta - 1$ … and in genotype 4 by $\alpha - 1$.
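Writing out the log-posterior and its first two derivatives (a routine step, spelled out here because the SAS program on the next slide codes them directly):

$$\ln f(\theta\,|\,\mathbf{y}, \alpha, \beta) = \text{constant} + y_1 \ln(2+\theta) + (y_2+y_3+\beta-1)\ln(1-\theta) + (y_4+\alpha-1)\ln\theta$$
$$\frac{\partial \ln f}{\partial \theta} = \frac{y_1}{2+\theta} - \frac{y_2+y_3+\beta-1}{1-\theta} + \frac{y_4+\alpha-1}{\theta}$$
$$\frac{\partial^2 \ln f}{\partial \theta^2} = -\frac{y_1}{(2+\theta)^2} - \frac{y_2+y_3+\beta-1}{(1-\theta)^2} - \frac{y_4+\alpha-1}{\theta^2}$$

These correspond to `logpost`, `firstder`, and `secndder` in the program.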
Plant linkage example cont'd.
• Suppose $\theta \sim$ Beta($\alpha = 50$, $\beta = 500$):

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   alpha = 50; beta = 500;
   theta = 0.01;   /* try starting value of 0.50 too */
   do iterate = 1 to 10;
      logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta)
               + (y4+alpha-1)*log(theta);
      firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta)
               + (y4+alpha-1)/theta;
      secndder = -y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2
               - (y4+alpha-1)/theta**2;
      theta = theta + firstder/(-secndder);   /* Newton-Raphson update */
      output;
   end;
   asyvar = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   poststd = sqrt(asyvar);
   call symputx("poststd", poststd);
   output;   /* final record, hence 11 observations printed */
run;

title "Posterior Standard Error = &poststd";
proc print;
   var iterate theta logpost;
run;

• Posterior standard error:
$$sd(\hat{\theta}\,|\,\mathbf{y}) \approx \left( -\left. \frac{\partial^2 \ln f(\theta\,|\,\mathbf{y})}{\partial \theta^2} \right|_{\theta = \hat{\theta}} \right)^{-1/2}$$
Output
Posterior Standard Error = 0.0057929339

Obs   iterate     theta     logpost
  1      1      0.018318     997.95
  2      2      0.030841    1035.74
  3      3      0.044771    1060.65
  4      4      0.053261    1071.06
  5      5      0.054986    1072.79
  6      6      0.055037    1072.84
  7      7      0.055037    1072.84
  8      8      0.055037    1072.84
  9      9      0.055037    1072.84
 10     10      0.055037    1072.84
 11     11      0.055037    1072.84
Additional elements of Bayesian inference
• Suppose that $\boldsymbol{\theta}$ can be partitioned into two components: a $p \times 1$ vector $\boldsymbol{\theta}_1$ and a $q \times 1$ vector $\boldsymbol{\theta}_2$.
• If you want to make probability statements about $\boldsymbol{\theta}$, use probability calculus:
$$\Pr(a < \theta < b\,|\,\mathbf{y}) = \int_{a}^{b} p(\theta\,|\,\mathbf{y})\, d\theta$$
• There is NO repeated-sampling concept.
– Condition on the one observed dataset.
– However, Bayes estimators typically do have very good frequentist properties!
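As a sketch (my addition), such probability statements are one line of SAS for the conjugate beta posterior from the earlier example, since the posterior CDF is available in closed form:

data postprob;
   y = 10; n = 15;
   alpha = 1; beta = 1;      /* flat Beta(1,1) prior       */
   a = 0.5; b = 0.8;         /* interval of interest for p */
   /* posterior is Beta(y+alpha, n-y+beta) */
   prob = cdf('beta', b, y+alpha, n-y+beta)
        - cdf('beta', a, y+alpha, n-y+beta);
   put a= b= prob=;          /* Pr(a < p < b | y), to the log */
run;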
Marginal vs. conditional inference
• Suppose you're primarily interested in $\boldsymbol{\theta}_1$:
$$p(\boldsymbol{\theta}_1\,|\,\mathbf{y}) = \int p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2\,|\,\mathbf{y})\, d\boldsymbol{\theta}_2 = \int p(\boldsymbol{\theta}_1\,|\,\boldsymbol{\theta}_2, \mathbf{y})\, p(\boldsymbol{\theta}_2\,|\,\mathbf{y})\, d\boldsymbol{\theta}_2 = E_{\boldsymbol{\theta}_2|\mathbf{y}}\!\left[ p(\boldsymbol{\theta}_1\,|\,\boldsymbol{\theta}_2, \mathbf{y}) \right]$$
– i.e., average over the uncertainty in $\boldsymbol{\theta}_2$ (the nuisance parameters).
• Of course, if $\boldsymbol{\theta}_2$ were known, you would condition your inference on it accordingly:
$$p(\boldsymbol{\theta}_1\,|\,\boldsymbol{\theta}_2, \mathbf{y})$$
Two-stage model example
• Given $\mathbf{y}' = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}$ with $y_i \sim \text{NIID}(\mu, \sigma^2)$, where $\sigma^2$ is known. We wish to infer $\mu$. From Bayes theorem:
$$f(\mu\,|\,\mathbf{y}, \mu_a, \sigma_a^2) \propto f(\mathbf{y}\,|\,\mu, \sigma^2)\, f(\mu\,|\,\mu_a, \sigma_a^2)$$
• Suppose $\mu \sim N(\mu_a, \sigma_a^2)$, i.e.
$$f(\mu\,|\,\mu_a, \sigma_a^2) = \frac{1}{\sqrt{2\pi \sigma_a^2}} \exp\!\left( -\frac{1}{2\sigma_a^2} (\mu - \mu_a)^2 \right)$$
Simplify likelihood
$$f(\mathbf{y}\,|\,\mu, \sigma^2) = \prod_{i=1}^{n} f(y_i\,|\,\mu, \sigma^2) = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2 \right)$$
$$\propto \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \bar{y} + \bar{y} - \mu)^2 \right), \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$= \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ (y_i - \bar{y})^2 + 2(y_i - \bar{y})(\bar{y} - \mu) + (\bar{y} - \mu)^2 \right] \right)$$
$$= \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \bar{y})^2 \right) \exp\!\left( -\frac{n}{2\sigma^2} (\bar{y} - \mu)^2 \right) \qquad \text{since } \sum_{i=1}^{n} (y_i - \bar{y}) = 0$$
$$\propto \exp\!\left( -\frac{n}{2\sigma^2} (\bar{y} - \mu)^2 \right)$$
Posterior density
$$f(\mu\,|\,\mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto f(\mathbf{y}\,|\,\mu, \sigma^2)\, f(\mu\,|\,\mu_a, \sigma_a^2) \propto \exp\!\left( -\frac{(\bar{y} - \mu)^2}{2\sigma^2/n} \right) \exp\!\left( -\frac{(\mu - \mu_a)^2}{2\sigma_a^2} \right)$$
• Consider the following limit:
$$\lim_{\sigma_a^2 \to \infty} f(\mu\,|\,\mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto \exp\!\left( -\frac{(\bar{y} - \mu)^2}{2\sigma^2/n} \right)$$
• Consistent with $f(\mu) \propto \text{constant}$ or $f(\mu) = 1$:
$$\mu\,|\,\mathbf{y}, \sigma^2 \sim N\!\left( \bar{y},\, \frac{\sigma^2}{n} \right)$$
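A tiny SAS sketch (my addition, with made-up numbers) of this flat-prior result: the posterior mean is just the sample mean and the posterior standard deviation is $\sigma/\sqrt{n}$:

data flatprior;
   ybar   = 4.2;   /* hypothetical sample mean */
   n      = 25;    /* hypothetical sample size */
   sigma2 = 9;     /* known residual variance  */
   post_mean = ybar;             /* posterior mean under flat prior */
   post_sd   = sqrt(sigma2/n);   /* = sigma/sqrt(n) = 0.6           */
   put post_mean= post_sd=;
run;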
Interpretation of Posterior Density with Flat Prior
• So
$$f(\mu\,|\,\mathbf{y}, \sigma^2) \propto f(\mathbf{y}\,|\,\mu, \sigma^2)\, f(\mu) \propto f(\mathbf{y}\,|\,\mu, \sigma^2)$$
• Then
$$\operatorname*{ArgMax}_{\mu}\, f(\mu\,|\,\mathbf{y}, \sigma^2) = \operatorname*{ArgMax}_{\mu}\, f(\mathbf{y}\,|\,\mu, \sigma^2)$$
• i.e.
$$\text{Posterior mode}(\mu\,|\,\mathbf{y}, \sigma^2) = \text{ML}(\mu\,|\,\mathbf{y}, \sigma^2)$$
Posterior density with informative prior
• Now
$$f(\mu\,|\,\mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto \exp\!\left( -\frac{(\bar{y} - \mu)^2}{2\sigma^2/n} \right) \exp\!\left( -\frac{(\mu - \mu_a)^2}{2\sigma_a^2} \right)$$
• After algebraic simplification:
$$\mu\,|\,\mathbf{y}, \sigma^2, \mu_a, \sigma_a^2 \sim N(\tilde{\mu}, \tilde{\sigma}^2), \qquad \tilde{\mu} = \frac{\dfrac{n}{\sigma^2}\,\bar{y} + \dfrac{1}{\sigma_a^2}\,\mu_a}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}, \qquad \tilde{\sigma}^2 = \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_a^2} \right)^{-1}$$
• Note that
$$\tilde{\mu} = \left( \frac{\dfrac{n}{\sigma^2}}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}} \right) \bar{y} + \left( \frac{\dfrac{1}{\sigma_a^2}}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}} \right) \mu_a$$
and
$$\frac{1}{\tilde{\sigma}^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_a^2}$$
• Posterior precision = prior precision + sample (likelihood) precision; i.e., $\tilde{\mu}$ is a weighted average of the data mean and the prior mean.
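A short SAS sketch (my addition, with illustrative numbers) of this precision weighting:

data precweight;
   ybar = 4.2; n = 25; sigma2 = 9;   /* hypothetical data summary            */
   mu_a = 2.0; sigma2_a = 1.0;       /* hypothetical N(mu_a, sigma2_a) prior */
   prec_data  = n/sigma2;            /* likelihood precision                 */
   prec_prior = 1/sigma2_a;          /* prior precision                      */
   post_var  = 1/(prec_data + prec_prior);   /* posterior variance           */
   post_mean = (prec_data*ybar + prec_prior*mu_a)
             * post_var;                     /* weighted average             */
   put post_mean= post_var=;
run;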
Hierarchical models
• Given the partition $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$:
• Two stage (second-stage parameters $\boldsymbol{\theta}_2$ treated as known):
$$p(\boldsymbol{\theta}_1\,|\,\boldsymbol{\theta}_2, \mathbf{y}) \propto p(\mathbf{y}\,|\,\boldsymbol{\theta}_1)\, p(\boldsymbol{\theta}_1\,|\,\boldsymbol{\theta}_2)$$
• Three stage:
$$p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2\,|\,\mathbf{y}) \propto p(\mathbf{y}\,|\,\boldsymbol{\theta}_1)\, p(\boldsymbol{\theta}_1\,|\,\boldsymbol{\theta}_2)\, p(\boldsymbol{\theta}_2)$$
– What's the difference? When do you consider one over the other?
Simple hierarchical model
• Random effects model:
$$Y_{ij} = \mu + a_i + e_{ij}$$
$\mu$: overall mean; $a_i \sim \text{NIID}(0, \tau^2)$; $e_{ij} \sim \text{NIID}(0, \sigma^2)$.
• Suppose we knew $\mu$, $\sigma^2$, and $\tau^2$:
$$E(\mu + a_i\,|\,\mathbf{y}) = (1 - B)\,\bar{y}_i + B\,\mu$$
$$\text{Var}(\mu + a_i\,|\,\mathbf{y}) = (1 - B)\,\frac{\sigma^2}{n}$$
$$B = \frac{\sigma^2/n}{\tau^2 + \sigma^2/n} \qquad \text{(shrinkage factor)}$$
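A quick numeric illustration (my addition): with $\sigma^2 = 4$, $\tau^2 = 1$, and $n = 5$ records per group,
$$B = \frac{4/5}{1 + 4/5} = \frac{4}{9} \approx 0.44,$$
so each group mean $\bar{y}_i$ is shrunk about 44% of the way toward the overall mean $\mu$; the larger $n$ is, the smaller $B$ becomes and the less shrinkage occurs.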
What if we don't know $\mu$, $\sigma^2$, or $\tau^2$?
• Option 1: Estimate them, e.g., by the method of moments (balanced case, $k$ groups with $n$ records each):
$$\hat{\mu} = \frac{\sum_{i=1}^{k} \bar{y}_i}{k}, \qquad \hat{\sigma}^2 = \frac{\sum_{i,j} (y_{ij} - \bar{y}_i)^2}{k(n-1)}, \qquad \hat{\tau}^2 = \frac{\sum_{i=1}^{k} (\bar{y}_i - \hat{\mu})^2}{k-1} - \frac{\hat{\sigma}^2}{n}$$
• Then "plug them in":
$$E(\mu + a_i\,|\,\mathbf{y}) \approx (1 - \hat{B})\,\bar{y}_i + \hat{B}\,\hat{\mu}, \qquad \text{Var}(\mu + a_i\,|\,\mathbf{y}) \approx (1 - \hat{B})\,\frac{\hat{\sigma}^2}{n}$$
• Not truly Bayesian.
– Empirical Bayes (EB) (next section).
– Most of us using PROC MIXED/GLIMMIX are EB!
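For concreteness (my addition; the dataset and variable names are hypothetical), a random-intercept fit in PROC MIXED whose predicted random effects (EBLUPs) are exactly such plug-in, empirical Bayes predictions:

proc mixed data=mydata;                        /* hypothetical dataset        */
   class group;
   model y = / solution;                       /* fixed part: overall mean mu */
   random intercept / subject=group solution;  /* a_i ~ N(0, tau2); SOLUTION
                                                  prints the EBLUPs           */
run;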
A truly Bayesian approach
• 1) $Y_{ij}\,|\,\theta_i \sim N(\theta_i, \sigma^2)$ for all $i, j$
• 2) $\theta_1, \theta_2, \ldots, \theta_k$ are iid $N(\mu, \tau^2)$
o Structural prior (exchangeable entities)
• 3) $\mu \sim p(\mu)$; $\tau^2 \sim p(\tau^2)$; $\sigma^2 \sim p(\sigma^2)$
o Subjective prior
$$p(\theta_1, \theta_2, \ldots, \theta_k, \mu, \tau^2, \sigma^2\,|\,\mathbf{y}) \propto \left[ \prod_{i=1}^{k} \prod_{j=1}^{n_i} p(y_{ij}\,|\,\theta_i) \right] \left[ \prod_{i=1}^{k} p(\theta_i\,|\,\mu, \tau^2) \right] p(\mu)\, p(\tau^2)\, p(\sigma^2)$$
$$p(\theta_i\,|\,\mathbf{y}) = \int \!\cdots\! \int p(\theta_1, \ldots, \theta_k, \mu, \tau^2, \sigma^2\,|\,\mathbf{y})\; d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_k\, d\mu\, d\tau^2\, d\sigma^2$$
• Fully Bayesian inference (next section after that!)