§❹ The Bayesian Revolution: Markov Chain Monte Carlo (MCMC)
Robert J. Tempelman
Simulation-based inference
• Suppose you're interested in the following integral/expectation:

$$E_f[g(x)] = \int g(x)\,f(x)\,dx$$

where f(x) is a density and g(x) is a function.

• You can draw random samples x1, x2, …, xn from f(x). Then compute

$$\hat{E}[g(x)] = \frac{1}{n}\sum_{i=1}^{n} g(x_i) \rightarrow E[g(x)] \text{ as } n \rightarrow \infty$$

• With Monte Carlo standard error:

$$\sqrt{\frac{1}{n}\left(\frac{1}{n-1}\sum_{i=1}^{n}\left(g(x_i)-\hat{E}[g(x)]\right)^2\right)}$$
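A minimal SAS sketch of this estimator (illustrative only, not from the original slides; here f(x) is taken to be standard normal and g(x) = x², so the true expectation is 1):

data mc;
 seed = 1234; n = 10000;
 sum = 0; sumsq = 0;
 do i = 1 to n;
    g = rannor(seed)**2;  /* g(x) = x**2 with x ~ N(0,1) */
    sum = sum + g;
    sumsq = sumsq + g*g;
 end;
 ghat = sum/n;                                /* Monte Carlo estimate of E[g(x)] */
 mcse = sqrt((sumsq - n*ghat**2)/(n*(n-1)));  /* Monte Carlo standard error */
 put ghat= mcse=;
run;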
Beauty of Monte Carlo methods
• You can determine the distribution of any function of the random variable(s).
• Distribution summaries include:
  – Means,
  – Medians,
  – Key percentiles (2.5%, 97.5%),
  – Standard deviations,
  – Etc.
• Generally more reliable than using the "delta method", especially for highly non-normal distributions.
Using method of composition for sampling (Tanner, 1996)
• Involves two stages of sampling.
• Example:
  – Suppose $Y_i | \lambda_i \sim \text{Poisson}(\lambda_i)$:

$$\text{Prob}(Y_i = y_i|\lambda_i) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}$$

  – In turn, $\lambda_i | \alpha, \beta \sim \text{Gamma}(\alpha, \beta)$:

$$p(\lambda_i|\alpha,\beta) = \frac{\beta^{\alpha}\lambda_i^{\alpha-1}e^{-\beta\lambda_i}}{\Gamma(\alpha)}$$

  – Then

$$\text{Prob}(Y_i = y_i|\alpha,\beta) = \int_{R_\lambda} \text{Prob}(Y_i = y_i|\lambda_i)\,p(\lambda_i|\alpha,\beta)\,d\lambda_i = \frac{\Gamma(y_i+\alpha)}{\Gamma(\alpha)\,y_i!}\left(\frac{\beta}{1+\beta}\right)^{\alpha}\left(\frac{1}{1+\beta}\right)^{y_i}$$

i.e., a negative binomial distribution with mean $\alpha/\beta$ and variance $(\alpha/\beta)(1+\beta^{-1})$.
Using method of composition for sampling from the negative binomial:
1. Draw $\lambda_i | \alpha,\beta \sim \text{Gamma}(\alpha,\beta)$.
2. Draw $Y_i \sim \text{Poisson}(\lambda_i)$.

data new;
 seed1 = 2; alpha = 2; beta = 0.25;
 do j = 1 to 10000;
    call rangam(seed1,alpha,x);
    lambda = x/beta;
    call ranpoi(seed1,lambda,y);
    output;
 end;
run;
proc means mean var; var y; run;

The MEANS Procedure
Variable    Mean      Variance
y           7.9749    39.2638

E(y) = α/β = 2/0.25 = 8
Var(y) = (α/β)(1+β⁻¹) = 8*(1+4) = 40
Another example? Student t.
1. Draw $\lambda_i | \nu \sim \text{Gamma}(\nu/2, \nu/2)$.
2. Draw $t_i | \lambda_i \sim \text{Normal}(0, 1/\lambda_i)$.
Then $t \sim$ Student $t_\nu$.

data new;
 seed1 = 29523; df=4;
 do j = 1 to 100000;
    call rangam(seed1,df/2,x);
    lambda = x/(df/2);
    t = rannor(seed1)/sqrt(lambda);
    output;
 end;
run;
proc means mean var p5 p95; var t; run;
data new; t5 = tinv(.05,4); t95 = tinv(.95,4); run;
proc print; run;

Variable    Mean        Variance    5th Pctl    95th Pctl
t           -0.00524    2.011365    -2.1376     2.122201

Obs    t5         t95
1      -2.1319    2.13185
Expectation-Maximization (EM)
• OK, I know that EM is NOT a simulation-based inference procedure.
  – However, it is based on data augmentation.
• Important progenitor of Markov Chain Monte Carlo (MCMC) methods.
  – Recall the plant genetics example:
$$L(\theta|\mathbf{y}) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{1}{2}+\frac{\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$

i.e.,

$$L(\theta|\mathbf{y}) \propto \left(\frac{1}{2}+\frac{\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2+y_3}\left(\frac{\theta}{4}\right)^{y_4}$$
Data augmentation
• Augment the "data" by splitting the first cell into two cells with probabilities ½ and θ/4, giving 5 categories (x = count in the θ/4 cell):

$$L(\theta|\mathbf{y},x) \propto \left(\frac{1}{2}\right)^{y_1-x}\left(\frac{\theta}{4}\right)^{x}\left(\frac{1-\theta}{4}\right)^{y_2+y_3}\left(\frac{\theta}{4}\right)^{y_4} \propto \theta^{\,x+y_4}\,(1-\theta)^{\,y_2+y_3}$$

• Compare with the kernel $p(\theta) \propto \theta^{\,a-1}(1-\theta)^{\,b-1}$:

Looks like a Beta distribution to me!
Data augmentation (cont'd)
• So the joint distribution of the "complete" data:

$$p(x,\mathbf{y}|\theta) = \frac{n!}{(y_1-x)!\,x!\,y_2!\,y_3!\,y_4!}\left(\frac{1}{2}\right)^{y_1-x}\left(\frac{\theta}{4}\right)^{x}\left(\frac{1-\theta}{4}\right)^{y_2+y_3}\left(\frac{\theta}{4}\right)^{y_4}$$

• Consider the part just including the "missing data" x:

$$p(x|\theta,\mathbf{y}) \propto \left(\frac{2}{2+\theta}\right)^{y_1-x}\left(\frac{\theta}{2+\theta}\right)^{x}$$

i.e., binomial: $x|\theta,\mathbf{y} \sim \text{Binomial}\left(y_1, \frac{\theta}{2+\theta}\right)$, so that

$$E[x|\theta,\mathbf{y}] = y_1\frac{\theta}{2+\theta}$$
Expectation-Maximization.
• Start with the complete log-likelihood:

$$\log L(\theta|\mathbf{y},x) = \text{const} + (x+y_4)\log\theta + (y_2+y_3)\log(1-\theta)$$

• 1. Expectation (E-step):

$$E\left[\log L(\theta|\mathbf{y},x)\right] = \left(\hat{x}^{[t]}+y_4\right)\log\theta + (y_2+y_3)\log(1-\theta)$$

where

$$\hat{x}^{[t]} = E\left[x|\theta^{[t]},\mathbf{y}\right] = y_1\frac{\theta^{[t]}}{\theta^{[t]}+2}$$
• 2. Maximization step
  – Use first- or second-derivative methods to maximize $E\left[\log L(\theta|\mathbf{y},x)\right]$.
  – Set the first derivative to 0:

$$\frac{\partial E\left[\log L(\theta|\mathbf{y},x)\right]}{\partial\theta} = \frac{\hat{x}^{[t]}+y_4}{\theta} - \frac{y_2+y_3}{1-\theta} = 0$$

Solving gives the update

$$\theta^{[t+1]} = \frac{\hat{x}^{[t]}+y_4}{\hat{x}^{[t]}+y_2+y_3+y_4}$$
Recall the data

Genotype    Probability    Data (Counts)
A_B_        1/2 + θ/4      y1 = 1997
aaB_        (1−θ)/4        y2 = 906
A_bb        (1−θ)/4        y3 = 904
aabb        θ/4            y4 = 32

0 ≤ θ ≤ 1; θ → 0: close linkage in repulsion; θ → 1: close linkage in coupling.
PROC IML code:

proc iml;
 y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
 theta = 0.20; /* Starting value */
 do iter = 1 to 20;
    Ex2 = y1*(theta)/(theta+2);        /* E-step */
    theta = (Ex2+y4)/(Ex2+y2+y3+y4);   /* M-step */
    print iter theta;
 end;
run;

iter    theta
1       0.1055303
2       0.0680147
3       0.0512031
4       0.0432646
5       0.0394234
6       0.0375429
7       0.036617
8       0.0361598
9       0.0359338
10      0.0358219
11      0.0357666
12      0.0357392
13      0.0357256
14      0.0357189
15      0.0357156
16      0.0357139
17      0.0357131
18      0.0357127
19      0.0357125
20      0.0357124
Slower than Newton-Raphson/Fisher scoring…but generally more robust to poorer starting values.
How do we derive an asymptotic standard error using EM?
• From Louis (1982), the observed information is the complete-data information minus the missing information:

$$-\frac{\partial^2 \log p(\theta|\mathbf{y})}{\partial\theta^2} = E_{x|\theta,\mathbf{y}}\left[-\frac{\partial^2 \log p(\theta|\mathbf{y},x)}{\partial\theta^2}\right] - \text{var}_{x|\theta,\mathbf{y}}\left[\frac{\partial \log p(\theta|\mathbf{y},x)}{\partial\theta}\right]$$

Given:

$$p(\theta|\mathbf{y},x) \propto \theta^{\,x+y_4}(1-\theta)^{\,y_2+y_3}
\quad\Rightarrow\quad
\frac{\partial \log p(\theta|\mathbf{y},x)}{\partial\theta} = \frac{x+y_4}{\theta} - \frac{y_2+y_3}{1-\theta}$$

Since $x|\hat{\theta},\mathbf{y} \sim \text{Binomial}\left(y_1,\,\hat{\theta}/(\hat{\theta}+2)\right)$:

$$\text{var}(x|\hat{\theta},\mathbf{y}) = y_1\frac{2\hat{\theta}}{(2+\hat{\theta})^2} = 1997\frac{2(0.0357)}{(2+0.0357)^2} = 34.42$$

so that

$$\text{var}\left[\frac{\partial \log p(\theta|\mathbf{y},x)}{\partial\theta}\,\Big|\,\hat{\theta},\mathbf{y}\right] = \frac{\text{var}(x|\hat{\theta},\mathbf{y})}{\hat{\theta}^2} = \frac{34.42}{\hat{\theta}^2} = 26987.41$$
Finish off
• Now

$$\frac{\partial^2 \log p(\theta|\mathbf{y},x)}{\partial\theta^2} = -\frac{x+y_4}{\theta^2} - \frac{y_2+y_3}{(1-\theta)^2}$$

so that

$$E\left[-\frac{\partial^2 \log p(\theta|\mathbf{y},x)}{\partial\theta^2}\,\Big|\,\hat{\theta},\mathbf{y}\right] = \frac{\hat{x}+y_4}{\hat{\theta}^2} + \frac{y_2+y_3}{(1-\hat{\theta})^2} = 54507.06$$

• Hence:

$$-\frac{\partial^2 \log p(\theta|\mathbf{y})}{\partial\theta^2} = 54507.06 - 26987.41 = 27519.65$$

$$\text{se}(\hat{\theta}) = \frac{1}{\sqrt{27519.65}} = 0.0060$$
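These numbers can be reproduced with a short PROC IML sketch (hypothetical code, not from the original slides):

proc iml;
 y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
 theta = 0.0357124;                                   /* EM estimate */
 xhat = y1*theta/(theta+2);                           /* E(x|theta,y) */
 compinfo = (xhat+y4)/theta**2 + (y2+y3)/(1-theta)**2;  /* complete information, ~54507 */
 varx = y1*(theta/(theta+2))*(2/(theta+2));           /* var(x|theta,y), ~34.42 */
 missinfo = varx/theta**2;                            /* missing information, ~26987 */
 obsinf = compinfo - missinfo;                        /* observed information, ~27520 */
 se = 1/sqrt(obsinf);                                 /* asymptotic se, ~0.0060 */
 print compinfo missinfo obsinf se;
run;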
Stochastic Data Augmentation (Tanner, 1996)
• Posterior identity:

$$p(\theta|\mathbf{y}) = \int_{R_x} p(\theta|x,\mathbf{y})\,p(x|\mathbf{y})\,dx$$

• Predictive identity:

$$p(x|\mathbf{y}) = \int_{R_\varphi} p(x|\varphi,\mathbf{y})\,p(\varphi|\mathbf{y})\,d\varphi$$

• Implies:

$$p(\theta|\mathbf{y}) = \int_{R_\varphi}\left[\int_{R_x} p(\theta|x,\mathbf{y})\,p(x|\varphi,\mathbf{y})\,dx\right]p(\varphi|\mathbf{y})\,d\varphi = \int_{R_\varphi} K(\theta,\varphi)\,p(\varphi|\mathbf{y})\,d\varphi$$

where $K(\theta,\varphi) = \int_{R_x} p(\theta|x,\mathbf{y})\,p(x|\varphi,\mathbf{y})\,dx$ is the transition function for a Markov chain.

Suggests an "iterative" method of composition approach for sampling.
Sampling strategy from p(θ|y)
• Start somewhere (starting value = θ[0]):
  – Cycle 1: sample x[1] from p(x|θ[0], y); sample θ[1] from p(θ|x[1], y).
  – Cycle 2: sample x[2] from p(x|θ[1], y); sample θ[2] from p(θ|x[2], y).
  – etc.
  – It's like sampling from "E-steps" and "M-steps".
What are these Full Conditional Densities (FCD)?
• Recall the "complete" likelihood function (note: on this and the following slides x counts the ½ cell, so y1 − x is the θ/4 cell):

$$p(x,\mathbf{y}|\theta) = \frac{n!}{x!\,(y_1-x)!\,y_2!\,y_3!\,y_4!}\left(\frac{1}{2}\right)^{x}\left(\frac{\theta}{4}\right)^{y_1-x}\left(\frac{1-\theta}{4}\right)^{y_2+y_3}\left(\frac{\theta}{4}\right)^{y_4}$$

• Assume the prior on θ is "flat": $p(\theta) \propto 1$, so $p(\theta|\mathbf{y},x) \propto p(x,\mathbf{y}|\theta)\,p(\theta)$.
• FCD:

$$p(\theta|\mathbf{y},x) \propto \theta^{(y_1-x)+y_4}\,(1-\theta)^{y_2+y_3}
\;\Rightarrow\;
\theta|\mathbf{y},x \sim \text{Beta}\left(a = y_1-x+y_4+1,\; b = y_2+y_3+1\right)$$

$$p(x|\theta,\mathbf{y}) \propto \binom{y_1}{x}\left(\frac{2}{2+\theta}\right)^{x}\left(\frac{\theta}{2+\theta}\right)^{y_1-x}
\;\Rightarrow\;
x|\theta,\mathbf{y} \sim \text{Binomial}\left(n = y_1,\; p = \frac{2}{\theta+2}\right)$$
IML code for Chained Data Augmentation Example

proc iml;
 seed1 = 4;
 ncycle = 10000;            /* total number of samples */
 theta = j(ncycle,1,0);
 y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
 beta = y2+y3+1;
 theta[1] = ranuni(seed1);  /* starting value: initial draw between 0 and 1 */
 do cycle = 2 to ncycle;
    p = 2/(2+theta[cycle-1]);
    xvar = ranbin(seed1,y1,p);
    alpha = y1+y4-xvar+1;
    xalpha = rangam(seed1,alpha);
    xbeta = rangam(seed1,beta);
    theta[cycle] = xalpha/(xalpha+xbeta);
 end;
 create parmdata var {theta xvar};
 append;
run;
data parmdata; set parmdata; cycle = _n_; run;

The Beta draw uses the identity

$$\text{Beta}(a,b) = \frac{\text{Gamma}(a,1)}{\text{Gamma}(a,1)+\text{Gamma}(b,1)}$$
Trace Plot
proc gplot data=parmdata; plot theta*cycle; run;

Burn-in?
• A "bad" starting value should have no impact: one should discard the first "few" samples to ensure that one is truly sampling from p(θ|y) ("convergence in distribution").
• How to decide on this stuff? Cowles and Carlin (1996).
• Here: throw away the first 1000 samples as "burn-in".
Histogram of samples post burn-in

proc univariate data=parmdata;
 where cycle > 1000;
 var theta;
 histogram / normal(color=red mu=0.0357 sigma=0.0060);
run;

Bayesian inference:
N                      9000
Posterior Mean         0.03671503
Post. Std Deviation    0.00607971

Quantiles for Normal Distribution
Percent    Observed (Bayesian)    Asymptotic (Likelihood)
5.0        0.02702                0.02583
95.0       0.04728                0.04557
Zooming in on Trace Plot
• Hints of autocorrelation. Expected with Markov Chain Monte Carlo simulation schemes.
• The number of drawn samples is NOT equal to the number of independent draws.
• The greater the autocorrelation, the greater the problem: you need more samples!
Sample autocorrelation
proc arima data=parmdata plots(only)=series(acf);
 where cycle > 1000;
 identify var=theta nlag=1000 outcov=autocov;
run;

Autocorrelation Check for White Noise
To Lag    Chi-Square    DF    Pr > ChiSq    Autocorrelations
6         3061.39       6     <.0001        0.497  0.253  0.141  0.079  0.045  0.029
How to estimate the effective number of independent samples (ESS)
• Consider the posterior mean based on m samples:

$$\hat{\theta}_m = \frac{1}{m}\sum_{i=1}^{m}\theta^{[i]}$$

• Initial positive sequence estimator of its Monte Carlo variance (Geyer, 1992; Sorensen and Gianola, 1995):

$$\widehat{\text{var}}\left(\hat{\theta}_m\right) = \frac{1}{m}\left(-\hat{\gamma}_m(0) + 2\sum_{j=0}^{t}\hat{\Gamma}_m(j)\right)$$

where

$$\hat{\gamma}_m(j) = \frac{1}{m}\sum_{i=1}^{m-j}\left(\theta^{[i]}-\hat{\theta}_m\right)\left(\theta^{[i+j]}-\hat{\theta}_m\right)$$

is the lag-j autocovariance ($\hat{\gamma}_m(0)$ being the variance), and

$$\hat{\Gamma}_m(j) = \hat{\gamma}_m(2j) + \hat{\gamma}_m(2j+1),\quad j = 0,1,\ldots,t$$

is the sum of adjacent lag autocovariances.
Initial positive sequence estimator
• Choose t such that all $\hat{\Gamma}_m(j) > 0,\; j = 0,1,\ldots,t$.
• SAS PROC MCMC chooses a slightly different cutoff (see documentation).
• Effective sample size:

$$ESS = \frac{\hat{\gamma}_m(0)}{\widehat{\text{var}}\left(\hat{\theta}_m\right)}$$

• Extensive autocorrelation across lags leads to a smaller ESS.
SAS code

%macro ESS1(data,variable,startcycle,maxlag);
data _null_;
 set &data nobs=_n;
 call symputx('nsample',_n);
run;
proc arima data=&data;
 where cycle > &startcycle;  /* the saved data set indexes samples by "cycle" */
 identify var=&variable nlag=&maxlag outcov=autocov;
run;

proc iml;
 use autocov;
 read all var{'COV'} into cov;
 nsample = &nsample;
 nlag2 = nrow(cov)/2;
 Gamma = j(nlag2,1,0);
 cutoff = 0; t = 0;
 do while (cutoff = 0);
    t = t+1;
    Gamma[t] = cov[2*(t-1)+1] + cov[2*(t-1)+2];  /* sum of adjacent lag autocovariances */
    if Gamma[t] < 0 then cutoff = 1;
    if t = nlag2 then do;
       print "Too much autocorrelation";
       print "Specify a larger max lag";
       stop;
    end;
 end;
 varm = (-Cov[1] + 2*sum(Gamma)) / nsample;
 ESS = Cov[1]/varm;   /* effective sample size */
 stdm = sqrt(varm);   /* Monte Carlo standard error */
 parameter = "&variable";
 print parameter stdm ESS;
run;
%mend ESS1;
Recall: 9000 MCMC post burnin cycles.
Executing %ESS1
• %ESS1(parmdata,theta,1000,1000);
Recall: 1000 MCMC burnin cycles.
parameter    stdm         ESS
theta        0.0001116    2967.1289
i.e. information equivalent to drawing 2967 independent draws from density.
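As a quick consistency check (not on the original slide): the Monte Carlo standard error and ESS should roughly satisfy sd(θ) ≈ stdm × √ESS. Here 0.0001116 × √2967.1 ≈ 0.0061, which matches the posterior standard deviation (≈ 0.00608) reported earlier.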
How large of an ESS should I target?
• Routinely… in the thousands or greater.
• Depends on what you want to estimate.
  – Recommend no less than 100 for estimating "typical" location parameters: mean, median, etc.
  – Several times that for "typical" dispersion parameters like variances.
• Want to provide key percentiles, i.e., the 2.5th and 97.5th percentiles? Need to have ESS in the thousands!
  – See Raftery and Lewis (1992) for further direction.
Worthwhile to consider this sampling strategy?
• Not too much difference, if any, with likelihood inference.
• But how about smaller samples?
  – e.g., y1=200, y2=91, y3=90, y4=3.
  – Different story.
Gibbs sampling: origins (Geman and Geman, 1984)
• Gibbs sampling was first developed in statistical physics in relation to a spatial inference problem.
  – Problem: a true image was corrupted by a stochastic process to produce an observable image y (data).
  – Objective: restore or estimate the true image in light of the observed image y.
  – Inference was based on the Markov random field joint posterior distribution, through successively drawing from updated FCD, which were rather easy to specify.
  – These FCD each happened to be Gibbs distributions; the misnomer has been used since to describe a rather general process.
Gibbs sampling
• Extension of chained data augmentation for the case of several unknown parameters.
• Consider p = 3 unknown parameters: θ1, θ2, θ3.
• Joint posterior density: p(θ1, θ2, θ3 | y).
• Gibbs sampling: MCMC sampling strategy where all FCD are recognizable:

p(θ1 | θ2, θ3, y)
p(θ2 | θ1, θ3, y)
p(θ3 | θ1, θ2, y)
Gibbs sampling: the process
1) Start with some "arbitrary" starting values θ1[0], θ2[0], θ3[0] (but within the allowable parameter space).
2) Draw θ1[1] from p(θ1 | θ2[0], θ3[0], y).
3) Draw θ2[1] from p(θ2 | θ1[1], θ3[0], y).
4) Draw θ3[1] from p(θ3 | θ1[1], θ2[1], y).
5) Repeat steps 2)–4) m times.

Steps 2–4 constitute one cycle of Gibbs sampling; m is the length of the Gibbs chain. One cycle = one random draw from p(θ1, θ2, θ3 | y).
General extension of Gibbs sampling
• When there are d parameters and/or blocks of parameters, θ' = [θ1 θ2 … θd]:
• Again specify starting values θ1(0), θ2(0), …, θd(0).
• Sample from the FCD in cycle k+1:
  – Sample θ1(k+1) from p(θ1 | θ2(k), θ3(k), …, θd(k), y)
  – Sample θ2(k+1) from p(θ2 | θ1(k+1), θ3(k), …, θd(k), y)
  – …
  – Sample θd(k+1) from p(θd | θ1(k+1), θ2(k+1), …, θd−1(k+1), y)
• Generically, sample θi from p(θi | θ−i, y), where θ−i denotes all elements of θ except θi. A minimal sketch of this cycling appears below.
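A minimal PROC IML illustration of the cycling scheme (a hypothetical toy example, not from the original slides: a bivariate normal with correlation rho, where both FCD are univariate normals):

proc iml;
 seed = 111; rho = 0.8; m = 5000;
 theta = j(m,2,0);   /* columns theta1, theta2; start the chain at (0,0) */
 do k = 2 to m;
    /* FCD: theta1|theta2 ~ N(rho*theta2, 1-rho**2), and vice versa */
    theta[k,1] = rho*theta[k-1,2] + sqrt(1-rho**2)*rannor(seed);
    theta[k,2] = rho*theta[k,1]   + sqrt(1-rho**2)*rannor(seed);
 end;
 means = theta[:,];  /* posterior means: both should be near 0 */
 print means;
run;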
• Throw away enough burn-in samples θ(1), …, θ(k) (k < m).
• θ(k+1), θ(k+2), …, θ(m) are a realization of a Markov chain with equilibrium distribution p(θ|y).
• The m−k joint samples θ(k+1), θ(k+2), …, θ(m) are then considered to be random drawings from the joint posterior density p(θ|y).
• Individually, the m−k samples θj(k+1), θj(k+2), …, θj(m) are random samples of θj from the marginal posterior density p(θj|y), j = 1, 2, …, d.
  – i.e., θ−j are "nuisance" variables if interest is directed on θj.
Mixed model example with known variance components, flat prior on β
• Recall:

$$\boldsymbol{\beta},\mathbf{u}\,|\,\sigma_e^2,\sigma_u^2,\mathbf{y} \sim N\left(\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\mathbf{u}}\end{bmatrix},\,\mathbf{C}^{-1}\right)$$

  – where $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{u}}$ solve the mixed model equations

$$\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z}+\mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\mathbf{u}}\end{bmatrix}=\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$$

• Write $\boldsymbol{\theta} = \begin{bmatrix}\boldsymbol{\beta}\\ \mathbf{u}\end{bmatrix}$, $\hat{\boldsymbol{\theta}} = \begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\mathbf{u}}\end{bmatrix}$, and

$$\mathbf{C}=\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z}+\mathbf{G}^{-1}\end{bmatrix}$$

  – i.e., $\boldsymbol{\theta}\,|\,\sigma_e^2,\sigma_u^2,\mathbf{y} \sim N(\hat{\boldsymbol{\theta}},\,\mathbf{C}^{-1})$: WE ALREADY KNOW THE JOINT POSTERIOR DENSITY!
FCD for mixed effects model with known variance components
• OK… really pointless to use MCMC here… but let's demonstrate. It can be shown that the FCD are:

$$\theta_i\,|\,\boldsymbol{\theta}_{-i},\sigma_e^2,\sigma_u^2,\mathbf{y} \sim N\left(\tilde{\theta}_i,\, v_i\right)$$

• where

$$\tilde{\theta}_i = \frac{b_i - \sum_{j=1,\,j\neq i}^{p+q} c_{ij}\theta_j}{c_{ii}},
\qquad
v_i = \frac{1}{c_{ii}}$$

with $b_i$ the ith element of the right-hand side $\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$, $c_{ij}$ the elements of the ith row of $\mathbf{C}$, and $c_{ii}$ the ith diagonal element of $\mathbf{C}$.
Two ways to sample β and u
• 1. Block draw from $\boldsymbol{\beta},\mathbf{u}\,|\,\sigma_e^2,\sigma_u^2,\mathbf{y} \sim N(\hat{\boldsymbol{\theta}},\,\mathbf{C}^{-1})$:
  – faster MCMC mixing (less/no autocorrelation across MCMC cycles),
  – but slower computing time (depending on the dimension of θ), i.e., requires the Cholesky factor of C (see the sketch after this list),
  – some alternative strategies are available (Garcia-Cortes and Sorensen, 1995).
• 2. Series of univariate draws from $\theta_i\,|\,\boldsymbol{\theta}_{-i},\sigma_e^2,\sigma_u^2,\mathbf{y} \sim N(\tilde{\theta}_i, v_i),\; i = 1,2,\ldots,p+q$:
  – faster computationally,
  – slower MCMC mixing.
    • Partial solution: "thinning" the MCMC chain, e.g., save every 10th cycle rather than every cycle.
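Option 1 can be sketched in a few PROC IML lines (a sketch only: C, thetahat, and seed are assumed to already exist from the mixed model equations):

/* one block draw from N(thetahat, inv(C)) */
U = root(C);                     /* Cholesky factor: C = U`*U, U upper triangular */
z = rannor(j(nrow(C),1,seed));   /* z ~ N(0, I) */
theta = thetahat + solve(U,z);   /* var(solve(U,z)) = inv(U)*inv(U)` = inv(C) */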
Example: A split plot in time (data from Kuehl, 2000, pg. 493)
• Experiment designed to explore mechanisms for early detection of phlebitis during amiodarone therapy.
  – Three intravenous treatments: (A1) amiodarone, (A2) the vehicle solution only, (A3) a saline solution.
  – 5 rabbits/treatment in a completely randomized design.
  – 4 repeated measures/animal (30 min. intervals).
SAS data step

data ear;
 input trt rabbit time temp;
 y = temp;
 A = trt;
 B = time;
 trtrabbit = compress(trt||'_'||rabbit);
 wholeplot = trtrabbit;
 cards;
1 1 1 -0.3
1 1 2 -0.2
1 1 3 1.2
1 1 4 3.1
1 2 1 -0.5
1 2 2 2.2
1 2 3 3.3
1 2 4 3.7
etc.
The data (“spaghetti plot”)
Profile (Interaction) means plots
A split plot model assumption for repeated measures
(Diagram: under Treatment 1, Rabbits 1–3 are shown, each measured at Times 1–4.)

RABBIT IS THE EXPERIMENTAL UNIT FOR TREATMENT.
RABBIT IS THE BLOCK FOR TIME.
Suppose the CS (compound symmetry) assumption was appropriate
CONDITIONAL SPECIFICATION: model variation between experimental units (i.e., rabbits).
  – This is a partially nested or split-plot design, i.e., for treatments, rabbit is the experimental unit; for time, rabbit is the block!

$$y_{ijk} = \mu + \alpha_i + u_{k(i)} + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$$

$$u_{k(i)} \sim NIID\left(0, \sigma_u^2\right),
\qquad
e_{ijk} \sim NIID\left(0, \sigma_e^2\right)$$
Analytical (non-simulation) inference based on PROC MIXED
Let's assume "known" $\sigma_u^2 = 0.10$ and $\sigma_e^2 = 0.60$, with flat priors on the fixed effects: p(β) ∝ 1.

title 'Split Plot in Time using Mixed';
title2 'Known Variance Components';
proc mixed data=ear noprofile;
 class trt time rabbit;
 model temp = trt time trt*time / solution;
 random rabbit(trt);
 parms (0.1) (0.6) / hold = 1,2;
 ods output solutionf = solutionf;
run;
proc print data=solutionf; where estimate ne 0; run;
(Partial) Output

Obs    Effect       trt    time    Estimate    StdErr    DF
1      Intercept    _      _       0.2200      0.3742    12
2      trt          1      _       2.3600      0.5292    12
3      trt          2      _       -0.2200     0.5292    12
5      time         _      1       -0.9000     0.4899    36
6      time         _      2       0.02000     0.4899    36
7      time         _      3       -0.6400     0.4899    36
9      trt*time     1      1       -1.9200     0.6928    36
10     trt*time     1      2       -1.2200     0.6928    36
11     trt*time     1      3       -0.06000    0.6928    36
13     trt*time     2      1       0.3200      0.6928    36
14     trt*time     2      2       -0.5400     0.6928    36
15     trt*time     2      3       0.5800      0.6928    36
MCMC inference
• First set up dummy variables (corner parameterization implicit in SAS linear models software):

/* Based on the zero-out-last-level restrictions */
proc transreg data=ear design order=data;
 model class(trt|time / zero=last);
 id y trtrabbit;
 output out=recodedsplit;
run;
proc print data=recodedsplit (obs=10); var intercept &_trgind; run;
Partial Output (First two rabbits)

Obs _NAME_ Intercept trt1 trt2 time1 time2 time3 trt1time1 trt1time2 trt1time3 trt2time1 trt2time2 trt2time3 trt time y    trtrabbit
1   -0.3   1         1    0    1     0     0     1         0         0         0         0         0         1   1    -0.3 1_1
2   -0.2   1         1    0    0     1     0     0         1         0         0         0         0         1   2    -0.2 1_1
3   1.2    1         1    0    0     0     1     0         0         1         0         0         0         1   3    1.2  1_1
4   3.1    1         1    0    0     0     0     0         0         0         0         0         0         1   4    3.1  1_1
5   -0.5   1         1    0    1     0     0     1         0         0         0         0         0         1   1    -0.5 1_2
6   2.2    1         1    0    0     1     0     0         1         0         0         0         0         1   2    2.2  1_2
7   3.3    1         1    0    0     0     1     0         0         1         0         0         0         1   3    3.3  1_2
8   3.7    1         1    0    0     0     0     0         0         0         0         0         0         1   4    3.7  1_2
9   -1.1   1         1    0    1     0     0     1         0         0         0         0         0         1   1    -1.1 1_3
10  2.4    1         1    0    0     1     0     0         1         0         0         0         0         1   2    2.4  1_3

Part of the X matrix (full-rank).
MCMC using PROC IML

proc iml;
 seed = &seed;
 nburnin = 5000;       /* number of burn-in samples */
 total = 200000;       /* total number of Gibbs cycles beyond burn-in */
 thin = 10;            /* saving every "thin"th cycle */
 ncycle = total/thin;  /* leaving a total of ncycle saved samples */
Full code available online
Key subroutine (univariate sampling)

start gibbs;  /* univariate Gibbs sampler */
 do j = 1 to dim;  /* dim = p + q */
    /* generate from full conditionals for fixed and random effects */
    solt = wry[j] - coeff[j,]*solution + coeff[j,j]*solution[j];
    solt = solt/coeff[j,j];
    vt = 1/coeff[j,j];
    solution[j] = solt + sqrt(vt)*rannor(seed);
 end;
finish gibbs;

This implements

$$\theta_i\,|\,\boldsymbol{\theta}_{-i},\sigma_e^2,\sigma_u^2,\mathbf{y} \sim N(\tilde{\theta}_i, v_i),\quad i = 1,2,\ldots,p+q,
\qquad
\tilde{\theta}_i = \frac{b_i - \sum_{j\neq i} c_{ij}\theta_j}{c_{ii}},
\quad
v_i = \frac{1}{c_{ii}}$$
• Output samples to a SAS data set called soldata:

proc means mean median std data=soldata; run;
ods graphics on;
%tadplot(data=soldata, var=_all_);
ods graphics off;

%tadplot is a SAS autocall macro suited for processing MCMC samples.
Comparisons for fixed effects

MCMC (some Monte Carlo error):
Variable     Mean      Median    Std Dev    N
int          0.218     0.218     0.374      20000
TRT1         2.365     2.368     0.526      20000
TRT2         -0.22     -0.215    0.532      20000
TIME1        -0.902    -0.903    0.495      20000
TIME2        0.0225    0.0203    0.491      20000
TIME3        -0.64     -0.643    0.488      20000
TRT1TIME1    -1.915    -1.916    0.692      20000
TRT1TIME2    -1.224    -1.219    0.69       20000
TRT1TIME3    -0.063    -0.066    0.696      20000
TRT2TIME1    0.321     0.316     0.701      20000
TRT2TIME2    -0.543    -0.54     0.696      20000
TRT2TIME3    0.58      0.589     0.694      20000

EXACT (PROC MIXED):
Effect       trt    time    Estimate    StdErr
Intercept    _      _       0.2200      0.3742
trt          1      _       2.3600      0.5292
trt          2      _       -0.2200     0.5292
time         _      1       -0.9000     0.4899
time         _      2       0.02000     0.4899
time         _      3       -0.6400     0.4899
trt*time     1      1       -1.9200     0.6928
trt*time     1      2       -1.2200     0.6928
trt*time     1      3       -0.06000    0.6928
trt*time     2      1       0.3200      0.6928
trt*time     2      2       -0.5400     0.6928
trt*time     2      3       0.5800      0.6928
%TADPLOT output on "intercept": trace plot, autocorrelation plot, and posterior density panels.
Marginal/Cell Means
• The effects on the previous slides are not of particular interest in themselves.
• Marginal means:
  – Can derive using the contrast vectors that are used to compute least squares means in PROC GLM/MIXED/GLIMMIX etc.:
    lsmeans trt time trt*time / e;
  – Ai: marginal mean for trt i
  – Bj: marginal mean for time j
  – AiBj: cell mean for trt i, time j.
Examples of marginal/cell means
• Marginal means:

$$\mu_{A_1} = \mu + trt_1 + \frac{1}{n_{time}}\sum_{j=1}^{n_{time}}\left(time_j + trt\!\times\!time_{1j}\right)$$

$$\mu_{B_1} = \mu + time_1 + \frac{1}{n_{trt}}\sum_{i=1}^{n_{trt}}\left(trt_i + trt\!\times\!time_{i1}\right)$$

• Cell mean:

$$\mu_{A_1B_1} = \mu + trt_1 + time_1 + trt\!\times\!time_{11}$$
Marginal/cell ("LS") means

MCMC (Monte Carlo error):
Variable    Mean      Median    Std Dev
A1          1.403     1.401     0.223
A2          -0.293    -0.292    0.223
A3          -0.162    -0.162    0.224
B1          -0.501    -0.5      0.216
B2          0.366     0.365     0.213
B3          0.465     0.466     0.217
B4          0.932     0.931     0.216
A1B1        -0.234    -0.231    0.373
A1B2        1.382     1.382     0.371
A1B3        1.88      1.878     0.374
A1B4        2.583     2.583     0.372
A2B1        -0.584    -0.585    0.375
A2B2        -0.524    -0.526    0.373
A2B3        -0.062    -0.058    0.373
A2B4        -0.003    -0.005    0.377
A3B1        -0.684    -0.684    0.377
A3B2        0.24      0.242     0.374
A3B3        -0.422    -0.423    0.376
A3B4        0.218     0.218     0.374

EXACT (PROC MIXED):
trt    time    Estimate     Standard Error
1      _       1.4          0.2236
2      _       -0.29        0.2236
3      _       -0.16        0.2236
_      1       -0.5         0.216
_      2       0.3667       0.216
_      3       0.4667       0.216
_      4       0.9333       0.216
1      1       -0.24        0.3742
1      2       1.38         0.3742
1      3       1.88         0.3742
1      4       2.58         0.3742
2      1       -0.58        0.3742
2      2       -0.52        0.3742
2      3       -0.06        0.3742
2      4       -3.61E-16    0.3742
3      1       -0.68        0.3742
3      2       0.24         0.3742
3      3       -0.42        0.3742
3      4       0.22         0.3742
Posterior densities of A1, B1, A1B1
Dotted lines: normal density inferences based on PROC MIXED. Solid lines: MCMC.
Generalized linear mixed models (probit link model)
• Stage 1:

$$p(\mathbf{y}|\boldsymbol{\beta},\mathbf{u}) = \prod_{i=1}^{n}\Phi\left(\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u}\right)^{y_i}\left[1-\Phi\left(\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u}\right)\right]^{1-y_i}$$

• Stage 2:

$$p(\mathbf{u}|\sigma_u^2) = \left(2\pi\sigma_u^2\right)^{-q/2}\left|\mathbf{A}\right|^{-1/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$

• Stage 3: $p(\boldsymbol{\beta}) \propto$ constant, plus a prior $p(\sigma_u^2)$.
• Joint posterior:

$$p(\boldsymbol{\beta},\mathbf{u},\sigma_u^2|\mathbf{y}) \propto p(\mathbf{y}|\boldsymbol{\beta},\mathbf{u})\,p(\mathbf{u}|\sigma_u^2)\,p(\boldsymbol{\beta})\,p(\sigma_u^2)$$
Rethinking the prior on β
• i.e., p(β) ∝ constant:
  – Might not be the best idea for binary data, especially when the data are "sparse".
    • Animal breeders call this the "extreme category problem": e.g., if all of the responses in a fixed-effects subclass are either 1 or 0, then the ML/posterior mode of the corresponding marginal mean will approach −/+ ∞.
    • PROC LOGISTIC has the FIRTH option for this very reason.
• Alternative: $\boldsymbol{\beta} \sim N\left(\boldsymbol{\beta}_0, \mathbf{I}\sigma_\beta^2\right)$.
  – Typically $\boldsymbol{\beta}_0 = \mathbf{0}$; $16 < \sigma_\beta^2 < 50$ is probably sufficient on the underlying latent scale (conditionally N(0,1)).
Recall the latent variable concept (Albert and Chib, 1993)
• Recall $\text{Prob}(Y_i = 1|\boldsymbol{\beta},\mathbf{u}) = \Phi\left(\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u}\right)$, with liability

$$\lambda_i\,|\,\boldsymbol{\beta},\mathbf{u} \sim N\left(\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u},\,1\right)$$

• Suppose for animal i that $\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u} = -1$. Then $\text{Prob}(Y_i = 1) = 1-\Phi(1) = 0.159$.

(Figure: standard normal density of the liability, with the area above the threshold 0 shaded.)
Data augmentation with λ = {λi}
• i.e.,

$$p(\boldsymbol{\lambda}|\boldsymbol{\beta},\mathbf{u}) = \prod_{i=1}^{n} p(\lambda_i|\boldsymbol{\beta},\mathbf{u}) = \prod_{i=1}^{n}\phi\left(\lambda_i - \mathbf{x}_i'\boldsymbol{\beta}-\mathbf{z}_i'\mathbf{u}\right)$$

$$\text{Prob}(Y_i = 1|\lambda_i) = \begin{cases}1 & \text{if } \lambda_i > 0\\ 0 & \text{otherwise}\end{cases}
\qquad
\text{Prob}(Y_i = 0|\lambda_i) = \begin{cases}1 & \text{if } \lambda_i \leq 0\\ 0 & \text{otherwise}\end{cases}$$

$$\text{Prob}(Y_i = y_i|\lambda_i) = \text{I}(\lambda_i \leq 0)\,\text{I}(y_i = 0) + \text{I}(\lambda_i > 0)\,\text{I}(y_i = 1)$$

• The distribution of Y becomes degenerate, i.e., a point mass, conditional on λ. I(.) is the indicator function.
Rewrite the hierarchical model
• Stage 1a)

$$p(\mathbf{y}|\boldsymbol{\lambda}) = \prod_{i=1}^{n}\left[\text{I}(\lambda_i \leq 0)\,\text{I}(y_i = 0) + \text{I}(\lambda_i > 0)\,\text{I}(y_i = 1)\right]$$

• Stage 1b)

$$p(\boldsymbol{\lambda}|\boldsymbol{\beta},\mathbf{u}) = \prod_{i=1}^{n}\phi\left(\lambda_i - \mathbf{x}_i'\boldsymbol{\beta}-\mathbf{z}_i'\mathbf{u}\right)$$

• Those two stages define the likelihood function:

$$p(\mathbf{y}|\boldsymbol{\beta},\mathbf{u}) = \int p(\mathbf{y}|\boldsymbol{\lambda})\,p(\boldsymbol{\lambda}|\boldsymbol{\beta},\mathbf{u})\,d\boldsymbol{\lambda} = \prod_{i=1}^{n}\Phi\left(\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u}\right)^{y_i}\left[1-\Phi\left(\mathbf{x}_i'\boldsymbol{\beta}+\mathbf{z}_i'\mathbf{u}\right)\right]^{1-y_i}$$
Joint posterior density
• Now:

$$p(\boldsymbol{\beta},\mathbf{u},\boldsymbol{\lambda},\sigma_u^2|\mathbf{y}) \propto p(\mathbf{y}|\boldsymbol{\lambda})\,p(\boldsymbol{\lambda}|\boldsymbol{\beta},\mathbf{u})\,p(\mathbf{u}|\sigma_u^2)\,p(\boldsymbol{\beta})\,p(\sigma_u^2)$$

• Let's for now assume known $\sigma_u^2$:

$$p(\boldsymbol{\beta},\mathbf{u},\boldsymbol{\lambda}|\mathbf{y},\sigma_u^2) \propto p(\mathbf{y}|\boldsymbol{\lambda})\,p(\boldsymbol{\lambda}|\boldsymbol{\beta},\mathbf{u})\,p(\mathbf{u}|\sigma_u^2)\,p(\boldsymbol{\beta})$$
FCD
• Liabilities:

$$p(\lambda_i|\boldsymbol{\beta},\mathbf{u},\sigma_u^2,\mathbf{y}) \propto \text{Prob}(Y_i = y_i|\lambda_i)\,p(\lambda_i|\boldsymbol{\beta},\mathbf{u})$$

$$p(\lambda_i|\boldsymbol{\beta},\mathbf{u},\sigma_u^2,\mathbf{y}) \propto \phi\left(\lambda_i-\mathbf{x}_i'\boldsymbol{\beta}-\mathbf{z}_i'\mathbf{u}\right)\text{I}(\lambda_i > 0) \quad \text{if } y_i = 1$$

$$p(\lambda_i|\boldsymbol{\beta},\mathbf{u},\sigma_u^2,\mathbf{y}) \propto \phi\left(\lambda_i-\mathbf{x}_i'\boldsymbol{\beta}-\mathbf{z}_i'\mathbf{u}\right)\text{I}(\lambda_i \leq 0) \quad \text{if } y_i = 0$$

i.e., draw from truncated normals, as sketched below.
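One such draw can be sketched via the inverse-CDF method in PROC IML (illustrative names only: m stands for the conditional mean x_i'β + z_i'u of one observation):

/* lambda ~ N(m,1) truncated to (0,inf) if y=1, or to (-inf,0] if y=0 */
plo = cdf('NORMAL', -m);            /* P(lambda <= 0 | m) */
u = ranuni(seed);
if y = 1 then p = plo + u*(1-plo);  /* uniform on (plo, 1) */
else p = u*plo;                     /* uniform on (0, plo) */
lambda = m + quantile('NORMAL', p);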
FCD (cont'd)
• Fixed and random effects:

$$p(\boldsymbol{\beta},\mathbf{u}|\boldsymbol{\lambda},\sigma_u^2,\mathbf{y}) \propto p(\boldsymbol{\lambda}|\boldsymbol{\beta},\mathbf{u})\,p(\mathbf{u}|\sigma_u^2)\,p(\boldsymbol{\beta})
\propto \exp\left(-\frac{(\boldsymbol{\lambda}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u})'(\boldsymbol{\lambda}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u})}{2}\right)\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$

so that

$$\boldsymbol{\beta},\mathbf{u}\,|\,\boldsymbol{\lambda},\sigma_u^2,\mathbf{y} \sim N\left(\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\mathbf{u}}\end{bmatrix},\,\mathbf{C}^{-1}\right)$$

where (the residual variance is 1 on the liability scale):

$$\begin{bmatrix}\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z}\\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z}+\mathbf{A}^{-1}\sigma_u^{-2}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\mathbf{u}}\end{bmatrix}=\begin{bmatrix}\mathbf{X}'\boldsymbol{\lambda}\\ \mathbf{Z}'\boldsymbol{\lambda}\end{bmatrix}$$
Alternative sampling strategies for fixed and random effects
• 1. Joint multivariate draw from $\boldsymbol{\beta},\mathbf{u}\,|\,\boldsymbol{\lambda},\sigma_u^2,\mathbf{y}$:
  – faster mixing… but computationally expensive?
• 2. Univariate draws from the FCD using partitioned matrix results:
  – refer to slides 36, 37, 49;
  – slower mixing.
Recall the "binarized" RCBD (">75" marks a success):

Litter    Diet 1       Diet 2       Diet 3       Diet 4       Diet 5
1         79.5 >75     80.9 >75     79.1 >75     88.6 >75     95.9 >75
2         70.9         81.8 >75     70.9         88.6 >75     85.9 >75
3         76.8 >75     86.4 >75     90.5 >75     89.1 >75     83.2 >75
4         75.9 >75     75.5 >75     62.7         91.4 >75     87.7 >75
5         77.3 >75     77.3 >75     69.5         75.0         74.5
6         66.4         73.2         86.4 >75     79.5 >75     72.7
7         59.1         77.7 >75     72.7         85.0 >75     90.9 >75
8         64.1         72.3         73.6         75.9 >75     60.0
9         74.5         81.4 >75     64.5         75.5 >75     83.6 >75
10        67.3         82.3 >75     65.9         70.5         63.2
MCMC analysis
• 5000 burn-in cycles.
• 500,000 additional cycles, saving every 10th: 50,000 saved cycles.
• Full conditional univariate sampling on fixed and random effects.
• "Known" σu² = 0.50.
• Remember… no σe².
Fixed effect comparison on inferences (conditional on "known" σu² = 0.50), where β contains the intercept and the diet effects α1, …, α5 (α5 = 0 under the corner parameterization):

• MCMC:
Variable     Mean      Median    Std Dev    N
intercept    0.349     0.345     0.506      50000
DIET1        -0.659    -0.654    0.64       50000
DIET2        0.761     0.75      0.682      50000
DIET3        -1        -0.993    0.649      50000
DIET4        0.76      0.753     0.686      50000

• PROC GLIMMIX:
Solutions for Fixed Effects
Effect       diet    Estimate    Standard Error
Intercept            0.3097      0.4772
diet         1       -0.5935     0.5960
diet         2       0.6761      0.6408
diet         3       -0.9019     0.6104
diet         4       0.6775      0.6410
diet         5       0           .
Marginal mean comparisons
• Based on K'β, with

$$\mathbf{K}' = \begin{bmatrix}1 & 1 & 0 & 0 & 0\\ 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0 & 0\end{bmatrix}$$

MCMC:
Variable    Mean      Median    Std Dev    N
mm1         -0.31     -0.302    0.499      50000
mm2         1.11      1.097     0.562      50000
mm3         -0.651    -0.644    0.515      50000
mm4         1.109     1.092     0.563      50000
mm5         0.349     0.345     0.506      50000

PROC GLIMMIX:
diet Least Squares Means
diet    Estimate    Standard Error
1       -0.2838     0.4768
2       0.9858      0.5341
3       -0.5922     0.4939
4       0.9872      0.5343
5       0.3097      0.4772
Diet 1 Marginal Mean (μ + α1)
Posterior density discrepancy between MCMC and empirical Bayes for the diet marginal means (μ + αi)?
Dotted lines: normal approximation based on PROC GLIMMIX. Solid lines: MCMC.
Do we run the risk of overstating precision with conventional methods?
How about probabilities of success?
• i.e., Φ(K'β), the normal CDF of the marginal means.

MCMC:
Variable    Mean     Median    Std Dev    N
prob1       0.391    0.381     0.173      20000
prob2       0.833    0.864     0.126      20000
prob3       0.282    0.26      0.157      20000
prob4       0.833    0.863     0.126      20000
prob5       0.623    0.635     0.173      20000

PROC GLIMMIX (delta method for the Mean columns):
diet    Estimate    Standard Error    Mean      Standard Error (Mean)
1       -0.2838     0.4768            0.3883    0.1827
2       0.9858      0.5341            0.8379    0.1311
3       -0.5922     0.4939            0.2769    0.1653
4       0.9872      0.5343            0.8382    0.1309
5       0.3097      0.4772            0.6216    0.1815
Comparison of posterior densities for diet marginal mean probabilities
Dotted lines: normal approximation based on PROC GLIMMIX. Solid lines: MCMC.
Largest discrepancies occur along the boundaries.
Posterior density of Φ(μ+α1) & Φ(μ+α2)
Posterior density of Φ(μ+α2) − Φ(μ+α1)

Probability(Φ(μ+α2) − Φ(μ+α1) < 0) = 0.0164
"Two-tailed" P-value = 2 × 0.0164 = 0.0328

prob21_diff         Frequency    Percent
prob21_diff < 0     819          1.64
prob21_diff >= 0    49181        98.36
How does that compare with PROC GLIMMIX?

Estimates
Label                 Estimate    Standard Error    DF       t Value    Pr > |t|    Mean       Standard Error (Mean)
diet 1 lsmean         -0.2838     0.4768            10000    -0.60      0.5517      0.3883     0.1827
diet 2 lsmean         0.9858      0.5341            10000    1.85       0.0650      0.8379     0.1311
diet1 vs diet2 dif    -1.2697     0.6433            10000    -1.97      0.0484      Non-est    .

Recall, we assumed "known" σu² … hence a normal rather than t-distributed test statistic.
What if variance components are not known?
• Specify priors on the variance components. Options (m ∈ {u, e}):
  – 1. Conjugate (scaled inverted chi-square), denoted $\chi^{-2}(\nu_m, \nu_m s_m^2)$:

$$p(\sigma_m^2|\nu_m, s_m^2) \propto \left(\sigma_m^2\right)^{-\left(\frac{\nu_m}{2}+1\right)}\exp\left(-\frac{\nu_m s_m^2}{2\sigma_m^2}\right)$$

  – 2. Flat (and bounded as well?): $p(\sigma_m^2) \propto 1$, equivalent to $\nu_m = -2$, $s_m^2 = 0$.
  – 3. Gelman's (2006) prior:

$$p(\sigma_m) = \text{Uniform}(0, A_m)\quad\Leftrightarrow\quad p(\sigma_m^2) \propto \left(\sigma_m^2\right)^{-1/2}$$

equivalent to $\nu_m = -1$, $s_m^2 = 0$.
Relationship between scaled inverted chi-square & inverted gamma
• Scaled inverted chi-square:

$$p(\sigma^2|\nu, s^2) \propto \left(\sigma^2\right)^{-\left(\frac{\nu}{2}+1\right)}\exp\left(-\frac{\nu s^2}{2\sigma^2}\right)$$

$$E[\sigma^2|\nu,s^2] = \frac{\nu s^2}{\nu-2},\;\nu>2;
\qquad
\text{Var}[\sigma^2|\nu,s^2] = \frac{2\nu^2 s^4}{(\nu-2)^2(\nu-4)},\;\nu>4$$

• Inverted gamma:

$$p(\sigma^2|\alpha,\beta) \propto \left(\sigma^2\right)^{-(\alpha+1)}e^{-\beta/\sigma^2}$$

$$E[\sigma^2] = \frac{\beta}{\alpha-1},\;\alpha>1;
\qquad
\text{Var}[\sigma^2] = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)},\;\alpha>2$$

• Correspondence: $\alpha = \nu/2$, $\beta = \nu s^2/2$. Gelman's prior ($\nu = -1$, $s^2 = 0$) corresponds to $\alpha = -\frac{1}{2}$, $\beta = 0$.
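A quick numeric check of this correspondence (illustrative values, not from the original slides): take ν = 4 and s² = 1, so α = ν/2 = 2 and β = νs²/2 = 2. Then E[σ²] = νs²/(ν−2) = 4/2 = 2 under the scaled inverted chi-square parameterization and, equivalently, E[σ²] = β/(α−1) = 2/1 = 2 under the inverted gamma parameterization.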
Gibbs sampling and mixed effects models
• Recall the following hierarchical model:

$$p(\mathbf{y}|\boldsymbol{\beta},\mathbf{u},\sigma_e^2) = \left(2\pi\sigma_e^2\right)^{-n/2}\exp\left(-\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u})}{2\sigma_e^2}\right)$$

$$p(\mathbf{u}|\sigma_u^2) = \left(2\pi\sigma_u^2\right)^{-q/2}\left|\mathbf{A}\right|^{-1/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$

$$p(\sigma_u^2|\nu_u, s_u^2) \propto \left(\sigma_u^2\right)^{-\left(\frac{\nu_u}{2}+1\right)}\exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right)$$

$$p(\sigma_e^2|\nu_e, s_e^2) \propto \left(\sigma_e^2\right)^{-\left(\frac{\nu_e}{2}+1\right)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)$$
Joint posterior density and FCD
• The joint posterior is the product of the four stages above:

$$p(\boldsymbol{\beta},\mathbf{u},\sigma_u^2,\sigma_e^2|\mathbf{y}) \propto
\left(\sigma_e^2\right)^{-\frac{n}{2}}\exp\left(-\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u})}{2\sigma_e^2}\right)
\left(\sigma_u^2\right)^{-\frac{q}{2}}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$
$$\times\left(\sigma_u^2\right)^{-\left(\frac{\nu_u}{2}+1\right)}\exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right)
\left(\sigma_e^2\right)^{-\left(\frac{\nu_e}{2}+1\right)}\exp\left(-\frac{\nu_e s_e^2}{2\sigma_e^2}\right)$$

• FCD for β and u: same as before (normal).
• FCD for the variance components: scaled inverted chi-square:

$$\sigma_e^2\,|\,ELSE \sim \left(\mathbf{e}'\mathbf{e} + \nu_e s_e^2\right)\chi^{-2}_{n+\nu_e},
\qquad
\mathbf{e} = \mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{u}$$

$$\sigma_u^2\,|\,ELSE \sim \left(\mathbf{u}'\mathbf{A}^{-1}\mathbf{u} + \nu_u s_u^2\right)\chi^{-2}_{q+\nu_u}$$
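In PROC IML these two draws can be sketched as follows (a sketch only: e, u, Ainv, n, q, and the hyperparameters are assumed to already exist; it uses the fact that a chi-square(v) deviate equals 2 × Gamma(v/2)):

/* sigma_e^2 | ELSE ~ (e`e + nue*se2) * inv-chi-square(n+nue) */
g = rangam(seed,(n+nue)/2);          /* Gamma((n+nue)/2, 1) */
sige2 = (e`*e + nue*se2)/(2*g);      /* divide scale by chi-square deviate */
/* sigma_u^2 | ELSE ~ (u`*Ainv*u + nuu*su2) * inv-chi-square(q+nuu) */
g = rangam(seed,(q+nuu)/2);
sigu2 = (u`*Ainv*u + nuu*su2)/(2*g);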
Back to the split plot in time example
• Empirical Bayes (EGLS based on REML):

title 'Split Plot in Time using Mixed';
title2 'UnKnown Variance Components';
proc mixed data=ear covtest;
 class trt time rabbit;
 model temp = trt time trt*time / solution;
 random rabbit(trt);
 ods output solutionf = solutionf;
run;
proc print data=solutionf; where estimate ne 0; run;

• Fully Bayes:
  – 5000 burn-in cycles
  – 200000 subsequent cycles, saving every 10th post burn-in
  – Gelman's prior on the VC
  – Code available online
Variance component inference

PROC MIXED:
Covariance Parameter Estimates
Cov Parm       Estimate    Standard Error    Z Value    Pr > Z
rabbit(trt)    0.08336     0.09910           0.84       0.2001
Residual       0.5783      0.1363            4.24       <.0001

MCMC:
Variable    Mean     Median    Std Dev    N
sigmau      0.127    0.0869    0.141      20000
sigmae      0.632    0.611     0.15       20000
MCMC plots
(Panels: random effects variance; residual variance.)
Estimated effects ± se (sd)

PROC MIXED:
Effect       trt    time    Estimate    StdErr
Intercept    _      _       0.22        0.3638
trt          1      _       2.36        0.5145
trt          2      _       -0.22       0.5145
time         _      1       -0.9        0.481
time         _      2       0.02        0.481
time         _      3       -0.64       0.481
trt*time     1      1       -1.92       0.6802
trt*time     1      2       -1.22       0.6802
trt*time     1      3       -0.06       0.6802
trt*time     2      1       0.32        0.6802
trt*time     2      2       -0.54       0.6802
trt*time     2      3       0.58        0.6802

MCMC:
Variable     Mean      Median    Std Dev    N
intercept    0.217     0.214     0.388      20000
TRT1         2.363     2.368     0.55       20000
TRT2         -0.22     -0.219    0.55       20000
TIME1        -0.898    -0.893    0.499      20000
TIME2        0.0206    0.0248    0.502      20000
TIME3        -0.64     -0.635    0.501      20000
TRT1TIME1    -1.924    -1.931    0.708      20000
TRT1TIME2    -1.222    -1.22     0.71       20000
TRT1TIME3    -0.057    -0.057    0.715      20000
TRT2TIME1    0.318     0.315     0.711      20000
TRT2TIME2    -0.54     -0.541    0.711      20000
TRT2TIME3    0.585     0.589     0.71       20000
Marginal ("Least Squares") means

PROC MIXED:
Effect      trt    time    Estimate    Standard Error    DF
trt         1              1.4000      0.2135            12
trt         2              -0.2900     0.2135            12
trt         3              -0.1600     0.2135            12
time               1       -0.5000     0.2100            36
time               2       0.3667      0.2100            36
time               3       0.4667      0.2100            36
time               4       0.9333      0.2100            36
trt*time    1      1       -0.2400     0.3638            36
trt*time    1      2       1.3800      0.3638            36
trt*time    1      3       1.8800      0.3638            36
trt*time    1      4       2.5800      0.3638            36
trt*time    2      1       -0.5800     0.3638            36
trt*time    2      2       -0.5200     0.3638            36
trt*time    2      3       -0.06000    0.3638            36
trt*time    2      4       4.44E-16    0.3638            36
trt*time    3      1       -0.6800     0.3638            36
trt*time    3      2       0.2400      0.3638            36
trt*time    3      3       -0.4200     0.3638            36
trt*time    3      4       0.2200      0.3638            36

MCMC:
Variable    Mean      Median    Std Dev
A1          1.399     1.401     0.24
A2          -0.292    -0.29     0.237
A3          -0.16     -0.161    0.236
B1          -0.502    -0.501    0.224
B2          0.364     0.363     0.222
B3          0.467     0.466     0.224
B4          0.934     0.936     0.222
A1B1        -0.244    -0.246    0.389
A1B2        1.378     1.379     0.391
A1B3        1.882     1.88      0.391
A1B4        2.581     2.584     0.391
A2B1        -0.586    -0.586    0.393
A2B2        -0.526    -0.525    0.385
A2B3        -0.058    -0.054    0.387
A2B4        0.0031    0.0017    0.386
A3B1        -0.676    -0.678    0.388
A3B2        0.239     0.241     0.386
A3B3        -0.422    -0.427    0.392
A3B4        0.219     0.216     0.385
Posterior densities of A1, B1, A1B1
Dotted lines: t densities based on estimates/stderrs from PROC MIXED. Solid lines: MCMC.
How about fully Bayesian inference in generalized linear mixed models?
• Probit link GLMM:
  – Extensions to handle unknown variance components are exactly the same given the augmented liability variables,
    • i.e., scaled inverted chi-square conjugate to σu².
  – No "overdispersion" (σe²) to contend with for binary data.
    • But stay tuned for binomial/Poisson data!
Analysis of "binarized" RCBD data

• Empirical Bayes:

title 'Posterior inference conditional on unknown VC';
proc glimmix data=binarize;
 class litter diet;
 model y = diet / covb solution dist=bin link=probit;
 random litter;
 lsmeans diet / diff ilink;
 estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / ilink;
 estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / ilink;
 estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0;
run;

• Fully Bayes:
  – 10000 burn-in cycles
  – 200000 cycles thereafter, saving every 10th
  – Gelman's prior on the VC
Inferences on VC

Method = RSPL:
Covariance Parameter Estimates
Estimate    Standard Error
0.5783      0.5021

Method = Laplace:
Covariance Parameter Estimates
Estimate    Standard Error
0.6488      0.6410

Method = Quad:
Covariance Parameter Estimates
Estimate    Standard Error
0.6662      0.6573

MCMC:
Analysis Variable: sigmau
Mean     Median    Std Dev    N
2.048    1.468     2.128      20000
Inferences on marginal means (μ + αi)

Method = Laplace:
diet Least Squares Means
diet    Estimate    Standard Error    DF
1       -0.3024     0.5159            36
2       1.0929      0.5964            36
3       -0.6428     0.5335            36
4       1.0946      0.5976            36
5       0.3519      0.5294            36

MCMC:
Variable    Mean      Median    Std Dev    N
mm1         -0.297    -0.301    0.643      20000
mm2         1.322     1.283     0.716      20000
mm3         -0.697    -0.69     0.662      20000
mm4         1.319     1.285     0.72       20000
mm5         0.465     0.442     0.671      20000

The MCMC posterior standard deviations are larger: they take into account the uncertainty on the variance components.
Posterior densities of (μ + αi)
Dotted lines: t36 densities based on estimates and standard errors from PROC GLIMMIX (method=laplace). Solid lines: MCMC.
MCMC inferences on probabilities of "success", based on Φ(μ + αi).
MCMC inferences on marginal (population-averaged) probabilities, based on

$$\Phi\left(\frac{\mu+\alpha_i}{\sqrt{1+\sigma_u^2}}\right)$$

Potentially big issues with empirical Bayes inference here: it is dependent upon the quality of the VC inference & asymptotics!
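For each saved MCMC cycle this marginal probability can be computed directly from the samples, e.g. in a data step (variable names hypothetical; sigmau is assumed to hold the sampled variance):

data margprob;
 set soldata;
 /* population-averaged probability for diet 1, integrating over u ~ N(0, sigmau) */
 mprob1 = probnorm((intercept + diet1)/sqrt(1 + sigmau));
run;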
Inference on Diet 1 vs. Diet 2 probabilities

PROC GLIMMIX:
Estimates
Label                 Mean       Standard Error (Mean)
diet 1 lsmean         0.3812     0.1966
diet 2 lsmean         0.8628     0.1309
diet1 vs diet2 dif    Non-est    .

MCMC:
Variable     Mean     Median    Std Dev    N
Probdiet1    0.4      0.382     0.212      20000
Probdiet2    0.857    0.899     0.137      20000
Probdiff     0.457    0.464     0.207      20000
P-value = 0.0559

prob21_diff         Frequency    Percent
prob21_diff < 0     180          0.90
prob21_diff >= 0    19820        99.10

Probability(Φ(μ+α2) − Φ(μ+α1) < 0) = 0.0090 ("one-tailed")
Any formal comparisons between GLS/REML/EB (M/PQL) and MCMC for GLMM?
• Check Browne and Draper (2006).
• Normal data (LMM):
  – Generally, inferences based on GLS/REML and MCMC are sufficiently close.
  – Since GLS/REML is faster, it is the method of choice under classical assumptions.
• Non-normal data (GLMM):
  – Quasi-likelihood based methods are particularly problematic in bias of point estimates and interval coverage of variance components, with side effects on fixed effects inference.
  – Bayesian methods with diffuse priors are well calibrated for both properties for all parameters.
  – Comparisons with Laplace not done yet.
A pragmatic take on using MCMC vs PL for GLMM under classical assumptions?
• If datasets are too small to warrant asymptotic considerations, then the experiment is likely to be poorly powered.
  – Otherwise, PL might ≈ MCMC inference.
• However, differences could depend on dimensionality, deviation of the data distribution from normal, and complexity of design.
• The real big advantage of MCMC is multi-stage hierarchical models (see later).
Implications of design on fully Bayes vs. PL inference for GLMM?
• RCBD: it is known for the LMM that inferences on treatment differences in an RCBD are resilient to estimates of the block VC.
  – Inference on differences in treatment effects thereby insensitive to VC inferences in GLMM?
• Whole-plot treatment factor comparisons in split plot designs?
  – Greater sensitivity (i.e., to the whole-plot VC).
• Sensitivity of inference for conditional versus "population-averaged" probabilities?

$$\Phi\left(\mathbf{x}_i'\boldsymbol{\beta}\right) \text{ vs. } \Phi\left(\frac{\mathbf{x}_i'\boldsymbol{\beta}}{\sqrt{1+\sigma_u^2}}\right)$$
Ordinal categorical data
• Back to the GF83 data.
  – Gibbs sampling strategy laid out by Sorensen and Gianola (1995); Albert and Chib (1993).
  – Simple extensions to what was considered earlier for linear/probit mixed models:

$$\text{Prob}(Y_i = j|\lambda_i,\boldsymbol{\tau}) = \begin{cases}1 & \text{if } \tau_{j-1} < \lambda_i \leq \tau_j\\ 0 & \text{otherwise}\end{cases}$$
Joint posterior density
• Stages:

$$p(\mathbf{y}|\boldsymbol{\lambda},\boldsymbol{\tau}) = \prod_{i=1}^{n}\prod_{j=1}^{c}\left[\text{I}\left(\tau_{j-1} < \lambda_i \leq \tau_j\right)\text{I}\left(y_i = j\right)\right]$$

$$p(\lambda_i|\boldsymbol{\beta},\mathbf{u},\boldsymbol{\tau}) = \phi\left(\lambda_i - \mathbf{x}_i'\boldsymbol{\beta} - \mathbf{z}_i'\mathbf{u}\right)$$

$$p(\mathbf{u}|\sigma_u^2) = \left(2\pi\sigma_u^2\right)^{-q/2}\left|\mathbf{A}\right|^{-1/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$

$$p(\boldsymbol{\beta}) \propto \text{constant}$$

$$p(\sigma_u^2|\nu_u, s_u^2) \propto \left(\sigma_u^2\right)^{-\left(\frac{\nu_u}{2}+1\right)}\exp\left(-\frac{\nu_u s_u^2}{2\sigma_u^2}\right) \quad \text{(or something diffuse)}$$
Anything different for FCD compared to the binary probit?
• Liabilities:

$$p(\lambda_i|\boldsymbol{\beta},\mathbf{u},\boldsymbol{\tau},\mathbf{y}) \propto \phi\left(\lambda_i - \mathbf{x}_i'\boldsymbol{\beta} - \mathbf{z}_i'\mathbf{u}\right)\text{I}\left(\tau_{j-1} < \lambda_i \leq \tau_j\right) \quad \text{if } y_i = j$$

• Thresholds:

$$p(\tau_j|ELSE) = \text{Uniform}\left(\max\{\lambda_i : y_i = j\},\; \min\{\lambda_i : y_i = j+1\}\right)$$

  – This leads to painfully slow mixing; a better strategy is based on Metropolis sampling (Cowles et al., 1996).
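The uniform threshold draw itself is a single line once the adjacent-category liabilities are in hand (a sketch; lambda, y, and tau are assumed to exist in PROC IML):

/* tau_j | ELSE ~ Uniform( max{lambda_i: y_i=j}, min{lambda_i: y_i=j+1} ) */
lo = max(lambda[loc(y=j)]);
hi = min(lambda[loc(y=j+1)]);
tau[j] = lo + (hi-lo)*ranuni(seed);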
Fully Bayesian inference on GF83
• 5000 burn-in samples.
• 50000 samples post burn-in, saving every 10th.

(Diagnostic plots for σu².)
Posterior summaries

Variable          Mean      Median    Std Dev    5th Pctl    95th Pctl
intercept         -0.222    -0.198    0.669      -1.209      0.723
hy                0.236     0.223     0.396      -0.399      0.894
age               -0.036    -0.035    0.392      -0.69       0.598
sex               -0.172    -0.171    0.393      -0.818      0.48
sire1             -0.082    -0.042    0.587      -1          0.734
sire2             0.116     0.0491    0.572      -0.641      0.937
sire3             0.194     0.106     0.625      -0.64       1.217
sire4             -0.173    -0.11     0.606      -1.118      0.595
sigmau            1.362     0.202     8.658      0.0021      4.148
thresh2           0.83      0.804     0.302      0.383       1.366
probfemalecat1    0.598     0.609     0.188      0.265       0.885
probfemalecat2    0.827     0.864     0.148      0.53        0.986
probmalecat1      0.539     0.545     0.183      0.23        0.836
probmalecat2      0.79      0.821     0.154      0.491       0.974
Posterior densities of sex-specific cumulative probabilities (first two categories)
How would you interpret a "standard error" in this context?
Posterior densities of sex-specific probabilities (each category)
What if some FCD are not recognizable?
• Examples: Poisson mixed models, logistic mixed models.
• Hmmm… need a different strategy:
  – Use Gibbs sampling whenever you can.
  – Use Metropolis-Hastings sampling for FCD that are not recognizable.
• NEXT!