§❺ metropolis-hastings sampling and general mcmc approaches for glmm
Applied Bayesian Inference, KSU, April 29, 2012
§❺ Metropolis-Hastings sampling and general MCMC approaches for GLMM
Robert J. Tempelman
Genetic linkage example…again
Recall plant genetic linkage analysis problem
The multinomial likelihood is

$$L(\theta \mid \mathbf{y}) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{2+\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$

or

$$L(\theta \mid \mathbf{y}) \propto (2+\theta)^{y_1}(1-\theta)^{y_2+y_3}\,\theta^{y_4}$$

Suppose a flat constant prior (p(θ) ∝ 1) was used. Then

$$p(\theta \mid \mathbf{y}) = \frac{L(\theta \mid \mathbf{y})}{\int_{\theta} L(\theta \mid \mathbf{y})\, d\theta}$$
§ /
Applied Bayesian Inference, KSU, April 29, 2012
3
Suppose posterior density is not recognizable
• Additionally, suppose there is no clear data augmentation strategy.
• Several solutions exist:
  – e.g. adaptive rejection sampling (not discussed here)
  – One recourse is the Metropolis-Hastings algorithm, in which one generates an MCMC chain of random variates using a candidate (or proposal) density function q(θ′, θ″):
    • θ′: where you're at now, at the current MCMC cycle
    • θ″: proposed value for the next MCMC cycle
§ /
Applied Bayesian Inference, KSU, April 29, 2012
4
Metropolis-Hastings
• Say the MCMC chain is currently at value θ^[t-1] from cycle t-1.
• Draw a proposed value θ* from the candidate density for cycle t.
• Accept the move from θ^[t-1] to θ^[t] = θ* with probability

$$\alpha\!\left(\theta^{[t-1]},\theta^*\right)=\begin{cases}\min\left\{\dfrac{p(\theta^*\mid\mathbf{y})\,q(\theta^*,\theta^{[t-1]})}{p(\theta^{[t-1]}\mid\mathbf{y})\,q(\theta^{[t-1]},\theta^*)},\,1\right\} & \text{if } p(\theta^{[t-1]}\mid\mathbf{y})\,q(\theta^{[t-1]},\theta^*)>0\\[1ex] 1 & \text{otherwise}\end{cases}$$

• Otherwise set θ^[t] = θ^[t-1].

Good readable reference? Chib and Greenberg (1995)
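The accept/reject rule just described can be sketched as a single generic update step. This is a minimal Python illustration (the course's own code is SAS PROC IML; the helper names `log_post`, `sample_q`, and `log_q` are hypothetical), working on the log scale for numerical safety:

```python
import math
import random

def mh_step(theta_curr, log_post, sample_q, log_q):
    """One Metropolis-Hastings update, done on the log scale.

    log_post(t):  log of the (possibly unnormalized) posterior p(t | y)
    sample_q(t):  draw a proposal theta* from the candidate density q(t, .)
    log_q(a, b):  log of the candidate density q(a, b) for moving a -> b
    """
    theta_prop = sample_q(theta_curr)
    # log of the MH ratio: p(theta*|y) q(theta*, theta) / [p(theta|y) q(theta, theta*)]
    log_alpha = (log_post(theta_prop) + log_q(theta_prop, theta_curr)
                 - log_post(theta_curr) - log_q(theta_curr, theta_prop))
    if math.log(random.random()) < min(log_alpha, 0.0):
        return theta_prop, True    # accept: chain moves to theta*
    return theta_curr, False       # reject: chain stays put
```

Comparing log(U) against min(log α, 0) is equivalent to comparing U against min(α, 1), but never forms the possibly-underflowing ratio itself.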
How to compute this ratio “safely”
• Always use logarithms whenever evaluating ratios!!!
• That is, compute

$$\log\frac{p(\theta^*\mid\mathbf{y})\,q(\theta^*,\theta^{[t-1]})}{p(\theta^{[t-1]}\mid\mathbf{y})\,q(\theta^{[t-1]},\theta^*)} = \log p(\theta^*\mid\mathbf{y}) + \log q(\theta^*,\theta^{[t-1]}) - \log p(\theta^{[t-1]}\mid\mathbf{y}) - \log q(\theta^{[t-1]},\theta^*)$$

• Once you compute this, back-transform: exp(log α).
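To see why logarithms matter: for a few thousand observations the likelihood itself underflows double precision, so the ratio must be formed as a difference of logs and exponentiated only at the end. A small Python illustration (the log-posterior values here are made up for the sake of the example):

```python
import math

# Hypothetical log-posterior values of a magnitude typical for a few
# thousand observations; the numbers are illustrative only.
log_p_prop, log_p_curr = -2000.0, -2001.5

# Naive route: the densities themselves underflow to 0.0,
# so the ratio would be an undefined 0/0.
naive = math.exp(log_p_prop)               # underflows to 0.0

# Safe route: difference of logs first, back-transform at the end.
log_alpha = log_p_prop - log_p_curr        # = 1.5
alpha = math.exp(min(log_alpha, 0.0))      # min(ratio, 1) = 1 here
```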
Back to plant genetics example
Recall y₁ = 1997, y₂ = 906, y₃ = 904, y₄ = 32. Let's use as the candidate generating function (based on a likelihood approximation)

q(θ^[t-1], θ*) = N(μ = 0.0357, σ² = 3.6338 × 10⁻⁵)

1. Determine a starting value (i.e., 0th cycle) θ^[0].
2. For t = 1, …, m (number of MCMC cycles):
   a) Generate θ* from q(θ^[t-1], θ*) = N(0.0357, 3.6338 × 10⁻⁵)
   b) Generate U from a Uniform(0,1) distribution
   c) If U < α(θ^[t-1], θ*) then set θ^[t] = θ*, else set θ^[t] = θ^[t-1]

• Note that this is an independence chains algorithm: q(θ^[t-1], θ*) = q(θ*)
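The steps above can be sketched in Python (a hypothetical stand-in for the PROC IML code; the flat-prior log posterior follows the linkage likelihood L(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3) θ^y4):

```python
import math
import random

# Data from the linkage example
y1, y2, y3, y4 = 1997, 906, 904, 32
M, V = 0.0357, 3.6338e-5      # candidate mean and variance from the slide

def log_post(t):
    """Flat-prior log posterior, up to an additive constant."""
    if not 0.0 < t < 1.0:
        return -math.inf
    return y1 * math.log(2 + t) + (y2 + y3) * math.log(1 - t) + y4 * math.log(t)

def log_q(t):
    """Log N(M, V) candidate density, constants dropped (they cancel)."""
    return -0.5 * (t - M) ** 2 / V

def sampler(n, theta0=0.0357):
    theta, out = theta0, []
    for _ in range(n):
        prop = random.gauss(M, math.sqrt(V))            # step (a)
        log_alpha = (log_post(prop) + log_q(theta)
                     - log_post(theta) - log_q(prop))   # independence-chain ratio
        if math.log(random.random()) < min(log_alpha, 0.0):   # steps (b)-(c)
            theta = prop
        out.append(theta)
    return out
```

Run long enough, the sample mean should land near the slide's reported posterior mean of about 0.037.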
Independence chains Metropolis
• When the candidate does not depend on θ^[t-1], i.e., q(θ^[t-1], θ*) = q(θ*), the acceptance probability becomes

$$\alpha\!\left(\theta^{[t-1]},\theta^*\right)=\begin{cases}\min\left\{\dfrac{p(\theta^*\mid\mathbf{y})\,q(\theta^{[t-1]})}{p(\theta^{[t-1]}\mid\mathbf{y})\,q(\theta^*)},\,1\right\} & \text{if } p(\theta^{[t-1]}\mid\mathbf{y})\,q(\theta^*)>0\\[1ex] 1 & \text{otherwise}\end{cases}$$

• However, in spite of this "independence" label, there is still serial autocorrelation between the samples.
• IML code online. Generate output for 9000 draws after 1000 burn-in samples; save every 10th.
Key plots and summaries
Monitoring MH acceptance rates over cycles for genetic linkage example
• Average MH acceptance rates (for every 10 cycles)

[Figure: average acceptance rate (ALPHASAV, 0.0–1.0) versus CYCLE, 1000–10000]

Many acceptance rates close to 1! Is this good? NO: intermediate acceptance ratios (0.25–0.5) are optimal for MH mixing.
How to optimize Metropolis acceptance ratios
• Recall q(θ^[t-1], θ*) = N(μ, σ²) with μ = 0.0357, σ² = 3.6338 × 10⁻⁵.
• Suggest using q(θ^[t-1], θ*) = N(μ, cσ²) and modifying c (during burn-in) so that MH acceptance rates are intermediate:
  – Increase c → decrease acceptance rates
  – Decrease c → increase acceptance rates
"Tuning" the MH sampler: my strategy
• Every 10 MH cycles for the first half of burn-in, assess the following:
  – if average acceptance rate > 0.80, then set c = 1.2c,
  – if average acceptance rate < 0.20, then set c = 0.7c,
  – otherwise leave c alone.
• SAS PROC MCMC has a somewhat different strategy.
• Let's rerun the same PROC IML code but with this modification.
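The batch rule just described can be written as a small helper; this Python sketch (hypothetical name `tune_c`; SAS PROC MCMC's own tuning differs, as noted) adjusts the scale from one batch of accept/reject flags:

```python
def tune_c(c, accept_flags):
    """Adjust the proposal scale c from the latest batch of accept (1) /
    reject (0) flags, following the batch-of-10 rule used during the
    first half of burn-in."""
    rate = sum(accept_flags) / len(accept_flags)
    if rate > 0.80:
        return c * 1.2   # accepting too often: widen the proposal
    if rate < 0.20:
        return c * 0.7   # rejecting too often: narrow the proposal
    return c             # intermediate rate: leave c alone
```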
Average acceptance ratio versus cycle (during the 4000 burn-in cycles)

[Figure: tuning constant c (C_CHG, 1–7) versus CYCLE_SCALE, 0–4000]

One should finish the tuning process not much later than halfway through burn-in.
Monitoring MH acceptance rates over cycles
• Average MH acceptance rates (every 10 cycles) post burn-in (16000 cycles)

[Figure: average acceptance rate (ALPHASAV, 0.0–1.0) versus CYCLE, 4000–20000]
Posterior density of θ

Analysis Variable: theta
Mean    Median  Std Dev  5th Pctl  95th Pctl
0.0366  0.0366  0.0064   0.0265    0.0471
Random walk Metropolis sampling
• More common (especially when proposals based on the likelihood function are not plausible) than independence chains Metropolis.
• The proposal density is chosen to be symmetric in θ* and θ^[t-1], i.e., q(θ^[t-1], θ*) = q(θ*, θ^[t-1]).
• Example: generate a random variate δ from N(0, cσ²) and add it to the previous cycle value θ^[t-1] to generate θ* = θ^[t-1] + δ; same as sampling from

$$q(\theta^{[t-1]},\theta^*)=\frac{1}{\sqrt{2\pi c\sigma^2}}\exp\left(-\frac{1}{2}\frac{(\theta^*-\theta^{[t-1]})^2}{c\sigma^2}\right)$$
Random walk Metropolis (cont’d)
• Because q(θ^[t-1], θ*) is symmetric in θ^[t-1] and θ*, i.e., q(θ^[t-1], θ*) = q(θ*, θ^[t-1]), the MH acceptance ratio simplifies:

$$\alpha\!\left(\theta^{[t-1]},\theta^*\right)=\begin{cases}\min\left\{\dfrac{p(\theta^*\mid\mathbf{y})}{p(\theta^{[t-1]}\mid\mathbf{y})},\,1\right\} & \text{if } p(\theta^{[t-1]}\mid\mathbf{y})>0\\[1ex] 1 & \text{otherwise}\end{cases}$$
Back to example
• Start again with σ = 0.00602 (i.e., σ² = 3.6338 × 10⁻⁵) and a starting value for θ^[t-1] at t = 1.
• Generate the proposed value from

$$q(\theta^{[t-1]},\theta^*)=\frac{1}{\sqrt{2\pi c\sigma^2}}\exp\left(-\frac{1}{2}\frac{(\theta^*-\theta^{[t-1]})^2}{c\sigma^2}\right)$$

i.e., generate δ from N(0, cσ²) and add it to θ^[t-1], and accept with probability

$$\alpha\!\left(\theta^{[t-1]},\theta^*\right)=\begin{cases}\min\left\{\dfrac{p(\theta^*\mid\mathbf{y})}{p(\theta^{[t-1]}\mid\mathbf{y})},\,1\right\} & \text{if } p(\theta^{[t-1]}\mid\mathbf{y})>0\\[1ex] 1 & \text{otherwise}\end{cases}$$

• Tune c for intermediate acceptance rates during burn-in.
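A Python sketch of this random-walk sampler for the linkage data (a hypothetical stand-in for the IML code; note the q terms cancel by symmetry, so only the posterior ratio appears):

```python
import math
import random

y1, y2, y3, y4 = 1997, 906, 904, 32

def log_post(t):
    """Flat-prior log posterior, up to an additive constant."""
    if not 0.0 < t < 1.0:
        return -math.inf
    return y1 * math.log(2 + t) + (y2 + y3) * math.log(1 - t) + y4 * math.log(t)

def rw_sampler(n, theta0=0.0357, sd=0.00602, c=1.0):
    theta, out = theta0, []
    for _ in range(n):
        prop = theta + random.gauss(0.0, math.sqrt(c) * sd)   # theta* = theta + delta
        # symmetric proposal: ratio reduces to p(theta*|y) / p(theta|y)
        if math.log(random.random()) < min(log_post(prop) - log_post(theta), 0.0):
            theta = prop
        out.append(theta)
    return out
```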
Summary
What about “canned” software?
• WinBUGS
• AD Model Builder
• Various R packages (e.g., MCMCglmm)
• SAS PROC MCMC
  – Will demonstrate shortly; functions a bit like PROC NLMIXED (no CLASS statement)
• They all work fine.
  – But sometimes they don't recognize conjugacy in priors, i.e., can't distinguish between conjugate (Gibbs) and non-conjugate (Metropolis) sampling.
  – So they often default to Metropolis (PROC MCMC: random walk Metropolis).
Recall old split plot in time example
• Recall the "bunny" example from earlier.
  – We used PROC GLIMMIX and MCMC (SAS PROC IML) to analyze the data.
  – Our MCMC implementation involved recognizable FCD.
• Split plot in time assumption.
  – Other alternatives?
    • Marginal versus conditional specifications on CS
    • AR(1)
    • Others?
  – Some FCD are not recognizable.
    • Metropolis updates necessary.
    • Let's use SAS PROC MCMC.
First create the dummy variables using PROC TRANSREG (PROC MCMC does not have a “CLASS” statement)
(Dataset called 'recodedsplit'; part of the full-rank X matrix, &_trgind)

Obs  y     Intercept trt1 trt2 time1 time2 time3 trt1time1 trt1time2 trt1time3 trt2time1 trt2time2 trt2time3 trt time y     trtrabbit
1    -0.3  1         1    0    1     0     0     1         0         0         0         0         0         1   1    -0.3  1_1
2    -0.2  1         1    0    0     1     0     0         1         0         0         0         0         1   2    -0.2  1_1
3    1.2   1         1    0    0     0     1     0         0         1         0         0         0         1   3    1.2   1_1
4    3.1   1         1    0    0     0     0     0         0         0         0         0         0         1   4    3.1   1_1
5    -0.5  1         1    0    1     0     0     1         0         0         0         0         0         1   1    -0.5  1_2
6    2.2   1         1    0    0     1     0     0         1         0         0         0         0         1   2    2.2   1_2
7    3.3   1         1    0    0     0     1     0         0         1         0         0         0         1   3    3.3   1_2
8    3.7   1         1    0    0     0     0     0         0         0         0         0         0         1   4    3.7   1_2
9    -1.1  1         1    0    1     0     0     1         0         0         0         0         0         1   1    -1.1  1_3
10   2.4   1         1    0    0     1     0     0         1         0         0         0         0         1   2    2.4   1_3
SAS PROC MCMC ("Conditional" specification)

data _null_;
  call symputx('seed', 8723);
  call symputx('nvar', 12);
run;

proc mcmc data=recodedsplit outpost=ksu.postsplit  /* where to save the MCMC samples */
          propcov=quanew                           /* Metropolis implementation strategy */
          seed=&seed
          nmc=400000                               /* total number of samples after burn-in */
          thin=10                                  /* save how often? */
          monitor=(beta1-beta&nvar sigmae sigmag);
  /* NBI = 1000 (default number of burn-in cycles) */
  array covar[&nvar] intercept &_trgind;           /* fixed effects dummy variables */
  array beta[&nvar];                               /* fixed effects */
  parms sige 1;                                    /* residual sd (parms: starting values) */
  parms sigg 1;                                    /* random effect sd */
  parms (beta1-beta&nvar) 1;
  prior beta: ~ normal(0, var=1e6);                /* b ~ N(0, 10^6) */
  /* prior beta: ~ general(0); could also do this */
  prior sige ~ general(0, lower=0);                /* Gelman prior: p(sig_e) ~ constant */
  prior sigg ~ general(0, lower=0);                /* Gelman prior: p(sig_u) ~ constant */
SAS PROC MCMC (conditional specification), continued

  beginnodata;
    sigmae = sige*sige;   /* sigma_e^2 */
    sigmau = sigg*sigg;   /* sigma_u^2 */
  endnodata;
  call mult(covar, beta, mu);                          /* mu_i = x_i' beta */
  random u ~ normal(0, var=sigmau) subject=trtrabbit;  /* u_i ~ N(0, sigma_u^2) */
  model y ~ normal(mu + u, var=sigmae);                /* y_i ~ N(x_i'beta + z_i'u, sigma_e^2) */
run;
PROC MCMC output

Parameters
Block  Parameter  Sampling Method  Initial Value  Prior Distribution
1      sige       N-Metropolis     1.0000         general(0, lower=0)
2      sigg       N-Metropolis     1.0000         general(0, lower=0)
3      beta1      N-Metropolis     1.0000         normal(0, var=1e6)
       beta2                       1.0000         normal(0, var=1e6)
       beta3                       1.0000         normal(0, var=1e6)
       beta4                       1.0000         normal(0, var=1e6)
       beta5                       1.0000         normal(0, var=1e6)
       beta6                       1.0000         normal(0, var=1e6)
       beta7                       1.0000         normal(0, var=1e6)
       beta8                       1.0000         normal(0, var=1e6)
       beta9                       1.0000         normal(0, var=1e6)
       beta10                      1.0000         normal(0, var=1e6)
       beta11                      1.0000         normal(0, var=1e6)
       beta12                      1.0000         normal(0, var=1e6)

Random Effects Parameters
Parameter  Subject    Levels  Prior Distribution
u          trtrabbit  15      normal(0, var=sigmau)
Posterior Summaries
Parameter  N      Mean     Std Dev  25%      50%      75%
beta1      40000  0.2178   0.3910   -0.0434  0.2199   0.4823
beta2      40000  2.3706   0.5528   2.0007   2.3707   2.7360
beta3      40000  -0.2079  0.5524   -0.5761  -0.2063  0.1545
beta4      40000  -0.8958  0.5086   -1.2292  -0.8967  -0.5616
beta5      40000  0.0139   0.5066   -0.3172  0.0115   0.3501
beta6      40000  -0.6407  0.5006   -0.9753  -0.6429  -0.3033
beta7      40000  -1.9340  0.7151   -2.4049  -1.9339  -1.4548
beta8      40000  -1.2282  0.7134   -1.7030  -1.2309  -0.7548
beta9      40000  -0.0719  0.7071   -0.5445  -0.0763  0.3993
beta10     40000  0.3055   0.7127   -0.1721  0.3011   0.7832
beta11     40000  -0.5411  0.7097   -1.0132  -0.5395  -0.0682
beta12     40000  0.5758   0.7033   0.1095   0.5748   1.0406
sigmae     40000  0.6314   0.1478   0.5266   0.6124   0.7148
sigmau     40000  0.1276   0.1465   0.0285   0.0850   0.1748

Compare to conditional model results from § 82, 84.
Effective Sample Sizes
Parameter  ESS     Autocorrelation Time  Efficiency
beta1      4285.7  9.3334                0.1071
beta2      5778.0  6.9229                0.1444
beta3      5171.1  7.7353                0.1293
beta4      5639.7  7.0926                0.1410
beta5      3900.5  10.2550               0.0975
beta6      3901.6  10.2522               0.0975
beta7      4197.4  9.5297                0.1049
beta8      6248.7  6.4013                0.1562
beta9      6857.7  5.8329                0.1714
beta10     2890.5  13.8385               0.0723
beta11     6647.5  6.0173                0.1662
beta12     5563.2  7.1902                0.1391
sigmae     6173.6  6.4792                0.1543
sigmau     1364.3  29.3186               0.0341
LSMEANS USING PROC MIXED

trt Least Squares Means
trt  Estimate  Standard Error
1    1.4000    0.2135
2    -0.2900   0.2135
3    -0.1600   0.2135

time Least Squares Means
time  Estimate  Standard Error
1     -0.5000   0.2100
2     0.3667    0.2100
3     0.4667    0.2100
4     0.9333    0.2100

trt*time Least Squares Means
trt  time  Estimate  Standard Error
1    1     -0.2400   0.3638
1    2     1.3800    0.3638
1    3     1.8800    0.3638
1    4     2.5800    0.3638
2    1     -0.5800   0.3638
2    2     -0.5200   0.3638
2    3     -0.0600   0.3638
2    4     5.5E-15   0.3638
3    1     -0.6800   0.3638
3    2     0.2400    0.3638
3    3     -0.4200   0.3638
3    4     0.2200    0.3638
"Least-squares means" using output from PROC MCMC

Marginal means
Variable  Mean      Median    Std Dev
TRT1      1.399202  1.399229  0.241373
TRT2      -0.2857   -0.28771  0.238766
TRT3      -0.16286  -0.16038  0.241136
TIME1     -0.50001  -0.50024  0.225171
TIME2     0.362834  0.365804  0.226114
TIME3     0.466009  0.465563  0.224869
TIME4     0.938682  0.937432  0.223448

Cell means
Variable   Mean      Median    Std Dev
TRT1TIME1  -0.24151  -0.24036  0.395506
TRT1TIME2  1.374094  1.373212  0.390686
TRT1TIME3  1.875858  1.873671  0.388689
TRT1TIME4  2.588362  2.585974  0.388577
TRT2TIME1  -0.58048  -0.58151  0.387481
TRT2TIME2  -0.51727  -0.51545  0.385221
TRT2TIME3  -0.05497  -0.05467  0.389197
TRT2TIME4  0.0099    0.008927  0.389475
TRT3TIME1  -0.67805  -0.67985  0.393538
TRT3TIME2  0.231677  0.2315    0.395277
TRT3TIME3  -0.42287  -0.41975  0.38795
TRT3TIME4  0.217785  0.219946  0.390986

Compare to Gibbs sampling results from § 85.
Posterior densities of σ²_u and σ²_e

[Figure: posterior densities of σ²_u and σ²_e; bounded above 0, by definition]
The Marginal Model Specification (Type = CS)
• SAS PROC MIXED CODE
title "Marginal Model: Compound Symmetry using PROC MIXED";
proc mixed data=ear;
  class trt time rabbit;
  model temp = trt time trt*time / solution;
  repeated time / subject=rabbit(trt) type=cs rcorr;
  lsmeans trt*time;
run;
• Now

$$\mathbf{R}_{k(i)}=\begin{bmatrix}\sigma_u^2+\sigma_e^2 & \sigma_u^2 & \sigma_u^2 & \sigma_u^2\\ \sigma_u^2 & \sigma_u^2+\sigma_e^2 & \sigma_u^2 & \sigma_u^2\\ \sigma_u^2 & \sigma_u^2 & \sigma_u^2+\sigma_e^2 & \sigma_u^2\\ \sigma_u^2 & \sigma_u^2 & \sigma_u^2 & \sigma_u^2+\sigma_e^2\end{bmatrix}=\sigma^2\begin{bmatrix}1 & \rho & \rho & \rho\\ \rho & 1 & \rho & \rho\\ \rho & \rho & 1 & \rho\\ \rho & \rho & \rho & 1\end{bmatrix}$$

with $\sigma^2 = \sigma_u^2 + \sigma_e^2$ and $\rho = \dfrac{\sigma_u^2}{\sigma_u^2+\sigma_e^2}$.

• To ensure R is p.s.d., $\rho > -\dfrac{1}{n_t-1}$, where n_t is the number of repeated measures per rabbit.
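One way to see the p.s.d. condition: the CS matrix σ²_e I + σ²_u J has eigenvalues σ²_e + n_t σ²_u (once) and σ²_e (n_t − 1 times), and ρ ≥ −1/(n_t − 1) is equivalent to the first of these being non-negative. A small Python sketch (hypothetical helper names):

```python
def cs_matrix(sig2_u, sig2_e, nt):
    """Compound-symmetry covariance for nt repeated measures:
    diagonal sig2_u + sig2_e, off-diagonal sig2_u."""
    return [[sig2_u + (sig2_e if i == j else 0.0) for j in range(nt)]
            for i in range(nt)]

def cs_is_psd(sig2_u, sig2_e, nt):
    """Eigenvalues are sig2_e + nt*sig2_u (once) and sig2_e (nt-1 times),
    so p.s.d. iff both are non-negative, i.e. rho >= -1/(nt-1)."""
    return sig2_e + nt * sig2_u >= 0.0 and sig2_e >= 0.0
```

Note this permits a negative σ²_u, which is exactly what the marginal model exploits later.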
Need to format data differently (data=recodedsplit1)

Obs  trt  time  trtrabbit  first  last  y
1    1    1     1_1        1      0     -0.3
2    1    2     1_1        0      0     -0.2
3    1    3     1_1        0      0     1.2
4    1    4     1_1        0      1     3.1
5    1    1     1_2        1      0     -0.5
6    1    2     1_2        0      0     2.2
7    1    3     1_2        0      0     3.3
8    1    4     1_2        0      1     3.7
9    1    1     1_3        1      0     -1.1
10   1    2     1_3        0      0     2.4
I'll keep the covariates in a different file too (data=covariates).

Obs  Intercept  trt1  trt2  time1  time2  time3  trt1time1  trt1time2  trt1time3  trt2time1  trt2time2  trt2time3
1    1          1     0     1      0      0      1          0          0          0          0          0
2    1          1     0     0      1      0      0          1          0          0          0          0
3    1          1     0     0      0      1      0          0          1          0          0          0
4    1          1     0     0      0      0      0          0          0          0          0          0
5    1          1     0     1      0      0      1          0          0          0          0          0
6    1          1     0     0      1      0      0          1          0          0          0          0
7    1          1     0     0      0      1      0          0          1          0          0          0
8    1          1     0     0      0      0      0          0          0          0          0          0
9    1          1     0     1      0      0      1          0          0          0          0          0
10   1          1     0     0      1      0      0          1          0          0          0          0
PROC MCMC

data a; run;   /* This data step is a little silly but it is required. */

/* PROC MCMC WITH COMPOUND SYMMETRY ASSUMPTION */
title1 "Bayesian inference on compound symmetry";
proc mcmc jointmodel data=a outpost=ksu.postcs propcov=quanew
          seed=&seed nmc=400000 thin=10;
  /* jointmodel option implies that each observation's contribution to the
     likelihood function is NOT conditionally independent */
  array covar[1] /nosymbols;
  array data[1] /nosymbols;
  array first1[1] /nosymbols;
  array last1[1] /nosymbols;
  array beta[&nvar];
  array mu[&nrec];
  array ytemp[&nrep];
  array mutemp[&nrep];
  array VCV[&nrep, &nrep];
  begincnst;
    rc = read_array("recodedsplit1", data, "y");
    rc = read_array("recodedsplit1", first1, "first");
    rc = read_array("recodedsplit1", last1, "last");
    rc = read_array("covariates", covar);
  endcnst;

  parms sige .25;    * residual sd;
  parms intrcl .3;   * intraclass correlation;
  parms (beta1-beta&nvar) 1;
  beginnodata;
    prior beta: ~ normal(0, var=1e6);
    prior sige ~ general(0, lower=0);   /* Gelman prior */
    prior intrcl ~ general(0, lower=&lbound1, upper=.999);
    sigmae = sige*sige;
    sigmag = intrcl*sigmae;
    call fillmatrix(VCV, sigmag);
    do i = 1 to &nrep;
      VCV[i,i] = sigmae;
    end;
    call mult(covar, beta, mu);
  endnodata;

  ljointpdf = 0;

• &lbound1 = −1/3 (lower bound on the CS correlation when block size = 4)
  do irec = 1 to &nrec;
    if (first1[irec] = 1) then counter = 0;
    counter = counter + 1;
    ytemp[counter] = data[irec];
    mutemp[counter] = mu[irec];
    if (last1[irec] = 1) then do;
      ljointpdf = ljointpdf + lpdfmvn(ytemp, mutemp, VCV);
    end;
  end;
  model general(ljointpdf);
run;
PROC MCMC Posterior Summaries
Parameter  N      Mean     Std Dev  25%      50%      75%
sige       40000  0.8643   0.1040   0.7921   0.8528   0.9225
intrcl     40000  0.1736   0.1453   0.0679   0.1599   0.2661
beta1      40000  0.2267   0.3909   -0.0313  0.2298   0.4869
beta2      40000  2.3553   0.5523   1.9916   2.3491   2.7140
beta3      40000  -0.2290  0.5536   -0.5965  -0.2327  0.1388
beta4      40000  -0.8982  0.5012   -1.2320  -0.8984  -0.5682
beta5      40000  0.0185   0.4937   -0.3080  0.0204   0.3433
beta6      40000  -0.6505  0.4985   -0.9830  -0.6529  -0.3221
beta7      40000  -1.9185  0.7058   -2.3900  -1.9170  -1.4498
beta8      40000  -1.2292  0.7038   -1.6901  -1.2329  -0.7667
beta9      40000  -0.0599  0.7024   -0.5232  -0.0555  0.4045
beta10     40000  0.3204   0.7087   -0.1426  0.3182   0.7891
beta11     40000  -0.5386  0.7072   -0.9975  -0.5438  -0.0748
beta12     40000  0.5890   0.7025   0.1227   0.5945   1.0596
PROC MIXED vs PROC MCMC

PROC MIXED — Covariance Parameter Estimates
Cov Parm  Subject      Estimate  Standard Error  Z Value  Pr Z
CS        rabbit(trt)  0.08336   0.09910         0.84     0.4002
Residual               0.5783    0.1363          4.24     <.0001

PROC MCMC
Variable  Median    Std Dev   Minimum   Maximum
sigmau2   0.110874  0.15354   -0.34127  5.535211
sigmae2   0.592512  0.152743  0.246462  1.870365
Posterior marginal densities for σ²_u and σ²_e under the marginal model

Notice how much of the posterior density of σ²_u is concentrated to the left of 0!

Potential "ripple effect" on inferences on K′b (Stroup and Littell, 2002) relative to the conditional specification?
First order autoregressive model (type = AR(1))
• SAS PROC MIXED CODE
title "Marginal Model: AR(1) using PROC MIXED";
proc mixed data=ear;
  class trt time rabbit;
  model temp = trt time trt*time / solution;
  repeated time / subject=rabbit(trt) type=AR(1) rcorr;
  lsmeans trt*time;
run;

CORRECTION!
Specifying VCV for AR(1)
• Note

$$\mathbf{R}_{k(i)}=\sigma^2\begin{bmatrix}1 & \rho & \rho^2 & \rho^3\\ \rho & 1 & \rho & \rho^2\\ \rho^2 & \rho & 1 & \rho\\ \rho^3 & \rho^2 & \rho & 1\end{bmatrix}$$

• Might be easier to specify the inverse directly:

$$\mathbf{R}_{k(i)}^{-1}=\frac{1}{\sigma^2(1-\rho^2)}\begin{bmatrix}1 & -\rho & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & 0\\ 0 & -\rho & 1+\rho^2 & -\rho\\ 0 & 0 & -\rho & 1\end{bmatrix}$$

especially for large R_k(i).
Example MCMC code provided online.
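The banded inverse can be verified numerically without any linear-algebra library; this Python sketch (hypothetical helper names) builds both matrices for any block size:

```python
def ar1_matrix(sig2, rho, nt):
    """AR(1) covariance: sig2 * rho^|i-j|."""
    return [[sig2 * rho ** abs(i - j) for j in range(nt)] for i in range(nt)]

def ar1_inverse(sig2, rho, nt):
    """Closed-form tridiagonal inverse: 1/(sig2*(1-rho^2)) times the band
    matrix with 1 at the two corner diagonal entries, 1+rho^2 on the
    interior diagonal, and -rho on the first off-diagonals."""
    k = 1.0 / (sig2 * (1.0 - rho * rho))
    inv = [[0.0] * nt for _ in range(nt)]
    for i in range(nt):
        inv[i][i] = k * (1.0 if i in (0, nt - 1) else 1.0 + rho * rho)
        if i + 1 < nt:
            inv[i][i + 1] = inv[i + 1][i] = -k * rho
    return inv
```

Multiplying the two should recover the identity, which is a quick sanity check before coding the same thing in PROC MCMC.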
Variance Component Inference

PROC MIXED — Covariance Parameter Estimates
Cov Parm  Subject      Estimate  Standard Error
AR(1)     rabbit(trt)  0.2867    0.1453
Residual               0.6551    0.141

MCMC
Variable  Median  Std Dev  5th Pctl  95th Pctl
rho       0.286   0.149    0.0313    0.52
sigmae2   0.706   0.178    0.501     1.056
An example of a “sticky” situation
• Consider a Poisson (count data) example: simulated data from a split-plot design.
  – 4 whole plots per each of 3 levels of a whole-plot factor.
  – 3 subplots per whole plot → 3 levels of a subplot factor.
• Whole-plot variance: σ²_w = 0.50
• Overdispersion (G-side) variance (B*wholeplot variance): σ²_e = 1.00
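A data-generating sketch for this design in Python (hypothetical; the log-scale grand mean `mu` and the dependency-free Poisson sampler are illustrative choices not given on the slide):

```python
import math
import random

def rpois(rng, lam):
    """Poisson draw: inversion for small lam, normal approximation otherwise."""
    if lam > 30.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    u, k = rng.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k

def simulate_split_plot(seed=0, mu=2.0, sig2_w=0.50, sig2_e=1.00):
    """3 levels of whole-plot factor A x 4 whole plots x 3 subplot levels of B,
    as on the slide. Returns (A level, B level, whole-plot id, count) rows."""
    rng = random.Random(seed)
    rows = []
    for a in range(3):                 # whole-plot factor A
        for w in range(4):             # whole plots within A
            wp = rng.gauss(0.0, math.sqrt(sig2_w))     # whole-plot effect
            for b in range(3):         # subplot factor B
                e = rng.gauss(0.0, math.sqrt(sig2_e))  # G-side overdispersion
                rows.append((a, b, (a, w), rpois(rng, math.exp(mu + wp + e))))
    return rows
```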
GLIMMIX code:
proc glimmix data=splitplot method=laplace;
  class A B wholeplot subject;
  model y = A|B / dist=poisson solution;
  random wholeplot(A) B*wholeplot(A);
  lsmeans A B A*B / e ilink;
run;
Inferences on variance components:
• PROC GLIMMIX

Covariance Parameter Estimates
Cov Parm        Estimate  Standard Error
wholeplot(A)    0.6138    0.3516
B*wholeplot(A)  0.9293    0.2514
Using PROC MCMC

proc mcmc data=recodedsplit outpost=postout propcov=quanew seed=9548
          nmc=400000 thin=10;
  array covar[&nvar] intercept &_trgind;
  array beta[&nvar];
  parms sigmau .5;
  parms sigmae .5;
  parms (beta1-beta&nvar) 1;
  prior beta: ~ normal(0, var=10E6);
  prior sigmae ~ igamma(shape=.1, scale=.1);
  prior sigmau ~ igamma(shape=.1, scale=.1);
  call mult(covar, beta, mu);                        /* mu_i = x_i' beta */
  random u ~ normal(0, var=sigmau) subject=plot;     /* u_j ~ N(0, sigma_u^2) */
  random e ~ normal(0, var=sigmae) subject=subject;  /* e_i ~ N(0, sigma_e^2) */
  lambda = exp(mu + u + e);                          /* lambda_i = exp(x_i'beta + z_i'u + e_i) */
  model y ~ poisson(lambda);                         /* y_i ~ Poisson(lambda_i) */
run;

Model: $y_i \sim \text{Poisson}(\lambda_i)$, $\lambda_i = \exp(\mathbf{x}_i'\boldsymbol{\beta} + \mathbf{z}_i'\mathbf{u} + e_i)$, $u_j \sim N(0, \sigma_u^2)$, $e_i \sim N(0, \sigma_e^2)$; priors $\boldsymbol{\beta} \sim N(\mathbf{0}, 10^6\mathbf{I})$, $\sigma_u^2 \sim IG(0.1, 0.1)$, $\sigma_e^2 \sim IG(0.1, 0.1)$.
Some output

Posterior Summaries
Parameter  N      Mean     Std Dev  25%      50%      75%
sigmag     40000  0.7947   0.5956   0.3891   0.6635   1.0324
sigmae     40000  1.4055   0.4559   1.0802   1.3285   1.6449
beta1      40000  6.6630   0.3811   6.4611   6.6790   6.9158
beta2      40000  -3.8229  0.8258   -4.3769  -3.8290  -3.2845
beta3      40000  -4.2165  0.8073   -4.7672  -4.2412  -3.7257
beta4      40000  -0.7618  0.4472   -1.0997  -0.8095  -0.4266
beta5      40000  -1.5901  0.6757   -2.1210  -1.5089  -1.1206
beta6      40000  -2.0756  0.7286   -2.5323  -2.0938  -1.6069
beta7      40000  0.7144   1.1396   -0.0554  0.7189   1.4600
beta8      40000  0.6214   1.1488   -0.1162  0.6336   1.3851
beta9      40000  2.4683   1.0499   1.8227   2.4922   3.1429
beta10     40000  1.9011   1.1083   1.2645   1.9517   2.6003
beta11     40000  -0.8063  0.8887   -1.4099  -0.8112  -0.2278
beta12     40000  1.3887   0.9450   0.6332   1.4562   2.0298

In the same ballpark as the PROC GLIMMIX solutions/VC estimates… but there is a PROBLEM →
Pretty slow mixing

Effective Sample Sizes
Parameter  ESS    Autocorrelation Time  Efficiency
sigmag     155.1  257.9                 0.0039
sigmae     186.2  214.8                 0.0047
beta1      43.0   931.1                 0.0011
beta2      59.4   673.8                 0.0015
beta3      61.8   646.8                 0.0015
beta4      44.1   906.0                 0.0011
beta5      42.5   940.4                 0.0011
beta6      54.4   735.8                 0.0014
beta7      62.5   639.9                 0.0016
beta8      86.9   460.1                 0.0022
beta9      58.6   682.1                 0.0015
beta10     136.2  293.7                 0.0034
beta11     53.7   745.5                 0.0013
beta12     49.3   811.0                 0.0012
[Trace plots for sigmag, sigmae, beta1, and beta2]
From SAS log file:
Too sticky!!! Solution? Thin even more than saving every 10….and generate a lot more samples!
Hierarchical centering sampling advocated by SAS
proc mcmc data=recodedsplit outpost=postout propcov=quanew seed=234
          nmc=400000 thin=10;
  array covar[&nvar] intercept &_trgind;
  array beta[&nvar];
  array wp[16];
  parms wp: 0;
  parms sigmae .5;
  parms sigmag .5;
  parms (beta1-beta&nvar) 1;
  prior wp: ~ normal(0, var=sigmag);                 /* u_j ~ N(0, sigma_u^2) */
  prior beta: ~ normal(0, var=10E6);                 /* beta ~ N(0, 10^6 I) */
  prior sigmae ~ igamma(shape=.1, scale=.1);         /* sigma_e^2 ~ IG(0.1, 0.1) */
  prior sigmag ~ igamma(shape=.1, scale=.1);         /* sigma_u^2 ~ IG(0.1, 0.1) */
  call mult(covar, beta, mu);                        /* mu_i = x_i' beta */
  w = wp[plot] + mu;                                 /* w_i = x_i'beta + z_i'u */
  random llambda ~ normal(w, var=sigmae) subject=subject;  /* log(lambda_i) ~ N(w_i, sigma_e^2) */
  lambda = exp(llambda);
  model y ~ poisson(lambda);                         /* y_i ~ Poisson(lambda_i) */
run;
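The reparameterization can be seen in miniature: both parameterizations imply the same marginal distribution for log λ (mean μ on the linear-predictor scale, variance σ²_u + σ²_e); hierarchical centering just moves u into the mean of the Gaussian that the sampler updates, which is what improves mixing. A Python sketch (hypothetical function names, illustrative parameter values):

```python
import math
import random

def draw_loglam_uncentered(rng, mu, sig2_u, sig2_e):
    # non-centered: log(lambda) = mu + u + e, with u and e drawn separately
    u = rng.gauss(0.0, math.sqrt(sig2_u))
    e = rng.gauss(0.0, math.sqrt(sig2_e))
    return mu + u + e

def draw_loglam_centered(rng, mu, sig2_u, sig2_e):
    # hierarchically centered: w = mu + u, then log(lambda) ~ N(w, sig2_e)
    w = rng.gauss(mu, math.sqrt(sig2_u))
    return rng.gauss(w, math.sqrt(sig2_e))
```

Monte Carlo draws from either function should show the same mean μ and variance σ²_u + σ²_e, confirming the two specifications describe the same model.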
Faster mixing!

Effective Sample Sizes
Parameter  ESS     Autocorrelation Time  Efficiency
wp1        497.2   80.4554               0.0124
wp2        621.5   64.3569               0.0155
wp3        336.4   118.9                 0.0084
wp4        669.9   59.7148               0.0167
wp5        967.1   41.3624               0.0242
wp6        1767.9  22.6263               0.0442
wp7        1160.7  34.4624               0.0290
wp8        1109.0  36.0701               0.0277
wp9        1275.3  31.3651               0.0319
wp10       717.9   55.7176               0.0179
wp11       1518.0  26.3512               0.0379
wp12       1223.3  32.6995               0.0306
wp13       583.9   68.5094               0.0146
wp14       606.2   65.9881               0.0152
wp15       674.1   59.3384               0.0169
wp16       799.2   50.0492               0.0200

Parameter  ESS     Autocorrelation Time  Efficiency
sigmae     3831.5  10.4397               0.0958
sigmag     825.1   48.4794               0.0206
beta1      850.1   47.0507               0.0213
beta2      1475.5  27.1103               0.0369
beta3      908.7   44.0188               0.0227
beta4      907.1   44.0954               0.0227
beta5      6352.5  6.2967                0.1588
beta6      4736.8  8.4446                0.1184
beta7      8021.8  4.9864                0.2005
beta8      4565.9  8.7606                0.1141
beta9      7303.8  5.4766                0.1826
beta10     8076.8  4.9525                0.2019
beta11     5080.2  7.8738                0.1270
beta12     4005.2  9.9870                0.1001
[Trace plots for sigmag, sigmae, beta1, and beta2]
Natural next step
• Compute marginal/cell means as functions of the effects (β), just like before: i.e., k′β.
• Transform to the observed scale and look at the posterior distribution: naturally(?), exp(k′β).
• But that is a "conditional specification"; marginally, it might be something different…
Simple illustration of marginal versus conditional in overdispersed Poisson
• If $Y_i \sim \text{Poisson}(\exp(\mu + u_i))$ with $u_i \sim N(0, \sigma_u^2)$, then marginally

$$E[Y_i] = E_u\left[\exp(\mu + u_i)\right] = \exp\left(\mu + \frac{\sigma_u^2}{2}\right)$$

so we probably should look at the posterior density of this function instead for "population-averaged" inference.

• Conditionally on $u_i = 0$:

$$E[Y_i \mid u_i = 0] = \exp(\mu) \quad \text{("subject-specific" inference)}$$

• This has implications for which functions we look at for posterior distributions.
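The lognormal-mean identity behind the marginal expression can be checked by Monte Carlo; a Python sketch (hypothetical function names, illustrative values of μ and σ²_u):

```python
import math
import random

def marginal_mean(mu, sig2_u):
    """Population-averaged mean of Y ~ Poisson(exp(mu + u)), u ~ N(0, sig2_u):
    E[Y] = E[exp(mu + u)] = exp(mu + sig2_u / 2)  (lognormal mean)."""
    return math.exp(mu + sig2_u / 2.0)

def conditional_mean(mu):
    """Subject-specific mean at u = 0: E[Y | u = 0] = exp(mu)."""
    return math.exp(mu)
```

Averaging exp(μ + u) over many simulated u values converges to marginal_mean, which always exceeds conditional_mean whenever σ²_u > 0; this is exactly why the two inference targets differ.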
Enough with your probit link!
• I WANT TO DO MCMC ON A LOGISTIC MIXED EFFECTS MODEL.
  – I'm an odd(s ratio) kind of guy/girl.
  – Ok, fine. See the worked-out example for PROC MCMC:
    • Fang Chen. 2011. The RANDOM statement and more: moving on with PROC MCMC. SAS Global Forum 2011. http://support.sas.com/rnd/app/papers/abstracts/334-2011.html
Other SAS procedures doing Bayesian/MCMC inference?
• Yes, but primarily for fixed effects models.
  – PROC GENMOD, LIFEREG, PHREG.
  – The greater need might be for mixed model versions.
• PROC MIXED has some Bayesian MCMC capabilities for simple variance component models (i.e., not repeated measures).
Repeated measures in generalized linear mixed models
• The G-side versus R-side conundrum.
• In classical GLMM analyses (PROC GLIMMIX, GENMOD), the R-side process cannot be simulated; the model is "vacuous" (Walt Stroup).
• So take the G-side route.
  – This would be easy to analyze using MCMC if underlying liabilities were augmented (a multivariate normal cdf is needed otherwise).