a survey of automatic bayesian software and why you should ... · a survey of automatic bayesian...

24
A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu [email protected] zhenkewu.com December 15 th , 2015 Hopkins Biostatistics Computing Club

Upload: others

Post on 02-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

ASurveyofAutomaticBayesianSoftwareand

WhyYouShouldCareZhenke Wu

[email protected]

zhenkewu.com

December15th,2015

HopkinsBiostatisticsComputing Club

Page 2: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

BayesFormulaModellikelihoodforobserveddata𝑥

Prioronmodelparameter𝜃

• Marginaldistribution ofdatagiventhemodel;• “Evidence”thatthisdata𝑥aregeneratedbythis

model (Box1980,JRSS-A)• Hardtocompute

ThomasBayes(1701-1761)

Page 3: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

GibbsSampling

Usesimulatedsamplestoapproximatetheentire jointposteriordistribution

Lookfromtop

Page 4: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

WhyAutomaticSoftwareforBayesianInference?• Self-codedsimulationalgorithmsusuallyrequireextratunning andcostmuchtime(shareyourexperience!)• Generalformula/recipesexistforsamplingfromcommondistributions• Modelersgenerallywantreasonable andfast modeloutputstospeedupmodelbuilding,testingandinterpretation

Page 5: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

AnalyticPipeline AutomaticBayesianSoftware

Page 6: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

BayesianSoftwareandGoogleTrends

• WinBUGS/OpenBUGS• JAGS• Stan• PyMC3• Others,e.g,R-INLA,NIMBLE,MCMCpack…

https://goo.gl/YNQbCP

Page 7: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

IfAddingtheTrendforR?

https://goo.gl/orflly

Page 8: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

WewillConnectTheseSoftwaretoR…

Abstract: If you are using R and you think you’re in hell, this is a map for you.

Page 9: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

WinBUGS http://www.mrc-bsu.cam.ac.uk/software/bugs/

• BayesianinferenceUsingGibbsSampling• LatestVersion:1.4.3;Add-onmodules,e.g.,GeoBUGS• CallfromRby“R2WinBUGS”• Since1989inMedicalResearchCouncil(MRC)BiostatisticsUnit,Cambridge---DavidSpiegelhalterwithchiefprogrammerAndrewThomas;MotivatedbyArtificialIntelligenceresearch• 1996toImperialCollege,London--- NickyBest,JonWakefieldandDaveLunn• Nochangesince2007• In2004OpenBUGS isbranchedfromWinBUGS byAndrewThomas(http://www.openbugs.net/w/FrontPage);stillunderdevelopment• Moreat:Lunn,D.,Spiegelhalter,D.,Thomas,A.andBest,N.(2009)TheBUGSproject:Evolution,critiqueandfuturedirections(withdiscussion), StatisticsinMedicine 28:3049--3082.

Page 10: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

GoodExperience- WinBUGS

• GUI,easyforvisualinspectionofchainswithouttoomuchposteriorsampleprocessing• Goodteachingtoolwithacompanionbook: TheBUGSBook- APracticalIntroductiontoBayesianAnalysis• Codedinmanycommondistributionssuitablefordifferenttypesofdata(seeManual)

• Relativeeasyfordebuggingbecauseitpointstospecificerrors

Page 11: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

BadExperiences- WinBUGS

• “WhyyoushouldnotuseWinBUGS orOpenBUGS”– BarryRowlingsonhttp://geospaced.blogspot.com/2013/04/why-you-should-not-use-winbugs-or.html• Odderrors,e.g.,“trap”messagesformemoryerrors• WritteninComponentPascal;canonlybereadwithBlackBox ComponentBuilderfromOberonMicrosystems,whichonlyrunsonWindows.AlsoBlackBox wasabandonedbyitsowndevelopersin2012.• Notveryopen-source,althroughwithtoolstoextendWinBUGS• Essentiallysamplenodesunivariately;blocksamplingonlyavailableformultivariatenodes,orfixed-effectparametersinGLMsbyMetroplis-HastingsalgorithmproposedbyIterativelyReweightedLeastSquares.

Page 12: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Example:Penalized-SplineRegressionWinBUGS (500datapoints;10,000iterations;5.87secs)for(i in1:N){

M[i]~dnorm(mu[i],prec)

#mu[i]<- inprod2(ZB[i,],beta[])

mu[i]<- ZB[i,1]*beta[1]+ZB[i,2]*beta[2]+ZB[i,3]*beta[3]+ZB[i,4]*beta[4]+

ZB[i,5]*beta[5]+ZB[i,6]*beta[6]+ZB[i,7]*beta[7]+ZB[i,8]*beta[8]+

ZB[i,9]*beta[9]+ZB[i,10]*beta[10]#scalarcalculations.

}

sigma<- pow(prec,-0.5)

#priorforB-splinecoefficients:first-orderpenaltymatrix:

beta[1]~dnorm(0,prec_beta1)

for(cin2:C){

beta[c]~dnorm(beta[c-1],taubeta)

}

taubeta ~dgamma(3,2)

prec_beta1<- 1/4*prec

prec ~dgamma(1.0E-2,1.0E-2)

}

Page 13: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Example:Penalized-SplineRegressionWinBUGS (10,000iterations;4.15secs)

Datapoints

Posteriorsamplesofmeancurves

B-splinebasismultipliedbyestimatedcoefficients

Truemeancurve

Page 14: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

JAGS(JustAnotherGibbsSampler)

• http://mcmc-jags.sourceforge.net/• Latestversion4.0.0;Author:MartynPlummer;firstrelease:2007• “notwhollyunlikeBUGS”withthreeaims:- cross-platformengine(writteninC++),e.g.,MacOSX,Linux,Windows- extensibility- aplatformforexperimentation• Experience:- greatspeed(loadthe“glm”module!);built-invectorization- responsiveonlinecommunity(mostlyrespondedinadaybyMartynhimself)- genericerrormessageshardtoknowexactlywhatwentwrong- noGUI

Page 15: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Example:Penalized-SplineRegressionJAGS(10,000iterations;4.15secs)model{

for(i in1:N){

M[i]~dnorm(mu[i],prec)

}

sigma<- pow(prec,-0.5)

mu<- ZB%*%beta#vectorized.

#priorforB-splinecoefficients: first-orderpenaltymatrix:

beta[1]~dnorm(0,prec_beta1)

for(cin2:C){

beta[c]~dnorm(beta[c-1],taubeta)

}

taubeta ~dgamma(3,2)

prec_beta1<- 1/4*prec

prec ~dgamma(1.0E-2,1.0E-2)

}

Page 16: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Stanhttp://mc-stan.org/interfaces/

• namedinhonorofStanislawUlam,pioneeroftheMonteCarlomethod(Metropolis,Nicholas,andStanislawUlam."TheMonteCarlomethod.”(1949)JASA)

• InferentialEngine:• MCMCsampling(NoU-TurnSampler;HamiltonianMonteCarlo)• ApproximateBayesianinference(variational inference)• Penalizedmaximumlikelihoodestimation(Optimization)

• Latestversion2.9.0;DevelopedbyatColumbia;initialreleaseAugust2012• Cross-platform;WritteninC++;Open-source

• CallfromRby“rstan”;canalsobecalledfromPythonby“PyStan”;Julia…

• Verysweetpart:“shinyStan”package;seedemo.

Page 17: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Example:Penalized-SplineRegressionStan(10,000iterations;9.44secs)data{int<lower=0>N;//numberofobservations int<lower=0>C;//number ofB-spline basesmatrix[N,C]ZB;//predictorforobservationnvector[N]M;//outcomeforobservation n}parameters{real<lower=0>sigma;//noise variancereal<lower=0>sigma_beta;//smoothing parameter.vector[C]beta;//regression}transformedparameters{vector[N]mu;mu<- ZB*beta;}model{sigma~cauchy(0,5);sigma_beta ~cauchy(0,5); beta[1]~normal(0,2*sigma); for(lin2:C)beta[l]~normal(beta[l-1],sigma_beta);M~normal(mu, sigma);}

PosteriorIntervals

Page 18: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

shinyStan

Page 19: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

RStan Experience• Vectorized functions--- fast!(builtuponEigen,aC++templatelibraryforlinearalgebra)• Goodwhenthedataarebigbutthemodelissmall

• Ctypevariabledeclaration;providesextensivewarning/errormessages• Notreliantuponconjugatepriors(comparetoBUGS)• Convenienttoinstallbyinstall.packages(“rstan”)• HostedbyGitHub

• Currentlycannotsamplediscreteunknownparameters• NotalwaysfasterthanBUGS/JAGS:“Slowerperiterationbutmuchbetteratmixingandconverging”BobCarpenter;Thehopeistotrade-offwalltimeforshorterchains.

Page 20: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

PyMC3 (DangerZone:TheSpeakerhasnoExperience)• BasedonHamiltonianMonteCarlo• Requiregradientinformation,calculatedbyTheano (fast;tightlyintegratedwithNumPy)• ModelspecificationdirectlyinPythoncode:

“Thereshouldbeone—andpreferablyonlyone—obviouswaytodoit”— ZenofPython

• Readings:- https://pymc-devs.github.io/pymc3/getting_started/- http://andrewgelman.com/2015/10/15/whats-the-one-thing-you-have-to-know-about-pystan-

and-pymc-click-here-to-find-out/

Page 21: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

INLA

• IntegratednestedLaplaceapproximation(Rue,MartinoandChopin(2009)JRSS-B)• SuitableforlatentGaussianMarkovrandomfieldmodels,e.g.,Generalizedadditivemodels,Timeseriesmodels,Geoadditivemodels…(recommendtoyourfriendswhodospatialstatistics!)• Fastformarginalposteriordensities,hencesummarystatisticsofinterest,posteriormeans,variancesorquantiles

• Moreat:http://www.math.ntnu.no/~hrue/r-inla.org/doc/Intro/Intro.pdf

Page 22: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

RPackage“baker”:https://github.com/zhenkewu/baker

• BayesianAnalyticKitforEtiologyResearch• CallJAGSorWinBUGS fromR• AutomaticallywritethefullmodelfileusinganRwrapperfunction• “Plug-and-Play”toaddextralikelihoodcomponentsandpriors• Built-invisualizationsforinterpretingresults

Page 23: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Summary• Modeler’stime:- modeldesign/interpretation(iterativenatureofmodelling)- writeone’sowncodeforposteriorcomputing• Surveyedsoftwarethatdoesautomaticposteriorinference• Choiceofsoftwaredependson- Stageofmodeldevelopment(debuggingormassproduction)- Scaleofanalysis- Documentationandonlinecommunity- RorPythonastheprimarydataprocessinglanguage• P-splineregressiondonebydifferentsoftware;comparison• IntroducedanRpackage“baker”fordiseaseetiologyresearch;usedJAGSorWinBUGS;potentialimprovements

Page 24: A Survey of Automatic Bayesian Software and Why You Should ... · A Survey of Automatic Bayesian Software and Why You Should Care Zhenke Wu zhwu@jhu.edu zhenkewu.com December 15th,

Thanks!

• Questions

• Shareyourexperience