IFSPM Institut für Sozial- und Präventivmedizin
The multcomp R Add-on Package
11 Years of Simultaneous Inference for the Masses
Torsten Hothorn, Universität Zürich
2013-09-26
The Power of R
– The R system for statistical computing is nowadays the lingua franca for statistical analyses in many fields.
– With nearly 5000 R add-on packages being available from the Comprehensive R Archive Network, procedures implementing a wide range of statistical and other analysis methods are easily accessible to a large audience.
– Software (aka R add-on packages) determines how data is analysed today in a much stronger way than theoretical papers.
– We will take a closer look at two packages, mvtnorm and multcomp, as computational incarnations of fairly general theories.
University of Zurich, IFSPM 2013-09-26 multcomp Page 2
mvtnorm History
1992 Alan Genz: Numerical computation of multivariate normal probabilities (JCGS)
1993 Alan Genz: Comparison of methods for the computation of multivariate normal probabilities (CSS)
1999 Alan Genz & Frank Bretz: Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts (JCGS)
1992– Alan Genz: MVTDST, a set of FORTRAN subroutines for the numerical computation of multivariate t integrals, with maximum dimension 100. This is an assimilation of the best software in MVTPACK. This software may also be used to compute multivariate normal integrals.
2000 mvtnorm 0.1-8 published on CRAN 2000-11-14. Basically a value-added interface to MVTDST.
mvtnorm R Add-on Package
Package: mvtnorm
Title: Multivariate Normal and t Distributions
Version: 0.9-9995
Date: 2013-05-29
Author: Alan Genz, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi,
Friedrich Leisch, Fabian Scheipl, Bjoern Bornkamp, Torsten Hothorn
Maintainer: Torsten Hothorn <[email protected]>
Description: Computes multivariate normal and t probabilities,
quantiles, random deviates and densities.
Imports: stats
Depends: R(>= 1.9.0)
License: GPL-2
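The four kinds of quantities named in the Description (probabilities, quantiles, densities, random deviates) map onto functions exported by mvtnorm. A minimal sketch, assuming the package is installed; the equicorrelated matrix R and all numeric inputs are illustrative choices, not values from the talk:

```r
library("mvtnorm")

# Illustrative 3 x 3 correlation matrix with common correlation 0.5
R <- matrix(0.5, nrow = 3, ncol = 3)
diag(R) <- 1

# Probability: P(-1 < X_j <= 1 for all j) for X ~ N(0, R)
p_box <- as.numeric(pmvnorm(lower = rep(-1, 3), upper = rep(1, 3), corr = R))

# Quantile: q such that P(-q < X_j <= q for all j) = 0.95
q95 <- qmvnorm(0.95, tail = "both.tails", corr = R)$quantile

# Density of N(0, R) evaluated at the origin
d0 <- dmvnorm(rep(0, 3), mean = rep(0, 3), sigma = R)

# Five random deviates from N(0, R)
set.seed(1)
x <- rmvnorm(5, mean = rep(0, 3), sigma = R)

# Multivariate t analogue of the box probability, with 10 degrees of freedom
p_box_t <- as.numeric(pmvt(lower = rep(-1, 3), upper = rep(1, 3),
                           df = 10, corr = R))
```

The probabilities and quantiles come from numerical integration, so repeated calls may differ slightly in the last digits.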
The mvtnorm Environment
[Figure: network graph of the CRAN packages surrounding mvtnorm. Several hundred package names appear as node labels, among them coin, copula, flexmix, multcomp, party, and tmvtnorm.]
The mvtnorm Environment
Dependency Ranking of 4828 CRAN Packages
[Figure: bar chart; x-axis: number of reverse dependencies, 0 to 800.]
 1. methods
 2. MASS
 3. stats
 4. lattice
 5. graphics
 6. utils
 7. survival
 8. mvtnorm
 9. Matrix
10. ggplot2
Application: Parametric Simultaneous Inference
– Our main motivation to package up Alan’s FORTRAN code was to get easy access to the multivariate normal and t distributions for the implementation of multiple tests and multiple comparison procedures.
– Basic idea: Extract parameter estimates (coef(), mostly) and their covariance matrix (vcov(), mostly) from fitted (semi)parametric models. For models where either the exact t distribution of the estimated coefficients is known or a central limit theorem ensures the convergence to a multivariate normal distribution, the multcomp add-on package implements simultaneous inference procedures for linear functions of the parameters.
– Very much inspired by the book Multiple Comparisons and Multiple Tests using the SAS® System (Westfall et al., 1999).
– We wanted to be nice and make this as painless as possible. Did we succeed?
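The basic idea can be traced by hand for a small linear model. The following is a sketch under stated assumptions, not the multcomp source: it uses the warpbreaks data shipped with R (not the data from the talk), a hand-written contrast matrix K, and the mvtnorm functions pmvt() and qmvt() to obtain single-step adjusted p-values and a simultaneous critical value.

```r
library("mvtnorm")

# Linear model on a data set that ships with R (illustrative choice)
fit <- lm(breaks ~ tension, data = warpbreaks)

beta  <- coef(fit)         # (Intercept), tensionM, tensionH
Sigma <- vcov(fit)         # covariance matrix of the estimates
df    <- df.residual(fit)  # degrees of freedom of the t distribution

# Linear functions K %*% beta: all pairwise differences of tension levels
K <- rbind("M - L" = c(0,  1, 0),
           "H - L" = c(0,  0, 1),
           "H - M" = c(0, -1, 1))

est   <- drop(K %*% beta)
covK  <- K %*% Sigma %*% t(K)
se    <- sqrt(diag(covK))
tstat <- est / se
corrK <- cov2cor(covK)     # correlation of the three test statistics

# Single-step adjusted p-value: P(max_j |T_j| >= |t_obs|) under the
# multivariate t distribution with correlation corrK
padj <- sapply(abs(tstat), function(t)
  1 - as.numeric(pmvt(lower = rep(-t, 3), upper = rep(t, 3),
                      df = df, corr = corrK)))

# Two-sided critical value for simultaneous 95% confidence intervals
crit <- qmvt(0.95, tail = "both.tails", df = df, corr = corrK)$quantile
ci <- cbind(estimate = est, lwr = est - crit * se, upr = est + crit * se)
```

Because the critical value accounts for the correlation between the three comparisons, it is larger than the univariate quantile qt(0.975, df) but smaller than a Bonferroni-type bound would be.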
Application: Parametric Simultaneous Inference
> library("multcomp")
> data("alpha", package = "coin")
> amod <- aov(elevel ~ alength, data = alpha)
> amod_glht <- glht(amod, linfct = mcp(alength = "Tukey"))
– The main function is glht() (general linear hypotheses), taking any (OK, OK, maybe not all of them) parametric model as input along with a matrix defining the linear function of interest.
– summary() and confint() methods allow global and simultaneous inference on these linear functions.
– mcp() sets up linear functions corresponding to many multiple comparison procedures, taking the model contrasts into account.
– The real work, of course, is done by mvtnorm.
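The two ways of specifying linear functions, via mcp() or via an explicit matrix, can be compared side by side. A minimal sketch, assuming multcomp is installed; it uses the warpbreaks data shipped with R instead of the alpha data so that no further package is needed, and the matrix K is written by hand for the default treatment contrasts:

```r
library("multcomp")

amod <- aov(breaks ~ tension, data = warpbreaks)

# Tukey all-pairwise comparisons set up by mcp()
g_mcp <- glht(amod, linfct = mcp(tension = "Tukey"))

# The same comparisons as an explicit matrix acting on the coefficients
# ((Intercept), tensionM, tensionH) under treatment contrasts
K <- rbind("M - L" = c(0,  1, 0),
           "H - L" = c(0,  0, 1),
           "H - M" = c(0, -1, 1))
g_mat <- glht(amod, linfct = K)

coef(g_mcp)     # estimated differences from the mcp() specification
coef(g_mat)     # the same estimates from the explicit matrix
summary(g_mat)  # adjusted p-values (single-step method by default)
confint(g_mat)  # simultaneous confidence intervals
```

Writing K explicitly requires knowing how the factor is coded in the model; mcp() handles that bookkeeping.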
Application: Parametric Simultaneous Inference
> summary(amod_glht)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = elevel ~ alength, data = alpha)
Linear Hypotheses:
                          Estimate Std. Error t value Pr(>|t|)
intermediate - short == 0   0.4342     0.3836   1.132   0.4924
long - short == 0           1.1888     0.5203   2.285   0.0614 .
long - intermediate == 0    0.7546     0.4579   1.648   0.2270
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
Application: Parametric Simultaneous Inference
> confint(amod_glht)
Simultaneous Confidence Intervals
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = elevel ~ alength, data = alpha)
Quantile = 2.3719
95% family-wise confidence level
Linear Hypotheses:
                          Estimate lwr      upr
intermediate - short == 0  0.43415 -0.47580  1.34410
long - short == 0          1.18875 -0.04524  2.42274
long - intermediate == 0   0.75460 -0.33141  1.84061
multcomp History
2002 multcomp version 0.2-6 published on CRAN 2002-06-20. One-way ANOVA only.
2006 multcomp version 0.991-1 published on CRAN. Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models.
2008 Hothorn, Bretz, Westfall: Simultaneous inference in general parametric models (BiomJ, 2008): 380 (1 / 373, ISI WoS)
2010 Bretz, Hothorn, Westfall: Multiple Comparisons Using R (CRC Press): 136 (Google Scholar)
multcomp R Add-on Package
Package: multcomp
Title: Simultaneous Inference in General Parametric Models
Version: 1.2-20
Date: 2013-08-29
Authors@R: c(person("Torsten", "Hothorn", role = c("aut", "cre"),
email = "[email protected]"),
person("Frank", "Bretz", role = "aut"),
person("Peter", "Westfall", role = "aut"),
person("Richard M.", "Heiberger", role = "ctb"),
person("Andre", "Schuetzenmeister", role = "ctb"))
Description: Simultaneous tests and confidence intervals
for general linear hypotheses in parametric models, including
linear, generalized linear, linear mixed effects, and survival
models. The package includes demos reproducing analyzes presented
in the book "Multiple Comparisons Using R" (Bretz, Hothorn,
Westfall, 2010, CRC Press).
Depends: stats, graphics, mvtnorm (>= 0.8-0), survival (>= 2.35-7)
Suggests: lme4 (>= 0.999375-16), nlme, robustbase, mboost, coin,
MASS, car, foreign, xtable, sandwich, lmtest, coxme (>= 2.2-1)
License: GPL-2
The multcomp Environment
[Figure: network graph of the CRAN packages surrounding multcomp. A few hundred package names appear as node labels, among them HH, lsmeans, multcompView, and Rcmdr.]
The multcomp Environment
Dependency Ranking of 4828 CRAN Packages
[Figure: bar chart; x-axis: number of reverse dependencies, 0 to 800.]
 1. methods
 2. MASS
 3. stats
 4. lattice
 5. graphics
 6. utils
 7. survival
 8. mvtnorm
 9. Matrix
10. ggplot2
...
72. multcomp
multcomp Applications
[Figure: bar chart of the number of citations of Hothorn et al. (2008, BiomJ) per year, 2008 to 2013; y-axis from 0 to 120.]
multcomp Applications
[Figure: bar chart of the number of citations of Hothorn et al. (2008, BiomJ) by subject area; x-axis from 0 to 80. Subject areas shown: Ecology, Plant Sciences, Environmental Sciences, Biodiversity Conservation, Marine-Freshwater Biology, Zoology, Statistics Probability, Entomology, Forestry, Toxicology.]
multcomp As A Personal Research Tool
2010 Herberich, Sikorski & Hothorn: A robust procedure for comparing multiple means under heteroscedasticity in unbalanced designs (PLoS ONE)
2012 Herberich & Hothorn: Dunnett-type inference in the frailty Cox model with covariates (SiM)
2013 Herberich & Hothorn: Multiple curve comparisons (hopefully)
Computational Frameworks
– multcomp is an example of a fairly broad yet dense implementation of a rich theory.
– Extensible: Plug in your own (semi)parametric models, linear functions, or covariance matrices.
– Makes QA easy, especially when there are many published examples available (Multiple comparisons SAS book).
– With good documentation being available (paper!), people pick up the package (and cite the paper) fast.
– One can also fill the (theoretical) gaps and publish in Stats journals.
– This has worked not only for multcomp, but also for ...
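The extensibility point about covariance matrices can be illustrated by swapping the model-based covariance for a heteroscedasticity-consistent one. A minimal sketch, assuming the multcomp and sandwich packages are installed (sandwich is listed under Suggests in the multcomp DESCRIPTION); the warpbreaks data and the choice of vcovHC() with its defaults are illustrative, and the covariance function is passed through the vcov. argument that glht() forwards to modelparm():

```r
library("multcomp")
library("sandwich")

fit <- lm(breaks ~ tension, data = warpbreaks)

# Default: covariance matrix taken from vcov(fit)
g_default <- glht(fit, linfct = mcp(tension = "Tukey"))

# Plug-in: heteroscedasticity-consistent covariance from sandwich::vcovHC
g_robust <- glht(fit, linfct = mcp(tension = "Tukey"), vcov. = vcovHC)

sqrt(diag(vcov(g_default)))  # model-based standard errors
sqrt(diag(vcov(g_robust)))   # robust standard errors
summary(g_robust)            # same estimates, different adjusted p-values
```

The estimates of the linear functions are unchanged; only their covariance, and hence the test statistics and the joint reference distribution, differ.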
Computational Frameworks: party
– Implements conditional inference trees for unbiased recursive partitioning; applicable to arbitrary responses and also includes an unbiased random forest variant.
– Unbiased recursive partitioning: A conditional inference framework (JCGS, 2006): 235 (1 / 403)
– Bias in random forest variable importance measures: Illustrations, sources and a solution (BMC Bioinf, 2007): 177 (11 / 4589)
– Many follow-up papers, mostly by other groups.
Computational Frameworks: coin
– Implements linear permutation tests (including ‘exact’ variants of many well-known tests) and also permutation-based simultaneous inference (more to come soon).
– A Lego system for conditional inference (AmStat, 2006): 79 (2 / 452)
– Implementing a Class of Permutation Tests: The coin Package (JSS, 2008): 76 (7 / 343)
– No follow-up paper so far (because of extreme laziness).
Computational Frameworks: mboost
– Implements generic functional gradient descent (boosting) for a large class of (classical and novel) regression models.
– Boosting algorithms: Regularization, prediction and model fitting (StatSci, 2007): 105 (1 / 256)
– Kept 4 + 1 PhD students busy so far, couple of papers...
Summary
– Implementing your methods by writing software (in a way others can actually use it) is rewarding:
1. You really understand what’s going on and where the pitfalls are.
2. It stimulates your own research.
3. It makes your own research reproducible.
4. It spreads the word about your ideas very fast.
5. It makes your contributions visible.
6. Your students have something to chew on.
– On the downside:
1. Users will complain (lack of functionality, documentation, high-level interfaces...).
2. Users will find errors (well, that’s actually a good thing...).
3. Maintenance takes time (more time than writing the software in the first place).
4. Making sure your package is in sync with all its reverse dependencies can be very challenging these days.
5. You write the code but your students write the paper (I’m not sure about this one...).
Thank you...
...for your attention!