
IFSPM Institut für Sozial- und Präventivmedizin

The multcomp R Add-on Package

11 Years of Simultaneous Inference for the Masses

Torsten Hothorn, Universität Zürich

2013-09-26


The Power of R

– The R system for statistical computing is nowadays the lingua franca for statistical analyses in many fields.

– With nearly 5000 R add-on packages being available from the Comprehensive R Archive Network, procedures implementing a wide range of statistical and other analysis methods are easily accessible to a large audience.

– Software (aka R add-on packages) determines how data is analysed today in a much stronger way than theoretical papers.

– We will take a closer look at two packages, mvtnorm and multcomp, as computational incarnations of fairly general theories.

University of Zurich, IFSPM 2013-09-26 multcomp Page 2


mvtnorm History

1992 Alan Genz: Numerical computation of multivariate normal probabilities (JCGS)

1993 Alan Genz: Comparison of methods for the computation of multivariate normal probabilities (CSS)

1999 Alan Genz & Frank Bretz: Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts (JSCS)

1992– Alan Genz: MVTDST: a set of FORTRAN subroutines for the numerical computation of multivariate t integrals, with maximum dimension 100. This is an assimilation of the best software in MVTPACK. This software may also be used to compute multivariate normal integrals.

2000 mvtnorm 0.1-8 published on CRAN on 2000-11-14. Basically a value-added interface to MVTDST.


mvtnorm R Add-on Package

Package: mvtnorm
Title: Multivariate Normal and t Distributions
Version: 0.9-9995
Date: 2013-05-29
Author: Alan Genz, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi,
        Friedrich Leisch, Fabian Scheipl, Bjoern Bornkamp, Torsten Hothorn
Maintainer: Torsten Hothorn <[email protected]>
Description: Computes multivariate normal and t probabilities,
        quantiles, random deviates and densities.
Imports: stats
Depends: R(>= 1.9.0)
License: GPL-2
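A minimal sketch of the four things the Description field lists (probabilities, quantiles, random deviates, densities), assuming mvtnorm is installed. The 3-dimensional equicorrelated matrix, the limits of ±2 and the seed are illustrative choices for this sketch and do not come from the talk.

```r
library("mvtnorm")

# Illustrative correlation matrix: three variables, all pairwise
# correlations 0.5 (an assumption for this sketch, not from the talk)
corr <- matrix(0.5, nrow = 3, ncol = 3)
diag(corr) <- 1

# Probability: P(-2 < X_i < 2 for i = 1, 2, 3) under N(0, corr)
p <- pmvnorm(lower = rep(-2, 3), upper = rep(2, 3), corr = corr)

# Quantile: equicoordinate c with P(|X_i| <= c for all i) = 0.95
q <- qmvnorm(0.95, tail = "both.tails", corr = corr)$quantile

# Random deviates and their densities
set.seed(29)
x <- rmvnorm(5, mean = rep(0, 3), sigma = corr)
d <- dmvnorm(x, mean = rep(0, 3), sigma = corr)

print(c(probability = as.numeric(p), quantile = q))
```

The t-distribution counterparts (pmvt(), qmvt(), rmvt(), dmvt()) follow the same pattern with an additional df argument.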


The mvtnorm Environment

[Figure: graph of the CRAN packages in the mvtnorm environment. The node labels are package names, among them coin, copula, flexmix, multcomp, party, and tmvtnorm.]


The mvtnorm Environment

[Figure: bar chart "Dependency Ranking of 4828 CRAN Packages", ranks 1 to 10; x-axis: number of reverse dependencies (0 to 800). Packages shown: methods, MASS, stats, lattice, graphics, utils, survival, mvtnorm, Matrix, ggplot2.]


Application: Parametric Simultaneous Inference

– Our main motivation to package up Alan’s FORTRAN code was to get easy access to the multivariate normal and t distributions for the implementation of multiple tests and multiple comparison procedures.

– Basic idea: Extract parameter estimates (coef(), mostly) and their covariance matrix (vcov(), mostly) from fitted (semi)parametric models. For models where either the exact t distribution of the estimated coefficients is known or a central limit theorem ensures the convergence to a multivariate normal distribution, the multcomp add-on package implements simultaneous inference procedures for linear functions of the parameters.

– Very much inspired by the book Multiple Comparisons and Multiple Tests Using the SAS® System (Westfall et al., 1999).

– We wanted to be nice and make this as painless as possible. Did we succeed?
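The "basic idea" above can be sketched by hand, without multcomp. This is a simplified illustration, not the package's actual code path: it assumes a linear model with exact t distribution, uses the warpbreaks data shipped with R instead of any data from the talk, and the contrast matrix K (all pairwise differences of the three tension levels under treatment contrasts) is written out explicitly.

```r
library("mvtnorm")

# Fitted model: one-way layout with three tension levels L, M, H
mod <- lm(breaks ~ tension, data = warpbreaks)

# Step 1: extract parameter estimates and their covariance matrix
beta <- coef(mod)
V <- vcov(mod)

# Step 2: linear functions K %*% beta; here all pairwise comparisons,
# written for coefficients (Intercept), tensionM, tensionH
K <- rbind("M - L" = c(0,  1, 0),
           "H - L" = c(0,  0, 1),
           "H - M" = c(0, -1, 1))

est   <- drop(K %*% beta)         # estimated differences
covK  <- K %*% V %*% t(K)         # their covariance matrix
tstat <- est / sqrt(diag(covK))   # t statistics
corrK <- cov2cor(covK)            # correlation of the t statistics
df    <- df.residual(mod)

# Step 3: single-step adjusted p-values, P(max |T_j| >= |t_i|),
# from the multivariate t distribution
p_adj <- sapply(abs(tstat), function(tt) {
  1 - as.numeric(pmvt(lower = rep(-tt, 3), upper = rep(tt, 3),
                      df = df, corr = corrK))
})

print(round(cbind(Estimate = est, t = tstat, p_adj = p_adj), 4))
```

Because pmvt() integrates numerically with a randomised algorithm, the adjusted p-values can differ slightly between runs unless a seed is set.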


Application: Parametric Simultaneous Inference

> library("multcomp")

> data("alpha", package = "coin")

> amod <- aov(elevel ~ alength, data = alpha)

> amod_glht <- glht(amod, linfct = mcp(alength = "Tukey"))

– The main function is glht() (general linear hypotheses), taking any (OK, OK, maybe not all of them) parametric model as input along with a matrix defining the linear function of interest.

– summary() and confint() methods allow global and simultaneous inference on these linear functions.

– mcp() sets up linear functions corresponding to many multiple comparison procedures, taking the model contrasts into account.

– The real work, of course, is done by mvtnorm.
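What "taking the model contrasts into account" means can be illustrated with a small sketch. It assumes multcomp is installed and uses the warpbreaks data from base R rather than the alpha data above. The same model is fitted under two contrast codings. mcp() then builds different matrices, but the estimated pairwise differences agree.

```r
library("multcomp")

# Same one-way model, two different codings of the factor 'tension'
mod_trt <- lm(breaks ~ tension, data = warpbreaks)  # treatment contrasts
mod_sum <- lm(breaks ~ tension, data = warpbreaks,
              contrasts = list(tension = "contr.sum"))  # sum-to-zero contrasts

# mcp() translates "Tukey" into a matrix suited to each coding
g_trt <- glht(mod_trt, linfct = mcp(tension = "Tukey"))
g_sum <- glht(mod_sum, linfct = mcp(tension = "Tukey"))

# The matrices of linear functions differ ...
print(g_trt$linfct)
print(g_sum$linfct)

# ... but the estimated differences M - L, H - L, H - M are the same
print(coef(g_trt))
print(coef(g_sum))
```

Passing a hand-written matrix as linfct works as well. In that case its columns have to line up with coef() of the fitted model, so the coding is then the user's responsibility.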


Application: Parametric Simultaneous Inference

> summary(amod_glht)

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = elevel ~ alength, data = alpha)

Linear Hypotheses:
                          Estimate Std. Error t value Pr(>|t|)
intermediate - short == 0   0.4342     0.3836   1.132   0.4924
long - short == 0           1.1888     0.5203   2.285   0.0614 .
long - intermediate == 0    0.7546     0.4579   1.648   0.2270
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)


Application: Parametric Simultaneous Inference

> confint(amod_glht)

	 Simultaneous Confidence Intervals

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = elevel ~ alength, data = alpha)

Quantile = 2.3719
95% family-wise confidence level

Linear Hypotheses:
                          Estimate lwr      upr
intermediate - short == 0  0.43415 -0.47580  1.34410
long - short == 0          1.18875 -0.04524  2.42274
long - intermediate == 0   0.75460 -0.33141  1.84061
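The "Quantile" line is the common critical value by which every standard error is multiplied to obtain the interval half-widths. A hedged sketch of how to recover it from a confint() result and see where it sits between the unadjusted and the Bonferroni critical values. It assumes multcomp is installed and uses the warpbreaks data from base R, so the numbers differ from the alpha example above.

```r
library("multcomp")

mod <- aov(breaks ~ tension, data = warpbreaks)
g <- glht(mod, linfct = mcp(tension = "Tukey"))

set.seed(290875)  # the quantile is computed by a randomised algorithm
ci <- confint(g, level = 0.95)

# Recover the critical value: half-width divided by standard error
se <- sqrt(diag(vcov(g)))
crit <- unname((ci$confint[, "upr"] - ci$confint[, "Estimate"]) / se)

df <- df.residual(mod)
print(c(unadjusted   = qt(0.975, df),                # single comparison
        simultaneous = crit[1],                      # multivariate t
        bonferroni   = qt(1 - 0.05 / (2 * 3), df)))  # three comparisons
```

The simultaneous value exceeds the unadjusted t quantile but stays below Bonferroni, because the multivariate t distribution uses the correlations between the three comparisons.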


multcomp History

2002 multcomp version 0.2-6 published on CRAN 2002-06-20. One-way ANOVA only.

2006 multcomp version 0.991-1 published on CRAN. Simultaneous tests and confidence intervals for general linear hypotheses in parametric models, including linear, generalized linear, linear mixed effects, and survival models.

2008 Hothorn, Bretz, Westfall: Simultaneous inference in general parametric models (BiomJ, 2008): 380 (1 / 373, ISI WoS)

2010 Bretz, Hothorn, Westfall: Multiple Comparisons Using R (CRC Press): 136 (Google Scholar)


multcomp R Add-on Package

Package: multcomp
Title: Simultaneous Inference in General Parametric Models
Version: 1.2-20
Date: 2013-08-29
Authors@R: c(person("Torsten", "Hothorn", role = c("aut", "cre"),
                    email = "[email protected]"),
             person("Frank", "Bretz", role = "aut"),
             person("Peter", "Westfall", role = "aut"),
             person("Richard M.", "Heiberger", role = "ctb"),
             person("Andre", "Schuetzenmeister", role = "ctb"))
Description: Simultaneous tests and confidence intervals
        for general linear hypotheses in parametric models, including
        linear, generalized linear, linear mixed effects, and survival
        models. The package includes demos reproducing analyzes presented
        in the book "Multiple Comparisons Using R" (Bretz, Hothorn,
        Westfall, 2010, CRC Press).
Depends: stats, graphics, mvtnorm (>= 0.8-0), survival (>= 2.35-7)
Suggests: lme4 (>= 0.999375-16), nlme, robustbase, mboost, coin,
        MASS, car, foreign, xtable, sandwich, lmtest, coxme (>= 2.2-1)
License: GPL-2


The multcomp Environment

[Figure: graph of the CRAN packages in the multcomp environment. The node labels are package names, among them coin, HH, lsmeans, party, Rcmdr, and multcomp itself.]


The multcomp Environment

[Figure: bar chart "Dependency Ranking of 4828 CRAN Packages", ranks 1 to 10 plus rank 72; x-axis: number of reverse dependencies (0 to 800). Packages shown: methods, MASS, stats, lattice, graphics, utils, survival, mvtnorm, Matrix, ggplot2, and multcomp.]


multcomp Applications

[Figure: number of citations of Hothorn et al. (2008, BiomJ) per year, 2008 to 2013; y-axis: number of citations (0 to 120).]


multcomp Applications

[Figure: bar chart of the number of citations of Hothorn et al. (2008, BiomJ) by subject category; x-axis: number of citations (0 to 80). Categories shown: Ecology, Plant Sciences, Environmental Sciences, Biodiversity Conservation, Marine-Freshwater Biology, Zoology, Statistics Probability, Entomology, Forestry, Toxicology.]


multcomp As A Personal Research Tool

2010 Herberich, Sikorski & Hothorn: A robust procedure for comparing multiple means under heteroscedasticity in unbalanced designs (PLoS ONE)

2012 Herberich & Hothorn: Dunnett-type inference in the frailty Cox model with covariates (SiM)

2013 Herberich & Hothorn: Multiple curve comparisons (hopefully)


Computational Frameworks

– multcomp is an example of a fairly broad yet dense implementation of a rich theory.

– Extensible: Plug in your own (semi)parametric models, linear functions, or covariance matrices.

– Makes QA easy, especially when there are many published examples available (Multiple comparisons SAS book).

– With good documentation being available (paper!), people pick up the package (and cite the paper) fast.

– One can also fill the (theoretical) gaps and publish in Stats journals.

– This has worked not only for multcomp, but also for ...
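The "plug in your own covariance matrices" point can be sketched as follows, assuming multcomp is installed and using the warpbreaks data from base R. The helper hc0() is a hypothetical name introduced here. It is a hand-written heteroscedasticity-consistent (HC0) sandwich estimator for lm-type fits, included only to keep the example self-contained. In practice one would more likely pass an estimator from the sandwich package, which multcomp lists under Suggests.

```r
library("multcomp")

# Hypothetical helper (not part of multcomp): HC0 sandwich estimator
# (X'X)^{-1} X' diag(e^2) X (X'X)^{-1} for a fitted linear model
hc0 <- function(object) {
  X <- model.matrix(object)
  e <- residuals(object)
  bread <- solve(crossprod(X))
  bread %*% crossprod(X * e) %*% bread
}

mod <- aov(breaks ~ tension, data = warpbreaks)

# Default: covariance matrix taken from vcov(mod)
g_std <- glht(mod, linfct = mcp(tension = "Tukey"))

# Plug-in: same linear functions, user-supplied covariance estimator
g_rob <- glht(mod, linfct = mcp(tension = "Tukey"), vcov = hc0)

# Estimates are identical, standard errors are not
print(cbind(Estimate = coef(g_std),
            SE_default = sqrt(diag(vcov(g_std))),
            SE_hc0 = sqrt(diag(vcov(g_rob)))))
```

summary() and confint() work on g_rob exactly as on g_std. Only the covariance matrix fed into the multivariate distribution has changed.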


Computational Frameworks: party

– Implements conditional inference trees for unbiased recursive partitioning; applicable to arbitrary responses and also includes an unbiased random forest variant.

– Unbiased recursive partitioning: A conditional inference framework (JCGS, 2006): 235 (1 / 403)

– Bias in random forest variable importance measures: Illustrations, sources and a solution (BMC Bioinf, 2007): 177 (11 / 4589)

– Many follow-up papers, mostly by other groups.


Computational Frameworks: coin

– Implements linear permutation tests (including ‘exact’ variants of many well-known tests) and also permutation-based simultaneous inference (more to come soon).

– A Lego system for conditional inference (AmStat, 2006): 79 (2 / 452)

– Implementing a Class of Permutation Tests: The coin Package (JSS, 2008): 76 (7 / 343)

– No follow-up paper so far (because of extreme laziness).


Computational Frameworks: mboost

– Implements generic functional gradient descent (boosting) for a large class of (classical and novel) regression models.

– Boosting algorithms: Regularization, prediction and model fitting (StatSci, 2007): 105 (1 / 256)

– Kept 4 + 1 PhD students busy so far, and led to a couple of papers...


Summary

– Implementing your methods by writing software (in a way others can actually use it) is rewarding:

1. You really understand what’s going on and where the pitfalls are.
2. It stimulates your own research.
3. It makes your own research reproducible.
4. It spreads the word about your ideas very fast.
5. It makes your contributions visible.
6. Your students have something to chew on.

– On the downside:

1. Users will complain (lack of functionality, documentation, high-level interfaces...).
2. Users will find errors (well, that’s actually a good thing...).
3. Maintenance takes time (more time than writing the software in the first place).
4. Making sure your package is in sync with all its reverse dependencies can be very challenging these days.
5. You write the code but your students write the paper (I’m not sure about this one...).


Thank you...

...for your attention!
