  • Visualizing multivariate linear models in R

    Michael Friendly¹, Matthew Sigal¹
    ¹York University, Toronto

    Cinquièmes Rencontres R, Toulouse, June 22–24, 2016

    [Figures: HE plot of gloss vs. tear for the plastic film data (Error, rate, additive, rate:additive, Main effects, Group); HE plot of PPVT vs. SAT (Error; predictors n, s, ns, na, ss; groups Hi and Lo)]

    Outline

    1 Background
        Overview
        Visual overview
        Data ellipses
        The Multivariate Linear Model

    2 Hypothesis Error (HE) plots
        Motivating example
        Visualizing H and E variation
        MANOVA designs
        MREG designs

    3 Reduced-rank displays
        Low-D displays of high-D data
        Canonical discriminant HE plots

    4 Recent extensions
        Robust MLMs
        Influence diagnostics for MLMs

    5 Conclusions

    [Figure: HE plot matrix for the Pottery data, showing all pairs of Al (10.1 to 20.8), Fe (0.92 to 7.09), Mg (0.53 to 7.23), Ca (0.01 to 0.31) and Na (0.03 to 0.54), with Site and Error ellipses and group means for AshleyRails, Caldicot, IsleThorns and Llanedyrn]

    Slides: http://datavis.ca/papers/ToulouseR-2x2.pdf

    Overview: Research topics

    This talk outlines research on graphical methods for multivariate linear models (MLMs), extending visualization for multiple regression, ANOVA, and ANCOVA designs to those with several response variables. The topics addressed include:

      • Visualizing multivariate tests with Hypothesis–Error (HE) plots in 2D and 3D
      • Low-D views: generalized canonical discriminant analysis → canonical HE plots
      • Visualization methods for tests of equality of covariance matrices in MANOVA designs
      • Extending these methods to robust MLMs
      • Developing multivariate analogs of influence measures and diagnostic plots for MLMs

    Overview: R packages

    The following R packages implement these methods:

      • car package: provides the infrastructure for hypothesis tests (Anova()) and tests of linear hypotheses (linearHypothesis()) in MLMs, including repeated measures designs.
      • heplots package: implements the HE plot framework in 2D (heplot()), 3D (heplot3d()), and scatterplot matrix form (pairs.mlm()). Also provides:
          • covEllipses() for covariance ellipses, with optional robust estimation
          • boxM() and related methods for testing / visualizing equality of covariance matrices in MANOVA
          • Tutorial vignettes and many data set examples of use
      • candisc package: generalized canonical discriminant analysis for an MLM, and associated plot methods.
      • mvinfluence package: multivariate extensions of leverage and influence (Cook's D) and influencePlot.mlm() in various forms.

  • Context: The LM family and friends

    Models, graphical methods and opportunities

    [Figure: fourfold display of Admit (Admitted, Rejected) by Gender (Male, Female), with cell counts 1198, 1493, 557 and 1278]

    Visual overview: Multivariate data, $Y_{n \times p}$

    What we know how to do well (almost):

      • 2 vars: scatterplot + annotations (data ellipses)
      • p vars: scatterplot matrix (all pairs)
      • p vars: reduced-rank display, showing max. total variation ↦ biplot

    [Figures, iris data: scatterplots of sepal length (mm) vs. petal length (mm) with data ellipses for Setosa, Versicolor and Virginica; scatterplot matrix of SepalLen, SepalWid, PetalLen and PetalWid; biplot of Dimension 1 (73.0%) vs. Dimension 2 (23.0%) with variable vectors]

    Visual overview: Multivariate linear model, $Y = X B + U$

    What is new here?

      • 2 vars: HE plot, i.e., data ellipses of the H (fitted) and E (residual) SSP matrices
      • p vars: HE plot matrix (all pairs)
      • p vars: reduced-rank display, showing max. H wrt. E ↦ canonical HE plot

    [Figures, iris data: HE plot of Sepal.Length vs. Petal.Length (Species and Error ellipses); HE plot matrix of Petal.Length, Sepal.Length, Petal.Width and Sepal.Width; canonical HE plot of Canonical dim. 1 (99.1%) vs. Canonical dim. 2 (0.9%) with variable vectors]

    Visual overview: Recent extensions

    Extending univariate methods to MLMs:

      • Robust estimation for MLMs
      • Influence measures and diagnostic plots for MLMs
      • Visualizing canonical correlation analysis

    [Figures: HE plot of Fe vs. Al for the Pottery data, classical vs. robust; leverage-residual plot (log Residual vs. log Leverage) with cases 5, 10, 14, 25, 27 and 29 labeled; canonical correlation plot of Ability dimension 1 vs. PA dimension 1 with cases 3, 11 and 17 labeled, 1st canonical dimension: CanR = 0.67 (77.3%)]

  • Data ellipsoids: Visually sufficient summaries

      • For any p-variable, multivariate normal $y \sim N_p(\mu, \Sigma)$, the mean vector $\bar{y}$ and sample covariance $S$ are sufficient statistics
      • Geometrically, contours of constant density are ellipsoids centered at $\mu$ with size and shape determined by $\Sigma$
      • ↦ the data (concentration) ellipsoid, $\mathcal{E}(\bar{y}, S)$, is a sufficient visual summary
      • Easily robustified by using robust estimators of location and scatter

    Data Ellipses: Galton's data

    [Figure: mid-parent height vs. child height (both 61 to 75), with regression lines and annotated ellipses: (0.40) Univariate: mean ± 1s; (0.68) Bivariate: mean ± 1s]

    Galton's data on Parent & Child height: 40%, 68% and 95% data ellipses

    The Data Ellipse: Details

      • Visual summary for bivariate relations
      • Shows: means, standard deviations, correlation, regression line(s)
      • Defined: set of points whose squared Mahalanobis distance is $\le c^2$,

        $$D^2(y) \equiv (y - \bar{y})^T S^{-1} (y - \bar{y}) \le c^2$$

        where $S$ = sample covariance matrix
      • Radius: when $y$ is ≈ bivariate normal, $D^2(y)$ has a large-sample $\chi^2_2$ distribution with 2 degrees of freedom.
          • $c^2 = \chi^2_2(0.40) \approx 1$: 1 std. dev univariate ellipse, 1D shadows: $\bar{y} \pm 1s$
          • $c^2 = \chi^2_2(0.68) = 2.28$: 1 std. dev bivariate ellipse
          • $c^2 = \chi^2_2(0.95) \approx 6$: 95% data ellipse, 1D shadows: Scheffé intervals
      • Construction: transform the unit circle, $U = (\sin\theta, \cos\theta)$,

        $$\mathcal{E}_c = \bar{y} + c\, S^{1/2}\, U$$

        where $S^{1/2}$ = any "square root" of $S$ (e.g., Cholesky)
      • p variables: extends naturally to p-dimensional ellipsoids
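
The construction above can be sketched in base R. This is a minimal illustration, not code from the talk: the function name `data_ellipse` and its arguments are hypothetical, and the built-in `iris` data stand in for any bivariate sample. It maps points on the unit circle through a Cholesky factor of S and shifts them to the mean.

```r
# Data ellipse by transforming the unit circle (base R only).
# `data_ellipse` is a hypothetical helper, not a function from heplots.
data_ellipse <- function(Y, level = 0.68, n_points = 100) {
  Y <- as.matrix(Y)
  stopifnot(ncol(Y) == 2)
  ybar <- colMeans(Y)                      # center of the ellipse
  S <- cov(Y)                              # sample covariance matrix
  c2 <- qchisq(level, df = 2)              # squared radius c^2
  theta <- seq(0, 2 * pi, length.out = n_points)
  U <- cbind(sin(theta), cos(theta))       # unit circle, one point per row
  A <- chol(S)                             # upper triangular, t(A) %*% A == S
  pts <- sweep(sqrt(c2) * U %*% A, 2, ybar, "+")
  colnames(pts) <- colnames(Y)
  list(points = pts, center = ybar, cov = S, c2 = c2)
}

ell <- data_ellipse(iris[, c("Petal.Length", "Sepal.Length")], level = 0.68)

# Every point on the boundary has squared Mahalanobis distance c^2.
d2 <- mahalanobis(ell$points, ell$center, ell$cov)
range(d2 - ell$c2)
```

Plotting `ell$points` with `lines()` over a scatterplot of the two variables would draw the 68% ellipse; any other square root of S could replace the Cholesky factor and trace the same curve.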

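    The univariate linear model

      • Model: $y_{n \times 1} = X_{n \times q}\, \beta_{q \times 1} + \epsilon_{n \times 1}$, with $\epsilon \sim N(0, \sigma^2 I_n)$
      • LS estimates: $\hat{\beta} = (X^T X)^{-1} X^T y$
      • General Linear Test: $H_0 : C_{h \times q}\, \beta_{q \times 1} = 0$, where $C$ = matrix of constants; rows specify $h$ linear combinations or contrasts of parameters.
      • e.g., test of $H_0 : \beta_1 = \beta_2 = 0$ in the model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$

        $$C \beta = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

      • All → F-test: how big is $SS_H$ relative to $SS_E$?

        $$F = \frac{SS_H / df_h}{SS_E / df_e} = \frac{MS_H}{MS_E} \longrightarrow (MS_H - F\, MS_E) = 0$$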
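
As a worked instance of the general linear test, the sketch below computes SSH, SSE and F by hand in base R. The data set (`mtcars`) and the model `mpg ~ wt + hp` are assumptions chosen only for illustration; they do not appear in the talk. Because C selects both slopes, the result should agree with the overall F statistic reported by `summary()`.

```r
# General linear test H0: C beta = 0, computed from first principles.
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model, not from the slides
X <- model.matrix(fit)
b <- coef(fit)

# Rows of C pick out beta_1 and beta_2, as in the example above.
C <- rbind(c(0, 1, 0),
           c(0, 0, 1))
Cb <- C %*% b

# SSH = (C b)' [C (X'X)^{-1} C']^{-1} (C b);  SSE = residual sum of squares.
SSH <- drop(t(Cb) %*% solve(C %*% solve(crossprod(X)) %*% t(C)) %*% Cb)
SSE <- sum(residuals(fit)^2)

dfh <- nrow(C)
dfe <- df.residual(fit)
F_stat <- (SSH / dfh) / (SSE / dfe)
p_value <- pf(F_stat, dfh, dfe, lower.tail = FALSE)

c(F = F_stat, p = p_value)
```

Changing the rows of C tests other linear combinations or contrasts with the same three lines of algebra.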

  • The multivariate linear model

      • Model: $Y_{n \times p} = X_{n \times q}\, B_{q \times p} + U$, for p responses, $Y = (y_1, y_2, \dots, y_p)$
      • General Linear Test: $H_0 : C_{h \times q}\, B_{q \times p} = 0_{h \times p}$
      • Analogs of sums of squares, $SS_H$ and $SS_E$, are (p × p) matrices, H and E,

        $$H = (C \hat{B})^T \, [C (X^T X)^{-} C^T]^{-1} \, (C \hat{B}) ,$$

        $$E = U^T U = Y^T [I - H] Y .$$

        (Inside the brackets of the second equation, $H$ denotes the hat matrix $X (X^T X)^{-} X^T$, not the hypothesis matrix.)
      • Analog of univariate F is

        $$\det (H - \lambda E) = 0 ,$$

      • How big is H relative to E?
          • Latent roots $\lambda_1, \lambda_2, \dots, \lambda_s$ measure the "size" of H relative to E in $s = \min(p, df_h)$ orthogonal directions.
          • Test statistics (Wilks' Λ, Pillai trace criterion, Hotelling-Lawley trace criterion, Roy's maximum root) all combine info across these dimensions
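
To make the latent roots concrete, here is a small base-R sketch. It is an illustration under assumptions (the built-in `iris` data and a one-way design), not the authors' code. It extracts H and E from `summary.manova()`, solves det(H − λE) = 0 through the eigenvalues of E⁻¹H, and combines the roots into the four test statistics using their standard definitions.

```r
# H and E matrices and latent roots for a one-way MANOVA on iris.
fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
              data = iris)
ssp <- summary(fit)$SS
H <- ssp$Species      # hypothesis SSP matrix (p x p)
E <- ssp$Residuals    # error SSP matrix (p x p)

# Roots of det(H - lambda E) = 0 are the eigenvalues of E^{-1} H.
# Re() guards against tiny imaginary parts in the near-zero roots.
dfh <- nlevels(iris$Species) - 1
s <- min(ncol(H), dfh)
lambda <- Re(eigen(solve(E) %*% H)$values)[seq_len(s)]

# Standard definitions of the four multivariate test statistics.
stats <- c(
  Wilks           = prod(1 / (1 + lambda)),
  Pillai          = sum(lambda / (1 + lambda)),
  HotellingLawley = sum(lambda),
  Roy             = max(lambda)
)
stats
```

Only s = 2 roots are non-zero here because the Species term has 2 degrees of freedom, however many responses are modeled.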

    Motivating Example: Romano-British Pottery

    Tubb, Parker & Nicholson analyzed the chemical composition of 26 samples of Romano-British pottery found at four kiln sites in Britain.

      • Sites: Ashley Rails, Caldicot, Isle of Thorns, Llanedyrn
      • Variables: aluminum (Al), iron (Fe), magnesium (Mg), calcium (Ca) and sodium (Na)
      • → One-way MANOVA design, 4 groups, 5 responses

    R> library(heplots)
    R> Pottery

              Site   Al   Fe   Mg   Ca   Na
    1    Llanedyrn 14.4 7.00 4.30 0.15 0.51
    2    Llanedyrn 13.8 7.08 3.43 0.12 0.17
    3    Llanedyrn 14.6 7.09 3.88 0.13 0.20
    . . .
    25 AshleyRails 14.8 2.74 0.67 0.03 0.05
    26 AshleyRails 19.1 1.64 0.60 0.10 0.03

    Motivating Example: Romano-British Pottery

    Questions:

      • Can the content of Al, Fe, Mg, Ca and Na differentiate the sites?
      • How to understand the contributions of chemical elements to discrimination?

    Numerical answers:

    R> pottery.mod <- lm(cbind(Al, Fe, Mg, Ca, Na) ~ Site, data = Pottery)
    R> car::Manova(pottery.mod)

    Type II MANOVA Tests: Pillai test statistic
         Df test stat approx F num Df den Df  Pr(>F)
    Site  3      1.55     4.30     15     60 2.4e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    What have we learned?

      • Can: YES! We can discriminate sites.
      • But: how to understand the pattern(s) of group differences: ???

    Motivating Example: Romano-British Pottery

    Univariate plots are limited

      • Do not show the relations of response variables to each other
      • Do not show how variables contribute to multivariate tests

    [Figure: boxplots of Al, Fe, Mg, Ca and Na by site (AshleyRails, Caldicot, IsleThorns, Llanedyrn)]

  • Motivating Example: Romano-British Pottery

    Visual answer: HE plot

      • Shows variation of means (H) relative to residual (E) variation
      • Size and orientation of H wrt E: how much and how variables contribute to discrimination
      • Evidence scaling: H is scaled so that it projects outside E iff null hypothesis is rejected.

    R> heplot3d(pottery.mod)

    HE plots: Visualizing H and E variation

    [Figure, two panels with axes Y1 and Y2 (0 to 40): (a) Individual group scatter: scatter around group means represented by each ellipse (groups 1 to 8); (b) Between and Within Scatter: deviations of group means from grand mean (H matrix) and pooled within-group (E matrix) ellipses. How big is H relative to E?]

    Ideas behind multivariate tests: (a) Data ellipses; (b) H and E matrices

      • H ellipse: data ellipse for fitted values, $\hat{y}_{ij} = \bar{y}_j$.
      • E ellipse: data ellipse of residuals, $y_{ij} - \bar{y}_j$.
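
The two ellipses correspond to two cross-product matrices, which can be checked numerically. The sketch below is a base-R illustration on the built-in `iris` data (an assumption; the figure above uses artificial data): H is the cross-product of fitted values about the grand mean, E is the cross-product of residuals, and together they add up to the total SSP matrix.

```r
# H and E as cross-products of fitted-value deviations and residuals.
Y <- as.matrix(iris[, c("Petal.Length", "Sepal.Length")])
fit <- lm(Y ~ Species, data = iris)       # fitted values are the group means

grand_mean <- colMeans(Y)
fitted_dev <- sweep(fitted(fit), 2, grand_mean, "-")   # ybar_j - grand mean
H <- crossprod(fitted_dev)                # between-group (hypothesis) SSP
E <- crossprod(residuals(fit))            # within-group (error) SSP

# Total SSP about the grand mean decomposes as T = H + E.
T_total <- crossprod(sweep(Y, 2, grand_mean, "-"))
max(abs(H + E - T_total))
```

Dividing H and E by the error degrees of freedom turns them into the covariance-like matrices whose data ellipses are drawn in an HE plot.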

    HE plots: Visualizing multivariate hypothesis tests

    [Figure, two panels with axes First Canonical dimension and Second Canonical dimension (-6 to 6): (c) H matrix standardized by E matrix, giving $HE^{-1}$: the E matrix is orthogonalized by its principal components, and the same transformation is applied to the H matrix (E* = I, H* = $HE^{-1}$); (d) Principal axes of $HE^{-1}$: the size of $HE^{-1}$ is now shown directly by the size of its latent roots]

    Ideas behind multivariate tests: latent roots & vectors of $HE^{-1}$

      • $\lambda_i,\ i = 1, \dots, df_h$ show size(s) of H relative to E.
      • Latent vectors show canonical directions of maximal difference.

    HE plot details: H and E matrices

    Recall the data on 5 chemical elements in samples of Romano-British pottery from 4 kiln sites:

    R> summary(Manova(pottery.mod))

    Sum of squares and products for error:
          Al     Fe     Mg     Ca    Na
    Al 48.29  7.080  0.608  0.106 0.589
    Fe  7.08 10.951  0.527 -0.155 0.067
    Mg  0.61  0.527 15.430  0.435 0.028
    Ca  0.11 -0.155  0.435  0.051 0.010
    Na  0.59  0.067  0.028  0.010 0.199
    -----------------------------------

    Term: Site

    Sum of squares and products for hypothesis:
           Al     Fe     Mg    Ca    Na
    Al  175.6 -149.3 -130.8 -5.89 -5.37
    Fe -149.3  134.2  117.7  4.82  5.33
    Mg -130.8  117.7  103.4  4.21  4.71
    Ca   -5.9    4.8    4.2  0.20  0.15
    Na   -5.4    5.3    4.7  0.15  0.26

      • E matrix: within-group (co)variation of residuals
          • diag: SSE for each variable
          • off-diag: ∼ partial correlations
      • H matrix: between-group (co)variation of means
          • diag: SSH for each variable
          • off-diag: ∼ correlations of means
      • How big is H relative to E?
      • Ellipsoids: dim(H) = rank(H) = $\min(p, df_h)$

  • HE plot details: Scaling H and E

      • The E ellipse is divided by $df_e = n - q$ (the residual degrees of freedom) → data ellipse of residuals
          • Centered at grand means → show factor means in same plot.
      • "Effect size" scaling: $H / df_e$ → data ellipse of fitted values.
      • "Significance" scaling: H ellipse protrudes beyond E ellipse iff $H_0$ can be rejected by Roy maximum root test
          • $H / (\lambda_\alpha\, df_e)$, where $\lambda_\alpha$ is the critical value of Roy's statistic at level α.
          • direction of H wrt E ↦ linear combinations that depart from $H_0$.

    [Figures: two HE plots of Fe vs. Al, titled "Pottery data: Al and Fe", with Site and Error ellipses and group means for AshleyRails, Caldicot, IsleThorns and Llanedyrn. Left: effect scaling, $H / df_e$ and $E / df_e$. Right: effect scaling with the significance-scaled ellipse added, $H / (\lambda_\alpha\, df_e)$]

    R> heplot(pottery.mod, size="effect")
    R> heplot(pottery.mod, size="evidence")

    HE plot details: Contrasts and linear hypotheses

      • An overall effect ↦ an H ellipsoid of $s = \min(p, df_h)$ dimensions
      • Linear hypotheses, of rank h, $H_0 : C_{h \times q}\, B_{q \times p} = 0_{h \times p}$ ↦ sub-ellipsoid of dimension h

        $$C = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

      • 1D tests and contrasts ↦ degenerate 1D ellipses (lines)
      • Beautiful geometry:
          • Sub-hypotheses are tangent to enclosing hypotheses
          • Orthogonal contrasts form conjugate axes

    [Figures: three HE plots of Fe vs. Al, titled "Pottery data: Contrasts", built up in steps: the Site ellipse (3 df H); the added 2 df H contrast ellipse for Caldicot & Isle Thorns; the added 1 df H lines for the contrasts C−A and I−A]

    HE plot matrices: All bivariate views

      • Al stands out: opposite pattern
      • r(Fe, Mg) ≈ 1

    [Figure: HE plot matrix for the Pottery data, showing all pairs of Al (10.1 to 20.8), Fe (0.92 to 7.09), Mg (0.53 to 7.23), Ca (0.01 to 0.31) and Na (0.03 to 0.54), with Site and Error ellipses]

    R> pairs(pottery.mod)

    Two-way MANOVA: Plastic film data

    Data from an experiment to determine the optimal conditions for extruding plastic film.

      • Factors: rate of extrusion (low/high), amount of additive (low/high)
      • Responses: tear resistance, film gloss, opacity
      • → 2 × 2 MANOVA design, 3 responses, n = 5 per cell.

    HE plots show main effects, interactions and linear hypotheses in relation to each other

    R> plastic.mod <- lm(cbind(tear, gloss, opacity) ~ rate * additive, data = Plastic)
    R> Manova(plastic.mod, test.statistic="Roy")

    Type II MANOVA Tests: Roy test statistic
                  Df test stat approx F num Df den Df   Pr(>F)
    rate           1    1.6188   7.5543      3     14 0.003034 **
    additive       1    0.9119   4.2556      3     14 0.024745 *
    rate:additive  1    0.2868   1.3385      3     14 0.301782
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • Two-way MANOVA: Plastic film data

    Visualizing tests of model terms and composite linear hypotheses

      • Main effects & interaction term (1 df each)
      • Balanced design: Hs orthogonal in 3D
      • Add H ellipse for test of main effects, Main = rate + additive (2 df)
      • Add H ellipse for test of group diffs, Group = rate * additive (3 df)

    [Figures: three HE plots of gloss vs. tear with Error, rate, additive and rate:additive terms and Low/High factor means, built up in steps: model terms only; with the "Main effects" ellipse added; with the "Groups" ellipse added]

    Low-D displays of high-D data

      • High-D data often shown in 2D (or 3D) views: orthogonal projections in variable space, i.e., a scatterplot
      • Dimension-reduction techniques: project the data into the subspace that has the largest shadow, e.g., accounts for the largest variance.
      • → low-D approximation to high-D data

    Canonical discriminant HE plots

      • As with the biplot, we can visualize MLM hypothesis variation for all responses by projecting H and E into low-rank space.
      • Canonical projection: $Y_{n \times p} \mapsto Z_{n \times s} = Y E^{-1/2} V$, where $V$ = eigenvectors of $HE^{-1}$.
      • This is the view that maximally discriminates among groups, i.e., max. H wrt E!

    [Figure: canonical discriminant plot of the iris data, Canonical dim. 1 (99.1%) vs. Canonical dim. 2 (0.9%), with group means for setosa, versicolor and virginica and variable vectors for Petal.Length, Sepal.Length, Petal.Width and Sepal.Width]
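
The canonical projection can be sketched in base R as follows. This is an illustration, not the candisc implementation, and it makes two assumptions: the built-in `iris` data are used, and V is taken as the eigenvectors of the symmetric matrix E^{-1/2} H E^{-1/2}, which has the same latent roots as HE⁻¹ and is convenient for computing Z = Y E^{-1/2} V.

```r
# Canonical discriminant scores for a one-way MLM on iris (base R only).
Y <- as.matrix(iris[, 1:4])
fit <- lm(Y ~ Species, data = iris)
Yc <- sweep(Y, 2, colMeans(Y), "-")                       # centered responses
E <- crossprod(residuals(fit))                            # error SSP
H <- crossprod(sweep(fitted(fit), 2, colMeans(Y), "-"))   # hypothesis SSP

# Symmetric inverse square root of E from its eigendecomposition.
eE <- eigen(E, symmetric = TRUE)
E_inv_half <- eE$vectors %*% diag(1 / sqrt(eE$values)) %*% t(eE$vectors)

# Latent roots and vectors in the E-standardized space.
eig <- eigen(E_inv_half %*% H %*% E_inv_half, symmetric = TRUE)
s <- min(ncol(Y), nlevels(iris$Species) - 1)
lambda <- eig$values[seq_len(s)]
V <- eig$vectors[, seq_len(s), drop = FALSE]

Z <- Yc %*% E_inv_half %*% V          # canonical scores, n x s
pct <- 100 * lambda / sum(lambda)     # share of between-group variation

# In canonical space the error SSP matrix is the identity (spherical E).
E_can <- crossprod(residuals(lm(Z ~ Species, data = iris)))
round(pct, 1)
```

The percentages correspond to the axis labels of the canonical plot, and the identity form of `E_can` is why the E ellipse appears as a circle when both axes use the same scale.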

    Canonical discriminant HE plots

      • Canonical HE plot is just the HE plot of canonical scores, $(z_1, z_2)$ in 2D, or $(z_1, z_2, z_3)$ in 3D.
      • As in the biplot, we add vectors to show relations of the $y_i$ response variables to the canonical variates.
      • Variable vectors here are structure coefficients = correlations of variables with canonical scores.

    [Figure: canonical HE plot of the iris data, Canonical dim. 1 (99.1%) vs. Canonical dim. 2 (0.9%), with Species and Error ellipses, group means and variable vectors for Petal.Length, Sepal.Length, Petal.Width and Sepal.Width]

  • Canonical discriminant HE plots: Properties

      • Canonical variates are uncorrelated: E ellipse is spherical
      • ↦ axes must be equated to preserve geometry
      • Variable vectors show how variables discriminate among groups
      • Lengths of variable vectors ∼ contribution to discrimination

    [Figure: canonical HE plot of the iris data, Canonical dim. 1 (99.1%) vs. Canonical dim. 2 (0.9%), with Species and Error ellipses and variable vectors]

    Canonical discriminant HE plots: Pottery data

      • Canonical HE plots provide 2D (3D) visual summary of H vs. E variation
      • Pottery data: p = 5 variables, 4 groups ↦ $df_H = 3$
      • Variable vectors: Fe, Mg and Al contribute to distinguishing (Caldicot, Llanedyrn) from (Isle Thorns, Ashley Rails): 96.4% of mean variation
      • Na and Ca contribute an additional 3.5%. End of story!

    [Figure: canonical HE plot of the Pottery data, Canonical dim1 (96.4%) vs. Canonical dim2 (3.5%), with Site and Error ellipses, group means for AshleyRails, Caldicot, IsleThorns and Llanedyrn, and variable vectors for Al, Fe, Mg, Ca and Na]

    Robust MLMs

      • R has a large collection of packages dealing with robust estimation:
          • robustbase::lmrob(), MASS::rlm(), for univariate LMs
          • robustbase::glmrob() for univariate generalized LMs
          • High breakdown-bound methods for robust PCA and robust covariance estimation
          • However, none of these handle the fully general MLM
      • heplots now provides robmlm() for robust MLMs:
          • Uses a simple M-estimator via iteratively re-weighted LS.
          • Weights: calculated from Mahalanobis squared distances, using a simple robust covariance estimator, MASS::cov.trob(), and a weight function, $\psi(D^2)$.

            $$D^2 = (Y - \hat{Y})^T S_{\mathrm{trob}}^{-1} (Y - \hat{Y}) \sim \chi^2_p \qquad (1)$$

          • This fully extends the "mlm" class
          • Compatible with other mlm extensions: car::Anova() and heplot().
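
The idea of iteratively re-weighted least squares for an MLM can be sketched in base R. This is not the `robmlm()` implementation: the function `robust_mlm_sketch` is hypothetical, it uses a weighted residual covariance from `cov.wt()` in place of `MASS::cov.trob()`, and its Huber-type weight function `min(1, cutoff / D^2)` is an assumption made for illustration.

```r
# IRLS sketch for a multivariate linear model (hypothetical helper).
robust_mlm_sketch <- function(Y, X, level = 0.975, max_iter = 50, tol = 1e-6) {
  Y <- as.matrix(Y)
  X <- as.matrix(X)
  w <- rep(1, nrow(Y))                          # start from ordinary LS
  cutoff <- qchisq(level, df = ncol(Y))         # D^2 is roughly chi-square(p)
  for (iter in seq_len(max_iter)) {
    fit <- lm.wfit(X, Y, w)                     # weighted LS for all responses
    R <- Y - X %*% fit$coefficients             # residual matrix, n x p
    S <- cov.wt(R, wt = w / sum(w), center = FALSE)$cov
    d2 <- mahalanobis(R, center = rep(0, ncol(R)), cov = S)
    w_new <- pmin(1, cutoff / d2)               # assumed Huber-type weights
    converged <- max(abs(w_new - w)) < tol
    w <- w_new
    if (converged) break
  }
  list(coefficients = fit$coefficients, weights = w, iterations = iter)
}

# Usage: iris responses with one artificially contaminated observation.
Y <- as.matrix(iris[, 1:4])
Y[1, ] <- Y[1, ] + 10                           # gross outlier in row 1
X <- model.matrix(~ Species, data = iris)
rob <- robust_mlm_sketch(Y, X)
head(sort(rob$weights), 3)
```

Observations with large residual distances receive small weights, so they contribute little to the fitted coefficients and to the E matrix.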

    Robust MLMs: Example

    For the Pottery data:

    [Figures: index plot of observation weight (0 to 1) by case index, with sites Llanedyrn, Caldicot, IsleThorns and AshleyRails marked; HE plot of Fe vs. Al comparing classical and robust Site and Error ellipses]

      • Some observations are given weights ∼ 0
      • The E ellipse is considerably reduced, enhancing apparent significance

  • Influence diagnostics for MLMs

      • Influence measures & diagnostic plots are well-developed for univariate LMs
          • Influence measures: Cook's D, DFFITS, dfbetas, etc.
          • Diagnostic plots: index plots, car::influencePlot() for LMs
          • However, these have been unavailable for MLMs
      • The mvinfluence package now provides:
          • Calculation for multivariate analogs of univariate influence measures (following Barrett & Ling, 1992), e.g., Hat values & Cook's D:

            $$H_I = X_I (X^T X)^{-1} X_I^T \qquad (2)$$

            $$D_I = [\mathrm{vec}(B - B_{(I)})]^T \, [S^{-1} \otimes (X^T X)] \, [\mathrm{vec}(B - B_{(I)})] \qquad (3)$$

          • Provides deletion diagnostics for subsets (I) of size m ≥ 1.
          • e.g., m = 2 can reveal cases of masking or joint influence.
          • Extension of influencePlot() to the multivariate case.
          • A new plot format: leverage-residual (LR) plots (McCulloch & Meeter, 1983)
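
Equations (2) and (3) can be evaluated by brute-force refitting, as in the base-R sketch below. It is not the mvinfluence implementation: the function `influence_mlm` is hypothetical, S is assumed to be the residual covariance E / df_e, and equation (3) is used exactly as written above, without any further scaling constant.

```r
# Deletion diagnostics for a subset I of cases (hypothetical helper).
influence_mlm <- function(formula, data, I) {
  fit <- lm(formula, data = data)
  fit_del <- lm(formula, data = data[-I, , drop = FALSE])   # refit without I
  X <- model.matrix(fit)
  XtX <- crossprod(X)

  # Equation (2): generalized hat matrix for the subset I.
  X_I <- X[I, , drop = FALSE]
  hat_I <- X_I %*% solve(XtX) %*% t(X_I)

  # Equation (3): Cook's D from the change in the coefficient matrix.
  B <- as.matrix(coef(fit))
  B_del <- as.matrix(coef(fit_del))
  S <- crossprod(as.matrix(residuals(fit))) / df.residual(fit)
  d <- as.vector(B - B_del)                    # vec() stacks the columns of B
  cook_I <- drop(t(d) %*% kronecker(solve(S), XtX) %*% d)

  list(hat = hat_I, cook = cook_I)
}

# Usage on a two-response model for iris: single case, then a pair (m = 2).
f <- cbind(Sepal.Length, Sepal.Width) ~ Species
one <- influence_mlm(f, iris, I = 42)
two <- influence_mlm(f, iris, I = c(42, 16))
c(single = one$cook, pair = two$cook)
```

With a single response and a single deleted case, equation (3) as written reduces to q times the usual Cook's distance, where q is the number of columns of X.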

    Influence diagnostics for MLMs: Example

    For the Rohwer data:

    [Figures: bubble plot of Cook's D vs. generalized Hat value, with cases 5, 10, 14, 25, 27 and 29 labeled; Leverage - Residual (LR) plot of log Residual vs. log Leverage, with the same cases labeled]

    Influence diagnostics for MLMs: LR plots

      • Main idea: Influence ∼ Leverage (L) × Residual (R)
      • ↦ log(Infl) = log(L) + log(R)
      • ↦ contours of constant influence lie on lines with slope = -1.
      • Bubble size ∼ influence (Cook's D)
      • This simplifies interpretation of influence measures

    [Figure: LR plot of log Residual vs. log Leverage, with cases 5, 10, 14, 25, 27 and 29 labeled]

    Conclusions: Graphical methods for MLMs

    Summary & Opportunities

      • Data ellipse: visual summary of bivariate relations
          • Useful for multiple-group, MANOVA data
          • Embed in scatterplot matrix: pairwise, bivariate relations
          • Easily extend to show partial relations, robust estimators, etc.
      • HE plots: visual summary of multivariate tests for MANOVA and MMRA
          • Group means (MANOVA) or 1-df H vectors (MMRA) aid interpretation
          • Embed in HE plot matrix: all pairwise, bivariate relations
          • Extend to show partial relations: HE plot of "adjusted responses"
      • Dimension-reduction techniques: low-rank (2D) visual summaries
          • Biplot: observations, group means, biplot data ellipses, variable vectors
          • Canonical HE plots: similar, but for dimensions of maximal discrimination
      • Beautiful and useful geometries:
          • Ellipses everywhere; eigenvector–ellipse geometries!
          • Visual representation of significance in MLM
          • Opportunities for other extensions

    The End, and thank you
