Case study - Technical University of Denmark, eNote 3


eNote 3

Case study


Contents

3 Case study

3.1 Introduction

3.2 Initial explorative analysis

3.3 Test of overall effects/model reduction

3.4 Post hoc analysis and summarizing the results

3.4.1 Estimates of the variance parameters

3.4.2 Estimates of the fixed parameters

3.4.3 Comparisons of the fixed parameters

3.5 R-TUTORIAL: Creating report ready tables and figures

3.5.1 Plot devices

3.5.2 Plotting with colours

3.5.3 Report ready tables with xtable

3.6 R-TUTORIAL: Initial explorative analysis

3.7 Test of overall effects/model reduction

3.8 R-TUTORIAL: Post hoc analysis and summarizing the results

3.9 Exercises


3.1 Introduction

This module consists of the first part of a complete analysis of the beech wood data presented as an example in module 2. The aim is to show that the principles for data analysis and result summary for fixed ANOVA and/or regression models also apply to mixed models. And maybe some readers will find it helpful to have some of these principles reviewed.

For completeness we repeat here the description and initial factor structure considerations. To investigate the effect of drying of beech wood on the humidity percentage, the following experiment was conducted. Each of 20 planks was dried for a certain period of time. Then the humidity percentage was measured at 5 depths and 3 widths for each plank:

depth 1: close to the top
depth 3: between 1 and 5
depth 5: in the center
depth 7: between 5 and 9
depth 9: close to the bottom

width 1: close to the side
width 2: between 1 and 3
width 3: in the center

So there are 3 · 5 = 15 measurements for each plank and altogether 300 observations. The data can be found as planks.txt and is reproduced in the following table.


      Width 1                Width 2                Width 3
Depth  1   3   5   7   9     1   3   5   7   9     1   3   5   7   9
Plank
 1    3.4 4.9 5.0 4.9 4.0   4.1 4.7 5.2 4.6 4.3   4.4 4.8 5.0 4.9 4.2
 2    4.3 5.5 6.2 5.4 4.7   3.9 5.6 5.7 5.5 4.9   4.0 4.7 4.5 3.9 4.0
 3    4.2 5.5 5.6 6.3 4.5   5.4 6.2 6.1 6.4 5.2   4.5 4.9 4.9 4.9 4.4
 4    4.4 6.0 7.1 6.9 4.6   4.6 6.1 6.6 6.5 4.7   4.9 5.9 5.8 6.4 4.7
 5    3.9 4.7 5.2 5.0 3.7   4.2 5.2 5.4 4.8 3.9   4.0 4.4 4.4 4.1 3.5
 6    4.6 5.9 6.3 5.8 4.8   5.9 7.3 6.9 6.9 4.4   5.2 5.7 6.6 6.0 4.0
 7    3.9 5.6 6.0 5.3 5.0   4.9 6.9 7.1 6.1 4.5   4.3 5.4 5.9 5.5 4.2
 8    3.9 4.5 5.3 5.6 4.7   3.7 4.9 4.8 4.9 4.3   3.8 4.5 5.4 4.8 4.0
 9    3.6 4.1 4.0 4.4 3.7   3.8 5.1 5.0 4.6 3.3   3.0 3.9 4.7 4.9 3.8
10    6.5 8.7 9.5 7.9 6.6   6.9 8.9 7.4 7.0 6.9   5.8 7.5 7.7 7.3 5.9
11    3.7 5.2 5.5 5.9 4.4   4.7 5.8 5.7 4.9 4.2   3.7 5.0 6.3 5.2 4.3
12    4.3 5.8 6.2 5.2 4.4   4.8 6.7 7.0 6.1 5.2   5.1 5.7 5.9 6.4 5.1
13    6.5 8.8 9.1 8.9 6.0   5.9 7.5 8.4 7.9 5.7   4.0 4.2 4.9 4.6 3.5
14    4.4 6.2 6.7 6.4 4.3   5.7 7.0 7.4 7.3 5.5   4.6 6.2 6.8 5.8 4.9
15    5.5 7.1 7.5 6.9 5.4   6.4 8.4 8.9 8.1 6.1   6.5 8.4 9.1 9.2 7.5
16    5.2 6.0 6.2 6.6 5.3   6.6 7.6 7.8 7.7 5.8   5.9 6.7 6.7 5.0 3.9
17    3.7 4.5 5.0 4.5 3.7   3.7 4.4 4.8 4.4 4.3   3.7 4.5 4.7 5.3 3.9
18    6.0 7.4 7.8 7.5 5.7   6.9 8.6 8.8 7.5 5.4   5.1 6.1 5.2 5.4 4.7
19    3.8 4.6 4.8 4.4 3.8   3.7 4.7 4.7 4.3 3.7   3.3 3.5 3.7 3.4 3.2
20    6.1 7.4 7.7 6.7 4.6   4.7 6.3 7.1 6.5 5.1   4.7 6.0 6.0 6.3 4.2

In this experiment we have 3 factors apart from the trivial factors I and 0. Let us use the factor names plank, width and depth. The factor plank has 20 levels, width has 3 and depth has 5 levels. For the ith measurement of humidity, $\mathrm{plank}_i$ denotes the plank on which this measurement was performed. Correspondingly, $\mathrm{width}_i$ and $\mathrm{depth}_i$ denote the width and depth, respectively, of this ith measurement. It would be natural to include the interaction between width and depth, corresponding to the product factor width × depth. The product factor has in this case 3 · 5 = 15 levels.

A natural model would include plank as a block factor, while depth and width enter together with their interaction. If $Y_i$ denotes the humidity percentage corresponding to the ith measurement, the model with a fixed block effect can be written as:

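$$Y_i = \mu + \alpha(\mathrm{width}_i) + \beta(\mathrm{depth}_i) + \gamma(\mathrm{width}_i, \mathrm{depth}_i) + \delta(\mathrm{plank}_i) + \varepsilon_i, \qquad (3\text{-}1)$$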

where $i = 1, \ldots, 300$ and where the $\varepsilon_i$'s are independent and normally distributed random variables. Or, equivalently:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \varepsilon_{ijk}$$


Figure 3.1: The factor structure diagram

where $Y_{ijk}$ is the kth measurement within the (i, j)th combination of the two factors, $i = 1, \ldots, 3$, $j = 1, \ldots, 5$ and $k = 1, \ldots, 20$. As pointed out in Module 1, the block (plank) effect should be considered a random effect, leading to the mixed model:

$$Y_i = \mu + \alpha(\mathrm{width}_i) + \beta(\mathrm{depth}_i) + \gamma(\mathrm{width}_i, \mathrm{depth}_i) + d(\mathrm{plank}_i) + \varepsilon_i, \qquad (3\text{-}2)$$

where $d(\mathrm{plank}_i) \sim N(0, \sigma^2_{\mathrm{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$, all mutually independent. This model corresponds to the factor structure diagram given in Figure 3.1.

3.2 Initial explorative analysis

Having established the complete structure of the data, it is time for initial plotting/explorative analysis. Throughout this module, figures and results are presented without showing R code or raw R output. This can be seen as a standard for reports in the course! Typically, numerous figures that will not enter a final project report should be studied, since this phase is explorative, and the final figures presenting the key results are chosen after the statistical analysis is completed.

Figure 3.2: Four average humidity profiles (top row: mean humidity against width and against depth, one line per plank; bottom row: mean humidity against width, one line per depth, and against depth, one line per width)

Plotting various average profiles is usually a helpful tool for data with several factors. Four of these are presented in Figure 3.2. In the top left diagram, the width humidity pattern for each plank is depicted by plotting the average humidity (the average of the five depths for each width and plank) against width.

It is immediately clear that there is extensive plank-to-plank variation in the level of humidity. The message about the width effect is less clear. The top right diagram shows the similar plot for the depth effect. Here the message is much clearer: the humidity is high in the center (depth = 5) and low at the top (depth = 1) and at the bottom (depth = 9). As pointed out, this is the effect seen when the three widths are averaged. It could be that the depth effect is different for positions close to the side of the plank (width = 1) than for positions in the center (width = 3). In other words, there could be a depth*width interaction effect, which the top plots would not reveal. Instead, similar plots are given in the bottom diagrams of Figure 3.2 for the widths and depths, averaging over the planks (that is, plotting the 15 average values).

The depth structure already seen is recognized. It is also seen that there is a clear shift in humidity level from width to width and that the depth humidity pattern seems to be roughly the same for the three widths. However, there are some deviations from parallel patterns, and the uncertainty of these deviations is not visible in the plot. A similar increasing-decreasing width pattern, which was not clearly visible in the top diagram, is now seen. This pattern seems to be roughly the same for all depths (with the same reservations as before), and the low humidity levels for the top and bottom depths are clearly seen. Note again that the two bottom plots contain the same information: had there been clearly non-parallel patterns in one figure (an interaction effect), this would also appear in the other figure. The next step is to start the actual statistical analysis of the data.

3.3 Test of overall effects/model reduction

A statistical analysis of this kind is commonly carried out in several steps, starting with the basic model found from the factor structure considerations. This model usually contains every possible effect there may be in the data. However, it is of interest to simplify things into easily interpretable results, if possible! So the idea is to remove non-significant "complex stuff" from the model before summarizing the results.

Carrying out the mixed model analysis corresponding to the model given by (3-2) gives the following ANOVA table of fixed effects:

Source of      Numerator   Denominator   F-          P-
variation      df          df            statistic   value
depth          4           266           78.26       <0.0001
width          2           266           29.65       <0.0001
depth*width    8           266            1.08       0.3745

We see that the depth*width interaction effect is non-significant. Hence, we remove the interaction term and do the analysis based on the model:

$$Y_i = \mu + \alpha(\mathrm{width}_i) + \beta(\mathrm{depth}_i) + d(\mathrm{plank}_i) + \varepsilon_i, \qquad (3\text{-}3)$$

where $d(\mathrm{plank}_i) \sim N(0, \sigma^2_{\mathrm{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$, all mutually independent. This model is illustrated by the factor structure diagram in Figure 3.3.

Note how the 8 degrees of freedom from the interaction effect have now been added to the error degrees of freedom (266 + 8 = 274). The table of fixed effects then becomes:


Figure 3.3: The factor structure diagram

Source of      Numerator   Denominator   F-          P-
variation      df          df            statistic   value
depth          4           274           78.07       <0.0001
width          2           274           29.57       <0.0001
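The degrees of freedom in the two tables can be checked by simple counting. The following is a minimal R sketch, assuming a complete, balanced 20 × 3 × 5 layout as in these data; for such a design, and for these within-plank effects, the residual degrees of freedom coincide with the denominator degrees of freedom reported above, which need not hold for unbalanced data.

```r
# Degrees-of-freedom bookkeeping for the balanced 20 x 3 x 5 design
# (assumption: no missing observations, as in the planks data)
n_plank <- 20; n_width <- 3; n_depth <- 5
n <- n_plank * n_width * n_depth          # 300 observations

df_plank <- n_plank - 1                   # 19 (block effect)
df_width <- n_width - 1                   # 2
df_depth <- n_depth - 1                   # 4
df_inter <- df_width * df_depth           # 8 (depth*width)

# Residual df: total minus the intercept and all effect df
df_res_full    <- n - 1 - df_plank - df_width - df_depth - df_inter  # model (3-2)
df_res_reduced <- n - 1 - df_plank - df_width - df_depth             # model (3-3)

c(full = df_res_full, reduced = df_res_reduced,
  gained = df_res_reduced - df_res_full)
```

Removing the interaction simply moves its 8 degrees of freedom into the error term.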

Note that the removal of the non-significant interaction effect has only minor effects on the conclusions regarding the depth and width effects: they are both extremely significant, confirming what we "explored" above. Since there are no more non-significant fixed effects, the model given by (3-3) is the final model to use for summarizing the results.

3.4 Post hoc analysis and summarizing the results

3.4.1 Estimates of the variance parameters

The final model is given by (3-3), since the main effects of both width and depth are clearly significant. The estimates of the two variance parameters are:

$$\hat\sigma^2_{\mathrm{Plank}} = 0.9797, \qquad \hat\sigma^2 = 0.4047$$

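The uncertainties of these estimates are given by 95% profile-likelihood confidence intervals, which R reports on the standard deviation scale (.sig01 is the plank standard deviation $\sigma_{\mathrm{Plank}}$, .sigma the residual standard deviation $\sigma$):

           2.5 %   97.5 %
.sig01     0.72    1.37
.sigma     0.58    0.69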
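Since the profile intervals are on the standard deviation scale, a small R sketch can translate between the two scales. The numbers are copied from the R output in Section 3.8; squaring the limits is valid because squaring is increasing for non-negative values, so the coverage of the intervals is unchanged.

```r
# Standard deviations as printed by summary(model2)$varcor
sd_plank <- 0.98980
sd_resid <- 0.63619

# Variance estimates are the squared standard deviations
var_est <- c(plank = sd_plank^2, residual = sd_resid^2)
round(var_est, 4)                  # 0.9797 and 0.4047

# Profile-likelihood limits on the SD scale (.sig01 = plank, .sigma = residual)
ci_sd <- rbind(plank = c(0.72, 1.37), residual = c(0.58, 0.69))
colnames(ci_sd) <- c("2.5 %", "97.5 %")

# Squaring is monotone on [0, Inf), so the limits transform directly
ci_var <- ci_sd^2
ci_var
```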

The remaining part of this subsection on post hoc analysis and presentation of results illustrates how the information in factors can be summarized whenever the factor does not interact with any other factor.

3.4.2 Estimates of the fixed parameters

Estimates of the expected values (LSMEANS) for each level of depth, together with their uncertainties (standard errors) and 95% confidence intervals, are:

          Estimate   SE       Lower    Upper
Depth 1   4.7150     0.2361   4.2270   5.2030
Depth 3   5.9050     0.2361   5.4170   6.3930
Depth 5   6.1950     0.2361   5.7070   6.6830
Depth 7   5.8633     0.2361   5.3753   6.3514
Depth 9   4.6533     0.2361   4.1653   5.1414

and correspondingly for each level of width:


          Estimate   SE       Lower    Upper
Width 1   5.5140     0.2303   5.0352   5.9928
Width 2   5.7860     0.2303   5.3072   6.2648
Width 3   5.0990     0.2303   4.6202   5.5778
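The standard errors of the LSMEANS are much larger than those of the differences discussed below, because the plank variance does not average out. Assuming the balanced layout, a depth LSMEAN is an average of 60 observations from 20 planks, so its variance is $\hat\sigma^2_{\mathrm{Plank}}/20 + \hat\sigma^2/60$ (and $\hat\sigma^2_{\mathrm{Plank}}/20 + \hat\sigma^2/100$ for a width LSMEAN). A minimal R check of this reading, using the variance estimates above and the Satterthwaite degrees of freedom (23.27) shown in the lsmeans output of Section 3.8:

```r
# Variance component estimates from model (3-3), Section 3.4.1
var_plank <- 0.9797
var_resid <- 0.4047
n_plank   <- 20

# Depth LSMEAN: 60 observations = 20 planks x 3 widths
se_depth <- sqrt(var_plank / n_plank + var_resid / (n_plank * 3))
# Width LSMEAN: 100 observations = 20 planks x 5 depths
se_width <- sqrt(var_plank / n_plank + var_resid / (n_plank * 5))
c(depth = se_depth, width = se_width)    # close to 0.2361 and 0.2303

# 95% confidence interval for depth 1, using the Satterthwaite df 23.27
est_depth1 <- 4.7150
ci_depth1  <- est_depth1 + c(-1, 1) * qt(0.975, df = 23.27) * se_depth
ci_depth1                                # close to 4.2270 and 5.2030
```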

3.4.3 Comparisons of the fixed parameters

A commonly used post hoc analysis is to compare either specific pairs of depths (resp. widths) or all pairs of levels within each factor. For the former, a standard t-test can be used, e.g.

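$$t = \frac{\hat\beta(1) - \hat\beta(2)}{\mathrm{SE}\big(\hat\beta(1) - \hat\beta(2)\big)}$$

using the error degrees of freedom (274). Or, equivalently, expressed by a 95% confidence interval:

$$\hat\beta(1) - \hat\beta(2) \pm t_{.975,274}\, \mathrm{SE}\big(\hat\beta(1) - \hat\beta(2)\big)$$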

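In this case, the estimates of the fixed effects are raw averages of the data based on the same number of observations for each level (60 = 3 widths × 20 planks per depth). Since every plank contributes equally to each depth average, the plank effects cancel in a difference, and the standard error of the difference between two depth levels is given by

$$\mathrm{SE}\big(\hat\beta(1) - \hat\beta(2)\big) = \sqrt{2}\,\sqrt{\hat\sigma^2/60}$$

This means that two depth levels are claimed significantly different if they differ by more than

$$t_{.975,274}\,\sqrt{2}\,\sqrt{\hat\sigma^2/60}$$

from each other. This is also called the 95% Least Significant Difference (LSD) value.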
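As a worked example of the LSD formula, here is a short R sketch, assuming the residual variance estimate $\hat\sigma^2 = 0.4047$ and the 274 error degrees of freedom of model (3-3). It reproduces the standard error 0.1162 of the depth differences and shows that, for two depths chosen in advance, depths 3 and 5 would be declared different by the unadjusted LSD, whereas depths 3 and 7 would not.

```r
sigma2  <- 0.4047   # residual variance estimate, model (3-3)
n_level <- 60       # observations per depth level (3 widths x 20 planks)
df_res  <- 274      # error degrees of freedom

se_diff <- sqrt(2 * sigma2 / n_level)        # about 0.1162
lsd     <- qt(0.975, df = df_res) * se_diff  # 95% least significant difference
lsd

# Observed differences between depth LSMEANS (Section 3.4.2)
d35 <- 6.1950 - 5.9050    # depth 5 vs depth 3
d37 <- 5.9050 - 5.8633    # depth 3 vs depth 7
c(d35 = abs(d35) > lsd, d37 = abs(d37) > lsd)
```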

It would be tempting to do such tests for all combinations of levels within each factor. This is generally NOT an acceptable approach, since the probability of "significance by chance" becomes too large when many tests are performed simultaneously. This is called the "multiplicity problem". With five depth levels there are 5 × 4/2 = 10 possible depth pairs to compare. Comparing two specific levels (decided on before seeing the data) is not the same as comparing the smallest among five with the largest among five. In a case with no effects, one would always expect the latter two to be more different by chance than the former.

There are numerous solutions for properly handling this problem if all comparisons are indeed made. All of them amount to requiring differences to be larger than required by the usual t-test before they are claimed significant. One general idea, which can be used whenever numerous tests are performed simultaneously, is the Bonferroni correction: if k tests are performed simultaneously, use level α/k in each test rather than α. For instance, if all depth levels are compared (k = 10), standard pairwise t-test output can be used, but employing level 0.5% in each test rather than 5%: that is, only claiming those differences significant for which the usual P-value is less than 0.005. This method is known to be somewhat conservative, meaning that it may be too critical, or in other words: it may miss some actual differences.
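The Bonferroni rule can be applied directly to unadjusted t-test output. A minimal R sketch, assuming the t-ratios of the 10 depth comparisons from the lsmeans output in Section 3.8 and 274 error degrees of freedom; `p.adjust()` multiplies each P-value by the number of tests (capped at 1), which gives the same decisions as comparing the raw P-values with 0.05/10.

```r
# t-ratios of the 10 pairwise depth comparisons (lsmeans output, Section 3.8)
t_ratio <- c("1-3" = -10.245, "1-5" = -12.742, "1-7" = -9.886, "1-9" = 0.531,
             "3-5" = -2.497,  "3-7" = 0.359,   "3-9" = 10.776, "5-7" = 2.855,
             "5-9" = 13.273,  "7-9" = 10.417)
df_res <- 274
k <- length(t_ratio)                              # 10 simultaneous tests

p_raw  <- 2 * pt(-abs(t_ratio), df = df_res)      # unadjusted two-sided P-values
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # min(1, k * p_raw)

signif(cbind(p_raw, p_bonf), 3)
# Same decisions either way: p_raw < 0.05 / k  <=>  p_bonf < 0.05
names(which(p_raw < 0.05 / k))
```

For depths 3 and 5, for example, the unadjusted P-value is below 0.05 but the Bonferroni-adjusted one is not.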

Another solution is to use a distribution other than the t-distribution when comparisons are made. With the so-called Tukey-Kramer method, two depth levels are claimed significantly different if they differ by more than

$$\nu_{.95,J,274}\,\sqrt{\hat\sigma^2/60}$$

from each other, where J is the number of groups to be compared and $\nu_{.95,J,274}$ is the 95%-quantile of the so-called "studentized range" distribution with J groups. This distribution takes into account that the two levels compared in a single test come from J groups altogether. Like the t-distribution, it is tabulated or available in the computer. Note that if J = 2, the studentized range distribution corresponds to the t-distribution:

$$\nu_{.95,2,274} = t_{.975,274}\,\sqrt{2}$$
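The two thresholds can be compared numerically with `qt()` and `qtukey()` from base R. This sketch assumes $\hat\sigma^2 = 0.4047$, 60 observations per depth and 274 error degrees of freedom, and uses the upper 5% point of the studentized range distribution; with these values the Tukey threshold is about 0.319, matching the half-width of the Tukey-adjusted confidence intervals in the table that follows (e.g. -1.1900 ± 0.3190 for depths 1-3).

```r
sigma2  <- 0.4047   # residual variance estimate, model (3-3)
n_level <- 60       # observations per depth level
df_res  <- 274      # error degrees of freedom
J       <- 5        # number of depth levels compared

# Tukey-Kramer threshold: 95% quantile of the studentized range times sqrt(sigma2 / n)
hsd <- qtukey(0.95, nmeans = J, df = df_res) * sqrt(sigma2 / n_level)
# Unadjusted 95% LSD for comparison
lsd <- qt(0.975, df = df_res) * sqrt(2 * sigma2 / n_level)
c(tukey = hsd, lsd = lsd)

# Special case J = 2: studentized range quantile equals sqrt(2) times the t quantile
c(qtukey(0.95, nmeans = 2, df = df_res), sqrt(2) * qt(0.975, df = df_res))
```

The Tukey threshold is the larger of the two, which is the price paid for controlling the error rate over all 10 comparisons.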

The Tukey-adjusted results are:

Depth        Parameter      Estimate   SE       Lower     Upper     P-value
difference
1-3          β(1) − β(2)    -1.1900    0.1162   -1.5090   -0.8710   <0.0001
1-5          β(1) − β(3)    -1.4800    0.1162   -1.7990   -1.1610   <0.0001
1-7          β(1) − β(4)    -1.1483    0.1162   -1.4673   -0.8294   <0.0001
1-9          β(1) − β(5)     0.06167   0.1162   -0.2573    0.3806    0.9841
3-5          β(2) − β(3)    -0.2900    0.1162   -0.6090    0.02896   0.0943
3-7          β(2) − β(4)     0.04167   0.1162   -0.2773    0.3606    0.9964
3-9          β(2) − β(5)     1.2517    0.1162    0.9327    1.5706   <0.0001
5-7          β(3) − β(4)     0.3317    0.1162    0.01271   0.6506    0.0370
5-9          β(3) − β(5)     1.5417    0.1162    1.2227    1.8606   <0.0001
7-9          β(4) − β(5)     1.2100    0.1162    0.8910    1.5290   <0.0001

Note that since the P-values are "corrected", that is, based on the more appropriate studentized range distribution, they can be used directly without any additional Bonferroni correction. Similarly for the width effect:


Width        Parameter      Estimate   SE        Lower     Upper      P-value
difference
1-2          α(1) − α(2)    -0.2720    0.08997   -0.4840   -0.05998   0.0077
1-3          α(1) − α(3)     0.4150    0.08997    0.2030    0.6270   <0.0001
2-3          α(2) − α(3)     0.6870    0.08997    0.4750    0.8990   <0.0001

Frequently, the key information of the two tables for each effect is summarized in a single table in which the LSMEANS are ordered by size:

          Estimate
Depth 9   4.6533   a
Depth 1   4.7150   a
Depth 7   5.8633   b
Depth 3   5.9050   bc
Depth 5   6.1950   c

The letter subscripts express the 5% significance results of the 10 pair-wise comparisons:

• Two depths sharing a subscript are NOT significantly different

• Two depths NOT sharing a subscript are significantly different
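The letter rule can be checked mechanically against the Tukey-adjusted P-values. A small R sketch; the letters and P-values are copied from the tables above, and entries printed as <0.0001 are entered as 0.0001, an approximation that does not affect the 5% decisions. The helper `share_letter()` is a hypothetical function written only for this illustration.

```r
# Compact letters and Tukey-adjusted P-values copied from the tables above
letters_depth <- c("1" = "a", "3" = "bc", "5" = "c", "7" = "b", "9" = "a")
p_tukey <- c("1-3" = 0.0001, "1-5" = 0.0001, "1-7" = 0.0001, "1-9" = 0.9841,
             "3-5" = 0.0943, "3-7" = 0.9964, "3-9" = 0.0001, "5-7" = 0.0370,
             "5-9" = 0.0001, "7-9" = 0.0001)   # "<0.0001" entered as 0.0001

# Hypothetical helper: do two letter codes have at least one letter in common?
share_letter <- function(a, b) {
  length(intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]])) > 0
}

pairs  <- strsplit(names(p_tukey), "-")
shares <- vapply(pairs,
                 function(p) share_letter(letters_depth[[p[1]]], letters_depth[[p[2]]]),
                 logical(1))
significant <- p_tukey < 0.05

# The display is consistent if "no shared letter" coincides with "significant"
all(shares == !significant)
```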

So the pattern already observed in Figure 3.2 can now be statistically confirmed: there is clearly lower humidity close to the top and the bottom (and no difference between top and bottom). There is also some indication that the center position (depth 5) has higher humidity than the in-between positions: the difference to depth 7 is significant, whereas the difference to depth 3 is not, and no difference is seen between depths 3 and 7.

For the width effect, the summary table becomes particularly simple, since all three differences are significant:

          Estimate
Width 3   5.0990   a
Width 1   5.5140   b
Width 2   5.7860   c

For these data, a figure of the raw data, like one of the bottom plots of Figure 3.2, together with a statement of the lack of a significant width*depth interaction and the two summary tables, would probably suffice for most purposes. In later modules we will see how additional plots of the model expectations/details will provide informative figures for interpretation.

Other types of post hoc analysis (than the multiple comparison approach) may be employed, especially when quantitative information about the factor levels is available. In this case we know exactly the positions that correspond to the different widths and depths, and this could be used in the analysis. For instance, it could be investigated whether a quadratic function of depth could be used to describe the humidity pattern. Apart from the nice direct functional interpretation of the dependence of humidity on depth, it could possibly provide more powerful tests for interaction effects. In fact, this would still be a "linear" model and could be handled by lmer. We will return to such analyses in a later module. Non-linear models (using, e.g., exponentials) could also be an option in some cases, but then the model would no longer be a linear model, and additional theory and packages would be needed.
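To illustrate the idea of a quadratic depth profile, one can fit a quadratic curve through the five estimated depth means. This is only a descriptive R sketch on the LSMEANS from Section 3.4.2, not the mixed-model analysis the text refers to; a proper analysis would instead include a numeric depth variable and its square in the lmer model (for example via a hypothetical numeric column `depthnum`). The negative squared term and a maximum close to depth 5 agree with the pattern seen in Figure 3.2.

```r
# Depth positions and estimated depth LSMEANS (Section 3.4.2)
depth  <- c(1, 3, 5, 7, 9)
lsmean <- c(4.7150, 5.9050, 6.1950, 5.8633, 4.6533)

# Descriptive quadratic curve through the five means (ordinary least squares);
# NOT a substitute for a model such as (hypothetical)
#   lmer(humidity ~ depthnum + I(depthnum^2) + width + (1 | plank), data = planks)
fit <- lm(lsmean ~ depth + I(depth^2))
b   <- coef(fit)

# Depth at which the fitted parabola reaches its maximum
vertex <- -b[["depth"]] / (2 * b[["I(depth^2)"]])
b
vertex
```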

The summary approach above was based on the assumption of no interaction between width and depth, that is, the conclusions regarding widths hold for all the depths, and vice versa. Had there been a significant interaction, we would have to present, say, the depth effects for each of the three widths (and/or vice versa), since the significance tells us that these three conclusions will NOT be the same. In practice, we proceed as above, BUT for the combined width*depth factor with 15 levels rather than for each of them separately. We will see examples of this later.

One important step in the analysis is missing: an investigation of the validity of the model assumptions! We return to this issue in Module 6, where we finish the analysis of this data set on the humidity of beech wood planks.

3.5 R-TUTORIAL: Creating report ready tables and figures

Since reports without raw R code or raw R output are requested, in this course as well as generally, it is useful to be able to apply some of the tools in R for creating nice tables (and figures) for LaTeX- and/or Word-based report writing.

3.5.1 Plot devices

First of all, there are different device functions for saving plots in various formats; e.g., to save a plot as a pdf, write:


pdf("myplanksinteractionplot.pdf")

with(planks, interaction.plot(depth,width,humidity,legend=F,col=2:4))

dev.off()

Or as a png (you choose the extension of the output file yourself, but it is clearly highly recommended to choose the "right" extension):

png("myplanksinteractionplot.png")

with(planks, interaction.plot(depth,width,humidity,legend=F,col=2:4))

dev.off()

Similarly, there are bmp and jpeg device functions. Plots can also be exported directly from the Plots window in RStudio.

3.5.2 Plotting with colours

Colours can be specified in several different ways, and various plot functions may have various colour options for colouring different aspects of the plot. The simplest way to specify a colour is with a character string giving the colour name (e.g., "red"). A list of the possible colour names can be obtained with the function colours (or colors); write:

colors (distinct = FALSE)

to see all the possible choices. Have a look at this website to see what all these colours look like, or go to the Quick-R website.

Even more easily, you can use integers as colour codes. By default, R uses a palette of 8 colours:

palette()

[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow"

[8] "gray"

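which can then be referred to by the numbers 1-8. Higher numbers cycle modulo 8, meaning that using 9 would give black again. (The palette shown is the default in R versions before 4.0.0; later versions use a different set of default colours.)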

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

http://www.statmethods.net/advgraphs/parameters.html


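There are a number of pre-defined palette functions that can be used when more (and better) collections of colours are needed, e.g. rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, or hsv for constructing colours directly. For details, write e.g.: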

?heat.colors

which then could be used e.g. as:

par(mfrow=c(2,2))

with(planks, interaction.plot(width,plank,humidity,legend=F,col=heat.colors(20)))

with(planks, interaction.plot(depth,plank,humidity,legend=F,col=terrain.colors(20)))

with(planks, interaction.plot(width,depth,humidity,legend=F,col=topo.colors(5)))

with(planks, interaction.plot(depth,width,humidity,legend=F,col=cm.colors(3)))

par(mfrow=c(1,1))

Or:

# Rainbow colours

# the argument gives the number of colours wanted,

# e.g. rainbow(10) gives 10 different colours, rainbow(5) gives 5 colours

par(mfrow=c(2,2))

with(planks, interaction.plot(width,plank,humidity,legend=F,col=rainbow(20)))

with(planks, interaction.plot(depth,plank,humidity,legend=F,col=rainbow(20)))

with(planks, interaction.plot(width,depth,humidity,legend=F,col=rainbow(5)))

with(planks, interaction.plot(depth,width,humidity,legend=F,col=rainbow(3)))

par(mfrow=c(1,1))

Or:

par(mfrow=c(2,2))

with(planks, interaction.plot(width,plank,humidity,legend=F,col=hsv(1:20/20)))

with(planks, interaction.plot(depth,plank,humidity,legend=F,col=hsv(1:20/20)))

with(planks, interaction.plot(width,depth,humidity,legend=F,col=hsv(1:5/5)))

with(planks, interaction.plot(depth,width,humidity,legend=F,col=hsv(1:3/3)))

par(mfrow=c(1,1))

3.5.3 Report ready tables with xtable

Nice tables can be produced by the xtable function of the xtable-package. An example:


library(xtable)

means=as.matrix(with(planks, tapply(humidity,width,mean)))

xtable(means)

% latex table generated in R 3.2.1 by xtable 1.7-4 package

% Fri Sep 18 13:46:26 2015

\begin{table}[ht]

\centering

\begin{tabular}{rr}

\hline

& x \\

\hline

1 & 5.51 \\

2 & 5.79 \\

3 & 5.10 \\

\hline

\end{tabular}

\end{table}

When this tex code is included in your tex file, it will appear in the report as:

     x
1    5.51
2    5.79
3    5.10

Note how the input to xtable was a matrix here. The function is prepared to recognize a number of different R objects, see e.g.:

methods(xtable)

[1] xtable.anova* xtable.aov*

[3] xtable.aovlist* xtable.coxph*

[5] xtable.data.frame* xtable.glm*

[7] xtable.lm* xtable.matrix*

[9] xtable.prcomp* xtable.summary.aov*

[11] xtable.summary.aovlist* xtable.summary.glm*

[13] xtable.summary.lm* xtable.summary.prcomp*

[15] xtable.table* xtable.ts*


[17] xtable.zoo*

see ’?methods’ for accessing help and source code

For instance, ANOVA tables will be recognized. So a LaTeX user can copy these tex lines into the report .tex document. Alternatively, to integrate the R code into the tex code, use the knitr package to create the pure tex file from a .Rnw file, which is a kind of tex file with all the R code integrated into it, with a lot of flexibility in controlling what will be shown/evaluated in the output. This can be used for raw code/results, tables and figures alike.

A Word user may also use xtable through the html print option:

print(xtable(means), type = "html")





<table border=1>

<tr> <th> </th> <th> x </th> </tr>

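<tr> <td align="right"> 1 </td> <td align="right"> 5.51 </td> </tr>

<tr> <td align="right"> 2 </td> <td align="right"> 5.79 </td> </tr>

<tr> <td align="right"> 3 </td> <td align="right"> 5.10 </td> </tr>

</table>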

And then print the table directly into a file:

print(xtable(means), type = "html", file = "myhtmltable.html")

Open the file in a browser and copy-paste to Word.

3.6 R-TUTORIAL: Initial explorative analysis

The data set planks is imported as described in R Module 1. Assume that the data set iscalled planks in R.
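If the file planks.txt is not at hand, the data can be re-entered from the table in Section 3.1. The following is a minimal sketch, assuming the column names plank, width, depth and humidity used in the code of this tutorial; the layout of planks.txt itself is not reproduced here.

```r
# Humidity table from Section 3.1: one row per plank; columns are
# width 1 (depths 1, 3, 5, 7, 9), then width 2, then width 3
hum <- matrix(c(
  3.4,4.9,5.0,4.9,4.0, 4.1,4.7,5.2,4.6,4.3, 4.4,4.8,5.0,4.9,4.2,
  4.3,5.5,6.2,5.4,4.7, 3.9,5.6,5.7,5.5,4.9, 4.0,4.7,4.5,3.9,4.0,
  4.2,5.5,5.6,6.3,4.5, 5.4,6.2,6.1,6.4,5.2, 4.5,4.9,4.9,4.9,4.4,
  4.4,6.0,7.1,6.9,4.6, 4.6,6.1,6.6,6.5,4.7, 4.9,5.9,5.8,6.4,4.7,
  3.9,4.7,5.2,5.0,3.7, 4.2,5.2,5.4,4.8,3.9, 4.0,4.4,4.4,4.1,3.5,
  4.6,5.9,6.3,5.8,4.8, 5.9,7.3,6.9,6.9,4.4, 5.2,5.7,6.6,6.0,4.0,
  3.9,5.6,6.0,5.3,5.0, 4.9,6.9,7.1,6.1,4.5, 4.3,5.4,5.9,5.5,4.2,
  3.9,4.5,5.3,5.6,4.7, 3.7,4.9,4.8,4.9,4.3, 3.8,4.5,5.4,4.8,4.0,
  3.6,4.1,4.0,4.4,3.7, 3.8,5.1,5.0,4.6,3.3, 3.0,3.9,4.7,4.9,3.8,
  6.5,8.7,9.5,7.9,6.6, 6.9,8.9,7.4,7.0,6.9, 5.8,7.5,7.7,7.3,5.9,
  3.7,5.2,5.5,5.9,4.4, 4.7,5.8,5.7,4.9,4.2, 3.7,5.0,6.3,5.2,4.3,
  4.3,5.8,6.2,5.2,4.4, 4.8,6.7,7.0,6.1,5.2, 5.1,5.7,5.9,6.4,5.1,
  6.5,8.8,9.1,8.9,6.0, 5.9,7.5,8.4,7.9,5.7, 4.0,4.2,4.9,4.6,3.5,
  4.4,6.2,6.7,6.4,4.3, 5.7,7.0,7.4,7.3,5.5, 4.6,6.2,6.8,5.8,4.9,
  5.5,7.1,7.5,6.9,5.4, 6.4,8.4,8.9,8.1,6.1, 6.5,8.4,9.1,9.2,7.5,
  5.2,6.0,6.2,6.6,5.3, 6.6,7.6,7.8,7.7,5.8, 5.9,6.7,6.7,5.0,3.9,
  3.7,4.5,5.0,4.5,3.7, 3.7,4.4,4.8,4.4,4.3, 3.7,4.5,4.7,5.3,3.9,
  6.0,7.4,7.8,7.5,5.7, 6.9,8.6,8.8,7.5,5.4, 5.1,6.1,5.2,5.4,4.7,
  3.8,4.6,4.8,4.4,3.8, 3.7,4.7,4.7,4.3,3.7, 3.3,3.5,3.7,3.4,3.2,
  6.1,7.4,7.7,6.7,4.6, 4.7,6.3,7.1,6.5,5.1, 4.7,6.0,6.0,6.3,4.2
), nrow = 20, byrow = TRUE)

# Long format: as.vector() reads the matrix column by column,
# so plank varies fastest, then depth (within width), then width
planks <- data.frame(
  plank    = factor(rep(1:20, times = 15)),
  depth    = factor(rep(rep(c(1, 3, 5, 7, 9), each = 20), times = 3)),
  width    = factor(rep(1:3, each = 100)),
  humidity = as.vector(hum)
)
str(planks)
```

The raw width and depth means of this data frame reproduce the LSMEANS of Section 3.4.2, which is a useful check of the data entry.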

The plots in Figure 3.2 are produced using the function interaction.plot, which requires three arguments: first the factor that is to be on the x-axis, then the factor that separates the data into distinct graphs, and finally the response variable. An optional parameter legend, which takes either FALSE (F) or TRUE (T), specifies whether or not a legend should be added (relating the graphs to the factor levels):

planks <- read.table("planks.txt", header = TRUE, sep = ",")

par(mfrow=c(2,2))

with(planks, interaction.plot(width,plank,humidity,legend=F,col=2:11))

with(planks, interaction.plot(depth,plank,humidity,legend=F,col=2:11))

with(planks, interaction.plot(width,depth,humidity,legend=F,col=2:11))

with(planks, interaction.plot(depth,width,humidity,legend=F,col=2:11))

par(mfrow=c(1,1))


Notice that the with( ... ) function around the interaction.plot statements results in evaluation of the statements within a frame where the data set planks is available. This approach avoids having to attach data sets.

To obtain all four plots in a two-by-two layout exactly as in Figure 3.2, the statement par(mfrow=c(2,2)) must be issued prior to the with statements. As already mentioned in R Module 1, the function par is used to set a variety of graphical parameters (try typing ?par for details). The parameter mfrow is a vector of length two, where the first component is the number of rows on the graphical device and the second component is the number of columns. To return to the default, use par(mfrow=c(1,1)).

3.7 Test of overall effects/model reduction

In the previous section we did not need to define factors (Module 2) to use interaction.plot, but now we do. Configure the three variables depth, plank and width as factors:

planks$plank <- factor(planks$plank)

planks$depth <- factor(planks$depth)

planks$width <- factor(planks$width)

Analysis of models including random effects can be done using the lmer function in the package lme4. The general model, with a fixed-effects structure consisting of the interaction between two factors and random effects assigned to the plank, is specified as follows:


library(lmerTest)  # attaches lme4; lmer() fits then support Satterthwaite-based tests

model1 <- lmer(humidity ~ depth*width + (1 | plank), data = planks)

Notice that the fixed-effects structure can be specified either as "depth+width+depth:width" or as the shorter "depth*width" used here; they give the same result. The relevant tests of the fixed-effects structure are obtained by applying anova(model1), after making sure that the lmerTest package is loaded (the Satterthwaite-based tests shown below require that the model was fitted with lmerTest loaded):

require(lmerTest)

anova(model1)

Analysis of Variance Table of type III with Satterthwaite

approximation for degrees of freedom

Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)

depth 126.388 31.5969 4 266 78.259 < 2.2e-16 ***

width 23.939 11.9696 2 266 29.646 2.381e-12 ***

depth:width 3.501 0.4377 8 266 1.084 0.3745

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Or using xtable:

xtable(anova(model1))

              Sum Sq   Mean Sq   NumDF   DenDF    F.value   Pr(>F)
depth         126.39   31.60     4.00    266.00   78.26     0.0000
width          23.94   11.97     2.00    266.00   29.65     0.0000
depth:width     3.50    0.44     8.00    266.00    1.08     0.3745

or using Anova from the car package:

require(car)

xtable(Anova(model1, test.statistic = "F", type = 3))

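              F        Df   Df.res   Pr(>F)
(Intercept)   305.22   1    36       0.0000
depth          32.39   4    266      0.0000
width           3.63   2    266      0.0278
depth:width     1.08   8    266      0.3745

The interaction is not significant, and a reduced model can be formulated. Note that the car Type III F-tests for depth and width differ from those obtained with anova above: with R's default treatment contrasts, Type III main-effect tests in a model containing the interaction do not test the usual averaged main effects, so only the interaction test is directly comparable.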

model2 <- lmer(humidity ~ depth + width + (1 | plank), data = planks)

xtable(anova(model2))

        Sum Sq   Mean Sq   NumDF   DenDF    F.value   Pr(>F)
depth   126.39   31.60     4.00    274.00   78.07     0.0000
width    23.94   11.97     2.00    274.00   29.57     0.0000

Both factors are highly significant and no further reduction is possible.
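The equivalence of the two ways of writing the fixed-effects structure, and the number of fixed-effect parameters in each model, can be checked with base R alone. A small sketch, assuming only the factor levels of the design; the grid contains one row per depth × width cell and no response values, which is enough for `terms()` and `model.matrix()`.

```r
# Fixed-effects part only: both formulas expand to the same terms
f_short <- humidity ~ depth * width
f_long  <- humidity ~ depth + width + depth:width
labels_short <- attr(terms(f_short), "term.labels")
labels_long  <- attr(terms(f_long), "term.labels")
labels_short                        # "depth" "width" "depth:width"
identical(labels_short, labels_long)

# One row per depth x width cell is enough to count fixed-effect columns
grid <- expand.grid(depth = factor(c(1, 3, 5, 7, 9)), width = factor(1:3))
ncol(model.matrix(~ depth * width, data = grid))   # 1 + 4 + 2 + 8 = 15 (model1)
ncol(model.matrix(~ depth + width, data = grid))   # 1 + 4 + 2 = 7 (model2)
```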

3.8 R-TUTORIAL: Post hoc analysis and summarizing the results

The so-called likelihood-profile-based confidence intervals for the two variance parameters (on the standard deviation scale) are found as follows:

summary(model2)$varcor

 Groups   Name          Std.Dev.
 plank    (Intercept)   0.98980
 Residual               0.63619

m2prof <- profile(model2,which=1:2)

xtable(confint(m2prof))

           2.5 %   97.5 %
.sig01     0.72    1.37
.sigma     0.58    0.69

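As in R Module 1, we can use lsmeans to compute the estimated mean levels and their differences (the lsmeans package has since been superseded by emmeans, where emmeans(model2, pairwise ~ depth) gives the corresponding output):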


require(lsmeans)

lsmeans::lsmeans(model2, pairwise ~ depth)

$lsmeans

depth lsmean SE df lower.CL upper.CL

1 4.715000 0.2360734 23.27 4.226963 5.203037

3 5.905000 0.2360734 23.27 5.416963 6.393037

5 6.195000 0.2360734 23.27 5.706963 6.683037

7 5.863333 0.2360734 23.27 5.375296 6.351370

9 4.653333 0.2360734 23.27 4.165296 5.141370

Results are averaged over the levels of: width

Confidence level used: 0.95

$contrasts

contrast estimate SE df t.ratio p.value

1 - 3 -1.19000000 0.1161519 274 -10.245 <.0001

1 - 5 -1.48000000 0.1161519 274 -12.742 <.0001

1 - 7 -1.14833333 0.1161519 274 -9.886 <.0001

1 - 9 0.06166667 0.1161519 274 0.531 0.9841

3 - 5 -0.29000000 0.1161519 274 -2.497 0.0943

3 - 7 0.04166667 0.1161519 274 0.359 0.9964

3 - 9 1.25166667 0.1161519 274 10.776 <.0001

5 - 7 0.33166667 0.1161519 274 2.855 0.0370

5 - 9 1.54166667 0.1161519 274 13.273 <.0001

7 - 9 1.21000000 0.1161519 274 10.417 <.0001

Results are averaged over the levels of: width

P value adjustment: tukey method for comparing a family of 5 estimates

lsmeans::lsmeans(model2, pairwise ~ width)

$lsmeans

width lsmean SE df lower.CL upper.CL

1 5.514 0.2302876 21.09 5.035212 5.992788

2 5.786 0.2302876 21.09 5.307212 6.264788

3 5.099 0.2302876 21.09 4.620212 5.577788

Results are averaged over the levels of: depth


Confidence level used: 0.95

$contrasts

contrast estimate SE df t.ratio p.value

1 - 2 -0.272 0.08997091 274 -3.023 0.0077

1 - 3 0.415 0.08997091 274 4.613 <.0001

2 - 3 0.687 0.08997091 274 7.636 <.0001

Results are averaged over the levels of: depth

P value adjustment: tukey method for comparing a family of 3 estimates

or used together with the xtable function:

print(xtable(summary(lsmeans::lsmeans(model2, pairwise ~ depth)$lsmeans)))

   depth   lsmean   SE     df      lower.CL   upper.CL
1  1       4.72     0.24   23.27   4.23       5.20
2  3       5.91     0.24   23.27   5.42       6.39
3  5       6.20     0.24   23.27   5.71       6.68
4  7       5.86     0.24   23.27   5.38       6.35
5  9       4.65     0.24   23.27   4.17       5.14

print(xtable(summary(lsmeans::lsmeans(model2, pairwise ~ width)$lsmeans)))

   width   lsmean   SE     df      lower.CL   upper.CL
1  1       5.51     0.23   21.09   5.04       5.99
2  2       5.79     0.23   21.09   5.31       6.26
3  3       5.10     0.23   21.09   4.62       5.58

The multcomp package also includes the so-called compact letter displays:

require(multcomp)

tuk2 <- glht(model2, linfct = mcp(depth = "Tukey"))

tuk.cld2 <- cld(tuk2)

tuk.cld2

1 3 5 7 9

"a" "bc" "c" "b" "a"


### use sufficiently large upper margin

old.par <- par(mai=c(1,1,1.25,1), no.readonly=TRUE)

plot(tuk.cld2, col=2:6)

(Plot: linear predictor by depth level, annotated with the letters a, bc, c, b, a for depths 1, 3, 5, 7, 9.)

par(old.par)

tuk2 <- glht(model2, linfct = mcp(width = "Tukey"))

tuk.cld2 <- cld(tuk2)

tuk.cld2

1 2 3

"b" "c" "a"

### use sufficiently large upper margin


old.par <- par(mai=c(1,1,1.25,1), no.readonly=TRUE)

plot(tuk.cld2, col=2:6)

(Plot: linear predictor by width level, annotated with the letters b, c, a for widths 1, 2, 3.)

par(old.par)

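The lmerTest package also offers a post hoc analysis of differences of LSMEANS (based on Satterthwaite's degrees-of-freedom method) together with some plotting. Note that the P-values of these pairwise differences are not adjusted for multiplicity (compare, e.g., depths 3 and 5 with the Tukey-adjusted result above). The code below uses the interface of older lmerTest versions; in lmerTest 3.0 and later the structure of the step() result has changed, and LSMEANS and their differences are obtained with ls_means() and difflsmeans().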

summodel2 <- step(model2,reduce.fixed = FALSE, reduce.random = FALSE)

## Tests for random effects

xtable(summodel2$rand.table)


         Chi.sq   Chi.DF   p.value
plank    285.85   1        0.00

## Tests for fixed effects

xtable(summodel2$anova.table)

        Sum Sq   Mean Sq   NumDF   DenDF    F.value   Pr(>F)
depth   126.39   31.60     4       274.00   78.07     0.00
width    23.94   11.97     2       274.00   29.57     0.00

## LSMEANS table

names(summodel2$lsmeans.table)[4]="SE"

names(summodel2$lsmeans.table)[7]="LowCI"

names(summodel2$lsmeans.table)[8]="UppCI"

xtable(summodel2$lsmeans.table )

          depth   width   Estimate   SE     DF      t-value   LowCI   UppCI   p-value
depth 1   1               4.71       0.24   23.30   19.97     4.23    5.20    0.00
depth 3   3               5.91       0.24   23.30   25.01     5.42    6.39    0.00
depth 5   5               6.20       0.24   23.30   26.24     5.71    6.68    0.00
depth 7   7               5.86       0.24   23.30   24.84     5.38    6.35    0.00
depth 9   9               4.65       0.24   23.30   19.71     4.17    5.14    0.00
width 1           1       5.51       0.23   21.10   23.94     5.04    5.99    0.00
width 2           2       5.79       0.23   21.10   25.13     5.31    6.26    0.00
width 3           3       5.10       0.23   21.10   22.14     4.62    5.58    0.00

## DIFF LSMEANS table

xtable(summodel2$diffs.lsmeans.table)

## Plots of all LSMEANS and DIFFLSMEANS:

plot(summodel2)


            Estimate Standard Error     DF t-value Lower CI Upper CI p-value
depth 1 - 3    -1.19           0.12 274.00  -10.25    -1.42    -0.96    0.00
depth 1 - 5    -1.48           0.12 274.00  -12.74    -1.71    -1.25    0.00
depth 1 - 7    -1.15           0.12 274.00   -9.89    -1.38    -0.92    0.00
depth 1 - 9     0.06           0.12 274.00    0.53    -0.17     0.29    0.60
depth 3 - 5    -0.29           0.12 274.00   -2.50    -0.52    -0.06    0.01
depth 3 - 7     0.04           0.12 274.00    0.36    -0.19     0.27    0.72
depth 3 - 9     1.25           0.12 274.00   10.78     1.02     1.48    0.00
depth 5 - 7     0.33           0.12 274.00    2.86     0.10     0.56    0.00
depth 5 - 9     1.54           0.12 274.00   13.27     1.31     1.77    0.00
depth 7 - 9     1.21           0.12 274.00   10.42     0.98     1.44    0.00
width 1 - 2    -0.27           0.09 274.00   -3.02    -0.45    -0.09    0.00
width 1 - 3     0.41           0.09 274.00    4.61     0.24     0.59    0.00
width 2 - 3     0.69           0.09 274.00    7.64     0.51     0.86    0.00
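The p-values in this difference table agree with plain two-sided t-test p-values on 274 DF, so they appear not to be adjusted for multiple comparisons. This explains why depth 3 - 5 is significant here (p = 0.01), while depths 3 and 5 share the letter c in the Tukey-based compact letter display. The base-R sketch below recomputes both versions from the t-values in the table. The Tukey adjustment uses the studentized range distribution (ptukey with q = sqrt(2)|t|); this is only an approximation to what multcomp computes with its own multivariate t/normal machinery, so small numerical differences are to be expected.

```r
# t-values for the depth comparisons, copied from the table above
tval <- c("1-3" = -10.25, "1-5" = -12.74, "1-7" = -9.89, "1-9" = 0.53,
          "3-5" = -2.50,  "3-7" = 0.36,   "3-9" = 10.78, "5-7" = 2.86,
          "5-9" = 13.27,  "7-9" = 10.42)
ddf     <- 274   # DF from the table
n.means <- 5     # number of depth levels

# Unadjusted two-sided p-values
p.unadj <- setNames(2 * pt(abs(tval), df = ddf, lower.tail = FALSE), names(tval))

# Single-step Tukey adjustment via the studentized range distribution
p.tukey <- setNames(ptukey(sqrt(2) * abs(tval), nmeans = n.means, df = ddf,
                           lower.tail = FALSE), names(tval))

round(cbind(p.unadj, p.tukey), 3)
```

Comparison 3 - 5 is significant unadjusted but not after the Tukey adjustment, whereas 5 - 7 stays significant, in line with the letters c and b.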

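[Figure: plot(summodel2), differences of least squares means for humidity. Panels: depth (comparisons 1 - 3, 1 - 5, 1 - 7, 1 - 9, 3 - 5, 3 - 7, 3 - 9, 5 - 7, 5 - 9, 7 - 9) and width (1 - 2, 1 - 3, 2 - 3). x-axis: levels; y-axis: humidity; legend: Significance (NS, p-value < 0.05, p-value < 0.01, p-value < 0.001).]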

Using the generic plotting of LSMEANS and DIFFLSMEANS from the lmerTest package like this currently has the (unfortunate) feature that it ignores any mfrow setting for multiple plots per page that one might have made, and simply lists the plots on a number of pages with one plot per page.

3.9 Exercises

Exercise 1 Colour of spinach

Spinach heated to 90 or 100 degrees Celsius was vacuum packed and stored for 0, 1 or 2 weeks before the packs were opened and chill stored in normal atmosphere for 0, 1 or 2 days. Then the colour was measured on a Hunter Lab instrument. Two of the colour coordinates, a and b (measuring roughly red and yellow colour, respectively), were recorded and are given in the data set below. The variable Batch is a blocking variable referring to two batches of spinach. The data is available in the file ../Data/spinage.txt and is listed below:

Batch temp weeks days     a     b
A       90     0    0 -7.19  8.89
A       90     0    1 -7.17  9.11
A       90     0    2 -7.49  9.69
A       90     1    0 -7.43  9.97
A       90     1    1 -7.07  9.09
A       90     1    2 -7.16  9.19
A       90     2    0 -6.69 10.07
A       90     2    1 -6.80  9.13
A       90     2    2 -6.93  9.58
A      100     0    0 -7.54  9.09
A      100     0    1 -7.19  8.74
A      100     0    2 -7.11  8.63
A      100     1    0 -7.16  8.92
A      100     1    1 -7.23  8.89
A      100     1    2 -7.38  9.36
A      100     2    0 -5.28 10.41
A      100     2    1 -5.71  9.72
A      100     2    2 -7.35 10.10
B       90     0    0 -7.45  9.81
B       90     0    1 -7.53  9.52
B       90     0    2 -7.54  9.89
B       90     1    0 -6.88  9.35
B       90     1    1 -7.16  9.55
B       90     1    2 -6.56  8.91
B       90     2    0 -7.07 10.39
B       90     2    1 -6.13  9.52
B       90     2    2 -6.63  9.43
B      100     0    0 -7.45  9.23
B      100     0    1 -7.75  9.18
B      100     0    2 -7.58  9.32
B      100     1    0 -7.10  8.97
B      100     1    1 -7.06  9.16
B      100     1    2 -6.93  9.08
B      100     2    0 -7.17 10.34
B      100     2    1 -7.30  9.99
B      100     2    2 -6.64  9.31

a) Write down all the factors relevant for the analysis, and their levels and mutual structure. Are they crossed or nested, for example? Make the factor structure diagram.

b) Analyse the effect of the different factors on the two colour measurements and summarize the significant effects (lsmeans etc.).
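As a possible starting point, the sketch below enters the colour data directly in R (instead of reading ../Data/spinage.txt), checks how the treatment factors are crossed and fits one possible first model. The row order mirrors the listing above (days varies fastest, then weeks, temp and Batch). Treating Batch as a fixed block effect and including the full three-way treatment interaction are assumptions made for illustration, not the prescribed solution.

```r
# Design in the order of the listing: days fastest, then weeks, temp, Batch
spinage <- expand.grid(days = factor(0:2), weeks = factor(0:2),
                       temp = factor(c(90, 100)), Batch = factor(c("A", "B")))

# Colour coordinates a and b, copied row by row from the listing
spinage$a <- c(-7.19, -7.17, -7.49, -7.43, -7.07, -7.16, -6.69, -6.80, -6.93,
               -7.54, -7.19, -7.11, -7.16, -7.23, -7.38, -5.28, -5.71, -7.35,
               -7.45, -7.53, -7.54, -6.88, -7.16, -6.56, -7.07, -6.13, -6.63,
               -7.45, -7.75, -7.58, -7.10, -7.06, -6.93, -7.17, -7.30, -6.64)
spinage$b <- c(8.89, 9.11, 9.69, 9.97, 9.09, 9.19, 10.07, 9.13, 9.58,
               9.09, 8.74, 8.63, 8.92, 8.89, 9.36, 10.41, 9.72, 10.10,
               9.81, 9.52, 9.89, 9.35, 9.55, 8.91, 10.39, 9.52, 9.43,
               9.23, 9.18, 9.32, 8.97, 9.16, 9.08, 10.34, 9.99, 9.31)

# Every temp x weeks x days combination appears once per batch: fully crossed
tab <- xtabs(~ temp + weeks + days, data = spinage)
tab

# One possible first model for a: Batch as a fixed block, full treatment factorial
fit.a <- lm(a ~ Batch + temp * weeks * days, data = spinage)
anova(fit.a)
```

With 18 treatment combinations in each of 2 batches, this model leaves (18 - 1)(2 - 1) = 17 residual degrees of freedom; the same code with b as response covers the second colour coordinate.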

Exercise 2 Sensory evaluation of spinach

In the spinach experiment from Exercise 1, sensory evaluations were performed besides the colour measurements. The treatments were the same, so the factors were heating temperature, original storage (weeks), storage after opening (days), and batch.

The products from each treatment combination from each batch were assessed by (some of) 7 assessors, who gave a score (between 0 and 15) for each of 6 different sensory properties (see the list further below).

There was one session for each combination of batch and weeks, and at each session the assessors evaluated the same 6 products (the 6 combinations of days and temperature). Note that not all assessors were present at all sessions.


The results, with one line per evaluation, are given in the order: weeks of storage, days after opening, batch, temperature, session number, assessor number, and the six sensory properties hay flavour 1, hay flavour 2, hay taste, spinach flavour 1, spinach flavour 2, spinach taste.

The data is available in the file ../Data/spinagesens.txt and is partly listed below (a "." marks a missing value):

0 0 A 90 1 1 4.1 3.6 4.6 3.9 9.3 5
0 0 A 90 1 2 . . . . . .
0 0 A 90 1 3 . . . . . .
0 0 A 90 1 4 6 3.7 4.5 5.4 10.8 10.2
0 0 A 90 1 5 8.6 4.1 6.7 3.8 10 7.2
0 0 A 90 1 6 4.3 3.8 5.1 7.1 10.8 9.6
0 0 A 90 1 7 8.9 5.7 7 4.7 8.8 8.3
0 0 A 100 1 1 2.6 .8 6.2 2.7 8.7 6.3
0 0 A 100 1 2 . . . . . .
0 0 A 100 1 3 . . . . . .
0 0 A 100 1 4 6.1 2.5 4.6 6.4 11 11.3
0 0 A 100 1 5 5.9 6.5 5.5 8.7 8.4 7.2
0 0 A 100 1 6 3.8 2.8 3.7 4.9 10.7 8.9
0 0 A 100 1 7 10.4 4.3 7.1 3.3 7 8.6
0 0 B 90 4 1 3.5 4.3 6.7 4.1 9 10.6
0 0 B 90 4 2 . . . . . .
. . . . . . . . . (252 lines in total)
2 2 B 100 6 6 3.6 3.7 3.9 4.4 5.9 7.4
2 2 B 100 6 7 . . . . . .

a) Write down the factors relevant for the analysis, and their levels and mutual structure. [You should include a production factor corresponding to the combinations of temperature, weeks, days, and batch.]

b) Specify which effects you want to include in the model. Pay particular attention to which interactions you want in the model. [Include at least some of the interactions between assessor and treatment factors.] Which effects are random and which are fixed?

c) Perform the analysis for one of the sensory properties and draw conclusions.
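A sketch for getting the sensory data into shape uses only the rows listed above; the full file ../Data/spinagesens.txt has 252 lines. The column names (hay1, hay2, haytaste, spin1, spin2, spintaste) are hypothetical choices of ours. "." is read as a missing value, and the production factor suggested in a) is built with interaction().

```r
# Excerpt of the sensory data as listed above (18 of the 252 lines)
sens.txt <- "
0 0 A 90 1 1 4.1 3.6 4.6 3.9 9.3 5
0 0 A 90 1 2 . . . . . .
0 0 A 90 1 3 . . . . . .
0 0 A 90 1 4 6 3.7 4.5 5.4 10.8 10.2
0 0 A 90 1 5 8.6 4.1 6.7 3.8 10 7.2
0 0 A 90 1 6 4.3 3.8 5.1 7.1 10.8 9.6
0 0 A 90 1 7 8.9 5.7 7 4.7 8.8 8.3
0 0 A 100 1 1 2.6 .8 6.2 2.7 8.7 6.3
0 0 A 100 1 2 . . . . . .
0 0 A 100 1 3 . . . . . .
0 0 A 100 1 4 6.1 2.5 4.6 6.4 11 11.3
0 0 A 100 1 5 5.9 6.5 5.5 8.7 8.4 7.2
0 0 A 100 1 6 3.8 2.8 3.7 4.9 10.7 8.9
0 0 A 100 1 7 10.4 4.3 7.1 3.3 7 8.6
0 0 B 90 4 1 3.5 4.3 6.7 4.1 9 10.6
0 0 B 90 4 2 . . . . . .
2 2 B 100 6 6 3.6 3.7 3.9 4.4 5.9 7.4
2 2 B 100 6 7 . . . . . .
"
# Hypothetical column names, in the order given in the text; "." means missing
sens <- read.table(text = sens.txt, na.strings = ".",
  col.names = c("weeks", "days", "batch", "temp", "session", "assessor",
                "hay1", "hay2", "haytaste", "spin1", "spin2", "spintaste"))

# Treat all design variables as factors
design <- c("weeks", "days", "batch", "temp", "session", "assessor")
sens[design] <- lapply(sens[design], factor)

# Production factor: one level per observed temp x weeks x days x batch combination
sens$product <- with(sens, interaction(temp, weeks, days, batch, drop = TRUE))

# Keep only the evaluations that were actually carried out
sens.obs <- sens[complete.cases(sens), ]
with(sens.obs, table(session, assessor))
```

Which random effects and assessor-by-treatment interactions to add on top of this is exactly what question b) asks you to decide.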
