factor structure diagrams - technical university of … · 2 factor structure diagrams1 ... 2.5...


eNote 2

Factor Structure Diagrams


Contents

2 Factor Structure Diagrams
  2.1 Introduction
  2.2 Factors
    2.2.1 Different notations
    2.2.2 Crossed factors
    2.2.3 Nested factors
    2.2.4 Balance
  2.3 The factor structure diagrams
  2.4 Random effects
  2.5 Example: Drying of beech wood
    2.5.1 Another way of making the diagram in R
  2.6 Exercises

2.1 Introduction

In this module we will discuss some basics of general analysis of variance modeling. We will concentrate on writing up the models using a general notation and on getting an overview of the experimental design in a given situation by the use of factor structure diagrams.


2.2 Factors

In an experiment there are a number of experimental units, to which we assign the numbers $1, \dots, N$. We consider the situation in which we have measured a variable $Y$ (the response variable, sometimes called the dependent variable) for each experimental unit. We denote the results $Y_1, \dots, Y_N$. Hence, for the $i$th experimental unit we write $Y_i$, where $i = 1, \dots, N$.

A factor partitions the experimental units into a number of disjoint groups that together make up the total set of experimental units. The groups are given names, which are the factor levels. An example is the factor gender, which partitions the experimental units (the individuals) into two groups corresponding to the factor levels male and female. The factor level for the $i$th experimental unit is denoted $\text{factor}_i$; in the example it is either male or female, depending on the $i$th individual. Assume that we have 10 individuals in the experiment, of which the first 5 are male and the last 5 are female. Mathematically, this can be expressed as:

$$\text{gender}_1 = \text{male}, \dots, \text{gender}_5 = \text{male}, \quad \text{gender}_6 = \text{female}, \dots, \text{gender}_{10} = \text{female}.$$

In the mathematical sense a factor is a mapping that attaches a factor level to each experimental unit. The mapping is given by:

$$i \mapsto \text{factor}_i,$$

and it maps the set of indices into the set of factor levels for the given factor.

Another example could be a fertilizer experiment with 12 plots of land, each of which is given a certain amount of fertilizer (e.g. nitrogen, measured in hectokilograms per hectare). Four different amounts of fertilizer (60, 80, 100, 120) are used in the experiment, and each of these 4 treatments is given to 3 plots. In the experiment, we have the factor fertilizer with the levels 60, 80, 100 and 120. The treatment of the first 3 plots could e.g. be

$$\text{fertilizer}_1 = 80, \quad \text{fertilizer}_2 = 120, \quad \text{fertilizer}_3 = 80,$$

and in general $\text{fertilizer}_i$ will denote the factor level (here the amount of fertilizer) corresponding to the $i$th plot.
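The two examples can be written down directly in R. The following is a minimal sketch in base R; the allocation of fertilizer amounts to plots 4 to 12 is hypothetical (only the first three plots are given in the text), and the variable names are chosen for illustration.

```r
# Ten individuals: the first five male, the last five female.
gender <- factor(rep(c("male", "female"), each = 5),
                 levels = c("male", "female"))

# The factor as a mapping i -> gender_i: look up the level of unit i.
gender[6]         # level of the 6th experimental unit
levels(gender)    # the factor levels
table(gender)     # sizes of the groups in the partition

# Twelve plots, four fertilizer amounts, three plots per amount.
# Plots 1-3 follow the text; the remaining allocation is hypothetical.
fertilizer <- factor(c(80, 120, 80, 60, 100, 60, 120, 100, 60, 80, 120, 100))
nlevels(fertilizer)   # number of factor levels
table(fertilizer)     # number of plots for each level
```

Here `factor()` stores exactly the mapping from unit index to level, and `table()` shows the partition of the units into groups.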


Factors play a role in experimental design, model formulation and statistical analysis of the experimental data. A linear normal model for the results $Y_i$ from the "gender experiment" is

$$Y_i = \mu(\text{gender}_i) + \varepsilon_i,$$

where $\varepsilon_1, \dots, \varepsilon_N$ are independent and normally distributed with mean zero and the same variance.

2.2.1 Different notations

The notation in e.g. Littell et al. (1996) and other classical presentations of simple ANOVA models for this situation would be somewhat different. First of all, the observations would be indexed as $Y_{ij}$, where $i = 1, 2$ corresponds to male/female and $j = 1, \dots, 5$ indexes the individuals within each gender group. The model would then be expressed as:

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad j = 1, \dots, 5, \quad i = 1, 2.$$

This notation is probably the more commonly used in statistical textbooks, but it has its limitations. For the general treatment of factors in this note we will therefore use the (more general) way of writing up models presented above, but occasionally add the more conventional way of writing up the models (however, with a less precise explanation of the meaning of the index, the individual model terms etc.) to facilitate the link to other textbooks.

The model for the "gender experiment" corresponds to a one-way analysis of variance with two groups, where the two expected values are, respectively, $\mu(\text{male})$ and $\mu(\text{female})$. Often only the expected value of the observations (the systematic part) is given in a statement of the model:

$$\mathrm{E}Y_i = \mu(\text{gender}_i).$$

For the sake of simplicity the index $i$ may sometimes be omitted:

$$\mathrm{E}Y = \mu(\text{gender}).$$

For the fertilizer experiment let us assume that we want to use a simple linear regression, where the response variable (e.g. yield) is modelled as a linear function of the amount of fertilizer:

$$\mathrm{E}Y_i = \alpha + \beta \cdot \text{fertilizer}_i,$$

or just

$$\mathrm{E}Y = \alpha + \beta \cdot \text{fertilizer},$$

where $\alpha$ and $\beta$ are the two parameters of the model, which must be estimated from the experimental data. Note the difference from the model for the "gender experiment": for the latter, $\mu(\cdot)$ denotes a function of the factor level, and the function can attain the two different values $\mu(\text{male})$ and $\mu(\text{female})$. For the fertilizer model, the number fertilizer is multiplied by the coefficient $\beta$. In the SAS procedures GLM or MIXED the difference between the two types of effects is that gender is listed in a class statement whereas fertilizer is not.

Note that the index of the experimental units is also the level of a factor, namely the factor that has a different level for each experimental unit. We denote this factor $I$ and note that it has $N$ levels.
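In R the corresponding distinction is made by whether a variable enters the model formula as a factor or as a numeric covariate. The sketch below, using only base R, compares the design matrices of the two formulations for the fertilizer amounts; the variable names are hypothetical and no response data are needed for the comparison.

```r
# The four fertilizer amounts, three plots each, as a numeric variable.
fert_amount <- rep(c(60, 80, 100, 120), times = 3)

# As a covariate: intercept alpha and slope beta, i.e. two parameters.
X_regression <- model.matrix(~ fert_amount)

# As a factor: one expected value per level, i.e. four parameters.
X_factor <- model.matrix(~ factor(fert_amount))

ncol(X_regression)
ncol(X_factor)

# The factor I has one level per experimental unit.
I_factor <- factor(seq_along(fert_amount))
nlevels(I_factor)
```

The number of columns of each design matrix equals the number of parameters in the systematic part of the corresponding model.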

2.2.2 Crossed factors

Corresponding to crossed factors in an experiment, we will now define the concept of a product factor. This is used in experiments with several factors and is closely related to the concept of interactions between factors.

Let $F$ and $G$ be two factors in an experiment. The product factor of $F$ and $G$ is called $F \times G$, and for the $i$th experimental unit it has the level $(F_i, G_i)$.

Consider now a simple experiment on storage of vegetables. Each experimental unit is stored in either normal or modified atmosphere and at one of the three temperatures 5, 10 or 15 degrees Celsius. We have two factors, one with 2 levels and one with 3 levels: the factor atm with the levels norm and mod, and the factor temp with the levels 5, 10, 15. The product factor $\text{atm} \times \text{temp}$ has six levels:

(norm, 5), (norm, 10), (norm, 15), (mod, 5), (mod, 10), (mod, 15).

This assumes, however, that all six combinations are present in the study. If, for instance, only the two lowest temperatures are used for normal atmosphere whereas all three temperatures are used for the modified atmosphere, then there are only 5 different combinations in the experiment. In this case the product factor only has 5 levels. The experiment is a $2 \times 3$ factorial design, and in most cases the product factor will have $2 \cdot 3 = 6$ levels, which (partly) is the reason for the name product factor.
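The product factor can be formed in R with the base function `interaction()`. The following sketch uses one experimental unit per combination (an assumption made only to keep the example small) and shows the 6-level and the 5-level case.

```r
atm  <- factor(rep(c("norm", "mod"), each = 3), levels = c("norm", "mod"))
temp <- factor(rep(c(5, 10, 15), times = 2))

# Product factor: the level for unit i is the pair (atm_i, temp_i).
atm_temp <- interaction(atm, temp, drop = TRUE)
nlevels(atm_temp)

# Leave out the combination (norm, 15): only 5 combinations remain.
keep <- !(atm == "norm" & temp == "15")
atm_temp_reduced <- interaction(atm[keep], temp[keep], drop = TRUE)
nlevels(atm_temp_reduced)
```

With `drop = TRUE` only the combinations that actually occur are kept as levels, which matches the definition of the product factor used here.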

The product factor is a natural part of a model with several factors that possibly interact with each other. In the previous example with the two factors atm and temp we can formulate a model with a main effect of each of the two factors and with a combined effect:

$$\mathrm{E}Y = \alpha(\text{atm}) + \beta(\text{temp}) + \gamma(\text{atm} \times \text{temp}),$$

where the last term could also be written $\gamma(\text{atm}, \text{temp})$, since it is a function of the two factor levels. The model is the usual model for two-way ANOVA. If the last term is left out, we have a model with only the two main effects, that is, with no interaction effect. A test for interaction can hence be carried out by testing whether the last term can be ignored.

The two-way ANOVA model may also be written as

$$\mathrm{E}Y = \gamma(\text{atm} \times \text{temp}),$$

where the main effects are omitted from the model expression. This does not mean that the main effects are absent from the model; they are now a part of the product factor. In fact, this corresponds to defining a new factor with the levels 1 to 6, one for each combination of atm and temp, and then regarding the situation as a one-way ANOVA.

Note that in the model expression each term corresponds to an (unknown) function of the levels of the specific factor: the term $\beta(\text{temp})$ attaches a value to each of the factor levels 5, 10 and 15, this value being $\beta(5)$, $\beta(10)$ or $\beta(15)$. Hence, when a factor is in the model, the expected value has a term that is a function of the factor level. The term corresponding to the product factor correspondingly allows the expected value to depend on the combination of the levels of the two factors. If in addition there are replications, the conventional model expression for the two-way ANOVA is

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk},$$

where $Y_{ijk}$ is the $k$th replication within the $(i, j)$th combination of the two factors.
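The equivalence between the two-way model with interaction and the one-way model on the product factor can be checked numerically. The sketch below uses base R and a simulated response (random noise, for illustration only, not data from the storage experiment); the number of replications, 5 per combination, is an assumption.

```r
set.seed(1)
design <- expand.grid(rep  = 1:5,
                      temp = factor(c(5, 10, 15)),
                      atm  = factor(c("norm", "mod")))
design$y <- rnorm(nrow(design))   # simulated response, illustration only

# Main effects plus interaction ...
fit_two_way <- lm(y ~ atm + temp + atm:temp, data = design)
# ... and a one-way model on the product factor.
fit_one_way <- lm(y ~ interaction(atm, temp), data = design)

# Both models give the same expected values for every unit.
all.equal(fitted(fit_two_way), fitted(fit_one_way))

# Test for interaction: can the last term be left out?
fit_additive <- lm(y ~ atm + temp, data = design)
anova(fit_additive, fit_two_way)
```

The two parametrisations describe the same set of six expected values, so the fitted values agree; the `anova()` comparison is the test of whether the interaction term can be ignored.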

2.2.3 Nested factors

As the final concept regarding relations between factors we define the following: a factor $F$ is said to be finer than another factor $G$ if the partitioning of the experimental units into groups according to the levels of $F$ is a sub-partitioning of the groups obtained by partitioning the experimental units according to the levels of $G$. In other words, if the factor level $F_i$ is known for experimental unit $i$, then the factor level $G_i$ is also known. Finally, if $F$ is finer than $G$, then $G$ is said to be coarser than $F$.

In an experiment the weight of piglets after 7 days was recorded. A number of piglets from different sows (each of which enters with one litter) are weighed. There are a number of sows from each of a number of different breeds. We have here the factors breed and sow. We could also consider the factor piglet, but since we only have one measurement for each piglet this factor would be identical to the factor $I$, that is, to the experimental unit. In this experiment the factor sow is finer than the factor breed, since each sow belongs to exactly one breed. On the other hand there are several sows for each breed. One also says that sow is nested within breed. Yet another expression is that the breed effect is contained within the sow effect.

In the experiment with the factors atm and temp from above, neither of the two factors is finer than the other. For each temperature there are several levels of atm, that is, both norm and mod. Analogously, each atmosphere does not belong to one particular temperature. But the product factor $\text{atm} \times \text{temp}$ is finer than both atm and temp, since if we know the combination of the two factors we also know the level of each individual factor.
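Whether one factor is finer than another can be checked from a cross-tabulation. The helper `is_finer()` below is a hypothetical function written for this illustration, and the piglet design (2 breeds, 2 sows per breed, 3 piglets per sow) is made up to mirror the description above.

```r
# TRUE if factor f is finer than factor g, i.e. every level of f
# occurs together with at most one level of g.
is_finer <- function(f, g) {
  all(rowSums(table(f, g) > 0) <= 1)
}

# Hypothetical piglet design: 2 breeds, 2 sows per breed, 3 piglets per sow.
breed <- factor(rep(c("A", "B"), each = 6))
sow   <- factor(rep(1:4, each = 3))
is_finer(sow, breed)    # sow is nested within breed
is_finer(breed, sow)

# Storage experiment: atm and temp are crossed, not nested.
atm  <- factor(rep(c("norm", "mod"), each = 3))
temp <- factor(rep(c(5, 10, 15), times = 2))
is_finer(atm, temp)
is_finer(interaction(atm, temp), atm)   # the product factor is finer than atm
```

The check follows the definition directly: knowing the level of the finer factor must determine the level of the coarser one.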

2.2.4 Balance

In many situations experiments are designed such that there is the same number of experimental units for each treatment. A factor is called balanced if there is the same number of experimental units for each level of the factor. If, for instance, we have the same number of men and women in an experiment, the factor gender is balanced. The two-way setup above is usually called balanced if each combination of the two factors occurs equally often, that is, if the product factor $\text{atm} \times \text{temp}$ is balanced. Then each of the factors atm and temp will also be balanced. On the other hand, each of the two factors can be balanced without the product factor being balanced: try to construct an example yourself!
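Balance of a factor can be checked by counting the experimental units at each level. The helper `is_balanced()` below is a hypothetical function for illustration; the design assumes 5 units for each of the six storage treatments, and the "missing value" is simulated by simply deleting the first unit.

```r
# TRUE if all levels of the factor f have the same number of units.
is_balanced <- function(f) {
  counts <- table(f)
  all(counts == counts[1])
}

# Storage experiment with 5 units for each of the 6 treatments.
design <- expand.grid(rep  = 1:5,
                      temp = factor(c(5, 10, 15)),
                      atm  = factor(c("norm", "mod")))
is_balanced(design$atm)
is_balanced(design$temp)
is_balanced(interaction(design$atm, design$temp))

# One missing unit destroys the balance.
reduced <- design[-1, ]
is_balanced(interaction(reduced$atm, reduced$temp))
```

The same function can be used to examine candidate designs for the construction problem posed above.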

In an experiment certain factors may be balanced while others are not. Some possible pleasant consequences of (a certain degree of) balance are that:

• ANOVA tables are unique (TYPE I tables equal TYPE III tables, cf. the discussion in Module 1).

• Estimates of fixed effects levels equal their raw means.

• Estimates of fixed effect levels do not change even though random effects are considered fixed.

• In many cases, tests for the significance of fixed and/or random effects in the mixed model become simple F-tests.


If an experiment is balanced across all levels of all factors, then we obtain all of these simplifying effects. However, one or more of these simplifications may occur without this restrictive assumption. We will NOT try to define formally which situations lead to which simplifications. In general, the use of a proper mixed model software package will ensure that we proceed in a sensible way. From an interpretational point of view the simplifications may help, since small deviations from balance due to a few missing values etc. are quite common in practice.

We will now introduce a special kind of diagram that in many practical situations can make a complicated experimental design easier to grasp.

2.3 The factor structure diagrams

When a linear model is formulated, it must be decided which effects to include in the model (at least from the beginning). First of all the factors used for the design of the experiment are listed. In the storage experiment we had the factors atm and temp. Let us be more specific and assume that all of the six treatments (combinations of the two factors) are used, each on 5 experimental units. It seems natural to include the product factor in the model. So we have the three factors:

$$\text{atm}, \quad \text{temp}, \quad \text{atm} \times \text{temp}.$$

A possible conclusion of the analysis could be that the expected value of the response variable depends on neither temperature nor atmosphere. In this case the expected value is constant, corresponding to the term $\mu$ in the conventional model expression. What factor does this amount to in the model? It must be a factor with only one level corresponding to only one value, $\mu$. The answer is the constant factor called 0, where the factor level is 0 for all experimental units. All $N$ experimental units belong to the same group for this factor, i.e. this factor is coarser than any other factor. Note that the factor $I$, which places each experimental unit in its own group consisting of only this one experimental unit, is the opposite extreme: it is finer than all other factors. The model term corresponding to the factor $I$ is the residual term, $\varepsilon_i$, which may take a different value for each experimental unit. So with the factors 0 and $I$ we have 5 factors in the storage experiment, that is

$$0, \quad \text{atm}, \quad \text{temp}, \quad \text{atm} \times \text{temp}, \quad I.$$

These factors correspond to the complete model expression

$$Y_i = \mu + \alpha(\text{atm}_i) + \beta(\text{temp}_i) + \gamma((\text{atm} \times \text{temp})_i) + \varepsilon_i, \quad i = 1, \dots, 30,$$


where $\mu$ could be expressed as $\mu(0)$, and where we note that the term $\varepsilon_i$ is different from the other terms in that it is a (normally distributed) random variable; the other 4 terms are (unknown) functions, whose values are exactly the parameters we want to estimate. As above, the conventional model expression for the two-way ANOVA may be written as:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk},$$

where $Y_{ijk}$ is the $k$th replication within the $(i, j)$th combination of the two factors.

With respect to the structural relations among the factors we can say that atm and temp are both coarser than the product factor, and that all other factors are finer than 0 and coarser than $I$. The structure is better viewed in a factor structure diagram, see the figure below.

This vertical version of the diagram was made in R using the diagram package:

require(diagram, warn.conflicts = FALSE, quietly = TRUE)

## Creating the list of factor names with indices:

names=c(expression("[I]"[24]^{30}),expression(atm:temp[2]^{6}),expression(atm[1]^{2}),expression(temp[2]^{3}),expression(0[1]^{1}))

## Since there are 5 factors create the 5x5 matrix of zeros:

M <- matrix(nrow = 5, ncol = 5, byrow = TRUE, data = 0)

## Envision the structure: e.g. I need an arrow from my first

## name to my second name so assign something to M[2,1] etc:

M[2, 1] <- M[3,2] <- M[4, 2] <- M[5,3] <- M[5,4] <- ""

## Make the diagram:

plotmat(M, pos = c(1, 1, 2, 1), name = names, lwd = 2,

box.lwd = 1, cex.txt = 1, box.size = 0.1,

box.type = "square", box.prop = 0.5, arr.type="triangle",curve=0)


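[Figure: the vertical factor structure diagram for the storage experiment, with the boxes [I] (30 levels, 24 degrees of freedom), atm:temp (6, 2), atm (2, 1) and temp (3, 2) side by side, and 0 (1, 1), arranged from top to bottom.]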

Or to make the diagram horizontally: first a function to turn the vertical positions into horizontal ones:

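matrix_position <- function(pos_vec) {
  n <- sum(pos_vec)          # number of boxes (rows of the position matrix)
  m <- length(pos_vec) - 2   # number of inner layers
  d_hori <- 0.8 / (m + 1)    # horizontal distance between layers
  bot <- 0.1; mid <- 0.5; top <- 0.9
  pos_mat <- matrix(nrow = n, ncol = 2)
  ## First and last box: far left and far right, vertically centred
  pos_mat[1, 1] <- bot
  pos_mat[n, 1] <- top
  pos_mat[1, 2] <- pos_mat[n, 2] <- mid
  cum_pos <- cumsum(pos_vec)
  for (i in seq_len(m)) {
    n_vert <- pos_vec[i + 1]       # number of boxes in inner layer i
    d_vert <- 0.8 / (n_vert + 1)   # vertical distance within the layer
    for (j in seq_len(n_vert)) {
      pos_mat[cum_pos[i] + j, 2] <- 0.1 + j * d_vert
      pos_mat[cum_pos[i] + j, 1] <- 0.1 + i * d_hori
    }
  }
  return(pos_mat)
}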

And then making the diagram with this new function as a "filter":

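plotmat(M, pos = matrix_position(c(1, 1, 2, 1)), name = names, lwd = 2,
        ## the rest of this call is cut off in the source; the remaining
        ## arguments are assumed to be the same as in the vertical version:
        box.lwd = 1, cex.txt = 1, box.size = 0.1,
        box.type = "square", box.prop = 0.5, arr.type = "triangle", curve = 0)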



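[Figure: the same factor structure diagram drawn horizontally, with [I] at the left, then atm:temp, then atm and temp, and 0 at the right.]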

There are many ways of making this visually much better with colours etc. To read more about how to do this, see the vignette of the diagram package: http://cran.r-project.org/web/packages/diagram/vignettes/diagram.pdf

Apart from the index figures at each factor, we see that the factor diagram consists of all the factors in the model together with a number of arrows connecting some of the factors. Furthermore, brackets are added around the factor $I$.

The arrows show the relations among the factors by going from a finer factor to a coarser one, for example from the product factor to temp. In principle all such arrows are drawn, except that an arrow that can be replaced by two arrows is not used. For instance, the arrow from $I$ to temp is not drawn, since the arrows from $I$ to the product factor and from the product factor to temp together show that temp is coarser than $I$. Note that it is convenient to arrange the diagram such that the coarsest factors are on one side (e.g. to the top as shown, or often also to the right).

The upper index figure at each factor in the diagram is the number of factor levels for the factor in question. The lower index figure is the number of degrees of freedom for the factor. Although the primary use of the diagram is to provide an overview of the structure in the experiment, we will give the (rather easy) method to compute the degrees of freedom (at least for "nice" situations): start at the right/top of the diagram and write 1 as the lower index of 0. Continue towards the left/down, for instance to atm. Here the degrees of freedom are calculated as the upper index (2) minus the sum of the degrees of freedom (the lower indices) of all the factors that are coarser than (to the right of) atm. In this case only 0 is coarser, so we get $2 - 1 = 1$ as the result. Similarly we get $3 - 1 = 2$ for temp, and for the product factor we get $6 - 1 - 2 - 1 = 2$. Finally, for $I$ we get $30 - 2 - 1 - 2 - 1 = 24$.
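The rule for the degrees of freedom is easy to automate. The following sketch in base R encodes the storage experiment by hand: the object names `n_levels`, `coarser` and `dof` are chosen for illustration, and the factors must be listed from the coarsest to the finest for the loop to work.

```r
# Number of levels (upper index) for each factor, coarsest first.
n_levels <- c("0" = 1, atm = 2, temp = 3, "atm:temp" = 6, I = 30)

# coarser[[f]] lists ALL factors coarser than f (not only those
# connected to f by a direct arrow in the diagram).
coarser <- list(
  "0"        = character(0),
  atm        = "0",
  temp       = "0",
  "atm:temp" = c("0", "atm", "temp"),
  I          = c("0", "atm", "temp", "atm:temp")
)

# Degrees of freedom = number of levels minus the sum of the degrees
# of freedom of all coarser factors.
dof <- numeric(0)
for (f in names(n_levels)) {
  dof[f] <- n_levels[[f]] - sum(dof[coarser[[f]]])
}
dof
```

The result reproduces the lower indices 1, 1, 2, 2 and 24 in the diagram. As in the text, the method is only meant for "nice" (balanced) situations.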

2.4 Random effects

The bracket around the factor $I$ means that this factor enters the model as a random variable, that is, it is a factor with random effect. Sums of squares corresponding to factors with random effects can enter the denominator in F-tests in analysis of variance, and we will see later how the factor diagram can provide an overview of which (fixed) effects should be tested against which random effects.

In a model having factors with random effects, some of the F-tests from the conventional analysis of variance should be modified. In "nice cases" the modification amounts to using a denominator other than the residual variation. For instance, in a (balanced) split-plot experiment, the whole-plot factor should be tested against the whole-plot variation: the denominator of the F-test becomes the whole-plot mean square. The factor structure diagram can be used to find out which effects should be tested against which. This method, however, is only strictly valid when the experiments are "nice" (in a certain way that we will not define exactly here). It is therefore always a good idea to use computer software that automatically performs the tests correctly.


The principle is that each (fixed) factor should be tested against the coarsest random factor that is finer than the factor in question. This is called the error stratum. The candidates are hence the random effect factors finer than the factor in question, that is, all the random effects that can be found by moving leftwards/upwards in the diagram starting from the factor in question. If there is more than one candidate, we check whether one of them is coarser than the rest. If so, this is the denominator of the F-test. If no single factor is coarser than the rest, then the F-test cannot be performed by dividing by a single mean square, and other methods are called for.
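The rule can be sketched as a small function in base R. The function `error_stratum()` and the lists below are hypothetical names for this illustration. The first structure is the beech wood experiment of the next section (with plank and $I$ random); the second is a made-up split-plot-like variant in which each plank is assumed to receive only one width, so that plank becomes finer than width.

```r
# finer[[f]] lists ALL factors finer than f; random lists the random factors.
error_stratum <- function(f, finer, random) {
  candidates <- intersect(finer[[f]], random)
  for (cand in candidates) {
    others <- setdiff(candidates, cand)
    # cand is the coarsest candidate if all other candidates are finer.
    if (all(others %in% finer[[cand]])) return(cand)
  }
  NA_character_   # no single coarsest candidate: no simple F-test
}

random <- c("plank", "I")

# Beech wood experiment: plank is crossed with width and depth.
finer_wood <- list(width = c("depth:width", "I"),
                   depth = c("depth:width", "I"),
                   "depth:width" = "I",
                   plank = "I",
                   I = character(0))
sapply(c("width", "depth", "depth:width"), error_stratum,
       finer = finer_wood, random = random)

# Hypothetical variant: each plank has one width only (plank finer than width).
finer_split <- finer_wood
finer_split$width <- c("plank", "depth:width", "I")
error_stratum("width", finer_split, random)
```

In the first structure every fixed effect is tested against $I$, the residual; in the hypothetical variant width is tested against plank, just as a whole-plot factor is tested against the whole-plot variation.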

2.5 Example: Drying of beech wood

To investigate the effect of drying of beech wood on the humidity percentage, the following experiment was conducted. Each of 20 planks was dried for a certain period of time. Then the humidity percentage was measured at 5 depths and 3 widths for each plank:

depth 1: close to the top
depth 3: between 1 and 5
depth 5: in the center
depth 7: between 5 and 9
depth 9: close to the bottom

width 1: close to the side
width 2: between 1 and 3
width 3: in the center

So there are $3 \cdot 5 = 15$ measurements for each plank and altogether 300 observations. The data can be found in the file planks.txt (http://www2.compute.dtu.dk/courses/02429/Data/datafiles/planks.txt) and are reproduced in the following table.


        Width 1                    Width 2                    Width 3
        Depth                      Depth                      Depth
Plank     1    3    5    7    9      1    3    5    7    9      1    3    5    7    9
    1   3.4  4.9  5.0  4.9  4.0    4.1  4.7  5.2  4.6  4.3    4.4  4.8  5.0  4.9  4.2
    2   4.3  5.5  6.2  5.4  4.7    3.9  5.6  5.7  5.5  4.9    4.0  4.7  4.5  3.9  4.0
    3   4.2  5.5  5.6  6.3  4.5    5.4  6.2  6.1  6.4  5.2    4.5  4.9  4.9  4.9  4.4
    4   4.4  6.0  7.1  6.9  4.6    4.6  6.1  6.6  6.5  4.7    4.9  5.9  5.8  6.4  4.7
    5   3.9  4.7  5.2  5.0  3.7    4.2  5.2  5.4  4.8  3.9    4.0  4.4  4.4  4.1  3.5
    6   4.6  5.9  6.3  5.8  4.8    5.9  7.3  6.9  6.9  4.4    5.2  5.7  6.6  6.0  4.0
    7   3.9  5.6  6.0  5.3  5.0    4.9  6.9  7.1  6.1  4.5    4.3  5.4  5.9  5.5  4.2
    8   3.9  4.5  5.3  5.6  4.7    3.7  4.9  4.8  4.9  4.3    3.8  4.5  5.4  4.8  4.0
    9   3.6  4.1  4.0  4.4  3.7    3.8  5.1  5.0  4.6  3.3    3.0  3.9  4.7  4.9  3.8
   10   6.5  8.7  9.5  7.9  6.6    6.9  8.9  7.4  7.0  6.9    5.8  7.5  7.7  7.3  5.9
   11   3.7  5.2  5.5  5.9  4.4    4.7  5.8  5.7  4.9  4.2    3.7  5.0  6.3  5.2  4.3
   12   4.3  5.8  6.2  5.2  4.4    4.8  6.7  7.0  6.1  5.2    5.1  5.7  5.9  6.4  5.1
   13   6.5  8.8  9.1  8.9  6.0    5.9  7.5  8.4  7.9  5.7    4.0  4.2  4.9  4.6  3.5
   14   4.4  6.2  6.7  6.4  4.3    5.7  7.0  7.4  7.3  5.5    4.6  6.2  6.8  5.8  4.9
   15   5.5  7.1  7.5  6.9  5.4    6.4  8.4  8.9  8.1  6.1    6.5  8.4  9.1  9.2  7.5
   16   5.2  6.0  6.2  6.6  5.3    6.6  7.6  7.8  7.7  5.8    5.9  6.7  6.7  5.0  3.9
   17   3.7  4.5  5.0  4.5  3.7    3.7  4.4  4.8  4.4  4.3    3.7  4.5  4.7  5.3  3.9
   18   6.0  7.4  7.8  7.5  5.7    6.9  8.6  8.8  7.5  5.4    5.1  6.1  5.2  5.4  4.7
   19   3.8  4.6  4.8  4.4  3.8    3.7  4.7  4.7  4.3  3.7    3.3  3.5  3.7  3.4  3.2
   20   6.1  7.4  7.7  6.7  4.6    4.7  6.3  7.1  6.5  5.1    4.7  6.0  6.0  6.3  4.2

In this experiment we have 3 factors apart from the trivial factors $I$ and 0. Let us use the factor names plank, width and depth. The factor plank has 20 levels, width has 3 levels and depth has 5 levels. For the $i$th measurement of humidity, $\text{plank}_i$ denotes the plank on which this measurement was performed. Correspondingly, $\text{width}_i$ and $\text{depth}_i$ denote the width and depth, respectively, of the $i$th measurement. It would be natural to include the interaction between width and depth corresponding to the product factor $\text{width} \times \text{depth}$. The product factor has in this case 15 levels.
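The factor structure of this experiment can be set up and checked in base R without the response values. The sketch below builds the 300 combinations with `expand.grid()`; the object names are chosen for illustration, and the last line applies the degrees-of-freedom rule from Section 2.3 to the factor $I$.

```r
# One row per measurement: 5 depths x 3 widths x 20 planks.
design <- expand.grid(depth = factor(c(1, 3, 5, 7, 9)),
                      width = factor(1:3),
                      plank = factor(1:20))
nrow(design)               # number of experimental units
sapply(design, nlevels)    # number of levels of each factor

# The product factor width x depth.
width_depth <- interaction(design$width, design$depth)
nlevels(width_depth)

# Every plank has exactly one measurement for each combination.
all(table(design$plank, width_depth) == 1)

# Degrees of freedom for I: levels minus the df of all coarser factors
# (0, width, depth, width x depth and plank).
residual_df <- nrow(design) - 1 - (3 - 1) - (5 - 1) - (15 - 1 - 2 - 4) - (20 - 1)
residual_df
```

The check confirms that plank is crossed with the product factor and that the design is balanced.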

A natural model would include plank as a block factor, while depth and width enter together with their interaction. If $Y_i$ denotes the humidity percentage corresponding to the $i$th measurement, the model with fixed block effect can be written as:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i, \qquad \text{(2-1)}$$

where $i = 1, \dots, 300$ and where the $\varepsilon_i$ are independent and normally distributed random variables. Or similarly:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \varepsilon_{ijk},$$

where $Y_{ijk}$ is the $k$th measurement (the one from plank $k$) within the $(i, j)$th combination of the two factors, $i = 1, \dots, 3$, $j = 1, \dots, 5$ and $k = 1, \dots, 20$. As pointed out in Module 1, the block (plank) effect should be considered a random effect, leading to the mixed model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \qquad \text{(2-2)}$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model corresponds to the factor structure diagram given in Figure 2.1.

In the following chapter this is taken as the starting point for a complete analysis of this data set.
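As a first impression of how the strata of model (2-2) show up in software, the following sketch fits the model to planks 1 to 3 only, using the base R function `aov()` with an `Error()` term. This is an illustration under stated assumptions, not the analysis of the following chapter: the course exercises use `lmer()`, the restriction to three planks is made only to keep the example short, and `aov()` with `Error()` is only appropriate for balanced data such as these.

```r
# Humidity values for planks 1-3, copied from the data table
# (one row per plank: width 1 at depths 1, 3, 5, 7, 9, then width 2, width 3).
values <- c(
  3.4, 4.9, 5.0, 4.9, 4.0,  4.1, 4.7, 5.2, 4.6, 4.3,  4.4, 4.8, 5.0, 4.9, 4.2,
  4.3, 5.5, 6.2, 5.4, 4.7,  3.9, 5.6, 5.7, 5.5, 4.9,  4.0, 4.7, 4.5, 3.9, 4.0,
  4.2, 5.5, 5.6, 6.3, 4.5,  5.4, 6.2, 6.1, 6.4, 5.2,  4.5, 4.9, 4.9, 4.9, 4.4
)

# expand.grid() varies its first argument fastest: depth, then width, then plank.
wood <- expand.grid(depth = factor(c(1, 3, 5, 7, 9)),
                    width = factor(1:3),
                    plank = factor(1:3))
wood$humidity <- values

# Fixed effects of width, depth and their interaction; plank as error stratum.
fit <- aov(humidity ~ width * depth + Error(plank), data = wood)
summary(fit)
names(fit)   # the strata of the analysis
```

The summary is split into a plank stratum and a within-plank stratum, and all three fixed effects are tested in the within-plank stratum, in agreement with the factor structure diagram.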

2.5.1 Another way of making the diagram in R


names=c(expression("[I]"[24]^{30}),expression(atm:temp[2]^{6}),expression(atm[1]^{2}),expression(temp[2]^{3}),expression(0[1]^{1}))

## Define the positions of all the terms in a, say, (0-10)-by-(0-10) grid:

x <- c(2, 4, 6, 6, 8)

y <- c(5, 5, 3, 7, 5)

## Make the basic plot without the arrows:

plot(NA, NA, xlim=c(0,10), ylim=c(0,10), type="n", axes=F, xlab="", ylab="")

text(x,y,names)

# Either define and add the arrows directly, e.g.:

x0 <- c(2, 4, 4, 6, 6)+.5

y0 <- c(5,5,5,3,7)

x1 <- x0+1

y1 <- c(5,3,7,5,5)

arrows(x0, y0, x1, y1, length=0.1)

## Code for Figure 2.1 (the beech wood experiment).
## Creating the list of factor names with indices:

names=c(expression("[I]"[266]^{300}),expression(depth:width[8]^{15}),expression("[plank]"[19]^{20}),expression(width[2]^{3}),expression(depth[4]^{5}),expression(0[1]^{1}))

## Since there are 6 factors create the 6x6 matrix of zeros:

M <- matrix(nrow = 6, ncol = 6, byrow = TRUE, data = 0)

## Envision the structure: e.g. I need an arrow from my first
## name to my second name so assign something to M[2,1] etc:

M[2, 1] <- M[3, 1] <- M[4, 2] <- M[5, 2] <- "."

M[6, 3] <- M[6, 4] <- M[6, 5] <- "."

## Make the diagram:

plotmat(M, pos = c(1, 1, 3, 1), name = names, lwd = 2,
        ## the rest of this call is cut off in the source; the remaining
        ## arguments are assumed to be the same as for the storage diagram:
        box.lwd = 1, cex.txt = 1, box.size = 0.1,
        box.type = "square", box.prop = 0.5, arr.type = "triangle", curve = 0)

[Figure 2.1: The factor structure diagram. Boxes from top to bottom: [I] (300 levels, 266 degrees of freedom); depth:width (15, 8); [plank] (20, 19), width (3, 2) and depth (5, 4); 0 (1, 1).]

It may be challenging to define the proper start and end positions for all the arrows. Alternatively, use the following function, which is built on the locator function:

"draw.arrows" <-

function(n){

ret<-locator(2*n)

o<-(1:(2*n))%%2==1; e<-(1:(2*n))%%2==0

x0<-ret$x[o]; y0<-ret$y[o]; x1<-ret$x[e]; y1<-ret$y[e];

return(list(x0=x0,y0=y0,x1=x1,y1=y1))

}

Using this function you can use the mouse to position the arrows as you want: click twice in the graphics device for each wanted arrow, first at the start and then at the end of the arrow. E.g. for the five needed arrows in this case write:


text(x,y,names)

arr<-draw.arrows(5)

Do the mouse clicking (in this case 10 clicks to define 5 arrows), and afterwards you have the positions saved to be used to add the arrows to the plot:

arrows(arr$x0,arr$y0,arr$x1,arr$y1,length=.1)

Having done this in the R graphics device, you can subsequently use the saved arrow positions to make a pdf and/or png version of the diagram:

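pdf("genfig/mydiagram.pdf")

## A new device needs the empty plot to be set up again before text() and arrows():
plot(NA, NA, xlim=c(0,10), ylim=c(0,10), type="n", axes=FALSE, xlab="", ylab="")

text(x,y,names)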

arrows(arr$x0,arr$y0,arr$x1,arr$y1,length=.1)

dev.off()

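[Figure: the factor structure diagram for the beech wood experiment drawn horizontally, with [I] (300 levels, 266 degrees of freedom) at the left, then depth:width (15, 8) and [plank] (20, 19), then width (3, 2) and depth (5, 4), and 0 (1, 1) at the right.]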

2.6 Exercises


Exercise 1 Sensory evaluation of cookies

Consider the cookies data from Exercise 2 in Module 1 (http://www2.compute.dtu.dk/courses/02429/Data/datafiles/cookies.txt).

a) Write down all the factors relevant for the analysis, their levels and their mutual structure. Are they crossed or nested, for example? Make the factor structure diagram.

b) Assume now that assessors are a random effect (and hence also the assessor*treatment interaction), and analyse the four variables with respect to the treatment effect. What appears to be the difference from the fixed effects analysis carried out in Exercise 1.2?

c) Assume that the treatments 46, ..., 50 belong to one group and the treatments 51, ..., 55 belong to another group. Test the hypothesis that these two groups are equal on average, and estimate the difference between the groups (including a confidence band).

Exercise 2 Maillard reaction in milk powder

In an experiment with production of milk powder the effect of water activity and temperature on the formation of Maillard reaction products was investigated. There were 9 treatment combinations of the two factors and three replicates (blocks) of the experiment, giving a total of 27 productions. The factors and levels were: water activity (approx. 0.15, 0.25 and 0.10, coded as 1, 2, 3 in the data set) and temperature (100 C, 110 C, 120 C, 140 C).

The 27 samples were stored and measurements were made after 4, 6 and 8 weeks. The measurements (response variables) were: concentration of Maillard reaction products (which may give a bad taste), and sensory evaluation of taste (high = good taste). The data are available at http://www2.compute.dtu.dk/courses/02429/Data/datafiles/milk.txt and are listed below:


water temp rep maill4 maill6 maill8 taste4 taste6 taste8

1 100 1 2.90 2.13 2.39 10.1 10.0 9.6

1 100 2 2.19 2.20 2.27 11.0 9.3 11.0

1 100 3 2.13 2.20 2.41 10.1 7.0 9.6

1 110 1 2.13 2.34 2.41 11.0 10.5 9.8

1 110 2 2.32 2.27 2.25 11.0 11.3 11.2

1 110 3 2.13 2.34 2.42 9.4 10.7 9.0

1 120 1 2.00 2.49 2.71 11.1 11.2 11.4

1 120 2 2.41 2.49 2.46 11.6 11.7 9.6

1 120 3 2.22 2.49 2.73 10.7 10.3 10.2

2 100 1 2.13 2.41 2.49 11.1 10.8 11.2

2 100 2 2.49 2.34 2.53 11.1 11.2 9.2

2 100 3 2.80 2.63 3.33 8.3 9.7 7.8

2 120 1 2.38 2.85 2.06 11.9 11.2 11.2

2 120 2 2.61 2.70 2.70 11.7 10.8 11.0

2 120 3 2.77 3.06 3.25 10.9 9.0 9.4

2 140 1 2.56 2.84 3.10 10.7 11.2 9.8

2 140 2 2.63 2.61 2.81 10.8 11.0 11.6

2 140 3 2.99 3.28 3.75 9.2 9.6 9.6

3 100 1 2.60 2.24 2.32 10.8 8.4 10.8

3 100 2 2.06 2.11 2.20 11.0 11.2 11.8

3 100 3 1.98 2.34 2.80 10.3 10.2 10.6

3 110 1 1.91 2.06 2.29 11.0 11.4 9.4

3 110 2 1.98 1.98 2.15 10.0 11.8 10.6

3 110 3 1.98 2.51 2.81 9.3 9.2 10.2

3 140 1 2.27 2.42 2.72 10.8 11.6 12.0

3 140 2 2.27 2.20 2.41 11.2 11.0 11.4

3 140 3 2.20 2.77 3.06 10.5 10.2 10.0

a) Make the factor structure diagram corresponding to the analysis of (say) the Maillard reaction after 4 weeks.

b) Analyse the effect of the factors on the Maillard reaction products (for ONE of the three Maillard reaction variables: maill4, maill6, maill8), including a summary of the significant effects. Some useful R lines are:

milk <- read.table("milk.txt", header = TRUE, sep = ",")

milk$water <- factor(milk$water)


milk$temp <- factor(milk$temp)

milk$rep <- factor(milk$rep)

library(lme4)  # provides lmer()

lmer(maill4 ~ water*temp + (1|rep),data=milk)

For this part do a complete analysis and reporting:

1. Data/structure description (cf. above)

2. Explorative plotting

3. Modelling: testing/reducing (and a presentation thereof in the report)

4. Presentation of final results/findings, post-hoc tables/plots

c) Assume for a moment that the measurements from the three storage times were from different productions (such that there were 81 productions in total). So now there are only two variables, but instead 81 observations. The data are available in that form at http://www2.compute.dtu.dk/courses/02429/Data/datafiles/milk2.txt. Also see the description in eNote 13 (http://02429.compute.dtu.dk/enote/afsnit/NUID193/).

For these two variables construct the factor structure diagram (including the factor storage and possible interactions), sketch a plan for possible explorative plots and suggest a good starting model (largest possible model) for the analysis of these data. (You are not supposed to actually do anything with the data.)


Remark 2.1 Report tips, part 1

Please have a look at the report tips document shared in CampusNet or read through the tips here:

• All R (or SAS) code should be in an appendix arranged in a readable way with a sufficient amount of comments.

• It is often good to do an explorative analysis via plots, tables and summary measures before the actual analysis is carried out.

• All plots and tables should be discussed in the text.

• The report should not include excessive material: it counts down, not up.

• Subsidiary plots and tables can be placed in an appendix.

• Be aware of the number of significant digits you report. Usually no more than 3 significant digits should be reported. Standard errors often give an indication of the uncertainty and can be used as a guide.

• Pay attention to module 3. As it says, it is a standard for the reports you will be handing in. This means that tables of expected means, parameter estimates, confidence intervals etc. are part of the results, just like in module 3. The xtable package in R is useful for this purpose.

• The description of data should be presented in a way that is clear and easy to comprehend, preferably in a table or bulleted list. It should contain information on whether each variable is categorical or continuous; the number of levels, possibly the actual levels; fixed or random; balanced or unbalanced; nested or crossed structures; and possibly the number of degrees of freedom.

• Think of yourself as a consulting statistician when writing the reports: if you are doing essentially the same analysis several times, only give the details about how you arrived at the first one, and then summarize the results of all analyses in tables or figures to give a better overview.


Remark 2.2 Report tips, part 2

• Your client will not care about which commands in some arbitrary software you use to get the results, so code should be part of the appendix. Your client should have enough information to replicate your analyses and have a third party validate your results.

• It is a good idea to rephrase the question in your own words so that your client (and I) can see how you understand the problem. This also makes an excellent introduction.

• It is important to clearly state the mathematical models that you use, including distributional assumptions, assumptions of independence etc.

• It is quite valuable if you can briefly reflect on your results in the context of the data/exercise.

• For those of you who are familiar with LaTeX, there is a very nice option to merge R code with text. Basically all text and R code are contained in the same .Rnw document. This is processed by the Sweave or knitr program, which produces the .tex file that can be LaTeX'ed to produce the final document.
