
MixeR Package for CDA -- graphical display of three and four part (sub)compositions

Matevž Bren 1,2

Vladimir Batagelj 2,3

1 University of Maribor, Slovenia

2 Institute of Mathematics, Physics and Mechanics, Slovenia

3 University of Ljubljana, Slovenia

IAMG 2005, August 21-26, Toronto, Canada


Introduction

The groundwork for Compositional Data Analysis (CDA) is John Aitchison's 1986 book The Statistical Analysis of Compositional Data.

From the book we quote: “The properties of many substances or objects, such as gasoline, metal alloys and cakes, depend on the particular mixture, or composition, of their ingredients. The purpose of the experiments with different mixtures is to obtain some understanding of the nature and extent of the dependence of the properties on the composition. In the analysis of such experiments the composition is confined to the role of a covariate.”


Example 1: Glacial data set, from Aitchison (1986)

92 samples of pebbles of glacial tills sorted into four categories: red sandstone, gray sandstone, crystalline and miscellaneous. The percentages by weight of these four categories and the total pebble counts are recorded.

    RedSandstone GraySandstone Crystalline Misc Counts
1           91.8           7.1         1.1  0.0    282
2           88.9          10.1         0.5  0.5    368
...          ...           ...         ...
90          15.9          83.3         0.8  0.0    245
91          16.9          74.3         1.2  5.9    575
92          31.4          65.9         2.7  0.0    698

“The glaciologist is interested in describing the pattern of variability of his data and whether the compositions are in any way related to abundance.”


Compositions (compounds, mixtures, alloys, ...) can be represented by vectors of the portions of their individual components. The portions are nonnegative and have a constant sum, equal to 100 (percentages) or 1 (proportions).

The sample space for compositions is the (unit) simplex S^D.

For D=3 it is represented graphically by a ternary diagram.

For D=4 it is represented graphically by a tetrahedron.
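The constant-sum constraint can be illustrated with a minimal R sketch. The helper `closure` is a hypothetical name introduced here for illustration and is not a MixeR routine; it rescales nonnegative rows to a chosen constant, using two rows of the glacial data from Example 1.

```r
# Hypothetical helper (not part of MixeR): rescale each row of a
# nonnegative matrix so that it sums to the constant `total`.
closure <- function(x, total = 1) {
  x <- as.matrix(x)
  stopifnot(all(x >= 0), all(rowSums(x) > 0))
  # A matrix divided by a vector of length nrow(x) divides row i by element i.
  x / rowSums(x) * total
}

# Samples 1 and 91 of the glacial data (percentages by weight).
glacial <- rbind(c(91.8,  7.1, 1.1, 0.0),
                 c(16.9, 74.3, 1.2, 5.9))

p <- closure(glacial)        # proportions, each row sums to 1
rowSums(p)
closure(glacial, 100)        # percentages, each row sums to 100
```

After closure every row lies in the unit simplex, whatever its original total was.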


Left: three-part compositions x = (x1, x2, x3) in a ternary diagram, x1 + x2 + x3 = 1

Right: four-part compositions x = (x1, x2, x3, x4) in a tetrahedron, x1 + x2 + x3 + x4 = 1
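To make the ternary representation concrete, the sketch below maps three-part compositions to plane coordinates as a weighted average of the triangle's vertices. The function name `ternary_xy` and the vertex placement (vertex 1 at the top, vertex 2 at the right, vertex 3 at the left of a triangle with unit side) are assumptions made for illustration, not necessarily the layout used by MixeR.

```r
# Hypothetical helper: barycentric -> Cartesian coordinates.
# Assumed vertices: 1 = top (0.5, sqrt(3)/2), 2 = right (1, 0), 3 = left (0, 0).
ternary_xy <- function(x) {
  x <- matrix(x, ncol = 3)
  x <- x / rowSums(x)                       # make sure rows sum to 1
  cbind(X = 0.5 * x[, 1] + x[, 2],          # x1 * 0.5 + x2 * 1 + x3 * 0
        Y = sqrt(3) / 2 * x[, 1])           # only vertex 1 has height
}

ternary_xy(c(1, 0, 0))            # the top vertex
ternary_xy(c(1, 1, 1) / 3)        # the barycentre of the triangle
ternary_xy(rbind(c(0.1, 0.3, 0.6),
                 c(0.9, 0.03333333, 0.06666667)))
```

The same weighted-average idea extends to four parts by using the four vertices of a tetrahedron in three dimensions.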


R (http://www.r-project.org) is 'GNU S', a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...). Further extensions can be provided as packages.


In 2003 we started the MixeR project, a library of R functions to support CDA, i.e. the statistical analysis of mixtures:

• operations on compositions: perturbation and power multiplication, subcomposition with or without residuals, computing Aitchison, Euclidean and Bhattacharyya distances, the compositional Kullback-Leibler divergence, etc.

• graphical presentation of three and four part (sub)compositions in ternary diagrams and tetrahedrons with additional features: barycentre, geometric mean of the data set, percentile and ratio lines, marking and colouring of subsets of the data set, centring of the data, labelling of individual data points in the set, etc.

• logratio transformations of compositions into real vectors that are amenable to standard multivariate statistical analysis, etc.
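The operations named above have standard definitions in Aitchison's framework, which the following minimal sketch spells out in base R. All function names here are hypothetical helpers written for illustration; they are not the MixeR routines, whose names and signatures are not shown in these slides. The sketch assumes strictly positive parts.

```r
# Hypothetical helpers for illustration; not the MixeR routines.
closure <- function(x) x / sum(x)               # rescale to unit sum
perturb <- function(x, y) closure(x * y)        # perturbation
power   <- function(x, a) closure(x^a)          # power multiplication
clr     <- function(x) log(x) - mean(log(x))    # centred logratio transform
# Aitchison distance = Euclidean distance between clr-transformed vectors.
aitchison_dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

x <- c(0.1, 0.3, 0.6)
y <- c(0.2, 0.2, 0.6)
p <- c(0.5, 0.25, 0.25)

perturb(x, p)
power(x, 2)
clr(x)                                          # components sum to zero
aitchison_dist(x, y)
# Perturbing both compositions by p leaves their Aitchison distance unchanged.
aitchison_dist(perturb(x, p), perturb(y, p))
```

The clr transform is one of the logratio transformations that move compositions into ordinary real space, where standard multivariate methods apply.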


Compositional Data Analysis SW tools

• CoDa 1986 by John Aitchison, written in Quick Basic, available with Aitchison's book

• CoDa upgraded by John Bacon-Shone

• CoDaPack 2001, freeware SW by Santiago Thió and Raimon Tolosana in Excel, http://ima.udg.es/Recerca/EIO/inici_cat.html

• attempts in R by Joel Raynolds and Dean Billheimer at http://www.biostat.wustl.edu/archives/html/s-news/2003-12/msg00139.html


• MixeR 2003 by Batagelj and Bren at http://vlado.fmf.uni-lj.si/pub/MixeR

• 'compositions' package 2005, by K. Gerald van den Boogaart and Raimon Tolosana Delgado at http://cran.r-project.org/src/contrib/Descriptions/compositions.html


Mixture class in R

The input mixture data object m consists of

m$tit - the title,
m$mat - the data matrix,
m$sum - the value of the row sums, if constant, and
m$sta - the status of the mix object, with values
  -2 - matrix contains negative elements
  -1 - zero row sum exists
   0 - matrix contains zero elements
   1 - matrix contains positive elements, rows with different row sums
   2 - matrix with constant row sum
   3 - normalized mixture, the row sums are equal to 1
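The status codes can be read as a cascade of checks on the data matrix. The sketch below is a hypothetical re-implementation written only to illustrate that reading; the function name `mix_status`, the order of the checks and the tolerance handling are assumptions, and the actual MixeR code may differ.

```r
# Hypothetical illustration of the status codes; not the MixeR source.
mix_status <- function(mat, eps = 1e-6) {
  mat <- as.matrix(mat)
  rs <- unname(rowSums(mat))
  const <- all(abs(rs - rs[1]) < eps)     # do all rows share one sum?
  s <- if (const) rs[1] else NA
  if (any(mat < 0)) {
    sta <- -2                             # negative elements
  } else if (any(rs == 0)) {
    sta <- -1                             # a zero row sum exists
  } else if (any(mat == 0)) {
    sta <- 0                              # zero elements
  } else if (!const) {
    sta <- 1                              # positive, different row sums
  } else if (abs(s - 1) < eps) {
    sta <- 3                              # normalized, row sums equal 1
  } else {
    sta <- 2                              # constant row sum
  }
  list(sum = s, sta = sta)
}

# Samples 1 and 91 of the glacial data: a zero element and unequal row sums.
glacial <- rbind(c(91.8,  7.1, 1.1, 0.0),
                 c(16.9, 74.3, 1.2, 5.9))
mix_status(glacial)
mix_status(rbind(c(0.1, 0.3, 0.6), c(0.2, 0.3, 0.5)))
```

Under these assumptions the two glacial rows give a missing sum and status 0, while the second call describes a normalized mixture.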


Example 1: The glacial mixture object

> m <- mix.Read('glacial.dat')
$tit
[1] "GLACIAL DATA 92 samples of pebbles of glacial tills sorted into four categories percentages by weight"
$sum
[1] NA
$sta
[1] 0
$mat
    RedSandstone GraySandstone Crystalline Misc
1           91.8           7.1         1.1  0.0
2           88.9          10.1         0.5  0.5
...          ...           ...         ...
91          16.9          74.3         1.2  5.9
92          31.4          65.9         2.7  0.0
attr(,"class")
[1] "mixture"


The 'mix' procedures in R

mix.Read(file, eps=1e-6)
Reads mix data from the file and returns a mix object. If |m$sum - 1| < eps it sets m$sta = 3.

mix.Check(m, eps=1e-6)
Determines the m$sum and m$sta of a given mixture object m.

mix.Normalize(m, c=1)
Normalizes a given mixture object m if m$sta > 0. The row sums are normalized to the constant c, with default value c=1.

mix.Random(nr, nc, s=1)
Constructs a random mix object with nr rows and nc columns and constant row sum s.
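As a rough picture of what such a constructor might produce, the sketch below builds a list with the four components tit, mat, sum and sta and the class attribute "mixture". The function name `random_mixture`, the uniform sampling and the title string are assumptions made for illustration; how mix.Random actually draws its values is not stated in the slides.

```r
# Hypothetical stand-in for a random mixture constructor; not mix.Random itself.
random_mixture <- function(nr, nc, s = 1) {
  mat <- matrix(runif(nr * nc), nrow = nr)   # positive random entries (assumed uniform)
  mat <- mat / rowSums(mat) * s              # normalize rows to the constant s
  structure(
    list(tit = "random mixture",
         mat = mat,
         sum = s,
         sta = if (isTRUE(all.equal(s, 1))) 3 else 2),
    class = "mixture"
  )
}

set.seed(1)
r <- random_mixture(22, 3)
rowSums(r$mat)      # all equal to s
r$sta
```

Normalizing an existing matrix to another constant c is the same rescaling step applied to given rather than random data.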


Subcompositions of mixture objects

mix.Sub(m, k, Normalize=TRUE)
Subcomposition of m without the k=(k_1,...,k_r) columns, normalized if Normalize=T.

mix.Extract(m, k, Normalize=TRUE)
Subcomposition of m with only the k=(k_1,...,k_r) columns, normalized if Normalize=T.

mix.ExtractRes(m, k)
Subcomposition with the k=(k_1,...,k_r) columns; all the rest is amalgamated in the residual. The output is the normalized mixture object with r+1 columns.
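The difference between dropping columns and amalgamating them into a residual can be sketched on a plain matrix. The helpers `sub_without` and `extract_with_residual` are hypothetical names that operate on matrices rather than mix objects; they only illustrate the arithmetic described above.

```r
# Hypothetical matrix-level helpers; not the MixeR routines.
sub_without <- function(mat, k, normalize = TRUE) {
  out <- mat[, -k, drop = FALSE]                 # drop the k columns
  if (normalize) out / rowSums(out) else out
}

extract_with_residual <- function(mat, k) {
  res <- rowSums(mat[, -k, drop = FALSE])        # amalgamate everything else
  out <- cbind(mat[, k, drop = FALSE], Residual = res)
  out / rowSums(out)                             # r + 1 normalized columns
}

# Samples 1, 2 and 91 of the glacial data.
g <- rbind(c(91.8,  7.1, 1.1, 0.0),
           c(88.9, 10.1, 0.5, 0.5),
           c(16.9, 74.3, 1.2, 5.9))
colnames(g) <- c("RedSandstone", "GraySandstone", "Crystalline", "Misc")

sub_without(g, 4)                # three-part subcomposition without Misc
extract_with_residual(g, 1:2)    # two sandstones plus a residual part
```

A subcomposition whose retained parts are all zero in some row cannot be normalized, so real data with zeros needs a check before this step.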


Visualization in ternary diagram routine

mix.Ternary(m, dist, distG, cls, Center, Borders, Gmean)

Draws a ternary diagram of the mixture data m, with additional features: centring, border percentile lines and the geometric mean of the data.

The default value for Center, Borders and Gmean is FALSE.

dist - additional distances to the numbers marking the percentile lines,

distG - additional distances to the numbers marking the percentile lines of the geometric mean, and

cls - colours of the percentile lines.
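The geometric mean and centring features can be illustrated numerically. In CDA the centre of a data set is usually taken as the closed vector of column-wise geometric means, and centring perturbs every composition by the inverse of that centre. The sketch below follows this common definition with hypothetical helper names; whether mix.Ternary computes its Center and Gmean options in exactly this way is an assumption. Strictly positive data are required.

```r
# Hypothetical helpers illustrating the usual CDA definitions.
closure_rows <- function(x) x / rowSums(x)

# Centre of the data: closed vector of column-wise geometric means.
geo_centre <- function(x) {
  g <- exp(colMeans(log(x)))
  g / sum(g)
}

# Centring: divide each part by the centre, then re-close each row.
centre_data <- function(x) closure_rows(sweep(x, 2, geo_centre(x), "/"))

set.seed(42)
x <- closure_rows(matrix(runif(30, min = 0.05, max = 1), ncol = 3))

geo_centre(x)
geo_centre(centre_data(x))    # the barycentre (1/3, 1/3, 1/3)
```

After centring, the data cloud sits around the middle of the ternary diagram, which is why differences between cases become easier to see.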


LEFT: The three-part subcomposition with geometric mean

> mix.Ternary(mix.Sub(m,4),Gmean=T)

RIGHT: The same subcomposition centred for better visualization of the differences between cases, with border percentile lines showing the actual variation.

> mix.Ternary(mix.Sub(m,4),Borders=T,Center=T)


Visualization in tetrahedron routine

mix.Q2kin(fkin, m) transforms the quadrays of a four-part mixture m into 3-dimensional XYZ coordinates and writes them to a .kin file.

We display the kin file as a 3D animation with the MAGE viewer, free software available at http://kinemage.biochem.duke.edu/software/software1.html/#mage
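The coordinate transformation itself can be sketched as a weighted average of the four vertices of a regular tetrahedron. The function name `quadray_xyz` and the particular vertex coordinates below are assumptions chosen for illustration; the convention used inside mix.Q2kin and the layout of the kin file are not given in the slides, so the sketch stops at the XYZ coordinates.

```r
# Hypothetical helper: four-part compositions -> XYZ coordinates.
# Assumed vertices of a regular tetrahedron centred at the origin.
quadray_xyz <- function(x) {
  x <- matrix(x, ncol = 4)
  x <- x / rowSums(x)                    # rows must sum to 1
  V <- rbind(c( 1,  1,  1),
             c( 1, -1, -1),
             c(-1,  1, -1),
             c(-1, -1,  1))
  xyz <- x %*% V                         # weighted average of the vertices
  colnames(xyz) <- c("X", "Y", "Z")
  xyz
}

quadray_xyz(c(1, 0, 0, 0))               # first vertex
quadray_xyz(rep(0.25, 4))                # barycentre, at the origin
quadray_xyz(rbind(c(91.8,  7.1, 1.1, 0.0),
                  c(16.9, 74.3, 1.2, 5.9)))
```

Any other regular tetrahedron would serve equally well, since the choice only rotates and scales the picture.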


Snapshots of glac.kin: 3D MAGE view of the tetrahedral display of the glacial data, four-part compositions.

> mix.Q2kin("glac.kin", m)


Percentile lines routine

percentile.lines(y, direction, cls, dist, lt)

Draws percentile lines into an already drawn ternary diagram.

y - percents or portions for the percentile lines

direction - directions for the percentile lines: value 1, percentile lines to vertex No. 1 = top; value 2, to vertex No. 2 = right; value 3, to vertex No. 3 = left. The default value is direction = 1:3 (all directions).

cls - the vector of colours, first for percentile lines to vertex No. 1, second … The default value is cls = c("yellow", "yellow2", "yellow3").

dist - additional distances to the numbers marking the percentile lines, first for percentile lines to vertex No. 1… The default value is dist = c(0.05, 0.05, 0.05).

lt - the vector of line types (values 1, 2,..., 10), first for… The default value is lt = c(4,3,2).
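Geometrically, a percentile line towards a vertex is the set of compositions in which the corresponding part is held at a fixed value, so it is a segment parallel to the opposite side of the triangle. The sketch below computes the end points of such a segment. The helper names and the unit-side triangle coordinates are assumptions for illustration, and the interpretation of "to vertex No. i" as "part i held constant" is likewise an assumption about the routine.

```r
# Hypothetical helpers; vertex 1 = top, 2 = right, 3 = left (unit-side triangle).
ternary_xy <- function(x) {
  x <- matrix(x, ncol = 3)
  x <- x / rowSums(x)
  cbind(X = 0.5 * x[, 1] + x[, 2], Y = sqrt(3) / 2 * x[, 1])
}

# End points of the line on which part `direction` equals the portion p.
perc_line_ends <- function(p, direction = 1) {
  others <- setdiff(1:3, direction)
  a <- b <- rep(0, 3)
  a[direction] <- p; a[others[1]] <- 1 - p     # remainder given to one other part
  b[direction] <- p; b[others[2]] <- 1 - p     # remainder given to the other one
  rbind(ternary_xy(a), ternary_xy(b))
}

perc_line_ends(0.5, direction = 1)             # a horizontal segment
# End points of all nine decile lines for the first part.
ends <- lapply((1:9) / 10, perc_line_ends, direction = 1)
```

Each pair of end points could then be passed to a line-drawing call such as segments() on an existing plot.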


Example 2

mix object m with nine cases and three variables, i.e. a 9x3 matrix having the values 0.1 to 0.9 in the first column, with the ratios between the second and third being ½

$tit
[1] "Deciles values in the first column"
$sum
[1] 1
$sta
[1] 3
$mat
   aa         bb         cc
1 0.1 0.30000000 0.60000000
2 0.2 0.26666670 0.53333330
3 0.3 0.23333330 0.46666670
...  ...        ...        ...
9 0.9 0.03333333 0.06666667
attr(,"class")
[1] "mixture"
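The data matrix of this example follows directly from its description and can be rebuilt in a few lines of base R. This is only a sketch of the matrix; how the authors created the mix object itself is not shown, so no MixeR calls are used here.

```r
# Rebuild the 9 x 3 matrix: deciles in the first column, and the
# remainder split between the second and third parts in the ratio 1:2.
aa <- (1:9) / 10
bb <- (1 - aa) / 3
cc <- 2 * (1 - aa) / 3
mat <- cbind(aa, bb, cc)

mat
rowSums(mat)      # every row sums to 1
bb / cc           # constant ratio of one half
```

Because the ratio of the second to the third part is constant, the nine points lie on a straight line through the first vertex of the ternary diagram.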


We draw a ternary diagram with these nine points in different colours (cls) and shapes (pch), with size cex=1

> cls <- c("khaki", "pink", "sienna", "tan", ...,"purple" )

> mix.Ternary(m, col=cls, pch=0:8, cex=1)

> perc.lines(10*1:9,dir=1, cls="cyan", lt=1)

Example 3

> mix.Ternary(mix.Random(22,3))

> perc.lines(10*1:9, cls=c("blue", "blueviolet", "violet"))


LEFT: Three-part compositions with decile values in the first variable and constant ratio ½ between the second and the third variable (simulated data), with decile lines in the first direction.

RIGHT: Ternary diagram with 22 random points and decile lines in all three directions.


Conclusions

We have demonstrated some mix routines and features for the visualization of three and four part (sub)compositions, available at http://vlado.fmf.uni-lj.si/pub/MixeR

Providing for complementary use of the 'compositions' package and the MixeR routines would be a most welcome step. Our future work will therefore be to code transformation routines from the mix object to objects of the five different classes (rplus, rcomp, acomp, aplus and mult) implemented in the 'compositions' package and, of course, transformations from the four classes to mix objects. With these routines we hope to enable users to apply and benefit from both the 'compositions' package and the MixeR library routines.