package ‘phylomeasures’ - the comprehensive r … ‘phylomeasures’ january 14, 2017 ......

46
Package ‘PhyloMeasures’ January 14, 2017 Type Package Title Fast and Exact Algorithms for Computing Phylogenetic Biodiversity Measures Version 2.1 Date 2017-1-14 Author Constantinos Tsirogiannis [aut, cre], Brody Sandel [aut] Maintainer Constantinos Tsirogiannis <[email protected]> Description Given a phylogenetic tree T and an assemblage S of species represented as a subset of tips in T, we want to compute a measure of the diversity of the species in S with respect to T. The current package offers efficient algorithms that can process large phylogenetic data for several such measures. Most importantly, the package includes algorithms for computing efficiently the standardized versions of phylogenetic measures and their p-values, which are essential for null model comparisons. Among other functions, the package provides efficient computation of richness-standardized versions for indices such as the net relatedness index (NRI), nearest taxon index (NTI), phylogenetic diversity index (PDI), and the corresponding indices of two-sample measures. The package also introduces a new single-sample measure, the Core Ancestor Cost (CAC); the package provides functions for computing the value and the standardised index of the CAC and, more than that, there is an extra function available that can compute exactly any statistical moment of the measure. The package supports computations under different null models, including abundance-weighted models. License GPL-3 Imports ape, methods LazyLoad yes SystemRequirements C++11 NeedsCompilation yes Repository CRAN Date/Publication 2017-01-14 01:13:19 1

Upload: letram

Post on 10-Apr-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

Package ‘PhyloMeasures’January 14, 2017

Type Package

Title Fast and Exact Algorithms for Computing PhylogeneticBiodiversity Measures

Version 2.1

Date 2017-1-14

Author Constantinos Tsirogiannis [aut, cre], Brody Sandel [aut]

Maintainer Constantinos Tsirogiannis <[email protected]>

Description Given a phylogenetic tree T and an assemblage S of species represented asa subset of tips in T, we want to compute a measure of the diversityof the species in S with respect to T. The current package offersefficient algorithms that can process large phylogenetic data for several such measures.Most importantly, the package includes algorithms for computingefficiently the standardized versions of phylogenetic measures and their p-values, which areessential for null model comparisons. Among other functions,the package provides efficient computation of richness-standardized versionsfor indices such as the net relatedness index (NRI),nearest taxon index (NTI), phylogeneticdiversity index (PDI), and the corresponding indices of two-sample measures.The package also introduces a newsingle-sample measure, the Core Ancestor Cost (CAC); the package providesfunctions for computing the value and the standardised index of the CAC and,more than that, there is an extra function available that can compute exactlyany statistical moment of the measure. The package supports computationsunder different null models, including abundance-weighted models.

License GPL-3

Imports ape, methods

LazyLoad yes

SystemRequirements C++11

NeedsCompilation yes

Repository CRAN

Date/Publication 2017-01-14 01:13:19

1

Page 2: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

2 PhyloMeasures-package

R topics documented:PhyloMeasures-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2cac.moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4cac.pvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6cac.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8cbl.moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10cbl.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11cd.moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13cd.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14cdnt.averaged.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16cdnt.directed.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19cdnt.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21mntd.moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23mntd.pvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25mntd.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27mpd.moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29mpd.pvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31mpd.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33pd.moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35pd.pvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37pd.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39phylosor.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42unifrac.query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Index 46

PhyloMeasures-package PhyloMeasures: Fast Computations of Phylogenetic BiodiversityMeasures

Description

The package supports fast biodiversity computations on large phylogenetic data. More specifically,the package provides functions for computing the stdandardized values and p-values of several phy-logenetic biodiversity measures. The measures which are supported in the current version of thepackage are: the Phylogenetic Diversity (PD), the Mean Pairwise Distance (MPD), the Mean Near-est Taxon Distance (MNTD), the Core Ancestor Cost (CAC), the Common Branch Length (CBL),the Community Distance (CD), the Community Distance Nearest Taxon (CDNT), the PhylogeneticSorensen’s Similarity (PhyloSor), and the Unique Fraction (UniFrac).

Details

Package: PhyloMeasuresType: PackageVersion: 2.1Date: 2017-1-14License: GPL-3

Page 3: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

PhyloMeasures-package 3

The package contains three types of functions; functions that compute the (standardized) valueof a phylogenetic biodiversity measure (query functions), functions that compute the p-values ofa measure (p-value functions), and functions that compute the statistical moments of a measure(moment functions).

For the single sample measures, the computation of the standardized values, the statistical moments,and of the p-values can be done under three different null models. We refer to these models as uni-form, frequency by richness, and sequential. All these models maintain species richness, and thelatter two models are abundance-weighted. The strong advantage of this package is the very effi-cient algorithms that support the provided functions. These algorithms make it possible to processvery large phylogenies in reasonable time even on a standard computer. For more details and ex-perimental results, the reader can refer to the main paper for this package, but also the other worksthat introduce the efficient algorithms used in the package functions.

Author(s)

Constantinos Tsirogiannis and Brody Sandel

Maintainer: Constantinos Tsirogiannis <[email protected]>

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A Package for Computing PhylogeneticBiodiversity Measures and Their Statistical Moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C. and B. Sandel. 2016. Fast Computation of Measures of Phylogenetic Beta Diver-sity. PLoS ONE, 11(4): e0151167, doi:10.1371/journal.pone.0151167.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient Computation of Popular PhylogeneticTree Measures. Algorithms in Bioinformatics, LNCS 7534: 30-43.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New Algorithms for Computing PhylogeneticBiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

Webb, C.O. 2000. Exploring the Phylogenetic Structure of Ecological Communities: An Examplefor Rain Forest Trees. The American Naturalist 156: 145-155.

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Use query function to calculate pd values for each communitypd.query(bird.families,comm)

#Use query function to calculate standardised versions under the uniform modelpd.query(bird.families,comm,TRUE)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))

Page 4: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

4 cac.moments

names(weights) = bird.families$tip.label

#Use query function to calculate standardized versions under the sequential modelpd.query(bird.families,comm,TRUE,null.model="sequential",

abundance.weights=weights, reps=1000)

cac.moments Computes the statistical moments of the Core Ancestor Cost measure

Description

Calculates the mean and standard deviation of the Core Ancestor Cost (CAC) for a tree and a vectorof tip set sizes. The means and deviations can be calculated under two different null models whichmaintain species richness. When the "uniform" null model is used, the function can calculate thefirst k statistical moments of this measure. The CAC is calculated by identifying the node in thetree that is the most recent common ancestor of at least chi proportion of the tips in the set, wherechi is an input parameter larger than 0.5. The CAC is the distance of this node from the root of thetree.

Usage

cac.moments(tree, chi, sample.sizes, k=2,null.model="uniform", abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

chi A number in the interval (0.5,1]

sample.sizes A vector of non-negative integers specifying the tip set sizes for which to calcu-late moments

k A positive integer specifying the number of moments to compute (default = 2).If the "sequential" model is selected, the only values that can be used for thisargument are either one or two.

null.model A character vector (string) that defines which null model is used for computingthe moments of the measure. There are two possible null models that can beused for computing the moments: these are "uniform" and "sequential". Bothmodels maintain species richness. More specifically, the available models aredefined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlike theother model (which is computed analytically), this model uses Monte-Carlorandomization.

This argument is optional, and its default value is "uniform".

Page 5: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cac.moments 5

abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if option "sequential" is selected. The names stored at the vector mustmatch the names of the tips in the tree. This argument is redundant if the "uni-form" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifthe "uniform" model is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if the "uniform" model is selected.

Value

A k-column matrix with length(sample.sizes) rows. Entry [i,j] in the matrix gives the j-th momentfor the i-th sample size in sample.sizes . The first moment in each row is the mean, and for j largerthan one, the j-th returned moment is the raw statistical moment of order j. Given a phylogenetictree T, a sample size r and a value chi, the raw statistical moment of order j for the CAC is definedas:

ER∈Sub(T,r)[CACχ(T,R)k],

where ER∈Sub(T,r) denotes the expectation of a random variable for all tip sets in T that consist ofr tips each.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

cac.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

# Calculate first four raw moments under the uniform modelcac.moments(bird.families,0.75,1:100,k=4)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate mean and variance under the sequential modelcac.moments(bird.families,0.75,1:100,k=2,

Page 6: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

6 cac.pvalues

null.model="sequential", abundance.weights=weights, reps=1000)

cac.pvalues Computes the p-values of the Core Ancestor Cost measure

Description

Calculates the p-values of the core ancestor cost (CAC) measure for sets of tips on a phylogeny.The p-values can be calculated under two different null models which maintain species richness.The CAC is calculated by identifying the node in the tree that is the most recent common ancestorof at least chi proportion of the tips in the set, where chi is an input parameter larger than 0.5. TheCAC is the distance of this node from the root of the tree.

Usage

cac.pvalues(tree, matrix, chi, null.model="uniform",abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

chi A number in the interval (0.5,1]

null.model A character vector (string) that defines which null model is used for computingthe p-values of the measure. There are two possible null models that can be usedfor computing the p-values: these are "uniform" and "sequential". Both modelsmaintain species richness. More specifically, the available models are defined asfollows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlike theother model (which is computed analytically), this model uses Monte-Carlorandomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if option "sequential" is selected. The names stored at the vector mustmatch the names of the tips in the tree. This argument is redundant if the "uni-form" model is selected.

Page 7: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cac.pvalues 7

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifthe other null model is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if the other null model is selected.

Value

A vector which stores the computed CAC p-values. The i-th entry in this vector stores the CACp-value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

cac.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

chi=0.6

#Calculate p-values under the uniform modelcac.pvalues(bird.families,comm,chi, reps=1000)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate p-values under the sequential modelcac.pvalues(bird.families,comm,chi,null.model="sequential",

abundance.weights=weights, reps=1000)

Page 8: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

8 cac.query

cac.query Computes the (standardized) value of the Core Ancestor Cost measure

Description

Calculates the Core Ancestor Cost (CAC) for sets of tips on a phylogeny. The CAC is calculated byidentifying the node in the tree that is the most recent common ancestor of at least chi proportion ofthe tips in the set. The CAC is the distance of this node from the root of the tree. The same functioncan also calculate the standardized value of this measure under three different null models whichmaintain species richness.

Usage

cac.query(tree, matrix, chi, standardize = FALSE,null.model="uniform", abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

chi A number in the interval (0.5,1]

standardize Specifies whether the function should standardize the CAC value according tothe variation in species richness. For each input tip set R, the CAC value for Ris standardized by subtracting the mean CAC and dividing by the standard de-viation of this measure. The mean and standard deviation are calculated amongall tip sets that have the same number of elements as set R (default = FALSE)

null.model A character vector (string) that defines which null model is used for computingthe standardized values of the measure. There are two possible null modelsthat can be used for computing the standardized values: these are "uniform",and "sequential". Both models maintain species richness. More specifically, theavailable models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlike theother model (which is computed analytically), this model uses Monte-Carlorandomization.

This argument is optional, and its default value is set to "uniform".

Page 9: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cac.query 9

abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if option "sequential" is selected. The names stored at the vector mustmatch the names of the tips in the tree. This argument is redundant if the "uni-form" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifthe "uniform" model selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if this model is not selected.

Value

A vector which stores the computed (standardized) CAC values. The i-th entry in this vector storesthe CAC value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

cac.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate cac values for each communitycac.query(bird.families,comm,0.8)

#Calculate standardized versions under the uniform modelcac.query(bird.families,comm,0.8,TRUE)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))

Page 10: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

10 cbl.moments

names(weights) = bird.families$tip.label

#Use query function to calculate standardized versions under the sequential modelcac.query(bird.families,comm,0.8,TRUE,null.model="sequential",

abundance.weights=weights, reps=1000)

cbl.moments Computes the moments of the Common Branch Length measure underthe uniform null model

Description

Calculates the mean and standard deviation of the Common Branch Length (CBL) on a given treefor several pairs of tip set sizes. These calculations consider equal (uniform) probability among allpossible tip samples of the same richness.

Usage

cbl.moments(tree, sample.sizes, comp.expectation = TRUE,comp.deviation = TRUE)

Arguments

tree A phylo tree object

sample.sizes A two-column matrix of non-negative integers indicating the tip set sizes forwhich the moments should be calculated. The moments are calculated for eachrow of the matrix. Let x and y be the values that are stored in the i-th rowof sample.sizes. For this row, the CBL moments are calculated considering allpairs of tip sets that can be extracted from the input tree such that one set has xelements and the other set has y elements.

comp.expectation

Specifies whether the function computes the mean of the CBL (default = TRUE)

comp.deviation Specifies whether the function computes the standard deviation of the CBL (de-fault = TRUE)

Value

If both comp.expectation and comp.deviation are TRUE, the function returns a two-column ma-trix with one row per element in sample.sizes; the first column of the output matrix stores themean CBL and the second column stores the standard deviation of this measure. If only one ofcomp.expectation or comp.deviation is TRUE, the function returns a vector with the correspondingvalues instead.

Author(s)

Constantinos Tsirogiannis ([email protected])

Page 11: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cbl.query 11

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

Tsirogiannis, C. and B. Sandel. In prep. Fast computation of measures of phylogenetic beta diver-sity.

See Also

cbl.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

cbl.moments(bird.families,sample.sizes = expand.grid(1:10,1:10))

cbl.query Computes the (standardized) value of the Common Branch Lengthmeasure

Description

Calculates the Common Branch Length (CBL) given paired sets of tips on a phylogeny. The Com-mon Branch Length is the beta diversity version of Phylogenetic Diversity (PD), giving the totalbranch length shared between two communities. The same function can also calculate the stan-dardized value of this measure for the given tip sets. The standardized calculations consider equal(uniform) probability among all possible tip samples of the same richness.

Usage

cbl.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL, standardize = FALSE)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

Page 12: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

12 cbl.query

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the CBL values. Each row in query.matrix indicates apair of tip sets for which we want to compute the CBL value. Let k and r be thevalues that are stored in the i-th row of query.matrix, where k is the value storedin the first column and r is the value stored in the second column. If matrix.bis given, the function computes the CBL value between the k-th row of matrix.aand the r-th row of matrix.b. If matrix.b is not given, the function computes theCBL value between the k-th and r-th row of matrix.a (default = NULL)

standardize Specifies whether the function should return the standardized value of the CBLfor each sample pair. The value is standardized by subtracting the mean anddividing by the standard deviation of CBL. The mean and standard deviation arecalculated among all tip sets that have the same number of elements as the twosamples (default = FALSE)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the CBL values for all pairs of rows (tip sets) in matrix.a . If matrix.b is given but notquery.matrix, the function computes the CBL values for all combinations of a row in matrix.a withrows in matrix.b. If query.matrix is given and matrix.b is not, the function returns the CBL values forthe pairs of rows in matrix.a specified by query.matrix. If query.matrix and matrix.b are both given,CBL values are computed for the rows in matrix.a specified by the first column of query.matrixagainst the rows in matrix.b specified in the second column of query.matrix.

Value

The CBL values for the requested pairs of tip sets. If query.matrix is provided, then the values arereturned in an one-dimensional vector. The i-th element of this vector is the CBL value for the pairof tip sets indicated in the i-th row of query.matrix. If query.matrix is not provided, the CBL valuesare returned in a matrix object; entry [i,j] in the output matrix stores the CBL value between the tipsets specified on the i-th and j-th row of matrix.a (if matrix.b is not specified), or the CBL valuebetween the i-th row of matrix.a and the j-th row of matrix.b (if matrix.b is specified)

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

Tsirogiannis, C. and B. Sandel. In prep. Fast computation of measures of phylogenetic beta diver-sity.

See Also

cbl.moments

Page 13: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cd.moments 13

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate all pairwise CBL values for communities in commcbl.query(bird.families,comm)

#Calculate pairwise CBL values from#the first two rows of comm to all rowscbl.query(bird.families, comm[1:2,],comm)

#Calculate the CBL from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)cbl.query(bird.families,comm,query.matrix = qm)

#Calculate standardized versionscbl.query(bird.families,comm, standardize = TRUE)

cd.moments Computes the moments of the Community Distance measure under theuniform null model

Description

Calculates the mean and standard deviation of the Community Distance (CD) on a given tree forseveral pairs of tip set sizes. These calculations consider equal (uniform) probability among allpossible tip samples of the same richness.

Usage

cd.moments(tree, sample.sizes, comp.expectation = TRUE,comp.deviation = TRUE)

Arguments

tree A phylo tree object

sample.sizes A two-column matrix of non-negative integers indicating the tip set sizes forwhich the moments should be calculated. The moments are calculated for eachrow of the matrix. Let x and y be the values that are stored in the i-th rowof sample.sizes. For this row, the CD moments are calculated considering allpairs of tip sets that can be extracted from the input tree such that one set has xelements and the other set has y elements.

Page 14: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

14 cd.query

comp.expectation

Specifies whether the function returns the mean of the CD (default = TRUE)

comp.deviation Specifies whether the function returns the standard deviation of the CD (default= TRUE)

Value

If both comp.expectation and comp.deviation are TRUE, the function returns a two-column matrixwith one row per element in sample.sizes, such that the first column stores the mean CD and thesecond column stores the standard deviation of this measure. If only one of comp.expectation orcomp.deviation is TRUE, the function returns a vector with the corresponding values instead.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

Tsirogiannis, C. and B. Sandel. In prep. Fast computation of measures of phylogenetic beta diver-sity.

See Also

cd.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

cd.moments(bird.families,sample.sizes = expand.grid(1:10,1:10))

cd.query Computes the (standardized) value of the Community Distance mea-sure

Description

Calculates the Community Distance (CD) given paired sets of tips on a phylogeny. The CommunityDistance is the beta diversity version of Mean Pairwise Distance (MPD), giving the average phylo-genetic distance between two communities. The same function can also calculate the standardizedvalue of this measure for the given tip sets. The standardized calculations consider equal (uniform)probability among all possible tip samples of the same richness.

Page 15: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cd.query 15

Usage

cd.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL, standardize = FALSE)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the CD values. Each row in query.matrix indicates a pairof tip sets for which we want to compute the CD value. Let k and r be the valuesthat are stored in the i-th row of query.matrix, where k is the value stored inthe first column and r is the value stored in the second column. If matrix.b isgiven, the function computes the CD value between the k-th row of matrix.a andthe r-th row of matrix.b. If matrix.b is not given, the function computes the CDvalue between the k-th and r-th row of matrix.a (default = NULL)

standardize Specifies whether the function should return the standardized value of the CDfor each sample pair. The value is standardized by subtracting the mean anddividing by the standard deviation of the CD. The mean and standard deviationare calculated among all tip sets that have the same number of elements as thetwo samples (default = FALSE)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the CD values for all pairs of rows (tip sets) in matrix.a . If matrix.b is given but notquery.matrix, the function computes the CD values for all combinations of a row in matrix.a withrows in matrix.b. If query.matrix is given and matrix.b is not, the function returns the CD valuesfor the pairs of rows in matrix.a specified by query.matrix. If query.matrix and matrix.b are bothgiven, CD values are computed for the rows in matrix.a specified by the first column of query.matrixagainst the rows in matrix.b specified in the second column of query.matrix.

Value

The CD values for the requested pairs of tip sets. If query.matrix is provided, then the values arereturned in an one-dimensional vector. The i-th element of this vector is the CD value for the pair oftip sets indicated in the i-th row of query.matrix. If query.matrix is not provided, the CD values arereturned in a matrix object; entry [i,j] in the output matrix stores the CD value between the tip setsspecified on the i-th and j-th row of matrix.a (if matrix.b is not specified), or the CD value betweenthe i-th row of matrix.a and the j-th row of matrix.b (if matrix.b is specified)

Page 16: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

16 cdnt.averaged.query

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

Tsirogiannis, C. and B. Sandel. In prep. Fast computation of measures of phylogenetic beta diver-sity.

See Also

cd.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate all pairwise CD values for communities in commcd.query(bird.families,comm)

#Calculate pairwise distances from#the first two rows of comm to all rowscd.query(bird.families, comm[1:2,],comm)

#Calculate the distances from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)cd.query(bird.families,comm,query.matrix = qm)

#Calculate standardized versionscd.query(bird.families,comm,standardize = TRUE)

cdnt.averaged.query Computes the value of the averaged Community Distance NearestTaxon measure

Page 17: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cdnt.averaged.query 17

Description

Calculates the averaged Community Distance Nearest Taxon (aCDNT) given paired sets of tips ona phylogeny. The aCDNT is a beta diversity version of MNTD. It is computed based on the valuesof the directed CDNT (dCDNT); let A and B be two tip sets in the input tree. Let dCDNT(A,B) bethe directed CDNT from A to B, and let dCDNT(B,A) be the directed CDNT from B to A. Let also|A| denote the number of elements in A, and |B| the number of elements in B. The aCDNT betweenthese two tip sets is equal to:

|A| · dCDNT(A,B) + |B| · dCDNT(B,A)

|A|+ |B|

The version of this function that computes the standardised value of aCDNT is not yet available.

Usage

cdnt.averaged.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the aCDNT values. Each row in query.matrix indicatesa pair of tip sets for which we want to compute the aCDNT value. Let k andr be the values that are stored in the i-th row of query.matrix, where k is thevalue stored in the first column and r is the value stored in the second column.If matrix.b is given, the aCDNT value is computed between the k-th row ofmatrix.a and the r-th row of matrix.b. If matrix.b is not given, the aCDNT valueis computed between the k-th and r-th row of matrix.a (default = NULL)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the aCDNT values for all pairs of rows (tip sets) in matrix.a . If matrix.b is given but notquery.matrix, the function computes the aCDNT values for all combinations of a row in matrix.awith rows in matrix.b. If query.matrix is given and matrix.b is not, the function returns the aCDNTvalues for the pairs of rows in matrix.a specified by query.matrix. If query.matrix and matrix.b areboth given, aCDNT values are computed for the rows in matrix.a specified by the first column ofquery.matrix against the rows in matrix.b specified in the second column of query.matrix

Page 18: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

18 cdnt.averaged.query

Value

The aCDNT values for the requested pairs of tip sets. If query.matrix is provided, then the valuesare returned in an one-dimensional vector. The i-th element of this vector is the aCDNT value forthe pair of tip sets indicated in the i-th row of query.matrix. If query.matrix is not provided, theaCDNT values are returned in a matrix object; entry [i,j] in the output matrix stores the aCDNTvalue between the tip sets specified on the i-th and j-th row of matrix.a (if matrix.b is not specified),or the aCDNT value between the i-th row of matrix.a and the j-th row of matrix.b (if matrix.b isspecified)

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

See Also

cdnt.directed.query, cdnt.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate all pairwise aCDNT values for communities in commcdnt.averaged.query(bird.families,comm)

#Calculate pairwise distances from#the first two rows of comm to all rowscdnt.averaged.query(bird.families, comm[1:2,],comm)

#Calculate the distances from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)cdnt.averaged.query(bird.families,comm,query.matrix = qm)

Page 19: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cdnt.directed.query 19

cdnt.directed.query Computes the value of the directed Community Distance NearestTaxon measure

Description

Calculates the directed Community Distance Nearest Taxon (dCDNT) given paired sets of tips ona phylogeny. The dCDNT is a beta diversity version of MNTD, giving the mean distance on thetree of tips in a set A to their nearest neighbors in a set B. Note that the measure is asymmetrical,as for taxon t in A the nearest neighbor in B may be taxon s, but for taxon s the nearest neighbor inA might be a third taxon u. Therefore, for every input pair of tip sets A and B, the function returnstwo values; the dCDNT value from A to B, and the dCDNT value from B to A. The version of thisfunction that computes the standardised value of the measure is not yet available.

Usage

cdnt.directed.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the dCDNT values. Each row in query.matrix indicatesa pair of tip sets for which we want to compute the two dCDNT values. Let kand r be the values that are stored in the i-th row of query.matrix, where k is thevalue stored in the first column and r is the value stored in the second column.If matrix.b is given, the function computes both the dCDNT value from the tipset in the k-th row of matrix.a to the tip set in the r-th row of matrix.b, and thedCDNT value from the tip set in the r-th row of matrix.b to the tip set in thek-th row of matrix.a . If matrix.b is not given, the two values computed are thedCDNT from the k-th to the r-th set of matrix.a, and the dCDNT from the r-thto the k-th set of matrix.a (default = NULL)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the two dCDNT values for each pair of rows in matrix.a . If matrix.b is given but notquery.matrix, the function computes the two dCDNT values for each combination of a row in ma-trix.a with a row in matrix.b . If query.matrix is given and matrix.b is not, the function returns

Page 20: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

20 cdnt.directed.query

two dCDNT values for each pair of rows in matrix.a specified by query.matrix. If query.matrixand matrix.b are both given, two dCDNT values are computed for each pair of a row in matrix.aspecified by the fist column of query.matrix with a row in matrix.b specified in the second columnof query.matrix.

Value

The dCDNT values for the requested pairs of tip sets. If query.matrix is provided, then the valuesare returned in a list that contains two elements; each element is a one-dimensional vector storingdCDNT values. Let k and r be the values stored in the i-th row of query.matrix. For the first vectorin the returned list, the i-th element of this vector is the dCDNT value from the tip set of the k-throw in matrix.a to the tip set of the r-th row in matrix.a (if matrix.b is not specified), or to the tipset of the r-th row of matrix.b (if this matrix is specified). For the second vector of the output list,the i-th element of this vector is the dCDNT value from the tip set of the r-th row in matrix.a (ifmatrix.b is not specified), or from the tip set of the r-th row of matrix.b (if this matrix is specified)to the tip set of the k-th row in matrix.a.

If query.matrix is not provided and matrix.b is provided, the function returns a list which consistsof two matrix objects; for the first matrix in this list, entry [i,j] stores the dCDNT value from thetip set specified on the i-th row of matrix.a to the tip set in the j-th row of matrix.b. For the secondmatrix in the output list, entry [i,j] stores the dCDNT value from the tip set specified on the j-th rowof matrix.b to the tip set in the i-th row of matrix.a.

If neither query.matrix nor matrix.b are provided, the function returns a matrix object such thatmatrix entry [i,j] stores the dCDNT value from the tip set specified on the i-th row of matrix.a tothe tip set in the j-th row of this matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

See Also

cdnt.query, cdnt.averaged.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

Page 21: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

cdnt.query 21

#Calculate all pairwise dCDNT values for communities in commcdnt.directed.query(bird.families,comm)#Note that this matrix is asymmetrical, compare to:cdnt.query(bird.families,comm)cdnt.averaged.query(bird.families,comm)

#Calculate pairwise distances from#the first two rows of comm to all rowscdnt.directed.query(bird.families, comm[1:2,],comm)

#Calculate the distances from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)cdnt.directed.query(bird.families,comm,query.matrix = qm)

cdnt.query Computes the value of the maximised Community Distance NearestTaxon measure

Description

Calculates the maximised Community Distance Nearest Taxon (mCDNT) given paired sets of tipson a phylogeny. The mCDNT is a beta diversity version of MNTD, and provides the distance onthe tree between nearest neighbors in sets A and B. It is derived by calculating the directed CDNT(dCDNT) from set A to set B, and from set B to set A, and taking the maximum of these two values.A version of this function that computes the standardised value of the measure is not yet available.

Usage

cdnt.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the mCDNT values. Each row in query.matrix indicatesa pair of tip sets for which we want to compute the mCDNT value. Let k andr be the values that are stored in the i-th row of query.matrix, where k is the

Page 22: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

22 cdnt.query

value stored in the first column and r is the value stored in the second column.If matrix.b is given, the mCDNT value is computed between the k-th row ofmatrix.a and the r-th row of matrix.b. If matrix.b is not given, the mCDNT valueis computed between the k-th and r-th row of matrix.a (default = NULL)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the mCDNT values for all pairs of rows (tip sets) in matrix.a . If matrix.b is given but notquery.matrix, the function computes the mCDNT values for all combinations of a row in matrix.awith rows in matrix.b. If query.matrix is given and matrix.b is not, the function returns the mCDNTvalues for the pairs of rows in matrix.a specified by query.matrix. If query.matrix and matrix.b areboth given, mCDNT values are computed for the rows in matrix.a specified by the first column ofquery.matrix against the rows in matrix.b specified in the second column of query.matrix

Value

The mCDNT values for the requested pairs of tip sets. If query.matrix is provided, then the valuesare returned in an one-dimensional vector. The i-th element of this vector is the mCDNT value forthe pair of tip sets indicated in the i-th row of query.matrix. If query.matrix is not provided, themCDNT values are returned in a matrix object; entry [i,j] in the output matrix stores the mCDNTvalue between the tip sets specified on the i-th and j-th row of matrix.a (if matrix.b is not specified),or the mCDNT value between the i-th row of matrix.a and the j-th row of matrix.b (if matrix.b isspecified)

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

See Also

cdnt.directed.query, cdnt.averaged.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

Page 23: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

mntd.moments 23

#Calculate all pairwise mCDNT values for communities in commcdnt.query(bird.families,comm)

#Calculate pairwise distances from#the first two rows of comm to all rowscdnt.query(bird.families, comm[1:2,],comm)

#Calculate the distances from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)cdnt.query(bird.families,comm,query.matrix = qm)

mntd.moments Computes the moments of the Mean Nearest Taxon Distance measure

Description

Calculates the mean and standard deviation of the Mean Nearest Taxon Distance (MNTD) for a treeand a vector of tip set sizes. The means and deviations can be calculated under three different nullmodels which maintain species richness. Note: this function returns a result for the "uniform" and"frequency.by.richness" models only if the input tree is ultrametric.

Usage

mntd.moments(tree, sample.sizes, comp.expectation = TRUE, comp.deviation = TRUE,null.model="uniform", abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

sample.sizes A vector of non-negative integers specifying the tip set sizes for which to calcu-late moments

comp.expectation

Specifies whether the function should compute the mean (default = TRUE)

comp.deviation Specifies whether the function should compute the standard deviation (default =TRUE)

null.model A character vector (string) that defines which null model is used for computingthe moments of the measure. There are three possible null models that can beused for computing the moments: these are "uniform", "frequency.by.richness",and "sequential". All these models maintain species richness. More specifically,the available models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "frequency.by.richness" is an abundance-weighted model where speciessamples are chosen in a manner similar to the following process; first, each

Page 24: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

24 mntd.moments

species is selected independently with probability proportional to its abun-dance. If the resulting sample consists of exactly the same number of ele-ments as the input assemblage then it is used by the null model, otherwiseit is tossed and the whole process is repeated.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlikethe other two models (which are computed analytically), this model usesMonte-Carlo randomization.

This argument is optional, and its default value is set to "uniform".

abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if either of the options "frequency.by.richness" or "sequential" are se-lected. The names stored at the vector must match the names of the tips in thetree. This argument is redundant if the "uniform" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifany of the other two null models is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if any of the other two null models is selected.

Value

If both comp.expectation and comp.deviation are TRUE, the function returns a two-column matrixwith one row per element in sample.sizes, where the first column stores the mean MNTD and thesecond column stores the standard deviation for this sample size. If only one of comp.expectationor comp.deviation are TRUE, the function returns a vector with the corresponding values instead.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

Webb, C.O. 2000. Exploring the phylogenetic structure of ecological communities: An example forrain forest trees. The American Naturalist 156: 145-155.

See Also

mntd.query

Page 25: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

mntd.pvalues 25

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

# Calculate mean and variance under the uniform modelmntd.moments(bird.families,1:100)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate mean and variance under the sequential modelmntd.moments(bird.families,1:100,

null.model="sequential", abundance.weights=weights, reps=1000)

mntd.pvalues Computes the p-values of the Mean Nearest Taxon Distance measure

Description

Calculates the p-values of the mean nearest taxon distance (MNTD) measure for sets of tips on aphylogeny. The p-values can be calculated under two different null models which maintain speciesrichness.

Usage

mntd.pvalues(tree, matrix, null.model="uniform",abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

null.model A character vector (string) that defines which null model is used for computingthe p-values of the measure. There are two possible null models that can be usedfor computing the p-values: these are "uniform" and "sequential". Both modelsmaintain species richness. More specifically, the available models are defined asfollows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

Page 26: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

26 mntd.pvalues

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlike theother model (which is computed analytically), this model uses Monte-Carlorandomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if option "sequential" is selected. The names stored at the vector mustmatch the names of the tips in the tree. This argument is redundant if the "uni-form" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifthe other null model is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if the other null model is selected.

Value

A vector which stores the computed MNTD p-values. The i-th entry in this vector stores the MNTDp-value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient computation of popular phylogenetictree measures. Algorithms in Bioinformatics, LNCS 7534: 30-43.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

mntd.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

Page 27: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

mntd.query 27

#Calculate p-values under the uniform modelmntd.pvalues(bird.families,comm, reps=1000)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate p-values under the sequential modelmntd.pvalues(bird.families,comm,null.model="sequential",

abundance.weights=weights, reps=1000)

mntd.query Computes the (standardized) value of the Mean Nearest Taxon Dis-tance measure

Description

Calculates the Mean Nearest Taxon Distance (MNTD) for sets of tips on a phylogeny. The samefunction can also calculate the standardised value of this measure under three different null modelswhich maintain species richness (the standardised value is equal to minus one times the so-calledNearest Taxon Index, NTI). Note: if the input argument is.standardised is set to TRUE and the nullmodel used is either "uniform" or "frequency.by.richness", then this function returns a result only ifthe input tree is ultrametric.

Usage

mntd.query(tree, matrix, standardize = FALSE,null.model="uniform", abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

standardize Specifies whether the function should standardize the MNTD for variation inspecies richness. For each tip set S, the observed MNTD is standardized bysubtracting the mean MNTD and dividing by the standard deviation of this mea-sure. The mean and standard deviation are calculated among all tip sets thathave the same number of elements as set S, the tip set whose value we want tostandardize (default = FALSE).

Page 28: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

28 mntd.query

null.model A character vector (string) that defines which null model is used for computingthe standardized values of the measure. There are three possible null modelsthat can be used for computing the standardized values: these are "uniform","frequency.by.richness", and "sequential". All these models maintain speciesrichness. More specifically, the available models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "frequency.by.richness" is an abundance-weighted model where speciessamples are chosen in a manner similar to the following process; first, eachspecies is selected independently with probability proportional to its abun-dance. If the resulting sample consists of exactly the same number of ele-ments as the input assemblage then it is used by the null model, otherwiseit is tossed and the whole process is repeated.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlikethe other two models (which are computed analytically), this model usesMonte-Carlo randomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if either of the options "frequency.by.richness" or "sequential" are se-lected. The names stored at the vector must match the names of the tips in thetree. This argument is redundant if the "uniform" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifany of the other two null models is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if any of the other two null models is selected.

Value

A vector which stores the computed (standardized) MNTD values. The i-th entry in this vectorstores the (standardized) MNTD value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

Webb, C.O. 2000. Exploring the phylogenetic structure of ecological communities: An example forrain forest trees. The American Naturalist 156: 145-155.

Page 29: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

mpd.moments 29

See Also

mntd.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate mntd values for each communitymntd.query(bird.families,comm)

#Calculate standardized versions under the uniform modelmntd.query(bird.families,comm,TRUE)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

#Use query function to calculate standardized versions under the sequential modelmntd.query(bird.families,comm,TRUE,null.model="sequential",

abundance.weights=weights, reps=1000)

mpd.moments Computes the moments of the Mean Pairwise Distance measure

Description

Calculates the mean and standard deviation of the Mean Pairwise Distance (MPD) for a tree and avector of tip set sizes. The means and deviations can be calculated under three different null modelswhich maintain species richness.

Usage

mpd.moments(tree, sample.sizes, comp.expectation = TRUE, comp.deviation = TRUE,null.model="uniform", abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

sample.sizes A vector of non-negative integers specifying the tip set sizes for which to calcu-late moments

comp.expectation

Specifies whether the function should compute the mean (default = TRUE)

Page 30: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

30 mpd.moments

comp.deviation Specifies whether the function should compute the standard deviation (default =TRUE)

null.model A character vector (string) that defines which null model is used for computingthe moments of the measure. There are three possible null models that can beused for computing the moments: these are "uniform", "frequency.by.richness",and "sequential". All these models maintain species richness. More specifically,the available models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "frequency.by.richness" is an abundance-weighted model where speciessamples are chosen in a manner similar to the following process; first, eachspecies is selected independently with probability proportional to its abun-dance. If the resulting sample consists of exactly the same number of ele-ments as the input assemblage then it is used by the null model, otherwiseit is tossed and the whole process is repeated.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlikethe other two models (which are computed analytically), this model usesMonte-Carlo randomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if either of the options "frequency.by.richness" or "sequential" are se-lected. The names stored at the vector must match the names of the tips in thetree. This argument is redundant if the "uniform" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifany of the other two null models is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if any of the other two null models is selected.

Value

If both comp.expectation and comp.deviation are TRUE, the function returns a two-column matrixwith one row per element in sample.sizes, where the first column stores the mean MPD and thesecond column stores the standard deviation for this sample size. If only one of comp.expectationor comp.deviation are TRUE, the function returns a vector with the corresponding values instead.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Page 31: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

mpd.pvalues 31

Webb, C.O. 2000. Exploring the phylogenetic structure of ecological communities: An example forrain forest trees. The American Naturalist 156: 145-155.

See Also

mpd.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

# Calculate mean and variance under the uniform modelmpd.moments(bird.families,1:100)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate mean and variance under the sequential modelmpd.moments(bird.families,1:100,

null.model="sequential", abundance.weights=weights, reps=1000)

mpd.pvalues Computes the p-values of the Mean Pairwise Distance measure

Description

Calculates the p-values of the mean pairwise distance (MPD) measure for sets of tips on a phy-logeny. The p-values can be calculated under two different null models which maintain speciesrichness.

Usage

mpd.pvalues(tree, matrix, null.model="uniform",abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

Page 32: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

32 mpd.pvalues

null.model A character vector (string) that defines which null model is used for computingthe p-values of the measure. There are two possible null models that can be usedfor computing the p-values: these are "uniform" and "sequential". Both modelsmaintain species richness. More specifically, the available models are defined asfollows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlike theother model (which is computed analytically), this model uses Monte-Carlorandomization.

This argument is optional, and its default value is set to "uniform".

abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if option "sequential" is selected. The names stored at the vector mustmatch the names of the tips in the tree. This argument is redundant if the "uni-form" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifthe other null model is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if the other null model is selected.

Value

A vector which stores the computed MPD p-values. The i-th entry in this vector stores the MPDp-value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient computation of popular phylogenetictree measures. Algorithms in Bioinformatics, LNCS 7534: 30-43.

See Also

mpd.moments

Page 33: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

mpd.query 33

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate p-values under the uniform modelmpd.pvalues(bird.families,comm, reps=1000)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate p-values under the sequential modelmpd.pvalues(bird.families,comm,null.model="sequential",

abundance.weights=weights, reps=1000)

mpd.query Computes the (standardized) value of the Mean Pairwise Distancemeasure

Description

Calculates the Mean Pairwise Distance (MPD) measure for sets of tips on a phylogeny. The samefunction can also calculate the standardized value of this measure under three different null modelswhich maintain species richness (this is equal to the Net Relatedness Index, NRI).

Usage

mpd.query(tree, matrix, standardize = FALSE,null.model="uniform", abundance.weights, reps=1000, seed)

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

standardize Specifies whether the function should standardize the MPD for variation inspecies richness. For each tip set S, the observed MPD is standardized by sub-tracting the mean MPD and dividing by the standard deviation of this measure.The mean and standard deviation are calculated among all tip sets that have the

Page 34: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

34 mpd.query

same number of elements as set S, the tip set whose value we want to standardize(default = FALSE).

null.model A character vector (string) that defines which null model is used for computingthe standardized values of the measure. There are three possible null modelsthat can be used for computing the standardized values: these are "uniform","frequency.by.richness", and "sequential". All these models maintain speciesrichness. More specifically, the available models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "frequency.by.richness" is an abundance-weighted model where speciessamples are chosen in a manner similar to the following process; first, eachspecies is selected independently with probability proportional to its abun-dance. If the resulting sample consists of exactly the same number of ele-ments as the input assemblage then it is used by the null model, otherwiseit is tossed and the whole process is repeated.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlikethe other two models (which are computed analytically), this model usesMonte-Carlo randomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if either of the options "frequency.by.richness" or "sequential" are se-lected. The names stored at the vector must match the names of the tips in thetree. This argument is redundant if the "uniform" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifany of the other two null models is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if any of the other two null models is selected.

Value

A vector which stores the computed (standardized) MPD values. The i-th entry in this vector storesthe (standardized) MPD value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient computation of popular phylogenetictree measures. Algorithms in Bioinformatics, LNCS 7534: 30-43.

Page 35: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

pd.moments 35

Webb, C.O. 2000. Exploring the phylogenetic structure of ecological communities: An example forrain forest trees. The American Naturalist 156: 145-155.

See Also

mpd.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate mpd values for each communitympd.query(bird.families,comm)

#Calculate standardized versions under the uniform modelmpd.query(bird.families,comm,TRUE)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

#Use query function to calculate standardized versions under the sequential modelmpd.query(bird.families,comm,TRUE,null.model="sequential",

abundance.weights=weights, reps=1000)

pd.moments Computes the moments of the Phylogenetic Diversity measure

Description

Calculates the mean and standard deviation of the unrooted Phylogenetic Diversity (PD) for a treeand a vector of tip set sizes. The means and deviations can be calculated under three different nullmodels which maintain species richness.

Usage

pd.moments(tree, sample.sizes, comp.expectation = TRUE, comp.deviation = TRUE,null.model="uniform", abundance.weights, reps=1000, seed)

Page 36: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

36 pd.moments

Arguments

tree A phylo tree object

sample.sizes A vector of non-negative integers specifying the tip set sizes for which to calcu-late moments

comp.expectation

Specifies whether the function should compute the mean (default = TRUE)

comp.deviation Specifies whether the function should compute the standard deviation (default =TRUE)

null.model A character vector (string) that defines which null model is used for computingthe moments of the measure. There are three possible null models that can beused for computing the moments: these are "uniform", "frequency.by.richness",and "sequential". All these models maintain species richness. More specifically,the available models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "frequency.by.richness" is an abundance-weighted model where speciessamples are chosen in a manner similar to the following process; first, eachspecies is selected independently with probability proportional to its abun-dance. If the resulting sample consists of exactly the same number of ele-ments as the input assemblage then it is used by the null model, otherwiseit is tossed and the whole process is repeated.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlikethe other two models (which are computed analytically), this model usesMonte-Carlo randomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if either of the options "frequency.by.richness" or "sequential" are se-lected. The names stored at the vector must match the names of the tips in thetree. This argument is redundant if the "uniform" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifany of the other two null models is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if any of the other two null models is selected.

Value

If both comp.expectation and comp.deviation are TRUE, the function returns a two-column matrixwith one row per element in sample.sizes, where the first column stores the mean PD and thesecond column stores the standard deviation for this sample size. If only one of comp.expectationor comp.deviation are TRUE, the function returns a vector with the corresponding values instead.

Page 37: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

pd.pvalues 37

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Faith, D.P. 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation 61:1-10.

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

pd.query

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

# Calculate mean and variance under the uniform modelpd.moments(bird.families,1:100)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

# Calculate mean and variance under the sequential modelpd.moments(bird.families,1:100,

null.model="sequential", abundance.weights=weights, reps=1000)

pd.pvalues Computes the p-values of the unrooted Phylogenetic Diversity measure

Description

Calculates the p-values of the unrooted phylogenetic diversity (PD) measure for sets of tips on aphylogeny. The p-values can be calculated under two different null models which maintain speciesrichness.

Usage

pd.pvalues(tree, matrix, null.model="uniform",abundance.weights, reps=1000, seed)

Page 38: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

38 pd.pvalues

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

null.model A character vector (string) that defines which null model is used for computingthe p-values of the measure. There are two possible null models that can be usedfor computing the p-values: these are "uniform" and "sequential". Both modelsmaintain species richness. More specifically, the available models are defined asfollows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlike theother model (which is computed analytically), this model uses Monte-Carlorandomization.

This argument is optional, and its default value is set to "uniform".abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if option "sequential" is selected. The names stored at the vector mustmatch the names of the tips in the tree. This argument is redundant if the "uni-form" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifthe other null model is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if the other null model is selected.

Value

A vector which stores the computed PD p-values. The i-th entry in this vector stores the PD p-valueof the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Faith, D.P. 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation 61:1-10.

Page 39: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

pd.query 39

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient computation of popular phylogenetictree measures. Algorithms in Bioinformatics, LNCS 7534: 30-43.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

pd.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate p-values under the uniform modelpd.pvalues(bird.families,comm, reps=1000)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

#Use query function to calculate standardized versions under the sequential modelpd.pvalues(bird.families,comm,null.model="sequential",

abundance.weights=weights, reps=1000)

pd.query Computes the (standardized) value of the unrooted Phylogenetic Di-versity measure

Description

Calculates the unrooted phylogenetic diversity (PD) measure for sets of tips on a phylogeny. Thesame function can also calculate the standardized value of this measure under three different nullmodels which maintain species richness (this is equal to the Phylogenetic Diversity Index, PDI).

Usage

pd.query(tree, matrix, standardize = FALSE,null.model="uniform", abundance.weights, reps=1000, seed)

Page 40: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

40 pd.query

Arguments

tree A phylo tree object

matrix A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error.

standardize Specifies whether the function should standardize the PD for variation in speciesrichness. For each tip set S, the observed PD is standardized by subtracting themean PD and dividing by the standard deviation of this measure. The mean andstandard deviation are calculated among all tip sets that have the same numberof elements as set S, the tip set whose value we want to standardize (default =FALSE).

null.model A character vector (string) that defines which null model is used for computingthe standardized values of the measure. There are three possible null modelsthat can be used for computing the standardized values: these are "uniform","frequency.by.richness", and "sequential". All these models maintain speciesrichness. More specifically, the available models are defined as follows:

• "uniform" considers samples with equal (uniform) probability among allpossible tip samples of the same richness.

• "frequency.by.richness" is an abundance-weighted model where speciessamples are chosen in a manner similar to the following process; first, eachspecies is selected independently with probability proportional to its abun-dance. If the resulting sample consists of exactly the same number of ele-ments as the input assemblage then it is used by the null model, otherwiseit is tossed and the whole process is repeated.

• "sequential" is an abundance-weighted null model where species samplesare chosen based on the same method as R’s sample function. Unlikethe other two models (which are computed analytically), this model usesMonte-Carlo randomization.

This argument is optional, and its default value is set to "uniform".

abundance.weights

A vector of positive numeric values. These are the abundance weights that willbe used if either of the options "frequency.by.richness" or "sequential" are se-lected. The names stored at the vector must match the names of the tips in thetree. This argument is redundant if the "uniform" model is selected.

reps An integer that defines the number of Monte-Carlo random repetitions that willbe performed when using the "sequential" model. This argument is redundant ifany of the other two null models is selected.

seed A positive integer that defines the random seed used in the Monte-Carlo ran-domizations of the "sequential" model. This argument is optional, and becomesredundant if any of the other two null models is selected.

Page 41: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

pd.query 41

Value

A vector which stores the computed (standardized) PD values. The i-th entry in this vector storesthe (standardized) PD value of the i-th row in the input matrix.

Author(s)

Constantinos Tsirogiannis ([email protected])

References

Faith, D.P. 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation 61:1-10.

Tsirogiannis, C. and B. Sandel. 2015. PhyloMeasures: A package for computing phylogeneticbiodiversity measures and their statistical moments. Ecography, doi: 10.1111/ecog.01814, 2015.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient computation of popular phylogenetictree measures. Algorithms in Bioinformatics, LNCS 7534: 30-43.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogeneticbiodiversity. Algorithms in Bioinformatics, LNCS 8701: 187-203.

See Also

pd.moments

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 100 random communities with 50 families eachcomm = matrix(0,nrow = 100,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Use query function to calculate pd values for each communitypd.query(bird.families,comm)

#Use query function to calculate standardized versions under the uniform modelpd.query(bird.families,comm,TRUE)

# Create random abundance weightsweights = runif(length(bird.families$tip.label))names(weights) = bird.families$tip.label

#Use query function to calculate standardized versions under the sequential modelpd.query(bird.families,comm,TRUE,null.model="sequential",

abundance.weights=weights, reps=1000)

Page 42: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

42 phylosor.query

phylosor.query Computes the value of the Phylogenetic Sorensen’s Similarity measure

Description

Calculates the Phylogenetic Sorensen’s Similarity (PhyloSor) given paired sets of tips on a phy-logeny. A version of this function that computes the standardised value of the measure is not yetavailable.

Usage

phylosor.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the PhyloSor values. Each row in query.matrix indicatesa pair of tip sets for which we want to compute the PhyloSor value. Let k andr be the values that are stored in the i-th row of query.matrix, where k is thevalue stored in the first column and r is the value stored in the second column.If matrix.b is given, the function computes the PhyloSor value between the k-throw of matrix.a and the r-th row of matrix.b. If matrix.b is not given, the functioncomputes the PhyloSor value between the k-th and r-th row of matrix.a (default= NULL)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the PhyloSor values for all pairs of rows (tip sets) in matrix.a . If matrix.b is given but notquery.matrix, the function computes the PhyloSor values for all combinations of a row in matrix.awith rows in matrix.b. If query.matrix is given and matrix.b is not, the function returns the PhyloSorvalues for the pairs of rows in matrix.a specified by query.matrix. If query.matrix and matrix.b areboth given, PhyloSor values are computed for the rows in matrix.a specified by the first column ofquery.matrix against the rows in matrix.b specified in the second column of query.matrix.

Page 43: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

unifrac.query 43

Value

The PhyloSor values for the requested pairs of tip sets. If query.matrix is provided, then the valuesare returned in an one-dimensional vector. The i-th element of this vector is the PhyloSor value forthe pair of tip sets indicated in the i-th row of query.matrix. If query.matrix is not provided, thePhyloSor values are returned in a matrix object; entry [i,j] in the output matrix stores the PhyloSorvalue between the tip sets specified on the i-th and j-th row of matrix.a (if matrix.b is not specified),or the PhyloSor value between the i-th row of matrix.a and the j-th row of matrix.b (if matrix.b isspecified)

Author(s)

Constantinos Tsirogiannis ([email protected]) and Brody Sandel ([email protected])

References

Graham, C.H. and P.V.A. Fine. 2008. Phylogenetic beta diversity: linking ecological and evolu-tionary processes across space and time. Ecology Letters 11: 1265:1277.

Swenson, N.G. 2011. Phylogenetic beta diversity metrics, trait evolution and inferring functionalbeta diversity of communities. PLoS ONE: 6: e21264.

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate all pairwise PhyloSor values for communities in commphylosor.query(bird.families,comm)

#Calculate pairwise distances from#the first two rows of comm to all rowsphylosor.query(bird.families, comm[1:2,],comm)

#Calculate the distances from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)phylosor.query(bird.families,comm,query.matrix = qm)

unifrac.query Computes the value of the Unique Fraction measure

Description

Calculates the Unique Fraction (UniFrac) given paired sets of tips on a phylogeny. A version of thisfunction that computes the standardised value of the measure is not yet available.

Page 44: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

44 unifrac.query

Usage

unifrac.query(tree, matrix.a, matrix.b = NULL,query.matrix = NULL)

Arguments

tree A phylo tree object

matrix.a A matrix with binary (0/1) values, where each row represents a tip set. Each col-umn name in the matrix must match a tip label on the input tree. If not all valuesin the matrix are binary, we consider two cases; if the matrix contains only non-negative values, all values are coerced to binary ones and a warning message isprinted. If the matrix contains at least one negative value, the function throwsan error

matrix.b Optional, a second matrix with a similar format as matrix.a

query.matrix Optional, a two-column matrix specifying the pairs of rows (tip sets) for whichthe function computes the UniFrac values. Each row in query.matrix indicatesa pair of tip sets for which we want to compute the UniFrac value. Let k andr be the values that are stored in the i-th row of query.matrix, where k is thevalue stored in the first column and r is the value stored in the second column. Ifmatrix.b is given, the function computes the UniFrac value between the k-th rowof matrix.a and the r-th row of matrix.b. If matrix.b is not given, the functioncomputes the UniFrac value between the k-th and r-th row of matrix.a (default= NULL)

Details

Queries can be given in four ways. If neither matrix.b nor query.matrix are given, the functioncomputes the UniFrac values for all pairs of rows (tip sets) in matrix.a . If matrix.b is given but notquery.matrix, the function computes the UniFrac values for all combinations of a row in matrix.awith rows in matrix.b. If query.matrix is given and matrix.b is not, the function returns the UniFracvalues for the pairs of rows in matrix.a specified by query.matrix. If query.matrix and matrix.b areboth given, UniFrac values are computed for the rows in matrix.a specified by the first column ofquery.matrix against the rows in matrix.b specified in the second column of query.matrix.

Value

The UniFrac values for the requested pairs of tip sets. If query.matrix is provided, then the valuesare returned in an one-dimensional vector. The i-th element of this vector is the UniFrac value forthe pair of tip sets indicated in the i-th row of query.matrix. If query.matrix is not provided, theUniFrac values are returned in a matrix object; entry [i,j] in the output matrix stores the UniFracvalue between the tip sets specified on the i-th and j-th row of matrix.a (if matrix.b is not specified),or the UniFrac value between the i-th row of matrix.a and the j-th row of matrix.b (if matrix.b isspecified)

Author(s)

Constantinos Tsirogiannis ([email protected]) and Brody Sandel ([email protected])

Page 45: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

unifrac.query 45

References

Lozupone C. and R. Knight. 2005. UniFrac: a New Phylogenetic Method for Comparing MicrobialCommunities. Applied and Environmental Microbiology,71(12):8228-35.

Examples

#Load phylogenetic tree of bird families from package "ape"data(bird.families, package = "ape")

#Create 10 random communities with 50 families eachcomm = matrix(0,nrow = 10,ncol = length(bird.families$tip.label))for(i in 1:nrow(comm)) {comm[i,sample(1:ncol(comm),50)] = 1}colnames(comm) = bird.families$tip.label

#Calculate all pairwise UniFrac values for communities in communifrac.query(bird.families,comm)

#Calculate pairwise distances from#the first two rows of comm to all rowsunifrac.query(bird.families, comm[1:2,],comm)

#Calculate the distances from the first two rows#to all rows using the query matrixqm = expand.grid(1:2,1:10)unifrac.query(bird.families,comm,query.matrix = qm)

Page 46: Package ‘PhyloMeasures’ - The Comprehensive R … ‘PhyloMeasures’ January 14, 2017 ... (CAC); the package ... seed A positive integer that defines the random seed used in

Index

∗Topic biodiversityPhyloMeasures-package, 2

∗Topic distancePhyloMeasures-package, 2

∗Topic phylogenyPhyloMeasures-package, 2

cac.moments, 4, 7, 9cac.pvalues, 6cac.query, 5, 8cbl.moments, 10, 12cbl.query, 11, 11cd.moments, 13, 16cd.query, 14, 14cdnt.averaged.query, 16, 20, 22cdnt.directed.query, 18, 19, 22cdnt.query, 18, 20, 21

mntd.moments, 23, 26, 29mntd.pvalues, 25mntd.query, 24, 27mpd.moments, 29, 32, 35mpd.pvalues, 31mpd.query, 31, 33

pd.moments, 35, 39, 41pd.pvalues, 37pd.query, 37, 39PhyloMeasures (PhyloMeasures-package), 2PhyloMeasures-package, 2phylosor.query, 42

unifrac.query, 43

46