1001 - a tool for binary representations of unordered ... · 1" 1001 - a tool for binary...

22
1 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev E. V 1 1 University of Florida, Florida Museum of Natural History, Gainesville, FL 32611 USA. [email protected] Abstract. In modern molecular systematics, matrices of unordered multistate characters, such as DNA sequence alignments, are used for analysis with no further re-coding procedures nor any a priori determination of character polarity. Here we present 1001, a simple freely available Python-based tool that helps re-code matrices of non-additive characters as different types of binary matrices. Despite to the historical basis, our analytical approach to DNA and protein data has never been properly investigated since the beginning of the molecular age. The polarized matrices produced by 1001 can be used as the proper inputs for Cladistic analysis, as well as used as inputs for future three-taxon permutations. The 1001 binary representations of molecular data (not necessary polarized) may also be used as inputs for different parametric software. This may help to reduce the complicated sets of assumptions that normally precede either Bayesian or Maximum Likelihood analyses. PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015 PrePrints

Upload: others

Post on 27-Apr-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

1  

1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data)

Mavrodiev E. V 1

1University of Florida, Florida Museum of Natural History, Gainesville, FL 32611 USA. [email protected]

Abstract. In modern molecular systematics, matrices of unordered multistate characters,

such as DNA sequence alignments, are used for analysis with no further re-coding

procedures nor any a priori determination of character polarity. Here we present 1001, a

simple freely available Python-based tool that helps re-code matrices of non-additive

characters as different types of binary matrices. Despite to the historical basis, our

analytical approach to DNA and protein data has never been properly investigated since

the beginning of the molecular age. The polarized matrices produced by 1001 can be used

as the proper inputs for Cladistic analysis, as well as used as inputs for future three-taxon

permutations. The 1001 binary representations of molecular data (not necessary

polarized) may also be used as inputs for different parametric software. This may help to

reduce the complicated sets of assumptions that normally precede either Bayesian or

Maximum Likelihood analyses.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 2: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

2  

Introduction

In an un-polarized, binary matrix the states 0 and 1 do not represent a hypothesis

of character polarity. In an polarized binary matrix character-states 0 and 1 are considered

to be plesimorphic (“primitive”) and apomorphic (“derived”) respectively apriori to

analysis (see Kitching et al. 1998). Because in Cladistics groups must be define based

only on the synapomorphies (e. g., Donoghue and Maddison 1986, Williams and Ebach

2008), it is critical to assume states’ polarity before analysis and group only on the states

“1” of the polarized binary matrix (e. g., Platnick 1985, 2013, de Pinna, 1996, Williams

and Ebach 2008, Waegele, 2004, 2005). Therefore a fundamental problem in molecular

systematics today is that molecular matrices are not polarized (Waegele 2004, 2005) and

are therefore analytically uninformative form Cladistics standpoint (Williams and Ebach

2008).

Historically, polarized binary matrices were proposed as an ideal data format for

cladistics analysis following the argumentation schemes of Willi Hennig, who

determined each character’s polarity before the construction of a cladogram (e. g.,

Swofford and Begle 1993, Kitching et al. 1998, Waegele 2004, 2005, Williams and

Ebach 2008, Ebach et al., 2013, Wiley and Lieberman 2011). Hennigian logic may be

clear even from the pure methodological standpoint: without a character hypothesis in

place apriori to analysis, we are unable to test hypotheses aposteriori (Ebach, personal

note).

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 3: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

3  

Given this, we have developed 1001, a computer program that converts un-

polarized molecular data matrices into the different types of binary matrices, either un-

polarized or with established polarity.

Implementing 1001

1001 is implemented as Python-based script that translates conventional matrices

of unordered multisite characters into polarized and non-polarized binary matrices written

in Phylip or Comma Separated Values (CSV) file formats. 1001 can be used with any

operating system that has a Python interpreter (e.g., Linux, Mac OS X, and Windows

(http://www.python.org/). Script been written by Dr. Matthew A . Gitzendanner

(University of Florida, Department of Biology and Florida Museum of Natural History,

Gainesville, FL 32611 USA).    

1001 accepts DNA and amino acid sequence alignments, as well non-molecular

data, in “relaxed” PHYLIP format (e. g., Maddison and Maddson 2011). All gaps and

ambiguities of the conventional multistate matrices must be recoded as "?" ("missing

entities") before running of 1001. The DNA sequence data is the subject of our primary

interest and the default option of 1001 designed for these kind of data. However, the

script may handle different kinds of unordered multistate characters.

Figures 1 (A – C), 2 and 3 provide the summary and the explanation of the results

and methods. Both methods implemented by 1001 a priori polarize conventional data by

comparing with an assumed all-plesiomorphic outgroup, as it was proposed before for

morphological data. This way of re-coding led to the non-direct methods of polarity

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 4: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

4  

estimation, as defined by Nelson (1978)(reviewed in Nixon and Carpenter 1993, de Pinna

1994, Kitching et al. 1998, Bryant 2001). Also, as it was summarized by Nixon and

Carpenter (1993: 414), the earliest mention of “out-group comparison” belong to Platnick

and Gertch (1976)(reviewed in Nixon and Carpenter 1993, see also Platnick and Gertsch

1976: 2).

The first Method (Fig. 1B) is based on a standard bioinformatics technique

frequently cited as the “Vos representation” of DNA sequences (reviewed in Bernaola-

Galvan et al. 2002) or as “CODE-4 encoding” of DNA data (Demeler and Shou 1991:

1594, see also absence/presence coding of Carine and Scotland 1999, Scotland 2000a, b

and Pleijel 1995 (“Method D”), reviewed in Kitching et al. 1998), but, additionally, with

the re-coding of the resulted 1/0 matrix as the polarized binary matrix.

Eight output files resulted from each run of 1001, if the First Method selected:

- non-polarized binary matrix, with and without invariant characters removed

(both phy and csv files);

- polarized binary matrix, with and without invariant characters removed (both

phy and csv files);

The absence/presence coding may results the similar trees as obtained with the

regularly coded characters, or may bias the original multistate data (de Laet 2005: 94-96)

therefore an additional method of the binary representation of multistate data also

implemented in 1001. Additional ways of binary coding are also possible, at least for the

DNA characters (e. g., Bernaola-Galvan et al., 2002: 106, Table 1).

This second method (Fig. 1C), or as we prefer to call it the “Cladistic” Method,

directly represents the conventional multistate alignment as a set of maximum

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 5: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

5  

relationships (Williams and Ebach 2008) following the values of the pre-selected out-

group taxon (Fig. 1). This method is designed for the polarized binary outputs only

(available in both phy and csv formats).

Both proposed methods increase the number of parsimony-informative characters.

Both polarized and non-polarized binary 1001 outputs can be used with popular

phylogenetic software, with many different statistical packages as well as an input for

three-taxon permutations (Nelson and Platnick 1991) using TAXODIUM 1.2 (polarized

binary data only)(Mavrodiev and Madorsky 2012) for the future completion of the

Cladistics analysis.

1001 is available for free from the Web (https://github.com/magitz/1001)

Breaking with the tradition of using unpolarized matrices in molecular systematics

Many popular phylogenetic applications are able to polarize characters before

analysis (e.g., command “AncState” of PAUP* (Swofford 2002) and “ancstates” of TNT

(Goloboff et al. 2008), see also the option “ancestors” included in some programs of

PHYLIP package (Felsenstein 1989). However our analytical approach to DNA and

protein data has never been properly investigated since the beginning of the molecular

age (Waegele 2004, 2005).

As clarified by Nixon and Carpenter (1993, 2012), by using unpolarized data,

modern phylogenetists are following Farris (1970, 1972, 1982)(see, however, Kluge

1976) and Meacham (1984)(reviewed Nixon and Carpenter, 1993, 2012, see also

Meacham, 1986 and Williams and Ebach 2008). For example, Meacham (1984, 1986)

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 6: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

6  

explicitly did not recommended to polarize characters before analysis, and was initially

strongly criticized for this position (Donoghue and Maddison 1986).

However, as was later clearly summarized by Swofford and Begle (1993: 3, 27) in

general agreement with Meacham (1984, 1986), it is better to infer the topology of the

tree and the character polarities simultaneously, rather than going through the two-stage

process of assigning polarities first and then estimating the tree (see also Maddison et al.,

1984, among others).

Maddison et al. (1984) and Swofford and Begle (1993) also noted that a priori

polarization of the characters is reasonable only when the polarity determination is

unambiguous (i.e., there is no heterogeneity in the outgroup for characters that are

variable within the ingroup); when the outgroup is heterogeneous, and the most

parsimonious assignment of an ancestral condition for the ingroup depends upon how the

outgroup taxa are related to each other (Swofford and Begle 1993: 3, 27, see also

Maddison et al., 1984, Nixon and Carpenter, 1993, Maddison and Madison, 2011,

Kitching et al. 1998, Lyons-Weiler et al. 1998 among others).

In other words, if the number of taxa within the out-group is in some way reduced

to one (see Arnold 1989 among others including the TNT program (Goloboff et al. 2008)

that offer a single out-group taxon as a default option) (or also in the case of

homogeneous outgroup), for a character with two or more states, the state occurring in

the outgroup can be indeed assumed to be the plesiomorphic state (Platnick and Gertsch

1976: 2, see also Watrous and Wheeler, 1981, Bryant 2001, de Pinna 1994, Kitching et

al. 1998, Donoghue and Maddison 1986, and Nixon and Carpenter 1993 for the reviews).

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 7: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

7  

We believe that numerous analytical possibilities are still missed from this simple

cladistics perspective and therefore the 1001 may help to investigate the field better.

If the characters of conventional multistate matrix are polarized, then the data

are represented in the form of relations, either “maximum” or “minimum” (Williams and

Ebach 2008). One of the goals of 1001 is the explication of sets of “maximum

relationships” as separate entities (as polarized binary matrices) for future analyses. The

listed popular software (see above) is unable to perform such explications, even if in

principle these programs can polarize matrices before analyses.

Each relation is a not equal to the conventional character anymore, but represents

the hypothesis of the relationships between taxa (e. g., Platnick et al. 1996, Williams and

Ebach 2008). Therefore the polarized binary matrix is semantically different from either

raw multistate alignment or from the non-polarized binary representation of this

alignment. One, for example, may note that the polarized binary matrix represents a kind

of structure, rather than the collection of raw characters.

Another may tell us that the notion that systematic data constitutes a normal

character by taxon matrix is not an intrinsically cladistic notion (Platnick 1993: 271, see

also Williams and Ebach 2006, 2008 and Ebach et al. 2013) and, therefore, another type

of data may require for the Cladistic analysis, especially if the last one is viewed as an

extension of comparative approach (e. g., Nelson and Platnick 1981, Williams and Ebach

2008, Rieppel et al., 2013, see also Nelson, 1970). The sets of maximum relationships

explicated by 1001 may be considered as putative candidates for the proper inputs for

Cladistic analysis.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 8: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

8  

1001 and three-item analysis

It was argued multiple times, that the three-taxon matrix constitutes the actual

systematic data (e. g., Platnick 1993: 271, see also Nelson and Platnick 1991, Williams

and Siebert 2000 and Williams and Ebach 2008). However the three-taxon representation

of unordered multistate data may be an issue (Williams and Siebert 2000).

Scotland and others (Carine and Scotland 1999, Scotland 2000a, b) already

performed the three-taxon-permutations of non-additive binary data (eventually re-coded

multistate data). They also discussed their methodology in the context of Patterson’s idea

of “pair homology” (Carine and Scotland 1999, Scotland 2000a, b). The polarized binary

outputs of 1001 may also be used as inputs for future three-taxon representations using

TAXODIUM 1. 2 (Mavrodiev and Madorsky 2012)(Fig. 2). As well as the Williams -

Siebert three-taxon representation of unordered multistate data (Williams and Siebert

2000, Mavrodiev and Madorsky 2012, Mavrodiev et al. 2014), this option may help to

prevent the artificial groupings under the conditions of 3TA, mentioned by Kluge and

Farris (1999) and Farris et al. (2001) in their comments of the results of Scotland and

others (Carine and Scotland 1999, Scotland 2000a, b).

1001 and parametric methodology

Below we argued that it is critical to break with the tradition of using unpolarized

matrices in molecular systematics. However even the non-polarized 1001-binary

representations of molecular data may essentially extend the horizons of conventional

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 9: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

9  

phylogenetic analyses. For example these representations may also be used as inputs for

parametric software and analyzed under the conditions of the simplest Mk model (Lewis,

2001) and it elementary binary derivates (Stamatakis 2014)(Fig. 3). This may help to

simplify the complicated sets of assumptions (Jefferys and Berger 1992, Berger and

Jefferys 1992) that normally precede either Bayesian or Maximum Likelihood

approaches increasing the “robustness” of the analyses (Berger and Jefferys 1992). More

investigation is necessary here, however.

Conclusion

A fundamental problem in molecular systematics today is that molecular matrices are not

polarized. Historically, specifically polarized binary matrices were been proposed as an

ideal data format for Cladistic analysis following the argumentation schemes of Willi

Hennig, but despite of historical background, this analytical approach to the molecular

data, never been properly investigated. Given this, we have developed 1001, a simple

computer program that converts un-polarized molecular data matrices into the different

types of binary matrices, either un-polarized or with established polarity. Both methods

implemented by 1001 a priori polarize conventional data by comparing with an assumed

all-plesiomorphic outgroup, as it was proposed before for morphological data. The

polarized matrices explicated by 1001 may be considered as candidates for the proper

inputs for Cladistic analysis, as well as used as inputs for future three-taxon permutations.

The 1001 binary representations of molecular data (not necessary polarized) may also be

used as inputs for different parametric software. This may help to reduce the complicated

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 10: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

10  

sets of assumptions that normally precede either Bayesian or Maximum Likelihood

analyses.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 11: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

11  

Acknowledgments

I thank Prof. Gareth Nelson for brief helpful discussion regarding the Cladistics

representation of the DNA characters (Method 2), as implemented in “1001”. I also thank

Dr. Matthew Gitzendanner for elegant Phyton-based implementation of “1001”. Finally, I

would like to acknowledge Drs. David Williams and Malte Ebach for their helpful

comments and linguistic remarks that help to sharp arguments. No agreement imply from

the behalf of any person acknowledged in this section.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 12: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

12  

References

Arnold, EN (1989) Systematics and adaptive radiation of equatorial african lizards assigned to the genera Adolfus, Bedriagaia, Gastropholis, Holaspis, and Lacerta (reptilia, Lacertidae). Journal of Natural History 23, 525-555.

Bansal, MS, Burleigh, JG, Eulenstein, O, Fernandez-Baca, D (2010) Robinson-Foulds supertrees. Algorithms for Molecular Biology 5. 18.

Berger, JO, Jefferys, WH (1992) The application of robust Bayesian analysis to hypothesis testing and Occam's razor. Journal of the Italian Statistical Society 1, 17-32.

Bernaola-Galvan, P, Carpena, P, Roman-Roldan, R, Oliver, JL (2002) Study of statistical correlations in DNA sequences. Gene 300, 105-115.

Bryant, HN (2001) Character polarity and the rooting of cladograms. The character concept in evolutionary biology. 319-338.

Carine, MA, Scotland, RW (1999) Taxic and transformational homology: different ways of seeing. Cladistics 15, 121-129.

Crowl, AA, Mavrodiev, E, Mansion, G, Haberle, R, Pistarino, A, Kamari, G, Phitos, D, Borsch, T, Cellinese, N (2014) Phylogeny of Campanuloideae (Campanulaceae) with Emphasis on the Utility of Nuclear Pentatricopeptide Repeat (PPR) Genes. PloS one 9, e94199-e94199.

de Pinna, MCC (1994) Ontogeny, rooting, and polarity. Systematics Association Special Volume Series 52, 157-172.

De Laet, J (2005) Parsimony and the problem of inapplicables in sequence data. In 'Parsimony, phylogeny and genomics.' (Ed. VA Albert.) pp. 81-116. (Oxford University Press: Oxford)

Demeler, B, Zhou, GW (1991) Neural network optimization for Escherichia coli promoter prediction. Nucleic Acids Research 19, 1593-1599.

Ebach, MC, Williams, DM, Vanderlaan, TA (2013) Implementation as theory, hierarchy as transformation, homology as synapomorphy. Zootaxa 3641, 587-594.

Farris, JS (1970) Methods for computing Wagner trees. Systematic Zoology 19, 83-92. Farris, JS (1972) Estimating phylogenetic trees from distance matrices. American

Naturalist 106, 645-668. Farris, JS (1982a) Outgroups and parsimony. Systematic Zoology 31, 328-334. Farris, JS, Albert, VA, Kallersjo, M, Lipscomb, D, Kluge, AG (1996) Parsimony

jackknifing outperforms neighbor-joining. Cladistics 12, 99-124. Farris, JS, Kluge, AG, De Laet, JE (2001) Taxic revisions. Cladistics 17, 79-103. Felsenstein, J (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5,

164-166. Goloboff, PA, Farris, JS, Nixon, KC (2008) TNT, a free program for phylogenetic

analysis. Cladistics 24, 774-786. Jefferys, WH, Berger, JO (1992) Ockham's razor and Bayesian analysis. American

Scientist 64-72. Kitching, IJ, Forey, PL, Humphries, CJ, Williams, DM (1998) 'Cladistics: the theory and

practice of parsimony analysis.' (Oxford University Press: Oxford).

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 13: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

13  

Kluge, AG (1976) Phylogenetic relationships in the lizard family Pygopodidae: an evaluation of theory, methods and data. Miscellaneous Publs Mus Zool Univ Mich No. 152, 1-72.

Kluge, AG, Farris, JS (1999) Taxic homology equals overall similarity. Cladistics 15, 205-212.

Lewis, PO (2001) A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50, 913-925.

Lyons-Weiler, J, Hoelzer, GA, Tausch, RJ (1998) Optimal outgroup analysis. Biological Journal of the Linnean Society 64, 493-511.

Ma, P-F, Zhang, Y-X, Zeng, C-X, Guo, Z-H, Li, D-Z (2014a) Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable bamboo tribe Arundinarieae (Poaceae). Systematic Biology 63, 933-950.

Ma, P-F, Zhang, Y-X, Zeng, C-X, Guo, Z-H, Li, D-Z (2014b) Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable bamboo tribe Arundinarieae (Poaceae). Data from Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.d5h1n

Maddison, WP, Donoghue, MJ, Maddison, DR (1984) Outgroup analysis and parsimony. Systematic Zoology 33, 83-103.

Maddison, WP, Maddison, DR, 2011. Mesquite: a modular system for evolutionary analysis. Version 2.75.

Mavrodiev, EV, Madorsky, A (2012) TAXODIUM Version 1.0: a simple way to generate uniform and fractionally weighted three-item matrices from various kinds of biological data. Plos One 7.

Mavrodiev, EV, Martinez-Azorin, M, Dranishnikov, P, Crespo, MB (2014) At least 23 genera instead of one: the case of Iris L. s.l. (Iridaceae). Plos One 9.

Meacham, CA (1984) The role of hypothesized direction of characters in the estimation of evolutionary history. Taxon 26-38.

Meacham, CA (1986) Polarity assessment in phylogenetic systematics - more about directed characters - a reply. Taxon 35, 538-540.

Miller, M, A.,, Pfeiffer, W, Schwartz, T (2010) 'Creating the CIPRES Science Gateway for inference of large phylogenetic trees, Gateway Computing Environments Workshop (GCE) 14 Nov. 2010.' New Orleans, LA, USA.

Nelson, GJ (1970) Outline of a theory of comparative biology. Systematic Zoology 19, 373-384.

Nelson, G (1978) Ontogeny, phylogeny, paleontology, and Biogenetic Law. Systematic Zoology 27, 324-345.

Nelson, G, Platnick, N (1981) Systematics and biogeography: cladistics and vicariance. Systematics and biogeography: cladistics and vicariance. 1-567.

Nelson, G, Platnick, NI (1991) Three-taxon statements - a more precize use of parsimony? Cladistics 7, 351-366.

Nixon, KC (1999) The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407-414.

Nixon, KC, Carpenter, JM (1993) On outgroups. Cladistics 9, 413-426. Nixon, KC, Carpenter, JM (2012) On homology. Cladistics 28, 160-169. Platnick, NI, Gertsch, WJ (1976) The suborders of spiders: a cladistic analysis

(Arachnida, Araneae). American Museum Novitates No. 2607, 1-15.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 14: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

14  

Platnick, NI (1985) Philosophy and the transformation of cladistics revisited. Cladistics 1, 87-94.

Platnick, NI (2013) Less on homology. Cladistics 29, 10-12. Platnick, NI (1993) Character optimization and weighting - differences between the

standard and three-taxon approaches to phylogenetic inference. Cladistics 9, 267-272.

Platnick, NI, Humphries, CJ, Nelson, G, Williams, DM (1996) Is Farris optimization perfect?: three-taxon statements and multiple branching. Cladistics 12, 243-252.

Pleijel, F (1995) On character coding for phylogeny reconstruction. Cladistics 11, 309-315.

Rieppel, O, Williams, DM, Ebach, MC (2013) Adolf Naef (1883-1949): on foundational concepts and principles of systematic morphology. Journal of the History of Biology 46, 445-510.

Scotland, R (2000a) Homology, coding and three-taxon statement analysis. In 'Homology and systematics: coding characters for phylogenetic analysis.' (Eds RW Scotland, P R.T.) Vol. 58 pp. 145-182. (Chapman and Hall: London, New York)

Scotland, RW (2000b) Taxic homology and three-taxon statement analysis. Systematic Biology 49, 480-500.

Sikes, D, S., Lewis, P, O. (2001) 'PAUPRat: PAUP* implementation of the parsimony ratchet. Beta software, version 1. Distributed by the authors.' (Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, USA: Connecticut, USA).

Stamatakis, A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313.

Swofford, DL, 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Massachusetts.

Swofford, D, L., Begle, D, P. (1993) 'PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 3.1. User's Manual.' (Laboratory of Molecular Systematics, MRC 534, MSC, Smithsonian Institution: Washington DC, USA)

Waegele, JW (2005) 'Foundations of phylogenetic systematics.' (Pfeil Verlag: München) Wagele, JW (2004) Hennig's Phylogenetic Systematics brought up to date. In 'Milestones

in Systematics.' (Ed. DM Williams, Forey, P. L.) pp. 101-125. (CRC Press: USA, FL, Boca Raton)

Watrous, LE, Wheeler, QD (1981) The out-group comparison method of character analysis. Systematic Zoology 30, 1-11.

Wiley, EO, Lieberman, BS (2011) Phylogenetics: the theory and practice of phylogenetic systematics. Second edition. Phylogenetics: the theory and practice of phylogenetic systematics. Second edition. i-xvi, 1-406.

Williams, DM, Ebach, MC (2006) The data matrix. Geodiversitas 28, 409-420. Williams, DM, Ebach, MC (2008) 'Foundations of systematics and biogeography.'

(Springer: New York, United States) Williams, DM, Siebert, DJ (2000) Characters, homology and three-item analysis. In

'Homology and systematics: coding characters for phylogenetic analysis.' (Eds RW Scotland, P R.T.) Vol. 58 pp. 183-208. (Chapman and Hall: London, New York)

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 15: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

15  

Figure Legends Fig. 1. A. Results of Maximum Parsimony Analyses (MP) of conventional plastid

genomic DNA matrix from Ma et al. 2014a, b. Final topologies rooted relatively

Dendrocalamus latiflorus Munro (Ma et al. 2014). Median Consensus Tree based on

Robinson-Foulds (RF) distance (with the best score found = 8837) of 184 shortest trees

of length = 5019 (CI = 0.89, RI = 0.91). Number of taxa = 157. All constant characters

from the original alignment are excluded from the analysis. Number of variable

characters = 4304, number of parsimony-informative characters = 2003. * nodes received

Maximum Parsimony Jackknife (JK) support >50% after 20 000 fast JK replicates; !

nodes recovered Maximum Parsimony Bootstrap support in the original analysis of Ma et

al. 2014 (200 full heuristic replicates). B. Results of Maximum Parsimony Analyses

(MP) of binary representation of conventional DNA matrix from A., re-coded following

proposed Method 1. Initial binary date were also polarized before analysis relatively D.

latiflorus, assumed as an outgroup based on the results of Ma et al. (2014a). Majority-

Rule Consensus of 191 shortest trees of length = 10014 (CI = 0.88, RI = 0.89). Number

of taxa = 157. Number of binary characters = 8783, number of parsimony-informative

characters = 4088. * nodes received Maximum Parsimony Jackknife (JK) support >50%

after 20 000 fast JK replicates. C. Results of Maximum Parsimony Analyses (MP) of

binary representation of conventional DNA matrix from A., re-coded following proposed

Method 2. Data polarized before analysis relatively Dendrocalamus latiflorus, assumed

as an out-group based on the results of Ma et al. 2014. Majority-Rule Consensus of 139

shortest trees of length = 4993 (CI = 0.89, RI = 0.91). Number of taxa = 157. Number of

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 16: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

16  

binary characters = 4993, number of parsimony-informative characters = 2027. * nodes

received Maximum Parsimony Jackknife (JK) support >50% after 20 000 fast JK

replicates.

All MP analyses were conducted using program PAUPrat (Nixon 1999; Sikes and

Lewis 2001; Swofford 2002) as implemented in CIPRES (Miller et al. 2010) following

200 ratchet replicates with no more than 1 tree of length greater than or equal to 1 saved

in each replicate, and the TBR branch swapping/MulTrees option in effect; -pct = 20%,

all characters weighted uniformly, gaps were treated as ‘‘missing”. Maximum Parsimony

jackknifing (Farris et al. 1996) conducted using program PAUP* (Swofford 2002).

Robinson-Foulds consensus (reviewed in Kitching et al. 1998 and Bansal et al. 2010)

calculated using program RFS v. 2.0 (Bansal et al. 2010). Majority-Rule consensus

calculated in PAUP* (Swofford 2002).

All gaps and ambiguities of the conventional DNA matrix (A.) recoded as missing

data ("?") before binary permutations.

Roman numbers corresponds to the “major lineages” of bamboos, as specified by

Ma et al. 2014a.

Fig. 2. The results of two preliminary three-taxon analyses (3TAs) of Clades 1 and 2 of

the general topology, described on Fig. 1. In both cases, the DNA alignments derived

from described above (Fig. 1, all original data came from Ma et al. 2014a, b) simply by

proper sampling. These alignments been polarized following Method 2 and after that

established as a tree-taxon matices using TAXODIUM 1.2 (Mavrodiev, Madorsky,

2015). Indocalamus wilsonii (Rendle) C.S.Chao & C.D.Chu (Clade 1) and Bergbambos

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 17: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

17  

tessellata (Nees) Stapleton (Clade 2) assumed as an outgroup taxa before Method 2

applied to the DNA characters. A. The results of the first 3TA (Clade 1). Majority-Rule

Consensus of 193 shortest trees of length = 527046 (CI = 0.92, RI = 0.91). The number of

taxa in 487168 character-3TA matrix is 72. All 487168 3TSs are parsimony-informative

and weighted uniformly. B. The results of the second 3TA (Clade 2). Majority-Rule

Consensus of 201 shortest trees of length = 187857 (CI = 0.86, RI = 0.83). The number of

taxa in 161027 character-3TA matrix is 80. All 1610278 3TSs are parsimony-informative

and weighted uniformly. Roman Numbers corresponds to the “major lineages” of

bamboos, as specified by Ma et al. 2014a. See also the Legend of the Fig. 1 for the details

of the MP analyses.

Fig. 3. A. Most probable topology recovered from a Maximum Likelihood (ML) analysis

(RAxML)(Stamatakis 2014) of conventional Campanula s.l. & outgroups DNA plastid +

nuclear (PPR loci) combined matrix from Crowl et al. (2014). The names of the all taxa

are taken from the Supplemental data of Crowl et al. (2014). ML bootstrap (BS) values

for nodes receiving .50% supports are indicated above and below the branches (1000

rapid replicates). GTR + G model was assumed to be the best choice for the molecular

dataset. Final ML Optimisation Likelihood: -57876.127159. B. Most probable topology

recovered from Maximum Likelihood analysis (RAxML)(Stamatakis 2014) of

conventional Campanula s.l. & out-groups DNA plastid + nuclear (PPR loci) combined

matrix from Crowl et al. (2014), but established as a non-polarized binary matrix

following Method 1. Binary matrix analyzed under the assumptions of BINGAMMA

model (Stamatakis 2014, see also Lewis 2001) with the putative ascertainment bias

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 18: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

18  

(Lewis 2001) left uncorrected. ML bootstrap values for nodes receiving .50% supports

are indicated above and below the branches (1000 rapid ML BS replicates). Final ML

Optimisation Likelihood: -85930.633845.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 19: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

Chimonobambusa szechuanensis

Fargesia qinlingensis

Fargesia fungosa

Yushania basihirsuta

Thamnocalamus spathiflorus

Phyllostachys nidularia

Indocalamus tessellatus

Yushania qiaojiaensis

Pleioblastus solidus

Drepanostachyum ampullare

Indocalamus decorus

Pleioblastus amarus

Yushania levigata

Himilayacalamus falconeri

Indocalamus longiauritus

Yushania yadongensis

Indocalamus hirtivaginatusFargesia nitida

Drepanostachyum hookerianum

Sasa qingyuanensis

Arundinaria abietina

Pleioblastus yixingensis

Pleioblastus wuyishanensis

Fargesia yunnanensis

Arundinaria qingchengshanensis

Ampelocalamus scandens

Indocalamus emeiensis

Ampelocalamus patellaris

Chimonocalamus pallens

Yushania maculata

Chimonocalamus dumosus

Phyllostachys nigra

Sasa sinica

Gelidocalamus sp. 2

Indocalamus guangdongensis

Chimonocalamus montanus

Pseudosasa hirta

Arundinaria aristata

Indocalamus herktosii

Indocalamus victorialis

Arundinaria faberi

Pleioblastus juxianensis

Pseudosasa guangxiensis

Yushania brevipaniculata

Pleioblastus sanmingensis

Fargesia spathacea

Indocalamus hirsutissimus

Pleioblastus intermedia

Fargesia robusta

Indocalamus sp.

Menstruocalamus sichuanensis

Brachystachyum densiflorum

Chimonobambusa macrophylla

Sinobambusa seminuda

Pseudosasa maculifera

Fargesia edulis

Chimonocalamus fimbriatus

Oligostachyum oedogonatum

Fargesia macclureana

Yushania baishanzuensis

Oldeania alpine

Arundinaria fargesii

Phyllostachys propinqua

Ampelpcalamus actinotrichus

Pseudosasa amabilis var. convexa

Phyllostachys heteroclada

Gelidocalamus rutilans

Arundinaria spanostachya

Gaoligongshania megalothyrsa

Indocalamus bashanensis

Fargesia decurvata

Indocalamus sinicus

Phyllostachys edulis

Pseudosasa acutivagina

Acidosasa notata

Indocalamus latifolius

Bergbambos tessellata

Yushania crassicollis

Chimonocalamus longiusculus

Indocalamus quadratus

Indocalamus wuxiensis

Phyllostachys heteroclada

Menstruocalamus sichuanensis

Indocalamus longiauritus

Sinobambusa seminuda

Indocalamus bashanensis

Yushania levigata

Gelidocalamus sp. 2

Acidosasa notata

Pseudosasa hirta

Sasa sinicaOligostachyum oedogonatum

Fargesia spathacea

Indocalamus tessellatus

Brachystachyum densiflorum

Himilayacalamus falconeri

Indocalamus hirsutissimus

Pleioblastus intermedia

Pseudosasa acutivagina

Fargesia fungosa

Phyllostachys propinqua

Chimonocalamus longiusculus

Pleioblastus amarus

Gelidocalamus rutilans

Yushania maculata

Chimonocalamus dumosus

Bergbambos tessellata

Ampelocalamus patellaris

Oldeania alpine

Arundinaria aristata

Fargesia edulis

Fargesia yunnanensis

Fargesia nitida

Drepanostachyum hookerianum

Indocalamus sp.

Chimonocalamus fimbriatus

Indocalamus quadratus

Arundinaria qingchengshanensis

Phyllostachys edulis

Phyllostachys nidularia

Arundinaria fargesii

Pleioblastus juxianensis

Fargesia decurvata

Sasa qingyuanensis

Indocalamus emeiensis

Ampelpcalamus actinotrichus

Arundinaria faberi

Yushania qiaojiaensis

Indocalamus sinicus

Indocalamus latifolius

Yushania basihirsuta

Yushania brevipaniculata

Arundinaria spanostachya

Fargesia robusta

Pleioblastus yixingensis

Pleioblastus sanmingensis

Pleioblastus wuyishanensis

Yushania yadongensis

Chimonobambusa szechuanensis

Pleioblastus solidus

Ampelocalamus scandens

Pseudosasa amabilis var. convexa

Chimonobambusa macrophylla

Pseudosasa maculifera

Pseudosasa guangxiensis

Fargesia macclureana

Indocalamus guangdongensis

Chimonocalamus montanus

Indocalamus victorialis

Arundinaria abietina

Yushania baishanzuensisYushania crassicollis

Indocalamus herktosii

Phyllostachys nigra

Indocalamus wuxiensis

Drepanostachyum ampullare

Chimonocalamus pallens

Gaoligongshania megalothyrsa

Indocalamus decorus

Fargesia qinlingensis

Indocalamus hirtivaginatus

Thamnocalamus spathiflorus

Brachystachyum densiflorum

Ampelocalamus patellaris

Yushania yadongensis

Fargesia decurvata

Arundinaria aristata

Yushania maculata

Phyllostachys nigra

Pseudosasa acutivagina

Arundinaria fargesii

Gaoligongshania megalothyrsa

Phyllostachys heteroclada

Indocalamus sp.

Yushania qiaojiaensis

Pleioblastus wuyishanensis

Fargesia macclureana

Indocalamus victorialis

Indocalamus hirsutissimus

Pleioblastus yixingensis

Indocalamus emeiensis

Thamnocalamus spathiflorus

Indocalamus sinicus

Fargesia nitida

Indocalamus longiauritus

Drepanostachyum hookerianum

Yushania basihirsuta

Phyllostachys nidularia

Indocalamus guangdongensis

Pseudosasa amabilis var. convexa

Indocalamus decorus

Indocalamus quadratus

Yushania levigata

Arundinaria spanostachya

Pseudosasa hirta

Chimonocalamus pallens

Drepanostachyum ampullare

Pleioblastus amarus

Gelidocalamus rutilans

Chimonocalamus fimbriatus

Indocalamus wuxiensis

Indocalamus tessellatusIndocalamus hirtivaginatus

Yushania brevipaniculata

Pleioblastus intermedia

Phyllostachys propinqua

Pleioblastus solidus

Ampelpcalamus actinotrichus

Sinobambusa seminudaPseudosasa maculifera

Sasa sinica

Sasa qingyuanensis

Bergbambos tessellata

Fargesia fungosa

Arundinaria qingchengshanensis

Yushania crassicollisMenstruocalamus sichuanensis

Indocalamus herktosii

Pleioblastus sanmingensis

Gelidocalamus sp. 2

Chimonocalamus montanus

Arundinaria faberi

Himilayacalamus falconeri

Chimonocalamus dumosus

Fargesia robusta

Indocalamus bashanensis

Yushania baishanzuensis

Indocalamus latifolius

Ampelocalamus scandens

Acidosasa notata

Oldeania alpine

Arundinaria abietina

Oligostachyum oedogonatum

Fargesia qinlingensis

Chimonobambusa szechuanensis

Chimonocalamus longiusculus

Fargesia yunnanensis

Pseudosasa guangxiensis

Pleioblastus juxianensis

Fargesia spathacea

Chimonobambusa macrophylla

Phyllostachys edulis

Fargesia edulis

↑ ↑ ↑I II

VII

X X X

III III IIIII

II IIIX IX

IX

VII VII

V

V V

↑ ↑ ↑↑

↑↑ ↑ ↑ ↑

↑ ↑ ↑

*

**

*

*

*

***

**

*

*

***

*

*

*

**

*

**

***

**

*

*

*

***

**

*

**

*

*

!

!

!

!

!

!!

!!

!

!!

!

!

! !! !

!

B. Binary data: Method 1 (polarized binary example)

C. Binary data: Method 2A. DNA characters

Fig. 1

Clade 2

Clade 1 & Outgroups Clade 1 & Outgroups Clade 1 &Outgroups

Clade 2 Clade 2

A A T+ A C C T+

011011000011110110000

+ - value of the assumed Outgroup

A A T+ A C C T+

1? 1? 00 1? ?1 ?1 00

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 20: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

Acidosasa nanunica

Indocalamus barbatus

Arundinaria nanningensis

Pleioblastus sp

Indosasa spongiosa

Pleioblastus gramineus

Indosasa shibataeoides

Ferrocalamus strictus

Acidosasa gigantea

Sasa palmata

Indosasa sinica

Sasa oblongula

Shibataea nanpingensis

Oligostachyum scabriflorum

Bambusa emeiensis

Pleioblastus pygmaeus

Sasa oshidensis

Sasa sp. 2

Pleioblastus maculatusOligostachyum shiuyingianum

Sasa longiligulata

Indocalamus tongchunensis

Dendrocalamus latiflorus

Bambusa oldhamii

Oligostachyum gracilipes

Sasaella ramosa

Arundinaria gigantea

Pseudosasa orthotropa

Shibataea lanceifolia

Ferrocalamus rimosivaginus

Pseudosasa cantorii

Acidosasa chienouensis

Acidosasa purpurea

Indosasa lipoensis

Acidosasa chinensis

Indosasa hispida

Sasa sp. 1

Pseudosasa japonica var. tsutsumiana

Pseudosasa gracilis

Sasa tsuboiana

Indocalamus wilsonii

Pseudosasa viridula

Sasa guangxiensis

Indocalamus jinpingensis

Acidosasa guangxiensis

Indosasa crassiflora

Sasa senanensis

Indosasa gigantea

Sasa magnonoda

Pseudosasa hindsii

Oligostachyum hupehense

Sinobambusa intermedia

Indosasa triangulata

Pseudosasa amabilis

Arundinaria tecta

Pleioblastus argenteostriatus

Sasa kurilensis

Indocalamus pseudosinicus

Shibatae hispidus

Indosasa jinpingensis

Oligostachyum lubricum

Arundinaria appalachiana

Pseudosasa longiligula

Shibatae chinensis

Pseudosasa japonica

Oligostachyum sulcatum

Ampelocalamus calcareus

Metasasa carinata

Acidosasa edulis

Gelidocalamus tessellatus

Sinobambusa tootsik

Gelidocalamus sp. 1

Indosasa glabrata

Sasa veitchii

Indosasa sp.

Indosasa patens

Pleioblastus argenteostriatus

Pseudosasa hindsii

Pleioblastus gramineus

Indocalamus wilsonii

Arundinaria tecta

Ampelocalamus calcareus

Sasa oshidensis

Oligostachyum lubricum

Sasa sp. 2

Indocalamus jinpingensis

Dendrocalamus latiflorus(= assumed Out)

Ferrocalamus rimosivaginus

Indosasa lipoensis

Pseudosasa gracilis

Indosasa glabrata

Sasa tsuboiana

Pseudosasa japonica

Pseudosasa cantorii

Indocalamus pseudosinicus

Shibataea nanpingensis

Sinobambusa intermedia

Indosasa crassiflora

Pleioblastus maculatus

Indosasa hispida

Metasasa carinata

Acidosasa guangxiensis

Oligostachyum scabriflorum

Gelidocalamus sp. 1

Pleioblastus pygmaeus

Pseudosasa amabilis

Sasa longiligulata

Indosasa gigantea

Arundinaria appalachiana

Shibatae hispidus

Sasa guangxiensis

Indosasa triangulata

Indosasa spongiosa

Indosasa sinica

Sasa veitchii

Pleioblastus sp.

Acidosasa gigantea

Pseudosasa orthotropa

Acidosasa chinensis

Arundinaria nanningensis

Shibataea lanceifolia

Acidosasa purpurea

Arundinaria gigantea

Indosasa sp.

Pseudosasa viridula

Indosasa patens

Indocalamus barbatus

Sasa palmata

Gelidocalamus tessellatus

Sasa sp. 1

Sasa magnonoda

Bambusa emeiensis

Indocalamus tongchunensis

Sinobambusa tootsik

Bambusa oldhamii

Pseudosasa japonica var tsutsumiana

Sasaella ramosa

Ferrocalamus strictus

Acidosasa chienouensis

Sasa kurilensis

Acidosasa edulis

Oligostachyum shiuyingianum

Oligostachyum gracilipes

Pseudosasa longiligula

Acidosasa nanunica

Indosasa jinpingensis

Indosasa shibataeoides

Sasa senanensisSasa oblongula

Shibatae chinensis

Oligostachyum sulcatum

Oligostachyum hupehense

Pseudosasa gracilis

Acidosasa gigantea

Pseudosasa cantorii

Pseudosasa orthotropa

Sasa oblongula

Indosasa shibataeoides

Oligostachyum shiuyingianum

Sasa longiligulata

Arundinaria appalachiana

Gelidocalamus tessellatus

Indocalamus pseudosinicus

Pleioblastus pygmaeus

Indocalamus jinpingensis

Arundinaria tecta

Indosasa glabrata

Indosasa sinica

Pseudosasa japonica var tsutsumiana

Indocalamus barbatus

Bambusa oldhamii

Indosasa lipoensis

Pseudosasa amabilis

Shibatae chinensis

Indocalamus tongchunensis

Arundinaria nanningensis

Pleioblastus sp.

Shibataea nanpingensis

Sasa kurilensis

Pleioblastus maculatus

Oligostachyum gracilipes

Pseudosasa viridula

Indosasa patens

Acidosasa chienouensis

Acidosasa purpurea

Oligostachyum hupehense

Sinobambusa intermedia

Ampelocalamus calcareus

Acidosasa guangxiensis

Indosasa gigantea

Sasa oshidensis

Sasa magnonoda

Pleioblastus argenteostriatus

Metasasa carinata

Arundinaria gigantea

Pseudosasa longiligula

Gelidocalamus sp. 1

Indosasa triangulata

Sasa guangxiensis

Bambusa emeiensis

Oligostachyum lubricum

Pseudosasa japonica

Sasa sp. 1Sasa senanensis

Indocalamus wilsonii

Indosasa hispida

Indosasa crassiflora

Indosasa spongiosa

Dendrocalamus latiflorus

Shibatae hispidus

Indosasa sp.

Pseudosasa hindsii

Sasaella ramosa

Sasa sp. 2

Ferrocalamus strictus

Acidosasa edulis

Pleioblastus gramineus

Oligostachyum scabriflorum

Sasa tsuboiana

Oligostachyum sulcatum

Indosasa jinpingensis

Sinobambusa tootsik

Ferrocalamus rimosivaginus

Shibataea lanceifolia

Sasa palmata

Sasa veitchii

Acidosasa nanunica

Acidosasa chinensis

↑ ↑ ↑

IXVIII

IV

VI

IXVIIIVIII IX

VI

IVIV

VI

↑↑↑↑

↑ ↑

*

*

*

*

**

**

* **

*

*

*

*

*

*

***

*

***

*

*

* *

****

**

*

*

*

***

**

**

*

*

*

* ***

*

**

* *

**

*

*

*

***

**

*

**

*

!

!!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!!

!

!!

!

!

B. Binary data: Method 1(polarized binary example)

C. Binary data: Method 2A. DNA characters

Fig. 1 (continious)

Clade 1 & Outgroups

Clade 2 Clade 2 Clade 2

Clade 1 & Outgroups Clade 1 & Outgroups

(= assumed Out)

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 21: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

Pseudosasa hirta

Yushania basihirsuta

Phyllostachys nidularia

Indocalamus victorialis

Sasa sinica

Indocalamus bashanensisIndocalamus wuxiensis

Indocalamus herktosii

Indocalamus decorus

Arundinaria fargesii

Indocalamus longiauritus

Brachystachyum densiflorum

Fargesia qinlingensisChimonocalamus fimbriatus

Phyllostachys propinqua

Arundinaria spanostachya

Yushania levigata

Pseudosasa amabilis var. convexa

Fargesia fungosa

Pseudosasa maculifera

Indocalamus guangdongensis

Fargesia yunnanensis

Fargesia robusta

Indocalamus emeiensis

Gaoligongshania megalothyrsa

Arundinaria abietina

Ampelpcalamus actinotrichus

Pseudosasa acutivagina

Phyllostachys nigra

Acidosasa notata

Fargesia nitida

Chimonocalamus pallens

Fargesia decurvata

Fargesia edulis

Chimonocalamus montanus

Fargesia spathacea

Pleioblastus juxianensis

Indocalamus quadratus

Yushania baishanzuensis

Gelidocalamus rutilans

Oligostachyum oedogonatum

Drepanostachyum hookerianum

Indocalamus sinicus

Menstruocalamus sichuanensis

Gelidocalamus sp. 2

Chimonocalamus dumosus

Yushania qiaojiaensisYushania brevipaniculata

Indocalamus hirsutissimus

Pleioblastus intermedia

Himilayacalamus falconeri

Indocalamus latifolius

Phyllostachys edulis

Pleioblastus yixingensisSasa qingyuanensis

Indocalamus hirtivaginatus

Chimonobambusa szechuanensis

Chimonocalamus longiusculus

Ampelocalamus patellaris

Drepanostachyum ampullare

Ampelocalamus scandens

Pleioblastus solidus

Arundinaria aristata

Phyllostachys heteroclada

Indocalamus sp.

Thamnocalamus spathiflorus

Pleioblastus wuyishanensis

Yushania yadongensis

Oldeania alpine

Yushania maculata

Pseudosasa guangxiensis

Pleioblastus amarus

Fargesia macclureana

Indocalamus tessellatus

Arundinaria qingchengshanensis

Chimonobambusa macrophylla

Arundinaria faberi

Pleioblastus sanmingensis

Yushania crassicollis

Sinobambusa seminuda

Sasa tsuboiana

Pseudosasa japonica

Shibataea lanceifolia

Gelidocalamus tessellatus

Indosasa hispida

Shibatae hispidus

Indocalamus wilsonii

Oligostachyum sulcatum

Sasa oshidensis

Indosasa lipoensis

Pleioblastus argenteostriatus

Arundinaria appalachianaArundinaria tecta

Oligostachyum hupehense

Sasa kurilensis

Ferrocalamus strictus

Pleioblastus gramineus

Oligostachyum lubricum

Pleioblastus maculatus

Pseudosasa japonica var. tsutsumiana

Arundinaria gigantea

Acidosasa chienouensis

Sasa guangxiensis

Acidosasa nanunica

Pseudosasa amabilis

Indocalamus barbatus

Oligostachyum scabriflorum

Oligostachyum gracilipes

Acidosasa guangxiensis

Sasa longiligulata

Pleioblastus pygmaeus

Sasaella ramosa

Indocalamus tongchunensis

Pseudosasa viridula

Indosasa triangulata

Sinobambusa intermedia

Sasa magnonoda

Indocalamus pseudosinicus

Sasa veitchii

Sasa palmata

Indosasa gigantea

Sasa oblongula

Sinobambusa tootsik

Indosasa spongiosa

Pseudosasa hindsii

Metasasa carinata

Indocalamus jinpingensis

Indosasa glabrata

Pleioblastus sp.

Indosasa crassiflora

Pseudosasa longiligula

Shibataea nanpingensis

Sasa senanensis

Acidosasa gigantea

Gelidocalamus sp. 1

Pseudosasa gracilis

Indosasa shibataeoides

Ferrocalamus rimosivaginus

Acidosasa chinensis

Acidosasa purpurea

Acidosasa edulis

Oligostachyum shiuyingianum

Pseudosasa cantorii

Indosasa patens

Indosasa jinpingensis

Pseudosasa orthotropa

Shibatae chinensis

Arundinaria nanningensis

Sasa sp. 2

Indosasa sp.

Sasa sp. 1

Indosasa sinica

Bergbambos tessellata(= assumed Outgroup)

VIII

IV

Clade 1 Clade 2

I VIIII

IX

V

III

(= assumed Outgroup)

↑↑ ↑↑

1? 1? 00 1? ?1 ?1 00

11?? 1?1? 0000 ?11? ???1 ???1 0000

↑[to 3TSs]

A A T+ A C C T+

+ - value of the assumed Outgroup (e. g., Bergbambos tessellata )

3TA: 3TA:

VI

Fig. 2

A. B.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts

Page 22: 1001 - A tool for binary representations of unordered ... · 1" 1001 - A tool for binary representations of unordered multistate characters (with examples from genomic data) Mavrodiev

Legousia pentagonia

Campanula cochlearifolia

Platycodon grandiflorus

Azorina vidalii

Campanula cenisia

Jasione montana

Campanula persicifolia

Campanula carnica cfr.

Campanula carpatica

Campanula bononiensis

Campanula carpatha

Campanula elatinoides

Musschia aurea

Wahlenbergia hederacea

Campanula exigua

Cyphia elata

Campanula mollis

Campanula erinus

Campanula macrorhiza

Campanula drabifolia

Jasione crispa

Asyneuma virgatum

Feeria angustifolia

Campanula tubulosa

Hanabusaya asiatica

Campanula foliosa

Campanula spatulata

Campanula saxatilis

Campanula forsythii

Campanula rapunculoides

Campanula brixiensis 2

Campanula hawkinsiana

Heterochaenia ensifolia

Campanula reverchonii

Petromarula pinnata

Campanula creutzburgii

Campanula stenocodon

Campanula elatines

Campanula hierapetrae

Githopsis diffusa

Campanula edulis

Campanula mirabilis

Campanula medium

Legousia hybrida

Campanula petraea

Campanula barbata

Physoplexis comosa

Campanula alliariifolia

Campanula patula

Campanula latifolia

Campanula caespitosa

Campanula rapunculus

Githopsis pulchella

Campanula brixiensis

Campanula peregrina

Campanula divaricata

Campanula bertolae 3

Campanula rhomboidalis

Wahlenbergia gloriosa

Campanula thyrsoides

Trachelium caeruleum

Symphyandra hofmanni

Legousia falcata

Campanula bertolae 2

Campanula glomerata

Campanula laciniata

Campanula portenschlagiana

Campanula rotundifolia

Campanula arvatica

Campanula sarmatica

Campanula scheuchzeri

Campanula brixiensis

Canarina canariensis

Campanula bertolae

Campanula marchesettii 2

Campanula aucheri

Wahlenbergia linifolia

Campanula herminii

Campanula parryi

Campanula robinsiae

Jasione laevis

Campanula versicolor

Campanula carnica

Campanula lusitanica

Campanula excisa

Symphyandra armena

Campanula lanata

Campanula marchesettii

Campanula bellidifolia

Campanula grossheimii

Campanula pelviformis

Campanula lactiflora

Campanula witasekiana

Jasione heldreichii

Campanula trachelium

Campanula sabatia

Campanula incurva

Campanula brixiensis

Campanula appenina

Michauxia tchihatchewii

Campanula pyramidalis

Campanula macrostachya

Legousia speculum veneris

Campanula cretica

Symphyandra pendula

Wahlenbergia berteroi

Triodanis coloradoensis

Campanula spicata

Campanula cervicaria

Gadellia lactiflora

Wahlenbergia angustifolia

Campanula fragilis

Campanula aizoides

Heterocodon rariflorum

Campanula sibirica

Edraianthus graminifolius

Campanula spatulata cfr.

Merciera tenuifolia

Campanula alpestris

Campanula scouleri

Campanula jacquinii

100

9 7

100

100

9 5

100

9 5

8 6

9 8

7 4

6 4

7 1

5 2

100

100

100

9 7

9 9

6 7

100

6 9

9 9

6 8

100

9 9

100

100

100

9 9

100

100

8 1

8 7

6 6

6 2

9 5

5 4

9 5100

9 6

100

9 6

9 5

9 9 1009 9

9 1

100

100

100

100

9 7

100

9 5

7 4

8 6

8 4

9 1

5 4

100

100

6 8

7 1

100

100

100

8 2

100

100

9 1

8 7

100

100

100

9 4

9 0

9 9

100

100100

8 8

5 2

9 9

9 9

9 7

7 0

100

100

9 3

Campanula pelviformis

Trachelium caeruleum

Campanula spicata

Campanula medium

Campanula parryi

Heterocodon rariflorum

Campanula foliosa

Campanula marchesettii 2

Campanula saxatilis

Campanula laciniata

Campanula bertolae

Jasione montana

Wahlenbergia gloriosa

Campanula arvatica

Campanula creutzburgii

Campanula thyrsoides

Campanula divaricata

Campanula carpatica

Campanula carnica

Symphyandra pendula

Campanula stenocodon

Campanula brixiensis 4

Campanula portenschlagiana

Campanula pyramidalis

Campanula rapunculoides

Symphyandra armena

Feeria angustifolia

Campanula caespitosa

Triodanis coloradoensis

Platycodon grandiflorus

Campanula bononiensis

Legousia pentagonia

Wahlenbergia berteroi

Campanula aizoides

Gadellia lactiflora

Campanula peregrina

Campanula cenisiaCampanula elatinoides

Campanula mollis

Campanula excisa

Campanula lusitanica

Campanula jacquinii

Campanula versicolor

Campanula macrorhiza

Edraianthus graminifolius

Githopsis pulchella

Campanula bertolae 2

Heterochaenia ensifolia

Campanula bertolae 3

Campanula scheuchzeri

Campanula brixiensis 2

Campanula brixiensis 3

Campanula incurva

Campanula latifolia

Campanula carnica cfr.

Campanula elatines

Campanula robinsiae

Physoplexis comosa

Campanula fragilis

Campanula sabatia

Musschia aurea

Campanula drabifolia

Campanula exigua

Petromarula pinnata

Azorina vidalii

Campanula alpestris

Campanula aucheri

Wahlenbergia hederacea

Cyphia elata

Campanula rapunculus

Campanula cretica

Campanula herminii

Legousia speculum veneris

Campanula witasekiana

Campanula scouleri

Campanula sibirica

Campanula hawkinsiana

Githopsis diffusa

Campanula barbata

Michauxia tchihatchewii

Campanula lanata

Jasione crispa

Campanula tubulosa

Campanula rotundifolia

Campanula cervicaria

Campanula hierapetrae

Campanula forsythii

Campanula macrostachya

Hanabusaya asiatica

Campanula brixiensis

Legousia falcata

Campanula edulis

Jasione heldreichii

Campanula sarmatica

Campanula cochlearifolia

Campanula carpatha

Legousia hybrida

Campanula reverchonii

Campanula persicifolia

Campanula lactiflora

Campanula mirabilis

Campanula appenina

Wahlenbergia linifolia

Symphyandra hofmanni

Jasione laevis

Merciera tenuifolia

Campanula spatulata cfr.

Campanula trachelium

Campanula patula

Campanula erinus

Campanula bellidifolia

Canarina canariensis

Campanula marchesettii

Asyneuma virgatum

Campanula grossheimii

Campanula rhomboidalis

Campanula petraea

Wahlenbergia angustifolia

Campanula spatulata

Campanula alliariifolia

Campanula glomerata

100

9 8

7 9

100

7 7

9 1

8 8

9 8

7 1

100

100

8 9

9 5

8 8

9 6

100

100

100

100

100

100

100

6 3

100

9 5

9 8

100

100

9 8

100

1009 6

100

7 2

9 8

100

100

100

100

100

100

100

6 7

100

100

9 3

100

100

9 8

100

7 4

6 7

100

100

5 7

9 1

6 5

100

100

7 5

100

8 4

100

9 8

100

8 4

100

5 7 7 7

5 2

9 5

100

100

6 1

9 8

6 3

100

100

100

100

9 9

100

100

100

100

100

100

9 9

9 7

100

100

100

6 9

100

43RAxML

A 001A 001T 010A 001C 100C 100T 010

B. Binary data: Method 1/RAxMLA. DNA characters/(non-polarized binary example)

Fig. 3 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1153v1 | CC-BY 4.0 Open Access | rec: 2 Jun 2015, publ: 2 Jun 2015

PrePrin

ts