what is phylogenetics - amazon s3 · pdf fileevery object has a state vector & inherit the...


Upload: lenhi

Post on 19-Mar-2018

What is Phylogenetics

• Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species.

• The basic idea is to compare specific characters (features) of the species, under the natural assumption that similar species (i.e., species with similar characters) are genetically close.

• The term phylogeny refers to these relationships, usually presented as a phylogenetic tree.

• Classic phylogenetics dealt mainly with physical, or morphological, features: size, color, number of legs, etc.

• Modern phylogenetics uses information extracted from genetic material, mainly DNA and protein sequences. The characters used are usually the DNA or protein sites (a site is a single position in the sequence) after aligning several such sequences and keeping only blocks that were conserved in all the examined species.

• An interesting example is a research project that used phylogenetics to trace the origins of the human population on Earth.

• During evolution, it is very common for a gene to be duplicated.

• Therefore, when discussing matching genes in different species, we differentiate between:

– orthologous matches (orthologs): both genes are "the same" gene in the strong sense; they are connected directly, not through a duplication, and their sequences diverged after a speciation event;

– paralogous matches (paralogs): genes that are the result of some duplication along the evolutionary line;

– xenologs (horizontal transfers): genes that were transferred between organisms by other means (e.g., by a virus).

• Therefore, if we base our analysis on paralogs or xenologs (rather than orthologs), we are in big trouble: the resulting tree may trace duplication or transfer events rather than the speciation history.
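A minimal sketch, in Python, of how aligned sites become characters: each column of a toy alignment is treated as one character, and only columns with no gap in any species are kept. Keeping gap-free columns is a simplifying assumption standing in for "blocks conserved in all the examined species"; the sequences and the function name `site_characters` are hypothetical.

```python
def site_characters(alignment: dict[str, str]) -> dict[str, str]:
    """Keep only alignment columns that have no gap ('-') in any sequence."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # A site is kept only if every species has a residue at that position.
    kept = [i for i in range(length)
            if all(alignment[name][i] != "-" for name in names)]
    # Each species' state vector: its residues at the kept sites.
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


# Hypothetical toy alignment of three species.
toy = {
    "human":   "ACGT-ACGT",
    "chimp":   "ACGTTACGA",
    "gorilla": "ACCTTAC-A",
}
print(site_characters(toy))
```

Real pipelines usually filter whole blocks with dedicated tools; the per-column filter here only shows the idea that every species ends up with a state at every retained site.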

Theory of Evolution

• Basic idea

– Speciation events lead to the creation of different species.

– Speciation is typically caused by physical separation into groups in which different genetic variants become dominant.

• Any two species share a (possibly distant) common ancestor

Basic Assumptions

More closely related organisms have more similar genomes.

Highly similar genes are homologous (have the same ancestor).

A universal ancestor exists for all life forms.

Molecular differences in homologous genes (or protein sequences) are positively correlated with evolutionary time.

Phylogenetic relationships can be expressed by a dendrogram (a “tree”).
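To make the fourth assumption concrete, the sketch below measures the molecular difference between two aligned DNA sequences as the p-distance (fraction of differing sites) and then applies the Jukes-Cantor (JC69) correction, which estimates substitutions per site while accounting for multiple changes at the same site. JC69 is one standard model chosen here for illustration; the slides do not name a model, and the sequences are made up.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return mismatches / len(seq_a)


def jukes_cantor(p: float) -> float:
    """JC69 distance: estimated substitutions per site for DNA."""
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")   # 2 of 10 sites differ
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least the raw p-distance, and the gap widens as sequences diverge, which is why raw counts underestimate evolutionary time for distant species.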

A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species; it is also called a phylogenetic tree.

[Figure: Primate evolution; speciation events labeled]

Molecular clock

This phylogenetic tree has all leaves at the same level. When this property holds, the phylogenetic tree is said to satisfy a molecular clock: the time from a speciation event to each present-day species descending from it is the same along all paths (in reality this assumption is often violated).
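A tree satisfies a molecular clock in this sense when every root-to-leaf path has the same total length (such trees are also called ultrametric). The sketch below checks that property; the `Node` class, its fields, and the example branch lengths are hypothetical, and a small tolerance is used because real branch lengths are floating-point estimates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0                      # length of the edge above this node
    children: list["Node"] = field(default_factory=list)


def leaf_depths(node: Node, depth: float = 0.0) -> dict[str, float]:
    """Map each leaf name to its distance from the root."""
    depth += node.length
    if not node.children:
        return {node.name: depth}
    depths: dict[str, float] = {}
    for child in node.children:
        depths.update(leaf_depths(child, depth))
    return depths


def satisfies_molecular_clock(root: Node, tol: float = 1e-9) -> bool:
    """True if all root-to-leaf path lengths agree within `tol`."""
    depths = leaf_depths(root).values()
    return max(depths) - min(depths) <= tol


# Hypothetical trees: the first is ultrametric, the second is not.
clock = Node("root", children=[
    Node("anc", 2.0, [Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 3.0),
])
no_clock = Node("root", children=[
    Node("anc", 2.0, [Node("A", 1.0), Node("B", 2.0)]),
    Node("C", 3.0),
])
print(satisfies_molecular_clock(clock), satisfies_molecular_clock(no_clock))
```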

Phylogenetic tree

The results of phylogenetic analysis are usually presented as a collection of nodes and branches, that is, a tree. In such a tree, taxa that are closely related in an evolutionary sense appear close to each other, and taxa that are distantly related are on different (distant) branches of the tree.

A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing the inferred evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical and/or genetic characteristics. In a phylogenetic tree, every node represents a species. Nodes are labeled either with species names or with the values (also referred to as states) of their characters, and the edges represent the genetic connections. It is important to note that there is usually a big difference between the leaf nodes, which represent real species, and the internal nodes, which in most cases represent the hypothetical evolutionary ancestors of the species in the data. The taxa joined together in the tree are implied to have descended from a common ancestor. Trees are useful in fields of biology such as bioinformatics, systematics, and comparative phylogenetics.
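A minimal sketch of this structure, assuming a simple recursive node type (`TreeNode` is a hypothetical name): leaves hold the observed species, every node with children is a hypothetical ancestor, and the tree is printed in Newick notation, a common plain-text format for phylogenies. The ancestor labels `anc_root` and `anc_HC` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    label: str
    children: list["TreeNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: TreeNode) -> list[str]:
    """Observed (present-day) species: nodes without descendants."""
    if node.is_leaf():
        return [node.label]
    return [name for child in node.children for name in leaves(child)]


def ancestors(node: TreeNode) -> list[str]:
    """Hypothetical ancestors: every node that has descendants."""
    if node.is_leaf():
        return []
    return [node.label] + [name for child in node.children for name in ancestors(child)]


def to_newick(node: TreeNode) -> str:
    """Newick string without the trailing ';' (internal labels follow the ')')."""
    if node.is_leaf():
        return node.label
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.label


tree = TreeNode("anc_root", [
    TreeNode("anc_HC", [TreeNode("Human"), TreeNode("Chimp")]),
    TreeNode("Gorilla"),
])
print(leaves(tree), ancestors(tree))
print(to_newick(tree) + ";")
```

The last line prints `((Human,Chimp)anc_HC,Gorilla)anc_root;`, where only the three leaf names correspond to real, observed species.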

Contents

History

Types

• Rooted tree

• Unrooted tree

• Bifurcating tree

• Special tree types

Construction

Limitations

History

• The idea of a "tree of life" arose from ancient notions of a ladder-like progression from lower to higher forms of life.

• Early representations of "branching" phylogenetic trees include a "paleontological chart" showing the geological relationships among plants and animals in the book Elementary Geology, by Edward Hitchcock.

• Charles Darwin (1859) also produced one of the first illustrations and crucially popularized the notion of an evolutionary "tree" in his seminal book The Origin of Species.

• Over a century later, evolutionary biologists still use tree diagrams to depict evolution because such diagrams effectively convey the concept that speciation occurs through the adaptive and random splitting of lineages.

• Over time, species classification has become less static and more dynamic.

Rooted tree (a directed tree in which one of the nodes is stipulated to be the root, the most ancient hypothetical common ancestor of the OTUs being compared; thus the direction of ancestral relationships is determined)

In a rooted phylogenetic tree,

• each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates.

• Each node is called a taxonomic unit.

• Internal nodes are generally called hypothetical taxonomic units, as they cannot be directly observed.

• Rooting an unrooted tree involves inserting a new node, which will function as the root node.

• This can be done by introducing an outgroup, a species that is definitely distant from all the species of interest. The proposed root will be the direct predecessor of the outgroup.
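The outgroup rooting described above can be sketched as follows, assuming the unrooted tree is stored as an undirected adjacency map. The new root subdivides the edge between the outgroup and its single neighbour, so the root becomes the outgroup's direct predecessor, and every edge is then oriented away from the root. The function name, the `ROOT` label and the internal node `x` are hypothetical; the taxa follow the humans/gorillas/baboons example on the unrooted-tree slide.

```python
from collections import deque


def root_with_outgroup(adjacency: dict[str, set[str]], outgroup: str,
                       root: str = "ROOT") -> dict[str, list[str]]:
    """Root an unrooted tree on the edge leading to `outgroup`.

    adjacency maps every node to the set of its neighbours (undirected edges).
    Returns a map from each node to its children in the rooted tree.
    """
    if len(adjacency[outgroup]) != 1:
        raise ValueError("the outgroup must be a leaf")
    (attach,) = adjacency[outgroup]          # the outgroup's only neighbour
    # Copy the graph, then split the outgroup/attach edge with the new root.
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    graph[outgroup].discard(attach)
    graph[attach].discard(outgroup)
    graph[root] = {outgroup, attach}
    graph[outgroup].add(root)
    graph[attach].add(root)
    # Orient every edge away from the root with a breadth-first search.
    children: dict[str, list[str]] = {node: [] for node in graph}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in seen:
                seen.add(nbr)
                children[node].append(nbr)
                queue.append(nbr)
    return children


# Hypothetical unrooted star tree: three species joined at internal node "x".
unrooted = {
    "Human": {"x"}, "Gorilla": {"x"}, "Baboon": {"x"},
    "x": {"Human", "Gorilla", "Baboon"},
}
print(root_with_outgroup(unrooted, "Baboon"))
```

After rooting, `Baboon` hangs directly off the root and humans and gorillas share the ancestor `x`, which matches the intended reading of the outgroup.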

The tips of the branches (terminal nodes, or leaves) represent the sequences being compared, sometimes called operational taxonomic units (OTUs); usually they are present-day species. The taxonomic units for which we want to create a phylogeny (e.g., species, populations) are called objects.

Every object has a state vector: objects inherit the same characters, but not the same states! The nodes connecting the branches (internal nodes) represent hypothetical common ancestors of the OTUs that the branches subtend.

Branch lengths may have meaning in radial diagrams and phylograms.

They may represent the calculated distances between nodes if distance algorithms are used. They may represent the minimum number of steps between nodes if parsimony algorithms are used.

Edge length: the “time” from one speciation event to the next

An unrooted tree

• has no predetermined root and therefore induces no hierarchy. Unrooted trees thus illustrate the relatedness of the leaf nodes without making any assumptions about ancestry.

• Therefore, in this case, the distances between nodes should be symmetric and specify only the nodes' interrelations (since the tree edges are not directed).

• While unrooted trees can always be generated from rooted ones by simply omitting the root, a root cannot be inferred from an unrooted tree without some means of identifying ancestry. This is normally done by including an outgroup in the input data or by introducing additional assumptions about the relative rates of evolution on each branch. An outgroup is a species that unambiguously separated early from the other species being considered.

• Example: when comparing humans and gorillas, baboons could be used as an outgroup, and the root would be placed somewhere along the branch connecting baboons to the common ancestor of humans and gorillas.

• A radial diagram is particularly useful when the tree is unrooted or the root is uncertain.

• An unrooted tree represents the same phylogeny without the root node.

Rooted versus unrooted trees

[Figure: Tree A and Tree B, drawn over leaves a, b and c]

Bifurcating tree

• Both rooted and unrooted phylogenetic trees can be either bifurcating or multifurcating, and either labeled or unlabeled.

• A rooted bifurcating tree has exactly two descendants arising from each interior node (that is, it forms a binary tree), and an unrooted bifurcating tree takes the form of an unrooted binary tree, a free tree with exactly three neighbors at each internal node.

• In contrast, a rooted multifurcating tree may have more than two children at some nodes and an unrooted multifurcating tree may have more than three neighbors at some nodes.

• A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, defines a topology only.

• The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but there are always more multifurcating than bifurcating trees, more labeled than unlabeled trees, and more rooted than unrooted trees.

• The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root.

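• For labeled bifurcating trees with $n$ leaves, there are:

$(2n-3)!! = \frac{(2n-3)!}{2^{n-2}(n-2)!}$ total rooted trees (for $n \ge 2$) and

$(2n-5)!! = \frac{(2n-5)!}{2^{n-3}(n-3)!}$ total unrooted trees (for $n \ge 3$).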
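These double-factorial counts can be evaluated directly. The sketch below, with hypothetical function names, computes (2n-3)!! rooted and (2n-5)!! unrooted labeled bifurcating trees. It also checks why there are more rooted than unrooted trees: an unrooted bifurcating tree with n leaves has 2n-3 edges, and placing the root on any one of them gives a distinct rooted tree.

```python
from math import prod


def double_factorial(k: int) -> int:
    """k!! = k * (k-2) * (k-4) * ...; the empty product (k <= 0) is 1."""
    return prod(range(k, 0, -2))


def rooted_count(n: int) -> int:
    """Labeled bifurcating rooted trees on n >= 2 leaves: (2n-3)!!."""
    return double_factorial(2 * n - 3)


def unrooted_count(n: int) -> int:
    """Labeled bifurcating unrooted trees on n >= 3 leaves: (2n-5)!!."""
    return double_factorial(2 * n - 5)


for n in range(3, 11):
    # Each unrooted tree has 2n-3 edges where a root could be inserted.
    assert rooted_count(n) == unrooted_count(n) * (2 * n - 3)
    print(n, unrooted_count(n), rooted_count(n))
```

For n = 10 this already gives 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive search over topologies quickly becomes impractical.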

The bifurcating tree

• A tree that bifurcates has exactly 2 descendants arising from each of the interior nodes.

The multi-furcating tree

• A tree that multi-furcates may have more than 2 descendants arising from some of its interior nodes.

Special tree types

• A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

• A cladogram is a phylogenetic tree formed using cladistic methods. This type of tree represents only a branching pattern; i.e., its branch spans do not represent time or the relative amount of character change. A cladogram (slanted or rectangular) places all OTUs equidistant from the root. In taxonomy, a common ancestor together with all OTUs descending from it is called a clade.

– A taxonomic unit (species, genus, family, etc.) is said to be monophyletic if the smallest clade containing all members of that unit does not contain members of another unit.

– A taxonomic unit is said to be polyphyletic if the smallest clade containing all members of that unit contains members of other units.

• A phylogram is a phylogenetic tree that has branch spans proportional to the amount of character change. A phylogram allows variation in the distance of OTUs from the root.

• A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch spans.

A monophyletic group = CLADE
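Using the slides' definition of monophyly (the smallest clade containing all members of a unit contains no members of another unit), a check can be sketched as follows. The tree is written as nested tuples of leaf names, an encoding chosen here for brevity, and the taxa A to E are placeholders.

```python
def leaf_set(tree) -> frozenset:
    """Leaves of a tree written as nested tuples of leaf-name strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(sub) for sub in tree))


def smallest_clade(tree, taxa) -> frozenset:
    """Leaf set of the smallest subtree (clade) containing all of `taxa`."""
    taxa = frozenset(taxa)
    if not taxa <= leaf_set(tree):
        raise ValueError("some taxa are not in the tree")
    while not isinstance(tree, str):
        # Descend into the child that still contains every taxon, if any.
        inner = [sub for sub in tree if taxa <= leaf_set(sub)]
        if not inner:
            break
        tree = inner[0]
    return leaf_set(tree)


def is_monophyletic(tree, taxa) -> bool:
    """Monophyletic: the smallest clade holds exactly the given taxa."""
    return smallest_clade(tree, taxa) == frozenset(taxa)


tree = ((("A", "B"), "C"), ("D", "E"))        # placeholder taxa
for group in ({"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"C", "D"}):
    print(sorted(group), is_monophyletic(tree, group))
```

The group {A, C} fails because its smallest clade also contains B, which is exactly the situation the slides call polyphyletic.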

Construction

• Distance-matrix methods, such as neighbor-joining, UPGMA (Unweighted Pair Group Method using Arithmetic Averages) and Fitch-Margoliash, calculate genetic distances from multiple sequence alignments. They are the simplest to implement but do not invoke an evolutionary model. In the simplest case (UPGMA), the tree is built by recursively combining the two nodes at the smallest distance. Many sequence alignment programs, such as ClustalW, also create trees using these simpler, distance-based algorithms.

• Maximum parsimony is another simple method of estimating phylogenetic trees, but it implies an implicit model of evolution (i.e., parsimony): it seeks the tree with the minimum total number of character changes between nodes.

• More advanced methods use the optimality criterion of maximum likelihood, often within a Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation. These are the methods of choice nowadays. The well-known PHYLIP package includes maximum likelihood programs alongside distance and parsimony methods.

• Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.
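As an illustration of the distance-matrix approach, here is a minimal UPGMA sketch: it repeatedly joins the two closest clusters, places the new node at half their distance, and sets the distance from the merged cluster to every other cluster to the size-weighted average. Because every leaf ends up equally far from the root, UPGMA implicitly assumes a molecular clock. The function name and the toy 4-taxon matrix are hypothetical; real tools add tie-breaking rules and input validation omitted here.

```python
from itertools import combinations


def upgma(names: list[str], matrix: list[list[float]]) -> str:
    """Build a rooted, ultrametric tree (Newick string) with UPGMA.

    names:  taxon names
    matrix: symmetric matrix of pairwise distances between the taxa
    """
    # Active clusters: id -> (Newick text, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # 1. Join the two closest clusters (first pair found on ties).
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        (text_i, size_i, h_i), (text_j, size_j, h_j) = clusters.pop(i), clusters.pop(j)
        height = dist.pop(pair) / 2          # new node sits halfway between them
        text = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # 2. Distance from the merged cluster to each remaining cluster is the
        #    size-weighted average, i.e. the mean over all leaf pairs.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (text, size_i + size_j, height)
        next_id += 1
    ((text, _, _),) = clusters.values()
    return text + ";"


# Hypothetical distances between four taxa.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(names, matrix))
```

On the toy matrix it returns `(D:3,(C:2,(A:1,B:1):1):1);`: every leaf is at distance 3 from the root, as the clock assumption requires.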

Terminology for character states

• The following terms, coined by Hennig, are used to identify shared or distinct characters among groups:

• A plesiomorphy ("close form") or ancestral state is a character state that a taxon has retained from its ancestors. When two or more taxa that are not nested within each other share a plesiomorphy, it is a symplesiomorphy (from syn-, "together") of theirs. Symplesiomorphies do not mean that the taxa that have them are necessarily closely related. For example, Reptilia is traditionally characterized by (among other things) being cold-blooded (i.e. not maintaining a constant high body temperature), whereas birds are warm-blooded. Since cold-bloodedness is a plesiomorphy, inherited from the common ancestor of traditional reptiles and birds, and thus a symplesiomorphy of turtles, snakes and crocodiles (among others), it does not mean that turtles, snakes and crocodiles form a clade that excludes the birds.

• An apomorphy ("separate form") or derived state is an innovation. It can thus be used to diagnose a clade – or even to define a clade name in phylogenetic nomenclature. One clade may have autapomorphies (from auto-, "self"), two sister-groups may have synapomorphies (from syn-, "together"). For example, the possession of digits that are homologous with those of Homo sapiens is an apomorphy within the vertebrates. The tetrapods can be singled out as consisting of the first vertebrate with such digits together with all descendants of this vertebrate (an apomorphy-based phylogenetic definition).[19] Importantly, snakes and other tetrapods that do not have digits are nonetheless tetrapods: they descend from ancestors that possessed digits which were homologous with ours.

• A character state is homoplastic or "a homoplasy" if it is shared by two or more organisms but was not present in their common ancestor. It has evolved by convergence or reversion. Both mammals and birds are able to maintain a high constant body temperature (i.e. they are 'warm-blooded'). However, the ancestors of each group did not share this character, so it must have evolved independently. Warm-bloodedness is separately an apomorphy of mammals (or a larger clade) and one of birds (or a larger clade), but it is not a synapomorphy of these two clades.

• The terms plesiomorphy and apomorphy are relative; their application depends on the position of a group within a tree. An (aut)apomorphy of one clade is a plesiomorphy of each of its members. For example, when trying to decide whether the tetrapods form a clade, an important question is whether having four limbs is a synapomorphy of all the taxa to be included within Tetrapoda: did all the possible members of the Tetrapoda inherit four limbs from a common ancestor, whereas all other vertebrates did not? By contrast, for a group within the tetrapods, such as birds, having four limbs is a plesiomorphy. Using these two terms allows a greater precision in the discussion of homology, in particular allowing clear expression of the hierarchical relationships among different homologies.

• It can be difficult to decide whether a character is in fact the same and thus can be classified as a synapomorphy which may identify a monophyletic group or whether it only appears to be the same and is thus a homoplasy which cannot identify such a group. There is a danger of circular reasoning: assumptions about the shape of a phylogenetic tree are used to justify decisions about characters, which are then used as evidence for the shape of the tree.[20] Phylogenetics uses various forms of parsimony to decide such questions; but the solutions often depend on the dataset and the methods.
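Parsimony, both here and as the tree-construction criterion, comes down to counting the minimum number of character changes a given tree requires. A standard way to compute that count is Fitch's algorithm, swapped in here as an illustration (the slides do not name an algorithm). The sketch assumes rooted binary trees written as nested tuples, unordered characters and equal cost for every change; the sequences and topologies are made up. Searching over all topologies for the best score remains the hard part that needs heuristics.

```python
def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch's algorithm for one site on a rooted binary tree.

    tree:   nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states: the character state of each leaf at this site
    Returns the candidate state set at the root and the minimum number of changes.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:                                   # children can agree: no change here
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1    # disagreement costs one change


def parsimony_score(tree, alignment: dict[str, str]) -> int:
    """Total minimum number of changes over all sites of an alignment."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))


# Hypothetical aligned sequences and two candidate topologies.
alignment = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_ab, alignment), parsimony_score(tree_ac, alignment))
```

Here the tree grouping A with B needs 3 changes and the alternative needs 5, so parsimony prefers the first. On the second tree, the first site (two states) needs 2 changes, more than the 1 it would need without homoplasy, so that tree can only explain it by convergence or reversion.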

The Importance of Phylogenetic Trees

1. Increasing use of phylogenetic trees in the biological sciences

2. Need to know what tree diagrams do and do not communicate

3. Provide an efficient structure for organizing biodiversity information

4. Develop an accurate conception of the totality of evolutionary history

5. Important for aspiring biologists to develop this understanding

Limitations

• They do not necessarily accurately represent the species evolutionary history.

• The data on which they are based is noisy;

• the analysis can be confounded by horizontal gene transfer,

• hybridisation between species that were not nearest neighbors on the tree before hybridisation takes place, convergent evolution, and conserved sequences.

• Also, there are problems in basing the analysis on a single type of character, such as a single gene or protein, or only on morphological analysis, because trees constructed from another, unrelated data source often differ from the first; therefore great care is needed in inferring phylogenetic relationships among species.

• This is most true of genetic material that is subject to lateral gene transfer and recombination, where different haplotype blocks can have different histories.

• In general, the output tree of a phylogenetic analysis is an estimate of the character's phylogeny (i.e. a gene tree) and not the phylogeny of the taxa (i.e. species tree) from which these characters were sampled, though ideally, both should be very close.

• For this reason, serious phylogenetic studies generally use a combination of genes that come from different genomic sources (e.g., from mitochondrial or plastid vs. nuclear genomes), or genes that would be expected to evolve under different selective regimes, so that homoplasy (false homology) would be unlikely to result from natural selection.

• When extinct species are included in a tree, they are terminal nodes, as it is unlikely that they are direct ancestors of any extant species.