Introduction to Phylogeny (English)

Upload: nhbach

Posted on 10-Apr-2018


  • 8/8/2019 Introduce of Phylogeny_ English


    TREE CONSTRUCTION

    1. Definition of a phylogenetic tree

    2. Features of a phylogenetic tree

    Branches

    Nodes (External & Internal)

    3. Unrooted trees

    4. Rooted trees

    Choice of an outgroup

    5. Inferred and true trees

    6. Gene trees are not the same as species trees

    7. Tree reconstruction

    7.1 Molecular sequences

    7.2 Sequence alignment is the essential preliminary to tree reconstruction

    7.3 Converting the alignment data into a phylogenetic tree

    7.4 Assessing accuracy of a reconstructed tree

    7.5 Molecular clocks enable the time of divergence of ancestral sequences to be estimated

    8. The applications (examples) of molecular phylogenetics

    8.1 Clarifying evolutionary relationships between humans & other primates

    8.2 The oldest life on earth (rRNA and phylogeny)

    8.3 The origin of AIDS

    8.4 Problems with prions

    9. Molecular phylogeny as a tool for the study of human prehistory

    9.1 Intraspecific studies require highly variable genetic loci

    9.2 The origins of modern humans - out of Africa or not?

    9.3 The patterns of more recent migration into Europe are also controversial

    9.4 Prehistoric human migration into the new world

    10. Exercises for drawing a phylogenetic tree

    11. Software packages for reconstruction of phylogenetic trees


    [Figure: Unrooted tree of taxa A, B, C and D, with root positions numbered 1 to 5]

    1. Definition of a phylogenetic tree

    A tree is an acyclic connected graph that consists of a collection of nodes (internal and external) and branches connecting them, so that every node can be reached by a unique path from every other node.

    Figure: An unrooted phylogenetic tree joining 4 taxonomic units.
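    The definition can be checked mechanically. The sketch below is a minimal illustration, assuming the tree is supplied as an adjacency list; the node names, including the internal nodes x and y, are hypothetical. It relies on a standard graph fact: a graph with n nodes and exactly n - 1 branches is acyclic if and only if it is connected, so checking the branch count and connectivity is enough to guarantee a unique path between every pair of nodes.

```python
from collections import deque
from typing import Dict, List


def is_tree(adjacency: Dict[str, List[str]]) -> bool:
    """True if the undirected graph is connected and has no cycles."""
    nodes = list(adjacency)
    if not nodes:
        return False
    # Each undirected branch is listed twice, once from each end.
    branch_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    # A tree with n nodes has exactly n - 1 branches; with that many branches,
    # being connected is equivalent to having no cycles.
    if branch_count != len(nodes) - 1:
        return False
    # Breadth-first search from any node must reach every node.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)


# Hypothetical unrooted tree for taxa A-D with internal nodes x and y.
four_taxa = {
    "A": ["x"], "B": ["x"], "x": ["A", "B", "y"],
    "y": ["x", "C", "D"], "C": ["y"], "D": ["y"],
}
print(is_tree(four_taxa))
```

    A triangle of three mutually connected nodes fails the branch-count test (three branches for three nodes), which is exactly the cycle the definition rules out.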

    2. Features of a phylogenetic tree

    In the area of phylogenetic inference, trees are used as visual displays that represent hypothetical, reconstructed evolutionary events. The tree in this case consists of branches and nodes. The external nodes, those at the ends of the branches, represent the taxonomic units being studied, such as species, or genes from living organisms; the internal nodes represent their inferred common ancestors.

    The lengths of the branches usually represent an elapsed time, measured in years, or the number of molecular changes (e.g. mutations) that have taken place between the two nodes. This number is calculated from the degree of difference observed when sequences are compared (see the section on alignments, later).

    Sometimes the lengths are irrelevant and the tree represents only the order of evolution. [In a dendrogram, only the lengths of the horizontal (or vertical, as the case may be) branches count.] Finally, the tree may be rooted or unrooted.

    3. Unrooted trees

    An unrooted tree simply represents phylogenetic relationships but does not provide an evolutionary path. In an unrooted tree, an external node represents a contemporary organism, and internal nodes represent common ancestors of some of the external nodes. In this case, the tree shows the relationships between organisms A, B, C and D but does not tell us anything about the series of evolutionary events that led to these genes (see figure above). There is also no way to tell whether or not a given internal node is a common ancestor of any 2 external nodes.

    4. Rooted trees

    In a rooted tree, one internal node, the root, is the common ancestor of all the external nodes. The root is usually located by including an outgroup, a taxon or sequence known to have branched away before the others; the point at which the outgroup joins the rest of the tree marks the root. The outgroup therefore enables the root of a tree to be located and the correct evolutionary pathway to be identified. For the four-taxon unrooted tree above, five different evolutionary pathways are possible, one for each of the five branches on which the root can be placed, and each is depicted by a different rooted tree.

    Figure. The five rooted trees that can be drawn from the unrooted tree (box). The positions of the roots are indicated by the numbers on the outline of the unrooted tree (box). [Figure labels: branches, internal nodes, external nodes; taxa A, B, C, D]
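    The count of five follows from the fact that a root can be placed on any branch of the unrooted tree, and an unrooted bifurcating tree with n external nodes has 2n - 3 branches (5 when n = 4). The sketch below assumes a hypothetical topology for the unrooted tree (A and B joined at internal node x, C and D at internal node y) and writes each rooted tree in Newick notation, a common text format for trees.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]

# Hypothetical unrooted tree: A and B join at internal node x, C and D at y.
UNROOTED: Tree = {
    "A": ["x"], "B": ["x"], "x": ["A", "B", "y"],
    "y": ["x", "C", "D"], "C": ["y"], "D": ["y"],
}


def branches(tree: Tree) -> List[Tuple[str, str]]:
    """List every undirected branch exactly once."""
    seen = set()
    result = []
    for u, neighbours in tree.items():
        for v in neighbours:
            key = frozenset((u, v))
            if key not in seen:
                seen.add(key)
                result.append((u, v))
    return result


def newick(tree: Tree, node: str, parent: str) -> str:
    """Newick text for the part of the tree reached from node without crossing parent."""
    children = sorted(c for c in tree[node] if c != parent)
    if not children:
        return node  # an external node
    return "(" + ",".join(newick(tree, c, node) for c in children) + ")"


def rooted_versions(tree: Tree) -> List[str]:
    """Place the root on each branch in turn: one rooted tree per branch."""
    return [f"({newick(tree, u, v)},{newick(tree, v, u)});" for u, v in branches(tree)]


for rooted in rooted_versions(UNROOTED):
    print(rooted)
```

    Rooting on the central branch between x and y gives ((A,B),(C,D)); rooting on any of the four external branches makes that taxon the first to branch off.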


    Choice of an outgroup

    The criteria used to choose an outgroup depend very much on the type of analysis that is carried out. Suppose that 4 homologous (orthologous) genes in a tree come from human, chimpanzee, gorilla and orangutan. A useful homologous primate outgroup sequence is that from the baboon, as palaeontological evidence suggests that baboons branched away from the lineage leading to human, chimpanzee, gorilla and orangutan before the time of the common ancestor of the four species (figure below).

    Figure: The use of an outgroup to root a phylogenetic tree. [Taxa shown: baboon (outgroup), orangutan, gorilla, human, chimpanzee]

    5. Inferred and true trees

    We refer to the rooted tree given above as an inferred tree. This is to emphasise that it depicts the series of evolutionary events that are inferred from the data that were analysed, and may not necessarily be the same as the true tree, the one that depicts the actual series of events that occurred. Sometimes we can be fairly confident that the inferred tree is the true tree, but most phylogenetic analyses are subject to uncertainty. Degrees of confidence can be assigned to the branching patterns in an inferred tree using bootstrap analysis (discussed in a later section). Because of the imprecise nature of phylogenetic analysis, controversies have arisen.


    6. Gene trees are not the same as species trees

    The above tree is a gene tree, i.e. a tree derived by comparing orthologous sequences (those derived from the same ancestral sequence). The assumption is that this gene tree is a more accurate reflection of the species tree than one inferred from morphological data. This assumption is generally correct, but it does not mean that the gene tree is the same as the species tree. Mutation and speciation are not expected to occur at the same time. For example, the mutation event could precede the speciation event. This would mean that, to begin with, both alleles would be present in the same unsplit population of the ancestral species. When the population split occurs, it is likely that both alleles will be present in each of the resulting groups. After the split, the new populations evolve independently. One possibility is that, as a result of random genetic drift, one allele is lost from the first population and the other allele is lost from the second. This establishes the two separate genetic lineages that are inferred from phylogenetic analysis of the gene. How do these considerations affect the correspondence between a gene tree and a species tree?

    (a) If a molecular clock is used to date the time at which gene divergence took place, then it cannot be assumed that this is also the time of the speciation event. There can be a significant difference between the time of gene divergence and the time of speciation even though the species tree and gene tree look the same (see figure a, left, below).

    (b) If the first speciation event is followed closely by a second speciation event in one of the two populations, then the branching order of the gene tree might be different from that of the species tree. This can occur if the genes in the modern species are derived from alleles that had already appeared before the first of the two speciation events (see figure b, right, below).

    Figure: (a) The gene tree and species tree look the same; however, mutation might precede speciation, giving an incorrect time for the latter if a molecular clock is used. (b) A gene tree can have a different branching order from a species tree. [Figure labels: mutation, speciation, allele loss; taxa A, B, C]


    7. Tree reconstruction

    In any molecular phylogenetic reconstruction, the following points need to be addressed:

    1. Molecular sequences
    2. Sequence alignment is the essential preliminary to tree reconstruction
    3. Converting the alignment data into a phylogenetic tree
    4. Assessing accuracy of a reconstructed tree
    5. Molecular clocks enable the time of divergence of ancestral sequences to be estimated

    7.1 Molecular sequences

    Nucleic acid (rRNA, DNA) and protein sequences are used in molecular phylogenetic tree construction. DNA yields more phylogenetic information than protein and has become by far the predominant molecule for phylogenetic studies:

    More statistical information from DNA data: The nucleotide sequences of a pair of homologous genes have a higher information content than the amino acid sequences of the corresponding proteins, because mutations that result in synonymous changes alter the DNA sequence but not the protein. DNA data also allow non-coding as well as coding regions of the genome to be examined. The two sequences below illustrate this: at the protein level there is only 1 difference, but at the nucleotide level there are 3 differences.

    Protein 1  -gly-ala-ile-leu-asp-arg-
    DNA 1      -gga-gcc-ata-tta-gat-aga-
    DNA 2      -gga-gca-att-ttt-gat-aga-
    Protein 2  -gly-ala-ile-phe-asp-arg-

    Ease of sequencing DNA: Samples for DNA sequencing can be prepared by PCR, which is a straightforward technique.
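    The protein-versus-DNA comparison above can be reproduced with a short script. This is a minimal sketch, assuming a deliberately partial codon table that contains only the nine codons used in the example (a real analysis would use the full genetic code); it counts differences at both levels for the two sequences shown.

```python
# Partial codon table: ONLY the nine codons used in this example (an assumption
# for brevity; a real analysis would use the full standard genetic code).
CODON_TABLE = {
    "gga": "gly", "gcc": "ala", "gca": "ala", "ata": "ile", "att": "ile",
    "tta": "leu", "ttt": "phe", "gat": "asp", "aga": "arg",
}


def translate(dna: str) -> list:
    """Translate an in-frame DNA string into three-letter amino acid codes."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]


def count_differences(seq1, seq2) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2))


dna1 = "".join(["gga", "gcc", "ata", "tta", "gat", "aga"])
dna2 = "".join(["gga", "gca", "att", "ttt", "gat", "aga"])
print("DNA differences:", count_differences(dna1, dna2))
print("protein differences:", count_differences(translate(dna1), translate(dna2)))
```

    Two of the three nucleotide differences (gcc/gca and ata/att) are synonymous, so only the tta/ttt change shows up in the protein.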

    Protein electrograms, restriction fragment length polymorphism (RFLP), simple sequence length polymorphism (SSLP), single nucleotide polymorphism (SNP) and DNA-DNA hybridization data have also been used for molecular phylogenetic reconstruction. Immunological data from cross-reactivity studies were used for such work as early as 1904.

    7.2 Sequence alignment is the essential preliminary to tree reconstruction

    This is the most important step in molecular phylogeny, and a number of issues have to be considered:

    Sequence homologs: Sequences that are to be aligned should be homologs. An example is the globin genes of different vertebrates. This is to satisfy the phylogeny criterion, which states that the sequences should be derived from a common ancestral sequence.

    Non-homologous sequences: If the sequences are not homologous, and hence do not share a common ancestor, phylogenetic reconstruction methods will still produce a tree, but the tree will have no biological relevance. This type of error commonly occurs when homology analysis is used to assign functions to newly generated gene sequences. BLAST is used extensively as one of the homology analysis methods, so data arising from such analyses should be interpreted with care.

    Easy alignments: Correctly aligning the homologous sequences is the next task. In some cases this is easy. A simple sequence alignment is shown below (asterisks mark identical positions):


    Sequence 1 AGCAATGGCCAGACAATAATG
    Sequence 2 AGCTATGGACAGACATTAATG
               *** **** ****** *****
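    A match line like the one above can be generated automatically. A minimal sketch, assuming the two sequences are already aligned and of equal length (the function name is hypothetical):

```python
def match_line(seq1: str, seq2: str) -> str:
    """'*' where the aligned bases are identical, a space for a difference or a gap."""
    return "".join("*" if a == b and a != "-" else " " for a, b in zip(seq1, seq2))


seq1 = "AGCAATGGCCAGACAATAATG"
seq2 = "AGCTATGGACAGACATTAATG"
print(seq1)
print(seq2)
print(match_line(seq1, seq2))
```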

    Difficult alignments: If sequences have evolved and diverged by accumulating insertions and deletions as well as point mutations, they are not always easy to align. Insertions and deletions cannot be distinguished when pairs of sequences are aligned, so we refer to them as indels. Below is a pair of sequences that is difficult to align, because placing the indel at the correct location is a problem. Two alternative alignments are shown:

    Sequence 1 GACGACCATAGACCAGCATAG
    Sequence 2 GACTACCATAGA-CTGCAAAG
               *** ******** * *** **

    Sequence 1 GACGACCATAGACCAGCATAG
    Sequence 2 GACTACCATAGACT-GCAAAG
               *** *********  *** **
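    Counting identities shows why the indel is hard to place. In the minimal sketch below (the function name is hypothetical), both alignments contain 17 identical positions and a single gap, so a simple identity count cannot choose between them.

```python
def identity_and_gap_counts(seq1: str, seq2: str) -> tuple:
    """Return (identical positions, gap positions) for two aligned, equal-length rows."""
    identities = sum(a == b and a != "-" for a, b in zip(seq1, seq2))
    gaps = sum(a == "-" or b == "-" for a, b in zip(seq1, seq2))
    return identities, gaps


seq1 = "GACGACCATAGACCAGCATAG"
option_1 = "GACTACCATAGA-CTGCAAAG"  # indel placed after position 12
option_2 = "GACTACCATAGACT-GCAAAG"  # indel placed after position 14
print(identity_and_gap_counts(seq1, option_1), identity_and_gap_counts(seq1, option_2))
```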

    The dot matrix technique for alignment: Some alignments can easily be done by "eyeballing" the sequences, whereas others may require pen and paper. The simplest of these manual methods is the dot matrix method. The two sequences are written out along the x- and y-axes of a sheet of graph paper, and a dot is placed in each square where the nucleotide on one axis is identical to the nucleotide on the other. The alignment is indicated by a diagonal series of dots, broken by empty squares where the sequences have nucleotide differences and shifting from one column to another where indels occur.

    Figure: The dot matrix technique for sequence alignments. [Figure labels: a discontinued line of dots indicates a point mutation; an indel is shown by a shift in the column; two possible positions for the indel]
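    The graph-paper procedure translates directly into a grid of comparisons. This minimal sketch (function names are hypothetical) uses the easy pair of sequences from above: a dot is drawn wherever the two bases are identical, the unbroken runs along the main diagonal are the aligned regions, and off-diagonal dots are chance matches.

```python
from typing import List


def dot_matrix(seq_x: str, seq_y: str) -> List[List[bool]]:
    """Row i, column j is True where seq_y[i] is identical to seq_x[j] (a dot on the paper)."""
    return [[y == x for x in seq_x] for y in seq_y]


def render(seq_x: str, seq_y: str) -> str:
    """Draw the grid: seq_x along the top, seq_y down the side, '*' for a dot, '.' for empty."""
    grid = dot_matrix(seq_x, seq_y)
    lines = ["  " + " ".join(seq_x)]
    for base, row in zip(seq_y, grid):
        lines.append(base + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


# The "easy" pair from above; the three empty squares on the main diagonal
# are the three point mutations.
seq_x = "AGCAATGGCCAGACAATAATG"
seq_y = "AGCTATGGACAGACATTAATG"
print(render(seq_x, seq_y))
```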


    Similarity approach, a mathematically based alignment technique: The similarity approach (Needleman and Wunsch, 1970) aims to maximise the number of identical matched nucleotides in the two sequences. The distance method (Waterman, 1976), on the other hand, minimises the number of mismatches. Often the two approaches will identify the same alignment as being the best one.
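    A minimal sketch of the similarity approach, using dynamic programming in the style of Needleman and Wunsch. The scoring scheme (match +1, mismatch -1, gap -2) is an assumption chosen for illustration, not the scores of the original paper or of any particular program. Applied to the difficult pair above (with the gap removed from Sequence 2), it finds an alignment with a single indel and 17 identical positions; which of the equally good indel positions it reports depends on how ties are broken during traceback.

```python
from typing import Tuple

# Hypothetical scoring scheme, chosen for illustration only.
MATCH, MISMATCH, GAP = 1, -1, -2


def needleman_wunsch(a: str, b: str) -> Tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


seq1 = "GACGACCATAGACCAGCATAG"
seq2 = "GACTACCATAGACTGCAAAG"
best, row1, row2 = needleman_wunsch(seq1, seq2)
print(best)
print(row1)
print(row2)
```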

    Multiple alignments are generated for more than two sequences: Multiple alignments can rarely be done with pen and paper, and all the steps required for phylogenetic analysis are undertaken on a computer. Several computer programs are available for generating multiple alignments automatically (discussed later).

    rRNA genes (also known as rDNA) and rRNA have been used as molecular chronometers in phylogenetic studies. Refer to the section on rRNA for detailed notes on the methods of aligning these types of nucleic acids.

    7.3 Converting the alignment data into a phylogenetic tree

    This step is undertaken after an accurate alignment of homologous sequences has been generated.

    To date no one has devised a perfect method for tree construction, and several methods are used. Extensive comparative tests have been conducted with test sequences, yet they have failed to identify any particular method as better than the others.

    The main distinction between the different tree-building methods is the way in which the multiple sequence alignment is converted into numerical data that can be analysed mathematically in order to construct a tree.

    7.3.1 Distance Matrix methods

    7.3.1.1 Least squares distance matrix (modified Jukes & Cantor algorithm)

    Step 1: Generating a similarity matrix.

    Given below is an example alignment of 5 sequences with 25 positions in the alignment:

    Seq A AGAUUCGUCUGUAGGUUUCCACCAA
    Seq B ACAUUCGUGUAUAGGUUUCCACUAA
    Seq C ACAUUCGUGUAGAGGUUUCCACUAA
    Seq D AAGUUCGCUUGGAGGUUUCCACGAA
    Seq E AUCGUGAGAUCCAGGUAUCCACAAU

    The first step in the least squares distance matrix method is to generate a similarity matrix. For this, count the number of identical bases in every pair of sequences in the alignment. For example, the number of identical bases between Seq A and Seq B is 21 out of a total of 25 (| marks an identical position and X a difference). Therefore the similarity between Seq A and Seq B is 21 / 25 = 0.84.

    Seq A AGAUUCGUCUGUAGGUUUCCACCAA
          |X||||||X|X|||||||||||X||
    Seq B ACAUUCGUGUAUAGGUUUCCACUAA

    A similarity matrix is generated using this approach for each pair of sequences, and a similarity table can be produced as shown below.

             A      B      C      D      E
    A    -----  -----  -----  -----  -----
    B    0.84   -----  -----  -----  -----
    C    0.80   0.96   -----  -----  -----
    D    0.76   0.72   0.76   -----  -----
    E    0.52   0.52   0.52   0.52   -----

    From this table, it can be seen that sequences A and B are 0.84 (= 84%) similar, A and C are 0.80 (= 80%) similar, B and C are 0.96 (= 96%) similar, and so on.
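    The similarity table can be computed directly from the alignment. A minimal sketch (variable and function names are hypothetical) that divides the number of identical positions by the alignment length, 25:

```python
from itertools import combinations

# The five aligned sequences from Step 1.
ALIGNMENT = {
    "A": "AGAUUCGUCUGUAGGUUUCCACCAA",
    "B": "ACAUUCGUGUAUAGGUUUCCACUAA",
    "C": "ACAUUCGUGUAGAGGUUUCCACUAA",
    "D": "AAGUUCGCUUGGAGGUUUCCACGAA",
    "E": "AUCGUGAGAUCCAGGUAUCCACAAU",
}


def similarity(seq1: str, seq2: str) -> float:
    """Fraction of aligned positions at which the two sequences are identical."""
    identical = sum(a == b for a, b in zip(seq1, seq2))
    return identical / len(seq1)


similarities = {
    (x, y): similarity(ALIGNMENT[x], ALIGNMENT[y])
    for x, y in combinations(sorted(ALIGNMENT), 2)
}
for (x, y), s in similarities.items():
    print(f"{x}-{y}: {s:.2f}")
```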

    Step 2: Conversion of similarities to evolutionary distances

    Next, evolutionary distances are calculated from the sequence similarities. Conversion of similarities to evolutionary distances starts with 1 - similarity (i.e. converting similarity to difference), which is then usually corrected for the underestimation caused by multiple substitutions (see the box below).

    Evolutionary dissimilarity is usually corrected (fudged) because it is an underestimate of the actual evolutionary distance. Counting differences between two sequences underestimates the number of changes that occurred between them, because more than one evolutionary change at a single position (e.g. A -> G -> U) counts as only one difference between two sequences, and a reversion counts as no change at all (e.g. A -> G -> A). One way to correct evolutionary distances is the Jukes & Cantor method. This method is a conversion relating similarity to evolutionary distance such that difference (dissimilarity) and distance are very close for similar sequences, but distance rises without limit as similarity falls towards 0.25, where evolutionary distance becomes infinite. This makes sense: for two sequences that are very similar, the frequency of multiple changes at a single site is low, requiring only a small correction, whereas two random sequences will appear to be 25% similar just because there are only 4 bases and 1 out of 4 will match by chance.

    Understanding Multiple Substitutions

    Multiple substitution occurs when a single site undergoes two or more changes, as shown in the example.

    Ancestral sequence    ...ATGT...
    Modern sequences      ...AGGT...    ...ACGT...

    There is only one nucleotide difference between the two modern sequences, but two nucleotide substitutions have actually occurred (T -> G in one lineage and T -> C in the other). If this multiple hit is not recognised, the evolutionary distance between the two sequences will be significantly underestimated.

    Distance matrices are therefore usually constructed using mathematical methods that

    include statistical approaches for estimating the amount of multiple substitutions that have

    occurred as explained below.


    With all of the similarities converted to evolutionary distances (whether or not they are

    corrected, or how they are corrected), you have a distance matrix:

    Corrected evolutionary distance (ED)

             A      B      C      D      E
    A    -----  -----  -----  -----  -----
    B    0.18   -----  -----  -----  -----
    C    0.23   0.04   -----  -----  -----
    D    0.29   0.35   0.29   -----  -----
    E    0.77   0.77   0.77   0.77   -----
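    The corrected values in this table agree, to two decimal places, with the standard Jukes and Cantor formula d = -(3/4) ln(1 - (4/3) p), where p = 1 - similarity is the observed proportion of differing sites; whether the "modified" algorithm named in the heading differs in any other detail is not stated in these notes. A minimal sketch converting the similarity table into distances:

```python
import math

# Similarities from the table in Step 1 (pair of sequence names -> similarity).
SIMILARITY = {
    ("A", "B"): 0.84, ("A", "C"): 0.80, ("A", "D"): 0.76, ("A", "E"): 0.52,
    ("B", "C"): 0.96, ("B", "D"): 0.72, ("B", "E"): 0.52,
    ("C", "D"): 0.76, ("C", "E"): 0.52,
    ("D", "E"): 0.52,
}


def jukes_cantor(similarity: float) -> float:
    """Jukes-Cantor corrected distance; infinite once similarity falls to 0.25."""
    p = 1.0 - similarity  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # no more similar than two random sequences
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


distances = {pair: jukes_cantor(s) for pair, s in SIMILARITY.items()}
for (x, y), d in distances.items():
    print(f"{x}-{y}: {d:.2f}")
```

    For small differences the correction is tiny (B-C: 0.04 before and after rounding), while for E, where similarity is close to the 0.25 floor, 0.48 of observed difference becomes a distance of about 0.77.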

    7.3.1.2 Neighbor-joining method for building a tree from a distance matrix

    The neighbor-joining method is a popular tree-building procedure that uses the distance matrix generated by the distance matrix methods described above.

    This is done by starting with two of the sequences, separated by a line equal in length to the evolutionary distance between the sequences:

    Then the next sequence is added to the tree such that the distances between A, B and C are

    approximately equal to the evolutionary distances. Notice that the fit isn't perfect. If we could

    determine the evolutionary distances exactly, they would fit the tree exactly, but since we have

    to estimate these distances, the numbers are fit to the tree as closely as possible using a least-

    squares best fit.

    The next step is to add the next sequence, again re-adjusting the tree to fit the distances as well

    as possible:

    And at last we can add the final sequence and readjust the branch lengths one last time using

    least-squares:


    Notice that the distance between any two sequences is (approximately) equal to the sum of the

    length of the line segments joining those two sequences - in other words, the tree is additive.
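    The description above adds sequences one at a time and refits branch lengths by least squares. For comparison, here is a minimal sketch of neighbor joining in its standard published form (Saitou and Nei), which is not necessarily the exact procedure used in these notes: it repeatedly joins the pair of nodes that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of node i's distances to all others, and replaces the pair with a new node. Run on the corrected distances from the table, it joins B and C first, consistent with them being the most closely related pair. The new node names (node1, node2, ...) are hypothetical labels.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Corrected evolutionary distances from the table above.
PAIRS = {
    ("A", "B"): 0.18, ("A", "C"): 0.23, ("A", "D"): 0.29, ("A", "E"): 0.77,
    ("B", "C"): 0.04, ("B", "D"): 0.35, ("B", "E"): 0.77,
    ("C", "D"): 0.29, ("C", "E"): 0.77,
    ("D", "E"): 0.77,
}

Join = Tuple[str, str, str, float, float]  # (child 1, child 2, new node, length 1, length 2)


def neighbor_joining(names: List[str], pairs: Dict[Tuple[str, str], float]):
    """Return the list of joins and the final branch connecting the last two nodes."""
    d: Dict[FrozenSet[str], float] = {frozenset(p): v for p, v in pairs.items()}
    nodes = list(names)
    joins: List[Join] = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        len_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = dij - len_i
        new = f"node{len(joins) + 1}"
        # Distance from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, new, len_i, len_j))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])


joins, last = neighbor_joining(["A", "B", "C", "D", "E"], PAIRS)
for child1, child2, parent, len1, len2 in joins:
    print(f"join {child1} ({len1:.3f}) and {child2} ({len2:.3f}) -> {parent}")
print("final branch:", last)
```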

    Interpretation of the phylogenetic tree:

    One way to think about it is to imagine that you're looking down on a real tree, with the branches spread out horizontally and the vertical trunk, coming up at you, hidden from view but probably somewhere near the center of the branches. The nodes connecting different sets of branches represent common ancestors of those branches. This tree is unrooted: the single common ancestor of all of the sequences cannot be determined from this tree. Some people prefer dendrograms (see below) because evolutionary distance is easily visualized. In this example, sequences B and C are the most closely related. Each of these is somewhat less similar to A (B is a little closer to A than C is, which is why the branch to B is shorter than the branch to C). A, B and C are less similar to D, and E is only distantly related to the rest.

    Here is another way of looking at the same tree: a dendrogram.

    A dendrogram shows evolutionary distances along the horizontal axis and assumes a root somewhere in the middle of the tree, in this case in the branch connecting sequence E to the rest of the tree. Some people like this representation because the horizontal axis roughly approximates time.

    Whenever possible, it is best to include an outgroup sequence in the analysis; an outgroup is a

    sequence that is known to be outside of the group you're interested in treeing. For example, if

    you were building trees from mammalian sequences, you might include the sequence from a

    reptile as an outgroup. Outgroups provide the root to the rest of the tree - although no tree

    generated by these methods has a real root, if you know (from other information) that one of the

    sequences is unrelated to the rest, wherever that branch connects to the rest of the tree defines

    the root (common ancestor) of that portion of the tree. In the example above, sequences A - D might be mammalian whereas E might be a reptilian sequence. If the tree included only mammalian sequences, it would be impossible to know where the root is, but the inclusion of an outgroup provides that information.

    7.3.2 Maximum Parsimony

    The neighbor-joining method is a simple way of creating trees because the information content of the multiple alignment is reduced to its simplest form, a single distance for each pair of sequences. Unfortunately, as a result, information is lost, in particular the information pertaining to the ancestral identities at each position in the multiple alignment.

    The maximum parsimony method uses a model in which it is assumed that evolution follows the shortest possible route, and the correct phylogenetic tree is therefore the one that requires the minimum number of nucleotide changes to produce the observed differences between the sequences. Trees are therefore constructed at random and the number of nucleotide changes that each involves is calculated, until all topologies have been examined and the one requiring the smallest number of steps has been identified. This is presented as the most parsimonious inferred tree.
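    The number of changes a particular tree requires can be counted one alignment position at a time. One standard way to do this, not named in these notes, is Fitch's algorithm: working from the tips towards the root, each internal node receives the intersection of its two children's possible bases if they share any, otherwise their union plus one change. The sketch below applies it to the three possible topologies for four short sequences that are invented for illustration.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees

# Hypothetical aligned sequences, invented for illustration.
SEQUENCES = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GGG"}


def fitch_site(tree: Tree, seqs: Dict[str, str], site: int) -> Tuple[Set[str], int]:
    """Return the possible ancestral bases and the number of changes at one site."""
    if isinstance(tree, str):
        return {seqs[tree][site]}, 0
    left_set, left_cost = fitch_site(tree[0], seqs, site)
    right_set, right_cost = fitch_site(tree[1], seqs, site)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total number of changes the tree requires, summed over all sites."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


# The three possible topologies for four taxa, each written with an arbitrary root
# (the Fitch count does not depend on where the root is placed).
TOPOLOGIES = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
scores = {str(t): parsimony_score(t, SEQUENCES) for t in TOPOLOGIES}
print(scores)
print("most parsimonious:", min(scores, key=scores.get))
```

    With these invented sequences the grouping (A,B),(C,D) needs 4 changes, against 6 and 5 for the alternatives, so it is the tree parsimony would choose.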

    Maximum parsimony is more rigorous but requires more data handling: the more sequences are added, the more trees need to be examined. For example, with five sequences there are only 15 possible unrooted trees, but with 10 sequences there are 2,027,025 unrooted trees, and with 50 sequences the number is about 2.8 × 10^74. Not even supercomputers can evaluate all of these trees with the maximum parsimony method. This is also true for more sophisticated methods such as maximum likelihood and fastDNAml.
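    These counts follow from standard formulas for bifurcating trees with n labelled external nodes: (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees, where k!! is the product of the odd numbers from 1 up to k. The minimal sketch below (function names are hypothetical) reproduces the figures for 5 and 10 sequences and gives the value for 50.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_tree_count(n: int) -> int:
    """Number of unrooted bifurcating trees for n >= 3 labelled sequences: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def rooted_tree_count(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 labelled sequences: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


for n in (4, 5, 10, 50):
    print(n, f"{unrooted_tree_count(n):.3e}")
```

    Each unrooted tree has 2n - 3 branches on which a root could be placed, which is why the rooted count is the unrooted count multiplied by 2n - 3.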

    7.3.4 Maximum likelihood

    NOT FOR YEAR 2000 LECTURES

    7.3.5 fastDNAml (fast DNA maximum likelihood)

    NOT FOR YEAR 2000 LECTURES

    7.4 Bootstrapping: Assessing accuracy of a reconstructed tree

    7.5 Molecular clocks enable the time of divergence of ancestral sequences to be estimated


    GLOSSARY OF TERMS:

    Allele: One of two or more alternative forms of a gene.

    Allele frequency: The frequency of an allele in a population.

    Allele-specific oligonucleotide (ASO) hybridization: The use of an oligonucleotide probe to determine which of two alternative nucleotide sequences is contained in a DNA molecule.

    Ancestral character state: A character state possessed by a remote common ancestor of a group

    of organisms.

    Ancient DNA: DNA preserved in ancient biological samples.

    Bootstrapping or bootstrap analysis: A method of inferring the degree of confidence that can be assigned to a branch point in a phylogenetic tree.

    CAP: The chemical modification at the 5'-end of most eucaryotic mRNA molecules.

    CAP binding complex: The complex, also called eIF-4F and comprising the initiation factors eIF-4A, eIF-4E and eIF-4G, which makes the initial attachment to the CAP structure at the beginning of the scanning phase of eucaryotic translation.

    Chimera: An organism composed of two or more genetically different cell types.

    Chromosome walking: A technique that can be used to construct a clone contig by identifying

    overlapping fragments of cloned DNA.

    Clone contig: A collection of clones whose DNA fragments overlap.

    Clone contig approach: A genome sequencing strategy in which the molecules to be sequenced

    are broken into manageable segments, each a few hundred kb or few Mb in length, which are

    sequenced individually.

    Codon: A triplet of nucleotides coding for a single amino acid.


    Codon bias: Refers to the fact that not all codons are used equally frequently in the genes of a particular organism.

    Concatemer: A DNA molecule made up of linear genomes linked head-to-tail.

    Consensus sequence: A nucleotide sequence that represents the "average" of a number of related but nonidentical sequences.

    Contig: A contiguous set of overlapping DNA sequences.

    Contour-clamped homogeneous electric field (CHEF): An electrophoresis method used to separate large DNA molecules.

    Convergent evolution: The situation that occurs when the same character state evolves

    independently in two lineages.

    Degenerate: Refers to the fact that the genetic code has more than one codon for most amino acids.

    Derived character state: A character state that evolved in a recent ancestor of a subset of organisms in a group being studied.

    Directed evolution: A set of experimental techniques that is used to obtain novel genes with

    improved products.

    Discontinuous gene: A gene that is split into exons and introns.

    Distance matrix: A table showing the evolutionary distances between all pairs of nucleotide

    sequences in a dataset.

    Distance method: A rigorous mathematical approach to aligning nucleotide sequences (by minimising the number of mismatches).

    Domain shuffling: Rearrangements of segments of one or more genes, each segment coding for

    a structural domain in the gene product, to create a new gene.

    Dominant: The allele that is expressed in a heterozygote.

    Dot matrix: A method for aligning nucleotide sequences.

    Exon: A coding region within a discontinuous gene.

    Exon theory of genes: An "introns early" hypothesis which states that introns were formed when the first DNA genomes were being formed.

    Expressed Sequence Tags (EST): A cDNA that is sequenced in order to gain rapid access to the genes in a genome.

    External node: The end of a branch in a phylogenetic tree, representing one of the organisms or DNA sequences being studied.


    Field Inversion gel electrophoresis (FIGE):

    Molecular Clock: A device based on the inferred mutation rate that enables times to be assigned

    to the branch points in a gene tree.

    Molecular evolution: The gradual changes that occur in genomes over time due to the accumulation of mutations and structural rearrangements resulting from recombination and transposition.

    Molecular phylogenetics: A set of techniques that enable the evolutionary relationships between

    DNA sequences to be inferred by making comparisons between those sequences.

    Multigene family: A group of genes, clustered or dispersed, with related nucleotide sequences.

    Multiple alignment: An alignment of three or more nucleotide sequences.

    Multiple hit or multiple substitution: The situation that occurs when a single nucleotide in a DNA sequence undergoes two mutational changes, giving rise to two new alleles, both of which

    differ from each other and from the parent at that nucleotide position.

    Multiregional evolution: A hypothesis that states that modern humans in the Old World are descended from Homo erectus populations that left Africa over 1 million years ago.

    Natural selection: The preservation of favourable alleles and the rejection of injurious ones.

    Neighbor-Joining method: A method for the construction of phylogenetic trees.

    Nuclear genome: The DNA molecules present in the nucleus of eucaryotic cells.

    Open Reading Frame (ORF): A series of codons starting with an initiation codon and ending

    with a termination codon. The part of the protein-coding region that is translated into proteins.

    Orphan family: A group of homologous genes whose functions are unknown.

    OFAGE:

    Orthologous: Refers to homologous genes located in the genomes of different organisms.

    Outgroup: An organism or DNA sequence that is used to root a phylogenetic tree.

    Overlapping genes: Two genes whose coding regions overlap.

    Paralogous: Refers to two or more homologous genes located in the same genome.

    Parsimony: An approach that decides between different phylogenetic tree topologies by

    identifying the one that involves the shortest evolutionary pathway.

    Phylogeny: A classification scheme that indicates evolutionary relationships between

    organisms.

    Proteome: The complete protein content of a cell.


    Protogenome: An RNA genome that existed during the RNA world.

    Selfish DNA: DNA that appears to have no function and apparently contributes nothing to the

    cell in which it is found.

    Sequence tagged site (STS): A DNA sequence that is unique in the genome.