Evolutionary Biology Concepts
Molecular Evolution
Phylogenetic Inference
BIO520 Bioinformatics Jim Lund
Reading: Ch7
Evolution
Evolution is a process that results in heritable changes in a population spread over many generations.
"In fact, evolution can be precisely defined as any change in the frequency of alleles within a gene pool from one generation to the next." - Helena Curtis and N. Sue Barnes, Biology, 5th ed. 1989 Worth Publishers, p.974
Levels of Evolution
• Changes in allele frequencies within a species.
• Speciation.
• Molecular changes:
  – Single bp changes.
  – Genomic changes (alterations in large DNA segments).
Branching Descent
[Figure: branching descent, shown for populations and for individuals]
Phylogeny
Branching diagram showing the ancestral relations among species.
“Tree of Life”
History of evolutionary change
FRAMEWORK for INFERENCE
The framework for phylogenetics
• How do we describe phylogenies?
• How do we infer phylogenies?
Inheritance
DNA → RNA → Protein → Function
Common Phylogenetic Tree Terminology
[Figure: rooted tree with terminal nodes A, B, C, D, E]
• Ancestral node, or ROOT, of the tree
• Internal nodes, or divergence points (represent hypothetical ancestors of the taxa)
• Branches, or lineages
• Terminal nodes A, B, C, D, E (represent the TAXA, such as genes, populations, or species, used to infer the phylogeny)
Phylogenetic trees diagram the evolutionary relationships between the taxa
((A,(B,C)),(D,E)) = the above phylogeny written as nested parentheses (Newick format)
[Figure: the same phylogeny drawn with terminal nodes in the order Taxon A, B, C, E, D]
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
Branch length, the other dimension, can either have no scale (for ‘cladograms’) or be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive’ trees).
These trees say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
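The nested-parentheses notation can be read directly by a program. Below is a minimal sketch, assuming topology-only Newick strings with no branch lengths or quoted labels. `parse_newick` and `clades` are hypothetical helper names. The sketch turns the string into nested tuples and lists the set of taxa under each internal node. Those sets are the clades, and they stay the same however the taxa are ordered.

```python
from typing import List, Tuple, Union

# A tree is either a taxon name (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def parse_newick(text: str) -> Tree:
    """Parse a topology-only Newick string such as "((A,(B,C)),(D,E))".

    Sketch only: branch lengths, quoted labels, and comments are not handled.
    """
    s = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a taxon label up to the next delimiter.
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos].strip()

    tree = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def clades(tree: Tree) -> List[frozenset]:
    """Collect the set of taxa below each internal node; each set is a clade."""
    found: List[frozenset] = []

    def leaves(node: Tree) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.append(below)
        return below

    leaves(tree)
    return found


tree = parse_newick("((A,(B,C)),(D,E))")
print(tree)  # (('A', ('B', 'C')), ('D', 'E'))
# Reordering taxa top to bottom does not change the clades.
print(set(clades(tree)) == set(clades(parse_newick("((E,D),((C,B),A))"))))  # True
```

The tree yields the clades {B, C}, {A, B, C}, {D, E}, and the root clade of all five taxa. This matches the relationships described in the text.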
[Figure: the same tree of Taxa A, B, C, D drawn twice: as a phylogram whose branch lengths (labeled 1, 1, 6, 5) measure genetic change, and as a cladogram whose branch lengths have no meaning]
Two types of trees
Cladogram vs. phylogram (or additive tree)
Meaning of branch length differs.
All show the same evolutionary relationships, or branching orders, between the taxa.
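In a phylogram, or additive tree, the distance between two taxa is the sum of the branch lengths on the path that joins them. In a cladogram, only the branching order carries information. Below is a minimal sketch using a made-up tree; the lengths are hypothetical and are not taken from the figure. Each node is stored with its parent and branch length, and the path length is read off at the nearest shared ancestor.

```python
from typing import Dict, Tuple

# Hypothetical additive tree, stored as child -> (parent, branch length).
# Newick equivalent: ((A:1,B:2):3,(C:4,D:1):2);
PARENT: Dict[str, Tuple[str, float]] = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 3.0),
    "C": ("n2", 4.0),
    "D": ("n2", 1.0),
    "n2": ("root", 2.0),
}


def distances_to_ancestors(node: str) -> Dict[str, float]:
    """Map each ancestor of `node` (and node itself) to the summed branch length."""
    dist = {node: 0.0}
    total = 0.0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        dist[node] = total
    return dist


def path_length(x: str, y: str) -> float:
    """Sum of branch lengths on the path between two taxa (additive distance)."""
    up_x = distances_to_ancestors(x)
    up_y = distances_to_ancestors(y)
    # With positive branch lengths, the nearest shared ancestor gives the minimum.
    return min(up_x[a] + up_y[a] for a in up_x if a in up_y)


for pair in [("A", "B"), ("C", "D"), ("A", "C")]:
    print(pair, path_length(*pair))
# ('A', 'B') 3.0
# ('C', 'D') 5.0
# ('A', 'C') 10.0
```

If the same topology were drawn as a cladogram, these numbers would be meaningless. Only the pairings (A, B) and (C, D) would remain.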
Rooted vs Unrooted Trees
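One practical difference between rooted and unrooted trees is the number of possible topologies. The slides do not state the counts, but a standard result gives them for fully bifurcating trees with labeled tips: $(2n-5)!!$ unrooted and $(2n-3)!!$ rooted trees for $n$ taxa. Below is a minimal sketch of that count. The function names are hypothetical.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n: int) -> int:
    """Distinct unrooted, fully bifurcating trees for n labeled taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


def rooted_trees(n: int) -> int:
    """Distinct rooted, fully bifurcating trees for n labeled taxa: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n - 3)


for n in (4, 6, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
# 4 3 15
# 6 105 945
# 10 2027025 34459425
```

An unrooted tree has $2n-3$ branches, and a root can be placed on any of them. This is why each rooted count is the unrooted count times $2n-3$. For six taxa such as A through F, there are already 945 rooted trees to consider.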
More Trees
[Figure: trees relating taxa A through F]
Trees-3
[Figure: trees relating taxa A through F]
Extinction
[Figure: tree relating taxa A through F]
Population Genetic Forces
• Natural Selection (fitness)
• Drift (homozygosity by chance)
  – much greater in small populations
• Mutation/Recombination (variation)
• Migration
  – homogenizes gene pools
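The claim that drift is much greater in small populations can be checked by simulation. Below is a minimal sketch, assuming the neutral Wright-Fisher model, a standard population-genetics model that the slides do not name. The model has no selection, mutation, or migration. Each generation, 2N gene copies are resampled from the current allele frequency. The function names and default values are hypothetical.

```python
import random
from typing import Tuple


def drift_until_fixation(pop_size: int, p0: float = 0.5, seed: int = 0,
                         max_gen: int = 100_000) -> Tuple[int, float]:
    """Neutral Wright-Fisher drift in a diploid population of `pop_size`.

    Each generation resamples 2N gene copies from the current allele
    frequency. Returns the number of generations run and the final
    frequency (0.0 or 1.0 once the allele is lost or fixed).
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p, gen = p0, 0
    while 0.0 < p < 1.0 and gen < max_gen:
        # Binomial sample of 2N copies, drawn one copy at a time (stdlib only).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        gen += 1
    return gen, p


def mean_fixation_time(pop_size: int, trials: int = 50) -> float:
    """Average generations until loss or fixation, over seeded replicate runs."""
    times = [drift_until_fixation(pop_size, seed=s)[0] for s in range(trials)]
    return sum(times) / trials


print(mean_fixation_time(10), mean_fixation_time(50))
```

In runs like these, the population of 10 typically reaches homozygosity several times sooner than the population of 50. In the neutral model, the expected time scales roughly with N.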
Hardy-Weinberg Paradigm
$p + q = 1$
$p^2 + 2pq + q^2 = 1$
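Here $p$ and $q$ are the frequencies of the two alleles at a locus. $p^2$, $2pq$, and $q^2$ are the expected genotype frequencies under random mating when none of the forces above is acting. Below is a minimal sketch. The genotype counts are hypothetical, and the function names are illustrative.

```python
from typing import Dict


def hardy_weinberg(p: float) -> Dict[str, float]:
    """Expected genotype frequencies for a two-allele locus at equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # p + q = 1
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency p of allele A: each AA carries two copies, each Aa one."""
    return (2 * n_AA + n_Aa) / (2 * (n_AA + n_Aa + n_aa))


# Hypothetical genotype counts for 100 individuals.
p = allele_frequency(36, 48, 16)  # (72 + 48) / 200 = 0.6
expected = hardy_weinberg(p)      # AA 0.36, Aa 0.48, aa 0.16
print(p, {g: round(f, 2) for g, f in expected.items()})
```

Observed counts that depart strongly from the expected frequencies suggest that one of the listed forces, or non-random mating, is at work.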
Modes of speciation
Speciation can occur in many ways; among the most common are:
• Geographic isolation.
• Reproductive isolation.
  – Sexual selection.
  – Behavioral isolation.
DNA, protein sequence change
Multiple Changes/No Change
..CCU AUA GGG..
..CCC AUA GGG..
..CCC AUG GGG..
..CCC AUG GGC..
..CCU AUG GGC..
..CCU AUA GGC..
5 mutations, 1 net DNA change
0 amino acid changes (net)
Counting observed bp/aa differences underestimates the amount of evolutionary change.
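The lineage above can be tallied mechanically. Below is a minimal sketch that uses a codon table from the standard genetic code, restricted to the six codons in the example. It counts the mutations along the path and compares them with the net differences between the first and last sequences, at both the DNA and the amino acid level. The helper names are hypothetical.

```python
# Standard genetic code, restricted to the codons that appear in the example.
CODON_TO_AA = {
    "CCU": "Pro", "CCC": "Pro",
    "AUA": "Ile", "AUG": "Met",
    "GGG": "Gly", "GGC": "Gly",
}

# The lineage from the slide, one sequence per step.
LINEAGE = [
    "CCUAUAGGG",
    "CCCAUAGGG",
    "CCCAUGGGG",
    "CCCAUGGGC",
    "CCUAUGGGC",
    "CCUAUAGGC",
]


def differences(a, b) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def translate(seq: str) -> list:
    """Translate a reading frame codon by codon."""
    return [CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3)]


steps = list(zip(LINEAGE, LINEAGE[1:]))
mutations = sum(differences(a, b) for a, b in steps)
aa_changes_along_path = sum(differences(translate(a), translate(b)) for a, b in steps)
net_dna = differences(LINEAGE[0], LINEAGE[-1])
net_aa = differences(translate(LINEAGE[0]), translate(LINEAGE[-1]))

print(mutations, net_dna)              # 5 1
print(aa_changes_along_path, net_aa)   # 2 0
```

Along the path, Ile changes to Met (AUA to AUG) and later back again. The lineage therefore passed through two amino acid changes, but a comparison of only its endpoints shows none.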
Mechanisms of DNA Sequence Change
Neutral Drift vs Natural Selection
Traditional selection model
Neutral (Kimura/Jukes)
Pan-neutralism
Rate of change (evolution) of hemoglobin protein
Each point on the graph is for a pair of species, or groups of species. From Kimura (1983) by way of Evolution, Ridley, 3rd ed.
Substitution rate varies Gene-to-Gene
Protein        Rate (substitutions per site per 10^9 yr)
Lysozyme       2.0
Insulin        0.4
Histone H4     0.01
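To get a feel for what these rates mean, you can turn them into expected divergence between two species. Below is a minimal sketch, assuming the rates are substitutions per site per 10^9 years. The 100-million-year split is a hypothetical value. Both lineages change independently after the split, so the expected number of substitutions separating them is $2rT$.

```python
# Rates from the table, assumed to be substitutions per site per 10^9 years.
RATES = {"Lysozyme": 2.0, "Insulin": 0.4, "Histone H4": 0.01}


def expected_substitutions(rate_per_gyr: float, split_myr: float) -> float:
    """Expected substitutions per site separating two species.

    Both lineages evolve independently after the split, hence the factor 2.
    """
    return 2.0 * rate_per_gyr * (split_myr / 1000.0)


SPLIT_MYR = 100.0  # hypothetical divergence time, in millions of years
for protein, rate in RATES.items():
    print(protein, round(expected_substitutions(rate, SPLIT_MYR), 3))
# Lysozyme 0.4
# Insulin 0.08
# Histone H4 0.002
```

These are expected substitutions, not observed differences. For fast-evolving proteins over long times, multiple changes at the same site (see "Multiple Changes/No Change") make the observed differences smaller than these numbers.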
Rate varies Site-to-Site
Protein        Coding (replacement)   Silent
Albumin        0.9                    6.7
Histone H4     0.03                   6.1
Average        0.9                    4.6
(Rates in substitutions per site per 10^9 yr.)
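The silent-to-coding ratio makes the site-to-site contrast explicit. Silent rates are similar across these genes, at 6.1 to 6.7. Coding rates, by contrast, differ 30-fold between albumin and histone H4. Below is a short computation using only the numbers in the table above.

```python
# (coding rate, silent rate) from the table, same units for both columns.
TABLE = {
    "Albumin": (0.9, 6.7),
    "Histone H4": (0.03, 6.1),
    "Average": (0.9, 4.6),
}

ratios = {gene: silent / coding for gene, (coding, silent) in TABLE.items()}
for gene, ratio in ratios.items():
    print(f"{gene}: silent/coding = {ratio:.1f}")
# Albumin: silent/coding = 7.4
# Histone H4: silent/coding = 203.3
# Average: silent/coding = 5.1
```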
Rate varies Site-to-Site
From Evolution, Mark Ridley, 3rd ed.
Constraints on “Silent” Changes
• Codon biases
  – translation rates
• Transcription elongation rates
  – polymerase ‘pause’ sites
• “Silent” regulatory elements
  – selection for or against their presence/absence
• Overall genome structure
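Codon bias can be measured by comparing how often each synonymous codon is used. Below is a minimal sketch using relative synonymous codon usage (RSCU), a common codon-bias measure that the slides do not name. RSCU divides each codon's count by the count expected if all synonyms were used equally. The reading frame is hypothetical, and the sketch looks only at the four glycine codons.

```python
from collections import Counter
from typing import Dict, List

# Synonymous codons for glycine in the standard genetic code.
GLY_CODONS: List[str] = ["GGU", "GGC", "GGA", "GGG"]


def rscu(seq: str, synonyms: List[str]) -> Dict[str, float]:
    """Relative synonymous codon usage: observed count divided by the count
    expected if all synonymous codons were used equally (1.0 = no bias)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in synonyms)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in synonyms}
    expected = total / len(synonyms)
    return {c: counts[c] / expected for c in synonyms}


# Hypothetical reading frame: six glycine codons, four of them GGC.
print(rscu("GGCGGCGGCGGUGGAGGC", GLY_CODONS))
```

Here GGC scores well above 1 and GGG scores 0. A change between such codons is "silent" at the protein level but can still be constrained.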
DNA, Protein Similarity
• Similarity by common descent
  – phylogenetic
• Similarity by convergence (rare)
  – functional importance
• Similarity by chance
  – random variation is not limitless
  – a particular problem at wide divergence
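Random variation is not limitless because sequences are built from small alphabets. Two unrelated sequences therefore match at some positions purely by chance. Below is a minimal sketch, assuming independent positions and ungapped comparison. Under these assumptions, the expected chance identity is the sum of the squared residue frequencies. The function names are hypothetical.

```python
import random
from typing import Sequence


def expected_chance_identity(freqs: Sequence[float]) -> float:
    """Probability that two unrelated positions match: sum of squared frequencies."""
    return sum(f * f for f in freqs)


def simulated_identity(length: int, alphabet: str, seed: int = 0) -> float:
    """Fraction of identical positions between two random, ungapped sequences."""
    rng = random.Random(seed)
    a = [rng.choice(alphabet) for _ in range(length)]
    b = [rng.choice(alphabet) for _ in range(length)]
    return sum(x == y for x, y in zip(a, b)) / length


print(expected_chance_identity([0.25] * 4))            # 0.25 (uniform DNA)
print(expected_chance_identity([0.05] * 20))           # about 0.05 (uniform protein)
print(expected_chance_identity([0.1, 0.4, 0.4, 0.1]))  # about 0.34 (GC-rich DNA)
print(simulated_identity(100_000, "ACGT"))             # close to 0.25
```

Widely diverged homologs can fall toward this background level of identity. At that point, similarity by descent and similarity by chance become hard to separate.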
Homology: similarity by common descent
[Figure: homologous sequences CCCAGG, CCCAAG, CCCAAA, CCTAAA related by descent]
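Pairwise differences are the simplest summary of how these homologs relate. Below is a minimal sketch that counts mismatches (Hamming distance) between the aligned sequences from the figure. The helper names are hypothetical. In the order listed, each sequence differs from the next by a single substitution, and distances grow with separation along that chain.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

SEQUENCES = ["CCCAGG", "CCCAAG", "CCCAAA", "CCTAAA"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_table(seqs: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Hamming distance for every unordered pair of sequences."""
    return {(a, b): hamming(a, b) for a, b in combinations(seqs, 2)}


for (a, b), d in distance_table(SEQUENCES).items():
    print(a, b, d)
```

The distances alone do not say which sequence is ancestral, or where the root lies.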
Inferring Trees and Ancestors
[Figure: inferring ancestor-descendant relationships among the sequences CCCAGG, CCCAAG, CCCAAA, CCTAAA, CCTAAC]
Inference is not always straightforward: the data do not always give a single, correct answer.
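One way to choose among candidate trees is to score each tree with an optimality criterion. Below is a minimal sketch using Fitch's small-parsimony algorithm. This is one common criterion; the slides do not say which criterion they have in mind. The algorithm counts the minimum number of substitutions a fixed, rooted, binary tree requires. It is applied here to the four homologs CCCAGG, CCCAAG, CCCAAA, and CCTAAA. The three candidate trees correspond to the three possible unrooted topologies for four taxa, and the Fitch score does not depend on where the root is placed.

```python
from typing import Set, Tuple, Union

# A rooted binary tree: a leaf is an aligned sequence, an internal node a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate state set at this node and the minimum number of
    changes needed in the subtree below it.
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left, right = tree  # binary trees only
    left_set, left_cost = fitch_site(left, site)
    right_set, right_cost = fitch_site(right, site)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def first_leaf(tree: Tree) -> str:
    while not isinstance(tree, str):
        tree = tree[0]
    return tree


def parsimony_score(tree: Tree) -> int:
    """Total minimum number of changes over all sites."""
    return sum(fitch_site(tree, i)[1] for i in range(len(first_leaf(tree))))


CANDIDATES = {
    "((CCCAGG,CCCAAG),(CCCAAA,CCTAAA))": (("CCCAGG", "CCCAAG"), ("CCCAAA", "CCTAAA")),
    "((CCCAGG,CCCAAA),(CCCAAG,CCTAAA))": (("CCCAGG", "CCCAAA"), ("CCCAAG", "CCTAAA")),
    "((CCCAGG,CCTAAA),(CCCAAG,CCCAAA))": (("CCCAGG", "CCTAAA"), ("CCCAAG", "CCCAAA")),
}
for label, tree in CANDIDATES.items():
    print(label, parsimony_score(tree))
# ((CCCAGG,CCCAAG),(CCCAAA,CCTAAA)) 3
# ((CCCAGG,CCCAAA),(CCCAAG,CCTAAA)) 4
# ((CCCAGG,CCTAAA),(CCCAAG,CCCAAA)) 4
```

Here one tree is clearly best under parsimony. With other data, several trees can tie, or a different criterion can prefer a different tree.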
Homology, Orthology, Paralogy
Paralogy Trap
Improper Inference
Garbage in, garbage out!
Our Goals
• Infer phylogeny
  – Optimality criteria
  – Algorithm
• Phylogenetic inference
  – (interesting ones)
Watch Out
“The danger of generating incorrect results is inherently greater in computational phylogenetics than in many other fields of science.”
“…the limiting factor in phylogenetic analysis is not so much in the facility of software application as in the conceptual understanding of what the software is doing with the data.”