What is a Phylogenetic Tree? (iammdelhi.org)
TARU SINGH
What is a phylogenetic tree used for?
• A phylogenetic tree is used to represent evolutionary relationships between organisms that are believed to share a common ancestry.
• "Dendrogram" is the broad term for tree diagrams.
Where did the idea for a tree come from?
• Charles Darwin is often credited with the earliest well-known representation of a phylogenetic tree: the only illustration in his book On the Origin of Species (1859) is a branching diagram of descent.
What does this tree look like?
• There are many different ways to represent the information found in a phylogenetic tree.
• The basic format of a tree is generally in one of the two forms shown, although there are other ways to represent the data.
What do the lines represent?
• Each terminal line (tip) on the tree represents one particular organism of interest.
• The length of the lines separating two organisms indicates how closely they are related, or how long ago they may have shared a common ancestor.
• The line that connects all the other lines represents the common ancestor against which the other organisms are compared.
The “Rooted” vs. “Unrooted” tree
• A rooted tree is used to make inferences about the most recent common ancestor of the leaves or branches of the tree. The root is most commonly placed by using an "outgroup".
• An unrooted tree illustrates the relationships among the leaves or branches without making assumptions about a common ancestor.
The bifurcating tree
• In a bifurcating tree, exactly two descendants arise from each interior node.
The multi-furcating tree
• In a multifurcating tree, more than two descendants can arise from an interior node.
Where do I go to make a tree?
• Many computational biology packages include tree-building (dendrogram) programs.
• One example is ClustalW or ClustalX, freely available from the EMBL-EBI (European Bioinformatics Institute). Choose the program that suits your computer setup: ClustalW runs from the command line, while ClustalX has a graphical interface.
What criteria are important when building a tree?
• There are many different things that you should consider as you get set to build your tree.
• Some examples are:
– Efficiency
– Power
– Consistency/Reliability
– Robustness
Limitations to the use of trees
• It is important to remember that trees do have limitations. For example, trees are meant to provide insight into a research question and not intended to represent an entire species history.
• Several factors, such as horizontal gene transfer, can affect the resulting tree.
• Limitations related to DNA degradation over time must be considered, especially for evolutionary trees of ancient or extinct organisms.
PHYLOGENETIC TREE FORMATION
Accession no.
→ Nucleotide BLAST
→ Sequences of the related organisms obtained
→ Notepad file
→ Get the phylogenetic tree at http://align.genome.jp/
Biological sequence databases
• Electronic reservoirs of information.
• Nucleotide sequence databases – GenBank (NCBI), in collaboration with DDBJ and EMBL.
• Protein sequence databases – SWISS-PROT and PIR.
• Macromolecular structure database – PDB (Protein Data Bank).
• Information database – OMIM (Online Mendelian Inheritance in Man).
• Literature databases – MEDLINE, AIDSLINE.
What is a nucleotide sequence?
• The order of nucleotides in a DNA or RNA molecule.
CGTAACCAAGGTTAACCTTGGTTACG
• Any succession of more than four nucleotides may be called a sequence.
What is an alignment?
• Sequence alignment is an arrangement of two or more sequences that highlights their similarity.
tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
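The vertical bars in an alignment display mark identical columns. A minimal Python sketch of how such a match line can be derived from two already-aligned sequences, assuming gaps are written as "-" and that a gap never counts as a match (the helper name `match_line` is our own, not from any package):

```python
def match_line(top, bottom):
    """Return a string with '|' under identical, non-gap columns of two aligned sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))

top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
bars = match_line(top, bottom)
print(top)
print(bars)
print(bottom)
print("identical columns:", bars.count("|"), "of", len(top))
```

For the example pair this reports 29 identical columns out of 35.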
Why align sequences?
• To find evolutionary relationships between two or more genes or proteins.
• To find structurally or functionally similar regions within proteins.
SEQUENCE ALIGNMENT
MATCH: corresponding nucleotides are vertically aligned.
ATGGCAT
ATGGCAT
GAP: when a residue seems to have been deleted or inserted, the gap is represented by dashes (INDELS).
ATGAGCAT
ATG-GCAT
Example: sequence alignment
Task: align "abcdef" with "abdgf".
Write the second sequence below the first:
abcdef
abdgf
Move the sequences to give the maximum match between them, and show matching characters with a vertical bar:
abcdef
||
abdgf
Insert a gap between b and d in the lower sequence so that d and f also align:
abcdef
|| | |
ab-dgf
Note that e and g don't match, but this is the best alignment that can be produced.
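The trial-and-error steps in this example are what a global alignment algorithm automates. Below is a minimal Needleman-Wunsch sketch in Python. The scores (match +1, mismatch -1, gap -1) are illustrative assumptions, not values from these slides; with them, the unique optimum for this pair is the alignment reached by hand above, with a score of 2.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to rebuild one optimal alignment
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            x, y, i, j = a[i - 1], b[j - 1], i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            x, y, i = a[i - 1], "-", i - 1
        else:
            x, y, j = "-", b[j - 1], j - 1
        top.append(x)
        bottom.append(y)
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]

print(needleman_wunsch("abcdef", "abdgf"))
```

This should print ('abcdef', 'ab-dgf', 2). Different scoring values can change which alignment is optimal.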
Sequence Alignment
A procedure for comparing two or more sequences by searching for patterns that occur in the same order in each sequence.
Pairwise alignment: compares two sequences.
Multiple sequence alignment: compares more than two sequences.
SEQUENCE ALIGNMENT
GLOBAL ALIGNMENT
Aligns the sequences from start to end, over their entire length.
LOCAL ALIGNMENT
Aligns only the most similar region(s); priority is given to finding conserved nucleotide patterns.
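For contrast with global alignment, here is a minimal Smith-Waterman local alignment sketch in Python. Scores are floored at zero, so an alignment can start and stop anywhere, and the result is only the best-matching region. The scoring values (match +2, mismatch -1, gap -1) and the example sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment; returns (score, aligned_a, aligned_b) for the best-scoring region."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # flooring at 0 lets a new local alignment begin at any cell
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell is reached
    top, bottom, i, j = [], [], best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            x, y, i, j = a[i - 1], b[j - 1], i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            x, y, i = a[i - 1], "-", i - 1
        else:
            x, y, j = "-", b[j - 1], j - 1
        top.append(x)
        bottom.append(y)
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("TTACGTTT", "GGACGTCC"))
```

Only the shared core "ACGT" is reported (score 8); the dissimilar flanks are left out, which is the defining behaviour of a local alignment.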
PHYLOGENETIC ANALYSIS
Taxon – a phylogenetically distinct unit on a tree
Taxonomy – naming and classifying organisms
Systematics – naming and classifying organisms according to their evolutionary relationships
Phylogenetics – reconstructing the evolutionary relationships among organisms
Phylogenetic tree – a hypothesized genealogy traced back to the last common ancestor through hierarchical, dichotomous branching
Cladistics – the principles that guide the production of phylogenetic trees, a.k.a. cladograms
PHYLOGENETIC ANALYSIS
Node – branch point, speciation event
Lineage or clade – an entire branch
A clade is a monophyletic group, i.e., an ancestral species and all of its descendants.
A polyphyletic group does not include the common ancestor of the species in the group.
E.g., if the class Reptilia is to be monophyletic, birds must be included!
CLADOGRAM
Branch length is not drawn proportionally to evolutionary distance
PHYLOGRAM
The branch lengths are drawn to scale, proportional to evolutionary distance.
Phylogenetic tree
• A phylogenetic tree represents the evolutionary relationships among different life forms.
– A node represents the most recent common ancestor of its descendants.
– Edge (branch) lengths correspond to estimates of time or of the amount of evolutionary change.
– Each node in a phylogenetic tree is called a taxonomic unit.
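To make the node and edge-length ideas concrete, here is a small Python sketch on a hypothetical tree: leaves A and B descend from internal node X, and X and leaf C descend from the root. All names and branch lengths are invented for illustration. The sketch finds the most recent common ancestor (MRCA) of two leaves and sums the branch lengths on the path between them (the patristic distance).

```python
# Hypothetical tree: each node maps to (parent, branch length to that parent)
PARENT = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("root", 1.5),
    "C": ("root", 3.0),
}

def path_to_root(node):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node][0]
        path.append(node)
    return path

def mrca(a, b):
    """Most recent common ancestor: first ancestor of b that is also an ancestor of a."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node

def distance_to_ancestor(node, ancestor):
    total = 0.0
    while node != ancestor:
        node, length = PARENT[node]
        total += length
    return total

def patristic_distance(a, b):
    """Sum of branch lengths on the path between two nodes, via their MRCA."""
    anc = mrca(a, b)
    return distance_to_ancestor(a, anc) + distance_to_ancestor(b, anc)

print(mrca("A", "B"), patristic_distance("A", "B"))  # X 3.0
print(mrca("A", "C"), patristic_distance("A", "C"))  # root 5.5
```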
Types of Trees
• Unrooted trees illustrate the relatedness of leaf nodes without making assumptions about ancestry.
• Rooted trees:
– are directed trees
– have a unique node (the root) corresponding to the most recent common ancestor
– have leaves that represent the entities being compared
[Figure: ROOTED TREE and UNROOTED TREE drawn for the same taxa (A–J, k)]
[Figure: rooted tree (taxa A–J, k) labelled with the ROOT, INTERNAL NODES (HTU), EXTERNAL NODES (OTU), a branch, the outgroup and the ingroup]
OTU = Operational taxonomic unit (can represent many types of comparable data)
HTU = Hypothetical taxonomic unit (hypothetical progenitors of OTUs)
Phylogenetic Analysis – 4 Steps
• 1. Alignment
• 2. Determining substitution models
• 3. Tree building
• 4. Tree evaluation
1. Alignment
• Multiple sequence alignment (Clustal X)
• Manual editing of the alignment
• Submission to a tree-building program (TREECON, MEGA, PHYLIP)
(The guide tree from Clustal W is written in PHYLIP tree format and can be imported into various tree-drawing programs.)
2. Determining substitution models
A substitution model is selected with the following considerations in mind:
• Variation in length
• Insertions
• Deletions
• Introduction of gaps
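The Jukes-Cantor model (the one selected later in TREECON) is the simplest nucleotide substitution model: it assumes all substitutions are equally likely, and corrects the observed proportion of differing sites p for multiple substitutions at the same site using d = -(3/4) ln(1 - 4p/3). A minimal Python sketch; skipping gapped columns (pairwise deletion) is our assumption, not a documented TREECON setting.

```python
import math

def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(s1, s2):
    """Jukes-Cantor (JC69) corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

print(p_distance("ACGTACGTAC", "ACGTACGTAA"), jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))
```

With one difference in ten sites, p = 0.1 while the corrected distance is about 0.107; the correction grows as sequences diverge.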
3. Tree building
• TREE BUILDING METHODS
– Distance-based methods: count the number of differences between sequences; this number is called the evolutionary distance. Examples: neighbor-joining, Fitch-Margoliash, UPGMA.
– Character-based methods: derive trees that optimize the distribution of the actual data pattern for each character. Examples: maximum parsimony, maximum likelihood.
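As an illustration of the distance-based family, here is a minimal UPGMA sketch in Python. It repeatedly merges the closest pair of clusters, places the new node at half their distance (a molecular-clock assumption), and averages distances weighted by cluster size. The four-taxon distance matrix is invented for illustration, and output is a Newick string.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a rooted UPGMA tree; returns a Newick string.

    names: taxon labels; dist: dict mapping (x, y) pairs to distances.
    """
    d = {frozenset(pair): float(v) for pair, v in dist.items()}
    # each cluster label (a Newick fragment) -> (number of taxa, height of its root)
    clusters = {name: (1, 0.0) for name in names}
    while len(clusters) > 1:
        # merge the closest pair of clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2  # molecular-clock assumption
        size_a, h_a = clusters.pop(a)
        size_b, h_b = clusters.pop(b)
        merged = f"({a}:{height - h_a:g},{b}:{height - h_b:g})"
        # distance to the merged cluster = size-weighted average of the old distances
        for c in clusters:
            d[frozenset((merged, c))] = (size_a * d[frozenset((a, c))]
                                         + size_b * d[frozenset((b, c))]) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
    return next(iter(clusters)) + ";"

dist = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
        ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}
print(upgma(["A", "B", "C", "D"], dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Because UPGMA assumes a constant rate of change, it can mislead when lineages evolve at different rates; neighbor-joining does not make that assumption.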
Submitting sequences to a databank
Open the link http://www.ncbi.nlm.nih.gov/index.html and click on GenBank.
Submission to GenBank:
• BankIt
– For one or a few sequences
– Used when the sequence is not complicated
• Sequin
– Used for long and complicated submissions
– Can submit mutation, phylogenetic, population, environmental, or segmented sets
• Click on BankIt (multiple sequences can be entered together).
• Fill in the form with the details of the isolate.
Construction of phylogenetic tree
Software required: CLUSTAL X and TREECON
STEPS FOR TREE CONSTRUCTION:
1. Go to the NCBI homepage (http://www.ncbi.nlm.nih.gov/).
2. Select "Nucleotide" in the search box, enter the accession number "EU873539" and click "GO".
3. When the sequence record appears, click on "FASTA" to display the sequence in FASTA format.
4. Open the Ribosomal Database Project homepage.
5. Click on "Seq Match".
6. Paste the sequence from the NCBI site into the space provided, click on "Type Isolates" and then on "Submit".
7. Click on "printer friendly results" to display the results.
8. Take the accession numbers one by one, go to the NCBI site and retrieve each sequence in FASTA format.
9. Prepare a Notepad file containing all these sequences.
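Steps 8 and 9 can also be scripted. The sketch below is an alternative to copying records by hand, assuming NCBI's E-utilities EFetch service (db=nuccore, rettype=fasta, retmode=text) is reachable. Only EU873539 comes from these slides; the other accession numbers would be the Seq Match hits, and the file name is hypothetical.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession):
    """Build an EFetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def fetch_fasta(accession):
    """Download one record (requires network access)."""
    with urlopen(efetch_url(accession)) as resp:
        return resp.read().decode("utf-8")

def write_multifasta(accessions, path):
    """Download each accession and write all records into one plain-text FASTA file."""
    with open(path, "w") as fh:
        for acc in accessions:
            fh.write(fetch_fasta(acc).strip() + "\n")
            time.sleep(0.4)  # stay below NCBI's guideline of about 3 requests/second without an API key
```

Usage: `write_multifasta(["EU873539"], "sequences.fasta")` produces the same kind of multi-FASTA text file that step 9 prepares in Notepad, ready to open in CLUSTAL X.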
• Open "CLUSTAL X" (multiple alignment mode).
• Click on "File" and then select "Remove gap-only columns".
• Click on "Alignment" and select "Output Format Options".
• From "Output Format Options" choose "PHYLIP" format and then "CLOSE".
• Click on "Alignment" and select "Do Complete Alignment".
• After alignment, three files are produced:
.dnd file (guide tree)
.phy file (alignment in PHYLIP format)
.aln file (alignment in Clustal format)
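The .phy file is what TREECON loads next. The minimal Python reader below is intended to make the layout concrete, assuming the interleaved PHYLIP layout: a header line with the number of taxa and sites, names padded to 10 characters in the first block, and later blocks that continue the sequences in the same order. Real files vary (e.g. "relaxed" PHYLIP with longer names), so treat this as a sketch only.

```python
def read_phylip(text):
    """Parse an interleaved PHYLIP alignment with 10-character names into {name: sequence}."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    ntax, nchar = map(int, lines[0].split()[:2])
    names, seqs = [], []
    for ln in lines[1:ntax + 1]:
        names.append(ln[:10].strip())          # strict format: name in columns 1-10
        seqs.append(ln[10:].replace(" ", ""))  # residues may be grouped with spaces
    # later blocks continue the sequences, cycling through the taxa in order
    for k, ln in enumerate(lines[ntax + 1:]):
        seqs[k % ntax] += ln.strip().replace(" ", "")
    if any(len(s) != nchar for s in seqs):
        raise ValueError("sequence lengths do not match the header")
    return dict(zip(names, seqs))

sample = ("2 12\n"
          + "seqA".ljust(10) + "ACGTAC GTAC\n"
          + "seqB".ljust(10) + "ACGTTC GTAC\n"
          + "\nGT\nGA\n")
print(read_phylip(sample))  # {'seqA': 'ACGTACGTACGT', 'seqB': 'ACGTTCGTACGA'}
```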
Open "TREECON".
Click on "Distance estimation".
Click on "Start distance estimation".
Load the .phy file and click on "Open".
The "Select sequences" window will open; choose "Select all" and then "OK".
• Select "Jukes and Cantor", "YES" for bootstrap analysis, and then "OK".
• Enter "100" bootstrap replicates and then click "OK".
• Click "OK" to finish.
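Bootstrap analysis builds pseudo-alignments by sampling alignment columns with replacement, rebuilds a tree from each, and reports how often each clade reappears. TREECON does this internally; the Python sketch below shows only the resampling step, using a toy alignment and seed that are our own assumptions.

```python
import random

def bootstrap_replicates(alignment, n_replicates=100, seed=None):
    """Yield pseudo-alignments made by resampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (all the same length).
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        # draw `length` column indices, with replacement
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}

aln = {"seqA": "ACGTACGTAC", "seqB": "ACGTTCGTAC", "seqC": "ACCTTCGAAC"}
replicates = list(bootstrap_replicates(aln, n_replicates=100, seed=1))
print(replicates[0])
```

Each replicate keeps whole columns together, so the relationships among taxa at a site are preserved; only the mix of sites changes. A tree is then built from each of the 100 replicates, and a bootstrap value such as 70 means the clade appeared in 70 of them.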
Select "Infer tree topology" from the TREECON window.
Select "Inferring tree topology", choose "Neighbor joining", "YES" for bootstrap analysis, and then "OK".
Select "Root unrooted trees" from the TREECON window.
Select "Start rooting unrooted trees".
Select "Single sequence (forced)" and "YES", then "OK". Then select "OK" to finish.
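For readers curious about what the neighbor-joining step computes, here is a minimal Python sketch of the standard algorithm (not TREECON's code). At each round it joins the pair minimizing Q(i, j) = (n - 2)·d(i, j) - r_i - r_j, where r_i is the sum of distances from i, and it stops when three nodes remain. The five-taxon matrix is a small additive example chosen for illustration; the result is an unrooted Newick tree that would still need rooting, for example with an outgroup, as in the TREECON rooting step above.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree as a Newick string (needs at least 3 taxa)."""
    D = {frozenset(pair): float(v) for pair, v in dist.items()}
    nodes = list(names)

    def d(x, y):
        return D[frozenset((x, y))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d(x, y) for y in nodes if y != x) for x in nodes}
        # join the pair with the smallest Q value
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
        bf = d(f, g) / 2 + (r[f] - r[g]) / (2 * (n - 2))  # branch length f -> new node
        bg = d(f, g) - bf
        u = f"({f}:{bf:g},{g}:{bg:g})"
        for k in nodes:
            if k not in (f, g):
                D[frozenset((u, k))] = (d(f, k) + d(g, k) - d(f, g)) / 2
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    # the last three nodes meet at a single central node
    i, j, k = nodes
    bi = (d(i, j) + d(i, k) - d(j, k)) / 2
    bj = (d(i, j) + d(j, k) - d(i, k)) / 2
    bk = (d(i, k) + d(j, k) - d(i, j)) / 2
    return f"({i}:{bi:g},{j}:{bj:g},{k}:{bk:g});"

dist = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining(["a", "b", "c", "d", "e"], dist))
```

For this additive matrix the sketch recovers branch lengths that reproduce every input distance exactly, printing (d:2,e:1,(c:4,(a:2,b:3):3):2);.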
Select "Draw phylogenetic tree".
• Select "Load new tree" (just below the File option). The tree will be displayed.
• Select "Add bootstrap values" {70/ABC}.
• Select "1" and then "OK".
• Select "Distance scale" {0.1}, select "0.1-0.15-0.02" and then "OK".
• Go to File, select "Save tree", then "As TREECON file", and save the tree.
• The tree is still not complete, because it does not yet show the names of the organisms.
• To add sequence names to the phylogenetic tree:
• Click on "Customize", select "Sequence names" and then "Change".
• Click on the black boxes one by one, type the new name in the box in the title window, and then click "OK".
• After entering all the names, save the tree. This is the required tree.
THANK YOU