Introduction to Bioinformatics

Dr. rer. nat. Gong Jing

Cancer Research Center

Medicine School of Shandong University

2012.11.09


Chapter 4 Phylogenetic Tree


Phylogeny

Evidence from morphological, biochemical, and gene sequence data suggests that all organisms on Earth are genetically related, and that the genealogical relationships of living things can be represented by a vast evolutionary tree, the Tree of Life. The Tree of Life thus represents the phylogeny of organisms.

A phylogeny is a tree representation of the evolutionary history relating the species we are interested in.


How to Study the Evolutionary History

The most authentic evidence is fossils! But the fossil record is scattered, incomplete, and unsystematic.


We can use comparative morphology and comparative anatomy to determine the general framework of evolution, but many details remain controversial.


Computational molecular evolution: the phylogenetic tree. The evolutionary process happens at the level of molecules: DNA, RNA, and protein.

Basic assumptions:

1) Nucleic acid sequences and protein sequences contain all the information about the evolutionary history of species;

2) Molecular clock: the rate of evolutionary change (the number of amino acid differences) of a given protein is approximately constant over time and across different lineages.

=> The more similar two homologous proteins are, the more recently they diverged from their common ancestor.
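The molecular clock assumption can be illustrated with a small sketch: if differences accumulate at a roughly constant rate, the pair of aligned sequences with the fewest differences is the pair that diverged most recently. The sequences and species names below are made-up placeholders, and `p_distance` is a helper name chosen for this example, not part of any tool mentioned in the slides.

```python
from itertools import combinations

def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip any column where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

# Hypothetical aligned protein fragments (illustrative, not real data).
aligned = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIAKQK",
    "species_C": "MRTSYLAKQK",
}

# Pairwise distances for every pair of sequences.
distances = {
    (x, y): p_distance(aligned[x], aligned[y])
    for x, y in combinations(aligned, 2)
}

# Under a molecular clock, the smallest distance marks the most recent split.
closest = min(distances, key=distances.get)
print(closest, distances[closest])
```

This raw proportion of differences is the simplest possible distance; it does not correct for multiple substitutions at the same site.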


Homologous genes are genes that derive from a common ancestor.

They have 3 types of relationships:

Orthologs: Orthologs are homologs separated by speciation, the process during which a common ancestor gives rise to two subgroups that slowly drift away from their common genetic makeup to become distinct species. Orthologs usually have similar functions and structures.

Paralogs: Paralogs are homologs separated by a duplication event, meaning that a gene was duplicated within a genome. One of the duplicates may have kept the original function while the other may have acquired a new function.

Xenologs: “Xeno” comes from the Greek word for “foreigner.” Xenologs result from a lateral transfer between two organisms, that is, a direct DNA transfer between two species. This means that one of the species contains a gene that does not have the same history as the genome in which it is inserted. This is often seen between pathogenic bacteria and humans.


Phylogenetic Tree

What is a phylogenetic tree used for?

For a certain protein/gene, determining the closest relatives of the organism that you’re interested in.

Discovering the function of a new protein/gene.

Retracing the origin of a gene.


Concepts:

leaf / outer node

branch / lineage

inner node

root
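The terms leaf, branch, inner node, and root can be made concrete with a minimal sketch. It assumes a rooted tree stored as nested Python tuples, where a string is a leaf and a tuple is an inner node; this representation and the species names are illustrative choices, not something the slides prescribe.

```python
# A string is a leaf (outer node); a tuple is an inner node.
# The outermost tuple is the root. Species names are placeholders.
tree = (("human", "chimp"), ("mouse", "rat"))

def leaves(node):
    """Return the leaf names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result

def count_inner_nodes(node):
    """Count inner nodes, including the root itself."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_inner_nodes(child) for child in node)

def count_branches(node):
    """Each child is attached to its parent by exactly one branch."""
    if isinstance(node, str):
        return 0
    return len(node) + sum(count_branches(child) for child in node)

print(leaves(tree))
print(count_inner_nodes(tree), count_branches(tree))
```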


Depending on what their branches represent, phylogenetic trees have different names. All these trees represent the same evolutionary relationships.

Cladogram: branch lengths do not mean anything.

Change-based phylogram: branch lengths indicate numbers of evolutionary changes.

Time-based phylogram: inner nodes indicate branching time points.


There are many different ways to represent the information found in a phylogenetic tree.


Branches can be rotated at a node without changing the relationships among the outer nodes.
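One way to check that rotation leaves the relationships unchanged is to bring each tree into a canonical child order and compare the results. The sketch below assumes trees written as nested tuples (strings are leaves); the `canonical` helper and the example trees are hypothetical.

```python
def canonical(node):
    """Sort the children of every inner node so that rotations disappear."""
    if isinstance(node, str):
        return node
    # repr gives a consistent sort key for a mix of strings and tuples.
    return tuple(sorted((canonical(child) for child in node), key=repr))

tree_a = (("A", "B"), ("C", ("D", "E")))
tree_b = ((("E", "D"), "C"), ("B", "A"))   # tree_a with several nodes rotated
tree_c = (("A", "C"), ("B", ("D", "E")))   # a genuinely different grouping

print(canonical(tree_a) == canonical(tree_b))
print(canonical(tree_a) == canonical(tree_c))
```

Two trees that differ only by rotations produce the same canonical form, while a tree that groups the leaves differently does not.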


Choosing the Right Sequences for the Right Tree

Workflow: choose the right sequences, do the multiple sequence alignment, build the phylogenetic tree.

Should you do this on the protein or on the DNA sequence?

If the DNA sequences are > 70% identical: build a DNA multiple sequence alignment.

If the DNA sequences are < 70% identical and your sequences code for proteins: translate them into proteins and build a protein multiple sequence alignment.

If your sequences are too similar at the protein level, you can thread the DNA sequences back onto the protein alignment using pal2nal: http://www.bork.embl.de/pal2nal/.

In practice, unless your sequences are almost identical, it is easier to keep working at the protein level.
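The 70% rule of thumb above can be sketched as a small helper. It assumes the two DNA sequences are already aligned and code for a protein; the function names and example sequences are made up for illustration, and treating exactly 70% as "protein" is an assumption, since the slides do not say what to do at the boundary.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, gap-free positions where the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * matches / compared

def suggested_alignment_level(dna_a: str, dna_b: str, threshold: float = 70.0) -> str:
    """Apply the rule of thumb; exactly 70% falls to 'protein' (an assumption)."""
    return "DNA" if percent_identity(dna_a, dna_b) > threshold else "protein"

# Hypothetical aligned coding fragments.
print(suggested_alignment_level("ATGGCCAAGT", "ATGGCTAAGT"))
print(suggested_alignment_level("ATGGCCAAGT", "TTGACTCAGA"))
```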


Paralogs of a large human gene family: the tree tells the story of this gene family.

Orthologs from different species: the tree looks much like a species tree.


Algorithms of Tree Reconstruction

Maximum Parsimony (MP):

Closely related sequences; accurate; number of sequences <= 12.

Distance (Neighbor Joining, NJ):

Distantly or closely related sequences; not very accurate.

Maximum Likelihood (ML):

Distantly related sequences; very accurate.

Speed (fastest to slowest):

Distance > Maximum Parsimony > Maximum Likelihood
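To give a feel for what maximum parsimony optimizes, the following sketch scores candidate trees by the minimum number of changes they require, using Fitch's small-parsimony algorithm. It only scores trees that are handed to it and does not search tree space as a real MP program would. The nested-tuple tree format (binary trees only), the function names, and the sequences are assumptions made for this example.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one alignment column."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node  # binary trees only
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total

# Hypothetical aligned sequences.
alignment = {"s1": "AAG", "s2": "AAG", "s3": "CCG", "s4": "CCT"}
tree_1 = (("s1", "s2"), ("s3", "s4"))
tree_2 = (("s1", "s3"), ("s2", "s4"))

print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

The tree that groups the similar sequences together needs fewer changes, so parsimony would prefer it.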


Preparing Your Multiple Sequence Alignment

Before using your MSA for building a tree, you must make sure that it is as accurate as possible.

Computing your multiple sequence alignment:

ClustalW: http://www.ebi.ac.uk/Tools/msa/clustalw2/

MUSCLE: http://www.ebi.ac.uk/Tools/msa/muscle/

T-Coffee: http://tcoffee.crg.cat/

Removing bad columns that affect the tree quality:

1. Make sure there are as many gap-free columns as possible.

2. Remove the extremities of your multiple alignment.

3. Remove the gap-rich regions of your alignment.

4. Be sure to keep the most informative blocks.
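The gap-based part of this cleanup can be sketched in a few lines. The example below assumes the alignment is held as a dictionary of equal-length strings with "-" as the gap character; `trim_columns`, its `max_gap_fraction` parameter, and the sequences are hypothetical. Deciding which blocks are the most informative still requires looking at the alignment yourself.

```python
def trim_columns(alignment, max_gap_fraction=0.0):
    """Keep only columns whose fraction of gaps is at most max_gap_fraction."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}

# Hypothetical alignment with ragged ends and a gap-rich column.
msa = {
    "seq1": "--MKT-AYI",
    "seq2": "-AMKTLAYI",
    "seq3": "--MRT-AY-",
}

trimmed = trim_columns(msa)  # default: keep gap-free columns only
for name, seq in trimmed.items():
    print(name, seq)
```

With the default setting only gap-free columns survive; raising `max_gap_fraction` keeps columns in which a few sequences have gaps.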


How to Delete Columns with Word

While pressing the Alt key on your keyboard, use the mouse to select entire columns in your alignment.

When you’ve selected everything you want to remove, press the Delete key to remove the selected block.

Computing Your Tree

A guide tree is NOT a phylogenetic tree!


EMBL ClustalW http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny


clustalw.aln

sequences.fasta


This tree is much more accurate than a guide tree!


A phylogram is a phylogenetic tree whose branch lengths are proportional to the amount of character change. In a cladogram, the branch lengths do not represent any change.
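The difference shows up directly in the tree file. The sketch below assumes the tree is written in Newick format, where branch lengths follow a colon; removing them leaves only the branching order, which is all a cladogram conveys. The example tree and its numbers are made up, and the simple regular expression does not handle quoted names that contain colons.

```python
import re

def strip_branch_lengths(newick: str) -> str:
    """Drop ':<number>' branch lengths, keeping only the topology."""
    return re.sub(r":[0-9.eE+-]+", "", newick)

# Hypothetical phylogram with branch lengths.
phylogram = "((human:0.01,chimp:0.02):0.05,mouse:0.30);"
cladogram = strip_branch_lengths(phylogram)
print(cladogram)
```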


Different tree representations can be obtained by choosing different display options.

Displaying Your Tree

The easiest way to save your tree is to make a screen capture with the print-screen (PrntScr) key on your keyboard. You can then paste this image (Ctrl + V) into your favorite application (PowerPoint, Windows Paint, etc.).

Tree file: MyTree.ph

Phylodendron: http://iubio.bio.indiana.edu/treeapp/treeprint-form.html

Right-click the drawn tree to save it as MyTree.png.

sequences.fasta

clustalw.aln

MyTree.ph

MyTree.png
