Summer Bioinformatics Workshop 2008

Comparative Genomics and Phylogenetics

Chi-Cheng Lin, Ph.D., Professor
Department of Computer Science
Winona State University – Rochester Center
[email protected]

Outline

• Comparative Genomics

• Phylogenetics

• Phylogenetic Tree

• Phylogenetics Applications

• Gene Tree vs. Species Tree

Comparative Genomics

• Analysis and comparison of genomes from different species

• Purposes
  – to gain a better understanding of how species have evolved
  – to determine the function of genes and non-coding regions of the genome

• The functions of human genes and other DNA regions often are revealed by studying their parallels in nonhumans.
  – Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.

Comparative Genomics

• Features looked at when comparing genomes:
  – sequence similarity
  – gene location
  – length and number of coding regions within genes
  – amount of non-coding DNA in each genome
  – highly conserved regions maintained in organisms

• Computer programs that can line up multiple genomes and look for regions of similarity among them are used.

• Many of these sequence-similarity tools, such as BLAST, are accessible to the public over the Internet.
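
To make "regions of similarity" concrete, here is a minimal sketch that scores two sequences that are already aligned and reports fully conserved windows. It is an illustration only, not how BLAST works internally. The function names, the window size, the threshold, and the two toy sequences are all assumptions made up for this example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns where both sequences carry the same residue.

    Assumes a and b are already aligned (same length, '-' marks a gap).
    Columns where either sequence has a gap count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(a: str, b: str, size: int, threshold: float) -> list[int]:
    """Start positions of windows whose identity is at least `threshold` percent."""
    return [
        i
        for i in range(len(a) - size + 1)
        if percent_identity(a[i:i + size], b[i:i + size]) >= threshold
    ]


# Toy, made-up aligned fragments (not real genomic sequence).
seq_1 = "ATGGCCATTGTAATGGGCCGC"
seq_2 = "ATGGCTATTGTGATGGG-CGC"

print(round(percent_identity(seq_1, seq_2), 1))
print(conserved_windows(seq_1, seq_2, size=5, threshold=100.0))
```

Real genome comparison tools also have to find the alignment itself, which is the hard part; this sketch starts from the point where the alignment is given.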

Of Mice and Men

• The full complement of human chromosomes can be cut into about 150 pieces, then reassembled into a reasonable approximation of the mouse genome.

• The colors of the mouse chromosomes and the numbers alongside indicate the human chromosomes containing homologous segments.

• This piecewise similarity between the mouse and human genomes means that insights into mouse genetics are likely to illuminate human genetics as well.

Source: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/tko/06_img.html

Phylogenetics

• Phylogenetics
  – Study of evolutionary relationships (sequences / species)
  – Infer evolutionary relationship from shared features

• Phylogeny
  – Relationship between organisms with common ancestor

• Phylogenetic tree
  – Graph representing evolutionary history of sequences / species

Source of image: http://superfrenchie.com/Pics/Blog/culture/evolution.jpg

Phylogenetics

• Premise
  – Members sharing a common evolutionary history (i.e., a common ancestor) are more closely related to each other than to members that do not share it
  – Can infer evolutionary relationships from shared features

• Long history of phylogenetics
  – Historically: based on analysis of observable features (e.g., morphology, behavior, geographical distribution)
  – Now: mostly analysis of DNA / RNA / amino acid sequences

Phylogenetics

• Goals
  – Understand relationship of a sequence to similar sequences
  – Construct phylogenetic tree representing evolutionary history

• Motivation / application
  – Identify closely related families
    • Use phylogenetic relationships to predict gene function
  – Follow changes in rapidly evolving species (e.g., viruses)
    • Analysis can reveal which genes are under selection
    • Provide epidemiology for tracking infections & vectors

• Relationship to multiple sequence alignment (MSA)
  – Alignment of sequences should take evolution into account
  – More precise phylogenetic relationships → improved MSA
  – ClustalW (http://www.ebi.ac.uk/clustalw/), a popular MSA program, can produce an alignment that is then used to build a phylogenetic tree.
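
One common first step from an alignment toward a tree is a pairwise distance matrix. The sketch below computes the simple p-distance (fraction of differing sites) for every pair of rows in an alignment. It assumes the alignment has already been produced by an MSA program and loaded into a dictionary; the sequence names and residues are made-up toy data, and p-distance is just one of several possible distance measures.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of compared columns that differ.

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of aligned sequences."""
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all aligned sequences must have equal length")
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Toy alignment (hypothetical names and residues, for illustration only).
alignment = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTTC",
    "seqC": "ACGAACGTTC",
    "seqD": "TCGAAC-TTG",
}

for pair, dist in distance_matrix(alignment).items():
    print(pair, round(dist, 3))
```

Distance-based tree-building methods take a matrix like this as input and repeatedly join the closest sequences or clusters.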

Phylogenetic Tree Terminology

• Leaf / terminal node / taxon
  – Node with no children
  – Original sequence

• Join / internal node
  – Point of joining two leaves / clusters
  – Inferred common ancestor

• Branches
  – Represent change
  – Length represents evolutionary distance

• Cluster / clade
  – All sequences in subtree with common ancestor (treated as single node)
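
The terms above map naturally onto a small tree data structure. The following is a minimal sketch, assuming a hypothetical `Node` class: leaves carry a taxon name, internal nodes carry children, every node stores the length of the branch to its parent, and a clade is whatever `leaves()` returns for a subtree. The four-ape topology is the commonly accepted one, but the branch lengths are made-up numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""                # taxon label for leaves; empty for inferred ancestors
    branch_length: float = 0.0    # evolutionary distance to the parent node
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        """A leaf (terminal node) has no children."""
        return not self.children

    def leaves(self) -> list[str]:
        """Names of all taxa in the clade rooted at this node."""
        if self.is_leaf():
            return [self.name]
        return [name for child in self.children for name in child.leaves()]


# Illustrative tree; branch lengths are invented for the example.
human_chimp = Node(branch_length=0.5,
                   children=[Node("Human", 1.0), Node("Chimpanzee", 1.0)])
african_apes = Node(branch_length=1.0,
                    children=[human_chimp, Node("Gorilla", 1.5)])
root = Node(children=[african_apes, Node("Orangutan", 2.5)])

print(root.leaves())         # every taxon in the tree
print(human_chimp.leaves())  # the clade under one internal node
```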

Phylogenetic Tree Terminology

• Binary tree
  – Each branch point (internal node) splits into exactly two children

• Rooted tree
  – Contains a single ancestor of all nodes
  – Evolution proceeds from root to leaves of tree

• Unrooted tree
  – No single ancestor node
  – No direction of evolution

• Molecular clock assumption (rooted tree)
  – Mutations occur at a constant rate
  – Distance from root to leaf is the same for every leaf
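
The molecular clock condition can be checked directly on a rooted tree with branch lengths: sum the branch lengths along each root-to-leaf path and compare. The sketch below assumes a deliberately simple, hypothetical representation in which a tree is either a leaf name or a list of (subtree, branch length) pairs. The branch lengths are invented for illustration.

```python
def root_to_leaf_distances(tree, so_far=0.0):
    """Map each leaf name to its summed branch length from the root.

    A tree is either a leaf name (str) or a list of (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {tree: so_far}
    distances = {}
    for subtree, length in tree:
        distances.update(root_to_leaf_distances(subtree, so_far + length))
    return distances


def is_clocklike(tree, tolerance=1e-9):
    """True if every leaf is the same distance from the root (within tolerance)."""
    distances = root_to_leaf_distances(tree).values()
    return max(distances) - min(distances) <= tolerance


# Made-up branch lengths chosen so that every leaf sits 2.5 units from the root.
human_chimp = [("Human", 1.0), ("Chimpanzee", 1.0)]
clock_tree = [([(human_chimp, 0.5), ("Gorilla", 1.5)], 1.0), ("Orangutan", 2.5)]

# Same topology, but the Gorilla branch is longer, which violates the clock.
non_clock_tree = [([(human_chimp, 0.5), ("Gorilla", 3.0)], 1.0), ("Orangutan", 2.5)]

print(root_to_leaf_distances(clock_tree))
print(is_clocklike(clock_tree), is_clocklike(non_clock_tree))
```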

Rooted and Unrooted Trees

[Figure: a rooted tree and an unrooted tree, each relating Human, Chimpanzee, Gorilla, and Orangutan. The rooted tree is labeled with its root and the direction of evolution.]

Possible Ways of Drawing Tree

Applications – Building Tree of Life

Source: http://gi.cebitec.uni-bielefeld.de/people/boecker/bilder/tree_of_life_new.gif

Applications – Mammal Systematics

Source: http://www.isem.univ-montp2.fr/PPP/PM/RES/Phylo/Mamm/PHYLMOL-Placentalia%7EEnglish.jpg

Application – Epidemiology (CSI!)

• Which patients are more likely to have been infected by the dentist?

Source: http://trc.ucdavis.edu/djbegun/Lect_12.1.html

Application – Modern Human Evolution

• Based on mtDNA genome

• Example
  – Global mtDNA diversity analysis (Ingman et al., 2000, Nature. Volume 408: 708-713)
  – Africans have twice as much diversity among them as do non-Africans → Africans have a longer genetic history
  – More recent population expansion for non-Africans
  – Africans and non-Africans diverged recently

Out of Africa

Source of image: Ingman et al., 2000, Nature. Volume 408: 708-713
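
The slide does not say which diversity measure was used. One simple way to compare diversity between groups of aligned sequences is the mean number of pairwise differences, sketched below. The function name and the two groups of toy sequences are assumptions for illustration; they are not mtDNA data and do not reproduce any published result.

```python
from itertools import combinations


def mean_pairwise_differences(sequences: list[str]) -> float:
    """Average number of differing sites over all pairs of equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


# Made-up sequences: group_1 is constructed to be more variable than group_2.
group_1 = ["ACGTACGT", "ACGTTCGA", "TCGAACGT"]
group_2 = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]

print(round(mean_pairwise_differences(group_1), 3))
print(round(mean_pairwise_differences(group_2), 3))
```

A group whose sequences have had longer to accumulate mutations will tend to show a larger value, which is the reasoning behind reading higher diversity as a longer genetic history.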

Gene Tree vs. Species Tree

• Gene typically diverges before speciation

• Phylogenetic tree based on divergence of one single homologous gene
  – Evolutionary history of gene
  – Gene tree rather than species tree

• More genes are needed to build species trees

Source of image: http://www.bioinf2.leeds.ac.uk/b/genomics.html