Jonathan Eisen talk on "Phylogenomics of Microbes" at Lake Arrowhead Small Genomes Meeting...

Post on 01-Jun-2015

DESCRIPTION

Talk by Jonathan Eisen on Phylogenomics of microbes at Lake Arrowhead Small Genomes meeting in 2002.

TRANSCRIPT

TIGRTIGRTIGRTIGR

“Nothing in biology makes sense except in the light of evolution.”

Theodosius Dobzhansky (1973)

Topics of Discussion
• Introduction to phylogenomics
• Uses of evolutionary analysis in genomics
  – Selection of species
  – Functional prediction
  – Gene duplication
  – Gene loss
  – Genome rearrangements
  – Lateral transfer
  – Uncultured species
  – Specialization

Phylogenomic Analysis

Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.


Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species

Strain Selection and Evolution

• Increasing phylogenetic representation
• Determining relatedness to model organism
• Understanding major evolutionary transitions
• Identifying taxa with unusual (high or low) rates of evolution
• Identifying source of DNA from uncultured species
• Species naming and type strains (e.g., see Ward et al. 2001)

Evolutionary Diversity Still Poorly Represented in Complete Genomes

[Figure panels: Bacteria, Archaea]

S. pombe Genome Analysis: Eukaryotes vs. Prokaryotes

[Figure labels: Bacteria, Archaea, Eukaryotes. Eukaryote taxa: Giardia, Trichomonas, Naegleria, Trypanosoma, Euglena, Plasmodium, Tetrahymena, Phytophthora, Arabidopsis, Chlamydomonas, Dictyostelium, Humans, Fly, Worm, Encephalitozoon, S. cerevisiae, S. pombe]

Single vs. Multi-celled

[Figure labels. Taxa: Giardia, Trichomonas, Naegleria, Trypanosoma, Euglena, Plasmodium, Tetrahymena, Phytophthora, Arabidopsis, Chlamydomonas, Dictyostelium, Humans, Fly, Worm, Encephalitozoon, S. cerevisiae, S. pombe. Groups: Plants, Fungi, Animals, Parabasalia, Diplomonads, Microsporidia, Dictyostelia, Heterokonts, Ciliates, Apicomplexa, Kinetoplastids, Euglenas, Acrasidae]


Predicting Function

• Identification of motifs
• Homology/similarity based methods
  – Highest hit, top hit, HMMs, threading
• Evolutionary methods
  – Phylogenetic trees
  – Ds/Dn
  – Phylogenetic profiles
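
Phylogenetic profiles, the last evolutionary method listed above, can be illustrated with a small sketch. The general idea is to record the presence or absence of each gene family across a set of genomes and to treat families with matching patterns as candidates for a shared function. The family names, genome names, and the distance cutoff below are made up for illustration and are not data or parameters from the talk.

```python
from itertools import combinations

# Hypothetical presence/absence data: family -> genomes that encode it.
PRESENCE = {
    "famA": {"g1", "g2", "g4"},
    "famB": {"g1", "g2", "g4"},
    "famC": {"g2", "g3"},
    "famD": {"g1", "g3", "g4"},
}
GENOMES = ["g1", "g2", "g3", "g4"]


def profile(family):
    """Return the presence/absence profile of a family as a tuple of 0/1."""
    return tuple(int(g in PRESENCE[family]) for g in GENOMES)


def hamming(p, q):
    """Count the genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def similar_pairs(max_distance=0):
    """List family pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for f1, f2 in combinations(sorted(PRESENCE), 2):
        if hamming(profile(f1), profile(f2)) <= max_distance:
            pairs.append((f1, f2))
    return pairs


if __name__ == "__main__":
    # famA and famB have identical profiles in this toy data set.
    print(similar_pairs())
```

A matching profile is only a hint of a functional link; with few genomes, identical patterns can easily arise by chance.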


[Figure, panels A to D: MutS family proteins (MutS, MSH1 to MSH6, and related ORFs) from Aquae, Trepa, Drome (Fly), Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Metth, Helpy, Saugl, and Caeel, grouped into the subfamilies MSH1, MSH2, MSH3, MSH4, MSH5, MSH6, MutS1, and MutS2. Function labels attached to the subfamilies:]
• MSH4: Segregation & Crossover
• MSH5: Segregation & Crossover
• MutS1: All MMR (Bacteria)
• MSH1: MMR in Mitochondria
• MSH3: MMR of Large Loops in Nucleus
• MSH6: MMR of Mismatches and Small Loops in Nucleus
• MSH2: All MMR in Nucleus

rRNA and Uncultured Microbes


Evolutionary Rate Variation



Why Duplications Are Useful to Identify

• Allows division into orthologs and paralogs

• Improves functional predictions

• Helps identify mechanisms of duplication

• Can be used to study mutation processes in different parts of a genome

• Lineage-specific duplications may indicate species-specific adaptations
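
One simple way to look for lineage-specific duplications is to scan a gene tree for clades made up entirely of genes from a single species. The sketch below assumes a rooted gene tree is already available and encodes it as nested tuples with leaves written "species|gene"; the tree, the species names, and the function names are hypothetical and are not taken from the talk.

```python
# Hypothetical rooted gene tree as nested tuples. Leaves are "species|gene".
GENE_TREE = (
    ("sp1|geneA", "sp1|geneB"),
    (("sp2|geneC", "sp3|geneD"), "sp1|geneE"),
)


def species_of(leaf):
    """Extract the species part of a 'species|gene' leaf label."""
    return leaf.split("|")[0]


def leaves(node):
    """Return all leaf labels below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def lineage_specific_clades(node, target):
    """Return the largest clades that contain only genes of the target species.

    Each clade is reported as a sorted list of at least two leaf labels.
    """
    if isinstance(node, str):
        return []
    tips = leaves(node)
    if len(tips) >= 2 and all(species_of(t) == target for t in tips):
        return [sorted(tips)]
    found = []
    for child in node:
        found.extend(lineage_specific_clades(child, target))
    return found


if __name__ == "__main__":
    print(lineage_specific_clades(GENE_TREE, "sp1"))
```

In this toy tree, geneA and geneB of sp1 are each other's closest relatives, so they are reported as a candidate lineage-specific duplication, while geneE of sp1 groups with genes from other species and is not. The result depends on which genomes are sampled and on the quality of the tree.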


Lineage Specific Duplications in Wolbachia wMel

[Table with a single column, Annotation. Distinct annotations are listed below; many appear more than once, and "hypothetical protein" and "conserved hypothetical protein" account for most entries.]
• ankyrin repeat domain protein
• conserved domain protein
• conserved hypothetical protein (also with the notes FRAMESHIFT, POINT MUTATION, degenerate, interruption-C, truncated, truncation)
• DNA mismatch repair protein MutL (mutL)
• DNA repair protein RadC, putative
• DNA repair protein RadC, putative, truncation
• DNA repair protein RadC, truncation
• DnaJ domain protein
• exopolysaccharide synthesis protein ExoD-related protein
• HNH endonuclease family protein
• hypothetical protein
• major facilitator family transporter
• membrane protein, putative
• MutL family protein
• Na+/H+ antiporter family protein
• Na+/H+ antiporter, putative
• permease, putative
• portal protein, FRAMESHIFT
• prophage LambdaW1, DNA methylase
• prophage LambdaW1, terminase large subunit, putative
• prophage LambdaW2, ankyrin repeat domain protein
• prophage LambdaW2, baseplate assembly protein J, putative
• prophage LambdaW2, baseplate assembly protein V, putative, FRAMESHIFT
• prophage LambdaW2, baseplate assembly protein W, putative
• prophage LambdaW2, minor tail protein Z, putative, FRAMESHIFT
• prophage LambdaW2, site-specific recombinase, resolvase family
• prophage LambdaW4, ankyrin repeat domain protein
• prophage LambdaW4, DNA methylase
• prophage LambdaW4, portal protein, FRAMESHIFT
• prophage LambdaW4, terminase large subunit, putative
• prophage LambdaW5, ankyrin repeat domain protein
• prophage LambdaW5, baseplate assembly protein J, putative, FRAMESHIFT
• prophage LambdaW5, baseplate assembly protein V, putative
• prophage LambdaW5, baseplate assembly protein W, putative
• prophage LambdaW5, minor tail protein Z, putative, degenerate, FRAMESHIFT
• prophage LambdaW5, site-specific recombinase, resolvase family
• regulatory protein RepA, putative
• reverse transcriptase, putative
• sodium/alanine symporter family protein
• TenA/THI-4 family protein
• transcriptional regulator
• transcriptional regulator, putative
• translation elongation factor Tu (tuf)
• transposase, degenerate
• transposase, IS4 family
• transposase, IS5 family, interruption-N
• transposase, IS5 family, truncation
• transposase, putative, degenerate
• type IV secretion system protein VirB4, putative
• UDP-N-acetylglucosamine pyrophosphorylase-related protein

MutL Duplication in Wolbachia wMel

ORF01096 DNA mismatch repair protein MutL (mutL)
ORF00446 MutL family protein


Older Duplication of UVDE

[Figure: tree with scale bar 0.1. Tip labels:]
Schizosaccharomyces pombe GP139
Neurospora crassa PIRS55262S552
Clostridium perfringens GP18145
Bacillus subtilis SPP45864YWJD
Bacillus cereus GP6759487embCAB
B BACAN 01914 UV endonuclease
Bacillus halodurans OMNINTL01BH
B BACAN 01459 UV endonuclease
Deinococcus radiodurans GP61167
Nostoc sp. PCC 7120 GP17130610d


X-files

Eisen et al. 2000. Genome Biology 1(6): 11.1-11.9

Also see Tillier and Collins. 2000. Nature Genetics 26(2): 195-7 and Suyama and Bork. 2001. Trends in Genetics 17: 10-13.

C. trachomatis vs C. pneumoniae Dot Plot

[Figure: dot plot. Axes: C. trachomatis MoPn and C. pneumoniae AR39. Marked positions: Origin, Terminus]

Read et al. 2000
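
A whole-genome dot plot of this kind places one point per pair of matching genes, using the gene's position in each genome as the coordinates. The sketch below shows one way to build the points and to label each as lying near the diagonal (conserved position) or the anti-diagonal (position mirrored about the origin), the two arms of the X-shaped pattern discussed in Eisen et al. 2000. It assumes positions are measured from the replication origin; the gene names, coordinates, genome lengths, and the 5% tolerance are invented for illustration.

```python
# Hypothetical gene start positions (bp from the origin) in two genomes.
POS_A = {"a1": 100, "a2": 300, "a3": 800}
POS_B = {"b1": 110, "b2": 700, "b3": 400}
ORTHOLOGS = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
LEN_A = 1000  # assumed genome lengths
LEN_B = 1000


def dot_plot_points(pos_a, pos_b, orthologs):
    """Return one (x, y) coordinate per ortholog pair."""
    return [(pos_a[a], pos_b[b]) for a, b in orthologs]


def classify(point, len_a, len_b, tol=0.05):
    """Label a point as diagonal, anti-diagonal, or off-axis.

    Positions are scaled to fractions of genome length so that genomes
    of different sizes can be compared.
    """
    x = point[0] / len_a
    y = point[1] / len_b
    if abs(x - y) <= tol:
        return "diagonal"
    if abs(x - (1 - y)) <= tol:
        return "anti-diagonal"
    return "off-axis"


if __name__ == "__main__":
    for pt in dot_plot_points(POS_A, POS_B, ORTHOLOGS):
        print(pt, classify(pt, LEN_A, LEN_B))
```

Plotting the same points with any charting library gives the dot plot itself; the classification step is only a rough numerical stand-in for reading the pattern by eye.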


StrpB vs. StrpA: All

[Figure: plot with one axis running from 0 to 2500 and the other from 13621300 to 13623100]

StrpB vs. StrpA: Orthologs

[Figure: plot with one axis running from 0 to 2500 and the other from 13621300 to 13623100]


Most ‘Evidence’ for Gene Transfer has Alternative Explanations

Observation                          Other Causes                          Always Occurs
Unusual Distribution                 Sampling bias                         Not if recipient already has gene.
Unusual GC/Codons                    Selection                             Not if donor/recipient similar. Not if it occurred long ago.
High hit to "distant" species        Selection, rate variation, gene loss  Usually.
Incongruent trees                    Bad trees, missed paralogs            Usually.
Correlation of above with neighbors  Selection                             Only if genes keep order after transfer.
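
The "Unusual GC/Codons" row can be made concrete with a short sketch that flags genes whose GC fraction is far from the average across the genome. This is one common style of screen, not the method used in the talk, and the sequences, gene names, and z-score cutoff are invented. As the table notes, a flagged gene may reflect selection and not transfer, and transfers from a similar donor or from long ago would not be flagged at all.

```python
from statistics import mean, pstdev

# Hypothetical gene sequences: seven with 50% GC and one GC-rich outlier.
GENES = {f"gene{i}": "ATGC" * 5 for i in range(1, 8)}
GENES["island1"] = "GGCC" * 5


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def unusual_gc(genes, z_cutoff=2.0):
    """Return names of genes whose GC fraction is more than z_cutoff
    standard deviations from the mean over all genes."""
    fractions = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(fractions.values())
    sigma = pstdev(fractions.values())
    if sigma == 0:
        return []  # no variation, so nothing stands out
    return sorted(
        name for name, f in fractions.items() if abs(f - mu) / sigma > z_cutoff
    )


if __name__ == "__main__":
    print(unusual_gc(GENES))
```

Any hit from a screen like this is a candidate to check against the other rows of the table, such as gene distribution and tree congruence, before calling it a transfer.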


Steps in Lateral Gene Transfer

[Figure labels: steps 1, 2, 3-5, 6; A, B, C, D]

Mitochondrial Genome Integration into A. thaliana chrII

Lin et al., 1999


Number of pBVTs Depends on # of Genomes Analyzed

[Figure: chart of Number of protein sets (axis 0 to 1800) for the categories 1, 2, 3, 4, 5, Other. Legend: Fruit fly, C. elegans, Arabidopsis, Yeast, Parasites]

Salzberg et al. 2001

Trees Don’t Support Transfer II



Beja O, et al., Science (2000) 289: 1902-6; Nature (2001) 411: 786-789

Puf Operons from Uncultured Bacteria


Puf Operons vs. Cultured Species


Alternative Phylogenetic Anchors

[Figure labels: Chlorobium tepidum, Cytophaga hutchinsonii, Prevotella ruminicola, Bacteroides fragilis, Porphyromonas gingivalis, MBBAD68TR, MBBAD65TR]

Acknowledgements
• Outside TIGR
  – A. Stoltzfus
  – H. Ochman
  – D. Bryant
  – W. F. Doolittle
  – M. Eisen
  – M-I Benito
• $$$:
  – NSF
  – NIH
  – ONR
  – DOE
  – NEB

B. anthracis lineage specific duplications

ORF04205 molybdopterin biosynthesis protein MoeA (moeA)
ORF05907 molybdopterin biosynthesis protein MoeA (moeA)
ORF02636 molybdopterin biosynthesis protein MoeA (moeA)
ORF04204 molybdopterin biosynthesis protein MoeB, putative
ORF05908 molybdopterin biosynthesis protein MoeB, putative
ORF02634 molybdopterin biosynthesis protein MoeB, putative
ORF05904 molybdopterin converting factor, subunit 1 (moaD)
ORF02639 molybdopterin converting factor, subunit 1 (moaD)
ORF04206 molybdopterin converting factor, subunit 2 (moaE)
ORF05905 molybdopterin converting factor, subunit 2 (moaE)
ORF02638 molybdopterin converting factor, subunit 2 (moaE)

Based on Read et al. submitted


C. pneumoniae Paralogs by Position


C. pneumoniae Paralogs - Lineage Specific
