an em algorithm for inferring the evolution of eukaryotic gene structure liran carmel, igor b....

35
An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National Institutes of Health

Upload: marco-morriss

Post on 15-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure

Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin

NCBI, NLM, National Institutes of Health

Page 2: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 3: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

GU AG

splicing

exon1 exon2intron

mRNA

What are Exons and Introns

exon1 exon2

Page 4: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Related work

2 3 4

Gilbert 2005 [hybrid; branch-specific]

Koonin2003 [Dollo Parsimony]

Csuros 2005 [ML; branch-

specific]

Kenmochi 2005 [ML; branch-

specific]

Stolzfus 2004 [Bayes; gene-

specific]

gain

Stolzfus

Koonin,Kenmochi,Csuros

Gilbert

Koonin,Kenmochi,Csuros

loss

Page 5: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 6: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Eukaryota

AME

DicdiUnikonts

Unikonts

Metazoa

CoelomataDeuterostomia

Diptera

FungiAscomycota

ScAfNc

Magnoliophyta

Chordata

Vertebrata

Apicomplexa

Pezizomycotina

Amniota

Mammals

DicdiCaeel

StrpuCioin

DanreGalga

Homsa DromeAnoga

CryneSchpo

SacceAspfu

NeucrArath

OrysaThepa

PlafaRoden

Phylogenetic tree

Page 7: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…

Multiple alignment

Page 8: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

41 118 222 230 251 309 377 453 465 539 597 602 713 SC 0 0 0 0 0 0 0 0 0 0 0 0 0SP 0 1 1 0 0 0 1 0 0 0 0 0 0CE 1 1 0 0 0 0 1 0 0 0 0 0 0DM 1 1 0 0 0 0 1 0 0 1 0 0 0HS 1 1 0 0 1 0 1 0 1 1 1 0 0AT 1 1 0 1 0 1 1 1 0 1 0 1 1

Strong phyletic signal

Presence/absence maps (proteasome component C3)

Page 9: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Missing data

HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…?

Page 10: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Missing data (proteasome component C3)

41 118 222 230 251 309 377 401 453 465 539 597 602 713 SC 0 0 0 0 0 0 0 0 0 0 0 0 0 0SP 0 1 1 0 0 0 1 ? 0 0 0 0 0 0CE 1 1 0 0 0 0 1 0 0 0 0 0 0 0DM 1 1 0 0 0 0 1 0 0 0 1 0 0 0HS 1 1 0 0 1 0 1 ? 0 1 1 1 0 0AT 1 1 0 1 0 1 1 1 1 0 1 0 1 1

Page 11: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Eukaryota

AME

DicdiUnikonts

Unikonts

Metazoa

CoelomataDeuterostomia

Diptera

FungiAscomycota

ScAfNc

Magnoliophyta

Chordata

Vertebrata

Apicomplexa

Pezizomycotina

Amniota

Mammals

DicdiCaeel

StrpuCioin

DanreGalga

Homsa DromeAnoga

CryneSchpo

SacceAspfu

NeucrArath

OrysaThepa

PlafaRoden

Bayesian Network

Page 12: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 13: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Probability structure

descendant in state 0 descendant in state 1

parent in state 0

parent in state 1

)1(1 tget )1( tget

root prior probability: transition probability: for gene and branch

of length

branch-specific loss

gene-specific loss

branch-specific

gain

gene-specific

gain

tg0

tget )1(1

tget )1(

t

Page 14: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Rate variation across sites

gain variation

loss variation

gg r

);()1()(~ Gr

gg r

);(~ Lr

shape parameter

(gain)

fraction of invariant sites

shape parameter

(loss)

Page 15: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Parameter Summary

Global parameters – probability for intron absence in the root – fraction of invariant sites – shape parameters of the gamma distribution

Gene-specific parameters – gain rate – loss rate

Branch-specific parameters – gain coefficient – loss coefficient

0

LG ,

g

g

tt

Page 16: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Homogeneous vs. Heterogeneous Evolution

The number of parameters in the model

GS 2)22(24

number of extant species

number of genes

HomogeneousEvolution

setting G = 1

HeterogeneousEvolution

fixing global parameters and branch-specific parameters

Page 17: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 18: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Likelihood maximization via Expectation Maximization

E-Stepinward-outward recursions on the treemember in the junction-tree algorithms

familymissing data are naturally embedded

Page 19: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Inward (gamma) recursion

)|)(Pr()( PqqLq

?

?

?

?

??

q

Page 20: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Inward (gamma) recursion - Initialization

)|)(Pr()( PqqLq

t

tt

t

tt

t

t

s

se

e

se

e

qtgk

tgk

tgk

tgk

1

1

1)1(

)1(

0)1(1

)1(1

)('

'

Page 21: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Inward (gamma) recursion - Recursion

)|)(Pr()( PqqLq

1

0

)()()()(j

Rtj

Ltjt

gijti qqqAq

q

Page 22: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outward (alpha) recursion

))0(|,Pr(),( qLPqqPqq

Page 23: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Likelihood maximization via EM E-Step

inward-outward recursions on the treemember in the junction-tree algorithms

familymissing data are naturally embedded

M-Steplow-tolerance variable-by-variable

maximizationNewton-Raphson

Page 24: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 25: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Intron density in ancient eukaryotes

2 3 4

Gilbert2005

Koonin2003

Csuros 2005

Kenmochi 2005

Stolzfus 2004

Page 26: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Evolutionary Landscape

loser

gainer

stable

dynamic

Eukaryota

AMEDicdiUnikonts

Unikonts

Metazoa

CoelomataDeuterostomia

Diptera

FungiAscomycota

ScAfNc

Magnoliophyta

Chordata

Vertebrata

Apicomplexa

Pezizomycotina

Amniota

Mammals

DicdiCaeel

StrpuCioin

DanreGalga

Homsa DromeAnoga

CryneSchpo

SacceAspfu

NeucrArath

OrysaThepa

PlafaRoden

Page 27: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Modes of Evolution

0 1 2 3 4 5 60

0.005

0.01

0.015

0.02

0.025

total loss rate [1/BYA]

tota

l gai

n ra

te [1

/BY

A]

Deuterostomia

Metazoa

Unikonts

Ascomycota

SacceDiptera

ScAfNc

Fungi

Roden

Caeel

AnogaSchpo

ChordataPezizomycotina

Cioin

Page 28: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Modes of Evolution

0 1 2 3 4 5 60

0.005

0.01

0.015

0.02

0.025

total loss rate [1/BYA]

tota

l gai

n ra

te [1

/BY

A]

Deuterostomia

Metazoa

Unikonts

Ascomycota

SacceDiptera

ScAfNc

Fungi

Roden

Caeel

AnogaSchpo

ChordataPezizomycotina

Cioin

loser

gainer

stable

dynamic

Page 29: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

234 genes

295 genes

187 genes

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 30: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Gene Characteristics

New features of genes: Intron gain rate Intron loss rate

Old features of genes: Expression level Evolutionary rate Lethality Connectivity in protein-protein interactions Connectivity in genetic interactions

Page 31: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Combined Features

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Status

Ada

ptab

ility

NPGI

PGLER

EL

PPI

KE

Page 32: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Combined Features

-0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

Adaptability

Re

activ

ity

NP

GI

PGL

ER

EL

PPI

KE

PPI

ER

Page 33: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Important genes gain introns

Status Adaptability reactivity

Gain rate 0.33 0.06 0.37

Loss rate 0.05 -0.03 0.07

Page 34: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Outline

Background and Related Work Data Components The Model The Algorithm Results – Homogeneous Evolution Results – Heterogeneous Evolution Summary

Page 35: An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National

Conclusions

Disparate landscape – both gain and loss play role in intron evolution

The common ancestor of the crown group had an intron content comparable to fungi, apicomlexans and dipterans

Three modes of evolution – more than one mechanism?

Important genes tend to gain introns