Alternative splicing: A playground of evolution

Mikhail Gelfand

Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS,

Moscow, Russia

RECOMB, 20 May 2008

% of alternatively spliced human and mouse genes, by year of publication

[Plot: series are human (genome / random sample), human (individual chromosomes) and mouse (genome / random sample); estimates shown for all genes, only multiexon genes, and genes with high EST coverage. Latest point: 2008, C. Burge, 100%.]

Roles of alternative splicing

• Functional:
– creating protein diversity
   • human: ~30,000 genes, >100,000 proteins
– maintaining protein identity
   • e.g. membrane (receptor) and secreted isoforms
   • dominant negative isoforms
   • combinatorial (transcription factors, signaling domains)
– regulatory
   • e.g. via channelling to NMD (nonsense-mediated decay)

• Evolutionary

Plan

• Evolution of alternative exon-intron structure

• Origin of new (alternative) exons and sites

• Evolutionary rates in constitutive and alternative regions

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Mutually exclusive exons

Sources of data

• ESTs: 1999 global, 2002-3 comparative
– mapping exon-intron structure to genome
– global alignment of genomes
– identifying non-conserved exons and splice sites

• oligonucleotide arrays (chips): 2001 global, 2004 comparative
– qualitative analysis (inclusion values)
– genome-specific constitutive / alternative exons

• mRNA-seq (new generation high-throughput): 2008 global, expected 2009-10 comparative

Alternative exons are often genome-specific

(Modrek & Lee, 2003)

~25% of AS events in ~50% of genes are not conserved

Na/K-ATPase Fxyd2/FXYD2

p53

Nurtdinov…Gelfand, 2003

Alternative exon-intron structure in fruit flies and malarial mosquito

• Same procedure (AS data from FlyBase)

– cassette exons, splicing sites

– also mutually exclusive exons, retained introns

• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes

• Technically more challenging:

– incomplete genomes

– the quality of alignment with the Anopheles genome is lower, especially for terminal exons

– frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Malko…Gelfand, 2006

Conservation of coding segments

                                       constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura     97%                     75-80%
D. melanogaster – Anopheles gambiae    77%                     ~45%

Conservation of D.melanogaster elementary alternatives in D. pseudoobscura genes

[Stacked bar chart, 0% to 100%, one bar per category: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colours: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved.]

• retained introns are the least conserved (are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons

Conservation of D.melanogaster elementary alternatives in Anopheles gambiae genes

[Stacked bar chart, 0% to 100%, one bar per category: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colours: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved.]

• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved

Genome-specific AS: real or noise?

young or deteriorating?

• minor isoforms, small inclusion rate

• often frameshifting and/or stop-containing => NMD
– regulatory role?

Sorek, Shamir & Ast, 2004

Alternative exon-intron structure in the human, mouse and dog genomes

• Human-mouse-dog triples of orthologous genes

• We follow the fate of human alternative sites and exons in the mouse and dog genomes

• Each human AS isoform is spliced-aligned to the mouse and dog genomes. Definition of conservation:
– conservation of the corresponding region (homologous exon is actually present in the considered genome);
– conservation of splicing sites (GT and AG)

Nurtdinov…Gelfand, 2007
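The splice-site half of this conservation definition can be illustrated with a minimal sketch. It assumes that the human exon has already been spliced-aligned to the target genome, so that the coordinates of the homologous region are known; the function name, the coordinate convention (0-based, half-open, forward strand) and the toy sequences are hypothetical and not taken from the original study.

```python
def splice_sites_conserved(genome_seq: str, exon_start: int, exon_end: int) -> bool:
    """Check the canonical dinucleotides flanking an aligned internal exon.

    The exon is assumed to occupy genome_seq[exon_start:exon_end] on the
    forward strand. The acceptor site AG must end the upstream intron and
    the donor site GT must start the downstream intron.
    """
    if exon_start < 2 or exon_end + 2 > len(genome_seq) or exon_start >= exon_end:
        return False  # not enough flanking sequence to inspect both sites
    acceptor = genome_seq[exon_start - 2:exon_start].upper()
    donor = genome_seq[exon_end:exon_end + 2].upper()
    return acceptor == "AG" and donor == "GT"


# Toy example: lower case marks intronic sequence, upper case the exon.
conserved = "ccccagATGGCCgtaaaa"
mutated = "ccccagATGGCCgcaaaa"   # donor GT changed to GC
print(splice_sites_conserved(conserved, 6, 12))
print(splice_sites_conserved(mutated, 6, 12))
```

A real pipeline would also have to handle the reverse strand, terminal exons (which have only one splice site) and non-canonical sites; the sketch deliberately ignores these.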

Caveats

• we consider only the possibility of AS in mouse and dog and do not require the actual existence of the corresponding isoforms in known transcriptomes

• we do not account for situations when an alternative human exon (or site) is constitutive in mouse or dog

• functionality assignments (translated / NMD-inducing) are not very reliable

Gains/losses: loss in mouse

Common ancestor

Gains/losses: gain in human (or noise)

Common ancestor

Gains/losses: loss in dog (or possible gain in human+mouse)

Common ancestor

Triple comparison

[Diagram labels: human-specific alternatives (noise?); conserved alternatives; lost in dog; lost in mouse]

Translated and NMD-inducing cassette exons

• Mainly included exons are highly conserved irrespective of function
• Mainly skipped translated exons are more conserved than NMD-inducing ones
• Numerous lineage-specific losses
– more in mouse than in dog
– more of NMD-inducing than of translated exons

• ~40% of almost always skipped (<1% inclusion) human exons are conserved in at least one lineage (mouse or dog)

Mouse+rat vs human and dog: a possibility to distinguish between exon gain and noise

Nurtdinov…Gelfand, 2009

The rate of exon gain: decreases with the exon inclusion rate; increases with the sequence evolutionary rate

• Caveat: spurious exons still may seem to be conserved in the rodent lineage due to short time

Conserved rodent-specific exons and pseudoexons

Estimation of “FDR” by analysis of conservation of pseudoexons
• intronic fragments with the same characteristics (length distribution etc.)
• apply standard rules to estimate “conservation”
• obtain the number (fraction) of rodent-specific exons that could be pseudoexons conserved by chance (brown)
• obtain the number (fraction) of real rodent-specific exons (dark green): ~50%, that is, ~15% of mouse-specific exons (the rest is likely noise)
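The arithmetic behind such a pseudoexon control can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the two input rates (30% of candidate exons and 15% of pseudoexons passing the conservation rules) are invented values chosen so that the result lands near the ~50% and ~15% quoted on the slide; they are not figures from the study.

```python
def real_fraction(conserved_exon_rate: float, conserved_pseudoexon_rate: float) -> float:
    """Fraction of 'conserved' candidate exons not explained by chance.

    conserved_exon_rate: share of candidate exons that pass the conservation rules.
    conserved_pseudoexon_rate: share of matched intronic pseudoexons that pass
    the same rules, i.e. the background expected without any selection.
    """
    if not 0.0 < conserved_exon_rate <= 1.0:
        raise ValueError("conserved_exon_rate must be in (0, 1]")
    if not 0.0 <= conserved_pseudoexon_rate <= 1.0:
        raise ValueError("conserved_pseudoexon_rate must be in [0, 1]")
    excess = max(conserved_exon_rate - conserved_pseudoexon_rate, 0.0)
    return excess / conserved_exon_rate


# Hypothetical inputs, for illustration only.
exon_rate, pseudo_rate = 0.30, 0.15
among_conserved = real_fraction(exon_rate, pseudo_rate)   # about one half
among_all = among_conserved * exon_rate                   # about 15% of all candidates
print(f"real among conserved: {among_conserved:.2f}, real among all: {among_all:.2f}")
```

The estimate is only as good as the matching of pseudoexons to real exons (length distribution and other characteristics), which is why the slide puts “FDR” in quotes.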

Alternative donor and acceptor sites: same trends

• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)

Evolution of (alternative) exon-intron structure in 11 Drosophila spp.

[Phylogenetic tree of the species: D. melanogaster (Dmel), D. sechellia (Dsec), D. yakuba (Dyak), D. erecta (Dere), D. ananassae (Dana), D. pseudoobscura (Dpse), D. persimilis (Dper), D. willistoni (Dwil), D. mojavensis (Dmoj), D. virilis (Dvir), D. grimshawi (Dgri).]

Tree: D. Pollard, http://rana.lbl.gov/~dan/trees.html

Gain and loss of alternative segments and constitutive exons

[Figure: Drosophila species tree (Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dper, Dwil, Dmoj, Dvir, Dgri) with a pair of gain (+) or loss (–) rates on each branch. Sample size: 397 / 18596.]

Unique events per 1000 substitutions. Caveat: we cannot observe exon gain outside and exon loss within the D. melanogaster lineage

Gain and loss of alternative segments and constitutive exons

[Figure: Drosophila species tree (Dmel, Dsec, Dyak, Dere, Dana, Dpse, Dper, Dwil, Dmoj, Dvir, Dgri) with a pair of gain (+) or loss (–) rates on each branch. Sample size: 452 / 18874.]

Non-unique events per 1000 substitutions (Dollo parsimony)

Conserved alternative splicing in nematodes

• 92% of cassette exons from Caenorhabditis elegans are conserved in Caenorhabditis briggsae and/or Caenorhabditis remanei (EST-genome comparisons)
– in minor isoforms as well
– especially for complex events

• there is less difference between levels of AS (exon inclusion) in natural C.elegans isolates than in mutation accumulation lines (microarray analysis) => positive selection on the level of AS.

Irimia…Roy, 2007; Barberan-Soler & Zahler, 2008

Plants: little conservation of alternative splicing

• Arabidopsis thaliana – Oryza sativa (rice)

• Oryza sativa (rice) – Zea mays (maize)

• Few AS events are conserved (5% of genes compared to ~50% of genes with AS)

• the level of conservation is the same for translated and NMD isoforms

Severing…van Ham, 2009

Constitutive exons becoming alternative

• human-mouse comparison, EST data => 612 exons constitutively spliced in one species and alternatively in the other

• all are major isoform (predominantly included)

• analysis of other species (selected cases): ancestral exons have been constitutive

• characteristics of such exons (molecular evolution: Kn/Ks, conservation of intron flanks etc.) are similar to those of constitutive exons

Lev-Maor…Ast, 2007

Changes in inclusion rate

• orthologous alternatively spliced (cassette) exons of human and chimpanzee

• quantitative microarray profiling

• estimate the inclusion rate by comparison of exon and exon-junction probes

=> 6-8% of alternative exons have significantly different inclusion levels

Calarco…Blencowe, 2007

Sources of new exons

• exon shuffling and duplications
– mutually exclusive exons

• exonisation: new exons, new sites
– in repeats

• constitutive exons becoming alternative

Alternative splice sites: Model of random site fixation

• Plots: Fraction of exon-extending alternative sites as dependent on exon length
– Main site defined as the one in protein or in more ESTs
– Same trends for the acceptor (top) and donor (bottom) sites

• The distribution of alt. region lengths is consistent with fixation of random sites
– Extend short exons
– Shorten long exons

A natural model: genetic diseases

• Mutations in splice sites yield exon skips or activation of cryptic sites

• Exon skip or activation of a cryptic site depends on:
– Density of exonic splicing enhancers (lower in skipped exons)
– Presence of a strong cryptic site nearby

Av. dist. to a stronger site   Skipped exons   Cryptic site exons   Non-mutated exons
Donor sites                    220             75                   289
Acceptor sites                 185             66                   81

Kurmangaliev & Gelfand, 2008

Creation of sites

acceptor sites                                in exon   in intron
cryptic sites (mutations in the main site)    88        29
new sites                                     32        78

donor sites                                   in exon   in intron
cryptic sites (mutations in the main site)    121       133
new sites                                     46        46

Vorechovsky, 2006; Buratti…Vorechovsky, 2007

MAGE-A family of human CT-antigens

• Retroposition of a spliced mRNA, then duplication
• Numerous new (alternative) exons in individual copies arising from point mutations

Creation of donor sites

Improvement of an acceptor site

Exonisation of repeats

• early studies: 61 alternatively spliced translated exons with hits to Alu (no constitutive exons)
• 84% frame-shifting or stop-containing
• exonisation by point mutations in cryptic sites in the Alu consensus
– studied in experiment
• both donor and acceptor sites
• recent study: 1824 human exons, 506 mouse exons
– Alu, L1, LTR may generate completely new exons

Sorek, Ast, Graur, 2002; Lev-Maor…Ast, 2003; Sorek…Ast, 2004; Sela…Ast, 2007

          human        mouse
unique    1060 (Alu)   285 (B1, B2, B4, ID)
MIR       181          27
L1        219          102
L2        103          9
CR1       12           0
LTR       155          72
DNA       93           11

Evolutionary rate in constitutive and alternative regions

• Human and mouse orthologous genes

• D. melanogaster and D. pseudoobscura

• Estimation of the dn/ds ratio: higher fraction of non-synonymous substitutions (changing amino acid) => weaker stabilizing (or stronger positive) selection

Human/mouse genes: non-symmetrical histogram of dn/ds(const. regions) – dn/ds(alt. regions)

[Histogram: number of genes (y axis) against C – A = dn/ds(const.) – dn/ds(alt.), binned from –1 to 1 in steps of 0.1 (x axis). Black: shadow of the left half.]

In a larger fraction of genes dn/ds(alt) > dn/ds(const), especially for larger values

Concatenated regions: alternative regions evolve faster than constitutive ones

(*) in some other studies dN(alt) < dN(const): fewer synonymous substitutions in alternative regions

[Two bar-chart panels; constitutive regions are labelled П on the slide, alternative regions A]

             constitutive   alternative
Panel 1
  dN         0.068          0.076
  dS         0.405          0.414
  dN/dS      0.168          0.183
Panel 2
  dN         0.22           0.25
  dS         0.79           0.80
  dN/dS      0.28           0.31

Weaker stabilizing selection (or positive selection) in alternative regions (insignificant in Drosophila)

[Same dN, dS and dN/dS bar charts as on the previous slide]

Different behavior of terminal alternatives

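[Two bar-chart panels. Const. = constitutive regions (П on the slide), A = all alternative regions, AN / AI / AC = N-terminal / internal / C-terminal alternative regions.]

Panel 1      Const.   A       AN      AI      AC
  dN         0.068    0.076   0.076   0.074   0.132
  dS         0.405    0.414   0.410   0.437   0.445
  dN/dS      0.168    0.183   0.186   0.169   0.297

Panel 2      Const.   A       AN / AI / AC (bar-to-label assignment not recoverable from the transcript)
  dN         0.22     0.25    0.23, 0.33, 0.25
  dS         0.79     0.80    1.43, 0.90, 0.62
  dN/dS      0.28     0.31    0.37, 0.23, 0.28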

Mammals: Density of substitutions increases in the N-to-C direction

Drosophila: Synonymous substitutions prevalent in terminal alternative regions; non-synonymous substitutions, in internal alternative regions

Many drosophilas, different alternatives

dN in mutually exclusive exons same as in constitutive exons

dS lower in almost all alternatives: regulation?

Relaxed (positive?) selection in alternative regions

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
• Human and chimpanzee genome substitutions vs human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher’s exact test for significance

          Pn/Ps (SNPs)   Kn/Ks (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous substitutions: Kn(alt_minor) = 0.91% >> Kn(const) = 0.37%
• Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Kn/Ks) ~ 25% of positions
• Similar results for all highly covered genes or all conserved exons
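As a worked check of the α formula, the ratios from the table above can be plugged in directly. This is a minimal sketch: the function name is hypothetical, and it works from the rounded ratios on the slide rather than from the underlying SNP and substitution counts (which a Fisher's exact test would need and which are not given here).

```python
def mk_alpha(pn_ps: float, kn_ks: float) -> float:
    """Estimated fraction of non-synonymous substitutions fixed by positive selection.

    pn_ps: ratio of non-synonymous to synonymous polymorphisms (SNPs).
    kn_ks: ratio of non-synonymous to synonymous fixed substitutions (divergence).
    """
    if kn_ks <= 0:
        raise ValueError("Kn/Ks must be positive")
    return 1.0 - pn_ps / kn_ks


# (Pn/Ps, Kn/Ks) as given in the table for each class of regions.
ratios = {"Const.": (0.72, 0.62), "Major": (0.78, 0.65), "Minor": (1.41, 1.89)}
for region, (pn_ps, kn_ks) in ratios.items():
    print(f"{region:7s} alpha = {mk_alpha(pn_ps, kn_ks):+.2f}")
```

Only the minor-isoform class gives a positive α (1 – 1.41/1.89, roughly 0.25), in line with the ~25% quoted on the slide; the constitutive and major-isoform classes give negative values, which is the usual signature of an excess of slightly deleterious polymorphism rather than of positive selection.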

An attempt of integration

• AS is often species-specific

• young AS isoforms are often minor and tissue-specific

• … but still functional
– although species-specific isoforms may result from aberrant splicing

• AS regions show evidence for decreased negative selection
– excess non-synonymous codon substitutions

• AS regions show evidence for positive selection
– excess fixation of non-synonymous substitutions (compared to SNPs)

• AS tends to shuffle domains and target functional sites in proteins

• Thus AS may serve as a testing ground for new functions without sacrificing old ones

What next?

• Changes in inclusion rates (mRNA-seq)
– revisit constitutive-becoming-alternative exons

• Other taxonomical groups

• Evolution of regulation
– donor and acceptor splicing sites
– splicing enhancers and silencers
– cellular context (SR-proteins etc.)

• Control for:
– functionality: translated / NMD-inducing (frameshifts, stop codons)
– exon inclusion (or site choice) level: major / minor isoform
– tissue specificity pattern (?)
– type of alternative – 1: N-terminal / internal / C-terminal
– type of alternative – 2: cassette and mutually exclusive exons, alternative sites, etc.

Acknowledgements

• Discussions
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Vsevolod Makeev (GosNIIGenetika)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
– Sergei Nuzhdin (USC)

• Support
– Howard Hughes Medical Institute
– Russian Academy of Sciences (program “Molecular and Cellular Biology”)
– Russian Foundation of Basic Research

Authors

• Andrei Mironov (Moscow State University)

• Ramil Nurtdinov (Moscow State University) – human/mouse+rat/dog

• Dmitry Malko (GosNIIGenetika, Moscow) – drosophila/mosquito

• Ekaterina Ermakova (IITP) – Kn/Ks

• Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs, McDonald-Kreitman test

• Irena Artamonova (Inst. of General Genetics and IITP, Moscow) – human/mouse, plots, MAGE-A

Bonus track: conserved secondary structures regulating (alternative) splicing in the Drosophila spp.

• ~ 50 000 introns

• 17% alternative, 2% with alt. polyA signals

• >95% of D.melanogaster introns mapped to at least 7 of 12 other Drosophila genomes

• Search for conserved complementary words at intron termini (within 150 nt. of intron boundaries), then align

• Restrictive search => 200 candidates

• 6 tested in experiment (3 const., 3 alt.). All 3 alt. ones confirmed
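The first step of the search described above (complementary words near the two ends of an intron) can be sketched for a single sequence. This is an illustration under stated assumptions, not the authors' pipeline: the function names and the word length of 8 are hypothetical, only Watson-Crick pairs are considered (no G-U pairs), and the conservation across Drosophila genomes and the subsequent alignment step are left out. The 150 nt window is the one mentioned on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementary_words(intron: str, word_len: int = 8, window: int = 150):
    """Find words near the 5' end whose reverse complement occurs near the 3' end.

    Returns (pos_5prime, pos_3prime, word) tuples with 0-based positions in the intron.
    """
    intron = intron.upper()
    head = intron[:window]
    tail_offset = max(len(intron) - window, 0)
    tail = intron[tail_offset:]

    # Index every word of the 3'-terminal window by its sequence.
    tail_index = {}
    for j in range(len(tail) - word_len + 1):
        tail_index.setdefault(tail[j:j + word_len], []).append(tail_offset + j)

    hits = []
    for i in range(len(head) - word_len + 1):
        word = head[i:i + word_len]
        for j in tail_index.get(reverse_complement(word), []):
            if j >= i + word_len:  # the two arms of a stem must not overlap
                hits.append((i, j, word))
    return hits


# Toy intron: GT ... AG with one designed pair of complementary 8-mers.
toy = "GT" + "ACGGCTAA" + "T" * 40 + "TTAGCCGT" + "AG"
for hit in complementary_words(toy):
    print(hit)
```

In the toy intron the designed pair is reported together with an overlapping hit shifted by one position, which shows why candidate words would normally be merged into maximal stems before any alignment or conservation filtering.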

CG33298 (phospholipid translocating ATPase): alternative donor sites

Atrophin (histone deacetylase): alternative acceptor sites

Nmnat (nicotinamide mononucleotide adenylyltransferase): alternative splicing and polyadenylation

Less restrictive search => many more candidates

Properties of regulated introns

• Often alternative
• Longer than usual
• Overrepresented in genes linked to development

Authors

• Andrei Mironov (idea)
• Dmitry Pervouchine (bioinformatics)
• Veronica Raker, Center for Genome Regulation, Barcelona (experiment)
• Juan Valcarcel, Center for Genome Regulation, Barcelona (advice)
• Mikhail Gelfand (general pessimism)
