The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
Gene content in the Mlp Genome (automated annotation)
Mlp Summer workshop – INRA Nancy, August 20-21 2008
Duplessis Sébastien (INRA Nancy), Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM


Page 1: The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease

The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease

Gene content in the Mlp Genome (automated annotation)

Mlp Summer workshop – INRA Nancy, August 20-21 2008

Duplessis Sébastien (INRA Nancy)

Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM

Page 2

Annotation of Mlp Genome – Gene prediction (2006-2007)

[Flowchart; labels recovered from the slide]

Intrinsic approaches: coding potential search, SpliceMachine, Netstart

Repeats

Extrinsic approaches: Blastn, Blastx, tBlastx (Puccinia, Sporobolomyces, Basidiomycetes, Swissprot, Mlp ESTs)

EuGene, FGeneSH, Genewise

=> Predicted genes (manual curation)

Page 3

Mlp Genome Project – Summer 2007

Pre-release of Mlp genome assembly (16.4% gaps – Assembled with JAZZ)

Main genome scaffold total: 2,682

ESTs from 50/50 spores and germtubes of Mlp 98AG31

INRA Nancy => ~4,000 (2004)

JGI => ~60,000 (2007)

=> ~52,000 ESTs

ESTs from spores and germlings of Melampsora spp. [Mlp, Mmd, Mmt, Mo]

CFS Laval => ~3,000 Mlp / ~4,200 Mmd / ~3,000 Mo / ~3,000 Mmt

In planta ESTs from Mlp haustoria => ~1,700 Mlp H3B

=> ~15,000 ESTs

Page 4

Blast against Mlp scaffolds

Blast against Mlp ESTs

Blast against available basidiomycete genomes

Melampsora IAM website => summer 2007 (B. Hilselberger) updated in 2008 (E. Tisserant)

Page 5

Files to help in annotation using Artemis

=> fasta of genome scaffolds

=> gff files of ESTs clusters

=> gff files of blastn Hits vs. Puccinia, Sporobolomyces & Ustilago gene models
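The slide does not say how the GFF files of blastn hits were produced. As a rough sketch only, assuming the hits are available in BLAST tabular format (12 standard columns) with the gene model as query and the Mlp scaffold as subject, one hit can be turned into a GFF line that Artemis can load on top of the scaffold. The function name and the example identifiers are hypothetical.

```python
def blast_hit_to_gff(line, source="blastn"):
    """Convert one BLAST tabular hit (12 standard columns) to a GFF line.

    Assumption: query = gene model from another genome,
    subject = Mlp scaffold, so coordinates come from sstart/send.
    """
    fields = line.rstrip("\n").split("\t")
    query, scaffold, pident = fields[0], fields[1], fields[2]
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = fields[11]
    # On the subject, start > end means the hit is on the minus strand.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)
    attributes = f"Name={query};Note=pident {pident}"
    return "\t".join(
        [scaffold, source, "match", str(start), str(end),
         bitscore, strand, ".", attributes]
    )


# Hypothetical hit, for illustration only.
example = "Pgt_gene_0001\tscaffold_1\t85.0\t300\t45\t0\t1\t300\t5300\t5001\t1e-50\t200"
print(blast_hit_to_gff(example))
```

GFF requires start <= end, so the strand is carried in its own column rather than by reversing the coordinates.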


Page 6

Annotation of FL sequences = TRAINING SET for gene predictors (EuGene, fgenesh, …)

Gene models annotation based on complete EST support & homology

– Coding for known ubiquitous functions (metabolism, cytoskeleton elements…)
– Coding for hypothetical proteins and new genes?
– Coding for proteins of various sizes

Manual curation performed with Artemis (Nancy & Québec)

=> 348 GM curated

Edition of annotation cards => Melampsora Genome Consortium website

Page 7

TRAINING SET for gene prediction (EuGene, fgenesh, …)

=> 348 GM curated

=> 52,269 ESTs from Mlp 98AG31

=> raw TE prediction based on Mlp genome pre-release

Page 8

JGI Gene prediction (Andrea Aerts – Jan-Mar/2008)

• 39 scaffolds (43.9 Mbp)
• 409 repetitive elements provided by collaborator, 87 generated in pipeline
• nr: N. crassa, M. grisea, F. graminearum
• ESTs
– 3941 uniseqs described in 2003 paper
– 6318 uniseqs described in 2008 paper
– 8799 JGI cluster consensi (includes external ESTs)
• 5 C. parasitica CDSs from NCBI

Page 9

Outputs

Feature | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1
Scaffolds (Mbp) | 101.1 | 21.2 | 64.9 | 35.1 | 90.9
Gaps (Mbp) | 3.4 (3.4%) | 0.33 (1.6%) | 6.2 (9.6%) | N/A | 21.9 (24.1%)
Repeats (Mbp) | 49.4 (48.9%) | 0.31 (1.5%) | 14.4 (22.2%) | 0.32 (0.91%) | 4.96 (5.46%)
Gene length (Mbp) | 25.0 (24.7%) | 13.2 (62.3%) | 31.6 (48.7%) | 16.8 (47.9%) | 35.6 (39.2%)
# genes | 15,410 | 5,536 | 20,614 | 10,048 | 17,173
# genes / Mbp | 152.42 | 261.13 | 317.63 | 286.27 | 188.92
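The last row of the Outputs table can be recomputed from the two rows it depends on. The short sketch below assumes that "# genes / Mbp" is simply the gene count divided by the total scaffold length in Mbp; the variable names are illustrative.

```python
# Values copied from the "Scaffolds (Mbp)" and "# genes" rows.
scaffold_mbp = {"Mellp1": 101.1, "Sporo1": 21.2, "Lacbi1": 64.9,
                "Phchr1": 35.1, "Pospl1": 90.9}
n_genes = {"Mellp1": 15410, "Sporo1": 5536, "Lacbi1": 20614,
           "Phchr1": 10048, "Pospl1": 17173}

# Assumed definition: genes per Mbp = gene count / scaffold length (Mbp).
genes_per_mbp = {
    genome: round(n_genes[genome] / scaffold_mbp[genome], 2)
    for genome in scaffold_mbp
}

for genome, density in genes_per_mbp.items():
    print(f"{genome}: {density} genes/Mbp")
```

Under that assumption the recomputed values agree with the row on the slide, with Mellp1 showing the lowest gene density of the five genomes.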

Page 10

What do the genes look like?

Feature (mean) | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1
Gene length (bp) | 1622.89 | 2389.05 | 1533.42 | 1667.04 | 2075.26
Transcript length (bp) | 1241.87 | 1750.21 | 1134.45 | 1365.73 | 1438.85
Protein length (aa) | 383.36 | 564.80 | 367.19 | 455.18 | 458.46
Exon length (bp) | 256.26 | 242.77 | 210.13 | 233.64 | 211.92
Intron length (bp) | 101.07 | 104.88 | 92.70 | 64.18 | 111.92
Exon frequency (exons per gene) | 4.85 | 7.21 | 5.40 | 5.85 | 6.79
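A quick consistency check on these gene-structure numbers: if the transcript length is the sum of the exon lengths, the mean transcript length should be close to the mean exon length multiplied by the exon frequency. This is only an approximation (a product of means is not exactly a mean of products), and the sketch below is an illustration rather than the method used for the slide.

```python
# Means copied from the "What do the genes look like?" slide.
exon_len = {"Mellp1": 256.26, "Sporo1": 242.77, "Lacbi1": 210.13,
            "Phchr1": 233.64, "Pospl1": 211.92}
exons_per_gene = {"Mellp1": 4.85, "Sporo1": 7.21, "Lacbi1": 5.40,
                  "Phchr1": 5.85, "Pospl1": 6.79}
transcript_len = {"Mellp1": 1241.87, "Sporo1": 1750.21, "Lacbi1": 1134.45,
                  "Phchr1": 1365.73, "Pospl1": 1438.85}


def relative_gap(genome):
    """Relative difference between exon_len * exons_per_gene and transcript_len."""
    estimate = exon_len[genome] * exons_per_gene[genome]
    return abs(estimate - transcript_len[genome]) / transcript_len[genome]


gaps = {genome: relative_gap(genome) for genome in transcript_len}
for genome, gap in gaps.items():
    print(f"{genome}: {gap:.3%}")
```

For all five genomes the estimate lands well within 1% of the reported transcript length, which suggests the rows of the table were derived from the same set of gene models.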

Page 11

How were the genes predicted?

Method | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1
KGs and ESTs | 1377 (8.9%) | 54 (1%) | 64 (0.3%) | 12 (0.1%) | 61 (0.4%)
homology | 2653 (17.2%); Eug 5603 (36.4%) | 2713 (49%) | 3699 (18%); Eug 9848 (47.7%) | 3526 (35.1%) | 7549 (43.9%)
ab initio | 5777 (37.5%) | 2769 (50%) | 7003 (34%) | 6510 (64.8%) | 9563 (55.7%)
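The per-method counts can be cross-checked against the "# genes" row of the Outputs slide. The sketch below reads the two "Eug" entries (Mellp1 and Lacbi1) as separate counts next to the homology counts; that reading is an assumption made here, not something the slide states.

```python
# Counts per prediction method, copied from the slide.
# "Eug" is treated as its own category (assumption).
method_counts = {
    "Mellp1": {"KGs and ESTs": 1377, "homology": 2653, "Eug": 5603, "ab initio": 5777},
    "Sporo1": {"KGs and ESTs": 54, "homology": 2713, "ab initio": 2769},
    "Lacbi1": {"KGs and ESTs": 64, "homology": 3699, "Eug": 9848, "ab initio": 7003},
    "Phchr1": {"KGs and ESTs": 12, "homology": 3526, "ab initio": 6510},
    "Pospl1": {"KGs and ESTs": 61, "homology": 7549, "ab initio": 9563},
}

# "# genes" row from the Outputs slide.
total_genes = {"Mellp1": 15410, "Sporo1": 5536, "Lacbi1": 20614,
               "Phchr1": 10048, "Pospl1": 17173}

# Difference between the summed method counts and the reported gene count.
mismatch = {
    genome: sum(counts.values()) - total_genes[genome]
    for genome, counts in method_counts.items()
}
print(mismatch)
```

With that reading, the method counts add up exactly to the gene totals for every genome, which supports treating "Eug" as a fourth category rather than a subset of the homology models.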

Page 12

How good are the genes?

Metric | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1
start + stop | 14432 (94%) | 3891 (70%) | 18218 (88%) | 8352 (83%) | 14569 (85%)
nr | 6664 (43%) | 4446 (80%) | 10925 (53%) | ND | 13374 (78%)
Pfam | 4101 (27%) | 3272 (59%) | 7653 (37%) | 4769 (47%) | 7681 (45%)
EST | 3230 (21%) | 1759 (32%) | 2468 (12%) | ND | 4038 (23%)

Page 13

KOG assignments

Category | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1
Cellular Processes & Signaling | 2769 (18%) | 1525 (28%) | 3351 (16%) | 2132 (21%) | 3482 (20%)
Information Storage & Processing | 1864 (12%) | 1149 (21%) | 2196 (11%) | 1456 (14%) | 2251 (13%)
Metabolism | 2127 (14%) | 1358 (25%) | 2294 (11%) | 2044 (20%) | 3589 (21%)

Page 14

KEGG assignments

[Bar chart: KEGG category assignments for Mellp1, Sporo1, Lacbi1, Phchr1 and Pospl1; value axis from 0 to 0.09. Categories: Amino Acid Metabolism; Biodegradation of Xenobiotics; Biosynthesis of Secondary Metabolites; Carbohydrate Metabolism; Energy Metabolism; Lipid Metabolism; Metabolism of Cofactors and Vitamins; Metabolism of Complex Carbohydrates; Metabolism of Complex Lipids; Metabolism of Other Amino Acids; Nucleotide Metabolism]

Page 15

Prediction of Gene Models using EuGene (VIB - Ghent)

Annotation performed with Mlp genome pre-release

M-P Oudot Le Secq - EuGene annotation using Laccaria bicolor annotation parameters
=> ~17,000 Mlp gene models (<1,500 TEs) => Mlp GM v0.0

Yao-Cheng Lin - EuGene annotation using parameters specifically defined for M. larici-populina
=> ~9,000 Mlp gene models (> 200 aa)

Annotation performed with Mlp genome assembly release Jan2008

Yao-Cheng Lin - EuGene annotation using specific training for M. larici-populina

=> 12,386 Mlp gene models

4308 hits vs yeast
4899 hits against Uniprot (7487 no hits - 1/3 ; 2/3)
4708 supported by ESTs

Yao-Cheng Lin – Last EuGene annotation (summer 2008)

including 454 data (~5000 contigs) and adjusted parameters for small secreted proteins prediction

=> 17,167 Mlp gene models (6,989 < 300 aa)

Page 16

JGI Gene prediction (Andrea Aerts – 03/28/2008)

• Genewise – 9193 models
• Fgenesh_pm – 3147 models
• estExt_fpm – 2438 models

+ EuGene Prediction

Reconciliation and release in April 2008

Page 17

JGI Gene Models prediction

16,694 gene models predicted by JGI (& EuGene)

– 4465 EuGene models (27%)
– 4810 fgenesh1 (29%) + 5422 fgenesh2 (32%) => 65.5% fgenesh models
– 1997 Genewise/GenewisePlus models (12%)

21% of fgenesh/genewise models were consolidated with EST Extension

Prediction method:
– Ab initio: 51 %
– EuGene: 27 %
– Homology based: 14 %
– EST based: 8 %

Gene Model validation:
– Complete (5'M-3'*): 94 %
– Alignment with nr: 43 %
– Alignment with pfam: 25 %
– EST support: 27 %
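The per-predictor shares on this slide can be recomputed from the model counts. The sketch below assumes each share is the count divided by the 16,694 total; names are illustrative. Computed this way, fgenesh1 and fgenesh2 together come to roughly 61% of the models, which differs from the 65.5% quoted on the slide; the basis for that figure is not stated.

```python
# Model counts per predictor, copied from the slide.
models = {
    "EuGene": 4465,
    "fgenesh1": 4810,
    "fgenesh2": 5422,
    "Genewise/GenewisePlus": 1997,
}

total = sum(models.values())

# Share of each predictor as a percentage of all gene models.
share_pct = {name: 100 * count / total for name, count in models.items()}
fgenesh_pct = share_pct["fgenesh1"] + share_pct["fgenesh2"]

print(f"total gene models: {total}")
for name, pct in share_pct.items():
    print(f"{name}: {pct:.1f}%")
print(f"fgenesh1 + fgenesh2: {fgenesh_pct:.1f}%")
```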

Page 18

JGI Gene Models prediction

16,694 gene models predicted by JGI (& EuGene)

Mean exon size: 250 bp (Laccaria: 210 bp)
Mean intron size: 120 bp (Laccaria: 93 bp)
Mean protein size: 378 aa (Laccaria: 367 aa)

Mean gene length: 1685 bp (Laccaria: 1.5 kb)
Mean transcript length: 1224 b (Laccaria: 1.1 kb)
Exon # / gene: 4.90 (Laccaria: 5.4)

Protein length < 300 aa
– Laccaria: 52%, Coprinus: 40%
– Melampsora: 49%, Puccinia: 54%
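The slide does not describe how the share of proteins under 300 aa was obtained. One simple way to compute such a fraction, shown here as a sketch with a hypothetical helper and toy input, is to read the predicted proteins in FASTA format and count the sequences below the threshold.

```python
def fraction_shorter_than(fasta_lines, max_len=300):
    """Return the fraction of FASTA records whose sequence is shorter than max_len.

    fasta_lines: any iterable of text lines (for example an open file).
    A trailing '*' (stop symbol) is not counted as a residue.
    """
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: store the length of the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.rstrip("*"))
    if current is not None:
        lengths.append(current)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < max_len) / len(lengths)


# Toy records, for illustration only (not Mlp proteins).
toy = [">p1", "M" * 100, ">p2", "M" * 200, "M" * 150, ">p3", "M" * 299 + "*"]
print(fraction_shorter_than(toy))
```

Passing an open file handle for a protein FASTA file instead of the toy list would give the corresponding fraction for a whole proteome.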

Page 19

JGI Gene Models prediction – Introns donors and acceptors

Page 20

Gene Models density on the 20 largest scaffolds

Mean gene density of 2.04/10 kb => 1 gene / 4.9 kb (Laccaria: 1 gene / 3.1 kb)
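The two ways of stating gene density on this slide are reciprocals of each other. A minimal sketch of the conversion, with hypothetical function names:

```python
def kb_per_gene(genes_per_10kb):
    """Convert a density in genes per 10 kb to the mean spacing in kb per gene."""
    return 10.0 / genes_per_10kb


def genes_per_10kb(kb_between_genes):
    """Convert a mean spacing in kb per gene back to genes per 10 kb."""
    return 10.0 / kb_between_genes


mlp_spacing = kb_per_gene(2.04)        # Mlp density from the slide
laccaria_density = genes_per_10kb(3.1)  # Laccaria spacing from the slide

print(f"Mlp: 1 gene / {mlp_spacing:.1f} kb")
print(f"Laccaria: {laccaria_density:.2f} genes / 10 kb")
```

Expressed on the same scale, the Laccaria figure of 1 gene per 3.1 kb corresponds to a little over 3 genes per 10 kb, against 2.04 for Mlp.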

Page 21

JGI Gene Models prediction – The Mlp gene space

28% of the genome is coding sequence

16,694 putative proteins (gene models) = JGI prediction + extra putative proteins identified with EuGene

15,725 proteins > 100 aa
– Laccaria: >17,000
– Phanerochaete: 10,048
– Coprinopsis: 8,759
– Ustilago: 6,522

7,830 with homologs in nr (47%), including 3,893 hypothetical proteins (Puccinia, Laccaria, mostly basidiomycete)
5,461 with homologs in swissprot (33%)
6,820 with homologs in Laccaria (41%)
4,507 supported by Mlp ESTs (27%)

A large proportion (30%) of Mlp genes do not have homologues in other fungal genomes including Pucciniales P. graminis and Sporobolomyces roseus

Page 22

Blast vs. other fungal deduced proteomes

[Bar chart: matches (%), axis from 0 to 70, for ESTs, Phakopsora, Puccinia, Sporobolomyces, Ustilago, Phanerochaete, Coprinus, Laccaria, Magnaporthe]

33% of gene models are specific to Melampsora larici-populina (5,500 models with no homologs, but ~300 Pfam/IPR hits)

10,344 homologs in P. graminis (62%)

~25% of orthologs with P. graminis

Page 23

Mlp gene models functional classification

Page 24

GO classification: 27.8%

[Charts; category labels recovered from the slide]

Cellular component: cell; macromolecular complex; organelle; extracellular region; envelope

Molecular function: catalytic activity; binding; transporter activity; enzyme regulator activity; molecular transducer activity; motor activity; transcription regulator activity; structural molecule activity; nutrient reservoir activity; antioxidant activity

Biological process: metabolic process; establishment of localization; cellular process; biological regulation; response to stimulus; reproduction

Page 25

• KEGG pathways: 2758 gene models (16.5%)

[Bar chart: % (axis from 0 to 35) per KEGG category for Melampsora, Puccinia and Sporobolomyces. Categories: Amino Acid Metabolism; Biodegradation of Xenobiotics; Biosynthesis of Secondary Metabolites; Carbohydrate Metabolism; Energy Metabolism; Lipid Metabolism; Metabolism of Cofactors and Vitamins; Metabolism of Other Amino Acids; Nucleotide Metabolism]

Page 26

JGI summary – A complete table to help in annotating Mlp gene models

Page 27

Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp Bioinfo

Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families

Mlp 98AG31

Marie-Pierre Oudot-Le Secq (INRA Nancy): early EuGene gene prediction

the 'bad guy' genomic team at INRA

UMR 1136 IAM: Duplessis Sébastien & Francis Martin