Melampsora Genome Annotation and Genome Structure Analysis
First Annotation Workshop of the Melampsora Genome Consortium
Yao-Cheng Lin
Bioinformatics & Evolutionary Genomics
VIB Department of Plant Systems Biology, UGent

Overview
• Gene prediction (structure annotation)
• Gene family analysis
• Phylogenetic position of Melampsora

EuGène: gene prediction platform
[Diagram: EuGène combines several evidence sources to turn a genomic sequence into predicted genes]
• Intrinsic information
  – Coding IMM and intronic IMM: content potential for coding, intronic and intergenic regions
  – Translation start site
  – FunSiP: splice site prediction (GT/AG)
• Extrinsic information
  – TE & repeat database: RepeatMasker
  – Protein databases: BlastX
  – EST databases: BlastN, GenomeThreader
  – Puccinia genomic sequence: TblastX
• Other prediction programs
• Alternative models
• Input: genomic sequence; output: predicted genes

Resources for Melampsora gene prediction
• Gene models for training
  – Previously identified core genes in basidiomycetes
  – Manually curated genes from INRA-Nancy
• Splice site training/prediction
  – FunSiP, developed by Michiel Van Bel, who also helped with the training
• BlastX database
  – 8 basidiomycete proteomes, Fungi RefSeq, SwissProt
• TBLASTX database
  – Puccinia graminis genomic sequence
• EST libraries
  – JGI Sanger sequencing
  – 454 pyrosequencing (the first MIRA assembly)
• Repeat libraries
  – Hadi/Marie-Pierre
  – In-house script, collected from the first run of gene prediction
  – Masked area from JGI
• EuGene 3.4

Gene prediction – comparison of two prediction results
                                        EuGene           JGI
Number of protein coding genes          17,167           16,694
Coding sequence < 300 aa                6,989 (40.7%)    8,212 (49.2%)
Average gene length (bp)                1,742.7          1,685.5
Average coding sequence length (bp)     1,369.7          1,131.4
Average exon length (bp)                261.1            235
Average exon number                     5.3              4.8
Average intron length (bp)              86.9             117.8
SwissProt support                       6,521 (38.0%)    5,699 (34.1%)
EST support                             6,152 (35.8%)    6,241 (37.4%)
EST support (< 300 aa)                  1,066            995

Gene prediction – protein length distribution
[Figure: protein length distribution. x-axis: protein length (aa), 100 to 1900; y-axis: frequency (%), 0 to 40; series: Melampsora JGI, Melampsora EuGene, Laccaria, Puccinia]

Example: metallothionein-like protein
• Metallothionein-like protein in Magnaporthe
• Protein length: 22 amino acids (MMT1)
• Six cysteine residues
• Mmt1 mutants lose the ability to cause plant disease
• Difficulties in in silico identification
  – Sequence divergence
  – Short sequences are easily rejected by an E-value cut-off
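The E-value problem for short proteins can be made concrete with the standard relation between bit score and E-value, E = m · n · 2^(−S'), where m is the query length, n the database length and S' the bit score. The sketch below is only an illustration: the database size of 10^8 residues is an assumed value, not one from the slides, and raw lengths are used in place of the effective lengths that BLAST computes.

```python
import math

def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)

def min_bit_score(evalue_cutoff: float, query_len: int, db_len: int) -> float:
    """Smallest bit score that still passes the given E-value cut-off."""
    return math.log2(query_len * db_len / evalue_cutoff)

# Assumed, illustrative numbers: a 22-aa query (the length of MMT1)
# searched against a hypothetical database of 1e8 residues at E <= 1e-5.
needed = min_bit_score(1e-5, query_len=22, db_len=10**8)
print(f"bit score needed: {needed:.1f}")
```

Under these assumptions a hit needs a bit score of roughly 48, which an alignment of only 22 residues between divergent sequences may not reach.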

Overview
• Gene prediction and annotation platform
• Gene family analysis
• Phylogenetic position of Melampsora

Gene family expansion and contraction
• Gene family clustering
  – Similarity search with 12 fungal genomes (10 basidiomycetes, 2 ascomycetes): all-against-all BLASTP, E-value cutoff 1e-5.
  – Gene families constructed by TribeMCL with inflation factor 4.0.
• Species/lineage-specific gene family expansions
  – The mean gene family size and standard deviation were calculated for all gene families, excluding species-specific families (SSFs) and orphans.
  – To center and normalize the data, the matrix of the previous profile was transformed into a matrix of z-scores.
• Functional assignment
  – Domain based: RPS-BLAST
  – HMM profile for each family -> search the SwissProt and NR databases.
  – GO terms.
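As a rough sketch of how the family-size matrix could be built from the clustering output: MCL writes one cluster per line with tab-separated member IDs. The function name, the `species|gene` ID convention and the example IDs below are assumptions made for illustration, not the consortium's actual pipeline.

```python
from collections import Counter

def family_profile(cluster_lines, species):
    """Return one row of per-species gene counts for each gene family.

    Singleton clusters (orphans) and clusters whose members all come from
    one species (species-specific families) are skipped, as on the slide.
    Assumes gene IDs of the hypothetical form 'species|gene'.
    """
    profile = []
    for line in cluster_lines:
        members = line.split()
        if len(members) < 2:
            continue  # orphan
        counts = Counter(member.split("|", 1)[0] for member in members)
        if len(counts) < 2:
            continue  # species-specific family
        profile.append([counts[sp] for sp in species])
    return profile

# Hypothetical MCL output lines
clusters = [
    "Mlp|g1\tMlp|g2\tPgt|g1",
    "Lb|g1",
    "Mlp|g3\tMlp|g4",
    "Lb|g2\tPgt|g2\tMlp|g5",
]
rows = family_profile(clusters, ["Mlp", "Pgt", "Lb"])
print(rows)
```

Each row of the result is one family and each column one genome, which is the layout the z-score transformation expects.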

Protein phylogeny profile / z-score
Protein phylogeny profile (rows: gene families; columns: genomes)

Family   A     B     C     Mean   SD
1        5     10    15    10     5
2        4     6     5     5      1
3        20    5     10    11.7   7.6
100      1     1     1                   (core gene family)
201      0     10    0                   (species-specific gene family)

Z-score profile

Family   A      B      C
1        -1     0      1
2        -1     1      0
3        1.1    -0.9   -0.2

Z = (gene number – mean gene number) / standard deviation
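The transformation on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name is made up, and treating a family with zero standard deviation as all zeros is an assumption (the slide does not say how such rows are handled). The slide's values (SD = 5 for 5, 10, 15) correspond to the sample standard deviation, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

def zscore_profile(profile):
    """Row-wise z-scores: (gene number - family mean) / family sample SD."""
    z_rows = []
    for row in profile:
        mu, sd = mean(row), stdev(row)
        if sd == 0:
            # Assumption: a family of identical size in every genome gets 0.
            z_rows.append([0.0] * len(row))
        else:
            z_rows.append([(x - mu) / sd for x in row])
    return z_rows

# Families 1 to 3 from the slide, genomes A, B, C
profile = [[5, 10, 15], [4, 6, 5], [20, 5, 10]]
z = [[round(v, 1) for v in row] for row in zscore_profile(profile)]
print(z)
```

Rounded to one decimal, the result matches the z-score profile on the slide.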

Fungi genomes characteristics
Genome                        Genome size (Mb)   Genes    Genes < 300 aa    GC content (%)
Magnaporthe grisea            41.7               12,832   5,312 (41.4%)     51.6
Neurospora crassa             39.23              9,822    3,445 (35.1%)     49.3
Sporobolomyces roseus         21.1               5,536    1,714 (31.0%)     49.5
Puccinia graminis             88.64              20,566   11,319 (55.0%)    43.0
Melampsora larici-populina    101.1              16,694   8,212 (49.2%)     42.1
Ustilago maydis               19.7               6,522    1,668 (25.6%)     54.0
Malassezia globosa            8.9                4,286    1,468 (34.3%)     52.0
Postia placenta               90.9               12,415   4,629 (37.3%)     52.4
Phanerochaete chrysosporium   35.1               10,048   3,579 (35.6%)     53.2
Laccaria bicolor              64.9               19,036   10,013 (52.6%)    46.6
Coprinus cinereus             37.5               13,544   5,487 (40.5%)     51.6
Cryptococcus neoformans       19.5               7,170    2,372 (33.1%)     48.2

Orphans / Species specific gene families
[Figure: bar chart of the percentage of genes (y-axis, 0 to 80%) that are orphans or belong to species-specific families, for each genome: Neurospora crassa, Magnaporthe grisea, Cryptococcus neoformans, Coprinus cinereus, Laccaria bicolor, Phanerochaete chrysosporium, Postia placenta, Malassezia globosa, Ustilago maydis, Sporobolomyces roseus, Puccinia graminis, Melampsora larici-populina]

Difference in average gene family size
[Figure: mean z-score (y-axis, -0.5 to 0.4) for each genome: Neurospora crassa, Magnaporthe grisea, Cryptococcus neoformans, Coprinus cinereus, Laccaria bicolor, Phanerochaete chrysosporium, Postia placenta, Malassezia globosa, Ustilago maydis, Sporobolomyces roseus, Puccinia graminis f. sp. tritici, Melampsora larici-populina]
* 8,035 families in total, excluding the species-specific families

Hierarchical clustering of gene family
[Figure: hierarchically clustered heat map of gene family profiles. Genomes: N. crassa, M. grisea, S. roseus, P. graminis, M. larici-populina, U. maydis, M. globosa, P. placenta, P. chrysosporium, C. cinereus, L. bicolor, C. neoformans]
• The 100 most variable profiles, ranked by standard deviation, were selected.
• Red: protein kinase, esterase/lipase, Cre recombinase, DNA/RNA helicase, leucine-rich repeat
• Blue: major facilitator superfamily
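Selecting the most variable profiles by standard deviation could look like the sketch below. The function name and family IDs are hypothetical, and the slide does not state whether the standard deviation was taken over raw gene counts or z-scores, so the input here is simply whatever profile rows are supplied.

```python
from statistics import stdev

def most_variable(profiles, n=100):
    """Return the IDs of the n families whose profiles have the largest SD.

    profiles maps a family ID to its list of per-genome values.
    """
    ranked = sorted(profiles, key=lambda fam: stdev(profiles[fam]), reverse=True)
    return ranked[:n]

# Hypothetical family IDs with small example profiles
profiles = {"fam1": [5, 10, 15], "fam2": [4, 6, 5], "fam3": [20, 5, 10]}
top = most_variable(profiles, n=2)
print(top)
```

The selected rows would then be passed to a hierarchical clustering routine to produce a heat map like the one described above.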

Overview
• Gene prediction and annotation platform
• Gene family analysis
• Phylogenetic position of Melampsora

Phylogenies of Melampsora
• Construct the Melampsora phylogenetic tree based on FUNYBASE with selected fungal genomes.
• FUNYBASE: single-copy gene families (246 genes) within 21 fungal species (mostly ascomycetes).
• 22 selected species:
  – Ascomycete: Aspergillus nidulans, Coccidioides immitis, Fusarium graminearum, Mycosphaerella graminicola, Magnaporthe grisea, Neurospora crassa, Nectria haematococca, Pyrenophora tritici-repentis, Stagonospora nodorum, Schizosaccharomyces pombe, Sclerotinia sclerotiorum
  – Basidiomycete: Coprinus cinereus, Cryptococcus neoformans, Laccaria bicolor, Malassezia globosa, Melampsora larici-populina, Phanerochaete chrysosporium, Puccinia graminis, Postia placenta, Sporobolomyces roseus, Ustilago maydis
  – Zygomycete: Rhizopus oryzae
* New genome; rejected in FUNYBASE

Phylogenies of Melampsora - Method
• 246 HMM models for the conserved protein sequence blocks in FUNYBASE.
• For each genome, run a HMMER search against the whole proteome and retain the protein sequence of the best hit for each model.
• 148 models have a single-copy gene in our 22 selected species.
• Concatenate the 148 single-copy orthologs for tree building.
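The filtering and concatenation steps above can be sketched as follows. This is an illustration under stated assumptions: the function name and data layout are hypothetical, the hits are assumed to be already aligned per model (equal length within a model), and "single-copy" is taken to mean exactly one retained hit per species, which the slides do not spell out.

```python
def concatenate_single_copy(hits, species):
    """Keep models with exactly one hit in every species and concatenate them.

    hits maps model ID -> species -> list of aligned hit sequences.
    Returns the kept model IDs and one concatenated sequence per species.
    """
    kept = [
        model for model in sorted(hits)
        if all(len(hits[model].get(sp, [])) == 1 for sp in species)
    ]
    supermatrix = {
        sp: "".join(hits[model][sp][0] for model in kept) for sp in species
    }
    return kept, supermatrix

# Hypothetical toy input: model m2 is dropped because species B has two hits
hits = {
    "m1": {"A": ["MK-L"], "B": ["MKAL"]},
    "m2": {"A": ["GG"], "B": ["GG", "GA"]},
    "m3": {"A": ["TT"], "B": ["TS"]},
}
kept, supermatrix = concatenate_single_copy(hits, ["A", "B"])
print(kept, supermatrix)
```

Concatenating in a fixed model order keeps the columns of every species aligned with each other in the final matrix.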

Melampsora in the phylogenetic tree of fungi
[Figure: phylogenetic tree] Built with Phylo_win using the neighbor-joining method with Poisson correction and 500 bootstrap replicates.
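For reference, the Poisson correction converts the observed proportion p of differing amino acid sites between two aligned sequences into a distance d = −ln(1 − p). The sketch below illustrates that formula only; it is not Phylo_win's code, and ignoring columns with a gap in either sequence is an assumption.

```python
import math

def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned proteins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("sequences differ at every site; distance undefined")
    return -math.log(1.0 - p)

d = poisson_distance("MKALTT", "MKALTS")  # one difference in six sites
print(round(d, 4))
```

A matrix of such pairwise distances over the concatenated alignment is the input to neighbor joining; bootstrap replicates resample alignment columns and repeat the procedure.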

Acknowledgements
• Gent
  – Stephane Rombauts
  – Michiel Van Bel
  – Klaas Vandepoele
  – Kenny Billiau
  – Thomas Abeel
  – Pierre Rouzé
  – Lieven Sterck
  – Yves Van de Peer
• Nancy
  – Stéphane Hacquard
  – Emilie Tisserant
  – Marie-Pierre Oudot-Le Secq
  – Sébastien Duplessis
  – Francis Martin