Peptide-assisted annotation of the Mlp genome
DESCRIPTION
Peptide-assisted annotation of the Mlp genome. Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin. Objective: use peptide libraries to validate the in silico prediction of gene models. Assumption: « if a peptide or protein is detected, then there must be a gene that encodes it ».
![Page 1: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/1.jpg)
Peptide-assisted annotation of the Mlp genome
Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin
![Page 2: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/2.jpg)
Objective
• Use peptide libraries to validate the in silico prediction of gene models
Mapping peptides onto a translated genome sequence provides the « correct frames of translation »
Assumption: « if a peptide or protein is detected, then there must be a gene that encodes it »
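The mapping idea above can be illustrated with a minimal sketch, assuming a plain DNA string and exact substring matching. The function names (`translate`, `six_frame_translation`, `frames_containing`) are hypothetical and are not part of the pipeline described in these slides, which relied on Mascot and Sequest searches.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def frames_containing(peptide, dna):
    """List the frames whose translation contains the peptide exactly."""
    return [name for name, protein in six_frame_translation(dna).items() if peptide in protein]


if __name__ == "__main__":
    print(frames_containing("MAK", "ATGGCCAAGTAA"))
```

A peptide found in exactly one frame points to the strand and reading frame in which a gene model should be looked for at that locus.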
![Page 3: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/3.jpg)
Methodology (hardware)
Urediniospores (3729)
Protein extraction
1D SDS-PAGE
Gel slicing (64)
Trypsin digestion
LC-MS/MS
Bioinformatics
Waters MassPREP station
LTQ ThermoElectron
Extraction, slicing, digestion, elution
Peptide MS/MS data acquisition
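Search engines mirror the trypsin digestion step in silico before matching spectra. The sketch below assumes the commonly used cleavage rule (cut after K or R, except when the next residue is P) and allows no missed cleavages; `tryptic_digest` and `min_length` are illustrative names, not settings taken from these slides.

```python
import re


def tryptic_digest(protein, min_length=1):
    """Split a protein after K or R unless the next residue is P.

    No missed cleavages are modelled; fragments shorter than
    min_length (and empty fragments) are discarded.
    """
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return [f for f in fragments if f and len(f) >= min_length]


if __name__ == "__main__":
    print(tryptic_digest("MAKRPLLGHKAAAR"))
```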
![Page 4: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/4.jpg)
Methodology (Bioinformatic)
Spectral identification by sequence database searching (Mascot, Sequest)
Protein databases built from…
• 6-frame translation of the genome
• Gene catalog (16694 GM)
Statistical validation of peptide identifications
1 - Comparison of results from both databases
2 - Comparison of peptides and GM (validation/correction of genome annotations)
![Page 5: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/5.jpg)
Mlp proteomic results so far
• 691 000 MS/MS spectra obtained from the total proteins
• False discovery rate below 1.6%
Unique peptides:
– Gene catalog (Mascot + Sequest): 10980
– 6-frame translation (Mascot only): 4699, plus 352 with no match on the Gene catalog
• The 352 unique peptides obtained from the 6-frame translation db do not match any GM of the Gene catalog
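The slides do not say how the false discovery rate was estimated. One widely used approach is a target-decoy search, sketched here purely as an assumption; the function names and the counts in the usage example are hypothetical, not results from this study.

```python
def estimate_fdr(target_hits, decoy_hits):
    """Simple target-decoy estimate: decoy matches divided by target matches."""
    if target_hits == 0:
        return 0.0
    return decoy_hits / target_hits


def passes_threshold(target_hits, decoy_hits, max_fdr=0.016):
    """True when the estimated FDR is at or below max_fdr (1.6% by default)."""
    return estimate_fdr(target_hits, decoy_hits) <= max_fdr


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(estimate_fdr(1000, 10), passes_threshold(1000, 10))
```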
![Page 6: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/6.jpg)
Peptide frequency distribution on GM
[Histogram: number of gene models (y-axis, 0 to 300) versus number of peptides per gene model (x-axis, 0 to 79)]
Mean: 9 peptides covering 134 AA per GM
The 10980 + 4699 peptides represent assignments for nearly 10% of the Gene catalog, i.e. 1659 GM
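The "AA covered per GM" figure can be understood as the number of distinct residues of a predicted protein that lie under at least one peptide. A minimal sketch, assuming exact substring matches and counting overlapping peptides only once (`covered_residues` is a hypothetical helper, not the authors' script):

```python
def covered_residues(protein, peptides):
    """Count residues of the protein covered by at least one peptide."""
    covered = set()
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            covered.update(range(start, start + len(peptide)))
            start = protein.find(peptide, start + 1)
    return len(covered)


if __name__ == "__main__":
    protein = "MAKRPLLGHKAAAR"
    print(covered_residues(protein, ["RPLLGHK", "LLGHKAAAR"]))
```

Using a set of positions avoids double counting when two peptides overlap on the same gene model.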
![Page 7: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/7.jpg)
Automated classification of peptides with no hit (352) on the Gene catalog
• 5’ extension of a predicted GM
– If peptide(s) located within the 1000 bp upstream of the predicted GM start codon
• 3’ extension of a predicted GM
– If peptide(s) located within the 1000 bp downstream of the predicted GM stop codon
• 5’ and 3’ extension of a predicted GM
– If peptides located within the 1000 bp upstream of the start codon and within the 1000 bp downstream of the predicted GM stop codon
• Internal extension of a predicted GM
– If peptide(s) located in the GM
• New GM
– If no predicted GM in the vicinity of the peptide(s)
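The classification rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: peptides and gene models are reduced to (start, end) genomic spans on one scaffold, `GeneModel` and `classify` are hypothetical names, and peptides straddling a GM boundary are not handled.

```python
from dataclasses import dataclass

WINDOW = 1000  # bp searched on each side of a GM, as stated on the slide


@dataclass
class GeneModel:
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"


def classify(peptide_spans, gene_models, window=WINDOW):
    """Classify a group of unmatched peptides against nearby gene models."""
    for gm in gene_models:
        left = any(gm.start - window <= s and e < gm.start for s, e in peptide_spans)
        right = any(gm.end < s and e <= gm.end + window for s, e in peptide_spans)
        inside = any(gm.start <= s and e <= gm.end for s, e in peptide_spans)
        # On the minus strand, "upstream" is to the right of the GM.
        upstream, downstream = (left, right) if gm.strand == "+" else (right, left)
        if upstream and downstream:
            return "5' and 3' extension"
        if upstream:
            return "5' extension"
        if downstream:
            return "3' extension"
        if inside:
            return "internal extension"
    return "new GM"


if __name__ == "__main__":
    gm = GeneModel(5000, 8000, "+")
    print(classify([(4500, 4560)], [gm]))
```

Each automated call would still go through the manual curation step before the Gene catalog is modified.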
![Page 8: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/8.jpg)
Corrections-Additions to the Gene catalog
• Mapping of the peptides with no hit on the genome allowed the following modifications:

| Modification | Number of GM |
|---|---|
| 5’ extension | 44 |
| Internal exon extension | 31 |
| 3’ extension | 22 |
| 5’ and 3’ extension | 5 |
| New GM | 73 |
| Total | 172 |
![Page 9: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/9.jpg)
Manual curation- Internal extension
![Page 10: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/10.jpg)
Manual curation- Internal extension
• EuGene’s prediction is OK
![Page 11: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/11.jpg)
Manual curation- New GM
![Page 12: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/12.jpg)
Manual curation- New GM
![Page 13: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/13.jpg)
Summary – Peptide-assisted genome annotation
With few resources (6000 $ worth of materials and services, and a few weeks' worth of labour), our proteomic analysis:
– Validated 10 % of the predicted GM
– Corrected/found > 170 GM
According to the manual curation accomplished so far, it appears that EuGene had predicted most of the > 170 corrected/found GM
![Page 14: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/14.jpg)
Perspectives
• A quantitative proteomic approach (iTRAQ) will be used to compare urediniospores, germinated urediniospores and haustoria protein complexes
• Analysing the Sequest output obtained from the 6-frame translation: 5051 peptides were identified with Mascot (352 with no hits on the Gene catalog); the Sequest results remain to be analysed
![Page 15: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/15.jpg)
Available material
• Our set of peptide spectra from urediniospore proteins is available to validate new GM predictions
• The peptide GFF files will be made available to the Melampsora community
![Page 16: Peptide-assisted annotation of the Mlp genome](https://reader034.vdocument.in/reader034/viewer/2022051517/56815904550346895dc6382b/html5/thumbnails/16.jpg)
Finding the peptides on the different model prediction sets
| Model prediction set | Total GM | GM validated | % |
|---|---|---|---|
| Gene Catalog | 16694 | 1659 | 9.9% |
| EuGene | 12386 | 1348 | 10.9% |
| Genewise1 | 14087 | 977 | 6.9% |
| Genewise1Plus | 14162 | 1046 | 7.4% |
| fgenesh1_pg | 15760 | 1140 | 7.2% |
| fgenesh2_pg | 17833 | 1377 | 7.7% |

Do we need to perform a new spectra search on each of the model prediction sets?
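The percentages for each prediction set follow directly from the two counts (GM validated divided by total GM). A short sketch that recomputes them from the figures on this slide; the dictionary layout and the `percent_validated` helper are only illustrative.

```python
# (total GM, GM validated) per model prediction set, as given on the slide.
PREDICTION_SETS = {
    "Gene Catalog": (16694, 1659),
    "EuGene": (12386, 1348),
    "Genewise1": (14087, 977),
    "Genewise1Plus": (14162, 1046),
    "fgenesh1_pg": (15760, 1140),
    "fgenesh2_pg": (17833, 1377),
}


def percent_validated(total_gm, validated_gm):
    """Share of gene models with at least one peptide, in percent."""
    return round(100 * validated_gm / total_gm, 1)


if __name__ == "__main__":
    for name, (total, validated) in PREDICTION_SETS.items():
        print(f"{name}: {percent_validated(total, validated)}%")
```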