Use of SEQUEST search results with ProteoRed.org MIAPE Extractor
Sp-HPP, HPP. La Cristalera, Miraflores de la Sierra, 10-11 December 2012.
INDEX
1. A working workflow to extract MIAPE information from Proteome Discoverer 1.3 search results using the ProteoRed MIAPE Toolkit. Óscar Gallardo, Joan Villanueva, Montserrat Carrascal, Joaquín Abián
2. Data-dependent acquisition using an inclusion list (IL). Joan Villanueva, Óscar Gallardo, Joaquín Abián, Montserrat Carrascal
Ó. Gallardo
MASCOT WORKFLOW
[Diagram: RAW and MGF files; mass spectra identification with Mascot; output file: mzIdentML; MIAPE generation with MIAPE Extractor and the MIAPE Generator tool; outputs: MIAPE MS and MIAPE MSI documents.]
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW files; mass spectra identification with Proteome Discoverer; output file: MSF; MGF files; mzIdentML.]
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW and MGF files.]
(GPL) LP-CSIC/UAB 2011-2012
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW and MGF files; Proteome Discoverer; Discoverer Daemon; MIAPE Extractor.]
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW files; mass spectra identification with Proteome Discoverer and Discoverer Daemon; output file: MSF; MGF files; mzIdentML; ProCon 0.9.152.]
A. Medina August 2012
PROTEOME DISCOVERER WORKFLOW
[Diagram: MSF files; mzIdentML; ProCon 0.9.162.]
PROTEOME DISCOVERER WORKFLOW
[Diagram: MSF files; .Prot.XML files; mzIdentML.]
...........................................................67% finished
.....................TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai
.TaxID for organismName unknown: Sphaerochaeta globosa
...TaxID for organismName unknown: Leptospira borgpetersenii serovar
.....MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
SequenceCollection written
CV term for unknown modification Deamidated / +0.984 Da (N, Q) not found.
CV term for unknown modification Acetyl / +42.011 Da (Any NTerminus) not found.
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at de.mpc.Prot2MzIdent.AParamHandler.createThresholdParameterList(AParamHandler.java:526)
    at de.mpc.Prot2MzIdent.PD12ToMzIdentML.getProteinDetectionProtocol(PD12ToMzIdentML.java:851)
1. ProCon 0.9.162 could not correctly interpret the controlled vocabulary (CV) terms that Proteome Discoverer uses to identify post-translational modifications (PTMs).
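One way to see which modifications were affected is to scan the converter's console output. The Python sketch below pulls the "CV term for unknown modification ... not found." messages out of a log like the one above and looks each name up in a small table of Unimod accessions. It is a hypothetical helper, not part of ProCon or the MIAPE Toolkit: the regular expression assumes the exact message format shown in the log, and the two-entry lookup table (Acetyl = UNIMOD:1, Deamidated = UNIMOD:7) is an illustrative assumption.

```python
import re

# Pattern for ProCon messages such as:
#   CV term for unknown modification Deamidated / +0.984 Da (N, Q) not found.
UNKNOWN_MOD = re.compile(
    r"CV term for unknown modification (?P<name>[\w-]+) / "
    r"(?P<delta>[+-]\d+\.\d+) Da \((?P<sites>[^)]*)\) not found\."
)

# Hypothetical, deliberately tiny lookup table: Unimod accessions for the
# two modifications that appear in the log.
UNIMOD = {"Acetyl": "UNIMOD:1", "Deamidated": "UNIMOD:7"}


def unknown_modifications(log_text):
    """Return (name, mass_delta_Da, sites, unimod_accession_or_None) for
    every 'unknown modification' message found in a ProCon log."""
    found = []
    for match in UNKNOWN_MOD.finditer(log_text):
        name = match.group("name")
        found.append((name, float(match.group("delta")),
                      match.group("sites"), UNIMOD.get(name)))
    return found


if __name__ == "__main__":
    # Messages copied from the log; finditer also works when they are fused.
    log = ("CV term for unknown modification Deamidated / +0.984 Da (N, Q) not found."
           "CV term for unknown modification Acetyl / +42.011 Da (Any NTerminus) not found.")
    for mod in unknown_modifications(log):
        print(mod)
```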
2. ProCon 0.9.162 also had problems with its internal array references.
ERROR!!
ProCon 0.9.16
1. ProCon 0.9.163 could not correctly identify post-translational modifications (PTMs): it marked all of them as “unknown modification” in the resulting mzIdentML file.
2. ProCon 0.9.163 still had problems with its internal array references.
ERROR!!
ProCon 0.9.16
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW files; mass spectra identification with Proteome Discoverer and Discoverer Daemon; output file: MSF; MGF files; .Prot.XML files; mzIdentML; MIAPE generation with MIAPE Extractor and the MIAPE Generator tool.]
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW files; mass spectra identification with Proteome Discoverer and Discoverer Daemon; output file: MSF; MGF files; .Prot.XML files; mzIdentML; MIAPE generation with MIAPE Extractor.]
Spectrum IDs did not match between the MGF file and the mzIdentML file.
[Figure: spectrum IDs in the MGF file (IDmgf) versus the mzIdentML file (IDmzid), with the fields Pep, MS, Charge and RT.]
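A possible workaround, sketched below in Python, is to match spectra on their precursor values instead of their IDs, using the charge and RT fields that appear in the figure. This is a hypothetical sketch, not the method used by MIAPE Extractor. It assumes that the MGF blocks carry PEPMASS, CHARGE and RTINSECONDS lines (many exporters write them, but not all), that the identifications have already been read from the mzIdentML file into plain dictionaries (mzIdentML parsing is not shown), and that tolerances of 0.01 m/z and 5 s are acceptable; all of these are illustrative choices.

```python
REQUIRED = ("title", "mz", "charge", "rt")


def read_mgf(text):
    """Parse MGF text into one dict per BEGIN IONS ... END IONS block,
    holding title, precursor m/z, charge and retention time (seconds)."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS" and current is not None:
            spectra.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            if key == "TITLE":
                current["title"] = value
            elif key == "PEPMASS":
                current["mz"] = float(value.split()[0])  # an intensity may follow
            elif key == "CHARGE":
                current["charge"] = int(value.strip().rstrip("+"))  # "2+" -> 2
            elif key == "RTINSECONDS":
                current["rt"] = float(value)
        # Fragment peak lines ("m/z intensity") contain no "=" and are ignored.
    return spectra


def match_by_precursor(spectra, identifications, mz_tol=0.01, rt_tol=5.0):
    """Map each identification ID (dict with mz, charge, rt) to the title of
    the single MGF spectrum with the same charge and with m/z and RT inside
    the tolerances; None when there is no match or the match is ambiguous."""
    matches = {}
    for ident_id, ident in identifications.items():
        hits = [s["title"] for s in spectra
                if all(k in s for k in REQUIRED)
                and s["charge"] == ident["charge"]
                and abs(s["mz"] - ident["mz"]) <= mz_tol
                and abs(s["rt"] - ident["rt"]) <= rt_tol]
        matches[ident_id] = hits[0] if len(hits) == 1 else None
    return matches
```

Returning None for ambiguous matches, rather than picking the closest spectrum, keeps doubtful assignments visible so they can be checked by hand.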
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW files; mass spectra identification with Proteome Discoverer and Discoverer Daemon; output file: MSF; MGF files; .Prot.XML files; mzIdentML; spectrum ID fields (ID, Pep, MS, Charge, RT); MIAPE generation with MIAPE Extractor and the MIAPE Generator tool; outputs: MIAPE MS and MIAPE MSI documents.]
PROTEOME DISCOVERER WORKFLOW
[Diagram: RAW files; mass spectra identification with Proteome Discoverer and Discoverer Daemon; output file: MSF; MGF files; .Prot.XML files; mzIdentML; MIAPE generation with MIAPE Extractor and the MIAPE Generator tool; outputs: MIAPE MS and MIAPE MSI documents.]
1. Uploading MSF + mzIdentML files through MIAPE Extractor is not yet automated.
2. Although we can generate MIAPE data from SEQUEST search results, the MIAPE Toolkit does not yet handle this data well at the analysis stage: the identified proteins cannot be retrieved, there are problems with the SEQUEST score fields, …
1. We are working on an automation script for MIAPE Extractor data extraction: MIAPE Extractor Automator v.22. Development of the MIAPE Extractor and the MIAPE Generator tool continues, with improvements in each version.
1. Exporting Prot.XML files from the MSF files, and the subsequent conversion of MSF + Prot.XML files to mzIdentML, is not automated.
2. ProCon still has some errors, is very slow with large files, and is memory-hungry.
The ProCon developers are working on a new version that does not need Prot.XML files, which would make the conversion process much faster and easier.
WORK IN PROGRESS
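The automation script itself is not described here. As a hypothetical first step of such a script, the Python sketch below pairs each MSF file with the mzIdentML file that has the same base name, so the pairs could then be uploaded in batch. The same-base-name convention and the accepted extensions (.mzid, .mzidentml) are assumptions, and the upload to MIAPE Extractor is deliberately not shown because its interface is not documented in these slides.

```python
from pathlib import Path

# Assumed file extensions for mzIdentML files (compared case-insensitively).
MZID_SUFFIXES = {".mzid", ".mzidentml"}


def pair_msf_with_mzid(folder):
    """Pair every MSF file in `folder` with the mzIdentML file that has the
    same base name. Returns (pairs, unpaired_base_names)."""
    msf, mzid = {}, {}
    for path in Path(folder).iterdir():
        suffix = path.suffix.lower()
        if suffix == ".msf":
            msf[path.stem] = path
        elif suffix in MZID_SUFFIXES:
            mzid[path.stem] = path
    common = sorted(msf.keys() & mzid.keys())
    # Base names present on only one side still need a conversion run.
    unpaired = sorted(msf.keys() ^ mzid.keys())
    return [(msf[name], mzid[name]) for name in common], unpaired
```

Reporting the unpaired names separately makes it easy to spot MSF files that still need to be converted before the upload step.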
RATIONALE FOR USING DDP WITH AN INCLUSION LIST (IL):
a.- Most target proteins assigned to the groups of the shotgun project were not detected using shotgun approaches.
b.- The few detected peptides were not optimal for MRM analysis (not proteotypic, containing Met/Cys, or with missed cleavages).
c.- Preliminary tests at LP-CSIC/UAB using targeted approaches required a limited list of peptides (the list of target m/z values had to be restricted to 20-30) and failed to detect the target proteins.
DDP with an inclusion list increases the probability of detecting low-abundance proteins/peptides without the constraints of targeted approaches.
16 PROTEINS SELECTED FOR THE INCLUSION LIST
- 6 proteins assigned to the LP-CSIC/UAB laboratory
- 10 proteins assigned to MRM labs and not detected by shotgun
Laboratory | UniProt | Name
Canals | P69905 | HBA_HUMAN
FB | Q6GPI1 | CTRB2_HUMAN
CG | P24855 | DNAS1_HUMAN
MPV | Q6A1A2 | PDPK2_HUMAN
FC | P16444 | DPEP1_HUMAN
CG | Q9BSW7 | SYT17_HUMAN
CG | P11597 | CETP_HUMAN
MPV | P15391 | CD19_HUMAN
CG | Q53FZ2 | ACSM3_HUMAN
FV | Q8N4N3 | KLH36_HUMAN
Abian | Q9BUU2 | METTL22_HUMAN
Abian | P33076 | CIITA_HUMAN
Abian | Q9Y661 | HS3ST4_HUMAN
Abian | Q14703 | MBTPS1_HUMAN
Abian | B7ZMK8 | PRSS36_HUMAN
Abian | A4GXA9 | EME2_HUMAN
Data dependent acquisition with inclusion list
J. Villanueva
To obtain the inclusion list:
1.- All tryptic peptides of 7-25 amino acids.
2.- m/z values assuming z=2 and z=3 for all peptides.
3.- Filter duplicate m/z values (software requirement).
Number of m/z values in the inclusion list: 556 (282 peptides).
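Step 1 of this recipe can be sketched in Python as a simple in-silico digestion. The sketch assumes the common trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages; the slides do not say which digestion rules or software were actually used, and the demo sequence is a made-up concatenation of peptides from the inclusion list.

```python
import re

# Zero-width split point: right after K or R, unless the next residue is P.
# (Splitting on an empty match requires Python 3.7 or later.)
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence, min_len=7, max_len=25):
    """Fully cleaved tryptic peptides (no missed cleavages) of `sequence`
    whose length lies between min_len and max_len residues."""
    pieces = TRYPSIN_SITE.split(sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Hypothetical sequence stitched together from inclusion-list peptides.
    demo = "VFHSLAKGCTLLLTARPRVDPVNFKEGWGNLKAAK"
    print(tryptic_peptides(demo))
```

Note that GCTLLLTARPR stays in one piece because its internal R is followed by P, while the short fragment AAK is dropped by the 7-25 length filter.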
Signal ID | m/z
P33076_GCTLLLTARPR | 400.9013
P11597_VFHSLAK | 401.2348
P16444_YPDLIAELLR | 401.5646
Q53FZ2_EGWGNLK | 402.2062
P24855_YDIALVQEVR | 402.5561
Q8N4N3_VASMNQR | 403.2032
Q8N4N3_VKPAVCSLLPK | 404.5779
Q14703_APCPGCSHLTLK | 409.5392
Q9Y661_AISDYTQTLSK | 409.5473
Q9BSW7_TAVEQWHSLR | 409.5478
P69905_VDPVNFK | 409.7243
P16444_TLEQMDVVHR | 409.8769
A4GXA9_MGLLAVGPDLSR | 410.2292
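Steps 2 and 3 can be sketched in Python as follows: compute the m/z of the [M+zH]z+ ion for z = 2 and z = 3, and skip any m/z value that is already in the list. With unmodified monoisotopic residue masses (no fixed Cys modification), the computed values agree with the listed ones to within about 0.001 for the entries checked, for example about 401.2345 for VFHSLAK at z = 2 versus 401.2348 in the list, and about 400.901 for GCTLLLTARPR at z = 3. The 4-decimal rounding used to decide what counts as a "duplicate", and the function names, are assumptions for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O added to the residue sum to get the neutral peptide mass
PROTON = 1.007276   # proton mass (Da)


def precursor_mz(peptide, charge):
    """m/z of the [M + zH]z+ ion of an unmodified peptide."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


def inclusion_list(entries, charges=(2, 3), decimals=4):
    """entries: (protein_accession, peptide) pairs. Returns (signal_id, m/z)
    rows sorted by m/z; an m/z value that, after rounding, is already in
    the list is skipped as a duplicate."""
    seen = set()
    rows = []
    for accession, peptide in entries:
        for z in charges:
            mz = round(precursor_mz(peptide, z), decimals)
            if mz in seen:
                continue  # duplicate m/z value: not allowed by the software
            seen.add(mz)
            rows.append((f"{accession}_{peptide}", mz))
    rows.sort(key=lambda row: row[1])
    return rows


if __name__ == "__main__":
    peptides = [("P11597", "VFHSLAK"), ("Q53FZ2", "EGWGNLK"), ("P69905", "VDPVNFK")]
    for signal_id, mz in inclusion_list(peptides):
        print(f"{signal_id}\t{mz:.4f}")
```

Leucine and isoleucine have the same mass, so peptides differing only in L/I collapse into one entry per charge state, which is one way duplicate m/z values arise.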
Procedure: Data Dependent with IL
Samples: CCD18 and MCF7, aliquots of 250 µg protein
OffGel (12 fractions)
FASP digestion
LC-MS/MS (DDP, IL, Targeted)
Proteome Discoverer
DATA DEPENDENT WITH INCLUSION LIST: LTQ-ORBITRAP
[Figure: MS traces of sample VH: MCF-7, OffGel fractions 6 and 7. TIC, F: FTMS + p NSI Full ms [400.00-1800.00]; files HPP_VallHebron_DDPorbi_Test1_120724_Fr06_06 (NL: 2.17E9) and HPP_VallHebron_DDPorbi_Test1_120724_Fr07_07 (NL: 9.66E8). Axes: relative abundance (0-100) vs. time (RT 0.00-140.02 min).]
RESULTS: Inclusion list and targeted
Data dependent with IL: the 282 listed peptides were not detected (the same result as in the targeted experiments).
Possible reasons: low amount of the target proteins, or proteins not expressed in these cells.
DATA PROCESSING FOR IL DATA:
1.- MGF generation with PD v1.3
2.- Database search: Proteome Discoverer and Mascot
3.- FDR 5%
DATA PROCESSING:
1.- MGF generation with PD v1.3
2.- Database search: Proteome Discoverer (and Mascot)
3.- Search results and filtering (1% FDR): MIAPE Extractor (Data Inspector Module) and Proteome Discoverer.
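The slides do not say how MIAPE Extractor or Proteome Discoverer estimate the FDR. A common approach is target-decoy filtering, and the Python sketch below shows a minimal version under that assumption: PSMs are ranked by score, the FDR at each depth of the ranked list is estimated as decoys/targets, and the target PSMs above the deepest depth that satisfies the threshold (here 1%) are kept. The (score, is_decoy) input format and the function name are hypothetical.

```python
def fdr_filter(psms, max_fdr=0.01):
    """psms: list of (score, is_decoy) pairs, higher score = better.
    Walks down the score-ranked list, estimates FDR as decoys/targets at
    each depth, keeps the deepest depth whose estimate is <= max_fdr and
    returns the target PSMs above it."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    decoys = targets = 0
    accepted = 0  # number of top-ranked PSMs accepted so far
    for depth, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            accepted = depth
    # Decoys above the cut-off are only used for the estimate, not reported.
    return [psm for psm in ranked[:accepted] if not psm[1]]
```

Keeping the deepest passing depth, rather than stopping at the first failing one, means a decoy hit high in the list does not cut off all the targets below it once enough targets have accumulated.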
Work in progress: MIAPE EXTRACTOR
The data could be uploaded and the FDR filtering could be performed.
Data Inspector Module: detected error still to be solved: protein information cannot be extracted from SEQUEST data.
Chromosome 16 protein description: Data Dependent Analysis
Sample | Acquisition method | Search method | MIAPE Extractor: num. peptides | MIAPE Extractor: num. proteins | Proteome Discoverer: num. peptides | Proteome Discoverer: num. proteins
MCF7 | DDP | MASCOT | 3079 | 2316 | -- | --
MCF7 | DDP | SEQUEST | 3561 | 1422 | 3616 | 1282
CCD18 | DDP | MASCOT | 3102 | 2370 | 3765 | 1180
CCD18 | DDP | SEQUEST | 2250 | 980 | 2475 | 946
Work in progress...
Number of proteins that passed the 1% FDR filter:
1.- Large differences between search algorithms (e.g. MCF7 in MIAPE Extractor: 2316 proteins with Mascot vs. 1422 with SEQUEST).
An in-depth review of the data is needed.
Thank you for your attention.
Any questions?