![Page 1: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/1.jpg)
PMP
EMBnet course
Proteomics using Bioinformatics tools
MS identification tools
Patricia M. Palagi
PIG, SIB, Geneva
![Page 2: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/2.jpg)
The data: list of m/z values

MS: peptide mass values and intensities
840.6950 13.75
1676.9606 26.1
1498.8283 128.9
1045.564 845.2
2171.9670 2.56
861.1073 371.2
842.51458 53.7
1456.7274 12.9
863.268365 3.1

MS/MS: parent mass value and parent mass charge, then fragment mass values and fragment intensities
1163.7008 2
86.1105 220.1429
86.1738 13.7619
102.0752 4.3810
147.1329 57.3333
185.1851 649.0953
185.3589 5.3810
186.1876 81.4286
213.0791 1.4286
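A minimal sketch of how such peak lists could be read into a program. The plain-text layout (one "m/z intensity" pair per line, with the parent mass and charge on the first line of the MS/MS list) is an assumption based on the slide, and the function names are hypothetical.

```python
def parse_ms_peaklist(text):
    """Parse 'm/z intensity' lines into (mz, intensity) tuples."""
    peaks = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            peaks.append((float(fields[0]), float(fields[1])))
    return peaks


def parse_msms_peaklist(text):
    """First line: parent mass value and charge; other lines: fragments."""
    lines = text.strip().splitlines()
    parent_mz, charge = lines[0].split()[:2]
    fragments = parse_ms_peaklist("\n".join(lines[1:]))
    return float(parent_mz), int(charge), fragments


# Values copied from the slide (assumed layout, see above).
ms_text = """
840.6950 13.75
1676.9606 26.1
1498.8283 128.9
"""
msms_text = """
1163.7008 2
86.1105 220.1429
147.1329 57.3333
185.1851 649.0953
"""

ms_peaks = parse_ms_peaklist(ms_text)
parent_mz, parent_charge, fragments = parse_msms_peaklist(msms_text)
print(ms_peaks[0], parent_mz, parent_charge, len(fragments))
```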
![Page 3: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/3.jpg)
The tools
![Page 4: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/4.jpg)
One direct access to all: ExPASy
http://www.expasy.org/tools/
![Page 5: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/5.jpg)
Automatic protein identification
- Peptide mass fingerprinting – PMF
- MS/MS sequence search
- MS/MS spectra library search
- MS/MS prospective analysis (tag, open mod, de novo sequencing)
![Page 6: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/6.jpg)
Peptide mass fingerprinting = PMF
MS database matching

Experimental side: Protein(s) → enzymatic digestion → Peptides → Mass spectra → Peaklist
840.695086
1676.96063
1498.8283
1045.564
2171.967066
861.107346
842.51458
1456.727405
863.268365

Theoretical side: Sequence database entry → in-silico digestion → Theoretical proteolytic peptides → Theoretical peaklist
Sequence database entry:
…MAIILAGGHSVRFGPKAFAEVNGETFYSRVITLESTNMFNEIIISTNAQLATQFKYPNVVIDDENHNDKGPLAGIYTIMKQHPEEELFFVVSVDTPMITGKAVSTLYQFLV …
Theoretical proteolytic peptides:
- MAIILAGGHSVR
- FGPK
- AFAEVNGETFYSR
- VITLESTNMFNEIIISTNAQLATQFK
- YPNVVIDDENHNDK
- …
Theoretical peaklist:
861.107346
838.695086
1676.96063
1498.8283
1045.564
2171.967066
842.51458
1457.827405
863.268453

Match (experimental peaklist against theoretical peaklist)
Result: ranked list of protein candidates
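The in-silico digestion step can be sketched as follows. This is a simplified illustration, not the code of any tool named in this course: it assumes trypsin cleaves after K or R except before P, allows no missed cleavages, and uses standard monoisotopic residue masses. The function names are hypothetical.

```python
import re

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the sum of residue masses
PROTON = 1.007276   # charge carrier for the [M+H]+ ion


def tryptic_digest(sequence):
    """Cleave after K or R, unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


# Sequence fragment shown on the slide.
sequence = ("MAIILAGGHSVRFGPKAFAEVNGETFYSRVITLESTNMFNEIIISTNAQLATQFK"
            "YPNVVIDDENHNDKGPLAGIYTIMKQHPEEELFFVVSVDTPMITGKAVSTLYQFLV")
peptides = tryptic_digest(sequence)
theoretical_peaklist = [(p, round(mh_plus(p), 4)) for p in peptides]
for peptide, mass in theoretical_peaklist:
    print(peptide, mass)
```

The first five peptides produced this way are the ones listed on the slide. Real tools also generate peptides with missed cleavages and modifications, which this sketch leaves out.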
![Page 7: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/7.jpg)
Peptide mass fingerprinting
What you have:
- Set of peptide mass values
- Information about the protein: molecular weight, pI, species
- Information about the experimental conditions: mass spectrometer precision, calibration used, possibility of missed cleavages, possible modifications
- Biological characteristics: post-translational modifications, fragments
What the tool will do:
- Match this information against a protein sequence database
What you will get:
- A list of probable protein identifications
![Page 8: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/8.jpg)
What is the expected information in a submission form?
• Place to upload a spectrum (many spectra)
• Description of the sample process used
– Chemical process such as alkylation/reduction
– Cleavage properties (enzyme)
– Mass tolerance (m/z tolerance)
• Search space
– Sequence databank
– Taxonomy restriction
– Mw, pI restriction
• Scoring criteria and filters
![Page 9: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/9.jpg)
One example of parameter effects on the search
• Accepted mass tolerance, needed because of imprecise measurements and calibration problems
Source: Introduction to Proteomics: Tools for the New Biology. Daniel C. Liebler. Humana Press. 2002
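To make the effect of the mass tolerance concrete, here is a small sketch of a tolerance check in either ppm or Da. The function names and the example values are illustrative assumptions, not parameters of any particular search engine.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol, unit="ppm"):
    """True if the observed mass matches the theoretical one within tol."""
    if unit == "ppm":
        return abs(ppm_error(observed, theoretical)) <= tol
    if unit == "Da":
        return abs(observed - theoretical) <= tol
    raise ValueError("unit must be 'ppm' or 'Da'")


# A 0.05 Da deviation at m/z 1000 is a 50 ppm error:
print(round(ppm_error(1000.05, 1000.0), 1))
print(within_tolerance(1000.05, 1000.0, 100, "ppm"))   # accepted at 100 ppm
print(within_tolerance(1000.05, 1000.0, 0.01, "Da"))   # rejected at 0.01 Da
```

A wider tolerance accepts more true matches from a poorly calibrated spectrum, but also more random ones.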
![Page 10: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/10.jpg)
Summary of PMF tools
Tool Source website
Aldente www.expasy.org/cgi-bin/aldente
Mascot www.matrixscience.com/
MS-Fit prospector.ucsf.edu/
ProFound prowl.rockefeller.edu/profound_bin/WebProFound.exe
PepMAPPER wolf.bms.umist.ac.uk/mapper/
PeptideSearch www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html
PepFrag prowl.rockefeller.edu/prowl/pepfragch.html
Non-exhaustive list!
![Page 11: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/11.jpg)
Scoring systems
• Essential for the identification! Gives a confidence value to each matched protein
• Three types of scores:
– Shared peaks count (SPC): simply counts the number of matched mass values (peaks)
– Probabilistic scores: the confidence value depends on probabilistic models or statistical knowledge used during the match (obtained from the databases)
– Statistical learning: knowledge extraction from the influence of different properties used to match the proteins (obtained from the databases)
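The simplest of these scores, the shared peaks count, can be sketched in a few lines. The peak lists are the experimental and theoretical values from the PMF slide; the 0.2 Da tolerance and the function name are assumptions chosen for illustration.

```python
import bisect


def shared_peaks_count(experimental, theoretical, tol_da=0.2):
    """Count experimental peaks with a theoretical mass within tol_da."""
    theoretical = sorted(theoretical)
    count = 0
    for mz in experimental:
        # First theoretical mass that is not below the tolerance window.
        i = bisect.bisect_left(theoretical, mz - tol_da)
        if i < len(theoretical) and theoretical[i] <= mz + tol_da:
            count += 1
    return count


experimental = [840.695086, 1676.96063, 1498.8283, 1045.564, 2171.967066,
                861.107346, 842.51458, 1456.727405, 863.268365]
theoretical = [861.107346, 838.695086, 1676.96063, 1498.8283, 1045.564,
               2171.967066, 842.51458, 1457.827405, 863.268453]

spc = shared_peaks_count(experimental, theoretical)
print(spc, "of", len(experimental), "peaks matched")
```

Two experimental peaks (840.695086 and 1456.727405) have no theoretical mass within the window, so they do not contribute to the count.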
![Page 12: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/12.jpg)
Mascot
http://www.matrixscience.com/
• Free Internet version at the website above (commercial versions available too).
• Choice of several databases.
• Considers multiple chemical modifications.
• 0 to 9 missed cleavages.
• Score based on a combination of probabilistic and statistical approaches (based on the MOWSE score).
• Considers Swiss-Prot annotations for splice variants (in locally installed versions).
![Page 13: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/13.jpg)
Mascot - principles
• Probability-based scoring
• Computes the probability P that a match is random
• Significance threshold p < 0.05 (accepting that the probability of the observed event occurring by chance is less than 5%)
• The significance of that result depends on the size of the database being searched.
• Mascot shades the region of insignificant hits in green
• Score: -10 log10(P)
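The score formula and its dependence on database size can be illustrated as follows. The score function follows the formula on the slide; the threshold function is an assumption (a significance level of 0.05 divided by the number of candidates tested), not a statement of Mascot's internal code.

```python
import math


def probability_score(p):
    """Score = -10 * log10(P), where P is the probability of a random match."""
    return -10.0 * math.log10(p)


def significance_threshold(n_candidates, alpha=0.05):
    """Assumed threshold: score needed so that alpha holds over n candidates."""
    return -10.0 * math.log10(alpha / n_candidates)


print(round(probability_score(1e-5), 2))            # P = 1e-5 gives a score of 50
print(round(significance_threshold(1_000), 2))      # small search space
print(round(significance_threshold(1_000_000), 2))  # larger space, higher threshold
```

Under this assumption, the same raw score can be significant in a small database and insignificant in a large one.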
![Page 14: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/14.jpg)
Mascot
Input
![Page 15: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/15.jpg)
Hints about the significance of the score
Decoy Output
![Page 16: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/16.jpg)
Sequence coverage
Peptides matched
Error function
Output
![Page 17: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/17.jpg)
Aldente
• Swiss-Prot/TrEMBL database, indexed masses (trypsin and many others).
• Considers chemical modifications and user-specified modifications.
• Considers biological modifications (Swiss-Prot annotations).
• 0 or 1 missed-cleavages.
• Use of robust alignment method (Hough transform):
• Determines deviation function of spectrometer
• Resolves ambiguities
• Less sensitive to noise
![Page 18: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/18.jpg)
Aldente – summary
[Figure: experimental masses / peaks plotted against theoretical masses / peptides, showing the spectrometer calibration error and the spectrometer internal error.]
• The Hough Transform estimates from the experimental data the deviation function of the mass spectrometer (the calibration error function).
• The program optimizes the set of best matches, excluding noise and outliers, to find the best alignment.
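The idea of estimating a deviation function by voting can be sketched with a generic Hough-style accumulator. This is not Aldente's implementation: it assumes a linear error model (experimental minus theoretical = slope × theoretical + offset), a coarse grid of slopes in ppm, and made-up test masses. All names and parameters are hypothetical.

```python
from collections import Counter


def hough_calibration(experimental, theoretical, window=1.0,
                      slopes_ppm=range(-200, 201, 10), offset_bin=0.01):
    """Vote for the (slope, offset) line that explains most candidate pairs."""
    votes = Counter()
    for exp in experimental:
        for theo in theoretical:
            delta = exp - theo
            if abs(delta) > window:
                continue  # pair too far apart to be a plausible match
            for slope in slopes_ppm:
                # Offset this pair would imply if the slope were 'slope' ppm.
                offset = delta - slope * 1e-6 * theo
                votes[(slope, round(offset / offset_bin))] += 1
    if not votes:
        return None
    (slope, offset_index), count = votes.most_common(1)[0]
    return slope, offset_index * offset_bin, count


# Synthetic example: a 100 ppm slope and a 0.05 Da offset, plus two noise peaks.
theoretical = [800.0, 1000.0, 1200.0, 1500.0, 1800.0, 2200.0]
experimental = [m + 100e-6 * m + 0.05 for m in theoretical] + [950.3, 1333.7]

slope_ppm, offset_da, support = hough_calibration(experimental, theoretical)
print(slope_ppm, offset_da, support)
```

Noise peaks and outliers vote for scattered cells, so they do not shift the winning cell. This is why a voting scheme is less sensitive to noise than a least-squares fit over all pairs.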
![Page 19: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/19.jpg)
Aldente - Input
![Page 20: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/20.jpg)
Aldente - Input
![Page 21: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/21.jpg)
Aldente - Input
![Page 22: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/22.jpg)
Aldente - Input
![Page 23: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/23.jpg)
Aldente - Input
![Page 24: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/24.jpg)
Output
Hints about the significance of the score
![Page 25: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/25.jpg)
Information from Swiss-Prot annotation. Processed protein (signal peptide is cleaved).
![Page 26: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/26.jpg)
BioGraph
![Page 27: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/27.jpg)
What is the expected information in an identification result?
• A summary of the search parameters
• A list of potentially identified proteins (AC numbers) with scores and other evidence
• A detailed list of potentially identified peptides (associated or not with the potentially identified proteins) with scores
• Possibilities to validate/invalidate the provided results (info on the data processing, on the statistics, links to external resources, etc.)
• Possibilities to export the (validated) data in various formats
![Page 28: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/28.jpg)
Protein characterization with PMF data
1 protein entry does not represent 1 unique molecule:
- Exact primary structure
- Splicing variants
- Sequence conflicts
- PTMs
Characterization tools at ExPASy using peptide mass fingerprinting data: http://www.expasy.org/tools/
Prediction tools:
• PTMs and AA substitutions: FindMod
• Oligosaccharide structures: GlycoMod
• Unspecific cleavages: FindPept
![Page 29: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/29.jpg)
SWISS-PROT feature table: active protein is more than just translation of gene sequence (example: P20366)
![Page 30: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/30.jpg)
Detection of PTMs in MS
[Figure: two peptide mass spectra on an m/z axis from 600 to 2200.]
Unmodified tryptic masses: 624.3, 769.8, 893.4, 994.5, 1056.1, 1326.7, 1501.9, 1759.8, 1923.4, 2100.6
Tryptic masses of a modified protein: 624.3, 769.8, 893.4, 994.5, 1070.1, 1326.7, 1501.9, 1759.8, 1923.4, 2100.6
Δ m/z => PTM
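The "Δ m/z => PTM" step can be sketched as a lookup of mass differences against known modification masses. The peak values are the labels read from the figure; the short table of monoisotopic mass shifts, the 0.1 Da tolerance and the function name are assumptions for illustration.

```python
# Monoisotopic mass shifts (Da) of a few common modifications.
MOD_DELTAS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def explain_shifts(unmodified, observed, tol_da=0.1):
    """Explain observed peaks absent from the unmodified list by a PTM shift."""
    unexplained = [m for m in observed
                   if all(abs(m - u) > tol_da for u in unmodified)]
    hits = []
    for obs in unexplained:
        for ref in unmodified:
            for name, delta in MOD_DELTAS.items():
                if abs((obs - ref) - delta) <= tol_da:
                    hits.append((ref, obs, name))
    return hits


unmodified = [624.3, 769.8, 893.4, 994.5, 1056.1,
              1326.7, 1501.9, 1759.8, 1923.4, 2100.6]
modified = [624.3, 769.8, 893.4, 994.5, 1070.1,
            1326.7, 1501.9, 1759.8, 1923.4, 2100.6]

hits = explain_shifts(unmodified, modified)
print(hits)
```

With these values the only shifted peak is 1070.1, which sits 14.0 above 1056.1 and is therefore compatible with a methylation.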
![Page 31: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/31.jpg)
FindMod
http://www.expasy.org/tools/findmod/
[Screenshot: FindMod input form, with fields for the DB entry, the experimental masses, the experimental options and the AA modifications.]
![Page 32: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/32.jpg)
FindMod Output
[Screenshot: FindMod output, in two parts.]
- Unmodified peptides, modified peptides known in SWISS-PROT and chemically modified peptides
- Putatively modified peptides predicted by mass differences, plus putative AA substitutions
![Page 33: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/33.jpg)
Modification rules can be defined from SWISS-PROT, PROSITE and the literature
Some examples:

| modification | amino acid rule | exceptions |
|---|---|---|
| farnesylation | Cys | - |
| palmitoylation | Cys | Ser, Thr |
| O-GlcNAc | Ser, Thr | Asn |
| amidation | Xaa (C-term) where Gly followed Xaa | |
| pyrrolidone carboxylic acid | Gln (N-term) | - |
| phosphorylation | in eukaryotes: Ser, Thr, Asp, His, Tyr | - |
| | in prokaryotes: Ser, Thr, Asp, His, Cys | - |
| sulfatation | in eukaryotes: Tyr, PROSITE PDOC00003 | |
![Page 34: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/34.jpg)
FindMod Output - Application of Rules
- potentially modified peptides that agree with rules are listed
- amino acids that potentially carry modifications are shown
- peptides potentially modified only by mass difference
- predictions can be tested by MS-MS peptide fragmentation
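A small sketch of how such rules could be applied to a peptide once a mass difference suggests a modification. The residue sets come from the rule examples in this course (eukaryote rule for phosphorylation, exceptions ignored); the data structure and function name are hypothetical and do not reflect FindMod's internals.

```python
# Residues allowed to carry each modification (one-letter codes).
RULES = {
    "farnesylation": set("C"),
    "palmitoylation": set("C"),
    "O-GlcNAc": set("ST"),
    "phosphorylation": set("STDHY"),  # eukaryote rule
}


def apply_rule(peptide, modification):
    """List residues of the peptide that may carry the modification."""
    allowed = RULES[modification]
    sites = [(i + 1, aa) for i, aa in enumerate(peptide) if aa in allowed]
    return {"agrees_with_rule": bool(sites), "sites": sites}


print(apply_rule("AFAEVNGETFYSR", "phosphorylation"))
print(apply_rule("FGPK", "farnesylation"))
```

A peptide with no permitted residue, like FGPK for farnesylation, would be reported as potentially modified by mass difference only.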
![Page 35: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/35.jpg)
FindPept
http://www.expasy.org/tools/findpept.html
• From MS (peptide mass fingerprint) data, detection of:
– Matching peptides for unspecific cleavage
– Masses resulting from possible contaminants
– Matching peptides for specific cleavage (16 different enzymes)
– Peptides resulting from protease autolysis
![Page 36: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/36.jpg)
FindPept
![Page 37: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/37.jpg)
FindPept
![Page 38: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/38.jpg)
FindPept
![Page 39: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/39.jpg)
FindPept
![Page 40: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/40.jpg)
MS/MS based identification tools
• Tag search: tools that search peptides based on an MS/MS sequence tag
– MS-Tag and MS-Seq, PeptideSearch
• Ion search or PFF: tools that match MS/MS experimental spectra with “theoretical spectra” obtained via in-silico fragmentation of peptides generated from a sequence database
– Phenyx, Mascot, Sequest, X!Tandem, OMSSA, ProID, …
• De novo sequencing: tools that directly interpret MS/MS spectra and try to deduce a sequence
– Convolution/alignment (PEDENTA)
– De novo sequencing followed by sequence matching (Peaks, Lutefisk, Sherenga, PeptideSearch)
– Guided sequencing (Popitam)
In all cases, the output is a peptide structure per MS/MS spectrum
![Page 41: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/41.jpg)
Peptide fragmentation fingerprinting = PFF = ion search
MS/MS database matching

Experimental side: Protein(s) → enzymatic digestion → Peptides → MS/MS spectra of peptides → Ions peaklists
340.695086
676.96063
498.8283
545.564
1171.967066
261.107346
342.51458
456.727405
363.268365

Theoretical side: Sequence database entry → in-silico digestion → Theoretical proteolytic peptides → in-silico fragmentation → Theoretical fragmented peptides → Theoretical peaklist
Sequence database entry:
…MAIILAGGHSVRFGPKAFAEVNGETFYSRVITLESTNMFNEIIISTNAQLATQFKYPNVVIDDENHNDKGPLAGIYTIMKQHPEEELFFVVSVDTPMITGKAVSTLYQFLV …
Theoretical proteolytic peptides:
- MAIILAGGHSVR
- FGPK
- AFAEVNGETFYSR
- VITLESTNMFNEIIIK
- YPNVVIDDENNDK
- …
Theoretical fragmented peptides: MAIILAG, MAIILA, MAIIL, MAII, MAI, M, M, AIILAG
Theoretical peaklist:
361.107346
338.695086
676.96063
498.8283
1045.564
1171.967066
342.51458
457.827405
263.268453

Match (ions peaklists against theoretical peaklist)
Result: ranked list of peptide and protein candidates
![Page 42: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/42.jpg)
Ion-types
[Figure: fragment ion types and their mass offsets. Offsets shown under P' nterm and P' cterm: -28, -45, -46; 0, -17, -18; +17; +28, +2, -15, -16; -15.]
[N] is the mass of the N-term group
[M] is the mass of the sum of the neutral amino acid residue masses
It is very important to know the ionic series produced by a spectrometer, otherwise potential matches will be missed.
On the other hand, if an ion type not present in the original spectrum is taken into account, it will contribute to false positive matches.
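As an illustration of in-silico fragmentation, the sketch below computes the singly charged b and y ion series of a peptide. It covers only these two ion types, uses standard monoisotopic masses, and is a generic textbook calculation rather than the code of any search engine; the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return the singly charged b and y ion m/z series of a peptide."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: first i residues plus a proton (N-terminal fragment).
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: last i residues plus water and a proton (C-terminal fragment).
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("FGPK")
print([round(m, 4) for m in b_ions])
print([round(m, 4) for m in y_ions])
```

Other ion types would be obtained by adding the corresponding offset to these series. Including a series the instrument does not produce adds theoretical peaks that can only match by chance.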
![Page 43: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/43.jpg)
Some PFF tools
Software Source website
InsPecT peptide.ucsd.edu/inspect.py
Mascot www.matrixscience.com/search_form_select.html
MS-Tag and MS-Seq prospector.ucsf.edu
PepFrag prowl.rockefeller.edu/prowl/pepfragch.html
Phenyx phenyx.vital-it.ch
Popitam www.expasy.org/tools/popitam
ProID (download) sashimi.sourceforge.net/software_mi.html
Sequest* fields.scripps.edu/sequest/index.html
Sonar 65.219.84.5/service/prowl/sonar.html
SpectrumMill* www.home.agilent.com
VEMS www.bio.aau.dk/en/biotechnology/vems.htm
X!Tandem (download) www.thegpm.org/TANDEM
*Commercialized
Same principle as PMF, but using MS/MS spectra
Non-exhaustive list!
![Page 44: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/44.jpg)
![Page 45: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/45.jpg)
Phenyx
![Page 46: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/46.jpg)
The Phenyx Web Interface: one result, multiple views
[Screenshot labels: Desktop, Submission, Results views, Results comparison, Management console, Excel, XML and text exports]
P.A. Binz
![Page 47: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/47.jpg)
The Proteins overview
List of identified proteins
Corresponding list of identified peptides
Protein group description
P.A. Binz
![Page 48: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/48.jpg)
The Proteins overview
Hints about the significance of the score
![Page 49: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/49.jpg)
Hints about the significance of the score
Better when high intensity peaks are matched and ion series are extended, without too many and too big holes
![Page 50: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/50.jpg)
The scoring system in Phenyx
• The score is the sum of up to 12 basic scores such as:
– presence of a, b, y, y++, b-H2O…; co-occurrence of ion series (using HMMs), peak intensities, residue modifications (PTM or chemical), …
• True probabilistic approach for each peptide match:
$$\log \frac{\text{likelihood of being correct}}{\text{likelihood of being random}}$$
• Function of instruments and molecular types
– Esquire 3000+, LCQ; iTRAQ vs. unmodified peptides
• Scores are normalised into z-scores
Search in a query database
Search in a randomized set of peptides
![Page 51: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/51.jpg)
X!Tandem
www.thegpm.org
![Page 52: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/52.jpg)
X!Tandem - output
![Page 53: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/53.jpg)
The two-round search
Mascot, Phenyx and X!Tandem
The identification process may be launched in 2 rounds
• Each round is defined with a set of search criteria
– The first round searches the selected database(s) with stringent parameters
– The second round searches the proteins that have passed the first round (relaxed parameters)
⇒ Accelerates the job when looking for many variable modifications or unspecific cleavages
⇒ Appropriate when the first round defines stringent criteria to capture a protein ID, and the second round looks for looser peptide identifications
![Page 54: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/54.jpg)
Example 2nd round
1st round, only 3 fixed mods: 131 valid, 75% cov.
2nd round, add variable mods: 205 valid, 84% cov.
2nd round, with all mods and half cleaved: 348 valid, 90% cov.
![Page 55: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/55.jpg)
Source of errors in assigning peptides
• Scores not adapted
• Parameters are too stringent or too loose
• Low MS/MS spectrum quality (many noise peaks, low signal to noise ratio, missing fragment ions, contaminants)
• Homologous proteins
• Incorrectly assigned charge state
• Pre-selection of the 2nd isotope (the parent mass is shifted by 1 Da; one solution is to use a larger parent mass tolerance, but this may also drown the correct peptide among more candidates)…
• Novel peptide or variant
![Page 56: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/56.jpg)
Hints to know when the identification is correct
With MS
• Good sequence coverage: the larger the sub-sequences and the higher the sequence coverage value, the better
• Consider the length of the protein versus the number of matched theoretical peptides
• Better when high intensity peaks have been used in the identification
• Scores: the higher, the better. The further from the 2nd hit, the better
• Filter on the correct species if you know it (reduces the search space, time, and errors)
• Better when the errors are more or less constant among all peptides found.
• If you have time, try many tools and compare the results
![Page 57: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/57.jpg)
Hints to know when the identification is correct
With MS/MS
• The higher the number of peptides identified per protein, the better
• Sequence coverage: the larger the sub-sequences and the higher the sequence coverage value, the better
• Depends on the sample complexity and experiment workflow
• Scores: the higher, the better.
• Filter on the correct species if you know it (reduces the search space, time, and errors)
• Better when high intensity peaks are matched and ion series are extended, without too many and too big holes.
• Better when the errors are more or less constant among all ions.
• If you have time, try many tools and compare the results
![Page 58: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/58.jpg)
E-values
• For a given score S, it indicates the number of matches that are expected to occur by chance in a database with a score at least equal to S.
• The e-value takes into account the size of the database that was searched. As a consequence it has a maximum of the number of sequences in the database.
• The lower the e-value, the more significant the score is.
• An e-value depends on the calculation of the p-value.
![Page 59: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/59.jpg)
p-value
• A p-value is the probability of obtaining a match at least as good as the observed one when the null hypothesis (the match is random) is true. If the p-value is 10^-5, a match this good would arise by chance with a probability of 10^-5.
• A p-value has a maximum of 1.0.
• The larger the search space, the higher the p-value since the chance of a peptide being a random match increases.
• The lower the p-Value, the more significant is the match.
Source: Lisacek, Practical Proteomics, 2006 Sep;6 Suppl 2:22-32
![Page 60: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/60.jpg)
Z-score
• Z-score is a dimensionless quantity derived by subtracting the population mean from an individual (raw) score and then dividing the difference by the population standard deviation.
• The z-score reveals how many units of the standard deviation a case is above or below the mean.
$$Z\text{-score} = \frac{x - \mu}{\sigma}$$
Source: Wikipedia
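The definition translates directly into code. In this sketch the population is a list of made-up scores standing in for matches against a randomized set of peptides; the function name and the data are assumptions for illustration.

```python
from statistics import mean, pstdev


def z_score(x, population):
    """Number of population standard deviations that x lies from the mean."""
    mu = mean(population)
    sigma = pstdev(population)
    if sigma == 0:
        raise ValueError("population has zero standard deviation")
    return (x - mu) / sigma


# Hypothetical scores of random matches: mean 5, standard deviation 2.
random_scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(10, random_scores)
print(z)
```

A raw score of 10 lies 2.5 standard deviations above the mean of this population.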
![Page 61: EMBnet course Proteomics using Bioinformatics tools · FindMod Output - Application of Rules - potentially modified peptides that agree with rules are listed - amino acids that potentially](https://reader035.vdocument.in/reader035/viewer/2022070107/602208124c20e006731388f7/html5/thumbnails/61.jpg)
So what?
• For small (significant) p-values, p and e are approximately equal, so the choice of one or the other is often equivalent. It is therefore reasonable to treat low p-values in Phenyx as e-values. X!Tandem simply reports e-values as log values to remove the powers of 10.
• For a single search (or set of sampled peptides), you can compare z-scores. However, when two or more searches are performed on different size spaces, you first need to look at the p-values before comparing z-scores.
Source: Lisacek, Practical Proteomics, 2006 Sep;6 Suppl 2:22-32
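A short numerical sketch of why p and e nearly coincide when they are small. It rests on two assumptions that the slides do not state: the e-value is the per-comparison p-value multiplied by the number of database sequences, and chance matches follow a Poisson model, so that the probability of at least one chance match is 1 − exp(−e). The function names are hypothetical.

```python
import math


def e_value(p_single, n_sequences):
    """Expected number of chance matches in a database of n_sequences."""
    return p_single * n_sequences


def p_from_e(e):
    """Probability of at least one chance match, assuming a Poisson model."""
    return -math.expm1(-e)   # 1 - exp(-e), accurate for small e


e_small = e_value(1e-9, 10_000)
print(e_small, p_from_e(e_small))   # nearly identical for small values
print(e_value(1.0, 10_000))         # e is bounded by the database size
print(p_from_e(5.0))                # p stays below 1.0
```

For large e-values the two quantities diverge: e can grow up to the number of sequences in the database, while p saturates at 1.0.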