2 3 j. proteome res., 2011, 10 (1), pp 153–160 doi: 10.1021/pr100677g
![Page 1: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/1.jpg)
Paulo Costa Carvalho
Laboratory for Proteomics and Protein Engineering
Fiocruz - PR
Analyzing shotgun proteomic data
pcarvalho.com
![Page 2: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/2.jpg)
Outline
• Shotgun proteomics
  • Motivation for studying proteomics
  • What is shotgun proteomics
• Data analysis
  • Protein identification
  • Label-free quantitation
  • PatternLab for proteomics
• Final Considerations
![Page 3: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/3.jpg)
J. Proteome Res., 2011, 10 (1), pp 153–160
DOI: 10.1021/pr100677g
Motivations
![Page 4: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/4.jpg)
![Page 5: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/5.jpg)
Editorial
“There has been an unprecedented improvement in the quality and quantity of commercial proteomics data generation technologies, making data generation more accessible to many researchers. However, more and more discoveries will be led by researchers in command of the skills necessary to mine and extensively interpret the volumes of data. Already the ability to generate data vastly outpaces our ability to interpret it, and the lack of expertise in interpreting data is the current gating factor in the advancement of proteomics sciences. Proteomics scientists with training solely in data generation techniques will be shut out of more and more research opportunities.”
Nuno Bandeira, July 2011
Computational Proteomics
![Page 6: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/6.jpg)
Too many roads not taken
Edwards AM et al., Nature, Feb 2011
![Page 7: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/7.jpg)
![Page 8: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/8.jpg)
Proteomics has revolutionized biochemical research
![Page 9: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/9.jpg)
![Page 10: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/10.jpg)
LC/MS shotgun proteomic data
[Figure: LC/MS map; axes: mass/charge vs. time]
![Page 11: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/11.jpg)
[Figure: MS/MS of the doubly charged precursor peptide NH2-A-F-Y-L-K-COOH; B- and Y-type fragment ions plotted along m/z]
![Page 12: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/12.jpg)
[Figure: same peptide, NH2-A-F-Y-L-K-COOH, (precursor)2+; cleavage after A gives the B-type fragment A and the Y-type fragment F-Y-L-K along m/z]
![Page 13: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/13.jpg)
[Figure: same peptide, (precursor)2+; additional cleavages give the B/Y fragment pairs A-F / Y-L-K and A-F-Y / L-K along m/z]
![Page 14: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/14.jpg)
[Figure: same peptide, NH2-A-F-Y-L-K-COOH, (precursor)2+; the B/Y fragment pairs A-F / Y-L-K, A-F-Y / L-K, and A-F-Y-L / K along m/z]
![Page 15: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/15.jpg)
![Page 16: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/16.jpg)
Strategies for protein identification by mass spectrometry
• Peptide sequence match
  • Advantage: most sensitive (when the protein is in the DB)
  • Disadvantage: sequence must be in the DB; needs to specify PTMs a priori
• De novo sequencing
  • Advantage: does not require a database
  • Disadvantage: most error prone
• Sequence Tag Search
  • Advantages: no need to specify PTM a priori; tolerant to small changes in the sequence
  • Disadvantages: not as sensitive as PSM when the protein is in the DB
![Page 17: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/17.jpg)
• De novo sequencing
  • Advantage: does not require a database
  • Disadvantage: most error prone
[Figure: MS/MS spectrum (intensity vs. m/z) with peaks annotated by single-letter amino acid labels]
![Page 18: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/18.jpg)
• Sequence Tag Search
  • Advantages: no need to specify PTM a priori; tolerant to small sequence changes
  • Disadvantages: not as sensitive as PSM when the protein is in the DB
Na S et al., MCP, 2008
![Page 19: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/19.jpg)
• Peptide sequence match
  • Advantage: most sensitive (when the protein is in the DB)
  • Disadvantage: sequence must be in the DB; needs to specify PTMs a priori
![Page 20: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/20.jpg)
Protein Identification using a database
• ProLuCID
• X!Tandem
• OMSSA
• Andromeda
• SEQUEST
• Mascot
• …
![Page 21: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/21.jpg)
Interpreting MS/MS Proteomics Results
Brian C. Searle
Proteome Software Inc., Portland, Oregon USA
NPC Progress Meeting (February 2nd, 2006)
Illustrated by Toni Boudreault
![Page 22: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/22.jpg)
[Figure: spectrum, intensity vs. m/z; B-type, A-type, and Y-type ions; peak labels R, I, T, P, E, A, H2O]
All these peaks are seen together simultaneously, and we don’t even know…
![Page 23: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/23.jpg)
[Figure: spectrum, intensity vs. m/z]
…what type of ion they are, making the mass differences approach even more difficult.
Finally, as with all analytical techniques,
![Page 24: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/24.jpg)
[Figure: spectrum, intensity vs. m/z]
there’s noise, producing a final spectrum that looks like…
![Page 25: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/25.jpg)
[Figure: spectrum, intensity vs. m/z]
…this, on a good day. And so it’s actually fairly difficult to…
![Page 26: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/26.jpg)
XCalibur :: Show experimental data
![Page 27: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/27.jpg)
Known Ion Types
• B-type ions
• A-type ions
• Y-type ions
We know a couple of things about peptide fragmentation. Not only do we know to expect B, A, and Y ions, but…
![Page 28: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/28.jpg)
Known Ion Types
• B-type ions: 100%
• A-type ions: 20%
• Y-type ions: 100%
• B- or Y-type +2H ions: 50%
• B- or Y-type -NH3 ions: 20%
• B- or Y-type -H2O ions: 20%
… likelihood of seeing each type of ion, where generally B and Y ions are most prominent.
![Page 29: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/29.jpg)
If we know the amino acid sequence of a peptide, we can guess what the spectra should look like!
So it’s actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.
![Page 30: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/30.jpg)
Model Spectrum: ELVISLIVESK
*Courtesy of Dr. Richard Johnson, http://www.hairyfatguy.com/
So as an example, consider the peptide ELVIS LIVES K that was synthesized by Rich Johnson in Seattle.
![Page 31: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/31.jpg)
Model Spectrum
We can create a hypothetical spectrum based on our rules
![Page 32: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/32.jpg)
• B/Y type ions (100%)
• B/Y +2H type ions (50%)
• A type ions, B/Y -NH3/-H2O (20%)
Where B and Y ions are estimated at 100%, plus 2 ions are estimated at 50%, and other stragglers are at 20%.
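The rules above can be sketched in a few lines of Python. This is only an illustration, not the code behind the slides: the function name `model_spectrum` is hypothetical, only singly charged B, Y, and A ions are generated (the +2H and neutral-loss ions are left out for brevity), and the relative intensities are the 100% and 20% values quoted on the slide. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # H2O, added to Y ions
CO = 27.994915      # an A ion is the matching B ion minus CO


def model_spectrum(peptide):
    """Return sorted (m/z, relative intensity, label) tuples for 1+ ions."""
    peaks = []
    n = len(peptide)
    for i in range(1, n):
        b_mz = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y_mz = sum(RESIDUE_MASS[aa] for aa in peptide[n - i:]) + WATER + PROTON
        peaks.append((b_mz, 100, f"b{i}"))      # B ions at 100%
        peaks.append((y_mz, 100, f"y{i}"))      # Y ions at 100%
        peaks.append((b_mz - CO, 20, f"a{i}"))  # A ions at 20%
    return sorted(peaks)


for mz, intensity, label in model_spectrum("ELVISLIVESK"):
    print(f"{label:>4} {mz:10.4f} {intensity:4d}")
```

Because leucine and isoleucine have the same mass, several peaks in this model coincide with peaks of other peptides that differ only by L/I swaps, which is one reason a model spectrum alone cannot distinguish them.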
![Page 33: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/33.jpg)
Model Spectrum
So if we consider the spectrum that was derived from the ELVIS LIVES K peptide…
![Page 34: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/34.jpg)
Model Spectrum
We can find where the overlap is between the hypothetical and the actual spectra…
![Page 35: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/35.jpg)
Model Spectrum
And say conclusively based on the evidence that the spectrum does belong to the ELVIS LIVES K peptide.
![Page 36: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/36.jpg)
Sequencing Explosion
• 1977 Shotgun sequencing invented, bacteriophage φX174 sequenced
• 1989 Yeast Genome project announced
• 1990 Human Genome project announced
• 1992 First chromosome (Yeast) sequenced
• 1995 H. influenzae sequenced
• 1996 Yeast Genome sequenced
• 2000 Human Genome draft
• …
Eng, J. K.; McCormack, A. L.; Yates, J. R. III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing for use in tandem mass spectrometry. And the idea was …
![Page 37: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/37.jpg)
SEQUEST
…instead of searching all possible peptide sequences, search only those in genome databases.
Now, in the post-genomic world this seems like a pretty trivial idea, but back then there was a lot of assumption placed on the idea that we’d actually have a complete Human genome in a reasonable amount of time.
![Page 38: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/38.jpg)
SEQUEST Model Spectrum
For a scoring function they decided to use Cross-Correlation, which basically sums the peaks that overlap between the hypothetical and the actual spectra. Like so.
![Page 39: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/39.jpg)
SEQUEST Model Spectrum
And then they shifted the spectra back and ….
![Page 40: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/40.jpg)
SEQUEST Model Spectrum
… forth so that the peaks shouldn’t align.
They used this number, also called the Auto-Correlation, as their background.
![Page 41: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/41.jpg)
SEQUEST XCorr
[Figure: correlation score vs. offset (AMU); cross correlation (direct comparison) and auto correlation (background). Gentzel M. et al Proteomics 3 (2003) 1597-1610]
This is another representation of the Cross Correlation and the Auto Correlation.
![Page 42: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/42.jpg)
SEQUEST XCorr
[Figure: correlation score vs. offset (AMU); cross correlation (direct comparison) and auto correlation (background). Gentzel M. et al Proteomics 3 (2003) 1597-1610]
$$\mathrm{XCorr} = \frac{\mathrm{CrossCorr}}{\operatorname{avg}\left(\mathrm{AutoCorr}\right)_{\text{offset} = -75 \text{ to } 75}}$$
The XCorr score is the Cross Correlation divided by the average of the auto correlation over a 150 AMU range.
The XCorr is high if the direct comparison is significantly greater than the background, which is obviously good for peptide identification.
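As a rough illustration of the ratio as it is worded on this slide, the sketch below correlates two binned spectra at offset 0 and divides by the average correlation over offsets from -75 to 75 bins. It is not SEQUEST's actual code: the function names are hypothetical, the 1-unit binning is an assumption, including offset 0 in the average follows the slide's "-75 to 75" literally, and no spectrum preprocessing is done.

```python
def correlation_at_offset(observed, model, offset):
    """Sum of products between the observed bins and the model shifted by `offset` bins."""
    total = 0.0
    n = len(observed)
    for i, model_intensity in enumerate(model):
        j = i + offset
        if 0 <= j < n:
            total += observed[j] * model_intensity
    return total


def xcorr_ratio(observed, model, max_offset=75):
    """Direct comparison (offset 0) divided by the average over all offsets."""
    direct = correlation_at_offset(observed, model, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(correlation_at_offset(observed, model, k) for k in offsets)
    average = background / len(offsets)
    if average == 0.0:
        return 0.0  # nothing overlaps at any offset
    return direct / average


# Toy binned spectra (hypothetical data): unit peaks at bins 100, 200, 300.
observed = [0.0] * 400
for peak_bin in (100, 200, 300):
    observed[peak_bin] = 1.0

matching_model = list(observed)
shifted_model = [0.0] * 400
for peak_bin in (150, 250):
    shifted_model[peak_bin] = 1.0

print(xcorr_ratio(observed, matching_model))
print(xcorr_ratio(observed, shifted_model))
```

A model whose peaks line up only at offset 0 scores far above its background, while a model whose peaks line up only after shifting scores zero in the direct comparison.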
![Page 43: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/43.jpg)
SEQUEST DeltaCn
$$\Delta C_n = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}$$
And this XCorr is actually a pretty robust method for estimating how accurate the match is, and so far, there really haven’t been any significant improvements on it.
The DeltaCn is another score that scientists often use. It measures how good the XCorr is relative to the next best match.
As you can see, this is actually a pretty crude calculation.
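The DeltaCn calculation is small enough to write out directly. The sketch below assumes a list of XCorr values for the candidate peptides of one spectrum; the function name and the example scores are hypothetical.

```python
def delta_cn(xcorrs):
    """(best XCorr - second-best XCorr) / best XCorr for one spectrum."""
    ranked = sorted(xcorrs, reverse=True)
    best, second = ranked[0], ranked[1]
    return (best - second) / best


# Hypothetical candidate scores for a single spectrum.
print(delta_cn([3.0, 2.4, 1.0]))
```

A value near 0 means the runner-up scored almost as well as the top hit; a value near 1 means the top hit stands alone.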
![Page 44: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/44.jpg)
Raw Xtractor / Pause for search
* Show an MS2 file
![Page 45: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/45.jpg)
ProLuCID
ProLuCID is a fast and sensitive tandem mass spectra-based protein identification program recently developed in the Yates laboratory at The Scripps Research Institute.
![Page 46: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/46.jpg)
ProLuCID runner
Show ProLuCID Runner
Carvalho PC et al; unpublished
![Page 47: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/47.jpg)
Workflow
[Diagram: MS data and a Database enter the Search Engine (e.g. ProLuCID, SEQUEST, etc), which produces PSMs and then the Protein Identification]
![Page 48: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/48.jpg)
The Challenge: How to pinpoint trustworthy identifications
1 spectrum = 1 identification!
![Page 49: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/49.jpg)
Filtering data
![Page 50: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/50.jpg)
In the beginning…
[Figure: table of spectra with columns spectrum, scores, protein, peptide; sorted by match score]
• SEQUEST: XCorr > 2.5, dCn > 0.1
• Mascot: Score > 45
• X!Tandem: Score < 0.01
Spectra were sorted according to some score and then a threshold value was set. Different programs have different scoring schemes, so SEQUEST, Mascot, and X!Tandem use different thresholds. Different thresholds may also be needed for different charge states, sample complexity, and database size.
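A minimal sketch of this threshold model, using the SEQUEST cutoffs shown on the slide (XCorr > 2.5, dCn > 0.1). The `PSM` record, the function name, and the three example matches are hypothetical and exist only to show the filtering step.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    scan: int
    peptide: str
    xcorr: float
    delta_cn: float


def passes_sequest_thresholds(psm, min_xcorr=2.5, min_delta_cn=0.1):
    """Fixed-threshold filter: both scores must exceed their cutoffs."""
    return psm.xcorr > min_xcorr and psm.delta_cn > min_delta_cn


# Hypothetical matches, sorted by match score as on the slide.
psms = [
    PSM(1, "ELVISLIVESK", 3.4, 0.25),
    PSM(2, "AFYLK", 2.7, 0.05),   # fails dCn
    PSM(3, "RITPEA", 1.9, 0.30),  # fails XCorr
]
accepted = [p for p in sorted(psms, key=lambda p: p.xcorr, reverse=True)
            if passes_sequest_thresholds(p)]
print([p.scan for p in accepted])
```

The cutoffs are hard-coded defaults here, which is exactly the weakness the next slide raises: nothing in the filter tells you what error rate those numbers correspond to.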
![Page 51: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/51.jpg)
There has to be a better way
The threshold model has these problems, which PeptideProphet, DTASelect and others try to solve:
• Poor sensitivity/specificity trade-off, unless you consider multiple scores simultaneously.
• No way to choose an error rate (p=0.05).
• Need to have different thresholds for:
  – different instruments (QTOF, TOF-TOF, IonTrap)
  – ionization sources (electrospray vs MALDI)
  – sample complexities (2D gel spot vs MudPIT)
  – different databases (SwissProt vs NR)
• Impossible to compare results from different search algorithms, multiple instruments, and so on.
![Page 52: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/52.jpg)
Creating a discriminant score
[Figure: table of spectra with columns spectrum, scores, protein, peptide; sorted by match score]
PeptideProphet starts with a discriminant score. If an application uses several scores (SEQUEST uses XCorr, dCn, and Sp scores; Mascot uses ion scores plus identity and homology thresholds), these are first converted to a single discriminant score.
![Page 53: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/53.jpg)
Scaffold:: Proteome Software
![Page 54: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/54.jpg)
[Figure: sensitivity vs. error; the ideal point "correctly identifies everything, with no error". Keller et al, Anal Chem 2002]
This graph shows the trade-offs between the errors (false identifications) and the sensitivity (the percentage of possible peptides identified).
The ideal is zero error and everything identified (sensitivity = 100%).
PeptideProphet corresponds to the curved line. Squares 1–5 are thresholds chosen by other authors.
![Page 55: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/55.jpg)
Mixture of distributions
[Figure: histogram of the number of spectra in each bin (0 to 200) vs. discriminant score (D) (-3.9 to 7.3), with the "correct" and "incorrect" distributions]
This histogram shows the distributions of correct and incorrect matches.
PeptideProphet assumes that these distributions are standard statistical distributions.
Using curve-fitting, PeptideProphet draws the correct and incorrect distributions.
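Once the two distributions are fitted, the probability that a match with discriminant score D is correct follows from Bayes' rule. The sketch below is an assumption-laden illustration, not PeptideProphet itself: both components are taken to be Gaussians for simplicity, and the means, standard deviations, and prior are hypothetical numbers chosen to resemble the histogram, not fitted values.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_correct(d, prior_correct=0.5, correct=(3.0, 1.5), incorrect=(-1.5, 1.0)):
    """P(correct | D = d) for a two-component mixture.

    `correct` and `incorrect` are (mean, sd) pairs; all values are hypothetical.
    """
    weight_correct = prior_correct * normal_pdf(d, *correct)
    weight_incorrect = (1.0 - prior_correct) * normal_pdf(d, *incorrect)
    return weight_correct / (weight_correct + weight_incorrect)


for d in (-2.0, 0.9, 5.0):
    print(f"D = {d:5.1f}  P(correct) = {prob_correct(d):.3f}")
```

Unlike a fixed threshold, this gives every match a probability, so an error rate can be chosen after the fact by summing the probabilities of being incorrect over the accepted matches.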
![Page 56: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/56.jpg)
Labeled decoy – False Discovery Rate
[Diagram: target sequences plus labeled decoys are searched together; the search result feeds the decoy strategy for FDR. Elias and Gygi, Nature Methods, 2007]
[Figure: histogram over the discriminant score, -3.9 to 7.3; counts 0 to 200]
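The labeled-decoy idea can be sketched as follows: rank all matches by score, count decoy hits among the accepted matches, and keep lowering the cutoff while the estimated FDR stays within the limit. The estimator used here, decoys divided by targets, is one common convention and an assumption on my part; the slide does not say which formula was used. The function name and the example scores are hypothetical.

```python
def score_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose accepted set keeps decoys/targets <= max_fdr.

    `psms` is an iterable of (score, is_decoy) pairs. Returns None if no
    cutoff satisfies the limit. Tied scores are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical search result: (discriminant score, matched a decoy sequence?)
example = [(10, False), (9, False), (8, False), (7, True), (6, False)]
print(score_threshold_for_fdr(example, max_fdr=0.25))
print(score_threshold_for_fdr(example, max_fdr=0.01))
```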
![Page 57: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/57.jpg)
Search Engine Processor
SVM - example
![Page 58: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/58.jpg)
Summary: “The use of iProphet in the TPP increases the number of correctly identified peptides at a constant false discovery rate (FDR) as compared to both PeptideProphet and another state-of-the art tool Percolator.”
![Page 59: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/59.jpg)
Maximizing proteins under a given FDR
![Page 60: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/60.jpg)
![Page 61: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/61.jpg)
Unlabeled Decoys – False Discovery Rate
[Diagram: target sequences, labeled decoys, and unlabeled decoys (U-Decoy) are searched together; the search result feeds the new FDR strategy]
[Figure: histogram over the discriminant score, -3.9 to 7.3; counts 0 to 200]
![Page 62: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/62.jpg)
Unlabeled Decoys – False Discovery Rate

| | Total Identified Spectra | LD (spectra) | UD (spectra) |
|---|---|---|---|
| WNN | 115248 | 1152 | 4656 |
| Bayes | 108376 | 1083 | 1064 |
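One way to read this table is to divide each decoy count by the total number of identified spectra. That reading is an interpretation rather than something stated on the slide (for instance, it is not specified whether the total includes decoy hits), so the ratios below should be taken as rough rates only. The function name is hypothetical; the counts are the ones in the table.

```python
# (total identified spectra, labeled-decoy spectra, unlabeled-decoy spectra)
results = {
    "WNN": (115248, 1152, 4656),
    "Bayes": (108376, 1083, 1064),
}


def decoy_rates(total, labeled, unlabeled):
    """Fraction of identified spectra that hit labeled and unlabeled decoys."""
    return labeled / total, unlabeled / total


for name, (total, labeled, unlabeled) in results.items():
    ld_rate, ud_rate = decoy_rates(total, labeled, unlabeled)
    print(f"{name}: LD {ld_rate:.2%}, UD {ud_rate:.2%}")
```

Under this reading, both classifiers sit at about the same labeled-decoy rate, while their unlabeled-decoy rates differ by roughly a factor of four.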
![Page 63: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/63.jpg)
| | Spectra | Peptides | Proteins (FDR) | UL FDR |
|---|---|---|---|---|
| SEPro | 104,654 | 17,840 | 1,283 (0.9%) | 1% |
| Scaffold | 88,970 | 15,406 | 1,160 (2.3%) | 2% |

Table I. Scaffold A refers to a 99% confidence level for proteins, 95% for peptides. Scaffold B refers to 95 and 80%, respectively for proteins and peptides.
![Page 64: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/64.jpg)
Generating the SEPro Report
![Page 65: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/65.jpg)
Generating the SEPro Report
![Page 66: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/66.jpg)
Generating the SEPro Report
![Page 67: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/67.jpg)
Generating the SEPro Report
![Page 68: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/68.jpg)
Generating the SEPro Report
![Page 69: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/69.jpg)
Generating the SEPro Report
![Page 70: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/70.jpg)
Generating the SEPro Report
![Page 71: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/71.jpg)
![Page 72: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/72.jpg)
Relative quantitation
Thermo
![Page 73: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/73.jpg)
Picture from Strassberger et al, JOP, 2010
Label free quantitation
* Search for examples in XCalibur
Scan 12048
How to deal with different charge states?
Subject to random sampling; what are its implications?
![Page 74: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/74.jpg)
Differential Analysis is performed in two steps
Differential Analysis
Marginal Cases (found in only 1 condition)
Differential (found in both)
![Page 75: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/75.jpg)
Venn Diagrams of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 and B2. Panels A, B, and C consider only proteins that appeared in one or more, two or more, or in all three replicates, respectively.
![Page 76: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/76.jpg)
Venn Diagrams of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 (A) and B2 (B). R1, R2, and R3 refer to the replicates from each state.
![Page 77: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/77.jpg)
What proteins can be considered as statistically different for marginal cases?
![Page 78: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/78.jpg)
| | Num. Rep. (t) | Num. Proteins | Fraction () | p-value |
|---|---|---|---|---|
| Low () | 1 | 613 | 0.637 | 0.180 |
| | 2 | 283 | 0.294 | 0.056 |
| | 3 | 66 | 0.069 | 0.019 |
| Medium () | 1 | 297 | 0.310 | 0.141 |
| | 2 | 417 | 0.435 | 0.042 |
| | 3 | 245 | 0.255 | 0.015 |
| High () | 1 | 168 | 0.176 | 0.112 |
| | 2 | 185 | 0.193 | 0.033 |
| | 3 | 604 | 0.631 | 0.011 |
| Very High () | 1 | 59 | 0.070 | 0.083 |
| | 2 | 62 | 0.073 | 0.024 |
| | 3 | 725 | 0.857 | 0.008 |

Venn Diagram of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 and B2. Proteins that could not be statistically claimed to be differentially expressed in one of the two states according to the proposed Bayesian approach (those for which p-value > 0.05) were automatically filtered out during the generation of the Venn Diagram.
Carvalho PC et al; Bioinformatics 2011
![Page 79: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/79.jpg)
Differential Analysis is performed in two steps
Differential Analysis
Marginal Cases (found in only 1 condition)
Differential (found in both)
![Page 80: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/80.jpg)
Traditional strategy: Data Dependent Analysis (DDA)
New strategy: Extended Data Independent Analysis (XDIA)
![Page 81: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/81.jpg)
Results
• Number of identified spectra increased by 250% (improves label-free quantitation).
• Number of unique peptides increased by 35%.
![Page 82: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/82.jpg)
![Page 83: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/83.jpg)
![Page 84: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/84.jpg)
Multiplexed spectrum identification
![Page 85: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/85.jpg)
Confidence when integrating extracted ion chromatograms
DDA XDIA
![Page 86: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/86.jpg)
Co-eluting peptide ions of similar m/z
[Figure: peptide mass vs. time for co-eluting ions A and B, comparing Data Dependent Analysis with Extended Data Independent Analysis]
![Page 87: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/87.jpg)
Spectral deconvolution and monoisotopic peak reassignment to aid in identification and XIC quantitation
![Page 88: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/88.jpg)
![Page 89: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/89.jpg)
Show SEProQ here
![Page 90: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/90.jpg)
PatternLab for proteomics: a one-stop shop for data analysis
Carvalho PC et al., Current Protocols in Bioinformatics, 2010
• Pinpoint differentially expressed proteins
• Venn Diagrams
• Gene Ontology Analysis
• Find trends in time-course experiments
![Page 91: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/91.jpg)
![Page 92: 2 3 J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g](https://reader038.vdocument.in/reader038/viewer/2022103122/56649cf05503460f949bef00/html5/thumbnails/92.jpg)
Computational workflow
• Finding Statistically Differentially Expressed Proteins / Data Analysis: PatternLab for proteomics (Trends, Venn Diagrams, Differential Statistics, Gene Ontology Analysis, etc.)
• Protein Quantitation: Search Engine Processor / SEProQ
• Protein Identification / Quality control: ProLuCID => Search Engine Processor
• Search Engine Preprocessing: YADA, XDIA Processor, CPM
• Experimental: Data acquisition using the mass spectrometer (DDA, XDIA)