1
Error tolerant search
• A large number of spectra remain without a significant score. A reasonable number of fragment ion peaks might not have matched. Possible causes:
– Underestimated mass measurement error (should be visible in the peptide view graphs),
– Incorrect determination of the precursor charge state,
– The peptide sequence is not in the database,
– Missed cleavage and unexpected cleavage,
– Unexpected chemical and post-translational modifications.
• The biological structure, function and activity of a protein can be determined by the modifications of that protein.
• An increasing share of the proteins that have been linked to, e.g., different diseases change not only in expression level but also, or exclusively, in their level of post-translational modification.
2
Post-Translational Modifications (PTMs)
• A PTM alters the mass of the modified amino acid and of the peptide, which results in peak shifts in the spectrum:
b1: H
b2: HQ
b3: HQS
b4: HQSV
b5: HQSVM
…
b10: HQSVMVGMVQ
y10: QSVMVGMVQK
y9: SVMVGMVQK
y8: VMVGMVQK
y7: MVGMVQK
y6: VGMVQK
…
y1: K
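[Figure: spectrum of the peptide H Q S V M V G M V Q K along the m/z axis (ticks at 200, 400, 1000), with the fragment ion peaks labelled in complementary pairs: b1/y10, b2/y9, b3/y8, b4/y7, b5/y6, b6/y4, b7/y3, b8/y2, b9/y5, b10/y1.]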
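The peak shift can be made concrete with a minimal sketch. It assumes singly charged b and y ions, standard monoisotopic residue masses, and, purely for illustration, an oxidation (+15.9949 Da) on the first methionine of HQSVMVGMVQK; the function name and the choice of modification are not from the slides.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide.
RESIDUE_MASS = {
    "H": 137.05891, "Q": 128.05858, "S": 87.03203, "V": 99.06841,
    "M": 131.04049, "G": 57.02146, "K": 128.09496,
}
WATER = 18.01056   # added to y ions (C-terminal fragment keeps OH + H)
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_ions(peptide, mods=None):
    """Return (b, y) lists of singly charged m/z values.

    mods maps a 0-based residue position to a mass shift in Da
    (hypothetical helper argument, used here to model a PTM).
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]              # b1 .. b(n-1)
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]  # y1 .. y(n-1)
    return b, y


plain_b, plain_y = fragment_ions("HQSVMVGMVQK")
# Assumed example: oxidation on the methionine at 0-based position 4.
ox_b, ox_y = fragment_ions("HQSVMVGMVQK", {4: 15.9949})

for i in range(10):
    print(f"b{i + 1}: {plain_b[i]:9.4f} -> {ox_b[i]:9.4f}    "
          f"y{i + 1}: {plain_y[i]:9.4f} -> {ox_y[i]:9.4f}")
```

Only the fragments that contain the modified residue move: b5 to b10 and y7 to y10 shift by the modification mass, while b1 to b4 and y1 to y6 stay where they were.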
3
PTMs
• Complete modifications (chemical modifications)
• Variable modifications
4
PTMs
• Obstacles
– Complexity (means longer execution time)
  • Can increase the search space 1, 10, ..., 10000 fold
– Significance
5
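Obstacles - Complexity
• Let the theoretical peptide be:
– HQSVMVGMVQK (11 amino acids)
– Each amino acid can be modified by, let’s say, 5 PTMs
– Scoring one theoretical spectrum is taken to cost 1 second

# included PTMs   # modified theoretical spectra      time
0                 1                                  1 sec
1                 11*5 = 55                          55 seconds (about 1 min)
2                 55*25 = 1375                       about 23 mins
3                 11*15*125 = 20625                  5.7 hours
...
10                11*9765625 = 107421875             29839 hours (about 3.4 years)

In general, with peptide length = L, included PTMs = K and PTMs/aa = M, the number of modified theoretical spectra is

$$\binom{L}{K} M^{K}, \qquad \text{e.g. } \binom{11}{10} \cdot 5^{10} = 11 \cdot 9765625 = 107421875 .$$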
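The growth can be reproduced with a few lines of Python. This is a sketch under the slide's assumptions (L = 11, M = 5) plus one assumption of its own: scoring a single theoretical spectrum costs one second. The function name is illustrative.

```python
from math import comb


def modified_spectra(length, included_ptms, ptms_per_aa):
    """Number of theoretical spectra with exactly `included_ptms` PTMs:
    choose the modified positions, then one of the PTMs for each."""
    return comb(length, included_ptms) * ptms_per_aa ** included_ptms


SECONDS_PER_SPECTRUM = 1  # assumed cost of scoring one spectrum

for k in (0, 1, 2, 3, 10):
    count = modified_spectra(11, k, 5)
    hours = count * SECONDS_PER_SPECTRUM / 3600
    print(f"K={k:2d}  spectra={count:>10d}  time={hours:10.2f} h")
```

For K = 10 the count is 11 * 5^10 = 107421875 spectra, which at one second each is roughly 29839 hours.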
7
Significance
• Increases the random matches
– Inserting many PTMs makes the theoretical spectra too flexible, so that in the end every theoretical spectrum can be aligned to the experimental spectrum.
[Figure: frequency plotted against score, showing the probability distribution of random scores and the probability distribution of correct scores, with labels A, B, T and h on the plot and the p-value of hit h indicated.]
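A small sketch of why a larger search space hurts significance. The p-value of hit h is the probability that a random match scores at least as high as h; the first function estimates it from a sample of random scores. The second function assumes, as a simplification that is not from the slides, that the candidate spectra are independent, and gives the chance that at least one of them reaches the score of h by chance. Both function names are illustrative.

```python
def empirical_p_value(random_scores, hit_score):
    """Fraction of random scores that are at least as high as the hit."""
    return sum(s >= hit_score for s in random_scores) / len(random_scores)


def chance_of_random_hit(p_value, n_candidates):
    """Probability that at least one of n independent random candidates
    scores as high as the hit (independence is an assumption)."""
    return 1.0 - (1.0 - p_value) ** n_candidates


p = 1e-5
for n in (1, 55, 1375, 20625):
    print(f"{n:6d} candidate spectra -> chance of a random hit: "
          f"{chance_of_random_hit(p, n):.4f}")
```

The same hit score becomes less convincing as more modified candidates are compared against the spectrum, which is the effect the slide describes.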
8
Computational Identification of PTMs
• 3 approaches:
– Targeted,
– Untargeted, also called restricted,
– Unrestricted (de novo, blind search).
9
Targeted approach
• Almost all search engines support it.
– The experimenter needs to guess the PTMs in the sample.
• Two pass strategy
– Two rounds, refinement on a smaller …
– Sequest, Mascot
10
Targeted approach – X!Tandem
11
Targeted approach – InsPecT
12
Untargeted approaches
• Uses a big list of modifications taken from databases
– The search space is limited but can still be huge.
– If we allow 5 of the 10 most frequent modifications to occur in a peptide at the same time, the search space grows by 3 orders of magnitude.
– The growth is more dramatic if, instead of 10 types of modifications, we wish to consider all of the roughly 500 known types.
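One possible reading of these numbers, offered only as an assumption: if each candidate peptide may carry any set of up to 5 distinct modification types out of N, the number of such sets is the sum of C(N, k) for k = 0..5. The slides do not state how the factor was derived, so the model and the function name below are illustrative.

```python
from math import comb


def modification_sets(n_types, max_per_peptide):
    """Number of ways to pick at most `max_per_peptide` distinct
    modification types out of `n_types` (assumed counting model)."""
    return sum(comb(n_types, k) for k in range(max_per_peptide + 1))


for n_types in (10, 500):
    print(f"{n_types:3d} modification types -> "
          f"{modification_sets(n_types, 5):,} combinations")
```

Under this model 10 types give 638 combinations, on the order of the quoted 3 orders of magnitude, and moving to 500 types increases the count by many further orders of magnitude.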
13
Database of PTMs
• Unimod
– http://www.unimod.org
– Contains 906 modifications
• Resid
– http://www.ebi.ac.uk/RESID
– 559 entries
14
Untargeted
• PILOT_PTM
– Uses a large dataset of modifications.
– Binary linear programming.
  • The objective function is the number of matched peaks.
  • Linear constraint functions guarantee meaningful modifications of the peptide.
15
Unrestricted
• No a priori information about PTMs.
• De novo identification of PTMs.
• Search space is infinite.
• In practice no more than one or two PTMs can be identified on the same peptide.
16
TwinPeaks approach
• Based on the Sequest idea.
• Shifts the experimental spectra over a range, and plots the similarity score as a function of the mass shift.
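The idea of scoring as a function of the mass shift can be sketched as follows. This is not the TwinPeaks implementation: the similarity score (sum of matched intensity), the matching tolerance, the toy peak lists and all names are assumptions chosen for illustration.

```python
def shift_scores(experimental, theoretical, shifts, tol=0.5):
    """Score each candidate mass shift.

    experimental: list of (m/z, intensity) peaks
    theoretical:  list of unmodified theoretical m/z values
    Returns (shift, sum of matched intensity) pairs.
    """
    scores = []
    for shift in shifts:
        total = 0.0
        for mz, intensity in experimental:
            # Move the experimental peak back by the shift and look for a match.
            if any(abs((mz - shift) - t) <= tol for t in theoretical):
                total += intensity
        scores.append((shift, total))
    return scores


# Toy data: two of the three peaks sit 16 units above their theoretical position.
theoretical = [100, 200, 300]
experimental = [(100, 1.0), (216, 5.0), (316, 3.0)]

for shift, score in shift_scores(experimental, theoretical, range(0, 21, 4)):
    print(f"shift {shift:3d}: matched intensity {score}")
```

A second maximum of the score at a non-zero shift points to a modification of that mass on part of the fragment ions.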
17
TwinPeaks approach
[Figure: plot with the axis label "Sum of matched intensity".]
18
MS-Alignment
• Based on the alignment of the theoretical spectra to the experimental spectra
19
[Figure: alignment plot with the axes labelled "Theoretical Spectrum" and "Experimental Spectrum".]
20
MS-alignment
21
Comparison of targeted and unrestricted results
X!Tandem targeted

Scan ID   log(-E)   Peptide
3.1.1     -13.8     fqyr295 ILTAAALCHF TSIEVVK 311kasg (130)
6.1.1     -6.6      rihr159 FVEKPQVFVS NK 170inag (471)
11.1.1    -3.4      rtcr30 SPEPGPSSSI GSPQASSPPR PN 51hyll (48)
12.1.1    -4.0      dvtr473 TMHFGTPTAY EK 484ecft (306)
13.1.1    -10.0     ietk133 FFDDDLLVST SR 144vrlf (176)
24.1.1    -4.2      pskr237 QTNGCLNGYT PSR 249krqa (112)
25.1.1    -2.5      ntpr149 KNGGLGHMNI ALLSDLTK 166qisr (1776)
27.1.1    -7.4      pqgr19 IHQIEYAMEA VK 30qgsa (10317)
31.1.1    -2.0      kefk80 DREDLVPYTG EK 91rgkv (137)
34.1.1    -1.6      dyhr131 YLAEFATGND R 141keaa (9406)
35.1.1    -7.0      grar16 QYTSPEEIDA QLQAEK 31qkar (2754)
36.1.1    -2.0      rlar172 QDPQLHPEDP ER 183raai (644)
37.1.1    -8.1      iflh92 ISDVEGEYVP VEGDEVTYK 110mcsi (73)
38.1.1    -3.9      mrsr328 TASGSSVTSL DGTR 341srsh (2698)
40.1.1    -3.7      lgnk29 YVQLNVGGSL YYTTVR 44altr (71)
42.1.1    -1.9      dlqk183 EGEFSTCFTE LQR 195dflk (239)
45.1.1    -2.9      pkek135 QPVAGSEGAQ YR 146kkql (694)
46.1.1    -10.3     lsar446 ASNAWILQQH IATVPSLTHL CR 467leir (107)
53.1.1    -6.8      evyr175 NSMPASSFQQ QK 186lrvc (7099)
57.1.1    -4.7      iygk81 QFEDELHPDL K 91ftga (491)

MS-Alignment unrestricted (de novo)

Scan ID   P-value       Peptide
3         1.00E-05      R.ILTAAALCHFTSIEVVK.K
6         1.00E-05      R.FVEKPQVFVSNK.I
13        1.00E-05      K.FFDDDLLVSTSR.V
27        1.00E-05      R.IHQIEYAMEAVK.Q
47        0.028806584   A.V+172LTAFANGR.S
57        1.00E-05      K.QFEDELHPDLK.F
58        0.004739336   R.ETFY+18LAQDFFDR.F
59        1.00E-05      R.TCLSQLLDIMK.S
71        1.00E-05      K.EYFSTFGEVLM+16VQVK.K
75        1.00E-05      K.QH-18LENDPGSNEDTDIPK.G
97        0.004672897   Q.L+128GVSHVFEYIR.S
98        0.004830918   C.T+160EDMTEDELR.E
99        1.00E-05      R.EFFD-18SNGNFLYR.I
100       1.00E-05      R.LVLESPAPVEVNLK.L
105       1.00E-05      K.LQEFAYVTDGAC+14SEEDILR.M
108       1.00E-05      K.SFDENGFDYLLTYSDNPQTVFP+156.R
115       1.00E-05      R.GPATVEDLPSAFEEK.A
119       1.00E-05      Y.ITD+163VLTEEDALEILQK.G
147       1.00E-05      R.IYSYQMALTPVVVTLWYR.A
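To examine the unrestricted results systematically, the annotated peptides can be split into a plain sequence and a list of mass shifts. The sketch below assumes the notation seen in the MS-Alignment rows: flanking residues separated by dots, and an integer shift such as +16 or -18 written directly after the modified residue. The function name is hypothetical.

```python
import re


def parse_modified_peptide(annotated):
    """Split e.g. 'K.EYFSTFGEVLM+16VQVK.K' into the bare sequence and
    a list of (1-based position, residue, mass shift) tuples."""
    _prev_aa, core, _next_aa = annotated.split(".")
    sequence = re.sub(r"[+-]\d+", "", core)
    mods = []
    position = 0
    # Each match is one residue, optionally followed by its mass shift.
    for match in re.finditer(r"([A-Z])([+-]\d+)?", core):
        position += 1
        if match.group(2):
            mods.append((position, match.group(1), int(match.group(2))))
    return sequence, mods


for hit in ("R.ILTAAALCHFTSIEVVK.K", "K.EYFSTFGEVLM+16VQVK.K", "K.QH-18LENDPGSNEDTDIPK.G"):
    print(hit, "->", parse_modified_peptide(hit))
```

Listing the shifts this way makes it easier to separate plausible modifications from hits with unexplained masses.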
22
Validate your results
23
Summary
• What you should remember:
– PTM identification is computationally expensive,
– 3 approaches (targeted, untargeted, unrestricted),
– Always examine the results and omit weird PTMs,
– Including more PTMs decreases the statistical significance,
– The more you look for, the less you get (due to significance).