Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06
![Page 1: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/1.jpg)
Data Extraction and Analysis for LC-MS Based Proteomics
Instructors
Jake Jaffe², Deep Jaitly¹, and Matt Monroe¹
Co-Organizers
Josh Adkins¹ and Dick Smith¹
¹ Pacific Northwest National Laboratory, Richland, WA 99354
² The Broad Institute, Cambridge, MA 02142
![Page 2: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/2.jpg)
Course Outline
• Introduction
  • Bios
  • Pipelines
  • Data and Tools Availability
• Feature discovery in LC-MS datasets
  • Feature discovery in individual spectra
  • Feature definition over elution time
• Identifying LC-MS Features using an AMT tag DB
• Break
• AMT tag Pipeline Demo
• MAPQUANT, PEPPeR and GenePattern
• Panel Discussion
  • Questions
  • Future Directions
![Page 3: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/3.jpg)
Jake Jaffe – Short Bio
• Ph.D. Biology in 2004
  • Harvard University
  • Dr. George Church and Dr. Howard Berg, advisors
• Research Scientist at The Broad Institute of MIT and Harvard
  • Proteomics Platform
• Co-developer of the Platform for Experimental Proteomic Pattern Recognition (PEPPeR)
  • Landmark Matching Algorithm
  • Proteogenomic Mapping
• Does both Laboratory and Computational Biology
• Likes coffee a lot
• Would like to thank organizers for inviting him to Seattle!
![Page 4: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/4.jpg)
Deep Jaitly – Short Bio
• Master’s of Mathematics in 2000
  • University of Waterloo
  • Dr. Ming Li and Dr. Paul Kearney, advisors
• Lead algorithm developer for the PNNL proteomics team
• Author or co-author of over 10 peer-reviewed articles
• >10 years experience in programming and algorithm development
  • C++, C#, & VB .NET
  • MATLAB
  • Java
• 4 years industrial experience at Caprion with emphasis on algorithm development for LC-MS and LC-MS/MS data
![Page 5: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/5.jpg)
Matt Monroe – Short Bio
• Ph.D. Analytical Chemistry in 2002
  • University of North Carolina, Chapel Hill
  • Dr. James Jorgenson, advisor
• Administers and expands SQL Server databases and associated software that support the AMT tag pipeline
• Author or co-author of over 30 peer-reviewed articles
• >15 years experience in programming
  • VB6 / VB .NET
  • SQL
  • LabVIEW
• Key member of PNNL’s proteomics data analysis pipeline
![Page 6: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/6.jpg)
Shotgun or MudPIT Proteomics
[Workflow diagram. Labeled elements: complex mixture of proteins; upstream separations; LC-MS/MS; parent MS spectra; CID; tandem MS spectra; X!Tandem or SEQUEST with filtering & archive]
![Page 7: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/7.jpg)
Accurate Mass and Time (AMT) Tag Approach
[Workflow diagram. Labeled elements: complex samples; SEQUEST and/or X!Tandem results (filtering, calculate exact mass, normalize observed elution time); high-throughput LC-FTICR-MS analysis; μLC-FTICR-MS peak-matched results; compare abundances across multiple proteomes]
Shi, Adkins, et al., J. Biol. Chem. 2006, 29131-29140.
![Page 8: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/8.jpg)
Accurate Mass and Time (AMT) Tag Data Processing Pipeline
[Pipeline diagram. Labeled stages and tools: automated sample processing; sample blocking & randomization; SEQUEST and X!Tandem; MASIC; Decon2LS; VIPER; LCMSWarp; SLiC Score; QA/QC trends; STARSuite Extractor; Q Rollup; mini-proteome]
PRISM: G.R. Kiebel et al. Proteomics 2006, 6, 1783-1790.
![Page 9: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/9.jpg)
Example Data for the AMT tag Pipeline Demo
• Salmonella typhimurium, LC-MS/MS
  • Grown in LB (Luria-Bertani) up to log phase
  • Soluble portion of cell lysis
  • “Mini-AMT tag” database, composed of 25 SCX fractions analyzed by LC-MS/MS
  • Mass and time tag database composed from searches using X!Tandem (Log E_Value ≤ -2)
  • Linear alignment of datasets for AMT tag database
• LC-MS
  • Different sample, grown and prepared in the same conditions
  • LC-FTICR-MS analysis (11T FTICR)
  • Non-linear alignment and peak matching to the database
![Page 10: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/10.jpg)
PEPPeR Pipeline
[Pipeline diagram. Labeled elements: Identification Sample Set; Quantification Sample Set; Identity-centric LC-MS Method; Pattern-centric LC-MS Method; Feature Detection; MS/MS Interpretation (illustrated with b1 to b4 and y1 to y5 fragment ions); Landmark Matching; Peak Matching; Marker Discovery; Targeted MS/MS Identifications]
![Page 11: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/11.jpg)
Software & Data
• AMT tag Pipeline Software
  • http://ncrr.pnl.gov/
• Salmonella typhimurium data resource
  • http://www.proteomicsresource.org/
• PEPPeR, software within GenePattern
  • http://www.broad.mit.edu/cancer/software/genepattern/
![Page 12: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/12.jpg)
Funding for Tool Development
• NIH
  • National Center for Research Resources
  • National Institute of Allergy and Infectious Diseases
  • National Cancer Institute
  • National Institute of General Medical Sciences
  • National Institute of Diabetes & Digestive & Kidney Diseases
• DOE Office of Biological and Environmental Research
![Page 13: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/13.jpg)
Other Excellent Software Resources
• http://www.ms-utils.org/ (Magnus Palmblad)
• http://open-ms.sourceforge.net/index.php (European consortium)
• http://tools.proteomecenter.org/SpecArray.php (ISB)
• http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Peak_Alignment/ (Tobias Kind with Oliver Fiehn)
• http://www.proteomecommons.org/tools.jsp (Phil Andrews and Jayson Falkner)
![Page 14: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/14.jpg)
Course Outline
• Introduction
• Feature discovery in LC-MS datasets
  • Feature discovery in individual spectra
  • Feature definition over elution time
• Identifying LC-MS Features using an AMT tag DB
• Break
• AMT tag Pipeline Demo
• MAPQUANT, PEPPeR and GenePattern
• Panel Discussion
![Page 15: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/15.jpg)
LC-MS data
• A complex peptide mixture on a column is separated by liquid chromatography over a period of time
• The changing composition of the mobile phase causes different peptides to elute at different times
• The components eluting from the column are sampled continuously by sequential mass spectra
[Figure: QC Standards (12 protein digest). Gradient profile (% B vs. time in min) with three FTMS full-scan spectra (relative abundance vs. m/z, 300.00-2000.00) from kolker_19Oct04_Pegasus_0804-4_FT100k-res: scan #265 (RT 24.14, NL 1.39E4), scan #498 (RT 37.66, NL 1.81E6), and scan #991 (RT 66.77, NL 1.06E6). Mass spectra capture the changing composition of peptides eluting from the column.]
![Page 16: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/16.jpg)
Structure of LC-MS Data
Each compound is observed in a mass spectrum as an isotopic pattern that depends on its chemical composition, its charge, and the resolution of the instrument
Peptide: VKHPSEIVNVGDEINVK
Parent Protein: gi|16759851 30S ribosomal protein S1
Charge: 2+
m/z: 939.0203
Monoisotopic Mass: 1876.0054 Da
[Figure: Theoretical Profile. Relative intensity vs. m/z (939 to 941.5), with isotopic peaks labeled 939.00, 939.51, 940.01, 940.51, 941.01, and 941.51]
![Page 17: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/17.jpg)
Structure of LC-MS Data
A mass spectrum of a complex mixture contains overlaid distributions of several different compounds
[Figure: scan 1844, intensity vs. m/z (500 to 1250). Labeled peaks: 459.48, 530.21, 599.99, 748.40, 822.47, 899.48, 949.17, 1103.03, 1282.13, 1343.10]
![Page 18: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/18.jpg)
Structure of LC-MS Data
A mass spectrum of a complex mixture contains overlaid distributions of several different compounds.
![Page 19: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/19.jpg)
Structure of LC-MS Data
With LC as the first dimension, each compound is observed over multiple spectra, showing a three-dimensional pattern of m/z, elution time and abundance
Salmonella typhimurium dataset
Peptide: VKHPSEIVNVGDEINVK
Parent Protein: gi|16759851 30S ribosomal protein S1
Charge: 2+
m/z: 939.0203
Monoisotopic Mass: 1876.0054 Da
Elution range: Scans 1539 - 1593
![Page 20: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/20.jpg)
Feature Discovery in LC-MS data
Goal: Infer the (mass, elution time, intensity) of each compound present in an LC-MS dataset
Because their identities are unknown, the compounds are more appropriately termed features, reflecting that each one is inferred from a three-dimensional pattern
[Figure: 2D view of an LC-MS analysis of Salmonella typhimurium]
![Page 21: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/21.jpg)
Feature Discovery in LC-MS data
Features are first found in each mass spectrum, then grouped together across multiple spectra
[Figure: 2D views (monoisotopic mass, 0 to 6000, vs. scan #, 500 to 3500) of an LC-MS dataset in different stages of processing. Panels: raw data; collapsed monoisotopic features in all spectra (after deisotoping); LC-MS features (after elution profile discovery)]
![Page 22: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/22.jpg)
Feature discovery in individual spectra
Deisotoping
• Process of converting a mass spectrum (m/z, intensity) into a list of species (mass, abundance, charge)
Deisotoping a mass spectrum of 4 overlapping species

| charge | Monoisotopic MW | abundance |
|---|---|---|
| 2 | 1546.856603 | 533467 |
| 2 | 1547.705048 | 194607 |
| 2 | 1547.887682 | 671947 |
| 2 | 1548.799612 | 426939 |
![Page 23: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/23.jpg)
Deisotoping routine for a peak
Algorithm to detect peptides in a complex spectrum
[Flow diagram. Labeled elements: observed spectrum; charge detection algorithm² (charge = 2, avg. mass = 1876.02); Averagine³ (estimated empirical formula: C83 H124 N23 O25 S1); Mercury⁴ (theoretical spectrum); fitness value comparing the theoretical and observed spectra]
1. Horn, D.M., Zubarev, R.A., McLafferty, F.W. Automated Reduction and Interpretation of High Resolution Electrospray Mass Spectra of Large Molecules. J. Am. Soc. Mass Spectrom. 2000, 11, 320-332.
2. Senko, M. W.; Beu, S. C.; McLafferty, F. W. Automated assignment of charge states from resolved isotopic peaks for multiply charged ions. J. Am. Soc. Mass Spectrom. 1995, 6, 52–56.
3. Senko, M. W.; Beu, S. C.; McLafferty, F. W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 1995, 6, 229–233.
4. Rockwood, A. L.; Van Orden, S. L.; Smith, R. D. Rapid Calculation of Isotope Distributions. Anal. Chem. 1995, 67, 2699–2704.
![Page 24: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/24.jpg)
Deisotoping entire spectrum – Modification of THRASH¹
[Flowchart. Boxes, in reading order: spectrum; calculate background intensity; find peaks in spectrum; choose most abundant peak (m/z of peak = mz); S/N, intensity > thresholds? (no: done); determine its charge (charge = CS); guess empirical formula CnHmNxOySz for mass = (mz − 1.00782) × CS; generate theoretical profile, initialize fit = ∞; calculate fit score; shift theoretical profile by +1 Da, calculate fit, and repeat while fit improves (otherwise unshift theoretical profile); shift theoretical profile by −1 Da, calculate fit, and repeat while fit improves; fit better than threshold? (yes: delete isotopic peaks from peak list, points in spectrum, & add to deisotoped results); delete isotopic peaks from peak list & profile in spectrum]
1. Horn, D.M., Zubarev, R.A., McLafferty, F.W. Automated Reduction and Interpretation of High Resolution Electrospray Mass Spectra of Large Molecules. J. Am. Soc. Mass Spectrom. 2000, 11, 320-332.
![Page 25: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/25.jpg)
Modified THRASH Routine
Algorithm to detect peptides in a complex spectrum
1. Discover all peaks in a spectrum above a specified S/N and keep them in an unprocessed list
2. Select the most abundant peak still unprocessed
3. Compute the charge for the peak using the charge detection algorithm
4. Compute the average mass from the observed m/z and the predicted charge value
5. Use the “Averagine” algorithm to guess an empirical formula based on the mass and the average composition of peptides in the database
6. Use the Mercury algorithm to generate a theoretical spectrum from the predicted empirical formula, charge of peak, and resolution of peak
7. Calculate a fitness value for the similarity between the theoretical and observed spectra
8. Perform “THRASHING” by overlaying the theoretical and observed spectra after applying an “isotopic” one-dalton shift to the theoretical spectrum. Keep the best fit
9. If a successful fit was observed, delete the isotopic peaks and the associated profile of height above a specified threshold of the most intense ions, using the theoretical pattern as template. Otherwise remove only the current peak from the list of unprocessed peaks.
10. While unprocessed peaks remain, repeat from step 2.
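The control flow of the ten steps can be sketched in a few lines of Python. This is a minimal illustration, not the Decon2LS implementation: the names `thrash`, `toy_charge`, and `toy_fit` are hypothetical, and the two helper functions are deliberately crude stand-ins for the charge detection (step 3) and the Averagine/Mercury/fit scoring (steps 5 to 8). The isotope spacing of 1.00335 Da and the choice of the lowest-m/z member as the monoisotopic peak are also assumptions made for the sketch.

```python
PROTON = 1.00728    # Da, proton mass (value used elsewhere in the slides)
NEUTRON = 1.00335   # Da, approximate isotope spacing (assumption)

def toy_charge(peak, peaks, max_charge=5, tol=0.01):
    """Stand-in for step 3: highest charge z that has a neighbour 1/z away."""
    for z in range(max_charge, 0, -1):
        if any(abs(abs(q["mz"] - peak["mz"]) - NEUTRON / z) <= tol for q in peaks):
            return z
    return 0  # no isotopic neighbour found

def toy_fit(peak, charge, peaks, tol=0.01, max_isotopes=4):
    """Stand-in for steps 5-8: collect peaks on the 1/z ladder around `peak`."""
    step = NEUTRON / charge
    members = []
    for q in peaks:
        k = round((q["mz"] - peak["mz"]) / step)
        if abs(k) <= max_isotopes and abs(q["mz"] - peak["mz"] - k * step) <= tol:
            members.append(q)
    fit = 0.0 if len(members) >= 2 else 1.0  # toy score: lower is better
    return fit, members

def thrash(peaks, min_sn=3.0, max_fit=0.15,
           detect_charge=toy_charge, fit_pattern=toy_fit):
    """Greedy deisotoping loop following steps 1-10 of the routine."""
    unprocessed = [p for p in peaks if p["sn"] >= min_sn]           # step 1
    results = []
    while unprocessed:                                              # step 10
        peak = max(unprocessed, key=lambda p: p["intensity"])       # step 2
        charge = detect_charge(peak, unprocessed)                   # step 3
        if charge:
            fit, members = fit_pattern(peak, charge, unprocessed)   # steps 4-8
        else:
            fit, members = float("inf"), [peak]
        if fit <= max_fit:                                          # step 9: success
            mono_mz = min(q["mz"] for q in members)  # assumption: lowest m/z is monoisotopic
            results.append({"mass": (mono_mz - PROTON) * charge,
                            "charge": charge,
                            "abundance": peak["intensity"],
                            "fit": fit})
            remove = {id(q) for q in members}
        else:                                                       # step 9: failure
            remove = {id(peak)}
        unprocessed = [q for q in unprocessed if id(q) not in remove]
    return results
```

Passing real charge-detection and fit-scoring functions through `detect_charge` and `fit_pattern` would keep the loop unchanged, which is the point of the sketch.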
![Page 26: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/26.jpg)
Charge Detection Algorithm
Patterson (autocorrelation) algorithm to detect charge of a peak in a complex spectrum
$P(\Delta mz) = \sum_i I(mz_i)\, I(mz_i + \Delta mz)$
[Figure: observed spectrum (intensity vs. m/z, 938.5 to 941.5); correlation vs. Δm/z (0.2 to 1.0); correlation vs. charge (2 to 10)]
1. Zhang, Z; Marshall, A.G. A Universal Algorithm for Fast and Automated Charge State Deconvolution of Electrospray Mass-to-Charge Ratio Spectra. J. Am. Soc. Mass Spectrom. 1998, 9, 225-233.
2. Senko, M. W.; Beu, S. C.; McLafferty, F. W. Automated assignment of charge states from resolved isotopic peaks for multiply charged ions. J. Am. Soc. Mass Spectrom. 1995, 6, 52–56.
3. Labowsky, M; Whitehouse, C.; Fenn, J.B. Rapid Commun. Mass Spectrom. 1993, 7, 71-84.
4. Reinhold, B.B.; Reinhold, V.N. Electrospray Ionization Mass Spectrometry: Deconvolution by an Entropy-Based Algorithm. J. Am. Soc. Mass Spectrom. 1992, 3, 207-215.
5. Mann, M.; Meng, C.K.; Fenn, J.B. Interpreting Mass Spectra of Multiply Charged Ions. Analytical Chemistry. Aug. 1, 1989, 61, 1702-1708.
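A small sketch of the autocorrelation idea, assuming the spectrum has already been resampled onto an evenly spaced m/z grid. The function names, the 1.00335 Da isotope spacing, and the simple "largest score wins" rule are assumptions for illustration; a production routine would typically add significance checks before accepting a charge.

```python
import math

ISOTOPE_SPACING = 1.00335  # Da, approximate 13C - 12C mass difference (assumption)

def autocorrelation(intensity, lag):
    """P(lag) = sum_i I[i] * I[i + lag] on an evenly spaced m/z grid."""
    return sum(a * b for a, b in zip(intensity, intensity[lag:]))

def detect_charge(intensity, step, max_charge=10):
    """Score each candidate charge z at the lag matching an isotope spacing of 1/z."""
    scores = {}
    for z in range(1, max_charge + 1):
        lag = round(ISOTOPE_SPACING / z / step)
        scores[z] = autocorrelation(intensity, lag)
    best = max(scores, key=scores.get)
    return best, scores

def synthetic_profile(mz0, charge, heights, step, width=0.02, span=3.0):
    """Gaussian isotopic peaks starting at mz0, sampled every `step` m/z units."""
    n = int(span / step)
    grid = [mz0 - 0.5 + i * step for i in range(n)]
    centers = [mz0 + k * ISOTOPE_SPACING / charge for k in range(len(heights))]
    return [sum(h * math.exp(-0.5 * ((mz - c) / width) ** 2)
                for h, c in zip(heights, centers))
            for mz in grid]
```

For example, `detect_charge(synthetic_profile(939.0, 2, [1.0, 0.95, 0.5, 0.2, 0.05], 0.005), 0.005)` scores charges 1 through 10 for a synthetic doubly charged pattern. Note that the peak width limits how high a charge can be distinguished, because very small lags overlap each peak with itself.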
![Page 27: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/27.jpg)
Averagine Algorithm
Algorithm to guess an average empirical formula for a given mass
• Uses average composition of all peptides in peptide database as the empirical formula for all peptides
• Protein database Averagine formula: C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, Mass = 111.1254
• Average Mass of 1877.025 would give a multiplier of 1877.025/111.1254

| Element | Copies | Atomicity |
|---|---|---|
| C | 4.9834*1877.025/111.1254 | 84 |
| H | Remainder | 112 |
| N | 1.3577*1877.025/111.1254 | 23 |
| O | 1.4773*1877.025/111.1254 | 25 |
| S | 0.0417*1877.025/111.1254 | 1 |

Empirical formula used for theoretical profile = C83 H112 N23 O25 S1
1. Senko, M. W.; Beu, S. C.; McLafferty, F. W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 1995, 6, 229–233.
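The scaling step can be sketched as follows, using the Averagine coefficients quoted above (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, mass 111.1254). The function name, the average atomic masses, rounding C/N/O/S to the nearest integer, and assigning the leftover mass to hydrogen by truncation are all assumptions of this sketch; the hydrogen count in particular is sensitive to those choices.

```python
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # Da, mass of one Averagine unit

# Standard average atomic masses (assumption; not taken from the slides)
AVG_ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

def averagine_formula(average_mass):
    """Estimate an empirical formula for a peptide of the given average mass."""
    units = average_mass / AVERAGINE_MASS            # multiplier from the slide
    formula = {el: round(AVERAGINE[el] * units) for el in "CNOS"}
    heavy = sum(AVG_ATOMIC_MASS[el] * n for el, n in formula.items())
    # Hydrogen takes up whatever mass remains (truncated to whole atoms).
    formula["H"] = max(0, int((average_mass - heavy) / AVG_ATOMIC_MASS["H"]))
    return formula
```

Under these assumptions, `averagine_formula(1877.025)` gives 83 carbons, 23 nitrogens, 25 oxygens, 1 sulfur, and 124 hydrogens.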
![Page 28: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/28.jpg)
Theoretical Isotopic Profile
Mercury algorithm to generate a theoretical profile for a compound
• Treat each element’s isotopic distribution as a sum of delta (δ) functions

| Element | Isotope distribution function |
|---|---|
| Carbon | 0.98893 δ(m-12) + 0.01107 δ(m-13.00336) |
| Hydrogen | 0.99985 δ(m-1.007825) + 0.00015 δ(m-2.014102) |
| Nitrogen | 0.996337 δ(m-14.00307) + 0.003663 δ(m-15.00011) |
| Oxygen | 0.99759 δ(m-15.99491) + 0.000374 δ(m-16.99913) + 0.002036 δ(m-17.99916) |
| Sulphur | 0.9502 δ(m-31.97207) + 0.0075 δ(m-32.97145) + 0.0421 δ(m-33.96786) + 0.0002 δ(m-35.96708) |

In each term, the coefficient is the relative isotope abundance and the value subtracted from m is the isotope mass.
1. Rockwood, A. L.; Van Orden, S. L.; Smith, R. D. Rapid Calculation of Isotope Distributions. Anal. Chem. 1995, 67, 2699–2704.
2. Kubinyi, H. Calculation of isotope distributions in mass spectrometry. A trivial solution for a non-trivial problem. Analytica Chimica Acta. 1991, 247, 107-119.
![Page 29: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/29.jpg)
Theoretical Isotopic Profile
Mercury algorithm to generate a theoretical profile for a compound
• Treat each element’s isotopic distribution as a sum of delta (δ) functions
• Convert distribution function into frequency domain: delta functions convert to simple exponential functions

| Element | Frequency spectrum function $f_{Elem}(\mu)$ |
|---|---|
| Carbon | $0.98893\,e^{12(i2\pi)\mu} + 0.01107\,e^{13.00336(i2\pi)\mu}$ |
| Hydrogen | $0.99985\,e^{1.007825(i2\pi)\mu} + 0.00015\,e^{2.014102(i2\pi)\mu}$ |
| Nitrogen | $0.996337\,e^{14.00307(i2\pi)\mu} + 0.003663\,e^{15.00011(i2\pi)\mu}$ |
| Oxygen | $0.99759\,e^{15.99491(i2\pi)\mu} + 0.000374\,e^{16.99913(i2\pi)\mu} + 0.002036\,e^{17.99916(i2\pi)\mu}$ |
| Sulphur | $0.9502\,e^{31.97207(i2\pi)\mu} + 0.0075\,e^{32.97145(i2\pi)\mu} + 0.0421\,e^{33.96786(i2\pi)\mu} + 0.0002\,e^{35.96708(i2\pi)\mu}$ |
![Page 30: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/30.jpg)
Theoretical Isotopic Profile
Mercury algorithm to generate a theoretical profile for a compound
• Treat each element’s isotopic distribution as a sum of delta (δ) functions
• Convert distribution function into frequency domain: delta functions convert to simple exponential functions.
• Calculate the isotopic profile for a compound from the convolution of isotopic distributions of individual atoms and the imposition of a peak shape reflecting resolution of instrument
• Compute convolution using multiplication in the frequency domain and by applying a Fourier transform
For the empirical formula CnHmNxOySz:
$F(m) = \mathrm{FT}\left[\, s(\mu)\, f_C(\mu)^n\, f_H(\mu)^m\, f_N(\mu)^x\, f_O(\mu)^y\, f_S(\mu)^z \,\right]$
where $f_C, f_H, f_N, f_O, f_S$ are the frequency spectra of the elements and $s(\mu)$ is the inverse transform of the shape function.
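The "multiply in the frequency domain, then transform back" idea can be illustrated with a simplified sketch. To stay short it makes several assumptions that the Mercury algorithm itself does not: isotopes are placed on a nominal (integer) mass grid counted as extra mass units above the lightest isotope, the peak-shape term s(μ) is omitted, and a plain discrete Fourier transform from the standard library replaces a fast FFT. The function name `isotope_pattern` is hypothetical; the abundances are the ones listed on the slides.

```python
import cmath

# Relative isotope abundances from the slides, keyed by the number of
# nominal mass units above the lightest isotope of each element.
ISOTOPES = {
    "C": {0: 0.98893, 1: 0.01107},
    "H": {0: 0.99985, 1: 0.00015},
    "N": {0: 0.996337, 1: 0.003663},
    "O": {0: 0.99759, 1: 0.000374, 2: 0.002036},
    "S": {0: 0.9502, 1: 0.0075, 2: 0.0421, 4: 0.0002},
}

def isotope_pattern(formula, n_points=64):
    """Probability of each nominal mass offset (0, 1, 2, ...) for a formula.

    Each element's distribution is transformed to the frequency domain,
    raised to the power of its atom count, and the product is transformed
    back. n_points must exceed the range of offsets with non-negligible
    probability, otherwise the result wraps around.
    """
    spectrum = []
    for k in range(n_points):
        value = 1 + 0j
        for element, count in formula.items():
            f_elem = sum(p * cmath.exp(-2j * cmath.pi * k * shift / n_points)
                         for shift, p in ISOTOPES[element].items())
            value *= f_elem ** count      # convolution becomes a power
        spectrum.append(value)
    # Inverse discrete Fourier transform back to the mass-offset domain
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n_points)
             for k in range(n_points)) / n_points).real
        for m in range(n_points)
    ]
```

Calling `isotope_pattern({"C": 83, "H": 124, "N": 23, "O": 25, "S": 1})` returns a list whose first entry is the monoisotopic fraction and whose following entries are the +1, +2, ... isotopic peaks.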
![Page 31: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/31.jpg)
Fit Functions
Fit functions to quantitate quality of match between theoretical and observed profiles
• Least square area¹: $\sum_i (t_i - o_i)^2 / \sum_i t_i^2$
• Least square peak: $\sum_j (T_j - O_j)^2 / \sum_j T_j^2$
• Chi-square area: $\sum_i (t_i - o_i)^2 / \sum_i t_i$
• Chi-square peaks²: $\sum_j (T_j - O_j)^2 / \sum_j T_j$
$t_i$: theoretical intensity of ith point
$o_i$: observed intensity of ith point (after normalizing)
$T_j$: theoretical intensity of jth “isotopic” peak
$O_j$: observed intensity of jth “isotopic” peak
[Figure label: threshold intensity for points to be scored]
1. Horn, D.M., Zubarev, R.A., McLafferty, F.W. Automated Reduction and Interpretation of High Resolution Electrospray Mass Spectra of Large Molecules. J. Am. Soc. Mass Spectrom. 2000, 11, 320-332.
2. Senko, M. W.; Beu, S. C.; McLafferty, F. W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 1995, 6, 229–233.
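The fit scores are straightforward to compute once the theoretical and observed intensities are paired up. The sketch below assumes the two sequences are already aligned point by point (or peak by peak) and that the observed values have already been normalized; the function names and the way the scoring threshold is applied (skipping pairs whose theoretical intensity is below it) are assumptions for illustration.

```python
def least_squares_fit(theoretical, observed, threshold=0.0):
    """Sum of squared differences divided by the sum of squared theoretical intensities."""
    pairs = [(t, o) for t, o in zip(theoretical, observed) if t >= threshold]
    return sum((t - o) ** 2 for t, o in pairs) / sum(t * t for t, _ in pairs)

def chi_square_fit(theoretical, observed, threshold=0.0):
    """Sum of squared differences divided by the sum of theoretical intensities."""
    pairs = [(t, o) for t, o in zip(theoretical, observed) if t >= threshold]
    return sum((t - o) ** 2 for t, o in pairs) / sum(t for t, _ in pairs)
```

The same two functions cover both the "area" variants (pass every profile point) and the "peak" variants (pass only the isotopic peak apexes). In both cases a lower score means a closer match, and identical profiles score 0.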
![Page 32: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/32.jpg)
Chemical Labeling with Tags
Specify a static tag to be applied to Averagine formula
• Changes the Averagine formula generated
Processing steps shown:
1. Autocorrelation: charge = 1, average mass = 345.01
2. Subtract average mass of tag: charge = 1, mass = 265.1065
3. Calculate Averagine formula: C5 H174 N1 O1 S0
4. Add tag formula to Averagine formula: C5 H174 N1 O1 S0 Br1
[Figure: +TOF MS, 0.359 min from bromoadenosine.wiff (Agilent), scan #18. Intensity vs. m/z (346 to 349), with peaks labeled 346.01, 346.70, 347.02, and 348.01]
Interesting profile because of the isotopic distribution of bromine (51% 78.91833, 49% 80.91629)
![Page 33: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/33.jpg)
Chemical Labeling with Tags
Specify a static tag to be applied to Averagine formula
• Changes the Averagine formula generated
Processing steps shown:
1. Autocorrelation: charge = 1, average mass = 345.01
2. Subtract average mass of tag: charge = 1, mass = 265.1065
3. Calculate Averagine formula: C5 H174 N1 O1 S0
4. Add tag formula to Averagine formula: C5 H174 N1 O1 S0 Br1
[Figure: +TOF MS, 0.359 min from bromoadenosine.wiff (Agilent), scan #18, overlaid with the theoretical distribution for C5 H174 N1 O1 Br1. Intensity vs. m/z (346 to 349), with peaks labeled 346.02, 347.02, 348.02, and 349.02]
![Page 34: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/34.jpg)
16O/18O Mixtures
Overlapping isotope patterns separated by 4 Da
If peaks exist 4 Da before current peak, those are processed first, and only the first four isotopic peaks are removed
[Figure: intensity vs. m/z (657 to 661). Labeled peaks: 656.84, 657.34, 657.84, 658.35, 658.59, 658.85, 659.35, 659.85, 660.35. Annotated spacings between adjacent isotopic peaks: d = 0.501 to 0.502; other annotated spacings: d = 1.002 and d = 1.022]
![Page 35: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/35.jpg)
Isotopic Composition
Changing natural abundances
![Page 36: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/36.jpg)
Isotopic Composition
Changing natural abundances
13C, 15N depleted media: the isotopic composition of atoms is different from that found in nature. The distribution of sulfur isotopes dominates the distribution shown below
[Figure: observed spectrum, intensity vs. m/z (890.3 to 890.7). Labeled peaks: 890.25, 890.32, 890.38, 890.45, 890.50, 890.58, 890.63, 890.70. Annotated spacings: d = 0.062, 0.065, 0.056, 0.071]
[Figure: isotopic distribution of a peptide with similar mass and charge (16+), but with natural isotopic distribution of atoms. Relative intensity vs. m/z (890.3 to 891.5)]
![Page 37: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/37.jpg)
Isotopic Composition
Changing natural abundances
• Changing the 12C/13C and 14N/15N isotopic abundances from those in nature to appropriate ones results in a better fit
• As shown, the estimated isotopic abundances were still not perfect
[Figure: intensity vs. m/z (890.3 to 890.7). Labeled peaks: 890.32, 890.38, 890.45, 890.51, 890.57, 890.63, 890.70, 890.76. Annotated spacings: d = 0.062, 0.065, 0.056, 0.071, 0.054, 0.073]
![Page 38: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/38.jpg)
LC-MS Feature Discovery
• Black dots indicate individual m/z values
• Green dots signify successfully deisotoped data
• Shades of red indicate data intensity
![Page 39: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/39.jpg)
Course Outline
• Introduction
• Feature discovery in LC-MS datasets
  • Feature discovery in individual spectra
  • Feature definition over elution time
• Identifying LC-MS Features using an AMT tag DB
• Break
• AMT tag Pipeline Demo
• MAPQUANT, PEPPeR and GenePattern
• Panel Discussion
![Page 40: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/40.jpg)
Feature definition over elution timeDeisotoping collapses original data into data lists
Goal: Given series of deisotoped mass spectra, group related data across elution time
Look for repeated monoisotopic mass values in sequential spectra, allowing for missing dataCan also look for expected chromatographic peak shape
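| scan num | charge | abundance | mz | fit | average mw | monoiso mw | most abu. mw | fwhm | signal noise |
|---|---|---|---|---|---|---|---|---|---|
| 1500 | 2 | 451007 | 936.9389 | 0.0296 | 1873.091 | 1871.863 | 1872.8662 | 0.0156 | 46.74 |
| 1500 | 2 | 503993 | 757.8854 | 0.0706 | 1513.762 | 1512.753 | 1512.7533 | 0.0105 | 80.4 |
| 1500 | 2 | 569896 | 591.8343 | 0.0198 | 1182.379 | 1181.654 | 1181.6541 | 0.0065 | 111.2 |
| 1500 | 2 | 630657 | 689.3645 | 0.0446 | 1377.64 | 1376.715 | 1376.7145 | 0.0088 | 57.52 |
| 1500 | 1 | 661477 | 730.1117 | 0.024 | 729.5461 | 729.1045 | 729.1045 | 0.0096 | 39.06 |
| 1500 | 2 | 663954 | 642.3243 | 0.0253 | 1283.417 | 1282.634 | 1282.6341 | 0.0076 | 109.01 |
| 1500 | 2 | 734070 | 688.392 | 0.0384 | 1375.694 | 1374.77 | 1374.7695 | 0.009 | 92.09 |
| 1500 | 3 | 988761 | 675.0246 | 0.02 | 2023.375 | 2022.052 | 2023.0549 | 0.0086 | 79.22 |
| 1500 | 1 | 1213607 | 943.9815 | 0.1025 | 943.5518 | 942.9742 | 942.9742 | 0.0165 | 120.36 |
| 1500 | 2 | 2297822 | 563.3253 | 0.012 | 1125.322 | 1124.636 | 1124.636 | 0.006 | 77.94 |
| 1500 | 1 | 2422829 | 864.4919 | 0.0156 | 864.0073 | 863.4846 | 863.4846 | 0.0137 | 74.75 |
| 1500 | 1 | 2614913 | 1103.033 | 0.1111 | 1102.698 | 1102.026 | 1102.026 | 0.0222 | 74.04 |
| 1500 | 1 | 2772933 | 759.0649 | 0.0716 | 758.5222 | 758.0576 | 758.0576 | 0.0106 | 718.83 |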
![Page 41: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/41.jpg)
Feature definition over elution time
Can visualize deisotoped data in two dimensions
[Figure: deisotoped data plotted as mass vs. time; S. typhimurium dataset on 11T FTICR]
![Page 42: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/42.jpg)
Feature definition over elution time
Charge state view
• Plotting monoisotopic mass, but color is based on the charge of the original data point seen
• Monoisotopic Mass = (m/z × charge) - 1.00728 × charge
[Figure: charge state view, mass vs. time]
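The mass conversion used for the charge state view can be sketched in a few lines. This is a minimal illustration, not code from any of the tools discussed; the function name is hypothetical, and the input is an illustrative 2+ ion at m/z 936.9389.

```python
PROTON_MASS = 1.00728  # Da, the per-charge value quoted on the slide


def monoisotopic_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass from the m/z of a monoisotopic peak."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - PROTON_MASS * charge


# Illustrative input: a 2+ ion observed at m/z 936.9389
mass_2plus = monoisotopic_mass(936.9389, 2)
print(f"{mass_2plus:.3f} Da")
```

Because the result is a neutral mass, the 2+ and 3+ observations of one species land on the same horizontal line in the mass vs. time plot.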
![Page 43: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/43.jpg)
Feature definition over elution time
Zoom-in view of species
[Figure: zoomed view, mass vs. time]
![Page 44: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/44.jpg)
Feature definition over elution time
• The same species in multiple spectra needs to be grouped together
• Related peaks are found using a weighted Euclidean distance, which considers:
  • Mass
  • Abundance
  • Elution time
  • Isotopic fit
![Page 45: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/45.jpg)
Feature definition over elution time
• Grouping uses single linkage clustering
• Form connections between data points in n dimensions
• Compute the Euclidean distance between two points a and b:

$$\text{distance} = \sqrt{\left[w_{mass}\,(mass_a - mass_b)\right]^2 + \left[w_{abu}\,(\log Abu_a - \log Abu_b)\right]^2 + \left[w_{ET}\,(ET_a - ET_b)\right]^2 + \left[w_{fit}\,(fit_a - fit_b)\right]^2}$$

• If distance < threshold, combine the points together
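The grouping rule above can be sketched as single linkage clustering with a union-find structure. This is a minimal sketch under stated assumptions: the weights, the threshold, the base-10 logarithm, and the four example peaks are all hypothetical values chosen for illustration, not settings from any of the tools.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    mass: float       # monoisotopic mass (Da)
    abundance: float  # intensity, must be > 0
    et: float         # elution time (scan number here)
    fit: float        # isotopic fit score


def distance(a: Peak, b: Peak, w: dict) -> float:
    """Weighted Euclidean distance over mass, log abundance, ET and fit."""
    return math.sqrt(
        (w["mass"] * (a.mass - b.mass)) ** 2
        + (w["abu"] * (math.log10(a.abundance) - math.log10(b.abundance))) ** 2
        + (w["et"] * (a.et - b.et)) ** 2
        + (w["fit"] * (a.fit - b.fit)) ** 2
    )


def single_linkage(peaks, w, threshold):
    """Join any two peaks closer than threshold; return groups of indices."""
    parent = list(range(len(peaks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            if distance(peaks[i], peaks[j], w) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(peaks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical weights and data; scan 1752 is deliberately missing.
weights = {"mass": 100.0, "abu": 0.1, "et": 0.5, "fit": 0.1}
peaks = [
    Peak(1904.940, 1.0e6, 1750, 0.05),
    Peak(1904.941, 1.2e6, 1751, 0.04),
    Peak(1904.939, 9.0e5, 1753, 0.06),
    Peak(2068.178, 2.0e6, 1751, 0.03),
]
clusters = single_linkage(peaks, weights, threshold=1.5)
print(clusters)
```

In this example the first and third peaks are slightly farther apart than the threshold, yet they end up in one group because each is close to the second peak. That chaining is what lets single linkage tolerate missing scans.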
![Page 46: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/46.jpg)
Feature definition over elution time
• Determine 6 separate groups
• Typically require 2 or 3 points per group
![Page 47: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/47.jpg)
Feature definition over elution time
Feature detail
• Median Mass: 1904.9399 Da (more tolerant to outliers than average)
• Elution Time: Scan 1757 (0.363 NET)
• Abundance: 1.7 × 10^7 counts (area under desired SIC)
• See both 2+ and 3+ data
• Stats typically come from the most abundant charge state (2+)
[Figure: monoisotopic mass (1,904.850 to 1,904.970) vs. scan number (1,740 to 1,790), points marked by charge (1, 2, 3), with a 5 ppm scale bar]
[Figure: selected ion chromatograms, abundance in counts (0 to 2.0E+6) vs. scan number, for 2+ data, 3+ data, and both]
![Page 48: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/48.jpg)
Feature definition over elution time
Second example
• LC-MS feature eluting over 7.5 minutes
• Clustering algorithm allows for missing data, which is common with chromatographic tailing
![Page 49: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/49.jpg)
Feature definition over elution time
Second example, feature detail
• Median Mass: 2068.1781 Da
• Elution Time: Scan 1809 (0.380 NET)
• Abundance: 8.7 × 10^7 counts (area under 3+ SIC)
• This example has primarily 3+ data; the previous example had an even mix of 2+ and 3+ data
[Figure: monoisotopic mass (2,068.075 to 2,068.195) vs. scan number (1,775 to 2,050), points marked by charge (1, 2, 3), with a 5 ppm scale bar]
[Figure: selected ion chromatograms, abundance in counts (0 to 4.0E+6) vs. scan number, for 2+ data, 3+ data, and both]
![Page 50: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/50.jpg)
Feature definition over elution time
Refining the features
• Require that the data spans at least 3 spectra
• Exclude a grouped feature if it is too long (e.g. ≥ 15% of dataset)
• Sometimes the Euclidean distance results in undesirable clustering
  • Split if the elution profile indicates two or more entities with a mass difference ≥ threshold (e.g. 4 ppm)
  • Necessary since it is hard to define clustering weights and distance constraints that work in all situations
[Figure: monoisotopic mass (1,612.650 to 1,612.770) vs. scan number (1,540 to 1,615), with abundance trace (0 to 4.0E+6); annotated with a 9 ppm mass difference]
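The refinement rules can be sketched as two small helpers. This is a sketch, assuming that "at least 3 spectra" means three distinct scans and that feature length is measured as the scan span; the function names and example masses are hypothetical.

```python
def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Absolute mass difference expressed in parts per million of mass_b."""
    return abs(mass_a - mass_b) / mass_b * 1e6


def keep_feature(scans, total_scans, min_spectra=3, max_fraction=0.15):
    """Apply the two filters: enough spectra, and not too long."""
    distinct = sorted(set(scans))
    if len(distinct) < min_spectra:
        return False
    span = distinct[-1] - distinct[0] + 1
    return span < max_fraction * total_scans


def should_split(mass_a, mass_b, threshold_ppm=4.0):
    """Two co-clustered entities are split if their masses differ enough."""
    return ppm_difference(mass_a, mass_b) >= threshold_ppm


# Hypothetical features in a 3360-spectrum dataset
short_ok = keep_feature([1540, 1541, 1543], total_scans=3360)
too_few = keep_feature([1540, 1541], total_scans=3360)
too_long = keep_feature(range(100, 700), total_scans=3360)
split = should_split(1612.715, 1612.700)
print(short_ok, too_few, too_long, split)
```

A real implementation would also need to locate the two entities in the elution profile before comparing their masses; that step is omitted here.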
![Page 51: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/51.jpg)
Feature definition over elution time
Example: S. typhimurium dataset on 11T FTICR
• 100 minute LC-MS analysis (3360 mass spectra)
• 67 cm, 150 μm I.D. column with 5 μm C18 particles
• 78,641 deisotoped peaks
• Grouped into 5910 LC-MS features
![Page 52: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/52.jpg)
Isotopic Pairs Processing
• Paired features typically have identical sequences, with and without an isotopic label
  • e.g. 16O/18O pairs or 14N/15N pairs
• Data prior to finding features
[Figure: LC-FTICR-MS data; Control (16O water) and Perturbed (18O water)]
![Page 53: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/53.jpg)
Isotopic Pairs Processing
• Data after finding paired features
• 4 Da pair spacing due to incorporation of two 18O atoms
[Figure: LC-FTICR-MS data; Control (16O water) and Perturbed (18O water)]
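The 4 Da spacing follows from the mass difference between 18O and 16O, and a pair search can be sketched as a lookup for that spacing among co-eluting features. This is an illustrative sketch only: the tolerances and the three example features are hypothetical, and the isotope masses are standard reference values rather than numbers from the slides.

```python
O18_MINUS_O16 = 17.9991596 - 15.9949146  # Da per oxygen atom
PAIR_DELTA = 2 * O18_MINUS_O16           # two 18O atoms, about 4.0085 Da


def find_pairs(features, delta=PAIR_DELTA, mass_tol=0.02, max_scan_gap=5):
    """Return (light_index, heavy_index) pairs.

    features: list of (monoisotopic_mass, center_scan) tuples.
    mass_tol (Da) and max_scan_gap (scans) are hypothetical tolerances.
    """
    pairs = []
    for i, (m_light, s_light) in enumerate(features):
        for j, (m_heavy, s_heavy) in enumerate(features):
            if i == j:
                continue
            spacing_ok = abs((m_heavy - m_light) - delta) <= mass_tol
            coelute = abs(s_heavy - s_light) <= max_scan_gap
            if spacing_ok and coelute:
                pairs.append((i, j))
    return pairs


# Hypothetical features: (monoisotopic mass, center scan)
features = [(1237.6100, 2710), (1241.6186, 2711), (1300.0000, 2712)]
pairs = find_pairs(features)
print(round(PAIR_DELTA, 4), pairs)
```

The co-elution check suits 16O/18O labeling; labels that shift retention time would need a wider scan window.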
![Page 54: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/54.jpg)
Isotopic Pairs Processing
Paired feature example: 16O/18O data
• Pair #424; charge used = 2; pair spacing 4.0085 Da
  • AR = 1.78 (light area ÷ heavy area); or
  • AR = 1.34 ± 0.2 (scan-by-scan)
• Pair #460; charge used = 2; pair spacing 4.0085 Da
  • AR = 0.13 (light area ÷ heavy area); or
  • AR = 0.12 ± 0.02 (scan-by-scan)
• Compute AR using the ratio of areas, or
• Compute AR scan-by-scan, then average the AR values (members must co-elute)
[Figure, pair #424: monoisotopic mass (1,235.0 to 1,247.0) vs. scan number (2,688 to 2,736), and abundance (0 to 2.0E+05) vs. scan number (2700 to 2730)]
[Figure, pair #460: monoisotopic mass (1,279.0 to 1,291.0) vs. scan number (3,010 to 3,058), and abundance (0 to 4.0E+06) vs. scan number (3010 to 3070)]
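The two ways of computing AR can be compared on a small example. This is a sketch under stated assumptions: the area under each selected ion chromatogram is approximated by the sum of abundances (which presumes evenly spaced scans), and the abundance values are invented for illustration.

```python
from statistics import mean, stdev


def area_ratio(light: dict, heavy: dict) -> float:
    """AR from bulk areas; each dict maps scan number -> abundance."""
    return sum(light.values()) / sum(heavy.values())


def scan_by_scan_ratio(light: dict, heavy: dict):
    """Mean and standard deviation of per-scan light/heavy ratios.

    Only scans where both members were observed are used, which is why
    the pair members must co-elute for this estimate to be meaningful.
    """
    shared = sorted(set(light) & set(heavy))
    ratios = [light[s] / heavy[s] for s in shared if heavy[s] > 0]
    if len(ratios) < 2:
        raise ValueError("need at least two shared scans")
    return mean(ratios), stdev(ratios)


# Hypothetical abundances for a light/heavy pair
light = {2700: 2.0e4, 2701: 6.0e4, 2702: 3.0e4}
heavy = {2700: 1.0e4, 2701: 4.0e4, 2702: 2.5e4}

ar_area = area_ratio(light, heavy)
ar_mean, ar_sd = scan_by_scan_ratio(light, heavy)
print(f"area: {ar_area:.2f}, scan-by-scan: {ar_mean:.2f} +/- {ar_sd:.2f}")
```

The two estimates differ because the area ratio is dominated by the most intense scans, while the scan-by-scan average weights every shared scan equally.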
![Page 55: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/55.jpg)
Isotopic Pairs Processing
Paired feature example: 14N/15N data
• Pair members often do not co-elute
• Use the bulk area ratio, or re-align the pair members and then compute AR scan-by-scan
• AR = 1.17 (light area ÷ heavy area)
• Pair spacing: 30.9 Da, corresponding to 31 N atoms
• Matching AMT: GILSGEFDHIPEQAFYMVGSIDEAVEK
• Empirical formula: C134H201N31O44S
[Figure: monoisotopic mass (2,925.0 to 3,015.0) vs. scan number (1,695 to 1,731), with abundance trace (1.0E+6 to 5.0E+6)]
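The link between the 30.9 Da spacing and 31 nitrogen atoms can be checked by counting nitrogens in the matching peptide. This sketch assumes an unmodified peptide and uses standard isotope masses for 14N and 15N; the helper name is hypothetical.

```python
N15_MINUS_N14 = 15.0001089 - 14.0030740  # Da per nitrogen atom

# Side-chain nitrogens per residue; every residue also has one backbone N.
EXTRA_N = {"H": 2, "K": 1, "N": 1, "Q": 1, "R": 3, "W": 1}


def nitrogen_count(sequence: str) -> int:
    """Number of N atoms in an unmodified peptide."""
    return len(sequence) + sum(EXTRA_N.get(aa, 0) for aa in sequence)


peptide = "GILSGEFDHIPEQAFYMVGSIDEAVEK"
n_atoms = nitrogen_count(peptide)
shift = n_atoms * N15_MINUS_N14
print(n_atoms, round(shift, 1))
```

Unlike the fixed 4 Da spacing of 16O/18O pairs, the 14N/15N spacing depends on the sequence, so the pair search has to consider a range of possible nitrogen counts.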
![Page 56: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/56.jpg)
Feature definition over elution time
Numerous options for clustering data to form LC-MS features and for finding paired features
![Page 57: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/57.jpg)
Course Outline
• Introduction
• Feature discovery in LC-MS datasets
  • Feature discovery in individual spectra
  • Feature definition over elution time
• Identifying LC-MS Features using an AMT tag DB
• Break
• AMT tag Pipeline Demo
• MAPQUANT, PEPPeR and GenePattern
• Panel Discussion
![Page 58: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/58.jpg)
Assembling an AMT tag DB
• Accurate Mass and Time (AMT) tag
  • Unique peptide sequence whose monoisotopic mass and normalized elution time are accurately known
  • AMT tags also track any modified residues in the peptide
• AMT tag DB
  • Collection of AMT tags
• AMT tag approach articles
  • R.D. Smith et al., Proteomics 2002, 2, 513-523.
  • J.D.S. Zimmer, M.E. Monroe et al., Mass Spec. Reviews 2006, 25, 450-482.
  • L. Shi, J.N. Adkins et al., J. of Biological Chem. 2006, 281, 29131-29140.
![Page 59: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/59.jpg)
Assembling an AMT tag DB
What can we use an AMT tag DB for?
• Query LC-MS/MS data to answer questions
  • How many distinct peptides were observed passing filter criteria?
  • Which peptides were observed most often by LC-MS/MS?
  • How many proteins had 2 or more partially or fully tryptic peptides?
• Correlate LC-MS features to the AMT tags
  • Analyze multiple, related samples by LC-MS using a high mass accuracy mass spectrometer
    • e.g. Time course study, 5 data points with 3 points per sample
  • Characterize the LC-MS features
    • Deisotope to obtain monoisotopic mass and charge
    • Cluster in time dimension to obtain abundance information
  • Match to AMT tags to identify peptides
    • Align in mass and time dimensions
    • Match mass and time of LC-MS features to mass and time of AMT tags
![Page 60: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/60.jpg)
Assembling an AMT tag DB
Characterizing AMT tags
• Analyze samples by LC-MS/MS
  • 10 minute to 180 minute LC separations
  • Obtain 1000's of MS/MS fragmentation spectra for each sample
• Analyze spectra using SEQUEST, X!Tandem, etc.
  • SEQUEST: http://www.thermo.com/bioworks/
  • X!Tandem: http://www.thegpm.org/TANDEM/index.html
  • R. Craig and R.C. Beavis, Bioinformatics 2004, 20, 1466-1467.
• Collate results
  • List of peptide and protein matches
![Page 61: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/61.jpg)
Assembling an AMT tag DB
AMT tag example
• R.VKHPSEIVNVGDEINVK.V
• Observed in scan 11195 of dataset #19 in an SCX fractionation series
  • 3+ species
  • Match 30 b/y ions
  • X!Tandem hyperscore = 80
  • X!Tandem Log(E_Value) = -5.9
[Figure: MS/MS spectrum from AID_STM_019_110804_19_LTQ_16Dec04_Earth_1004-10, scan 11195, RT 44.76, NL 2.79E5; relative abundance (0 to 100) vs. m/z (200 to 1200); annotated fragment ions y3 to y10 and b7++, b8++, b9++, b10++, b11++, b13++, b16++; labeled peaks include m/z 552.47, 774.25, 873.30 and 987.28]
![Page 62: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/62.jpg)
![Page 63: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/63.jpg)
Assembling an AMT tag DB
Align related datasets using elution times of observed peptides
• One option: utilize a NET prediction algorithm to create a theoretical dataset to align against
  • NET prediction uses position and ordering of amino acid residues to predict normalized elution time

| Peptide | X!Tandem Log(E_Value) | Elution Time (min) | Predicted NET |
|---|---|---|---|
| R.TFAISPGHMNQLRAESIPEAVIAGASALVLTSYLVR.C | -6.5 | 88.043 | 0.764 |
| R.KVAAQIPNGSTLFIDIGTTPEAVAHALLGHSNLR.I | -8.9 | 73.961 | 0.589 |
| K.KTGVLAQVQEALKGLDVR.E | -11.6 | 62.803 | 0.438 |
| K.RFNDDGPILFIHTGGAPALFAYHPHV.- | -7.3 | 62.583 | 0.519 |
| R.GIIKVGEEVEIVGIK.E | -8.2 | 53.003 | 0.415 |
| R.LVHGEEGLVAAKR.I | -8.8 | 36.915 | 0.224 |
| R.AARPAKYSYVDENGETK.T | -6.1 | 33.958 | 0.167 |

K. Petritis, L.J. Kangas, et al., Analytical Chemistry 2003, 75, 1039-1048.
K. Petritis, L.J. Kangas, et al., Analytical Chemistry 2006, 78, 5026-5039.
![Page 64: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/64.jpg)
Assembling an AMT tag DB
Align related datasets using elution times of observed peptides
[Figure: predicted NET (0 to 1) vs. elution time (20 to 100 minutes); linear fit y = 0.01081x - 0.1829, R² = 0.95]
• Example: 506 unique peptides used for alignment; Log(E_Value) ≤ -6
• Alignment yields NET values based on observed elution times
  • Observed NET = Slope × (Observed Elution Time) + Intercept
• VKHPSEIVNVGDEINVK
  • Elution time: 44.923 minutes
  • Predicted NET: 0.292
  • Observed NET: 0.303
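The conversion from elution time to observed NET can be reproduced with the fitted line from the plot. The least-squares helper below is a generic sketch (the function names are hypothetical and the three-point dataset is synthetic); only the slope, intercept, and the 44.923 minute elution time come from the slides.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def observed_net(elution_time, slope, intercept):
    """Observed NET = Slope x (Observed Elution Time) + Intercept."""
    return slope * elution_time + intercept


# Synthetic check of the fitting helper on perfectly linear data
demo_slope, demo_intercept = fit_line([20.0, 40.0, 60.0], [0.1, 0.3, 0.5])

# Slope and intercept from the regression shown on the slide
net = observed_net(44.923, slope=0.01081, intercept=-0.1829)
print(round(net, 3))
```

In practice the x values would be observed elution times and the y values predicted NETs for confidently identified peptides, such as the 506 peptides mentioned above.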
![Page 65: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/65.jpg)
Assembling an AMT tag DB
AMT tag example
• R.VKHPSEIVNVGDEINVK.V
• Observed in 7 (of 25) LC-MS/MS datasets in the SCX fractionation series
  • Analysis 1, scan 11195: 3+, hyperscore 80, Obs. NET 0.303
  • Analysis 2, scan 9945: 3+, hyperscore 69, Obs. NET 0.298
  • Analysis 3, scan 10905: 2+, hyperscore 74, Obs. NET 0.301
  • Analysis 4, scan 9667: 2+, hyperscore 77, Obs. NET 0.302
• Compute monoisotopic mass: 1876.0053 Da
• Average Normalized Elution Time: 0.3021 (StDev 0.0021)
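The monoisotopic mass quoted for this AMT tag can be recomputed from the sequence. This is a minimal sketch assuming an unmodified peptide; the residue masses are standard monoisotopic values rounded to five decimals, and the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # Da, added once for the peptide termini


def peptide_monoisotopic_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


amt_mass = peptide_monoisotopic_mass("VKHPSEIVNVGDEINVK")
print(f"{amt_mass:.4f} Da")
```

Since the mass is calculated from the sequence rather than measured, it is the same for every observation of the peptide; only the NET needs to be averaged across analyses.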
![Page 66: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/66.jpg)
Assembling an AMT tag DB
Mass and Time Tag Database
• Repository for AMT tags
  • Mass, elution time, modified residues, and supporting information for each AMT tag
• Allows samples of unknown composition to be matched quickly and efficiently, without needing to perform tandem MS
• Assembled by analyzing a control set of samples, cataloging each peptide identification until subsequent analyses no longer provide new identifications

| MT Tag ID | Peptide | LC-MS/MS Obs. Count | Calculated Monoisotopic Mass | Average Observed NET | Observed NET StDev |
|---|---|---|---|---|---|
| 36843675 | MYGHLKGEVA…QER | 8 | 2533.2304 | 0.557 | 0.005 |
| 36715875 | WVKVDGWDN…FER | 11 | 2590.2815 | 0.459 | 0.011 |
| 36609588 | HRDLLGATNP…TLR | 5 | 1960.0602 | 0.379 | 0.002 |
| 17683899 | SSALNTLTNQK | 3 | 1175.6146 | 0.235 | 0.005 |
| 1662039 | MTGRELKPHDR | 1 | 1338.6826 | 0.143 | 0.000 |
![Page 67: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/67.jpg)
Assembling an AMT tag DB
Mini AMT tag DB
• Database constructed from a relatively small number of datasets
  • e.g. 25 SCX fractionation samples from S. typhimurium, each analyzed by LC-MS/MS and then by X!Tandem
• Protein database: S_typhimurium_LT2_2004-09-19
  • 4550 proteins and 1.4 million residues
>STM1834 putative YebN family transport protein (yebN) {Salmonella typhimurium LT2}
MFAGGSDVFNGYPGQDVVMHFTATVLLAFGMSMDAFAASIGKGATLHKPKFSEALRTGLI
FGAVETLTPLIGWGLGILASKFVLEWNHWIAFVLLIFLGGRMIIEGIRGGSDEDETPLRR
HSFWLLVTTAIATSLDAMAVGVGLAFLQVNIIATALAIGCATLIMSTLGMMIGRFIGPML
GKRAEILGGVVLIGIGVQILWTHFHG
>STM1835 23S rRNA m1G745 methyltransferase (rrmA) {Salmonella typhimurium LT2}
MSFTCPLCHQPLTQINNSVICPQRHQFDVAKEGYINLLPVQHKRSRDPGDSAEMMQARRA
FLDAGHYQPLRDAVINLLRERLDQSATAILDIGCGEGYYTHAFAEALPGVTTFGLDVAKT
AIKAAAKRYSQVKFCVASSHRLPFADASMDAVIRIYAPCKAQELARVVKPGGWVVTATPG
PHHLMELKGLIYDEVRLHAPYTEQLDGFTLQQSTRLAYHMQLTAEAAVALLQMTPFAWRA
RPDVWEQLAASAGLSCQTDFNLHLWQRNR
![Page 68: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/68.jpg)
Assembling an AMT tag DB
Database Relationships (PK := Primary Key, FK := Foreign Key)
• Minimum information required: single table with Mass and NET
  • T_Mass_Tags: Mass_Tag_ID (PK), Peptide, Monoisotopic_Mass, NET
• Expanded schema:
  • T_Proteins: Ref_ID (PK), Reference, Description
  • T_Mass_Tags: Mass_Tag_ID (PK), Peptide, Monoisotopic_Mass
  • T_Mass_Tags_NET: Mass_Tag_ID (PK, FK1), Avg_GANET, Cnt_GANET, StD_GANET
  • T_Mass_Tags_to_Protein_Map: Mass_Tag_ID (PK, FK1), Ref_ID (PK, FK2)
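The expanded schema can be sketched with an in-memory database. Note the assumptions: SQLite (via Python's sqlite3 module) stands in for the Microsoft Access database used in the course, the column types are guesses, and the protein row is a placeholder; only the table names, column names, and the AMT tag values come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Proteins (
    Ref_ID      INTEGER PRIMARY KEY,
    Reference   TEXT,
    Description TEXT
);
CREATE TABLE T_Mass_Tags (
    Mass_Tag_ID       INTEGER PRIMARY KEY,
    Peptide           TEXT,
    Monoisotopic_Mass REAL
);
CREATE TABLE T_Mass_Tags_NET (
    Mass_Tag_ID INTEGER PRIMARY KEY REFERENCES T_Mass_Tags(Mass_Tag_ID),
    Avg_GANET   REAL,
    Cnt_GANET   INTEGER,
    StD_GANET   REAL
);
CREATE TABLE T_Mass_Tags_to_Protein_Map (
    Mass_Tag_ID INTEGER REFERENCES T_Mass_Tags(Mass_Tag_ID),
    Ref_ID      INTEGER REFERENCES T_Proteins(Ref_ID),
    PRIMARY KEY (Mass_Tag_ID, Ref_ID)
);
""")

# The protein row is a placeholder, not a real database entry.
conn.execute("INSERT INTO T_Proteins VALUES (1, 'EXAMPLE_PROTEIN', 'placeholder')")
conn.execute("INSERT INTO T_Mass_Tags VALUES (24847, 'VKHPSEIVNVGDEINVK', 1876.0053)")
conn.execute("INSERT INTO T_Mass_Tags_NET VALUES (24847, 0.3021, 7, 0.00211)")
conn.execute("INSERT INTO T_Mass_Tags_to_Protein_Map VALUES (24847, 1)")

# Look up AMT tags in a mass window, with their NET and parent protein.
rows = conn.execute("""
    SELECT mt.Peptide, mt.Monoisotopic_Mass, n.Avg_GANET, p.Reference
    FROM T_Mass_Tags AS mt
    JOIN T_Mass_Tags_NET AS n ON n.Mass_Tag_ID = mt.Mass_Tag_ID
    JOIN T_Mass_Tags_to_Protein_Map AS m ON m.Mass_Tag_ID = mt.Mass_Tag_ID
    JOIN T_Proteins AS p ON p.Ref_ID = m.Ref_ID
    WHERE mt.Monoisotopic_Mass BETWEEN ? AND ?
""", (1875.99, 1876.02)).fetchall()
print(rows)
```

The separate map table is what allows one peptide to point at several proteins, which the single-table minimum schema cannot express.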
![Page 69: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/69.jpg)
Assembling an AMT tag DB
Microsoft Access DB Relationships
Full schema to track individual peptide observations
• V_Filter_Set_Overview_Ex: Filter_Type, Filter_Set_ID, Extra_Info, Filter_Set_Name, Filter_Set_Description
• T_Analysis_Description: Job (PK), Dataset, Dataset_ID, Dataset_Created_DMS, Dataset_Acq_Time_Start, Dataset_Acq_Time_End, Dataset_Scan_Count, Experiment, Campaign, Organism, Instrument_Class, Instrument, Analysis_Tool, Parameter_File_Name, Settings_File_Name, Organism_DB_Name, Protein_Collection_List, Protein_Options_List, Completed, ResultType, Separation_Sys_Type, ScanTime_NET_Slope, ScanTime_NET_Intercept, ScanTime_NET_RSquared, ScanTime_NET_Fit
• T_Mass_Tags: Mass_Tag_ID (PK), Peptide, Monoisotopic_Mass, Multiple_Proteins, Created, Last_Affected, Number_Of_Peptides, Peptide_Obs_Count_Passing_Filter, High_Normalized_Score, High_Peptide_Prophet_Probability, Mod_Count, Mod_Description, PMT_Quality_Score
• T_Mass_Tags_NET: Mass_Tag_ID (PK, FK1), Min_GANET, Max_GANET, Avg_GANET, Cnt_GANET, StD_GANET, StdError_GANET, PNET
• T_Proteins: Ref_ID (PK), Reference, Description, Protein_Sequence, Protein_Residue_Count, Monoisotopic_Mass, Protein_Collection_ID, Last_Affected
• T_Mass_Tags_to_Protein_Map: Mass_Tag_ID (PK, FK1), Ref_ID (PK, FK2), Mass_Tag_Name, Cleavage_State, Fragment_Number, Fragment_Span, Residue_Start, Residue_End, Repeat_Count, Terminus_State, Missed_Cleavage_Count
• T_Peptides: Peptide_ID (PK), Analysis_ID (FK1), Scan_Number, Number_Of_Scans, Charge_State, MH, Multiple_Proteins, Peptide, Mass_Tag_ID (FK2), GANET_Obs, Scan_Time_Peak_Apex, Peak_Area, Peak_SN_Ratio
• T_Score_Discriminant: Peptide_ID (PK, FK1), Peptide_Prophet_FScore, Peptide_Prophet_Probability
• T_Score_Sequest: Peptide_ID (PK, FK1), XCorr, DelCn, Sp, DelM
• T_Score_XTandem: Peptide_ID (PK, FK1), Hyperscore, Log_EValue, DeltaCn2, Y_Score, Y_Ions, B_Score, B_Ions, DelM, Intensity, Normalized_Score
![Page 70: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/70.jpg)
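Assembling an AMT tag DB
Example data

T_Mass_Tags

| Mass_Tag_ID | Peptide | Monoisotopic_Mass |
|---|---|---|
| 24847 | VKHPSEIVNVGDEINVK | 1876.00533 |

T_Mass_Tags_NET

| Mass_Tag_ID | Avg_GANET | Cnt_GANET | StD_GANET |
|---|---|---|---|
| 24847 | 0.3021 | 7 | 2.11E-03 |

T_Peptides

| Peptide_ID | Mass Tag ID | Job | Scan Number | Charge State | Peptide |
|---|---|---|---|---|---|
| 76263 | 24847 | 206392 | 9421 | 2 | R.VKHPSEIVNVGDEINVK.V |
| 72556 | 24847 | 206391 | 9159 | 2 | R.VKHPSEIVNVGDEINVK.V |
| 69081 | 24847 | 206390 | 9118 | 2 | R.VKHPSEIVNVGDEINVK.V |
| 65386 | 24847 | 206389 | 9667 | 2 | R.VKHPSEIVNVGDEINVK.V |
| 61511 | 24847 | 206388 | 10905 | 2 | R.VKHPSEIVNVGDEINVK.V |
| 57461 | 24847 | 206387 | 9945 | 3 | R.VKHPSEIVNVGDEINVK.V |
| 53428 | 24847 | 206386 | 11195 | 3 | R.VKHPSEIVNVGDEINVK.V |

T_Score_XTandem

| Peptide_ID | Hyperscore | Log(E_Value) |
|---|---|---|
| 76263 | 60.3 | -11.27 |
| 72556 | 78 | -13.77 |
| 69081 | 69 | -12.82 |
| 65386 | 77.2 | -12.80 |
| 61511 | 74 | -12.85 |
| 57461 | 69.2 | -4.92 |
| 53428 | 80.2 | -5.89 |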
![Page 71: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/71.jpg)
Assembling an AMT tag DB
Processing steps
• Thermo-Finnigan LTQ .Raw files
  • Convert to .Dta using Extract_MSn.exe. Concatenate .Dta files into _Dta.txt file using Perl script. Improved application (under development): Decon_MSn.exe
• MS/MS spectra files
  • Process _Dta.txt files with X!Tandem (round 1 partially tryptic; round 2 dynamic oxidized methionine)
• X!Tandem Results
  • Convert X!Tandem .XML files to tab-delimited files using the Peptide Hit Results Processor application
• Tab delimited text files
  • Summarized result files
  • Align datasets using MTDB Creator application
  • Load into database
• Microsoft Access DB
![Page 72: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/72.jpg)
Assembling an AMT tag DB
PHRP Relationships
• Results_Info: Result_ID (PK), Unique_Seq_ID (FK1), Group_ID, Scan, Charge, Peptide_MH, Peptide_Hyperscore, Peptide_Expectation_Value_Log(e), Multiple_Protein_Count, Peptide_Sequence, DeltaCn2, y_score, y_ions, b_score, b_ions, Delta_Mass, Peptide_Intensity_Log(I)
• Result_To_Seq_Map: Unique_Seq_ID (PK, FK1), Result_ID (PK, FK2)
• Seq_Info: Unique_Seq_ID (PK), Mod_Count, Mod_Description, Monoisotopic_Mass
• Mod_Details: Unique_Seq_ID (PK, FK1), Mass_Correction_Tag, Position
• Seq_to_Protein_Map: Unique_Seq_ID (PK, FK1), Protein_Name (PK), Cleavage_State, Terminus_State, Protein_Expectation_Value_Log(e), Protein_Intensity_Log(I)
![Page 73: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/73.jpg)
Assembling an AMT tag DB
Database histograms, filtered on Log(E_Value) ≤ -2
[Figure: Peptide Mass Histogram, frequency (0 to 1400) vs. peptide mass (500 to 4500)]
[Figure: NET Histogram, frequency (0 to 600) vs. normalized elution time (0 to 1)]
[Figure: X!Tandem Hyperscore Histogram, frequency (0 to 1200) vs. hyperscore (20 to 120)]
![Page 74: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/74.jpg)
AMT Tag DB Growth Trend
• Trend for Mini AMT Tag DB
  • 25 SCX fractionation datasets of a single growth condition
  • Filtered on Log(E_Value) ≤ -2
• Trend for Mature AMT Tag DB
  • 521 different samples from ~25 different growth conditions
  • Slope of curve decreases as more datasets are added and fewer new peptides are seen
  • Filtered on Peptide Prophet Probability ≥ 0.99
[Figure: peptide count (0 to 20000) vs. dataset count (0 to 25) for the mini DB; peptide count (0 to 60000) vs. dataset count (0 to 600) for the mature DB]
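A growth trend of this kind can be computed as the cumulative number of distinct peptides seen as each dataset is added. This is a toy sketch: the function name is hypothetical and the single-letter "peptides" are stand-ins for filtered peptide identifications.

```python
def growth_curve(datasets):
    """Cumulative count of distinct peptides after each dataset is added.

    datasets: iterable of collections of peptide identifiers, in the
    order the datasets were added to the database.
    """
    seen = set()
    curve = []
    for peptides in datasets:
        seen.update(peptides)
        curve.append(len(seen))
    return curve


# Toy data: later datasets contribute fewer new peptides
toy = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "D"}]
curve = growth_curve(toy)
print(curve)
```

A flattening curve is the signal described above: once additional datasets stop adding new peptides, the database is approaching saturation for those growth conditions.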
![Page 75: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/75.jpg)
Identifying LC-MS Features
VIPER software
• Visualize and find features in LC-MS data
• Match features to peptides (AMT tags)
• Graphical User Interface and automated analysis mode
![Page 76: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/76.jpg)
Identifying LC-MS Features
Peak Matching Steps
• Load LC-MS peak lists from Decon2LS
• Filter data
• Feature definition over elution time
• Select AMT tags to match against
• Optionally, find paired features (e.g. 16O/18O pairs)
• Align LC-MS features to AMT tags using LCMSWarp
• Broad AMT tag DB search
• Search tolerance refinement
• Final AMT tag DB search
• Report results
![Page 77: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/77.jpg)
Identifying LC-MS Features
AMT Tag database selection
• Connect to the Mass Tag System (MTS) if inside PNNL, or use a standalone Microsoft Access DB
![Page 78: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/78.jpg)
Alignment using LCMSWarp
• Align scan number (i.e. elution time) of features to NETs of peptides in the given AMT tag database
  • Match mass and NET of AMT tags to mass and scan number of MS features
  • Use the LCMSWarp algorithm to find the optimal alignment that gives the most matches
• AMTs: calculated monoisotopic mass, average observed NET
• LC-MS Features: deisotoped monoisotopic mass, observed scan number
![Page 79: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/79.jpg)
Alignment using LCMSWarp
• LCMSWarp computes a similarity score from conserved local mass and retention time patterns
[Figure: alignment score vs. scan number; best score = 0.00681 at scan = 1113, shift = 113]
N. Jaitly, M.E. Monroe et al., Analytical Chemistry 2006, 78, 7397-7409.
![Page 80: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/80.jpg)
Alignment using LCMSWarp
Alignment Function
• Similarity scores between LC-MS features and AMT tags are used to generate a score graph of similarity
• Best alignment is found using a dynamic programming algorithm that determines the transformation function with maximum likelihood
[Figure: heatmap of similarity score between LC-MS features and AMT tags (z-score representation), AMT tag NET vs. MS scan number; S. typhimurium on 11T]
N. Jaitly, M.E. Monroe et al., Analytical Chemistry 2006, 78, 7397-7409.
![Page 81: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/81.jpg)
Alignment using LCMSWarp
Transformation function is used to convert from scan number to NET
Features centered at same scan number get the same obs. NET value
When matching LC-MS Features to AMTs, we will search +/- a NET tolerance, which effectively allows for LC-MS features to shift around a little in elution time

| LC-MS Feature Scan | LC-MS Feature NET | Matching AMT tag NET |
|---|---|---|
| 1056 | 0.1679 | 0.1682 |
| 1056 | 0.1679 | 0.1697 |
| 1056 | 0.1679 | 0.1862 |
| 1055 | 0.1677 | 0.1652 |
| 1042 | 0.1645 | 0.183 |
| 1037 | 0.1633 | 0.1519 |
| 1027 | 0.1609 | 0.1509 |
| 1021 | 0.1594 | 0.1653 |
| 1019 | 0.1589 | 0.1507 |
| 1019 | 0.1589 | 0.1626 |
| 1011 | 0.1569 | 0.1519 |

Plot: LC-MS feature NET (0 to 0.9) vs. LC-MS feature scan (750 to 3250)
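A minimal sketch of applying a transformation function, assuming it is stored as a few (scan, NET) knots joined by straight lines. The knot values and the names `scan_to_net` and `within_net_tolerance` are hypothetical; the real transformation function comes from the alignment itself.

```python
from bisect import bisect_right

def scan_to_net(scan, knots):
    """Piecewise-linear scan -> NET conversion; knots is a sorted list of (scan, NET)."""
    scans = [s for s, _ in knots]
    i = bisect_right(scans, scan)
    i = min(max(i, 1), len(knots) - 1)          # clamp so the end segments extrapolate
    (s0, n0), (s1, n1) = knots[i - 1], knots[i]
    return n0 + (n1 - n0) * (scan - s0) / (s1 - s0)

def within_net_tolerance(feature_net, amt_net, tolerance):
    """True when the AMT tag NET lies within +/- tolerance of the feature NET."""
    return abs(feature_net - amt_net) <= tolerance

# Hypothetical knots, for illustration only
knots = [(750, 0.10), (1250, 0.20), (3250, 0.90)]

# Two features centered at the same scan receive the same observed NET
net_a = scan_to_net(1000, knots)
net_b = scan_to_net(1000, knots)
print(net_a, net_b, within_net_tolerance(0.1679, 0.1862, 0.05))
```

The last call uses a feature NET and AMT tag NET pair from the slide (0.1679 and 0.1862) with the broad ±0.05 NET tolerance.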
![Page 82: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/82.jpg)
Alignment using LCMSWarp
NET Residual Plots
Difference between NET of LC-MS feature and NET of matching AMT tag
Indicates quality of alignment between features and AMT tags
This data shows nearly linear alignment between features and AMTs, but the algorithm can easily account for non-linear trends
Figures: NET residuals if a linear mapping is used; NET residuals after LCMSWarp; alignment plot (AMT tag NET vs. MS scan number); S. typhimurium on 11T
![Page 83: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/83.jpg)
Non-linear alignment example #1
Identical LC separation system, but having column flow irregularities
Alignment using LCMSWarp
Figures: alignment plot (AMT tag NET vs. MS scan number); NET residuals if a linear mapping is used; NET residuals after LCMSWarp; S. typhimurium on 9T
![Page 84: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/84.jpg)
Non-linear alignment example #2
PMT Tag DB from C18 LC-MS/MS analyses using ISCO-based LC (exponential dilution gradient)
LC-MS analysis used C18 LC-MS via Agilent linear gradient pump
Alignment using LCMSWarp
NET Residuals after LCMSWarp
NET Residuals if a linear mapping is used
S. oneidensis on LTQ-Orbitrap
![Page 85: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/85.jpg)
Non-linear alignment example #3
PMT Tag DB from C18 LC-MS/MS analyses using ISCO-based LC
LC-MS analysis used C18 LC-MS via Agilent linear gradient pump
Alignment using LCMSWarp
NET Residuals after LCMSWarp
NET Residuals if a linear mapping is used
QC Standards (12 protein digest) on LTQ-Orbitrap
![Page 86: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/86.jpg)
Alignment using LCMSWarp
LCMSWarp Features
Fast and robust
Previous method used least-squares regression, iterating through a large range of guesses (slow and often gave poor alignment)
Requires that a reasonable number of LC-MS features match the AMT Tag DB
Figures: S. typhimurium on 11T, matched against 18,617 S. typhimurium PMTs; S. typhimurium on 11T, matched against 65,193 S. oneidensis PMTs
![Page 87: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/87.jpg)
Alignment using LCMSWarp
In addition to aligning data in time, we can also recalibrate the masses of the LC-MS features
Possible because mass and time values are available for both LC-MS features and AMT tags
Two options for mass re-calibration
Bulk linear correction
Piece-wise correction via LCMSWarp
Visualize mass differences using mass error histogram or mass residual plot
![Page 88: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/88.jpg)
Mass Error Histogram
Match Tolerances: Mass: ±25 ppm; NET: ±0.05 NET
List of binned mass error values
Difference between feature's mass and matching AMT tag's mass
Bin values to generate a histogram
Typically observe background false positive level

| LC-MS Feature Mass (Da) | AMT Tag Mass (Da) | Delta Mass (Da) | Mass Error (ppm) |
|---|---|---|---|
| 1573.8381 | 1573.832 | 0.00569 | 3.6 |
| 1571.9107 | 1571.892 | 0.01848 | 11.8 |
| 1571.8498 | 1571.831 | 0.01912 | 12.2 |
| 1571.74325 | 1571.726 | 0.01770 | 11.3 |
| 1570.9005 | 1570.883 | 0.01745 | 11.1 |

Plot: mass error histogram, count (LC-MS features, 100 to 400) vs. mass error (-10 to 20 ppm); annotated regions: likely false positive identifications and likely true positive identifications
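A short sketch of how the tabulated values relate: the ppm mass error is the delta mass divided by the reference mass, times 10^6, and the histogram is a count per bin. The delta mass and AMT tag mass values are taken from the slide; the function names and the 1 ppm bin width are assumptions.

```python
import math
from collections import Counter

def mass_error_ppm(delta_mass_da, reference_mass_da):
    """Convert a mass difference in Da to a relative error in ppm."""
    return delta_mass_da / reference_mass_da * 1e6

def bin_errors(errors_ppm, bin_width=1.0):
    """Count errors per bin; each key is the lower edge of its bin."""
    return Counter(math.floor(e / bin_width) * bin_width for e in errors_ppm)

# (delta mass in Da, AMT tag mass in Da) pairs from the slide
rows = [
    (0.00569, 1573.832),
    (0.01848, 1571.892),
    (0.01912, 1571.831),
    (0.01770, 1571.726),
    (0.01745, 1570.883),
]
errors = [mass_error_ppm(delta, mass) for delta, mass in rows]
histogram = bin_errors(errors)
print([round(e, 1) for e in errors])
print(sorted(histogram.items()))
```

With thousands of features, the true matches pile up in a narrow peak while random matches spread evenly across the search window, which is the background level noted above.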
![Page 89: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/89.jpg)
Mass Calibration
Option 1: Bulk linear correction
Use location of peak in mass error histogram to adjust masses of all features
Shift by ppm mass; absolute shift amount increases as monoisotopic mass increases
Shift all masses -11.6 ppm:
$$\Delta\text{mass} = \frac{-11.6\ \text{ppm} \times \text{mass}_{\text{old}}}{1 \times 10^{6}\ \text{ppm}}$$
For 1+ feature at 1570.9005 Da, Δmass = -0.0182 Da
For 3+ feature at 2919.4658 Da, Δmass = -0.0339 Da
Plot: mass error histogram, count (LC-MS features) vs. mass error (ppm)
Peak center of mass: 11.6 ppm
Peak width: 2 ppm at 60% of max
Peak height: 404 counts/bin
Noise level: 19 counts/bin
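The bulk correction is a single multiplication per feature. A minimal sketch that reproduces the two worked values from the slide; the function names are illustrative.

```python
def bulk_shift_da(mass_da, shift_ppm):
    """Absolute mass shift (Da) produced by a relative shift given in ppm."""
    return shift_ppm * mass_da / 1e6

def recalibrate(mass_da, shift_ppm):
    """Apply the same ppm shift to a monoisotopic mass."""
    return mass_da + bulk_shift_da(mass_da, shift_ppm)

# The histogram peak sits at +11.6 ppm, so all masses are shifted by -11.6 ppm
shift_ppm = -11.6
for mass in (1570.9005, 2919.4658):
    print(mass, round(bulk_shift_da(mass, shift_ppm), 4), round(recalibrate(mass, shift_ppm), 4))
```

Because the shift is relative, the heavier feature moves further in Da even though both move by the same number of ppm.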
![Page 90: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/90.jpg)
Mass Calibration
Option 2: Piece-wise correction via LCMSWarp
Examine sections of the data to determine a custom mass shift for each section
One option is to divide into time sections
Plots: mass residual, mass error (ppm) vs. MS scan number, before and after correction; S. typhimurium on 11T
![Page 91: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/91.jpg)
Mass Calibration
Option 2: Piece-wise correction via LCMSWarp
Second option is to divide into m/z sections
LCMSWarp utilizes a hybrid correction based on both mass error vs. time and mass error vs. m/z
Plots: mass residual, mass error (ppm) vs. m/z, before and after correction; S. typhimurium on 11T
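The piece-wise idea can be sketched as follows. This is a simplified stand-in, not the LCMSWarp procedure: it assumes equal-width sections along one coordinate (scan number or m/z) and uses the section median as the shift. The data are synthetic, with an error that drifts from about 5 to 9 ppm over the run.

```python
from statistics import median

def piecewise_correct(coords, errors_ppm, n_sections=4):
    """Subtract a per-section median ppm error; coords may be scans or m/z values."""
    lo, hi = min(coords), max(coords)
    width = (hi - lo) / n_sections

    def section(c):
        # The last section also takes the point sitting exactly at the upper edge
        return min(int((c - lo) / width), n_sections - 1)

    groups = {}
    for c, e in zip(coords, errors_ppm):
        groups.setdefault(section(c), []).append(e)
    shift = {s: median(values) for s, values in groups.items()}
    return [e - shift[section(c)] for c, e in zip(coords, errors_ppm)]

# Synthetic drift for illustration only
scans = list(range(1000, 3000, 10))
raw = [5.0 + 4.0 * (s - 1000) / 2000 for s in scans]
corrected = piecewise_correct(scans, raw)
print(round(max(abs(e) for e in raw), 2), round(max(abs(e) for e in corrected), 2))
```

A single bulk shift would center this drifting data but leave its spread untouched; per-section shifts reduce the spread as well. A hybrid scheme would apply the same idea along both time and m/z.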
![Page 92: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/92.jpg)
Mass Calibration
Comparison of the three methods
Mass error histogram gets taller, narrower, and more symmetric
Methods shown: Linear; Mass error vs. m/z; Mass error vs. time; Hybrid
Not all datasets show the same trends, but Hybrid mass recalibration is generally superior
Plots: mass error histograms (bin count vs. mass error, -5 to 5 ppm) with series LCMSWarp_Hybrid, LCMSWarp_vs_time, LCMSWarp_vs_mz, and Linear Correction; one panel for S. typhimurium on 11T (bin count 0 to 700) and one for S. oneidensis on LTQ-FT (bin count 0 to 1600)
![Page 93: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/93.jpg)
Identifying LC-MS Features
Peak Matching Steps
Load LC-MS peak lists from Decon2LS
Filter data
Feature definition over elution time
Select AMT tags to match against
Optionally, find paired features (e.g. 16O/18O pairs)
Align LC-MS features to AMT tags using LCMSWarp
Broad AMT tag DB search
Search tolerance refinement
Final AMT tag DB search
Report results
![Page 94: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/94.jpg)
Identifying LC-MS Features
Match Features to LC-MS/MS IDs
S. typhimurium DB, from 25 LC-MS/MS analyses
18,617 AMT tags, all fully or partially tryptic
Look for AMT tags within a broad mass range, e.g., ±25 ppm and ±0.05 NET of each feature
Figures: S. typhimurium AMT Tag Database (18,617 AMT tags; axis: average observed NET); S. typhimurium on 11T FTICR (5,934 features; axis: observed NET)
4,678 of the 5,934 features have a match, matching 6,242 AMT tags
![Page 95: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/95.jpg)
Search tolerance refinement
Can use mass error and NET error histograms to determine optimal search tolerances
Examine distribution of errors to determine optimal tolerance using expectation maximization algorithm
Example result: ±1.76 ppm
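One way to picture the expectation maximization step is a two-component mixture: a normal peak for true matches plus a flat background for random matches. The sketch below makes several assumptions not stated on the slide: the mixture form, the synthetic data, the function name, and the choice of 2 sigma as the reported tolerance.

```python
import math
import random

def fit_peak_plus_background(errors, lo, hi, n_iter=100):
    """EM fit of Normal(mu, sigma) plus Uniform(lo, hi) to a list of mass errors."""
    mu = sorted(errors)[len(errors) // 2]        # start at the median
    sigma = (hi - lo) / 10.0
    weight = 0.5                                 # fraction of errors in the peak
    uniform_density = 1.0 / (hi - lo)
    for _ in range(n_iter):
        # E-step: probability that each error belongs to the peak
        resp = []
        for x in errors:
            peak = (weight * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                    / (sigma * math.sqrt(2.0 * math.pi)))
            resp.append(peak / (peak + (1.0 - weight) * uniform_density))
        # M-step: update the parameters from the weighted errors
        total = sum(resp)
        weight = total / len(errors)
        mu = sum(r * x for r, x in zip(resp, errors)) / total
        sigma = math.sqrt(sum(r * (x - mu) ** 2 for r, x in zip(resp, errors)) / total)
    return mu, sigma, weight

# Synthetic errors (ppm): 2000 true matches near 0 plus 1000 background matches
random.seed(0)
errors = ([random.gauss(0.0, 0.9) for _ in range(2000)]
          + [random.uniform(-25.0, 25.0) for _ in range(1000)])
mu, sigma, weight = fit_peak_plus_background(errors, -25.0, 25.0)
tolerance_ppm = 2.0 * sigma                      # assumed rule: +/- 2 sigma
print(round(mu, 2), round(sigma, 2), round(weight, 2), round(tolerance_ppm, 2))
```

The same fit can be run on NET errors to obtain a refined NET tolerance.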
![Page 96: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/96.jpg)
Identifying LC-MS Features
Repeat search with final search tolerances
5,934 features
3,866 features with matches
3,958 out of 18,617 AMT tags matched using ±1.76 ppm
Figures (axis: observed NET): matches with broad tolerances (Mass: ±25 ppm; NET: ±0.05 NET) and with refined tolerances (Mass: ±1.76 ppm; NET: ±0.0203 NET)
![Page 97: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/97.jpg)
Identifying LC-MS Features
Given feature can match more than one AMT tag
Need measure of ambiguity
Example: LC-MS feature at 1767.9727 Da, NET 0.383; Match Tolerances: Mass: ±4 ppm; NET: ±0.02 NET

| AMT Tag ID | Peptide | Mass (Da) | NET | Δ mass (ppm) | Δ NET |
|---|---|---|---|---|---|
| 36259992 | R.SIGIAPDVLICRGDRAI.P | 1767.9664 | 0.392 | -3.5 | 0.009 |
| 105490 | K.DLETIVGLQTDAPLKR.A | 1767.9730 | 0.380 | 0.17 | -0.003 |
| 35896216 | T.RALMQLDEALRPSLR.S | 1767.9777 | 0.373 | 2.8 | -0.010 |

Plot: monoisotopic mass (1,767.960 to 1,767.984 Da) vs. NET (0.350 to 0.407)
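The ambiguity can be reproduced with a small tolerance search. The masses, NET values, and tolerances come from the slide; the function name and tuple layout are illustrative, and differences are computed as AMT tag minus feature.

```python
def match_amt_tags(feature_mass, feature_net, amt_tags, ppm_tol, net_tol):
    """Return (tag_id, delta_ppm, delta_net) for every AMT tag inside both tolerances."""
    matches = []
    for tag_id, mass, net in amt_tags:
        delta_ppm = (mass - feature_mass) / feature_mass * 1e6
        delta_net = net - feature_net
        if abs(delta_ppm) <= ppm_tol and abs(delta_net) <= net_tol:
            matches.append((tag_id, delta_ppm, delta_net))
    return matches

amt_tags = [
    (36259992, 1767.9664, 0.392),
    (105490, 1767.9730, 0.380),
    (35896216, 1767.9777, 0.373),
]
matches = match_amt_tags(1767.9727, 0.383, amt_tags, ppm_tol=4.0, net_tol=0.02)
for tag_id, delta_ppm, delta_net in matches:
    print(tag_id, round(delta_ppm, 2), round(delta_net, 3))

# A tighter mass tolerance leaves a single candidate
tight = match_amt_tags(1767.9727, 0.383, amt_tags, ppm_tol=1.5, net_tol=0.02)
```

All three tags fall inside ±4 ppm and ±0.02 NET, so a tolerance test alone cannot say which one the feature is.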
![Page 98: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/98.jpg)
Identifying LC-MS Features
σmj = 4 ppm, σtj = 0.025
$$d_{ij}^{2} = \frac{(m_i - \mu_{mj})^{2}}{\sigma_{mj}^{2}} + \frac{(t_i - \mu_{tj})^{2}}{\sigma_{tj}^{2}}$$
$$p_{ij} = \frac{(\sigma_{mj}\,\sigma_{tj})^{-1}\exp\left(-d_{ij}^{2}/2\right)}{\sum_{k=1}^{N}(\sigma_{mk}\,\sigma_{tk})^{-1}\exp\left(-d_{ik}^{2}/2\right)}$$

| AMT Tag ID | Mass (Da) | NET | dij2 | Numerator | pij |
|---|---|---|---|---|---|
| 36259992 | 1767.9664 | 0.392 | 3.267 | 5521.4 | 0.14 |
| 105490 | 1767.9730 | 0.380 | 0.090 | 27042.5 | 0.70 |
| 35896216 | 1767.9777 | 0.373 | 3.012 | 6273.3 | 0.16 |
| Sum | | | | 38837.2 | |

Plot: monoisotopic mass (1,767.960 to 1,767.984 Da) vs. NET (0.350 to 0.407), showing the distance dij to each candidate and match probabilities 0.70, 0.16, and 0.14; Match Tolerances: Mass: ±4 ppm; NET: ±0.02 NET
K.K. Anderson, M.E. Monroe, and D.S. Daly. Proteome Science 2006, 4, DOI:10.1186/1477-5956-4-1.
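A sketch of the probability calculation: each candidate AMT tag gets a weight exp(-d²/2), scaled by 1/(σm·σt), and the weights are normalized to sum to 1. The squared distances are the values listed on the slide; the function names are illustrative, and the sketch assumes every tag shares the same σ values so the scale factor cancels.

```python
import math

def normalized_distance_sq(mass, net, tag_mass, tag_net, sigma_mass, sigma_net):
    """Squared distance between a feature and a tag in units of standard deviations.
    mass, tag_mass, and sigma_mass must share one unit (for example, all in Da)."""
    return ((mass - tag_mass) / sigma_mass) ** 2 + ((net - tag_net) / sigma_net) ** 2

def match_probabilities(d_squared, sigma_products=None):
    """Normalize exp(-d^2 / 2) / (sigma_m * sigma_t) across the candidate tags."""
    if sigma_products is None:
        sigma_products = [1.0] * len(d_squared)  # equal sigmas cancel in the ratio
    numerators = [math.exp(-d2 / 2.0) / sp for d2, sp in zip(d_squared, sigma_products)]
    total = sum(numerators)
    return [n / total for n in numerators]

# Squared distances for AMT tags 36259992, 105490, and 35896216
probabilities = match_probabilities([3.267, 0.090, 3.012])
print([round(p, 2) for p in probabilities])
```

The closest tag receives most of the probability, but the two neighbors keep a noticeable share, which is the ambiguity the score is meant to express.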
![Page 99: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/99.jpg)
Identifying LC-MS Features
SLiC: Spatially Localized Confidence Score
Measures uniqueness of match

| AMT Tag ID | Peptide | Mass (Da) | NET | SLiC Score | Average XCorr | Avg Disc Score |
|---|---|---|---|---|---|---|
| 36259992 | R.SIGIAPDVLICRGDRAI.P | 1767.9664 | 0.392 | 0.14 | 2.15 | 0.06 |
| 105490 | K.DLETIVGLQTDAPLKR.A | 1767.9730 | 0.380 | 0.70 | 3.68 | 0.97 |
| 35896216 | T.RALMQLDEALRPSLR.S | 1767.9777 | 0.373 | 0.16 | 3.13 | 0.61 |

Plot: monoisotopic mass (1,767.960 to 1,767.984 Da) vs. NET (0.350 to 0.407), with SLiC scores 0.16, 0.14, and 0.70 marked on the candidate AMT tags
K.K. Anderson, M.E. Monroe, and D.S. Daly. Proteome Science 2006, 4, DOI:10.1186/1477-5956-4-1.
![Page 100: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/100.jpg)
Search tolerance refinement
Effect of search tolerances on Mass Error histogram
If mass error plot not centered at 0, then narrow mass windows exclude valid data
Decreasing mass and/or NET tolerance reduces background false positive level
Plots: mass error histograms, count (features) vs. mass error (-6 to 6 ppm), one with linear mass correction and one with LCMSWarp mass correction; series in each: ±25 ppm, ±0.05 NET; ±25 ppm, ±0.02 NET; ±3 ppm, ±0.02 NET; ±1.5 ppm, ±0.02 NET
![Page 101: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/101.jpg)
Identifying LC-MS Features
Peak Matching Steps
Load LC-MS peak lists from Decon2LS
Filter data
Feature definition over elution time
Optionally, find paired features (e.g. 16O/18O pairs)
Align LC-MS features to AMT tags using LCMSWarp
Broad AMT tag DB search: ±25 ppm and ±0.05 NET
Search tolerance refinement
Final AMT tag DB search: e.g. ±1.8 ppm and ±0.02 NET
Report results
![Page 102: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/102.jpg)
Automated Peak Matching
Automated processing using VIPER
Processing steps and parameters defined in .Ini file
Separate .Ini file for 14N/15N pairs and 16O/18O pairs
![Page 103: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/103.jpg)
Peak Matching Results
Browsable result folders for visual QC of each dataset
S. typhimurium on 11T FTICR
Figure panels: Data Searched; Data With Matches; Mass Errors Before Refinement; Mass Errors After Refinement
2D Plot Metrics
Reasonable number of matches
NET range ≈ 0 to 1
Mass Error Histogram Metrics
Well defined, symmetric mass error peak centered at 0 ppm
![Page 104: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/104.jpg)
Peak Matching Results
Browsable result folders for visual QC of each dataset
S. typhimurium on 11T FTICR
Figure panels: Total Ion Chromatogram (TIC); Base Peak Intensity (BPI) Chromatogram; NET Errors Before Refinement; NET Errors After Refinement
NET Error Histogram Metrics
Well defined, symmetric NET error peak centered at 0
Chromatogram Metrics
Narrow peaks evenly distributed throughout separation window
![Page 105: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/105.jpg)
Peak Matching Results
Browsable result folders for visual QC of each dataset
S. typhimurium on 11T FTICR
NET Alignment Surface Metrics
Should show a smooth, bright yellow, diagonal line
NET Alignment Residual Metrics
Data after recalibration should be narrowly distributed around zero
![Page 106: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/106.jpg)
Peak Matching Results
What about the unmatched LC-MS features?
Could align LC-MS features across datasets
Find the unmatched ones that show interesting trends
Generate list of the mass and elution times for the interesting features
Re-analyze the sample to perform targeted LC-MS/MS
Alignment example for 36 datasets using prototype software tool
Figure: m/z vs. scan #, after alignment
![Page 107: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/107.jpg)
LC-MS Feature Discovery
Similar approaches and software tools: High Res LC-MS
SpecArray (Pep3D, mzXML2dat, PepList, PepMatch, PepArray)
X.-J. Li et al. Mol. Cell. Proteomics 2005, 4, 1328-1340.
msInspect
M. Bellew et al. Bioinformatics 2006, 22, 1902-1909.
PEPPeR
J. Jaffe et al. Mol. Cell. Proteomics 2006, 5, 1927-1941.
XCMS (for Metabolite profiling)
C.A. Smith et al. Analytical Chemistry 2006, 78, 779-787.
Surromed label-free quantitation software (MassView)
W. Wang et al. Analytical Chemistry 2003, 75, 4818-4826.
![Page 108: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/108.jpg)
LC-MS Feature Discovery
Similar approaches and software tools: Low Res LC-MS
Signal maps software
A. Prakash et al. Mol. Cell. Proteomics 2006, 5, 423-432.
Informatics platform for global proteomic profiling using LC-MS
D. Radulovic et al. Mol. Cell. Proteomics 2004, 3, 984-997.
Computational Proteomics Analysis System (CPAS)
A. Rauch et al. J. Proteome Research 2006, 5, 112-121.
![Page 109: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/109.jpg)
Course Outline
Introduction
Feature discovery in LC-MS datasets
Feature discovery in individual spectra
Feature definition over elution time
Identifying LC-MS Features using an AMT tag DB
Break
AMT tag Pipeline Demo
MAPQUANT, PEPPeR and GenePattern
Panel Discussion
![Page 110: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/110.jpg)
Alternative Approaches to Mass Spec Features:
Picking, Assignment, and Statistical Discovery Tools
Jacob D. Jaffe
The Broad Institute of Harvard and MIT
Proteomics Platform
![Page 111: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/111.jpg)
Section Outline
MAPQUANT: an image-based feature picking engine
PEPPeR: a self-contained web-based Biomarker Discovery pipeline
GenePattern: a suite of analysis and visualization tools that works with just about anything
![Page 112: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/112.jpg)
A picture is worth 1000 parameters…
VARMIX_C_01 RT: 20.00 - 40.00 Mass: 350.00 - 1200.00 NL: 5.83E7 F:
Figure: 2D LC-MS map, m/z (400 to 1200) vs. time (20 to 40 min)
![Page 113: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/113.jpg)
MAPQUANT
MAPQUANT treats LCMS runs as images
Can leverage tried and true image processing techniques
Scripting language allows fine tuning and optimization for your data
Supports both hi-res (FTMS) and lo-res (ITMS) platforms
High computational overhead
Supports parallelization on a Linux cluster
Leptos KC, Sarracino DA, Jaffe JD, Krastins B, Church GM. MapQuant: open-source software for large-scale protein quantification. Proteomics. 2006 Mar;6(6):1770-82.
![Page 114: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/114.jpg)
Raw
![Page 115: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/115.jpg)
Common Image Operations
Structuring Elements
Morphological Operations
Source: http://rkb.home.cern.ch/rkb/AN16pp/node178.html
![Page 116: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/116.jpg)
Opening with a Cross Element
![Page 117: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/117.jpg)
Opening with an Ellipse Element
![Page 118: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/118.jpg)
Closing with a Cross Element
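Opening and closing can be sketched in a few lines. This is a generic grayscale morphology example, not MAPQUANT code: the image is a small list of lists, the structuring element is a 5-pixel cross, and pixels outside the image are simply ignored (an assumed border rule).

```python
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]   # (row, column) offsets

def _apply(image, offsets, reduce_fn):
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [image[r + dr][c + dc] for dr, dc in offsets
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = reduce_fn(values)
    return out

def erode(image, offsets=CROSS):
    return _apply(image, offsets, min)    # keep the minimum under the element

def dilate(image, offsets=CROSS):
    return _apply(image, offsets, max)    # keep the maximum under the element

def opening(image, offsets=CROSS):
    return dilate(erode(image, offsets), offsets)     # removes small bright specks

def closing(image, offsets=CROSS):
    return erode(dilate(image, offsets), offsets)     # fills small dark gaps

# A cross-shaped peak of height 5 plus an isolated one-pixel spike of height 10
image = [[0] * 7 for _ in range(7)]
for r, c in [(3, 3), (2, 3), (4, 3), (3, 2), (3, 4)]:
    image[r][c] = 5
image[1][1] = 10
opened = opening(image)

# A flat region with a one-pixel hole
holed = [[5] * 5 for _ in range(5)]
holed[2][2] = 0
closed = closing(holed)
print(opened[1][1], opened[3][3], closed[2][2])
```

Opening removes the spike because it is smaller than the structuring element, while the cross-shaped peak survives unchanged; closing fills the hole.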
![Page 119: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/119.jpg)
Filtering and Smoothing
Eases ‘jagged’ peaks
Gaussian, Boxcar, and Savitzky-Golay Filters
![Page 120: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/120.jpg)
Rough Peak
![Page 121: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/121.jpg)
Gaussian Smoothed Peak
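A one-dimensional sketch of Gaussian smoothing, shown on a made-up jagged peak. The kernel width, the edge handling (edge values are repeated), and the `roughness` measure are illustrative assumptions; MAPQUANT works on the full two-dimensional LC-MS image.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Convolve the signal with the kernel, repeating the edge values at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def roughness(signal):
    """Sum of absolute second differences; smaller means less jagged."""
    return sum(abs(signal[i + 1] - 2 * signal[i] + signal[i - 1])
               for i in range(1, len(signal) - 1))

# Hypothetical jagged peak
rough = [0, 1, 0, 4, 2, 9, 5, 10, 4, 8, 2, 3, 0, 1, 0]
kernel = gaussian_kernel(sigma=1.5, radius=4)
smoothed = smooth(rough, kernel)
print(round(roughness(rough), 1), round(roughness(smoothed), 1))
```

A boxcar filter is the same loop with equal weights in the kernel.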
![Page 122: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/122.jpg)
Segmentation
Find the rough peak borders (think paint bucket!)
Break into many subtasks
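The paint bucket analogy corresponds to a flood fill. The sketch below labels connected regions above an intensity threshold; the threshold, the 4-neighbor connectivity, and the tiny image are assumptions for illustration, not MAPQUANT's actual segmentation rules.

```python
from collections import deque

def segment(image, threshold):
    """Label 4-connected regions of pixels above threshold; returns (labels, count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                count += 1                       # start a new segment here
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                     # spread to connected neighbors
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

image = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 6, 0, 0, 0],
    [0, 4, 0, 0, 7, 0],
    [0, 0, 0, 0, 8, 0],
]
labels, count = segment(image, threshold=1)
print(count)
```

Each labeled segment can then be handled as its own subtask.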
![Page 123: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/123.jpg)
Peak Centering/Modeling
Find the Center: Structuring Elements Again
Raster structure element over segment
Find local maxima of products
Model as a 3D Gaussian/oid
Model peaks to contain 95% of abundance
Altered model available to better fit chromatographic tailing effects
![Page 124: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/124.jpg)
Picked Peaks – Gaussian/oid Models
![Page 125: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/125.jpg)
Isotope Clusters
Find features that are close in time and not more than 1 m/z unit apart
Single-linkage clustering
Deconvolve the clustered peaks
Iterative: once a successful deconvolution is made, these peaks are subtracted and the residual is deconvolved again until no more deconvolution can be done
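Single-linkage clustering joins any two features that are close enough and lets clusters grow through chains of such links. A minimal sketch with a union-find structure; the retention time window of 0.2 min and the feature list are made up, and only the 1 m/z limit comes from the slide.

```python
def single_linkage(features, max_dt, max_dmz=1.0):
    """Cluster (retention_time, mz) features; returns sorted lists of feature indices."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path compression
            i = parent[i]
        return i

    for i, (rt_i, mz_i) in enumerate(features):
        for j in range(i):
            rt_j, mz_j = features[j]
            if abs(rt_i - rt_j) <= max_dt and abs(mz_i - mz_j) <= max_dmz:
                parent[find(i)] = find(j)        # link the two clusters
    clusters = {}
    for i in range(len(features)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical features: a 2+ isotope series (0.5 m/z spacing) plus two unrelated peaks
features = [
    (25.00, 500.25),
    (25.05, 500.75),
    (25.02, 501.25),
    (25.00, 620.30),   # same time, far away in m/z
    (31.00, 500.26),   # same m/z, much later
]
clusters = single_linkage(features, max_dt=0.2)
print(clusters)
```

Each multi-member cluster is a candidate isotope envelope to pass on to deconvolution.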
![Page 126: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/126.jpg)
Clustered, Deisotoped, Deconvolved
![Page 127: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/127.jpg)
MAPQUANT Last Words
Chain these operations into scripts for automated processing
Break LCMS run into m/z stripes for parallelization on a cluster
Output:
Parameterized peaks with m/z, r.t., abundance, z, carbon content
Also visualization tools based on scripting language for method development / troubleshooting
Cross-platform: Win32 and Linux
![Page 128: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/128.jpg)
PEPPeR: Platform for Experimental Proteomic Pattern Recognition
Jaffe JD, Mani DR, Leptos KC, Church GM, Gillette MA, Carr SA. PEPPeR, a Platform for Experimental Proteomic Pattern Recognition. Mol Cell Proteomics. 2006 Oct;5(10):1927-1941.
![Page 129: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/129.jpg)
Multiple LCMS Experiments: Good with the Bad
There is a lot of information in there
Peptide/protein IDs
Quantitative data
Statistical assessment
The information may be noisy
Retention time drift
Instrument response noise
Are there methods to leverage this information?
Without ‘perfect’ chromatography?
Without strict alignment?
![Page 130: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/130.jpg)
PEPPeR Concepts – Samples and Data Acquisition
![Page 131: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/131.jpg)
PEPPeR Concepts – Data Processing
![Page 132: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/132.jpg)
PEPPeR Concepts – Processing Continued…
![Page 133: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/133.jpg)
PEPPeR Concepts – Analysis and Follow up
![Page 134: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/134.jpg)
Landmark Matching: Identity Propagation
Use accurate mass, relative retention order comparison to identify peaks
Figure: Current Experiment with peaks A, B, C, X, Y (labels: m/z=999.4991 and m/z=999.4996)
![Page 135: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/135.jpg)
Landmark Matching: Identity Propagation
Use accurate mass, relative retention order comparison to identify peaks
Figure: Current Experiment with peaks A, B, C, X, Y (labels: m/z=999.4991 and m/z=999.4996); Comparison Experiment with peaks A, B, C, M, N (labels: APEPTIDEK m/z=999.4993 and APDITEPEK m/z=999.4993)
![Page 136: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/136.jpg)
Landmark Matching: Identity Propagation
[Figure: the same Current Experiment (A, B, C, X, Y) and Comparison Experiment (A, B, C, M, N) chromatograms as on the previous slide]
Is X = M ?
![Page 137: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/137.jpg)
Landmark Matching: Identity Propagation
[Figure: the same Current Experiment (A, B, C, X, Y) and Comparison Experiment (A, B, C, M, N) chromatograms as on the previous slide]
Is X = N ?
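The question on these slides arises because accurate mass alone cannot separate the two candidates. A minimal sketch of that ambiguity, using the m/z values shown on the slide; the function names and the 5 ppm tolerance are assumptions made for illustration, not part of PEPPeR:

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def candidates_by_mass(observed_mz, identified, tol_ppm):
    """Names of identified peptides whose m/z lies within tol_ppm of observed_mz."""
    return [name for name, mz in identified.items()
            if abs(ppm_error(observed_mz, mz)) <= tol_ppm]


# m/z values from the slide; the 5 ppm tolerance is an assumed value.
comparison = {"APEPTIDEK": 999.4993, "APDITEPEK": 999.4993}
unidentified = [999.4991, 999.4996]

for mz in unidentified:
    hits = candidates_by_mass(mz, comparison, tol_ppm=5.0)
    print(f"m/z {mz}: {len(hits)} candidate(s) by mass alone: {hits}")
```

Both unidentified peaks match both peptides by mass, so the relative retention order against the landmarks A, B and C has to break the tie.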
![Page 138: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/138.jpg)
Landmark Scoring and Confidence
$$S = \sum_{i=1}^{w} \left[ \xi(\Lambda_{-i}, \Lambda_0) + \xi(\Lambda_0, \Lambda_i) \right]$$
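$$
\xi(m,n) =
\begin{cases}
1 & \text{if } \tau(m) < \tau(n) \\
0.5 & \text{if } \tau(m) > \tau(n),\ \tau(m) - \tau(n) < \delta \text{ and } \mu(m) + \sigma(m) > \mu(n) - \sigma(n) \\
\ldots & \text{else}
\end{cases}
$$
The slide's two else branches carry the values 0 and −1.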
Let:
Λ be a list of peptides observed in the comparison experiment ordered by elution time. Here, elution time is defined by the centroid of all MS/MS scans leading to the identification of the peptide.
Λ0 is defined as the position of the putative assignment in Λ
μ(x) be the centroid of elution time of peptide x in the comparison experiment (in scans)
σ(x) be the standard deviation of elution time of peptide x in the comparison experiment (in scans)
τ(x) be the centroid of elution time of peptide x in the current experiment (in seconds)
δ be the average retention time peak width, such that peptides eluting within δ sec are considered to be co-eluting (typically δ = 30 s)
w the number of peptides to consider before and after the putative assignment on the landmark list (typically w = 3)
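$$P_{\text{overall}} = P_{m/z} \, P_{\text{landmark}}$$
$$P_{\text{landmark}} = P(\text{landmark} \mid m/z) = \frac{P(m/z \mid \text{landmark}) \, P(\text{landmark})}{P(m/z \mid \text{landmark}) \, P(\text{landmark}) + \left(1 - P(m/z \mid \text{landmark})\right)\left(1 - P(\text{landmark})\right)}$$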
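The scoring on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the PEPPeR implementation: `landmark_score` follows the sum over w neighbors on either side of the putative assignment, `simple_xi` is a simplified stand-in for ξ that only checks elution order (the slide's ξ also handles co-elution within δ), skipping neighbors that fall off the end of the list is an assumed edge rule, and `landmark_posterior` assumes the Bayes-style form P(m/z|landmark)·P(landmark) divided by that same product plus (1 − P(m/z|landmark))·(1 − P(landmark)). All peptide names and times are hypothetical.

```python
def landmark_score(landmarks, pos, xi, w=3):
    """S = sum over i = 1..w of xi(L[pos-i], L[pos]) + xi(L[pos], L[pos+i]).

    Neighbors beyond either end of the list are skipped (an assumed edge rule).
    """
    score = 0.0
    for i in range(1, w + 1):
        if pos - i >= 0:
            score += xi(landmarks[pos - i], landmarks[pos])
        if pos + i < len(landmarks):
            score += xi(landmarks[pos], landmarks[pos + i])
    return score


def simple_xi(tau):
    """Simplified stand-in for xi: 1 if m elutes before n in the current run, else 0."""
    def xi(m, n):
        return 1.0 if tau[m] < tau[n] else 0.0
    return xi


def landmark_posterior(p_mz_given_landmark, p_landmark):
    """Bayes-style posterior P(landmark | m/z); inputs are assumed to lie strictly in (0, 1)."""
    hit = p_mz_given_landmark * p_landmark
    miss = (1.0 - p_mz_given_landmark) * (1.0 - p_landmark)
    return hit / (hit + miss)


# Hypothetical landmark list (comparison-experiment order) and current-run times in seconds.
order = ["A", "B", "C", "M", "D", "E", "F"]
tau_in_order = {"A": 100, "B": 200, "C": 300, "M": 350, "D": 400, "E": 500, "F": 600}
tau_too_early = dict(tau_in_order, M=50)  # putative match elutes before every landmark

print(landmark_score(order, 3, simple_xi(tau_in_order)))
print(landmark_score(order, 3, simple_xi(tau_too_early)))
print(landmark_posterior(0.9, 0.5))
```

A putative assignment that preserves the order of all six neighbors gets the maximum score of 2w; one that elutes out of order loses a point for every neighbor it violates.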
![Page 139: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/139.jpg)
Nuts and bolts: How it works
Match features to sequenced peptides in a single LCMS run
Refine/recalibrate m/z tolerance
Re-match features to sequenced peptides in a single LCMS run
Now compare list of all features to Basis Set for mass, relative elution order matches given landmarks as reference points: propagation of identified features across multiple experiments
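The first three steps (match, recalibrate, re-match) can be sketched as follows. This is a simplified illustration, assuming matching on m/z only (a real run would also use elution time), a tolerance of three standard deviations of the observed mass error, and hypothetical peptide names and masses; none of these choices come from the slides.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match(features, peptides, center_ppm, tol_ppm):
    """Pair feature m/z values with peptides whose ppm error is within tol_ppm of center_ppm."""
    return [(f, name)
            for f in features
            for name, mz in peptides.items()
            if abs(ppm_error(f, mz) - center_ppm) <= tol_ppm]


def recalibrate(pairs, peptides, n_sd=3.0):
    """Estimate the systematic ppm offset and a tolerance of n_sd standard deviations.

    Needs at least two matched pairs, because stdev is undefined for one value.
    """
    errors = [ppm_error(f, peptides[name]) for f, name in pairs]
    return mean(errors), n_sd * stdev(errors)


# Hypothetical sequenced peptides (theoretical m/z) and detected features
# carrying a systematic offset of about +3 ppm, plus one unrelated feature.
peptides = {"P1": 500.0, "P2": 750.0, "P3": 1000.0}
features = [500.0 * (1 + 2.5e-6), 750.0 * (1 + 3.0e-6), 1000.0 * (1 + 3.5e-6), 600.0]

first_pass = match(features, peptides, center_ppm=0.0, tol_ppm=10.0)   # wide initial window
center, tol = recalibrate(first_pass, peptides)                        # refine from the data
second_pass = match(features, peptides, center_ppm=center, tol_ppm=tol)
print(f"offset {center:.2f} ppm, tolerance {tol:.2f} ppm, {len(second_pass)} matches")
```

The second pass uses a much narrower window centered on the measured offset, which is what lets the later cross-experiment comparison use tight mass tolerances.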
![Page 140: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/140.jpg)
Peak Matching: Recognizing Identical Features
[Figure: features plotted by retention time and m/z]
Use landmarks to derive corrections and tolerances for clustering of features across LCMS experiments
Break down the problem to make it parallelizable
![Page 141: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/141.jpg)
Peak Matching: Recognizing Identical Features
[Figure: features plotted by retention time and m/z, stacked across experiments]
Use landmarks to derive corrections and tolerances for clustering of features across LCMS experiments
![Page 142: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/142.jpg)
Peak Matching: Recognizing Identical Features
[Figure: features plotted by retention time and m/z]
Use landmarks to derive corrections and tolerances for clustering of features across LCMS experiments
Gaussian mixture model (GMM) with parameters determined by maximizing Likelihood ratio using Expectation Maximization (EM)
Number of clusters determined using Bayesian Information Criterion (BIC)
Coalesce clusters if m/z and RT variation is within tolerance
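The final coalescing step on this slide can be sketched as below. This covers only the merge of already-fitted cluster centroids; the GMM fit by EM and the BIC model selection are not reproduced here. The greedy merge rule, the weighted-mean update, the tolerances and the example centroids are all assumptions for illustration.

```python
def coalesce(clusters, mz_tol_ppm, rt_tol):
    """Greedily merge cluster centroids (mz, rt, n) that agree within both tolerances.

    Each merged centroid is the member-count-weighted mean of its parts.
    """
    merged = []
    for mz, rt, n in sorted(clusters):
        for k, (cmz, crt, cn) in enumerate(merged):
            close_mz = abs(mz - cmz) / cmz * 1e6 <= mz_tol_ppm
            close_rt = abs(rt - crt) <= rt_tol
            if close_mz and close_rt:
                total = cn + n
                merged[k] = ((cmz * cn + mz * n) / total,
                             (crt * cn + rt * n) / total,
                             total)
                break
        else:
            # No existing cluster is close enough in both dimensions.
            merged.append((mz, rt, n))
    return merged


# Hypothetical centroids: (m/z, retention time, number of member features).
clusters = [(999.4991, 22.5, 4), (999.4993, 22.9, 3), (999.4994, 56.8, 5)]
for mz, rt, n in coalesce(clusters, mz_tol_ppm=5.0, rt_tol=1.0):
    print(f"m/z {mz:.4f}  RT {rt:.2f}  members {n}")
```

The first two centroids merge because they agree in both m/z and retention time; the third stays separate despite a nearly identical mass, because its retention time differs.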
![Page 143: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/143.jpg)
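Parameterized Peaks

| Peak ID | m/z | R.T. | z | Run 1 | Run 2 | Run 3 | Run … | Identity |
|---|---|---|---|---|---|---|---|---|
| 1 | 490.3144 | 62.0 | 3 | 607.6 | 544.2 | 581.0 | … | |
| 2 | 743.3549 | 56.2 | 3 | 694.4 | 682.6 | 691.4 | … | |
| 3 | 999.4991 | 22.5 | 2 | 209.6 | 247.6 | 232.6 | … | APEPTIDEK |
| 4 | 396.7187 | 20.5 | 3 | 321.7 | 344.9 | 318.5 | … | |
| 5 | 934.6045 | 31.7 | 2 | 722.7 | 753.0 | 701.3 | … | |
| 6 | 678.1993 | 32.4 | 3 | 371.2 | 387.2 | 441.4 | … | |
| 7 | 999.4994 | 56.8 | 2 | 857.1 | 811.0 | 750.5 | … | APDITEPEK |
| 8 | 526.6502 | 46.0 | 3 | 183.6 | 169.0 | 155.2 | … | |
| 9 | 1105.3597 | 69.4 | 3 | 1130.1 | 1075.7 | 1075.1 | … | |
| 10 | 1292.0880 | 34.5 | 2 | 709.7 | 614.0 | 656.0 | … | |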
![Page 144: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/144.jpg)
Experimental Parameters
Thermo LTQ-FT Mass Spectrometer
Resolution 100,000
1 MS, 3 MS/MS for Quantification Experiments
1 MS, 10 MS/MS for Identification Experiments
Agilent 1100 Nano-flow Chromatograph
12.5 cm x 75 μm Reprosil 3 μm C18AQ material
200 nl/min, 50’ RP gradient, 15 μm tip opening
SpectrumMill for MS/MS searches
0.05 Da tolerance, Autovalidation
MapQuant for feature detection
Leptos et al., Proteomics 6:1770 (2006)
![Page 145: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/145.jpg)
Calibration and Landmark Performance
Scale Mixture

| Protein | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| Aprotinin | 1 | 2 | 3 | 10 | 20 | 30 | 100 | 200 | 300 |
| Ribonuclease A | 300 | 1 | 2 | 3 | 10 | 20 | 30 | 100 | 200 |
| Myoglobin | 200 | 300 | 1 | 2 | 3 | 10 | 20 | 30 | 100 |
| beta-Lactoglobulin | 100 | 200 | 300 | 1 | 2 | 3 | 10 | 20 | 30 |
| alpha Casein | 30 | 100 | 200 | 300 | 1 | 2 | 3 | 10 | 20 |
| Carbonic anhydrase | 20 | 30 | 100 | 200 | 300 | 1 | 2 | 3 | 10 |
| Ovalbumin | 10 | 20 | 30 | 100 | 200 | 300 | 1 | 2 | 3 |
| Fibrinogen | 3 | 10 | 20 | 30 | 100 | 200 | 300 | 1 | 2 |
| BSA | 2 | 3 | 10 | 20 | 30 | 100 | 200 | 300 | 1 |
| Transferrin | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Plasminogen | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| beta-Galactosidase | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |

All concentrations in fmol/μl (nM)
Inject 1 μl x 5 replicates each
Peaks with IDs (avg. per run): 165 ⇒ 281 (+70%)
False positive rate:
93% p < 0.005
100% p < 0.05
False negative rate: ~2%
![Page 146: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/146.jpg)
Measurement of Ratios with Variability
Variability Mixture

| Protein | α | β |
|---|---|---|
| Aprotinin | 100 | 5 |
| Ribonuclease A | 100 | 100 |
| Myoglobin | 100 | 100 |
| beta-Lactoglobulin | 50 | 1 |
| alpha Casein | 100 | 10 |
| Carbonic anhydrase | 100 | 100 |
| Ovalbumin | 5 | 10 |
| Fibrinogen | 25 | 25 |
| BSA | 200 | 200 |
| Transferrin | 10 | 5 |
| Plasminogen | 2.5 | 25 |
| beta-Galactosidase | 1 | 10 |

The same α and β columns repeat for Person A, Person B, Person C, Person D and Person E.
All concentrations in fmol/μl (nM)
Inject 1 μl x 5 replicates each
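The "known" log2 ratios for this mixture follow directly from the design concentrations. A short sketch, assuming the ratio is taken as α over β (the slide does not state the direction) and using the concentrations from the Variability Mixture table:

```python
import math

# (alpha, beta) design concentrations in fmol/ul, copied from the Variability Mixture table.
variability_mix = {
    "Aprotinin": (100, 5),
    "Ribonuclease A": (100, 100),
    "Myoglobin": (100, 100),
    "beta-Lactoglobulin": (50, 1),
    "alpha Casein": (100, 10),
    "Carbonic anhydrase": (100, 100),
    "Ovalbumin": (5, 10),
    "Fibrinogen": (25, 25),
    "BSA": (200, 200),
    "Transferrin": (10, 5),
    "Plasminogen": (2.5, 25),
    "beta-Galactosidase": (1, 10),
}


def log2_ratio(alpha, beta):
    """Expected log2 fold change, taking alpha over beta (assumed direction)."""
    return math.log2(alpha / beta)


known = {protein: log2_ratio(a, b) for protein, (a, b) in variability_mix.items()}

# Print from most decreased to most increased in alpha relative to beta.
for protein, value in sorted(known.items(), key=lambda item: item[1]):
    print(f"{protein:20s} {value:+.2f}")
```

Under this convention the design spans roughly −3.3 (1:10) to +5.6 (50:1), with five proteins held at a ratio of 1:1 as unchanged controls.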
[Figure: two bar charts of log2 ratio per protein, Known vs. Measured. First panel x-axis: βG, Pls, Ov, CA, Alb, RNAse, Myo, Fbr β, Tfn, Cas, Apr, βLG. Second panel x-axis: βG, Pls, Ov, Fbr β, CA, BSA, Myo, RNAse, Tfn, Cas, Apr, βLG. Asterisks and tildes annotate individual bars.]
Complex Variability Mixture:
Mix α + Mitochondrial Protein from 2 wk. mouse liver
Mix β + Mitochondrial Protein from 6 wk. mouse liver
1 prep each sample, 6 injections each
![Page 147: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/147.jpg)
Discovery of Novel Markers with PEPPeR
[Figure: peptides / m/z features across samples α1 to α5 and β1 to β5]

| gi Number | Species | Name |
|---|---|---|
| 223424 | E. coli | RNA polymerase β' |
| 38491462 | E. coli | GroEL |
| 42144 | E. coli | NusA |
| 42818 | E. coli | RNA polymerase β |
| 42900 | E. coli | Ribosomal protein S1 |
| 26249756 | E. coli | Argininosuccinate synthase |
| 8099322 | B. taurus | κ-casein |

B-Galactosidase had 1:10 ratio!
Casein had 10:1 ratio!
Designed accurate mass ‘inclusion lists’ to hit these targets
Confident IDs of previously identified peptides agree 100% of the time (59/59)
60 novel confident peptide IDs
25 belong to proteins in the mix
24/25 are changing
35 are from proteins not designed to be in the mixture
![Page 148: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/148.jpg)
PEPPeR and GenePattern
GenePattern is a suite of tools originally developed for microarray analysis
AIM: reproducible research through well-defined processing pipelines
Many analysis modules available
PEPPeR: Landmark Matching and Peak Matching
Daisy-chainable into pipelines
Feed into statistical tools
![Page 149: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/149.jpg)
PEPPeR in GenePattern
![Page 150: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/150.jpg)
Insert your favorite stuff here…
Landmark Matching is platform agnostic
Need to get your data into a few simple flat-file formats and then zip them up together
Search engines: SEQUEST, SpectrumMill, Mascot, etc.
Peak pickers: MAPQUANT, msInspect, DeCyder, etc.
Some helper apps can be found with the PEPPeR bundle on the GenePattern website
All works via web-client interface
Just press go
![Page 151: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/151.jpg)
Landmark Matching Output
The main output is a zipped directory of all the processed files. This can be used as input into the PeakMatch module.
It is a good idea to check the error log to make sure that everything was processed correctly.
![Page 152: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/152.jpg)
Peak Matching Interface
![Page 153: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/153.jpg)
GenePattern Downstream Tools
Differential analysis/marker selection
Gene/Class neighbors
Comparative marker selection
Gene Set Enrichment Analysis
Class Prediction – supervised learning – with cross-validation
Regression trees
K-nearest neighbors
Neural networks
Support Vector Machine
Class Discovery – unsupervised learning
Hierarchical clustering
Self-organizing maps
Principal Component Analysis
Data Visualization
Heat Maps, etc.
![Page 154: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/154.jpg)
Summary – what I hope you learned
One basis for peak picking based on image processing techniques (MAPQUANT)
PEPPeR: Landmark Matching and Peak Matching
Keep track of all of those pesky peaks that you picked!
GenePattern: A web-based tool to coordinate reproducible research
An entrée into downstream discovery methods in an automated pipeline (more GenePattern)
![Page 155: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/155.jpg)
Acknowledgements
Broad Proteomics:
Steve Carr
D.R. Mani
Vincent Fusaro
Broad GenePattern Team:
Michael Reich
Josh Gould
Church Lab, Harvard Medical School
George Church
Kyriacos Leptos
![Page 156: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/156.jpg)
URLs:
PEPPeR / GenePattern:
http://www.broad.mit.edu/cancer/software/genepattern/
http://www.broad.mit.edu/cancer/software/genepattern/desc/proteomics.html
MAPQUANT:
http://arep.med.harvard.edu/MapQuant/
![Page 157: Data Extraction and Analysis for LC-MS Based Proteomics · 2016-01-06 · Data Extraction and Analysis for LC-MS Based Proteomics Instructors Jake Jaffe2, Deep Jaitly1, and Matt Monroe1](https://reader036.vdocument.in/reader036/viewer/2022081406/5f106f817e708231d44918d7/html5/thumbnails/157.jpg)
Course Outline
Introduction
Feature discovery in LC-MS datasets
Feature discovery in individual spectra
Feature definition over elution time
Identifying LC-MS Features using an AMT tag DB
Break
AMT tag Pipeline Demo
MAPQUANT, PEPPeR and GenePattern
Panel Discussion