Post on 20-Dec-2015
Identification of variables and parameters for protein data
analysis in clinical diagnostics
David Yang
Leighton Ing
Mentor: Dr. Tina Xiao
JPL/NASA
Proteomics
National Cancer Institute and Early Detection Research Network
• Clinical diagnostics
• Analyzing protein signatures for general characterization of normal vs. pathogenic states
Project Goals
• Characterize the experimental variables which affect Mass Spectrometry (MS) output and the necessary steps of MS data processing
  • What influences output and how do we correct for those influences?
  • What information do other users need?
• Identify parameters for software evaluation in the processing of MS data.
Methodology
• Research a method of protein analysis
• Research the mechanics
• Analyze how the mechanics influence the output
• Recognize data important to other users
• Identify the data processing steps for extracting a useful spectrum
Method of Protein Analysis
• Mass spectrometry
  • Measures quantity of molecules with specific mass-to-charge ratios
  • Produces output which could be used as a protein signature
• Matrix Assisted Laser Desorption/Ionization Time of Flight for protein analysis
Matrix Assisted Laser Desorption/Ionization (MALDI)
[Figure: MALDI schematic with labels: Light, Protein sample, Mass analyzer]
Time of Flight (TOF)
Ionized particles accelerated by an electric field
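The time-of-flight principle can be sketched numerically. The following is a minimal illustration, assuming an idealized linear drift tube in which an ion gains kinetic energy z·e·U in the source and then coasts over a fixed distance. The function name, the 20 kV voltage, and the 1 m tube length are illustrative assumptions, not values from the presentation.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Idealized drift time in seconds for an ion in a linear TOF tube."""
    mass_kg = mass_da * DALTON
    # Energy gained in the source: z * e * U = (1/2) * m * v**2
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


# Hypothetical instrument settings: 20 kV acceleration, 1 m drift tube.
t_light = flight_time(1000.0, 1, 20000.0, 1.0)
t_heavy = flight_time(4000.0, 1, 20000.0, 1.0)
```

Under these assumptions, an ion with four times the mass (same charge) takes twice as long to arrive, which is why flight time can be converted into a mass-to-charge ratio.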
MALDI-TOF-MS
MALDI-TOF mass spectrometry of a protein sample has three elements with parameters that influence output.
• Inconsistencies between them reduce the ability to compare samples
• They produce variation which is not necessarily caused by the protein composition of the sample
Sample
• Freeze/thaw cycles
• Source of sample: serum vs. tissue
• Fractionated?
• Digested with protease?
Laser Ionization/Desorption
• Plate and matrix used in LDI
• Crystallization pattern
• Laser intensity
Plate and Matrix
Crystallization
• Randomized process
• Introduces variation between shots
Mass analyzer
• Mass calibration: internal vs. external
• Reflectron usage
• Detector voltage
• Detector saturation
Mass Calibration
• Internal: sample + standard (measured together)
• External: sample and standard (measured separately)
Reflectron
Output Processing
• Understanding the mechanics tells us what we need to do to process the output
• Usability of raw output for protein signature comparison is limited
Baseline Correction
High KE ions saturate the detector, resulting in a higher intensity output
Malyarenko et al. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques
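A baseline correction step can be illustrated with a deliberately simple stand-in. The sketch below estimates the baseline as a rolling minimum and subtracts it. This is an assumption chosen for brevity, not the time-series method of Malyarenko et al., and the function name, window size, and sample intensities are hypothetical.

```python
def subtract_baseline(intensities, half_window=2):
    """Subtract a rolling-minimum baseline estimate from each point."""
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        # The local minimum serves as a crude estimate of the baseline.
        baseline = min(intensities[lo:hi])
        corrected.append(intensities[i] - baseline)
    return corrected


# Hypothetical raw trace: two peaks riding on a slowly falling baseline.
raw = [10, 11, 30, 12, 11, 10, 9, 25, 9, 8]
corrected = subtract_baseline(raw, half_window=2)
```

The window must be wider than a typical peak, otherwise the peak itself is treated as baseline and flattened.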
Mass Calibration
Required to convert time series output into m/z ratio
Time    Intensity
1       2183
2       2152
3       2118
4       2115
5       2086

M/Z          Intensity
9.9294487    2183
9.9375644    2152
9.9455109    2118
9.9532881    2115
9.9608962    2086
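The conversion from flight time to m/z can be sketched as a two-point calibration. This assumes the common idealized relation sqrt(m/z) = k · (t − t0), with k and t0 solved from two standards of known mass. The function names and calibrant values are hypothetical and are not the ones used to produce the table above.

```python
import math


def fit_two_point(t1, mz1, t2, mz2):
    """Solve sqrt(m/z) = k * (t - t0) from two calibrant peaks."""
    k = (math.sqrt(mz2) - math.sqrt(mz1)) / (t2 - t1)
    t0 = t1 - math.sqrt(mz1) / k
    return k, t0


def time_to_mz(t, k, t0):
    """Convert a flight time to m/z with the fitted constants."""
    return (k * (t - t0)) ** 2


# Hypothetical standards: m/z 1000 arrives at t=10, m/z 4000 at t=20.
k, t0 = fit_two_point(10.0, 1000.0, 20.0, 4000.0)
mz_mid = time_to_mz(15.0, k, t0)
```

With internal calibration the two standards would be peaks within the sample spectrum itself; with external calibration the constants would come from a separate run and then be applied to the sample.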
Normalization
Scale the intensities based on the largest intensity
Improves ability to compare samples by reducing the variability of intensity between spectra
www.psrc.usm.edu/mauritz/maldi.html
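Scaling by the largest intensity can be written in a few lines. This is a minimal sketch with a hypothetical function name; the input reuses the five intensity values from the mass calibration table purely as sample numbers.

```python
def normalize_to_max(intensities):
    """Scale a spectrum so that its largest intensity becomes 1.0."""
    if not intensities:
        return []
    peak = max(intensities)
    if peak == 0:
        # An all-zero spectrum has nothing to scale.
        return [0.0 for _ in intensities]
    return [value / peak for value in intensities]


normalized = normalize_to_max([2183, 2152, 2118, 2115, 2086])
```

After this step every spectrum spans the same 0 to 1 range, so intensities can be compared across samples even when the absolute detector response differed.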
Smoothing
Decrease effects of electrical system noise
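One simple way to reduce high-frequency noise is a moving average. The sketch below is an illustrative choice, not necessarily the filter used by any of the tools discussed; the function name and window size are assumptions.

```python
def moving_average(intensities, half_window=1):
    """Replace each point by the mean of its neighbourhood."""
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = intensities[lo:hi]  # shorter at the edges of the spectrum
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical noisy trace that alternates between 0 and 3.
noisy = [0, 3, 0, 3, 0, 3, 0]
smoothed = moving_average(noisy, half_window=1)
```

A wider window suppresses more noise but also broadens and lowers real peaks, so the window size is a trade-off against resolution.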
Peak detection
• Identify potential masses
• Reduces number of features which need to be compared
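Peak detection can be illustrated as a search for local maxima above an intensity threshold. This is a basic sketch under that assumption, not the wavelet-based method cited later in the presentation; the function name, threshold, and sample spectrum are hypothetical.

```python
def detect_peaks(intensities, threshold):
    """Return indices of local maxima at or above the threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        rises = intensities[i] > intensities[i - 1]
        falls_or_flat = intensities[i] >= intensities[i + 1]
        if rises and falls_or_flat and intensities[i] >= threshold:
            peaks.append(i)
    return peaks


# Hypothetical baseline-corrected spectrum with two clear peaks.
spectrum = [0, 1, 20, 2, 2, 1, 0, 17, 1, 0]
peaks = detect_peaks(spectrum, threshold=5)
```

Here a ten-point spectrum reduces to two candidate features, which is the reduction in comparison work that the slide describes.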
Where am I?
Peak alignment
Aligns corresponding peaks across samples
Reduces phase variation across samples by ensuring that corresponding peptides share the same set of peak locations
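Alignment can be sketched as snapping each detected peak to the nearest entry in a shared reference list when it falls within a tolerance. The function name, the reference masses, and the tolerance below are all hypothetical; real tools may use more elaborate warping.

```python
def align_peaks(sample_mz, reference_mz, tolerance):
    """Map each sample peak to the nearest reference peak, or None."""
    if not reference_mz:
        return [None for _ in sample_mz]
    aligned = []
    for mz in sample_mz:
        nearest = min(reference_mz, key=lambda ref: abs(ref - mz))
        # Only accept the match when it is close enough to be the same peptide.
        aligned.append(nearest if abs(nearest - mz) <= tolerance else None)
    return aligned


reference = [1000.0, 2000.0, 3000.0]
sample = [1000.4, 1999.7, 2500.0]
aligned = align_peaks(sample, reference, tolerance=1.0)
```

Peaks that map to `None` have no counterpart in the reference and would need to be handled separately, for example as sample-specific features.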
Averaging of spectra
• Address variability between runs by averaging replicates
• Recall crystallization and shot variability
• Averaging of multiple laser shots is often performed by the machine
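Averaging replicates amounts to a point-by-point mean. The sketch below assumes the replicate spectra have already been calibrated onto the same m/z axis and have equal length; the function name and sample values are hypothetical.

```python
def average_spectra(spectra):
    """Point-wise mean of replicate spectra sharing one m/z axis."""
    if not spectra:
        return []
    length = len(spectra[0])
    if any(len(spectrum) != length for spectrum in spectra):
        raise ValueError("all replicate spectra must have the same length")
    # zip(*spectra) groups the intensities recorded at each position.
    return [sum(values) / len(spectra) for values in zip(*spectra)]


replicates = [[10, 20, 30], [12, 18, 33], [8, 22, 27]]
mean_spectrum = average_spectra(replicates)
```

Random shot-to-shot differences tend to cancel in the mean, while features present in every replicate are retained.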
Results
• Identified vital information that affects the output of the machine
  • Information useful for a researcher using the spectra
• Researched the processes which make the output more useful as a protein signature
Next step: Identify parameters for software evaluation in MS data processing
Goal: Identify parameters for evaluating software capabilities in the processing and analysis of Mass Spectrometry data.
Three candidates
• VIBE (Incogen Inc.)
• geWorkbench (Forge)
• S-PLUS (Insightful Corp.)
Software Evaluation
• General parameters
• Input formats
• Algorithms for processing and analysis of proteomics data
• Results
• Benefits
• Limitations
General Parameters
• Platform/operating system compatibility?
• Is the software open source?
• Is the software capable of performing the necessary tasks independently?
  • Additional modifications?
  • Internet access?
  • Server?
Data Input
• What types of file formats can the software open? Import?
• What type of format must the data be?
  • DNA (nucleotides: A, T, G, C)
  • Proteins (amino acids: M, L, A, I, etc.)
Software algorithms necessary for Proteomics data analysis
Can the software perform:
• Baseline subtractions?
• Mass calibrations?
• Noise reductions?
• Peak identifications?
• Normalization?
• Peak alignments?
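The checklist above can be expressed as a small capability check. This is only a sketch of how an evaluator might record the answers; the function name and the example capability list are hypothetical and do not describe any of the three candidate packages.

```python
REQUIRED_STEPS = [
    "baseline subtraction",
    "mass calibration",
    "noise reduction",
    "peak identification",
    "normalization",
    "peak alignment",
]


def missing_steps(supported):
    """List the required processing steps a package does not cover."""
    supported_set = {step.lower() for step in supported}
    return [step for step in REQUIRED_STEPS if step not in supported_set]


# Hypothetical package that only offers two of the six steps.
example_supported = ["Normalization", "Noise reduction"]
gaps = missing_steps(example_supported)
```

An empty result would mean the package covers every processing step on the list.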
Baseline Subtraction
(Malyarenko, et al. 2005)
Mass Calibration
(Kearsley, et al. 2005)
Smoothing/Noise Reduction
(Malyarenko, et al. 2005)
Peak Identifications
(Do, 2006)
Normalization
(Kearsley, et al. 2005)
Peak Alignments
(Malyarenko, et al. 2005)
Results
• Visualization of results
  • How can you visualize the data?
• Save/Export work
  • Can you save/export your results?
  • If yes, what format can it save/export?
  • Once saved, can the files be opened by other software packages?
• Print out
  • Can you print out a hard copy for record?
Visualization
MUSCLE (Edgar)
VIBE (Incogen Inc.)
Software Benefits
What benefits does the software offer?
• Convenience of integrated modules
• Efficient: saves the “man-power” of having to sit there and do everything
• User-friendly interface
Convenience of Integrated Modules
Efficiency
User-friendly Interface?
Software Limitations
• Customization
  • Small modifications to existing modules?
  • Adding a new module?
Internet/Server Dependent?
Conclusion
We have identified these parameters to be crucial for the processing of MS data.
• Baseline subtractions
• Mass calibrations
• Noise reductions
• Peak identifications
• Normalization
• Peak alignments
Conclusion
• VIBE: Capable of manipulating protein sequences, but unable to process raw data.
• geWorkbench: Did not pass general parameters for installation.
• S-Plus: Evaluation still in progress…
VIBE (by Incogen Inc.)
Convenient integration of nucleotide and amino acid analysis tools
• BLAST (–X, –N, –P, TBLASTN, TBLASTP)
• Nucleotide and AA search
  • FASTA, –X, –Y, Smith-Waterman, etc.
• Sequence manipulations
  • Primer3, Conditional Filters, Translations, etc.
• Sequence alignments
  • Crossmatch, ClustalW, Hidden Markov Model, etc.
Literature Citations
1) Do, P. Improved Peak Detection in Mass Spectrometry Spectrum by Incorporating Continuous Wavelet Transform-based Pattern Matching. Robert H. Lurie Comprehensive Cancer Center, Northwestern University. ppt slides. 2006.
2) Kearsley, A., Wallace, W.E., Bernal, J., and C.M. Guttman. A numerical method for mass spectral data analysis. Applied Mathematics Letters. 18:1412–1417. 2005.
3) Malyarenko, D.I., Cooke, W.E., Adam, B-L., Malik, G., Chen, H., Tracy, E.R., Trosset, M.W., Sasinowski, M., Semmes, O.J. and D.M. Manos. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques. Clinical Chemistry. 51(1):65-74. 2005.
Acknowledgements
Jet Propulsion Laboratory
• Dr. Tina Xiao
Southern California Bioinformatics Summer Institute (SoCalBSI)
• Dr. Sandra Sharp
• Dr. Jamil Momand
• Dr. Wendie Johnston
• Dr. Nancy Warter-Perez
• Ronnie Cheng
• Friends
Duke University Medical Center
• Dr. Simon Lin
Centers for Disease Control and Prevention (CDC)
• Dr. R Cameron Craddock
Huntington Medical Research Institute (HMRI)
• Dr. James Riggins
• Dr. Alfred Fonteh