Identification of variables and parameters for protein data analysis in clinical diagnostics

David Yang

Leighton Ing

Mentor: Dr. Tina Xiao

JPL/NASA

Proteomics

National Cancer Institute and Early Detection Research Network
• Clinical diagnostics
• Analyzing protein signatures for general characterization of normal vs. pathogenic states

Project Goals

• Characterize the experimental variables which affect Mass Spectrometry (MS) output and the necessary steps of MS data processing
  • What influences output and how do we correct for those influences?
  • What information do other users need?
• Identify parameters for software evaluation in the processing of MS data.

Methodology

• Research a method of protein analysis
• Research the mechanics
• Analyze how the mechanics influence the output
• Recognize data important to other users
• Identify the data processing steps for extracting a useful spectrum

Method of Protein Analysis

• Mass spectrometry
  • Measures quantity of molecules with specific mass-to-charge ratios
  • Produces output which could be used as a protein signature
• Matrix Assisted Laser Desorption/Ionization Time of Flight for protein analysis

Matrix Assisted Laser Desorption/Ionization (MALDI)

[Figure: MALDI schematic; labels: Light, Protein sample, Mass Analyzer]

Time of Flight (TOF)

Ionized particles accelerated by an electric field
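The relation behind time-of-flight separation can be sketched as follows, assuming an idealized linear TOF tube: an ion of mass m and charge z, accelerated through a potential U, drifts a field-free length L in time t = L * sqrt(m / (2 z e U)). The function name, the 20 kV potential, and the 1 m drift length are illustrative assumptions, not values from the slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, in kilograms

def flight_time(mass_da, charge=1, potential_v=20_000.0, drift_m=1.0):
    """Idealized drift time (seconds) of an ion in a linear TOF tube.

    potential_v and drift_m are hypothetical instrument settings.
    """
    kinetic_energy = charge * E_CHARGE * potential_v            # z * e * U
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return drift_m / velocity

# Heavier ions arrive later: flight time scales with sqrt(m/z).
ratio = flight_time(4000.0) / flight_time(1000.0)
```

Because flight time grows with the square root of m/z, quadrupling the mass doubles the arrival time in this idealized model.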

MALDI-TOF-MS

MALDI TOF Mass Spectrometry of a protein sample has three elements with parameters that influence output
• Inconsistencies between them reduce the ability to compare samples
• They produce variation which is not necessarily caused by the protein composition of the sample

Sample

• Freeze/thaw cycles
• Source of sample
  • Serum vs. tissue
• Fractionated?
• Digested w/ protease?

Laser Ionization/Desorption

• Plate and matrix used in LDI
• Crystallization pattern
• Laser intensity

Plate and Matrix

Crystallization

• Randomized process
• Introduces variation between shots

Mass analyzer

• Mass calibration
  • Internal vs. external
• Reflectron usage
• Detector voltage
• Detector saturation

Mass Calibration

• Internal: Sample + Standard (measured together)
• External: Sample and Standard (measured separately)

Reflectron

Output Processing

• Understanding the mechanics tells us what we need to do to process the output
• Usability of raw output for protein signature comparison is limited

Baseline Correction

High KE ions saturate the detector, resulting in a higher intensity output

Malyarenko et al. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques
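One simple way to estimate and remove an elevated baseline is a moving-window minimum. This is a swapped-in illustration, not the time-series method of Malyarenko et al.; the function name and the window size are assumptions made for the sketch.

```python
def subtract_baseline(intensities, half_window=50):
    """Subtract a moving-window-minimum baseline estimate from a spectrum.

    half_window is a hypothetical tuning parameter: the number of points
    on each side used to estimate the local baseline.
    """
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline = min(intensities[lo:hi])      # local floor of the signal
        corrected.append(intensities[i] - baseline)
    return corrected

# A decaying baseline with one peak riding on top of it (made-up data).
spectrum = [100 - i for i in range(10)]
spectrum[5] += 50
flattened = subtract_baseline(spectrum, half_window=2)
```

The window should be wider than the widest expected peak; otherwise the peak itself is treated as baseline and flattened.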

Mass Calibration

Required to convert time series output into m/z ratio

Time    Intensity
1       2183
2       2152
3       2118
4       2115
5       2086

M/Z          Intensity
9.9294487    2183
9.9375644    2152
9.9455109    2118
9.9532881    2115
9.9608962    2086
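The conversion from time to m/z can be sketched as a calibration fit. In an idealized TOF instrument sqrt(m/z) is linear in flight time, so calibrant peaks of known mass give a straight-line fit. The function names and the calibrant values below are hypothetical and are not taken from the slides; real instruments often add higher-order terms.

```python
import math

def fit_calibration(times, known_mz):
    """Least-squares fit of sqrt(m/z) = slope * t + intercept."""
    roots = [math.sqrt(m) for m in known_mz]
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(roots) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, roots))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    return slope, intercept

def time_to_mz(t, slope, intercept):
    """Apply the fitted calibration to one time point."""
    return (slope * t + intercept) ** 2

# Made-up calibrants generated from sqrt(m/z) = 0.5 * t + 1.
cal_times = [10.0, 20.0, 40.0]
cal_mz = [36.0, 121.0, 441.0]
slope, intercept = fit_calibration(cal_times, cal_mz)
mz_at_30 = time_to_mz(30.0, slope, intercept)
```

With internal calibration the calibrant peaks come from the same spectrum as the sample; with external calibration the fitted slope and intercept are carried over from a separate standard run.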

Normalization

Scale the intensities based on the largest intensity

Improves ability to compare samples by reducing the variability of intensity between spectra

www.psrc.usm.edu/mauritz/maldi.html
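Scaling by the largest intensity can be sketched as below. The function name is an assumption; the example values are the five intensities shown in the mass calibration table.

```python
def normalize_to_max(intensities):
    """Scale a spectrum so that its largest intensity becomes 1.0."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity to scale by")
    return [value / peak for value in intensities]

# Intensities from the mass calibration table.
scaled = normalize_to_max([2183, 2152, 2118, 2115, 2086])
```

After scaling, every spectrum has a maximum of 1.0, so relative peak heights can be compared between samples even when absolute detector response differs.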

Smoothing

Decrease effects of electrical system noise
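A moving average is one of the simplest smoothing filters and is shown here only as an illustration; the slides do not say which filter was used. The function name and window size are assumptions.

```python
def moving_average(intensities, half_window=2):
    """Smooth a spectrum with a centered moving average.

    Near the edges the window shrinks to the points that exist.
    """
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# A single noise spike is spread out and lowered (made-up data).
smoothed = moving_average([0, 0, 10, 0, 0], half_window=1)
```

A wider window removes more noise but also broadens and lowers real peaks, so the window is usually kept narrower than the expected peak width.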

Peak detection

• Identify potential masses
• Reduces number of features which need to be compared
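A minimal peak picker, assuming peaks are local maxima above an intensity threshold. This is a simpler stand-in than the wavelet-based pattern matching cited from Do (2006); the function name and threshold parameter are assumptions.

```python
def find_peaks(mz, intensities, min_intensity=0.0):
    """Return (m/z, intensity) pairs for local maxima above a threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        rises = y > intensities[i - 1]          # strictly higher than the left neighbor
        falls = y >= intensities[i + 1]         # not lower than the right neighbor
        if y >= min_intensity and rises and falls:
            peaks.append((mz[i], y))
    return peaks

# Seven data points reduce to two candidate features (made-up data).
peaks = find_peaks([1, 2, 3, 4, 5, 6, 7], [0, 5, 1, 0, 9, 2, 0], min_intensity=3)
```

Running this on a smoothed, baseline-corrected spectrum reduces thousands of raw points to a short list of candidate masses.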

Where am I?

Peak alignment

Aligns corresponding peaks across samples

Reduces phase variation across samples by ensuring that peptides share their set of peak locations
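One simple alignment strategy is to snap each detected peak to the nearest location in a shared reference list when it falls within a tolerance. This is a sketch under that assumption, not the method used in the cited work; the function name, the tolerance, and the example masses are hypothetical.

```python
def align_peaks(peak_mz, reference_mz, tolerance=0.5):
    """Snap each peak to the nearest reference location within tolerance.

    Peaks with no reference location close enough are left unchanged.
    """
    aligned = []
    for mz in peak_mz:
        nearest = min(reference_mz, key=lambda ref: abs(ref - mz))
        if abs(nearest - mz) <= tolerance:
            aligned.append(nearest)
        else:
            aligned.append(mz)
    return aligned

# Two peaks move onto the shared locations; the third has no match.
aligned = align_peaks([1000.2, 1499.7, 1800.0], [1000.0, 1500.0])
```

After alignment, the same peptide is reported at the same m/z in every sample, so intensities can be compared column by column.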

Averaging of spectra

• Address variability between runs by averaging replicates
  • Recall crystallization and shot variability
• Averaging of multiple laser shots is often performed by the machine
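Averaging replicates can be sketched as a pointwise mean, assuming every replicate has already been calibrated onto the same m/z axis. The function name and the sample values are illustrative.

```python
def average_spectra(replicates):
    """Pointwise mean of replicate spectra sampled on the same axis."""
    lengths = {len(r) for r in replicates}
    if len(lengths) != 1:
        raise ValueError("replicates must be non-empty and equally long")
    count = len(replicates)
    return [sum(values) / count for values in zip(*replicates)]

# Two replicate runs of the same sample (made-up intensities).
mean_spectrum = average_spectra([[1, 2, 3], [3, 4, 5]])
```

Random shot-to-shot variation tends to cancel in the mean, while peaks present in every replicate are preserved.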

Results

• Identified vital information that affects the output of the machine
  • Information useful for a researcher using the spectra
• Researched the processes which make the output more useful as a protein signature

Next step: Identify parameters for software evaluation in MS data processing

Goal: Identify parameters for evaluating software capabilities in the processing and analysis of Mass Spectrometry data.

Three candidates
• VIBE (Incogen Inc.)
• geWorkbench (Forge)
• S-PLUS (Insightful Corp.)

Software Evaluation

• General parameters
• Input formats
• Algorithms for processing and analysis of proteomics data
• Results
• Benefits
• Limitations

General Parameters

• Platform/operating system compatibility?
• Is the software open source?
• Is the software capable of performing the necessary tasks independently?
  • Additional modifications?
  • Internet access?
  • Server?

Data Input

• What types of file formats can the software open? Import?
• What type of format must the data be?
  • DNA (nucleotides: A, T, G, C)
  • Proteins (amino acids: M, L, A, I, etc.)

Software algorithms necessary for Proteomics data analysis

Can the software perform:
• Baseline subtractions?
• Mass calibrations?
• Noise reductions?
• Peak identifications?
• Normalization?
• Peak alignments?
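The checklist above can be applied mechanically when comparing packages. The sketch below is a hypothetical helper, not part of any of the evaluated tools; the package capabilities in the example are made up.

```python
REQUIRED_STEPS = {
    "baseline subtraction",
    "mass calibration",
    "noise reduction",
    "peak identification",
    "normalization",
    "peak alignment",
}

def missing_steps(supported):
    """Return the required processing steps a package does not cover."""
    covered = {step.lower() for step in supported}
    return sorted(REQUIRED_STEPS - covered)

# A made-up package that only normalizes and aligns peaks.
gaps = missing_steps(["Normalization", "Peak alignment"])
```

An empty result means the package covers every processing step on the list.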

Baseline Subtraction

(Malyarenko, et al. 2005)

Mass Calibration

(Kearsley, et al. 2005)

Smoothing/Noise Reduction

(Malyarenko, et al. 2005)

Peak Identifications

(Do, 2006)

Normalization

(Kearsley, et al. 2005)

Peak Alignments

(Malyarenko, et al. 2005)

Results

• Visualization of results
  • How can you visualize the data?
• Save/Export work
  • Can you save/export your results?
  • If yes, what format can it save/export?
  • Once saved, can the files be opened by other software packages?
• Print out
  • Can you print out a hard copy for record?

Visualization

MUSCLE (Edgar)

VIBE (Incogen Inc.)

Software Benefits

What benefits does the software offer?
• Convenience of integrated modules
• Efficient: saves the “man-power” of having to sit there and do everything
• User-friendly interface

Convenience of Integrated Modules

Efficiency

User-friendly Interface?

Software Limitations

• Limitations on customization
  • Small modifications to existing modules?
  • Adding a new module?
• Internet/server dependent?

Conclusion: We have identified these parameters to be crucial for the processing of MS data.
• Baseline subtractions
• Mass calibrations
• Noise reductions
• Peak identifications
• Normalization
• Peak alignments

Conclusion

• VIBE: Capable of manipulating protein sequences, but unable to process raw data.
• geWorkbench: Did not pass general parameters for installation.
• S-PLUS: Evaluation still in progress…

VIBE (by Incogen Inc.)

Convenient integration of nucleotide and amino acid analysis tools
• BLAST (-X, -N, -P, TBLASTN, TBLASTP)
• Nucleotide and AA search
  • FASTA, -X, -Y, Smith-Waterman, etc.
• Sequence manipulations
  • Primer3, Conditional Filters, Translations, etc.
• Sequence alignments
  • Crossmatch, ClustalW, Hidden Markov Model, etc.

Literature Citations

1) Do, P. Improved Peak Detection in Mass Spectrometry Spectrum by Incorporating Continuous Wavelet Transform-based Pattern Matching. Robert H. Lurie Comprehensive Cancer Center, Northwestern University. ppt slides. 2006.

2) Kearsley, A., Wallace, W.E., Bernal, J., and C.M. Guttman. A numerical method for mass spectral data analysis. Applied Mathematics Letters. 18:1412-1417. 2005.

3) Malyarenko, D.I., Cooke, W.E., Adam, B-L., Malik, G., Chen, H., Tracy, E.R., Trosset, M.W., Sasinowski, M., Semmes, O.J. and D.M. Manos. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques. Clinical Chemistry. 51(1):65-74. 2005.

Acknowledgements

Jet Propulsion Laboratory
• Dr. Tina Xiao

Southern California Bioinformatics Summer Institute (SoCalBSI)
• Dr. Sandra Sharp
• Dr. Jamil Momand
• Dr. Wendie Johnston
• Dr. Nancy Warter-Perez
• Ronnie Cheng
• Friends

Duke University Medical Center
• Dr. Simon Lin

Centers for Disease Control and Prevention (CDC)
• Dr. R Cameron Craddock

Huntington Medical Research Institute (HMRI)
• Dr. James Riggins
• Dr. Alfred Fonteh
