Automatic annotation of N-glycan species in MALDI-TOF-TOF spectra for rapid profiling and comparing
Chuan-Yih Yu
2010.05.14 Capstone
Advisor: Prof. Haixu Tang
Indiana University Bloomington School of Informatics and Computing
Outline
• Introduction
  – Glycoprotein, monosaccharides, N-linked glycosylation, and mass spectrometry
• Problem set
• Goals
• MultiNGlycan
• Result
• Future works
Introduction
• Post-translational modification (PTM)
  – An enzyme-catalyzed change to a protein after it is synthesized
  – Acetylation, cleavage, glycosylation, methylation, phosphorylation, and prenylation
• An estimated 50% of all eukaryotic proteins are glycosylated [1] [Apweiler, et al.]
1. Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, 1999. 1473(1): p. 4-8.
http://yahoo.brand.edgar-online.com/EFX_dll/EDGARpro.dll?FetchFilingHTML1?SessionID=WD8AC7y2l3h1FMr&ID=5101862
Glycosylation
• N-linked glycosylation
  – Core structure: 2 GlcNAc + 3 Man
  – Attached at Asn in Asn-X-Ser or Asn-X-Thr, where X can be any amino acid except Pro (glycosylation sequon)
  – Glycosylation occurs before folding
• O-linked glycosylation
  – Many different core structures
  – Attached at serine or threonine
  – Glycosylation occurs after folding
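The N-linked sequon rule above (Asn-X-Ser or Asn-X-Thr, with X not Pro) can be sketched as a pattern search over a one-letter amino acid sequence. This is a minimal illustration, not part of MultiNGlycan; the function name and the example sequence are hypothetical.

```python
import re

# N-X-S/T sequon, where X is any residue except proline (P).
# The lookahead lets overlapping sequons (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein_sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein_sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    print(find_sequons("MNGSANPTQNATK"))  # N-P-T is skipped because X is Pro
```

A matching sequon only marks a potential site; whether the site is actually glycosylated has to be determined experimentally.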
N-linked glycosylation
• Tree structure
• Monosaccharides: building blocks of the polysaccharide chain
• Diverse linkage: at most four branches
• Three types of N-linked glycan tree
  – High mannose
  – Complex
  – Hybrid
Graphs: Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p
Name               Molecular formula
Mannose (Man)      C6H12O6
Galactose (Gal)    C6H12O6
Fucose (Fuc)       C6H12O5
GlcNAc             C8H15NO6
NeuNAc             C11H19NO9
NeuNGc             C11H19NO10
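As a rough illustration of how the formulas in the table translate into masses, the sketch below computes monoisotopic masses from standard isotope masses. It assumes native (underivatized), neutral glycans, so the numbers will generally not match derivatized or adducted species observed in a spectrum. The function names and dictionary keys are chosen for this example only.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Molecular formulas of the free monosaccharides, taken from the table above.
FORMULAS = {
    "Man": "C6H12O6",
    "Gal": "C6H12O6",
    "Fuc": "C6H12O5",
    "GlcNAc": "C8H15NO6",
    "NeuNAc": "C11H19NO9",
    "NeuNGc": "C11H19NO10",
}


def formula_mass(formula):
    """Monoisotopic mass of a molecular formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


WATER = formula_mass("H2O")


def glycan_mass(composition):
    """Monoisotopic mass of a native, neutral glycan.

    Each glycosidic bond releases one water, so n residues lose n - 1 waters.
    """
    n_residues = sum(composition.values())
    free = sum(formula_mass(FORMULAS[name]) * k for name, k in composition.items())
    return free - (n_residues - 1) * WATER


if __name__ == "__main__":
    # N-glycan core structure: 2 GlcNAc + 3 Man.
    print(round(glycan_mass({"GlcNAc": 2, "Man": 3}), 4))
```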
MALDI-TOF-TOF
• Matrix-assisted laser desorption/ionization (MALDI)
• Time of flight (TOF)
Graph: MALDI-TOF Mass Analysis. (2008, 11 16). Retrieved May 2, 2009, from The Protein Facility of the Iowa State University Office of Biotechnology www.protein.iastate.edu/maldi.html
Problem Sets
Glycopeptide isotope pattern overlap
Graphs: Isotope Pattern Calculator v4.0 http://yanjunhua.tripod.com/pattern.htm http://en.wikipedia.org/wiki/Carbon
2 GlcNAc + 9 Man = 2374.5960
7 GlcNAc + 3 Man = 2375.63
Mass    2 GlcNAc + 9 Man (%)    7 GlcNAc + 3 Man (%)
2371    0.0
2372    84.3                    0.0
2373    100.0                   82.4
2374    68.5                    100.0
2375    34.3                    68.8
2376    13.9                    34.4
Problem Sets
High-throughput glycan profiling
http://www.functionalglycomics.org
Goals
• Glycan profile correlation
  – Report scores for non-overlapping and overlapping profiles
  – Glycan examination
• Glycan profile comparison
  – Report significant glycans between groups
  – Glycan biomarker discovery
Glycan Profile Correlation
• For each glycan combination
  – 412 different glycan combinations [Krambeck, et al.] [1]
  – Generate a theoretical isotope pattern
  – Calculate the correlation for the following cases
    1. Glycan
    2. Glycan + glycan, linear combination applied
    3. Glycan + unknown, linear combination applied
• Mercury algorithm [2]
  – Generate the unknown isotope pattern
1. Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, 2005. 92(6): p. 711-28.
2. Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, 1995. 67: p. 2699-2704.
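To make "generate a theoretical isotope pattern" concrete, here is a minimal sketch that builds a nominal-mass isotope pattern from an elemental composition by direct convolution of per-element isotope abundances. This is not the Mercury algorithm cited above (which is Fourier-transform based); it is a simpler stand-in that illustrates the same idea. The abundance values are standard natural abundances, and the function names are assumptions of this example.

```python
# Natural isotope abundances, indexed by extra neutrons (+0, +1, +2).
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, max_len):
    """Convolve two distributions, keeping only the first max_len peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(counts, n_peaks=6):
    """Relative intensities (base peak = 100) at M+0, M+1, ... for an
    elemental composition such as {"C": 6, "H": 12, "O": 6}."""
    pattern = [1.0]
    for element, n in counts.items():
        base = ABUNDANCE[element]
        power = [1.0]
        # Exponentiation by squaring: convolve the element's distribution n times.
        while n:
            if n & 1:
                power = convolve(power, base, n_peaks)
            base = convolve(base, base, n_peaks)
            n >>= 1
        pattern = convolve(pattern, power, n_peaks)
    top = max(pattern)
    return [p / top * 100.0 for p in pattern]


if __name__ == "__main__":
    # Mannose, C6H12O6 (formula from the monosaccharide table).
    print([round(p, 2) for p in isotope_pattern({"C": 6, "H": 12, "O": 6})])
```

Truncating each convolution to `n_peaks` is safe here because heavier peaks never contribute to lighter ones.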
Three Cases
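[Figure: the experimental spectrum is scored against the theoretical isotope pattern in three cases: a single glycan; two glycans combined with weights α and β; and a glycan plus an unknown species combined with weights α and β. Values shown in the figure: 0.2, 0.8, 0.6.]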
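The second case (two overlapping glycans combined linearly) can be sketched as a least-squares fit of the weights α and β followed by a Pearson correlation score. The slides do not state how MultiNGlycan fits the weights or which correlation it uses, so both choices are assumptions of this example. The two theoretical patterns are the ones tabulated in the isotope-overlap slide (masses 2372 to 2376); the "observed" spectrum is synthetic.

```python
from math import sqrt

# Theoretical patterns on the common mass grid 2372..2376 (from the overlap table).
PATTERN_A = [84.3, 100.0, 68.5, 34.3, 13.9]   # 2 GlcNAc + 9 Man
PATTERN_B = [0.0, 82.4, 100.0, 68.8, 34.4]    # 7 GlcNAc + 3 Man


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_two(observed, p, q):
    """Least-squares alpha, beta minimizing |observed - (alpha*p + beta*q)|^2,
    obtained by solving the 2x2 normal equations. Weights are not constrained
    to be non-negative in this sketch."""
    pp, qq, pq = dot(p, p), dot(q, q), dot(p, q)
    po, qo = dot(p, observed), dot(q, observed)
    det = pp * qq - pq * pq
    alpha = (po * qq - pq * qo) / det
    beta = (pp * qo - pq * po) / det
    return alpha, beta


if __name__ == "__main__":
    # Synthetic "observed" spectrum: 70% of pattern A plus 30% of pattern B.
    observed = [0.7 * a + 0.3 * b for a, b in zip(PATTERN_A, PATTERN_B)]
    alpha, beta = fit_two(observed, PATTERN_A, PATTERN_B)
    fitted = [alpha * a + beta * b for a, b in zip(PATTERN_A, PATTERN_B)]
    print("alpha, beta:", round(alpha, 3), round(beta, 3))
    print("score, single glycan:", round(pearson(observed, PATTERN_A), 3))
    print("score, combination:  ", round(pearson(observed, fitted), 3))
```

On overlapping data the combined model should score higher than either glycan alone, which is the motivation for reporting separate scores for the overlapping and non-overlapping cases.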
Glycan Profile Comparison
• Multiple spectra comparison
• Biomarker discovery
  – Given spectra from several conditions
  – Find distinct glycans between samples
Graph: Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, 2008. 7(2): p. 603-10.
HCC: Hepatocellular carcinoma (cancer of the liver)
CLD: Chronic liver disease
Concept
Healthy spectra (H1, H2, H3…Hk)
Disease spectra (D1, D2, D3…Dk)
Remove the least significant component. Repeat until all scores are above the threshold.
1. Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, 2000. 1(2): p. RESEARCH0003.
70% identical with a cutoff at 0.5
Multi N-Glycan
• Software requirements
  – .NET Framework 2.0 using C#
  – C++ runtime
  – R
  – Thermo Scientific Xcalibur
• Input
  – Spectrum
    • Plain text (peak list), mzXML [1], RAW (Thermo Scientific raw file)
  – Glycan list
    • CSV file (user-defined)
• Output
  – List of glycans with scores
1. Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, 2004. 22(11): p. 1459-1466.
Software Interface
Software features
• Signal preprocessing provided
  – Background subtraction
  – Peak smoothing
  – Tolerance for mass spectrometry accuracy
• Flexible parameters to reflect the actual experiment
• Useful tools provided
  – Isotope pattern generator
• Content-rich output with multi-format support
  – csv, text, html
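The slides do not say which preprocessing methods the software uses, so the following is only a generic sketch of the three steps listed above: a moving-average smoother, a constant-baseline background subtraction, and peak matching within a mass tolerance. All function names, the window size, and the 0.5 Da tolerance are assumptions made for illustration.

```python
def smooth(intensities, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(intensities)):
        lo, hi = max(0, i - half), min(len(intensities), i + half + 1)
        out.append(sum(intensities[lo:hi]) / (hi - lo))
    return out


def subtract_background(intensities):
    """Subtract a constant baseline, estimated here as the minimum intensity."""
    base = min(intensities)
    return [v - base for v in intensities]


def match_peak(peaks, target_mz, tolerance_da=0.5):
    """Return the most intense (m/z, intensity) peak within tolerance_da of
    target_mz, or None if no peak falls inside the tolerance window."""
    candidates = [p for p in peaks if abs(p[0] - target_mz) <= tolerance_da]
    return max(candidates, key=lambda p: p[1]) if candidates else None


if __name__ == "__main__":
    # Hypothetical peak list, for illustration only.
    peaks = [(2372.1, 10.0), (2372.4, 30.0), (2375.0, 99.0)]
    print(match_peak(peaks, 2372.2))
    print(subtract_background(smooth([5.0, 9.0, 5.0, 6.0])))
```

Real spectra usually need a baseline that varies with m/z rather than a single constant; the constant baseline is used here only to keep the example short.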
Software screenshot
HTML result export
Software screenshot
Result
• Data set
  – Liver cancer: 73 individuals
  – Healthy: 78 individuals
• 412 glycan structures are tested
• Glycan criteria
  – Correlation score cutoff < 0.5
  – Present in 30% of total spectra
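One possible reading of the two criteria above is: a glycan counts as present in a spectrum when its correlation score is at least 0.5, and it is retained when it is present in at least 30% of all spectra. The sketch below implements that reading; the interpretation, the function name, and the example data are assumptions rather than the documented behaviour of the tool.

```python
def select_glycans(scores_by_glycan, score_cutoff=0.5, min_presence=0.3):
    """Keep glycans whose score reaches score_cutoff in at least
    min_presence (as a fraction) of the spectra.

    scores_by_glycan maps a glycan label to its correlation scores,
    one score per spectrum.
    """
    selected = []
    for glycan, scores in scores_by_glycan.items():
        if not scores:
            continue  # no spectra scored for this glycan
        present = sum(1 for s in scores if s >= score_cutoff)
        if present / len(scores) >= min_presence:
            selected.append(glycan)
    return selected


if __name__ == "__main__":
    # Hypothetical glycan labels and scores, for illustration only.
    example = {
        "glycan_1": [0.9, 0.2, 0.1, 0.1],  # present in 1 of 4 spectra (25%)
        "glycan_2": [0.9, 0.6, 0.1],       # present in 2 of 3 spectra (67%)
    }
    print(select_glycans(example))
```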
Zhiqun, T., et al., Identification of N-glycan serum markers associated with hepatocellular carcinoma from mass spectrometry data. J Proteome Res, 2010. 9(1): p. 104-12.
Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, 2008. 7(2): p. 603-10.
Anoop M., Chuan-Yih Y., A Multi-PCA Approach to Glycan Biomarker Discovery using Mass Spectrometry Profile Data. I690 project, 2009 Fall
Result
[Figure annotations: Filtered out; Can't find the glycan structure in CFG database; Correlation score; Overlap with 2192]
Result
Future Works
• Test on more clinical samples
• Verify the correlation between the glycan modifications reported by MultiNGlycan and hepatocellular carcinoma
• Perform these tasks on O-linked glycans
• Apply de novo glycan sequencing to the reported glycans (ongoing)
References
• Anoop M., Chuan-Yih Y., A Multi-PCA Approach to Glycan Biomarker Discovery using Mass Spectrometry Profile Data. I690 project, 2009 Fall.
• Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, 1999. 1473(1): p. 4-8.
• Dalit Shental-Bechor and Yaakov Levy, Effect of glycosylation on protein folding: A close look at thermodynamic stabilization, PNAS June 11, 2008.
• Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, 2000. 1(2): p. RESEARCH0003.
• Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, 2005. 92(6): p. 711-28.
• Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, 2004. 22(11): p. 1459-1466.
• Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, 2008. 7(2): p. 603-10.
• Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, 1995. 67: p. 2699-2704.
• Zhiqun, T., et al., Identification of N-glycan serum markers associated with hepatocellular carcinoma from mass spectrometry data. J Proteome Res, 2010. 9(1): p. 104-12.
• Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p.
Acknowledgments
• Advisor: Prof. Haixu Tang
• Co-worker: Anoop Mayampurath
• Collaborator: Yehia Mechref, Department of Chemistry
• COL Lab members
• This work will be presented on 26 May at the 58th ASMS Conference in Salt Lake City, Utah, and will be submitted to Bioinformatics as an Application Note.
Thank You