glycoproteomics analysis of human liver tissue by
TRANSCRIPT
Subscriber access provided by DALIAN INST OF CHEM PHYSICS
Journal of Proteome Research is published by the American Chemical Society.1155 Sixteenth Street N.W., Washington, DC 20036
Article
Glycoproteomics Analysis of Human Liver Tissue by Combinationof Multiple Enzyme Digestion and Hydrazide Chemistry
Rui Chen, Xinning Jiang, Deguang Sun, Guanghui Han,Fangjun Wang, Mingliang Ye, Liming Wang, and Hanfa Zou
J. Proteome Res., 2009, 8 (2), 651-661 • Publication Date (Web): 21 January 2009
Downloaded from http://pubs.acs.org on February 6, 2009
More About This Article
Additional resources and features associated with this article are available within the HTML version:
• Supporting Information• Access to high resolution figures• Links to articles and content related to this article• Copyright permission to reproduce figures and/or text from this article
Glycoproteomics Analysis of Human Liver Tissue by Combination of
Multiple Enzyme Digestion and Hydrazide Chemistry
Rui Chen,† Xinning Jiang,† Deguang Sun,‡ Guanghui Han,† Fangjun Wang,† Mingliang Ye,*,†
Liming Wang,‡ and Hanfa Zou*,†
Key Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic R&A Center, DalianInstitute of Chemical Physics, The Chinese Academy of Sciences, Dalian 116023, China, and The Second
Affiliated Hospital of Dalian Medical University, Dalian 116027, China
Received September 19, 2008
The study of protein glycosylation has lagged far behind the progress of current proteomics becauseof the enormous complexity, wide dynamic range distribution and low stoichiometric modification ofglycoprotein. Solid phase extraction of tryptic N-glycopeptides by hydrazide chemistry is becoming apopular protocol for the analysis of N-glycoproteome. However, in silico digestion of proteins in humanproteome database by trypsin indicates that a significant percentage of tryptic N-glycopeptides is notin the preferred detection mass range of shotgun proteomics approach, that is, from 800 to 3500 Da.And the quite big size of glycan groups may block trypsin to access the K, R residues near N-glycositesfor digestion, which will result in generation of big glycopeptides. Thus many N-glycosites could notbe localized if only trypsin was used to digest proteins. Herein, we describe a comprehensive way toanalyze the N-glycoproteome of human liver tissue by combination of hydrazide chemistry methodand multiple enzyme digestion. The lysate of human liver tissue was digested with three proteases,that is, trypsin, pepsin and thermolysin, with different specificities, separately. Use of trypsin aloneresulted in identification of 622 N-glycosites, while using pepsin and thermolysin resulted in identificationof 317 additional N-glycosites. Among the 317 additional N-glycosites, 98 (30.9%) could not be identifiedby trypsin in theory because the corresponding in silico tryptic peptides are either too small or too bigto detect in mass spectrometer. This study clearly demonstrated that the coverage of N-glycosites couldbe significantly increased due to the adoption of multiple enzyme digestion. A total number of 939N-glycosites were identified confidently, covering 523 noredundant glycoproteins from human livertissue, which leads to the establishment of the largest data set of glycoproteome from human liver upto now.
Keywords: Glycoproteomics • Hydrazide chemistry • Multiple enzyme digestion • Human liver proteomeproject
Introduction
Protein glycosylation is acknowledged as one of the majorpost-translational modifications with significant effects onprotein folding, conformation distribution, stability and activity.It is estimated that more than half of all mammalian proteinsare glycosylated.1 The attached oligosaccharides play structural,protective, stabilizing roles in living cells and is involved indiverse biological mechanisms, for examle, protein-proteinrecognition, receptor interaction, cell communication andadhesion.2 Besides, the aberrant glycosylation is a basic feature
of oncogenesis and tumor progression.3 Many attention hasbeen given to the growing field of glycoproteomics to studythe biological roles of protein glycosylation and the relationshipbetween glycosylation and diseases by structural characteriza-tion of glycopeptides or glycans attached to certain glycosyla-tion sites with mass spectrometric methods.4 And a variety ofbody fluids, like plasma,5 saliva6 have been studied to findbiomarkers of cancers and other diseases. However, littleattention has been paid to the organs, for example, the liver,despite the fact that the proteome study of liver can providevaluable knowledge to accelerate the research of liver diseaseand biomarker discovery.7
A crucial step for glycoproteome analysis is to specificallyenrich glycopeptides from protein digest because of thecomplexity of tissue samples and the great dynamic range ofglycoproteins.8 Solid phase extraction by hydrazide chemistryis becoming a popular protocol for enrichment of N-glycopep-tides (N-glycopeptides refer to the deglycosylated N-linkedglycopeptides in this work if not otherwise stated) for the
* To whom correspondence should be addressed. Prof. Dr. Hanfa Zou,National Chromatographic R&A Center, Dalian Institute of Chemical Physics,The Chinese Academy of Sciences, Dalian 116023, China. Tel, +86-411-84379610; fax, +86-411-84379620; e-mail, [email protected]. Prof. Dr.Mingliang Ye, National Chromatographic R&A Center, Dalian Institute ofChemical Physics, Chinese Academy of Sciences, Dalian 116023, China. Tel,+86-411-84379620; fax, +86-411-84379620; e-mail, [email protected].
† Dalian Institute of Chemical Physics, The Chinese Academy of Sciences.‡ The Second Affiliated Hospital of Dalian Medical University.
10.1021/pr8008012 CCC: $40.75 2009 American Chemical Society Journal of Proteome Research 2009, 8, 651–661 651Published on Web 01/21/2009
analysis of N-glycoproteome.9 With high-abundance proteindepletion and two-dimensional liquid chromatography MS/MS spectrometry, the hydrazide chemistry method resulted inthe identification of 639 N-glycosylation sites in human se-rum.10 Nevertheless, the coverage of this method to studyN-glycosylation in tissue is much lower. Only 445, 248 and 176N-glycosites were identified from prostate cancer tissue, livermetastasis and bladder cancer tissue, respectively.11
Although trypsin has the advantage of high specificity whichcleaves exclusively at arginine (R) and lysine (K) residue,12 it isfar from a perfect enzyme for studying postmodifications.Trypsin cleavage may be blocked by the attached carbohydratessterically when trypsin cleavage site was located right afterglycosylated asparagine.13 As a result, missed trypsin cleavagetends to occur more frequently when glycoprotein are digested.Moreover, some glycoproteins are not suitable for trypticdigestion either because they lack enough arginine and lysineresidues in their sequences or they are not soluble at theoptimal pH range of trypsin, for example, DQH sperm surfaceprotein. This protein has extremely low solubility at neutral orslightly alkaline pH conditions. It is necessary to use V8protease or pepsin to digest this protein at low pH value.14 Inthe analysis of gp120, an HIV envelope glycoprotein, the fiveglycosylation site could not be characterized due to incompletetrypsin digestion and the presence of multiple glycosylationsites within one peptide.15
To overcome the limitations described above, proteases, suchas proteinase K16 or Pronase17 have been adopted for digestionof several glycoproteins in a nonspecific manner. Thesenonspecific enzymes will generate a mixture of glycopeptideswith shorter sequences which are less difficult to detect by MSspectrometry. The nonglycosylated proteins and peptides arecompletely digested. So the peptides with attached glycans canbe easily isolated and analyzed by liquid chromatographytandem mass spectrometry. However, since the amino acidresidues in the peptides portions of glycopeptides are very shortwhich vary from two to five amino acid residues, it is difficultto identify the peptide sequence and the exact glycosylationsites if the sequence of the protein is unknown. An alternativemethod is the digestion of glycoproteins by Lys-C to generatea small number of peptides than tryptic fragments and theresultant larger peptides is analyzed by high resolution Fouriertransform ion cyclotron resolution mass spectrometer.18 How-ever, it is not easy to identify large peptides if low resolutionion trap mass spectrometer is used. Multiple enzyme digestionstrategy has been applied to the analysis of other modifica-tions.19 A specific enzyme, trypsin and two other enzymes withbroader specificity, subtilisin and elastase, were utilized for theprotein digestion. Each digest was analyzed separately by usingmultidimensional liquid chromatography and tandem massspectrometry. Eighteen sites of four different types of modifica-tion have been detected on three of the five proteins in a simplemixture. Their approach was further applied to analyze theproteins extracted from the lens tissue; 270 proteins wereidentified, and 11 different crystallins were found to contain atotal of 73 sites of modification.
Improved sequence coverage could be achieved by over-lapped peptides generated in multiple enzyme digestion,leading to an increased chance of finding modification sites.In addition, the digestion by protease with specificity comple-mentary to trypsin can also generate more peptides within thepreferable mass range for peptide identification and modifica-tion site location. With multiple enzyme digestion, it should
be able to find N-glycosylation sites that would not bediscovered with only trypsin digestion. To the best of ourknowledge, the application of multiple enzyme digestion tostudy protein glycosylation was not conducted in proteomelevel up to now. The combination of this strategy with N-glycopeptide enrichment method such as hydrazide chemistrywould be effective in the study of N-glycoproteomics.
The comprehensive analysis of human liver tissue N-glyco-proteome by hydrazide chemistry in combination with multipleenzyme digestion strategy is presented in this work. Pepsin andthermolysin with broader specificity were used as complemen-tary proteases to trypsin. It was found that this strategy cansignificantly improve N-glycosite coverage. A total number of939 N-glycosites were identified with high confidence, covering523 noredundant proteins. This is by far the largest data setobtained for human liver N-glycoproteome.
Materials and Methods
Materials and Chemicals. All the water used in this experi-ment was prepared using a Milli-Q system (Millipore, Bedford,MA); iodoacetamide (IAA) was purchased from AMERESCO(Solon, OH); dithiothreitol (DTT), ammonium bicarbonate,urea, trifluoroacetic acid (TFA), formic acid (FA) and acetonitrile(ACN) were all obtained from Aldrich (Milwaukee, WI). All thechemicals were of analytical grade except acetonitrile, whichwas of HPLC grade. 3-[(3-Cholamidopropyl)-dimethylammo-nio]-1-propane sulfonate (CHAPS) was from USB Corporation(Cleveland, OH). The Complete Mini protease inhibitor cocktailtablets were obtained from Roche Diagnostics (Mannhaim,Germany). Trypsin (from bovine pancreas, TPCK-treated),pepsin (from porcine gastric mucosa) and thermolysin (fromBacillus thermoproteolyticus rokko were obtained from Sigma(St. Louis, MO). Affi-Gel Hz Hydrazide Gel was purchased fromBio-Rad (Hercules, CA). PNGase F was from New EnglandBiolabs (Ipswich, MA).
Sample Preparation. The human liver tissue was the non-cancerous liver tissues g2 cm outside the hepatic cancernodules removed by surgical operation from patients at theSecond Affiliated Hospital of Dalian Medical University (Dalian,China). The noncancerous liver tissue has been verified byhistopathological examination which excluded the presence ofinvading or microscopic metastatic cancer cells. The utilizationof human tissues complied with guidelines of Ethics Committeeof the Hospital. The isolated human liver tissue was placed inan ice-cold homogenization buffer consisting of 8 M urea, 4%CHAPS (w/v), 65 mM DTT, a mixture of protease inhibitor(Complete Mini protease inhibitor cocktail tablets, 1 tablet for10 mL homogenization buffer) and 100 mM NH4HCO3 at pH7.8. The tissues were homogenized in a Potter-Elvejhemhomogenizer with a Teflon piston, using 10 mL of the homog-enization buffer per 2 g of tissue. The suspension was homog-enized for approximately 1 min, sonicated for 100 W × 30 sand centrifuged at 25 000g for 1 h. The supernatant containedthe total tissue proteins. The concentration of proteins wasdetermined by BCA method, which was 7.8 mg protein/mLlysate. Appropriate volumes of protein sample were precipitatedwith 5 vol of 50% acetone, 50% alcohol and 0.1% acetic acid(v/v/v), lyophilized to dryness, and dissolved in reducingsolution (8 M urea, 100 mM NH4HCO3). The sample wasreduced by 10 mM DTT and carboxyamidomethylated in 20mM iodoacetamide.
Protein Digestion. Samples containing 5 mg proteins weredigested with three different proteases separately using different
research articles Chen et al.
652 Journal of Proteome Research • Vol. 8, No. 2, 2009
protocols. Trypsin digestion: Protein sample was diluted with100 mM PBS buffer (pH ) 8.0) to a urea concentration of 1 M.A total of 200 µg of TPCK-treated trypsin was added andincubated for 24 h at 37 °C. The reaction was quenched byadding 10% TFA (v/v) to pH ) 3. Pepsin digestion: Samplecontaining the same amount of proteins was diluted with 0.1%TFA (v/v) 4-fold and adjusted to a pH between 2 and 3 with10% TFA (v/v). Pepsin was added at an enzyme-to-substrateratio of 1:25 (w/w) and incubated for 24 h at 37 °C. Thermol-ysin digestion: sample solution was diluted with 100 mMNH4HCO3 to a urea concentration of 2 M. A total of 200 µg ofthermolysin was added and incubated for 24 h at 37 °C. Thereaction was stopped by adding 10% TFA (v/v) to pH ) 3. Eachdigest was stored at -80 °C until analysis.
N-linked Glycopeptide Capture. The N-linked glycopeptidesin each digest were captured with hydrazide chemistry asdescribed in a published protocol with minor modification.20
Digested sample solution was first desalted with homemadeC18 solid phase extraction column. After it was dried in a SpeedVac (Thermo, San Jose, CA), it was reconstituted in 450 µL ofoxidation buffer (100 mM NaAc, 150 mM NaCl, pH ) 5.5); then,50 µL of 100 mM sodium periodate was added, and the reactionwas incubated at room temperature in darkness for 1 h. Theoxidized sample was desalted with C18 column again and theeluted solution was directly added to 100 µL of Bio-Rad Affi-Gel Hz Hydrazide Gel (bed volume). The coupling reaction wasleft overnight at room temperature with shaking. The gel withcaptured N-linked glycopeptides was then washed with 1 mLof 1.5 M NaCl, methanol, and 100 mM NH4HCO3 sequentiallythree times to remove nonspecific adsorptions. After additionof 250 µL of fresh 100 mM NH4HCO3 and 10 µL of PNGase F(500 units per µL) to the gel, the enzymatic release of peptidemoieties was carried out at 37 °C overnight. The end productfor this procedure is the isolation of deglycosylated peptides(termed as N-glycopeptide in this study) that originally containN-linked carbohydrates. The supernatant was collected bygentle centrifugation and combined with the supernatant of100 mM NH4HCO3 wash fraction. The peptide solution wasdesalted by C18 column, dried and reconstituted in 25 µL of0.1% formic acid as N-glycopeptide sample for LC-MS/MSanalysis.
Analysis of Enriched Glycopeptides. The analysis of theN-glycopeptide samples was carried out by 2D nano-LC/MS/MS according to our previous method with minor modifica-tion.21 The 25 µL reconstituted glycopeptide sample was loadedonto a phosphate SCX monolithic column (150 µm × 7 cm i.d.)by air pressure. After that, the SCX column with loadedglycopeptides was manually connected to a C18 column (75µm × 10 cm i.d.) in tandem by a union. The glycopeptides wereeluted onto the analytical reversed phase column using stepgradients generated with 1000 mM ammonium acetate in 0.1%formic acid. The salt concentrations of the buffer used for 10step gradients were 50, 100, 150, 200, 250, 300, 350, 400, 500and 1000 mM, respectively. Each salt step lasts 10 min with afollowing equilibrium by 0.1% formic acid for additional 10 min.After each elution step, a subsequent RPLC-MS/MS wasexecuted with a 120 min gradient from 5% to 35% acetonitrile.The RPLC-MS/MS was performed on a nano-RPLC-MS/MSsystem. A Finnigan surveyor MS pump (Thermo Finnigan, SanJose, CA) was used to deliver mobile phase. For the C18capillary separation column, one end of the fused-silica capil-lary was manually pulled to a fine point of ∼5 µm with a flametorch. The columns were in-house packed with C18 AQ beads
(5 µm, 120 Å) from Michrom BioResources (Auburn, CA) usinga pneumatic pump. The nano-RPLC column was directlycoupled with a LTQ linear ion trap mass spectrometer fromThermo Finnigan (San Jose, CA) with a nanospray source. Themobile phase consisted of mobile phase A containing 0.1%formic acid (v/v) in water, and mobile phase B containing 0.1%(v/v) formic acid in acetonitrile. A Finnigan LTQ linear ion trapmass spectrometer equipped with an ESI nanospray source wasused for the MS experiment. A voltage of 1.8 kV was applied tothe cross before the C18 capillary column. The LTQ instrumentwas operated at positive ion mode. Normalized collision energywas 35.0%. One microscan was set for each MS and MS/MSscan. All MS and MS/MS spectra were acquired in the data-dependent mode. The mass spectrometer was set that one fullMS scan was followed by six MS/MS scans on the six mostintense ions. The dynamic exclusion function was set as follows:repeat count 2, repeat duration 30 s, and exclusion duration90 s. System control and data collection were done by Xcalibursoftware version 1.4 (Thermo). The scan range was set fromm/z 400 to m/z 2000.
Database Searching and Data Processing. The MS/MSspectra were searched using SEQUEST22 (version 2.7) againsta composite database including both original and reversedhuman protein database of International Protein Index (ipi.human 3.17.fasta, including 60 234 entries, http://www.ebi.ac.uk/IPI/IPIhuman.html). Enzyme was selected as theprotease used for protein digestion. The cleavage site was setaccording to manufacture’s guide, trypsin, KR/P; pepsin, FLE/VAG; thermolysin, LFIVMA/P. Enzyme limits was set fullyenzymatic, cleaving at both ends. A maximum of two missedcleavage was allowed. Carboxamidomethylation (+57) was setas static modification and the following dynamic modificationswere used: oxidation of methionine (+16), and a PNGaseFcatalyzed conversion of asparagines to aspartic acid at N-glycosites (+1). The database search results of each proteasewere filtered with the criteria optimized by SFOER (ver-sion2.3).23 A false detection rate (FDR) of less than 2% was setfor all peptide identifications. The majority of the identifiedpeptides containing NXS/T-motif and only ∼15% of identifiedpeptides did not contain this motif. These peptide identifica-tions probably resulted from nonspecific isolation of N-linkedglycopeptides or random assignment in database search. Thispercentage of is lower than that in a similar work (∼20%) forisolation of N-glycopeptides,11 which indicated the high speci-ficity of N-glycopeptide isolation in this study. To reduce thefalse positive rate of identified N-glycosites, the peptideswithout such motifs were removed. After the filtration, fewpeptides that contained NXS/T were identified from thereversed database search (FDR < 0.1% for identification ofN-glycopeptides was obtained), which suggested the highidentification confidence. The identified glycoproteins fromhuman liver tissue were categorized by Gene Ontology (GO)with component, function and process terms extracted fromtext-based annotation files downloaded from the EuropeanBioinformatics Institute ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/).
Results
Glycoproteomic Analysis with Trypsin. The proteins ex-tracted from liver tissue were first digested by trypsin, and thenthe N-linked glycopeptides in the resulting digest was im-mobilized onto the hydrazide beads. After thoroughly washing,the peptide moieties of the captured N-linked glycopeptides
Glycoproteomics Analysis of Human Liver Tissue research articles
Journal of Proteome Research • Vol. 8, No. 2, 2009 653
were released from the beads by deglycosylation using PNGaseF. The deglycosylated peptides (N-glycopeptides) were finallyanalyzed by 2D nano-LC-MS/MS. Three 2D nano-LC-MS/MSruns were performed in this study, which resulted in theidentification of 840 unique tryptic N-glycopeptides from 384glycoproteins. The identified N-glycopeptides and glycoproteinswere given in the Supporting Information. On the basis of theNXS/T-motif, 622 N-glycosites were identified from human livertissue.
The acquisition settings for MS/MS were in the range of400-2000 Da in this study, thus, the peptide mass detectionrange should be 400-6000 Da considering three charge statesof peptides (+1, +2, +3). However, the mass range of identifiedN-glycopeptides is much narrower than this range as shownin Figure 1A. Most of the N-glycopeptides locate themselvesin a mass range of 800-3500 Da. For comparison, the MWdistribution of nonglycopeptides identified by directly analyzingthe tryptic peptides from the same sample with 1D nanoLC-MS/MS is also given in Figure 1B. It was found that the two
distributions are very similar. The similar mass ranges ofpeptides identified by shotgun proteomics were well-docu-mented in the literature.24 The small or big peptides cannotbe identified through database search probably because theseMS/MS spectra do not contain enough fragment information.The above facts indicate that the preferable mass range forpeptides identified by shotgun proteomics is from 800 to 3500Da.
The N-glycosite is not likely to be identified by shotgunproteomics approach if the mass of corresponding N-glyco-peptides is not in the range of 800-3500 Da. Therefore, it is ofinterest to see how many tryptic N-glycopeptides derived fromthe proteins in the human proteome are within this range. Themass distributions of tryptic peptides and tryptic N-glycopep-tides generated by in silico digestion of all proteins in thehuman protein database (IPI Human 3.17) were given in Figure2A. The in silico tryptic N-glycopeptides are tryptic peptidescontaining NXS/T motif. The percentage of tryptic peptides andtryptic N-glycopeptides in the mass range of 800-3500 Da were
Figure 1. Molecular weight distribution of peptides identified from tryptic digest of protein extracted from human liver tissue. (A)N-glycopeptides enriched by hydrazide chemistry; (B) nonglycopeptides identified by direct analysis.
research articles Chen et al.
654 Journal of Proteome Research • Vol. 8, No. 2, 2009
70.0% and 68.7%, respectively. Since most proteins wouldgenerate dozens of peptides after trypsin digestion and proteinscould be identified by any of these peptides with their massesin the range, the theoretical proteome coverage by trypticpeptides was still close to 100% even though 30.0% trypticpeptides are not in this range. However, the situation for theidentification of protein modification is different. The glycositecould be identified only if a peptide containing this site isidentified. As only 68.7% tryptic N-glycopeptides are in thepreferred mass range, a significant amount of N-glycositescould not be identified if only trypsin was used. The above-
mentioned in silico digestion did not include the missedcleavage sites, and the missed cleavage is inevitable in proteindigestion especially for glycoproteins. We analyzed the se-quences of the identified N-glycopeptides and nonglycopep-tides from human liver tissue. It was found that among the840 identified unique tryptic N-glycopeptides, 169 (20.1%) havemissed cleavage site. While among the nonglycopeptidesidentified by 1D LC-MS/MS, 152 of 1069 (14.2%) have missedcleavage sites, respectively. These results indicate that missedcleavage tends to happen more frequently when glycoproteinis digested. To investigate how missed cleavage affects N-
Figure 2. Comparison of molecular weight distribution of peptides by in silico digestion of proteins in human protein database (IPIHuman 3.17) by trypsin between all the peptides (in blue) and peptides with NXS/T motif (in red) with different missed cleavage sites.(A) No missed cleavage site; (B) one missed cleavage site; and (C) two missed cleavage sites. The average molecular weight of eachdistribution is listed in the inserted table.
Glycoproteomics Analysis of Human Liver Tissue research articles
Journal of Proteome Research • Vol. 8, No. 2, 2009 655
glycosite identification, in silico digestion of proteins fromhuman protein database with one missed cleavage was alsoconducted and the resulting peptide mass distributions areshown in Figure 2B. It is astonishing to find that N-glycopep-tides with missed cleavage sites were much bigger than thoseof all peptides. With one missed cleavage site, the average MWfor all peptides and N-glycopeptides was 2422.61 and 3756.37Da, respectively. There are still 71.9% of peptides locatethemselves in the mass range of 800-3500 Da for all thepeptides; however, only 52.2% of tryptic N-glycopeptides werein the mass range of 800-3500 Da. When two missed cleavageswere considered, this percentage descended to 33.5%. For allpeptides, this percentage is still as high as 59.2% as shown inFigure 2C. These data indicate that much more N-glycopeptidescould not be identified when trypsin cleavage sites are missed.These facts demonstrated that it was inadequate to use onlytrypsin for N-glycosite identification. To improve N-glycositecoverage in bottom up approach, enzymes with complemen-tary cleavage specificity should be adopted to digest proteins.
Glycoproteomic Analysis with Multiple Proteases Diges-tion. We adopted multiple enzyme digestion to improve thecoverage of N-glycosites for glycoproteome analysis. The aimof multiple enzyme digestion was once to promote sequencecoverage in identification of amino acid sequence variationsin shotgun analysis by the means of generating overlappedpeptides.25 When it was applied to the protein identificationfrom complex samples or tissue, the entire protein sequenceof abundant proteins could be covered in the chances ofidentifying a modification like phosphorylation on a specificamino acid residue.19 Pepsin and thermolysin were used ascomplements to trypsin in this study. After digestion, theN-glycopeptides were enriched by the same hydrazide chem-istry method and analyzed by 2D nano-LC-MS/MS.The number of identified N-glycosites and glycoproteins bythese two proteases as well as by trypsin were given in Figure3. The full lists of identified N-glycopeptides and glycoproteinswere given in the Supporting Information. Among all 939N-glycosites identified, only 20 N-glycosites were idetnified byall the three proteases and another 125 N-glycosites wereidentified by both proteases. These N-glycosites were identifiedby multiple peptides generated by different proteases; thus, theidentification of these sites were very confident. The use ofmultiple proteases can reduce the ambiguity in mappingmodifications and increase the possibility of obtaining highquality MS/MS spectrum to identify peptides.
However, the identified N-glycopeptides sharing the sameN-glycosite account for a very small proportion in the totalnumber of identified N-glycopeptides. Much more N-glyco-peptides with different N-glycosites were identified usingdifferent proteases which indicated good complement inN-glycosite identification to trypsin and so the N-glycositecoverage could be significantly increased. The other twoproteases used in this study led to identification of 317 (33.7%)additional N-glycosites and 139 (26.6%) additional glycopro-teins. This complementary effect is illustrated in Figure 3. Itcan be clearly seen that the adoption of multiple enzymedigestion could both enhance N-glycosites coverage of theidentified glycoproteins and identify the glycoproteins whichcannot be identified by using trypsin alone.
Among the 317 additional N-glycosites, 98 (30.9%) could notbe identified by trypsin in theory because the correspondingin silico tryptic peptides are not in the mass range of 800-3500Da. For example, the tryptic peptide carrying N-glycosite at
Asn426 of splice isoform Sap-mu-0 of Proactivator polypeptideprecursor is N#STK, which is too small to be identified.However, this N-glycosite can be well-identified with a peptideof LEKN#STKQEIL (MW ) 1303.73) by thermolysin. On theother hand, for proteins which lack lysine and arginine residues,trypsin digestion will generate peptides with longer sequence.These peptides cannot easily be fragmented in CID to give highquality mass spectra for sequence identification. For example,MAL2 Protein has only two trypsin cleavage sites in itssequence. Its tryptic peptide carrying N-glycosite is LQG-WVMFVSVTAFFFSLLFLGMFLSGMVAQIDANWNFLDFAYHFTV-FVFYFGAFLLEAAATSLHDLHCN#TTITGQPLLSDNQYNINVAA-SIFAFMTTACYGCSLGLALR (MW ) 12113.93); analysis of thishuge peptide needs the top-down technique by high resolutionmass spectrometry. Its pepsin generated peptide, HDLHCN#TTITGQPLL (MW ) 1161.82), is very easy to be analyzed usingbottom-up method. Another example further demonstrated thenecessity of using multiple enzyme digestion in glycosylationanalysis as shown in Figure 4. Neural-cadherin is a calciumdependent cell-cell adhesion glycoprotein that functions dur-ing gastrulation and is required for establishment of left-rightasymmetry.26 It has 10 NXS/T motifs in the sequence of thisprotein and 7 were annotated as potential N-glycosites in theSwiss-Prot database. However, in the N-glycosites database
Figure 3. Complementary effect of multiple enzyme digestion inglycoproteomics analysis of human liver tissue. (A) N-glycositesand (B) glycoprotein.
research articles Chen et al.
656 Journal of Proteome Research • Vol. 8, No. 2, 2009
(www.unipep.org) which is based on the analysis of trypticglycopeptides, N-glycosites at Asn325 and Asn402 were found onthis protein. In our study, 4 N-glycosites were identified withmultiple enzyme digestion. The Asn402 and Asn692 were identi-fied with trypsin digestion; the molecular weight of the trypticglycopeptides was well within the detection range of massspectrometry. However, the tryptic glycopeptides carryingAsn273 and Asn572 have a molecular weight of more than 7500,which exceed the detection range greatly and cannot beidentified using trypsin only. These two N-glycosites weresuccessfully identified by N-glycopeptides generated by pepsinand thermolysin as shown in Figure 4.
In the glycoproteomics analysis, the necessity for using ofcomplementary proteases is more prominent due to theblocking effect of the large size of carbohydrates attached toglycosites. When the trypsin cleavage site is adjacent to theN-glycosites, the cleavage site could probably be missed, whichmay lead to the generation of peptides with longer sequences.If these peptides are larger than 3500 Da, the glycosites wouldbe difficult to be identified. To investigate how this blockingeffect affects the digestion of glycoproteins, we analyzed thesequences of identified glycopeptides with trypsin digestion.Among the glycopeptides which have their N-glycosites adja-cent to lysine or arginine, 30 of 88 trypsin cleavage site havebeen missed, which suggests that blocking effects do exist.Moreover, it was found that missed cleavage tends to happenmore frequently when trypsin cleavage sites are at the C-terminal side of N-glycosylated asparagine residue, which isin accordance with the reported study.13 In the human proteindatabase, there are nearly 27 000 NXS/T-motifs in which X iseither K or R. If the cleavage sites in the motif cannot be cleavedby trypsin, 35.5% of the resulting tryptic peptides are biggerthan 3500 Da and cannot be identified. However, theseglycosites could be identified by using other proteases withdifferent specificities. In the N-glycosites identified by pepsinand thermolysin, 43 N-glycosites were found to be adjacent totrypsin cleavage site. The majority of these glycosites (37 of
43) could not be identified with digestion by trypsin in thisstudy, for instance, FCN#RTVSDCCPDF identified by pepsinfrom splice isoform 1 of tubulointerstitial nephritis antigen-like precursor, and ITEEAN#RTTME from splice isoform 1 oflaminin alpha-4 chain precursor. The arginine residue is tooclose to the N-glycosite to be accessed by trypsin, and therefore,the corresponding tryptic peptides are not detected. The missedcleavage of R79 in splice isoform 1 of tubulointerstitial nephritisantigen-like precursor will give a peptide sequence of 52 aminoacid residues, ADDCALPYLGAICYCDLFCN#RTVSDCCPDFWD-FCLGVPPPFPPIQGCMHGGR, which has a MW of 5701.59 Da.Decoding the structure of peptides like this needs top-downtechnique. In this study, digestion with thermolysin resultedin the localization of 80% of these N-glycosites, which showsthat protease with broader specificity may be more useful thanpepsin in post-translational modification analysis as comple-ment to trypsin. The above-mentioned results demonstratedthat some glycosites could not be identified with trypsinbecause the N-glycopeptides with missed cleavage sites are toobig to be identified. However, in the N-glycopeptides generatedwith completely in silico trypsin digestion, 4.7% of them havea molecular weight less than 800. The corresponding glycositescannot be identified because these N-glycopeptides are toosmall. Missed cleavage may increase the chance for theidentification of these N-glycosites because of the generationof bigger N-glycopeptides. To investigate this possibility, insilico digestion of identified tryptic N-glycopeptides with missedcleavage and calculation of the mass of these N-glycopeptideswith complete digestion were performed. It was found that ofthe 63 identified tryptic N-glycopeptides with missed cleavagesites, 6 of corresponding N-glycopeptides with complete diges-tion were too small to be detected. Missed cleavage sites didincrease the chance to identify these N-glycosites. It shouldbe mentioned that this makes little contribution on thecoverage of glycosite in proteome level since N-glycopeptidesbelow the preferable detection mass range takes a very smallportion of tryptic N-glycopeptides as shown in Figure 2A.Besides, these N-glycosites may be identified by other proteaseswith different cleavage specificities in the multiple enzymedigestion approach. The benefit of multiple enzyme digestionwas also demonstrated in the study of protein phosphoryla-tion.19 With the use of subtilisin and elastase as complementaryproteases, additional sites of phosphorylation were discoveredin their study.
Categorization of Identified Glycoproteins. The identifica-tion of 939 N-glycosites and 523 glycoproteins from human livertissue was achieved in this study. The full list of identifiedN-glycopeptides, N-glycosites and glycoproteins is given inSupporting Information. The 523 identified N-glycoproteinsfrom the human liver tissue were categorized by Gene Ontology(GO) and the annotations of 399 were found. The diagrams ofthe classification were shown in Figure 5. These proteins werefound to have various functions and be involved in manybiological processes. The identified glycoproteins includecoagulation and complement factors, receptors, enzymes,proteases and protease inhibitors. Functional information ofthe identified glycoproteins was further explored on Expasy(http://www.expasy.org). Some proteins related to liver diseasewere found to be N-glycosylated. Hepatocyte growth factorreceptor (c-Met) was found to be N-glycosylated at Asn202 whichwas identified by N-glycopeptides generated by both trypsinand thermolysin. This glycoprotein is receptor for hepatocytegrowth factor and scatter factor which has a tyrosine-protein
Figure 4. N-glycosites of Neural-cadherin identified by pepsin (ingreen), by thermolysin (in yellow), and by trypsin (in gray). Thein silico tryptic glycopeptides are underlined with black line.
Glycoproteomics Analysis of Human Liver Tissue research articles
Journal of Proteome Research • Vol. 8, No. 2, 2009 657
kinase activity. The c-Met has been found to be overexpressedin hepatocellular carcinoma (HCC) compared with nontumor-ous liver tissue.27 Scavenger receptor class B member 1, whichis a receptor for different ligands such as phospholipids,cholesterol ester, lipoproteins, phosphatidylserine and apop-totic cells, plays a critical role in hepatitis C virus (HCV)attachment and/or cell entry by interacting with HCV E1/E2glycoproteins heterodime.28 It was identified by pepsin withN-glycosites at Asn102 and Asn330. Identification of these gly-coproteins confirms that the adoption of multiple enzymedigestion is beneficial for glycoproteomics analysis.
A distinguished result is the identification of 6 glycosyltrans-ferases and 10 glycosidases. All the identified glycosyltrans-ferases and glycosidases and their functions are listed in Table1. The collaboration of these enzymes controls the synthesisand degradation of glycoproteins.29 Glycosyltransferases trans-fer sugar residues from an activated donor substrate to agrowing carbohydrate group.30 Previous studies have found that
these glycosyltransferases in mammalians are widely glycosy-lated, especially N-glycosylated, for example, bovine �1, 4-ga-lactosyltansferase, which catalyzes the specific transfer ofgalactose from UDP-galactose to a nonreducing terminalN-acetyl-D-glucosamine, forming a �1,4 glycosidic linkage.31 Sixglycosyltransferases were confirmed to be glycosylated in thisstudy. It is of great importance to study the expression ofglycosyltransferases in liver tissue since the structures ofcomplex carbohydrates are altered in cancer and these changesare highly associated with invasion and metastasis of cancercells. A good example is the overexpression of N-acetylglu-cosaminyltransferases III (GnT-III) and V (GnT-V) in cancerprogression and metastasis.32 Some of the identified glycosyl-transferases are also disease related, for example, ProteinO-mannosyl-transferase 1, which transfers mannosyl residuesto the hydroxyl group of serine or threonine residues to formO-mannosyl glycans. It was found that mutations in POMT1gene would deactivate this glycosyltranseferase and causeWalker-Warburg syndrome.33 Furthermore, in some particularcircumstances, glycosylation is necessary for the proper foldingof these glycosyltransferases themselves and maintaining theiractivities. It is important to investigate the relationship betweenthe function and modification of these glycosyltransferases,which is not so clear yet. N-glycosites were found in 10glycosidases in this study. Glycosidases degrade glycoproteins,glycolipids and other glycoconjugates in lysosome, especiallyin liver.34 The deficiency of lysosomal enzyme may causelysosomal storage disease which is fatal to human of all ages.For example, Schindler disease is an infantile neuroaxonaldystrophy resulting from the deficient activity of the lysosomalhydrolase, R-N-acetylgalactosaminidase (R-NAGA),35 which isidentified with two N-glycosylation sites on Asn177 and Asn201
in this study.Glycosylation is largely restricted to the addition of complex
glycans to proteins or lipids localized on the cell surface orwithin lumenal compartments of subcellular organelles, suchas the endoplasmic reticulum (ER), Golgi apparatus, endo-somes, or lysosomes. Therefore, the subcellular location ofidentified glycoprotein can be used to evaluate the reliabilityof the glycoproteome analysis. We used freely available predic-tion software SingalP 2.036 for the predication of signal peptidesand TMHMM (version2.0)37 for the prediction of transmem-brane regions as Zhang et al.11 did to predict the likelysubcellular localization of all the identified glycoproteins. Ofthe 939 identified N-glycosites, 896 (95.4%) were from cellsurface proteins, secreted proteins and transmembrane pro-teins, where glycosylation are expected to happen. This per-centage is higher than that reported by Zhang et al. (93.4%),11
which indicated higher specificity for the identification ofglycoproteins in this study. Low percentage of intracellularproteins was also observed in the GO annotation for theglycoproteins identified in this study. Only a few identifiedproteins were annotated as nuclear (9, 1.7%), cytoskeleton (5,0.96%) and mitochondrion (4, 0.76%). Above subcellular loca-tion of glycoproteins demonstrated the reliability of glycopro-tein identification, which was largely attributed to the highconfidence of N-glycopeptide identification.
Discussion
Shotgun proteomics has become a powerful tool in com-prehensive characterization of proteins in complex biologicalsamples. It begins with the digestion of protein mixtures intopeptides by using proteases, typically trypsin. The resulting
Figure 5. Categorization of identified glycoproteins by GeneOntology. (A) Biological process, (B) cellular component, and (C)molecular function.
research articles Chen et al.
658 Journal of Proteome Research • Vol. 8, No. 2, 2009
Tab
le1.
List
of
Iden
tifi
edG
lyco
sylt
ran
sfer
ases
and
Gly
cosi
das
esan
dT
hei
rFu
nct
ion
sa
IPI
pro
tein
nam
efu
nct
ion
glyc
osi
tes
IPI0
0184
851
Typ
e2
lact
osa
min
eal
ph
a-2,
3-si
alyl
tran
sfer
ase
Invo
lved
inth
esy
nth
esis
of
sial
yl-p
arag
lob
osi
de,
ap
recu
rso
ro
fsi
alyl
-Lew
isX
det
erm
inan
t.H
asan
alp
ha-
2,3-
sial
yltr
ansf
eras
eac
tivi
tyto
war
dG
al-�
1,4-
Glc
NA
cst
ruct
ure
on
glyc
op
rote
ins
and
glyc
olip
ids.
Has
are
stri
cted
sub
stra
tesp
ecifi
city
;it
uti
lizes
Gal
-�1,
4-G
lcN
Ac
on
glyc
op
rote
ins,
and
neo
lact
ote
trao
sylc
eram
ide
and
neo
lact
oh
exao
sylc
eram
ide,
bu
tn
ot
lact
ote
trao
sylc
eram
ide,
lact
osy
lcer
amid
eo
ras
ialo
-GM
1
N30
8
IPI0
0298
235
Splic
eIs
ofo
rm1
of
Pro
tein
O-m
ann
osy
l-tr
ansf
eras
e1
Tra
nsf
ers
man
no
syl
resi
du
esto
the
hyd
roxy
lgr
ou
po
fse
rin
eo
rth
reo
nin
ere
sid
ues
.C
oex
pre
ssio
no
fb
oth
PO
MT
1an
dP
OM
T2
isn
eces
sary
for
enzy
me
acti
vity
;ex
pre
ssio
no
fei
ther
PO
MT
1o
rP
OM
T2
alo
ne
isin
suffi
cien
t
N47
1
IPI0
0382
432
Splic
eIs
ofo
rm1
of
N-a
cety
lglu
cosa
min
e-1-
ph
osp
ho
tran
sfer
ase
sub
un
itsR
/�p
recu
rso
rC
atal
yzes
the
form
atio
no
fm
ann
ose
6-p
ho
sph
ate
(M6P
)m
arke
rso
nh
igh
man
no
sety
pe
olig
osa
cch
arid
esin
the
Go
lgi
app
arat
us.
M6P
resi
du
esar
ere
qu
ired
tob
ind
toth
eM
6Pre
cep
tors
(MP
R),
wh
ich
med
iate
the
vesi
cula
rtr
ansp
ort
of
lyso
som
alen
zym
esto
the
end
oso
mal
/pre
lyso
som
alco
mp
artm
ent
N69
9
IPI0
0058
192
Splic
eIs
ofo
rm1
of
GD
P-f
uco
sep
rote
inO
-fu
cosy
ltra
nsf
eras
e1
pre
curs
or
Cat
alyz
esth
ere
acti
on
that
atta
ches
fuco
seth
rou
ghan
O-g
lyco
sid
iclin
kage
toa
con
serv
edse
rin
eo
rth
reo
nin
ere
sid
ue
inE
GF
do
mai
ns.
Pla
ysa
cru
cial
role
inN
otc
hsi
gnal
ing.
N16
0
IPI0
0025
874
Do
lich
yl-d
iph
osp
ho
olig
osa
cch
arid
e-p
rote
ingl
yco
sylt
ran
sfer
ase
67kD
asu
bu
nit
pre
curs
or
Ess
enti
alsu
bu
nit
of
N-o
ligo
sacc
har
yltr
ansf
eras
een
zym
ew
hic
hca
taly
zes
the
tran
sfer
of
ah
igh
man
no
seo
ligo
sacc
har
ide
fro
ma
lipid
-lin
ked
olig
osa
cch
arid
ed
on
or
toan
asp
arag
ine
resi
du
ew
ith
inan
Asn
-X-S
er/T
hr
con
sen
sus
mo
tif
inn
asce
nt
po
lyp
epti
de
chai
ns
N33
8
IPI0
0013
887
CM
P-N
-ace
tyln
eura
min
ate-
�-ga
lact
osa
mid
e-R
-2,6
-si
alyl
tran
sfer
ase
Tra
nsf
ers
sial
icac
idfr
om
the
do
no
ro
fsu
bst
rate
CM
P-s
ialic
acid
toga
lact
ose
con
tain
ing
acce
pto
rsu
bst
rate
sN
149,
N16
1
IPI0
0012
440
Pla
smaR
-L-f
uco
sid
ase
pre
curs
or
R-L
-Fu
cosi
das
eis
resp
on
sib
lefo
rh
ydro
lyzi
ng
theR
-1,6
-lin
ked
fuco
sejo
ined
toth
ere
du
cin
g-en
dN
-ace
tylg
luco
sam
ine
of
the
carb
oh
ydra
tem
oie
ties
of
glyc
op
rote
ins
N23
9
IPI0
0009
410
ER
deg
rad
atio
n-e
nh
anci
ngR
-man
no
sid
ase-
like
3In
volv
edin
end
op
lasm
icre
ticu
lum
-ass
oci
ated
deg
rad
atio
n(E
RA
D).
Acc
eler
ates
the
glyc
op
rote
inE
RA
Db
yp
rote
aso
mes
.T
his
pro
cess
dep
end
so
nm
ann
ose
-tri
mm
ing
fro
mth
eN
-gly
can
s.Se
ems
toh
aveR
-1,2
-man
no
sid
ase
acti
vity
N19
5
IPI0
0012
585
�-H
exo
sam
inid
ase
bet
ach
ain
pre
curs
or
Res
po
nsi
ble
for
the
deg
rad
atio
no
fG
M2
gan
glio
sid
es,
and
ava
riet
yo
fo
ther
mo
lecu
les
con
tain
ing
term
inal
N-a
cety
lh
exo
sam
ines
,in
the
bra
inan
do
ther
tiss
ues
N32
3,N
327,
N84
IPI0
0441
344
�-G
alac
tosi
das
ep
recu
rso
rC
leav
es�-
linke
dte
rmin
alga
lact
osy
lre
sid
ues
fro
mga
ngl
iosi
des
,gl
yco
pro
tein
s,an
dgl
yco
sam
ino
glyc
ans
N46
4,N
555
IPI0
0414
909
R-N
-Ace
tylg
alac
tosa
min
idas
ep
recu
rso
rH
ydro
lysi
so
fte
rmin
aln
on
red
uci
ng
N-a
cety
l-D
-gal
acto
sam
ine
resi
du
esin
N-a
cety
l-R
-D-g
alac
tosa
min
ides
N17
7,N
201
IPI0
0328
488
Splic
eIs
ofo
rm1
of
Ep
idid
ymis
-sp
ecifi
cR
-man
no
sid
ase
pre
curs
or
Hyd
roly
sis
of
term
inal
,n
on
red
uci
ngR
-D-m
ann
ose
resi
du
esin
R-D
-man
no
sid
esN
516
IPI0
0293
088
Lyso
som
alal
ph
a-gl
uco
sid
ase
pre
curs
or
Hyd
roly
sis
of
term
inal
,n
on
red
uci
ng
1,4-
linke
dR
-D-g
luco
sere
sid
ues
wit
hre
leas
eo
fR
-D-g
luco
seN
140,
N47
0,N
882,
N92
5
IPI0
0011
454
Splic
eIs
ofo
rm2
of
Neu
tral
R-g
luco
sid
ase
AB
pre
curs
or
Cle
aves
seq
uen
tial
lyth
e2
inn
erm
ost
R-1
,3-l
inke
dgl
uco
sere
sid
ues
fro
mth
eG
lc(2
)Man
(9)G
lcN
Ac(
2)o
ligo
sacc
har
ide
pre
curs
or
of
imm
atu
regl
yco
pro
tein
s
N19
3
IPI0
0012
989
man
no
sid
ase,
alp
ha,
clas
s2B
,m
emb
er1
pre
curs
or
Nec
essa
ryfo
rth
eca
tab
olis
mo
fN
-lin
ked
carb
oh
ydra
tes
rele
ased
du
rin
ggl
yco
pro
tein
turn
ove
r.C
leav
esal
lkn
ow
nty
pes
of
R-m
ann
osi
dic
linka
ges
N36
7,N
766
IPI0
0025
869
R-G
alac
tosi
das
eA
pre
curs
or
Hyd
roly
sis
of
term
inal
,n
on
red
uci
ngR
-D-g
alac
tose
resi
du
esin
R-D
-gal
acto
sid
esN
215
aT
he
fun
ctio
ns
of
pro
tein
sar
eo
bta
ined
fro
mE
xpas
y(h
ttp
://w
ww
.exp
asy.
org
).
Glycoproteomics Analysis of Human Liver Tissue research articles
Journal of Proteome Research • Vol. 8, No. 2, 2009 659
peptides are separated by liquid chromatography and thenanalyzed by tandem mass spectrometry.38,39 However, theentire sequence of each identified protein could not be coveredsince only a small portion of peptides will be detected by massspectrometry. Some of trypsin generated peptides cannot beidentified either because they are too small or too hydrophilicto retain on reversed phase LC column. In an analysis ofhemoglobin proteins from human blood samples, the shortHis-containing peptides, GHGK (at position R 57-60) andAHGK (at position � 62-65), were most likely washed off theLC column during loading sample and not detected by MS.25
On the other hand, large peptides could not be identified bymost types of ion trap mass spectrometers because of thelimitation in instrument resolution. In fact, peptides of asuitable mass range would be preferred for analysis in shotgunproteomics. According to MW distribution of the identifiedpeptides in this study as shown in Figure 1, peptides smallerthan 800 Da or bigger than 3500 Da are not likely to beidentified by the shotgun proteomics approach. Although wedo not need the complete sequence to identify a protein, thismass preference would lead to the missing of modification sitesin glycoproteomics analysis when the modified peptides areout of the mass range. In silico digestion of proteins in humanproteome database by trypsin indicated that many N-glyco-peptides are not in the mass range of 800-3500 Da, whichmeans that these N-glycosites could not be located if onlytrypsin is used.
To circumvent this problem, a multiple enzyme digestionstrategy was integrated with the hydrazide capture protocol toanalyze the N-glycosylation in human liver tissue in this study.Besides trypsin, two other proteases, that is, pepsin andthermolysin, with broader specificity were used. The combina-tion of using these three proteases resulted in the identificationof 939 N-glycosites; among them, 317 (33.7%) N-glycosites wereonly identified by pepsin and thermolysin. Among the 317additional N-glycosites, 98 (30.9%) could not be identified bytrypsin in theory because the corresponding in silico trypticpeptides are either too small or too big to detect in massspectrometry. This study clearly demonstrated that the cover-age of N-glycosylation sites could be significantly increased dueto the adoption of multiple enzyme digestion.
The liver is a vital organ present in human body and is thelargest one. It plays a major role in metabolism and has avariety of functions in the body, including glycogen storage,decomposition of red blood cells, plasma protein synthesis, anddetoxification. Liver diseases, such as viral hepatitis, andhepatocellular carcinoma, are vital to human life. There areover 350 million hepatitis virus carriers in the world; over amillion deaths per year (about 10% of all deaths in the adultage range) can be attributed to viral hepatitis, and over a milliondeaths each year are caused by liver cancer.7,40,41 The HumanLiver Proteome Project (HLPP) is one of the initiatives launchedby the Human Proteome Organization (HUPO).7 Its globalscientific objectives are to reveal the human liver proteome,expression profiles, modification profiles, a protein linkagemap, and a proteome localization map. Post-translationalmodification (PTMs) profile of proteins in human liver is oneof global scientific objectives of HLPP. The study of proteinglycosylation in human liver is of great importance since manyliver diseases are correlated with glycoproteins. For example,Deficiency of R-1-antitrypsin, an N-linked glycoprotein, is themost common genetic cause of liver disease in children. It alsocauses chronic liver injury and hepatocellular carcinoma in
adults.42 N-acetylglucosaminyltransferases III (GnT-III) and V(GnT-V), play a pivotal role in the processing of N-linkedglycoproteins, and are highly involved in cancer progressionand metastasis.32 Although phosphoproteomics analysis ofhuman liver tissue has been recently conducted in our labora-tory,43 few works have been done for the analysis of N-glycosylation in human liver tissues. Totally, 523 glycoproteinswith 939 N-glycosites were identified with high confidence inthis study. These proteins were found to have various functionsand be involved in many biological processes. The identifica-tion of glycosyltransferases and glycosidase may provide analternative way to judge liver function by the analysis of therelative expressions of these proteins. Moreover, this large-scaleanalysis could contribute new N-glycosites to the database ofhuman N-glycosites (www.unipep.org), which is a valuableresource for biomarker discovery.
Conclusion
This is the first large-scale analysis of human liver N-glycoproteome by shotgun proteomics approach. A combina-tion of hydrazide capture method and multiple enzyme diges-tion strategy was applied for the comprehensive analysis whichresulted in the identification of 939 N-glycosites and 523glycoproteins from human liver tissue. This data set couldprovide valuable information for the study of liver disease andbiomarker discovery. It is shown that protease with broaderspecificity gives good complement to that of trypsin. Use oftrypsin alone resulted in identification of 622 N-glycosites, whileuse of pepsin and thermolysin resulted in identification ofadditional 317 N-glycosites. Among the 317 additional N-glycosites, 98 (30.9%) could not be identified by trypsin intheory because the corresponding in silico tryptic peptides areeither too small or too big. This study clearly demonstratedthat the coverage of N-glycosylation sites could be significantlyincreased due to the adoption of multiple enzyme digestion.
Acknowledgment. Financial supports from theNational Natural Sciences Foundation of China (No.20675081, 20735004), the China State Key Basic ResearchProgram Grant (2005CB522701, 2007CB914104), the ChinaHigh Technology Research Program Grant (2006AA02A309),the Knowledge Innovation program of CAS (KJCX2.YW.HO9,KSCX2.YW.079) and the Knowledge Innovation program ofDICP to H.Z. and National Natural Sciences Foundation ofChina (No. 20605022, 90713017) to M.Y. are gratefullyacknowledged.
Supporting Information Available: Supporting tableincludes all the identified glycopeptides, N-glycosites andglycoproteins identified by typsin, thermolysin and pepsin,respectively. This material is available free of charge via theInternet at http://pubs.acs.org.
References(1) Apweiler, R.; Hermjakob, H.; Sharon, N. On the frequency of
protein glycosylation, as deduced from analysis of the SWISS-PROTdatabase. Biochim. Biophys. Acta 1999, 1473 (1), 4–8.
(2) Varki, A. Biological roles of oligosaccharidessall of the theoriesare correct. Glycobiology 1993, 3 (2), 97–130.
(3) Glinsky, G. V. Antigen presentation, aberrant glycosylation andtumor progression. Crit. Rev. Oncol. Hematol. 1994, 17 (1), 27–51.
(4) Lee, R. T.; Lauc, G.; Lee, Y. C. Glycoproteomics: protein modifica-tions for versatile functionssMeeting on glycoproteomics. EMBORep. 2005, 6 (11), 1018–1022.
(5) Zhang, H.; Yi, E. C.; Li, X. J.; Mallick, P.; Kelly-Spratt, K. S.;Masselon, C. D.; Camp, D. G.; Smith, R. D.; Kemp, C. J.; Aebersold,
research articles Chen et al.
660 Journal of Proteome Research • Vol. 8, No. 2, 2009
R. High throughput quantitative analysis of serum proteins usingglycopeptide capture and liquid chromatography mass spectrom-etry. Mol. Cell. Proteomics 2005, 4 (2), 144–155.
(6) Ramachandran, P.; Boontheung, P.; Xie, Y. M.; Sondej, M.; Wong,D. T.; Loo, J. A. Identification of N-linked glycoproteins in humansaliva by glycoprotein capture and mass spectrometry. J. ProteomeRes. 2006, 5 (6), 1493–1503.
(7) He, F. C. Human Liver Proteome ProjectsPlan, progress, andperspectives. Mol. Cell. Proteomics 2005, 4 (12), 1841–1848.
(8) Wuhrer, M.; Catalina, M. I.; Deelder, A. M.; Hokke, C. H. Glyco-proteomics based on tandem mass spectrometry of glycopeptides.J. Chromatogr., B 2007, 849 (1-12), 115–128.
(9) Zhang, H.; Li, X. J.; Martin, D. B.; Aebersold, R. Identification andquantification of N-linked glycoproteins using hydrazide chem-istry, stable isotope labeling and mass spectrometry. Nat. Biotech-nol. 2003, 21 (6), 660–666.
(10) Liu, T.; Qian, W. J.; Gritsenko, M. A.; Camp, D. G.; Monroe, M. E.;Moore, R. J.; Smith, R. D. Human plasma N-glycoproteomeanalysis by immunoaffinity subtraction, hydrazide chemistry, andmass spectrometry. J. Proteome Res. 2005, 4 (6), 2070–2080.
(11) Zhang, H.; Liu, A. Y.; Loriaux, P.; Wollscheid, B.; Zhou, Y.; Watts,J. D.; Aebersold, R. Mass spectrometric detection of tissue proteinsin plasma. Mol. Cell. Proteomics 2007, 6 (1), 64–71.
(12) Olsen, J. V.; Ong, S. E.; Mann, M. Trypsin cleaves exclusivelyC-terminal to arginine and lysine residues. Mol. Cell. Proteomics2004, 3 (6), 608–614.
(13) Imre, T.; Schlosser, G.; Pocsfalvi, G.; Siciliano, R.; Molnar-Szollosi,E.; Kremmer, T.; Malorni, A.; Vekey, K. Glycosylation site analysisof human alpha-1-acid glycoprotein (AGP) by capillary liquidchromatography-electrospray mass spectrometry. J. Mass Spec-trom. 2005, 40 (11), 1472–1483.
(14) Bezouska, K.; Sklenar, J.; Novak, P.; Halada, P.; Havlicek, V.; Kraus,M.; Ticha, M.; Jonakova, V. Determination of the complete covalentstructure of the major glycoform of DQH sperm surface protein,a novel trypsin-resistant boar seminal plasma O-glycoproteinrelated to pB1 protein. Protein Sci. 1999, 8 (7), 1551–1556.
(15) Cutalo, J. M.; Deterding, L. J.; Tomer, K. B. Characterization ofglycopeptides from HIV-I-SF2 gp120 by liquid chromatographymass spectrometry. J. Am. Soc. Mass. Spectrom. 2004, 15 (11), 1545–1555.
(16) Larsen, M. R.; Hojrup, P.; Roepstorff, P. Characterization of gel-separated glycoproteins using two-step proteolytic digestion com-bined with sequential microcolumns and mass spectrometry. Mol.Cell. Proteomics 2005, 4 (2), 107–119.
(17) Clowers, B. H.; Dodds, E. D.; Seipert, R. R.; Lebrilla, C. B. Sitedetermination of protein glycosylation based on digestion withimmobilized nonspecific proteases and Fourier transform ioncyclotron resonance mass spectrometry. J. Proteome Res. 2007, 6(10), 4032–4040.
(18) Wu, S. L.; Kim, J.; Hancock, W. S.; Karger, B. Extended RangeProteomic Analysis (ERPA): A new and sensitive LC-MS platformfor high sequence coverage of complex proteins with extensivepost-translational modifications-comprehensive analysis of beta-casein and epidermal growth factor receptor (EGFR). J. ProteomeRes. 2005, 4 (4), 1155–1170.
(19) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark,J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss,A.; Clark, J. I.; Yates, J. R. Shotgun identification of proteinmodifications from protein complexes and lens tissue. Proc. Natl.Acad. Sci. U.S.A. 2002, 99 (12), 7900–7905.
(20) Tian, Y. A.; Zhou, Y.; Elliott, S.; Aebersold, R.; Zhang, H. Solid-phase extraction of N-linked glycopeptides. Nat. Protoc. 2007, 2(2), 334–339.
(21) Wang, F. J.; Dong, J.; Jiang, X. G.; Ye, M. L.; Zou, H. F. Capillarytrap column with strong cation-exchange monolith for automatedshotgun proteome analysis. Anal. Chem. 2007, 79 (17), 6599–6606.
(22) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlatetandem mass-spectral data of peptides with amino-acid-sequencesin a protein database. J. Am. Soc. Mass. Spectrom. 1994, 5 (11),976–989.
(23) Jiang, X. N.; Jiang, X. G.; Han, G. H.; Ye, M. L.; Zou, H. F.Optimization of filtering criterion for SEQUEST database searchingto improve proteome coverage in shotgun proteomics. BMC Bioinf.2007, 8.
(24) King, N. L.; Deutsch, E. W.; Ranish, J. A.; Nesvizhskii, A. I.; Eddes,J. S.; Mallick, P.; Eng, J.; Desiere, F.; Flory, M.; Martin, D. B.; Kim,B.; Lee, H.; Raught, B.; Aebersold, R. Analysis of the Saccharomycescerevisiae proteome with PeptideAtlas. GenomeBiology 2006, 7, 11.
(25) Gatlin, C. L.; Eng, J. K.; Cross, S. T.; Detter, J. C.; Yates, J. R.Automated identification of amino acid sequence variations inproteins by HPLC/microspray tandem mass spectrometry. Anal.Chem. 2000, 72 (4), 757–763.
(26) Garcia-Castro, M. I.; Vielmetter, E.; Bronner-Fraser, M. N-cadherin,a cell adhesion molecule involved in establishment of embryonicleft-right asymmetry. Science 2000, 288 (5468), 1047–1051.
(27) Ueki, T.; Fujimoto, J.; Suzuki, T.; Yamamoto, H.; Okamoto, E.Expression of hepatocyte growth factor and its receptor c-metproto-oncogene in hepatocellular carcinoma. Hepatology 1997, 25(4), 862–866.
(28) Scarselli, E.; Ansuini, H.; Cerino, R.; Roccasecca, R. M.; Acali, S.;Filocamo, G.; Traboni, C.; Nicosia, A.; Cortese, R.; Vitelli, A. Thehuman scavenger receptor class B type I is a novel candidatereceptor for the hepatitis C virus. EMBO J. 2002, 21 (19), 5017–5025.
(29) Parodi, A. J. Role of N-oligosaccharide endoplasmic reticulumprocessing reactions in glycoprotein folding and degradation.Biochem. J. 2000, 348, 1–13.
(30) Breton, C.; Snajdrova, L.; Jeanneau, C.; Koca, J.; Imberty, A.Structures and mechanisms of glycosyltransferases. Glycobiology2006, 16 (2), 29R–37R.
(31) Dagostaro, G.; Bendiak, B.; Tropak, M. Cloning of cdna-encodingthe membrane-bound form of bovine beta-1,4-galactosyltrans-ferase. Eur. J. Biochem. 1989, 183 (1), 211–217.
(32) Taniguchi, N.; Miyoshi, E.; Ko, J. H.; Ikeda, Y.; Ihara, Y. Implicationof N-acetylglucosaminyltransferases III and V in cancer: generegulation and signaling mechanism. Biochim. Biophys. Acta 1999,1455 (2-3), 287–300.
(33) Akasaka-Manya, K.; Manya, H.; Endo, T. Mutations of the POMT1gene found in patients with Walker-Warburg syndrome lead to adefect of protein O-mannosylation. Biochem. Biophys. Res. Com-mun. 2004, 325 (1), 75–79.
(34) Aronson, N. N.; Deduve, C. Digestive activity of lysosomes 0.2.digestion of macromolecular carbohydrates by extracts of rat liverlysosomes. J. Biol. Chem. 1968, 243 (17), 4564.
(35) Wang, A. M.; Schindler, D.; Desnick, R. J. Schindler disease - themolecular lesion in the alpha-n-acetylgalactosaminidase gene thatcauses an infantile neuroaxonal dystrophy. J. Clin. Invest. 1990,86 (5), 1752–1756.
(36) Bendtsen, J. D.; Nielsen, H.; von Heijne, G.; Brunak, S. Improvedprediction of signal peptides: SignalP 3.0. J. Mol. Biol. 2004, 340(4), 783–795.
(37) Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. L.Predicting transmembrane protein topology with a hidden Markovmodel: Application to complete genomes. J. Mol. Biol. 2001, 305(3), 567–580.
(38) Aebersold, R.; Goodlett, D. R. Mass spectrometry in proteomics.Chem. Rev. 2001, 101 (2), 269–295.
(39) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics.Nature 2003, 422 (6928), 198–207.
(40) Plan and progress of human liver proteome project (HLPP). CellBiol. Int. 2005, 29 (10), S2-S2.
(41) Kirk, G. D.; Bah, E.; Montesano, R. Molecular epidemiology ofhuman liver cancer: insights into etiology, pathogenesis andprevention from The Gambia, West Africa. Carcinogenesis 2006,27 (10), 2070–2082.
(42) Rudnick, D. A.; Perlmutter, D. H. Alpha-1-antitrypsin deficiency:A new paradigm for hepatocellular carcinoma in genetic liverdisease. Hepatology 2005, 42 (3), 514–521.
(43) Han, G. H.; Ye, M. L.; Zhou, H. J.; Jiang, X. N.; Feng, S.; Jiang, X. G.;Tian, R. J.; Wan, D. F.; Zou, H. F.; Gu, J. R. Large-scale phosphop-roteome analysis of human liver tissue by enrichment andfractionation of phosphopeptides with strong anion exchangechromatography. Proteomics 2008, 8 (7), 1346–1361.
PR8008012
Glycoproteomics Analysis of Human Liver Tissue research articles
Journal of Proteome Research • Vol. 8, No. 2, 2009 661