glycoproteomics analysis of human liver tissue by

Subscriber access provided by DALIAN INST OF CHEM PHYSICS

Journal of Proteome Research is published by the American Chemical Society.1155 Sixteenth Street N.W., Washington, DC 20036

Article

Glycoproteomics Analysis of Human Liver Tissue by Combinationof Multiple Enzyme Digestion and Hydrazide Chemistry

Rui Chen, Xinning Jiang, Deguang Sun, Guanghui Han,Fangjun Wang, Mingliang Ye, Liming Wang, and Hanfa Zou

J. Proteome Res., 2009, 8 (2), 651-661 • Publication Date (Web): 21 January 2009

Downloaded from http://pubs.acs.org on February 6, 2009

More About This Article

Additional resources and features associated with this article are available within the HTML version:

• Supporting Information• Access to high resolution figures• Links to articles and content related to this article• Copyright permission to reproduce figures and/or text from this article

http://pubs.acs.org/doi/full/10.1021/pr8008012

Glycoproteomics Analysis of Human Liver Tissue by Combination of

Multiple Enzyme Digestion and Hydrazide Chemistry

Rui Chen,† Xinning Jiang,† Deguang Sun,‡ Guanghui Han,† Fangjun Wang,† Mingliang Ye,*,†

Liming Wang,‡ and Hanfa Zou*,†

Key Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic R&A Center, DalianInstitute of Chemical Physics, The Chinese Academy of Sciences, Dalian 116023, China, and The Second

Affiliated Hospital of Dalian Medical University, Dalian 116027, China

Received September 19, 2008

The study of protein glycosylation has lagged far behind the progress of current proteomics becauseof the enormous complexity, wide dynamic range distribution and low stoichiometric modification ofglycoprotein. Solid phase extraction of tryptic N-glycopeptides by hydrazide chemistry is becoming apopular protocol for the analysis of N-glycoproteome. However, in silico digestion of proteins in humanproteome database by trypsin indicates that a significant percentage of tryptic N-glycopeptides is notin the preferred detection mass range of shotgun proteomics approach, that is, from 800 to 3500 Da.And the quite big size of glycan groups may block trypsin to access the K, R residues near N-glycositesfor digestion, which will result in generation of big glycopeptides. Thus many N-glycosites could notbe localized if only trypsin was used to digest proteins. Herein, we describe a comprehensive way toanalyze the N-glycoproteome of human liver tissue by combination of hydrazide chemistry methodand multiple enzyme digestion. The lysate of human liver tissue was digested with three proteases,that is, trypsin, pepsin and thermolysin, with different specificities, separately. Use of trypsin aloneresulted in identification of 622 N-glycosites, while using pepsin and thermolysin resulted in identificationof 317 additional N-glycosites. Among the 317 additional N-glycosites, 98 (30.9%) could not be identifiedby trypsin in theory because the corresponding in silico tryptic peptides are either too small or too bigto detect in mass spectrometer. This study clearly demonstrated that the coverage of N-glycosites couldbe significantly increased due to the adoption of multiple enzyme digestion. A total number of 939N-glycosites were identified confidently, covering 523 noredundant glycoproteins from human livertissue, which leads to the establishment of the largest data set of glycoproteome from human liver upto now.

Keywords: Glycoproteomics • Hydrazide chemistry • Multiple enzyme digestion • Human liver proteomeproject

Introduction

Protein glycosylation is acknowledged as one of the majorpost-translational modifications with significant effects onprotein folding, conformation distribution, stability and activity.It is estimated that more than half of all mammalian proteinsare glycosylated.1 The attached oligosaccharides play structural,protective, stabilizing roles in living cells and is involved indiverse biological mechanisms, for examle, protein-proteinrecognition, receptor interaction, cell communication andadhesion.2 Besides, the aberrant glycosylation is a basic feature

of oncogenesis and tumor progression.3 Many attention hasbeen given to the growing field of glycoproteomics to studythe biological roles of protein glycosylation and the relationshipbetween glycosylation and diseases by structural characteriza-tion of glycopeptides or glycans attached to certain glycosyla-tion sites with mass spectrometric methods.4 And a variety ofbody fluids, like plasma,5 saliva6 have been studied to findbiomarkers of cancers and other diseases. However, littleattention has been paid to the organs, for example, the liver,despite the fact that the proteome study of liver can providevaluable knowledge to accelerate the research of liver diseaseand biomarker discovery.7

A crucial step for glycoproteome analysis is to specificallyenrich glycopeptides from protein digest because of thecomplexity of tissue samples and the great dynamic range ofglycoproteins.8 Solid phase extraction by hydrazide chemistryis becoming a popular protocol for enrichment of N-glycopep-tides (N-glycopeptides refer to the deglycosylated N-linkedglycopeptides in this work if not otherwise stated) for the

* To whom correspondence should be addressed. Prof. Dr. Hanfa Zou,National Chromatographic R&A Center, Dalian Institute of Chemical Physics,The Chinese Academy of Sciences, Dalian 116023, China. Tel, +86-411-84379610; fax, +86-411-84379620; e-mail, [email protected]. Prof. Dr.Mingliang Ye, National Chromatographic R&A Center, Dalian Institute ofChemical Physics, Chinese Academy of Sciences, Dalian 116023, China. Tel,+86-411-84379620; fax, +86-411-84379620; e-mail, [email protected].

† Dalian Institute of Chemical Physics, The Chinese Academy of Sciences.‡ The Second Affiliated Hospital of Dalian Medical University.

10.1021/pr8008012 CCC: $40.75 2009 American Chemical Society Journal of Proteome Research 2009, 8, 651–661 651Published on Web 01/21/2009

http://pubs.acs.org/action/showImage?doi=10.1021/pr8008012&iName=master.img-000.png&w=251&h=40

analysis of N-glycoproteome.9 With high-abundance proteindepletion and two-dimensional liquid chromatography MS/MS spectrometry, the hydrazide chemistry method resulted inthe identification of 639 N-glycosylation sites in human se-rum.10 Nevertheless, the coverage of this method to studyN-glycosylation in tissue is much lower. Only 445, 248 and 176N-glycosites were identified from prostate cancer tissue, livermetastasis and bladder cancer tissue, respectively.11

Although trypsin has the advantage of high specificity whichcleaves exclusively at arginine (R) and lysine (K) residue,12 it isfar from a perfect enzyme for studying postmodifications.Trypsin cleavage may be blocked by the attached carbohydratessterically when trypsin cleavage site was located right afterglycosylated asparagine.13 As a result, missed trypsin cleavagetends to occur more frequently when glycoprotein are digested.Moreover, some glycoproteins are not suitable for trypticdigestion either because they lack enough arginine and lysineresidues in their sequences or they are not soluble at theoptimal pH range of trypsin, for example, DQH sperm surfaceprotein. This protein has extremely low solubility at neutral orslightly alkaline pH conditions. It is necessary to use V8protease or pepsin to digest this protein at low pH value.14 Inthe analysis of gp120, an HIV envelope glycoprotein, the fiveglycosylation site could not be characterized due to incompletetrypsin digestion and the presence of multiple glycosylationsites within one peptide.15

To overcome the limitations described above, proteases, suchas proteinase K16 or Pronase17 have been adopted for digestionof several glycoproteins in a nonspecific manner. Thesenonspecific enzymes will generate a mixture of glycopeptideswith shorter sequences which are less difficult to detect by MSspectrometry. The nonglycosylated proteins and peptides arecompletely digested. So the peptides with attached glycans canbe easily isolated and analyzed by liquid chromatographytandem mass spectrometry. However, since the amino acidresidues in the peptides portions of glycopeptides are very shortwhich vary from two to five amino acid residues, it is difficultto identify the peptide sequence and the exact glycosylationsites if the sequence of the protein is unknown. An alternativemethod is the digestion of glycoproteins by Lys-C to generatea small number of peptides than tryptic fragments and theresultant larger peptides is analyzed by high resolution Fouriertransform ion cyclotron resolution mass spectrometer.18 How-ever, it is not easy to identify large peptides if low resolutionion trap mass spectrometer is used. Multiple enzyme digestionstrategy has been applied to the analysis of other modifica-tions.19 A specific enzyme, trypsin and two other enzymes withbroader specificity, subtilisin and elastase, were utilized for theprotein digestion. Each digest was analyzed separately by usingmultidimensional liquid chromatography and tandem massspectrometry. Eighteen sites of four different types of modifica-tion have been detected on three of the five proteins in a simplemixture. Their approach was further applied to analyze theproteins extracted from the lens tissue; 270 proteins wereidentified, and 11 different crystallins were found to contain atotal of 73 sites of modification.

Improved sequence coverage could be achieved by over-lapped peptides generated in multiple enzyme digestion,leading to an increased chance of finding modification sites.In addition, the digestion by protease with specificity comple-mentary to trypsin can also generate more peptides within thepreferable mass range for peptide identification and modifica-tion site location. With multiple enzyme digestion, it should

be able to find N-glycosylation sites that would not bediscovered with only trypsin digestion. To the best of ourknowledge, the application of multiple enzyme digestion tostudy protein glycosylation was not conducted in proteomelevel up to now. The combination of this strategy with N-glycopeptide enrichment method such as hydrazide chemistrywould be effective in the study of N-glycoproteomics.

The comprehensive analysis of human liver tissue N-glyco-proteome by hydrazide chemistry in combination with multipleenzyme digestion strategy is presented in this work. Pepsin andthermolysin with broader specificity were used as complemen-tary proteases to trypsin. It was found that this strategy cansignificantly improve N-glycosite coverage. A total number of939 N-glycosites were identified with high confidence, covering523 noredundant proteins. This is by far the largest data setobtained for human liver N-glycoproteome.

Materials and Methods

Materials and Chemicals. All the water used in this experi-ment was prepared using a Milli-Q system (Millipore, Bedford,MA); iodoacetamide (IAA) was purchased from AMERESCO(Solon, OH); dithiothreitol (DTT), ammonium bicarbonate,urea, trifluoroacetic acid (TFA), formic acid (FA) and acetonitrile(ACN) were all obtained from Aldrich (Milwaukee, WI). All thechemicals were of analytical grade except acetonitrile, whichwas of HPLC grade. 3-[(3-Cholamidopropyl)-dimethylammo-nio]-1-propane sulfonate (CHAPS) was from USB Corporation(Cleveland, OH). The Complete Mini protease inhibitor cocktailtablets were obtained from Roche Diagnostics (Mannhaim,Germany). Trypsin (from bovine pancreas, TPCK-treated),pepsin (from porcine gastric mucosa) and thermolysin (fromBacillus thermoproteolyticus rokko were obtained from Sigma(St. Louis, MO). Affi-Gel Hz Hydrazide Gel was purchased fromBio-Rad (Hercules, CA). PNGase F was from New EnglandBiolabs (Ipswich, MA).

Sample Preparation. The human liver tissue was the non-cancerous liver tissues g2 cm outside the hepatic cancernodules removed by surgical operation from patients at theSecond Affiliated Hospital of Dalian Medical University (Dalian,China). The noncancerous liver tissue has been verified byhistopathological examination which excluded the presence ofinvading or microscopic metastatic cancer cells. The utilizationof human tissues complied with guidelines of Ethics Committeeof the Hospital. The isolated human liver tissue was placed inan ice-cold homogenization buffer consisting of 8 M urea, 4%CHAPS (w/v), 65 mM DTT, a mixture of protease inhibitor(Complete Mini protease inhibitor cocktail tablets, 1 tablet for10 mL homogenization buffer) and 100 mM NH4HCO3 at pH7.8. The tissues were homogenized in a Potter-Elvejhemhomogenizer with a Teflon piston, using 10 mL of the homog-enization buffer per 2 g of tissue. The suspension was homog-enized for approximately 1 min, sonicated for 100 W × 30 sand centrifuged at 25 000g for 1 h. The supernatant containedthe total tissue proteins. The concentration of proteins wasdetermined by BCA method, which was 7.8 mg protein/mLlysate. Appropriate volumes of protein sample were precipitatedwith 5 vol of 50% acetone, 50% alcohol and 0.1% acetic acid(v/v/v), lyophilized to dryness, and dissolved in reducingsolution (8 M urea, 100 mM NH4HCO3). The sample wasreduced by 10 mM DTT and carboxyamidomethylated in 20mM iodoacetamide.

Protein Digestion. Samples containing 5 mg proteins weredigested with three different proteases separately using different

research articles Chen et al.

652 Journal of Proteome Research • Vol. 8, No. 2, 2009

protocols. Trypsin digestion: Protein sample was diluted with100 mM PBS buffer (pH ) 8.0) to a urea concentration of 1 M.A total of 200 µg of TPCK-treated trypsin was added andincubated for 24 h at 37 °C. The reaction was quenched byadding 10% TFA (v/v) to pH ) 3. Pepsin digestion: Samplecontaining the same amount of proteins was diluted with 0.1%TFA (v/v) 4-fold and adjusted to a pH between 2 and 3 with10% TFA (v/v). Pepsin was added at an enzyme-to-substrateratio of 1:25 (w/w) and incubated for 24 h at 37 °C. Thermol-ysin digestion: sample solution was diluted with 100 mMNH4HCO3 to a urea concentration of 2 M. A total of 200 µg ofthermolysin was added and incubated for 24 h at 37 °C. Thereaction was stopped by adding 10% TFA (v/v) to pH ) 3. Eachdigest was stored at -80 °C until analysis.

N-linked Glycopeptide Capture. The N-linked glycopeptidesin each digest were captured with hydrazide chemistry asdescribed in a published protocol with minor modification.20

Digested sample solution was first desalted with homemadeC18 solid phase extraction column. After it was dried in a SpeedVac (Thermo, San Jose, CA), it was reconstituted in 450 µL ofoxidation buffer (100 mM NaAc, 150 mM NaCl, pH ) 5.5); then,50 µL of 100 mM sodium periodate was added, and the reactionwas incubated at room temperature in darkness for 1 h. Theoxidized sample was desalted with C18 column again and theeluted solution was directly added to 100 µL of Bio-Rad Affi-Gel Hz Hydrazide Gel (bed volume). The coupling reaction wasleft overnight at room temperature with shaking. The gel withcaptured N-linked glycopeptides was then washed with 1 mLof 1.5 M NaCl, methanol, and 100 mM NH4HCO3 sequentiallythree times to remove nonspecific adsorptions. After additionof 250 µL of fresh 100 mM NH4HCO3 and 10 µL of PNGase F(500 units per µL) to the gel, the enzymatic release of peptidemoieties was carried out at 37 °C overnight. The end productfor this procedure is the isolation of deglycosylated peptides(termed as N-glycopeptide in this study) that originally containN-linked carbohydrates. The supernatant was collected bygentle centrifugation and combined with the supernatant of100 mM NH4HCO3 wash fraction. The peptide solution wasdesalted by C18 column, dried and reconstituted in 25 µL of0.1% formic acid as N-glycopeptide sample for LC-MS/MSanalysis.

Analysis of Enriched Glycopeptides. The analysis of theN-glycopeptide samples was carried out by 2D nano-LC/MS/MS according to our previous method with minor modifica-tion.21 The 25 µL reconstituted glycopeptide sample was loadedonto a phosphate SCX monolithic column (150 µm × 7 cm i.d.)by air pressure. After that, the SCX column with loadedglycopeptides was manually connected to a C18 column (75µm × 10 cm i.d.) in tandem by a union. The glycopeptides wereeluted onto the analytical reversed phase column using stepgradients generated with 1000 mM ammonium acetate in 0.1%formic acid. The salt concentrations of the buffer used for 10step gradients were 50, 100, 150, 200, 250, 300, 350, 400, 500and 1000 mM, respectively. Each salt step lasts 10 min with afollowing equilibrium by 0.1% formic acid for additional 10 min.After each elution step, a subsequent RPLC-MS/MS wasexecuted with a 120 min gradient from 5% to 35% acetonitrile.The RPLC-MS/MS was performed on a nano-RPLC-MS/MSsystem. A Finnigan surveyor MS pump (Thermo Finnigan, SanJose, CA) was used to deliver mobile phase. For the C18capillary separation column, one end of the fused-silica capil-lary was manually pulled to a fine point of ∼5 µm with a flametorch. The columns were in-house packed with C18 AQ beads

(5 µm, 120 Å) from Michrom BioResources (Auburn, CA) usinga pneumatic pump. The nano-RPLC column was directlycoupled with a LTQ linear ion trap mass spectrometer fromThermo Finnigan (San Jose, CA) with a nanospray source. Themobile phase consisted of mobile phase A containing 0.1%formic acid (v/v) in water, and mobile phase B containing 0.1%(v/v) formic acid in acetonitrile. A Finnigan LTQ linear ion trapmass spectrometer equipped with an ESI nanospray source wasused for the MS experiment. A voltage of 1.8 kV was applied tothe cross before the C18 capillary column. The LTQ instrumentwas operated at positive ion mode. Normalized collision energywas 35.0%. One microscan was set for each MS and MS/MSscan. All MS and MS/MS spectra were acquired in the data-dependent mode. The mass spectrometer was set that one fullMS scan was followed by six MS/MS scans on the six mostintense ions. The dynamic exclusion function was set as follows:repeat count 2, repeat duration 30 s, and exclusion duration90 s. System control and data collection were done by Xcalibursoftware version 1.4 (Thermo). The scan range was set fromm/z 400 to m/z 2000.

Database Searching and Data Processing. The MS/MSspectra were searched using SEQUEST22 (version 2.7) againsta composite database including both original and reversedhuman protein database of International Protein Index (ipi.human 3.17.fasta, including 60 234 entries, http://www.ebi.ac.uk/IPI/IPIhuman.html). Enzyme was selected as theprotease used for protein digestion. The cleavage site was setaccording to manufacture’s guide, trypsin, KR/P; pepsin, FLE/VAG; thermolysin, LFIVMA/P. Enzyme limits was set fullyenzymatic, cleaving at both ends. A maximum of two missedcleavage was allowed. Carboxamidomethylation (+57) was setas static modification and the following dynamic modificationswere used: oxidation of methionine (+16), and a PNGaseFcatalyzed conversion of asparagines to aspartic acid at N-glycosites (+1). The database search results of each proteasewere filtered with the criteria optimized by SFOER (ver-sion2.3).23 A false detection rate (FDR) of less than 2% was setfor all peptide identifications. The majority of the identifiedpeptides containing NXS/T-motif and only ∼15% of identifiedpeptides did not contain this motif. These peptide identifica-tions probably resulted from nonspecific isolation of N-linkedglycopeptides or random assignment in database search. Thispercentage of is lower than that in a similar work (∼20%) forisolation of N-glycopeptides,11 which indicated the high speci-ficity of N-glycopeptide isolation in this study. To reduce thefalse positive rate of identified N-glycosites, the peptideswithout such motifs were removed. After the filtration, fewpeptides that contained NXS/T were identified from thereversed database search (FDR < 0.1% for identification ofN-glycopeptides was obtained), which suggested the highidentification confidence. The identified glycoproteins fromhuman liver tissue were categorized by Gene Ontology (GO)with component, function and process terms extracted fromtext-based annotation files downloaded from the EuropeanBioinformatics Institute ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/).

Results

Glycoproteomic Analysis with Trypsin. The proteins ex-tracted from liver tissue were first digested by trypsin, and thenthe N-linked glycopeptides in the resulting digest was im-mobilized onto the hydrazide beads. After thoroughly washing,the peptide moieties of the captured N-linked glycopeptides

Glycoproteomics Analysis of Human Liver Tissue research articles

Journal of Proteome Research • Vol. 8, No. 2, 2009 653

were released from the beads by deglycosylation using PNGaseF. The deglycosylated peptides (N-glycopeptides) were finallyanalyzed by 2D nano-LC-MS/MS. Three 2D nano-LC-MS/MSruns were performed in this study, which resulted in theidentification of 840 unique tryptic N-glycopeptides from 384glycoproteins. The identified N-glycopeptides and glycoproteinswere given in the Supporting Information. On the basis of theNXS/T-motif, 622 N-glycosites were identified from human livertissue.

The acquisition settings for MS/MS were in the range of400-2000 Da in this study, thus, the peptide mass detectionrange should be 400-6000 Da considering three charge statesof peptides (+1, +2, +3). However, the mass range of identifiedN-glycopeptides is much narrower than this range as shownin Figure 1A. Most of the N-glycopeptides locate themselvesin a mass range of 800-3500 Da. For comparison, the MWdistribution of nonglycopeptides identified by directly analyzingthe tryptic peptides from the same sample with 1D nanoLC-MS/MS is also given in Figure 1B. It was found that the two

distributions are very similar. The similar mass ranges ofpeptides identified by shotgun proteomics were well-docu-mented in the literature.24 The small or big peptides cannotbe identified through database search probably because theseMS/MS spectra do not contain enough fragment information.The above facts indicate that the preferable mass range forpeptides identified by shotgun proteomics is from 800 to 3500Da.

The N-glycosite is not likely to be identified by shotgunproteomics approach if the mass of corresponding N-glyco-peptides is not in the range of 800-3500 Da. Therefore, it is ofinterest to see how many tryptic N-glycopeptides derived fromthe proteins in the human proteome are within this range. Themass distributions of tryptic peptides and tryptic N-glycopep-tides generated by in silico digestion of all proteins in thehuman protein database (IPI Human 3.17) were given in Figure2A. The in silico tryptic N-glycopeptides are tryptic peptidescontaining NXS/T motif. The percentage of tryptic peptides andtryptic N-glycopeptides in the mass range of 800-3500 Da were

Figure 1. Molecular weight distribution of peptides identified from tryptic digest of protein extracted from human liver tissue. (A)N-glycopeptides enriched by hydrazide chemistry; (B) nonglycopeptides identified by direct analysis.



http://pubs.acs.org/action/showImage?doi=10.1021/pr8008012&iName=master.img-001.jpg&w=299&h=445

70.0% and 68.7%, respectively. Since most proteins wouldgenerate dozens of peptides after trypsin digestion and proteinscould be identified by any of these peptides with their massesin the range, the theoretical proteome coverage by trypticpeptides was still close to 100% even though 30.0% trypticpeptides are not in this range. However, the situation for theidentification of protein modification is different. The glycositecould be identified only if a peptide containing this site isidentified. As only 68.7% tryptic N-glycopeptides are in thepreferred mass range, a significant amount of N-glycositescould not be identified if only trypsin was used. The above-

mentioned in silico digestion did not include the missedcleavage sites, and the missed cleavage is inevitable in proteindigestion especially for glycoproteins. We analyzed the se-quences of the identified N-glycopeptides and nonglycopep-tides from human liver tissue. It was found that among the840 identified unique tryptic N-glycopeptides, 169 (20.1%) havemissed cleavage site. While among the nonglycopeptidesidentified by 1D LC-MS/MS, 152 of 1069 (14.2%) have missedcleavage sites, respectively. These results indicate that missedcleavage tends to happen more frequently when glycoproteinis digested. To investigate how missed cleavage affects N-

Figure 2. Comparison of molecular weight distribution of peptides by in silico digestion of proteins in human protein database (IPIHuman 3.17) by trypsin between all the peptides (in blue) and peptides with NXS/T motif (in red) with different missed cleavage sites.(A) No missed cleavage site; (B) one missed cleavage site; and (C) two missed cleavage sites. The average molecular weight of eachdistribution is listed in the inserted table.




glycosite identification, in silico digestion of proteins fromhuman protein database with one missed cleavage was alsoconducted and the resulting peptide mass distributions areshown in Figure 2B. It is astonishing to find that N-glycopep-tides with missed cleavage sites were much bigger than thoseof all peptides. With one missed cleavage site, the average MWfor all peptides and N-glycopeptides was 2422.61 and 3756.37Da, respectively. There are still 71.9% of peptides locatethemselves in the mass range of 800-3500 Da for all thepeptides; however, only 52.2% of tryptic N-glycopeptides werein the mass range of 800-3500 Da. When two missed cleavageswere considered, this percentage descended to 33.5%. For allpeptides, this percentage is still as high as 59.2% as shown inFigure 2C. These data indicate that much more N-glycopeptidescould not be identified when trypsin cleavage sites are missed.These facts demonstrated that it was inadequate to use onlytrypsin for N-glycosite identification. To improve N-glycositecoverage in bottom up approach, enzymes with complemen-tary cleavage specificity should be adopted to digest proteins.

Glycoproteomic Analysis with Multiple Proteases Diges-tion. We adopted multiple enzyme digestion to improve thecoverage of N-glycosites for glycoproteome analysis. The aimof multiple enzyme digestion was once to promote sequencecoverage in identification of amino acid sequence variationsin shotgun analysis by the means of generating overlappedpeptides.25 When it was applied to the protein identificationfrom complex samples or tissue, the entire protein sequenceof abundant proteins could be covered in the chances ofidentifying a modification like phosphorylation on a specificamino acid residue.19 Pepsin and thermolysin were used ascomplements to trypsin in this study. After digestion, theN-glycopeptides were enriched by the same hydrazide chem-istry method and analyzed by 2D nano-LC-MS/MS.The number of identified N-glycosites and glycoproteins bythese two proteases as well as by trypsin were given in Figure3. The full lists of identified N-glycopeptides and glycoproteinswere given in the Supporting Information. Among all 939N-glycosites identified, only 20 N-glycosites were idetnified byall the three proteases and another 125 N-glycosites wereidentified by both proteases. These N-glycosites were identifiedby multiple peptides generated by different proteases; thus, theidentification of these sites were very confident. The use ofmultiple proteases can reduce the ambiguity in mappingmodifications and increase the possibility of obtaining highquality MS/MS spectrum to identify peptides.

However, the identified N-glycopeptides sharing the sameN-glycosite account for a very small proportion in the totalnumber of identified N-glycopeptides. Much more N-glyco-peptides with different N-glycosites were identified usingdifferent proteases which indicated good complement inN-glycosite identification to trypsin and so the N-glycositecoverage could be significantly increased. The other twoproteases used in this study led to identification of 317 (33.7%)additional N-glycosites and 139 (26.6%) additional glycopro-teins. This complementary effect is illustrated in Figure 3. Itcan be clearly seen that the adoption of multiple enzymedigestion could both enhance N-glycosites coverage of theidentified glycoproteins and identify the glycoproteins whichcannot be identified by using trypsin alone.

Among the 317 additional N-glycosites, 98 (30.9%) could notbe identified by trypsin in theory because the correspondingin silico tryptic peptides are not in the mass range of 800-3500Da. For example, the tryptic peptide carrying N-glycosite at

Asn426 of splice isoform Sap-mu-0 of Proactivator polypeptideprecursor is N#STK, which is too small to be identified.However, this N-glycosite can be well-identified with a peptideof LEKN#STKQEIL (MW ) 1303.73) by thermolysin. On theother hand, for proteins which lack lysine and arginine residues,trypsin digestion will generate peptides with longer sequence.These peptides cannot easily be fragmented in CID to give highquality mass spectra for sequence identification. For example,MAL2 Protein has only two trypsin cleavage sites in itssequence. Its tryptic peptide carrying N-glycosite is LQG-WVMFVSVTAFFFSLLFLGMFLSGMVAQIDANWNFLDFAYHFTV-FVFYFGAFLLEAAATSLHDLHCN#TTITGQPLLSDNQYNINVAA-SIFAFMTTACYGCSLGLALR (MW ) 12113.93); analysis of thishuge peptide needs the top-down technique by high resolutionmass spectrometry. Its pepsin generated peptide, HDLHCN#TTITGQPLL (MW ) 1161.82), is very easy to be analyzed usingbottom-up method. Another example further demonstrated thenecessity of using multiple enzyme digestion in glycosylationanalysis as shown in Figure 4. Neural-cadherin is a calciumdependent cell-cell adhesion glycoprotein that functions dur-ing gastrulation and is required for establishment of left-rightasymmetry.26 It has 10 NXS/T motifs in the sequence of thisprotein and 7 were annotated as potential N-glycosites in theSwiss-Prot database. However, in the N-glycosites database

Figure 3. Complementary effect of multiple enzyme digestion inglycoproteomics analysis of human liver tissue. (A) N-glycositesand (B) glycoprotein.



http://pubs.acs.org/action/showImage?doi=10.1021/pr8008012&iName=master.img-003.png&w=216&h=382

(www.unipep.org) which is based on the analysis of trypticglycopeptides, N-glycosites at Asn325 and Asn402 were found onthis protein. In our study, 4 N-glycosites were identified withmultiple enzyme digestion. The Asn402 and Asn692 were identi-fied with trypsin digestion; the molecular weight of the trypticglycopeptides was well within the detection range of massspectrometry. However, the tryptic glycopeptides carryingAsn273 and Asn572 have a molecular weight of more than 7500,which exceed the detection range greatly and cannot beidentified using trypsin only. These two N-glycosites weresuccessfully identified by N-glycopeptides generated by pepsinand thermolysin as shown in Figure 4.

In the glycoproteomics analysis, the necessity for using ofcomplementary proteases is more prominent due to theblocking effect of the large size of carbohydrates attached toglycosites. When the trypsin cleavage site is adjacent to theN-glycosites, the cleavage site could probably be missed, whichmay lead to the generation of peptides with longer sequences.If these peptides are larger than 3500 Da, the glycosites wouldbe difficult to be identified. To investigate how this blockingeffect affects the digestion of glycoproteins, we analyzed thesequences of identified glycopeptides with trypsin digestion.Among the glycopeptides which have their N-glycosites adja-cent to lysine or arginine, 30 of 88 trypsin cleavage site havebeen missed, which suggests that blocking effects do exist.Moreover, it was found that missed cleavage tends to happenmore frequently when trypsin cleavage sites are at the C-terminal side of N-glycosylated asparagine residue, which isin accordance with the reported study.13 In the human proteindatabase, there are nearly 27 000 NXS/T-motifs in which X iseither K or R. If the cleavage sites in the motif cannot be cleavedby trypsin, 35.5% of the resulting tryptic peptides are biggerthan 3500 Da and cannot be identified. However, theseglycosites could be identified by using other proteases withdifferent specificities. In the N-glycosites identified by pepsinand thermolysin, 43 N-glycosites were found to be adjacent totrypsin cleavage site. The majority of these glycosites (37 of

43) could not be identified with digestion by trypsin in thisstudy, for instance, FCN#RTVSDCCPDF identified by pepsinfrom splice isoform 1 of tubulointerstitial nephritis antigen-like precursor, and ITEEAN#RTTME from splice isoform 1 oflaminin alpha-4 chain precursor. The arginine residue is tooclose to the N-glycosite to be accessed by trypsin, and therefore,the corresponding tryptic peptides are not detected. The missedcleavage of R79 in splice isoform 1 of tubulointerstitial nephritisantigen-like precursor will give a peptide sequence of 52 aminoacid residues, ADDCALPYLGAICYCDLFCN#RTVSDCCPDFWD-FCLGVPPPFPPIQGCMHGGR, which has a MW of 5701.59 Da.Decoding the structure of peptides like this needs top-downtechnique. In this study, digestion with thermolysin resultedin the localization of 80% of these N-glycosites, which showsthat protease with broader specificity may be more useful thanpepsin in post-translational modification analysis as comple-ment to trypsin. The above-mentioned results demonstratedthat some glycosites could not be identified with trypsinbecause the N-glycopeptides with missed cleavage sites are toobig to be identified. However, in the N-glycopeptides generatedwith completely in silico trypsin digestion, 4.7% of them havea molecular weight less than 800. The corresponding glycositescannot be identified because these N-glycopeptides are toosmall. Missed cleavage may increase the chance for theidentification of these N-glycosites because of the generationof bigger N-glycopeptides. To investigate this possibility, insilico digestion of identified tryptic N-glycopeptides with missedcleavage and calculation of the mass of these N-glycopeptideswith complete digestion were performed. It was found that ofthe 63 identified tryptic N-glycopeptides with missed cleavagesites, 6 of corresponding N-glycopeptides with complete diges-tion were too small to be detected. Missed cleavage sites didincrease the chance to identify these N-glycosites. It shouldbe mentioned that this makes little contribution on thecoverage of glycosite in proteome level since N-glycopeptidesbelow the preferable detection mass range takes a very smallportion of tryptic N-glycopeptides as shown in Figure 2A.Besides, these N-glycosites may be identified by other proteaseswith different cleavage specificities in the multiple enzymedigestion approach. The benefit of multiple enzyme digestionwas also demonstrated in the study of protein phosphoryla-tion.19 With the use of subtilisin and elastase as complementaryproteases, additional sites of phosphorylation were discoveredin their study.

Categorization of Identified Glycoproteins. The identifica-tion of 939 N-glycosites and 523 glycoproteins from human livertissue was achieved in this study. The full list of identifiedN-glycopeptides, N-glycosites and glycoproteins is given inSupporting Information. The 523 identified N-glycoproteinsfrom the human liver tissue were categorized by Gene Ontology(GO) and the annotations of 399 were found. The diagrams ofthe classification were shown in Figure 5. These proteins werefound to have various functions and be involved in manybiological processes. The identified glycoproteins includecoagulation and complement factors, receptors, enzymes,proteases and protease inhibitors. Functional information ofthe identified glycoproteins was further explored on Expasy(http://www.expasy.org). Some proteins related to liver diseasewere found to be N-glycosylated. Hepatocyte growth factorreceptor (c-Met) was found to be N-glycosylated at Asn202 whichwas identified by N-glycopeptides generated by both trypsinand thermolysin. This glycoprotein is receptor for hepatocytegrowth factor and scatter factor which has a tyrosine-protein

Figure 4. N-glycosites of Neural-cadherin identified by pepsin (ingreen), by thermolysin (in yellow), and by trypsin (in gray). Thein silico tryptic glycopeptides are underlined with black line.




kinase activity. The c-Met has been found to be overexpressedin hepatocellular carcinoma (HCC) compared with nontumor-ous liver tissue.27 Scavenger receptor class B member 1, whichis a receptor for different ligands such as phospholipids,cholesterol ester, lipoproteins, phosphatidylserine and apop-totic cells, plays a critical role in hepatitis C virus (HCV)attachment and/or cell entry by interacting with HCV E1/E2glycoproteins heterodime.28 It was identified by pepsin withN-glycosites at Asn102 and Asn330. Identification of these gly-coproteins confirms that the adoption of multiple enzymedigestion is beneficial for glycoproteomics analysis.

A distinguished result is the identification of 6 glycosyltrans-ferases and 10 glycosidases. All the identified glycosyltrans-ferases and glycosidases and their functions are listed in Table1. The collaboration of these enzymes controls the synthesisand degradation of glycoproteins.29 Glycosyltransferases trans-fer sugar residues from an activated donor substrate to agrowing carbohydrate group.30 Previous studies have found that

these glycosyltransferases in mammalians are widely glycosy-lated, especially N-glycosylated, for example, bovine �1, 4-ga-lactosyltansferase, which catalyzes the specific transfer ofgalactose from UDP-galactose to a nonreducing terminalN-acetyl-D-glucosamine, forming a �1,4 glycosidic linkage.31 Sixglycosyltransferases were confirmed to be glycosylated in thisstudy. It is of great importance to study the expression ofglycosyltransferases in liver tissue since the structures ofcomplex carbohydrates are altered in cancer and these changesare highly associated with invasion and metastasis of cancercells. A good example is the overexpression of N-acetylglu-cosaminyltransferases III (GnT-III) and V (GnT-V) in cancerprogression and metastasis.32 Some of the identified glycosyl-transferases are also disease related, for example, ProteinO-mannosyl-transferase 1, which transfers mannosyl residuesto the hydroxyl group of serine or threonine residues to formO-mannosyl glycans. It was found that mutations in POMT1gene would deactivate this glycosyltranseferase and causeWalker-Warburg syndrome.33 Furthermore, in some particularcircumstances, glycosylation is necessary for the proper foldingof these glycosyltransferases themselves and maintaining theiractivities. It is important to investigate the relationship betweenthe function and modification of these glycosyltransferases,which is not so clear yet. N-glycosites were found in 10glycosidases in this study. Glycosidases degrade glycoproteins,glycolipids and other glycoconjugates in lysosome, especiallyin liver.34 The deficiency of lysosomal enzyme may causelysosomal storage disease which is fatal to human of all ages.For example, Schindler disease is an infantile neuroaxonaldystrophy resulting from the deficient activity of the lysosomalhydrolase, R-N-acetylgalactosaminidase (R-NAGA),35 which isidentified with two N-glycosylation sites on Asn177 and Asn201

in this study.Glycosylation is largely restricted to the addition of complex

glycans to proteins or lipids localized on the cell surface orwithin lumenal compartments of subcellular organelles, suchas the endoplasmic reticulum (ER), Golgi apparatus, endo-somes, or lysosomes. Therefore, the subcellular location ofidentified glycoprotein can be used to evaluate the reliabilityof the glycoproteome analysis. We used freely available predic-tion software SingalP 2.036 for the predication of signal peptidesand TMHMM (version2.0)37 for the prediction of transmem-brane regions as Zhang et al.11 did to predict the likelysubcellular localization of all the identified glycoproteins. Ofthe 939 identified N-glycosites, 896 (95.4%) were from cellsurface proteins, secreted proteins and transmembrane pro-teins, where glycosylation are expected to happen. This per-centage is higher than that reported by Zhang et al. (93.4%),11

which indicated higher specificity for the identification ofglycoproteins in this study. Low percentage of intracellularproteins was also observed in the GO annotation for theglycoproteins identified in this study. Only a few identifiedproteins were annotated as nuclear (9, 1.7%), cytoskeleton (5,0.96%) and mitochondrion (4, 0.76%). Above subcellular loca-tion of glycoproteins demonstrated the reliability of glycopro-tein identification, which was largely attributed to the highconfidence of N-glycopeptide identification.

Discussion

Shotgun proteomics has become a powerful tool in com-prehensive characterization of proteins in complex biologicalsamples. It begins with the digestion of protein mixtures intopeptides by using proteases, typically trypsin. The resulting

Figure 5. Categorization of identified glycoproteins by GeneOntology. (A) Biological process, (B) cellular component, and (C)molecular function.




Tab

le1.

List

of

Iden

tifi

edG

lyco

sylt

ran

sfer

ases

and

Gly

cosi

das

esan

dT

hei

rFu

nct

ion

sa

IPI

pro

tein

nam

efu

nct

ion

glyc

osi

tes

IPI0

0184

851

Typ

e2

lact

osa

min

eal

ph

a-2,

3-si

alyl

tran

sfer

ase

Invo

lved

inth

esy

nth

esis

of

sial

yl-p

arag

lob

osi

de,

ap

recu

rso

ro

fsi

alyl

-Lew

isX

det

erm

inan

t.H

asan

alp

ha-

2,3-

sial

yltr

ansf

eras

eac

tivi

tyto

war

dG

al-�

1,4-

Glc

NA

cst

ruct

ure

on

glyc

op

rote

ins

and

glyc

olip

ids.

Has

are

stri

cted

sub

stra

tesp

ecifi

city

;it

uti

lizes

Gal

-�1,

4-G

lcN

Ac

on

glyc

op

rote

ins,

and

neo

lact

ote

trao

sylc

eram

ide

and

neo

lact

oh

exao

sylc

eram

ide,

bu

tn

ot

lact

ote

trao

sylc

eram

ide,

lact

osy

lcer

amid

eo

ras

ialo

-GM

1

N30

8

IPI0

0298

235

Splic

eIs

ofo

rm1

of

Pro

tein

O-m

ann

osy

l-tr

ansf

eras

e1

Tra

nsf

ers

man

no

syl

resi

du

esto

the

hyd

roxy

lgr

ou

po

fse

rin

eo

rth

reo

nin

ere

sid

ues

.C

oex

pre

ssio

no

fb

oth

PO

MT

1an

dP

OM

T2

isn

eces

sary

for

enzy

me

acti

vity

;ex

pre

ssio

no

fei

ther

PO

MT

1o

rP

OM

T2

alo

ne

isin

suffi

cien

t

N47

1

IPI0

0382

432

Splic

eIs

ofo

rm1

of

N-a

cety

lglu

cosa

min

e-1-

ph

osp

ho

tran

sfer

ase

sub

un

itsR

/�p

recu

rso

rC

atal

yzes

the

form

atio

no

fm

ann

ose

6-p

ho

sph

ate

(M6P

)m

arke

rso

nh

igh

man

no

sety

pe

olig

osa

cch

arid

esin

the

Go

lgi

app

arat

us.

M6P

resi

du

esar

ere

qu

ired

tob

ind

toth

eM

6Pre

cep

tors

(MP

R),

wh

ich

med

iate

the

vesi

cula

rtr

ansp

ort

of

lyso

som

alen

zym

esto

the

end

oso

mal

/pre

lyso

som

alco

mp

artm

ent

N69

9

IPI0

0058

192

Splic

eIs

ofo

rm1

of

GD

P-f

uco

sep

rote

inO

-fu

cosy

ltra

nsf

eras

e1

pre

curs

or

Cat

alyz

esth

ere

acti

on

that

atta

ches

fuco

seth

rou

ghan

O-g

lyco

sid

iclin

kage

toa

con

serv

edse

rin

eo

rth

reo

nin

ere

sid

ue

inE

GF

do

mai

ns.

Pla

ysa

cru

cial

role

inN

otc

hsi

gnal

ing.

N16

0

IPI0

0025

874

Do

lich

yl-d

iph

osp

ho

olig

osa

cch

arid

e-p

rote

ingl

yco

sylt

ran

sfer

ase

67kD

asu

bu

nit

pre

curs

or

Ess

enti

alsu

bu

nit

of

N-o

ligo

sacc

har

yltr

ansf

eras

een

zym

ew

hic

hca

taly

zes

the

tran

sfer

of

ah

igh

man

no

seo

ligo

sacc

har

ide

fro

ma

lipid

-lin

ked

olig

osa

cch

arid

ed

on

or

toan

asp

arag

ine

resi

du

ew

ith

inan

Asn

-X-S

er/T

hr

con

sen

sus

mo

tif

inn

asce

nt

po

lyp

epti

de

chai

ns

N33

8

IPI0

0013

887

CM

P-N

-ace

tyln

eura

min

ate-

�-ga

lact

osa

mid

e-R

-2,6

-si

alyl

tran

sfer

ase

Tra

nsf

ers

sial

icac

idfr

om

the

do

no

ro

fsu

bst

rate

CM

P-s

ialic

acid

toga

lact

ose

con

tain

ing

acce

pto

rsu

bst

rate

sN

149,

N16

1

IPI0

0012

440

Pla

smaR

-L-f

uco

sid

ase

pre

curs

or

R-L

-Fu

cosi

das

eis

resp

on

sib

lefo

rh

ydro

lyzi

ng

theR

-1,6

-lin

ked

fuco

sejo

ined

toth

ere

du

cin

g-en

dN

-ace

tylg

luco

sam

ine

of

the

carb

oh

ydra

tem

oie

ties

of

glyc

op

rote

ins

N23

9

IPI0

0009

410

ER

deg

rad

atio

n-e

nh

anci

ngR

-man

no

sid

ase-

like

3In

volv

edin

end

op

lasm

icre

ticu

lum

-ass

oci

ated

deg

rad

atio

n(E

RA

D).

Acc

eler

ates

the

glyc

op

rote

inE

RA

Db

yp

rote

aso

mes

.T

his

pro

cess

dep

end

so

nm

ann

ose

-tri

mm

ing

fro

mth

eN

-gly

can

s.Se

ems

toh

aveR

-1,2

-man

no

sid

ase

acti

vity

N19

5

IPI0

0012

585

�-H

exo

sam

inid

ase

bet

ach

ain

pre

curs

or

Res

po

nsi

ble

for

the

deg

rad

atio

no

fG

M2

gan

glio

sid

es,

and

ava

riet

yo

fo

ther

mo

lecu

les

con

tain

ing

term

inal

N-a

cety

lh

exo

sam

ines

,in

the

bra

inan

do

ther

tiss

ues

N32

3,N

327,

N84

IPI0

0441

344

�-G

alac

tosi

das

ep

recu

rso

rC

leav

es�-

linke

dte

rmin

alga

lact

osy

lre

sid

ues

fro

mga

ngl

iosi

des

,gl

yco

pro

tein

s,an

dgl

yco

sam

ino

glyc

ans

N46

4,N

555

IPI0

0414

909

R-N

-Ace

tylg

alac

tosa

min

idas

ep

recu

rso

rH

ydro

lysi

so

fte

rmin

aln

on

red

uci

ng

N-a

cety

l-D

-gal

acto

sam

ine

resi

du

esin

N-a

cety

l-R

-D-g

alac

tosa

min

ides

N17

7,N

201

IPI0

0328

488

Splic

eIs

ofo

rm1

of

Ep

idid

ymis

-sp

ecifi

cR

-man

no

sid

ase

pre

curs

or

Hyd

roly

sis

of

term

inal

,n

on

red

uci

ngR

-D-m

ann

ose

resi

du

esin

R-D

-man

no

sid

esN

516

IPI0

0293

088

Lyso

som

alal

ph

a-gl

uco

sid

ase

pre

curs

or

Hyd

roly

sis

of

term

inal

,n

on

red

uci

ng

1,4-

linke

dR

-D-g

luco

sere

sid

ues

wit

hre

leas

eo

fR

-D-g

luco

seN

140,

N47

0,N

882,

N92

5

IPI0

0011

454

Splic

eIs

ofo

rm2

of

Neu

tral

R-g

luco

sid

ase

AB

pre

curs

or

Cle

aves

seq

uen

tial

lyth

e2

inn

erm

ost

R-1

,3-l

inke

dgl

uco

sere

sid

ues

fro

mth

eG

lc(2

)Man

(9)G

lcN

Ac(

2)o

ligo

sacc

har

ide

pre

curs

or

of

imm

atu

regl

yco

pro

tein

s

N19

3

IPI0

0012

989

man

no

sid

ase,

alp

ha,

clas

s2B

,m

emb

er1

pre

curs

or

Nec

essa

ryfo

rth

eca

tab

olis

mo

fN

-lin

ked

carb

oh

ydra

tes

rele

ased

du

rin

ggl

yco

pro

tein

turn

ove

r.C

leav

esal

lkn

ow

nty

pes

of

R-m

ann

osi

dic

linka

ges

N36

7,N

766

IPI0

0025

869

R-G

alac

tosi

das

eA

pre

curs

or

Hyd

roly

sis

of

term

inal

,n

on

red

uci

ngR

-D-g

alac

tose

resi

du

esin

R-D

-gal

acto

sid

esN

215

aT

he

fun

ctio

ns

of

pro

tein

sar

eo

bta

ined

fro

mE

xpas

y(h

ttp

://w

ww

.exp

asy.

org

).



peptides are separated by liquid chromatography and thenanalyzed by tandem mass spectrometry.38,39 However, theentire sequence of each identified protein could not be coveredsince only a small portion of peptides will be detected by massspectrometry. Some of trypsin generated peptides cannot beidentified either because they are too small or too hydrophilicto retain on reversed phase LC column. In an analysis ofhemoglobin proteins from human blood samples, the shortHis-containing peptides, GHGK (at position R 57-60) andAHGK (at position � 62-65), were most likely washed off theLC column during loading sample and not detected by MS.25

On the other hand, large peptides could not be identified bymost types of ion trap mass spectrometers because of thelimitation in instrument resolution. In fact, peptides of asuitable mass range would be preferred for analysis in shotgunproteomics. According to MW distribution of the identifiedpeptides in this study as shown in Figure 1, peptides smallerthan 800 Da or bigger than 3500 Da are not likely to beidentified by the shotgun proteomics approach. Although wedo not need the complete sequence to identify a protein, thismass preference would lead to the missing of modification sitesin glycoproteomics analysis when the modified peptides areout of the mass range. In silico digestion of proteins in humanproteome database by trypsin indicated that many N-glyco-peptides are not in the mass range of 800-3500 Da, whichmeans that these N-glycosites could not be located if onlytrypsin is used.

To circumvent this problem, a multiple enzyme digestionstrategy was integrated with the hydrazide capture protocol toanalyze the N-glycosylation in human liver tissue in this study.Besides trypsin, two other proteases, that is, pepsin andthermolysin, with broader specificity were used. The combina-tion of using these three proteases resulted in the identificationof 939 N-glycosites; among them, 317 (33.7%) N-glycosites wereonly identified by pepsin and thermolysin. Among the 317additional N-glycosites, 98 (30.9%) could not be identified bytrypsin in theory because the corresponding in silico trypticpeptides are either too small or too big to detect in massspectrometry. This study clearly demonstrated that the cover-age of N-glycosylation sites could be significantly increased dueto the adoption of multiple enzyme digestion.

The liver is a vital organ present in human body and is thelargest one. It plays a major role in metabolism and has avariety of functions in the body, including glycogen storage,decomposition of red blood cells, plasma protein synthesis, anddetoxification. Liver diseases, such as viral hepatitis, andhepatocellular carcinoma, are vital to human life. There areover 350 million hepatitis virus carriers in the world; over amillion deaths per year (about 10% of all deaths in the adultage range) can be attributed to viral hepatitis, and over a milliondeaths each year are caused by liver cancer.7,40,41 The HumanLiver Proteome Project (HLPP) is one of the initiatives launchedby the Human Proteome Organization (HUPO).7 Its globalscientific objectives are to reveal the human liver proteome,expression profiles, modification profiles, a protein linkagemap, and a proteome localization map. Post-translationalmodification (PTMs) profile of proteins in human liver is oneof global scientific objectives of HLPP. The study of proteinglycosylation in human liver is of great importance since manyliver diseases are correlated with glycoproteins. For example,Deficiency of R-1-antitrypsin, an N-linked glycoprotein, is themost common genetic cause of liver disease in children. It alsocauses chronic liver injury and hepatocellular carcinoma in

adults.42 N-acetylglucosaminyltransferases III (GnT-III) and V(GnT-V), play a pivotal role in the processing of N-linkedglycoproteins, and are highly involved in cancer progressionand metastasis.32 Although phosphoproteomics analysis ofhuman liver tissue has been recently conducted in our labora-tory,43 few works have been done for the analysis of N-glycosylation in human liver tissues. Totally, 523 glycoproteinswith 939 N-glycosites were identified with high confidence inthis study. These proteins were found to have various functionsand be involved in many biological processes. The identifica-tion of glycosyltransferases and glycosidase may provide analternative way to judge liver function by the analysis of therelative expressions of these proteins. Moreover, this large-scaleanalysis could contribute new N-glycosites to the database ofhuman N-glycosites (www.unipep.org), which is a valuableresource for biomarker discovery.

Conclusion

This is the first large-scale analysis of human liver N-glycoproteome by shotgun proteomics approach. A combina-tion of hydrazide capture method and multiple enzyme diges-tion strategy was applied for the comprehensive analysis whichresulted in the identification of 939 N-glycosites and 523glycoproteins from human liver tissue. This data set couldprovide valuable information for the study of liver disease andbiomarker discovery. It is shown that protease with broaderspecificity gives good complement to that of trypsin. Use oftrypsin alone resulted in identification of 622 N-glycosites, whileuse of pepsin and thermolysin resulted in identification ofadditional 317 N-glycosites. Among the 317 additional N-glycosites, 98 (30.9%) could not be identified by trypsin intheory because the corresponding in silico tryptic peptides areeither too small or too big. This study clearly demonstratedthat the coverage of N-glycosylation sites could be significantlyincreased due to the adoption of multiple enzyme digestion.

Acknowledgment. Financial supports from theNational Natural Sciences Foundation of China (No.20675081, 20735004), the China State Key Basic ResearchProgram Grant (2005CB522701, 2007CB914104), the ChinaHigh Technology Research Program Grant (2006AA02A309),the Knowledge Innovation program of CAS (KJCX2.YW.HO9,KSCX2.YW.079) and the Knowledge Innovation program ofDICP to H.Z. and National Natural Sciences Foundation ofChina (No. 20605022, 90713017) to M.Y. are gratefullyacknowledged.

Supporting Information Available: Supporting tableincludes all the identified glycopeptides, N-glycosites andglycoproteins identified by typsin, thermolysin and pepsin,respectively. This material is available free of charge via theInternet at http://pubs.acs.org.

References(1) Apweiler, R.; Hermjakob, H.; Sharon, N. On the frequency of

protein glycosylation, as deduced from analysis of the SWISS-PROTdatabase. Biochim. Biophys. Acta 1999, 1473 (1), 4–8.

(2) Varki, A. Biological roles of oligosaccharidessall of the theoriesare correct. Glycobiology 1993, 3 (2), 97–130.

(3) Glinsky, G. V. Antigen presentation, aberrant glycosylation andtumor progression. Crit. Rev. Oncol. Hematol. 1994, 17 (1), 27–51.

(4) Lee, R. T.; Lauc, G.; Lee, Y. C. Glycoproteomics: protein modifica-tions for versatile functionssMeeting on glycoproteomics. EMBORep. 2005, 6 (11), 1018–1022.

(5) Zhang, H.; Yi, E. C.; Li, X. J.; Mallick, P.; Kelly-Spratt, K. S.;Masselon, C. D.; Camp, D. G.; Smith, R. D.; Kemp, C. J.; Aebersold,



R. High throughput quantitative analysis of serum proteins usingglycopeptide capture and liquid chromatography mass spectrom-etry. Mol. Cell. Proteomics 2005, 4 (2), 144–155.

(6) Ramachandran, P.; Boontheung, P.; Xie, Y. M.; Sondej, M.; Wong,D. T.; Loo, J. A. Identification of N-linked glycoproteins in humansaliva by glycoprotein capture and mass spectrometry. J. ProteomeRes. 2006, 5 (6), 1493–1503.

(7) He, F. C. Human Liver Proteome ProjectsPlan, progress, andperspectives. Mol. Cell. Proteomics 2005, 4 (12), 1841–1848.

(8) Wuhrer, M.; Catalina, M. I.; Deelder, A. M.; Hokke, C. H. Glyco-proteomics based on tandem mass spectrometry of glycopeptides.J. Chromatogr., B 2007, 849 (1-12), 115–128.

(9) Zhang, H.; Li, X. J.; Martin, D. B.; Aebersold, R. Identification andquantification of N-linked glycoproteins using hydrazide chem-istry, stable isotope labeling and mass spectrometry. Nat. Biotech-nol. 2003, 21 (6), 660–666.

(10) Liu, T.; Qian, W. J.; Gritsenko, M. A.; Camp, D. G.; Monroe, M. E.;Moore, R. J.; Smith, R. D. Human plasma N-glycoproteomeanalysis by immunoaffinity subtraction, hydrazide chemistry, andmass spectrometry. J. Proteome Res. 2005, 4 (6), 2070–2080.

(11) Zhang, H.; Liu, A. Y.; Loriaux, P.; Wollscheid, B.; Zhou, Y.; Watts,J. D.; Aebersold, R. Mass spectrometric detection of tissue proteinsin plasma. Mol. Cell. Proteomics 2007, 6 (1), 64–71.

(12) Olsen, J. V.; Ong, S. E.; Mann, M. Trypsin cleaves exclusivelyC-terminal to arginine and lysine residues. Mol. Cell. Proteomics2004, 3 (6), 608–614.

(13) Imre, T.; Schlosser, G.; Pocsfalvi, G.; Siciliano, R.; Molnar-Szollosi,E.; Kremmer, T.; Malorni, A.; Vekey, K. Glycosylation site analysisof human alpha-1-acid glycoprotein (AGP) by capillary liquidchromatography-electrospray mass spectrometry. J. Mass Spec-trom. 2005, 40 (11), 1472–1483.

(14) Bezouska, K.; Sklenar, J.; Novak, P.; Halada, P.; Havlicek, V.; Kraus,M.; Ticha, M.; Jonakova, V. Determination of the complete covalentstructure of the major glycoform of DQH sperm surface protein,a novel trypsin-resistant boar seminal plasma O-glycoproteinrelated to pB1 protein. Protein Sci. 1999, 8 (7), 1551–1556.

(15) Cutalo, J. M.; Deterding, L. J.; Tomer, K. B. Characterization ofglycopeptides from HIV-I-SF2 gp120 by liquid chromatographymass spectrometry. J. Am. Soc. Mass. Spectrom. 2004, 15 (11), 1545–1555.

(16) Larsen, M. R.; Hojrup, P.; Roepstorff, P. Characterization of gel-separated glycoproteins using two-step proteolytic digestion com-bined with sequential microcolumns and mass spectrometry. Mol.Cell. Proteomics 2005, 4 (2), 107–119.

(17) Clowers, B. H.; Dodds, E. D.; Seipert, R. R.; Lebrilla, C. B. Sitedetermination of protein glycosylation based on digestion withimmobilized nonspecific proteases and Fourier transform ioncyclotron resonance mass spectrometry. J. Proteome Res. 2007, 6(10), 4032–4040.

(18) Wu, S. L.; Kim, J.; Hancock, W. S.; Karger, B. Extended RangeProteomic Analysis (ERPA): A new and sensitive LC-MS platformfor high sequence coverage of complex proteins with extensivepost-translational modifications-comprehensive analysis of beta-casein and epidermal growth factor receptor (EGFR). J. ProteomeRes. 2005, 4 (4), 1155–1170.

(19) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark,J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss,A.; Clark, J. I.; Yates, J. R. Shotgun identification of proteinmodifications from protein complexes and lens tissue. Proc. Natl.Acad. Sci. U.S.A. 2002, 99 (12), 7900–7905.

(20) Tian, Y. A.; Zhou, Y.; Elliott, S.; Aebersold, R.; Zhang, H. Solid-phase extraction of N-linked glycopeptides. Nat. Protoc. 2007, 2(2), 334–339.

(21) Wang, F. J.; Dong, J.; Jiang, X. G.; Ye, M. L.; Zou, H. F. Capillarytrap column with strong cation-exchange monolith for automatedshotgun proteome analysis. Anal. Chem. 2007, 79 (17), 6599–6606.

(22) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlatetandem mass-spectral data of peptides with amino-acid-sequencesin a protein database. J. Am. Soc. Mass. Spectrom. 1994, 5 (11),976–989.

(23) Jiang, X. N.; Jiang, X. G.; Han, G. H.; Ye, M. L.; Zou, H. F.Optimization of filtering criterion for SEQUEST database searchingto improve proteome coverage in shotgun proteomics. BMC Bioinf.2007, 8.

(24) King, N. L.; Deutsch, E. W.; Ranish, J. A.; Nesvizhskii, A. I.; Eddes,J. S.; Mallick, P.; Eng, J.; Desiere, F.; Flory, M.; Martin, D. B.; Kim,B.; Lee, H.; Raught, B.; Aebersold, R. Analysis of the Saccharomycescerevisiae proteome with PeptideAtlas. GenomeBiology 2006, 7, 11.

(25) Gatlin, C. L.; Eng, J. K.; Cross, S. T.; Detter, J. C.; Yates, J. R.Automated identification of amino acid sequence variations inproteins by HPLC/microspray tandem mass spectrometry. Anal.Chem. 2000, 72 (4), 757–763.

(26) Garcia-Castro, M. I.; Vielmetter, E.; Bronner-Fraser, M. N-cadherin,a cell adhesion molecule involved in establishment of embryonicleft-right asymmetry. Science 2000, 288 (5468), 1047–1051.

(27) Ueki, T.; Fujimoto, J.; Suzuki, T.; Yamamoto, H.; Okamoto, E.Expression of hepatocyte growth factor and its receptor c-metproto-oncogene in hepatocellular carcinoma. Hepatology 1997, 25(4), 862–866.

(28) Scarselli, E.; Ansuini, H.; Cerino, R.; Roccasecca, R. M.; Acali, S.;Filocamo, G.; Traboni, C.; Nicosia, A.; Cortese, R.; Vitelli, A. Thehuman scavenger receptor class B type I is a novel candidatereceptor for the hepatitis C virus. EMBO J. 2002, 21 (19), 5017–5025.

(29) Parodi, A. J. Role of N-oligosaccharide endoplasmic reticulumprocessing reactions in glycoprotein folding and degradation.Biochem. J. 2000, 348, 1–13.

(30) Breton, C.; Snajdrova, L.; Jeanneau, C.; Koca, J.; Imberty, A.Structures and mechanisms of glycosyltransferases. Glycobiology2006, 16 (2), 29R–37R.

(31) Dagostaro, G.; Bendiak, B.; Tropak, M. Cloning of cdna-encodingthe membrane-bound form of bovine beta-1,4-galactosyltrans-ferase. Eur. J. Biochem. 1989, 183 (1), 211–217.

(32) Taniguchi, N.; Miyoshi, E.; Ko, J. H.; Ikeda, Y.; Ihara, Y. Implicationof N-acetylglucosaminyltransferases III and V in cancer: generegulation and signaling mechanism. Biochim. Biophys. Acta 1999,1455 (2-3), 287–300.

(33) Akasaka-Manya, K.; Manya, H.; Endo, T. Mutations of the POMT1gene found in patients with Walker-Warburg syndrome lead to adefect of protein O-mannosylation. Biochem. Biophys. Res. Com-mun. 2004, 325 (1), 75–79.

(34) Aronson, N. N.; Deduve, C. Digestive activity of lysosomes 0.2.digestion of macromolecular carbohydrates by extracts of rat liverlysosomes. J. Biol. Chem. 1968, 243 (17), 4564.

(35) Wang, A. M.; Schindler, D.; Desnick, R. J. Schindler disease - themolecular lesion in the alpha-n-acetylgalactosaminidase gene thatcauses an infantile neuroaxonal dystrophy. J. Clin. Invest. 1990,86 (5), 1752–1756.

(36) Bendtsen, J. D.; Nielsen, H.; von Heijne, G.; Brunak, S. Improvedprediction of signal peptides: SignalP 3.0. J. Mol. Biol. 2004, 340(4), 783–795.

(37) Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. L.Predicting transmembrane protein topology with a hidden Markovmodel: Application to complete genomes. J. Mol. Biol. 2001, 305(3), 567–580.

(38) Aebersold, R.; Goodlett, D. R. Mass spectrometry in proteomics.Chem. Rev. 2001, 101 (2), 269–295.

(39) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics.Nature 2003, 422 (6928), 198–207.

(40) Plan and progress of human liver proteome project (HLPP). CellBiol. Int. 2005, 29 (10), S2-S2.

(41) Kirk, G. D.; Bah, E.; Montesano, R. Molecular epidemiology ofhuman liver cancer: insights into etiology, pathogenesis andprevention from The Gambia, West Africa. Carcinogenesis 2006,27 (10), 2070–2082.

(42) Rudnick, D. A.; Perlmutter, D. H. Alpha-1-antitrypsin deficiency:A new paradigm for hepatocellular carcinoma in genetic liverdisease. Hepatology 2005, 42 (3), 514–521.

(43) Han, G. H.; Ye, M. L.; Zhou, H. J.; Jiang, X. N.; Feng, S.; Jiang, X. G.;Tian, R. J.; Wan, D. F.; Zou, H. F.; Gu, J. R. Large-scale phosphop-roteome analysis of human liver tissue by enrichment andfractionation of phosphopeptides with strong anion exchangechromatography. Proteomics 2008, 8 (7), 1346–1361.

PR8008012



glycoproteomics analysis of human liver tissue by

Documents