
Carbohydrate Research 306 (1998) 11–17

Computer-assisted structural analysis of oligo- and polysaccharides: An extension of CASPER to multibranched structures

Roland Stenutz a, Per-Erik Jansson b, Göran Widmalm a,*

a Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University, S-106 91 Stockholm, Sweden
b Clinical Research Centre, Analytical Unit, Karolinska Institute, Huddinge Hospital, NOVUM, S-141 86 Huddinge, Sweden

Received 14 July 1997; accepted 8 September 1997

Abstract

The CASPER program, which is used for determination of the primary structure of oligo- and polysaccharides, has been extended. It can now handle a reduced number of experimental signals from an NMR spectrum in the comparison to the simulated spectra of structures that it generates, an improvement which is of practical importance since all signals in NMR spectra cannot always be identified. Furthermore, the program has been enhanced to simulate NMR spectra of multibranched oligo- and polysaccharides. The new developments were tested on four saccharides of known structure but of different complexity and were shown to predict the correct structures. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: CASPER; Structure; Computer; Saccharide; Glycopeptide; NMR; Lipopolysaccharide; Capsular polysaccharide

1. Introduction

Carbohydrate structure analysis is a prerequisite for further studies involving carbohydrate–protein interactions which are important, e.g., in protein trafficking, inflammation or bacterial infection processes. The analysis may be divided into the analysis of the primary structure and that of the three-dimensional structure. The former deals with the identity of the constituent monosaccharides, substituents if any, and the sequence of the monosaccharide residues. The latter includes conformation, flexibility and dynamics, which are important properties to investigate in order to understand the function of carbohydrates in relation to other molecules, most often proteins. Determination of the primary structure of a saccharide may still be a time-consuming process and computerized approaches are useful in order to speed up the analysis.

* Corresponding author.

We have previously reported a computerized approach [1], viz., CASPER, which is an acronym for Computer Assisted SPectrum Evaluation of Regular polysaccharides (vide infra). A similar development which uses only 13C NMR data has been reported by Lipkind et al. [2]. Computerized structure/data-base programs using 1H NMR data have been reported both by van Kuik et al. [3] and Hounsell and Wright [4]. The neural network approach to solve structural problems has been put forward for carbohydrates by Meyer et al. [5]. Thus, a number of computerized approaches that complement each other have hitherto been developed in different laboratories.

0008-6215/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved. PII S0008-6215(97)10047-7

The CASPER program was developed for the analysis of the primary structures of oligosaccharides and for polysaccharides with repeating units. The approach is based mainly on NMR spectroscopy and has been used in a number of structural determinations, e.g., the structures of the O-antigenic polysaccharides from Escherichia coli O 1 [6], E. coli O 3 [7], E. coli O 101 [8], and the capsular polysaccharide S-156 from a Klebsiella pneumoniae strain [9]. The program requires as input (i) sugar components, (ii) their linkage positions, and (iii) an NMR spectrum, preferably carbon-13. Often this is enough to suggest the correct structure. Further credibility to a suggested structure can be obtained by a 1H,13C correlated NMR spectrum. The approach is based on the fact that induced chemical shift differences are observed upon glycosylation of sugars. The NMR spectra are as a first approximation considered to be the result of their constituent disaccharide elements, i.e., the chemical shifts of the monosaccharides and the induced chemical shift displacements for the disaccharides. More complex spectra are observed for branched structures, but the analysis can be performed using additional correction sets for vicinally disubstituted sugar residues. Based on information on components and linkages, NMR spectra are simulated for all possible structures of an oligosaccharide or repeating unit in a polysaccharide. The experimental spectrum is subsequently compared to each of the simulated spectra and a ranking can be obtained based on the fit of spectral data. The ranking is based on a deltasum, which is a summation of the absolute values of chemical shift differences between experiment and simulation and a measure of the quality of the fit. When the fit is very good and, at the same time, the structural suggestion ranked second has a deltasum ≈50% higher for 13C simulations, the first structural suggestion can be taken as the correct one. An excellent simulation is observed when the deviation between simulation and experiment is ≈0.1 ppm/signal based on 13C NMR data [10]. In a number of cases, more than one structural suggestion shows a good fit. These structures can then be differentiated by a judicious choice of an NMR experiment or a specific degradation.
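The deltasum ranking described above can be illustrated with a minimal Python sketch. It is not the CASPER implementation (which the authors wrote in C); the function names, the pairing of signals in sorted order and the example shift values are all assumptions made for illustration. Sorted pairing is used here because the spectra are unassigned, and for two equal-length lists it gives the smallest possible sum of absolute differences.

```python
from typing import Dict, List, Tuple


def deltasum(experimental: List[float], simulated: List[float]) -> float:
    """Sum of absolute chemical shift differences (ppm).

    Assumption: signals are unassigned, so the two lists are paired
    in sorted order. Both lists must have the same length.
    """
    if len(experimental) != len(simulated):
        raise ValueError("spectra must contain the same number of signals")
    pairs = zip(sorted(experimental), sorted(simulated))
    return sum(abs(e - s) for e, s in pairs)


def rank_structures(
    experimental: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (structure name, deltasum) pairs, best fit first."""
    scored = [(name, deltasum(experimental, shifts))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


def clearly_best(ranking: List[Tuple[str, float]], margin: float = 0.5) -> bool:
    """True if the second-ranked deltasum is at least `margin` (50%) higher
    than the first-ranked one, the separation quoted for 13C simulations."""
    if len(ranking) < 2:
        return True
    best, second = ranking[0][1], ranking[1][1]
    return second >= (1.0 + margin) * best


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    experiment = [100.0, 75.0, 62.0]
    candidates = {"A": [62.1, 100.1, 75.1], "B": [101.0, 76.0, 63.0]}
    ranking = rank_structures(experiment, candidates)
    print(ranking, clearly_best(ranking))
```

The 50% margin covers only the separation between the first and second suggestion; the text additionally requires that the fit itself is very good (about 0.1 ppm/signal), which this sketch does not check.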

The advantage of the CASPER approach is that a probable structural suggestion can rapidly be obtained from unassigned NMR spectra. Previously, the program required all 13C NMR signals (or a defined number of 1H resonances, e.g., H-1 to H-4 for all sugar residues) to be given as input, something that may be difficult to obtain in heavily overlapped spectra. Furthermore, multibranched structures often occur in carbohydrates. The previous versions of CASPER could not handle an incomplete set of 13C NMR data or multibranched structures, which prompted further development of the approach.

2. Results and discussion

Simulation with a reduced number of experimental signals.—The structure of the O-polysaccharide of the lipopolysaccharide (LPS) from Shigella flexneri type 4a has been determined using chemical methods [11] and the 1H and 13C NMR spectra assigned in more recent studies [12,13]. The polysaccharide has pentasaccharide repeating units with the structure

[structure diagram]

The correct structure could be simulated by CASPER using the complete 13C NMR spectrum or the two-dimensional 1H,13C NMR spectrum as previously shown [14]. Some signals were then arbitrarily omitted from the experimental spectrum and a best fit between experimental and simulated data was performed. The fitting procedure for an experimental set with a reduced number of signals uses a recursive algorithm to find the best fit between the experimental spectrum and the spectra for each of the simulated structures. Omission of signals from the experimental 13C NMR spectrum leads to a lower deltasum and to smaller differences between deltasums of simulated structures. A comparison of the deltasums from different structures is shown in Fig. 1 using all signals, or with 2, 4 or 6 signals arbitrarily omitted. In these cases, the correct structure was always ranked as number one. When more resonances were omitted, the correct structure did not always turn out as number one, but was still among the top ranked. The approach can thus be considered reliable when a few chemical shifts cannot be identified in the experimental spectrum, a case which is often observed in practice, and the procedure has recently been applied in the structural determination of the capsular polysaccharide from Klebsiella type 52 [15].

Fig. 1. Comparison of simulated structures vs. deltasum using all 32 (●), 30 (■), 28 (◆) or 26 (▲) resonances in the 13C NMR spectrum of the O-antigenic polysaccharide from S. flexneri type 4a.

Simulation of a polysaccharide with a doubly branched residue.—The structure of the polysaccharide of the LPS from an Aeromonas caviae strain was recently elucidated [16]. The polysaccharide is composed of pentasaccharide repeating units with the structure

[structure diagram]

in which one of the sugar residues, the 2-acetamido-2-deoxy-D-mannose residue, is doubly branched, and thus is 3-, 4- and 6-substituted. As input to CASPER, all 13C NMR signals were used, besides information on sugars and their linkages. The output from the simulation is shown in Scheme 1.

The previously deduced structure is shown to be top ranked with a deviation of 0.34 ppm/signal and a systematic deviation of −0.07 ppm/signal. The differences to the following structures are fairly small, but this is one of the inherent limitations of the CASPER approach. A large systematic deviation (not present in this simulation) would indicate a discrepancy in experimental conditions or in referencing of the chemical shifts. Common for the first four structural suggestions is that the terminal α-L-Rhap residue substitutes the 3-position of the β-D-ManpNAc branch-point residue.
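The two figures quoted for each simulation, the deviation per signal and the systematic deviation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, signals are paired in sorted order, and the systematic deviation is taken as simulated minus experimental (the paper defines it as an average signed difference but does not spell out the sign convention).

```python
from typing import List, Tuple


def deviations(experimental: List[float], simulated: List[float]) -> Tuple[float, float]:
    """Return (deviation per signal, systematic deviation), both in ppm/signal.

    deviation per signal: mean absolute shift difference (deltasum / n)
    systematic deviation: mean signed difference, simulated - experimental
                          (sign convention is an assumption)
    """
    if not experimental or len(experimental) != len(simulated):
        raise ValueError("need two non-empty spectra of equal length")
    pairs = list(zip(sorted(experimental), sorted(simulated)))
    n = len(pairs)
    per_signal = sum(abs(s - e) for e, s in pairs) / n
    systematic = sum(s - e for e, s in pairs) / n
    return per_signal, systematic


if __name__ == "__main__":
    # Illustrative numbers only: a uniform +0.5 ppm offset, as a
    # referencing error would produce.
    experiment = [100.0, 75.0, 62.0]
    simulation = [e + 0.5 for e in experiment]
    print(deviations(experiment, simulation))
```

When the two values are nearly equal in magnitude, almost all of the misfit is a uniform offset, which points to referencing rather than to a wrong structure. A small systematic deviation next to a larger per-signal deviation, as reported above, means the errors are scattered in both directions.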

Simulation of a polysaccharide with two branched residues.—The capsular polysaccharide from Klebsiella K8,52,59 [17,18] was shown to have two branching sugar residues out of the five in the repeating unit, the structure of which is given by

[structure diagram]

From the 13C NMR spectrum all but two signals could be identified and these were given as input to the CASPER program. The result of the simulation is given in Scheme 2.

In an earlier study, we had simulated the 13C NMR spectrum of the backbone [19]. The structure previously determined is now shown to be the first one after all structures have been ranked according to their respective deltasum. For the correct structure, a deviation of 0.34 ppm/signal and a systematic deviation of 0.16 ppm/signal were observed. It can be noted that in the first four structural suggestions the α-Gal residue is linked to the β-Glc residue and the permutation changes have occurred at the linkage positions of the branched residues.

Simulation of a larger oligosaccharide having several branching points.—Oligosaccharides from glycoproteins are commonly multibranched and for such structures the number of permutations rapidly increases with increasing branching and size of the oligosaccharide. One such structure,

[structure diagram]

is a high mannose glycopeptide from hen ovalbumin [20]. As input to CASPER, all signals in the 13C NMR spectrum originating from the carbohydrate part were used, together with information on sugars and their linkages. The output from the simulation is shown in Scheme 3.

Scheme 1. Result from simulation based on the 13C NMR spectrum from the O-antigen polysaccharide from an Aeromonas caviae strain.

The correct structure, with a deviation of 0.27 ppm/signal, was observed to be top ranked. In this simulation, J values were not used to exclude any possibilities. Already for the second structure a significant change has occurred in the core region of the glycopeptide, and the structure is no longer in accord with N-linked glycans.

Conclusions.—The CASPER program has been developed to find the best fit between the simulated spectra and an experimental NMR spectrum with a reduced number of signals. Furthermore, the program can simulate polysaccharides and oligosaccharides that are multibranched, e.g., those occurring in glycoproteins. In the cases shown, the correct structures have consistently been selected as the highest ranked structure. However, the discrimination between these multibranched saccharides has not been large and, from the data shown, it would not have been possible to deduce the structure unambiguously; it would need to be corroborated by some additional data. The present developments make the program usable for most structural problems in carbohydrate chemistry that can be addressed by NMR spectroscopy.

Scheme 2. Result from simulation based on the 13C NMR spectrum from the capsular polysaccharide from Klebsiella K8,52,59.

Scheme 3. Result from simulation based on the 13C NMR spectrum from a glycopeptide of hen ovalbumin.

3. Methods

The 13C NMR spectrum of the capsular polysaccharide from Klebsiella K8,52,59 was recorded at 70 °C on a JEOL 270 MHz NMR spectrometer and referenced relative to internal acetone (δ 31.00). Only readily identifiable resonances were used as input to CASPER. The chemical shifts of the glycopeptide were adjusted and referenced to dioxane (δ 67.40). The chemical shifts of β-D-GlcpNAc-(1→N)Asn [21] were adjusted and referenced to acetone.

The CASPER program, version 3.0, was written in C and run on Silicon Graphics workstations. In the comparison with incomplete experimental data, all possible combinations of removal of simulated peaks were generated, whereafter a peak-to-peak fit to experimental data was performed. The systematic deviation (ppm/signal) is the average chemical shift difference between signals in a simulated spectrum and those in the experimental spectrum and may thus be positive or negative. A large systematic deviation indicates differences in referencing. The database has been modified to include cross references between glycosylation shifts for different residues. This implementation drastically reduces the size of the database and leads to a facile addition of new monomers. Possible anomeric configurations were not eliminated by use of coupling constants for anomeric protons, except for β-D-GlcpNAc-(1→N)Asn, which was implemented as a single residue.
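The comparison with incomplete experimental data can be sketched in Python as below. This is an illustration under stated assumptions, not the CASPER code: the paper describes a recursive algorithm in C, whereas this sketch enumerates the removals with `itertools.combinations`; the function name, the sorted-order peak-to-peak pairing and the example shifts are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def reduced_deltasum(
    experimental: List[float], simulated: List[float]
) -> Tuple[float, Tuple[float, ...]]:
    """Best fit of an incomplete experimental spectrum to a simulated one.

    Every way of removing surplus simulated peaks is tried; the kept
    peaks are paired with the experimental signals in sorted order
    (assumed peak-to-peak fit). Returns the lowest deltasum (ppm) and
    the simulated peaks that were kept.
    """
    n_exp = len(experimental)
    if n_exp == 0 or n_exp > len(simulated):
        raise ValueError("need 1..len(simulated) experimental signals")
    exp_sorted = sorted(experimental)
    best_total = None
    best_subset: Tuple[float, ...] = ()
    # combinations() preserves input order, so each subset stays sorted.
    for subset in combinations(sorted(simulated), n_exp):
        total = sum(abs(e - s) for e, s in zip(exp_sorted, subset))
        if best_total is None or total < best_total:
            best_total, best_subset = total, subset
    return best_total, best_subset


if __name__ == "__main__":
    # Illustrative numbers only: one of three signals was not observed.
    experiment = [100.0, 62.0]
    simulation = [100.1, 75.0, 62.2]
    print(reduced_deltasum(experiment, simulation))
```

Exhaustive enumeration stays tractable for the cases reported here: omitting 6 of 32 signals gives 906,192 subsets per candidate structure. The count grows quickly as more signals are omitted, so a pruned recursive search would be preferable for larger gaps.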

Acknowledgements

This work was supported by grants from the Swedish Natural Science Research Council.

References

[1] P.-E. Jansson, L. Kenne, and G. Widmalm, Carbohydr. Res., 188 (1989) 169–181.
[2] G.M. Lipkind, A.S. Shashkov, Y.A. Knirel, E.V. Vinogradov, and N.K. Kochetkov, Carbohydr. Res., 175 (1988) 59–75.
[3] J.A. van Kuik, K. Hård, and J.F.G. Vliegenthart, Carbohydr. Res., 235 (1992) 53–68.
[4] E.F. Hounsell and D.J. Wright, Carbohydr. Res., 205 (1990) 19–29.
[5] B. Meyer, T. Hansen, D. Nute, P. Albersheim, A. Darvill, W. York, and J. Sellers, Science, 251 (1991) 542–544.
[6] H. Baumann, P.-E. Jansson, L. Kenne, and G. Widmalm, Carbohydr. Res., 211 (1991) 183–190.
[7] E.C. Medina, G. Widmalm, A. Weintraub, P.A. Vial, M.M. Levine, and A.A. Lindberg, Eur. J. Biochem., 224 (1994) 191–196.
[8] M. Staaf, F. Urbina, A. Weintraub, and G. Widmalm, Carbohydr. Res., 297 (1997) 297–299.
[9] A. Johansson, P.-E. Jansson, and G. Widmalm, Carbohydr. Res., 253 (1994) 317–322.
[10] P.-E. Jansson, L. Kenne, and G. Widmalm, Acta Chem. Scand., 45 (1991) 517–522.
[11] L. Kenne, B. Lindberg, K. Petersson, E. Katzenellenbogen, and E. Romanowska, Eur. J. Biochem., 91 (1978) 279–284.
[12] P.-E. Jansson, L. Kenne, and T. Wehler, Carbohydr. Res., 166 (1987) 271–282.
[13] P.-E. Jansson, L. Kenne, and T. Wehler, Carbohydr. Res., 179 (1988) 359–368.
[14] P.-E. Jansson, L. Kenne, and G. Widmalm, J. Chem. Inf. Comput. Sci., 31 (1991) 508–516.
[15] R. Stenutz, B. Erbing, G. Widmalm, P.-E. Jansson, and W. Nimmich, Carbohydr. Res., 302 (1997) 79–84.
[16] M. Linnerborg, G. Widmalm, M.M. Rahman, P.-E. Jansson, T. Holme, F. Qadri, and M.J. Albert, Carbohydr. Res., 291 (1996) 165–174.
[17] P.-E. Jansson, B. Lindberg, G. Widmalm, G.G.S. Dutton, A.V.S. Lim, and I.W. Sutherland, Carbohydr. Res., 175 (1988) 103–109.
[18] B. Erbing, P.-E. Jansson, G. Widmalm, and W. Nimmich, Carbohydr. Res., 273 (1995) 197–205.
[19] P.-E. Jansson, L. Kenne, and G. Widmalm, Carbohydr. Res., 168 (1987) 67–77.
[20] A. Allerhand and E. Berman, J. Am. Chem. Soc., 106 (1984) 2412–2420.
[21] B.W. Dijkstra, J.F.G. Vliegenthart, G. Strecker, and J. Montreuil, Eur. J. Biochem., 130 (1983) 111–115.