Multivariate Analysis of 2D-NMR Spectroscopy - Applications in wood science and metabolomics. Tommy Öman, Department of Chemistry, Umeå 2013



Multivariate Analysis of 2D-NMR

Spectroscopy -Applications in wood science and metabolomics

Tommy Öman

Department of Chemistry

Umeå 2013


Copyright © Tommy Öman

ISBN: 978-91-7459-728-8

Picture of Aurora Borealis, taken by Stefan Bergmark, Piteå.

Electronic version available at http://umu.diva-portal.org/

Printed by: Print och Media, Umeå

Umeå, Sweden 2013

Dedicated to Melker, Sixten, Svante and Mia.

Abstract

Wood is our most important renewable resource. We need better quality and greater quantity, both from the wood itself and from the processes that use wood as a raw material. Hence, understanding the chemical composition of wood is of high importance. Improved and new methods for analyzing wood are important for gaining better knowledge about both refining processes and raw material. The combination of NMR and multivariate analysis (MVA) is a powerful method for these analyses, but so far it has mainly been limited to 1D NMR. In this project, we have developed methods for combining 2D NMR and MVA in both wood analysis and metabolomics. This combination was used to compare samples from normal wood and tension wood, and also from trees with down-regulation of a pectin-related gene. Dissolving pulp was also examined using the same combination of 2D NMR and MVA, together with FT-IR and solid-state 13C CP-MAS NMR. Here we focused on differences in wood type (softwood and hardwood), process type (sulfite and sulfate) and viscosity. These methods confirmed existing knowledge and added new knowledge about dissolving pulp. Reactivity was also compared in relation to the morphology of the cellulose and the pulp composition. Based on the method and software used in the wood analysis projects, a new method called HSQC-STOCSY was developed. This method is especially suited for assignment of substances in complex mixtures. Peaks in 2D NMR spectra that correlate between different samples are plotted in correlation plots resembling regular NMR spectra. These correlation plots have great potential for identifying individual components in complex mixtures, as shown here for a metabolomics data set. The method could potentially also be used in other areas such as drug/target analysis, protein dynamics and assignment of wood spectra.


Summary in Swedish

The forest and wood industry is one of the most important basic industries in Sweden, especially in northern Sweden. With increasing global competition in the forest and wood industry, it is important that Sweden leads the development towards better and more efficient processes. This development ranges from breeding or genetically modifying better trees with specific properties to improving processes so that old and new products can be produced more cheaply from the raw material. To gain better control over which properties are important parameters in a process, the analytical methods also need to be developed. In addition, increased use of by-products will place even higher demands on analytical methods, so that the right raw material and process settings can more easily be chosen and the process becomes economically profitable. Even though the availability of some of the analytical methods we have used (for example NMR) is limited today, it is quite possible that (within a number of years) the number of spectroscopic analytical methods will increase considerably in the forest industry. Combining multivariate methods with different analytical methods will simplify the interpretation of complex samples and reveal results that would otherwise have been impossible to see. Technologically, it will be difficult to lead the development. Knowledge, experience and creativity are therefore a must if Sweden's economy is to remain as strong in the future.

Here, new methods have been used to analyze samples both from genetically modified trees and from products taken directly from industry. The combination of NMR and multivariate regression methods has proven to be a robust, easily interpreted and exciting method that finds results that would otherwise be difficult or impossible to detect.

For studies of dissolving pulps, good models are obtained for both raw material and process type. This can help both to predict the class of unknown raw material and to adjust processes to obtain a better end product. In addition, costly analytical methods may in the future be replaced by spectroscopic analyses, which can save both time and money. It can also be seen as a good complement that provides greater knowledge about the wood samples.

To identify molecules in complex samples that contain many substances, we have developed a new method called HSQC-STOCSY. This method is based on the fact that signals from the same molecule correlate (co-vary) between different samples, which makes it possible to extract a spectrum that contains only the signals from a single component of a mixture. This can be used to assign, for example, spectra from metabolic studies or wood samples. There is also great potential in studies of proteins or of interactions between proteins and ligands.


Abbreviations

CP-MAS cross polarization magic angle spinning

DA discriminant analysis

DMSO Dimethyl sulfoxide

FT-IR Fourier transform infrared spectroscopy

HSQC heteronuclear single quantum coherence spectroscopy

HW hardwood

MVA multivariate data analysis

NMI N-Methyl imidazole

NMR nuclear magnetic resonance spectroscopy

OPLS orthogonal projections to latent structures

OSC Orthogonal Signal Correction

PCA principal component analysis

PLS partial least squares

STOCSY statistical total correlation spectroscopy

SW softwood


List of papers

The work presented in this thesis is based on these papers:

I. Hedenström, M., Wiklund-Lundström, S., Öman, T., Lu, F.,

Gerber, L., Schatz P., Sundberg, B., Ralph, J. Identification

of Lignin and Polysaccharide Modifications in Populus

Wood by Chemometric Analysis of 2D NMR Spectra from

Dissolved Cell Walls. Mol Plant 2, 933-942 (2009).

II. Strunk, P*., Öman, T*., Gorzsas, A., Hedenström, M. &

Eliasson, B. Characterization of dissolving pulp by

multivariate data analysis of FT-IR and NMR spectra.

Nordic Pulp & Paper Research Journal 26, 398-409 (2011).

III. Öman, T*., Strunk, P*., Eliasson, B., Hedenström, M.

Reactivity of dissolving pulp analyzed with multivariate

data analysis of XRD and NMR data. (Manuscript)

IV. Öman, T., Tessem, M-B, Bertilsson, H., Angelson, A.,

Bathen, T., Antti, H., Hedenström, M., Andreassen, T.

Identification of metabolites from 2D 1H-13C HSQC NMR

using peak correlation plots. (Submitted).

Equal contribution from authors *

Papers not included in this thesis.

Öhman, A., Öman, T., Oliveberg, M. Solution Structures

and backbone dynamics of the ribosomal protein S6 and its

permutant P(54-55). Protein Sci. 19, 183-189. (2010)

Haglund, E., Lind J., Öman, T., Öhman, A., Mäler, L.,

Oliveberg, M. The HD-exchange motions of ribosomal

protein S6 are insensitive to reversal of the protein-folding

pathway. Proc. Natl. Acad. Sci. USA 106(51), 21619-21624. (2009).


Table of Contents

Abstract
Summary in Swedish
Abbreviations
List of papers

1. Nuclear Magnetic Resonance (NMR)
   1.1. History of NMR
   1.2. Theory of NMR
   1.3. Experiments
2. Other analytical tools in wood science
   2.1. Fourier Transform Infrared Spectroscopy (FT-IR)
   2.2. X-ray diffraction (XRD)
3. Multivariate Data Analysis
   3.1. Principal Component Analysis (PCA)
   3.2. Partial Least Squares Projections to latent structures (PLS)
   3.3. Orthogonal Signal Correction (OSC)
   3.4. Orthogonal projections to latent structures (OPLS)
      3.4.1. Interpretation of the OPLS models
   3.5. Pre-treatment of data
4. Wood structure
   4.1. Cellulose
   4.2. Hemicellulose
   4.3. Lignin
5. NMR in wood analysis
   5.1. Solid-state CP-MAS NMR
   5.2. 2D-NMR on wood
6. Dissolving pulp
   6.1. Viscose process
   6.2. Reactivity
7. Metabolomics
8. Result and Discussion
   8.1. MVA analyses with 2D NMR on Populus wood
   8.2. Investigation of dissolving pulp
   8.3. Correlation analysis of 2D HSQC spectra (HSQC-STOCSY)
9. Future perspective and conclusion
10. Acknowledgements
11. References


1. Nuclear Magnetic Resonance (NMR)

The first NMR spectrometer became commercially available more than 50 years ago. Since then, continuous technical development has steadily increased the usefulness of NMR spectroscopy in many areas of research.

This thesis focuses on the development and application of methods that combine NMR with multivariate data analysis to further widen the application areas of NMR spectroscopy.

NMR has a wide range of applications, and the development of both spectrometer hardware and methods increases the number of applications even further. Some of the current applications are:

- Structure determination of organic molecules and biomolecules (for example proteins and DNA). An advantage of NMR over crystallographic methods is the possibility to examine these molecules in solution under different conditions.
- Analysis of complex mixtures such as biofluids, which contain many different compounds.
- Analysis of solid samples, for example tissues and wood samples, enabled by magic angle spinning (MAS).
- Magnetic resonance imaging (MRI), which has found a wide range of applications in medicine.


1.1. History of NMR

The phenomenon of nuclear magnetic resonance (NMR) was discovered in 1945 by two independent groups led by Edward Purcell and Felix Bloch, and for this work Purcell and Bloch shared the Nobel Prize in Physics in 1952.1,2 In 1951 the chemical shift was discovered by Packard and Arnold, and many new studies followed, which led to rapid development in the field of NMR spectroscopy. Examples are the effect of electron spin populations described by Overhauser3, the first 13C spectra by Lauterbur4 and Holm5, the first two-dimensional NMR spectra by Jeener and Ernst6,7, and Fourier transform NMR by Ernst8,9 (Nobel Prize in 1991). These discoveries, together with fast technical development, have made NMR important for studying the structure of biological molecules, and for his work in this area Wüthrich10-12 was awarded the Nobel Prize in Chemistry in 2002. In 2003 Mansfield13 and Lauterbur14 received the Nobel Prize in Physiology or Medicine for their discoveries concerning magnetic resonance imaging (MRI). More recently, the introduction of cryogenically cooled probe heads, together with ever-increasing field strengths, has further increased the sensitivity and resolution of NMR spectra.

1.2. Theory of NMR

The theory of nuclear magnetic resonance (NMR) is complex, and quantum mechanics is needed to describe certain aspects of it, but for a basic understanding the simpler vector model is a good start.

First, consider the origin of the signal we observe in NMR. As the name nuclear magnetic resonance implies, we are observing signals from atomic nuclei. If the sum of the numbers of neutrons and protons is odd, the nuclei have magnetic properties and can be viewed (somewhat simplistically) as small magnets with magnetic fields pointing in random directions. When we place a sample in a strong external magnetic field, B0, the magnetic field of each nucleus aligns either parallel or antiparallel to B0. The parallel (+1/2 or α) state has lower energy than the antiparallel (-1/2 or β) state and therefore has a higher population. This creates a net magnetization, M0, pointing in the same direction as the external magnetic field. The alignment of the random magnetic fields with the external field is shown in Figure 1. This picture holds for spin-½ nuclei (explained below) but is more complicated for nuclei with higher spin. The most commonly studied nuclei in NMR are 1H, 13C, 15N, 19F and 31P. Common to these nuclei is that their spin quantum number, I, is ½. This means that their magnetic quantum number, m, can adopt two values, m = +1/2 and m = -1/2, which correspond to the α- and β-states described above. All nuclei with I = ½ are referred to as spin-half nuclei.

Figure 1. Randomly oriented nuclear magnetic moments align in a magnetic field.

The size of M0 is determined by the Boltzmann distribution, see Equation 1.

Equation 1: $N_{-1/2} / N_{+1/2} = e^{-\Delta E / kT}$

Here $N_{-1/2}$ and $N_{+1/2}$ are the populations of the higher (-1/2 or β) and lower (+1/2 or α) energy states of the nuclei, T is the absolute temperature, k is the Boltzmann constant and ∆E is the energy difference between the two spin states.

The population difference between the lower and higher energy states is very small, so even though a sample contains an enormous number of nuclei, NMR is considered a relatively insensitive method. The small population difference is a direct consequence of the small value of ∆E. This energy difference depends on the strength of the B0 field (Figure 2), which is why we want to use magnets that are as strong as possible in NMR spectroscopy. Even with state-of-the-art instruments, however, the net magnetization from the sample is very small. The central component of every NMR spectrometer is a strong magnet that generates the B0 field. These magnets are superconducting electromagnets cooled with liquid helium that can generate magnetic fields of up to 25 tesla (T). In comparison, the earth's magnetic field ranges from 25 to 65 µT (wikipedia.org).

Figure 2. The Sensitivity of NMR increases with higher magnetic field.
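To give a feeling for how small the population difference is, the Boltzmann ratio can be evaluated numerically. The sketch below is illustrative only: it assumes the relation ∆E = hγB0/2π for a spin-½ nucleus, an approximate literature value for the 1H gyromagnetic ratio, a temperature of 298 K, and the sign convention in which the higher-energy state is the less populated one. The function name and the chosen field strengths are hypothetical and not taken from the thesis.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K
GAMMA_1H = 267.522e6    # 1H gyromagnetic ratio, rad s^-1 T^-1 (approximate)


def population_ratio(b0_tesla, temperature_k=298.0, gamma=GAMMA_1H):
    """Return N(-1/2) / N(+1/2) for a spin-1/2 nucleus at field strength B0."""
    # Energy gap between the two spin states: h times the Larmor frequency.
    delta_e = H * gamma * b0_tesla / (2 * math.pi)
    return math.exp(-delta_e / (K_B * temperature_k))


for b0 in (7.05, 14.1, 21.1):
    ratio = population_ratio(b0)
    # Fraction of all spins that make up the excess in the lower state.
    excess_per_million = (1 - ratio) / (1 + ratio) * 1e6
    print(f"B0 = {b0:5.2f} T  ratio = {ratio:.6f}  excess = {excess_per_million:.1f} per million")
```

Under these assumptions the excess population in the lower state is only some tens of spins per million, and it grows in proportion to the field strength, which is why stronger magnets translate directly into more signal.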

In order to observe any signal from the nuclei, we need to perturb the net magnetization from its equilibrium position. This is done by irradiating the sample with a short pulse at a frequency in the MHz range (radiofrequency). The pulse induces transitions between the α- and β-states, which gives us a means to manipulate the net magnetization of the sample. In addition to aligning with B0, the magnetic moments of the individual nuclei also precess around B0 with a frequency called the Larmor frequency (Figure 3). This frequency differs between nuclei and is proportional to the field strength; for example, all protons (1H) have a Larmor frequency of approximately 600 MHz in a magnet with a field strength of 14.1 T.
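The relation between field strength and Larmor frequency can be checked with a few lines of code. This is a minimal sketch assuming ν = γB0/2π and approximate literature values of the gyromagnetic ratios; the dictionary and function names are made up for illustration.

```python
import math

# Approximate gyromagnetic ratios in rad s^-1 T^-1 (rounded literature values).
GAMMA = {"1H": 267.522e6, "13C": 67.283e6}


def larmor_mhz(nucleus, b0_tesla):
    """Larmor frequency in MHz for the given nucleus at field strength B0."""
    return GAMMA[nucleus] * b0_tesla / (2 * math.pi) / 1e6


for nucleus in GAMMA:
    print(f"{nucleus}: {larmor_mhz(nucleus, 14.1):.1f} MHz at 14.1 T")
```

At 14.1 T this gives roughly 600 MHz for 1H and about 151 MHz for 13C, which is why such an instrument is usually called a "600 MHz spectrometer".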

Figure 3. Motion of a nucleus in a magnetic field.

The radiofrequency pulse is applied perpendicular to B0 and creates its own magnetic field, called B1. So, in addition to the precession around B0, the spins also rotate around B1. This results in a rather complex motion of M0, but the key point is that M0 is tilted away from the direction of B0 (Figure 4). After the pulse, M0 rotates with its Larmor frequency in the x,y-plane, and this rotating magnetization vector induces a current in the receiver coil located next to the sample. This oscillating current, called the Free Induction Decay (FID), can then be Fourier transformed to create an NMR spectrum. NMR would not be a very useful analytical tool if we only observed one signal at the Larmor frequency for each type of nucleus, and this is not the case. The exact frequency of a spin depends on its chemical environment within a molecule, which leads to small alterations of the Larmor frequency called the chemical shift. The frequency range we are normally interested in is the chemical shift range, which is in the kHz range rather than the MHz range of the Larmor frequency. Therefore, the "carrier frequency", which is the frequency of the RF pulse, is subtracted from the FID before the Fourier transform.

Figure 4. The net magnetization (M0) is aligned along the magnetic field (B0). A radiofrequency (RF) pulse rotates M0 around the x-axis.
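The step from FID to spectrum can be illustrated with a synthetic signal. The sketch below assumes a noise-free FID from which the carrier frequency has already been subtracted, so that only small offsets remain; the two offsets, the dwell time and the decay constant are arbitrary illustrative values, and a plain discrete Fourier transform is used instead of an optimized FFT to keep the example free of dependencies.

```python
import cmath
import math


def simulate_fid(offsets_hz, n_points=256, dwell_s=1e-3, t2_s=0.1):
    """Complex FID: decaying oscillations at the given offsets from the carrier."""
    fid = []
    for k in range(n_points):
        t = k * dwell_s
        oscillation = sum(cmath.exp(2j * math.pi * f * t) for f in offsets_hz)
        fid.append(oscillation * math.exp(-t / t2_s))
    return fid


def dft(signal):
    """Plain discrete Fourier transform (O(n^2), fine for a small example)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def peak_offsets(fid, dwell_s, n_peaks):
    """Return the offsets (Hz) of the n_peaks strongest local maxima."""
    n = len(fid)
    magnitude = [abs(v) for v in dft(fid)]
    maxima = [
        m for m in range(n)
        if magnitude[m] > magnitude[m - 1] and magnitude[m] > magnitude[(m + 1) % n]
    ]
    strongest = sorted(maxima, key=lambda m: magnitude[m], reverse=True)[:n_peaks]
    # Bins above n/2 correspond to negative offsets from the carrier.
    return sorted((m if m < n // 2 else m - n) / (n * dwell_s) for m in strongest)


fid = simulate_fid([125.0, -250.0])
print(peak_offsets(fid, dwell_s=1e-3, n_peaks=2))
```

The two offsets put into the simulated FID are recovered as the two strongest peaks of the magnitude spectrum. In a real spectrum these offsets would then be converted to chemical shifts in ppm by dividing by the spectrometer frequency.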


1.3. Experiments

In this thesis, three NMR experiments have been used extensively and are therefore explained in more detail. These experiments are 1D 1H NMR, 2D 1H-13C Heteronuclear Single Quantum Coherence (HSQC) NMR and solid-state 13C Cross-Polarization Magic Angle Spinning (CP-MAS) NMR.

In a regular 1D 1H spectrum, only protons are observed (Figure 5). This experiment is preferred if maximum sensitivity and quantification of peaks are the aims of the study. 1D 1H spectra are recorded relatively quickly and are quantitative (if appropriate parameters are used). They are therefore suitable for projects with many samples, such as metabolomics studies. The drawback is the extensive overlap between peaks, and combining 1D 1H spectra with statistical methods has often proved to be the best way to address this problem.15-17

Figure 5. A typical 1D-1H NMR spectrum (acetylated cellobiose).


The 2D 1H-13C HSQC spectrum shows each 1H that is connected to a 13C. The extra dimension compared to a 1D spectrum resolves much of the overlap seen in, for example, a 1H spectrum of a complex mixture. The 13C chemical shifts also give useful information for assignment purposes. HSQC spectra have been widely used for characterization of both small organic molecules and proteins. For mixtures, these spectra (Figure 6) can be seen as a detailed chemical fingerprint of the sample.

Figure 6. Typical HSQC spectrum showing protons (1H) connected to carbons (13C). The spectrum shows the metabolites in the synthetic sample from paper IV.

The basic principle of the HSQC (and the similar HMQC) experiment is that the 1J-coupling between 1H and 13C is used to transfer polarization from 1H to 13C. The 13C magnetization then rotates in the x,y-plane during the evolution period. Before acquisition, polarization is transferred back to 1H. 13C decoupling is active during the acquisition time to increase the sensitivity of the experiment and simplify the appearance of the spectrum. Figure 7 shows the pulse sequence for a typical HSQC experiment using adiabatic 13C pulses.18-20 Adiabatic pulses are pulses whose frequency varies with time during the pulse. Their advantage is that they are relatively insensitive to inhomogeneity in B1 and are therefore especially suitable for in vivo experiments.21

Figure 7. Pulse sequence for a modern HSQC containing adiabatic pulses.

The limitation of HSQC is the high sample concentration needed, since the natural abundance of 13C is only a little more than 1%. However, the sensitivity of modern NMR spectrometers has enabled the use of this information-rich experiment also for rather dilute and complex samples such as biofluids.22,23

Solid-state 13C CP-MAS NMR spectroscopy has been used extensively24-32 for cellulose and pulp analysis and is particularly useful for analysis of cellulose crystallinity. NMR spectra of solids have very broad lines compared to spectra of samples in solution, see Figure 8.


Figure 8. Typical 13C CP-MAS spectrum. Sample from dissolving pulp.

The main reason for this is the large chemical shift anisotropy in solid samples. To reduce this effect and mimic the isotropic tumbling of a molecule in solution, samples in solid-state NMR are mechanically rotated. Samples are spun at high speed (4-20 kHz) at the magic angle, 54.74°, relative to the magnetic field, see Figure 9. However, even at high spinning speeds, the lines are significantly broader than in solution, because other factors that contribute to broad lines cannot be fully removed by spinning.

Figure 9. Magic Angle Spinning (MAS). A solid sample rotating at an angle of 54.74° to the static magnetic field B0.
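The value of the magic angle follows from the orientation dependence of the anisotropic interactions, which scale as (3cos²θ - 1)/2. This factor is standard textbook material rather than something stated in the text above. A minimal numerical check, with a hypothetical function name:

```python
import math


def anisotropy_scaling(theta_deg):
    """Scaling factor (3*cos^2(theta) - 1) / 2 of the anisotropic interaction."""
    c = math.cos(math.radians(theta_deg))
    return (3 * c * c - 1) / 2


# The magic angle is the angle at which the scaling factor vanishes.
magic_angle_deg = math.degrees(math.acos(1 / math.sqrt(3)))

print(f"magic angle = {magic_angle_deg:.2f} degrees")
print(f"scaling at magic angle = {anisotropy_scaling(magic_angle_deg):.1e}")
print(f"scaling along B0 (0 degrees) = {anisotropy_scaling(0.0):.1f}")
```

The scaling factor is 1 for an axis along B0 and drops to zero at about 54.74°, which is the angle at which the rotor axis is set.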

2. Other analytical tools in wood science

FT-IR and X-ray diffraction have been used in paper II and paper III,

respectively and although the actual analyses were performed by

collaborators I have included a short description of these methods.

2.1. Fourier Transform Infrared Spectroscopy (FT-IR)

Even though the penetration depth of FT-IR with attenuated total reflectance (ATR) is limited, the method is often used in the pulp and paper industry, in both on-line and off-line processes.33 Vibrational spectroscopy is highly sensitive to steric factors. It has also been shown to be robust, which makes FT-IR a good tool for discriminating between different classes of samples.34 Another benefit of FT-IR is its relatively low cost. The major drawback of FT-IR is that the spectral information obtained is not always easy to interpret. However, the structural composition of cellulose has been examined by Zhbankov et al.35,36

2.2. X-ray diffraction (XRD)

X-ray diffraction has been used to study the crystalline structure of

cellulose for a long time.37 Compared with solid state 13C CP-MAS

NMR, XRD gives more information about the crystalline regions and

less information about the non-crystalline regions. Even though cellulose crystallinity has been studied for a long time, there are still uncertainties about the structures and unit cells of the different polymorphs of cellulose.

3. Multivariate Data Analysis

In today's science, it is common to work with data from a large number of samples, where many variables describe each sample. To get an overview of such large and often complex data, and even more so to interpret changes in the data, multivariate statistical methods are required. Multivariate projection methods have been used successfully for interpretation and modelling of complex data. Common to these methods is that the data are arranged in a data matrix, X, with N rows (samples) and K columns (variables). For projection methods to be useful, variables in the X matrix have to correlate with each other, which is true in almost all cases where K is large. By finding the underlying structures based on the variation in X, projection methods model the variation with a few orthogonal latent variables, which facilitates interpretation. Using NMR as an example, each row in X is a sample spectrum and each variable is an intensity value at a position along the ppm scale. It is highly probable that peaks occurring at different ppm values vary in the same way, since peaks from the same molecule show the same variation pattern over all samples. A projection method can thus reduce the number of variables to a few latent variables, or components, that focus on the variation in the data. If the extracted components describe a large proportion of the variation in the data, the loss of information should be small. In this thesis, Principal Component Analysis (PCA), Partial Least Squares Projections to Latent Structures (PLS) and Orthogonal Projections to Latent Structures (OPLS) are the multivariate projection methods applied to the acquired NMR data.

3.1. Principal Component Analysis (PCA)

PCA is used to explain as much as possible of the variation in the data with as few components (principal components) as possible.38-42 PCA, which is unsupervised, is suitable for a first overview of the data: finding trends, outliers and groups of observations with common properties. Mathematically, PCA is an orthogonal transformation of the data matrix X into linearly uncorrelated variables (principal components), meaning that the matrix X is decomposed into two smaller matrices with orthogonal columns, the scores T and the loadings P, see Equation 2. E is the residual variation in X not explained by the model. The score values, ti, give the positions of the observations in the model plane, while the loading values give the direction of the plane, see Figure 10.

Figure 10. Projection of observations in a three-dimensional space onto a two-dimensional plane spanned by two principal components, PC1 and PC2.

Equation 2: $X = TP^{T} + E$

Each principal component (PC) consists of one score vector ti and one loading vector pi. Plotting the score values of one principal component against the score values of another gives a score plot, in which the patterns among the observations are studied. The corresponding loading plot shows which variables contribute to the pattern seen in the scores.
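The decomposition in Equation 2 can be sketched for a single component with the NIPALS iteration. This is an illustrative toy, not the software used in the thesis: the data matrix is invented (four samples and three variables, where the first two variables covary as two peaks from the same molecule would), and the helper names are hypothetical.

```python
import math


def mean_center(X):
    """Subtract the column mean from every variable."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def first_component(X, n_iter=1000, tol=1e-14):
    """NIPALS iteration: score vector t and unit-length loading vector p."""
    n_vars = len(X[0])
    # Start from the column with the largest sum of squares.
    start = max(range(n_vars), key=lambda k: sum(row[k] ** 2 for row in X))
    t = [row[start] for row in X]
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        p = [sum(t[i] * X[i][k] for i in range(len(X))) / tt for k in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t_new = [sum(x * pk for x, pk in zip(row, p)) for row in X]
        change = sum((a - b) ** 2 for a, b in zip(t, t_new))
        t = t_new
        if change < tol:
            break
    return t, p


def sum_of_squares(M):
    return sum(v * v for row in M for v in row)


X = mean_center([[1.0, 2.0, 0.1], [2.0, 4.1, -0.1], [3.0, 5.9, 0.0], [4.0, 8.0, 0.1]])
t, p = first_component(X)
# Residual matrix E = X - t p' after one component.
E = [[x - ti * pk for x, pk in zip(row, p)] for row, ti in zip(X, t)]
explained = 1 - sum_of_squares(E) / sum_of_squares(X)
print("loadings:", [round(v, 3) for v in p])
print("explained variation:", round(explained, 4))
```

In this toy example the two covarying variables get loadings of the same sign and a single component describes almost all of the variation, which mirrors the argument that peaks from the same molecule can be summarized by one latent variable.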

3.2. Partial Least Squares Projections to latent structures (PLS)

PLS is a supervised multivariate regression method used to find the relationship between the matrix X and the response matrix Y.43,44 PLS uses latent scores of the X matrix to describe the variation in both X (Equation 3) and Y (Equation 4). This is in contrast to PCA, which only uses the variation in X. The response matrix Y can contain either class identities or regular response values, depending on the problem addressed. The purpose can be to predict Y from X, to find the variation in X that influences Y, or both.

Equation 3: $X = TP^{T} + E$

Equation 4: $Y = TC^{T} + F$
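A one-component PLS model with a single response can be sketched as follows. The weight vector w does not appear in Equations 3 and 4 but belongs to the standard NIPALS formulation of PLS; the toy data and function names are assumptions made for illustration, and X and y are assumed to be mean centered already.

```python
import math


def pls1_component(X, y):
    """One PLS component for centered X (rows = samples) and centered y."""
    n, k = len(X), len(X[0])
    # Weights: direction in X with maximal covariance with y.
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = Xw (one column of T).
    t = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    tt = sum(v * v for v in t)
    # X-loadings p (one column of P) and Y-loading c (one element of C).
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
    c = sum(yi * ti for yi, ti in zip(y, t)) / tt
    return t, p, c


# Invented, already mean-centered toy data: 4 samples, 3 variables.
X = [[-1.5, -3.0, 0.075], [-0.5, -0.9, -0.125], [0.5, 0.9, -0.025], [1.5, 3.0, 0.075]]
y = [-1.4, -0.6, 0.7, 1.3]

t, p, c = pls1_component(X, y)
residual = [yi - ti * c for yi, ti in zip(y, t)]  # F in Equation 4
r2y = 1 - sum(f * f for f in residual) / sum(v * v for v in y)
print("R2Y =", round(r2y, 3))
```

The same score vector t is used to describe both X (through p) and y (through c), which is the defining difference from PCA, where the scores are chosen with regard to X only.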

3.3. Orthogonal Signal Correction (OSC)

OSC is a method for removing systematic variation in X that is uncorrelated to Y.45 It is suitable for removing such systematic patterns before modelling. The drawback of the method is that models, for example PLS models, become difficult to validate when OSC filtering has changed the data, because two different methods then have to be validated.

3.4. Orthogonal projections to latent structures (OPLS)

Orthogonal projections to latent structures (OPLS) is a refinement of the PLS regression method in which an orthogonal signal correction filter is integrated.43 OPLS divides the systematic variation in the data matrix X into two blocks: the predictive component (Tp), which describes the covariance between X and Y, and the orthogonal component (To), which is the systematic variation in X that is not related to the response Y (see Equation 5).46

Equation 5: $X = T_{p}P_{p}^{T} + T_{o}P_{o}^{T} + E$

The separation between predictive and orthogonal variation makes OPLS models easier to interpret than PLS models, see Figure 11.

Figure 11. The difference between the PLS and OPLS method.

OPLS is commonly used for discriminant analysis (DA), for example to investigate differences between two classes of samples, in which case Y describes the class membership.
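The idea of splitting X into a predictive and an orthogonal part can be sketched for one response and one orthogonal component. The sketch assumes the commonly described single-response OPLS procedure (weights from X'y, followed by removal of the part of the loading that is not aligned with those weights); it is not taken from the thesis or its software. Data and names are invented, and X and y are assumed to be mean centered.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def project(X, w):
    """Scores t = Xw."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def loadings(X, t):
    """Loadings p = X't / (t't)."""
    tt = sum(v * v for v in t)
    return [sum(X[i][j] * t[i] for i in range(len(X))) / tt for j in range(len(X[0]))]


def remove_orthogonal_component(X, y):
    """Return (X_filtered, t_o, p_o) after removing one Y-orthogonal component."""
    w = unit([sum(X[i][j] * y[i] for i in range(len(X))) for j in range(len(X[0]))])
    p = loadings(X, project(X, w))
    wp = sum(a * b for a, b in zip(w, p))
    # Part of the loading vector that is orthogonal to the predictive weights.
    w_o = unit([pj - wp * wj for pj, wj in zip(p, w)])
    t_o = project(X, w_o)      # orthogonal scores (To)
    p_o = loadings(X, t_o)     # orthogonal loadings (Po)
    X_filtered = [[x - ti * pj for x, pj in zip(row, p_o)] for row, ti in zip(X, t_o)]
    return X_filtered, t_o, p_o


# Invented, already mean-centered toy data: 4 samples, 3 variables.
X = [[-1.5, -3.0, 0.075], [-0.5, -0.9, -0.125], [0.5, 0.9, -0.025], [1.5, 3.0, 0.075]]
y = [-1.4, -0.6, 0.7, 1.3]

X_filtered, t_o, p_o = remove_orthogonal_component(X, y)
print("t_o . y =", round(sum(a * b for a, b in zip(t_o, y)), 9))
```

By construction the orthogonal scores are uncorrelated with y, so the variation they describe can be inspected separately from the variation that discriminates between the classes.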

3.4.1. Interpretation of the OPLS models

Interpretation of OPLS models is similar to interpretation of PCA models. An example from 2D-NMR data is given in Figure 12. OPLS-DA was used to discriminate between softwood (SW) and hardwood (HW) based on their NMR spectral profiles, see paper II for more details. The samples are separated into two groups in the OPLS-DA score plot (Figure 12a), and the corresponding loading plot (Figure 12b) shows which peaks contribute to the separation. It was concluded that the peaks stemming from mannose (M) were higher in spectra of softwood samples, while other peaks were higher in spectra of hardwood samples.

Figure 12. OPLS-DA model of 2D NMR data with type of wood (Hardwood and

Softwood) as discriminant. a) Score plot with SW colored in red and HW colored in

black. b) Corresponding loading plot that separates SW from HW. Positive p-values

colored in red and negative p-values colored in black.

3.5. Pre-treatment of data

Before modelling, some obligatory steps are necessary: alignment of
the spectra and mean-centering. Mean-centering (m.cent), Equation 4,
increases the interpretability of the model. Here $x_i$ denotes the
different variables and $\bar{x}_i$ their mean values.

Some optional steps that are often useful are scaling, normalization
and noise reduction. Scaling to unit variance (UV) is the most
commonly used scaling; it gives each variable the same opportunity to
influence the model. Equation 5 describes autoscaling, which combines
UV scaling with mean-centering: the mean-centered variables are
divided by their standard deviation (σ).

Equation 4. $\text{m.cent} = x_i - \bar{x}_i$

Equation 5. $\text{m.cent UV} = (x_i - \bar{x}_i)/\sigma$
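The two pre-treatment equations can be illustrated with a short NumPy sketch. The function names and the small demo matrix are hypothetical, and the use of the sample standard deviation (`ddof=1`) and the handling of constant variables are assumptions, since the text does not specify them.

```python
import numpy as np

def mean_center(X):
    """Equation 4: subtract the mean of each variable (column)."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Equation 5: mean-center, then divide by the standard deviation."""
    sd = X.std(axis=0, ddof=1)           # sample standard deviation per variable
    sd = np.where(sd == 0, 1.0, sd)      # assumption: constant variables are left unscaled
    return mean_center(X) / sd

# Rows are samples (spectra), columns are variables (spectral points).
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
X_centered = mean_center(X_demo)
X_uv = autoscale(X_demo)
```

In this example the two variables differ tenfold in intensity but become identical after UV scaling, which is what gives each variable the same weight in a model.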

For 2D-NMR data, the pre-treatment of the data and the

transformation of the two dimensional data to an X-matrix are done

with an in-house script (see chapter 5.2).

4. Wood structure

Wood consists of three main components: cellulose, hemicellulose and
lignin. The chemical composition (see Table 1) varies between softwood
(e.g. pine and spruce) and hardwood (e.g. oak and birch).47 The
largest difference lies in the composition of the lignin: hardwood
lignin consists mainly of G- and S-units with traces of H-units, while
softwood lignin is composed mostly of G-units.48

Table 1. Wood composition in softwood and hardwood.

Component        Softwood (%)   Hardwood (%)
Cellulose        40-44          43-47
Hemicellulose    25-29          25-35
Lignin           25-31          16-24
Extractives      1-5            2-7

4.1. Cellulose

Cellulose is the main constituent of wood. It is a polymer of
D-glucose units (see Figure 13) connected by β-(1→4) glycosidic bonds,
and it exists in several different allomorphs.49 Cellulose is
relatively stable (more crystalline) compared to the other
carbohydrate polymers in wood. Even though cellulose is very abundant
and thoroughly studied, the structures of the different polymorphs are
still under debate, which shows how difficult they are to determine.

Figure 13. Picture of the cellulose chain.

Native cellulose is called cellulose I, and its most common forms are
cellulose Iα and cellulose Iβ. Wood has a higher proportion of
cellulose Iβ than algae and bacteria. In the crystal structure of
cellulose I, the cellulose chains are ordered in a parallel fashion.

Cellulose II can be obtained from the natural cellulose I polymorph by
two different processes: regeneration and mercerization. The
conversion is irreversible, and cellulose II is thermodynamically more
stable than cellulose I.

In contrast to cellulose I, the chains in regenerated cellulose II are
arranged in an antiparallel fashion in a two-chain unit cell. There
are also indications of more hydrogen bonding between the sheets.

4.2. Hemicellulose

Some examples of hemicelluloses are xylan, glucuronoxylan,
arabinoxylan, glucomannan and xyloglucan. Hemicelluloses have shorter
chains than cellulose. In contrast to cellulose, they are also
branched and amorphous. Hemicelluloses contain many different sugar
monomers, of which D-xylose and D-mannose are common. Most
hemicelluloses are easily hydrolyzed and can therefore be removed
quite easily in the pulping process. The most common hemicelluloses
are glucomannan (Figure 14A) and xylan (Figure 14B). In softwood,
galactoglucomannan is more common than glucomannan.50

Figure 14. Representation of (A) glucomannan (R=H or CH3CO) and (B) xylan,

two of the most common hemicelluloses.

4.3. Lignin

Lignin is the second most common biopolymer in wood, and its removal
is the key step in the pulp industry. The lignin itself offers
possibilities to improve the efficiency of the pulping process: a
lower lignin content or a lower guaiacyl/syringyl ratio, achieved
without weakening the defense or structure of the trees, would
decrease the cost of producing pulp.51 Lignin is a heterogeneous
polyaromatic structure built from three different monolignols:
p-coumaryl alcohol, coniferyl alcohol and sinapyl alcohol. In the
polymer, the monolignols give rise to three types of lignin units:
guaiacyl (G), syringyl (S) and p-hydroxyphenyl (H), see Figure 15.48
These units can be linked in various ways to create a very complex
structure. Lignin has a crucial role in mechanical support, resistance
to diseases and water transport.52 How lignin is polymerized has been
debated, and two different theories exist. The first, generally
accepted theory is that lignin is built up by radical coupling.53 The
other, challenging theory is that the polymerization is under
enzymatic control.54

Figure 15. Chemical structures of the different structures commonly seen in lignin.

S = Syringyl, G = Guaiacyl and PB = para-hydroxybenzoate

5. NMR in wood analysis

5.1. Solid-state CP-MAS NMR

Solid-state CP-MAS 13C NMR has long been used to study the structure
and characteristics of wood, and especially of
cellulose.1,25,26,31,55-59 The most commonly studied parameter is the
crystallinity index (CI), which describes the relative amounts of
crystalline and amorphous cellulose. In addition to 13C CP-MAS NMR,
XRD, FT-IR and Raman spectroscopy can also be used to study cellulose
crystallinity. Each of these analytical techniques exists in a couple
of variations, and the absolute value of the determined CI can vary
not only with the analytical technique used but also with exactly how
the data have been analyzed.60,61 Here, a brief introduction is given
to how the CI can be calculated (or at least estimated) using 13C
CP-MAS NMR. The method is based on the fact that the signal from
carbon 4 (C4) in cellulose appears as two well-resolved peaks in a
CP-MAS spectrum: one for crystalline cellulose at 89 ppm and one for
amorphous cellulose at 84 ppm.24

The easiest approach is to integrate the crystalline and amorphous
peaks and calculate the CI by dividing the crystalline peak area by
the sum of the amorphous and crystalline peak areas, according to
Equation 6. This approach is simple but also very sensitive to varying
amounts of hemicellulose, because peaks from hemicellulose overlap
with the C4 peaks from cellulose, especially the amorphous peak.

Equation 6: $CI = C4_{cr} / (C4_{cr} + C4_{am})$

Here, C4cr is the peak area of the crystalline peak and C4am the peak

area of the amorphous peak.
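Equation 6 can be sketched in a few lines of NumPy. The integration windows below (86-92 ppm for the crystalline peak and 80-86 ppm for the amorphous peak) are assumptions chosen around the peak positions given in the text, and the synthetic two-peak spectrum is only a stand-in for real CP-MAS data.

```python
import numpy as np

def peak_area(ppm, intensity, window):
    """Integrate the intensity within a ppm window with the trapezoidal rule."""
    lo, hi = min(window), max(window)
    mask = (ppm >= lo) & (ppm <= hi)
    x, y = ppm[mask], intensity[mask]
    # abs() makes the result independent of the direction of the ppm axis.
    return np.sum((y[1:] + y[:-1]) * np.abs(np.diff(x))) / 2.0

def crystallinity_index(ppm, intensity,
                        cr_window=(86.0, 92.0), am_window=(80.0, 86.0)):
    """Equation 6: CI = C4cr / (C4cr + C4am). Windows are hypothetical."""
    c4_cr = peak_area(ppm, intensity, cr_window)
    c4_am = peak_area(ppm, intensity, am_window)
    return c4_cr / (c4_cr + c4_am)

def gaussian(x, centre, sigma, area):
    """Gaussian line with a given integrated area (synthetic test signal)."""
    return area / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Synthetic C4 region: 60 % of the area at 89 ppm, 40 % at 84 ppm.
ppm_axis = np.linspace(70.0, 100.0, 3001)
spectrum = gaussian(ppm_axis, 89.0, 0.5, 0.6) + gaussian(ppm_axis, 84.0, 0.5, 0.4)
ci_demo = crystallinity_index(ppm_axis, spectrum)
```

With real spectra the result depends strongly on where the window boundary is placed and on overlapping hemicellulose signals, which is the sensitivity described above.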

Another approach is to subtract a spectrum of an amorphous cellulose
sample from the spectrum of the sample of interest. The amorphous
spectrum is multiplied by a scale factor before subtraction, chosen so
that the residual spectrum has no negative parts.61 This method is
also relatively easy and quick to use. The peak deconvolution method
is another option. It is based on spectral fitting, where line shapes
and peak intensities are fitted to the experimental spectrum to obtain
more accurate values of C4cr and C4am. The deconvolution can be done
with two peaks, the amorphous and the crystalline peak at C4, but in
reality these two peaks consist of several components from different
cellulose polymorphs and hemicellulose. These components can also be
included in the spectral fitting. Good spectral resolution is
essential for reliable results from the peak deconvolution method.
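The amorphous subtraction approach described above can be sketched as follows. This is only one possible way of choosing the scale factor (the largest factor that leaves no negative residual); the function name, the `min_fraction` threshold and the synthetic line shapes are assumptions and are not taken from the cited method.

```python
import numpy as np

def subtract_amorphous(sample, amorphous, min_fraction=0.05):
    """Subtract a scaled amorphous reference so that the residual stays non-negative."""
    # Only points where the reference has clear intensity are used for scaling.
    significant = amorphous > min_fraction * amorphous.max()
    scale = np.min(sample[significant] / amorphous[significant])
    residual = sample - scale * amorphous
    return scale, residual

# Synthetic example: broad amorphous line at 84 ppm, narrow crystalline line at 89 ppm.
ppm_axis = np.linspace(70.0, 100.0, 3001)
amorphous_ref = np.exp(-0.5 * ((ppm_axis - 84.0) / 1.0) ** 2)
crystalline = np.exp(-0.5 * ((ppm_axis - 89.0) / 0.5) ** 2)
sample_spec = 0.3 * amorphous_ref + crystalline
scale_demo, residual_demo = subtract_amorphous(sample_spec, amorphous_ref)
```

For noisy experimental spectra the minimum ratio would be dominated by noise, so some smoothing or tolerance would probably be needed in practice.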

A clever way to get rid of unwanted signals when measuring the CI is
the linear combination method by Newman and Hemmingson, which takes
advantage of the difference in T2-relaxation rates between the
different components in wood.62 Two spectra are recorded for each
sample: one normal CP-MAS spectrum and one spectrum with a short
spin-lock before acquisition. In the second spectrum, signals from
lignin and hemicellulose are attenuated because of their faster
T2-relaxation. By a linear combination of the two spectra, a
subspectrum containing only the cellulose peaks can be extracted.

Multivariate analysis can also be used on CP-MAS spectra to measure

crystallinity or at least to see if there is a variation of cellulose

crystallinity between samples.32,63,64

5.2. 2D-NMR on wood

To obtain high-resolution 2D NMR spectra of wood, the wood has to be
dissolved in some way. In 2003, Ralph et al. published a method for
dissolving wood samples.65 The purpose is to dissolve the cell wall
material with as little modification of the structure of the different
polymers as possible and without any isolation or separation. The
method includes ball-milling, swelling in DMSO and N-methylimidazole
(NMI), acetylation and precipitation into water. After that, it is
possible to obtain fully dissolved (although quite viscous) NMR
samples with CDCl3 as solvent. High-resolution spectra are achieved,
with the drawback that the natural degree of acetylation is impossible
to examine.66 Since then, a number of studies of wood samples with 2D
HSQC experiments have been published.58,67-71

Another possibility is to dissolve the wood sample in DMSO-d6 directly
after the ball-milling step.72,73 This method speeds up the sample
preparation and makes it possible to examine the natural acetylation.
However, the gel-like samples give spectra with slightly broader lines
than the acetylated samples.

To perform multivariate analysis of 2D NMR spectra, some special
preparation of the data is required. One approach is to integrate all
peaks of interest and create a peak list that can be used for
multivariate analysis.74 It is also possible to work directly with the
time-domain data before Fourier transformation.75 Peak integration
inevitably leads to some loss of information in crowded regions of the
spectra, and we therefore opted for a different approach, where the
full-resolution Fourier-transformed 2D spectrum is used without any
need to integrate peaks. The challenge was then to transform a dataset
containing a number of 2D spectra (a three-dimensional matrix) into a
two-dimensional matrix suitable for MVA. For this purpose we developed
a Matlab script that can import spectra processed in Topspin 3 (Bruker
Biospin, Germany), unfold each spectrum into a row vector and then
place each spectrum as a row in the final data matrix (see Figure 16
for an overview).76 The data matrix can then be opened in other
software for the MVA; we have used SIMCA-P+ 12 (Umetrics, Sweden) for
this purpose.

Loadings from the multivariate models can then be refolded in the
Matlab script to create a 2D loading plot with the same dimensions as
the original 2D spectra, and even exported in Bruker format. This
allows us to take advantage of all features in Topspin to interpret
the loadings. Different scalings can of course be used in multivariate
models, and this needs to be taken into consideration when the
loadings are visualized. For example, we often use UV-scaled data to
give all peaks equal weight in the model regardless of their
intensities in the original spectra. This is usually desirable but
makes the interpretation harder, because a UV-scaled spectrum looks
very different from a normal spectrum. We therefore added a feature
called back-scaling to the script. This procedure is basically the
reverse of UV scaling: the loadings from a model with UV-scaled data
are multiplied by the standard deviation of each variable. This leads
to a 2D loading plot that looks like a normal spectrum, and we can
also use this feature to choose cut-off values so that only the most
important variables in the model are visualized. When UV-scaled data
are used in a model, the loadings have values between 0 and 1 (or -1)
depending on how important they are for the model. If a cut-off value
of, for example, 0.9 is used, only the most important variables are
shown in the loading plot. This approach has previously been used for
1D NMR spectra.15 The script has been used in papers I-III, and a
modified version in paper IV. With this software it is also possible
to include regions of interest and to exclude unwanted regions such as
solvent peaks.
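The unfolding, refolding, back-scaling and cut-off steps described above can be sketched in NumPy. This is not the Matlab script used in the thesis: the function names are hypothetical, the spectra are assumed to be already loaded as a (samples, F1, F2) array, a plain PCA via SVD stands in for the external MVA software, and the loading is expressed as the correlation between each UV-scaled variable and the first score vector so that it lies between -1 and 1.

```python
import numpy as np

def unfold(spectra):
    """Turn a (samples, F1, F2) stack into a (samples, F1*F2) data matrix."""
    return spectra.reshape(spectra.shape[0], -1)

def refold(vector, shape_2d):
    """Turn one row or loading vector back into a 2D map of shape (F1, F2)."""
    return np.asarray(vector).reshape(shape_2d)

def backscaled_loading(X, cutoff=0.0):
    """First-component loading of UV-scaled data, thresholded and back-scaled."""
    n_samples = X.shape[0]
    sd = X.std(axis=0, ddof=1)
    safe_sd = np.where(sd == 0, 1.0, sd)     # constant variables end up as zero
    Z = (X - X.mean(axis=0)) / safe_sd       # UV-scaled data
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    t1 = U[:, 0] * s[0]                      # first PCA score vector
    # Correlation of every UV-scaled variable with t1; values lie in [-1, 1].
    p_corr = Z.T @ t1 / (np.linalg.norm(t1) * np.sqrt(n_samples - 1))
    p_corr = np.where(np.abs(p_corr) >= cutoff, p_corr, 0.0)
    return p_corr * sd                       # back-scaling to spectral intensities

# Hypothetical data set: 10 small "spectra" with 3 x 4 points each.
rng = np.random.default_rng(1)
stack = rng.normal(scale=0.01, size=(10, 3, 4))
stack[:5, 0, :] += 1.0    # F1 row 0 is elevated in the first five samples
stack[5:, 1, :] += 1.0    # F1 row 1 is elevated in the last five samples
X_matrix = unfold(stack)
loading_2d = refold(backscaled_loading(X_matrix, cutoff=0.9), stack.shape[1:])
```

In the refolded map the two signal rows appear with opposite signs and with intensities on the scale of the original spectra, while the noise row stays close to zero.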

Figure 16. Overview of the in-house script for multivariate analyses of 2D-NMR

data. 1) Pre-treatment steps such as alignment, normalization, scaling and unfolding.

2) Noise exclusion and exporting files to external software for MVA. 3) Loadings

from the models are back transformed to spectrum which can be visualized as a 2D

spectrum for interpretation.

6. Dissolving pulp

Dissolving pulp has a low content of hemicelluloses and lignin
compared to other pulps. It is used in the production of regenerated
cellulose and of many cellulose derivatives and products, for example
textiles, cellophane, filter paper and fillers.

There are two main process types used: The sulfite process and the

sulfate process (or Kraft process), see Figure 17.

The sulfite process produces a pulp of high purity. The cation can be
calcium, magnesium, sodium or ammonium. To further reduce the amount
of hemicellulose and lignin, additional cooking steps are used. These
additional steps are necessary for some wood types, for example pine,
where cooking steps at different pH are used. The acidification of the
pulp causes depolymerization of the cellulose and hemicellulose
chains.

In the sulfate process (or Kraft process), NaOH and Na2S are added to

remove the lignin. The sulfate process can process all wood types

which is an advantage compared to the sulfite process.

The dissolving pulp is suitable for manufacturing of viscose and

cellulose derivatives (for example the EHEC process) because of the

high purity of the cellulose.

Figure 17. Overview of the different processes for dissolving pulp, Sulfite and Kraft

(or Sulfate).

6.1. Viscose process

The viscose process is the most common method for the preparation of
rayon. Cellulose is treated with NaOH and CS2, and the product is
called cellulose xanthate. When this is dissolved in NaOH, a yellow
solution called viscose is formed. The solution is forced through a
spinneret and allowed to cool in an acid solution, where it coagulates
into fine strands.

6.2. Reactivity

The reactivity of dissolving pulp can be seen as the accessibility of
the cellulose to different chemicals. Crystallinity, fiber length,
hemicellulose content and viscosity are some of the factors that are
thought to be important for the reactivity. Several methods are used
industrially to describe the reactivity. The most widely used is the
Fock method.77,78 In short, the Fock method is the viscose process at
lab scale. A NaOH solution (typically 9%) with excess CS2 is added to
a known amount of dry pulp to obtain a viscose-like solution.
Cellulose is regenerated by acidification and then quantified by
oxidation and titration. A high Fock value indicates high pulp
reactivity. Another way of examining the reactivity or quality of the
viscose is to determine the filter value. The viscose filter clogging
value Kw was introduced by Treiber.79,80 It is a quality and
reactivity parameter for which the viscose is pressed through a
specific filter under pressure. The Kw value is related to the
difference in the volume of liquid that has passed through the filter
at two different times, and to the difference between the two times.
Kw can also be corrected for the viscosity, which gives the
viscosity-corrected clogging value Kr;81 see Equation 7. A low Kr
indicates good filterability of the viscose dope.

Equation 7. $K_r = K_w / \eta^{0.4}$, where $\eta$ is the viscosity of the viscose

Gel concentration is determined by photo analysis and gives the amount
of undissolved material (gel). Gel particles in viscose are generally
thought to consist of undissolved cellulose from the dissolving pulp.
They can cause severe problems by blocking spinneret holes, and they
reduce the fiber quality by introducing defects.

Cloud temperature (or flocculation temperature) is the temperature at
which a clear aqueous solution of a cellulose derivative, typically a
cellulose ether, becomes cloudy when warmed.82 Clarity (or visibility)
measures the coloration from undissolved particles and is determined
as the transmittance at 550 nm.

7. Metabolomics

Metabolomics aims to study and quantify cellular metabolites in
biological systems and is an evolving field. The hypothesis is that
the metabolites in question describe the physical state of an
organism. The two most commonly used analytical techniques for
metabolite quantification are NMR and mass spectrometry coupled with a
separation technique such as gas or liquid chromatography (GC-MS or
LC-MS).

NMR is suitable for analyzing the complex mixtures encountered in
metabolic studies. The samples in these studies are often biofluids
such as urine, plasma, cerebrospinal fluid and saliva. Compared to
competing or complementary techniques such as GC-MS and LC-MS, NMR is
insensitive. However, the technical advances and experimental
development of NMR, together with its benefits of being
non-destructive, robust and relatively easy to quantify, imply that
NMR will be an even more important method in the future. In addition
to biofluids, tissue can be analyzed with magic angle spinning (MAS)
experiments. The development of NMR, in terms of both technique and
new experiments, will probably increase its usefulness in metabolic
studies, and two-dimensional NMR is an area that will probably be
utilized more for metabolomics in the future. Even though the new
SOFAST83-86 experiments are more or less only suitable for 2D 1H-15N
HSQC, similar experiments that are useful for 2D 1H-13C HSQC may exist
in the future. This would be of great importance for the usefulness of
2D-NMR (1H-13C) experiments because of the reduced acquisition times.
In many metabolomics studies with 1H NMR there is severe overlap
between peaks, which makes it more difficult to assign and quantify
the data properly. In 2D-NMR experiments the problem of overlap can be
significantly reduced. In many metabolic studies it is easy to obtain
a large number of samples. The complexity of the spectra and the large
amount of data have divided the field into two parts: the chemometric
approach and quantified metabolomics (also called magnetic resonance
diagnostics or targeted profiling).87 The biggest difference between
them is that the chemometric approach first finds patterns of
important peaks and then tries to identify them, while quantified
metabolomics identifies and quantifies peaks before any statistical
method is applied. In the chemometric approach there are many methods
for analyzing the NMR data. Studies have, for example, been done with
PCA, PLS, OPLS and STOCSY.15,17,88-93

8. Results and Discussion

8.1. MVA analyses with 2D NMR on Populus wood

In paper I, we demonstrate how a combination of 2D 1H-13C HSQC NMR
spectroscopy and multivariate data analysis can be used to visualize
differences between samples. In this case the purpose is to study the
difference between tension wood (TW) and normal wood (NW) in Populus.
This study was the first in which 1H-13C HSQC data from dissolved wood
were evaluated with MVA, and it thus served as a proof of concept for
this approach. Sample preparation and data analysis were performed as
described in chapter 5.2. In short, the following steps were involved.

1. Pre-treatment of data: alignment, normalization and noise exclusion.

2. Unfolding of 2D NMR data so that each spectrum is transformed to one row in a data matrix.

3. Multivariate data analysis: either PCA or OPLS-DA to investigate variations between wood samples.

4. Refolding of loadings to 2D loading spectra, which are analysed in TOPSPIN.

The first analysis was the comparison of tension wood and normal wood.
Tension wood contains a cellulose-rich layer called the G-layer and
therefore contains more cellulose in total than normal wood. Since
tension wood is so well studied, we already had good knowledge of the
differences in composition that we should observe: more cellulose in
tension wood and relatively higher amounts of lignin and hemicellulose
in normal wood. This comparison was therefore a good model data set to
show the power of our method.

For the tension/normal wood problem, the multivariate analysis was
done with a PCA on mean-centred data, and a good separation between
the tension wood and normal wood samples was obtained in the score
plot (Figure 18a). Red peaks in Figure 18b are positive and thus
elevated in tension wood compared to normal wood; as expected, these
peaks correspond to cellulose. Blue and green peaks are negative and
correspond to lignin and hemicellulose.

Figure 18. PCA of HSQC spectra from tension wood and normal wood. A) Score

plot showing tension wood in red and normal wood in green. B) 2D loading plot

constructed from the loading vector along the first principal component (t1).

We were also interested in studying the lignin composition in more
detail. No information about this could be extracted from the PCA in
Figure 18, because the difference in cellulose content dominates the
model. A second PCA was therefore done in which only the region of the
spectra containing the aromatic peaks from lignin was included (Figure
19).
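Restricting a model to a spectral region, as done here for the aromatic region, amounts to cutting out a sub-block of every 2D spectrum before unfolding. The NumPy sketch below illustrates the idea; the function name, the axis grids and the ppm limits (100-140 ppm in 13C, 6-8 ppm in 1H) are hypothetical and chosen only for illustration, not taken from paper I.

```python
import numpy as np

def select_region(spectra, f1_ppm, f2_ppm, f1_range, f2_range):
    """Keep only the points of each 2D spectrum inside the given ppm ranges."""
    m1 = (f1_ppm >= min(f1_range)) & (f1_ppm <= max(f1_range))
    m2 = (f2_ppm >= min(f2_range)) & (f2_ppm <= max(f2_range))
    return spectra[:, m1][:, :, m2]

# Hypothetical coarse axes: F1 = 13C (160 to 0 ppm), F2 = 1H (10 to 0 ppm).
f1_axis = np.linspace(160.0, 0.0, 9)     # 20 ppm steps
f2_axis = np.linspace(10.0, 0.0, 11)     # 1 ppm steps
spectra_demo = np.zeros((2, f1_axis.size, f2_axis.size))

# Assumed limits for an aromatic window, for illustration only.
region = select_region(spectra_demo, f1_axis, f2_axis, (100.0, 140.0), (6.0, 8.0))
X_region = region.reshape(region.shape[0], -1)   # unfolded for PCA
```

The same masking, inverted, could be used to exclude unwanted regions such as solvent peaks.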

Figure 19. PCA model of the aromatic region, which separates the tension wood (TW)
samples from the normal wood (NW) samples. A) Score plot with TW samples positive
in t1 and NW samples negative. B) Corresponding loading plot where red peaks are
positive and black peaks negative.

In this model we could also see a separation between tension wood

and normal wood samples and the loading plot shows that we have

higher amounts of syringyl (S) lignin in tension wood. This result was

also expected based on previous knowledge about tension wood.94,95

An even clearer difference would probably be seen with an OPLS-DA

since the variation not correlated with the response would have been

removed.

The second part of this study was the investigation of wood from
poplar with down-regulation of the enzyme pectin methyl esterase
(PME). The most interesting result from an OPLS-DA model comparing the
transgenic trees with wild-type trees was a difference in the S/G
ratio of lignin: samples from the transgenic trees had a relatively
higher amount of syringyl compared to guaiacyl. These peaks are also
highly important for the separation between transgenic and normal
trees, since they are visible even with a cut-off of 0.5. The S/G
ratio is important for the paper industry because syringyl lignin is
less costly to remove. This result was not expected but was confirmed
by pyrolysis-MS analysis. The conclusion from this study is that using
MVA on 2D NMR data from dissolved wood samples is a useful method.

8.2. Investigation of dissolving pulp

In papers II and III, dissolving pulps from different manufacturers
were investigated. Paper II focuses on how dissolving pulp samples
with different characteristics, such as viscosity, raw material used
in the production (hardwood or softwood) and process type (sulfite or
sulfate), differ in their chemical composition. 2D 1H-13C HSQC NMR,
13C CP-MAS NMR and FT-IR were used in this study. In paper III we
focused on correlating different reactivity parameters of these
dissolving pulps to 2D HSQC spectra and data from XRD.

The aim of paper II was to study a broad range of dissolving pulps
with different characteristics, in order to learn more about how the
choice of raw material and process type influences the final product.
The samples were analysed with FT-IR, solid-state NMR and 2D 1H-13C
HSQC NMR. Each analytical method was combined with a multivariate
method, either PCA or OPLS. Viscosity, raw material and process type
were used as responses in these models.

Unfortunately, no reliable models could be calculated for viscosity.
For raw material, models were achieved with data from both FT-IR and
2D 1H-13C HSQC NMR.

In the OPLS model on HSQC data with softwood (SW) and hardwood (HW) as
Y-variables, a decent model was achieved (Q2 = 0.4); see Figure 20.
Softwood samples, colored in red, are positive in tp1 and have more of
the positive (red) peaks in the corresponding loading plot (Figure
20b). It is clear that mannose (M) is highly important for the model
that describes the difference between softwood and hardwood samples.
Therefore, at least for these samples, dissolving pulp made from
softwood has a relatively higher amount of mannose (from glucomannan)
than hardwood pulp. The model of the FT-IR data was harder to
interpret but could nevertheless be used to classify dissolving pulps
as made from softwood or hardwood.

Figure 20. OPLS-DA with softwood (SW) and hardwood (HW) as discriminant. a)

Score plot showing softwood pulps in red and hardwood pulps in black. b) 2D

loading plot where positive peaks are shown in red and negative peaks in black.

For process type (sulfite and sulfate), a good OPLS-DA model based on
HSQC data was also achieved, with Q2 = 0.64.
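The Q2 values quoted for these models summarize cross-validated predictive ability. A minimal sketch of the commonly used definition, Q2 = 1 - PRESS/SS, is given below; it assumes that cross-validated predictions are already available, and it does not reproduce the exact cross-validation procedure of the software used in the thesis. The function name and the demo numbers are hypothetical.

```python
import numpy as np

def q2(y_true, y_cv_pred):
    """Q2 = 1 - PRESS / SS, with predictions taken from cross-validation."""
    y_true = np.asarray(y_true, dtype=float)
    y_cv_pred = np.asarray(y_cv_pred, dtype=float)
    press = np.sum((y_true - y_cv_pred) ** 2)        # prediction error sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares of y
    return 1.0 - press / ss_tot

# Hypothetical two-class example: classes coded 0/1, predictions from left-out samples.
y_class = [0, 0, 1, 1]
y_predicted = [0.2, 0.2, 0.8, 0.8]
q2_demo = q2(y_class, y_predicted)
```

A Q2 close to 1 means that left-out samples are predicted well, while values near zero or below indicate a model without predictive ability.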

In the score plot (Figure 21a), samples from the sulfate process are
positive and therefore contain relatively more of the positive (red)
peaks in the corresponding loading plot (Figure 21b). Pulps from the
sulfate process have less of the negative peaks, that is, less of the
reducing end of cellulose, while pulps from the sulfite process have
more of the reducing end of cellulose (Cred α and Cred β). This
implies that dissolving pulps from the sulfite process contain more
short cellulose chains. The loading plot is shown with a cut-off at
0.6, so that only the peaks with the highest impact on the model are
displayed.

Figure 21. OPLS-DA with process type as discriminant. a) Sulfate samples are

positive in the score plot and sulfite negative. b) Corresponding loading plot to t1. A

cut off at 0.6 used to plot only the peaks that are most important for the model.

The OPLS models discussed above were made to look at specific
parameters of the dissolving pulps. An unsupervised PCA of both the 2D
HSQC and the 13C CP-MAS data was also performed, and these models show
that the main difference between the analyzed samples is which
producer they come from. This is no surprise, because many process
steps are similar for pulps from the same producer. The solid-state
13C NMR combined with PCA is quite interesting because differences in
cellulose crystallinity could be detected (Figure 22). This is
something that cannot be detected by NMR in solution.

The C4 and C6 peaks from cellulose are each split into two peaks, a
crystalline (cr) and an amorphous (am) part. Observations at the
bottom of Figure 22a (observation 1, for example) are negative in t2
in the score plot. This indicates that they have more of the negative
peaks in the corresponding loading plot p2, shown in Figure 22b. From
the loading plot it is clear that sample 1 (and to a lesser degree
sample 3) has more amorphous cellulose than the other samples.

Figure 22. PCA of 13C CP-MAS NMR data. a) Score plot. b) Loading plot showing p2. CII is most likely a peak from the cellulose II polymorph, and cr and am correspond to crystalline and amorphous cellulose, respectively.

In paper III the purpose was to correlate 2D 1H-13C HSQC data and XRD data from dissolving pulp with different reactivity parameters. Samples from different producers with different reactivity parameters were studied. As discussed in chapter 5.2, reactivity is very important for further refining of dissolving pulp into other products. Multivariate models of the different reactivity parameters, based on both 1H-13C HSQC NMR data and XRD data, gave some interesting results.

Perhaps the most interesting result is how well the composition of the hemicellulose in dissolving pulp correlates with reactivity. 1H-13C HSQC NMR data are well suited to studying differences in the amount of hemicellulose. Figure 23a is the score plot with gel as the response (Q2 = 0.55). Samples that are positive in the score plot have more of the peaks that are positive in the corresponding loading plot (Figure 23b). Samples to the right in the score plot have higher gel values, and these samples have more xylose (xylan). The score plot with Vis as the response is shown in Figure 23c. In this score plot, samples with lower Vis values are located to the left. They have less of the positive peaks in the corresponding loading plot (Figure 23d), so samples with low Vis values have low mannose (glucomannan) content. It is no surprise that the amount of hemicellulose affects the reactivity of dissolving pulps, but it is interesting to note that this result implies that xylan and glucomannan influence these two reactivity parameters differently.
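
For readers unfamiliar with Q2: it is the cross-validated counterpart of R2, computed as 1 - PRESS/SS, where PRESS sums the squared prediction errors for samples left out of the model fit. The sketch below shows the idea with leave-one-out cross-validation and a one-predictor least-squares line. Both choices are simplifying assumptions made for illustration. The models in paper III are OPLS models, and the cross-validation scheme used there is not stated here. All numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_leave_one_out(xs, ys):
    """Q2 = 1 - PRESS / SS_total; each sample is predicted by a model fitted without it."""
    my = sum(ys) / len(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / ss_total


# Made-up predictor (for example a score value) and response values.
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
y = [1.0, 1.3, 1.9, 2.1, 2.9, 3.0]
q2 = q2_leave_one_out(x, y)
print(f"Q2 = {q2:.2f}")
```

A Q2 close to 1 means the left-out samples are predicted well, while a value near or below 0 means the model predicts no better than the mean of the response.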

Figure 23. OPLS with 1H-13C HSQC. a) Score plot of Gel as response. b)

Corresponding loading plot to gel. c) Score plot of Vis as response. d)

Corresponding loading plot to Vis.

Reactivity could also be correlated with crystallinity when the raw XRD data were modeled against Fock and Vis (paper III). Samples with a low Vis value had higher crystallinity. For Fock, high reactivity correlated with low crystallinity.

We also attempted to calculate the crystallinity of these pulp samples with 13C CP-MAS NMR using several different approaches (described in chapter 5.1). All crystallinity values from both 13C CP-MAS NMR and XRD are summarized in Table 2.

Table 2. Crystallinity index (CI) measured with different techniques and calculation methods: linear combination (lin. comb.), deconvolution (dec) and peak height (ph).

Sample   NMR          NMR (dec)    NMR (dec)   XRD (ph)   XRD (dec)
         lin. comb.   lin. comb.
1        67.1         65.3         62.9        64.6       52.3
2        68.0         63.4         62.2        56.7       49.0
3        65.5         63.3         59.5        59.1       49.7
4        67.7         66.4         68.9        67.2       54.0
5        66.8         65.1         61.7        60.9       51.4
6        67.3         65.2         64.0        68.6       55.5
7        65.1         63.5         61.4        70.5       56.8
8        67.1         64.4         71.1        59.7       56.6
9        67.1         65.2         68.3        58.2       51.1

When comparing the CI values calculated with XRD and NMR, there is no clear pattern. This is probably due to the low resolution of the 13C CP-MAS spectra (typical spectra are shown in Figure 8). Therefore, we chose to base our crystallinity measurements on the XRD data.
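
One way to make this comparison concrete is to compute pairwise Pearson correlation coefficients between the columns of Table 2. The sketch below does this with the table values. The use of Pearson correlation is an illustration and not necessarily the comparison made in paper III, and the function name is an assumption.

```python
from itertools import combinations
from math import sqrt

# Crystallinity index values from Table 2 (samples 1-9).
ci = {
    "NMR lin. comb.":       [67.1, 68.0, 65.5, 67.7, 66.8, 67.3, 65.1, 67.1, 67.1],
    "NMR (dec) lin. comb.": [65.3, 63.4, 63.3, 66.4, 65.1, 65.2, 63.5, 64.4, 65.2],
    "NMR (dec)":            [62.9, 62.2, 59.5, 68.9, 61.7, 64.0, 61.4, 71.1, 68.3],
    "XRD (ph)":             [64.6, 56.7, 59.1, 67.2, 60.9, 68.6, 70.5, 59.7, 58.2],
    "XRD (dec)":            [52.3, 49.0, 49.7, 54.0, 51.4, 55.5, 56.8, 56.6, 51.1],
}


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


# Correlation for every pair of methods (10 pairs for 5 columns).
pairs = {(p, q): pearson(ci[p], ci[q]) for p, q in combinations(ci, 2)}
for (p, q), r in pairs.items():
    print(f"{p:22s} vs {q:22s} r = {r:+.2f}")
```

With only nine samples, any single coefficient should be read cautiously. The point of the sketch is to compare the NMR-based and XRD-based columns pair by pair.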

8.3. Correlation analysis of 2D HSQC spectra (HSQC-STOCSY)

The method used in paper IV is an extension of the Statistical Total Correlation Spectroscopy (STOCSY) method used in metabolomics studies with 1D 1H NMR.15,17,90,96-100 The purpose of the STOCSY method is to identify metabolites of interest by using the high correlation between peaks that originate from the same compound. We have implemented this approach for 2D NMR data. In addition to metabolite identification, the method has a number of possible applications, for example protein dynamics studies, identification of individual components in mixtures and drug/target studies. We extended the correlation analysis used in STOCSY to 2D HSQC NMR data in order to take advantage of the reduced overlap between peaks in this type of experiment.

Statistical Total Correlation Spectroscopy performed on 1H spectral data is based on calculating a full correlation matrix, C, for a set of 1H NMR spectra (equation 8).15,90,101 The correlation matrix can then be displayed as a 2D plot with “cross-peaks” between peaks that correlate with each other. The superficial resemblance between such a plot and a 2D Total Correlation Spectroscopy (TOCSY) spectrum is the origin of the name STOCSY.

Equation 8    $C = \frac{1}{n-1} X^{t} X$

In equation 8, C is the correlation matrix, X is the data matrix and n is the number of spectra.

Instead of calculating the full correlation matrix, an alternative approach can be used in which only one selected peak is correlated with the other peaks (equation 9).

Equation 9    $c_{peak} = \frac{1}{n-1} v_{peak}^{t} X$

In equation 9, cpeak is the correlation vector, vpeak is the vector of the chosen peak, n is the number of spectra and X is the matrix with spectra. Equation 9 has been used in Diffusion Ordered Projection Spectroscopy (DOPY).17
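
The difference between the two calculations can be sketched in a few lines. The example assumes that the columns of X are autoscaled (mean-centred and scaled to unit variance), so that dividing the products by n - 1 gives Pearson correlations. That scaling step, the function names and the toy data are assumptions made for illustration and are not details taken from paper IV.

```python
from math import sqrt


def autoscale(X):
    """Mean-centre each column and scale it to unit sample standard deviation."""
    n, m = len(X), len(X[0])
    cols = []
    for j in range(m):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(m)] for i in range(n)]


def full_correlation(Xs):
    """Full matrix approach: C = Xs^t Xs / (n - 1), an m-by-m matrix."""
    n, m = len(Xs), len(Xs[0])
    return [[sum(Xs[i][a] * Xs[i][b] for i in range(n)) / (n - 1)
             for b in range(m)] for a in range(m)]


def peak_correlation(Xs, peak):
    """Single-peak approach: correlate one chosen column with all columns (length m)."""
    n, m = len(Xs), len(Xs[0])
    v = [Xs[i][peak] for i in range(n)]
    return [sum(v[i] * Xs[i][b] for i in range(n)) / (n - 1) for b in range(m)]


# Toy data: 5 spectra (rows) x 4 variables (columns); columns 0 and 1 co-vary.
X = [[1.0, 2.1, 5.0, 0.3],
     [2.0, 4.0, 4.1, 0.9],
     [3.0, 6.2, 5.2, 0.1],
     [4.0, 7.9, 4.0, 0.8],
     [5.0, 10.1, 4.9, 0.4]]
Xs = autoscale(X)
C = full_correlation(Xs)
c_peak = peak_correlation(Xs, 0)
print([round(c, 2) for c in c_peak])
```

The single-peak vector is simply one row of the full matrix, but it needs m values instead of m x m. For a hypothetical 2D spectrum of 1024 x 1024 points, m is about 10^6, so the full matrix would hold about 10^12 elements.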

For 2D NMR data, we chose equation 9 because the full correlation matrix C would be enormous given the very large number of data points in 2D spectra. In this study, we have shown the advantage of using this method for identification of individual components in two sets of complex mixtures. One data set was a synthetic metabolomics data set and the other a real metabolomics data set from human biological fluid. The correlation analysis was implemented in the Matlab script described in chapter 5.2 (see Figures 24 and 25). In short, after the 2D spectra have been unfolded, one peak of interest is chosen and cpeak is calculated. This vector is then refolded into a 2D correlation plot with the same dimensions as the original spectra. The plot shows all peaks that are correlated with the chosen peak. The values of cpeak vary between 0 (no correlation) and 1 (perfect correlation), and an appropriate cut-off value can be chosen to show only the peaks originating from the same molecule. Figure 25 shows the actual graphical interface of the Matlab script and some of the available features.
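
The unfold, correlate and refold steps can be sketched as follows. This is an illustration in Python and not the Matlab script itself. The function names, the tiny 2 x 3 toy "spectra" and the use of the Pearson coefficient are assumptions. Note that a Pearson coefficient can also be negative, so the sketch keeps only points at or above a positive cut-off.

```python
from math import sqrt


def unfold(spectrum):
    """Flatten a 2D spectrum (list of rows) into one long vector."""
    return [v for row in spectrum for v in row]


def refold(vector, n_cols):
    """Reshape a vector back into rows of n_cols points."""
    return [vector[i:i + n_cols] for i in range(0, len(vector), n_cols)]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / sqrt(saa * sbb)


def correlation_plot(spectra, peak_row, peak_col, cutoff=0.8):
    """Correlate one chosen point with all points and refold the result to 2D."""
    n_cols = len(spectra[0][0])
    X = [unfold(s) for s in spectra]                 # one row per spectrum
    columns = list(zip(*X))                          # one tuple per data point
    driver = columns[peak_row * n_cols + peak_col]   # intensities of the chosen peak
    c_peak = [pearson(driver, col) for col in columns]
    masked = [c if c >= cutoff else 0.0 for c in c_peak]
    return refold(masked, n_cols)


# Four toy 2x3 spectra: points (0,0) and (1,2) vary together across the samples.
spectra = [
    [[1.0, 5.0, 0.2], [0.3, 2.0, 2.0]],
    [[2.0, 4.0, 0.1], [0.3, 2.5, 4.1]],
    [[3.0, 5.5, 0.3], [0.3, 1.0, 5.9]],
    [[4.0, 4.5, 0.2], [0.3, 2.2, 8.0]],
]
plot = correlation_plot(spectra, peak_row=0, peak_col=0)
for row in plot:
    print([round(v, 2) for v in row])
```

Only the chosen point and the point that co-varies with it survive the cut-off. In a real data set the surviving points would outline the subspectrum of one compound.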

Figure 24. Description of the script for assignment of metabolic samples. Steps 1 and 2 are the same as described previously in paper I. 3) One peak of interest is chosen from the raw spectra. This peak is correlated with the other peaks using equation 9. 4) Correlated peaks above a manually chosen cut-off are refolded and plotted together as an NMR spectrum.

Figure 25. Picture of the graphical user interface for the HSQC-STOCSY method.

In the first part of the study, 28 common metabolites were mixed, and 7 of these (2 of them varied in the same way) were varied according to a fractional factorial design (2^(6-2)). This resulted in a total of 19 samples. One peak from each varied metabolite was chosen for the correlation analysis, which resulted in six correlation plots. From these, all of the varied metabolites could be assigned, and the assignments were confirmed by comparison with data from HMDB.ca. The correlation plots were obtained with a correlation cut-off of 0.8. The cut-off can easily be varied, which is also necessary for some metabolites, especially in crowded regions of the spectrum. Two examples of correlation plots are shown in Figure 26.
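
A 2^(6-2) fractional factorial design has 16 runs: four factors form a full two-level design and the remaining two are set by generators. The sketch below builds such a design. The generators E = ABC and F = BCD are a common textbook choice and are an assumption here, as is the guess that the remaining 3 of the 19 samples were centre points. Neither detail is stated in the text.

```python
from itertools import product


def fractional_factorial_6_2():
    """16-run two-level design for six factors A-F with E = ABC and F = BCD (assumed generators)."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c, b * c * d))
    return runs


design = fractional_factorial_6_2()
center_points = [(0,) * 6] * 3   # assumed: three centre points to reach 19 samples
samples = design + center_points
print(len(design), len(samples))  # prints: 16 19
```

Each factor is at its low level in eight runs and at its high level in eight, and any two factor columns are orthogonal. This means the six varied concentrations change independently of each other across the samples, which is what a correlation analysis needs in order to separate the metabolites.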

Figure 26. Examples of HSQC-STOCSY correlation plots. a) Assigned proline/fructose spectrum. b) Assigned α-D-maltose spectrum.

In the second part, real samples from a metabolomics study were used to test the assignment of such spectra. Peaks with high correlations could be plotted, and even though some of the molecules did not exist in the HMDB.ca database, the correlation plots are very suitable for assignment in metabolomics experiments. All of the metabolites (in both synthetic and real samples) were successfully assigned using the HMDB.ca database.87,102-104

9. Future perspective and conclusion

The method for dissolving wood samples to obtain well-resolved 2D 1H-13C HSQC NMR spectra, published by the group of John Ralph in 2003,65 opened a new and important area of research. This method (used in papers I-III), combined with multivariate statistics, has proven to be well suited to analysing wood samples and wood products. In paper I, the difference between tension wood and normal wood was studied. The study showed that the approach is very useful for investigating differences in both carbohydrate and lignin composition between different types of wood samples, for example between different transgenic trees. We showed that differences in the S/G ratio of lignin in a transgenic line of poplar could be detected. The S/G ratio is important because it affects how efficiently lignin can be removed from wood in the pulping process.

Dissolving pulps from different producers were examined in papers II-III. Using different spectroscopic techniques (primarily 2D 1H-13C HSQC NMR), we showed that our samples differ with respect to both the raw material used in production and the type of process (sulfite or sulfate). There are differences in hemicellulose composition between softwood and hardwood pulp.

The other difference is that dissolving pulp from the sulfite process has a higher amount of shorter cellulose chains. These findings are not revolutionary in themselves, but the fact that all of this information can be obtained from a single NMR experiment on each dissolving pulp sample is very promising.

Reactivity parameters for the dissolving pulp samples were also correlated with 2D HSQC data and XRD data using MVA. The XRD data showed that the crystallinity of cellulose has an effect on reactivity, and with the 2D NMR data we could show that the hemicellulose content and composition are important for reactivity.

Many of the analytical methods used today for characterization of wood and pulp are very time consuming and expensive. The spectroscopic methods used in this thesis will probably not replace them (at least not in the pulp industry), but we have shown that NMR, and specifically 2D HSQC NMR, is a powerful tool in wood analysis. More knowledge about cellulose and other wood components is still needed, and research in this field therefore continues to be important. Comparing different spectroscopic methods with respect to cellulose morphology is one subject that should be examined further. Inspired by the multivariate approach, it would, for example, be interesting to examine and compare the methods with O2PLS.

In paper IV we developed a new approach for identification of individual components from 2D 1H-13C HSQC spectra of complex mixtures. It is based on the STOCSY approach that has been used on 1D NMR data, which we have extended to 2D NMR. With this method we were able to extract subspectra (or correlation plots) of several different metabolites from a metabolomics data set. These spectra could be unambiguously assigned.

Such correlation plots could also be important in other areas such as protein dynamics (and assignment) and drug/target analyses.

In HSQC spectra of wood, there are still peaks that are unassigned and

the use of 2D correlation plots could assist the assignment of more

peaks in these spectra.

10. Acknowledgements

Thanks to everyone who collaborated and made this possible!

Special thanks to:

Mattias, my supervisor through all these years. Thank you for the opportunity, and what a fantastic supervisor you are! I really have nothing to complain about from all these years!

Henrik, my assistant supervisor. Thank you for all the help, all the ideas and the fun discussions. One might think that I am following you around, but at least I made it back to Piteå first.

Bertil and Peter, for your collaboration on the dissolving pulps.

Nils, András, Tryggve and all the other co-authors of the papers.

The group! Pär, Anna, the Elins, Lina.

The Morssy family, who let me stay as much as I wanted!

The Lättströms, so many good times... when are you moving to Piteå?

The Moens, who shared the driving to all the practices.

Pelle, Maria J, Barbro and Carina, for all the practical help.

The beach guys Micke, Isak, Tom and Rami, who brightened up Friday lunches with fun matches...

An extra big thank-you to my family! Mia, Melker, Sixten and Svante... you are simply the best!

The rest of the family: Mum, Dad, Anders and family... and of course Marianne, Rickard, Lasse and Monica. For all the help with everything!

11. References

1. Purcell, E.M., Torrey, H.C. & Pound, R.V. Resonance Absorption by Nuclear Magnetic Moments in a Solid. Physical Review 69, 37-38 (1946).

2. Bloch, F., Hansen, W.W. & Packard, M. The Nuclear Induction Experiment. Physical Review 70, 474-485 (1946).

3. Overhauser, A.W. Paramagnetic Relaxation in Metals. Physical Review 89, 689-700 (1953).

4. Lauterbur, P.C. C13 Nuclear Magnetic Resonance Spectra. The Journal of Chemical Physics 26, 217-218 (1957).

5. Holm, C.H. Observation of Chemical Shielding and Spin Coupling of C13 Nuclei in Various Chemical Compounds by Nuclear Magnetic Resonance. The Journal of Chemical Physics 26, 707-708 (1957).

6. Nagayama, K., Wuthrich, K., Bachmann, P. & Ernst, R.R. Two-dimensional J-resolved 1H n.m.r. spectroscopy for studies of biological macromolecules. Biochem Biophys Res Commun 78, 99-105 (1977).

7. Nagayama, K., Wuthrich, K. & Ernst, R.R. Two-dimensional spin echo correlated spectroscopy (SECSY) for 1H NMR studies of biological macromolecules. Biochem Biophys Res Commun 90, 305-311 (1979).

8. Bartholdi, E. & Ernst, R.R. Fourier spectroscopy and the causality principle. Journal of Magnetic Resonance (1969) 11, 9-19 (1973).

9. Schäublin, S., Höhener, A. & Ernst, R.R. Fourier spectroscopy of nonequilibrium states, application to CIDNP, Overhauser experiments and relaxation time measurements. Journal of Magnetic Resonance (1969) 13, 196-216 (1974).

10. Wider, G., Macura, S., Kumar, A., Ernst, R.R. & Wüthrich, K. Homonuclear two-dimensional 1H NMR of proteins. Experimental procedures. Journal of Magnetic Resonance (1969) 56, 207-234 (1984).

11. Williamson, M.P., Havel, T.F. & Wüthrich, K. Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology 182, 295-315 (1985).

12. Williamson, M.P., Marion, D. & Wüthrich, K. Secondary structure in the solution conformation of the proteinase inhibitor IIA from bull seminal plasma by nuclear magnetic resonance. Journal of Molecular Biology 173, 341-359 (1984).

13. Mansfield, P. Multi-planar image formation using NMR spin echoes. Journal of Physics C: Solid State Physics 10, 55-58 (1977).

14. Lauterbur, P.C. Image formation by induced local interactions: examples of employing nuclear magnetic resonance. Nature 242, 190-191 (1973).

15. Cloarec, O., et al. Statistical total correlation spectroscopy: An exploratory approach for latent biomarker identification from metabolic H-1 NMR data sets. Analytical Chemistry 77, 1282-1289 (2005).

16. Cloarec, O., et al. Evaluation of the Orthogonal Projection on Latent Structure Model Limitations Caused by Chemical Shift Variability and Improved Visualization of Biomarker Changes in 1H NMR Spectroscopic Metabonomic Studies. Analytical Chemistry 77, 517-526 (2004).

17. Smith, L.M., et al. Statistical correlation and projection methods for improved information recovery from diffusion-edited NMR spectra of biological samples. Analytical Chemistry 79, 5682-5689 (2007).

18. Cavanagh, J., Palmer III, A.G., Wright, P.E. & Rance, M. Sensitivity improvement in proton-detected two-dimensional heteronuclear relay spectroscopy. Journal of Magnetic Resonance (1969) 91, 429-436 (1991).

19. Schleucher, J., et al. A general enhancement scheme in heteronuclear multidimensional NMR employing pulsed-field gradients. Journal of Biomolecular NMR 4, 301-306 (1994).

20. Willker, W., Leibfritz, D., Kerssebaum, R. & Bermel, W. Gradient selection in inverse heteronuclear correlation spectroscopy. Magnetic Resonance in Chemistry 31, 287-292 (1993).

21. Tannús, A. & Garwood, M. Adiabatic pulses. NMR in Biomedicine 10, 423-434 (1997).

22. Nicholson, J.K., Foxall, P.J., Spraul, M., Farrant, R.D. & Lindon, J.C. 750 MHz 1H and 1H-13C NMR spectroscopy of human blood plasma. Anal Chem 67, 793-811 (1995).

23. Holmes, E., et al. 750 MHz 1H NMR spectroscopy characterisation of the complex metabolic pattern of urine from patients with inborn errors of metabolism: 2-hydroxyglutaric aciduria and maple syrup urine disease. Journal of Pharmaceutical and Biomedical Analysis 15, 1647-1659 (1997).

24. Atalla, R.H., Gast, J.C., Sindorf, D.W., Bartuska, V.J. & Maciel, G.E. C-13 NMR Spectra of Cellulose Polymorphs. Journal of the American Chemical Society 102, 3249-3251 (1980).

25. Isogai, A., Kato, T., Uryu, T. & Atalla, R.H. Solid-State CP/MAS C-13 NMR Analysis of Cellulose and Tri-O-Substituted Cellulose Ethers. Carbohyd Polym 21, 277-281 (1993).

26. Isogai, A., Usuda, M., Kato, T., Uryu, T. & Atalla, R.H. Solid-state CP/MAS C-13 NMR study of cellulose polymorphs. Macromolecules 22, 3168-3172 (1989).

27. Hult, E.L., Larsson, P.T. & Iversen, T. A comparative CP/MAS C-13-NMR study of cellulose structure in spruce wood and kraft pulp. Cellulose 7, 35-55 (2000).

28. Hult, E.L., Larsson, P.T. & Iversen, T. A comparative CP/MAS C-13-NMR study of the supermolecular structure of polysaccharides in sulphite and kraft pulps. Holzforschung 56, 179-184 (2002).

29. Hult, E.L., Liitia, T., Maunu, S.L., Hortling, B. & Iversen, T. A CP/MAS C-13-NMR study of cellulose structure on the surface of refined kraft pulp fibers. Carbohyd Polym 49, 231-234 (2002).

30. Iversen, T., Hult, E.L., Larsson, P.T. & Wickholm, K. CP/MAS C-13 NMR spectroscopy applied to structure studies on cellulose I. Abstr Pap Am Chem S 219, U279-U279 (2000).

31. Larsson, P.T., Hult, E.L., Wickholm, K., Pettersson, E. & Iversen, T. CP/MAS C-13-NMR spectroscopy applied to structure and interaction studies on cellulose I. Solid State Nucl Mag 15, 31-40 (1999).

32. Wickholm, K., Hult, E.L., Larsson, P.T., Iversen, T. & Lennholm, H. Quantification of cellulose forms in complex cellulose materials: a chemometric model. Cellulose 8, 139-148 (2001).

33. Kester, M., Trung, T., Leclerc, D. & Carver, J. On-line determination of kraft liquor constituents by Fourier-transform near infrared spectroscopy. J Pulp Pap Sci 30, 121-128 (2004).

34. Naumann, D. Infrared Spectroscopy in Microbiology. in Encyclopedia of Analytical Chemistry (John Wiley & Sons, Ltd, 2006).

35. Zhbankov, R.G., Korolevich, M.V., Derendyaev, B.G. & Piottukh-Peletsky, V.N. Structural similarity and modeling of infrared spectra of molecules of organic compounds. Journal of Molecular Structure 744, 937-945 (2005).

36. Zhbankov, R.G., Zhbankova, M.R., Baran, J., Marchewka, M. & Ratajczak, H. Anomalously high intensity of bands in the FTR-spectra of individual kinds of polymers. Journal of Molecular Structure 744, 585-587 (2005).

37. Meyer, K.H. & Misch, L. Positions des atomes dans le nouveau modèle spatial de la cellulose. Helvetica Chimica Acta 20, 232-244 (1937).

38. Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine 2, 559-572 (1901).

39. Hotelling, H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 417-441 (1933).

40. Jolliffe, I. Principal Component Analysis. in Encyclopedia of Statistics in Behavioral Science (John Wiley & Sons, Ltd, 2005).

41. Wold, S., Esbensen, K. & Geladi, P. Principal component analysis. Chemometrics Intell. Lab. Syst. 2, 37-52 (1987).

42. Jackson, G.M., Mason, I.M. & Greenhalgh, S.A. Principal Component Transforms of Triaxial Recordings by Singular Value Decomposition. Geophysics 56, 528-533 (1991).

43. Geladi, P. & Kowalski, B.R. Partial Least-Squares Regression - a Tutorial. Anal Chim Acta 185, 1-17 (1986).

44. Gerlach, R.W., Kowalski, B.R. & Wold, H.O.A. Partial Least-Squares Path Modeling with Latent-Variables. Anal Chim Acta-Comp 3, 417-421 (1979).

45. Wold, S., Antti, H., Lindgren, F. & Öhman, J. Orthogonal signal correction of near-infrared spectra. Chemometrics Intell. Lab. Syst. 44, 175-185 (1998).

46. Trygg, J. & Wold, S. Orthogonal projections to latent structures (O-PLS). J Chemometr 16, 119-128 (2002).

47. Thomas, R.J. Wood: Structure and Chemical Composition. in Wood Technology: Chemical Aspects, Vol. 43 1-23 (American Chemical Society, 1977).

48. Boerjan, W., Ralph, J. & Baucher, M. Lignin biosynthesis. Annual Review of Plant Biology 54, 519-546 (2003).

49. Hayashi, J., Sufoka, A., Ohkita, J. & Watanabe, S. Confirmation of existence of cellulose III(I), III(II), IV(I) and IV(II) by X-ray method. Journal of Polymer Science Part C-Polymer Letters 13, 23-27 (1975).

50. Maki-Arvela, P., Salmi, T., Holmbom, B., Willfor, S. & Murzin, D.Y. Synthesis of sugars by hydrolysis of hemicelluloses - a review. Chemical Reviews 111, 5638-5666 (2011).

51. Lapierre, C., et al. Structural alterations of lignins in transgenic poplars with depressed cinnamyl alcohol dehydrogenase or caffeic acid O-methyltransferase activity have an opposite impact on the efficiency of industrial kraft pulping. Plant Physiol 119, 153-164 (1999).

52. Simmons, B.A., Loqué, D. & Ralph, J. Advances in modifying lignin for enhanced biofuel production. Current Opinion in Plant Biology 13, 312-319 (2010).

53. Vanholme, R., Morreel, K., Ralph, J. & Boerjan, W. Lignin engineering. Current Opinion in Plant Biology 11, 278-285 (2008).

54. Davin, L.B. & Lewis, N.G. Lignin primary structures and dirigent sites. Current Opinion in Biotechnology 16, 407-415 (2005).

55. Christoffersson, K.E., Sjöström, M., Edlund, U., Lindgren, A. & Dolk, M. Reactivity of dissolving pulp: characterisation using chemical properties, NMR spectroscopy and multivariate data analysis. Cellulose 9, 159-170 (2002).

56. Isogai, A. Solid-State C-13 NMR and X-Ray Diffraction Analyses of Various Cellulose III Samples. Abstr Pap Am Chem S 204, 31-Cell (1992).

57. Kunze, J., Scheler, G., Schroter, B. & Philipp, B. C-13 High-Resolution Solid-State NMR Studies on Cellulose Samples of Different Physical Structure. Polym Bull 10, 56-62 (1983).

58. Strunk, P., Oman, T., Gorzsas, A., Hedenstrom, M. & Eliasson, B. Characterization of dissolving pulp by multivariate data analysis of FT-IR and NMR spectra. Nordic Pulp & Paper Research Journal 26, 398-409 (2011).

59. Suter, D. & Ernst, R.R. Spin diffusion in resolved solid-state NMR spectra. Phys Rev B Condens Matter 32, 5608-5627 (1985).

60. Thygesen, A., Oddershede, J., Lilholt, H., Thomsen, A.B. & Stahl, K. On the determination of crystallinity and cellulose content in plant fibres. Cellulose 12, 563-576 (2005).

61. Park, S., Johnson, D., Ishizawa, C., Parilla, P. & Davis, M. Measuring the crystallinity index of cellulose by solid state 13C nuclear magnetic resonance. Cellulose 16, 641-647 (2009).

62. Newman, R.H. & Hemmingson, J.A. Determination of the Degree of Cellulose Crystallinity in Wood by Carbon-13 Nuclear Magnetic Resonance Spectroscopy. in Holzforschung - International Journal of the Biology, Chemistry, Physics and Technology of Wood, Vol. 44 351 (1990).

63. Elg-Christofferson, K., Hauksson, J., Edlund, U., Sjostrom, M. & Dolk, M. Characterisation of dissolving pulp using designed process variables, NIR and NMR spectroscopy, and multivariate data analysis. Cellulose 6, 233-249 (1999).

64. Wickholm, K., Larsson, P.T. & Iversen, T. Assignment of non-crystalline forms in cellulose I by CP/MAS 13C NMR spectroscopy. Carbohydrate Research 312, 123-129 (1998).

65. Lu, F.C. & Ralph, J. Non-degradative dissolution and acetylation of ball-milled plant cell walls: high-resolution solution-state NMR. Plant J. 35, 535-544 (2003).

66. Mansfield, S.D., Kim, H., Lu, F. & Ralph, J. Whole plant cell wall characterization using solution-state 2D NMR. Nat. Protocols 7, 1579-1589 (2012).

67. Cetinkol, O.P., et al. Harnessing the effect of ionic liquid pretreatment on Eucalyptus. Abstr Pap Am Chem S 239 (2010).

68. Hedenstrom, M., et al. Identification of Lignin and Polysaccharide Modifications in Populus Wood by Chemometric Analysis of 2D NMR Spectra from Dissolved Cell Walls. Mol Plant 2, 933-942 (2009).

69. Mansfield, S.D., Kim, H., Lu, F.C. & Ralph, J. Whole plant cell wall characterization using solution-state 2D NMR. Nature Protocols 7, 1579-1589 (2012).

70. Ralph, J., Akiyama, T., Coleman, H.D. & Mansfield, S.D. Effects on Lignin Structure of Coumarate 3-Hydroxylase Downregulation in Poplar. Bioenergy Research 5, 1009-1019 (2012).

71. Yelle, D.J., et al. Two-Dimensional NMR Evidence for Cleavage of Lignin and Xylan Substituents in Wheat Straw Through Hydrothermal Pretreatment and Enzymatic Hydrolysis. Bioenergy Research 6, 211-221 (2013).

72. Kim, H. & Ralph, J. Solution-state 2D NMR of ball-milled plant cell wall gels in DMSO-d6/pyridine-d5. Organic & Biomolecular Chemistry 8, 576-591 (2010).

73. Kim, H., Ralph, J. & Akiyama, T. Solution-state 2D NMR of Ball-milled Plant Cell Wall Gels in DMSO-d(6). Bioenergy Research 1, 56-66 (2008).

74. Lolli, M., Bertelli, D., Plessi, M., Sabatini, A.G. & Restani, C. Classification of Italian Honeys by 2D HR-NMR. Journal of Agricultural and Food Chemistry 56, 1298-1304 (2008).

75. Berglund, A., Brorsson, A.-C., Jonsson, B.-H. & Sethson, I. The equilibrium unfolding of MerP characterized by multivariate analysis of 2D NMR data. Journal of Magnetic Resonance 172, 24-30 (2005).

76. Hedenström, M., Wiklund, S., Sundberg, B. & Edlund, U. Visualization and interpretation of OPLS models based on 2D NMR data. Chemometrics Intell. Lab. Syst. 92, 110-117 (2008).

77. Fock, W. Eine modifizierte Methode zur Bestimmung der Reaktivität von Zellstoffen für die Viskoseherstellung. Das Papier 13, 92-95 (1959).

78. Köpcke, V. Conversion of Wood and Non-wood Paper-grade Pulps to Dissolving-grade Pulps. Doctoral Thesis in Pulp and Paper (2010).

79. Treiber, E. Zur Viskositätsbeeinflussung des Filterwertes von Viskosen. Monatshefte für Chemie Bd 93 (1961).

80. Treiber E, R.J., Ameen C, Kolos F. Über eine Laboratoriums-Viskose-Kleinstanlage zur Testung von Chemiezellstoffen. Das Papier 16, 85-94 (1962).

81. Zellcheming. Verein der Zellstoff- und Papierchemiker und -ingenieure, Merkblatt III/6B/68, Prüfung von Viskose: Filterverstopfungszahl von Viskose. (1968).

82. Sarkar, N. Thermal gelation properties of methyl and hydroxypropyl methylcellulose. Journal of Applied Polymer Science 24, 1073-1087 (1979).

83. Gal, M., Kern, T., Schanda, P., Frydman, L. & Brutscher, B. An improved ultrafast 2D NMR experiment: Towards atom-resolved real-time studies of protein kinetics at multi-Hz rates. Journal of Biomolecular NMR 43, 1-10 (2009).

84. Gal, M., Schanda, P., Brutscher, B. & Frydman, L. UltraSOFAST HMQC NMR and the repetitive acquisition of 2D protein spectra at Hz rates. Journal of the American Chemical Society 129, 1372-1377 (2007).

85. Schanda, P., Forge, V. & Brutscher, B. Protein folding and unfolding studied at atomic resolution by fast two-dimensional NMR spectroscopy. Proceedings of the National Academy of Sciences of the United States of America 104, 11257-11262 (2007).

86. Schanda, P., Kupce, E. & Brutscher, B. SOFAST-HMQC experiments for recording two-dimensional heteronuclear correlation spectra of proteins within a few seconds. Journal of Biomolecular NMR 33, 199-211 (2005).

87. Wishart, D.S. Quantitative metabolomics using NMR. TrAC Trends in Analytical Chemistry 27, 228-237 (2008).

88. Chan, E.C.Y., et al. Metabolic Profiling of Human Colorectal Cancer Using High-Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HR-MAS NMR) Spectroscopy and Gas Chromatography Mass Spectrometry (GC/MS). Journal of Proteome Research 8, 352-361 (2009).

89. Gavaghan, C.L., et al. Directly Coupled High-Performance Liquid

Chromatography and Nuclear Magnetic Resonance Spectroscopic with Chemometric Studies on Metabolic Variation in Sprague–Dawley Rats. Analytical Biochemistry 291, 245-252 (2001).

90. Wang, Y., et al. Magic angle spinning NMR and H-1-P-31 heteronuclear statistical total correlation spectroscopy of intact human gut biopsies. Analytical Chemistry 80, 1058-1066 (2008).

91. Skappak, C., Regush, S., Cheung, P.Y. & Adamko, D.J. Identifying Hypoxia in a Newborn Piglet Model Using Urinary NMR Metabolomic Profiling. Plos One 8(2013).

92. Li, Y.Z., Liu, H.B., Wu, X.Z., Li, D.H. & Huang, J. An NMR Metabolomics Investigation of Perturbations after Treatment with Chinese Herbal Medicine Formula in an Experimental Model of Sepsis. OMICS: A Journal of Integrative Biology 17, 252-258 (2013).

93. Xu, W.F., et al. ¹H NMR-based metabonomics study on the toxicity alleviation effect of other traditional Chinese medicines in Niuhuang Jiedu tablet to realgar (As2S2). Journal of Ethnopharmacology 148, 88-98 (2013).

94. Pilate, G., et al. Lignification and tension wood. Comptes Rendus Biologies 327, 889-901 (2004).

95. Yoshida, M., Ohta, H., Yamamoto, H. & Okuyama, T. Tensile growth stress and lignin distribution in the cell walls of yellow poplar, Liriodendron tulipifera Linn. Trees 16, 457-464 (2002).

96. Alves, A.C., Rantalainen, M., Holmes, E., Nicholson, J.K. & Ebbels, T.M.D. Analytic Properties of Statistical Total Correlation Spectroscopy Based Information Recovery in ¹H NMR Metabolic Data Sets. Analytical Chemistry 81, 2075-2084 (2009).

97. Crockford, D.J., et al. Statistical heterospectroscopy, an approach to the integrated analysis of NMR and UPLC-MS data sets: Application in metabonomic toxicology studies. Analytical Chemistry 78, 363-371 (2006).

98. Maher, A.D., et al. Statistical Total Correlation Spectroscopy Scaling for Enhancement of Metabolic Information Recovery in Biological NMR Spectra. Analytical Chemistry 84, 1083-1091 (2012).

99. Sands, C.J., et al. Data-Driven Approach for Metabolite Relationship Recovery in Biological ¹H NMR Data Sets Using Iterative Statistical Total Correlation Spectroscopy. Analytical Chemistry 83, 2075-2082 (2011).

100. Sands, C.J., et al. Statistical Total Correlation Spectroscopy Editing of ¹H NMR Spectra of Biofluids: Application to Drug Metabolite Profile Identification and Enhanced Information Recovery. Analytical Chemistry 81, 6458-6466 (2009).

101. Keun, H.C., et al. Heteronuclear ¹⁹F-¹H statistical total correlation spectroscopy as a tool in drug metabolism: Study of flucloxacillin biotransformation. Analytical Chemistry 80, 1073-1079 (2008).

102. Wishart, D.S., et al. HMDB 3.0--The Human Metabolome Database in 2013. Nucleic Acids Research 41, D801-807 (2013).

103. Forsythe, I.J. & Wishart, D.S. Exploring human metabolites using the human metabolome database. Current Protocols in Bioinformatics Chapter 14, Unit14 18 (2009).

104. Wishart, D.S., et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Research 37, D603-610 (2009).