full text 02

149
Identification and characterization of prognostic factors in breast cancer using MR metabolomics Thesis for the degree of Philosophiae Doctor Trondheim, December 2011 Norwegian University of Science and Technology Faculty of Medicine Department of Circulation and Medical Imaging Guro Fanneløb Giskeødegård

Upload: tarillo2

Post on 12-Oct-2015

30 views

Category:

Documents


0 download

DESCRIPTION

asdgsdga

TRANSCRIPT

  • Identification and characterization of prognostic factors in breast cancer using MR metabolomics

    Thesis for the degree of Philosophiae Doctor

    Trondheim, December 2011

    Norwegian University of Science and TechnologyFaculty of MedicineDepartment of Circulation and Medical Imaging

    Guro Fannelb Giskedegrd

  • NTNUNorwegian University of Science and Technology

    Thesis for the degree of Philosophiae Doctor

    Faculty of MedicineDepartment of Circulation and Medical Imaging

    Guro Fannelb Giskedegrd

    978-82-471-3228-9 (printed ver.)978-82-471-3230-2 (electronic ver.)ISSN 1503-8181

    Doctoral theses at NTNU, 2011:324

    Printed by NTNU-trykk

  • Identifisering og karakterisering av prognostiske faktorer i brystkreft ved bruk av MR metabolomics

    !"# $

    !%

    &$

    %

    $%

    !"

    '

    !"!"

    %

    (

    )

    *

    +, -

    % %(

  • . %

    '

    %

    ,

    /'!"

    % % %

    -

    %

    %

    Kandidat01.1

    Institutt0'%

    Veiledere0-.

    '1

    Finansiering0/.21),

    Overnevnte avhandling er funnet verdig til forsvares offentlig for graden

    Philosophiae Doctor i medisinsk teknologi.

    Disputas finner sted i Auditoriet, Medisinsk teknisk forskningssenter, St. Olavs Hospital,

    tirsdag 6.desember 2011 kl 12.15.

  • Acknowledgements

    -3

    !"$

    4$ !$ ' /-/2 5667, 56

    .$ 3 "$ 4$ /3

    $

    $ .21)#

    ' 3 8 3

    .'3-.

    9 '1 $$ $:-

    $3

    '

    $

    $

    '3 $ $ $

    8("!!"9;

    $$ $ ! 4

    $

    $ $

    $ $ $ '

    $ !" $

    $ $3

    '

    %33

    '3$

    4$$

    " 2 /%

    3$

    /'

    %''$

    $39;!4

    1

    9-1

    $

    $$

    $ 3

    $ $$ -3

    3

    3 '

    $

    < ' $

    $

    ) 9 9 !

  • $

    3 3 # 3

    -3!$$

    3;'3

    !

  • Summary

    $$

    3-=$$

    $$$

    $$$

    $$

    $33

    $

    -

    $3$$3

    >

    $$

    4$ $ $ $ $

    $ $

    ?

    $$ $

    $$

    $

    $ 8

    '

    $ $$ - $$ ,

    ,$

    - $

    $ $ $ !"# $$

    $

    ("!#!"$$

    $-338

    $$ $

    $ $ $ $

    !"$3>>-

    $3

    $3 $>

    38

    -

    3 8$ $$ >!"

    $$

    $$

    $3

    $33

    $

    $ $ 3

    ( $ $ $

    $ $ $

    $

    $

    $

    3 $ $

  • $3 $3

    8$ ,$ $ 3

    =$

    $$9

    3 8

    $ $ $

    $3

    $

    $$

    -$$!"$

    >!" $

  • Symbols and abbreviations

    5(1 5,8

    ;/ 8$

    6 $$

    / 3

    49,4 $$

    4

    $

    43

    $ 49$!1?

    $

    4* $$

    '4 $$$

    )"

    $

    )")-'4 $$

    $$$,$$

    $

    .'

    $$

    ..-

    1;2- $

    194 $$

    ()",5 3$$5

    () 8

    ('. 8$$

    (!

    ("! $

    ' $

    '( $

    ';4 $$

    ;

  • ! $

    /4 %$

    /41 /3$$

    /'9; ,?

  • 8

    -9 E,$$

    *- $

    *)1. $

    3$

    *)1." $

    3$$

    *93

    $3

  • 8

  • 8

    List of papers

    Paper I

    Multivariate modeling and prediction of breast cancer prognostic factors using

    MR metabolomics

    11.1!-8);

    .%()

    1'

    -.

    Journal of Proteome Research, 2010; 9(2): 972-979.

    Paper II

    Alignment of high resolution magic angle spinning magnetic resonance spectra

    using warping methods

    11.D -1D 91 -!1 '

    -.

    ;!4D

    Analytica Chimica Acta, 2010; 683(1): 1-11.

    Paper III

    Glycine and lactate- potential MR biomarkers of breast cancer prognosis

    1 1. ;

    .% () 9 1

    ;!4

    1'

    -.

    Submitted manuscript

    Paper IV

    Prognostic value of metabolic response in breast cancer patients receiving

    neoadjuvant chemotherapy

    4!D11.D

    -.;

    9);

    1'D

    Submitted manuscript

  • 8

  • 8

    Contents 1 Introduction ............................................................................................................ 1

    4$

    $$5

    5 -F

    5 !$G

    5 !"$$H

    55 ("!!"6

    5E !"$?5

    E 9$!"$E

    E $$E

    E5 9

    E

    EE $>+

    EF *$I

    F !G

    F 9$$

    G

    F5 9?7

    FE 3H

    FF 9$35

    F+ !55

    + *5E

    2 Objectives .............................................................................................................. 25

    3 Materials and methods ......................................................................................... 27

    E 9

    5G

    E5 5G

    EE ("!!"$57

    EF 5H

    E+ ($$$E6

    4 Summary of papers .............................................................................................. 33

    5 Discussion .............................................................................................................. 37

    + !$$E7

  • 8

    +5 9$!"$FE

    +E !!"$FI

    6 Conclusions and future perspectives .................................................................. 53

    References...................................................................................................................... 55

  • '$

    1 Introduction

    1.1 Cancer

    4$ $$ $$> $ $

    @ $ $ $ $$ $

    $

    $ mutations

    3

    $$ $

    > $ $$

    $

    ? $$ $$ $ 3

    -

    $ $

    $$$

    $$$/$

    '$

    $$$

    -

    $$ 3 5666 > (

    8 J $$=: $$ $ 3

    ,$

    3

    $

    $ 5 !

    $$$

    '3

    $$ $ $ $ $

    E $$ $

    $ $ 3

    $

    :

    $F '

    $$

    ?$$$$

    -

    $$ $$ $-

    $

    $ # $ $

    #

    $

  • '$

    5

    1.1.1 Breast cancer

    $$ $ $$ 3

    /33

    5G663$'566H$$$$55K

    $$ /3 3

    + . 3

    - $ ,$ $

    $

    $$3$'$$$ '4#

    $ $$ $$ G6,76K

    $$'4$3$3

    $ $$

    $$ ';4# ';4$$8

    6K $$ $ $ '4I ;

    $$$,$$3-

    3$

    $$

    3G+,

    $$

    /3H,H+K

    377K+ -

    $>$$

    $ $

    7 ' /3

    $$

    3

    3

    +6IH

    .$ 3

    $

    =

    $

    > $

    $ > -/! #

    $ )"# $ 9"#

    3 $ $5 ()",5# $ H,69

    3

    $ 8 )" L 9" 3 3

    $ ()",5 $

    $ 3 $ 3

    3 5 ()",5 $ $

    $()",58

    3

    $

    3->$

  • '$

    E

    Figure 1.1:@43$9$

    M;$44M!

    Locally advanced breast cancer

    86,56K$$

    3$$

    $$ >$3 $$G+K

    $$ $E - $$

    $$> -E,-F0

    N+$ 8

    $ 3# L 8

    /5,/E#

    3F;$$$$

    $$ 3 -

    3 3

    $ %$/4#$

    > /

    /4

    + 3 /4 3 ,$

    3$$$$

    $I9

    3

    $$/4$$

    3

    $$$G,

    7 '/3

    3 $ $ $$

    $

    /34$1 /41H#$/4

    39

    3$$$.)4 +,

  • '$

    F

    $$$$#'

    $O76K$$3$$.)4'

    $ P76K

    $

    53

    3$8$8

    1.1.2 Tumor metabolism

    4

    -9$4

    $$

    $

    -92 $ $ 8>

    $$$$$$ -4&$$#38

    $$-$%

    $ $: H+K

    $

    .5#-$3$-9

    $56

    4$$?

    $

    $ $

    $$%5-$

    $ 1;2-#1;2-

    $$55(8$

    $?

    $ $ $> $ $

    8

    5E 4$ $ 8 8 $

    $$8$$ ('.,#3$

    $

    5F$3

    $

    8

    $$$$$$8

    -$$$3@$5+

    $

    3 $ -9 < 8 $

    $ $ $ ?

    8

    -9$ $$$ $$$ $

    $

    -93

    $

  • '$

    +

    - $ $

    $

    $$$$(

    $,$$5

    '3$$3

    $$5I,57 9 % $

    $ 3 9$ 94#

    3$

    $5H.E394.

    $ 4#

    $$$ 94#$

    $94$$$ 49,4#

    $E6 94 >

    $49,45,$$1$$ 194#,

    $$$ $ 94 2 in vivo !"

    1949443 $ $

    4#'$4$

    $

    $$E,EFEx vivo$4

    $9457$$$

    4

    in vivo $$

    E+,EG

    Figure 1.20-

    $ '

    $8

    $

    8> -4 $$ 8 ' $

    $4$$$$

    $

  • '$

    I

    Figure 1.3: # $ # $

    ?

    $$-

    $573

    4

    $

    $$$3

    $

    3 ,$$ $

    $$ '$ $

    $$

    $ $8E7 EH $F6 .

    $$ $

    $ $$ F3

    3 $ $$

    $

    F5 -

    $$>

    $$ 3 $ $ 3 $ $

    FE-,$$3$ $

    $ $ $$

    FF-8$$3

    $$

    $

    31$

    $ $ $$

    1$

    $$

    3 $ 3

  • '$

    G

    F+,FI 8$ $ $$

    31$ > 3 '

    $$E,$56

    '$$4$$9$$

    ,,$$G3$

    $,$,3

    8$94$

    ,FG,,$$

    $

    83,

    $,$$

    !

    $ & $$

    > $

    '(L5#

    F7 !

    $$

    $ G6K '' '''

    FH @ 3, '( $ $ $ Q,

    '( $ 5,

    8 5(1#+6 - $ 8$ 5(1

    3 5(1

    J$=

    1.2 Metabolomics

    !$3$$$

    4$ $

    ,$

    2

    ! I666

    $(! (!#$$

    $$+$ ,$

    $

    $ 3

    $ $ $ $

    $ $ +5,+E !$ 3

  • '$

    7

    ! $ $ $ R,

    $?$$S+F

    !$ 3 $ $ R$S

    $:

    $ $$ $ -

    3$

    8 $

    $3.F33

    8$ ++-$$ $3

    3$$$

    $,

    $$

    Figure 1.4: - S$S $$ ! 3 $

    8 $ $

  • '$

    H

    !83$$$3

    3$$

    $$$

    $> $-$

    $$$$

    $$$ !"#$ !#!

    $3 ? $ +F +I

    !"!$!

    $?$

    $

    $3$$

    !"!"3

    ? $

    $

    $+G ' !"

    ,$ $? ?

    +F

    1.2.1 MR spectroscopy

    !" $ 8 $ $

    $3,> 'T6#$3

    L

    $$

    $!"-$(E4F/+/H.E93$(

    $

    $!"$@

    $$

    6#$3

    5'U

    ?$

    3$3?

    $

    $

    $('VL53

    3

    :

    ,6

    8$ 3

    6 $

    > !6#6'?

    $ ".#$

    3

    3 8$

    ?- !633$63

    $".H63!63H66

    >,$ 8,,@

    ". 3$ 8$

    $3 ? -# -5# 8

  • '$

    6

    >$3?!6$

    $$ .'#$$-$3$

    $-5-$

    6

    $-5$-5D-.'$.?

    $

    $3?

    $6$

    $

    /$

    $

    3 8

    $

    $

    $3

    $$ $ - $$

    $ # 3$

    $

    -$$3

    $$$

    $+7

    1.2.2 HR MAS MRS

    $$$$

    3 $ $ 3

    $ $

    $ - !"$

    $?3

    $

    !" ' 3

    $

    +H"

    $+(>#

    8$+FG $#$$

    $ 6 3 $

    $

    . +#

    $ 3

    $ $ ?

    $

    +H,I6 !$

    3$3I;3I5 H+7

    $

    ("!#!"3HHGIE,IF

    (" ! !" ,$ $? ?

    $ >

    $ $ 8

    8

    !"I+ - $?("! !" $

  • '$

    $8$E6

    $$("!!"II-$$$

    $,$ 194 94 4 (" ! !"

    8$

    $

    $$ ' $$$$$

    $.+#

    Figure 1.50

    ("!!"$$$3

    $!"'0$$

    !$$WV+FG$$$6

    8$$$

    X,1$X,

    $: $ $: ;$ $: 4 $: ,' ,': 1 $: -

    : ! : 1 : 1 : - !

    3

  • '$

    5

    1.2.3 MRS acquisition

    $$33!"

    $3

    ?

    $ 3 $ !"

    $?@$4

    $?!" $ $ , 3

    3 3 $ 3 $?

    $/3$$IG#-

    $$$

    $ -5,8 $

    $? $ $, -)# $? $

    ?

    $ 3 ,$ 4 9$ ! 1

    ?

    $ $ . I# 3 H6

    3 76 $ Y - 76 3

    $ $$ -5D $ H6

    .$3-5 >8,3

    $$3-5

    Figure 1.60 - ,$ 4 9$ ! 1 ?

    $ ,

    3H6-3766$Y

    -763$$H6

    90o n180oZY[ ZY[

  • '$

    E

    1.3 Preprocessing of MR spectra

    9$!"$$$$

    $ $$

    $

    9$$

    $3

    I7,IH $ 8:

    $-

    $$3

    G6,G

    1.3.1 Baseline corrections

    $$$$$$

    3$$

    $3$

    $?>$-3

    $$ $ $ ' 3 3

    $$$3

    $$8$$

    $ $ $ ,?

    3$?G5-

    $$

    1.3.2 Peak alignment

    - !" $

    $ (

    ,$$

    , $ !

    3

    $$

    $$ GE,GF !

    $ 3

    $ '

    $

    $

    $$ $ - >

    $??>GG+,GI-

    3 $ 9

    warping $$ 3

  • '$

    F

    >

    $$$$$$

    3 $

    3$3

    $$3! ?

    $$ 3$

    $

    '$GG $ > 3 4 '$ $$

    $, . ..-# . 3

    $. 7E3 $

    ,$

    $ '

    $

    4$ 3$

    $3>$9-@3$,

    $ $$ 7F ' $

    $,*93>

    $$

    ;3

    $

    $.G3

    4

  • '$

    +

    Figure 1.7: ("!!"$ # #

    $>

    3-$

    1.3.3 Scaling and normalization

    $$$$

    $ ,33 > $/>

    $ $$

    $ 3

    $ $ > !" $ 8 3$$$

    G6 $ > $

    $ : 3 >7+ />

    R

    S:8

    (38

    $3-$$$8$$$

    $G6

    $ $

    $ $

  • '$

    I

    7I !,$

    !" $

    $ $ $

    ,$

    $ 3 $I7 '

    $ $

    $$!"

    $ $

    $

    7G* *-# $ $

    337I$$

    4*# $

    $$-4*$$$

    4* $ *-

    '3$$$

    1.3.4 Variable selection

    ; !" $ $

    $$ $

  • '$

    G

    3 . ., $ 4 $

    3

    %$0

    5

    @ @ #c ci j

    c i jS

    \5]

    )$$"$$$

    $

    $ 8*,@#?

    8*L@#7H

    1.4 Multivariate analysis

    !"$$$-!"

    $ $ !

    chemometric $ $

    !"

    1.4.1 Principal component analysis

    9$$

    94#$$38$$

    $-$

    $$$

    94#$$

    $

    7G H94 3

    8 3 $ - 94

    8>9438$

    $ . 7# ?

    $

    8 - 3 943 $

    $$

    $

    !$94$$0

    T

    X TP E \E]

  • '$

    7

    3XT $8P8E

    8 $

    $ 3

    $94$$

    ;

    3

    $

    $ $

    $ $

    $7G

    1.4.2 Partial least squares

    9? 9;#33

    3

    $XYH5.!"$$X,83$

    Y,83$$$9;

    $ $

    ;*# 8>

    $$3

    XY

    Figure 1.8: 9$$

    ,$

    3

    8,

    8E9494588$

  • '$

    H

    -9;$3

    0

    T

    T

    X TP EY UQ F

    \F]

    3T U $$XY$P Q

    $E F-9;$$$

    HE$,? /'9;#

    '!9;HF . $

    3$$$

    $$

    3$XY,$ '!9;/'9;38$

    Y3

    $?Y-

    $ 9; $ 3 $

    94

    9; $ 9;,# $ $ 9;

    $ 3

    $ $ ,$ J=

    $$

    1.4.3 Bayesian belief networks

    3 /# $

    -$$

    -$$/$$$

    $ $ 3

    .H#-3

    $$$

    $,3'$

    $

    $$H+,HI)$

    $ 3 $ $

    $

    ' 3 / $

    $8 , HG ! / $

    $ $ $ !" $

    $>H7

  • '$

    56

    /^$$

    3 $ .

    !" $ ^ $ 3

    $$$

    -

    3

    ^ $

    $

    $

    $$/^

    $3$$HH

    /$33:,83$

    3

    66-3!"$

    Figure 1.9: 83 $

    433

    #

    9 V# 9 V#6+ 6+

    9 V# 9 V# 6+ 6+ 6H 6

    9 4V# 9 4V# 67 65 65 67

    4 9 V# 9 V# 6 6 6 6H 6 6H 66 6HH

  • '$

    5

    1.4.4 Probabilistic neural networks

    /3$ =$

    3 8

    $ 3 6 9$

    3 9//65# $$3

    $ #

    $ $ 3

    $9// 0

    $

    $3 .6#-

    $ $ $

    1$-

    3$

    -$$$$

    $$33

    $$ 3

    %$

    ' 3 $

    $$>-3

    $

    $$$$

    $-$$83$3

    $

    Figure 1.1009$33,$$-

    $ $ $-

    $-$$43

  • '$

    55

    1.4.5 Multilevel analysis

    949;,

  • '$

    5E

    1.5 Validation of multivariate methods

    ! 3 3 $

    9$

    ' &, $, & &,

    3 -

    $&

    $6+-

    3

    >

    &

    $ 3 $6I@

    &? >

    $ $ ,, ;

  • '$

    5F

    9$$,

    $

    -

    -33338$

    3 - $

    $$

    6H

  • !

    5G

    3 Materials and methods

    3.1 Patients and data sets

    -

    ' '''3

    $$ $$

    ,/3 -

  • !

    57

    8$ ?

    4 3 $$

    ,764 ("!

    !"3$

    .E#

    3.3 HR MAS MRS protocol

    - 3 > $ "AI66

    1(1# ?3 (LE4!3

    3

    $8-$$ ''''''#3$

    +6_;! 9F6_;#5 3

    9 - $$ '* $ 3

    >&,.("!E_;95$806G(>#3

    .

    Table 3.109$?("!!"$

    Breast spectra Cervix spectra Colon spectra

    Temperature F4 " F4

    Spin rate +(> I(> +(>

    Echo time 57+ 57+ 5G5

    Number of scans 57 57 57

    Collected region 6(> 6(> 6(>

    Number of points E5 E5 IF

    Acquisition time IF IF E5G

  • !

    5H

    Figure 3.109("!!";0-$$

    3 >

    ?

    $ !"

    "0-$!!"$?90-$

    $/-/2L1!

    3.4 Data analysis

    9$ $ 3 A@/!" ! "5667 -

    !3'$/$2#"5H5949;,3

    ! 9; 8 )

    $"$@

    $

    2# 9;,3

    '!9; / 3 /$

    /344#$3

    ' / 3 $ $

    "!" $ 9// 3 / 4lassi

    @ 1 2# - / 4 9//

    $ $

    1

    $ > $$

    $3

    $$$

    -

    $3

    $J=$$$,'3

  • !

    E6

    $ $ 66

    3

    3

    3 $

    ' 9//3 $

    $

    "!"$-,

    '*3*>

    6E

    &! " $ ("

    !!" 3!,1`3,1 $

    $

    $$3

  • !

    E

    Figure 3.20($8-0$

    38("!!"-3

    $$G6K$E6K$

    $0

    >$("!!"-

    $$$$

    3$!,1`3,1

  • !

    E5

  • EE

    4 Summary of papers

    9'

    Multivariate modeling and prediction of breast cancer prognostic factors using

    MR metabolomics

    Axillary lymph node status together with estrogen and progesterone receptor status are

    important prognostic factors in breast cancer. In this study, the potential of using MR

    metabolomics for prediction of these prognostic factors was evaluated. Biopsies from

    breast cancer patients (n = 160) were excised during surgery and analyzed by high

    resolution magic angle spinning MR spectroscopy (HR MAS MRS). The spectral data

    were preprocessed and variable stability (VAST) scaled, and training and test sets were

    generated using the Kennard-Stone and SPXY sample selection algorithms. The data

    were analyzed by partial least squares discriminant analysis (PLS-DA), probabilistic

    neural networks (PNNs) and Bayesian belief networks (BBNs), and blind samples (n =

    50) were predicted for verification. Estrogen and progesterone receptor status could be

    predicted from the MR spectra, and were best predicted by PLS-DA with a correct

    classification of 43 of 50 and 39 of 50 samples, respectively. Lymph node status was

    best predicted by BBN with 36 of 50 samples correctly classified, indicating a

    relationship between metabolic profile and lymph node status. Thus, MR profiles

    contain prognostic information that may be of benefit in treatment planning, and MR

    metabolomics may become an important tool for diagnosis of breast cancer patients.

  • EF

    9''

    Alignment of high resolution magic angle spinning magnetic resonance spectra

    using warping methods

    - $ $ !"# $ $

    $ 8

    $ $$

    $ '

    3 $ 4 $$

    #

  • E+

    9'''

    Glycine and lactate- potential MR biomarkers of breast cancer prognosis

    $$

    3 -= $$

    $ $$

    :

    3

    >

    - 3 8

    3

    $ $$ +,

    $$

    VH7#38$

    $

    !" $$ (" ! !"# - 3

    >$$

    94# ?

    $ 9;,#3$

    $ 9$ +,

    3 $ $ $$

    $ )"#$$

    VG#

    3 33 $

    V665F#

    ( $ $3 $3 3

    $

    $$ $

    $ 3

    )"

    9$+,)"

    $

    $$ - 3 $

    $$$$!$

    $$

  • EI

    9'*

    Prognostic value of metabolic response in breast cancer patients receiving

    neoadjuvant chemotherapy

    -a$$$$

    $$

    $$

    -38$$

    3 $ $ $$ $ % $

    /4#$$$

    ,

    9

    V7H# $ > $$ 3 $ $

    /4 $ $8 3 8$ ,

    ,

    > $

    $

    $ $$ ("!!"# - 3 8

    3

    $$

    $$ /4,,

    $ $ $ 3 7GHKLI7HK $$ $$$

    L?$ 9;,# P666#

    $33$$

    -$

    3

    $ /, P+ # $

    $ V666F#

    3 O+#8

    $

    $ $ V66FG# $,$ $

    b66E# $ $ V6665# -$ $

    $$

    -

    $$/43$3$$

    $$ !$ /4("

    ! !"

  • $

    EG

    5 Discussion

    '!"$$$$$$

    $ $!"$

    >>

    ' ' 3

    $$

    $ $ )" 9"

    3 8 (

    $3$$$!"$

    3$-

    8 $ !" $$

    $ 3

    $$ , 3 8 ''' -3

    3 $

    3 $

    $ $ 3

    $

    4$

    ,

    3

    !" $ $

    $$''*$/4$$

    3 8 $$

    $$

    '$3$$

    $''''

    $ $ $ $ $$

    3

    -3

    $

    -$$$$

    (3

    $$3

    3

    +,,3

    ,$$8

    $$

    $$,$

    -$

    $

    !" $ 3 '' .

    38("!!"

    $-''3$

    ''''*

  • $

    E7

    5.1 Metabolite profiles of breast cancer

    '

    $

    8$

    $$

    -$

    $3$'

    $8$ $3$!"

    $ $$ $

    $$93

    $$ $ $ $$

    $,5-3833

    $

    $ /4

    $$

    $

    1$ $3

    '''

    3,$$

    -

    3 '*

    3,8

    $$$/43

    $ $ -

    $ $

    3

    ,3$ ,

    3 , $ $

    $ " ' 3 )" 9"

    $$$

    )"9"8$

    3

    $ )" 9"

    $3 ,

    E,F-$

    3

    $ $

    %

    $

    $

    +,I

    (3$I$

    $8$33

    $ $ $ $$$ 3

  • $

    EH

    +,''''*-$$$

    $ ''' 3 )"

    $

    $$

    $$5$$$

    8# @ $# $ )

    $

    $$ $ G 7,H

    $$ 3$3

    $

    $$56 5$$$55 '$

    $$

    $

    $$$$

    $ - $ $ $$

    $1$>

    $ $ $

    $$$

    $$$>4

    $

    $ $$)

    $

    F+

    5E5F$$

    "'33944194

    )",$)",

    $

    ,$$ 3 )", , $ 3

    94194$)",,

    FG,$$$$$

    $$

    -$$$3in vitro

    $ $ 94 $ 3 $5I (3

    $$$8$

    $

    4$8(

    in vivo $

    in vitro$8$$

    $,$$5+

  • $

    F6

    /$,$33

    ''' $ $ $$

    '* , 3 194 4 /4

    $ - 3 $ $

    343

    '

    $$$,$

    /4

    3$$$$$

    , ' $$$3 '* 4

    3$$$$38$3$

    194

    $ 3 , 5I .

    $$?

    $,$

    $$$

    -

    $'''33)",$

    $$

    $ %

    -

    $ $ 3 8$

    $35G-$

    '*3

    3'3

    8$ 8$

    $

    $ $$

    $ $

    .

    ''' 3

    $$!

    $

    3

    $3

    ' 3

    $ $ $$ 3#

    -

    $3

    3$$

    $$ 3 3 $$ $

    $?$$

    9$'''3

    3)")"-3

    '3$

    $)"

    $

    $$

  • $

    F

    )"

    $

    3

    8 3

    / $ $

    $

    , $ 8 )" $ -

    $

    )$

    $ $$

    $

    $

    $ $$ '

    $$?)"

    $$ $

    )"

    )"-

    $3

    $

    !$$

    $3

    $ $ $$57 ' ' 3 $

    $

    33$

    $$3$9$

    !" $

    8 ! $

    $$$$$HFK5H(3

    $$ $

    -

    $-$3

    $

    # $ 3 ,

    8 $ - 3 $ $

    3 $ *)1.#,4 *)1.,

    3 $

    $ 3 $ $ *)1.",EE6,E

    8 $

    E5 $$EE $$ - $ $

    $$

    $$

    $ $ $ 3 $

    ' 4 $$ 8 3

    $56KGEI

    $

    EF ' & $

  • $

    F5

    $FK5F

    E+-$

    $ 8 $ 3

    $$$

    E+,EG3

    $$3EFE7',

    +735HG+EE

    $

    $5$3,

    EH-$$

    $ > $ 3

    3$$$$$

    ' 3

    $ ;/ ;/ /3 5666 $

    ' ;/ 8 $

    8$3;/

    #

    8$-

    3 $ , F6

    $$$

    9$ 3 3

    $ $

    3

    !

    =/4

    3$

    - in vivo $$

    4!"

    3 3 EG F ' '* 3 8

    $3

    $$/4

    3 $$$$@$

    $

    ,

    3 P+6K$b5+K

    $ # O +6K P 66K $

    #'3$

    $4-

    $$

    3 $ +6K$

    $

    .

    $

    3 O5+K$#

    ?

    (" ! !"

    $$

  • $

    FE

    (33 3 $ /43 $

    3-$$/43

    $ $ $ > $

    $$

    /4

    -$5(13$5667

    3

    $

    $$F5

    $ $ 5(1

    $

    9 8 $ $$ $

    $$ (3 $

    5(1$$$$

    5.2 Preprocessing of MR spectra

    $$8

    3$

    !"$'

    $

    38

  • $

    FF

    >G53$''(3

    "!3$

    3

    8$

    $

    3

    $3 ' 94

    ?

    ?GI7EFE3$

    $894$

    -$8

    9433

    $894

    3,$$$$

    9;,3$

    3

    $$ $$ $8 $$

    $ $

    $$$$$$

    / 3

    3-$

    3FF

    $

    '

    $$

    $$$

    !"$$

    - $ $$ 3 3

    $ 8

    G7 $

    !"$-

    $

    4

  • $

    F+

    @3$$$

    3 $ $$$$ 3

    $ %$

    3 H6K$$ $$-

    3 (

    $ 3

    $$ $

    8$3 3( $ 3

    $ $$ $ $

    $$$$$$#$

    $ $-

    $$$$

    3$

    $3$

    3

    -

    3 $ '' 3 3

    3 $

    3 $ J,,,=

    $$ !"(3

    83$

    $9

    . 9..-F+,FI#@

    $

    3

    ..- $,$ $ (3 9..-

    $$3-..-

    ' $$ $

    $, 3 $ $

    -

    $ 9..-$

    ! $

    $

    ..- "..-#3

    ?

    -

    >

    $$

    $

    $$

    $

    ,3

    "9#*

    !" $FE - $

    $

    $ ,$#

    $ -

    $

    3

    $

  • $

    FI

    8 ..- $,$ -

    % "9 3 3

    48$

    $

    -

    $

    - 3$ $

    $ $

    $?

    $

    ,$ 3

    4@#$

    $$3FH,+6-

    $

    :

    $$$

    $

    $

    $ 8> $ 3

    $3$3?3

    3$

    ' 3 3 3

    $ 3

    -$

    $3

    4@ 3 $ 4

  • $

    FG

    ;$

    . ,

    $

    ,$ J$ 8=

    $$$@$$

    $ $ $

    3$$>$

    $ ,$

    ?%''

    $$$9;,3$ ,/

    9//9;, $$ $ )"

    9"-$33

    3 $

    3

    $ . $ 9;,

    $

    $$ $ ?

    $$$$

    $

    $$)"''''

    8 $$ $ $

    $ - 3 3 $

    L$$ - $$ 9;, 3

    $$ +5 3

    $$$

    33$$3$$

    - $

    3 $

    >

    . 9// / 4 $>

    $

    $3

    3 3 $ '

    3 $ 3 $$

    '

    9///$$$

    $ 3

    - 9;, $ 9// /3

    $$'''9;,

    3

  • $

    F7

    9'*8

    $ ' 8

    $

    $$$,

    $

    9;, 3 $ -

    $ 3 $

    $:

    $$$ $IHK77K3

    . +# -

    $ $ 3 @ 3

    $ 9;, 9;,

  • $

    FH

    - $

    ? ?$ -

    ("!!"$ $$

    ,3$%$,$$-

    $

    8

    $$

    $ c$

    $

    $

    $ -9

    $ (" ! !"5 7 - -9 3

    3 3 +E $ -5 8

    $$ ?$ ("! $ 3 ? 8

    $ $ $$

    $ $$ ,

    $$

    )")-'4+F#4

    $

    ?$ :++ 3

    $ $ $$ . $

    ' $ $3

    $

    $ - ?$

    3$

    $

    ' $ > ?

    +I,+7'

    >$$

    $ > - 3 ''' '*

    - $$

    $ $ $

    3

    > $$

    $

    $ $

    $ $

    $

    $

    -+> $

    ("!!"$3

  • $

    +6

    Table 5.10("!!"$

    Purpose Methods

    $$ 1 $ ? 3 $$

    $$

    $$ $ ?

    3 $

    3 $

    9

    - $

    $

    $-9#

    3$

    $

    !" $

    '$

    !"$4

    $$

    "

    $

    @ $ 3

    $

    - $ 8 $

    $$$$

    ?

    ; $

    4

    /> />$?$$

    $

    3-$$

    $$

    3

    $$>

  • $

    +

    $ *- $ 3

    3

    @*-,$

    3$$

    *$ 23$

    "!"$ 3

    $

    !

    9483$

    $

    9;,$$!"$

    ! 9;, $$ 3

    $

  • $

    +5

  • 4$$

    +E

    6 Conclusions and future perspectives

    '

    $$$

    3

    3

    ( $

    $ $

    $ $$ $ $$ $

    $

    3 3 8

    3$-

    $33

    $$$

    $$.

    $ 8

    $

    $

    +,

    $$ 3 8 $

    3 $ 1$ $ 3 $ ,

    $$-$

    /43 8

    $ 3

    $$ $ ( $ $

    3

    $33-

    $$$

    @$$8$

    /

    8$ 3

    3

    $ 2 =$$

    3 8$ $

    $

    $$ $

    $

    >

    $$

    $

    $ 8 !" $ $ $

    !"$$$

    3 3 $

    $ ' 4 ; (

  • 4$$

    +F

    ("!!",

    8$ +H ' $ $

    $

    $

    < !" in vivo,ex vivo $

    $!"

    3 !" 8 $ in vivo

    -= $$!" $

    + - E - 3$

    $3F-$

    !"'

    in vivo

    4 $ 3

    / $$!" $

    $

    $$

    $?

    $$

    ex vivoin vivo$$

    - $ $ !" $

    > !" $

  • "

    $

    ++

    References

    &"MCancer biology5:9)$;0)856665 (:@"-( 4$Cell 2000,100 #

    +G,G6E 9&.

    Nat. Med. 2001,7 5#+,

    5F (:@"(4$0-/81

    Cell 2011,144 +#IFI,IGF+ 4$ " /3 Cancer in Norway 2009 - Cancer incidence,

    mortality, survival and prevalence in Norway: 4$ " /30

  • "

    $

    +I

    G 4e>,!$1 !: 1>e>, $$ %$Clinical and Translational Oncology 2010,12 G#FI,FIG

    7 & A: ! ! : d /: ( : B c !,$ $ $ $ %$ $ $$

    Eur. J. Cancer 2011,In Press, Corrected Proof

    H - /3 4$ 1 0LL333$L $$ 56#

    56 M!:-$>M;:;Biochemistry+:56655 ;$ M:4; $$BMC Biology 2010,8

    #7755 !$! ;: " : M !$ $

    $ 1;2-# $$J. Cell. Physiol. 2005,202 E#I+F,II5

    5E B9:@:3(8$$$$$Methods Mol. Biol. 2010,651,5H

    5F 4M:;&!(8,$$J. Cell. Biochem. 2011,112 E#GE+,GFF

    5+ @!:!:

  • "

    $

    +G

    ?(!"$$8J. Magn. Reson. Imaging 2010,31 F#7H+,H65

    EI !:9M:)(:";:1$):);': / ! -: ) - (: - - !: B

    : 13 !/% 4 ;$ $ 4$0 9$"3*(!"$$g9F-Radiology 2004,233 5#F5F,FE

    EG M

    /":&!:

    *:4$4:%.):1'4$ $$ $$> , $

    !"$$MAGMA 2004,16 F#GF,7

    EH 3!1:*:-d;:!"1:$;:4 9 ": M M &: ( " ): &3$> M 9 (",!$$ ? $ !"'LE,!"',$Magn. Reson. Med. 2003,50 +#HFF,H+F

    F6 -!,: h &!: %

    @: - 1: 1 1 .:

    - .: 1 ' : ( ) $

    3$$$$(("!!"$$$$J Proteome Res 2010,9 G#EIIF,EIG6

    F $*>$$,$$(,/!"$$,>NMR Biomed. 2003,16 #,

    F5 )> '!:):)(!!:),/):)(

  • "

    $

    +7

    F7 ;: M : ! '( $ Trends in Molecular Medicine 2010,16 H#E7G,EHG

    FH B (: 9 @: M 1:!$;

    ": "

    : B@:& ':$,( ': M:"1 M:.(:.:":(M:&>&@:*$$*):*:'('(5N. Engl. J. Med. 2009,360 7#GI+,GE

    +6 ;:@@:1:

    :!:)!:.*":M(1:M:&

    !4:!&!:9"!:@ 9 : B

    & ): ; ;!: "3> M : 4 ; 4:-4:*(

    !1: !4$,$ '($5,8Nature 2009,462 G5GF#GEH,FF

    + @:&84:14:)":B/:1:(:9$/:)::!":':AM:M;:4>M:;):4::(9:;9:.;:9

    M:.":4

    :->:4

    !:;3: > : d : 3 !: A B: 4 : 1 ":/>:":;;:*( M:. '(!03Nucleic Acids Res. 2009,37 #I6E,I6

    +5 /$M&:;M4:() a!$a0$ $ $ $ /!" $$$ Xenobiotica 1999,29 #7,7H

    +E . $$Prog Nucl Magn Reson Spectrosc 2009,54 E,F#5EH,5+F

    I 3 ) ": : ) " 1 /$ !$ "$$ 4 (

    Nature 1958,182 FI+6#I+H,I+H

  • "

    $

    +H

    I5 ;3 'M.

    '$$"Physical Review Letters1959,2 G#57+

    IE 4

    ;:!!:$;:9-:-$':;$:1>>"c $

    $$$$Proc. Natl. Acad. Sci. U. S. A. 1997,94 5#IF67

    IF !&&:!@):41:1

    ,$,

    $ $ $ $$ $Magn. Reson. Med. 1997,38 E#EHH,F6E

    I+ )::;%$),":>!:; !: : ! M: * 4 @ ) $ $ (,/!" $ Analytical and Bioanalytical Chemistry 2010,398 F#G7,GH6

    G5 )9(49$-@Anal. Chem. 2004,76F6F,FGE @%(:!

    @M:ad(M:1!:(

    $:

    ; ! $ $$ ?

    $ $ $ $J. Magn. Reson. 2000,144 #E+,FF

    GF *M-@):-4:*

    M:1

    M90

    3/!"$$$$$Journal of Chemometrics 1996,10 +,I#F5+,FE7

    G+ !-:

    :*1:-):">$)":>!;:1-4::!M4:*4@ /!", 4$> !$ (

    2 '

    Anal. Chem. 2008,80 6#EG7E,EGH6

  • "

    $

    I6

    GI . M: - " M 3 $ 3 $ $$ Journal of Chemometrics 2004,18 +#5E,5F

    76 ;

    1,4:@;$

    /!"Anal. Chim. Acta 2004,513 5#FE,FI

    7 4:1:!':">>:!.9:19:$:&$

    2*9

    [email protected]. 2009,81666,66G

    75 - 1: 1>

    M: @ (: 1$ M: !:@(M4-:

    (;9:)9(4:

    ;!4:@

    "'$3$Chemometrics Intell. Lab. Syst. 2010,104 #I+,GF

    7E . M: $,&

    ': M$ 9 9

    /!"

    $Anal. Chim. Acta 2003,487 5#7H,HH

    7F 1":@

    ":( M

    >8 $0 $ 3 $ $$Journal of Computational Chemistry 2001,225GE,57H

    7+ (

    !!@:4>,M>;::("9$ 8 $$ 8$ Anal.Chim. Acta 2005,545 #+E,IF

    7I &(4:)-!:(:!):$

  • "

    $

    I

    HE ! $ 9; Journal of Chemometrics 2009,23 6#+7,+5H

    HF M'!9;0$?Chemometrics Intell. Lab. Syst. 1993,18 E#5+,5IE

    H+ &4)M:";!:&:(394$3$$$Comput. Biol. Med. 1997,27 #H,5H

    HI ( Introduction to learning Bayesian networks from data. In D. Husmeier, R. Dybowski, & S. Roberts (Eds.)0566+

    HG 1

    ):%

    M:

    .

    9; $ Metabolomics 2008,4 #7,7H

    6 (M:"$@@($ $$:F6H$3$E+H

    3+Br.J. Cancer 1957,11 E#E+H,GG

    -.:M

    ;"::.%():(M:8): 1 ' : ;

    !", $

  • "

    $

    I5

    $$ $ $ Breast Cancer Res. Treat. 2007,104 5#7,H

    5 :;

    :

    -.:(M:.%():1'4("!!"$$$$$3$$NMR Biomed. 2006,19 #E6,F6

    E ;:4&:!):;:"%:4!4:/

    ->"1) $ $$ 3 , $,

    $$$$J. Magn. Reson. 1998,135 #HF,565

    7 3!1:d>:-d;:M:M:&&":$;:49":&:*:&3$>M c ( (",!$$Magn. Reson. Med. 2006,55 I#5+G,5IF

    H - ! : 3 ! 1: & & ": ! M: M :- d ;: M 9: &:/ M:*:1':&3$>M)$$ $$ ( (",! $$ Magn. Reson. Med. 2008,60 E#+6,I

    56 @

    :@!:;!:$3$1:&:") &: !,& @ ( ;$ ; 9$ ; ! - "$

    $ "$ 9

    (4$4$Cancer Res. 2000,60 F#HI,H5

    5 B(:1M:!!:(&:-(:/B;$$ $ (,!" $

    3,,$$$J. Magn. Reson. Imaging 2007,25 +#HH5,HHH

    55 > !: $ -: $ " ;: @

    : 4 " @:3 ! @: !,& @ ) $ $$

    $ $ ,,$ $$ Int. J. Radiat. Oncol. Biol. Phys. 2001,51 5#EFH,E+E

    5E 9

    M: 1 (, ( /!" $$

    8$$Magn. Reson. Med. 1992,24 #5E,EI

  • "

    $

    IE

    5F :

    -.:-):.%():;

    :(M:1'c$$$

    3

    $$ (" ! !" $$ NMR Biomed. 2010,23 F#F5F,FE

    5+ 1 &: -: @

    9 -: " *: - -: * .::%3d!(8"4&)8 (8,'$ .$,Q ( 9 4$!Cancer Res. 2008,68 #G5,76

    5I 4!::

    -.::;

    9):;

    :1'9$,

    $$

    $%$!"$NMR Biomed. 201106665LGI5

    5G -%1:/:!:/

    %k

    $$ $Br. J. Surg. 2004,91 6#E6G,E5

    57 ;

    9 ) $$ $ $0 3 fAnn. Oncol. 2007,18 7#E,G

    5H !4):%";:!$9:1$;:;4:"9:$ : 1 : ($ 2:

    : / ):'49$$$$$$ ,

    $ $$Br. J. Surg. 2001,88 H#5EF,5F6

    E6 $:44:3!):-1):@":9 ": M$1:/3 : &(:$

    !1*)1., $ $ $ Nat. Med. 2001,7 5#7I,H

    E !:(3 -: M$1: 9": M ;:*$ 9:"$$ ;: &: 4 &: ! '$

    *)1.,4 $$ Nat. Med. 2001,7 5#H5,7

    E5 &>3 !: / @: / M: !3 ": >3 9:;M*$

    3$48$3

    3 $

    $$Neoplasma 2011,58 F#E,H

    EE !!:AB9:d;)8$

    3$4 $ 3 $ $

    3$$$$J. Obstet. Gynaecol. Res. 2011

    EF 4 " M: . (: 43 : 1 " : 1$ :4,1$ !: 1 : / ! " $$ $ ,

    $$The Lancet 1999,354 H75#7HI,H66

    E+ &(M:(

    @!:4M,@:-!):;$(;:!1:!:9$&':34:!$4":!$$$

  • "

    $

    IF

    EI : 4 : 4: M

    @: &

    $ 4 " !:$$4:"%:(.4!$ !3 ; / /,/ 4$9

    "$/% - J. Clin. Oncol. 2001,19 +# FI7,FG+

    EG !"":":;

    (:"&:")":.

    '

  • "

    $

    I+

    FH (c9:@M'Comparison of a new spectrum alignment algorithm with other methods566$44

    $!!2!!2566:5I6,5I+

    +6 (c9:@M:!M:"$M:1>>@),$3$

    Cancer Inform 2011,10I+,75

    + !

    ":;-:-M4$$,,3Anal. Chim. Acta 2010,659 ,5#5E,EE

    +5 @!:1/:":M!:@@:&$"PLS_Toolbox 4.0 for use with MATLAB )

    $ "$ '$0@

    $

    @566I

    +E &!:4,1:*,M:$!:*9:4>>9M c $$$$$-9$$

    NMR Biomed. 1992,5 F#GH,7F

    +F ;: 9 ;: 3 ?!"Magn. Reson. Med. 1997,38 5#GH,75

    ++ "*:"M!:9>M:!$$:-*:">,-$1:;:$

    ;:4:1$,!!; #((",!

    $ $3

    3$NMR Biomed. 2009

    +I "$4!:4M::1!:13M:4'!:M:1:*:4;:'.!$ $$ 0 /!", $ J Proteome Res 2011,

    +G 4 M: "$4 !: :1!:13 M:4 '!:M:1:*:4;: '. !$ ; 4$ 0 /!",!$2J Proteome Res 2010,10 #55,5E6

    +7 &(4:M:9$%:;3M:!$(:9!: ": *: 4 " 4: M !$@4)4$4Clin. Cancer Res. 2009,15 5#IGI,IG5E

    +H $ , Nature News F$566H

  • "

    $

    II

  • Paper I

  • Is not included due to copyright

  • Paper II

  • Analytica Chimica Acta 683 (2010) 111

    Contents lists available at ScienceDirect

    Analytica Chimica Acta

    journa l homepage: www.e lsev ier .com/ locate /aca

    Alignment of high resolution magic angle spinning magnetic resonance spectrausing warping methods

    Guro F. Giskedegrda,1, Tom G. Bloembergb,1, Geert Postmab, Beathe Sittera, May-Britt Tessema,Ingrid S. Gribbestada, Tone F. Bathena, Lutgarde M.C. Buydensb,

    a Department of Circulation and Medical Imaging, Norwegian University of Science and Technology (NTNU), Trondheim, Norwayb Radboud University Nijmegen, Institute for Molecules and Materials, The Netherlands

    a r t i c l e i n f o

    Article history:Received 19 May 2010Received in revised form 9 September 2010Accepted 16 September 2010Available online 24 September 2010

    Keywords:Peak alignmentWarpingNMRMetabolomicsPeak shiftsMultivariate analysis

    a b s t r a c t

    The peaks of magnetic resonance (MR) spectra can be shifted due to variations in physiological andexperimental conditions, and correcting formisalignedpeaks is an important part of data processingpriorto multivariate analysis. In this paper, ve warping algorithms (icoshift, COW, fastpa, VPdtw and PTW)are compared for their feasibility in aligning spectral peaks in three sets of high resolution magic anglespinning (HR-MAS) MR spectra with different degrees of misalignments, and their merits are discussed.In addition, extraction of information that might be present in the shifts is examined, both for simulateddata and the real MR spectra. The generic evaluation methodology employs a number of frequently usedquality criteria for evaluationof the alignments, togetherwith PLS-DA to assess the inuenceof alignmenton the classication outcome.

    Peak alignment greatly improved the internal similarity of the data sets. Especially icoshift and COWseem suitable for aligning HR-MAS MR spectra, possibly because they perform alignment segment-wise.The choice of reference spectrum can inuence the alignment result, and it is advisable to test severalreferences. Information from the peak shifts was extracted, and in one case cancer sampleswere success-fully discriminated from normal tissue based on shift information only. Based on these ndings, generalrecommendations for alignment of HR-MAS MRS data are presented. Where possible, observations aregeneralized to other data types (e.g. chromatographic data).

    2010 Elsevier B.V. All rights reserved.

    1. Introduction

    Nuclear magnetic resonance spectroscopy, or just magneticresonance spectroscopy (MRS) in a medical context, is a highlyreproducible and robust technique for examining the metabolicproles of uids or tissue specimens. By using high resolutionmagic angle spinning (HR-MAS) MRS, intact tissue samples canbe analysed while peak broadening caused by anisotropic inter-actions is reduced [1]. The result is well-resolved spectra in whichthe metabolites are represented by sharp peaks. The peak posi-tions in an MR spectrum may, however, be shifted, or misaligned,among spectra in a data set. In general, two types of misalignmentare conceivable: non-systematic and systematic misalignments.Non-systematic misalignments can be caused by differences intemperature, intermolecular interactions and other variations dueto imperfect control of experimental conditions [26], while sys-tematic misalignments contain information about the biological

    Corresponding author. Tel.: +31 24 3653180; fax: +31 24 3652653.E-mail address: [email protected] (L.M.C. Buydens).

    1 These authors contributed equally.

    origin of the sample. It has for instance been shown that tumour tis-sue has a lower pH than normal tissue, possibly due to theWarburgeffect [7]. A different pH or ionic strength of samples inuences theionization state of basic or acidic groups and thus their associatedchemical shifts [4,8,9].Metaboliteprotein interactions are anotherpossible source of misalignment [10] which is especially impor-tant to consider when dealing with HR-MAS data of whole-tissuesamples. In general, chemical interactions between substances anddifferent background matrices might provide circumstantial evi-dence for differences between samples by systematic changes inchemical shifts [4,10,11].

    Misalignments between corresponding peaks will affect multi-variate analysis of the data. Therefore, it is generally recommendedto correct for them [3,5,8,12]. Minor misalignment problems canbe overcome by binning the data (typically using a bin widthof 0.04ppm), or by using more sophisticated peak alignmentalgorithms. An important disadvantage of binning is the loss ofresolution and the resulting loss of interpretability [9]. For majormisalignments, binning is not a feasible approach due to the result-ing loss of resolution, and alignment would be preferable. Severaldifferent alignment methods exist. Amongst these, the so-calledwarping methods are most prominent [1319], but other methods

    0003-2670/$ see front matter 2010 Elsevier B.V. All rights reserved.doi:10.1016/j.aca.2010.09.026

  • 2 G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111

    Table 1Characteristics of alignment algorithms.

    Alignment method Optimization criterion Optimization method Alignment unit Aligns by Parameters to optimize

    Icoshift Correlation per segment Cross-correlation by FFT Segments Shifting 2COW Total correlation Dynamic programming Segments Stretching/shrinking 2Fastpa Correlation per segment Beam search Segments Shifting and stretching/shrinking 3VPdtw L1 norm Dynamic Programming Points Shifting 2PTW Weighted cross-correlation NelderMeada simplex [46] Complete spectrum Polynomial model 2

    a Default, different optimization algorithms are available.

    have been described as well [6,2022]. Most algorithms have theirroots in chromatography, but some were specically developed forMRS. It is not clear, however, which algorithm is the best choicefor aligning HR-MAS MRS data and whether the methods origi-nating in chromatography are indeed less suited for this type ofdata.

    In this study, we investigated the suitability of ve differentwarping algorithmsicoshift [13], COW [14,15], fastpa [16], VPdtw[17], and PTW [18]for aligning HR-MAS MRS data. Of these,icoshift and fastpa were developed specically for MRS data. Theperformances of COW and PTW have previously been comparedfor chromatographic data [23] and capillary electrophoresis (CE)data [24]. The performance of VPdtw for chromatographic datawasbriey compared with that of PTW in the original VPdtw paper[17], and the original icoshift [13] paper discusses comparisonswith COW and a number of fastpa-related methods for MRS data.We used three different cancer-related HR-MAS MRS data sets forevaluation, all containing samples from two distinct classes. Allthree data sets represent complex biological samples with varyingdegrees of misalignments. The various algorithms are compared,and their pros and cons will be discussed in this paper. Becausethere is no gold standard for assessing alignment quality, the align-ments were evaluated with a number of commonly used criteriadescribing the similarityof the spectra andquantifying their changedue to alignment. In addition, the data were classied using partialleast squares discriminant analysis (PLS-DA) [25,26] to investigatethe effect of alignment on the classication outcome. Althoughwe limited our evaluation to MRS data, the presented evaluationmethodology is generic and equally valid for other types of data.

    Apart from the warping algorithm, the spectrum to use as thereference for aligning might inuence the end result. Therefore, inall evaluations a number of different references were consideredand their inuence will be discussed.

    A possible drawback of correcting for misalignments is that anyinformation that might be present as systematic misalignments inthe chemical shifts is lost from the spectra. In that case, correc-tion via alignment or binning might be counterproductive. At thesame time, it may be possible to align the spectra while extractingpotential shift information from the warping path that describesthe transformation from unaligned into aligned spectra. This pos-sibility will be discussed in this article.

    To evaluate the effect of alignment on data that display system-atic shifts, a number of simple data sets were simulated in whichclass information was added as intensity differences, shift differ-ences or a combination of the two. These data were aligned, andclassication was performed on both the raw and aligned data,as well as on the shift information (i.e. the coefcients resultingfrom the alignment procedures) and to a combination of thesewiththe aligned spectra. The insights from this procedure were subse-quently used in an attempt to enhance the classication results forthe real data.

    Based on the results from this study, general recommendationsfor choosing an alignment method for HR-MAS MRS data and get-ting the optimal alignment are described. The validity of theseresults and recommendations for other types of data will be dis-cussed.

    2. Experimental

    2.1. Description of the alignment algorithms

    Five different alignment methods were used in this study, andwill be elaborated here. Characteristics of the different alignmentalgorithms are summarized in Table 1. Fig. 1 shows typical warpingpaths, or warping functions, (the new x-axes as a function of the oldx-axis) for all ve methods. For clarity, the differences between thenew x-axes and the old x-axis are drawn, rather than just the newx-axes.

    2.1.1. Interval correlated shifting (icoshift)Interval correlated shifting was developed specically for MRS

    data [13]. It divides spectra into segments, and aligns these to thecorresponding segments of a reference spectrum. The alignment isperformed by shifting the segments sideways so as to maximizetheir correlation. In practice, this involves calculating the cross-correlation between the segments by a fast Fourier transform (FFT)engine that aligns all spectra of a data set simultaneously. The seg-ments can be user-dened or of constant length. Missing parts onthe segment edges are either lled with missing values, or byrepeating the value of the boundary point. The maximum shift cor-rection of the segments can either be equal to a constant denedby the user, or the algorithm can search for the best value for eachsegment [13]. Icoshift is available as a tool forMatlab fromRef. [27].

    2.1.2. Correlation optimized warping (COW)Correlation optimized warping [14,15] is another segmented

    warping method. It aims to optimize the overall correlationbetween two spectra. The spectra are aligned by shrinking orstretching the segments, rather than by shifting them as in icoshift.

    Fig. 1. A comparison of the warping paths of the different alignment methods.Warping paths depicting the ne structures of the different alignment methodsare shown. The same query and reference spectra were used for all methods. Forclarity, the y-scale is set to the difference between the warping paths proper andthe original x-axis.

  • G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111 3

    Another difference with icoshift is that the optimization takes allsegments into account instead of aligning each segment separately,i.e. stretching a segment causes subsequent segments to shift. Themaximum allowed change in segment length is determined by theso-called slack parameter dened by the user. In addition, the usermust specify the segment length.

    The alignment is performed using a dynamic programmingalgorithm [28]. The algorithm uses linear interpolation to createstretched (or shrunken) versions of the individual segments withinthe limits determinedby the slack parameter. It calculates the sumsof the individual correlation coefcients for all combinations of thestretched segments and picks the combination that leads to thelargest sum (and hence the largest overall correlation) to constructthe aligned spectrum. As all possible combinations are considered,dynamic programming will always yield the global optimum forthe chosen parameters [14,15].

    The slack and segment length parameters of COW can be opti-mized by a discrete simplex-like optimization routine described bySkov et al. [29]. An optimization space for both parameters must bespecied, and the initial search is dened by a 55 grid in bothparameter directions. Each of the 25 parameter combinations isevaluatedbycalculating the sumof the simplicityvalueandaveragepeak factor (see Section 2.5) of the corresponding trial alignmentof the data set. By default, the three best combinations are used asstarting points for further simplex optimization. COW is availableas a tool for Matlab from Ref. [27].

    2.1.3. Peak alignment by beam search (fastpa)Like icoshift, peak alignment by beam search was developed for

    MRS data [16]. It also divides the spectra into segments, but alignsthese by both shifting and stretching/shrinking them to maximizetheir respective correlations. Fastpa is based on a routine by For-shed et al. [30] where the segments are chosen automatically toavoid cutting in a peak. However, instead of Forsheds genetic algo-rithm, fastpa uses a faster beam search [31,32] as the optimizationroutine for nding the optimal alignment. This change of opti-mization algorithm is possible because the segments are alignedindependently, as opposed to COW.

    Fastpa requires three input parameters to be specied: themax-imumnumber of segments, themaximumrange of shifting, and themaximum range of stretching or shrinking. In addition, the beamwidth k [31,32] has to be specied as either 1 or 2. From the view-point of optimization, a larger beamwidth is alwayspreferable [32],and we considered k to be constant at a value of 2.

    After choosing segments [30], the algorithm starts by adaptingan initial trial solution of stretches and shifts for the individual seg-ments. The 2 best adaptations are used as the next trial solutionsin the algorithm. This is repeated until the optimal solution withinthe beam search space is found [16]. Fastpa is available as a Matlabtool upon request from the authors [16].

    2.1.4. Variable penalty dynamic time warping (VPdtw)Dynamic time warping (DTW) [33] is generally considered to be

    the rst full-edged warping method that has been developed. Itworks by shifting individual points of the query spectrum, ratherthan complete segments, as in icoshift. Many different sets of rulesexist for allowed shifts [33,34]. Variable penalty DTW is a recentimplementationof asymmetricDTW[17]. Insteadof optimizing thecorrelation between the spectra, VPdtw tries to optimize the L1norm, i.e. the sumof the absolute differences between the variablesin the spectra.

    Regular DTW is notorious for causing artifacts in aligned data,by allowing too many shifts [17,35]. The variable penalty in VPdtwaims to prevent these from occurring by adding a penalty to theL1 norm for each shift. Clifford and Stone [17] propose to use amorphological dilation (i.e. a running maximum) of the reference

    Fig. 2. Breast cancer data. A representative HR-MAS MR spectrum from the breastcancer data set. The inset shows the misalignment for the peaks between 3.18 and3.30ppm. a.u., arbitrary units.

    spectrum as a penalty. This results in a high penalty being addedto the L1 norm for a shift at or near the position of peaks, whereasin a baseline region, it would result in almost no extra increase. Amaximum allowed shift and the penalty must be specied by theuser [17]. VPdtw is available as a package in R from Ref. [36].

    2.1.5. Parametric time warping (PTW)Rather than point-wise shifting, or dividing the spectra into seg-

    ments that can subsequently be shifted and/or stretched, PTW [18]explicitly produces a global polynomial model (the warping func-tion) of the misalignment:

    w(t) =K

    k=0akt

    k

    The rst two coefcients, a0 and a1, in the warping functioncan readily be interpreted as an overall shift and stretch/shrinkage,respectively. Further coefcients correspond to higher orderstretching or shrinking that are useful tomodel changes in themis-alignment along the retention time axis. Bloemberg et al. recentlyproposed to use the weighted cross-correlation (WCC) [37] as theoptimization criterion in PTW [19]. In their implementation, theuser has to specify the order of the warping function and the widthof the triangular weighting function for the WCC.

    The continuity and smoothness of the PTW warping functionimply that PTW does not lead to artifacts like decapitated peaks.In the absence of many high-order terms, the polynomial modelmakes PTWa somewhat restrainedmethod. Thismeans that itmayhave difculties in correcting strongly nonlinear misalignments,but also that overtting is very unlikely to occur [18,19]. PTW isavailable as a package in R from Ref. [38].

    2.2. Data

    Three different cancer-related data sets were used in this study:data fromcervix, breast, and colon tissue. All three data sets containdata from two biologically distinct classes of tissue. The data setsrepresent different degrees of misalignment: the cervical cancerdata display minor misalignments, colon cancer has major mis-alignments and the breast cancer data show something in between.Fig. 2 shows the MR spectra from the breast cancer set as an exam-ple. In addition to these HR-MASMRS data sets, simulated data setswith varying degrees of misalignment were generated.

  • 4 G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111

    Fig. 3. Classication results of simulated data. The results are the averages of three replicates. (a) Class information is contained in the intensities; (b) class informationis contained in the peak shifts; (c) class information is contained in both intensities and shifts. Abbreviations: Unal.: unaligned; coef.: coefcients; comb.: combined; %CC:percentage of correctly classied samples.

    2.2.1. Simulated dataA large number of simple data sets were simulated. Each set

    consists of 100 spectra50 of class 1 and 50 of class 2with aspectral width of 1000 variables and two peaks only of width 10points. Bivariate class information was added as intensity differ-ences (mean differences ranging from 1% to 10% of peak height),shift differences (means ranging from 1 to 25 points) or a com-bination of the two. For each set, six copies were produced withincreasing non-systematic shifts (see Fig. 3), but exactly simi-lar simulation parameters otherwise. Furthermore, all simulationswereperformed in triplicate; all reported results are the averages ofthe results obtained on three sets with similar settings for the ran-domly generated normal distributions that were used to generaterealistic intensity and shift distributions.

    2.2.2. Cervical cancer dataThisdata set is fullydescribed inRef. [39]. In short, cervical tissue

    samples (n=16) were collected after hysterectomy of cervical can-cer patients (n=8) and patients with non-malignant disease (n=8).The samples were analysed by HR-MAS MRS on a Bruker AvanceDRX600, using a water and lipid suppressing spin-echo sequence(cpmgpr, Bruker BioSpin GmbH, Germany). All experiments wereperformed at room temperature and without buffering. The chem-ical shifts were referenced to the lactate doublet at 1.32ppm,and the spectral region between 4.7 and 0.5ppm was saved in amatrix of 163736 variables. The spectra were baseline correctedusing asymmetric least squares [18] with parameters =1e5 andp=0.0001, and the minimum value of each spectrum was set tozero by subtracting the lowest value. The spectra were normalizedto equal total area.

    2.2.3. Breast cancer dataThis data set is fully described in Ref. [40]. Breast cancer tissue

    samples (n=208)wereexcised fromestrogen receptor (ER)positive(n=161) and negative (n=47) patients. The samples were analysedby HR-MAS MRS using a cpmgpr sequence. All experiments wereperformed at 4 C, and the samples were buffered with phosphate-buffered saline (PBS). Chemical shifts were referenced to the TSPpeak at 0ppm. The spectral region between 4.8 and 0.6ppm, rep-resented by 8251 variables, was extracted for further analyses. Thespectra were baseline corrected by subtracting the lowest value ofeach spectrum, and normalized to equal total area.

    2.2.4. Colon cancer dataColon tissue samples (n=32) were excised from the tumour

    area (n=17) and normal mucosa (n=15) of colon cancer patients,and the samples were analysed by HR-MAS MRS using a cpmgprsequence. These samples are part of a larger patient cohortdescribed in Ref. [41]. All experiments were performed at 4 C,

    and the samples were buffered with phosphate-buffered saline. Inorder to induce random misalignments in the data, this data setwas not chemical shift referenced. The spectral region between 4.8and 0ppm, represented by 9661 variables, was extracted for fur-ther analyses. The spectra were baseline corrected by subtractingthe lowest value of each spectrum, and normalized to equal totalarea (excluding polyethylene glycol pollution at 3.71ppm).

    2.3. Alignment of simulated data

    All simulated data sets were aligned using the ve warpingmethods and alignment was performed using the rst spectrumas the reference. Icoshift was set to align the data in two segments,whereas PTW was set to align using a linear warping function, cor-responding to an overall shift and stretch. Unexpectedly, COW raninto memory problems when the segment length was chosen to behalf the spectralwidth (500points), and the segment lengthwas setto one tenth of the spectral width (100 points). VPdtw was unableto produce well-aligned data consistently and the fastpa algorithmwas too unstable in its current form to allowhigh-throughput anal-ysis of a large number of data sets.

    The obtained warping coefcients correspond to two (integer)shifts for icoshift (one shift coefcient per segment), a shift and astretch coefcient for PTW and ten segment endpoints for COW.For classication purposes, the icoshift coefcients were used asis, whereas the stretch coefcient for PTW was multiplied by thenumber of data points (1000) after subtracting 1 (the default forno alignment) from it, as described in Ref. [19]. In this way, theshift and stretch coefcients are on comparable scales. For COW,the original segment endpointswere subtracted from the newend-points, so as to provide the differences between them.

    2.4. Alignment of real data

    The cervix, colon, and breast cancer data setswere aligned usingthe ve different warping methods, as described below. Ten differ-ent reference spectrawere used subsequently for aligning the data;this in order to examine the inuence of choosing different refer-ences and also to examine the robustness of the warping methods.Four spectrawere chosen from each of the two classes in a data set:two randomly chosen ones and the two spectra having the high-est average correlation with the other spectra in the data set. Inaddition, themeanand themedian spectrawereused as references.

    2.4.1. IcoshiftThe optimal number of segments for icoshiftwas determined by

    visual inspection of trial alignments and by the average overall cor-relation, and ranged from 20 to 150 for the different data sets andreferences. The maximum allowed shifts were determined by the

  • G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111 5

    algorithm. For somematrices this led to artifacts, and themaximumallowed shift was manually determined instead. A full spectrumcorrectionwasperformedprior to alignmentof the segments.Miss-ing parts on the segment edges were replaced by repeating thevalue of the boundary point. Both user-dened segments and seg-ments of constant length were tested.

    2.4.2. COWParameters for COW were determined using the optimization

    routine by Skov et al. [29]. The search interval for segment lengthwas based on the average peak width, as suggested by the authors,and the slack size search space ranged from 1 to 15. The searchspace was increased if the limit values were chosen as the opti-mal parameter. Optimal segment length ranged from 30 to 300variables.

    2.4.3. FastpaFastpa parameters were optimized by visual inspection of trial

    alignments, and the overall average correlation. The spectra werezero-padded prior to alignment to avoid cutting off peaks at thespectrum edges. This was also done because fastpa does not alignthe last segment of the spectra. The ranges of segment number,sideways movement and interpolation were 40170, 10150 and10100, respectively.

    2.4.4. VPdtwAlignments were performed on the normalized data, after addi-

    tional square-root scaling, as this turned out to provide betteralignment results than for unscaled data. The resulting warpingpaths were applied to the normalized data. The penalties that wereused for the alignment were morphological dilations of the data, asdescribed in Ref. [17]. Penalties were determined by trial and erroruntil satisfactory alignment results were produced. The maximumallowed shift (thewidth of the Sakoe-Chiba band [33]) ranged from100 to 300 variables.

    2.4.5. PTWPrior to alignment, the data were zero-padded, as described

    in Ref. [19]. The resulting warping coefcients were transformedaccordingly and applied to the unpadded data. Triangle widths forthe weighted cross-correlation measure were on the order of thelargest misalignment in the data, as determined by visual inspec-tion and ranged from 2 to 100 variables.

    2.5. Evaluation criteria

    The alignment results were assessed based on different mea-sures:

    2.5.1. CorrelationThe spectra of a data set will be more uniform after successful

    alignment, and thereby have a higher correlation. The correlationbetween all the spectra of a data setwas calculated before and afteralignment.

    2.5.2. Simplicity valueThe simplicity value is related to principal component analysis

    (PCA) by singular value decomposition (SVD) of a matrix, wherethe singular values state how much variance is explained by eachcomponent. Aligned spectra will have more variance explained bythe rst components. The simplicity value of a matrix is dened asthe sum of all singular values of the matrixscaled to a total sum ofsquares of onetaken to the fourth power, and will be larger when

    more variation is explained by the rst components [29].

    Simplicity =SVD

    Xi

    j

    xij2

    4

    2.5.3. Peak factorThe peak factor gives an estimate of how much the area and the

    shape of the peaks have changed in a spectrum after alignment. Itcompares the Euclidian length, or norm, of a spectrum before andafter alignment. If the peak area and shape stay almost the same,the difference between the norms before and after alignment willbe small [29]. The optimal value for the peak factor is 1, meaningthat there is no change in peak shape.

    Peak factor =I

    i=1(1 min (ci,1)2)

    I

    where

    ci =norm(xi,after) norm(xi,before)norm(xi,before)

    2.5.4. Classication

    Correcting for misalignments should improve the classicationresults for data sets distorted by random shifts. However, it isalso possible that information arising from biological differencesbetween different classes may be removed when the spectra arealigned. In PLS, latent variables (LVs) are derived to maximize thecovariance between the spectra and a quantity to bemodelled. PLS-DA is a special case of PLS that attempts to discriminate betweenclasses, represented by discrete numbers. Here, PLS-DA was usedto evaluate the classiability of aligned and unaligned data. In addi-tion, the warping path or warping parameters were used as inputto investigate possible shift information.

    2.5.5. Visual inspectionQuantitative measures are valuable means for comparing spe-

    cic characteristics of large sets of data at a glance, but they alsohave their limits. Thehumaneye andbrain are still unsurpassed as apattern recognition tool. In the context of alignment, especially theassessment of alignment quality and detection of artifacts benetfrom visual inspection.

    2.6. Classication of simulated sets

    Eachdata setwas classiedusingPLS-DA.Classicationwasper-formed on the unaligned spectra, the aligned spectra, the warpingcoefcients and a combination of the aligned spectra and the coef-cients. The latterwas achieved by simply concatenating the spectrawith the coefcients andmultiplying the latterwith a large number(on the scale of the average-scaled spectra, typically 100 was used)to make sure they would be contained in the rst latent variablesof the PLS model.

    PLS-DA, including mean centering, was performed using fullleave-one-out cross-validation (LOO-CV), and the number of LVsgiving the rst minimum in prediction error was chosen for themodel. PLS-DA was performed in Matlab 7.7.0.471 (R2008b, TheMathworks, Inc., Natick, USA) using PLS toolbox 5.5.1 (EigenvectorResearch, Wenatchee, USA).

  • 6 G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111

    2.7. Classication of real data

    The data sets were classied using PLS-DA, as described forthe simulated data. Classication was performed on the unalignedspectra, the aligned spectra, the warping coefcients, and acombination of aligned spectra and coefcients (multiplied by100).

    Pollutions from ethanol and fatty residuals, and from polyethy-lene glycol for the colon cancer data, were removed from thespectraprior to classication. Singleoutlying spectrawere removedfrom the colon cancer and breast cancer data sets. For the breastcancer data, the spectra were square-root scaled prior to analysis.

    3. Results and discussion

    3.1. Simulated data

    Fig. 3a shows the average classication results for the setsin which classes are coded as intensity differences. Fastpa andVPdtw were not capable of correctly aligning the simulated data.As expected, classication rates for unaligned data decreased asrandom misalignment increased. Aligned data, on the other hand,delivered stable classication results. The warping coefcients didnot contain class information, as expected, since the simulatedshifts were completely random.

    The average classication results for three data sets where classinformation is purely contained as peak shifts are depicted inFig. 3b. As expected, the situation here was completely oppositeto the previous one. The raw data gave better results than thealigned data; alignment removes the information of interest fromthe spectra. Now, the coefcients do contain information and inthis case the coefcients give even better classication results thanthe raw data themselves. This is most likely due to less noise beingpresent in the coefcients than in the data. Because classication isbased on intensities, the intensity noise in the unaligned data willhave a negative inuence on the result. By extracting the positionalinformation of the peaks into thewarping coefcients, it effectivelybecomes available as information without the intensity noise thatwas present in the spectra.

    When class information is present in both the shifts and theintensities, asmight be expectedwith real data, the situationwill besomewhere in between the two extremes discussed above. Whereexactly depends on the data at hand. In that respect, any simulationis rather arbitrary and we should not over-interpret the results. Itis clear from the example in Fig. 3c however, thatif discrimina-tion is the main interestthere are situations in which aligned dataon their own can be a sub-optimal choice as input for multivariateanalyses when there is information present in the shifts. In thesecases, for icoshift and PTW, the combination of aligned data withthe warping coefcients delivered the best classication results.The COW results for the combination of data and coefcients werea lotworse than thoseof theother twomethods; thismayhave todowith the sub-optimal parameter settings that were used to preventthe program from running into memory problems. Furthermore, itis likely that the good results for PTWaredue to the simplicity of thedata and the resulting suitability of a warping function of degree 1for modelling the misalignments. Bearing in mind the conclusionsfrom Refs. [23,24], it is to be expected that individual shifts in morecomplexMRdata cannotbemodelledverywellwith theglobal PTWmodel. Icoshift and COWaremore likely to extract positional infor-mation in a way that is suited for multivariate analyses, althoughthe cumulative character of the COW warping path might smearout misalignment information over several segments, making itharder to interpret.

    3.2. Real data

    3.2.1. Correlation and simplicity valueA plethora of similarity and distance measures are used as opti-

    mization criteria in different alignment algorithms. Icoshift, COWand fastpa are all optimized using correlation as a criterion. DTW isavailable with various distance measures [34] and VPdtw employsthe L1 norm as a criterion for optimization. PTW optimization wasoriginally based on the rootmean square difference (RMS) betweenspectra [18], but the current implementation uses the weightedcross-correlation as a similarity measure [19]. There is still no gen-erally accepted gold standard measure for assessing alignmentquality. However, the combination of simplicity value and peakfactor introduced by Skov et al. [29] is an interesting choice. Bothmeasures, together with the correlation, were used to assess align-ment quality in this study. In addition, the RMS and WCC criteriawere examined, but these measures did not provide extra infor-mation. It should be kept in mind that methods optimizing thecorrelation will very likely be biased towards that measure andit is not certain that the results for the simplicity value will becompletely independent.

    Fig. 4 shows box plots of the mutual correlations between allsamples in the three HR-MAS data sets, before alignment and afteralignment with each of the ve warping methods for ten differentreferences. It is clear that for all methods, the correlation distri-butions of the aligned data are better than for the unaligned data.This is especially pronounced for the colon cancer data set whichhas the largest misalignments. Here, all warping methods greatlyimproved the correlations, with PTW scoring lower than the othermethods. For the cervical cancer data, where the unaligned datadisplayed only minor misalignments, VPdtw resulted in the lowestcorrelation values, while the other methods performed compara-ble.

    The simplicity values for the raw and aligned data in Fig. 4 con-vey the same general picture as the correlations. There are somedifferences, however. The most striking ones are the PTW andVPdtw results for the colon cancer data set. When looking at thecorrelations, the PTW correlations are clearly lower than the onesfor icoshift, COW, and fastpa, and comparable to the VPdtw correla-tions. The PTWsimplicity values, however, are comparable to thoseof icoshift, COW, and fastpa, whereas the VPdtw simplicity valuesare much lower. The simplicity value is inuenced by the intensi-ties of thepeaks, andpeakswithhigh intensities inuence the valuemore than low intensity peaks. The colon cancer data displayed ahigh intensity peak at 3.71ppm, resulting frompolyethylene glycol.When the peak was removed from the data set, the simplicity val-ues were more in accordance with the correlations. This exampleshows a weakness of the simplicity value, and it might be advisableto scale the data prior to simplicity calculations.

    Fig. 4 also shows that the choice of reference can have a largeinuence on the alignment result for one-dimensional HR-MASMRspectra, contrary to the observation made in Ref. [21] for LCMSdata. The breast cancer data generally provided stable results, butdemonstrated that a bad choice of reference had a larger inu-ence on the correlations than the particular warping method thatis used. Icoshift, COW and fastpa, which are all segmented warpingmethods, appeared to be less inuenced by the choice of reference,whereas PTW gave worse results than the unaligned data for someof the randomly chosen reference spectra. It is therefore advisableto try different references when aligning. Using the spectrum thathas the highest average correlation to the other spectra does notseem to be a bad choice; however, it does not always give the opti-mal alignment. For data sets consisting of two or more classes,it is conceivable that the alignment will be affected by the classthe reference belongs to. Trying references from both classes maytherefore be advisable. Using the mean or median spectrum as a

  • G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111 7

    Fig. 4. Correlation and simplicity values for different alignment methods. In all plots, for each method, results from 10 different reference spectra are shown. From left toright: the spectrum having the highest average correlation with all other spectra, the (on average) second most highly correlated spectrum from the same class, and tworandom spectra from the same class; the most highly correlated spectrum, the second most highly correlated spectrum, and two random spectra from the other class, andthe overall mean and median spectra. The box plots show the distributions of correlation values for the data sets; the red box stands for the unaligned data. The scatter plotsshow the simplicity values of the data sets; the red line depicts the simplicity value of the unaligned data. (a) Correlations of the cervical cancer data; (b) simplicity valuesof the cervical cancer data; (c) correlations of the breast cancer data; (d) simplicity values of the breast cancer data; (e) correlations of the colon cancer data; (f) simplicityvalues of the colon cancer data.

    reference is also an option. This may not be a good choice for datasetswith bigmisalignments though, as themean/median spectrumwill have broad peaks and may not resemble a real spectrum. Thiscan be overcomebyusing an iterative procedure, i.e. by aligning thedata and then recalculating the reference spectrum. Thiswas tested

    for the colon cancer data in this study, and the resulting alignmentsthen resembled those for the other references (results not shown).

    In some cases, similarity measures can give the wrong impres-sion of the alignment quality. An example of this is when peaks arebadly deformed in order to give a high correlation, as in unpenal-

  • 8 G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111

    Fig. 5. Peak factors for different alignment methods. Results from the same 10 different references as in Fig. 4 are shown (a) peak factors for the cervical cancer data; (b)peak factors for the breast cancer data; (c) peak factors for the colon cancer data.

    ized DTW [17,35]. Other examples were encountered when usingicoshift: when using segments of constant length, the segmentedges would sometimes be located in a peak, leading to majorartifacts in the peak shapes, while the correlation of the spec-tra remained high. For some parameter choices for icoshift, thepeaks were displaced to the wrong position, but again correlationsremained high. Therefore, visual inspection of the data after align-ment remains of the utmost importance. This will however put alimit to the complexity of the data of interest. While HR-MAS spec-tra of tissues have quite well-dened peaks, MR spectra from uidsmay bemore crowded,making visual inspection of the aligned datachallenging. For these data, other methods may be more suitable,for instance the one described by Alm et al. [6].

    3.2.2. Peak factorPeak factor calculations for the different warping methods are

    shown in Fig. 5. Thedifferences between themethods are small, andoverall, all methods performed well. Icoshift and VPdtw gave thehighest peak factors. This result was as expected for icoshift, as itonly shifts segments of the spectra, as opposed to fastpa, COW andPTW that do shrinking and stretching. For VPdtw, the high peakfactors are noteworthy, given that DTW has a history of stronglydeforming peaks. Clearly, the variable penalty of VPdtw does whatit is intended to do.

    COWsparameters are optimizedbasedonpeak factor, andCOWgave the best results of the warping methods that do shrinking andshifting. On average, fastpa had the lowest peak factor values, andwas thus the method that changed the data most after alignment.This is also obvious from the warping functions in Fig. 1; fastpatypically had the most extreme warping path. As for correlationand simplicity value, icoshift and COW provided stable results fordifferent reference spectra. The results for fastpa and PTW variedmore, and it appears that the choice of reference is more critical forthese methods.

    3.2.3. Classication resultsThe classication results for unaligned and aligned data are

    shown in Fig. 6. For the cervical cancer data, the unaligned dataalready gave good results, with only one out of 16 samples mis-classied. In general, the aligned cervical cancer data gave betterclassication results for all methods. For the breast cancer data,the average classication results, based on the 10 different refer-ences, improved for all methods except PTW. Here, the use of somereference spectra improved the classication results while othersgave worse results. Despite the fact that both the correlation andthe simplicity value of the colon cancer data greatly improved byaligning, the classication results did not improve. After alignmenttheaverageclassication resultsdecreasedbyoneor twoadditionalsamples. It is not unlikely that the results for the unaligned data arebetter simply by chance. In theory, it is also possible that peakshavebeen aligned to the wrong peaks of the reference spectrum. This is

    unlikely however, as visual inspection of the alignment results gaveno indication of wrong alignments.

    Another possibility is that the shifts of the colon cancer datacontained information, resulting from systematic misalignmentsof the spectra. Information present in the shifts would get lost afteralignment. This was investigated for all the data sets by classi-cation of the warping parameters and a combination of spectraand parameters. It should be noted that the prediction error fromcross-validated PLS-DA was based on the same data set as the oneused for choosing the optimal number of LVs. Thus, the predictionerrorwill be slightly biased towards values lower than 0.5, and onlyresults that differed strongly from 0.5 were considered important.Classication of the warping parameters alone did not give reliablepredictions for thebreast cancer and the colon cancer data. Further-more, combining the warping parameters with the spectra did notimprove classication. As previously described, cancer tissue canhave a different pH than normal tissue, and the pH of a sample is animportant source of shift variation. For the breast cancer samples,there are no established hypotheses for pH differences between ERpositive and negative samples, and the results were as expected.For colon cancer, the samples in the data set were from either nor-mal or cancer tissue. Therefore, differences in pH are more likely,even though the samples were buffered prior to HR-MAS analysis[42]. However, it is likely that shift information that might havebeen present in the data was masked by the major random shiftsof the data set, similar to what is shown in Fig. 3b and c for thesimulated data.

    For the cervical cancer data, the results clearly indicate that thewarping parameters of icoshift, COWand fastpa contain class infor-mation. Classication of the parameters gave an average correctclassication of 87%, 84% and 76% for icoshift, COW and fastpa,respectively (Fig. 6d). Combining the spectra with the parameterswas not benecial for the overall classication. Thus, the informa-tion fromtheparameterswas redundant.Nevertheless, the fact thatshift information alone can discriminate between normal cervicaltissue and cancerous tissue is very interesting. The cervical sampleswere analysed by HR-MAS without buffering, and therefore pH dif-ferences related to cancer-induced changes in tissue may be morepronounced here than for the colon cancer samples. So despitethe redundancy of the shift information for the cervical data, thisresult indicates that measuring biological samples without buffer-ing might reveal biologically relevant differences that would beobscured otherwise.

    It is not hard to see why icoshift and fastpa provided warpingcoefcients that are suitable for subsequent multivariate analysis.These MRS-oriented methods align their segments independently;therefore corresponding shifts will always occur at the same posi-tion in the spectra. Opposed to that, for VPdtw, the effect ofwarpingis cumulative, and the actual stretching occurs at slightly differentplaces for different spectra even if they have similarmisalignments(results not shown). The alignment of COW is also cumulative, but

  • G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111 9

    Fig. 6. PLS-DA classication results for different alignment methods. Results from the same 10 different references as in Fig. 4 are shown. The red line denotes classicationresults of the unaligned data (a) classication results of the cervical cancer data; (b) classication results of the breast cancer data; (c) classication results of the colon cancerdata; (d) classication results of the cervical cancer warping coefcients.

    its segmented nature largely prevents it from showing many localdifferences. Thus, it is not surprising that COWs warping coef-cients for the cervical cancer data also led to good classicationresults. Both for VPdtw and COW, classication was also attemptedusing the cumulative sums of their warping functions instead, to(further) alleviate the local differences, but this did not improve theresults. For PTW, alignment was performed using a quadratic func-tion, and it can be assumed that the relevant shifts in the spectrawere too complex to be modelled well by this function.

    To summarize, the results presented here show that alignmentof the data using the warping methods examined in this work notalways improves the classication results compared to unaligneddata. However, alignment improved the interpretability of the

    resulting model by providing less ambiguous loading proles forPLS-DA, similar to the observations in Refs. [4345]. This is espe-cially important in situations where discriminating between twoclasses is not the only interest, but where one is also interested inlooking at the differences in metabolic proles to interpret biologi-cal incidences in the tissue. For that purpose, alignmentwill alwaysbe preferable.

    3.2.4. AlgorithmsTable 2 summarizes the evaluations of the different warping

    algorithms. Overall, icoshift and COW gave good alignment resultsand preserved the peak shapes. For COW, the optimization rou-tine has a large part in this. Icoshift required quite some manual

    Table 2Evaluation of alignment methodsa.

    Alignment method Programmingstability

    Memoryefciency

    Speed Optimization ofparameters

    Peakconservation

    Artifact- free Alignmentqualityc

    Icoshift + + ++ 0 ++ +COW + + +b ++ ++Fastpa + 0 0 0 +VPdtw + + + 0 + 0 0PTW + + 0 0 0 ++ 0

    a ++, very good; +, good; 0, moderate; , improvement advisable; , improvement necessary.b COW parameters were optimized to conserve peak shape prior to alignment.c The alignment quality is assessed based on the end result after optimization of the parameters.

  • 10 G.F. Giskedegrd et al. / Analytica Chimica Acta 683 (2010) 111

    Table 3Benchmarks. Time consumptions for alignment of the breast cancer data set usingthe same reference spectrum.

    Alignment method Time (s)

    Icoshift 3.73COW 292Fastpa 87.0VPdtw 42.8PTW 149

    optimization, but this was not considered a problem because ofits speed, and the end results were satisfactory. For the parame-ter combinations allowed by the algorithm, fastpa also gave goodalignment results, but at the expense of larger peak shape changes.This is not surprising when looking at Fig. 1: fastpas warping pathswere typically more extreme than those of the other algorithms.The segmentations of the spectra offered by fastpa and its prede-cessor [30]were typically good, however. A combinationof thispartof the algorithm with the alignment power of COW or icoshift maybe worthwhile. Both VPdtw and PTW delivered variable results.The variable penalty in VPdtw clearly prevents the algorithm fromdeforming the spectra, but optimizing the resulting alignments isnot trivial. The polynomialwarping function of PTW is probably notexible enough to model the local shifts occurring in NMR spectravery well.

    The time consumption for alignment varies a lot among themethods. Table 3 shows the benchmarks for alignment of thebreast cancer data set using the same reference spectrum. Thebenchmarks were obtained on a Dell Latitude E6400 laptop,equipped with an Intel Core 2 Duo P9500 processor running at2.53GHz and 3.48GB of RAM. The operating system was MicrosoftWindows XP SP3 (32bit). Alignments with icoshift, COW andfastpa were performed in Matlab, version 7.7.0.471 (R2008b),while VPdtw and PTW alignments were performed in R, version2.9.2.

    Icoshiftwas the fastestwarpingmethod, and alignment of a dataset of 209 spectra was done in a few seconds. COW, on the otherhand, was the most time-consuming of the methods tested here,andusedminutes to perform the samealignment. Also, COWsome-times runs into memory problems for large data sets. This can beovercome by choosing different parameters, or by dividing the dataset in smaller subsets.Obviously, thismay result in anal alignmentthat is not the optimal one for the data set.

    The benchmarks shown here do not include optimization ofthe parameters, as that largely depends on the effort put into theproceduresby theuser. For fastpa, threeparametershave tobeopti-mized, as opposed to the other methods with only two parameters.This made fastpa more time-consuming to optimize. In addition,some combinations of fastpa parameters will give an error withoutany obvious reason. COWhas an automatic optimization procedurethat is time-consuming; however, it requires a minimum of effortfrom the user.

    Although the evaluations in this paperwere limited toMRSdata,some of the observations above c