Bayesian modelling and computation for Raman spectroscopy

Matt Moores, Department of Statistics, University of Warwick. Oxford computational statistics & machine learning reading group, March 11, 2016.


  • Raman Spectroscopy Functional Model Bayesian Computation Experimental Results Conclusion

    Bayesian modelling and computation for Raman spectroscopy

    Matt Moores

    Department of Statistics, University of Warwick

    Oxford computational statistics & machine learning reading group, March 11, 2016

  • Acknowledgements

    University of Warwick: Mark Girolami, Jake Carson, Karla Monterrubio Gómez

    University of Strathclyde: Kirsten Gracie, Karen Faulds, Duncan Graham

    Funded by the EPSRC grant "In Situ Nanoparticle Assemblies for Healthcare Diagnostics and Therapy" (ref: EP/L014165/1) and an Award for Postdoctoral Collaboration from the EPSRC Network on Computational Statistics & Machine Learning (ref: EP/K009788/2)

  • Outline

    1 Raman Spectroscopy

    2 Functional Model

    3 Bayesian Computation

    4 Experimental Results: Model Choice, Multivariate Calibration

  • Raman spectroscopy

    Illustration courtesy Jake Carson (U. Warwick)

  • Statistical properties

    Each Raman-active dye has a unique spectral signature:
      Peaks correspond to vibrational modes of the molecule
      Shift in wavenumber is proportional to the change in energy state
      Smoothly-varying baseline (background fluorescence)

    [Figure: photon counts against wavenumber $\tilde{\nu}$ (cm$^{-1}$), 200 to 1700, for replicates A to E]

  • Surface-enhanced Raman scattering (SERS)

    Raman signal enhanced by proximity to nanoparticles
    Functionalisation using antibodies

    Illustration courtesy Kirsten Gracie (U. Strathclyde)

  • Fluorescent background

    [Figure, two panels: photon counts (a.u.) against wavenumber (cm$^{-1}$), 200 to 2000. (a) Surface-enhanced fluorescence (SEF). (b) Surface-enhanced Raman spectroscopy (SERS). Legend: Well E1, Well E2, Well E3, background.]

  • Baseline correction

    Existing methods estimate the baseline independently:
      Asymmetric least squares: Boelens, Eilers & Hankemeier (Anal. Chem., 2005)
      Modified polynomial fit: Lieber & Mahadevan-Jansen (Appl. Spectrosc., 2003)
      Robust baseline estimation: Ruckstuhl et al. (JQSRT, 2001)
      Wavelets: Cai, Zhang & Ben-Amotz (Appl. Spectrosc., 2001)

  • Additive functional model of a SERS spectrum

    Separate the hyperspectral signal into 3 components:

    $$ y_i(\tilde{\nu}) = \xi_i(\tilde{\nu}) + s(\tilde{\nu}) + \epsilon \tag{1} $$

    where:
      $y_i(\tilde{\nu})$ is an observed SERS spectrum, discretised at multiple wavenumbers $\tilde{\nu}_j \in \mathcal{V}$
      $\xi_i(\tilde{\nu})$ is a smooth baseline function
      $s(\tilde{\nu})$ is the spectral signature of the dye molecule
      $\epsilon$ is additive, zero-mean white noise with variance $\sigma^2$
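As a rough illustration of the additive decomposition in equation (1), the sketch below simulates one spectrum as baseline plus signature plus noise. The Lorentzian peak shape, the sinusoidal baseline and every numerical value are assumptions chosen for illustration, not quantities from the talk.

```python
import math
import random


def lorentzian(nu, loc, amp, scale):
    """Peak of height `amp` centred at `loc` with half-width `scale`."""
    return amp * scale**2 / ((nu - loc) ** 2 + scale**2)


def simulate_spectrum(wavenumbers, peaks, baseline, noise_sd, rng):
    """Return y = baseline + signature + noise at each wavenumber."""
    spectrum = []
    for nu in wavenumbers:
        signature = sum(lorentzian(nu, loc, amp, scale) for loc, amp, scale in peaks)
        spectrum.append(baseline(nu) + signature + rng.gauss(0.0, noise_sd))
    return spectrum


def smooth_baseline(nu):
    """Hypothetical slowly varying background, standing in for fluorescence."""
    return 8000.0 + 3000.0 * math.sin(nu / 600.0)


rng = random.Random(1)
wavenumbers = list(range(200, 2001, 2))
# Hypothetical (location, amplitude, scale) triples for two peaks.
peaks = [(632.0, 2500.0, 6.0), (1655.0, 4700.0, 9.0)]
y = simulate_spectrum(wavenumbers, peaks, smooth_baseline, noise_sd=3.0, rng=rng)
```

The inference problem in the following slides runs in the opposite direction: given only `y`, recover the baseline, the peak parameters and the noise level jointly.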

  • Baseline

    Penalised spline:

    $$ \xi_i(\tilde{\nu}) = \sum_{m=1}^{M} B_m(\tilde{\nu}) \, \alpha_{i,m} \tag{2} $$

    $$ \alpha_{i,\cdot} \sim \mathcal{N}_M(0, \Sigma) \tag{3} $$

    where $B_m(\tilde{\nu})$ are Demmler-Reinsch or B-spline basis functions

    [Figure: basis functions $B_m(\tilde{\nu})$ against $\tilde{\nu}$ (cm$^{-1}$), 550 to 800]
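The basis expansion in equation (2) can be sketched with cubic B-splines evaluated by the Cox-de Boor recursion. This is only the unpenalised expansion: the knot positions and coefficients are hypothetical, and the smoothing prior of equation (3) is not included.

```python
def bspline_basis(x, knots, m, degree):
    """Value at x of the m-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[m] <= x < knots[m + 1] else 0.0
    left_width = knots[m + degree] - knots[m]
    right_width = knots[m + degree + 1] - knots[m + 1]
    left = 0.0
    right = 0.0
    if left_width > 0.0:
        left = (x - knots[m]) / left_width * bspline_basis(x, knots, m, degree - 1)
    if right_width > 0.0:
        right = ((knots[m + degree + 1] - x) / right_width
                 * bspline_basis(x, knots, m + 1, degree - 1))
    return left + right


def baseline(x, knots, coefficients, degree=3):
    """Weighted sum of basis functions, as in equation (2)."""
    return sum(
        alpha * bspline_basis(x, knots, m, degree)
        for m, alpha in enumerate(coefficients)
    )


# Hypothetical uniform knots every 50 cm^-1; cubic splines give M = 12 - 3 - 1 = 8
# basis functions, which sum to one for 550 <= x < 800.
knots = [400.0 + 50.0 * k for k in range(12)]
flat = baseline(675.0, knots, [5.0] * 8)
```

With all coefficients equal the baseline is constant, which is a convenient check that the basis forms a partition of unity on the interior of the knot range.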

  • Spectral signature

    An additive mixture of radial basis functions:

    $$ s(\tilde{\nu}) = \sum_{p=1}^{P} f(\tilde{\nu} \mid \ell_p, A_p, \psi_p) \tag{4} $$

    where:
      $\ell_p$ is the location of peak $p$
      $A_p$ is the amplitude
      $\psi_p$ is the scale (broadening)

  • Squared exponential

    Kernel function is the Gaussian density:

    $$ f(\tilde{\nu}_j \mid \ell_p, A_p, \psi_p) = A_p \exp\left\{ -\frac{(\tilde{\nu}_j - \ell_p)^2}{2\psi_p^2} \right\} \tag{5} $$

    $$ \mathrm{FWHM} = 2\sqrt{2 \ln 2}\, \psi_p \tag{6} $$

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 1300 to 1800]
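A short numerical check of the squared exponential kernel and its full width at half maximum (FWHM). The peak location, amplitude and scale below are arbitrary illustrative values.

```python
import math


def squared_exponential_peak(nu, loc, amp, scale):
    """Gaussian-shaped peak of height `amp` centred at `loc`."""
    return amp * math.exp(-((nu - loc) ** 2) / (2.0 * scale**2))


def gaussian_fwhm(scale):
    """Full width at half maximum of the squared exponential kernel."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * scale


scale = 10.0
half_width = gaussian_fwhm(scale) / 2.0
# Half a FWHM away from the centre the peak should have dropped to amp / 2.
height_at_half_width = squared_exponential_peak(1500.0 + half_width, 1500.0, 1000.0, scale)
```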

  • Lorentzian

    Long-range dependence between peaks can be modelled using the Cauchy density:

    $$ f(\tilde{\nu}_j \mid \ell_p, A_p, \psi_p) = \frac{A_p \, \psi_p^2}{(\tilde{\nu}_j - \ell_p)^2 + \psi_p^2} \tag{7} $$

    $$ \mathrm{FWHM} = 2\psi_p \tag{8} $$

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 1300 to 1800]
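To make the long-range behaviour concrete, the sketch below compares a Lorentzian and a Gaussian-shaped peak with the same FWHM at a point 100 cm$^{-1}$ from the centre. The numerical values are illustrative assumptions only.

```python
import math


def lorentzian_peak(nu, loc, amp, scale):
    """Cauchy-shaped peak of height `amp`; FWHM is 2 * scale."""
    return amp * scale**2 / ((nu - loc) ** 2 + scale**2)


def gaussian_peak(nu, loc, amp, scale):
    """Gaussian-shaped peak of height `amp`; FWHM is 2 * sqrt(2 ln 2) * scale."""
    return amp * math.exp(-((nu - loc) ** 2) / (2.0 * scale**2))


fwhm = 20.0
lorentzian_scale = fwhm / 2.0
gaussian_scale = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# Both peaks fall to half height at loc + FWHM / 2 ...
half_height = lorentzian_peak(1500.0 + fwhm / 2.0, 1500.0, 1000.0, lorentzian_scale)
# ... but 100 cm^-1 away only the Lorentzian still contributes appreciably.
lorentzian_tail = lorentzian_peak(1600.0, 1500.0, 1000.0, lorentzian_scale)
gaussian_tail = gaussian_peak(1600.0, 1500.0, 1000.0, gaussian_scale)
```

The heavy tails mean that a Lorentzian peak still adds to the signal under its neighbours, which is the sense in which peaks depend on one another over long ranges.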

  • Informative priors

    Obtained from manual peak fitting of independent data:

    [Figure, two panels. (a) Scale parameters, $\psi$ (cm$^{-1}$): kernel density estimate with a lognormal fit. (b) Amplitudes, $A$ (arbitrary units): kernel density estimate with truncated normal and gamma fits.]

  • Multivariate calibration

    Signal intensity depends linearly on dye concentration, from the limit of detection (LOD) up to monolayer coverage:

    $$ A_p = \beta_p c_i, \qquad c_{\mathrm{LOD}} \le c_i < c_{\mathrm{MLC}} \tag{9} $$

    where:
      $c_i$ is the nanomolar (nM) concentration of the dye in observation $i$
      $\beta_p$ is a linear regression coefficient
      $c_{\mathrm{LOD}}$ is based on the signal-to-noise ratio
      $c_{\mathrm{MLC}}$ is proportional to the surface area of the nanoparticles

    Jones, et al. (1999) Anal. Chem. 71(3): 596–601
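The linear calibration in equation (9) can be read in both directions: from concentration to peak amplitude, and back again within the linear range. A minimal sketch follows; the coefficient and the two limits are hypothetical values, and the function names are not from any package.

```python
def amplitude(beta, concentration):
    """Peak amplitude predicted by the linear calibration A_p = beta_p * c_i."""
    return beta * concentration


def estimate_concentration(amp, beta, c_lod, c_mlc):
    """Invert the calibration, refusing to extrapolate outside the linear range."""
    concentration = amp / beta
    if not (c_lod <= concentration < c_mlc):
        raise ValueError("concentration lies outside the linear range")
    return concentration


# Hypothetical values: beta in a.u. per nM, limits in nM.
beta, c_lod, c_mlc = 470.0, 0.01, 25.0
recovered = estimate_concentration(amplitude(beta, 5.0), beta, c_lod, c_mlc)
```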

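  • Markov chain Monte Carlo

    MCMC targeting the joint posterior $\pi(\beta, \psi, \alpha_{i,\cdot}, \sigma \mid y_i(\tilde{\nu}))$

    Algorithm 1 Marginal Metropolis-Hastings
    1: Draw random walk proposals for the peaks: $\beta'$, $\psi'$
    2: Propose baseline $\alpha'_{i,\cdot} \sim q(\alpha_{i,\cdot} \mid y_i(\tilde{\nu}), \beta', \psi', \sigma)$
    3: Propose $\sigma' \sim q(\sigma \mid y_i(\tilde{\nu}), \beta', \psi', \alpha'_{i,\cdot})$
    4: Compute the marginal acceptance ratio:

       $$ \rho = \frac{p(y_i(\tilde{\nu}) \mid \beta', \psi', \alpha'_{i,\cdot}, \sigma')}{p(y_i(\tilde{\nu}) \mid \beta, \psi, \alpha_{i,\cdot}, \sigma)} \cdot \frac{q(\alpha_{i,\cdot} \mid \cdot)\, q(\sigma \mid \cdot)\, \pi(\beta')\, \pi(\psi')\, \pi(\alpha'_{i,\cdot})\, \pi(\sigma')}{q(\alpha'_{i,\cdot} \mid \cdot)\, q(\sigma' \mid \cdot)\, \pi(\beta)\, \pi(\psi)\, \pi(\alpha_{i,\cdot})\, \pi(\sigma)} = \frac{p(y_i(\tilde{\nu}) \mid \beta', \psi')\, \pi(\beta')\, \pi(\psi')}{p(y_i(\tilde{\nu}) \mid \beta, \psi)\, \pi(\beta)\, \pi(\psi)} $$

    5: Accept $\beta'$, $\psi'$, $\alpha'_{i,\cdot}$, $\sigma'$ jointly with probability $\min(1, \rho)$

    (assuming peak locations $\ell_p$ and number of peaks $P$ are fixed)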
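The accept/reject mechanics of Algorithm 1 can be sketched with a plain random-walk Metropolis-Hastings sampler. This is a deliberately reduced toy, not the marginal scheme on the slide: it samples only the amplitude of one Lorentzian peak, with known location, scale and noise level, no baseline, and a flat prior on positive amplitudes. All numerical settings are assumptions.

```python
import math
import random


def lorentzian(nu, loc, amp, scale):
    return amp * scale**2 / ((nu - loc) ** 2 + scale**2)


def log_likelihood(amp, data, loc, scale, noise_sd):
    """Gaussian log-likelihood, up to an additive constant."""
    sse = sum((y - lorentzian(nu, loc, amp, scale)) ** 2 for nu, y in data)
    return -0.5 * sse / noise_sd**2


def metropolis_hastings(data, loc, scale, noise_sd, n_iter, step_sd, rng):
    amp = 1.0  # arbitrary starting value
    log_post = log_likelihood(amp, data, loc, scale, noise_sd)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = amp + rng.gauss(0.0, step_sd)  # symmetric random walk
        if proposal > 0.0:  # flat prior restricted to positive amplitudes
            log_post_new = log_likelihood(proposal, data, loc, scale, noise_sd)
            # Accept with probability min(1, rho), evaluated on the log scale.
            if rng.random() < math.exp(min(0.0, log_post_new - log_post)):
                amp, log_post = proposal, log_post_new
                accepted += 1
        chain.append(amp)
    return chain, accepted / n_iter


rng = random.Random(42)
true_amp, loc, scale, noise_sd = 10.0, 0.0, 5.0, 1.0
data = [
    (nu, lorentzian(nu, loc, true_amp, scale) + rng.gauss(0.0, noise_sd))
    for nu in range(-25, 26)
]
chain, acceptance_rate = metropolis_hastings(data, loc, scale, noise_sd, 5000, 0.5, rng)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

Because the proposal is symmetric, its density cancels from the ratio, which is why only the posterior terms appear in the acceptance step.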

  • Sequential Monte Carlo

    Particle-based method targeting a sequence of partial posteriors $\pi_i(\beta, \psi, \alpha_{1:i,\cdot}, \sigma \mid Y_{1:i,\cdot})$

    Algorithm 2 SMC
    1: Initialise $\beta^{(q)}, \psi^{(q)}, \alpha^{(q)}, \sigma^{(q)}$ for all $q \in \{1, \dots, Q\}$
    2: Initialise importance weights, $w_0^{(q)} = \frac{1}{Q}$
    3: for all observations $i = 1, \dots, n$ do
    4:   Update importance weights:

         $$ w_i^{(q)} \propto w_{i-1}^{(q)} \, p\left( y_i(\tilde{\nu}) \mid \beta^{(q)}, \psi^{(q)}, \alpha_{i,\cdot}^{(q)}, \sigma^{(q)} \right) \tag{10} $$

    5:   Resample particles if $\mathrm{ESS}_i$ is below threshold
    6:   for all particles $q \in \{1, \dots, Q\}$ do
    7:     Update $\beta^{(q)}, \psi^{(q)}, \alpha^{(q)}, \sigma^{(q)}$ using Algorithm 1
    8:   end for
    9: end for

    Chopin (2002) Biometrika 89(3): 539–551
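Steps 4 and 5 of Algorithm 2 can be sketched as follows. The effective sample size is taken to be $1 / \sum_q (w^{(q)})^2$ and resampling is multinomial; both are common choices assumed here, since the slide does not say which variants were used. The particle labels and log-likelihood values are made up.

```python
import math
import random


def reweight(weights, log_likelihoods):
    """Multiply each weight by its particle's likelihood, then normalise (eq. 10)."""
    max_ll = max(log_likelihoods)  # subtract the maximum to avoid underflow
    unnormalised = [w * math.exp(ll - max_ll) for w, ll in zip(weights, log_likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def effective_sample_size(weights):
    """ESS of normalised weights: Q when uniform, 1 when fully degenerate."""
    return 1.0 / sum(w * w for w in weights)


def resample(particles, weights, rng):
    """Multinomial resampling; the survivors get equal weights again."""
    count = len(particles)
    chosen = rng.choices(particles, weights=weights, k=count)
    return chosen, [1.0 / count] * count


rng = random.Random(0)
particles = ["a", "b", "c", "d"]  # stand-ins for parameter vectors
weights = [0.25] * 4
weights = reweight(weights, [-1.0, -1.0, -1.0, -50.0])  # made-up log-likelihoods
ess = effective_sample_size(weights)
if ess < 0.8 * len(particles):  # hypothetical threshold
    particles, weights = resample(particles, weights, rng)
```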

  • Model evidence

    Algorithm 2 provides a consistent, unbiased estimate of the marginal likelihood:

    $$ \frac{Z_i}{Z_{i-1}} = \sum_{q=1}^{Q} w_{i-1}^{(q)} \, p\left( y_i(\tilde{\nu}) \mid \theta_k^{(q)} \right) \tag{11} $$

    $$ p(Y \mid M_k) \approx Z_n = \prod_{i=1}^{n} \frac{Z_i}{Z_{i-1}} \tag{12} $$

    where:
      $\theta_k^{(q)} = \left\{ \beta_k^{(q)}, \psi_k^{(q)}, \alpha_k^{(q)}, \sigma_k^{(q)} \right\}$ are the parameters of model $M_k$ for particle $q$
      $Z_n$ is the normalising constant
      $Z_0 = 1$

    Del Moral, Doucet & Jasra (2006) JRSS B 68(3): 411–436
    Pitt, Silva, Giordani & Kohn (2012) J. Econom. 171(2): 134–151
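In practice the product in equation (12) is accumulated on the log scale, because individual likelihoods of whole spectra underflow double precision. A minimal sketch, assuming normalised weights and per-particle log-likelihoods are already available; the helper names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values))


def log_evidence_increment(weights, log_likelihoods):
    """log(Z_i / Z_{i-1}) from the previous weights and new log-likelihoods (eq. 11)."""
    terms = [math.log(w) + ll for w, ll in zip(weights, log_likelihoods) if w > 0.0]
    return log_sum_exp(terms)


def log10_evidence(increments):
    """log10 Z_n: sum of the log increments (Z_0 = 1), converted to base 10 (eq. 12)."""
    return sum(increments) / math.log(10.0)


# Made-up example: four equally weighted particles, two observations.
increments = [
    log_evidence_increment([0.25] * 4, [-3.0, -3.0, -3.0, -3.0]),
    log_evidence_increment([0.25] * 4, [-1000.0, -1001.0, -1002.0, -1003.0]),
]
```

The second increment would be exactly zero if the likelihoods were exponentiated directly, which is the failure the log-sum-exp step avoids.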

  • Thermodynamic integration

    An alternative approach is to use the path sampling identity:

    $$ \log\left\{ \frac{Z_n}{Z_0} \right\} = \int_0^1 \mathbb{E}_\gamma\left[ \frac{d \log q_\gamma(\theta)}{d\gamma} \right] d\gamma \tag{13} $$

    where $\gamma = \frac{i}{n}$ identifies the sequence of partial posterior distributions $p_\gamma = \frac{q_\gamma}{Z_i} = \pi_i(\beta, \psi, \alpha_{1:i,\cdot}, \sigma \mid Y_{1:i,\cdot})$

    This integral cannot be solved exactly, so it must be approximated using a numerical integration method.

    The expectation $\mathbb{E}_\gamma$ can be estimated using a weighted sum over the SMC particles.

    Zhou, Johansen & Aston (2015) arXiv:1303.3123 [stat.ME]
    Gelman & Meng (1998) Statist. Sci. 13(2): 163–185
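The numerical integration step can be illustrated with the trapezoid rule on a toy problem where everything is known in closed form. The toy is an assumption made for checking purposes, and it tempers the likelihood rather than the data: the prior is $\theta \sim \mathcal{N}(0, 1)$ and $q_\gamma(\theta) = \mathcal{N}(\theta; 0, 1) \exp(-\gamma \theta^2 / 2)$, so that $\mathbb{E}_\gamma[d \log q_\gamma / d\gamma] = -1 / (2(1 + \gamma))$ and $\log(Z_1 / Z_0) = -\tfrac{1}{2} \ln 2$.

```python
import math


def trapezoid(xs, ys):
    """Trapezoid-rule approximation of the integral of y over x."""
    return sum(
        0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
        for k in range(len(xs) - 1)
    )


def expected_derivative(gamma):
    """E_gamma[d log q_gamma / d gamma] for the toy model, in closed form."""
    return -0.5 / (1.0 + gamma)


grid = [k / 100 for k in range(101)]  # 101 equally spaced values of gamma
estimate = trapezoid(grid, [expected_derivative(g) for g in grid])
exact = -0.5 * math.log(2.0)
```

In the SMC setting the closed-form expectation would be replaced by a weighted average over particles at each value of $\gamma$, so the estimate also carries Monte Carlo error in addition to the discretisation error seen here.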

  • R package serrsBayes

    library(serrsBayes)
    library(hyperSpec)

    ramanSpectra <- read.spc("mfutils.spc")
    wavenumbers <- wl(ramanSpectra)
    peakLoc <- wl2i(ramanSpectra, c(964, 1138, 1218, ...))

    # informative priors for the peaks and baseline
    lPriors <- list(scale.mu=log(25.27) - (0.4^2)/2,
                    scale.sd=0.4, bl.smooth=1.25e3, bl.knots=200,
                    amp.mu=3449, amp.sd=5672, noise.sd=3,
                    noise.nu=length(wavenumbers) * nrow(ramanSpectra),
                    beta.sd=1000)

    result <- fitPeaksWithBaselineSMC(wavenumbers,
                                      ramanSpectra[[]], peakLoc, lPriors)

  • Synthetic data: Lorentzian peaks + Cauchy kernel

    [Figure: three panels of intensity against Raman shift, 250 to 1750]

  • Lorentzian peaks + squared exponential kernel

    [Figure: three panels of intensity against Raman shift, 250 to 1750]

  • Gaussian peaks + sq. exp. kernel

    [Figure: three panels of intensity against Raman shift, 250 to 1750]

  • Gaussian peaks + Cauchy kernel

    [Figure: three panels of intensity against Raman shift, 250 to 1750]

  • Real data: TAMRA + Cauchy kernel

    [Figure: three panels of intensity against Raman shift, 250 to 1750]

  • Real data: TAMRA + sq. exp. kernel

    [Figure: three panels of intensity against Raman shift, 250 to 1750]

  • Real data: parameter estimates for

    [Figure, two panels with horizontal axis from 0 to 10000: (a) Cauchy; (b) Squared Exponential]

  • Model Choice

    The Bayes Factor (log10 BF) correctly identifies the generative model for simulated spectra with Gaussian (194) and Lorentzian (-160) peaks:

    Table: Simulation study

    Data        M_k       log10 p(Y | M_k)
    Gaussian    sq. exp.  -2011
    Gaussian    Cauchy    -2205
    Lorentzian  sq. exp.  -2163
    Lorentzian  Cauchy    -2004
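The log Bayes factors quoted for the simulation study are differences of the log evidences in the table, taking the squared exponential model as the numerator. The sketch below redoes that arithmetic from the tabulated values. For the Lorentzian data the rounded table entries give -159 rather than the quoted -160, presumably because the quoted figure was computed before rounding.

```python
# log10 p(Y | M_k) copied from the simulation study table.
log10_evidence = {
    ("Gaussian", "sq. exp."): -2011,
    ("Gaussian", "Cauchy"): -2205,
    ("Lorentzian", "sq. exp."): -2163,
    ("Lorentzian", "Cauchy"): -2004,
}


def log10_bayes_factor(data):
    """log10 BF of the squared exponential model against the Cauchy model."""
    return log10_evidence[(data, "sq. exp.")] - log10_evidence[(data, "Cauchy")]


def preferred_model(data):
    """Model with the larger evidence for the given data set."""
    return "sq. exp." if log10_bayes_factor(data) > 0 else "Cauchy"


gaussian_bf = log10_bayes_factor("Gaussian")      # 194
lorentzian_bf = log10_bayes_factor("Lorentzian")  # -159
```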

  • Model Choice for TAMRA

    The Bayes Factor favours Lorentzian peaks (log10 BF = -32) for tetramethylrhodamine (TAMRA):

    Table: Observed spectra (TAMRA)

    Data   M_k       log10 p(Y | M_k)
    TAMRA  sq. exp.  -2257
    TAMRA  Cauchy    -2225

    This indicates very strong evidence of long-range dependence between peaks in SERS spectra.

    Kass & Raftery (1995) JASA 90(430): 773–795
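The phrase "very strong" matches the top category of the Kass & Raftery scale, which is stated in terms of $2 \ln \mathrm{BF}$ with cut-offs at 2, 6 and 10. A small sketch of the conversion from the log10 value on the slide; the function names are illustrative.

```python
import math


def two_ln_bf(log10_bf):
    """Convert a log10 Bayes factor to the 2 ln BF scale."""
    return 2.0 * math.log(10.0) * log10_bf


def kass_raftery_category(log10_bf):
    """Evidence category for the favoured model, whichever direction the BF points."""
    strength = abs(two_ln_bf(log10_bf))
    if strength > 10.0:
        return "very strong"
    if strength > 6.0:
        return "strong"
    if strength > 2.0:
        return "positive"
    return "not worth more than a bare mention"


tamra_log10_bf = -2257 - (-2225)  # sq. exp. against Cauchy, from the table
category = kass_raftery_category(tamra_log10_bf)
```

A value of -32 on the log10 scale corresponds to roughly -147 on the $2 \ln \mathrm{BF}$ scale, far beyond the threshold of 10 in favour of the Cauchy model.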

  • Dilution study for TAMRA

    315 spectra at 21 different concentrations, from 0.13 to 24.7 nM

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000, one curve per concentration: 24.7, 23.4, 22.1, 20.8, 19.5, 18.2, 16.9, 15.6, 14.3, 13, 11.7, 10.4, 9.1, 7.8, 6.5, 5.2, 3.9, 2.6, 1.3, 0.65 and 0.13 nM]

  • Baseline correction: TAMRA

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000, one curve for each of the 21 concentrations from 24.7 nM down to 0.13 nM]

  • Spectral signature: TAMRA

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000, one curve for each of the 21 concentrations from 24.7 nM down to 0.13 nM]

  • 95% HPD intervals: TAMRA

    $\ell_p$ (cm$^{-1}$)  $\beta_p$ (nM$^{-1}$)   FWHM (cm$^{-1}$)   LOD (nM)
    460    [6.73; 17.23]      [0.00; 14.43]    [0.211; 0.784]
    505    [167.95; 177.83]   [14.36; 15.52]   [0.019; 0.047]
    632    [253.65; 263.16]   [11.54; 12.13]   [0.012; 0.032]
    725    [19.63; 29.52]     [9.97; 16.43]    [0.123; 0.316]
    752    [44.74; 54.81]     [19.49; 23.45]   [0.059; 0.156]
    843    [23.48; 33.73]     [15.97; 22.26]   [0.106; 0.297]
    965    [17.78; 28.09]     [12.07; 20.32]   [0.135; 0.355]
    1140   [25.49; 36.11]     [20.41; 27.92]   [0.089; 0.253]
    1190   [18.67; 28.99]     [11.27; 19.36]   [0.110; 0.352]
    1220   [147.84; 158.30]   [17.68; 19.20]   [0.020; 0.051]
    1290   [31.19; 42.49]     [15.72; 25.80]   [0.080; 0.213]
    1358   [210.46; 221.96]   [17.63; 18.97]   [0.015; 0.035]
    1422   [53.13; 63.99]     [15.34; 18.16]   [0.059; 0.135]
    1455   [39.05; 53.87]     [20.61; 41.11]   [0.069; 0.175]
    1512   [146.07; 157.28]   [20.81; 22.38]   [0.019; 0.049]
    1536   [209.18; 221.03]   [14.43; 15.80]   [0.014; 0.036]
    1570   [38.54; 53.67]     [23.87; 51.02]   [0.073; 0.173]
    1655   [467.58; 477.91]   [17.40; 17.92]   [0.006; 0.016]
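A highest posterior density (HPD) interval can be estimated from posterior samples as the shortest window that contains the required fraction of the sorted draws. The sketch below shows that standard sample-based estimator; it is not necessarily the routine used to produce the table, and the example draws are made up.

```python
import math


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples."""
    ordered = sorted(samples)
    count = len(ordered)
    inside = max(1, math.ceil(prob * count))  # number of draws inside the interval
    # Slide a window of `inside` consecutive draws and keep the narrowest one.
    start = min(
        range(count - inside + 1),
        key=lambda s: ordered[s + inside - 1] - ordered[s],
    )
    return ordered[start], ordered[start + inside - 1]


# Made-up skewed sample: the densest half of the draws lies between 0 and 4.
draws = [0, 1, 2, 3, 4, 10, 20, 30, 40, 50]
lower, upper = hpd_interval(draws, prob=0.5)
```

For a skewed posterior this interval differs from the equal-tailed one, which is why some intervals in the table, such as the FWHM for the peak at 460, can sit against a boundary at zero.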

  • TAMRA at 0.13 nM

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000. Legend: baseline-corrected spectra, posterior mean, $3\sigma$.]

  • TAMRA at 0.65 nM

    [Figure: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000. Legend: baseline-corrected spectra, posterior mean, $3\sigma$.]

  • Summary

    Semi-parametric model of hyperspectral data:
      Joint estimation of baseline & peaks
      Continuous representation of discretised spectra
      Data tempering using SMC
      Bayesian model choice for long-range dependence

    Ongoing work:
      Peak detection (estimation of $\ell_p$ & $P$)
      Informed source separation for multiplex spectra
      Spatio-temporal correlation between spectra

  • Appendix

    For Further Reading I

    Moores, Gracie, Carson, Faulds, Graham & Girolami. Bayesian modelling and quantification of Raman spectroscopy. In prep.

    Gracie, Moores, Smith, Harding, Girolami, Graham & Faulds. Preferential attachment of specific fluorescent dyes and dye labelled DNA sequences in a SERS multiplex. Anal. Chem., 88(2): 1147–1153, 2016.

    Zhong, Girolami, Faulds & Graham. Bayesian methods to detect dye-labelled DNA oligonucleotides in multiplexed Raman spectra. J. R. Stat. Soc. Ser. C, 60(2): 187–206, 2011.

    Zhou, Johansen & Aston. Towards Automatic Model Comparison: An Adaptive Sequential Monte Carlo Approach. arXiv:1303.3123 [stat.ME], 2015.

  • For Further Reading II

    Chopin. A Sequential Particle Filter Method for Static Models. Biometrika, 89(3): 539–551, 2002.

    Pitt, Silva, Giordani & Kohn. On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econometrics, 171(2): 134–151, 2012.

    Jones, McLaughlin, Littlejohn, Sadler, Graham & Smith. Quantitative Assessment of Surface-Enhanced Resonance Raman Scattering for the Analysis of Dyes on Colloidal Silver. Anal. Chem., 71(3): 596–601, 1999.

    Ramsay & Silverman. Functional Data Analysis, 2nd ed. Springer, 2005.

  • Raman scattering

    Illustration courtesy Jake Carson (U. Warwick)

  • SERS

    Surface-enhanced: proximity to a nanoplasmonic substrate (silver/gold colloid)

    Illustration courtesy Jake Carson (U. Warwick)

  • SERRS

    Surface-enhanced: proximity to a nanoplasmonic substrate (silver/gold colloid)

    Resonance: tune excitation wavelength to an electronic transition of the molecule

    Illustration courtesy Jake Carson (U. Warwick)

  • Dilution study for FAM

    315 spectra at 21 different concentrations, from 0.13 to 24.7 nM

    [Figure: counts against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000, one curve for each of the 21 concentrations from 24.7 nM down to 0.13 nM]

  • Results: Baseline correction

    [Figure, two panels: intensity (a.u.) against $\tilde{\nu}$ (cm$^{-1}$), 500 to 2000, one curve for each of the 21 concentrations from 24.7 nM down to 0.13 nM. (a) Posterior means of the baselines. (b) Baseline-corrected spectra.]

  • Results: Quantification

    SERS peak intensities at 650 cm$^{-1}$: 95% CI $[257.7;\ 262.5] \times c_i$

    [Figure, two panels against final target concentration (nM), 0 to 25. (a) Linear regression for $\beta_p$: intensity (a.u.), showing the expectation, the 95% HPD interval and $\pm 2\sigma$. (b) Standardised residuals, showing the expectation, $\pm 1.96$ and $\pm 2.58$.]
