Bayesian modelling and computation for Raman spectroscopy
TRANSCRIPT
-
Raman Spectroscopy Functional Model Bayesian Computation Experimental Results Conclusion
Bayesian modelling and computation for Raman spectroscopy
Matt Moores
Department of Statistics, University of Warwick
Oxford computational statistics & machine learning reading group, March 11, 2016
-
Acknowledgements
University of Warwick: Mark Girolami, Jake Carson, Karla Monterrubio Gómez
University of Strathclyde: Kirsten Gracie, Karen Faulds, Duncan Graham
Funded by the EPSRC grant In Situ Nanoparticle Assemblies for Healthcare Diagnostics and Therapy (ref: EP/L014165/1) and an Award for Postdoctoral Collaboration from the EPSRC Network on Computational Statistics & Machine Learning (ref: EP/K009788/2)
-
Outline
1 Raman Spectroscopy
2 Functional Model
3 Bayesian Computation
4 Experimental Results
    Model Choice
    Multivariate Calibration
-
Raman spectroscopy
Illustration courtesy Jake Carson (U. Warwick)
-
Statistical properties
Each Raman-active dye has a unique spectral signature:
  Peaks correspond to vibrational modes of the molecule
  Shift in wavenumber is proportional to the change in energy state
  Smoothly-varying baseline (background fluorescence)
[Figure: Raman spectra for five replicates (A to E); photon counts, 10000 to 50000, against wavenumber (cm^-1), 200 to 1700]
-
Surface-enhanced Raman scattering (SERS)
Raman signal enhanced by proximity to nanoparticles
Functionalisation using antibodies
Illustration courtesy Kirsten Gracie (U. Strathclyde)
-
Fluorescent background
[Figure: photon counts (a.u.) against wavenumber (cm^-1), 200 to 2000. (a) Surface-enhanced fluorescence (SEF). (b) Surface-enhanced Raman spectroscopy (SERS). Legend: Well E1, Well E2, Well E3, background]
-
Baseline correction
Existing methods estimate the baseline independently:
  Asymmetric least squares: Boelens, Eilers & Hankemeier (Anal. Chem., 2005)
  Modified polynomial fit: Lieber & Mahadevan-Jansen (Appl. Spectrosc., 2003)
  Robust baseline estimation: Ruckstuhl et al. (JQSRT, 2001)
  Wavelets: Cai, Zhang & Ben-Amotz (Appl. Spectrosc., 2001)
-
Additive functional model of a SERS spectrum
Separate the hyperspectral signal into 3 components:
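$y_i(\tilde{\nu}) = \xi_i(\tilde{\nu}) + s(\tilde{\nu}) + \epsilon$   (1)
where:
  $y_i(\tilde{\nu})$ is an observed SERS spectrum, discretised at multiple wavenumbers $\tilde{\nu}_j \in \mathcal{V}$
  $\xi_i(\tilde{\nu})$ is a smooth baseline function
  $s(\tilde{\nu})$ is the spectral signature of the dye molecule
  $\epsilon$ is additive, zero-mean white noise with variance $\sigma^2$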
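The three-component decomposition can be illustrated by simulating one spectrum. This is only a sketch: the linear baseline, the Gaussian-shaped peaks and every numeric value below are hypothetical choices made for illustration, not values from the talk.

```python
import math
import random

def hypothetical_baseline(nu):
    """Smooth baseline; a straight line is assumed purely for illustration."""
    return 5000.0 + 3.0 * (nu - 200.0)

def simulate_spectrum(wavenumbers, baseline, peaks, noise_sd, rng):
    """Return baseline + spectral signature + white noise at each wavenumber."""
    spectrum = []
    for nu in wavenumbers:
        # Signature: sum of Gaussian-shaped peaks (location, amplitude, scale)
        signature = sum(amp * math.exp(-(nu - loc) ** 2 / (2.0 * scale ** 2))
                        for loc, amp, scale in peaks)
        spectrum.append(baseline(nu) + signature + rng.gauss(0.0, noise_sd))
    return spectrum

rng = random.Random(1)
wavenumbers = [200.0 + 2.0 * j for j in range(751)]        # 200 to 1700 cm^-1
peaks = [(650.0, 8000.0, 10.0), (1360.0, 12000.0, 12.0)]   # hypothetical peaks
y = simulate_spectrum(wavenumbers, hypothetical_baseline, peaks, 50.0, rng)
```

At the first peak location (index 225, 650 cm^-1) the simulated value is the baseline (6350) plus the peak amplitude (8000) plus noise.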
-
Baseline
Penalised spline:
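$\xi_i(\tilde{\nu}) = \sum_{m=1}^{M} B_m(\tilde{\nu})\, \alpha_{i,m}$   (2)
$\alpha_{i,\cdot} \sim \mathcal{N}_M(0, \Sigma)$   (3)
where $B_m(\tilde{\nu})$ are Demmler-Reinsch or B-spline basis functions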
[Figure: basis functions $B_m(\tilde{\nu})$, ranging from -1.5 to 1.5, against wavenumber (cm^-1), 550 to 800]
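As a rough illustration of the B-spline option, the basis functions can be evaluated with the Cox-de Boor recursion and combined linearly to give a baseline value. The knot positions, the cubic degree and the coefficients are assumptions for this sketch, and the smoothness penalty is not included.

```python
def bspline_basis(m, degree, knots, x):
    """Value at x of the m-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[m] <= x < knots[m + 1] else 0.0
    left = right = 0.0
    span_left = knots[m + degree] - knots[m]
    if span_left > 0:
        left = ((x - knots[m]) / span_left
                * bspline_basis(m, degree - 1, knots, x))
    span_right = knots[m + degree + 1] - knots[m + 1]
    if span_right > 0:
        right = ((knots[m + degree + 1] - x) / span_right
                 * bspline_basis(m + 1, degree - 1, knots, x))
    return left + right

def baseline_value(x, coefficients, degree, knots):
    """Baseline at x as a linear combination of the basis functions."""
    return sum(a * bspline_basis(m, degree, knots, x)
               for m, a in enumerate(coefficients))

# Hypothetical uniform knots every 50 cm^-1: 8 knots give 4 cubic basis functions
knots = [500.0 + 50.0 * k for k in range(8)]
coefficients = [2.0, 2.0, 2.0, 2.0]
value = baseline_value(675.0, coefficients, 3, knots)
```

Inside the interval where all four cubic basis functions overlap, they sum to one, so equal coefficients reproduce a constant baseline.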
-
Spectral signature
An additive mixture of radial basis functions:
$s(\tilde{\nu}) = \sum_{p=1}^{P} f(\tilde{\nu} \mid \ell_p, A_p, \psi_p)$   (4)
where:
  $\ell_p$ is the location of peak $p$
  $A_p$ is the amplitude
  $\psi_p$ is the scale (broadening)
-
Squared exponential
Kernel function is the Gaussian density:
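$f(\tilde{\nu}_j \mid \ell_p, A_p, \psi_p) = A_p \exp\left\{ -\frac{(\tilde{\nu}_j - \ell_p)^2}{2 \psi_p^2} \right\}$   (5)
$\mathrm{FWHM} = 2 \sqrt{2 \ln 2}\, \psi_p$   (6)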
[Figure: squared exponential peak; intensity (a.u.), 0 to 15000, against wavenumber (cm^-1), 1300 to 1800]
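The full width at half maximum (FWHM) follows from setting the squared exponential kernel equal to half its peak height. The short derivation below writes the scale parameter as $\psi_p$, a symbol assumed here because the original glyph did not survive in the transcript.

```latex
\begin{align*}
A_p \exp\left\{ -\frac{(\tilde{\nu} - \ell_p)^2}{2 \psi_p^2} \right\} &= \frac{A_p}{2} \\
(\tilde{\nu} - \ell_p)^2 &= 2 \psi_p^2 \ln 2 \\
\tilde{\nu} &= \ell_p \pm \psi_p \sqrt{2 \ln 2} \\
\mathrm{FWHM} &= 2 \sqrt{2 \ln 2}\, \psi_p \approx 2.355\, \psi_p
\end{align*}
```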
-
Lorentzian
Long-range dependence between peaks can be modelled using the Cauchy density:
$f(\tilde{\nu}_j \mid \ell_p, A_p, \psi_p) = \frac{A_p \psi_p^2}{(\tilde{\nu}_j - \ell_p)^2 + \psi_p^2}$   (7)
$\mathrm{FWHM} = 2 \psi_p$   (8)
[Figure: Lorentzian peak; intensity (a.u.), 0 to 15000, against wavenumber (cm^-1), 1300 to 1800]
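A small numerical comparison may help show why the Lorentzian shape implies long-range dependence: far from the peak centre its tail is still visible, whereas the squared exponential tail is effectively zero. The peak location, amplitude and scale used here are hypothetical.

```python
import math

def gaussian_peak(nu, loc, amp, scale):
    """Squared exponential kernel with peak height amp at nu = loc."""
    return amp * math.exp(-(nu - loc) ** 2 / (2.0 * scale ** 2))

def lorentzian_peak(nu, loc, amp, scale):
    """Lorentzian (Cauchy-shaped) kernel with peak height amp at nu = loc."""
    return amp * scale ** 2 / ((nu - loc) ** 2 + scale ** 2)

loc, amp, scale = 1500.0, 10000.0, 10.0   # hypothetical peak

# Half maximum: one scale from the centre (Lorentzian),
# sqrt(2 ln 2) scales from the centre (squared exponential)
half_lorentz = lorentzian_peak(loc + scale, loc, amp, scale)
half_gauss = gaussian_peak(loc + scale * math.sqrt(2.0 * math.log(2.0)),
                           loc, amp, scale)

# Ten scales (100 cm^-1) away from the centre
tail_lorentz = lorentzian_peak(loc + 100.0, loc, amp, scale)
tail_gauss = gaussian_peak(loc + 100.0, loc, amp, scale)
```

With these values the Lorentzian tail is still about 1% of the peak height at 100 cm^-1 from the centre, so neighbouring peaks overlap and cannot be fitted in isolation.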
-
Informative priors
Obtained from manual peak fitting of independent data:
[Figure: (a) Scale parameters (cm^-1), 1 to 50 on a log axis: kernel density estimate and fitted lognormal density. (b) Amplitudes, A (arbitrary units), 0 to 25000: kernel density estimate with fitted truncated normal and gamma densities]
-
Multivariate calibration
Signal intensity depends linearly on dye concentration, from the limit of detection (LOD) up to monolayer coverage:
$A_p = \beta_p c_i, \quad c_{LOD} \le c_i < c_{MLC}$   (9)
where:
  $c_i$ is the nanomolar (nM) concentration of the dye in observation $i$
  $\beta_p$ is a linear regression coefficient
  $c_{LOD}$ is based on the signal-to-noise ratio
  $c_{MLC}$ is proportional to the surface area of the nanoparticles
Jones, et al. (1999) Anal. Chem. 71(3): 596-601
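The calibration relationship can be sketched in a few lines. The slide only says that the limit of detection is based on the signal-to-noise ratio; the specific rule used below (signal equal to three noise standard deviations) is an assumption, as are the numeric values.

```python
def peak_amplitude(beta_p, concentration_nm):
    """Amplitude predicted by the linear calibration A_p = beta_p * c_i."""
    return beta_p * concentration_nm

def limit_of_detection(beta_p, noise_sd, snr_threshold=3.0):
    """Concentration at which the amplitude equals snr_threshold * noise_sd.

    The 3-sigma threshold is an assumed convention, not taken from the slides.
    """
    return snr_threshold * noise_sd / beta_p

beta_p = 250.0      # hypothetical regression coefficient (a.u. per nM)
noise_sd = 5.0      # hypothetical noise standard deviation (a.u.)
amplitude = peak_amplitude(beta_p, 2.0)
lod = limit_of_detection(beta_p, noise_sd)
```

A peak with a larger regression coefficient reaches the detection threshold at a lower concentration, which is why the strongest peaks give the lowest LOD.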
-
Markov chain Monte Carlo
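MCMC targeting the joint posterior $\pi(\beta, \psi, \alpha_{i,\cdot}, \sigma \mid y_i(\tilde{\nu}))$
Algorithm 1 Marginal Metropolis-Hastings
1: Draw random walk proposals for the peaks: $\beta'$, $\psi'$
2: Propose baseline $\alpha'_{i,\cdot} \sim q(\alpha_{i,\cdot} \mid y_i(\tilde{\nu}), \beta', \psi', \sigma)$
3: Propose $\sigma' \sim q(\sigma \mid y_i(\tilde{\nu}), \beta', \psi', \alpha'_{i,\cdot})$
4: Compute the marginal acceptance ratio:
   $\rho = \frac{p(y_i(\tilde{\nu}) \mid \beta', \psi', \alpha'_{i,\cdot}, \sigma')}{p(y_i(\tilde{\nu}) \mid \beta, \psi, \alpha_{i,\cdot}, \sigma)} \cdot \frac{q(\beta \mid \beta')\, q(\psi \mid \psi')\, \pi(\beta')\, \pi(\psi')\, \pi(\alpha'_{i,\cdot})\, \pi(\sigma')}{q(\beta' \mid \beta)\, q(\psi' \mid \psi)\, \pi(\beta)\, \pi(\psi)\, \pi(\alpha_{i,\cdot})\, \pi(\sigma)}$
   $\phantom{\rho} = \frac{p(y_i(\tilde{\nu}) \mid \beta', \psi')\, \pi(\beta')\, \pi(\psi')}{p(y_i(\tilde{\nu}) \mid \beta, \psi)\, \pi(\beta)\, \pi(\psi)}$
5: Accept $\beta'$, $\psi'$, $\alpha'_{i,\cdot}$, $\sigma'$ jointly with probability $\min(1, \rho)$
(assuming peak locations $\ell_p$ and number of peaks $P$ are fixed)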
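The accept/reject step can be illustrated with a much simpler random-walk Metropolis-Hastings sampler for a single peak amplitude. This toy example is not the marginal algorithm above: the baseline and noise updates are left out, the peak shape is fixed, and the data, prior and step size are all hypothetical.

```python
import math
import random

def log_posterior(amplitude, xs, ys, noise_sd, prior_sd):
    """Unnormalised log posterior: Gaussian likelihood, N(0, prior_sd^2) prior."""
    log_lik = sum(-0.5 * ((y - amplitude * math.exp(-0.5 * x * x)) / noise_sd) ** 2
                  for x, y in zip(xs, ys))
    return log_lik - 0.5 * (amplitude / prior_sd) ** 2

def random_walk_mh(xs, ys, noise_sd, prior_sd, n_iter, step_sd, rng):
    """Random-walk Metropolis-Hastings chain for the amplitude."""
    current = 0.0
    current_lp = log_posterior(current, xs, ys, noise_sd, prior_sd)
    chain = []
    for _ in range(n_iter):
        # Symmetric proposal, so the q terms cancel in the acceptance ratio
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, xs, ys, noise_sd, prior_sd)
        rho = math.exp(min(0.0, proposal_lp - current_lp))   # min(1, ratio)
        if rng.random() < rho:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

rng = random.Random(42)
xs = [-3.0 + 0.5 * j for j in range(13)]
true_amplitude, noise_sd, prior_sd = 5.0, 0.5, 10.0
ys = [true_amplitude * math.exp(-0.5 * x * x) + rng.gauss(0.0, noise_sd)
      for x in xs]
chain = random_walk_mh(xs, ys, noise_sd, prior_sd, 6000, 0.4, rng)
kept = chain[1000:]                       # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

Because this toy model is conjugate, the posterior mean of the chain can be checked against the closed-form answer, which is a useful sanity test before moving to a model with no analytic solution.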
-
Sequential Monte Carlo
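Particle-based method targeting a sequence of partial posteriors $\pi_i(\beta, \psi, \alpha_{1:i,\cdot}, \sigma \mid Y_{1:i,\cdot})$
Algorithm 2 SMC
1: Initialise $\beta^{(q)}, \psi^{(q)}, \alpha^{(q)}, \sigma^{(q)} \;\; \forall q \in \{1, \dots, Q\}$
2: Initialise importance weights, $w_0^{(q)} = \frac{1}{Q}$
3: for all observations $i = 1, \dots, n$ do
4:   Update importance weights:
     $w_i^{(q)} \propto w_{i-1}^{(q)}\, p\left(y_i(\tilde{\nu}) \mid \beta^{(q)}, \psi^{(q)}, \alpha_{i,\cdot}^{(q)}, \sigma^{(q)}\right)$   (10)
5:   Resample particles if $\mathrm{ESS}_i$ is below threshold
6:   for all particles $q \in \{1, \dots, Q\}$ do
7:     Update $\beta^{(q)}, \psi^{(q)}, \alpha^{(q)}, \sigma^{(q)}$ using Algorithm 1
8:   end for
9: end for
Chopin (2002) Biometrika 89(3): 539-551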
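Steps 4 and 5 of the SMC algorithm can be sketched as three small helpers. The slide does not define the effective sample size or the resampling scheme, so the usual definition ESS = 1 / sum of squared weights and multinomial resampling are assumed here; the particle labels and log-likelihood values are made up.

```python
import math
import random

def update_weights(weights, log_likelihoods):
    """w_i proportional to w_{i-1} * p(y_i | theta), returned normalised."""
    max_ll = max(log_likelihoods)            # subtract the max for stability
    unnormalised = [w * math.exp(ll - max_ll)
                    for w, ll in zip(weights, log_likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def effective_sample_size(weights):
    """ESS = 1 / sum(w^2) for normalised weights (assumed definition)."""
    return 1.0 / sum(w * w for w in weights)

def multinomial_resample(particles, weights, rng):
    """Draw Q particles with replacement and reset the weights to 1/Q."""
    q = len(particles)
    chosen = rng.choices(particles, weights=weights, k=q)
    return chosen, [1.0 / q] * q

particles = ["theta_1", "theta_2", "theta_3", "theta_4"]   # placeholder labels
weights = [0.25] * 4
log_likelihoods = [0.0, 0.0, math.log(2.0), math.log(4.0)]
weights = update_weights(weights, log_likelihoods)
ess = effective_sample_size(weights)
if ess < 0.75 * len(particles):               # hypothetical threshold
    particles, weights = multinomial_resample(particles, weights,
                                              random.Random(0))
```

Here the updated weights are 1/8, 1/8, 2/8 and 4/8, so the ESS drops from 4 to 64/22 (about 2.9) and the resampling step is triggered.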
-
Model evidence
Algorithm 2 provides a consistent, unbiased estimate of the marginal likelihood:
$\frac{Z_i}{Z_{i-1}} = \sum_{q=1}^{Q} w_{i-1}^{(q)}\, p\left(y_i(\tilde{\nu}) \mid \theta_k^{(q)}\right)$   (11)
$p(Y \mid M_k) \approx Z_n = \prod_{i=1}^{n} \frac{Z_i}{Z_{i-1}}$   (12)
where:
  $\theta_k^{(q)} = \left\{ \beta_k^{(q)}, \psi_k^{(q)}, \alpha_k^{(q)}, \sigma_k^{(q)} \right\}$ are the parameters of model $M_k$ for particle $q$
  $Z_n$ is the normalising constant
  $Z_0 = 1$
Del Moral, Doucet & Jasra (2006) JRSS B 68(3): 411-436
Pitt, Silva, Giordani & Kohn (2012) J. Econom. 171(2): 134-151
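The way the evidence accumulates over observations can be sketched on a toy model. The example assumes a unit-variance Gaussian likelihood with particles drawn from a standard normal prior, and it omits the resample and move steps, so it reduces to sequential importance sampling; only the running product of the incremental ratios is being illustrated.

```python
import math
import random

def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2

def log_evidence(particles, data):
    """Accumulate log Z_n as the sum of log(Z_i / Z_{i-1}) over observations."""
    q = len(particles)
    weights = [1.0 / q] * q
    log_z = 0.0
    for y in data:
        likelihoods = [math.exp(normal_logpdf(y, theta, 1.0))
                       for theta in particles]
        # Incremental ratio Z_i / Z_{i-1}: weighted sum of the likelihoods
        ratio = sum(w * lik for w, lik in zip(weights, likelihoods))
        log_z += math.log(ratio)
        # Importance weight update, normalised by the same ratio
        weights = [w * lik / ratio for w, lik in zip(weights, likelihoods)]
    return log_z

rng = random.Random(7)
particles = [rng.gauss(0.0, 1.0) for _ in range(500)]   # draws from the prior
data = [0.5, -0.2, 1.1]                                  # hypothetical data
log_z = log_evidence(particles, data)
```

Without resampling, the telescoping product equals the plain importance sampling estimate (the average over particles of the full likelihood), which gives a direct check on the bookkeeping.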
-
Thermodynamic integration
An alternative approach is to use the path sampling identity:
$\log\left\{ \frac{Z_n}{Z_0} \right\} = \int_0^1 \mathrm{E}_\gamma\left[ \frac{d \log q_\gamma(\theta)}{d\gamma} \right] d\gamma$   (13)
where $\gamma = \frac{i}{n}$ identifies the sequence of partial posterior distributions $p_\gamma = \frac{q_\gamma}{Z_i} = \pi_i(\beta, \psi, \alpha_{1:i,\cdot}, \sigma \mid Y_{1:i,\cdot})$
This integral cannot be evaluated exactly, so it must be approximated using a numerical integration method.
The expectation $\mathrm{E}_\gamma$ can be estimated using a weighted sum over the SMC particles.
Zhou, Johansen & Aston (2015) arXiv:1303.3123 [stat.ME]
Gelman & Meng (1998) Statist. Sci. 13(2): 163-185
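The numerical integration step can be illustrated on a toy problem where everything is available in closed form. Note the swap: instead of the data-tempering path of the talk, this sketch uses a likelihood-tempering path (the likelihood raised to a power between 0 and 1) for a single observation with a standard normal prior and unit-variance Gaussian likelihood. The trapezoidal rule is one possible choice of quadrature.

```python
import math

def expected_log_likelihood(gamma, y):
    """E_gamma[log N(y | theta, 1)] when p_gamma is proportional to
    N(theta | 0, 1) * N(y | theta, 1)^gamma (closed form for this toy model)."""
    post_var = 1.0 / (1.0 + gamma)
    post_mean = gamma * y * post_var
    return (-0.5 * math.log(2.0 * math.pi)
            - 0.5 * ((y - post_mean) ** 2 + post_var))

def trapezoid(f, n_steps):
    """Trapezoidal rule for the integral of f over [0, 1]."""
    h = 1.0 / n_steps
    total = 0.5 * (f(0.0) + f(1.0))
    total += sum(f(k * h) for k in range(1, n_steps))
    return total * h

y = 1.3                                   # hypothetical observation
log_z_path = trapezoid(lambda g: expected_log_likelihood(g, y), 100)
# Exact log evidence: y is marginally N(0, 2) under this model
log_z_exact = -0.5 * math.log(2.0 * math.pi * 2.0) - y * y / 4.0
```

In a real SMC run the expectation at each grid point would be replaced by a weighted average over the particles, so the quadrature error and the Monte Carlo error both contribute to the final estimate.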
-
R package serrsBayes

library(serrsBayes)
library(hyperSpec)

ramanSpectra <- read.spc("mfutils.spc")
wavenumbers <- wl(ramanSpectra)
peakLoc <- wl2i(ramanSpectra, c(964, 1138, 1218, ...))

# informative priors for the peaks and baseline
lPriors <- list(scale.mu=log(25.27) - (0.4^2)/2,
    scale.sd=0.4, bl.smooth=1.25e3, bl.knots=200,
    amp.mu=3449, amp.sd=5672, noise.sd=3,
    noise.nu=length(wavenumbers) * nrow(ramanSpectra),
    beta.sd=1000)

result <- fitPeaksWithBaselineSMC(wavenumbers,
    ramanSpectra[[]], peakLoc, lPriors)
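The expression used for the scale prior suggests a lognormal distribution parameterised so that its mean is 25.27. The check below assumes that `scale.mu` and `scale.sd` are the mean and standard deviation on the log scale, and that the operator lost from the slide between `log(25.27)` and `(0.4^2)/2` is a minus sign; both are assumptions rather than facts taken from the package documentation.

```python
import math

scale_sd = 0.4
scale_mu = math.log(25.27) - scale_sd ** 2 / 2.0   # assumed log-scale mean

# Moments of a lognormal distribution with these log-scale parameters
prior_mean = math.exp(scale_mu + scale_sd ** 2 / 2.0)
prior_median = math.exp(scale_mu)
```

Under these assumptions the prior mean is 25.27 cm^-1 and the median is a little lower, since subtracting half the log-scale variance compensates for the skew of the lognormal.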
-
Synthetic data: Lorentzian peaks + Cauchy kernel
[Figure: three panels of intensity against Raman shift, 250 to 1750]
-
Lorentzian peaks + squared exponential kernel
[Figure: three panels of intensity against Raman shift, 250 to 1750]
-
Gaussian peaks + sq. exp. kernel
[Figure: three panels of intensity against Raman shift, 250 to 1750]
-
Gaussian peaks + Cauchy kernel
[Figure: three panels of intensity against Raman shift, 250 to 1750]
-
Real data: TAMRA + Cauchy kernel
[Figure: three panels of intensity against Raman shift, 250 to 1750]
-
Real data: TAMRA + sq. exp. kernel
[Figure: three panels of intensity against Raman shift, 250 to 1750]
-
Real data: parameter estimates for
[Figure: two panels, each with a horizontal axis from 0 to 10000. (a) Cauchy, vertical axis 200 to 230. (b) Squared Exponential, vertical axis 210 to 250]
-
Model Choice
The Bayes factor (log10 BF) correctly identifies the generative model for simulated spectra with Gaussian (194) and Lorentzian (-160) peaks:
Table: Simulation study
Data         M_k        log10 p(Y | M_k)
Gaussian     sq. exp.   -2011
Gaussian     Cauchy     -2205
Lorentzian   sq. exp.   -2163
Lorentzian   Cauchy     -2004
-
Model Choice for TAMRA
The Bayes factor favours Lorentzian peaks (log10 BF = -32) for tetramethylrhodamine (TAMRA):
Table: Observed spectra (TAMRA)
Data     M_k        log10 p(Y | M_k)
TAMRA    sq. exp.   -2257
TAMRA    Cauchy     -2225
This indicates very strong evidence of long-range dependence between peaks in SERS spectra.
Kass & Raftery (1995) JASA 90(430): 773-795
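The quoted Bayes factors are differences of log10 evidences, with the squared exponential model in the numerator. The sketch below recomputes them from the rounded table entries; the dictionary layout and function name are illustrative only.

```python
# log10 p(Y | M_k), copied from the two tables (rounded values)
log10_evidence = {
    ("Gaussian", "sq. exp."): -2011,
    ("Gaussian", "Cauchy"): -2205,
    ("Lorentzian", "sq. exp."): -2163,
    ("Lorentzian", "Cauchy"): -2004,
    ("TAMRA", "sq. exp."): -2257,
    ("TAMRA", "Cauchy"): -2225,
}

def log10_bayes_factor(data):
    """log10 BF of the squared exponential model against the Cauchy model."""
    return log10_evidence[(data, "sq. exp.")] - log10_evidence[(data, "Cauchy")]

bf_gaussian = log10_bayes_factor("Gaussian")
bf_lorentzian = log10_bayes_factor("Lorentzian")
bf_tamra = log10_bayes_factor("TAMRA")
```

The rounded entries give 194 for the Gaussian data, -159 for the Lorentzian data and -32 for TAMRA. The Lorentzian figure differs by one from the quoted -160, presumably because the quoted value was computed from unrounded evidences. A negative value favours the Cauchy (Lorentzian) model.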
-
Dilution study for TAMRA
315 spectra at 21 different concentrations, from 0.13 to 24.7 nM
[Figure: spectra, one curve per concentration (24.7, 23.4, 22.1, 20.8, 19.5, 18.2, 16.9, 15.6, 14.3, 13, 11.7, 10.4, 9.1, 7.8, 6.5, 5.2, 3.9, 2.6, 1.3, 0.65 and 0.13 nM); intensity (a.u.), 0 to 25000, against wavenumber (cm^-1), 500 to 2000]
-
Baseline correction: TAMRA
[Figure: one curve per concentration, 24.7 nM down to 0.13 nM; intensity (a.u.), 0 to 25000, against wavenumber (cm^-1), 500 to 2000]
-
Spectral signature: TAMRA
[Figure: one curve per concentration, 24.7 nM down to 0.13 nM; intensity (a.u.), 0 to 15000, against wavenumber (cm^-1), 500 to 2000]
-
95% HPD intervals: TAMRA
$\ell_p$ (cm^-1)   $\beta_p$ (nM^-1)    FWHM (cm^-1)      LOD (nM)
460    [6.73; 17.23]      [0.00; 14.43]     [0.211; 0.784]
505    [167.95; 177.83]   [14.36; 15.52]    [0.019; 0.047]
632    [253.65; 263.16]   [11.54; 12.13]    [0.012; 0.032]
725    [19.63; 29.52]     [9.97; 16.43]     [0.123; 0.316]
752    [44.74; 54.81]     [19.49; 23.45]    [0.059; 0.156]
843    [23.48; 33.73]     [15.97; 22.26]    [0.106; 0.297]
965    [17.78; 28.09]     [12.07; 20.32]    [0.135; 0.355]
1140   [25.49; 36.11]     [20.41; 27.92]    [0.089; 0.253]
1190   [18.67; 28.99]     [11.27; 19.36]    [0.110; 0.352]
1220   [147.84; 158.30]   [17.68; 19.20]    [0.020; 0.051]
1290   [31.19; 42.49]     [15.72; 25.80]    [0.080; 0.213]
1358   [210.46; 221.96]   [17.63; 18.97]    [0.015; 0.035]
1422   [53.13; 63.99]     [15.34; 18.16]    [0.059; 0.135]
1455   [39.05; 53.87]     [20.61; 41.11]    [0.069; 0.175]
1512   [146.07; 157.28]   [20.81; 22.38]    [0.019; 0.049]
1536   [209.18; 221.03]   [14.43; 15.80]    [0.014; 0.036]
1570   [38.54; 53.67]     [23.87; 51.02]    [0.073; 0.173]
1655   [467.58; 477.91]   [17.40; 17.92]    [0.006; 0.016]
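A highest posterior density (HPD) interval can be estimated from posterior samples as the shortest interval that contains the required share of them. The sketch below is one common way to do this for a unimodal posterior; it is not necessarily how the intervals in the table were computed, and the sample values are made up.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))          # number of samples inside
    # Slide a window of k consecutive order statistics; keep the narrowest
    best = min(range(n - k + 1),
               key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]

# Hypothetical samples: 19 values from 0 to 18 and one outlier at 100
samples = [float(v) for v in range(19)] + [100.0]
lower, upper = hpd_interval(samples)
```

With 20 samples the window holds 19 of them, so the interval excludes the outlier and runs from 0 to 18. An equal-tailed interval would not have this shortest-width property for skewed posteriors.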
-
TAMRA at 0.13 nM
[Figure: intensity (a.u.), 0 to 60, against wavenumber (cm^-1), 500 to 2000; legend: baseline-corrected spectra, posterior mean, 3σ]
-
TAMRA at 0.65 nM
[Figure: intensity (a.u.), 0 to 400, against wavenumber (cm^-1), 500 to 2000; legend: baseline-corrected spectra, posterior mean, 3σ]
-
Summary
Semi-parametric model of hyperspectral data:
  Joint estimation of baseline & peaks
  Continuous representation of discretised spectra
  Data tempering using SMC
  Bayesian model choice for long-range dependence
Ongoing work:
  Peak detection (estimation of $\ell_p$ & $P$)
  Informed source separation for multiplex spectra
  Spatio-temporal correlation between spectra
-
Appendix
For Further Reading I
Moores, Gracie, Carson, Faulds, Graham & Girolami. Bayesian modelling and quantification of Raman spectroscopy. In prep.
Gracie, Moores, Smith, Harding, Girolami, Graham & Faulds. Preferential attachment of specific fluorescent dyes and dye labelled DNA sequences in a SERS multiplex. Anal. Chem., 88(2): 1147-1153, 2016.
Zhong, Girolami, Faulds & Graham. Bayesian methods to detect dye-labelled DNA oligonucleotides in multiplexed Raman spectra. J. R. Stat. Soc. Ser. C, 60(2): 187-206, 2011.
Zhou, Johansen & Aston. Towards Automatic Model Comparison: An Adaptive Sequential Monte Carlo Approach. arXiv:1303.3123 [stat.ME], 2015.
-
For Further Reading II
Chopin. A Sequential Particle Filter Method for Static Models. Biometrika, 89(3): 539-551, 2002.
Pitt, Silva, Giordani & Kohn. On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econometrics, 171(2): 134-151, 2012.
Jones, McLaughlin, Littlejohn, Sadler, Graham & Smith. Quantitative Assessment of Surface-Enhanced Resonance Raman Scattering for the Analysis of Dyes on Colloidal Silver. Anal. Chem., 71(3): 596-601, 1999.
Ramsay & Silverman. Functional Data Analysis, 2nd ed. Springer, 2005.
-
Raman scattering
Illustration courtesy Jake Carson (U. Warwick)
-
SERS
Surface-enhanced: proximity to a nanoplasmonic substrate (silver/gold colloid)
Illustration courtesy Jake Carson (U. Warwick)
-
SERRS
Surface-enhanced: proximity to a nanoplasmonic substrate (silver/gold colloid)
Resonance: tune excitation wavelength to an electronic transition of the molecule
Illustration courtesy Jake Carson (U. Warwick)
-
Dilution study for FAM
315 spectra at 21 different concentrations, from 0.13 to 24.7 nM
[Figure: spectra, one curve per concentration, 24.7 nM down to 0.13 nM; counts, 0 to 10000, against wavenumber (cm^-1), 500 to 2000]
-
Results: Baseline correction
[Figure: intensity (a.u.) against wavenumber (cm^-1), 500 to 2000, one curve per concentration, 24.7 nM down to 0.13 nM. (a) Posterior means of the baselines. (b) Baseline-corrected spectra]
-
Results: Quantification
SERS peak intensities at 650 cm^-1: 95% CI [257.7; 262.5] $c_i$
[Figure: (a) Linear regression for $\beta_p$: intensity (a.u.), 0 to 8000, against final target concentration (nM), 0 to 25; legend: expectation, 95% HPD interval, +/- 2. (b) Standardised residuals against final target concentration (nM), 0 to 25; legend: expectation, +/- 1.96, +/- 2.58]