integral transforms from finite data: an application of...

9
Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis Luca Ambrogioni Eric Maris Radboud University Abstract Computing accurate estimates of the Fourier transform of analog signals from discrete data points is important in many fields of science and engineering. The conventional approach of performing the discrete Fourier transform of the data implicitly assumes periodicity and bandlimitedness of the signal. In this paper, we use Gaussian process regression to esti- mate the Fourier transform (or any other in- tegral transform) without making these as- sumptions. This is possible because the pos- terior expectation of Gaussian process regres- sion maps a finite set of samples to a function defined on the whole real line, expressed as a linear combination of covariance functions. We estimate the covariance function from the data using an appropriately designed gradi- ent ascent method that constrains the so- lution to a linear combination of tractable kernel functions. This procedure results in a posterior expectation of the analog signal whose Fourier transform can be obtained an- alytically by exploiting linearity. Our simu- lations show that the new method leads to sharper and more precise estimation of the spectral density both in noise-free and noise- corrupted signals. We further validate the method in two real-world applications: the analysis of the yearly fluctuation in atmo- spheric CO 2 level and the analysis of the spectral content of brain signals. Proceedings of the 21 st International Conference on Ar- tificial Intelligence and Statistics (AISTATS) 2018, Lan- zarote, Spain. PMLR: Volume 84. Copyright 2018 by the author(s). 1 Introduction The Fourier transform is perhaps the most impor- tant mathematical tool for the analysis of analog sig- nals. In order to be processed with digital computers, analog signals need to be sampled at a finite num- ber of time points. From the samples, the Fourier transform of the signal is usually estimated using the discrete Fourier transform (DFT). However, the Nyquist–Shannon sampling theorem states that the DFT provides an unbiased estimate if and only if the underlying signal is periodic and contains no power above the Nyquist frequency [Rabiner and Gold, 1975]. Unfortunately, estimating the Fourier transform of sig- nals that do not respect these properties is an intrin- sically ill-posed problem as there are infinitely many functions that perfectly fit any finite set of samples. From a Bayesian perspective, ill-posed problems can be solved by assigning a prior probability distribu- tion to the underlying functional space [Idier, 2013]. Gaussian process (GP) priors have gained substantial popularity in these kinds of applications due to their flexibility, robustness and analytical tractability [Ras- mussen, 2006]. With this paper we introduce the use of GP regression for estimating the Fourier transform (or any other linear transform) of analog signals from a fi- nite set of samples. The estimation procedure assumes neither periodicity nor discreteness of the signal and outputs a closed-form function expressed as a linear combination of tractable kernels. This latter feature is particularly important since it allows to analytically perform a wide range of further analyses using closed- form expressions. This reduces the impact of numer- ical errors and instabilities on the analysis pipeline. We estimate the GP covariance function using a con- strained gradient ascent method that maintains the analytical tractability while being able to analyze ar- bitrarily complex signals with high extrapolation and interpolation performance.

Upload: others

Post on 08-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Integral Transforms from Finite Data: An Application of GaussianProcess Regression to Fourier Analysis

Luca Ambrogioni Eric MarisRadboud University

Abstract

Computing accurate estimates of the Fouriertransform of analog signals from discrete datapoints is important in many fields of scienceand engineering. The conventional approachof performing the discrete Fourier transformof the data implicitly assumes periodicity andbandlimitedness of the signal. In this paper,we use Gaussian process regression to esti-mate the Fourier transform (or any other in-tegral transform) without making these as-sumptions. This is possible because the pos-terior expectation of Gaussian process regres-sion maps a finite set of samples to a functiondefined on the whole real line, expressed asa linear combination of covariance functions.We estimate the covariance function from thedata using an appropriately designed gradi-ent ascent method that constrains the so-lution to a linear combination of tractablekernel functions. This procedure results ina posterior expectation of the analog signalwhose Fourier transform can be obtained an-alytically by exploiting linearity. Our simu-lations show that the new method leads tosharper and more precise estimation of thespectral density both in noise-free and noise-corrupted signals. We further validate themethod in two real-world applications: theanalysis of the yearly fluctuation in atmo-spheric CO2 level and the analysis of thespectral content of brain signals.

Proceedings of the 21st International Conference on Ar-tificial Intelligence and Statistics (AISTATS) 2018, Lan-zarote, Spain. PMLR: Volume 84. Copyright 2018 by theauthor(s).

1 Introduction

The Fourier transform is perhaps the most impor-tant mathematical tool for the analysis of analog sig-nals. In order to be processed with digital computers,analog signals need to be sampled at a finite num-ber of time points. From the samples, the Fouriertransform of the signal is usually estimated usingthe discrete Fourier transform (DFT). However, theNyquist–Shannon sampling theorem states that theDFT provides an unbiased estimate if and only if theunderlying signal is periodic and contains no powerabove the Nyquist frequency [Rabiner and Gold, 1975].Unfortunately, estimating the Fourier transform of sig-nals that do not respect these properties is an intrin-sically ill-posed problem as there are infinitely manyfunctions that perfectly fit any finite set of samples.From a Bayesian perspective, ill-posed problems canbe solved by assigning a prior probability distribu-tion to the underlying functional space [Idier, 2013].Gaussian process (GP) priors have gained substantialpopularity in these kinds of applications due to theirflexibility, robustness and analytical tractability [Ras-mussen, 2006]. With this paper we introduce the use ofGP regression for estimating the Fourier transform (orany other linear transform) of analog signals from a fi-nite set of samples. The estimation procedure assumesneither periodicity nor discreteness of the signal andoutputs a closed-form function expressed as a linearcombination of tractable kernels. This latter featureis particularly important since it allows to analyticallyperform a wide range of further analyses using closed-form expressions. This reduces the impact of numer-ical errors and instabilities on the analysis pipeline.We estimate the GP covariance function using a con-strained gradient ascent method that maintains theanalytical tractability while being able to analyze ar-bitrarily complex signals with high extrapolation andinterpolation performance.

Page 2: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Integral Transforms from Finite Data

1.1 Related works

Bayesian methods have become very influential in thefield of spectral estimation [Gregory and Mohammad-Djafari, 2001]. Several nonparametric Bayesian ap-proaches have been applied to the estimation of thespectral density of stochastic signals [Carter and Kohn,1997, Gangopadhyay et al., 1999, Liseo et al., 2001].These methods are based on an asymptotic approxima-tion of the distribution of the periodogram of discrete-time signals [Whittle, 1957]. Recently, some work hasbeen done on the use of GP regression for stochasticspectral estimation of discretely sampled analog sig-nals. The GP spectral mixture (GP-SM) approachmodels the spectral density by fitting the parametersof a small number of Gaussian functions [Wilson andAdams, 2013]. A more flexible alternative is the Gaus-sian Process Convolution Model (GPCM), which isbased on a two-stage generative model where the sig-nal is assumed to be generated by convolving a whitenoise process with a filter function sampled from a GP[Wilson and Adams, 2013]. The estimates obtainedusing GPCM and GP-SM do not directly provide anestimate of the Fourier transform since the spectraldensity does not contain phase information. In ourmethod, we learn the spectral density using a newgradient ascent method that shares the nonparametricflexibility of GPCM while being significantly simpler.This simplicity comes at the price of neglecting theuncertainty about the spectral density estimate. Themost important feature of this learning approach isthat the resulting point-estimate is expressed as a lin-ear combination of tractable kernel functions. This isa requirement for our Fourier estimation procedure.This Fourier estimation procedure can be seen as ageneralization of the GP quadrature method, whichuses GP regression for numerical integration of definiteintegrals [O’Hagan, 1991]. In this approach, the inte-grand is assumed to be sampled from a GP distributionand evaluated on a finite set of points. Importantly,under these assumptions, the posterior distribution ofthe integral can be obtained in closed-form.

2 Background

The Fourier transform of a function f(t) is defined asfollows:

F[f(t)

](ξ) = f(ξ) =

1

∫ +∞

−∞e−iξtf(t)dt , (1)

where e−iξt = cos ξt− i sin ξt is a complex-valued sinu-soid. The Fourier transform is a linear operator, mean-ing that F

[af(t)+bg(t)

](ξ) = aF

[f(t)](ξ)+bF

[g(t)](ξ).

We can interpret the Fourier transform as a specialcase of linear integral transform. A general linear in-

tegral transform has the following form:

IA[f(t)

](s) =

∫ a

b

A(s, t)f(t)dt , (2)

where the bivariate function A(s, t) is the kernel of thetransform. The limits of integration, a and b, can befinite or infinite.

2.1 Gaussian Process Regression

GP methods are popular Bayesian nonparametrictechniques for regression and classification. A generalregression problem can be stated as follows:

yt = f(t) + εt , (3)

where the data point yt is generated by the latent func-tion f(t) plus a zero-mean noise term εt that we willassume to be Gaussian. The main idea of GP regres-sion is to use an infinite-dimensional Gaussian prior(a GP) over the space of functions f(t). This infinite-dimensional prior is fully specified by a mean function,usually assumed to be identically equal to zero, and acovariance function K(t, t′) that determines the priorcovariance between two different time points. The pos-terior distribution over f(t) can be obtained by ap-plying Bayes theorem. Given a set of training points(tk, yk), it can be proven that the posterior expecta-tion mf (t) is a finite linear combination of covariancefunctions:

mf (t) =∑k

wkK(t, tk) , (4)

where the weights are linear combinations of datapoints

wk =∑j

Akjyj . (5)

In this expression the matrix A is given by the follow-ing matrix formula:

A = (K + λI)−1 , (6)

where λ is the variance of the sampling noise andthe matrix K is obtained by evaluating the covariancefunction for each couple of time points:

Kjk = K(tj , tk) . (7)

The derivation of these results is given in [Rasmussen,2006].

3 Computing Integral TransformsUsing GP Regression

One of the most appealing features of GP regression isthat, while the training data are finite and discretely

Page 3: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Luca Ambrogioni, Eric Maris

sampled, the posterior expectation is defined over thewhole time axis. Furthermore, Eq. 4 shows that thisexpectation is a linear combination of covariance func-tions. From this linearity, it follows that every integraltransform of mf (t) can be calculated as a linear com-bination of the integral transform of the covariancefunctions K(t, tk):

IA[mf

](s) =

∫ a

b

A(s, t)

[∑k

wkK(t, tk)

]dt (8)

=∑k

wk

∫ a

b

A(s, t)K(t, tk)dt .

Clearly, this transform is well-defined as far as thetransform

∫ abA(s, t)K(t, tk)dt exists. The special case

where the integral operator is a simple definite integralhas been applied to numerical integral analysis and isknown as the GP quadrature rule [O’Hagan, 1991]:∫ a

b

f(t)dt ≈∑k

wk

∫ a

b

K(t, tk)dt . (9)

In the case of the Fourier transform, Eq. 8 becomes:

IF[mf

](ξ) =

∑k

wk1

∫ +∞

−∞e−iξtK(t, tk)dt . (10)

This expression further simplifies when K(t, tk) is sta-tionary, meaning that K(t, t′) = K(t − t′, 0). In thiscase, we can use the Fourier shift theorem and obtain:(∑

k

wke−iξtk

)(1

∫ +∞

−∞e−iξtK(t, 0)dt

)(11)

= K(ξ, 0)∑k

wke−iξtk .

where the rightmost factor in the right hand side isproportional to the DFT of the GP weights, as de-fined in Eq. 5. This result hints to a deep connectionbetween the GP Fourier approach and the classicalDFT approach based on the Nyquist–Shannon sam-pling theorem. This connection will be made explicitin section 4.

3.1 Hierarchical Covariance Learning

The aim of this subsection is to introduce a hierarchi-cal Bayesian model that allows to estimate the GP co-variance function from the data using a MAP estima-tor. We restrict our attention to stationary covariancefunctions, i.e. covariance functions that solely dependon the difference between the time points τ = t′−t. Weconstruct the hierarchical model by defining a hyper-prior for the the spectral density S(ξ), defined as theFourier transform of the covariance function:

S(ξ) =1

∫ +∞

−∞K(τ)e−iξτdτ . (12)

Since the Fourier transform is invertible, an estimate ofthe spectral density can be directly converted into anestimate of the covariance function. Using a GP hyper-prior on the spectral density S(ξ) would be a con-venient modeling choice as it easily allows to specifyits prior smoothness, thereby regularizing the estima-tion. For example, we could use a GP hyper-prior withsquared exponential (SE) kernel (covariance) function:

KSE(ξ, ξ′) = e−(ξ−ξ′)2

2σ2 , (13)

where the scale parameter σ regulates the priorsmoothness. Unfortunately, this is not a valid priorfor the spectral density of a GP since it assigns non-zero probability to negative valued spectra which donot correspond to any valid stationary stochastic pro-cess. However, we can obtain a proper prior distribu-tion by restricting this GP probability measure to thefollowing positive-valued functional space:

s(ξ) =∑j

eajKSE(ξ, ξj)|aj ∈ R , (14)

where ξj are the discrete Fourier frequencies of thesampled data points. In order to obtain the likelihoodof the model, we assume that each DFT coefficientonly depends on the value of the spectrum correspond-ing to its frequency. For each frequency, the resultinglikelihood functions are complex normal distributions:

log p(yξi |s(ξ)

)= −

|yξj |2

s(ξ) + λ− log

(π(s(ξ) + λ)

)(15)

where yξj is the j-th DFT coefficient of the sampleddata and λ is the variance of the sampling noise. Theassumption of conditional independence is justified bythe fact that the Fourier coefficients of stationary GPsare independent random variables. Nevertheless, theassumption is not exact since the finite length of thesampling period induces correlations between the DFTcoefficients. We mitigated the bias induced by this ap-proximation by computing the DFT coefficients usinga Hann taper, which reduces the correlations betweenDFT coefficients corresponding to distant frequencies.We calculate the maximum-a-posteriori (MAP) esti-mate by means of gradient ascent applied to the poste-rior distribution of the spectral density. The algorithmmaximizes the posterior distribution with respect tothe log-weights aj and therefore only finds solutions inthe restricted subspace of Eq. 14. In the log-weightspace, the gradient of the (approximate) log marginallikelihood l is

∂l

∂ak= eak

∑j

(|yξj |2 − (s(ξj) + λ)

)(s(ξj) + λ

)2 KSE(ξk, ξj)

(16)

Page 4: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Integral Transforms from Finite Data

and the gradient of the (log-)prior p is

∂p

∂ak= −eak

∑j

KSE(ξk, ξj)eaj . (17)

The resulting MAP estimate has the following form:

S(ξ) =∑j

ehjKSE(ξ, ξj) , (18)

where hj are the optimized log-weights. Finally, ourpoint estimate of the covariance function is obtainedby applying the inverse Fourier transform to the MAPestimate of the spectral density:

K(τ) =

∫ +∞

−∞eiξτ S(ξ)dξ (19)

=∑j

ehj∫ +∞

−∞eiξτKSE(ξ, ξj)dξ

= σ∑j

ehje−σ2τ2

2 −iξjτ .

This covariance function has the advantage of captur-ing the spectral features of the data while keeping atractable analytic expression as a linear combination ofthe inverse Fourier transforms of SE kernel functions.Note that, if we have access to multiple realizationsof a stochastic time series, we can learn the spectraldensity from the whole set of realizations simply bysumming the log marginal likelihood of each realiza-tion. We will use this procedure in our analysis ofneural oscillations.

3.2 Bayes-Gauss-Fourier transform

We can now plug in the data-driven covariance func-tion in our expression for the integral transform andexploit the linear structure of the covariance functionby interchanging summation and integration:

σ∑k,j

wkehj

∫ a

b

A(s, t)e−σ2(t−tk)2

2 −iξj(t−tk)dt . (20)

In the case of the Fourier transform this formula spe-cializes to ∑

k,j

wkehje−

(ω−ξj)2

2σ2−iωtk (21)

because

F[e−

σ2(t−tk)2

2 −iξj(t−tk)](ω) = σ−1e−

(ω−ξj)2

2σ2−iωtk .

We refer to the resulting transformation of the data asthe Bayes-Gauss-Fourier (BGF) transform.

4 Theoretical considerations

In this section, we will lay a more rigorous mathe-matical foundation of the newly introduced methods.The aim is to introduce a larger theoretical frameworkthat allows to directly compare the DFT-based meth-ods with our new GP Fourier transform. In particular,we will show that the DTF method can be seen as aspecial case GP Fourier transform.

4.1 Fourier transform of band-limitedperiodic signals

We begin by reviewing some well known results aboutthe Fourier analysis of periodic band-limited signals.This will pave the way to a Bayesian reformulationof Fourier analysis, which we will introduce in thenext subsection. Consider a vector of samples y =(yt0 , .., ytN ) obtained by evaluating an analog signal atthe tuple of time points T = (t0, ..., tN ). The Fourieranalysis of a discretely sampled analog signal can bedecomposed into two basic operations.

First, the set of samples have to be mapped into awell-defined analog signal. This operation can be for-malized by a linear operator M : Y → H, mapping thesample space Y to an appropriate functional space H.The operator is defined by the property

M[y](tj) = f(tj) = yj . (22)

This simply means that the values of the resultingfunction at the sample time points have to be equalto the samples.

Second, the analog signal has to be mapped into itsFourier transform. From an abstract point of view,the Fourier transform can be seen as a linear operatorbetween functional spaces F : H → I, where the exactnature of the spaces H and I depends to the specificapplication. Under some regularity conditions over thefunctional space H [Schechter, 1971], this second op-eration is unproblematic.

Unfortunately, the first operation is intrinsically ill-posed since there is not a unique way to map afinite set of samples into an analog signal. Theclassical solution to this problem relies on theNyquist–Shannon sampling theorem, which states thatan analog signal can be perfectly reconstructed fromthe finite series of equally spaced samples y =(y−T/2, y−T/2+δt, ...yT/2−δt, , yT/2) if and only if: 1)the signal is periodic with period T and 2) it does notcontain harmonic components with frequency higherthan 1/(2δt). We will denote this functional spaceof periodic and band-limited functions as Hbl. Un-der these conditions, the operator M : Y → Hbl isuniquely defined. The functional space Hbl is the span

Page 5: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Luca Ambrogioni, Eric Maris

of a finite number of complex exponential basis func-tions

Φk(t) =1

Neiωkt . (23)

where ωk = 2πk/T and the integer k ranges from−N/2 to N/2. Therefore, any periodic and band-limited function fpb(t) ∈ Hbl can be expressed as fol-lows

fbl(t) =∑k

αkΦk(t) . (24)

Combining Eq. 24 with Eq. 22, we obtain a linear sys-tem of equations with a unique solution since the set ofbasis functions is linearly independent. The solutioncoefficients are the DFT coefficients of the samples:

fbl(t) =∑k

(φ∗ky

)Φk(t) , (25)

where the vectors φk are obtained by evaluating thebasis functions Φk(t) at the sample time points. Wecan now take the Fourier transform of the function inEq. 25, the result is:

F[fbl(t)] =∑k

(φ∗ky

)F[Φk(t)] (26)

=∑k

(φ∗ky

)N

δ(ξ − ωk) .

In the last expression, the symbol δ(ξ − ωk) denotesthe Dirac delta function. Intuitively, Eq. 26 says thatall the energy of the signal is concentrated in a finitenumber of DTF frequencies.

4.2 Fourier transform from a functionalBayesian viewpoint

We will now show that the classical method we re-viewed in the previous subsection is a special case ofGP Fourier analysis. First of all, the problem can beintegrated into a Bayesian probabilistic framework byintroducing observation noise into Eq. 22 and by as-signing a prior distribution over the functional spaceHbl. Using Gaussian observation noise and a sphericalGaussian prior distribution over the coefficients αk, weobtain the following Bayesian problem:

ytj ∼ N(fbl(tj) , λ

)(27)

αk ∼ N(0, β),

fpb(tj) =∑k

αkΦk(tj) ,

where λ is the variance of the observation noise and βis the variance of the prior over the coefficients. Thisis a Bayesian linear regression. Importantly, regard-less to the value of the prior variance, the posteriordistribution of the coefficients concentrate all its mass

on the solution given in Eq. 25 when the observationnoise tends to zero. Therefore, the Bayesian problemin Eq. 27 is a probabilistic generalization of the deter-ministic problem given by Eq. 24 and Eq. 22.

The Bayesian linear regression in Eq. 27 can now bereformulated as a GP regression. This generalizes ouranalysis to functional spaces that cannot be obtainedfrom a finite set of basis functions, thereby allowing formore flexible and realistic prior distributions. In par-ticular, we will work on the space of complex-valuedfunctions whose domain is R, which we will denoteHΩ. We can now define a Gaussian probability mea-sure over HΩ ⊇ Hbl that is equivalent to the coeffi-cient space prior distribution given in Eq. 27. Thisis achieved by constructing the following covariancefunction [Rasmussen, 2006]:

KBL(t, t′) = β∑k

Φk(t)Φ∗k(t′) . (28)

Using Eq. 28, we can reformulate Eq. 27 as follows:

ytj ∼ N(fbl(tj) , λ

)(29)

fpb(tj) ∼ CGP(0,Kbl(t, t

′)),

where CGP(m(t), k(t, t′)

)denotes a complex-valued

circularly-symmetric GP. Circularly-symmetric GPsare the complex-valued analogous of stationary real-valued GPs. They are defined by a complex-valuedmean function m(t) and a complex-valued hermitiancovariance function function k(t, t′). Hermitian co-variance functions are positive definite and Hermitian,meaning that k(t, t′) = k(t′, t)∗ [Boloix-Tortosa et al.,2014, 2015, Ambrogioni and Maris, 2016].

The main feature of Eq. 28 is that the resulting corre-lation between any pair of sample time points is zero(except for the first and the last time point, where thecorrelation is exactly equal to one). However, since thecovariance function is different from zero almost every-where, the Bayesian reformulation shows that we arestill making strong assumptions about the behavior ofthe function outside of the set of sample time points.Strikingly, the covariance function is periodic with pe-riod T and, consequently, all the functions obtainedfrom this GP are periodic with period T .

From a Bayesian point of view, determining the priordistribution based on the sample time points is rathercounterintuitive since our prior knowledge of the sig-nal should not depend on the sampling frequency andthe total sampling time. Furthermore, the covariancefunction given in Eq. 28 is degenerate, meaning thatit assigns non-zero probability measure only to a finitedimensional functional sub-space. This leads to somerather paradoxical situations. For example, considertwo scientists that are analyzing the same radio sig-nal, the first sampling it at 300 kHz for a period of 1s

Page 6: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Integral Transforms from Finite Data

and the second at 300 kHz for 1.1s. If they use Eq. 28,the resulting prior distributions will have a disjointsupport, meaning that the spaces of signals that theyconsider possible are completely non-overlapping!

Using Eq. 29, we can generalize the analysis to otherprior distributions simply by choosing another covari-ance function. A possible compromise solution is tomultiply the band-limited covariance function given inEq. 28 by a radial basis function such as the squaredexponential:

KrBL(t, t′) = βKSE(t, t′; ν)∑k

Φk(t)Φ∗k(t′) . (30)

Where ν denotes the length scale. This relaxed band-limited (rBL) covariance function assigns zero correla-tion between any couple of sample time points but itdoes not enforce periodicity. The resulting GP Fouriertransform of the data is obtained by plugging Eq. 28into Eq. 4 and taking the Fourier transform of theresulting linear combination of translated covariancefunctions:

F[mf ](ξ) = β∑j

wj F[KrBL(t, tj ; ν)](ξ) (31)

= βν

(∑k

e−ν2 (ξ−ωk)2

2

)(1

T

∑j

wj e−iξtj

).

Crucially, while the resulting prior is still dependent onthe sample time points, the rBL GP Fourier transformdoes not concentrate all the energy of the signal on a,rather arbitrary, finite set of frequencies.

Both the BL and the rBL covariance functions are de-signed to be maximally uninformative on the sampletime points. This leads to Fourier analysis that are notbiased toward a particular set of frequencies. Unfortu-nately, this also leads to a GP analysis that does notmeaningfully extrapolate the signal beyond the sampletime points. In order to be able to extrapolate withoutbiasing a predefined set of frequencies, the covariancefunction has to be learned from the data. In fact, thisallows to detect the periodic components of the sig-nal and to use this information in order to extrapolatebeyond the sampling range. In subsection 3.2, we out-lined a Bayesian learning scheme based on gradientascent. The resulting BGF covariance function givenin Eq. 19 can be rewritten as follows:

K(t, t′) = N2KSE(t, t′; 1/σ)∑k

ehk(y)Φk(t)Φ∗k(t′) ,

(32)This reformulation shows that the BGF covariancefunction is a spectrally weighted version of the rBLcovariance function.

5 Experiments

In this section we validate our new method on simu-lated and real data. We focus the validation studieson the problem of estimating the power spectrum ofdeterministic and stochastic signals, as this is perhapsthe most common application of the Fourier transform.We compare the performance of the BGF transformwith more conventional DFT-based estimators.

5.1 Analysis of noise-free signals

We investigate the performance of our method in re-covering the Fourier transform of a discretely sam-pled deterministic signal. As first example signal, weuse the following an-harmonic windowed oscillationg(t) = e−t

2/2a2cos3ω0t. We sampled the signal fromtmin = −25 to tmax = 25 in steps of 0.01 and witha = 15 and ω0 = 3

5π. These samples were analyzedusing the BGF transform as described in the Methods.Fig. 1A shows the result of the GP regression in thetime domain. Clearly, the expected value of the GPregression (blue line) is able to extrapolate the wave-form of the signal far beyond the data points. Nextwe compared our GP-based estimate with two moreconventional estimates of the spectrum |g(ω)|2: theDiscrete Fourier Transform (DFT) of the data using asquare and a Hann taper. The spectra obtained usingthe DFT methods were normalized to have the sameenergy of the ground truth signal in the set of DTFfrequencies. This normalization is required since theDTF-based methods assign all the energy of the analogsignal to the DTF frequencies instead of spreading itinto a continuous spectrum. Fig. 1B shows these spec-tral estimates, together with the ground truth spec-trum, on a log scale. The BGF transform (green line)captures the shape and width of the four main lobesalmost perfectly, despite the fact that their peaks arenot fully aligned with the discrete Fourier frequenciesof the sampled data (which are determined by the sig-nal’s length). Furthermore, the BGF transform hassignificantly higher sidelobe suppression than the DFTestimates, up to 106 higher than the DFT with Hanntaper.

We quantitatively evaluated these observations ina simulation study. We randomly generated200 ground truth signals of the form g(t) =

e−t2/2a2cos3(ω0t+ φ0), where the parameters φ0 ∈

[0, 2π], ω0 ∈ [0.3 × 2π, 0.6 × 2π] and a ∈ [0, 30] weresampled from uniform distributions. For each gener-ated signal, we computed the absolute deviation be-tween the ground truth spectrum and those estimatedusing BFT transform, DFT and tapered DFT. Weevaluated the deviations separately for the passband(log10 g(ξ) > −6) and the stopband (log10 g(ξ) < −6)

Page 7: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Luca Ambrogioni, Eric Maris

Figure 1: Spectral estimation of a synthetic signal. A)Ground truth signal (dashed black line), sample points(blue dots) and the expected value of the GP regression(green line). B) (Log10) Power spectrum of the groundtruth signal (dashed line) and spectral estimates ob-tained from the samples using BGF transform (greenline), DTF (blue line) and DTF with Hann taper (redline).

Figure 2: Quantitative comparison (noise-free case).Ranked absolute deviations in the A) passband range(log10 g(ξ) > −6) and B) stopband range (log10 g(ξ) <−6)

segments of the spectrum. This division is importantsince methods that are effective at estimating the mainlobes are often poor at suppressing the side lobes andvice versa. Since the actual deviations are not infor-mative, we only report the histogram of the rankedperformances. Fig. 2 shows the results. The BGFtransform outperforms both DTF and tapered DTFboth in the passband and in the stopband case. Asexpected, the DTF approach outperforms the taperedDFT approach on the passband range while the oppo-site is true in the stopband range.

5.2 Analysis of noise-corrupted signals

We evaluated the robustness of the method to noisein the time series. As ground truth signal, we usedthe deterministic signal given in the previous subsec-tion corrupted by Gaussian white noise (sd = 0.1). Wecompared the performance of the BGF transform withthe performance of a popular multitaper estimator in-

Figure 3: Spectral estimation of a synthetic noisy sig-nal. A) Ground truth signal (dashed black line), noise-corrupted sample points (blue dots) and expectedvalue of the GP regression (green line). B) (Log10)Power spectrum of the ground truth signal (dashedline) and spectral estimates obtained using BGF trans-form (green line), DTF with square taper (blue line)and DTF with two (blue line), three (red line) or four(yellow line) DPSS tapers.

volving discrete prolate spheroidal sequences (DPSS)[Percival and Walden, 1993]. In this method, severalspectral estimates are obtained by taking the DFTof the signal multiplied with several taper functions.The DPSS taper functions are obtained by optimiz-ing the spectral concentration [Percival and Walden,1993]. The different spectral estimates are then av-eraged in order to obtain the multi taper estimate.Fig. 3A shows that the GP expected value acts as adenoiser and remains able to extrapolate the signalbeyond the data points. As the noisy data requiremore regularization, the amplitude of the oscillationis reduced. Fig. 3B shows the estimated spectrum.The recovery of the main lobes remains very accurate,apart from a small downward shift due to the am-plitude loss. Furthermore, the flat background noisespectrum is more suppressed as compared to the mul-titaper estimates.

Again, we quantitatively evaluated these observationsin a simulation study. The study design was identicalto the noise-free case. We corrupted the observationwith Gaussian white noise (sd = 0.1). Fig. 4 showsthat the BGF transform outperforms all the DFT-DPSS estimates in almost all simulated trials.

5.3 Fourier analysis of Mauna Loa CO2 level

As first example using real data, we analyzed the spec-trum of the Mauna Loa monthly CO2 concentration.We considered a time period of 15 years. The timeseries was de-trended using a second order polynomialregression in order to remove the non-stationary com-ponent. We compared the BGF estimate with three

Page 8: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Integral Transforms from Finite Data

Figure 4: Quantitative comparison (noise-corruptedcase). Ranked absolute deviations in the A) pass-band range (log10 g(ξ) > −6) and B) stopband range(log10 g(ξ) < −6)

Figure 5: Analysis of CO2 level. 1) (Log) Spectral es-timate obtained using BGF transform. 2) (Log) Spec-tral estimate obtained using DTS with square taper(blue), Hann taper (red) and two DPSS tapers (yel-low).

DFT estimates well-suited for this noise range: 1)square taper, 2) Hann taper and 3) DPSS tapers (twoand three). The results are shown in Fig. 5. As wecan see, the BGF spectral estimate captures 1) the lowfrequency broadband component, 2) four sharp peakscorresponding to the one year cycle and 3) the spectralfloor. Note that, compared with the DFT-based esti-mates, the two main peeks are sharper. Furthermore,the peaks corresponding to the 3/year and 4/year fre-quencies are clearly visible in the BGF estimate butbarely discernible in the DFT-based estimates. Ttheenergy of the spectral floor is greatly suppressed inthe BGF estimate. This effect is probably due to theexplicit incorporation of the noise model in the GPanalysis. Altogether, the analysis confirms all the fea-tures of the BGF transform that we have already es-tablished on synthetic signals, namely sharper spectralpeeks and higher noise suppression.

5.4 Fourier analysis of neural oscillations

In our final experiment we use the BGF transform torecover the spectrum of neural oscillations. We col-lected resting state MEG brain activity from an ex-perimental participant that was instructed to fixateon a cross at the center of a black screen. Since we

Figure 6: Analysis of human MEG signal. 1) (Log)Spectral estimate obtained using BGF transform. 2)(Log) Spectral estimate obtained using DPSS DTSwith three tapers.

are not interested in the spatial aspects of the signalwe restricted our attention to the analysis of the MEGsensor with the greatest alpha (10 Hz) power. We ana-lyzed the time series using the BGF transform. In thisanalysis the covariance function of the GP was esti-mated jointly from all trials by summing the trial spe-cific likelihoods. We compared the resulting spectralestimates with those obtained using DPSS multitaperDFT (with three tapers). Fig. 5 shows the average andstandard deviation of the log-power estimates. Themain features of the MEG spectrum are (1) the 1/fcomponent, a well-known feature of many biologicaland physical systems [Szendro et al., 2001], 2) alphaneural oscillations, as visible from the peak at 11Hzand its second harmonic at 22Hz [Ward, 2003], and 3)power line noise, sharply peaked at 50 Hz. From thefigure we can see that, compared to the DPSS mul-titaper estimate (panel B), the spectral peaks of theBGF estimate (panel A) are sharper and more clearlyvisible against the 1/f background.

6 Conclusions

In this paper we introduced a new nonparametricBayesian method for estimating integral transforms ofdiscretely sampled analog signals. While the methodcan be applied to any linear transform, we focusedour exposition on the Fourier transform. We showedthat our approach is a probabilistic generalizationof the conventional approach based on the famousNyquist–Shannon sampling theorem. We introduceda new hierarchical Bayesian model which we used inorder to estimate the covariance function from the datausing a MAP approach. In a series of experiments onsimulated and real-world signals, we showed that theresulting BGF transform outperforms the DFT basedmethods both in terms of mainlobe sharpness and side-lobe suppression.

Page 9: Integral Transforms from Finite Data: An Application of ...proceedings.mlr.press/v84/ambrogioni18a/ambrogioni18a.pdf · The Fourier transform is perhaps the most impor-tant mathematical

Luca Ambrogioni, Eric Maris

References

L. R. Rabiner and B. Gold. Theory and Applicationof Digital Signal Processing. Prentice Hall, 1975.

J. Idier. Bayesian Approach to Inverse Problems. JohnWiley & Sons, 2013.

C. E. Rasmussen. Gaussian Processes for MachineLearning. The MIT Press, 2006.

P. C. Gregory and A. Mohammad-Djafari. A Bayesianrevolution in spectral analysis. AIP Conference Pro-ceedings, 568(1):557–568, 2001.

C. K. Carter and R. Kohn. Semiparametric Bayesianinference for time series with mixed spectra. Journalof the Royal Statistical Society: Series B (StatisticalMethodology), 59(1):255–268, 1997.

A. K. Gangopadhyay, B. K. Mallick, and D. G. T.Denison. Estimation of spectral density of a sta-tionary time series via an asymptotic representationof the periodogram. Journal of Statistical Planningand Inference, 75(2):281–290, 1999.

B. Liseo, D. Marinucci, and L. Petrella. Bayesian semi-parametric inference on long–range dependence.Biometrika, 88(4):1089–1104, 2001.

P. Whittle. Curve and periodogram smoothing. Jour-nal of the Royal Statistical Society: Series B (Sta-tistical Methodology), pages 38–63, 1957.

A. G. Wilson and R. P. Adams. Gaussian process ker-nels for pattern discovery and extrapolation. 3rd In-ternational Conference on Machine Learning, pages1067–1075, 2013.

A. O’Hagan. Bayes–Hermite quadrature. Journal ofStatistical Planning and Inference, 29(3):245–260,1991.

M. Schechter. Principles of Functional Analysis, vol-ume 2. Academic Press New York, 1971.

R. Boloix-Tortosa, F. J. Payan-Somet, and J. J.Murillo-Fuentes. Gaussian processes regressors forcomplex proper signals in digital communications.Sensor Array and Multichannel Signal ProcessingWorkshop, pages 137–140, 2014.

R. Boloix-Tortosa, E. Arias-de Reyna, F. J. Payan-Somet, and J. J. Murillo-Fuentes. Complex–valuedGaussian processes for regression: A widely non–linear approach. arXiv preprint arXiv:1511.05710,2015.

L. Ambrogioni and E. Maris. Complex–valuedGaussian process regression for timeseries analysis.arXiv:1611.10073, 2016.

D. B. Percival and A. T. Walden. Spectral Analysis for-Physical Applications. Cambridge University Press,1993.

P. Szendro, G. Vincze, and A. Szasz. Pink–noise be-haviour of biosystems. European Biophysics Jour-nal, 30(3):227–231, 2001.

L. M. Ward. Synchronous neural oscillations and cog-nitive processes. Trends in Cognitive Sciences, 7(12):553–559, 2003.