new bayesian joint detection and estimation in functional mri · 1 new bayesian joint detection and...

1

New Bayesian Joint Detection and Estimation inFunctional MRI

David Afonso, João Sanches, Member, IEEE, and Martin H. Lauterbach (MD)

Abstract—The functional MRI (Magnetic ResonanceImaging), fMRI, is a new imaging tool to study and evaluatethe brain from a functional point of view. The blood-oxygenation-level-dependent (BOLD) signal is currentlyused to detect the activation of brain regions with a stim-ulus application, e.g., visual or auditive. In a block designapproach the stimuli (called paradigm in the fMRI scope)are designed to detect activated and non activated brainregions with maximized certainty. However, corruptingnoise in MRI volumes acquisition, patient motion and thenormal brain activity interference makes this detection adifficult task. The most used activation detection fMRIalgorithm, here called SPM-GLM [1] uses a conventionalstatistical inference methodology based on the t-statistics,where it assumes a rather rigid shape on the BOLDHemodynamic Response Function (HRF), constant for thewhole region of interest (ROI).

A different perspective is presented in this paper; anew Bayesian method, here called SPM-MAP, for the jointdetection of brain activated regions and estimation ofthe underlying HRF. This approach presents two mainadvantages: 1) the activity detection benefits from themethod’s high flexibility toward the HRF shape; 2) itprovides local HRF estimations.

Monte Carlo tests were performed with synthetic data,first comparing the activity detection section of SPM-MAP with the standard SPM-GLM method and secondanalysing the whole SPM-MAP performance. Finally, de-tection analysis results on real fMRI block-designed dataare presented.

Index Terms—functional Magnetic Resonance, Nervoussystem, MAP estimation, Biomedical signal detection,Adaptive signal processing.

D. Afonso is with the Biomedical Engineering Consortium, IST-FML, Portugal.

D. Afonso and J. Sanches are with the Institute for Systemsand Robotics, Instituto Superior Técnico, Lisbon, Portugal (seehttp://welcome.isr.ist.utl.pt/home/).

M. Lauterbach is with the Faculty of Medicine, University ofLisbon, Portugal.

This work was supported by Fundação para a Ciência e a Tec-nologia (ISR/IST plurianual funding) through the POS ConhecimentoProgram which includes FEDER funds. It was done in partialcollaboration with Hospital da Cruz Vermelha, Lisbon.

The authors would like to thank Henrik Halvorsen and CátiaFonseca for English corrections.

I. INTRODUCTION

Functional Magnetic Resonance Imaging (fMRI) is anew and exciting method that, among other purposes,allows the determination of which parts of the brain areinvolved in a particular task. Offering considerably highresolution (e.g. comparing with electroencephalography(EEG)) and non-invasiveness, this recent and growingmodality has already been able to established itself asthe most prominent method used for functional brainimaging, and will certainly have a large impact inNeurology’s future.

This modality relies on changes in blood oxygena-tion and volume, consequence of the hemodynamicresponse events related to local neural activity. This sig-nal, called blood-oxygenation-level-dependent (BOLD),results from the endogenous magnetic contrast betweenoxyhaemoglobin (diamagnetic) and deoxyhaemoglobin(paramagnetic). Hence, increased blood volume reducesthe local concentration of deoxygenated hemoglobincausing an increase in the magnetic resonance (MR)signal on a T2 or T2*-weighted image [2].

The computational analysis of the data provided bythis method is not trivial and is a subject open tomuch development. Usually, it aims at obtaining a vi-sual statistical map representation of which areas wereactivated by the stimulation paradigm applied during thescan, and where in the brain these areas are located. Toobtain the desirable results several processing steps areusually involved, e.g. image preprocessing (motion andnoise correction), spacial normalization transformation,statistical tests and the final inferences procedures [3].Considering the last two steps, which are the focus ofthis paper, most of the methods described in the literatureare variants of the general linear model (GLM). Althoughsome exploratory methods like clustering, PCA [4] andICA [5], have been used, the general approach is toexpress the observed response variable in terms of alinear combination of explanatory variables (EVs) [1],and make use of classical statistics (T or F tests) toinfer activity, using e.g. a p-value threshold.

The main EVs used in the GLM allocate temporalvariance in the (preprocessed) data with strong cor-

2

relation with the paradigm stimulus. To achieve this,the stimulus paradigm timing is convolved with oneor more fixed shape hemodynamic impulse responsefunction (HRF).

In the literature there are two different approachesfor the HRF modeling. The most common approachis purely heuristic, using known functions (e.g. gammafunctions [6,7], or Gaussian functions [8]) to fit theexperimental data. Some variability of the HRF has beenaccounted for with the use of functional basis [9], butthe constrain imposed on the HRF shape is hard. Thesecond approach is physiological, modeling the underly-ing physiological process involved in the BOLD signalgeneration, e.g. the Balloon Model [10] which is oftenused and augmented [11]. But, unfortunately, due tomuch higher conceptual and computational complexityand higher number of state variables and parameters tobe estimated, these models have been remitted to studiesin which the knowledge of the physiological events areimportant or essential.

In this paper we make use of the linear, infinite im-pulse response (IIR) physiologically based hemodynamic(PBH) model [12], which combines the best of bothtypical approaches presented. i) Simplicity and lightnessof the heuristic approach; not compromising its useon simple, straight-ahead detection studies; ii) Physi-ologically based; possibly providing some informationon significant underlying physiological events of thevascular and neural tissues.

The present paper focuses on a new analysis perspec-tive that combines the activity estimation problem (whichleads to a functional brain map) and the local HRFestimation problem, in order to minimize the detectionerror probability. The combination of these problemsinto one, provides an activity detection method that ismore sensible to underlying HRF shapes that deviatefrom the "standard shape" generally assumed, eventuallyreducing its error probability. Furthermore, this methodis presented parameter-free which could be valuablefor comparison against results from pramater-dependentmethods such as the SPM-GLM. In addition, the HRFit provides might be useful for studying the neuro-vascular events and physiological events behind localbrain activity.

In order to estimate the binary (activated or not)information on each voxel (volume element), we presenta Bayesian method that forces this binary activationsolution while jointly estimating the HRF. This is doneby first estimating a model-free finite impulse response(FIR) and then projecting it into the PBH model IIRspace. Monte Carlo tests are presented for syntheticdata and the errors probability obtained with the pro-

posed method are compared with the standard classicalprocedures used in several software, e.g. the Statisti-cal Parametric Mapping (SPM) [13], FMRIB SoftwareLibrary [14] and BrainVoyager [15]. Furthermore HRFestimation is graphically presented and compared withthe real HRF behind the synthetic data. Results of theproposed method on real data are also displayed.

The rest of the paper is organized as follows. Sec-tion II starts with the introduction and mathematicalconstruction of the assumed model behind the BOLDdata generation, and its intervening elements such as thestimulus paradigms, the HRF model and the noise modelused. In Section III the Bayesian method is presentedand its computational estimation described. Section IVpresents the obtained results and is divided in threesection: III-A compares the activity detection error prob-ability with SPM-GLM on Monte Carlo testing; III-Bdisplays the SPM-MAP joint estimation results on MonteCarlo testing; III-C displays the behavior of the proposedmethod on real datasets. Finally, section IV concludesthis paper and discusses possible future extensions ofthis work.

II. PROBLEM AND METHOD FORMULATION

Let us consider the voxels (volume elements) dis-played in Fig. 1. Each voxel, after the application of agiven paradigm, may be activated by one or more appliedstimulus (∃k : βk = 1) or may not be activated at all(∀k : βk = 0).

Fig. 1. Example of activated and non activated regions in an fMRIimage, overlaid on a higher resolution structural MRI image.

Fig. 2. BOLD signal generation model.

3

In this paper we consider the BOLD signal associatedto a single voxel at a time - time course - with thefollowing data observation model, displayed in Fig. 2,

y(n) = h(n) ∗N∑

k=1

βkpk(n) + η(n) (1)

where η(n) is modeled as Additive White Gaussian Noise(AWGN), h(n) is the HRF of the brain tissues, pk(n) arethe stimulus signals along time (see Fig. 3) and βk areunknown binary variables to model the activation of thevoxel by the kth stimulus. For instance, Fig. 1 shows theresult of application of a two stimulus paradigm wherethree voxels are referenced: i) a voxels was activated bythe first stimulus, β1 = 1 and β2 = 0, ii) a voxel was notactivated, β1 = 0 and β2 = 0, and iii) a voxel was onlyactivated by the second stimulus, β1 = 0 and β2 = 1.

Fig. 3. Paradigm example with three block-designed stimulus.

In this paper we describe a Bayesian Statistical Para-metric Mapping algorithm (SPM) based on the Max-imum A Posteriori (MAP) [16] criterion called SPM-MAP1. The proposed algorithm jointly estimates thevector b = {β1, β2, ..., βN}T , associated with eachvoxel and the corresponding hemodynamic response,h(n), which can be denoted in vectorial form, h ={h(1), h(2), ..., h(N)}T .

The hemodynamic signal is assumed to be the re-sponse of an IIR linear time invariant system (LTI). Thismodel [12] is graphically displayed in Fig. 4 and has thefollowing third order transfer function

H(z) =b0 + b1z

−1 + b2z−2

1 + a1z−1 + a2z−2 + a3z−3(2)

where the coefficients bk and ak must be estimated.The estimation process is performed by minimizing

an energy function depending on the binary unknownsβk, on the hemodynamic response h(n), and on theobservations y(n) (see Fig. 2). The direct estimation ofthe H(z) coefficients is a difficult task because it is noteasy to define simple priors for these coefficients basedon the desired time response h(n) function. Therefore,to overcome this difficulty, instead of estimating theak and bk IIR coefficients, a FIR is estimated, g =

1Statistical parametric mapping is generally used to identify func-tionally specialized brain responses[1]

Fig. 4. Physiologically Based Hemodynamic model.

{g(1), g(2), ..., g(F )}T , with length F minor or equalto the observations length, F ≤ L. In each iteration thisestimated response is projected into the H(z) space, i.e.,a set of coefficients ak and bk are estimated in order tominimize ‖g(n) − h(n)‖ by estimating the coefficientsak and bk. The PBH ak and bk estimated coefficientsare then used to re-project H(z) into the FIR space bycomputing a new finite response h, which is used toobtain a new estimate of the binary unknowns βk. Thisprocess is schematically represented in Fig. 5, whereeach circle represents the set of admissible responsesfor each one of the IIR and FIR systems.

Fig. 5. FIR → IIR → FIR projections scheme.

Let x = {x(1), x(2), x(3), ..., x(L)}T (see Fig. 2),which may be expressed as follows

x = θb (3)

where

θ =

⎛⎜⎜⎜⎜⎜⎝

p1(1) p2(1) p3(1) ... pN (1)p1(2) p2(2) p3(2) ... pN (2)p1(3) p2(3) p3(3) ... pN (3)

......

... ......

p1(L) p2(L) p3(l) ... pN (L)

⎞⎟⎟⎟⎟⎟⎠ (4)

The output vector of h(n) displayed in Fig. 2,z = {z(1), z(2), z(3), ...z(L)}T , is obtained by z(n) =h(n)∗x(n), where h(n) is a F length FIR and ∗ denotes

4

the convolution operation. The output signal may beexpressed in the two following ways

z = Hx (5)

z = Φh (6)

where H and Φ are the following L × L and L × pToeplitz matrices respectively

H =

⎛⎜⎜⎜⎜⎜⎝

h(1) 0 0 0 0 0h(2) h(1) 0 0 0 0h(3) h(2) h(1) 0 0 0

......

......

......

0 ... h(p) h(p − 1) ... h(1)

⎞⎟⎟⎟⎟⎟⎠ (7)

Φ =

⎛⎜⎜⎜⎝

x(1) 0 0 0 0x(2) x(1) 0 0 0

......

...... 0

x(L) x(L − 1) ... ... x(L − P + 1)

⎞⎟⎟⎟⎠ (8)

The observed BOLD signal y(n), y ={y(1), y(2), ..., y(L)}T , can therefore be obtainedwith the following two ways

y = Ψb + n (9)

y = Φh + n (10)

where Ψ = Hθ and n = {η(1), η(2), ..., η(L)}T is avector of Independent and Identically Distributed (i.i.d)zero mean random variables normally distributed, thatis, p(η(k)) = N(0, σ2

y). This AWGN is usually used tomodel the corruption process in functional MRI [17],although other models may also be used, e.g., Rice[18] and Rayleigh [19]. In this work we suppose thatthe motion correction preprocessing step was efficientenough to remove most of the temporal correlationbetween voxels, and so its influence is included andcorrected along with the corruption noise.

III. ESTIMATION

The Maximum a Posteriori (MAP) estimation is ob-tained by minimizing the following energy function

E(y,x(b),h) = Ey(y,x(b),h) + Eb(b)+

Eh(x(b))(11)

where the data fidelity term is

Ey(y,x(b),h) = − log(p(y|x(b))) (12)

and the prior terms associated to the unknowns to beestimated, b = {β1, ...βN} and h = {h(0), ..., h(p−1)}are

Eb(b) = − log(p(b)) (13)

Eh(x(b)) = − log(p(h)). (14)

These priors incorporate the a priori knowledge aboutthe unknowns to be estimated: i) βk are binary andii)h(n) is smooth.

Fig. 6. Fluxogram of the proposed SPM-MAP algorithm.

The estimation process is performed in the followingthree steps, as shown in Fig. 6,

bt = arg minβ

E(y,x(bt−1),ht−1) (15)

g = arg minh

E(y,x(bt),ht−1) (16)

ht = ProjFIR [ProjIIR(g)] (17)

where ()t means estimation at tth iteration and Projstands for the projection operation by using the minimumsquare error (MSE) criterion. The ProjIIR problemis not trivial [20,21]. In this paper the approximationalgorithm proposed by Shanks [22,23] is used. Otherapproximations proposed by Prony [23,24] and Padé[23] may also be used, but lead to worse results in theMSE criterion.

The independence assumption on the time-courseobservations may not be realistic. However, it is areasonable assumption and a convenient mathematicalsimplification, because it separates the observations de-pendence and the estimated data dependence, simplifying

5

a considerable number of expressions. Furthermore, theinclusion of the observations statistical dependence inthe mathematical formulation may not lead to relevantimprovement on the final solution, as noted in [25].

The observations independence means thatp(y|x(b),h) =

∏Li=1 p(y(i)|(x ∗ h)(i)). The

adoption of the AWGN model leads to,

p(y(i)|(x ∗ h)(i)) = 1√2πσ2

y

e− (y(i)−(x∗h)(i))2

2σ2y where

σ2y is the observations noise variance. The parameters

βk to be estimated are also assumed independent, thatis,

p(b) =N∏

i=1

p(βk) (18)

where p(βk) is a bi-modal distribution defined as a sumof two Gaussian distributions centered at zero and one,with σ2

β variance,

p(βk) =12[N(0, σ2

β) + N(1, σ2β)]

(19)

because βk are binary variables, βk ∈ {0, 1}. In order tobetter approximate the binary answer, the σβ parametershould be as small as possible, but numerical stabilityreasons prevent the adoption of too small values. Theprior term Eb(b) may therefore be written as

Eb(b) =N∑

k=1

[2β2

k − 2βk + 14σ2

β

− log

(cosh

[2βk − 1

4σ2β

])].

To impose smoothness [26] on the estimated hemody-namic response, h(n) is assumed to be a Markov RandomField (MRF), which means, by the Hammersley-Cliffordtheorem [27], that p(h) is a Gibbs distribution:

p(h) =1Zh

e−α∑N

n=2 (h(n)−h(n−1))2 (20)

which leads to

Eh(x(b)) = − log(p(h)) = α(∆h)T (∆h) + C (21)

where α is a parameter that tunes the smoothing degreefor h(n), C is a constant, Zh is a partition function and∆ is the following difference operator

∆ =

⎛⎜⎜⎜⎜⎜⎝

1 0 0 ... 0 −1−1 1 0 ... 0 00 −1 1 ... 0 0...

...... ... 0 0

0 0 0 ... −1 1

⎞⎟⎟⎟⎟⎟⎠ (22)

The energy function (11) to be minimized,E(y,x(b),h), has the following formats in step one

(see section III-A) and step two (see section III-B)respectively

E1(y,x(b),h) =1

2σ2y

(Ψb − y)T (Ψb− y)+

Eb(b) + C1

(23)

E2(y,x(b),h) =1

2σ2y

(Φh − y)T (Φh− y)+

αhT (∆T ∆)h + C2.

(24)

The MAP estimate is obtained by finding theE(y,x(b),h) stationary point,

∇E(y,x(b),h) = 0. (25)

where (∇) is the gradient operator.

A. Step One: b estimation

In the first step the following equation must be solved

∇bE1 = ΨT (Ψb− y) +σ2

y

σ2β

[b− 1

2R(b)

]= 0 (26)

where ∇b is the gradient operator with respect to b andR(b) is a column vector with N elements rk,

rk = 1 + tanh

[2βk − 1

4σ2β

]. (27)

The solution of (26) may be obtained by using thefixed point method which leads to the following recur-sion

bt = (ΨT Ψ + λI)−1(ΨT y +λ

2R(bt−1)) (28)

where λ = σ2y/2σ

2β is a parameter, I is a N dimensional

identity matrix and bt is the b estimate at tth iteration.

B. Step Two: h estimation

In the second step, where a new estimate of h(n) isobtained, the following equation is solved

∇hE2 = ΦT (Φh− y) + 2λσ2yLh = 0 (29)

where ∇h is the gradient operator with respect to h andL = ∆T ∆ (see (22)).

The solution of (29) is

g =[ΦTΦ + 2λσ2

yL]−1

ΦTy (30)

where Φ(x(b)) is computed (8) with the b estimate, bt,obtained in step one (28).

6

C. Step Three: ProjFIR [ProjIIR(g)]

A HRF smoothness constrain is much more easy todefine on a FIR hemodynamic response, than on the IIRspace. However, this FIR response does not necessarilybelong to the responses space of the adopted IIR PBHmodel, presented in section II, which imposes harderrestrictions on h(n). To cope with this, restriction toits shape is adopted, in each iteration, by projectingthe finite length hemodynamic response, g, estimated instep two (30) into the IIR space of H(z) and then re-projecting it into the FIR space to obtain a new estimateof h, ht, as schematized in Fig. 5.

For the g → IIR projection, we use the Shanks’smethod [22,23] which provides the least, although farfrom ideal, MSE when compared with Padé’s [23] andProny’s [23,24] method.

The IIR → FIR projection is achieved by simplysampling the estimated continuous IIR function into adiscrete signal of length F , which is the best possibleestimate in a MSE sense.

These three steps (III-A, III-B andIII-C) are repeateduntil convergence, when very low variability of b, fromthe previous iterations to the current one, is achieved(stop criteria refered in Fig. 6). Hence, the estimatedelements of b, βk, are not binary but, close to 0 or 1,real numbers. To accomplish the desired binary natureof b the following threshold is applied to βk

bk =

{0 βk < 0.51 otherwise.

(31)

and this is the final activation estimation that providesinformation on whether the brain area represented inthe corresponding voxel was activated by each of theparadigm stimulus or not. If the answer is positive, thenthe estimated HRF provides a possibly valuable insightinto the BOLD local dynamics.

IV. EXPERIMENTAL RESULTS

In the pursue of validating the proposed algorithm,several tests where performed on synthetic and realdata and when possible compared against the standarddetection technique SPM-GLM. First the SPM-MAP iscompared to SPM-GLM in terms of their activity de-tection error probability. Secondly, the joint activitydetection and HRF estimation of SPM-MAP is presentedand analyzed.

A. Detection comparison to SPM-GLM on SyntheticData

The proposed SPM-MAP method formalizes the neuralactivation detection problem in step one (section III-A),and the HRF estimation problem in the following steps(section III-B, III-C). The results of each step has astrong influence on the others. So, in order to compareour method to the standard SPM-GLM [1,3], only theactivation detection procedure is considered. Two maindifferences must be stressed between both methods:

1) In the SPM-GLM method the whole N periodsignal is sometimes broken into N pieces corre-sponding to each paradigm period and the result-ing observation pieces are averaged to reduce thenoise corrupting the observations Y . The matrixθ, defined in (4), is built by using only a singleparadigm period. In the proposed method, insteadof braking the signal, it is dealt with as a wholesignal. And the same goes for the paradigm signal.The noise reduction is performed in a Bayesianframework where a realistic observation model isused to cope with it. In the case of AWGN bothmethods are very similar, but if other noise models(e.g. multiplicative) are used this would no longerbe true. This is because the averaging procedure isonly adequate for certain noise models.

2) In the SPM-GLM method the estimation of eachβk is based on the well known classical t-test[28] applied to the estimated coefficients βk . Thisstatistical inference technique is based on the nullhypothesis test, Ho, where the activation probabil-ity of a given voxel is computed with a certainconfidence degree. This test is performed over theestimated coefficients, βk, obtained with the GLM.These coefficients used to linearly combine theEVs (usually a convolution between the paradigmstimulus and the HRF model (s)) are estimated byusing the MSE criterion. These real coefficientsreflect the estimated "presence" amplitude of eachEV in the observed data. In the SPM-MAP methodthe coefficients set are assumed to be binary andare estimated in a Bayesian framework where aprior distribution forces its values to be close of{0, 1}. Once again, our concern is to follow arealistic model where it is assumed that a givenvoxel was activated or not by a given stimulus.Partial voxel activation is not acceptable in thisscope: it is totally activated or it is not activatedat all, by a given stimulus, pk(n).

To better understand the difference between bothmethods, a short description of the SPM-GLM method is

7

presented where the t-test is used to determine if a givenvoxel is or is not activated by a single EV , p(n)∗h(n),which means that b is a scalar, b = [β].

The observed BOLD signal is assumed to be obtainedfrom the following model

y = β (θ ∗ h(n))︸︷︷︸EV

+e (32)

where θ is defined in (4) but with only one stimulus, θ ={p(1), p(2), ..., p(L)}T , h(n) is the canonical gammafunction [1,3,29] and e is the residual error vector notexplained by the model. The β estimation given by theGLM method, in this very simple case, is computed asfollows

β = χ+Y (33)

where χ = θ ∗ h(n) is the EV and χ+ = (χT χ)−1χT isthe so called pseudoinverse of χ [26].

The SPM-GLM method core, tags each voxel as acti-vated, b = 1, or as inactivated, b = 0, by computing theprobability of a random generation of the analysed y(n)time-course with a confidence level α, that is,

b =

{1 (Active) ifp < α; (reject H0)0 (No Active) Otherwise; (acept H0)

(34)

where H0 is the null hypothesis which assumes noactivation with a confidence level α.

The p-value is obtained as follows

p = P (t ≥ T ) = 1 − I L

L+T2(L/2, 0.5) (35)

where Ix(a, b) is the incomplete Beta-function [28] de-fined as

Ix(a, b) =Γ(a + b)x

Γ(a)Γ(b)

∫ x

0τa−1(a − τ)b−1dτ (36)

and

T = β/σβ (37)

is the T estimator associated t-statistics, where β is theestimation value of β, and σβ the standard deviation. tis large if the estimated value is much larger than theestimator variance and t is small if the estimated valueis comparable with the corresponding estimator variance.

The estimator variance, σ2β , may be numerically esti-

mated using the following expression

σ2β = σ2

y

L∑n=1

p2(n) (38)

where p(n) is the nth θ element and σ2y is the estimated

noise energy

σ2y =

1L − 1

L∑n=1

[y(n) − β.p(n)

]2. (39)

To access the effectiveness of the proposed SPM-MAPmethod against the presented SPM-GLM, synthetic 1D-block-designed single stimulus data, with several AWGNnoise levels (σy) and several stimulus epochs (periodsin a block design paradigm approach), N , are used inMonte Carlo simulations. In these, the error probability(Pe), was obtained for each method as follows

Pe(σ,N) =1R

R∑i=1

|bi − bi| (40)

where R = 250 is the number of data repetitions usedin the Monte Carlo tests. The HRF function is assumedto be known and was selected from the PBH modelestimation on real data [12].

The resulting Pe differences between SPM-MAPand SPM-GLM, i.e.: ∆Pe = Pe(SPM−MAP ) −Pe(SPM−GLM), was computed for each experiment, andthe average results for the different noise levels andepochs are displayed in Fig.7 and in table I. Notice thatin table I, σy values that resulted in all null Pe are notdisplayed.

Although the performance of both algorithms de-creases, as expected, with the amount of noise and withthe decrease in epochs number, the ∆Pe obtained isnegative for most of the (N,σ) data pairs tested, whichmeans that the SPM-MAP outperforms the SPM-GLMmethod for almost every configuration tested. This isconfirmed by Table I, where the summation of ∆Pe,∑

i,j ∆Pe(Ni, σj) = −0.46, is negative. The number ofnegative values of ∆Pe, #(∆Pe < 0) = 23 and thenumber of positive values of ∆Pe, #(∆Pe > 0) = 6,which confirms that the SPM-MAP surpasses the tradi-tional SPM-GLM method. Still it is obvious that bothmethods present high accuracy in these tests due to theexact knowledge of both the HRF and noise distribution.

It is important to notice that in the SPM-GLM ap-proach, the HRF is indirectly estimated and used forevery time-course activation analysis. On the contrary,in the SPM-MAP method, this HRF is jointly estimatedwith the activation variables and is space variant. TheSPM-MAP is therefore adaptative, which allows theproposition that in real conditions the performance gapshould increase.

B. Joint Activity Detection and HRF estimation ResultsOn Synthetic Data

In this section Monte Carlo tests of the completeproposed method are presented in order to evaluate theperformance of the algorithm. Two synthetic binary im-ages of 128x128 pixels where generated, which representa single BOLD slice signal, as can be seen overlapped

8

Fig. 7. Error probability differences, ∆Pe, between the SPM-MAPand the SPM-GLM methods for two different graphic views. N isthe number of paradigm epochs and σ is AWGN standard deviationlevel.

N\σy 1 2 4

2 -0.012 0 -0.1683 -0.012 0.004 -0.1084 -0.008 -0.012 -0.0445 -0.004 -0.012 -0.0326 0 -0.008 -0.0167 0 -0.008 -0.0088 0 -0.004 09 0 0 -0.004

10 0 0 0.00411 0 -0.004 0.01212 0 0 0.00413 0 -0.004 014 0 -0.004 015 0 -0.004 0.00416 0 0 -0.00417 0 0 018 0 0 0.00419 0 0 -0.00420 0 0 -0.008

TABLE I∆Pe = Pe(SPM−MAP ) − Pe(SPM−GLM) FOR 2 ≤ N ≤ 20 AND

σy = {1, 2, 4}. FOR ALL OTHER TESTED VALUES OF

σy = {0.01, 0.02, 0.05, 0.1, 0.2, 0.5}, Pe = 0

in Fig. 8. In it, colored voxels (red, yellow and white)where activated by, at least, a stimulus paradigm andthe black pixels where not activated at all. So accord-ing to the mathematical notation presented above, red:b = {1, 0}T ; yellow: b = {0, 1}T ; white: b = {1, 1}T

and black: b = {0, 0}T , which is the activation groundtruth to be estimated for each voxel.

Fig. 8. Synthetic image representing a single BOLD slice with brainareas activated by two paradigms (red and yellow), with a functionaloverlapping region (white) and non activated areas (black).

The BOLD signal, y(n), is generated by using themodel presented in Fig. 2. A reasonable two stimuliblock-design paradigm, p1(n) and p2(n), of 10 secondstask duration followed by a 30 second rest period eachin 5 epochs, were used in order to obtain a non su-perposition of p1(n) and p2(n) while allowing for theBOLD signal to decay to rest. The true impulse HRFsignal, h(n), was generated from a representative IIR,selected from the PBH estimation on real single-eventdata [12], and the following noise energies were used:σy = {0.2, 0.5, 0.7, 0.8, 1}. These noise energies are bet-ter evaluated when compared against the BOLD signalenergy level, which is done with the signal-to-noise ratio(SNR = 10 log

∑N1

[(β1×p1(n)+β2×p2(n))∗h(n)]2

σ2y

) for thetwo data cases in which the BOLD signal is present:∑2

k=1 b(k) ≥ 1 (see Table II ).This generated synthetic data is equivalent to 2×128×

128 = 32768 independent y(n) time-courses, containingall possible combinations for the b vector. These areused on Monte Carlo tests to compute the Pe (see eq.(40)). The results obtained are graphically presented inFig. 9 and in Table II). These values were computed asthe ratio of the total number of wrong estimations overthe total number of Monte Carlo tests (32768).

Table II shows that even for the high noise levels ofσy = 0.8 the proposed SMP-MAP method presents aPe < 0.3%. It is important to point-out that although theSNR in MRI depends on a large number of variables, itis usually more than 1dB [3,30]. So the most realisticσy values would be situated between 0.2 and 0.5, for thedata used. In this range the method achieves values ofPe < 0.1%. Furthermore, for the very high noise amountof σy = 1 (SNR = [−6.7;−3.5]) the Pe stays below5%, resulting in the bottom image in Fig. 9. Notice that

9

σy 0.2 0.5 0.7 0.8 1SNR(dB) [7.3,11] [-0.63,2.6] [-3.6,-0.37] [-4.7,-1.5] [-6.7,-3.5]

Pe(%) 0.0427 0.0916 0.168 0.260 4.27

Pe(%) 0 0 0.0244 0.0245 1.51Pe(%) 0 0 0.0073 0.0061 0.513

TABLE IIMONTE CARLO Pe (40) OF SPM-MAP FOR SEVERAL VALUES OF

AWGN σy AND CORRESPONDENT SNR VALUES FOR THE∑2k=1 b(k) = 1 AND

∑2k=1 b(k) = 2 TWO DIFFERENT SIGNAL

ENERGY SITUATIONS. SPATIAL CORRELATION CORRECTION IS

EXEMPLIFIED IN Pe AND Pe WHERE ONE AND TWO ISOLATED

PIXELS WERE DISMISSED, RESPECTIVELY.

Fig. 9. SPM-MAP activation detection results for σy = 0.5 (up)and σy = 1 (down). Same color code as in Fig. 8

when looking at Fig. 9, the intuitive notion on the errorprobability might seem higher than the refered 0.1% and0.5% values, due to the fact that the images are actualyan overlap of two images.

It is intuitive when looking at the results in figure9, that the accuracy of the method can be improvedif spacial correlation information is included, removingseveral of those isolated, spatially uncorrelated, voxels.For illustration purposes, the Pe is recalculated afterremoving areas of one (Pe) and two (Pe) isolated voxelsin an 8 voxels neighborhood. The resultant error proba-bilities (see Table II) decreases for all the noise amounts,yielding null for the 0.2 and 0.5 σy values. The gain inperformance with this simple morphological correction

applied, is unquestionable in this example because thereare no small regions. If real regions of 1 or 2 voxelsexisted in the data, then their elimination would providefalse-negative errors. Therefore, the inclusion of spatialcorrelation between voxels should follow a more robustmethodology.

The Pe results cannot be directly compared with theresults in section IV-A, because there, the HRF is exactlyknown and uses only one stimulus in the paradigm. Here,the HRF is jointly estimated with the brain activationindicators, βk, and the paradigm is comprised of twostimulus. These aspects greatly influence the Pe dueto an explosion in the possible number of solutions,particularly if we consider the great degree of freedomin the HRF estimation (as mentioned in section III-C). Inboth tests, the exact noise distribution (AWGN) knowl-edge is an advantage that does not exist in real data.Adding to this, the existence of other factors like fieldinhomogeneity and motion errors, the Pe on real data (asshown on section IV-C)should be higher. Unfortunately,ground truth in real fMRI is difficult to obtain. Onlyrough expectations based on the prior knowledge ofwhich brain areas should be activated by the appliedstimulus, may be used to validate the results.

Fig. 10. Mean HRF estimation results considering all βk �= 0estimations (left) and for only the bk �= 0 correct ones (right), forσy = {0.05, 0.5, 1} from top to down. The Real HRF used for datageneration is in green, the estimated FIR average in red, and theestimated IIR average in blue.

The HRF estimation results are harder to analyse. Foreach one of the 32768 voxels a h(n) HRF is estimated,but only the ones corresponding to activated brain areasβk = 1 are relevant. So, in the false-negative case, thatinformation is discarded. On the other hand, the false-positive case is not discarded and the estimated h(n)tends to follow the random AWGN form, around zero.These reduce the amplitude of the HRF estimated mean,

10

computed over all positive estimated voxels βk, includingfalse-positives. This effect can be seen in Fig. 10, wherethe false-positives effect is removed. Considering that inreal data we do not have information on false-positives,a correction could be done dismissing estimated h(n)functions with AWGN distribution, but since the primarygoal of this work is the activation detection, this is left forfurther works. Furthermore there is a global decrease inamplitude of every HRF estimation in the FIR → IIR(section III-C) projection operation, which can be seenon figure 10 right column. In fact, when there is notan IIR filter that perfectly describes the estimated g(n)FIR, the IIR computed by the Shanks [22,23] algorithmis always of lower amplitude. In spite of all this, theHRF estimation provided reasonable results (Fig. 10) andproved robust even in high noise levels.

C. Joint Activity Detection Results On Real Data

Subjects and Data CollectionThree volunteers with no history of neurological or

psychiatric diseases participated on stimulated verbal,motor and trajectory activity during fMRI data acquisi-tion on a Philips Intera Achieva Quasar Dual 3T whole-body system with a 8 channel head-coil. T2*-weightedecho-planar images (EPI) 23cm square field of view witha 128×128 matrix size resulting in an in-plane resolutionof 1.8×1.8mm for each 4mm slice, echo time = 33ms,flip angle = 20o were acquired with a TR = 3000ms.

Three motor paradigms, one verb generation paradigmand one trajectory generation paradigm, summing up tothe total of five paradigms, described in the followingtable, were applied to the subjects.

(a) Verbs Stimulus: Seeing nouns and thinkingof related verbs.

Baseline: Seeing the "#####" string.

(b) Trajectories Stimulus: Reminding routes throughfamiliar places.

Baseline: Silently counting numbers.

(c) Right Foot Stimulus: Moving right foot fingers.Baseline: Complete rest with closed eyes.

(d) Right Hand Stimulus: Closing/opening right hand.Baseline: Complete rest with closed eyes.

(e) Hands Stimulus: Closing/opening both hands.Baseline: Complete rest with closed eyes.

TABLE IIIDESCRIPTION OF THE PARADIGMS CONDITIONS APPLIED TO THE

PARTICIPANT SUBJECTS.

These paradigms were all structured on the sameblock-design, with 20 samples per epoch (meaning 10

samples of stimulus following 10 samples of baseline,summing up to 60s time per epoch) and 4 total epochs.

Subjects and Data CollectionThe fMRI data was preprocessed with the standard

procedures implemented in the BrainVoyager software[15], namely decrease of data distortions due to mo-tion or other phase changes over time (registration)and spacial smoothing. This data was then statisticallyprocessed by the BrainVoyager SPM-GLM and SPM-MAP algorithms, and the resolving results are plottedin Fig. 11 for a selection of slices that have considerableactivity areas in them. Since the obtainable brain mapsby SPM-GLM highly depends on the selected p-values(see section IV-A), a neurologist provided the results,for each data set, which he considered more correct(reference result), based on its experience. Since thisresult is subjective, he also provided two other resultswhich we considered loose and restricted.

Visual inspection of the results in Fig. 11 show someexpected resemblance between the reasonable SPM-GLM brain maps, selected by the neurologist, and thebrain maps obtained by the proposed SPM-MAP algo-rithm. Still the compared algorithms are methodologi-cally different. First, SPM-MAP does not rely on the nullhypothesis for activity detection as SPM-GLM, but ratherconsiders two hypothesis (the null and the alternative)and so is able to answer different classes of questions nothandled by the standard fMRI analysis. Second, SPM-MAP jointly estimates the HRF shape and the associatedactivity brain map, and therefore, should theoreticallybe more precise in its activity results. These maindifferences should account for most of the dissimilaritiesobserved in each image set (identified as a, b, c, d ande on Fig. 11) between the algorithms results.

In several of the brain map image sets (and in allof the selected image sets in Fig. 11), there are brainregions detected as activated by SPM-MAP that werenot detected by SPM-GLM. These are unlikely falsepositives. Considering that the error probability has beenshown considerably low (see section IV-B), the probabil-ity of several false positives occurring grouped in a smallimage area, instead of randomly dispersed in the im-age, is infinitesimally small. Therefore, new brain areasdetected by the proposed method should be consideredreliable in most cases, depending on its size (numberof voxel in that group). The detection of these "new"areas happens when a voxel time-course does fluctuatesin correlation with the stimulus paradigm, but with anHRF of deviant shape from the EVs considered in theGLM (see examples in Fig. 12). Still, there are possiblysome false positive regions (e.g. activated groups of only

11

one or two voxels) in the SPM-MAP results that havebeen taken care of by means of clustering in the SPM-GLM results by the BrainVoyager software. Further de-velopments of SPM-MAP are planned to include spatialcorrelation priors to correct this handicap and improveresults.

Fig. 12. Example of two voxel time-courses, from two differentbrain areas, detected as activated by the SPM-MAP that were notdetected by the SPM-GLM. Binary paradigm is in red and the voxeltime-course in blue.

To enhance confidence in the the "new" activated brainregions detected by the proposed algorithm, a closer neu-rological functional analysis should be done. This wouldinvolve a neurologist careful evaluation, on each case,following a spatial transformation of the SPM-MAP brainmaps into a standard mapped space like the Talairachstereotaxic coordinates [3]. Although this would involveanother study, it is noticeable that many of these "new"areas are located in the frontal cortex, which is the focusof the conscious activity. We suspect that these may wellbe involved in performing the respective paradigm tasks,since all of the paradigm stimulus applied to the subjectshave a strong conscious influence (see Tab. III).

V. CONCLUSIONS

The characterization of brain regions from a functionalpoint of view can be performed by using BOLD contrastfMRI technology. In order to do so, statistical dataanalysis is needed to extract reliable information fromthe noisy and low amplitude BOLD signal. In this papera new Bayesian method is presented, where the neuralactivity detection is jointly obtained along with the HRFestimation. This approach presents two main advantages:1) the activity detection benefits from the adaptativenature of the HRF shape estimation; 2) it provides local,

space variant, HRF estimation. This might provide auseful tool to evaluate and compare estimated brainactivity between regions or subjects and for possiblebehavioral [31], neural and vascular local consideration.

In the joint Bayesian method presented, noisy observa-tions are modeled by the additive white Gaussian noise(AWGN) model and the stimulus activation indicatorsare modeled by binary variables that are estimated. Theprior associated with the binary indicators is a bimodalGaussian distribution around the 0 and 1 values to copewith the uncertainty related to data noise. The HRF isadaptively estimated on the restricted space responses ofthe adopted IIR PBH model [12].

The neural activity detection section of the proposedmethod is compared with the standard method proposedin [1] that uses a classical statistical inference method-ology based on the t-test method. Monte Carlo testson synthetic-1D-block-designed data with both meth-ods have shown that the proposed Bayesian method,called SPM-MAP, outperforms the classical one basedon the general linear model (GLM), here called SPM-GLM. The performance evaluation was based on thecomputation of the error probability, Pe(N,σ), for eachmethod which proved to be smaller for the proposedSPM-MAP method than the corresponding ones obtainedwith the SPM-GLM method, for almost every testedconditions: different noise levels, σj and number ofparadigm epochs, Ni, in a block design framework.

On Monte Carlo tests of the whole method with,again, synthetic data, the Pe obtained was lower than0.1% for noise levels close to expected real ones. Buteven for extreme noise levels the method showed itselfconsiderably reliable. This Pe should increase on realdata, but considering its low values, a prominent increaseis not expected. Preliminary results after a very simplespatial correlation on the activity detection results arepresented and foretells some significant achievable ac-curacy improvement. This is planned to be incorporatedin future developments of the SPM-MAP algorithm tomake the voxel activation probability dependent on itsneighbors. Regarding the HRF estimation, the averageresults showed a robust-to-noise close similarity betweenthe real synthetic data HRF and the estimated HRF,although there is a justified small bias towards an am-plitude reduction.

The real fMRI data tests showed that the SPM-MAPwas able to detect most of the activity detected bySPM-GLM and also detect other activated brain regions.Further analysis of these new brain areas will be donein future works.

In the end we presented and a Bayesian method forstatistical analysis and inference of fMRI data with a new

12

perspective and reliable results validated on synthetic andreal data. This method is parameter-free presented (ifthe noise distribution is know) while many methods areparameter-dependent, like the p-value dependant SPM-GLM standard algorithm. This may be very useful forstandalone analysis or for comparison against resultsfrom other methods.

REFERENCES

[1] K. J. Friston, “Analyzing brain images: Principles andoverview,” in Human Brain Function, R.S.J. Frackowiak andK.J. Friston and C. Frith and R. Dolan and J.C. Mazziotta, Ed.Academic Press USA, 1997, pp. 25–41.

[2] S. Ogawa, R. S. Menon, D. W. Tank, S. G. Kim, H. Merkle,J. M. Ellermann, and K. Ugurbil, “Functional brain mappingby blood oxygenation level-dependent contrast magnetic reso-nance imaging. A comparison of signal characteristics with abiophysical model,” Biophys J, vol. 64, no. 3, pp. 803–812, Mar1993, comparative Study.

[3] P. Jezzard, P. M. Matthews, and S. M. Smith, Functional mag-netic resonance imaging: An introduction to methods. OxfordMedical Publications, 2006.

[4] R. Baumgartner, L. Ryner, W. Richter, R. Summers, M. Jar-masz, and R. Somorjai, “Comparison of two exploratory dataanalysis methods for fMRI: fuzzy clustering vs. principal com-ponent analysis,” Magn Reson Imaging, vol. 18, no. 1, pp. 89–94, Jan 2000, comparative Study.

[5] F. Årup Nielsen, “Bibliography on independent componentanalysis in functional neuroimaging,” 2007. [Online]. Avail-able: http://www2.imm.dtu.dk/ ˜ fn/bib/Nielsen2001BibICA/Nielsen2001BibICA.html

[6] S. L. Z. Nicholas Lange, “Non-linear fourier time series analysisfor human brain mapping by functional magnetic resonanceimaging.” Applied Statistics, vol. 46, no. 1, pp. 1–29, 1997.

[7] K. J. Friston, P. Fletcher, O. Josephs, A. Holmes, M. D. Rugg,and R. Turner, “Event-related fMRI: characterizing differentialresponses,” Neuroimage, vol. 7, no. 1, pp. 30–40, Jan 1998.

[8] J. C. Rajapakse, F. Kruggel, J. M. Maisog, and D. Y. vonCramon, “Modeling hemodynamic response for analysis offunctional MRI time-series,” Hum Brain Mapp, vol. 6, no. 4,pp. 283–300, 1998.

[9] K. J. Friston and A. P. Holmes and K. J. Worsley and J.B. Poline and C. Frith and R. S. J. Frackowiak, “StatisticalParametric Maps in Functional Imaging: A General LinearApproach,” Human Brain Mapping, vol. 2, pp. 189–210, 1995.

[10] R. B. Buxton, E. C. Wong, and L. R. Frank, “Dynamics ofblood flow and oxygenation changes during brain activation:the balloon model,” Magn Reson Med, vol. 39, no. 6, pp. 855–864, Jun 1998.

[11] K. J. Friston, A. Mechelli, R. Turner, and C. J. Price, “Nonlinearresponses in fMRI: The Balloon model, Volterra kernels andother hemodynamics,” NeuroImage, vol. 12, pp. 466–477, 2000.

[12] D. M. Afonso, J. Sanches, and M. H. Lauterbach, “Neural phys-iological modeling towards a hemodynamic response functionfor fMRI,” Aug 2007, pp. 1615–1618.

[13] “tatistical parametric mapping software.” [Online]. Available:http://www.fil.ion.ucl.ac.uk/spm/

[14] “Fmrib software library.” [Online]. Available:http://www.fmrib.ox.ac.uk/fsl/

[15] “Brainvoyager software.” [Online]. Available:http://www.brainvoyager.com/

[16] K. M. Hanson, “Introduction to Bayesian image analysis,”Medical imaging VII: image processing (ed. M. H. Loew), Proc.SPIE, vol. 1898, no. 8, pp. 716–731, 1993.

[17] A. M. Wink and J. B. Roerdink, “Denoising functional MRimages: a comparison of wavelet denoising and Gaussiansmoothing,” IEEE Trans Med Imaging, vol. 23, no. 3, pp. 374–387, Mar 2004, comparative Study.

[18] H. Gudbjartsson and H. S. Patz, “The Rician distribution ofnoisy MRI data,” Magn Reson Med, vol. 34, no. 6, pp. 910–914, Dec 1995.

[19] Z. Q. Wu, A. Ware, and J. Jiang, “Wavelet-based rayleighbackground removal in mri,” pp. 603– 604, 2003.

[20] N. Vucic and H. Boche, “Equalization for MIMO ISI Systemsusing Channel Inversion under Causality, Stability and Ro-bustness Constraints,” in Proc. IEEE International Conferenceon Acoustics, Speech, and Signal Processing (ICASSP 2006),Toulouse, France, May 2006.

[21] S. S. Rao and A. Ramasubrahmanyan, “Evolving iir approxi-mants for fir digital filters,” in ASILOMAR ’95: Proceedingsof the 29th Asilomar Conference on Signals, Systems andComputers (2-Volume Set). Washington, DC, USA: IEEEComputer Society, 1995, p. 976.

[22] John L. Shanks, “Recursion filters for digital processing,”Geophysics, vol. 32, pp. 33–51, 1967.

[23] M. H. Hayes, Statistical Digital Signal Processing and Model-ing. Wiley, March 1996.

[24] G. Baron de Prony, “Essai expérimental et analytique sur leslois de la Dilatabilité des uidesélastique et sur celles de la Forceexpansive de la vapeur de l’eau et de la vapeur de l’alkool,ádiérentes températures,” Jour. de L’Ecole Polytechnique, vol. 1,no. 1, pp. 24–76, 1795.

[25] E. Rignot and R. Chelappa, “Segmentation of polarimetricsunthetic aperture radar data,” IEEE Trans. Image Processing,vol. 1, no. 1, pp. 281–300, 1992.

[26] T. K. Moon and W. C. Stirling, Mathematical methods andalgorithms for signal processing, 2000.

[27] S. Geman and D. Geman, “Stochastic relaxation, gibbs distri-butions, and the bayesian restoration of images,” pp. 564–584,1987.

[28] N. Kutner and N. Wasserman and C. J. Nachtsheim and W.Wasserman, Applied Linear Statistical Models. The McGraw-Hill Companies, Inc., 1996.

[29] G. H. Glover, “Deconvolution of impulse response in event-related BOLD fMRI,” Neuroimage, vol. 9, no. 4, pp. 416–429,Apr 1999, clinical Trial.

[30] ——, “On signal to noise ratio tradeoffs in fmri,” Apr 1999.[31] M. Fukunaga, S. G. Horovitz, P. van Gelderen, J. A. de Zwart,

J. M. Jansma, V. N. Ikonomidou, R. Chu, R. H. R. Deckers,D. A. Leopold, and J. H. Duyn, “Large-amplitude, spatiallycorrelated fluctuations in BOLD fMRI signals during extendedrest and early sleep stages,” Magn Reson Imaging, vol. 24, no. 8,pp. 979–992, Oct 2006.

13

(a) Verbs generation paradigm results (b) Trajectory generation paradigm results

(c) Right foot paradigm results

(d) Right hand paradigm results (e) Bilateral hands paradigm results

Fig. 11. SPM-MAP activity detection results on real data compared against data processed by a neurologist on SPM-GLM. On the top ofeach image-set ((a), (b), (c), (d) and (e)) is the SPM-MAP result and on the bottom the SPM-GLM results with left to right increase inrestriction on the p-value, where the middle image was intuitively set as the reference result. The background fMRI image in the SPM-MAPresults is the smoothed data used for the statistical analysis, and in SPM-GLM is the unsmoothed data.

new bayesian joint detection and estimation in functional mri · 1 new bayesian joint detection and...

Documents