
University of Groningen

Wavelet-based methods for the analysis of fMRI time series
Wink, Alle Meije

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version: Publisher's PDF, also known as Version of record

Publication date: 2004

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA): Wink, A. M. (2004). Wavelet-based methods for the analysis of fMRI time series. s.n.

Copyright: Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).


https://www.rug.nl/research/portal/en/publications/waveletbased-methods-for-the-analysis-of-fmri-time-series(a3724612-39de-41fe-9c26-733053b98ee2).html


Wavelet-based Methods for the Analysis of fMRI Time Series

Alle Meije Wink

This research has been part of the project “Wavelets and their applications” funded by the Dutch National Science Foundation (NWO), project no. 613.006.570. The research has been conducted within the School of Behavioral and Cognitive Neurosciences (BCN).

Cover: Brain surface with active areas displayed via normal fusion, which integrates the activation map along the inward surface normals and colours the surface according to the amount of detected activation. The waveforms are a symmetric cubic spline scaling function (top) and the corresponding wavelet (bottom).

Wink, A. M.

Wavelet-based Methods for the Analysis of fMRI Time Series
Alle Meije Wink
Thesis Rijksuniversiteit Groningen. - With index, ref.

Printed by Printpartners Ipskamp (www.ppi.nl).
ISBN 90–367–2090–7

Online version:
ISBN 90–367–2091–5

RIJKSUNIVERSITEIT GRONINGEN

Wavelet-based Methods for the Analysis of fMRI Time Series

Thesis
to obtain the doctorate in
Mathematics and Natural Sciences
at the Rijksuniversiteit Groningen

on the authority of the
Rector Magnificus, dr. F. Zwarts,
to be defended in public on

Friday 10 September 2004
at 13:15

by

Alle Meije Wink
born on 6 July 1976

in Drachten

Supervisor (promotor): Prof. dr. J. B. T. M. Roerdink

Assessment committee: Prof. dr. ir. H. Duifhuis, Prof. dr. D. G. Norris, Prof. dr. ir. M. A. Viergever


Contents

1 Introduction
    1.1 MRI and fMRI
        1.1.1 MRI physics
        1.1.2 MR imaging
        1.1.3 Functional MRI
        1.1.4 The analysis of fMRI data
        1.1.5 Models and methods in fMRI
    1.2 Wavelets
        1.2.1 Fourier transforms and wavelet transforms
        1.2.2 Wavelet bases
        1.2.3 Applications of the wavelet transform
    1.3 Thesis contribution and organisation

Denoising fMRI time series

2 BOLD noise assumptions in fMRI
    2.1 Introduction
    2.2 Noise in MR images
    2.3 The BOLD noise distribution in fMRI: mathematical analysis
        2.3.1 Distribution of the difference fMRI signal
        2.3.2 Numerical approximation by a normal distribution
        2.3.3 Tail of the BOLD distribution
        2.3.4 Statistical tests of normality
        2.3.5 Parameter estimation in fMRI data based on the general linear model
        2.3.6 Conclusion
    2.4 The BOLD noise distribution in fMRI: experimental data
        2.4.1 Shape of the noise distribution in MR images
        2.4.2 Time series of MR images
    2.5 Conclusions

3 Denoising functional MR images
    3.1 Introduction
    3.2 Thresholding statistical maps: multiple hypotheses
    3.3 Noise models for fMRI
    3.4 Wavelet-based denoising
        3.4.1 Wavelet bases
        3.4.2 Denoising images by wavelet domain thresholding
    3.5 Denoising 2D images
    3.6 Denoising a simulated time series
        3.6.1 Effect on the temporal SNR
        3.6.2 Effect on the shape of the detected spots
        3.6.3 Segmentation via SNR thresholding
    3.7 Statistical tests on the simulated time series
        3.7.1 Impact of spatial filtering on the distribution of temporal noise
        3.7.2 Positive regression dependence of the p-values
        3.7.3 Results
    3.8 Statistical tests on a real fMRI data set
    3.9 Conclusions

Modelling the haemodynamic response function

4 Extracting the HRF using Fourier-wavelet regularised deconvolution
    4.1 Introduction
    4.2 Modelling fMRI Time Signals
        4.2.1 The General Linear Model
        4.2.2 Determining the HRF
        4.2.3 Modelling the HRF
    4.3 Regularisation
        4.3.1 Shrinkage I: the Frequency Domain
        4.3.2 Shrinkage II: Wavelets and ForWaRD
        4.3.3 Using ForWaRD to extract the HRF
    4.4 Simulation Tests
        4.4.1 Test Setup
        4.4.2 Test Results
        4.4.3 Conclusions
    4.5 Event-Related fMRI Experiments
        4.5.1 Fixed-ISI experiment
        4.5.2 Random-ISI experiment
        4.5.3 Using the extracted HRFs in covariance tests
    4.6 Conclusions

5 Extracting the HRF using ForWaRD with orthogonal spline wavelets
    5.1 Introduction
    5.2 SPM, Wavelets, and ForWaRD
        5.2.1 SPM
        5.2.2 Wavelets
        5.2.3 ForWaRD
    5.3 Computing the SI-DWT in the frequency domain
        5.3.1 Efficient computation of the frequency-domain SI-DWT
        5.3.2 Computation times: spline wavelets
    5.4 ForWaRD using spline wavelets
    5.5 Event-Related fMRI Experiments
        5.5.1 Fixed-ISI Experiment
        5.5.2 Random-ISI Experiment
        5.5.3 Using the extracted HRFs in activation tests
    5.6 Conclusion

6 Discussion
    6.1 Summary and conclusions
    6.2 Perspectives

A Mathematical analysis of the null distribution
    A.1 Mean and variance of the BOLD distribution
    A.2 Exact form of the null distribution in the Rayleigh case
    A.3 Tails of the null distribution

B Polyphase decompositions in the frequency domain
    B.1 Introduction
    B.2 Upsampling and downsampling in the frequency domain
        B.2.1 Up/downsampling by a factor of 2
    B.3 Up/downsampling by an arbitrary factor
        B.3.1 The Z-transform
        B.3.2 The DFT
    B.4 Different downsampling factors Q
        B.4.1 Q = 1
        B.4.2 Q = 2
    B.5 Computing the FWT in the frequency domain
    B.6 Two-dimensional FWT in the frequency domain
    B.7 The SI-DWT in the time / frequency domain

Bibliography

Index

Publications

Samenvatting (summary in Dutch)

Dankwoord (acknowledgements)

Chapter 1

Introduction

The field of functional imaging uses medical imaging modalities, such as magnetic resonance imaging (MRI), see Fig. 1.1, and positron emission tomography (PET), to visualise physiological processes. Applications range from PET imaging of tumours, ultrasound imaging of coronary blood flow and magnetic resonance angiography (MRA), to tracking metabolic activity of proteins, hormones and neurotransmitters. In functional neuroimaging, these modalities are used to visualise brain function (see Fig. 1.2). Examples are the visualisation of neurodegenerative diseases with PET/MRI, electroencephalography (EEG) and magnetoencephalography (MEG) measurement of brain action potentials, and visualisation of brain regions activated during the processing of (visual, auditory, tactile) stimuli or during the execution of a specific task.

Figure 1.1. T1-weighted MR image of the head. (a) Transverse slice, (b) sagittal slice, (c) coronal slice.

Since its introduction in the early 1990s, functional magnetic resonance imaging (fMRI) has become the most influential modality for functional neuroimaging. Because of its flexibility, fMRI supports a very broad range of experiments, such as: localisation of brain regions where pain stimuli are projected, activation of different brain regions for listening to or reading correct words and non-words, such as ‘neuron’ vs. ‘noreun’, localisation of brain regions activated while listening to sounds at different frequencies, or finding the centres in the brain responsible for dealing with emotions. While the first fMRI experiments were very simple and straightforward (measuring the difference between active periods and rest periods, without modelling temporal behaviour), today’s experimental setups are much more complex. Consider a visual fMRI experiment of an object recognition task. The object is only vaguely visible, and as the scanning starts, the image quality increases. As soon as the test person recognises the object, he or she presses a button. The time of the button press is recorded, and the scanning continues while the image keeps improving until the end of the run. Such an experiment, which usually involves multiple time series, uses different analysis methods. The moment of recognition and the button press are events, while seeing only noise vs. recognising the object are states. A combined analysis of both the recognition event and the state difference gives very detailed information about how and where the images are processed inside the brain.

Figure 1.2. Surface rendering of the brain, with motor cortex activation displayed on the brain surface using normal fusion, and on the orthographic planes using direct volume rendering (thanks to Michel Westenberg for composing this picture).

Because of their size and complexity, fMRI data sets require powerful analysis methods. This thesis only treats the analysis of single-subject (first-level) experiments. Analyses of group studies (second-level experiments) require further statistical analysis.

The rest of this chapter is organised as follows. Section 1.1 introduces MR techniques relevant to fMRI and discusses fMRI analysis methods relevant to the rest of the thesis. Section 1.2 introduces the concept of a wavelet transform and demonstrates the algorithms to compute the wavelet transforms used in this thesis. Section 1.3 gives an overview of the thesis and discusses its main contributions.

1.1 MRI and fMRI

This section introduces MRI in general and fMRI in particular. Some concepts in MR physics are briefly discussed. The aim of this section is not to give a complete overview of the MR image formation process, but to treat only the concepts relevant to fMRI. A short overview of fMRI is given, and topics that are discussed in the next chapters are highlighted.

1.1.1 MRI physics

Magnetic resonance imaging (MRI) is based on the fact that atoms with an odd atomic mass or an odd atomic charge have an angular momentum, called spin. Each atom has a magnetic moment with the axis parallel to the spin, inducing a small magnetic field (see Fig. 1.3a). Without the presence of an external magnetic field, the axes of a group of nuclei are randomly oriented. In a static magnetic field, the nuclei assume preferred orientations, either parallel or anti-parallel to the static field. For atoms whose spin number I has the form n+1/2, n∈N (these are the atoms with an odd atomic mass and an odd atomic charge) this alignment is not static: each atom experiences a torque, causing its magnetic axis to precess around the axis of the external field. This precession is called Larmor precession (Aine 1995, Birn et al. 1999), see Fig. 1.3b. The frequency of precession depends on the properties of the nucleus (described by the gyromagnetic ratio) and the strength of the magnetic field, and is called the Larmor frequency.
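As a small worked example of the last sentence: the Larmor frequency is proportional to the field strength, f = (γ/2π) B0. The sketch below assumes the commonly quoted value γ/2π ≈ 42.58 MHz/T for the hydrogen nucleus. The function name and the field strengths are illustrative and are not taken from the thesis.

```python
# Larmor frequency f = (gamma / 2*pi) * B0 for the hydrogen nucleus.
# GAMMA_BAR_MHZ_PER_T is an assumed textbook constant, not a value from the thesis.
GAMMA_BAR_MHZ_PER_T = 42.58  # gamma / (2*pi) for 1H, in MHz per tesla


def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Return the proton precession frequency (MHz) in a static field B0 (T)."""
    return GAMMA_BAR_MHZ_PER_T * b0_tesla


if __name__ == "__main__":
    for b0 in (1.5, 3.0):  # illustrative field strengths
        print(f"B0 = {b0} T -> f = {larmor_frequency_mhz(b0):.2f} MHz")
```

Doubling the field strength doubles the precession frequency, which is why the RF pulses discussed next have to be tuned to the scanner's field strength.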

Figure 1.3. (a) Rapidly spinning nuclei possessing a magnetic moment induce a magnetic field (with direction µ), so that they act as tiny bar magnets. (b) Nuclei do not align with a static magnetic field, but they precess about the axis of the field. From “Basic principles of MR imaging” (1996), Courtesy Philips Medical Systems.

The hydrogen nucleus (a proton) has spin I=1/2, is abundant in the human body, and has a high magnetic moment. An MR scanner consists of a number of magnetic coils generating magnetic fields with various orientations and strengths (see Fig. 1.4). A static field B0 is generated by a superconducting coil. If a person is placed inside an MR scanner, the protons inside the body will precess around B0 in two states: either aligning partly with B0 (see Fig. 1.3), or partly against B0, with a tiny preference for the first: these atoms are in a lower-energy state than those partly aligned against B0. This results in a net magnetisation M. The magnitude of this net magnetisation increases with the strength of the magnetic field, and decreases with temperature.

The net magnetisation is changed by a radio-frequency (RF) pulse, a short burst of electromagnetic energy. The direction of the RF field is orthogonal to that of B0, and it rotates around B0 with the Larmor frequency that corresponds to the energy difference between the two preferred orientations of the nuclei. During the RF pulse, the direction of M rotates around the axis of the RF field. In the rotating frame of reference where the direction of the RF field is fixed, M has two components: Mz, the longitudinal component of M parallel to B0, and Mxy, the transverse component of M in the plane orthogonal to B0. In the equilibrium state, Mz is maximal and Mxy is minimal (zero). Immediately after the RF pulse, Mz is small and Mxy is large. The return to the equilibrium state that follows the RF pulse is called relaxation. Relaxation consists of two components: longitudinal relaxation (characterised by T1), and transverse relaxation (characterised by T2). Longitudinal relaxation is the recovery of the Mz component of the net magnetisation, and T1 denotes the time in which Mz regains 63% of its equilibrium value. T1 is typically short in environments where molecules can efficiently transfer energy. A stronger magnetic field results in a longer T1. Transverse relaxation is the decay of Mxy, and T2 denotes the time in which Mxy decreases to 37% of its maximum value. Transverse relaxation is the exchange of energy between nuclei. Short T2 values occur in solids and large molecules, long T2 values occur in fluids. T2 is independent of the magnetic field strength. Local field inhomogeneities greatly accelerate the decay of Mxy. The decay time T2* characterises the decay resulting from transverse relaxation in combination with field inhomogeneities, and is shorter than T2. The spin-echo method, which can be used to measure T2, applies a second RF pulse at some time after the first. The effect of this pulse is that the spins refocus. This results in an echo signal at time TE (echo time). The amplitude of the echo signal depends on T2.
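The percentages in the definitions of T1 and T2 follow from the usual exponential relaxation model, which is assumed here rather than quoted from the thesis: Mz(t) = M0 (1 − exp(−t/T1)) after a pulse that leaves Mz = 0, and Mxy(t) = Mxy(0) exp(−t/T2). The relaxation times in the sketch are hypothetical round numbers chosen only for illustration.

```python
import math


def mz_recovery(t: float, t1: float, m0: float = 1.0) -> float:
    """Longitudinal magnetisation after a pulse that leaves Mz = 0 at t = 0."""
    return m0 * (1.0 - math.exp(-t / t1))


def mxy_decay(t: float, t2: float, mxy0: float = 1.0) -> float:
    """Transverse magnetisation decaying from its value mxy0 at t = 0."""
    return mxy0 * math.exp(-t / t2)


if __name__ == "__main__":
    t1, t2 = 900.0, 100.0  # milliseconds; hypothetical values, not from the thesis
    print(f"Mz(T1) / M0      = {mz_recovery(t1, t1):.3f}")  # equals 1 - 1/e
    print(f"Mxy(T2) / Mxy(0) = {mxy_decay(t2, t2):.3f}")    # equals 1/e
```

Under this model the 63% and 37% figures are simply 1 − 1/e and 1/e, independent of the actual values of T1 and T2.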

1.1.2 MR imaging

The formation of an image using these principles consists of a number of steps. Gradient fields are provided by extra coils. If a relatively small gradient in field strength along the longitudinal (z) axis is added to the B0 field (see Fig. 1.4), every transverse slice has its own Larmor frequency. There is a difference between the spacing of slices (the distance between the centres of the slices along the z-axis) and the slice thickness. Spacing is controlled by the z-gradient and the central frequencies of the RF pulses; thickness is controlled by the shape and the bandwidth of each RF pulse. The (x, y)-position in a slice is determined by gradient fields orthogonal to B0 (see Fig. 1.4). With a gradient field in the x-direction, the Fourier transform of the detected signal is a projection onto the x-axis. The amplitude of each frequency along the x-axis is the sum of amplitudes measured along the y-axis at that frequency. The y-position is often determined by applying a phase-encoding gradient in the y-direction. The x-gradient is also used as the readout gradient, i.e., the field gradient that is active when the signal is received.

Figure 1.4. Coil arrangement in an MR scanner. The direction of the gradients can be found via the right-hand rule. If the fingers of your right hand point in the direction of the current in the coil (light arrows), the thumb points in the direction of the field. In the case of the z-gradient coil, the field at one end of the B0 coil is parallel to B0 and at the other end anti-parallel to B0, creating a gradient in the z-direction. From “Basic principles of MR imaging” (1996), Courtesy Philips Medical Systems.

After termination of the RF pulse, M returns to its original (equilibrium) direction. After applying the z-gradient, an image of a 2D slice is measured in the so-called k-space, with frequencies encoded along the x-dimension and phases encoded along the y-dimension. Two-dimensional (2D) Fourier techniques are used to reconstruct an image of the slice. Using the field gradient parallel to B0 for slice selective excitation (z) and using a field gradient orthogonal to B0 for frequency encoding (x) and phase encoding (y), a 2D image can be computed. By making multiple slices for different values of z, a three-dimensional (3D) image is obtained (see Fig. 1.5).
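The relation between k-space and the image can be illustrated with a discrete Fourier transform. The sketch below is an idealised, noise-free simulation rather than a model of a real acquisition: the test image, the array sizes and the choice of axis 0 as y and axis 1 as x are all assumptions made for the example. It also checks the projection property mentioned above: the k-space line without phase encoding is the Fourier transform of the projection onto the x-axis.

```python
import numpy as np

# Hypothetical 2D "slice": a bright rectangle on a dark background.
image = np.zeros((64, 64))
image[20:44, 24:40] = 1.0

# Idealised k-space data: the 2D discrete Fourier transform of the slice.
kspace = np.fft.fft2(image)

# Reconstruction by the inverse 2D Fourier transform.
reconstruction = np.fft.ifft2(kspace).real

# With no phase encoding (k_y = 0), the inverse 1D transform of that k-space
# line is the projection of the slice onto the x-axis (the sum over y).
projection_from_kspace = np.fft.ifft(kspace[0, :]).real
projection_direct = image.sum(axis=0)

print(np.allclose(reconstruction, image))
print(np.allclose(projection_from_kspace, projection_direct))
```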

Figure 1.5. The x-, y-, and z-dimensions of medical images.

The sequence of excitation pulses and readouts determines the image contrast. A sequence is determined by a number of scanner parameters, such as the repetition time TR, and the echo time TE. A short TR and a short TE yield a T1-weighted image, a long TR and a long TE yield a T2*-weighted image. Sequences differ in speed and signal-to-noise ratio: some sequences have great contrast but require minutes to complete, others have low contrast but can be completed in less than a second. Echo planar imaging (EPI), discussed in the next subsection, is a fast sequence that is often used in functional MRI experiments.

1.1.3 Functional MRI

Functional magnetic resonance imaging (fMRI) is one of the most versatile methods to study human brain function. The discovery that the changes in the concentration of oxyhaemoglobin induced by local brain activity are measurable in an MRI scanner was made in the early 1990s by a number of independent groups (Ogawa et al. 1990, Belliveau et al. 1991, Bandettini et al. 1992).

The transporter of oxygen in the blood is haemoglobin, a protein that binds oxygen. Haemoglobin carrying oxygen is called oxyhaemoglobin, and haemoglobin not carrying oxygen is called deoxyhaemoglobin. Oxyhaemoglobin is diamagnetic and deoxyhaemoglobin is paramagnetic. A paramagnetic substance becomes magnetised in the presence of a magnetic field, causing field inhomogeneities and a faster T2* relaxation, which in turn causes a local decrease in signal intensity.

The signal in fMRI, called the blood oxygenation level dependent (BOLD) contrast, shows an increase during increased local activity. This is explained as follows. As local activity increases, so does the local oxygen consumption, demanding an increase in oxyhaemoglobin concentration. Oxygen is delivered to the tissue by passive diffusion (i.e., not mediated by a transport mechanism) through the blood vessel walls. To increase this diffusion by the same amount as the extra consumption, the local supply of oxygenated blood needs to increase much more, leading to a relative increase in the oxyhaemoglobin level and decrease in the deoxyhaemoglobin level. The result is an increase in signal intensity (Birn et al. 1999). In T2*-weighted images, this BOLD contrast can be measured.

Figure 1.6. (a) Regional blood flow in the brain during rest. (b) Local activity increases the regional blood flow and the local concentration of oxyhaemoglobin. (Labels in both panels: arteriole, capillaries, venule, blood flow, oxyhaemoglobin, deoxyhaemoglobin.)

Functional imaging requires fast acquisition of MRI images to obtain time series with a high temporal resolution. Echo planar imaging (EPI) is a common MR sequence for fMRI. Single-shot EPI scans a whole slice with one RF excitation by alternating the direction of the x-gradient after reading each line in k-space, using a zig-zag pattern. Typical scanning times for 3D images are within three seconds. Howseman and Bowtell (1999) give an overview of the BOLD contrast and the required MRI techniques. All functional MR images used in this thesis are EPI scans.
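One practical consequence of the zig-zag trajectory is that every other k-space line is read in the opposite direction and has to be reversed before Fourier reconstruction. The sketch below simulates this on idealised data. The array names and sizes are hypothetical, and a real EPI reconstruction involves further corrections that are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))       # hypothetical slice
kspace = np.fft.fft2(image)      # ideal k-space, one row per readout line

# Simulated single-shot EPI readout: odd-numbered lines are traversed in the
# opposite x-direction, so their samples arrive in reversed order.
acquired = kspace.copy()
acquired[1::2, :] = kspace[1::2, ::-1]

# Undo the alternation before reconstruction.
corrected = acquired.copy()
corrected[1::2, :] = acquired[1::2, ::-1]

reconstruction = np.fft.ifft2(corrected).real
print(np.allclose(reconstruction, image))
```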

1.1.4 The analysis of fMRI data

This subsection describes the analysis process of a single-subject fMRI data set. Consider an fMRI experiment in which the subject has to perform a very short task, once every 30 seconds. The analysis of this event-related experiment (a short task may be considered as a single event) takes a number of steps.

First, the images of the time series must be aligned, i.e., their voxel coordinates must be relative to the same coordinate system. Scans of the head are relatively easy to align, compared to images of soft tissues. The head does not deform: it is a rigid body. Realignment only requires a rigid transformation for each image, e.g. by aligning every image to the first image of the time series. To compare time series of multiple subjects, it is necessary to map one person’s (anatomical) coordinate system to another’s. In this case, the images must be spatially normalised. Contrary to realignment, normalisation usually requires nonlinear transformations (Ashburner and Friston 1997, 1999).
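A rigid transformation in 3D has six parameters: three rotation angles and three translations. The sketch below builds such a transformation as a 4x4 homogeneous matrix and checks that it preserves distances. The function name, the rotation order and the example movement are assumptions made for illustration, and estimating the parameters from the images (the actual realignment problem) is not shown.

```python
import numpy as np


def rigid_transform(rx: float, ry: float, rz: float,
                    tx: float, ty: float, tz: float) -> np.ndarray:
    """Build a 4x4 homogeneous rigid-body matrix (angles in radians).

    The rotation order Rz @ Ry @ Rx is a convention chosen for this sketch.
    """
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    matrix = np.eye(4)
    matrix[:3, :3] = rot_z @ rot_y @ rot_x
    matrix[:3, 3] = [tx, ty, tz]
    return matrix


# Hypothetical small head movement: 2 degrees about z, 1.5 mm shift along x.
m = rigid_transform(0.0, 0.0, np.deg2rad(2.0), 1.5, 0.0, 0.0)
points = np.array([[10.0, 0.0, 0.0, 1.0], [0.0, 20.0, 5.0, 1.0]]).T  # homogeneous
moved = m @ points

# A rigid transformation preserves the distance between any two points.
d_before = np.linalg.norm(points[:3, 0] - points[:3, 1])
d_after = np.linalg.norm(moved[:3, 0] - moved[:3, 1])
print(np.isclose(d_before, d_after))
```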

The next step is noise suppression. One of the traditional preprocessing steps in fMRI is to smooth each image, which is done by convolving the image with a lowpass filter kernel. The effect of smoothing is that the highest spatial frequencies in an image are suppressed. High-frequency signals are assumed to be mainly noise and according to certain measures, smoothing reduces noise. The problem with distinguishing noise from true signal is that for fMRI, the ground truth, the noise-free image, cannot be measured. In (experimental) situations where the ground truth is available, smoothing is shown to seriously degrade the images while suppressing noise (see chapter 3).
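The trade-off described above can be illustrated with a simulation in which the ground truth is known. The sketch below assumes a Gaussian lowpass kernel specified by its full width at half maximum (FWHM), a common choice that this passage does not prescribe. The image size, the noise level and the kernel width are hypothetical.

```python
import numpy as np


def gaussian_kernel(fwhm: float) -> np.ndarray:
    """1D Gaussian kernel, normalised to unit sum; fwhm is in pixels."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth2d(image: np.ndarray, fwhm: float) -> np.ndarray:
    """Separable lowpass filtering: convolve the rows, then the columns."""
    k = gaussian_kernel(fwhm)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")


rng = np.random.default_rng(1)
truth = np.zeros((64, 64))
truth[28:36, 28:36] = 1.0                      # hypothetical small active spot
noisy = truth + rng.normal(0.0, 0.5, truth.shape)
smoothed = smooth2d(noisy, fwhm=4.0)

background = truth == 0
print("background std before smoothing:", noisy[background].std())
print("background std after smoothing: ", smoothed[background].std())
print("peak of the smoothed noise-free spot:", smooth2d(truth, 4.0).max())
```

In this simulation the background fluctuations shrink, but the noise-free spot is also blurred and its peak drops below the true value of 1, which is the kind of degradation the paragraph refers to.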

After denoising, the images are ready for the statistical analysis. Different effects measured in the time series can be predicted and modelled as time signals. These effects may be experiment-related (e.g., the timing of the stimuli, or the description of different states), or concern unwanted effects (e.g., signal changes due to head movements, or signals originating from heart beat or blood flow). The residual noise in the analysis is the unmodelled part of the signal, and the more effects can be modelled, the smaller the residuals. During the statistical analysis, the role that each of the modelled effects plays in the total signal at each location in the image is represented by a weighting factor. Active regions are those regions where the weighting factors of the effects of interest are significantly high.

The performance of this analysis method depends heavily on the quality of the data after preprocessing and the precision of the model, which are discussed in the next subsection.

1.1.5 Models and methods in fMRI

The first fMRI experiments were aimed at detecting a BOLD signal. The experiment was usually a block design, which alternates between periods of rest and periods of stimulus presentation. During periods of rest, the subject lay still in the scanner, and during periods of stimulus presentation either a sound was played, a strong-contrast image (like a black-and-white checkerboard) was shown, or the test person had to perform a task. Images scanned during rest periods were averaged together into a mean rest image, and images scanned during activity were averaged into a mean active image. The difference between those images represented the BOLD contrast. If different active states were used, a number of mean images were combined to produce the contrast image.
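The difference-of-means analysis can be sketched in a few lines. The simulated data below (block length, image size, noise level and signal change) are entirely hypothetical and only serve to show the computation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_scans, shape = 40, (16, 16)

# Hypothetical block design: 5 scans rest, 5 scans active, repeated.
active = (np.arange(n_scans) // 5) % 2 == 1

# Simulated time series: Gaussian noise plus a small signal increase in a
# 4x4 region during the active scans (all values are illustrative).
series = rng.normal(100.0, 1.0, (n_scans, *shape))
series[active, 6:10, 6:10] += 2.0

mean_rest = series[~active].mean(axis=0)
mean_active = series[active].mean(axis=0)
contrast = mean_active - mean_rest       # the difference (contrast) image

region = np.zeros(shape, dtype=bool)
region[6:10, 6:10] = True
print("mean contrast inside the region: ", contrast[region].mean())
print("mean contrast outside the region:", contrast[~region].mean())
```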

Active regions are the parts of the contrast image that have significantly high values. The term significance is usually defined in a statistical context, by assuming a distribution of the noise in the contrast image. The BOLD images are thresholded at a quantile of the distribution, guaranteeing that the percentage of false positives does not exceed a given level.
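As an illustration of thresholding at a quantile, the sketch below assumes that the noise in a normalised contrast image is standard normal, and picks the threshold as the upper quantile for a chosen per-voxel false-positive rate. The rate and the number of voxels are hypothetical.

```python
import numpy as np
from statistics import NormalDist

alpha = 0.001                                  # accepted false-positive rate per voxel
threshold = NormalDist().inv_cdf(1.0 - alpha)  # upper quantile of N(0, 1)

rng = np.random.default_rng(3)
null_map = rng.standard_normal(200_000)        # contrast values with no activation
false_positive_rate = np.mean(null_map > threshold)

print(f"threshold = {threshold:.3f}")
print(f"observed false-positive rate = {false_positive_rate:.5f}")
```

If the assumed distribution does not match the real noise, the observed rate can differ from the chosen level, which is why the noise assumptions matter.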

A more general analysis is possible within the framework of the general linear model (GLM), which treats the BOLD responses as the output of a linear, time-invariant (LTI) system. Linear here indicates that the total response to multiple stimuli is the sum of the responses to all stimuli individually. Time invariance implies that the response does not change between stimulus times. A result of the GLM is that the fMRI response to a stimulus pattern can be modelled by convolving the stimulus pattern with the corresponding impulse response function, the so-called haemodynamic response function (HRF). Statistical parametric mapping (SPM, Friston et al. 1995c) is based on the GLM.
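The two LTI properties can be checked numerically. The sketch below uses an illustrative double-gamma curve as the HRF. Its shape parameters, the sampling interval and the stimulus times are assumptions made for this example and are not the model used in the thesis.

```python
import math
import numpy as np


def hrf(t: np.ndarray) -> np.ndarray:
    """Illustrative double-gamma response (assumed shape, not the thesis's model)."""
    t = np.asarray(t, dtype=float)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


dt = 0.5                                  # sampling interval in seconds (assumed)
h = hrf(np.arange(0.0, 32.0, dt))


def response(stimulus: np.ndarray) -> np.ndarray:
    """Predicted signal: the stimulus pattern convolved with the HRF."""
    return np.convolve(stimulus, h)[: len(stimulus)] * dt


n = 240
s1 = np.zeros(n)
s1[20] = 1.0                              # short stimulus at t = 10 s
s2 = np.zeros(n)
s2[80] = 1.0                              # short stimulus at t = 40 s

# Linearity: the response to both stimuli is the sum of the individual responses.
print(np.allclose(response(s1 + s2), response(s1) + response(s2)))
# Time invariance: shifting the stimulus by 30 s shifts the response by 30 s.
print(np.allclose(response(s2)[60:], response(s1)[: n - 60]))
```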


SPM assumes the noise to be additive and Gaussian distributed. The total fMRI signal measured in the experiment is a weighted sum of explanatory signals (signals that model components of the responses) plus noise. Given a Gaussian temporal distribution of the residual noise, which follows from the GLM if the original data contains Gaussian temporal noise, the significance of the weighting factors can be found in each voxel via standard hypothesis testing. For spatially stationary noise, the threshold need only be determined once, after which it can be applied to the entire map of statistic values, making SPM a very efficient analysis method.
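For a single voxel, the weighted-sum model can be written as y = Xβ + ε, with one column of X per explanatory signal. The sketch below estimates the weighting factors by ordinary least squares and computes a t-statistic for the effect of interest. It is a minimal illustration under assumed independent Gaussian noise, with hypothetical regressors and weights, and is not SPM's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120

# Design matrix X: one column per explanatory signal (all hypothetical here).
boxcar = ((np.arange(n) // 10) % 2).astype(float)   # effect of interest
drift = np.linspace(-1.0, 1.0, n)                   # unwanted slow drift
X = np.column_stack([boxcar, drift, np.ones(n)])

# Simulated voxel time series y = X @ beta + Gaussian noise.
beta_true = np.array([2.0, 1.0, 100.0])
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Ordinary least-squares estimate of the weighting factors.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
dof = n - np.linalg.matrix_rank(X)
sigma2 = residuals @ residuals / dof

# t-statistic for the effect of interest (the contrast c picks the first weight).
c = np.array([1.0, 0.0, 0.0])
t_stat = (c @ beta_hat) / np.sqrt(sigma2 * (c @ np.linalg.inv(X.T @ X) @ c))

print("estimated weights:", beta_hat)
print("t-statistic:", t_stat)
```

Repeating this computation in every voxel gives a map of statistic values that can be thresholded as described above.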

The GLM definition of the BOLD contrast as a measure of covariance rather than as a difference image enables more advanced study designs. Bandettini et al. (1993) measure the strength of the BOLD response by computing its correlation with the expected block signal. The experiments have a blocked design, but the BOLD contrast is computed via analysis of covariance (ANCOVA), rather than computing mean block images. Worsley and Friston (1995, 2002) have developed a general statistical framework for fMRI time series analysis based on Gaussian random field (GRF) theory. Boynton et al. (1996) test the GLM by independently varying stimulus duration and contrast, and investigating the additivity of the noise. They conclude that although deviations from linearity can be measured, these are not strong enough to reject the GLM.

With faster scanners and a better temporal resolution, fMRI shifted from state-related, i.e., comparing scans taken during different experimental conditions, to event-related: explicitly modelling the time signal as a response to short stimuli. Josephs and Henson (1999) present an overview of event-related fMRI, demonstrating the benefits of event-related fMRI over state-related fMRI, and describing how to optimise experimental designs for event-related analysis. The BOLD response during an active period may be described by modelling the active state as a block signal. One step towards event-related analysis is to treat the block state signal as a stimulus signal and convolve it with the HRF (Bandettini et al. 1993). Event-related fMRI involves stimuli much shorter than the repetition time TR. According to the GLM, such a short stimulus will evoke a signal with the shape of an HRF. The HRF is often modelled as a smooth curve, rising about two seconds after the stimulus, peaking at approximately seven seconds after the stimulus, followed by a negative undershoot and returning to baseline around 30 seconds after the stimulus. As a result, the responses to stimuli much shorter than the acquisition time of an EPI volume can still be measured using fMRI.

A number of studies use the GLM in an event-related setting (Dale and Buckner 1997, Miezin et al. 2000, Glover 1999). Friston et al. (1995b) test for differential responses, i.e., responses in different brain regions that vary in temporal shape, by comparing the responses in those regions to different temporal basis functions. In a later study, differential responses are captured by expanding the responses as a superposition of basis functions (Friston et al. 1998a). Josephs et al. (1997) acquire high-resolution temporal samples of the HRF by using interleaved post-stimulus sampling.

Next to detecting active regions, fMRI analysis also tries to describe the signals measured in those regions. A number of methods estimate and model the HRF itself (Hinrichs et al. 2000, Ollinger et al. 2001a, Ollinger et al. 2001b, Ciuciu et al. 2003). Other methods assume a fixed response waveform, and estimate the delay between stimulus and the onset of the HRF (Menon et al. 1998, Calhoun et al. 2000, Liao et al. 2002, Henson et al. 2002). Aguirre et al. (1998) show that although there is substantial variance in the HRF between subjects, the differences between response functions measured within one subject in subsequent experiments are much smaller. Jasdzewski et al. (2003) show that the temporal shape of the HRF differs significantly between the motor cortex and the visual cortex, but within those regions it does not vary significantly between subjects.

The next paragraphs present some assumptions used for neuroimage analysis which have long been used without questioning, but have recently been subjected to critical inspection.

Validity of the GLM

Experiments with varying stimulus durations and varying interstimulus intervals (ISI) have shown that the BOLD response is in general not linear. A number of solutions have been developed for the problem of modelling nonlinear effects in BOLD fMRI (Friston et al. 2000a). Dynamical models have been used to describe the temporal changes in the blood oxygenation levels (Buxton et al. 1998). Another solution is to estimate the nonlinear component of event-related responses in fMRI by expanding the response as a second-order Volterra series (Friston et al. 1998b). The GLM is still the most widely-used model for fMRI time series, but other, more advanced methods are gaining popularity.

Distribution of temporal fMRI noise

The hypothesis testing based on the GLM assumes that temporal fMRI noise is (i) additive and Gaussian distributed, and (ii) temporally uncorrelated. The assumption of Gaussian distributed temporal BOLD noise is widely accepted. In MR physics however, noise in MR images has been shown to have a Rician distribution (Henkelman 1985, Gudbjartsson and Patz 1995, Sijbers et al. 1998a, Sijbers et al. 1998c). Recent research of BOLD noise has shown deviations from a Gaussian distribution (Hanson and Bly 2001) and validity tests for the assumption of Gaussian noise have been developed (Luo and Nichols 2003). Nichols and Holmes (2002) have developed a nonparametric hypothesis testing method for functional neuroimaging. This method does not make any assumptions about the distribution of temporal noise. It is based on permutation tests, which makes it useful for PET experiments and second-level analysis of fMRI data, but not for (temporally correlated) fMRI time series. Raz et al. (2003) have developed a permutation test which is suitable for fMRI time series analysis, permuting the stimulus pattern rather than the sequence of scans.

Most testing methods assume the temporal noise to be uncorrelated (white). Fadili and Bullmore (2001) assume that fMRI time signals contain 1/f (pink) noise. They introduce a technique called wavelet-generalised least squares (WLS) to get unbiased estimators of the GLM in the presence of temporally correlated noise. In another paper by the same group (Bullmore et al. 2001), temporal autocorrelations are removed by transforming the time signals to the wavelet domain, permuting the detail coefficients, and reconstructing the signals. Alternatives to removing autocorrelations (whitening) are high-pass filtering, i.e., removing only the low-frequency autocorrelations, and band-pass filtering, i.e., keeping only autocorrelations within a certain range of frequencies (Friston et al. 2000b).

Gaussian spatial autocorrelations

The threshold selection scheme used for SPM needs to correct for multiple hypothesis testing: one volume consists of thousands of voxels, which are all tested simultaneously. Large numbers of simultaneous tests entail an increased risk of false positives. Common procedures for multiple testing correction are (i) manually increasing the threshold, (ii) controlling the familywise error (FWE), and (iii) controlling the false discovery rate (FDR). Manually changing the threshold is not a favourable solution, because the resulting experiments are not reproducible. Two popular methods for controlling the FWE are Bonferroni correction and Gaussian random field (GRF) theory. Bonferroni correction guarantees control of the FWE, but is often considered too conservative: spatial correlations in the noise are not taken into account. GRF-based tests assume the spatial correlations to be Gaussian, and control the FWE for a Gaussian random field with parameters estimated from the data. This method is less conservative than Bonferroni correction, but it relies heavily on Gaussian field assumptions. To bring the images into agreement with these assumptions, they often require heavy smoothing, leading to deformed (rounded) regions of detected activity. Control of the FDR is a relatively recent introduction in fMRI analysis (Genovese et al. 2002), and has rapidly gained popularity. The FDR is the expected proportion of false positives among the total number of positives. Control of this measure is possible with spatially uncorrelated data (Benjamini and Hochberg 1995) and spatially positively correlated data (Benjamini and Yekutieli 2001), without the need for more specific knowledge about the autocorrelation function. This method does not require the images to meet stringent shape criteria, which may explain the popularity of FDR control in fMRI analysis.
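The difference between Bonferroni correction and FDR control can be made concrete with a small sketch. The code below implements the step-up procedure of Benjamini and Hochberg (1995) next to a Bonferroni threshold; the function names and the list of p-values are illustrative assumptions and do not come from any fMRI package or data set.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up FDR procedure: returns True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

def bonferroni(p_values, alpha=0.05):
    """FWE control: every test is compared with alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values for ten voxels.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fdr_hits = benjamini_hochberg(p)
fwe_hits = bonferroni(p)
```

For these values the FDR procedure rejects the two smallest p-values, whereas Bonferroni correction rejects only the smallest, which illustrates why Bonferroni correction is regarded as the more conservative of the two.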

Conclusion

A spirit of healthy criticism is found in many areas of fMRI analysis, including even the assumption that an increased BOLD signal is an indicator for increased neuronal activity itself (Raichle 2001, Logothetis et al. 2001, Logothetis 2002). Petersson et al. (1999a) present a good overview of the possibilities, and also of the limitations, assumptions and risks in contemporary fMRI methodology.


1.2 Wavelets

The word wavelet was first coined by Jean Morlet, a geophysical engineer working for an oil company. Morlet was French, and wavelet is actually a literal translation of ondelette. The let at the end signifies that a wavelet is a small wave, where small in this case stands for transient. Whereas a wave continues to oscillate, a wavelet is only a small ripple. A wavelet transform converts the time-domain representation of a signal to its wavelet-domain representation. In the wavelet domain, a signal is described as a superposition of localised basis functions that vary in offset and scale. The wavelet transform is introduced here via the more common Fourier transform, which is sometimes used to compute wavelet transforms.

1.2.1 Fourier transforms and wavelet transforms

Wavelet analysis is related to Fourier analysis. A Fourier transform of a discrete signal x(n) of length N decomposes the signal into N sines and N cosines of different frequencies. The Fourier basis functions are represented by complex exponentials, the cosines being the real part and the sines being the imaginary part. Given the signal x = x(n), n = 0, ..., N − 1, its discrete Fourier transform (DFT) X(k) is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i n k / N}, \qquad e^{ik} = \cos(k) + i \sin(k). \tag{1.1}$$

Every frequency coefficient X(k) contains the weights of the sinusoids at that frequency. The real part represents the weight of the cosine, and the imaginary part represents the weight of the sine. The inverse Fourier transform is given by:

$$x(n) = \sum_{k=0}^{N-1} X(k)\, e^{2\pi i n k / N}, \tag{1.2}$$

which superimposes the sinusoids of the frequencies k = 0, ..., N − 1 at every point n = 0, ..., N − 1. The complexity of both transforms is $O(N^2)$. The DFT of a signal can be efficiently computed by the fast Fourier transform (FFT), and reconstructed by the inverse fast Fourier transform (IFFT), with complexity $O(N \log_2 N)$ (Mallat 1998, chapter 3).
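The two complexity classes can be illustrated with a minimal sketch that evaluates the DFT sum directly and via a recursive radix-2 FFT. The function names are illustrative, and the FFT shown is a textbook variant that assumes the signal length is a power of two. Note that the inverse in this sketch divides by N, which is the normalisation required to recover x exactly when the forward transform is left unnormalised.

```python
import cmath

def dft(x):
    """Direct evaluation of the DFT sum: O(N^2) operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Direct inverse DFT, including the 1/N normalisation."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

def fft(x):
    """Recursive radix-2 FFT: O(N log2 N); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
spectrum_slow = dft(signal)
spectrum_fast = fft(signal)
```

Both routines return the same coefficients up to rounding error; only the number of operations differs.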

An orthogonal wavelet basis is defined by two basis functions: a scaling function φ and the corresponding wavelet ψ. The basis itself consists of translated dilations of φ and ψ:

$$\phi_{j,l}(n) = 2^{-j/2}\, \phi(2^{-j} n - l), \qquad \psi_{j,l}(n) = 2^{-j/2}\, \psi(2^{-j} n - l), \tag{1.3}$$

where j and l denote scale and translation, respectively. The basis is orthogonal in $L^2$ if $h_l$ exist so that

$$\phi(n) = \sum_l h_l\, \phi(2n - l), \qquad \psi(n) = \sum_l g_l\, \phi(2n + l), \qquad \text{where } g_l = (-1)^l\, h_{l+1}. \tag{1.4}$$

This condition is called the two-scale relation (Daubechies 1988). Most of the algorithms presented in this thesis use orthogonal wavelets. Other basis functions, like those based on B-splines, can be used to constitute a biorthogonal wavelet basis (Cohen et al. 1992). The conditions for biorthogonality are less strict than for orthogonality, and biorthogonal bases lack certain properties of orthogonal bases, like preserving the amount of energy of a signal during the wavelet transform. In Chapters 4 and 5, biorthogonal basis functions are used in a detrending algorithm for fMRI time signals.

The J-level biorthogonal wavelet decomposition of a discrete signal x(n) is given by:

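$$x(n) = \sum_l c^J_l\, \tilde{\phi}_{J,l}(n) + \sum_{j=1}^{J} \sum_l d^j_l\, \tilde{\psi}_{j,l}(n), \qquad c^J_l = \langle x, \phi_{J,l} \rangle, \quad d^j_l = \langle x, \psi_{j,l} \rangle, \tag{1.5}$$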

where $c^J_l$ is called the approximation signal and $d^j_l$ are called the detail signals. From this equation it follows that the signal $c^0$ represents the input signal and that $\phi_{0,n} = \delta_{n,0}$. In the discrete setting, φ and ψ are represented by the filters h and g, respectively. The reconstruction from the wavelet-domain representation of a signal back to its original form is possible with a dual scaling function $\tilde{\phi}$ and wavelet $\tilde{\psi}$, respectively, which are represented in the discrete setting by the filters $\tilde{h}$ and $\tilde{g}$. The dual basis functions must satisfy the same conditions, and it is also necessary that:

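$$\langle \tilde{\phi}_{j,l}, \psi_{j,m} \rangle = 0, \quad \langle \tilde{\psi}_{j,l}, \phi_{j,m} \rangle = 0, \quad \langle \tilde{\phi}_{j,l}, \phi_{j,m} \rangle = \delta_{l,m}, \quad \langle \tilde{\psi}_{j,l}, \psi_{k,m} \rangle = \delta_{j,k}\, \delta_{l,m}. \tag{1.6}$$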

For an orthogonal basis, $\tilde{\phi} = \phi$ and $\tilde{\psi} = \psi$, and the filters and their duals satisfy $\tilde{h}_l = \bar{h}_{-l}$ and $\tilde{g}_l = \bar{g}_{-l}$, respectively, where $\bar{x}$ denotes the complex conjugate of x. The relation between the filters and the basis functions is defined by:

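$$\phi(n) = \sqrt{2} \sum_l h_l\, \phi(2n - l), \qquad \psi(n) = \sqrt{2} \sum_l g_l\, \phi(2n - l), \quad \text{and}$$

$$\tilde{\phi}(n) = \sqrt{2} \sum_l \tilde{h}_l\, \tilde{\phi}(2n - l), \qquad \tilde{\psi}(n) = \sqrt{2} \sum_l \tilde{g}_l\, \tilde{\phi}(2n - l). \tag{1.7}$$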

Some basis functions, like Daubechies’ orthogonal wavelets with compact support (Daubechies 1988), are defined as recursive refinements of the filters, starting with $\phi_{0,n} = \delta_{n,0}$. Other basis functions, like orthogonal spline wavelets, are defined in the frequency domain (Mallat 1989). The scaling filters h are then found via the inverse Fourier transform. Given a scaling filter, the wavelet filter g is found via (1.4), by reversing the filter h and then multiplying it with the vector (−1, 1, −1, 1, ..., −1, 1).
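The construction of g from h (reverse, then alternate the signs) can be sketched in a few lines. The function name is illustrative, and the four filter coefficients used as input are the standard closed-form values of the Daubechies filter with two vanishing moments; they are supplied here as an assumption and are not quoted from this text.

```python
import math

def wavelet_filter(h):
    """Reverse the scaling filter and multiply by (-1, 1, -1, 1, ...)."""
    return [(-1) ** (k + 1) * c for k, c in enumerate(reversed(h))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

s3 = math.sqrt(3.0)
# Standard 4-tap Daubechies scaling filter, normalised so that sum(h) = sqrt(2).
h_db2 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
g_db2 = wavelet_filter(h_db2)

unit_norm = dot(h_db2, h_db2)     # close to 1: the filter has unit energy
cross_term = dot(h_db2, g_db2)    # close to 0: h and g are orthogonal
dc_response = sum(g_db2)          # close to 0: g removes constant signals
```

The three quantities at the end check the properties that make the pair (h, g) usable in an orthogonal transform: unit energy, mutual orthogonality, and a wavelet filter that sums to zero.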

The fast wavelet transform (FWT) is an efficient wavelet transform based on multiresolution analysis (Mallat 1989). The algorithm repeatedly applies the filters h and g, each time followed by downsampling. Denoting downsampling and upsampling by a factor of 2 by ↓2 and ↑2, respectively, with

$$\downarrow_2 x(n) = x(2n), \quad n = 0, \ldots, (N/2) - 1, \qquad \uparrow_2 x(n) = \begin{cases} x(n/2), & \text{even } n \\ 0, & \text{odd } n, \end{cases} \quad n = 0, \ldots, 2N - 1 \tag{1.8}$$

and using ∗ to denote discrete circular convolution, the FWT algorithm is defined by the decomposition step at each level j:

$$c^j = \downarrow_2 (h * c^{j-1}), \qquad d^j = \downarrow_2 (g * c^{j-1}), \qquad j = 1, \ldots, J \tag{1.9}$$

Reconstruction via the inverse fast wavelet transform (IFWT) is defined by:

$$c^{j-1} = \tilde{h} * (\uparrow_2 c^j) + \tilde{g} * (\uparrow_2 d^j), \qquad j = 1, \ldots, J \tag{1.10}$$

Figure 1.7 shows the structures of both the FWT and the IFWT. The operators $H$, $G$, $\tilde{H}$, and $\tilde{G}$ represent the convolutions with the filters $h$, $g$, $\tilde{h}$, and $\tilde{g}$, respectively.

Figure 1.7. Graphical representations of the FWT (a) and the IFWT (b).
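The decomposition and reconstruction steps of the FWT and IFWT can be sketched for an orthogonal filter pair as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function names are hypothetical, the dual filters are taken to be the time-reversed analysis filters (the real-valued orthogonal case), and the 4-tap Daubechies coefficients are the standard closed-form values.

```python
import math

def circ_conv(f, x):
    """Circular convolution: (f * x)[n] = sum_k f[k] x[(n - k) mod N]."""
    N = len(x)
    return [sum(f[k] * x[(n - k) % N] for k in range(len(f))) for n in range(N)]

def circ_conv_reversed(f, x):
    """Circular convolution with the time-reversed filter (the dual filter here)."""
    N = len(x)
    return [sum(f[k] * x[(n + k) % N] for k in range(len(f))) for n in range(N)]

def down2(x):
    return x[0::2]

def up2(x):
    out = [0.0] * (2 * len(x))
    out[0::2] = x
    return out

def fwt(c0, h, g, levels):
    """Returns the coarsest approximation and the detail signals d^1, ..., d^J."""
    c, details = list(c0), []
    for _ in range(levels):
        details.append(down2(circ_conv(g, c)))
        c = down2(circ_conv(h, c))
    return c, details

def ifwt(cJ, details, h, g):
    """Inverts fwt() for an orthogonal filter pair (h, g)."""
    c = list(cJ)
    for d in reversed(details):
        low = circ_conv_reversed(h, up2(c))
        high = circ_conv_reversed(g, up2(d))
        c = [a + b for a, b in zip(low, high)]
    return c

s3 = math.sqrt(3.0)
h_db2 = [v / (4.0 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
g_db2 = [(-1) ** (k + 1) * v for k, v in enumerate(reversed(h_db2))]

x = [float((3 * i * i + 1) % 7) for i in range(16)]   # arbitrary test signal
cJ, ds = fwt(x, h_db2, g_db2, levels=3)
x_rec = ifwt(cJ, ds, h_db2, g_db2)
```

For a signal of length 16 and three levels, the detail signals have lengths 8, 4 and 2 and the approximation has length 2, so the transform has the same total size as the input. Because the basis is orthogonal, the reconstruction is exact up to rounding error and the energy of the signal is preserved.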

The FWT is not shift-invariant, i.e., the coefficients of an FWT of a shifted version of a signal $c^0$ are not the shifted coefficients of the FWT of $c^0$. A shift-invariant discrete wavelet transform (SI-DWT) exists (Mallat 1991). Instead of subsampling at every next level, the SI-DWT algorithm does a polyphase decomposition, filters all phases separately, and then does a monophase reconstruction. This operation is performed on two copies of the signal, once with filter h and once with g. A polyphase transform of Q phases subsamples the signal with a factor Q for shifts 0, ..., Q − 1. A monophase transform interleaves the Q phases again into one signal. At the first level of decomposition Q equals 1, and Q doubles every next level. The approximation $c^J$ and each $d^j$ have the same dimensions as the original $c^0$, so the total size of a J-level transform is (J + 1) × N. The complexity is $O(N \log_2 N)$. The SI-DWT is used in the second part of the thesis. Its inverse is denoted as SI-IDWT. Each step of the SI-IDWT filters the approximation and the detail at level j + 1, respectively, and adds them together in the new approximation. Figure 1.8 shows the decomposition step and the reconstruction step, respectively. The operators $\downarrow_{2^j}$ and $\uparrow_{2^j}$ perform down- and upsampling with a factor of $2^j$, respectively, in the same way as the operators in (1.8). The $/2^j$ operator divides the signal values by $2^j$.

Figure 1.8. One level of the SI-DWT (a) and of the SI-IDWT (b).
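The lack of shift-invariance of the decimated transform can be seen in a small sketch that covers only the first decomposition level, where Q equals 1 and the shift-invariant variant amounts to filtering without subsampling. The Haar wavelet filter, the test signal and the helper names are illustrative assumptions.

```python
import math

def circ_conv(f, x):
    """Circular convolution: (f * x)[n] = sum_k f[k] x[(n - k) mod N]."""
    N = len(x)
    return [sum(f[k] * x[(n - k) % N] for k in range(len(f))) for n in range(N)]

def shift(x, s):
    """Circularly delay x by s samples."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

r = 1.0 / math.sqrt(2.0)
g_haar = [-r, r]                                  # Haar wavelet filter
x = [0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0]      # short transient signal

# Decimated detail coefficients (one FWT level) of x and of x delayed by one sample.
d_dec = circ_conv(g_haar, x)[0::2]
d_dec_shifted = circ_conv(g_haar, shift(x, 1))[0::2]

# Undecimated detail coefficients: same filter, no subsampling.
d_und = circ_conv(g_haar, x)
d_und_shifted = circ_conv(g_haar, shift(x, 1))
```

Delaying the input by one sample simply delays `d_und` by one sample, whereas `d_dec_shifted` contains different values from `d_dec` and is not a circular shift of it: the decimated transform of the shifted signal keeps exactly the samples that the decimated transform of the original discards.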

1.2.2 Wavelet bases

The Haar basis is the simplest wavelet basis. The Haar basis functions are members of many wavelet families, such as Daubechies wavelets and spline wavelets. The scaling function φ and the wavelet ψ are given by:

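$$\phi(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases} \qquad \psi(x) = \begin{cases} 1, & 0 \le x < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le x < 1 \\ 0, & \text{otherwise} \end{cases} \tag{1.11}$$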

Figure 1.9 shows the Haar scaling function and wavelet, respectively. Haar basis functions have a number of favourable properties. They are symmetric and they have compact support. Disadvantages of the Haar basis functions are: poor approximation, and bad localisation in the frequency domain.

Figure 1.9. (a) Haar scaling function. (b) Haar wavelet.

Daubechies’ wavelets (Daubechies 1988) are among the most popular wavelets presently in use. They are identified by their number of vanishing moments. The number of vanishing moments v determines the maximum degree, v − 1, of the polynomials the scaling function can reproduce. Daubechies shows that the minimal length of a filter h with v vanishing moments is 2v. The filter g follows from h via (1.4). Daubechies-1 is the Haar basis. Daubechies-2 has two vanishing moments, and h and g both have 4 filter taps. The corresponding basis functions φ(n) and ψ(n) are shown in Fig. 1.10. Daubechies’ filters are the shortest filters that generate an orthogonal wavelet basis, given a number of vanishing moments. As a result, they enable very efficient wavelet transforms via the FWT. Better localisation in the frequency domain and better approximation are obtained with higher filter numbers (at the cost of longer filters). Disadvantages of these filters are: they are not symmetric, and not well localised in the frequency domain. Variants of Daubechies wavelets are: symlets, which are more symmetric (they are made from Daubechies wavelets by rearranging the filter coefficients), and coiflets, whose scaling functions also have vanishing moments. Of the coiflet basis with index v, the scaling function has 2v − 1 vanishing moments and the wavelet has 2v vanishing moments. Both filters have a support of length 6v − 1.

Figure 1.10. (a) Daubechies-2 scaling function. (b) Daubechies-2 wavelet.

Wavelet basis functions based on spline bases are smooth, well localised in the frequency domain, and they have good approximation properties. Many types of spline wavelet bases exist, such as: orthogonal, biorthogonal, causal, and symmetric. The spline wavelets used in this thesis are orthogonal. Orthogonal spline wavelets do not have compact support. Unser and Blu (2000) have made a spline wavelet basis construction tool based on fractional splines, which is used to produce the spline wavelet basis functions in this thesis. An example of spline wavelets are Battle-Lemarié wavelets (see Fig. 1.11), whose scaling functions are based on symmetric orthogonal cubic spline basis functions.

Figure 1.11. (a) Battle-Lemarié scaling function. (b) Battle-Lemarié wavelet.

1.2.3 Applications of the wavelet transform

Wavelet transforms are common tools in many signal and image processing tasks. In this subsection, the applications relevant to the thesis are discussed: denoising and waveform extraction.

Wavelet-based denoising uses the separation of the approximation $c^J$ and the detail signals $d^j$, j = 1, ..., J. The idea is that the relevant part is mainly captured in the approximation, and of the detail coefficients, only the largest are relevant. Wavelet-based denoising thresholds the detail signals: small detail coefficients are either removed or shrunk. The main characteristic of the different wavelet-based denoising methods is the choice of the thresholds. Many methods use a priori hypotheses about the distribution of detail coefficients, usually assuming them to be N(0, 1) distributed. Thresholds may be based on the false discovery rate (FDR) (Abramovich and Benjamini 1995, 1996) or other statistical thresholding procedures. The WaveLab project (Buckheit and Donoho 1995) aims to distribute wavelet-based algorithms via the literature, and to provide implementations of those algorithms for other researchers. WaveLab is a collection of MatLab (The Mathworks, USA) routines for wavelet transforms and wavelet-related operations. This thesis uses a number of WaveLab routines for denoising signals (Donoho and Johnstone 1994, 1995). These methods are based on white N(0, 1)-distributed noise and compute optimal thresholds for different criteria. If the data contains correlations, wavelet thresholding must be applied to each resolution channel separately (Johnstone and Silverman 1997). The detail coefficients within one channel of a signal with correlated noise show almost no correlation, so threshold selection schemes based on white noise can be used. New wavelet-based denoising techniques are based on the likelihood ratio of the wavelet coefficients (Pizurica et al. 2003). Wavelet-domain Wiener filtering (Ghael et al. 1997, Alexander et al. 2000a) combines the MSE-optimal properties of (frequency domain) Wiener filtering and the minimax mean-square error (MSE) properties of wavelet domain thresholding. Nowak (1999) has developed a wavelet-domain filtering procedure for removing Rician noise that is characterised as a data-adaptive wavelet-domain Wiener filter. Wavelet-domain Wiener filtering is also used to denoise fMRI time signals (LaConte et al. 2000, Alexander et al. 2000b). Hilton et al. (1996) denoise and analyse fMRI data in the wavelet domain. They compare the performance of their own data analytic thresholding technique to a standard technique available in WaveLab. Ruttimann et al. (1998) perform the statistical analysis of fMRI time series in the wavelet domain, by thresholding the differences between block mean images. This method is fast and statistically robust, although it is only suitable for blocked designs.
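The thresholding step itself, removing or shrinking small detail coefficients, can be sketched as follows. The functions are illustrative and are not WaveLab routines; the threshold σ√(2 ln N) is the universal threshold of Donoho and Johnstone (1994), used here as one possible choice under the assumption of white N(0, σ) noise, and the detail coefficients are made-up values.

```python
import math

def hard_threshold(d, t):
    """Keep coefficients whose magnitude exceeds t, set the others to zero."""
    return [v if abs(v) > t else 0.0 for v in d]

def soft_threshold(d, t):
    """Set small coefficients to zero and shrink the remaining ones towards zero by t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in d]

def universal_threshold(n, sigma=1.0):
    """Universal threshold sigma * sqrt(2 ln n) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))

# Hypothetical detail coefficients of one resolution level.
detail = [0.3, -2.9, 0.1, 4.0, -0.7, 0.2, 1.1, -0.4]
t = universal_threshold(len(detail))
kept = hard_threshold(detail, t)
shrunk = soft_threshold(detail, t)
```

With eight coefficients the threshold is about 2.04, so only the coefficients −2.9 and 4.0 survive. Hard thresholding leaves them unchanged, while soft thresholding reduces their magnitude by the threshold. For correlated noise the same functions would be applied per resolution level, with a separate σ for each level.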

Recently, a number of wavelet-based methods have been introduced to remove noise, while also deconvolving a blurring function (Dragotti and Vetterli 2002, Figueiredo and Nowak 2003, Neelamani et al. 2004, Johnstone et al. 2004, Kalifa et al. 2003). This is the main subject of the second part of the thesis. The difference between our application and the ones mentioned above is that usually in image enhancement applications, the effect of the impulse response function is simply removed. In our case, the signal is an fMRI time series, and the response function is the HRF, which is the object of interest. Wavelet-based extraction of waveforms has successfully been applied to electroencephalographic (EEG) data (Zibulevsky and Zeevi 2002). A major difference between our approach and those methods is that the FWT is not used. The FWT is not shift-invariant, therefore the SI-DWT is used instead.

In some situations, variations on the FWT algorithm yield much faster implementations than the original algorithm (Rioul and Duhamel 1992, Vetterli and Herley 1992). A wavelet transform with basis functions that do not have compact support is computed most efficiently via the frequency domain. The fast wavelet transform in the frequency domain is denoted by FWD, its inverse by FWR (Westenberg and Roerdink 2000). For both the FWT used in the first part of the thesis, and the SI-DWT used in the second part, implementations in both the frequency domain and the time domain are presented (see appendix B).

1.3 Thesis contribution and organisation

The remainder of this thesis is divided into two parts.

The first part deals with noise in fMRI data. Chapter 2 introduces a definition of BOLD noise derived from MR physics. MR (magnitude) images contain Rician noise (Gudbjartsson and Patz 1995): MR signals are measured in frequency space, and a complex value is measured at each voxel location. MR images are generally magnitude images, and if the (complex) noise is Gaussian distributed, its magnitude is Rician distributed. Most fMRI analysis methods, however, assume Gaussian distributed noise, without mentioning the Rician distribution of MR data. We define each BOLD fMRI image as the difference of two MR images: one measured during activation, the other measured during the baseline condition. The difference between two images containing Rician noise, with the same underlying image and the same signal-to-noise ratio (SNR) is, to close approximation, Gaussian distributed. We find that the probability density function of this difference (i) approximates a Gaussian very well, and (ii) actually decays faster to zero than a Gaussian. Therefore, this model agrees with the Gaussian noise model used in fMRI analysis methods. The mathematical derivations used to characterise the difference distributions are given in appendix A.

Chapter 3 tests a number of wavelet-based denoising schemes in the context of functional MR time series analysis, and compares them to Gaussian smoothing. Gaussian smoothing is the standard preprocessing method for removing noise from fMRI data (Gold et al. 1998), but smoothing changes the images in an irreversible way; spatial patterns of activity found in smoothed images are likely to differ from those found in the true underlying signal. Wavelet-based denoising methods are shown to keep improving the SNR up to higher input SNR values than Gaussian smoothing. In addition, the activation patterns found after denoising remain closer, in terms of false positives and false negatives, to the original images. Gaussian smoothing is a prerequisite if GRF theory is used for type I error control. The wavelet-based denoising routines are therefore combined with FDR control, which does not require any form of smoothing prior to the analysis.

The second part of the thesis introduces a new method to extract a HRF from an fMRI data set. The BOLD response is assumed to be LTI, and this property is used in chapter 4 to extract the HRF from an fMRI time series with a combination of frequency domain methods and wavelet domain methods. The ForWaRD method (Neelamani et al. 2004) requires only few assumptions about the shape of the HRF and is shown to be very robust. This method is adapted to extract the HRF from fMRI time series. The extracted HRF coefficients are used to fit a novel HRF model, which can be used in the analyses of other data sets. Combining the new model with the extracted HRFs proves to yield a powerful analysis tool in subsequent ANCOVAs of similar data sets.

Chapter 5 extends the ForWaRD-based HRF extraction routine to support families of wavelet basis functions which do not have compact support in the time domain. An efficient algorithm to compute the SI-DWT in the frequency domain is given in this chapter, using definitions from appendix B. The implementation of this algorithm facilitates the use of fractional spline wavelet bases, as introduced by Unser and Blu (2000). Comparisons of the computation times of the time-domain SI-DWT and the frequency-domain SI-DWT show that the frequency-domain version is much faster for long signals. The frequency-domain extraction method is tested in a similar way as the earlier version, and test results confirm its usability.

Chapter 6 contains a summary and general conclusions of the thesis, and gives recommendations for future research.

Denoising fMRI time series

Chapter 2

BOLD noise assumptions in fMRI

Abstract

This chapter discusses the assumption of Gaussian distributed noise in the blood oxygenation level dependent (BOLD) contrast, computed from functional MRI (fMRI) time series. The Rician distribution in MR images, which is used in MR physics, and the Gaussian distribution, used in functional neuroimaging, are combined by defining the BOLD contrast as the difference between two MR images. We review the properties of Rician noise, and we discuss its most important differences from Gaussian noise. In particular, Rician noise is multiplicative while Gaussian noise is additive. If an image is contaminated with Rician noise and Gaussian noise, respectively, the distribution of the difference between the image with Rician noise and the original is asymmetric, and the distribution of the difference between the image with Gaussian noise and the original is symmetric. Furthermore, signals with a Rician distribution are always greater than zero, while signals with additive Gaussian noise may be negative. We derive an analytic expression for the statistical distribution (the ‘BOLD distribution’) of the difference between two Rician distributed images in the form of an integral in terms of two underlying Rician probability densities. This distribution is shown to be symmetric, and an exact expression for its standard deviation in terms of modified Bessel functions is derived.

Statistical tests, analytical results and numerical computations show that the distribution of the difference between two images whose intensities follow a Rice distribution can be very well approximated by a Gaussian.

The approximation is closest for high SNR, but is still quite good for lower SNR. The BOLD noise model is tested on simulated and real MR images. Subtracting the time series mean does not get rid of the asymmetry in temporal noise. Instead, the best approach to get symmetric, nearly-Gaussian distributed noise is to subtract a second time series from the time series that is analysed.


2.1 Introduction

In the medical imaging literature, the magnitude signal in magnetic resonance (MR) images is assumed to follow a Rice distribution (Edelstein et al. 1983, Henkelman 1985, Gudbjartsson and Patz 1995, Sijbers et al. 1998b, Nowak 1999, Wood and Johnson 1999, Pizurica et al. 2003), first studied by Rice (Rice 1945, p.100-103) in his analysis of random noise. Functional magnetic resonance imaging (fMRI) measures the activity in different areas of the brain during different experimental conditions. In the statistical analysis of fMRI experiments, the noise is usually assumed to be Gaussian (Aguirre et al. 1997, Friston et al. 2000b, Fadili and Bullmore 2001, Friston et al. 1995b, Friston et al. 1995c, Worsley and Friston 1995). For this assumption to be valid, the difference between two images containing Rician noise must have a Gaussian distribution, as the blood oxygenation level dependent (BOLD) signal is computed as the difference between images scanned during different conditions: BOLD = active − baseline. Findings contradicting this assumption have already been published (Hanson and Bly 2001), and tests for the distribution of the residual (noise) signal in fMRI data sets have been developed (Luo and Nichols 2003). In this chapter, we examine the properties of Rician noise in MR images, and the distribution of the differences between pairs of MR images, for various signal-to-noise ratios (SNR). Most standard tests, such as the t-test, F-test, and the z-test, rely on Gaussian distributed noise. Petersson et al. (1999b) argue that after Gaussian spatial smoothing, with enough degrees of freedom and the multivariate central limit theorem, standard tests are still valid in functional neuroimaging, with the warning that low-count PET data show departures from normality. A similar phenomenon can be seen in functional MR images. The Rician probability density function is very asymmetric if the signal amplitude is small compared to the noise amplitude, so for low signal intensities and for images with a low SNR, the difference between Rician noise and Gaussian noise is most apparent.

The rest of this chapter is organised as follows. Section 2.2 briefly introduces the Rician noise model for MR images. Section 2.3 derives analytical expressions for the probability distribution of the difference between two MR images, which are verified in a series of tests on synthetic noise images. Section 2.4 investigates the noise distributions in MR template images contaminated with noise and in a real fMRI time series. Section 2.5 contains some general conclusions.

2.2 Noise in MR images

MR imaging is based on the fact that magnetic fields can be transmitted in pulses varying in frequency and phase. Voxel locations are selected by frequency and by phase, and the resulting image consists of complex values. The frequency space in which this image is represented is known as the k-space. The values in the real and imaginary parts of the image are both Gaussian distributed, due to the central limit theorem. The image is transformed to a Cartesian space via an inverse Fourier transform (IFT). The noise distribution in the image after the IFT is still Gaussian, because the IFT is a linear transform.

Most applications of MR imaging only use the complex magnitudes of the signal, because those magnitudes represent a physical property of the scanned object (Sijbers et al. 1998a). Let A(x) represent the magnitude of the MR signal at voxel location x in the absence of noise. The magnitude r(x) of the signal at voxel location x in the magnitude image is:

$$r(x) = \sqrt{\big(A(x) + n_1(x)\big)^2 + n_2(x)^2}, \qquad n_1(x),\, n_2(x) \sim N(0, \sigma), \tag{2.1}$$

where N(0, σ) is the Gaussian distribution with mean zero and standard deviation σ.

The magnitude signal in each voxel x is Rician distributed (Edelstein et al. 1983, Henkelman 1985, Gudbjartsson and Patz 1995), that is, Prob[r ≤ r(x) ≤ r + dr] = $p_{A(x),\sigma}(r)\, dr$, where $p_{A,\sigma}(r)$ is the Rician probability density with parameters A and σ defined by:

pA,σ(r) =

{0, r < 0

rσ2 e

−(A2+r2)

2σ2 I0(

A rσ2

), r ≥ 0,

(2.2)

where

$$I_k(z) = \frac{1}{\pi} \int_0^{\pi} e^{z\cos(\theta)} \cos(k\theta)\, d\theta \tag{2.3}$$

is the modified Bessel function of the first kind of order k, k ∈ N. Figure 2.1 shows the Rician probability density function (pdf) for varying values of A and σ. The shape of the pdf changes with both parameters. The distribution for A=0 is called the Rayleigh distribution. For high SNRs, the Rician distribution approaches a Gaussian distribution (Gudbjartsson and Patz 1995).
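The definitions (2.2) and (2.3) can be evaluated directly. The following is a minimal sketch, not the implementation used for the figures: the function names, the midpoint rule and the number of integration points are assumptions made for illustration. The Bessel function is computed in the scaled form $e^{-z} I_k(z)$ so that the exponentials in (2.2) combine into a Gaussian-shaped factor and do not overflow.

```python
import math


def bessel_i_scaled(k, z, n=200):
    """Return exp(-z) * I_k(z) for z >= 0, using the integral (2.3).

    The midpoint rule is applied to exp(z*(cos(t) - 1)) * cos(k*t) on [0, pi].
    n = 200 points is an illustrative choice, adequate for moderate z.
    """
    h = math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += math.exp(z * (math.cos(t) - 1.0)) * math.cos(k * t)
    return total * h / math.pi


def rician_pdf(r, A, sigma):
    """Rician density p_{A,sigma}(r) of (2.2); zero for negative r."""
    if r < 0:
        return 0.0
    s2 = sigma * sigma
    # (A^2 + r^2)/(2 s2) - A r/s2 = (r - A)^2/(2 s2), so the scaled Bessel
    # function pairs with a Gaussian-shaped factor centred on A.
    gauss_factor = math.exp(-((r - A) ** 2) / (2.0 * s2))
    return r / s2 * gauss_factor * bessel_i_scaled(0, A * r / s2)


if __name__ == "__main__":
    # Numerical check that the density integrates to about 1 (A = 2, sigma = 1).
    step = 0.01
    area = sum(rician_pdf((i + 0.5) * step, 2.0, 1.0) for i in range(1200)) * step
    print(f"integral of the pdf over [0, 12]: {area:.6f}")
```

For A = 0 the Bessel factor equals 1 and the function reduces to the Rayleigh density.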

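The mean $\mu_r = \int_0^\infty r\, p_{A,\sigma}(r)\, dr$ of the Rice distribution is given by (Rice 1945, p.100-103, Appendix 4B):

$$\mu_r = \sigma \sqrt{\pi/2}\; e^{-z^2/4} \left\{ \left(1 + z^2/2\right) I_0\!\left(z^2/4\right) + \left(z^2/2\right) I_1\!\left(z^2/4\right) \right\}, \tag{2.4}$$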

where z = A/σ is the SNR, and $I_1$ is the modified Bessel function of the first kind of order 1. The standard deviation $\sigma_r = \sqrt{\int_0^\infty r^2\, p_{A,\sigma}(r)\, dr - \mu_r^2}$ of the Rice distribution satisfies the relation (Gudbjartsson and Patz 1995):

$$\sigma_r = \sqrt{A^2 + 2\sigma^2 - \mu_r^2}. \tag{2.5}$$

In the limiting case A=0, these formulas reduce to

$$\mu_r = \sigma \sqrt{\pi/2}, \qquad \sigma_r = \tfrac{1}{2}\, \sigma \sqrt{8 - 2\pi}. \tag{2.6}$$
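As a numerical illustration of (2.4)-(2.6), the sketch below evaluates the mean and standard deviation of the Rice distribution for given A and σ. It is a sketch under stated assumptions: the helper names are hypothetical, and the Bessel functions are obtained from the integral (2.3) with a simple midpoint rule, which is only adequate for moderate SNR.

```python
import math


def bessel_i_scaled(k, z, n=200):
    """exp(-z) * I_k(z) for z >= 0 via the integral (2.3), midpoint rule."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += math.exp(z * (math.cos(t) - 1.0)) * math.cos(k * t)
    return total * h / math.pi


def rice_mean_std(A, sigma):
    """Mean (2.4) and standard deviation (2.5) of the Rice distribution."""
    z = A / sigma                      # the SNR
    q = z * z / 4.0
    # exp(-z^2/4) * I_k(z^2/4) is exactly the scaled Bessel function at q.
    bracket = (1.0 + z * z / 2.0) * bessel_i_scaled(0, q) \
        + (z * z / 2.0) * bessel_i_scaled(1, q)
    mu = sigma * math.sqrt(math.pi / 2.0) * bracket
    var = A * A + 2.0 * sigma * sigma - mu * mu
    return mu, math.sqrt(max(var, 0.0))  # guard against rounding below zero


if __name__ == "__main__":
    for A in (0.0, 2.0, 8.0):
        mu, sd = rice_mean_std(A, 1.0)
        print(f"A={A:g}, sigma=1: mean={mu:.4f}, std={sd:.4f}")
```

For A = 0 the function returns the Rayleigh values of (2.6); as A/σ grows, the mean moves towards A and the standard deviation towards σ.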

Figure 2.1. (a) Rician pdfs for σ² = 1 and different values of A: A ∈ {1, . . . , 6}. (b) Rician pdfs for A = 1 and different values of $\sigma_r^2$: $\sigma_r^2$ ∈ {1, . . . , 6}. (c) Rician pdfs for σ² = 4 and different values of A: A ∈ {1, . . . , 6}. (d) Rician pdfs for A = 4 and different values of $\sigma_r^2$: $\sigma_r^2$ ∈ {1, . . . , 6}.

The behaviour in the opposite limit of large SNR can be obtained by considering the asymptotic expansion of the Bessel functions (Abramowitz and Stegun 1972):

$$I_k(z) \sim \frac{e^z}{\sqrt{2\pi z}} \left(1 - \frac{4k^2 - 1}{8z}\right), \qquad z \to \infty. \tag{2.7}$$

This yields

$$\mu_r \approx A \left(1 + \frac{1}{2(A/\sigma)^2}\right), \tag{2.8}$$

$$\sigma_r \approx \sigma \sqrt{1 - \frac{1}{4(A/\sigma)^2}}. \tag{2.9}$$


As A/σ goes to infinity, these formulas yield $\mu_r \to A$, $\sigma_r \to \sigma$, that is, the mean approaches the noise-free intensity and the standard deviation approaches the corresponding value of the underlying noise distribution N(0, σ).

MR noise was modelled by computing the intensity distribution as in (2.1). The following procedure was used for adding noise to a real-valued image f(x) so that the resulting intensity has a Rician distribution. For each voxel location x:

1: draw $n_1(x),\, n_2(x) \sim N(0, \sigma)$,
2: compute $r(x) = \sqrt{(f(x) + n_1(x))^2 + n_2(x)^2}$.

As before, the noisy image is denoted by r(x). The local SNR is controlled through the ratio f(x)/σ, where f(x) and σ determine $\mu_r$ and $\sigma_r$ as described in (2.4) and (2.5).
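The two-step noise procedure can be sketched as follows. This is an illustrative version only: the image is represented as a list of rows, and the function name and the optional random generator argument are assumptions, not part of the original software.

```python
import math
import random


def add_rician_noise(image, sigma, rng=None):
    """Return a noisy copy of a real-valued 2D image (list of rows).

    For every voxel, two independent Gaussian samples with mean 0 and
    standard deviation sigma are drawn (step 1) and the magnitude of
    (f + n1, n2) is taken (step 2), as in (2.1).
    """
    rng = rng or random.Random()
    noisy = []
    for row in image:
        noisy.append([
            math.hypot(f + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for f in row
        ])
    return noisy


if __name__ == "__main__":
    rng = random.Random(42)                      # fixed seed for repeatability
    flat = [[0.0] * 64 for _ in range(64)]       # background-only image, A = 0
    noisy = add_rician_noise(flat, 1.0, rng)
    values = [v for row in noisy for v in row]
    print("sample mean for A = 0, sigma = 1:", sum(values) / len(values))
```

For a zero-valued image the sample mean should lie near the Rayleigh mean σ√(π/2) of (2.6).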

2.3 The BOLD noise distribution in fMRI: mathematical analysis

Most fMRI analysis methods are about statistics. Testing for activation in a certain brain area is done via hypothesis testing, the null hypothesis H0 stating that there is no activation. Other hypotheses claim different kinds of activation. For the use of most standard statistical methods to be justified, the noise in the activation images must be Gaussian. This section investigates whether this is really the case, by studying the noise distributions in these images under the null hypothesis of no brain activation, and taking into account the Rician nature of MR noise.

2.3.1 Distribution of the difference fMRI signal

First, note that the difference of two noisy versions $r_1(x)$ and $r_2(x)$ of the same underlying image f(x), both containing Rician noise, is not Rician distributed. Let the BOLD image s(x) be defined as the difference image $s(x) = r_2(x) - r_1(x)$. Its probability density is denoted by $C_{A,\sigma}(s)$, where we write A instead of f(x), and σ is the standard deviation of the underlying noise distribution, cf. (2.1). That is, $C_{A,\sigma}(s)$ is the probability that the value of the difference s(x) falls in an infinitesimal interval around s: $C_{A,\sigma}(s)$ = Prob[s ≤ s(x) ≤ s + ds]. We will refer to $C_{A,\sigma}(s)$ as the BOLD distribution.

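Denoting by δ(r) the Dirac delta function, we find

$$C_{A,\sigma}(s) = \int_0^\infty \!\! \int_0^\infty p_{A,\sigma}(r_1)\, p_{A,\sigma}(r_2)\, \delta(r_2 - r_1 - s)\, dr_1\, dr_2 = \int_0^\infty p_{A,\sigma}(r_1)\, p_{A,\sigma}(r_1 + s)\, dr_1, \qquad s \ge 0.$$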

Since it is clear that $C_{A,\sigma}(s)$ is symmetric, i.e., $C_{A,\sigma}(s) = C_{A,\sigma}(-s)$, we have the following expression valid for arbitrary values of s:

$$C_{A,\sigma}(s) = \int_0^\infty p_{A,\sigma}(r)\, p_{A,\sigma}(r + |s|)\, dr. \tag{2.10}$$

That is, $C_{A,\sigma}(s)$ is the cross-correlation of two identical Rice distributions. The mean $\mu_s$ and standard deviation $\sigma_s$ of the BOLD distribution $C_{A,\sigma}(s)$ are given by:

$$\mu_s = 0, \tag{2.11}$$

$$\sigma_s = \sqrt{2}\, \sigma_r, \tag{2.12}$$

where $\sigma_r$ is the standard deviation of the Rice distribution, see (2.5). For the proof, see section A.1 of the appendix.

In the case A=0, the pdf of $r_1$, as well as that of $r_2$, is:

$$p_{0,\sigma}(r) = \frac{r}{\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}. \tag{2.13}$$

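For the Rayleigh case (A=0), the integral in Eq. 2.10 can be explicitly evaluated. The result is

$$C_{0,\sigma}(s) = \frac{1}{2\sigma}\, e^{-\frac{s^2}{4\sigma^2}} \left( \frac{|s|}{2\sigma}\, e^{-\frac{s^2}{4\sigma^2}} + \frac{\sqrt{\pi}}{2} \left(1 - \frac{s^2}{2\sigma^2}\right) \operatorname{erfc}\!\left(\frac{|s|}{2\sigma}\right) \right), \tag{2.14}$$

where $\operatorname{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^\infty e^{-t^2}\, dt$ is the complementary error function (Abramowitz and Stegun 1972). For the derivation of this formula, we refer to section A.2 of the appendix.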
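The closed form (2.14) can be cross-checked against a direct numerical evaluation of the integral (2.10) with the Rayleigh density (2.13). The sketch below does this with a plain midpoint rule; the function names, the integration range of 12σ and the number of points are illustrative assumptions.

```python
import math


def rayleigh_pdf(r, sigma):
    """Rayleigh density p_{0,sigma}(r) of (2.13)."""
    if r < 0:
        return 0.0
    return r / sigma ** 2 * math.exp(-r * r / (2.0 * sigma ** 2))


def bold_pdf_numeric(s, sigma, n=4000):
    """C_{0,sigma}(s) from the integral (2.10), midpoint rule on [0, 12 sigma]."""
    s = abs(s)
    h = 12.0 * sigma / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += rayleigh_pdf(r, sigma) * rayleigh_pdf(r + s, sigma)
    return total * h


def bold_pdf_closed(s, sigma):
    """C_{0,sigma}(s) from the closed form (2.14)."""
    u = abs(s) / (2.0 * sigma)
    g = math.exp(-u * u)                      # exp(-s^2 / (4 sigma^2))
    # s^2 / (2 sigma^2) equals 2 u^2
    inner = u * g + 0.5 * math.sqrt(math.pi) * (1.0 - 2.0 * u * u) * math.erfc(u)
    return g * inner / (2.0 * sigma)


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(s, bold_pdf_numeric(s, 1.0), bold_pdf_closed(s, 1.0))
```

Integrating s² times the closed form over s gives the variance, which by (2.6) and (2.12) should equal (4 − π)σ² in the Rayleigh case.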

The following experiments investigate how well the pdf $C_{A,\sigma}(s)$ can be approximated by a Gaussian distribution.

2.3.2 Numerical approximation by a normal distribution

We numerically approximated the BOLD distribution $C_{A,\sigma}(s)$ as given by Eq. 2.10 by a Gaussian via the Levenberg-Marquardt nonlinear curve-fitting algorithm. The fit was carried out on an interval centred around zero with negligible function values outside this interval.
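A one-parameter version of such a fit can be sketched as follows for the Rayleigh case, where the closed form (2.14) is available. Everything about this sketch is an assumption made for illustration: the model is a normalised zero-mean Gaussian with the width w as the only free parameter, the interval is [−6, 6] sampled at 601 points, and the damped Gauss-Newton loop is a minimal Levenberg-Marquardt variant, not the routine used to produce the reported results.

```python
import math


def bold_pdf_rayleigh(s, sigma):
    """Closed form (2.14) of the BOLD distribution for A = 0."""
    u = abs(s) / (2.0 * sigma)
    g = math.exp(-u * u)
    inner = u * g + 0.5 * math.sqrt(math.pi) * (1.0 - 2.0 * u * u) * math.erfc(u)
    return g * inner / (2.0 * sigma)


def gaussian(s, w):
    """Normalised zero-mean Gaussian density of width w."""
    return math.exp(-s * s / (2.0 * w * w)) / (w * math.sqrt(2.0 * math.pi))


def fit_gaussian_width(xs, ys, w0=1.0, max_iter=100):
    """Least-squares fit of the width w with a damped Gauss-Newton update."""
    def sse(w):
        return sum((y - gaussian(x, w)) ** 2 for x, y in zip(xs, ys))

    w, lam, cost = w0, 1e-3, sse(w0)
    for _ in range(max_iter):
        jtj = jtr = 0.0
        for x, y in zip(xs, ys):
            g = gaussian(x, w)
            j = g * (x * x / w ** 3 - 1.0 / w)   # derivative of gaussian w.r.t. w
            jtj += j * j
            jtr += j * (y - g)
        step = jtr / (jtj * (1.0 + lam))         # damped normal equation (scalar)
        if abs(step) < 1e-12:
            break
        trial = w + step
        trial_cost = sse(trial) if trial > 0 else float("inf")
        if trial_cost < cost:                    # accept and relax the damping
            w, cost, lam = trial, trial_cost, lam / 10.0
        else:                                    # reject and increase the damping
            lam *= 10.0
    return w


if __name__ == "__main__":
    xs = [-6.0 + 0.02 * i for i in range(601)]
    ys = [bold_pdf_rayleigh(x, 1.0) for x in xs]
    print("fitted width for A = 0, sigma = 1:", fit_gaussian_width(xs, ys))
```

The fitted width depends on the interval and sampling chosen, so it should be read as indicative only when compared with the widths listed in Table 2.1.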

Fig. 2.2 shows the pdf $C_{A,\sigma}(s)$, as well as the Gaussian fitted to this distribution, for a number of values of A and σ. From the plots the fit appears to be excellent.

Table 2.1 presents some quantitative results. It shows, for various values of A and σ, (i) the exact standard deviation $\sigma_s = \sqrt{2}\,\sigma_r$ of the BOLD distribution (see (2.12)), where $\sigma_r$ was computed according to (2.4)-(2.5); (ii) the width $\sigma_{\mathrm{Gauss}}$ of the Gaussian fitted to $C_{A,\sigma}(s)$; and (iii) the mean square error (MSE) of the difference between $C_{A,\sigma}(s)$ itself and the fitted Gaussian. Note that the difference between the width $\sigma_{\mathrm{Gauss}}$ of the fitted Gaussian and the exact value $\sigma_s$ is quite small, especially for high SNR (i.e., A/σ). Since $\sigma_r$ approaches σ for high SNR (see section 2.2), $\sigma_s$ approaches $\sqrt{2}\,\sigma$ in this limit. The behaviour of the MSE, which decreases when the SNR increases, is in line with these observations. Below it is shown that the BOLD distribution does not have a heavy tail (its tail is slightly lighter than that of a Gaussian of width σ). Therefore, the function values outside the interval used in the fitting procedure will have a negligible effect on the determination of the p-values.

Figure 2.2. The exact BOLD distribution $C_{A,\sigma}(s)$ (solid line), the fitted Gaussian (dashed line), and the difference between $C_{A,\sigma}(s)$ and its Gaussian fit (dotted line), for A ∈ {0, 2, 8} (rows) and σ ∈ {1, 3, 5} (columns); horizontal axis: d, vertical axis: pdf. Due to the excellent match, the fitted Gaussian is hardly distinguishable from the exact BOLD distribution.

All this implies that statistical tests based on the assumption of normally distributed noise, with an estimated standard deviation $\sigma_{\mathrm{Gauss}}$ close to the exact value $\sigma_s$, are justified.

Table 2.1. Accuracy of Gaussian fits to the pdf $C_{A,\sigma}$ for various values of A and σ. Shown are the exact standard deviation $\sigma_s$ computed from (2.12), the width $\sigma_{\mathrm{Gauss}}$ of the fitted Gaussian, and the mean square error of the difference between the exact distribution and the fitted Gaussian.

A    σ    σs        σGauss    error
0    1    0.9265    0.9103    0.0080
0    3    2.7795    2.7315    0.0045
0    5    4.6325    4.5526    0.0035
2    1    1.2933    1.3071    0.0030
2    3    3.0463    3.0085    0.0035
2    5    4.8079    4.7291    0.0033
8    1    1.4086    1.4086    0.0000
8    3    4.0552    4.0780    0.0008
8    5    6.1567    6.2188    0.0013

2.3.3 Tail of the BOLD distribution

An important property of the pdf $C_{A,\sigma}(s)$ in the context of statistical neuroimaging is the behaviour of the tail (|s| approaching infinity) of this distribution under the null hypothesis, because this tail determines the p-values when thresholding at a certain significance level.

For the limiting cases of low and high SNR, i.e., A=0 and A/σ large, we mathematically analysed the behaviour of the pdf $C_{A,\sigma}(s)$ when |s| becomes very large. The details are presented in section A.3 of the appendix. We find that both in the Rayleigh case (A=0) and for large values of A/σ, the tails of the distribution (2.10) are lighter than the tail of a Gaussian distribution. More precisely,

$$C_{A,\sigma}(s) \sim \text{constant} \cdot \frac{1}{|s|}\, e^{-\frac{(|s| - A)^2}{2\sigma^2}}, \qquad s \to \infty, \tag{2.15}$$

where the constant depends on A and σ. This is a Gaussian tail of width σ multiplied by a factor 1/|s|, which means that the distribution approaches zero even faster than a Gaussian distribution of width σ.

2.3.4 Statistical tests of normality

An image of a uniform underlying intensity with Rician noise has a spatially constant noise distribution. The difference between two such images has a symmetric intensity distribution.

To test whether this distribution is (approximately) Gaussian, the Kolmogorov-Smirnov (KS) test was employed as follows. Rician distributed noise was added to two images of a uniform intensity A, and the difference between the noisy images was computed. The KS test was applied to this difference image. The null hypothesis of the KS test is that the data are normally distributed, and this is rejected if the p-value of the KS test statistic is below 0.05. For a number of intensities A, images of different sizes were tested, and for each size and intensity, the test was repeated 32 times. Table 2.2 shows the mean p-values of the KS test statistics for each size, with intensity A=1 and A=5. As a reference, 32 images of the same size containing N(0, 1) noise were also tested, and their mean p-values are in the rightmost column.
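A single run of such a test can be sketched as follows. Several choices here are assumptions, since the text does not specify them: the reference distribution is taken to be a Gaussian with mean 0 and the theoretical standard deviation $\sigma_s$, the demonstration uses A = 0 so that $\sigma_s = \sigma\sqrt{4 - \pi}$ follows from (2.6) and (2.12) without Bessel functions, and the p-value uses the asymptotic Kolmogorov series with a common small-sample correction factor.

```python
import math
import random


def rician_sample(A, sigma, rng):
    """One Rician distributed value with parameters A and sigma, cf. (2.1)."""
    return math.hypot(A + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def ks_test_normal(sample, mean, sd):
    """Kolmogorov-Smirnov statistic D and approximate p-value against N(mean, sd)."""
    n = len(sample)
    # Normal CDF at each sample value; sorting the CDF values is equivalent
    # to sorting the sample because the CDF is monotone.
    cdf = sorted(0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))
                 for v in sample)
    d = max(max((i + 1) / n - c, c - i / n) for i, c in enumerate(cdf))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0          # the series below is numerically 1 here
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    size = 32 * 32                                   # one 32 x 32 difference image
    diff = [rician_sample(0.0, 1.0, rng) - rician_sample(0.0, 1.0, rng)
            for _ in range(size)]
    d, p = ks_test_normal(diff, 0.0, math.sqrt(4.0 - math.pi))
    print(f"D = {d:.4f}, p = {p:.4f}")
```

Repeating the run with different seeds and averaging the p-values mirrors the procedure described above; if the reference parameters were instead estimated from the sample, the p-values from this formula would be too large.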

Table 2.2. p-values produced by the KS test for the difference between images with Rician distributed noise with signal amplitudes A=1 and A=5, and for images of the same size with N(0, 1)-noise.

size           p-value (A=1)    p-value (A=5)    p-value N(0, 1)
2 × 2          0.6573           0.5607           0.4569
4 × 4          0.5761           0.5565           0.4249
8 × 8          0.5511           0.5493           0.4894
16 × 16        0.5801           0.5564           0.5854
32 × 32        0.5833           0.5378           0.5946
64 × 64        0.5629           0.4869           0.4816
128 × 128      0.5270           0.5426           0.5147
256 × 256      0.4390           0.5554           0.5225
512 × 512      0.3210           0.5219           0.4006
1024 × 1024    0.0587           0.5236           0.5037

This table shows that for low intensities, deviations from normality can only be detected in very large images, and for higher intensities, they cannot be measured at all.

2.3.5 Parameter estimation in fMRI data based on the general linear model

As the last step in the analysis of the normal approximation to the BOLD distribution, the precision of the parameter estimation of the Gaussian that describes the temporal noise is tested, to see whether these values indeed correspond to those found in the numerical approximation.

Most methods for fMRI data analysis, such as statistical parametric mapping (Friston et al. 1995c), use the general linear model (GLM). The GLM treats fMRI responses as the outputs of a linear time-invariant (LTI) system. Given the measured data Y and a matrix X containing the components of the temporal response, the data are modelled by the equation

Y = Xβ + e, (2.16)

where the parameter matrix β contains the weight of each temporal component at each voxel, and the residual matrix e contains the part of the signal not modelled by any component in X. The ensuing statistical tests are usually based on the assumption of a Gaussian distribution of the values in e. Using the near-Gaussian distribution of one difference image, a matrix e was simulated by making a time series of 128 difference images, and the standard deviation of the temporal noise was computed in each voxel. Table 2.3 shows, for the same input A and σ as before, the measured temporal standard deviation $\sigma_{\mathrm{temp}}$, the mean standard deviation $\sigma_s$ in the difference images (which equals $\sqrt{2}\,\sigma_r$, see also section 2.3), and the ratio $\sigma_s/\sigma_{\mathrm{temp}}$. It shows that the estimated standard deviation approximates $\sigma_s$ very accurately.

Table 2.3. Measured temporal standard deviation $\sigma_{\mathrm{temp}}$, the mean standard deviation $\sigma_s$ in the difference images, and the ratio $\sigma_s/\sigma_{\mathrm{temp}}$.

A    σ    σs        σtemp     σs/σtemp
0    1    1.4280    1.4338    0.9959
0    3    4.2854    4.3017    0.9962
0    5    7.1352    7.1667    0.9956
2    1    1.8001    1.8071    0.9961
2    3    4.4708    4.4890    0.9959
2    5    7.2472    7.2815    0.9953
8    1    2.1579    2.1667    0.9959
8    3    5.7924    5.8157    0.9960
8    5    8.5195    8.5542    0.9959

2.3.6 Conclusion

From the statistical tests, the analytical results and the numerical computations, we conclude that the difference between two images whose intensities follow a Rice distribution can be very well approximated by a Gaussian distribution. The approximation is closest for high SNR, but is still quite good for lower SNR. Given the parameters A and σ of the Rician spatial noise in a series of MR images, the standard deviation of the Gaussian that describes the temporal noise can be accurately predicted.

2.4 The BOLD noise distribution in fMRI: experimental data

The noise distribution of an MR image with Rician noise is a sum of Rician distributions. For each grey value in the image, the noise is distributed differently (see Fig. 2.3). Areas with a ‘true’ grey value of 0, like the space around the body, have Rayleigh-distributed noise, and the areas with higher grey values have more symmetric distributions. These distributions are similar to a Gaussian, and they are centred around the grey value at that location. The total noise distribution is a mixture of all those distributions. The question is whether the conclusions about the noise in the difference image obtained in section 2.3 also hold for noise with mixed distributions.

Figure 2.3. Top: the histogram of a noise-free T2*-weighted MR image. Bottom: the histogram of the image with Rician noise of σ = 81.67 (SNR: 10 dB).

2.4.1 Shape of the noise distribution in MR images

A simulated MR image was acquired from the BrainWeb MRI simulator (Kwan et al. 1996), with the following parameters: modality T2, slice thickness 1 mm, noise 0%, intensity non-uniformity 0%. From this image, non-brain voxels were excluded with the Brain Extraction Tool (Smith 2002). The resulting image (see Fig. 2.4a) was used as a noise-free T2*-weighted MR image, to which Rician noise with a known σ was added (see Fig. 2.4b). A residual image was obtained by subtracting the original MR image from the noisy MR image, and a BOLD image was obtained using the definition proposed in section 2.3, i.e., as the difference between two MR images containing Rician noise.

Figure 2.4. (a) A noise-free T2*-weighted MR image. (b) Image (a) with Rician noise of σ = 81.67 (SNR: 10 dB).

The dissimilarity between a Rician distribution and a Gaussian is largest for low signal intensities A. The previous section showed that the difference between two images of a constant intensity A and Rician noise has zero mean and is near-Gaussian distributed, also for low signal intensities. This section examines the difference images when the noise-free images contain more than one intensity. Figure 2.5 shows the histograms of a noisy MR image, the difference between a noisy MR image and the noise-free image, and the difference between two noisy MR images, respectively. The histograms were computed for a range of values for σ and are presented together as surface plots. As σ decreases, the histogram of the noisy MR image changes from one Rayleigh-like pdf to a number of Gaussian pdfs (see Fig. 2.5a). The histogram of the noisy image after subtraction of the original is asymmetric for high σ, and becomes more symmetric as σ decreases (see Fig. 2.5b). The histogram of the difference images is symmetric for all σ (see Fig. 2.5c).

2.4.2 Time series of MR images

A time series of 164 EPI scans was made on a 3T Intera scanner (Philips Medical Systems, the Netherlands). The repetition time TR was 3 s, and the data volumes were 64×64×46 voxels of 3.5 × 3.5 × 3.5 mm³. No stimuli were presented, so the null hypothesis of no activation was assumed to be true throughout the experiment. Alignment of the images was done with the SPM'99 program (Friston et al. 1995c).

Figure 2.5. (a) Histogram of a noisy MR image, (b) histogram of the difference between a noisy MR image and the noise-free MR image, and (c) histogram of the difference of two noisy MR images, for various σ. Top: surface plots, bottom: grey value maps.

The time series was split in two disjoint sets: TS1 (images 1 . . . 82) and TS2 (images 83 . . . 164). First, the noise of TS1 was centred around 0 by subtracting the time series mean image of TS1 from each image. Note that although this is a common procedure in fMRI analysis, this means treating Rician noise as additive noise. Based on the results of the previous sections, an image with a symmetric noise distribution was obtained by subtracting, from each image in TS1, the corresponding image in TS2.

The histogram of the time series mean image (see Fig. 2.6) was used to divide the images in the time series into three intensity ranges: low intensity (grey value 0 . . . 300), medium intensity (grey value 301 . . . 600), and high intensity (grey value > 600).

Figure 2.7 shows the histograms of the grey values in the resulting time series within the three ranges. Gaussians were fitted to the histograms with the Levenberg-Marquardt curve-fitting algorithm. For medium and high intensities, the time series histograms show no significant asymmetries. For low intensities however, the time series TS1 after subtracting the mean has an asymmetric histogram, while the time series TS1 after subtracting TS2 has a symmetric histogram. The fits are never perfect, except in the case of low intensities and after subtracting TS2. In that case, the intensity distribution has one predominant intensity (A=0, see Fig. 2.6), and the difference distribution is close to those in the A=0 cases of the previous section. In the other cases, the noise originates from voxels with various intensities, and the noise distribution resembles a mixture of Gaussians with mean µ=0 and various σ.

Figure 2.6. Histogram of the time series mean image of TS1.

Because the amount of asymmetry in the medium and high grey-value ranges is very small, the combination of thresholding and subtracting the time series mean may solve most of the problems concerning the distribution of BOLD noise. However, the new method presented in this chapter of subtracting a second time series is preferable: it has proved to yield symmetric noise distributions in all measurements considered.

2.5 Conclusions

We have presented a BOLD noise model that takes into account the Rician distribution of MR noise known from the fMRI literature. The BOLD noise signal was defined as the difference between two MR images with Rician noise. We concentrated on the properties of the BOLD image under the null hypothesis of no brain activation. The problem was studied in several complementary ways: analytical calculation, numerical simulation, statistical estimation, and experimental validation on fMRI data.

An analytic expression was derived for the statistical distribution $C_{A,\sigma}(s)$ of the BOLD signal as an integral in terms of two underlying Rician probability densities with parameters A and σ. From this basic formula, analytical expressions were derived for the mean and standard deviation of the BOLD distribution, as well as for its tail, i.e., its asymptotic behaviour as s goes to infinity.

Next, a numerical approximation of the BOLD distribution $C_{A,\sigma}(s)$ by a Gaussian function was obtained via the Levenberg-Marquardt nonlinear curve-fitting algorithm. The approximation by a Gaussian distribution was very good, with the accuracy of the approximation increasing as the SNR, i.e., A/σ, becomes larger. Also, the standard deviation of the fitted Gaussian was found to be in excellent agreement with the exact standard deviation $\sigma_s$ derived from the analytical expressions.

Figure 2.7. Histograms of three intensity ranges of the images in the time series. Top: time series TS1 after subtracting the time series mean: (a) low intensity, (b) medium intensity, (c) high intensity. Bottom: time series TS1 after subtracting the corresponding images of time series TS2: (d) low intensity, (e) medium intensity, (f) high intensity.

The statistical properties of BOLD noise were examined in two ways. First we applied the Kolmogorov-Smirnov test to difference images of noise-only images with Rician distributed noise. Another test, performed using the general linear model (GLM), compared the estimated noise parameters to the value predicted by the model, and showed that the agreement is excellent.

From the analytical results, the numerical computations, and the statistical tests, we concluded that the assumption of Gaussian distributed noise used in the fMRI literature could be justified. That is, the difference between two images whose intensities follow a Rice distribution can be very well approximated by a Gaussian distribution. The approximation is closest for high SNR, but is still quite good for lower SNR. Given the parameters A and σ of the Rician spatial noise in a series of MR images, the standard deviation of the Gaussian that describes the temporal noise can be accurately predicted.

The BOLD noise model was tested on simulated and real MR images. In a test that contaminated noise-free MR templates with Rician noise, MR noise was shown to have an asymmetric distribution when it is (incorrectly) treated as additive noise. As in the test with noise-only images, difference images of noisy MR pictures were found to have a symmetric distribution. The consequence for fMRI time series analysis is that subtracting the time series mean does not get rid of the asymmetry in temporal noise. We also considered thresholding the MR images as a fast and simple alternative, which can to some extent remove asymmetry in the noise distribution. However, the best approach to get symmetric, nearly Gaussian distributed noise is to subtract a second time series from the time series being analysed.

Chapter 3

Denoising functional MR images: a comparison of wavelet-based denoising and Gaussian smoothing

Abstract

We present a general wavelet-based denoising scheme for functional magnetic resonance imaging (fMRI) data and compare it to Gaussian smoothing, the traditional denoising method used in fMRI analysis. One-dimensional WaveLab thresholding routines were adapted to two-dimensional images, and applied to 2D wavelet coefficients. To test the effect of these methods on the signal-to-noise ratio (SNR), we compared the SNR of 2D fMRI images before and after denoising, using both Gaussian smoothing and wavelet-based methods. We simulated an fMRI series with a time signal in an active spot, and tested the methods on noisy copies of it. The denoising methods were evaluated in two ways: by the average temporal SNR inside the original activated spot, and by the shape of the spot detected by thresholding the temporal SNR maps. Denoising methods that introduce much smoothness are better suited for low SNRs, but for images of reasonable quality they are not preferable, because they introduce heavy deformations. Wavelet-based denoising methods that introduce less smoothing preserve the sharpness of the images and retain the original shapes of active regions. We also performed statistical parametric mapping (SPM) on the denoised simulated time series, as well as on a real fMRI data set. False discovery rate control was used to correct for multiple comparisons. The results show that the methods that produce smooth images introduce more false positives. The wavelet-based methods that smooth less, although generating more false negatives, produce a smaller total number of errors than Gaussian smoothing or wavelet-based methods with a large smoothing effect.


3.1 Introduction

Functional neuroimages often need preprocessing before being subjected to statistical analysis. A common preprocessing step is denoising, which is usually done via Gaussian smoothing. Smoothing suppresses noise, but it also changes the intensity variation of the underlying image. This suppresses, or even removes, detailed features of the original image. In this chapter, we study wavelet-based denoising as a possible alternative to Gaussian smoothing. Wavelet-based denoising has the advantage over low-pass filtering that relevant detail information is retained, while small details, due to noise, are discarded. The performance of both approaches is compared with respect to (i) the improvement of the signal-to-noise ratio (SNR), (ii) the preservation of the shapes of active regions during the denoising process, and (iii) the improvement in the statistical analysis via statistical parametric mapping (SPM).

The focus of this chapter is on functional magnetic resonance imaging (fMRI) time series. In an fMRI experiment, a person lying inside an MRI scanner is asked to perform a certain task while a series of scans of the brain are made. Brain regions involved in this task show increased concentrations of oxygenated blood, inducing local signal changes (Turner et al. 1998). These signal changes are referred to as the blood oxygenation level dependent (BOLD) contrast, and detecting and characterising these changes is the main goal of fMRI time series analysis.

Most of the standard statistical tests assume Gaussian distributed noise. However, in the MR literature, noise in MR images is shown to be Rician distributed (Gudbjartsson and Patz 1995). We analyse the BOLD contrast as the difference between two MR images (active minus baseline) both containing Rician distributed noise, and show that the distribution of BOLD noise is a close approximation of a Gaussian distribution. Thus, the standard tests requiring normally distributed noise can still be used.

The use of wavelets for the statistical analysis of fMRI and positron emission tomography (PET) studies is not new. Feilner et al. (1999) use the wavelet transforms of difference images constructed from epoch-related fMRI experiments. Assuming a normal distribution of values in the difference images, activation is found by applying a t-test to the wavelet coefficients, using Bonferroni correction for multiple testing. The statistical map is found by applying the inverse wavelet transform. Ruttimann et al. (1998) follow a similar approach. Their algorithm performs a two-stage test in the wavelet domain. The first test analyses the wavelet coefficients per direction channel: the coefficients are ordered by resolution and by direction (horizontal, vertical, and diagonal). It assumes the cumulative energy in each direction channel to be χ²-distributed. All coefficients in a direction channel at a certain resolution are discarded if its cumulative energy is lower than the value predicted via this χ²-distribution. The second test thresholds the wavelet coefficients in the remaining channels individually via a two-sided z-test. Both the channelwise test and the voxelwise test use the Bonferroni correction for multiple testing. The inverse wavelet transform is applied to the output of the second test, yielding an activation map. Raz and Turetsky (1999) perform an analysis of variance (ANOVA) in the wavelet domain, by thresholding the wavelet coefficients according to their score in a statistical test. The testing is done blockwise: at the lowest resolution, each coefficient is a block, and at higher resolutions the same number of blocks is used. The false discovery rate (FDR) is used to correct for multiple testing. Hilton et al. (1996) use a wavelet-based denoising procedure known from the WaveLab project (Buckheit and Donoho 1995), an open source collection of wavelet routines, and compare this to their own data analytic thresholding procedure. The denoised time series are subjected to statistical testing by means of a voxelwise t-test. Turkheimer et al. (2000) model PET images in wavelet space by applying statistical models to the frame-by-frame wavelet transformations of PET time series.

The main contribution of this chapter is an extensive comparison of wavelet-based denoising and Gaussian smoothing, which is the standard denoising tool for functional neuroimages. All wavelet-based denoising methods mentioned above except the one by Hilton et al. (1996) perform the ensuing statistical tests in the wavelet domain. We favour the approach used by Hilton et al. for two reasons. First, performing a statistical test in the original domain enables a comparison between the wavelet-based methods and Gaussian smoothing as preprocessing steps. The statistical analysis process is exactly the same for all data sets and can be kept outside the discussion. Secondly, performing the statistical test in the wavelet domain requires an inverse wavelet transform afterwards, which spreads out the activation in the final statistical map. Whether or not another threshold is needed on this map before display is questionable. Separating the denoising and the statistical analysis has another advantage. The data sets used in this study only require a simple statistical test, but recent fMRI experiments often require much more complex procedures. It is not likely that all these tests can be done in the wavelet domain. However, if the denoised images are transformed back to the original domain, this problem does not occur.

Another difference between the current study and previous publications on this subject is that we include tests on simulated time series of which the SNRs and noise characteristics are known. Our definition of the BOLD signal allows a very precise characterisation of the noise in all test cases, so that the effect of each method on the SNR can be accurately determined.

Thirdly, we simulate brain activity in the time series by superimposing a time signal on a selected area. From the difference between the shape of the original active spot and the shape of the spot detected by statistical parametric mapping, we can make quantitative analyses of the denoising methods in terms of false positive and false negative error rates.

The remainder of this chapter is organised as follows. Section 3.2 reviews a number of procedures to correct for multiple hypothesis testing. Section 3.3 first describes the noise in MR images, and introduces BOLD noise as the noise in the difference of two MR images. In section 3.4, we present the wavelet-based denoising methods available in WaveLab (Buckheit and Donoho 1995). Adjustments have been made to these methods, to (i) make them suitable for processing 2D images and (ii) support noise with unknown autocorrelations. These denoising methods, as well as various degrees of Gaussian smoothing, are tested on 2D images in section 3.5. In sections 3.6 and 3.7, the wavelet-based and Gaussian methods are tested on an artificial time series, and compared in terms of their effects on the temporal SNR of the denoised time series and on the quality of the statistical parametric map. Finally, we compare the effects of these methods in a statistical analysis of a real fMRI data set in section 3.8. Section 3.9 contains some general conclusions.

3.2 Thresholding statistical maps: multiple hypotheses

Neuroimage analysis often entails hypothesis testing. Consider an experiment in which a subject is asked to perform a task while being recorded by the MRI scanner. The null hypothesis H0 states that a brain region is not involved in that task. There may be more than one alternative hypothesis, indicating different patterns of activity. In general, rejecting H0 means that brain activation related to the experiment has been detected. If a large number of hypothesis tests are done simultaneously, the expected number of rejected null hypotheses increases. This introduces the risk of false positives, also called type I errors (see Table 3.1).

Table 3.1. Classifications and misclassifications in statistical tests.

             inactive voxel    active voxel
keep H0      correct           type II error
reject H0    type I error      correct

Statistical parametric mapping, SPM for short (Friston et al. 1995c), is the common method to analyse functional neuroimages. SPM is based on the general linear model, which states that the response in an fMRI experiment can be written as a weighted sum of explanatory signals. Let the matrix Y [T×N] denote the fMRI data measured in the experiment, where each matrix element y_ij denotes the value measured at time i = 1, . . . , T and voxel location j = 1, . . . , N. According to the general linear model,

Y = Xβ + e, (3.1)

where X [T×M] is a matrix, called the design matrix, whose M column vectors are the signals that represent the modelled effects, called the explanatory variables. The row vectors of the matrix β [M×N] are the weighting factors for those signals, and the values in the matrix e [T×N] are the residual errors of each voxel in each scan. A least-squares estimate b for β is given by $b = (X^T X)^{-1} X^T Y$. Given a model of e, the significance of the coefficients of b, and thus of the modelled effects, can be found in each voxel via hypothesis testing.
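As an illustration of Eq. (3.1), the following minimal sketch estimates b for a toy data set. The sizes, the ±1 block regressor, the constant column, and the noise level are assumptions chosen for the example and are not taken from the experiments in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

T, N = 64, 5                      # scans, voxels (toy sizes, assumed)
# Design matrix X [T x M]: a +/-1 block regressor and a constant term.
block = np.where((np.arange(T) // 8) % 2 == 0, -1.0, 1.0)
X = np.column_stack([block, np.ones(T)])

beta_true = np.zeros((2, N))
beta_true[0, 0] = 2.0             # only voxel 0 responds to the task
beta_true[1, :] = 10.0            # baseline intensity
Y = X @ beta_true + rng.normal(0.0, 1.0, size=(T, N))

# Least-squares estimate b = (X^T X)^{-1} X^T Y, computed with lstsq
# rather than an explicit inverse for numerical stability.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

residuals = Y - X @ b                      # estimate of e
dof = T - np.linalg.matrix_rank(X)         # residual degrees of freedom
s2 = (residuals ** 2).sum(axis=0) / dof    # per-voxel sample variance
```

The per-voxel variance `s2` is the quantity that a subsequent t-test would use in place of the unknown noise variance.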

A statistical parametric map of N voxels consists of the p-values p_i, 1 ≤ i ≤ N. Given a distribution of outcomes, a p-value is the probability of getting an outcome at least as extreme as the one observed when the null hypothesis H0 is correct. The SPM method allows for many statistical tests (t-tests, analysis of (co-)variance, regression analysis). In this chapter, we will only discuss the one-sample t-test. The temporal noise in fMRI data sets is assumed to be Gaussian distributed, N(µ, σ²). The null hypothesis H0 states that µ = 0. We test for increased activation, which means that we perform a one-sided test: H1 states that µ > 0. We do not know the real σ² of the temporal noise distribution, so it must be estimated via the sample variance s², which can be computed using the residual time signals in e. Using this estimate, we can test for increased activation via the t-test. BOLD contrasts are constructed as linear combinations of rows of b (each of which is an image of N voxels), and their values are t-distributed. The relation between p-values and t-values is as follows. If a t-value in the BOLD contrast lies in the upper fraction α of the distribution, its p-value is below α. In other words: a small p-value provides strong evidence against the null hypothesis. Active voxels are those with p-values below a significance level α. For one test, α is the probability of erroneously rejecting H0.

Testing multiple independent hypotheses with the same significance level α leads to false positives. For one test, a level of 0.05 is acceptable, but for N simultaneous tests in which H0 holds, approximately 0.05N false positives are expected. Simultaneous tests deal with the omnibus null hypothesis (Holmes 1995), which states that there is no activation in any of the individual tests. Testing the omnibus null hypothesis at level α can be used to decide if there is activity in the image, but not where it is. The omnibus null hypothesis is said to have weak type I error control.

One way to deal with this is Bonferroni correction, where the significance level α is replaced by α/N. This guarantees that the probability of one or more false positives does not exceed α in any subset of the simultaneous tests. Bonferroni is therefore said to have strong type I error control (Genovese et al. 2002), meaning that rejecting H0 in a certain region in the brain is evidence for activation in that very region. Bonferroni correction not only affects the number of type I errors: reducing the probability of rejecting the omnibus null hypothesis also reduces the number of true positives. This introduces false negatives, or type II errors (see Table 3.1). For fMRI signals, which are spatially correlated, Bonferroni correction may be too conservative.
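The effect of the two thresholds can be illustrated with simulated p-values. The sketch below assumes that H0 holds in every voxel, so that the p-values are uniformly distributed on [0, 1]; the number of voxels is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 100_000, 0.05
p = rng.uniform(size=N)           # p-values when H0 holds in every voxel

# Uncorrected threshold: roughly alpha * N false positives.
uncorrected = int((p < alpha).sum())

# Bonferroni threshold alpha / N: false positives become rare.
bonferroni = int((p < alpha / N).sum())
```

With these settings the uncorrected count is close to αN = 5000, whereas the Bonferroni count is zero in most runs.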

In most neuroimage analysis programs that use the SPM method for the statistical part of the analysis, images are smoothed with Gaussian filters (Gold et al. 1998). The motivation for this is twofold: (i) it increases the SNR, and (ii) it controls the smoothness of the noise in the images when viewed as a lattice representation of a continuous, stationary Gaussian random field (GRF). In order to use GRF theory, smoothing may be necessary to bring the data more into agreement with the model assumptions. Once the smoothness of an image is known or controlled via filtering, threshold values for statistical maps can be computed using the Euler characteristic of Gaussian random fields to correct for multiple testing (Friston et al. 1994, Worsley et al. 1996). The method has strong type I error control. This approach has two drawbacks. Firstly, even after filtering, the noise in the smoothed images often still differs from a GRF. Noise and signal are smoothed together, so smoothing makes it even more difficult to separate signal from noise. The underlying image is not likely to represent a continuous GRF, so the corrected threshold is likely to be biased. This will influence all corrected p-values. This problem is even more serious when the smoothing kernel has another full width at half maximum (FWHM) than the intrinsic FWHM of the underlying images. Secondly, the smoothing process suppresses and removes details in the images. This hampers the detection of detailed regions during subsequent analysis.

The false discovery rate (FDR) is another alternative for multiple test correction that is also applied in functional neuroimaging (Genovese et al. 2002). It does not require spatial smoothness. The FDR is defined as the expected proportion of false positives among the rejected null hypotheses (Benjamini and Hochberg 1995):

$$\mathrm{FDR} = E\left(\frac{\#\text{type I errors}\ (= \#\text{false positives})}{\#H_0\ \text{rejected}\ (= \#\text{detections})}\right), \qquad (3.2)$$

E denoting expectation, and is identical to 0 when #detections = 0. The following algorithm results in an FDR approximately equal to q, with 0 ≤ q ≤ 1. Given N voxels with p-values p_1, . . . , p_N and an FDR parameter q, an FDR-controlling threshold selection procedure is given by:

1: define $\eta(N) = \begin{cases} 1, & \text{when the p-values are uncorrelated} \\ \sum_{i=1}^{N} 1/i, & \text{otherwise} \end{cases}$
2: order the p-values so that $p_i \le p_{i+1}$ for every $0 < i < N$;
3: let r be the largest i for which $p_i \le q\,i / (N \eta(N))$;
4: reject the null hypotheses of the voxels with $p_i \le p_r$.
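The four steps above can be sketched as follows. The function name `fdr_threshold` and the convention of returning 0.0 when no hypothesis is rejected are choices made for this example only.

```python
import numpy as np

def fdr_threshold(p, q, correlated=False):
    """Return the threshold p_r of the procedure, or 0.0 if nothing is rejected."""
    p = np.sort(np.asarray(p, dtype=float).ravel())        # step 2
    n = p.size
    i = np.arange(1, n + 1)
    eta = np.sum(1.0 / i) if correlated else 1.0           # step 1
    below = np.nonzero(p <= q * i / (n * eta))[0]          # step 3
    return float(p[below[-1]]) if below.size else 0.0

p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
thr = fdr_threshold(p_values, q=0.05)
rejected = np.asarray(p_values) <= thr                     # step 4
```

For this toy list, only the two smallest p-values lie on or below the line q i/N, so two null hypotheses are rejected. With `correlated=True` the line is lowered by the factor η(N) and only the smallest p-value survives.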

Figure 3.1 shows the expected distribution of p-values if H0 is true (solid line), the upper threshold line for the p-values if H0 is not true (dashed line), and the p-values (· and +).

This method has weak type I error control. Notice that, when η(N) = 1, the graph of qi/N vs. i is a straight line from (1, q/N) to (N, q), and that q/N and q correspond to the Bonferroni-corrected threshold and the ‘omnibus’ threshold, respectively. If $\eta(N) = \sum_{i=1}^{N} 1/i$, the method is much more conservative (Genovese et al. 2002), so the η(N) = 1 option is preferable if it is allowed.

This correction method has a number of advantages over Bonferroni correction and correction based on GRF theory. It is less conservative than Bonferroni correction and it does not require smoothing, in contrast to GRF theory. The most important advantage is its adaptivity: the threshold is selected on the basis of the distribution of p-values, so after hypothesis testing. Therefore, it can be applied to any set of p-values resulting from a statistical test. It is independent of the type of test and the number of hypotheses, so that comparisons between studies with equal FDRs are possible.

The value η = 1 is not only valid for uncorrelated p-values, but also for sets of p-values that are positive regression dependent within subsets (PRDS). Genovese et al. (2002) explain the PRDS property briefly, and they argue that statistical parametric maps have this property. In section 3.7 we discuss the distribution of the p-values under the null hypothesis and their spatial correlation in greater detail. A uniform distribution of p-values under the null hypothesis proves the validity of the statistical test.


Figure 3.1. Graphical representation of the FDR-controlling threshold selection. ·: H0 rejected, +: H0 accepted. [Plot of the ordered p-values p(n) against n/N, n = 1..N, together with the lines p(n) = n/N and p(n) = q × n/N, q = 0.1.]

The spatial correlation of the residual noise is tested because a time series with Gaussian noise that is positively correlated among voxels is PRDS (Genovese et al. 2002).

3.3 Noise models for fMRI

The computation of p-values in fMRI is usually done with standard tests, such as the t-test or F-test. The use of these tests is justified by assuming the BOLD noise to be Gaussian under the null hypothesis. In the MR literature, however, the noise in MR images is assumed to be Rician distributed (Edelstein et al. 1983, Gudbjartsson and Patz 1995, Sijbers et al. 1998a). Rician noise differs from Gaussian noise in that it is multiplicative instead of additive, i.e., it depends on the signal intensity, and the probability density function (pdf) of the noise is very asymmetric for low signal intensities.

The difference between two images with Rician distributed noise has a symmetric distribution (see Fig. 3.2). In such difference images, the distribution of noise is very close to Gaussian noise, as can be seen in Table 3.2. This table shows, for each listed image size, the mean p-values of the Kolmogorov-Smirnov (KS) test statistic on 32 images of that size, for (a) the difference between two images with Rician noise and signal intensity one, (b) the difference between two images with Rician noise and signal intensity five, and (c) an image containing N(0, 1) noise. The null hypothesis of the KS test is that the noise is normally distributed, and it is rejected if the p-value of the KS test statistic is below 0.05. For very low signal intensities, a deviation from Gaussianity is noticeable only in very large images. We conclude that it is safe to use techniques based on the assumption of Gaussian noise for BOLD images. A more detailed analysis of BOLD noise is given in chapter 2.

Figure 3.2. (a) Rician pdfs for different signal intensities. Higher intensities have noise distributions similar to a Gaussian. (b) pdfs of the difference of two Rician distributed sets for a fixed signal intensity, with different standard deviations. (c) Gaussian pdfs, with different standard deviations.

Table 3.2. p-values produced by the KS test for the difference between images with Rician distributed noise (R), with signal amplitudes (A) of 1 and 5, and for images of the same size with N(0, 1)-noise (N).

size           p-value R(A=1)   p-value R(A=5)   p-value N
2 × 2          0.6573           0.5607           0.4569
4 × 4          0.5761           0.5565           0.4249
8 × 8          0.5511           0.5493           0.4894
16 × 16        0.5801           0.5564           0.5854
32 × 32        0.5833           0.5378           0.5946
64 × 64        0.5629           0.4869           0.4816
128 × 128      0.5270           0.5426           0.5147
256 × 256      0.4390           0.5554           0.5225
512 × 512      0.3210           0.5219           0.4006
1024 × 1024    0.0587           0.5236           0.5037
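The symmetry argument can be checked numerically. The sketch below does not reproduce the KS test of Table 3.2; as a simpler stand-in it compares the sample skewness of a single Rician image with that of the difference of two such images. The helper names, the image size, and the choice σ = 1 are assumptions made for the example.

```python
import numpy as np

def rician(signal, sigma, shape, rng):
    """Magnitude of a signal with i.i.d. Gaussian noise in two channels."""
    n1 = rng.normal(0.0, sigma, shape)
    n2 = rng.normal(0.0, sigma, shape)
    return np.sqrt((signal + n1) ** 2 + n2 ** 2)

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    d = x - x.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

rng = np.random.default_rng(2)
shape = (256, 256)
a = rician(1.0, 1.0, shape, rng)      # low signal intensity, A = 1
b = rician(1.0, 1.0, shape, rng)

skew_single = skewness(a)             # clearly positive: asymmetric pdf
skew_diff = skewness(b - a)           # close to zero: symmetric pdf
```

A skewness near zero is necessary but not sufficient for Gaussianity, which is why the text relies on the KS test for the actual comparison.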

The BOLD effect involves spatial autocorrelation due to the spatial extent of neuronal events, but this autocorrelation is not exactly known (Malonek and Grinvald 1996, Worsley et al. 1997). We tested two types of spatial correlation: white noise, and 1/f noise, which has a 1/f power spectrum. The motivation for the latter type of noise is that, due to the MR frequency encoding, a unit pulse gets the shape of a peak with exponential slopes (Panych 1996). Section 3.5 describes how we simulated MR noise.


3.4 Wavelet-based denoising

Wavelet bases are bases of nested function spaces, which can be used to analyse signals at multiple scales. Wavelet coefficients carry both time and frequency information, as the basis functions vary in position and scale. The fast wavelet transform (FWT) efficiently converts a signal to its wavelet representation (Mallat 1989). In a one-level FWT, a signal $c^0$ is split into an approximation part $c^1$ and a detail part $d^1$. In a multilevel FWT, each subsequent $c^j$ is split into an approximation $c^{j+1}$ and detail $d^{j+1}$. For 2D images, each $c^j$ is split into an approximation $c^{j+1}$ and three detail channels $d^{j+1}_1$, $d^{j+1}_2$, and $d^{j+1}_3$, for horizontally, vertically, and diagonally oriented details, respectively (see Figs. 3.3b and 3.5a). The inverse FWT (IFWT) reconstructs each $c^j$ from $c^{j+1}$ and $d^{j+1}$. If the wavelet basis functions do not have compact support, the FWT is computed most efficiently in the frequency domain. This transform and its inverse are called the Fourier-wavelet decomposition (FWD) and Fourier-wavelet reconstruction (FWR), respectively; see the paper by Westenberg and Roerdink (2000) for more details.
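The splitting of a signal into approximation and detail parts can be illustrated in a few lines. The sketch below uses the orthonormal Haar wavelet purely because it is the shortest possible example; it is not the cubic spline basis or the frequency-domain implementation used in this chapter, and the function names are hypothetical.

```python
import numpy as np

def haar_fwt(c):
    """One-level orthonormal Haar split of an even-length signal."""
    c = np.asarray(c, dtype=float)
    approx = (c[0::2] + c[1::2]) / np.sqrt(2.0)   # next approximation
    detail = (c[0::2] - c[1::2]) / np.sqrt(2.0)   # next detail part
    return approx, detail

def haar_ifwt(approx, detail):
    """Reconstruct a signal from its approximation and detail parts."""
    c = np.empty(2 * approx.size)
    c[0::2] = (approx + detail) / np.sqrt(2.0)
    c[1::2] = (approx - detail) / np.sqrt(2.0)
    return c

def multilevel_fwt(c0, levels):
    """Repeatedly split the approximation; return it with all detail parts."""
    c, details = np.asarray(c0, dtype=float), []
    for _ in range(levels):
        c, d = haar_fwt(c)
        details.append(d)
    return c, details

c0 = np.arange(8.0)
c3, details = multilevel_fwt(c0, 3)

# Inverse transform: start from the coarsest level and work back.
rec = c3
for d in reversed(details):
    rec = haar_ifwt(rec, d)
```

Because the basis is orthonormal, the energy of `c0` equals the summed energy of `c3` and all detail parts, and `rec` reproduces `c0`.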

(a) (b)

Figure 3.3. (a) A simulated MR image. (b) A 2D nonstandard fast wavelet transform (FWT) of (a).

3.4.1 Wavelet bases

As it is difficult to mathematically characterise functional brain signals, a basis with general properties is preferable. Of the common wavelet bases, such as Daubechies wavelets (Daubechies 1988), symlets, coiflets (Daubechies 1993), and splines, spline bases have been shown to possess the best approximation properties, such as the smallest L2 error (Unser 1999). Because of their smoothness, splines are well localised in both the frequency and time domains. Earlier studies about the use of wavelets in fMRI analysis (Ruttimann et al. 1998, Turkheimer et al. 1999) favour the use of symmetric wavelets and scaling functions, because they do not introduce phase distortions. Orthogonal bases are recommended, because they transform white noise into white noise (Jansen and Bultheel 2001). Unser and Blu (2000) have proposed an FWT that uses fractional spline wavelets. Fractional splines are splines of a real-valued degree, which can be used to produce wavelet bases. They come in many flavours, such as symmetric and causal, orthogonal and biorthogonal.

In view of the above, symmetric, orthonormal cubic spline wavelets (see Fig. 3.5b) are the best choice for this study. Symmetric, orthogonal, smooth wavelet basis functions cannot have compact support; they can have exponential decay (Cohen et al. 1992). For this reason, we use a frequency domain implementation via the FWD to compute the FWT (see Fig. 3.4).

Figure 3.4. The FWT and FWD of a signal are interchangeable via the fast Fourier transform (FFT). [Diagram relating the original image, its frequency coefficients (FFT/IFFT), the wavelet coefficients in the time domain (FWT/IFWT), and the wavelet coefficients in the frequency domain (FWD/FWR); the two sets of wavelet coefficients are linked by an FFT or IFFT per level, per channel.]

3.4.2 Denoising images by wavelet domain thresholding

The WaveLab package (Buckheit and Donoho 1995) contains a number of schemes for wavelet-based denoising, including HybridThresh, InvShrink, MinMaxThresh, MultiMAD, SUREThresh, VisuShrink, and WaveJS (Donoho and Johnstone 1994, 1995). These routines are based on thresholding detail coefficients in the wavelet domain. An important characteristic of these schemes is the amount of smoothness they introduce in the denoised image (Hilton et al. 1996).

InvShrink uses the VisuThresh threshold, which is $\sqrt{2 \log N}$ for a vector $d^j$ of detail coefficients of length N. The signal is scaled before thresholding so that it has unit standard deviation. In multilevel transforms, the height of the threshold is doubled for each subsequent level. MultiMAD also uses VisuThresh, and rescales the $d^j$ of each level so that its median absolute coefficient value is 0.6745, which is the median absolute deviation (MAD) of an N(0, 1)-distribution. MinMaxThresh uses a minimax threshold (Donoho and Johnstone 1994), which minimises the maximum risk. SUREThresh uses Stein’s Unbiased Risk Estimate (Donoho and Johnstone 1995). VisuShrink uses VisuThresh with shrinkage of small coefficients, called soft thresholding, as the default, but hard thresholding, i.e., removal of small coefficients, is also used. HybridThresh uses VisuThresh for sparse vectors and SUREThresh for dense vectors. WaveJS uses a threshold based on the James-Stein estimate (Donoho and Johnstone 1995). InvShrink and MultiMAD change the threshold for each decomposition level, while MinMaxThresh, SUREThresh, VisuShrink, HybridThresh, and WaveJS use one global threshold.
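The building blocks named above (the $\sqrt{2 \log N}$ threshold, MAD-based rescaling, and soft and hard thresholding) can be sketched as follows. This is an illustration written for this text and not the WaveLab code; all function names are hypothetical.

```python
import numpy as np

def soft_threshold(d, t):
    """Shrink coefficients towards zero by t; small ones become zero."""
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

def hard_threshold(d, t):
    """Keep coefficients whose magnitude exceeds t; remove the rest."""
    return np.where(np.abs(d) > t, d, 0.0)

def universal_threshold(n):
    """sqrt(2 log N) for N coefficients with unit noise standard deviation."""
    return np.sqrt(2.0 * np.log(n))

def denoise_coefficients(d, mode="soft"):
    """Rescale by a MAD noise estimate, threshold, and undo the rescaling."""
    sigma = np.median(np.abs(d)) / 0.6745      # MAD-based noise estimate
    shrink = soft_threshold if mode == "soft" else hard_threshold
    return sigma * shrink(d / sigma, universal_threshold(d.size))

rng = np.random.default_rng(3)
d = rng.normal(0.0, 1.0, 1024)    # noise-only detail coefficients
d[:4] += 20.0                     # a few large 'signal' coefficients
out = denoise_coefficients(d, mode="hard")
```

In this toy vector almost all noise coefficients are set to zero, while the four large coefficients survive.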

Figure 3.5. (a) Ordering of the approximation and detail coefficients of a two-level 2D nonstandard FWT. (b) Symmetric orthonormal cubic spline scaling function (top) and corresponding wavelet (bottom).

If a 1D threshold selection scheme were used in a 2D FWT, for example by applying a 1D thresholding scheme in both spatial dimensions, assumptions used by the WaveLab routines would be violated, because the threshold for detail coefficients would in some cases be determined from both approximation and detail coefficients, and in some cases from detail coefficients only (see Fig. 3.5a). So it is necessary to respect the ordering of coefficients in a 2D FWT. The WaveLab thresholding schemes are based on the assumption of white Gaussian noise. If the autocorrelation of the noise is unknown, a level-dependent threshold should be used (Johnstone and Silverman 1997). To meet these requirements, we have made 2D versions of the denoising routines in which all channels with detail coefficients (see Fig. 3.5a) can be thresholded individually. Each direction channel at each resolution (each square in Fig. 3.5a) is thresholded independently using the WaveLab routines. This setting works for each FWT of stationary noise with an unknown autocorrelation: there is a difference in variance between detail channels at different resolutions, but within each channel there is constant variance (Johnstone and Silverman 1997).
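A minimal sketch of this per-channel strategy is given below. It assumes a 2D Haar transform and a hard universal threshold with a MAD noise estimate per channel, as stand-ins for the cubic spline FWT and the WaveLab routines; the function names and the test image are hypothetical.

```python
import numpy as np

R2 = np.sqrt(2.0)

def haar2d(img):
    """One 2D Haar level: approximation plus three detail channels."""
    a = (img[0::2, :] + img[1::2, :]) / R2     # sums of adjacent rows
    d = (img[0::2, :] - img[1::2, :]) / R2     # differences of adjacent rows
    c = (a[:, 0::2] + a[:, 1::2]) / R2
    d1 = (a[:, 0::2] - a[:, 1::2]) / R2
    d2 = (d[:, 0::2] + d[:, 1::2]) / R2
    d3 = (d[:, 0::2] - d[:, 1::2]) / R2
    return c, (d1, d2, d3)

def ihaar2d(c, details):
    """Inverse of haar2d."""
    d1, d2, d3 = details
    a = np.empty((c.shape[0], 2 * c.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (c + d1) / R2, (c - d1) / R2
    d[:, 0::2], d[:, 1::2] = (d2 + d3) / R2, (d2 - d3) / R2
    img = np.empty((2 * a.shape[0], a.shape[1]))
    img[0::2, :], img[1::2, :] = (a + d) / R2, (a - d) / R2
    return img

def threshold_channel(d):
    """Threshold one detail channel using its own noise estimate."""
    sigma = np.median(np.abs(d)) / 0.6745
    t = sigma * np.sqrt(2.0 * np.log(d.size))
    return np.where(np.abs(d) > t, d, 0.0)

def denoise(img, levels):
    """Threshold every channel at every level independently."""
    if levels == 0:
        return img
    c, details = haar2d(img)
    details = tuple(threshold_channel(d) for d in details)
    return ihaar2d(denoise(c, levels - 1), details)

rng = np.random.default_rng(5)
clean = np.zeros((64, 64))
clean[24:40, 24:40] = 5.0
noisy = clean + rng.normal(0.0, 1.0, clean.shape)
out = denoise(noisy, levels=3)
```

Because each channel gets its own noise estimate, the same code applies unchanged when the noise variance differs between resolution levels, which is the case for correlated stationary noise.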


3.5 Denoising 2D images

The BOLD contrast is defined as the difference between an MR image of a brain with increased local activity and an image of the same brain under resting conditions (Ogawa et al. 1990). We used the BrainWeb Simulator (Kwan et al. 1996) to obtain a noise-free $T_2^*$-weighted MR image template. The parameters of the simulator were as follows. Modality: T2; voxel size: 1×1×1 mm³; noise: 0%; intensity non-uniformity: 0%. The Brain Extraction Tool (Smith 2002) was used to remove non-brain voxels, by setting their intensities to 0. One slice (slice no. 108) of this image was selected, and used as a noise-free MR brain template.

Rician noise was added to this template as follows. Let $m(x)$ denote the template slice, and $\sigma_m$ its standard deviation. Two images $n_1(x)$ and $n_2(x)$ containing i.i.d. $N(0, \sigma_n^2)$-distributed noise with a known standard deviation $\sigma_n$ were made, and the noisy MR image $\tilde{m}$ was computed as $\tilde{m}(x) = \sqrt{(m(x) + n_1(x))^2 + n_2(x)^2}$. The Rician distributed noise in $\tilde{m}$, computed as $r(x) = \tilde{m}(x) - m(x)$, has a standard deviation $\sigma_r$ with approximately $\sigma_r = \sigma_n \sqrt{2 - \pi/2}$ (Gudbjartsson and Patz 1995). This approximation was used to create noisy MR images with a known signal-to-noise ratio (SNR). The SNR of the noisy images was computed as:

$$\mathrm{SNR}_{\tilde{m}} = 10 \log_{10} \frac{\sigma_m}{\sigma_r}. \qquad (3.3)$$
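The noise construction and Eq. (3.3) can be sketched as follows. The disc-shaped template is a hypothetical stand-in for the BrainWeb slice, and the function names are chosen for this example. Because the relation between σ_r and σ_n is itself an approximation, the achieved SNR only approximates the requested value.

```python
import numpy as np

def add_rician_noise(m, snr_db, rng):
    """Noisy magnitude image with approximately the requested SNR (Eq. 3.3)."""
    sigma_r = m.std() / 10.0 ** (snr_db / 10.0)      # invert Eq. (3.3)
    sigma_n = sigma_r / np.sqrt(2.0 - np.pi / 2.0)   # approximation from the text
    n1 = rng.normal(0.0, sigma_n, m.shape)
    n2 = rng.normal(0.0, sigma_n, m.shape)
    return np.sqrt((m + n1) ** 2 + n2 ** 2)

def snr_db(template, noisy):
    """Eq. (3.3): 10 log10 of the ratio of standard deviations."""
    r = noisy - template                             # Rician noise r(x)
    return 10.0 * np.log10(template.std() / r.std())

# Hypothetical template: a bright disc on a zero background.
yy, xx = np.mgrid[0:128, 0:128]
m = np.where((yy - 64) ** 2 + (xx - 64) ** 2 < 40 ** 2, 100.0, 0.0)

rng = np.random.default_rng(7)
noisy = add_rician_noise(m, 15.0, rng)
achieved = snr_db(m, noisy)
```

Note that Eq. (3.3) uses a factor 10 with a ratio of standard deviations; the sketch follows that definition rather than the 20 log10 convention.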

Gaussian noise with a 1/f power spectrum was produced by transforming both $n_1$ and $n_2$ to the frequency domain and multiplying their frequency spectra with a $1/\sqrt{f}$ mask, yielding power spectra with a 1/f falloff. Both the real and imaginary parts of the spectra were multiplied, so the phase spectra did not change. Multiplication in the frequency domain is equivalent to convolution in the spatial domain. Because this is a linear operation, the noise distribution in the 1/f versions of the $n_1$ and $n_2$ images was still Gaussian. These transformed versions of $n_1$ and $n_2$ were used to obtain Rician distributed noise with a 1/f power spectrum. The BOLD image was constructed from two of these noisy MR images as follows:
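The spectral shaping step can be sketched with NumPy's FFT routines. The radial definition of f, the handling of the zero-frequency term, and the final rescaling to unit standard deviation are assumptions made for this example; the text does not specify them.

```python
import numpy as np

def one_over_f_noise(shape, rng):
    """Gaussian noise whose power spectrum falls off as 1/f."""
    white = rng.normal(0.0, 1.0, shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    f = np.hypot(fy, fx)              # radial frequency (assumed definition)
    f[0, 0] = f[0, 1]                 # avoid division by zero at f = 0 (assumed)
    mask = 1.0 / np.sqrt(f)           # amplitude mask, so power goes as 1/f
    # A real, symmetric mask scales real and imaginary parts equally,
    # which leaves the phase spectrum unchanged.
    coloured = np.fft.ifft2(np.fft.fft2(white) * mask).real
    return coloured / coloured.std()  # rescale to unit standard deviation

rng = np.random.default_rng(3)
noise = one_over_f_noise((64, 64), rng)
```

Two such images can take the place of the white-noise images in the Rician construction described above.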

1: create two noisy MR images $\tilde{m}_1$ and $\tilde{m}_2$ using the above procedure;
2: define an ‘active region’ inside the template brain and create a noise-free BOLD image $f_0$ as: $f_0(x) = \begin{cases} 1, & \text{if } x \text{ is inside the active region} \\ 0, & \text{otherwise} \end{cases}$
3: add activity to $\tilde{m}_2$ by adding $c\,f_0$, where c = 5% of the maximum intensity of the MR template;
4: compute a BOLD image $f_1$ as $f_1(x) = \tilde{m}_2(x) - \tilde{m}_1(x)$.

The top row of Fig. 3.6 shows the active spot (bright area), and two noisy images m1, with white and 1/f noise, respectively. The bottom row shows the noise-free BOLD image, and BOLD images with white noise and 1/f noise, respectively, with an SNR of 15 dB. Because there is hardly any signal in the noise-free BOLD image f0 (see Fig. 3.6d), the BOLD images have a much lower SNR than the MR images used to construct them. MR images with an SNR of {5, 10, 15, 20, 25, 30, 35} dB yield BOLD images with an SNR of {-15.2, -10.2, -5.2, -0.2, 4.8, 9.8, 14.8} dB, respectively.

Figure 3.6. Image without noise and the active spot magnified and shown in white (a). Noisy MR images with increased intensities inside the active spot, containing white (b) and 1/f (c) Rician noise of 15 dB, respectively. The noise-free BOLD image (d) and BOLD images created from noisy MR images with white (e) and 1/f (f) noise of 15 dB, respectively.

For these input images, we compared wavelet-based denoising and various degrees of Gaussian smoothing. Each of the wavelet-based schemes started with a 2D FWT of $f_1$, computed as shown in Fig. 3.5b. Symmetric orthonormal cubic spline basis functions and a decomposition level of 4 were used for all tests. After denoising with one of the methods listed in subsection 3.4.2, a 2D IFWT yielded the denoised image $f_2$. Denoting the standard deviation of an arbitrary image $f$ by $\sigma_f$, the following procedure was carried out for each of the tested methods:

1: the noise $\varepsilon_1(x)$ in $f_1(x)$ before denoising was computed as $\varepsilon_1(x) = f_1(x) - f_0(x)$;
2: the SNR before denoising, denoted $\mathrm{SNR}_1$, was computed as:

$$\mathrm{SNR}_1 = 10 \log_{10} \frac{\sigma_{f_0}}{\sigma_{\varepsilon_1}}, \qquad (3.4)$$

3: the residual noise $\varepsilon_2(x)$ after denoising was obtained as $\varepsilon_2(x) = f_2(x) - f_0(x)$;
4: the SNR of the denoised image, denoted $\mathrm{SNR}_2$, was computed as:

$$\mathrm{SNR}_2 = 10 \log_{10} \frac{\sigma_{f_0}}{\sigma_{\varepsilon_2}}. \qquad (3.5)$$
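The evaluation procedure can be sketched for the Gaussian smoothing case. The separable convolution, the kernel truncation at four standard deviations, and the square test image are assumptions made for this example; the conversion between FWHM and standard deviation is the standard relation FWHM = 2√(2 ln 2) σ.

```python
import numpy as np

def gaussian_kernel(fwhm):
    """Normalised 1D Gaussian kernel for a given FWHM in pixels."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radius = int(np.ceil(4.0 * sigma))            # truncation (assumed)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(img, fwhm):
    """Separable Gaussian smoothing: filter rows, then columns."""
    k = gaussian_kernel(fwhm)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def snr_db(f0, f):
    """Eqs. (3.4) and (3.5): SNR of image f relative to the noise-free f0."""
    return 10.0 * np.log10(f0.std() / (f - f0).std())

rng = np.random.default_rng(4)
f0 = np.zeros((64, 64))
f0[24:40, 24:40] = 1.0                            # hypothetical active region
f1 = f0 + rng.normal(0.0, 1.0, f0.shape)          # noisy BOLD image
f2 = smooth(f1, fwhm=3.0)                         # denoised image

snr1, snr2 = snr_db(f0, f1), snr_db(f0, f2)
```

The same `snr_db` function applies to the output of any denoising method, which is what makes the comparison between wavelet-based methods and Gaussian smoothing possible.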

Figure 3.8 shows SNR2 plotted against the input SNR of the MR images.

Figure 3.7. Images from the 2D denoising test, each with a cross section (solid line) of a line in the image (dotted line) plotted inside: (a) original (f0), (b) noisy (f1), (c) denoised (f2) with Gaussian smoothing, FWHM = 3 pixels, (d) denoised (f2) with VisuThresh (s).

Of the Gaussian smoothing methods, the wider kernels perform better for low input SNRs, and smaller kernels perform better for higher input SNRs. The maximum input SNR where Gaussian smoothing still shows SNR improvement decreases as the filter width increases. Figs. 3.8a+c show that even for Gaussian smoothing with an FWHM of one pixel, the maximum output SNR is about 7 dB.

The wavelet methods perform as well as Gaussian smoothing for low SNRs, and better than Gaussian smoothing for higher SNRs. All wavelet-based methods show maximum output SNRs above 10 dB. For white noise and low SNRs, there is a marked difference in output SNRs. HybridThresh, InvShrink, MultiMAD, and VisuThresh with both hard (h) and soft (s) thresholding increase the SNR most. WaveJS, MinMaxThresh, and SUREThresh (both h and s) increase the SNR less. For higher SNRs the differences are smaller, but InvShrink, WaveJS, SUREThresh (h), and VisuThresh (s) now produce visibly lower SNRs. There is another difference between these methods: HybridThresh, WaveJS, and VisuThresh (s) smear the active spot out much more than the other methods do. We refer to these schemes as ‘smoothing wavelet methods’. MinMaxThresh and SUREThresh (both h and s) produce sharp output images. The other methods produce images of intermediate smoothness. In general, the smoothing wavelet methods perform better for low input SNRs, but the less smoothing wavelet methods are better when the input SNR is higher. In this experiment SUREThresh (h) performs badly with a low input SNR, but with both noise types it performs best for higher input SNRs.


Figure 3.8. Performance of various wavelet denoising schemes, and Gaussian smoothing for several values for the FWHM parameter. The SNR of the denoised image is plotted against the SNR of the noisy image. [Four panels (a)-(d). Horizontal axis: SNR of noisy MR image in dB (5 to 35); vertical axis: SNR of denoised BOLD image in dB (-15 to 20). Gaussian smoothing curves: FWHM = 1 to 6. Wavelet curves: HybridThresh, InvShrink, JamesStein, MultiMAD, MinMaxThresh, SUREThresh(h), SUREThresh(s), VisuThresh(h), VisuThresh(s).]

The differences in performance are smaller for 1/f noise than for white noise. This holds for the wavelet methods as well as for Gaussian smoothing. For all wavelet methods except InvShrink, WaveJS, and VisuThresh(s), the output SNR is a linear function of the input SNR: unlike Gaussian smoothing, the wavelet methods improve the SNR of input images that already have a high SNR. This suggests that in terms of SNR improvement, wavelets are an attractive alternative to Gaussian smoothing. With white noise and for low SNRs, the less smoothing wavelet methods, such as MinMaxThresh and SUREThresh (h and s), produce relatively lower output SNRs than the other methods. This indicates that introducing smoothness, thereby discarding image features, is necessary to improve images with very low SNRs. Of the methods mentioned above, MultiMAD and VisuThresh(h) give good results for all tested SNRs.

Figure 3.9. The time signal in the active spot of the simulated time series. [Plot of the block signal b(t) against the time point; horizontal axis ticks from 8 to 56, vertical axis ticks from −0.5 to 0.5.]

3.6 Denoising a simulated time series

In most neuroimaging applications it is not possible to separate signal and noise, so the SNR is not known. Therefore, a simulation study was performed in which the SNR is known a priori. We constructed an artificial time series of 64 copies of the MR template image of the previous experiment, and superimposed noise on each image according to the procedure described in section 3.5. A block signal b(t) (see Fig. 3.9) was superimposed on the time signals at the voxel locations inside the active region (see Fig. 3.6a). The sign of the superimposed signal alternated after every 8th time point. The size of the original active region was 762 pixels.

The time series consisted of 8 blocks of 8 images: 4 blocks were labelled ‘rest’, and 4 were labelled ‘task’. The ‘task’ blocks were those in which the time signal is positive (see Fig. 3.9). The amplitude of the time signal was set to 1% of the maximum intensity in the MR template. Starting with the image $f_0(x)$ from section 3.5, we use $F_0$ to denote the original time series with the time signal $b(t)$ superimposed on it, but without the noise:

$$F_0(x, t) = \begin{cases} f_0(x) + b(t) & \text{in the active spot} \\ f_0(x) & \text{in the rest of the image} \end{cases} \qquad (3.6)$$

The noisy time series $F_1$ was computed as:

$$F_1(x, t) = F_0(x, t) + \varepsilon_1(x, t), \qquad (3.7)$$

where $F_1$ and $\varepsilon_1(x, t)$ denote the noisy time series and the value of the input noise, respectively. A BOLD image was computed from each individual image by subtracting the time series mean. As demonstrated in section 3.3, the noise distribution in such difference images is approximately Gaussian. The BOLD image was denoised using the methods from the previous section, after which the time series mean was added to the denoised image.

Let $F_2$ denote the denoised time series. After denoising, we tested each voxel location x for the presence of the signal $b(t)$. The residual noise $\varepsilon_2$ was computed as:

$$\varepsilon_2(x, t) = F_2(x, t) - f_0(x) - b(t). \qquad (3.8)$$

We denote the temporal residual noise in a voxel x as a function of t by $\varepsilon^x_2(t) = \varepsilon_2(x, t)$. The temporal SNR in a voxel x after denoising was computed as:

$$\mathrm{SNR}_2(x) = 10 \log_{10} \frac{\sigma_b}{\sigma_{\varepsilon^x_2}}, \qquad (3.9)$$

where $\sigma_b$ and $\sigma_{\varepsilon^x_2}$ are the standard deviations of $b(t)$ and $\varepsilon^x_2(t)$, respectively.
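Equations (3.8) and (3.9) translate directly into array operations. In the sketch below the array layout (time, rows, columns), the function names, the sign of the first block, and the toy data are assumptions made for this example; the 'denoised' series is simply the clean series plus a small amount of residual noise.

```python
import numpy as np

def block_signal(n_points, block_length, amplitude):
    """Block signal whose sign flips after every block_length time points."""
    sign = np.where((np.arange(n_points) // block_length) % 2 == 0, -1.0, 1.0)
    return amplitude * sign

def temporal_snr(F2, f0, b, active):
    """Eq. (3.9) for every voxel inside the boolean mask `active`.

    F2 has shape (T, H, W), f0 has shape (H, W), b has shape (T,).
    """
    eps2 = F2 - f0[None, :, :] - b[:, None, None]     # Eq. (3.8)
    sigma_eps = eps2.std(axis=0)                      # temporal std per voxel
    return 10.0 * np.log10(b.std() / sigma_eps[active])

rng = np.random.default_rng(6)
f0 = np.zeros((4, 4))
active = np.ones((4, 4), dtype=bool)
b = block_signal(64, 8, 1.0)

# Stand-in for a denoised series: clean signal plus residual noise.
F2 = f0[None] + b[:, None, None] + rng.normal(0.0, 0.1, (64, 4, 4))
snr = temporal_snr(F2, f0, b, active)
mean_snr = snr.mean()      # average over the active spot
```

With a unit-amplitude block signal and residual noise of standard deviation 0.1, the average temporal SNR is close to 10 dB under the definition of Eq. (3.9).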

3.6.1 Effect on the temporal SNR

Figure 3.10 shows SNR2(x), averaged over all locations x inside the active spot, plotted against the input spatial SNR. The graphs for Gaussian denoising show the same behaviour as in the 2D image experiment, i.e., the SNR curve eventually reaches a plateau value. The wavelet-based methods improve the temporal SNR both for low and high input SNR. The same relation observed in the previous experiment between smoothness of the output image and the output SNR is visible here. The smoothing wavelet methods and wide Gaussian smoothing filters produce the highest temporal SNRs for low input SNRs, and the less smoothing wavelet methods and narrow Gaussian filters perform better for high input SNRs.

3.6.2 Effect on the shape of the detected spots

Apart from comparing the average temporal SNRs in the active spot, we also look at spatial maps of temporal SNR values of the denoised time series. Ideally, these maps should have high values inside the active spot and low values outside it. Figures 3.11-3.14 show the temporal SNRs in the area containing the active spot for white noise and 1/f noise of 11 dB and 14 dB, respectively. Note that the images were inverted (reverse video mode) for enhanced display purposes.

Gaussian smoothing with small smoothing kernels and the smoothing wavelet methods show bright spots, even for a low input spatial SNR like 11 dB. Wider kernels, with FWHM > 3 pixels, produce maps with a very smooth spot, which is less bright. The smoothing wavelet methods show bright spots, while those produced by the less smoothing wavelet methods are darker, cf. Fig. 3.11. The smoother the output image, the more the shape and the SNR value distribution of the visible spot differ from the original active spot. Temporal SNR maps of methods that produce smooth images (both Gaussian and wavelet-based) show spots with a somewhat elliptic shape and a peaked (non-uniform) intensity distribution. The less smoothing wavelet methods retain the shape of the original spot and its uniform intensity distribution. For noise of 14 dB, InvShrink, MinMaxThresh, SUREThresh and Gaussian smoothing with FWHM = 1 return almost exactly the original spot, with a very uniform distribution of temporal SNR values. Other less smoothing wavelet methods, such as MultiMAD and VisuThresh (h), and Gaussian smoothing with FWHM = 2, retain the shape of the spot quite well, with most of the changes in the temporal SNR near the contour of the spot.

Figure 3.10. Performance of the wavelet denoising schemes, as well as Gaussian smoothing for six FWHM values. The average temporal SNR inside the original active spot in the denoised image is plotted against the spatial SNR of the noisy input image. [Four panels: white noise, Gaussian smoothing; white noise, wavelet denoising; 1/f noise, Gaussian smoothing; 1/f noise, wavelet denoising. Horizontal axis: input, SNR of MR images in dB (8 to 23); vertical axis: output, SNR of time signal in dB (0 to 14). Wavelet curves: JamesStein, MultiMAD, MinMaxThresh, SureThresh, VisuShrink(h), VisuShrink(s).]


[Figure 3.11 panels, row by row: Hybrid, InvShrink, WaveJS, MultiMAD, MinMax; SUREThresh(h), SUREThresh(s), VisuThresh(h), VisuThresh(s), FWHM = 1; FWHM = 2, FWHM = 3, FWHM = 4, FWHM = 5, FWHM = 6.]

Figure 3.11. Temporal SNR maps (inverted) of the area around the active spot. The original images contained white noise with a spatial SNR of 11 dB.

3.6.3 Segmentation via SNR thresholding

Segmentation of MR images based on thresholding is a commonly used technique, and it has also been used on statistical parametric maps (Rajapakse et al. 1997, 2001). We assumed the temporal SNR maps to have bimodal histograms: one peak of low values for the background and another peak of high values for the active spot. This assumption was used to segment the maps into a ‘non-active’ area and an ‘active’ area. We used the following steps to determine a threshold:

1: Smooth the histogram with a moving average filter
2: Take the logarithm of each entry in the smoothed histogram
3: Model the log-histogram as the sum of two Gaussian peaks
4: Use the 99.9% level of the cumulative histogram of the left peak as a threshold

Filtering the histogram was implemented by applying a three-tap averaging filter ten times. The logarithm was used to amplify the second peak: the number of background pixels is generally much larger than the size of the active spot, and large values decrease more by taking their logarithm than small values. The histogram was approximated by the sum of two Gaussian peaks with the Levenberg-Marquardt curve fitting algorithm. The threshold was based on the distribution of the noise.
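The smoothing, logarithm and threshold steps described above could be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the function names, the replicated-edge handling, the floor that avoids log(0), and the reading of the fitted left peak as a normal distribution with parameters (mu, sigma) are all assumptions. The Levenberg-Marquardt fit itself is omitted, and its output is taken as given.

```python
import math
from statistics import NormalDist


def smooth_histogram(hist, passes=10):
    """Apply a three-tap averaging filter `passes` times.

    Edge bins are handled by replicating the outermost value (an assumption).
    """
    h = [float(v) for v in hist]
    for _ in range(passes):
        padded = [h[0]] + h + [h[-1]]
        h = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
             for i in range(len(h))]
    return h


def log_histogram(hist, floor=1.0):
    """Logarithm of each entry; `floor` avoids log(0) (an assumption)."""
    return [math.log(max(v, floor)) for v in hist]


def threshold_from_left_peak(mu, sigma, level=0.999):
    """99.9% point of a Gaussian (mu, sigma) assumed to describe the left peak."""
    return NormalDist(mu, sigma).inv_cdf(level)
```

For a Gaussian background peak, the 99.9% level lies roughly 3.09 standard deviations above its mean, so the threshold depends only on the fitted background (noise) peak.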





Figure 3.12. Temporal SNR maps (inverted) of the area around the active spot. The original images contained 1/f noise with a spatial SNR of 11 dB.

This method is very simple, and based on the assumption of the bimodality of the histogram. We demonstrate its performance in a number of cases, with SNRs ranging from low to high. With a low temporal SNR, the histogram of the temporal SNR map is equal to the histogram of an image with only noise. As it is not possible to distinguish two peaks in this case, the threshold is determined incorrectly (see Fig. 3.15c).

In the worst case we tested (Fig. 3.15a), the temporal SNR map itself also has a very low SNR of -10 dB, and the histogram of the SNR map has the shape of the noise distribution, so that separation of signal and noise is not possible. The histogram in Fig. 3.15b yields a sensible threshold, though the noise prevents a better detection (see Fig. 3.15e). Figure 3.15c-d show that temporal SNR maps with an SNR of at least 0 dB can be segmented well with this technique.

In the experiment, we looked at two measures: the number of false positive classifications, i.e., points outside the original active spot labelled active, and false negative classifications, i.e., points inside the original active spot labelled non-active. Tables 3.3 and 3.4 show the false positive and false negative classifications, respectively. Images with spatial SNRs of 8 dB do not yield SNR maps that can be analysed in this way, because the SNRs of the BOLD images, as well as the SNRs of the temporal SNR maps, are too low (see the list of BOLD SNRs in section 3.5 and Fig. 3.15a-b). They either yield many false





Figure 3.13. Temporal SNR maps (inverted) of the area around the active spot. The original images contained white noise with a spatial SNR of 14 dB.

Table 3.3. Number of false positive classifications for white (left) and 1/f (right) noise. The SNR maps were assumed to have bimodal histograms.

                    spatial SNR (dB)
method         8    11    14    17    20    23
Hybrid       767   651   356   168    74     0
InvShrink      0     1     3     0     0     0
JAMES        599   589    78     2   154    75
MultiMAD      54   123    96     9     0     0
MinMax         2    42    42     3     0     0
SURE(h)       47     0     0     0     0     0
SURE(s)      236   292   165     1     0     0
Visu(h)       52   120    95    10     0     0
Visu(s)      680   552   313   165    35     0
FWHM = 1       2    41    10     0     0     0
FWHM = 2     198    62    15    35    24    23
FWHM = 3     348    35    65    98    85    86
FWHM = 4     116    68   148   155   182   166
FWHM = 5      84   104   188   221   200   229
FWHM = 6      99   132   216   245   249   265


positives or many false negatives. In general, the denoising methods that introduce much smoothness yield more false positive classifications for higher SNRs, while the





Figure 3.14. Temporal SNR maps (inverted) of the area around the active spot. The original images contained 1/f noise with a spatial SNR of 14 dB.

Table 3.4. Number of false negative classifications for white (left) and 1/f (right) noise. The SNR maps were assumed to have bimodal histograms.



less smoothing methods yield many false negatives for the lowest SNR. Of the wavelet-based methods, InvShrink and SUREThresh (h) perform well for both noise types, and


[Figure 3.15: six panels, (a)-(c) in the top row and (d)-(f) in the bottom row.]

Figure 3.15. Six situations in which the temporal SNR threshold was determined. Dashed line: histogram entries; solid line: log-histogram; vertical line: threshold; ∗: background intensities; ◦: activation intensities. BOLD images: SNR of -10 dB (a), -5 dB (b), 0 dB (c), 5 dB (d). (e) Activation (inverted) detected from (b), (f) activation (inverted) detected from (c).

MultiMAD, MinMaxThresh, and VisuThresh (h) yield good results for moderate and high SNRs. The relatively high numbers of type II errors for Gaussian smoothing with large FWHM relate to the blurring effect visible in Figs. 3.11-3.14. The intensity distribution in the spot changes from uniform to peaked, which influences detections close to the boundary of the spot. InvShrink, SUREThresh (h) and Gaussian smoothing with FWHM = 1 yield good results. MultiMAD and MinMaxThresh also perform well; the other methods yield more errors.

3.7 Statistical tests on the simulated time series

We also performed a standard statistical analysis on the denoised time series with the SPM method (Friston et al. 1995c). The design matrix, X in (3.1), had two columns: a block signal as in Fig. 3.9 and a column containing a constant signal to capture the time series mean. In terms of (3.1), the matrix β contained two columns, each representing an image. The first image, β1(x), contained the covariance of the block signal with the time signal at each location x. Image β2(x) contained the time series mean of each voxel.

Although the noise in the MR images is Rician distributed, the noise in the BOLD image has, to very good approximation, a Gaussian distribution, as explained in section 3.3. If the temporal noise is Gaussian distributed, the values in β(x) are t-distributed. Using the sample variance s² (see section 3.2) we can compute a statistical parametric map of p-values by means of the t-test. We used FDR correction as described in section 3.2, with an FDR parameter q of 0.05, to threshold β1. This yielded the statistically significant activations for all denoising methods. To obtain more robust results, this experiment was repeated 20 times and the outcomes of the individual experiments were averaged.
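As a rough illustration of FDR thresholding of a map of p-values, the sketch below implements the standard Benjamini-Hochberg step-up rule. It assumes that the procedure of section 3.2 reduces to this rule with a constant of 1; the function names are hypothetical, and the p-values are taken as given rather than computed from a t-test.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest p-value that passes the Benjamini-Hochberg step-up rule.

    Returns None when no p-value passes, i.e. nothing is declared active.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    threshold = None
    for i, p in enumerate(ordered, start=1):
        if p <= q * i / n:  # compare the i-th smallest p-value with q*i/n
            threshold = p
    return threshold


def detect(p_values, q=0.05):
    """Boolean activation label per voxel, in the original voxel order."""
    t = fdr_threshold(p_values, q)
    return [t is not None and p <= t for p in p_values]
```

With q = 0.05 and ten tests, the two smallest p-values in the list 0.001, 0.008, 0.039, ... are compared with 0.005 and 0.010, so both are declared active, whereas 0.039 would need to be at most 0.015.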

Two important issues are critical to the validity of this method. First, a t-test is only appropriate if, after denoising, the temporal noise is still Gaussian. Second, to use the η = 1 setting described in section 3.2, the data are required to be PRDS. The validity of these two assumptions is discussed in the next subsections.

3.7.1 Impact of spatial filtering on the distribution of temporal noise

The p-values resulting from a set of statistical tests are uniformly distributed on [0,1] if the omnibus null hypothesis is true (Donahue 1999). The sequence of ordered p-values from that set of tests should lie on a straight line. We tested this by constructing time series similar to those previously described, but without activation: the null hypothesis was true for all voxels. We applied all denoising methods to these time series and sorted the p-values acquired in the statistical analysis. Figure 3.16 shows representative results of both wavelet methods and Gaussian smoothing. Some methods produce Gaussian temporal noise, others introduce a deviation from Gaussianity. The top row shows that Gaussian smoothing with FWHM=1 yields uniformly distributed p-values under the null hypothesis, while for FWHM=3 or higher, non-uniformly distributed p-values are obtained. The plots in the bottom row show results for three wavelet methods. For InvShrink and MinMaxThresh the distribution of p-values is uniform, but for MultiMAD it is non-uniform.
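The straight-line check can also be expressed numerically. The sketch below is an illustration only (the text relies on visual inspection of the sorted p-values): it measures the largest vertical distance between the sorted p-values and the line expected under uniformity. The function name and the plotting positions (i + 0.5)/n are assumptions.

```python
def max_deviation_from_uniform(p_values):
    """Largest gap between the sorted p-values and the uniform reference line.

    Under the null hypothesis the i-th of n sorted p-values (i = 0, ..., n-1)
    is expected to lie close to (i + 0.5) / n.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    return max(abs(p - (i + 0.5) / n) for i, p in enumerate(ordered))
```

A value close to zero indicates that the sorted p-values follow the straight line; larger values correspond to the curved profiles seen for the more strongly smoothing methods.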

The fact that even for Gaussian smoothing the distribution of the noise may become non-Gaussian may seem puzzling, but can be explained by the fact that, for Rician noise, a higher intensity in the image leads to a larger noise amplitude. This gives a kind of spatial structure to the noise, which is observable in the (BOLD) difference images. Denoising methods that produce smoother images change this structure, thus introducing errors. Although the deviation from normality varies between methods, we chose to keep all methods in the statistical analysis, since the t-test is quite robust to deviations from normality.


[Figure 3.16: six panels, (a)-(c) in the top row and (d)-(f) in the bottom row. Horizontal axis: index i of sorted p-value. Vertical axis: p-value (0 to 1). Legend: 5 dB, 10 dB, 15 dB, 20 dB, 25 dB.]

Figure 3.16. Sorted p-values in the statistical map of denoised time series with white noise, without activation. (a) No denoising, (b) FWHM = 1, (c) FWHM = 3, (d) InvShrink, (e) MultiMAD, (f) MinMaxThresh. Five marker symbols (o, x, +, * and a fifth marker) represent the time series of images with an input SNR of {5, 10, 15, 20, 25} dB, respectively.

3.7.2 Positive regression dependence of the p-values

Benjamini and Yekutieli (2001) show that the setting η = 1 can be used in the FDR-controlling procedure if the data are PRDS, and that multivariate positively correlated normally distributed data are PRDS. Genovese et al. (2002) argue that most fMRI data sets satisfy this condition.

To test the spatial correlation of the noise after applying a denoising method, we observed the time series (without activation) of the residual noise in the GLM, i.e., the e images in (3.1). We used the SPMd toolbox (Luo and Nichols 2003) to compute a normalised residual time series E. The noise in this time series was N(0, 1)-distributed. We tested for a positive correlation as follows. Let $E^{x}_{2}(t)$ denote the normalised residual signal e in (3.1) at location x as a function of t. We assumed the autocorrelation function to be localised, and for each location x in the image, the amount of spatial correlation a(x) was estimated by averaging the covariances of the voxel’s time signal with those of a number of neighbouring voxels:

$$a(x) = \frac{1}{N_v} \sum_{x_i} \operatorname{cov}\left(E^{x}_{2}, E^{x_i}_{2}\right), \qquad (3.10)$$

with $x_i$ from a small neighbourhood of x of size $N_v$ (in our case, $N_v$ was 11×11 voxels). Figure 3.17 shows this function for a number of settings. The top rows show the amount of correlation found without applying denoising. Wavelet methods introduce positive spatial correlations for lower input SNRs, and hardly any negative correlations for higher input SNRs. Gaussian smoothing introduces strong positive correlations for all SNRs.
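A minimal sketch of the estimate in (3.10) is given below, assuming the normalised residuals are stored as a mapping from 2D voxel coordinates to time signals. The data layout, the function names, the exclusion of the centre voxel from the average and the clipping of the neighbourhood at the image border are all assumptions made for illustration.

```python
from statistics import fmean


def covariance(u, v):
    """Sample covariance of two equally long time signals."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def spatial_correlation(series, x, radius=5):
    """Average covariance of the signal at voxel x with its neighbours.

    `series` maps (row, col) to the normalised residual time signal there.
    radius=5 gives an 11x11 window; the centre voxel itself is skipped and
    neighbours outside the image are ignored (both are assumptions).
    """
    r0, c0 = x
    covs = [covariance(series[x], series[(r, c)])
            for r in range(r0 - radius, r0 + radius + 1)
            for c in range(c0 - radius, c0 + radius + 1)
            if (r, c) != x and (r, c) in series]
    return sum(covs) / len(covs)
```

Because the residuals are normalised, values of a(x) near zero indicate spatially uncorrelated noise, while values approaching one indicate that neighbouring time signals are nearly identical.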

Another way to characterise the autocorrelation function is to look at statistics of the distribution of a(x). Figure 3.17 and Table 3.5 show that every denoising method changes the spatial correlations in the residual time series. All methods, except MultiMAD, introduce significantly more positive correlations than negative ones. Wavelet methods change the spatial correlation much less than Gaussian smoothing. We assume that without denoising, the residuals do not have significant negative correlations. In our test data (spatially white or 1/f Gaussian noise) we know that this is the case. Because the only significant correlations introduced by denoising are positive, the residuals are either uncorrelated, or positively correlated in space. These results indicate that the η = 1 setting is allowed in the statistical tests.

3.7.3 Results

In this experiment, we investigate the effect of denoising on the outcomes of the usual statistical analysis. In particular, we looked at two measures: the number of false positives and the number of false negatives. It is important to realize that denoising has two effects: first, the desired effect of noise reduction, and second, an unwanted but unavoidable change of the shape of the active spot. In order to take the latter effect into account, false positives/negatives were defined as points outside/inside the original active spot (see Fig. 3.6d) after denoising, which are marked ‘active’/‘inactive’ in the t-test with FDR control (q = 0.05). These numbers are shown in Tables 3.6 and 3.7. False discovery rates can be obtained from these tables by computing the number of false positives, divided by the number of detections; the latter number equals the size of the active spot (= 762 pixels), plus the number of false positives, minus the number of false negatives. A consequence of taking the original active spot as a reference is that the observed false discovery rates after denoising may exceed the 5% threshold imposed by the FDR controlling procedure.
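The computation of an observed false discovery rate from the two error counts can be written out as a short sketch. The function name is hypothetical, and the counts used in the usage note are made-up illustrative values, not entries from Tables 3.6 and 3.7.

```python
def observed_fdr(false_pos, false_neg, spot_size=762):
    """False positives divided by the number of detections.

    The number of detections equals the spot size plus the false positives
    minus the false negatives, as described in the text.
    """
    detections = spot_size + false_pos - false_neg
    return false_pos / detections if detections else 0.0
```

For example, 40 false positives and 2 false negatives would give 800 detections and an observed rate of 40/800 = 5%; any larger number of false positives at the same number of false negatives exceeds the nominal q.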

These tables show that the smoothing methods produce more false positives, and InvShrink, MinMaxThresh, SUREThresh(h), VisuThresh(h) and Gaussian smoothing with FWHM=1 produce very few false positives. The other wavelet methods and Gaussian smoothing with an FWHM of two voxels also perform well. For larger Gaussian filters, the number of type I errors increases with the filter width. The number of type


[Figure 3.17: fifteen panels in three rows of five, labelled (a)-(e), (f)-(j) and (k)-(o). The horizontal axes of the histograms run from -0.2 to 1.]

Figure 3.17. Surface plots of the spatial autocorrelation function (top) and histograms (bottom) of individual correlations computed in (3.10) of residual time series. The original time series contained white noise. (a-e) SNR input images = {5, 10, 15, 20, 25} dB, without denoising. (f-j) Idem, denoised with MinMaxThresh. (k-o) Idem, after Gaussian smoothing with FWHM = 3 voxels.


Table 3.5. Minimal, maximal, mean, and median values and the standard deviations of the temporal statistical correlation of voxels with their neighbours. The input time series contained no activation, and the SNR was 15 dB. Top: white noise, bottom: 1/f noise.

White noise:

method          min    mean    max      σ     med
No Denoising   -0.03   0.00   0.04   0.01   -0.00
Hybrid         -0.05   0.00   0.07   0.02    0.00
InvShrink      -0.07   0.06   0.28   0.06    0.04
WaveJS         -0.04  -0.00   0.04   0.01   -0.00
MultiMAD       -0.18   0.08   0.83   0.14    0.03
MinMax         -0.06  -0.00   0.06   0.02   -0.00
SURE(h)        -0.04  -0.00   0.04   0.01   -0.00
SURE(s)        -0.05   0.00   0.07   0.02    0.00
Visu(h)        -0.08  -0.00   0.10   0.02    0.00
Visu(s)        -0.08   0.03   0.14   0.04    0.02
FWHM = 1       -0.03   0.03   0.12   0.03    0.03
FWHM = 2        0.01   0.20   0.35   0.06    0.20
FWHM = 3        0.15   0.39   0.57   0.09    0.40
FWHM = 4        0.25   0.55   0.74   0.12    0.58
FWHM = 5        0.32   0.66   0.85   0.13    0.70
FWHM = 6        0.34   0.74   0.91   0.13    0.80

1/f noise:

method          min    mean    max      σ     med
No Denoising   -0.04   0.04   0.12   0.03    0.04
Hybrid         -0.02   0.07   0.19   0.04    0.06
InvShrink      -0.01   0.18   0.51   0.10    0.16
WaveJS         -0.04   0.05   0.13   0.03    0.04
MultiMAD       -0.16   0.13   0.88   0.16    0.09
MinMax         -0.03   0.05   0.16   0.04    0.04
SURE(h)        -0.04   0.04   0.13   0.03    0.04
SURE(s)        -0.02   0.07   0.19   0.04    0.06
Visu(h)        -0.03   0.05   0.19   0.04    0.05
Visu(s)        -0.03   0.12   0.34   0.07    0.11
FWHM = 1        0.02   0.16   0.32   0.05    0.15
FWHM = 2        0.13   0.37   0.57   0.09    0.37
FWHM = 3        0.26   0.53   0.73   0.12    0.55
FWHM = 4        0.29   0.65   0.84   0.14    0.67
FWHM = 5        0.35   0.73   0.93   0.15    0.78
FWHM = 6        0.36   0.79   0.96   0.14    0.86

Table 3.6. Number of type I errors in the SPM analysis with FDR control (q = 0.05) for white (left) and 1/f (right) noise.



I errors is larger for 1/f noise than for white noise. The less smoothing wavelet-based methods and Gaussian smoothing with an FWHM of one voxel produce more type II errors than the other methods. With 1/f noise, this effect is worse than with white noise. In general, the wavelet-based methods and Gaussian smoothing with an FWHM of one voxel introduce more type II errors, while the other Gaussian filters introduce more type I errors. Figures 3.18-3.21 show statistical parametric maps built from the denoised time series with white noise and 1/f noise of 11 dB and 14 dB, respectively. Generally, the less


Table 3.7. Number of type II errors in the SPM analysis with FDR control (q = 0.05) for white (left) and 1/f (right) noise.



smoothing methods produce spots that are closest to the original. The spots detected after InvShrink, MinMaxThresh, and SUREThresh (h) denoising and Gaussian smoothing with an FWHM of one voxel are closest to the original spot (see Figs. 3.18-3.21). Because the boundary voxels are not detected, the resulting active spot is smaller than the original (type II errors). HybridThresh, WaveJS, and VisuThresh(s) and all Gaussian smoothing methods produce larger spots (type I errors).

3.8 Statistical tests on a real fMRI data set

To test the denoising methods on real data, we used an example fMRI data set provided by the Dartmouth Brain Imaging Center. This is a recording of an experiment in which a subject was scanned for 4 minutes with a TR of 2000 ms. The subject’s condition switched every 30 s (15 scans) between ‘rest’ and ‘task’, starting with ‘rest’. During the ‘task’ periods, the subject had to perform an object manipulation task. The data set consists of 120 volumes with a resolution of 64×64×27 voxels. Each voxel has a volume of 3.75×3.75×5.50 mm³.

The 3D volumes, each consisting of 27 axial planes of 64×64 voxels, were transformed plane-by-plane to the wavelet domain. The decomposition level was set to 4. Denoising was done by both the wavelet-based methods and Gaussian smoothing. For the latter we used smoothing kernels of 5×5×5.5 mm³, 10×10×5.5 mm³, and 15×15×5.5 mm³. We compared the activation images, using the activation map of the original data without preprocessing (see Fig. 3.22) as a reference: the shape of the active region detected after denoising should not differ too much from the region detected from the original time series. The data underwent the same statistical analysis as the simulated time series. Figure 3.23 shows the voxels in a selected plane whose t-statistic was above





Figure 3.18. Statistical parametric maps of the area surrounding the active spot. The original images contained white noise with a spatial SNR of 11 dB.

the FDR threshold, for all denoising methods, overlaid on the first image of the original time series.

As in the case of the simulated time series, the active spot takes an elliptic shape for Gaussian smoothing with large FWHMs. The spots detected from the data sets denoised with MinMaxThresh and SUREThresh look very similar, and those found with WaveJS, InvShrink, and VisuThresh(h) are also similar. HybridThresh, MultiMAD and VisuThresh(s) yield rather different maps. After smoothing with a Gaussian kernel with an FWHM of 5×5×5.5 mm³, the detected spot resembles the ones found after InvShrink and VisuThresh with hard thresholding. The other smoothing kernels yield heavily deformed maps and show active spots very different from the one in the reference image.

3.9 Conclusions

We have compared wavelet denoising and Gaussian smoothing in the context of functional MRI in three settings: (i) 2D images and (ii) time series of 2D images, both contaminated by white or 1/f noise with a known SNR, and (iii) a real fMRI data set with an unknown noise type and SNR. The noise in BOLD images was described as the difference of two MR images containing Rician noise, and shown to have a Gaussian-like distribution. The denoising methods were compared with respect to SNR improvement, effect on the shape of activated regions, and the effect on the quality of statistical parametric maps. In contrast to most previous wavelet-based denoising schemes, we have chosen to do the subsequent statistical analysis in the spatial domain. This allowed us to directly compare the results of Gaussian and wavelet-based methods.

Figure 3.19. Statistical parametric maps of the area surrounding the active spot. The original images contained 1/f noise with a spatial SNR of 11 dB.

A discriminating characteristic of all tested denoising methods is the amount of smoothing they introduce. This characteristic plays a significant role in the applicability of the methods. When the input SNR is very low, denoising schemes that produce smoother images are preferred, and the gain in SNR is highest. However, when the images have moderate to high SNRs, these denoising schemes change the shapes of objects in the images. The more smoothing is introduced, the heavier the deformation, and in this case less smoothing wavelet-based denoising methods are preferred. Gaussian smoothing may be the best choice for SNRs which are too low even for smoothing wavelet-based methods, but the resulting SNR may still not be high enough for a reliable analysis.

The experiment with artificial time series showed that all denoising schemes have an effect on the shape of the activation spot. Gaussian smoothing and the more smoothing





Figure 3.20. Statistical parametric maps of the area surrounding the active spot. The original images contained white noise with a spatial SNR of 14 dB.

wavelet-based methods introduce severe deformations and blur the edges of the active spot. We used spatial maps of the temporal SNR as a diagnostic to compare the denoising methods. Segmentations based on the temporal SNR maps showed that heavy smoothing obscures the border regions of the active spot, introducing false negatives, while for low SNRs the less smoothing methods lead to false positives. In the intermediate SNR range, wavelet methods generally show smaller numbers of errors than Gaussian smoothing. The same was observed in the statistical analysis. Via plots of the distribution of p-values under the null hypothesis, we have shown that after the less smoothing wavelet-based denoising methods and after modest Gaussian smoothing, fMRI data do not violate the assumption of normally distributed temporal noise. All tested denoising methods preserved the PRDS property of fMRI data, which allowed us to use the favourable η = 1 setting for the FDR controlling procedure.

For the real fMRI data set, only the smallest Gaussian smoothing kernel yielded reliable results. The wide smoothing kernels yielded much larger detected areas (meaning more type I errors), in contrast to those obtained via less smoothing wavelet denoising methods.

Summarising all of these results, wavelet denoising methods that introduce relatively little smoothness are generally preferable over Gaussian smoothing for denoising





Figure 3.21. Statistical parametric maps of the area surrounding the active spot. The original images contained 1/f noise with a spatial SNR of 14 dB.

Figure 3.22. Activation detected by the SPM method in the original fMRI time series, after FDR thresholding with the FDR parameter set to q = 0.05.


[Figure 3.23 panels, row by row: Hybrid, WaveJS, InvShrink, MultiMAD; MinMax, SURE(h), SURE(s), VisuThresh(h); VisuThresh(s), FWHM = 5×5, FWHM = 10×10, FWHM = 15×15.]

Figure 3.23. Activation detected with the SPM method in the denoised fMRI time series, after FDR thresholding with the FDR parameter set to q = 0.05.

fMRI time series. In particular, InvShrink, MinMaxThresh or SUREThresh (h) are safe choices. For low SNRs, the methods MultiMAD and SUREThresh (s) are best applied.

We expect to find similar results for PET data, although there are differences with fMRI regarding noise models and the SNR. We did not use temporal denoising of the time series in this study, but wavelet denoising may prove a good alternative to smoothing in time as well. This will be the subject of future work.


Modelling the haemodynamic response function

Chapter 4

Extracting the haemodynamic response function using Fourier-wavelet regularised deconvolution

Abstract

We present a method to extract the haemodynamic response function (HRF) from a functional magnetic resonance imaging (fMRI) time series. The method is based on Fourier-wavelet regularised deconvolution (ForWaRD). The extraction algorithm is very general: it relies only on the assumptions of the general linear model (GLM) and the fact that signal and noise can be distinguished in the frequency and wavelet domain, respectively. Before extracting the HRF, low-frequency trends are removed from the time signals by a standard wavelet-based method. The combined routine of detrending and extraction is tested extensively, using noise from an fMRI data set and simulated event-related activation. The output of the extraction program is a time series of image volumes, containing the HRF at each voxel location. Such a time series may be used in many fMRI-related problems, like defining region-specific HRFs (by combining the HRFs found in a specific region), finding a set of basis functions to efficiently describe the HRF (by decomposing the extracted HRF into a more general set of functions), or comparing subject-specific HRFs. A new HRF model is introduced, and it is used in combination with the extraction method to describe fMRI responses. The use of these modelled responses is demonstrated in the analysis of an event-related fMRI experiment. Test results show that using subject-specific, regional HRFs dramatically improves the detection of active regions in fMRI.


4.1 Introduction

Functional magnetic resonance imaging (fMRI) is one of the most versatile methods for functional neuroimaging. Regional brain activation is accompanied by temporal changes in blood oxygenation, generating a blood oxygenation level dependent (BOLD) contrast in MR images (Ogawa et al. 1990). Many of the current analysis methods for fMRI time series are based on hypothesis testing. Given a model for the response to a stimulus pattern, a statistic of the correspondence between the predicted signal and the measured signal can be computed in every voxel. The most widely used technique based on hypothesis tests is statistical parametric mapping (Friston et al. 1995c). Given a noise distribution, this technique uses the estimated parameters of the noise in every voxel to determine a threshold for the computed statistic. The general linear model (GLM) treats the response to a stimulus pattern as the output of a linear, time-invariant (LTI) system (Boynton et al. 1996). Two consequences of the GLM are that (i) the response to stimuli of one type can be modelled by convolving the time pattern of those stimuli with the corresponding response function, and (ii) the total response is the sum of responses to all stimulus types (see Fig. 4.1a). The stimulus pattern is known from the experimental setup, and the haemodynamic response function (HRF), the temporal change in blood oxygenation that generates the BOLD contrast, is unknown. The HRF is the BOLD response to an impulse. It is usually modelled as a smooth curve, starting about 2 seconds after the impulse, peaking about 6 seconds after the impulse, and returning to baseline in about 30 seconds.
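The two consequences of the GLM mentioned above, convolution per stimulus type and summation over types, can be illustrated with a short sketch. The function names and the toy stimulus and response values in the usage note are hypothetical and serve only to show the mechanics of an LTI model on a discrete time grid.

```python
def convolve(stimulus, hrf):
    """Response to one stimulus type: stimulus pattern convolved with its HRF.

    The output is truncated to the length of the stimulus pattern.
    """
    out = [0.0] * len(stimulus)
    for t, s in enumerate(stimulus):
        if s == 0:
            continue
        for k, h in enumerate(hrf):
            if t + k < len(out):
                out[t + k] += s * h
    return out


def total_response(stimuli, hrfs):
    """Total noise-free response: sum of the responses to all stimulus types."""
    parts = [convolve(s, h) for s, h in zip(stimuli, hrfs)]
    return [sum(vals) for vals in zip(*parts)]
```

With a toy response [1, 2] and stimuli at scans 0 and 1, the responses overlap at scan 1, and the output [1, 3, 2, 0] is the sum of the two shifted copies, which is exactly the overlap that deconvolution has to undo.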

The focus of this chapter is on extracting the HRF from fMRI data. The method proposed in this chapter is based on Fourier-wavelet regularised deconvolution, ForWaRD for short, which was developed recently (Neelamani et al. 2004). ForWaRD combines deconvolution in the frequency domain with regularisation in the frequency domain and in the wavelet domain. The advantage of deconvolution in the frequency domain is the ability to deal with overlapping responses. Its main weakness is noise amplification. Noise can be reduced in the frequency domain by shrinking frequency coefficients, but noise and signal may be difficult to separate. ForWaRD solves this by using wavelet domain Wiener shrinkage. ForWaRD is related to a number of recent wavelet-based techniques for noise suppression during deconvolution (Sanchez-Avila 2002, Kalifa et al. 2003). An advantage of ForWaRD is that signal and response can be interchanged without violating the assumptions of the algorithm.

The novelty of our application of deconvolution is that the underlying signal (the stimulus pattern) is known and that the response function is reconstructed. The only prerequisites are an fMRI data set, the stimulus time pattern, and the GLM. The current method extracts one HRF per experiment; in the future this may be extended to multiple HRFs (e.g., for different stimulus types) and multiple time series (e.g., extracting a mean HRF in a group study). The output of our method is a post-stimulus time series of image volumes, containing the HRF at each voxel location. An example application of such a time series is the extraction of subject-specific, group-specific, and region-specific HRFs, which is easily implemented by averaging time signals from the corresponding

[Figure 4.1: (a) four traces labelled i-iv on a time axis from 0 to 256; (b) three traces labelled i-iii on a time axis from 0 to 32.]

Figure 4.1. (a) Inputs and output of the GLM: (i) the impulse response function, (ii) the stimulus pattern, (iii) the total response without noise, (iv) the noisy response. (b) Various stages of the ForWaRD algorithm: (i) after frequency domain inversion, (ii) after frequency domain shrinkage, (iii) after subsequent wavelet domain Wiener shrinkage.

volumes.

Extracting an HRF from fMRI data is difficult, and most current methods require strong assumptions about the data. The simplest way to acquire the HRF is selective averaging: use a long interstimulus interval (ISI) and assume that responses do not overlap (Bandettini and Cox 2000, Buckner et al. 1996, Aguirre et al. 1998). Selective averaging works, but because of the long ISI that is required, it is very time-consuming. Another method is averaging trials with overlapping responses, ignoring the fact that overlapping responses introduce errors (Boynton et al. 1996, Zarahn et al. 1997, Burock et al. 1998, Dale 1999). More advanced techniques use a function to describe the HRF, and determine the parameters of that function via curve-fitting (Glover 1999, Miezin et al. 2000, Hinrichs et al. 2000, Ollinger et al. 2001b). Another approach, based on the GLM, is the expansion of the HRF into a set of basis functions (Friston et al. 1995a, Josephs et al. 1997, Friston et al. 1998a).

Ciuciu et al. (2003) use a Bayesian method to extract the HRF. They assume a causal, smooth HRF that starts and ends at baseline, for each stimulus type. A Gaussian temporal autocorrelation is imposed on the HRF. The method is capable of extracting multiple HRFs from multiple experiments in one run.

We present a new model for the HRF, based on general notions in dynamical systems theory. The model is used in combination with the HRFs extracted from a first fMRI experiment, to predict the responses in a second fMRI experiment. Results indicate that a modelled HRF based on a region-specific extracted HRF yields a more precise estimator than a standard HRF.

A difference between our HRF model and other models in use today is that it was derived from systems theory. Many HRF models lack a theoretical description of fMRI responses. Many groups use a Poisson function (Boynton et al. 1996, Miezin et al. 2000) or extensions thereof (Friston et al. 1998a, Gossl et al. 2001). These functions are popular because they seem to fit the data very well, but they are not based on a model of the BOLD response. Studies have shown that the BOLD response is not linear if the durations or the amplitudes of the stimuli vary (Vazquez and Noll 1998, Friston et al. 2000a). This chapter treats only fMRI experiments with very short stimuli, and the GLM is assumed to be valid.

The remainder of this chapter is organised as follows. Section 4.2 describes how fMRI time signals are modelled. Section 4.3 treats the problem of regularisation, and presents the regularisation methods used in this chapter. A method for extracting the HRF is given in section 4.3.3. The method is tested on simulated activation in section 4.4. A model for the BOLD HRF based on dynamical systems is presented, and noise from an fMRI time series is superimposed on the modelled activation. The model is used with the extracted HRFs in section 4.5. The test data are from an event-related fMRI experiment of a motor task, consisting of two runs. The first run uses a fixed interstimulus interval, the second run uses random interstimulus times. The HRF model and the coefficients extracted from one experiment are used to predict the responses in the other experiment. Results show that using the modelled HRF leads to an improvement in the localisation of activation, and increased statistical significance, compared to the standard method. Section 4.6 contains some general conclusions.

4.2 Modelling fMRI Time Signals

The common method for hypothesis testing in fMRI is statistical parametric mapping (SPM). SPM assumes independent, identically distributed (i.i.d.) Gaussian temporal noise. After parameterising the noise, SPM consists of:

1: computing a statistic at every voxel location;
2: choosing a threshold based on the parameters of the noise and the correction for multiple testing;
3: thresholding the map of statistic values.

4.2.1 The General Linear Model

The GLM describes the response in an fMRI experiment as a weighted sum of explanatory signals. An explanatory signal models the response to stimuli of one type. Let the T×N matrix Y denote the data of an fMRI experiment, where each element $y_{ij}$ is the measurement at time i = 1, . . . , T and voxel location j = 1, . . . , N. According to the GLM,

Y = Xβ + e. (4.1)


The T×M matrix X is the design matrix, whose column vectors are the explanatory signals. These are multiplied by the weights in the M×N matrix β. The T×N matrix e contains the residual signal at each voxel location in each scan. A least-squares estimate b for β is given by $b = (X^T X)^{-1} X^T Y$. Given a Gaussian temporal distribution of the residuals, which follows from the GLM if Y contains Gaussian temporal noise, the significance of the elements of b can be computed in each voxel via standard hypothesis tests. For stationary noise, the threshold need only be determined once and can be applied to the entire map of statistic values.
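As an illustration of the least-squares estimate, the following minimal Python sketch fits a single voxel time signal with a design matrix of two columns: a constant and one explanatory signal. In that case $X^T X$ is a 2×2 matrix that can be inverted in closed form. The function name `glm_fit` is hypothetical and is not taken from the SPM program or from the implementation used in this chapter.

```python
def glm_fit(x, y):
    """Least-squares fit of y = b0 + b1 * x + e via the normal equations.

    The design matrix X has two columns (a constant and the explanatory
    signal x), so X^T X is 2x2 and is inverted in closed form.
    """
    n = len(y)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx            # determinant of X^T X
    if det == 0:
        raise ValueError("design matrix is rank deficient")
    b0 = (sxx * sy - sx * sxy) / det   # weight of the constant column
    b1 = (n * sxy - sx * sy) / det     # weight of the explanatory signal
    residual = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residual


# Noise-free example: baseline 2 and response amplitude 3.
x = [0, 1, 0, 1, 0, 1]
y = [2 + 3 * v for v in x]
b0, b1, residual = glm_fit(x, y)
```

For a design with more columns the same normal equations apply, but a general linear solver would be needed instead of the closed-form 2×2 inverse.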

4.2.2 Determining the HRF

In the GLM, the explanatory signal $g_{f,h}$, representing responses with shape h to a stimulus pattern f, can be written as a convolution:

$$ g_{f,h}(n) = (h * f)(n), \quad n = 1, \ldots, N, \tag{4.2} $$

with ‘∗’ denoting discrete circular convolution. A convolution in the time domain is a pointwise multiplication in the frequency domain, and a deconvolution is a pointwise division:

$$ G_{f,h}(k) = H(k)\,F(k), \quad \text{and} \quad H(k) = G_{f,h}(k) / F(k), \quad k = 1, \ldots, N, \tag{4.3} $$

where F(k), $G_{f,h}(k)$ and H(k) denote the Fourier transforms of f(n), $g_{f,h}(n)$ and h(n), respectively. If the signal contains unmodelled components, represented in (4.1) by e, these are also present in the extracted response (see Fig. 4.1b). At frequencies k where F(k) is small, noise is amplified. If F(k) = 0, the deconvolution problem is singular. To cope with these situations, regularisation is required. The ForWaRD regularisation scheme, used in this chapter, is described in section 4.3.
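The relation between circular convolution and pointwise multiplication of Fourier coefficients can be checked numerically. The sketch below is a minimal illustration in plain Python with a naive $O(N^2)$ discrete Fourier transform (an FFT would be used in practice). The function names and the example signals are hypothetical, and frequencies where |F(k)| falls below a small tolerance are simply set to zero.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve(h, f):
    """Discrete circular convolution (h * f)(i) for equal-length signals."""
    n = len(f)
    return [sum(h[m] * f[(i - m) % n] for m in range(n)) for i in range(n)]


def deconvolve(g, f, eps=1e-12):
    """Pointwise division G/F; coefficients with |F(k)| <= eps become zero."""
    G, F = dft(g), dft(f)
    H = [Gk / Fk if abs(Fk) > eps else 0.0 for Gk, Fk in zip(G, F)]
    return [v.real for v in dft(H, inverse=True)]


# Stimulus pattern whose spectrum has no zeros, and a short response shape.
f = [1, 0, 0, 1, 0, 1, 0, 0]
h = [0, 1, 2, 1, 0, 0, 0, 0]
g = circular_convolve(h, f)
h_rec = deconvolve(g, f)   # recovers h up to rounding errors
```

Without noise the division recovers h. With noise added to g, the coefficients at frequencies where |F(k)| is small dominate the error, which is the situation that regularisation addresses.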

4.2.3 Modelling the HRF

The temporal resolution of fMRI data may be too coarse to accurately describe the HRF. A description at a finer scale is often obtained by using a model for the HRF.

We present an HRF model based on a linear system showing damped oscillations. The (canonical) differential equation that describes such a system, represented by state variable O(t), is:

$$ O''(t) + O'(t) + O(t) = 1, \quad \text{where} \quad O(0) = 0,\ O'(0) = 0. \tag{4.4} $$

The primes denote derivatives with respect to the time variable t. O(t) represents the regional blood oxygenation level, with value 0 before t=0, and value 1 as the new level after a transition period. Equation (4.4) has a unique solution:

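$$ O(t) = 1 - \left( \frac{\sqrt{3}}{3} \sin\left(\frac{\sqrt{3}\,t}{2}\right) + \cos\left(\frac{\sqrt{3}\,t}{2}\right) \right) e^{-t/2} \tag{4.5} $$

and O′(t) can be derived:

$$ O'(t) = \frac{2}{3}\sqrt{3}\, \sin\left(\frac{\sqrt{3}\,t}{2}\right) e^{-t/2}, \tag{4.6} $$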

see Fig. 4.2. The first-order Taylor expansion of O(t) states that O(t+dt) = O(t) + O′(t)dt, so that the response to a longer stimulus can be modelled by convolving the stimulus with O′(t). We define a function HRFpar based on O′(t) with parameters H(eight), D(ilation), P(eriod) and L(ag):

$$ \mathrm{HRF}_{\mathrm{par}}^{H,D,P,L}(t) = \begin{cases} H \sin\left(\dfrac{t-L}{P}\right) e^{\frac{-t+L}{D}}, & \text{if } t > L \\ 0, & \text{otherwise} \end{cases} \tag{4.7} $$

In our tests we used H = 4, D = 6, P = 3, and L = 2, i.e.,

$$ \mathrm{HRF}_{\mathrm{do}}(t) = \mathrm{HRF}_{\mathrm{par}}^{4,6,3,2}(t), \tag{4.8} $$

to resemble other HRFs in use today (see Fig. 4.4). The subscript ‘do’ refers to ‘damped oscillator’.
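The functions of Eqs. (4.5) to (4.8) are straightforward to evaluate. The following Python sketch is one possible transcription; the function names are hypothetical, and defining O(t) and O′(t) as zero for t ≤ 0 is an assumption that follows the description of the oxygenation level before t=0.

```python
import math

SQRT3 = math.sqrt(3.0)


def oxygenation(t):
    """O(t) of Eq. (4.5): solution of O'' + O' + O = 1 with O(0) = O'(0) = 0."""
    if t <= 0:
        return 0.0
    a = SQRT3 * t / 2.0
    return 1.0 - (SQRT3 / 3.0 * math.sin(a) + math.cos(a)) * math.exp(-t / 2.0)


def oxygenation_rate(t):
    """O'(t) of Eq. (4.6)."""
    if t <= 0:
        return 0.0
    return 2.0 / 3.0 * SQRT3 * math.sin(SQRT3 * t / 2.0) * math.exp(-t / 2.0)


def hrf_par(t, height, dilation, period, lag):
    """HRF_par of Eq. (4.7); zero for t <= lag."""
    if t <= lag:
        return 0.0
    return height * math.sin((t - lag) / period) * math.exp((lag - t) / dilation)


def hrf_do(t):
    """HRF_do of Eq. (4.8): HRF_par with H = 4, D = 6, P = 3 and L = 2."""
    return hrf_par(t, 4.0, 6.0, 3.0, 2.0)


# HRF_do sampled at a repetition time of 3 s, for eight scans after a stimulus.
samples = [hrf_do(3.0 * i) for i in range(8)]
```

Setting the derivative of Eq. (4.7) to zero gives tan((t − L)/P) = D/P at the first maximum, so with these parameters HRFdo peaks at t = 2 + 3 arctan(2), roughly 5.3 s after the stimulus. With H = P = 2/√3, D = 2 and L = 0, `hrf_par` reduces to O′(t) of Eq. (4.6).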

[Figure 4.2: plot against time (s), from 0 to 30; the vertical axis runs from ‘old level’ to ‘new level’.]

Figure 4.2. The change in the blood oxygenation (solid line) and its time derivative (dashed line) modelled by Eq. (4.5) and (4.6), respectively.

The other HRF model used in this chapter is the canonical HRF from the SPM program (Friston et al. 1995c), denoted by HRFspm(t), which is the sum of two Poisson functions. Poisson functions are often used, without motivation, to describe the HRF. A Poisson function $p_{m,l}(t)$ with shape parameter m and dilation parameter l has the form:

$$ p_{m,l}(t) = \frac{l^m\, t^{m-1}\, e^{-lt}}{\Gamma(m)}, \quad \text{where} \quad \Gamma(m) = \int_0^\infty t^{m-1} e^{-t}\, dt. \tag{4.9} $$


The standard HRF in the SPM program has the form

$$ \mathrm{HRF}_{\mathrm{spm}}(t) = p_{6,1}(t) - \frac{1}{6}\, p_{16,1}(t). \tag{4.10} $$

It is often used in combination with (i) its temporal derivative and (ii) its derivative with respect to dilation, to capture small response variabilities within an experiment (Friston et al. 1998a, Henson et al. 2002). In this chapter, HRFspm is used as a reference for evaluating our model.
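Equations (4.9) and (4.10) can be evaluated directly with the gamma function from the Python standard library. The sketch below is illustrative only: the function names are hypothetical, it is not code from the SPM program, and returning zero for t ≤ 0 is an assumption.

```python
import math


def poisson_fn(t, m, l):
    """p_{m,l}(t) of Eq. (4.9); taken to be zero for t <= 0."""
    if t <= 0:
        return 0.0
    return l ** m * t ** (m - 1) * math.exp(-l * t) / math.gamma(m)


def hrf_spm(t):
    """HRF_spm of Eq. (4.10): p_{6,1}(t) - p_{16,1}(t) / 6."""
    return poisson_fn(t, 6, 1) - poisson_fn(t, 16, 1) / 6.0


# HRF_spm sampled at a repetition time of 3 s.
samples = [hrf_spm(3.0 * i) for i in range(11)]
```

The first term peaks at t = 5 s and the second term, which peaks at t = 15 s, produces the undershoot: `hrf_spm(15.0)` is negative.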

4.3 Regularisation

4.3.1 Shrinkage I: the Frequency Domain

Deconvolution problems are generally ill-posed, i.e., a solution does not exist, or it is not unique, or it is not stable, and regularisation is required. For one stimulus signal, one response function and additive noise, Eq. (4.2) becomes:

g(n) = (h ∗ f)(n) + e(n). (4.11)

An estimate $H_{\mathrm{est}}$ of the Fourier transform of h is given by (4.3):

$$ H_{\mathrm{est}}(k) = G(k)/F(k) = \begin{cases} H(k) + \dfrac{E(k)}{F(k)}, & \text{if } |F(k)| > 0 \\ 0, & \text{otherwise,} \end{cases} \tag{4.12} $$

where E(k) is the Fourier transform of e(n). Regularisation is performed by shrinking the estimate at frequencies where F(k) is small. Given a regularisation parameter τ and an estimate $\sigma_e$ for the noise strength, a Wiener filter multiplies each frequency coefficient $H_{\mathrm{est}}(k)$ with a factor λ:

$$ H_\lambda(k) = H_{\mathrm{est}}(k)\,\lambda(k), \quad \text{where} \quad \lambda(k) = \frac{|F(k)|^2}{|F(k)|^2 + \tau \dfrac{N \sigma_e^2}{|H(k)|^2}}. \tag{4.13} $$

The HRF estimate $h_\lambda(n)$ is the inverse Fourier transform of $H_\lambda(k)$. This regularisation method is known as Wiener shrinkage. Wiener shrinkage minimises $|h_\lambda - h|^2$. Where F(k) is large, λ(k)≈1 and where F(k) is small, λ(k)≈0. Wiener shrinkage is the optimal method to remove noise from regular (smooth) signals, but signals with irregularities (such as steep edges) are handled less well. Irregularities contain high frequencies, so either noise is not suppressed, or artifacts (such as ringing) occur (Neelamani et al. 2004). Wiener shrinkage requires the power spectrum $|H(k)|^2$ of h, which has to be obtained separately, e.g., by estimating it iteratively (Hillery and Chin 1991). Tikhonov shrinkage, which computes λ(k) as

$$ \lambda(k) = \frac{|F(k)|^2}{|F(k)|^2 + \tau^2}, \tag{4.14} $$

does not require h’s power spectrum. Optimal values for the regularisation parameter τ in (4.13) and (4.14) are derived from the strength of the signal and of the noise. Given a signal g of length N and mean $\mu_g$ and a noise strength estimate $\sigma_e$, optimal values for τ are in the range $[0.01, 10]\, N \sigma_e^2 / |g - \mu_g|^2$ (Neelamani et al. 2004).
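To make the effect of Tikhonov shrinkage concrete, the following Python sketch applies the factor of Eq. (4.14) to the estimate of Eq. (4.12). It uses a naive discrete Fourier transform so that it needs only the standard library; the function names and the example signals are hypothetical. The product λ(k)G(k)/F(k) is computed in the equivalent form G(k)F*(k)/(|F(k)|² + τ²), with F* the complex conjugate, which avoids dividing by a very small F(k).

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n))
        out.append(s / n if inverse else s)
    return out


def tikhonov_deconvolve(g, f, tau):
    """Estimate h from g = h * f + e using the shrinkage factor of Eq. (4.14)."""
    G, F = dft(g), dft(f)
    H = []
    for Gk, Fk in zip(G, F):
        denom = abs(Fk) ** 2 + tau ** 2
        # lambda(k) * G(k) / F(k) == G(k) * conj(F(k)) / (|F(k)|^2 + tau^2)
        H.append(Gk * Fk.conjugate() / denom if denom > 0 else 0.0)
    return [v.real for v in dft(H, inverse=True)]


# A stimulus pattern whose spectrum vanishes at every odd frequency.
f_singular = [1, 0, 0, 0, 1, 0, 0, 0]
g_example = [0, 1, 2, 1, 0, 1, 2, 1]
h_reg = tikhonov_deconvolve(g_example, f_singular, 0.1)
```

With τ = 0 and a stimulus spectrum without zeros the routine reduces to plain deconvolution. A larger τ shrinks every coefficient further, trading noise amplification for bias.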

4.3.2 Shrinkage II: Wavelets and ForWaRD

If the impulse response h(n) belongs to a smoothness class (i.e., if it is regular according to some regularity measure (Donoho and Johnstone 1995, Cai 2003)), and if F(k) does not contain zeros, the irregularity in the system is caused by the noise e. This turns the regularisation problem into a denoising problem. ForWaRD (see Algorithm 4.1) regularises the deconvolution with frequency domain shrinkage and additional wavelet domain Wiener shrinkage (Ghael et al. 1997). Wavelet domain Wiener shrinkage is a very powerful regularisation method for signals containing irregularities. It requires an estimate of the (regular part of the) signal. ForWaRD uses the wavelet transform to obtain this estimate.

A wavelet transform describes a signal $c^0$ as a sum of localised basis functions. The regular and irregular part $c^1$ and $d^1$ are written as weighted sums of shifted and dilated versions of a scaling function φ and an accompanying wavelet ψ, respectively. Analysis at multiple levels is done by dividing subsequent $c^j$ into $c^{j+1}$ and $d^{j+1}$. The corresponding inverse wavelet transform uses $c^j$ and $d^j$ to reconstruct $c^{j-1}$. The wavelet transform of a signal is efficiently computed via the fast wavelet transform, FWT (Mallat 1989). The FWT transforms a signal of length N in O(N) computations into a transform of length N. Efficiency is obtained by downsampling at each level. The FWT is not translation-invariant, making it less useful for deconvolution. A wavelet transform using polyphase decomposition (subsample for all possible shifts) is translation-invariant. The size of a level J transform is (J + 1)×N, its complexity is $O(N \log_2(N))$ (Mallat 1991). This transform is denoted by SI-DWT (shift-invariant discrete wavelet transform), its inverse by SI-IDWT.
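The structure of a shift-invariant transform can be illustrated with the simplest possible filter pair. The Python sketch below implements an undecimated transform with unnormalised Haar averaging and differencing and circular boundary handling. This is an assumption made for brevity: it is not one of the Daubechies, symmetric or Coiflet filters tested in this chapter, and the function names are hypothetical. Instead of downsampling, the two filter taps move apart by a factor of two at each level, so every level keeps N coefficients.

```python
def si_dwt_haar(signal, levels):
    """Undecimated Haar transform with circular boundary handling.

    Returns (approx, details): approx is c^J and details[j - 1] is d^j.
    Each has length N, so the transform holds (levels + 1) * N values.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for j in range(levels):
        step = 2 ** j   # taps move apart instead of downsampling the signal
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, shifted)]
    return approx, details


def si_idwt_haar(approx, details):
    """Inverse of si_dwt_haar, using c^{j-1}(n) = c^j(n) + d^j(n)."""
    rec = list(approx)
    for d in reversed(details):
        rec = [c + dj for c, dj in zip(rec, d)]
    return rec


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx, details = si_dwt_haar(x, 2)
rec = si_idwt_haar(approx, details)   # equals x up to rounding errors
```

Shifting the input circularly by one sample shifts every row of the output by one sample, which is the translation invariance that the downsampled FWT lacks.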

ForWaRD applies the wavelet domain Wiener shrinkage to the estimate $h_\lambda$, see (4.13) and (4.14). For smooth signals, most energy is stored in the approximation $c^J$, and the coefficients of $d^j$ are small (Donoho and Johnstone 1995). Large coefficients of $d^j$ appear at irregularities in the underlying signal. The regular and irregular parts of the signal are separated: $c^J$ and large coefficients of $d^j$ are regarded as signal, the rest is noise. Two different wavelet transforms of $h_\lambda$, represented by the basis functions $(\phi_1,\psi_1)$ and $(\phi_2,\psi_2)$, respectively, are similar. ForWaRD uses this similarity as follows. A first estimate of the underlying signal is obtained by computing the SI-DWT of $h_\lambda$, using $(\phi_1,\psi_1)$, and thresholding the detail coefficients $\{d_1^j(n)\}_{j=1}^{J}$ ($n = 1, \ldots, N/2^j$) to remove noise. The threshold is $\theta\sigma_e$, with $\theta \in \{1, 2, 3, 4\}$. The noise standard deviation $\sigma_e$ is estimated using the median absolute value (MAD) of the first-level detail coefficients (Donoho and Johnstone 1995). This estimate of the (wavelet) spectrum of the underlying signal is used for the second step, wavelet domain Wiener shrinkage. After computing a second SI-DWT using $(\phi_2,\psi_2)$, its


Given: signal g, stimulus pattern f, wavelet basis functions $(\phi_1, \psi_1)$ and $(\phi_2, \psi_2)$

 1: g --SI-DWT--> estimate $\sigma_e$ using the MAD (see section 4.3.2)
 2: compute τ using $\sigma_e$, g (see section 4.3.1)
 3: g --FFT--> G, f --FFT--> F
 4: first estimate: $H_{\mathrm{est}}$ := G/F
 5: if {Wiener shrinkage} then
 6:   approximate $|H|^2$ using F, G (see (Hillery and Chin 1991))
 7:   compute λ using τ, $\sigma_e$, F, $|H|^2$ (see Eq. (4.13))
 8: else {Tikhonov shrinkage}
 9:   compute λ using τ, F (see Eq. (4.14))
10: end if
11: shrink: $H_\lambda$ := $H_{\mathrm{est}}\,\lambda$
12: $H_\lambda$ --IFFT--> $h_\lambda$
13: $h_\lambda$, $(\phi_1, \psi_1)$ --SI-DWT--> $c_1^J$, $\{d_1^j\}_{j=1}^{J}$
14: $h_\lambda$, $(\phi_2, \psi_2)$ --SI-DWT--> $c_2^J$, $\{d_2^j\}_{j=1}^{J}$
15: using θ, $\sigma_e$: $d_1^j$ --threshold--> $\{\bar{d}_1^j\}_{j=1}^{J}$ (see section 4.3.2)
16: compute $\kappa^j$ using $\bar{d}_1^j$, $\sigma_e$ (see Eq. (4.15))
17: shrink: $d_{\kappa,2}^j$ := $d_2^j\,\kappa^j$
18: final estimate: $(c_2^J, \{d_{\kappa,2}^j\}_{j=1}^{J})$, $(\phi_2, \psi_2)$ --SI-IDWT--> $h_\kappa$

Algorithm 4.1: ForWaRD in pseudo-code

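detail coefficients are shrunk: $d_{\kappa,2}^j(n) = d_2^j(n)\,\kappa^j(n)$, where

$$ \kappa^j(n) = \frac{|\bar{d}_1^j(n)|^2}{|\bar{d}_1^j(n)|^2 + \sigma_e^2}. \tag{4.15} $$

Here, $\bar{d}_1^j(n)$ denotes $d_1^j(n)$ after thresholding. The final estimate $h_\kappa$ is the SI-IDWT of $c_2^J$ and $\{d_{\kappa,2}^j\}_{j=1}^{J}$.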
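The wavelet-domain steps (noise estimation, thresholding and the shrinkage factor of Eq. (4.15)) can be sketched for one level of detail coefficients as follows. Several choices here are assumptions rather than facts from the text: the threshold is a hard threshold, the MAD is divided by the usual constant 0.6745 to make it consistent with the standard deviation of Gaussian noise, and all function names are hypothetical.

```python
import statistics


def mad_sigma(detail):
    """Noise level from the median absolute value of detail coefficients.

    Dividing by 0.6745 makes the estimate consistent with the standard
    deviation for Gaussian noise.
    """
    return statistics.median(abs(d) for d in detail) / 0.6745


def hard_threshold(detail, theta, sigma_e):
    """Set coefficients with magnitude at most theta * sigma_e to zero."""
    limit = theta * sigma_e
    return [d if abs(d) > limit else 0.0 for d in detail]


def wiener_gain(thresholded, sigma_e):
    """Shrinkage factor kappa^j(n) of Eq. (4.15)."""
    s2 = sigma_e ** 2
    return [d * d / (d * d + s2) if d * d + s2 > 0 else 0.0
            for d in thresholded]


def shrink_details(d1, d2, theta, sigma_e):
    """Shrink the second-basis details d2 with factors derived from d1."""
    kappa = wiener_gain(hard_threshold(d1, theta, sigma_e), sigma_e)
    return [k * d for k, d in zip(kappa, d2)]


# One level of detail coefficients from two different wavelet bases.
d1 = [0.1, -0.2, 5.0, 0.05]
d2 = [0.2, -0.1, 4.0, 0.1]
shrunk = shrink_details(d1, d2, theta=3.0, sigma_e=1.0)
```

In this example only the third coefficient survives the threshold, so it is the only one that remains non-zero after shrinkage; it is multiplied by 25/26.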

4.3.3 Using ForWaRD to extract the HRF

Given a stimulus pattern f and an fMRI time series, ForWaRD (see Algorithm 4.1) was used to extract an HRF in each voxel. The BOLD response is the time signal relative to the baseline, which is usually the time series mean. Low-frequency trends that are not synchronous to the stimulus pattern are a possible source of artifacts. These were removed as much as possible beforehand by a wavelet technique known from the literature (Meyer 2003): each time signal (with length N) is transformed to the wavelet domain using an FWT of $\log_2(N) - 3$ levels, the detail coefficients are removed, and the remaining low-scale signal is subtracted from the time series. The total extraction routine for each voxel is:

1: load the time series g and the stimulus pattern f;
2: subtract the time series mean;
3: remove low-frequency trends;
4: apply ForWaRD to g, estimate the HRF $h_\kappa$ to the stimuli with pattern f.

Time signals at different voxel locations can be processed independently, which enables partitioning the images during extraction to reduce the computation load. Given a maximum block size B, the time signals are read, processed, and written B at a time. The output of the algorithm is a time series of volumes which, inside activated brain areas, contain the HRF. These routines were implemented in MatLab (The Mathworks, USA). The next section describes a series of tests, using simulated time series with activations of known shape and strength. Section 4.5 presents test results on an event-related fMRI data set.
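Steps 2 and 3 of the routine and the block-wise processing can be sketched in Python as follows. The sketch assumes a Haar wavelet for the detrending step, which the text does not specify: for a Haar FWT with J levels, the low-scale signal that remains after removing all detail coefficients is the mean of each block of $2^J$ samples. The function names are hypothetical and the signal length is assumed to be a power of two.

```python
import math


def haar_trend(signal, levels):
    """Low-scale part of a Haar FWT: the mean of each block of 2**levels samples."""
    block = 2 ** levels
    if len(signal) % block:
        raise ValueError("signal length must be a multiple of 2**levels")
    trend = []
    for start in range(0, len(signal), block):
        mean = sum(signal[start:start + block]) / block
        trend.extend([mean] * block)
    return trend


def preprocess(signal):
    """Subtract the time series mean, then the low-scale (trend) signal."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    levels = int(math.log2(n)) - 3   # log2(N) - 3 levels, as in the text
    trend = haar_trend(centred, levels)
    return [v - t for v, t in zip(centred, trend)]


def in_blocks(signals, block_size):
    """Yield the voxel time signals at most block_size at a time."""
    for start in range(0, len(signals), block_size):
        yield signals[start:start + block_size]


# Two voxel time signals of 128 scans, processed in blocks of at most B = 1.
voxels = [[float(i) for i in range(128)], [5.0] * 128]
cleaned = [preprocess(s) for block in in_blocks(voxels, 1) for s in block]
```

For N = 128 this gives a four-level transform, so the estimated trend is constant over blocks of 16 scans. A smoother wavelet would give a smoother trend estimate than this piecewise-constant one.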

4.4 Simulation Tests

4.4.1 Test Setup

The routine presented in section 4.3.3 was tested on signals with varying properties (SNR, temporal resolution, low-frequency trends), while the parameters of the routine (decomposition level, wavelet filters, etc.) were also varied. Figure 4.5 shows the test setup: (i) create an activation signal with noise and a low-pass trend (Fig. 4.5a-b), (ii) recover the HRF from this noisy signal (Fig. 4.5c), (iii) reconstruct the activation signal (Fig. 4.5d), and (iv) measure the mean square error (MSE) between the original activation signal and the reconstruction.

Using noise from a real time series of MR images

A sequence of 128 scans of a subject at rest (no activation) was acquired on a 3T Intera system (Philips Medical Systems, the Netherlands), with a repetition time (TR) of 3 s and images of 64×64×46 voxels, with a voxel size of 3.5×3.5×3.5 mm³. A sample of 512 time signals was collected from these images by selecting a region of 8×8×8 voxels from this data set (see Fig. 4.3).

Adding simulated activation

A randomised stimulus signal was created by thresholding a vector containing random values. A time signal s(n), n = 1, . . . , N was made by convolving the stimulus signal with an HRF. We used two different HRFs in our tests (see Fig. 4.4): HRFdo, see Eq. (4.8) and HRFspm, see Eq. (4.10).
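The construction of a simulated activation signal can be sketched in a few lines of Python. The distribution of the random values and the threshold are not specified in the text, so uniform values on [0, 1) and a free threshold parameter are assumptions here, as are the function names. The convolution is circular, as in Eq. (4.2).

```python
import random


def random_stimulus(n, threshold, seed=0):
    """Binary stimulus pattern: 1 where a uniform random value exceeds threshold."""
    rng = random.Random(seed)
    return [1.0 if rng.random() > threshold else 0.0 for _ in range(n)]


def simulate_activation(stimulus, hrf_samples):
    """Circular convolution of the stimulus pattern with a sampled HRF."""
    n = len(stimulus)
    return [sum(hrf_samples[m] * stimulus[(i - m) % n]
                for m in range(len(hrf_samples)))
            for i in range(n)]


# 128 scans with, on average, one stimulus in every ten scans.
stimulus = random_stimulus(128, threshold=0.9, seed=1)
activation = simulate_activation(stimulus, [0.0, 1.0, 2.0, 1.0])
```

Any sampled response shape can be passed as `hrf_samples`; the four values used here are placeholders, not samples of HRFdo or HRFspm.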


Figure 4.3. The area that was sampled to obtain MR noise time signals: (a) transverse view, (b) sagittal view, (c) coronal view.

[Figure 4.4: two panels; the horizontal axis of panel (a) is the peristimulus time (s), from 0 to 30.]

Figure 4.4. (a) The HRFs used for simulating activations: (i) HRFdo and (ii) HRFspm. (b) The modelled low-frequency behaviour: (i) no trend, (ii) linear trend, (iii) sinusoidal trend, (iv) quadratic trend.

Adding low-frequency trends

Four types of low-frequency trends were tested: no trend, and a linear, sinusoidal and quadratic trend (see Fig. 4.4b). Given the standard deviations $\sigma_s$ of the signal, $\sigma_e$ of the noise and $\sigma_t$ of the trend, the noise was amplified by a factor $m_n$ so that

$$ \mathrm{SNR} = 10 \log_{10} \frac{\sigma_s}{m_n \sigma_e} \tag{4.16} $$

had the desired value, and the trend was scaled by a factor $m_t$ so that $m_t \sigma_t = m_n \sigma_e$. The activation, the trend and the noise (see Fig. 4.5a) were added together at each voxel location, resulting in a noisy time signal (see Fig. 4.5b).
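Solving Eq. (4.16) for the noise gain gives $m_n = \sigma_s / (\sigma_e\, 10^{\mathrm{SNR}/10})$. The Python sketch below computes both scaling factors; the function names are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import math
import statistics


def noise_gain(signal, noise, snr_db):
    """m_n such that 10 * log10(sigma_s / (m_n * sigma_e)) equals snr_db."""
    sigma_s = statistics.pstdev(signal)
    sigma_e = statistics.pstdev(noise)
    return sigma_s / (sigma_e * 10.0 ** (snr_db / 10.0))


def trend_gain(noise, trend, m_n):
    """m_t such that m_t * sigma_t equals m_n * sigma_e."""
    return m_n * statistics.pstdev(noise) / statistics.pstdev(trend)


signal = [0.0, 2.0, 0.0, 2.0]
noise = [1.0, -1.0, 1.0, -1.0]
trend = [0.0, 1.0, 2.0, 3.0]
m_n = noise_gain(signal, noise, snr_db=10.0)
m_t = trend_gain(noise, trend, m_n)
noisy = [s + m_n * e + m_t * t for s, e, t in zip(signal, noise, trend)]
```

In this example both standard deviations equal 1, so an SNR of 10 dB under the definition of Eq. (4.16) corresponds to a noise gain of 0.1.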


[Figure 4.5: four panels; the horizontal axes run from 0 to 128 in panels (a), (b) and (d), and from 0 to 18 in panel (c).]

Figure 4.5. The setup of a simulated activation test. (a) The activation signal (i), the low-pass trend (ii) and the noise (iii), (b) the resulting signal, (c) the original (i) and reconstructed (ii) HRF, (d) the original (i) and reconstructed (ii) signal.

Reconstructing the activation signal

Each noisy time signal was processed by the ForWaRD-based extraction routine of section 4.3.3, the HRF was extracted from each time signal, and the signal was reconstructed using the mean HRF. The performance of the HRF extraction routine was measured via the MSE between s and its reconstruction r. The following properties of the signal were varied in the tests: (a) input SNR, (b) low-frequency trends, (c) repetition time (TR), and (d) response onset. All tests were done with both HRFdo(t) and HRFspm(t). Parameters of the routine that were varied: (a) type of frequency domain shrinkage, (b) levels of the wavelet transform, (c) threshold level in the wavelet domain, and (d) the wavelet filters for (4.15). The default test setup was as follows: SNR = 0 dB, HRF = HRFspm, no trend, TR = 2 s, onset delay = 0 s, Tikhonov shrinkage, τ = 0.1, decomposition level = 3, θ = 3, $\phi_1$: Daubechies-4, $\phi_2$: Daubechies-3 (Daubechies 1988). Each test varied one of these parameters.


4.4.2 Test Results

Output MSE as a function of input SNR

Figure 4.6 shows the outcome of a test run with various input SNR values. The MSE decreases for input SNRs up to 5 dB; above 5 dB the MSE increases.

[Figure 4.6: log(MSE), from 10^−1 to 10^2, against SNR (dB), from −5 to 10.]

Figure 4.6. $\log_{10}$(MSE) of the noisy (×) and reconstructed (◦) signals.

Wiener shrinkage vs. Tikhonov shrinkage, choosing τ

The number of iterations of the algorithm (Hillery and Chin 1991) to estimate $|H|^2$ before applying Wiener shrinkage (see line 6 of Algorithm 4.1) was limited to ten. Figure 4.7 shows the MSE with both types of frequency domain shrinkage, for varying SNR and regularisation parameter τ. These graphs show that with heavy regularisation (τ ≥ 1), and for a low SNR, Tikhonov regularisation performs as well as Wiener shrinkage. For the higher SNRs and with mild regularisation, Wiener shrinkage performs better (i.e., smaller MSE).

The best setting of τ depends on the shrinkage type, the SNR and the TR. Figure 4.7 shows that for short TR, mild regularisation (τ ≤ 0.1) yields the best results. A long TR requires heavy shrinkage, and Wiener shrinkage outperforms Tikhonov shrinkage.

Different response delays

For most fMRI analysis techniques such as analysis of variance (ANOVA) and analysis of covariance (ANCOVA), temporal alignment is very important. To correct for small synchronisation errors in the response onset, a temporal derivative of the HRF is sometimes included in the model (Friston et al. 1998a). Figure 4.8 shows that HRFs with different onset delays can equally well be extracted with ForWaRD, which does not use such derivatives. The MSE hardly changes with different delays, indicating that the


[Figure 4.7: four panels, each showing MSE against SNR (dB), from −2 to 6.]

Figure 4.7. Output MSEs of Tikhonov (left) and Wiener (right) shrinkage with a TR of 0.5 s (top) and 3 s (bottom), for different input SNRs and a varying regularisation parameter τ. ×: τ = 0.01, ◦: τ = 0.1, □: τ = 1, ∗: τ = 1.

shape of the response is preserved. This is an attractive alternative to other delay correction methods. The increased MSE for negative shifts is caused by the fact that the HRF was only sampled in the post-stimulus interval.

Extractability of HRFs

We compared the extraction of HRFdo(t) vs. HRFspm(t). Figure 4.9 shows that with a TR of 0.5 s, HRFdo is better reconstructible, and with a TR of 3 s, HRFspm is better reconstructible. The graphs indicate that the reconstructibility of the HRF depends heavily on the temporal resolution.

Different wavelet filters

We tested 15 different wavelet filters for $(\phi_1,\psi_1)$, as well as for $(\phi_2,\psi_2)$: Daubechies wavelets 1 . . . 5 (the filter number indicates the number of vanishing moments), Daubechies’


[Figure 4.8: two panels showing MSE; the horizontal axes, labelled SNR (dB), run from −2 to 2.]

Figure 4.8. Output MSEs with varying response onset delays, for Tikhonov (a) and Wiener (b) shrinkage. SNR = −2 dB (×), 0 dB (◦), 2 dB (□), 4 dB (∗), and 6 dB (+).

[Figure 4.9: two panels, each showing MSE against SNR (dB), from −2 to 6.]

Figure 4.9. Output MSE for different SNRs and different HRFs, (a) TR = 0.5 s, (b) TR = 3 s. ×: HRFdo, ◦: HRFspm.

symmetric wavelets 2 . . . 6 (Daubechies 1993) (filter 1 corresponds to the Daubechies-1 filter), and Coiflets 1 . . . 5 (Daubechies 1993). Different filters did not yield large differences in performance.

Decomposition level and noise threshold

The wavelet-domain threshold level θ also influences the output MSE. Figure 4.10 shows the MSE for different SNRs, different θ and different decomposition levels. We find that two-level decompositions produce the smallest errors for the lower SNRs, and three-level decompositions perform best for the higher SNRs (see Fig. 4.10). Four-level and five-level decompositions yield higher errors. A higher θ often produces a lower MSE.

[Figure 4.10: two panels, each showing MSE against SNR (dB), from −2 to 6.]

Figure 4.10. Output MSE for different SNRs, with various levels of decomposition and threshold levels. Wiener shrinkage was used, θ = 2 (a) and θ = 3 (b). Wavelet transforms: two-level (×), three-level (◦), four-level (□), five-level (∗).

Different low-frequency trends

Four different types of low-frequency trends were tested (see Fig. 4.4b). We observe that only the type of frequency domain shrinkage influences the result significantly (see Fig. 4.11). Tikhonov shrinkage yields a higher MSE than Wiener shrinkage, especially for lower SNRs. This may be because trends are not removed perfectly; Wiener shrinkage has extra knowledge about the signal f and the estimated power spectral density of h, which enables it to deal with the residuals of the trend.

4.4.3 Conclusions

As shown in Fig. 4.6, the MSE of the reconstructed signal was lower than the input MSE in most of the tested situations. The method is quite robust with respect to changes of parameter settings and changes of signal properties, such as the SNR and the sample frequency.

4.5 Event-Related fMRI Experiments

We demonstrate the HRF extraction routine of section 4.3.3 and the HRF model of section 4.2.3 in two event-related fMRI experiments of one subject, measured on different


[Figure 4.11: two panels, each showing MSE against SNR (dB), from −2 to 6.]

Figure 4.11. Output MSE for different SNRs and low-pass trends in the data, using Tikhonov shrinkage (a) and Wiener shrinkage (b). The signals contained no trend (×), a linear trend (◦), a sinusoidal trend (□), or a quadratic trend (∗).

days. We compute HRFs for the whole brain and in a region of interest, respectively,which are then used in covariance analyses.

4.5.1 Fixed-ISI experiment

In this experiment, the subject had to make a fist on the appearance of a visual stimulus, and immediately relax. Stimuli were presented on a white screen placed inside the scanner: a white disc was shown as the default, a red disc was a cue to make a fist. The experiment consisted of 156 scans, acquired as described in section 4.4. Cues were given every 24 s (8 scans × 3 s), starting at scan 2. Expected areas of increased activity were the motor cortex, the premotor cortex, the supplementary motor area and the cerebellum. The first part of this experiment tested the detectability of an HRF in those areas.

Detecting activation

The scanned brain volumes were denoised with a wavelet-based technique (Wink and Roerdink 2004), using SUREShrink in the wavelet domain (Donoho and Johnstone 1995). Realignment, normalisation, and statistical analysis were done with the SPM program (Friston et al. 1995c). We made a statistical parametric map of all responses synchronous with the stimulus pattern, using a design matrix X containing a constant signal, modelling the time series mean for each voxel, and a set of 6 Fourier basis functions (3 sines, 3 cosines), modulated by a Hanning window, in the time interval of 8 scans after each stimulus. The variance ratio was computed in each voxel to test the amount of variance explained by the design. The variance ratio is the ratio of the variance explained by the model (in this case, the Fourier basis functions) and the variance of the residual (noise), as computed by the linear model (Josephs and Henson 1999). Significant points were selected using an F-test, and false discovery rate (FDR) control (Genovese et al. 2002) with the FDR parameter q=0.05 was used to correct for multiple hypothesis testing. Activation maps are shown in Fig. 4.12a as maximum intensity projections (MIP) in the orthogonal directions. The voxel location with the most significant activation is marked with a ‘<’ sign. We found activation in all expected areas, predominantly in the motor cortex.
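FDR control is commonly implemented with the Benjamini-Hochberg step-up procedure, which the Python sketch below follows. It is an illustration under the assumption of independent (or positively dependent) tests, not the code used for the maps in this chapter, and the function name and p-values are hypothetical.

```python
def fdr_threshold(p_values, q):
    """Benjamini-Hochberg threshold for false discovery rate q.

    Returns the largest ordered p-value p(i) that satisfies p(i) <= i * q / V,
    where V is the number of tests, or None if no test passes.
    """
    ordered = sorted(p_values)
    v = len(ordered)
    threshold = None
    for i, p in enumerate(ordered, start=1):
        if p <= i * q / v:
            threshold = p
    return threshold


p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
threshold = fdr_threshold(p_values, q=0.05)
significant = [p for p in p_values if threshold is not None and p <= threshold]
```

All voxels whose p-value does not exceed the returned threshold are declared active. In this example the two smallest p-values pass.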

Figure 4.12. Maps of the variance ratio in the transverse and coronal direction, respectively, for the fixed-ISI (a) and the random-ISI (b) experiments, thresholded with FDR control for q=0.05. The indicated areas are: left motor cortex (1), supplementary motor area (2), premotor areas (3), right cerebellum (4).

Extracting the HRF

To evaluate the performance of the ForWaRD method, HRFs were extracted by ForWaRD and selective averaging, respectively. Given a long ISI, selective averaging (Buckner et al. 1996) is a simple and robust technique to obtain the HRF. Selective averaging yielded a time series of eight volumes containing the averaged post-stimulus activation at each voxel location. A much stricter FDR-corrected threshold (q=0.0001) was applied to the map shown in Fig. 4.12a, to limit the number of voxel locations contributing to the HRF (see Fig. 4.13). A whole-volume HRF was extracted from the post-stimulus volumes by averaging the response of each volume, weighted by the map of significant variance ratio values (see Sec. 4.5.1). A region-specific HRF in the 7×7×7-voxel neighbourhood of a selected voxel (see the crosshairs in Fig. 4.13) was computed by using only the time signals from that region. Figure 4.14a-b shows the extracted HRFs.
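Selective averaging of one voxel time signal amounts to averaging the post-stimulus windows, as the following minimal Python sketch shows. The function name is hypothetical, onsets are given as zero-based scan indices, and windows that run past the end of the signal are skipped, which is an assumption.

```python
def selective_average(signal, onsets, window):
    """Average the `window` samples that follow each stimulus onset."""
    epochs = [signal[t:t + window] for t in onsets if t + window <= len(signal)]
    if not epochs:
        raise ValueError("no complete post-stimulus window")
    return [sum(e[i] for e in epochs) / len(epochs) for i in range(window)]


# Three identical, non-overlapping responses with an ISI of 8 scans.
response = [0.0, 1.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0]
signal = response * 3
hrf = selective_average(signal, onsets=[0, 8, 16], window=8)
```

For the fixed-ISI experiment, with cues every 8 scans starting at scan 2, the onsets would be the zero-based indices 1, 9, 17, and so on.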

The ForWaRD algorithm used 128 scans of the experiment, starting with scan 2 (first stimulus). The resulting post-stimulus time series was used to create a whole-volume


Figure 4.13. Maximum intensity projections of the variance ratio in the fixed-ISI time series, after FDR-corrected thresholding with q=0.0001: transverse view (a), sagittal view (b), coronal view (c). The crosshairs show the selected voxel location.

HRF and a region-specific HRF in the same way as with selective averaging. The HRFs are shown in figure 4.14c-d. The HRFs extracted by ForWaRD are similar to those extracted by selective averaging, with the difference that the baseline of the ForWaRD-extracted HRF appears to decrease. This may be explained by the fact that the HRF does not return to baseline within the sampled interval, so that in the GLM the response decreases at every next stimulus. A modelled HRF was obtained by fitting HRFpar to the extracted HRFs with the Levenberg-Marquardt nonlinear curve-fitting algorithm. We compared the L2-difference between the extracted coefficients and the values of the modelled HRF at the sample points. A standard HRF, in this case HRFspm sampled at the same points as the other signals, was used as a reference. All signals were normalised to have a unit L2-norm. Table 4.1 shows that the fitted HRFpar matches the measured signal much better than the standard HRFspm.
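The comparison of normalised signals can be sketched as follows. The text does not state whether the reported L2-difference is the norm of the difference or its square; the sketch assumes the norm, and the function names are hypothetical.

```python
import math


def unit_l2(x):
    """Scale a signal to unit L2-norm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("cannot normalise an all-zero signal")
    return [v / norm for v in x]


def l2_difference(a, b):
    """L2-norm of the difference between two signals after normalisation."""
    a, b = unit_l2(a), unit_l2(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


same_shape = l2_difference([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = l2_difference([1.0, 0.0], [0.0, 1.0])
```

Because of the normalisation, two signals that differ only by a positive scale factor have a difference of zero, and two orthogonal signals have a difference of √2.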

Table 4.1. L2 differences of the HRF models and the HRFs extracted from the fMRI data sets.

                     fixed ISI                                    random ISI
                     selective averaging    ForWaRD               ForWaRD
                     volume     region      volume     region     volume     region
|h_κ − HRFpar|       0.008      0.025       0.049      0.093      0.018      0.015
|h_κ − HRFspm|       0.264      0.445       0.432      0.701      0.095      0.166

4.5.2 Random-ISI experiment

We repeated the experiment with a randomised interstimulus interval (ISI). The stimulus signal was made by thresholding a vector of random values. The length of the experiment was 256 scans; the scanning parameters and image preprocessing steps were the same as those in section 4.4.

[Figure 4.14: four panels; the horizontal axis is the peristimulus scan no., from −1 to 6.]

Figure 4.14. Top row: HRFs extracted from the fixed-ISI time series produced by selective averaging. (a) whole-volume HRF, (b) region-specific HRF. Bottom row: HRFs extracted from the fixed-ISI time series produced by ForWaRD. (c) whole-volume HRF, (d) region-specific HRF. ×: extracted HRF, solid line: function HRFpar fitted to ×, dashed line: function HRFspm.

Detecting activation

Because of the overlapping responses, the windowed Fourier basis set could not be used to measure the explained variance. Instead, the canonical HRF of the SPM program and its time and dilation derivatives were used as basis functions in the GLM. The variance ratio map was thresholded with FDR control and q=0.05, resulting in the activation map shown in Fig. 4.12b. The activation is more localised than in the fixed-ISI experiment, and there is more contrast between the regions of interest and the rest of the brain.


Extracting the HRF

A post-stimulus time series, containing the HRF at each voxel location, was created with ForWaRD, using all 256 scans of the experiment. Selective averaging could not be used here because of the overlapping responses. Thanks to the random ISI, a much longer post-stimulus interval could be sampled (Burock et al. 1998). The post-stimulus volumes produced by the extraction routine were used to create a whole-volume HRF and a region-specific HRF, respectively, in the same way as in the fixed-ISI experiment. The volume thresholded with q=0.001 and the selected voxel for the region-specific HRF are shown in Fig. 4.15. Instead of using one function HRFpar to describe the HRF, two of

Figure 4.15. Maximum intensity projections of the variance ratio in the random-ISI time series thresholded with q=0.0001: transverse view (a), sagittal view (b), coronal view (c). The crosshairs show the selected voxel location.

these functions were used: one to model the initial peak, and one to model the undershoot following the peak. In the fixed-ISI experiment we did not use a function to model the undershoot, because the post-stimulus interval was too short to make a reliable fit. Figure 4.16 shows the extracted HRF coefficients, together with the modelled HRFs.

The L2-differences between the measurements and the fits were computed for both extracted HRFs, in the same way as for the fixed-ISI HRFs. The standard HRFspm again shows a greater difference from the measurements than the fitted HRFpar.

4.5.3 Using the extracted HRFs in covariance tests

HRFs measured from an fMRI data set cannot be used to test for activation in the same data set: a model must be specified a priori, and inferences cannot be made from models that are determined by the data itself. Therefore, we tested for activation in the random-ISI experiment with the HRFpar fitted to the points extracted from the fixed-ISI data, and vice versa. The cross-covariance was computed between the responses, predicted by the stimulus times and the modelled HRFs, and the measured time signals. A one-sample t-test on the covariance map was used to detect activated areas, using the residual time


[Figure 4.16: two panels; the horizontal axis is the peristimulus scan no., from −1 to 13.]

Figure 4.16. HRFs extracted from the random-ISI experiment by ForWaRD: whole-volume (a) and region-specific (b). ×: extracted HRF, solid line: two functions HRFpar fitted to ×, dashed lines: function HRFspm.

signal, i.e., e in (4.1), to obtain unit variance. FDR control with q=0.05 was used to correct for multiple hypothesis testing.

The first covariance analysis was performed with HRFspm. Figure 4.17 shows the covariance maps from both experiments. Differences can be seen between Figs. 4.17a-b,

Figure 4.17. Maximum intensity projections of the cross-covariance between the time signals and the modelled responses, using HRFspm. (a) fixed-ISI experiment, (b) random-ISI experiment.

most notably the large active region in the left motor cortex, detected in the fixed-ISI experiment and showing up only faintly in the random-ISI experiment. The motor cortex and the supplementary motor area show less activation in the random-ISI experiment, whereas the cerebellum and the premotor areas are more pronounced. These maps, together with the graphs of Figs. 4.14 and 4.16, indicate that the HRF measurable in the random-ISI experiment differs significantly from HRFspm.

Figure 4.18. Maximum intensity projections of the cross-covariance found in the fixed-ISI data set, using the HRFs computed from the random-ISI experiment by ForWaRD. (a) using the whole-volume HRF, (b) using the region-specific HRF.

The covariance maps constructed with the extracted HRFs are shown in Figs. 4.18 and 4.19. Figures 4.18a-b show the activation detected in the fixed-ISI dataset with the HRFs extracted with ForWaRD from the random-ISI dataset. The differences between (a) and (b) are much smaller than those between Figs. 4.17a-b, and, as in Fig. 4.17a, the detected activation matches our expectation. Figure 4.19 shows the activation detected in the random-ISI dataset with the HRFs from the fixed-ISI dataset. The analysis was done with the HRFs made after selective averaging (a-b) and ForWaRD (c-d). The maps are in very good agreement: for the experiments in which selective averaging is possible, ForWaRD yields results very similar to those produced by selective averaging. The differences for ForWaRD between the fixed-ISI and random-ISI time series are also small (selective averaging is not possible with a random ISI), indicating that ForWaRD works on the random-ISI data set as well. Table 4.2 shows the maximum values for the variance ratio in the covariance analyses. A high variance ratio indicates that much of the variance in the signal is explained by the model, and that the residual noise in the GLM (see Eq. (4.1)) is small. The table shows that ForWaRD works as well on the random-ISI dataset as it does on the fixed-ISI data set. Its performance is similar to that of averaging on the fixed-ISI dataset. The modelled region-specific HRFs generally perform better than whole-volume HRFs, and the maps of detected activation indicate that the modelled HRFs do not only detect activation in the region from which they were extracted, but


Figure 4.19. Maximum intensity projections of the cross-covariance found in the random-ISI data set, using the HRFs computed from the fixed-ISI experiment by selective averaging (a-b) and ForWaRD (c-d). Left: whole-volume HRF, right: region-specific HRF.

that they are general enough also to detect activation in other areas.

4.6 Conclusions

We have developed a deconvolution method to extract the HRF from fMRI time series based on ForWaRD. Deconvolution in the frequency domain allows extraction of the HRF even when the responses to subsequent stimuli overlap, and the sensitivity to noise of frequency-domain deconvolution is compensated by Wiener or Tikhonov shrinkage


Table 4.2. Maximum variance ratio values found in the tests.

              ForWaRD            selective averaging
              volume   region    volume   region      HRFspm
fixed-ISI     113      162       –        –           117
random-ISI    103      102       102      104         74

in the frequency domain, followed by wavelet domain Wiener shrinkage. Before applying ForWaRD, low-frequency trends are removed from the time signal with a standard wavelet-based method. Tests of the extraction routine using noise from a real fMRI time series and simulated activation demonstrate its robustness. Test results show that the method is robust to trends in the data, and the performance does not differ much between the noise levels we tested. The output of our algorithm is a post-stimulus time series, representing the HRF in every voxel.

We have presented a model for the HRF based on damped oscillations that can be used in combination with the extracted coefficients, to predict event-related fMRI responses. An HRF using this model is compared with the standard HRF from the SPM program and shows a better match with extracted responses. A comparison of statistical analyses with (i) the standard HRFspm and (ii) an HRF based on HRFpar using the coefficients extracted from another experiment with the same subject, shows the benefits of HRF modelling. With the modelled HRF, detected regions are larger, and the statistical analysis is more powerful than with the standard HRFspm. At present, the extraction method is capable of recovering one HRF from one time series. A possible extension of the method is the extraction of multiple HRFs from one or multiple experiments.

Chapter 5

Extracting the Haemodynamic Response Function from fMRI Time Series using Fourier-Wavelet Regularised Deconvolution with Orthogonal Spline Wavelets

Abstract

We describe a method to extract the haemodynamic response function (HRF) from functional magnetic resonance imaging (fMRI) time series based on Fourier-wavelet regularised deconvolution (ForWaRD). The algorithm presented here is an extension of an earlier ForWaRD-based extraction method. We introduce the computation of shift-invariant discrete wavelet transforms (SI-DWT) in the frequency domain, and apply ForWaRD using orthogonal spline wavelets. The extraction of subject-specific HRFs is demonstrated, as well as the use of these HRFs in a subsequent brain activation analysis. Temporal responses are modelled by using the extracted HRF time signals in a new model for fMRI time signals. The resulting activation maps show the effectiveness of the proposed method.

5.1 Introduction

Functional magnetic resonance imaging (fMRI) is a versatile method for functional neuroimaging. Regional brain activation induces changes in blood oxygenation, generating a blood oxygenation level dependent (BOLD) contrast in MR images (Ogawa et al. 1990). An important tool for fMRI analysis is statistical hypothesis testing, where the fMRI signal is predicted using the stimulus pattern and a response model. Statistical parametric mapping (Friston et al. 1995c) uses a model of the noise (Gaussian), and hypothesis tests are based on the parameters of the model. This chapter presents a method to extract the haemodynamic response function (HRF) from fMRI data. The method is based on Fourier-wavelet regularised deconvolution, ForWaRD (Neelamani et al. 2004), using orthogonal spline wavelets.


ForWaRD combines the advantages of deconvolution in the frequency domain and regularisation in the combined frequency and wavelet domains, see Fig. 5.1. The output is given as a time series of image volumes, containing the HRF at each voxel location. Compared to other HRF extraction methods (Glover 1999, Miezin et al. 2000), the method requires only a few assumptions: the stimulus-response relation should be linear and time invariant (LTI), and the signal should be distinguishable from noise in the Fourier and/or wavelet representation. The use of ForWaRD to extract the HRF is treated in more detail in chapter 4.

The extraction method given in chapter 4 is extended by using a novel frequency-domain implementation of the shift-invariant discrete wavelet transform (SI-DWT). It may be efficient to compute the wavelet transform in the frequency domain (Westenberg and Roerdink 2000), as is the case for orthogonal spline wavelets, which do not have compact support.

The remainder of this chapter is organised as follows. Section 5.2 introduces some fMRI and wavelet terminology. Then in section 5.3 we treat the computation of the SI-DWT in the frequency domain, and section 5.4 presents the spline wavelets we used. In section 5.5, fMRI experiments are analysed using the proposed extraction method. In section 5.6 we present our conclusions.

5.2 SPM, Wavelets, and ForWaRD

5.2.1 SPM

Statistical parametric mapping (SPM) is the standard fMRI analysis tool. It assumes the temporal noise to be independent, identically distributed and Gaussian. SPM consists of the following steps: (i) estimate the parameters of the noise, (ii) compute a statistic at every voxel location, (iii) threshold the statistic values using the noise parameters and a multiple testing correction method. Assuming a linear, time invariant (LTI) stimulus-response relation, an fMRI data set of T time samples in N voxels is modelled as:

$$Y_{[T\times N]} = X_{[T\times M]}\,\beta_{[M\times N]} + e_{[T\times N]}. \qquad (5.1)$$

Y represents the fMRI data, X is the design matrix of M explanatory variables (modelled effects), β contains the weights of each of these variables, and e contains the residuals, i.e., the unmodelled part of the signals. Each column of X contains the modelled response to the stimuli of one type. An LTI response to one type of stimulus is given by a convolution of the time pattern of the stimuli and the appropriate HRF. A good HRF model is critical to the success of the estimation based on (5.1), because inappropriate modelling will lead to a non-Gaussian distribution of the values in e.
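As a minimal sketch of model (5.1), the following NumPy code builds a small design matrix by convolving a stimulus pattern with an HRF and estimates β by ordinary least squares. The function names, the toy response shape, the stimulus timing and the added constant column are illustrative assumptions; this is not the estimation code of the SPM program.

```python
import numpy as np

def design_column(stimulus, hrf):
    """Predicted LTI response: the stimulus pattern convolved with the HRF,
    truncated to the length of the time series."""
    return np.convolve(stimulus, hrf)[: len(stimulus)]

def fit_glm(Y, X):
    """Ordinary least-squares estimate of beta in Y = X beta + e.
    Y is T x N (time x voxels), X is T x M (time x modelled effects)."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta
    return beta, residuals

# Toy example (all values hypothetical): 64 scans, one stimulus type, 3 voxels.
T = 64
stimulus = np.zeros(T)
stimulus[2::8] = 1.0                           # one cue every 8 scans
hrf = np.array([0.0, 0.5, 1.0, 0.6, 0.2])      # made-up response shape
X = np.column_stack([design_column(stimulus, hrf), np.ones(T)])  # effect + mean
true_beta = np.array([[2.0, 0.0, -1.0],        # response amplitude per voxel
                      [10.0, 10.0, 10.0]])     # baseline per voxel
Y = X @ true_beta                              # noise-free data
beta, e = fit_glm(Y, X)
```

With noise-free data the estimated weights equal the weights used to generate Y and the residuals vanish; with real data, the residuals e are what the noise model is estimated from.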


5.2.2 Wavelets

A wavelet transform describes a signal $c_0$ as a sum of basis functions. Given a scaling function φ and a basic wavelet ψ, $c_0$ is split into an approximation (smooth) part $c_1$ and a detail part $d_1$, which are weighted sums of shifted dilates of φ and ψ, respectively. Multi-level transforms divide subsequent $c_j$ into $c_{j+1}$ and $d_{j+1}$, $j = 0, 1, \ldots, J$, $J \in \mathbb{N}$. The inverse transform recursively uses $c_j$ and $d_j$ to reconstruct $c_{j-1}$. An efficient algorithm for computing these wavelet transformations is the fast wavelet transform, FWT (Mallat 1989), which subsamples $c_j$ and $d_j$ at each level. It requires O(N) computations for a signal of size N. However, the FWT is not shift-invariant, and is therefore not useful for deconvolution. Using polyphase decomposition (subsample for all possible shifts) instead of subsampling the undisplaced filtered signal yields a shift-invariant discrete wavelet transform (SI-DWT), and its inverse (SI-IDWT). The output of a level L transform has size (L+1)×N, and the complexity is $O(N \log_2 N)$ (Mallat 1991).

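Input: a noisy signal g, a stimulus pattern f, wavelet bases (φ1, ψ1) and (φ2, ψ2)

1: $f \xrightarrow{\text{FFT}} F$; $g \xrightarrow{\text{FFT}} G$
2: $H_{\text{freq}} := G/F$
3: $G, H_{\text{freq}} \xrightarrow{\text{Wiener/Tikhonov}} H_{\text{freqShrink}}$
4: $H_{\text{freqShrink}} \xrightarrow{\text{IFFT}} h_{\text{freq}}$
5: $h_{\text{freq}}, (\varphi_1, \psi_1) \xrightarrow{\text{SI-DWT}} c_1^L, d_1^1 \ldots d_1^L$
   $h_{\text{freq}}, (\varphi_2, \psi_2) \xrightarrow{\text{SI-DWT}} c_2^L, d_2^1 \ldots d_2^L$
6: $d_1^1 \ldots d_1^L \xrightarrow{\text{threshold}} \hat{d}_1^1 \ldots \hat{d}_1^L$
7: $\hat{d}_1^1 \ldots \hat{d}_1^L, (d_2^1 \ldots d_2^L) \xrightarrow{\text{wavelet Wiener}} (d_2^1 \ldots d_2^L)_{\text{Wiener}}$
8: $c_2^L, (d_2^1 \ldots d_2^L)_{\text{Wiener}} \xrightarrow{\text{SI-IDWT}} h$

Algorithm 5.1: The ForWaRD algorithm.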

5.2.3 ForWaRD

Using the LTI model of section 5.2.1, an HRF can be extracted from an fMRI time series by deconvolving the measured time signals with the stimulus pattern. In the frequency domain, deconvolution can in principle be done via pointwise division (Fourier inversion). However, noise is amplified at frequencies where the signal is small, introducing instability: small changes in the inputs induce large changes in the output. Regularisation suppresses the destabilising effects. If the destabilising factor is the noise, regularisation is tantamount to denoising.

A common regularisation technique for frequency domain deconvolution is shrinking the frequency coefficients after inversion. The signal of interest is usually smooth (low-frequency) and noise is usually erratic (high-frequency). Two familiar shrinkage


Figure 5.1. (a) Inputs and output of the GLM: (i) the impulse response function, (ii) the stimulus pattern, (iii) the total response without noise, (iv) the noisy response. (b) Various stages of the ForWaRD algorithm: (i) after frequency domain inversion, (ii) after frequency domain shrinkage, (iii) after subsequent wavelet domain Wiener filtering. The horizontal axis ticks run from 0 to 256 in (a) and from 0 to 32 in (b).

methods are Wiener shrinkage and Tikhonov shrinkage (Neelamani et al. 2004). Non-smooth parts of signals (such as steep edges) are not efficiently represented in the frequency domain, because they contain much high-frequency energy. As a result, noise at those frequencies is not shrunk. Using more shrinkage to remove noise introduces artifacts, such as ringing.

The ForWaRD method applies Fourier inversion, Fourier shrinkage, and wavelet-domain Wiener shrinkage. Wiener shrinkage reduces the magnitude of wavelet coefficients at indices where the true signal is weak, and preserves those coefficients where the true signal is strong. The true signal is unknown, so ForWaRD uses two wavelet transforms of a signal: one transform with basis functions (φ1, ψ1) to estimate the true signal by thresholding detail coefficients, and another transform with basis functions (φ2, ψ2), whose detail coefficients $d_2^j$, $j = 1 \ldots J$, are shrunk. ForWaRD uses the SI-DWT to ensure shift-invariance. ForWaRD can be used to extract an HRF h from a noisy signal g, assuming g to be the convolution of the stimulus pattern f with h, plus noise e. This is denoted as: g = f ∗ h + e. The algorithm is summarised in Algorithm 5.1.
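The first stages of Algorithm 5.1 (Fourier inversion followed by frequency-domain shrinkage) can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this thesis: the convolution is taken to be circular, the shrinkage is a Tikhonov-type factor |F|²/(|F|² + τ) with a constant, hand-picked parameter `tau`, and the wavelet-domain Wiener step is left out. The function name and the test signals are hypothetical.

```python
import numpy as np

def fourier_deconvolve(g, f, tau=0.0):
    """Estimate h from g = f (*) h + e (circular convolution) by Fourier
    inversion with Tikhonov-type shrinkage; tau = 0 gives plain G / F."""
    F = np.fft.fft(f)
    G = np.fft.fft(g)
    # G / F multiplied by the shrinkage factor |F|^2 / (|F|^2 + tau)
    H_shrunk = np.conj(F) * G / (np.abs(F) ** 2 + tau)
    return np.fft.ifft(H_shrunk).real

# Hypothetical test signals: a well-conditioned stimulus pattern and a
# damped oscillation as the response to be recovered.
N = 64
f = np.zeros(N)
f[0], f[5] = 1.0, 0.5
t = np.arange(N)
h = np.exp(-t / 4.0) * np.sin(t / 2.0)
g_clean = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real

h_plain = fourier_deconvolve(g_clean, f, tau=0.0)   # recovers h without noise
h_shrunk = fourier_deconvolve(g_clean, f, tau=1.0)  # damped estimate
```

Without noise and with `tau = 0` the response is recovered exactly. A positive `tau` attenuates every frequency coefficient, most strongly where |F| is small, which is where noise would otherwise be amplified.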

5.3 Computing the SI-DWT in the frequency domain

ForWaRD requires the SI-DWT, which was implemented in the time domain (Mallat 1991). Spline wavelets (Unser and Blu 2000) can be computed most efficiently in the frequency domain. An FWT subsamples at each level (taking samples {0, 2, …, N−2}), whereas an SI-DWT first does a polyphase decomposition (for Q phases: take samples {0, Q, …, N−Q}, {1, Q+1, …, N−Q+1}, …, {Q−1, 2Q−1, …, N−1}), filters each phase separately, and then does a monophase reconstruction, i.e., it combines all phase signals back into one signal. It is possible to do this in the frequency domain. As described by Rioul and Duhamel (1992) and Vetterli and Herley (1992), subsampling by a factor Q (using only phase 0) in the frequency domain separates the frequencies into Q consecutive blocks and computes an average block. Computing phases {1, …, Q−1} requires (i) a shift to each block to sample coefficients from the right block {0, …, Q−1}, (ii) a shift inside the block to sample the right coefficient {0, …, N/Q−1}, and (iii) averaging the Q blocks. The monophase reconstruction in the frequency domain distributes the phases again across the frequency blocks by, for each phase, applying the appropriate shifts to each block and concatenating the blocks. The signals created for each phase are added together to reconstruct one signal in the frequency domain. Note that a shift of k places in an N-point signal corresponds to multiplying the frequency coefficient with index n by $e^{2\pi i k n/N}$ in the frequency domain (the sign of the exponent depends on the direction of the shift).
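The block-averaging description of frequency-domain subsampling can be checked numerically. The snippet below is a small sketch with arbitrary, hypothetical values for N, Q and q: it computes phase 0 by averaging the Q frequency blocks, and phase q by first applying the shift exponential and then averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q = 32, 4                 # signal length and number of phases (arbitrary)
M = N // Q                   # length of each phase
s = rng.standard_normal(N)
S = np.fft.fft(s)

# Phase 0: split the spectrum into Q consecutive blocks and average them.
phase0 = S.reshape(Q, M).mean(axis=0)

# Phase q: shift the signal q places, which is a pointwise multiplication
# in the frequency domain, then average the Q blocks as before.
q = 3
n = np.arange(N)
phase_q = (S * np.exp(2j * np.pi * q * n / N)).reshape(Q, M).mean(axis=0)
```

Both results agree with the FFT of the corresponding time-domain samples, `s[::Q]` and `s[q::Q]`.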

While filtering is cheap in the frequency domain, polyphase decompositions are quite costly, mainly because of the many multiplications required for shifting. We have optimised the SI-DWT in the frequency domain towards processing a matrix of many 1D signals, such as a time series of large images. The imaginary exponentials for the shifts, which are equal for all voxel locations, are precomputed once for all signals, and access to the phases and frequency blocks is accelerated by changing the dimensionality of the (initially 1D) signals according to the number of phases.

5.3.1 Efficient computation of the frequency-domain SI-DWT

The frequency-domain implementations of the SI-DWT and the SI-IDWT, respectively, are given in Algorithm 5.2. The input parameters of the transform are the Fourier transforms of (i) a signal s, and (ii) the wavelet filters h and g. Analogous to the FWT in the frequency domain, the SI-DWT produces the frequency representations of the approximation and the detail channels (the SI-DWT in the time domain can be obtained by applying the IFFT to these signals). The frequency-domain SI-IDWT transforms the approximation and the detail channels back into the Fourier transform of s.

Computing the complex exponentials required for the polyphase and monophase transforms, respectively, is a costly step. The application described here uses a large number of 1D transforms with the same parameters, and computation time is reduced by computing those exponentials (step 1 of Algorithm 5.2) only once and reusing them. The exponentials required for the polyphase decomposition are given in Eq. (B.14) of the appendix. To further speed things up, the product of every required combination of exponentials in (B.14) is stored in a lookup table. The exponentials required for the monophase reconstruction, given in (B.15), are also stored in a lookup table.

According to (B.14), for each phase q (whose signal has length N/Q), each frequency block ℓ, ℓ = 0…Q−1, is multiplied by $e^{2\pi i \ell q/Q}$ (to shift to the right frequency block for sampling) times $e^{2\pi i k q/N}$ (to select the right phase), with k = 0…N/Q−1 the index within the block, and then the blocks are averaged. If the complex exponentials are available, the polyphase decomposition amounts to adding a number of pointwise vector products. Using the combined shifts from the lookup table, the number of multiplications is considerably reduced. The frequency-domain polyphase decomposition requires more computations than the time-domain version (the signal needs to be processed Q times), but for long filters this is worth the effort: the convolution step requires much less time than in the time domain (see also section 5.3.2).
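The lookup-table idea can be sketched for a matrix of many 1D signals as follows. The layout of the table, the function names and the use of `einsum` are assumptions made for this illustration; Eq. (B.14) itself is not reproduced here, only the two exponentials named in the text.

```python
import numpy as np

def polyphase_table(N, Q):
    """Precompute the combined shift exponentials once.
    table[q, l, k] = exp(2*pi*i*l*q/Q) * exp(2*pi*i*k*q/N)."""
    M = N // Q
    q = np.arange(Q)[:, None, None]      # phase
    l = np.arange(Q)[None, :, None]      # frequency block
    k = np.arange(M)[None, None, :]      # index within the block
    return np.exp(2j * np.pi * l * q / Q) * np.exp(2j * np.pi * k * q / N)

def polyphase_many(S, table):
    """Polyphase decomposition of many spectra at once.
    S has shape (N, V): one N-point spectrum per column (e.g. per voxel).
    Returns shape (Q, N/Q, V): the spectrum of every phase of every signal."""
    Q, _, M = table.shape
    blocks = S.reshape(Q, M, -1)         # blocks[l, k, v] = S[l*M + k, v]
    # Weighted average over the block index l, for every phase q.
    return np.einsum("qlk,lkv->qkv", table, blocks) / Q

# Usage with hypothetical sizes: 5 signals of 16 samples, 4 phases.
rng = np.random.default_rng(0)
signals = rng.standard_normal((16, 5))
table = polyphase_table(16, 4)           # computed once, reused for all signals
phases = polyphase_many(np.fft.fft(signals, axis=0), table)
```

Reshaping the spectrum into blocks corresponds to the change of dimensionality mentioned in the text: the blocks and phases become array axes, so no per-coefficient index arithmetic is needed.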

More efficiency is obtained by changing the dimensionality of each 1D signal before the decomposition. A length-N signal is split into Q frequency blocks of length N/Q. After the polyphase decomposition these blocks contain the Q phases of length N/Q. The monophase reconstruction in the frequency domain is accelerated in the same way: the gain in speed is obtained by precomputing the shift exponentials of (B.15), and by changing the dimensionality of the signal.

The forward transform performs the polyphase decomposition, filters the phases, and then performs the monophase reconstruction for each level of decomposition, and then doubles the number of phases Q. The filters are subsampled by a factor 2 in the frequency domain. This corresponds to extracting the N/2-point Fourier transform of a signal from the N-point Fourier transform (Westenberg and Roerdink 2000).
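The relation between the N-point and the N/2-point Fourier transform of a filter can be illustrated with a few lines of NumPy. The short filter below is an arbitrary stand-in, not one of the spline filters used in this chapter.

```python
import numpy as np

N = 64
h = np.array([1.0, 3.0, 3.0, 1.0]) / 8.0   # arbitrary short filter (assumption)
H = np.fft.fft(h, N)                        # N-point transform (zero-padded)
H_half = H[::2]                             # subsampling by 2 in the frequency domain
H_direct = np.fft.fft(h, N // 2)            # N/2-point transform of the same filter
```

Taking every second frequency coefficient gives the N/2-point transform of the same filter. For a filter longer than N/2 samples, the same indexing yields the transform of the filter wrapped around with period N/2.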

The inverse transform starts with the number of phases at the highest level of decomposition, and Q is divided by two after each reconstruction step. The filters cannot be subsampled progressively because the reconstruction starts at the coarsest level. Fast access to the subsampled filters is achieved by changing the dimensionality of the filters. To subsample by a factor $2^j$, the N-point filter is split into $N/2^j$ blocks of length $2^j$, and the first element of each block is used. After filtering, the 1D signal is restored.

5.3.2 Computation times: spline wavelets

The algorithm for computing the SI-DWT in the frequency domain is computationally intensive. For compact support filters, the time-domain computation is more efficient. When the wavelet basis functions either have exponential decay instead of compact support or are defined directly in the frequency domain, the transform proposed in Algorithm 5.2 may be more efficient than the time-domain computation. The time-domain versions of the transforms use convolution, of which the computation time increases with the signal length as well as with the filter length. The length of wavelet filters that do not have compact support increases with the signal length N, so the complexity of the convolutions is $O(N^2)$. In the frequency domain the complexity is O(N), because time-domain convolution is pointwise multiplication in the frequency domain. We compared the computation times of the time-domain version and the frequency-domain version of the SI-DWT, by computing the three-level SI-DWT of signals varying in length, using the symmetric orthogonal cubic spline wavelet basis (Mallat 1989). For each length, 100 signals were transformed and reconstructed. Figure 5.2 shows the computation times for signals of varying lengths. The graphs show that the frequency-domain implementation is faster for signals of more than 64 points. Therefore, the frequency-domain SI-DWT is preferred for long filters (like orthogonal spline wavelets), and the rest of this chapter uses the frequency-domain implementation.
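The complexity argument rests on the equivalence of circular convolution in the time domain and pointwise multiplication in the frequency domain. A minimal sketch, with hypothetical function names and a filter as long as the signal:

```python
import numpy as np

def circular_convolve_direct(x, h):
    """Circular convolution by its defining double loop: O(N^2) operations
    when the filter is as long as the signal."""
    N = len(x)
    y = np.zeros(N)
    for n in range(N):
        for m in range(N):
            y[n] += h[m] * x[(n - m) % N]
    return y

def circular_convolve_fft(x, h):
    """The same result via pointwise multiplication of the spectra: O(N)
    operations once the spectra are available."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
h = rng.standard_normal(16)          # filter without compact support: length N
y_direct = circular_convolve_direct(x, h)
y_fft = circular_convolve_fft(x, h)
```

The transforms to and from the frequency domain add their own cost, which is why the frequency-domain route only pays off for sufficiently long filters.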

Figure 5.2. Computation times (vertical axis: time in s) of the time-domain SI-DWT and SI-IDWT and the frequency-domain SI-DWT and SI-IDWT, respectively, for signal lengths from 8 to 512 points (horizontal axis), with symmetric orthogonal cubic spline wavelet basis functions. Time domain: ×: SI-DWT, ◦: SI-IDWT. Frequency domain: �: SI-DWT, ∗: SI-IDWT.

5.4 ForWaRD using spline wavelets

The efficient frequency-domain implementation of the SI-DWT inside ForWaRD facilitates the use of spline wavelets in the HRF extraction routine. We use orthonormal splines to preserve the signals' energy during the transform. An implementation of fractional splines (Unser and Blu 2000) was used to generate the wavelet basis functions. Examples of spline wavelet basis functions are shown in Fig. 5.3. Our new version of ForWaRD with frequency-domain SI-DWT and orthonormal spline wavelets was used in the extraction program. In this chapter, we used causal splines with degree α = 4 for (φ1, ψ1) and anticausal splines with degree α = 3 for (φ2, ψ2).

5.5 Event-Related fMRI Experiments

The HRF extraction routine was used in the analysis of two event-related fMRI experiments of one subject, measured on different days. The subject had to make a fist at presentation of a visual stimulus, and then immediately relax.

Stimuli were presented on a white screen inside the scanner. A white disc was shown as the default, a red disc was the cue to make a fist. One experiment was performed with a fixed interstimulus interval (ISI) and one with a randomised ISI. Realignment,


Figure 5.3. Orthonormal quartic (α=4) spline wavelet basis functions: (a) causal, (b) anticausal, (c) symmetric. Top: scaling function φ, bottom: wavelet ψ. Each panel is drawn for x from −2.5 to 2.5, with the line y=0 marked.

normalisation, and statistical analysis were done with the SPM program (Friston et al. 1995c). Denoising was done with a wavelet-based technique (Wink and Roerdink 2004). We computed HRFs for the whole brain and in a region of interest, respectively, which were then used in covariance analyses to test for activation.

5.5.1 Fixed-ISI Experiment

The fixed-ISI data set consisted of 156 volumes of 64×64×46 voxels with size 3.5×3.5×3.5 mm³. Cues were given every 24 s (8 scans × 3 s) starting at scan 2. HRFs were extracted by our method, and also by selective averaging (Dale and Buckner 1997), a simple and robust extraction method when the ISI is long. A first statistical analysis was done to detect activation synchronous to the stimuli. We used a design matrix with a set of 6 Fourier basis functions (3 sines, 3 cosines), modulated by a Hanning window, in the time interval of 8 scans after each stimulus, so as not to impose shape assumptions on the HRF. An SPM{F} resulting from an F-test was computed, using false discovery rate (FDR) control (Genovese et al. 2002) with q = 0.05 for multiple hypothesis testing. With both ForWaRD (using 128 of the 156 scans) and selective averaging, we computed a whole-volume HRF and a regional HRF in a 7×7×7-voxel region with high activity (see the region indicated by a '<' in Fig. 5.4a). The post-stimulus volumes were multiplied by the F-values in the map, after thresholding with an FDR-parameter q = 0.0001, and averaged over the volume/region. Figures 5.5a-b show the HRFs. Selective averaging shows better results than ForWaRD, which remains below baseline in the ISI. This may be because the real HRF does not return to baseline within the measured interval, so in the LTI model the response decreases at every next stimulus. This results in an HRF with a lower baseline.
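In its simplest form, without any correction for overlapping responses, selective averaging amounts to averaging the post-stimulus windows of a time signal. The sketch below uses the timing of the fixed-ISI experiment (156 scans, a cue every 8 scans starting at scan 2, read here as a zero-based index, which is an assumption); the function name and the response shape are made up for illustration.

```python
import numpy as np

def selective_average(y, onsets, window):
    """Average the `window` samples following each stimulus onset.
    Only onsets with a complete post-stimulus window are used."""
    epochs = [y[t : t + window] for t in onsets if t + window <= len(y)]
    return np.mean(epochs, axis=0)

n_scans, isi = 156, 8
onsets = np.arange(2, n_scans, isi)

# Synthetic, noise-free time signal: the same made-up response after each cue.
response = np.array([0.0, 0.4, 1.0, 0.7, 0.3, 0.0, -0.1, 0.0])
y = np.zeros(n_scans)
for t in onsets:
    segment = response[: n_scans - t]     # the last response may be cut off
    y[t : t + len(segment)] += segment

hrf_est = selective_average(y, onsets, isi)
```

Because the responses do not overlap here, the average reproduces the response exactly. With a short or random ISI the windows contain contributions from neighbouring stimuli, which is the situation that deconvolution is meant to handle.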


Figure 5.4. SPM{F} of the fixed-ISI experiment (a), SPM{F} of the random-ISI experiment (b).

5.5.2 Random-ISI Experiment

Stimulus times for this experiment were random and the length of the random-ISI data set was 256 scans; the other parameters were unchanged. Post-stimulus image volumes were produced by ForWaRD. Due to response overlap, neither selective averaging nor the Fourier basis set could be used. The design matrix X was made by convolving the stimulus signal with the canonical HRF from the SPM'99 program (Friston et al. 1995c) and its time and dilation derivatives. HRFs were made from the SPM{F} (see Fig. 5.4b) and the post-stimulus volumes, see Fig. 5.5c. The regional HRF corresponds most to the previously extracted HRFs. Both HRFs return to baseline within the post-stimulus interval.

Figure 5.5. HRFs extracted from the fixed-ISI data set by selective averaging (a) and by ForWaRD (b), and from the random-ISI time series by ForWaRD (c); horizontal axis: peristimulus scan no. ×: whole-volume, ◦: region-specific.


5.5.3 Using the extracted HRFs in activation tests

A covariance test was done on the random-ISI data using a model based on the fixed-ISI HRF, and vice versa. Extracted HRFs cannot be used for covariance tests on the same data set: a model must be specified a priori, and inferences cannot be made from models that are determined by the data. We modelled the HRFs by fitting one function

$$f_{H,D,P,L}(t) = \begin{cases} H \sin\!\left(\dfrac{t-L}{P}\right) e^{\frac{-t+L}{D}}, & \text{if } t > L \\ 0, & \text{otherwise} \end{cases} \qquad (5.2)$$

to the HRFs extracted from the fixed-ISI data, and two such functions were fitted to the HRFs from the random-ISI data.
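Function (5.2) is straightforward to evaluate numerically. The following sketch implements it with NumPy; the function name `hrf_par` and the parameter values in the usage lines are hypothetical, and no fitting routine is included.

```python
import numpy as np

def hrf_par(t, H, D, P, L):
    """Damped oscillation of Eq. (5.2): zero up to the lag L, then
    H * sin((t - L) / P) * exp(-(t - L) / D)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    active = t > L
    out[active] = H * np.sin((t[active] - L) / P) * np.exp(-(t[active] - L) / D)
    return out

# Hypothetical parameter values, evaluated on a peristimulus time axis.
t = np.arange(0.0, 14.0, 0.5)
model = hrf_par(t, H=2.0, D=3.0, P=1.5, L=1.0)
```

The sine reaches its first maximum at t = L + πP/2, where the function value is H·exp(−πP/(2D)); the exponential factor determines how quickly later oscillations die out.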

Function (5.2) models a damped oscillator with parameters H(eight), L(ag), P(eriod), and D(ilation), which is a plausible model for a delayed response such as the BOLD signal. With the HRFs from the random-ISI experiment we use two such functions: one

Figure 5.6. The modelled HRFs for the covariance test, with the coefficients from the fixed-ISI experiment (a) and the random-ISI experiment (b). Solid lines: regionally determined HRFs, dashed lines: whole-volume HRFs.

to model the peak and one to model the undershoot. The fixed-ISI HRFs did not have enough points to model the undershoot. The fitted functions were used to build the design matrices. The maps in Fig. 5.7 resulting from a t-test show shapes very similar to those in Fig. 5.4, but here the detected activations are stronger. This indicates that the model used here captures all variance captured by those methods. The difference between this analysis and the previous one is that only one basis function is used here, enabling a covariance test with stronger responses.

Table 4.2 of the previous chapter is extended in this chapter by adding the maximum variance ratio values found in the tests of this section. Recall that a high variance ratio indicates that much of the variance in the signal is explained by the model, and that the residual noise in the GLM (see Eq. (5.1)) is small. Table 5.1 shows that the HRFs


Figure 5.7. SPM{T}s of the activation found by using the modelled HRFs: (a) fixed-ISI data, random-ISI whole-volume HRF, (b) fixed-ISI data, random-ISI regional HRF, (c) random-ISI data, fixed-ISI whole-volume HRF, (d) random-ISI data, fixed-ISI regional HRF.

Table 5.1. A comparison of the maximum variance ratio values found in this Chapter and those found in Chapter 4.

              ForWaRD (splines)    original ForWaRD     selective averaging
              volume   region      volume   region      volume   region      HRFspm
fixed-ISI     120      163         113      162         –        –           117
random-ISI    103      101         103      102         102      104         74


extracted from the random-ISI data set by ForWaRD with orthogonal spline wavelets yield better results than the HRFs previously extracted. Especially the undershoot of the whole-volume HRF is better detected with ForWaRD and orthogonal spline wavelets (see Fig. 5.5). The HRFs extracted from the fixed-ISI data set do not show measurable improvements. As indicated in Section 5.5.1, a possible explanation for this is that the ISI is too short to successfully capture the whole HRF.

The generally small differences between the results of both versions of ForWaRD may be explained by the fact that ForWaRD is quite robust to the choice of wavelet filters, as observed in the tests in Chapter 4. Another possible regularising factor is that the functions HRFpar are used rather than the coefficients themselves, and that different HRF time signals yield very similar fits.

5.6 Conclusion

We have presented an HRF extraction method for fMRI time series based on ForWaRD. The output of our algorithm is a post-stimulus time series, representing the HRF in every voxel. The existing ForWaRD method was extended by introducing a novel, frequency-domain, implementation of the SI-DWT. Computation time was reduced by precomputing and reusing the exponentials required for the polyphase decomposition and the monophase reconstruction. The efficiency of the algorithm was further increased by changing the dimensionality of the signals and filters according to the number of phases. Timings show that for signals longer than 64 points, the speed gain of the frequency-domain transform is considerable. This enabled us to efficiently use orthogonal spline wavelets. We also presented a model for the HRF that can be used in combination with the extracted coefficients to predict event-related fMRI responses. The modelled HRF appears to capture the same amount of variance in one basis function as the tested traditional methods, which require multiple basis functions.


Given: signal s of length N, wavelet filters h and g

$s \xrightarrow{\text{FFT}} S$; $h \xrightarrow{\text{FFT}} H$; $g \xrightarrow{\text{FFT}} G$

Forward transform (SI-DWT):

1: Compute shifts, the set of complex exponentials for the shifts
2: $C_0 := S$; $Q := 1$
3: for j = 1 to J do
4:   shifts, $C_{j-1} \xrightarrow{\text{polyphase}(Q)} \{C_{Q,q}\}_{q=0}^{Q-1}$ (see Eq. (B.14))
5:   for q = 0 to Q − 1 do
6:     $D_{Q,q} := C_{Q,q}\,G$ (pointwise multiplication)
       $C_{Q,q} := C_{Q,q}\,H$ (pointwise multiplication)
7:   end for
8:   shifts, $\{C_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{monophase}(Q)} C_j$ (see Eq. (B.15))
     shifts, $\{D_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{monophase}(Q)} D_j$ (see Eq. (B.15))
9:   $H := \downarrow_2 H$; $G := \downarrow_2 G$
10:  $Q := 2Q$
11: end for

Result: $C_J$ and $\{D_j\}_{j=1}^{J}$.

Inverse transform (SI-IDWT):

1: Compute shifts, the set of complex exponentials for the shifts†
2: for j = J downto 1 do
3:   $Q = 2^{j-1}$
4:   shifts, $C_j \xrightarrow{\text{polyphase}(Q)} \{C_{Q,q}\}_{q=0}^{Q-1}$ (see Eq. (B.14))
     shifts, $D_j \xrightarrow{\text{polyphase}(Q)} \{D_{Q,q}\}_{q=0}^{Q-1}$ (see Eq. (B.14))
5:   $H \xrightarrow{\downarrow 2^{j-1}} H_s$
     $G \xrightarrow{\downarrow 2^{j-1}} G_s$
6:   for q = 0 to Q − 1 do
7:     $C_{Q,q} := (C_{Q,q}\,H_s + D_{Q,q}\,G_s)/2$
8:   end for
9:   shifts, $\{C_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{monophase}(Q)} C_{j-1}$ (see Eq. (B.15))
10:  $Q := Q/2$
11: end for

$C_0 \xrightarrow{\text{1D-IFFT}} s$

† These shifts are the same as those used in the forward transform.

Algorithm 5.2: Frequency-domain SI-DWT and SI-IDWT in pseudo-code.
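The structure of Algorithm 5.2 can be followed in a compact NumPy sketch. Several choices below are assumptions made for the illustration and are not taken from the thesis: Haar filters stand in for the orthogonal spline filters, the shift exponentials are recomputed in every call instead of being read from a lookup table, and the inverse transform uses the complex conjugates of the filters (the pseudo-code writes $H_s$ and $G_s$ without saying whether the reconstruction filters already include this conjugation).

```python
import numpy as np

def polyphase(C, Q):
    """Row q of the result is the FFT of time samples q, q+Q, q+2Q, ...,
    computed from the N-point spectrum C."""
    N = C.shape[0]
    M = N // Q
    blocks = C.reshape(Q, M)                   # blocks[l, k] = C[l*M + k]
    n = np.arange(N).reshape(Q, M)             # frequency index l*M + k
    q = np.arange(Q)[:, None, None]
    shifts = np.exp(2j * np.pi * q * n / N)    # shifts[q, l, k]
    return (shifts * blocks).mean(axis=1)      # average over the blocks l

def monophase(P):
    """Merge the Q phase spectra (rows of P) back into one N-point spectrum."""
    Q, M = P.shape
    n = np.arange(Q * M)
    q = np.arange(Q)[:, None]
    shifts = np.exp(-2j * np.pi * q * n / (Q * M))
    return (shifts * np.tile(P, (1, Q))).sum(axis=0)

def si_dwt_freq(S, H, G, J):
    """Forward transform: returns C_J and the list [D_1, ..., D_J]."""
    C, Q, Hs, Gs, details = S, 1, H, G, []
    for _ in range(J):
        P = polyphase(C, Q)
        details.append(monophase(P * Gs))      # detail channel D_j
        C = monophase(P * Hs)                  # approximation channel C_j
        Hs, Gs = Hs[::2], Gs[::2]              # subsample the filters by 2
        Q *= 2
    return C, details

def si_idwt_freq(C, details, H, G):
    """Inverse transform: returns the spectrum C_0 of the original signal."""
    for j in range(len(details), 0, -1):
        Q = 2 ** (j - 1)
        Hs, Gs = np.conj(H[::Q]), np.conj(G[::Q])   # conjugation: assumption
        PC, PD = polyphase(C, Q), polyphase(details[j - 1], Q)
        C = monophase((PC * Hs + PD * Gs) / 2)
    return C

# Round trip on a random signal with Haar filters (stand-in filters).
N, J = 32, 3
h = np.array([1.0, 1.0]) / np.sqrt(2)          # low-pass
g = np.array([1.0, -1.0]) / np.sqrt(2)         # high-pass
H, G = np.fft.fft(h, N), np.fft.fft(g, N)
s = np.random.default_rng(1).standard_normal(N)
C, details = si_dwt_freq(np.fft.fft(s), H, G, J)
s_rec = np.fft.ifft(si_idwt_freq(C, details, H, G)).real
```

The division by 2 in the inverse transform matches filters normalised such that |H|² + |G|² = 2 at every frequency, which holds for the Haar pair used here; under that condition the round trip reproduces the input signal.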

Chapter 6

Discussion

6.1 Summary and conclusions

Within the field of functional neuroimaging, fMRI plays a prominent role. Its versatility makes fMRI the preferred technique for a large range of experiments. The complexity of the research questions and the data that are measured requires powerful analysis methods.

The first part of Chapter 1 provides an overview of some aspects of MR imaging that are relevant to fMRI. The physical principles of MRI and the steps required to produce images are briefly discussed. The contrast in fMRI is based on the different magnetic properties of oxygenated vs. deoxygenated blood, which can be measured with an MR scanner. The application of fMRI has evolved in the last decade of the 20th century from subtracting blocks of images in single-subject experiments to group studies of event-related experiments. Analysis methods have evolved from one-sample t-tests on block difference images to nonparametric tests based on ANCOVA designs. The central part of most analysis methods in use today is the general linear model, assuming the BOLD response to be linear and time invariant. The common method for analysis is statistical parametric mapping, introducing the efficient use of many standard hypothesis tests in the statistical analysis of neuroimages. The downside of the popularity of statistical parametric mapping is that it uses many assumptions about the data, which are often used without asserting their validity. Recent studies have criticised the rash adoption of these assumptions, demonstrating cases where they are not true.

The second part of Chapter 1 introduces the concept of wavelet analysis. A wavelet transform decomposes a signal into a number of versions at different scales. Two types of wavelet transform are discussed: the fast wavelet transform, which is based on multi-resolution analysis, and the translation-invariant wavelet transform, which is based on polyphase decomposition. Implementations of both types of transform in the time domain, as well as in the frequency domain, are presented. The applications of wavelet transforms in this thesis, i.e., denoising and waveform extraction, are briefly reviewed.

Chapter 2 provides a critical analysis of the definition of the BOLD signal. In fMRI research the noise is assumed to be additive and Gaussian distributed, but MRI data have a Rician distribution. Rician noise is multiplicative instead of additive, and unlike Gaussian noise, it has an asymmetric distribution. We demonstrate that if every BOLD image is defined as the difference between two MR images with the same ground truth image and the same signal-to-noise ratio (SNR), its grey value distribution is symmetric and, to a close approximation, Gaussian. These properties of the null distribution (the distribution of the noise if there is no activation) are analytically derived, and confirmed via tests on images with synthetic noise. Tests on a noise-free MR template image contaminated with synthetic Rician noise show the asymmetry of the distribution of the residual (the original image subtracted from the noisy image). The difference between two of these noisy images has a symmetric distribution.
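The contrast between the asymmetric residual and the symmetric difference image can be illustrated with a small simulation. The following is a minimal sketch, not the test code used in Chapter 2: the flat "template", the amplitude, the noise level, and the helper names `rician` and `skewness` are assumptions chosen for illustration.

```python
import numpy as np

def rician(amplitude, sigma, rng):
    """Magnitude of a complex signal with i.i.d. Gaussian noise on both channels."""
    real = amplitude + sigma * rng.standard_normal(amplitude.shape)
    imag = sigma * rng.standard_normal(amplitude.shape)
    return np.hypot(real, imag)

def skewness(values):
    """Sample skewness: third central moment over variance^(3/2)."""
    centred = values - values.mean()
    return (centred ** 3).mean() / (centred ** 2).mean() ** 1.5

rng = np.random.default_rng(0)
template = np.full(200_000, 1.0)   # hypothetical noise-free grey values (A = 1)
sigma = 1.0                        # low SNR, where the Rician asymmetry is strong

noisy_a = rician(template, sigma, rng)
noisy_b = rician(template, sigma, rng)

skew_residual = skewness(noisy_a - template)   # noisy image minus ground truth
skew_bold = skewness(noisy_a - noisy_b)        # difference of two noisy images
```

Under these assumptions the residual should show a clearly positive skewness, whereas the skewness of the difference image should be close to zero.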

In fMRI time series analysis, the temporal noise distribution is used to determine the threshold used in the statistical tests. Tests with a time series of images with synthetic noise show that, if a second time series with noise is subtracted, the parameters of the (near-Gaussian) temporal noise can be estimated very accurately. A final test uses a time series of real MR volumes. The time series is split in two, and the residual time series of the first part is obtained in two ways: firstly, by subtracting its time series mean volume and secondly, by subtracting the volume in the second part from the corresponding volume in the first part. In the latter case, the noise distribution can be approximated better by a Gaussian than in the first case. From these results we conclude that the BOLD signal computed in the latter way agrees much better with the assumption of Gaussian distributed temporal BOLD noise than the old definition, which uses the time series mean to obtain the residual time series.

A generic wavelet-based denoising scheme for functional MR images is presented in Chapter 3. The hypothesis testing framework of statistical parametric mapping is introduced, and different methods for multiple hypothesis testing correction are reviewed. Bonferroni correction, the simplest method, is considered too conservative for spatially correlated noise. Corrections based on Gaussian random field theory are not favourable either, because they require heavy smoothing, obscuring detailed structures and producing deformed regions of activation. False discovery rate control is more powerful than Bonferroni correction and does not require smoothing. Wavelet-based denoising and false discovery rate control are proposed for the statistical analysis of fMRI time series. The fast wavelet transform (FWT) efficiently transforms a signal to its wavelet representation. The denoising schemes are developed by extending the threshold selection schemes in WaveLab to 2D images. Symmetric, orthogonal, cubic spline basis functions are used for the wavelet transforms.

Tests on noisy 2D BOLD images are performed by making two copies of a template image, contaminating both of them with Rician noise, adding activation to one of the images and subtracting the other image. The resulting BOLD images are denoised by wavelet-based methods and Gaussian smoothing, respectively. Comparisons of the increase in SNR of the BOLD images after denoising show that the wavelet-based methods that smooth less can produce higher output SNRs than the methods that smooth more. Tests on time series of 2D images show that the region where signal is detected in the denoised time series is closest to the ground truth for the methods that smooth less. Maps of the temporal signal-to-noise ratio and maps resulting from an ANCOVA analysis also show fewer errors for these methods. The validity of statistical parametric mapping after wavelet-based denoising is tested by measuring the amount of negative spatial correlations in the noise, and by checking the distribution of p-values of time series without activation. A final test on a real fMRI data set shows that after denoising with the methods that smooth less, the detected region does not change much. After heavy smoothing, the detected regions are elliptic and much larger than the ones detected in the original time series. We conclude that the number of false positives in functional MR time series analysis is better controlled with preprocessing methods that smooth less. These wavelet-based denoising methods, such as InvShrink, MinMaxThresh and SUREThresh thresholding in the wavelet domain, in combination with FDR control, form an attractive alternative to Gaussian smoothing and Gaussian RFT-based multiple hypothesis correction.
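To make the difference between Bonferroni correction and FDR control concrete, the following minimal sketch implements the step-up procedure of Benjamini and Hochberg (1995). It is an illustration only, assuming independent tests; the function name `benjamini_hochberg` and the example p-values are hypothetical and do not come from the experiments in Chapter 3.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up rule: find the largest i with p_(i) <= q * i / n
    below = ranked <= q * np.arange(1, n + 1) / n
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        cutoff = np.nonzero(below)[0].max()
        rejected[order[: cutoff + 1]] = True
    return rejected

# Hypothetical p-values for ten voxels
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

fdr_rejected = benjamini_hochberg(p_values, q=0.05)
bonferroni_rejected = np.asarray(p_values) <= 0.05 / len(p_values)
```

For these example values the step-up rule rejects two hypotheses where Bonferroni correction rejects one, which is the kind of gain in power referred to above.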

Chapters 4 and 5 present methods to extract the haemodynamic response function (HRF) based on Fourier-wavelet regularised deconvolution (ForWaRD). Chapter 4 describes the problem of extracting the HRF from noisy fMRI time series. Given the general linear model, the HRF can be extracted using deconvolution, but this hardly ever yields a stable solution. Regularisation of the process is done by extracting the HRF from the noisy signal in the frequency domain, shrinking the frequency coefficients to remove noise, and performing wavelet-domain Wiener shrinkage to further improve the results. The fast wavelet transform is not shift-invariant, so ForWaRD relies on the shift-invariant wavelet transform (SI-DWT). Tests on simulated activation signals, combined with real fMRI data to provide the noise, show that the algorithm is very robust to changes in the input signals and changes in its parameters.
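The idea of shrinking frequency coefficients during deconvolution can be sketched as follows. This is a simplified stand-in for the Fourier stage only, using Tikhonov-style shrinkage with a fixed regularisation constant; it is not the ForWaRD algorithm of Chapter 4 and omits the wavelet-domain Wiener step entirely. The stimulus train, the toy response shape, the noise level, and the name `shrunken_deconvolution` are assumptions made for this example.

```python
import numpy as np

def shrunken_deconvolution(measured, stimulus, reg):
    """Deconvolve in the Fourier domain with Tikhonov-style shrinkage."""
    Y = np.fft.fft(measured)
    S = np.fft.fft(stimulus)
    # conj(S) / (|S|^2 + reg) replaces the unstable inverse filter 1 / S
    return np.fft.ifft(Y * np.conj(S) / (np.abs(S) ** 2 + reg)).real

rng = np.random.default_rng(0)
n = 128
stimulus = (rng.random(n) < 0.2).astype(float)   # hypothetical random event train
t = np.arange(n)
response = t * np.exp(-t / 2.0)                  # toy response shape, not an HRF model
response /= response.max()

# Periodic convolution of stimulus and response, plus Gaussian noise
clean = np.fft.ifft(np.fft.fft(stimulus) * np.fft.fft(response)).real
noisy = clean + 0.1 * rng.standard_normal(n)

recovered_clean = shrunken_deconvolution(clean, stimulus, reg=1e-9)
recovered_noisy = shrunken_deconvolution(noisy, stimulus, reg=1.0)
```

With noise-free data and a negligible regularisation constant the response is recovered essentially exactly; with noisy data a larger constant trades some bias for stability, which is the role the shrinkage plays in the regularised scheme.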

A new model for the HRF is given, based on linear systems showing damped oscillations. The BOLD signal derived from such a system and its derivative provide a functional description of the HRF. In combination with HRF coefficients extracted from an fMRI data set by the ForWaRD-based algorithm, this model may provide a detailed description of the HRF.

The extraction method and the model are tested on a set of two experiments with one test person and one stimulus type. The time series were measured on different days, and the interstimulus interval was fixed in the first experiment and randomised in the second. HRF signals were extracted from the data using the extracted HRF at each voxel, weighted with a statistical value computed at each voxel. HRF signals were extracted from both studies and the HRF of each of the two time series was used to produce a statistical map (computed with ANCOVA) in the other time series. The analyses performed with the extracted and modelled HRFs were compared with analyses performed with a standard HRF (i.e., the HRF of the SPM program). Tests indicate that using extracted and modelled HRFs may substantially improve the statistical analysis of fMRI time series.

The method introduced in Chapter 4 is extended in Chapter 5 to support long wavelet filters, in particular those constituting orthogonal spline wavelet bases. These filters are as long as the signals they decompose, making filtering in the time domain very expensive. We facilitate the use of these wavelets by introducing a frequency-domain implementation of the shift-invariant wavelet transform. The shift-invariant wavelet transform is based on polyphase decompositions rather than on subsampling (like the FWT), and the frequency-domain implementation of the SI-DWT performs the polyphase transform, and its inverse, in the frequency domain. Formulas for these transforms are given in an appendix. For long signals, the frequency-domain version of the SI-DWT is shown to be much more efficient than the time-domain version.

The same time series as used in Chapter 4 are analysed using the frequency-domain implementation of the HRF extraction algorithm. After modelling the HRFs, ANCOVA tests are applied to both experiments. Results are similar to those found in Chapter 4. Detection of activation is improved by using orthogonal splines for extracting the HRF from the random-ISI time series. For HRFs extracted from the fixed-ISI time series, the use of orthogonal splines did not improve the results. For orthogonal splines and other long wavelet filters, using the frequency-domain implementation significantly reduces the computation load.

6.2 Perspectives

This thesis has addressed a number of topics in functional neuroimage analysis. Where new insights and ideas are presented, new questions naturally arise as well.

In the first part of the thesis, new ways to treat fMRI noise were given. Chapter 2 provides a method to obtain a BOLD signal with near-Gaussian noise from fMRI time series where the images have Rician noise. However, other noise sources, related to subject movement and non-experiment-related physiological processes, were not taken into account. Noise models can be refined by also incorporating the effects of those disturbance factors.

The method of statistical parametric mapping is based on the noise model being known. A number of recent, promising methods mentioned in the introduction use parameter-free tests. The noise model presented in Chapter 2 provides new insights about the distribution of BOLD noise. However, since little is known about other noise sources that contribute to the BOLD signal, and about their distributions, parameter-free tests offer an attractive alternative to noise parameterisation.

Chapter 3 presents a wavelet-based denoising scheme for functional neuroimages, and embeds it into the general linear model framework for statistical analysis by combining it with false discovery rate control for multiple hypothesis correction. The threshold selection schemes used in the denoising routines are from the WaveLab toolbox, and fractional spline wavelet basis functions are used. A comparison of wavelet-based denoising methods and Gaussian smoothing shows that the wavelet-based denoising methods yield fewer false positives than Gaussian smoothing. The 2D FWT was used in the denoising routines, and as explained in part two of the thesis, the FWT is not translation-invariant. A possible way to improve the denoising scheme is to use the SI-DWT instead of the FWT. As new wavelet-based denoising methods and new wavelet basis functions are still being invented, incorporating these may improve the method as well.

The second part of the thesis deals with HRF extraction. This is a relatively new area of research: many methods for HRF extraction have recently been proposed and are reviewed in Chapter 4. The method proposed in this chapter is attractive, because it requires no knowledge about the shape of the HRF, and extracting it requires only the MR image time series and the stimulus pattern. Methods already exist to extract multiple HRFs from more complex fMRI experiments, and also to combine data of different subjects to yield more robust results. The method presented in Chapter 4 can also be extended to extract more general, or more robust, results by using multiple sources of information.

The HRF model presented in Chapter 4 relates to other HRF models based on dynamical systems, but it is attractive because of its simplicity, and its compatibility with the general linear model. One possibility to make this model more flexible is already demonstrated in Section 4.5.2, where two functions describing a damped oscillator are used instead of one, in order to model different stages of the haemodynamic response. Further refinement of this model may include the modelling of nonlinear effects, and modelling differential responses by measuring the parameters of the descriptive function for different regions and different stimulus types.

Chapter 5 describes a technical extension to the HRF extraction method of Chapter 4, by facilitating the use of orthogonal spline wavelets. An efficient frequency-domain implementation of the SI-DWT is presented, and a comparison of the computation times of the time-domain SI-DWT and the frequency-domain SI-DWT shows that the latter is preferred when orthogonal spline wavelets are used. The frequency-domain implementation of the SI-DWT is at present only available for 1D signals; it would be interesting to see the computational benefit of this implementation on multidimensional signals.

The denoising scheme presented in Chapter 3 is already available as a ‘plug-in’ for the SPM program. We also plan to integrate the HRF extraction routine and the HRF model into SPM.

Appendix A

Mathematical analysis of the null distribution

In this appendix we present proofs of the exact analytical results presented in Section 2.3 on the distribution of the BOLD signal under the null hypothesis.

A.1 Mean and variance of the BOLD distribution

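First, the mean $\mu_s$ is zero because of the symmetry of $C_{A,\sigma}(s)$. Second, since the mean is zero, the variance of the BOLD distribution satisfies

$$
\begin{aligned}
\sigma_s^2 &= \int_{-\infty}^{\infty} ds\, s^2\, C(s) \\
&= \int_{-\infty}^{\infty} ds\, s^2 \int_0^{\infty} dr_1 \int_0^{\infty} dr_2\, p_{A,\sigma}(r_1)\, p_{A,\sigma}(r_2)\, \delta(r_2 - r_1 - s) \\
&= \int_0^{\infty} dr_1 \int_0^{\infty} dr_2\, p_{A,\sigma}(r_1)\, p_{A,\sigma}(r_2) \int_{-\infty}^{\infty} ds\, s^2\, \delta(r_2 - r_1 - s) \\
&= \int_0^{\infty} dr_1 \int_0^{\infty} dr_2\, p_{A,\sigma}(r_1)\, p_{A,\sigma}(r_2)\, (r_1 - r_2)^2 \\
&= \int_0^{\infty} dr_1 \int_0^{\infty} dr_2\, p_{A,\sigma}(r_1)\, p_{A,\sigma}(r_2)\, (r_1^2 + r_2^2 - 2 r_1 r_2) \\
&= \int_0^{\infty} dr_1\, r_1^2\, p_{A,\sigma}(r_1) + \int_0^{\infty} dr_2\, r_2^2\, p_{A,\sigma}(r_2) - 2 \left( \int_0^{\infty} dr_1\, r_1\, p_{A,\sigma}(r_1) \right) \left( \int_0^{\infty} dr_2\, r_2\, p_{A,\sigma}(r_2) \right) \\
&= 2E(r^2) - 2E(r)^2 = 2\sigma_r^2 .
\end{aligned}
$$

Here $E(\ldots)$ denotes the average of the quantity within the brackets. So we have found that $\sigma_s^2 = 2\sigma_r^2$, which directly yields (2.12).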
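The relation between the variance of the difference signal and the variance of a single Rician variate can be checked by simulation. The sketch below is an illustration under assumed parameter values (amplitude, noise level, and sample count are hypothetical); `rician_samples` is a helper written for this example.

```python
import numpy as np

def rician_samples(rng, amplitude, sigma, count):
    """Rician variates: modulus of (amplitude + noise) + i * noise."""
    real = amplitude + sigma * rng.standard_normal(count)
    imag = sigma * rng.standard_normal(count)
    return np.hypot(real, imag)

rng = np.random.default_rng(1)
amplitude, sigma, count = 2.0, 1.0, 400_000   # hypothetical A, sigma, sample size

r1 = rician_samples(rng, amplitude, sigma, count)
r2 = rician_samples(rng, amplitude, sigma, count)
s = r2 - r1                                   # BOLD signal under the null hypothesis

var_r = 0.5 * (r1.var() + r2.var())           # pooled estimate of sigma_r^2
ratio = s.var() / var_r                       # the derivation predicts a ratio of 2
mean_s = s.mean()                             # the derivation predicts a mean of 0
```

Up to sampling error, `ratio` should be close to 2 and `mean_s` close to 0 for any choice of amplitude and noise level.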


A.2 Exact form of the null distribution in the Rayleigh case

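Substituting the form (2.13) of the Rayleigh distribution in expression (2.10), we find

$$
C_{0,\sigma}(s) = \int_0^{\infty} dr\, \frac{r}{\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}\, \frac{r + |s|}{\sigma^2}\, e^{-\frac{(r+|s|)^2}{2\sigma^2}} . \tag{A.1}
$$

Putting $r/\sigma = x$, $|s|/\sigma = q$, $A/\sigma = a$, we find after some algebra:

$$
C_{0,\sigma}(s) = \frac{1}{\sigma} \int_0^{\infty} dx \left\{ (x + q/2)^2 - q^2/4 \right\} e^{-(x+q/2)^2 - q^2/4} . \tag{A.2}
$$

Again, putting $y = x + q/2$:

$$
C_{0,\sigma}(s) = \frac{1}{\sigma}\, e^{-\frac{q^2}{4}} \int_{q/2}^{\infty} dy \left( y^2 - q^2/4 \right) e^{-y^2} . \tag{A.3}
$$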

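Writing $\tau = q/2$, we can write this integral as the sum of two terms, each of which can be expressed in terms of the complementary error function:

$$
C_{0,\sigma}(s) = \frac{1}{\sigma}\, e^{-\tau^2} S_2 - \frac{1}{\sigma}\, e^{-\tau^2} \tau^2 S_0 , \tag{A.4}
$$

where

$$
\begin{aligned}
S_0 &= \int_{\tau}^{\infty} dy\, e^{-y^2} = \frac{\sqrt{\pi}}{2}\, \mathrm{erfc}(\tau) , \\
S_2 &= \int_{\tau}^{\infty} dy\, y^2 e^{-y^2} = \frac{1}{2}\, \tau\, e^{-\tau^2} + \frac{\sqrt{\pi}}{4}\, \mathrm{erfc}(\tau) .
\end{aligned}
$$

Substitution of these expressions in (A.4) yields

$$
C_{0,\sigma}(s) = \frac{1}{2\sigma}\, e^{-\tau^2} \left\{ \tau\, e^{-\tau^2} + \frac{\sqrt{\pi}}{2}\, (1 - 2\tau^2)\, \mathrm{erfc}(\tau) \right\} . \tag{A.5}
$$

Re-expressing $\tau$ in terms of the original variable $s$ (i.e., $\tau = q/2 = |s|/(2\sigma)$), we obtain formula (2.14).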
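As a sanity check, the closed-form expression can be compared numerically with the defining integral. The sketch below assumes the closed form (A.5) with $\tau = |s|/(2\sigma)$ and the integral (A.1) over the product of two Rayleigh densities; the value of sigma, the integration limits, and the function names are choices made for this example.

```python
import math
import numpy as np

def closed_form(s, sigma):
    """Null distribution in the Rayleigh case, Eq. (A.5), tau = |s| / (2 sigma)."""
    tau = abs(s) / (2.0 * sigma)
    bracket = (tau * math.exp(-tau ** 2)
               + 0.5 * math.sqrt(math.pi) * (1.0 - 2.0 * tau ** 2) * math.erfc(tau))
    return math.exp(-tau ** 2) * bracket / (2.0 * sigma)

def trapezoid(values, step):
    """Composite trapezoidal rule on an equally spaced grid."""
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))

def by_quadrature(s, sigma, n=200_001):
    """Direct numerical evaluation of the integral in Eq. (A.1)."""
    r = np.linspace(0.0, 12.0 * sigma, n)

    def rayleigh(t):
        return t / sigma ** 2 * np.exp(-t ** 2 / (2.0 * sigma ** 2))

    return trapezoid(rayleigh(r) * rayleigh(r + abs(s)), r[1] - r[0])

sigma = 1.5   # hypothetical noise level
max_error = max(abs(closed_form(s, sigma) - by_quadrature(s, sigma))
                for s in (0.0, 0.5, 2.0, 5.0))

# A probability density in s must integrate to one
s_grid = np.linspace(-12.0 * sigma, 12.0 * sigma, 4001)
density = np.array([closed_form(s, sigma) for s in s_grid])
total_mass = trapezoid(density, s_grid[1] - s_grid[0])
```

Agreement of the two evaluations, together with a total mass of one, supports the algebra leading from (A.1) to (A.5).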

A.3 Tails of the null distribution

We consider the limiting cases of low and high SNR, i.e., $A = 0$ and $A/\sigma$ large.


Case A=0

This is the Rayleigh case, for which we have derived an exact expression for the null distribution, see formula (2.14). When $s$ is large, we can use the asymptotic behaviour of the error function (Abramowitz and Stegun 1972)

$$
\mathrm{erfc}(z) \sim \frac{1}{\sqrt{\pi}\, z}\, e^{-z^2} , \qquad z \to \infty . \tag{A.6}
$$

Substituting this in (2.14), we find after some rearrangement of terms

$$
C_{0,\sigma}(s) \sim \frac{1}{2|s|}\, e^{-\frac{s^2}{2\sigma^2}} , \qquad s \to \infty , \tag{A.7}
$$

which behaves as a Gaussian tail of width σ multiplied by a factor 1/ |s|.

Case A/σ large

Since $A/\sigma$ is large, we apply the Gaussian approximation of the Rice distribution:

$$
p_{A,\sigma}(r) \sim \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(r-A)^2}{2\sigma^2}} .
$$

As shown in (Gudbjartsson and Patz 1995), this approximation is already accurate for $A \geq 2\sigma$.

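Substituting this in (2.10), we get

$$
C_{A,\sigma}(s) \sim \int_0^{\infty} dr\, \frac{1}{2\pi\sigma^2}\, e^{-\frac{(r-A)^2}{2\sigma^2}}\, e^{-\frac{(r+|s|-A)^2}{2\sigma^2}} .
$$

Putting $r/\sigma = x$, $|s|/\sigma = q$, $A/\sigma = a$, we find after some algebra:

$$
C_{A,\sigma}(s) \sim \frac{1}{2\pi\sigma} \int_0^{\infty} dx\, e^{-\frac{(x-a)^2}{2}}\, e^{-\frac{(x+q-a)^2}{2}} = \frac{1}{2\pi\sigma}\, e^{-\frac{q^2}{4}} \int_0^{\infty} dx\, e^{-(x+q/2-a)^2} .
$$

Again, putting $y = x + q/2 - a$:

$$
C_{A,\sigma}(s) \sim \frac{1}{2\pi\sigma}\, e^{-\frac{q^2}{4}} \int_{q/2-a}^{\infty} dy\, e^{-y^2} = \frac{1}{2\pi\sigma}\, e^{-\frac{q^2}{4}}\, \frac{\sqrt{\pi}}{2}\, \mathrm{erfc}(q/2 - a) .
$$

In terms of the original variable $s$:

$$
C_{A,\sigma}(s) \sim \frac{1}{4\sqrt{\pi}\, \sigma}\, e^{-\frac{s^2}{4\sigma^2}}\, \mathrm{erfc}\!\left( \frac{|s|/2 - A}{\sigma} \right) . \tag{A.8}
$$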


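Applying the asymptotic expansion (A.6) of the erfc function for large argument, we find:

$$
C_{A,\sigma}(s) \sim \frac{1}{2\pi (|s| - 2A)}\, e^{-\frac{(|s|-A)^2 + A^2}{2\sigma^2}} .
$$

Finally, since $|s|$ is large, we can replace $|s|/2 - A$ by $|s|$,

$$
C_{A,\sigma}(s) \sim \text{constant} \cdot \frac{1}{|s|}\, e^{-\frac{(|s|-A)^2}{2\sigma^2}} , \qquad s \to \infty , \tag{A.9}
$$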

which again behaves as a Gaussian tail of width σ multiplied by a factor 1/ |s|.

Appendix B

Polyphase decompositions in the frequency domain

B.1 Introduction

Given a signal $x(n)$, $n = 0, \ldots, N-1$ and its $N$-point Fourier transform $X(k)$, $k = 0, \ldots, N-1$, we derive formulas for its polyphase decomposition in the frequency domain, see Eq. (B.14). In the polyphase representation of a signal decomposed in $Q$ phases, we denote each phase by:

$$
X_{Q,q}(k), \qquad k = 0, \ldots, N/Q - 1, \quad q = 0, \ldots, Q - 1 . \tag{B.1}
$$

The reconstruction to one phase, which we call the monophase reconstruction, recovers $X(K)$, $K = 0, \ldots, N-1$ from the phase components $X_{Q,q}$ (B.1). A formula for this reconstruction is given in Eq. (B.15).

B.2 Upsampling and downsampling in the frequency domain

Implementations of the FWT in the frequency domain use upsampling and downsampling by a factor of 2 in the frequency domain (Rioul and Duhamel 1992, Vetterli and Herley 1992, Westenberg and Roerdink 2000). This case, corresponding to a biphase decomposition and regarding only the first phase, is extended to an arbitrary number of phases (in the case of wavelet transforms this number is of the form $2^k$, $k \in \mathbb{N}$).

B.2.1 Up/downsampling by a factor of 2

For notational convenience, we use the Z-transform of $x(n)$, which is defined as

$$
X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n} , \qquad z \in \mathbb{C} . \tag{B.2}
$$


On the unit circle in the complex plane, the Z-transform $X(e^{2\pi i k/N})$ coincides with the element $X(k)$ of the discrete Fourier transform (DFT) of $x$, see Eq. (1.1). Splitting a signal into its even- and odd-numbered samples, respectively, is called the biphase decomposition. The signals consisting of the even and odd samples of $x$ have Z-transforms

$$
\begin{aligned}
X_{\text{even}}(z) &= \sum_{n=0}^{N/2-1} x(2n)\, z^{-n} , \\
X_{\text{odd}}(z) &= \sum_{n=0}^{N/2-1} x(2n+1)\, z^{-n} .
\end{aligned}
\tag{B.3}
$$

It has been shown (Vetterli and Herley 1992) that

$$
X(z) = X_{\text{even}}(z^2) + z^{-1} X_{\text{odd}}(z^2) . \tag{B.4}
$$

From (B.4) we find:

$$
\begin{aligned}
X_{\text{even}}(z) &= \frac{1}{2} \left( X(z^{\frac{1}{2}}) + X(-z^{\frac{1}{2}}) \right) , \\
X_{\text{odd}}(z) &= \frac{1}{2}\, z^{\frac{1}{2}} \left( X(z^{\frac{1}{2}}) - X(-z^{\frac{1}{2}}) \right) ,
\end{aligned}
\tag{B.5}
$$

which are the Z-transforms of the two phases in the biphase decomposition.

The Z-transform of the upsampled signal $x$ is given by

$$
X_{\text{up}}(z) = X(z^2) , \tag{B.6}
$$

i.e., the coefficients of $X(z)$ are the even samples in the upsampled signal.

The DFT of $x(n)$ after downsampling (which is the even part of the biphase decomposition) is obtained via:

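$$
\begin{aligned}
X_{\text{even}}(k) &= X_{\text{even}}\!\left(e^{\frac{2\pi i k}{N/2}}\right) = \frac{1}{2} \left( X(k) + X\!\left(k - \frac{N}{2}\right) \right) , \\
X_{\text{odd}}(k) &= X_{\text{odd}}\!\left(e^{\frac{2\pi i k}{N/2}}\right) = \frac{1}{2}\, e^{\frac{2\pi i k}{N}} \left( X(k) - X\!\left(k - \frac{N}{2}\right) \right) ,
\end{aligned}
\qquad k = 0, \ldots, N/2 - 1 . \tag{B.7}
$$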
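The biphase formulas can be verified numerically against a direct DFT of the even and odd samples. The sketch below assumes the DFT convention of `numpy.fft.fft`, which uses a negative exponent and so agrees with evaluating the Z-transform (B.2) on the unit circle; the signal length and random test signal are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
X = np.fft.fft(x)                      # X(k) = sum_n x(n) exp(-2 pi i k n / N)

k = np.arange(N // 2)
X_shifted = X[(k - N // 2) % N]        # X(k - N/2), using the N-periodicity of the DFT

# Frequency-domain biphase decomposition
X_even = 0.5 * (X[k] + X_shifted)
X_odd = 0.5 * np.exp(2j * np.pi * k / N) * (X[k] - X_shifted)

# Reference: split in the time domain, then transform each phase
even_ok = np.allclose(X_even, np.fft.fft(x[0::2]))
odd_ok = np.allclose(X_odd, np.fft.fft(x[1::2]))
```

Both comparisons should hold to machine precision, since the frequency-domain formulas are exact rather than approximate.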

Given a signal $x(n)$ of length $N/2$, the DFT of the upsampled signal is:

$$
X_{\text{up}}(k) = X_{\text{up}}\!\left(e^{\frac{2\pi i k}{N}}\right) = X\!\left(e^{\frac{2\pi i k}{N/2}}\right) , \qquad k = 0, \ldots, N - 1 . \tag{B.8}
$$

The spectrum $X$ has period $N/2$, so the spectrum $X_{\text{up}}$ doubles $X$, i.e., the spectrum $X_{\text{up}}$ is the concatenation of two copies of $X(k)$.
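The concatenation property of the upsampled spectrum is easy to confirm numerically. In this short sketch, upsampling is taken to mean inserting a zero after every sample, so that the original samples become the even samples of the longer signal; the signal length is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
half = 8                               # N/2, the length of the original signal
x = rng.standard_normal(half)

x_up = np.zeros(2 * half)
x_up[0::2] = x                         # original samples become the even samples

X = np.fft.fft(x)                      # N/2-point spectrum
X_up = np.fft.fft(x_up)                # N-point spectrum of the upsampled signal

concatenation_ok = np.allclose(X_up, np.concatenate([X, X]))
```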


B.3 Up/downsampling by an arbitrary factor

Given a signal $x(n)$ of length $N$, we define the signals downsampled by a factor of $Q$ (assume $Q$ to be a divisor of the signal length $N$) and shifted over an index $q$, $q = 0, 1, \ldots, Q-1$ by

shift 0: $x_{Q,0} = x(0), x(Q), x(2Q), \ldots, x(N-Q)$
shift 1: $x_{Q,1} = x(1), x(Q+1), x(2Q+1), \ldots, x(N-Q+1)$
shift 2: $x_{Q,2} = x(2), x(Q+2), x(2Q+2), \ldots, x(N-Q+2)$
...
shift Q−1: $x_{Q,Q-1} = x(Q-1), x(2Q-1), x(3Q-1), \ldots, x(N-1)$.

Splitting a signal $x$ into the signals $x_{Q,0}, \ldots, x_{Q,Q-1}$ is called the polyphase decomposition of $x$. Each signal $x_{Q,q}$, $q = 0, \ldots, Q-1$ represents a phase component of $x$.

B.3.1 The Z-transform

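The Z-transform of each phase component $x_{Q,q}$ is denoted by $X_{Q,q}(z)$:

$$
X_{Q,q}(z) = \sum_{n=0}^{N/Q-1} x(Qn + q)\, z^{-n} . \tag{B.9}
$$

The Z-transform of the signal, when it is decomposed into $Q$ phases, is a generalisation of (B.4):

$$
\begin{aligned}
X(z) &= \sum_{n=0}^{N-1} x(n)\, z^{-n} \\
&= \sum_{q=0}^{Q-1} \sum_{n'} x(Qn' + q)\, z^{-(Qn'+q)} \\
&= \sum_{q=0}^{Q-1} z^{-q}\, X_{Q,q}(z^Q) .
\end{aligned}
\tag{B.10}
$$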

This enables us to formulate a general expression for the polyphase decomposition of a signal in the frequency domain:

Theorem B.3.1 Let $x(n)$ be a signal of length $N$ with Z-transform $X(z)$. Let $x_{Q,q}$ denote the downsampled signal with downsampling factor $Q$ and shift $q$, whose Z-transform is denoted by $X_{Q,q}(z)$. Then:

$$
X_{Q,q}(z) = \frac{1}{Q}\, z^{\frac{q}{Q}} \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell q}{Q}}\, X\!\left(e^{\frac{2\pi i \ell}{Q}} z^{\frac{1}{Q}}\right) . \tag{B.11}
$$


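Proof: Denoting the sum in the right-hand side of (B.11) by SUM and inserting the polyphase decomposition (B.10), we get

$$
\begin{aligned}
\text{SUM} &= \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell q}{Q}}\, X\!\left(e^{\frac{2\pi i \ell}{Q}} z^{\frac{1}{Q}}\right) \\
&= \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell q}{Q}} \sum_{m=0}^{Q-1} e^{-\frac{2\pi i \ell m}{Q}}\, z^{-\frac{m}{Q}}\, X_{Q,m}(z) \\
&= \sum_{m=0}^{Q-1} z^{-\frac{m}{Q}}\, X_{Q,m}(z) \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell (q-m)}{Q}} .
\end{aligned}
$$

We know that

$$
\sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell (q-m)}{Q}} = Q\, \delta_{q,m} ,
$$

where $\delta_{q,m}$ is the Kronecker delta function. Therefore,

$$
\text{SUM} = \sum_{m=0}^{Q-1} z^{-\frac{m}{Q}}\, X_{Q,m}(z)\, Q\, \delta_{q,m} = z^{-\frac{q}{Q}}\, X_{Q,q}(z)\, Q .
$$

This completes the proof. □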

B.3.2 The DFT

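The next step is reformulating (B.11) in terms of DFT coefficients. Let $X(k)$ be the $N$-point Fourier transform of a signal $x(n)$ of length $N$. Also, let $X_{Q,q}(k)$ be the $k$-th Fourier coefficient of the phase component $x_{Q,q}$ of length $N/Q$, i.e.,

$$
\begin{aligned}
X(k) &= X\!\left(e^{\frac{2\pi i k}{N}}\right) , \\
X_{Q,q}(k) &= X_{Q,q}\!\left(e^{\frac{2\pi i k}{N/Q}}\right) .
\end{aligned}
\tag{B.12}
$$

Substituting this into (B.11) yields:

$$
\begin{aligned}
X_{Q,q}(k) &= \frac{1}{Q}\, e^{\frac{2\pi i k q}{N}} \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell q}{Q}}\, X\!\left(e^{\frac{2\pi i \ell}{Q}}\, e^{\frac{2\pi i k}{N}}\right) \\
&= \frac{1}{Q}\, e^{\frac{2\pi i k q}{N}} \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell q}{Q}}\, X\!\left(e^{\frac{2\pi i (k + \ell N/Q)}{N}}\right) .
\end{aligned}
\tag{B.13}
$$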


As a result, the equation which extends the biphase decomposition in the frequency domain to the more general polyphase decomposition is given by:

$$
X_{Q,q}(k) = \frac{1}{Q}\, e^{\frac{2\pi i k q}{N}} \sum_{\ell=0}^{Q-1} e^{\frac{2\pi i \ell q}{Q}}\, X\!\left(k + \frac{\ell N}{Q}\right) , \qquad k = 0, 1, \ldots, N/Q - 1, \quad q = 0, 1, \ldots, Q - 1 . \tag{B.14}
$$

This formula expresses the DFT coefficients of the phase components in the DFT coefficients of the original signal. The cases $Q = 1$ and $Q = 2$ are treated in the next section.
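The polyphase formula can be turned into a few lines of code and checked against the time-domain definition of the phase components. This is a minimal sketch assuming the DFT convention of `numpy.fft.fft`; the function name `polyphase_freq` and the test signal are illustrative and are not taken from the implementation described in Chapter 5.

```python
import numpy as np

def polyphase_freq(X, Q):
    """Phase spectra X_{Q,q}(k) from the N-point DFT X, following Eq. (B.14)."""
    N = X.size
    if N % Q:
        raise ValueError("Q must divide the signal length N")
    M = N // Q                                   # length of each phase component
    k = np.arange(M)
    ell = np.arange(Q)
    # X_shifted[ell, k] = X(k + ell * N / Q)
    X_shifted = X[(k[None, :] + ell[:, None] * M) % N]
    phases = np.empty((Q, M), dtype=complex)
    for q in range(Q):
        twiddle = np.exp(2j * np.pi * ell * q / Q)            # e^{2 pi i ell q / Q}
        phases[q] = np.exp(2j * np.pi * k * q / N) * (twiddle @ X_shifted) / Q
    return phases

rng = np.random.default_rng(0)
N, Q = 32, 4
x = rng.standard_normal(N)
phases = polyphase_freq(np.fft.fft(x), Q)

# Reference: take every Q-th sample starting at q, then transform
reference = np.array([np.fft.fft(x[q::Q]) for q in range(Q)])
polyphase_ok = np.allclose(phases, reference)
```

For a fixed number of phases, each phase spectrum costs $O(N)$ operations in this form, so no inverse transform to the time domain is needed between filtering steps.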

The monophase reconstruction in the frequency domain, which transforms the polyphase representation of a signal back into the DFT coefficients $X(K)$, is given by the following equation:

$$
\text{let } K = k + \frac{\ell N}{Q} , \qquad X(K) = \sum_{q=0}^{Q-1} e^{-\frac{2\pi i q K}{N}}\, X_{Q,q}(k) , \qquad k = 0, 1, \ldots, N/Q - 1, \quad \ell = 0, 1, \ldots, Q - 1 . \tag{B.15}
$$

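Proof: We start with Eq. (B.14),

$$
\begin{aligned}
X_{Q,q}(k) &= \frac{1}{Q}\, e^{\frac{2\pi i k q}{N}} \sum_{\ell'=0}^{Q-1} e^{\frac{2\pi i \ell' q}{Q}}\, X(k + \ell' N/Q) \\
&= \frac{1}{Q} \sum_{\ell'=0}^{Q-1} e^{\frac{2\pi i q}{N}\left(k + \frac{\ell' N}{Q}\right)}\, X(k + \ell' N/Q) .
\end{aligned}
$$

Insert this expression for $X_{Q,q}(k)$ into the right-hand side (RHS) of (B.15):

$$
\text{RHS} = \sum_{q=0}^{Q-1} e^{-\frac{2\pi i q}{N}\left(k + \frac{\ell N}{Q}\right)}\, X_{Q,q}(k) = \sum_{q=0}^{Q-1} e^{-\frac{2\pi i q}{N}\left(k + \frac{\ell N}{Q}\right)}\, \frac{1}{Q} \sum_{\ell'=0}^{Q-1} e^{\frac{2\pi i q}{N}\left(k + \frac{\ell' N}{Q}\right)}\, X(k + \ell' N/Q) .
$$

Changing the order of the terms yields:

$$
\begin{aligned}
\text{RHS} &= \frac{1}{Q} \sum_{\ell'=0}^{Q-1} X(k + \ell' N/Q) \sum_{q=0}^{Q-1} e^{-\frac{2\pi i q}{N}\left(k + \frac{\ell N}{Q}\right)}\, e^{\frac{2\pi i q}{N}\left(k + \frac{\ell' N}{Q}\right)} \\
&= \frac{1}{Q} \sum_{\ell'=0}^{Q-1} X(k + \ell' N/Q) \underbrace{\sum_{q=0}^{Q-1} \underbrace{e^{-\frac{2\pi i q}{N} (\ell - \ell') \frac{N}{Q}}}_{e^{-\frac{2\pi i q}{Q} (\ell - \ell')}}}_{Q\, \delta_{\ell,\ell'}} \\
&= X\!\left(k + \frac{\ell N}{Q}\right) .
\end{aligned}
$$

□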
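The monophase reconstruction can be checked with a round trip: split a signal into phases in the time domain, transform each phase, and recombine the phase spectra with Eq. (B.15). The sketch below assumes the DFT convention of `numpy.fft.fft`; the function name `monophase_freq` and the test signal are illustrative choices.

```python
import numpy as np

def monophase_freq(phases):
    """Recover the N-point DFT X(K) from the phase spectra, following Eq. (B.15)."""
    Q, M = phases.shape                # Q phases of length M = N / Q
    N = Q * M
    K = np.arange(N)
    q = np.arange(Q)
    # weights[q, K] = e^{-2 pi i q K / N}
    weights = np.exp(-2j * np.pi * q[:, None] * K[None, :] / N)
    # K = k + ell * N / Q, so k = K mod N/Q selects the matching phase coefficient
    return (weights * phases[:, K % M]).sum(axis=0)

rng = np.random.default_rng(0)
N, Q = 32, 4
x = rng.standard_normal(N)

phases = np.array([np.fft.fft(x[q::Q]) for q in range(Q)])   # X_{Q,q}(k)
X_rebuilt = monophase_freq(phases)

monophase_ok = np.allclose(X_rebuilt, np.fft.fft(x))
```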

B.4 Different downsampling factors Q

B.4.1 Q = 1

The first value, $Q = 1$, is trivial, but serves as a good illustration of how these equations can be simplified. For $Q = 1$, we need to treat one phase, $q = 0$, and the length of the resulting signal is $N$, so $k = 0, \ldots, N-1$.

$$
X_{1,0}(k) = \frac{1}{1}\, e^{\frac{2\pi i k \cdot 0}{N}} \sum_{\ell=0}^{0} e^{\frac{2\pi i \ell \cdot 0}{1}}\, X(k + \ell N/1) = X(k)
$$

This reduction is trivial: $X_{1,0}(k)$ reduces to the DFT $X(k)$ of the signal $x(n)$.

B.4.2 Q = 2

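The case $Q = 2$ is known as biphase decomposition, and one of its applications is efficient computation of the FWT in the frequency domain (Vetterli and Herley 1992). For $Q = 2$, the phases $q = \{0, 1\}$ must be treated, and the length of the resulting signal is $N/2$, so $k = 0, 1, \ldots, N/2 - 1$.

$$
\begin{aligned}
X_{2,0}(k) &= \frac{1}{2}\, e^{\frac{2\pi i k \cdot 0}{N}} \sum_{\ell=0}^{1} e^{\frac{2\pi i \ell \cdot 0}{2}}\, X(k + \ell N/2) \\
&= \frac{1}{2} \left( X(k) + X(k + N/2) \right) \\
X_{2,1}(k) &= \frac{1}{2}\, e^{\frac{2\pi i k \cdot 1}{N}} \sum_{\ell=0}^{1} e^{\frac{2\pi i \ell \cdot 1}{2}}\, X(k + \ell N/2) \\
&= \frac{1}{2}\, e^{\frac{2\pi i k}{N}} \left( X(k) + e^{\pi i} X(k + N/2) \right) = \frac{1}{2}\, e^{\frac{2\pi i k}{N}} \left( X(k) - X(k + N/2) \right)
\end{aligned}
$$

This shows that for $Q = 2$, (B.14) reduces to the biphase decomposition (B.7).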

B.5 Computing the FWT in the frequency domain

The biphase decomposition and the corresponding reconstruction in the frequency domain are used to perform the downsampling and upsampling operations of Eq. (1.8) to compute the FWT in the frequency domain. The differences between the time-domain algorithm and the frequency-domain algorithm are (i) the methods for upsampling and downsampling, respectively, and (ii) the convolution operation. A convolution in the time domain is a multiplication in the frequency domain. Given the signals $c^j$, $j = 1, \ldots, J$ and the orthogonal wavelet filters $h$ and $g$ (see also Section 1.2), and denoting their frequency-domain representations by $H$ and $G$, the decomposition steps and reconstruction steps of the FWT, respectively, are shown in the table below. The Fourier transforms of the dual filters $\tilde{h}$ and $\tilde{g}$ are denoted by $\tilde{H}$ and $\tilde{G}$.

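Table B.1. Decomposition (forward) and reconstruction (inverse) fast wavelet transform operations in the time and frequency domain.

| | time domain | frequency domain |
|---|---|---|
| forward | $c^j = \downarrow_2 (h * c^{j-1})$ | $C^j = (H\, C^{j-1})_{\text{even}}$ |
| | $d^j = \downarrow_2 (g * c^{j-1})$ | $D^j = (G\, C^{j-1})_{\text{even}}$ |
| inverse | $c^{j-1} = \tilde{h} * (\uparrow_2 c^j) + \tilde{g} * (\uparrow_2 d^j)$ | $C^{j-1} = \tilde{H}\, (C^j)_{\text{up}} + \tilde{G}\, (D^j)_{\text{up}}$ |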
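The equivalence of the two forward columns can be demonstrated for a single decomposition step. The sketch below assumes periodic (circular) convolution and uses Haar filters purely as an illustrative choice; it does not use the spline wavelets of the thesis, and the exact filter convention of Eq. (1.8) may differ. The helper names are hypothetical.

```python
import numpy as np

def circular_convolve(a, b):
    """Periodic (circular) convolution of two equal-length sequences."""
    n = a.size
    return np.array([sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)])

def even_part(X):
    """DFT of the even samples, from the DFT X of the full signal (biphase formula)."""
    half = X.size // 2
    return 0.5 * (X[:half] + X[half:])

rng = np.random.default_rng(0)
N = 16
c_prev = rng.standard_normal(N)          # stands in for c^{j-1}

# Haar filters, zero-padded to length N; an illustrative choice only
h = np.zeros(N)
g = np.zeros(N)
h[:2] = [2 ** -0.5, 2 ** -0.5]
g[:2] = [2 ** -0.5, -(2 ** -0.5)]

# Time domain: filter, then keep the even samples
c_time = circular_convolve(h, c_prev)[0::2]
d_time = circular_convolve(g, c_prev)[0::2]

# Frequency domain: multiply spectra, then take the even part
C_prev = np.fft.fft(c_prev)
C_next = even_part(np.fft.fft(h) * C_prev)
D_next = even_part(np.fft.fft(g) * C_prev)

c_freq = np.fft.ifft(C_next).real
d_freq = np.fft.ifft(D_next).real
```

Both routes should give the same approximation and detail coefficients; the frequency-domain route replaces each convolution by a pointwise product, which is what makes long filters affordable.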

B.6 Two-dimensional FWT in the frequency domain

In two dimensions, the FWT in the frequency domain is defined by the following decomposition and reconstruction steps. Let $HH$, $HG$, $GH$, and $GG$ be the tensor products of $H$ and $G$, so that $HG(k_1, k_2) = H(k_1)\, G(k_2)$, etc. Downsampling of a 2D signal $c$ of size $N \times N$ is represented in the frequency domain as:

$$
C_{\text{even}x,\text{even}y}(k, l) = \frac{1}{4} \left[ C(k, l) + C(k, l+m) + C(k+m, l) + C(k+m, l+m) \right] , \qquad m = N/2, \quad k, l = 1, \ldots, m \tag{B.16}
$$

and upsampling as:

$$
C_{\text{up}x,\text{up}y} = \begin{bmatrix} C & C \\ C & C \end{bmatrix} \tag{B.17}
$$

where $C$ denotes the matrix with coefficients $C(k, l)$ as described in Eq. (B.16). The forward step (decomposition) of the 2D FWT in the frequency domain is defined as

$$
\begin{aligned}
C^j &= (HH\; C^{j-1})_{\text{even}x,\text{even}y} & D^j_1 &= (HG\; C^{j-1})_{\text{even}x,\text{even}y} \\
D^j_2 &= (GH\; C^{j-1})_{\text{even}x,\text{even}y} & D^j_3 &= (GG\; C^{j-1})_{\text{even}x,\text{even}y} ,
\end{aligned}
\tag{B.18}
$$

after which the filters are subsampled in both dimensions by a factor of 2, to extract the $N/2 \times N/2$-point Fourier transforms of $H$ and $G$, respectively, from their $N \times N$-point Fourier transforms (Westenberg and Roerdink 2000). These $N/2 \times N/2$-point Fourier transforms are used in the next decomposition level.
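The 2D downsampling and upsampling rules can be checked directly with a 2D FFT. This sketch uses zero-based indices (the equations above count from 1), the convention of `numpy.fft.fft2`, and an arbitrary random test image.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
m = N // 2

# Downsampling: average the four quadrants of the spectrum, cf. Eq. (B.16)
c = rng.standard_normal((N, N))
C = np.fft.fft2(c)
C_down = 0.25 * (C[:m, :m] + C[:m, m:] + C[m:, :m] + C[m:, m:])
down_ok = np.allclose(C_down, np.fft.fft2(c[0::2, 0::2]))

# Upsampling: tile the spectrum 2 x 2, cf. Eq. (B.17)
c_small = rng.standard_normal((m, m))
c_up = np.zeros((N, N))
c_up[0::2, 0::2] = c_small             # zeros inserted between samples in x and y
C_small = np.fft.fft2(c_small)
up_ok = np.allclose(np.fft.fft2(c_up), np.tile(C_small, (2, 2)))
```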


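The 2D dual filters are constructed from the 1D dual filters similarly as above, and the inverse step (reconstruction) is defined as:

$$
C^{j-1} = \widetilde{HH}\, (C^j)_{\text{up}x,\text{up}y} + \widetilde{HG}\, (D^j_1)_{\text{up}x,\text{up}y} + \widetilde{GH}\, (D^j_2)_{\text{up}x,\text{up}y} + \widetilde{GG}\, (D^j_3)_{\text{up}x,\text{up}y} . \tag{B.19}
$$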

B.7 The SI-DWT in the time / frequency domain

The SI-DWT (Mallat 1991) uses the polyphase decomposition instead of usual subsampling. An important difference between subsampling and polyphase decomposition is that subsampling is not invertible, but polyphase decomposition is. The SI-DWT performs a monophase reconstruction after the filtering step. Such a reconstruction does not take place in the FWT. The computation steps for the SI-DWT in the time domain and in the frequency domain are given in Table B.2. Each step starts with a polyphase transform, then the phase components are filtered separately, and a monophase reconstruction is performed on the filtered phase components. The polyphase transforms and monophase reconstructions, denoted by arrows, are computed via the formulas in Sec. B.3.

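Table B.2. Decomposition (forward) and reconstruction (inverse) shift-invariant wavelet transform operations in the time and frequency domain.

| | time domain | frequency domain |
|---|---|---|
| forward | 1. $c^j \xrightarrow{\text{poly}} \{c_{Q,q}\}_{q=0}^{Q-1}$, $Q = 2^j$ | 1. $C^j \xrightarrow{\text{poly}} \{C_{Q,q}\}_{q=0}^{Q-1}$, $Q = 2^j$ |
| | 2. $\forall q: d_{Q,q} := g * c_{Q,q}$ | 2. $\forall q: D_{Q,q} := G\, C_{Q,q}$ |
| | $\forall q: c_{Q,q} := h * c_{Q,q}$ | $\forall q: C_{Q,q} := H\, C_{Q,q}$ |
| | 3. $\{d_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{mono}} d^{j+1}$ | 3. $\{D_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{mono}} D^{j+1}$ |
| | $\{c_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{mono}} c^{j+1}$ | $\{C_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{mono}} C^{j+1}$ |
| | | 4. $G := \downarrow_2 G$, $H := \downarrow_2 H$ |
| inverse | 1. $d^{j+1} \xrightarrow{\text{poly}} \{d_{Q,q}\}_{q=0}^{Q-1}$, $Q = 2^j$ | 1. $D^{j+1} \xrightarrow{\text{poly}} \{D_{Q,q}\}_{q=0}^{Q-1}$, $Q = 2^j$ |
| | $c^{j+1} \xrightarrow{\text{poly}} \{c_{Q,q}\}_{q=0}^{Q-1}$, $Q = 2^j$ | $C^{j+1} \xrightarrow{\text{poly}} \{C_{Q,q}\}_{q=0}^{Q-1}$, $Q = 2^j$ |
| | | 2. $G_s := \downarrow_{2^j} G$, $H_s := \downarrow_{2^j} H$ |
| | 2. $\forall q: d_{Q,q} := g * c_{Q,q}$ | 3. $\forall q: D_{Q,q} := G_s\, C_{Q,q}$ |
| | $\forall q: c_{Q,q} := h * c_{Q,q}$ | $\forall q: C_{Q,q} := H_s\, C_{Q,q}$ |
| | 3. $\forall q: c_{Q,q} := (c_{Q,q} + d_{Q,q})/2$ | 4. $\forall q: C_{Q,q} := (C_{Q,q} + D_{Q,q})/2$ |
| | 4. $\{c_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{mono}} c^j$ | 5. $\{C_{Q,q}\}_{q=0}^{Q-1} \xrightarrow{\text{mono}} C^j$ |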
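The pattern "polyphase transform, filter each phase, monophase reconstruction" can be illustrated for a single filter. The sketch below assumes periodic boundary handling and a hypothetical 3-tap smoothing filter; it shows that filtering the $Q$ phases separately with a spectrum subsampled by $Q$ and interleaving the results is equivalent to one circular convolution with the filter upsampled by $Q$ (the "à trous" form, a standard property rather than something stated in the table). The function names are illustrative and the polyphase and monophase steps are carried out by time-domain slicing for brevity.

```python
import numpy as np

def filter_phases(x, h, Q):
    """Split x into Q phases, filter each phase circularly with h, and interleave."""
    N = x.size
    H = np.fft.fft(h, n=N)             # N-point DFT of the zero-padded filter
    H_s = H[::Q]                       # subsampled spectrum: the N/Q-point DFT of h
    y = np.empty(N)
    for q in range(Q):
        phase = x[q::Q]                                        # phase component x_{Q,q}
        y[q::Q] = np.fft.ifft(H_s * np.fft.fft(phase)).real    # filter, then put back
    return y

def filter_a_trous(x, h, Q):
    """Circular convolution of x with h upsampled by Q (Q - 1 zeros between taps)."""
    h_up = np.zeros(x.size)
    h_up[: Q * h.size : Q] = h
    return np.fft.ifft(np.fft.fft(h_up) * np.fft.fft(x)).real

rng = np.random.default_rng(0)
N, Q = 32, 4
x = rng.standard_normal(N)
h = np.array([0.25, 0.5, 0.25])        # hypothetical smoothing filter

same_result = np.allclose(filter_phases(x, h, Q), filter_a_trous(x, h, Q))
```

Subsampling the filter spectrum, as in the last forward step of the table, therefore yields the filter that is valid for the shorter phase components of the next level, provided the filter is no longer than a phase component.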

Bibliography

Abramovich, F. and Benjamini, Y. (1995), Thresholding of wavelet coefficients as multiple hypotheses testing procedure, in A. Antoniadis and G. Oppenheim, eds, ‘Wavelets and Statistics’, Vol. 103 of Lecture Notes in Statistics, Springer-Verlag, pp. 5–14.

Abramovich, F. and Benjamini, Y. (1996), ‘Adaptive thresholding of wavelet coefficients’, Computational Statistics and Data Analysis 22, 351–361.

Abramowitz, M. and Stegun, I. A., eds (1972), Handbook of mathematical functions, Vol. 55 of Applied Mathematics Series, 10th edn, National Bureau of Standards U.S.A., Washington.

Aguirre, G. K., Zarahn, E. and D’Esposito, M. (1998), ‘The variability of human, BOLD hemodynamic responses’, NeuroImage 8(4), 360–369.

Aguirre, G. K., Zarahn, E. and Esposito, M. D. (1997), ‘Empirical analyses of BOLD fMRI statistics II. spatially smoothed data collected under null-hypothesis and experimental conditions’, NeuroImage 5(3), 199–212.

Aine, C. (1995), ‘A conceptual overview and critique of functional neuroimaging techniques in humans: I. MRI/fMRI and PET’, Critical Reviews in Neurobiology 9(2-3), 229–309.

Alexander, M. E., Baumgartner, R., Summers, A. R., Windischberger, C., Klarhoefer, M., Moser, E. and Somorjai, R. L. (2000a), ‘A wavelet-based method for improving signal-to-noise ratio and contrast in MR images’, Magnetic Resonance Imaging 18(2), 169–80.

Alexander, M. E., Baumgartner, R., Windischberger, C., Moser, E. and Somorjai, R. L. (2000b), ‘Wavelet domain de-noising of time-courses in MR image sequences’, Magnetic Resonance Imaging 18(9), 1129–1134.

Ashburner, J. and Friston, K. J. (1997), ‘Multimodal image coregistration and partitioning - a unified framework’, NeuroImage 6(3), 209–217.

Ashburner, J. and Friston, K. J. (1999), ‘Nonlinear spatial normalization using basis functions’, Human Brain Mapping 7(4), 254–266.


Bandettini, P. A. and Cox, R. W. (2000), ‘Event-related fMRI contrast when using constant interstimulus interval: Theory and experiment’, Magnetic Resonance in Medicine 43, 540–548.

Bandettini, P. A., Jesmanowicz, A., Wong, E. C. and Hyde, J. S. (1993), ‘Processing strategies for time-course data sets in functional MRI of the human brain’, Magnetic Resonance in Medicine 30, 161–173.

Bandettini, P. A., Wong, E. C., Hinks, R. S., Tikofsky, R. S. and Hyde, J. S. (1992), ‘Time course EPI of human brain function during task activation’, Magnetic Resonance in Medicine 25, 390–397.

Belliveau, J. W., Kennedy, D. N., McKinstry, R. C., Buchbinder, B. R., Weisskoff, R. M., Cohen, M. S., Vevea, J. M., Brady, T. J. and Rosen, B. R. (1991), ‘Functional mapping of the human visual cortex by magnetic resonance imaging’, Science.

Benjamini, Y. and Hochberg, Y. (1995), ‘Controlling the false discovery rate: A practical and powerful approach to multiple testing’, Journal of the Royal Statistical Society 57(1), 289–300.

Benjamini, Y. and Yekutieli, D. (2001), ‘The control of the false discovery rate in multiple testing under dependency’, Annals of Statistics 29(4), 1165–1188.

Birn, R. M., Donahue, K. M. and Bandettini, P. A. (1999), Magnetic resonance imaging: principles, pulse sequences, and functional imaging, in W. Hendee, ed., ‘Biomedical Uses of Radiation’, Vol. 1, Wiley and Sons, chapter 9.

Boynton, G. M., Engel, S. A., Glover, G. H. and Heeger, D. J. (1996), ‘Linear systems analysis of functional magnetic resonance imaging in human V1’, The Journal of Neuroscience 16(13), 4207–4221.

Buckheit, J. B. and Donoho, D. L. (1995), Wavelab and reproducible research, Technical Report 474, Dept. of statistics, Stanford University. http://www-stat.stanford.edu/~wavelab.

Buckner, R. L., Bandettini, P. A., Craven, K. M. O., Savoy, R. L., Petersen, S. E., Raichle, M. E. and Rosen, B. R. (1996), ‘Detection of cortical activation during averaged single trials of a cognitive task using functional magnetic resonance imaging’, Proceedings of the National Academy of Sciences 93, 14878–14883.

Bullmore, E. T., Long, C., Suckling, J., Fadili, J., Calvert, G., Zelaya, F., Carpenter, T. A. and Brammer, M. (2001), ‘Colored noise and computational inference in neurophysiological (fMRI) time series analysis: Resampling methods in time and wavelet domains’, Human Brain Mapping 12, 61–78.

BIBLIOGRAPHY 141

Burock, M. A., Buckner, R. L., Woldorff, M. G., Rosen, B. R. and Dale, A. M. (1998), ‘Ran-domized event-related experimental designs allow for extremely rapid presenta-tion rates using functional MRI’, NeuroReport 9, 3735–3739.

Buxton, R. B., Wong, E. C. and Frank, L. R. (1998), ‘Dynamics of blood flow and oxy-genation changes during brain activation: the balloon model’, Magnetic Resonancein Medicine 39, 855–864.

Cai, T. T. (2003), ‘Rates of convergence and adaptation over Besov spaces under point-wise risk’, Statistica Sinica 13(3), 881–902.

Calhoun, V., Adal, T., Kraut, M. and Pearlson, G. (2000), ‘A weighted least-squares al-gorithm for estimation and visualization of relative latencies in event-related func-tional MRI’, Magnetic Resonance in Medicine 44(6), 947–954.

Ciuciu, P., Poline, J.-B., Marrelec, G., Idier, J., Pallier, C. and Benali, H. (2003), ‘Unsuper-vised robust non-parametric estimation of the hemodynamic response function forany fMRI experiment’, IEEE Transactions on Medical Imaging 22(10), 1224– 1234.

Cohen, A., Daubechies, I. and Feauveau, J. C. (1992), ‘Biorthogonal bases of compactlysupported wavelets’, Communications on Pure and Applied Mathematics 45, 485–560.

Dale, A. M. (1999), ‘Optimal experimental design for event-related fMRI’, Human BrainMapping 8, 109–114.

Dale, A. M. and Buckner, R. L. (1997), ‘Selective averaging of rapidly presented indi-vidual trials using fMRI’, Human Brain Mapping 5, 329–340.

Dartmouth Brain Imaging Center (1999), ‘Example fMRI data set’.http://dbic.dartmouth.edu/researcher/data.php.

Daubechies, I. (1988), ‘Orthonormal bases of compactly supported wavelets’, Commu-nications on Pure and Applied Mathematics 41, 909–996.

Daubechies, I. (1993), ‘Orthonormal bases of compactly supported wavelets: II. vari-ations on a theme’, SIAM Journal on Mathematical Analysis 24(2), 499–519.

Donahue, R. M. J. (1999), ‘A note on information seldom reported via the P value’, TheAmerican Statistician 53(4), 303–306.

Donoho, D. L. and Johnstone, I. M. (1994), ‘Ideal denoising in an orthonormal basischosen from a library of bases’, Comptes Rendus de l’Academie des Sciences, Series A319, 1317–1322.

Donoho, D. L. and Johnstone, I. M. (1995), ‘Adapting to unknown smoothness by wave-let shrinkage’, Journal of the American Statistical Association 90, 1200–1224.

142 BIBLIOGRAPHY

Dragotti, P. L. and Vetterli, M. (2002), Deconvolution with wavelet footprints for ill-posed inverse problems, in ‘Proc. IEEE: International Conference on Acoustics,Speech, and Signal Processing.’.

Edelstein, W. A., Bottomley, P. A. and Pfeifer, P. M. (1983), ‘A signal-to-noise calibrationprocedure for NMR imaging systems’, Medical Physics 11, 180–185.

Fadili, J. and Bullmore, E. T. (2001), ‘Wavelet-generalised least squares: a new BLU es-timator of regression models with long-memory errors’, NeuroImage 15, 217–232.

Feilner, M., Blu, T. and Unser, M. (1999), Statistical analysis of fMRI data using ortho-gonal filterbanks, in ‘Proc. SPIE: Wavelet Applications in Signal and Image Pro-cessing’, Vol. 3813, pp. 551–560.

Figueiredo, M. A. T. and Nowak, R. D. (2003), ‘An EM algorithm for wavelet-basedimage restoration’, IEEE Transactions on Image Processing.

Friston, K. J., Fletcher, P., Josephs, O., Holmes, A., Rugg, M. D. and Turner, R. (1998a),‘Event-related fMRI: Characterizing differential responses’, NeuroImage 7, 30–40.

Friston, K. J., Frackowiak, R. S. J. and Turner, R. (1995a), ‘Characterizing dynamic brainresponse with fMRI: A multivariate approach’, NeuroImage 2, 166–172.

Friston, K. J., Frith, C. D., Turner, R. and Frackowiak, R. S. J. (1995b), ‘Characterizingevoked hemodynamics with fMRI’, NeuroImage 2, 157–165.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. P., Frith, C. D. and Frackowiak, R.S. J. (1995c), ‘Statistical parametric maps in functional imaging: A general linearapproach’, Human Brain Mapping 2, 189–210. http://www.fil.ion.ucl.ac.uk/spm.

Friston, K. J., Josephs, O., Rees, G. and Turner, R. (1998b), ‘Nonlinear event-related re-sponses in fMRI’, Magnetic Resonance in Medicine 39, 41–52.

Friston, K. J., Michelli, A., Turner, R. and Price, C. J. (2000a), ‘Nonlinear responses infMRI: The balloon model, Volterra kernels and other hemodynamics’, NeuroImage12, 466–477.

Friston, K. J., Worsley, K. J., Frackowiak, R. S. J., Mazziotta, J. C. and Evans, A. C. (1994),‘Assessing the significance of focal activations using their spatial extent’, HumanBrain Mapping 1, 214–220.

Friston, K., Josephs, O., Zarahn, E., Holmes, A. P., Rouquette, S. and Poline, J.-B. (2000b),‘To smooth or not to smooth? Bias and efficiency in fMRI time-series analysis.’,NeuroImage 12, 196–208.

Genovese, C. R., Lazar, N. A. and Nichols, T. E. (2002), ‘Thresholding of statistical mapsin functional neuroimaging using the false discovery rate’, NeuroImage 15, 772–786.http://www.sph.umich.edu/˜nichols/FDR.

BIBLIOGRAPHY 143

Ghael, S. P., Sayeed, A. M. and Baraniuk, R. G. (1997), Improved wavelet denoisingvia empirical Wiener filtering, in ‘Proc. SPIE: Wavelet Applications in Signal andImage Processing’, Vol. 3169, pp. 389–399.

Glover, G. H. (1999), ‘Deconvolution of impulse response in event-related BOLD fMRI’,NeuroImage 9, 416–429.

Gold, S., Christian, B., Arndt, S., Zeien, G., Cizadlo, T., Johnson, D. L., Flaum, M. andAndreasen, N. C. (1998), ‘Functional MRI statistical software packages: A compar-ative analysis’, Human Brain Mapping 6, 73–84.

Gossl, G., Fahrmeir, L. and Auer, D. P. (2001), ‘Bayesian modeling of the hemodynamicresponse function in BOLD fMRI’, NeuroImage 14, 140–148.

Gudbjartsson, H. and Patz, S. (1995), ‘The Rician distribution of noisy MRI data’, Mag-netic Resonance in Medicine 34, 910–914.

Hanson, S. J. and Bly, B. M. (2001), ‘The distribution of bold susceptibility effects in thebrain is non-Gaussian’, NeuroReport 12(9), 1971–1976.

Henkelman, R. M. (1985), ‘Measurement of signal intensities in the presence of noise inMR images’, Medical Physics 12(2), 232–233.

Henson, R. N. A., Price, C. J., Rugg, M. D., Turner, R. and Friston, K. J. (2002), ‘Detectinglatency differences in event-related BOLD responses: Application to words versusnonwords and initial versus repeated face presentations’, NeuroImage 15(1), 83–97.

Hillery, A. D. and Chin, R. T. (1991), ‘Iterative Wiener filters for image restoration’, IEEETransactions on Signal Processing 39, 1892–1899.

Hilton, M., Ogden, T., Hattery, D., Eden, G. and Jawerth, B. (1996), Wavelet processingof functional MRI data, in A. Aldroubi and M. Unser, eds, ‘Wavelets in Biology andMedicine’, CRC Press.

Hinrichs, H., Scholz, M., Tempelmann, C., Woldorff, M. G., Dale, A. M. and Heinze,H. J. (2000), ‘Deconvolution of event-related fMRI responses in fast-rate experi-mental designs: Tracking amplitude variations’, Journal of Cognitive Neuroscience12(6(suppl. 2)), 76–89.

Holmes, A. P. (1995), Statistical Issues in functional Brain Mapping, PhD thesis, Univer-sity of Glasgow.

Howseman, A. M. and Bowtell, R. W. (1999), ‘Functional magnetic resonance ima-ging: Imaging techniques and contrast mechanisms’, Philosophical Transactions ofthe Royal Society 354, 1179–1194.

144 BIBLIOGRAPHY

Jansen, M. and Bultheel, A. (2001), ‘Empirical bayes approach to improve waveletthresholding for image noise reduction’, Journal of the American Statistical Associ-ation 96(454), 629–639.

Jasdzewski, G., Strangman, G., Wagner, J., Kwong, K. K., Poldrack, R. A. and Boas,D. A. (2003), ‘Differences in the hemodynamic response to event-related motorand visual paradigms as measured by near-infrared spectroscopy’, NeuroImage20(1), 479–488.

Johnstone, I., Kerkyacharian, G., Picard, D. and Raimondo, M. (2004), ‘Wavelet decon-volution in a periodic setting’, Journal of the Royal Statistical Society. accepted forpublication.

Johnstone, I. M. and Silverman, B. W. (1997), ‘Wavelet threshold estimators for data withcorrelated noise’, Journal of the Royal Statistical Society 59(2), 319–351.

Josephs, O. and Henson, R. N. A. (1999), ‘Event-related functional magnetic resonanceimaging: Modelling, inference and optimization’, Philosophical Transactions of theRoyal Society 354, 1215–1228.

Josephs, O., Turner, R. and Friston, K. J. (1997), ‘Event-related fMRI’, Human Brain Map-ping 5, 243–248.

Kalifa, J., Mallat, S. and Rouge, B. (2003), ‘Deconvolution by thresholding in mirrorwavelet bases’, IEEE Transactions on Image Processing 12(4), 446–457.

Kwan, R. K.-S., Evans, A. C. and Pike, G. B. (1996), An extensible MRI sim-ulator for post-processing evaluation, in ‘Proc. Visualization in BiomedicalComputing’, Vol. 1131 of Lecture Notes in Computer Science, pp. 135–140.http://www.bic.mni.mcgill.ca/brainweb.

LaConte, S. M., Ngan, S.-C. and Hu, X. (2000), ‘Wavelet transform-based Wiener filter-ing of event-related fMRI data’, Magnetic Resonance in Medicine 44, 746–757.

Liao, C. H., Worsley, K., Poline, J. B., Aston, J. A. D., Duncan, G. H. and Evans, A. C.(2002), ‘Estimating the delay of the response in fMRI data’, NeuroImage 16(3), 593–606.

Logothetis, N. K. (2002), ‘The neural basis of the blood-oxygen-level-dependent func-tional magnetic resonance imaging signal’, Philosophical Transactions of the RoyalSociety 357(1424), 1003–1037.

Logothetis, N. K., Pauls, J., Augath, M., Trinath, T. and Oeltermann, A. (2001), ‘Neuro-physiological investigation of the basis of the fMRI signal’, Nature 412, 150–157.

Luo, W.-L. and Nichols, T. E. (2003), ‘Diagnosis and exploration of massively univariatefMRI models’, NeuroImage 19(3), 1014–1032.

BIBLIOGRAPHY 145

Mallat, S. (1998), A Wavelet Tour of Signal Processing, Academic Press.

Mallat, S. G. (1989), ‘A theory for multiresolution signal decomposition: The wave-let representation’, IEEE Transactions on Pattern Analysis and Machine Intelligence11(7), 674–693.

Mallat, S. G. (1991), ‘Zero-crossings of a wavelet transform’, IEEE Transactions on Inform-ation Theory 37(4), 1019–1033.

Malonek, D. and Grinvald, A. (1996), ‘Interactions between electrical activity and cor-tical microcirculation revealed by imaging spectroscopy; implications for func-tional brain imaging’, Science 272(5261), 551–554.

Menon, R. S., Luknowsky, D. C. and Gati, J. S. (1998), ‘Mental chronometry usinglatency resolved functional MRI’, Proceedings of the National Academy of Sciences95(18), 10902–10907.

Meyer, F. G. (2003), ‘Wavelet-based estimation of a semiparametric generalized linearmodel of fMRI time-series’, IEEE Transactions on Medical Imaging 22(3), 315–322.

Miezin, F. M., Macotta, L., Ollinger, J. M., Petersen, S. E. and Buckner, R. L. (2000),‘Characterizing the hemodynamic response: Effects of presentation rate, samplingprocedure and the possibility of ordering brain activity based on relative timing’,NeuroImage 11, 735–759.

Neelamani, R., Choi, H. and Baraniuk, R. G. (2004), ‘ForWaRD: Fourier-wavelet regu-larized deconvolution for ill-conditioned systems’, IEEE Transactions on Signal Pro-cessing 52(2), 418– 433.

Nichols, T. E. and Holmes, A. P. (2002), ‘Nonparametric permutation tests for functionalneuroimaging: A primer with examples’, Human Brain Mapping 15(1), 1–25.

Nowak, R. D. (1999), ‘Wavelet-based Rician noise removal for magnetic resonance ima-ging’, IEEE Transactions on Image Processing 8(10), 1408 –1419.

Ogawa, S., Lee, T. M., Kay, A. R. and Tank, D. W. (1990), ‘Brain magnetic resonanceimaging with contrast dependent on blood oxygenation’, Proceedings of the NationalAcademy of Sciences 87, 9868–9872.

Ollinger, J. M., Corbetta, M. and Shulman, G. L. (2001a), ‘Separating processes within atrial in event-related functional MRI, II. Analysis’, NeuroImage 13(1), 218–229.

Ollinger, J. M., Shulman, G. L. and Corbetta, M. (2001b), ‘Separating processes within atrial in event-related functional MRI, I. The method’, NeuroImage 13(1), 210–217.

Panych, L. P. (1996), ‘Theoretical comparison of Fourier and wavelet encoding in mag-netic resonance imaging’, IEEE Transactions on Medical Imaging 15(2), 141–153.

146 BIBLIOGRAPHY

Petersson, K. M., Nichols, T. E., Poline, J.-B. and Holmes, A. P. (1999a), ‘Statistical limit-ations in functional neuroimaging I. Non-inferential methods and statistical mod-els’, Philosophical Transactions of the Royal Society 354, 1239–1260.

Petersson, K. M., Nichols, T. E., Poline, J.-B. and Holmes, A. P. (1999b), ‘Statistical lim-itations in functional neuroimaging II. Signal detection and statistical inference’,Philosophical Transactions of the Royal Society 354, 1261–1281.

Philips Medical Systems (1996), ‘Basic principles of MR imaging’, Best, the Netherlands.Second edition.

Pizurica, A., Philips, W., Lemahieu, I. and Acheroy, M. (2003), ‘A versatile wavelet do-main noise filtration technique for medical imaging’, IEEE Transactions on MedicalImaging 22(3), 323–331.

Raichle, M. E. (2001), ‘Bold insights’, Nature 412, 128–130.

Rajapakse, J. C. and Piyaratna, J. (2001.), ‘Bayesian approach to segmentation of stat-istical parametric maps’, IEEE Transactions on Biomedical Engineering 48(10), 1186–1194.

Rajapakse, J. C., Giedd, J. and Rapoport, J. L. (1997), ‘Statistical approach to segmenta-tion of single-channel MR images’, IEEE Transactions on Medical Imaging 16(2), 176–186.

Raz, J. and Turetsky, B. (1999), Wavelet ANOVA and fMRI, in ‘Proc. SPIE: Wavelet Ap-plications in Signal and Image Processing’, Vol. 3813, pp. 561–570.

Raz, J., Zheng, H., Ombao, H. and Turetsky, B. (2003), ‘Statistical tests for fMRI basedon experimental randomization’, NeuroImage 19(2), 226–232.

Rice, S. O. (1945), ‘Mathematical analysis of random noise III-IV’, Bell System TechnicalJournal.

Rioul, O. and Duhamel, P. (1992), ‘Fast algorithms for discrete and continuous wavelettransforms’, IEEE Transactions on Information Theory 38(2), 569–486.

Ruttimann, U. E., Unser, M., Rawlings, R. R., Rio, D., Ramsey, N. F., Mattay, V. S.,Hommer, D. W., Frank, J. A. and Weinberger, D. R. (1998), ‘Statistical analysis offunctional MRI data in the wavelet domain’, IEEE Transactions on Medical Imaging17(2), 142–154.

Sanchez-Avila, C. (2002), ‘Wavelet domain signal deconvolution with singularity-preserving regularization’, Mathematics and Computers in Simulation 2101, 1–12.

Sijbers, J., den Dekker, A. J., Audekerke, J. V., Verhoye, M. and Dyck, D. V. (1998a),‘Estimation of the noise in magnitude MR images’, Magnetic Resonance Imaging16(1), 87–90.

BIBLIOGRAPHY 147

Sijbers, J., den Dekker, A. J., Scheunders, P. and Dyck, D. V. (1998b), ‘Maximum like-lihood estimation of Rician distribution parameters’, IEEE Transactions on MedicalImaging 17(3), 357–361.

Sijbers, J., den Dekker, A. J., Verhoye, M., Raman, E. and Dyck, D. V. (1998c), Optimalestimation of T2 maps from magnitude MR data, in ‘Proc. SPIE: Medical Imaging’,Vol. 3338, pp. 384–390.

Smith, S. M. (2002), ‘Fast automated brain extraction’, Human Brain Mapping 17(3), 143–155.

Turkheimer, F. E., Banati, R. B., Visvikis, D., Aston, J. A. D., Gunn, R. N. and Cunning-ham, V. J. (2000), ‘Modeling dynamic PET-SPECT studies in the wavelet domain’,Journal of Cerebral Blood Flow and Metabolism 20, 879–893.

Turkheimer, F. E., Brett, M., Visvikis, D. and Cunningham, V. J. (1999), ‘Multiresolutionanalysis of emission tomography images in the wavelet domain’, Journal of CerebralBlood Flow and Metabolism 19(11), 1189–1208.

Turner, R., Howseman, A., Rees, G. E., Josephs, O. and Friston, K. J. (1998), ‘Functionalmagnetic resonance imaging of the human brain: data acquisition and analysis’,Experimental Brain Research 123, 5–12.

Unser, M. (1999), ‘Splines: A perfect fit for signal and image processing’, IEEE SignalProcessing Magazine 16(6), 22–38.

Unser, M. and Blu, T. (2000), ‘Fractional splines and wavelets’, SIAM Review 42(1), 43–67.

Vazquez, A. L. and Noll, D. C. (1998), ‘Non-linear aspects of the blood oxygenationresponse in functional MRI’, NeuroImage 7(2), 108–118.

Vetterli, M. and Herley, C. (1992), ‘Wavelets and filter banks: Theory and design’, IEEETransactions on Signal Processing 40(9), 2207–2232.

Westenberg, M. A. and Roerdink, J. B. T. M. (2000), ‘Frequency domain volume ren-dering by the wavelet X-ray transform’, IEEE Transactions on Image Processing9(7), 1249–1261.

Wink, A. M. and Roerdink, J. B. T. M. (2004), ‘Denoising functional MR images: a com-parison of wavelet denoising and Gaussian smoothing’, IEEE Transactions on Med-ical Imaging 23(3), 374–387.

Wood, J. C. and Johnson, M. K. (1999), ‘Wavelet packet denoising of magnetic resonanceimages: importance of Rician noise at low SNR’, Magnetic Resonance in Medicine41(3), 631–635.

148 BIBLIOGRAPHY

Worsley, K. J. and Friston, K. J. (1995), ‘Analysis of fMRI time-series revisited - again’,NeuroImage 2, 173–181.

Worsley, K. J., Liao, C., Aston, J., Petre, V., Duncan, G. H., Morales, F. and Evans, A. C.(2002), ‘A general statistical analysis for fMRI data’, NeuroImage 15, 1:15.

Worsley, K. J., Marrett, S., Neelin, P., Vandal, A. C., Friston, K. J. and Evans, A. C.(1996), ‘A unified statistical approach for determining significant signals in imagesof cerebral activation’, Human Brain Mapping 4, 58–73.

Worsley, K. J., Wolforth, M. and Evans, A. C. (1997), Scale space searches for a periodicsignal in fMRI data with spatially varying hemodynamic response, in ‘Proceedingsof BrainMap’96’.

Zarahn, E., Aguirre, G. and D’Esposito, M. (1997), ‘A trial-based experimental designfor fMRI’, NeuroImage 6, 122–138.

Zibulevsky, M. and Zeevi, Y. Y. (2002), ‘Extraction of a single source from multichanneldata using sparse decomposition’, Neurocomputing 49(1), 163–173.


Publications

Papers in scientific journals

A. M. Wink and J. B. T. M. Roerdink, Denoising functional MR images: a comparison of wavelet denoising and Gaussian smoothing, IEEE Transactions on Medical Imaging 23(3), 374–387, 2004.

Papers in conference proceedings

A. M. Wink and J. B. T. M. Roerdink, Enhancing functional neuroimages: wavelet denoising as an alternative to Gaussian smoothing, in Proc. International Conference on Computer Vision and Graphics, Vol. 2, pp. 787–792, September 25–29 2002, Zakopane, Poland.

A. M. Wink and J. B. T. M. Roerdink, The effect of image enhancement on the statistical analysis of functional neuroimages: Wavelet-based denoising and Gaussian smoothing, in Proc. SPIE: Medical Imaging, Vol. 5032, pp. 1320–1330, February 15–20 2003, San Diego, USA.

Submitted Material

A. M. Wink and J. B. T. M. Roerdink, BOLD noise assumptions in fMRI, submitted to IEEE Transactions on Medical Imaging, 2004.

A. M. Wink and J. B. T. M. Roerdink, Extracting the haemodynamic response function using Fourier-wavelet regularised deconvolution, submitted to IEEE Transactions on Medical Imaging, 2004.

A. M. Wink and J. B. T. M. Roerdink, Extracting the haemodynamic response function from fMRI time series using Fourier-wavelet regularised deconvolution with orthogonal spline wavelets, submitted to the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), September 26–30 2004, Rennes–St. Malo, France.

Summary (Samenvatting)

The term functional neuroimaging refers to the field that uses modern imaging techniques, such as magnetic resonance imaging (MRI), positron emission tomography (PET) and electroencephalography (EEG), to visualise neural processes. Functional MRI (fMRI) plays an important role in this field. The versatility of fMRI makes a wide variety of experiments possible. The complexity of the research questions and the size of the datasets call for powerful analysis methods.

The first part of chapter 1 of this thesis gives an overview of the aspects of MRI that are important for fMRI. The physical principles of MRI and the steps needed to form an image are treated briefly. The contrast in fMRI is based on the different magnetic properties of oxygen-poor and oxygen-rich blood, which can be measured with an MR scanner. Over the past ten years, the application of fMRI has developed from simply performing a t-test on the difference between images to performing nonparametric tests on ANCOVA-based experiments. Most present-day analysis methods are built around the general linear model, which assumes that the fMRI signal is linear and time-invariant. The most widely used method for fMRI analysis is statistical parametric mapping, which makes it possible to perform many simultaneous statistical tests efficiently. The drawback of this method's popularity is that it relies on many assumptions about the data, which are now often taken to be true without further checking. Recent studies have challenged these hasty assumptions by showing cases in which they do not hold.

The second part of chapter 1 introduces the concept of wavelet analysis. A wavelet transform decomposes a signal into a number of representations at different scales. Two transforms are considered: the fast wavelet transform, which is based on multiresolution analysis, and the shift-invariant wavelet transform, which is based on polyphase decomposition. Implementations of both types of transform are given in both the time domain and the frequency domain. The applications of wavelet transforms in this thesis, namely noise removal and the extraction of response functions, are discussed briefly.
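As an illustration of the multiresolution idea behind the fast wavelet transform, the following is a minimal time-domain sketch. It uses the Haar wavelet purely for brevity, not the spline wavelets or the frequency-domain implementation used in this thesis, and the function names (`haar_step`, `haar_fwt`, `haar_ifwt`) are hypothetical. It assumes the signal length is divisible by 2 to the power of the number of levels.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_step(signal):
    """One decomposition level: filter with the Haar pair and downsample by 2."""
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_fwt(signal, levels):
    """Return the coarsest approximation and the details, finest scale first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_ifwt(approx, details):
    """Invert haar_fwt by upsampling and merging, coarsest scale first."""
    current = list(approx)
    for detail in reversed(details):
        merged = []
        for a, d in zip(current, detail):
            merged.append((a + d) * SQRT_HALF)
            merged.append((a - d) * SQRT_HALF)
        current = merged
    return current
```

Because the Haar basis is orthonormal, the coefficients carry the same energy as the signal, and `haar_ifwt(*haar_fwt(x, levels))` should reproduce `x` up to rounding error.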

Chapter 2 contains a critical analysis of the definition of the fMRI signal. The noise (the unpredictable component) in the fMRI signal is usually modelled as Gaussian distributed and additive, but MRI measurements have a so-called Rician distribution. This means that the noise is not additive but multiplicative, and moreover that its distribution, unlike a Gaussian distribution, is asymmetric. This chapter shows that if each fMRI image is computed as the difference between two MR images of the same object and with the same signal-to-noise ratio, the grey-value distribution of that image is symmetric and, to a very close approximation, Gaussian. These properties of the so-called null distribution (the noise distribution in the absence of activation) are derived analytically and confirmed by experiments on images with synthetic noise. Experiments with a noise-free MR example image and synthetic Rician-distributed noise show that the values in the difference image of the noisy image and the noise-free image are not symmetrically distributed. The difference image of two noisy images does have a symmetric distribution.
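The contrast between the two kinds of difference image can be sketched with a small simulation. This is an illustrative sketch only, not the experiment from chapter 2: it draws single-voxel magnitudes instead of images, and the amplitude, noise level, sample count and function names are assumptions chosen for the example. A Rician sample is generated as the magnitude of a complex value whose real and imaginary parts carry independent Gaussian noise.

```python
import math
import random


def rician_sample(rng, amplitude, sigma):
    """Magnitude of a complex measurement with Gaussian noise on both channels."""
    real = amplitude + rng.gauss(0.0, sigma)
    imag = rng.gauss(0.0, sigma)
    return math.hypot(real, imag)


def mean_and_skewness(values):
    """Sample mean and (biased) sample skewness of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return mean, m3 / m2 ** 1.5


def simulate(amplitude=1.0, sigma=1.0, n=50000, seed=0):
    """Compare 'noisy minus noise-free' with 'noisy minus noisy' differences."""
    rng = random.Random(seed)
    first = [rician_sample(rng, amplitude, sigma) for _ in range(n)]
    second = [rician_sample(rng, amplitude, sigma) for _ in range(n)]
    noisy_minus_clean = [a - amplitude for a in first]
    noisy_minus_noisy = [a - b for a, b in zip(first, second)]
    return (mean_and_skewness(noisy_minus_clean),
            mean_and_skewness(noisy_minus_noisy))
```

At this low signal-to-noise ratio the first difference should show a clearly positive mean and skewness, while the difference of two identically distributed noisy values should have both close to zero, in line with the symmetry argument above.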

The distribution of temporal noise is often used in fMRI analysis to determine the threshold for the statistical test. Experiments with time series of images with synthetic noise show that if a corresponding image from a second time series (without activation) is subtracted from each image in the time series, the parameters of the (near-Gaussian) noise distribution can be determined very accurately. A final experiment concerns a time series of MR images without activation. The time series is split in two, and the fMRI signal in the first half of the time series is computed in two ways: first by subtracting the mean image of the first half from each image, and then by subtracting, each time, the corresponding image from the second half. In the second case a symmetric temporal noise distribution arises, which can be approximated better by a Gaussian curve than in the first case. It can be concluded that this computation of the fMRI signal yields a more accurate estimate of the noise than the usual computation, in which the time-series mean is subtracted from each signal.

Chapter 3 presents a generic wavelet-based denoising procedure. Hypothesis testing with statistical parametric mapping is explained, and several correction methods for multiple testing are presented. Bonferroni correction is the simplest method, but it is too conservative for noise that is spatially correlated. Correction methods based on Gaussian random fields are not desirable either, because they require heavy Gaussian smoothing (filtering the data to make them smoother), which causes details to be lost and distorts the detected active regions. Controlling the false discovery rate is a statistically more powerful method than Bonferroni correction and requires no smoothing. A combination is proposed of wavelet-based denoising and the use of the false discovery rate to correct for multiple testing during the statistical analysis. The fast wavelet transform with basis functions of symmetric, orthogonal, cubic splines is computed in the frequency domain. Noise is removed with thresholding procedures from WaveLab, which have been extended to two-dimensional (2D) images.
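The difference between Bonferroni correction and false discovery rate control can be made concrete with a short sketch. It implements the standard Benjamini-Hochberg step-up procedure next to Bonferroni correction. The function names and the default levels are assumptions for illustration, and the sketch does not reproduce the thesis's analysis pipeline.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= q * k / m
    and reject the hypotheses with the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= q * rank / m:
            k_max = rank
    rejected = [False] * m
    for index in order[:k_max]:
        rejected[index] = True
    return rejected
```

For the ten p-values 0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212 and 0.216, Bonferroni at level 0.05 rejects only the first, whereas the Benjamini-Hochberg procedure at q = 0.05 rejects the first two, which illustrates its greater statistical power.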

In experiments with noisy 2D fMRI images, two copies of a test image are given Rice-distributed noise with identical properties. Activation is added in a small region of one noisy image, after which the other noisy image is subtracted from it. The resulting fMRI images are processed with the wavelet methods to remove as much noise as possible. As a reference, Gaussian smoothing is also used to remove noise. A comparison of the signal-to-noise ratio of the various images shows that the wavelet methods remove noise more effectively than Gaussian smoothing. The wavelet methods also make the images smoother, and a comparison among the wavelet methods shows that those introducing less smoothness can reach the highest maximum signal-to-noise ratio. Experiments with a time series of 2D images show that the shape of the region where signal is detected after denoising most closely resembles the original region (in the noise-free case) for the wavelet methods that introduce little smoothness. Images of the temporal signal-to-noise ratio and images of an ANCOVA (analysis of covariance) also show fewer errors for the wavelet methods that introduce less smoothness. The validity of statistical parametric mapping after applying the wavelet methods is tested by measuring negative spatial correlations in the noise and by examining the distribution of p-values of time series without activation. A final test with a real fMRI time series shows that the active regions detected after the wavelet methods that introduce little smoothness most closely resemble the regions in the original time series. After heavy smoothing, all detected regions are elliptical and much larger than the regions detected in the original time series. The conclusion is that the number of false positives (non-active regions labelled as active) is kept in check better with preprocessing methods that introduce little smoothness. The wavelet methods that give the best results, such as InvShrink, MinMaxThresh and SUREThresh thresholding in the wavelet domain, combined with the false discovery rate to correct for multiple tests, form an attractive alternative to the combination of Gaussian smoothing and multiple-testing correction based on Gaussian random fields.
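
The general idea of thresholding in the wavelet domain can be sketched in a few lines. The example below is an illustration only: it substitutes a one-level Haar transform for the cubic spline wavelets, applies plain soft thresholding with a hand-picked threshold, and works on a 1D signal. It is not the InvShrink, MinMaxThresh or SUREThresh rule, and all function names are hypothetical.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length signal."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


def soft_threshold(coefficient, threshold):
    """Shrink a coefficient towards zero by the threshold amount."""
    return math.copysign(max(abs(coefficient) - threshold, 0.0), coefficient)


def denoise(signal, threshold):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)


# Assumed example: a step signal with small fluctuations.
noisy = [1.0, 1.2, 5.0, 4.8]
cleaned = denoise(noisy, threshold=0.5)
print(cleaned)
```

Small detail coefficients, which mostly carry noise, are set to zero, while the approximation coefficients that carry the step are left untouched.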

Chapters 4 and 5 present new methods for extracting the hemodynamic response function (HRF) from an fMRI time series using Fourier-wavelet regularised deconvolution (ForWaRD). The problems involved in extracting the HRF are discussed in chapter 4. Given the general linear model, the HRF can be obtained by deconvolution, but in general this does not give a stable result. The process is regularised by extracting the HRF in the frequency domain, followed by shrinkage of the frequency coefficients to suppress noise, and then by applying a Wiener filter in the wavelet domain to improve the result further. The fast wavelet transform is not time-invariant, so ForWaRD uses the shift-invariant wavelet transform. Experiments with simulated signals, combined with noise from a real MRI time series, demonstrate the robustness of the algorithm to changing properties of the input signals.
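
The first stage, deconvolution in the frequency domain with shrinkage of the coefficients, can be sketched as follows. The sketch assumes circular convolution, a Tikhonov-style shrinkage with a single parameter `tau`, and a naive DFT written out by hand; it omits the wavelet-domain Wiener step entirely, so it is not ForWaRD itself. The stimulus and HRF values are made up.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def regularized_deconvolve(observed, stimulus, tau):
    """Divide spectra, shrinking coefficients where the stimulus is weak."""
    Y = dft(observed)
    X = dft(stimulus)
    H = [y * x.conjugate() / (abs(x) ** 2 + tau) for y, x in zip(Y, X)]
    return [v.real for v in dft(H, inverse=True)]


# Assumed toy data: a two-event stimulus and a short response shape.
stimulus = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
hrf = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
observed = circular_convolve(stimulus, hrf)

estimate = regularized_deconvolve(observed, stimulus, tau=1e-9)
print([round(v, 6) for v in estimate])
```

With noise-free data and a very small `tau` the response is recovered almost exactly. With noisy data a larger `tau` trades bias for stability, which is the role of the shrinkage step.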

A new model for the HRF, based on a linear system with damped oscillations, is presented. The fMRI signal derived from such a system and its derivative provide a functional description of the HRF. Together with the ForWaRD-based HRF extraction method, this gives a very accurate description of the HRF.

The HRF extraction and the model are tested on a set of fMRI time series of one subject and one type of stimulus. The time series were acquired on different days; the interstimulus interval was constant during the first measurement and random (in number of scans) during the second. From both series, HRF signals were derived by weighting the time series of extracted HRF images (the HRF was computed at every voxel) with a statistical value at each point. The resulting coefficients were used in the model to give accurate descriptions of the HRF. The description obtained from each time series was then used in an ANCOVA test of the other time series to obtain a statistical map of the activation. The analyses made with these modelled HRFs were compared with analyses made with a standard HRF, and these tests show that using the modelled HRFs considerably improves the statistical analysis of the fMRI time series.

The method presented in chapter 4 is extended in chapter 5 by supporting the use of long wavelet filters, in particular orthogonal spline wavelet filters. These filters are defined in the frequency domain and are as long as the signal being filtered, so filtering in the time domain is a costly operation. Their use is facilitated by an implementation of the shift-invariant wavelet transform in the frequency domain. This transform uses a polyphase decomposition, and the new implementation performs this decomposition, and the inverse monophase reconstruction, in the frequency domain. Formulas for all these operations are given in an appendix. For long signals, this frequency-domain implementation is considerably more efficient than the original time-domain version.
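
The equivalence that makes frequency-domain filtering possible can be checked with a short sketch: circular filtering in the time domain gives the same result as multiplying the spectra. The filter and signal below are arbitrary examples, not spline wavelet filters, and the hand-written DFT is itself O(n^2), so the sketch shows correctness only. The efficiency gain for long filters depends on using a fast Fourier transform, which costs O(n log n) instead of the O(n^2) of direct filtering.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform, for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def filter_time_domain(signal, kernel):
    """Circular convolution: n multiplications for each of n outputs."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(n))
            for i in range(n)]


def filter_frequency_domain(signal, kernel):
    """Same filter applied as a pointwise product of the two spectra."""
    S = dft(signal)
    K = dft(kernel)
    product = [s * k for s, k in zip(S, K)]
    return [v.real for v in dft(product, inverse=True)]


# Assumed example: a filter that is as long as the signal it is applied to.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

in_time = filter_time_domain(signal, kernel)
in_frequency = filter_frequency_domain(signal, kernel)
max_difference = max(abs(a - b) for a, b in zip(in_time, in_frequency))
print(max_difference)
```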

The same time series as used in chapter 4 are analysed with the new frequency-domain implementation of the HRF extraction method. After modelling the HRFs, both time series are analysed with ANCOVA. The results agree with those of chapter 4. Detection of activation improves thanks to the use of orthogonal spline wavelets for the HRF extraction, particularly from the time series with randomised interstimulus times. In the time series with fixed interstimulus times, spline wavelets did not yield an improvement. For long wavelet filters such as orthogonal splines, the new frequency-domain ForWaRD implementation reduces the computation time considerably.

Acknowledgements

At the end, I would like to thank a number of people who have helped me over the past years during the making of this thesis.

First of all my supervisor, Jos Roerdink. During this project you taught me a great deal about doing scientific research. The precision with which you judged my results and your ability to remain calm and composed in every conceivable situation will stay with me for a long time. I very much enjoyed working together, and given a number of things that are still 'in the pipeline', that can fortunately continue for a while.

I thank my former supervisor Nicolai Petkov for his support at the start of the project. You were also the driving force behind the group's sailing trips (with anyone who wanted to come along at the time), a perfect way to keep up the friendly atmosphere in the group.

I thank Gert ter Horst and Rob Visser for the opportunities they have given me over the past months. Thanks to my appointment at the Neuroimaging Centrum I was able to keep working in the field after my PhD (AiO) project.

I thank Philips Medical Systems for the images about MRI in the first chapter.

I have always enjoyed working with the colleagues at the IWI and BCN; I would like to mention a few people in particular.

First, Michel Westenberg. Your office was opposite mine, which was great fun but must also have been an ordeal for you. I always went to your office first with questions about LaTeX, Linux, wavelets, making 'cool' pictures and whatnot, and of course for the many litres of coffee. All of this proved indispensable for making this thesis.

It was Simone Reinders who tipped me off about the PhD (AiO) position for this research. Via e-mail, talkers, weekday visits to the pub and even a trip to Hamburg we have stayed in touch. Hopefully we will keep doing so when I am in England!

Among the colleagues I would further like to thank Michael Wilkinson for his advice and his flood of information on all kinds of matters; Remco Renken and Sonja Tomaskovic for sharing fMRI knowledge and data (every time I took part in an fMRI study I received the dataset as a reward); Hans Hoogduin for arranging an event-related fMRI study within a week and for repeatedly correcting my work; Sybren Deelstra for co-organising two KIWI camps and a staff outing; Gerke Hoiting for the music during and after work (including a night camping out at the NS station for Bruce Springsteen tickets); and Doeke Keizer for stepping in as a test subject and for the much-needed (and long familiar) humour at work. Georgios Ouzounis introduced the concept of going to the pub after work, and calling it a 'colloquium' to attract more colleagues. Unfortunately, the breaks in the weekends were not so successful. Thanks for cheering up the place!

I also thank Doeke and Simone for their willingness to support me as paranymphs at the defence of my thesis.

There are many other friends, colleagues and acquaintances from the past four years with whom contact did not run directly through this research, but who made sure it was bearable: thank you.

Finally, I want to thank my sister Freke and her boyfriend Sipke for their support. Every PhD student sometimes forgets that there is also a life outside the thesis; thanks to you I did not lose sight of that.

Last of all, I thank my parents above all for always having been there for me.
