wavelet-based statistical analysis of fmri data


UCLA, Ivo Dinov Slide 1

Stat 233 Statistical Methods in Biomedical Imaging

Wavelet-Based Statistical Analysis of fMRI Data

Ivo Dinov

http://www.stat.ucla.edu/~dinov


fMRI Simulation movie


Problems with BOLD

How localized is the BOLD response to the site of neuronal activity?

Is the signal from draining veins rather than the tissue itself?

How is the signal change coupled to neuronal activity?

Do changes in timing of BOLD responses reliably tell us about changes in timing of neural activity?


Resting state versus active state, e.g., finger tapping, word recognition.

Whole brain scanned in ~3 seconds using a high speed imaging technique (EPI).

Perform analysis to detect regions which show a signal increase in response to the stimulus.

Activation Imaging using BOLD


fMRI Data Analysis Tools

Brain Voyager http://www.brainvoyager.de/

CARET http://brainmap.wustl.edu/resources/caretnew.html

FSL http://www.fmrib.ox.ac.uk/fsl/

SCIrun http://software.sci.utah.edu/scirun.html

LONI Pipeline http://www.loni.ucla.edu

McGill BIC www.bic.mni.mcgill.ca/usr/local/matlab5/toolbox/fmri/

AFNI http://afni.nimh.nih.gov/afni/

SPM http://www.fil.ion.ucl.ac.uk/spm/

Hypotheses vs. Data driven Approaches

Hypothesis-driven
Examples: t-tests, correlations, general linear model (GLM)
• An a priori model of activation is suggested
• Data is checked to see how closely it matches components of the model
• Most commonly used approach (e.g., SPM)

Data-driven – Independent Component Analysis (ICA)
• No prior hypotheses are necessary
• Multivariate techniques determine the patterns in the data that account for the most variance across all voxels
• Can be used to validate a model (see if the math comes up with the components you would’ve predicted)
• Can be inspected to see if there are things happening in your data that you didn’t predict
• Can be used to identify confounds (e.g., head motion)
• Need a way to organize the many possible components
• New and upcoming



A. Region-Of-Interest (ROI) approach
1. Use one or more localizer runs to find a region (e.g., show moving rings to find the middle temporal (MT) area)
2. Extract time course information from that region in separate independent runs
3. See if the trends in that region are statistically significant

The runs that are used to generate the area are independent from those used to test the hypothesis.

Example study: Tootell et al, 1995, Motion Aftereffect

Two approaches: Global vs. Regional

Localize “motion area” MT in a run comparing moving vs. stationary rings

Extract time courses from MT in subsequent runs while subjects see illusory motion (motion aftereffect)

MT


Two Approaches: Whole Brain Stats

Source: Tootell et al

B. Whole volume statistical approach
1. Make predictions about what differences you should see if your hypotheses are correct
2. Decide on statistical measures to test for predicted differences (e.g., t-tests, correlations, GLMs)
3. Determine appropriate statistical threshold
4. See if statistical estimates are significant

Statistics available
1. T-test
2. Correlation
3. Frequency-based (Fourier/Wavelet/Fractal) modeling
4. General Linear Model: an overarching statistical model that lets you perform many types of statistical analyses (incl. correlation/regression, ANOVA)


Comparing the two approaches

Region of Interest (ROI) Analyses
• Gives you more statistical power because you do not have to correct for the number of comparisons
• Hypothesis-driven
• ROI is not smeared due to intersubject averaging
• Easy to analyze and interpret
• Neglects other areas which may play a fundamental role
• Popular in North America

Whole Brain Analysis
• Requires no prior hypotheses about areas involved
• Includes entire brain
• Can lose spatial resolution with intersubject averaging
• Can produce meaningless lists of areas that are difficult to interpret
• Depends highly on statistics and threshold selected
• Popular in Europe

NOTE: Though different experimenters tend to prefer one method over the other, they are NOT mutually exclusive. You can check ROIs you predicted and then check the data for other areas.


Why do we need statistics?

MR signal intensities are random:
- variation caused by the scanner (magnet, coil, time)
- variation in neurophysiology (area, task, subject)

Rest-state scans are compared against activation-task scans in a subtraction paradigm to correct for some of these effects within the same run.

Statistical analyses help us confirm whether the eyeball tests of significance based on visual thresholding are real/correct.

Because we do so many comparisons (~10^6), we need a way to compensate for the resulting increase in Type I error.


Paradigm: Event-related design to assess between-population differences of amplitude and variance of the hemodynamic response to a visual stimulus (14 young, 14 old non-AD and 13 AD subjects)

Stimulus required a right-hand index-finger click

• 16 coronal slices
• 64x64 in-plane voxels
• volume time = 2 sec
• 3 x 3 x 5 mm^3 voxels

• 4 runs per subject
• 128 volumes per run
• 2 (randomly chosen) conditions (always 7+8)
• 1 trial = 21.44 sec (8 vol’s)
• Total 60 trials/subject

fMRI of Young, Nondemented and Demented Adults

15 Random Trials / Run

[Figure: trial timelines for Run 1 through Run 4; horizontal axis: volume number, 1 to 128 in steps of 8.]

Legend:
• blank
• 1-trial condition – 1.5 second 8 Hz B/W checkerboard flicker
• 2-trial condition – 2 visual stimuli 5.36 sec apart

Buckner et al. J. Cog. Neurosci. 2000


One Approach – Voxel-Based T-tests

Look for differences b/w One-Trial & Two-Trial activation, for a given voxel:

Measure average MR signal and SD for each volume in which 1-Trial stimuli were presented (7-8 epochs x 8 volumes/epoch = 56-64 volumes).
Measure average MR signal and SD for each volume in which 2-Trial stimuli were presented (56-64 volumes).

Is the mean difference statistically significant? Calculate the $T_0$ value. Look up the p value for that number of degrees of freedom (df >= 56 x 2 = 112), e.g., for ~112 df:
$T_0 > 1.98 \rightarrow p < 0.05$
$T_0 > 3.39 \rightarrow p < 0.001$
Repeat this process 65,536 more times (64x64x16), once for each voxel.

If p = .001 and there are 65,536 voxels, 65,536*0.001 ≈ 66 voxels could be significant purely by chance.

To look for One-Trial – Two-Trial activation, look at the negative tail of the comparison.

[Figure: fMRI signal for the 1 Trial and 2 Trial conditions, two panels (labeled F>P and P>F).]
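The arithmetic on this slide can be sketched in a few lines. The pooled-variance t statistic below is one common form of the two-sample test (the slide does not say which variant is used), the Bonferroni threshold is a standard but conservative remedy that the slide does not prescribe, and the signal values are made up for illustration.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Multiple-comparison arithmetic from the slide.
n_voxels = 64 * 64 * 16                 # one test per voxel
alpha = 0.001                           # uncorrected per-voxel threshold
expected_false_positives = n_voxels * alpha

# Bonferroni correction (an assumption here, not taken from the slide).
bonferroni_alpha = 0.05 / n_voxels

# Made-up MR signal values for a single voxel under the two conditions.
one_trial = [101.2, 100.8, 101.5, 100.9, 101.1]
two_trial = [102.0, 101.7, 102.4, 101.9, 102.2]
t0, df = two_sample_t(one_trial, two_trial)

print(f"voxels: {n_voxels}, expected false positives: {expected_false_positives:.1f}")
print(f"Bonferroni per-voxel threshold: {bonferroni_alpha:.2e}")
print(f"t0 = {t0:.2f} with {df} degrees of freedom")
```

A negative `t0` here means the 2-Trial mean exceeds the 1-Trial mean, which is the "negative tail" of the 1 Trial minus 2 Trial comparison.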



T-test: Maps

For each voxel in the brain, we can now color code a map based on the computed t and p values:

We can do this for the positive tail (1 Trial – 2 Trial)
Orange = low significance
Yellow = high significance

And we can also do this for the negative tail (2 Trial – 1 Trial)
Blue = low significance
Green = high significance

Schmutz or ESP voxels?


Another Statistical Analysis – Basic Correlation
• Find voxels whose time course is correlated with a reference function
• Can incorporate the hemodynamic response function (HRF) to predict the time course more accurately

For each voxel:
• Find the correlation between the predictor and the MR signal
• Extract the correlation (r value) and find the corresponding p value (Pearson/Spearman)
• Determine whether it is statistically significant
• Similar in spirit to a T-test

[Figure: data time course (56 data points) and reference model (56 data points, predictor values 0 and 1); scatter plot of fMRI signal value against predictor value, used to calculate r.]

Remember that $r^2$ is the proportion of variance accounted for by our predictor, e.g., if r = 0.7, $r^2$ = 0.49 (49%).
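As a rough sketch of the per-voxel correlation step, the code below computes Pearson's r between a 0/1 boxcar predictor and a voxel time course. The time course is synthetic (baseline 100, a 2-unit response, and a small deterministic wiggle standing in for noise); none of these numbers come from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Boxcar reference function: 56 points, alternating 8 volumes off / 8 volumes on.
predictor = [(t // 8) % 2 for t in range(56)]

# Synthetic voxel time course (illustrative values only).
signal = [100 + 2 * p + 0.5 * math.sin(1.7 * t) for t, p in enumerate(predictor)]

r = pearson_r(predictor, signal)
r2 = r * r   # proportion of variance accounted for by the predictor
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
```

In a whole-brain analysis the same computation would be repeated for every voxel, and the resulting r values converted to p values before thresholding.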


Predictor – Hemodynamic Response Function - HRF

% signal change = (point – baseline)/baseline, usually 0.5-3%

initial dip
- more focal and potentially a better measure
- somewhat elusive so far, not everyone can find it

time to rise: signal begins to rise soon after stimulus begins

time to peak: signal peaks 4-6 sec after stimulus begins

post-stimulus undershoot: signal suppressed after stimulation ends

[Figure: hemodynamic response curve; vertical axis: Percent Change From Baseline; horizontal axis: Time from Stimulus Onset (sec); stimulus onset marked.]


Correlation: Incorporating the HRF

We can model the expected curve of the data by convolving our predictor with the hemodynamic response function.

To find a 1-Trial responsive area, we can correlate the convolved 1-Trial predictor with each voxel time course.

[Figure: predictor m(x) (values 0 and 1) convolved with the kernel g(x) gives the model (m*g)(x) (56 model points), which is compared with the data f(x) (56 data points); scatter plot of fMRI signal value against predictor value.]
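The convolution (m*g)(x) can be sketched as follows. The kernel here is a single gamma-shaped curve chosen so that it peaks within the 4-6 sec range quoted earlier; its shape parameters are assumptions for illustration, and the post-stimulus undershoot is deliberately left out.

```python
import math

TR = 2.0  # seconds per volume (the volume time stated for this study)

def gamma_hrf(n_samples=12, tr=TR, shape=5.0, scale=1.0):
    """Assumed single-gamma HRF, sampled every `tr` seconds and normalized to unit sum."""
    times = [i * tr for i in range(n_samples)]
    values = [(t / scale) ** shape * math.exp(-t / scale) for t in times]
    total = sum(values)
    return [v / total for v in values]

def convolve(predictor, kernel):
    """Causal discrete convolution (m*g), truncated to the length of the predictor."""
    out = []
    for t in range(len(predictor)):
        acc = 0.0
        for k, g in enumerate(kernel):
            if t - k >= 0:
                acc += predictor[t - k] * g
        out.append(acc)
    return out

# Boxcar predictor m: 56 points, 8 volumes off / 8 volumes on.
predictor = [0] * 8 + [1] * 8 + [0] * 8 + [1] * 8 + [0] * 8 + [1] * 8 + [0] * 8
hrf = gamma_hrf()                  # kernel g
model = convolve(predictor, hrf)   # model (m*g)

peak_time = hrf.index(max(hrf)) * TR
print(f"HRF peaks {peak_time:.0f} s after onset; model has {len(model)} points")
```

The smoothed, delayed `model` would then replace the raw boxcar as the reference function in the correlation analysis.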


Correlations: Negative Tail

Looking for anything that shows a reduced response during 1-Trial condition (compared to both 2-Trial and fixation/rest)


Two Main fMRI Designs

Block design
• Compare long periods (e.g., 16 sec) of one condition with long periods of another
• Traditional approach
• Most statistically powerful approach
• Less dependent on how well you model the HRF (hemo)
• Habituation effects?!?

Event-related design
• Compare brief trials (1 sec) of one condition with brief trials of another
• Newer approach (1997+)
• Less statistically powerful but has some advantages
• Trials can be well-spaced to allow the MR signal to return to baseline b/w trials (e.g., 21+ sec apart) or closely spaced (e.g., every 2 sec)



Problems with T-tests and Correlation Analyses

1) How do we evaluate runs with different orders? We could average our 4 runs (60 trials) by 1-Trial or 2-Trial stimuli and then do stats comparing the functional differences between the 2 paradigms. There is no way to collapse between orders.

2) If we test more subjects, how can we evaluate all subjects together?As with the single subject runs, we could average all the subjects together (after morphing them into an atlas space) but that still means we have to run all of them in the same order.

3) We can get nice hemodynamic predictors for 1-Trial or 2-Trial stimuli, but how can we compare them accurately?

If this predictor is significant, we won’t know if it’s because

1-Trial > 2-Trial OR 1-Trial > Fixation

A Solution: Use General Linear Modeling

General Linear Model for fMRI

Adapted from Brain Voyager course slides

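Parse out variance in the voxel’s time course to the contributions of six predictors plus residual noise (what the predictors can’t account for):

$$\text{fMRI signal} = \beta_1 \times x_1 + \beta_2 \times x_2 + \beta_3 \times x_3 + \beta_4 \times x_4 + \beta_5 \times x_5 + \beta_6 \times x_6 + \text{residuals},$$

where $x_1, \ldots, x_6$ are the predictor time courses.

Design Matrix Examples:
• Stimulus
• Subject
• Run
• Trial
• Group (AD/Young)
• ROI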
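A minimal sketch of fitting such a model by ordinary least squares is shown below, using only two condition predictors plus a constant rather than six. The design, the "true" betas and the noise-free data are all made up for illustration; real analyses would also convolve the predictors with an HRF.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_glm(X, y):
    """Least-squares betas from the normal equations (X^T X) beta = X^T y, plus residuals."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]

# Hypothetical design: blocks of 8 volumes cycling rest / 1-Trial / 2-Trial.
n = 48
one_trial = [1.0 if (t // 8) % 3 == 1 else 0.0 for t in range(n)]
two_trial = [1.0 if (t // 8) % 3 == 2 else 0.0 for t in range(n)]
X = [[a, b, 1.0] for a, b in zip(one_trial, two_trial)]   # last column: constant

true_beta = [2.0, 3.5, 100.0]                              # made-up effect sizes
y = [sum(b * v for b, v in zip(true_beta, row)) for row in X]

beta, residuals = fit_glm(X, y)
contrast = beta[0] - beta[1]   # 1-Trial minus 2-Trial
print([round(b, 3) for b in beta], round(contrast, 3))
```

Because both conditions get their own beta, the contrast `beta[0] - beta[1]` separates "1-Trial vs. 2-Trial" from "1-Trial vs. rest", which a single correlation predictor cannot do.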


Completeness and Efficiency of Signal Representation – Wavelet functions 1D

SOCR Wavelet/Fourier Demo!

http://socr.stat.ucla.edu/


Mostly no global multi-scale self-similarity.

Contain both man-made and natural features

Mostly no simple and universal underlying physical or biological processes that generate the patterns in a general image.

Features of Natural Images

Ref. www.math.umn.edu/~jhshen


Fourier Representation

Claim: Harmonic waves are bad vision neurons…

Because: a typical Fourier neuron is

$$\varphi(x) = \exp(iax).$$

To process a simple bright spot $\delta(x)$ in the visual field, all such neurons have to respond to it (!) since

$$\langle \delta, \varphi \rangle = \int \delta(x)\, e^{-iax}\, dx = 1.$$


Efficiency of representation

David Field (Cornell U, Vision psychologist):

“To discriminate between objects, an effective transform (representation) encodes each image using the smallest possible number of neurons, chosen from a large pool.”



Asking our own “headtop”…

Psychologists show that visual neurons are spatially organized, and each behaves like a small sensor (receptor) that can respond strongly to spatial changes such as edge contours of objects (Fields, 1990).


Marr’s Edge Detection

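Detection of edge contours is a critical ability of human vision (Marr, 1982). Marr and Hildreth (1980) proposed a model for human detection of edges at all scales. This is Marr’s Theory of Zero-Crossings:

Gaussian kernel:

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)$$

Laplacian of the Gaussian:

$$\Delta G_\sigma = \frac{\partial^2 G_\sigma}{\partial x^2} + \frac{\partial^2 G_\sigma}{\partial y^2} = -\frac{1}{\pi\sigma^4}\left(1 - \frac{x^2+y^2}{2\sigma^2}\right)\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)$$

An edge occurs in $I$ where $\Delta G_\sigma * I = 0$. Why?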


Marr’s Edge Detection

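Laplacian of (an isotropic) Gaussian Kernel (at (0,0)) (http://www.dai.ed.ac.uk/HIPR2/flatjavasrc/)

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right), \qquad \Delta G_\sigma = \frac{\partial^2 G_\sigma}{\partial x^2} + \frac{\partial^2 G_\sigma}{\partial y^2}$$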


Laplacian of a Gaussian – Why an Edge at Zero-Crossing?

Uses the second spatial derivatives of an image. The operator response will be zero in areas where the image has a constant intensity (i.e. where the intensity gradient is zero).

Near a change in image intensity the operator response will be positive on the darker side, and negative on the lighter side.

For a sharp edge between two regions of different uniform intensities, the Laplacian of Gaussian response will be:
• zero away from the edge,
• positive to one side of the edge, negative to the other,
• zero on the edge in between.

Example: Response of a 1D Laplacian of Gaussian (σ = 3 pixels) to a 1D step edge.
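The 1D example can be reproduced with a short sketch. The kernel is the second derivative of a 1D Gaussian with σ = 3 pixels, truncated at about 4σ (the truncation radius and the edge-replication border handling are choices made here, not taken from the slide); the step edge sits between pixels 49 and 50 of a 100-pixel signal.

```python
import math

def log_kernel_1d(sigma, radius=None):
    """Second derivative of a 1D Gaussian (the 1D LoG), sampled at integer pixels."""
    if radius is None:
        radius = int(4 * sigma)          # truncate the kernel at about 4 sigma
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        kernel.append((x * x / sigma ** 4 - 1 / sigma ** 2) * g)
    return kernel

def convolve_same(signal, kernel):
    """Convolution returning len(signal) samples; borders use edge replication."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + r - j, 0), n - 1)
            acc += signal[idx] * k
        out.append(acc)
    return out

step = [0.0] * 50 + [1.0] * 50           # dark region followed by a light region
response = convolve_same(step, log_kernel_1d(sigma=3.0))

# A zero crossing is reported at i when the response changes sign between i and i+1.
zero_crossings = [i for i in range(len(response) - 1)
                  if response[i] * response[i + 1] < 0]
print(zero_crossings)
```

The response is positive just on the darker side and negative just on the lighter side, so the sign change between pixels 49 and 50 marks the edge. Because the kernel is truncated, the response on the far lighter side is only approximately zero.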



A zero crossing detector looks for places where the Laplacian changes sign. Such points often occur around edges in images.

But they also occur at places that are not as easy to associate with edges. Zero crossings always lie on closed contours.

The starting point for the zero crossing detector is an image which has been convolved with a Laplacian of Gaussian filter.

The zero crossings that result are strongly influenced by the size of the (isotropic) Gaussian, change filter-size in applet demo.

As the smoothing is increased then fewer and fewer zero crossing contours will be found, and those that do remain will correspond to features of larger and larger scale in the image.



Demos:

C:\Ivo.dir\UCLA_Classes\Applets.dir\ImageProcessing\LaplacianOfGaussian

LaplacianOfGaussianDemo.html

LaplacianDemo.html



Marr vs. Haar’s edge encoding

Marr’s edge detector uses the second derivative to locate the maxima of the first derivative (which the edge contours pass through).

The Haar basis (1909) encodes the edges into the image representation via the first derivative operator (i.e., moving average/difference):

$$(\ldots, x_{2n}, x_{2n+1}, \ldots) \;\overset{\text{Haar}}{\longleftrightarrow}\; \left(\ldots,\; a_n = \frac{x_{2n} + x_{2n+1}}{2},\; d_n = \frac{x_{2n} - x_{2n+1}}{2},\; \ldots\right)$$
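The moving average/difference encoding can be sketched directly. The function names are hypothetical and the sign convention for the difference (first sample minus second) is an assumption chosen to agree with the Haar wavelet defined later in these slides.

```python
def haar_step(x):
    """One level of Haar averaging/differencing on an even-length sequence."""
    a = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(len(x) // 2)]
    d = [(x[2 * n] - x[2 * n + 1]) / 2 for n in range(len(x) // 2)]
    return a, d

def haar_inverse_step(a, d):
    """Recover the original samples: x[2n] = a[n] + d[n], x[2n+1] = a[n] - d[n]."""
    x = []
    for an, dn in zip(a, d):
        x.extend([an + dn, an - dn])
    return x

signal = [4, 4, 4, 10, 10, 10, 10, 10]   # a single edge inside the second pair
a, d = haar_step(signal)
print(a)   # averages
print(d)   # differences: non-zero only where the edge is
print(haar_inverse_step(a, d))
```

The difference sequence is zero wherever the signal is locally constant and non-zero only at the pair that straddles the edge, which is the sense in which the encoding "respects edges".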


A good representation should respect edges

Edges are important features of images.

A good image representation (or basis) should be capable of providing the edge information easily.

Edge is a local feature. Local operators, like differentiation, must be incorporated into the representation, as in the coding by the Haar basis.

Wavelets improve Haar encoding, while respecting the above edge representation principle.


What is a good image representation?

Mathematically rigorous (i.e., a clean and stable analysis and synthesis program exists. FT & IFT…).

Having an independent fast digital formulation framework (e.g., FFT, FWT… ).

Capturing the characteristics of the input signals, and thus many existing processing operators (e.g. image indexing, image searching …) are directly permitted on such representation.

Improving statistical inference power.


Mathematics of wavelet image representation

Let Ω denote the collection of all images. What is the mathematical structure of Ω? Suppose that $f \in \Omega$ is an observed image. Then Ω should be invariant under

Euclidean motion of the scanner/imager:
$$f(x) \to f(Qx + a), \quad Q \in O(2),\ a \in R^2.$$

Brightness/Flashing:
$$f(x) \to \mu f(x), \quad \mu \in R^+,$$

or, more generally, a morphological transform:
$$f(x) \to h(f(x)), \quad h: R \to R,\ h' > 0.$$

Scaling:
$$f(x) \to f(\lambda x), \quad \lambda \in R^+.$$


E.g., Zooming in 2-D


What is zooming?

Zooming (aiming) center: a.

Zooming scale: h.

Zoom into the h-neighborhood at a in a given image I:

$$I_{a,h}(x) = I(a + h \cdot x), \quad x \in \Omega \ \text{(the visual field)};$$

equivalently, in image coordinates $y = a + h x$, the zoom shows

$$I(y) \cdot 1_{\Omega}\!\left(\frac{y - a}{h}\right), \quad \Omega \ \text{acting as the aperture}.$$

Zooming is one of the most fundamental and characteristic operators for image analysis and visual communication. It reflects the multi-scale nature of images and vision.



The zooming/scaling and wavelets

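The base function:
$$\psi(x)$$

Aiming (a) and zooming (h), in-or-out:
$$\psi_{a,h}(x) = \frac{1}{h}\,\psi\!\left(\frac{x - a}{h}\right).$$

Generating response (to centering/scaling of signals):
$$I_{a,h} = \langle I, \psi_{a,h} \rangle = \int I(x)\,\psi_{a,h}(x)\,dx.$$

The zooming space:
$$(a, h) \in R \times R^+.$$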


Good Signal representation should be adaptable

A good representation should react strongly to abrupt changes, and weakly to stable intensities

That means, for a stable image, e.g., constant, I = c, the responses $I_{a,h}$ should be approximately zero:

$$I_{a,h} = \langle I, \psi_{a,h} \rangle \approx 0.$$

This is the “differentiating” property of the encoding, e.g., d/dx:

$$\int_R \psi(x)\,dx = 0.$$


The continuous wavelet representation

Definition:

A differentiating zooming function ψ(x) is said to be a (continuous) wavelet. Representing a given image I(x) by all the scale/shift responses

$$I_{a,h} = \langle I, \psi_{a,h} \rangle$$

is the corresponding wavelet representation.

Questions:
• Is there a best wavelet ψ(x)?
• Does such a wavelet representation allow perfect image reconstruction?


Synthesizing a wavelet representation

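Goal: to recover perfectly an image signal I from its wavelet representation I(a, h).

(Continuous) wavelet synthesis (where ^ denotes the FT):

$$I(a,h) = \langle I, \psi_{a,h} \rangle = \langle \hat{I}, \hat{\psi}_{a,h} \rangle = \langle \hat{I}(\xi),\ \hat{\psi}(h\xi)\,e^{-ia\xi} \rangle,$$

which is in the form of an IFT. Let $J(h,\xi) = \hat{I}(\xi)\,\overline{\hat{\psi}(h\xi)}$; then it can be perfectly recovered via the a-FT of I(a, h). Then $\hat{I}$ can be perfectly recovered from J via

$$\hat{I}(\xi) = \int_{[0,\infty)} J(h,\xi)\,\hat{\psi}(h\xi)\,\frac{dh}{h}, \quad \text{as} \quad \int_{[0,\infty)} |\hat{\psi}(h)|^2\,\frac{dh}{h} = 1.$$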


The admissibility condition & differentiation

The admissibility condition of a continuous wavelet:

$$\int_0^\infty \frac{|\hat{\psi}(h)|^2}{h}\,dh < \infty.$$

A differentiating zooming operator satisfies the admissibility condition since:

$$\hat{\psi}(0) = \int_R \psi(x)\,dx = 0, \quad \text{and} \quad \hat{\psi}(h) = ch + o(h).$$

Examples:

The (1D Laplacian of Gaussian) Marr wavelet (Mexican-hat): second derivative of a Gaussian.

The Shannon wavelet:

$$\psi(x) = 2\,\mathrm{sinc}(2x) - \mathrm{sinc}(x), \qquad \mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t}.$$


The discrete set of zooming scales

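Make a log-linear discretization of the scale parameter h:

$$h \to j = -\log_2 h, \quad j = 0, \pm 1, \pm 2, \ldots \quad (h_j = 2^{-j})$$

Make a scale-adaptive discretization of the zooming centers:

$$\text{at scale } h_j = 2^{-j}: \quad a \to a_k = k\,h_j = k/2^j, \quad k = 0, \pm 1, \pm 2, \ldots$$

The discrete set of zooming operators:

$$\psi_{j,k}(x) = h_j^{-1/2}\,\psi\!\left(\frac{x - k h_j}{h_j}\right) = 2^{j/2}\,\psi(2^j x - k).$$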



The discrete wavelet representation

The wavelet coefficients:

$$d_{j,k} = \langle I, \psi_{j,k} \rangle = 2^{j/2} \int_R I(x)\,\psi(2^j x - k)\,dx,$$

$$d_{j,k} = 2^{-j/2}\, I_{k 2^{-j},\, 2^{-j}}, \quad \text{in terms of the continuous WT.}$$

Questions:
• Does the set of all wavelet coefficients still encode the complete information of each input image I?
• Is the set of wavelets $\{\psi_{j,k}(x) : j, k \in Z\}$ a basis? Or, more precisely, what space has these basis functions?

We don't know. But let's check out some examples...


Example 1: Haar wavelet

The Haar “aperture” function is

$$\psi^{(Haar)}(x) = I_{0 \le x < 1/2}(x) - I_{1/2 \le x < 1}(x).$$

[Figure: graph of the Haar function, equal to 1 on [0, 1/2) and -1 on [1/2, 1).]

Haar’s theorem (1909): All Haar wavelets $\{\psi^{(Haar)}_{j,k} : j \in Z,\ k \in Z\}$, together with the constant function $I_{[0,1]}(x)$, form an orthonormal basis for the Hilbert space of all square integrable functions on [0, 1].


Haar wavelets

Haar’s mother wavelet:

$$\psi^{(Haar)}(x) = I_{0 \le x < 1/2}(x) - I_{1/2 \le x < 1}(x).$$

[Figure: graph of the Haar mother wavelet, taking the values 1 and -1.]

Why an orthonormal basis?

Orthonormality:

$$\langle \psi_{j,k}, \psi_{l,m} \rangle = \begin{cases} 1, & j = l \ \&\ k = m \\ 0, & \text{otherwise.} \end{cases}$$

Completeness is due to the fact that all dyadically piecewise constant functions are dense in $L^2(0,1)$.
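The orthonormality relation can be checked numerically for the first few levels. This is only a sanity check under the normalization $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ used in these slides; the midpoint-rule inner product and the grid size are choices made here, and the rule is exact (up to rounding) for step functions whose jumps fall on the grid.

```python
def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def haar(j, k, x):
    """psi_{j,k}(x) = 2^(j/2) * psi(2^j x - k)."""
    return 2 ** (j / 2) * haar_mother(2 ** j * x - k)

def inner(f, g, n=1024):
    """Midpoint-rule approximation of the L2 inner product on [0, 1]."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n)) * h

# All Haar wavelets supported in [0, 1] for levels j = 0..3.
indices = [(j, k) for j in range(4) for k in range(2 ** j)]

gram = {}
for p in indices:
    for q in indices:
        gram[(p, q)] = inner(lambda x: haar(p[0], p[1], x),
                             lambda x: haar(q[0], q[1], x))

worst = max(abs(gram[(p, q)] - (1.0 if p == q else 0.0))
            for p in indices for q in indices)
print(f"largest deviation from the identity: {worst:.2e}")
```

Each wavelet is also orthogonal to the constant function, which is why the constant has to be added separately to obtain a basis of $L^2(0,1)$.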


Haar wavelets

Three Haar wavelets and the mean (constant) encode all the information of the piecewise constant approximation (or, the analog-to-digital transition).

I(x)

1 constant + 3 wavelets are enough.


Example 2: The Shannon wavelets

The Shannon “aperture” function is:

$$\psi^{(Shannon)}(x) = 2\,\mathrm{sinc}(2x) - \mathrm{sinc}(x), \qquad \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$

Theorem: $\{\psi^{(Shannon)}_{j,k}(x) : j, k \in Z\}$ is an orthonormal basis of $L^2(R)$.

Shannon wavelets

An orthonormal basis is visualized better in Fourier space!

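$$\psi^{(Shannon)}(x) = 2\,\mathrm{sinc}(2x) - \mathrm{sinc}(x).$$

[Figure: Fourier transform of sinc(x), supported on (−π, π), and Fourier transform of 2sinc(2x), supported on (−2π, 2π).]

Shannon proved that:
• All (−π, π) band-limited signals can be represented by sinc(x−n);
• (−2π, −π) ∪ (π, 2π) band-limited signals by $\psi_{0,n} = \psi(x - n)$;
• (−4π, −2π) ∪ (2π, 4π) band-limited signals by $\psi_{1,n} = \sqrt{2}\,\psi(2x - n)$;
• …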



Shannon wavelets

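Shannon proved that:
• All (−π, π) band-limited signals can be represented by sinc(x−n);
• (−2π, −π) ∪ (π, 2π) band-limited signals by $\psi_{0,n} = \psi(x - n)$;
• (−4π, −2π) ∪ (2π, 4π) band-limited signals by $\psi_{1,n} = \sqrt{2}\,\psi(2x - n)$;
• …

[Figure: the frequency axis ξ, with ticks at π/2, π, 2π and 4π, partitioned into dyadic bands: … , the band represented by the $\psi_{-1,n} = \frac{1}{\sqrt{2}}\,\psi(x/2 - n)$'s, the band represented by the $\psi_{0,n} = \psi(x - n)$'s, the band represented by the $\psi_{1,n} = \sqrt{2}\,\psi(2x - n)$'s, …]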


Partition of the Time-Frequency space

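Heisenberg’s uncertainty principle postulates that for each particle’s position and momentum we have:

$$\Delta t \cdot \Delta x \ge 2\pi.$$

So, to get an optimal spatial-localization, we must influence the particle’s scale or frequency content.

[Figure: tiling of the plane with Space/Time on the horizontal axis and Frequency on the vertical axis; rows of tiles correspond to sinc(x-n), sinc(2x-m) and sinc(4x-k).]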


Multiresolution analysis (MRA)

Mallat and Meyer (1986):

An (orthogonal) multiresolution of $L^2(R)$ is a chain of closed subspaces indexed by all integers:

$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$

subject to the following three conditions:

(completeness)
$$\lim_{n \to \infty} V_n = L^2(R), \qquad \lim_{n \to -\infty} V_n = \{0\}.$$

(scale similarity)
$$f(x) \in V_n \iff f(2x) \in V_{n+1}.$$

(translation seed) $V_0$ has an orthonormal basis consisting of all integral translates of a single function $\phi(x)$: $\{\phi(x - n) : n \in Z\}$.

Equations for designing MRA

The refinement/dilation equation for the father-wavelet function:

$$\phi(x) = \sqrt{2} \sum_n h_n\,\phi(2x - n), \quad \text{for a suitable set of } h_n\text{'s}.$$

This father-wavelet function is often called the scaling function or shape function.

Let $W_0$ denote the orthogonal complement of $V_0$ in $V_1$. Then $W_0$ is also orthogonally spanned by the integer translates of a single function ψ(x), the (mother) wavelet:

$$\psi(x) = \sqrt{2} \sum_n g_n\,\phi(2x - n), \quad \text{for some suitable set of } g_n\text{'s}.$$

$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$


Wavelets representation

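Theorem: $\{\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k) : j, k \in Z\}$ is an orthonormal basis for $L^2$.

Wavelet representation of a signal: approximate $I \approx I_j \in V_j$, then split repeatedly into a coarser approximation plus a detail:

$$I_j = I_{j-1} + d_{j-1}, \quad I_{j-1} \in V_{j-1},\ d_{j-1} \in W_{j-1}; \quad \ldots; \quad I_1 = I_0 + d_0, \quad I_0 \in V_0,\ d_0 \in W_0.$$

$$I_j = d_{j-1} + d_{j-2} + \cdots + d_0 + I_0$$

(wavelet detail coefficients + coarse scale)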


An example of wavelet decomposition

One level wavelet decomposition of a 1-D signal



2-Channel filter bank: Analysis bank
Forward Discrete Wavelet Transform

• H’ is the low-pass filter;
• G’ is the high-pass filter;
• ↓2 is the down-sampling operator: (1 3 4 6 5 8 7) → (1 4 5 7).

[Diagram: x → H’ → ↓2 → s (low-pass channel: scales); x → G’ → ↓2 → d (high-pass channel: details).]


2-channel filter bank: Synthesis bank
Inverse Discrete Wavelet Transform

• H is the low-pass filter;
• G is the high-pass filter;
• ↑2 is the up-sampling operator: (1 4 5 7) → (1 0 4 0 5 0 7).

[Diagram: s → ↑2 → H (low-pass channel) and d → ↑2 → G (high-pass channel); the two outputs are added to give y.]


A biorthogonal filter bank

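[Diagram: analysis bank (x → H’ → ↓2 → s; x → G’ → ↓2 → d) followed by synthesis bank (s → ↑2 → H; d → ↑2 → G; outputs added to give y).]

Biorthogonal (or perfect) filter bank: if y = x for all inputs x.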
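A small sketch of the two banks, using the orthonormal Haar filters as an assumed example, shows the perfect-reconstruction property y = x. The helper names are hypothetical; filtering is circular so that the same code would also work for longer filters, and the analysis side correlates with the filter (equivalent to convolving with its time reversal) while the synthesis side convolves.

```python
import math

def downsample(x):
    """Keep every second sample: (1 3 4 6 5 8 7) -> (1 4 5 7)."""
    return x[::2]

def upsample(x):
    """Insert a zero after every sample (this version also appends a trailing zero)."""
    out = []
    for v in x:
        out.extend([v, 0.0])
    return out

def circular_correlate(x, f):
    n = len(x)
    return [sum(f[k] * x[(i + k) % n] for k in range(len(f))) for i in range(n)]

def circular_convolve(x, f):
    n = len(x)
    return [sum(f[k] * x[(i - k) % n] for k in range(len(f))) for i in range(n)]

def analysis(x, h, g):
    """Analysis bank: filter with the time-reversed h and g, then down-sample."""
    return downsample(circular_correlate(x, h)), downsample(circular_correlate(x, g))

def synthesis(s, d, h, g):
    """Synthesis bank: up-sample, filter with h and g, then add the two channels."""
    low = circular_convolve(upsample(s), h)
    high = circular_convolve(upsample(d), g)
    return [a + b for a, b in zip(low, high)]

r = 1 / math.sqrt(2)
h = [r, r]     # Haar low-pass filter (assumed example)
g = [r, -r]    # Haar high-pass filter (assumed example)

x = [1.0, 3.0, 4.0, 6.0, 5.0, 8.0, 7.0, 2.0]
s, d = analysis(x, h, g)
y = synthesis(s, d, h, g)
print([round(v, 10) for v in y])
```

Each channel keeps only half as many samples as the input, yet the two channels together carry enough information to rebuild x exactly.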


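An orthogonal filter bank

[Diagram: analysis bank (x → H’ → ↓2 → s; x → G’ → ↓2 → d) followed by synthesis bank (s → ↑2 → H; d → ↑2 → G; outputs added to give y).]

Definition: An orthogonal filter bank is a biorthogonal filter bank in which both analysis filters H’ and G’ are the time reversals of the synthesis filters H & G: e.g., H = (1, 2, 3) → H’ = (3, 2, 1).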


The fundamental theorem of MRA

An orthogonal Mallat-Meyer MRA corresponds to an orthogonal filter bank with the synthesis filters:

$$H = (h_n : n \in Z), \qquad G = (g_n : n \in Z),$$

where the $h_n$’s and $g_n$’s are the 2-scale connection coefficients in the dilation and wavelet equations:

$$\phi(x) = \sqrt{2} \sum_n h_n\,\phi(2x - n), \qquad \psi(x) = \sqrt{2} \sum_n g_n\,\phi(2x - n).$$

And, the multiresolution wavelet decomposition of f corresponds to the iteration of the analysis bank with the φ-coefficients of f as the input digital data.


The fundamental theorem (cont’d)

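$$I_j = I_{j-1} + d_{j-1}, \quad I_j \in V_j,\ I_{j-1} \in V_{j-1},\ d_{j-1} \in W_{j-1}; \quad \ldots; \quad I_0 \in V_0,\ d_0 \in W_0.$$

$$I_j = d_{j-1} + d_{j-2} + \cdots + d_0 + I_0.$$

Suppose j = 2, and $I_2 = \sum_k c_2(k)\,\phi_{2,k}(x)$.

[Diagram: c2 → H’ → ↓2 → c1 and c2 → G’ → ↓2 → d1; then c1 → H’ → ↓2 → c0 and c1 → G’ → ↓2 → d0.]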



The Discrete Wavelet Transform – DWT

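Alternating quadrature mirror – Smooth (S) and Detail (D) filters for the Daub4 filter bank:

$$C_{N \times N} = \begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & & & & \\
c_3 & -c_2 & c_1 & -c_0 & & & & \\
 & & c_0 & c_1 & c_2 & c_3 & & \\
 & & c_3 & -c_2 & c_1 & -c_0 & & \\
 & & & & \ddots & & & \\
c_2 & c_3 & & & & & c_0 & c_1 \\
c_1 & -c_0 & & & & & c_3 & -c_2
\end{pmatrix}$$

Odd rows apply the smooth filter S, even rows the detail filter D.

Half the results of the S and D filters are discarded (decimation): N → N/2 Smooth/Low-Pass components and N/2 Detail/High-Pass components.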
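The structure of this matrix can be sketched and checked numerically. The numerical values of $c_0, \ldots, c_3$ below are the standard Daubechies-4 coefficients (they are not listed on the slide), and the wrap-around of the last two rows follows the periodic layout commonly used for this transform.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4 * math.sqrt(2.0)
# Standard Daub4 coefficients (assumed; not given on the slide).
c = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]

def daub4_matrix(n):
    """n x n matrix: smooth rows (c0 c1 c2 c3) alternate with detail rows
    (c3 -c2 c1 -c0), each pair shifted by two columns, wrapping around at the end."""
    smooth = c
    detail = [c[3], -c[2], c[1], -c[0]]
    rows = []
    for start in range(0, n, 2):
        for filt in (smooth, detail):
            row = [0.0] * n
            for k, value in enumerate(filt):
                row[(start + k) % n] += value
            rows.append(row)
    return rows

def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

N = 8
C = daub4_matrix(N)
x = [1.0, 3.0, 4.0, 6.0, 5.0, 8.0, 7.0, 2.0]
w = matvec(C, x)                    # interleaved s1 d1 s2 d2 ...
x_back = matvec(transpose(C), w)    # C is orthogonal, so its transpose inverts it

print([round(v, 4) for v in w])
print([round(v, 10) for v in x_back])
```

Because the rows are orthonormal, the transpose of C undoes the transform, and the detail rows give zero on constant input.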


DWT – Pyramidal Algorithm (forward)


Forward DWT for N = 8 data points:

Data (y): y1 y2 y3 y4 y5 y6 y7 y8
→ Apply matrix C: s1 d1 s2 d2 s3 d3 s4 d4
→ Permute elements: s1 s2 s3 s4 d1 d2 d3 d4
→ Apply matrix C (to the s’s): S1 D1 S2 D2 d1 d2 d3 d4
→ Permute elements: S1 S2 D1 D2 d1 d2 d3 d4

The d’s and D’s are the details (wavelet coefficients); the final S’s are the smooth components (“mother function” coefficients).


DWT – Pyramidal Algorithm (inverse)


Inverse DWT for N = 8 (the forward steps in reverse order; C is an orthogonal matrix, so its inverse is its transpose $C^T$):

Wavelet coefficients (w): S1 S2 D1 D2 d1 d2 d3 d4
← Permute elements: S1 D1 S2 D2 d1 d2 d3 d4
← Apply matrix $C^T$: s1 s2 s3 s4 d1 d2 d3 d4
← Permute elements: s1 d1 s2 d2 s3 d3 s4 d4
← Apply matrix $C^T$: y1 y2 y3 y4 y5 y6 y7 y8 = Data (y)
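The pyramidal scheme (apply, permute, repeat on the smooth half) can be sketched as below. To keep the example short it uses the orthonormal Haar step instead of Daub4, which is a substitution made here; any orthogonal one-level step could be plugged in. The `min_len=4` default stops with two smooth components, matching the N = 8 layout S1 S2 D1 D2 d1 d2 d3 d4.

```python
import math

R2 = math.sqrt(2.0)

def step(x):
    """One orthonormal Haar step plus the permutation: smooth parts first, details after."""
    s = [(x[i] + x[i + 1]) / R2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / R2 for i in range(0, len(x), 2)]
    return s + d

def inverse_step(w):
    """Undo `step`: re-interleave and invert the 2x2 orthogonal blocks."""
    half = len(w) // 2
    x = []
    for s, d in zip(w[:half], w[half:]):
        x.extend([(s + d) / R2, (s - d) / R2])
    return x

def dwt(y, min_len=4):
    """Forward pyramid: transform the leading n entries, then halve n (length must be a power of 2)."""
    w = list(y)
    n = len(w)
    while n >= min_len:
        w[:n] = step(w[:n])
        n //= 2
    return w

def idwt(w, min_len=4):
    """Inverse pyramid: start at the coarsest level and double n each time."""
    y = list(w)
    n = min_len
    while n <= len(y):
        y[:n] = inverse_step(y[:n])
        n *= 2
    return y

data = [1.0, 3.0, 4.0, 6.0, 5.0, 8.0, 7.0, 2.0]
coeffs = dwt(data)            # layout: S1 S2 D1 D2 d1 d2 d3 d4
restored = idwt(coeffs)
print([round(v, 4) for v in coeffs])
print([round(v, 10) for v in restored])
```

Since every step is orthogonal, the transform preserves the sum of squares of the data, and running the steps backwards recovers the input.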


Major Applications of Wavelets

• FBI fingerprints.
• JPEG2000 image compression file format.
• Image indexing and image search engines (for databanks).
• Signal modeling.
• Image denoising and restorations (optimal representation).
• Texture analysis.
• Brain Mapping and Neuroimaging.
• Algorithm speeding up based on multi-resolution representation (e.g., solving large sparse-matrix linear equ’s).
• Time series analysis.



A 2D Daubechies wavelet illustrating the compact support, fast decaying and oscillatory properties of wavelets.



Signal & its 3D WT












Wavelets

The three curves on this graph represent the original signal (HeavySine function, thin red curve), its wavelet transform (thin blue curve) and the reconstructed function estimator using only the largest 2% of the wavelet parameters. Note the space-frequency decorrelation of the original data in the wavelet-space (blue curve).
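The "keep only the largest 2% of the coefficients" reconstruction can be sketched as follows. The slide does not say which wavelet or signal length was used; this sketch assumes an orthonormal Haar transform, 1,024 samples, and the usual Donoho-Johnstone form of the HeaviSine test function, so the resulting error will not match the figure exactly.

```python
import math

R2 = math.sqrt(2.0)

def haar_dwt(y):
    """Full orthonormal Haar pyramid (length must be a power of 2)."""
    w = list(y)
    n = len(w)
    while n >= 2:
        s = [(w[i] + w[i + 1]) / R2 for i in range(0, n, 2)]
        d = [(w[i] - w[i + 1]) / R2 for i in range(0, n, 2)]
        w[:n] = s + d
        n //= 2
    return w

def haar_idwt(w):
    """Inverse of `haar_dwt`."""
    y = list(w)
    n = 2
    while n <= len(y):
        half = n // 2
        x = []
        for s, d in zip(y[:half], y[half:n]):
            x.extend([(s + d) / R2, (s - d) / R2])
        y[:n] = x
        n *= 2
    return y

def heavisine(t):
    """HeaviSine test signal (assumed form): a sinusoid with two jumps."""
    sign = lambda v: (v > 0) - (v < 0)
    return 4 * math.sin(4 * math.pi * t) - sign(t - 0.3) - sign(0.72 - t)

def keep_largest(w, fraction):
    """Zero all but the largest-magnitude `fraction` of the coefficients."""
    k = max(1, int(round(fraction * len(w))))
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(w)]

N = 1024
signal = [heavisine(i / N) for i in range(N)]
w = haar_dwt(signal)

def relative_error(fraction):
    approx = haar_idwt(keep_largest(w, fraction))
    num = sum((a - b) ** 2 for a, b in zip(signal, approx))
    return math.sqrt(num / sum(a * a for a in signal))

print(f"relative L2 error keeping 2% of coefficients: {relative_error(0.02):.3f}")
```

With an orthonormal transform the squared reconstruction error equals the sum of the squared discarded coefficients, which is why discarding the small ones costs little.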


Wavelets

If Time Permits:C:\Ivo.dir\Research\movies\WaveletMovie.mpg

Also available online at the Wavelet Analysis of Image Registration site:http://www.loni.ucla.edu/~dinov/WAIR.html


Atlas–Based Wavelet–Space Stat Analysis

Design protocol for the wavelet-based construction and utilization of the fMRI functional atlas (F&A SVPA). The construction of the anatomical probabilistic atlas is augmented by including the regional distributions of the wavelet coefficients of the functional signals for one population. Statistical analysis of functional variability between a new fMRI volume and the F&A SVPA atlas is assessed following wavelet-space shrinkage.



Distribution of the wavelet coefficients

Displayed is the frequency histogram of the wavelet coefficients at 1,000 randomly selected locations (random indices of $w_{j,k}$), averaged across all 578 MRI volumes that are part of the ICBM database [Mazziotta et al., 1995]. Notably, most of the wavelet coefficients are near the origin, with some having sporadic, but large magnitudes. Heavy-tail distribution models will be appropriate for these data.

-5 0 5


Distr’n of the wavelet coefficients – 3 volumes

Shown here are the frequency distributions of the wavelet coefficients for three separate MRI volumes (randomly selected from the pool of 578). There is little variation between these three individuals; however, the overall shape of each individual's distribution of wavelet-coefficient magnitudes across the 1,000 random locations is more regular (though still heavy-tailed) than the averaged distribution across subjects depicted in the previous figure.


Wavelet Coefficient distributions MRI data

This image illustrates a portion of all of the individual distributions of the wavelet parameters for the 578 ICBM MRI volumes. The horizontal axis represents the magnitude of the wavelet coefficients in the range [-4 ; 4], the vertical axis labels the subject index and the row color map indicates the frequencies of occurrence of a wavelet coefficient of certain magnitude across all 1,000 voxels. Bright colors indicate high, and dark colors represent low, frequencies.

[Figure label: Voxel Location Distribution Across 578 Subjects]


fMRI data – 1 subject wavelet coefficient distributions

This diagram depicts the frequency histogram of the average magnitude of a wavelet coefficient, across all 128 fMRI 3D time-volumes (part of the fMRI study of Buckner and colleagues [Buckner et al., 2000]) at 1,000 randomly selected locations (random indices of $w_{j,k}$). Again, we observe the heavy-tailedness of the data.


Wavelet coefficient Distribution – 1 run

A 2D image displaying all of the individual distributions for each of the 128 3D time points (1 run, 15 trials) of an fMRI time series. The horizontal axis is the magnitude of the wavelet coefficients and the vertical axis labels the time point of the fMRI epoch.

Colors indicate the frequencies of occurrence of a wavelet coefficient of certain magnitude across all 1,000 voxel locations. The across-row average is effectively shown in the previous figure.

[Figure axes: horizontal, magnitude of wavelet coefficients (-4 to 4); vertical, time (1 to 128); color, frequency.]


Distribution of 4D wavelet coefficients

This graph shows the frequency distribution of the wavelet coefficients of the complete 4D WT of the fMRI timeseries. Note the slow asymptotic decay of the tails of this distribution (data-size: 64x64y16z128t, floating point). Here the x-axis represents the magnitude of the wavelet coefficients and the vertical axis indicates the frequency of occurrence of a wavelet coefficient of the given magnitude within the 4D dataset. Sporadic behavior of the large-magnitude wavelet coefficients is clearly visible at the tails of this empirical distribution.

-10 0 10



Distribution of (and Models for) the Wavelet Coefficients of the 4D fMRI volume

Several heavy-tail distribution models are fitted to the frequency histogram of 4D wavelet coefficients. These include double-exponential, Cauchy, T, double-Pareto and Bessel K form models. Because our statistical tests will be applied on the coefficients that survive wavelet shrinkage it is important to utilize a distribution model that provides accurate asymptotic approximation to the real data in the tail regions.


Right Tail of Distribution of (and Models for) the Wavelet Coefficients of the 4D fMRI volume

Shows the extreme left tail of the data distribution (the data are symmetric). The double-exponential and the T distribution models underestimate the tails of the data asymptotically, but provide good fits around the mean. Cauchy, Bessel K forms and double-Pareto distribution models, in that order, provide increasingly heavier tails, with Cauchy being the most likely candidate for the best fit to the observed wavelet coefficients across the entire range. The double-Pareto and the Bessel K form densities provide the heaviest tails; however, they are inadequate in the central range [-3 : 3] and undefined near the mean of zero.


Wavelet-domain Atlas-based fMRI Analysis

• Anatomical Sub-Volume Probabilistic Atlas (SVPA)

• Analysis using conventional time-domain SVT (sub-volume thresholding). Subtraction paradigm: AD - Elderly NC.

[Figure: statistical map with color scale from 0 to 20.]


Wavelet-domain Atlas-based fMRI Analysis

• Raw results following analysis in the wavelet-domain (has ringing effects outside of the true ROI where the activation supposedly occurred).

• Post-processed wavelet domain statistics [final uniform thresholding (top 1%), reconstruction of all wavelet-space stats into a single volume – regional calculations]:


Bayesian Mixture Models for fMRI Analysis

• We use two-component mixture prior distributions on the wavelet coefficients $\theta_{j,k}$ with

$$\theta_{j,k} \mid \pi_j, \tau_j \sim \pi_j\, N(0, \tau_j^2) + (1 - \pi_j)\,\delta(0),$$

where $\pi_j$ is a proportion between 0 and 1, δ(0) is the Dirac point mass at zero and $\tau_j > 0$. In other words, there is a level-dependent positive probability $1 - \pi_j$ a priori that each wavelet coefficient will be exactly zero. If not, the coefficient will be normally distributed with mean zero and a level-specific standard deviation $\tau_j$.


Bayesian Mixture Models for fMRI Analysis

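• Given the observed data over an ROI, y = f + ε, the corresponding wavelet representation w = θ + z (where W is the discrete WT, w = Wy, θ = Wf and z = Wε), and the above prior distribution for the true wavelet coefficients, the posterior distributions of $\theta_{j,k}$ are again independent two-component mixtures:

$$p(\theta_{j,k} \mid w_{j,k}) \sim \lambda_{j,k}\, N\!\left(\frac{\tau_j^2}{\sigma^2 + \tau_j^2}\, w_{j,k},\ \frac{\sigma^2 \tau_j^2}{\sigma^2 + \tau_j^2}\right) + (1 - \lambda_{j,k})\,\delta(0),$$

where $\lambda_{j,k} = 1/(1 + \rho_{j,k})$, $\sigma^2$ denotes the noise variance, and the posterior odds that $\theta_{j,k}$ is exactly zero are:

$$\rho_{j,k} = \frac{1 - \pi_j}{\pi_j}\, \frac{\sqrt{\tau_j^2 + \sigma^2}}{\sigma}\, \exp\!\left(-\frac{\tau_j^2\, w_{j,k}^2}{2\sigma^2(\tau_j^2 + \sigma^2)}\right).$$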
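The posterior quantities can be evaluated coefficient by coefficient, as in the sketch below. It assumes the standard form of this model: a prior $\pi_j N(0, \tau_j^2) + (1 - \pi_j)\delta(0)$ and Gaussian noise with variance $\sigma^2$. The hyperparameter values are made up, and the posterior mean is used as the shrinkage rule purely for illustration (the slides do not say which posterior summary is used).

```python
import math

def posterior_odds_zero(w, pi_j, tau_j, sigma):
    """rho_{j,k}: posterior odds that the true coefficient is exactly zero."""
    t2, s2 = tau_j ** 2, sigma ** 2
    return ((1 - pi_j) / pi_j) * (math.sqrt(t2 + s2) / sigma) * \
        math.exp(-t2 * w * w / (2 * s2 * (t2 + s2)))

def posterior_weight(w, pi_j, tau_j, sigma):
    """lambda_{j,k} = 1 / (1 + rho_{j,k}): posterior probability of a non-zero coefficient."""
    return 1.0 / (1.0 + posterior_odds_zero(w, pi_j, tau_j, sigma))

def posterior_mean(w, pi_j, tau_j, sigma):
    """Mean of the posterior mixture: lambda * tau^2 / (sigma^2 + tau^2) * w."""
    shrink = tau_j ** 2 / (sigma ** 2 + tau_j ** 2)
    return posterior_weight(w, pi_j, tau_j, sigma) * shrink * w

# Made-up hyperparameters for one resolution level.
pi_j, tau_j, sigma = 0.5, 1.0, 1.0
for w in (0.0, 0.5, 2.0, 10.0):
    print(f"w = {w:5.1f}  lambda = {posterior_weight(w, pi_j, tau_j, sigma):.4f}  "
          f"posterior mean = {posterior_mean(w, pi_j, tau_j, sigma):.4f}")
```

Small observed coefficients are pulled strongly towards zero, while large ones are shrunk only by the linear factor $\tau_j^2/(\sigma^2 + \tau_j^2)$, which is the adaptive behavior wanted from wavelet-space shrinkage.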



Completeness and Efficiency of Signal Representation


Completeness and Efficiency of Signal Representation - Functional Bases






Completeness and Efficiency of Signal Representation –Why use Wavelet functions?

IWT(Uniform 5%)

HeavySine


Wavelet-space Shrinkage











Stereotactic data alignment - warping

Data Target

Warp 1 Warp 2 Warp 3


Wavelet-space Warp Classification

Five PET (left) and corresponding MRI (right) data volumes (left columns), their LS, MI and Affine warps (columns 2-4) and the target (right column).

Which one is the best warp?



Wavelet-space Warp Classification

Spread Group Classification (SGC) Analysis of PET data-separation of saccade PET frequency clouds


Wavelet-space Warp Classification

Cluster Group Classification (CGC) Analysis of MRI data: how close together are all 5 MRI under each warp strategy?

Smaller classifying values, better warps.

[Figure: classifying values for the warp strategies, with ranks shown as 1, 4, 2, 3.]
