wavelet-based statistical analysis of fmri data
TRANSCRIPT
1
UCLA, Ivo Dinov Slide 1
Stat 233 Statistical Methods in Biomedical Imaging
Wavelet-Based Statistical Analysis of fMRI Data
Ivo Dinov
http://www.stat.ucla.edu/~dinov
UCLA, Ivo DinovSlide 3
fMRI Simulation movie
UCLA, Ivo DinovSlide 4
Problems with BOLD
How localized is the BOLD response to the site of neuronal activity?
Is the signal from draining veins rather than the tissue itself?
How is the signal change coupled to neuronal activity?
Do changes in timing of BOLD responses reliably tell us about changes in timing of neural activity?
UCLA, Ivo DinovSlide 5
Resting state versus Active statee.g. Finger tapping, word recognition.
Whole brain scanned in ~3 seconds using a high speed imaging technique (EPI).
Perform analysis to detect regions which show a signal increase in response to the stimulus.
Activation Imaging using BOLD
UCLA, Ivo DinovSlide 6
fMRI Data Analysis Tools
Brain Voyager http://www.brainvoyager.de/
CARET http://brainmap.wustl.edu/resources/caretnew.html
FSL http://www.fmrib.ox.ac.uk/fsl/
SCIrun http://software.sci.utah.edu/scirun.html
LONI Pipeline http://www.loni.ucla.edu
McGill BIC www.bic.mni.mcgill.ca/usr/local/matlab5/toolbox/fmri/
AFNI http://afni.nimh.nih.gov/afni/
SPM http://www.fil.ion.ucl.ac.uk/spm/UCLA, Ivo DinovSlide 7
Hypotheses vs. Data driven Approaches
Hypothesis-drivenExamples: t-tests, correlations, general linear model (GLM)
• a priori model of activation is suggested• data is checked to see how closely it matches components of the model• most commonly used approach (e.g., SPM)Data-driven – Independent Component Analysis (ICA)• No prior hypotheses are necessary• Multivariate techniques determine the patterns in the data that account
for the most variance across all voxels• Can be used to validate a model (see if the math comes up with the
components you would’ve predicted)• Can be inspected to see if there are things happening in your data that
you didn’t predict• Can be used to identify confounds (e.g., head motion)• Need a way to organize the many possible components • New and upcoming
2
UCLA, Ivo DinovSlide 8
A. Region-Of-Interest (ROI) approach1. A localizer run(s) to find a region (e.g., show moving rings to find medial temporal MT area)2. Extract time course information from that region in separate independent runs3. See if the trends in that region are statistically significant
The runs that are used to generate the area are independent from those used to test the hypothesis.
Example study: Tootell et al, 1995, Motion Aftereffect
Two approaches: Global vs. Regional
Localize “motion area” MT in a run comparing moving vs. stationary rings
Extract time courses from MT in subsequent runs while subjects see illusory motion (motion aftereffect)
MT
UCLA, Ivo DinovSlide 9
Two Approaches: Whole Brain Stats
Source: Tootell et al
B. Whole volume statistical approach1. Make predictions about what differences you should see if
your hypotheses are correct2. Decide on statistical measures to test for predicted
differences (e.g., t-tests, correlations, GLMs)3. Determine appropriate statistical threshold4. See if statistical estimates are significant
Statistics available1. T-test2. Correlation3. Frequency-Based (Fourier/Wavelet/Fractal) modeling4. General Linear Model -overarching statistical model that lets you perform many types
of statistical analyses (incl. correlation/regression, ANOVA)
UCLA, Ivo DinovSlide 10
Comparing the two approaches
Region of Interest (ROI) Analyses• Gives you more statistical power because you do not have to correct
for the number of comparisons• Hypothesis-driven• ROI is not smeared due to intersubject averaging• Easy to analyze and interpret• Neglects other areas which may play a fundamental role• Popular in North America
Whole Brain Analysis• Requires no prior hypotheses about areas involved• Includes entire brain• Can lose spatial resolution with intersubject averaging• Can produce meaningless lists of areas that are difficult to interpret• Depends highly on statistics and threshold selected• Popular in EuropeNOTE: Though different experimenters tend to prefer one method over the other,
they are NOT mutually exclusive. You can check ROIs you predicted and then check the data for other areas.
UCLA, Ivo DinovSlide 11
Why do we need statistics?MR Signal intensities are random-variation caused by scanner (magnet, coil, time)-variation in neurophysiology (area, task, subject)
The rest-state scans used against activation tasks in subtraction paradigm setting to correct for some of these effects within the same run.
Statistical analyses help us confirm whether the eyeball tests of significance based on visual thresholding are real/correct.
Because we do so many comparisons (~106), we need a way to compensate (increase of Type I error).
UCLA, Ivo DinovSlide 12
Paradigm: Event-related design to assess between-population differences of amplitude and variance of hemodynamicresponse to visual stimulus (14 young, 14 old noAD and 13 AD subjects)
Stimulus required a righthand index-finger click
16 coronal slices64x64 in-plane voxelsvolume time = 2 sec3 x 3 x 5 mm3 voxels
4 runs per Subject128 volumes per run 2 (randomly chosen)conditions (always 7+8)1 Trial = 21.44 sec (8 vol’s)Total 60 Trials/subject
fMRI of Young, Nondemented and Demented Adults
15 Random Trials / Run
1 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120128
1 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120128
1 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120128
Run 1
Run 2
Run 3
1 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120128Run 4
blank1-trial condition–1.5 second 8Hz B/W checkerboard flicker
2-trial condition – 2 visual stimuli 5.36 sec apart
Buckner et al. J. Cog. Neurosci. 2000
UCLA, Ivo DinovSlide 13
One Approach – Voxel-Based T-testsLook for differences b/w One-Trial & Two-Trial Activation, for a given voxel:
Measure average MR signal and SD for each volume in which 1-Trial stimuli were presented (7-8 epochs x 8 volumes/epoch = 56-64 volumes).Measure average MR signal and SD for each volume in which 2-Trial stimuli were presented (56-64 volumes).
If p = .001 and 65,536 voxels, 65,536*0.001 = 66 voxels could be significant purely by chance
Stat-Significant Mean difference? Calculate To value. Look up p valuefor that number of degrees of freedom (df >= 56 x 2 = 112). e.g., For ~112 df To>1.98 → p <0.05 To > 3.39 → p < 0.001Repeat this process 65,536 more times(64x64x16), once for each voxel.
To look for OneTrial – TwoTrialActivation, look at the negative tail of the comparison
F>PP>F fMR
ISignal
2 Trial1 Trial 2 Trial1 Trial
3
UCLA, Ivo DinovSlide 14
T-test: MapsFor each voxel in the brain, we can now color code a map based
on the computed t and p values:
And we can also do this for the negative tail (2 Trial – 1 Trial)Blue = low significanceGreen = high significance
We can do this for the positivetail (1 Trial – 2 Trial)Orange = low significanceYellow = high significanceSchmutz or ESP voxels?
UCLA, Ivo DinovSlide 15
Another Statistical Analysis – Basic Correlation• voxels with time course correlated with a reference function• can incorporate hemodynamic response function (HRF) to predict time course more accurately
For each voxel:• Find the correlation between the predictor and the MR signal• Extract the correlation (r value) and find the corresponding p value (Pearson/Spearman).• Determine whether it is statistically significant• Similar in spirit to a T-test.
56 data points
56 data points
Data 0 1 Ref.ModelValue of Predictor
Valu
e of
fMR
I Sig
nal
Calculate r
Remember r2 is the proportion of variance accounted for by our predictor, e.g., if r=0.7, r2=0.49 49%
UCLA, Ivo DinovSlide 16
Predictor – Hemodynamic Response Function - HRF
% signal change= (point – baseline)/baselineusually 0.5-3%
initial dip-more focal and potentially a better measure-somewhat elusive so far, not everyone can find it
Perc
ent C
hang
e Fr
om B
asel
ine
StimulusOn
Time from StimulusOnset (sec)
time to risesignal begins to rise soon after stimulus begins
time to peaksignal peaks 4-6 sec after stimulus begins
post stimulus undershoot
signal suppressed after stimulation ends
UCLA, Ivo DinovSlide 17
Correlation: Incorporating the HRFWe can model the expected curve of the data by convolving our predictor with the hemodynamic response function.
To find a 1-Trial responsive area, we can correlate the convolved face predictor with each voxel time course
Data 0 1 ModelValue of Predictor
56 data points
56 Model points
Valu
e of
fMR
I Sig
nal
g(x)KernelPredictor
m(x)
Model(m*g)(x)
Dataf(x)
UCLA, Ivo DinovSlide 18
Correlations: Negative Tail
Looking for anything that shows a reduced response during 1-Trial condition (compared to both 2-Trial and fixation/rest)
UCLA, Ivo DinovSlide 19
Two Main fMRI Designs
Block design• Compare long periods
(e.g., 16 sec) of one condition with long periods of another
• Traditional approach• Most statistically
powerful approach• Less dependent on how
well you model the HRF (hemo)
• Habituation effects?!?
Event-related design• Compare brief trials (1 sec)
of 1 condition with brief of another
• Newer approach (1997+) • Less statistically powerful
but has some advantages• Trials can be well-spaced to
allow the MR signal to return to baseline b/w trials (e.g., 21+ sec apart) or closely spaced (e.g., every 2 sec)
4
UCLA, Ivo DinovSlide 20
Problems with T-tests and Correlation Analyses
1) How do we evaluate runs with different orders?We could average our 4 runs (60 trials) by 1-Trial of 2-Trial stimuli and then do stats comparing the functional differences between the 2 paradigms. There is no way to collapse between orders.
2) If we test more subjects, how can we evaluate all subjects together?As with the single subject runs, we could average all the subjects together (after morphing them into an atlas space) but that still means we have to run all of them in the same order.
3) We can get nice hemodynamic predictors for 1-Trial or 2-Trial stimuli, but how can we compare them accurately?
If this predictor is significant, we won’t know if it’s because
1-Trial > 2-Trial OR 1-Trial > Fixation
A Solution: Use General Linear ModelingUCLA, Ivo DinovSlide 21
General Linear Model for fMRI
Adapted from Brain Voyager course slides
Parse out variance in the voxel’s time course to the contributions of six predictors plus residual noise (what the predictors can’t account for).
+
fMRI signal residuals
+
β1 ×
β2 ×
=
β6 ×
…
+
+
+
+
β3 ×
β4 ×
β5 ×
Design Matrix Examples:Stimulus
Subject
Run
Trial
Group (AD/Young)
ROI
UCLA, Ivo DinovSlide 22
Completeness and Efficiency of Signal Representation – Wavelet functions 1D
SOCR Wavelet/Fourier Demo!
http://socr.stat.ucla.edu/
UCLA, Ivo DinovSlide 23
Mostly no global multi-scale self-similarity.
Contain both man-made and natural features
Mostly no simple and universal underlying physical or biological processes that generate the patterns in a general image.
Features of Natural Images
Ref. www.math.umn.edu/~jhshen
UCLA, Ivo DinovSlide 24
Fourier Representation
Claim: Harmonic waves are bad vision neurons…
Because.A typical Fourier neuron is
To process a simple bright spot in the visual field,
all such neurons have to respond to it (!) since
).exp(iax=φ
)(xδ
.1)(, )( == ∫ − dxex xi πδφδ
UCLA, Ivo DinovSlide 25
Efficiency of representation
David Field (Cornell U, Vision psychologist):
“To discriminate between objects, an effective transform (representation) encodes each image using the smallestpossible number of neurons, chosen from a large pool.”
5
UCLA, Ivo DinovSlide 26
Asking our own “headtop”…
Psychologists show that visual neurons are spatiallyorganized, and each behaves like a small sensor (receptor) that can respond strongly to spatial changes such as edge contours of objects (Fields, 1990).
UCLA, Ivo DinovSlide 27
Marr’s Edge Detection
Detection of edge contours is a critical ability of human vision(Marr, 1982). Marr and Hildreth (1980) proposed a model for human detection of edges at all scales. This is Marr’s Theory of Zero-Crossings:
Why? IG( whereI in occurs Edge
yxyxyG
xGG
yxG
.0)
,2
exp2
11
:isGaussian theofLaplacian
KernelGaussian,2
exp2
1
2
22
2
22
4
2
2
2
2
2
22
2
=∗∆
+−
+−−=
=∂
∂+∂
∂=∆
+−=
σ
σσσ
σ
σσπσ
σπσ
UCLA, Ivo DinovSlide 28
Marr’s Edge Detection
Laplacian of (an isotropic) Gaussian Kernel (at (0,0)) (http://www.dai.ed.ac.uk/HIPR2/flatjavasrc/)
+−= 2
22
2 2exp
21
σπσσ
yxG 22
22
yG
xGG
∂∂+
∂∂=∆ σσσ
UCLA, Ivo DinovSlide 29
Laplacian of a Gaussian –Why an Edge at Zero-Crossing?
Uses the second spatial derivatives of an image. The operator response will be zero in areas where the image has a constant intensity (i.e. where the intensity gradient is zero).
Near a change in image intensity the operator response will be positive on the darker side, and negative on the lighter side.
A sharp edge between two regions of different uniform intensities, the Laplacian of Gaussians response will be:
zero away from the edge, positive to one side of the edge, negative to the otherzero on the edge in b/w.
Example:Response of 1D Laplacian of Gaussian (σ = 3 pixels) to a 1D step edge.
UCLA, Ivo DinovSlide 30
Laplacian of a Gaussian –Why an Edge at Zero-Crossing?
A zero crossing detector looks for places where the Laplacianchanges sign. Such points often occur around edges in images.
But they also occur at places that are not as easy to associate with edges. Zero crossings always lie on closed contours.
The starting point for the zero crossing detector is an image which has been convolved with a Laplacian of Gaussian filter.
The zero crossings that result are strongly influenced by the size of the (isotropic) Gaussian, change filter-size in applet demo.
As the smoothing is increased then fewer and fewer zero crossing contours will be found, and those that do remain will correspond to features of larger and larger scale in the image.
UCLA, Ivo DinovSlide 31
Laplacian of a Gaussian –Why an Edge at Zero-Crossing?
Demos:
C:\Ivo.dir\UCLA_Classes\Applets.dir\ImageProcessing\LaplacianOfGaussian
LaplacianOfGaussianDemo.html
LaplacianDemo.html
6
UCLA, Ivo DinovSlide 32
Marr vs. Haar’s edge encoding
Marr’s edge detector is to use second derivative to locate the maxima of the first derivative (which the edge contours pass through).
Haar Basis (1909) encodes the edges into image representation via the first derivative operator (i.e. moving average/difference):
( )
−=+= →← ++
+,...
2,
2...,...,... 122122
122nn
nnn
nHaar
nnxxxx daxx
UCLA, Ivo DinovSlide 33
A good representation should respect edges
Edges are important features of images.
A good image representation (or basis) should be capable of providing the edge information easily.
Edge is a local feature. Local operators, like differentiation, must be incorporated into the representation, as in the coding by the Haar basis.
Wavelets improve Haar encoding, while respecting the above edge representation principle.
UCLA, Ivo DinovSlide 34
What is a good image representation?
Mathematically rigorous (i.e. a clean and stable analysisand synthesis program exists. FT & IFT…).
Having an independent fast digital formulation framework (e.g., FFT, FWT… ).
Capturing the characteristics of the input signals, and thus many existing processing operators (e.g. image indexing, image searching …) are directly permitted on such representation.
Improving statistical inference power.
UCLA, Ivo DinovSlide 35
Mathematics of wavelet image representation
Let Ω denote the collection of all images. What is the mathematical structure of Ω ? Suppose that is an observed image. Then Ω should be invariant under
Euclidean motion of the scanner/imager:
Brightness/Flashing:
or, more generally, a morphological transform –
Scaling:
Ω∈f
.),2(),()( 2RaOQaQxfxf ∈∈+→
,),()( +∈→ Rxfxf µµ
.0',:)),(()( >→→ hRRhxfhxf
.),()( +∈→ Rxfxf λλ
UCLA, Ivo DinovSlide 36
E.g., Zooming in 2-D
UCLA, Ivo DinovSlide 37
What is zooming?Zooming (aiming) center: a.
Zooming scale: h.
Zoom into the h-neighborhood at a in a given image I:
Zooming is one of the most fundamental and characteristic operators for image analysis and visual communication. It reflects the multi-scale nature of images and vision.
( )aperture. the yII
field; visual the x xhaIxI
hay
hay
ha
ha,1)(
,,)(
,
,
⋅=
Ω∈⋅+=−
Ω−
7
UCLA, Ivo DinovSlide 38
The zooming/scaling and wavelets
The base function:
Aiming (a) and zooming (h), in-or-out:
Generating response (to centering/scaling of signals):
The zooming space:
)( xψ
).()( 1, h
axhha x −= ψψ
.)()(, ,,, dxxxIII hahaha ψψ ∫==
.),( +×∈ RRha
UCLA, Ivo DinovSlide 39
Good Signal representation should be adaptable
A good representation should react strongly to abrupt changes, and weakly to stable intensities
That means, for a stable image, e.g., constant, I=c, the responses Ia , h should be approximately zeros:
This is the “differentiating” property of the encoding, e.g., d/dx
.0)(∫ =R
xψ
0, ,, ≈= haha II ψ
UCLA, Ivo DinovSlide 40
The continuous wavelet representation
DefinitionDefinition:
A differentiating zooming function ψ(x) is said to be a (continuous) wavelet. Representing a given image I(x) by all the scale/shift responses
is the corresponding wavelet representation.
Questions:Questions:Is there a best wavelet ψ(x) ?Does such wavelet representation allow perfect image reconstruction?
haha II ,, ,ψ=
UCLA, Ivo DinovSlide 41
Synthesizing a wavelet representation
Goal:Goal: to recover perfectly an image signal I from its wavelet representation I(a , h).
(Continuous) Wavelet synthesis (where ^ denotes the FT ):
which is in the form of IFT. Let then it can be perfectly recovered via the a-FT of I(a , h).Then can be perfectly recovered from J via
, ),(,,),( ,,ξξψψψ ia
haha ehIIIhaI −∧∧∧∧
===
)(),( ξψξ hIhJ∧∧
=
∧I
.1 as,)(),()(),0[),0[
2)(
== ∫∫ ∞∞
∧∧
∧
dh h
dh h
hhJIhψ
ξψξξ
UCLA, Ivo DinovSlide 42
The admissibility condition & differentiation
The admissibility condition of a continuous wavelet:
A differentiating zooming operator satisfies the admissibility condition since:
Examples:The (1D Laplacian of Gaussian) Marr wavelet (Mexican-hat): second derivative of Gaussian.
The Shannon wavelet:
. /|)(| 2
0∞<
∧∞
∫ dhhhψ
).()( and ,0)()0( hochhdxxR
+===∧∧
∫ ψψψ
t t t
x sincx sincx
ππ
ψ)sin()(sinc
).()2(2)(=
−=
UCLA, Ivo DinovSlide 43
The discrete set of zooming scales
Make a log-linear discretization to the scale parameter h:
Make a scale-adaptive discretization of the zooming centers:
The discrete set of zooming operators:
.,2,1,0log 2 L±±=−=→ jhjj
.,2 ,1 ,0,2/ : 2 scaleat
L±±=
==→= −
kkkhakh j
jkj
j
).2(2)()( 2/1, kxx jj
hkhx
hkj j
j
j−== − ψψψ
8
UCLA, Ivo DinovSlide 44
The discrete wavelet representation
The wavelet coefficients:
Questions:Questions:Does the set of all wavelet coefficients still encode the complete information of each input image I ? Is the set of wavelets a basis?Or more precisely what space has these basis functions?
We don't know. But let's check out some examples...
WT.continuous theof in terms ,
.)2()(2,
2,2,
2/,,
jj kkj
j
R
jkjkj
Id
dxkxxIId
−−=
−== ∫ ψψ
,:)( , Zkjxkj ∈ψ
UCLA, Ivo DinovSlide 45
Example 1: Haar wavelet
The Haar “aperture” function is
Haar’s theorem (1905):All Haar wavelets together with the constant function I[0;1](x), form an orthonormal basis for the Hilbert space of all square integrable functions on [0, 1].
).()()( 121
210 xI xIx xx
(Harr)<≤<≤ −=ψ
1
-1
11/2
ZkZj (Haar)kj ∈∈ ;|,ψ
UCLA, Ivo DinovSlide 46
Haar waveletsHaar’s mother wavelet:
Why orthonormal basis?
.Completeness is due to the fact that:All All dyadicallydyadically piecewise constant functions are dense in piecewise constant functions are dense in LL22 (0,1).(0,1).
1
-1
).()()( 121
210 xI xIx xx
(Harr)<≤<≤ −=ψ
otherwise mklj
mlkj ,0&,1; ,,
===ψψ
UCLA, Ivo DinovSlide 47
Haar wavelets
Three Haar wavelets and the mean (constant) encode all the information of the piecewise constant approximation (or, the analog-to-digital transition).
I(x)
1 constant + 3 wavelets are enough.
UCLA, Ivo DinovSlide 48
Example 2: The Shannon wavelets
The Shannon’s “aperture” function is:
TheoremTheorem:
is an orthonormal basis of L2(R).,:)( Zkjx(Shannon)
j,k ∈ψ
x x x sinc
x( sincx sincx(Shannon)
ππ
ψ)sin()(
),)2(2)(
=
−=
UCLA, Ivo DinovSlide 49
Shannon wavelets
An orthonormal basis is visualized better in Fourier space!
Shannon proved that:All (−π, π) bandlimited signals can be represented by sinc(x-n)…(−2π, π ) U (π, 2π) band-limited by (−4π, 2π ) U (2π, 4π) band-limited by
).)2(2)( x( sincx sincx(Shannon)
−=ψ
π– π
Fourier of 2sinc(2x)Fourier of sinc(x)
– 2 π 2 π
)2(2)(
,1,0
nxnx
nn
−=−
ψψψ
9
UCLA, Ivo DinovSlide 50
Shannon wavelets
Shannon proved that:All (−π, π) band-limited signals can be represented by sinc(x-n)(−2π, π ) U (π, 2π) band-limited by (−4π, 2π ) U (2π, 4π) band-limited by…
)2(2)(
,1,0
nxnx
nn
−=−
ψψψ
. . . . . .
.s' nxby drepresente n )(,0 −ψ.s' )2(2by drepresente n1, nx −= ψψ
.s' )2/(2/1by drepresente n1,- nx −= ψψ
π π4π22π
ξ
UCLA, Ivo DinovSlide 51
Partition of the Time-Frequency space
Heisenberg’s uncertainty principle postulates that for each particle’s position and momentum we have:
So, to get an optimal spatial-localization, we must influence the particle’s scale or frequency content.
Freq
uenc
y
Space/Time
sinc(x-n)
sinc(2x-m)
sinc(4x-k)
.2π≥∆⋅∆ xt
UCLA, Ivo DinovSlide 52
Multiresolution analysis (MRA)
Mallat and Meyer (1986):
An (orthogonal) multiresolution of L2(R) is a chain of closed subspaces indexed by all integers:
subject to the following three conditions:(completeness)
(scale similarity)
(translation seed) V0 has an orthonormal basis consisting of all integral translates of a single function
LL 21012 VVVVV ⊂⊂⊂⊂ −−
.0lim),(lim 2 ==−∞→∞→ nnnn
VRLV
.)2()( 1+∈⇔∈ nn VxfVxf
:)( :)( Znnx x ∈−φφUCLA, Ivo DinovSlide 53
Equations for designing MRA
The refinement/dilation equation for the father-waveletfunction:
This father-wavelet function is often called: scaling function, shape function
Let Wo denote the orthogonal complement of Vo in V1. Then Wo is also orthogonally spanned by the integer translates of a single translation function ψ(x) the (mother) wavelet!
s.' ofset suitable afor ),2(2)( nn n hnxhx −= ∑ φφ
s.'g of set suitable somefor nxgx nn n ),2(2)( −= ∑ φψ
LL 21012 VVVVV ⊂⊂⊂⊂ −−
UCLA, Ivo DinovSlide 54
Wavelets representation
Theorem:Theorem:
Wavelets representation of a signal:Wavelets representation of a signal:
2
2/,
Lfor basis lorthonormaan is ,:)2(2 Zkjkxjj
kj ∈−= ψψ
jj VII ∈~+
11 −− ∈ jj VI+
11 −− ∈ jj Wd
+
L2−jd .00 Wd ∈
.00 VI ∈
.0021 IdddI jjj ++++= −− L
Course Scale +Wavelet Detail Coefficients
……
UCLA, Ivo DinovSlide 55
An example of wavelet decomposition
One level wavelet decomposition of a 1-D signal
10
UCLA, Ivo DinovSlide 56
2-Channel filter bank: Analysis bankForward Discrete Wavelet Transform
• H’ is the low-pass filter ;•G’ is the high-pass filter ;• 2 is the down-sampling operator: (1 3 4 6 5 8 7) (1 4 5 7).
G’
H’ 2
2
xs
d
Low-pass channel
High-pass channel
Scales
Details
UCLA, Ivo DinovSlide 57
2-channel filter bank: Synthesis bankInverse Discrete Wavelet Transform
• H is the low-pass filter ;• G is the high-pass filter.• 2 is the up-sampling operator: (1 4 5 7) (1 0 4 0 5 0 7).
G
H2
2
s
d
Low-pass channel
High-pass channel
y
UCLA, Ivo DinovSlide 58
A biorthogonal filter bank
G’
H’ 2
2x
s
d
G
H2
2
s
dy
Biorthogonal (or perfect) filter bank: if y=x for all inputs x .
UCLA, Ivo DinovSlide 59
An orthogonalorthogonal filter bank
G’
H’ 2
2x
s
d
Definition: Orthogonal filter bank is biorthogonal, where both analysis filters H’ and G’ are the time reversals of the synthesis filters H & G: e.g., H=(1, 2, 3) H’=(3, 2, 1).
G
H2
2
s
dy
UCLA, Ivo DinovSlide 60
The fundamental theorem of MRA
An orthogonalorthogonal Mallat-Meyer MRA corresponds to an orthogonalorthogonal filter bank with the synthesis filters:
where, the hn’s and gn’s are the 2-scale connection coefficients in the dilation and wavelet equations:
And, the multiresolution wavelet decomposition of f corresponds to the iteration of the analysis bank with the φ-coefficients of f as the input digital data.
).:(),:( ZngGZnhH nn ∈=∈=
).2(2)(),2(2)( nxgxnxhxn nn n −=−= ∑∑ φψφφ
UCLA, Ivo DinovSlide 61
The fundamental theorem (cont’d)
jj VI ∈+
11 −− ∈ jj VI+
11 −− ∈ jj Wd
+
L2−jd .00 Wd ∈
.00 VI ∈
.0021 IdddI jjj ++++= −− L
Suppose j=2, and ∑=k k xkcI ).()( ,222 φ
G’
H’ 22
c2d1
G’
H’ 22 d0
c0c1
11
UCLA, Ivo Dinov Slide 62
The Discrete Wavelet Transform – DWT
Alternating quardature mirror – Smooth (S) and Detail (D) filters for Daub4 filterbank.
NNcccccccc
cccccccc
cccccccc
×
2301
1032
0123
3210
0123
3210
- -
. .. .
. .- -
- -
SD
Half the results of the S and D filters are discarded (decimation) .N N/2 Smooth/Low-Pass components and N/2 Detail/High-Pass components.
UCLA, Ivo DinovSlide 63
DWT – Pyramidal Algorithm (forward)
Alternating quardature mirror – Smooth (S) and Detail (D) filters for Daub4 filterbank.
Data(y) Apply Permute Apply PermuteMatrix: Cxy Elements Matrix: Cxy Elements
Details, waveletcoefficients
→
→
→
→
4
3
2
1
2
1
2
1
4
3
2
1
2
2
1
1
4
3
2
1
4
3
2
1
4
4
3
3
2
2
1
1
8
7
6
5
4
3
2
1
ddddddddssss
ddddddssddss
ddddssss
dsdsdsds
yyyyyyyy Smooth
Components ,Mother functioncoefficients
Forward DWT
UCLA, Ivo DinovSlide 64
DWT – Pyramidal Algorithm (inverse)
Alternating quardature mirror – Smooth (S) and Detail (D) filters for Daub4 filterbank.
Data(y) Permute Apply Permute ApplyElements CTxw Elements CTxw
←
←
←
←
=
8
7
6
5
4
3
2
1
4
4
3
3
2
2
1
1
4
3
2
1
4
3
2
1
4
3
2
1
2
2
1
1
4
3
2
1
2
1
2
1
8
7
6
5
4
3
2
1
?
wwwwwwww
dsdsdsds
ddddssss
ddddddssddss
ddddddddssss
yyyyyyyy
InverseDWTC=orthomatrix
UCLA, Ivo DinovSlide 65
FBI fingerprints.JPEG2000 image compression file format.Image indexing and image search engines (for databanks).Signal modeling.Image denoising and restorations (optimal representation).Texture analysis.Brain Mapping and Neuroimaging.Algorithm speeding up based on multi-resolution representation (e.g., solving large sparse-matrix linear equ’s)Time series analysis.
Major Applications of Wavelets
UCLA, Ivo DinovSlide 67
Completeness and Efficiency of Signal Representation – Wavelet functions 2D
A 2D Daubechies wavelet illustrating the compact support,fast decaying and oscillatory properties of wavelets.
UCLA, Ivo DinovSlide 68
Completeness and Efficiency of Signal Representation – Wavelet functions 3D
Signal & it’s 3D WT
12
UCLA, Ivo DinovSlide 69
Completeness and Efficiency of Signal Representation – Wavelet functions 3D
Signal & it’s 3D WT
UCLA, Ivo DinovSlide 70
Completeness and Efficiency of Signal Representation – Wavelet functions 3D
Signal & it’s 3D WT
UCLA, Ivo DinovSlide 71
Completeness and Efficiency of Signal Representation – Wavelet functions 3D
Signal & it’s 3D WT
UCLA, Ivo DinovSlide 72
Wavelets
The three curves on this graph represent the original signal(HeavySine function, thin red curve), its wavelet transform (thin bluecurve) and the reconstructed function estimator using only the largest2% of the wavelet parameters. Note the space-frequency decorrelationof the original data in the wavelet-space (blue curve).
UCLA, Ivo DinovSlide 73
Wavelets
If Time Permits:C:\Ivo.dir\Research\movies\WaveletMovie.mpg
Also available online at the Wavelet Analysis of Image Registration site:http://www.loni.ucla.edu/~dinov/WAIR.html
UCLA, Ivo DinovSlide 74
Atlas–Based Wavelet–Space Stat Analysis
Design protocol for the wavelet-based construction and utilization of the fMRIfunctional atlas (F&A SVPA). The construction of the anatomical probabilisticatlas is augmented by including the regional distributions of the waveletcoefficients of the functional signals for one population. Statistical analysis offunctional variability between a new fMRI volume and the F&A SVPA atlas isassessed following wavelet-space shrinkage.
13
UCLA, Ivo DinovSlide 75
Distribution of the wavelet coefficients
Displayed is the frequency histogram of the wavelet coefficients at 1,000 randomly selected locations (random indices of wj,k), averaged across all 578 MRI volumes part of the ICBM database [Mazziotta et al., 1995]. Notably, most of the wavelet coefficients are near theorigin, with some having sporadic, but large magnitudes. Heavy-tail distributions models will be appropriate for these data.
-5 0 5
UCLA, Ivo DinovSlide 76
Distr’n of the wavelet coefficients – 3 volumes
Shown here are the frequency distributions of the wavelet coefficients for three separate MRI volumes (randomly selected from the pool of 578). There is little variation between these three individuals, however the overall shape of the distribution of the magnitudes of the wavelet coefficients of each individual across the 1,000 random locations is more regular, still heavy-tailed, than the averaged distribution across subjects depicted in the previous Figure
UCLA, Ivo DinovSlide 77
Wavelet Coefficient distributions MRI data
This image illustrates a portion of all of the individualdistributions of the waveletparameters for the 578 ICBM MRI volumes. The horizontal axis represents the magnitude of the wavelet coefficients in the range [-4 ; 4], vertical axis labels the subject index and the row color map indicates the frequencies of occurrence of a wavelet coefficient of certain magnitude across all 1,000 voxels. Bright colors indicate high, and dark colors represent low, frequencies.
Vox
el L
ocat
ion Distribution Across 578 Subjects
UCLA, Ivo DinovSlide 78
fMRI data – 1 subject wavelet coefficient distributions
This diagram depicts the frequency histogram of the average magnitude of a wavelet coefficient, across all 128 fMRI 3D time-volumes (part of the fMRI study of Buckner and colleagues [Buckner et al., 2000]) at 1,000 randomly selected locations (random indices of wj,k). Again, we observe the heavy-tailness of the data.
UCLA, Ivo DinovSlide 79
A 2D image displaying all of the individual distributions for each 128 3D time points (1 run, 15 trials) of an fMRI timeseries. On the horizontal axis is the magnitude of the wavelet and the vertical axis labels the time point of the fMRI epoch.
Colors indicate the frequencies of occurrence of a wavelet coefficient of certain magnitude across all 1,000 voxel locations. The across row average is effectively shown on previous Figure.
Wavelet coefficient Distribution – 1 run
-4 0 4
1 64
128
Magnitude of wavelet-coefficients
Time
Freq
uenc
y
UCLA, Ivo DinovSlide 80
Distribution of 4D wavelet coefficients
This graph shows the frequency distribution of the wavelet coefficients of the complete 4D WT of the fMRI timeseries. Note the slow asymptotic decay of the tails of this distribution (data-size: 64x64y16z128t, floating point). Here the x-axis represents the magnitude of the wavelet coefficients and the vertical axis indicates the frequency of occurrence of a wavelet coefficient of the given magnitude within the 4D dataset. Sporadic behavior of the large-magnitude wavelet coefficients is clearly visible at the tails of this empirical distribution.
-10 0 10
14
UCLA, Ivo DinovSlide 81
Distribution of (and Models for) the Wavelet Coefficients of the 4D fMRI volume
Several heavy-tail distribution models are fitted to the frequency histogram of 4D wavelet coefficients. These include double-exponential, Cauchy, T, double-Pareto and Bessel K form models. Because our statistical tests will be applied on the coefficients that survive wavelet shrinkage it is important to utilize a distribution model that provides accurate asymptotic approximation to the real data in the tail regions.
UCLA, Ivo DinovSlide 82
Right Tail of Distribution of (and Models for) the Wavelet Coefficients of the 4D fMRI volume
Shows the extreme left trail of the data distribution (data is symmetric). The double-exponential and the T distribution models underestimate the tails of the data asymptotically, but provide good fits around the mean. Cauchy, Bessel K forms and double-Pareto distribution models, in that order, provide increasingly heavier tails with Cauchy being the most likely candidate for the best fit to the observed wavelet coefficients across the entire range. The double-Pareto and the Bessel K form densities provide the heaviest tails, however, they are inadequate in the central range [-3 : 3] and undefined near the mean of zero.
UCLA, Ivo DinovSlide 83
Wavelet-domain Atlas-based fMRI Analysis
• Anatomical Sub-Volume Probabilistic Atlas (SVPA)
• Analysis using conventional time-domain SVT (sub-volume thresholding)Subtractionparadigm:AD-ElderlyNC
20
0
UCLA, Ivo DinovSlide 84
Wavelet-domain Atlas-based fMRI Analysis
• Raw results following analysis in the wavelet-domain (has ringing effects outside of the true ROI where the activation supposedly occurred).
• Post-processed wavelet domain statistics [final uniform thresholding (top 1%), reconstruction of all wavelet-space stats into a single volume – regional calculations]:
UCLA, Ivo DinovSlide 85
Bayesian Mixture Models for fMRI Analysis
• We use two component mixture prior distributions on the wavelet coefficients θj,k with
where πj is a proportion between 0 and 1, δ(0) is the Diracpoint mass at zero and τj >0. In other words, there is a level-dependent positive probability πj a priori that each wavelet coefficient will be exactly zero. If not, the coefficient will be normally distributed with mean zero and a level-specific standard deviation τj
( ) )0()1(,0~| 2, δπτπτπθ jjjjjkj N −+
UCLA, Ivo DinovSlide 86
Bayesian Mixture Models for fMRI Analysis
• Given the observed data over an ROI y=f+ε, the corresponding wavelet representation w = θ +z, where W is the discrete WT, w = Wy, θ=Wf and z = Wε, and the above prior distribution for the true wavelet coefficients, the posterior distributions of θj,k are again independent two-component mixtures:
Where λj,k= 1/(1+ ρj,k) and and the posterior odds that θ j,kis exactly zero are:
( ) )0()1(,~| ,22
22
22
2,
,,, δλτσ
τσ
τσ
τλθ kj
j
j
j
jkjkjkjkj
wNwp −+
++
+
−+−=
)(2exp
1222
,222
,στσ
τσ
στπ
πρ
j
kjjj
j
jkj
w
15
UCLA, Ivo Dinov Slide 87
Completeness and Efficiency of Signal Representation
UCLA, Ivo Dinov Slide 88
Completeness and Efficiency of Signal Representation - Functional Bases
UCLA, Ivo Dinov Slide 89
Completeness and Efficiency of Signal Representation - Functional Bases
UCLA, Ivo Dinov Slide 90
Completeness and Efficiency of Signal Representation - Functional Bases
UCLA, Ivo Dinov Slide 93
Completeness and Efficiency of Signal Representation –Why use Wavelet functions?
IWT(Uniform 5%)
HeavySine
UCLA, Ivo Dinov Slide 94
Wavelet-space Shrinkage
16
UCLA, Ivo Dinov Slide 95
Wavelet-space Shrinkage
UCLA, Ivo Dinov Slide 96
Wavelet-space Shrinkage
UCLA, Ivo Dinov Slide 97
Wavelet-space Shrinkage
UCLA, Ivo Dinov Slide 98
Wavelet-space Shrinkage
UCLA, Ivo DinovSlide 99
Stereotactic data alignment - warping
Data Target
Warp 1 Warp 2 Warp 3
UCLA, Ivo Dinov Slide 100
Wavelet-space Warp ClassificationFive PET (left) and corresponding MRI (right) data volumes (left columns), their LS,
MI and Affine warps (columns 2-4) and the target (right column).
Whi
ch o
ne is
the
best
war
p?
17
UCLA, Ivo Dinov Slide 101
Wavelet-space Warp Classification
Spread Group Classification (SGC) Analysis of PET data-separation of saccade PET frequency clouds
UCLA, Ivo Dinov Slide 102
Wavelet-space Warp Classification
Cluster Group Classification (CGC) Analysis of MRI data- how close together are all 5 MRI under each warp strategy?
Smal
ler c
lass
ifyin
g va
lues
be
tter w
arps
1
4
2
3Rank