medical imaging informatics: lecture # 1lecture # 1€¦ · estimation theoryestimation theory...

MEDICAL IMAGING INFORMATICS:Lecture # 1Lecture # 1

Estimation TheoryEstimation Theory

Norbert SchuffProfessor of Radiology

VA Medical Center and UCSFVA Medical Center and [email protected]

UCSF VAMedical Imaging Informatics 2012 NschuffCourse # 170.03Slide 1/31

Department of Radiology & Biomedical Imaging

Objectives Of Today’s LectureObjectives Of Today s Lecture

• Understand basic concepts of data modelingp g– Deterministic– Probabilistic– BayesianBayesian

• Learn the form of the most common estimators– Least squares estimator

Maximum likelihood estimator– Maximum likelihood estimator– Maximum a-posteriori estimator

• Learn how estimators are used in image processing

UCSF VADepartment of Radiology & Biomedical Imaging

What Is Medical Imaging Informatics?• Signal Processing

– Digital Image Acquisition – Image Processing and Enhancement

• Data Mining• Data Mining– Computational anatomy– Statistics– Databases– Data-mining– Workflow and Process Modeling and SimulationWorkflow and Process Modeling and Simulation

• Data Management– Picture Archiving and Communication System (PACS) – Imaging Informatics for the Enterprise – Image-Enabled Electronic Medical Records – Radiology Information Systems (RIS) and Hospital Information Systems (HIS)Radiology Information Systems (RIS) and Hospital Information Systems (HIS)– Quality Assurance – Archive Integrity and Security

• Data Visualization– Image Data Compression – 3D, Visualization and Multi-media3D, Visualization and Multi media – DICOM, HL7 and other Standards

• Teleradiology– Imaging Vocabularies and Ontologies– Transforming the Radiological Interpretation Process (TRIP)[2]– Computer-Aided Detection and Diagnosis (CAD).


Co pute ded etect o a d ag os s (C )– Radiology Informatics Education

• Etc.

What Is The Focus Of This Course?Learn using computational tools to maximize information for

knowledge gain:

Pro-active

ImageMeasurements Model knowledge

Let Data

Drive the model

Image Model g

Collectdata Compare

withmodel

Reactive


Challenge: Maximize Information Gain

1. Q: How to estimate quantities from a given set of t i ( i ) t ?uncertain (noise) measurements?

A: Apply estimation theory (1st lecture today)

2. Q: How to quantify information?A: Apply information theory (2nd lecture next week)A: Apply information theory (2 lecture next week)

UCSF VAMedical Imaging Informatics 2009, NschuffCourse # 170.03Slide 5/31


Motivation Example I: Tissue ClassificationGray/White Matter Segmentation

1.0

Hypothetical Histogram

0.6

0.8

0.0

0.2

0.4

Intensity

GM/WM overlap 50:50;GM/WM overlap 50:50;Can we do better than flipping a coin?



Motivation Example II: D i Si l T kiDynamic Signal Tracking

Goal: Capture a dynamic signal on a static background

Goal: Capture a periodic signal corrupted by noise

5

-1

1

3

0 1000 2000 3000 4000 5000 6000

-5

-3

D. Feinberg Advanced MRI Technologies, Sebastopol, CA

0 1000 2000 3000 4000 5000 6000

Time


Motivation Example III: Si l D itiSignal Decomposition

Goal:Diffusion Tensor Imaging (DTI) Goal:Capture directions of fiber bundles

Diffusion Tensor Imaging (DTI)•Sensitive to random motion of water•Probes structures on a microscopic scale

Microscopic tissue sample

Dr. Van Wedeen, MGH


Quantitative Diffusion Maps

Basic Concepts of Modeling

Θ: unknown state of interest

ρ: measurement

: Estimator - aΘ: Estimator a good guess of Θbased on measurements

Θ

Cartoon adapted from: Rajesh P N Rao Bruno A Olshausen Probabilistic Models of the Brain


Cartoon adapted from: Rajesh P. N. Rao, Bruno A. Olshausen Probabilistic Models of the Brain. MIT Press 2002.

Deterministic ModelN = number of measurementsM = number of states, M=1 is possibleUsually N > M and ||noise||2 > 0

⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞1 11 1 1 1

1

. . .m

n n nm m n

h h noise

h h noise

ϕ θ

ϕ θ

⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟= +⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠

i

NxM M noise= +N Nφ H θ Unknowns

The model is deterministic, because discrete values of Θ are solutions.

Note:1) we make no assumption about the1) we make no assumption about the

distribution of Θ1) Each value is as likely as any

another value

UCSF VA

What is the best estimator under thesecircumstances?


Least-Squares Estimator (LSE)

The best what we can do is minimizing noise:

LSE 0ˆnoise− =

− =

φ Hθ

φ Hθ

( ) LSE 0ˆ− =T TH φ H H θ

1( ) 1

LSE ϕ−

= T THθ H H

•LSE is popular choice for model fitting•LSE is popular choice for model fitting•Useful for obtaining a descriptive measure

But


•LSE makes no assumptions about distributions of data or parameters•Has no basis for statistics “deterministic model”

Prominent Examples of LSE

100

150 Mean Value: ( )1

1ˆN

meanj

jN

θ ϕ=

= ∑

0

50

100

Inte

nsiti

es (Y

)

1j=

Variance ( )( )2

var1

1ˆ ˆ1

N

iance meanj

jN

θ ϕ θ= −− ∑

100 300 500 700 900

Measurements (x)

-50

11 jN =

100

200

ity

Amplitude:1̂θ

Frequency:2̂θ

-100

0Inte

ns

2

Phase:3̂θ

Decay:4̂θ


100 300 500 700 900

Measurements

Likelihood Model( ) ( )

We believe θ is governed by chance, but we don’t know the governing probability.

We perform measurements for all possible

Likelihood ( ) ( )|L pϕ ϕΦ = Φ

We perform measurements for all possiblevalues of θ.

We obtain the likelihood function of θgiven our measurements ρgiven our measurements ρ

Note:θ is randomϕ Is measurableϕ Is measurableThe likelihood function is the joint probabilityof unknown θ and known ϕ


Likelihood Model (cont’d)Likelihood functionL

( ) ( )Pr |L Nθ ϕ θ=

New Goal:

( ) ( )Pr | ,NL Nθ ϕ θ=

New Goal:Find an estimatorwhich gives the most likely probability of θp yunderlying .( )L θ


Maximum Likelihood Estimator (MLE)

Goal: Find estimator which gives the most likely probability of θunderlying Highest probability( )L θ

( )max |MLE p ϕ θ=Θ N

g p y

( )ln | 0d p ϕ θ =

ΘMLE can be found by taking the derivative of the likelihood function

( )ln | 0MLE

pd

ϕ θθ =

=Nθ θ


Example I: MLE Of Normal DistributionNormal distribution

( ) ( )( )222

1Pr | , exp2

N

jϕ θ σ ϕ θ⎡ ⎤

= −⎢ ⎥⎣ ⎦

∑N0.8

1.0

Normal Distribution

( ) ( )( )212 jσ =

⎢ ⎥⎣ ⎦

∑

0.2

0.4

0.6

a2

log of the normal distribution (normD)

100 300 500 700 900a1

0.0( ) ( )( )222

1

1ln Pr | ,2

N

j

jϕ θ σ ϕ θσ =

= −∑N

Log Normal Distribution

MLE of the mean (1st derivative):

( ) ( )( )2

1ln Pr 0ˆ4

Nd jd N

θθ

ϕ= − =∑i -5

0

( ) ( )( )21ˆ4 j

jd Nσθ

ϕ=∑

( )1 N

MLE jN

ϕΘ = ∑-10


( )1j

MLE jN

ϕ=∑

100 300 500 700 900a1

-15

Example II: MLE Of Binominal Distribution(Coin Tosses)( )

n= # of tossesy= # of heads in n tossesθ = probability of obtaining a head in a toss (unknown)θ probability of obtaining a head in a toss (unknown)Pr(y| n, θ) = probability of heads from n tosses given θ

0 7

0 1

0.2

0.7Pr(y|n=10, θ =7)

0.0

0.1

y 0.3Pr(y|n=10, θ =3)

0.1

0.2


1 2 3 4 5 6 7 8 9 10N0.0

y

MLE Of Coin Toss (cont’d) Goal:

Given Pr(y| n=10, θ), estimate the most likely probability distribution MLEΘGiven Pr(y| n 10, θ), estimate the most likely probability distribution that produced the data.

Intuitively:

MLEΘ

#MLE

headsΘ =y

#MLE tosses

For a fair coin 0.5MLEΘ ≈



MLE Of Coin Toss (cont’d)

0.25

Coin tosses follow a binominal distribution

( ) ( )( ) ( )!Pr | , 1! !

n yynY y n θ θ θ −= = ⋅ −

0.10

0.15

0.20

Like

lihoo

d

Likelihood function of coin tosses

( ) ( )( ) ( )! !y n y−

( ) ( )( ) ( )!| , 1! !

n yynL y ny n y

θ θ θ −= ⋅ −−

0.1 0.3 0.5 0.7 0.9W

0.00

0.05

θ

( ) ( )( ) ( )10 7710!| 10, 7 0.5 1 0.5 0.127! 10 7 !

L n yθ −= = = ⋅ − =−

What is the likelihood of observing 7 heads given that we tossed a coin 10 times?

For a fair coin θ =0 5: ( )( )7! 10 7 !For a fair coin θ 0.5:

( ) ( )( ) ( )10 7710!| 10, 7 0.6 1 0.6 0.217! 10 7 !

L n yθ −= = = ⋅ − =−

For an unfair coin θ =0.6



MLE Of Coin Toss (cont’d)

( ) ( )!| 1 n yynL yθ θ θ −=0.20

0.25

od

Likelihood function of coin tosses

( ) ( )( ) ( )| 1! !

L yy n y

θ θ θ= ⋅ −−

0.05

0.10

0.15

Like

lihoo

0.1 0.3 0.5 0.7 0.9W

0.00

log likelihood function θ

( )

( ) ( )

ln |!ln ln ln 1

L yn y n y

θ

θ θ

=

+ +-1

( )( ) ( ) ( )ln ln ln 1! !

y n yy n y

θ θ+ + − −−

0.1 0.3 0.5 0.7 0.9

-3

-2



Φθ

MLE Of Coin Toss

( ) ( )

Evaluate MLE equation (1st derivative)

( ) ( )ln0

1d L n yy

dθ θ θ−

= − =−

i

#0(1 ) #MLEy n y heads

n tossesθ

θ θ−

= = =Θ= ⇒−

According to the MLE principle, the distribution Pr(y/n,ΘMLE) for a given n is the most likely distribution to have generated the observed datais the most likely distribution to have generated the observed dataof y.



Relationship between MLE and LSE• θ is independent of noise• Probability of measurements and noise have the same distribution

Assume:

( ) ( )Pr | Pr |noiseϕ θ ϕ θ θ= −θ N N H

For gaussian noise, i.e. zero mean

p(ρ|θ) is maximized when LSE is minimized



Bayesian Model

Now, the daemon comes into play, but we know

Prior knowledge

the daemon’s preferencesfor θ (prior knowledge).

New Goal:

( ) ( )Prprior θ θ=

New Goal:Find the estimator which gives the most likely probability distribution of θprobability distribution of θgiven everything we know.



Bayesian Model

( ) ( ) ( )| PrNposterior C Lϕ ϕθ ϕ θ θ= ⋅ ⋅



( ) ( ) ( )|Np ϕ ϕ ϕ

Maximum A-Posteriori (MAP) Estimator ( )Goal: Find the most likely ΘMAP (max. posterior density of ) given ϕ.

( ) ( )max |MAP NL pϕ θ ϕΘ = N

Maximize joint density

ΘMAP can be found by taken the partial derivative

( ) ( ) ( ) ( )ln | Pr ln | lnPr 0d L Ld

ϕ ϕθ θ θ θθ θ θ

∂ ∂= + =∂ ∂N N( ) ( ) ( ) ( )

dθ θ θ∂ ∂N N



Example III: MAP Of Normal Distribution

The sample mean of MAP is:

( )2 N

jμσ ϕ=Φ ∑MAP ( )2 21j

jTϕ μ

ϕσ σ =+

Φ ∑MAP

If we do not have prior information on μ, σμ inf or T inf

MAP MLˆ ˆ ˆ, LSE⇒μ μ μMAP ML , LSEμ μ μ



Posterior Distribution and Decision Rules

(Θ|ρ)(Θ|ρ)

ΘΘΘMSE


X ΘΘMAPΘMSE

Decision Rules

Measurements Likelihood Posterior function

Prior

Distribution

Gain

Result

Prior Distribution

GainFunction



Some Desirable Properties of Estimators I:

Unbiased: Mean value of the error should be zero

- 0E Φ Φ =

Consistent: Error estimator should decrease asymptotically as number of

2- 0 for large NMSE E= ΦΦ →

Consistent: Error estimator should decrease asymptotically as number of measurements increase. (Mean Square Error (MSE))

0 for large NMSE E ΦΦ →

What happens to MSE when estimator is biased?2 2 - - b bMSE E E= Φ +Φ

pp


variance bias

Some Desirable Properties of Estimators II:Some Desirable Properties of Estimators II:Efficient: Co-variance matrix of error should decrease asymptotically to itsminimal value for large N

( )( )- - . . .iik kk

T

iE some very small value= Φ Φ ≤Φ ΦC

minimal value for large N



Example:P ti Of E ti t M d V iProperties Of Estimators Mean and Variance

1 1N

( )1

1 1ˆj

E E j NN N

μ ϕ μ μ=

= = ⋅ =∑Mean:

The sample mean is an unbiased estimator of the true meanThe sample mean is an unbiased estimator of the true mean

21 1N

( ) ( )( )2

22 22 2

1

1 1ˆN

j

E E j NN N N

σμ μ ϕ μ σ=

− = − = ⋅ =∑Variance:

The variance is a consistent estimator becauseIt approaches zero for large number of measurements.



Properties Of MLEp

• is consistent: the MLE recovers asymptotically the true y p yparameter values that generated the data for N inf;

• Is efficient: The MLE achieves asymptotically the minimum error (= max. information)



SummarySummary

• LSE is a descriptive method to accurately fit data to a p ymodel.

• MLE is a method to seek the probability distribution that k th b d d t t lik lmakes the observed data most likely.

• MAP is a method to seek the most probably parameter value given prior information about the parameters andvalue given prior information about the parameters and the observed data.

• If the influence of prior information decreases, MAP approaches MLE.



Some Priors in ImagingSome Priors in Imaging

• Smoothness of the brain• Anatomical boundaries • Intensity distributions• Anatomical shapes• Physical models

P i t d f ti– Point spread function– Bandwidth limits

• Etc.



Estimation Theory: Motivation Example IGray/White Matter Segmentation

1.0

Hypothetical Histogram

0.6

0.8

0.0

0.2

0.4

Intensity

What works better than flipping a coin?

Design likelihood functions based onanatomyco-occurance of signal intensitiesothers


Determine prior distributionpopulation based atlas of regional intensitiesmodel based distributions of intensitiesothers

Estimation Theory: Motivation Example IIEstimation Theory: Motivation Example IIGoal: Capture dynamic signal on a

static background5

Poor signal to noise

1

3

-5

-3

-1

0 1000 2000 3000 4000 5000 6000

Time

Improvements to identify the dynamic signal:

Design likelihood functions based on

D. Feinberg Advanced MRI Technologies, Sebastopol, CA

Design likelihood functions based onauto-correlationsanatomical information

Determine prior distributions fromi l t



serial measurementsmultiple subjectsanatomy

Estimation Theory: Motivation Example IIIEstimation Theory: Motivation Example III

Goal:Capture directions

Diffusion Spectrum Imaging – Human Cingulum Bundle

Capture directions of fiber bundles

Improvements to identify tracts:p o e e ts to de t y t acts

Design likelihood functions based onsimilarity measures of adjacent voxels, e.g. correlationsfiber anatomy

Determine prior distributions fromanatomyfiber skeletons from a populationothers

Dr. Van Wedeen, MGH

others



MAP Estimation in Image Reconstructions ith Ed P i P iwith Edge-Preserving Priors



Dr. Ashish Raj, Cornell U

MAP in Image Reconstructions with Edge-Preserving PriorsPreserving Priors

For DTI, use the fact that coregistered DTI images have common edge features:images have common edge features:


Dr. Justin Haldar Urbana-Champaign

MAP Estimation In Image Reconstruction

Human brain MRI. (a) The original LR data. (b) Zero-padding interpolation. (c) SR with box-PSF. (d) SR with Gaussian-PSF.p ( ) ( )

From: A. Greenspan in The Computer Journal Advance Access published February 19, 2008



Improved ASL Perfusion Results p

zDFT = zero-filled DFT

UCSF VABy Dr. John Kornak, UCSF

Bayesian Automated Image SegmentationBayesian Automated Image Segmentation

Bruce Fischl MGH



Bruce Fischl, MGH

Population Atlases As Priorsp



Dr. Sarang Joshi, U Utah, Salt Lake City

Population Shape Regressions Based Age-S l ti P iSelective Priors

Age = 29 33 37 41 45 49



Age 29 33 37 41 45 49Dr. Sarang Joshi, U Utah, Salt Lake City

Imaging Software Using MLE And MAPImaging Software Using MLE And MAPPackages Applications Languages

VoxBo fMRI C/C++/IDLVoxBo fMRI C/C++/IDLMEDx sMRI, fMRI C/C++/Tcl/Tk SPM fMRI, sMRI matlab/C iBrain IDLiBrain IDLFSL fMRI, sMRI, DTI C/C++

fmristat fMRI matlab BrainVoyager sMRI C/C++BrainVoyager sMRI C/C

BrainTools C/C++ AFNI fMRI, DTI C/C++

Freesurfer sMRI C/C++Freesurfer sMRI C/CNiPy Python



medical imaging informatics: lecture # 1lecture # 1€¦ · estimation theoryestimation theory...

Documents