medical image computing - bmva summer school 2014

Medical Image Computing: Image Registration & Segmentation

Daniel Rueckert, Ph.D.
Biomedical Image Analysis Group, Department of Computing

Overview

• Introduction to image registration
– transformation models
– similarity measures
– optimization techniques (very briefly)
– rigid and non-rigid registration algorithms

• Introduction to image segmentation
– EM-based segmentation
– graph-cut segmentation
– atlas-based segmentation
– patch-based segmentation

• Examples

What is image registration?

C. Ruff, 1995

Image to image registration

• Intra-subject registration
– Aim: the registration of different images of the same subject.
– Purpose: to combine anatomical and functional information of different imaging modalities.
– Examples:
  • Registration of CT and MR images of the brain for surgery and therapy planning
  • Registration of MR and SPECT/PET images for the localization of tracer uptake to indicate brain physiology or of the head and neck for cancer staging

Image to image registration

• Inter-subject registration
– Aim: the registration of images of the same type acquired from different subjects.
– Purpose: to assess morphometric variability of anatomical structure across individuals.

• Serial registration
– Aim: the registration of a sequence of images (over time) of the same subject.
– Purpose: to monitor temporal changes and assess drug treatment in diseases such as epilepsy and multiple sclerosis.

Intra-subject registration example

• Fusion of MR and CT head images:

[Figure panels: MRI | CT | MRI + CT]

Intra-subject registration example

• Fusion of MR and PET head images:

[Figure panels: MRI | PET | MRI + PET]

Serial registration example

[Figure panels: Baseline | Unregistered | Registered]

Image to physical space registration

[Figure: registration of pre-operative CT and MRI to the intra-operative scene]

• Image-guided surgery (navigation during surgery)
– Aim: the registration of the pre-operatively acquired data to the physical space of the patient lying on the operating table.
– Purpose: to allow surgeons to use the pre-operative data for guidance.
– Applications: neurosurgery, surgery of the skull base and maxillofacial region

• Radiation therapy
– Aim: the registration of the patient coordinate system to the coordinate system of the linear accelerator.
– Purpose: to position the patient so that the radiation dose can be calculated using pre-operative data.

Image to physical space registration

• Relating imaging device coordinates to the physical space of the patient

• Registration of:
– extrinsic marker pins or stereotactic frames fixed to the patient at scanning and operation time
– intrinsic markers (anatomical landmarks, image features)
– intra-operative images (video, ultrasound, X-ray or X-ray fluoroscopy) to pre-operative image

What is image registration?

[Figure: iterative registration loop. Transformation → Compute similarity → Maximize similarity → updated Transformation]

Image registration as an optimization problem

Components of any generic registration algorithm

• Similarity measure
– How similar are the images/features after registration, in other words how good is the registration?

• Optimisation method
– How do we find the transformation which maximises the similarity subject to the desired constraints?

• Transformation model
– What type of transformations are allowed and what constraints should the model satisfy?
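The three components can be seen together in a deliberately tiny sketch that registers two 1D signals under an integer translation. The function names, the circular shift used as the transformation model and the brute-force search are illustrative assumptions, not a method taken from the talk.

```python
import numpy as np

def negative_ssd(fixed, moving):
    """Similarity measure: higher is better, 0 for identical signals."""
    return -float(np.sum((fixed - moving) ** 2))

def register_translation(fixed, moving, max_shift=10):
    """Exhaustively search integer shifts and keep the most similar one."""
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        candidate = np.roll(moving, shift)       # transformation model (circular shift)
        score = negative_ssd(fixed, candidate)   # similarity measure
        if score > best_score:                   # optimisation (brute force)
            best_shift, best_score = shift, score
    return best_shift

x = np.arange(100)
fixed = np.exp(-0.5 * ((x - 50) / 5.0) ** 2)     # a Gaussian bump
moving = np.roll(fixed, -7)                      # same bump, displaced by 7 samples
recovered_shift = register_translation(fixed, moving)
```

Real registration replaces each piece: a richer transformation model, a similarity measure suited to the modalities, and an iterative optimiser instead of exhaustive search.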

Overview of the rest of this talk

Rigid registration

• Transformation models
– rigid
– affine
• Similarity measures
– mono-modal registration
– multi-modal registration

Non-rigid registration

• Transformation models
– splines
– elastic and fluid models
– regularization

Transformation models

• Transformations are described by a number of parameters (often referred to as Degrees of Freedom or DOF)

Number of parameters:

identity: 0
rigid: 6
rigid + scaling: 7 or 9
affine: 12 or 15
non-rigid: 10s to 100,000s

Rigid transformations

• Compensates for global patient repositioning
• Preserves distances, planes and angles
• Appropriate for:
– brain, bone, optically tracked surgical instruments
– often used to initialise non-rigid registration

[Figure panels: Translation (3) | Rotation (3) | Translation + Rotation (6)]

Rigid transformations

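• Rigid transformation (6 degrees of freedom):

$$T_{rigid}(x, y, z) = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}$$

• t_x, t_y, t_z describe the 3 translations in x, y, z
• r_11, ..., r_33 are the entries of the rotation matrix, determined by the 3 rotations around x, y, z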
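A minimal sketch of how the six rigid parameters could be turned into a 4 x 4 homogeneous matrix. The function names and the rotation order R = Rz Ry Rx are assumptions made for illustration; the slides do not fix a convention.

```python
import numpy as np

def rigid_matrix(tx, ty, tz, rx, ry, rz):
    """4x4 homogeneous rigid transform; angles in radians, R = Rz @ Ry @ Rx (assumed order)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx      # entries r11 ... r33
    T[:3, 3] = [tx, ty, tz]       # translations
    return T

def apply_transform(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

T = rigid_matrix(1.0, 2.0, 3.0, 0.1, 0.2, 0.3)
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
moved = apply_transform(T, points)
```

Because the upper-left block is a rotation, distances between the points are the same before and after the transform.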

Affine transformations

• Compensates for additional global scaling and shears

[Figure panels: Translation (3) | Rotation (3) | Translation + Rotation (6) | Scaling (3) | Shear (3+3)]

Affine transformations

• Affine transformations (12 degrees of freedom)

Similarity measure

• Feature-based similarity measures of isolated geometrical features
– e.g. points, lines, ridges, surfaces, curvature extrema
– registration minimises distance between corresponding features
– correspondences are interpolated between features

• Voxel-based similarity measures of intensities across the whole image
– registration maximises a measure of image intensity similarity
– correspondences are defined everywhere during the registration

Generic feature-based registration

• While dissimilarity > 0 and there is improvement, do:
1. Feature extraction
2. Feature pairing
3. Similarity formulation and outlier removal
4. Dissimilarity reduction (optimization)

Great differences in each step depending on images and task!

Generic feature-based registration: Using points

• Orthogonal Procrustes problem:
– Named after a robber in Greek mythology, who would offer travellers the opportunity to stay the night in a perfectly fitting bed.
– Unfortunately, it was the guest who was altered to fit the bed, rather than the bed to fit the guest!
– Short visitors were stretched to fit, and tall visitors had parts of their body cut off so that they would fit, with invariably fatal results.

The hero Theseus stopped this unpleasant practice by subjecting Procrustes to his own method.

Generic feature-based registration: Using points for rigid registration

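• Optimal fitting problem of least-squares type:
– Closed-form solution for the rigid case. Non-rigid registration requires an iterative solution
– Given two sets of N corresponding points {x_i} and {y_i}, find the rigid-body transformation (rotation matrix R and translation vector t) that minimizes the mean squared distance between the points:

$$\min_{R, t} \; \frac{1}{N} \sum_{i=1}^{N} \| R x_i + t - y_i \|^2$$

[Figure panels: Centred (x_i) | Centred and rotated (R x_i) | Translate (R x_i + t)]

• Centre points:
– Compute mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$, $\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$
– Centre wrt mean: $\tilde{x}_i = x_i - \bar{x}$, $\tilde{y}_i = y_i - \bar{y}$

• Determine rotation matrix R via singular value decomposition (SVD) of correlation matrix H:

$$H \equiv \sum_{i=1}^{N} \tilde{x}_i \tilde{y}_i^T = U \Lambda V^T, \quad U^T U = V^T V = I, \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3), \quad \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge 0$$

$$R = V D U^T, \quad D = \mathrm{diag}(1, 1, \det(V U^T))$$

• Determine translation vector:

$$t = \bar{y} - R \bar{x}$$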
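The closed-form point-based solution (centre, SVD of the correlation matrix, then translation) can be sketched in a few lines of NumPy. The function name `rigid_procrustes` and the synthetic test data are assumptions for illustration.

```python
import numpy as np

def rigid_procrustes(x, y):
    """Least-squares rigid fit y_i ~ R x_i + t for (N, 3) arrays of corresponding points."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    x_c, y_c = x - x_mean, y - y_mean          # centre wrt mean
    H = x_c.T @ y_c                            # correlation matrix: sum_i x~_i y~_i^T
    U, _, Vt = np.linalg.svd(H)                # H = U diag(lambda) V^T
    V = Vt.T
    D = np.diag([1.0, 1.0, np.linalg.det(V @ U.T)])  # guards against reflections
    R = V @ D @ U.T
    t = y_mean - R @ x_mean
    return R, t

# Synthetic check: rotate and translate random points, then recover the motion.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:                       # make Q a proper rotation
    Q[:, 0] = -Q[:, 0]
t_true = np.array([1.0, -2.0, 0.5])
y = x @ Q.T + t_true
R_est, t_est = rigid_procrustes(x, y)
```

The diagonal matrix D only matters when the unconstrained solution would be a reflection, which can happen with noisy or nearly planar point sets.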

• Anatomic landmarks
• Skin-affixed markers
• Bone-implanted markers
• Advantages of markers
– Fiducial is independent of anatomy
– Automatic algorithms for locating fiducial markers can take advantage of the marker's shape and size in order to accurately and robustly compute the fiducial point

“Bummer of a birthmark, Hal!”

Skin-Affixed Markers

• Advantage: non-invasive
• Disadvantage: can move due to mobility of skin

Bone-Implanted Markers

• Advantage: cannot move
• Disadvantage: invasive

Bone-Implanted Markers

CT MR-T1 MR-T2

Generic feature-based registration: Using surfaces

• Generally aligns a large number of points
• 3D correspondence of anatomy or pathology is often not known or unavailable
• The 3D boundary of an object is an intuitive and easily characterized geometrical feature that can be used for registration
• Surface-based methods involve determining corresponding surfaces in different images and/or physical space and finding the transformation that best aligns these surfaces

• Skin surface (air-tissue interface)
• Bone surface (tissue-bone interface)
• Representations
– Point sets (collection of points on the surface)
– Faceted surfaces, e.g. triangle set approximating surface
– Implicit surfaces
– Parametric surfaces, e.g. B-spline surface

Surface Extraction

• Images
– Isointensity contour extraction (Marching Cubes)
– Deformable models

• Physical space
– Laser range finders
– Stereo video systems (photogrammetry)
– Localizers
  • Articulated mechanical
  • Magnetic
  • Active and passive optical
– Tracked ultrasound for bone surface

Marching Cubes: Example

Skin surface Bone surface

Generic feature-based registration: Using surfaces for rigid registration

• Given a set of N surface points {x_i} and a surface Y, find the rigid transformation T (rotation matrix R and translation vector t) that minimizes the mean squared distance between the points and the surface:

$$D(T) = \frac{1}{N} \sum_{i=1}^{N} \| T(x_i) - y_i \|^2, \qquad y_i = C(T(x_i), Y)$$

where y_i denotes the closest (rather than corresponding) point on Y.

• Iterative Closest Point (ICP) [Besl & McKay, PAMI 1992]

• To register data shape X to model shape Y, decompose X into point set {x_i}, then
– Compute closest points {y_i} on Y
– Register points {x_i} to points {y_i}
– Apply resulting transformation to points {x_i}
– Repeat until convergence
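The ICP loop can be sketched as follows, under simplifying assumptions: the model shape Y is represented by a point set, closest points are found by brute force, and the point-to-point step uses the SVD solution. The names `fit_rigid` and `icp` are illustrative and this is not the implementation of Besl & McKay.

```python
import numpy as np

def fit_rigid(x, y):
    """Closed-form least-squares rigid fit y_i ~ R x_i + t (SVD solution)."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    H = (x - x_mean).T @ (y - y_mean)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    return R, y_mean - R @ x_mean

def icp(x, model, n_iter=50, tol=1e-10):
    """Iterative closest point between point set x (N, 3) and model points (M, 3)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = x.copy()
    prev_error = np.inf
    for _ in range(n_iter):
        # 1. closest model point for every data point (brute force)
        d2 = ((moved[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
        closest = model[d2.argmin(axis=1)]
        error = d2.min(axis=1).mean()
        if prev_error - error < tol:           # 4. stop when the error no longer drops
            break
        prev_error = error
        R, t = fit_rigid(moved, closest)       # 2. register points to closest points
        moved = moved @ R.T + t                # 3. apply the transformation
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, error

# Model: points of a 5 x 5 x 5 grid; data: the same points, slightly rotated and shifted.
axis = np.arange(-2, 3, dtype=float)
model = np.array(np.meshgrid(axis, axis, axis)).reshape(3, -1).T
angle = 0.02
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
x = model @ Rz.T + np.array([0.2, -0.1, 0.3])
R_icp, t_icp, final_error = icp(x, model)
```

The example starts close to the solution on purpose: ICP is a local method and converges to the nearest minimum of D(T), so a good initial alignment matters.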

Registration based on voxel-based similarity measures

• Registration based on geometrical features
– requires the extraction of points, lines or surfaces
– registration error is affected by localization errors during the feature extraction stage

• Registration based on voxel similarity measures
– uses some measure derived from the intensity of the voxels directly
– assumes that there is a relationship between the image intensities of both images if the images are registered
– does not require any feature extraction, thus registration error is not affected by any localization errors

Registration based on voxel-based similarity measures

• Registration based on geometric features is independent of the modalities from which the features have been derived

• Registration based on voxel similarity measures must make a distinction between
– monomodality registration:
  • CT-CT, MR-MR, PET-PET, etc.
– multimodality registration:
  • MR-CT, MR-PET, CT-PET, etc.

Mono-modal image registration

• Sum of Squared Differences (SSD)

$$SSD = \sum_{i} \left( I_A(x_i) - I_B(T(x_i)) \right)^2 \quad \text{for all voxels } i$$

– assumes an identity relationship between image intensities in both images
– optimal measure if the difference between both images is Gaussian noise
– sensitive to outliers

• Robust statistics can be used to reduce the influence of outliers on the registration

• Sum of Absolute Differences (SAD)

$$SAD = \sum_{i} \left| I_A(x_i) - I_B(T(x_i)) \right|$$

– assumes an identity relationship between image intensities
– less sensitive to outliers

• Cross Correlation (CC)

$$S = \frac{\sum \left( I_A(x) - \mu_A \right) \left( I_B(T(x)) - \mu_B \right)}{\sqrt{\sum \left( I_A(x) - \mu_A \right)^2 \sum \left( I_B(T(x)) - \mu_B \right)^2}}$$

– μ_A is the average intensity in image A
– μ_B is the average intensity in image B
– assumes a linear relationship between image intensities
– useful if images have been acquired with different intensity windowing

Mono-modal vs Multi-modal

• Mono-modal image registration
– Image intensities are related by some simple function
  • identity: use SSD (e.g. CT to CT registration)
  • linear: use CC (e.g. MR T1 to MR T1 registration)

• Multi-modal image registration
– Image intensities are related by some unknown function or statistical relationship
– Relationship between intensities is not known a-priori
– Relationship between intensities can be viewed by inspecting a 2D histogram or co-occurrence matrix
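The mono-modal measures (SSD, SAD and CC) are short enough to write out directly. In this sketch both images are assumed to be already resampled onto the same grid, so the transformation T is implicit; the function names and the synthetic images are illustrative.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences (0 for identical images)."""
    return float(np.sum((a - b) ** 2))

def sad(a, b):
    """Sum of absolute differences (less sensitive to outliers than SSD)."""
    return float(np.sum(np.abs(a - b)))

def cross_correlation(a, b):
    """Normalised cross correlation; +1 for a positive linear intensity relationship."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.sum(a0 * b0) / np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)))

rng = np.random.default_rng(0)
image_a = rng.random((16, 16))
image_b = 2.0 * image_a + 3.0     # same structure, different intensity windowing
```

Here SSD and SAD report a large difference between `image_a` and `image_b` even though the images are perfectly aligned, while CC is unaffected by the linear intensity change.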

Multi-modal images: Measuring similarity

[Figure: CT and MR intensities of the same anatomy, with bone, air and soft tissue labelled]

Measuring similarity: 2D histograms

[Figure: 2D histograms p(a,b) for several image pairs (including MR/PET), each shown registered, misregistered by 2 mm and misregistered by 5 mm]

Images as probability distributions

• Frequency of corresponding intensity pairs can be interpreted in terms of probabilities:

$$p(a,b) = \frac{h(a,b)}{N}$$

is the joint probability of a voxel having greyvalue a in the first image and greyvalue b in the second image.

$$p(a) = \sum_{b} p(a,b)$$

is the marginal probability of a voxel in the first image having greyvalue a.

$$p(b) = \sum_{a} p(a,b)$$

is the marginal probability of a voxel in the second image having greyvalue b.
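A short sketch of how the joint and marginal probabilities might be estimated from two images on the same grid. The number of bins and the function name are assumptions; `np.histogram2d` does the counting.

```python
import numpy as np

def joint_probability(image_a, image_b, bins=32):
    """Estimate p(a, b) = h(a, b) / N from a 2D histogram of paired intensities."""
    h, _, _ = np.histogram2d(image_a.ravel(), image_b.ravel(), bins=bins)
    return h / h.sum()

rng = np.random.default_rng(0)
image_a = rng.random((32, 32))
image_b = rng.random((32, 32))
p_ab = joint_probability(image_a, image_b)
p_a = p_ab.sum(axis=1)    # marginal of the first image: sum over b
p_b = p_ab.sum(axis=0)    # marginal of the second image: sum over a
```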

Voxel similarity based on information theory

• Entropy (Shannon-Wiener)

$$H(A) = -\sum_{a} p(a) \log p(a)$$

describes the amount of information in image A.

• Joint Entropy

$$H(A,B) = -\sum_{a} \sum_{b} p(a,b) \log p(a,b)$$

describes the amount of information in the combined images A and B.

• Venn diagram:

[Figure: Venn diagram relating H(A), H(B), H(A,B) and I(A;B)]

• Mutual Information (Viola et al., 1995 and Collignon et al., 1995)

$$I(A;B) = H(A) + H(B) - H(A,B)$$

describes how well one image can be explained by another image.

• Mutual Information can be expressed in terms of marginal and joint probability distributions:

$$I(A;B) = \sum_{a} \sum_{b} p(a,b) \log \frac{p(a,b)}{p(a)\, p(b)}$$

• Mutual Information is still sensitive to the overlap of the two images

• Normalized Mutual Information (Studholme et al., 1999)

$$NMI(A;B) = \frac{H(A) + H(B)}{H(A,B)}$$

can be shown to be independent of the amount of overlap between images.
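Mutual information and its normalized variant can be computed from the joint histogram. This is a minimal sketch assuming both images share a grid and a fixed number of bins; the function names are illustrative, and practical implementations usually add interpolation and smoother histogram estimates.

```python
import numpy as np

def entropies(image_a, image_b, bins=32):
    """Return H(A), H(B), H(A,B) estimated from a joint histogram."""
    h, _, _ = np.histogram2d(image_a.ravel(), image_b.ravel(), bins=bins)
    p_ab = h / h.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                      # 0 log 0 is taken as 0
        return float(-np.sum(p * np.log(p)))

    return entropy(p_a), entropy(p_b), entropy(p_ab)

def mutual_information(image_a, image_b, bins=32):
    h_a, h_b, h_ab = entropies(image_a, image_b, bins)
    return h_a + h_b - h_ab

def normalized_mutual_information(image_a, image_b, bins=32):
    h_a, h_b, h_ab = entropies(image_a, image_b, bins)
    return (h_a + h_b) / h_ab

rng = np.random.default_rng(0)
image = rng.random((64, 64))
shuffled = rng.permutation(image.ravel()).reshape(image.shape)  # same intensities, no spatial relation
```

An image compared with itself gives H(A,B) = H(A), so NMI reaches its maximum of 2, whereas the shuffled copy shares almost no information with the original.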

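Image registration: Voxel-based similarity measures

Mono-modal:

• Sum of squared differences

$$SSD = \sum_{i} \left( I_A(x_i) - I_B(T(x_i)) \right)^2$$

• Cross-correlation

$$S = \frac{\sum \left( I_A(x) - \mu_A \right) \left( I_B(T(x)) - \mu_B \right)}{\sqrt{\sum \left( I_A(x) - \mu_A \right)^2 \sum \left( I_B(T(x)) - \mu_B \right)^2}}$$

Multi-modal:

• Mutual information (or variants)

$$S = \sum_{a} \sum_{b} p(a,b) \log \frac{p(a,b)}{p(a)\, p(b)}$$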

Optimization of voxel-similarity measures

• Optimization of voxel-similarity measures normally requires iterative (gradient-based) techniques, e.g.
– steepest gradient descent
– conjugate gradient descent
– more recently, discrete gradient-free optimization techniques have been proposed (e.g. Glocker et al., IPMI 2007)

• In general:
– Global optimization schemes are usually not feasible for image registration
– Local optimization schemes are much more efficient but can get trapped in local optima
– Registration has a limited capture range

See Numerical Recipes for a description of various optimization schemes

[Figure: similarity measure plotted against alignment, showing the global maximum, local maxima, and starting positions inside and outside the capture range]

Multi-scale optimization

• Capture range can be increased by using multi-scale techniques:

Multi-resolution optimization

• Registration can be accelerated by using multi-resolution techniques:

Rigid registration using Mutual Information

Thanks to Colin Studholme


Non-rigid registration

• Rigid registration is appropriate for
– brain (constrained by skull)
– bone (neck, vertebrae)

• Affine registration is appropriate
– if not all image acquisition parameters are known:
  • unknown voxel sizes
  • unknown gantry tilt
– if scale changes are expected:
  • growth
  • inter-subject registration

Limited applicability except as initialization for non-rigid registration

Non-rigid registration in neuroimaging

• Motion:
– Compensation for tissue deformation
  • Brain deformation during neurosurgery
  • Fusion of functional and anatomical information (MR/PET, etc.)

• Longitudinal or cross-sectional studies:
– Quantification of change over time
  • brain atrophy
  • brain growth
– Quantification of differences between populations
  • voxel-based morphometry (low-dimensional transformations)
  • deformation-based morphometry (high-dimensional transformations)

• Atlas-based studies
– Segmentation via atlas propagation

Representing deformations

• Transformation T: x ➞ x' which defines the spatial relationship between the two images:

x' = T(x) = x + u(x)

[Figure: point x mapped by T to x' through the displacement u]
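To make x' = x + u(x) concrete, the sketch below warps a 1D signal with a displacement field. It uses the common pull-back convention (the output at x is the input sampled at x + u(x)) with linear interpolation; the function name and that convention are assumptions made for illustration.

```python
import numpy as np

def warp_1d(signal, u):
    """Resample signal at T(x) = x + u(x) using linear interpolation."""
    x = np.arange(signal.size, dtype=float)
    # np.interp holds the border value for positions outside the signal
    return np.interp(x + u, x, signal)

signal = np.arange(10, dtype=float)
identity = warp_1d(signal, np.zeros(10))          # u = 0 leaves the signal unchanged
shifted = warp_1d(signal, np.full(10, 2.0))       # constant displacement of 2 samples
```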

Types of non-rigid transformations

• Parametric transformations (#DOFs << #voxels)
– affine transformations
– polynomial transformations
  • linear
  • quadratic
  • cubic
– spline-based transformations

• Non-parametric transformations (#DOFs = #voxels)
– optical flow type registration (e.g. Demons)
– elastic registration
– fluid or geodesic registration

Non-rigid transformations

• The affine transformation model can be extended to higher-order polynomials:
– 3rd order (60 DOF)
– 4th order (105 DOF)
– 5th order (168 DOF)

• Problems:
– can model only global shape changes, not local shape changes
– higher order polynomials introduce artifacts such as oscillations

• Used for registration
– Woods et al., JCAT 1998

• Use linear combinations of (orthogonal) basis functions θ_i instead of polynomials

• Basis functions may be
– trigonometric basis functions (used by SPM)
  • spectral representations of deformations
– wavelet basis functions

• Used for registration
– Amit et al., Journal of Am. Statis. Ass., 1991
– Ashburner et al., Human Brain Mapping, 1999

[Figure panels: Before deformation | After deformation | Displacement in the horizontal direction | Displacement in the vertical direction]

Registration transformation: Splines

• Engineering:
– Splines are long flexible strips of wood or metal which were deformed by attaching different weights along their length

• Mathematics:
– Splines are a tool used to approximate or interpolate functions from scattered data:
  • 1D: curves
  • 2D: surfaces
  • 3D: volumes

Registration transformation: Interpolating splines

• Basic idea:
– identify a set of points in image A
– identify the corresponding set of points in image B
– find a spline transformation which
  • interpolates the displacement field at the points
  • produces a smoothly varying displacement field between the points
– points are either
  • anatomical or geometrical landmarks
  • pseudo landmarks or control points

Registration transformation: Thin-plate splines

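• Thin-plate splines can be defined as a linear combination of radial basis functions θ:

$$t(x,y,z) = a_1 + a_2 x + a_3 y + a_4 z + \sum_{j=1}^{n} b_j \, \theta\left( | \phi_j - (x,y,z) | \right)$$

where φ_j denotes the location of the j-th control point.

• A transformation between two images can be defined by three separate thin-plate splines:

$$T(x,y,z) = \left( t_1(x,y,z), \; t_2(x,y,z), \; t_3(x,y,z) \right)$$

• Radial basis function:

$$\theta(s) = \begin{cases} |s|^2 \log(|s|^2) & \text{in 2D} \\ |s| & \text{in 3D} \end{cases}$$

• Interpolation conditions can be written in matrix form as

$$\begin{pmatrix} \Theta & \Phi \\ \Phi^T & 0 \end{pmatrix} \begin{pmatrix} b \\ a \end{pmatrix} = \begin{pmatrix} \Phi' \\ 0 \end{pmatrix}$$

– a is a 4 x 3 matrix containing the affine coefficients
– b is a n x 3 matrix containing the non-affine coefficients
– Θ is a n x n matrix with $\Theta_{ij} = \theta(|\phi_i - \phi_j|)$
– Φ is the n x 4 matrix of control point locations (with a leading column of ones) and Φ' the n x 3 matrix of their target locations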

• Used for registration:
– with points: Bookstein, IEEE PAMI, 1989 and Goshtasby, IEEE Geoscien. and Rem. Sensing, 1998
– with voxel similarity measures: Meyer et al., Medical Image Analysis, 1996

• Advantages:
– Control points can have arbitrary spatial distribution

• Disadvantages:
– Control points have global influence since the radial basis functions have infinite support
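A 2D sketch of landmark-based thin-plate spline interpolation: build the radial-basis matrix, solve the linear system for the affine and non-affine coefficients, and evaluate the spline. In 2D the affine block has 3 rows instead of 4. The function names and the five example landmarks are assumptions for illustration.

```python
import numpy as np

def tps_kernel(r):
    """2D thin-plate radial basis theta(s) = s^2 log(s^2), with theta(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask] ** 2)
    return out

def fit_tps(src, dst):
    """Solve the interpolation conditions for landmarks src -> dst, both (n, 2)."""
    n = len(src)
    theta = tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=2))
    phi = np.hstack([np.ones((n, 1)), src])        # n x 3 (affine part in 2D)
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = theta
    system[:n, n:] = phi
    system[n:, :n] = phi.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    solution = np.linalg.solve(system, rhs)
    return solution[:n], solution[n:]               # b: (n, 2), a: (3, 2)

def apply_tps(points, src, b, a):
    """Evaluate the spline at arbitrary (m, 2) points."""
    k = tps_kernel(np.linalg.norm(points[:, None] - src[None, :], axis=2))
    return a[0] + points @ a[1:] + k @ b

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
dst = src + np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1], [0.05, 0.05]])
b, a = fit_tps(src, dst)
warped_landmarks = apply_tps(src, src, b, a)
```

The spline reproduces the landmark correspondences exactly; for a purely affine landmark motion the non-affine coefficients b vanish.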

Registration transformation: B-splines

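• Free-Form Deformations (FFDs) are a common technique in Computer Graphics for modelling 3D deformable objects

• FFDs are defined by a mesh of control points with uniform spacing

• FFDs deform an underlying object by manipulating a mesh of control points
– control points can be displaced from their original location
– control points provide a parameterization of the transformation

• FFDs based on B-splines can be expressed as a 3D tensor product of 1D B-splines:

$$u(x, y, z) = \sum_{l=0}^{3} \sum_{m=0}^{3} \sum_{n=0}^{3} b_l(r)\, b_m(s)\, b_n(t)\, \phi_{i+l,\, j+m,\, k+n}$$

with

$$i = \left\lfloor \frac{x}{\delta_x} \right\rfloor - 1, \quad j = \left\lfloor \frac{y}{\delta_y} \right\rfloor - 1, \quad k = \left\lfloor \frac{z}{\delta_z} \right\rfloor - 1$$

$$r = \frac{x}{\delta_x} - \left\lfloor \frac{x}{\delta_x} \right\rfloor, \quad s = \frac{y}{\delta_y} - \left\lfloor \frac{y}{\delta_y} \right\rfloor, \quad t = \frac{z}{\delta_z} - \left\lfloor \frac{z}{\delta_z} \right\rfloor$$

where φ denotes the control point displacements and δ_x, δ_y, δ_z the control point spacing.

• b_l corresponds to the B-spline basis functions:

$$b_0(s) = (1 - s)^3 / 6$$
$$b_1(s) = (3 s^3 - 6 s^2 + 4) / 6$$
$$b_2(s) = (-3 s^3 + 3 s^2 + 3 s + 1) / 6$$
$$b_3(s) = s^3 / 6$$

$$B(\tau) = \begin{cases} 0 & |\tau| \ge 2 \\ b_3(\tau + 2) & -2 < \tau \le -1 \\ b_2(\tau + 1) & -1 < \tau \le 0 \\ b_1(\tau) & 0 < \tau \le 1 \\ b_0(\tau - 1) & 1 < \tau < 2 \end{cases}$$

[Figure: plot of B(τ) for τ between -2 and 2]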
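A 1D sketch of a cubic B-spline free-form deformation, following the indexing convention i = floor(x / spacing) - 1. The function names and the storage convention (control point i is stored at array index i + 1, so that i = -1 is valid) are assumptions for illustration.

```python
import numpy as np

def bspline_basis(s):
    """The four cubic B-spline basis functions b0..b3 evaluated at s in [0, 1)."""
    return np.array([(1 - s) ** 3 / 6,
                     (3 * s ** 3 - 6 * s ** 2 + 4) / 6,
                     (-3 * s ** 3 + 3 * s ** 2 + 3 * s + 1) / 6,
                     s ** 3 / 6])

def ffd_displacement_1d(x, phi, spacing):
    """Displacement u(x) = sum_l b_l(r) phi_{i+l} for control point displacements phi."""
    cell = int(np.floor(x / spacing))
    r = x / spacing - cell
    i = cell - 1
    weights = bspline_basis(r)
    # control point with index i + l is stored at phi[i + l + 1]
    return float(sum(weights[l] * phi[i + l + 1] for l in range(4)))

phi = np.zeros(10)
phi[5] = 1.0                                   # displace the control point located at x = 4
u_at_control_point = ffd_displacement_1d(4.0, phi, spacing=1.0)
u_far_away = ffd_displacement_1d(1.5, phi, spacing=1.0)
```

Moving one control point only changes the displacement within two control point spacings on either side, which is the local support that distinguishes B-spline FFDs from thin-plate splines.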

Registration transformation: FFDs

source

target

Registration transformation: FFDs

5 mm spacing 20 mm spacing

Displacement of 1 control point along 1 axis

Thanks to Marc Modat

Image registration: Adding regularization

• In the case of non-rigid registration, regularization is often required to obtain realistic deformations:

$$C = S + \lambda P$$

where S is the similarity term and P is the penalty term.

• Bayesian interpretation:
– similarity term can be viewed as likelihood
– penalty term can be viewed as a prior probability which expresses a-priori knowledge about expected deformations

Image registration: Adding regularization

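• Diffusion-like regularization

• Curvature-like regularization

$$P = \int \left( \frac{\partial^2 u}{\partial x^2} \right)^2 + \left( \frac{\partial^2 u}{\partial y^2} \right)^2 + \left( \frac{\partial^2 u}{\partial z^2} \right)^2 + 2 \left( \frac{\partial^2 u}{\partial x \partial y} \right)^2 + 2 \left( \frac{\partial^2 u}{\partial x \partial z} \right)^2 + 2 \left( \frac{\partial^2 u}{\partial y \partial z} \right)^2 dx$$

• Others like volume preservation (see later)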


Registration using optical flow

• Optical flow is a well-known computer vision technique
– Basic assumption:

$$I(p, t) = I(p + \Delta p, t + \Delta t)$$

i.e. image intensity is conserved

• Using a Taylor series expansion and ignoring higher order terms, the OF equation becomes:

$$\nabla I \cdot \Delta p + \frac{\partial I}{\partial t} \Delta t = 0$$

Registration using optical flow

• Or

$$\nabla I \cdot u = I - J$$

where I − J is the temporal difference between the images (I at time t, J at time t + Δt), ∇I is the spatial gradient of the images and u = Δp is the displacement

• To estimate the displacement field we need additional constraints (e.g. smoothness)

Variants of optical flow: Demons

• Start with optical flow equation:

$$\nabla I \cdot u = I - J$$

• Using linearization one can derive:

$$u = \frac{(I - J)\, \nabla J}{| \nabla J |^2}$$

• To avoid instabilities this can be changed to:

$$u = \frac{(I - J)\, \nabla J}{| \nabla J |^2 + (I - J)^2}$$

Variants of optical flow: Demons

Image matching as a diffusion process: an analogy with Maxwell's demons. Thirion, J. P. Med Image Anal, 1998, 2, 243-260
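One stabilised demons update for 2D images can be sketched as below. The function name is an assumption, and a full demons registration would additionally smooth the field (e.g. with a Gaussian) and iterate, which is omitted here.

```python
import numpy as np

def demons_update(I, J):
    """u = (I - J) grad(J) / (|grad(J)|^2 + (I - J)^2), set to 0 where the denominator is 0."""
    grad_y, grad_x = np.gradient(J)            # derivatives along axis 0 (y) and axis 1 (x)
    diff = I - J
    denom = grad_x ** 2 + grad_y ** 2 + diff ** 2
    safe = np.where(denom > 0, denom, 1.0)     # avoid division by zero
    scale = np.where(denom > 0, diff / safe, 0.0)
    return np.stack([scale * grad_y, scale * grad_x])   # shape (2, H, W): (u_y, u_x)

rng = np.random.default_rng(0)
I = rng.random((20, 20))
J = rng.random((20, 20))
u = demons_update(I, J)
```

The extra (I − J)² term in the denominator bounds the update: its magnitude never exceeds half a voxel, which is what keeps the iteration stable where the gradient is small.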

Elastic registration

• Deformation is modelled as a physical process which resembles the stretching of an elastic material (i.e. rubber)

• Deformation is governed by two forces
– internal force: caused by deformation of the elastic body, i.e. stress. The internal force is zero if the body is in equilibrium.
– external force: acts on the elastic body and causes the body to deform away from the equilibrium.

• External force can be based on
– distance between corresponding geometric features (points, lines or surfaces)
– voxel-similarity measures

• Deformation is described by the Navier linear elastic partial differential equation (PDE):

$$\mu \nabla^2 u(x, y, z) + (\lambda + \mu) \nabla (\nabla \cdot u(x, y, z)) + f(x, y, z) = 0$$

– u: displacement field
– f: external force
– ∇: gradient operator
– ∇²: Laplace operator
– λ, μ: Lamé's elasticity constants

• Elasticity constants can be interpreted in terms of
– Young's modulus $E = \mu (3\lambda + 2\mu) / (\lambda + \mu)$, which relates the stress and strain of a body
– Poisson's ratio $\nu = \lambda / (2 (\lambda + \mu))$, which is the ratio between lateral shrinking and longitudinal stretching

• PDE can be solved using either
– finite differences and successive overrelaxation (SOR)
  • produces a dense displacement field at every voxel
– finite element methods (FEM)
  • produces a sparse displacement field at the FEM nodes
  • displacements at every voxel can be obtained by FEM interpolation

• Problems:
– cannot cope with large deformations
– cannot cope with changes in topology

• Extensions of elastic registration
– spatially varying elasticity parameters (Davatzikos, 1997)

– spline-based transformations

• Non-parametric transformations
– optical flow type registration (e.g. Demons)
– elastic registration
– fluid or geodesic registration

[Slide annotations: small deformation models | large deformation models]

Fluid registration

• Christensen 1996, Bro-Nielsen 1996

• Advantages:
– can cope with large deformations
– preserves topology

• Similar to elastic registration, but uses the Navier-Stokes partial differential equation:

$$\mu \nabla^2 v(x, y, z) + (\lambda + \mu) \nabla (\nabla \cdot v(x, y, z)) + f(x, y, z) = 0$$

where v is the velocity field

Fluid registration

$$x' = g(x, 1) = x + \int_{t=0}^{1} v(g(x, t))\, dt$$

Fluid registration

Geodesic registration

• Traditional fluid registration techniques
+ can deal with large deformations
+ are diffeomorphic
– approximate a greedy solution, not necessarily the shortest path (geodesic)

• Geodesic registration techniques
+ determine the shortest path (geodesic)
+ resulting deformation can be used as a metric

• Several competing approaches:
– F. Beg et al. Computing Large Deformation Metric Mappings via Geodesic Flows of Diffeomorphisms. IJCV 2005.
– B. Avants et al. Symmetric diffeomorphic image registration with cross-correlation. Medical Image Analysis 2008.

Thanks to S. Joshi

LDDMM registration

• In fluid or geodesic registration, the transformations are constructed from the group of diffeomorphisms of the underlying coordinate system

Diffeomorphisms: one-to-one, onto (invertible) and differentiable transformations. They preserve topology.

Thanks to S. Joshi


Alternatives to LDDMM registration

• Use stationary instead of time-varying velocity fields:
– J. Ashburner, A fast diffeomorphic image registration algorithm, NeuroImage 2007
– T. Vercauteren et al., Diffeomorphic demons: Efficient non-parametric image registration, NeuroImage 2009

• Integration of velocity fields can be solved very efficiently using scaling and squaring. Direct integration with step 1/8 composes the small deformation seven times:

$$g(x, \tfrac{1}{8}) = x + \tfrac{1}{8} v(x)$$
$$g(x, \tfrac{2}{8}) = g(x, \tfrac{1}{8}) \circ g(x, \tfrac{1}{8})$$
$$g(x, \tfrac{3}{8}) = g(x, \tfrac{1}{8}) \circ g(x, \tfrac{2}{8})$$
$$\dots$$
$$g(x, \tfrac{8}{8}) = g(x, \tfrac{1}{8}) \circ g(x, \tfrac{7}{8})$$

With scaling and squaring only three compositions are needed:

$$g(x, \tfrac{1}{8}) = x + \tfrac{1}{8} v(x)$$
$$g(x, \tfrac{1}{4}) = g(x, \tfrac{1}{8}) \circ g(x, \tfrac{1}{8})$$
$$g(x, \tfrac{1}{2}) = g(x, \tfrac{1}{4}) \circ g(x, \tfrac{1}{4})$$
$$g(x, 1) = g(x, \tfrac{1}{2}) \circ g(x, \tfrac{1}{2})$$
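The scaling-and-squaring idea can be tried out in 1D: scale the stationary velocity field down by 2^n, then compose the resulting small deformation with itself n times. The function names, the 1D setting and the linear interpolation used for composition are assumptions for illustration.

```python
import numpy as np

def compose(g_outer, g_inner, grid):
    """(g_outer o g_inner)(x) = g_outer(g_inner(x)) for maps sampled on grid."""
    return np.interp(g_inner, grid, g_outer)

def integrate_velocity(v, grid, n_squarings=3):
    """Integrate a stationary velocity field over unit time by scaling and squaring."""
    g = grid + v / 2 ** n_squarings        # scaling: g(x, 1/2^n) = x + v(x) / 2^n
    for _ in range(n_squarings):
        g = compose(g, g, grid)            # squaring: doubles the integration time
    return g

grid = np.linspace(0.0, 10.0, 101)
v = 2.0 * np.sin(np.pi * grid / 10.0)      # velocity vanishes at both ends of the domain
g = integrate_velocity(v, grid)
```

Each small step is monotonically increasing, so the composed map is too; in 1D that is exactly the one-to-one property of a diffeomorphism.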

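Properties of non-rigid transformations

• Determinant of the Jacobian operator, with T(x) = x + u(x) = (T_x, T_y, T_z):

$$Jac(u) = \det \begin{pmatrix} \frac{\partial T_x}{\partial x} & \frac{\partial T_x}{\partial y} & \frac{\partial T_x}{\partial z} \\ \frac{\partial T_y}{\partial x} & \frac{\partial T_y}{\partial y} & \frac{\partial T_y}{\partial z} \\ \frac{\partial T_z}{\partial x} & \frac{\partial T_z}{\partial y} & \frac{\partial T_z}{\partial z} \end{pmatrix}$$

Jac(u)            Properties
Jac(u) = 1        volume preservation
0 < Jac(u) < 1    local contraction
Jac(u) > 1        local expansion
Jac(u) = ∞        tearing
Jac(u) < 0        folding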
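For a 2D displacement field sampled on a regular grid, the Jacobian determinant of T(x) = x + u(x) can be approximated with finite differences. The function name, the [y, x] array layout and the example fields are assumptions for illustration.

```python
import numpy as np

def jacobian_determinant_2d(ux, uy, spacing=1.0):
    """det of the Jacobian of T(x) = x + u(x); ux, uy are arrays indexed [y, x]."""
    dux_dy, dux_dx = np.gradient(ux, spacing)
    duy_dy, duy_dx = np.gradient(uy, spacing)
    return (1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx

Y, X = np.mgrid[0:8, 0:8].astype(float)
jac_identity = jacobian_determinant_2d(np.zeros_like(X), np.zeros_like(Y))
jac_expansion = jacobian_determinant_2d(0.1 * X, 0.1 * Y)          # uniform 10% scaling
jac_folding = jacobian_determinant_2d(-2.0 * X, np.zeros_like(Y))  # x axis is flipped
```

The three fields land in three rows of the table: 1 for the identity, 1.21 (local expansion) for the scaling, and a negative value (folding) for the flip.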

Image registration: Adding regularization II

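• Local volume preservation (Rohlfing, TMI 2003), where J(x) is the Jacobian determinant:

$$P = \int \left| \log(J(x)) \right| dx$$

• Local rigidity (Loeckx, MICCAI 2004), where J(x) is the Jacobian matrix and 1 the identity matrix:

$$P = \int \left\| J(x) J(x)^T - 1 \right\| dx$$

• Others are possible, e.g. for topology preservation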

Freely available implementations

Non-parametric implementations:
• SyN (ANTS, based on ITK): symmetric normalisation, non-stationary velocity field
• Demons (part of ITK): Gaussian smoothing regularisation, various implementations
• Dartel (part of SPM): stationary velocity field, squaring approach

Parametric implementations:
• Fnirt (part of FSL)
• F3D (part of NiftyReg)
• Elastix (based on ITK)

[Slide annotations for the parametric implementations: original free-form deformation implementation | free-form deformation with multiple extra parameters | free-form deformation with a focus on computation time | free-form deformation using multiple filters]

Atlas-based segmentation

• Most of the current state-of-the-art segmentation techniques use atlases (or models) as a form of a-priori information

• Weak priors
– EM
– Graph cuts

• Strong priors
– Multi-atlas
– Patch-based segmentation

What is an atlas?

• A map or chart of the anatomy

• Atlases usually have
– geometric information about points, curves or surfaces, or
– label information about voxels (anatomical regions or function)

• Atlases are usually constructed from
– single subjects
– populations of subjects, e.g. by averaging to produce probabilistic or statistical atlases

• Atlases vs templates:
– atlases have labels or other information in the form of annotations
– templates have appearance or intensity information

What is an atlas?

Labelled atlas: One label (e.g. an integer value) per voxel indicating the anatomical structure or tissue present at that voxel

What is an atlas?

Anatomical template

Probabilistic atlas: One value (between 0 and 1) per voxel per anatomical structure or tissue representing

CSF probability

GM probability

WM probability

EM Segmentation

• Widely used in medical image segmentation, e.g.
– Wells et al. 1996, van Leemput et al. 1999

[Figure: example class probabilities (0.95, 0.05) for WM, GM and CSF]

EM Segmentation: Step by step
(Notation in the style of Van Leemput, TMI 1999)

Intensity model:

$$G_{\sigma}(x) = \frac{1}{\sqrt{2 \pi \sigma^2}}\, e^{-\frac{x^2}{2 \sigma^2}}, \qquad f(y_i \mid z_i = e_k, \Phi_y) = G_{\sigma_k}(y_i - \mu_k)$$

E step: calculate p and log L, for all y and for all classes:

$$p_{ik}^{(m+1)} = \frac{f(y_i \mid z_i = e_k, \Phi_y^{(m)})\, f(z_i = e_k)}{\sum_{j=1}^{K} f(y_i \mid z_i = e_j, \Phi_y^{(m)})\, f(z_i = e_j)}$$

$$\log L(\Phi) = \sum_{i=1}^{n} \log \left( \sum_{k=1}^{K} f(y_i \mid z_i = e_k, \Phi_y)\, f(z_i = e_k) \right)$$

M step: update parameters:

$$\mu_k^{(m+1)} = \frac{\sum_{i=1}^{n} p_{ik}^{(m+1)}\, y_i}{\sum_{i=1}^{n} p_{ik}^{(m+1)}}$$

$$\left( \sigma_k^{(m+1)} \right)^2 = \frac{\sum_{i=1}^{n} p_{ik}^{(m+1)} \left( y_i - \mu_k^{(m+1)} \right)^2}{\sum_{i=1}^{n} p_{ik}^{(m+1)}}$$

Compare current log L with old log L. Converged? If not, return to the E step; if so, finish.

[Figure panels: Full brain, GM, zoomed | 9% noise, no MRF | 9% noise, with MRF]

Expectation-Maximisation Segmentation: Step by step (with MRF)

The E and M steps are as before, but the class prior f(z_i = e_k) becomes an MRF prior that depends on the neighbourhood N_i, with β controlling the strength of the MRF:

$$f(z_i = e_k \mid p_{N_i}^{(m)}, \Phi_z^{(m)}) = \frac{e^{-\beta U_{MRF}(e_k \mid p_{N_i}^{(m)}, \Phi_z^{(m)})}}{\sum_{j=1}^{K} e^{-\beta U_{MRF}(e_j \mid p_{N_i}^{(m)}, \Phi_z^{(m)})}}$$

EM Segmentation: Adding priors

With a probabilistic atlas, the atlas prior π_ik for class k at voxel i weights the MRF term:

$$f(z_i = e_k \mid p_{N_i}^{(m)}, \Phi_z^{(m)}) = \frac{\pi_{ik}\, e^{-\beta U_{MRF}(e_k \mid p_{N_i}^{(m)}, \Phi_z^{(m)})}}{\sum_{j=1}^{K} \pi_{ij}\, e^{-\beta U_{MRF}(e_j \mid p_{N_i}^{(m)}, \Phi_z^{(m)})}}$$
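The basic E and M steps (without the MRF term) can be sketched for 1D intensities as below. The function names, the fixed class prior (per class or per voxel, as with an atlas) and the synthetic two-class data are assumptions for illustration; this is not Van Leemput's implementation.

```python
import numpy as np

def gaussian(x, sigma):
    """G_sigma(x): zero-mean Gaussian density."""
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def em_segmentation(y, mu, sigma, prior, n_iter=100, tol=1e-6):
    """EM for a Gaussian mixture. y: (n,), mu and sigma: (K,), prior: (K,) or (n, K)."""
    mu = np.asarray(mu, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float).copy()
    old_log_l = -np.inf
    for _ in range(n_iter):
        # E step: posterior p_ik and log-likelihood under the current parameters
        joint = gaussian(y[:, None] - mu[None, :], sigma[None, :]) * prior
        log_l = np.sum(np.log(joint.sum(axis=1)))
        p = joint / joint.sum(axis=1, keepdims=True)
        # M step: update means and standard deviations
        weight = p.sum(axis=0)
        mu = (p * y[:, None]).sum(axis=0) / weight
        sigma = np.sqrt((p * (y[:, None] - mu) ** 2).sum(axis=0) / weight)
        # Converged? Compare current log L with old log L
        if log_l - old_log_l < tol:
            break
        old_log_l = log_l
    return p, mu, sigma

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(10.0, 1.0, 500)])
p, mu, sigma = em_segmentation(y, mu=[2.0, 8.0], sigma=[3.0, 3.0],
                               prior=np.array([0.5, 0.5]))
labels = p.argmax(axis=1)      # hard segmentation from the soft posteriors
```

Passing an (n, K) array as `prior` plays the role of the atlas prior π_ik; an MRF term would additionally make the prior depend on the current posteriors of neighbouring voxels.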

Image segmentation using graph cuts

• The problem of segmenting an image can be formulated in terms of a Markov random field (MRF)

• The MRF energy function consists of a data term D and a smoothness term V:

$$E(f) = \sum_{p} D_p(f_p) + \sum_{\{p, q\} \in \mathcal{N}} V_{p,q}(f_p, f_q)$$

where f_p is the label of voxel p and N is the set of neighbouring voxel pairs.

• Data term: unary function based on image information at the voxel
– Prior information based on spatial information from atlas
– Intensity model based on a Gaussian mixture model

• Widely used in medical image segmentation, e.g. Song et al. 2006, van der Lijn et al. 2009, Wolz et al. 2010

• MRF-based energy function minimization is solved as a min-cut or max-flow problem:
– Image voxels represented as nodes in a graph
– Source and sink nodes represent foreground and background

• Edge weights
– Voxel to source/sink: defined by the data term
– Voxel to voxel: defined by the smoothness term

• Seek a 'cut' that separates the voxels in an optimal way

Greig et al. 1989, Boykov et al. 2001

Segmentation using registration

[Figure: an atlas/model with its segmentation is registered (or matched) to a new image, and the segmentation is propagated to the new image]

Segmentation using registration

[Figure panels: Manual segmentation | Automatic segmentation]

Measuring agreement between segmentations

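More than one way to measure overlap, e.g.:

$$\mathrm{Tanimoto} := \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{Dice} := \frac{2\, |A \cap B|}{|A| + |B|}$$

For these, 1 ⇒ perfect agreement, 0 ⇒ perfect disagreement (no overlap).
And they can be related, e.g.

$$\mathrm{Dice} = \frac{2\, \mathrm{Tanimoto}}{1 + \mathrm{Tanimoto}}$$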
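Both overlap measures are one-liners on binary masks. The function names and the toy masks are illustrative, and the sketch assumes at least one of the two masks is non-empty (otherwise the ratios are undefined).

```python
import numpy as np

def tanimoto(a, b):
    """|A intersect B| / |A union B| for binary masks (also known as the Jaccard index)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

def dice(a, b):
    """2 |A intersect B| / (|A| + |B|) for binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

seg_a = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
seg_b = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
t = tanimoto(seg_a, seg_b)     # 2 shared voxels out of 4 in the union
d = dice(seg_a, seg_b)         # 2 * 2 shared voxels over 3 + 3
```

Because the two measures are monotonically related, they rank segmentations identically; Dice is always at least as large as Tanimoto.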

Measuring agreement between segmentations

Can make comparisons between different types of segmentation:

Dice(human expert A, human expert B) ⇒ Inter-rater agreement
Dice(human expert A, human expert A at a later time) ⇒ Intra-rater agreement
Dice(human expert, automatic method) ⇒ Accuracy (if we treat the human expert as a gold standard)

Multiple atlases

• T1-weighted MR images from 30 volunteers (age range 20-54 years, median age 30.5 years, 15 male, 15 female)

• 256 x 256 x 124 volumes (resolution 1.25 × 0.94 × 1.50 mm)

• Each image is manually segmented in 83 anatomical structures

A. Hammers et al. Three-dimensional maximum probability atlas of the human brain, with particular reference to the temporal lobe. Human Brain Mapping, 19(4):224-247, 2003.

Multi-atlas segmentation using classifier fusion

[Figure: each atlas is registered to the unseen data and its segmentation is propagated; decision fusion of the propagated segmentations gives the final segmentation]

Heckemann et al., NeuroImage 2006

How do you fuse?

• Global fusion strategies:
– Majority voting
– Weighted voting

• Local fusion strategies:
– Locally weighted fusion
– Simultaneous Truth And Performance Level Estimation (STAPLE)

• Shape Based Averaging

Combining multiple segmentations: Simple majority voting

[Figure: labels proposed for one voxel. Seg #1: Hippocampus | Seg #2: Background | Seg #3: Background | Seg #4: Hippocampus | Seg #5: Background]

Heckemann et al., NeuroImage 2006

Combining multiple segmentations: VOTE

• Disadvantage of majority voting

[Figure panels: Seg1 | Seg2 | Seg3 | Majority voting | Desirable output]
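Voxel-wise majority voting over propagated label maps can be sketched as follows. The function name is an assumption, and ties are resolved in favour of the lowest label value, which is one arbitrary choice among several.

```python
import numpy as np

def majority_vote(segmentations):
    """Fuse integer label maps of identical shape, stacked along axis 0."""
    segs = np.asarray(segmentations)
    labels = np.unique(segs)
    # count, for every voxel, how many segmentations propose each label
    votes = np.stack([(segs == label).sum(axis=0) for label in labels])
    return labels[votes.argmax(axis=0)]

# One voxel, five atlases: 1 = hippocampus, 0 = background.
single_voxel = majority_vote(np.array([[1], [0], [0], [1], [0]]))

# Three small label maps fused voxel by voxel.
fused = majority_vote(np.array([[0, 1, 1, 0],
                                [0, 1, 0, 0],
                                [1, 1, 1, 0]]))
```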

Combining multiple segmentations: Shape-based averaging (SBA)

• Average the shapes of the segmentations
– Based on averaging the distance transforms of the segmentations

[Figure panels: VOTE | SBA]

Combining multiple segmentations: STAPLE

Probabilistic approach

What is the most probable underlying true segmentation, given the observed segmentations?

At the same time, it tells you the performance (sensitivity and specificity) of each segmentation.

Sensitivity p: true positive fraction
Specificity q: true negative fraction

• Can we find the performance of each segmentation (p, q) and the ground truth (T) that maximises the likelihood of observing the segmentations (D)?

D (data/expert segmentations we want to merge): known
T (the ground truth): unknown
p, q (performance of each segmentation): unknown

Solved iteratively by Expectation-Maximization (EM) (Warfield et al., 2004)

p(0), q(0): initial estimate
E-step: estimate the ground truth f(T | D, p(0), q(0)) from the current performance estimates
M-step: use the current ground truth to estimate the next p(1), q(1)
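A minimal sketch of binary STAPLE in the spirit of Warfield et al. (2004): the E-step computes the posterior probability that each voxel is foreground, and the M-step re-estimates each rater's sensitivity and specificity. The function name, the initial values of 0.9 and the global foreground prior taken from the data are assumptions for illustration.

```python
import numpy as np

def staple(D, n_iter=50, prior=None):
    """D: (n_voxels, n_raters) binary decisions. Returns W = P(T = 1 | D), p, q."""
    D = np.asarray(D, dtype=float)
    n_raters = D.shape[1]
    p = np.full(n_raters, 0.9)                 # initial sensitivities p(0)
    q = np.full(n_raters, 0.9)                 # initial specificities q(0)
    g = D.mean() if prior is None else prior   # assumed global foreground prior
    for _ in range(n_iter):
        # E-step: posterior of the hidden true segmentation
        a = g * np.prod(np.where(D == 1, p, 1 - p), axis=1)
        b = (1 - g) * np.prod(np.where(D == 1, 1 - q, q), axis=1)
        W = a / (a + b)
        # M-step: performance of each rater given the current truth estimate
        p = (W[:, None] * D).sum(axis=0) / W.sum()
        q = ((1 - W)[:, None] * (1 - D)).sum(axis=0) / (1 - W).sum()
    return W, p, q

# Synthetic example: three raters with error rates of 5%, 10% and 30%.
rng = np.random.default_rng(0)
truth = rng.random(2000) < 0.3
error_rates = [0.05, 0.10, 0.30]
D = np.stack([np.where(rng.random(2000) < e, ~truth, truth) for e in error_rates], axis=1)
W, p, q = staple(D)
estimate = W > 0.5
```

Unlike majority voting, the fused result weights each rater by its estimated performance, so an unreliable rater has less influence on the final segmentation.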

Multi-atlas segmentation using classifier fusion and selection

[Figure: pipeline. Atlases and unseen data are affinely registered to a standard space; atlases similar to the unseen data are selected; the selected atlases are non-rigidly registered to the unseen data to give individual segmentations; decision fusion produces the final segmentation]

Aljabar et al., NeuroImage 2009

[Figure panels: naive classifier fusion | classifier selection & fusion]

Patch-based segmentation

• Patch-based methods have been popular in the computer vision community, e.g.
– texture synthesis (Efros and Freeman, 2001)
– in-painting (Criminisi et al., 2004)
– restoration (Buades et al., 2005)
– single-frame super resolution (Protter et al., 2009)

• Basic idea:
– only coarse registration required (e.g. rigid or affine registration)
– use the same concept as label fusion

Patch-based denoising

• Buades et al. 2005
– uses a neighborhood averaging strategy called non-local means (NLM)
– assumes that every patch in a natural image has many similar patches in the same image (self-similarity)

[Figure panels: Denoised image | Original image]

$$\hat{u}(x_i) = \sum_{j} w(x_i, x_j)\, u(x_j), \qquad w(x_i, x_j) = \frac{1}{Z_i}\, f\!\left( \frac{\| P(x_i) - P(x_j) \|_2^2}{h^2} \right)$$

– f: weighting function, usually exp(-x)
– h: bandwidth for smoothing
– P(x): patch centred at x, Z_i: normalisation so that the weights sum to 1
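A 1D sketch of non-local means: every sample is replaced by a weighted average of samples whose surrounding patches look similar, with exp(-x) as the weighting function and h as the bandwidth. The function name, the patch and search sizes and the test signal are assumptions for illustration.

```python
import numpy as np

def nlm_denoise_1d(signal, patch_radius=2, search_radius=10, h=0.3):
    """Non-local means on a 1D signal with weights exp(-mean squared patch difference / h^2)."""
    n = signal.size
    padded = np.pad(signal, patch_radius, mode="reflect")
    # one patch of length 2 * patch_radius + 1 centred on every sample
    patches = np.stack([padded[i:i + 2 * patch_radius + 1] for i in range(n)])
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search_radius), min(n, i + search_radius + 1)
        d2 = ((patches[lo:hi] - patches[i]) ** 2).mean(axis=1)
        w = np.exp(-d2 / h ** 2)               # similar patches get weights close to 1
        out[i] = np.sum(w * signal[lo:hi]) / np.sum(w)
    return out

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])   # a step edge
noisy = clean + 0.1 * rng.normal(size=clean.size)
denoised = nlm_denoise_1d(noisy)
```

Patch-based label fusion reuses the same weights, but averages the labels of similar atlas patches instead of intensities.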

Patch-based label fusion

• Apply the same idea as in denoising to label fusion:– Simultaneously proposed by Rousseau et al. and Coupe et al. in 2011

Rousseau et al. 2011

Coupe et al. 2011

Discriminative Dictionary Learning for Segmentation (DDLS)

atic segmentations

Patch-based multi-organ segmentation of abdominal CT

• Segmentation of CT scans from 150 subjects

• Hierarchical, patch-based label fusion process

Structure   Dice                      Jaccard
Kidneys     92.5 (7.2) [51.5 98.2]    86.8 (10.5) [34.6 96.4]
Liver       94.0 (2.8) [81.4 97.3]    88.9 (4.8) [68.7 94.9]