
UNIVERSITÉ NICE SOPHIA ANTIPOLIS

ÉCOLE DOCTORALE STIC
SCIENCES ET TECHNOLOGIES DE L’INFORMATION ET DE LA COMMUNICATION

THÈSE DOCTORALE
pour obtenir le titre de
Docteur en Sciences
de l’Université Nice Sophia Antipolis

Discipline : AUTOMATIQUE, TRAITEMENT DU SIGNAL ET DES IMAGES

Soutenue par

Loïc LE FOLGOC

Apprentissage Statistique pour la Personnalisation de Modèles Cardiaques à partir de Données d’Imagerie

Superviseur de thèse : Hervé DELINGETTE

Co-Superviseur : Nicholas AYACHE

préparée à Inria Sophia Antipolis, équipe ASCLEPIOS

soutenue le 27 novembre 2015

Jury :

Rapporteurs : Nikos PARAGIOS - École Centrale Paris (CVC)
Bertrand THIRION - Inria (équipe PARIETAL)

Examinateurs : Antonio CRIMINISI - Microsoft Research Cambridge (MLP)
Sébastien OURSELIN - University College London (TIG)

Superviseur : Hervé DELINGETTE - Inria (équipe ASCLEPIOS)
Co-Superviseur : Nicholas AYACHE - Inria (équipe ASCLEPIOS)


UNIVERSITY OF NICE - SOPHIA ANTIPOLIS

DOCTORAL SCHOOL STIC
SCIENCES ET TECHNOLOGIES DE L’INFORMATION ET DE LA COMMUNICATION

PHD THESIS
to obtain the title of
PhD of Science
of the University of Nice Sophia Antipolis

Specialty : AUTOMATION, SIGNAL AND IMAGE PROCESSING

Defended by

Loïc LE FOLGOC

Statistical Learning for Image-based Personalization of Cardiac Models

Thesis Advisor: Hervé DELINGETTE

Thesis Co-Advisor: Nicholas AYACHE

prepared at INRIA Sophia Antipolis, ASCLEPIOS Team
defended on November 27, 2015

Jury :

Reviewers: Nikos PARAGIOS - École Centrale Paris (CVC)
Bertrand THIRION - Inria (PARIETAL Research Team)

Examiners: Antonio CRIMINISI - Microsoft Research Cambridge (MLP)
Sébastien OURSELIN - University College London (TIG)

Advisor: Hervé DELINGETTE - Inria (ASCLEPIOS Research Team)
Co-Advisor: Nicholas AYACHE - Inria (ASCLEPIOS Research Team)


This thesis was prepared at Inria, ASCLEPIOS Research Project, Sophia Antipolis, France. It was partly funded by the Microsoft Research – Inria Joint Centre and by the European Research Council through the ERC Advanced Grant MedYMA (2011-291080) on Biophysical Modeling and Analysis of Dynamic Medical Images.


Apprentissage Statistique pour la Personnalisation de Modèles Cardiaques à partir de Données d’Imagerie

Abstract: Cette thèse porte sur un problème de calibration d’un modèle électromécanique de cœur, personnalisé à partir de données d’imagerie médicale 3D+t ; et sur celui — en amont — de suivi du mouvement cardiaque. Les perspectives à long terme de la simulation personnalisée de la fonction cardiaque incluent l’aide au diagnostic et à la planification de thérapie, ainsi que la prévention des risques cardiovasculaires.

A cette fin, nous adoptons une méthodologie fondée sur l’apprentissage statistique. Pour la calibration du modèle mécanique, nous introduisons une méthode efficace mêlant apprentissage automatique et une description statistique originale du mouvement cardiaque utilisant la représentation des courants 3D+t. Notre approche repose sur la construction d’un modèle statistique réduit reliant l’espace des paramètres mécaniques à celui du mouvement cardiaque.

L’extraction du mouvement à partir d’images médicales avec quantification d’incertitude apparaît essentielle pour cette calibration, et constitue l’objet de la seconde partie de cette thèse. Plus généralement, nous développons un modèle bayésien parcimonieux pour le problème de recalage d’images médicales. Notre contribution est triple et porte sur un modèle étendu de similarité entre images, sur l’ajustement automatique des paramètres du recalage et sur la quantification de l’incertitude. Nous proposons une technique rapide d’inférence gloutonne, applicable à des données cliniques 4D.

Enfin, nous nous intéressons de plus près à la qualité des estimations d’incertitude fournies par le modèle. Nous comparons les prédictions du schéma d’inférence gloutonne avec celles données par une procédure d’inférence fidèle au modèle, que nous développons sur la base de techniques MCMC. Nous approfondissons les propriétés théoriques et empiriques du modèle bayésien parcimonieux et des deux schémas d’inférence.

Keywords: Problème inverse, suivi de mouvement cardiaque, recalage non-rigide, modélisation bayésienne parcimonieuse structurée, détermination automatique, méthodes de Monte-Carlo par chaînes de Markov


Statistical Learning for Image-Based Personalization of Cardiac Models

Abstract: This thesis focuses on the calibration of an electromechanical model of the heart from patient-specific, image-based data, and on the related task of extracting the cardiac motion from 4D images. Long-term perspectives for personalized computer simulation of the cardiac function include aid to diagnosis, aid to therapy planning and prevention of risks.

To this end, we explore the tools and possibilities offered by statistical learning. To personalize cardiac mechanics, we introduce an efficient framework coupling machine learning and an original statistical representation of shape & motion based on 3D+t currents. The method relies on a reduced mapping between the space of mechanical parameters and the space of cardiac motion.

The second focus of the thesis is on cardiac motion tracking, a key processing step in the calibration pipeline, with an emphasis on the quantification of uncertainty. We develop a generic sparse Bayesian model of image registration with three main contributions: an extended image similarity term, the automated tuning of registration parameters and uncertainty quantification. We propose approximate inference schemes that are tractable on 4D clinical data.

Finally, we wish to evaluate the quality of the uncertainty estimates returned by the approximate inference scheme. We compare the predictions of the approximate scheme with those of an inference scheme developed on the grounds of reversible jump MCMC. We provide more insight into the theoretical properties of the structured sparse Bayesian model and into the empirical behaviour of both inference schemes.

Keywords: Inverse Problem, Cardiac Motion Tracking, Non-rigid Registration, Structured Sparse Bayesian Learning, Automatic Relevance Determination, Markov Chain Monte Carlo


Acknowledgments

I would like to start by thanking my advisor Hervé Delingette, for your scientific oversight, your guidance and your availability throughout my thesis. I am grateful for all the ideas that you gave me to investigate and for all that I got to learn during my stay. I was both given the chance to keep pushing my ideas until I was happy with them, and given the (occasional regains of) motivation and safeguards to carry them through. The advice and directions that you provided during many helpful discussions shaped this work into better form. To Nicholas Ayache, thank you for welcoming me into the team, for providing such a great work environment and for what I learned from you. I truly enjoyed my stay at Asclepios. To Antonio Criminisi, many thanks for your availability and for your suggestions that often brought a fresh perspective on my research. It was always a pleasure to visit you at Cambridge.

Secondly, I want to express my sincere thanks to my reviewers, Bertrand Thirion and Nikos Paragios. I am grateful for the time you spent reviewing this manuscript, for your sharp comments and your encouragements. I would like to extend my thanks to Sébastien Ourselin for accepting to be a member of the jury and for attending my defense. I am very glad to have all of you in my jury.

It is high time I thanked all of my co-workers for the friendly atmosphere in the team! Special thanks to the people with whom I shared an office: to Ján Margeta and Stéphanie Marchesseau for your warm welcome when I arrived in the lab, for your support and for (very) patiently answering many of my random questions. To Hervé Lombaert for the helpful ideas, and for the friendly discussions at work or around a drink. A shout-out to Hakim Fadil and Loïc Cadour for taking all the not-so-fun engineer jokes with a smile and always helping regardless. Dear Asclepios dinosaurs and youngsters alike, I thank you for the fond memories shared during my stay: whether hiking, climbing, gaming, kayaking, attending a friend's wedding, horse-riding or goofing around, you made it a fun few years (yes, the Bollywood movies too).

Big thanks to Isabelle Strobant for all of the help of course, but also for your kindness. I also take this opportunity to thank Maxime Sermesant and Xavier Pennec for the valuable discussions, the shared ideas and the constructive feedback; and for contributing to the friendly atmosphere!

Of course I ought to acknowledge Laurent Massoulié and the Microsoft Research – Inria Joint Centre for supporting my research but, just as importantly, I would also like to express my warm thanks to you, to Hélène Bessin-Rousseau and to my colleagues there for giving me a spontaneous and friendly welcome upon each of my visits!

Finally I wish to simply thank my friends, my family, my parents and my brother.


Contents

1 Introduction
  1.1 Context and Motivations
  1.2 The Cardiac Function
  1.3 A Macroscopic Multi-Scale Model of Cardiac Mechanics
  1.4 Patient-Specific Model Personalization: an Inverse Problem
  1.5 Cardiac Motion Tracking & Image Registration
  1.6 Open challenges in image registration: from an optimization standpoint to a probabilistic standpoint
  1.7 Manuscript Organization and Objectives

2 Machine Learning and 4D Currents for the Personalization of a Mechanical Model of the Heart
  2.1 Introduction
  2.2 Currents for Shape Representation
    2.2.1 A Statistical Shape Representation Framework
    2.2.2 Computational Efficiency and Compact Approximate Representations
  2.3 Method
    2.3.1 Current Generation from Mesh Sequences
    2.3.2 Shape Space Reduction
    2.3.3 Regression Problem for Model Parameter Learning
  2.4 Experimental Results
  2.5 Discussion
  2.6 Conclusion

3 Sparse Bayesian Registration of Medical Images for Self-Tuning of Parameters and Spatially Adaptive Parametrization of Displacements
  3.1 Introduction
  3.2 Statistical Model of Registration
    3.2.1 Data Likelihood
    3.2.2 Representation of displacements
    3.2.3 Transformation Prior
    3.2.4 Sparse Coding & Registration
    3.2.5 Hyper-priors
    3.2.6 Model Inference
  3.3 Inference schemes
    3.3.1 Approximation of the Model Likelihood
    3.3.2 Form of the Marginal Likelihood
    3.3.3 Hyperparameter inference
    3.3.4 Algorithmic overview
  3.4 Experiments & Results
    3.4.1 Self-tuning registration algorithm: an analysis
    3.4.2 Synthetic 3D Ultrasound Cardiac Dataset
    3.4.3 STACOM 2011 tagged MRI benchmark
    3.4.4 Cine MRI dataset: qualitative results and uncertainty
  3.5 Discussion and conclusions

4 Quantifying Registration Uncertainty with Sparse Bayesian Modelling: Is Sparsity a Bane?
  4.1 Introduction
  4.2 Statistical Model and Inference
    4.2.1 Bayesian Model of Registration
    4.2.2 Model analysis
    4.2.3 Posterior Exploration by MCMC Sampling
  4.3 Predictive Uncertainties: Marginal Likelihood Maximization vs. Exact Inference
  4.4 Preliminary Experiments and Results
  4.5 Discussion and Conclusions

5 Conclusion and Perspectives
  5.1 Personalization of cardiac mechanics: contribution & perspectives
  5.2 Sparse Bayesian modelling for image registration: contribution & perspectives
    5.2.1 Contributions and perspectives on image registration
    5.2.2 Contributions and perspectives on statistical learning

A Sparse Bayesian Registration: Technical Appendices
  A.1 Closed form regularizers for Gaussian reproducing kernel Hilbert spaces
  A.2 Contribution of a basis to the log marginal likelihood
  A.3 Update of µ, Σ, L
  A.4 Update of s_i, κ_i and q_i
  A.5 Marginal prior & marginal likelihood

Bibliography


CHAPTER 1

Introduction

Contents
1.1 Context and Motivations
1.2 The Cardiac Function
1.3 A Macroscopic Multi-Scale Model of Cardiac Mechanics
1.4 Patient-Specific Model Personalization: an Inverse Problem
1.5 Cardiac Motion Tracking & Image Registration
1.6 Open challenges in image registration: from an optimization standpoint to a probabilistic standpoint
1.7 Manuscript Organization and Objectives

In the following pages, we briefly review the motivations for this thesis, introducing the necessary background on cardiac function and modelling, with emphasis on mechanics and motion. We situate the problems we will address with respect to the state of the art and give an overview of the manuscript organization.

1.1 Context and Motivations

Cardiac diseases are associated worldwide with high morbidity and mortality. Detecting and preventing risks for such diseases, improving patient health and reducing morbidity have generated immense interest and are the subject of active research in many communities. The continued improvement of imaging modalities brings forward a wealth of opportunities to obtain information about the cardiac function non-invasively, with expected benefits for improved clinical diagnosis and therapy planning.

As a reflection of these challenges, the communities of medical image computing, computational anatomy and physiology have made tremendous progress over the past decades towards the simulation of human biology (and, of particular interest to us, the cardiac function) as well as towards the automated analysis and understanding of medical images. Firstly, imaging software packages (MedInria, ParaView, Segment, Cardioviz3D, ITKSnap...) provide a set of standard tools for the visualization and low-level processing of images, which already make it possible to extract clinically relevant (albeit simple) indices of the cardiac function (e.g. the volume of the cardiac ventricles). Advances have been made towards the automated extraction of the cardiac geometry (also known as segmentation) and of the cardiac motion (a.k.a. motion tracking or registration) from structural and longitudinal magnetic resonance or echocardiographic data.


Figure 1.1: The Virtual Patient Heart long-term vision couples knowledge-driven computer models of the cardiac anatomy and physiology with patient-specific data to yield a faithful, personalized simulation that may aid clinical diagnosis as well as help predict successful therapies.

The geometry and kinematics of the patient’s heart potentially yield valuable indicators of their clinical state, either via derived indices (e.g. the ejection fraction) or via direct analysis of the motion (e.g. dyssynchronous contraction). Of course, open challenges – either intrinsic to the tasks themselves, or brought forward by limitations of the image acquisition protocols – persist and continue to motivate active research on the topic. Secondly, systematic advances in the biophysical and mathematical modelling of cardiac physiology have resulted in a variety of computer simulators able to reproduce aspects of the cardiac function with an increasingly high degree of faithfulness. This includes the simulation and integration of haemodynamics, electrophysiology, mechanics, metabolism...

This thesis work falls within the joint Microsoft Research – Inria Medilearn project. The Medilearn project gathers researchers from the Machine Learning and Perception group (MSR Cambridge) and the Inria Asclepios and Parietal teams around several research axes: the development of novel analytic approaches to learn from large-scale datasets of medical images, the automatic analysis and indexation of medical images [Margeta 2011, Lombaert 2014, Margeta 2015, Hoyos-Idrobo 2015, Lombaert 2015], and the development of accurately personalized heart models with the longer-term goal of assisting diagnosis and therapy selection for patients with a heart condition; the last of these axes directly motivates our work. Naturally, patient-specific, personalized simulation of the cardiac behaviour calls for the coupling of computer simulations with data assimilation techniques that incorporate (typically) image-based information, so that the model may reproduce the observed behaviour. Thus the motivation for this work, as we will now see after a brief introduction


of the necessary background, is firstly the exploration of adequate strategies for the personalization of cardiac models – more specifically cardiac mechanics – which in turn motivated the development of novel tools for the decisive preprocessing step of extracting cardiac kinematics, a.k.a. motion tracking.

1.2 The Cardiac Function

The heart is an involuntary muscle that pumps blood throughout the human body via the circulatory system, in a closed loop. The deoxygenated venous blood reaches the right atrium before being pushed into the right ventricle. During the cardiac contraction, it is ejected towards the lungs and reoxygenated. Similarly, it then arrives in the left atrium, is pushed down to the left ventricle and ejected towards the aorta and the arterial system during the contraction. Atria and ventricles are separated by atrioventricular valves that open (in a single direction) or close as a result of the pressure differential; similarly, arterial valves control the flow between the ventricles and the arteries. Hence the cardiac cycle is divided into four phases, as summarized in Fig. 1.2. During the filling phase, the ventricles fill up with blood, first passively, then actively due to the contraction of the atria. In a second phase (isovolumetric contraction), the atrioventricular valves close and the ventricles start contracting, resulting in a rise of the ventricular pressure. Upon reaching and exceeding the arterial pressure, the arterial valves open and the blood is ejected towards the arteries (ejection phase). The arterial valves close as the pressure differential reverts, and the isovolumetric relaxation starts. As soon as the atrial pressure exceeds the ventricular pressure, the atrioventricular valves open and the filling phase starts anew.

Figure 1.2: (Left) Simplified cardiac anatomy, illustrating the four chambers and their connection to the circulatory system. The blue compartment deals with de-oxygenated blood, the red compartment with oxygenated blood. The conduction system responsible for the synchronous contraction of the heart is overlaid in yellow. Image from [Prakosa 2013a]. (Right) The Wiggers diagram summarizes the phases of the cardiac cycle. Image from Wikipedia.


The contraction of the cardiac muscle occurs preferentially along elongated cardiac fibers, which wrap in a structured manner around the cardiac chambers. The contraction rate is controlled by a natural pacemaker, the sinoatrial node, which generates changes of electrical potential across cardiac cell membranes. The electrical wave propagates through the conduction system, diffuses through the cardiac muscle, and is responsible for the (normally) synchronous contraction of the cardiac muscle (the so-called myocardium).

1.3 A Macroscopic Multi-Scale Model of Cardiac Mechanics

A variety of models have been developed for the simulation of cardiac electrophysiology or mechanics, ranging from cellular models of ionic interactions and of muscle cell structure and activity, to macroscopic models of the electrical wave propagation and of the myocardium mechanics. Our work relies on an implementation of the Bestel-Clement-Sorine (BCS) model [Bestel 2001, Chapelle 2012] described in [Marchesseau 2013a]. We refer the reader to the same manuscript for a discussion of the modelling literature and for details on the BCS model, and only provide the necessary summary here. The BCS electromechanical model progressively derives, from a fine description of cellular processes and recourse to statistical mechanics, a mesoscopic and then a macroscopic model of cardiac mechanics. The macroscopic model, as illustrated in Fig. 1.3, is ultimately entirely determined by 14 global parameters, sometimes refined regionally. The model accounts for an element of active contractility (active stress $\tau_c$) controlled by an electrical input $u$ with a simplified behaviour dependent on four main input variables (including the time at which the electrical wave reaches the point of interest and the duration of the action potential), for energy dissipation, and for the passive properties of the muscle cells and of the muscle matrix. The amplitude of the electrical input (parameters $k_{ATP}$ and $k_{RS}$) governs the rates of active contraction and relaxation.

Figure 1.3: The electromechanical model. (Left) The mechanical model. The convention is that stress is additive for modules in parallel and deformations are additive (after linearization) for modules in series. (Middle) The 14 model parameters. (Right) Simplified model of the action potential controlling the contraction. $T_d$: depolarization time (beginning of active contraction). APD: action potential duration. $T_r$: repolarization time (end of active relaxation, beginning of passive relaxation). Illustration adapted from [Marchesseau 2013a].
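For intuition only, the sketch below integrates a deliberately simplified first-order activation model; it is not the BCS model (see [Marchesseau 2013a] for the actual equations), and every name and value is illustrative. It merely shows how rate parameters playing the role of $k_{ATP}$ and $k_{RS}$ shape the rise and decay of an active stress driven by a depolarization time $T_d$ and an action potential duration APD.

```python
import numpy as np

def toy_active_stress(t, Td, APD, k_atp, k_rs, sigma_max=1.0):
    """Toy stand-in for the BCS active stress tau_c (illustrative only).

    Contraction builds at a rate set by k_atp while the action potential is
    'on' (Td <= t < Td + APD) and relaxes at a rate set by k_rs afterwards.
    """
    tau = np.zeros_like(t)
    dt = t[1] - t[0]
    for i in range(1, len(t)):
        on = Td <= t[i] < Td + APD              # simplified electrical input u(t)
        dtau = k_atp * (sigma_max - tau[i - 1]) if on else -k_rs * tau[i - 1]
        tau[i] = tau[i - 1] + dt * dtau         # forward Euler integration
    return tau

t = np.linspace(0.0, 0.8, 801)                  # one cardiac cycle (seconds)
tau_c = toy_active_stress(t, Td=0.05, APD=0.3, k_atp=30.0, k_rs=15.0)
```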


1.4 Patient-Specific Model Personalization: an Inverse Problem

Assimilating available data and tuning model parameters so that the cardiac model reproduces the observed behaviour will henceforth be referred to as cardiac model personalization. Data typically consists of 3D time series of MR images, which make it possible to study the motion of the cardiac muscle over the cycle. Several modalities for dynamic MRI exist. Cine SSFP MRI is acquired 2D+t slice by 2D+t slice across several cardiac cycles, then reconstructed by resynchronizing stacks from ECG signals. The acquisition process leads to a lower number of axial slices and a lower axial resolution (as in Fig. 1.4 (Left)). Moreover, parts of the cardiac anatomy such as the basal area typically fall outside of the field of view, introducing higher uncertainty in the tracking of motion. Alternatively, tagged MRI images the cardiac motion via regular, grid-like ‘tags’ that deform over time, following the displacement of underlying physical points in the myocardium. This makes the tracking of motion at tag intersections easier. On the other hand, anatomical structures are not visible in this modality (only tags), and tagged MRI data availability is much less widespread in clinical practice. Challenges peculiar to the task of extracting the motion from cardiac images will be the basis for the second part of our thesis work.

Motion tracking is a common preprocessing step for model personalization, as it makes the processed data more amenable to comparison with model-based simulations, in the form of time series of volumetric meshes. Indeed, automatic or interactive tools already exist to create the 3D myocardium geometry [Ecabert 2011, Larrabide 2009], which can then be propagated, via the output of motion tracking, from a reference frame to the rest of the cycle. The task of personalizing cardiac mechanics from cardiac kinematics can then be seen as that of optimizing model parameter values so that the simulated sequence of meshes approximates, in some sense, the data-driven sequence of meshes. Optimizing parameter values from observed data is an inverse problem, as opposed to the forward problem of simulating the cardiac motion from given parameter values.

Figure 1.4: From patient-specific images to a personalized simulation: an inverse problem
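As a minimal illustration of this inverse/forward duality, the sketch below casts personalization as generic black-box minimization of a mesh discrepancy. Everything here is a hypothetical toy: `simulate_motion` stands in for an electromechanical forward model, and the vertex-wise distance and choice of Nelder-Mead are arbitrary, not the actual pipeline of this thesis.

```python
import numpy as np
from scipy.optimize import minimize

def mesh_sequence_distance(sim_meshes, obs_meshes):
    # Toy discrepancy: sum of squared vertex displacements over all frames.
    # Assumes point-to-point correspondences, which real data lacks.
    return sum(np.sum((s - o) ** 2) for s, o in zip(sim_meshes, obs_meshes))

def personalize(obs_meshes, simulate_motion, p0):
    """Fit forward-model parameters p so that simulated meshes match data.

    simulate_motion(p) -> list of (n_vertices, 3) arrays stands in for an
    electromechanical simulation. Each call is expensive in practice, which
    is precisely what motivates the offline learning strategy of Chapter 2.
    """
    cost = lambda p: mesh_sequence_distance(simulate_motion(np.asarray(p)),
                                            obs_meshes)
    return minimize(cost, p0, method="Nelder-Mead")   # gradient-free search
```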

This inverse problem has been tackled by different authors over the last five years. [Sundar 2009] and [Delingette 2012] propose adjoint variational methods to personalize


contractility parameters from synthetic cardiac motion, or from real pathological cases for the latter. [Moireau 2011, Chabiniok 2012, Imperiale 2011] use Reduced Order Unscented Kalman Filtering (ROUKF) to estimate the contractility. [Xi 2011] compared ROUKF and Sequential Quadratic Programming for the estimation of passive material parameters from synthetic data. [Marchesseau 2013b] uses ROUKF to personalize contractilities from the observation of regional volumes in healthy and pathological cases. Variational methods rely on the differentiation of a cost function, which calls for expensive evaluations of the entire forward model and results in long processing times. Sequential methods typically attempt to reduce the computational load by assimilating and correcting the discrepancy between observations and the predicted state at every time step during the simulation. These methods are intrusive (the code of the forward model has to be modified) and sensitive to initialization. [Marchesseau 2012] proposes to alleviate the dependency on the initial ‘guess’ with an Unscented (Kalman) Transform based calibration from global indices.

The motivation for our work finds its source in the challenges posed by the computational complexity and the highly non-linear behaviour of the forward model for classical optimization strategies. This context prompted us to explore the possibilities offered by machine learning. Specifically, can we accelerate the personalization procedure by a systematic, parallelizable characterization of the relationship between the parameter space and the cardiac motion in an offline phase? Can we make the personalization more reliable and accurate? Anticipating some of our conclusions, we found that the estimation of parameters from the cardiac motion was subject to significant residual uncertainty, even in well-controlled synthetic settings. This uncertainty has several sources, as depicted in Fig. 1.5, partly intrinsic to the task at hand and partly caused by the experimental setting. In particular, uncertainty in the motion tracking is crucial and highly conditions the ensuing personalization. Hence the rest of this thesis focuses on the development of reliable registration tools, with emphasis on a framework that opens perspectives for uncertainty quantification. Indeed, accounting for uncertainty on the input kinematics in the personalization process is likely to improve its accuracy and robustness. In fact, the inverse problem could be given a full probabilistic treatment, returning a faithful characterization of likely combinations of model parameters; preliminary work has recently been conducted in that direction [Neumann 2014].

Figure 1.5: Sources of uncertainty in the personalization of cardiac mechanics


As an alternative to the motion tracking preprocessing, the personalization scheme could directly match parameters to image data, but this calls for an adequate simulator of images given a prescribed motion. Such simulators are not yet widespread despite major recent advances, e.g. [Prakosa 2013b, Prakosa 2013a] and, for 3D echocardiographic images, [Alessandrini 2015].

1.5 Cardiac Motion Tracking & Image Registration

Cardiac motion tracking can be viewed more abstractly as a particular instance of image registration. Pairwise image registration addresses the problem, given two images I and J representing related objects or organs, of finding a transformation of space that matches homologous features between the objects. Thorough reviews of the field can be found for instance in [Goshtasby 2012] and [Sotiras 2013]. Fig. 1.6 provides a graphic overview of the basic methodological review that follows.

Figure 1.6: Building blocks of an image registration algorithm, relevant both from the classical standpoint of variational optimization and from that of probabilistic registration. Terminology changes according to the context (prior and image likelihood vs. regularizer and image similarity criterion). B-spline and Gaussian RBF illustrations from [Brunet 2010].

The quality of a match is assessed via a metric of disparity, either between the intensity profiles of the registered images, or between key features (landmark- or intensity-based) extracted by some other means. Intensity-based measures of discrepancy include the sum of squared differences of voxel intensities (SSD), which can be improved upon by modeling spatially varying noise levels [Simpson 2013] and artefacts [Hachama 2012], or by relaxing assumptions over the intensity mapping between images – e.g. to a piecewise constant mapping [Richard 2009], to a locally affine mapping [Cachier 2003, Lorenzi 2013] or to a more complex, non-linear (Parzen-window type) intensity mapping [Janoos 2012]. Other criteria derived from information theory, such as mutual information, allow the registration of images of different modalities [Wells III 1996]. Feature-based strategies are often highly


application dependent, although block matching [Ourselin 2000, Clatz 2005] and computer-vision-inspired features [Lowe 2004, Dalal 2005, Bay 2008, Calonder 2010, Rublee 2011] count among the most generic and widespread.

In many cases (such as our application of interest), images are not merely related by a rigid transform (nor by a similarity or an affine transform) and a fully non-rigid transformation is sought. This has given rise to a variety of parametric or non-parametric representations of the transformation: piecewise affine [Pitiot 2003], polyaffine [Arsigny 2003, Arsigny 2009], free-form deformations parametrized by B-splines [Rueckert 2006], dense displacement fields [Thirion 1998], dense diffeomorphic representations [Dupuis 1998, Beg 2005, Arsigny 2006, Ashburner 2007, Singh 2015], parametric (e.g. Gaussian radial basis function) diffeomorphic representations [Durrleman 2007, Sommer 2013], etc. A minimal sketch of one such parametrization follows.
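The sketch below assumes a Gaussian radial basis function parametrization of the displacement field (the family later adopted in Chapter 3); the control-point locations, widths and weights are arbitrary example values, not taken from this thesis.

```python
import numpy as np

def rbf_displacement(x, centers, weights, sigma):
    """Displacement field u(x) = sum_k K(x, x_k) w_k with a Gaussian kernel.

    x: (n, 3) query points; centers: (m, 3) control points x_k;
    weights: (m, 3) vector weights w_k; sigma: kernel width.
    """
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (n, m)
    K = np.exp(-0.5 * d2 / sigma**2)
    return K @ weights                                          # (n, 3)

# Toy usage: two control points, each dragging nearby space along one axis.
centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
u = rbf_displacement(np.random.rand(5, 3) * 10, centers, weights, sigma=5.0)
```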

Non-rigid registration is ill-posed (due to the aperture problem, to the sheer dimensionality of the space of admissible transformations, etc.) and the incorporation of various soft or hard constraints has been proposed. In the dominant formulation, following the work of [Broit 1981] (see also [Gee 1998]), registration minimizes an energy that trades off the image discrepancy criterion against an L2 regularizer inspired by mechanics (e.g. membrane or thin-plate models) or interpolation theory. Other constraints include incompressibility [Rohlfing 2003], local rigidity [Loeckx 2004] or mixed L1/L2 norms inducing sparsity of the parametrization [Shi 2012].
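To make the trade-off concrete, here is a minimal sketch of such an energy, assuming the SSD similarity named above, a plain L2 penalty on the parameters as regularizer, and a displacement callable such as the hypothetical `rbf_displacement` sketched previously; it is an illustration, not a registration implementation from this thesis.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def registration_energy(I, J, displacement, weights, lam):
    """E(w) = SSD(I, J o (id + u_w)) + lam * ||w||^2 (toy trade-off).

    I, J: 3D volumes of identical shape. displacement: callable mapping
    (n, 3) voxel coordinates to (n, 3) displacements. lam weighs the L2
    regularizer against the SSD image similarity term.
    """
    grid = np.stack(np.meshgrid(*map(np.arange, I.shape), indexing="ij"), -1)
    pts = grid.reshape(-1, 3).astype(float)
    warped = map_coordinates(J, (pts + displacement(pts)).T,  # J o (id + u)
                             order=1, mode="nearest")
    ssd = np.sum((I.ravel() - warped) ** 2)
    return ssd + lam * np.sum(weights ** 2)
```

The hyperparameter `lam` is exactly the similarity/regularity trade-off whose automated tuning is discussed in Section 1.6.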

Finally, various gradient-free or gradient-based strategies have been implemented to tackle the resulting optimization problem [Klein 2007], from preconditioned gradient descent to a range of quasi-Newton methods, possibly accelerated by full multigrid approaches [Haber 2006]. Alternatively, [Glocker 2007, Glocker 2008a, Glocker 2011] showed tremendous speed improvements by exploiting the specific structure of the prior and image similarity energies (via Markov Random Fields), quantizing the search space for displacements and solving the resulting problem by efficient linear programming. The framework allows for the efficient computation of min-marginals that measure the variations of the model energy under different constraints [Kohli 2008, Glocker 2008b]: a local measure of uncertainty is then derived and used to spatially adapt the displacement quantization, further improving the framework's efficiency [Parisot 2014].

Many approaches to cardiac motion tracking have been proposed [Mäkelä 2002]. [Chandrashekara 2004] registers tagged MRI via the use of multi-level FFDs and normalized mutual information (NMI). Alternative grid topologies have been suggested for free-form deformations that attempt to more closely match the heart shape [Heyde 2013], with expected gains in compactness and computational efficiency. In [McLeod 2013a, Mcleod 2013b], a cardiac-specific polyaffine model of motion with a soft incompressibility constraint is used instead. [De Craene 2010] addresses the problem of temporal consistency of the FFD by parametrizing the spatio-temporal velocity field with B-spline kernels that have both a spatial and a temporal extent. The method is applied to both tagged MRI [De Craene 2012] and (3D US) echocardiographic [De Craene 2012] sequences. [Shi 2013] showed an increase in registration accuracy when appending a sparsity-inducing prior (L1 norm) to a bending energy regularizer, yielding a sparse multi-level FFD representation. [Osman 1999] develops a dedicated harmonic phase


tracking algorithm for the tagged MR modality, exploiting the peculiar image structure of this modality; recent extensions of the approach showed state-of-the-art results (see [Zhou 2015], including a discussion of other tagged MR-specific methods). Finally, public benchmarks to evaluate the accuracy, in terms of motion and strain, of cardiac tracking methods were recently proposed [Tobon-Gomez 2013, De Craene 2013] for tagged MRI and 3D US data.

Many of the afore-mentioned approaches exploit parametrizations specifically crafted for the purpose of cardiac motion tracking, or third-party data such as anatomical segmentations, to help the registration process. Although this is often a reasonable means to achieve better performance, it also comes at an obvious cost in terms of breadth of applicability. We will be interested in maintaining the genericity of the proposed methods while remaining competitive with the state of the art performance-wise.

1.6 Open challenges in image registration: from an optimization standpoint to a probabilistic standpoint

Two main issues are left largely open in the state of the art. Firstly, registration algorithms are heavily dependent on a priori unknown hyperparameters, such as the optimal trade-off between image similarity and regularization. Optimal values of these hyperparameters are affected by a variety of spurious factors, such as rescaling of the intensity profiles, resampling of the images, coarseness of the parametrization (number of degrees of freedom) and the image acquisition process. Optimal values are also likely to change over the course of the cardiac cycle and from one subject to another (for instance, in the presence of pathologies affecting the cardiac motion). This renders cross-validation-based procedures extremely cumbersome or inefficient at finding optimal parameters. A natural question follows: can we automate the process of finding optimal hyperparameters? Can we find a principled, (somewhat) objective basis on which to define such optimality?

Secondly, confidence in the registration output and its accuracy is hampered by several limiting factors: inadequacy of the parametrization or of the prior assumptions, unknown model hyperparameters, lack of observability of the motion (the aperture problem, textureless regions...). We would greatly benefit from estimates of the local confidence in the registration output that integrate these various unknowns. In short, can we quantify uncertainty in the registration task?

The literature on the subject is still scarce, although the associated challenges are acknowledged [Taron 2009, Kybic 2010]; we review it in more depth in the corresponding chapters. We note for now the seminal work of [Simpson 2012, Simpson 2013] and [Janoos 2012, Risholm 2013], which reinterpret registration in a Bayesian setting. This theoretical framework brings adequate tools to answer (at least partially) the above questions. Three main challenges have to be tackled to push this framework towards routine use: the first is to improve the modelling of registration (the prior models and, above all, the observation model that relates the registered images), the second is the curse of dimensionality induced by the size of the transformation representation, the third is the development of efficient and


faithful inference schemes to characterize the quantities of interest.

1.7 Manuscript Organization and Objectives

The goals of this thesis led us to consider the following three problems and questions:

1. The personalization of cardiac mechanics poses challenges related to the computational complexity and the highly non-linear behaviour of the forward model [Marchesseau 2012, Marchesseau 2013b]. Can machine learning help us accelerate and improve the estimation of parameters?

2. The success of the personalization is crucially dependent on the quality of the preliminary motion tracking. Moreover, unavoidable uncertainty in the motion tracking, along with other sources of bias and uncertainty, renders the inverse problem fundamentally underdetermined. This legitimates the recourse to a probabilistic personalization of model parameters. In this thesis we focus on the upstream sub-task of developing reliable algorithms for, and quantifying uncertainty in, the motion tracking. We focus on algorithms and methods that are generic and broadly applicable to medical image registration. We contribute improved models of registration and partial answers to the following open issues. Can we automate the process of tuning the registration parameters, or circumvent the issue altogether? Can we quantify uncertainty in the registration process?

3. Are the estimates of uncertainty qualitatively satisfactory and useful?

This thesis is organized around published or submitted work. Chapter 2 presents a machine learning approach for the personalization of cardiac mechanics, based on the work of [Le Folgoc 2012].

Chapter 3 develops a novel sparse Bayesian model of registration and a fast inference scheme. Bayesian modelling allows for the self-tuning of model hyperparameters. By forcing the parametrization of the transformation to be sparse, we lower the computational complexity of the Bayesian inference and make the evaluation of the covariance matrix of the transformation parameters tractable. This covariance on transformation parameters can be translated into estimates of uncertainty on the displacement. The framework is applied to tasks of motion tracking on real and synthetic 3D cardiac data of various modalities. The chapter is based on [Le Folgoc 2015b] (submitted). We note for reference the early counterpart of this work presented in [Le Folgoc 2014], where several aspects of the methodology differed, including the recourse to a block matching strategy.

Chapter 4 is concerned with the soundness of the uncertainties predicted by the sparse Bayesian model. The question prompted us to formalize the proposed sparse Bayesian model of registration in a fully Bayesian manner. The assumptions behind the fast inference scheme of Chapter 3 are made explicit, and an alternative, exact inference scheme is proposed. The estimates of the ‘optimal’ transformation and of uncertainty returned by the two inference schemes are compared and analyzed on theoretical and empirical grounds. The chapter is based on an article in preparation [Le Folgoc 2015a].


CHAPTER 2

Machine Learning and 4D Currents for the Personalization of a Mechanical Model of the Heart

Contents
2.1 Introduction
2.2 Currents for Shape Representation
  2.2.1 A Statistical Shape Representation Framework
  2.2.2 Computational Efficiency and Compact Approximate Representations
2.3 Method
  2.3.1 Current Generation from Mesh Sequences
  2.3.2 Shape Space Reduction
  2.3.3 Regression Problem for Model Parameter Learning
2.4 Experimental Results
2.5 Discussion
2.6 Conclusion

2.1 Introduction

Patient-specific models may help better understand the role of biomechanical and electrophysiological factors in cardiovascular pathologies. They may also prove to be useful in predicting the outcome of potential therapeutic interventions for individual patients. In this chapter we focus on the mechanical personalization of the Bestel-Clement-Sorine (BCS) model, as described in [Bestel 2001, Chapelle 2012].

Model personalization aims at optimizing model parameters so that the behaviour of the personalized model matches the acquired patient-specific data (e.g. cine-MR images). Several approaches to the problem of cardiac model personalization have been suggested in recent years, often formulating the inverse problem via the framework of variational data assimilation [Delingette 2012] or that of optimal filtering theory [Liu 2009b, Imperiale 2011, Chabiniok 2012]. The output of these methods is dependent on the set of parameters used to initialize the algorithm; for this reason, calibration procedures are introduced as a preprocessing stage, such as the one developed in


[Marchesseau 2012]. Furthermore, these approaches rely on on-line simulations, as an accurate estimation of the effect of parameter changes along several directions in the parameter space is required to drive the parameter estimation. Due to the complexity of the direct simulation, these approaches are costly in time and computation.

In this chapter, we explore a novel machine-learning approach in which the need for initialization and on-line simulation is removed, by moving the analysis of the parameter effects on the kinematics of the model (and thus the bulk of the computations) to an off-line learning phase. In this work we assume the tracking of the heart motion from images to be given (e.g. via [Mansi 2011]) and focus on the mechanical personalization of the cardiac function from meshes. Our work makes use of currents, a mathematical tool which was originally introduced to the medical imaging community in the context of shape registration [Vaillant 2005, Durrleman 2007] and offers a unified, correspondence-free statistical representation of geometrical objects. Our main contributions include the construction of 4D currents to represent, and perform statistics on, 3D+t beating hearts, and the proposal of a machine-learning framework to personalize electromechanical cardiac models.

The remainder of this chapter is organized as follows. In the first part we introduce the background on currents necessary to present the rest of our work. We develop our method in the following section, then present and discuss experimental results in the final sections.

2.2 Currents for Shape Representation

2.2.1 A Statistical Shape Representation Framework

Currents provide a unified representation of geometrical objects of any dimension, embedded in the Euclidean space $\mathbb{R}^n$, that is fit for statistical analysis. The framework of currents makes use of geometrically rich and well-behaved data spaces, allowing for the proper definition of classical statistical concepts. Typically, the existence of an inner product structure provides a straightforward way to define the mean and principal modes of a dataset, as in Principal Component Analysis (PCA). These comments motivate an approach to currents from the perspective of kernel theory in this section, although currents are formally introduced in a more general way via the field of differential topology. The connection to differential topology is particularly relevant to outline the desirable properties of currents when dealing with discrete approximations of continuous shapes, in terms of convergence and consistency of the representation [Durrleman 2010].

A well-known theorem due to Moore and Aronszajn [Aronszajn 1951] states that for any symmetric, positive definite (p.d.) kernel $K$ on a set $X$, there exists a unique Hilbert space $\mathcal{H}_K \subset \mathbb{R}^X$ for which $K$ is a reproducing kernel. This result suggests a straightforward way of doing statistics on $X$ as long as a p.d. kernel $K$ can be engineered on this set, by mapping any point $x \in X$ to a function $K(x, \cdot) \in \mathcal{H}_K$ and exploiting the Hilbert space structure of $\mathcal{H}_K$. Furthermore, practical computations can be carried out efficiently thanks to


Figure 2.1: Discrete curve (in black) described as a collection of tangents $(x_i, \eta_i)_{1 \leq i \leq p}$, in green. In red: its associated representation as a smooth vector field (or as a dual smooth differential form), obtained by spatial convolution with a Gaussian kernel.

the reproducing kernel property – namely, for any $x, y \in X$, we have
$$\left( K(x, \cdot) \mid K(y, \cdot) \right)_{\mathcal{H}_K} = K(x, y), \qquad (2.1)$$
and more generally yet, for any $f \in \mathcal{H}_K$, $(K(x, \cdot) \mid f)_{\mathcal{H}_K} = f(x)$. Expanding on this, one can compute statistics on pairs of points and $m$-vectors $(x, \eta) \in \mathbb{R}^n \times \Lambda^m \mathbb{R}^n$ by mapping them to functions $K(x, \cdot)\eta$ and making use of the reproducing property
$$\left( K(x, \cdot)\eta \mid K(y, \cdot)\nu \right) = \eta^\top \nu \, K(x, y). \qquad (2.2)$$

Eq. 2.2 simply extends Eq. 2.1 to vector-valued functions, making use of the fact that the tensor product of two kernels is again a kernel over the product space. Expanding the framework even further, we can regard a discrete shape as a finite set $(x_i, \eta_i)_{1 \leq i \leq p}$, where $\eta_i$ describes the tangent space at $x_i$, and associate to it a signature function $\sum_{1 \leq i \leq p} K(x_i, \cdot)\eta_i$. Fig. 2.1 illustrates this construction, which can also be acknowledged as a special case of the convolution kernel on discrete structures described in [Haussler 1999] and [Gärtner 2002]. The correlation between two discrete shapes $(x_i, \eta_i)_{1 \leq i \leq p}$ and $(y_j, \nu_j)_{1 \leq j \leq q}$ can then be measured by the inner product
$$\Big( \sum_i K(x_i, \cdot)\eta_i \;\Big|\; \sum_j K(y_j, \cdot)\nu_j \Big) = \sum_{i,j} \eta_i^\top \nu_j \, K(x_i, y_j). \qquad (2.3)$$

The above inner product defines a correspondence-free way to measure proximity between shapes, trading hard correspondences for an aggregation of the measures of proximity between each simplex of one shape and every simplex of the other shape, in the sense of a kernel $K(\cdot, \cdot)$. We have yet to specify a choice of kernel $K$. In the following, we will


consider the multivariate Gaussian kernel with variance $\Sigma$:
$$K_\Sigma(x, y) = \frac{1}{(2\pi)^n |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - y)^\top \Sigma^{-1} (x - y) \right).$$

The choice of the kernel width $\Sigma$ can be interpreted as a choice of the scale at which the shape of interest is observed: shape variations occurring at a lower scale are likely to be smoothed out by the convolution and go unnoticed. This mechanism naturally introduces some level of noise insensitivity into the analysis. This parameter is decided on with regard to the mesh resolution and the level of noise in the data.
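As a minimal, self-contained sketch of these computations, the snippet below evaluates Eq. 2.3 between two discrete shapes under a (spherical, for simplicity) Gaussian kernel with the normalization constant dropped; the quadratic-time double sum is the naive baseline that the grid-based linear-time scheme of §2.2.2 improves upon. All names are illustrative.

```python
import numpy as np

def currents_inner_product(x, eta, y, nu, sigma):
    """Inner product of Eq. 2.3 between shapes (x_i, eta_i) and (y_j, nu_j).

    x: (p, n) simplex barycenters; eta: (p, n) momenta (scaled normals);
    y: (q, n), nu: (q, n) likewise. Spherical kernel Sigma = sigma^2 * Id.
    """
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # (p, q) squared distances
    K = np.exp(-0.5 * d2 / sigma**2)                      # kernel matrix K(x_i, y_j)
    return np.sum(K * (eta @ nu.T))                       # sum_ij eta_i^T nu_j K(x_i, y_j)

def currents_distance_sq(x, eta, y, nu, sigma):
    # ||S - T||^2 = (S|S) - 2 (S|T) + (T|T): the metric induced on shapes.
    return (currents_inner_product(x, eta, x, eta, sigma)
            - 2 * currents_inner_product(x, eta, y, nu, sigma)
            + currents_inner_product(y, nu, y, nu, sigma))
```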

Finally, the linear point-wise evaluation functional $\delta_x^\eta : \omega \mapsto \omega(x)(\eta)$ is continuous and dual to $K(x, \cdot)\eta$ by the reproducing kernel property. In the following we will refer to $\delta_x^\eta$ as a delta-current or a moment. To summarize, the discretized $m$-manifold $(x_i, \eta_i)_{1 \leq i \leq p}$ admits equivalent representations as the current $\sum_i \delta_{x_i}^{\eta_i}$, its dual differential $m$-form $\sum_{1 \leq i \leq p} K(x_i, \cdot)\eta_i^\top$ or its dual vector field $\sum_{1 \leq i \leq p} K(x_i, \cdot)\eta_i$.

2.2.2 Computational Efficiency and Compact Approximate Representations

This framework lends itself to an efficient implementation. Firstly, the inner product between two discrete shapes can be computed in linear time with respect to the number of momenta through the use of a translation-invariant kernel. Indeed, $\gamma(\cdot) = \sum_i K(x_i, \cdot)\eta_i$ may then be precomputed at any desired accuracy on a discrete grid by convolution, and rewriting $\sum_{i,j} \eta_i^\top \nu_j K(x_i, y_j)$ as $\sum_j \gamma(y_j)^\top \nu_j$ exhibits the linear dependency w.r.t. the number of momenta.

Secondly, if the mesh diameter is small with respect to the scale $\Sigma$, the initial delta-current representation will be highly redundant. [Durrleman 2008] introduced an iterative method to obtain compact approximations of currents at a chosen scale and with any desired accuracy. We rely on this procedure at training time to speed up computations and reduce the memory load. The algorithm is inspired by the Matching Pursuit method [Davis 1997] and illustrated in Fig. 2.2. A compact current is built from the current $S$ to approximate (of dual field $\gamma$) by iteratively adding a single delta current $\delta_{x_n}^{\eta_n}$ to the previous approximation $S_{n-1}$, in such a way that the difference $\lVert S - S_n \rVert_{\mathcal{H}'_\Sigma}$ steadily decreases. This is achieved by greedily placing the moment at the maximum (in $\lVert \cdot \rVert_2$ norm) $x_n$ of the residual field $\gamma(\cdot) - \gamma_{n-1}(\cdot)$, then choosing the optimal $\eta$, i.e. the one that minimizes $\lVert \gamma - (\gamma_{n-1} + K(x_n, \cdot)\eta) \rVert^2_{\mathcal{H}_\Sigma}$. It is shown in [Durrleman 2008] that this algorithm is greedy in $\lVert \cdot \rVert_{\mathcal{H}_\Sigma}$ norm, and converges both in $\lVert \cdot \rVert_{\mathcal{H}_\Sigma}$ norm and in $\lVert \cdot \rVert_\infty$ norm. The stopping criterion is on the residual norm $\lVert \gamma(\cdot) - \gamma_n(\cdot) \rVert^2_{\mathcal{H}_\Sigma}$. Our implementation uses a discrete kernel approximation of the Gaussian kernel, rather than an FFT-based scheme, for fast local updates of the residual field.


Figure 2.2: Sparse deconvolution scheme for currents: (a) the initial configuration. Right: two discrete curves in blue and their mean in red, as the collection of all tangents (from both curves) seen as momenta in the space of currents. Left: the Gaussian convolution of the initial momenta gives the dual representation of the mean as a dense vector field. (b) (Resp. (c)): first (resp. third) iteration of the matching pursuit algorithm: estimated momenta on the right panel, residual vector field on the left panel. The momenta converge to the true solution while the residual vector field tends to zero. Figure from [Durrleman 2009].

2.3 Method

The workflow for the proposed machine-learning-based parameter estimation method couples three successive processing steps: the first one generates a current from an input sequence of meshes, so as to obtain a statistically relevant representation; the second one consists in a dimensionality reduction step, so as to derive a reduced shape representation in $\mathbb{R}^K$, which leads to computationally efficient statistical learning; the third one tackles the matter of finding a relationship between the reduced shape space and the (biophysical) model parameters. The three modules are mostly independent and can easily be adjusted in their own respect. As a machine learning based method, our work involves an off-line learning stage and an on-line testing stage; all three modules of the pipeline are involved during each stage. Fig. 2.3 gives a visual overview of our approach. The rest of this section describes the three afore-mentioned processing steps and their use during the learning and testing stages.

2.3.1 Current Generation from Mesh Sequences

Let us briefly describe the way we build a current from a time sequence of 3D meshes. We first extract surface meshes from the volumetric meshes. This choice derives from the assumption that the displacement of surface points can be recovered more easily than the displacement of all points within the myocardium, given a sequence of images; thus learning from surface meshes may be more relevant for real applications. In this work we assume the trajectory of surface points to be entirely known, as opposed to only the displacement in the direction normal to the contour (the aperture problem). Several variants to derive currents for 4D object representation can be discussed (e.g. [Durrleman 2010]), but their relevance largely depends on the application and on the complete processing workflow from the original data.


Figure 2.3: Overview of the learning and testing phases: (a) step-by-step process, (b) learning phase, (c) testing phase.

In this work, we rely on the remark that the concatenation of smoothly deformed surface meshes can be visualized as a (3D) hyper-surface in 4D (Fig. 2.4). The ith simplex of this hyper-surface generates a current δ^{η_i}_{x_i}, where x_i is its barycenter and η_i is the vector of R^4 normal to its support and of length the volume of the simplex. The current associated to the series of meshes is the aggregation of such delta currents, Σ_i δ^{η_i}_{x_i}. This construction captures both the geometry of the heart and its motion.

2.3.2 Shape Space Reduction

Since learning a direct mapping between the space of model parameters and the space of 3D+t currents is a cumbersome task, we introduce an intermediate step of dimensionality reduction via PCA. During the learning stage, we compute the mean current and principal modes of variation from the learning database of N currents {S_i}_{1≤i≤N} generated from the N training mesh sequences {M_i}_{1≤i≤N} as described in §2.3.1. This is achieved efficiently by computing the Gram matrix of the data G_ij = (S_i | S_j) column by column and using the so-called kernel trick [Schölkopf 2002]. Each column of G is computed in O(N · P), where P is the maximum number of momenta among all currents S_j (cf. §2.2.1). Finally, we compute an approximate compact representation at the scale Σ of the mean current T̄ and of the K first modes of variation {T_k}_{1≤k≤K} to accelerate computations of inner products involving these currents, as in [Durrleman 2008].

At testing time and given a new current S, we derive its coordinates v = (v_1, · · · , v_K) in the reduced shape space by projection on the principal modes of variation, v_k = (S − T̄ | T_k).
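For illustration, here is a minimal sketch of this kernel-trick PCA under assumed array conventions of our own: the modes are obtained from the Gram matrix G alone, and a new current is projected using only its inner products g_new[i] = (S | S_i) with the training currents. The eigenvalues also yield explained/residual variance curves such as the one shown in Fig. 2.5.

import numpy as np

def kpca_fit(G, K):
    """Top-K unit-norm modes T_k = sum_i alpha[i, k] (S_i - mean current)."""
    N = G.shape[0]
    Hc = np.eye(N) - np.ones((N, N)) / N
    Gc = Hc @ G @ Hc                           # Gram matrix of the centered currents
    vals, vecs = np.linalg.eigh(Gc)
    order = np.argsort(vals)[::-1][:K]
    lam = np.clip(vals[order], 1e-12, None)    # guard against round-off negatives
    alpha = vecs[:, order] / np.sqrt(lam)      # normalize so that (T_k | T_k) = 1
    return alpha, lam

def kpca_project(g_new, alpha, G):
    """Coordinates v_k = (S - mean | T_k) of a new current S."""
    gc = g_new - g_new.mean() - G.mean(axis=1) + G.mean()   # centered inner products
    return gc @ alpha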


Figure 2.4: Current generation from a mesh element, illustrated on an element of contour in 2D deformed in time (axes x, y, t; segment endpoints P(t), Q(t) and P(t + δt), Q(t + δt); barycenters x_1, x_2 carrying momenta η_1, η_2). The simplex PQ is followed over two consecutive timesteps, which gives a quad embedded in 3D. The quad is divided into two triangles, from which we get two current deltas, applied at each triangle barycenter, orthogonal to the support of their corresponding triangles and of norm the area of the triangle. For a surface in 3D deformed over time, each element of the triangulation followed over two consecutive timesteps generates a hyper-prism embedded in 4D, which is in turn decomposed into three tetrahedra from which we obtain three momenta.
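A minimal sketch of the Fig. 2.4 construction for the 2D-contour case (helper names and conventions are ours, not the thesis's); the surface-in-4D case is analogous, applying the 4D generalization of the cross product to the three edge vectors of each tetrahedron.

import numpy as np

def segment_momenta(P0, Q0, P1, Q1):
    """Momenta for one contour segment PQ followed over two timesteps.

    Points are (x, y, t) triplets; the swept quad (P0, Q0, Q1, P1) is split
    into two triangles, each yielding one momentum at its barycenter: the
    vector normal to its support, of norm the triangle area."""
    momenta = []
    for a, b, c in [(P0, Q0, P1), (Q0, Q1, P1)]:   # split the quad along Q0-P1
        a, b, c = (np.asarray(p, float) for p in (a, b, c))
        eta = 0.5 * np.cross(b - a, c - a)         # normal, |eta| = triangle area
        momenta.append(((a + b + c) / 3.0, eta))   # (barycenter x_i, momentum eta_i)
    return momenta

# Example: a unit segment at t = 0, slightly displaced at t = 0.1.
deltas = segment_momenta((0, 0, 0), (1, 0, 0), (0.1, 0.05, 0.1), (1.1, 0.05, 0.1))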

2.3.3 Regression Problem for Model Parameter Learning

It remains to link the physiological (model) parameters to the reduced shape space. Although we are ultimately interested in finding an optimal set of parameters p ∈ R^d from an observation v ∈ R^K, we will actually learn a mapping in the other direction, f : p ∈ R^d ↦ v ∈ R^K. We motivate this choice by three arguments. Firstly, the observation v is a deterministic output of the cardiac model given a parameter set p, and thus the mapping f is well-defined; however, there may be several parameter sets resulting in the same observable shape and deformation, as parameter identifiability is not a priori ensured. Secondly, the parameter space is expected to be of smaller dimensionality than the reduced shape space and therefore easier to sample, for combinatorial reasons. Finally, we can also expect the set of biologically admissible model parameters to be relatively well-behaved; on the other hand, few points in the shape space may actually relate to anatomically reasonable hearts: thus mapping every v ∈ R^K to a parameter set could be impractical.

The regression function f is learned by kernel ridge regression using a Gaussian kernel [Hoerl 1970], and admits a straightforward closed-form expression. During the testing phase, given a new observation v, we solve the optimization problem arg min_p ‖f(p) − v‖² by Simulated Annealing [Xiang 2012]. This optimization problem involves an analytical mapping between low-dimensional spaces, as opposed to optimizing directly over the 4D meshes or currents; thus it will not constitute a computational bottleneck regardless of the chosen optimization scheme. Naturally, if a prior on the likelihood of a given parameter set p ∈ R^d were known (e.g. via a biophysical argument), it could be integrated into the cost function in the form of a prior energy term λ · R(p).

Figure 2.5: Dimensionality reduction assessment for the first experiment. (a) Residual (unexplained) variance $(\sum_{k \le K} \lambda_k)/(\sum_{k \le N} \lambda_k)$, in %, when keeping only the first K = 0, 1, . . . principal modes. (b) Regression function f(σ): mapping between the parameter space and the first mode of variation, one-to-one over the domain of interest, with the training set overlaid.
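A hedged sketch of this regression-plus-inversion step, assuming parameters have already been affinely mapped to [−1, 1]^d (array names are ours; SciPy's dual_annealing is used here as a stand-in for the Simulated Annealing of [Xiang 2012]):

import numpy as np
from scipy.optimize import dual_annealing

def krr_fit(P, V, sigma=0.3, lam=1e-5):
    """Kernel ridge regression of f: p -> v; returns the dual coefficients."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / sigma ** 2)
    return np.linalg.solve(K + lam * np.eye(len(P)), V)   # closed-form solution

def krr_predict(p, P, coeffs, sigma=0.3):
    """f(p) = sum_i k(p, p_i) c_i, the closed-form KRR predictor."""
    k = np.exp(-0.5 * ((P - p) ** 2).sum(-1) / sigma ** 2)
    return k @ coeffs

def personalize(v_obs, P, coeffs, bounds=None):
    """p* = argmin_p |f(p) - v_obs|^2 over the admissible (normalized) box."""
    bounds = bounds or [(-1.0, 1.0)] * P.shape[1]
    cost = lambda p: float(np.sum((krr_predict(p, P, coeffs) - v_obs) ** 2))
    return dual_annealing(cost, bounds).x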

2.4 Experimental Results

In our first experiment we focus on the prediction of the maximum contractility parameter σ_0 of the BCS model, defined globally for the whole cardiac muscle. Building on the sensitivity analysis from [Marchesseau 2012], we consider that σ_0 covers the range of values from 10^6 to 2 × 10^7 in an anatomically plausible way. We form a training set of ten cases {p_i, M_i} by sampling this range deterministically and launching simulations with the corresponding parameter sets, for a single heart geometry from the STACOM'2011 dataset. Following the PCA, the first principal mode of variation is found to explain 81% of the variance; thus we set the reduced shape space to be of dimension 1 (K = 1). The regression function (σ = 0.3, λ = 10^{-5}) bijectively maps the model parameter space to the reduced shape space. In all experiments, the model parameters are affinely mapped to [−1, 1] for convenience, for the regression and optimization stages. We use an isotropic Gaussian kernel of width 1 cm in space and 50 ms in time.

We evaluate the performance of our approach by cross-validation on an independent test set {p_j, M_j}_{0≤j<N'}, obtained by randomly choosing parameter sets in the admissible range of parameters and launching the corresponding simulations. We thereafter refer to p_j as the real parameter (value) and to the output of our approach p*_j as the optimal parameter (value). Our test set is of size N' = 100 samples. The whole personalization pipeline, from the current generation to the parameter optimization phase, takes roughly 2 minutes per sample on a regular laptop. We define the relative error on the parameter value for a given test sample j as ε_r p_j = |p*_j − p_j| / p_j. In addition to the relative error, we consider the absolute error over the range of admissible parameters, ε_a p_j = |p*_j − p_j| / |p_max − p_min|. We refer to ε_a p as an absolute error but express it for convenience as a percentage of the admissible parameter variation. Over the test set, we found a mean relative (resp. absolute) error of 9.2% (resp. 4.5%) and a median relative (resp. absolute) error of 6.8% (resp. 2.3%).

As a preliminary evaluation of the robustness of our approach with respect to geometry changes, ten samples are generated following the same procedure as before but using another heart geometry of the STACOM dataset. The 10 mesh sequences are manually registered (translation, rotation and scale) to the training geometry based on the end-diastole mesh, before applying the normal pipeline as described in Section 2.3. The mean relative (resp. absolute) error on the contractility parameter over our sample is 25% (resp. 9.3%), with a median relative (resp. absolute) error of 15% (resp. 7.5%).

The second experiment proceeds similarly to the first one, but we simultaneously estimate the contractility σ_0, the relaxation rate k_rs and the viscosity µ. For the training phase, the parameter space is sampled on a 7 × 7 × 7 grid, with σ_0 in the range [10^6, 2 × 10^7], k_rs in [5, 50] and µ in [10^5, 8 × 10^5]. The explained variance with 1 eigenmode of the PCA (resp. 2 to 5) out of the N = 343 modes equals 63.2% of the total variance (resp. 80.3%, 89.5%, 94.1%, 96.7%). We set the dimension of the reduced shape space to K = 3. The performance is tested on N' = 100 random samples. Because we can no longer assume the parameter set to be identifiable a priori, we introduce another measure of the goodness of fit of our personalization by directly evaluating the error on the observations. Given two surface mesh sequences M = {M_i}_{1≤i≤T} and M' = {M'_i}_{1≤i≤T}, we define the pseudo-distance d_sur(M, M') = max_i d_s(M_i, M'_i), where d_s(M_i, M'_i)² is the mean square distance of the points of the surface M_i to the surface M'_i. Additionally, given one-to-one correspondences between M and M', we can define the distance d_nod(M, M') = max_i d_p(M_i, M'_i), where d_p(M_i, M'_i) is the mean distance between corresponding nodes of M_i and M'_i. While d_sur intuitively relates to an upper bound for the matching between surface meshes at any time step, d_nod conveys more information about the quality of the matching of point trajectories. The results for this experiment are reported in Table 2.1 and in Fig. 2.6. As a comparison, two mesh sequences corresponding to extreme values in the parameter set will yield a value for d_sur(M, M') (resp. d_nod(M, M')) of the order of 6 mm (resp. 8 mm).
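A sketch of the two benchmark distances under assumed conventions of our own: each mesh is an (n, 3) vertex array, and the point-to-surface distance d_s is approximated by the distance to the nearest vertex of the other mesh.

import numpy as np
from scipy.spatial import cKDTree

def d_s(M, Mp):
    """d_s(M, M')^2 = mean square distance of the points of M to M'."""
    dist, _ = cKDTree(Mp).query(M)
    return np.sqrt((dist ** 2).mean())

def d_sur(seq, seq_p):
    """d_sur = max over timesteps of d_s(M_i, M'_i)."""
    return max(d_s(M, Mp) for M, Mp in zip(seq, seq_p))

def d_nod(seq, seq_p):
    """d_nod = max over timesteps of the mean node-to-node distance,
    assuming one-to-one correspondences (same vertex ordering)."""
    return max(np.linalg.norm(M - Mp, axis=1).mean() for M, Mp in zip(seq, seq_p))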

In addition, we compute the optimal parameters and performance indicators for a different choice of the reduced space dimension K, obtaining quasi-identical statistics for K = 4. Finally, we test here again the robustness with respect to changes of the heart geometry: using the same procedure as before on 10 test samples on a different geometry, we find a mean error of 1.4 mm and a median value of 1.3 mm for d_sur (respectively, 1.8 mm and 1.6 mm for d_nod).

          ε_r σ_0 (ε_a σ_0)   ε_r k_rs (ε_a k_rs)   ε_r µ (ε_a µ)     d_sur (mm)   d_nod (mm)
Mean      15.2% (8.0%)        48.8% (26.4%)         40.5% (20.0%)     0.92         1.42
Median    13.2% (6.3%)        44.7% (19.6%)         32.1% (17.5%)     0.80         1.32

Table 2.1: Experiment 2 – results.

Figure 2.6: Benchmark for the second experiment. (Boxplots) Statistics are computed over the test set. The left boxplot quantifies the quality of fit in the observation space (mesh-to-mesh error). The right boxplot reports errors between estimated and ground-truth parameters. (Mesh slices) Visual display of the quality of fit, on an example case, on a 2D slice of the cardiac muscle at two representative frames of the cardiac cycle (respectively the end of the relaxation phase and the end of the contraction phase).

2.5 Discussion

Although our synthetic experiments work around the bias and error introduced by the model and by image processing in real applications, they show promising performance for our framework in terms of accuracy, tolerance to non-linear effects of parameters, robustness and computational efficiency. The accuracy of our approach was found to be below the typical voxel dimension (1 mm), while a priori optimizing among a very wide range of parameter values at test time, and using a reasonable number of training samples at learning time. Although a single geometry is used for the training phase, the accuracy was of the same order on similar (non-pathological) heart geometries. Naturally, further work should handle geometry variability in a proper way, taking it into account at the training stage and adding "shape factors" capturing 3D shape variability to the model parameter space. Moreover, the addition to the pipeline of a pre-clustering stage with respect to the heart geometry, so as to distinguish very different geometries and treat them separately, should reduce the number of samples required to cover the whole parameter space while achieving better model personalization.

The proposed framework also brings an interesting perspective on the issue of parameter identifiability. It should be noticed that we achieve good results in terms of spatial distance between the matched model and the observations, while significant differences in the parameter space may still be observed. Parameter identifiability encompasses two distinct aspects. Firstly, small variations of the parameter values may result in changes that are not noticeable at the scale of reference. This sensitivity to parameters partially explains the error on the retrieved set of parameters. In our approach, the kernel width for currents impacts the ability of the algorithm to discern shape differences. In the future we will experiment with smaller kernel widths and improve our algorithms to handle the increased computational cost. Secondly, in joint parameter estimation, a whole subset of the parameter space may result in identical observations, which also affects parameter identifiability. Such considerations can be analyzed in depth at the regression or optimization steps: several parameter sets with similar costs, along with a measure of local sensitivity around these values, may be additionally output by the Simulated Annealing algorithm. Biophysical priors may also be introduced at the optimization step by penalizing unlikely parameter sets, without adding significant computational cost.

Finally, more efficient machine learning algorithms should be tested in lieu of PCA, so as to capture non-linear 4D shape variations and to obtain and exploit precise information about the manifold structure of 4D heart shapes. Not only will this help with parameter identifiability and with deriving efficient representations in the reduced shape space, but it could also provide valuable feedback for "smart" sampling of the parameter space.

2.6 Conclusion

A machine-learning, current-based method has been proposed for the personalization of electromechanical models of the heart from patient-specific kinematics. A framework to encapsulate information regarding shape and motion in a way that allows the efficient computation of statistics via 4D currents has been described. This approach has been evaluated on synthetic data using the BCS model, with the joint estimation of the maximum contraction, relaxation rate and viscosity. The proposed method is found to be accurate, computationally efficient and robust.


CHAPTER 3

Sparse Bayesian Registration of Medical Images for Self-Tuning of Parameters and Spatially Adaptive Parametrization of Displacements

Contents
3.1 Introduction
3.2 Statistical Model of Registration
    3.2.1 Data Likelihood
    3.2.2 Representation of displacements
    3.2.3 Transformation Prior
    3.2.4 Sparse Coding & Registration
    3.2.5 Hyper-priors
    3.2.6 Model Inference
3.3 Inference schemes
    3.3.1 Approximation of the Model Likelihood
    3.3.2 Form of the Marginal Likelihood
    3.3.3 Hyperparameter inference
    3.3.4 Algorithmic overview
3.4 Experiments & Results
    3.4.1 Self-tuning registration algorithm: an analysis
    3.4.2 Synthetic 3D Ultrasound Cardiac Dataset
    3.4.3 STACOM 2011 tagged MRI benchmark
    3.4.4 Cine MRI dataset: qualitative results and uncertainty
3.5 Discussion and conclusions

3.1 Introduction

Non-rigid image registration is the ill-posed task of inferring a deformation Ψ from a pair of observed (albeit typically noisy), related images I and J. Classical approaches propose to minimize a functional which weighs an image similarity criterion D against a regularizing (penalty) term R:

$$\arg\min_\Psi \; E(\Psi) = \beta \cdot \mathcal{D}(I, J, \Psi) + \lambda \cdot \mathcal{R}(\Psi) \qquad (3.1)$$

Prior knowledge to precisely model the space of plausible deformations or the regularizing energy is generally unavailable. The optimal trade-off between the image similarity term and the regularization prior is itself difficult to find. Typically, the user would manually adjust the ratio λ/β until a qualitatively good fit is achieved, which is time-consuming and calls for some degree of expertise. Alternatively, if quantitative benchmarks are available on a similar set of images, they can serve as a metric of reference on which to optimize parameters, under the assumption that the value that achieves optimality is constant across the dataset – say, for images of the same modality or among a given sequence. Unfortunately, this assumption generally does not hold. A major advance in the recent literature [Richard 2009, Simpson 2012, Risholm 2013] was the realization that this issue can be tackled automatically by reinterpreting registration in a probabilistic setting. [Gee 1998] first noted that, in a Bayesian paradigm, the two terms in Eq. (3.1) relate respectively to a likelihood and a prior on the latent transformation Ψ. In fact the parameters λ and β themselves can be treated as hidden random variables, equipped with broad prior distributions, and jointly inferred with Ψ or integrated out.

In practice, analytical inference is precluded and various strategies are devised for approximate inference. [Risholm 2013] characterize the distributions of interest from MCMC samples. This is a most principled, generic and accurate approach, provided that enough samples can be drawn within the available computational budget. Aside from monitoring the progress of the scheme, two difficulties arise: crafting an efficient proposal distribution over Ψ and computing the acceptance probability of the proposed sample. To circumvent this latter issue, the authors sample from an approximate posterior distribution derived in a variational free-energy framework. Alternatively, the full inference can be tackled in a variational framework. In this spirit, [Simpson 2012, Simpson 2015] derive a fast Bayesian approach using variational Bayes Expectation Maximization. This offers an appealing compromise between the computational burden and the quality of the estimates, depending on the chosen family of variational (approximate) posterior distributions.

Figure 3.1: (a) Trajectories of points on the endocardium, following the registration of a time series of cardiac MR images by the proposed approach. (b) LV volume over time and 99.7% confidence interval. (c) Tensor visualization of directional uncertainty at end-systole, rasterized at voxel centers of a 2D slice. For a thorough description, please refer to the discussion in section 3.5.

Despite the incurred computational cost, this probabilistic view of registration offers substantial benefits. In this chapter, we demonstrate how such a framework can be extended so as to automatically select the scale and location of the bases used to parameterize the transformation Ψ. The complexity of the mapping adapts to the underlying dataset, yielding a reduced set of relevant degrees of freedom: finer bases get introduced only in the presence of coherent image information and motion, while coarser bases ensure better extrapolation of the motion to textureless, uninformative regions. Spatial refinement of the parametrization was previously handled heuristically [Rohde 2003], or it led to alternative formulations of registration via spatially anisotropic filtering [Stefanescu 2004]. [Stewart 2003] proposed to select a deformation model on the basis of information criteria, in a computationally favorable setting, choosing regionally among a limited pool of models (e.g. similarity, affine, quadratic). Here, to our knowledge for the first time, the problem is approached on principled grounds within a probabilistic framework. We develop a statistical model of registration in which a reduced parametrization of the transformation is automatically inferred from the data, jointly with all model parameters; and we propose an efficient algorithmic scheme that renders inference over this model tractable for real-scale registration tasks. In particular, we extend the scope of a state-of-the-art tool for sparse regression and classification [Tipping 2001, Tipping 2003]. To increase its potency in the context of registration, we lift a strong assumption of independence on hidden variables, so that it now handles generic quadratic regularizing priors at no cost in algorithmic complexity. We also generalize it to multivalued regression (regression of vector fields as opposed to scalar fields), so as to preserve the natural invariance to a change of coordinate system.

This chapter expands on earlier work of the authors [Le Folgoc 2014] in several ways. Firstly, we propose a different approximation of the likelihood term, effectively removing a computational bottleneck – specifically, the voxelwise, local optimization of the image similarity via dense block-matching. Instead, a step of global optimization of the posterior distribution w.r.t. the reduced parametrization (typically, a hundred degrees of freedom) is performed. Furthermore, we introduce a flexible noise model that can account robustly for acquisition noise and artefacts, and seamlessly adapts to a range of image modalities. We demonstrate our approach on tasks of motion tracking on real cardiac data, specifically time sequences of 3D cine or tagged MR images and echocardiographic images.


Figure 3.2: Graphical representation of the probabilistic registration model. The data pair D is put in relation via a transformation Ψ of space, up to some noise (β). The transformation is parameterized by weights w_k on a predefined overcomplete set of basis functions {φ_k, k = 1 · · · M}. Priors on the transformation smoothness and on the relevance of individual bases introduce additional parameters (λ and A_k respectively). Random variables are circled, whereas deterministic parameters are in squares. Arrows capture conditional dependencies. Shaded nodes are observed. The doubly circled node indicates that the transformation Ψ is fully determined by its parent nodes (the φ_k and w_k). The plate stands for M (replicated) groups of nodes, of which only a single one is shown explicitly.

3.2 Statistical Model of Registration

Registration assumes that the dataset of interest describes objects related via some transformation of space. In a medical context, this occurs for instance when the motion of organs is followed over time in a sequence of images. Registration then aims at recovering the underlying transformation of space from the observed data. This can be formally regarded as an inference problem and handled as such. We start by making our statistical model of pairwise registration explicit; Fig. 3.2 provides a graphical depiction thereof. We specify a generative model of the data (e.g. images, landmarks) D = {D_1, D_2} given the transformation Ψ, along with a prior over the admissible set of transformations. The general strategy to infer the parameters of this model is exposed at the end of the section.

The abstract graphical model depicted in Fig. 3.2 bears a strong resemblance to that of the Relevance Voxel Machine of [Sabuncu 2011], developed independently by its authors for regression and classification tasks. In Section 3.3 we propose alternative inference schemes with significant gains in algorithmic complexity, very much in the spirit of how the later work of [Tipping 2003] improved upon the original Relevance Vector Machine [Tipping 2001]. This effectively renders the approach applicable to non-rigid registration.


3.2.1 Data Likelihood

A good transformation Ψ should adequately map the datasets D = {D_1, D_2}, up to some misalignment and residual error attributable to the data formation process. Our knowledge of this process is captured in a likelihood model, which assigns a probability p(D|Ψ) for the data D to be observed under some transformation Ψ. The data likelihood is commonly related to the data-matching energy that appears in the classic optimization framework of Eq. (3.1) by:

$$\mathcal{D}(D, \Psi) = -\ln p(D \mid \Psi) + \text{const}\,. \qquad (3.2)$$

For landmark registration, D(D, Ψ) is generally chosen to be the sum of squared distances between pairings. Alternatively, the quadratic loss can be replaced by robust losses such as the L1 norm, or by other loss functions derived from the heavy-tailed family of Student-t distributions [Tipping 2005]. For pairwise registration of images, various data-matching terms were introduced in the literature. The simplest and most common image similarity term is the sum of squared differences (SSD) of voxel intensities, which can be improved upon by modeling spatially varying noise levels [Simpson 2013] and artefacts [Hachama 2012], or by relaxing the assumptions over the intensity mapping between images – e.g. to a piecewise constant mapping [Richard 2009], to a locally affine mapping [Cachier 2003] or to a more complex, non-linear (Parzen-window type) intensity mapping [Janoos 2012]. Mutual information is another popular image similarity, especially in the context of registering images of different modalities [Wells III 1996], and has been successfully applied to the registration of cardiac images [Chandrashekara 2004].

SSD is a simple yet efficient image similarity term for the registration of monomodal cardiac images. It naturally lends itself to a probabilistic interpretation and eases mathematical derivations. The target image J is modeled as the warped source image I ∘ Ψ^{-1} further corrupted by additive, independent identically distributed (i.i.d.) noise e_i ∼ N(0, β^{-1}) at each voxel i = 1 · · · N:

$$J = I \circ \Psi^{-1} + e \qquad (3.3)$$

where e ∼ N(0, β^{-1} I), with I the N × N identity matrix. β is a global scaling parameter: it denotes the precision (or inverse variance) of the noise across the image. The SSD model can be described in a more familiar manner by the energy of Eq. (3.4), where {c_i}_{i=1}^N is the list of voxel centers in the fixed image and C_i = Ψ^{-1}(c_i) are the paired coordinates in the moving image.

$$\mathcal{D}_\beta(I, J; \Psi) = \frac{\beta}{2} \sum_{i=1}^{N} \left( J[c_i] - I[C_i] \right)^2 \qquad (3.4)$$
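A minimal sketch of the SSD data term of Eq. (3.4) under assumed conventions of our own: the displacement field u is sampled densely at voxel centers, and I[C_i] is evaluated by linear interpolation of the discrete moving image.

import numpy as np
from scipy.ndimage import map_coordinates

def ssd_data_term(I, J, u, beta):
    """D_beta(I, J; Psi) = beta/2 * sum_i (J[c_i] - I[c_i + u(c_i)])^2."""
    d = J.ndim
    grid = np.indices(J.shape).astype(float)      # voxel centers c_i
    C = (grid + u).reshape(d, -1)                 # paired coordinates Psi^{-1}(c_i)
    I_warp = map_coordinates(I, C, order=1).reshape(J.shape)
    return 0.5 * beta * np.sum((J - I_warp) ** 2)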

Since the SSD is quadratic w.r.t. the intensity differences of paired voxels in the registered images, both the penalty for intensity discrepancies and the rate at which it grows can become arbitrarily high. This renders registration particularly vulnerable to strong local intensity biases, introduced for instance by topology changes in the imaged objects or by acquisition artefacts. Fig. 3.3 demonstrates such an occurrence, where the shortcomings of SSD result in a qualitatively poor registration. Furthermore, residual misalignments between structures of interest in the pair of registered images typically yield higher intensity residuals than those observed at background voxels, as seen in Fig. 3.4a. Sources of model bias and acquisition noise cannot be captured together in a plausible manner with a single, spatially uniform noise level. In other words, the SSD noise model lacks both the flexibility and the robustness required to cope with the various sources of discrepancy in the intensity profiles of the fixed and warped images I and J. To address this limitation, we propose to model the noise at each voxel i = 1 · · · N with a mixture of Gaussian distributions e_i ∼ Σ_{l≤L} π_l N(0, β_l^{-1}). The corresponding data-matching energy is given in Eq. (3.5), where we denote by Z_l = √(2π/β_l) the normalizing constant of the Gaussian probability distribution function.

$$\mathcal{D}_{\beta,\pi,L}(I, J; \Psi) = -\sum_{i=1}^{N} \log \sum_{l=1}^{L} \frac{\pi_l}{Z_l} \exp\left\{ -\frac{\beta_l}{2} \left( J[c_i] - I[C_i] \right)^2 \right\} \qquad (3.5)$$

Figure 3.3: We illustrate the appeal of a robust variant of the SSD image loss based on a mixture-of-Gaussians model (GMM): (a) moving image, (b) fixed image, (c) SSD loss, (d) robust loss. Images (c, d) display the output warped images obtained after registering images (a) and (b), using respectively the SSD-based likelihood and the GMM-based likelihood (section 3.2.1). The arrows point towards a specific region that highlights the limitations of the SSD: the subset of hypo-intense voxels bordering the myocardium in the fixed image has no evident counterpart in the moving image. The SSD still drives the motion towards the best matches intensity-wise, which induces an implausible tangential stretch of the myocardium. The GMM, on the other hand, incorporates a natural mechanism to downweight regions that cannot be reliably paired from image to image based on intensity values. The inferred motion is qualitatively closer to our expectations.


Figure 3.4: (Left) Example residual image after pairwise image registration, as per the example of Fig. 3.3. Artefacts and structures of variable appearance between registered images stand out in a distinct manner, much unlike ambient noise. The intensity of the cardiac muscle itself differed between the moving and fixed images. (Right) Histogram of intensity residuals, with the SSD and GMM fits overlaid respectively in red and green.

At each voxel, the residue is implicitly associated with a component of the mixture. This naturally yields a spatially varying model of noise that is better suited to render the complexity of noise patterns in medical images. Unlike previous work [Simpson 2013, Le Folgoc 2014], we do not assume that the noise varies smoothly across the image, as patterns arising from misalignment and imaging artefacts are local in nature. Moreover, the parameters L, π = {π_l}_{l≤L} and β = {β_l}_{l≤L} of the mixture will be jointly inferred from the data – so that it fits the expected distribution of intensity residuals between registered images. Fig. 3.4b shows the histogram of intensity residuals obtained after registering the images depicted in the 2D example of Fig. 3.3, along with the Gaussian mixture fit jointly during the registration process. From the standpoint of energy-based formulations, the procedure effectively learns from the data the most appropriate data-matching energy among a prior family of candidates. The respective profiles of the standard SSD loss and of the learned Gaussian mixture (GMM) loss are comparatively displayed in Fig. 3.5 for the aforementioned example. The characteristic inflexion of the GMM loss, with a reduced growth rate as the intensity residual becomes higher, is responsible for its robustness towards intensity artefacts compared to the standard SSD quadratic loss. From a computational standpoint, Eq. (3.5) fortunately admits variational quadratic upper bounds that serve as mathematically sound proxies for the exact loss and make it as convenient to use as the SSD. It is in fact handled as an iteratively reweighted (voxelwise) SSD when necessary. A similar procedure is described fully by e.g. [Archambeau 2007]. For the sake of simplicity and clarity of exposition, most of the derivations are therefore presented for the SSD loss.
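To make the iteratively-reweighted-SSD reading concrete, the following sketch (a standard EM bound on a Gaussian mixture, with array conventions of our own; the β_l are precisions) evaluates the loss of Eq. (3.5) and the effective per-voxel precisions Σ_l γ_il β_l given by the component responsibilities γ_il.

import numpy as np

def gmm_loss(r, pi, betas):
    """Eq. (3.5): -sum_i log sum_l pi_l sqrt(beta_l / 2pi) exp(-beta_l r_i^2 / 2)."""
    comp = pi * np.sqrt(betas / (2 * np.pi)) * np.exp(-0.5 * betas * r[:, None] ** 2)
    return -np.log(comp.sum(axis=1)).sum()

def gmm_voxel_precisions(r, pi, betas):
    """E-step responsibilities gamma_il, then effective per-voxel precision
    sum_l gamma_il beta_l: the weights of the surrogate reweighted SSD."""
    comp = pi * np.sqrt(betas / (2 * np.pi)) * np.exp(-0.5 * betas * r[:, None] ** 2)
    gamma = comp / comp.sum(axis=1, keepdims=True)
    return gamma @ betas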

Another limitation of the SSD, shared by all aforementioned variants, is to assume that each voxel provides an independent value of intensity. This assumption does not hold in practice, however [Simpson 2012]: the residual between the warped image I ∘ Ψ^{-1} and its counterpart J exhibits local spatial correlations, either intrinsic to the image acquisition and pre-processing (e.g. image pre-smoothing, image upsampling) or introduced as a consequence of registration misalignments. Ignoring local correlations in the noise pattern leads to an artificial increase in the number of independent observations and induces over-confidence in the data term. On the other hand, modeling the noise structure precisely would come at a significant computational cost. Instead, we follow [Simpson 2012] in artificially downweighting the SSD term by a factor α that captures redundancies in voxelwise observations, based on a virtual decimation procedure suggested by [Groves 2011].

Figure 3.5: Energy profiles for the SSD (grey dashes) and GMM (black line). The penalty is plotted as a function of the difference of intensities in the registered images. The mixture-of-Gaussians model achieves robustness by incorporating concave inflexions that result in a soft threshold on the penalty incurred for large intensity residuals.

A final fallacy in our generative model of data stems not from the way residuals are modeled, but from the later assumption that the intensity profile can be evaluated with infinite accuracy at any point in the moving image; whereas we actually rely on the interpolation of a discrete scalar field. In regions where strong intensity gradients occur between adjacent voxels (e.g. in the case of neighbouring hypo- and hyper-intense regions), ignoring interpolation errors leads to an unreasonably high sub-voxel confidence in the optimally paired coordinates C_i = Ψ^{-1}(c_i). Indeed, the region in which the intensity value I[C_i] in the moving image coincides, up to the noise level, with the value J[c_i] in the fixed image may then shrink in the gradient direction unreasonably far below sub-voxel dimensions. To account for interpolation uncertainty, [Wachinger 2014] model interpolation as a noisy observation of the hidden, true intensity value and propose an approximate scheme to marginalize over the latent variables. We opt instead for a simple scheme that couples efficiently with a Laplace approximation of the data likelihood, as explained in section 3.3.1.

3.2.2 Representation of displacements

In the context of non-rigid registration, an admissible space of transformations should be specified. In this work we restrict ourselves to a small deformation framework, Ψ^{-1} = Id + u, with a parameterized representation of the displacement field u : x ∈ R^d ↦ u(x) ∈ R^d. We constrain the displacement field to be expressed over a dictionary {φ_k}_{k=1}^M of radial basis functions, specifically Gaussian kernels φ_k(x) = K_{S_k}(x_k, x) I, where I is the d × d identity matrix and

$$K_S(x, y) = \exp\left\{ -\frac{1}{2} (x - y)^\top S^{-1} (x - y) \right\}. \qquad (3.6)$$

In other words, the displacement field u is parameterized by a set of weights w_k ∈ R^d associated to each basis φ_k:

$$u_w(x) = \sum_{1 \le k \le M} \phi_k(x)\, w_k = \phi(x)^\top w\,. \qquad (3.7)$$

φ(x) = (φ_1(x) · · · φ_M(x))^⊤ and w^⊤ = (w_1^⊤ · · · w_M^⊤) are respectively the concatenation, for k = 1 · · · M, of φ_k(x) and w_k. The basis centers x_k span a predefined regular grid of points, typically the whole range of voxel centers. The kernel width S_k is also allowed to vary and spans a user-predefined set of values S^1, S^2, · · · , S^q. This allows for a redundant, multiscale representation of displacements. In other words, we benefit both from a compact representation via larger kernels, and from the ability to capture finer local details via smaller kernels.
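A minimal sketch of this parameterization, restricted for readability to isotropic kernels (each d × d atom then reduces to a scalar K_{S_k}(x_k, x) times the identity; centers, widths and weights follow array conventions of our own):

import numpy as np

def displacement(x, centers, sigmas, w):
    """u_w(x) = sum_k K_{S_k}(x_k, x) w_k, for a batch of query points.

    x: (n, d) queries; centers: (M, d); sigmas: (M,) per-atom widths, mixing
    several scales in one dictionary; w: (M, d) weights."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (n, M)
    K = np.exp(-0.5 * d2 / sigmas[None, :] ** 2)                # K_{S_k}(x_k, x)
    return K @ w                                                # (n, d)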

This dictionary of basis functions can be seen as a finite-dimensional approximation of the space H_S spanned by Gaussian kernels of a given width S ≤ min_k S_k. Such spaces have attractive properties and are related in the literature to spaces of currents [Gori 2013]. In particular, we derive in A.1 a family of analytical, computationally efficient probabilistic priors over H_S.

3.2.3 Transformation Prior

In non-rigid registration, the displacement u_w is insufficiently constrained by the data and some regularizing prior has to be imposed over its parameters. This prior distribution encapsulates our knowledge of the deformation and our modeling assumptions (see for instance [Sotiras 2013] for an exhaustive review of deformation priors). We will consider Gaussian priors of the form

$$p(w \mid \lambda, \{A_k\}) \propto \exp -\frac{1}{2}\left\{ \lambda\, w^\top R\, w + \sum_{k=1}^{M} w_k^\top A_k w_k \right\} \qquad (3.8)$$

where λ and {A_k}_{k=1···M} are so-called hyperparameters. The motivation for such a prior is two-fold.

Regularity control. Gaussian priors in the form of Eq. (3.9) let us penalize physically implausible deformations. They have been commonly used in the literature starting with [Broit 1981], both because of their natural interpretability and soundness in mechanical terms, and because of their convenience from an algorithmic and computational standpoint.

$$q(w \mid \lambda) \propto \exp\left\{ -\frac{1}{2}\, \lambda\, w^\top R\, w \right\} \qquad (3.9)$$

The structure of the precision matrix R can be adjusted to penalize the magnitude of the first derivative [Gee 1998] or of higher-order derivatives [Rueckert 1999], effectively encoding a wide range of priors [Ashburner 2007, Ashburner 2013]. We specifically consider the subset of quadratic forms that exploit the structure of the space of Gaussian kernels introduced in Section 3.2.2; namely, priors of the type w^⊤ R w = ‖D u_w‖²_H, with D any differential operator acting on u. Details are reported in A.1. By doing so, we obtain among others an analytical, computationally efficient implementation of a membrane energy with D = ∇, a bending (thin-plate) energy with D = ∇², and a penalty favoring incompressible, divergence-free behaviours by setting D = div. Fig. 3.6 illustrates the respective impacts of such penalties.

Figure 3.6: Impact of the regularization model; from left to right: penalizing compressibility (D = div), elastic membrane energy (D = ∇), bending (thin-plate) energy (D = ∇²). Displacements are parameterized by isotropic Gaussian kernels of set width σ = 0.25. The data consists of 4 points regularly sampled on the unit circle, forming an axis-aligned square, pulled twice as far away from the origin as they initially were. The warped grid obtained by regression is displayed along with the ground-truth displacement.

Basis selection. The second factor in our prior, recalled in Eq. (3.10), induces the basis selection mechanism that we exploit in this chapter.

$$q(w \mid \{A_k\}) \propto \prod_{k=1}^{M} \exp\left\{ -\frac{1}{2}\, w_k^\top A_k w_k \right\} \qquad (3.10)$$

The additional term w_k^⊤ A_k w_k for each basis φ_k lets us penalize independently the recourse to this basis to capture the displacement, by penalizing high magnitudes of its associated weight w_k. The limit case of an infinite A_k actually constrains w_k to be null and thus forbids the use of φ_k to represent the signal. In section 3.2.6 we propose a principled way to determine optimal values of the set {A_k}_{k=1···M}, of which most turn out to be infinite: we thus obtain a sparse representation of the displacement from our initial, over-complete dictionary.


Figure 3.7: Impact of the basis scale on the inferred transform (basis scales σ = 0.25, 1.0, 4.0). From left to right, the displacement field is parameterized by isotropic Gaussian kernels of increasing width. The data consists of 8 points regularly sampled on the unit circle. The underlying motion is a rotation of π/4 radians. Only four of these eight rotated samples are displayed for readability. The scale of the bases used to represent the transform strongly affects the area of influence of the data points, as can be seen from the scale at which the regressed transform resembles a global rotation.

3.2.4 Sparse Coding & Registration

Sparsity-inducing priors have a two-fold motivation. The first benefit is in terms of algorithmic complexity. Unless one resorts to low-dimensional parametric models, the sheer size of the parametrization makes direct optimization cumbersome without recourse to sophisticated solvers. The computation of the exact covariance matrices that are typically involved in probabilistic approaches also becomes unfeasible, while the diagonal approximations used in their stead ignore significant interactions induced by the data and priors, and encoded by off-diagonal terms. Secondly, basis selection mechanisms adaptively constrain the admissible space of deformations, automatically tuning the degrees of freedom to the smallest sufficient set able to capture the observed displacement. Coupled with a multiscale set of basis functions, this yields a data-driven, automatic spatial refinement of the granularity of the displacement field that complements the otherwise scale-blind L2 regularization. Adaptive, multiscale regularization was shown to yield state-of-the-art results e.g. in denoising natural scenes [Fanello 2014]. It is also critical for medical image registration. The spread of informative image structures lacks spatial homogeneity, yet those structures must correctly drive the whole registration. Moreover, the amount of noise and artefacts may vary across the image in hardly predictable patterns – and the degree of coarseness at which the displacement is modeled should be refined accordingly. Fig. 3.7 illustrates the key impact of scale when relying on limited observations, on a simple toy example where the underlying motion is a rotation.

L1 priors have been widely and successfully used in all areas of sparse coding, including for registration [Shi 2013]. Other sparsity-inducing norms, such as k-support norms and variants [Argyriou 2012, Belilovsky 2015a], which improve over the L1 norm w.r.t. the degree of sparsity in the presence of strongly correlated explanatory variables, have recently been proposed. They were shown to be particularly attractive, including on tasks of functional MR imaging [Jenatton 2012, Belilovsky 2015b]. Here, we turn instead towards sparse Bayesian learning, with the prospect of jointly estimating the model parameters and quantifying uncertainty. For an extensive review of sparse methods, we refer the reader to the work of [Bach 2012], and to that of [Mohamed 2012] for a benchmark of L1 and Bayesian sparse learning methods. The prior of Eq. (3.10) was first introduced by [Tipping 2001] for regression and classification tasks with the so-called Relevance Vector Machine. The authors demonstrated its relevance for sparse coding when used in conjunction with the framework of Automatic Relevance Determination [MacKay 1992]. [Bishop 2000] offer an alternative sparse Bayesian learning (SBL) view on the Relevance Vector Machine, where they opt for a Variational Bayes treatment. [Wipf 2008] further investigate links between the SBL and ARD frameworks and the resulting schemes. Alternatively, Eq. (3.8) can be interpreted as a generalized spike-and-slab prior [Mitchell 1988], despite using a different parametrization, provided that each A_k is constrained to a binary state – either null or infinite.

3.2.5 Hyper-priors

In this work, we treat the hyperparameters {A, β, λ} as frequentist parameters rather than latent random variables. For a fully Bayesian formulation, prior distributions would be imposed over the model parameters. When conjugate priors are available, this yields a very elegant model over which, for instance, Variational Bayes approximate inference may be used to derive closed-form iterative updates. We refer the reader to the work of [Simpson 2012] for an application to image registration.

However, in the absence of strong prior knowledge over the values of λ or β, broad uninformative priors are typically chosen. In the limit, this effectively yields updates identical to the ones we derive with point estimates of the parameters, while simplifying the exposition. Moreover, the uniform prior on A_k has the appealing benefit of making our inference scheme invariant to a rescaling of the basis functions, as will be seen from Eq. (3.22), (3.23). This is highly desirable as we rely on a multiscale dictionary of basis functions (section 3.2.2) whose scaling factors may otherwise be hard to relate.

3.2.6 Model Inference

Our probabilistic model is summarized in a graphical manner in Fig. 3.2. The registration task now consists in inferring the displacement from the observed data and the prescribed graphical model. We describe the high-level approach in what follows. Section 3.3 presents the method from an algorithmic standpoint.

Hyperparameter inference. We first estimate the values of the model hyperparameters {A_i}, λ, β by maximizing the so-called marginal likelihood of the data p(D|A, λ, β). This framework is known as that of type-II maximum likelihood in the statistical literature.


Note that in computing the marginal likelihood, we integrate over the parameters w (these parameters are marginalized out):

$$p(D \mid A, \lambda, \beta) = \int_w p(D \mid w, \beta)\, p(w \mid A, \lambda)\, \mathrm{d}w \qquad (3.11)$$

where for convenience of notation we introduce the block-diagonal matrix A = diag(A_i).

Posterior distribution of model parameters. Given the optimal parameters A*, β*, λ* = arg max_{A,β,λ} p(D|A, λ, β), we can derive statistics of interest on the transformation Ψ by first computing the posterior distribution of its parametric representation w conditioned on the observed data D. Bayes' rule asserts that:

$$p(w \mid D, A^*, \beta^*, \lambda^*) = \frac{p(D \mid w, \beta^*)\; p(w \mid A^*, \lambda^*)}{p(D \mid A^*, \lambda^*, \beta^*)}\,. \qquad (3.12)$$

In particular, the maximum of Eq. (3.12) minimizes Eq. (3.1). This bridges the gap with the classical framework in which registration is seen as the task of optimizing a functional. In Eq. (3.12), however, the hyperparameters assume optimal values A*, β*, λ* defined w.r.t. the dataset of interest D. Moreover, the posterior distribution does not merely encode a point estimate of w: its higher-order moments relate to the variability in the inferred parameters. We consider a Gaussian approximation N(µ, Σ) of Eq. (3.12) around its posterior mode to obtain an approximate second moment of the distribution as Σ.

Predictive distribution of displacements. Our model Eq. (3.7) implies a linear relationship between the displacement value u(x) = φ(x)^⊤w at any x ∈ R^d and the model parameters w. Formally, the posterior distribution of u(x) is then given by Eq. (3.13), where θ = {A, β, λ} denotes the set of hyperparameters and δ the Dirac distribution.

$$p(u(x) \mid D, \theta^*) = \int_w \delta_{\phi(x)^\top w}[u(x)]\; p(w \mid D, \theta^*)\, \mathrm{d}w \qquad (3.13)$$

If the posterior distribution p(w|D, θ*) of w is normally distributed N(µ, Σ), u(x) is in turn Gaussian with mean φ(x)^⊤µ and covariance φ(x)^⊤Σφ(x). More generally, u|D, θ* : x ↦ u(x) is a Gaussian process with mean ū(·) = φ(·)^⊤µ and covariance

$$\mathrm{Cov}\left(u(x), u(y)\right) = \phi(x)^\top \Sigma\, \phi(y)\,. \qquad (3.14)$$
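Under the same isotropic simplification as in the earlier sketch (one scalar feature per basis, the displacement components sharing the posterior covariance Σ; the full dM × dM case is identical in form), the predictive moments of Eq. (3.13)–(3.14) read:

import numpy as np

def predictive_moments(x, centers, sigmas, mu, Sigma):
    """Posterior mean phi(x)^T mu and variance phi(x)^T Sigma phi(x) of u(x)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-0.5 * d2 / sigmas[None, :] ** 2)      # (n, M) features phi(x)
    mean = Phi @ mu                                     # (n, d)
    var = np.einsum('nk,kl,nl->n', Phi, Sigma, Phi)     # pointwise variance
    return mean, var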

Full posterior vs. frequentist posterior. Strictly speaking, we would in fact rather estimate the full posterior, where the hyperparameters have been integrated out as per Eq. (3.15). When using uniform priors over the hyperparameters, the posterior probability p(θ|D) is proportional to the marginal likelihood p(D|θ).

$$p(u(x) \mid D) = \int_\theta p(u(x) \mid D, \theta)\, p(\theta \mid D)\, \mathrm{d}\theta \qquad (3.15)$$

The full posterior generally cannot be computed without resorting to MCMC estimates. However, if the hyperparameters are well determined by the data, p(θ|D) becomes highly peaked around its mode θ*, i.e. p(θ|D) ≈ δ_{θ*}. The quality of this assumption is discussed at greater length by [Tipping 2001]. The full posterior of Eq. (3.15) is then approximately equal to the frequentist posterior of Eq. (3.13). This motivates our two-step approach of first looking for the maximum likelihood hyperparameters θ* before computing the frequentist posterior.

3.3 Inference schemes

As described in the previous section (3.2.6), our inference strategy is based on the maximization of the marginal likelihood as per Eq. (3.11). Unfortunately, the closed-form evaluation of this type of integral is generally intractable, whereas its approximation via MCMC sampling schemes can often be prohibitively costly. [Tipping 2001] notes that when the likelihood and the prior are normally distributed, the data conveniently also follows a Gaussian distribution; furthermore, the form of Eq. (3.11) becomes such that tractable maximization schemes can be derived with respect to the hyperparameters. We propose a Gaussian approximation of the likelihood term and extend the inference strategy of [Tipping 2003] to the broader class of priors described in 3.2.3.

3.3.1 Approximation of the Model Likelihood

The assumption of Gaussianity of the data given the model parameters w does not stand for registration purposes, and several strategies can be considered to approximate the model likelihood. In earlier work [Le Folgoc 2014], a Taylor expansion of the likelihood around one of its local maxima was applied, resulting in a dense block-matching step. Here instead, Laplace's method is used around the current estimate of the mode µ_MP of the posterior distribution, found by quasi-Newton (BFGS) optimization on Eq. (3.12). In other words, as in conventional energy-based registration, we numerically solve for the minimizer µ_MP of Eq. (3.16), using the current estimate of the hyperparameters A, λ, β:

$$E(w) = \mathcal{D}_\beta(I, J; \Phi w) + \frac{1}{2}\, w^\top (A + \lambda R)\, w \qquad (3.16)$$

Most notably however, Eq. (3.16) in effect only involves the sparse subset of selected bases φ_k for which A_k < +∞. Proceeding with Laplace's approximation of the data likelihood and dropping the term involving the Hessian of the image¹, we arrive at Eq. (3.17):

$$\mathcal{D}(I, J; w, \beta) \approx \frac{1}{2} \sum_{i=1}^{N} \left( t_i - \phi(c_i)^\top w \right)^\top \beta H_i \left( t_i - \phi(c_i)^\top w \right)\,. \qquad (3.17)$$

The virtual observations t_i ∈ R^d and the confidence tensors H_i ∈ M_{d×d} are given by Eq. (3.18), where C_i = c_i + u_MP(c_i) stands for the current (posterior) estimate of the pairing for c_i.

$$t_i = u_{\mathrm{MP}}(c_i) - \frac{I(C_i) - J(c_i)}{\|\nabla I(C_i)\|^2}\, \nabla I(C_i)\,, \qquad H_i = \nabla I(C_i)\, \nabla I(C_i)^\top \qquad (3.18)$$

¹ This outer product approximation is justified e.g. in [Bishop 2006].


These virtual pairings immediately relate to the optical flow: if we dropped the confidence tensors H_i, the above would yield an approximation of Eq. (3.1), (3.12) much in the spirit of the demons algorithm [Thirion 1998, Cachier 2004]. The tensors H_i vary sharply across the image, however, e.g. as edges or boundaries are crossed. They assign anisotropic, spatially varying confidence to the voxelwise pairings and account for how informative and structured the image is at the point of interest. The local approximation of Eq. (3.17) transforms an image-based criterion into a landmark-based one, and the proximity to formulations in the related literature [Rohr 2003] is indeed striking in this form.

The confidence βH_i in the virtual voxelwise pairing t_i + c_i can grow arbitrarily high for arbitrarily high intensity gradients. These expressions result from linearizing the intensity profile I around the current pairing C_i, and are blind to the interpolation uncertainty in evaluating I(C_i) and ∇I(C_i). To address this shortcoming, we propose to replace βH_i by

$$\left( (\beta H_i)^{-1} + D_{\mathrm{int}} \right)^{-1} = \frac{1}{1 + \mathrm{tr}[\beta H_i D_{\mathrm{int}}]} \cdot \beta H_i \qquad (3.19)$$

which implements a soft upper threshold on the precision, as a heuristic for interpolation uncertainty. D_int acts as a minimum covariance: it is a diagonal matrix set to the square of – say – half the voxel spacing, to prevent unreasonable subvoxel confidence.
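A minimal sketch of Eqs. (3.18)–(3.19) under array conventions of our own, with D_int = d_int² · I: virtual observations t_i, rank-one tensors H_i, and the tempered precisions.

import numpy as np
from scipy.ndimage import map_coordinates

def virtual_pairings(I, J, u_mp, beta, d_int2, eps=1e-8):
    d = J.ndim
    coords = (np.indices(J.shape).astype(float) + u_mp).reshape(d, -1)  # C_i
    I_c = map_coordinates(I, coords, order=1)
    grad = np.stack([map_coordinates(g, coords, order=1)
                     for g in np.gradient(I)])          # grad I(C_i), (d, N)
    g2 = (grad ** 2).sum(axis=0) + eps
    t = u_mp.reshape(d, -1) - (I_c - J.ravel()) / g2 * grad   # Eq. (3.18)
    H = np.einsum('in,jn->nij', grad, grad)             # H_i = grad grad^T
    temper = 1.0 / (1.0 + beta * g2 * d_int2)           # Eq. (3.19): tr[beta H_i D_int]
    precisions = beta * H * temper[:, None, None]       # soft-thresholded beta H_i
    return t.T, precisions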

3.3.2 Form of the Marginal Likelihood

We now assume that the data t is generated by corrupting the true signal u = Φw with Gaussian noise e of precision βH:

$$t = \Phi w + e \qquad (3.20)$$

Note that in block form, Eq. (3.17) yields an approximate likelihood model in the form of Eq. (3.20). Given the hyperparameters of the model, the posterior distribution of w conditioned on the data – obtained by combining the likelihood and the prior with Bayes' rule according to Eq. (3.12) – is Gaussian N(µ, Σ) with

$$\mu = \Sigma\, \Phi^\top \beta H\, t \qquad \Sigma = \left( \Phi^\top \beta H\, \Phi + \lambda R + A \right)^{-1} \qquad (3.21)$$

The integrand in Eq. (3.11) involves the product of two Gaussian factors, and by integrating out the weights w we obtain the evidence or marginal likelihood for the hyperparameters:

$$p(t \mid A, \lambda, \beta) = |\, 2\pi C \,|^{-1/2} \cdot \exp\left\{ -\frac{1}{2}\, t^\top C^{-1} t \right\} \qquad (3.22)$$

where by identification C^{-1} = βH − (βH)ΦΣΦ^⊤(βH). In other words, the distribution of the data t conditioned on the hyperparameters A, λ, β is Gaussian N(0, C). Furthermore, it follows from the Woodbury matrix identity that

$$C = (\beta H)^{-1} + \Phi\, (A + \lambda R)^{-1} \Phi^\top\,. \qquad (3.23)$$

Figure 3.8: Basis selection mechanism displayed on an example 2D registration between slices of cardiac MR images (cf. sec. 3.4.1), respectively at ES (a, moving image) and ED (b, fixed image). (d, e) Bases selected in the initial steps of the algorithm vs. at the end; the locations and scales of the Gaussian RBFs are indicated by circles (isocontour at 1 std). (c) Inverse displacement field output by the algorithm (scale factor: 2), smoothly varying across the whole image.

Interestingly, the process of setting the hyperparameters can then be seen as fitting a covariance model C to the data t via a maximum likelihood approach. Note also that the two factors in Eq. (3.22) have antagonistic effects: while the left-hand term penalizes covariance matrices C that waste mass (via |C|), the right-hand term gives an incentive to spend mass to better explain the data t. This compromise leads to sparsity, as revealed by a careful look at the form of C in Eq. (3.23). Indeed, regardless of the value of A, part of the data is explained for free by the contribution (βH)^{-1} of the noise to C; thus only a few degrees of freedom need be active (A_k < ∞) to fully explain the data.
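For illustration only, a naive-sized sketch of Eqs. (3.21)–(3.23) restricted to the active bases (dense linear algebra, conventions ours: H is the stacked precision pattern as one (Nd × Nd) matrix, Φ the (Nd × M|S|) design matrix of selected bases, and A + λR is assumed invertible):

import numpy as np

def posterior_and_evidence(Phi, H, t, A, R, lam, beta):
    """Posterior moments (mu, Sigma) and log marginal likelihood of t."""
    P = Phi.T @ (beta * H) @ Phi + lam * R + A
    Sigma = np.linalg.inv(P)                            # Eq. (3.21)
    mu = Sigma @ Phi.T @ (beta * H) @ t
    C = np.linalg.inv(beta * H) + Phi @ np.linalg.inv(A + lam * R) @ Phi.T
    sign, logdet = np.linalg.slogdet(2 * np.pi * C)     # Eq. (3.22)
    log_evidence = -0.5 * (logdet + t @ np.linalg.solve(C, t))
    return mu, Sigma, log_evidence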

3.3.3 Hyperparameter inference

Following the strategy discussed in 3.2.6, we wish to maximize the marginal likelihood (Eq. (3.22)), or equivalently its logarithm, with respect to the model hyperparameters. We consider schemes that monotonically converge towards a local maximum by iteratively maximizing, or merely increasing, the evidence w.r.t. either one of the hyperparameters A, β or λ.

Maximization w.r.t. A | a hill-climbing scheme. The procedure relies on a sequence of additions and deletions of candidate basis functions, starting from an empty set S = ∅ of active bases (all A_k's set to ∞). This notably avoids the O(M³) cost that would stem from the matrix inversion in Eq. (3.21) if all M bases were included in the active set S at the start (all A_k < ∞). At each iteration, we take a single action among the addition of a previously inactive basis (k ∉ S), or the update or deletion of an active one (k ∈ S), in a principled way. Specifically, we implement the action that leads to the largest gain in evidence. This is possible because, when all other hyperparameters A_{−k}, λ, β are fixed, the contribution l(A_k) of a given basis to (the logarithm of) the evidence, for any value of its associated hyperparameter A_k, can be singled out in the form of Eq. (3.24). Details about the statistics involved, κ_k, s_k and q_k, and about the maximization of l(A_k) are left to A.2.

$$l(A_k) = \log |A_k + \kappa_k| - \log |A_k + \kappa_k + s_k| + q_k^\top \left\{ A_k + \kappa_k + s_k \right\}^{-1} q_k \qquad (3.24)$$

Naturally, as bases are added to or removed from the active set S via updates of their associated hyperparameter, the potential contribution max_{A_k} l(A_k) of the other bases is subject to change. In practice, the statistics κ_k, s_k and q_k indeed depend on A_{−k}, λ, β via the moments µ and Σ of the posterior distribution (Eq. (3.21)). Therefore, after updating a given A_i, we exploit rank-one matrix identities to update these moments with a complexity of at most O(|S|²) (A.4). We then update all the κ_k, s_k and q_k with a complexity of O(|S|) per basis. Therefore, the worst-case complexity of the updates is O(|S|² + M|S|), as opposed to O(M³) for EM updates. The decrease in complexity is rather extreme, as typical respective orders of magnitude of |S| and M for registration would be 10² and 10⁶.

To gain a better intuition on how this scheme proceeds, let us look back at the quantities involved in Eq. (3.24). Firstly, $q_k$ captures the relevance of $\phi_k$ towards providing a better explanation for the data. It is related to the projection of the residual on the basis $\phi_k$. When L2 regularization is used (λ > 0), $q_k$ also involves the projection of the residual on nearby active bases $\phi_j$ ($j \in S$). Secondly, $s_k$ captures the amount of overlap between the basis $\phi_k$ under consideration and those already in the active set S, therefore accounting for competition between strongly correlated bases. $\kappa_k$ arises from the added regularization $\exp -\frac{1}{2}\lambda w^\top R w$, which was absent in the regular RVM (λ = 0). The sum $A_k + \kappa_k$ entering Eq. (3.24) highlights the competition between the shrinkage (sparsity-inducing) mechanism due to $A_k$ and the L2-norm energy regularization that is accounted for by the quantity $\kappa_k$.
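To make the role of these statistics concrete, the short sketch below evaluates the per-basis contribution of Eq. (3.24) for one candidate basis and compares the best achievable gain against pruning. It is a minimal illustration, assuming scalar statistics $\kappa_k$, $s_k$, a small vector $q_k$ and a grid search in place of the closed-form maximizer of A.2; all numerical values are hypothetical.

import numpy as np

def basis_log_evidence(A_k, kappa_k, s_k, q_k):
    # Contribution l(A_k) of basis k to the log-evidence, Eq. (3.24),
    # for scalar A_k, kappa_k, s_k and a small vector q_k.
    return (np.log(A_k + kappa_k)
            - np.log(A_k + kappa_k + s_k)
            + q_k @ q_k / (A_k + kappa_k + s_k))

# Hypothetical statistics for one candidate basis:
kappa_k, s_k, q_k = 0.5, 2.0, np.array([1.3, -0.4])

# l(A_k) -> 0 as A_k -> infinity (pruned basis), so a positive gain
# at the best finite A_k favours adding or updating the basis.
grid = np.logspace(-3, 6, 200)
gains = np.array([basis_log_evidence(A, kappa_k, s_k, q_k) for A in grid])
best = int(np.argmax(gains))
action = "add/update" if gains[best] > 0 else "prune"
print(f"best A_k = {grid[best]:.3g}, gain = {gains[best]:.3f} -> {action}")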

Maximization w.r.t. λ and β | EM updates. We estimate the hyperparameters λ, β using Expectation-Maximization updates. Instead of directly maximizing our target criterion, the marginal likelihood p(t|A, λ, β), EM proceeds by maximizing a surrogate quantity, the expected complete log-likelihood:

$$\arg\max_{\lambda, \beta} \; \mathbb{E}_{p(w|\mathcal{D}, A^*, \beta^*, \lambda^*)} \left[ \log p(t, w \,|\, A^*, \lambda, \beta) \right] . \qquad (3.25)$$

The expectation is taken with respect to the current estimate of the posterior distribution $p(w|\mathcal{D}, A^*, \beta^*, \lambda^*)$ of w; $\beta^*$, $\lambda^*$ are quantities fixed at their current values, as opposed to variables to optimize. In practice, we iterate between re-estimation of the noise level β and of the regularization trade-off λ, fixing the other parameter in turn. The EM scheme is appealing as it guarantees an increase in marginal likelihood, despite maximizing a surrogate. Furthermore, when the distributions involved are Gaussian, the expected complete log-likelihood (3.25) can be expressed directly in terms of the mean and covariance matrix of the Gaussian distributions. For the regularizing trade-off λ, this leads to maximizing the smooth, convex function:

$$\lambda^* = \arg\max_{\lambda \ge 0} \; -\frac{\lambda}{2}\, \mathrm{tr}(\Sigma R) + \frac{1}{2} \log |A + \lambda R| - \frac{\lambda}{2}\, \mu^\top R \mu \, . \qquad (3.26)$$

The solution can be found numerically in inexpensive $O(|S|)$ iterations after a single singular value decomposition in $O(|S|^3)$. Alternatively, if we restrict $A_k$ to be either null or infinite as suggested in section 3.2.4, we obtain the simpler analytical update:

$$|S| \cdot \lambda^{*\,-1} = \mu^\top R \mu + \mathrm{tr}(\Sigma R) \, . \qquad (3.27)$$

Leaving optimization details aside, we can instead gain some insight into the EM update by examining the solution of the constrained optimization problem Eq. (3.26). Setting aside the case where the constraint is active (λ* = 0), the solution λ* cancels the gradient of the function to maximize. From a careful analysis of the terms involved in the gradient, the following identity holds at λ*:

$$\mathbb{E}_{p(w|A^*, \lambda^*)} \left[ w^\top R w \right] = \mathbb{E}_{p(w|\mathcal{D}, A^*, \beta^*, \lambda^*)} \left[ w^\top R w \right] \qquad (3.28)$$

λ* is chosen such that a sample drawn from the updated prior has, on average, the same energy as a sample drawn from the inferred posterior. In other words, the EM update modifies the strength λ of the prior to best reflect what has been inferred from the data. The noise level β is updated according to Eq. (3.29). It accounts for a bias between the estimated and true displacements, and for the variance in the estimated displacement.

$$N \cdot \beta^{*\,-1} = \|t - \Phi\mu\|_H^2 + \mathrm{tr}(\Sigma\, \Phi^\top H \Phi) \, . \qquad (3.29)$$

In other words, $\beta^{*\,-1}$ is set to the expected error (averaged over pixels) if sampling from the current estimate of the posterior.
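Both updates translate directly into a few lines of linear algebra. The sketch below implements Eq. (3.27) and Eq. (3.29) under assumed conventions (µ and Σ restricted to the active weights, Φ the interpolation matrix, H the likelihood curvature); function and variable names are ours, for illustration only.

import numpy as np

def em_update_lambda(mu, Sigma, R):
    # Analytical EM update of Eq. (3.27), valid when the A_k are restricted
    # to {0, +inf} (binary gates); mu, Sigma are the posterior mean and
    # covariance over the active weights, R the regularization matrix.
    size_S = mu.shape[0]
    return size_S / (mu @ R @ mu + np.trace(Sigma @ R))

def em_update_beta(t, Phi, mu, Sigma, H):
    # EM update of the noise level, Eq. (3.29): beta^-1 is the expected
    # (H-weighted) residual error under the current posterior.
    r = t - Phi @ mu
    return t.shape[0] / (r @ H @ r + np.trace(Sigma @ Phi.T @ H @ Phi))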

3.3.4 Algorithmic overview

Algorithm 1 summarizes our pipeline for registration. The algorithm mostly works by iterative refinements of the subset of active basis functions $\phi_j$, $j \in S$, with fast updates, based on an approximate likelihood model around the current mode $\mu_{MP}$ of the displacement parameters. Every now and then, the noise and regularization parameters λ, β are updated, which makes it necessary to recompute every statistic in full. Before doing so, we re-estimate the posterior mode $\mu_{MP}$ from the true likelihood model and the current (reduced) set S of active bases, and recompute the likelihood approximation around this mode. To further accelerate the pipeline, the scheme can be coupled with a multiresolution pyramidal scheme, starting with downsampled (smoothed) versions of the images I and J and progressively moving through the pyramid of images to the images I and J at full resolution.


Algorithm 1: Sparse Bayesian registration algorithm
1: Initialize $A_k = \infty$ for all k ($S = \emptyset$)
2: Initialize µ = 0 and Σ = 0
3: repeat
4:   Update β according to Eq. (3.29).
5:   Find the posterior mode $\mu_{MP}$ of Eq. (3.12) for the current values of A, λ, β by quasi-Newton (BFGS) search.
6:   Compute a quadratic approximation of the likelihood around $\mu_{MP}$: Eq. (3.17), (3.18).
7:   Recompute µ (= $\mu_{MP}$) and Σ in full from Eq. (3.21).
8:   Recompute $q_k$, $s_k$ and $\kappa_k$ in full from Eq. (A.31), (A.32) and Eq. (A.33), (A.34), for all k.
9:   for p iterations do
10:    ∀k ∈ S (resp. k ∉ S), compute the gain $\max_{A_k} \Delta l(A_k)$ in marginal likelihood obtained by updating or deleting k from S (resp. adding k to S), using Eq. (3.24) and A.2.
11:    Select the most favorable action i s.t. $\max_{A_i} \Delta l(A_i) \ge \max_{A_k} \Delta l(A_k)$ for all k.
12:    Set $A_i^* = \arg\max_{A_i} l(A_i)$ and update S.
13:    Update µ, Σ via rank-one matrix identities from Eq. (A.22), (A.25), (A.24).
14:    Update $q_k$, $s_k$ and $\kappa_k$ for all k using rank-one updates, e.g. Eq. (A.35), (A.36) and Eq. (A.33), (A.34).
15:  Update λ according to Eq. (3.26).
16: until no action leads to a significant increase in marginal likelihood.
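For concreteness, the following self-contained toy mimics the add/delete loop of Algorithm 1 in the binary-gate variant ($A_k \in \{0, \infty\}$): a 1D signal is fit with a two-scale Gaussian RBF dictionary by greedily taking the single addition or deletion that most increases the exact log-evidence. The smoothness matrix R is crudely surrogated by the RBF Gram matrix, β and λ are held fixed rather than re-estimated, and the evidence is recomputed naively instead of via the rank-one updates above; it is a sketch of the selection mechanism, not the thesis implementation.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Toy 1D "registration" signal: noisy samples of a smooth displacement.
x = np.linspace(0, 1, 100)
t = np.sin(2 * np.pi * x) * np.exp(-3 * x) + 0.05 * rng.standard_normal(x.size)

# Dictionary of Gaussian RBFs at two scales, centers on a regular grid.
centers = np.linspace(0, 1, 25)
Phi = np.column_stack([np.exp(-(x[:, None] - centers) ** 2 / (2 * s ** 2))
                       for s in (0.2, 0.05)])
M, beta, lam = Phi.shape[1], 1 / 0.05 ** 2, 1e-2   # fixed hyperparameters

def log_evidence(S):
    # log p(t | S) with binary gates: active weights get a Gaussian prior
    # N(0, (lam * R_S)^-1); the RBF Gram matrix stands in for the true
    # smoothness matrix R_S of <D phi_k | D phi_l> coefficients.
    if not S:
        return multivariate_normal.logpdf(t, cov=np.eye(x.size) / beta)
    P = Phi[:, S]
    R_S = P.T @ P + 1e-8 * np.eye(len(S))
    C = np.eye(x.size) / beta + P @ np.linalg.solve(lam * R_S, P.T)
    return multivariate_normal.logpdf(t, cov=C)

S, current = [], log_evidence([])
while True:   # greedy add/delete, one action per iteration
    candidates = [S + [k] for k in range(M) if k not in S]
    candidates += [[j for j in S if j != k] for k in S]
    scored = [(log_evidence(c), c) for c in candidates]
    best, S_best = max(scored, key=lambda p: p[0])
    if best <= current + 1e-6:
        break
    current, S = best, S_best
print(f"{len(S)} active bases out of {M}, log-evidence {current:.1f}")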

3.4 Experiments & Results

We experiment with the proposed sparse Bayesian framework on tasks of cardiac motion tracking. The goal in cardiac motion tracking is to accurately recover the hidden motion of the cardiac muscle over the course of the cardiac cycle from a time series of 3D images. The first experimental setup aims at clarifying and analyzing the empirical behaviour of the proposed algorithm and focuses on a simple example of 2D pairwise registration. All other sections involve full 3D+t motion tracking on various imaging modalities, namely cine SSFP sequences, tagged sequences and a synthetic ultrasound dataset. Fig. 3.9 displays example 2D slices from frames of each modality.

Figure 3.9: Example slices for the cardiac imaging modalities that we experiment on. (Left) 3D tagged MR image. (Middle) 3D cine SSFP MR image. (Right) 3D echocardiographic image. In all experiments the same flexible model of registration is applied successfully, regardless of the artefacts and patterns peculiar to each modality.

The experimental setup is identical across all modalities, and indeed we aim to demonstrate that the proposed framework is flexible enough to adapt seamlessly to the peculiarities of each dataset. For all 3D experiments, the multiscale parametrization of the displacement field consists of isotropic Gaussian kernels at two scales, of respective variance $S_1 = 20^2\,\mathrm{mm}^2$ and $S_2 = 10^2\,\mathrm{mm}^2$, plus an anisotropic Gaussian kernel of variance $10^2\,\mathrm{mm}^2$ in the short-axis plane and $20^2\,\mathrm{mm}^2$ along the long axis. Indeed, since our framework imposes no restriction on the parametrization of the displacement field, a natural way to put this advantage to use is to introduce anisotropic bases of potential anatomical relevance. All scales are jointly optimized over. As explained in section 3.3.4, the multiscale representation of the displacement field is coupled with a classic multiresolution, pyramidal scheme on the images of interest themselves: they are downsampled by a factor 2 (and smoothed) at three different resolutions. In other words, the lowest resolution level is subsampled by a factor of 4 compared to the original image. Finally, we use a bending (thin-plate) energy as a regularizer (Section 3.2.3) and we constrain the hyperpriors $A_k$ to a binary state (sections 3.2.4 and 3.3.3) to prevent competition between two types of regularization: the regularity induced by λR on the derivatives of the displacement and that induced by $A_k$ on the magnitude of specific weights $w_k$. Note also that our registration scheme does not make use of pre-segmentations of regions of interest, which could be challenging or otherwise impractical to obtain.
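A minimal sketch of such a three-level pyramid is given below, assuming Gaussian smoothing followed by factor-2 subsampling; the smoothing width is a placeholder value, not the setting used in our experiments.

import numpy as np
from scipy.ndimage import gaussian_filter

def image_pyramid(img, levels=3, factor=2, sigma=1.0):
    # Classic coarse-to-fine pyramid: smooth, then subsample by `factor`.
    # With levels=3 the coarsest level is subsampled by factor**2 = 4
    # relative to the original image, as in the text.
    pyramid = [img]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(smoothed[tuple(slice(None, None, factor)
                                      for _ in range(img.ndim))])
    return pyramid[::-1]   # coarsest first, for coarse-to-fine registration

I = np.random.rand(128, 128, 32)   # stand-in for a 3D cardiac frame
for level, im in enumerate(image_pyramid(I)):
    print(level, im.shape)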

3.4.1 Self-tuning registration algorithm: an analysis

As explained in the previous sections, the parameters introduced in the model of registration are inferred jointly during the course of the algorithm. We now leverage the example 2D registration of Figs. 3.3 and 3.8 to provide some insight into our schemes and analyze the convergence of key parameters throughout iterations.

Basis selection & regularization. The algorithm proceeds by iteratively adding dictionary bases into, or deleting them from, the active parametrization of the displacement field. Every few iterations, the noise model and regularization level λ are re-adjusted based on the current estimate of the displacement and its uncertainty. Fig. 3.10 demonstrates how basis selection mechanisms empirically combine with parameter re-estimation throughout iterations to provide a seamless convergence towards a reasonable local minimum.

Figure 3.10: Basis selection mechanism and its coupling with the jointly estimated regularization level, across iterations. (Top) Addition, update or deletion of dictionary bases in the active parametrization of the displacement field across iterations. Three distinct scales are used in the representation of displacements (1 curve per scale). (Bottom) Regularization parameter λ, updated every few iterations, plotted against the number of iterations run since the beginning of the registration.

The regularization level λ is initialized thanks to a heuristic that provides a "very large" value for the registration at hand. Initially this effectively prevents the addition of finer dictionary bases in the model, whose impact on the signal regularity is too high at this stage. Instead coarse bases are favoured, which capture the global trends in the observed displacement. The regularization level is consequently refined to reflect the actual regularity in this inferred displacement. As λ decreases towards a more sensible value, finer bases get incorporated in the active set to capture finer local details of the visible motion, or to ensure that these finer details of the inferred motion blend smoothly with the rest of the displacement field. In case of significant overlap between a subset of fine bases and a coarse basis, the basis at the coarsest scale may automatically be removed, as it no longer contributes towards a better explanation of the data. Towards the last iterations, algorithmic convergence has roughly been reached. This is evidenced by the plateau value of λ and by the fact that most actions consist in small updates of active bases (specifically, their orientation) rather than in the addition or deletion of dictionary bases.

Fig. 3.8 further illustrates this mechanism of basis selection, with the location of bases in the active parametrization being depicted at two points in time: in the initial steps of the algorithm and at the end.

Noise model estimation. The noise model is jointly estimated over the course of the algorithm. In all experiments it displayed a fast convergence towards its final inferred distribution. This is exemplified by Fig. 3.11, where the probability density functions corresponding to the Gaussian mixture inferred at an early iteration and at the final iteration are hardly distinguishable. This partly reflects the fact that our registration experiments involve small displacements, with the coarse patterns in the underlying motion being captured early on. The Gaussian mixture is also quick to adapt to changes in the distribution of intensity residuals that arise from our multiresolution pyramidal scheme, when hopping from a smoothed downsampled image to the next level in the pyramid of images, as seen from Fig. 3.12. The jumps from a coarser image resolution to a finer image resolution occur at iteration 10 and iteration 20.

Figure 3.11: Inferred noise model, i.e. inferred expected distribution of intensity residuals between the fixed and warped images. The learned distribution is comparatively shown at an early stage in the registration (black dashed line) and at convergence (green solid line). The curves are nearly indistinguishable, hinting towards fast convergence in this example. The zoom on the distribution tails still evidences a slightly lower noise level at the end of registration, as intuitively expected.

Figure 3.12: Inferred noise model at the beginning of registration and at the end. The registration scheme relies on a multiresolution representation of registered images, jumping from a coarse resolution to a finer resolution at predetermined iterations. The inferred distribution of intensity residuals is shown for the lowest resolution image (light green) and finest resolution image (dark green). As expected, the noise model learned on downsampled, smoothed images has a higher probability of low noise (higher peak around 0) but also of high noise (higher tails), due to the increased misalignment at the beginning of registration.

Robustness w.r.t. initialization. Beyond the seamless convergence of model parameters over the course of the algorithm, one may also wonder whether their estimated final value displays consistency across a range of possible initializations. Fig. 3.13 provides evidence towards the empirical robustness of the estimated level of regularity λ w.r.t. its initial value. The final value of λ typically varies by significantly less than an order of magnitude when initialized from a range of values covering several orders of magnitude.

Figure 3.13: Robustness of the inferred regularity level w.r.t. its initial estimation. The 2D registration is run 3 times, initialized each time with a differing level of regularity λ (respectively $10^4$, $10^6$, $10^8$). Each curve shows the evolution of λ over the course of the associated run. While the initial value of λ spans 4 orders of magnitude, its final estimate varies by at most a factor of 4 across runs.

3.4.2 Synthetic 3D Ultrasound Cardiac Dataset

We first demonstrate our approach on synthetic sequences of 3D ultrasound data provided as part of a registration challenge organized for the 2012 MICCAI workshop on Statistical Atlases and Computational Models of the Heart (STACOM). Details on the challenge methodology can be found in [De Craene 2013] along with participant results. These synthetic images count approximately 10 million voxels each, at an extremely fine isotropic resolution of 0.33 mm. Without further optimization of our code w.r.t. RAM management, we had to downsample them by a factor of 2 to process them. We thus worked at a resolution of 0.66 mm at the finest level.

Figure 3.14: Ground truth mesh (green transparent surface) vs. reference mesh transported via registration (overlaid black wireframe), shown at the end-diastole (left) and end-systole (right) frames. The extrapolated motion out of the field of view (where the arrow points) remains close to the ground truth. The maximum error does not exceed 4 mm. Best seen by zooming in.

The appeal of this benchmark is to offer a dense ground truth in terms of motion and strain inside the cardiac muscle, as displacements in the myocardium are directly prescribed from the output of an electromechanical model of the heart as part of the workflow of image generation. For each sequence of images, the ground truth consists of a sequence of meshes of the left and right ventricles deformed over the cardiac cycle. The data extracted from such ground truth meshes can be compared to that obtained by deforming the mesh at a reference time point (namely, end diastole) throughout the cardiac cycle with the transformation output by the proposed registration approach. Prior to any quantitative assessment, we first note that the visual, qualitative behaviour of the proposed approach was found to be satisfactory, even in terms of extrapolation: the inferred motion remained consistent in areas of the right ventricle that fell outside of the field of view (Fig. 3.14). This indicates an effective regularization mechanism, despite its being automatically tuned.

We evaluate the accuracy of our approach on a first subset of sequences that image the same motion at various Signal-to-Noise Ratios (SNRs). Because the proposed approach infers a consistent motion both inside and outside of the field of view, we find it natural to assess its accuracy from statistics based on the whole mesh. This slightly departs from the methodology of [De Craene 2013], where only part of the left ventricle is considered. Fig. 3.15 reports the median point-to-point error in the inferred displacement for each time frame, where the median statistic is computed from every node in the mesh. At the best SNR, the highest error is observed around end systole with a median of 0.83 mm, although the spread of error values becomes wider in the last frames. This falls in the same range as that reported for challenge participants by [De Craene 2013], although slightly higher than the most accurate methodology.

Figure 3.15: Accuracy benchmark on the 3D US STACOM 2012 normal dataset, reporting the median tracking error over time for varying SNRs (blue, green, red curves). For the reference SNR (in blue), overlaid quartiles (boxplots) depict the dispersion of errors.

Of course, part of the error is likely attributable to the use of downsampled, smoothed images with a resolution of 0.66 mm as opposed to 0.33 mm. Besides, as the signal-to-noise ratio degrades, we observe as expected a global trend of increased error magnitude. As seen from Fig. 3.16, the SNR impacts the noise model learned by the proposed approach (with a higher prevalence of small intensity residuals at high SNR), which in turn becomes more conservative in its estimates of displacements as the SNR degrades.

Figure 3.16: Evolution of the inferred noise model for increasing Signal-to-Noise Ratios.


Figure 3.17: Bull's eye plots of the radial, longitudinal and circumferential strain components at end-systole, averaged over AHA segments: estimated (top) and ground truth (bottom). A healthy case (left) and an ischemic case (right, LCX) are reported.

Fig. 3.17a reports (Green-Lagrangian) strain measures at end systole, averaged over AHA segments. This provides indirect evidence of the relevance of the automatically tuned regularity level λ and of the displacement parametrization. Ground truth values of strain obtained from the corresponding ground truth mesh are compared to those estimated from the output of registration. Variations in the strain across segments are generally well captured, even more so for its longitudinal and circumferential components. Similarly to most methodologies however, the radial strain, which captures the thickening of the muscle during the contraction, appears to be globally somewhat underestimated in the left ventricle. Such bias in the radial strain might result from a slight bias in the estimated placement of the endo- and/or epicardium. This might indicate a slightly conservative estimate of displacements, due to a coarse parametrization or an over-regularized transformation. Table 3.1 provides statistics on the number of bases of each scale used for the parametrization of the displacement field, for the normal case at highest SNR. The number of active bases on these sequences is typically smaller than that used in our experiments on cine and tagged data, with a lesser reliance on fine-scale bases. It may evidence increased conservatism in the estimated displacements, as well as indicate greater regularity of the synthetic ground truth motion.

Basis type     | Median # (Q1 – Q3)
σ = 20 mm      | 17.5 (14.25 – 19)
anisotropic σ  | 15 (11.25 – 20.5)
σ = 10 mm      | 34 (31.25 – 38)
Total          | 64.5 (60 – 71)

Table 3.1: Number of bases at each scale in the active parametrization of the displacement field (pooled over all frames in the sequence). Median, first and third quartiles are reported.
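The strain measures discussed above are standard continuum-mechanics quantities; as a reminder, the sketch below computes a Green-Lagrange strain tensor $E = \frac{1}{2}(F^\top F - I)$ from the spatial gradient of the displacement and projects it onto an anatomical direction. The displacement gradient and the local radial axis are hypothetical values, not data from the benchmark.

import numpy as np

def green_lagrange_strain(grad_u):
    # Green-Lagrange strain tensor E = 1/2 (F^T F - I), with the
    # deformation gradient F = I + grad_u (grad_u: 3x3 matrix).
    F = np.eye(3) + grad_u
    return 0.5 * (F.T @ F - np.eye(3))

def strain_along(E, direction):
    # Strain component along a unit direction (e.g. the local radial,
    # circumferential or longitudinal axis of an AHA segment).
    d = direction / np.linalg.norm(direction)
    return d @ E @ d

# Hypothetical displacement gradient at one myocardial point:
grad_u = np.array([[0.10, 0.02, 0.00],
                   [0.01, -0.05, 0.00],
                   [0.00, 0.00, -0.03]])
E = green_lagrange_strain(grad_u)
print("radial strain:", strain_along(E, np.array([1.0, 0.0, 0.0])))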


The benchmark also provides datasets that aim at reproducing pathological cardiac function, including a case where certain AHA segments become quasi-akinetic due to ischemia. Fig. 3.17b summarizes estimated regional strains for this case, with qualitative retrieval of the ischemic segments (bolded contours), as emphasized by the comparison with the normal case. The accuracy on the ischemic case is similar to that of the normal case at identical SNR, with a median error at end systole of 0.80 mm.

3.4.3 STACOM 2011 tagged MRI benchmark

In 2011 the MICCAI workshop on Statistical Atlases and Computational Models of the Heart (STACOM) proposed a cardiac motion analysis challenge. The challenge datasets, aimed at evaluating the accuracy of motion tracking algorithms, are openly hosted by the Cardiac Atlas Project. The data includes a set of 15 sequences of 3D tagged MR images. Fig. 3.9 (Left) shows an example slice for such an image. The grid-like tags overlaid on the region of interest make it possible to follow the motion of keypoints on the boundary of, or inside, the cardiac muscle. Each sequence in the dataset thus comes with a corresponding set of 12 landmarks, the motion of which was manually tracked over time. The landmarks are typically located where tags intersect, and divided into three groups of 4 points in the basal, mid-ventricular and apical areas of the left ventricle. Details of the experimental setting along with challenger results are provided and analyzed by [Tobon-Gomez 2013].

Figure 3.18: Accuracy benchmark on the 3D tag STACOM 2011 dataset, reporting boxplots of tracking errors for all methodologies. The dotted black line represents the average inter-observer variability.

We validate our approach on this dataset of real 3D tagged MR sequences. Manual landmarks serve as ground truth, from which the accuracy of our methodology at End-Systole (ES) is assessed. Fig. 3.18 summarizes challengers' results along with ours. The proposed approach achieves state-of-the-art results on this benchmark, with a median accuracy of 1.46 mm. As a point of comparison, the variability in the landmark tracking was estimated at 0.84 mm as part of the challenge methodology. We perform two simple statistical tests to quantify the statistical significance of the increase in accuracy of our methodology compared to the challengers': a pairwise Student-t test and a pairwise Kolmogorov-Smirnov test. The tests are run for each pair of samples involving the proposed approach against a challenge participant's. The Student-t test aims at detecting significant differences in the true mean error of our method versus a challenger's, whereas the Kolmogorov-Smirnov test more generally aims at detecting whether the underlying distributions of errors differ. Figures are reported in Table 3.2 and provide some evidence towards a significant improvement over at least 3 of the 4 methodologies.

Challenger  | Student-t p-value | KS p-value
iLogDemons  | < 2.2 · 10⁻¹⁶     | < 2.2 · 10⁻¹⁶
MEVIS       | 0.0099            | 0.1385
IUCL        | < 2.2 · 10⁻¹⁶     | < 2.2 · 10⁻¹⁶
UPF         | 2.45 · 10⁻⁵       | 0.00024

Table 3.2: Statistical significance of the increase in accuracy on the STACOM 2011 3D motion tracking challenge. We report p-values of pairwise tests for the proposed approach versus each participant's. Bolded values highlight significant improvements at the 5% significance level.
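As a hedged illustration of how such pairwise tests could be run with standard tooling, the snippet below applies a paired Student-t test and a two-sample Kolmogorov-Smirnov test; the per-landmark error samples are synthetic stand-ins, not the challenge data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical per-landmark tracking errors (mm) for two methods:
ours = np.abs(rng.normal(1.46, 0.5, size=180))
challenger = np.abs(rng.normal(1.80, 0.6, size=180))

# Paired Student-t test on the mean error (samples paired per landmark):
t_stat, t_pval = stats.ttest_rel(ours, challenger)
# Two-sample Kolmogorov-Smirnov test on the error distributions:
ks_stat, ks_pval = stats.ks_2samp(ours, challenger)
print(f"Student-t p = {t_pval:.2g}, KS p = {ks_pval:.2g}")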

Rather than over-emphasizing the reach of these statistical tests, we would like the reader to appreciate the ability of the proposed formulation to achieve, in a quasi-automatic manner, results qualitatively and quantitatively on par with the state of the art, as demonstrated on this benchmark. In particular, it should be emphasized that all parameters involved in the proposed formulation (the noise model and regularity level λ, the active parametrization of the displacement field) were automatically determined during registration. All 3D registration experiments reported in this section involve little user interaction, and in effect run from the same model and settings.

Figure 3.19: Strain at ES, computed from the 3D tag data of volunteer V9.


Interestingly, the strain maps and mesh deformations produced by our scheme, as illustrated for instance in Fig. 3.19, also appear to be qualitatively on par with the best challenge results in that respect, and superior to those of the closest competing methodology accuracy-wise (please refer to [Tobon-Gomez 2013] for a direct counterpart to Fig. 3.19). This hints towards the fact that the automatically adjusted weights of the data energy versus the regularization energy, beyond their theoretical grounds, are of practical relevance. Finally, we report in Table 3.3 the number of bases of each scale used for the parametrization of the displacement field.

Basis type     | Median # (Q1 – Q3)
σ = 20 mm      | 17 (14.75 – 20)
anisotropic σ  | 30 (25 – 33.25)
σ = 10 mm      | 100 (90 – 110.25)
Total          | 148 (134 – 160)

Table 3.3: Number of bases at each scale in the active parametrization of the displacement field (pooled over all sequences and all frames). Median, first and third quartiles are reported.

3.4.4 Cine MRI dataset: qualitative results and uncertainty

In addition to tagged MRI sequences, the STACOM 2011 challenge datasets include 15 sequences of cine SSFP MR cardiac images. For each of the 15 volunteers, the left and right ventricles are imaged in 3D, over 30 frames covering the cardiac cycle. Original images had a low inter-slice resolution of 8 mm compared to the in-plane resolution of 1.25 mm, and we upsampled them (typically by a factor of 5) prior to the registration process to prevent a degradation of numerical accuracy. Obtaining a ground truth by direct manual tracking of landmarks over time was deemed difficult for this image modality. Instead, the accuracy of the proposed algorithm was evaluated by cross-comparison with direct 3D+t segmentation results. Specifically, the endocardium was delineated over time on 2D slices using the freely available software Segment² [Heiberg 2005], yielding a 3D point set of discretized contours. A 3D surface was then reconstructed as the zero level set of a signed distance map computed by radial basis interpolation, after estimating the normal to the surface at every point in the set from a local neighborhood³. We then assessed the discrepancy between the reference end-diastole segmentation transported over time via the output of registration, and the surface estimated by direct segmentation of the endocardium at each time step.

Fig. 3.20 summarizes the distribution of errors over time, pooled over all 15 sequences and all contour points, displaying the evolution of key quantile-based statistics. The median error reaches a satisfactory maximum of 1.82 mm for frame 10, which roughly coincides with the end-systole time for all volunteers. As a point of comparison, the volumes under consideration have a spacing of 1.25 mm in the short-axis plane (i.e. within slices) and 8 mm along the long axis (i.e. inter-slice).

² http://segment.heiberg.se
³ http://hdl.handle.net/10380/3149

Figure 3.20: Accuracy benchmark on the cine SSFP STACOM 2011 dataset, reporting median error over time along with quartiles. Surfaces reconstructed from slice-by-slice 3D+t segmentation serve as ground truth. Points on the discrete contours delineated at time 0 are transported over time with the registration output, and point-to-surface distances are gathered. For each time step, errors over all 15 sequences and all contour points are pooled. Errors at time 0 are induced by the surface reconstruction.

The wide spread of error values partly reflects the challenge in obtaining a 3D segmentation of the endocardium that remains consistent over time (e.g. due to the variable appearance of papillary muscles). Misalignment of short-axis slices in 3D volumes, which may arise from the (slice-by-slice) image acquisition process, also accounts for some of the largest discrepancies. We observed no evident spatial pattern in the distribution of errors, although the segmentation rarely reached the very tip of the apical region.

The mixture-of-Gaussians noise model proved adequate here again, as distinct components captured variations in the level of noise of an order of magnitude (a factor of 10 between the standard deviations of extreme components). Indeed, regions of interest change appearance over time and motion, and tend to be assigned higher noise levels than the baseline acquisition noise level. Voxels in basal slices, with visible outflow tracts and apparent topology changes, also tend to fall in the noisiest components. Finally, Fig. 3.21 attests to the high variability (several orders of magnitude) of the optimal model parameter λ for varying sequences and time steps, which would render its manual estimation via a trial-and-error or cross-validation approach cumbersome. The apparent bimodality of the histogram might reflect the fact that cardiac phases with significant contraction or relaxation, around end systole, alternate with phases of lesser motion around end diastole.

Figure 3.21: Histogram of inferred values for the regularity hyperparameter λ, pooled over all 15 sequences and 30 frames per sequence.

Finally, because of the strong anisotropy in spacing (lower inter-slice resolution) that is characteristic of the image acquisition process, cine MR data exemplifies the necessity of modeling uncertainty in the interpolation of discrete intensity profiles (cf. sections 3.2.1 and 3.3.1). When not accounting for it, the regularity of the inferred transform was systematically found, upon visual inspection, to be of lesser quality in the direction of lower resolution. This behaviour is to be expected if the scheme is blind to its increased reliance on interpolation to find locations of matching intensity values in the moving image. Image upsampling prior to registration also relies on interpolation and was accounted for in an identical manner. Table 3.4 reports statistics on the number of bases of each scale used for the parametrization of the displacement field.

Basis type     | Median # (Q1 – Q3)
σ = 20 mm      | 18 (10 – 28)
anisotropic σ  | 29 (21 – 38)
σ = 10 mm      | 44 (29 – 57)
Total          | 93 (75 – 111)

Table 3.4: Number of bases at each scale in the active parametrization of the displacement field (pooled over all sequences and all frames). Median, first and third quartiles are reported.

3.5 Discussion and conclusions

We proposed a data-driven, spatially adaptive, multiscale parametrization of deformations for registration. It uses larger kernels in regions of high uncertainty, due e.g. to a lack of image gradients or to incoherent information in the registered images, and uses smaller kernels where a finer motion can be estimated with confidence from local cues in paired images. This is achieved in a Bayesian framework, so that the approach retains natural advantages of probabilistic formulations such as the joint inference of registration hyperparameters. In effect this yields a self-tuning algorithm of general scope with attractive capabilities in terms of achieved accuracy and regularity. The prime contribution of this work is a procedure for fast marginal likelihood maximisation in sparse Bayesian models that relaxes the assumptions made by [Tipping 2003] for the fast Relevance Vector Machine, at virtually no algorithmic cost. It broadens the scope of the RVM much in the same direction as the mixed L1-L2 Elastic Net regularization [Zou 2005] extends the L1-norm LASSO regularization [Tibshirani 1996]. This scheme applies to the wide range of classification and regression tasks that share the same abstract graphical representation as Fig. 3.2, or variants thereof.

Figure 3.22: Example registration for the cine SSFP dataset of volunteer V5. We propagate the segmentation from the reference frame to the rest of the time series with the output of the registration. The resulting mesh is overlaid on a 2D slice and visualized at four representative timesteps (frames 0, 7, 15 and 29). The 3D mesh attests to the regularity of the underlying transform, and to its coherence over the cardiac cycle.

While we left the question of uncertainty quantification mostly unaddressed, we note that the proposed framework provides us with a readily computable, compact |S| × |S| covariance matrix Σ that summarizes uncertainty on the degrees of freedom in the active parametrization. The covariance on transformation parameters can be turned into directional estimates of uncertainty at any point in space by simple linear algebra, or can be sampled from at a marginal $O(|S|^2)$ cost to efficiently explore the joint variability of the full transformation. Sampling the transformation itself, unlike sampling displacements independently at each point in space, preserves correlations in the displacement of close-by points. This can be instrumental in deriving empirical estimates of uncertainty on integral geometrical quantities. For instance, Fig. 3.1(b) reports estimates of uncertainty in the volume enclosed over time by the endocardium surface, as segmented on the reference frame (at time 0), for a cine MRI sequence (volunteer 5). For the same volunteer, Fig. 3.1(c) summarizes, in the form of a tensor map, the uncertainty in the inferred displacement field at end-systole, accounting for uncertainty in the output of each frame-to-frame registration between end-diastole and end-systole. Tensors are rasterized at the voxel centers of the end-systole frame. Each tensor encodes (the square root of) the 3 × 3 covariance matrix of the displacement at a given point and is evidently elongated in directions of higher uncertainty. Due to voxel spacing anisotropy in the cine SSFP dataset, the direction of highest uncertainty is, consistently across space, aligned with the long axis. The color scheme thus encodes the second principal direction of highest uncertainty. Steep intensity gradients in the underlying image typically translate into directions where tensors are least elongated. Tensor magnitude and principal directions vary smoothly across space, as estimates of uncertainty incorporate information of a local (and in fact, global) nature. The yellow dashed line gives a visual cue as to the position of the left ventricle endocardium boundary. Future work will have to assess extensively the quality of the approximate posterior returned by our fast scheme compared to the exact model posterior (as explored e.g. via MCMC techniques), and the empirical agreement between the exact model posterior and intuition.
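As an illustration of the linear algebra involved, the sketch below turns a compact posterior covariance over the active weights into a pointwise 3 × 3 displacement covariance, and draws joint samples of the weights. It assumes the weight ordering $w = (w_1^\top \cdots w_{|S|}^\top)^\top$ with scalar basis responses, so the interpolation matrix at a point is $\phi(x)^\top \otimes I_d$; all numerical inputs are hypothetical.

import numpy as np

rng = np.random.default_rng(2)

def pointwise_covariance(phi_x, Sigma, d=3):
    # 3x3 covariance of the displacement u(x) from the compact posterior
    # covariance Sigma over the d*|S| active weights; with scalar basis
    # responses, the interpolation matrix at x is phi(x)^T kron I_d.
    Px = np.kron(phi_x[None, :], np.eye(d))          # d x (d * |S|)
    return Px @ Sigma @ Px.T

# Hypothetical compact posterior for |S| = 4 active bases in 3D:
n_active, d = 4, 3
A = rng.standard_normal((n_active * d, n_active * d))
Sigma = A @ A.T / (n_active * d)                     # any SPD matrix will do
phi_x = np.exp(-np.linspace(0, 2, n_active) ** 2)    # basis responses at x

C_x = pointwise_covariance(phi_x, Sigma)
eigval, eigvec = np.linalg.eigh(C_x)                 # uncertainty tensor at x
print("std along principal directions:", np.sqrt(eigval))

# Joint samples of the weights preserve correlations between close-by
# points when the sampled transformation is evaluated over space:
w_samples = rng.multivariate_normal(np.zeros(n_active * d), Sigma, size=100)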

We experimented with the proposed framework of registration on tasks of motion tracking on dynamic cardiac data. A flexible noise model based on mixtures of Gaussian distributions was introduced and performed suitably on all tested modalities, advantageously replacing models of smoothly varying noise. This is likely due to an increased ability to represent intricate spatial patterns of intensity residuals arising from acquisition noise and artefacts, registration misalignment and variable appearance of organs over time. Despite using generic multiscale RBFs, the inferred parametrizations of 3D displacements were highly sparse, typically involving no more than a hundred degrees of freedom. We note that tagged MR images encouraged the recourse to finer bases; beyond the higher resolution of these volumes (compared to cine SSFP data), it is likely that tags were found to be reliable, informative structures along all directions of motion. While good accuracy was achieved on synthetic echocardiographic time series with a reduced number of bases, the synthetic motion from which the sequences were reconstructed is likely to have enjoyed greater regularity as well.

Despite not relying on temporal regularization to help in tracking the motion of the cardiac muscle (as in e.g. [De Craene 2012]), the temporal and spatial consistency of deformations was judged satisfactory, as evidenced by Figs. 3.1(a) and 3.22. Still, incorporating temporal regularization and moving towards a large deformation framework [Beg 2005, Arsigny 2006] with piecewise-geodesic trajectories may be fruitful for this application. It could also constitute a step in moving from a data-driven parametrization of displacements (beneficial to the quality of registration) towards an anatomically relevant parametrization. Indeed, the proposed framework of basis selection may be suitable for learning a parametric atlas of motion from a small dataset of 3D+t images, in the spirit of e.g. [Allassonnière 2007, Durrleman 2013, Gori 2013].


CHAPTER 4

Quantifying Registration Uncertainty with Sparse Bayesian Modelling: Is Sparsity a Bane?

Contents
4.1 Introduction
4.2 Statistical Model and Inference
  4.2.1 Bayesian Model of Registration
  4.2.2 Model analysis
  4.2.3 Posterior Exploration by MCMC Sampling
4.3 Predictive Uncertainties: Marginal Likelihood Maximization vs. Exact Inference
4.4 Preliminary Experiments and Results
4.5 Discussion and Conclusions

4.1 Introduction

Non-rigid image registration is an ill-posed task that supplements limited, noisy data with 'inexact but useful' prior knowledge to infer an optimal deformation between images of interest. As a standard processing step in many pipelines for medical imaging and for computational anatomy & physiology, registration would greatly benefit from the development of principled strategies to analyze its output and to subsequently re-evaluate model assumptions. Bayesian modelling and inference, potentially in conjunction with hypothesis testing, provides a framework to explicitly incorporate prior assumptions and re-assess their relevance in retrospect. We focus on another expected benefit of Bayesian approaches: the opportunity to go beyond point estimates of the optimal deformation towards quantification of uncertainty in the solution.

In a seminal paper, Gee and Bajcsy [Gee 1998] lay the groundwork for a Bayesian interpretation of registration, extending the mechanical formulation of Broit [Broit 1981]. Exploiting the Gaussian Markov random field structure inherited from a finite-element discretization of the domain, they characterize the posterior distribution of displacements by Gibbs sampling. Risholm et al. [Risholm 2013] extend the approach to the case of unknown confidence on the observed data and on model priors respectively, aiming to address the critical issue of finding an objective trade-off between data fit and regularity-inducing priors. The so-called temperature hyperparameters are treated as latent variables and approximately marginalized over, while an MCMC chain with full-dimensional Metropolis-Hastings transitions traverses the space of transformation parameters. Simpson et al. [Simpson 2012] propose an alternate methodology based on variational Bayes inference. Using a parametric (FFD) representation of the displacement field, the posterior distribution is approximated within a 'convenient' family for which transformation parameters and model hyperparameters factorize. As a result, estimates of uncertainty quantify variability in the displacement field conditionally to the inferred hyperparameters, but disregard uncertainty induced by hyperparameter variability. The variational factorization nevertheless proves to be computationally advantageous. This work and ensuing developments [Simpson 2013] also expose crucial limitations of classic generative models of data for registration. Although uncertainty quantification is peripheral to their work, Richard et al. [Richard 2009] develop, for the related task of atlas building, a mixed SAEM and MCMC approach where nodes of the finite-element mesh are updated via Metropolis-Hastings-within-Gibbs transitions; Zhang et al. [Zhang 2013] implement a mixed SAEM and Hybrid Monte Carlo approach for a Bayesian MAP estimation of the template and of temperature hyperparameters in a diffeomorphic setting.

Figure 4.1: Graphical representation of the probabilistic registration model. The process of generating the data D involves a transformation Ψ of space, and some noise whose model is controlled by a set of parameters P. Hyperpriors (with hyperparameters $H_P$) are in turn imposed over the noise parameters P. The transformation is parameterized by weights $w_k$ on a predefined overcomplete set of basis functions $\phi_k$, $k = 1 \cdots M$. Priors on the transformation smoothness and on the relevance of individual bases introduce additional parameters (λ and $z_k$ respectively). Random variables are circled, whereas deterministic parameters are in squares. Arrows capture conditional dependencies. Shaded nodes are observed variables or fixed hyperparameters. The doubly circled node indicates that the transformation Ψ is fully determined by its parent nodes (the $\phi_k$ and $w_k$). The plate stands for M (replicated) groups of nodes, of which only a single one is shown explicitly.

Our contribution is twofold. While applied to a particular model of registration introduced in earlier work [Le Folgoc 2015b], our inference schemes open broader avenues for efficient and objective comparison between competing registration models. At a high level, the space of transformation parameters is explored by a reversible jump Markov chain. Reversible jump MCMC [Green 1995] provides a principled mechanism to elegantly jump between competing models of varying dimensionalities while sampling the posterior density of interest. In this work, reversible jump transitions make it possible to seamlessly refine the parametrization of the transformation, adapting the granularity of the parametrization to the granularity of the underlying motion and the local informativeness of the image, all the while exploring the most likely deformations. As it carefully avoids the computation of so-called Bayes factors, the method proceeds significantly faster than state-of-the-art MCMC inference schemes for registration. At a lower level, we capitalize on closed-form marginalization of most nuisance variables, and integrate second-order knowledge of the posterior distribution in proposal kernels. This yields an algorithm that reliably and consistently traverses the parameter space towards the most likely deformations in spite of the model intricacies.

The proposed 'sparse Bayesian' model of registration [Le Folgoc 2015b] is adapted from the Relevance Vector Machine (RVM) devised by Tipping et al. [Tipping 2001, Tipping 2003] for the purpose of classification and regression. In these articles approximate inference is achieved by marginal likelihood maximization (the 'evidence framework'), which provides a point estimate of the sparsity-governing hyperparameters. An approximate Gaussian posterior for the regressor parameters, conditionally to the optimal set of hyperparameters, is then derived. Rasmussen et al. [Rasmussen 2005] exposed the peculiar behaviour of the RVM in terms of predictive uncertainty, yielding lower uncertainty away from data (see e.g. [Quiñonero-Candela 2005]). Several ad-hoc methods were subsequently suggested to circumvent this behaviour. We adopt a different stance, showing that while the approximate posterior derived by evidence maximization suffers from inconsistent predictive uncertainties, the exact posterior resulting from the proposed sparse Bayesian model does not.
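The behaviour noted by Rasmussen et al. is easy to reproduce on a toy 1D RVM: once hyperparameters are point-estimated, the predictive variance $\beta^{-1} + \phi(x)^\top \Sigma \phi(x)$ collapses to the noise floor wherever the surviving bases vanish, i.e. away from the data. In the sketch below, the basis centers and precision matrix A are illustrative stand-ins rather than fitted quantities.

import numpy as np

# 1D toy: data live on [0, 1]; the pruned RVM keeps bases near the data.
x_train = np.linspace(0, 1, 30)
centers = np.linspace(0, 1, 10)       # "surviving" relevance vectors
beta = 1 / 0.1 ** 2                   # point-estimated noise precision

def features(x):
    return np.exp(-(np.asarray(x, float)[:, None] - centers) ** 2
                  / (2 * 0.1 ** 2))

Phi = features(x_train)
A = np.eye(centers.size)              # stand-in for the estimated A_k's
Sigma = np.linalg.inv(A + beta * Phi.T @ Phi)   # conditional posterior cov.

for x in (0.5, 2.0, 5.0):             # inside, then far outside, the data
    phi = features([x])[0]
    var = 1 / beta + phi @ Sigma @ phi
    print(f"x = {x}: predictive std = {np.sqrt(var):.3f}")

Away from the data every surviving basis response vanishes, so the printed standard deviation falls to the noise floor $\beta^{-1/2}$, lower than anywhere near the data: the counter-intuitive behaviour at stake in this chapter.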

The chapter unfolds as follows. In part 4.2 we describe the sparse Bayesian model of registration and devise a principled strategy for exact inference. The proposed design of the Markov chain exploits insight gained into the model to bypass standard impediments of MCMC schemes. Hyperparameter uncertainty is fully accounted for by marginalization of the nuisance variables. In part 4.3 we review breakdown scenarios in which the approximate posterior significantly departs from the true posterior, leading to poor approximate predictive uncertainty. In part 4.4 we conduct preliminary experiments to assess the validity of MCMC uncertainty estimates.


4.2 Statistical Model and Inference

Registration infers, from prior knowledge and limited data D, a transformation of space Ψ that pairs homologous features in objects of interest (e.g. organs or vessels, in a medical setting). The section starts with a succinct description of the registration model, developed by the authors in Chapter 3, and offers insight into its mechanisms. Fig. 4.1 provides a graphical representation thereof. An MCMC approach for systematic characterization of the posterior distribution is then devised.

4.2.1 Bayesian Model of Registration

4.2.1.1 Likelihood model

The generative model of data makes explicit the relationship between the data D and the spatial mapping Ψ. It is specified by a likelihood model p(D|Ψ; P) (often conditioned on a set of hyperparameters P) that typically assumes the form of a Boltzmann distribution $p(\mathcal{D}|\Psi; P) \propto \exp -E_\mathcal{D}(\mathcal{D}, \Psi; P)$. For landmark registration, a transformation that approximately maps corresponding key points $t_i$ and $T_i$, $i = 1 \cdots N$, between a template object and a target object is sought. A standard choice of energy is the sum of squared distances between pairings, up to a multiplicative factor:

$$E_\mathcal{D}(\mathcal{D}, \Psi; \beta) = \frac{\beta}{2} \sum_{i=1}^{N} \|T_i - \Psi(t_i)\|^2 \, . \qquad (4.1)$$

Figure 4.2: Graphical representation of the generative data model. Graphical codes are identical to those of Fig. 4.1. Residuals between the fixed image J and the warped image $I \circ \Psi^{-1}$ are assumed to be distributed according to a mixture of L Gaussian components, whose parameters $\rho_l$ (probability of falling in the l-th component) and $\beta_l$ (inverse variance, a.k.a. precision parameter, of the l-th Gaussian component) are regarded as latent variables. $c_n \in \{1 \cdots L\}$ assigns the corresponding voxel to one of the L mixture components.

For pairwise registration of a fixed image J and a moving image I, a mixture-of-Gaussians model of intensity residuals is adopted. At voxel center $v_i$, the intensity residual $r_i = J(v_i) - I[\Psi^{-1}(v_i)]$ is assigned to the l-th component of the mixture, $1 \le l \le L$, if the L-way categorical variable $c_i \in \{1 \cdots L\}$ takes value l. If so, the residual $r_i$ follows a normal distribution $\mathcal{N}(0, \beta_l^{-1})$. The component assignment $c_i$ follows a categorical distribution and takes value l with probability $\rho_l$, normalized such that $\sum_{l=1}^{L} \rho_l = 1$. For distinct voxels $v_i$ and $v_j$, residuals $r_i$ and $r_j$ (resp. component assignments $c_i$ and $c_j$) are assumed to be independent. The corresponding GMM energy $E_\mathcal{D}(\mathcal{D}, \Psi; \beta, \rho)$ is given by Eq. (4.2), with $Z_l = \sqrt{2\pi/\beta_l}$ a normalizing constant:

$$E_\mathcal{D}(\mathcal{D}, \Psi; \beta, \rho) = -\sum_{i=1}^{N} \log \sum_{l=1}^{L} \frac{\rho_l}{Z_l} \exp -\frac{\beta_l}{2} \left( J[v_i] - I[\Psi^{-1}(v_i)] \right)^2 \qquad (4.2)$$

Fig. 4.2 summarizes this model of data in graphical form. The assumption of independence of voxelwise residuals is known not to hold (see e.g. [Simpson 2012, Le Folgoc 2015b]) and to affect the outcome of the probabilistic registration. Since a proper probabilistic account of correlations in intensity residuals is both beyond the scope of this work and irrelevant to the ensuing developments, the Virtual Decimation scheme of [Simpson 2012] is reproduced instead for simplicity.
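Eq. (4.2) is straightforward to evaluate stably with a log-sum-exp; the sketch below does so for an assumed three-component noise model, with all numerical values hypothetical.

import numpy as np
from scipy.special import logsumexp

def gmm_energy(residuals, rho, beta):
    # Mixture-of-Gaussians data energy of Eq. (4.2): negative log-likelihood
    # of intensity residuals r_i = J[v_i] - I[Psi^{-1}(v_i)], with mixture
    # weights rho_l and precisions beta_l (assignments c_i summed out).
    r = residuals[:, None]                      # N x 1
    log_comp = (np.log(rho) + 0.5 * np.log(beta / (2 * np.pi))
                - 0.5 * beta * r ** 2)          # N x L
    return -logsumexp(log_comp, axis=1).sum()

# Hypothetical 3-component noise model and residuals:
rho = np.array([0.7, 0.25, 0.05])
beta = np.array([1 / 0.02 ** 2, 1 / 0.08 ** 2, 1 / 0.3 ** 2])
r = np.random.default_rng(4).normal(0, 0.05, size=10000)
print("E_D =", gmm_energy(r, rho, beta))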

4.2.1.2 Transformation parametrization

A small deformation standpoint is adopted for convenience. The displacement field $u : x \in \Omega \subset \mathbb{R}^d \mapsto u(x) = \Psi^{-1}(x) - x \in \mathbb{R}^d$ is parametrized by a linear combination of M basis functions $\phi_k(\cdot)$ with associated weights $w_k \in \mathbb{R}^d$:

$$u(x) = \sum_{1 \le k \le M} \phi_k(x)\, w_k = \phi(x)^\top w \, . \qquad (4.3)$$

$\phi(x) = (\phi_1(x) \cdots \phi_M(x))^\top$ and $w^\top = (w_1^\top \cdots w_M^\top)$ are respectively the concatenations, for $k = 1 \cdots M$, of the $\phi_k(x)$ and $w_k$. Arbitrary choices of basis functions $\phi_k$ are possible. B-splines (e.g. [Rueckert 1999]) present desirable properties in terms of smoothness and interpolation. Here the $\phi_k$'s instead consist of multiscale Gaussian radial basis functions (RBFs) whose centers lie on a regular grid of points (typically, decimated voxel centers). Multiscale Gaussian RBFs possess attractive analytical and computational properties.
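A minimal sketch of this parametrization, assuming isotropic Gaussian RBFs replicated at two scales over a coarse regular grid (grid extent and scales are placeholder values):

import numpy as np

def make_rbf_dictionary(centers, sigmas):
    # Multiscale isotropic Gaussian RBFs: one basis per (center, scale) pair.
    def phi(x):                        # x: (n, d) points -> (n, M) responses
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-d2[:, :, None] / (2 * sigmas[None, None, :] ** 2))
        return resp.reshape(x.shape[0], -1)
    return phi

def displacement(x, phi, w):
    # u(x) = phi(x)^T w, Eq. (4.3); w: M x d weight vectors.
    return phi(x) @ w

# Hypothetical 2-scale dictionary on a coarse 3D grid:
grid = np.stack(np.meshgrid(*[np.linspace(0, 100, 5)] * 3), -1).reshape(-1, 3)
phi = make_rbf_dictionary(grid, np.array([20.0, 10.0]))
M = grid.shape[0] * 2
w = np.zeros((M, 3)); w[0] = [1.0, 0.0, 0.0]   # a single active basis
print(displacement(np.array([[0.0, 0.0, 0.0]]), phi, w))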

4.2.1.3 Transformation priors

The weights w are endowed with a generalized Spike-&-Slab prior that favours both smoothness of the resulting displacement field and sparsity in its parametrization. The properties of this prior are central to the proposed 'sparse Bayesian' modelling and to our analysis thereof. Each basis $\phi_k$ is assigned a distinct activation variable $z_k$ that controls its inclusion in the active parametrization (or exclusion therefrom). If $z_k = 0$ the basis $\phi_k$ is pruned out of the active parametrization. We do so by designing $p(w_k | z_k = 0)$ as a Dirac distribution centered at 0. If $z_k = 1$ the basis $\phi_k$ is included in the parametrization. The prior on such bases is designed as a joint, structured Gaussian distribution that penalizes lack of smoothness in the induced displacement field. Let us denote by S the set of such indices k for which $z_k = 1$ and by $w_S$ the concatenation of the corresponding subset of weights $w_k$, $k \in S$. For an arbitrary linear differential operator D, we wish to penalize high values of the quadratic energy $\|Du\|^2 = w_S^\top R_S w_S$, where $R_S$ is the $|S| \times |S|$ matrix whose (k,l)-th coefficient is $\langle D\phi_k | D\phi_l \rangle$. The Gaussian distribution $\mathcal{N}(w_S \,|\, 0, (\lambda\, d|S|)^{-1} R_S^{-1})$ is a natural choice of prior for $p(w_S | S)$, that we adopt henceforth. Note that the covariance normalization by $d|S|$, where d is the image dimension, departs from that of [Le Folgoc 2015b]. Under this prior $\lambda\, d|S| \cdot w_S^\top R_S w_S$ is $\chi^2(d|S|)$ distributed, so that λ immediately relates to the expectation of the energy: $\mathbb{E}_{p(w_S|S)}(\|Du\|^2) = \lambda^{-1}$ and $\mathbb{E}_{p(w)}(\|Du\|^2) = \lambda^{-1}$. The prior over all weights w conditioned on the state of the gate variables $z = (z_1 \cdots z_M)^\top$ is best summarized in the form of Eq. (4.4), where −S is the complement of S:

$$p(w \,|\, z, \lambda) = \mathcal{N}\!\left(w_S \,\middle|\, 0, \frac{1}{\lambda\, d|S|} R_S^{-1}\right) \cdot \mathcal{N}(w_{-S} \,|\, 0, 0) \, . \qquad (4.4)$$

4.2.1.4 Hyperpriors

Parameters introduced in the specification of priors are in turn treated as latent variables. λ is endowed with a Gamma prior $\Gamma(\lambda | a_0, b_0)$ that is conjugate to $p(w | z, \lambda)$. The parameters $\beta_l$ (resp. β) involved in the likelihood model for image (resp. landmark) registration are endowed with independent Gamma priors $\Gamma(\beta_l | \gamma_0, \delta_0)$. The noise mixture proportions $\rho = \rho_1 \cdots \rho_L$ are assigned a Dirichlet prior $\mathrm{Dir}(\rho | \kappa)$, with $\kappa = (\kappa_1 \cdots \kappa_L)$.

Independent Bernoulli priors $\mathcal{B}(z_k | \pi_k)$ on each $z_k$ constitute a natural, conjugate hyperprior specification for the activation variables z. The positive mass $1 - \pi_k$ concentrated at $w_k = 0$ as a result explicitly encodes sparsity. Assuming all $\pi_k = \pi_0$ to be equal, all parametrizations using the same number of active bases |S| are a priori equally probable. In addition, the cost of including a new basis in the active parametrization is independent of the current number of active bases. However, we opt instead for a stronger prior, $p(z) \propto \Gamma(\frac{d|S|}{2})^{-1}$. The Gamma function Γ(·) is a natural extension of the (integer) factorial to real values, yielding a prior that increasingly penalizes each new inclusion. This prior was found to perform better w.r.t. sparsity, as can be theoretically argued from the analysis of the marginal prior p(w|z).

4.2.2 Model analysis

4.2.2.1 Marginal prior and marginal likelihood

Critical insight into the statistical model can be gained by considering the prior $p(w | z, \mathcal{H})$ and likelihood $p(\mathcal{D} | w, c, \mathcal{H})$ with the so-called temperature parameters λ and β marginalized over, e.g.:

$$p(w \,|\, z, \mathcal{H}) = \int_{\mathbb{R}^+} p(w \,|\, z, \lambda, \mathcal{H})\, p(\lambda \,|\, \mathcal{H})\, d\lambda \, . \qquad (4.5)$$

The multivariate Student distribution $t_\nu(\cdot \,|\, \mu, \Lambda)$ with location parameter µ, inverse scale matrix Λ and ν degrees of freedom naturally appears in analytic derivations (cf. Appendix A.5), yielding the following expressions for the prior and likelihood:

$$p(w \,|\, z, \mathcal{H}) = \mathcal{N}(w_{-S} \,|\, 0, 0)\; t_{\nu_\lambda}\!\left(w_S \,\middle|\, 0, \frac{a_0}{b_0}\, d|S|\, R_S\right) \qquad (4.6)$$

$$p(\mathcal{D} \,|\, w, c, \mathcal{H}) = \prod_{l=1}^{L} t_{\nu_l}\!\left(I_l \circ \Psi_w^{-1} \,\middle|\, J_l, \frac{\gamma_0}{\delta_0}\, I\right) \qquad (4.7)$$

where $\nu_\lambda = 2a_0$, $\nu_l = 2\gamma_0$, S is the set of active bases and $|S| = \sum_k z_k$ its cardinal. $J_l = (\cdots J[v_i] \cdots)^\top_{i|c_i=l}$ is the vector of voxel values in image J for those voxels assigned to component l, and $I_l \circ \Psi_w^{-1} = (\cdots I[\Psi_w^{-1}(v_i)] \cdots)^\top_{i|c_i=l}$ is similarly defined for the warped image $I \circ \Psi_w^{-1}$. For a fixed choice of active bases z, the posterior distribution of the weights $p(w | z, c, \mathcal{D}, \mathcal{H})$ is proportional to the product of the prior Eq. (4.6) and likelihood Eq. (4.7). In the limit of uninformative hyperpriors $a_0, \gamma_0 \to 0$, $b_0, \delta_0 \to 0$ and assuming L = 1 for the sake of illustration,

of the weights p(w|z, c,D,H) is proportional to the product of the prior Eq. (4.6) andlikelihood Eq. (4.7). In the limit of uninformative hyperpriors a0, γ0 → 0, β0, δ0 → 0 andassuming L = 1 for the sake of illustration,

p(w|z, c,D,H) ∝ N(w−S

∣∣0,0) 1

χlik[w]N1

χpr[w]d|S|. (4.8)

where $\chi_{\mathrm{lik}}[w]^2$ is the data error and $\chi_{\mathrm{pr}}[w]^2 = \|Du_w\|^2$ the regularizing energy. In particular, the posterior distribution is invariant to rescaling of the data error, and hence to rescaling of the intensity profile, after marginalizing over temperature parameters. Note also that, for a fixed parametrization z, the ratio of posterior probabilities of two distinct parameter sets $w_1$ and $w_2$ may become arbitrarily overwhelmed by the prior as the number of bases in the parametrization grows ($|S| \gg N$). If not for sparsity, this might render MCMC characterization of the posterior unreliable (using e.g. Metropolis-Hastings transitions), potentially making its outcome dependent on the size of the parametrization. Fortunately, the proposed sparse model has a clear mechanism to prevent overparametrization and render overlapping bases largely mutually exclusive, as discussed next.

4.2.2.2 Prior probability of basis inclusion

Interactions between overlapping bases can be better understood by looking at the probability $p(z_k | w_{-k}, z_{-k}, \mathcal{H})$ of inclusion of a new basis $z_k$ given a known configuration $z_{-k}$ for the other bases and their associated weights $w_{-k}$. The state $w_{-k}$ of other bases informs us about the expected regularity of the signal $u_w$, introducing dependencies between $z_k$ and $z_{-k}$ conditionally to $w_{-k}$. Denoting by z (resp. $\bar{z}$) the state with $z_k = 1$ (resp. $z_k = 0$), we see from Bayes' rule that:

$$\frac{p(z_k = 1 \,|\, w_{-k}, z_{-k})}{p(z_k = 0 \,|\, w_{-k}, z_{-k})} = \frac{p(w_{-k} \,|\, z)}{p(w_{-k} \,|\, \bar{z})}\, \frac{p(z)}{p(\bar{z})} \, , \qquad (4.9)$$


where the dependence on hyperparameters is made implicit for convenience of notation. Leaving the details of the derivations aside, we note that in the limit of uninformative values, the ratio of Eq. (4.9) takes the form

\[ \frac{p(z^{+})}{p(z^{-})} \left( \frac{|\kappa_k|}{|R_{k,k}|} \right)^{1/2} \left( 1 - \frac{\mu_{pr}^{k\,\top} R_{k,k}\, \mu_{pr}^{k}}{w_{-k}^\top R_S\, w_{-k}} \right)^{-\frac{d|S|}{2}} \tag{4.10} \]

where S is the set of active bases (excluding k), µ_pr^k = −R_{k,k}^{-1} R_k^⊤ w_{-k} and κ_k = R_{k,k} − R_k^⊤ R_S^{-1} R_k. The middle factor penalizes the inclusion of basis k if it overlaps with bases in the active set S, in the sense of the metric induced by R. κ_k is a measure of the overlap of basis k with the bases in the active set S and vanishes if basis k is perfectly collinear to S. The rightmost factor favors the inclusion of basis k if it is a priori expected to yield a significant increase in regularity.
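A small sketch of the two data-independent factors of Eq. (4.10), in the scalar case d = 1 and with a random s.p.d. stand-in for R (illustrative only; not the thesis implementation):

```python
# Sketch: overlap and regularity factors of Eq. (4.10), with d = 1 blocks.
import numpy as np

rng = np.random.default_rng(1)
M = 5                                   # |S| = 4 active bases + candidate k
A = rng.standard_normal((M, M))
R = A @ A.T + M * np.eye(M)             # random s.p.d. stand-in
S, k = slice(0, M - 1), M - 1
R_S, R_k, R_kk = R[S, S], R[S, k], R[k, k]
w = rng.standard_normal(M - 1)          # current weights w_{-k}

kappa_k = R_kk - R_k @ np.linalg.solve(R_S, R_k)   # overlap statistic
mu_pr = -(R_k @ w) / R_kk                          # prior mean of w_k
overlap_factor = np.sqrt(kappa_k / R_kk)           # penalizes overlapping bases
regularity_factor = (1 - mu_pr * R_kk * mu_pr / (w @ R_S @ w)) ** (-(M - 1) / 2)
print(overlap_factor, regularity_factor)
```

Note that the base of the rightmost factor is always non-negative (a Cauchy–Schwarz-type consequence of the positive definiteness of R), so the expression is well defined.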

4.2.3 Posterior Exploration by MCMC Sampling

For any set of points X = {x_1, ..., x_n} in the admissible domain Ω, consider the vector of displacements u_X^⊤ = (u(x_1)^⊤ ··· u(x_n)^⊤). We wish to characterize the joint posterior distribution p(u_X|D, H) of any such vector of displacements, for any discrete set X. To that aim we merely need to characterize the posterior distribution p(w|D, H) of the weights w involved in the parametrization of the transformation Ψ^{-1} sufficiently well.

4.2.3.1 Related work

MCMC methods are the tools of choice to explore arbitrarily complex distributions in a principled manner. Gibbs sampling [Geman 1984] cycles between latent variables, sampling from their conditional distributions in turn while the other model variables remain fixed. It is attractive when the conditional distributions are known in closed form whereas the joint distribution is intractable or computationally costly to sample. When the conditional cannot be sampled directly, a component-wise proposal may be used instead within a Metropolis–Hastings (MH) step (Metropolis-within-Gibbs). Unfortunately, Gibbs sampling of temperature parameters is prone to failure, with the chain drifting away from regions of high probability for the duration of any finite MCMC run. Collapsing the temperature parameters λ, β when sampling the regressor variables w is therefore highly opportune. In the context of registration, Risholm et al. [Risholm 2013] propose an MH scheme where marginalizing over temperature parameters induces the expensive computation of partition functions, for which an intricate procedure based on Laplace approximations is designed. In the proposed model, the computation of partition functions (specifically, marginal likelihoods, a.k.a. evidences) may arise as well when sampling the gate variables z_k. Selecting a specific configuration z can be interpreted as a choice between competing models of varying complexity and dimensionality. The problem of estimating the evidence for a model is well studied in the statistical literature. A variety of methods exist, ranging from the straightforward Laplace approximation to more principled approaches typically exploiting samples from the (possibly augmented) posterior, including Chib's method [Chib 1995], importance sampling, bridge sampling and path sampling (see e.g. [Gelman 1998]), and reversible jump MCMC [Green 1995]. The latter approach is in fact primarily concerned with sampling from a posterior distribution involving competing models (freely jumping between models in the process) and merely obtains evidence ratios as a byproduct. Reversible jump MCMC is appealing in our setting, where the competing models z are organized in a series of nested models of increasing complexity, rendering its machinery mostly invisible. Reversible jump MCMC proceeds in the general framework of Metropolis–Hastings, hence a sound proposal must be crafted. We derive a sensible family of proposals from a modal analysis of the posterior distribution.

4.2.3.2 Modal analysis of the posterior & proposal

For the model described in 4.2.1, the Laplace approximation of the (conditional) posterior p(w|D, z, c, H) around its mode w_* = arg max_w p(w|D, z, c, H) takes the following form:

\[ -\log p(w \mid D, z, c, H) \;\approx\; \frac{1}{2}\, \frac{\gamma_0 + N/2}{\delta_0 + \chi^2_{lik}/2}\, (T_* - \Phi w)^\top H_* (T_* - \Phi w) \;+\; \frac{1}{2}\, \frac{a_0 + |S|/2}{b_0 + \chi^2_{pr}|S|/2}\, d|S|\, w^\top R_S w \;+\; \mathrm{const}\,. \tag{4.11} \]

where for the sake of illustration we take a single-component mixture (L = 1, c_i = 1 for all i, so that the vector β reduces to a single scalar β). χ²_pr = w_*^⊤ R_S w_* is the energy in the displacement field, χ²_lik = Σ_{i=1}^{N} (J[v_i] − I[Ψ_*^{-1}(v_i)])² is the data error, and we discard higher order terms in b_0, δ_0. T_*^⊤ = (T_{1*}^⊤ ··· T_{N*}^⊤) is a set of virtual pairings whose value does not depend on β, λ. H_* is a block diagonal matrix whose ith diagonal block H_{*i} is the d × d precision matrix associated to the ith virtual pairing T_{i*}. The factors stemming from the marginalization:

\[ \beta_* = \frac{\gamma_0 + N/2}{\delta_0 + \chi^2_{lik}/2}\,, \qquad \lambda_* = \frac{a_0 + |S|/2}{b_0 + \chi^2_{pr}|S|/2} \tag{4.12} \]

are commensurable to temperature parameters. The approximation of the conditional posterior is Gaussian (Eq. (4.11) is quadratic) and admits the more explicit canonical form N(µ, Σ), with µ = ΣΦ^⊤(β_*H_*)T_* and Σ = (Φ^⊤β_*H_*Φ + λ_*d|S|R_S)^{-1}. The Laplace approximation provides a reasonable approximation of the posterior and a judicious starting point to design proposals. Component-wise proposals that leave most of the activation variables z_l and the corresponding weights w_l unchanged will be of particular interest to us (cf. section 4.2.3.3). A natural idea is to use the conditionals w_k ∼ N(µ_pos^k, Σ_k) of the Laplace approximation N(µ, Σ) as proposal distributions. Because they neither require the actual computation of µ and Σ nor involve the inner products φ_k^⊤(β_*H_*)φ_l, these 'Gibbs-like' proposals are computationally appealing. As a final tweak to alleviate modal assumptions, we reintroduce a dependency on the current value of w_k, yielding the following component-wise proposal instead, with 0 ≤ r_HMALA ≤ 1 and s ≥ 1:

\[ q_k(w_k \to \tilde{w}_k) = \mathcal{N}(\tilde{w}_k \mid m_k(w_k),\, s\Sigma_k) \tag{4.13} \]

\[ m_k(w_k) = (1 - r_{HMALA})\, w_k + r_{HMALA}\, \mu_{pos}^{k} \tag{4.14} \]

If not set to 1, the factor s accounts for the potentially fatter tails of the true conditional posterior in the proposal. µ_pos^k and Σ_k depend on H_* and T_*, which in the formal reasoning based on the Laplace approximation are computed around Ψ_*^{-1}(·) = Id + φ(·)^⊤w_*. In fact, T_* and H_* can be replaced by T_w and H_w, computed from a (local) quadratic approximation of p(w|D, z, c, λ_*, β_*) around the current Ψ^{-1}(·) = Id + φ(·)^⊤w. In that case Eqs. (4.13), (4.14) exactly coincide with a component-wise Hessian-preconditioned Metropolis-Adjusted Langevin Algorithm (HMALA) [Roberts 1996, Girolami 2011, Zhang 2011], which exploits first and second order local information about the target distribution for increased efficiency. However, the local approximation generates additional computations at each step and offers little gain if we expect the posterior to be unimodal. Given our experimental settings, we use the global approximation with adaptation during the burn-in phase (at that stage λ_*, β_*, T_* and H_* are recomputed every few iterations from the statistics |S|, χ²_pr, χ²_lik, averaged with decaying weights over past samples).
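A minimal sketch of this proposal and of its density, as needed in the Metropolis–Hastings ratio (µ_pos^k and Σ_k are arbitrary placeholders here; in the scheme they come from the Laplace approximation):

```python
# Sketch of the component-wise proposal of Eqs. (4.13)-(4.14).
import numpy as np

rng = np.random.default_rng(2)

def m_k(w_k, mu_pos_k, r_hmala):
    return (1.0 - r_hmala) * w_k + r_hmala * mu_pos_k      # Eq. (4.14)

def propose_wk(w_k, mu_pos_k, Sigma_k, r_hmala=0.5, s=1.0):
    """Draw a new w_k ~ N(m_k(w_k), s Sigma_k), Eq. (4.13)."""
    L = np.linalg.cholesky(s * Sigma_k)
    return m_k(w_k, mu_pos_k, r_hmala) + L @ rng.standard_normal(w_k.size)

def log_q(w_new, w_k, mu_pos_k, Sigma_k, r_hmala=0.5, s=1.0):
    """Log proposal density q_k(w_k -> w_new), needed in the MH ratio."""
    diff = w_new - m_k(w_k, mu_pos_k, r_hmala)
    C = s * Sigma_k
    _, logdet = np.linalg.slogdet(2 * np.pi * C)
    return -0.5 * (logdet + diff @ np.linalg.solve(C, diff))

d = 3                                                      # toy dimension
Sigma_k, w_k, mu_pos_k = 0.1 * np.eye(d), np.zeros(d), np.ones(d)
w_new = propose_wk(w_k, mu_pos_k, Sigma_k)
print(w_new, log_q(w_new, w_k, mu_pos_k, Sigma_k))
```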

4.2.3.3 Reversible jump MCMC scheme

The groundwork for this scheme was laid in sections 4.2.2.1, 4.2.2.2 and 4.2.3.2. The reversible jump procedure itself lets us generate samples of the joint posterior p(w, z, c|D, H) with the temperature parameters marginalized over. Dropping irrelevant variables in the generated samples, we obtain samples of the marginals of interest, e.g. p(w|D, H). The reversible jump scheme simply proposes to move from a current state w, z, c to a new state w̃, z̃, c̃ and computes a Metropolis–Hastings acceptance ratio for the proposal, leading to acceptance or rejection of the new state. For the sake of simplicity, proposals for a new state of w, z may be made separately from those of c. For the latter, the most natural proposal exactly results in collapsed Gibbs sampling of each c_i, see e.g. [Murphy 2012]¹. For w, z we design basic moves that, when combined, allow us to add, remove or switch active bases as well as update several components of w. These basic moves are combined to craft proposal distributions Q(w, z → w̃, z̃) for which the probability of a move w, z → w̃, z̃ has direct symmetries with that of the reverse move w̃, z̃ → w, z, so that the acceptance ratio

\[ \min\left( 1,\; \frac{p(\tilde{w}, \tilde{z}, \tilde{c} \mid D, H)}{p(w, z, c \mid D, H)} \cdot \frac{Q(\tilde{w}, \tilde{z}, \tilde{c} \to w, z, c)}{Q(w, z, c \to \tilde{w}, \tilde{z}, \tilde{c})} \right) \tag{4.15} \]

becomes particularly straightforward to compute. The basic moves are:

a) Basis removal. For a basis k such that z_k = 1, set z̃_k = 0 and w̃_k = 0. The symmetric move is the basis addition.

b) Component-wise update. For a basis k such that z_k = 1, propose a new w̃_k ∼ q_k(w_k → ·) according to Eqs. (4.13), (4.14) with a fixed 0 ≤ r_HMALA ≤ 1. This move is its own symmetric (using the reverse update).

¹ A complete and concise summary of the relevant derivations and schemes is given in http://www.kamperh.com/notes/kamper_bayesgmm13.pdf


Algorithm 2: Proposal Q_k(w, z, reverse_traversal → w̃, z̃, reverse_traversal*). n_neighb is an integer fixed in advance.

Set w̃ = w, z̃ = z.
Draw one of 3 competing events: on-off, exchange, update.
if φ_k inactive and update then
    Exit. No action to implement (as w_k = 0).
if exchange and φ_k active then
    Draw an inactive basis φ_k* to replace φ_k. A proposal that favours well-aligned bases is designed.
else if exchange and φ_k inactive then
    Draw an active basis φ_k* to replace φ_k. A proposal that favours well-aligned bases is designed.
if φ_k active and (on-off or exchange) then
    Set z̃_k = 0 and w̃_k = 0.
if on-off or update then
    • For update: set I = {k}.
    • For on-off: set I ⊂ S\{k} to a list of n_neighb active bases, favouring bases well-aligned with φ_k.
    • If reverse_traversal = 1, reverse the ordering of I.
    for l ∈ I do
        w_l^new ∼ q_l(w̃ → w^new) and set w̃_l = w_l^new.
if φ_k inactive and on-off then
    Set z̃_k = 1 and w̃_k ∼ q_k(w_k → ·), using r_HMALA = 1.
else if φ_k inactive and exchange then
    Set z̃_k* = 1 and w̃_k* ∼ q_k*(w_k* → ·) (r_HMALA = 1).
if on-off or exchange then
    Switch the state of the binary variable reverse_traversal.

c) Basis addition. For a basis k such that z_k = 0, set z̃_k = 1 and propose a new w̃_k according to Eqs. (4.13), (4.14) with r_HMALA = 1. The symmetric move is the basis removal.

The family of proposals Q_k(·) that we design combines these basic moves in such a way that, when reversed, the sequence of moves induced by the proposal Q_k(w, z, c → w̃, z̃, c̃) coincides exactly with the sequence of moves induced by Q_k(w̃, z̃, c̃ → w, z, c). The proposal and the reverse proposal travel along the same path in opposite directions, drastically reducing the computational load when evaluating Eq. (4.15). Each proposal Q_k revolves primarily around the corresponding basis φ_k and is defined as per Algorithm 2 (where we introduced a binary variable reverse_traversal to address technicalities). Using Q_k, we define a transition kernel P_k conventionally: given the current state w_t, z_t, c_t, we propose a new state c̃ = c_t, w̃, z̃ ∼ Q_k(w_t, z_t → ·). The state is accepted with probability given by Eq. (4.15), in which case we set (w_{t+1}, z_{t+1}, c_{t+1}) = (w̃, z̃, c̃); otherwise we stay at the current state and (w_{t+1}, z_{t+1}, c_{t+1}) = (w_t, z_t, c_t). Computation of the acceptance ratio is relatively straightforward by construction, since the ratio of posterior probabilities involved in Eq. (4.15) can be rewritten as:

\[ \frac{p(D \mid \tilde{w}, \tilde{z}, \tilde{c}, H)}{p(D \mid w, z, c, H)} \cdot \frac{p(\tilde{w} \mid \tilde{z}, \tilde{c}, H)\, p(\tilde{z} \mid H)}{p(w \mid z, c, H)\, p(z \mid H)} \tag{4.16} \]

The leftmost factor is a ratio of likelihoods and need only be evaluated once for a proposed transition. As the denominator is known from the previous iteration, only the numerator need be evaluated. In the context of registration, this part corresponds to the image term and would involve costly computations if evaluated repeatedly. Note also that for basis functions with compact support (or approximately so), only part of the image term need be updated to evaluate the ratio. The ratio on the right-hand side and the ratio of proposals are simply decomposed over the sequence of previously defined basic moves, then efficiently evaluated using Eqs. (4.6), (4.13), (4.14) and expressions similar to Eq. (A.39). For the latter, the statistics κ_k are kept up to date (for all bases) using efficient rank one updates derived in [Le Folgoc 2015b]. Alternatively, the necessary statistic κ_k can be recomputed from scratch only for the bases under consideration, which is usually much more efficient (cf. the algorithmic complexity in 4.2.3.5).

Each transition kernel P_k satisfies a detailed balance condition. In terms of these transition kernels, the MCMC chain proceeds as follows. Random variables k_1, k_2, ... taking values in {1, 2, ..., M} are chosen according to some scheme and the corresponding transition kernel P_{k_t} is used at time t. Conventional schemes include the random scan, where the k_t are i.i.d. uniform, and the deterministic scan that cycles through {1, 2, ..., M} in natural order (see e.g. [Roberts 2006]). For the random scan, the global transition kernel also satisfies detailed balance conditions. For both schemes, the MCMC chain has stationary distribution p(w, z, c|D, H) after incorporating the collapsed Gibbs updates of c. The main constituents of the MCMC scheme are summarized in Fig. 4.3.

Figure 4.3: Highlight and rationale for the main constituents of the MCMC scheme.
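To make the bookkeeping of such component-wise kernels concrete, the following minimal Python sketch (an illustration under stated assumptions, not the thesis implementation) runs a random-scan reversible jump sampler on a toy spike-and-slab linear regression, with birth, death and update moves and the matching proposal terms in the acceptance ratio. All names, sizes and hyperparameter values are illustrative; the independent Bernoulli prior stands in for the richer, collapsed prior of the registration model.

```python
# Sketch: random-scan reversible jump MCMC on a toy spike-and-slab model.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Toy data: sparse linear regression y = Phi w + noise, M candidate bases.
M, N, sigma, tau, pi0 = 8, 40, 0.1, 1.0, 0.25
Phi = rng.standard_normal((N, M))
w_true = np.zeros(M); w_true[[1, 4]] = [1.5, -2.0]
y = Phi @ w_true + sigma * rng.standard_normal(N)

def log_post(w, z):
    """Unnormalized log posterior of the toy spike-and-slab model."""
    r = y - Phi @ w
    lp = -0.5 * (r @ r) / sigma**2                         # Gaussian likelihood
    lp += norm.logpdf(w[z], scale=tau).sum()               # slab prior, active w
    lp += z.sum() * np.log(pi0) + (M - z.sum()) * np.log(1 - pi0)
    return lp

w, z, tau_prop = np.zeros(M), np.zeros(M, dtype=bool), 0.5
for t in range(10000):
    k = rng.integers(M)                                    # random scan
    w_new, z_new = w.copy(), z.copy()
    if not z[k]:                                           # birth move
        z_new[k], w_new[k] = True, rng.normal(0.0, tau_prop)
        log_ratio = (log_post(w_new, z_new) - log_post(w, z)
                     + np.log(0.5) - norm.logpdf(w_new[k], scale=tau_prop))
    elif rng.random() < 0.5:                               # death move
        z_new[k], w_new[k] = False, 0.0
        log_ratio = (log_post(w_new, z_new) - log_post(w, z)
                     + norm.logpdf(w[k], scale=tau_prop) - np.log(0.5))
    else:                                                  # within-model update
        w_new[k] = w[k] + 0.2 * rng.standard_normal()
        log_ratio = log_post(w_new, z_new) - log_post(w, z)
    if np.log(rng.random()) < log_ratio:                   # accept-reject
        w, z = w_new, z_new

print("active bases:", np.flatnonzero(z), "weights:", np.round(w[z], 2))
```

With informative data, the chain should typically recover the two truly active bases; the birth and death moves carry the proposal-density corrections that make the transdimensional acceptance ratio valid.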

4.2.3.4 Markov chain mixing improvement

Similarly to Gibbs sampling of temperature parameters, Gibbs sampling of the voxel GMM assignments c within updates separated from those of w, z potentially hampers the mixing of the Markov chain for any finite, practical duration of the MCMC run. If at any point in time a data point that should be regarded as an outlier (e.g. an image artifact), or a group of such points, is assigned to a 'non-outlier' mixture component, the disjoint sampling generally causes the chain to remain stuck in the vicinity of the corresponding local mode of the posterior p(w, z, c|D, H): the desired reverse assignment move virtually occurs with probability zero after readjustment of w, z. This defect is critical as such failure scenarios happen with overwhelming probability. Fortunately, joint proposals for w, z, c can be designed at little cost, even more so after noting that the component-wise proposals for w_k (Eqs. (4.13), (4.14)) and z_k only indirectly depend on c. The transition Q_k(w, z, c → w̃, z̃, c̃) proceeds in two steps. First, w̃, z̃ is proposed as per Algorithm 2. Then, c̃ is sampled by component-wise collapsed Gibbs sampling of each c̃_i ∼ p(c_i | c̃_{j<i}, c_{j>i}, w̃, D, H) in turn. For efficiency, only the subset of voxels in the support of the updated basis functions is sampled, and voxel assignments are updated only once in case of overlapping supports. The two-step move is accepted or rejected based on the acceptance ratio (4.15), replacing c by c̃ where necessary. The order of voxel traversal is reversed according to the state of reverse_traversal. Sampling c̃ and computing its contribution to the acceptance ratio exclusively involves the residuals r_i and r̃_i of the updated voxels prior to and after the update w, z → w̃, z̃, which were already required to compute the likelihood change in Eq. (4.16).
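A minimal sketch of the assignment-resampling step, under simplifying assumptions: the mixture parameters are held fixed (whereas the thesis marginalizes them analytically) and two components model inlier and outlier residuals; all values are illustrative.

```python
# Sketch: Gibbs resampling of voxelwise mixture assignments c_i from their
# responsibilities, given residuals r_i under the proposed transformation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
# Residuals r_i = J[v_i] - I[Psi^{-1}(v_i)] (synthetic stand-ins):
r = np.concatenate([0.05 * rng.standard_normal(200),   # well-matched voxels
                    2.0 * rng.standard_normal(10)])    # artifact-like voxels
sigmas = np.array([0.05, 2.0])                         # inlier / outlier scales
weights = np.array([0.95, 0.05])                       # mixture weights

log_resp = norm.logpdf(r[:, None], scale=sigmas) + np.log(weights)
resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)                # responsibilities
c = (rng.random(r.size) < resp[:, 1]).astype(int)      # sampled assignments
print("voxels assigned to the outlier component:", c.sum())
```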

4.2.3.5 Algorithmic complexity

The algorithmic complexity associated to a transition kernel P_k (proposal and accept-reject) is O(|S|·|I⁺| + Σ_{l∈I⁺} V_l + LC), where I⁺ denotes the set of updated bases, V_l the number of voxels in the support of basis φ_l and C the number of voxels whose assignments c_i are resampled. The first term includes part of the cost of the proposal w, z → w̃, z̃ and its impact on the ratio of prior probabilities. The second term is replicated three times and can be heavily parallelized in each case: once to compute µ_pos^l, Σ_l in Eqs. (4.13), (4.14) for l ∈ I⁺, and twice to evaluate and store the differences in the displacement fields (resp. residual images) over the support of the basis functions in I⁺ following their update. The last term accounts for all computations related to the resampled voxel GMM assignments c. When a move that involves the inclusion or removal of a basis function from the active set is accepted, an additional O(|S|² + M·|S|) cost is involved to maintain the statistics κ_k over all bases in the dictionary, with the right-hand term being parallelizable into M disjoint O(|S|) operations. The O(M·|S|) cost upon inclusion or deletion of a basis can be replaced by a O(|S|²) cost per proposed move, which is usually more efficient.

4.2.3.6 Initialization

The chain is initialized from the output of the deterministic algorithm presented in [Le Folgoc 2015b], which progresses greedily in the space of parameters z, λ, P towards a local maximum of their joint posterior. We note, however, that any registration algorithm could reasonably be used to initialize the chain.


4.3 Predictive Uncertainties: Marginal Likelihood Maximization vs. Exact Inference

The ‘sparse Bayesian’ model presented in Fig. 4.1 is inspired by the Spike-&-Slab model of Mitchell and Beauchamp [Mitchell 1988] and by the Relevance Vector Machine (RVM) proposed by Tipping [Tipping 2001] for tasks of regression and classification. In the latter work, the author approaches the problem of inferring an optimal sparse regression function from the standpoint of Automatic Relevance Determination (ARD). Point estimates of the hyperparameters that govern basis selection (and in fact of all hyperparameters) are sought in a first step by maximizing the marginal likelihood or evidence as per Eq. (4.17):

\[ \theta^* = \arg\max_\theta p(D \mid \theta, H) = \arg\max_\theta \int_w p(D \mid w, \theta, H)\, p(w \mid \theta, H)\, dw \tag{4.17} \]

where θ = {z, P, λ} using our notations. If non-uniform, proper hyperpriors on θ are assumed, θ* maximizes the posterior p(θ|D, H) ∝ p(D|θ, H) p(θ|H) instead. In a second step, the distribution of the weights w_k is characterized conditionally on the selected model,

\[ p(w \mid D, H) \approx p(w \mid \theta^*, D, H)\,. \tag{4.18} \]

This strategy is typically successful in reaching strongly sparse solutions with good predictive power but, above all else, is motivated by its computational efficiency. Dedicated schemes relying on linear algebra and rank one updates make it possible to efficiently, iteratively build the set S of relevant basis functions φ_k from scratch; see for instance [Tipping 2003], and [Le Folgoc 2015b] for an extension to the wider family of priors required for registration tasks. The approximation of Eq. (4.18) is justified by observing that the full posterior p(w|D, H) is obtained by summing over all conditional posteriors p(w|θ, D, H), conditioned on the value θ, weighted by the posterior probability p(θ|D, H) of this value:

\[ p(w \mid D, H) = \int_\theta p(w \mid \theta, D, H)\, p(\theta \mid D, H)\, d\theta \tag{4.19} \]

Now if the available data D is informative enough, p(θ|D, H) will be sharply peaked around its mode(s). In the limit case where p(θ|D, H) is a Dirac centered at its single mode θ*, Eq. (4.18) is retrieved exactly, and the two-step scheme outlined in Eqs. (4.17), (4.18) is justified. Moreover, in the case of the sparsity-governing parameters z = (z_1 ··· z_M)^⊤, Tipping [Tipping 2001] argues that, even if several combinations of parameters are highly probable due to the presence of redundant functions φ_k in the dictionary of bases, they should roughly lead to the same optimal solution u* and an approximate mode (or the expectation) of p(u|D, H) should still be correctly evaluated. Regardless, we now demonstrate why this evidence-based approximation will typically fail to properly approximate the higher order moments of the full posterior, resulting for instance in a poor approximation of the real predictive uncertainty. There are two main breakdown situations for the evidence-based approximation of the full posterior assumed in Eq. (4.18).
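The gap between the two strategies can be phrased through the law of total variance, Var(u) = E[Var(u|θ)] + Var(E[u|θ]): the evidence-based approximation retains only Var(u|θ*) and discards the spread of the conditional means across near-optimal configurations. A toy numeric illustration (all values arbitrary):

```python
# Sketch: why conditioning on a single configuration theta* understates
# uncertainty, via the law of total variance.
import numpy as np

p_theta = np.array([0.55, 0.45])      # two near-equiprobable configurations
mu = np.array([1.0, 1.3])             # conditional posterior means of u(x)
var = np.array([0.04, 0.04])          # conditional posterior variances

full_var = ((p_theta * var).sum()                      # E[Var(u|theta)]
            + (p_theta * mu**2).sum()                  # + Var(E[u|theta])
            - (p_theta * mu).sum()**2)
print("evidence-based variance:", var[0])      # keeps only theta* = argmax
print("true posterior variance:", full_var)    # larger: 'basis wiggling'
```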

Figure 4.4: Comparison of approximate evidence-based inference and faithful MCMC inference for the sparse Bayesian model, on a 1D regression task. Data points (black dots) are sampled with additive i.i.d. Gaussian noise from the true signal (dashed green line). The consistency of the fast and faithful estimates of the regressor function (black lines) is satisfactory (w.r.t. uncertainty levels), even more so in the presence of data. Estimates of uncertainty (grey ribbon), however, can be inconsistent.

Firstly, in the absence of data, the assumption that the posterior distribution p(θ|D, H) of hyperparameters is well approximated by a Dirac collapses. Indeed the posterior then resembles the prior distribution p(θ|H), which is typically flat. This scenario is apropos in the case of the basis selection parameters z_k, since the associated basis functions φ_k have a local support over which reliable data may be missing. Away from data and without a strong incentive to include the basis to increase the deformation regularity, the probability of basis inclusion (resp. exclusion) is π_k (resp. 1 − π_k), and for neutral values of π_k the choice of excluding the basis is arbitrary.

Secondly, and even in the presence of data, many combinations of active bases may have quasi-identical probability. When using radial basis functions for instance, the location of basis centers can be slightly perturbed without significantly affecting the posterior probability of the new configuration. The optimal value of the basis weights w under two such perturbations will slightly differ however, as will the resulting transformation Ψ. The evidence-based approximation of Eq. (4.18) relies on a single, perhaps only marginally superior, configuration, whereas the true posterior sums over all such configurations, as seen from Eq. (4.19). As it turns out, 'basis wiggling' accounts for a significant part of the uncertainty.

4.4 Preliminary Experiments and Results

The following experiments aim to qualitatively evaluate the consistency of approximate evidence-based and faithful MCMC inference, as well as to provide insight into the MCMC inference scheme, starting again from the example 2D registration of Chapter 3, shown in Fig. 4.5.

Figure 4.5: 2D registration setting: fixed and moving images.

For the approximate evidence-based inference, the methodology of Chapter 3 is used without change. The multiscale dictionary is identical, using Gaussian RBFs at three different scales (isotropic, σ = 6mm, 12mm and 24mm). For the MCMC scheme, the chain was run for roughly 7·10⁵ transitions and 500 samples were regularly extracted. Approximately 7·10⁴ additional samples were discarded as part of the burn-in phase, during which the parameters of the proposal distribution were fine-tuned (cf. section 4.2.3.2). This tuning relies on a set of sufficient statistics, such as the average energy and the average voxel-wise square intensity residuals per sample. The averages are computed using a scheme that downweights the early samples, typically using a first phase of fixed learning rate before reverting to a classical (inverse linear) weighting, drawing inspiration from the SAEM scheme of e.g. [Richard 2009]; a sketch of such a schedule is given below. The free parameter s, controlling the spread of proposals compared to the second-order approximation of the posterior (section 4.2.3.2), was set to 1 (spread unchanged). The observed acceptance rate varies between 20–45% under reasonable variations of the experimental setting, and between 27–34% during the run of interest under the settings described above. Examples of samples are reported in Fig. 4.6. As an order of magnitude, the run takes 20 minutes on a standard laptop with a naive implementation.
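A minimal sketch of such an averaging schedule (illustrative; the cutoff and rates below are placeholders, not the values used in the experiments):

```python
# Sketch: running average with a fixed learning rate during an initial phase,
# then a classical 1/t decay, in the spirit of SAEM step-size schedules.
import numpy as np

def running_average(values, n_flat=100):
    avg, out = 0.0, []
    for t, v in enumerate(values, start=1):
        if t == 1:
            gamma = 1.0                       # initialize with first value
        elif t <= n_flat:
            gamma = 0.1                       # fixed learning rate phase
        else:
            gamma = 1.0 / (t - n_flat)        # inverse-linear weighting
        avg = (1 - gamma) * avg + gamma * v
        out.append(avg)
    return np.array(out)

rng = np.random.default_rng(10)
stats_stream = 5.0 + rng.standard_normal(1000)   # e.g. chi^2_lik per sample
print(running_average(stats_stream)[-1])          # converges near 5.0
```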

Figure 4.6: Examples of samples returned by the MCMC run.

Fig. 4.7 reports the mean displacement estimated respectively by the evidence-based inference scheme and by the MCMC inference scheme. As anticipated from the discussion of section 4.3, very good agreement between the evidence-based and MCMC-based estimates of the displacement is observed. Upon close inspection, minor differences are noted in some areas with flat intensity profiles or otherwise low confidence (such as that resulting from artefacts, or disagreeing intensities in the fixed and moving image). Their magnitude is lower than the level of uncertainty in the output of registration, as estimated from the MCMC scheme.

Figure 4.7: Comparison of the approximate evidence-based posterior mean displacement vs. the MCMC-based posterior mean.

Fig. 4.8 reports the estimates of uncertainty obtained from the MCMC characterization of the posterior. Any relevant statistic can be estimated empirically from the set of samples returned by the run. To study the spatial localization of uncertainty, we visualize at each voxel center x_i the 2 × 2 empirical covariance matrix of the posterior distribution p(u(x_i)|I, J, H) of the corresponding displacement vector u(x_i). This is reasonable under the assumption that the posterior on displacements is approximately unimodal and Gaussian. The voxelwise empirical covariance matrix, or its square root (homogeneous to a standard deviation), can be visualized as a 2D tensor that encodes the uncertainty at this point along any direction. Fig. 4.8 displays the resulting tensor map (Left) and a scalar summary (Right).
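A short sketch of this post-processing (with random stand-in samples; in practice the samples come from the MCMC run):

```python
# Sketch: voxelwise empirical displacement covariance tensors from samples.
# `samples` has shape (n_samples, n_voxels, 2): one 2D displacement per voxel.
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_voxels = 500, 100
samples = rng.multivariate_normal([0, 0], [[0.2, 0.05], [0.05, 0.1]],
                                  size=(n_samples, n_voxels))

mean = samples.mean(axis=0)                           # (n_voxels, 2)
dev = samples - mean
cov = np.einsum('svi,svj->vij', dev, dev) / (n_samples - 1)

# Square root of each 2x2 covariance (homogeneous to a standard deviation):
eigval, eigvec = np.linalg.eigh(cov)
sqrt_cov = eigvec @ (np.sqrt(eigval)[..., None] * eigvec.transpose(0, 2, 1))
trace_sqrt = sqrt_cov.trace(axis1=1, axis2=2)         # scalar summary map
print(trace_sqrt[:5])
```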

The order of magnitude of the reported uncertainties (typically ∼ 1mm for a 95% confidence interval) is consistent with both the magnitude of the underlying motion (no more than 5mm, see Fig. 4.7) and the resolution (voxel dimensions: 1.25mm × 1.25mm). As expected, uncertainty is higher in regions with little structured content (no intensity gradients) and in the direction of contours.

Figure 4.8: Estimates of uncertainty obtained by characterizing the posterior distribution of the sparse Bayesian model by MCMC sampling. (Left) Tensor visualization of the displacement uncertainty: each tensor encodes the square root of the empirical 2 × 2 displacement covariance matrix at this location. Color scheme: direction of the first eigenvector. (Middle) Moving image. (Right) Trace of the square root covariance.

Finally, Fig. 4.9 demonstrates the benefit of a careful design of the Markov chain. The left-most figure displays the estimated mean displacement, under the aforementioned experimental setting, if moves in the space of transformation parameters are done separately from the resampling of voxelwise assignments to components of the noise mixture, instead of jointly (right-most figure). In this example, a local discrepancy in the intensity profiles of the fixed and moving images induces a spurious maximum in the joint posterior distribution of transformation parameters and voxel labels (cf. section 4.2.3.4). A systematic and indefinite drift towards this mode was observed in all runs where the sampling was performed in an alternated manner, whereas systematic recovery was observed under the improved scheme. Similar observations were made in preliminary experiments where temperature parameters were treated by Gibbs sampling instead of being analytically marginalized over.

4.5 Discussion and Conclusions

In this chapter we explored the properties of the proposed sparse Bayesian model of registration for the purpose of uncertainty quantification. We emphasize the distinction between the Bayesian model itself and the inference schemes used to characterize the distributions of interest within this model. In particular, we presented in the previous chapter a greedy evidence-based approximate inference scheme. The question underlying the present work, motivated by [Quiñonero-Candela 2005, Rasmussen 2005], is twofold. Does this approximate inference procedure provide consistent estimates of the moments of the true posterior distribution of displacements, specifically of the expectation and variance? And perhaps more fundamentally, are the estimates of displacement and uncertainty provided by the true sparse model useful and sound?

To this aim, we proposed a reversible jump (transdimensional) MCMC scheme for the systematic, quasi-exact characterization of the posterior distribution. Special care was taken in the design of the chain to ensure proper mixing: nuisance variables such as temperature parameters (noise and regularization levels) and mixture parameters were analytically marginalized over, while transformation parameters and voxel assignments to mixture components were jointly sampled. This was shown to prevent the chain from indefinitely drifting towards poor local maxima of the posterior on examples of practical relevance. This work hints at the good empirical correspondence between the optimal estimates of displacement returned by the approximate and exact inference schemes, particularly in the presence of informative data. On the other hand, it evidences limitations of the approximate evidence-based scheme for uncertainty quantification, even in simplified experimental settings. This observation is supported by simple insight into the underlying approximations made by the evidence-based scheme, and into how it proceeds to select the active parametrization.

Figure 4.9: Comparison of estimates of the posterior mean returned by MCMC characterization. (Left) Alternated sampling of, on the one hand, the activation variables z and corresponding weights w and, on the other hand, the voxel mixture assignments c. (Right) Joint sampling, as per the approach proposed in section 4.2.3.4.

Despite the limitations of the approximate scheme, preliminary experiments hint at the soundness of the true uncertainty estimates (as characterized from the MCMC analysis). Of course, a major factor affecting the validity of uncertainty estimates is the soundness of the model assumptions. Various model biases can plague the quality of uncertainty estimates in the context of registration, such as the assumed model of interpolation of image intensities and the inadequacy of the noise or prior models. A thorough, application-driven evaluation of the quantified uncertainty, assessing the respective impacts of these factors, should be conducted: so far, this work contributes by providing the methodological framework and schemes to do so. As a methodological perspective, the Bayesian framework and the proposed MCMC inference scheme in fact allow us to relax even further both the prior assumptions (e.g. allowing the structure of the quadratic prior to be learned) and the observation model.

To address the limitations of the evidence-based scheme, approximate inference schemes that correct identified causes of breakdown can be devised, for instance by keeping track of several sets of relevant explanatory variables via tree structures [Schniter 2008]. In fact, we note instead the satisfactory computational complexity of the proposed MCMC approach and its high potential for parallelization, which could make it usable in the research routine even on real clinical data. Moreover, classic techniques used to accelerate registration optimization schemes (subsampling of voxels, sensible scanning of the multiscale parametrization) remain applicable in the proposed framework, with some care. In the proposed MCMC approach, we did not make use of full-dimensional moves over the space of transformation parameters. Calibrating such transitions calls for the particularly expensive computation of large (non-diagonal) Hessian matrices, which renders them inefficient unless one exploits e.g. dedicated procedures inspired from limited memory quasi-Newton methods [Zhang 2011]. Such moves could be incorporated in the proposed scheme in the future, although we found component-wise transitions to be particularly suitable, given that the set of active bases must be jointly explored.


CHAPTER 5

Conclusion and Perspectives

Contents
5.1 Personalization of cardiac mechanics: contribution & perspectives
5.2 Sparse Bayesian modelling for image registration: contribution & perspectives
    5.2.1 Contributions and perspectives on image registration
    5.2.2 Contributions and perspectives on statistical learning

This thesis was motivated by the following questions. Can we learn cardiac characteristics, such as the parameter values governing constitutive mechanics, by observing the cardiac motion? Can statistical learning make the process faster or more reliable? The question in all its generality remains broad in scope, but this work brings partial answers, insight and methodological contributions to approach the issue.

This thesis started with a preliminary machine learning approach for the personalization of cardiac mechanics from image-based 3D+t data. The work evidenced high uncertainty in the estimation of mechanical parameters, with sources of uncertainty typically ranging from the lack of parameter identifiability when personalizing from purely kinematic data, to model biases, to uncertainty and errors in the preprocessing steps of segmentation and motion tracking. We then focused on the development of novel tools for motion tracking, and more generally image registration, enabling uncertainty quantification. The work explored the possibilities offered by Bayesian modelling to address open limitations of classical optimization-based registration schemes. We demonstrated how sparse Bayesian modelling could result in greater automation of the registration task and could render possible the use of adaptive observation and prior models (a.k.a. image similarity and regularizing priors). We provided methodological tools for, and insight into, the problem of quantifying uncertainty in registration, and specifically under the sparse Bayesian registration model. The computational tractability of the proposed Bayesian model was demonstrated on a range of modalities on real 3D and 4D cardiac data.

5.1 Personalization of cardiac mechanics: contribution & perspectives

In [Le Folgoc 2012], we proposed a preliminary machine learning framework for the personalization of cardiac mechanics from image-based data. The space of model parameters is systematically sampled at a training stage: simulations are run for each parameter set, and the resulting training set is used to learn a 'causal' regressor between the parameter and observation spaces. The observation space consists of time sequences of 3D meshes, encapsulating shape and motion. This is consistent with the output of the mechanical model and, at test time, such sequences can be obtained once the kinematics have been extracted from images by motion tracking. At test time, parameters are optimized so as to minimize the discrepancy between simulation and observation in the shape and motion space. To alleviate the computational cost, an intermediate reduced observation space is introduced: a low-dimensional descriptor of the shape and motion serves as a surrogate for the mesh sequences, and allows for fast optimization, even with complex optimizers (e.g. simulated annealing). We proposed to derive the low-dimensional descriptor by linear dimensionality reduction in a Hilbert space of smooth 4D currents, exploiting the advantageous space structure to guarantee efficient computations and bypass the need for (node-to-node pairwise) mesh correspondences in the training set and at test time. The framework is highly parallelizable during the training stage and fast at test time.
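A schematic sketch of this train/test flow, with scikit-learn components standing in for the currents-based machinery and with synthetic stand-ins for the simulations (all names and sizes are hypothetical). For brevity, the sketch regresses directly from reduced descriptors to parameters, whereas the thesis learns a forward 'causal' regressor and then optimizes over parameters:

```python
# Sketch: sampled simulations -> linear dimensionality reduction -> regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(9)
n_sims, n_params, n_motion_dofs = 200, 3, 5000
theta_train = rng.uniform(0, 1, size=(n_sims, n_params))     # sampled parameters
# Synthetic stand-in for simulated shape-and-motion outputs:
motion_train = theta_train @ rng.standard_normal((n_params, n_motion_dofs)) \
    + 0.01 * rng.standard_normal((n_sims, n_motion_dofs))

pca = PCA(n_components=10).fit(motion_train)                 # reduced descriptors
desc_train = pca.transform(motion_train)
reg = KernelRidge(kernel='rbf', alpha=1e-3).fit(desc_train, theta_train)

motion_observed = motion_train[:1]          # test time: motion-tracking output
print(reg.predict(pca.transform(motion_observed)), theta_train[0])
```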

The work evidences the issue of parameter identifiability even in simplified synthetic experimental settings. While state-of-the-art sequential optimization techniques [Moireau 2011, Imperiale 2011, Xi 2011, Chabiniok 2012, Marchesseau 2013b] give partial access to the uncertainty in the estimation, a dedicated probabilistic framework would crucially [Liu 2009a, Brynjarsdóttir 2014] let one account for the multimodality of the distribution of probable parameters, for various sources of biases (e.g. boundary and limit conditions, modelling of the mechanics, observation model) and for uncertainty in the preprocessing (segmentation, motion tracking). Bayesian treatment of the inverse problem remains so far marginal in the cardiac electromechanical personalization community, except for the recent work of [Neumann 2014] and the seminal paper of [Konukoglu 2011] (applied to the personalization of the electrophysiology), although plentiful inspiration can be found in the varied literature in the fields of statistics, inverse problems and computational physics, e.g. [Kennedy 2001, Arendt 2012, Stuart 2010, Kaipio 2011]. Some of the building blocks for a tractable Bayesian analysis are already present in the proposed work: moving from an optimization to a probabilistic standpoint, the simulated annealing based optimization could be replaced with dedicated MCMC schemes for the exploration of the posterior distribution; the training phase along with the dimensionality reduction address the more general issue of approximating the costly forward model (the simulation and the observation model) with an inexpensive surrogate. The sampling of the parameter space required to build the approximate model may advantageously be done over sparse grids (e.g. [Ma 2009]) with proper scaling w.r.t. the dimensionality of the parameter space.

5.2 Sparse Bayesian modelling for image registration: contribution & perspectives

5.2.1 Contributions and perspectives on image registration

This work developed a probabilistic framework for non-rigid image registration that enables better automation of the registration process thanks to principled hyperparameter inference. The strategy is successful in finding a good, objective trade-off between image similarity and regularization as in e.g. [Simpson 2012]; but it also allowed for data-driven adaptivity of both the model of intensity residuals (via a mixture model) and of the representation of deformations, seamlessly coupling a coarse parametrization, to ensure proper extrapolation of the structured image content to textureless, flat-intensity regions, with a finer parametrization, to capture local patterns of the observed motion in regions of high confidence. To our knowledge, it is the first time that spatial adaptivity of the parametrization of transformations was attempted on probabilistic grounds. In our case, this was made easier by the recourse to specific sparsity-inducing priors with straightforward probabilistic interpretations. In experiments on clinical data we retained the benefits of fixed, coarsely parametrized FFD (Free Form Deformation) models w.r.t. the regularity of the inferred motion, while typically maintaining state-of-the-art accuracy. In the context of cardiac motion tracking this may give opportunities to consistently capture subtle temporal patterns of asynchronous contraction, such as septal flashes. We also note that the approach was completely generic in terms of the representation of the transformation, exploiting little prior knowledge about the specific application to cardiac imaging, and specifically no segmentation of the cardiac muscle. This proved to be convenient as cardiac segmentation remains a challenging task, but dedicated prior knowledge or constraints on the parametric representation of transformations could of course be seamlessly integrated into the model.

As a perspective, moving from a small displacement framework towards a model of large diffeomorphic deformations (as in [Arsigny 2006, Ashburner 2007, Arsigny 2003] or [Miller 2002, Beg 2005, Durrleman 2007, Zhang 2013, Sommer 2013]) could be extremely fruitful in the context of intersubject registration, or of longitudinal analysis with poor temporal resolution. In addition, the framework for basis selection developed in this work for pairwise registration could be directly extended to derive a common parametric atlas of deformations in group-wise analyses. In the present work, the adaptive parametrization was a means to achieve compactness of representation, with gains in terms of computational tractability and in terms of accuracy and smoothness of the transformation. However, if performed at a group scale, the reduced representation could become intrinsically related to the trends and variability in the population, and open avenues for statistical analysis. A final aspect left open by our work regarding image registration is the derivation of a statistically coherent, yet efficient, model of images, with particular care taken in the modelling of spatial correlations. Spatial correlations of intensity residuals were ignored in the generative model we proposed, and the over-confidence in the data stemming from this approximation was corrected on an ad hoc basis by introducing a joint scheme for virtual decimation of voxels. While partial solutions to model correlations are brought to the table by Fourier analysis in the case of spatially homogeneous noise, the difficulty lies in accounting at the same time for the non-stationarity of the noise patterns in images.

5.2.2 Contributions and perspectives on statistical learning

Although specifically applied to registration problems, the methodological tools that we put in place have a broader scope of application to regression, classification and unsupervised learning tasks. While it is somewhat difficult to have a thorough vision of the machine learning and statistical literature, a fully generic extension of the fast marginal likelihood computation scheme of [Tipping 2003] that handles the incorporation of generic quadratic priors (in addition to the sparsity-inducing terms) is proposed here for the first time, to our knowledge. We note that the extended prior model itself was also proposed for image-based tasks of classification and regression under the Relevance Voxel Machine of [Sabuncu 2012], where inference followed the earlier guidelines of [Tipping 2001], later accelerated for specific priors exploiting sparsely connected graphs [Ganz 2013]. Alternatively, the scheme developed in [Le Folgoc 2014, Le Folgoc 2015b] can be seen as a fast greedy inference scheme for a Spike-&-Slab prior model with an arbitrary structure of dependency between explanatory variables (explanatory variables are typically assumed to be a priori independent, e.g. in [Mitchell 1988]). Relevant applications for the proposed model and inference span a range of probabilistic regression and classification tasks where some benefit can be expected from enforcing a given form of regularity or connectivity.

Secondly, we explored the properties of the proposed model for the purpose of uncertainty quantification. In that respect, the sparse Bayesian model of [Tipping 2001, Bishop 2000] was previously reinterpreted in the work of [Candela 2004] from the standpoint of sparse Gaussian processes. Their work evidenced limitations of the RVM approximate evidence-based inference [Quiñonero-Candela 2005, Rasmussen 2005], which questioned its ability to gauge adequate confidence intervals. In [Le Folgoc 2015a], our work clarifies the respective responsibilities of the inference scheme and of the model. We provide theoretical arguments and empirical evidence for the good behaviour of the sparse Bayesian model itself in terms of uncertainty quantification, while indeed revealing limitations of approximate evidence-based inference for that purpose. We contribute by proposing a feasible transdimensional Markov chain Monte Carlo scheme for the exploration of the parameter space, exploiting insight into the model to avoid standard impediments of MCMC implementations. On the applicative side, an extensive evaluation of the uncertainty estimates still needs to be conducted, with specific targets to judge their usefulness from a clinical standpoint. On the methodological side, the possibilities offered by statistical modelling and analysis could be further explored, moving from Bayesian modelling and inference towards Bayesian (or non-Bayesian) analysis of the modelling assumptions and assessment of their soundness. On a shorter term, our Bayesian framework and MCMC inference open up avenues for tractable model comparison and would allow us to relax even further both the prior assumptions (e.g. allowing the structure of the quadratic prior to be learned) and the observation model.


APPENDIX A

Sparse Bayesian Registration: Technical Appendices

Contents
A.1 Closed form regularizers for Gaussian reproducing kernel Hilbert spaces
A.2 Contribution of a basis to the log marginal likelihood
A.3 Update of µ, Σ, L
A.4 Update of s_i, κ_i and q_i
A.5 Marginal prior & marginal likelihood

A.1 Closed form regularizers for Gaussian reproducing kernel Hilbert spaces

Gaussian reproducing kernel Hilbert space. Given a d × d symmetric positive definite (s.p.d.) matrix S and the Gaussian kernel K_S(x, y) = exp{−(1/2)(x − y)^⊤S^{-1}(x − y)}, we consider the space H^d_S of integrable d-vector fields f : R^d → R^d such that ‖f‖_S < +∞, where

\[ \|f\|_S^2 = \frac{1}{(2\pi)^d} \int_\xi \|\hat{f}(\xi)\|_2^2\, \hat{K}_S^{-1}(\xi)\, d\xi \tag{A.1} \]

involves the Fourier transform f̂ = F[f] of f, defined by f̂(ξ) = ∫_x exp{−ix^⊤ξ} f(x) dx. Endowed with the inner product (A.2),

\[ \langle f \mid g \rangle_S = \frac{1}{(2\pi)^d} \int_\xi \hat{f}(\xi)^\dagger \hat{g}(\xi)\, \hat{K}_S^{-1}(\xi)\, d\xi \tag{A.2} \]

(H^d_S, ⟨·|·⟩_S) is a reproducing kernel Hilbert space with attractive theoretical and algorithmic properties. Perhaps more intuitively, H^d_S is the completion of the space spanned by all finite combinations of K_S(x, ·)α, for x ∈ R^d and α ∈ R^d.

A multiscale space. From (A.1) and the properties of Gaussian kernels under the Fourier transform, the atom K_{S'}(x, ·)α lies in H^d_S if and only if S' > S/2, in the sense of positive definiteness. In particular, given a sequence S_1 ≤ ··· ≤ S_q of d × d s.p.d. matrices, their associated r.k.h.s. are nested: H^d_{S_1} ⊇ ··· ⊇ H^d_{S_q}. This property leads to a principled framework to represent displacements in a multiscale fashion, jointly regularized at all scales.

Closed form regularizers. The successive partial derivatives ∂_{x_{i_1}···x_{i_p}} f of elements f ∈ H^d_S exist and all lie in H^d_S [Zhou 2008]. As such, we may consider the family of regularizers of the form R_D(f) = ‖Df‖²_S, where D is a differential operator. For any f that can be written over a finite number of atoms K_{S_k}(x_k, ·)α_k, the properties of Gaussian kernels under Fourier transform, multiplication (by Gaussian kernels) and summation yield closed form expressions for such regularizers. We illustrate this on two classic penalty terms for registration: the membrane and bending energies (D = ∇^s, s = 1, 2). Recall that for any p-uplet of integers i_1 ··· i_p ∈ {1, ···, d}, F[∂_{x_{i_1}···x_{i_p}} f](ξ) = j^p ξ_{i_1} ··· ξ_{i_p} F[f](ξ). Thus for any f ∈ H^d_S,

\[ \mathcal{F}[\nabla \cdot f](\xi) = j\xi^\top \mathcal{F}[f](\xi)\,. \tag{A.3} \]

and for any even integer s,

\[ \mathcal{F}[\nabla^s f](\xi) = (j\xi)^s\, \mathcal{F}[f](\xi) \tag{A.4} \]

The Fourier transform of a Gaussian kernel is given by F[K_{S_k}(x_k, ·)](ξ) = |2πS_k|^{1/2} e^{−jξ^⊤x_k} exp{−(1/2)ξ^⊤S_kξ}, which is again Gaussian. Regrouping the exponential factors when appropriate, we derive the following expression for inner products of the form ⟨∇^s(K_{S_k}α_k) | ∇^s(K_{S_l}α_l)⟩_S:

\[ C_{S_k,S_l,d} \cdot \alpha_k^\top \left[ \int_\xi \|\xi\|^{2s}\, \mathcal{F}[K_{S_{k,l}}(z_{k,l}, \cdot)](\xi)\, d\xi \right] \alpha_l \tag{A.5} \]

where C_{S_k,S_l,d} = (1/(2π)^d) (|S_k|·|S_l| / (|S|·|S_{k,l}|))^{1/2} is a constant, S_{k,l} = S_k + S_l − S and z_{k,l} = x_l − x_k.

Recognizing the inverse Fourier transform of ∇^{2s}[K_{S_{k,l}}(0, ·)] evaluated at z_{k,l}, we finally obtain that

\[ \Big\| \nabla^s \Big( \sum_k K_{S_k}(x_k, \cdot)\, \alpha_k \Big) \Big\|_S^2 = \alpha^\top \left[ R^s_{k,l} \right]_{1 \le k,l \le M} \alpha \tag{A.6} \]

\[ R^s_{k,l} = \left( \frac{|S_k| \cdot |S_l|}{|S| \cdot |S_{k,l}|} \right)^{1/2} (-\Delta)^s [K_{S_{k,l}}](z_{k,l})\; I \tag{A.7} \]

for s integer and I the d × d identity matrix. Straightforward analytical expressions of (−Δ)^s[K_{S_{k,l}}](z_{k,l}) are obtained for s = 1, 2 by computing the derivatives of the Gaussian kernel.

A.2 Contribution of a basis to the log marginal likelihood

It follows from Eq. (3.22) that the log marginal likelihood 𝓛 = log p(t|A, λ, σ²) is given, up to an additive constant, by

\[ \mathcal{L} = -\frac{1}{2} \left\{ \log |C| + t^\top C^{-1} t \right\} \tag{A.8} \]

with C = (βH)^{-1} + ΦLΦ^⊤, where we define

\[ L \triangleq (A + \lambda R)^{-1}\,. \tag{A.9} \]


Noting that C exclusively depends on the basis k via the kth diagonal coefficient of A and the kth column of Φ, we would like to single out the contribution ℓ(A_k) of any such basis φ_k to the global log marginal likelihood, in the form:

\[ \mathcal{L} = \ell(A_k) + \mathcal{L}_{-k} \tag{A.10} \]

where 𝓛_{-k} does not depend on the basis k. If we denote by L_{-k} the inverse of the matrix obtained by removing the kth column from L^{-1} = A + λR (or equivalently by setting A_k = +∞ in L), we see from the Woodbury rank one matrix identity that L = L_{-k} + U_k L_{kk} U_k^⊤, with U_k^⊤ = ((λL_{-k}R_k)^⊤  I) and L_{kk} = (A_k + κ_k)^{-1}, where the d × d matrix κ_k is defined as:

\[ \kappa_k \triangleq \lambda R_{kk} - (\lambda R_k)^\top L_{-k}\, (\lambda R_k) \tag{A.11} \]

By injecting this latter decomposition of L into the expression of C, we derive a decomposition of C into the sum of a term that does not depend on the kth basis and of a rank one term:

\[ C = C_{-k} + (\Phi U_k)(A_k + \kappa_k)^{-1}(\Phi U_k)^\top \tag{A.12} \]

Letting C^{-1}_{-k} ≜ (C_{-k})^{-1}, a second application of rank one update identities for the determinant and the inverse gives the two following expressions (A.13) and (A.14) for the two terms in the right-hand side of the log marginal likelihood expression (A.8):

\[ |C| = |C_{-k}| \cdot |A_k + \kappa_k|^{-1} \cdot |A_k + \kappa_k + s_k| \tag{A.13} \]

\[ t^\top C^{-1} t = t^\top C^{-1}_{-k} t - q_k^\top (A_k + \kappa_k + s_k)^{-1} q_k \tag{A.14} \]

We introduced the statistics s_k and q_k, respectively defined as:

\[ s_k \triangleq (\Phi U_k)^\top C^{-1}_{-k} (\Phi U_k) \tag{A.15} \]

\[ q_k \triangleq (\Phi U_k)^\top C^{-1}_{-k}\, t \tag{A.16} \]

We thus retrieve the expression Eq. (3.24) for ℓ(A_k), which was used without proof in Section 3.3.3. It is of practical significance to the algorithmic complexity of our schemes that the quantities involved (L, µ, Σ) do not actually depend on bases that are not in the active set S (i.e. all bases s.t. A_m = +∞). Similarly, s_k, κ_k and q_k only involve the set of active bases S augmented with the kth basis, because of the form of U_k.
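The identities (A.12)-(A.16) can be checked numerically with random s.p.d. stand-ins, which also makes the role of the statistics s_k and q_k concrete (a sketch, not part of the original derivations):

```python
# Sketch: numerical check of the rank-one-block identities (A.13)-(A.14).
import numpy as np

rng = np.random.default_rng(6)
n, d = 10, 3
B = rng.standard_normal((n, n)); C_mk = B @ B.T + n * np.eye(n)   # C_{-k}
G = rng.standard_normal((d, d)); Akk = G @ G.T + np.eye(d)        # A_k + kappa_k
U = rng.standard_normal((n, d))                                   # Phi U_k
t = rng.standard_normal(n)

C = C_mk + U @ np.linalg.solve(Akk, U.T)          # Eq. (A.12)
s_k = U.T @ np.linalg.solve(C_mk, U)              # Eq. (A.15)
q_k = U.T @ np.linalg.solve(C_mk, t)              # Eq. (A.16)

lhs_det = np.linalg.det(C)
rhs_det = np.linalg.det(C_mk) / np.linalg.det(Akk) * np.linalg.det(Akk + s_k)
lhs_quad = t @ np.linalg.solve(C, t)
rhs_quad = t @ np.linalg.solve(C_mk, t) - q_k @ np.linalg.solve(Akk + s_k, q_k)
print(np.isclose(lhs_det, rhs_det), np.isclose(lhs_quad, rhs_quad))
```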

The maximization of Eq. (3.24) under the constraint that A_k is a symmetric positive semidefinite d × d matrix involves the gradient of the (unconstrained) function ℓ(A_k):

\[ \nabla \ell(A_k) = -\sigma_k \left\{ q_k q_k^\top - s_k - s_k (A_k + \kappa_k)^{-1} s_k \right\} \sigma_k \tag{A.17} \]

where σ_k is shorthand for (A_k + κ_k + s_k)^{-1}, and ∇ℓ(A_k) is a d × d matrix. Since σ_k is symmetric positive definite and q_kq_k^⊤ is of rank one, ∇ℓ(A_k) has at most one negative eigenvalue. More precisely, if q_kq_k^⊤ − s_k is negative then ∇ℓ(A_k) is positive definite for all A_k and the improper maximizer of ℓ(A_k) lies at infinity, A_k → +∞. Otherwise there is exactly one negative eigenvalue and we look for maximizers of the form A_k^{-1} = α_k^{-1} η_k η_k^⊤.


This is consistent with the intuitive comment that A_k^{-1} ∈ M_{d,d} cannot be fully determined from a single "observation" q_k ∈ R^d and should be degenerate. Rewriting Eq. (3.24) as a function of α, η leads to maximizing (A.18) under the constraint that α is positive (dropping the index k for convenience). Note also that (A.18) is invariant under the reparametrization η → νη, α → α/ν².

\[ \ell(\alpha, \eta) = -\log\left\{ 1 + \frac{\eta^\top s\, \eta}{\alpha + \eta^\top \kappa\, \eta} \right\} + \frac{(q^\top \eta)^2}{\alpha + \eta^\top (\kappa + s)\, \eta} \tag{A.18} \]

At a maximizer (α*, η*) = arg max_{α,η} ℓ(α, η), the constraint is either active (α* = 0) or inactive (α* > 0). If inactive, the solution actually maximizes the unconstrained function (A.18) and is given by α* = α(η̂), η* = η̂, where

\[ \alpha(\eta) = \frac{(\eta^\top s\, \eta)^2}{(q^\top \eta)^2 - \eta^\top s\, \eta} - \eta^\top \kappa\, \eta\,, \tag{A.19} \]

\[ \hat{\eta} = s^{-1} q\,. \tag{A.20} \]

In this case ℓ(α*, η*) is simply equal to ℓ̂(η̂), where ℓ̂(η) is defined by (A.21) with ξ(η) ≜ (q^⊤η)²/η^⊤sη.

\[ \hat{\ell}(\eta) \triangleq -\log \xi(\eta) + \xi(\eta) - 1 \tag{A.21} \]

In addition, ℓ̂(η) can be seen to always provide an upper bound to the maximum contribution of a basis to the evidence, max_{α,η} ℓ(α, η). In the case where the constraint is active, α* = 0, we numerically optimize over the unit sphere in R^d to find η*. This case occurs when the ℓ₂-norm regularization is by itself sufficient along the direction η*, and no additional shrinkage is deemed necessary. To save on unnecessary computations, we first check that the upper bound ℓ̂(η̂) to the maximum contribution of the basis k to the evidence is superior to the current best contribution among the bases already handled, as this is a necessary condition for A_k to be updated at this iteration.

A.3 Update of µ, Σ, L

Updates of the moments of the posterior distribution µ, Σ = (Φ^⊤(βH)Φ + A + λR)^{-1} and of L = (A + λR)^{-1}, upon deletion from the model, update, or addition to the model of a basis i, are done similarly to [Tipping 2003] and follow from Woodbury identities. Denoting updated quantities with a tilde, we get in the case of deletion:

\[ \tilde{\Sigma} = \Sigma - \Sigma_i \Sigma_{ii}^{-1} \Sigma_i^\top\,, \tag{A.22} \]

\[ \tilde{L} = L - L_i L_{ii}^{-1} L_i^\top \tag{A.23} \]

and

\[ \tilde{\mu} = \mu - \Sigma_i (\Sigma_{ii}^{-1} \mu_i)\,. \tag{A.24} \]

These rank one updates carefully avoid matrix-matrix products and have a O(|S|²) complexity. In the case of the addition of a basis, we first compute the new column of Σ (resp. L) before updating its full body as:

\[ \tilde{\Sigma} = \Sigma + \tilde{\Sigma}_i \tilde{\Sigma}_{ii}^{-1} \tilde{\Sigma}_i^\top\,, \tag{A.25} \]


\[ \tilde{L} = L + \tilde{L}_i \tilde{L}_{ii}^{-1} \tilde{L}_i^\top\,, \tag{A.26} \]

where the column Σ̃_i (resp. L̃_i) is given in O(|S|²) by

\[ \tilde{\Sigma}_i = \begin{pmatrix} \Sigma \Pi_i \tilde{\Sigma}_{ii} \\ \tilde{\Sigma}_{ii} \end{pmatrix}, \qquad \tilde{L}_i = \begin{pmatrix} L (\lambda R_i) \tilde{L}_{ii} \\ \tilde{L}_{ii} \end{pmatrix} \tag{A.27} \]

and

\[ \tilde{\Sigma}_{ii} = (s_i + \kappa_i + A_i)^{-1}\,, \qquad \tilde{L}_{ii} = (\kappa_i + A_i)^{-1}\,. \tag{A.28} \]

Π_i is the column vector of d × d matrices defined by Π_i = Φ^⊤(βH)φ_i + λR_i. The case of the update of a basis i is treated as a deletion followed by an addition, updating s_i and κ_i in between these actions as they are needed in Eq. (A.28).
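The deletion update (A.22) can be checked numerically: the downdate of Σ, restricted to the remaining indices, agrees with the direct inverse of the pruned precision matrix (a sketch with random stand-ins):

```python
# Sketch: check of the rank-one deletion update (A.22). Removing basis i from
# the precision (equivalently, setting A_i = +inf) agrees with the downdate.
import numpy as np

rng = np.random.default_rng(7)
n, i = 6, 2
B = rng.standard_normal((n, n)); P = B @ B.T + n * np.eye(n)   # A + lambda R
Sigma = np.linalg.inv(P)

downdated = Sigma - np.outer(Sigma[:, i], Sigma[:, i]) / Sigma[i, i]
keep = [j for j in range(n) if j != i]
direct = np.linalg.inv(P[np.ix_(keep, keep)])
print(np.allclose(downdated[np.ix_(keep, keep)], direct))
```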

A.4 Update of s_i, κ_i and q_i

From the resolvent identity, we note that ΣΦ^⊤(βH)ΦL = L − Σ. Using such relationships after developing the factors in (A.15) and (A.16), we derive alternative expressions for s_m and q_m:

\[ s_m = \phi_m^\top (\beta H)\, \phi_m - \Pi_m^\top \Sigma_{-m} \Pi_m + (\lambda R_m)^\top L_{-m}\, (\lambda R_m) \tag{A.29} \]

\[ q_m = \phi_m^\top (\beta H)\, t - \Pi_m^\top \Sigma_{-m} \Phi^\top (\beta H)\, t \tag{A.30} \]

where Π_m is a column vector of d × d matrices defined by Π_m = Φ^⊤(βH)φ_m + λR_m. Π_m can be interpreted as the inner product of basis m with all the active bases w.r.t. an appropriate metric, in the sense that its jth coefficient is given by: Π_{jm} = φ_j^⊤(βH)φ_m + λ(Dφ_j | Dφ_m)_H. In the specific case where λ = 0, we retrieve the quantities and expressions derived by [Tipping 2003] for the RVM. We found it useful to introduce the surrogate quantities t_m and r_m, respectively defined according to (A.31) and (A.32):

\[ t_m \triangleq \phi_m^\top (\beta H)\, \phi_m - \Pi_m^\top \Sigma \Pi_m + (\lambda R_m)^\top L\, (\lambda R_m) \tag{A.31} \]

\[ r_m \triangleq \phi_m^\top (\beta H)\, t - \Pi_m^\top \Sigma \Phi^\top (\beta H)\, t \tag{A.32} \]

These quantities merely differ from s_m and q_m in that the index −m was dropped from Σ_{−m} and L_{−m}. Our underlying motivation is to update the simpler quantities t_m and r_m, which still retain a straightforward link to the statistics s_m and q_m of interest for the computation of ℓ(A_m). Indeed, for a basis l that does not lie in the model, Σ_{−l} = Σ and L_{−l} = L. Therefore, the quantities under consideration coincide: s_l = t_l and q_l = r_l. For a basis j that lies in the model, and noting that Σ_{−j} = Σ − Σ_j Σ_{jj}^{-1} Σ_j^⊤, we obtain the statistics of interest efficiently as:

\[ s_j = t_j + \left[ \Pi_j^\top \Sigma_j \right] \Sigma_{jj}^{-1} \left[ \Pi_j^\top \Sigma_j \right]^\top - \left[ (\lambda R_j)^\top L_j \right] L_{jj}^{-1} \left[ (\lambda R_j)^\top L_j \right]^\top \tag{A.33} \]

\[ q_j = r_j + \left[ \Pi_j^\top \Sigma_j \right] \Sigma_{jj}^{-1} \mu_j \tag{A.34} \]


Thus, we always maintain the quantities $t_m$ and $r_m$ (for every basis) and recompute $s_m$ and $q_m$ either at no cost for inactive bases or, for bases in the active set $S$, in $O(|S| \cdot d)$. Updates of $t_m$ and $r_m$ upon deletion from the model, update or addition to the model of a basis $i$ are done similarly to [Tipping 2003], in $O(|S| \cdot d)$ per basis. For instance, in the addition case, it follows from Woodbury identities that

$$\tilde{t}_m = t_m - \big[\Pi_m^\intercal \tilde{\Sigma}_i\big] \, \tilde{\Sigma}_{ii}^{-1} \, \big[\Pi_m^\intercal \tilde{\Sigma}_i\big]^\intercal + \big[(\lambda R_m)^\intercal \tilde{L}_i\big] \, \tilde{L}_{ii}^{-1} \, \big[(\lambda R_m)^\intercal \tilde{L}_i\big]^\intercal \qquad (A.35)$$

and

$$\tilde{r}_m = r_m - \big[\Pi_m^\intercal \tilde{\Sigma}_i\big] \, r_i \qquad (A.36)$$

where $\tilde{r}_m$ and $\tilde{t}_m$ denote updated quantities, as opposed to the quantities $r_m$ and $t_m$ prior to the update. The quantities indexed by $i$ are computed (once for all bases) following A.3.
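In code, this bookkeeping reduces to storing $t_m$ and $r_m$ for all bases and materializing $s_m$, $q_m$ only on demand. A minimal sketch follows (hypothetical container names; all inputs are assumed to be precomputed following the equations above):

```python
import numpy as np

def stats(m, active, t, r, Pi_Sigma, lamR_L, Sigma_mm, L_mm, mu):
    """Recover (s_m, q_m) from the maintained surrogates (t_m, r_m).

    t, r        : dicts of surrogates for every basis, Eqs. (A.31)-(A.32)
    Pi_Sigma[m] : the d x d matrix [Pi_m^T Sigma_m]
    lamR_L[m]   : the d x d matrix [(lambda R_m)^T L_m]
    Sigma_mm, L_mm, mu : diagonal blocks of Sigma and L, posterior mean blocks
    """
    if m not in active:           # inactive basis: s_l = t_l, q_l = r_l
        return t[m], r[m]
    a, b = Pi_Sigma[m], lamR_L[m]
    s = t[m] + a @ np.linalg.solve(Sigma_mm[m], a.T) \
             - b @ np.linalg.solve(L_mm[m], b.T)        # Eq. (A.33)
    q = r[m] + a @ np.linalg.solve(Sigma_mm[m], mu[m])  # Eq. (A.34)
    return s, q
```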

A.5 Marginal prior & marginal likelihood

We assume a Gaussian prior $p(w|\lambda) = |\frac{\lambda}{2\pi} R|^{1/2} \exp(-\frac{\lambda}{2} w^\intercal R w)$ and a conjugate Gamma prior with support over strictly positive real numbers, $p(\lambda|\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$, where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} \, dt$ is the Gamma function. Then the marginal distribution $p(w) = \int p(w|\lambda) p(\lambda) \, d\lambda$ is given by:

$$p(w) = \frac{\Gamma(\alpha + \dim(w)/2)}{\Gamma(\alpha)} \left| \frac{1}{2\pi\beta} R \right|^{1/2} \left( 1 + \frac{\chi^2}{2\beta} \right)^{-(\alpha + \dim(w)/2)} \qquad (A.37)$$

with $\chi^2 = w^\intercal R w$. This is a non-standardized multivariate Student $t_{2\alpha}(w \,|\, 0, \frac{\alpha}{\beta} R)$, denoting by $t_\nu(\cdot \,|\, \mu, \Lambda)$ the multivariate Student distribution with location parameter $\mu$, inverse scale matrix $\Lambda$ and $\nu$ degrees of freedom:

$$t_\nu(t \,|\, \mu, \Lambda) = \frac{\Gamma\big(\frac{\nu + \dim(t)}{2}\big)}{\Gamma\big(\frac{\nu}{2}\big)} \left| \frac{1}{\nu\pi} \Lambda \right|^{1/2} \left( 1 + \frac{1}{\nu} (t - \mu)^\intercal \Lambda (t - \mu) \right)^{-(\nu + \dim(t))/2} \qquad (A.38)$$

In the limit case $\alpha, \beta \rightarrow 0$, $\chi^2 \gg 2\beta$ and $p(w) \propto \frac{1}{\chi^{\dim(w)}}$. Insight into this latter improper prior can be gained by remarking that it also arises naturally from the two following assumptions: scale invariance (a priori) on $\chi$, $p(\chi) \propto 1/\chi$, and rotational invariance of $p(w)$ w.r.t. the Euclidean metric defined by $R$.
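As a sanity check on Eqs. (A.37)–(A.38), the Gaussian–Gamma marginal can be compared against a Monte Carlo estimate; the sketch below (with illustrative values for $a_0$, $b_0$, $R$ and $w$, not taken from the thesis) agrees to a few decimal places:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma

a0, b0 = 3.0, 2.0                       # assumed Gamma shape / rate
R = np.array([[2.0, 0.3], [0.3, 1.0]])  # example SPD metric
w = np.array([0.5, -1.2])
d = w.size
chi2 = w @ R @ w

# Closed form, Eq. (A.37): Student t_{2 a0}(w | 0, (a0/b0) R)
log_t = (gammaln(a0 + d / 2) - gammaln(a0)
         + 0.5 * np.linalg.slogdet(R / (2 * np.pi * b0))[1]
         - (a0 + d / 2) * np.log1p(chi2 / (2 * b0)))

# Monte Carlo estimate of E_lambda[ N(w | 0, (lambda R)^{-1}) ]
lam = gamma.rvs(a0, scale=1 / b0, size=500_000, random_state=0)
log_gauss = (0.5 * d * np.log(lam / (2 * np.pi))
             + 0.5 * np.linalg.slogdet(R)[1] - 0.5 * lam * chi2)
print(np.exp(log_t), np.exp(log_gauss).mean())  # nearly identical values
```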

The form of the marginal prior given by Eq. (4.6) follows from the above. For a fixed set of active bases $S$ specified by the state of activation variables $z = (z_1 \cdots z_M)^\intercal$, the marginal prior $p(w|z,\mathcal{H})$ is the product of a Dirac $\mathcal{N}(w_{-S} \,|\, \mathbf{0}, \mathbf{0})$ on the weights $w_{-S}$ associated to inactive bases, and of a multivariate Student distribution $t_{\nu_\lambda}(w_S \,|\, 0, \frac{a_0}{b_0} d|S| R_S)$ on the weights $w_S$ associated to active bases. Analogous derivations lead to Eq. 4.7 for the marginal likelihood.


In the reversible jump scheme developed in section 4.2.3.3, ratios $\frac{p(\tilde{w}|\tilde{z},\mathcal{H})}{p(w|z,\mathcal{H})}$ of marginal priors are involved, where exactly one activation variable and its associated weight differ between both states, e.g. $(z_k = 0, w_k = 0)$, $(\tilde{z}_k = 1, \tilde{w}_k \text{ arbitrary})$; and for any $l \neq k$, $\tilde{z}_l = z_l$ and $\tilde{w}_l = w_l$. The ratio can be written as:

$$\frac{\Gamma\big(a_0 + d\frac{|S|+1}{2}\big)}{\Gamma\big(a_0 + d\frac{|S|}{2}\big)} \left( \frac{|S|+1}{|S|} \right)^{d\frac{|S|}{2}} \left( \frac{d(|S|+1)}{2\pi b_0} \right)^{\frac{d}{2}} \frac{|R_{\tilde{S}}|^{\frac{1}{2}}}{|R_S|^{\frac{1}{2}}} \; \frac{\big(1 + d|S| \frac{\chi^2}{2 b_0}\big)^{a_0 + d\frac{|S|}{2}}}{\big(1 + d(|S|+1) \frac{\tilde{\chi}^2}{2 b_0}\big)^{a_0 + d\frac{|S|+1}{2}}} \qquad (A.39)$$

where $S$ is the set of active bases excluding $k$, $\tilde{S}$ includes basis $k$, $\chi^2 = w_S^\intercal R_S w_S$ and $\tilde{\chi}^2 = \tilde{w}_{\tilde{S}}^\intercal R_{\tilde{S}} \tilde{w}_{\tilde{S}} = \chi^2 + 2 \tilde{w}_k^\intercal R_k w_S + \tilde{w}_k^\intercal R_{kk} \tilde{w}_k$. The ratio $\frac{|R_{\tilde{S}}|}{|R_S|}$ of determinants, using rank-one matrix update identities, is equal to $|R_{kk} - R_k^\intercal R_S^{-1} R_k|$.
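The determinant identity invoked here is the standard Schur-complement result for the block matrix $R_{\tilde{S}} = \begin{pmatrix} R_S & R_k \\ R_k^\intercal & R_{kk} \end{pmatrix}$, and is cheap to check numerically (an illustrative script with a random SPD matrix, not thesis code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4                      # d: block size, n: bases in S
m = d * n
G = rng.standard_normal((m + d, m + d))
R_tilde = G @ G.T + (m + d) * np.eye(m + d)   # SPD metric over S plus basis k

R_S, R_k, R_kk = R_tilde[:m, :m], R_tilde[:m, m:], R_tilde[m:, m:]

ratio = np.linalg.det(R_tilde) / np.linalg.det(R_S)
schur = np.linalg.det(R_kk - R_k.T @ np.linalg.solve(R_S, R_k))
print(np.isclose(ratio, schur))  # True
```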


Bibliography

[Alessandrini 2015] Martino Alessandrini, Mathieu De Craene, Olivier Bernard, Sophie Giffard-Roisin, Pascal Allain, Juergen Weese, Eric Saloux, Herve Delingette, Maxime Sermesant and Jan D'Hooge. A Pipeline for the Generation of Realistic 3D Synthetic Echocardiographic Sequences: Methodology and Open-access Database. IEEE Transactions on Medical Imaging, in press, 2015. (Cited on page 7.)

[Allassonnière 2007] Stéphanie Allassonnière, Yali Amit and Alain Trouvé. Towards a coherent statistical framework for dense deformable template estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 69, no. 1, pages 3–29, 2007. (Cited on page 56.)

[Archambeau 2007] Cédric Archambeau and Michel Verleysen. Robust Bayesian clustering. Neural Networks, vol. 20, no. 1, pages 129–138, 2007. (Cited on page 29.)

[Arendt 2012] Paul D Arendt, Daniel W Apley and Wei Chen. Quantification of model uncertainty: Calibration, model discrepancy, and identifiability. Journal of Mechanical Design, vol. 134, no. 10, page 100908, 2012. (Cited on page 78.)

[Argyriou 2012] Andreas Argyriou, Rina Foygel and Nathan Srebro. Sparse Prediction with the k-Support Norm. In Advances in Neural Information Processing Systems, pages 1457–1465, 2012. (Cited on page 34.)

[Aronszajn 1951] N. Aronszajn. Theory of reproducing kernels. Harvard University, 1951. (Cited on page 12.)

[Arsigny 2003] Vincent Arsigny, Xavier Pennec and Nicholas Ayache. Polyrigid and polyaffine transformations: A new class of diffeomorphisms for locally rigid or affine registration. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2003, pages 829–837, 2003. (Cited on pages 8 and 79.)

[Arsigny 2006] Vincent Arsigny, Olivier Commowick, Xavier Pennec and Nicholas Ayache. A log-euclidean framework for statistics on diffeomorphisms. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006, pages 924–931. Springer, 2006. (Cited on pages 8, 55 and 79.)

[Arsigny 2009] Vincent Arsigny, Olivier Commowick, Nicholas Ayache and Xavier Pennec. A fast and log-euclidean polyaffine framework for locally linear registration. Journal of Mathematical Imaging and Vision, vol. 33, no. 2, pages 222–238, 2009. (Cited on page 8.)

[Ashburner 2007] John Ashburner. A fast diffeomorphic image registration algorithm. NeuroImage, vol. 38, no. 1, pages 95–113, 2007. (Cited on pages 8, 32 and 79.)


[Ashburner 2013] John Ashburner and Gerard R. Ridgway. Symmetric diffeomorphic modelling of longitudinal structural MRI. Frontiers in Neuroscience, vol. 6, no. 197, 2013. (Cited on page 32.)

[Bach 2012] Francis Bach, Rodolphe Jenatton, Julien Mairal and Guillaume Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends® in Machine Learning, vol. 4, no. 1, pages 1–106, 2012. (Cited on page 34.)

[Bay 2008] Herbert Bay, Andreas Ess, Tinne Tuytelaars and Luc Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, vol. 110, no. 3, pages 346–359, 2008. (Cited on page 8.)

[Beg 2005] M. Faisal Beg, Michael I. Miller, Alain Trouvé and Laurent Younes. Computing Large Deformation Metric Mappings via Geodesic Flows of Diffeomorphisms. International Journal of Computer Vision, vol. 61, no. 2, pages 139–157, 2005. (Cited on pages 8, 55 and 79.)

[Belilovsky 2015a] Eugene Belilovsky, Andreas Argyriou, Gaël Varoquaux and Matthew B. Blaschko. Convex relaxations of penalties for sparse correlated variables with bounded total variation. Machine Learning, pages 1–21, June 2015. (Cited on page 34.)

[Belilovsky 2015b] Eugene Belilovsky, Katerina Gkirtzou, Michail Misyrlis, Anna Konova, Jean Honorio, Nelly Alia-Klein, Rita Goldstein, Dimitris Samaras and Matthew Blaschko. Predictive sparse modeling of fMRI data for improved classification, regression, and visualization using the k-support norm. Computerized Medical Imaging and Graphics, page 1, 2015. (Cited on page 34.)

[Bestel 2001] J. Bestel, F. Clément and M. Sorine. A biomechanical model of muscle contraction. In Niessen, W.J., Viergever, M.A. (eds.) MICCAI. LNCS, vol. 2208, pages 1159–1161. Springer, 2001. (Cited on pages 4 and 11.)

[Bishop 2000] Christopher M Bishop and Michael E Tipping. Variational relevance vector machines. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 46–53. Morgan Kaufmann Publishers Inc., 2000. (Cited on pages 34 and 80.)

[Bishop 2006] Christopher M Bishop et al. Pattern recognition and machine learning, volume 1. Springer, New York, 2006. (Cited on page 36.)

[Broit 1981] Chaim Broit. Optimal registration of deformed images. PhD thesis, University of Pennsylvania, 1981. (Cited on pages 8, 31 and 57.)

[Brunet 2010] F. Brunet. Contributions to Parametric Image Registration and 3D Surface Reconstruction. PhD thesis, Université d'Auvergne, Technische Universität München, 2010. (Cited on page 7.)


[Brynjarsdóttir 2014] Jenný Brynjarsdóttir and Anthony O'Hagan. Learning about physical parameters: The importance of model discrepancy. Inverse Problems, vol. 30, no. 11, page 114007, 2014. (Cited on page 78.)

[Cachier 2003] Pascal Cachier, Eric Bardinet, Didier Dormont, Xavier Pennec and Nicholas Ayache. Iconic feature based nonrigid registration: the PASHA algorithm. Computer Vision and Image Understanding, vol. 89, no. 2, pages 272–298, 2003. (Cited on pages 7 and 27.)

[Cachier 2004] Pascal Cachier and Nicholas Ayache. Isotropic energies, filters and splines for vector field regularization. Journal of Mathematical Imaging and Vision, vol. 20, no. 3, pages 251–265, 2004. (Cited on page 37.)

[Calonder 2010] Michael Calonder, Vincent Lepetit, Christoph Strecha and Pascal Fua. BRIEF: Binary robust independent elementary features. Computer Vision – ECCV 2010, pages 778–792, 2010. (Cited on page 8.)

[Candela 2004] Joaquin Quinonero Candela and Lars Kai Hansen. Learning with uncertainty – Gaussian processes and relevance vector machines. PhD thesis, 2004. (Cited on page 80.)

[Chabiniok 2012] R. Chabiniok, P. Moireau, P.-F. Lesault, A. Rahmouni, J.-F. Deux and D. Chapelle. Estimation of tissue contractility from cardiac cine-MRI using a biomechanical heart model. Biomechanics and Modeling in Mechanobiology, vol. 11, no. 5, pages 609–630, 2012. (Cited on pages 6, 11 and 78.)

[Chandrashekara 2004] Raghavendra Chandrashekara, Raad H Mohiaddin and Daniel Rueckert. Analysis of 3-D myocardial motion in tagged MR images using nonrigid image registration. IEEE Transactions on Medical Imaging, vol. 23, no. 10, pages 1245–1250, 2004. (Cited on pages 8 and 27.)

[Chapelle 2012] D. Chapelle, P. Le Tallec, P. Moireau and M. Sorine. An energy-preserving muscle tissue model: formulation and compatible discretizations. IJMCE, vol. 10, no. 2, pages 189–211, 2012. (Cited on pages 4 and 11.)

[Chib 1995] Siddhartha Chib. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, vol. 90, no. 432, pages 1313–1321, 1995. (Cited on page 64.)

[Clatz 2005] Olivier Clatz, Hervé Delingette, Ion-Florin Talos, Alexandra J Golby, Ron Kikinis, Ferenc Jolesz, Nicholas Ayache, Simon K Warfield et al. Robust nonrigid registration to capture brain shift from intraoperative MRI. IEEE Transactions on Medical Imaging, vol. 24, no. 11, pages 1417–1427, 2005. (Cited on page 8.)

[Dalal 2005] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005. (Cited on page 8.)


[Davis 1997] G. Davis, S. Mallat and M. Avellaneda. Adaptive greedy approximations. Constructive Approximation, vol. 13, no. 1, pages 57–98, 1997. (Cited on page 14.)

[De Craene 2010] Mathieu De Craene, Gemma Piella, Nicolas Duchateau, Etel Silva, Adelina Doltra, Hang Gao, Jan D'hooge, Oscar Camara, Josep Brugada, Marta Sitges et al. Temporal diffeomorphic free-form deformation for strain quantification in 3D-US images. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2010, pages 1–8. Springer, 2010. (Cited on page 8.)

[De Craene 2012] Mathieu De Craene, Gemma Piella, Oscar Camara, Nicolas Duchateau, Etelvino Silva, Adelina Doltra, Jan D'hooge, Josep Brugada, Marta Sitges and Alejandro F Frangi. Temporal diffeomorphic free-form deformation: Application to motion and strain estimation from 3D echocardiography. Medical Image Analysis, vol. 16, no. 2, pages 427–450, 2012. (Cited on pages 8 and 55.)

[De Craene 2013] Mathieu De Craene, Stephanie Marchesseau, Brecht Heyde, Hang Gao, M Alessandrini, Olivier Bernard, Gemma Piella, AR Porras, E Saloux, L Tautz et al. 3D Strain Assessment in Ultrasound (STRAUS): A synthetic comparison of five tracking methodologies. IEEE Transactions on Medical Imaging, 2013. (Cited on pages 9, 44 and 46.)

[Delingette 2012] H. Delingette, F. Billet, K.C.L. Wong, M. Sermesant, K. Rhode, M. Ginks, CA Rinaldi, R. Razavi, N. Ayache et al. Personalization of Cardiac Motion and Contractility from Images using Variational Data Assimilation. IEEE Trans. Biomed. Eng., vol. 59, no. 1, page 20, 2012. (Cited on pages 5 and 11.)

[Dupuis 1998] Paul Dupuis, Ulf Grenander and Michael I Miller. Variational problems on flows of diffeomorphisms for image matching. Quarterly of Applied Mathematics, vol. 56, no. 3, page 587, 1998. (Cited on page 8.)

[Durrleman 2007] S. Durrleman, X. Pennec, A. Trouvé and N. Ayache. Measuring brain variability via sulcal lines registration: a diffeomorphic approach. In Ayache, N., Ourselin, S., Maeder, A. (eds.) MICCAI 2007, Part I. LNCS, vol. 4791, pages 675–682. Springer, Heidelberg, 2007. (Cited on pages 8, 12 and 79.)

[Durrleman 2008] S. Durrleman, X. Pennec, A. Trouvé and N. Ayache. Sparse approximation of currents for statistics on curves and surfaces. In Metaxas, D., Axel, L., Szekely, G., and Fichtinger, G. (eds.) Proceedings MICCAI, Part II, LNCS, vol. 5242, pages 390–398. Springer, 2008. (Cited on pages 14 and 16.)

[Durrleman 2009] Stanley Durrleman, Xavier Pennec, Alain Trouvé and Nicholas Ayache. Statistical models of sets of curves and surfaces based on currents. Medical Image Analysis, vol. 13, no. 5, pages 793–808, 2009. (Cited on page 15.)

[Durrleman 2010] S. Durrleman. Statistical models of currents for measuring the variability of anatomical curves, surfaces and their evolution. PhD thesis, INRIA, March 2010. (Cited on pages 12 and 15.)


[Durrleman 2013] Stanley Durrleman, Stéphanie Allassonnière and Sarang Joshi. Sparse adaptive parameterization of variability in image ensembles. International Journal of Computer Vision, vol. 101, no. 1, pages 161–183, 2013. (Cited on page 56.)

[Ecabert 2011] Olivier Ecabert, Jochen Peters, Matthew J Walker, Thomas Ivanc, Cristian Lorenz, Jens von Berg, Jonathan Lessick, Mani Vembar and Jürgen Weese. Segmentation of the heart and great vessels in CT images using a model-based adaptation framework. Medical Image Analysis, vol. 15, no. 6, pages 863–876, 2011. (Cited on page 5.)

[Fanello 2014] Sean Ryan Fanello, Cem Keskin, Pushmeet Kohli, Shahram Izadi, Jamie Shotton, Antonio Criminisi, Ugo Pattacini and Tim Paek. Filter Forests for Learning Data-Dependent Convolutional Kernels. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1709–1716. IEEE, 2014. (Cited on page 33.)

[Ganz 2013] Melanie Ganz, Mert R Sabuncu and Koen Van Leemput. An improved optimization method for the relevance voxel machine. In Machine Learning in Medical Imaging, pages 147–154. Springer, 2013. (Cited on page 80.)

[Gärtner 2002] T. Gärtner, P.A. Flach, A. Kowalczyk and A.J. Smola. Multi-instance kernels. In Proceedings of the 19th International Conference on Machine Learning, pages 179–186, 2002. (Cited on page 13.)

[Gee 1998] James C Gee and Ruzena K Bajcsy. Elastic matching: Continuum mechanical and probabilistic analysis. Brain Warping, vol. 2, 1998. (Cited on pages 8, 24, 31 and 57.)

[Gelman 1998] Andrew Gelman and Xiao-Li Meng. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, pages 163–185, 1998. (Cited on page 65.)

[Geman 1984] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pages 721–741, 1984. (Cited on page 64.)

[Girolami 2011] Mark Girolami and Ben Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 2, pages 123–214, 2011. (Cited on page 66.)

[Glocker 2007] Ben Glocker, Nikos Komodakis, Nikos Paragios, Georgios Tziritas and Nassir Navab. Inter and intra-modal deformable registration: Continuous deformations meet efficient optimal linear programming. In Information Processing in Medical Imaging, pages 408–420. Springer, 2007. (Cited on page 8.)

[Glocker 2008a] Ben Glocker, Nikos Komodakis, Georgios Tziritas, Nassir Navab and Nikos Paragios. Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, vol. 12, no. 6, pages 731–741, 2008. (Cited on page 8.)


[Glocker 2008b] Ben Glocker, Nikos Paragios, Nikos Komodakis, Georgios Tziritas and Nassir Navab. Optical flow estimation with uncertainties through dynamic MRFs. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008. (Cited on page 8.)

[Glocker 2011] Ben Glocker, Aristeidis Sotiras, Nikos Komodakis and Nikos Paragios. Deformable medical image registration: Setting the state of the art with discrete methods. Annual Review of Biomedical Engineering, vol. 13, pages 219–244, 2011. (Cited on page 8.)

[Gori 2013] Pietro Gori, Olivier Colliot, Yulia Worbe, Linda Marrakchi-Kacem, Sophie Lecomte, Cyril Poupon, Andreas Hartmann, Nicholas Ayache and Stanley Durrleman. Bayesian Atlas Estimation for the Variability Analysis of Shape Complexes. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, pages 267–274. Springer, 2013. (Cited on pages 31 and 56.)

[Goshtasby 2012] A Ardeshir Goshtasby. Image registration methods. In Image Registration, pages 415–434. Springer, 2012. (Cited on page 7.)

[Green 1995] Peter J Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, vol. 82, no. 4, pages 711–732, 1995. (Cited on pages 59 and 65.)

[Groves 2011] Adrian R Groves, Christian F Beckmann, Steve M Smith and Mark W Woolrich. Linked independent component analysis for multimodal data fusion. NeuroImage, vol. 54, no. 3, pages 2198–2217, 2011. (Cited on page 30.)

[Haber 2006] Eldad Haber and Jan Modersitzki. A multilevel method for image registration. SIAM Journal on Scientific Computing, vol. 27, no. 5, pages 1594–1607, 2006. (Cited on page 8.)

[Hachama 2012] Mohamed Hachama, Agnès Desolneux and Frédéric JP Richard. Bayesian technique for image classifying registration. IEEE Transactions on Image Processing, vol. 21, no. 9, pages 4080–4091, 2012. (Cited on pages 7 and 27.)

[Haussler 1999] D. Haussler. Convolution kernels on discrete structures. Technical report, UC Santa Cruz, 1999. (Cited on page 13.)

[Heiberg 2005] Einar Heiberg, L Wigstrom, Marcus Carlsson, AF Bolger and M Karlsson. Time resolved three-dimensional automated segmentation of the left ventricle. In Computers in Cardiology, 2005, pages 599–602. IEEE, 2005. (Cited on page 51.)

[Heyde 2013] Brecht Heyde, Daniel Barbosa, Piet Claus, Frederik Maes and Jan D'hooge. Influence of the grid topology of free-form deformation models on the performance of 3D strain estimation in echocardiography. In Functional Imaging and Modeling of the Heart, pages 308–315. Springer, 2013. (Cited on page 8.)

[Hoerl 1970] A.E. Hoerl and R.W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, pages 55–67, 1970. (Cited on page 17.)


[Hoyos-Idrobo 2015] Andrés Hoyos-Idrobo, Yannick Schwartz, Gaël Varoquaux and Bertrand Thirion. Improving sparse recovery on structured images with bagged clustering. In International Workshop on Pattern Recognition in Neuroimaging (PRNI) 2015, Palo Alto, United States, June 2015. (Cited on page 2.)

[Imperiale 2011] A. Imperiale, R. Chabiniok, P. Moireau and D. Chapelle. Constitutive Parameter Estimation Using Tagged-MRI Data. In Metaxas, D., Axel, L. (eds.) Proceedings of FIMH'11, LNCS 6666, pages 409–417. Springer, 2011. (Cited on pages 6, 11 and 78.)

[Janoos 2012] Firdaus Janoos, Petter Risholm and William Wells III. Bayesian characterization of uncertainty in multi-modal image registration. Biomedical Image Registration, pages 50–59, 2012. (Cited on pages 7, 9 and 27.)

[Jenatton 2012] Rodolphe Jenatton, Alexandre Gramfort, Vincent Michel, Guillaume Obozinski, Evelyn Eger, Francis Bach and Bertrand Thirion. Multi-scale Mining of fMRI data with Hierarchical Structured Sparsity. SIAM Journal on Imaging Sciences, vol. 5, no. 3, pages 835–856, July 2012. (Cited on page 34.)

[Kaipio 2011] Jari P Kaipio and Colin Fox. The Bayesian framework for inverse problems in heat transfer. Heat Transfer Engineering, vol. 32, no. 9, pages 718–753, 2011. (Cited on page 78.)

[Kennedy 2001] Marc C Kennedy and Anthony O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society. Series B, Statistical Methodology, pages 425–464, 2001. (Cited on page 78.)

[Klein 2007] Stefan Klein, Marius Staring and Josien PW Pluim. Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines. IEEE Transactions on Image Processing, vol. 16, no. 12, pages 2879–2890, 2007. (Cited on page 8.)

[Kohli 2008] Pushmeet Kohli and Philip HS Torr. Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, vol. 112, no. 1, pages 30–38, 2008. (Cited on page 8.)

[Konukoglu 2011] Ender Konukoglu, Jatin Relan, Ulas Cilingir, Bjoern H Menze, Phani Chinchapatnam, Amir Jadidi, Hubert Cochet, Mélèze Hocini, Hervé Delingette, Pierre Jaïs et al. Efficient probabilistic model personalization integrating uncertainty on data and parameters: Application to eikonal-diffusion models in cardiac electrophysiology. Progress in Biophysics and Molecular Biology, vol. 107, no. 1, pages 134–146, 2011. (Cited on page 78.)

[Kybic 2010] Jan Kybic. Bootstrap resampling for image registration uncertainty estimation without ground truth. IEEE Transactions on Image Processing, vol. 19, no. 1, pages 64–73, 2010. (Cited on page 9.)


[Larrabide 2009] Ignacio Larrabide, Pedro Omedas, Yves Martelli, Xavier Planes, Maarten Nieber, Juan A Moya, Constantine Butakoff, Rafael Sebastián, Oscar Camara, Mathieu De Craene et al. GIMIAS: an open source framework for efficient development of research tools and clinical prototypes. In Functional Imaging and Modeling of the Heart, pages 417–426. Springer, 2009. (Cited on page 5.)

[Le Folgoc 2012] Loic Le Folgoc, Hervé Delingette, Antonio Criminisi and Nicholas Ayache. Current-based 4D shape analysis for the mechanical personalization of heart models. In Bjoern Menze, Georg Langs, Albert Montillo, Zhuowen Tu and Antonio Criminisi, editors, MCV – MICCAI Workshop on Medical Computer Vision 2012, volume 7766, pages 283–292, Nice, France, October 2012. Springer Berlin Heidelberg. (Cited on pages 10 and 77.)

[Le Folgoc 2014] Loic Le Folgoc, Hervé Delingette, Antonio Criminisi and Nicholas Ayache. Sparse Bayesian Registration. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pages 235–242. Springer, 2014. (Cited on pages 10, 25, 29, 36 and 80.)

[Le Folgoc 2015a] Loïc Le Folgoc, Hervé Delingette, Antonio Criminisi and Nicholas Ayache. Quantifying registration uncertainty with sparse Bayesian modelling: is sparsity a bane? To be submitted to IEEE Transactions on Medical Imaging, 2015. (Cited on pages 10 and 80.)

[Le Folgoc 2015b] Loïc Le Folgoc, Hervé Delingette, Antonio Criminisi and Nicholas Ayache. Sparse Bayesian Registration of Medical Images for Self-Tuning of Parameters and Spatially Adaptive Parametrization of Displacements. Submitted to Medical Image Analysis, 2015. (Cited on pages 10, 59, 61, 62, 68, 69, 70 and 80.)

[Liu 2009a] Fei Liu, MJ Bayarri, JO Berger et al. Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, vol. 4, no. 1, pages 119–150, 2009. (Cited on page 78.)

[Liu 2009b] H. Liu and P. Shi. Maximum a posteriori strategy for the simultaneous motion and material property estimation of the heart. IEEE Trans. Biomed. Eng., vol. 56, no. 2, pages 378–389, 2009. (Cited on page 11.)

[Loeckx 2004] Dirk Loeckx, Frederik Maes, Dirk Vandermeulen and Paul Suetens. Nonrigid image registration using free-form deformations with a local rigidity constraint. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2004, pages 639–646. Springer, 2004. (Cited on page 8.)

[Lombaert 2014] Herve Lombaert, Darko Zikic, Antonio Criminisi and Nicholas Ayache. Laplacian Forests: Semantic Image Segmentation by Guided Bagging. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pages 496–504. Springer, 2014. (Cited on page 2.)


[Lombaert 2015] Herve Lombaert, Antonio Criminisi and Nicholas Ayache. Spectral Forests: Learning of Surface Data, Application to Cortical Parcellation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 2015. (Cited on page 2.)

[Lorenzi 2013] Marco Lorenzi, Nicholas Ayache, Giovanni B Frisoni, Xavier Pennec, Alzheimer's Disease Neuroimaging Initiative (ADNI) et al. LCC-Demons: a robust and accurate symmetric diffeomorphic registration algorithm. NeuroImage, vol. 81, pages 470–483, 2013. (Cited on page 7.)

[Lowe 2004] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, vol. 60, no. 2, pages 91–110, 2004. (Cited on page 8.)

[Ma 2009] Xiang Ma and Nicholas Zabaras. An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method. Inverse Problems, vol. 25, no. 3, page 035013, 2009. (Cited on page 78.)

[MacKay 1992] David JC MacKay. Bayesian interpolation. Neural Computation, vol. 4, no. 3, pages 415–447, 1992. (Cited on page 34.)

[Mäkelä 2002] Timo Mäkelä, Patrick Clarysse, Outi Sipilä, Nicoleta Pauna, Quoc Cuong Pham, Toivo Katila and Isabelle E Magnin. A review of cardiac image registration methods. IEEE Transactions on Medical Imaging, vol. 21, no. 9, pages 1011–1021, 2002. (Cited on page 8.)

[Mansi 2011] T. Mansi, X. Pennec, M. Sermesant, H. Delingette and N. Ayache. iLogDemons: A Demons-Based registration algorithm for tracking incompressible elastic biological tissues. International Journal of Computer Vision, vol. 92, no. 1, pages 92–111, 2011. (Cited on page 12.)

[Marchesseau 2012] S. Marchesseau, H. Delingette, M. Sermesant, K. Rhode, S.G. Duckett, C.A. Rinaldi, R. Razavi and N. Ayache. Cardiac Mechanical Parameter Calibration based on the Unscented Transform. In Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI, LNCS, volume 7511. Springer, 2012. (Cited on pages 6, 10, 12 and 18.)

[Marchesseau 2013a] Stéphanie Marchesseau. Simulation of patient-specific cardiac models for therapy planning. PhD thesis, Ecole Nationale Supérieure des Mines de Paris, January 2013. (Cited on page 4.)

[Marchesseau 2013b] Stéphanie Marchesseau, Hervé Delingette, Maxime Sermesant, Rocio Cabrera Lozoya, Catalina Tobon-Gomez, Philippe Moireau, Rosa Maria Figueras I Ventura, Karim Lekadir, Alfredo Hernandez, Mireille Garreau, Erwan Donal, Christophe Leclercq, Simon G. Duckett, Kawal Rhode, Christopher Aldo Rinaldi, Alejandro F. Frangi, Reza Razavi, Dominique Chapelle and Nicholas Ayache. Personalization of a Cardiac Electromechanical Model using Reduced Order Unscented Kalman Filtering from Regional Volumes. Medical Image Analysis, vol. 17, no. 7, pages 816–829, May 2013. (Cited on pages 6, 10 and 78.)

[Margeta 2011] Ján Margeta, Ezequiel Geremia, Antonio Criminisi and Nicholas Ayache. Layered Spatio-Temporal Forests for Left Ventricle Segmentation from 4D Cardiac MRI Data. In MICCAI Workshop: Statistical Atlases and Computational Models of the Heart (STACOM), 2011. (Cited on page 2.)

[Margeta 2015] Jan Margeta, Antonio Criminisi, R Cabrera Lozoya, Daniel C Lee and Nicholas Ayache. Fine-tuned convolutional neural nets for cardiac MRI acquisition plane recognition. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, no. ahead-of-print, pages 1–11, 2015. (Cited on page 2.)

[McLeod 2013a] Kristin McLeod, Christof Seiler, Maxime Sermesant and Xavier Pennec. A near-incompressible poly-affine motion model for cardiac function analysis. In Statistical Atlases and Computational Models of the Heart. Imaging and Modelling Challenges, pages 288–297. Springer, 2013. (Cited on page 8.)

[Mcleod 2013b] Kristin Mcleod, Christof Seiler, Nicolas Toussaint, Maxime Sermesant and Xavier Pennec. Regional analysis of left ventricle function using a cardiac-specific polyaffine motion model. In Functional Imaging and Modeling of the Heart, pages 483–490. Springer, 2013. (Cited on page 8.)

[Miller 2002] Michael I Miller, Alain Trouvé and Laurent Younes. On the metrics and Euler-Lagrange equations of computational anatomy. Annual Review of Biomedical Engineering, vol. 4, no. 1, pages 375–405, 2002. (Cited on page 79.)

[Mitchell 1988] Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, vol. 83, no. 404, pages 1023–1032, 1988. (Cited on pages 34, 70 and 80.)

[Mohamed 2012] Shakir Mohamed, Katherine A Heller and Zoubin Ghahramani. Bayesian and L1 Approaches for Sparse Unsupervised Learning. Proceedings of the 29th International Conference on Machine Learning, 2012. (Cited on page 34.)

[Moireau 2011] P. Moireau and D. Chapelle. Reduced-order Unscented Kalman Filtering with application to parameter identification in large-dimensional systems. ESAIM: Control, Optimisation and Calculus of Variations, vol. 17, no. 02, pages 380–405, 2011. (Cited on pages 6 and 78.)

[Murphy 2012] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012. (Cited on page 66.)


[Neumann 2014] Dominik Neumann, Tommaso Mansi, Bogdan Georgescu, Ali Kamen, Elham Kayvanpour, Ali Amr, Farbod Sedaghat-Hamedani, Jan Haas, Hugo Katus, Benjamin Meder et al. Robust image-based estimation of cardiac tissue parameters and their uncertainty from noisy data. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pages 9–16. Springer International Publishing, 2014. (Cited on pages 6 and 78.)

[Osman 1999] Nael F Osman, William S Kerwin, Elliot R McVeigh and Jerry L Prince. Cardiac motion tracking using CINE harmonic phase (HARP) magnetic resonance imaging. Magnetic Resonance in Medicine, vol. 42, no. 6, page 1048, 1999. (Cited on page 8.)

[Ourselin 2000] S. Ourselin, A. Roche, S. Prima and N. Ayache. Block Matching: A General Framework to Improve Robustness of Rigid Registration of Medical Images. In Scott L. Delp, Anthony M. DiGoia and Branislav Jaramaz, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2000, volume 1935 of Lecture Notes in Computer Science, pages 557–566. Springer Berlin Heidelberg, 2000. (Cited on page 8.)

[Parisot 2014] Sarah Parisot, William Wells, Stéphane Chemouny, Hugues Duffau and Nikos Paragios. Concurrent tumor segmentation and registration with uncertainty-based sparse non-uniform graphs. Medical Image Analysis, vol. 18, no. 4, pages 647–659, 2014. (Cited on page 8.)

[Pitiot 2003] Alain Pitiot, Grégoire Malandain, Eric Bardinet and Paul M Thompson. Piecewise affine registration of biological images. In Biomedical Image Registration, pages 91–101. Springer, 2003. (Cited on page 8.)

[Prakosa 2013a] Adityo Prakosa. Analysis and simulation of multimodal cardiac images to study the heart function. PhD thesis, Université Nice Sophia Antipolis, January 2013. (Cited on pages 3 and 7.)

[Prakosa 2013b] Adityo Prakosa, Maxime Sermesant, Hervé Delingette, Stéphanie Marchesseau, Eric Saloux, Pascal Allain, Nicolas Villain and Nicholas Ayache. Generation of synthetic but visually realistic time series of cardiac images combining a biophysical model and clinical images. IEEE Transactions on Medical Imaging, vol. 32, no. 1, pages 99–109, 2013. (Cited on page 7.)

[Quiñonero-Candela 2005] Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. The Journal of Machine Learning Research, vol. 6, pages 1939–1959, 2005. (Cited on pages 59, 74 and 80.)

[Rasmussen 2005] Carl Edward Rasmussen and Joaquin Quinonero-Candela. Healing the relevance vector machine through augmentation. In Proceedings of the 22nd International Conference on Machine Learning, pages 689–696. ACM, 2005. (Cited on pages 59, 74 and 80.)


[Richard 2009] Frédéric JP Richard, Adeline MM Samson and Charles A Cuénod. A SAEM algorithm for the estimation of template and deformation parameters in medical image sequences. Statistics and Computing, vol. 19, no. 4, 2009. (Cited on pages 7, 24, 27, 59 and 72.)

[Risholm 2013] Petter Risholm, Firdaus Janoos, Isaiah Norton, Alex J Golby and William M Wells III. Bayesian characterization of uncertainty in intra-subject non-rigid registration. Medical Image Analysis, vol. 17, no. 5, pages 538–555, 2013. (Cited on pages 9, 24, 25, 58 and 64.)

[Roberts 1996] Gareth O Roberts and Richard L Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, pages 341–363, 1996. (Cited on page 66.)

[Roberts 2006] Gareth O Roberts and Jeffrey S Rosenthal. Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains. The Annals of Applied Probability, pages 2123–2139, 2006. (Cited on page 68.)

[Rohde 2003] Gustavo K Rohde, Akram Aldroubi and Benoit M Dawant. The adaptive bases algorithm for intensity-based nonrigid image registration. IEEE Transactions on Medical Imaging, vol. 22, no. 11, pages 1470–1479, 2003. (Cited on page 25.)

[Rohlfing 2003] Torsten Rohlfing, Calvin R Maurer Jr, David Bluemke, Michael Jacobs et al. Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Transactions on Medical Imaging, vol. 22, no. 6, pages 730–741, 2003. (Cited on page 8.)

[Rohr 2003] K Rohr, M Fornefett and H.S Stiehl. Spline-based elastic image registration: integration of landmark errors and orientation attributes. Computer Vision and Image Understanding, vol. 90, no. 2, pages 153–168, 2003. (Cited on page 37.)

[Rublee 2011] Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary Bradski. ORB: an efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564–2571. IEEE, 2011. (Cited on page 8.)

[Rueckert 1999] Daniel Rueckert, Luke I Sonoda, Carmel Hayes, Derek LG Hill, Martin O Leach and David J Hawkes. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging, vol. 18, no. 8, pages 712–721, 1999. (Cited on pages 31 and 61.)

[Rueckert 2006] Daniel Rueckert, Paul Aljabar, Rolf A Heckemann, Joseph V Hajnal and Alexander Hammers. Diffeomorphic registration using B-splines. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006, pages 702–709. Springer, 2006. (Cited on page 8.)

[Sabuncu 2011] Mert R Sabuncu and Koen Van Leemput. The Relevance Voxel Machine (RVoxM): a Bayesian method for image-based prediction. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011, pages 99–106. Springer, 2011. (Cited on page 26.)

[Sabuncu 2012] Mert R Sabuncu and Koen Van Leemput. The relevance voxel machine (RVoxM): A self-tuning Bayesian model for informative image-based prediction. IEEE Transactions on Medical Imaging, vol. 31, no. 12, pages 2290–2306, 2012. (Cited on page 80.)

[Schniter 2008] Philip Schniter, Lee C Potter and Justin Ziniel. Fast Bayesian matching pursuit. In Information Theory and Applications Workshop, 2008, pages 326–333. IEEE, 2008. (Cited on page 75.)

[Schölkopf 2002] B. Schölkopf and A.J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. The MIT Press, 2002. (Cited on page 16.)

[Shi 2012] Wenzhe Shi, Xiahai Zhuang, Luis Pizarro, Wenjia Bai, Haiyan Wang, Kai-Pin Tung, Philip Edwards and Daniel Rueckert. Registration Using Sparse Free-Form Deformations. MICCAI, pages 659–666, 2012. (Cited on page 8.)

[Shi 2013] Wenzhe Shi, Martin Jantsch, Paul Aljabar, Luis Pizarro, Wenjia Bai, Haiyan Wang, Declan O'Regan, Xiahai Zhuang and Daniel Rueckert. Temporal sparse free-form deformations. Medical Image Analysis, vol. 17, no. 7, pages 779–789, 2013. (Cited on pages 8 and 33.)

[Simpson 2012] Ivor JA Simpson, Julia A Schnabel, Adrian R Groves, Jesper LR Andersson and Mark W Woolrich. Probabilistic inference of regularisation in non-rigid registration. NeuroImage, vol. 59, no. 3, pages 2438–2451, 2012. (Cited on pages 9, 24, 25, 29, 30, 34, 58, 61 and 79.)

[Simpson 2013] Ivor JA Simpson, Mark W Woolrich, Manuel Jorge Cardoso, David M Cash, Marc Modat, Julia A Schnabel and Sebastien Ourselin. A Bayesian Approach for Spatially Adaptive Regularisation in Non-rigid Registration. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, pages 10–18, 2013. (Cited on pages 7, 9, 27, 29 and 58.)

[Simpson 2015] IJA Simpson, MJ Cardoso, M Modat, DM Cash, MW Woolrich, JLR Andersson, JA Schnabel, S Ourselin, Alzheimer's Disease Neuroimaging Initiative et al. Probabilistic Non-Linear Registration with Spatially Adaptive Regularisation. Medical Image Analysis, 2015. (Cited on page 25.)

[Singh 2015] Nikhil Singh, François-Xavier Vialard and Marc Niethammer. Splines for diffeomorphisms. Medical Image Analysis, 2015. (Cited on page 8.)

[Sommer 2013] Stefan Sommer, Mads Nielsen, Sune Darkner and Xavier Pennec. Higher-order momentum distributions and locally affine LDDMM registration. SIAM Journal on Imaging Sciences, vol. 6, no. 1, pages 341–367, 2013. (Cited on pages 8 and 79.)


[Sotiras 2013] Aristeidis Sotiras, Christos Davatzikos and Nikos Paragios. Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging, vol. 32, no. 7, pages 1153–1190, 2013. (Cited on pages 7 and 31.)

[Stefanescu 2004] Radu Stefanescu, Xavier Pennec and Nicholas Ayache. Grid powered nonlinear image registration with locally adaptive regularization. Medical Image Analysis, vol. 8, no. 3, pages 325–342, 2004. (Cited on page 25.)

[Stewart 2003] Charles V Stewart, Chia-Ling Tsai and Badrinath Roysam. The dual-bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Transactions on Medical Imaging, vol. 22, no. 11, pages 1379–1394, 2003. (Cited on page 25.)

[Stuart 2010] Andrew M Stuart. Inverse problems: a Bayesian perspective. Acta Numerica, vol. 19, pages 451–559, 2010. (Cited on page 78.)

[Sundar 2009] Hari Sundar, Christos Davatzikos and George Biros. Biomechanically-constrained 4D estimation of myocardial motion. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2009, pages 257–265. Springer, 2009. (Cited on page 5.)

[Taron 2009] Maxime Taron, Nikos Paragios and M-P Jolly. Registration with uncertainties and statistical modeling of shapes with variable metric kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pages 99–113, 2009. (Cited on page 9.)

[Thirion 1998] J-P Thirion. Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis, vol. 2, no. 3, pages 243–260, 1998. (Cited on pages 8 and 37.)

[Tibshirani 1996] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996. (Cited on page 54.)

[Tipping 2001] Michael E Tipping. Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research, vol. 1, pages 211–244, 2001. (Cited on pages 25, 26, 34, 36, 59, 70 and 80.)

[Tipping 2003] Michael E Tipping, Anita C Faul et al. Fast marginal likelihood maximisation for sparse Bayesian models. In Workshop on Artificial Intelligence and Statistics, volume 1, January 2003. (Cited on pages 25, 26, 36, 54, 59, 70, 80, 84, 85 and 86.)

[Tipping 2005] Michael E Tipping and Neil D Lawrence. Variational inference for Student-t models: Robust Bayesian interpolation and generalised component analysis. Neurocomputing, vol. 69, no. 1, pages 123–141, 2005. (Cited on page 27.)

[Tobon-Gomez 2013] Catalina Tobon-Gomez, Mathieu De Craene, Kristin McLeod, Lennart Tautz, Wenzhe Shi, Anja Hennemuth, Adityo Prakosa, H Wang, Gerry Carr-White, Stam Kapetanakis et al. Benchmarking framework for myocardial tracking and deformation algorithms: An open access database. Medical Image Analysis, 2013. (Cited on pages 9, 49 and 51.)

[Vaillant 2005] M. Vaillant and J. Glaunes. Surface matching via currents. In Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pages 381–392. Springer, Heidelberg, 2005. (Cited on page 12.)

[Wachinger 2014] Christian Wachinger, Polina Golland, Martin Reuter and William Wells. Gaussian Process Interpolation for Uncertainty Estimation in Image Registration. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pages 267–274. Springer, 2014. (Cited on page 30.)

[Wells III 1996] William M Wells III, Paul Viola, Hideki Atsumi, Shin Nakajima and Ron Kikinis. Multi-modal volume registration by maximization of mutual information. Medical Image Analysis, vol. 1, no. 1, pages 35–51, 1996. (Cited on pages 7 and 27.)

[Wipf 2008] David P Wipf and Srikantan S Nagarajan. A new view of automatic relevance determination. In Advances in Neural Information Processing Systems, pages 1625–1632, 2008. (Cited on page 34.)

[Xi 2011] J. Xi, P. Lamata, J. Lee, P. Moireau, D. Chapelle and N. Smith. Myocardial transversely isotropic material parameter estimation from in-silico measurements based on a reduced-order unscented Kalman filter. Journal of the Mechanical Behavior of Biomedical Materials, 2011. (Cited on pages 6 and 78.)

[Xiang, Y. 2012] Y. Xiang, S. Gubian, B. Suomela and J. Hoeng. Generalized Simulated Annealing for Efficient Global Optimization: the GenSA Package for R. The R Journal, 2012. Forthcoming. (Cited on page 17.)

[Zhang 2011] Yichuan Zhang and Charles A Sutton. Quasi-Newton methods for Markov chain Monte Carlo. In Advances in Neural Information Processing Systems, pages 2393–2401, 2011. (Cited on pages 66 and 76.)

[Zhang 2013] Miaomiao Zhang, Nikhil Singh and P Thomas Fletcher. Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In Information Processing in Medical Imaging, pages 37–48. Springer, 2013. (Cited on pages 59 and 79.)

[Zhou 2008] Ding-Xuan Zhou. Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics, vol. 220, no. 1, pages 456–463, 2008. (Cited on page 82.)

[Zhou 2015] Yitian Zhou, Olivier Bernard, Eric Saloux, Alain Manrique, Pascal Allain, Sherif Makram-Ebeid and Mathieu De Craene. 3D harmonic phase tracking with anatomical regularization. Medical Image Analysis, vol. 26, no. 1, pages 70–81, 2015. (Cited on page 9.)


[Zou 2005] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pages 301–320, 2005. (Cited on page 54.)