

  • Medical Image Analysis xxx (2009) xxx–xxx

    ARTICLE IN PRESS

    Contents lists available at ScienceDirect

    Medical Image Analysis

    journal homepage: www.elsevier.com/locate/media

    3D nonrigid registration via optimal mass transport on the GPU

    Tauseef ur Rehman a,*, Eldad Haber b, Gallagher Pryor a, John Melonakos a, Allen Tannenbaum a

    a Georgia Institute of Technology, School of ECE, 313 Ferst Drive, Atlanta, GA 30332, USA
    b Emory University, Atlanta, GA 30332, USA

    Article info

    Article history:
    Received 10 April 2008
    Received in revised form 24 July 2008
    Accepted 3 October 2008
    Available online xxxx

    Keywords: Non-rigid registration; Optimal mass transport; Monge–Kantorovich; Multigrid; Variational methods; GPU

    1361-8415/$ - see front matter. Published by Elsevier B.V.
    doi:10.1016/j.media.2008.10.008

    * Corresponding author. Tel.: +1 7705002101.

    Please cite this article in press as: Rehman, T.u. et al., 3D nonrigid registration via optimal mass transport on the GPU, Med. Image Anal. (2009), doi:10.1016/j.media.2008.10.008

    Abstract

    In this paper, we present a new computationally efficient numerical scheme for the minimizing flow approach for optimal mass transport (OMT) with applications to non-rigid 3D image registration. The approach utilizes all of the gray-scale data in both images, and the optimal mapping from image A to image B is the inverse of the optimal mapping from B to A. Further, no landmarks need to be specified, and the minimizer of the distance functional involved is unique. Our implementation also employs multigrid and parallel methodologies on a consumer graphics processing unit (GPU) for fast computation. Although computing the optimal map has been shown to be computationally expensive in the past, we show that our approach is orders of magnitude faster than previous work and is capable of finding transport maps with optimality measures (mean curl) previously unattainable by other works (which directly influences the accuracy of registration). We give results where the algorithm was used to compute non-rigid registrations of 3D synthetic data as well as intra-patient pre-operative and post-operative 3D brain MRI datasets.

    Published by Elsevier B.V.

    1. Introduction

    Image registration is amongst the most common image processing tasks in medical image analysis. Registration is the process of establishing a common geometric reference frame between two or more image data sets and is necessary in order to compare or integrate image data obtained at different times or using different imaging modalities. A vast amount of literature exists on image registration techniques and we refer the reader to Maintz and Viergever (1998), Brown (1992), Goshtasby (2005), Hajnal and Hawkes (2001) for an overview of this field.

    Broadly speaking, image registration techniques can be classified as either "rigid" or "non-rigid". Rigid registration is usually performed when the images are assumed to be of objects that simply need to be rotated and translated with respect to one another to achieve correspondence. Non-rigid registration, on the other hand, is used when, through biological differences, image acquisition, or both, correspondence between structures in two images cannot be achieved without some localized stretching of the images (Crum et al., 2004). In contrast to rigid registration techniques, non-rigid registration techniques are still the subject of significant ongoing research activity. In this paper, we approach the task of non-rigid registration by treating it as an optimal mass transport problem. As with other registration techniques, the computational burden associated with this problem is high. We propose a multi-resolution approach for the solution of this problem on the GPU to alleviate this difficulty.

    The optimal mass transport problem was first formulated by the French mathematician Gaspard Monge in 1781, and was given a modern formulation in the work of Kantorovich and, therefore, is now known as the Monge–Kantorovich problem (Kantorovich, 1948). The original problem concerned finding the optimal way to move a pile of soil from one site to another in the sense of minimal transportation cost. Hence, the Kantorovich–Wasserstein distance is also commonly referred to as the earth mover's distance (EMD). More recently, optimal mass transport has found applications in medical image registration problems (Haker et al., 2004, 2001). Although there have been a number of algorithms in the literature for computing an optimal mass transport, the method proposed in Haker et al. (2004, 2001) computes the optimal warp from a first order partial differential equation, which is a computational improvement over earlier proposed higher order methods and computationally complex discrete methods based on linear programming. However, at large grid sizes and especially for 3D registration the computational cost of even this method is significant. Rigorous mathematical details for their algorithm can be found in Angenent et al. (2003).

    Though computationally expensive, the OMT method has a number of distinguishing characteristics: (1) it is a parameter-free method and no landmarks need be specified, (2) it is symmetrical (the mapping from image A to image B is the inverse of the mapping from B to A), (3) its solution is unique (no local minima), (4) it can register images where brightness constancy is an invalid assumption, and (5) OMT is specifically designed to take into account changes in densities that result from changes in area or volume.

    In the present paper, we extend our previous work (Rehman and Tannenbaum, 2007) and implement the more general formulation of the OMT problem for 3D non-rigid registration based on multi-resolution techniques and using the parallel architecture of the GPU. Although multi-resolution methods have served as critical pieces of registration algorithms in the past, it had yet to be shown that the optimal mass transport problem could be solved in the same manner. Our experimental results show that this is indeed the case, a result which has implications for many fields beyond imaging due to the ubiquitous nature of the OMT problem. We also show that the PDE-based solution to the OMT problem is greatly enhanced by our approach to such an extent that it becomes practical for use on large 3D datasets both in terms of speed and accuracy. Overall, these results show that OMT-based image registration is practical on medical imagery and, thus, merits further investigation as an elastic registration technique without the need of smoothness priors or brightness constancy assumptions.

    The rest of the paper is organized as follows. In Section 2 we review the mathematical formulation of the problem and show how to obtain a descent direction. In Section 3 we discuss the discretization and the solution of the discrete problem using multi-resolution, multigrid methods implemented on the GPU. In Section 4 we present the results of applying our algorithm to synthetic as well as MRI brain datasets. Finally, in Section 5 we summarize our work.

    2. Optimal mass transport for registration

    2.1. Formulation of the problem

    We model the registration of images as an optimal mass transport problem. Accordingly, the solution to the problem is an optimal mapping $\hat{u}$ (in some sense) between two densities $\mu_0 > 0$ and $\mu_1 > 0$ (Kantorovich, 1948). If we now define $d$ as the dimension of the image domain, $\det(\cdot)$ as the determinant, $u$ as a mapping from $\Omega \to \Omega$ with $\Omega$ a subdomain of $\mathbb{R}^d$, and represent by $\rho(\cdot,\cdot) : \Omega \times \Omega \to \mathbb{R}^+$ a function of distance between two points in $\Omega$, then the problem can be formalized as

    $$\hat{u} = \min_{u \in U} \frac{1}{2} \int_{\Omega^d} \mu_0(x)\, \rho(u(x), x)\, dx, \qquad U := \{\, u : \Omega \to \Omega \mid c(u) = \det(\nabla u)\, \mu_1(u) - \mu_0 = 0 \,\}. \tag{2.1}$$
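    As a small illustration of the two ingredients of (2.1), the following sketch evaluates the constraint residual $c(u)$ and the objective for a 1D example. The densities (two Gaussians), the affine candidate map and all variable names are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

def gaussian(x, mean, std):
    """Density of a 1D normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Hypothetical 1D example: mu0 = N(0, 1), mu1 = N(m, s^2).
m, s = 0.5, 2.0
x = np.linspace(-8.0, 8.0, 4001)
mu0 = gaussian(x, 0.0, 1.0)

# Candidate map u(x) = m + s*x; in 1D det(grad u) is simply u'(x) = s.
u = m + s * x
du_dx = np.full_like(x, s)

# Constraint residual c(u) = det(grad u) * mu1(u) - mu0, as in (2.1).
c = du_dx * gaussian(u, m, s) - mu0
mp_residual = np.max(np.abs(c))

# Objective (1/2) * integral of mu0(x) * |u(x) - x|^2 dx by the trapezoid rule.
integrand = mu0 * (u - x) ** 2
dx = x[1] - x[0]
cost = 0.5 * dx * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# Closed form for this particular example: (1/2) * (m^2 + (s - 1)^2).
cost_exact = 0.5 * (m ** 2 + (s - 1.0) ** 2)
print(mp_residual, cost, cost_exact)
```

    For this example the residual vanishes up to rounding, so the affine map is mass preserving, and the quadrature agrees with the closed-form cost.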

    We refer to the constraint $c(u) = 0$ as the mass preserving (MP) property.

    For the remainder of this paper, we take $\rho(\cdot,\cdot)$ to be the squared distance function $\rho(u(x), x) = \|u(x) - x\|^2$. Even for the simple $L^2$-norm, (2.1) defines a highly non-linear optimization problem. While there exists a large body of literature which deals with the analysis of the problem, such as (Ambrosio, 2000; Evans, 1989), only a smaller number of papers discuss efficient numerical solutions for the problem. Benamou and Brenier estimate $\hat{u}$ by relating Eq. (2.1) to the minimization of a certain kinetic energy functional with a space-time transport partial differential equation (PDE) constraint (Benamou and Brenier, 2003). Their approach not only estimates the optimal mapping but also provides the transportation path between the densities. A computationally faster solution to (2.1) was proposed in Haker et al. (2001), Angenent et al. (2003) and Haker et al. (2004). Their algorithm directly estimates $\hat{u}$ by first computing a transformation $u_0$ that fulfills the MP property. Afterwards, the algorithm improves $u_0$ by concatenating the mapping with the transformation

    $$\hat{s} = \min_{s \in S} \frac{1}{2} \int_{\Omega^d} \mu_0(x)\, \bigl( u_0(s^{-1}(x)) - x \bigr)^2\, dx, \qquad S := \{\, s : \Omega \to \Omega \mid \tilde{c}(s) = \det(\nabla s)\, \mu_0(s) - \mu_0 = 0 \,\}. \tag{2.2}$$

    We refer to the second equation in (2.2) as the $\tilde{c}$ constraint. This means that $s \in S$ is an MP mapping from $\mu_0$ to itself. The authors in Angenent et al. (2003) and Haker et al. (2004) show that $\hat{s}$ can be estimated via a steepest descent flow. To register 2D MRIs, they implement the method using a forward Euler scheme for time stepping and a simple finite difference discretization of the spatial derivatives. The approach, however, does not enforce the MP constraint at each step of the numerical algorithm, so that the final solution generally does not fulfill the MP property. In addition, steepest descent is very slow in estimating the solution to Eq. (2.2). For these reasons it would be very challenging to efficiently register 3D medical images with this approach. To overcome this hurdle, this paper describes a faster numerical solution to Eq. (2.2) that enforces the MP constraint.

    Unlike Angenent et al. (2003) and Haker et al. (2004), we solve the optimization problem via an approach where we choose a direction other than steepest descent and show that it converges faster (see Section 2.2). Furthermore, we derive a numerical approach that uses a consistent conservative discretization method and enforces the MP constraint at each update of the solution (Section 3).

    We end this section with the comment that our approach most closely relates to those registration approaches based on fluid mechanics. The optimal warping map of the $L^2$ Monge–Kantorovich equation may be regarded as the velocity vector field which minimizes a standard energy integral subject to an Euler continuity equation constraint (Benamou and Brenier, 2003). In particular, in the fluid mechanics framework, this means that the optimal Monge–Kantorovich solution is given as a potential flow.

    2.2. Obtaining the descent direction

    We now quickly review the derivation presented in Angenent et al. (2003) and Haker et al. (2004) but within a variational framework.

    Assuming that the MP constraint manifold (2.2) is valid, we take a perturbation in $s$ which stays on the MP constraint manifold. This leads to

    $$\begin{aligned}
    0 &= \tilde{c}(s + \delta s) - \tilde{c}(s) \\
      &= \det(\nabla(s + \delta s))\, \mu_0(s + \delta s) - \det(\nabla s)\, \mu_0(s) \\
      &= \det(\nabla s)\, \bigl( \nabla \cdot (\delta s(s^{-1})) \bigr)(s)\, \mu_0(s) + \det(\nabla s)\, \nabla \mu_0(s) \cdot \delta s .
    \end{aligned}$$

    This expression can be simplified as long as the constraint is valid. Since $\det(\nabla s) > 0$ we can divide by it, and rearranging we have

    $$\begin{aligned}
    0 &= \bigl( \mu_0\, \nabla \cdot (\delta s(s^{-1})) \bigr)(s) + \nabla \mu_0(s) \cdot \delta s \\
      &= \mu_0\, \nabla \cdot (\delta s(s^{-1})) + \nabla \mu_0 \cdot \delta s(s^{-1}) \\
      &= \nabla \cdot \bigl( \mu_0\, \delta s(s^{-1}) \bigr) .
    \end{aligned}$$

    Defining $\delta f = \mu_0\, \delta s(s^{-1})$, we see that

    $$\nabla \cdot \delta f = 0 .$$

    Next, looking at $u = u_0(s^{-1})$, we can write $u(s) = u_0$, which implies that

    $$(\nabla u(s))\, \delta s + \delta u(s) = 0$$

    or

    $$\delta u = -(\nabla u)\, \delta s(s^{-1}) .$$

    Using the definition of $\delta f$ we obtain that, as long as the constraint is valid and for $u(s) = u_0$, we have

    $$\delta u = -\mu_0^{-1} (\nabla u)\, \delta f, \tag{2.3a}$$

    $$0 = \nabla \cdot \delta f. \tag{2.3b}$$

    Letting $M$ denote the objective function in (2.2), it can be shown that

    $$\delta M = \int_{\Omega} u \cdot \delta f\, dx. \tag{2.3c}$$

    In the original papers (Haker et al., 2001; Angenent et al., 2003; Haker et al., 2004), it is suggested to use the Helmholtz decomposition in order to obtain a descent direction. Here we employ a different approach. First, we note that the divergence constraint can be eliminated by selecting

    $$\delta f = \nabla \times \delta g,$$

    and thus to reduce the objective function $M$ we need to obtain a direction that yields a negative $\delta M$, that is, we seek a direction $\delta g$ such that

    $$\delta M = \int_{\Omega} u \cdot \nabla \times \delta g\, dx < 0 .$$

    Using Gauss' theorem we obtain that

    $$\int_{\Omega} u \cdot \nabla \times \delta g\, dx = \int_{\Omega} \nabla \times u \cdot \delta g\, dx + \int_{\partial\Omega} u \cdot (\delta g \times n)\, dx,$$

    and therefore the steepest descent direction is given by

    $$\delta g = \nabla \times u \ \text{ in } \Omega, \qquad \delta g \times n = 0 \ \text{ on } \partial\Omega,$$

    which leads to the update

    $$\delta f = \nabla \times \nabla \times u,$$

    and finally to the steepest descent direction in $u$

    $$\delta u = -\frac{1}{\mu_0} (\nabla u)\, \nabla \times \nabla \times u$$

    or, in symmetric form,

    $$\mu_0 (\nabla u)^{-1} \delta u = -\nabla \times \nabla \times u. \tag{2.3d}$$

    This form is useful because it helps to further understand the behavior of the system. The elliptic operator $-\nabla \times \nabla \times$ is a negative operator; thus, the equation can be thought of as a parabolic PDE as long as all the eigenvalues of $\nabla u$ have positive real parts. If at some point this condition is violated (negative real parts), then we obtain a backward parabolic equation which is ill-posed. This point must be carefully considered for the numerical method to be used.

    Using the above decomposition a family of different directions may be obtained. Note that in order to reduce the objective $\int_{\Omega} \nabla \times u \cdot \delta g\, dx$, any vector field of the form

    $$\delta g = A\, \nabla \times u$$

    can be used. For example, a choice that leads to a similar method to the one derived in the original works (Angenent et al., 2003; Haker et al., 2004) in 2D is $A = -\Delta^{-1}$, which leads to the update

    $$\mu_0 (\nabla u)^{-1} \delta u = \nabla \times \Delta^{-1} \nabla \times u. \tag{2.3e}$$

    It is also easy to see that the flow (2.3e) is valid in 3D. Moreover, using Fourier analysis it is easy to verify that, given a smooth $u$, the second formulation (2.3e) leads to a more stable method that should converge faster compared with the first formulation (2.3d), because the operator $\nabla \times \Delta^{-1} \nabla \times$ is compact while the $\nabla \times \nabla \times$ operator is unbounded (Trottenberg et al., 2001a). Thus, (2.3e) will not in general prefer high or low frequencies. In the next section, we therefore derive a numerical method for (2.3e) rather than for (2.3d).
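    The Fourier argument can be checked numerically on a single plane wave. The sketch below is only an illustration (the function names and the choice of wave vectors are assumptions): it builds the symbol of the curl as $i[k]_\times$ and the symbol of the Laplacian as $-|k|^2$, then compares the largest eigenvalue magnitude of the two operators as the frequency grows.

```python
import numpy as np

def cross_matrix(k):
    """Matrix [k]_x such that cross_matrix(k) @ v equals np.cross(k, v)."""
    return np.array([[0.0, -k[2], k[1]],
                     [k[2], 0.0, -k[0]],
                     [-k[1], k[0], 0.0]])

def symbols(k):
    """Fourier symbols of curl-curl and curl-Laplacian^{-1}-curl at wave vector k."""
    K = 1j * cross_matrix(k)           # symbol of the curl
    lap = -np.dot(k, k)                # symbol of the Laplacian
    curl_curl = (K @ K).real           # equals |k|^2 I - k k^T
    curl_invlap_curl = (K @ K / lap).real
    return curl_curl, curl_invlap_curl

for scale in (1.0, 10.0, 100.0):
    k = scale * np.array([1.0, 2.0, 2.0])
    cc, cic = symbols(k)
    # Largest eigenvalue magnitude of each (symmetric) symbol.
    print(scale, np.linalg.eigvalsh(cc).max(), np.abs(np.linalg.eigvalsh(cic)).max())
```

    In this sketch the largest eigenvalue of the curl-curl symbol grows like $|k|^2$, while that of $\nabla \times \Delta^{-1} \nabla \times$ keeps magnitude 1 at every frequency, which is the frequency-independent behavior the text refers to.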

    3. Implementation

    In this section we derive an efficient numerical method for the solution of the flow. The method has four main components:

    – pre-processing of input volume data.
    – conservative discretization of Eq. (2.3e).
    – a criterion to choose the step size.
    – a method to correct steps that drift away from the constraint (2.2).

    3.1. Pre-processing input data

    In the context of image registration applications, the input data to our algorithm are the source and target volumes that need to be registered. For all the examples presented in this paper we model the mass density for a voxel as the image intensity. However, it can alternatively be defined as any scalar field that is related to the underlying physical model. This property can be exploited for non-rigid registration of multi-modality data as well, where sufficient anatomical correspondence exists between the source and target datasets. This will be further studied in future work. In order for the notion of mass transport to hold it is necessary that both volumes have the same total mass. This is ensured by normalizing the image intensities by the respective sum of all intensity values in each volume. The normalized data is then scaled by a common factor to avoid numerical instability due to very small values. Another step in the pre-processing of input data is the addition of a small mass in the background regions where the intensity values are zero, in order to avoid a divide-by-zero while solving Eq. (2.3e). A further step necessary in the context of brain MRI registration is dealing with the inherent anisotropic nature of the data. We pre-process all brain MRI data by interpolating and re-sampling to isotropic voxels.
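    A minimal NumPy sketch of the normalization steps described above follows. The function name, the `scale` and `background` values, and the choice to add the background mass before normalizing (so that the two totals stay exactly equal) are assumptions made for illustration; the paper does not specify them. Resampling to isotropic voxels is not shown.

```python
import numpy as np

def preprocess(source, target, scale=1.0e6, background=1.0e-3):
    """Turn two intensity volumes into positive densities of equal total mass.

    `scale` and `background` are illustrative values, not taken from the paper.
    """
    src = np.asarray(source, dtype=np.float64)
    tgt = np.asarray(target, dtype=np.float64)

    # Add a small mass where the intensity is zero to avoid division by zero.
    src = np.where(src > 0, src, background)
    tgt = np.where(tgt > 0, tgt, background)

    # Normalize each volume by its own total, then apply a common scale factor.
    mu0 = scale * src / src.sum()
    mu1 = scale * tgt / tgt.sum()
    return mu0, mu1

# Usage on two random 8x8x8 volumes standing in for image data.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(8, 8, 8))
b = rng.integers(0, 256, size=(8, 8, 8))
mu0, mu1 = preprocess(a, b)
print(mu0.sum(), mu1.sum(), mu0.min() > 0, mu1.min() > 0)
```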

    3.2. Conservative discretization

    The applications we have in mind derive from medical imaging where images are discretized on a regular grid. We therefore construct our discretization based on a finite volume/difference approach. To derive and analyze our discretization we introduce a new variable $\delta p = \Delta^{-1} \nabla \times u$ and rewrite (2.3e) as

    $$\begin{pmatrix} \mu_0 (\nabla u)^{-1} & -\nabla \times \\ 0 & \Delta \end{pmatrix} \begin{pmatrix} \delta u \\ \delta p \end{pmatrix} = \begin{pmatrix} 0 \\ \nabla \times u \end{pmatrix}. \tag{3.7}$$

    In order for the discrete system to be well posed we need consistent discretizations for $\Delta$, $\nabla u$ and $\nabla \times u$. There are a number of possible discretizations that lead to a well-posed system.

    We divide $\Omega$ into $n_1 \times \ldots \times n_d$ cells, each of size $h_1 \times \ldots \times h_d$, where $d$ is the dimension of the problem. We discretize all the components of $u$ at the nodes of each cell to obtain $d$ grid functions $\hat{u}^1, \ldots, \hat{u}^d$. Since $\delta p$ is connected to $u$ by the curl operator, we employ a staggered grid and place $\delta p$ at cell centers. To approximate $\nabla u$ at each node, we use long differences. For example, in 2D, assuming $h_1 = h_2 = h$, we have

    $$(\nabla u)_{i,j} = \frac{1}{2h} \begin{pmatrix} \hat{u}^1_{i+1,j} - \hat{u}^1_{i-1,j} & \hat{u}^1_{i,j+1} - \hat{u}^1_{i,j-1} \\ \hat{u}^2_{i+1,j} - \hat{u}^2_{i-1,j} & \hat{u}^2_{i,j+1} - \hat{u}^2_{i,j-1} \end{pmatrix} + O(h^2) = (\nabla_h \hat{u})_{ij} + O(h^2).$$
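    The long-difference approximation can be written in a few lines of NumPy. This is a sketch under the same assumption $h_1 = h_2 = h$; the function name, the array layout (axis 0 for $x_1$, axis 1 for $x_2$) and the test map are illustrative choices, not part of the paper.

```python
import numpy as np

def grad_long_differences(u1, u2, h):
    """Jacobian of u = (u1, u2) at interior nodes via long (central) differences.

    Returns an array of shape (n1-2, n2-2, 2, 2); entry [i, j, a, b] approximates
    d u_a / d x_b at interior node (i+1, j+1). Axis 0 is x1, axis 1 is x2.
    """
    def d1(f):  # long difference in the first direction
        return (f[2:, 1:-1] - f[:-2, 1:-1]) / (2.0 * h)

    def d2(f):  # long difference in the second direction
        return (f[1:-1, 2:] - f[1:-1, :-2]) / (2.0 * h)

    J = np.empty(d1(u1).shape + (2, 2))
    J[..., 0, 0], J[..., 0, 1] = d1(u1), d2(u1)
    J[..., 1, 0], J[..., 1, 1] = d1(u2), d2(u2)
    return J

# Check on a smooth map: u1 = x1 + 0.1*sin(x2), u2 = x2 + 0.1*x1**2.
h = 0.01
nodes = np.arange(0.0, 1.0 + h / 2, h)
x1, x2 = np.meshgrid(nodes, nodes, indexing="ij")
J = grad_long_differences(x1 + 0.1 * np.sin(x2), x2 + 0.1 * x1 ** 2, h)
exact_12 = 0.1 * np.cos(x2[1:-1, 1:-1])
err = np.max(np.abs(J[..., 0, 1] - exact_12))
print(err)  # expected to shrink like h**2 when h is refined
```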

    Thus, in 3D, the discretized (1,1) block in (3.7) is a matrix of the form

    $$(\nabla_h \hat{u}) = \frac{1}{h} \begin{pmatrix} \operatorname{diag}(D_1 \hat{u}^1) & \operatorname{diag}(D_2 \hat{u}^1) & \operatorname{diag}(D_3 \hat{u}^1) \\ \operatorname{diag}(D_1 \hat{u}^2) & \operatorname{diag}(D_2 \hat{u}^2) & \operatorname{diag}(D_3 \hat{u}^2) \\ \operatorname{diag}(D_1 \hat{u}^3) & \operatorname{diag}(D_2 \hat{u}^3) & \operatorname{diag}(D_3 \hat{u}^3) \end{pmatrix}, \tag{3.8}$$

    where $D_j$ is a matrix of long differences in the $j$th direction. Assuming $u$ is sufficiently smooth, it can be shown that upon a consistent discretization of the Laplacian the system (3.7) is invertible and that the overall (discrete) problem is well-posed. To obtain a consistent discretization of the Laplacian we use a standard discretization (5 point stencil in 2D and 7 point stencil in 3D) with Dirichlet boundary conditions.

    Fig. 1. Outline of processing for the OMT solver conducted on the GPU. Processing occurs in two major phases: evolution of the map from source to target volumes and time step adjustment. Each gray rectangle represents one Cg kernel executed on the GPU. Arrows indicate the flow of data volumes through the Cg kernels. The entire process in the figure above is repeated left to right until convergence.

    Fig. 2. CPU versus GPU solution of PDEs: While the CPU computes updates on data grids one element at a time, the GPU is capable of updating entire grids in one pass due to its massively parallel architecture.

    Fig. 3. The GPU realizes an increasing advantage in solving the OMT problem over the CPU as grid size increases up to $128^3$-sized grids.

    Fig. 4. Error Analysis-Known Deformation Example. L2-norm and 1-norm of error in calculation of u as a function of grid size.

    Finally, we need to discretize the curl of $u$. Here we use short differences in one direction averaged in the other direction to obtain a cell-centered, second order accurate approximation of $\nabla \times u$. For example, in 2D we obtain

    $$(C\hat{u})_{i+\frac{1}{2},\, j+\frac{1}{2}} = \frac{\hat{u}^1_{i,j+1} - \hat{u}^1_{i,j} + \hat{u}^1_{i+1,j+1} - \hat{u}^1_{i+1,j}}{2h_1} - \frac{\hat{u}^2_{i+1,j} - \hat{u}^2_{i,j} + \hat{u}^2_{i+1,j+1} - \hat{u}^2_{i,j+1}}{2h_2} + O(h^2), \tag{3.9}$$

    where $C$ denotes the curl matrix.
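    The stencil (3.9) maps directly onto array slices. The sketch below assumes $h_1 = h_2 = h$ (as in the 2D example for $\nabla u$) and follows the sign and ordering of (3.9) as printed; the function name and array layout (axis 0 for index $i$, axis 1 for index $j$) are illustrative assumptions.

```python
import numpy as np

def curl_cell_centers(u1, u2, h):
    """Evaluate (3.9) with h1 = h2 = h for nodal arrays u1, u2 of shape (n1+1, n2+1).

    Axis 0 carries index i, axis 1 carries index j. The result has shape (n1, n2),
    one value per cell center (i+1/2, j+1/2).
    """
    # Short differences of u1 in j, averaged over the two i-sides of the cell.
    d_u1 = (u1[:-1, 1:] - u1[:-1, :-1] + u1[1:, 1:] - u1[1:, :-1]) / (2.0 * h)
    # Short differences of u2 in i, averaged over the two j-sides of the cell.
    d_u2 = (u2[1:, :-1] - u2[:-1, :-1] + u2[1:, 1:] - u2[:-1, 1:]) / (2.0 * h)
    return d_u1 - d_u2

h = 0.1
nodes = np.linspace(0.0, 1.0, 11)
x1, x2 = np.meshgrid(nodes, nodes, indexing="ij")
curl_identity = curl_cell_centers(x1, x2, h)    # identity map u(x) = x
curl_rotation = curl_cell_centers(-x2, x1, h)   # rigid rotation field
print(curl_identity.max(), curl_rotation.mean())
```

    With the sign convention of (3.9), the identity map gives zero everywhere, which is consistent with its role as a curl-free reference, and the rotation field $(-x_2, x_1)$ gives a constant value of $-2$.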

    3.3. Computation of a step

    The computation of each step has two parts: first, the solution of (3.7), and second, a test that determines whether the step is acceptable. The solution of the system (3.7) is straightforward. Any fast Poisson solver can be used for the task. Here we have used a standard multigrid method with weighted Jacobi smoothing (Trottenberg et al., 2001b), bilinear prolongation and its adjoint as a restriction (Trottenberg et al., 2001c).

    The validity of the update is determined using the following procedure. Assume that at iteration $n$ we have $\hat{u}^n$ as an approximation to $u$ and that we computed $\delta\hat{u}$. The update is then performed using

    $$\hat{u}^{n+1} = P(\hat{u}^n + \alpha\, \delta\hat{u}), \tag{3.10}$$

    where $P$ is an orthogonal projection, discussed in Section 3.4 below, that projects $\hat{u}^n + \alpha\, \delta\hat{u}$ onto the mass preserving manifold. The step size $\alpha$ is then chosen such that the objective function is decreased and that the real part of the eigenvalues of $(\nabla_h \hat{u})$ is positive. The entire procedure is outlined in Algorithm 1.

    Fig. 5. Synthetic Imagery Results. A sphere is mapped to its deformed counterpart. In the lower image we show the deformation vector field. It is clearly visible that the magnitude of deformation is maximum at the top where the dent is and it decays smoothly inside the sphere (Data size $128^3$).

    Algorithm 1. Solution of OMT: $\hat{u} \leftarrow \mathrm{OMTsol}(\mu_0, \mu_1)$

        Use $\mu_0$ and $\mu_1$ to compute a mass preserving $u_0$
        while true do
            Solve (3.7) for $\delta\hat{u}$
            line search: set $\alpha = 1$
            while true do
                $\hat{u}^{n+1} = P(\hat{u}^n + \alpha\, \delta\hat{u})$
                if $\|\hat{u}^{n+1} - x\|_{\mu_0} < \|\hat{u}^n - x\|_{\mu_0}$ and $\mathrm{Re}(\lambda(\nabla_h u^{n+1})) > 0$ then
                    Break
                end if
                $\alpha \leftarrow \alpha / 2$
            end while
        end while
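    The inner loop of Algorithm 1 is a backtracking line search and can be sketched as follows. The callables `objective`, `project` and `is_admissible` are hypothetical stand-ins for the weighted norm $\|\hat{u} - x\|_{\mu_0}$, the projection $P$ of Section 3.4 and the eigenvalue test; `max_halvings` is a safeguard that does not appear in Algorithm 1. The toy usage replaces all three with trivial functions.

```python
import numpy as np

def line_search_step(u, du, objective, project, is_admissible, max_halvings=30):
    """One pass of the backtracking loop in Algorithm 1.

    `objective`, `project` and `is_admissible` are hypothetical callables supplied
    by the caller; `max_halvings` is a safeguard that is not part of Algorithm 1.
    """
    alpha = 1.0
    f_old = objective(u)
    for _ in range(max_halvings):
        u_new = project(u + alpha * du)
        if objective(u_new) < f_old and is_admissible(u_new):
            return u_new, alpha
        alpha *= 0.5   # step rejected: halve and retry
    return u, 0.0      # no acceptable step found

# Toy usage: an overshooting direction that needs two halvings to be accepted.
target = np.array([1.0, 2.0])
u0 = np.zeros(2)
u_new, alpha = line_search_step(
    u0, 4.0 * (target - u0),
    objective=lambda u: float(np.sum((u - target) ** 2)),
    project=lambda u: u,              # identity in place of the MP projection
    is_admissible=lambda u: True,     # eigenvalue test omitted in this toy case
)
print(u_new, alpha)
```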

    3.4. Orthogonal projection into the mass preserving constraint

    Assume that we have computed a mass preserving mapping $\hat{u}^n$, and that we have updated it to obtain $v^n = \hat{u}^n + \alpha\, \delta\hat{u}$. It should be noted that an infinitesimal $\delta\hat{u}$ does not guarantee mass preservation. Furthermore, we aim to take large steps in $\delta\hat{u}$, and therefore the MP constraint is likely to be invalid. To correct for this we use orthogonal projection. The goal is to compute a vector field $\delta v$ such that $c(v + \delta v) = 0$. Obviously, $\delta v$ is non-unique and therefore we seek a minimum norm solution, that is, we seek $\delta v$ such that

    $$\min_{\delta v} \frac{1}{2} \|\delta v\|^2_{\mu_0} \quad \text{subject to} \quad c(v + \delta v) = \det(\nabla(v + \delta v))\, \mu_1(v + \delta v) - \mu_0 = 0 .$$

    It is easy to verify that a correction $\delta v$ can be obtained by solving the system $\delta v = -c_v^{\top} (c_v c_v^{\top})^{-1} c(v)$, where $c_v$ denotes the Jacobian of $c$ with respect to $v$ (Nocedal and Wright, 1999). The system $c_v c_v^{\top}$ can be thought of as an elliptic system of equations. The system is solved using preconditioned conjugate gradient with an incomplete Cholesky preconditioner.
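    The minimum-norm correction can be illustrated on a small dense problem. The sketch below is an assumption-laden toy: it uses the plain Euclidean norm instead of the $\mu_0$-weighted norm, a dense solve instead of preconditioned conjugate gradient, and a scalar constraint (the unit sphere) in place of the mass preserving constraint. All function names are hypothetical.

```python
import numpy as np

def min_norm_correction(c_val, jac):
    """Minimum Euclidean-norm dv solving the linearized constraint c + jac @ dv = 0."""
    # dv = -J^T (J J^T)^{-1} c; a dense solve is used here purely for illustration.
    return -jac.T @ np.linalg.solve(jac @ jac.T, c_val)

def project(v, c, c_jac, tol=1e-12, max_iter=20):
    """Repeat the linearized correction until the constraint residual is below tol."""
    for _ in range(max_iter):
        c_val = c(v)
        if np.max(np.abs(c_val)) < tol:
            break
        v = v + min_norm_correction(c_val, c_jac(v))
    return v

# Toy constraint (an assumption for illustration): c(v) = |v|^2 - 1, the unit sphere.
c = lambda v: np.array([v @ v - 1.0])
c_jac = lambda v: 2.0 * v[np.newaxis, :]
v_start = np.array([1.2, 0.3, -0.4])
v = project(v_start, c, c_jac)
print(v, v @ v)
```

    For this toy constraint each correction is radial, so the iteration converges to the point on the sphere in the direction of the starting vector.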

    Fig. 6. OMT Results viewed on an axial slice. The top row shows corresponding slices from Pre-op (Left) and Post-op (Right) MRI data. The deformation is clearly visible in the anterior part of the brain.

    3.5. 3D multigrid Laplacian inversion

    We inverted the Laplacian (a key component of the OMT algorithm) using a 3D multigrid solver. The multigrid idea is very fundamental. It takes advantage of the smoothing properties of the classical iteration methods at high frequencies (Jacobi, Gauss–Seidel, SOR, etc.) and the error smoothing at low frequencies by restriction to coarse grids. The essential multigrid principle is to approximate the smooth (low frequency) part of the error on coarser grids. The non-smooth or rough part is reduced with a small number of iterations with a basic iterative method on the fine grid.

    The basic components of a multigrid algorithm are discretization, intergrid transfer operators (interpolation and restriction), a relaxation scheme and the iterative cycling structure. We used an explicit finite difference scheme for approximating the 3D Poisson equation. This approach uses a 19-point formula on the uniform cubic grid. Relaxation was performed using a parallelizable four-color Gauss–Seidel relaxation scheme. This increases robustness and efficiency and is especially suited for the implementation on the GPU. We used a trilinear interpolation operator for transferring the coarse grid correction to fine grids. The residual restriction operator for projecting the residual from the fine to coarse grids is the full-weighting scheme. A multigrid V(2,2)-cycle algorithm was used to iterate for the solution (residual max norm $\le 10^{-5}$). The interested reader is referred to Gupta and Zhang (2000), Briggs et al. (2000) and Trottenberg et al. (2001c) for complete details on implementation of the multigrid method.
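    The cycling structure can be shown compactly in one dimension. The following sketch is not the 3D solver described above: it solves $-u'' = f$ on $(0,1)$ with zero Dirichlet values, uses weighted Jacobi instead of four-color Gauss–Seidel, linear instead of trilinear interpolation, and a 3-point instead of a 19-point stencil. It keeps the full-weighting restriction and the V(2,2) cycle; all function names are assumptions.

```python
import numpy as np

def residual(u, f, h):
    """r = f - A u for A = -d^2/dx^2 with zero Dirichlet values (interior unknowns only)."""
    up = np.pad(u, 1)  # append the zero boundary values
    return f - (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing: u <- u + omega * D^{-1} r with D = 2/h^2."""
    for _ in range(sweeps):
        u = u + omega * (h**2 / 2.0) * residual(u, f, h)
    return u

def restrict(r):
    """Full weighting from 2m+1 fine interior points to m coarse ones."""
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e):
    """Linear interpolation from m coarse interior points to 2m+1 fine ones."""
    ep = np.pad(e, 1)
    fine = np.zeros(2 * e.size + 1)
    fine[1::2] = e
    fine[0::2] = 0.5 * (ep[:-1] + ep[1:])
    return fine

def v_cycle(u, f, h, pre=2, post=2):
    """One V(pre, post)-cycle for -u'' = f."""
    if u.size == 1:
        return np.array([f[0] * h**2 / 2.0])  # exact solve on the coarsest grid
    u = jacobi(u, f, h, pre)
    r_coarse = restrict(residual(u, f, h))
    e_coarse = v_cycle(np.zeros_like(r_coarse), r_coarse, 2.0 * h, pre, post)
    u = u + prolong(e_coarse)
    return jacobi(u, f, h, post)

# Usage: 127 interior points, right-hand side with exact solution sin(pi x).
n = 127
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
res_norm = np.max(np.abs(residual(u, f, h)))
print(res_norm, np.max(np.abs(u - np.sin(np.pi * x))))
```

    The number of unknowns is chosen as $2^k - 1$ so that every coarsening step halves the grid exactly, down to a single unknown that is solved directly.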

    3.6. GPU implementation

    An advantage of our solution to the OMT problem is that it is particularly well-suited for implementation on parallel computing architectures. Over the past few years, it has been shown that graphics processing units (GPUs; now standard in most consumer-level computers), which are naturally massively parallel, are well suited for these types of parallelizable problems (Bolz et al., 2003; Nolan et al., 2003).

    A GPU is a highly parallel computing device designed for the task of graphics rendering. However, the GPU has evolved in recent years to become a more general processor, allowing users to flexibly program certain aspects of the GPU to facilitate sophisticated graphics effects and even scientific applications. In general, the GPU has become a powerful device for the execution of data-parallel, arithmetic (versus memory) intensive applications in which the same operations are carried out on many elements of data in parallel. Example applications include the iterative solution of PDEs, video processing, machine learning, and 3D medical imaging.

    Fig. 7. OMT results viewed on a sagittal slice. The top row shows corresponding slices from Pre-op (left) and Post-op (right) MRI data. Here again the maximum deformation is visible on the anterior part of the brain.

    Taking advantage of the benefits a parallel approach has to offer our problem, we implemented our OMT multigrid algorithm on the GPU. The GPU’s advantage over the CPU in this sense is that while the CPU can execute only one or two threads of computation at a time, the GPU can execute over two orders of magnitude more. Thus, instead of sequentially computing updates on data grids one element at a time, the GPU computes updates on entire grids on each render pass, significantly improving performance (Fig. 3). For instance, on a modest Dual Xeon 1.6 GHz machine with an nVidia GeForce 8800 GX GPU (3DMark score of 7200), improvements in speed over our CPU OMT implementation reached 4826 percent on a $128^3$ volume, where it converges in just 15 minutes. Presently available GPUs only allow single-precision computations; however, this did not affect the stability of the OMT algorithm.

    The OMT algorithm is implemented on the GPU as a series of kernel operations: arithmetic computations performed component-wise over large grids of data. An example of such an operation is the restriction operator utilized in the multigrid algorithm to down-sample data; each element of an input data grid is convolved and re-sampled to a lower resolution grid. The data flow and sequence of kernel applications involved in the OMT solver are given in Fig. 1. All kernels are written in Cg in conjunction with the OpenGL/fragment shader paradigm for GPU computing as described in Pharr (2005). Fig. 2
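The restriction kernel mentioned above can be illustrated on the CPU. The sketch below is plain Python, not the Cg fragment program used in the paper; it assumes a cubic grid with an even number of intervals per side and zero values on the coarse boundary, and the names `restrict_kernel` and `restrict_grid` are hypothetical. It shows the two steps the text describes: a weighted 3 × 3 × 3 average (the convolution) evaluated at every second fine-grid node (the re-sampling).

```python
def restrict_kernel(fine, I, J, K):
    """Value of one coarse-grid element: a 3x3x3 full-weighting average
    centred on the fine-grid node (2I, 2J, 2K)."""
    w = (0.25, 0.5, 0.25)                  # 1D full-weighting stencil
    total = 0.0
    for a in range(3):
        for b in range(3):
            for c in range(3):
                total += (w[a] * w[b] * w[c]
                          * fine[2 * I + a - 1][2 * J + b - 1][2 * K + c - 1])
    return total

def restrict_grid(fine):
    """Apply the kernel to every interior coarse element. No call depends on
    the result of another, so on a GPU all of them could run in parallel."""
    n = len(fine) - 1                      # fine grid has (n + 1)^3 nodes, n even
    nc = n // 2
    coarse = [[[0.0] * (nc + 1) for _ in range(nc + 1)] for _ in range(nc + 1)]
    for I in range(1, nc):                 # boundary entries are left at zero
        for J in range(1, nc):
            for K in range(1, nc):
                coarse[I][J][K] = restrict_kernel(fine, I, J, K)
    return coarse
```

Because the stencil weights sum to one and are symmetric, restricting a linear field reproduces its values at the coarse nodes, which is a convenient sanity check for any implementation of this operator.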

    Fig. 8. Sphere registration (3D view). The deformation is visible at the dented region of the sphere. (Data size $128^3$.)

    Fig. 9. Brain sag registration (3D view). The brain sag is visible in the anterior portion of the brain. (Data size $256^3$.)

    4. Results

    We illustrate our registration method using both synthetic and real examples. We start by recovering a known deformation field that relates two images. We then register two synthetically generated spherical volumes and conclude by giving a real example of 3D brain MRI image registration.

    4.1. Synthetic examples

    A synthetic example can easily be constructed to test the convergence of our algorithm. We used the standard MATLAB 3D MRI dataset for this experiment. Since the optimal map $u$ can be defined as the gradient of a convex function $\phi$ (Angenent et al., 2003), we define one such function as

    $$\phi(x) = \frac{1}{2}\left(x_1^2 + x_2^2 + x_3^2\right) + c \cdot e^{-\frac{1}{2}\left(x_1 - \frac{1}{2}\right)^2/\sigma_1^2} \cdot e^{-\frac{1}{2}\left(x_2 - \frac{1}{2}\right)^2/\sigma_2^2} \cdot e^{-\frac{1}{2}\left(x_3 - \frac{1}{2}\right)^2/\sigma_3^2}, \qquad x \in [0,1]^3,$$

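    where $c$, $\sigma_1$, $\sigma_2$ and $\sigma_3$ are parameters chosen to create a unique deformation field. Differentiating $\phi$ with respect to $x = (x_1, x_2, x_3)$, we obtain $u = (u_1, u_2, u_3)$,

    $$u_1 = x_1 - c \cdot \left((x_1 - 0.5)/\sigma_1^2\right) \cdot f(x)$$

    $$u_2 = x_2 - c \cdot \left((x_2 - 0.5)/\sigma_2^2\right) \cdot f(x)$$

    $$u_3 = x_3 - c \cdot \left((x_3 - 0.5)/\sigma_3^2\right) \cdot f(x)$$

    where

    $$f(x) = e^{-\frac{1}{2}\left(x_1 - \frac{1}{2}\right)^2/\sigma_1^2} \cdot e^{-\frac{1}{2}\left(x_2 - \frac{1}{2}\right)^2/\sigma_2^2} \cdot e^{-\frac{1}{2}\left(x_3 - \frac{1}{2}\right)^2/\sigma_3^2}.$$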
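A quick numerical check of the relation $u = \nabla\phi$ is easy to set up. The sketch below writes the derivative out directly from the definition of $\phi$ (a quadratic plus $c$ times a product of Gaussians), compares it against central finite differences, and evaluates $\det(\nabla u)$. The parameter values `C` and `SIGMA` are hypothetical choices made only for illustration; the paper does not state the values it used.

```python
import math

C = 0.02                       # hypothetical amplitude c
SIGMA = (0.2, 0.25, 0.3)       # hypothetical widths sigma_1, sigma_2, sigma_3

def f(x):
    """Product of the three Gaussian factors."""
    return math.exp(-0.5 * sum(((xi - 0.5) / s) ** 2 for xi, s in zip(x, SIGMA)))

def phi(x):
    """Potential: 0.5 * |x|^2 + c * f(x)."""
    return 0.5 * sum(xi * xi for xi in x) + C * f(x)

def u(x):
    """Analytic gradient of phi, component by component."""
    fx = f(x)
    return [xi - C * (xi - 0.5) / s ** 2 * fx for xi, s in zip(x, SIGMA)]

def grad_fd(x, h=1e-5):
    """Central-difference gradient of phi, used only as a cross-check."""
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((phi(xp) - phi(xm)) / (2 * h))
    return g

def jacobian_det(x, h=1e-5):
    """det(grad u) by central differences of the analytic map u."""
    J = []
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        up, um = u(xp), u(xm)
        # derivatives of all components with respect to x_j; this builds the
        # transpose of grad u, which has the same determinant
        J.append([(up[i] - um[i]) / (2 * h) for i in range(3)])
    (a, b, c), (d, e, g), (p, q, r) = J
    return a * (e * r - g * q) - b * (d * r - g * p) + c * (d * q - e * p)
```

For these parameter values the determinant stays positive, so the map is locally invertible; choosing `C` too large relative to the squared widths would destroy the convexity of $\phi$ and make the construction invalid.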

    We then apply this deformation field $u$ to $\mu_1(x_1, x_2, x_3)$ (MATLAB MRI data) to obtain $\mu_0(x_1, x_2, x_3)$ as per the following relationship:

    $$\mu_0 := \det(\nabla u)\,\mu_1(u).$$

    We then input the $\mu_0$ and $\mu_1$ pair into our solver to find the transformation $u$. We terminated our algorithm after 100 iterations or when the curl of the solution was 4 orders of magnitude smaller than its initial size (in the 1-norm). The algorithm was run with input sizes of 8 × 8 × 8, 16 × 16 × 16, 32 × 32 × 32 and 64 × 64 × 32. The error between the known and computed deformation fields is plotted in Fig. 4 as a function of the grid size, which clearly demonstrates quadratic convergence of our method to the true solution, as is expected from the discretization error of our numerical approximations.
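Quadratic convergence means that halving the grid spacing should reduce the error by roughly a factor of four. One common way to check this from a table of errors is to compute the observed order $p = \log_2(e_h / e_{h/2})$ between successive refinements. The sketch below shows that calculation; since the paper's error values are only given graphically in Fig. 4, the errors here come from a stand-in problem (a second-order central difference of $\sin$), and both function names are hypothetical.

```python
import math

def observed_orders(errors):
    """Observed order between successive refinements by a factor of two:
    error ~ C * h^p  implies  p = log2(e_coarse / e_fine)."""
    return [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]

def stand_in_error(n):
    """Stand-in for a solver error on a grid with n intervals: max error of the
    central-difference derivative of sin on [0, 1] with spacing h = 1/n."""
    h = 1.0 / n
    return max(abs((math.sin((i + 1) * h) - math.sin((i - 1) * h)) / (2 * h)
                   - math.cos(i * h)) for i in range(1, n))

errors = [stand_in_error(n) for n in (8, 16, 32, 64)]
orders = observed_orders(errors)   # values close to 2 indicate second order
```

Applied to measured registration errors at the grid sizes listed above, the same `observed_orders` helper would give a direct numerical confirmation of the slope seen in the plot.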

    In the second case, we register a synthetically generated 3D sphere (128 × 128 × 128) to a deformed (dented) counterpart; see Fig. 5. It can be clearly seen that our algorithm does a good job in capturing the deformation in the sphere.

    4.2. Brain sag registration

    In the third case, we registered two 3D brain MRI datasets. The first data set was pre-operative while the second data set was acquired during surgery (craniotomy and opening of the dura). Both were resampled to $256^3$ voxels and pre-processed to remove the skull. For clarity we view the 2D deformation grid overlaid on corresponding sagittal and coronal slices in Figs. 6 and 7, respectively.

    Figs. 8 and 9 show the respective deformation grids of the above examples in 3D. For each of the above examples the deformation map was computed in fewer than 20 iterations. The curl (optimality metric) was reduced to less than $10^{-3}$, indicating convergence. This is a major improvement over the previous methods (Haker et al., 2004; Angenent et al., 2003) where thousands of iterations were required for convergence. Another advantage of our method is the explicit projection onto the mass preserving constraint in each iteration, which ensures that the calculated mapping always takes us from the source image to the target image.
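Since the optimal map is the gradient of a potential, it is curl-free, which is why the size of the curl serves as an optimality measure. The sketch below computes a mean absolute curl with central differences on a uniform grid. It is one plausible way to define such a measure, assumed here for illustration; the exact normalization used in the paper is not specified in this section, and the two test maps are made up.

```python
def mean_curl(field, n):
    """Mean 1-norm of the central-difference curl of field(x, y, z) ->
    (u1, u2, u3) over interior nodes of a uniform (n+1)^3 grid on [0, 1]^3."""
    h = 1.0 / n

    def d(comp, axis, idx):
        # central difference of component `comp` along `axis` at node idx
        hi, lo = list(idx), list(idx)
        hi[axis] += 1
        lo[axis] -= 1
        up = field(*(t * h for t in hi))[comp]
        um = field(*(t * h for t in lo))[comp]
        return (up - um) / (2.0 * h)

    total, count = 0.0, 0
    for i in range(1, n):
        for j in range(1, n):
            for k in range(1, n):
                idx = (i, j, k)
                cx = d(2, 1, idx) - d(1, 2, idx)   # du3/dy - du2/dz
                cy = d(0, 2, idx) - d(2, 0, idx)   # du1/dz - du3/dx
                cz = d(1, 0, idx) - d(0, 1, idx)   # du2/dx - du1/dy
                total += abs(cx) + abs(cy) + abs(cz)
                count += 1
    return total / count

def gradient_map(x, y, z):
    # gradient of 0.5 * (x^2 + y^2 + z^2) + 0.1 * x * y * z, hence curl-free
    return (x + 0.1 * y * z, y + 0.1 * x * z, z + 0.1 * x * y)

def twisted_map(x, y, z):
    # same map plus a small rotation about the z-axis; its curl is (0, 0, 0.4)
    return (x + 0.1 * y * z - 0.2 * y, y + 0.1 * x * z + 0.2 * x, z + 0.1 * x * y)
```

Evaluating `mean_curl(gradient_map, 8)` gives a value at round-off level, while `mean_curl(twisted_map, 8)` returns the rotational part that was added. In a registration setting, a map with the second kind of residual would still transport mass correctly but would not yet be the optimal one.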

    5. Conclusions

    In this paper, we presented a computationally efficient method for 3D image registration based on the classical problem of optimal mass transportation implemented in a novel manner.

    Many times, global elastic registration methods based on principles from computational fluid dynamics of the type presented in this work are so computationally intensive that they become impractical for realistic problems in medical imaging. However, we have shown that optimal mass transport is, in fact, a viable solution for elastic registration by achieving low run times for typically sized 3D datasets on standard desktop computing platforms. In future work, we will be applying this methodology to other interesting cases as well as extending the results to 3D surfaces (for which the Monge–Kantorovich theory holds).

    Acknowledgement

    This work was supported in part by grants from NSF, AFOSR, ARO, as well as by a grant from NIH (NAC P41 RR-13218) through Brigham and Women’s Hospital. This work is part of the National Alliance for Medical Image Computing (NAMIC), funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 EB005149. Information on the National Centers for Biomedical Computing can be obtained from http://nihroadmap.nih.gov/bioinformatics. The authors were also partially funded by NSF Grants CCF-0728877, CCF-0427094 and DOE Grant DE-FG02-05ER25696. We also want to thank Florin Talos at Surgical Planning Laboratory, Department of Radiology, Brigham & Women’s Hospital, Harvard Medical School, Boston, MA for graciously providing Brain MRI data.

    References

    Ambrosio, L., 2000. Lecture notes on optimal transport problems. Lectures given at Euro Summer School, July 2000. URL http://cvgmt.sns.it/papers/amb00a/.

    Angenent, S., Haker, S., Tannenbaum, A., 2003. Minimizing flows for the Monge–Kantorovich problem. SIAM Journal of Mathematical Analysis 36, 61–97.

    Benamou, J.D., Brenier, Y., 2003. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. SIAM Journal of Mathematical Analysis 35, 61–97.

    Bolz, J., Farmer, I., Grinspun, E., Schroeder, P., 2003. Sparse matrix solvers on the GPU: conjugate gradients and multigrid. In: Proceedings of SIGGRAPH, vol. 22. pp. 917–924.

    Briggs, W., Henson, V., McCormick, S., 2000. A Multigrid Tutorial. SIAM.

    Brown, L.G., 1992. A survey of medical image registration. ACM Computing Surveys 24, 325–376.

    Crum, W., Hartkens, T., Hill, D., 2004. Non-rigid image registration: theory and practice. British Journal of Radiology 77 (Special Issue), S140–S153.

    Evans, L.C., 1989. Partial differential equations and Monge–Kantorovich transfer. Lecture notes. URL http://math.berkeley.edu/~evans/Monge–Kantorovich.survey.pdf.

    Goshtasby, A.A., 2005. 2-D and 3-D image registration: for medical, remote sensing, and industrial applications. John Wiley and Sons, Hoboken, NJ.

    Gupta, M.M., Zhang, J., 2000. High accuracy multigrid solution of the 3D convection-diffusion equation. Applied Mathematics and Computation 113, 249–274.

    Haker, S., Tannenbaum, A., Kikinis, R., 2001. Mass preserving mappings and image registration. In: MICCAI, pp. 120–127.

    Haker, S., Zhu, L., Tannenbaum, A., Angenent, S., 2004. Optimal mass transport for registration and warping. International Journal of Computer Vision 60 (3), 225–240.

    Hajnal, J.V., Hawkes, D.L.G.H., 2001. Medical Image Registration. The Biomedical Engineering Series. CRC Press, Boca Raton, FL.

    Kantorovich, L.V., 1948. On a problem of Monge. Uspekhi Matematicheskikh Nauk 3, 225–226.

    Maintz, J.A., Viergever, M.A., 1998. A survey of medical image registration. Medical Image Analysis 2, 1–57.


    Nocedal, J., Wright, S., 1999. Numerical Optimization. Springer, New York, pp. 547–548 (Chapter 18).

    Nolan, G., et al., 2003. A multigrid solver for boundary value problems using programmable graphics hardware. In: Proceedings of SIGGRAPH, pp. 102–111.

    Pharr, M., 2005. GPU Gems 2. Addison-Wesley.

    Rehman, T., Tannenbaum, A., 2007. Multigrid optimal mass transport for image registration and morphing. In: Proceedings of SPIE Conference on Computational Imaging V, p. 649810.


    Trottenberg, U., Oosterlee, C., Schüller, A., 2001a. Multigrid. Academic Press, pp. 102–106 (Chapter 4).

    Trottenberg, U., Oosterlee, C., Schüller, A., 2001b. Multigrid. Academic Press, p. 30 (Chapter 2).

    Trottenberg, U., Oosterlee, C., Schüller, A., 2001c. Multigrid. Academic Press.
