
Neuroinform (2011) 9:279–302, DOI 10.1007/s12021-011-9122-1

ORIGINAL ARTICLE

Automated Reconstruction of Dendritic and Axonal Trees by Global Optimization with Geometric Priors

Engin Türetken · Germán González · Christian Blum · Pascal Fua

Published online: 15 May 2011. © Springer Science+Business Media, LLC 2011

Abstract We present a novel probabilistic approach to fully automated delineation of tree structures in noisy 2D images and 3D image stacks. Unlike earlier methods that rely mostly on local evidence, ours builds a set of candidate trees over many different subsets of points likely to belong to the optimal tree and then chooses the best one according to a global objective function that combines image evidence with geometric priors. Since the best tree does not necessarily span all the points, the algorithm is able to eliminate false detections while retaining the correct tree topology. Manually annotated brightfield micrographs, retinal scans and the DIADEM challenge datasets are used to evaluate the performance of our method. We used the DIADEM metric to quantitatively evaluate the topological accuracy of the reconstructions and showed that the use of the geometric regularization yields a substantial improvement.

Keywords DIADEM · Tree reconstruction · Global optimization · Minimum arborescence · k-MST · Ant colony optimization

E. Türetken (B) · G. González · P. Fua
Computer Vision Laboratory, Faculté Informatique et Communications, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
e-mail: [email protected]

C. Blum
ALBCOM, Dept. Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Jordi Girona 1–3, Omega 112 Campus Nord, 08034 Barcelona, Spain

Introduction

Tree-like structures, such as dendritic, vascular, or bronchial networks, are pervasive in biological systems. With the advent of modern acquisition techniques that produce endless streams of imagery, there has been renewed interest in automated delineation to exploit this data. However, despite many years of sustained effort, automated techniques remain fragile and error-prone. In this paper, we use 3D optical micrographs of neurons and 2D retinal fundus images to demonstrate the importance of taking global tree structure and geometry into account to improve the topological accuracy of the delineations.

More specifically, we rely on a machine learning approach to assign to image voxels probabilities of belonging to the centerline of a filament. We then select evenly spaced high-probability voxels that we treat as anchor points and connect them using maximum-probability paths. This turns our set of N anchor points into a weighted graph in which we look for minimum-weight trees that span k < N of its edges, which is known as the k-Minimum Spanning Tree (k-MST) problem (Garg 1996). Finding the optimal k-MST is NP-hard, but an Ant Colony Optimization scheme we developed in earlier work (Blum and Blesa 2005) has proved effective at generating good approximations. Such an approach is in contrast to more traditional ones (Fischler and Heller 1998; Gonzalez et al. 2008) that use a minimum spanning tree to link all vertices and then prune the branches that do not conform to a shape or image appearance criterion. Such methods are faster and can eliminate spurious branches, but cannot recover from incorrect connectivity in the initial spanning tree. By contrast, when k is in the right range, the tree spanning only k edges does not suffer from this problem.


We present an automated way to estimate the optimal k value by assigning probabilistic costs to trees and selecting the one that minimizes this cost. As a result, we obtain improved reconstructions such as those depicted in the third column of Fig. 1.

As shown in the last column of Fig. 1, these results can be further improved by incorporating into the tree reconstruction algorithm a regularization prior that rewards geometric consistency between consecutive edges and nodes. Unfortunately, this improvement comes at the cost of an increased computational complexity because the algorithm has to deal with pairs of edges as opposed to single ones. Nevertheless, we were able to extend our earlier optimization algorithm (Blum and Blesa 2005) to handle the pairwise terms and compute near-optimal trees for all possible k values at little extra cost.

This results in a generic and fully automated technique. Our contribution is therefore both a global optimization approach to finding optimal trees and a practical algorithm for computing good approximations in an acceptably short time, even though the underlying problem is NP-hard.

Related Work

Most automated delineation techniques rely on a local tubularity measure that can be postulated a priori (Frangi et al. 1998; Law and Chung 2008), optimized to find specific patterns (Jacob and Unser 2004; Meijering et al. 2004), or learned (Santamaría-Pang et al. 2007; Gonzalez et al. 2009) from training data. Given an image stack, they compute a tubularity image in which this measure is computed at each voxel. The algorithms that use such tubularity scores can be roughly categorized into two classes.

The first class includes methods that use a segmentation of the tubularity image, such as thinning-based methods, which perform skeletonization of this segmentation (Weaver et al. 2004; Xu et al. 2009), and active contour-based methods, which are initialized from it (Cai et al. 2006; Vasilkoski and Stepanyants 2009). Such methods have proven effective and efficient when a very good segmentation is given or can be reliably obtained. In practice, however, such segmentations are hard to obtain. In particular, thinning-based methods often produce disconnected components and artifacts on noisy data, which then require considerable post-processing and analysis to merge into a meaningful tree.

Fig. 1 Delineation results in 3D (top row) and 2D (bottom row). a A brightfield micrograph of a neuron and a scan of a retinal blood vessel network. b Minimum spanning trees. c Reconstructions obtained without geometric regularization. d Reconstructions obtained with geometric regularization. This figure, as most others in this paper, is best viewed in color because it contains overlays.



The second class involves explicitly delineating the tree in the tubularity image. It includes tracking methods that start from a set of seed points and recursively trace high-tubularity paths (Can et al. 1999; Al-Kofahi et al. 2002; Yedidya and Hartley 2008). These techniques are computationally efficient because the tubularity measure only needs to be evaluated for a small subset of the image volume. However, due to their incremental nature, local tracing errors may result in large topological perturbations, and hence they lack robustness.

Global methods avoid this problem by exploiting more of the image evidence and optimizing a global objective function, for example by using Markov Chain Monte Carlo (MCMC) algorithms (Fan 2006; Sun et al. 2007). However, while such methods produce smooth tree components, they do not necessarily guarantee their spatial connectedness. Furthermore, they are computationally intensive, which limits their applicability to large datasets.

By contrast, methods that sample local maxima of the tubularity image and then connect these samples into a minimum spanning tree (MST) (Fischler and Heller 1998; Gonzalez et al. 2008; Xie et al. 2010) guarantee connectivity. However, they usually result in many spurious branches and gaps. While pruning during post-processing can eliminate some of the erroneous branches, it does not allow for recovery from the other mistakes. To resolve some of these mistakes, Xie et al. (2010) incorporate a gap indicator function into the edge weights. However, this approach can easily fail in the presence of noisy data where the branches appear as disconnected segments.

Furthermore, MST-based approaches usually do not take into account global tree geometry, such as smoothness along the edges or branching factors, which can play an important role in improving topological accuracy, avoiding over-fitting, and speeding up convergence. Finally, they do not explicitly account for junction points such as bifurcations and crossovers, which can easily lead to erroneous connections, as will be shown later.

We address these problems by optimizing a global objective function that explicitly models spurious branches and incorporates geometric priors that capture geometric relationships between pairs of tree vertices and edges. These priors are of the same nature as those that are used in the object detection literature (Felzenszwalb and Huttenlocher 2005; Leordeanu et al. 2007) to express dependencies between object parts.

Approach

In this section, we briefly outline our approach to delineation, which is depicted by Fig. 2. Our algorithm goes through the following steps:

1. We compute a tubularity value at each voxel, as shown in Fig. 2b. It encodes how likely the voxel is to be on the centerline of a tubular structure.

2. We select high-probability voxels that are as evenly spaced as possible and that we treat as anchor points, as shown in Fig. 2b.

3. We compute the most probable paths between pairs of nearby anchor points and assign them probabilistic costs that are lowest when all voxels along them are likely to lie in the middle of a filament. This results in the graph of Fig. 2c, where vertices are represented by the anchor points and edges by the paths linking them.

4. We compute the lowest-cost tree in this graph among those that span k of the edges, for a wide range of k < N. This is known as the k-Minimum Spanning Tree (k-MST) or k-Cardinality Tree (KCT) problem. Even though it is NP-hard, approximate solutions can nevertheless be computed efficiently (Blum and Blesa 2005).

5. We select the tree depicted by Fig. 2d that maximizes a global objective function.

Steps 4 and 5 are those that most distinguish our approach from more traditional ones that either build trees spanning all the anchor points and then attempt to eliminate spurious branches, or grow the tree incrementally at the risk of propagating errors. We avoid these problems by minimizing a well-defined global objective function, as the minimum spanning tree approaches do, but with the possibility to explore different topologies because we do not force the tree to systematically connect all vertices.

For clarity's sake, we describe our approach in terms of finding dendritic and axonal trees in 3D image stacks. Note, however, that it also applies to linear structures in regular 2D images, such as the retinal scans of Fig. 1. One only has to replace the voxels by pixels and 26-connectivity by 8-connectivity.

In the remainder of this paper, we first describe in more detail our approach to building graphs such as the one of Fig. 2c. We then formulate the objective function we use to assess the quality of a tree within this graph and show how it can be optimized to produce a tree reconstruction such as the one of Fig. 2d. Finally, we present results on real 2D and 3D images acquired using different modalities and discuss them.


Fig. 2 The tree construction process. a Maximum intensity projection of an image stack representing olfactory projection fibers. The root node of the tree is manually supplied and is depicted by the yellow sphere. b Maximum intensity projection of the tubularity values computed for individual voxels. Anchor points obtained as local maxima of this measure are overlaid in green and blue. The green dots denote bifurcation points and the blue dots denote sampled points on the fibers' centerline. c The graph obtained by linking all anchor points to their neighbors. The edges depicted in red are illustrative and do not represent the actual paths. d The k-MST that minimizes our global objective function.


Graph Construction

As discussed above, we begin by computing a graph whose nodes are anchor points that are likely to lie at the center of filaments, and whose edges represent paths connecting them. In this section, we discuss the three steps involved in building this graph. We first introduce the tubularity measure we use to assess the probability that a voxel lies on a filament. We then discuss the sampling procedure we implemented to select regularly spaced anchor points by using the voxel probabilities. Finally, we describe our approach to linking them to create the edges of the graph.

Tubularity Measure

Elongated structures such as those of Fig. 1 can be found at many different scales and their appearance is often severely affected by the point spread function of the microscope, acquisition noise, and irregularities in the staining process. As a result, they only rarely appear as well-defined tubes, especially when they are very thin.

To achieve robustness to such distortions, we rely on statistical machine learning to learn the appearance of processes given the specific acquisition modality we are dealing with. More specifically, for each voxel, we form a set of feature vectors, each of which is made up of steerable filter responses (Gonzalez et al. 2009) and Hessian eigenvalues at a particular scale and orientation. We then train a Support Vector Machine (SVM) classifier (Schoelkopf et al. 1999) with a Gaussian kernel on training data that consists of hand-supplied delineations. To this end, we collect voxels along these delineations as positive samples and randomly sample ones away from them as negative samples.


At run time, for a voxel xi, we run the SVM for several possible widths and orientations. For each pair of width w and orientation φ, the SVM returns a score f(xi, w, φ). We take our tubularity measure to be

$$ f_i = \max_{w, \phi} f(x_i, w, \phi) \qquad (1) $$

and retain the corresponding maximizing values wi and φi as local width and orientation estimates. We use a sigmoid to map this measure to the posterior probability pi of xi being on the centerline of a linear structure given the score fi. We write

$$ p_i = \frac{1}{1 + e^{-(a f_i + b)}}\,, \qquad (2) $$

where the parameters a and b are estimated by cross-validation over a validation set. Using a sigmoid to convert an SVM output into a probability is valid because it preserves the sparseness of the SVM while returning probabilities comparable to those produced by regularized likelihood kernel methods (Platt 2000).
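To make Eqs. 1 and 2 concrete, the following Python sketch (ours, not the authors' code) computes the tubularity measure and centerline probability for a single voxel. The SVM decision values for the candidate (width, orientation) pairs and the sigmoid parameters a and b are made-up placeholders; in the method itself they come from the trained SVM and from cross-validation.

    import numpy as np

    def tubularity_and_probability(scores, a, b):
        """scores: dict mapping (width, orientation) -> SVM decision value f(x_i, w, phi)."""
        (w_i, phi_i), f_i = max(scores.items(), key=lambda kv: kv[1])  # Eq. 1: max over w, phi
        p_i = 1.0 / (1.0 + np.exp(-(a * f_i + b)))                     # Eq. 2: sigmoid mapping
        return f_i, w_i, phi_i, p_i

    # Toy decision values for three (width, orientation) hypotheses at one voxel.
    scores = {(1.0, 0.0): -0.3, (2.0, 0.5): 1.2, (3.0, 1.0): 0.4}
    print(tubularity_and_probability(scores, a=1.5, b=-0.2))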

Sampling the Image

To extract a representative set of anchor points that approximates the underlying tree structure well, samples should be taken from both junction points and tubular segments of the tree. We have therefore developed a two-step approach to finding the anchor points introduced at the beginning of section "Approach". We first attempt to detect junctions such as the green dots of Fig. 2b, and then to find regularly spaced anchor points such as the blue dots away from them. In this way, if a junction is missed, it can be recovered when the algorithm tries to link blue dots. Furthermore, even if all the junctions were found, the blue dots would still be required to properly link distant ones.

Recall that we have assigned the tubularity probability pi of Eq. 2 to each voxel. It is the voxel's probability of being on the centerline of a tubular structure. To detect junctions, we use a 0.5 threshold to binarize the resulting probability image and compute its skeleton (Lee et al. 1994). We label as potential junctions voxels from which more than two skeletal branches emanate. Note that this can result either from a legitimate bifurcation or from a crossover, which can occur whenever two independent filaments approach each other at a distance smaller than the spatial resolution of the microscope. Because we cannot know at this stage whether a detected point is a bifurcation or a crossover, we generate two co-located anchor points for each such point to avoid problems such as those depicted by Fig. 3 at tree reconstruction time.
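The following Python sketch illustrates this junction detection step on a 2D toy example (it is not the authors' implementation, which relies on the 3D thinning algorithm of Lee et al. 1994): the probability map is thresholded at 0.5, skeletonized, and skeleton pixels with more than two skeleton neighbors are labeled as potential junctions and duplicated.

    import numpy as np
    from scipy import ndimage
    from skimage.morphology import skeletonize

    def detect_junctions(prob_map, threshold=0.5):
        skel = skeletonize(prob_map > threshold)
        # Count 8-connected skeleton neighbors of every skeleton pixel.
        kernel = np.ones((3, 3)); kernel[1, 1] = 0
        n_neighbors = ndimage.convolve(skel.astype(int), kernel, mode="constant")
        junctions = np.argwhere(skel & (n_neighbors > 2))
        # Duplicate each junction so that it can later serve either a bifurcation
        # or a crossover (two co-located anchor points, as in Fig. 3).
        return [tuple(p) for p in junctions for _ in range(2)]

    # Toy probability map containing a small "T"-shaped filament.
    pm = np.zeros((9, 9)); pm[4, 1:8] = 0.9; pm[1:5, 4] = 0.9
    print(detect_junctions(pm))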

Fig. 3 Two cases that can lead to reconstruction errors. a If there is no sample point at bifurcation D, the tree reconstruction algorithm is likely to incorporate both the AB and AC paths into the final tree, which will result in counting twice the pixels in AD when scoring it. This is avoided by introducing an anchor point at D. b At a crossover, an anchor point can be used to build either the horizontal AXB branch or the vertical YXZ one, but not both. This is why we choose to create two anchor points at all potential crossovers.

This being done, we remove from further consideration the spatial neighborhoods of the detected junction points and sort the remaining voxels according to their tubularity probability. We then iteratively select the most probable one and eliminate all those within its neighborhood, until the whole image has been explored. The neighborhood of voxel xi is taken to be a cylindrical box whose axis is aligned with the orientation estimate φi. While the cylinder height is fixed to favor even sampling along filaments, the radius is taken to be linearly proportional to the width estimate wi.

This results in a set of anchor points that are relatively regularly spaced along filaments. Inevitably, some of these points are false positives that do not lie on filaments and will have to be ignored when building the final tree.
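A minimal sketch of this greedy sampling procedure is given below (our own simplification, not the authors' code): candidate voxels are visited in decreasing order of probability, and each selected voxel suppresses the candidates that fall inside an oriented cylinder of fixed height and width-dependent radius. The height and radius factor used here are arbitrary placeholders.

    import numpy as np

    def sample_anchor_points(coords, probs, orients, widths, height=10.0, radius_factor=1.5):
        order = np.argsort(-probs)
        coords = np.asarray(coords, dtype=float)
        selected, suppressed = [], np.zeros(len(probs), dtype=bool)
        for i in order:
            if suppressed[i]:
                continue
            selected.append(i)
            axis = orients[i] / np.linalg.norm(orients[i])
            d = coords - coords[i]
            along = d @ axis                                             # distance along the cylinder axis
            across = np.linalg.norm(d - np.outer(along, axis), axis=1)   # distance to the axis
            inside = (np.abs(along) <= height / 2) & (across <= radius_factor * widths[i])
            suppressed |= inside
        return selected

    # Toy example: five candidate voxels along a line.
    coords = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (11, 0, 0), (12, 0, 0)]
    probs = np.array([0.9, 0.8, 0.7, 0.95, 0.6])
    orients = np.array([[1.0, 0, 0]] * 5)
    widths = np.array([1.0] * 5)
    print(sample_anchor_points(coords, probs, orients, widths))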

Linking the Anchor Points

Let V be the set of anchor points, which is a subset of V^I = {xi}, the set of all voxels in the image volume. We represent this volume by a directed graph G^I = (V^I, E^I) whose vertex set is V^I and whose edges E^I = {e^I_ij} connect each voxel to its 26 neighbors. Hereafter, G^I will be referred to as the image graph.

We construct a reduced graph G = (V, E) of G^I over the set of anchor points V by linking all pairs of them, except the co-located ones, that are within a specified distance of each other. The edge set E corresponds to paths formed by successive edges of E^I connecting nearby anchor points. In the remainder of the paper, we will use the terms voxels and vertices of G, as well as edges and paths, interchangeably.

Formally, the path emn ∈ E linking anchor points xm, xn ∈ V can be represented by the set of edges {e^I_mi, e^I_ij, ..., e^I_kn} it includes. In practice, we choose these edges by running Dijkstra's algorithm to minimize

$$ d_{mn} = \sum_{e^I_{ij} \in e_{mn}} -\log p_{ij}\,, \qquad (3) $$

where pij is an image-based probability that edge e^I_ij belongs to the centerline of a filament. In other words, we take emn to be the maximum-likelihood path based on image evidence only, assuming that, in the absence of any such evidence, all paths are a priori equally likely.

We derive pij from the pi values of Eq. 2 by considering the dmn cost of Eq. 3. We require that dmn should be roughly equal to the integral of −log(p) along the corresponding continuous path. In other words, we have

$$ d_{mn} \approx \int -\log p(s)\, ds \qquad (4) $$
$$ \approx \sum_{e^I_{ij} \in e_{mn}} \int_0^{l_{ij}} -\log p\!\left(\frac{l_{ij} - s}{l_{ij}}\, x_i + \frac{s}{l_{ij}}\, x_j\right) ds\,, $$

where s represents the curvilinear abscissa along the path, p(s) the probability that the point at abscissa s is on a centerline, and lij the distance between neighboring points xi and xj. Since we work on a voxel grid, to compute the integral of Eq. 4, we only have the values pi and pj of p((1 − s/lij) xi + (s/lij) xj) for s = 0 and s = lij respectively. Assuming that p varies linearly between xi and xj, we write

$$ d_{mn} \approx \sum_{e^I_{ij} \in e_{mn}} \int_0^{l_{ij}} -\log\!\left(\frac{l_{ij} - s}{l_{ij}}\, p_i + \frac{s}{l_{ij}}\, p_j\right) ds $$
$$ \approx \sum_{e^I_{ij} \in e_{mn}} -\, l_{ij}\, \frac{p_i(\log(p_i) - 1) + p_j(1 - \log(p_j))}{p_i - p_j}\,. \qquad (5) $$

In practice, to avoid divisions by zero, we therefore take pij to be equal to pi^lij if |pi − pj| ≤ ε, and so that

$$ \log(p_{ij}) = l_{ij}\, \frac{p_i(\log(p_i) - 1) + p_j(1 - \log(p_j))}{p_i - p_j}\,, \qquad (6) $$

otherwise. Note that this is consistent because, when pj − pi tends towards zero, log(pij) defined in this manner tends towards lij log(pi) = lij log(pj).
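In code, the per-edge probability of Eq. 6 and the path cost of Eq. 3 can be evaluated as in the following Python sketch (ours, for illustration only); the edge endpoint probabilities and lengths in the toy example are arbitrary.

    import numpy as np

    def log_p_edge(p_i, p_j, l_ij, eps=1e-6):
        if abs(p_i - p_j) <= eps:
            return l_ij * np.log(p_i)                               # limit case: p_ij = p_i ** l_ij
        num = p_i * (np.log(p_i) - 1.0) + p_j * (1.0 - np.log(p_j))
        return l_ij * num / (p_i - p_j)                             # Eq. 6

    def path_cost(path_probs, path_lengths):
        """d_mn = sum over image-graph edges of -log p_ij (Eq. 3)."""
        return sum(-log_p_edge(p_i, p_j, l)
                   for (p_i, p_j), l in zip(path_probs, path_lengths))

    # Toy path of three image-graph edges (probabilities at the edge endpoints, edge lengths).
    probs = [(0.9, 0.8), (0.8, 0.8), (0.8, 0.6)]
    lengths = [1.0, 1.0, 1.41]
    print(path_cost(probs, lengths))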

Tree Reconstruction

Given the graph G = (V, E) introduced in the previous section, we take trees to be directed subgraphs of G with a distinguished vertex, called the root, with in-degree 0, and such that there is a unique directed path from it to every other vertex in the tree.

Let T(G) be the set of all such trees in G. Our task now is to find the tree t* ∈ T(G) that best represents the underlying tree structure. To this end, we first introduce a Bayesian framework that lets us define an objective function to assess the quality of a tree t ∈ T(G), using both image- and geometry-based evidence. Then, since T(G) is exceedingly large, we introduce two different algorithms designed to find acceptable approximations of the true optimum sufficiently fast to be of practical use.

Bayesian Formulation

A subgraph t of G can be represented by a set of indicator variables {tmn}, one for each edge of G, such that

$$ \forall e_{mn} \in E,\quad t_{mn} = \begin{cases} 1, & \text{if } e_{mn} \in t \\ 0, & \text{otherwise.} \end{cases} \qquad (7) $$

Let T = {Tmn} be the set of binary random variables such that Tmn stands for edge emn truly being on the centerline of a tree structure. Similarly, let T^I_ij stand for the random variable representing the presence or absence of a centerline along the edge e^I_ij of the image graph.

Recall from section "Tubularity Measure" that we associate to each voxel xi the tubularity measure fi of Eq. 1, which can be used to evaluate how likely xi is to be on the centerline of a tubular structure. We also estimate the width wi and the orientation φi of the tube, assuming there is one. Let f denote the set of tubularity measures and Θ the set of all widths and orientations.

We take the optimal subgraph t* to be the tree whose likelihood is greatest, given the image features f and the orientation and width estimates Θ. This can be written as

$$ t^* = \operatorname*{argmax}_{t \in \mathcal{T}(G)} P(T = t \mid \mathbf{f}, \Theta, \Lambda) \qquad (8) $$
$$ = \operatorname*{argmax}_{t \in \mathcal{T}(G)} P(\mathbf{f} \mid T = t)\, P(T = t \mid \Theta, \Lambda) \qquad (9) $$
$$ = \operatorname*{argmin}_{t \in \mathcal{T}(G)} -\log P(\mathbf{f} \mid T = t) - \log P(T = t \mid \Theta, \Lambda)\,, \qquad (10) $$

where Λ denotes a set of learned meta-parameters encoding prior knowledge about plausible tree shapes. Equation 9 follows from Eq. 8 because the tubularity measures f of Eq. 1 only denote the presence or the absence of centerlines and are therefore conditionally independent of the widths and orientations Θ, given the presence or absence of a tubular structure. We now turn to defining more precisely the two terms of Eq. 10, which we will refer to as the Image-Based and the Geometry-Based terms respectively.

Image-Based Term

P(f|T = t) represents the probability of observing the image features we extract, knowing where the tree is. We assume conditional independence of these features along neighboring edges, given that we actually know whether these edges belong to the tree or not, and hence represent the likelihood as a product of individual edge likelihoods. Since our eventual goal is to maximize the first term of Eq. 9 with respect to the tmn indicator variables of Eq. 7, in the following derivation we drop all terms that are independent of the tmn and therefore have no bearing on the outcome of the computation.

Recall from section "Linking the Anchor Points" that each edge emn ∈ E is composed of a set of edges {e^I_ij} ⊂ E^I belonging to the image graph and linking anchor points xm, xn ∈ V. We therefore write

$$ P(\mathbf{f} \mid T = t) = \prod_{e_{mn} \in E} P(\mathbf{f}_{mn} \mid T_{mn} = t_{mn}) \qquad (11) $$
$$ = \prod_{e_{mn} \in E} P(\mathbf{f}_{mn} \mid T_{mn} = 1)^{t_{mn}} \times P(\mathbf{f}_{mn} \mid T_{mn} = 0)^{(1 - t_{mn})} \qquad (12) $$
$$ \propto \prod_{e_{mn} \in E} \left[ \frac{P(\mathbf{f}_{mn} \mid T_{mn} = 1)}{P(\mathbf{f}_{mn} \mid T_{mn} = 0)} \right]^{t_{mn}} \qquad (13) $$
$$ = \prod_{e_{mn} \in E} \left[ \prod_{e^I_{ij} \in e_{mn}} \frac{P(f_{ij} \mid T^I_{ij} = 1)}{P(f_{ij} \mid T^I_{ij} = 0)} \right]^{t_{mn}} \qquad (14) $$
$$ = \prod_{e_{mn} \in E} \left[ \prod_{e^I_{ij} \in e_{mn}} \frac{P(T^I_{ij} = 1 \mid f_{ij})\, P(T^I_{ij} = 0)}{P(T^I_{ij} = 0 \mid f_{ij})\, P(T^I_{ij} = 1)} \right]^{t_{mn}} \qquad (15) $$
$$ \propto \prod_{e_{mn} \in E} \left[ \prod_{e^I_{ij} \in e_{mn}} \frac{P(T^I_{ij} = 1 \mid f_{ij})}{P(T^I_{ij} = 0 \mid f_{ij})} \right]^{t_{mn}} \qquad (16) $$
$$ = \prod_{e_{mn} \in E} \left[ \prod_{e^I_{ij} \in e_{mn}} \frac{p_{ij}}{1 - p_{ij}} \right]^{t_{mn}}\,, \qquad (17) $$

where fmn is the set of tubularity values along the path emn and fij stands for the tubularity values fi and fj of the vertices of the edge e^I_ij ∈ E^I of the image graph.

Equations 12 and 13 are obtained, respectively, by a simple algebraic manipulation and by dropping a constant term. In Eq. 14, we use the conditional independence assumption of the tubularity measures along neighboring edges of the image graph given the true state of the edges. In Eqs. 15 and 16, we use Bayes' rule and assume that edges are a priori equally likely to belong to a tree, which lets us also drop the P(T^I_ij = 0) and P(T^I_ij = 1) terms. Finally, in Eq. 17, we replaced the probability P(T^I_ij = 1|fij), the likelihood that the edge e^I_ij is on a filament centerline, by its short notation pij introduced in Eq. 3 and computed according to the formula given in Eq. 6. Likewise, we replaced P(T^I_ij = 0|fij) by 1 − pij.

For a tree t in G, the image-based term of Eq. 17 can therefore be written as

$$ F_i(t) = -\log P(\mathbf{f} \mid T = t) = \sum_{e_{mn} \in E} c_{mn}\, t_{mn}\,, \qquad (18) $$

where

$$ c_{mn} = \sum_{e^I_{ij} \in e_{mn}} -\log \frac{p_{ij}}{1 - p_{ij}}\,. $$

In essence, cmn can be considered as the cost of edge emn, and our algorithm will try to minimize the sum of these costs over the whole tree, while also enforcing the geometric constraints discussed in the following section.
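A small Python sketch of the edge cost cmn and of the image-based score Fi(t) follows (ours, not the authors' code); the edge probabilities used in the toy example are placeholders.

    import numpy as np

    def edge_cost(edge_probs):
        """c_mn = sum over image-graph edges of -log(p_ij / (1 - p_ij))."""
        return sum(-np.log(p / (1.0 - p)) for p in edge_probs)

    def image_term(tree_edges, costs):
        """F_i(t) = sum of c_mn over the edges included in the tree (Eq. 18)."""
        return sum(costs[e] for e in tree_edges)

    # Toy example: two candidate edges with their per-image-edge probabilities p_ij.
    costs = {("m", "n"): edge_cost([0.9, 0.85, 0.8]), ("n", "o"): edge_cost([0.55, 0.5])}
    print(image_term([("m", "n"), ("n", "o")], costs))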

Introducing the Geometry-Based Term

The simplest possible approach to estimating the geometry-based term, −log P(T = t|Θ, Λ), is to assume that all trees have the same prior probability and that subgraphs that are not trees have probability zero, which makes it constant for all choices of tmn values that result in a tree. Under this assumption, the optimal tree t* of Eq. 10 can be obtained by finding the set of indicator variables {t*mn} such that the corresponding subgraph is a tree and the image-based linear objective function of Eq. 18 is minimized.

However, as shown in Fig. 1, this can result in erroneous topologies in ambiguous cases, mostly because the image data is very noisy. To remedy this, we incorporate geometric priors into our objective function to penalize trees whose geometric properties make them unlikely candidates.

More specifically, we model the prior term as a tree-structured Bayesian network that captures geometric relationships between consecutive edge and vertex pairs. In this work, we assume that the root vertex xr ∈ V of the tree structure is either given or can be reliably estimated. Let Er ⊂ E be the set of edges emanating from xr. We take the prior probability P(T = t|Θ, Λ) to be

$$ \prod_{e_{ri} \in E_r} P(T_{ri} = 1 \mid \Theta_r, \Theta_i, \Lambda)^{t_{ri}} \;\times \prod_{e_{om} \in E,\; e_{mn} \in E \setminus E_r} P(T_{mn} = 1 \mid T_{om} = 1, \Theta_{omn}, \Lambda)^{t_{mn} t_{om}}\,, $$

where Θi denotes the width and orientation estimates for vertex xi, and Θomn denotes the width and orientation estimates for vertex triplets. Under this model, the geometry-based term becomes the quadratic function of the indicator variables

$$ -\log P(T = t \mid \Theta, \Lambda) = \sum_{e_{ri} \in E_r} b_{ri}\, t_{ri} \;+ \sum_{e_{om} \in E,\; e_{mn} \in E \setminus E_r} a_{omn}\, t_{mn} t_{om}\,, \qquad (19) $$

where

$$ b_{ri} = -\log P(T_{ri} = 1 \mid \Theta_r, \Theta_i, \Lambda)\,, \qquad (20) $$
$$ a_{omn} = -\log P(T_{mn} = 1 \mid T_{om} = 1, \Theta_{omn}, \Lambda)\,. $$

These terms encode the fact that successive edges and vertices must have consistent widths and orientations. Specific choices for modeling them will be given in the next section.

The objective function Fg(t) that must be minimized to find the optimal tree t* of Eq. 10 can now be obtained by adding the geometry-based term of Eq. 19 to the image-based one of Eq. 18. This yields

$$ F_g(t) = \sum_{e_{mn} \in E} c_{mn}\, t_{mn} + \sum_{e_{ri} \in E_r} b_{ri}\, t_{ri} + \sum_{e_{om} \in E,\; e_{mn} \in E \setminus E_r} a_{omn}\, t_{mn} t_{om} \qquad (21) $$
$$ = \sum_{e_{ri} \in E_r} (c_{ri} + b_{ri})\, t_{ri} + \sum_{e_{om} \in E,\; e_{mn} \in E \setminus E_r} (c_{mn} + a_{omn})\, t_{mn} t_{om}\,. \qquad (22) $$

Equation 22 follows from Eq. 21 by rewriting the image-based term as a sum of unary and binary terms involving the indicator variables, under the assumption that t is a tree and thus has no cycles, and then grouping them with those that appear in the geometry-based term. Note that the unary term represents the total cost of the edges emanating from the root vertex and the binary term represents the cost of all edge pairs in t. Note also that the cmn and bri terms are edge weights while the aomn terms are pairwise edge weights.
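For illustration, the sketch below (ours, not the authors' implementation) evaluates Fg(t) as in Eq. 22 for a tree represented by a child-to-parent map, given hypothetical unary costs for root edges and pairwise costs for consecutive edge pairs.

    def objective_Fg(parent, root, c, b, a):
        """parent: dict child -> parent; c[(u, v)]: image cost of edge (u, v);
        b[(root, i)]: root-edge prior; a[(o, m, n)]: pairwise prior for edges (o, m), (m, n)."""
        total = 0.0
        for child, par in parent.items():
            if par == root:                       # unary term: edges leaving the root
                total += c[(par, child)] + b[(par, child)]
            else:                                 # quadratic term: consecutive edge pair
                grand = parent[par]
                total += c[(par, child)] + a[(grand, par, child)]
        return total

    # Toy tree r -> m -> n with illustrative costs.
    parent = {"m": "r", "n": "m"}
    c = {("r", "m"): 2.0, ("m", "n"): 1.5}
    b = {("r", "m"): 0.3}
    a = {("r", "m", "n"): 0.4}
    print(objective_Fg(parent, "r", c, b, a))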

The above optimization problem generalizes the Minimum Arborescence Problem (Duhamel et al. 2008) with a quadratic cost and additional constraints, and is NP-hard. The presence of the quadratic terms makes the minimization of Fg(t) more difficult than that of the linear objective function Fi(t) of Eq. 18, which only involves unary terms. While it turned out to be possible to optimize the latter by directly using our earlier algorithm (Blum and Blesa 2005), we had to extend it substantially to handle the former. This will be discussed in more detail in section "Optimization".

Even though the above formulation assumes a single tree with a well-defined root, it is easily extended to the multiple-tree reconstruction case. This is done by adding a virtual vertex to the graph and connecting it to the individual root vertices of the trees to be reconstructed by minimal-cost outgoing edges. Likewise, the pairwise costs of edge pairs directly connected to the root vertices are set to minimal values. This transforms the multiple-tree reconstruction problem into a single-tree reconstruction one with no loss of generality.
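The virtual-root construction can be sketched as follows (illustrative Python with hypothetical edge costs); in practice the pairwise costs of edge pairs touching the individual roots would also be set to minimal values.

    def add_virtual_root(edge_costs, roots, virtual="VROOT", cost=0.0):
        """edge_costs: dict (u, v) -> cost. Returns a new dict augmented with virtual-root edges."""
        augmented = dict(edge_costs)
        for r in roots:
            augmented[(virtual, r)] = cost    # minimal-cost outgoing edge to each individual root
        return augmented

    print(add_virtual_root({("a", "b"): 1.2}, roots=["a", "c"]))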

For multiple trees, as in the single-tree case, we assume that the root vertex of each tree is either given a priori or can be reliably estimated. This is required because the global objective function of Eq. 22 does not score one single tree at a time, as is done in most tracking-based methods. Instead, the tree reconstruction algorithm discussed below, which is designed to minimize it, seeks to explain the whole image volume at once.

Modeling the Geometry-Based Term

In this work, we exploit four geometric properties to capture the underlying relations between parts of the tree structure, as illustrated by Fig. 4. They are:

(a) Edge Direction Similarity. To obtain smooth reconstructions, we encode the direction similarity of pairs of consecutive edges. The angular difference between the directions is modeled by a von Mises distribution, that is, a circular normal distribution of mean μe and concentration ke.

(b) Width Consistency. We model the width differences at pairs of consecutive vertices by an asymmetric Gaussian distribution of mean μw and variances σ²wl and σ²wr. This model accommodates the fact that the width may tend to decrease with distance from the root in some datasets but not in others.

(c) Orientation Consistency. We measure the angular deviation of the estimated orientations of two consecutive vertices from the direction of the line between them. This deviation is again modeled by a von Mises distribution of mean μφ and concentration kφ.


Fig. 4 The four components of the geometric regularization term. The green circles are the vertices of the reduced graph and the solid lines illustrate the edges between them. a Edge direction similarity. b Width consistency. c Orientation consistency. d Tortuosity.


(d) Tortuosity. We use the ratio of the path length to the linear distance between the endpoints as a measure of tortuosity, which we then represent by a Gaussian distribution of mean μtor and variance σ²tor.

Given these geometric terms, the vector Λ of meta-parameters introduced in Eq. 10 is composed of the means, variances, and concentrations of these four distributions, which we estimate using a maximum likelihood approach on a training data set. We therefore write the geometry-based probability of a pair of consecutive edges eom ∈ E and emn ∈ E belonging to the tree as

$$ P(T_{mn} = 1 \mid T_{om} = 1, \Theta_{omn}, \Lambda) = \mathcal{M}(\phi_{om} - \phi_{mn}, \mu_e, k_e) \times \mathcal{AN}(w_m - w_n, \mu_w, \sigma^2_{wl}, \sigma^2_{wr}) \times \mathcal{M}(\phi_m - \phi_{mn}, \mu_\phi, k_\phi) \times \mathcal{M}(\phi_n - \phi_{mn}, \mu_\phi, k_\phi) \times \mathcal{N}(\mathrm{tor}(e_{mn}), \mu_{tor}, \sigma^2_{tor})\,, \qquad (23) $$

where M denotes the von Mises distribution, AN the asymmetric Gaussian and N the Gaussian introduced above, φom denotes the direction angle of the line between vertices xo and xm, and tor(emn) denotes the tortuosity value assigned to the path corresponding to edge emn. The prior probability P(Tri = 1|Θr, Θi, Λ) of Eq. 20 for edges emanating from the root vertex includes the same set of geometric terms except the edge direction similarity term.
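The following Python sketch (ours, not the authors' code) evaluates the pairwise prior of Eq. 23 from the four distributions; taking its negative logarithm yields the aomn term of Eq. 20. All parameter values are illustrative placeholders rather than the maximum likelihood estimates used in the paper.

    import numpy as np
    from scipy.stats import vonmises, norm

    def asym_gauss(x, mu, sigma_l, sigma_r):
        # Two-piece (asymmetric) Gaussian density.
        s = sigma_l if x < mu else sigma_r
        norm_const = 2.0 / (np.sqrt(2.0 * np.pi) * (sigma_l + sigma_r))
        return norm_const * np.exp(-0.5 * ((x - mu) / s) ** 2)

    def pairwise_prior(phi_om, phi_mn, phi_m, phi_n, w_m, w_n, tortuosity, p):
        return (vonmises.pdf(phi_om - phi_mn, p["k_e"], loc=p["mu_e"])        # edge direction similarity
                * asym_gauss(w_m - w_n, p["mu_w"], p["s_wl"], p["s_wr"])      # width consistency
                * vonmises.pdf(phi_m - phi_mn, p["k_phi"], loc=p["mu_phi"])   # orientation consistency (vertex m)
                * vonmises.pdf(phi_n - phi_mn, p["k_phi"], loc=p["mu_phi"])   # orientation consistency (vertex n)
                * norm.pdf(tortuosity, p["mu_tor"], p["s_tor"]))              # tortuosity

    params = dict(mu_e=0.0, k_e=4.0, mu_w=0.0, s_wl=1.0, s_wr=0.5,
                  mu_phi=0.0, k_phi=4.0, mu_tor=1.1, s_tor=0.2)
    print(pairwise_prior(0.1, 0.0, 0.05, -0.05, 2.0, 1.8, 1.15, params))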

Optimization

Let us assume that the graph G has N vertices. In the first part of this section, we present a simple approach that aims at building the best possible tree, tk, among all those that span exactly k edges. We run it for all k from 1 to N−1 and pick the tk that yields the best overall score. This is a simple but practical way to optimize the criterion of Eq. 18 because, for each k, we can take advantage of our earlier k-MST algorithm (Blum and Blesa 2005) to build a near-optimal k-cardinality tree. However, this algorithm was not designed to handle pairwise edge relationships, such as those that appear in the objective function of Eq. 22. Furthermore, this approach is computationally inefficient because it involves restarting the whole computation from scratch for each successive cardinality. In the second part of this section, we therefore extend the original k-MST algorithm both to handle geometric relationships between edges and to compute not one single tree but many, one for each cardinality, at once.

Optimizing the Image-Based Term

To minimize the image-based objective function Fi of Eq. 18, we begin by using the k-MST algorithm (Blum and Blesa 2005) for each value of k (0 < k < N) to build the k-cardinality tree tk that minimizes

$$ \sum_{e_{mn} \in E} d_{mn}\, t_{mn}\,, \qquad (24) $$

where the dmn are the sums of negative log likelihoods of Eq. 3. We then simply select the k and corresponding tk that yield the lowest value of Fi(tk).
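This cardinality sweep can be sketched as follows (illustrative Python; the k-MST solver and the scoring function are placeholders standing in for the algorithm of Blum and Blesa 2005 and for Eq. 18).

    def best_tree_over_cardinalities(n_vertices, solve_k_mst, score_Fi):
        """solve_k_mst(k) -> edges of a near-optimal k-cardinality tree (Eq. 24 criterion);
        score_Fi(tree)  -> value of the image-based objective of Eq. 18."""
        best_tree, best_score = None, float("inf")
        for k in range(1, n_vertices):
            tree = solve_k_mst(k)
            score = score_Fi(tree)
            if score < best_score:
                best_tree, best_score = tree, score
        return best_tree, best_score

    # Toy stand-ins: a "solver" that returns the k cheapest edges of a small graph and a
    # scorer that sums per-edge log-likelihood-ratio costs (negative = supported by image).
    edges = [("a", "b", -2.0), ("b", "c", -1.0), ("c", "d", 0.8)]
    solve = lambda k: edges[:k]
    score = lambda t: sum(c for *_, c in t)
    print(best_tree_over_cardinalities(4, solve, score))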

Note that it might have seemed more logical to use the sums of log-likelihood ratios cmn of Eq. 18 in Eq. 24, so that the k-MST algorithm could directly build k-cardinality trees that are optimal for the true objective function we seek to minimize. However, consider that the k-MST algorithm makes local decisions to build candidate trees, which involves selecting a new edge at each iteration to decrease the overall tree cost. If the edges were weighted using the log-likelihood ratios of Eq. 18, the preferred edge would always be the one accounting for the greatest amount of image evidence, thus ignoring the fact that there are more edges to be added. This would bias the optimization towards long edges and would produce undesirable effects, such as those depicted by Fig. 5a. The same argument is made very convincingly, albeit in a different context, in Section 5 of Felzenszwalb and McAllester (2006). In contrast, using the log likelihoods of Eq. 24 as edge weights results in the tree with the highest probability density being built. Since we assume that there exists only one tree per image, once the tree tk is constructed, we consider all remaining edges as background and assign to the tree the score of Eq. 18.

The k-MST algorithm (Blum and Blesa 2005) relies on an ant colony optimization (ACO) scheme (Dorigo and Stützle 2004) inspired by the foraging behavior of real ants. While walking from food sources to the nest and vice versa, ants deposit chemicals known as pheromones on the ground. Paths marked by strong pheromone concentrations are more likely to be chosen when deciding in what direction to go. This group behavior is the basis for a cooperative interaction which lets the ants find shortest paths between their nest and food sources. In ACO algorithms, an artificial ant incrementally constructs a complete solution by iteratively adding appropriately chosen components to a partial one. The solution components to be added are chosen probabilistically according to a parameterized probabilistic model, the so-called pheromone model P, which is a finite set of numerical values. Each pheromone value τi ∈ P is associated to an element from a set of potential solution components. The pheromone model is used to probabilistically generate complete solutions by assembling them from a set of solution components. In general, an ACO algorithm repeatedly goes through the two following steps:

1. Candidate solutions are constructed using a pheromone model, that is, a parameterized probability distribution over the solution space;
2. The candidate solutions are used to modify the pheromone values so as to bias future sampling toward high-quality solutions.

Fig. 5 Different approaches to linking a set of anchor points. a Minimizing the log-likelihood ratio cost function of Eq. 18 using the ACO algorithm of section "Optimizing the Image-Based Term" favors long paths between anchor points, which results in zig-zags and some pixels being used twice. b Replacing the log-likelihood ratios by the log likelihoods of Eq. 24 removes the zig-zags but does not prevent double-counting, resulting in the linking of a false positive and a spurious branch. c Introducing the geometric priors to the log-likelihood ratios of Eq. 25 also removes the zig-zags but still does not prevent double-counting and the creation of a different spurious branch. d Using the log likelihoods and the geometric priors as in Eq. 26 produces the right answer.



The second step, often referred to as the pheromone update, aims at focusing the search towards promising parts of the search space and is a critical component of all ACO algorithms. It implicitly assumes that good solutions consist of good solution components.

More specifically, the ACO algorithm for the k-MST problem in undirected graphs that we use here works roughly as follows. At each iteration, a number na of l-cardinality trees, where l > k, are probabilistically constructed based on the pheromone model and edge weights. The pheromone model consists of a pheromone value τe for each edge e of the given undirected graph. Then, a dynamic programming algorithm (Blum 2007) is used to extract from each of these l-cardinality trees the best k-cardinality tree they contain. The last step of each iteration consists in the pheromone update, which, depending on the so-called convergence factor, uses a weighted average of good solutions found previously. When reaching the imposed computation time limit, the algorithm returns the best solution it has found up to this point.
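The generic two-step ACO loop described above can be sketched as follows (a simplified illustration, not the algorithm of Blum and Blesa 2005, which additionally uses a convergence factor and extracts k-cardinality subtrees by dynamic programming).

    import random

    def aco(components, construct, evaluate, n_ants=10, n_iters=50, rho=0.1):
        pheromone = {c: 1.0 for c in components}        # the pheromone model P
        best, best_cost = None, float("inf")
        for _ in range(n_iters):
            # Step 1: probabilistically construct candidate solutions from the pheromones.
            ants = [construct(pheromone) for _ in range(n_ants)]
            for sol in ants:
                cost = evaluate(sol)
                if cost < best_cost:
                    best, best_cost = sol, cost
            # Step 2: pheromone update, biasing future sampling towards good solutions.
            for c in pheromone:
                pheromone[c] *= (1.0 - rho)              # evaporation
                if c in best:
                    pheromone[c] += rho                  # reinforce the best solution found
        return best, best_cost

    # Toy problem: pick 2 of 4 components so that their total weight is smallest.
    weights = {"e1": 3.0, "e2": 1.0, "e3": 2.5, "e4": 0.5}
    construct = lambda ph: set(random.choices(list(ph), weights=list(ph.values()), k=2))
    evaluate = lambda sol: sum(weights[c] for c in sol)
    print(aco(list(weights), construct, evaluate))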

Accounting for the Pairwise Terms

The approach discussed above cannot be used to optimize the full objective function of Eq. 22 because our original k-MST algorithm cannot handle quadratic terms. Furthermore, it is computationally inefficient because it involves running the algorithm many times over, once for each cardinality. We have therefore extended the k-MST algorithm in the following ways:

1. Since the root node is given and the pairwise terms aomn of Eq. 20 are asymmetric, we replace the undirected graph model used by the original k-MST algorithm by a directed one and explicitly introduce the root vertex xr.

2. We take into account the pairwise terms introduced in section "Introducing the Geometry-Based Term" when computing effective edge weights.

3. To improve the convergence of the algorithm, pheromone values are assigned to pairs of consecutive edges instead of to individual ones. In other words, the set P contains a pheromone value for each pair of consecutive edges.

4. We introduce an additional tree neighborhood structure and search over this neighborhood to avoid incorrect connections at crossovers and to improve convergence.

5. A limit on the branching factor is imposed on the reconstructions.

In our implementation, not only is the quality of the trees subject to optimization, but also their cardinality, which is not given beforehand. This is achieved through an optimization scheme that iterates two complementary steps, one for tree construction and the other for cardinality selection. Recall from section "Introducing the Geometry-Based Term" that we seek to minimize the objective function

$$ F_g(t) = \sum_{e_{ri} \in E_r} (c_{ri} + b_{ri})\, t_{ri} + \sum_{e_{om} \in E,\; e_{mn} \in E \setminus E_r} (c_{mn} + a_{omn})\, t_{mn} t_{om}\,, \qquad (25) $$

which we will refer to as the primary objective function. We also introduce an auxiliary objective function

$$ F'_g(t) = \sum_{e_{ri} \in E_r} (d_{ri} + b_{ri})\, t_{ri} + \sum_{e_{om} \in E,\; e_{mn} \in E \setminus E_r} (d_{mn} + a_{omn})\, t_{mn} t_{om}\,, \qquad (26) $$

where we replace the log-likelihood ratios cmn of Eq. 18 by the log likelihoods dmn of Eq. 3.

In our optimization procedure, we use the dmn costs in the auxiliary objective function as a search heuristic to construct the trees and the cmn ones in the primary objective function to score them. We do this for the same reason as we did in the simpler version of the algorithm outlined in section "Optimizing the Image-Based Term". This remains necessary because, even though introducing the geometric priors tends to fix the zigzagging behavior depicted by Fig. 5a, as shown in Fig. 5c, it does not prevent some pixels from being used twice, resulting in spurious branches such as the one depicted by Fig. 5c. This behavior results from the fact that when we compute the edges of the graph, that is, the paths linking the anchor points, nothing prevents the same voxels from being used in more than one path. Preventing this would require adding many additional anchor points to guarantee that there is one for every potential junction and would be very computationally expensive. Furthermore, this would result in irregularly spaced anchor points and paths of arbitrary length, which would give additional and unwarranted influence to path length when selecting edges.

The first reconstruction step of the optimization involves building a number na of trees using an ACO scheme, similar to the one discussed in section "Optimizing the Image-Based Term", that adds edges so as to probabilistically minimize the auxiliary objective function with the quadratic term. For each one of these trees, the algorithm then uses the same dynamic programming technique (Blum 2007) as before to extract the best k-cardinality tree for k = 1, ..., klimit. Finally, for each cardinality, among the extracted na trees, we keep the one that minimizes the auxiliary objective function. This results in klimit trees, each with a different cardinality.

In the second reconstruction step, among all the resulting trees, the one that minimizes the primary objective function is selected and the pheromone values are updated accordingly.
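One iteration of this two-step scheme can be sketched as follows (illustrative Python; the construction, extraction and objective functions are placeholders standing in for the ACO construction, the dynamic programming of Blum 2007, and Eqs. 25-26).

    def aco_rts_iteration(construct_tree, extract_k_subtree, aux_obj, primary_obj,
                          n_ants, k_limit):
        # Step 1: build n_ants trees and, for each cardinality k, keep the tree that
        # minimizes the auxiliary objective F'_g (Eq. 26).
        ants = [construct_tree() for _ in range(n_ants)]
        best_per_k = {}
        for k in range(1, k_limit + 1):
            candidates = [extract_k_subtree(t, k) for t in ants]
            best_per_k[k] = min(candidates, key=aux_obj)
        # Step 2: among the per-cardinality winners, select the tree minimizing the
        # primary objective F_g (Eq. 25); the pheromones would then be updated from it.
        return min(best_per_k.values(), key=primary_obj)

    # Toy stand-ins: "trees" are sorted lists of edge costs; extraction keeps the k best.
    trees = [[0.5, 1.0, 4.0], [0.7, 0.9, 1.1]]
    construct = lambda it=iter(trees): next(it)
    extract = lambda t, k: t[:k]
    print(aco_rts_iteration(construct, extract, sum, sum, n_ants=2, k_limit=3))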

Henceforth, we will refer to this algorithm as the ACO-RTS method, which stands for Ant Colony Optimization for the Reconstruction of Tree-like Structures. In the Appendix, we describe it in more detail and provide pseudo-code for it.

Results

In this section, we first use eight brightfield micrographs, such as those depicted in Figs. 1 and 6, to evaluate the various components of our approach and to demonstrate the importance of using both the image-based and the geometry-based terms of our objective function. To demonstrate the generality of our approach, we show a similar improvement on a very different dataset, the DRIVE database of retinal scans (Staal et al. 2004), such as those in Figs. 1 and 8. Finally, we present our results on the DIADEM challenge datasets (Ascoli et al. 2010). All images in this section contain overlays that are best viewed in color.

Fig. 6 Reconstructions in a brightfield micrograph. The top row depicts the whole image stack and the bottom one an enlarged portion of it. The yellow sphere denotes the soma and serves as the root node. a Manually delineated ground truth overlaid over the original stack. b Minimum spanning tree. c Reconstruction obtained without geometric regularization. d Reconstruction obtained with geometric regularization. The boxes overlaid on the bottom row images highlight locations where geometric regularization clearly brings about an improvement. In the area within the uppermost box, there seems to be a gap in the data. As a result, both the MST and the k-MST without regularization fail to connect the top and bottom parts of the dendrite, which is wrong, as can be seen in the ground truth data. By contrast, the regularized k-MST depicted by the last column exhibits the right topology. Moreover, in the area enclosed by the lowest of the boxes, there is structured noise, which is correctly ignored by the regularized k-MST but not by the others.



In all cases, we used the same algorithm without customization, besides retraining using ground truth data. This illustrates the ability of our statistical learning-based approach to adapt to a wide range of modalities.

Running the full algorithm requires setting 13 parameters in total. The a and b parameters of the sigmoid function of Eq. 2 are learned from the training datasets and associated ground truth tracings using the Levenberg-Marquardt algorithm of Platt (2000). The radius and height of the cylindrical box used for sampling, described in section "Sampling the Image", are estimated from the training datasets using a two-dimensional grid search procedure to maximize the DIADEM score.

Fig. 7 Quantitative evaluation of our results for the brightfield stack of Fig. 6 (top row), the retinal scan of Fig. 8 (middle row) and the Olfactory projection fibers of Fig. 9 (bottom row). a DIADEM scores as a function of the tree cardinality. The red solid curve represents the scores obtained with the proposed geometric regularization, which means minimizing the objective function of Eq. 26 for each fixed cardinality, and the blue dotted curve represents those obtained without regularization. The yellow square indicates the score of the standard MST. b Corresponding values of the objective functions Fg of Eq. 22 and Fi of Eq. 18, also drawn as red solid and blue dotted lines respectively. The diamonds represent the selected cardinalities that minimize them.


The remaining parameters correspond to the distributions of the geometric priors introduced in section "Modeling the Geometry-Based Term" and are estimated using a maximum likelihood approach.

The algorithm is implemented in C++ and parallelized across multiple CPU cores. The training times are on the order of a few tens of minutes. The run times range from a few seconds for relatively small datasets, such as the Olfactory Projection Fibers, to a few tens of minutes for the largest ones, such as the Hippocampal CA3 Neurons, on an eight-core 3 GHz PC. The software is available online at http://cvlab.epfl.ch/research/medical/lm.

Brightfield Micrographs

We evaluated our algorithm on eight brightfield micrographs such as those of Figs. 1 and 6. They were acquired by our colleagues from EPFL's Brain Mind Institute, who also gave us hand-traced ground-truth data. The images were obtained from biocytin-dyed rat brains. The numerous artifacts produced by irregularities of the staining process and the non-Gaussian blur introduced by the microscope make their automated analysis challenging. Many significant processes appear as faint structures, present abrupt intensity changes, or are severely blurred. Furthermore, the stain can dye irrelevant structures, such as blood vessels that are close to the neuron under analysis. This produces both structured and unstructured noise that is difficult to distinguish from true processes, even for a human expert.

As discussed in the implementation section, we use manually annotated trees to train the SVM classifiers of section "Tubularity Measure", which we use to evaluate tubularity, and to learn the geometric probability distributions of section "Modeling the Geometry-Based Term". Since the data is noisy, many spurious anchor points are generated during the sampling step of section "Sampling the Image". Thus, the minimum spanning tree of Fig. 1b is over-complete. Applying our k-MST scheme results in the much cleaner trees of Fig. 1c, when not using our geometric regularization term, and of Fig. 1d, when using it.

In Fig. 6, we show similar results on a second micrograph. As shown in the areas highlighted by the boxes overlaid on the bottom row images, the geometric regularization term can bring very significant improvements both when there is a gap in the image data, by bridging it, and when there is structured noise, by eliminating it. To quantify this improvement, we scored the reconstructions using the DIADEM metric (Ascoli et al. 2010), which is specifically designed to compare the topology of a reconstructed tree against ground truth. It returns a number between 0 and 1 that measures the topological distance between trees by matching their branching and end points and analyzing the connecting paths. Each connection is weighted by the size of the subtree it is connected to. Therefore, errors close to the tree root are usually penalized more heavily than those further away because they result in more severe topological changes. The results shown in the top row of Fig. 7 confirm that trees reconstructed using our full approach score better than both the minimum spanning tree and those reconstructed without geometric regularization.

Fig. 8 Example result for a retinal image from the DRIVE database. The first column depicts the whole image and the second an enlarged portion of it. a Manually outlined blood vessels overlaid in white. b Minimum spanning tree. c Minimum spanning tree with the spurious branches trimmed by minimizing Eq. 18. d Reconstruction without geometric regularization. e Reconstruction with geometric regularization. The green boxes and the yellow ellipses denote spurious branches and gaps, all of which disappear in row e.



Retinal Scans

To demonstrate the generality of our algorithm, we evaluated it on the 2D retinal images of the DRIVE database (Staal et al. 2004). The dataset contains 20 test images with manual segmentations of blood vessels performed by trained human observers. Since these segmentations cannot be used for quantitative topological comparison, we manually traced the vascular trees and treat them as ground truth. At detection time, the root vertex is automatically estimated to be the anchor point nearest to the center of the optic disk region, which is detected using a variant of the method of Huang and Yan (2006).

Fig. 9 Olfactory projection fibers. a Maximum intensity projections of three image stacks with the reconstructions overlaid. b Enlarged versions of the same images.



Figure 8 illustrates the behavior of our algorithm on one of these images (additional results are available online at http://cvlab.epfl.ch/research/medical/lm/retinal/). Again, our approach eliminates many outliers and yields much cleaner trees than a standard minimum spanning tree, with far fewer mistakes such as those highlighted by the green rectangles and the yellow ellipses in the second column of the figure, especially when using the full objective function. Note that the final tree is not a subset of the minimum spanning tree. Its topology is actually different, which could not have been achieved by simply pruning the minimum spanning tree, as shown in the third row of the figure. To quantify the corresponding improvement, we again computed the DIADEM scores. As shown in the middle row of Fig. 7, the results are similar to those we obtained for the brightfield micrographs. In terms of the metric, using the geometric regularizer improves the results by about 40% over not using it, and by slightly more when comparing against the minimum spanning tree or the pruned version of it. Arguably, using the k-MST algorithm without geometric regularization eliminates many spurious branches but does not significantly improve the resulting topology. This improvement, however, is achieved when using the full objective function.

DIADEM Data

We ran our algorithm on the five datasets provided by the DIADEM challenge (Ascoli et al. 2010). Each one includes training image stacks with ground-truth traces generated by experts, as well as test stacks. As before, the ground truth is used to train the SVM classifiers of section "Tubularity Measure" and to learn the geometric probability distributions of section "Modeling the Geometry-Based Term".

Olfactory Projection Fibers

These image stacks are acquired from neurons of the olfactory bulb of the Drosophila fly. Each stack depicts a single neuron labeled with a Green Fluorescent Protein (GFP) and is imaged using a confocal microscope. The stacks are relatively clean, with a small point spread function. As can be seen in Fig. 9, the trees appear either as simple structures consisting of a few branches, or as a path with almost no bifurcations that ends in a complex, highly branched structure.


Fig. 10 Conversion from color to grayscale of Cerebellar climbing fibers images. a Original image patch. b Grayscale version.


The reconstructions we produce are generally good, as also shown by the DIADEM scores of the bottom row of Fig. 7. Note that the objective function Fg of Eq. 25 does not seem to reach a minimum, as shown in the bottom row of Fig. 7b. Since this dataset is fairly clean, with relatively little noise in the background, it turns out that the sampling does not produce any outliers outside the processes and all the anchor points are within the fibers. As a result, the objective function Fg always decreases with increasing tree cardinality, up to the point where all vertices are spanned by the tree.

Fig. 11 Cerebellar climbing fibers. Top row: image stack with reconstruction overlaid in white. The yellow dot denotes the tree root. Bottom row: ground truth delineation. Note that the branch on the right of the root node is correctly delineated, even though it was not traced in the ground truth.


Note that the objective function Fg of Eq. 25 does not seem to reach a minimum, as shown in the bottom row of Fig. 7b. Since this dataset is fairly clean, with relatively little background noise, the sampling does not produce any outliers outside the processes and all the anchor points lie within the fibers. As a result, the objective function Fg always decreases with increasing tree cardinality, up to the point where all vertices are spanned by the tree.

However, the reconstructions contain some spurious branches that are perpendicular to the main processes and that reside within them, which is attributable to the sampling methodology of section "Sampling the Image". Because the processes occasionally have complex shapes, the sampling step generates two anchor points, one on the centerline and the other close to the surface of the fiber, corresponding to the same cross-section of the tube. This could be solved by a more sophisticated sampling strategy or by taking into account voxel width and orientation estimates when computing the paths of Eq. 3.

Cerebellar Climbing Fibers

These stacks are acquired from the cerebellar cortex of rats. The axons' terminals are dyed with Biotinylated Dextran Amine (Anterograde), which results in very dark processes and blob-like blue nuclei. Before processing the stacks, we converted them to grayscale by linearly combining the RGB components so as to suppress the blue blobs and maximize the absolute grayscale intensity difference between the axon terminals and the nuclei in the training stacks, as shown in Fig. 10.
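One way to choose such a linear combination is a simple grid search over RGB weights on the training samples. The sketch below is an illustration under our own assumptions, not the procedure actually used for the paper: it normalizes each candidate weight vector and keeps the one that maximizes the gap between the mean gray levels of axon-terminal and nucleus pixels; a negative blue weight then naturally suppresses the blue blobs.

import itertools
import numpy as np

def grayscale_weights(axon_rgb, nuclei_rgb, step=0.1):
    """Grid-search an RGB-to-gray weight vector separating axon terminals from nuclei.

    axon_rgb, nuclei_rgb: (N, 3) arrays of RGB samples taken from the training stacks.
    Returns w such that gray = rgb @ w maximizes |mean(axon gray) - mean(nuclei gray)|.
    """
    best_w, best_gap = None, -np.inf
    grid = np.arange(-1.0, 1.0 + 1e-9, step)
    for w in itertools.product(grid, repeat=3):
        w = np.array(w)
        if np.allclose(w, 0.0):
            continue
        w = w / np.linalg.norm(w)  # fix the overall scale so candidates are comparable
        gap = abs(np.mean(axon_rgb @ w) - np.mean(nuclei_rgb @ w))
        if gap > best_gap:
            best_gap, best_w = gap, w
    return best_w

# Usage on a full stack of shape (depth, height, width, 3):
# gray_stack = np.tensordot(rgb_stack, grayscale_weights(axon_rgb, nuclei_rgb), axes=([-1], [0]))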

As shown in Fig. 11, each stack contains several tiles that are stitched together, and empty regions of space are assigned a neutral gray level.

Fig. 12 Neocortical layer 6 dataset. Top row: image stack with reconstructions overlaid. Each colored tree corresponds to a different root node. Bottom row: an example of the tree stealing problem. The red, green and yellow trees (left) compete for the same part of the image data. In the ground truth reconstruction (right), the red tree explains all of the diagonal process


The resulting reconstructions are fairly accurate but contain some topological mistakes, such as erroneous connections at crossovers and shortcuts where the curvature is very high. Solving this would require more global geometrical constraints than the relatively local ones we impose.

Furthermore, in our implementation, as discussed in section "Linking the Anchor Points", paths between anchor points only take into account the probabilities that their voxels are on the centerline of a filament. A natural extension that would generate more accurate and smoother paths would be to incorporate the width and orientation estimates of the voxels in the shortest path computation.

Neocortical Layer 6 Axons

This dataset is obtained by labeling axons using GFP and imaging them with a two-photon microscope. The resulting image stack is depicted in Fig. 12. The axons appear as clean structures and the point spread function of the microscope is relatively small. There is some shot noise, but it is easily eliminated by our tubularity measure.

This dataset is nevertheless very challenging because there can be more than 30 different trees per stack. As discussed at the end of section "Introducing the Geometry-Based Term", we handle this by introducing a virtual node connected to the root node of each individual tree. These trees are then reconstructed simultaneously. Because their branches are closely spaced, it can easily happen that one tree "steals" the branches of another one, as depicted in the bottom row of Fig. 12. As a result, the DIADEM scores we obtain for this dataset are much lower than those of Fig. 7: they are on the order of 0.05 without the geometric priors and 0.1 with them. In other words, while imposing geometric regularity and explicitly sampling from junctions helps to some extent, a more global analysis of the interactions between distant anchor points and edges would be required to automatically correct such mistakes.

Hippocampal CA3 Neurons

These stacks are relatively similar to those of Figs. 1 and 6. They are acquired from rat brains, dyed with biocytin, and imaged using brightfield microscopes. However, their image characteristics differ in three important ways. First, the point spread function of the microscope is very elongated along the z axis, so the dendrites look more like thin planes than tubular structures. Second, there are irregularities in the staining process, which make several dendrites degenerate into sequences of blobs. Finally, the dye spread around the soma, generating blob-like structures. These artifacts make automated tracing much more difficult.

As in the cerebellar climbing fiber dataset, the stacks consist of several tiles that we stitched together. As shown in Fig. 13, in spite of all the problems mentioned above, most of the salient processes are successfully reconstructed and the spurious ones ignored.

Neuromuscular Projection Fibers

This dataset consists of 152 image stack tiles of mouse neuromuscular axonal projection fibers acquired with a 2-channel confocal microscope. Starting from a set of nearby root nodes, the fibers follow approximately parallel paths while twisting around each other, as shown in Fig. 14a.

Fig. 13 Hippocampal CA3 neurons stack with overlaid reconstructions. As in the case of Fig. 12, there are several trees overlaid using different colors


Because the fibers are so close to each other, the reduced graph G, introduced in section "Graph Construction", inevitably contains low-cost edges that connect anchor points from different branches. At linking time, this often results in trees with branches erroneously jumping from one fiber to another. As shown in Fig. 14b, this problem can be mitigated by increasing the sampling rate and generating more anchor points than we did for the other datasets.

However, if we were to process the whole dataset, this would result in hundreds of thousands of anchor points to be linked. This would be computationally infeasible using our current algorithm, which is why we only show results on a single tile. A feasible strategy could then be to trace the tiles sequentially and use the leaves obtained in one tile as the roots for the next one, possibly with some manual intervention to prevent the propagation of errors.

Fig. 14 The tile of the neuromuscular projection fibers dataset that contains the root nodes. Top row: image stack with reconstructions overlaid. Each colored tree corresponds to a different root node. Bottom row: enlarged version. The anchor points are depicted by the white circles



Conclusion

We have presented an approach to automatically inferring tree structures in 2D images and 3D image stacks by optimizing a global objective function that combines image data and geometric regularization terms. It also explicitly accounts for potential problems at bifurcations and crossovers. This allows us to recover faint and disconnected filaments at an acceptable computational cost while rejecting structured noise.

We evaluated our method on brightfield micrographs, retinal scans of the DRIVE database and the DIADEM challenge datasets. The final reconstructions are compared against ground truths manually annotated by experts. Despite the differences in imaging modalities, our method produces visually pleasing results on all datasets, which is also supported by the quantitative evaluation. In particular, we showed that the use of geometric regularization yields a substantial improvement in the DIADEM metric scores.

The limitations of our current approach mainly come from the fact that the geometric properties we have incorporated into our objective function are still relatively local ones. As a result, we cannot account for more global properties, such as the overall smoothness of a branch or the expected branching factor at a given distance from the root of the tree. Our focus in future work will therefore be to extend our approach to such properties and thereby further increase its robustness.

Information Sharing Agreement

The software described in this work will be publicly accessible online at http://cvlab.epfl.ch/research/medical/lm.

Acknowledgements This work was supported in part by the Swiss National Science Foundation and in part by the MicroNano ERC project. Christian Blum also acknowledges support from the Ramón y Cajal program of the Spanish Government.

We would like to thank Felix Schürmann for his advice and for giving us access to the brightfield dataset and the corresponding ground truth.

Appendix: Pseudo-code for the ACO-RTS Algorithm

In this appendix, we describe our ACO-RTS algorithm in more detail and supply pseudo-code for it.

Notations and Overview

ACO-RTS is designed to minimize the primary cost function of Eq. 25 and uses the auxiliary cost function of Eq. 26 during tree construction. To this end, it maintains three sets of trees, namely the best-so-far solutions Tbs, the restart-best solutions Trb, and the iteration-best solutions Tib. Each of these sets contains a group of trees with cardinalities {1, . . . , klimit}, where klimit is the maximum cardinality of the trees to be constructed. Its value is updated at each iteration of the algorithm so as to focus the search on the most likely cardinalities and to discard very high ones.
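These three solution sets can be represented as per-cardinality archives. The sketch below is our own illustration of such a data structure; the class name, the dictionary layout and the replacement rule (used later by the Update procedure) are assumptions about one possible implementation, not code from the paper.

from dataclasses import dataclass, field

@dataclass
class SolutionArchives:
    """Per-cardinality archives of trees maintained by ACO-RTS.

    Each dictionary maps a cardinality l in {1, ..., k_limit} to the best tree
    of that cardinality found so far in the corresponding archive.
    """
    k_limit: int
    best_so_far: dict = field(default_factory=dict)     # T_bs
    restart_best: dict = field(default_factory=dict)    # T_rb
    iteration_best: dict = field(default_factory=dict)  # T_ib

    def update_archive(self, archive, candidates, auxiliary_cost):
        # Replace the stored l-cardinality tree whenever the candidate of the
        # same cardinality has a lower auxiliary cost.
        for l, tree in candidates.items():
            if l not in archive or auxiliary_cost(tree) < auxiliary_cost(archive[l]):
                archive[l] = tree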

Given a directed graph G = (V, E), let H be the set of all directed paths (eus, est) containing exactly two consecutive edges. The pheromone model P, first introduced in section "Optimizing the Image-Based Term", is constructed by assigning a pheromone value τust to each one of these edge pairs and a value τrv to each edge emanating from the root vertex xr. More formally,

\[
P = \{\tau_{ust} \in [\tau_{\min}, \tau_{\max}] \mid e_{st} \in E,\ e_{us} \in E\} \cup \{\tau_{rv} \in [\tau_{\min}, \tau_{\max}] \mid e_{rv} \in E_r\}, \qquad (27)
\]

where τmin ∈ R and τmax ∈ R are lower and upper limits for the pheromone values and Er is the set of edges emanating from xr. In this work, τmin and τmax are set to 0.01 and 0.99, respectively. In effect, this model amounts to introducing a likelihood distribution over edge pairs, which improves convergence by resolving conflicts at crossovers.
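Concretely, the pheromone model can be stored as a dictionary keyed by edge pairs and by root edges. The sketch below is a minimal illustration under our own assumptions about the data layout (vertex identifiers as hashable labels, one entry per consecutive edge pair); the initial value of 0.5 is likewise an assumption, while the clamping bounds 0.01 and 0.99 follow the text.

def init_pheromones(edges, root_edges, tau_init=0.5):
    """Build the pheromone model of Eq. 27 as a dictionary.

    edges:      iterable of directed edges (s, t); every pair of consecutive
                edges (u, s), (s, t) receives one pheromone value, keyed (u, s, t).
    root_edges: iterable of edges (r, v) leaving the root vertex, keyed (r, v).
    """
    edges = list(edges)
    incoming = {}
    for s, t in edges:
        incoming.setdefault(t, []).append(s)
    pheromones = {}
    for s, t in edges:
        for u in incoming.get(s, []):        # every edge (u, s) that can precede (s, t)
            pheromones[(u, s, t)] = tau_init
    for r, v in root_edges:
        pheromones[(r, v)] = tau_init
    return pheromones

def clamp_pheromone(tau, tau_min=0.01, tau_max=0.99):
    # Keep each value within [tau_min, tau_max] after every pheromone update.
    return min(max(tau, tau_min), tau_max)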

Figure 15 shows a simplified flowchart of the algorithm. At each iteration, na directed trees are probabilistically constructed by combining the pheromone information and the auxiliary cost of Eq. 26. The tree cardinality is bounded by the variable klimit, whose value is dynamically updated at run-time. Tree construction always starts at the root vertex xr, with edges being probabilistically selected and iteratively added to the tree. Edge pairs whose contribution to the auxiliary cost is small are given higher priority, and care is taken not to violate the branching factor limit.

We rely on dynamic programming (Blum 2007) to find the best l-cardinality tree, for all l from 1 to klimit, in each one of the na trees that have been built. For each cardinality, we select from the resulting na trees the one that minimizes the auxiliary objective function and store it in Tib. Each l-cardinality tree in Trb or Tbs is then updated by the l-cardinality tree in Tib if the latter has a lower auxiliary cost. The best cardinality k∗ for the next iteration is then taken to be the one that minimizes the primary objective function of Eq. 25 among the trees in Tbs.


Fig. 15 ACO-RTS algorithm flowchart

The cardinality limit klimit is taken to be the average of k∗ and |V| − 1, the maximum possible cardinality of a tree. The pheromone values are updated by using the k∗-cardinality trees in Tib, Trb and Tbs so that values corresponding to edge pairs belonging to these k∗-cardinality trees are increased while others are decreased. Finally, when the algorithm has run for a preset amount of time, it returns the k∗-cardinality tree in Tbs.
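The cardinality bookkeeping at the end of each iteration is simple enough to state in a few lines. The sketch below is an illustration under our assumptions: best_so_far maps a cardinality to the corresponding tree in Tbs, primary_cost evaluates Eq. 25, and since the text does not say how the average of k∗ and |V| − 1 is rounded, integer division is used here.

def revise_cardinality_limit(best_so_far, primary_cost, num_vertices):
    """Pick k* and the next cardinality limit at the end of an iteration.

    best_so_far:  {cardinality l: best-so-far l-cardinality tree} (T_bs).
    primary_cost: callable evaluating the primary objective (Eq. 25) on a tree.
    """
    # k* is the cardinality whose best-so-far tree has the lowest primary cost.
    k_star = min(best_so_far, key=lambda l: primary_cost(best_so_far[l]))
    # The new limit is the average of k* and |V| - 1, the largest possible cardinality.
    k_limit = (k_star + (num_vertices - 1)) // 2
    return k_star, k_limit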

Pseudo Code

Figure 16 depicts the pseudo-code that implements the approach outlined above. It includes the following functions:

– ConstructDirectedTree(P, klimit, xr): This function constructs a klimit-cardinality tree T = (VT, ET) with vertices VT ⊆ V, edges ET ⊆ E, and pairs of consecutive edges HT ⊆ H. The construction starts from the given root vertex xr, that is, initially T = ({xr}, ∅). At each step, a tree is stochastically selected from a neighborhood of the current solution. This neighborhood is generated by adding an edge and its target vertex to the tree under construction such that the tree property is maintained (i.e., no cycles are formed) and no bifurcation with a branching factor greater than a predefined threshold bth is introduced. The set of candidate edges is generated as follows.


Fig. 16 ACO-RTS: pseudo-code for ant colony optimization for the reconstruction of tree-like structures

As a convention, let the source vertex of an edge est ∈ E be denoted by xs and the target vertex by xt. Let also N ⊆ V be the set of vertices such that N ∩ VT = ∅ and, for each xt ∈ N, there exists at least one edge est ∈ E with xs ∈ VT. Moreover, let Et ⊆ E denote the set of edges that can connect the vertex xt to T such that the following conditions hold:

1. For each est ∈ Et, deg+(xs) < bth, where deg+(xs) denotes the out-degree of vertex xs in tree T.

Fig. 17 Crossover neighborhood for a junction with two branches and three continuation edges. The green circles are vertices. The blue dotted lines represent one of the crossing branches and the red solid lines represent the other one


2. There does not exist an edge esl ∈ ET such that xs and xl are co-located vertices, as defined in section "Sampling the Image".

Then, the set of candidate edges is taken to be

\[
C = \bigl\{\arg\min_{e_{st} \in E_t} w(e_{st}) \;\bigm|\; x_t \in N\bigr\}, \qquad (28)
\]

where w(est) is the effective weight of an edge est, which can be expressed in terms of the cost terms of Eq. 26:

\[
w(e_{st}) =
\begin{cases}
d_{st} + b_{st}, & \text{if } x_s = x_r,\\
d_{st} + a_{ust}, & \text{if } x_s \neq x_r,\ e_{us} \in E_T.
\end{cases}
\qquad (29)
\]
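The candidate set of Eq. 28 with the weights of Eq. 29 can be computed with a single pass over the frontier of the partial tree. The following is a minimal sketch under our own assumptions about the data structures (the tree as a child-to-parent map, the graph as an adjacency dictionary, the costs d, b and a of Eq. 26 as dictionaries); it is an illustration, not the authors' implementation.

def candidate_edges(tree, graph, root, b_th, d, b, a, co_located):
    """Candidate set C of Eq. 28 with effective weights w of Eq. 29.

    tree:  {vertex: parent} for every vertex already in the tree (root maps to None).
    graph: {vertex: [neighbors reachable by an outgoing edge]}.
    d, b:  unary edge costs keyed by (s, t); a: pairwise costs keyed by (u, s, t).
    co_located(x, y): True if x and y were sampled at the same image location.
    Returns {target: (weight, (source, target))}, keeping one cheapest edge per target.
    """
    out_degree = {}
    for child, parent in tree.items():
        if parent is not None:
            out_degree[parent] = out_degree.get(parent, 0) + 1

    best = {}
    for s in tree:                                       # sources must already be in the tree
        if out_degree.get(s, 0) >= b_th:                 # condition 1: branching-factor limit
            continue
        if any(co_located(s, l) for l, p in tree.items() if p == s):
            continue                                     # condition 2: no co-located child of s
        for t in graph.get(s, []):
            if t in tree:                                # only vertices not yet spanned
                continue
            if s == root:
                w = d[(s, t)] + b[(s, t)]                # Eq. 29, edge leaving the root
            else:
                u = tree[s]                              # tree edge (u, s) entering the source
                w = d[(s, t)] + a[(u, s, t)]             # Eq. 29, interior edge pair
            if t not in best or w < best[t][0]:
                best[t] = (w, (s, t))
    return best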

At each construction step, with a constant probability q ∈ [0, 1], the algorithm chooses the edge est ∈ C that deterministically maximizes τust/w(est), where τust ∈ P denotes the pheromone value assigned to the edge pair (eus, est) with eus ∈ ET. Otherwise, with probability 1 − q, est ∈ C is chosen probabilistically in proportion to τust/w(est).
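This pseudo-random proportional selection rule can be sketched as follows, assuming the candidate dictionary from the previous sketch, the pheromone dictionary keyed as in Eq. 27, and strictly positive effective weights; the function name and interface are ours.

import random

def select_edge(candidates, pheromones, tree, root, q):
    """With probability q exploit the best tau/w candidate, otherwise sample
    an edge in proportion to tau/w (roulette-wheel selection)."""
    scored = []
    for w, (s, t) in candidates.values():
        key = (s, t) if s == root else (tree[s], s, t)   # root-edge or edge-pair pheromone
        scored.append((pheromones[key] / w, (s, t)))
    if random.random() < q:
        return max(scored, key=lambda item: item[0])[1]  # deterministic, greedy choice
    total = sum(score for score, _ in scored)
    threshold, running = random.uniform(0.0, total), 0.0
    for score, edge in scored:
        running += score
        if running >= threshold:
            return edge                                   # probabilistic choice
    return scored[-1][1]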

The resulting trees may contain incorrect connections at crossovers. We observed that, while growing the tree from the root vertex, one of the crossing branches usually grows more rapidly than the other and dominates the crossover region by taking ownership of the continuation edges. This is mainly caused by contrast differences in the original data and by local classifier errors, which lead to high-cost edges not being selected to bridge gaps along the branches.

To correct such errors, we introduce an additional neighborhood structure, which we name the crossover neighborhood. Two trees are said to be crossover-neighbors if one can be obtained from the other by simply changing possession of continuation edges at two co-located crossover vertices. Figure 17 gives an example of such a neighborhood for a crossover with three continuation edges. After each construction step, we search over this neighborhood of the constructed tree and minimize the primary cost function. Note that this minimization reduces to minimizing the sum of the pairwise edge weights in this case, since all the trees in the neighborhood have the same set of edges and hence the unary edge costs are common.
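Because only the pairwise terms change when ownership of a continuation edge is transferred between the two co-located vertices, the neighborhood can be searched exhaustively for the small junctions encountered in practice. The sketch below is our own illustration of that search; it assumes the pairwise cost dictionary a contains entries for both possible owners of every continuation edge, and it enumerates all ownership assignments rather than the specific swaps depicted in Fig. 17.

from itertools import product

def best_crossover_assignment(x1, x2, parents, continuation_targets, a):
    """Assign each continuation edge of a crossover to one of the two
    co-located vertices x1, x2 so that the summed pairwise cost is minimal.

    parents: {x1: u1, x2: u2}, the tree parents of the two crossover vertices.
    continuation_targets: targets of the continuation edges leaving the crossover.
    a: pairwise cost a[(u, s, t)] of the consecutive edges (u, s) and (s, t).
    Only the pairwise terms matter here: every assignment in the neighborhood
    uses the same set of edges, so the unary costs cancel out.
    """
    best_cost, best_owner = float('inf'), None
    for owners in product((x1, x2), repeat=len(continuation_targets)):
        cost = sum(a[(parents[s], s, t)]
                   for s, t in zip(owners, continuation_targets))
        if cost < best_cost:
            best_cost = cost
            best_owner = dict(zip(continuation_targets, owners))
    return best_owner, best_cost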

– DynamicTree(T, klimit): This function runs a dynamic programming algorithm (Blum 2007), which we extended to handle directed graphs. For each l = 1, . . . , klimit, it returns the best l-cardinality tree rooted at xr. The quality of a tree is determined by the auxiliary cost function.

– Update(Ta, Tb, klimit): This function updates the set of trees Ta by the set Tb for each cardinality. That is, for l = 1, . . . , klimit, if F′g(Ta[l]) > F′g(Tb[l]), then Ta[l] = Tb[l], where Ta[l] and Tb[l] denote the l-cardinality trees in Ta and Tb, respectively.

– PheromoneUpdate(cf, bs_update, P, Tib[k∗], Trb[k∗], Tbs[k∗]): The pheromone values are updated using the three solutions Tib[k∗], Trb[k∗] and Tbs[k∗], which all have cardinality k∗. Their influence on the update depends on the state of convergence of the algorithm. This procedure is the same as the one described in Blum and Blesa (2005) for the k-cardinality tree problem, except that the pheromone values are assigned to edge pairs instead of individual edges. We refer the interested reader to our earlier publication for further details.

– ComputeConvergenceFactor(P, Trb[k∗], k∗): In this procedure, the convergence factor of the algorithm is computed based on the convergence of the restart-best tree with cardinality k∗. We take it to be

\[
cf = \frac{\sum_{h \in P_{T_{rb}[k^*]}} \tau_h}{k^* \cdot \tau_{\max}}, \qquad (30)
\]

where PTrb[k∗] denotes the set of pheromone values of the edges and edge pairs included in Trb[k∗] and τmax is the upper limit for the pheromone values. Note that the convergence factor can only assume values in the interval [0, 1].
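Equation 30 translates directly into code. The sketch below assumes the pheromone dictionary from the earlier sketches and a list of the keys (root edge and edge pairs) used by the k∗-cardinality restart-best tree; the function name is ours.

def convergence_factor(pheromones, restart_best_keys, k_star, tau_max=0.99):
    """Convergence factor of Eq. 30: summed pheromone level of the k*-cardinality
    restart-best tree, normalized by k* times the upper pheromone limit."""
    total = sum(pheromones[key] for key in restart_best_keys)
    return total / (k_star * tau_max)   # stays within [0, 1] because each tau <= tau_max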

References

Al-Kofahi, K. A., Lasek, S., Szarowski, D. H., Pace, C. J., Nagy, G., Turner, J. N., et al. (2002). Rapid automated three-dimensional tracing of neurons from confocal image stacks. Transactions on Information Technology in Biomedicine, 6(2), 171–187.

Ascoli, G. A., Svoboda, K., & Liu, Y. (2010). Digital reconstruction of axonal and dendritic morphology DIADEM challenge. http://diademchallenge.org/.

Blum, C. (2007). Revisiting dynamic programming for finding optimal subtrees in trees. European Journal of Operational Research, 177(1), 102–115.

Blum, C., & Blesa, M. (2005). Combining ant colony optimization with dynamic programming for solving the K-cardinality tree problem. In Computational intelligence and bioinspired systems. Lecture notes in computer science (Vol. 3512, pp. 25–33).

Cai, H., Xu, X., Lu, J., Lichtman, J., Yung, S., & Wong, S. (2006). Repulsive force based snake model to segment and track neuronal axons in 3D microscopy image stacks. NeuroImage, 32(4), 1608–1620.

Can, A., Shen, H., Turner, J., Tanenbaum, H., & Roysam, B. (1999). Rapid automated tracing and feature extraction from retinal fundus images using direct exploratory algorithms. Transactions on Information Technology in Biomedicine, 3(2), 125–138.


Dorigo, M., & Stützle, T. (2004). Ant colony optimization. Cambridge, MA: MIT Press.

Duhamel, C., Gouveia, L., Moura, P., & Souza, M. (2008). Models and heuristics for a minimum arborescence problem. Networks, 51(1), 34–47.

Fan, D. (2006). Bayesian inference of vascular structure from retinal images. Ph.D. thesis, Dept. of Computer Science, U. of Warwick, Coventry, UK.

Felzenszwalb, P., & Huttenlocher, D. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.

Felzenszwalb, P., & McAllester, D. (2006). A min-cover approach for finding salient curves. In Conference on computer vision and pattern recognition (pp. 61–74).

Fischler, M., & Heller, A. (1998). Automated techniques for road network modeling. In DARPA image understanding workshop (pp. 501–516).

Frangi, A. F., Niessen, W. J., Vincken, K. L., & Viergever, M. A. (1998). Multiscale vessel enhancement filtering. Lecture Notes in Computer Science, 1496, 130–137.

Garg, N. (1996). A 3-approximation for the minimum tree spanning K vertices. In IEEE symposium on foundations of computer science (Vol. 27, pp. 302–309). Washington, DC, USA: IEEE Computer Society.

Gonzalez, G., Aguet, F., Fleuret, F., Unser, M., & Fua, P. (2009). Steerable features for statistical 3D dendrite detection. In Conference on medical image computing and computer assisted intervention (Vol. 12, pp. 625–632).

Gonzalez, G., Fleuret, F., & Fua, P. (2008). Automated delineation of dendritic networks in noisy image stacks. In European conference on computer vision. Lecture notes in computer science (Vol. 5305, pp. 214–227). Berlin/Heidelberg: Springer.

Huang, K., & Yan, M. (2006). Robust optic disk detection in retinal images using vessel structure and radon transform. In SPIE (Vol. 6144).

Jacob, M., & Unser, M. (2004). Design of steerable filters for feature detection using Canny-like criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 1007–1019.

Law, M., & Chung, A. (2008). Three dimensional curvilinear structure detection using optimally oriented flux. In European conference on computer vision (pp. 368–382).

Lee, T., Kashyap, R., & Chu, C. (1994). Building skeleton models via 3-D medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing, 56(6), 462–478.

Leordeanu, M., Hebert, M., & Sukthankar, R. (2007). Beyond local appearance: Category recognition from pairwise interactions of simple features. In Conference on computer vision and pattern recognition (pp. 1–8).

Meijering, E., Jacob, M., Sarria, J. C. F., Steiner, P., Hirling, H., & Unser, M. (2004). Design and validation of a tool for neurite tracing and analysis in fluorescence microscopy images. Cytometry Part A, 58A(2), 167–176.

Platt, J. (2000). Probabilistic outputs for SVMs and comparisons to regularized likelihood methods. In Advances in large margin classifiers. Cambridge, MA: MIT Press.

Santamaría-Pang, A., Colbert, C. M., Saggau, P., & Kakadiaris, I. (2007). Automatic centerline extraction of irregular tubular structures using probability volumes from multiphoton imaging. In Conference on medical image computing and computer assisted intervention (pp. 486–494).

Schoelkopf, B., Burges, C., & Smola, A. (1999). Advances in kernel methods: Support vector learning. Cambridge, MA: MIT Press.

Staal, J., Abramoff, M., Niemeijer, M., Viergever, M., & van Ginneken, B. (2004). Ridge based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23, 501–509.

Sun, K., Sang, N., & Zhang, T. (2007). Marked point process for vascular tree extraction on angiogram. In Conference on computer vision and pattern recognition (pp. 467–478).

Vasilkoski, Z., & Stepanyants, A. (2009). Detection of the optimal neuron traces in confocal microscopy images. Journal of Neuroscience Methods, 178(1), 197–204.

Weaver, C., Hof, P., Wearne, S., & Brent, L. (2004). Automated algorithms for multiscale morphometry of neuronal dendrites. Neural Computation, 16(7), 1353–1383.

Xie, J., Zhao, T., Lee, T., Myers, G., & Peng, H. (2010). Automatic neuron tracing in volumetric microscopy images with anisotropic path searching. In Conference on medical image computing and computer assisted intervention.

Xu, J., Wu, J., Feng, D., & Cui, Z. (2009). DSA image blood vessel skeleton extraction based on anti-concentration diffusion and level set method. Computational Intelligence and Intelligent Systems, 51, 188–198.

Yedidya, T., & Hartley, R. (2008). Tracking of blood vessels in retinal images using Kalman filter. In Digital image computing: Techniques and applications (pp. 52–58).