
Robust Boundary Detection With Adaptive Grouping

Francisco J. Estrada
Centre for Vision Research, York University
Toronto, ON, M3J 1P3, Canada
[email protected]

Allan D. Jepson
Dept. of Computer Science, University of Toronto
Toronto, ON, M5S 3G4, Canada
[email protected]

Abstract

This paper presents a perceptual grouping algorithm that performs boundary extraction on natural images. Our grouping method maintains and updates a model of the appearance of the image regions on either side of a growing contour. This model is used to change grouping behaviour at run-time, so that, in addition to following the traditional Gestalt grouping principles of proximity and good continuation, the grouping procedure favours the path that best separates two visually distinct parts of the image. The resulting algorithm is computationally efficient and robust to clutter and texture. We present experimental results on natural images from the Berkeley Segmentation Database and compare our results to those obtained with three alternate grouping methods.

1. Introduction

Consider the following experiment: a human observer (perhaps the reader of this paper) is presented with a line drawing such as the one illustrated in Fig. 1 and asked to identify the central object in the image. After looking at it carefully for a moment, the observer might point out with remarkable confidence that he is looking at an image of a bear. In a surprisingly short interval, the observer has perceptually organized a large set of input features into meaningful groups and, at some point during the organization process, correctly identified the object in the image. A vast amount of research has been dedicated to identifying the principles that guide the formation of such groups and to developing algorithms that can exploit these principles to perceptually organize sets of input features.

In the particular case of line drawings, the above demonstration suggests that it should be possible to organize a set of input line segments using exclusively the information provided by the geometric relationships between them. Indeed, significant progress has been made in the fields of perceptual grouping and contour extraction by the clever exploitation of such geometric cues as proximity, smooth

Figure 1. A complex line drawing with a bear inside it (left). Best contour detected with the grouping algorithm from [5] (right).

continuation, parallelism, and compactness. These cues, commonly referred to as Gestalt grouping principles or Gestalt laws after the school of psychologists that first investigated them at the turn of the 20th century [10, 20], have proven useful because they are good predictors of non-accidentalness [11]: groups of features that exhibit the above properties are unlikely to have originated randomly.

Gestalt principles are the foundation of all current contour extraction algorithms. However, our computational models of these principles are not yet sufficient to deal with complex line sets such as the one depicted in Fig. 1. In such images, the abundance of clutter and structured texture results in increased ambiguity that geometric cues may not be able to resolve, rendering contour grouping algorithms unreliable. The increased ambiguity also manifests itself as a dramatic increase in the computational expense required to perform grouping on complex line-sets. These problems have traditionally been alleviated by introducing domain-specific knowledge, or by enforcing strong shape constraints such as convexity, but these approaches limit the applicability of grouping algorithms.

In this paper we show that an algorithm that adapts its grouping behaviour to the appearance of the image regions on either side of a growing contour is able to recover, reliably and robustly, the boundaries of objects in natural images with significant amounts of texture and clutter. Our method is based on the grouping algorithm of Estrada and Jepson [5]. Their algorithm provides an efficient search-based engine for contour extraction. We extend the algorithm to maintain and update a model of the appearance of


the image regions on either side of a growing contour. This model is used, in conjunction with traditional Gestalt cues, to guide the boundary extraction process so as to follow the path that best separates two visually distinct parts of the image. Once a complete contour has been found, the appearance model for the inside and outside of the contour can be used to measure the quality of the extracted boundary. The novel property of our algorithm is that it changes its grouping preferences at run-time depending on the characteristics of a particular grouping hypothesis. This allows the algorithm to accumulate information about the appearance of a potential object. Such information is particularly useful for finding the boundaries of heterogeneous objects, and is not available at the start of the search.

The main contributions of this paper are: 1) an adaptive grouping framework that uses geometric information together with an evolving appearance model of potential objects; 2) a procedure for modifying the grouping behaviour of the algorithm at run-time, based on the appearance model for a particular (partial) contour hypothesis; 3) an appearance-based measure of contour quality used to rank detected shapes; and 4) a quantitative evaluation and comparison of our algorithm and two alternate grouping methods on images from the Berkeley Segmentation Database (BSD). We also evaluate a reduced version of our framework that can use colour, but is not adaptive. We will show that the proposed algorithm is robust and efficient, and that it extracts boundaries that correspond more closely to individual object contours.

2. Previous Work

Because of space limitations, a comprehensive review of perceptual grouping research is not possible. Here we discuss only those methods that are directly related to the problem of detecting closed object contours from images. Mahamud et al. [12] propose an algorithm for contour extraction based on the stochastic completion fields of Williams and Jacobs [21]. The algorithm estimates the probability that a random walk started at one segment will visit other lines in the image and come back to the starting edge. These probabilities determine the saliency of segments and their connecting links. Closed contours are extracted as connected components derived from the edge and link saliencies. Elder and Zucker [2] describe a method that finds closed contours as shortest-path cycles in a graph. Their formulation expands the representation of lines by including brightness values on either side of a segment. Elder et al. [1] extend this algorithm to extract closed boundaries by finding progressively longer chains of edges. Their algorithm explores only the edge sequences that have high likelihood with regard to a set of probabilities derived from natural scene statistics.

Jacobs [9] proposes an efficient, search-based method for

convex group extraction based on a threshold on boundary coverage (i.e., the percentage of a contour that is covered by image segments). Huttenlocher and Wayner [8] also use convexity as a constraint, but propose a greedy search algorithm. Saund [16] presents a grouping method that uses shape compactness (the ratio of a contour's area to the area of its convex hull), along with predefined preferences for particular grouping choices, to find closed boundaries on sketches. Estrada and Jepson [5] introduce an efficient search-based grouping algorithm based on local affinity normalization and shape compactness.

Sarkar and Soundararajan [18] use graph-partitioning techniques to identify subsets of features that are weakly connected to the remaining features in a graph that encodes parallelism, perpendicularity, proximity, and continuity. Gdalyahu et al. [7] propose an algorithm that determines subsets of features that appear together on stochastically generated cuts in a graph. Wang et al. [19] present an algorithm that extracts closed contours as maximum-likelihood cycles in a graph which encodes geometric relationships between neighboring edges. The likelihood of closed contours is normalized by the contour length to avoid a bias for small boundaries. They show that their Ratio Contour algorithm compares favorably to the grouping algorithms of Mahamud et al. [12], and Elder and Zucker [2].

A common problem for current contour grouping algorithms is that they rely almost exclusively on geometric relationships between edges, line segments, or small curve fragments. In general, current boundary extraction methods fail when confronted with images that are rich in clutter and structured texture. This is illustrated in Fig. 1. Grouping in such images is extremely challenging because texture elements tend to align with object boundaries to yield a combinatorially large number of possible closed contours. In addition to this, clutter and texture combine to cause an explosive increase in grouping complexity that can render many grouping approaches impractical.

3. Robust Contour Extraction

Our search framework is based on that of Estrada and Jepson [5]. Their algorithm is capable of efficiently searching for closed contours in cluttered line-sets. For each pair of segments l1 and l2, our algorithm computes an affinity value given by

Taffty(l1, l2) = Gaffty(l1, l2) + γ,    (1)

where the term Gaffty(l1, l2) ≥ 0 scores junctions according to geometric cues, and γ > 0 is a suitable constant. The form of this affinity function represents the belief that, on any given image, we will find two types of junctions: first, there are junctions that correspond to neighboring pairs of lines along object boundaries. Such junctions should be well-modeled by geometric grouping principles and should


receive a high Gaffty(l1, l2) score. Secondly, we have junctions that originate from unstructured texture, clutter, and noise. These junctions should receive a low Gaffty(l1, l2) score, and their affinity value will be dominated by the uniform term γ. The fact that such accidental junctions receive a similar affinity value (i.e., roughly γ) indicates that we have little reason to believe one of these junctions to be better than another.

The Gaffty(l1, l2) term we use here has a very simple form. It captures the geometric cues of proximity and smooth continuation, and has only one free parameter (notice that this is different from the form proposed in [5]). The proximity component is an exponential function of the gap between the segments, measured as the distance d between their closest endpoints: e^(−d²/(2σ²_gap)) with an appropriate σ_gap. The smooth continuation term is proportional to the cosines of the angles between each segment and the line that joins the midpoints of these segments. If we call these angles θ1 and θ2 respectively, the smooth continuation term is given by ((cos(θ1) + 1)/2) · ((cos(θ2) + 1)/2). This term is 1 when the lines are collinear, and it decreases as the angles between the lines grow. The geometry used here is similar to that proposed by Elder and Zucker in [2], except that they measure θ1 and θ2 with regard to the line joining the closest endpoints of the two segments.
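The proximity and smooth continuation terms, and the total affinity of Eq. (1), can be sketched in a few lines of code. The values of σ_gap and γ below are illustrative placeholders; the paper does not report the exact constants used.

```python
import math

def g_affty(d, theta1, theta2, sigma_gap):
    """Geometric affinity: proximity times smooth continuation.

    d              -- gap between the closest endpoints of the two segments
    theta1, theta2 -- angles between each segment and the line joining
                      the segments' midpoints
    """
    proximity = math.exp(-d ** 2 / (2.0 * sigma_gap ** 2))
    continuation = ((math.cos(theta1) + 1) / 2) * ((math.cos(theta2) + 1) / 2)
    return proximity * continuation

def t_affty(d, theta1, theta2, sigma_gap=5.0, gamma=0.01):
    """Total affinity of Eq. (1): geometric term plus uniform term gamma.
    sigma_gap and gamma are illustrative, not the paper's values."""
    return g_affty(d, theta1, theta2, sigma_gap) + gamma
```

As expected, two touching collinear segments (d = 0, θ1 = θ2 = 0) attain the maximum geometric score of 1, and poorly aligned or distant pairs decay toward the γ floor.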

The Gaffty score is calculated by multiplying together the proximity and smooth continuation terms. The affinity given by (1) is computed for every pair of lines in the image, but the grouping algorithm doesn't use the raw affinity scores. Instead, affinity values are locally normalized as follows: for each segment l1 the algorithm finds the set K that contains the k segments with the largest affinities toward l1. For each of these k segments, a normalized affinity score is computed as

Naffty(l1, l2) = Taffty(l1, l2) / Σ_{li ∈ K} Taffty(l1, li) ∈ (0, 1).

The resulting normalized affinities sum to 1, and provide information about the relative goodness of possible grouping choices in the neighborhood of segment l1. The properties of these normalized affinities are discussed in detail in [4], but it should be noted that they are not symmetric, so in general Naffty(li, lj) ≠ Naffty(lj, li). The local normalization procedure leads to a robust and efficient search, and is particularly good at reducing search complexity in textured and cluttered regions of the line-set.
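The local normalization over the k best neighbours can be sketched as follows. Tie-breaking among equal affinities is unspecified in the paper; this sketch simply keeps the first k entries after a stable sort.

```python
def normalized_affinities(taffty, k):
    """Locally normalize affinities for one segment l1.

    `taffty` maps neighbour id -> T_affty(l1, neighbour); all values are
    positive because gamma > 0.  Returns normalized scores over the k
    largest entries, which sum to 1 by construction.
    """
    top_k = sorted(taffty, key=taffty.get, reverse=True)[:k]
    total = sum(taffty[j] for j in top_k)
    return {j: taffty[j] / total for j in top_k}
```

Note the asymmetry the text describes: the score of a pair depends on which segment's neighbourhood is being normalized.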

The contour grouping process is a depth-first search controlled by a threshold τaffty on normalized affinities and by geometric constraints. In particular, each partial group is checked for simplicity (no self-intersections are allowed) and for compactness; that is, the ratio of the area of a contour to the area of its convex hull should be greater than some threshold τcompact. The search algorithm is summarized as follows:

1 - For each segment i in the line-set, generate a group containing the single segment i.

2 - Find the set K with the best k junctions for the segment added most recently to the group (sorted by decreasing normalized affinity).

3 - For each segment j in K:

– If Naffty(i, j) < τaffty, end the loop; otherwise add j to the group.

– Compute the compactness of the shape; if compactness < τcompact, try the next j.

– Check for closure; if the group is closed, compute its saliency and report it.

– Recursively call step 2 with the newly extended group.

4 - Report the extracted polygons sorted in order of decreasing saliency.
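The four steps above can be sketched as a recursive depth-first search. This is a structural sketch only: the geometric tests are abstracted into caller-supplied functions, and `is_closed`, `compactness`, and `saliency` are placeholders, not the paper's actual implementations.

```python
def grow_contours(segments, naffty, tau_affty, is_closed, compactness,
                  tau_compact, saliency):
    """Depth-first contour search (steps 1-4), simplified sketch.

    `naffty[i]` maps segment i to a dict of normalized affinities toward
    its k best neighbours; the geometric checks are injected as callables.
    """
    results = []

    def extend(group):
        last = group[-1]
        # Steps 2-3: visit the best junctions for the most recent segment,
        # in order of decreasing normalized affinity.
        for j in sorted(naffty[last], key=naffty[last].get, reverse=True):
            if naffty[last][j] < tau_affty:
                break                    # remaining choices are weaker still
            if j in group:
                continue                 # simplicity: no repeated segments
            candidate = group + [j]
            if compactness(candidate) < tau_compact:
                continue                 # fails the compactness constraint
            if is_closed(candidate):
                results.append((saliency(candidate), candidate))
                continue                 # a closed polygon cannot be extended
            extend(candidate)            # recurse (step 2) on the new group

    for i in segments:                   # step 1: one seed group per segment
        extend([i])
    # Step 4: report contours by decreasing saliency.
    return sorted(results, key=lambda r: r[0], reverse=True)
```

On a toy four-segment cycle, each seed recovers the same closed square, one rotation per starting segment.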

We will discuss how to estimate the saliency of a contour further on in the paper. Estrada and Jepson show that the use of normalized affinities leads to an efficient search even in the presence of clutter and unstructured texture. However, as illustrated in Fig. 1, the presence of large amounts of structured texture leads their algorithm to form groups through textured regions of the image and miss the actual object contours. In what follows, we will describe the procedure that allows our algorithm to adaptively change the affinities between pairs of segments. We describe the appearance model used by the algorithm to judge region similarity, and discuss how this similarity can be used to change grouping behavior at run-time depending on a particular (partial) grouping hypothesis.

4. Building an Appearance Model for Partial Contours

There are many possible ways in which we could represent the appearance of the regions that lie on either side of a growing contour. We will not explore the important but separate problem of determining the optimal representation here. Instead, we will use a simple, non-parametric representation that has been used extensively and with success in the context of image database retrieval [14]. We represent local appearance using colour histograms [17] in the CIELab colour space [22]. The histogram representation is compact, easy to compute and manipulate, has well-studied similarity measures [15], and, as we shall see, can be easily updated as additional knowledge about the appearance of an image region becomes available. In addition to this, the same model supports the use of other image cues such as texture [14].

Local histograms are computed within a small square window of fixed size wsize centered at a particular image location (x, y). Each histogram contains a fixed number nbins of bins, and three layers corresponding to the three CIELab colour components. The colour values at each pixel within the window are accumulated onto the


histogram with contributions weighted by a 2-D Gaussian mask (we ignore any pixel locations outside the image bounds). The resulting histograms are normalized to sum to 1. The choice of σ for this Gaussian is determined by the window size so that at least 1.5σ fit within the window on either side of (x, y). We will discuss in detail the appropriate choice of wsize and number of bins nbins further on in the paper.
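A direct (unoptimized) sketch of this local histogram computation follows. It assumes the CIELab channels have been pre-scaled to [0, 1), since the paper does not specify the exact binning ranges, and derives σ from the window half-width so that 1.5σ fits on either side of the centre.

```python
import math

def local_histogram(lab, x, y, wsize=11, nbins=4):
    """Gaussian-weighted local colour histogram at (x, y).

    `lab` is a row-major list of rows of (L, a, b) triples, each component
    assumed pre-scaled to [0, 1) (an assumption; the paper does not give
    the binning ranges).  Returns 3 layers of nbins bins; each layer is
    normalized to unit area.
    """
    half = wsize // 2
    sigma = half / 1.5                  # at least 1.5*sigma fits per side
    hist = [[0.0] * nbins for _ in range(3)]
    height, width = len(lab), len(lab[0])
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            yy, xx = y + dy, x + dx
            if not (0 <= yy < height and 0 <= xx < width):
                continue                # ignore pixels outside the image
            w = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for layer in range(3):
                b = min(int(lab[yy][xx][layer] * nbins), nbins - 1)
                hist[layer][b] += w     # Gaussian-weighted vote
    for layer in range(3):              # normalize each layer to sum to 1
        total = sum(hist[layer])
        hist[layer] = [v / total for v in hist[layer]]
    return hist
```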

The local histograms need to be computed only once, before the search phase begins. Given the overlap between neighboring histograms, we could sample the image sparsely and still obtain a good representation. However, in the interest of simplicity, here we compute and store local colour histograms centered at every pixel in the image. To simplify notation in the next two sections of the paper, we will denote local histograms H as single-layer, unit-area histograms that result from concatenating the individual histograms for the three layers of the CIELab colour space.

4.1. Choosing the Optimal Histogram Characteristics

To determine the optimal values for wsize and nbins, we performed an empirical evaluation of three histogram similarity measures on images from the Berkeley Segmentation Database. We randomly selected 75 images from the training image set of the BSD, and for each image we selected the human segmentation with fewest regions (under the assumption that this segmentation correctly captures the coarse structure of the image). We then randomly chose one of these 75 images, and extracted a pair of randomly located patches for a given wsize. We used the ground truth to determine whether the patches belonged to the same image region or not. We then computed histograms with different numbers of bins (between 2 and 50) from the extracted patches, and evaluated the similarity between them using histogram intersection, the χ² measure, and the JD-divergence (which is similar to the KL-divergence but has the advantage of being symmetric). See [15] for details on these measures.

We examined window sizes between 3 and 79 pixels, and for each window size we performed 2500 trials (each of them on a randomly chosen image). At the end of the sequence of trials, we computed a joint score for each combination of wsize and nbins. The joint score is the average similarity for pairs of patches from the same image region plus the average dissimilarity for patches from different regions (where dissimilarity = 1 − similarity, or vice-versa if the chosen measure yields dissimilarity). The results are shown in Fig. 2. From these plots we can select the optimal values for wsize and nbins that maximize the joint score. Notice that the three measures agree that smaller histograms with fewer bins yield the best results on the set of training images. Based on this, we set wsize = 11 and nbins = 4 in our grouping method, and will use these values throughout our experiments.

Figure 2. Joint score for the χ² (left), histogram intersection (centre), and JD-divergence (right) similarity measures for different window sizes and number of histogram bins. Blue indicates low scores, red indicates high scores.

To select the similarity measure, we follow the recommendations put forth by Rubner et al. in [15]. They performed a thorough experimental comparison of several measures of histogram similarity. Their results indicate that for the tasks of classification and image segmentation the χ² measure performs best at a reasonable computational cost (the Earth Mover's Distance has the best discrimination power, but Rubner et al. point out that for applications such as our algorithm, which require the similarity measure to be evaluated a large number of times, its computational cost becomes prohibitive). For the above reasons, we have chosen to quantify histogram similarity using the χ² measure, which for two histograms H1 and H2 is defined as

χ²(H1, H2) = Σ_i (f(i; H1) − f̂(i))² / f̂(i),    (2)

where f(i; H1) is the value of the ith bin of H1, and f̂(i) = (f(i; H1) + f(i; H2))/2. For unit-area histograms the χ² measure gives values in [0, 1], where 0 indicates that the histograms are identical.
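Eq. (2) translates directly into code; a minimal sketch, with a guard for bins that are empty in both histograms:

```python
def chi2(h1, h2):
    """Chi-squared distance of Eq. (2) between two unit-area histograms.
    Returns 0 for identical histograms; values lie in [0, 1] when both
    histograms have unit area."""
    total = 0.0
    for f1, f2 in zip(h1, h2):
        fhat = (f1 + f2) / 2.0
        if fhat > 0:                 # empty bins contribute nothing
            total += (f1 - fhat) ** 2 / fhat
    return total
```

Two fully disjoint unit-area histograms attain the maximum distance of 1, matching the stated range.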

5. Adaptive Grouping

Given the colour histograms on either side of a partial contour, we can bias the growth of the contour so that it favours the path that best separates two dissimilar colour distributions. More formally, we say that each partial contour splits the image into two regions: an inside region, which lies to the right of the contour when traveling in the direction of contour growth, and an outside region that lies to the left of the contour. We maintain and update colour histograms for the inside and outside colour distributions along the partial contour.

Each segment to be considered for grouping is associated with two local histograms corresponding to the inside and outside (right and left, respectively) with regard to the direction of contour growth (which always proceeds clockwise). To select the appropriate local histograms, we look in the direction perpendicular to the segment starting from its midpoint, and retrieve the pre-computed histograms at


a distance of ±(wsize + 1)/2. This choice ensures that we use histograms on either side of the segment that are close to the line, but still mostly contained within the inside or outside regions bounded by the contour. At the same time, the Gaussian weighting ensures that pixels close to the edges contribute little to the histograms, thus providing robustness to the colour variations that occur near image edges.

At search time, we compare the associated histograms of possible grouping choices with the colour histograms for the current contour, and we bias the search to favour grouping together edges that have similar colour distributions on the inside, as well as high dissimilarity between the inside and outside colour histograms. We denote the contour's inside histogram by Hb−in, its outside histogram by Hb−out, and the inside and outside histograms of a segment that is being considered as a possible extension to the contour by Hs−in and Hs−out. We then use χ² to compute the similarity between the colour distributions on the inside, and the dissimilarities between the colour distributions on opposite sides of the boundary: Sin = 1 − χ²(Hs−in, Hb−in), Do→i = χ²(Hs−out, Hb−in), and Di→o = χ²(Hs−in, Hb−out).

Here, Sin is simply the similarity between the inside colour distributions of the segment and the contour. Do→i is the dissimilarity between the colour distribution on the outside of the new segment and the inside of the contour. It discourages grouping with segments that are inside an object (in which case Sin is high, but the Do→i dissimilarity should be low). Di→o is the dissimilarity between the inside of the new segment and the outside of the contour; it is useful when no candidate segment can be found whose inside colour distribution matches the inside of the contour. In this case, we favour the segment whose inside is most dissimilar to the outside of the current contour. Given these three terms, we define the histogram-based colour affinity between a segment l and a partial contour B as Ccolour−affty(l, B) = (Sin + Do→i + Di→o)/3. This measure combines the three components of the colour affinity with equal weights. In general, an optimal weighting could be determined from results on training images.
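The three terms and their equal-weight combination can be sketched as follows; the χ² helper is repeated here so the sketch is self-contained.

```python
def chi2(h1, h2):
    # chi-squared histogram distance; 0 = identical, 1 = fully disjoint
    return sum((a - (a + b) / 2.0) ** 2 / ((a + b) / 2.0)
               for a, b in zip(h1, h2) if a + b > 0)

def colour_affinity(hs_in, hs_out, hb_in, hb_out):
    """C_colour-affty(l, B) = (S_in + D_o->i + D_i->o) / 3."""
    s_in = 1.0 - chi2(hs_in, hb_in)   # segment inside vs contour inside
    d_oi = chi2(hs_out, hb_in)        # segment outside vs contour inside
    d_io = chi2(hs_in, hb_out)        # segment inside vs contour outside
    return (s_in + d_oi + d_io) / 3.0
```

An ideal boundary segment (inside matches inside, outside fully distinct) scores 1, while a segment inside a uniformly textured region, with the same distribution on all four sides, scores only 1/3.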

Incorporating the colour affinity into the search procedure involves multiplying the original geometric affinity values by the newly computed colour affinities, and re-normalizing the resulting scores. For each line l2 in the set K that contains the best grouping choices for the last segment l1 in the current contour B, we compute

Taffty = (Gaffty(l1, l2) · Ccolour−affty(l2, B)) + γ,

and re-normalize the resulting, modified affinities. Once a segment has been chosen for grouping, it is added to the contour, and the segment's histograms are used to update the contour's appearance model. Since a contour with M segments has already accumulated M local histograms, the new inside histogram is given by Hb−in = (M/(M + 1)) Hb−in + (1/(M + 1)) Hs−in, and similarly for the outside histogram. The search algorithm is identical to the one described in Section 3 and uses the same thresholds on normalized affinity and compactness.

Figure 3. Clockwise from top-left: original image and three progressively longer contours (including the final, closed shape). For each contour, we show the similarity of the local histograms across the image with regard to the current model for the inside of the partial boundary (brighter is more similar). Notice that the similarity changes as the contour grows, and observe that parts of the image that should be inside the bear are given high similarity.
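The running-mean model update and the modified affinity can be sketched as follows (γ is an illustrative constant, as before):

```python
def update_model(hb, hs, M):
    """Running-mean update after adding the (M+1)-th segment:
    Hb <- (M/(M+1)) * Hb + (1/(M+1)) * Hs, applied bin by bin.
    Preserves unit area when both inputs have unit area."""
    return [(M * b + s) / (M + 1.0) for b, s in zip(hb, hs)]

def adaptive_affinity(g, c, gamma=0.01):
    """Modified total affinity before local re-normalization:
    T_affty = G_affty * C_colour-affty + gamma."""
    return g * c + gamma
```

The update gives every accumulated segment histogram equal weight, so early and late portions of the contour contribute equally to the appearance model.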

The above procedure has several important effects. First, it changes the exploration order of potential grouping choices: the algorithm will favour segments whose colour histograms agree with the current appearance model for the partial contour. Second, it reduces the chance that the grouping algorithm will wander into textured regions, even for structured texture patterns; this is because segments within a textured region are likely to have similar colour distributions on both sides. Third, it allows the algorithm to use information that was not available at the beginning of the search phase, and that keeps changing as the contour grows. The resulting adaptive search enables our algorithm to find object boundaries more accurately and with increased robustness. The adaptation process is illustrated in Fig. 3. Notice that the model does a good job of estimating which parts of the image are likely to be part of the inside of the object bound by the partial contours.

6. Estimating The Quality of a Contour

The contour grouping phase is likely to find hundreds of closed groups on moderately complex line-sets. These have to be ranked so that only a few contours, deemed to be 'salient', are reported back to the user. In the current grouping literature, the saliency of groups is either determined as a side-effect of the optimization algorithm being used (i.e., the contour found first by a given optimization process is by definition the most salient, as in [19]), or evaluated afterward using geometric information, either within a probabilistic (Bayesian) framework, such as in [5], or using specially defined saliency measures (for example, as in [12]). The important thing to notice is that such saliency measures depend almost exclusively on geometric information, and are thus likely to be unreliable in images with significant amounts of clutter and texture.

We propose that the quality of a contour should be related to the quality of the segmentation of the image induced by that contour. The best boundary neatly separates two visually distinct regions of the image. To quantify this, we evaluate the cost of encoding the image using the colour distributions inside and outside a hypothesized group. We compute colour histograms (here we again use the earlier notation, with three unit-area colour layers per histogram) Hin and Hout for the inside and outside of the contour respectively. Under the simplifying assumption that the three colour components for a given pixel are independent, the probability of drawing a pixel's colour x = (xL, xa, xb) from a particular colour histogram is given by p(x|H) = H(BxL) · H(Bxa) · H(Bxb), where Bxlayer gives the appropriate layer and bin in the histogram for the indicated colour component of pixel x.
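The per-pixel probability under this independence assumption can be sketched as follows; this is a hypothetical helper, and the `(3, nbins)` histogram layout and binning via `np.searchsorted` are assumptions for illustration, not the paper's data structures.

```python
import numpy as np

def pixel_probability(pixel, hist, edges):
    """p(x|H): product of per-channel bin probabilities, treating the
    L, a, b components as independent. `hist` is a (3, nbins) array with
    each row a unit-area histogram; `edges` is (3, nbins + 1) with the
    bin edges for each channel."""
    p = 1.0
    for layer in range(3):
        # Locate the bin for this channel's value (clipped to a valid index).
        b = np.searchsorted(edges[layer], pixel[layer], side='right') - 1
        b = min(max(b, 0), hist.shape[1] - 1)
        p *= hist[layer, b]
    return p
```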

We estimate the encoding cost of a contour as the sum of the negative log-probabilities of drawing each pixel in the image from the appropriate region (inside or outside of the contour). For the shape S bound by the contour,

EC = − ∑_{x ∈ S} log(p(x|Hin)) − ∑_{x ∉ S} log(p(x|Hout)).

This measure favours boundaries that separate two different colour distributions because histograms for well-separated distributions concentrate more probability mass on the bins of their corresponding pixels than histograms for more heterogeneous distributions (which are flatter, approaching uniform as the corresponding region becomes more heterogeneous). All groups extracted by the algorithm are sorted, and the contours with the lowest encoding cost are presented at the top of the list.
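Under the same assumptions, the encoding cost can be sketched as follows. The per-pixel probability maps `p_in`/`p_out` are assumed precomputed from Hin and Hout, and the epsilon guard against log(0) is an implementation detail not discussed in the paper.

```python
import numpy as np

def encoding_cost(inside_mask, p_in, p_out):
    """EC = -sum_{x in S} log p(x|Hin) - sum_{x not in S} log p(x|Hout).
    `inside_mask` is a boolean array marking pixels inside the contour;
    `p_in` and `p_out` hold each pixel's probability under the inside and
    outside histograms (same shape as the mask)."""
    eps = 1e-12  # guard against log(0) for pixels with empty bins
    cost_in = -np.log(p_in[inside_mask] + eps).sum()
    cost_out = -np.log(p_out[~inside_mask] + eps).sum()
    return cost_in + cost_out
```

Ranking then amounts to sorting the extracted groups by this cost, e.g. `sorted(groups, key=lambda g: encoding_cost(g.mask, g.p_in, g.p_out))`, with the lowest-cost contours reported first.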

7. Experimental Results

In this section we present a comparative study of grouping performance between our adaptive algorithm and three alternate grouping methods. The algorithms were tested on 21 images from the testing image set of the BSD. We selected only those images that contain at least one large, un-occluded object that is fully contained within the image bounds. We generated ground-truth object boundaries from the human segmentations provided with the BSD. Additionally, we selected 10 images from the training set of the BSD (following the same criteria) to be used as a training set for the parameters in our algorithm. The selected images depict complex objects in natural environments and usually contain large amounts of texture and clutter; as such, they constitute difficult perceptual grouping problems.

We compare our adaptive algorithm (AA) against a variation of our method that uses local colour histograms to compute a pairwise colour affinity for every pair of lines, but does not maintain or use an appearance model for the inside and outside of partial contours. The pairwise colour affinity is computed before the search phase begins and is multiplied directly onto the geometric affinity for the corresponding pair of segments. The resulting non-adaptive algorithm (NA) provides a baseline for judging whether the adaptation process provides any advantage over a method that uses colour, but not adaptation. We also compare AA against the original grouping method of Estrada and Jepson (EJ) [5], and against the Ratio Contour algorithm (RC) of Wang et al. [19]1.

All algorithms receive as input the same edge maps computed using the Canny edge detector. A robust line-fitting procedure was used to generate the line-sets for AA, NA, and EJ. Ratio Contour uses its own feature fitting process to generate a set of input fragments (small segments modeled with spline curves). The values of the parameters for our method were set on the training images. We discussed the optimal values for wsize and nbins in the text above. In addition to these, our algorithm requires σgap and γ for the geometric affinity, and a compactness threshold τcompact. We selected a low compactness threshold of τcompact = .5 to allow for reasonably non-compact shapes typical of objects such as long-legged animals (in contrast, the original EJ used τcompact = .8), and we set the values of σgap = 15 and γ = .1 so that the distribution of normalized affinities on our training images had the appropriate shape described in [4]. Finally, Estrada and Jepson suggest that values of τaffty in [1.1/K, 1.5/K] are the most useful, and we have observed that on our training images τaffty = (1.1/K) = .055 yields the best results. We use these same parameter values throughout all of our tests for both AA and NA. For EJ and RC we used the original parameters set by the authors of the corresponding algorithms, except that for EJ we used the same τaffty = .055 as with AA.

To evaluate the quality of a contour, we use a simple error measure based on the average distance between boundary pixels from two contours. If B1 is a detected boundary, and B2 is the corresponding ground truth for the image, we define the distance error between the two contours as

Derror = (1/NB1) ∑_{i=1}^{NB1} dist(B1(i), B2) + (1/NB2) ∑_{j=1}^{NB2} dist(B2(j), B1),

where NB1 is the number of boundary pixels in B1, dist(B1(i), B2) is the distance between boundary point i in

1 We would like to acknowledge the kindness of Dr. Song Wang in providing us with his implementation of RC.


Figure 4. Results on 3 test images from the BSD (481 × 321 pixels in size). From top to bottom: input line-sets, best contours extracted by AA, best boundaries from NA, best RC contours (using RC's own segment fitting procedure), and best groups found by EJ. Note that RC and EJ do not use colour information.

B1 and the closest boundary point in B2; and similarly for NB2 and dist(B2(j), B1). This distance error will be small when the detected and ground-truth boundaries are aligned with each other, without additional or missing sections.
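This distance error can be sketched directly; the brute-force nearest-neighbour search below is illustrative only (the paper does not specify how the nearest boundary point is found), and is fine for contours of modest length.

```python
import numpy as np

def distance_error(b1, b2):
    """Symmetric average boundary distance: the mean distance from each
    point of b1 to its nearest point in b2, plus the same with the roles
    reversed. `b1` and `b2` are (N, 2) arrays of boundary pixel coordinates."""
    # Pairwise Euclidean distances between every point of b1 and b2.
    d12 = np.linalg.norm(b1[:, None, :] - b2[None, :, :], axis=2)
    # Row minima: nearest b2 point per b1 point; column minima: the reverse.
    return d12.min(axis=1).mean() + d12.min(axis=0).mean()
```

Identical boundaries give an error of zero; extra or missing sections in either contour inflate one of the two averaged terms.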

For each image, we have a set of ground truth contours (corresponding to the different human segmentations of the same image and any large objects within it). For a given detected contour B1, we compute and report the average Derror with regard to all available, overlapping ground-truth contours from the corresponding image. Each algorithm was allowed to generate 10 contours (using its own ranking method), and from these contours we selected the one with lowest distance error for comparison. We do this because otherwise we would be unable to determine whether any observed differences in performance are due to the algorithms themselves, or if instead they are simply the result of differences in the ranking of output shapes. Figure 4 shows the resulting contours for several challenging images from our testing set. Notice the complexity of the input and the abundance of texture and clutter.

The results show that the adaptive algorithm (AA) is able to retrieve contours that more closely approximate the boundary of the main object in each scene. It is also clear that the non-adaptive NA method, while better than the original EJ and RC, cannot match the quality of the contours extracted with AA. This indicates that the appearance model is in fact providing AA with the necessary guidance that enables it to detect good object boundaries. RC performs well, though it still has problems on very cluttered scenes. It appears that RC benefits from the smoothing introduced by the spline approximation used for fitting the input fragments. As a result, contours produced by RC are usually very smooth. EJ suffers from the complexity of the input and often yields groups that incorporate texture elements. Figure 5 shows the distance error obtained by each algorithm on each of the test images, as well as the average distance errors over the complete test set. This figure shows that the adaptive algorithm produces better results on most of the test images, achieving a significant reduction in boundary localization error. It is worth noting that the adaptive algorithm performs well on difficult images on which the competing methods perform poorly.

Figure 5. Left: distance error (as defined in the text) for all of the 21 test images. Right: mean distance error over the complete data set for each of the algorithms.

The average run-times for the algorithms are: 3.2 min. for AA, 4.2 min. for NA, 5.2 min. for RC, and .5 min. for EJ. It is interesting to see that NA takes longer to find contours. Since both algorithms require an equivalent amount of computation per search step, this indicates that the appearance model in AA not only produces better contours, but also reduces overall search complexity when compared to a purely local approach.

We note that our method has some limitations. The algorithm can fail to detect a reasonable boundary if an object and its background have similar colour distributions (e.g. if the main object is camouflaged), or if several objects with similar appearance are in close proximity. Also, algorithm parameters are scale-dependent. In particular, for images that are significantly larger than those used here, texture features on an object may become larger than the local-histogram windows; the resulting colour bias would favour the formation of groups around individual texture elements. Clearly, a multi-scale approach would be desirable. One of the authors is currently working on a multi-scale grouping method and preliminary results [3] appear promising. This multi-scale algorithm is not based on the grouping framework described here, but instead builds on the work of Elder et al. [1]. On images half the resolution of those used here, the multi-scale method achieves a mean distance error (after adjusting for image size) of 36.32, which is very close to the error achieved by our adaptive algorithm. Results at full resolution are worse due to problems with the current multi-scale implementation.

It is worth noting that humans achieve a mean distance error of just a couple of pixels, which is much better than all of the grouping methods tested here. The gap in performance is similar to that observed for edge detection [13] and for image segmentation [6]. Clearly, some component of these tasks is not properly captured by current bottom-up algorithms. This should be a topic of reflection and discussion for the vision community.

The framework presented here could be extended without much effort to use texture information in the form of histograms of filter-bank responses (see [14]). The appropriate way to combine colour and texture can be learned from training images, as has previously been demonstrated by Martin et al. [13]. A more interesting extension involves the use of histograms of optical flow vectors to perform enhanced boundary extraction on motion sequences. These extensions are the topic of our current research. A more ambitious project would be to explore the use of recognition algorithms to modify the behaviour of the grouping phase at run-time. A match between part of an object boundary and a partial contour could be used to generate a spatial prior to guide contour formation. The grouping algorithm, in turn, could inform the recognition stage of its success or failure at extending a contour along some hypothesized shape; this could be used to either reject or refine possible object matches. However, robustly matching partial contours to object boundaries is a difficult and as yet open problem.

8. Conclusion

The method we have demonstrated here expands on current research in perceptual grouping by demonstrating that a grouping algorithm can efficiently and successfully change its behaviour at run-time in a context-dependent way. We show that an appearance model of the image regions on either side of a hypothesized contour can be used to guide boundary formation, and that dynamically adjusting grouping behaviour leads to the detection of better object boundaries on challenging natural images. Our method is efficient and robust, and it outperforms competing algorithms on complex images from the BSD.

References

[1] Elder, J., Krupnik, A., and Johnston, L., "Contour Grouping with Prior Models," PAMI, Vol. 25, No. 6, pp. 661-674, 2003.

[2] Elder, J., and Zucker, S., "Computing Contour Closure," ECCV, pp. 399-412, 1996.

[3] Estrada, F., and Elder, J., "Multi-Scale Contour Extraction Based on Natural Image Statistics," 5th IEEE Workshop on Perceptual Organization in Computer Vision, 2006.

[4] Estrada, F., and Jepson, A., "Controlling the Search for Convex Groups," Tech-Report CSRG-842, Department of Computer Science, University of Toronto, 2004.

[5] Estrada, F., and Jepson, A., "Perceptual Grouping for Contour Extraction," ICPR, Vol. 2, pp. 32-35, 2004.

[6] Estrada, F., and Jepson, A., "Quantitative Evaluation of a Novel Image Segmentation Algorithm," CVPR, Vol. 2, pp. 1132-1139, 2005.

[7] Gdalyahu, Y., Weinshall, D., and Werman, M., "Self-Organization in Vision: Stochastic Clustering for Image Segmentation, Perceptual Grouping, and Image Database Organization," PAMI, Vol. 23, No. 10, pp. 1053-1074, 2001.

[8] Huttenlocher, D., and Wayner, P., "Finding Convex Edge Groupings in an Image," IJCV, Vol. 8, No. 1, pp. 7-27, 1992.

[9] Jacobs, D., "Robust and Efficient Detection of Salient Convex Groups," PAMI, Vol. 18, No. 1, pp. 23-27, 1996.

[10] Koffka, K., Principles of Gestalt Psychology, Harcourt, Brace & World, 1935.

[11] Lowe, D., "Three-Dimensional Object Recognition from Single Two-Dimensional Images," AI, Vol. 31, No. 3, pp. 355-395, 1987.

[12] Mahamud, S., Williams, L., Thornber, K., and Xu, K., "Segmentation of Multiple Salient Closed Contours from Real Images," PAMI, Vol. 24, No. 4, pp. 433-444, 2003.

[13] Martin, D., Fowlkes, C., and Malik, J., "Learning to Detect Natural Image Boundaries Using Brightness and Texture," NIPS, 2002.

[14] Puzicha, J., Hofmann, T., and Buhmann, J. M., "Non-parametric Similarity Measures for Unsupervised Texture Segmentation and Image Retrieval," CVPR, pp. 267-272, 1997.

[15] Rubner, Y., Puzicha, J., Tomasi, C., and Buhmann, J. M., "Empirical Evaluation of Dissimilarity Measures for Color and Texture," CVIU, Vol. 84, pp. 25-43, 2001.

[16] Saund, E., "Finding Perceptually Closed Paths in Sketches and Drawings," PAMI, Vol. 25, No. 4, pp. 475-491, 2003.

[17] Swain, M., and Ballard, D., "Color Indexing," IJCV, Vol. 7, No. 1, pp. 11-32, 1991.

[18] Sarkar, S., and Soundararajan, P., "Supervised Learning of Large Perceptual Organization: Graph Spectral Partitioning and Learning Automata," PAMI, Vol. 22, No. 5, pp. 504-525, 2000.

[19] Wang, S., Kubota, T., Siskind, J. M., and Wang, J., "Salient Closed Boundary Extraction with Ratio Contour," PAMI, Vol. 27, No. 4, pp. 546-561, 2005.

[20] Wertheimer, M., A Sourcebook of Gestalt Psychology, Routledge and Kegan Paul, 1938.

[21] Williams, L., and Jacobs, D., "Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience," Neural Computation, Vol. 9, No. 4, pp. 837-858, 1997.

[22] Wyszecki, G., and Stiles, W. S., Color Science: Concepts and Methods, Quantitative Data and Formulae, Wiley, New York, 1982.