
A Benchmark for Interactive Image Segmentation Algorithms

Yibiao Zhao1,3, Xiaohan Nie2,3, Yanbiao Duan2,3, Yaping Huang1, Siwei Luo1

1 Beijing Jiaotong University, 2 Beijing Institute of Technology, 3 Lotus Hill Institute
{ybzhao.lhi,ybduan.lhi,xhnie.lhi}@gmail.com, {yphuang,swluo}@bjtu.edu.cn

Abstract

This paper proposes a general benchmark for interactive segmentation algorithms. The main contributions can be summarized as follows: (I) A new dataset of fifty images is released. These images are categorized into five groups: animal, artifact, human, building and plant. They cover several major challenges for the interactive image segmentation task, including fuzzy boundaries, complex textures, cluttered backgrounds, shading effects, sharp corners, and overlapping colors. (II) We propose two types of schemes, point-process and boundary-process, to generate user scribbles automatically. The point-process simulates the interaction process in which users incrementally draw scribbles on the major components of the image. The boundary-process simulates the refining process in which users place more scribbles near the segment boundaries to refine the details of the resulting segments. (III) We then apply two precision measures to quantitatively evaluate the resulting segments of different algorithms. The region precision measures how many pixels are correctly classified, and the boundary precision measures how close the segment boundary is to the real boundary. This benchmark offers a tentative way to guarantee evaluation fairness for person-oriented tasks. Based on the benchmark, five state-of-the-art interactive segmentation algorithms are evaluated. All the images, synthesized user scribbles, and running results are publicly available on the project webpage (http://www.imageparsing.com/seg.htm).

1. Introduction

Image segmentation is one of the most essential problems in the field of computer vision. Although the topic has been extensively studied, common segmentation algorithms often serve as a preprocessing step for other algorithms, and automatic segmentation can hardly obtain satisfactory results without high-level knowledge of the object of interest.


Figure 1. Images and ground-truth labels in the benchmark dataset. There are fifty images categorized into five classes: animal, artifact, building, human and plant.

The person-oriented approach, from another point of view, focuses on how to make the state of the art usable by the majority of "ordinary" people. The introduction of human interaction helps improve the performance of traditional segmentation methods towards real-life applications.


Figure 2. Some typical images in the benchmark dataset cover several major challenges for segmentation algorithms: fuzzy boundaries, overlapping colors, shading effects, complex textures, cluttered backgrounds, and sharp corners and edges.

Starting from Boykov et al. [2], interactive segmentation algorithms [11] [1] [4] [5] [10] have drawn wide attention from active researchers, and person-oriented techniques have become a hot topic in the last decade.

However, when human interference is involved, the comparison between algorithms can hardly be objective. For automatic segmentation, Martin et al. [8] first provided an image database containing a wide range of natural scenes and evaluated the precision of the resulting segment boundaries. Unnikrishnan et al. [12] proposed a similarity measure to perform a quantitative comparison between image segmentation algorithms. Recently, McGuinness and O'Connor [9] developed a software tool to collect feedback as a person uses an interactive segmentation algorithm.

In this paper, we strive to propose a general framework to evaluate interactive segmentation algorithms. The contributions of this paper include: (I) A complete dataset of five categories of images is made publicly available. Each image category contains ten representative images, and there is at least one salient object in each image. These images cover some major challenges of image segmentation, including fuzzy boundaries, complex textures, cluttered backgrounds, shading effects, sharp corners, and overlapping colors. Ground truths are precisely hand-labeled for each image. (II) Two schemes of human interaction, point-process and boundary-process, are proposed to objectively simulate the interactive process of drawing scribbles. The point-process draws points on key components of the foreground and background. The boundary-process simulates the process of boundary refinement after the point-process. By applying these two schemes, one can automatically generate scribbles and objectively evaluate the performance of interactive algorithms without human bias. (III) Two criteria, region precision and boundary precision, are further applied to evaluate the region coverage and boundary proximity for those two processes. An overview of our benchmark is shown in Fig. 1.

The remainder of this paper is organized as follows: Section 2 introduces the design of the dataset, and Section 3 presents the idea of interaction simulation. Section 4 describes the details of our evaluation methodology and analyzes the performance of the algorithms based on the quantitative results on our benchmark. Section 5 concludes the paper.

2. Dataset design

The dataset contains fifty images from the LHI database [13]. These images are selected from the categories of animal, artifact, building, human and plant. The animal category is the most challenging one; it contains wild animal images with fuzzy boundaries and complex textures. Some animals also have a color appearance very similar to the background, and this overlapping color between object and environment makes a simple foreground color distribution hard to distinguish from the background. In contrast to the animal category, the artifact category has relatively clear boundaries and smooth appearance, although shading effects still make the color distributions spread widely and even overlap with each other. The building category contains textured backgrounds and structured foregrounds, which looks easy to classify; however, some algorithms have problems dealing with the sharp edges and corners on the buildings. Cluttered backgrounds usually appear in the human category. Besides that, the color distributions of human clothes are sometimes multimodal, which is challenging for some parametric color models. In the images of the plant category, boundaries are very smooth, so the "zigzag" effect of some discrete optimization methods becomes visually apparent. The ground truths of all images are human-labeled. Our database is publicly available on the project website.


Figure 3. The two types of simulation, named point-process (top row) and boundary-process (bottom row). The level of the point-process increases from left to right to test the algorithms' ability to cover regions. In the bottom row, the gap near the boundary becomes bigger from left to right, making it harder to locate the boundary. As the level increases from one to four, it becomes more difficult to obtain a satisfactory segmentation result.

3. Interaction simulation

In person-oriented applications, a person is in the loop of the algorithm's iterations, and the algorithm runs based on the responses of users. Therefore, the experimental results can easily be manipulated by human interference. In order to offer a fair benchmark to evaluate different algorithms, we propose two schemes to automatically simulate the interactive process of a person drawing scribbles, as illustrated in Fig. 3.

3.1. Point-process simulation

Users label some key components of an image to indicate the major features of the foreground and background, and expect the computer to predict the desired labels for the remaining parts of the image. Point-process simulation generates several points to represent key features in the image. In our method, we use the k-means algorithm to find several clusters in color space; each cluster corresponds to an important color in the image. We then sample a pixel corresponding to the most representative color of each cluster, and the 10x10 area around that pixel is given the corresponding label as an initial scribble. In order to evaluate the performance of the algorithms progressively, we define four levels with an increasing number of color clusters. Either the foreground region or the background region has only three clusters at level one; at level four, there are up to 50 clusters for each label.
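As a concrete illustration of this scheme, the sketch below clusters the colors of the ground-truth foreground and background with k-means and stamps a 10x10 seed block at one representative pixel per cluster. This is only a minimal sketch under our assumptions: the function name, the use of scikit-learn's KMeans, the cluster counts for levels two and three (the paper only gives three clusters at level one and 50 at level four), and the guard that keeps each stamped block inside its own region are ours, not the authors' released code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Clusters per label at each level; levels two and three are our assumption,
# the paper only states 3 clusters at level one and up to 50 at level four.
LEVEL_CLUSTERS = {1: 3, 2: 10, 3: 25, 4: 50}

def simulate_point_process(image, gt_mask, level=1, patch=10, seed=0):
    """Generate point-process seed scribbles.

    image   : (H, W, 3) uint8 color image
    gt_mask : (H, W) bool array, True on the foreground
    returns : (H, W) int8 map, 1 = foreground seed, -1 = background seed, 0 = unlabeled
    """
    h, w, _ = image.shape
    scribbles = np.zeros((h, w), dtype=np.int8)
    k = LEVEL_CLUSTERS[level]

    for label_value, region in ((1, gt_mask), (-1, ~gt_mask)):
        ys, xs = np.nonzero(region)
        colors = image[ys, xs].astype(np.float64)
        km = KMeans(n_clusters=min(k, len(colors)), n_init=4,
                    random_state=seed).fit(colors)
        for c, center in enumerate(km.cluster_centers_):
            # Take the member pixel whose color is closest to the cluster center
            # as the "most representative" pixel of that cluster.
            members = np.nonzero(km.labels_ == c)[0]
            best = members[np.argmin(np.linalg.norm(colors[members] - center, axis=1))]
            y, x = ys[best], xs[best]
            y0, x0 = max(0, y - patch // 2), max(0, x - patch // 2)
            block = region[y0:y0 + patch, x0:x0 + patch]
            # Stamp the patch, but only on pixels of the same region (our guard
            # against seeds leaking across the true boundary).
            scribbles[y0:y0 + patch, x0:x0 + patch][block] = label_value
    return scribbles
```

A call such as simulate_point_process(img, mask, level=3) would then produce a level-three input in the spirit of Fig. 3 (top row), up to the randomness of the k-means initialization.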

3.2. Boundary-process simulation

We provide another simulation named boundary-process. It simulates the refining process in which users place more scribbles near the segment boundaries to refine the details of the resulting segments. In this process, we give the inner parts of the foreground and background known labels, and only leave a band near the boundary to be processed. This input is used to evaluate how precise the resulting segment is once the major parts are labeled. We also define four levels of input with decreasing widths of the unlabeled band: at level one the width is 40 pixels, and at level four it decreases to 10 pixels.
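The band input can be reproduced with a distance transform, as in the sketch below. We read the stated width as the total width of the unlabeled band (half on each side of the ground-truth boundary), the widths for levels two and three are our interpolation between the 40-pixel and 10-pixel endpoints given above, and scipy.ndimage.distance_transform_edt stands in for whatever the authors actually used.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Total width of the unlabeled band per level; levels two and three are assumed,
# the text only gives 40 pixels at level one and 10 pixels at level four.
LEVEL_BAND = {1: 40, 2: 30, 3: 20, 4: 10}

def simulate_boundary_process(gt_mask, level=1):
    """Pre-label the interiors and leave an unlabeled band around the boundary.

    gt_mask : (H, W) bool array, True on the foreground
    returns : (H, W) int8 map, 1 = foreground seed, -1 = background seed, 0 = band
    """
    half = LEVEL_BAND[level] / 2.0
    dist_to_bg = distance_transform_edt(gt_mask)    # distance of fg pixels to background
    dist_to_fg = distance_transform_edt(~gt_mask)   # distance of bg pixels to foreground
    seeds = np.zeros(gt_mask.shape, dtype=np.int8)
    seeds[gt_mask & (dist_to_bg > half)] = 1        # deep inside the object
    seeds[~gt_mask & (dist_to_fg > half)] = -1      # deep inside the background
    return seeds
```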


Method                           Animal      Artifact    Building    Human       Plant
                                 bp    rp    bp    rp    bp    rp    bp    rp    bp    rp
Boykov et al. [2]  ICCV 2001     0.26  0.70  0.40  0.68  0.30  0.60  0.30  0.68  0.26  0.79
Bai et al. [1]     IJCV 2009     0.28  0.58  0.36  0.60  0.33  0.61  0.32  0.61  0.28  0.73
Couprie et al. [4] ICCV 2009     0.26  0.74  0.56  0.82  0.41  0.81  0.27  0.63  0.32  0.80
Grady [5]          PAMI 2006     0.24  0.73  0.48  0.80  0.40  0.78  0.29  0.64  0.32  0.81
Noma et al. [10]   CoRR 2008     0.35  0.64  0.68  0.80  0.51  0.73  0.40  0.70  0.40  0.83

Table 1. The segmentation precision (bp = boundary precision, rp = region precision) of the five algorithms on the five image categories.

Method                           boundary-process level        point-process level
                                 1     2     3     4           1     2     3     4
Boykov et al. [2]  ICCV 2001     0.21  0.26  0.31  0.43        0.41  0.63  0.80  0.82
Bai et al. [1]     IJCV 2009     0.19  0.24  0.33  0.50        0.36  0.53  0.74  0.80
Couprie et al. [4] ICCV 2009     0.23  0.30  0.38  0.54        0.50  0.72  0.84  0.88
Grady [5]          PAMI 2006     0.22  0.28  0.35  0.52        0.46  0.71  0.84  0.88
Noma et al. [10]   CoRR 2008     0.37  0.40  0.47  0.64        0.49  0.69  0.82  0.87

Table 2. The segmentation precision of the five algorithms at the four simulation levels.

4. Quantitative evaluation

With a dataset containing challenging images and two kinds of simulated scribbles, we then need quantitative evaluation criteria for the two simulations. For the results generated by the point-process simulation, we apply a criterion that evaluates region coverage. For the boundary-process, most of the region is pre-labeled, leaving only the gap near the boundary for the algorithms to handle, so here we evaluate the proximity between the resulting boundary and the desired boundary.

4.1. Region coverage

We denote the overlap ratio of the foreground object as the region segmentation precision,

    RP(Λ_R, Λ_R^G) = |Λ_R ∩ Λ_R^G| / |Λ_R ∪ Λ_R^G|,                                       (1)

where Λ_R and Λ_R^G are the foreground regions of the segmentation result and the ground truth, respectively. The region coverage ratio RP(Λ_R, Λ_R^G) is the ratio of the intersection to the union of Λ_R and Λ_R^G, and its value is a real number ranging from 0 to 1, where 1 means every pixel is labeled correctly.
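In code, Eq. (1) is a plain intersection-over-union of the two binary foreground masks. The sketch below (function and variable names are ours) is all that is needed to compute the rp values, given the binary output of each algorithm and the ground-truth mask.

```python
import numpy as np

def region_precision(result_mask, gt_mask):
    """Eq. (1): |result ∩ gt| / |result ∪ gt| over the foreground pixels."""
    result_mask = np.asarray(result_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(result_mask, gt_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treat the segmentation as perfect
    return float(np.logical_and(result_mask, gt_mask).sum()) / union
```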

4.2. Boundary proximity

With more areas marked by the point-process, almost all methods can obtain a coarse result without major failures. At this point, users will focus more on the boundary. The input scheme of the boundary-process is designed to evaluate the ability to locate the boundary precisely. By introducing an undirected chamfer distance, we define the boundary locating accuracy as

    BP(Λ_B, Λ_B^G) = 1 / [ ( Σ_u min_v d(u, v) + Σ_v min_u d(v, u) ) / ( #u + #v ) ],      (2)

where u ∈ Λ_B and v ∈ Λ_B^G are the pixels on the resulting boundary and the ground-truth boundary, and #u and #v denote the numbers of pixels on the corresponding boundaries. This is a more rigorous metric that amplifies the subtle differences between the result and the ground truth.
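A sketch of Eq. (2) under our reading: extract the boundary pixels of both masks, average the symmetric nearest-neighbour distances, and take the reciprocal. The 4-neighbourhood boundary extraction and the use of a k-d tree in place of an explicit chamfer/distance transform are our choices for the illustration, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def boundary_pixels(mask):
    """(row, col) coordinates of mask pixels with a 4-neighbour outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def boundary_precision(result_mask, gt_mask):
    """Eq. (2): reciprocal of the mean undirected nearest-neighbour (chamfer) distance."""
    u = boundary_pixels(result_mask)           # pixels on the result boundary
    v = boundary_pixels(gt_mask)               # pixels on the ground-truth boundary
    d_uv, _ = cKDTree(v).query(u)              # min_v d(u, v) for every u
    d_vu, _ = cKDTree(u).query(v)              # min_u d(v, u) for every v
    mean_dist = (d_uv.sum() + d_vu.sum()) / (len(u) + len(v))
    return 1.0 / mean_dist if mean_dist > 0 else float("inf")
```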

4.3. Results and analysis

In order to provide a set of baseline results and evaluate state-of-the-art algorithms on the new dataset, we test five representative algorithms: Graph cuts [2], Geodesic matting [1], Random walker [5], Power watersheds [4] and Structural Interactive Segmentation [10].

The region coverage precision (rp) and boundary proximity precision (bp) of the five algorithms on each image category are shown in Table 1. The animal category has the lowest boundary proximity precision, while the artifact category has the highest. The human category gets a poor region coverage precision because of the cluttered background, while the algorithms can easily extract the foreground in the plant category due to its nearly independent color distribution. From the experimental results, one can easily see that Graph cuts [2] performs well on the human category, while Power watersheds [4] can handle the sharp edges that exist in artifact and building images.

Table 2 shows the segmentation precisions of these algorithms at the four simulation levels. The highest region coverage precision of Power watersheds [4] at every simulation level reveals its excellent region extraction ability. It is interesting to notice that, as the simulation level increases, the performance of Random walker [5] approaches that of Power watersheds. Structural Interactive Segmentation [10] beats all the others with boundary-based interaction, which reveals that this algorithm has a strong ability to attract the resulting boundary towards the real one once the major components of the image are labeled.

5. Conclusion

In this paper, we propose a general benchmark to evaluate interactive segmentation algorithms. We collect a diverse dataset of natural images composed of five categories. Two interaction simulation schemes are proposed to simulate the user interaction process, and two criteria are applied to evaluate the region coverage and boundary proximity under the two schemes. Five state-of-the-art algorithms are evaluated, and all the experimental results, together with the dataset, are available on the project website (http://www.imageparsing.com/seg.htm).

In the future, we plan to extend the dataset to contain more images and more categories, e.g. medical images and aerial images. Other interactive techniques, such as bounding-box-based interaction [11] or boundary-based interaction [6], are also interesting to us. The goal of our work is to push the boundaries of algorithm performance and to inspire new ideas for person-oriented tasks.

6. Acknowledgments

The work at Beijing Jiaotong University is supported by China 863 Program 2007AA01Z168, NSF China grants 60975078, 60902058, 60805041, 60872082, 60773016, Beijing Natural Science Foundation 4092033, and Doctoral Foundations of the Ministry of Education of China 200800041049. The work at the Lotus Hill Institute is supported by China 863 Program 2007AA01Z340, 2009AA01Z331 and NSF China grants 60970156, 60728203.

References

[1] X. Bai and G. Sapiro. Geodesic matting: A framework for fast interactive image and video segmentation and matting. International Journal of Computer Vision, 82, 2009.

[2] Y. Y. Boykov and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proceedings of the International Conference on Computer Vision, volume 1, pages 105–112, 2001.

[3] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. International Journal of Computer Vision, 22:61–79, 1995.

[4] C. Couprie, L. Grady, L. Najman, and H. Talbot. Power watersheds: A new image segmentation framework extending graph cuts, random walker and optimal spanning forest. In Proceedings of the International Conference on Computer Vision, 2009.

[5] L. Grady. Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 2006.

[6] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1:321–331, 1988.

[7] V. Lempitsky, P. Kohli, C. Rother, and T. Sharp. Image segmentation with a bounding box prior. In Proceedings of the International Conference on Computer Vision, 2009.

[8] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the International Conference on Computer Vision, volume 2, 2001.

[9] K. McGuinness and N. E. O'Connor. A comparative evaluation of interactive segmentation algorithms. Pattern Recognition, 43, 2010.

[10] A. Noma, A. B. V. Graciano, L. A. Consularo, R. M. J. Cesar, and I. Bloch. A new algorithm for interactive structural image segmentation. Technical report, 2008.

[11] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23:309–314, 2004.

[12] R. Unnikrishnan, C. Pantofaru, and M. Hebert. Toward objective evaluation of image segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):929–944, April 2007.

[13] B. Yao, X. Yang, and S.-C. Zhu. Introduction to a large-scale general purpose ground truth database: Methodology, annotation tool and benchmarks. In Proceedings of Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 169–183, 2007.


Figure 4. A screenshot of the two browsing modes on our web pages. The left page shows the segmentation results obtained from different algorithms; the bar plot shows the region precision for each algorithm. The right page shows several results produced by Graph cuts. These images are from the human category, and the input is point-process level 3.
