salient object subitizing - cv-foundation.org · salient object subitizing ... bers, maartje ej...

Post on 07-Oct-2018

225 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Salient Object Subitizing

Jianming Zhang1, Shugao Ma1, Mehrnoosh Sameki1, Stan Sclaroff1, Margrit Betke1, Zhe Lin2, Xiaohui Shen2, Brian Price2, Radomír Mech2

1Department of Computer Science, Boston University. 2Adobe Research.

Figure 1: How many salient objects are in each image?

How quickly can you tell the number of salient objects in each image in Fig.1? It was found over a century ago that people are equipped with a remark-able capacity to effortlessly and consistently identify 1, 2, 3 or 4 items bya simple glance [2]. This phenomenon, later coined by Kaufman, et al. asSubitizing [3], has been observed under various measurements. It is shownthat apprehension of small numbers up to three or four is highly accurate,quick and confident, while beyond this subitizing range, the feeling is lost.Accumulating evidence also shows that infants and even certain species ofanimals can differentiate between small numbers of items within the subitiz-ing range, suggesting that subitizing may be an inborn numeric capacity ofhumans and animals. It is speculated that subitizing is a preattentive andparallel process, and that it can help humans and animals make prompt de-cisions in basic tasks like navigation, searching and choice making.

In this paper, we propose a subitizing-like approach to estimate the num-ber (0, 1, 2, 3 and 4+) of salient objects in a scene, without resorting to anyobject localization process. Solving this Salient Object Subitizing (SOS)problem can benefit many computer vision tasks and applications.

Knowing the existence and the number of salient objects without theexpensive detection process (e.g., sliding window detection) can enable amachine vision system to select different processing pipelines at an earlystage, making it more intelligent and reducing computational cost. For ex-ample, SOS can help a computer vision system suppress the object detectionprocess, until the existence of salient objects is detected, and it can also pro-vide cues for selecting among search strategies and early stopping criteriabased on the predicted number. Differentiating between scenes with zero, asingle and multiple salient objects can also facilitate applications like robotvision, egocentric video summarization, snap point prediction, iconic imagedetection and image thumbnailing, etc.

To study the SOS problem, we provide a new image dataset of about7000 images, where the number of salient objects in each image has beenannotated by Amazon Mechanical Turk (AMT) workers. The dataset isavailable on our project website: http://www.cs.bu.edu/groups/ivc/Subitizing/. In Fig. 2, we show some sample images in the pro-posed SOS dataset with the collected ground-truth labels. Although thereare no bounding box annotations accompanying the numbers, it is usuallypretty straightforward to see which objects these numbers refer to. The an-notations from the AMT workers are further analyzed in a more controlledoffline setting, which shows a high inter-subject consistency in subitizingsalient objects.

Our ultimate goal is to develop a fast and accurate computational methodto estimate the number of salient objects in natural images. The trivialcounting-by-detection approach is quite challenging in this scenario, dueto cluttered background, occlusion, and large appearance, position and scalevariations of the objects in everyday images. Instead, inspired by the psy-chological observation that the subitizing is likely to be accomplished byrecognizing holistic patterns [1], the proposed SOS method bypasses thechallenging salient object localization process by using global features.

An implementation of our SOS method using an end-to-end Convolu-tional Neural Network (CNN) classifier attains 94% accuracy in detecting

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

0 1 2

3 4+

Figure 2: Sample images of the proposed SOS dataset. These images covera wide range of content and object categories.

94%(319)

4%(15) 0 0%

(1)1%(3)

7%(41)

82%(507)

8%(50)

2%(11)

1%(8)

3%(7)

31%(68)

42%(91)

18%(40)

6%(13)

6%(8)

17%(23)

17%(23)

45%(61)

16%(22)

6%(4)

3%(2)

10%(7)

14%(10)

67%(46)

0

1

2

3

4+

0 1 2 3 4+

Figure 3: Confusion matrix of our method using the fine-tuned CNN model.Each row corresponds to a ground-truth category. The percentage reportedin each cell is the proportion of images of category A (row number) labeledas category B (column number).

the existence of salient objects, and 42-82% accuracy (chance is 20%) inpredicting the number of salient objects (1, 2, 3, and 4+) on our dataset (seeFig. 3), without resorting to any intermediate saliency map computation orsalient object detection. These results are quite encouraging, consideringthat our CNN-based SOS method is capable of processing an image in acouple of milliseconds.

We demonstrate applications of the SOS technique in guiding salientobject detection and object proposal generation, resulting in state-of-the-art performance. In the task of salient object detection, we demonstratethat SOS can help improve accuracy by identifying images that contain nosalient object. In the task of object proposal generation, we present a simplecontent-aware proposal allocation approach using SOS, and show consistentimprovement over state-of-the-art.

[1] Brenda RJ Jansen, Abe D Hofman, Marthe Straatemeier, Bianca MCWBers, Maartje EJ Raijmakers, and Han LJ Maas. The role of patternrecognition in children’s exact enumeration of small numbers. BritishJournal of Developmental Psychology, 32(2):178–194, 2014.

[2] W Stanley Jevons. The power of numerical discrimination. Nature, 3:367–367, 1871.

[3] EL Kaufman, MW Lord, TW Reese, and J Volkmann. The discrimi-nation of visual number. The American Journal of Psychology, pages498–525, 1949.

top related