a computing model of selective attention for service robot...

Research ArticleA Computing Model of Selective Attention forService Robot Based on Spatial Data Fusion

Huanzhao Chen and Guohui Tian

School of Control Science and Engineering Shandong University Jinan Shandong 250061 China

Correspondence should be addressed to Guohui Tian ghtiansdueducn

Received 23 March 2018 Accepted 14 May 2018 Published 2 July 2018

Academic Editor L Fortuna

Copyright copy 2018 Huanzhao Chen and Guohui Tian This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Robots and humans are facing the same problem they all need to face a lot of perceptual information and choose valuableinformation Before the robots provide services they need to complete a robust real-time selective attention process in the domesticenvironment Visual attention mechanism is an important part of human perception which enables humans to select the visualfocus on the most potential interesting information It also could dominate the allocation of computing resource It also couldfocus humanrsquos attention on valuable objects in the home environmentTherefore we are trying to transfer visual attention selectionmechanism to the scene analysis of service robots This will greatly improve the robotrsquos efficiency in perception and processinginformation We proposed a computing model of selective attention which is biologically inspired by visual attention mechanismwhich aims at predicting focus of attention (FOA) in a domestic environment Both static features and dynamic features arecomposed in attention selection computing process Information from sensor networks is transformed and incorporated intothe model FOA is selected based on a winner-take-all (WTA) network and rotated by inhibition of return (IOR) principle Theexperimental results showed that this approach is robust to the partial occlusions scale-change illumination and variations Theresult demonstrates the effectiveness of this approachwith available literature on biological evidence Some specific domestic servicetasks are also tailored to this model

1 Introduction

Humans are able to choose the salient part of the visual sceneand focus on this area Building a computing model to detectsuch salient has become a popular approach Its advantagelies in the allocation of computing resources in the follow-ing cognition and analysis operations The most populardefinition for selective attention has been proposed by thepsychologist J WilliamThe potential of an area or a locationin an image to appear distinct from the rest in the scene isroughly referred to as its visual salience It is believed that thesaliency is affected by top-down knowledge and bottom-upstimulus-driven processes [1] Saliency computing and selec-tive attention mechanism are widely used in many applica-tions including object of interest image segmentation visualscene analysis and object recognition [2]

Here a question is raised about how to choose a spe-cific focus of attention based on multiple formations that

describe the visual scene Based on this theory most popularapproaches of bottom-up attention Itti model hypothesizedthat many kinds of feature maps on image feed into a uniquerepresentation called saliencymapThe saliencymap is a two-dimensional scalar map representation while it topographi-cally represents activity saliency of image and is used tochoose the salient location Specifically when this location issalient it is referred to as an active location of a saliency mapnomatter the facts that it corresponds to a thing of an obviouscolour in a dark background or to a moving object in a stillbackground Switching attention based on this scalar topo-graphical representation to focus on the most salient locationcould draw attention on the area of the highest activity in thesaliency map[3]

The concept of ldquosalientrdquo is usually mentioned in the proc-ess of bottom-up computations The term ldquosaliencyrdquo charac-terizes specific parts in a scene that might be users or objectsthat are assumed to be relative to the neighbouring area [4]

HindawiJournal of RoboticsVolume 2018 Article ID 5368624 9 pageshttpsdoiorg10115520185368624

2 Journal of Robotics

Figure 1 Illustration of intelligent space system

The representation of saliency in a specific scene is compu-tationally introduced in the process of choosing spatial areaFor selection and convergence process when one wins thespatial competition in one or more feature maps at differentspatial scales the related location is defined as salient Afterthat detailed measurement of saliency integrating multiplekinds of feature maps is encoded by saliency map In thatway it provides potential heuristic information for choosingattention on salient locations without consideration of thedetailed feature responses that made those locations salientThe saliency map is where the attention spotlight should beplaced as well as the most salient location in the scene [5]A proper neural architecture to find the area needing userrsquosattention from the generated saliency map is winner-take-allnetwork which implements a neural distributed maximumdetector [4] Using this mechanism however brings anothercomputing problem how to lay attention on the differentsaliency location or winning area in a saliency map Oneplausible computing strategy has already got biological exper-imental support It contains transiently inhibiting neurons forcurrent focus of attention in presenting saliency map Thecurrently attended location is then suppressed This progresswill be repeated and generates attention recording This kindof inhibitory strategy aims at transferring the FOA Thisphenomenon has been named as ldquoinhibition of returnrdquo (IOR)and became a popular way of simulating selective attentionmechanism [5] By using IOR process we are able to rapidlyfind and shift the focus of attention on different salientlocations with decreasing saliency rather than being boundto attend only to the location of maximal saliency at a giventime

We also use intelligent space to obtain information aboutthe users in a comprehensive way Intelligent space is capableof improving both perception and executive skills of servicerobots Figure 1 illustrates the overview of intelligent spaceUsers are capable of communicating with both robots andmachines with the help of intelligent space Many usefulservices could be provided by actuators discretely installed inthe space The main suggestion of the intelligent space is toinstall sensors and actuators separately in domestic space sothat service robot could provide timely stably and intelligentservice with limited configuration [6] The sensor networkcan capture human motion information as the user moves ineveryday life scenes In the meantime intelligent space canoffer some service to users by its actuators It is importantto fuse the data from different resources as the sensornetwork of intelligent space is discretely distributed in target

environment Cellular neural network is an effective way todeal with spatiotemporal temporal processing [7] Here wechoose spatial transformation as a mathematical way in datafusion operation This mathematical approach is relativelysimple and straightforward comparing with neural networkIn conclusion intelligent space is utilized in our researchaiming to develop both perception skills and executing skillsfor robots

In this paper we proposed a computing model of visualattention based on data fusion using intelligent space Themodel is turned out to select visual attention accurately androbustly by a set of experiments

2 Related Work

The popular attention selection model is inspired by thebiological structure and the behaviour of human visual sys-tem demonstrations [1 8] Usually saliency map is proposedbased on topographical combination with multiscale fea-tures in colour images Gabor filtering and center-surroundoperator are used to process the features Center-surroundmechanism is inspired by photoreceptor interactions andvisual receptive field organization It consists of the center andthe surround as the feature values at the corresponding imagepixels derived from responses to stimuli [9] A dynamicalneural network based is built on WTA network and IORmechanism to choose focus of attention and to balance thesaliency map A few other saliency computing models basedon Itti model for salience computation are presented [10ndash13]

In recent years some models are proposed by new ap-proaches other than Ittirsquos approach Zhao proposed a multi-context deep learning framework for salient object detectionwhere local context and global context are all concerned[14] Wang presented a deep neural network based saliencydetection mechanism by integrating global search and localestimation [15 16] Liu presented saliency tree as a novelsaliency detection framework which could provide a hierar-chical representation of saliency for generating high-qualityregional and pixel-wise saliencymaps [17] Zhang proposed asaliency-guided unsupervised feature learning framework forscene analysis [18]Wei proposed an algorithm for simultane-ously localizing objects and discovering object classes basedon bottom-up multiple class learning [19] Lin proposed anovel salient object detection algorithm by integrating multi-level features including local contrast global contrast andbackground priors [20] Chen proposed a protoobjects com-putational model by predicting fixations using a combinationof saliency and mid-level representations of shape [21]

Most models of fixation prediction operation at thefeature level are like the classical Itti model Others emphasizethat salient object is more important while defining it needsresearchers annotation Most of the existing models considerlow-level visual features and locations of objects The majordifference among them is the method for dealing withincoming sensor information feature types and computingsaliency For domestic service robot there are two mainshortages At first considering the complex environment ofdomestic environment blocking is a big troubleThen humanbehaviour is a key component of robot attention for service

Journal of Robotics 3

Figure 2 Illustration of selective attention model procedure

task so that human behaviour deserves more interests in thesaliency computing process Unlike other studies which haveemphasize elaborately designed low-level feature descriptorsour approach proposed a saliency model with the help ofintelligent space platform So more information from differ-ent cameras is mixed and computed This has the potentialto improve the accuracy and robustness of the saliency com-puting

3 A Saliency Computing Model ofSelective Attention

Thegeneral procedure of saliency computingmodel for selec-tive attention is illustrated in Figure 2 Raw images (colourimage time ordered image sequences and depth image) areobtained from cameras distribution from intelligent space Atfirst features are extracted using Gabor filter and differenceof Gaussian filter Besides dynamic features are calculated bycomputing of optical flow Then the homogeneous transfor-mation matrix is applied for representing geometric relationsbetween cameras from different coordinate system After thata saliency map is proposed for nonlinearly combining thefeatures and feeds them into a WTA network By integratingwith inhibition of return method the FOA is finally selectedand rotated

31 Feature Extraction and Processing For this computingmodel the input is captured as colour images and depthimages Colour images are obtained being digitized at aspecific resolution Based on dyadic Gaussian pyramids andbeing created in few spatial scales this is considered as pro-gressively low-pass filter The colour images are subsampledto produce horizontal and vertical image-reduction factorsDepth images are obtained by infrared sensor of MicrosoftKinect and are capable of detecting user movement in keyareas

Figure 3 Illustration of multiview situation from sensor network inintelligent space

Static features could be extracted from one single imageand represent some static character (like outline and chro-matic aberration) Colour images are obtained from ceilingcamera from intelligent space platform as shown in Figure 3As 119887 119892 and 119903 refer to the blue green and red from the colourimages then we build intensity feature map119868 = (119903 + 119892 + 119887)3This measures the intensity value of all colour channels ofan image Four broadly tuned colour channels are created asfollows

119861 = 119887 minus (119892 + 119903)2 (1)

119877 = 119903 minus (119892 + 119887)2 (2)

119866 = 119892 minus (119887 + 119903)2 (3)

119884 = (119887 + 119903)2 minus 1003816100381610038161003816119903 minus 11989210038161003816100381610038162 minus 119887 (4)

for red green red and yellow Based on these colourchannels four Gaussian pyramids are created as 119877(120590) 119866(120590)119861(120590) and119884(120590) while the scale is 120590 isin [0 1 2 3 8] By usingGaussian pyramid in image processing to extract multiscalefeatures and downsampling we defined the difference ofcross scale as ⊝ between two feature maps based on center-surround operation This operation is generated based onpoint-by-point subtraction and interpolation This comput-ing process is highly relative to the sensitive features which isshown in a series of maps 119868(119888 119904) with 119904 = 119888 + 120575 119888 isin [2 3 4]and 119904 isin [3 4]

119868 (119888 119904) = |119868 (119888) ⊝ 119868 (119904)| (5)

By simulating the colour-double-opponent system in partsof human brain chromatic and spatial opponency is con-structed for the redgreen greenred blueyellow and yel-lowblue colour pairs in our model

119877119866 (119888 119904) = |(119877 (119888) minus 119866 (119888)) ⊝ (119866 (119904) minus 119877 (119904))| (6)

119861119884 (119888 119904) = |(119861 (119888) minus 119884 (119888)) ⊝ (119861 (119904) minus 119884 (119904))| (7)


Gabor filters are the product of a cosine grating and a 2DGaussian envelope approximating the receptive field sensi-tivity profile (impulse response) of orientation selective neu-rons in primary visual cortex Next we extracted informationof local orientation from based on Gabor pyramids O(120590 120579)where 120590 isin [0 1 2 8] represents the scale and 120579 isin[0∘ 45∘ 90∘ 135∘] is the preferred orientation

119866 (119909 119910 120579) = exp(minus 11990910158402 + 12057421199101015840221205752 ) cos(2120587 11990910158402120582 + 120595) (8)

Orientation feature maps O(119888 119904 120579) are encoded as a grouplocal orientation contrast between the center and surroundscales

O (119888 119904 120579) = |O (119888 120579) ⊝ (119904 120579)| (9)

Here we obtain a depth image based on infrared camera fromMicrosoft Kinect At first Kinect captures Speckle Patternin different scales and it is saved as Primary or ReferencePatternThen the infrared camera detects the objects and getsits corresponding Speckle Pattern The depth informationis obtained by matching Primary or Reference Pattern andtriangulation method of ground resistance test Finally itgenerates a depth image to describe the spatial informationby using 3D Reconstruction and Local offset After a normal-ization operation we can get depth map 119898119889 to describe thespatial relationship of the environment

Next the dynamic features from image sequences areextracted by computing optical flow The mobile objects(normally users and robots) are salient part and deservemoreattention from service robots Suppose the image intensityfunction 119868(119909 119905) is

119868 (119909 119905) asymp 119868 (119909 + 120575119909 119905 + 120575119905) (10)

where 120575119909 is the displacement of the local image region at (119909 119905)after time 120575119905 The left side of the equation is expanded basedon a Taylor series

119868 (119909 119905) = 119868 (119909 119905) + nabla119868 sdot 120575119909 + 120575119905 sdot 119868119905 + O2 (11)

where nabla119868 = (119868119909 119868119910) and 119868119905 are the first-order partial deriva-tives of 119868(119909 119905) and O2 is the second- and higher-order termswhich are assumed negligible Subtract 119868(119909 119905) on both sideswhile ignoring O2 and dividing by 120575119905

nabla119868 sdot (V) + 119868119905 = 0 (12)

This equation is called optical flow constraint equationSuppose (119906 V) is the image velocity while nabla119868 = (119868119909 119868119910) isthe spatial intensity gradient Optical flow is defined as thedistribution of the velocity in given part of the image Theoptical flow at given time is described as feature map 11989811990011989132 Space Transformation Using Coordination TransformIn this chapter we proposed a systematic and generalizedmethod about three-dimensional coordinate transformThatis because the service robot and visual devices under real

circumstances are from different area with respect to dif-ferent reference coordinates We use a 3D Euclidean space-homogeneous coordinate representation to develop matrixtransformations This operation includes scaling rotatingtranslating and perspective transformation So features canbe flexibly selected to generate a saliency map and focus theattention accurately In this way some common problems likepattern noise can be successfully solved

The homogeneous coordinate representation implies thatthe 119899-dimensional vector could be represented by a 119899+1 com-ponent vector Thus in a 3D space a position vector and anaugmented vector w can be used to represent that a positionvector p is in homogeneous coordinate representation

w times p = (119908119901119909 119908119901119910 119908119901119911 119908)119905 (13)

In physical world the coordinates corresponded to homoge-neous coordinate as follows

119901119909 = 119908119901119909119908 119901119910 = 119908119901119910119908 119901119911 = 119908119901119911119908

(14)

The component 119908 represents the scale factor of the homoge-neous coordinate The homogeneous transformation matrixis a 4 times 4 matrix This matrix is used to map a homogenouscoordinate position vector from one coordinate system to theother coordinate system The homogeneous transformationmatrix 119879 is as follows

119879 = [R3times3 p3times1f1times3 1199081times1]

= [ 119877119900119905119886119905119894119900119899119872119886119905119903119894119909 119875119900119904119894119905119894119900119899119881119890119888119905119900119903119875119890119903119904119901119890119888119905119894V119890119879119903119886119899119904119891 119878119888119886119897119894119899119892119865119886119888119905119900119903 ]

(15)

The two components respectively are the 3 times 3 rotationmatrix and the 3 times 1 position vector The vector is the rotatedcoordinate systemoriginwith respect to the reference systemFor the second row one component is a 4 times 4 homogeneoustransformation matrix and the other component of 119879 rep-resents 119878119888119886119897119894119899119892119865119886119888119905119900119903 The matrix maps a vector expressedin homogeneous coordinates with respect of the 119874119880119881119882coordinate system to the reference coordinate 119874119883119884119885 systemThat is

119901119909119910119911 = 119879119901119906V119908 (16)

119879 =[[[[[[

119899119909 119904119909 119886119909 119901119909119899119910 119904119910 119886119910 119901119910119899119911 119904119911 119886119911 1199011199110 0 0 1

]]]]]]

= [n s a p0 0 0 1] (17)

As is shown (17) represents a homogeneous transformationmatrix in a 3D space The space transfer operation is used to


composite an overall saliencymap based on the saliencymapsfrom different cameras In conclusion the homogeneoustransformation matrix geometrically represents the locationof a fixed coordinate system with respect to a reference coor-dinate system from other visual resources

33 Building Saliency Map and Finding FOA In this chapterwe compute saliency by building a saliency map referringto biological principles to describe the value of saliency inthe given image The most salient location in the specificcoordinate is referred to as FOA It represents the maximumarea of the space that the robot should focus on at giventime Next we need to propose a FOA selection and inhibitionmechanism to switch FOA among saliency maps There aretwo major steps which are biologically plausible At firstwinner-take-all network is used to model the saliency scoreall over the environment and identify FOA Then Argmaxfunction is used to identify of the saliency map against thehighest contributing score of the saliency map activity atthe winning location The neurons of WTA network includea single capacitance a leakage conductance and a voltagethreshold A prototypical spike will be produced as the chargedelivered by synaptic input reaches the threshold of theneuron Finally the saliency became the neurons feed intoa winner-take-all (WTA) neural network There is one andonly one neuron that could be activated by theWTA networkat one time Other neurons are inhibited at the same time toensure that the only activated exclusively neuron representsthe FOA in space

We can build four ldquofeature mapsrdquo corresponding to thefeatures extracted by methods proposed in previous chapterIntensity map 119898119894 is obtained in (1) Colour map 119898119888 describescolour features obtained by (2) and (3) Orientation featuremap is 119898119888 = O(119888 119904 120579) Depth map 119898119889 is extracted fromsensors in Kinect Moreover the dynamic feature map 119898119900119891obtained from the computing of optical flow in (8) Here weuse entropy method to choose the weigh for every part ofsaliency amount The entropy can be obtained by the follow-ing equation

119890119904 = 4sum119896=1

119890119896 (18)

where 119890119896 119896 isin [1 2 3 4] means the entropy for 119898119888119898119894119898119900and 119898119889 119890119904 is the summation of 119890119896The final saliency map 119878 isgenerated by accumulation static and the former featuremapsas follows

119878 = 1198901119890119904119898119888 +1198902119890119904119898119894 +

1198903119890119904119898119900 + 1198904119890119904119898119889 + 119898119900119891 (19)

In this way features are nonlinearly accumulated by theirentropy contribution in normalization coefficient so that theresult will be more precise and comprehensiveThen the finalsaliency map 119878 from one specific resource is generated

Theneurons in the saliencymap 119878 could receive activationsignals from previous process and remain independent ofeach other Every neuron of the saliency should activate thecorresponding WTA network neuron Based on biological


mechanism the neuron could be fast activated according tosaliency map on salient place of space Here some relativeprocess of choosing FOA is included as follows First theFOA in original state is not fixed Another winner neuronwith salient place will be excited Then all other neurons aresuppressed to make the FOA exclusively and wait for nextsignal At last IOR mechanism is activated in saliency com-puting in order to switch the FOA dynamically based onsaliency map [22] The FOA could be unconstrained trans-ferred from the previous FOA to the next FOA while otherareas are suppressed In order to locate the FOA the winningposition is

119865119874119860 = argmax 119878 (20)

In this way the FOA which is highly relative to service taskcan be picked outThis mechanism can be verified by experi-ments in the following chapter

4 Experiments

In this chapter we present a few experiments in real imagesto demonstrate the computing model of selective attentionmodel In order to stimulate indoor environment for servicerobot we build a room and interconnected corridor experi-ment environment based on intelligent space as is shown inFigure 4

It is a simple environment by contrast as there are fewerobstacles and a simple background in the corridor Howeverthe room is a sophisticated environment as there are variouskinds of furniture and electronics All these objects willmake it hard for attention selection We built an intelligentspace platform There are a few cameras including ceilingcamera and robot sensors as is shown in Figure 5This sensornetwork could be able to observe the target environment ina more comprehensive way We can obtain colour imagesdepth images and time sequence images based on our


Figure 5 Illustration of cameras in intelligent space system

Figure 6 Original colour image and depth image in a corridor

approach Based on this configuration a series of experimentsare performed in this real scenario

General Performance The first experiment is deployed inthe corridor Figure 6 illustrates the original images (colourimages and depth images) both from a room and a corridorThere are one user and a service robot in the experimentThisis in accordance with the scenes of daily life The first tworows are colour images from ceiling camera and robot Thebottom image is depth image from robot We also capture aseries of time ordered image sequences It implies that colourimage contains the global colour features from users andobjects Depth image emphasizes the saliency about userslocated in specific area Then we calculated the optical flow


using the image sequence The result of saliency computingexperiment based on our approach is illustrated in Figure 7We can see that as user is walking in the corridor the outlineof the user is clearly shown in the chart As depth image isstatic and captured in fixed area the userrsquos movement as dy-namic feature is more accurate to reveal the saliency of thetarget environment It turns out to be a good supplementarymethod

After that the final result of the saliency computing resultis shown in Figure 8 The first row is saliency maps while thebottom picture visualizes the FOA in the environment FromFigure 8 we can see the model can find the user from thecorridor accurately In the common sense a human waitershould also focus on the user and prepare to offer someservice So our result is in agreement with biological sense Asfor the indoor environment despite the fact that the back-ground is complex we can see that our attention is in the userBy common sense the model chooses the FOA accuratelyDue to the existing of depth information the user can bemore attractive than the background

Comparison Experiment with Existing Model Then we maketwo comparison experiments with traditional computingapproach as is shown in Figure 9 The first row is the originalcolour images while one is from a corridor and others arefrom a room The second row is result of traditional modelwhile the last row is result of our model By comparison wecan see that the result of Itti model is coarse and inaccurateSome other parts which are less salient are chosen (likedesks and chairs) That is because traditional approaches aremainly focused on the saliency parts in image Normallythe human user is of great importance and should be thefocus of the service tasks On the contrary our model is ableto choose FOA more precisely and more reasonably due tothe process of depth image and event item which emphasize


Figure 8 General performance experiment

Figure 9 Comparison experiment with existing model

discrete information about intelligent space device or humanbehaviour

Influence of Noise Figure 10 is an example of influence ofnoise The top-left is an image with salt and pepper noisewhile the top-right image is a depth image The bottomimages are the result of saliencymap and corresponding FOAAs is shown in the figure the user and service robot arechosen as saliency partThe influence of noise has been elim-inated and the model chosen is right FOA by common senseIn such images our model was also in better performanceas our subjective perception of saliency Due to the depthinformation noise in colour images can be excluded from theresultThis operation improves the robustness of the saliencycomputing model

5 Discussion

Selective visual attention is capable of finding attention timelyon salient objects or users in the domestic environment Thiskind of skill enables service robots to locate salient things ina sophisticated visual scene As it makes human able to findpossible food predator ormate in the outer worldThis paperfocuses on developing computational models for servicerobots that describe the detailed process on attention selec-tion in a specific sceneThis has been an important challengefor both robotic and computational neuroscience

We proposed a computing model for visual attentionbased on spatial data fusion Using images spatially inde-pendently from the sensor network is applied for visualizingstatic occlusion culling We also captured dynamic features



in order to reflect human motion states This is a key com-ponent of potentially interesting information for a servicerobot When performing in a real environment both quan-titatively and qualitatively the application of our approachhas been found to correspond more accurately to locationsand saliencymaps of eye fixations in visual environment thantraditional approaches Our computing method approach isinspired by the framework of Itti model and has also beenfound to be more useful in performing better performancethan any other same state-of-the-art existing same saliencycomputing approaches The better performance of the pro-posed model has been confirmed through a series of exper-iments Most traditional approaches on saliency computingmerely are based on low-level features from colour imageThis paper covers spatial information and dynamic fea-tures which are important to be concerned in scene analysisfor service robot Our model shows great improvement com-pared with traditional methods both in task-related accuracyand in noise resistance capability What is more comparedwith traditional approaches it produces better results forattention selection To some extent it is observed that theresult using our saliency computing approach is consistentmanual annotations

At present there are a few shortages in our research Somelow-level visual features in multiscales are considered in theresearch such as depth hues orientation and intensitiesWhile biological principle (like the comprehensive shape fea-tures) derived is extensible and could be used to more featuredimension In the future new approaches with less compu-ting cost incurred through multiplicative top-down weighton bottom-up saliency maps which will combine both goal-driven and stimulus-driven attention should be concernedAlso it should optimize the efficiency of guiding on potentialtarget locations simultaneously being productive to changevisual stimulus situations Furthermore when robot operatesin unconstrained environments where unexpected eventssuch as accidents may occur the attention system based onsaliency computing needs to bridge the attention selectionto further application in further research This has great

potentials to improve its guiding capabilities for more visualcomputing tasks

6 Conclusions

In this paper we proposed a novel saliency computing modelfor selective attention based on human visual system and tra-ditional bottom-up saliency computing approach This newcomputing model with feature fusion is built inspired bythe framework of Itti model It performs salience computingbased on multiple information fusions The main contribu-tion of this paper is to build a computing model for visualattention selection for service robots based on spatial datafusion By introducing colour images and depth informationfrom different locations our model solved the problem ofimage matching under complex condition like noise corrup-tion The result of our model is consistent with the biologicalevidence of study of visual attention in human psychophysicsand optimized target detection speed tailored to a specificdomestic service task

Visual attention mechanism plays an important role inthe human visual system and for robots and other machines[1] There are two main goals for developing the selectiveattention model At first selective attention model can beused to cope with massive complexity perception informa-tionThismodel can prioritize saliency areas in visual imagesMore potential operations exist especially for large imageinformation that need to be processed or if real-time process-ing is needed for service robots The other goal is the abilityto support task plans This is always necessary especially forservice task performs in a domestic and sophisticated un-known environment As service robot usually works in thesame environment with human It is reasonable to transferthe human attention mechanism to service robots in order toperform service tasks

The novel computing model of selective attention pro-posed in the paper is a biological plausible approach Fur-thermore it could extend to other applications not only insaliency computing related approaches A large bunch of po-tential opportunity stays in modifying and updating thesaliency computing framework of Itti model and referring torecent research in biology In the future it would be inter-esting in exploiting performance of the saliency computingmodel built on the application of the proposed computingmodel Besides service task inducing in our research moreapplication of the extracted saliency computingmodel can bepresent to segmentation bionic cognition and so on

Data Availability

The data used to support the findings of this study have notbeen made available because some of technical reasons

Conflicts of Interest

The authors declare that they have no conflicts of interest


References

[1] F Baluch and L Itti ldquoMechanisms of top-down attentionrdquoTrends in Neurosciences vol 34 no 4 pp 210ndash224 2011

[2] M Cheng N J Mitra X Huang P H Torr and S M HuldquoGlobal contrast based salient region detectionrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 37 no 3pp 569ndash582 2015

[3] L Itti and C Koch ldquoComputational modelling of visual atten-tionrdquo Nature Reviews Neuroscience vol 2 no 3 pp 194ndash2032001

[4] C Koch and S Ullman ldquoShifts in selective visual attentiontowards the underlying neural circuitryrdquo Human Neurobiologyvol 4 no 4 pp 219ndash227 1985

[5] H-W Kwak and H Egeth ldquoConsequences of allocating atten-tion to locations and to other attributesrdquo Perception amp Psy-chophysics vol 51 no 5 pp 455ndash464 1992

[6] H Hashimoto ldquoIntelligent space - how to make spaces intel-ligent by using DINDrdquo in Proceedings of the SMC2002 IEEEInternational Conference on Systems Man and Cybernetics pp14ndash19 Yasmine Hammamet Tunisia 2002

[7] L Fortuna P Arena D Balya and A Zarandy ldquoCellular neu-ral networks a paradigm for nonlinear spatio-temporal pro-cessingrdquo IEEE Circuits Systems Magazine vol 1 no 4 pp 6ndash212001

[8] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[9] D Sen and M Kankanhalli ldquoA bio-inspired center-surroundmodel for salience computation in imagesrdquo Journal of VisualCommunication and Image Representation vol 30 pp 277ndash2882015

[10] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[11] R Achanta and S Susstrunk ldquoSaliency detection using max-imum symmetric surroundrdquo in Proceedings of the 17th IEEEInternational Conference on Image Processing (ICIP rsquo10) pp2653ndash2656 September 2010

[12] T Liu Z Yuan J Sun et al ldquoLearning to detect a salient objectrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 33 no 2 pp 353ndash367 2011

[13] C Yang L H Zhang H C Lu X Ruan and M-H YangldquoSaliency detection via graph-based manifold rankingrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo13) pp 3166ndash3173 Portland OreUSA June 2013

[14] R ZhaoW Ouyang H Li and XWang ldquoSaliency detection bymulti-context deep learningrdquo in Proceedings of the IEEE Confer-ence on Computer Vision and Pattern Recognition CVPR 2015pp 1265ndash1274 IEEE Massachusetts Mass USA June 2015

[15] T Chen L Lin L Liu X Luo and X Li ldquoDISC deep imagesaliency computing via progressive representation learningrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 27 no 6 pp 1135ndash1149 2016

[16] J Pan E Sayrol X Giro-I-Nieto K McGuinness and NE OrsquoConnor ldquoShallow and deep convolutional networks forsaliency predictionrdquo in Proceedings of the 2016 IEEE Conferenceon Computer Vision and Pattern Recognition CVPR 2016 pp598ndash606 usa July 2016

[17] Z Liu W Zou and O Le Meur ldquoSaliency tree a novel saliencydetection frameworkrdquo IEEE Transactions on Image Processingvol 23 no 5 pp 1937ndash1952 2014

[18] F Zhang B Du and L Zhang ldquoSaliency-guided unsupervisedfeature learning for scene classificationrdquo IEEE Transactions onGeoscience and Remote Sensing vol 53 no 4 pp 2175ndash21842015

[19] X H Shen Y Wu and Evanston ldquoA unified approach to salientobject detection via low rank matrix recoveryrdquo in Proceed-ings of the IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo12) pp 853ndash860 Providence RI USA June2012

[20] M Lin C Zhang andZChen ldquoGlobal feature integration basedsalient region detectionrdquo Neurocomputing vol 159 no 1 pp 1ndash8 2015

[21] Y Chen and G Zelinsky ldquoComputing Saliency over Proto-Objects Predicts Fixations During Scene Viewingrdquo Journal ofVision vol 17 no 10 p 209 2017

[22] M I Posner ldquoComponents of visual orientingrdquo Attention Per-formance vol 32 no 4 pp 531ndash556 1984

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018


Active and Passive Electronic Components

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of



Journal ofEngineeringVolume 2018

SensorsJournal of



RotatingMachinery


Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation


Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom



The representation of saliency in a specific scene is compu-tationally introduced in the process of choosing spatial areaFor selection and convergence process when one wins thespatial competition in one or more feature maps at differentspatial scales the related location is defined as salient Afterthat detailed measurement of saliency integrating multiplekinds of feature maps is encoded by saliency map In thatway it provides potential heuristic information for choosingattention on salient locations without consideration of thedetailed feature responses that made those locations salientThe saliency map is where the attention spotlight should beplaced as well as the most salient location in the scene [5]A proper neural architecture to find the area needing userrsquosattention from the generated saliency map is winner-take-allnetwork which implements a neural distributed maximumdetector [4] Using this mechanism however brings anothercomputing problem how to lay attention on the differentsaliency location or winning area in a saliency map Oneplausible computing strategy has already got biological exper-imental support It contains transiently inhibiting neurons forcurrent focus of attention in presenting saliency map Thecurrently attended location is then suppressed This progresswill be repeated and generates attention recording This kindof inhibitory strategy aims at transferring the FOA Thisphenomenon has been named as ldquoinhibition of returnrdquo (IOR)and became a popular way of simulating selective attentionmechanism [5] By using IOR process we are able to rapidlyfind and shift the focus of attention on different salientlocations with decreasing saliency rather than being boundto attend only to the location of maximal saliency at a giventime

We also use intelligent space to obtain information aboutthe users in a comprehensive way Intelligent space is capableof improving both perception and executive skills of servicerobots Figure 1 illustrates the overview of intelligent spaceUsers are capable of communicating with both robots andmachines with the help of intelligent space Many usefulservices could be provided by actuators discretely installed inthe space The main suggestion of the intelligent space is toinstall sensors and actuators separately in domestic space sothat service robot could provide timely stably and intelligentservice with limited configuration [6] The sensor networkcan capture human motion information as the user moves ineveryday life scenes In the meantime intelligent space canoffer some service to users by its actuators It is importantto fuse the data from different resources as the sensornetwork of intelligent space is discretely distributed in target

environment Cellular neural network is an effective way todeal with spatiotemporal temporal processing [7] Here wechoose spatial transformation as a mathematical way in datafusion operation This mathematical approach is relativelysimple and straightforward comparing with neural networkIn conclusion intelligent space is utilized in our researchaiming to develop both perception skills and executing skillsfor robots

In this paper we proposed a computing model of visualattention based on data fusion using intelligent space Themodel is turned out to select visual attention accurately androbustly by a set of experiments

2 Related Work

The popular attention selection model is inspired by thebiological structure and the behaviour of human visual sys-tem demonstrations [1 8] Usually saliency map is proposedbased on topographical combination with multiscale fea-tures in colour images Gabor filtering and center-surroundoperator are used to process the features Center-surroundmechanism is inspired by photoreceptor interactions andvisual receptive field organization It consists of the center andthe surround as the feature values at the corresponding imagepixels derived from responses to stimuli [9] A dynamicalneural network based is built on WTA network and IORmechanism to choose focus of attention and to balance thesaliency map A few other saliency computing models basedon Itti model for salience computation are presented [10ndash13]

In recent years some models are proposed by new ap-proaches other than Ittirsquos approach Zhao proposed a multi-context deep learning framework for salient object detectionwhere local context and global context are all concerned[14] Wang presented a deep neural network based saliencydetection mechanism by integrating global search and localestimation [15 16] Liu presented saliency tree as a novelsaliency detection framework which could provide a hierar-chical representation of saliency for generating high-qualityregional and pixel-wise saliencymaps [17] Zhang proposed asaliency-guided unsupervised feature learning framework forscene analysis [18]Wei proposed an algorithm for simultane-ously localizing objects and discovering object classes basedon bottom-up multiple class learning [19] Lin proposed anovel salient object detection algorithm by integrating multi-level features including local contrast global contrast andbackground priors [20] Chen proposed a protoobjects com-putational model by predicting fixations using a combinationof saliency and mid-level representations of shape [21]

Most models of fixation prediction operation at thefeature level are like the classical Itti model Others emphasizethat salient object is more important while defining it needsresearchers annotation Most of the existing models considerlow-level visual features and locations of objects The majordifference among them is the method for dealing withincoming sensor information feature types and computingsaliency For domestic service robot there are two mainshortages At first considering the complex environment ofdomestic environment blocking is a big troubleThen humanbehaviour is a key component of robot attention for service









119861 = 119887 minus (119892 + 119903)2 (1)

119877 = 119903 minus (119892 + 119887)2 (2)

119866 = 119892 minus (119887 + 119903)2 (3)



119868 (119888 119904) = |119868 (119888) ⊝ 119868 (119904)| (5)


119877119866 (119888 119904) = |(119877 (119888) minus 119866 (119888)) ⊝ (119866 (119904) minus 119877 (119904))| (6)

119861119884 (119888 119904) = |(119861 (119888) minus 119884 (119888)) ⊝ (119861 (119904) minus 119884 (119904))| (7)



119866 (119909 119910 120579) = exp(minus 11990910158402 + 12057421199101015840221205752 ) cos(2120587 11990910158402120582 + 120595) (8)


O (119888 119904 120579) = |O (119888 120579) ⊝ (119904 120579)| (9)



119868 (119909 119905) asymp 119868 (119909 + 120575119909 119905 + 120575119905) (10)


119868 (119909 119905) = 119868 (119909 119905) + nabla119868 sdot 120575119909 + 120575119905 sdot 119868119905 + O2 (11)


nabla119868 sdot (V) + 119868119905 = 0 (12)




w times p = (119908119901119909 119908119901119910 119908119901119911 119908)119905 (13)


119901119909 = 119908119901119909119908 119901119910 = 119908119901119910119908 119901119911 = 119908119901119911119908

(14)



= [ 119877119900119905119886119905119894119900119899119872119886119905119903119894119909 119875119900119904119894119905119894119900119899119881119890119888119905119900119903119875119890119903119904119901119890119888119905119894V119890119879119903119886119899119904119891 119878119888119886119897119894119899119892119865119886119888119905119900119903 ]

(15)


119901119909119910119911 = 119879119901119906V119908 (16)

119879 =[[[[[[

119899119909 119904119909 119886119909 119901119909119899119910 119904119910 119886119910 119901119910119899119911 119904119911 119886119911 1199011199110 0 0 1

]]]]]]

= [n s a p0 0 0 1] (17)






119890119904 = 4sum119896=1

119890119896 (18)


119878 = 1198901119890119904119898119888 +1198902119890119904119898119894 +

1198903119890119904119898119900 + 1198904119890119904119898119889 + 119898119900119891 (19)





119865119874119860 = argmax 119878 (20)


4 Experiments

















5 Discussion








6 Conclusions




Data Availability





References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia










119861 = 119887 minus (119892 + 119903)2 (1)

119877 = 119903 minus (119892 + 119887)2 (2)

119866 = 119892 minus (119887 + 119903)2 (3)



119868 (119888 119904) = |119868 (119888) ⊝ 119868 (119904)| (5)


119877119866 (119888 119904) = |(119877 (119888) minus 119866 (119888)) ⊝ (119866 (119904) minus 119877 (119904))| (6)

119861119884 (119888 119904) = |(119861 (119888) minus 119884 (119888)) ⊝ (119861 (119904) minus 119884 (119904))| (7)



119866 (119909 119910 120579) = exp(minus 11990910158402 + 12057421199101015840221205752 ) cos(2120587 11990910158402120582 + 120595) (8)


O (119888 119904 120579) = |O (119888 120579) ⊝ (119904 120579)| (9)



119868 (119909 119905) asymp 119868 (119909 + 120575119909 119905 + 120575119905) (10)


119868 (119909 119905) = 119868 (119909 119905) + nabla119868 sdot 120575119909 + 120575119905 sdot 119868119905 + O2 (11)


nabla119868 sdot (V) + 119868119905 = 0 (12)




w times p = (119908119901119909 119908119901119910 119908119901119911 119908)119905 (13)


119901119909 = 119908119901119909119908 119901119910 = 119908119901119910119908 119901119911 = 119908119901119911119908

(14)



= [ 119877119900119905119886119905119894119900119899119872119886119905119903119894119909 119875119900119904119894119905119894119900119899119881119890119888119905119900119903119875119890119903119904119901119890119888119905119894V119890119879119903119886119899119904119891 119878119888119886119897119894119899119892119865119886119888119905119900119903 ]

(15)


119901119909119910119911 = 119879119901119906V119908 (16)

119879 =[[[[[[

119899119909 119904119909 119886119909 119901119909119899119910 119904119910 119886119910 119901119910119899119911 119904119911 119886119911 1199011199110 0 0 1

]]]]]]

= [n s a p0 0 0 1] (17)






119890119904 = 4sum119896=1

119890119896 (18)


119878 = 1198901119890119904119898119888 +1198902119890119904119898119894 +

1198903119890119904119898119900 + 1198904119890119904119898119889 + 119898119900119891 (19)





119865119874119860 = argmax 119878 (20)


4 Experiments

















5 Discussion








6 Conclusions




Data Availability





References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia




119866 (119909 119910 120579) = exp(minus 11990910158402 + 12057421199101015840221205752 ) cos(2120587 11990910158402120582 + 120595) (8)


O (119888 119904 120579) = |O (119888 120579) ⊝ (119904 120579)| (9)



119868 (119909 119905) asymp 119868 (119909 + 120575119909 119905 + 120575119905) (10)


119868 (119909 119905) = 119868 (119909 119905) + nabla119868 sdot 120575119909 + 120575119905 sdot 119868119905 + O2 (11)


nabla119868 sdot (V) + 119868119905 = 0 (12)




w times p = (119908119901119909 119908119901119910 119908119901119911 119908)119905 (13)


119901119909 = 119908119901119909119908 119901119910 = 119908119901119910119908 119901119911 = 119908119901119911119908

(14)



= [ 119877119900119905119886119905119894119900119899119872119886119905119903119894119909 119875119900119904119894119905119894119900119899119881119890119888119905119900119903119875119890119903119904119901119890119888119905119894V119890119879119903119886119899119904119891 119878119888119886119897119894119899119892119865119886119888119905119900119903 ]

(15)


119901119909119910119911 = 119879119901119906V119908 (16)

119879 =[[[[[[

119899119909 119904119909 119886119909 119901119909119899119910 119904119910 119886119910 119901119910119899119911 119904119911 119886119911 1199011199110 0 0 1

]]]]]]

= [n s a p0 0 0 1] (17)






119890119904 = 4sum119896=1

119890119896 (18)


119878 = 1198901119890119904119898119888 +1198902119890119904119898119894 +

1198903119890119904119898119900 + 1198904119890119904119898119889 + 119898119900119891 (19)





119865119874119860 = argmax 119878 (20)


4 Experiments

















5 Discussion








6 Conclusions




Data Availability





References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia






119890119904 = 4sum119896=1

119890119896 (18)


119878 = 1198901119890119904119898119888 +1198902119890119904119898119894 +

1198903119890119904119898119900 + 1198904119890119904119898119889 + 119898119900119891 (19)





119865119874119860 = argmax 119878 (20)


4 Experiments

















5 Discussion








6 Conclusions




Data Availability





References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia
















5 Discussion








6 Conclusions




Data Availability





References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia







6 Conclusions




Data Availability





References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia



References

























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia




RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia


a computing model of selective attention for service robot...

Documents