
What Stands-in for a Missing Tool?

A Prototypical Grounded Knowledge-based Approach to Tool Substitution

Madhura Thosar¹, Christian A. Mueller², Sebastian Zug¹

¹ Faculty of Computer Science, Otto von Guericke University Magdeburg, Germany

² Robotics Group, Computer Science & Electrical Engineering Department, Jacobs University Bremen, Germany

Abstract

When a robot is operating in a dynamic environment, it cannot be assumed that a tool required to solve a given task will always be available. In case of a missing tool, an ideal response would be to find a substitute to complete the task. In this paper, we present a proof of concept of a grounded knowledge-based approach to tool substitution. In order to validate the suitability of a substitute, we conducted experiments involving 22 substitution scenarios. The substitutes computed by the proposed approach were validated on the basis of the experts' choices for each scenario. Our evaluation showed that, in 20 out of 22 scenarios (91%), the approach identified the same substitutes as the experts.

1 Introduction

The sophistication pertaining to tool use in humans involves not just the dexterity in manipulating a tool, but also the diversity in tool exploitation. The ability to exploit tools has enabled humans to adapt and thus exert control over an uncertain environment, especially when they are faced with unfavorable situations. For example, if we do not find a hammer to hammer a nail into a wall, we will use the heel of a shoe or a rock; if a tray is unavailable for serving drinks, we will use a plate instead. In situations like these, humans seem to know, either from past experience, from observation, or from a "necessity is the mother of improvisation (invention)" type of approach, what kind of object is needed as a substitute.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.

In: G. Steinbauer, A. Ferrein (eds.): Proceedings of the 11th International Workshop on Cognitive Robotics, Tempe, AZ, USA, 27-Oct-2018, published at http://ceur-ws.org

In contrast, consider a robot performing a task that involves tool use. When a robot is operating in a dynamic environment, it cannot be assumed that a tool required in the task will always be available. In situations like these, an effective strategy for a robot would be to find an alternative as humans do, for example, to use an eating plate for serving rather than wait until a tray becomes available. This skill is significant when operating in a dynamic, uncertain environment because it allows a robot to adapt to unforeseen situations to a degree. The question is how a robot can determine which object in the environment is a viable candidate for a substitute. A possible approach would be to interact with an object in the manner in which the missing tool is maneuvered. However, it would be time consuming for a robot to interact with every single object in the environment to determine its viability, which makes this approach less practical.

In this prototypical work, we propose a non-invasive approach that identifies viable candidates from the existing objects in the environment. This paper makes the following contributions: 1) an approach to create grounded knowledge about objects expressed in terms of their properties (Sec. 5.2), and 2) an approach to identify the relevant properties of a missing tool and determine a substitute on the basis of them (Sec. 5.3).

2 Related Work

Typically, a substitute for a missing tool is determined by means of a knowledge base that provides knowledge about objects, together with similarity measures that determine the similarity between a missing tool and a potential substitute. In the following, in addition to the approaches to determine a substitute, we also report the literature related to existing knowledge bases developed for robotic applications.

Knowledge Base

In [24] we reviewed nine existing knowledge bases, namely KNOWROB [23], MLN-KB [25], NMKB [19], OMICS [10], OMRKF [21], ORO [14], OUR-K [15], PEIS-KB [8], and RoboBrain [20]. The objective was to determine whether these existing knowledge bases 1) contain ontological knowledge about the properties of objects, 2) ground such knowledge in the robot's perception, and 3) model intra-class variability in a property instead of expressing the property in a binary form. We gained primarily the following insights, which form the basis for our work.

We noted that the majority of the knowledge bases relied on external human-centric commonsense knowledge bases such as WordNet, Cyc, and OpenCyc, and some relied either on hand-coded knowledge or on knowledge acquired by human-robot interaction. The main issue, we believe, is that the depth and breadth of a human-centric knowledge base is not observable by a robot in its entirety due to its limited sensing capabilities. This causes a disconnect between human-centric knowledge and robot-centric perception. To address this issue, we aim to acquire robot-centric perceptual data for different properties of objects. Such property data can then be used to generate grounded knowledge about objects (see Sec. 5.1).

Substitution Computation

One of the closest areas that studies the usability of an object is that of tool affordances, where the primary focus is to examine various functional abilities of an object by exploring what actions can be performed on the object and observing its responses. As such, using a substitute in place of a missing tool can also be seen as transferring an affordance of the missing tool to the substitute after determining the similarity between them.

In [3], a substitute for a missing tool is inferred on the basis of inheritance and equivalence relations. The work discussed in [2] retrieves the knowledge about objects from the ROAR [22] relational database and determines a substitute that shares similar affordances. However, in ROAR, the knowledge is either acquired using machine learning techniques that require training examples, inferred, or hand-coded. The work in [7] uses ConceptNet, where potential candidates are extracted from the knowledge base if they share the same parent with a missing tool for the predetermined relations has-property, capable-of and used-for. After eliminating irrelevant candidates, a substitute is determined on the basis of similarity metrics. The approach proposed in [1] uses a part-based 3D model and the weight of an object to determine the orientation and manipulation of a substitute to be used as a missing tool. In the cases where a supervised machine learning technique is used, providing a bulk of labeled examples beforehand would not be realistic for a substitution problem scenario. On the other hand, the approaches which rely on existing external knowledge bases are built around the knowledge available in those knowledge bases, which imposes some constraints. We circumvent this issue by first identifying what knowledge is generally required to determine a substitute and then building an approach to acquire the required knowledge and compute a substitute on the basis of it.

3 Challenges

How to characterize similarity between a missing tool and a potential substitute: A candidate for a substitute is expected to be similar to a missing tool to some degree to ensure substitutability. The notion of similarity can be understood in various forms, for instance, as a distance between two objects denoted by two points in a multi-dimensional space, as two objects belonging to the same cluster, or as aspects of the objects that are identified as shared. In this work, the question will be addressed in a broader sense: it is not merely about identifying a similar object by deploying some similarity measure; instead, it is about gaining access to the aspects that were found to be shared between the similar objects.

What kind of knowledge is required to determine the similarity: It has been demonstrated in the literature on tool use in humans and animals alike that, in order to use an object in tasks, one needs to have knowledge about objects [4]. Baber [5] also noted that conceptual knowledge about objects is especially desired in tool use where systematic deliberation is called for. For a robot, the story will not be much different if it is expected to perform in the real world alongside humans. As a consequence, a robot needs conceptual knowledge about an object, where the object is not only a physical entity that is merely to be perceived, but also a concept which consists of distinct characteristics and relations that set objects apart from each other and also make them similar to each other.

How to acquire the necessary knowledge: The acquisition of such conceptual knowledge is not without challenge. From a robot's standpoint, it is a trade-off between what needs to be known and what can be known. The trade-off is a direct consequence of the limited perception capabilities of a robot, which often lead to a partial understanding of the environment. While deploying multi-modal perception to extract the required knowledge about objects would be an ideal solution, it carries its own set of complexities such as noisy sensors, the dynamicity of the environment, and the complexity of the composition of an object. For this prototypical work, the necessary knowledge is acquired using human-centric as well as machine-centric methods.

How to maneuver a substitute as a missing tool: Once the substitute has been identified, a robot is expected to use it in place of a missing tool and achieve the same result as the missing tool in the task. The challenge of estimating the maneuver as well as the grasp of a substitute is twofold: to determine whether the maneuver and grasping knowledge of a missing tool can be transferred to and utilized on a substitute, or else to estimate the maneuver and grasp for a substitute such that it can be used as a missing tool in the task.

For this work, we have focused on the first three challenges and have developed a prototypical system called ERSATZ (German word for a substitute or alternative), where the focus is to identify the knowledge required to determine a substitute and to develop a system that computes a substitute for a missing tool.

4 Approach

The proposed approach distinguishes a tool from a substitute: a tool is defined as an artifact that is designed, manufactured and maneuvered in accordance with its designated purpose in tasks, such as a hammer for hammering or a tray for serving, while a substitute is seen as an extension of a missing tool. Within the context of a designated purpose, the relationship between a tool and a substitute is symmetric; for instance, for hammering, a hammer can be replaced by a heeled shoe and vice versa. However, this may not always be the case once one steps outside that context; for instance, a hammer cannot replace a heeled shoe. Our research work, therefore, focuses on searching for a substitute for a conventional tool required in the ongoing task as opposed to determining a substitute for itself.

Consider a scenario in which a robot has to choose between a plate and a mouse pad as an alternative for a tray. A tray can be described as a rigid, rectangular, flat, wooden, brown-colored object, a plate as a rigid, circular, semi-flat, white-colored object, and a mouse pad as a soft, rectangular, flat, leather-based object. Bear in mind, however, that some properties are more relevant than others with respect to the designated purpose of the tool. For a tray, whose designated purpose is to carry, rigid and flat are more relevant than the material or the color of the tray. Consequently, to find the most appropriate substitute, the relevant properties of the unavailable tool need to correspond to as large a degree as possible to the properties of the possible choices for a substitute.

The proposed approach performs conceptual knowledge-driven computation to identify the relevant properties of the missing tool and determines the most similar substitute on the basis of those properties. Besides identifying the most similar object as a substitute for a missing tool, the proposed approach grants explicit access to the relevant properties of the missing tool, which carries a twofold advantage: firstly, knowing which properties are primarily required in the potential substitute narrows down the search space, and secondly, in the case of an unknown object instance, only the relevant properties have to be learned to determine a substitute.

The conceptual knowledge considered in this work primarily involves properties of the objects. The properties considered are divided into physical and functional properties, where physical properties describe the physicality of the objects, such as rigidity, weight and hollowness, while functional properties ascribe (functional) abilities or affordances to the objects, such as containment, blockage and support. The functional properties in the proposed approach play a primary role in identifying the relevant properties of a missing tool (see Sec. 5.3).

The functional properties considered in this work are derived from the theory of image schemas [9], which has its roots in cognitive linguistics. According to [12], image schemas are patterns abstracted from spatio-temporal experiences. Essentially, image schemas capture recurrent patterns that emerge from our perceptual and bodily interactions with the environment. Since some of these patterns are posited on the operational abilities of objects, Kuhn postulated in [12] that affordances, a.k.a. functional properties [9], for spatio-temporal processes can be derived from image schemas. For example, the containment schema suggests an object's ability to contain something, the support schema indicates an object's ability to hold up something, and the blockage schema refers to the ability of an object to block or obstruct the movement of another object. Currently, the proposed system is restricted to three functional properties based on image schemas: containment, support and blockage. While the functional properties as well as the designated purpose can both be identified as affordances, the proposed approach is built on the hypothesis that the functional properties are building blocks upon which the designated purposes of tools rest.

Figure 1: Illustration of the object shape conceptualization approach [18]. Concepts are randomly colored.

5 Methodology

5.1 Knowledge Acquisition

Our ultimate objective is to acquire machine-centric data from which property-specific data can be extracted. Such property data will then be used to generate grounded knowledge about objects. As a first step, our initial property acquisition combines a machine-centric and a human-centric method. In the machine-centric method, geometrical properties are acquired using a non-invasive vision-based technique, while in the human-centric method, non-geometric properties are acquired by sampling from an expert-generated intuitive model of the properties.

Machine Generated Properties: In this paper, we use a state-of-the-art data-driven approach that conceptualizes shape in an unsupervised manner according to commonalities within object point clouds; it is discussed in detail in our work [18]. As a result of the process, a set of shape concepts is generated, whose concept responses for an unknown object are used in the knowledge base as machine-generated geometric object properties.

In our previous work on shape concept learning [18], raw sensor information in the form of point clouds is abstracted to a symbolic level in which point cloud segments [17] may represent meaningful shape components in a symbolic space [16]. Therein we introduce a hierarchical learning procedure that leads to symbols which are gradually organized to reflect generic-to-specific facets of shape components and can subsequently be used as building blocks that constitute objects (see A in Fig. 1).

An object shape representation is introduced that gradually encodes the symbol compositions of observed objects (see B in Fig. 1): from local components to component groups that may represent object parts or objects as a whole. The proposed shape representation incorporates aspects of exemplar theory and prototype theory, since we believe that the richness of a prototype provides an unaltered perspective on the characteristics of object instances. Based on the proposed symbolic shape representation, we analyze topology and structure within the encoded symbol compositions in order to discover persistent patterns that may represent shape concepts.

We introduce an iterative filtering process [18] to associate instances with groups which may represent shape concepts (see C in Fig. 1). Given the set of learned concepts, concept responses are retrieved for an unknown object (see D in Fig. 1) and exploited as machine-generated geometric object property values in our tool-substitution scenario. Note that in our tool-substitution scenario, concepts are learned from unlabeled object instances of the Object Discovery Dataset (ODD) [17]; the ODD provides a variety of objects, from teddy bears and flashlights to shoes, which facilitates an expressive concept generation.

Human Generated Properties: The geometric properties alone offer a very limited view of the physicality as well as the functionality of an object. Therefore, to close the gap, we also considered non-geometrical properties: weight, rigidity and hollowness as physical properties, and support, blockage and containment as functional properties. Note that, in general, these properties are challenging and cumbersome to extract solely with non-invasive visuoperceptual approaches. Consequently, extracting such properties via multi-modal or manipulation capabilities is needed, but this is beyond the scope of this paper. In the generation process, a set of labeled prototype objects selected from the Washington dataset (see Table 1) was taken into account. The distribution of each property for particular object labels (cf. Table 1) was approximated by an expert to resemble the range of variation in the values of the property in general. Consequently, given an object and its label, a sample value was drawn from the a priori generated property distribution.
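The sampling of a human-generated property value can be sketched as follows. The paper does not state which family of distributions the expert used, so this sketch assumes a Gaussian per object label, clipped to an assumed [0, 1] range; the table `RIGIDITY`, its numbers and the function `sample_property` are hypothetical and serve only as illustration.

```python
import random

# Hypothetical expert-approximated distributions: (mean, std) of rigidity per label.
RIGIDITY = {"plate": (0.9, 0.05), "sponge": (0.2, 0.1)}

def sample_property(label, table, rng):
    """Draw one property value for an object with the given label."""
    mean, std = table[label]
    # Clip to the [0, 1] range assumed for this sketch.
    return min(1.0, max(0.0, rng.gauss(mean, std)))

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
plate_rigidity = [sample_property("plate", RIGIDITY, rng) for _ in range(5)]
print(plate_rigidity)
```

Each sampled value stands in for the sensory measurement that a multi-modal perception pipeline would otherwise have to provide.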

5.2 Knowledge about Objects

Knowledge about objects is spread across three levels: the first level consists of the data about the machine-generated as well as human-generated properties, the second level consists of qualitative knowledge about individual object instances, and the third level consists of the aggregated qualitative fuzzy knowledge about the respective classes of object instances. The fuzzy formalism is used to model the intra-class variations in the objects. In the following, we give the formal description of the methodology deployed to create grounded knowledge about objects.

Consider $\mathbf{O}$ as a given set of object class labels where (by abuse of notation) each object class is identified with its label. Let each object class $O \in \mathbf{O}$ be a given set of its instances. Let $\bigcup\mathbf{O}$ be the union of all object classes such that $|\bigcup\mathbf{O}| = n$. Let $\mathbf{P}$ and $\mathbf{F}$ be the given sets of physical property labels and functional property labels respectively. By abuse of notation, each physical and functional property is identified with its label. For each physical property $P \in \mathbf{P}$ as well as for each functional property $F \in \mathbf{F}$, sensory data is acquired from each object instance $o \in \bigcup\mathbf{O}$. Let $P_n$ and $F_n$ represent the sets of $n$ sensory values extracted from the $n$ object instances for a physical property $P \in \mathbf{P}$ and a functional property $F \in \mathbf{F}$ respectively.

Sub-categorization - From Continuous to Discrete

The sub-categorization process is performed to form (more intuitive) qualitative measures that represent the degree to which a property is reflected by an object instance. It is the first step in creating symbolic knowledge about object classes, where the symbols representing the qualitative measures of a physical or a functional property reflected in an object instance are generated in an unsupervised manner by a clustering mechanism. A qualitative measure of a physical property is referred to as a physical quality and that of a functional property as a functional quality.

In this process, $P_n$ and $F_n$, representing the measurements of a physical property $P \in \mathbf{P}$ and a functional property $F \in \mathbf{F}$ respectively, extracted from $n$ object instances, are each categorized into a given number $\eta$ of discrete clusters using a clustering algorithm. Let $\nabla P$ and $\nabla F$ be the partitions of the sets $P_n$ and $F_n$ obtained by clustering. Let $P_\eta$ and $F_\eta$ be the sets of labels, expressing physical qualities and functional qualities, generated for a physical property $P \in \mathbf{P}$ and a functional property $F \in \mathbf{F}$ respectively. The quality labels are generated by combining the property label with a cluster label (created by the clustering algorithm). For instance, the quality labels for the property size are represented as {size_1, size_2, size_3, size_4}. At the end of the sub-categorization process, the clusters are mapped to the generated symbolic labels for qualitative measures.

Note that the number of clusters essentially describes the granularity with which each property can be represented qualitatively. A higher number of clusters means that an object is described in finer detail, which may obstruct the selection of a substitute since it may not be possible to find a substitute that is similar to a missing tool down to the finer details. For example, in size = {small, medium, big, bigger}, size is a physical property and small, medium, big and bigger are its physical qualities. The semantic terms given above are meant to help the reader understand the qualitative measures of the properties.
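The sub-categorization step can be sketched in a few lines. The paper only speaks of "a clustering algorithm", so the one-dimensional k-means used here is an assumption, as are the function name `subcategorize`, the initialisation of the cluster centres and the sample measurements.

```python
def subcategorize(prop, values, eta, iters=50):
    """Cluster 1-D measurements into eta groups; return one quality label per value.

    Assumption: plain 1-D k-means stands in for the unspecified clustering algorithm.
    """
    ordered = sorted(values)
    # Initialise the centres at evenly spaced positions of the sorted data.
    centres = [ordered[(2 * k + 1) * len(ordered) // (2 * eta)] for k in range(eta)]
    for _ in range(iters):
        groups = [[] for _ in range(eta)]
        for v in values:
            nearest = min(range(eta), key=lambda k: abs(v - centres[k]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        new = [sum(g) / len(g) if g else centres[k] for k, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    # Rank the centres so that label 1 is the lowest cluster and label eta the highest.
    order = sorted(range(eta), key=lambda k: centres[k])
    rank = {k: r + 1 for r, k in enumerate(order)}
    labels = []
    for v in values:
        nearest = min(range(eta), key=lambda k: abs(v - centres[k]))
        labels.append(f"{prop}_{rank[nearest]}")
    return labels

# Hypothetical size measurements for eight object instances.
sizes = [1.0, 1.2, 5.0, 5.3, 9.8, 10.1, 20.0, 21.0]
print(subcategorize("size", sizes, eta=4))
```

Lowering `eta` merges neighbouring clusters, which corresponds to the coarser granularity discussed above.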

Attribution - Object Instance Knowledge

The attribution process generates knowledge about each object instance by aggregating all the physical and functional qualities assigned to the object instance by the sub-categorization step. In other words, the knowledge about an instance consists of the physical as well as functional qualities reflected in the instance. Let $\mathbf{P}_\eta$ and $\mathbf{F}_\eta$ be the families of sets containing the physical quality labels $P_\eta$ and the functional quality labels $F_\eta$ for each physical property $P \in \mathbf{P}$ and functional property $F \in \mathbf{F}$ respectively. Thus, each object instance $o \in \bigcup\mathbf{O}$ is represented as the set of all physical as well as functional qualities attributed to it, expressed by the relation holds: $\mathit{holds} \subset \bigcup\mathbf{O} \times (\mathbf{P}_\eta \cup \mathbf{F}_\eta)$. For example, knowledge about the instance plate1 of the plate class can be given as holds(plate1, medium), holds(plate1, harder), holds(plate1, can_support), where medium is a physical quality of the size property, harder is a physical quality of the rigidity property and can_support is a functional quality of the support property.
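As a minimal sketch, the holds relation can be stored as a set of (instance, quality) pairs. The dictionary layout, the function name `attribute` and the second instance `plate2` are illustrative assumptions; only the plate1 facts come from the example above.

```python
# Hypothetical output of sub-categorization: one quality label per property and instance.
qualities = {
    "plate1": {"size": "medium", "rigidity": "harder", "support": "can_support"},
    "plate2": {"size": "small", "rigidity": "harder", "support": "can_support"},
}

def attribute(qualities):
    """Return the holds relation as a set of (instance, quality) pairs."""
    return {(inst, q) for inst, by_prop in qualities.items() for q in by_prop.values()}

holds = attribute(qualities)
print(("plate1", "harder") in holds)
```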

Conceptualization - Knowledge about Objects

The conceptualization process aggregates the knowledge about all the instances of an object class. The aggregated knowledge is regarded as conceptual knowledge about an object class.

Let $O_{KB}$ be a knowledge base about object classes $O \in \mathbf{O}$. Given the knowledge about all the instances of an object class $O$, the conceptualization process expresses the knowledge about the object class, $O_K \in O_{KB}$, as a set of tuples, each consisting of a physical or a functional quality and its proportion (membership) value in the object class. A tuple is expressed as $\langle O, t, m \rangle$, where $t \in \mathbf{P}_\eta \cup \mathbf{F}_\eta$ and the proportion value $m$ is calculated using the following membership function: $m = P(\mathit{holds}(o, t) \mid o \in O)$. The proportion value makes it possible to model the intra-class variations in the objects.

For example, knowledge about the object class plate can be expressed as {⟨plate, harder, 0.6⟩, ⟨plate, light_weight, 0.75⟩, ⟨plate, less_hollow, 0.67⟩, ⟨plate, hollow, 0.33⟩, ⟨plate, more_support, 0.71⟩}, where the numbers indicate that, for instance, the physical quality harder was observed in 60% of the instances of the object class plate. At the end of the conceptualization process, conceptual knowledge about an object class has been created which is represented in a symbolic fuzzy form and grounded in the human-generated or machine-generated data about the properties of objects. The knowledge about objects is then used to determine a substitute from the existing objects in the environment.

Figure 2: A typical process flow to determine a substitute for a missing tool from the available objects.
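The membership function of the conceptualization step amounts to a relative frequency over the instances of a class. The following sketch assumes that each instance is given as a set of quality labels; the function name `conceptualize` and the five hypothetical plate instances are not taken from the paper.

```python
from collections import Counter

def conceptualize(class_label, instances):
    """Return tuples (class, quality, m), with m the fraction of instances holding the quality."""
    n = len(instances)
    counts = Counter(q for qualities in instances.values() for q in qualities)
    return sorted((class_label, q, c / n) for q, c in counts.items())

# Hypothetical instance knowledge for five plates.
plates = {
    "plate1": {"harder", "less_hollow"},
    "plate2": {"harder", "less_hollow"},
    "plate3": {"harder", "hollow"},
    "plate4": {"softer", "less_hollow"},
    "plate5": {"softer", "less_hollow"},
}

for entry in conceptualize("plate", plates):
    print(entry)
```

Here harder is held by three of the five instances, so it receives the proportion value 0.6, in the same way as in the plate example of the text.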

Conceptualization - Knowledge about Functional Properties

In addition to conceptual knowledge about objects, the conceptualization process also creates knowledge about each functional quality, termed a function model, by associating the occurrence of physical qualities in an object instance with the occurrence of a functional quality in that instance and aggregating the result of such co-occurrences. The role of a function model is discussed in Sec. 5.3. Given the knowledge about the object instances, a function model $f_d$ of a functional quality $f \in \mathbf{F}_\eta$ is expressed as a set of tuples, each containing the functional quality $f$, a physical quality $p \in \mathbf{P}_\eta$ and a proportion value $d$. A tuple is represented as $\langle f, p, d \rangle$, where the proportion value $d$ is computed as $d = P(\mathit{holds}(o, p) \mid \mathit{holds}(o, f))$. For example, a function model for the functional quality more_support is given as {⟨more_support, harder, 0.8⟩, ⟨more_support, softer, 0.2⟩}, where the numbers indicate that, for instance, the physical quality harder was observed in 80% of the object instances in which the functional quality more_support was observed.
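The conditional proportion d can be sketched as a count over the instances that hold the functional quality. The function name `function_model`, the instance names and their qualities are hypothetical; they are chosen so that the result reproduces the 0.8 and 0.2 of the more_support example.

```python
def function_model(f, instances, physical):
    """Return tuples (f, p, d) with d = P(holds(o, p) | holds(o, f))."""
    with_f = [qualities for qualities in instances.values() if f in qualities]
    if not with_f:
        return []
    model = []
    for p in sorted(physical):
        d = sum(p in qualities for qualities in with_f) / len(with_f)
        if d > 0:
            model.append((f, p, d))
    return model

# Hypothetical instance knowledge: five instances support well, one does not.
instances = {
    "tray1": {"more_support", "harder"},
    "tray2": {"more_support", "harder"},
    "plate1": {"more_support", "harder"},
    "plate2": {"more_support", "harder"},
    "notebook1": {"more_support", "softer"},
    "sponge1": {"less_support", "softer"},
}

print(function_model("more_support", instances, {"harder", "softer"}))
```

Instances that do not hold the functional quality, such as the hypothetical sponge1, do not enter the computation at all, which is what distinguishes d from a plain co-occurrence frequency.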

5.3 Reasoner

Fig. 2 illustrates a process flow consisting of the primary operations involved in determining a substitute. The flow offers an approximate bird's-eye view of the prototypical model of ERSATZ. When ERSATZ is queried to find a substitute for a missing tool $x$ from the set of available objects $Y$, the system checks whether a substitution model for $x$ exists in the knowledge base. If the substitution model does not exist, the reasoner computes the relevant functional and physical properties of the queried tool.

Representative Models

A representative physical model and a representative functional model of an object class consist of the physical or functional qualities, respectively, that are regarded as representative qualities of the object class, while the qualities which do not fall under the representative qualities are regarded as exceptional or uncommon qualities.

Let $O \in \mathbf{O}$ be the object class of a missing tool and let $\theta$ be a representative model threshold which qualifies a physical or a functional quality as stereotypical of, or representative of, the object class $O$. $O_{rp}$ is called the representative physical model of an object class $O$ such that $O_{rp} = \{p : \mathit{implies}(O, p) \geq \theta,\ p \in \mathbf{P}_\eta\}$, and $O_{rf}$ is called the representative functional model of an object class $O$ such that $O_{rf} = \{f : \mathit{implies}(O, f) \geq \theta,\ f \in \mathbf{F}_\eta\}$. Similarly, let $f_d$ be the function model of a functional quality $f$; then $f_{rp}$ is called the representative physical model of the functional quality $f$ such that $f_{rp} = \{p : \mathit{implies}(f, p) \geq \theta,\ p \in \mathbf{P}_\eta\}$.
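A representative model is a thresholded view of the class knowledge. The sketch below assumes that implies(O, p) is read off as the proportion value stored for the class, which the text suggests but does not state explicitly; the function name `representative` and the split of the labels into physical and functional sets are illustrative.

```python
def representative(knowledge, theta, labels):
    """Return the qualities in `labels` whose proportion value is at least theta."""
    return {q for q, m in knowledge.items() if q in labels and m >= theta}

# Class knowledge for plate, taken from the example in Sec. 5.2.
plate = {"harder": 0.6, "light_weight": 0.75, "less_hollow": 0.67,
         "hollow": 0.33, "more_support": 0.71}
physical = {"harder", "light_weight", "less_hollow", "hollow"}
functional = {"more_support"}

plate_rp = representative(plate, 0.35, physical)    # representative physical model
plate_rf = representative(plate, 0.35, functional)  # representative functional model
print(sorted(plate_rp), sorted(plate_rf))
```

With the threshold 0.35 used in the evaluation, hollow (0.33) is treated as an uncommon quality of plate and is left out of the representative physical model.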

Relevant Qualities

Due to the abstract nature of an image schema and, by extension, of the corresponding functional property, a functional property can subsume various purposes of objects; for example, the functional property support can subsume the purposes place on, sit on and serve on of a table, a chair and a tray respectively. It is suggested in [6] that a certain assemblage of physical properties is an essential prerequisite for enabling a functional property. Thus, it can be assumed that knowing the relevance of one functional property can help identify the relevant physical properties of different objects which are used for different purposes.

The relevance of a representative functional quality is decided by examining whether the physical characterization of the function model of the representative functional quality of a tool is in close proximity to the physical characterization of the representative physical model of the tool. The proximity between a functional quality and the object class of the tool is determined using the Jaccard index. The Jaccard index measures the similarity between two sets $A$ and $B$ by dividing the size of the intersection of $A$ and $B$ by the size of the union of $A$ and $B$.

Let $O_{rp}$ and $f_{rp}$ be the representative physical models of the object class $O$ of the missing tool and of the function model $f_d$ of a representative functional quality $f \in \mathbf{F}_\eta$ of the object class $O$ respectively. Let $\phi$ be a Minimum Similarity Tolerance threshold. Then the Jaccard index of $O_{rp}$ and $f_{rp}$ is computed as

$$J(O_{rp}, f_{rp}) = \frac{|O_{rp} \cap f_{rp}|}{|O_{rp} \cup f_{rp}|}.$$

A representative functional quality $f$ of an object class $O$ is regarded as relevant if $J(O_{rp}, f_{rp}) > \phi$. Let $O_{F'}$ be the set of all relevant functional qualities of an object class $O$. Let $f_{rp}$ be the representative physical model of the function model $f_d$ of a relevant functional quality $f \in O_{F'}$, and let $O_{rp}$ be the representative physical model of $O$. Then the relevant physical qualities of the object class $O$ are expressed by the set $O_{P'} = (O_{rp} \cap f_{rp})$.
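The relevance test can be sketched with Python sets. The text defines the relevant physical qualities for a single relevant functional quality; taking the union of the intersections when several functional qualities are relevant is an assumption of this sketch, as are the function names and the hypothetical tray data.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def relevant_qualities(o_rp, f_rps, phi):
    """Return the relevant functional and the relevant physical qualities.

    o_rp:  representative physical model of the tool class.
    f_rps: maps each representative functional quality f to its model f_rp.
    """
    relevant_f = {f for f, f_rp in f_rps.items() if jaccard(o_rp, f_rp) > phi}
    relevant_p = set()
    for f in relevant_f:
        # Assumption: with several relevant f, the intersections are united.
        relevant_p |= o_rp & f_rps[f]
    return relevant_f, relevant_p

# Hypothetical representative models for a tray and two functional qualities.
tray_rp = {"rigid", "flat", "rectangular", "wooden"}
f_rps = {"more_support": {"rigid", "flat"}, "more_containment": {"hollow", "deep"}}

rel_f, rel_p = relevant_qualities(tray_rp, f_rps, phi=0.35)
print(rel_f, rel_p)
```

In this hypothetical case the support model shares two of four qualities with the tray (index 0.5), so more_support is relevant and rigid and flat become the relevant physical qualities, in line with the tray example of Sec. 4.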

Reasoning about a Substitute

Let $O^\mu \in \mathbf{O}$ be the object class of a missing tool and let $O^\beta \in \mathbf{O}$ be the object class of a possible candidate for a substitute. Let $O^\mu_{P'}$ be the set of relevant physical qualities of $O^\mu$ and let $O^\beta_{rp}$ be the representative physical model of $O^\beta$. Let $\phi$ be a Minimum Similarity Tolerance threshold. The substitutability of a candidate is determined by measuring the similarity between $O^\mu_{P'}$ and $O^\beta_{rp}$ using the Jaccard index. $O^\beta$ is termed a substitute, expressed as $O^{\beta+}$, if $J(O^\mu_{P'}, O^\beta_{rp}) > \phi$; otherwise it is regarded as not a substitute and expressed as $O^{\beta-}$. Given the set of relevant physical qualities $O^\mu_{P'}$, the set of relevant functional qualities $O^\mu_{F'}$, a positive substitute $O^{\beta+}$ and a negative substitute $O^{\beta-}$, the substitution model of $O^\mu$ is expressed as the tuple $\langle O^\mu_{P'}, O^\mu_{F'}, O^{\beta+}, O^{\beta-} \rangle$. The knowledge about the object class $O^\mu \in \mathbf{O}$ is then extended in $O_{KB}$ to accommodate its substitution model.
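The final reasoning step compares the relevant physical qualities of the missing tool with the representative physical model of each candidate. The following sketch uses hypothetical quality labels for a tray query with a plate and a mouse pad as candidates; the function name `find_substitutes` and the candidate models are assumptions, while the threshold 0.35 is the value reported in Sec. 6.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_substitutes(tool_relevant, candidates, phi):
    """Split the candidates into positive and negative substitutes by Jaccard index."""
    scores = {name: jaccard(tool_relevant, rp) for name, rp in candidates.items()}
    positive = {name for name, s in scores.items() if s > phi}
    negative = set(scores) - positive
    return scores, positive, negative

# Hypothetical relevant physical qualities of the missing tray.
tray_relevant = {"harder", "flatter"}
# Hypothetical representative physical models of the available objects.
candidates = {
    "plate": {"harder", "flatter", "light_weight"},
    "mouse_pad": {"softer", "flatter", "light_weight"},
}

scores, positive, negative = find_substitutes(tray_relevant, candidates, phi=0.35)
print(scores, positive, negative)
```

Because the index is computed against the full representative model of a candidate, a candidate described by many additional qualities scores lower even if it shares all relevant qualities of the tool.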

6 Experimental Evaluation

The objective of the experimental evaluation of ERSATZ is to validate the suitability of the substitutes computed by ERSATZ by comparing them with those selected by human experts. For the experimental evaluation, we used the images from the Washington Dataset [13] to generate human-based and machine-based properties. We selected 22 object categories and, for each category, randomly selected images from all the given instances of the category, leading to a total of 692 images. Table 1 lists the number of images selected from each category. For the experiment, we generated 22 queries based on the 22 object categories. Each query consisted of a missing tool and 5 randomly selected objects from which a substitute was to be selected. We gave the 22 queries to 14 human experts and asked them to select a substitute in each query. The distribution of the human selections for each scenario is illustrated in Fig. 3(a). Similarly, the queries were run on ERSATZ with the following (heuristically determined) optimal values of the target parameters: i) the number of machine-generated properties is set to 4 (Sec. 5.1), ii) the number of clusters to 4 (Sec. 5.2), and iii) the representative threshold (Sec. 5.3) and the Minimum Similarity Tolerance (Sec. 5.3) to 0.35.

The results of both experiments were plotted as aheat map where the y-axis shows missing tools and x-axis shows the available objects illustrated in Fig. 3.The grayed cells mean the corresponding object cate-gories were not available in the respective query. Thecells that are marked with represents substitutes se-lected by experts and ERSATZ. Out of 22 scenarios,

Table 1: Number of scans (#) per category (Σ# = 692) of the Washington RGB-D dataset [13].

Category label   Inst.   Scans per Inst.   #
ball             1-7     5                 35
binder           1-3     10                30
bowl             1-6     5                 30
cap              1-4     8                 32
cereal box       1-5     6                 30
coffee mug       1-8     4                 32
flashlight       1-5     6                 30
food bag         1-8     4                 32
food box         1-12    3                 36
food can         1-14    2                 28
food cup         1-5     6                 30
food jar         1-6     5                 30
hand towel       1-5     6                 30
keyboard         1-5     6                 30
kleenex          1-5     6                 30
notebook         1-5     6                 30
pitcher          1-3     10                30
plate            1-7     5                 35
shampoo          1-6     5                 30
soda can         1-6     5                 30
sponge           1-12    3                 36
water bottle     1-9     4                 36
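The totals in Table 1 can be checked with a few lines of arithmetic. The sketch below transcribes the instance counts and scans per instance from the table (the variable names are ours) and recomputes the per-category and overall number of scans.

```python
# (number of instances, scans per instance) per category, from Table 1.
TABLE_1 = {
    "ball": (7, 5), "binder": (3, 10), "bowl": (6, 5), "cap": (4, 8),
    "cereal box": (5, 6), "coffee mug": (8, 4), "flashlight": (5, 6),
    "food bag": (8, 4), "food box": (12, 3), "food can": (14, 2),
    "food cup": (5, 6), "food jar": (6, 5), "hand towel": (5, 6),
    "keyboard": (5, 6), "kleenex": (5, 6), "notebook": (5, 6),
    "pitcher": (3, 10), "plate": (7, 5), "shampoo": (6, 5),
    "soda can": (6, 5), "sponge": (12, 3), "water bottle": (9, 4),
}

# Scans per category = instances x scans per instance (the "#" column).
scans_per_category = {
    name: instances * per_instance
    for name, (instances, per_instance) in TABLE_1.items()
}
total_scans = sum(scans_per_category.values())

print(len(TABLE_1), total_scans)  # 22 categories, 692 scans
```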

[Figure 3: two heat maps with the missing tools on the y-axis and the substitutes on the x-axis, both over the 22 object categories of Table 1; color scale "Distribution/Similarity" from 0.0 to 1.0. (a) Human expert selection distributions. (b) ERSATZ selections with similarity to the missing tool.]

Figure 3: Substitution results w.r.t. human expert selection distribution and ERSATZ similarity responses. Note that gray cells correspond to object categories that are not available in the respective query, and marked cells represent substitutes selected by experts and ERSATZ.

ERSATZ and the experts identified the same substitutes in 20 scenarios (91%).
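The agreement figure follows from 20/22 ≈ 0.909. A minimal sketch of such a comparison is given below. It assumes that the experts' choice in a scenario is the most frequently selected object, which is our reading and is not spelled out in the evaluation; the function names and the two toy scenarios are hypothetical and serve only to exercise the code.

```python
from collections import Counter


def expert_choice(votes):
    """Most frequently selected substitute among the experts (assumed rule)."""
    return Counter(votes).most_common(1)[0][0]


def agreement(expert_votes, system_choices):
    """Return (matching scenarios, total scenarios)."""
    matches = sum(
        expert_choice(votes) == system_choices[tool]
        for tool, votes in expert_votes.items()
    )
    return matches, len(expert_votes)


# Toy data for two scenarios with 14 experts each (not the real results).
toy_votes = {
    "plate": ["binder"] * 10 + ["sponge"] * 4,
    "bowl": ["coffee mug"] * 9 + ["cap"] * 5,
}
toy_system = {"plate": "binder", "bowl": "cap"}

matched, total = agreement(toy_votes, toy_system)

# Reported result: 20 matching scenarios out of 22.
reported_rate = round(100 * 20 / 22)  # 91
```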

7 Future Work

The paper presents a prototypical system to determine a substitute for a missing tool using grounded knowledge about objects. The approach draws inspiration from symbol grounding, the theory of affordances, and the theory of image schemas to represent the grounded knowledge and to determine a substitute. This is ongoing research with a focus on the


[Figure 4 labels: Sensory Data - Object Instances; Physical Property Data - Object Instances; Functional Property Data - Object Instances; Fuzzy Conceptual Knowledge - Object Instances; Fuzzy Conceptual Knowledge - Object Classes; Property Extraction Methods; Clustering Method; Bivariate Joint Frequency Distributions; Aggregation.]

Figure 4: Multi-layered dataset to build a robot-centric grounded knowledge about objects.

following aspects.

Our immediate goal is the fuzzification of the clustering method and the reasoning method to combat the migration of data points within clusters. Moreover, we have derived three functional properties, namely contain, support, and block, from the image schemas Containment, Support, and Blockage, respectively. However, further investigation is needed to formalize the identification of additional functional properties to be derived from the existing image schemas. For robot-centric property acquisition, we are currently developing a framework that allows a robot to extract properties of individual objects and build a knowledge base in a bottom-up manner, such that the knowledge about properties of objects is constructed on the basis of what is sensed (see Fig. 4). We have proposed the preliminary framework in [11].

References

[1] P. Abelha, F. Guerin, and M. Schoeler. A model-based approach to finding substitute tools in 3D vision data. Proceedings - IEEE International Conference on Robotics and Automation, 2016-June, 2016.

[2] A. Agostini, M. J. Aein, S. Szedmak, E. E. Aksoy, J. Piater, and F. Worgotter. Using structural bootstrapping for object substitution in robotic executions of human-like manipulation tasks. IEEE International Conference on Intelligent Robots and Systems, 2015-Decem:6479–6486, 2015.

[3] I. Awaad, G. K. Kraetzschmar, and J. Hertzberg. Challenges in Finding Ways to Get the Job Done. Planning and Robotics (PlanRob) Workshop at 24th International Conference on Automated Planning and Scheduling, 2014.

[4] C. Baber. Cognition and Tool Use. Taylor and Francis, 2003.

[5] C. Baber. Introduction. In Cognition and Tool Use, chapter 1, pages 1–15. Taylor and Francis, 2003.

[6] C. Baber. Working With Tools. In Cognition and Tool Use, pages 51–68. 2003.

[7] A. Boteanu, A. St. Clair, A. Mohseni-Kabir, C. Saldanha, and S. Chernova. Leveraging Large-Scale Semantic Networks for Adaptive Robot Task Learning and Execution. Big Data, 4(4):217–235, 2016.

[8] M. Daoutis, S. Coradeschi, and A. Loutfi. Grounding commonsense knowledge in intelligent systems. Journal of Ambient Intelligence and Smart Environments, 1(4):311–321, 2009.

[9] P. Gardenfors. Cognitive semantics and image schemas with embodied forces. Image (Rochester, N.Y.), pages 1–16, 1987.

[10] R. Gupta and M. J. Kochenderfer. Common Sense Data Acquisition for Indoor Mobile Robots. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, pages 605–610, San Jose, California, USA, 2004.

[11] G. Jager, C. A. Mueller, M. Thosar, S. Zug, and A. Birk. Towards Robot-Centric Conceptual Knowledge Acquisition. In Robots that learn and reason in IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, 2018.

[12] W. Kuhn. An Image-Schematic Account of Spatial Categories. Spatial Information Theory, pages 152–168, 2007.

[13] K. Lai, L. Bo, X. Ren, and D. Fox. A Large-Scale Hierarchical Multi-View RGB-D Object Dataset. In IEEE International Conference on Robotics and Automation (ICRA), pages 1817–1824, Shanghai, China, 2011.

[14] S. Lemaignan, R. Ros, L. Mosenlechner, R. Alami, and M. Beetz. ORO, a knowledge management platform for cognitive architectures in robotics. IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings, (April):3548–3553, 2010.

[15] G. H. Lim, I. H. Suh, and H. Suh. Ontology-based unified robot knowledge for service robots in indoor environments. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 41(3):492–509, 2011.

[16] C. Mueller, K. Pathak, and A. Birk. Object shape categorization in rgbd images using hierarchical graph constellation models based on unsupervisedly learned shape parts described by a set of shape specificity levels. In International Conference on Intelligent Robots and Systems, 2014.

[17] C. A. Mueller and A. Birk. Hierarchical Graph-Based Discovery of Non-Primitive-Shaped Objects in Unstructured Environments. In International Conference on Robotics and Automation, 2016.

[18] C. A. Mueller and A. Birk. Conceptualization of Object Compositions Using Persistent Homology. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, 2018.

[19] L. A. Pineda, A. Rodríguez, G. Fuentes, C. Rascon, and I. Meza. A light non-monotonic knowledge-base for service robots. Intelligent Service Robotics, 10(3):159–171, 2017.

[20] A. Saxena, A. Jain, O. Sener, A. Jami, D. K. Misra, and H. S. Koppula. RoboBrain: Large-Scale Knowledge Engine for Robots. arXiv, pages 1–11, 2014.

[21] I. H. Suh, G. H. Lim, W. Hwang, H. Suh, J. H. Choi, and Y. T. Park. Ontology-based multi-layered robot knowledge framework (OMRKF) for robot intelligence. IEEE International Conference on Intelligent Robots and Systems, (October):429–436, 2007.

[22] S. Szedmak, E. Ugur, and J. Piater. Knowledge Propagation and Relation Learning for Predicting Action Effects. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, 9 2014.

[23] M. Tenorth and M. Beetz. KNOWROB-Knowledge Processing for Autonomous Personal Robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4261–4266, 2009.

[24] M. Thosar, S. Zug, A. M. Skaria, and A. Jain. A Review of Knowledge Bases for Service Robots in Household Environments. In 6th International Workshop on Artificial Intelligence and Cognition, 2018.

[25] Y. Zhu, A. Fathi, and L. Fei-Fei. Reasoning About Object Affordance in a Knowledge Based Representation. European Conference on Computer Vision, (3):408–424, 2014.
