
Guiding Interaction Behaviors for Multi-modal Grounded Language Learning

Jesse Thomason, Jivko Sinapov, Raymond J. Mooney
University of Texas at Austin


Multi-Modal Grounded Linguistic Semantics

Robots need to be able to connect language to their environment in order to discuss real-world objects with humans. Grounded language learning connects a robot’s sensory perception to natural language predicates. We consider a corpus of objects explored by a robot in previous work (Sinapov, IJCAI 2016).

[Figure: the robot’s exploratory behaviors (grasp, lift, lower, drop, press, push).]

Hold and look behaviors were also performed for each object.

- Objects were sparsely annotated with language predicates from an interactive “I Spy” game with humans (Thomason, IJCAI 2016).

- Many object examples are necessary to ground language in multi-modal space.

- We explore additional sources of information from users and corpora:
  - Behavior annotations: “What behavior would you use to understand ‘red’?”
  - Modality annotations: “How do you experience ‘heaviness’?”
  - Word embeddings: exploiting that “thin” is close to “narrow.”

Methodology

The decision d(p, o) ∈ [−1, 1] for predicate p and object o is

d(p, o) = \sum_{c \in C} \kappa_{p,c} \, G_{p,c}(o),   (1)

for G_{p,c} a linear SVM trained on labeled objects for p in the feature space of context c, with Cohen’s κ_{p,c} agreement with human labels during cross-validation.
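As a rough illustration, the sum in Eq. (1) can be computed by looping over contexts, querying each context-specific SVM, and weighting its output by the corresponding κ. The sketch below assumes scikit-learn-style classifiers and dictionary inputs; the variable names and the normalization by total κ mass (used here only to keep the result in [−1, 1]) are illustrative assumptions, not details from the poster.

```python
import numpy as np

def predicate_decision(obj_features, svms, kappas):
    """Sketch of Eq. (1): d(p, o) = sum_c kappa_{p,c} * G_{p,c}(o).

    obj_features: dict mapping context c -> feature vector of object o in c
    svms:         dict mapping context c -> linear SVM trained for predicate p
    kappas:       dict mapping context c -> Cohen's kappa from cross-validation

    Normalizing by total kappa mass keeps the result in [-1, 1]; the poster
    does not specify this step, so treat it as an assumption.
    """
    total, mass = 0.0, 0.0
    for c, svm in svms.items():
        # G_{p,c}(o): signed prediction of the context-specific SVM
        g = np.sign(svm.decision_function([obj_features[c]])[0])
        total += kappas[c] * g
        mass += abs(kappas[c])
    return total / mass if mass > 0.0 else 0.0
```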

[Figure: example decision for the predicate “round”: per-context classifiers over behaviors (grasp, lift, look, lower) and modalities (audio, haptics, color, shape) are combined into a yes/no decision.]

Data from Human Interactions

[Figure: human behavior annotations and estimated context reliability (κ) are combined into behavior-augmented (+Behavior Annotations) confidences.]

Behavior Annotations. We gather relevant behaviors from human annotators for each predicate, allowing us to augment classifier confidences with human suggestions. The predicate “heavy” favors interaction behaviors like lifting and holding.
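The poster does not give the exact rule for folding behavior annotations into the decision, so the snippet below is only one plausible, hypothetical reading: contexts whose behavior annotators marked as relevant for a predicate (e.g. lift and hold for “heavy”) have their κ reliability boosted before Eq. (1) is applied. The function name, the boost parameter, and the additive rule are all assumptions.

```python
def behavior_augmented_kappas(kappas, relevant_behaviors, boost=1.0):
    """Hypothetical B+kappa weighting.

    kappas: dict mapping (behavior, modality) context -> Cohen's kappa
    relevant_behaviors: behaviors annotators associated with the predicate,
                        e.g. {"lift", "hold"} for "heavy"
    """
    return {
        (behavior, modality): kappa + (boost if behavior in relevant_behaviors else 0.0)
        for (behavior, modality), kappa in kappas.items()
    }
```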

Modality Annotations. We also derive modality annotations from past work (Lynott, 2009), allowing us to augment classifier confidences with human intuitions. The predicate “white” favors vision-related modalities.

Word Embeddings. We calculate positive cosine similarity between predicates’ word embeddings and augment each predicate’s classifier confidences with similarity-weighted averages of its predicate neighbors. A predicate like “narrow” with few object examples can borrow information from a predicate like “thin” that is close in word-embedding space and has many examples.
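A minimal sketch of that sharing step, assuming pre-computed embedding vectors and per-predicate decisions for a given object; the positive cosine weighting follows the poster’s description, but the averaging rule and all identifiers are illustrative.

```python
import numpy as np

def embedding_shared_decision(target, decisions, embeddings):
    """Hypothetical W+kappa sharing: average neighbors' decisions,
    weighted by positive cosine similarity in word-embedding space
    (e.g. "narrow" borrows from "thin").

    decisions:  dict predicate -> its own kappa-weighted decision d(p, o)
    embeddings: dict predicate -> word-embedding vector (numpy array)
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    total, mass = 0.0, 0.0
    for p, d in decisions.items():
        if p == target:
            continue
        sim = max(0.0, cosine(embeddings[target], embeddings[p]))  # positive similarity only
        total += sim * d
        mass += sim
    return total / mass if mass > 0.0 else 0.0
```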

Experiment and Results

Precision (p), recall (r), and f1 for the mc baseline, κ-weighted decisions alone, and κ combined with behavior annotations (B+κ), modality annotations (M+κ), and word embeddings (W+κ):

        p     r     f1
mc    .282  .355  .311
κ     .406  .460  .422
B+κ   .489  .489  .465
M+κ   .414  .466  .430
W+κ   .373  .474  .412

- Adding behavior annotations or modality annotations improves performance over using kappa confidence alone.

- Behavior annotations helped the f-measure of predicates like “pink”, “green”, and “half-full”.

- Modality annotations helped with predicates like “round”, “white”, and “empty”.

- Sharing kappa confidences across similar predicates based on their embedding cosine similarity improves recall at the cost of precision.
  - For example, “round” improved, but at the expense of domain-specific meanings of predicates like “water”.

jesse@cs.utexas.edu    https://www.cs.utexas.edu/~jesse/
