grounding words in perception and action


Page 1: Grounding Words in Perception and Action

kovanresearchlab

Grounding Words in Perception and Action:

A Robotics Perspective

Kadir Firat Uyanik
[email protected]

Cogs 534: Cognition, Perception and Action
offered by Dr. Annette Hohenberger

19.12.2011

Page 2: Grounding Words in Perception and Action

Outline

1. Introduction
2. Words and physical world
3. Words and perceptual categories
4. Words and context dependency
5. Word learning from audio-visual inputs
6. Grounding verbs in action
7. Grounding nouns in perception and action
8. Grounding concepts through social interactions
9. Proposed Framework
10. Conclusions

Page 3: Grounding Words in Perception and Action

Introduction


Page 4: Grounding Words in Perception and Action

Words and physical world

• According to results from cognitive science and neuroscience (especially embodied cognition) studies,
  – language develops in parallel with the interactive actions that we generate.
    • Lakoff [3], and Gallese and Lakoff [4]: metaphors are grounded
    • Zwaan and Taylor [5]: role of action in language comprehension
    • Hauk et al. [6]: effect of heard action verbs on motor activations
    • Kaschak et al. [7]: effect of motion perception on the comprehension of motion sentences
    • Chambers et al. [8]: effect of bodily capabilities on the grammatical analysis of a heard sentence
    • For a detailed review, see Glenberg [9]
  – language should be grounded in something that is not symbolic. The sensory, action, and emotion systems of our bodies provide that grounding (Harnad’s symbol merry-go-round argument [10]).

Page 5: Grounding Words in Perception and Action

Words and physical world contn’d

Thus, the meaning of
• round is grounded in visual features of exemplars,
• push in motor control structures,
• heavy in haptic features,
• or they are grounded in combinations/interrelations of all these features.

Page 6: Grounding Words in Perception and Action

Words and perceptual categories

• Most language grounding systems can only label clusters of similar inputs:
  1. Convert continuous sensory input into discrete feature vectors,
  2. Cluster similar feature vectors,
  3. Label the clusters according to a linguistic convention.
• Usually, these systems are not context aware, and fixed category models cannot capture context-sensitive details (e.g., Mojsilovic’s color associator [16]).

Red wine and black wine: “red” and “black” refer to the same object in two different linguistic conventions.
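The three-step labeling pipeline described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the RGB readings, the choice of plain k-means, and the function names are all hypothetical and are not taken from any of the cited systems.

```python
import math

def to_feature_vector(rgb):
    """Step 1: scale a raw RGB reading (0-255 per channel) to [0, 1]."""
    return tuple(c / 255.0 for c in rgb)

def nearest(vec, centers):
    """Index of the center closest to vec (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(vec, centers[i]))

def kmeans(vectors, centers, iterations=10):
    """Step 2: plain k-means starting from the given initial centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        # Recompute each center as the mean of its group (keep it if empty).
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

# Hypothetical sensor readings: three reddish and three bluish patches.
readings = [(250, 10, 20), (240, 30, 25), (230, 20, 40),
            (20, 30, 240), (10, 40, 250), (35, 25, 230)]
vectors = [to_feature_vector(r) for r in readings]
centers = kmeans(vectors, centers=[vectors[0], vectors[3]])

# Step 3: attach a word to each cluster by convention (fixed, context-free).
labels = {0: "red", 1: "blue"}

def name_color(rgb):
    return labels[nearest(to_feature_vector(rgb), centers)]
```

Because the label table is fixed, `name_color` returns the same word for a given reading in every context, which is exactly the limitation the slide points out.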

Page 7: Grounding Words in Perception and Action

Words and context dependency

• Gärdenfors [17] proposed a model that relates context-independent color prototypes to wine colors.

• This model also shows why “red” and “white” cannot be used interchangeably, whereas “red” and “black” can refer to the same wine color in different linguistic conventions.

Red wine and black wine: “red” and “black” refer to the same object in two different linguistic conventions.

Page 8: Grounding Words in Perception and Action

Words and context dependency contn’d

• Regier [18] showed that even simple words such as ‘above’ or ‘near’ may correspond to rather implicit features of the environment.

• He found two main features with which a model of the spatial relation ‘above’ closely matches human judgments.

• However, models like Gärdenfors’ and Regier’s are insensitive to functional contexts.

Different levels of “aboveness”: from left to right, the statement ‘the circle is above the block’ fits the scene less and less well.

Page 9: Grounding Words in Perception and Action

Word learning from audio-visual inputs

Roy, D. and Pentland, A. (2002) Learning words from sights and sounds: A computational model. Cogn. Sci. 26, 113–146

CELL (Cross-channel Early Lexical Learning)

Page 10: Grounding Words in Perception and Action

Word learning from audio-visual inputs contn’d

• CELL assumes that the object-of-interest is available.
• Yu, Ballard and Aslin developed a system that processes spoken input paired with visual images of multiple objects, combined with the speaker’s eye gaze direction.

Yu, C., Ballard, D.H., Aslin, R.N. The role of embodied intention in early lexical acquisition. Cogn. Sci. vol. 29, issue 6, pp. 961-1005, 2005
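To illustrate why gaze helps when several objects are in view, here is a deliberately simple co-occurrence sketch. It is not the model of Yu, Ballard and Aslin, which is considerably more elaborate; the episodes, the scoring rule, and all names below are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical episodes: (spoken words, objects in view, object being looked at).
episodes = [
    (["look", "a", "ball"], ["ball", "cup"], "ball"),
    (["the", "cup", "look"], ["ball", "cup"], "cup"),
    (["a", "red", "ball"], ["ball", "dog"], "ball"),
    (["a", "dog"], ["cup", "dog"], "dog"),
]

def learn(episodes):
    """Count how often each word is heard while each object is being gazed at."""
    counts = defaultdict(Counter)
    for words, _objects, gazed in episodes:
        for w in set(words):
            counts[w][gazed] += 1   # gaze picks the referent among the objects
    return counts

def best_word(counts, obj):
    """Word most specific to obj: co-occurrence count times P(obj | word)."""
    def score(w):
        return counts[w][obj] ** 2 / sum(counts[w].values())
    return max(counts, key=score)

counts = learn(episodes)
```

Without the gaze column, every word in an utterance would have to be paired with every visible object, and frequent function words such as "a" would compete much more strongly with the intended names.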

Page 11: Grounding Words in Perception and Action

Grounding verbs in action

• In Siskind’s perceptually grounded model, the semantics of basic verbs are modeled using temporal schemas that define expected sequences of force dynamic interactions between objects.

  – E.g. ‘hand picks up block’:
    • table-supports-block,
    • hand-contacts-block,
    • hand-attached-block,
    • hand-supports-block

• Time durations are not specified by the schemas, enabling the model to classify observations across varying timescales.

• Higher level actions are defined in terms of these lower level schemas. Thus ‘move’ is defined as the ordered sequence of the schemas corresponding to ‘pick up’ followed by ‘put down’.

Siskind’s schemas

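The idea that a schema fixes the order of states but not their durations can be sketched as below. This is a much-simplified stand-in for Siskind's model, which uses a richer temporal logic; the ‘put down’ schema (taken here as the mirror image of ‘pick up’) and the function names are assumptions.

```python
from itertools import groupby

# Schema for 'pick up' from the slide: an ordered list of force-dynamic states.
PICK_UP = ["table-supports-block", "hand-contacts-block",
           "hand-attached-block", "hand-supports-block"]
# Hypothetical mirror-image schema for 'put down' (an assumption, not from the slide).
PUT_DOWN = list(reversed(PICK_UP))

def collapse(observations):
    """Drop durations: merge consecutive repeats of the same state."""
    return [state for state, _ in groupby(observations)]

def matches(observations, schema):
    """True if the observed states follow the schema, at any speed."""
    return collapse(observations) == schema

def is_move(observations):
    """'move' = 'pick up' followed by 'put down', sharing the middle state."""
    return collapse(observations) == PICK_UP + PUT_DOWN[1:]

# The same event observed at two very different speeds.
slow = (["table-supports-block"] * 5 + ["hand-contacts-block"] * 2
        + ["hand-attached-block"] + ["hand-supports-block"] * 7)
fast = list(PICK_UP)
```

Both `slow` and `fast` match `PICK_UP`, since only the order of states is compared.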

Page 12: Grounding Words in Perception and Action

Grounding verbs in action contn’d

• How can ‘push’ and ‘shove’ be distinguished using Siskind’s schemas?
  – They cannot, since we need action parameters, or at least timing information, to differentiate these kinds of actions.
• Bailey et al. addressed this issue by developing a system that learns verb semantics in terms of action control structures, called ‘x-schemas’, which control sequences of movements of a simulated manipulator arm.
• A verb is defined by its associated x-schema and control parameters.
  – The verbs ‘pick up’ and ‘put down’ are distinguished by the structure of their associated x-schemas,
  – ‘push’ and ‘shove’ are distinguished by different force or velocity control parameters applied to a structurally identical x-schema.

Bailey et al.’s x-schemas
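A rough sketch of a verb defined by a schema structure plus control parameters is given below. The step names, force values, and nearest-parameter rule are hypothetical and only illustrate the distinction drawn on the slide; they are not Bailey et al.'s implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Verb:
    schema: tuple   # ordered movement steps (the x-schema structure)
    force: float    # control parameter; units and values are made up

SLIDE = ("reach", "contact", "apply-force", "release")   # hypothetical steps
LEXICON = {
    "push":     Verb(SLIDE, force=2.0),
    "shove":    Verb(SLIDE, force=8.0),
    "pick up":  Verb(("reach", "grasp", "lift"), force=3.0),
    "put down": Verb(("lower", "release", "retract"), force=3.0),
}

def name_action(schema, force):
    """Pick the verb with the same structure and the closest force parameter."""
    candidates = {w: v for w, v in LEXICON.items() if v.schema == schema}
    if not candidates:
        return None
    return min(candidates, key=lambda w: abs(candidates[w].force - force))
```

Here ‘pick up’ and ‘put down’ differ in structure, while ‘push’ and ‘shove’ share `SLIDE` and are separated only by the force parameter.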

Page 13: Grounding Words in Perception and Action

Grounding nouns in perception and action

• Verbs: sensory-motor control programs similar to x-schemas.
• Adjectives: sensory expectations relative to specific actions. E.g.,
  – red: not simply a color category, but a color category linked to the motor program for directing active gaze towards an object.
  – heavy: haptic expectations associated with lifting actions.
• Locations are encoded in terms of body-relative coordinates.
• Objects: bundles of properties tied to a particular location, along with encodings of motor affordances for affecting the future location of the bundle.
  – E.g. ball: subsumes both the meaning of round (one of its expected properties, along with color, size, etc.) and all of the actions that may affect the ball.

Roy’s framework

Deb Roy, Semiotic schemas: A framework for grounding language in action and perception, Artificial Intelligence, Volume 167, Issues 1-2, September 2005, Pages 170-205.

Page 14: Grounding Words in Perception and Action

Grounding nouns in perception and action contn’d

• Revised Definition: An affordance is an acquired relation between an <effect> and an <(entity, behavior)> tuple of an agent, such that applying the <behavior> to the <entity> generates that <effect> [2].

[Figure: the agent applies a <behavior> to an <entity> in the environment, producing an <effect>; the affordance is the relation (<effect>, <(entity, behavior)>).]

Affordances framework [1]

Page 15: Grounding Words in Perception and Action

Grounding nouns in perception and action contn’d

Affordances framework: Overview

Entity: includes various perceptual features sensed through distinct sensors

Effect: includes changes in the features representing an object-of-interest

Behavior: ID of the pre-coded action

Affordance: <(entity, behavior), effect>, a nested relation between these three components
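One way to hold a single affordance sample in code is sketched below, with the effect computed as the change in features before and after the behavior. The feature names, values, and the `observe` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Affordance:
    entity: dict    # perceptual features before acting
    behavior: str   # id of a pre-coded action
    effect: dict    # change in each feature caused by the behavior

def observe(before, behavior, after):
    """Build one (effect, (entity, behavior)) sample from a single interaction."""
    effect = {name: after[name] - before[name] for name in before}
    return Affordance(entity=before, behavior=behavior, effect=effect)

# Hypothetical feature names and values for a cup lifted off the table.
before = {"height_cm": 0.0, "width_cm": 8.0, "x_cm": 30.0}
after = {"height_cm": 12.0, "width_cm": 8.0, "x_cm": 30.0}
sample = observe(before, "lift", after)
```

Collecting many such samples over different entities and behaviors would give the data from which the relation is learned.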

Page 16: Grounding Words in Perception and Action

Grounding nouns in perception and action contn’d

Affordances framework: How to say “<do this> to <that thing>”?

Let’s try “lift the cup”!

Page 17: Grounding Words in Perception and Action

Grounding nouns in perception and action contn’d

Affordances framework: How to say “<do this> to <that thing>”?

Verb (<do this>)
Whether <this> action gets done does not actually depend on the way the action is applied. It is more about the effect generated on <that thing>.

Instead of representing verbs with behaviors, represent them with effect clusters.
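Representing verbs by effect clusters rather than by behaviors could look roughly like this. The prototypes, behavior names, and effect values are invented for illustration, and nearest-prototype matching is just one simple choice of clustering rule.

```python
import math

# Hypothetical effect prototypes: mean change in (x, y, height) per verb, in cm.
EFFECT_PROTOTYPES = {
    "lift": (0.0, 0.0, 12.0),
    "push": (15.0, 0.0, 0.0),
}

def verb_for(effect):
    """Name an interaction by the nearest effect prototype."""
    return min(EFFECT_PROTOTYPES,
               key=lambda verb: math.dist(effect, EFFECT_PROTOTYPES[verb]))

# Two different behaviors that raise the object get the same verb.
effects = {"grasp-from-top": (0.5, 0.2, 11.0),
           "scoop-from-below": (1.0, -0.4, 13.5),
           "poke-forward": (14.0, 1.0, 0.1)}
verbs = {behavior: verb_for(e) for behavior, e in effects.items()}
```

The behavior id never enters `verb_for`; only the observed effect decides which verb applies.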

Page 18: Grounding Words in Perception and Action

Grounding nouns in perception and action contn’d

Affordances framework: How to say “<do this> to <that thing>”?

Noun (<that thing>)
A robot can learn which features of an object do not change by applying various actions to that object.

These stable features are good indicators of the object itself, and of what it actually is.

Therefore, the stable features can be used to refer to the object as <that thing>, and the variable features can be used to predict what is going to happen (<effect>) if the robot carries out <do this>.
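Separating stable from variable features can be sketched with a simple variance test over repeated observations of one object. The feature names, readings, and threshold are assumptions; in practice the features would need to be normalized so that one threshold is meaningful across units.

```python
from statistics import pvariance

# Hypothetical feature readings of one object after each of four actions.
observations = [
    {"x_cm": 30.0, "height_cm": 0.0,  "width_cm": 8.0, "hue": 0.02},
    {"x_cm": 45.0, "height_cm": 0.0,  "width_cm": 8.1, "hue": 0.02},
    {"x_cm": 45.0, "height_cm": 12.0, "width_cm": 7.9, "hue": 0.03},
    {"x_cm": 20.0, "height_cm": 0.0,  "width_cm": 8.0, "hue": 0.02},
]

def split_features(observations, threshold=0.1):
    """Split feature names into stable and variable ones by variance."""
    stable, variable = [], []
    for name in observations[0]:
        values = [obs[name] for obs in observations]
        (stable if pvariance(values) < threshold else variable).append(name)
    return stable, variable

stable, variable = split_features(observations)
```

In this toy data, width and hue stay nearly constant and would serve to identify the object, while position and height change with the action and would carry the effect.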

Page 19: Grounding Words in Perception and Action

Grounding concepts through social interactions

Left: A scene from “R.U.R.” (1921), showing three robots [11]. Right: A scene from “Sayonara” (2010) [12].

Human-Robot Interaction

Page 20: Grounding Words in Perception and Action

Grounding concepts through social interactions

• Human–Robot Interaction (HRI) is a field of study dedicated to understanding, designing, and evaluating robotic systems for use by or with humans [13].

• HRI is a highly interdisciplinary field which requires collaboration between the groups from cognitive science, linguistics, psychology, engineering, mathematics, computer science etc.

• Unfortunately, robots are still far from being able to interact with humans in a smooth, natural way (Breazeal [14], Fong et al. [15]).

Human-Robot Interaction

Page 21: Grounding Words in Perception and Action

Proposed Framework

A common framework for interaction
• Learning affordances by acting directly in the environment, by observing others acting, or even by acting collaboratively.
• Understanding what is to be done to what:
  – Verbs identify the action (in fact, the effect),
  – Nouns identify the entity to apply the action to.
• Assumptions:
  – The robot’s action repertoire is pre-coded,
  – The object-of-interest is available to the robot.
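Under the two assumptions listed on this slide, interpreting a command such as “lift the cup” could be sketched as a lookup from the verb (an effect) and the noun (an entity) to a pre-coded behavior. The table contents, the behavior names, and the trivial command parser are hypothetical.

```python
# Hypothetical learned affordances: (noun, behavior) -> verb naming the expected effect.
PREDICTED_EFFECT = {
    ("cup", "grasp-from-top"): "lift",
    ("cup", "poke-forward"): "push",
    ("ball", "poke-forward"): "roll",
}

def plan(command):
    """Map '<verb> the <noun>' to a pre-coded behavior, or None if none is known."""
    words = [w for w in command.lower().split() if w != "the"]
    if len(words) != 2:
        return None
    verb, noun = words
    for (entity, behavior), effect in PREDICTED_EFFECT.items():
        if entity == noun and effect == verb:
            return behavior
    return None
```

The same behavior (`poke-forward`) maps to different verbs depending on the entity, which is why the verb is tied to the effect rather than to the behavior itself.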

Page 22: Grounding Words in Perception and Action

Proposed Framework

Experimental Setup: Overview


Page 23: Grounding Words in Perception and Action

Proposed Framework

Experimental Setup: Tabletop 3D object segmentation & identification


Page 24: Grounding Words in Perception and Action

Proposed Framework

Experimental Setup: Tabletop 2D object segmentation & identification


Page 25: Grounding Words in Perception and Action

Proposed Framework

Experimental Setup: Tactile Sense


Page 26: Grounding Words in Perception and Action

Proposed Framework

Experimental Setup: Experiment

iCub: “Please tell me what to do!”
Human: “iCub reach object one”
iCub: “I'm perceiving”
iCub: “I guess object one is going to be reached”
iCub: “Please give me object one”
iCub: “Please tell me what happened”
Human: “iCub object one is reached”

Page 27: Grounding Words in Perception and Action

Proposed Framework

Experimental Setup: Preliminary Results


Page 28: Grounding Words in Perception and Action

Conclusion

• Our purpose is
  – to enable the emergence of verbs and nouns from the robot’s interactions with the environment,
  – to enable the emergence of the same concepts through observing others or interacting collaboratively with a human.
• In the end, our robot should be able to interact with a human partner in a reasonable way to accomplish a given task, and to learn from demonstration how to get it done.

Page 29: Grounding Words in Perception and Action

References
[1] J. J. Gibson (1977), The Theory of Affordances. In Perceiving, Acting, and Knowing, Eds. Robert Shaw and John Bransford, ISBN 0-470-99014-7.
[2] E. Sahin, M. Cakmak, M. R. Dogar, E. Ugur, G. Ucoluk, To Afford or Not to Afford: A New Formalization of Affordances Toward Affordance-Based Robot Control, Adaptive Behavior, 2007, pp. 447-472.
[3] Lakoff G. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago, IL: University of Chicago Press; 1987.
[4] Gallese V, Lakoff G. The brain’s concepts: the role of the sensory-motor system in conceptual knowledge. Cogn Neuropsychol 2005, 22:455–479.
[5] Zwaan RA, Taylor LJ. Seeing, acting, understanding: motor resonance in language comprehension. J Exp Psychol Gen 2006, 135:1–11.
[6] Hauk O, Johnsrude I, Pulvermüller F. Somatotopic representation of action words in human motor and premotor cortex. Neuron 2004, 41:301–307.
[7] Kaschak MP, Madden CJ, Therriault DJ, Yaxley RH, Aveyard M, et al. Perception of motion affects language processing. Cognition 2005, 94:B79–B89.
[8] Chambers CG, Tanenhaus MK, Magnuson JS. Actions and affordances in syntactic ambiguity resolution. J Exp Psychol Learn Mem Cogn 2004, 30:687–696.
[9] Glenberg, Arthur M. Embodiment as a unifying perspective for psychology, Wiley Interdisciplinary Reviews: Cognitive Science, vol. 1, issue 4, 2010.
[10] Harnad S. The symbol grounding problem. Physica D 1990, 42:335–346.
[11] http://www.umich.edu/~engb415/literature/pontee/RUR/RURsmry.html
[12] http://www.seinendan.org/en/special/2011/europe/
[13] Goodrich MA, Schultz AC. Human–Robot Interaction: A Survey, Foundations and Trends® in Human–Computer Interaction, Vol. 1, No. 3 (2007), 203–275.
[14] Breazeal, C. (2003). Toward sociable robots. Robotics and Autonomous Systems, 42(3-4), 167–175. Elsevier.
[15] Fong, T., Nourbakhsh, I., & Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and Autonomous Systems. Elsevier.
[16] Mojsilovic, A. (2005) A computational model for color naming and describing color composition of images. IEEE Trans. Image Process. 14, 690–699.
[17] Gärdenfors, P. (2000) Conceptual Spaces: The Geometry of Thought, MIT Press.
[18] Regier, T. (1996) The Human Semantic Potential, MIT Press.

Page 30: Grounding Words in Perception and Action

Thank you for listening…
