chapter 8. situated dialogue processing for human-robot interaction in cognitive systems,...

Chapter 8. Situated Dialogue Processing for Human-Robot Interaction in Cognitive Systems, Christensen et al.

Course: Robots Learning from Humans

Sabaleuski Matsvei

Interdisciplinary Program in Cognitive Science
Seoul National University

http://cogsci.snu.ac.kr

Contents

Introduction

Background

Multi-level Integration in Language Processing

Language Processing and Situational Experience

Talking

Talking about What You Can See

Talking about Places You Can Visit

Talking about Things You Can Do

Conclusions


[Figure: dialogue about «THE WORLD» in two scenarios: the Playmate scenario (local visuo-spatial scenes) and the Explorer scenario (spatial organization of an indoor environment)]

Introduction

Requirements for the solution

Gradual construction

Referentiality

Persistence

Efficiency & Effectiveness

LANGUAGE PERCEPTION

Winograd's SHRDLU

Incremental "left-to-right" linguistic analyses connected to visuo-spatial representations of local scenes.

Could understand and execute human commands

Had a basic memory to supply context

Small virtual world

Language consisting of around 50 words

Steels's Semiotic Networks

Open-ended, adaptive communication system

Ability to learn

Communicative success above 80%

Lexicon of around 50 words

Impossible to connect alternative meanings at the same time

Sony AIBO robots

Bi-directionality hypothesis

• Gradual construction

Use of Combinatory Categorial Grammar (CCG)

• Referentiality

Use of structured discourse representation models with the ability to resolve linguistic reference to situated context

• Persistence

Different referent resolutions can be combined, which is used in visual learning

• Efficiency & Effectiveness

The incremental comprehension model can sort out unlikely word and meaning hypotheses

Performance of speech recognition and parsing is close to 90%

Multi-level Integration in Language Processing

Modular model

A context-independent representation is constructed first, and only then is it interpreted against the preceding dialogue context

Incremental model

Every new word is related to representations of the preceding input

Principle of parsimony: preference for the least 'presuppositionally' heavy interpretations

e.g. The postman delivered the baby. Mary gave the child the dog bit a bandaid.

Incremental model is supported by the results of psycholinguistic research (saccadic eye movement research)
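In the incremental model each new word is related to what came before; in a situated setting this also means relating it to the visual scene. A minimal sketch, assuming a toy scene and a hand-written lexicon that maps some words to (attribute, value) constraints (both hypothetical, not the chapter's representations): after every word, the set of candidate referents is narrowed, so the likely referent can be anticipated before the noun arrives.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Scene = Dict[str, Dict[str, str]]      # object id -> attribute -> value (hypothetical)
Lexicon = Dict[str, Tuple[str, str]]   # word -> (attribute, value) constraint (hypothetical)


def incremental_referents(words: List[str], scene: Scene,
                          lexicon: Lexicon) -> List[Tuple[str, FrozenSet[str]]]:
    """Return the set of candidate referents after each word of the input."""
    candidates = set(scene)
    history = []
    for word in words:
        constraint: Optional[Tuple[str, str]] = lexicon.get(word)
        if constraint is not None:
            attr, value = constraint
            # Keep only the objects compatible with the constraint this word adds.
            candidates = {o for o in candidates if scene[o].get(attr) == value}
        history.append((word, frozenset(candidates)))
    return history


scene = {
    "mug1": {"type": "mug", "colour": "red"},
    "ball1": {"type": "ball", "colour": "red"},
    "box1": {"type": "box", "colour": "green"},
}
lexicon = {"red": ("colour", "red"), "green": ("colour", "green"),
           "mug": ("type", "mug"), "ball": ("type", "ball"), "box": ("type", "box")}

for word, cands in incremental_referents("take the red mug".split(), scene, lexicon):
    print(word, sorted(cands))
```

After "red" only `mug1` and `ball1` remain, and "mug" then settles on `mug1`; a modular model would wait for the complete utterance before consulting the scene at all.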

Language Processing and Situational Experience

Focus of psycholinguistic research:

How information from situation awareness affects utterance comprehension

Interaction between LANGUAGE and VISION is mediated by CATEGORIES

The research revealed:

Anticipatory effect

Disambiguation by scene understanding

Temporal projection

Talking

Listening

Comprehending

Representing an utterance

Representing the Interpretation of an Utterance in Context

Comprehending an Utterance in Context

Picking Up the Right Interpretation

Speaking

Producing an Utterance in Context

Producing Speech

Representing an utterance

An utterance is represented as an ontologically richly sorted, relational structure: a logical form in a decidable fragment of modal logic.
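Such a logical form can be pictured as a graph of sorted nominals linked by named relations. A minimal sketch, assuming a hypothetical `Nominal` class and illustrative sorts and relation names (`Mood`, `Patient`, `Result`, `Modifier`, `Anchor`, `Delimitation`) rather than the chapter's exact grammar output; it encodes only the embedded clause "put the red mug to the right of the ball" and prints it in a hybrid-logic style notation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Nominal:
    ident: str                       # unique nominal, e.g. "mug1"
    sort: str                        # ontological sort (illustrative), e.g. "thing"
    prop: str                        # proposition at this nominal, e.g. "mug"
    features: Dict[str, str] = field(default_factory=dict)
    relations: Dict[str, Nominal] = field(default_factory=dict)

    def body(self) -> str:
        """Conjoin proposition, features and related sub-structures."""
        parts = [self.prop]
        parts += [f"<{k}>{v}" for k, v in self.features.items()]
        parts += [f"<{r}>({d.ident}:{d.sort} ^ {d.body()})" for r, d in self.relations.items()]
        return " ^ ".join(parts)

    def render(self) -> str:
        """Print the whole structure, rooted at this nominal."""
        return f"@{{{self.ident}:{self.sort}}}({self.body()})"


# Hypothetical encoding of "put the red mug to the right of the ball".
mug = Nominal("mug1", "thing", "mug", {"Delimitation": "unique"},
              {"Modifier": Nominal("red1", "q-color", "red")})
ball = Nominal("ball1", "thing", "ball", {"Delimitation": "unique"})
right = Nominal("right1", "m-location", "right", {}, {"Anchor": ball})
put = Nominal("put1", "action", "put", {"Mood": "imp"}, {"Patient": mug, "Result": right})

print(put.render())
```

Keeping relations as explicit, named edges is what lets later stages (reference resolution, dialogue move classification) query the structure directly.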

I want you to put the red mug to the right of the ball

Packing

[Figure: packed logical form for "Take the ball to the left of the box", labelling the packing node, internal relation, packing nominal, packing edge and packing node target]
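"Take the ball to the left of the box" has two readings: "to the left of the box" says either where to take the ball or which ball is meant. Packing keeps the structure shared by such alternatives only once. A minimal sketch of that idea, assuming each logical form is flattened into a set of (head, relation, dependent) triples with hypothetical relation names; it is a simplification and does not reproduce the packing nodes and packing edges of the chapter's representation.

```python
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]   # (head nominal, relation, dependent nominal)


def pack(readings: List[Set[Triple]]) -> Tuple[FrozenSet[Triple], List[FrozenSet[Triple]]]:
    """Store the triples shared by all readings once; keep only the differences per reading."""
    shared = frozenset(set.intersection(*readings))
    alternatives = [frozenset(r - shared) for r in readings]
    return shared, alternatives


def unpack(shared: FrozenSet[Triple], alternatives: List[FrozenSet[Triple]]) -> List[Set[Triple]]:
    """Recover every complete reading from the packed form."""
    return [set(shared) | set(alt) for alt in alternatives]


# Verb attachment: where to take the ball.
verb_attachment = {("take1", "Patient", "ball1"),
                   ("take1", "Dir:WhereTo", "left1"),
                   ("left1", "Anchor", "box1")}
# Noun attachment: which ball to take.
noun_attachment = {("take1", "Patient", "ball1"),
                   ("ball1", "Modifier", "left1"),
                   ("left1", "Anchor", "box1")}

shared, alternatives = pack([verb_attachment, noun_attachment])
print(len(shared), [sorted(a) for a in alternatives])
```

Both readings share two of their three triples, so each alternative carries a single triple; with more attachment ambiguities the savings over storing every reading separately grow quickly.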

[Figure: example of incremental parsing and packing of logical forms for "Here is the ball"]

Representing the Interpretation of an Utterance in Context

Co-reference relations: relations between mentions referring to the same objects or events, e.g. pronouns ('it') and anaphoric expressions ('the red mug')

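New referent identifier: $[\mathrm{NEW} : \{\mathit{ant}_n\}]$

Antecedent referent: $[\mathrm{OLD} : \{\mathit{ant}_i\}]$

$[\mathrm{OLD} : \mathit{ant}_i < \{\mathit{ant}_j, \ldots, \mathit{ant}_k\} < \mathrm{NEW} : \{\mathit{ant}_n\}]$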

Reference structure can specify preference orders over sets of old and new referents
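A reference structure of this kind can be read as ranked tiers: the most preferred old antecedent(s) first, then less preferred ones, and finally the option of introducing a new referent. A minimal sketch, assuming old referents already carry a numeric plausibility score (a hypothetical input; how the scores are obtained is not specified here) and that equal scores form one tier.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Tier = Tuple[str, FrozenSet[str]]   # ("OLD" or "NEW", referent identifiers)


def reference_structure(old_scores: Dict[str, float], new_id: str,
                        threshold: float = 0.0) -> List[Tier]:
    """Order antecedent candidates into preference tiers, most preferred first.

    Old referents scoring above `threshold` are grouped by equal score;
    a NEW referent is always appended as the least preferred option.
    """
    by_score: Dict[float, Set[str]] = {}
    for ref, score in old_scores.items():
        if score > threshold:
            by_score.setdefault(score, set()).add(ref)
    tiers: List[Tier] = [("OLD", frozenset(by_score[s]))
                         for s in sorted(by_score, reverse=True)]
    tiers.append(("NEW", frozenset({new_id})))
    return tiers


def render(tiers: List[Tier]) -> str:
    """Print the tiers in a bracket notation loosely modelled on the slide's."""
    return "[" + " < ".join(f"{kind} : {{{', '.join(sorted(refs))}}}" for kind, refs in tiers) + "]"


tiers = reference_structure({"ant1": 0.9, "ant2": 0.4, "ant3": 0.4, "ant5": 0.0}, "ant4")
print(render(tiers))
```

Here `ant5` falls below the threshold and is dropped, `ant2` and `ant3` tie, and the result prints as `[OLD : {ant1} < OLD : {ant2, ant3} < NEW : {ant4}]`.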

Decision tree for dialogue moves

A dialogue move ('speech act') specifies how an utterance contributes to furthering the dialogue.
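A decision tree maps features of the interpreted utterance to a dialogue move. A minimal hand-written sketch, assuming three hypothetical features (sentence mood, the move being responded to, and the polarity of a yes/no reply) and hypothetical move labels; the chapter's actual tree and feature set are not reproduced here.

```python
from typing import Optional


def dialogue_move(mood: str, prev_move: Optional[str] = None,
                  polarity: Optional[bool] = None) -> str:
    """Tiny decision tree from utterance features to a dialogue move.

    mood: "imp" (imperative), "int" (interrogative) or "decl" (declarative)
    prev_move: the move this utterance responds to, if any
    polarity: True for "yes"/"ok", False for "no", None if not a polar reply
    """
    if mood == "imp":
        return "command"          # "Put the red ball next to the cube"
    if mood == "int":
        return "question"         # "Where is the ball?"
    # Declaratives and fragments: the move depends on what they respond to.
    if prev_move == "command" and polarity is not None:
        return "accept" if polarity else "reject"
    if prev_move == "question":
        return "answer"
    return "assertion"            # "Here is the ball"


print(dialogue_move("imp"), dialogue_move("decl", "command", False))
```

A hand-written tree like this is easy to inspect; a learned tree would replace the fixed tests with splits chosen from annotated dialogues.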

Dialogue context model

Put the red ball next to the cube

Comprehending an Utterance in Context

Cross-modal salience model

Visual salience

Linguistic salience

Word recognition lattice

Example of an incremental analysis

Utterance interpretation at grammatical level
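The cross-modal salience model combines what is visually prominent with what was recently talked about, and uses the result to favour words that refer to salient entities. A minimal sketch, assuming linear interpolation with a hypothetical weight `lam`, a hypothetical word-to-entity lexicon, and, for simplicity, rescoring of an n-best list of recognition hypotheses rather than changing the recognizer's language model or lattice directly.

```python
from typing import Dict, List, Sequence, Tuple

Hypothesis = Tuple[List[str], float]   # (word sequence, recognizer score)


def cross_modal_salience(visual: Dict[str, float], linguistic: Dict[str, float],
                         lam: float = 0.5) -> Dict[str, float]:
    """Interpolate visual and linguistic salience per entity (lam weights vision)."""
    entities = set(visual) | set(linguistic)
    return {e: lam * visual.get(e, 0.0) + (1 - lam) * linguistic.get(e, 0.0)
            for e in entities}


def rescore(hypotheses: Sequence[Hypothesis], salience: Dict[str, float],
            lexicon: Dict[str, List[str]], alpha: float = 1.0) -> Hypothesis:
    """Pick the hypothesis with the best recognizer score plus salience bonus."""
    def bonus(words: List[str]) -> float:
        return sum(salience.get(e, 0.0) for w in words for e in lexicon.get(w, []))
    return max(hypotheses, key=lambda h: h[1] + alpha * bonus(h[0]))


salience = cross_modal_salience(visual={"mug1": 0.8, "ball1": 0.2},
                                linguistic={"ball1": 0.6})
lexicon = {"mug": ["mug1"], "ball": ["ball1"]}   # word -> entities it can denote
n_best = [("take the bug".split(), 0.55), ("take the mug".split(), 0.50)]
print(rescore(n_best, salience, lexicon))
```

"bug" has no salient referent, so "take the mug" wins (0.50 + 0.4 against 0.55) even though the recognizer slightly preferred "take the bug".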

Picking up the right interpretation

The parse selection system, based on a statistical linear model, explores a set of relevant acoustic, syntactic, semantic and contextual features of the parses and computes a likelihood score for each of them.

Parse selection is a function $F : X \to Y$, where $X$ is the set of possible input utterances and $Y$ is the set of parses. We also assume:

1. A function $\mathrm{GEN}(x)$ which enumerates all possible parses for an input $x$.
2. A $d$-dimensional feature vector $f(x, y) \in \mathbb{R}^d$, representing specific features of the pair $(x, y)$.
3. A parameter vector $w \in \mathbb{R}^d$.

The selected parse is then

$$F(x) = \arg\max_{y \in \mathrm{GEN}(x)} w^T \cdot f(x, y)$$

where $w^T \cdot f(x, y)$ is the inner product, which can be seen as a measure of the 'quality' of the parse.
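The linear model picks, among the parses enumerated by GEN(x), the one maximising the inner product of the weight vector w with the feature vector f(x, y). A minimal sketch, assuming a hypothetical three-feature vector (one acoustic, one syntactic, one contextual feature), hand-set weights instead of learned ones, and a dictionary standing in for GEN.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Parse:
    text: str
    acoustic: float    # recognizer confidence for the word string
    full_parse: bool   # did the grammar yield a complete analysis?
    grounded: bool     # do its referents resolve in the current context?


def features(x: str, y: Parse) -> List[float]:
    """Hypothetical feature vector f(x, y): acoustic, syntactic and contextual."""
    return [y.acoustic, float(y.full_parse), float(y.grounded)]


def select_parse(x: str, gen: Callable[[str], List[Parse]],
                 f: Callable[[str, Parse], List[float]], w: Sequence[float]) -> Parse:
    """F(x) = argmax over y in GEN(x) of the inner product w . f(x, y)."""
    def score(y: Parse) -> float:
        return sum(wi * fi for wi, fi in zip(w, f(x, y)))
    return max(gen(x), key=score)


candidates = {
    "utt1": [Parse("put the ball in the box", 0.6, True, True),
             Parse("put the bowl in the box", 0.7, True, False),
             Parse("put the ball in the", 0.9, False, False)],
}
w = [1.0, 0.5, 2.0]   # hypothetical weights; in practice learned from training data
best = select_parse("utt1", lambda x: candidates[x], features, w)
print(best.text)
```

With these weights the grounded, complete parse wins (score 3.1) over the acoustically stronger fragment (0.9); setting the syntactic and contextual weights to zero would fall back to the recognizer's preference.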

Producing an utterance in context

http://mary.dfki.de:59125/

Production of an utterance is triggered by a communicative goal.

A communicative goal specifies a dialogue move and the content which is to be communicated.

The utterance realizer uses the same grammar as the parser.

The MARY speech synthesis engine then produces audio output.

References are generated using the incremental algorithm of Dale and Reiter.

The algorithm is initialized with the intended referent, a contrast set and a list of preferred attributes. It incrementally tries to rule out members of the contrast set for which a given property of the intended referent does not hold.
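The incremental algorithm described above can be sketched as follows. This is a simplified version, assuming a hypothetical attribute-value knowledge base and attribute names; it omits refinements of Dale and Reiter's original such as choosing basic-level values and checking what the hearer knows, but keeps its rule that the type (head noun) is always included.

```python
from typing import Dict, Iterable, List, Optional, Tuple

KB = Dict[str, Dict[str, str]]   # object id -> attribute -> value (hypothetical)


def incremental_algorithm(referent: str, contrast_set: Iterable[str],
                          preferred_attributes: List[str],
                          kb: KB) -> Optional[List[Tuple[str, str]]]:
    """Return (attribute, value) pairs distinguishing `referent` from the contrast set,
    or None if no distinguishing description exists."""
    distractors = set(contrast_set) - {referent}
    description: List[Tuple[str, str]] = []
    for attr in preferred_attributes:
        value = kb[referent].get(attr)
        if value is None:
            continue
        # Distractors for which this property of the referent does not hold.
        ruled_out = {d for d in distractors if kb[d].get(attr) != value}
        if ruled_out:
            description.append((attr, value))
            distractors -= ruled_out
        if not distractors:
            break
    if distractors:
        return None
    # The type (head noun) is always included, even if it rules nothing out.
    if "type" in kb[referent] and all(a != "type" for a, _ in description):
        description.insert(0, ("type", kb[referent]["type"]))
    return description


kb = {
    "mug1": {"type": "mug", "colour": "red", "size": "large"},
    "mug2": {"type": "mug", "colour": "blue", "size": "large"},
    "ball1": {"type": "ball", "colour": "red", "size": "small"},
}
print(incremental_algorithm("mug1", {"mug2", "ball1"}, ["type", "colour", "size"], kb))
```

For `mug1` the type rules out the ball and the colour rules out the blue mug, giving `[('type', 'mug'), ('colour', 'red')]`, i.e. "the red mug"; size is never consulted because no distractors remain.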


Thank you for your attention
