The Architecture Dream Team

Schloss Dagstuhl, Germany

October 2001

Page 2

Would you build your dream house without a blueprint?

Page 3

What you hope to get

Page 4

… what you might get

Page 5

Today’s Conventional Architecture

- User(s)
- Presentation
- Dialog Control
- Application Interface
- Information, Applications, People

Page 6

CHAMELEON Platform (IntelliMedia WorkBench), Paul McKevitt

Speech synthesizer

Speech recognizer

Laser pointer

Blackboard

NL parser

Microphone array

Domain model

Gesture recognizer

Dialogue manager

Frame semantics

Topsy
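
The component list above includes a blackboard, which suggests that modules such as the speech recognizer, gesture recognizer, and dialogue manager exchange results through a shared store instead of calling each other directly. The following is a minimal sketch of that pattern only. The `Blackboard` and `Frame` classes, the frame fields, and the sample utterance are illustrative assumptions and do not reproduce CHAMELEON's actual interfaces.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """A message posted to the blackboard (fields are hypothetical)."""
    sender: str
    kind: str
    content: dict = field(default_factory=dict)


class Blackboard:
    """Shared store: modules post frames and subscribe to frame kinds."""

    def __init__(self) -> None:
        self.history: List[Frame] = []
        self._subscribers: Dict[str, List[Callable[[Frame], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Frame], None]) -> None:
        self._subscribers[kind].append(handler)

    def post(self, frame: Frame) -> None:
        # Record the frame, then notify every module interested in its kind.
        self.history.append(frame)
        for handler in list(self._subscribers[frame.kind]):
            handler(frame)


def run_demo() -> List[str]:
    board = Blackboard()
    pending: Dict[str, dict] = {}

    def dialogue_manager(frame: Frame) -> None:
        # Wait until both a spoken and a pointing input have arrived.
        pending[frame.kind] = frame.content
        if "speech" in pending and "gesture" in pending:
            reply = f"{pending['speech']['text']} -> {pending['gesture']['target']}"
            board.post(Frame("dialogue manager", "output", {"say": reply}))
            pending.clear()

    board.subscribe("speech", dialogue_manager)
    board.subscribe("gesture", dialogue_manager)

    # Made-up inputs standing in for recognizer results.
    board.post(Frame("speech recognizer", "speech", {"text": "whose office is this"}))
    board.post(Frame("gesture recognizer", "gesture", {"target": "room_1"}))
    return [f"{f.sender}:{f.kind}" for f in board.history]
```

Calling `run_demo()` returns the posting order recorded on the blackboard. The recognizers never reference the dialogue manager, so a module can be added or replaced by changing subscriptions alone.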

Page 7

Microsoft, Derek Jacoby

- Domain Experts
- Semantic Engine
- Rendering Engine
- Recognition Engine
- Semantic Objects
- Multimodal Inputs
- Multimodal Outputs
- Discourse Semantics
- Dialog Manager
- Signal Processing
- UI Logic
- CSR
- SLU
- 2.5-4.8 kbps, optimized for CSR
- Application
- SAPI 5.0

MIPAD Architecture

A Typical DrWho App

Page 8

Harry Bunt

- Context
- Input Interpretation
- Output Synthesis
- Context Management
- Dialogue Management
- API
- Application
- Pending Context
- linguistic, semantic, physical, perceptual, cognitive, social

Page 9

Art Exploration, Oliviero Stock

- explicit input (e.g., pointing)
- input analyzer
- composer engine
- implicit input (e.g., movement)
- presentation
- Physical space model
- Hypermedia information
- visitor models
- interaction history
- Audio message to headphone; links and image to UI

Page 10

COLLAGEN Sidner et al.

Page 11

IBM’s Responsive Information Architect (RIA), Michelle Zhou

- speech
- gesture
- Multimodal Interpreter
- Conversational Facilitator
- Presentation Broker
- Media Producer
- Visual Designer
- Language Designer
- Models of: Design, Domain, User, Conversation, Environment
- user
- IRIS
- Info Server

Page 12

Interact, Kristiina Jokinen

- Input Manager
- Presentation Manager
- Dialogue Manager
- Task Agents/Acts
- Information Storage
- Database
- Dialogue Agents/Acts (e.g., Q, A, State)
- ASR
- Language Understanding
- Topic Recognition
- TTS
- Generator Agents

Page 13

EMBASSI Conceptual Architecture

Z-Axis:

- Underlying HW-Infrastructure
- Software-Infrastructure (Agent / Distr. Comp. Middleware)
- Functional building blocks of conceptual architecture (Multimodal Assistant Componentware, MAC)
- Application-level Assistants (not shown)

XY-Plane of MAC

- Dialogic Assistance
- Effectual Assistance
- Situational Assistance
- Explicit and implied generic (= application independent) ontologies, defining component interfaces

Page 15

SMARTKOM Wolfgang Wahlster

Page 16

Page 17

DARPA Galaxy Communicator

- Language Generation
- Text-to-Speech Conversion
- Audio Server
- Dialogue Management
- Application Backend
- Context Tracking
- Frame Construction
- Speech Recognition
- Hub

The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems

Open source and documentation available at fofoca.mitre.org and sourceforge.net/projects/communicator
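
The hub-and-spoke idea described above can be illustrated with a toy, in-process router. This is a sketch under loud assumptions: it is not the GCSI API, the `Hub` class and its `register`/`dispatch` methods are hypothetical, frames are plain dictionaries, and the three stand-in servers only mimic the roles named on the slide with made-up logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a "frame" is modeled as a plain dict for illustration.
Frame = dict
Server = Callable[[Frame], Frame]


class Hub:
    """Central router: servers never call each other, only the hub does."""

    def __init__(self) -> None:
        self._servers: Dict[str, Server] = {}
        self._routes: Dict[str, Optional[str]] = {}
        self.log: List[Tuple[str, Frame]] = []

    def register(self, name: str, server: Server, then: Optional[str] = None) -> None:
        # `then` names the server that should receive this server's result.
        self._servers[name] = server
        self._routes[name] = then

    def dispatch(self, first: str, frame: Frame) -> Frame:
        name: Optional[str] = first
        while name is not None:
            # Each server receives a copy and returns an updated frame.
            frame = self._servers[name](dict(frame))
            self.log.append((name, frame))
            name = self._routes[name]
        return frame


# Hypothetical stand-ins for three of the spokes on the slide.
def speech_recognition(frame: Frame) -> Frame:
    frame["text"] = frame.pop("audio").lower()
    return frame


def frame_construction(frame: Frame) -> Frame:
    words = frame["text"].split()
    frame["parse"] = {"intent": "flight_query", "destination": words[-1]}
    return frame


def dialogue_management(frame: Frame) -> Frame:
    frame["reply"] = f"Looking up flights to {frame['parse']['destination']}."
    return frame


def build_hub() -> Hub:
    hub = Hub()
    hub.register("speech_recognition", speech_recognition, then="frame_construction")
    hub.register("frame_construction", frame_construction, then="dialogue_management")
    hub.register("dialogue_management", dialogue_management)
    return hub
```

With `hub = build_hub()`, a call such as `hub.dispatch("speech_recognition", {"audio": "FLIGHTS TO BOSTON"})` passes one frame through the three servers in turn. Because routing lives in the hub, changing the processing order means editing the registrations, not the servers.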

Page 21

Definitions

Abstract Architecture

- Components, connections (protocols), and constraints (IEEE definition)
- Data/knowledge structures, data flow and protocols, control flow
- Consider use cases, e.g., in-car navigation system; desktop, kiosk, mobile device interaction; media conversion
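
The definition above (components, connections with protocols, and constraints) maps naturally onto a small data structure. The sketch below is one possible encoding, not a standard one: the class names, the two example constraints, and the protocol labels `"events"` and `"calls"` are assumptions made for illustration. The component names are taken from the conventional architecture shown earlier in the deck.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    protocol: str  # hypothetical label for how the two components talk


@dataclass
class Architecture:
    components: Set[str]
    connections: List[Connection]
    constraints: List[Callable[["Architecture"], List[str]]]

    def violations(self) -> List[str]:
        """Run every constraint and collect the problems it reports."""
        problems: List[str] = []
        for check in self.constraints:
            problems.extend(check(self))
        return problems


def endpoints_exist(arch: Architecture) -> List[str]:
    # Constraint: a connection may only link declared components.
    return [
        f"unknown component in {c.source} -> {c.target}"
        for c in arch.connections
        if c.source not in arch.components or c.target not in arch.components
    ]


def no_isolated_components(arch: Architecture) -> List[str]:
    # Constraint: every component takes part in at least one connection.
    linked = {c.source for c in arch.connections} | {c.target for c in arch.connections}
    return [f"isolated component: {name}" for name in sorted(arch.components - linked)]


def conventional() -> Architecture:
    return Architecture(
        components={"Presentation", "Dialog Control", "Application Interface"},
        connections=[
            Connection("Presentation", "Dialog Control", "events"),
            Connection("Dialog Control", "Application Interface", "calls"),
        ],
        constraints=[endpoints_exist, no_isolated_components],
    )
```

`conventional().violations()` reports no problems. Adding a component without wiring it in, for example `"Media Fusion"`, makes `no_isolated_components` flag it, which is the kind of check a use case walkthrough would otherwise do by hand.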

Page 22

Requirements

Functional

- Modality integration (input and output)
- Situation (user, task, application) appropriate real-time sensing/response (e.g., supporting barge-in, perceptual sensing/feedback)
- Representation of level of granularity (modules and data structures)
- Manage feedback: local and global, when/where?
- Support incremental processing
- Support incremental development (and scalability)

System/Technical

- Support for processing/fusing multimodal input (e.g., parallel processing)
- Modular, composable (possibly distributed processing)
- Efficient implementation
- Time scale, temporal and spatial resolution
- Accessible (even partial) data structures
- Open and extensible protocols

Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)

- User(s)
- Information, Applications, People
- Media Fusion
- Interaction Management
- Intention Recognition
- Discourse Modeling
- User Modeling
- Presentation Design
- Representation and Inference: User Model, Discourse Model, Domain Model, Task Model, Media Models
- Media Analysis
- Media/Mode Analysis: Language, Graphics, Gesture, Biometrics
- Design
- Media/Mode Design: Language, Graphics, Gesture, Animated Presentation Agent
- Media Input Processing
- Media Output Rendering

- Presentation
- Dialog Control
- Application Interface
- Information, Applications, People

Architecture

- User(s)
- User Modeling
- Discourse Management
- Intention Recognition
- Interaction Management
- Media/Mode Analysis: Language, Graphics, Gesture, Sound
- Media Input Processing
- Media Output Rendering
- Context Management
- Lexicon Management
- User ID
- Biometrics
- Application Interface
- Integrate, Respond, Request, Terminate, Initiate
- T, A, V, G, G
- Mode Coordination
- Presentation Design
- Multimodal Reference Resolution
- Multimodal Fusion
- A, A, V, G, G
- Media/Mode Design: Language, Graphics, Gesture, Sound
- Animated Presentation Agent
- Select Content, Design, Allocate, Coordinate, Layout
- User Model, Discourse Model, Domain Model, Media Models, Task Model
- Representation and Inference, States and Histories
- Application Models
- Context Model
- Reference Resolution
- Action Planning


Page 27

Media Fusion

- Media Analysis
- Media/Mode Analysis: Spoken Language, Lip Reading, Gesture
- Media Fusion
- S, V, V
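
The slide shows several analyzed input streams (spoken language, lip reading, gesture) feeding a single fusion step but does not say how fusion is done. One common and simple option, assumed here purely for illustration, is to group events from different modes when they are close together in time. The `Event` type, the `fuse` function, and the half-second default window are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    mode: str     # e.g. "speech", "lip reading", "gesture"
    start: float  # seconds
    end: float    # seconds
    value: str


def fuse(events: List[Event], max_gap: float = 0.5) -> List[Tuple[Event, ...]]:
    """Group events whose time spans overlap or lie within max_gap seconds.

    Events are visited in order of start time. An event joins the current
    group if it starts no later than max_gap after the latest end time seen
    in that group; otherwise it opens a new group.
    """
    groups: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.start):
        if groups and event.start - max(e.end for e in groups[-1]) <= max_gap:
            groups[-1].append(event)
        else:
            groups.append([event])
    return [tuple(group) for group in groups]
```

For example, an utterance spanning 0.0 to 1.2 seconds and a pointing gesture at 0.4 to 0.6 seconds fall into one group, while a second gesture at 3.0 seconds starts a new one. A real system would add semantic compatibility checks on top of this timing test.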

Page 28

COLLAGEN Sidner et al.

- Speech interpretation
- Planning and discourse Agent
- Application
- USER
- Speech
- Window events
- Student Model
- Mel
- ViaVoice