The Architecture Dream Team, Schloss Dagstuhl, Germany, October 2001
Posted on 16-Dec-2015
Page 5
Today's Conventional Architecture
Components: User(s); Presentation; Dialog Control; Application Interface; Information, Applications, People
Page 6
CHAMELEON Platform (IntelliMedia Workbench), Paul McKevitt
Components:
- Speech synthesizer
- Speech recognizer
- Laser pointer
- Blackboard
- NL parser
- Microphone array
- Domain model
- Gesture recognizer
- Dialogue manager
- Frame semantics
- Topsy
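The component list includes a blackboard. As an illustration only, here is a minimal Python sketch of blackboard-style coordination, assuming frames are plain dictionaries with a "type" key and that modules subscribe to frame types. The class, the two module functions, and the frame fields are hypothetical and do not reproduce CHAMELEON's actual implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Frame = Dict[str, str]
Module = Callable[["Blackboard", Frame], None]


class Blackboard:
    """Shared store: modules never call each other, they only read and post frames."""

    def __init__(self) -> None:
        self.frames: List[Frame] = []          # everything ever posted, in order
        self._pending: Deque[Frame] = deque()  # frames not yet shown to subscribers
        self._subscribers: Dict[str, List[Module]] = {}

    def subscribe(self, frame_type: str, module: Module) -> None:
        self._subscribers.setdefault(frame_type, []).append(module)

    def post(self, frame: Frame) -> None:
        self._pending.append(frame)

    def run(self) -> None:
        # Deliver frames in arrival order until no module posts anything new.
        while self._pending:
            frame = self._pending.popleft()
            self.frames.append(frame)
            for module in self._subscribers.get(frame["type"], []):
                module(self, frame)


# Hypothetical modules standing in for the NL parser and the dialogue manager.
def nl_parser(board: Blackboard, frame: Frame) -> None:
    board.post({"type": "semantics", "intent": "query-office", "text": frame["text"]})


def dialogue_manager(board: Blackboard, frame: Frame) -> None:
    board.post({"type": "reply", "text": "handling " + frame["intent"]})


board = Blackboard()
board.subscribe("utterance", nl_parser)
board.subscribe("semantics", dialogue_manager)
board.post({"type": "utterance", "text": "whose office is this"})
board.run()
print([f["type"] for f in board.frames])  # ['utterance', 'semantics', 'reply']
```

Because modules only see the blackboard, a new module (say, a gesture recognizer) can be added by subscribing it to a frame type without touching any existing module.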
Page 7
Microsoft, Derek Jacoby
Diagrams: MiPad Architecture; A Typical DrWho App
Components shown:
- Domain Experts, Semantic Engine, Rendering Engine, Recognition Engine
- Semantic Objects, Multimodal Inputs, Multimodal Outputs, Discourse Semantics
- Dialog Manager, UI Logic, Application
- Signal Processing; CSR (continuous speech recognition) and SLU (spoken language understanding); 2.5-4.8 kbps, optimized for CSR
- SAPI 5.0
Page 8
Harry Bunt
Components: Context; Pending Context; Input Interpretation; Output Synthesis; Context Management; Dialogue Management; API; Application
Context types: linguistic, semantic, physical, perceptual, cognitive, social
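To make the six context types concrete, here is a minimal Python sketch of a context store that a context-management component might maintain. It assumes one key/value store per context type and reads "Pending Context" as a queue of items raised but not yet handled; the class, its fields, and the example keys are hypothetical, not Bunt's formal model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# The six context types named on the slide.
DIMENSIONS = ("linguistic", "semantic", "physical", "perceptual", "cognitive", "social")


@dataclass
class Context:
    # One key/value store per context type.
    facts: Dict[str, Dict[str, Any]] = field(
        default_factory=lambda: {d: {} for d in DIMENSIONS}
    )
    # Hypothetical reading of "Pending Context": items raised but not yet dealt with.
    pending: List[str] = field(default_factory=list)

    def update(self, dimension: str, key: str, value: Any) -> None:
        if dimension not in self.facts:
            raise KeyError(f"unknown context type: {dimension}")
        self.facts[dimension][key] = value

    def next_pending(self) -> Optional[str]:
        # Oldest open item first; None when nothing is pending.
        return self.pending.pop(0) if self.pending else None


ctx = Context()
ctx.update("linguistic", "last_utterance", "Where is the exit?")
ctx.update("physical", "user_location", "hall B")
ctx.pending.append("answer: location of exit")
print(ctx.next_pending())  # answer: location of exit
print(ctx.next_pending())  # None
```

Rejecting unknown context types keeps input interpretation and output synthesis from silently writing to a dimension nobody reads.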
Page 9
Art Exploration, Oliviero Stock
Components:
- Explicit input (e.g., pointing) and implicit input (e.g., movement)
- Input analyzer
- Composer engine
- Presentation
- Physical space model
- Hypermedia information
- Visitor models
- Interaction history
Output: audio message to headphone; links and image to UI
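The slide pairs an input analyzer, which combines explicit and implicit input, with a composer engine that draws on a physical space model and the interaction history. Here is a minimal Python sketch of that split, assuming artworks have fixed 2-D positions, that explicit pointing overrides position, and that the composer simply skips messages already heard. The artwork names, message ids, and both functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical physical space model: artwork id -> (x, y) position in metres.
SPACE: Dict[str, Tuple[float, float]] = {"fresco": (0.0, 0.0), "altarpiece": (10.0, 0.0)}
# Hypothetical hypermedia store: artwork id -> audio messages, most general first.
MESSAGES: Dict[str, List[str]] = {
    "fresco": ["fresco-intro", "fresco-detail"],
    "altarpiece": ["altarpiece-intro"],
}


@dataclass
class Visitor:
    history: Set[str] = field(default_factory=set)  # interaction history


def analyze(pointed_at: Optional[str], position: Tuple[float, float]) -> str:
    """Input analyzer: explicit pointing wins; otherwise infer the nearest artwork."""
    if pointed_at is not None:
        return pointed_at
    x, y = position
    return min(SPACE, key=lambda a: (SPACE[a][0] - x) ** 2 + (SPACE[a][1] - y) ** 2)


def compose(visitor: Visitor, artwork: str) -> Optional[str]:
    """Composer engine: the first message about `artwork` the visitor has not heard."""
    for message in MESSAGES[artwork]:
        if message not in visitor.history:
            visitor.history.add(message)
            return message
    return None  # nothing new to say about this artwork


v = Visitor()
print(compose(v, analyze(None, (1.0, 0.5))))          # fresco-intro (implicit: nearby)
print(compose(v, analyze(None, (1.0, 0.5))))          # fresco-detail
print(compose(v, analyze("altarpiece", (1.0, 0.5))))  # altarpiece-intro (explicit)
```

A real visitor model would also weight content by interests and dwell time; this sketch only shows how the interaction history keeps the presentation from repeating itself.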
Page 11
IBM's Responsive Information Architect (RIA), Michelle Zhou
Inputs: speech, gesture
Components:
- Multimodal Interpreter
- Conversational Facilitator
- Presentation Broker
- Media Producer
- Visual Designer
- Language Designer
- Models of: design, domain, user, conversation, environment
Also shown: user; IRIS Info Server
Page 12
Interact, Kristiina Jokinen
Components:
- Input Manager, Presentation Manager, Dialogue Manager
- Task Agents/Acts
- Dialogue Agents/Acts (e.g., Q, A, State)
- Information Storage, Database
- ASR, Language Understanding, Topic Recognition
- TTS, Generator Agents
Page 13
EMBASSI Conceptual Architecture
Z-axis:
- Underlying hardware infrastructure
- Software infrastructure (agent/distributed computing middleware)
- Functional building blocks of the conceptual architecture (Multimodal Assistant Componentware, MAC)
- Application-level assistants (not shown)
XY-plane of MAC:
- Dialogic assistance
- Effectual assistance
- Situational assistance
- Explicit and implied generic (= application-independent) ontologies, defining component interfaces
Page 17
DARPA Galaxy Communicator
Servers around a central Hub: Language Generation; Text-to-Speech Conversion; Audio Server; Dialogue Management; Application Backend; Context Tracking; Frame Construction; Speech Recognition
The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems.
Open-source code and documentation are available at fofoca.mitre.org and sourceforge.net/projects/communicator.
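The hub-and-spoke idea can be illustrated with a minimal Python sketch: servers never call one another, and the hub decides where each message goes using routing rules it holds. This is loosely in the spirit of GCSI hub programs, but the `Hub` class, the message names, and the toy servers are all hypothetical; this is not the GCSI API.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Server = Callable[[Message], Optional[Message]]


class Hub:
    """Hub-and-spoke router: servers never call each other, only the hub."""

    def __init__(self, program: Dict[str, str]) -> None:
        # Routing rules: incoming message name -> server that should handle it.
        self.program = program
        self.servers: Dict[str, Server] = {}
        self.trace: List[str] = []  # order in which servers were invoked

    def register(self, name: str, server: Server) -> None:
        self.servers[name] = server

    def dispatch(self, msg: Optional[Message]) -> Optional[Message]:
        # Keep routing until a server returns nothing or a message has no rule.
        while msg is not None and msg["name"] in self.program:
            target = self.program[msg["name"]]
            self.trace.append(target)
            msg = self.servers[target](msg)
        return msg


# Hypothetical toy servers standing in for recognition, parsing, dialogue, and TTS.
hub = Hub({"audio": "recognizer", "words": "parser", "frame": "dialogue", "reply": "tts"})
hub.register("recognizer", lambda m: {"name": "words", "text": "flights to boston"})
hub.register("parser", lambda m: {"name": "frame", "destination": m["text"].split()[-1]})
hub.register("dialogue", lambda m: {"name": "reply",
                                    "text": f"Searching flights to {m['destination'].title()}."})
hub.register("tts", lambda m: {"name": "speech", "text": m["text"]})

result = hub.dispatch({"name": "audio", "data": "raw-audio"})
print(hub.trace)       # ['recognizer', 'parser', 'dialogue', 'tts']
print(result["text"])  # Searching flights to Boston.
```

Keeping the routing table in the hub means a server can be swapped, or a new step such as context tracking inserted, by editing the rules alone.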
Page 21
Definitions
Abstract architecture
- Components, connections (protocols), and constraints (IEEE definition)
- Data/knowledge structures, data flow and protocols, control flow
- Consider use cases, e.g.:
  - In-car navigation system
  - Desktop, kiosk, mobile device interaction
  - Media conversion
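The "components, connections, and constraints" definition can be made concrete with a minimal Python sketch. It assumes each component declares the data types it consumes and produces, and checks a single hypothetical constraint: every connection must carry a type its source produces and its target consumes. The component names and data types are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Component:
    name: str
    consumes: FrozenSet[str]  # data types the component accepts
    produces: FrozenSet[str]  # data types the component emits


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    data: str  # data type carried over this connection


def violations(components: List[Component], connections: List[Connection]) -> List[str]:
    """Check one constraint: every connection must carry a type its source
    produces and its target consumes."""
    by_name = {c.name: c for c in components}
    problems: List[str] = []
    for conn in connections:
        label = f"{conn.source}->{conn.target}"
        src, dst = by_name.get(conn.source), by_name.get(conn.target)
        if src is None or dst is None:
            problems.append(f"{label}: unknown component")
            continue
        if conn.data not in src.produces:
            problems.append(f"{label}: {src.name} does not produce {conn.data}")
        if conn.data not in dst.consumes:
            problems.append(f"{label}: {dst.name} does not consume {conn.data}")
    return problems


# Hypothetical components for a minimal spoken dialogue pipeline.
asr = Component("asr", frozenset({"audio"}), frozenset({"words"}))
parser = Component("parser", frozenset({"words"}), frozenset({"frame"}))
dialogue = Component("dialogue", frozenset({"frame"}), frozenset({"reply"}))
parts = [asr, parser, dialogue]

print(violations(parts, [Connection("asr", "parser", "words"),
                         Connection("parser", "dialogue", "frame")]))  # []
print(violations(parts, [Connection("asr", "dialogue", "words")]))
# ['asr->dialogue: dialogue does not consume words']
```

Other constraints from the list above, such as control flow or timing, would be further checks over the same component and connection records.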
Page 22
Requirements
Functional
- Modality integration (input and output)
- Real-time sensing and response appropriate to the situation (user, task, application), e.g., supporting barge-in and perceptual sensing/feedback
- Representation at appropriate levels of granularity (modules and data structures)
- Feedback management, local and global: when and where?
- Support for incremental processing
- Support for incremental development (and scalability)
System/Technical
- Support for processing/fusing multimodal input (e.g., parallel processing)
- Modular, composable (possibly distributed) processing
- Efficient implementation
- Time scale; temporal and spatial resolution
- Accessible (even partial) data structures
- Open and extensible protocols
Endpoints: User(s); Information, Applications, People
Components:
- Media Input Processing
- Media/Mode Analysis: Language, Graphics, Gesture, Biometrics
- Media Fusion
- Interaction Management: Intention Recognition, Discourse Modeling, User Modeling, Presentation Design
- Media/Mode Design: Language, Graphics, Gesture, Animated Presentation Agent
- Media Output Rendering
- Representation and Inference: User Model, Discourse Model, Domain Model, Task Model, Media Models
Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
Architecture
Layers: Presentation; Dialog Control; Application Interface; between User(s) and Information, Applications, People
Components:
- Media Input Processing
- Media/Mode Analysis: Language, Graphics, Gesture, Sound
- User ID, Biometrics
- Multimodal Fusion, Multimodal Reference Resolution, Mode Coordination
- Interaction Management, Intention Recognition, Discourse Management, User Modeling
- Context Management, Lexicon Management
- Application Interface: Integrate, Respond, Request, Terminate, Initiate
- Presentation Design
- Select Content, Design, Allocate, Coordinate, Layout
- Media/Mode Design: Language, Graphics, Gesture, Sound; Animated Presentation Agent
- Media Output Rendering
- Representation and Inference, States and Histories: User Model, Discourse Model, Domain Model, Task Model, Media Models, Application Models, Context Model
- Reference Resolution, Action Planning
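Multimodal reference resolution, one of the components listed above, is often explained with a deictic word such as "this" being resolved by a pointing gesture made at about the same time. Here is a minimal Python sketch of that idea, assuming time-stamped words and gestures and a fixed matching window of one second; the deictic word list, the window, and all names are hypothetical, and real fusion components use richer temporal and semantic constraints.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of words that need a gesture to be resolved.
DEICTICS = {"this", "that", "here", "there"}


@dataclass
class Word:
    text: str
    t: float  # time stamp in seconds


@dataclass
class Pointing:
    target: str  # object the gesture recognizer says was pointed at
    t: float


def resolve(words: List[Word], gestures: List[Pointing], window: float = 1.0) -> List[str]:
    """Replace each deictic word with the closest pointing target within `window` seconds."""
    resolved = []
    for w in words:
        if w.text.lower() in DEICTICS:
            near = [g for g in gestures if abs(g.t - w.t) <= window]
            if near:
                best = min(near, key=lambda g: abs(g.t - w.t))
                resolved.append(best.target)
                continue
        resolved.append(w.text)  # no nearby gesture: leave the word unresolved
    return resolved


words = [Word("move", 0.0), Word("this", 0.4), Word("there", 1.5)]
gestures = [Pointing("lamp", 0.5), Pointing("table", 1.7)]
print(resolve(words, gestures))  # ['move', 'lamp', 'table']
```

This sketch lets one gesture resolve several words; a fuller version would consume each gesture once and fall back to the discourse model when no gesture is close enough.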
Page 27
Media Fusion
Components: Media/Mode Analysis of Spoken Language, Lip Reading, and Gesture; Media Fusion
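One common way to fuse spoken-language and lip-reading evidence is late fusion: each recognizer scores the same hypotheses, and the scores are combined with a weighted log-linear rule. Here is a minimal Python sketch, assuming both recognizers output strictly positive posteriors over a shared set of words and that a single audio weight is chosen by hand. The function name, the weight, and the example scores are hypothetical.

```python
import math
from typing import Dict


def late_fusion(audio: Dict[str, float], visual: Dict[str, float],
                w_audio: float = 0.7) -> Dict[str, float]:
    """Weighted log-linear fusion of two recognizers' posteriors over the same hypotheses.

    Assumes every score is strictly positive (math.log(0) raises).
    """
    w_visual = 1.0 - w_audio
    log_scores = {
        h: w_audio * math.log(audio[h]) + w_visual * math.log(visual[h])
        for h in audio.keys() & visual.keys()  # only hypotheses both streams scored
    }
    z = sum(math.exp(s) for s in log_scores.values())
    return {h: math.exp(s) / z for h, s in log_scores.items()}


# Hypothetical scores: noisy audio slightly prefers "tea", while the visible
# lip closure of /p/ makes lip reading prefer "pea".
audio = {"pea": 0.45, "tea": 0.55}
visual = {"pea": 0.8, "tea": 0.2}

fused = late_fusion(audio, visual, w_audio=0.5)
print(max(fused, key=fused.get))  # pea
```

Lowering `w_audio` when the acoustic signal is noisy shifts trust toward the lip-reading stream; with `w_audio=1.0` the result reduces to the audio recognizer's choice.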