Speech & Gesture Recognition Systems Andreas Farner Seminar Human Computer Interaction 18.06.09


Page 1: Speech & Gesture Recognition Systems - DFKIembots.dfki.de/doc/seminar_ss09/Farner_Multimodal.pdf · Multimodal interfaces require effective parsing and understanding General applicability

Speech & Gesture Recognition Systems

Andreas Farner

Seminar Human Computer Interaction, 18.06.09


Overview

1 Multimodal Interfaces

2 Put That There

3 Finite-state Multimodal Parsing and Understanding


Multimodal Interfaces

Systems that allow input and/or output to be conveyed over multiple different channels

Making the system produce a specific output based on the voice and gesture input of the user


Introduction Commands Technologies Summary

Put That There: Voice and Gesture at the Graphics Interface

Richard A. Bolt

1980


Overview

1 Introduction

2 Commands

3 Technologies

4 Summary


Introduction

Groundbreaking paper from 1980

An approach to combining voice input and gesture recognition

Practical use: arranging things

Setting: the Media Room of the Massachusetts Institute of Technology (MIT)


The Media Room

The MIT Media Room I


The MIT Media Room II

Combines the virtual “Dataland” and the physical space → one interactive space

Lets the user act naturally


Objects

Simple basic shapes: circles, squares, diamonds

Variable attributes: color, size (large, medium, small)

Virtual object does not represent the real shape of the object


Controls

Navigating through “Dataland”

No keyboard needed

Joysticks and touch-pads

Navigating in a helicopter-like manner

Moving the “you are here” marker with the right-hand joystick or the touch-pad on the TV screen

Zooming in and out with the left-hand joystick

Moving the marker is also possible by pointing


Put That There


Overview: Basic Commands

Create

Move

Make that smaller/bigger/like that

Delete

Naming


Create

“Create a blue circle here.”



Default size: medium

Color and shape must be given

The deictic word (“here” / “there”) is combined with the x,y pointing input
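The three rules above can be sketched in a few lines of Python. This is only an illustration, not Bolt's implementation: the names `Shape`, `create` and `pointer_xy` are hypothetical, and the pointing sample is assumed to arrive as an (x, y) pair.

```python
from dataclasses import dataclass

SIZES = ("small", "medium", "large")


@dataclass
class Shape:
    kind: str   # "circle", "square" or "diamond"
    color: str
    size: str
    x: float
    y: float


def create(kind, color, pointer_xy, size="medium"):
    """Build a shape from the spoken attributes and the pointed-at position."""
    if kind is None or color is None:
        # Color and shape have no default and must be spoken.
        raise ValueError("color and shape must both be given")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    x, y = pointer_xy  # the deictic word is replaced by the pointing sample
    return Shape(kind, color, size, x, y)


# "Create a blue circle here." while pointing at (120, 80)
blue_circle = create("circle", "blue", pointer_xy=(120.0, 80.0))
```

Size falls back to medium because it is the only attribute the slide gives a default for.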


Make That Smaller/Bigger/Like That

Smaller I

“Make the blue circle smaller.”


Smaller II

“Make that smaller.”


Smaller Result

“Make the blue circle smaller.”

“Make that smaller.”


Make That

“Make that a large red diamond.”



Delete

Everything

“Delete everything to the left of this.”



Naming

“Call that...the calendar”


“Call that” → record the x,y coordinates and switch to training mode

“...” → a pause is needed to switch between the modes

“the calendar” → associate the coordinates with “the calendar” and switch back to recognition mode
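The mode switching described above behaves like a two-state machine. The sketch below is an assumption-laden illustration: the class `NamingSession` and its method names are hypothetical, and the new name is stored as a plain string, whereas the original recognizer would learn a spoken reference pattern in training mode.

```python
class NamingSession:
    """Two modes: 'recognition' (normal commands) and 'training' (new name)."""

    def __init__(self):
        self.mode = "recognition"
        self.pending_xy = None
        self.names = {}  # name -> (x, y) of the item pointed at

    def hear_call_that(self, pointer_xy):
        # "Call that": remember where the user points, then wait for the name.
        self.pending_xy = pointer_xy
        self.mode = "training"

    def hear_name(self, name):
        # "the calendar": bind the name to the stored coordinates.
        if self.mode != "training":
            raise RuntimeError("no 'call that' is pending")
        self.names[name] = self.pending_xy
        self.pending_xy = None
        self.mode = "recognition"


session = NamingSession()
session.hear_call_that((40, 25))
mode_during_pause = session.mode
session.hear_name("the calendar")
```

The pause between the two utterances corresponds to the time spent in the training state before `hear_name` is called.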


Speech

Speech Recognizers

Two types of speech recognizers:

Discrete or isolated utterances: require a pause between words → unnatural

Connected speech: no pause necessary, up to 5 words → more natural


Speech Recognizer Used

Response time: 300 milliseconds

Output: display of the text on an alphanumeric visual display

Vocabulary: 120 words as word reference patterns


Space

Pointing

Based on measurements made of a nutating magnetic field

Two small plastic cubes, each containing three coils (one for each axis)

1 A transmitter cube (3.8 cm on edge)

2 A sensor cube (1.9 cm on edge) attached to a wristband worn by the user

Together, the coils form an antenna


Summary

More than a virtual desktop overview; possible applications:

Moving ships about a harbor map

Moving battalion formations

Facilities planning

Groundbreaking approach ⇒ first approach to combine speech and gesture input

May seem simple today, but keep in mind that it dates from 1980


Put That There is rather superficial

Another approach from 2000:

“Finite-state Multimodal Parsing and Understanding”

→ How is multimodal input parsed?


Introduction Finite-state Language Processing Finite-state Multimodal Grammars Applying Multimodal Transducers Conclusion

Finite-state Multimodal Parsing and Understanding

Michael Johnston & Srinivas Bangalore

2000


Contents

1 Introduction

2 Finite-state Language Processing

3 Finite-state Multimodal Grammars

4 Applying Multimodal Transducers

5 Conclusion


Multimodal Interfaces

Multimodal interfaces require effective parsing and understanding

General applicability required

Using one single finite-state device

An important concern in the ongoing migration of interaction from the desktop to wireless devices such as PDAs, next-generation phones, etc.


Finite-state Automaton (FSA)

Parsing, understanding, and integration of speech and pen input are performed by one device

Running on three tapes:

1 Speech input

2 Pen input

3 Combined interpretation


Example


Finite-state Transducers

A special kind of finite-state automaton (FSA)

Each transition consists of an input and an output symbol

Can be regarded as a two-tape FSA with an input tape and an output tape
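A minimal, deterministic transducer makes the two-tape view concrete. The class `FST`, its arc table, and the toy symbols below are illustrative assumptions, not taken from any particular finite-state toolkit; the empty string stands for ε.

```python
EPS = ""  # empty output symbol (epsilon)


class FST:
    """Deterministic finite-state transducer.

    arcs maps (state, input_symbol) -> (output_symbol, next_state).
    """

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = set(finals)
        self.arcs = arcs

    def transduce(self, symbols):
        """Read the input tape and return the output tape, or None if rejected."""
        state, output = self.start, []
        for symbol in symbols:
            if (state, symbol) not in self.arcs:
                return None
            out, state = self.arcs[(state, symbol)]
            if out != EPS:
                output.append(out)
        return output if state in self.finals else None


# Toy machine: a:x, then any number of b:ε, then c:z
fst = FST(
    start=0,
    finals=[2],
    arcs={(0, "a"): ("x", 1), (1, "b"): (EPS, 1), (1, "c"): ("z", 2)},
)
result = fst.transduce(["a", "b", "b", "c"])
```

Dropping the output symbols from every arc leaves an ordinary one-tape FSA over the input alphabet.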


Finite-state Models

Attractive mechanisms for language processing

Efficiently learnable from data

Generally effective for decoding

Allow straightforward integration of constraints from various levels of language processing

Enable tight integration of language processing with speech and gesture recognition


Frege’s Principle

“The meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them.”


Parsing Multiple Input Streams

Speech and pen input require three tapes: one for speech, one for pen input, and a third for their combined meaning

The finite-state device combines the content of multiple input streams into a single semantic representation

An interface with n modes requires n + 1 tapes

The first n tapes represent the input streams

Tape n + 1 is an output stream representing their composition


Multimodal Context-free Grammar

Non-terminals are atomic symbols

Each terminal contains three components W:G:M corresponding to the n + 1 tapes

1 W: Words

2 G: Gestures

3 M: Meaning

ε means that the component is empty in this terminal


Example Grammar

S → V NP ε:ε:])
NP → DET N
NP → DET N CONJ NP
CONJ → and:ε:,
V → email:ε:email([
V → page:ε:page([
DET → this:ε:ε
DET → that:ε:ε
N → person:Gp:person( ENTRY
N → organization:Go:org( ENTRY
N → department:Gd:dept( ENTRY
ENTRY → ε:e1:e1 ε:ε:)
ENTRY → ε:e2:e2 ε:ε:)
ENTRY → ε:e3:e3 ε:ε:)
ENTRY → ...
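To see how this grammar ties the three tapes together, the rules can be written down as data and run through a small backtracking parser. This is a sketch under stated assumptions: it interprets the context-free rules directly instead of compiling them to a finite-state machine as the authors do, the `ENTRY → ...` rule is cut off after e3, and `GRAMMAR` and `parse` are hypothetical names.

```python
# Terminals are (W, G, M) triples; "" stands for ε. Non-terminals are strings.
GRAMMAR = {
    "S": [["V", "NP", ("", "", "])")]],
    "NP": [["DET", "N"], ["DET", "N", "CONJ", "NP"]],
    "CONJ": [[("and", "", ",")]],
    "V": [[("email", "", "email([")], [("page", "", "page([")]],
    "DET": [[("this", "", "")], [("that", "", "")]],
    "N": [
        [("person", "Gp", "person("), "ENTRY"],
        [("organization", "Go", "org("), "ENTRY"],
        [("department", "Gd", "dept("), "ENTRY"],
    ],
    "ENTRY": [[("", e, e), ("", "", ")")] for e in ("e1", "e2", "e3")],
}


def parse(stack, words, gestures):
    """Yield the meaning tape of every derivation consuming both input tapes."""
    if not stack:
        if not words and not gestures:
            yield ""
        return
    head, rest = stack[0], stack[1:]
    if isinstance(head, tuple):  # terminal W:G:M
        w, g, m = head
        if w and (not words or words[0] != w):
            return
        if g and (not gestures or gestures[0] != g):
            return
        next_words = words[1:] if w else words
        next_gestures = gestures[1:] if g else gestures
        for tail in parse(rest, next_words, next_gestures):
            yield m + tail
    else:  # non-terminal: try each production in turn
        for production in GRAMMAR[head]:
            yield from parse(production + rest, words, gestures)


meanings = list(parse(["S"], "email this person".split(), ["Gp", "e1"]))
```

Each terminal consumes at most one word and one gesture symbol and contributes its M component, so concatenating the M components along a successful derivation gives the combined interpretation.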


Example Three-Tape FSA


Finite-state Meaning Representation

Capturing Meaning

Captures not only the structure of the language, but also its meaning

Meaning symbols are written on the third tape (n + 1)

The concatenated symbols yield the semantic representation of an utterance


Gesture Values

Every object that can be gestured on needs a unique identifier → for persons, something like an address-book entry

To avoid repeating ID arcs such as ε:objid123:objid123 for every object, the authors store these values in a finite set of buffers labeled e1, e2, e3, ...
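The buffer idea can be illustrated as a two-step process: bind each gestured-on ID to the next buffer, then substitute the buffer names on the meaning tape afterwards. The functions `bind_gestures` and `resolve` are hypothetical helpers, and the IDs are made-up placeholders in the style of the slide's `objid123`.

```python
import re


def bind_gestures(selected_ids):
    """Assign each gestured-on object ID to the next buffer e1, e2, ..."""
    return {f"e{i}": obj_id for i, obj_id in enumerate(selected_ids, start=1)}


def resolve(meaning, buffers):
    """Replace buffer names on the meaning tape with the stored IDs."""
    return re.sub(r"\be\d+\b", lambda m: buffers[m.group(0)], meaning)


buffers = bind_gestures(["objid123", "objid456"])
resolved = resolve("email([person(e1),org(e2)])", buffers)
```

With this indirection the machine only needs arcs for e1, e2, e3, ... no matter how many objects the application knows about.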


Capturing Meaning Example

“Email ...” ⇒ word input → output on semantic tape: email([


“Email this ...” ⇒ word input + pen input → output on semantic tape: email([ (e1)


“Email this person.” ⇒ word input → output on semantic tape: email([person(e1)])


Multimodal Finite-state Transducers

Problems

Finite-state language processing tools only support finite-state transducers (two tapes)

Speech recognizers don’t support the use of a three-tape FSA

⇒ The three-tape FSA has to be converted into an FST


From FSA to FST

Combining pen input and word input → one input component: (G × W)

Output component M remains the same

Resulting function: T: (G × W) → M
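The conversion amounts to pairing the gesture and word symbols of each arc into a single input symbol. The sketch below applies this to one linear path for “email this person”; the arc layout `(source, W, G, M, target)` and the function `to_fst_arcs` are assumptions made for illustration, with "" standing for ε.

```python
def to_fst_arcs(three_tape_arcs):
    """Fold the gesture and word tapes into one (G, W) input symbol per arc."""
    return [(src, (g, w), m, dst) for (src, w, g, m, dst) in three_tape_arcs]


# One path through a three-tape machine: (source, W, G, M, target)
three_tape_path = [
    (0, "email", "", "email([", 1),
    (1, "this", "", "", 2),
    (2, "person", "Gp", "person(", 3),
    (3, "", "e1", "e1", 4),
    (4, "", "", ")", 5),
    (5, "", "", "])", 6),
]

fst_path = to_fst_arcs(three_tape_path)
inputs = [inp for _, inp, _, _ in fst_path]       # symbols from G × W
meaning = "".join(out for _, _, out, _ in fst_path)  # symbols from M
```

Because every arc now has exactly one input and one output symbol, standard two-tape tools can process the result.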


Transducers

R: G → W

T: (G × W) → M


Applying Multimodal Transducers

1 Recognizing pen input first → process incoming pen gestures and construct a finite-state machine

2 Using the observed pen input to modify the language model for speech recognition
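The effect of step 2 can be approximated with a heavily simplified sketch. The authors compose finite-state machines; here the relation R is reduced to a plain lookup table and the language model to a list of candidate word strings. `R`, `GESTURE_FREE`, `allowed_words` and `rescore` are all hypothetical names.

```python
# Simplified stand-in for R: G -> W (gesture symbol -> word it licenses)
R = {"Gp": "person", "Go": "organization", "Gd": "department"}

# Words whose gesture component is ε in the grammar
GESTURE_FREE = {"email", "page", "this", "that", "and"}


def allowed_words(observed_gestures):
    """Vocabulary compatible with the gestures that were actually observed."""
    nouns = {R[g] for g in observed_gestures if g in R}
    return GESTURE_FREE | nouns


def rescore(hypotheses, observed_gestures):
    """Drop speech hypotheses containing words the gestures rule out."""
    vocab = allowed_words(observed_gestures)
    return [h for h in hypotheses if all(w in vocab for w in h.split())]


candidates = ["email this person", "email this organization"]
kept = rescore(candidates, observed_gestures=["Gp", "e1"])
```

A gesture on a person thus removes the competing “organization” reading before the speech recognizer has to choose between them.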


Gesture Finite-state Machine


Gesture Language Transducer

[Figure: gesture finite-state machine + R: G → W = gesture language transducer]


Output Tape of Gesture Language Transducer


Speech Recognizer


Gesture Speech FST


Final Transducer


Summary

First approach using a single finite-state device to parse and integrate spoken language and pen input

Speech and pen input recognition

Composes semantic representation from speech and pen input

Mutual compensation among input modes


Thank you very much for your attention!

Any questions?