
Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Keith Vertanen

July 30th, 2004



The problem

Speech recognizers make mistakes
Correcting mistakes is inefficient
  140 WPM: uncorrected dictation
  14 WPM: corrected dictation, mouse/keyboard
  32 WPM: corrected typing, mouse/keyboard

Voice-only correction is even slower and more frustrating


Research overview

Make correction of dictation:
  More efficient
  More fun
  More accessible

Approach:
  Build a word lattice from a recognizer’s n-best list
  Expand lattice to cover likely recognition errors
  Make a language model from expanded lattice
  Use model in a continuous gesture interface to perform confirmation and correction


Building lattice

Example n-best list:
  1: jack studied very hard
  2: jack studied hard
  3: jill studied hard
  4: jill studied very hard
  5: jill studied little
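As a rough illustration of how an n-best list can be merged into a word graph, here is a minimal Python sketch. It is not the construction used in the talk: it assumes one node per distinct word and no repeated words within a hypothesis, and the names `build_word_graph` and `all_paths` are hypothetical.

```python
from collections import defaultdict

START, END = "<s>", "</s>"

def build_word_graph(n_best):
    """Merge hypotheses into a graph: one node per distinct word (toy assumption)."""
    edges = defaultdict(set)
    for hypothesis in n_best:
        words = [START] + hypothesis.split() + [END]
        for left, right in zip(words, words[1:]):
            edges[left].add(right)
    return edges

def all_paths(edges, node=START):
    """Enumerate every word sequence from the current node to END."""
    if node == END:
        return [[]]
    paths = []
    for nxt in sorted(edges[node]):
        for rest in all_paths(edges, nxt):
            paths.append(([] if nxt == END else [nxt]) + rest)
    return paths

n_best = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]
graph = build_word_graph(n_best)
sentences = [" ".join(path) for path in all_paths(graph)]
```

Merging shared words lets the graph cover sentences that were not in the list, such as "jack studied little", alongside the five original hypotheses.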


Insertion errors


Acoustic confusions

Given a word, find words that sound similar
Look pronunciation up in dictionary:
  studied → s t ah d iy d
Use observed phone confusions to generate alternative pronunciations:
  s t ah d iy d → s t ah d iy d
                  s ao d iy
                  s t ah d iy
                  …
Map pronunciation back to words:
  s t ah d iy d → studied
  s ao d iy → saudi
  s t ah d iy → study
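The three steps above can be sketched in Python as follows. This is a toy under stated assumptions: the three-word dictionary comes from the slide, but the confusion table (`t` and `d` may be dropped, `ah` may become `ao`) and the `max_edits` limit are made up for illustration, and the function names are hypothetical.

```python
from itertools import product

# Toy pronunciation dictionary, taken from the slide's example.
PRONUNCIATIONS = {
    "studied": ("s", "t", "ah", "d", "iy", "d"),
    "saudi": ("s", "ao", "d", "iy"),
    "study": ("s", "t", "ah", "d", "iy"),
}

# Assumed phone confusions: None means the phone may be deleted.
CONFUSIONS = {"t": [None], "ah": ["ao"], "d": [None]}

def alternative_pronunciations(phones, max_edits=3):
    """Apply up to max_edits confusions to a phone sequence."""
    options = [[p] + CONFUSIONS.get(p, []) for p in phones]
    variants = set()
    for choice in product(*options):
        edits = sum(1 for old, new in zip(phones, choice) if new != old)
        if edits <= max_edits:
            variants.add(tuple(p for p in choice if p is not None))
    return variants

def similar_words(word, max_edits=3):
    """Look up the word, perturb its pronunciation, map variants back to words."""
    by_pronunciation = {pron: w for w, pron in PRONUNCIATIONS.items()}
    variants = alternative_pronunciations(PRONUNCIATIONS[word], max_edits)
    return {by_pronunciation[v] for v in variants if v in by_pronunciation}
```

With these assumptions, `similar_words("studied")` yields `studied`, `saudi`, and `study`; lowering `max_edits` to 1 drops `saudi`, which needs three phone edits.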


Acoustic confusions: “Jack studied hard”


Language model confusions: “Jack studied hard”

Look at words before or after a node, add likely alternate words based on n-gram LM
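A minimal sketch of the forward case, assuming a bigram model stored as a plain dictionary. The probabilities below are invented for illustration and `lm_alternatives` is a hypothetical helper; a backward model would condition on the following word instead of the preceding one.

```python
# Invented bigram probabilities P(word | previous word), for illustration only.
BIGRAM_PROB = {
    ("jack", "studied"): 0.10,
    ("jack", "started"): 0.25,
    ("jack", "said"): 0.40,
    ("jack", "steadied"): 0.05,
    ("studied", "hard"): 0.30,
    ("studied", "harder"): 0.20,
    ("studied", "law"): 0.35,
}

def lm_alternatives(previous_word, current_word, top_k=2):
    """Return the top_k most likely words after previous_word, excluding current_word."""
    candidates = [
        (prob, word)
        for (history, word), prob in BIGRAM_PROB.items()
        if history == previous_word and word != current_word
    ]
    # Sort by descending probability, breaking ties alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in candidates[:top_k]]
```

For the node "studied" preceded by "jack", this toy table proposes "said" and "started" as alternates to add to the lattice.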


Expansion results (on WSJ1)

[Figure: bar chart of oracle word accuracy (y-axis from 84.0% to 98.0%) for each expansion step: Baseline, Insertion, Acoustic, Morphology, Bigram, Trigram, Backward bigram, Backward trigram. Legend: Observed, Fully additive, Upper bound.]


Probability model

Our confirmation and correction interface requires the probability of a letter given the prior letters:

$$P(s_i \mid s_1 s_2 \ldots s_{i-1})$$


Probability model

Keep track of possible paths in lattice
Prediction based on next letter on paths
Interpolate with default language model
Example, user has entered “the_cat”:
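One way to picture this step is the Python sketch below. It makes several simplifying assumptions that are not from the talk: lattice paths are plain strings with equal weight, the default language model is a uniform stand-in, and the interpolation weight `lattice_weight` is arbitrary. All function names are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + "_"

def default_model(prefix):
    """Stand-in for the default language model: uniform over the alphabet."""
    return {letter: 1.0 / len(ALPHABET) for letter in ALPHABET}

def lattice_model(paths, prefix):
    """Next-letter distribution over the paths still consistent with the prefix."""
    counts = Counter(
        path[len(prefix)]
        for path in paths
        if path.startswith(prefix) and len(path) > len(prefix)
    )
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()} if total else {}

def next_letter_probs(paths, prefix, lattice_weight=0.9):
    """Interpolate the lattice prediction with the default model."""
    lattice = lattice_model(paths, prefix)
    base = default_model(prefix)
    if not lattice:
        return base  # no surviving path: fall back to the default model
    return {
        letter: lattice_weight * lattice.get(letter, 0.0)
        + (1.0 - lattice_weight) * base[letter]
        for letter in ALPHABET
    }

paths = ["the_cat_sat_", "the_cattle_sat_", "the_cat_ate_"]
probs = next_letter_probs(paths, "the_cat")
```

After “the_cat”, two of the three toy paths continue with “_” and one with “t”, so those letters dominate, while every other letter keeps a small probability from the default model and stays reachable.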


Handling word errors

Use default language model during entry of erroneous word
Rebuild paths allowing for an additional deletion or substitution error
Example, user has entered “the_cattle_”:
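The sketch below shows one possible reading of the path-rebuilding step, at the word level. It assumes that a substitution means the user's word replaces a word on the path, and that a deletion means the recognizer dropped the user's word so the path does not advance. The function `resume_positions` and the error budget `max_errors` are hypothetical, not taken from the talk.

```python
def resume_positions(path, entered, max_errors=1):
    """Return the indices in `path` where prediction could resume after `entered`.

    Each state is (position in path, errors used so far).
    """
    states = {(0, 0)}
    for word in entered:
        next_states = set()
        for position, errors in states:
            if position < len(path) and path[position] == word:
                # The entered word matches the path: advance without an error.
                next_states.add((position + 1, errors))
            elif errors < max_errors:
                if position < len(path):
                    # Substitution: the entered word replaces a path word.
                    next_states.add((position + 1, errors + 1))
                # Deletion: the recognizer dropped this word, path does not advance.
                next_states.add((position, errors + 1))
        states = next_states
    return {position for position, _ in states}

path = ["the", "cat", "sat"]
positions = resume_positions(path, ["the", "cattle"])
next_words = {path[i] for i in positions if i < len(path)}
```

After “the_cattle_”, this toy path can resume either at “sat” (“cattle” substituted for “cat”) or at “cat” (“cattle” was dropped by the recognizer), so both words are offered as likely continuations.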


Evaluating expansion

Assume a good model requires as little information from the user as possible

$$\text{Cross entropy}(T) = -\frac{1}{t} \sum_{i=1}^{t} \log_2 P(s_i \mid s_1 s_2 \ldots s_{i-1})$$

[Figure: bar chart of cross entropy in bits (y-axis from 0.4 to 0.9) for each expansion step: Baseline, Insertion, Acoustic, Morphology, Bigram, Trigram, Backward bigram, Backward trigram.]
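The per-letter cross entropy of a text under a letter-prediction model can be computed as in this short Python sketch. The callback `next_letter_probs` is a hypothetical interface that returns a letter-to-probability dictionary for a given prefix; it is an assumption for illustration, not the talk's implementation.

```python
import math

def cross_entropy(text, next_letter_probs):
    """Average number of bits per letter the model needs to encode `text`."""
    total_log_prob = 0.0
    for i, letter in enumerate(text):
        # Probability the model assigns to the actual letter given the prefix.
        total_log_prob += math.log2(next_letter_probs(text[:i])[letter])
    return -total_log_prob / len(text)

def uniform_over_abcd(prefix):
    """Toy model: four equally likely letters regardless of context."""
    return {letter: 0.25 for letter in "abcd"}
```

A model that always spreads its probability evenly over four letters costs exactly 2 bits per letter, so `cross_entropy("abca", uniform_over_abcd)` is 2.0. Lower values mean the user has to supply less information.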


Results on test set

Model evaluated on held out test set (Hub1)
Default language model
  2.4 bits/letter
  User decides between 5.3 letters
Best speech-based model
  0.61 bits/letter
  User decides between 1.5 letters
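The "decides between" figures are consistent with the standard conversion from bits per letter to an equivalent number of equally likely choices, 2 raised to the cross entropy. A quick check in Python, where `equivalent_choices` is a hypothetical helper name:

```python
def equivalent_choices(bits_per_letter):
    """Number of equally likely letters that would cost this many bits."""
    return 2 ** bits_per_letter

default_choices = round(equivalent_choices(2.4), 1)   # default language model
speech_choices = round(equivalent_choices(0.61), 1)   # best speech-based model
```

Rounded to one decimal place, these come out to 5.3 and 1.5, matching the slide.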


“To the mouse snow means freedom from want and fear”


Questions?
