Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech. Keith Vertanen, July 30th, 2004

Page 1: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Efficient Computer Interfaces Using Continuous Gestures,

Language Models, and Speech

Keith Vertanen

July 30th, 2004

Page 2: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

The problem

Speech recognizers make mistakes
Correcting mistakes is inefficient:
  140 WPM  Uncorrected dictation
  14 WPM   Corrected dictation, mouse/keyboard
  32 WPM   Corrected typing, mouse/keyboard

Voice-only correction is even slower and more frustrating

Page 3: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Research overview

Make correction of dictation:
  More efficient
  More fun
  More accessible

Approach:
  Build a word lattice from a recognizer’s n-best list
  Expand lattice to cover likely recognition errors
  Make a language model from expanded lattice
  Use model in a continuous gesture interface to perform confirmation and correction

Page 4: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Building lattice

Example n-best list:
  1: jack studied very hard
  2: jack studied hard
  3: jill studied hard
  4: jill studied very hard
  5: jill studied little
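The slides do not say how the lattice is constructed, so the following is only a minimal sketch under an assumption: hypotheses are merged into a word graph whose edges are the adjacent word pairs seen in the n-best list. The names `build_word_graph`, `accepts`, `START` and `END` are hypothetical.

```python
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def build_word_graph(nbest):
    """Merge n-best hypotheses into a graph: word -> set of words that may follow it."""
    graph = defaultdict(set)
    for hypothesis in nbest:
        words = [START] + hypothesis.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            graph[prev].add(nxt)
    return graph


def accepts(graph, sentence):
    """True if every adjacent word pair in the sentence is an edge in the graph."""
    words = [START] + sentence.split() + [END]
    return all(nxt in graph.get(prev, ()) for prev, nxt in zip(words, words[1:]))


nbest = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]
graph = build_word_graph(nbest)
print(sorted(graph["studied"]))               # ['hard', 'little', 'very']
print(accepts(graph, "jack studied little"))  # True
```

The merged graph accepts "jack studied little", which is not in the n-best list. This sketch keys nodes by word, so a word that repeats within a sentence would be conflated; a real lattice would key nodes by position or time.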

Page 5: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Insertion errors

Page 6: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Acoustic confusions

Given a word, find words that sound similar
Look pronunciation up in dictionary:
  studied          s t ah d iy d
Use observed phone confusions to generate alternative pronunciations:
  s t ah d iy d    s t ah d iy d
                   s ao d iy
                   s t ah d iy
                   …
Map pronunciation back to words:
  s t ah d iy d    studied
  s ao d iy        saudi
  s t ah d iy      study
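The three steps above can be sketched as a bounded search over phone edits. This is an illustrative sketch, not the method from the talk: the confusion tables `SUBSTITUTIONS` and `DELETABLE`, the edit limit, and the tiny dictionary `PRON_TO_WORD` are all assumptions chosen to reproduce the slide's example.

```python
# Hypothetical pronunciation dictionary (pronunciation -> word), from the slide's example.
PRON_TO_WORD = {
    "s t ah d iy d": "studied",
    "s ao d iy": "saudi",
    "s t ah d iy": "study",
}
# Hypothetical phone confusions: allowed substitutions and phones that may be dropped.
SUBSTITUTIONS = {"ah": ["ao"]}
DELETABLE = {"t", "d"}


def one_edit(phones):
    """Yield every pronunciation reachable by one substitution or one deletion."""
    for i, phone in enumerate(phones):
        for alt in SUBSTITUTIONS.get(phone, []):
            yield phones[:i] + (alt,) + phones[i + 1:]
        if phone in DELETABLE:
            yield phones[:i] + phones[i + 1:]


def alternative_pronunciations(pron, max_edits=3):
    """All pronunciations within max_edits confusions of pron (including pron itself)."""
    start = tuple(pron.split())
    seen = {start}
    frontier = {start}
    for _ in range(max_edits):
        frontier = {alt for p in frontier for alt in one_edit(p)} - seen
        seen |= frontier
    return seen


def confusable_words(pron, max_edits=3):
    """Map each alternative pronunciation back to a dictionary word, if one exists."""
    keys = (" ".join(p) for p in alternative_pronunciations(pron, max_edits))
    return {PRON_TO_WORD[k] for k in keys if k in PRON_TO_WORD}


print(sorted(confusable_words("s t ah d iy d")))  # ['saudi', 'studied', 'study']
```

A practical system would presumably weight each confusion by how often it was observed rather than treating all edits as equally likely.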

Page 7: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Acoustic confusions: “Jack studied hard”

Page 8: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Language model confusions: “Jack studied hard”

Look at words before or after a node, add likely alternate words based on n-gram LM
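As a rough illustration of this step, the sketch below proposes alternates for a lattice word from a forward bigram (what tends to follow the previous word) and a backward bigram (what tends to precede the next word). The counts in `BIGRAM_COUNTS` are invented for the example and the function names are hypothetical; the talk's actual language model and thresholds are not given in the slides.

```python
from collections import Counter

# Hypothetical bigram counts; a real system would query a trained n-gram LM.
BIGRAM_COUNTS = Counter({
    ("jack", "studied"): 4,
    ("jack", "worked"): 3,
    ("jack", "played"): 1,
    ("studied", "hard"): 5,
    ("worked", "hard"): 6,
    ("tried", "hard"): 2,
})


def forward_alternates(prev_word, k):
    """Top-k words most likely to follow prev_word."""
    following = Counter({w2: c for (w1, w2), c in BIGRAM_COUNTS.items() if w1 == prev_word})
    return [w for w, _ in following.most_common(k)]


def backward_alternates(next_word, k):
    """Top-k words most likely to precede next_word."""
    preceding = Counter({w1: c for (w1, w2), c in BIGRAM_COUNTS.items() if w2 == next_word})
    return [w for w, _ in preceding.most_common(k)]


def expand_node(prev_word, word, next_word, k=3):
    """Alternates for the middle word from both directions, excluding the word itself."""
    candidates = forward_alternates(prev_word, k) + backward_alternates(next_word, k)
    return sorted(set(candidates) - {word})


print(expand_node("jack", "studied", "hard"))  # ['played', 'tried', 'worked']
```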

Page 9: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Expansion results (on WSJ1)

[Chart: oracle word accuracy (y-axis, 84.0% to 98.0%) by expansion method: Baseline, Insertion, Acoustic, Morphology, Bigram, Trigram, Backward bigram, Backward trigram. Legend: Observed, Fully additive, Upper bound.]

Page 10: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Probability model

Our confirmation and correction interface requires probability of a letter given prior letters:

Page 11: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Probability model

Keep track of possible paths in lattice
Prediction based on next letter on paths
Interpolate with default language model
Example, user has entered “the_cat”:
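A minimal sketch of this idea follows, under several assumptions that are not from the slides: lattice paths are plain strings treated as equally likely, the default language model is a uniform placeholder, and the interpolation weight `lattice_weight` is an arbitrary value.

```python
import string

ALPHABET = string.ascii_lowercase + "_"  # "_" stands for a space, as in the slide's example


def default_lm(prefix):
    """Placeholder default language model: uniform over the alphabet (an assumption)."""
    return {ch: 1.0 / len(ALPHABET) for ch in ALPHABET}


def lattice_next_letter(paths, prefix):
    """Distribution over the next letter, from lattice paths that start with the prefix."""
    counts = {}
    for path in paths:
        if path.startswith(prefix) and len(path) > len(prefix):
            ch = path[len(prefix)]
            counts[ch] = counts.get(ch, 0) + 1
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()} if total else {}


def next_letter(paths, prefix, lattice_weight=0.9):
    """Interpolate the lattice prediction with the default language model."""
    lattice = lattice_next_letter(paths, prefix)
    default = default_lm(prefix)
    if not lattice:  # no path matches the prefix: use the default model alone
        return default
    return {
        ch: lattice_weight * lattice.get(ch, 0.0) + (1 - lattice_weight) * default[ch]
        for ch in ALPHABET
    }


paths = ["the_cat_sat", "the_cattle_sat", "the_cat_ran"]  # hypothetical lattice paths
probs = next_letter(paths, "the_cat")
print(max(probs, key=probs.get))  # "_" is the most likely next letter
```

Interpolating with the default model keeps every letter reachable, so the user can still enter text the lattice does not contain.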

Page 12: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Handling word errors

Use default language model during entry of erroneous word
Rebuild paths allowing for an additional deletion or substitution error
Example, user has entered “the_cattle_”:
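One possible reading of the path-rebuilding step is sketched below. It assumes that the last word the user entered was not in the lattice, and that this means either the recognizer substituted another word for it (resume after that lattice word) or dropped it entirely (the lattice word is still to come). This interpretation, and the names `resume_points` and `next_words`, are assumptions rather than details from the talk.

```python
def resume_points(path, entered):
    """Indices in `path` where prediction may resume.

    Assumes the last word of `entered` is a word the lattice did not contain.
    """
    confirmed = entered[:-1]
    n = len(confirmed)
    if path[:n] != confirmed:
        return set()       # path disagrees with the words already confirmed
    points = {n}           # deletion error: the recognizer dropped the word, path[n] is next
    if n < len(path):
        points.add(n + 1)  # substitution error: the entered word replaces path[n]
    return points


def next_words(paths, entered):
    """Words that may follow the entered text once the paths are rebuilt."""
    words = set()
    for path in paths:
        for i in resume_points(path, entered):
            words.add(path[i] if i < len(path) else "</s>")
    return words


paths = [["the", "cat", "sat"], ["the", "cat", "ran"]]  # hypothetical lattice paths
print(sorted(next_words(paths, ["the", "cattle"])))     # ['cat', 'ran', 'sat']
```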

Page 13: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Evaluating expansion

Assume a good model requires as little information from the user as possible

\[ \text{Cross entropy}(T) = -\frac{1}{t} \sum_{i=0}^{t-1} \log_2 P(s_i \mid s_1 s_2 \ldots s_{i-1}) \]

[Chart: cross entropy in bits (y-axis, 0.4 to 0.9) by expansion method: Baseline, Insertion, Acoustic, Morphology, Bigram, Trigram, Backward bigram, Backward trigram.]
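The cross entropy measure on this slide is the average number of bits the model assigns per letter of a test text. A small sketch of the computation, using a uniform placeholder model (`uniform_model` is an assumption for illustration, not one of the models evaluated in the talk):

```python
import math


def cross_entropy(text, model):
    """Average bits per symbol: -(1/t) * sum of log2 P(symbol | preceding symbols)."""
    total = 0.0
    for i, symbol in enumerate(text):
        total += math.log2(model(text[:i])[symbol])
    return -total / len(text)


def uniform_model(prefix, alphabet="abcdefghijklmnopqrstuvwxyz_"):
    """Placeholder model that ignores the prefix and spreads probability evenly."""
    return {ch: 1.0 / len(alphabet) for ch in alphabet}


bits = cross_entropy("the_cat", uniform_model)
print(round(bits, 3))  # 4.755, i.e. log2(27) for a 27-symbol uniform model
```

A model that predicts the text well puts more probability on each actual next letter, which lowers this value.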

Page 14: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Results on test set

Model evaluated on held-out test set (Hub1)
Default language model
  2.4 bits/letter
  User decides between 5.3 letters
Best speech-based model
  0.61 bits/letter
  User decides between 1.5 letters
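The "decides between N letters" figures appear to be the cross entropy converted to an equivalent number of equally likely choices, that is, 2 raised to the bits per letter. A quick check of that arithmetic (the function name `equivalent_choices` is hypothetical):

```python
def equivalent_choices(bits_per_letter):
    """Number of equally likely letters that would cost this many bits per letter."""
    return 2 ** bits_per_letter


print(round(equivalent_choices(2.4), 1))   # 5.3
print(round(equivalent_choices(0.61), 1))  # 1.5
```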

Page 15: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

“To the mouse snow means freedom from want and fear”

Page 16: Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Questions?