Speech User Interfaces


Upload: quinn-vega

Post on 04-Jan-2016



TRANSCRIPT

Page 1: Speech User Interfaces

Speech User Interfaces

Page 2: Speech User Interfaces

Outline

Review
Motivation for speech UIs
Speech recognition
UI problems with speech UIs
SpeechActs: Guidelines for speech UIs
Speech UI design tools
Multimodal UIs

Page 3: Speech User Interfaces

Review

Why do we prototype?
• get feedback on our design from customers – faster & cheaper
Why use low-fi prototypes?
• traditional methods take too long & focus designers & customers on the wrong (visual) issues
What is the Wizard of Oz technique?
• faking the interaction
What is the advantage of using informal tools like SILK, DENIM, & SUEDE?
• advantages of electronic medium (editing, reuse, distribution, etc.)
• faster than traditional UI tools
• do not focus designers/customers on the wrong issues
• ability to support testing & analysis of resulting data

Page 4: Speech User Interfaces

Motivation for Speech UIs: Pervasive Information Access

Information & Services

I-Land vision by Streitz et al.

Page 5: Speech User Interfaces

UIs in the Pervasive Computing Era

Future computing devices won’t have the same UI as current PCs
• wide range of devices
    • small or embedded in environment
    • often w/ “alternative” I/O & w/o screens
    • information appliances

I-Land vision by Streitz et al.

Page 6: Speech User Interfaces

Information Access via Speech

Read my important email

Page 7: Speech User Interfaces

Industry Leaders

Nuance Corporation
Applications: TellMe, …
Users: Government, Computers- Microsoft, IBM,

Page 8: Speech User Interfaces

Speech UI Motivation

Smaller devices -> difficult I/O
• people can talk at ~ 90 wpm -> high speed
“Virtually unlimited” set of commands
Freedom for other body parts
• imagine you are working on your car & need to know something from the manual
Natural
• evolutionarily selected for
    • reading, writing, & typing are not (too new)

Page 9: Speech User Interfaces

Why are Speech UIs Hard to Get Right?

Speech recognition far from perfect
• imagine inputting commands w/ the mouse & getting the wrong result 5-20% of the time
Speech UIs have no visible state
• can’t see what you have done before or what effect your commands have had
Speech UIs are hard to learn
• how do you explore the interface? how do you find out what you can say?

Page 10: Speech User Interfaces

Speech UIs Require

Speech recognition
• the computer understanding what the customer is saying
Speech production (or synthesis)
• the computer talking to the customer

Page 11: Speech User Interfaces

Speech Recognition

Continuous vs. non-continuous
Speaker independent vs. dependent
Speech often misunderstood by people
• feedback via speech, facial expressions, & gesture
Recognizers trained with real samples
• often get gender-based problems
Based on probabilities (HMMs - Bayes)
• trigrams of sounds or words
Several popular recognizers
• Nuance, SpeechWorks, IBM ViaVoice
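The "trigrams of words" bullet can be illustrated with a tiny count-based language model. This is only a sketch: the toy corpus, the names `train_trigrams` and `trigram_prob`, and the add-one smoothing are assumptions made for illustration, not a description of how any of the recognizers named above work.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count word trigrams and their two-word contexts in a toy corpus."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for s in sentences:
        # Pad so the first real word also has a two-word context.
        words = ["<s>", "<s>"] + s.lower().split() + ["</s>"]
        vocab.update(words)
        for a, b, c in zip(words, words[1:], words[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx, vocab

def trigram_prob(model, a, b, c):
    """Estimate P(c | a, b) with add-one smoothing (an assumed, simple choice)."""
    tri, ctx, vocab = model
    return (tri[(a, b, c)] + 1) / (ctx[(a, b)] + len(vocab))

# Hypothetical training utterances.
corpus = ["read my important email", "read my new email", "send my email"]
model = train_trigrams(corpus)
p_email = trigram_prob(model, "my", "important", "email")
p_send = trigram_prob(model, "my", "important", "send")
```

After "my important", the model scores "email" higher than "send" because it has seen that sequence before. A recognizer can use this kind of score to choose between acoustically similar word candidates.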

Page 12: Speech User Interfaces

Speech Production

Three frequency regions of great intensity visible on oscilloscope
• come from larynx, throat, mouth
Two needed for recognition but “tinny”
Can generate emotional affect in speech
• Demo
    • anger, disgust, gladness, sadness, fear, & surprise
    • http://cahn.www.media.mit.edu/people/cahn/emot-speech.html

Page 13: Speech User Interfaces

Recognition Problems

Good recognition
• humans < 1% error rate on dictation
• top recognition systems get <1-X% error rates
    • computers don’t use much context
Key is to be application specific for lower error rates
Background noise
• even worse recognition rates (20-40% error)
Speed
• Better as hardware getting faster
    • in 10 years gone from 5 high-end workstations required to some speech systems running on laptops or even PDAs
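The slide does not say how its error rates are measured. Assuming they are word error rates, the usual calculation is the word-level edit distance between what was said and what was recognized, divided by the number of words said. A minimal sketch (the function name and example sentences are illustrative):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free if words match
            ))
        prev = curr
    return prev[-1] / len(ref)  # assumes a non-empty reference

wer = word_error_rate("read my important email", "read my important male")
```

One wrong word out of four gives a rate of 0.25. Because insertions count as errors, the rate can exceed 1.0 when the recognizer outputs more words than were spoken.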

Page 14: Speech User Interfaces

More Recognition Problems

Isolated, short words difficult
• common words become short
Segmentation
• silly versus sill lea
Spelling
• mail vs. male -> need to understand language

Page 15: Speech User Interfaces

Speech UI Problems

Speech UI no-nos
• modes (no feedback)
    • certain commands only work when in specific states
• deep hierarchies (aka voice mail hell)
Verbose feedback wastes time/patience
• only confirm consequential things
• use meaningful, short cues
Interruption
• half-duplex communication (i.e., no barge-in support)
Too much speech on the part of customer is tiring
Speech takes up space in working memory
• can cause problems when problem solving

Page 16: Speech User Interfaces

SpeechActs: Guidelines for Speech UIs

Speech interface to computer tools
• email, calendar, weather, stock quotes
Establish common ground & shared context
• make sure people know where they are in the conversation
Pacing
• recog. delays are unnatural, make it clear when this occurs
• barge-in lets user interrupt like in real conversations
• tapering of prompts
• progressive assistance: short error messages at first, longer when user needs more help
• implicit confirmation: include confirm in next command
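Two of the pacing guidelines above, tapering of prompts and progressive assistance, can be sketched as a small state holder. This is not SpeechActs code: the class name `PromptPolicy`, its methods, and the prompt texts are all hypothetical and chosen only to illustrate the idea.

```python
class PromptPolicy:
    """Picks a prompt by how often the user has heard it or failed at it."""

    def __init__(self, full, tapered, help_levels):
        self.full = full                # first-time prompt
        self.tapered = tapered          # shortened prompt for repeat visits
        self.help_levels = help_levels  # error messages, shortest first
        self.visits = 0
        self.errors = 0

    def prompt(self):
        """Tapering: the full prompt once, the short one afterwards."""
        self.visits += 1
        return self.full if self.visits == 1 else self.tapered

    def on_error(self):
        """Progressive assistance: longer help after each consecutive error."""
        level = min(self.errors, len(self.help_levels) - 1)
        self.errors += 1
        return self.help_levels[level]

    def on_success(self):
        """A recognized command resets the error escalation."""
        self.errors = 0

policy = PromptPolicy(
    full="You have three new messages. Say read, skip, or delete.",
    tapered="Next?",
    help_levels=[
        "Sorry?",
        "Sorry, I didn't get that. Say read, skip, or delete.",
        "I can read the message, skip it, or delete it. Which would you like?",
    ],
)
first, second = policy.prompt(), policy.prompt()  # full prompt, then the tapered one
helps = [policy.on_error() for _ in range(4)]     # escalates, then stays at the longest
```

Resetting the error count on success keeps a single slip later in the conversation from triggering the longest help message.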

Page 17: Speech User Interfaces

SpeechActs Video

Page 18: Speech User Interfaces

Announcements

Task analysis / Contextual inquiry HW
• average = 79/100, stdev. 8.4
Low-fi user test due Monday
• questions
If you haven’t gotten a laptop yet, check with Wai-ling after class

Page 19: Speech User Interfaces

SUEDE: Low-fi Prototyping for Speech-based UIs

Supports design practice
• example scripts
• Wizard of Oz
• error simulation
• iterative design (design-test-analysis)
Informal user interface
• no speech recognition/synthesis
• need not be programming expert
• fast & fluid design
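As a rough illustration of how error simulation can work in a Wizard of Oz test without a real recognizer, the sketch below randomly discards the wizard's choice at a rate set by the designer. It is not SUEDE's implementation; `simulate_recognition`, the 20% rate, and the fixed seed are assumptions made for this example.

```python
import random

def simulate_recognition(wizard_choice, error_rate, rng):
    """Return the wizard's choice, or None to signal a simulated misrecognition."""
    if rng.random() < error_rate:
        return None  # the prototype should play an error prompt instead
    return wizard_choice

rng = random.Random(0)  # fixed seed so a test session can be replayed
outcomes = [simulate_recognition("read email", 0.2, rng) for _ in range(1000)]
observed_error_rate = outcomes.count(None) / len(outcomes)
```

Over many turns the observed error rate approaches the configured one, so a designer can see how a dialog holds up under realistic recognition failures before any speech technology is involved.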

Page 20: Speech User Interfaces

machine prompt user response

Page 21: Speech User Interfaces


Page 22: Speech User Interfaces


Page 23: Speech User Interfaces

SUEDE Summary

SUEDE supports speech-based UI design
• moving from concrete examples to abstractions
• allows designer to accept responses that aren’t exactly what they originally had in mind
• embeds iterative design w/ design-test-analyze
Designers using SUEDE need not be experts in speech recognition technology

Page 24: Speech User Interfaces

One Vision of Future User Interfaces

Star Trek style UI
• verbally ask the computer for information
• may be common in mobile/hands-busy situations
• problem: hard to design, build, & use!
    • requires perfect speech recognition & language understanding

Page 25: Speech User Interfaces

Our Vision of Future User Interfaces

Multimodal, Context-aware UIs
• multimodal
    • uses multiple input modalities (speech & gesture) to disambiguate
    • user says “move it to this screen” while pointing
• context-aware
    • apps can be aware of location, user, what they are doing, …
    • people are talking -> don’t rely on speech I/O
Problem: how to prototype & test new ideas?
• Informal UI Design Tools!
    • combine Wizard of Oz & informal storyboarding
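The "move it to this screen" example above relies on matching a spoken "this" to a pointing gesture made at about the same time. A minimal sketch of that matching follows. The `PointingEvent` type, the `resolve_deictic` function, and the 1.5-second window are hypothetical choices for illustration, not taken from any particular system.

```python
from dataclasses import dataclass

@dataclass
class PointingEvent:
    time: float   # seconds since session start
    target: str   # what the user pointed at, e.g. a screen name

def resolve_deictic(speech_time, pointing_events, window=1.5):
    """Pick the pointing target closest in time to the spoken "this"."""
    candidates = [p for p in pointing_events
                  if abs(p.time - speech_time) <= window]
    if not candidates:
        return None  # nothing to disambiguate with; ask the user
    return min(candidates, key=lambda p: abs(p.time - speech_time)).target

# Hypothetical gesture log: the user said "this" at 8.2 seconds.
events = [PointingEvent(2.0, "left screen"), PointingEvent(7.9, "wall display")]
target = resolve_deictic(speech_time=8.2, pointing_events=events)
```

Returning `None` when no gesture falls inside the window lets the interface fall back to a clarifying question instead of guessing.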

Page 26: Speech User Interfaces

Multimodal Error Correction

Dictation error correction study
• found users are better at correcting recognition errors with a different input modality
• recognizer got it wrong the first time -> it will get it wrong the second time
    • hyperarticulating aggravates the problem
Correct dictation errors with
• vocal spelling, writing, typing, etc.

Page 27: Speech User Interfaces

Summary

Speech UIs
• may permit more natural computer access
• allow us to use computers in more situations
• are hard to get to work well
    • lack of visible state, tax working memory, recognition problems, etc.
UI tools are needed for speech UI design
Multimodal UIs address some of the problems with pure speech UIs
• help disambiguate
• help w/ correction