The Speech Speech casey chesnut brains-N-brawn.com Madison .NET April 2007


The Speech Speech

casey chesnut

brains-N-brawn.com

Madison .NET April 2007

PowerPoint

• Page Up

• Page Down

brains-N-brawn.com

• Pervasive Computing

– Tablet PC (MVP 03)

– Compact Framework (MVP 04)

– Advanced Web Services (MVP 05)

– Media Center (MVP 06)

– Speech

– Location Based Services

– Artificial Intelligence

– 3D

Outline

• Speech Overview

• Vista Speech Recognition

• SAPI 5.3 / System.Speech

• Speech Server 2007

Outline : Speech Overview

• Voice User Interface

• How does it work?

– Synthesis (TTS)

– Recognition (SR)

Overview

• Speech is just another presentation system

– Synthesis = Output to user

– Recognition = User input

• Voice User Interface (VUI)

VUI Modes

• Applications

– Multi-modal

– Voice-only

VUI Tips

• Don't replicate the touch-tone-based menu system

• Restrict options on the main (opening) menu to 4 or fewer

• Make sure your opening greeting is short

• Don't design the app solely for the new user

• Focus on task completion above all

• What can I say?

http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx

Speech Synthesis

• Text to Speech

– Dynamic

– Prompt database

How Synthesis Works

• Text parsing

– Sentences, numbers, symbols, pauses

• Natural language processing

– Part of speech, tense

• Phonemes are looked up or sounded out

• Diphones are appended together

• Post process audio to add emphasis

• Play speech audio
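The first stages listed above (text parsing, phoneme lookup or sounding out, diphone assembly) can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the lexicon, the expansion table, the letter-to-sound table and every function name are hypothetical, and a real engine uses far larger dictionaries and actual audio units rather than strings.

```python
import re

# Toy pronunciation dictionary (hypothetical entries, ARPAbet-like symbols).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "two": ["T", "UW"],
}

# Minimal expansions for the "numbers, symbols" parsing step (illustrative only).
EXPANSIONS = {"2": "two", "&": "and"}

# Crude letter-to-sound fallback for words missing from the lexicon (assumption).
LETTER_SOUNDS = {"a": "AE", "n": "N", "d": "D"}


def parse_text(text):
    """Split text into sentences, then into lower-case word tokens."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    parsed = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+|\d+|&", sentence)
        parsed.append([EXPANSIONS.get(t, t).lower() for t in tokens])
    return parsed


def to_phonemes(word):
    """Look the word up; otherwise 'sound it out' letter by letter."""
    if word in LEXICON:
        return list(LEXICON[word])
    return [LETTER_SOUNDS.get(ch, ch.upper()) for ch in word]


def to_diphones(phonemes):
    """Pair adjacent phonemes, padding with silence ('_') at both ends."""
    padded = ["_"] + phonemes + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
```

For example, `to_diphones(to_phonemes("two"))` yields the units a concatenative synthesizer would fetch and join before post-processing and playback.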

How Synthesis Works

• Demo

– /xnaSynth app

• Article

– http://www.brains-N-brawn.com/ttSpeech/

– http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)

Speech Recognition

• Speech to Text

– Dictation

– Command and Control

How Recognition Works

• Audio signal is processed

• Look for signals which might be speech

• Phonemes are found in audio signals

• Phonemes are mapped to a dictionary of words

– Dictation or grammar-based

• Apply natural language processing
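Two of the steps above, finding audio that might be speech and mapping phonemes to words, can be sketched as follows. The energy threshold, the frame size, the tiny phoneme dictionary and all function names are assumptions chosen for illustration; production recognizers use statistical acoustic and language models rather than a fixed threshold and an exact lookup.

```python
import math


def frame_energies(samples, frame_size):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def speech_segments(samples, frame_size=160, threshold=0.01):
    """Return (start_frame, end_frame) pairs of loud runs; end is exclusive."""
    energies = frame_energies(samples, frame_size)
    segments, start = [], None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx                      # a loud run begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx))    # the loud run ends
            start = None
    if start is not None:                    # audio ended while still loud
        segments.append((start, len(energies)))
    return segments


# Hypothetical phoneme-sequence dictionary (ARPAbet-like symbols).
PHONEME_DICTIONARY = {
    ("HH", "AH", "L", "OW"): "hello",
    ("S", "T", "AA", "P"): "stop",
}


def phonemes_to_word(phonemes):
    """Exact lookup of a phoneme sequence; None when nothing matches."""
    return PHONEME_DICTIONARY.get(tuple(phonemes))
```

A quick check is to surround a synthetic tone with silence, for instance `[0.0] * 320 + [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(480)] + [0.0] * 160`, and confirm that only the frames containing the tone are reported.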

How Recognition Works

• Demo

– /wavReader app

• Article

– http://www.brains-N-brawn.com/noReco/

– http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)

Outline : Vista Speech Recognizer

• Built into Vista’s shell

• Microphone bar

• Language support

• Can be trained to improve accuracy

• Command-and-control, also Dictation

• Automagic application support

• Horrible Office integration

• UAC problems

Demo

• Say what you see

• Show numbers

• Correct

• Spell it

• Mouse grid

http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/

High Risk Demo

Hack

http://news.bbc.co.uk/1/hi/technology/6320865.stm

• /micBarExtend – tap and talk

Narrator

• Vista’s screen reader

Outline : SAPI 5.3 / System.Speech

• Desktop applications

– SAPI 5.3

– System.Speech

SAPI 5.3

• COM based

• Native applications

• Managed apps which need more control

System.Speech

• Part of .NET 3.0 WPF

• Managed wrapper built on SAPI 5.3

• Simple API

• Standards support (SSML, SRGS)

• Language support

• Vista Speech Recognition integration

• Does not work in XBAP

System.Speech.Synthesis

• SpeechSynthesizer

• SSML

• PromptBuilder

• Voices
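SSML is a W3C XML format, so a prompt can be assembled with any XML library and then handed to a synthesizer that accepts SSML. The sketch below builds a small SSML 1.0 document in Python; it does not call System.Speech, and the function name `build_ssml` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML 1.0 namespace


def build_ssml(greeting, detail, pause_ms=500, lang="en-US"):
    """Return an SSML string: a sentence, a pause, then an emphasized phrase."""
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang},
    )
    sentence = ET.SubElement(speak, "s")           # one sentence
    sentence.text = greeting
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})  # explicit pause
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = detail
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome.", "Say help at any time."))
```

Keeping prompts as SSML rather than plain strings makes pauses and emphasis explicit instead of leaving them to the engine's defaults.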

System.Speech.Synthesis

• Demo

– /speechSamples - /speechSynth

System.Speech.Recognition

• SpeechRecognizer / SpeechRecognitionEngine

• SRGS

• GrammarBuilder

• Advanced users

– Deep-link functionality

– Mixed initiative
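A command-and-control grammar is, at its simplest, a sequence of slots where each slot offers a fixed set of alternatives. The Python sketch below models that idea and matches recognized text against it. It is not the System.Speech API; the list-of-lists grammar representation and the function names are assumptions made for illustration.

```python
from itertools import product


def expand(grammar):
    """Every phrase the grammar accepts; each slot is a list of alternatives."""
    return {" ".join(words) for words in product(*grammar)}


def match(grammar, utterance):
    """Return the alternative chosen for each slot, or None if no match.

    Greedy, longest-alternative-first matching with no backtracking,
    which is enough for small command grammars like the one below.
    """
    words = utterance.lower().split()
    chosen, pos = [], 0
    for alternatives in grammar:
        for alt in sorted(alternatives, key=len, reverse=True):
            alt_words = alt.split()
            if words[pos:pos + len(alt_words)] == alt_words:
                chosen.append(alt)
                pos += len(alt_words)
                break
        else:
            return None                     # no alternative fits this slot
    return chosen if pos == len(words) else None


# Hypothetical media-control grammar: a verb slot followed by a target slot.
MEDIA_GRAMMAR = [["play", "stop"], ["music", "the radio"]]
```

Restricting input to a grammar like this is what makes command-and-control recognition more robust than open dictation: anything outside the accepted phrase set is simply rejected.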

System.Speech.Recognition

• Demo

– /speechSamples - /speechReco

System.Speech

• Demo

– /micBarExtend

– /mceSapiMcpl

• Article

– http://www.brains-N-brawn.com/speechSamples/

– http://www.brains-N-brawn.com/micBarExtend/

– http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)

What about Mobile Devices

• OEMs can add VoiceCommand

– VoiceCommand is not accessible to developers

• WindowsMobile has the SAPI API, but no engines

• PlatformBuilder is supposed to have engines

• There are 3rd party engines for purchase

Outline : Speech Server 2007

Speech Server 2007

• Telephony Applications

• Outgoing calls

• Speaker Independent

Speech Server 2007

• VOIP

• Language support

• VoiceXML / SALT

• Workflow development model

• Reports

• Still in beta

Speech Server 2007

• Speech Synthesis

– Inline

– PromptBuilder

– SSML

– Prompt databases

• Speech Recognition

– Inline

– Dynamic Grammar

– SRGS

– Conversational Grammar Builder

– DTMF

VoiceXML

• Declarative language

• Article

– http://www.brains-N-brawn.com/vxml/

– http://www.brains-N-brawn.com/myVoices/

– http://www.brains-N-brawn.com/voiceBio/

SALT

• Yet another declarative language

• Multimodal support has been dropped

• Article

– http://www.brains-N-brawn.com/noHands/

– http://www.brains-N-brawn.com/speechMulti/

– http://www.brains-N-brawn.com/tabletWeb/

– http://www.brains-N-brawn.com/mceSalt/

Speech Workflow

• Speech Sequence Workflow designer

• Speech activities

– Statement

– QuestionAnswer

• Debugging tools
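The sequence-of-activities idea can be illustrated with a small Python sketch: a statement activity only speaks, while a question-answer activity speaks, listens, and retries until it hears an accepted answer. The class names echo the activity names on the slide, but the code is a hypothetical model and not the Speech Server API; `speak` and `listen` are stand-in callables supplied by the caller.

```python
class Statement:
    """Plays a prompt and moves on."""

    def __init__(self, prompt):
        self.prompt = prompt

    def run(self, speak, listen, results):
        speak(self.prompt)


class QuestionAnswer:
    """Asks a question and retries until an accepted answer is heard."""

    def __init__(self, name, prompt, answers, max_attempts=3):
        self.name = name
        self.prompt = prompt
        self.answers = set(answers)
        self.max_attempts = max_attempts

    def run(self, speak, listen, results):
        for _ in range(self.max_attempts):
            speak(self.prompt)
            heard = listen().strip().lower()
            if heard in self.answers:
                results[self.name] = heard
                return
            speak("Sorry, I did not understand.")
        results[self.name] = None           # gave up after max_attempts


def run_sequence(activities, speak, listen):
    """Run the activities in order and return the collected answers."""
    results = {}
    for activity in activities:
        activity.run(speak, listen, results)
    return results
```

Passing `print` as `speak` and `input` as `listen` turns the sketch into a text-only walk-through of a call flow, which is a cheap way to test dialog logic before any telephony is involved.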

Speech Workflow

• Demo

– /speechTextAdv

– /speakerVerify

– /mobileRecord

• Article

– http://www.brains-N-brawn.com/speechTextAdv/

– http://www.brains-N-brawn.com/speakerVerify/

Where

• Accessibility

• Telephony

• Telematics

• Home automation

• Mobile Devices / Tablets

• Gaming

• Warehouses

• …

Possible Future

• Telematics

• Service Pack for Office Support

• Exchange Server 2007

• Speech Server 2007 release

• Rumors that WindowsMobile will get a public API

• Dictation has room to improve

• Hope that System.Speech will ultimately work in XBAP

Questions