The Speech Speech
casey chesnut
brains-N-brawn.com
Madison.NET, April 2007
PowerPoint
• Page Up
• Page Down
brains-N-brawn.com
• Pervasive Computing
– Tablet PC (MVP 03)
– Compact Framework (MVP 04)
– Advanced Web Services (MVP 05)
– Media Center (MVP 06)
– Speech
– Location Based Services
– Artificial Intelligence
– 3D
Outline
• Speech Overview
• Vista Speech Recognition
• SAPI 5.3 / System.Speech
• Speech Server 2007
Outline : Speech Overview
• Voice User Interface
• How does it work?
– Synthesis (TTS)
– Recognition (SR)
Overview
• Speech is just another presentation system
– Synthesis = Output to user
– Recognition = User input
• Voice User Interface (VUI)
VUI Modes
• Applications
– Multi-modal
– Voice-only
VUI Tips
• Don't replicate the touch-tone-based menu system
• Restrict options on the main (opening) menu to 4 or fewer
• Make sure your opening greeting is short
• Don't design the app solely for the new user
• Focus on task completion above all
• What can I say?
http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx
Speech Synthesis
• Text to Speech
– Dynamic
– Prompt database
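A prompt database holds recordings of phrases the application says often, and dynamic TTS covers whatever has not been recorded. The Python sketch below illustrates that split with a greedy longest match over words. It is a sketch under assumptions: the `PROMPTS` table, the `.wav` file names, and the greedy strategy are all hypothetical, and a production prompt engine matches recordings with more sophisticated logic.

```python
# Hypothetical prompt database: lowercase word sequences -> prerecorded audio files.
PROMPTS = {
    ("welcome", "to"): "welcome_to.wav",
    ("your", "balance", "is"): "your_balance_is.wav",
    ("dollars",): "dollars.wav",
}
LONGEST = max(len(key) for key in PROMPTS)

def plan_playback(text):
    """Cover text with recorded prompts (greedy longest match); send the gaps to TTS."""
    words = text.split()
    lowered = [w.lower() for w in words]
    plan, pending, i = [], [], 0
    while i < len(words):
        # Try the longest phrase first so multi-word recordings win.
        for n in range(min(LONGEST, len(words) - i), 0, -1):
            wav = PROMPTS.get(tuple(lowered[i:i + n]))
            if wav:
                break
        if wav:
            if pending:                      # flush unmatched words to TTS
                plan.append(("tts", " ".join(pending)))
                pending = []
            plan.append(("wav", wav))
            i += n
        else:
            pending.append(words[i])         # keep original casing for TTS
            i += 1
    if pending:
        plan.append(("tts", " ".join(pending)))
    return plan

print(plan_playback("Welcome to Contoso Bank your balance is 42 dollars"))
# [('wav', 'welcome_to.wav'), ('tts', 'Contoso Bank'), ('wav', 'your_balance_is.wav'), ('tts', '42'), ('wav', 'dollars.wav')]
```

Recorded audio usually sounds more natural than synthesized speech, which is why the sketch prefers the longest recorded phrase and falls back to TTS only for names and numbers it has no recording for.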
How Synthesis Works
• Text parsing
– Sentences, numbers, symbols, pauses
• Natural language processing
– Part of speech, tense
• Phonemes are looked up or sounded out
• Diphones are appended together
• Post-process audio to add emphasis
• Play speech audio
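The first steps of that pipeline can be sketched in a few lines of Python. Everything here is a toy and an assumption:
- a three-word lexicon with ARPAbet-style phoneme symbols;
- digits read one at a time;
- a naive one-letter-one-sound fallback for unknown words;
- diphones represented only by name.

In a concatenative engine, each diphone name would index a recorded snippet that spans from the middle of one phoneme to the middle of the next. The sketch leaves out audio concatenation and emphasis post-processing.

```python
import re

# Toy lexicon using ARPAbet-style phoneme symbols (a real engine ships a large one).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
# Naive one-letter-one-phoneme fallback for words missing from the lexicon (assumption).
LETTER_SOUNDS = dict(zip("abcdefghijklmnopqrstuvwxyz",
                         ["AE", "B", "K", "D", "EH", "F", "G", "HH", "IH", "JH",
                          "K", "L", "M", "N", "AA", "P", "K", "R", "S", "T",
                          "AH", "V", "W", "K", "Y", "Z"]))

def normalize(text):
    """Text parsing: lowercase, keep words, and read each digit as a word."""
    tokens = re.findall(r"[a-z]+|[0-9]", text.lower())
    return [DIGITS[int(t)] if t.isdigit() else t for t in tokens]

def to_phonemes(words):
    """Look each word up in the lexicon, or sound it out letter by letter."""
    phones = []
    for word in words:
        phones.extend(LEXICON.get(word) or [LETTER_SOUNDS[c] for c in word])
    return phones

def to_diphones(phones):
    """Pair adjacent phonemes, padded with silence, into diphone unit names."""
    padded = ["SIL"] + phones + ["SIL"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

words = normalize("Hello, 2 cats!")
print(words)                                 # ['hello', 'two', 'cats']
print(to_diphones(to_phonemes(words))[:4])   # ['SIL-HH', 'HH-AH', 'AH-L', 'L-OW']
```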
How Synthesis Works
• Demo
– /xnaSynth app
• Article
– http://www.brains-N-brawn.com/ttSpeech/
– http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)
Speech Recognition
• Speech to Text
– Dictation
– Command and Control
How Recognition Works
• Audio signal is processed
• Look for signals which might be speech
• Phonemes are found in audio signals
• Phonemes are mapped to a dictionary of words
– Dictation or grammar-based
• Apply natural language processing
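As a rough illustration of the grammar-based case, the Python sketch below assumes the phonemes have already been found. It picks the closest phrase in a small command grammar by phoneme edit distance and rejects anything too far away. Real recognizers score acoustic features against statistical acoustic models and search with a grammar or language model. The edit-distance stand-in, the `GRAMMAR` pronunciations, and the `max_errors` threshold are assumptions made for illustration. Dictation replaces the small grammar with a large vocabulary and a statistical language model.

```python
# Hypothetical command grammar: phrase -> ARPAbet-style pronunciation.
GRAMMAR = {
    "open": ["OW", "P", "AH", "N"],
    "close": ["K", "L", "OW", "Z"],
    "stop": ["S", "T", "AA", "P"],
}

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (one DP row at a time)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # match or substitute
        prev = cur
    return prev[-1]

def recognize(heard, grammar=GRAMMAR, max_errors=1):
    """Return (phrase, distance) for the closest grammar phrase, or None to reject."""
    scored = [(edit_distance(heard, pron), phrase) for phrase, pron in grammar.items()]
    dist, phrase = min(scored)
    return (phrase, dist) if dist <= max_errors else None

print(recognize(["K", "L", "OW", "S"]))    # ('close', 1)
print(recognize(["HH", "EH", "L", "OW"]))  # None (out of grammar)
```

The rejection threshold plays the role of a confidence cutoff. Without it, a grammar-based recognizer would force every utterance onto some command.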
How Recognition Works
• Demo
– /wavReader app
• Article
– http://www.brains-N-brawn.com/noReco/
– http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)
Outline : Vista Speech Recognizer
• Built into Vista’s shell
• Microphone bar
• Language support
• Can be trained to improve accuracy
• Command and control, plus dictation
• Automagic application support
• Horrible Office integration
• UAC problems
Demo
• Say what you see
• Show numbers
• Correct
• Spell it
• Mouse grid
http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/
High Risk Demo
Hack
http://news.bbc.co.uk/1/hi/technology/6320865.stm
• /micBarExtend – tap and talk
Narrator
• Vista’s screen reader
Outline : SAPI 5.3 / System.Speech
• Desktop applications
– SAPI 5.3
– System.Speech
SAPI 5.3
• COM based
• Native applications
• Managed apps which need more control
System.Speech
• Part of .NET Framework 3.0 (ships alongside WPF)
• Managed wrapper built on SAPI 5.3
• Simple API
• Standards support (SSML, SRGS)
• Language support
• Vista Speech Recognition integration
• Does not work in XBAP
System.Speech.Synthesis
• SpeechSynthesizer
• SSML
• PromptBuilder
• Voices
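SSML is plain XML, so any XML library can generate it, and the result can be handed to `SpeechSynthesizer.SpeakSsml`. `PromptBuilder` assembles the same kind of markup in code. The Python sketch below shows a minimal SSML 1.0 document containing a sentence, a pause, and an emphasized word. The `build_ssml` helper and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def build_ssml(greeting, name, lang="en-US"):
    """Minimal SSML 1.0: a sentence, a half-second pause, then an emphasized name."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    ET.SubElement(speak, "s").text = greeting                  # <s> marks a sentence
    ET.SubElement(speak, "break", {"time": "500ms"})           # explicit pause
    ET.SubElement(speak, "emphasis", {"level": "strong"}).text = name
    return ET.tostring(speak, encoding="unicode")              # escapes &, < for us

print(build_ssml("Welcome to Madison.NET.", "casey"))
```

Generating the markup with an XML library rather than by string concatenation keeps user-supplied text (names, addresses) from breaking the document.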
System.Speech.Synthesis
• Demo
– /speechSamples - /speechSynth
System.Speech.Recognition
• SpeechRecognizer / SpeechRecognitionEngine
• SRGS
• GrammarBuilder
• Advanced users
– Deep-link functionality
– Mixed initiative
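SRGS grammars are XML documents (SRGS also defines an ABNF form). The Python sketch below emits a minimal SRGS 1.0 XML grammar whose public root rule accepts exactly one phrase from a list. In System.Speech, `GrammarBuilder` combined with `Choices` builds a comparable rule in code, and a `Grammar` can also be loaded from an SRGS file. The `build_srgs` helper and the example phrases are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_srgs(rule_id, phrases, lang="en-US"):
    """Minimal SRGS 1.0 XML grammar: the public root rule matches any one phrase."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/06/grammar",
        "xml:lang": lang,
        "root": rule_id,                  # rule the recognizer starts from
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")  # alternatives: exactly one must match
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")

srgs_xml = build_srgs("colors", ["red", "green", "blue"])
print(srgs_xml)
```

A small, closed grammar like this is what makes command-and-control recognition reliable: the recognizer only has to choose among three words instead of the whole language.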
System.Speech.Recognition
• Demo
– /speechSamples - /speechReco
System.Speech
• Demo
– /micBarExtend
– /mceSapiMcpl
• Article
– http://www.brains-N-brawn.com/speechSamples/
– http://www.brains-N-brawn.com/micBarExtend/
– http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)
What about Mobile Devices?
• OEMs can add VoiceCommand
– VoiceCommand is not accessible to developers
• Windows Mobile has the SAPI API, but no engines
• Platform Builder is supposed to have engines
• There are third-party engines for purchase
Outline : Speech Server 2007
Speech Server 2007
• Telephony Applications
• Outgoing calls
• Speaker Independent
Speech Server 2007
• VoIP
• Language support
• VoiceXML / SALT
• Workflow development model
• Reports
• Still in beta
Speech Server 2007
• Speech Synthesis
– Inline
– PromptBuilder
– SSML
– Prompt databases
• Speech Recognition
– Inline
– Dynamic Grammar
– SRGS
– Conversational Grammar Builder
– DTMF
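DTMF input is not speech, but it travels over the same audio channel. Each key is the sum of one low-group tone (697, 770, 852, or 941 Hz) and one high-group tone (1209, 1336, 1477, or 1633 Hz). Speech Server detects the keys for you. The Python sketch below only illustrates the idea: it synthesizes the tone pair for a key, then recovers the key with the Goertzel algorithm, a common way to measure the energy at a handful of known frequencies. The 8 kHz sample rate and 50 ms tone length are assumptions. The detector also ignores the noise, twist, and timing checks a real one needs.

```python
import math

LOW = [697, 770, 852, 941]        # row (low-group) frequencies in Hz
HIGH = [1209, 1336, 1477, 1633]   # column (high-group) frequencies in Hz
KEYS = ["123A", "456B", "789C", "*0#D"]
RATE = 8000                        # assumed telephone-band sample rate in Hz

def tone(key, ms=50):
    """Synthesize the two-tone DTMF signal for one key."""
    row = next(r for r, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    n = RATE * ms // 1000
    return [math.sin(2 * math.pi * LOW[row] * t / RATE) +
            math.sin(2 * math.pi * HIGH[col] * t / RATE) for t in range(n)]

def goertzel_power(samples, freq):
    """Power of the signal at one frequency, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect(samples):
    """Pick the strongest low and high tones and map the pair back to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH[i]))
    return KEYS[row][col]

print("".join(detect(tone(k)) for k in "555*0#"))  # prints 555*0#
```

Goertzel is preferred over a full FFT here because only eight frequencies matter, and each one costs a single multiply-add loop over the samples.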
VoiceXML
• Declarative language
• Article
– http://www.brains-N-brawn.com/vxml/
– http://www.brains-N-brawn.com/myVoices/
– http://www.brains-N-brawn.com/voiceBio/
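"Declarative" means the page describes fields and prompts, and the interpreter decides what to do next. In VoiceXML that interpreter is the Form Interpretation Algorithm. It repeatedly selects the first form item whose variable is still unset, collects input for it, and processes the result.

The Python sketch below is a heavily simplified version of that loop. It assumes:
- a namespace-free snippet with no grammars, events, or conditions;
- a hypothetical recognizer that returns a dict of filled fields.

Because one reply may fill several fields (mixed initiative), the usage example skips the second question.

```python
import xml.etree.ElementTree as ET

# A tiny VoiceXML-style form (namespace, grammars, and events omitted for brevity).
VXML = """<vxml version="2.0">
  <form id="order">
    <field name="size"><prompt>What size pizza?</prompt></field>
    <field name="topping"><prompt>Which topping?</prompt></field>
    <block><prompt>Thanks, your order is in.</prompt></block>
  </form>
</vxml>"""

def run_form(vxml_text, utterances):
    """Simplified Form Interpretation Algorithm: select, collect, process."""
    form = ET.fromstring(vxml_text).find("form")
    items = [el for el in form if el.tag in ("field", "block")]
    values, done_blocks, transcript = {}, set(), []
    replies = iter(utterances)

    def unset(i, el):
        # A field's guard is its named variable; a block runs only once.
        return el.get("name") not in values if el.tag == "field" else i not in done_blocks

    while True:
        # Select: the first form item, in document order, whose guard is unset.
        pending = [(i, el) for i, el in enumerate(items) if unset(i, el)]
        if not pending:
            return values, transcript
        i, item = pending[0]
        transcript.append(item.find("prompt").text)   # Collect: play the prompt
        if item.tag == "field":
            # Process: a hypothetical recognizer result may fill several fields.
            values.update(next(replies))
        else:
            done_blocks.add(i)

values, said = run_form(VXML, [{"size": "large", "topping": "mushroom"}])
print(said)  # ['What size pizza?', 'Thanks, your order is in.']
```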
SALT
• Yet another declarative language
• Multimodal support has been dropped
• Article
– http://www.brains-N-brawn.com/noHands/
– http://www.brains-N-brawn.com/speechMulti/
– http://www.brains-N-brawn.com/tabletWeb/
– http://www.brains-N-brawn.com/mceSalt/
Speech Workflow
• Speech Sequence Workflow designer
• Speech activities
– Statement
– QuestionAnswer
• Debugging tools
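A sequential speech workflow runs activities in order. Each QuestionAnswer prompts, listens, and reprompts when recognition fails. The `Statement` and `QuestionAnswer` classes below are hypothetical Python stand-ins for that pattern. They are not the Speech Server 2007 API, which is .NET and Workflow Foundation based. The exact-string "grammar" check, the apology text, and the three-attempt limit are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Statement:
    """Hypothetical stand-in for a Statement activity: play one prompt."""
    prompt: str

    def run(self, say, listen, results):
        say(self.prompt)

@dataclass
class QuestionAnswer:
    """Hypothetical stand-in for a QuestionAnswer activity: ask, listen, reprompt."""
    name: str
    prompt: str
    choices: List[str]
    no_reco_prompt: str = "Sorry, I didn't get that."
    max_attempts: int = 3

    def run(self, say, listen, results):
        for attempt in range(self.max_attempts):
            # First attempt plays the prompt; later attempts prepend an apology.
            say(self.prompt if attempt == 0 else f"{self.no_reco_prompt} {self.prompt}")
            heard = listen()
            if heard in self.choices:        # toy exact-match "grammar"
                results[self.name] = heard
                return
        results[self.name] = None            # give up after max_attempts

def run_sequence(activities, caller_replies):
    """Run activities in order against a scripted caller; None means silence."""
    spoken, results = [], {}
    replies = iter(caller_replies)
    for activity in activities:
        activity.run(spoken.append, lambda: next(replies, None), results)
    return spoken, results

flow = [
    Statement("Welcome."),
    QuestionAnswer("topic", "Say speech or telephony.", ["speech", "telephony"]),
    Statement("Goodbye."),
]
spoken, results = run_sequence(flow, ["cheese", "speech"])
print(results)  # {'topic': 'speech'}
```

Driving the dialog from a scripted list of caller replies is also a cheap way to unit-test call flows without a phone line.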
Speech Workflow
• Demo
– /speechTextAdv
– /speakerVerify
– /mobileRecord
• Article
– http://www.brains-N-brawn.com/speechTextAdv/
– http://www.brains-N-brawn.com/speakerVerify/
Where
• Accessibility
• Telephony
• Telematics
• Home automation
• Mobile Devices / Tablets
• Gaming
• Warehouses
• …
Possible Future
• Telematics
• Service Pack for Office Support
• Exchange Server 2007
• Speech Server 2007 release
• Rumors that Windows Mobile will get a public API
• Dictation has room to improve
• Hope that System.Speech will ultimately work in XBAP
Questions