Tulsa Techfest 2008 - Creating a Voice User Interface with Speech Server

Creating a Voice User Interface with Speech Server 2007 Jason Townsend

Upload: jason-townsend

Post on 19-May-2015


DESCRIPTION

This is the slide deck from my presentation at Tulsa Techfest 2008 on Microsoft Speech Server and Creating Successful Voice User Interfaces.

TRANSCRIPT

Page 1: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Creating a Voice User Interface with Speech Server 2007
Jason Townsend

Page 2: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Jason Townsend

•President, Bartlesville .NET User Group
•Sr. Analyst, ConocoPhillips
•11+ Years Development Experience
•Father of 4 wonderful children
•Married to an amazing and forgiving wife!
•Avid Sailor

Page 3: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Page 4: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Speech Server 2007

•Speech Server is an IVR (interactive voice response) platform that allows you to develop telephony applications using standards such as Speech Application Language Tags (SALT) and VoiceXML.

•New Features
▫Native Voice Over IP (VoIP)
▫Voice Response Workflow
▫Conversational Grammar Builder

Page 5: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Common Application Scenarios

•Customer Service
▫Pay bills by phone (ex: ChoicePay)
▫Order products (ex: Tickets.com)
▫Customer Support (ex: Dell)
▫Banking (ex: Bank of America)
•Information Worker Markets
▫Pipeline workers
▫Insurance Appraisers
▫Realtors
▫For workers that may not be in front of a desktop

Page 6: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

New Features

•Support for .NET 2.0 Framework
•Support for VoiceXML
•Voice Response Workflow Applications
▫Based on Windows Workflow Foundation
•Native Support for VoIP
•Integrated into Office Communications Server

Page 7: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Speech Server Architecture

Page 8: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Speech Recognition Supported Languages

•English – Australia
•English – United Kingdom
•English – North America
•German – Germany
•Spanish – Americas
•More to come…

Page 9: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

VoiceXML

•W3C’s standard XML Format for specifying interactive voice dialogues between a human and a computer

•Interpreted by a voice browser
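
As a rough illustration of the kind of document a voice browser interprets, the sketch below builds a minimal VoiceXML 2.0 dialogue from Python. The function name build_yes_no_dialog, the form id, and the prompt wording are invented for this example and are not from the talk; the vxml, form, field, prompt, and filled elements and the built-in boolean field type are part of the W3C VoiceXML 2.0 specification.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # namespace defined by VoiceXML 2.0


def build_yes_no_dialog(question: str) -> str:
    """Return a minimal VoiceXML 2.0 document that asks one yes/no question.

    Hypothetical helper for illustration only.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "ask"})
    # type="boolean" selects the built-in yes/no grammar.
    field = ET.SubElement(form, "field", {"name": "answer", "type": "boolean"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = question
    # <filled> runs once the field has been recognized.
    filled = ET.SubElement(field, "filled")
    reply = ET.SubElement(filled, "prompt")
    reply.text = "Thank you."
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_yes_no_dialog("Are you coming to Techfest?"))
```

The voice browser, not the application, is responsible for playing the prompt and collecting the answer, much as a web browser renders HTML.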

Page 10: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

SALT

•SALT Forum was founded on October 15, 2001
▫Microsoft
▫Cisco
▫Comverse
▫Intel
▫Philips
▫ScanSoft
•W3C work initiated in July 2002
•SALT Forum seems to have gone dead. The last press release was in 2003.
•Main concept was multimodal applications
▫Speechify the web, IVR, handhelds, etc…

Page 11: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

SALT Usage

•Microsoft Speech Server 2004
▫Only SALT
•Microsoft Speech Server 2007
▫SALT and VXML
•Plugin for Internet Explorer

Page 12: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Key Workflow Concepts

•Workflows are a set of activities
▫The workflow itself is an Activity
•Activities are the building blocks of the application
▫A single unit of Reuse
▫A single unit of Execution
•An Activity has associated properties, conditions, and events
•Developers can build their own Custom Activity Libraries
▫Imagine your own Telerik RAD Controls, Infragistics Controls, etc… just for VUIs
•A Workflow runs within a Host Process
▫WAS
▫IIS
▫.EXE
▫Windows Managed Services
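
The idea that a workflow is a set of activities, and is itself an activity, can be sketched in a few lines. This is plain Python, not the Windows Workflow Foundation API; the class names Activity, Statement, and SequentialWorkflow are assumptions made for illustration.

```python
from typing import List


class Activity:
    """A single unit of reuse and execution (hypothetical base class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, log: List[str]) -> None:
        raise NotImplementedError


class Statement(Activity):
    """Plays a prompt; in this sketch it only records the text in the log."""

    def __init__(self, name: str, prompt: str) -> None:
        super().__init__(name)
        self.prompt = prompt

    def execute(self, log: List[str]) -> None:
        log.append(self.prompt)


class SequentialWorkflow(Activity):
    """A workflow is itself an Activity that runs its children in order."""

    def __init__(self, name: str, activities: List[Activity]) -> None:
        super().__init__(name)
        self.activities = activities

    def execute(self, log: List[str]) -> None:
        for activity in self.activities:
            activity.execute(log)


if __name__ == "__main__":
    greeting = SequentialWorkflow(
        "greeting",
        [Statement("hello", "Hello."), Statement("thanks", "Thank you for calling.")],
    )
    call = SequentialWorkflow("call", [greeting, Statement("bye", "Goodbye.")])
    played: List[str] = []
    call.execute(played)
    print(played)
```

Because the workflow is an activity, the greeting workflow can be nested inside the call workflow like any other building block.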

Page 13: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Dialogue Flow is a Workflow

•Speech Server only supports sequential workflow development

Page 14: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Speech Application Development

•Define the dialogue flow
▫Statements, questions, answers, etc…
▫Other activities
•Specify possible answers (grammars)
•Record questions (prompts)
•Integrate into the back-end (Web Services)
•Deploy, test, and tune application

Page 15: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Developing Your Prototype

Managed Code Assembly

Page 16: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Tuning Applications

•Out of the box speech applications
▫Are not robust to real world user input
▫Need real data to optimize
•Trial phases required for gathering data
▫Wizard of Oz phase
▫Pilot phases

•Visual Studio Integrated Analytics and Tuning Studio tool can be used to analyze the data and find problems

Page 17: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Reporting in Speech Server

•Measuring application performance and server performance
▫Call-Volume
▫Self Service completion rates
•Sharing reporting data throughout the business
▫Speech Server can leverage the full SQL Server stack: Reporting Services, Analysis Services, Integration Services

Page 18: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Data Management – Trace Logging

•Logs
▫Call details
▫Application instrumentation
▫Audio and grammars
▫Server latencies
▫More…
•Saved in Speech Server Log files
•Can import via Log import tool into your SQL Server Database/Farm
•Analyze via Speech Server 2007 Analytics and Tuning Studio
•Present reports via SQL Server Reporting Services

Page 19: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Logged Information - Prompt

•Prompt
▫Content
▫Barge-in detection
▫Rate/Volume
▫Persona

Page 20: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Logged Information - Response

•Input Mode
▫Speech
▫DTMF
•Grammar
▫Content (coverage)
▫Rule weights
▫Pronunciations
•Confirmation Threshold
•SR configuration
▫Speech Detection
▫Rejection Threshold
▫Silence Timeout
▫Endsilence
▫Decoder …
▫Acoustic Models …

Page 21: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Page 22: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Voice User Interface (VUI)

•Allows for human interaction with computers through a voice/speech platform
•VUI is the interface to any speech application
•Drive to make them conversational
•Instead of browser incompatibility you have dialect incompatibility.
•Not all business processes are suited to VUIs.
▫Some are too complex
▫Sometimes automation is impossible or impractical

Page 23: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Grammars

•Best practice: constrain the grammar as much as possible.

•Good prompt design guides the caller to use in-grammar responses.

•Out-of-grammar (OOG) responses are handled with more explicit prompting to elicit an in-grammar response.
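
A minimal sketch of what "constrained" means in practice: the application accepts only a small, known set of phrases and treats everything else as out-of-grammar. The dictionary YES_NO_GRAMMAR and the function interpret are hypothetical; a real speech engine matches audio against a grammar rather than comparing strings.

```python
from typing import Optional

# Hypothetical yes/no grammar: each accepted phrase maps to a semantic value.
YES_NO_GRAMMAR = {
    "yes": "yes",
    "yeah": "yes",
    "yes please": "yes",
    "no": "no",
    "nope": "no",
    "no thanks": "no",
}


def interpret(utterance: str) -> Optional[str]:
    """Return the semantic value of an in-grammar utterance, or None if OOG."""
    # Normalize case and whitespace before the lookup.
    normalized = " ".join(utterance.lower().split())
    return YES_NO_GRAMMAR.get(normalized)


if __name__ == "__main__":
    for heard in ("Yes please", "nope", "maybe later"):
        result = interpret(heard)
        print(heard, "->", result if result is not None else "out-of-grammar")
```

A None result is the signal to move to a more explicit prompt, for example one that states "please say yes or no".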

Page 24: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

VUI Design Best Practices

1) Use DTMF for long numbers
2) Don’t use open ended prompts
3) Don’t repeat prompts
4) Focus on grammar accuracy
5) If natural dialogs fail, fall back to directed dialog
6) Always confirm what was recognized
7) Generate prompts based on recognition confidence scores.
8) Bail out if too many errors occur
9) Keep text-to-speech output to a minimum
10) Be aware of human memory
11) “Platinum Rule”
12) Let the Caller Drive

Page 25: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Use DTMF for Long Numbers

•Limit spoken digits to 4 or fewer
•This rule is often broken for:
▫Credit Card Numbers
▫Social Security Numbers
▫Bank Account Numbers
▫Telephone Numbers
•DON’T Break This Rule!!!
•Remember customer privacy!
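
The four-digit limit can be expressed as a simple rule for choosing the input mode of a question. This is only a sketch; the function name input_mode_for and the mode labels are assumptions made for this example.

```python
MAX_SPOKEN_DIGITS = 4  # limit taken from the slide


def input_mode_for(digit_count: int) -> str:
    """Pick 'speech' for short digit strings and 'dtmf' (keypad) for long ones."""
    if digit_count <= 0:
        raise ValueError("digit_count must be positive")
    return "speech" if digit_count <= MAX_SPOKEN_DIGITS else "dtmf"


if __name__ == "__main__":
    # Hypothetical fields and their expected lengths.
    for label, digits in (("PIN", 4), ("SSN", 9), ("credit card", 16)):
        print(label, "->", input_mode_for(digits))
```

Keypad entry also avoids asking the caller to say a card or account number aloud where others can hear it.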

Page 26: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Don’t Use Open Ended Prompts

•BAD: “Hello, thank you for calling Tulsa Techfest. May I help you?”

•BETTER: “Hello, thank you for calling Tulsa Techfest. Would you like to hear about today’s speakers?”

Page 27: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Don’t Repeat Prompts

•When a prompt is repeated, callers tend to repeat the same response that was not understood the first time

•Provide Escalated Help
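
One way to provide escalated help is to keep a short list of prompts that become more explicit with each failed attempt. The sketch below assumes three levels; the prompt wording and the function name prompt_for_attempt are invented for illustration.

```python
# Hypothetical prompts, ordered from least to most explicit.
ESCALATING_PROMPTS = [
    "Would you like to hear about today's speakers?",
    "Sorry, I didn't understand. Please say yes or no.",
    "To hear about today's speakers, say yes or press 1. "
    "Otherwise, say no or press 2.",
]


def prompt_for_attempt(attempt: int) -> str:
    """Return the prompt for this attempt (0 = first ask, 1 = after one miss)."""
    if attempt < 0:
        raise ValueError("attempt must be zero or greater")
    # Stay on the most explicit prompt once the list is exhausted.
    index = min(attempt, len(ESCALATING_PROMPTS) - 1)
    return ESCALATING_PROMPTS[index]


if __name__ == "__main__":
    for attempt in range(3):
        print(attempt, prompt_for_attempt(attempt))
```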

Page 28: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Focus on Grammar Accuracy

•Spend time TUNING and REFINING your grammars
•Accuracy is IMPERATIVE
•To reduce recognition failures:
▫Create prompts that make it clear what the user can and should say
▫Test grammars with many different utterances from several people
▫Record incoming calls once the system is in production and use this information to continually tune the grammars.
•Watch for dialects!

Page 29: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

If Natural Dialogs Fail, Fall Back to Directed Dialog

•Natural Dialogs are great, but they have a higher rate of failure.
•Don’t want to frustrate the user

Page 30: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Always Confirm What Was Recognized

•Mismatches are common
▫Austin/Boston
▫Sharp/Shark
▫Britney Spears/Kevin Federline
•Even for grammars with low ambiguity it’s important to confirm your recognition
•Implicit confirmation
▫Ok Jason, are you coming to Techfest?
•QA Control makes it easy to provide confirmation

Page 31: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Generate Prompts Based on Recognition Confidence Scores

•Speech recognition errors are common
•How to handle?
▫Changing prompts
▫Falling back to directed dialogs
▫Transferring to operator
•Humans change their interaction based on perceived confidence, whether implicitly or explicitly
•N-Best lists are of great value here
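
A sketch of how a prompt might change with the confidence score. The two thresholds are assumed values, not Speech Server defaults, and would need tuning against real call data; the function names and prompt wording are also hypothetical.

```python
ACCEPT_THRESHOLD = 0.80   # assumed value; tune with real data
CONFIRM_THRESHOLD = 0.45  # assumed value; tune with real data


def next_step(confidence: float) -> str:
    """Map a recognition confidence score (0.0 to 1.0) to a dialog strategy."""
    if confidence >= ACCEPT_THRESHOLD:
        return "implicit_confirm"
    if confidence >= CONFIRM_THRESHOLD:
        return "explicit_confirm"
    return "reprompt"


def prompt_for(city: str, confidence: float) -> str:
    """Build the next prompt for a recognized city name."""
    step = next_step(confidence)
    if step == "implicit_confirm":
        # Echo the result and keep moving; the caller can still correct it.
        return f"OK, {city}. What date are you traveling?"
    if step == "explicit_confirm":
        return f"I think you said {city}. Is that right?"
    return "Sorry, which city was that?"


if __name__ == "__main__":
    for score in (0.92, 0.60, 0.20):
        print(score, "->", prompt_for("Austin", score))
```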

Page 32: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Confidence Scores & N-Best Lists

•The recognition engine returns a confidence score along with a result
•The recognition engine can return several “guesses” of what it understood.
•You tell the engine to return up to N guesses.

Page 33: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Skip Lists

•Skip List is a type of N-Best processing
•Keep track of results that caller has confirmed ‘no’ to, and don’t ask again.
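
Skip-list processing can be sketched as a filter over the N-best list. The example assumes the engine hands back (text, confidence) pairs; the function name next_candidate and the sample scores are made up for illustration.

```python
from typing import List, Optional, Set, Tuple


def next_candidate(
    n_best: List[Tuple[str, float]], skip_list: Set[str]
) -> Optional[str]:
    """Return the most confident guess the caller has not already rejected."""
    ranked = sorted(n_best, key=lambda guess: guess[1], reverse=True)
    for text, _confidence in ranked:
        if text not in skip_list:
            return text
    return None  # every guess was rejected; reprompt or fall back


if __name__ == "__main__":
    n_best = [("Austin", 0.62), ("Boston", 0.58)]  # hypothetical results
    rejected: Set[str] = set()

    guess = next_candidate(n_best, rejected)
    print("Did you say", guess, "?")   # caller answers "no"
    rejected.add(guess)

    guess = next_candidate(n_best, rejected)
    print("Did you say", guess, "?")   # second-best guess, not the same one
```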

Page 34: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Bail Out If Too Many Errors

•Don’t make your customer become a “0” (zero) jammer

•Transfer to a live person if they error out more than twice

•Remember, some people have speech impediments or speech patterns that do not produce high recognition confidence.

•Find the threshold! (This takes testing)
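
A minimal sketch of the bail-out rule, assuming errors are counted per call and that "more than twice" means the third error triggers the transfer. The class name ErrorBudget and the returned labels are hypothetical.

```python
class ErrorBudget:
    """Counts recognition errors in one call and decides when to bail out."""

    def __init__(self, max_errors: int = 2) -> None:
        # max_errors is the threshold to find through testing.
        self.max_errors = max_errors
        self.errors = 0

    def record_error(self) -> str:
        """Register one error; return 'retry' or 'transfer'."""
        self.errors += 1
        return "transfer" if self.errors > self.max_errors else "retry"


if __name__ == "__main__":
    budget = ErrorBudget()
    for _ in range(3):
        print(budget.record_error())
```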

Page 35: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Keep TTS Output to a Minimum

•TTS does not sound professional
•Hire a voice talent. The payoff will justify the upfront cost
•Can use as a fall back for data or prompts that need to be dynamic

Page 36: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Be Aware of Human Memory

•Make lists short
•No more than 5 items
•Present large lists in chunks
•Make the prompts short
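
Presenting a large list in chunks is a small slicing exercise. The sketch below uses the five-item limit from the slide; the function name chunk and the speaker names are placeholders.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], size: int = 5) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    speakers = [f"Speaker {n}" for n in range(1, 13)]  # placeholder names
    for group in chunk(speakers):
        # Read one group, then ask whether the caller wants to hear more.
        print(", ".join(group))
```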

Page 37: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Platinum Rule

•Treat users as they want to be treated, not how you want to be treated

•Step into their shoes
•Use vocabulary they understand

Page 38: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Let The Caller Drive

•Provide instant gratification (lets the caller get in a zone, and they enjoy the experience due to small successes)

•Only ask for what you need, not everything at once.

Page 39: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

VUI Design is a Science

•Design before development

•Wizard of Oz Testing

•Find balance between business requirements and the caller experience

•Run usability trials on test subjects to validate your design

•Use a pilot to trial the application. If caller behavior is not as expected, make adjustments.

Page 40: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Demos

Page 41: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Page 42: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Additional Information

•http://www.microsoft.com/speech
•http://www.microsoft.com/uc
•http://www.gotspeech.net
•http://www.nuance.com
•https://www.intervoice.com/
•http://www.tellme.com/
•http://www.vuidesign.org/

Page 43: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Further Resources

•My Blog
▫http://www.okcodemonkey.com
•LinkedIn
▫http://www.linkedin.com/in/okcodemonkey
•Bartlesville .NET User Group
▫http://www.bdnug.com
•Twitter
▫http://twitter.com/okcodemonkey
•Email
▫[email protected]

Page 44: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Key Terms

Page 45: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Page 46: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Voice Browser

•“Web Browser” that presents an IVR VUI to the user
•Provides interface to the PSTN or a PBX
•Works with Voice Dialogues (where web browsers work with HTML/XHTML)
•Presents information aurally via:
▫Text-To-Speech
▫Prerecorded prompts
•Obtains information through:
▫Speech Recognition
▫DTMF detection

Page 47: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Speech Recognition

•Converts spoken words to machine readable input

Page 48: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

DTMF (Dual-tone Multi-Frequency)

•Used for telephone signaling over the line in the voice-frequency band to the call switching center.
•Standardized by ITU-T Recommendation Q.23
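
"Dual-tone" means each key is signaled by two simultaneous tones, one from a low-frequency group (keypad row) and one from a high-frequency group (keypad column). The frequencies below are the standard DTMF values; the function name dtmf_frequencies and the table layout are choices made for this sketch.

```python
from typing import Tuple

LOW_GROUP_HZ = (697, 770, 852, 941)       # one frequency per keypad row
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)  # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (low, high) tone pair in Hz for a single keypad key."""
    key = key.upper()
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


if __name__ == "__main__":
    for key in "5#0":
        print(key, dtmf_frequencies(key))
```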

Page 49: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

Text-To-Speech (Speech Synthesis)

•Artificial production of human speech
•Computer used is called the speech synthesizer
•Can be implemented in software or hardware
•Converts normal language text into speech

Page 50: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

PSTN (Public Switched Telephone Network)

•Network of the world’s public circuit switched telephone networks
•Similar to the way the Internet is the network of the world’s public IP-based packet-switched networks.
•Originally a network of fixed-line analog telephone systems
•Now almost completely digital and includes mobile phones
•Governed by technical standards created by the ITU-T, and uses E.163/E.164 addresses (telephone numbers)

Page 51: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

ITU-T (International Telecommunication Union Standardization Sector)

•Coordinates standards for telecommunications on behalf of the International Telecommunication Union
•Based in Geneva, Switzerland
•Original work dates back to 1865, with the birth of the International Telegraph Union
•Became a United Nations specialized agency in 1947

Page 52: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

ITU (International Telecommunication Union)

•Established to standardize and regulate international radio and telecommunications.

•Founded as the International Telegraph Union on May 17, 1865 in Paris

•Main tasks include standardization, allocation of the radio spectrum, and organizing interconnection agreements between countries

Page 53: Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

PBX (Private Branch Exchange)

•Is a telephone exchange that serves a particular business or office, as opposed to one that a common carrier or telephone company operates for many businesses