Tulsa Techfest 2008 - Creating a Voice User Interface with Speech Server

Post on 19-May-2015

DESCRIPTION

This is the slide deck from my presentation at Tulsa Techfest 2008 on Microsoft Speech Server and Creating Successful Voice User Interfaces.

TRANSCRIPT

Creating a Voice User Interface with Speech Server 2007

Jason Townsend

Jason Townsend

•President, Bartlesville .NET User Group
•Sr. Analyst, ConocoPhillips
•11+ Years Development Experience
•Father of 4 wonderful children
•Married to an amazing and forgiving wife!
•Avid Sailor

Speech Server 2007

•Speech Server is an IVR (interactive voice response) platform that allows you to develop telephony applications using standards such as Speech Application Language Tags (SALT) and VoiceXML.

•New Features
  ▫Native Voice Over IP (VoIP)
  ▫Voice Response Workflow
  ▫Conversational Grammar Builder

Common Application Scenarios

•Customer Service
  ▫Pay bills by phone (ex: ChoicePay)
  ▫Order products (ex: Tickets.com)
  ▫Customer Support (ex: Dell)
  ▫Banking (ex: Bank of America)
•Information Worker Markets
  ▫Pipeline workers
  ▫Insurance Appraisers
  ▫Realtors
  ▫For workers that may not be in front of a desktop

New Features

•Support for .NET 2.0 Framework
•Support for VoiceXML
•Voice Response Workflow Applications
  ▫Based on Windows Workflow Foundation
•Native Support for VoIP
•Integrated into Office Communications Server

Speech Server Architecture

Speech Recognition Supported Languages

•English – Australia
•English – United Kingdom
•English – North America
•German – Germany
•Spanish – Americas
•More to come…

VoiceXML

•W3C’s standard XML Format for specifying interactive voice dialogues between a human and a computer

•Interpreted by a voice browser
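To make the idea of a voice dialogue document concrete, here is a minimal Python sketch that builds a one-question VoiceXML 2.0 document with the standard library. The form id, field name, and prompt text are assumptions chosen for illustration; they do not come from the presentation's demos, and a real application would add grammars, error handlers, and a `filled` block.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_yes_no_dialog(form_id: str, field_name: str, prompt_text: str) -> str:
    """Return a VoiceXML document that asks a single yes/no question."""
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(root, "form", {"id": form_id})
    # type="boolean" selects VoiceXML's built-in yes/no field type.
    field = ET.SubElement(form, "field", {"name": field_name, "type": "boolean"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    return ET.tostring(root, encoding="unicode")


document = build_yes_no_dialog(
    "techfest",       # hypothetical form id
    "hear_speakers",  # hypothetical field name
    "Would you like to hear about today's speakers?",
)
print(document)
```

A voice browser would fetch a document like this, play the prompt, and listen for an answer that matches the field's grammar.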

SALT

• SALT Forum was founded on October 15, 2001
  ▫Microsoft
  ▫Cisco
  ▫Comverse
  ▫Intel
  ▫Philips
  ▫ScanSoft
• W3C work initiated in July 2002
• SALT Forum seems to have gone dead. The last press release was in 2003.
• Main concept was multimodal applications
  ▫Speechify the web, IVR, handhelds, etc…

SALT Usage

•Microsoft Speech Server 2004
  ▫Only SALT
•Microsoft Speech Server 2007
  ▫SALT and VXML

•Plugin for Internet Explorer

Key Workflow Concepts

• Workflows are a set of activities
  ▫ The workflow itself is an Activity
• Activities are the building blocks of the application
  ▫ A single unit of Reuse
  ▫ A single unit of Execution
• An Activity has associated properties, conditions, and events
• Developers can build their own Custom Activity Libraries
  ▫ Imagine your own Telerik RAD Controls, Infragistics Controls, etc… Just for VUIs
• A Workflow runs within a Host Process
  ▫ WAS
  ▫ IIS
  ▫ .EXE
  ▫ Windows Managed Services

Dialogue Flow is a Workflow

•Speech Server only supports sequential workflow development

Speech Application Development

•Define the dialogue flow
  ▫Statements, questions, answers, etc…
  ▫Other activities
•Specify possible answers (grammars)
•Record questions (prompts)
•Integrate into the back-end (Web Services)
•Deploy, test, and tune application

Developing Your Prototype

Managed Code Assembly

Tuning Applications

•Out of the box speech applications
  ▫Are not robust to real world user input
  ▫Need real data to optimize
•Trial phases required for gathering data
  ▫Wizard of Oz phase
  ▫Pilot phases

•Visual Studio Integrated Analytics and Tuning Studio tool can be used to analyze the data and find problems

Reporting in Speech Server

•Measuring application performance and server performance
  ▫Call volume
  ▫Self-service completion rates
•Sharing reporting data throughout the business
  ▫Speech Server can leverage the full SQL Server stack: Reporting Services, Analysis Services, Integration Services

Data Management – Trace Logging

• Logs
  ▫Call details
  ▫Application instrumentation
  ▫Audio and grammars
  ▫Server latencies
  ▫More…
• Saved in Speech Server log files
• Can import via Log import tool into your SQL Server Database/Farm
• Analyze via Speech Server 2007 Analytics and Tuning Studio
• Present reports via SQL Server Reporting Services

Logged Information - Prompt

•Prompt
  ▫Content
  ▫Barge-in detection
  ▫Rate/Volume
  ▫Persona

Logged Information - Response

• Input Mode
  ▫ Speech
  ▫ DTMF
• Grammar
  ▫ Content (coverage)
  ▫ Rule weights
  ▫ Pronunciations
• Confirmation Threshold
• SR configuration
  ▫ Speech Detection
  ▫ Rejection Threshold
  ▫ Silence Timeout
  ▫ Endsilence
  ▫ Decoder …
  ▫ Acoustic Models …

Voice User Interface (VUI)

•Allows for human interaction with computers through a voice/speech platform
•VUI is the interface to any speech application
•Drive to make them conversational
•Instead of browser incompatibility you have dialect incompatibility.
•Not all business processes are suited to VUIs.
  ▫Some are too complex
  ▫Sometimes automation is impossible or impractical

Grammars

•Best practice: constrain the grammar as much as possible.

•Good prompt design guides the caller to use in-grammar responses.

•Out-of-grammar (OOG) responses are handled with more explicit prompting to elicit an in-grammar response.
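The grammar points above can be sketched in a few lines of Python. This is an illustration only, not Speech Server's grammar format: the phrase table, the function names, and the prompt wording are all assumptions. The idea is that a small closed set of phrases maps to a meaning, and anything outside it triggers a more explicit prompt.

```python
from typing import Optional

# Hypothetical constrained grammar: a closed set of phrases mapped to a meaning.
YES_NO_GRAMMAR = {
    "yes": "yes",
    "yeah": "yes",
    "yes please": "yes",
    "no": "no",
    "nope": "no",
    "no thanks": "no",
}


def match_grammar(utterance: str) -> Optional[str]:
    """Return the meaning of an in-grammar utterance, or None if it is OOG."""
    return YES_NO_GRAMMAR.get(utterance.strip().lower())


def respond(utterance: str) -> str:
    """Acknowledge an in-grammar answer, or reprompt explicitly on OOG input."""
    meaning = match_grammar(utterance)
    if meaning is None:
        # Out-of-grammar: tell the caller exactly what can be said.
        return "Sorry, I didn't catch that. Please say yes or no."
    return f"Got it, you said {meaning}."


print(respond("Yeah"))
print(respond("maybe later"))
```

Keeping the phrase table small is the point: fewer competing phrases means fewer chances for the recognizer to confuse one for another.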

VUI Design Best Practices

1) Use DTMF for long numbers
2) Don’t use open ended prompts
3) Don’t repeat prompts
4) Focus on grammar accuracy
5) If natural dialogs fail, fall back to directed dialog
6) Always confirm what was recognized
7) Generate prompts based on recognition confidence scores
8) Bail out if too many errors occur
9) Keep text-to-speech output to a minimum
10) Be aware of human memory
11) “Platinum Rule”
12) Let the Caller Drive

Use DTMF for Long Numbers

•Limit spoken digits to 4 or fewer
•This rule is often broken for:
  ▫Credit Card Numbers
  ▫Social Security Numbers
  ▫Bank Account Numbers
  ▫Telephone Numbers
•DON’T Break This Rule!!!
•Remember customer privacy!
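One way to apply this rule in code is to choose the input mode from the expected number of digits. The sketch below is a hypothetical helper, not part of any Speech Server API; the only value taken from the slide is the four-digit limit.

```python
MAX_SPOKEN_DIGITS = 4  # limit stated on the slide


def input_mode_for(expected_digits: int) -> str:
    """Pick 'speech' for short numbers and 'dtmf' (keypad) for long ones."""
    return "speech" if expected_digits <= MAX_SPOKEN_DIGITS else "dtmf"


def is_valid_keypad_entry(keys: str, expected_digits: int) -> bool:
    """Check that a keypad entry has the expected length and only digits 0-9."""
    return len(keys) == expected_digits and all(k in "0123456789" for k in keys)


print(input_mode_for(4))   # a 4-digit PIN can be spoken
print(input_mode_for(16))  # a 16-digit card number should be keyed in
```

Keypad entry also helps with the privacy point: the caller does not have to say an account number aloud.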

Don’t Use Open Ended Prompts

•BAD: “Hello, thank you for calling Tulsa Techfest. May I help you?”

•BETTER: “Hello, thank you for calling Tulsa Techfest. Would you like to hear about today’s speakers?”

Don’t Repeat Prompts

•When a prompt is repeated word for word, callers tend to repeat the same response that was not understood the first time

•Provide Escalated Help
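Escalated help can be sketched as a list of prompts that get more explicit with each failed attempt. The prompt wording and the function name below are assumptions for illustration.

```python
# Hypothetical prompts, ordered from most natural to most explicit.
ESCALATING_PROMPTS = [
    "Would you like to hear about today's speakers?",
    "Sorry, I didn't get that. Please say yes or no.",
    "You can also press 1 for yes or 2 for no.",
]


def prompt_for_attempt(failed_attempts: int) -> str:
    """Return the prompt for the given number of failed attempts so far."""
    # Clamp to the valid range so extra failures reuse the most explicit prompt.
    index = max(0, min(failed_attempts, len(ESCALATING_PROMPTS) - 1))
    return ESCALATING_PROMPTS[index]


for failures in range(4):
    print(failures, prompt_for_attempt(failures))
```

Each failure changes what the caller hears, which gives them a reason to answer differently.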

Focus on Grammar Accuracy

• Spend time TUNING and REFINING your grammars
• Accuracy is IMPERATIVE
• To reduce recognition failures:
  ▫Create prompts that make it clear what the user can and should say
  ▫Test grammars with many different utterances from several people
  ▫Record incoming calls once the system is in production and use this information to continually tune the grammars.
• Watch for dialects!

If Natural Dialogs Fail, Fall Back to Directed Dialog

•Natural Dialogs are great, but they have a higher rate of failure.
•Don’t want to frustrate the user

Always Confirm What Was Recognized

•Mismatches are common
  ▫Austin/Boston
  ▫Sharp/Shark
  ▫Britney Spears/Kevin Federline

•Even for grammars with low ambiguity it’s important to confirm your recognition

•Implicit confirmation
  ▫“OK Jason, are you coming to Techfest?”

•QA Control makes it easy to provide confirmation

Generate Prompts Based on Recognition Confidence Scores

•Speech recognition errors are common
•How to handle?
  ▫Changing prompts
  ▫Falling back to directed dialogs
  ▫Transferring to operator

•Humans change their interaction based on perceived confidence, whether implicitly or explicitly

•N-Best lists are of great value here
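A minimal sketch of confidence-driven prompting follows. The two thresholds and all names are assumptions; real values have to be found by tuning against recorded calls, and Speech Server exposes its own confidence settings that are not modeled here.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed thresholds for illustration only; tune these with real call data.
HIGH_CONFIDENCE = 0.80
LOW_CONFIDENCE = 0.45


@dataclass
class Recognition:
    text: str          # what the engine thinks the caller said
    confidence: float  # score in the range 0.0 to 1.0


def choose_prompt(result: Recognition, next_question: str) -> Tuple[str, str]:
    """Return (strategy, prompt) based on the recognition confidence."""
    if result.confidence >= HIGH_CONFIDENCE:
        # Implicit confirmation: echo the value and keep the dialog moving.
        return "implicit", f"OK, {result.text}. {next_question}"
    if result.confidence >= LOW_CONFIDENCE:
        # Explicit confirmation: ask before acting on a shaky result.
        return "explicit", f"I think you said {result.text}. Is that right?"
    # Too uncertain to confirm: ask again with clearer guidance.
    return "reprompt", "Sorry, I didn't understand that."


print(choose_prompt(Recognition("Austin", 0.91), "What day are you traveling?"))
print(choose_prompt(Recognition("Austin", 0.60), "What day are you traveling?"))
print(choose_prompt(Recognition("Austin", 0.20), "What day are you traveling?"))
```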

Confidence Scores & N-Best Lists

•The recognition engine returns a confidence score along with a result
•The recognition engine can return several “guesses” of what it understood.
•You tell the engine to return up to N guesses.

Skip Lists

•Skip List is a type of N-Best processing
•Keep track of results that caller has confirmed ‘no’ to, and don’t ask again.
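A skip list can be sketched as a set of rejected results that is consulted before offering the next guess from the N-best list. The function name and the sample scores are made up for illustration.

```python
from typing import List, Optional, Set, Tuple


def next_candidate(n_best: List[Tuple[str, float]], skip_list: Set[str]) -> Optional[str]:
    """Return the highest-confidence guess the caller has not already rejected."""
    for text, _confidence in sorted(n_best, key=lambda guess: guess[1], reverse=True):
        if text not in skip_list:
            return text
    return None  # every guess was rejected; time to reprompt or transfer


# Hypothetical N-best result for one utterance.
n_best = [("Boston", 0.62), ("Austin", 0.58)]
rejected: Set[str] = set()

first = next_candidate(n_best, rejected)   # offered as "Did you say Boston?"
rejected.add(first)                        # caller answered "no"
second = next_candidate(n_best, rejected)  # now offer the runner-up
print(first, second)
```

Without the skip list, a second attempt that again scores "Boston" highest would make the system ask the same wrong question twice.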

Bail Out If Too Many Errors

•Don’t make your customer become a “0” (zero) jammer

•Transfer to a live person if they error out more than twice

•Remember, some people have speech impediments, or patterns that may not correlate well into recognition confidence.

•Find the threshold! (This takes testing)
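The bail-out rule can be sketched as a small per-call error counter. The class name is hypothetical, and the default limit of two errors is taken from the slide; the right threshold for a given application has to come from testing.

```python
class ErrorTracker:
    """Counts recognition errors in one call and decides when to bail out."""

    def __init__(self, max_errors: int = 2) -> None:
        self.max_errors = max_errors  # errors tolerated before transferring
        self.errors = 0

    def record_error(self) -> str:
        """Register one error and return the next action for the dialog."""
        self.errors += 1
        # "More than twice" means the third error triggers the transfer.
        return "transfer" if self.errors > self.max_errors else "reprompt"


tracker = ErrorTracker()
actions = [tracker.record_error() for _ in range(3)]
print(actions)
```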

Keep TTS Output to a Minimum

•Does not sound professional
•Hire a voice talent. The payoff will justify the upfront cost
•Can be used as a fallback for data or prompts that need to be dynamic

Be Aware of Human Memory

•Make lists short
•No more than 5 items
•Present large lists in chunks
•Make the prompts short
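Chunking a long list into groups of at most five can be sketched as follows. The function names, the "say more" wording, and the sample track names are assumptions for illustration.

```python
from typing import List

MAX_ITEMS_PER_PROMPT = 5  # limit stated on the slide


def chunk_items(items: List[str], size: int = MAX_ITEMS_PER_PROMPT) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[start:start + size] for start in range(0, len(items), size)]


def menu_prompts(items: List[str], size: int = MAX_ITEMS_PER_PROMPT) -> List[str]:
    """Build one short prompt per chunk, offering 'more' on all but the last."""
    chunks = chunk_items(items, size)
    prompts = []
    for index, chunk in enumerate(chunks):
        text = ", ".join(chunk) + "."
        if index < len(chunks) - 1:
            text += " Say 'more' to hear more options."
        prompts.append(text)
    return prompts


tracks = [f"Track {number}" for number in range(1, 8)]  # 7 hypothetical items
for prompt in menu_prompts(tracks):
    print(prompt)
```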

Platinum Rule

•Treat users as they want to be treated, not how you want to be treated

•Step into their shoes
•Use vocabulary they understand

Let The Caller Drive

•Provide instant gratification (lets the caller get in a zone, and they enjoy the experience due to small successes)

•Only ask for what you need, not everything at once.

VUI Design is a Science

•Design before development

•Wizard of Oz Testing

•Find balance between business requirements and the caller experience

•Run usability trials on test subjects to validate your design

•Use a pilot to trial the application. If caller behavior is not as expected, make adjustments.

Demos

Additional Information

•http://www.microsoft.com/speech
•http://www.microsoft.com/uc
•http://www.gotspeech.net
•http://www.nuance.com
•https://www.intervoice.com/
•http://www.tellme.com/
•http://www.vuidesign.org/

Further Resources

•My Blog
  ▫http://www.okcodemonkey.com
•Linkedin
  ▫http://www.linkedin.com/in/okcodemonkey
•Bartlesville .NET User Group
  ▫http://www.bdnug.com
•Twitter
  ▫http://twitter.com/okcodemonkey
•Email
  ▫okcodemonkey@gmail.com

Key Terms

Voice Browser

• “Web Browser” that presents an IVR VUI to the user
• Provides interface to the PSTN or a PBX
• Works with Voice Dialogues (where web browsers work with HTML/XHTML)
• Presents information aurally via:
  ▫Text-To-Speech
  ▫Prerecorded prompts
• Obtains information through:
  ▫Speech Recognition
  ▫DTMF detection

Speech Recognition

•Converts spoken words to machine readable input

DTMF (Dual-tone Multi-Frequency)

•Used for telephone signaling over the line in the voice-frequency band to the call switching center.
•Standardized by ITU-T Recommendation Q.23
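"Dual-tone" means each key is signaled as one low-group (row) frequency plus one high-group (column) frequency. The Python sketch below looks up that pair for a key using the standard DTMF keypad layout; the function name is an assumption, and no audio is generated.

```python
from typing import Tuple

ROW_HZ = (697, 770, 852, 941)        # low-group frequencies, one per keypad row
COLUMN_HZ = (1209, 1336, 1477, 1633)  # high-group frequencies, one per column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (row, column) frequency pair in Hz for one keypad key."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row_index, row in enumerate(KEYPAD):
        column_index = row.find(key.upper())
        if column_index != -1:
            return ROW_HZ[row_index], COLUMN_HZ[column_index]
    raise ValueError(f"not a DTMF key: {key!r}")


for key in "150#":
    print(key, dtmf_frequencies(key))
```

The fourth column (A through D) is part of the scheme but does not appear on ordinary telephone keypads.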

Text-To-Speech (Speech Synthesis)

•Artificial production of human speech
•Computer used is called the speech synthesizer
•Can be implemented in software or hardware
•Converts normal language text into speech

PSTN (Public Switched Telephone Network)

• Network of the world’s public circuit-switched telephone networks
• Similar to the way the Internet is the network of the world’s public IP-based packet-switched networks.
• Originally a network of fixed-line analog telephone systems
• Now almost completely digital and includes mobile phones
• Governed by technical standards created by the ITU-T, and uses E.163/E.164 addresses (telephone numbers)

ITU-T (International Telecommunication Union Standardization Sector)

•Coordinates standards for telecommunications on behalf of the International Telecommunication Union
•Based in Geneva, Switzerland
•Original work dates back to 1865, with the birth of the International Telegraph Union
•Became a United Nations specialized agency in 1947

ITU (International Telecommunication Union)

•Established to standardize and regulate international radio and telecommunications.
•Founded as the International Telegraph Union on May 17, 1865 in Paris
•Main tasks include standardization, allocation of the radio spectrum, and organizing interconnection agreements between countries

PBX (Private Branch Exchange)

•Is a telephone exchange that serves a particular business or office, as opposed to one that a common carrier or telephone company operates for many businesses
