

Nuance ASR Option

User Guide

Aspect Customer Self-Service v7.0


© Copyright 2002–2004 by Aspect Communications Corporation. All rights reserved.

Aspect Communications Corporation is headquartered in San Jose, California. Aspect Communications Limited is headquartered in Stockley Park, Uxbridge, Middlesex, United Kingdom. Aspect Communications B.V. is headquartered in Hoofddorp, the Netherlands. Aspect Communications GmbH is headquartered in Neu-Isenburg, Germany. Aspect Communications (S) Pte Ltd. is headquartered in Singapore. Aspect Communications (HK) Ltd. is headquartered in Wanchai, Hong Kong. Aspect Communications Pty Ltd. is headquartered in Sydney, Australia. Aspect Communications Japan, Ltd., is headquartered in Tokyo, Japan. Aspect Communications is headquartered in Paris, France. Aspect Communications is headquartered in Markham, Ontario, Canada.

Aspect and the Aspect logo are trademarks and/or service marks of Aspect Communications Corporation in the United States and/or other countries. All other product or service names mentioned in this Guide may be trademarks of the companies with which they are associated.

User Guide for the Nuance ASR Option April 7, 2004


User Guide for the Nuance ASR Option iii

Contents

About This Guide . . . vii
    Who Should Read This Guide . . . vii
    How This Guide Is Organized . . . viii
    Where to Find Additional Information . . . viii
    Technical Support . . . x

Chapter 1: Overview of the CSS Nuance ASR Option . . . 11
    About Nuance ASR v8.0 . . . 11
    About Language Support . . . 12
    About Integrated CSS Features . . . 12
    About CSS Nuance ASR System and Software Architecture . . . 14
        Required System Components . . . 15
        Optional System Components . . . 17
        Relocating the Software . . . 17
            Relocating the Runtime Server Software . . . 17
            Relocating the Nuance Software Subsystem . . . 17
            Relocating the Telephony Server Software Subsystem . . . 18

Chapter 2: Basic Speech Recognition . . . 19
    About Nuance Grammars and Packages . . . 19
        Specifying a Grammar . . . 19
        Compiling a Grammar . . . 20
    About Nuance Parameters and Contexts . . . 21
        About Parameters . . . 21
        Sharing Parameters . . . 22
        Setting Parameters at Runtime . . . 22
            Using Applications to Set Parameter Values . . . 22
            Using Contexts to Set Parameter Values . . . 23
    Specifying Recognition Grammars . . . 23
        Using Token Names . . . 23
        Using the VINP Cell . . . 24
        Creating a Sample Application . . . 25
        Developing CSS Applications for Different Languages . . . 26
    Specifying a Recognition Context . . . 27
    About Resource Management for Recognizers . . . 28
        Allocating and Deallocating Recognizers . . . 28
        Configuring Recognizers . . . 29
        Using Sparse Recognition Resource Configuration . . . 29
    About Speech Recognition Cells . . . 30
        VINP Cell . . . 30
        LISTEN Cell . . . 31
            About the LISTEN Cell Input Parameters . . . 31
            About the Information Returned by the LISTEN Cell . . . 35
        About the Nuance Parameters . . . 36
        NNBEST Cell . . . 37

Chapter 3: Natural Language Understanding . . . 39
    How Natural Language Understanding Works . . . 39
    CSS Cells Used to Access Natural Language Information . . . 40
    Accessing Semantic Information through the NSLOT Cell . . . 41
        Using the NSLOT Cell . . . 41
        Retrieving Values From Slots Associated With an Optional Component . . . 42
    About Ambiguous Grammar Support . . . 42
    Accessing Advanced Slot-Type Values . . . 43
    Accessing Per-Slot Confidences . . . 44

Chapter 4: Dynamic Grammars . . . 45
    About Dynamic Grammars . . . 45
        Updating Dynamic Grammars . . . 46
        Creating and Using Dynamic Grammars . . . 46
        Loading Dynamic Grammars . . . 46
        Using the Nuance Interfaces . . . 47
    About Recognition Packages . . . 47
    Preparing to Use a Dynamic Grammar . . . 47
    Implementing a Dynamic Grammar . . . 48

Chapter 5: Speaker Verification . . . 51
    How Speaker Verification Works . . . 51
        About the Verifier . . . 51
        About Online Adaptation . . . 52
        About Variable-Length and Fixed-Length Verification . . . 52
        Using Verification in an Application . . . 53
        Determining Validity . . . 54
        About Verifier Processing Modes . . . 54
        Storing Nuance Voiceprints . . . 54
    About Speaker Verification Databases . . . 55
    About Speaker Verification Cells . . . 56
        SVT Cell . . . 56
        SV Cell . . . 57
        SET Cell . . . 57
        VINP and LISTEN Cells . . . 58
        GETSV Cell . . . 58
        GET Cell . . . 59

Chapter 6: CSS Nuance ASR Advanced Features . . . 61
    About Utterance Recording . . . 61
        Setting Up Utterances for Application Playback . . . 62
        About Utterance Processing by CSS Applications . . . 62
            Playing Back Utterance Files . . . 63
            Creating Utterances . . . 63
        Setting Up Utterance Recording for Call Logging Purposes . . . 63
    Formatted Text Recognition Results . . . 64
    N-Best List Size Considerations . . . 65
        Using a Nuance Recognition Context to Set the N-Best Size . . . 65
        Using the SET Cell to Dynamically Set the N-Best List Size . . . 65
        About Message Size Limitations . . . 66

Chapter 7: Configuring the Nuance ASR Option . . . 67
    Configuring Nuance on the Telephony Server . . . 67
    Configuring Nuance on the Recognition Server . . . 69
    Configuring Nuance through the SCI . . . 70
    Nuance Configuration Tips . . . 73

Chapter 8: Using Nuance ASR with VXML . . . 75
    Grammar Support . . . 75
        Just-In-Time Grammars . . . 75
    Parameter Settings . . . 76
    Grammar Files . . . 76
    Limitations . . . 76

Chapter 9: Nuance Call Logging and CSS . . . 77
    Enabling Nuance Call Logging . . . 77
    Additional Logging Considerations . . . 78

Index . . . 81


About This Guide

This guide describes how to write an Aspect Customer Self-Service (CSS) application that is speech-enabled using automatic speech recognition (ASR) software from Nuance Communications. It also describes the architecture of the Aspect and Nuance systems and how to configure them in order to run your application.

Who Should Read This Guide

This guide is written for anyone involved in the development and deployment of an Aspect CSS application that is to be speech-enabled using Nuance ASR technology. This includes project managers, product architects, voice-interface (dialog) designers, application developers, and system site planners and administrators.

Those who use this manual should have a basic familiarity with ASR technology in general, the Nuance ASR system in particular, and the Aspect CSS system. This manual is not intended as an introduction or complete reference to any of these topics.


How This Guide Is Organized

This guide is organized as follows:

■ Chapter 1, Overview of the CSS Nuance ASR Option, provides an overview of the Aspect CSS Nuance ASR option. It also describes the Aspect CSS and Nuance ASR system architecture.

■ Chapter 2, Basic Speech Recognition, provides the information you need to know in order to develop a CSS application that performs basic speech recognition processing.

■ Chapter 3, Natural Language Understanding, provides the information you need to know in order to access the Natural Language (NL) understanding capabilities of the Nuance system.

■ Chapter 4, Dynamic Grammars, provides the information you need to know in order to use the dynamic grammar and voice enrollment capabilities of the Nuance system.

■ Chapter 5, Speaker Verification, describes how to use the Nuance ASR software with CSS to add speaker verification to a CSS application.

■ Chapter 6, CSS Nuance ASR Advanced Features, provides information about the advanced features of the CSS Nuance ASR system.

■ Chapter 7, Configuring the Nuance ASR Option, describes how to configure the Nuance ASR system.

■ Chapter 8, Using Nuance ASR with VXML, describes some issues to consider when developing a VXML script to use with the Nuance ASR option.

■ Chapter 9, Nuance Call Logging and CSS, describes how to enable the Nuance call logging facility for your CSS application.

Where to Find Additional Information

The following additional documents related to the Aspect CSS system are available:

■ For information about installing Aspect CSS core software and options, and upgrading from previous versions of Aspect CSS, see the Aspect Customer Self-Service v7.0 Installation and Upgrade Guide.

■ For information about administering the Telephony Server, see the Aspect Customer Self-Service v7.0 Service Console Interface Guide for the Telephony Server.

■ For information about developing CSS applications, see the Aspect Customer Self-Service v7.0 Application Developer Guide.


■ For information about administering the Runtime Server Platform, see the Aspect Customer Self-Service v7.0 System Administration Guide for the Runtime Server.

■ For information about specific RSP cells and parameters, see the Aspect Customer Self-Service v7.0 Cell Catalog Reference Guide or the ASCII on-line Cell Help system (available within RSP).

■ For information about installing the ODBC database access options, see the:

– Aspect Customer Self-Service v7.0 Installation Guide for ODBC Access, Windows Edition

– Aspect Customer Self-Service v7.0 Installation Guide for ODBC Access, UNIX Edition

■ For information about fax technologies, see the Aspect Customer Self-Service v7.0 User Guide for the Fax Option.

■ For information about using the Nuance ASR, see the Aspect Customer Self-Service v7.0 User Guide for the Automatic Speech Recognition Option (Nuance).

■ For information about using ScanSoft OSR, see the Aspect Customer Self-Service v7.0 User Guide for the Automatic Speech Recognition Option (ScanSoft).

■ For information about using Text-to-Speech options in a CSS application, see the Aspect Customer Self-Service v7.0 User Guide for the Text-To-Speech Option.

■ For information about the Aspect CSS SNMP option, see the Aspect Customer Self-Service v7.0 User Guide for SNMP.

■ For information about using the Data-Driven Languages feature to customize the CSS user interface to display information in supported languages, see the Aspect Customer Self-Service v7.0 User Guide for Data-Driven Languages.

■ For information about using the stock applications included with the Aspect CSS software, see the Aspect Customer Self-Service v6.2 User Guide for Canned Applications. These canned applications are not updated for the CSS v7.0 release.

For the latest versions of the Aspect documentation, access the following Web site:

http://www.aspect.com/services/login.cfm

You must have a valid Aspect support contract to log on to this site.

In addition to the Aspect documentation, Nuance documents are available on your system after installation. To locate the documentation, go to the Windows Start menu and select Nuance→v8.0.0→Documentation→Product Documentation.


Technical Support

Aspect provides technical support under the Aspect Support Agreement. If you have a question or problem that you are unable to resolve by reading the manual or online Help, call your Aspect representative.

When you call, have your documentation handy, and be prepared to provide the following information:

■ Your name and address

■ Your Aspect site number

■ The version of the Aspect software you are running

■ A concise, clear description of the problem, including any error messages that appeared, and of the actions you were taking when you encountered the problem.


Chapter 1: Overview of the CSS Nuance ASR Option

This chapter describes the Nuance ASR v8.0 option, the languages and features it supports, and the system and software architecture.

This chapter contains the following topics:

■ About Nuance ASR v8.0

■ About Language Support

■ About Integrated CSS Features

■ About CSS Nuance ASR System and Software Architecture

About Nuance ASR v8.0

As integrated with Aspect's CSS system, the Nuance ASR system provides the advanced speech recognition, speaker verification, and language understanding capabilities required to automate over-the-phone transactions. These applications can supplement call center agents or provide other automated services available from any telephone.

For a description of the capabilities and features that Nuance ASR v8.0 provides, see the Nuance documentation.


About Language Support

Nuance v8.0 ships with the acoustic models needed to support the following languages:

■ Arabic (Jordan)

■ Cantonese Chinese

■ Czech

■ Danish

■ Dutch

■ English - Australian/New Zealand

■ English - Singapore

■ English - South African

■ English - U.K.

■ English - U.S./Canadian

■ French - Canadian

■ French - European

■ German - German/Austrian

■ German - Swiss

■ Greek

■ Hebrew

■ Italian

■ Japanese

■ Korean

■ Mandarin Chinese (China)

■ Mandarin Chinese (Taiwan)

■ Norwegian

■ Portuguese - Brazilian

■ Spanish - European

■ Spanish - Latin American

■ Swedish

■ Turkish

In addition, custom multilingual acoustic models can be generated by Nuance on request.

Note: The above list reflects the officially supported languages that shipped with the Nuance v8.0 distribution. Additional language models that are completely compatible with the v8.0 release may have been made available by Nuance since this release. Contact your sales representative if additional language support is needed for your application.

About Integrated CSS Features

The integration of Nuance ASR v8.0 with the CSS v7.0 system provides the following capabilities and features:

■ Access to all primary Nuance ASR features.

CSS systems can be configured and CSS applications written to provide access to many of the features of the Nuance ASR system.


■ Dynamic recognition resource management.

The CSS system delays allocation of a Nuance recognizer to a call until the application makes a recognition request, at which time a recognizer that supports the Nuance grammar specified in the request is selected. The selected recognizer normally remains allocated until the end of the call, but an application whose use of speech recognition is localized to a particular portion of the call can explicitly deallocate the recognizer when it is no longer needed. An application that does not use speech recognition for the duration of the call can use a sparse recognition resource configuration (one in which fewer recognizers than telephone channels are configured). In such configurations, active calls dynamically compete for available recognizers.

If you can obtain accurate statistics on call volume and the percentage of time your application spends on average in its recognition-enabled portion, implementing a sparse recognition resource configuration can result in significant cost savings (as compared to a configuration which provides a recognizer for every telephone channel). A proper traffic engineering analysis can enable you to realize this cost savings without sacrificing recognizer availability for the significant majority of recognition requests.
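The sizing trade-off described above is the classic trunk-provisioning problem, and it can be explored with a standard Erlang-B calculation. The sketch below is illustrative only and is not part of the CSS product: given a recognizer pool size and the offered recognition load in Erlangs (channels × fraction of call time spent in the recognition-enabled portion), it estimates the fraction of recognition requests that find every recognizer busy.

```python
def erlang_b(recognizers: int, erlangs: float) -> float:
    """Blocking probability for a pool of `recognizers` offered `erlangs`
    of load, computed with the numerically stable Erlang-B recurrence."""
    b = 1.0  # blocking probability with zero servers
    for n in range(1, recognizers + 1):
        b = (erlangs * b) / (n + erlangs * b)
    return b

# Example: 48 telephone channels, each spending 25% of its call time in the
# recognition-enabled portion of the application -> 12 Erlangs offered load.
load = 48 * 0.25
for pool in (12, 16, 20, 24):
    print(f"{pool} recognizers: {erlang_b(pool, load):.1%} of requests blocked")
```

Running the loop shows the blocking probability falling as the pool grows, which is the trade-off a traffic engineering analysis quantifies before you commit to a sparse configuration.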

■ Dynamic active grammar selection.

The CSS application tells the Nuance system which grammar to use for each recognition request, resulting in the caller's utterance being recognized and interpreted in the context of that grammar.

Note: Dynamic active grammar selection is only available if a dynamic grammar already exists (from a previous version of CSS). The development of new dynamic grammars is not supported in CSS v7.0.

■ Integrated speech and DTMF input.

The CSS system allows callers to use DTMF as an alternative to voice input at any point in the application where speech input is accepted. This can be useful if the active grammar consists entirely of digits or if the application's interface design needs to allow DTMF keypad spelling as an alternative to speech input (for example, spelling a stock symbol name using the telephone keypad as opposed to speaking it).

■ Recorded utterance playback.

Some applications require the ability to play a caller's spoken phrase back to the caller. Examples include agent whisper or the system access portion of an application that can play back a password hint that the caller entered during a previous system enrollment phase. CSS supports playing back utterances recorded by either the recognizer or the Telephony Server's voice subsystem.

■ Call and prompt management.

The CSS system automatically hands off incoming calls to specified applications, schedules outbound applications, handles detection of caller hang-up during a call, and plays standard audio or TTS prompts to the caller, as specified by the application.


■ Access to other services provided by the CSS telephony or third-party servers.

The CSS system provides fax, database, terminal protocol-based remote host communications (screen scrape and pop), and CTI services. The CSS system also supports a generic interface for extending the system to access non-standard services running on any host on the network.

■ Direct (grammar-integrated) DTMF support.

If you set Abort Recognition upon DTMF Input to No and you are using a DTMF-enabled grammar, the recognizer can interpret either DTMF or speech input.

■ ODBC database support.

Nuance v8.0 supports this only on the MS Windows platform.

About CSS Nuance ASR System and Software Architecture

Both the Aspect CSS and Nuance ASR systems are distributed, client-server systems. Figure 1 illustrates a typical system architecture. For a description of the system components, see Required System Components on page 15. Optional system components are described in Optional System Components on page 17.

Note: The systems must be connected by a high-speed Local Area Network (LAN). The details of the LAN topology depend on the size and nature of your application.

For additional information about Nuance system architecture, see the Nuance documentation.


Figure 1. System Architecture

Required System Components

Depending on your call volume and the recognition tasks that your application performs, you may need multiple instances of one or more of the components described in this section.

The system architecture components shown in Figure 1 include:

Runtime Server — executes the call flow logic defined in the CSS application. It initiates requests of the Telephony Server and receives replies indicating the results. It can also initiate requests to third-party database or other remote server systems.

Telephony Server — performs all core telephony functions, including call management (answering and hanging up the phone line), playing prompts, recording caller speech, detecting and generating DTMF, and sending and receiving faxes (when configured to do so).

[Figure 1 shows: a Runtime Server running the Aspect CSS application; a Telephony Server (VRU) containing an ISDN, T1, or E1 phone line card, a Dialogic JCT card, and the RecClient running on the host CPU; the Nuance Resource Manager and Nuance License Manager; and RecServer 1, with optional RecServer 2 and RecServer 3.]


TTS play and ASR requests are generally handled by remote server systems. The remote server sends the generated audio (for TTS) or recognition results (for ASR) back to the Telephony Server. For TTS, the audio is streamed (forwarded in real time) to the voice board in the Telephony Server and then played to the caller. For ASR, the recognition results are sent back to the requesting Runtime Server for parsing and analysis by the CSS application.

Nuance Audio Provider—runs on the Telephony Server. It receives the real-time audio data from the Telephony Server's telephone line interface board, performs preprocessing (speech start and end detection) on it, and then forwards the preprocessed audio to the Nuance recognition client process.

Nuance Recognition Client (RecClient)—runs on the Telephony Server host CPU. It reads the audio data from the Nuance audio provider, streams it to one of the recognition servers on the network, receives the recognition results from the server, and sends those results back to the Telephony Server software subsystem.

Nuance Recognition Server (RecServer)—runs on the Nuance recognition server system. It performs the core recognition, natural language interpretation, and speaker verification functions in the Nuance system. It receives the real-time speech data from the RecClients and sends back the recognition analysis results.

A single RecServer process can handle multiple recognition requests concurrently without introducing additional delays in the system. The exact number of concurrent requests that can be handled depends on several factors, including the processing power and memory resources of the RecServer system, the size and complexity of the configured grammar, and the acceptable response time latency for your application. See the Nuance documentation for more information.

Nuance Compilation Server—runs on the Nuance RecServer. This process enables speaker-independent dynamic grammars. You must start at least one compilation server before you call any functions that create or edit dynamic grammars.

Nuance Resource Manager—runs on the Telephony Server or Nuance RecServer system. In configurations with multiple RecServers, it performs real-time load balancing across those RecServers by intelligently distributing recognition requests among them.

More specifically, when a RecClient needs to initiate a recognition task, it forwards a request for the task to the resource manager. Depending on the grammar specified and the current load on each RecServer system, the resource manager picks a RecServer to handle the task, and replies to the RecClient with that choice. The RecClient establishes a session with the chosen RecServer and starts streaming the speech data to it. In the default configuration, the session is terminated when the recognition results are received from the RecServer or when the caller hangs up. In this way, a new RecServer is selected each time a recognition request is made of the system, helping to maintain a more even load balance across the pool of configured RecServers.
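The selection step described above can be pictured with a small sketch (illustrative only; the actual Nuance resource manager protocol and its data structures are not documented here). Grammar support filters the candidate RecServers, and the current session load picks among them:

```python
from dataclasses import dataclass

@dataclass
class RecServer:
    name: str
    grammars: set          # grammar packages this RecServer has loaded
    active_sessions: int = 0

def pick_recserver(servers, grammar):
    """Choose the least-loaded RecServer that supports the requested grammar."""
    candidates = [s for s in servers if grammar in s.grammars]
    if not candidates:
        raise RuntimeError(f"no RecServer supports grammar {grammar!r}")
    choice = min(candidates, key=lambda s: s.active_sessions)
    choice.active_sessions += 1   # session lasts until results return or hang-up
    return choice

pool = [RecServer("rs1", {"banking", "digits"}, 3),
        RecServer("rs2", {"digits"}, 1)]
print(pick_recserver(pool, "digits").name)   # rs2 (fewer active sessions)
```

Because a fresh selection is made per recognition request rather than per call, load stays balanced even when individual calls differ widely in how much recognition they perform.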


Nuance License Manager—runs on the Telephony Server or Nuance RecServer system. It enforces the Nuance product licensing policies. The RecClient and RecServer systems validate their right to execute as configured by checking out the appropriate number and type of product licenses at system start-up time.

Optional System Components

The following components may optionally be included in the system, depending on the system configuration and your application:

Oracle database server—stores Nuance speaker verification voice prints and dynamic grammars for CSS applications that use these features. Each caller's voice print is stored as a single record in the database. The RecServer stores and retrieves these records to and from the database, as needed.

NFS or other high-performance file server—can be configured to provide a single location for storing the Nuance grammar files, recorded utterance files, the Nuance software itself, and any other application-specific data files that are common and need to be shared across one or more of the above systems. Configure each system that needs access to these files as a client of the file server. References to the common files are transparently translated to accesses to those files on the file server.

Relocating the Software

Relocating the Runtime Server Software

The Runtime Server software subsystem is fully relocatable, which means that you can install the collection of files and subdirectories contained in the subsystem under any directory. By default, the Runtime Server subsystem installs the software under /home/aspect, creates an "aspect" user account on the Runtime Server system, and sets the UNIX shell environment variable VOICE_HOME to this path when a user logs in under the aspect account.

Internally, the VOICE_HOME environment variable must be set correctly for the component processes of the Runtime Server to run. In UNIX shell notation, $VOICE_HOME refers to the value of the VOICE_HOME environment variable; with the default installation, $VOICE_HOME is /home/aspect.

Relocating the Nuance Software Subsystem

The Nuance software subsystem is fully relocatable. The default installation procedure for the Nuance v8.0.0 SDK on the Telephony Server installs the software under C:\Nuance\v8.0.0\. Depending on the details of your configuration, it may be installed under some other directory on your Nuance recognition server system.


Internally, in order for the component processes and shell scripts of the Nuance subsystem to run, the environment variable NUANCE must be set to the installation's base directory. The Nuance SDK also includes a Windows environment initialization script, NuanceVar.bat, that must be executed before any other Nuance programs or scripts can run. The script is located in the path C:\Nuance\v8.0.0\bin.

Relocating the Telephony Server Software Subsystem

The Telephony Server software is relocatable.


Chapter 2: Basic Speech Recognition

This chapter describes how to develop a CSS application that performs basic speech recognition processing. It includes basic information about Nuance grammars and parameters and how they are specified in a CSS application, and about how recognizer resource management works in CSS. It also describes the CSS application programming interfaces used to perform a basic Nuance speech recognition.

This chapter includes the following topics:

■ About Nuance Grammars and Packages

■ About Nuance Parameters and Contexts

■ Specifying Recognition Grammars

■ Specifying a Recognition Context

■ About Resource Management for Recognizers

■ About Speech Recognition Cells

About Nuance Grammars and Packages

In any telephone-based speech recognition application, the domain of words, phrases, or sentences that callers are allowed to speak must be constrained. These constraints are defined in a set of application-specific grammars.

Specifying a Grammar

The Nuance system supports a sophisticated syntax for specifying a grammar, which Nuance calls the Grammar Specification Language (GSL). GSL supports defining a hierarchy of grammars in which only the top-level grammars can be set as the active grammar by an application. By definition, top-level grammar names always begin with a period, which is part of the name. Subgrammars can be defined and referenced by the top-level grammars and other subgrammars, as needed. This hierarchical design supports modularizing large or complex grammars, which facilitates comprehension and common grammar re-use.

The Nuance implementation collects a set of one or more grammars into a grammar file. The grammar file is a simple text file that defines the grammars needed by your application. A grammar file name always ends in the extension .grammar. A grammar file can be created manually by using a text editor, or by using the Nuance Grammar Builder (NGB) utility.

Grammar names must be unique, both within a single grammar file and across all grammar files (if you create more than one).

Compiling a Grammar

Before it can be used by the Nuance recognition system, the textual representation of your grammars must be compiled into a form that is more readily usable by the recognition system. This compilation process is performed by the nuance-compile utility, which takes as its primary inputs your grammar file, a language-specific acoustic model set, and a set of one or more language-specific phonetic dictionaries.

The output of the grammar compilation process is called a recognition package, which contains the information necessary to configure the Nuance system to support your application. The recognition package is the basic unit of configuration in the Nuance ASR system.

Compiling a grammar file named package_name.grammar (where package_name can be any name you choose) generates a package named package_name. Physically, the package is implemented as a directory containing a number of files that are used by the Nuance system at run-time. Compiling the package_name.grammar file creates the sub-directory package_name located in the same directory as the package_name.grammar file, and containing the files generated by or copied to this directory by nuance-compile.
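As a sketch, compiling the digits example might look like the following at a command prompt on the Telephony Server. The paths and the English.America master-package argument are assumptions for illustration; see the Nuance compiler documentation for the exact invocation on your installation.

```shell
REM Initialize the Nuance environment first (path is the default install).
C:\Nuance\v8.0.0\bin\NuanceVar.bat

REM Compile the grammar file against a language-specific master package
REM (English.America here is illustrative).
cd %NUANCE%\sample-packages
nuance-compile digits.grammar English.America

REM Result: a digits\ package directory next to digits.grammar
```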

For example, the Nuance SDK includes an example grammar file digits.grammar, which is installed on the Telephony Server in the path %NUANCE%\sample-packages\digits.grammar. This file contains the following text, which defines a single, very simple grammar named ".SENTENCE" that recognizes unconstrained digit strings:

;
; Sample GSL file for a connected digit recognizer.
;
; Examples of what you can say:
; (Pausing between words is not necessary.)
;
;; "One two three four five."
;; "Three two seven, oh six nine four."
;

.SENTENCE +[one two three four five six seven eight nine zero oh]

A pre-compiled version of this grammar file is included in the Nuance SDK, providing the digits recognition package. After the Nuance SDK installation, this package is located on the Telephony Server in the directory %NUANCE%\sample-packages\digits.

The process of developing a set of grammars for your application is a major task involving a number of steps. See the Nuance documentation for information about the complete capabilities of the Nuance GSL and the details involved in developing application grammars.

About Nuance Parameters and Contexts

This section describes the use of Nuance parameters and contexts.

About Parameters

The Nuance system contains numerous parameters that are used to specify characteristics of the recognition algorithm. Examples include the parameters rec.DoNBest, which specifies that N-Best processing is to be performed, and rec.NumNBest, which sets the value for N.

There are also a number of parameters that are used to specify the Nuance system configuration. Examples include the parameters lm.Addresses, which specifies the location (TCP port and hostname) of the Nuance license manager, and rm.Addresses, which specifies the location of the resource manager.

In the CSS Nuance ASR system, you can assign values to parameters in one of three ways:

■ As entries in the site-wide configuration file (%NUANCE%\data\nuance-resources.site)

■ As elements of a Nuance context defined in a grammar file

■ By the client (CSS) application

Parameters that are specific to a particular recognition package are defined in a configuration file associated with the package. The configuration file is named package-name.nuance-resources (where package-name is the same name you used for the grammar) and is located in the same directory as the grammar file (package-name.grammar) for the package.


When the grammar is compiled, nuance-compile copies the contents of package-name.nuance-resources into the file package-name\nuance-resources in the package's directory. This directory was generated by nuance-compile, as described in Compiling a Grammar on page 20.

The nuance-resources file typically contains parameters that are used by the Nuance RecClient, RecServer, and Resource Manager processes, and must be visible to all those processes.

Sharing Parameters

If your application uses more than one recognition package, you can place the parameter definitions that are common across all the packages in the file %NUANCE%\data\nuance-resources.site. The environment variable %NUANCE% defaults to C:\Nuance\v8.0.0\ for Nuance v8.0.0.

In addition, if you have an NFS or other shared file server in your configuration, and the Nuance subsystem resides on this server, the parameters defined in the nuance-resources.site file can be shared across all the systems in your configuration.

The parameters that are defined in the nuance-resources.site file apply to all packages (and are available to all Nuance processes) in the configuration. The parameters that are defined in a package-specific nuance-resources file apply to all grammars in that package.
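The override order described above (system defaults, then the site-wide nuance-resources.site file, then a package's own nuance-resources file) can be modeled as layered lookups. This is only an illustration of the precedence, not Nuance code, and the parameter values shown are made up.

```python
from collections import ChainMap

# Earlier maps win: package settings override site settings, which
# override system defaults (all values here are illustrative).
system_defaults  = {"rec.Pruning": 1000, "rec.DoNBest": 0}
site_settings    = {"lm.Addresses": "licmgr-host:8470"}  # nuance-resources.site
package_settings = {"rec.Pruning": 950}                  # package nuance-resources

params = ChainMap(package_settings, site_settings, system_defaults)

print(params["rec.Pruning"])   # 950 -- the package-level value wins
print(params["rec.DoNBest"])   # 0   -- falls through to the system default
```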

Setting Parameters at Runtime

The need may arise to use a parameter value for one grammar in a package that differs from the value defined in the nuance-resources file for that package, from the default inherited from the nuance-resources.site file, or from the Nuance system default. In these cases, you can set the parameter values by:

■ Using Applications to Set Parameter Values

■ Using Contexts to Set Parameter Values

Using Applications to Set Parameter Values

You can use a CSS application to directly set the value of parameters at run-time, before the recognition task is initiated. The Nuance system provides client interfaces that enable an application to directly set a parameter value, provided the parameter can be set at run-time.

Depending on which parameter is set and the application's own needs, the application may need to reset the parameter to its original value after the recognition task is completed. The application may need to query the Nuance system for the current parameter value before changing it.


CSS provides access to the Nuance interfaces for setting and retrieving Nuance parameter values through the SET and GET cells. For more information about these cells, see About Speech Recognition Cells on page 30.

Using Contexts to Set Parameter Values

You can also use a recognition context to set grammar-specific parameter values. Contexts are defined in the package's grammar file, and specify the name of the grammar to use along with a list of one or more parameter settings.

At run-time, when a recognition task is initiated, the application specifies the context name instead of a grammar name. The Nuance system temporarily sets all the parameters listed in the context to their specified values, performs the requested task, and then restores the parameters to their previous values.

For example, the following lines show a sample context named YesNo, which can be used to override the default package values for the grammar named “.YESNO”:

YesNo
{
    Grammar .YESNO
    ep.EndSeconds=0.3
    rec.Pruning=950
}

Note: The context name does not start with a period.

For additional information about Nuance recognition contexts, see the Nuance documentation.
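The set-and-restore behavior of a recognition context resembles a scoped override. The following Python analogy (not Nuance code; names are hypothetical) shows the semantics: overrides apply only while the task runs, then the previous values come back.

```python
from contextlib import contextmanager

# Illustrative parameter store with made-up current values.
params = {"ep.EndSeconds": 0.5, "rec.Pruning": 1200}

@contextmanager
def recognition_context(overrides):
    # Save the current values, apply the context's overrides, and
    # restore the originals when the recognition task finishes.
    saved = {k: params[k] for k in overrides}
    params.update(overrides)
    try:
        yield
    finally:
        params.update(saved)

with recognition_context({"ep.EndSeconds": 0.3, "rec.Pruning": 950}):
    in_context = dict(params)   # values in effect during the recognition
after = dict(params)            # originals restored afterwards
```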

Specifying Recognition Grammars

In a typical CSS application, you use the VINP cell to specify the Nuance grammar. You can also use the LISTEN cell. For information about using the LISTEN cell, see About Speech Recognition Cells on page 30.

Using Token Names

For historical reasons, you cannot specify the Nuance grammar name directly in the VINP cell. Instead, you specify a pair of token names. At run-time, this pair is mapped to an integer (in the range 0-255) that is stored in the recognition command sent from the Runtime Server to the Telephony Server. The Telephony Server then maps this integer to the actual Nuance grammar name, which is passed as an argument to the Nuance system API call that initiates the requested recognition. The integer contained in the CSS command is referred to as the CSS vocabulary id, or simply vocabid.

The token names referenced in the VINP cell are defined in the Multiple Voice Recognition Vocabularies file on the Runtime Server. This file resides in the path $VOICE_HOME/sys_files/mvrv.config on the Runtime Server. The following is an example line from this file, which is intended to describe a 'digits' grammar in the U.S. English language for a recognizer that supports continuous speech recognition:

GRP_US_ENGLISH_CON VOC_DIGITS C 60

The structure of this file reflects the basic two-level model that exists within CSS for specifying grammars and supporting multi-lingual applications:

■ The token in the first field (GRP_US_ENGLISH_CON) conceptually specifies the language (or vocabulary group)

■ The token in the second field (VOC_DIGITS) specifies the vocabulary (or grammar) within that language.

■ The C stands for continuous, specifying that attribute of the recognition technology.

■ The number 60 is the CSS vocabid, which is sent in the recognition command to the Telephony Server.

When the Telephony Server receives this command, it looks up the number 60 in a second grammar-id mapping file, the Vocabulary Map (vocabmap) file. This file maps CSS vocabids to the names of the actual Nuance grammars that are configured on your system. The grammar name is then passed on to the Nuance recognizer, which initiates a recognition using the grammar.

The Nuance vocabmap file resides on the Telephony Server in the path C:\Program Files\Aspect Communications\CSS\TS\parms\asr\nuance\vocabmap. The default version (installed on the system by the installation of the Telephony Server software Nuance ASR option) contains only a single line:

60 .SENTENCE

This line maps CSS vocabid 60 to the name of the single grammar defined in the Nuance example digits recognition package. If you configure this digits package as a supported recognition package for the Nuance recognizers on your Telephony Server, you can perform a recognition using this grammar (see Chapter 8).
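The two-level mapping can be sketched end to end. The file contents below are the examples from the text; the parsing details and function names are assumptions made for illustration only.

```python
# Sketch of the two-level grammar lookup: (group, vocabulary) tokens in
# mvrv.config yield a vocabid, which the vocabmap file resolves to a
# Nuance grammar name. File formats are simplified for illustration.

MVRV = "GRP_US_ENGLISH_CON VOC_DIGITS C 60\n"
VOCABMAP = "60 .SENTENCE\n"

def vocabid_for(group, vocab, mvrv_text):
    # Runtime Server side: map the token pair to the CSS vocabid.
    for line in mvrv_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == group and fields[1] == vocab:
            return int(fields[3])
    raise KeyError((group, vocab))

def grammar_for(vocabid, vocabmap_text):
    # Telephony Server side: map the vocabid to the Nuance grammar name.
    for line in vocabmap_text.splitlines():
        vid, name = line.split(None, 1)
        if int(vid) == vocabid:
            return name
    raise KeyError(vocabid)

vid = vocabid_for("GRP_US_ENGLISH_CON", "VOC_DIGITS", MVRV)
grammar = grammar_for(vid, VOCABMAP)
```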

Using the VINP Cell

In a CSS application, you use the VINP cell to specify the Nuance grammar to use for a recognition. You set the Use Common Vocabularies? parameter of the cell to No, and then use the Unique Vocabulary parameter to select the line containing the (vocabulary group, grammar) token pair that is mapped (as described in Using Token Names on page 23) to the Nuance grammar you want to use.

In addition, you need to add the Nuance grammars (to the CSS system mappings) that your application will use. You do this by making the appropriate additions to the mvrv.config and vocabmap files, and then adding translations for the tokens added to mvrv.config in a new version of the xae.lng file. You can then access these grammars by putting a VINP cell in your application that selects the appropriate grammar from among the items in the Unique Vocabulary parameter (which now includes the textual descriptions of the grammars that you added).

By default, the X windows (CSS) Application Editor Language string translation, or xae.lng, file (for the U.S. English language) resides on the Runtime Server in the path $VOICE_HOME/ui_lang/en_US/xae.lng. This file contains textual translations for tokens referenced by the CSS application editor. In the example presented in Using Token Names on page 23, this file maps the token GRP_US_ENGLISH_CON to US ENGLISH CON, and the token VOC_DIGITS to DIGITS 0-9, OH, so that the line in the Unique Vocabulary pop-up list corresponding to the desired line in the mvrv.config file is:

"US ENGLISH CON DIGITS 0-9, OH"

Note: The preceding information describes CSS mapping for the Nuance digits package .SENTENCE grammar. This mapping is provided with the standard CSS product.

Creating a Sample Application

This section describes how to build a simple Nuance application. The CSS application provides a limited automated stock trading service. The application asks the caller three questions:

■ “What stock do you want to trade?”

■ “Do you want to buy or sell?”

■ “Do you want to place a market, limit, or stop order?”

To build the sample Nuance application:

1. Create a simple Nuance grammar file, stock_trade.grammar, that defines a grammar corresponding to each of the three CSS application questions.

The grammar file might look like this:

; stock_trade.grammar
;
.STOCK_NAME [aspect compaq ibm ford gm chrysler]
.STOCK_TRADE_TYPE [buy sell]
.STOCK_ORDER_TYPE [market limit stop]

2. Add mappings for these new grammars on the Runtime Server.

Add lines to the mvrv.config file that contain the new token names corresponding to these grammars:

GRP_US_ENGLISH_CON VOC_STOCK_NAME C 100
GRP_US_ENGLISH_CON VOC_STOCK_TRADE_TYPE C 101
GRP_US_ENGLISH_CON VOC_STOCK_ORDER_TYPE C 102

3. Add the U.S. English language translations for the three new tokens you added to the mvrv.config file.

Create a new xae.lng file on the Runtime Server under the directory $VOICE_HOME/ui_lang/en_US_site, and put your token translations in this file:

VOC_STOCK_NAME "Stock Name"
VOC_STOCK_TRADE_TYPE "Stock Trade Type"
VOC_STOCK_ORDER_TYPE "Stock Order Type"

4. Add the mappings of the new CSS vocabids to the actual Nuance grammar names in the stock_trade.grammar file by adding these lines to the Nuance vocabmap file on the Telephony Server:

100 .STOCK_NAME
101 .STOCK_TRADE_TYPE
102 .STOCK_ORDER_TYPE

5. Build a CSS application that references the new Nuance grammars.

6. Compile the stock_trade.grammar file to create the stock_trade recognition package.

7. Configure the recognizers on your Telephony Server to use the stock_trade recognition package.

8. Run your application.
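Because steps 2 and 4 above must use the same vocabids on two different servers, a quick consistency check helps catch typos before the application runs. This is a hypothetical helper, not part of CSS; in practice you would read the real mvrv.config and vocabmap files instead of the inline lists.

```python
# Hypothetical sanity check: every vocabid added to mvrv.config should
# have a matching entry in the Telephony Server's vocabmap file.

mvrv_lines = [
    "GRP_US_ENGLISH_CON VOC_STOCK_NAME C 100",
    "GRP_US_ENGLISH_CON VOC_STOCK_TRADE_TYPE C 101",
    "GRP_US_ENGLISH_CON VOC_STOCK_ORDER_TYPE C 102",
]
vocabmap_lines = [
    "100 .STOCK_NAME",
    "101 .STOCK_TRADE_TYPE",
    "102 .STOCK_ORDER_TYPE",
]

# vocabid is the 4th field of an mvrv.config line, the 1st of a vocabmap line.
mvrv_ids = {int(line.split()[3]) for line in mvrv_lines}
mapped_ids = {int(line.split()[0]) for line in vocabmap_lines}

missing = mvrv_ids - mapped_ids  # vocabids with no grammar mapping
```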

Developing CSS Applications for Different Languages

The procedure and example described in the preceding section, Creating a Sample Application, can be generalized for the construction of CSS applications that support languages other than U.S. English.


The only differences in this procedure are:

■ When compiling your Nuance grammar you use a Nuance voice model for the desired language, rather than a voice model for U.S. English.

■ The translations of the new tokens in the mvrv.config file go in a new xae.lng file that resides on the Runtime Server in a subdirectory that is associated with the desired language, rather than in the en_US_site subdirectory.

For example, if you are doing speech recognition in German, you can create a new subdirectory $VOICE_HOME/ui_lang/de_DE_site and put your new xae.lng file in this directory.

For more information about developing CSS applications for other languages, see the Nuance documentation.

Specifying a Recognition Context

The procedure for making a Nuance recognition context available to your CSS application is essentially the same as for a recognition grammar. Set up a CSS mapping as described in Using Token Names on page 23, but reference the context name instead of the grammar name in the mapping files.

For example, to use the example “YesNo” context shown earlier (assuming recognition in U.S. English):

1. On the Runtime Server, use the existing line in the mvrv.config file for the grammar:

GRP_US_ENGLISH_CON VOC_YESNO C 61

2. On the Runtime Server, use the associated lines in the $VOICE_HOME/ui_lang/en_US/xae.lng file. For this example, the line is:

US ENGLISH CON YES, NO

3. Add the following line to the Telephony Server's Nuance vocabmap file:

61 YesNo


4. In the VINP cell in your CSS application:

a. Navigate to the Unique Vocabulary parameter.

b. Use the pull-down list to specify US ENGLISH CON YES, NO.

The text of the items in the Unique Vocabulary parameter list matches the content in the xae.lng file.

About Resource Management for Recognizers

Recognizer resource management within CSS is done dynamically, at the time of the first recognition request encountered in a call. Initially, when a new call comes into the CSS system, no recognizer is allocated to the call. When the application call flow encounters the first recognition request (LISTEN cell), the Telephony Server searches for an available recognizer. If no recognizer is currently available, the LISTEN cell takes the Error branch. If a recognizer is available, it is removed from the list of available recognizers, assigned to the trunk on which the call came in, and used to perform the recognition.

Note: The following description of dynamic recognizer resource management within CSS applies only to client recognizers (independently allocatable channels of recognition service). The dynamic load balancing of recognition requests across multiple Nuance RecServer systems, which is done by the Nuance resource manager, is completely separate from the resource management discussed here.

Allocating and Deallocating Recognizers

Nuance recognizers are not automatically deallocated at the end of a recognition (LISTEN cell), so subsequent LISTEN cells in the application use the recognizer initially allocated to the call. Nuance recognizers are automatically deallocated at the end of the call.

If an application knows that it will not be doing any additional recognitions, it can force the CSS system to deallocate the recognizer before the end of the call by including a FREE cell in the application, at the point in the call flow when the recognizer is no longer needed.

Allocating a Nuance recognizer for the duration of the speech-enabled portion of the call provides the best echo cancellation, and hence barge-in performance, for the recognizer (the telephone-line characterization required to support echo cancellation is an adaptive algorithm). It also improves the Nuance system's performance for voice enrollment and speaker verification, and is contractually required by Nuance. For these reasons, you should not FREE a Nuance recognizer between recognition requests (LISTEN cells) in your application.


Configuring Recognizers

When you configure Nuance recognizers on the Telephony Server, you specify two pieces of information relevant to resource management:

■ The Nuance recognition package (or list of packages) that those recognizers are to support.

Since each recognition package defines one or more grammars (and possibly contexts), this defines a list of the Nuance grammars (and contexts) that are supported by each recognizer in the system.

■ The list of CSS vocabids supported by the recognizers.

This list must correspond to the vocabids to which the grammars (and contexts) in the configured packages are mapped (in the Nuance vocabmap file on the Telephony Server).

Using the stock trade example presented in Creating a Sample Application on page 25, you configure your recognizers to support the stock_trade package and the vocabids 100, 101, and 102.

Note: All recognizers have the same capabilities.

Using Sparse Recognition Resource Configuration

If your application's use of speech recognition is localized to a particular portion of the call, you can implement a sparse recognition resource configuration (fewer recognizers than telephone channels) by using CSS on-demand recognizer allocation together with the explicit use of the FREE cell in your application. In such configurations, active calls dynamically compete for the available recognizers.

If you can obtain accurate statistics on call volume and the percentage of time your application spends on average in its recognition-enabled portion, implementing a sparse recognition resource configuration can result in significant cost savings (as compared to a configuration which provides a recognizer for every telephone channel). A proper traffic engineering analysis can enable you to realize this cost savings without sacrificing recognizer availability for the significant majority of recognition requests.
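A traffic-engineering analysis of this kind is commonly done with the Erlang-B formula, which estimates the probability that a request finds all servers busy. The sketch below is illustrative (the CSS documentation does not prescribe a method, and the traffic numbers are made up); it shows how blocking falls as recognizers are added for a fixed offered load.

```python
def erlang_b(offered_load, servers):
    """Blocking probability for `servers` recognizers at `offered_load`
    Erlangs, computed with the standard Erlang-B recursion."""
    b = 1.0
    for k in range(1, servers + 1):
        b = (offered_load * b) / (k + offered_load * b)
    return b

# Hypothetical example: 24 channels, with each call spending about 25%
# of its time in the speech-enabled portion -> roughly 6 Erlangs of
# recognizer demand.
load = 6.0
for n in (6, 8, 10, 12):
    print(n, "recognizers -> blocking probability", round(erlang_b(load, n), 4))
```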


If a LISTEN cell takes the Error branch because no appropriate recognizer was available when the LISTEN cell was encountered, the application global buffer CELL STATUS contains the value 16. This buffer can be used to distinguish the no-resource case from other (true error) conditions on the Error branch. In a sparse recognizer resource configuration, an application might wait a few seconds and then retry the recognition request to see if a recognizer has become available.
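The retry strategy suggested above can be sketched as call-flow logic, with Python standing in for CSS cells. All names here are hypothetical; in a real application the equivalent logic is built from LISTEN cells, CELL STATUS tests, and delay cells in the call flow.

```python
import time

NO_RESOURCE = 16  # CELL STATUS value when no recognizer was available

def listen_with_retry(try_recognition, retries=3, wait_seconds=2):
    # try_recognition() models a LISTEN cell: it returns a (status, result)
    # pair, where a non-NO_RESOURCE status is either success or a true error.
    for attempt in range(retries):
        status, result = try_recognition()
        if status != NO_RESOURCE:
            return status, result       # success, or a true error to handle
        time.sleep(wait_seconds)        # wait for a recognizer to free up
    return NO_RESOURCE, None            # gave up; no recognizer available

# Simulated system: no recognizer is free on the first attempt only.
attempts = iter([(NO_RESOURCE, None), (0, "seven five")])
status, result = listen_with_retry(lambda: next(attempts), wait_seconds=0)
```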

About Speech Recognition Cells

In a CSS application, a Nuance-based speech recognition is performed using a sequence of three main cells, in the order listed:

■ VINP cell — enables speech input and specifies the grammar

■ LISTEN cell — specifies the recognition parameters, initiates the recognition, and collects the recognition result information

■ NNBEST cell — parses the Nuance recognition result information.

Note: Unlike earlier versions of the LISTEN cell, CSS v7.0 can perform recognition without a subsequent NNBEST cell. Prior to CSS v7.0, the LISTEN cell returned raw results and the NNBEST cell provided the N-Best list. With CSS v7.0, the LISTEN cell itself can return up to four items on an N-Best list, which covers most cases. In cases where more than four items are likely, use an NNBEST cell after the LISTEN cell.

The following cells can also be used in a CSS Nuance application:

■ GET cell — gets recognition-related parameter values

■ SET cell — sets recognition-related parameter values

■ FREE cell — deallocates a Nuance recognizer from the call

■ GDAT cell — collects any additional DTMF data

The three main cells are described in this section. For more information about all the speech recognition cells, see the Cell Catalog appendix in the Aspect Customer Self-Service v7.0 Application Developer Guide or the online Help system for CSS.

VINP Cell

The VINP cell must be used at least once in an application to enable speech input. To do this, set the Enable Voice Input? parameter to Yes.


The VINP cell is also used to select a specific grammar. If each prompt uses a different grammar, use a VINP cell before each prompt. If DTMF input is used, a VINP cell must be used before the GDAT or MENU cell to disable voice input.

A VINP cell that disables voice input only ensures that a subsequent GDAT cell in the application does not turn the Nuance recognizer on. It does not deallocate a Nuance recognizer previously allocated to your call. Nuance recognizer deallocation occurs only when the call ends or when a FREE cell is encountered in the application.

If, at any point in your application, you want to restrict the caller's input to DTMF (no speech input allowed), and speech input was enabled using a VINP cell at some previous point in the application, you must precede your GDAT cell with a VINP cell that disables voice input. If you do not do this, the GDAT cell erroneously turns on the Nuance recognizer.

For information about using the VINP cell to specify the grammar to be used for the recognition, see Specifying Recognition Grammars on page 23.

LISTEN Cell

The LISTEN cell enables you to specify and perform a simple speech input dialog with the caller. The dialog includes:

■ Playing prompts

■ Waiting for speech (or DTMF) input

■ Storing a copy of the caller's speech input in a file (utterance recording)

■ Analyzing speech input

■ Collecting DTMF input

■ Automatically retrying the recognition request, after playing an optional retry prompt, if the caller never starts speaking or speaks too long

Note: DTMF type ahead does not work reliably across multiple LISTEN cells.

About the LISTEN Cell Input Parameters

The input parameters to the dialog initiated by the LISTEN cell are derived from several sources:

■ Nuance grammar

The grammar is typically specified using the VINP cell. It is also possible to specify this information in the LISTEN cell. To do this, set the LISTEN cell Set Vocabulary from Buffers? parameter to Yes. Set the Vocabulary Group and Vocabulary parameters to the names of buffers where your application previously stored the vocabulary group and vocabulary tokens (as defined in the mvrv.config file), respectively.


■ Recognition prompts

These are specified in the LISTEN cell's Prompts table.

■ Barge-in setting

Specify whether barge-in is enabled for the recognition, that is, whether the recognizer starts listening before or after any specified recognition prompts have finished playing.

Disabling barge-in on the recognizers is a configuration choice that must be used carefully, because there is an intrinsic delay between the time the prompt play ends and the time the recognizer actually starts listening. In general, this delay in the CSS system is quite small (a fraction of a second), but it may increase if the CSS system (Runtime Server or Telephony Server) is under very heavy load.

Under such circumstances, eager callers who speak too quickly after the prompt play completes (or who don't quite wait for it to complete) may have problems being recognized, because the first part of their speech may not be seen by the recognizer. See the description of the client.MinPreSpeechSilenceSecs parameter in the Minimum pre-speech silence time bullet below.

■ DTMF input.

Specify how DTMF input entered during a prompt play is interpreted. This behavior is controlled by the LISTEN cell's Abort Recognition Upon DTMF Input parameter.

If the parameter is set to the default value of Yes, the recognition is aborted and the DTMF digit is stored in the specified Input buffer. You use a GDAT cell to retrieve the digit.

If the parameter is set to No, the DTMF tone is streamed to the recognizer with the speech. If the grammar on the recognizer is DTMF-enabled, the DTMF digit can be recognized; if not, the DTMF digit is lost.

■ Recognition hypothesis.

To specify the maximum number of recognition hypotheses to be returned (the N-Best size), use the Requested Number of Results slider parameter. The LISTEN cell limits the N-Best list size that can be requested to 5. See N-Best List Size Considerations on page 65 for a mechanism that enables you to request larger N-Best lists.

■ Minimum pre-speech silence time.

To specify the minimum pre-speech silence time, use the client.MinPreSpeechSilenceSecs parameter. This parameter enforces a minimum time the caller must wait after the recognizer starts before speaking. The default value for this parameter is 0.10 seconds. Setting this parameter to zero disables the check (no minimum wait is enforced).

■ Maximum pre-speech silence time.

To specify the maximum pre-speech silence time, use the PreSpeech Timeout parameter. This parameter can be set in one of two ways:

– Setting the PreSpeech Timeout slider to the desired value

– Setting the Set Timeouts from Buffers? parameter to Yes, and then setting the PreSpeech Timeout parameter to the name of a buffer in which your application has previously stored the desired pre-speech timeout value.

Setting this parameter to zero disables the timeout (no time limit). The pre-speech silence timer starts after any prompt plays specified in the LISTEN cell complete.

The related Nuance system parameter, client.NoSpeechTimeoutSecs, does not work in the CSS Nuance integration, because the CSS integration does not inform the Nuance recognizer of when the prompt play completes. Internally, the CSS integration sets the Nuance parameter to zero (effectively disabling the associated timer in the Nuance system) and implements a timer associated with the LISTEN cell parameter itself.

■ Maximum utterance duration time.

To set the maximum utterance duration time, use the client.TooMuchSpeechTimeoutSecs parameter. This parameter has a relatively large default value (60 seconds). Setting this parameter to zero disables the timeout.

■ Maximum recognition latency.

This is the maximum delay between the end-of-speech and the receipt of the final recognition result from the recognizer. In a properly provisioned system configuration, the Nuance RecServers can perform recognitions in real-time with the input speech, so the recognition latency is generally fairly small and is not a concern to the application.

If your application needs a limit set on this time, you can reset the Nuance parameter client.RecognizerTooSlowTimeoutSecs. The default value of 60 seconds effectively implements a fail-safe timer which aborts the recognition attempt in the unlikely event that the recognizer stops responding after the end-of-speech but before it delivers the final recognition result. Setting this parameter to zero disables the timeout.

■ Maximum input total response time.

To specify the maximum input total response time, use the Global Timeout parameter. This parameter sets a limit on the total amount of time allowed to elapse between the end of the recognition prompt plays (if any) and the receipt of the recognition result from the recognizer. It places a limit on the sum of the pre-speech, utterance duration, and recognition latency time periods.

This parameter can be used as an alternative to, or in conjunction with, the parameters associated with the individual time periods. Use it when the application does not care how long the caller or the Nuance recognizer takes to perform each task, but only needs to place an upper bound on how long the system has to return an answer.

Setting this parameter to zero disables the timeout. Like the LISTEN cell PreSpeech Timeout parameter, this parameter can be specified by using either a slider or an input buffer.
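The relationship between the Global Timeout and the three component time periods can be sketched as follows. This is an illustrative Python helper, not a CSS API; the actual check is performed inside the CSS integration.

```python
def within_global_timeout(pre_speech_s, utterance_s, latency_s, global_timeout_s):
    """Check whether a recognition attempt fits within the Global Timeout.

    The Global Timeout bounds the sum of the pre-speech silence, utterance
    duration, and recognition latency periods. A value of zero disables
    the timeout (no time limit), as described above.
    """
    if global_timeout_s == 0:
        return True
    return (pre_speech_s + utterance_s + latency_s) <= global_timeout_s
```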

■ Utterance recording.

To perform a CSS-initiated utterance recording, you use the Store Speech as Utterance and Utterance File Name parameters. The CSS-initiated utterance recording must be enabled in the configuration of the recognizer allocated to perform the recognition. For more information about utterance recording, see Chapter 6, CSS Nuance ASR Advanced Features.

■ Timeout retry processing.

This property determines the behavior of the dialogue if the maximum pre-speech silence, maximum utterance duration, or maximum input total response time limit is exceeded. It is specified in the Number of Timeout Retries and Timeout Prompt parameters.

If the Number of Timeout Retries parameter is set to a value greater than one and one of these time limits is exceeded, the specified Timeout Prompt plays, followed by another play of the recognition prompts and another recognition attempt. This process repeats until either the caller starts and finishes speaking within the specified limits, or the specified timeout retry count is exhausted.

The retry processing provided by the LISTEN cell does not distinguish which time limit expired. If your application needs to play a different retry prompt depending on whether the caller never started speaking or spoke too long, set the Number of Timeout Retries parameter to one (so that the LISTEN cell performs no timeout retry processing) and implement the retry processing using logic in your application, outside of the LISTEN cell.

In other words, you can have the LISTEN cell Timeout branch determine which time limit expired (see About the Information Returned by the LISTEN Cell on page 35), play a different prompt accordingly, and then loop back to your LISTEN cell. The loop logic can keep a count of the number of retry attempts already done, and stop when the count reaches a specified limit.
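Such an application-level retry loop might be sketched as follows. This is hypothetical Python, not a CSS API: listen and play_prompt stand in for the LISTEN cell and a prompt-play cell, and the CELL STATUS values 11 and 12 are those documented in About the Information Returned by the LISTEN Cell.

```python
NO_SPEECH = 12        # CELL STATUS: caller never started speaking
TOO_MUCH_SPEECH = 11  # CELL STATUS: caller spoke too long

def recognize_with_retries(listen, play_prompt, max_retries=3):
    """Retry a recognition, playing a different prompt per timeout cause.

    listen() returns (branch, cell_status), modelling a LISTEN cell whose
    Number of Timeout Retries parameter is set to one, so that all retry
    processing happens here, outside the cell.
    """
    for _ in range(max_retries):
        branch, status = listen()
        if branch != "Timeout":
            return branch        # Success, Rejection, or Error
        if status == NO_SPEECH:
            play_prompt("I didn't hear anything. Please try again.")
        elif status == TOO_MUCH_SPEECH:
            play_prompt("That was a little long. Please try again.")
    return "Timeout"             # retry count exhausted
```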

■ Recognition results location.

This specifies the location (application buffer name) where the recognition results are stored. The LISTEN cell Returned Recognition String parameter stores the entire recognition string.

The four Result Buffers contain the first four items on the N-Best list. The four Confidence Buffers contain the confidence limits applicable to each Result Buffer.

See the Aspect CSS v7.0 Cell Reference Guide for more information about LISTEN cell parameters.

About the Information Returned by the LISTEN Cell

On completion of the dialogue, the LISTEN cell returns the following information to the application:

■ Overall dialogue / recognition result.

This information is communicated through the Exit branch of the LISTEN cell, the Method of Input Detected and Exception Status output parameters, and the CELL STATUS application global buffer.

The result can be one of the following six cases:

– Successful (in-grammar) speech input.

The dialogue successfully interacted with the caller to collect the desired information, which the caller entered by speaking. The cell takes the Success branch and populates the Method of Input Detected buffer with the value 0.

– Out-of-grammar (utterance rejection) or uncertain (low confidence) speech input.

The caller spoke, but the recognizer determined that the utterance was not in the active grammar, or the recognizer was not sufficiently confident that it was in the active grammar. The cell takes the Rejection branch.

If this occurs because the caller spoke before the recognizer started listening, the LISTEN cell Exception Status is populated with the value 1; otherwise, the Exception Status contains the value 0.

– DTMF input.

The caller entered DTMF instead of speech input. If the Abort Recognition Upon DTMF input parameter is set to No, the DTMF tone is streamed to the recognizer. If the grammar on the recognizer is DTMF-enabled, the DTMF digit can be recognized; if not, the DTMF digit is lost.

– The caller never started speaking or spoke too long, and the dialogue's retry processing limit was exhausted.

The cell takes the Timeout branch. To determine which of the two different timeout cases occurred in the dialogue, examine the application's CELL STATUS global buffer:

On Timeout exit from the LISTEN cell, this buffer is populated with the value 12 if the caller never started speaking, and with the value 11 if the caller spoke too long.

Note: Expiration of the maximum total input response time limit (the LISTEN cell's Global Timeout parameter) is treated the same as expiration of the maximum utterance duration time limit (the Nuance client.TooMuchSpeechTimeoutSecs parameter). A value of 11 is returned in the CELL STATUS application buffer in either case.

– The dialogue was never initiated because no appropriate recognizer was available at the time.

The LISTEN cell takes the Error branch, and the application's CELL STATUS global buffer is populated with the value 16.

– The dialogue failed due to some problem in the CSS or Nuance systems.

The LISTEN cell takes the Error branch, and the CELL STATUS global buffer contains some value other than 16 (for example, a value of 17 indicates some error with the Nuance recognition system, such as exceeding the specified maximum recognition latency).

■ Dialogue caller input.

The LISTEN cell takes the Success branch if the caller successfully enters input into the system, either by speaking and being recognized, or by entering DTMF.

If the input is speech, the recognizer interprets the speech, including any natural language understanding and speaker verification information, for every element of the N-Best list. The interpretation is contained in a specially coded string that is stored in the application buffer specified in the Returned Recognition String parameter.

The four Result Buffers contain the first four items on the N-Best list. The four Confidence Buffers contain the confidence limits applicable to each Result Buffer. Refer to the Aspect CSS v7.0 Cell Reference Guide for more information about LISTEN cell parameters.

■ Recorded utterance path name and duration.

If caller utterance recording is specified and an utterance is successfully recorded, the LISTEN cell's Full Path for Utterance File buffer is populated with the full (absolute) path name of the file on the Telephony Server that contains the utterance.

For additional information about utterance recording in CSS, see Chapter 6, CSS Nuance ASR Advanced Features.

About the Nuance Parameters

The Nuance system provides a number of parameters that control many details of the recognition request, for which the LISTEN cell provides no interface. These parameters include:

– rec.ConfidenceRejectionThreshold — specifies the minimum confidence value required of a recognition result to be accepted as in-grammar by the recognizer.

– ep.EndSeconds — specifies the amount of post-speech silence needed for the Nuance recognizer to decide that the caller has stopped speaking.

NNBEST Cell

The NNBEST cell is used to parse the specially coded recognition result string returned by the Nuance recognizer to the CSS system. It takes as input, in the Recognition Result input buffer, the name of the buffer specified in the Returned Recognition String parameter of the LISTEN cell in your application.

On successful exit, the NNBEST cell populates the Recognition String x and Recognition Confidence x output buffers with the successive recognition result hypotheses and their associated confidences (the successive members of the N-Best list returned by the Nuance recognizer). It also populates the Number of Results output buffer with the size of the returned N-Best list (the number of returned hypotheses). The cell can return up to 19 hypotheses.

For information about the Recognition Formatted String output buffer, see Formatted Text Recognition Results on page 64.
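The NNBEST cell's outputs can be modelled as follows. This is a sketch only: the coded recognition string format is internal to the CSS/Nuance integration, so the model starts from an already-decoded list of hypothesis/confidence pairs.

```python
def nnbest(nbest_pairs):
    """Model the NNBEST cell outputs for a decoded N-Best list.

    nbest_pairs -- list of (hypothesis_text, confidence) pairs.
    The cell returns at most 19 hypotheses.
    """
    capped = nbest_pairs[:19]
    return {
        "Number of Results": len(capped),
        "Recognition String": [text for text, _ in capped],
        "Recognition Confidence": [conf for _, conf in capped],
    }
```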

Chapter 3: Natural Language Understanding

This chapter describes the information you need to know in order to access the Natural Language (NL) understanding capabilities of the Nuance system.

This chapter contains the following topics:

■ How Natural Language Understanding Works

■ CSS Cells Used to Access Natural Language Information

■ Accessing Semantic Information through the NSLOT Cell

■ About Ambiguous Grammar Support

■ Accessing Advanced Slot-Type Values

How Natural Language Understanding Works

A natural language understanding system takes a sentence (typically a recognized utterance) as input and returns an interpretation (a representation of the meaning of the sentence). In effect, the system maps words to meaning. The application can then use the result of the natural language understanding process to determine an appropriate response to the user's utterance. A response might be playing a prompt, or booking a flight.

The Nuance system integrates basic phrase and sentence recognition with natural language understanding. Both the allowable utterances and their semantic interpretations are specified in the grammar, and both the text for the utterance actually spoken and the semantic interpretation are returned to the application by the recognizer. The Nuance system uses a semantic slot model to accomplish this. Each of the one or more semantic slots in an utterance holds a unique unit of meaning. Different ways of communicating this meaning return the same slot value.

For example, consider the following utterances:

■ “Withdraw fifteen hundred dollars from savings.”

■ “Take fifteen hundred out of savings.”

■ “Give me one thousand five hundred dollars from my savings account.”

A Nuance grammar could be designed that would map each of these utterances to the same semantic interpretation, with the semantic slots action, amount, and account, and their corresponding values as follows:

Slot      Value
action    withdraw
amount    1500
account   savings

The application can then access the semantic interpretation directly; it does not have to parse the recognition result (spoken sentence).

The semantic interpretation is based on the recognition interpretation. If your application requests and processes more than one recognition result hypothesis (if you are doing N-Best processing), there will be one potentially different semantic interpretation for each returned hypothesis.

The Nuance grammar specification language allows for the specification of optional grammar components (grammars that enable input of varying amounts of information). If natural language understanding commands are attached to these optional components, it is possible for a valid utterance to supply only some of the semantic information specified in the grammar. For example, the utterance:

“I want to make a withdrawal from my savings account.”

supplies the action and account semantic slots, but not the amount slot. In such cases, the Nuance system returns only the semantic information that was supplied (action=withdraw and account=savings). The application can then follow a call flow that prompts for and retrieves only the missing pieces of information. Depending on the grammar design and your application's accuracy requirements, this subsequent request can use the same or a different Nuance grammar.

For more information about the design and implementation of Nuance grammars that return natural language interpretation, see the Nuance documentation.

CSS Cells Used to Access Natural Language Information

The following CSS cells are used to access natural language information from a Nuance recognition result:

■ NNBEST cell — returns the number of recognition result hypotheses, and if requested, the text-based recognition result string.

■ NSLOT cell — parses (retrieves) the values of the specified slots of the specified recognition result hypothesis NL interpretation.

■ NNUMB cell — determines the number of NL interpretations for a given recognition result hypothesis.

Most Nuance grammars have no NL semantic ambiguities, and the number of NL interpretations for all hypotheses is one. For more information about ambiguous grammars and the use of the NNUMB cell, see Chapter 6, CSS Nuance ASR Advanced Features.

Accessing Semantic Information through the NSLOT Cell

As described in Chapter 2, a CSS application can perform a Nuance-based recognition and retrieve the recognition result. It can then access the semantic information supplied in the utterance using the NSLOT cell.

Using the NSLOT Cell

To access the semantic information:

1. Follow the LISTEN cell with an NSLOT cell, setting the NSLOT cell parameters as follows:

a. Set the Recognition Result parameter to the recognition result output buffer specified in the LISTEN cell (the buffer specified in the Returned Recognition String parameter in the LISTEN cell).

b. If you are not doing N-Best processing, set the NBest Number input parameter to 1; if you are doing N-Best processing, set this parameter to the number of the hypothesis currently under consideration in your application.

Note: Use an NNBEST cell between the LISTEN cell and the NSLOT cell to determine the number of hypotheses actually returned by the recognizer.

c. Set the Which NL Interpretation parameter to 1.

This assumes there are no semantically ambiguous valid utterances in your grammar. For a description of processing semantic results for grammars that contain ambiguities, see Chapter 6.

d. Set the successive Slot Name x input parameters to the names of one or more of the semantic slots in your grammar.

The slot names in your grammar can appear in any order in the NSLOT cell. There is no relationship between the order of slot names in the Nuance grammar (or associated slot definitions file) and the order of slot names specified in your NSLOT cell.

On successful exit (Success branch) from the NSLOT cell, the system populates the successive Slot Value x output buffers with the values for the corresponding requested slots, and populates the Buffer Count output buffer with the number of slot values actually returned.

Retrieving Values From Slots Associated With an Optional Component

It is possible for a single NSLOT cell to request values for more than one slot. A problem occurs when a requested slot is associated with an optional component of the grammar and the caller's utterance does not include that component.

In this case, the NSLOT cell takes the Fail branch and no information is returned, even for other slots requested in the cell that were filled. Therefore, in your application, request values for slots associated with optional components of your grammar one slot at a time, using a single NSLOT cell for each slot.

Using the example in the previous section (and assuming your grammar is designed so that each of the three slots is associated with an optional grammar component), your application would contain three NSLOT cells. The first NSLOT cell sets the Slot Name 1 input buffer to action; the second sets the Slot Name 1 input buffer to amount; the third sets the Slot Name 1 input buffer to account.

On successful exit from these cells, the buffer specified in Slot Value 1 for the first NSLOT cell would contain the string “withdraw”; the buffer specified in Slot Value 1 for the second NSLOT cell would contain the string “1500”; the buffer specified in Slot Value 1 for the third NSLOT cell would contain the string “savings”.

Note: If the grammar were designed with no optional components, then a single NSLOT cell could request and retrieve all three slot values.

The NSLOT cell can return up to seventeen slot values. If your grammar contains more slots than this (that are attached to required grammar components), you can still access all of them by using more than one NSLOT cell in your application.
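The all-or-nothing behavior of the NSLOT cell, and why optional slots are requested one per cell, can be modelled like this (a hypothetical sketch, not a CSS API):

```python
def nslot(interpretation, *slot_names):
    """Model of the NSLOT cell's slot retrieval.

    If ANY requested slot is absent (its optional grammar component was
    not spoken), the cell takes the Fail branch and returns no values,
    even for requested slots that were filled.
    """
    if any(name not in interpretation for name in slot_names):
        return "Fail", []
    return "Success", [interpretation[name] for name in slot_names]

interp = {"action": "withdraw", "account": "savings"}  # amount not spoken
# One multi-slot cell fails outright; per-slot cells recover what was said.
```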

About Ambiguous Grammar Support

The Nuance natural language understanding system supports ambiguous grammars. A grammar is ambiguous if a sequence of words can produce multiple interpretations. For example, the following grammar:

.Command (call Name:nm) {<command call> <name $nm>}

Name [
    [john (john smith)] {return(john_smith)}
    [mary (mary jones)] {return(mary_jones)}
    [john (john brown)] {return(john_brown)}
    . . .
]

is ambiguous because the word sequence “call john” produces two interpretations:

{<command call> <name john_smith>}
{<command call> <name john_brown>}

When ambiguous sentences occur, the natural language system returns multiple interpretations, sorted by probability (if specified). The CSS system provides access to all NL interpretations through the NNUMB and NSLOT cells.

The NNUMB cell can be used in your application to determine the number of NL interpretations that were actually returned by the NL system for a given recognition result hypothesis. The cell takes as input the name of the buffer containing the recognition result string (Recognition Result parameter) and the recognition result (N-Best list) hypothesis number (one indexed).

On successful exit from the cell, the # of Interpretations output buffer contains the number of NL interpretations for the specified hypothesis. Successive invocations of the NSLOT cell, with the Which NL Interpretation input parameter set to values ranging from one to the number of NL interpretations returned by the NNUMB cell, can then access the successive NL interpretations returned by the NL system.
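The NNUMB/NSLOT iteration described above can be sketched as follows (illustrative Python; nnumb and nslot stand in for the cells and are one-indexed like the cells themselves):

```python
def all_slot_values(nnumb, nslot, hypothesis_number, slot_name):
    """Collect one slot's value from every NL interpretation of a hypothesis."""
    count = nnumb(hypothesis_number)  # models the # of Interpretations buffer
    return [nslot(hypothesis_number, which, slot_name)
            for which in range(1, count + 1)]

# For the ambiguous "call john" example above, stand-in cell behavior:
fake_nnumb = lambda hyp: 2
fake_nslot = lambda hyp, which, slot: ["john_smith", "john_brown"][which - 1]
```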

Accessing Advanced Slot-Type Values

The CSS NSLOT cell returns values only for simple slot types (integers and strings). The Nuance NL system also supports filling of slots with more complex data types, such as structures and lists. If you pass the name of a slot that is filled with one of these complex slot types to an NSLOT cell, the cell returns no information, and takes the Error branch.

If you need to use complex slot types in your application, you can access their returned values using the rec.TextResultFormat Nuance facility. The format specifier %slotname returns the value of any slot, and the specifier %slotname.featurename returns the value of a particular field (feature) of a structure-type slot. If a slot (or slot feature) returns a list, the information is returned in the following format:

(a b c ... )

where a, b, c, ... are the successive elements of the returned list. If you specify a slot name (or slot feature name) of structure type, the information returns in the following format:

[<f1 v1> <f2 v2> ...]

where f1 is the name of the first feature in the slot and v1 is the returned value for that feature; f2 is the name of the second feature and v2 is its returned value, and so on.
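A small parser for these two textual formats might look like this. This is a simplified sketch that assumes flat, non-nested values with whitespace-separated tokens, as in the formats shown above.

```python
import re

def parse_text_result(value):
    """Parse the "(a b c ...)" list format and the "[<f1 v1> <f2 v2> ...]"
    structure format described above into Python values."""
    value = value.strip()
    if value.startswith("(") and value.endswith(")"):
        return value[1:-1].split()                          # list -> Python list
    if value.startswith("[") and value.endswith("]"):
        return dict(re.findall(r"<(\S+)\s+(\S+)>", value))  # structure -> dict
    return value                                            # simple value as-is
```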

Accessing Per-Slot Confidences

The Nuance natural language understanding system returns not only the value, but also a confidence associated with each filled slot. The NSLOT cell does not provide access to these per-slot confidences. If you want to use this information in your application, you can gain access to it by setting the rec.TextResultFormat Nuance parameter to a string that includes %slotname format specifiers, where slotname is the name of each slot whose confidence you want to access. For more information, see the RecResultGetTextResult() Nuance API function documentation.

Chapter 4: Dynamic Grammars

This chapter contains information you need to know in order to use the dynamic grammar capabilities of the Nuance system. Nuance recommends using just-in-time (JIT) grammars.

This chapter contains the following topics:

■ About Dynamic Grammars

■ About Recognition Packages

■ Preparing to Use a Dynamic Grammar

■ Implementing a Dynamic Grammar

About Dynamic Grammars

A dynamic grammar is a grammar that can be created and modified by a running application. This is necessary if the complete application grammar cannot be determined until runtime or if the grammar needs to change at runtime. Examples include a personal contact list in a voice-activated dialing application or a database search result.

The Nuance dynamic grammar mechanism lets you create and update grammars at runtime and use them for recognition immediately, without having to recompile the recognition package and start a new RecServer process.

The CSS Nuance integration is concerned with dynamic grammars defined using voice-based interfaces only. The structure of such grammars is a simple, flat list of alternative words or phrases.

The Nuance system provides interfaces for creating new (initially empty) dynamic grammars and for subsequently adding and removing words or phrases to and from such a grammar. These interfaces enable the application to specify a phrase id for each phrase, which serves as both the handle for the phrase within the phrase list, and the text string that is returned as the recognition result if that phrase is subsequently recognized. The application can also specify an NL interpretation for the phrase.

Updating Dynamic Grammars

Nuance dynamic grammars are stored in a database. Each dynamic grammar created has a unique identifier called a key and resides in one or more physical records in the database. Updates (adding or removing phrases) to a dynamic grammar occur in place in the database. The database type can be either a third-party relational (for example, Oracle) database, or a file system type database. The latter is supplied by Nuance and is intended only for development or prototyping; it should not be used in a deployed configuration.

New phrases can be added to a dynamic grammar using a text-based interface. Adding phrases through a voice interface (that is, by speaking them) is called voice enrollment, or simply enrollment, and is not supported in CSS v7.0. Grammars created with the enrollment mechanism are inherently speaker-dependent: because the pronunciations are generated from the caller's spoken input, they should be used for recognition only with that speaker.

Dynamic grammars created through text-based interfaces are speaker-independent. Pronunciations are generated through dictionaries and the automatic pronunciation generator and can be used with any speaker.

Creating and Using Dynamic Grammars

After a dynamic grammar is created, it can be used to perform recognition in your application. To do this, you insert a dynamic grammar label or placeholder in one of the static grammars for your application, and compile that grammar into your application's recognition package.

At runtime, you load the dynamic grammar into the recognition package, effectively inserting it into the static grammar at the position specified by the label, and then perform a recognition using the newly constructed grammar. Upon successful recognition of a phrase defined in the dynamic grammar, the phrase id and any NL interpretation defined for the phrase in the grammar are returned to the application as the recognition result.

Loading Dynamic Grammars

It is possible to load multiple dynamic grammars into your recognition package by using multiple labels. A single static grammar can then effectively insert those dynamic grammars at different points in the grammar by referencing the labels at the desired points of insertion. It is also possible for more than one static grammar to reference (insert) the same dynamic grammar. In this case, loading a dynamic grammar into the recognition package effectively inserts it into all static grammars referencing that dynamic grammar.

Using the Nuance Interfaces

CSS does not provide any direct support for creating or modifying a dynamic grammar using any Nuance interface. To do this, create your own program that uses these interfaces, and then invoke it from your CSS application. For the best results, use just-in-time (JIT) grammars instead.

For additional information on dynamic grammars and Nuance databases, see the Nuance documentation.

About Recognition Packages

An application uses dynamic grammars in a recognition phase, in which subscribers are recognized against their personal phrase lists. The application loads the dynamic grammar containing a subscriber's personal phrase list into a containing static grammar.

The recognition package must declare a dynamic grammar label for the dynamic grammar to be loaded. Any static grammar that wants to insert that dynamic grammar must contain a reference to that label at the desired point of insertion in the grammar.

For example, suppose you are building a simple voice dialing application. You might have a recognition package grammar file that contains the following lines:

PhoneList:dynamic

.VoiceDial (call PhoneList)

The first line declares the name PhoneList as a dynamic grammar label. The label is then referenced in the .VoiceDial grammar. At run-time, the application identifies the current caller and loads that caller's personal phrase (phone) list into the recognition package, thereby inserting it into the .VoiceDial grammar after the initial command word (“call”).
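Conceptually, loading the dynamic grammar substitutes the caller's phrase list at the label, which can be modelled as follows. This is illustrative only; the real insertion happens inside the Nuance RecServer at load time.

```python
def expand_dynamic_grammar(static_template, label, phrases):
    """Return the flat set of utterances accepted once the dynamic grammar
    (a simple list of alternative phrases) is loaded at the declared label."""
    return {static_template.replace(label, phrase) for phrase in phrases}

# .VoiceDial (call PhoneList), after loading a personal phone list:
utterances = expand_dynamic_grammar("call PhoneList", "PhoneList",
                                    ["john smith", "mary jones"])
```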

Preparing to Use a Dynamic Grammar

To use a Nuance dynamic grammar from your CSS application, you must perform the following procedures:

■ Configure or create the database that holds the Nuance dynamic grammar information.

If you are using a third-party relational database (for example, Oracle), you must configure the Nuance system to use that database. If you are using the Nuance file system type database (for development), you must initially create the database. In either case, you use the Nuance utility nuance-database-admin.

■ Identify the Nuance dynamic grammar database when you configure the recognizers on the Telephony Server system.

For a relational database, you must specify the database server hostname and database name. For a file system database, you must specify the database root directory and the database name. In either case, you must specify a database handle (dbhandle) to associate with the database. Your CSS application uses this handle to identify the specific database to use in a dynamic grammar loading operation.

Some CSS applications may need to access more than one dynamic grammar database. In this case, you specify the list of dynamic grammars needed when the recognizers are configured. You must define a unique dbhandle for each dynamic grammar database.

For further details about specifying the dynamic grammar databases when configuring your recognizers, see Chapter 7.

■ Create a CSS application that performs a standard recognition that uses a dynamic grammar.

Your application specifies the database key of the dynamic grammar that contains the caller's phrase list, and, if multiple dynamic grammar databases are configured, the dbhandle of the database containing that dynamic grammar.

Any and all Nuance dynamic grammar databases specified when the Nuance recognizers were configured are automatically opened during Telephony Server start-up, and automatically closed during Telephony Server shutdown.

Note: For a relational database, the CSS recognizer configuration interface does not allow specification of the database account name and password. Therefore the Nuance default account (nuance:nuance) must be configured as a valid account for that database.

Implementing a Dynamic Grammar

Load the dynamic grammar into your application's recognition package, specifying the dynamic grammar label in the package and the dynamic grammar database key of the specific dynamic grammar. Then perform a recognition using a static grammar that references the dynamic grammar (label).

In a CSS application, loading the dynamic grammar is accomplished using the SET cell.

To load the dynamic grammar:

1. Set the CSS parameter vil.DynamicGrammarLabel to the dynamic grammar label in the recognition package.

2. Set the CSS parameter vil.StorageIDString to the dynamic grammar database key.


3. Follow the SET cell in your application with a VINP cell.

4. Follow the VINP cell with a LISTEN cell that performs a recognition using the static grammar that references the dynamic grammar label just loaded.

For details, see Specifying a Grammar on page 19 and About Speech Recognition Cells on page 30.

For example, suppose you have a voice dialing application, and the Nuance dynamic grammar database contains a record with the key subscriber_1796 that holds a particular subscriber's enrolled phone name list. If the subscriber calls your application and wants to use the phone name list, your application must contain the following:

■ A SET cell that sets vil.DynamicGrammarLabel to PhoneList, and vil.StorageIDString to subscriber_1796.

■ A VINP cell that sets Enable Voice Input to Yes, Use Common Vocabularies to No, and Unique Vocabulary to whatever token pair is mapped to the .VoiceDial Nuance grammar.
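The cell sequence in this example can be summarized as plain data. This is a sketch only; CSS call flows are built graphically, and the structure below simply restates the settings listed above:

```python
# Sketch of the call-flow steps for a dynamic-grammar recognition.
# Parameter names are those given in the guide; the list structure is
# illustrative, not CSS syntax.
call_flow = [
    ("SET", {"vil.DynamicGrammarLabel": "PhoneList",
             "vil.StorageIDString": "subscriber_1796"}),
    ("VINP", {"Enable Voice Input": "Yes",
              "Use Common Vocabularies": "No"}),
    ("LISTEN", {"grammar": ".VoiceDial"}),  # static grammar referencing PhoneList
]

# The SET cell that loads the grammar must precede the VINP/LISTEN pair:
cell_order = [name for name, _ in call_flow]
```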


Chapter 5: Speaker Verification

This chapter describes how to use the Nuance ASR software with CSS to add speaker verification to a CSS application.

This chapter contains the following topics:

■ How Speaker Verification Works

■ About Speaker Verification Databases

■ About Speaker Verification Cells

How Speaker Verification Works

The Nuance verifier lets you add security features to your applications by authenticating a user's identity before allowing access to sensitive information and transactions. Speaker verification is a voice biometric – an automated verification of the identity of a person based on physiological characteristics of the person's voice. Speaker verification provides a higher level of security than common mechanisms such as passwords and identification cards, which can be guessed, stolen, or counterfeited, and lets you design applications that do not require users to remember specific identification numbers or passwords.

For additional information about the Nuance verifier and Nuance databases, see the Nuance documentation.

About the Verifier

In the Nuance system, speaker verification is seamlessly integrated with speech recognition. The verifier runs as part of the Nuance RecServer. To use the verifier, you add verification-related configuration information to the recognition packages for your application. Recognition requests sent to a RecServer for a verification-enabled package can then be processed by both the verifier and the recognizer. Verification results are returned to the application along with the recognition results.


To perform speaker verification, the verifier checks a speaker's identity by comparing speech samples to an existing voiceprint for that speaker. A verification-enabled application consists of two phases:

■ A training session — creates voiceprints for new users of the system.

During training, a user provides enough speech samples to allow the verifier to learn the voice. The verifier creates a voiceprint and stores it in a database. The voiceprint is a characterization of a person's voice, not a recording of actual voice samples.

■ A verification session — verifies the caller's identity.

The caller asserts a particular identity, and the verifier retrieves the voiceprint associated with that identity (called the claimant voiceprint) from the database. After one or more utterances, the verifier authenticates or rejects the identity of the speaker by verifying live speech samples against that voiceprint.

Typically, users need to train voiceprints only the first time they use an application. Subsequent verification sessions retrieve and authenticate a caller using that voiceprint. By default, the trained voiceprint is not modified during a verification session; however, provided the caller was authenticated, the Nuance verifier also supports online adaptation, in which a voiceprint is automatically refined based on utterances collected during a verification session.

About Online Adaptation

Online adaptation is enabled by a verification-related parameter that can be supplied when the application's recognition package is compiled. Voiceprint adaptation has a number of benefits: it simplifies the caller's interaction with the system (because the verifier can authenticate the caller more easily), improves system performance (because the verifier has more data on which to base its decision), and increases application security (because an impostor is less likely to be accepted).

About Variable-Length and Fixed-Length Verification

By default, the Nuance verifier makes verification decisions using variable-length verification, a mechanism that provides accurate results with the smallest number of verification utterances. Unlike fixed-length verification, in which a fixed number of utterances are processed and a decision is made based on that data, variable-length verification stops verifying utterances when either:

■ The verifier is confident that it has determined the correct result

■ The verifier has processed a set maximum number of utterances
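The stopping behavior described above can be sketched as a loop. This is illustrative pseudologic only; the confidence test is an invented stand-in for the verifier's internal decision:

```python
# Illustrative sketch of variable-length verification: stop as soon as the
# verifier is confident of the result, or when the configured maximum number
# of utterances has been processed.
# (is_confident is a stand-in for the verifier's internal confidence logic.)

def variable_length_verify(utterances, min_utts, max_utts, is_confident):
    processed = 0
    for utt in utterances:
        processed += 1
        if processed >= min_utts and is_confident(processed):
            break          # verifier is confident it has the correct result
        if processed >= max_utts:
            break          # configured maximum reached
    return processed

# With a verifier that becomes confident after two valid utterances:
n = variable_length_verify(["a", "b", "c", "d"], min_utts=1, max_utts=4,
                           is_confident=lambda k: k >= 2)
# n == 2: fewer utterances than a fixed-length session would need
```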


Variable-length verification typically requires the verifier to process fewer utterances, and for some applications one utterance may be enough to positively identify a user. However, if an application needs fixed-length verification, you can select that method with a recognition package compilation parameter. For variable-length verification, you specify the minimum and maximum numbers of valid utterances when you compile the recognition package. For fixed-length verification, you specify the number of valid utterances the verifier requires to make a verification decision.

The preceding specification applies only to a verification session. The verifier imposes no limit on the number of utterances used to train a new voiceprint. This is an application design decision and is completely under the control of the application.

Note: There is a direct relationship between the number of utterances collected during the training session and the ease of authentication that callers experience during a verification session. In general, Nuance recommends that at least three utterances be collected during the training session.

Using Verification in an Application

You can use the verifier in three different modes:

■ Text-dependent mode — uses the same utterance for training and verification.

■ Text-prompted mode — performs verification against a phrase that was not necessarily used for training, and the verifier knows what the verification phrase should be.

For example, the application could train a voiceprint by asking the user to repeat several digit strings. During the verification session, the application could then ask the user to repeat randomly generated digit strings.

■ Text-independent mode — performs verification against a phrase that was not necessarily used for training, and the verifier does not know what the verification phrase should be.

For example, after the user makes an identity claim, the application could perform verification on a random utterance selected by the user, and the utterance does not have to be in the application grammar.

For both the text-dependent and text-prompted modes, the verifier must know what the caller is expected or required to say. The application communicates this information to the verifier by setting the Nuance parameter sv.RequiredPhrase to the required phrase. The required phrase must be a valid phrase in the current recognition grammar.


Determining Validity

Internally, the verifier uses valid utterances only for training or verification. If a required phrase is in effect, the utterance is considered valid if it exactly matches the recognition result returned by the recognizer for that utterance (or, if N-Best processing is in effect, exactly matches one of the recognition result hypotheses returned by the recognizer). If no required phrase is in effect, then any utterance not rejected by the recognizer is considered valid. In both training and verification mode, the verifier provides an interface that enables the application to find out how many valid utterances the verifier has processed so far in the current session. This is especially useful to the application during training, since the verifier itself imposes no limits on the number of utterances during training.
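The validity rules above can be expressed as a small predicate. This is a sketch; the N-best hypotheses are modeled as a plain list of strings:

```python
# Sketch of utterance validity as described above: with a required phrase in
# effect, the utterance is valid only if the phrase exactly matches one of
# the recognition hypotheses; with no required phrase, any utterance not
# rejected by the recognizer is valid.

def is_valid_utterance(required_phrase, hypotheses, rejected):
    if rejected:
        return False
    if required_phrase is None:
        return True                       # no required phrase: any accepted utterance
    return required_phrase in hypotheses  # must exactly match a hypothesis

# Text-prompted example: the caller was asked to say "one two three"
print(is_valid_utterance("one two three", ["one two three", "one to three"], False))  # True
print(is_valid_utterance("one two three", ["want you free"], False))                  # False
```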

About Verifier Processing Modes

By default, the Nuance verifier works in real time with recognition; each spoken utterance is processed by the verifier as it is received. The Nuance verifier also supports a verification data buffering mode (for both training and verification). In this mode, all utterances sent to the recognizer are stored in an internal buffer, which the verifier can process later, on application demand.

You can use the data buffering facility to support a simultaneous claim and verification application user interface. In this design, the initial utterance collected from the caller uniquely identifies that caller by a recognition-based method (for example, by specifying an account number or telephone number). The application database is designed so that this caller identification string is also the database key for the subscriber's voiceprints in the voiceprint or some other subscriber-related database.

If the indicated subscriber is not found in the database, the application turns on the verifier in training mode and initially trains on the buffered data. If the subscriber is found in the database, the application turns on the verifier in verification mode and initially verifies on the buffered data. If more than one utterance is needed to complete the training or verification session, the application turns data buffering off, and subsequent utterances are processed by the verifier in real-time. In this way, the caller does not need to supply two utterances to gain access to the system (the first to identify themselves and the second to authenticate that identity claim).
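The simultaneous claim-and-verification flow can be sketched as follows. This is illustrative only; the database lookup and cell actions are modeled as plain Python, and the example account number is invented:

```python
# Sketch of the simultaneous claim-and-verification design described above.
# The caller's first utterance yields an identification string; because that
# string is also the voiceprint database key, one utterance serves both to
# identify the caller and to supply buffered verification data.

def handle_first_utterance(caller_id, voiceprint_db):
    if caller_id in voiceprint_db:
        return "verify"  # SV cell: verify buffered data against the voiceprint
    return "train"       # SVT cell: train a new voiceprint on the buffered data

db = {"4085551212": "voiceprint-data"}
assert handle_first_utterance("4085551212", db) == "verify"
assert handle_first_utterance("4085559999", db) == "train"
```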

Storing Nuance Voiceprints

Databases are used for storing Nuance voiceprints in exactly the same manner as they are used for storing Nuance dynamic grammars. Each voiceprint has a unique key and resides in one or more physical records in a particular database. Adaptation of a voiceprint updates the voiceprint's data in place in the database. The database type can be either a


third-party relational (for example, Oracle) database or the Nuance-provided file system type database, which is used only for development purposes. In either case, the database must be initialized or created using the Nuance nuance-database-admin utility.

About Speaker Verification Databases

CSS does not provide a single cell to perform either verification training or verification sessions. Instead, CSS provides interfaces that:

■ Turn the verifier on and off, in either training or verification mode

■ Set and get verification-related Nuance system parameter values

■ Invoke Nuance-specific verification-related functions and return their results to the CSS application.

This design is consistent with the Nuance system design, in which the verifier interfaces are orthogonal to the recognition interfaces, and avoids creating a duplicate CSS interface for performing recognition (which is accomplished using the VINP and LISTEN cell). It also gives the application developer complete flexibility in the design of the training and verification session dialogues, by combining the cells that access the Nuance verification functions with the VINP and LISTEN cells to initiate the recognitions.

In order to initialize and access Nuance speaker verification databases in the CSS-Nuance integration:

■ The databases must be configured or created to hold Nuance verification voiceprints, using the nuance-database-admin utility.

■ If the database is a relational database, the default Nuance database account (nuance:nuance) must be a valid account on the database server system.

■ The databases must be specified when the recognizers are configured on the Telephony Server system, and a unique dbhandle specified for each. All such databases are automatically opened during Telephony Server start-up, and automatically closed during Telephony Server shutdown.

■ A CSS application must specify the voiceprint key in any CSS cell that accesses a Nuance verification voiceprint. If more than one verification database has been configured, the CSS application must specify the dbhandle:key.


About Speaker Verification Cells

CSS applications access the Nuance verifier functionality using the following CSS cells:

■ SVT (Speaker Verification Training) cell — starts and stops the verifier in verification training mode

■ SV (Speaker Verification) cell — starts and stops the verifier in verification mode

■ SET (Set parameter) cell (optional) — sets verification-related Nuance parameters, and provides an interface for invoking optional verification-related Nuance functions

■ VINP and LISTEN cells — perform a Nuance recognition (with the verifier enabled at the time)

■ GETSV (Get Speaker Verification results) cell — gets the verifier's cumulative verification decision (authentication or rejection) and scores metric data

■ GET (Get parameter) cell (optional) — retrieves Nuance-specific verification-related parameter values

The SVT, SV, and GETSV cells provide interfaces to all common (vendor-independent) speaker verification functionality. Access to non-common, Nuance-specific functionality is provided by setting or retrieving Nuance system parameters and CSS Nuance pseudo-parameters.

Attempts to turn the verifier on in a different mode (training or verification) when it is already on in the other mode result in an error. Attempts to turn the verifier on in the mode in which it is already running are silently ignored; they have no effect and do not generate an error.
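The on/off semantics above can be summarized in a small state sketch. The class and return values here are invented for illustration; they are not a CSS interface:

```python
# Sketch of the verifier mode rules: turning the verifier on in the other
# mode is an error; turning it on in its current mode is silently ignored.

class VerifierState:
    def __init__(self):
        self.mode = None  # None = off, else "training" or "verification"

    def turn_on(self, mode):
        if self.mode is None:
            self.mode = mode
            return "started"
        if self.mode == mode:
            return "ignored"  # already on in this mode: no effect, no error
        raise RuntimeError("verifier already on in %s mode" % self.mode)

v = VerifierState()
v.turn_on("training")   # returns "started"
v.turn_on("training")   # returns "ignored"
# v.turn_on("verification") would raise an error
```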

SVT Cell

The SVT cell starts or stops the verifier in verification training mode. In training mode, the Verification Train parameter specifies whether any collected voiceprint data is committed to the verification database. This parameter can be set to Start, Stop commit, or Stop abort.

The application must also specify the database key of the voiceprint to be trained (or, if the application uses multiple databases, the dbhandle:key). This information is supplied in the Speaker Model input parameter. An SVT cell that ends a successful training session (has the Verification Train parameter set to Stop commit) automatically creates the specified voiceprint in the database.

Note: The Nuance verifier supports voiceprint adaptation during a training session. This occurs automatically when a successful training session is performed on an existing voiceprint in the database. However, this feature is not necessary in a typical speaker verification application. The use of this feature is discouraged by Nuance, because it significantly increases the size of the voiceprint in the database.


If data buffering is enabled (see SET Cell), an SVT cell that turns on the verifier automatically trains on any buffered data, then turns data buffering off so that the verifier is left on in real-time training mode; valid utterances subsequently recognized are added to the training data.

The SVT cell takes either the Success or Error branch; it never takes the Timeout branch.

SV Cell

The SV cell starts or stops the verifier in verification mode, as specified by the cell's Verification parameter setting. You specify the claimant voiceprint in the Speaker Model parameter, using the same syntax as the SVT Cell.

If data buffering is enabled (see SET Cell), an SV cell that turns on the verifier automatically verifies the claimant voiceprint against any buffered data, then turns data buffering off, so that the verifier is left on in real-time verification mode; valid utterances subsequently recognized are processed in the verification session.

The SV cell takes either the Success or Error branch; it never takes the Timeout branch.

SET Cell

The SET cell provides an interface for performing operations that have no interface in the SV or SVT cells, such as:

■ Specifying the required phrase — done by setting the Nuance parameter sv.RequiredPhrase to the required phrase text.

■ Turning data buffering on or off — done by setting the CSS parameter ver.DataBuffering to True or False.

In most applications, data buffering is turned off automatically when the application encounters an SVT or SV cell. The SET cell provides an interface for explicitly turning data buffering off.

■ Excluding the most recent utterance (the one processed by the most recent LISTEN cell in the application) from the verification training or verification data — done by setting the CSS parameter ver.IgnoreLastSVUtterance to True.

If the utterance was valid (it matched the required verification utterance), it is removed from the cumulative results, and the count of the number of valid utterances is reduced by one.


VINP and LISTEN Cells

These cells are used by your application to perform a recognition (see VINP Cell on page 30 and LISTEN Cell on page 31). If the verifier is on in verification mode when the LISTEN cell is encountered, after recognition completes, the utterance is processed by the verifier, and the verification results are stored by CSS for subsequent retrieval by the GETSV or GET cells (see GETSV Cell and GET Cell below).

If the verifier is on in training mode when the LISTEN cell is encountered, after recognition completes, the utterance is processed by the verifier and, if valid, added to the training data. If data buffering is on when the LISTEN cell is encountered, the utterance data is stored by the Nuance recognizer for subsequent use by the verifier (when an SVT or SV cell is encountered in the application, as described above).

GETSV Cell

The GETSV cell determines whether or not the verifier has processed enough valid utterances to make a decision, and if so, to return that decision. It can retrieve Nuance-specific raw score information about the utterances processed by the verifier.

If the verifier has not processed enough valid utterances to make a decision, the GETSV cell takes the More Data Needed branch, and none of the output buffer contents is significant.

If the verifier has processed enough valid utterances to make a decision, the GETSV cell takes the Success branch, and sets the Acceptance output buffer to indicate the decision: 1 for authenticated, or 0 for rejected. The Score output buffer is filled in with the verifier's raw score for the most recent utterance (if it was valid), and the Cumulative Score output buffer is filled in with verifier's cumulative raw score for all valid utterances processed in the current session.

Note: These values are highly Nuance-specific, and are not generally of much application interest.

If no recognizer is allocated to the call when the GETSV cell is encountered in your application, the cell takes the No Resource branch. The cell takes the Error branch if any other error condition occurred that prevents CSS from returning the verification results information. The cell never takes the Timeout branch.
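The GETSV branch and output-buffer semantics can be condensed into a sketch. The field names follow the guide's output-buffer names, but the result structure itself is invented for illustration:

```python
# Sketch of interpreting a GETSV result. On the More Data Needed branch the
# output buffers are not significant; on the Success branch the Acceptance
# buffer carries the decision (1 = authenticated, 0 = rejected).

def interpret_getsv(result):
    if result["branch"] == "More Data Needed":
        return None  # verifier needs more valid utterances; ignore the buffers
    if result["branch"] == "Success":
        return "authenticated" if result["Acceptance"] == 1 else "rejected"

print(interpret_getsv({"branch": "Success", "Acceptance": 1}))  # authenticated
print(interpret_getsv({"branch": "More Data Needed"}))          # None
```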


GET Cell

The GET cell provides an interface for retrieving information not returned by the GETSV cell, namely:

■ Whether or not the most recent utterance processed by the verifier was valid.

To return this information, request the value for the CSS parameter ver.isValidUtt, which returns a 1 if the utterance was valid or a 0 if it was not. You can use this value to determine whether the Score output of the GETSV cell is meaningful.

■ The number of valid utterances processed by the verifier in the current session.

To return this information, request the value for the CSS parameter ver.NumValidUtt. This value can be used to determine whether or not the Cumulative Score output of the GETSV cell is meaningful. More importantly, the application can use it to determine whether or not to continue collecting utterances during a training session.

■ The verification decision based only on the most recent utterance.

To return this information, request the value for the CSS parameter ver.SVDecision, which returns a 1 to indicate accepted/authenticated, or a 0 to indicate rejected.

■ Whether or not the model was adapted during the last utterance.

To return this information, request the value for the CSS parameter ver.isSVModelAdapted, which returns a 1 if the model was adapted or a 0 if it was not. This applies only when the verifier is in verification mode, not in training mode.

■ The number of speech frames that comprised the last utterance.

To return this information, request the value for the CSS parameter ver.NumFrames. This information can be used during training to ensure that utterances have a minimum length.


Chapter 6: CSS Nuance ASR Advanced Features

This chapter presents information about the advanced features of the CSS Nuance ASR system.

This chapter contains the following topics:

■ About Utterance Recording

■ Formatted Text Recognition Results

■ N-Best List Size Considerations

About Utterance Recording

The Nuance recognition system is capable of recording the caller's utterance into a specified file. This action takes place any time the Nuance parameter client.WriteWaveforms is set to True.

The following additional Nuance parameters control details of the created utterance files:

■ wavout.AutoFilenameExtension

■ wavout.FileFormat

■ client.RecordFilename

■ client.RecordDirectory

■ client.RecordCounter

With the default settings of these parameters, the utterance files are created in NIST SPHERE format (8-kHz mu-law audio with a 1-Kbyte prepended header), in the directory you specified when you used the SCI to configure the Nuance option. Each utterance file has a unique name, automatically generated by the RecClient, with a .wav extension. For additional details on these parameters, see the Nuance documentation or associated HTML pages.

There are two reasons why you might need to enable utterance recording:


■ For application purposes (presumably, to enable playback of the recorded utterance to the caller or another user of the system at a later time).

See Setting Up Utterances for Application Playback on page 62.

■ For recognizer tuning purposes.

See Setting Up Utterance Recording for Call Logging Purposes on page 63.

Setting Up Utterances for Application Playback

To record utterances for application playback purposes, you must:

■ Specify that utterance recording is to be performed when the recognizers are configured on the Telephony Server system.

This includes specifying the directory on the Telephony Server in which the utterance files are to be created.

■ Specify that utterance recording is to be performed in your CSS application.

This is done on a per-recognition basis in the LISTEN cell, by setting the Store Speech as Utterance? input parameter to Yes and setting the Utterance Filename input parameter to the name of the file to be created.

Note: You specify only the file name, not the full path name of the file.

About Utterance Processing by CSS Applications

When a call is processed by your application and the LISTEN cell is encountered in the call flow, CSS stores the utterance file in the path name generated by appending the filename specified in the LISTEN cell to the directory specified when the recognizer allocated to the call was configured. Upon exiting the cell, the system writes the generated path name to the Full Path for Utterance File output buffer of the LISTEN cell.

Note: The LISTEN cell does not specify the full path name of the file, but only the name of the file. The full path name of the created utterance file is returned to the LISTEN cell.
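The path construction can be sketched as a simple join. The directory value below is a hypothetical example; CSS performs this combination internally:

```python
import os

# Sketch of how CSS forms the utterance file path: the directory comes from
# the recognizer configuration, the file name from the LISTEN cell.
# (The directory shown is a hypothetical example value, not a CSS default.)

def utterance_path(configured_directory, listen_cell_filename):
    return os.path.join(configured_directory, listen_cell_filename)

full_path = utterance_path("/var/css/utterances", "caller_greeting")
# full_path == "/var/css/utterances/caller_greeting"
# Note that no .wav extension is appended in this case (see below).
```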

When utterance file recording is initiated by the CSS application:

■ No .wav extension is added to the file; the returned name is exactly what was specified by the application.

■ Internally, CSS temporarily changes the values of the Nuance utterance recording-related parameters, as necessary, before initiating the recognition, and restores their original values after the recognition completes.


Playing Back Utterance Files

You use the PNAME cell to play back utterance files that were recorded using the LISTEN cell. To do this, you set the File Name input parameter of the PNAME cell to the value of the Full Path for Utterance File output buffer of the LISTEN cell, not to the Utterance Filename input parameter. In other words, the PNAME cell takes the full path name of the utterance file as input, not just the file name.

Creating Utterances

In general, the following rules determine whether a requested utterance file is created:

■ Success branch — the utterance file is created if the input was speech, and not if the input was DTMF

■ Rejection branch — the utterance file is created only if the Exception Status output buffer has the value 0. A value of 0 means that the Rejection branch was taken because the utterance was truly rejected by the Nuance recognizer, and not because the caller spoke too soon.

■ Timeout and Error branches — the utterance file is never created

In any case, the contents of the Full Path for Utterance File output buffer indicate whether or not the utterance file was created. If the file was created, the buffer contains the file's full path name; if it was not, the buffer contains the empty string (“”).
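These rules can be condensed into a small predicate. This is a sketch; the branch names and the Exception Status convention are as described above:

```python
# Sketch of the utterance-file creation rules described above.

def utterance_file_created(branch, input_was_speech=True, exception_status=None):
    if branch == "Success":
        return input_was_speech       # created for speech input, not for DTMF
    if branch == "Rejection":
        return exception_status == 0  # 0 = genuine rejection, not spoke-too-soon
    return False                      # Timeout and Error: never created

assert utterance_file_created("Success", input_was_speech=True)
assert not utterance_file_created("Success", input_was_speech=False)
assert utterance_file_created("Rejection", exception_status=0)
assert not utterance_file_created("Timeout")
```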

If your application specifies utterance recording in the LISTEN cell, but you forget to configure the recognizers for utterance recording, the application's request is silently ignored. The requested utterance file is not created; but this does not cause the cell to take the Error branch.

If you configure your recognizers to do utterance recording, but do not specify it in your application, the Telephony Server creates a file name, and records the utterance into that file.

Setting Up Utterance Recording for Call Logging Purposes

To optimize your application's performance, you must go through an initial limited-deployment phase in which a small volume of live calls are taken by the CSS system, and data is collected on the Nuance recognition system's performance for those calls. The data is collected by the Nuance call logging facility. If client.WriteWaveforms is set to True, the logged data includes recordings of all the utterances processed by the Nuance system or a filtered sample of those utterances.


The Nuance call logging facility is described in the Nuance Speech Recognition System v8.0 Nuance Platform Integrator's Guide, and its integration with the CSS system is described in Chapter 8 of this Guide.

Note: When recording utterances for recognition tuning (call logging) purposes, do not use the Telephony Server resource configuration interface (the SCI) to configure the Nuance recognizers for utterance recording. Doing so interferes with the correct generation of the Nuance call logs; specifically, it causes the utterance files to be recorded to the wrong place.

Formatted Text Recognition Results

As mentioned in Chapter 2, the complete Nuance recognition result is returned to the LISTEN cell in a specially coded string. This string has an undocumented format that is understood by the Nuance-specific recognition result parsing cells (NNBEST, NNUMB, and NSLOT). These cells provide access to all the recognition result information that a typical application needs.

However, there are a few additional pieces of information that are contained in the returned result string that are not accessible by these cells (for example, per-slot confidences).

If your application needs to access such information, it may be able to do so by using the Nuance formatted text recognition result facility. This facility allows you to instruct the recognizer to return certain pieces of the recognition result in a text string in a format that you specify. When requested, this text information is included in the specially coded result string that is sent to the LISTEN cell, and can be separated from the rest of that string using the NNBEST cell.

To return recognition result information in an application-defined format, you use the rec.TextResultFormat Nuance parameter. If this parameter is set when a recognition takes place, the specified text string is returned to the LISTEN cell along with the normal encoding of all recognition result information.

Use a subsequent NNBEST cell to extract that string and place it in the Recresult Formatted String output buffer of that cell. You can parse the text string by using a custom (USER function-based) program invoked from, and supplied by, your application.

The format specifiers supported by the rec.TextResultFormat parameter are defined in the RecResultGetTextResult() Nuance API function documentation.
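Because the formatted string's layout is whatever you define with rec.TextResultFormat, the parsing logic is entirely application-specific. As an illustrative sketch only, suppose you chose a format in which each hypothesis comes back as semicolon-separated key=value pairs (the field names below are hypothetical, not Nuance-defined); a USER function-style parser might look like this:

```python
def parse_formatted_result(text):
    """Parse a hypothetical 'key=value;key=value' formatted result string.

    The field names (text, conf, slot.*) are illustrative only; the real
    fields depend on the rec.TextResultFormat specifiers you choose.
    """
    result = {}
    for pair in text.split(";"):
        if not pair.strip():
            continue  # tolerate empty segments
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

hyp = parse_formatted_result("text=four five nine;conf=57;slot.digits=459")
print(hyp["text"])  # four five nine
print(hyp["conf"])  # 57
```

The same approach works for any delimiter scheme you define; the key design point is that your format and your parser are maintained together, since Nuance simply echoes the string you requested.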

Chapter 6: CSS Nuance ASR Advanced Features

N-Best List Size Considerations

The size of the N-Best list returned by the Nuance recognizer is determined by the LISTEN cell's Requested Number of Results parameter, which has a maximum value of five. Internally, this value is communicated to the Nuance recognizer by assigning it to the Nuance rec.NumNBest parameter before the recognition is initiated. The assignment is temporary. After the recognition is complete, the original value of rec.NumNBest is restored.

While the Nuance system itself is capable of returning large N-Best lists, the standard CSS interface limits applications to an N-Best list size of five. This can be impractical for some types of applications.

There are two ways to work around the LISTEN cell N-Best size limitation:

■ Using a Nuance Recognition Context to Set the N-Best Size

■ Using the SET Cell to Dynamically Set the N-Best List Size

Using a Nuance Recognition Context to Set the N-Best Size

You can specify the desired N-Best list size in a Nuance recognition context for the relevant grammar by setting the rec.NumNBest parameter in that context. This overrides the value of rec.NumNBest temporarily set by the LISTEN cell. This method is appropriate if the N-Best list size is fixed for a given grammar.

Note: Even though contexts are statically defined in your grammar definition file, the parameter settings specified in them take effect at run time, just before the recognition is initiated.

You cannot override the N-Best size specified in the LISTEN cell simply by setting rec.NumNBest in your nuance-resources file (outside of a context definition). Settings specified outside of a context take effect at CSS system initialization time and are overridden by settings made at run time (in this case, by the LISTEN cell).

Using the SET Cell to Dynamically Set the N-Best List Size

If you need to set a large N-Best list size dynamically (one which is not always associated with a particular grammar) in your application, you can instruct CSS to ignore the LISTEN cell Requested Number of Results input parameter value, thereby letting this recognition parameter be controlled by the rec.NumNBest Nuance parameter.

You do this by setting the SET cell vil.Ignore_LISTEN_NBest parameter to True in your application before the recognition is initiated by the LISTEN cell. Once this parameter is set, you can specify the desired rec.NumNBest value through any method (for example, in your nuance-resources file or in a SET cell in your application).

Note: After the LISTEN cell executes, the default LISTEN cell behavior (specifying the N-Best list size by the Requested Number of Results input parameter) is restored.

About Message Size Limitations

If you need to have large N-Best lists returned to your application, it is important to understand the information presented in this section.

Communications between the Runtime Server software and the Telephony Server software are message-based, with a maximum message data size of approximately 10,000 characters. If the recognition result string for a given recognition request exceeds this limit, no results are returned to the application. The LISTEN cell still takes the Success branch, but no results are actually returned to it. You can distinguish this case from a true LISTEN cell success because the Actual Number of Results output buffer contains the value 0.

There is no completely reliable solution to this problem. However, depending on the needs of your application, you may be able to avoid it by having the recognition result returned to the LISTEN cell in a more concise format, or in a format that omits information ordinarily returned by the Nuance recognizer that your application does not actually need. To do this, you must:

■ Specify the information contents and format to be included in the returned recognition result by setting the rec.TextResultFormat Nuance parameter.

For more information, see Formatted Text Recognition Results on page 64.

■ Precede the LISTEN cell in your application with a SET cell that sets the CSS parameter vil.TextResultStringOnly to True.

This instructs CSS to return to the LISTEN cell only the information (contents and format) specified by your rec.TextResultFormat Nuance parameter setting, rather than that information plus all recognition result information in the default format (the normal LISTEN cell behavior).

Using this method, you can determine the maximum number of characters that will be included in each returned hypothesis of the N-Best list, and therefore, what the largest N-Best list is that can reliably be returned to your application.
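Given the approximately 10,000-character message limit described above and a worst-case per-hypothesis size that you measure for your own text result format, a back-of-the-envelope estimate of the largest safe N-Best list might look like the following sketch (the overhead figure is an assumption you should measure on your own system):

```python
MAX_MESSAGE_CHARS = 10_000  # approximate Runtime/Telephony Server message limit

def max_nbest_size(chars_per_hypothesis, overhead_chars=0):
    """Estimate the largest N-Best list that fits in one message.

    chars_per_hypothesis: worst-case characters for one formatted hypothesis
    overhead_chars: fixed framing overhead, if any (assumption -- measure it)
    """
    if chars_per_hypothesis <= 0:
        raise ValueError("chars_per_hypothesis must be positive")
    return (MAX_MESSAGE_CHARS - overhead_chars) // chars_per_hypothesis

# e.g. 80-character hypotheses leave room for roughly 125 entries
print(max_nbest_size(80))  # 125
```

Treat the result as an upper bound rather than a guarantee, and leave a safety margin.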

Chapter 7: Configuring the Nuance ASR Option

This chapter describes how to configure the Nuance ASR option on the Telephony Server and the Nuance Recognition Server. These are normally separate systems. However, you can configure the Telephony Server as a Recognition Server for development systems.

Note: When you install the Nuance ASR option as part of the Aspect CSS installation, the Nuance software is installed at C:\Nuance\v8.0.0 by default for Nuance v8.0. This path is also set as the environment variable %NUANCE%. This chapter refers to the path where you installed the Nuance ASR software as %NUANCE%.

For information about installing the Nuance ASR option, see the Aspect Customer Self-Service v7.0 Installation and Upgrade Guide.

This chapter contains the following sections:

■ Configuring Nuance on the Telephony Server

■ Configuring Nuance on the Recognition Server

■ Configuring Nuance through the SCI

■ Nuance Configuration Tips

Configuring Nuance on the Telephony Server

Configure the Recognition Clients (recclient), Resource Manager (rm), and License manager (lm) on the TS server. If you do not plan to use a separate system for the Recognition Server, add the contents of the Recognition Server configuration (watcher.config) file to the TS configuration file.

Note: In the following procedures, command-line entries are single-line commands. Lines indented after the first line indicate a continuation of the single-line entry. Do not break command lines when inputting them into the system.

To configure the Nuance ASR option on the Telephony Server:

1. Log in to the Telephony Server as system administrator.

2. Create the following:

– In the %NUANCE% folder (C:\Nuance\v8.0.0), create a folder named Aspect.

– In the \Aspect folder, create a folder named Log.

3. Obtain a Nuance ASR license file from CSS and copy it to the following path on the TS server: %NUANCE%\Aspect\license.dat

4. In the \Aspect folder, create a watcher.config file that contains the following command lines:

– One entry for license manager:

%NUANCE%\bin\win32\nlm.exe %NUANCE%\Aspect\license.dat watcher.RestartOnFailure=TRUE

– One entry for resource manager:

%NUANCE%\bin\win32\resource-manager watcher.RestartOnFailure=TRUE config.LogFileNamePrefix=rm config.LogFileRootDir=%NUANCE%\Aspect\log

– One configured Recognition Client for every 24 recognition channels. Use a different port and log file prefix for each config.RecClientPort entry:

%NUANCE%\bin\win32\recclient -nthreads 12 watcher.RestartOnFailure=TRUE config.RecClientPort=7879 config.LogFileNamePrefix=recclient1 config.LogFileRootDir=%NUANCE%\Aspect\log

5. To specify the watcher.config file as the startup file to be used by the Nuance Watcher service, enter the following at the command prompt:

%NUANCE%\bin\win32\watcher-daemon-win32-service-init -a config.LogFileRootDir=%NUANCE%\Aspect\log config.LogFileNamePrefix=watcher "wm.snmp.MibDirs=%NUANCE%\data\mibs;%VWS%\data\mibs" watcher.Modules=http,snmp watcher.QuiesceTimeoutMs=2000000000 watcher.DaemonStartupFilename=%NUANCE%\Aspect\watcher.config

6. Navigate to http://localhost:7080/ to open the Service Control Manager and start the Nuance Watcher Daemon.

7. Verify that all specified Nuance processes are running.
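Because each RecClient entry in watcher.config differs only in its port number and log-file prefix (one entry per 24 recognition channels, as in step 4), the entries can be generated rather than typed by hand. A sketch, assuming the same 7879 base port and recclientN naming used in this chapter; adapt both to your site's conventions:

```python
def recclient_entries(num_channels, base_port=7879):
    """Generate one watcher.config RecClient line per 24 recognition channels.

    Port numbering and log-prefix naming follow the example in this chapter;
    adjust both to match your site's conventions.
    """
    num_clients = -(-num_channels // 24)  # ceiling division
    lines = []
    for i in range(num_clients):
        lines.append(
            r"%NUANCE%\bin\win32\recclient -nthreads 12 "
            "watcher.RestartOnFailure=TRUE "
            f"config.RecClientPort={base_port + i} "
            f"config.LogFileNamePrefix=recclient{i + 1} "
            r"config.LogFileRootDir=%NUANCE%\Aspect\log"
        )
    return lines

# a 240-channel system needs 10 RecClients on ports 7879-7888
for line in recclient_entries(240):
    print(line)
```

Paste the generated lines into %NUANCE%\Aspect\watcher.config, then restart the Nuance Watcher service so they take effect.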

Configuring Nuance on the Recognition Server

To configure the Nuance ASR Recognition Server:

1. Log in to the Nuance Recognition Server as system administrator.

2. In the %NUANCE% (C:\Nuance\v8.0.0) folder, create a folder named Aspect.

3. In the \Aspect folder, create a folder named Log.

4. In the \Aspect folder, create a watcher.config file that contains:

– One entry for recognition server:

%NUANCE%\bin\win32\recserver watcher.RestartOnFailure=TRUE config.LogFileNamePrefix=recserver1 config.LogFileRootDir=%NUANCE%\Aspect\log config.ServerPort=9201 -package %NUANCE%\sample-packages\digits rm.Addresses=TS_SERVER_NAME:7777

Where TS_SERVER_NAME is the name or IP address of the TS machine.

– One entry for the compilation server:

compilation-server watcher.RestartOnFailure=TRUE config.ClapiConfigFile=%NUANCE%\Aspect\clapi_conf.xml config.LogFileNamePrefix=USEngCompS1 comp.MergeDictionary=C:\Nuance\v8.0.0\Aspect\Packages\digits1\vws_dictionary config.LogFileRootDir=%NUANCE%\Aspect\log -config_name USEngCompilationServer -package %NUANCE%\Aspect\Packages\digitsJIT comp.Port=10100 rm.Addresses=TS_SERVER_NAME:7777

Where TS_SERVER_NAME is the name of the TS machine.

5. To specify the watcher.config file as the startup file to be used by the Nuance Watcher service, enter the following at the command prompt:

%NUANCE%\bin\win32\watcher-daemon-win32-service-init -a config.LogFileRootDir=%NUANCE%\Aspect\log config.LogFileNamePrefix=watcher "wm.snmp.MibDirs=%NUANCE%\data\mibs;%VWS%\data\mibs" watcher.Modules=http,snmp watcher.QuiesceTimeoutMs=2000000000 watcher.DaemonStartupFilename=%NUANCE%\Aspect\watcher.config

6. Navigate to http://localhost:7080/ to open the Service Control Manager and start the Nuance Watcher Daemon.

7. Verify that all specified Nuance processes are running.

Note: On a single system that houses both the Telephony Server and the Recognition Server, combine the watcher.config entries from step 4 of this procedure with those from step 4 of the Telephony Server procedure in a single watcher.config file.

Configuring Nuance through the SCI

The following example shows how to perform an initial configuration. Refer to the Aspect Customer Self-Service v7.0 Service Console Interface Guide for the Telephony Server for information about modifying the configuration.

Note: Before configuring the Nuance ASR option, compile the grammars as described in Compiling a Grammar in Chapter 2, Basic Speech Recognition.

To configure Nuance through SCI:

1. Log in to the Telephony Server as system administrator.

2. Double-click the Service Console Interface icon on the desktop.

3. Navigate to setup/options/asr and enter nuance.

The system displays the following prompt:

Please enter the number of recognizers you want to add

4. Enter the number of speech recognizers to make available in this process. The maximum number supported depends on the number of resources available on your telephony boards.

For example, if using a single D240JCT-T1 running a CAS protocol, the maximum number of recognizers is 24.

The system displays the following prompt:

Enter Nuance rec-client system info, one at a time
When complete, press Enter
Enter system name

5. Enter the name of the TS machine where the Nuance RecClients will run.

The system displays the following prompt:

Enter the port number

6. Enter the port number for the Nuance RecClient. Use the recClient port number specified in the watcher.config file on the Nuance client system.

The system displays the following prompt:

Continue entering rec-client system names or press Enter

7. To add systems, return to step 5. If this is the last system, press ENTER to finish.

Enter a Nuance RecClient for every 24 available Nuance recognizers. For example, to support a 240-recognizer system, enter 10 Nuance RecClients, each using a separate TCP/IP port.

The system displays the following prompt:

Enter the language for this package e.g. en-US
When complete, press <CR>
Enter language:

8. Enter the language associated with this package.

The system displays the following prompt:

Enter the compilation server for this package
When complete, press <CR>
Enter compilation server:

9. Enter the name of the compilation server on which this package is loaded.

The system displays the following prompt:

Enter the name of this package
When complete, press <CR>
Enter package name:

10. Enter the name of the package. This name corresponds to the name you used when you compiled the grammar.

The system displays the following prompt:

Enter the FULL path of nuance package directory
Enter packages one at a time
When complete, press <CR>
Enter path:

11. Enter the full path to the Nuance grammar directory.

The system displays the following prompt:

Continue Enter the language for another package or <CR>

12. To add additional languages, return to step 8. If this is the last language, press ENTER.

13. When prompted to Configure use of Dynamic Grammars or Voice Enrollment? Current Yes/No? = yes, enter no.

The system displays the following prompt:

Configure use of Speaker Verification?
Current Yes/No? = yes
yes - Configure for Speech Verification
no - Do not configure for Speech Verification

14. Enter yes to perform Speaker Verification. You must also configure a database to hold the speaker data once it is generated. See Nuance documentation for details.

The system displays the following prompt:

Must configure at least one database for Speaker Verification
Current Type of database? = Filesystem
Done - Done entering databases.
Filesystem - Filesystem database.
Oracle - Oracle database.
Custom - Custom type of database, use at own risk.
Type of database? =

15. Enter the type of database to use. For example, if you plan to use local disk space to store the data, enter Filesystem.

The system prompts you for the dbroot.

16. Enter the full path to the directory where the data will be stored.

If your system is running with an external recognition server, both the recognition server and the TS server must be able to access the database.

The system prompts you for the dbname.

17. Enter a unique database name that Nuance uses to access the database where speaker verification data is stored.

The system prompts you for the dbhandle.

18. Enter a unique database handle that Nuance uses to access the database where speaker verification data is stored.

The system displays the following message and prompt:

Current Type of database? = Filesystem
Done - Done entering databases.
Filesystem - Filesystem database.
Oracle - Oracle database.
Custom - Custom type of database, use at own risk.
Type of database? =

19. If this is the last database, enter done. Otherwise, return to step 15.

The system displays the following prompt:

Do you wish to save utterance to a file? = no
yes - Save utterance to a file
no - Do not save utterance to a file

20. If you do not want to save utterances to a file, type no and skip to step 22. If you want to save utterances to a file, type yes.

The system prompts you for the path name.

21. Enter the directory path.

22. At the /sci/setup/options> prompt, enter done.

23. At the /sci/setup> prompt, enter done to save the configuration data.

Nuance Configuration Tips

■ At least one Nuance recognition package must be configured to enable the Aspect Nuance process to run.

■ Run the External Recognition client on the same machine where the Dialogic boards are installed (Nuance requirement).

■ Run Nuance License and Resource Managers locally on the TS server. If running the Nuance License or Resource Managers remotely, change the address setting in %NUANCE%/data/nuance-site.config.

■ Verify that the databases for Speaker Verification and dynamic grammars are shared across the Nuance Recognition Server and the CSS Telephony Server, and that both servers use the same connection settings. For example, file system databases should map to the same drive and location on both servers.

■ When the CSS Telephony Server and the Nuance Recognition Server are on different machines, enable access to the common database by putting the machines in the same domain and giving the Nuance process access to the database. If the database machine is in a different domain, run the Nuance process (Nuance Watcher Service) from an account that has access to the databases.

■ Verify that the packages are shared between the Nuance Recognition Server and CSS Telephony Server and use the same path.

■ Verify that the Dialogic hardware is correctly configured. Refer to Chapter 3 in the Aspect Customer Self-Service Installation and Upgrade Guide.

Chapter 8: Using Nuance ASR with VXML

Use the CSS VXML cell to run a VXML script. See the Aspect Customer Self-Service Cell Reference Guide and the cell online help for information about the VXML cell. This chapter includes information about writing VXML scripts to be compatible with both CSS and the Nuance ASR option.

Grammar Support

The Aspect CSS Nuance ASR option supports the following types of grammars:

■ Built-in grammars— included in the VXML interpreter. These grammars are provided in the file %NUANCE%\Aspect\builtin.war. Host this file on a web application server. In the file %NUANCE%\data\nuance.config, set the parameter egr.builtin.context to the path where the web application server hosts the builtin.war file.

■ In-line grammars— all the items in an in-line grammar are listed in the VXML <grammar> tag. These grammars must be in the GRXML format.

■ External grammars— stored in external files. These grammars can be in either the Nuance Grammar Specification Language (GSL) or GRXML format.

Just-In-Time Grammars

To use Nuance ASR with VXML scripts, use a Just-in-Time (JIT)-enabled package on the Telephony Server. For example, create a JIT-enabled package named “USEngPkg” in the directory “%NUANCE%\Aspect\Packages\digitsJIT” using digits.grammar as a template.

%NUANCE%\Bin\win32\nuance-compile %NUANCE%\sample-packages\digits.grammar English.America -enable_jit -o %NUANCE%\Aspect\Packages\digitsJIT -package_name USEngPkg

After creating the package on the Telephony Server, copy the package to the same directory on all Recognition Servers. For example, copy the packages to %NUANCE%\Aspect\Packages\digitsJIT.

Parameter Settings

To set Nuance ASR-specific parameters:

■ Modify the nuance.config file to include the parameter name and settings.

■ Add the parameter to the VXML script in the format nuance.parameter_name.

Not all parameter settings can be set at runtime. See the Nuance documentation for information about specific parameters.

Grammar Files

When you refer to grammar files in your VXML script, follow these conventions:

■ http: URI http://URI_of_grammar#RuleName

For example: http://grammar_server.com/groceries.gsl#Nuts

■ file: URI file:/location_of_grammar#RuleName

For example: file:/c:/usr/grammars/groceries.gsl#Nuts

Limitations

The following limitations apply to normal VXML scripting when using the Aspect CSS Nuance ASR option.

■ Each call can use a maximum of 50 grammars.

■ Use grammars of either the GSL format or the GRXML format. The option does not support the ABNF grammar format.

■ To use the <tag> format inside a <grammar> tag, use the following format:

<tag>&lt;variable_name "variable_value"&gt;</tag>
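To tie these conventions together, the following is an illustrative VXML fragment (not taken from a shipped sample) that collects a field using the hypothetical external grammar path and rule name from the examples in this chapter:

```xml
<form id="grocery">
  <field name="item">
    <!-- external GSL grammar, referenced with the file: URI convention -->
    <grammar src="file:/c:/usr/grammars/groceries.gsl#Nuts"/>
    <prompt>Which item would you like?</prompt>
  </field>
</form>
```

The grammar file, its location, and the Nuts rule are the placeholder examples used earlier in this chapter; substitute your own grammar URI and rule name.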

Chapter 9: Nuance Call Logging and CSS

This chapter explains how to enable the Nuance call logging facility for your CSS application.

This chapter contains the following topics:

■ Enabling Nuance Call Logging

■ Additional Logging Considerations

Enabling Nuance Call Logging

To enable the Nuance call logging facility for your CSS application:

1. Manually create the directory on the Telephony Server in which the call logs will be written, for example c:\nuance\v8.0.0\CallLogs.

2. Add the following lines in your nuance-resources file:

client.WriteWaveforms True

behavior.calllog.BasePath c:\nuance\v8.0.0\CallLogs

behavior.calllog.DirectoryLevel hours

where the value of behavior.calllog.BasePath is the directory in which all the Nuance call log data will be written.

Depending on your system traffic volume and application characteristics, the call logs can use up a significant amount of disk space, so be sure you have adequate free space in the file system containing the BasePath directory. The suggested directory name is %NUANCE%\Aspect\CallLogs\; however, you can use any directory in which the containing file system has adequate free space.

Note: Performance problems may result (due to issues with NFS write performance) if the path specified in the BasePath directory is mounted to a remote NFS file system.

The behavior.calllog.DirectoryLevel parameter modifies the default directory hierarchy that the Nuance call logging facility generates for recorded utterance and statistics files. By default, these files are placed in a subdirectory that uniquely identifies each recognition attempt (whose name indicates the time of that attempt), which in turn is stored in a directory that identifies the date of the recognition attempt.

This default directory structure can cause problems: due to the simple linear-search algorithms typically employed for file name lookup, file system performance tends to degrade as the number of files in a directory grows. This can have a moderate to severe impact on your application's performance, because the recognition results are not returned by the recognizer to your application until the utterance recording is complete.

Setting the behavior.calllog.DirectoryLevel parameter to hours instructs the Nuance call logging facility to generate intermediate-level directories (between the one for the day and the ones for each recognition attempt) associated with each hour. This modified structure greatly reduces the number of subdirectories created under any single directory, thereby avoiding the problem.
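To see why the hours setting matters, compare the worst-case per-directory subdirectory counts for a hypothetical traffic load; the figures below are illustrative arithmetic under an even-traffic assumption, not Nuance-measured numbers:

```python
def subdirs_per_directory(recognitions_per_day, directory_level="date"):
    """Worst-case subdirectories created under a single log directory.

    With the default date-level layout, every recognition attempt of the day
    lands in one directory; with DirectoryLevel=hours, attempts are spread
    across 24 hourly directories (assuming roughly even traffic).
    """
    if directory_level == "hours":
        return recognitions_per_day // 24
    return recognitions_per_day

# 48,000 recognitions/day: 48,000 subdirs vs. about 2,000 per hourly directory
print(subdirs_per_directory(48_000))           # 48000
print(subdirs_per_directory(48_000, "hours"))  # 2000
```

Real traffic is rarely even across the day, so size for your busiest hour rather than the daily average.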

Additional Logging Considerations

This section presents additional Nuance Call Logging considerations:

■ Prior to each LISTEN cell, you can optionally include a SET cell that sets the Nuance behavior.calllog.STATE and behavior.calllog.PROMPTS parameters appropriately.

Setting these parameters adds documentation to the generated Nuance call log that describes what your application was doing at the time of each recognition. The values of these parameters are completely application-specific, and are not interpreted in any way by either the Nuance recognition system or the CSS system.

■ You may also want to set one or more of the following Nuance parameters in your nuance-resources file:

– behavior.calllog.MinFreeDiskMB

– behavior.calllog.PF4All

– behavior.calllog.PF4Call

– behavior.calllog.PF4Confidence

– behavior.calllog.PF4Grammar

– behavior.calllog.PF4NLResult

– behavior.calllog.PF4Rejects

The first parameter can be used to ensure that the generated call log data does not inadvertently fill up your file system. The remaining parameters specify filters that determine which utterances are recorded by the call logging facility (for example, whether all or only a randomly-chosen sample of the utterances are recorded).

If you have specific utterances that you explicitly want to exclude from the generated call logs (regardless of the settings of the above filter parameters), you can set the behavior.calllog.ConfidentialUtterance Nuance parameter to True (using a SET cell) at the appropriate points in your application.

For a description of the effect of these parameters, see the Nuance documentation or the associated HTML pages.

■ The values for any and all behavior.calllog Nuance parameters must not contain any white space. This is a Nuance limitation; use underscores or hyphens instead.
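Because white space is disallowed in these values, it can help to sanitize any application-supplied text before handing it to a SET cell; a minimal sketch (the replacement character choice is yours):

```python
import re

def calllog_value(text):
    """Replace runs of white space with underscores so the value is legal
    for behavior.calllog.* Nuance parameters."""
    return re.sub(r"\s+", "_", text.strip())

print(calllog_value("main menu  retry 2"))  # main_menu_retry_2
```

Apply this to values such as those you set for behavior.calllog.STATE and behavior.calllog.PROMPTS.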

■ The current CSS implementation does not inform the Nuance recognizer when the prompts played by the LISTEN cell are completed. For this reason, the Nuance call log indicates that barge-in occurred on every recognition, which is generally erroneous.

■ Utterance recording should not be specified when the Nuance recognizers are configured on the Telephony Server (that is, using the CSS Service Console Interface (SCI)). This CSS configuration setting has to do only with utterance recording for application purposes. Enabling it when the Nuance call logging facility is enabled interferes with the correct generation of the call logs.

Index

C
cells
  GDAT, 31
  LISTEN, 31
  VINP, 30
configuration
  Nuance on the Recognition Server, 69
  Nuance on the TS, 67
  Recognition Server, 67
context
  using to set parameter values, 23
contexts, 23

D
deallocating a Nuance recognizer, 31
dynamic grammars, 13

E
enabling a Nuance recognizer, 31

F
freeing a Nuance recognizer, 28

G
GDAT cell, 31
Grammar Specification Language, 19
grammars, 19
  active, 19
  compiling, 20
  developing, 21
  dynamic, 13
  file, 20
  names, 20
  re-use, 20
  sub grammars, 19
  top-level names, 19
GSL, 19

L
LISTEN cell, 31

N
NGB, 20
Nuance ASR option
  configuration, 67
  configuration tips, 73
  configuration via SCI, 70
Nuance Grammar Builder, 20
Nuance parameters, 21
Nuance recognizer
  deallocating, 31
  enabling, 31
  freeing, 28

O
options
  TS Nuance via SCI, 70

P
package, 20
  configuration parameters, 21
parameters, 21
  Nuance, 21

R
recclient, 67
Recognition Clients, 67
recognition contexts, 23
recognition package, 20
recognition resource configuration
  sparse, 13
Recognition Server, 67
  Nuance configuration, 69
recognizer
  freeing, 28
Resource Manager, 67
rm, 67

S
Speaker Verification, Nuance, 70
sub grammars, 19

T
Telephony Server, 67
  Nuance ASR option, 67

V
VINP cell, 30
vocabmap file, 24