
Speaker Identification and Verification

Dan Burnett, Nuance

58th IETF

Terminology

• Speaker identification -- using utterances from a speaker, determine who the caller is out of a set of known speakers

• Speaker verification -- using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim)

• Training -- using utterances from a speaker to train a unique voiceprint that can later be used to identify or verify that speaker. Applies to both SI and SV.
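The difference between the two tasks can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: voiceprints are modeled as short feature vectors, scoring is cosine similarity, and the names `VOICEPRINTS`, `identify`, `verify`, and the threshold value are hypothetical. None of it comes from the draft or from any Nuance product.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical voiceprints, standing in for the output of training.
VOICEPRINTS = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}

def identify(features):
    """Identification: pick the best match out of the set of known speakers."""
    return max(VOICEPRINTS, key=lambda name: cosine(features, VOICEPRINTS[name]))

def verify(features, claimed_identity, threshold=0.9):
    """Verification: accept or reject one identity claim (threshold is illustrative)."""
    return cosine(features, VOICEPRINTS[claimed_identity]) >= threshold

utterance = [0.85, 0.15, 0.25]      # features extracted from the caller's speech
print(identify(utterance))          # no identity claim needed
print(verify(utterance, "alice"))   # identity claim required
print(verify(utterance, "bob"))
```

Note that `identify` takes no claim and always returns one of the known speakers, while `verify` cannot run without a claimed identity and returns only accept or reject.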

draft-burnett-mrcpext-00.txt

• Created by Nuance and Intervoice

• Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt)

• Based originally on Nuance functionality, modified to be more general

• Starting point for MRCP v2 functionality discussions

• Also extensions for speaker-enrolled grammars, hotword recognition, and to the recognition resource

Proposed SI/SV process (simplified, see section 6.7)

VER-START-SESSION

VER-BUFFERING-START

VER-SET-VOICEPRINT

VER-END-SESSION

VER-DELETE-VOICEPRINT

VER-ROLLBACK

GET-PARAMS

SET-PARAMS

VERIFY

VER-FROM-BUFFER*

VER-BUFFERING-STOP

VER-BUFFERING-CONTROL

VER-FROM-BUFFER*

* Requires active buffering and ver/id sessions.
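The footnote's precondition can be sketched as a small state tracker. This is only an illustration under stated assumptions: the class `VerifierState`, its two flags, and the "accepted"/"rejected" return strings are hypothetical, and the draft (section 6.7) defines the actual state transitions and response codes. In particular, treating the session and buffering flags as independent is an assumption.

```python
class VerifierState:
    """Hypothetical tracker for the two conditions named in the footnote."""

    def __init__(self):
        self.session_active = False    # set by VER-START-SESSION (assumed)
        self.buffering_active = False  # set by VER-BUFFERING-START (assumed)

    def handle(self, method):
        if method == "VER-START-SESSION":
            self.session_active = True
        elif method == "VER-END-SESSION":
            self.session_active = False
        elif method == "VER-BUFFERING-START":
            self.buffering_active = True
        elif method == "VER-BUFFERING-STOP":
            self.buffering_active = False
        elif method == "VER-FROM-BUFFER":
            # Footnote: requires active buffering and ver/id sessions.
            if not (self.session_active and self.buffering_active):
                return "rejected"
        return "accepted"


state = VerifierState()
print(state.handle("VER-FROM-BUFFER"))      # neither condition holds yet
state.handle("VER-START-SESSION")
state.handle("VER-BUFFERING-START")
print(state.handle("VER-FROM-BUFFER"))      # both conditions now hold
```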

Discussion points

• Why buffering?

• Registry for return info

• Anything else before I convert to MRCPv2?

Voice/Text Grammar Enrollment (simplified, see section 5.5)

• Extension to existing recognition resource

• Creates speaker-produced grammar entries

• E.g., voice-enrolled entries for voice dialing

• Both speech and text can be used to create grammar entries

START-ENROLLMENT-SESSION

END/ABORT-ENROLLMENT-SESSION

PAUSE/RESUME-ENROLLMENT-SESSION

ENROLLMENT-ROLLBACK

RECOGNIZE/STOP*

ADD/DELETE/MODIFY-PHRASE

* These methods already exist in the recognizer resource

Hotword (see section 7)

• New recognition resource

• Instead of listening for a set time period, listens continuously until it matches a grammar

• Non-matching speech is ignored and does not affect the state of the recognizer
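The hotword behavior described above can be sketched as a simple loop. As an assumption for illustration, each utterance is represented by its text and the grammar by a set of phrases. The function name `hotword_listen` is hypothetical and is not an MRCP method.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar.

    Non-matching utterances are skipped without changing any state.
    Returns the matching utterance, or None if the stream ends first.
    """
    for text in utterances:
        if text.lower() in grammar:
            return text
    return None


grammar = {"call home", "hang up"}
stream = ["um", "what time is it", "call home", "hang up"]
print(hotword_listen(stream, grammar))  # first grammar match in the stream
```

There is no timeout in the loop: listening ends only on a match (or, in this toy version, when the input runs out), which is the contrast with fixed-period recognition drawn on the slide.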

Other Extensions

• Record method (sec. 4.4)

– Allows end-pointed recording of an audio stream

• Interpret method (sec. 4.5)

– Behaves as a recognition except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.
