Speaker Identification and Verification
Dan Burnett, Nuance
58th IETF
Terminology
• Speaker identification (SI) -- using utterances from a speaker, determine who the caller is out of a set of known speakers
• Speaker verification (SV) -- using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim)
• Training -- using utterances from a speaker to train a unique voiceprint that can later be used to identify or verify that speaker. Applies to both SI and SV.
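The three terms can be contrasted with a toy sketch. It assumes each utterance has already been reduced to a fixed-length feature vector, trains a voiceprint by averaging those vectors, and scores with cosine similarity. The scoring method, the 0.9 threshold and all function names are illustrative assumptions, not what the draft or any Nuance engine specifies.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train(utterances):
    """Training: average per-utterance feature vectors into one voiceprint."""
    return [sum(column) / len(utterances) for column in zip(*utterances)]

def identify(features, voiceprints):
    """Identification: pick the best-scoring speaker from a known set (no claim needed)."""
    return max(voiceprints, key=lambda speaker: cosine(features, voiceprints[speaker]))

def verify(features, claimed_id, voiceprints, threshold=0.9):
    """Verification: accept or reject a single identity claim against a threshold."""
    return cosine(features, voiceprints[claimed_id]) >= threshold

voiceprints = {
    "alice": train([[1.0, 0.1], [0.9, 0.0]]),
    "bob": train([[0.0, 1.0], [0.1, 0.9]]),
}
sample = [0.95, 0.05]
print(identify(sample, voiceprints))         # alice
print(verify(sample, "alice", voiceprints))  # True
print(verify(sample, "bob", voiceprints))    # False
```

Note the asymmetry: identification always returns some speaker from the set, while verification can reject, which is why it needs an identity claim.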
draft-burnett-mrcpext-00.txt
• Created by Nuance and Intervoice
• Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt)
• Based originally on Nuance functionality, modified to be more general
• Starting point for MRCP v2 functionality discussions
• Also extensions for speaker-enrolled grammars, hotword recognition, and to the recognition resource
Proposed SI/SV process (simplified, see section 6.7)
VER-START-SESSION
VER-BUFFERING-START
VER-SET-VOICEPRINT
VER-END-SESSION
VER-DELETE-VOICEPRINT
VER-ROLLBACK
GET-PARAMS
SET-PARAMS
VERIFY
VER-FROM-BUFFER*
VER-BUFFERING-STOP
VER-BUFFERING-CONTROL
* Requires buffering to be active and a verification/identification session to be open.
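The footnote's constraint can be expressed as a small client-side precondition check. This is a hypothetical sketch: the class and its names are invented for illustration, and only the VER-FROM-BUFFER rule comes from the slide. The rules for VER-END-SESSION and VER-BUFFERING-STOP are assumptions, and section 6.7 of the draft remains the authority on the real sequencing.

```python
class SivSessionTracker:
    """Hypothetical tracker of which SI/SV methods a client may send."""

    # Method -> conditions that must be active before it is sent.
    REQUIRES = {
        "VER-FROM-BUFFER": {"session", "buffering"},  # from the slide footnote
        "VER-END-SESSION": {"session"},               # assumed
        "VER-BUFFERING-STOP": {"buffering"},          # assumed
    }

    def __init__(self):
        self.active = set()

    def send(self, method):
        missing = self.REQUIRES.get(method, set()) - self.active
        if missing:
            raise RuntimeError(f"{method} requires active: {sorted(missing)}")
        # Methods without an entry in REQUIRES are allowed unconditionally here.
        if method == "VER-START-SESSION":
            self.active.add("session")
        elif method == "VER-END-SESSION":
            self.active.discard("session")
        elif method == "VER-BUFFERING-START":
            self.active.add("buffering")
        elif method == "VER-BUFFERING-STOP":
            self.active.discard("buffering")
        return method

tracker = SivSessionTracker()
tracker.send("VER-START-SESSION")
tracker.send("VER-BUFFERING-START")
tracker.send("VER-FROM-BUFFER")  # allowed: session and buffering are both active
tracker.send("VER-BUFFERING-STOP")
try:
    tracker.send("VER-FROM-BUFFER")
except RuntimeError as err:
    print(err)  # VER-FROM-BUFFER requires active: ['buffering']
```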
Discussion points
• Why buffering?
• Registry for return info
• Anything else before I convert to MRCP v2?
Voice/Text Grammar Enrollment (simplified, see section 5.5)
• Extension to the existing recognition resource
• Creates speaker-produced grammar entries
• E.g., voice-enrolled entries for voice dialing
• Both speech and text can be used to create grammar entries
START-ENROLLMENT-SESSION
END/ABORT-ENROLLMENT-SESSION
PAUSE/RESUME-ENROLLMENT-SESSION
ENROLLMENT-ROLLBACK
RECOGNIZE/STOP*
ADD/DELETE/MODIFY-PHRASE
* These methods already exist in the recognition resource
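One way to picture an enrollment session is as a set of staged grammar changes. The sketch below is hypothetical: the staging model, the choice that END commits and ABORT discards, and the reading of ENROLLMENT-ROLLBACK as "undo the last staged change" are assumptions, not the semantics defined in section 5.5 of the draft. Voice-enrolled entries are represented by plain text labels, and RECOGNIZE/STOP are not modelled.

```python
class EnrollmentSession:
    """Hypothetical model of a grammar enrollment session (semantics assumed)."""

    def __init__(self, grammar):
        self.grammar = grammar  # committed entries: phrase_id -> phrase text
        self.staged = []        # pending (operation, phrase_id, text) tuples
        self.paused = False
        self.is_open = True     # START-ENROLLMENT-SESSION

    def _stage(self, operation, phrase_id, text=None):
        if not self.is_open or self.paused:
            raise RuntimeError("enrollment session is closed or paused")
        self.staged.append((operation, phrase_id, text))

    def add_phrase(self, phrase_id, text):     # ADD-PHRASE
        self._stage("add", phrase_id, text)

    def modify_phrase(self, phrase_id, text):  # MODIFY-PHRASE
        self._stage("modify", phrase_id, text)

    def delete_phrase(self, phrase_id):        # DELETE-PHRASE
        self._stage("delete", phrase_id)

    def pause(self):                           # PAUSE-ENROLLMENT-SESSION
        self.paused = True

    def resume(self):                          # RESUME-ENROLLMENT-SESSION
        self.paused = False

    def rollback(self):                        # ENROLLMENT-ROLLBACK (assumed: undo last change)
        if self.staged:
            self.staged.pop()

    def end(self):                             # END-ENROLLMENT-SESSION: commit staged changes
        for operation, phrase_id, text in self.staged:
            if operation == "delete":
                self.grammar.pop(phrase_id, None)
            else:
                self.grammar[phrase_id] = text
        self.staged.clear()
        self.is_open = False

    def abort(self):                           # ABORT-ENROLLMENT-SESSION: discard changes
        self.staged.clear()
        self.is_open = False

grammar = {}
session = EnrollmentSession(grammar)
session.add_phrase("mom", "call mom")
session.add_phrase("work", "call work")
session.rollback()  # drops the staged "work" entry
session.end()
print(grammar)      # {'mom': 'call mom'}
```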
Hotword (see section 7)
• New recognition resource
• Instead of listening for a set time period, listens continuously until it matches a grammar
• Non-matching speech is ignored and does not affect the state of the recognizer
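The hotword behaviour can be sketched as a loop over a stream of already-recognized utterances. The function name, the plain-string input and the use of a set of phrases as the "grammar" are simplifying assumptions; a real hotword resource would match audio against a recognition grammar.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar.

    Returns (index, phrase) for the first match, or None if the stream
    ends without one. Non-matching utterances are skipped and leave no
    state behind.
    """
    for index, text in enumerate(utterances):
        phrase = " ".join(text.lower().split())  # normalize case and spacing
        if phrase in grammar:
            return index, phrase
    return None

stream = iter(["what time is it", "  Call   Operator ", "agent"])
print(hotword_listen(stream, {"call operator", "agent"}))  # (1, 'call operator')
```

Accepting any iterable (including a generator) reflects the "listens continuously" property: the function has no fixed time window and stops only on a match or when the input ends.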
Other Extensions
• Record method (sec. 4.4)
  – Allows end-pointed recording of an audio stream
• Interpret method (sec. 4.5)
  – Behaves like a recognition request, except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.
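End-pointed recording keeps only the audio between the detected start and end of speech. The slide does not say how end-pointing is done, so the sketch below swaps in a simple energy-threshold detector; the frame format (lists of samples in [-1.0, 1.0]), the threshold and hangover values, and the function names are assumptions.

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def endpoint(frames, threshold=0.1, hangover=3):
    """Return (start, end) frame indices of the speech region, or None.

    start is the first frame whose energy reaches threshold; end is
    exclusive. Recording stops after `hangover` consecutive quiet frames
    following speech, or at the end of the input.
    """
    start = None
    silent_run = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if start is None:
            if loud:
                start = i  # start of speech detected
            continue
        if loud:
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= hangover:
                return start, i - hangover + 1  # trim the trailing silence
    if start is None:
        return None  # no speech at all
    return start, len(frames) - silent_run

quiet = [0.0] * 160
loud = [0.5, -0.5] * 80
frames = [quiet, quiet, loud, loud, quiet, quiet, quiet, loud]
print(endpoint(frames))  # (2, 4): frames 2 and 3 are kept
```

The hangover prevents a short pause inside an utterance from ending the recording early; the final loud frame is ignored because the end point has already been reached.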