Listener-Control Navigation of VoiceXML
Nuance Speech Analysis
92% of customer service is through the phone.
84% of industrialists believe speech is better than the web.
History of VoiceXML
AT&T (’95): PML
Bell/Lucent (’98): PML
IBM (’98): SpeechML
HP (’98): TalkML
Motorola (’98): VoxML
VoiceXML Forum (’00): VoiceXML 1.0
W3C (’02): VoiceXML 2.0
VoiceXML
An open standard language for serving voice/audio documents.
VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations.
VoiceXML (Cont’d)
VoiceXML allows scripts, CGIs, etc.
Can take input from the listener via speech (filling out forms, as in HTML).
Used extensively for automated call handling.
Makes information accessible over (cell) phones.
The next revolution on the Web.
Architectural Model
Goals of VoiceXML
Bring Web development and content delivery into voice response applications.
Minimize client/server interactions.
Separate user-interaction code from service logic.
Shield application authors from platform-specific details.
Voice Browser
Software platform running on a network server.
It supports the following features:
ASR (automatic speech recognition)
DTMF
Recognition grammars
Mixed-initiative dialog
TTS (text-to-speech)
Voice browser : VoiceXML :: Web browser : HTML
Voice Enabling
Sample VoiceXML Code
<vxml version="2.0">
  <form>
    <field name="rich">
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[[
          [(yes)] {<option "yes">}
          [(no)] {<option "no">}
        ]]]>
      </grammar>
      <prompt>Would you like to get rich quick?</prompt>
      <filled>
        Gotcha.
        <if cond="rich == 'yes'">
          You want to be rich!
          <goto next="rich.vxml" />
        <else />
          You don't want to be rich.
          <goto next="poor.vxml" />
        </if>
      </filled>
    </field>
  </form>
</vxml>
Problem with VoiceXML
Navigation of the voice document.
The author has to ask where the listener would like to go next.
The listener has absolutely no control over navigation.
Tedious; advanced applications are not possible.
Analogy: scroll vs. book.
Solution
Allow users to control navigation interactively, using voice anchors.
Voice Anchors
Voice anchors are speech labels that listeners can place on a dialog.
The listener can return to that dialog later by uttering that label.
Hard to implement, as free-form speech recognition is not possible.
Needs to be incorporated into the voice browser.
Voice Anchors
We developed a number of methods for attaching voice anchors.
Most practical method: spelling.
Anchor as a whole word.
Default anchors.
Default navigation strategies.
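The place/recall behaviour of a voice anchor, with the label spelled out letter by letter, can be pictured with a small sketch. This is only an illustration under assumptions: the class name `AnchorTable`, the helper `normalize`, and dialog ids such as `"dialog_7"` are hypothetical and do not come from the original system, which works inside the voice browser.

```python
class AnchorTable:
    """Hypothetical table mapping a listener-chosen label to a dialog id."""

    def __init__(self):
        self._anchors = {}

    def place(self, letters, dialog_id):
        # Store (or overwrite) the anchor under the spelled label.
        label = normalize(letters)
        self._anchors[label] = dialog_id
        return label

    def recall(self, letters):
        # Return the dialog the label was placed on, or None if unknown.
        return self._anchors.get(normalize(letters))


def normalize(letters):
    """Join spelled letters such as ["C", "L"] or "c l" into the label "cl"."""
    return "".join(ch.lower() for ch in letters if ch.isalpha())


table = AnchorTable()
table.place(["C", "L"], "dialog_7")
print(table.recall("c l"))
```

Spelling keeps the recognizer's vocabulary down to the letters of the alphabet, which is one plausible reason it is the most practical method when free-form recognition is unavailable.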
Our Architecture
[Figure: Initial VXML -> Converter -> Augmented VXML -> Voice browser. Other labels: "Creates a DB file", "Place Anchors", "Recall Anchor", "New VXML", "DB file".]
Cumulative Anchors
Different dialogs can be marked with the same label.
Recalling the label reads out the corresponding dialogs.
Multiple cumulative anchors in a single document.
Allows creation of subdocuments.
A hierarchy of subdocuments can be created.
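One way to picture cumulative anchors is as a label that collects several dialogs, with labels optionally nested to form a hierarchy of subdocuments. The sketch below is an assumption-laden illustration, not the system's implementation: `CumulativeAnchors`, `mark`, `nest`, and the sample labels and dialog ids are all hypothetical names.

```python
from collections import defaultdict


class CumulativeAnchors:
    """Hypothetical store: one label may mark many dialogs."""

    def __init__(self):
        self._marked = defaultdict(list)    # label -> dialog ids, in marking order
        self._children = defaultdict(list)  # label -> nested sub-labels

    def mark(self, label, dialog_id):
        # Add the dialog to the label's subdocument, ignoring repeats.
        if dialog_id not in self._marked[label]:
            self._marked[label].append(dialog_id)

    def nest(self, parent, child):
        # Make the child's subdocument part of the parent's.
        self._children[parent].append(child)

    def recall(self, label, _seen=None):
        # Collect the label's own dialogs, then those of nested labels.
        seen = set() if _seen is None else _seen
        if label in seen:  # guard against cyclic nesting
            return []
        seen.add(label)
        dialogs = list(self._marked.get(label, []))
        for child in self._children.get(label, []):
            dialogs.extend(self.recall(child, seen))
        return dialogs


anchors = CumulativeAnchors()
anchors.mark("halogens", "d3")
anchors.mark("halogens", "d9")
anchors.mark("metals", "d5")
anchors.nest("elements", "halogens")
anchors.nest("elements", "metals")
print(anchors.recall("elements"))
```

Recalling `"halogens"` yields only its own dialogs, while recalling `"elements"` reads out every dialog in the nested subdocuments.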
Grammar
Set of valid expressions.
Each dialog references one or more grammars.
Nuance Grammar Specification Language (GSL).
Inline grammar and offline grammar. An offline grammar provides the following advantages:
Can be generated dynamically (via CGIs, ASPs).
Can be reused by multiple dialogs or applications.
Can be updated and modified without changing the source code.
Subgrammars and form-level grammar.
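As a rough illustration of generating a grammar dynamically, the sketch below builds a GSL grammar element from a table of option values and the phrases that trigger them. It copies the `[(phrase)]{<option "value">}` shape used in this deck's sample code; the function name `build_gsl_grammar` and the input format are assumptions made for the example, and a real CGI or ASP page would serve the resulting text rather than print it.

```python
def build_gsl_grammar(options):
    """Build a <grammar> element from {option value: [spoken phrases]}.

    Hypothetical helper: it only mirrors the rule shape shown in the
    deck's samples and does not validate the phrases.
    """
    rules = []
    for value, phrases in options.items():
        alternatives = " ".join(f"({phrase})" for phrase in phrases)
        rules.append(f'[{alternatives}]{{<option "{value}">}}')
    body = "\n".join(rules)
    return (
        '<grammar type="application/x-gsl" mode="voice">\n'
        f"<![CDATA[[\n{body}\n]]]>\n"
        "</grammar>"
    )


print(build_gsl_grammar({
    "skip": ["skip"],
    "mark": ["place anchor", "call mark", "begin mark"],
}))
```

Because the phrases live in data rather than in the dialog source, they can be changed or shared between dialogs without editing the VoiceXML itself.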
Sample Grammar Code
<grammar type="application/x-gsl" mode="voice">
  <![CDATA[[
    [(skip)] {<option "skip">}
    [(previous)] {<option "previous">}
    [(place anchor) (call mark) (begin mark)] {<option "mark">}
    [(recall mark) (recall anchor) (recall)] {<option "recall">}
  ]]]>
</grammar>
[Figure: Initial HTML -> Translator -> Initial VXML -> Converter -> Augmented VXML -> Voice browser. Other labels: "Reference to another link in Augmented VXML", "Get the HTML page".]
Applications
Web access through voice.
This involves the following sequence of steps:
HTML -> VXML. A translator written in Java was already developed.
Navigation of the VXML.
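To make the HTML -> VXML step concrete, here is a toy sketch that turns each HTML paragraph into a VoiceXML block with a prompt. It is not the Java translator mentioned above, whose design is not described here. The names `ParagraphCollector` and `html_to_vxml` are hypothetical, and the mapping of one `<p>` to one `<block>` is an assumption chosen for brevity.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class ParagraphCollector(HTMLParser):
    """Collects the visible text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def html_to_vxml(html):
    """Return a VoiceXML document that reads the page's paragraphs in order."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    blocks = "\n".join(
        f'    <block name="p{i}"><prompt>{escape(text)}</prompt></block>'
        for i, text in enumerate(collector.paragraphs, start=1)
    )
    return f'<vxml version="2.0">\n  <form>\n{blocks}\n  </form>\n</vxml>'


print(html_to_vxml("<p>Hello <b>world</b></p><p>A &amp; B</p>"))
```

Giving each block a name (`p1`, `p2`, ...) leaves a natural unit that navigation commands or anchors could later refer to.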
Applications
Mathematics for the visually impaired.
This involves the following steps:
MathML -> VXML. A translator was developed to convert MathML documents to VXML documents using XSLT.
Navigation of the VXML.
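The idea behind the MathML -> VXML step can be sketched as a recursive walk that turns presentation MathML into a spoken phrase. Note that this swaps in a plain Python tree walk for the XSLT stylesheet the slide refers to. The spoken wording ("over", "to the power"), the small operator table, and the function names are assumptions for illustration only, and only a handful of MathML elements are handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, deliberately tiny table of spoken operator names.
SPOKEN_OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


def local_name(tag):
    """Strip an XML namespace, e.g. '{ns}mi' -> 'mi'."""
    return tag.rsplit("}", 1)[-1]


def speak(node):
    """Return a spoken rendering of a presentation-MathML node."""
    name = local_name(node.tag)
    children = list(node)
    if name in ("mi", "mn"):
        return (node.text or "").strip()
    if name == "mo":
        symbol = (node.text or "").strip()
        return SPOKEN_OPERATORS.get(symbol, symbol)
    if name == "mfrac" and len(children) == 2:
        return f"{speak(children[0])} over {speak(children[1])}"
    if name == "msup" and len(children) == 2:
        return f"{speak(children[0])} to the power {speak(children[1])}"
    # math, mrow and anything unrecognized: read the children left to right.
    return " ".join(part for part in map(speak, children) if part)


def mathml_to_prompt(mathml):
    """Wrap the spoken rendering in a VoiceXML <prompt> element."""
    return f"<prompt>{escape(speak(ET.fromstring(mathml)))}</prompt>"


print(mathml_to_prompt(
    "<math><mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo>"
    "<mfrac><mn>1</mn><mn>2</mn></mfrac></mrow></math>"
))
```

Each subexpression (a fraction, a superscript) is rendered as its own phrase, which is the kind of unit a listener might want to skip, repeat, or mark with an anchor.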
Conclusion & Future Work
Designing default navigation strategies.
Unit of division for navigation.
Voice Scripting Languages. Example: “repeat chlorine until exit”.