TRANSCRIPT
1
A Few of Speech Recognition's Greatest Blunders
David Thomson
CTO, SpeechPhone
(VoiceXML Tools Committee chair)
Over 22 years in the field: some breakthroughs, some disasters.
3
Field Problem Examples
1. Germs and money
2. User training
3. Echo cancellation
4. Inexperienced management
5. Last-minute "improvements"
6. User interface testing
7. Half-duplex speakerphones
8. Ventilation
9. Fire safety
10. Leading the market
11. Offering too much
12. Component "upgrade"
13. Tuning
4
Chapter: Analog Echo
Germs and Money
5
ATM Speaker Verification
Pick up the phone and say the following digit string: 3594.
3594
• Two levels of security: PIN and voiceprint.
• Random digit strings protect from recordings.
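The two bullets above amount to a challenge-response check. A minimal sketch in Python, assuming a hypothetical voiceprint score between 0 and 1 and a hypothetical acceptance threshold; neither is given on the slide, and the function names are illustrative only:

```python
import secrets


def make_challenge(num_digits: int = 4) -> str:
    """Return a fresh random digit string for the caller to speak."""
    return "".join(secrets.choice("0123456789") for _ in range(num_digits))


def accept_caller(pin_ok: bool,
                  challenge: str,
                  recognized_digits: str,
                  voice_score: float,
                  threshold: float = 0.8) -> bool:
    """Accept only if all three checks pass.

    voice_score and threshold are assumptions for illustration; a real
    speaker verification engine defines its own scoring.
    """
    # A recording of an earlier session carries the old digits, so it
    # fails here when the new challenge is different.
    digits_match = recognized_digits == challenge
    voice_match = voice_score >= threshold
    return pin_ok and digits_match and voice_match


challenge = make_challenge()
print("Pick up the phone and say the following digit string:", challenge)
```

Because the digit string changes on every call, a replayed recording only succeeds if the new challenge happens to repeat the old one.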
6
User Training
7
MovieFone (777-FILM)
MovieFone w/ASR
• MovieFone was the dominant U.S. movie information service, taking over 80,000,000 calls/year.
• ASR overwhelmingly preferred over touch-tone in caller survey.
• Users favored menu-based over spontaneous input.
Hello and welcome to MovieFone...
8
Example MovieLocator Transaction
Caller: What science fiction movies are playing?
System: Near what city?
Caller: Wheaton.
System: Near Wheaton, Pirates of the Caribbean is playing at the Ogden 6 theater.
Caller: What time is it showing?
System: At the Ogden 6 theater, Pirates of the Caribbean shows at 7:30.
Movie information conversation. The recognizer is designed to understand any reasonable movie information request from the caller.
9
Would You Use This To Find Movies?
[Chart: number of subjects answering never, sometimes, often, or always for each of five options: Newspaper, Phone the Theater, MovieFone, MovieLocator, Menu-based.]
Total = 22 subjects
10
ASR vs. Human Attendants
ASR:
- 96.2% calls routed correctly
Receptionists:
- 87% calls routed correctly
Conditions: Callers were greeted with “How may I direct your call?” and were routed to one of over 30 departments. Accuracy was scored by the customer.
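The two accuracy figures are easier to compare as error rates. A short sketch of that arithmetic, using only the percentages on the slide (the variable names are illustrative):

```python
asr_correct_pct = 96.2
receptionist_correct_pct = 87.0

# Misrouted calls per 100, rounded to remove floating-point noise.
asr_error_pct = round(100.0 - asr_correct_pct, 1)
receptionist_error_pct = round(100.0 - receptionist_correct_pct, 1)

# How many times more often a receptionist misroutes a call.
error_ratio = receptionist_error_pct / asr_error_pct

# Fraction of receptionist errors that ASR avoids.
relative_error_reduction = 1.0 - asr_error_pct / receptionist_error_pct

print(f"ASR errors: {asr_error_pct}%  receptionist errors: {receptionist_error_pct}%")
print(f"Receptionists misroute about {error_ratio:.1f}x as often")
print(f"Relative error reduction: {relative_error_reduction:.0%}")
```

Under the stated conditions, the receptionists misrouted calls roughly 3.4 times as often as the recognizer.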
11
Echo cancellation
12
Echo in an Analog System
Speech Recognizer
Telephone Network
Prompt Generator
Echo Canceller
-6 dB
Speech: -40 dBm
Echo: -33 dBm
SNR: -7 dB
-11 dBm signal
-15 dB
-25 dBm signal
Tip/Ring Card Hybrid
-7 dB
Line: -9 dB
Low speech signal strength and strong echoes generated by the local network card conspire to make speech recognition difficult. Speech is up to 9 dB quieter and echoes are about 31 dB louder than in a digital system, for a total signal-to-noise ratio loss of 40 dB.
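The level arithmetic on this slide can be checked with a few lines of Python. This is a sketch, not the author's calculation: the assignment of the individual dB labels to the echo path and the speech path at the end is one assumed reading of the diagram, chosen because it reproduces the stated levels.

```python
def snr_db(signal_dbm: float, noise_dbm: float) -> float:
    """SNR in dB is the difference of the two levels in dBm."""
    return signal_dbm - noise_dbm


def db_to_power_ratio(db: float) -> float:
    """Convert a dB difference to a linear power ratio."""
    return 10 ** (db / 10)


speech_dbm = -40.0  # speech level at the recognizer (from the slide)
echo_dbm = -33.0    # echo level at the recognizer (from the slide)

snr = snr_db(speech_dbm, echo_dbm)
echo_to_speech_power = db_to_power_ratio(-snr)

# Loss relative to a digital system, as stated in the text.
speech_loss_db = 9.0
echo_increase_db = 31.0
total_snr_loss_db = speech_loss_db + echo_increase_db

# ASSUMED reading of the diagram labels (not stated on the slide):
# the -11 dBm prompt passes through -6 dB, -7 dB and -9 dB to become echo,
# and the -25 dBm talker signal loses 15 dB on the way in.
assumed_echo_dbm = -11.0 + (-6.0) + (-7.0) + (-9.0)
assumed_speech_dbm = -25.0 + (-15.0)

print(f"SNR = {snr} dB; echo carries about {echo_to_speech_power:.1f}x the speech power")
print(f"Total SNR loss vs. digital = {total_snr_loss_db} dB")
```

A negative SNR means the recognizer is listening to the caller underneath a louder copy of its own prompt, which is why the echo canceller sits in front of it.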
13
Inexperienced Management
14
Voice Verification and Dialing
• Panic response to competitor.
• No initial business case.
• Used unproven SV platform.
• Heavy use of inexperienced contractors.
• Poor budgeting.
• Distributed development organization.
• Turf battles, technical disagreements, egos.
• Changing feature requirements.
• Staff of 60, 4 years, $70M.
15
Last-Minute “Improvements”
Heat Sink Failure
Epoxy Beads
17
User Interface Testing
18
Multilingual Digit Dialer
Vier drei fünf vier zwei null sechs drei sieben.
• Complex user interface
• Language dependencies ignored
• No testing on naïve users
• User errors exceeded ASR errors
• System was deployed, then removed
19
Half-Duplex Speakerphones
20
Name Dialing - Placing a Call
(Dial tone)
Calling “home”
Call home
VoiceDialer
Telephone Network
21
Half-Duplex Speakerphones
Speech Recognition System
Response
Prompt
Speaker
Microphone
Half-Duplex Speakerphone
Unless user speech can force the handsfree phone to switch off the prompt, the recognition system hears nothing.
Call messages.
What can I do for you now?
Lesson: Record or die
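The failure described on this slide can be illustrated with a toy model of a half-duplex speakerphone. The switching rule here (the microphone path opens only when the user is louder than the prompt by a fixed margin) and the 6 dB margin are assumptions for illustration; real speakerphones differ.

```python
from typing import List, Optional

Level = Optional[float]  # level in dB for one frame; None means silence


def half_duplex_uplink(prompt_levels: List[Level],
                       speech_levels: List[Level],
                       switch_margin_db: float = 6.0) -> List[Level]:
    """Return what the recognizer receives, frame by frame.

    Hypothetical rule: while a prompt is playing, user speech is passed
    only if it exceeds the prompt level by switch_margin_db.
    """
    received: List[Level] = []
    for prompt, speech in zip(prompt_levels, speech_levels):
        if speech is None:
            received.append(None)      # user is silent
        elif prompt is None or speech >= prompt + switch_margin_db:
            received.append(speech)    # microphone path is open
        else:
            received.append(None)      # prompt holds the channel, mic muted
    return received


# The user starts talking over the prompt in frame 1.
prompt = [-20.0, -20.0, -20.0, None, None]
quiet_user = [None, -30.0, -30.0, -30.0, None]
loud_user = [None, -10.0, -10.0, -10.0, None]

print(half_duplex_uplink(prompt, quiet_user))  # speech arrives only after the prompt ends
print(half_duplex_uplink(prompt, loud_user))   # speech forces the switch immediately
```

In the quiet-user case the recognizer sees silence for the frames where the caller was actually speaking, which is hard to diagnose without recordings of the call.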
23
Unmasking Half-Duplex Equipment
Ready?
OK
Go.
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10.
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10.
Speakerphone user Handset user
24
Ventilation
25
Extreme Temperature Environment
Frame 2
Frame 1
Door
Vent
Fan
Hall
Window (20 yards)
Airflow
120 degrees
26
A/C Frame cooling example - side view
[Diagram with three panels: A. Ideal airflow, B. Air leaks, C. Ducted frame. Each panel shows a frame holding a Monitor, a Master PC, and an A/C Unit.]
27
Improved Airflow
28
Fire Safety - 1
Example of Flammability Failure
IR View
30
Fire Safety - 2
31
Central Office Grade Speech Server
[Photo of CDSUs in a frame]
48V Power
LAN Card
Backplane Current Sense Resistors
Sense Resistors
33
Leading the Market
34
Wi-Fi Voice Dialing
SoftPhone
VoiceDial
SDK
TTS
Mobile Device
Data Network
VoIP Gateway
Telco
Wi-Fi Network
Call David Thomson
ASR
35
Offering too Much
36
[Telephone keypad: 1 2 3 4 5 6 7 8 9 * 0 #]
Connecting
630-555-1212
A service that does everything
Businesses may subscribe to be listed in this service.
Movie Locator
Weather Line
Messages
Shopping
Voice E-mail
Business Directory
Voice Dialing
Business Directory.
Welcome to Lucent Technologies Automated Business Call Dialer. Please say the name of the Business to Call. For information, say ‘help.’
United Airlines.
Calling United. To cancel, say ‘cancel.’
VoiceXML
Privacy Manager
• Now, you can HEAR who's behind the call waiting beep.
• First, you hear the Call Waiting "beep" and then you hear the name of the second caller.
• Once you've heard the name, you decide if you want to "click over" and take the call. It's that simple!
• Talking Call Waiting is only $2.50 a month if you currently have Call Waiting on your phone line.
• Talking Call Waiting is currently available in our Major Market areas of: Chicago, IL; Indianapolis, IN; Detroit, MI; Akron, OH; Cleveland, OH; Columbus, OH; Dayton, OH; Milwaukee, WI
$2.50/mo.
Talking Call Waiting Instructions
or Call to Order Today 1-888-635-5050
http://www.ameritech.com/navigation/site/1,1935,150,00.html
Talking Call Waiting
39
Component “Upgrade”
Processor (before die shrink)
41
Tuning
42
Field Accuracy Improves Over Time
[Chart: Error Rate (vertical axis) at each stage: Lab, 1st Iteration, 2nd Iteration, Final]
Wireless Digit Dialing Trial
Land-Line Models
New Models from Field Data
Final Tuning
43
Other Assorted Field Problems
• ASR works, forces touch-tone failures
• Late beep causes people to speak early
• Voice enhancement wrecked spectrum
• Failure to record left developers blind
• Speech takes the heat for unrelated bugs