A Few of Speech Recognition's Greatest Blunders

David Thomson

CTO, SpeechPhone

(VoiceXML Tools Committee chair)

david@speechphone.com


Over 22 years in the field: some breakthroughs, some disasters.


Field Problem Examples

1. Germs and money
2. User training
3. Echo cancellation
4. Inexperienced management
5. Last-minute "improvements"
6. User interface testing
7. Half-duplex speakerphones
8. Ventilation
9. Fire safety
10. Leading the market
11. Offering too much
12. Component "upgrade"
13. Tuning


Germs and Money


ATM Speaker Verification

Pick up the phone and say the following digit string: 3594.

3594

• Two levels of security: PIN and voiceprint.

• Random digit strings protect from recordings.
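
The random-digit prompt can be sketched in a few lines. This is a minimal illustration, not the deployed ATM system: `make_challenge` and `prompt_text` are hypothetical names, and the four-digit length simply mirrors the "3594" example above. Because each attempt gets a fresh, unpredictable string, a recording of an earlier session is unlikely to contain the digits now being requested.

```python
import secrets


def make_challenge(num_digits: int = 4) -> str:
    """Return a fresh random digit string for the caller to speak."""
    # secrets draws from the operating system's cryptographic random
    # source, so the next challenge cannot be predicted from earlier ones.
    return "".join(secrets.choice("0123456789") for _ in range(num_digits))


def prompt_text(challenge: str) -> str:
    """Build the spoken prompt for a given challenge (hypothetical helper)."""
    return f"Pick up the phone and say the following digit string: {challenge}."


if __name__ == "__main__":
    print(prompt_text(make_challenge()))
```

The voiceprint check itself is outside this sketch; only the challenge generation is shown.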


User Training


MovieFone (777-FILM)

MovieFone w/ASR

• MovieFone was the dominant U.S. movie information service, taking over 80,000,000 calls/year.

• ASR overwhelmingly preferred over touch-tone in caller survey.

• Users favored menu-based over spontaneous input.

Hello and welcome to MovieFone...


Example MovieLocator Transaction

Caller: What science fiction movies are playing?

System: Near what city?

Caller: Wheaton.

System: Near Wheaton, Pirates of the Caribbean is playing at the Ogden 6 theater.

Caller: What time is it showing?

System: At the Ogden 6 theater, Pirates of the Caribbean shows at 7:30.

Movie information conversation. The recognizer is designed to understand any reasonable movie information request from the caller.


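Would You Use This To Find Movies?

[Bar chart: counts of "never", "sometimes", "often", and "always" responses for Newspaper, Phone the Theater, MovieFone, MovieLocator, and Menu-based. Bar values as extracted, in groups of five: 0, 11, 10, 8, 3; 8, 5, 10, 5, 6; 6, 4, 1, 6, 8; 7, 1, 0, 2, 4.]

Total = 22 subjects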


ASR vs. Human Attendants

ASR:

- 96.2% calls routed correctly

Receptionists:

- 87% calls routed correctly

Conditions: Callers were greeted with “How may I direct your call?” and were routed to one of over 30 departments. Accuracy was scored by the customer.


Echo Cancellation


Echo in an Analog System

[Diagram blocks: Prompt Generator, Speech Recognizer, Echo Canceller, Tip/Ring Card, Hybrid, Telephone Network]

Levels marked on the diagram: -25 dBm signal; Line: -9 dB; -6 dB; -11 dBm signal; -15 dB; -7 dB

Speech: -40 dBm

Echo: -33 dBm

SNR: -7 dB

Low speech signal strength and strong echoes generated by the local network card conspire to make speech recognition difficult. Speech is up to 9 dB quieter and echoes are about 31 dB louder than in a digital system, for a total signal-to-noise ratio loss of 40 dB.
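
The level arithmetic on this slide can be checked with a short calculation. This is a sketch under assumptions: the grouping of the marked losses into a speech path (-25 dBm signal, -9 dB line, -6 dB) and an echo path (-11 dBm signal, -15 dB, -7 dB) is inferred from the totals shown on the slide rather than stated there, and `received_level` is a hypothetical helper name.

```python
def received_level(source_dbm: float, gains_db: list[float]) -> float:
    """Add dB gains (negative values are losses) to a source level in dBm."""
    return source_dbm + sum(gains_db)


# Assumed grouping of the slide's numbers, inferred from its totals.
speech_dbm = received_level(-25.0, [-9.0, -6.0])   # caller's speech
echo_dbm = received_level(-11.0, [-15.0, -7.0])    # echo of the prompt

# Treating the echo as the noise, SNR in dB is a simple difference of levels.
snr_db = speech_dbm - echo_dbm

print(f"Speech: {speech_dbm:.0f} dBm")
print(f"Echo:   {echo_dbm:.0f} dBm")
print(f"SNR:    {snr_db:.0f} dB")

# Caption figures: speech up to 9 dB quieter, echo about 31 dB louder
# than in a digital system.
total_snr_loss_db = 9 + 31
print(f"SNR loss vs. digital: {total_snr_loss_db} dB")
```

Under that grouping the script reproduces the slide's -40 dBm, -33 dBm, and -7 dB figures, and the caption's 40 dB total is the sum of the 9 dB and 31 dB penalties.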


Inexperienced Management


Voice Verification and Dialing

• Panic response to competitor.

• No initial business case.

• Used unproven SV platform.

• Heavy use of inexperienced contractors.

• Poor budgeting.

• Distributed development organization.

• Turf battles, technical disagreements, egos.

• Changing feature requirements.

• Staff of 60, 4 years, $70M.


Last-Minute “Improvements”


Heat Sink Failure

Epoxy Beads


User Interface Testing


Multilingual Digit Dialer

Vier drei fünf vier zwei null sechs drei sieben.

• Complex user interface

• Language dependencies ignored

• No testing on naïve users

• User errors exceeded ASR errors

• System was deployed, then removed


Half-Duplex Speakerphones


Name Dialing - Placing a Call

(Dial tone)

Calling “home”

Call home

Voice Dialer

Telephone Network


Half-Duplex Speakerphones

Speech Recognition System

Response

Prompt

Speaker

Microphone

Half-Duplex Speakerphone

Unless user speech can force the handsfree phone to switch off the prompt, the recognition system hears nothing.

Call messages.

What can I do for you now?
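
A toy model can make this failure mode concrete. The sketch below is an illustration built on assumptions, not a description of any real speakerphone: it treats the half-duplex phone as a switch that passes only one direction per time frame, and it assumes the phone switches to transmit only when the microphone level exceeds the loudspeaker level by a margin. `SWITCH_MARGIN_DB`, `heard_by_recognizer`, and all levels are invented for the example.

```python
SWITCH_MARGIN_DB = 6.0  # hypothetical: how much louder the user must be to seize the channel


def heard_by_recognizer(prompt_db, user_db, margin_db=SWITCH_MARGIN_DB):
    """Return the per-frame user levels that reach the recognizer.

    prompt_db and user_db hold one level in dB per frame, or None when
    that side is silent. A None in the result means the frame was blocked
    or there was nothing to send.
    """
    heard = []
    for prompt, user in zip(prompt_db, user_db):
        if user is None:
            heard.append(None)      # user is silent in this frame
        elif prompt is None or user >= prompt + margin_db:
            heard.append(user)      # phone switches to transmit
        else:
            heard.append(None)      # phone stays in receive; speech is lost
    return heard


# The prompt plays during frames 0-3; the user starts talking at frame 2.
prompt = [-20, -20, -20, -20, None, None]
quiet_user = [None, None, -22, -22, -22, -22]
loud_user = [None, None, -10, -10, -10, -10]

print(heard_by_recognizer(prompt, quiet_user))
print(heard_by_recognizer(prompt, loud_user))
```

In this model the quiet caller's first two frames of speech never reach the recognizer, while the loud caller forces the switch and is heard from the first frame.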


Lesson: Record or die


Unmasking Half-Duplex Equipment

Ready?

OK

Go.

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10.

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10.

Speakerphone user Handset user


Ventilation


Extreme Temperature Environment

Frame 2

Frame 1

Door Vent

Fan

Hall Window (20 yards)

Airflow

120 degrees


A/C Frame cooling example - side view

A. Ideal airflow

B. Air leaks

C. Ducted frame

[Three side-view panels, each showing a frame containing a Monitor, a Master PC, and an A/C Unit.]


Improved Airflow


Fire Safety - 1


Example of Flammability Failure

IR View


Fire Safety - 2


Central Office Grade Speech Server

[Photo of CDSUs in a frame]

48V Power

LAN Card


Backplane Current Sense Resistors

Sense Resistors


Leading the Market


Wi-Fi Voice Dialing

SoftPhone

VoiceDial

SDK

TTS

Mobile Device

Data Network

VoIP Gateway

Telco

Wi-Fi Network

Call David Thomson

ASR


Offering too Much


Connecting

630-555-1212

A service that does everything

Businesses may subscribe to be listed in this service.

Movie Locator

Weather Line

Messages

Shopping

Voice E-mail

Business Directory

Voice Dialing

Business Directory.

Welcome to Lucent Technologies Automated Business Call Dialer. Please say the name of the Business to Call. For information, say ‘help.’

United Airlines.

Calling United. To cancel, say ‘cancel.’

VoiceXML


Privacy Manager


• Now, you can HEAR who's behind the call waiting beep.

• First, you hear the Call Waiting "beep" and then you hear the name of the second caller.

• Once you've heard the name, you decide if you want to "click over" and take the call. It's that simple!

• Talking Call Waiting is only $2.50 a month if you currently have Call Waiting on your phone line.

• Talking Call Waiting is currently available in our Major Market areas of: Chicago, IL; Indianapolis, IN; Detroit, MI; Akron, OH; Cleveland, OH; Columbus, OH; Dayton, OH; Milwaukee, WI

$2.50/mo.

Talking Call Waiting Instructions

or Call to Order Today 1-888-635-5050

http://www.ameritech.com/navigation/site/1,1935,150,00.html

Talking Call Waiting


Component “Upgrade”


Processor (before die shrink)


Tuning


Field Accuracy Improves Over Time

Wireless Digit Dialing Trial

[Chart: Error Rate at four stages: Lab, 1st Iteration, 2nd Iteration, Final. Annotations: Land-Line Models, New Models from Field Data, Final Tuning.]


Other Assorted Field Problems

• ASR works, forces touch-tone failures

• Late beep causes people to speak early

• Voice enhancement wrecked spectrum

• Failure to record left developers blind

• Speech takes the heat for unrelated bugs


For Slides or More Information

David Thomson

david@speechphone.com

Phone 949-655-1693