
BILC SEMINAR 2009

Speech Recognition: Is It for Real?

Tony Mirabito

Defense Language Institute

English Language Center

(DLIELC)

DLIELC

OVERVIEW

● The technology evolution in language teaching

● What speech recognition is

● How speech recognition works

● Shortcomings of speech recognition

● Conclusions


THE TECHNOLOGY EVOLUTION

The Classroom

WHAT SPEECH RECOGNITION IS

● A formal definition:

-- A system of spoken input into a computer in which software can “recognize” the input and transform it into digitized signals, that is, “react” in various ways to the spoken input

● Examples:

-- Speech to text

-- “Telephony”: airlines, transportation, etc.

-- Commercial software for learning a language (e.g. Rosetta Stone)



● A computer program that takes verbal input and “matches” it against models (acoustic models and language models)

● A computer program that allows speech to be evaluated as “correct” or “acceptable”, or as “incorrect” or “unacceptable”

● A computer program that “talks to” authoring software and allows that software to branch in different directions based on the evaluation



● “Speaker independent” vs. “speaker dependent”

-- Speaker independent: a speech recognition program that recognizes all speakers (used in language learning)

-- Speaker dependent: a speech recognition program that is “trained” to recognize a particular speaker (speech to text)



● “Discrete speech input” vs. “continuous speech input”

-- Discrete speech input: requires a user to pause between words (e.g. “I + want + to + leave.”)

-- Continuous speech input: blending of sounds between words is allowed (e.g. “next + week” becomes “neksweek”)

● Speech recognition cannot “understand” free speech; it “matches” speech input with stored data and pre-determined parameters
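
As a rough illustration of “matching” rather than “understanding”, the Python sketch below compares a typed-out input with a small set of stored phrases and returns the closest one, or nothing if no stored phrase is similar enough. The phrase list, the `best_match` helper, and the 0.8 cutoff are hypothetical and chosen only for illustration; a real recognizer compares acoustic features, not spelled text.

```python
from difflib import SequenceMatcher

# Hypothetical stored data: the only phrases this toy "recognizer" knows.
STORED_PHRASES = ["i want to leave", "next week", "where is the station"]


def best_match(spoken, phrases=STORED_PHRASES, cutoff=0.8):
    """Return the stored phrase most similar to the input, or None.

    Nothing here "understands" the input: the function only measures
    how closely the characters line up with each stored phrase.
    """
    best_phrase, best_ratio = None, 0.0
    for phrase in phrases:
        ratio = SequenceMatcher(None, spoken.lower(), phrase).ratio()
        if ratio > best_ratio:
            best_phrase, best_ratio = phrase, ratio
    # Reject the input when even the closest phrase is below the cutoff.
    return best_phrase if best_ratio >= cutoff else None
```

Input that lies outside the stored phrases is simply rejected, which is why such a system handles prompted responses far better than free speech.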


HOW SPEECH RECOGNITION WORKS

● Speech recognition technology is based on Markov chain theory, a mathematical framework that deals with the probabilities of changing from one state to another

● Most speech recognition engines contain common databases for a particular language:

-- Grammar

-- Lexicon (dictionary)

-- Supra-segmental models (prosody)

-- Acoustic speech samples
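
To make the Markov chain idea concrete, here is a minimal Python sketch that computes the probability of a sequence of states from a table of transition probabilities. The two states (“vowel” and “consonant”), the numbers in the tables, and the `sequence_probability` helper are invented for illustration and are not taken from any actual speech engine.

```python
# Hypothetical two-state Markov chain over sound classes.
# Each row gives the probability of moving from one state to the next.
TRANSITIONS = {
    "vowel": {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.6, "consonant": 0.4},
}

# Hypothetical probability of starting in each state.
START = {"vowel": 0.5, "consonant": 0.5}


def sequence_probability(states):
    """Multiply the start probability by each transition probability."""
    if not states:
        return 1.0
    prob = START[states[0]]
    for previous, current in zip(states, states[1:]):
        prob *= TRANSITIONS[previous][current]
    return prob


# consonant -> vowel -> consonant: 0.5 * 0.6 * 0.7
p = sequence_probability(["consonant", "vowel", "consonant"])
```

A recognizer built on this idea prefers the candidate sequence with the highest probability, which is why its output is a best match rather than a certainty.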



● Speech is input through a microphone, analyzed against the engine's databases (by “search aligners”), and then scored against a norm

● A developer decides which scores count as “acceptable” or “unacceptable”; the authoring software can then give the user appropriate feedback (text, audio, video, etc.)
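
The scoring and feedback steps above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the 0-to-1 score scale, the 0.7 threshold, the feedback messages, and the names `evaluate` and `feedback_for` are all hypothetical, since each engine and authoring tool defines its own.

```python
# Hypothetical feedback the authoring software shows for each outcome.
FEEDBACK = {
    "acceptable": "Good. Go on to the next item.",
    "unacceptable": "Listen to the model and try again.",
}


def evaluate(score, threshold=0.7):
    """Label a recognizer score using a developer-chosen threshold."""
    return "acceptable" if score >= threshold else "unacceptable"


def feedback_for(score, threshold=0.7):
    """Branch to the feedback that matches the evaluation."""
    return FEEDBACK[evaluate(score, threshold)]
```

Raising the threshold makes the lesson stricter and lowering it makes the lesson more forgiving; the engine's score does not change, only the developer's decision about where to draw the line.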


[Diagram: Speech Input, SR Engine, Databases, Authoring Software, Feedback]


● Communication between the speech engine and the authoring software is essential


SHORTCOMINGS OF SPEECH RECOGNITION TECHNOLOGY

● Quiet environment needed, plus noise-reduction microphones

● General problems with consonants

● Prosody and fluency problems, which require re-engineering the engine

● Perpetual issues: false positives and false negatives

● Most recognizers are effective only in limited domains


● If used as a diagnostic tool to pinpoint pronunciation problems, it must be carefully re-engineered to do so, and it must be accurate
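
One way to check the accuracy concerns above is to compare the engine's decisions with those of a human rater and count the disagreements. The Python sketch below assumes a hypothetical list of (engine accepted, human accepted) pairs; the `error_rates` helper and the sample data are illustrative only.

```python
def error_rates(judgments):
    """Compute false positive and false negative rates.

    judgments: list of (engine_accepted, human_accepted) booleans.
    A false positive is speech the engine accepted but the human rejected;
    a false negative is speech the engine rejected but the human accepted.
    """
    false_pos = sum(1 for engine, human in judgments if engine and not human)
    false_neg = sum(1 for engine, human in judgments if not engine and human)
    human_rejected = sum(1 for _, human in judgments if not human)
    human_accepted = sum(1 for _, human in judgments if human)
    return {
        "false_positive_rate": false_pos / human_rejected if human_rejected else 0.0,
        "false_negative_rate": false_neg / human_accepted if human_accepted else 0.0,
    }


# Hypothetical sample: four utterances judged by both engine and human.
sample = [(True, True), (True, False), (False, True), (False, False)]
rates = error_rates(sample)
```

For a diagnostic tool both rates matter: false positives let pronunciation problems go unnoticed, and false negatives tell learners they are wrong when they are not.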


CONCLUSIONS

● Most speech recognizers have a couple of domains where they are effective—the recognizer should be used only in these domains (“low stakes” vs. “high stakes”)

● Speech recognition technology is not a “black box” or a “magic pill”. It’s a tool that has to be used very carefully.

● Research needs to be done in order to use a recognizer effectively

● We must never forget that technology is effective only if it allows people to learn


QUESTIONS?

COMMENTS?
