BILC SEMINAR 2009
Speech Recognition: Is It for Real?
Tony Mirabito
Defense Language Institute
English Language Center
(DLIELC)
DLIELC
OVERVIEW
● The technology evolution in language teaching
● What speech recognition is
● How speech recognition works
● Shortcomings of speech recognition
● Conclusions
THE TECHNOLOGY EVOLUTION
The Classroom
WHAT SPEECH RECOGNITION IS
● A formal definition:
-- A system of spoken input into a computer in which software can “recognize” the input and transform it into digitized signals, that is, “react” in various ways to the spoken input
● Examples:
-- Speech to text
-- “Telephony”: airlines, transportation, etc.
-- Commercial software for learning a language (e.g. Rosetta Stone)
WHAT SPEECH RECOGNITION IS
● A computer program that takes verbal input and “matches” it against models—acoustic and language models
● A computer program that allows speech to be evaluated as “correct” or “acceptable” or “incorrect” or “unacceptable”
● A computer program that “talks to” authoring software and allows that software to branch in different directions based on the evaluation
WHAT SPEECH RECOGNITION IS
● “Speaker independent” vs. “speaker dependent”
-- Speaker independent: a speech recognition program that recognizes all speakers (used in language learning)
-- Speaker dependent: a speech recognition program that is “trained” to recognize a particular speaker (speech to text)
WHAT SPEECH RECOGNITION IS
● “Discrete speech input” vs. “continuous speech input”
-- Discrete speech input: requires a user to pause between words (e.g. “I + want + to + leave.”)
-- Continuous speech input: blending of sounds between words is allowed (e.g. “next + week” becomes “neksweek”)
● Cannot “understand” free speech; it “matches” speech input with stored data and pre-determined parameters
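The idea of "matching" rather than "understanding" can be illustrated with a minimal sketch. It works on text rather than audio, so it is only an analogy: a real engine compares acoustic features, not spelled-out words. The phrase list, the `MIN_SIMILARITY` cutoff, and the `best_match` function are all hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical stored data: the only phrases this lesson can "recognize".
STORED_PHRASES = [
    "i want to leave",
    "see you next week",
    "where is the classroom",
]

# Hypothetical pre-determined parameter: minimum similarity to count as a match.
MIN_SIMILARITY = 0.8


def best_match(heard: str):
    """Return (phrase, similarity) for the closest stored phrase.

    If nothing is similar enough, the phrase is None: the program has no
    way to interpret input that falls outside its stored data.
    """
    scored = [
        (SequenceMatcher(None, heard.lower(), phrase).ratio(), phrase)
        for phrase in STORED_PHRASES
    ]
    similarity, phrase = max(scored)
    if similarity >= MIN_SIMILARITY:
        return phrase, similarity
    return None, similarity


if __name__ == "__main__":
    print(best_match("see you neks week"))  # blended input, close to a stored phrase
    print(best_match("zzzz"))               # free input with no stored counterpart
```

Anything outside the stored list is simply rejected, which is why this kind of program suits constrained exercises better than open conversation.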
HOW SPEECH RECOGNITION WORKS
● Speech recognition technology is based on Markov chain theory, a branch of probability that models how a system moves from one state to the next (typically applied in the form of hidden Markov models)
● Most speech recognition engines contain common databases for a particular language:
-- Grammar
-- Lexicon (dictionary)
-- Suprasegmental models (prosody)
-- Acoustic speech samples
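As a rough illustration of how Markov-style probabilities can score an utterance, here is a toy hidden Markov model evaluated with the forward algorithm. This is a sketch under stated assumptions, not the method of any particular engine: the three states, the observation symbols "A", "B", "C" (standing in for acoustic measurements), and every probability are invented.

```python
# Toy hidden Markov model. States, symbols, and probabilities are all invented.
STATES = ["n", "eh", "k"]

# Probability of starting in each state.
START = {"n": 1.0, "eh": 0.0, "k": 0.0}

# Probability of moving from one state to the next (left-to-right model).
TRANS = {
    "n": {"n": 0.5, "eh": 0.5, "k": 0.0},
    "eh": {"n": 0.0, "eh": 0.5, "k": 0.5},
    "k": {"n": 0.0, "eh": 0.0, "k": 1.0},
}

# Probability that a state produces each observed symbol.
EMIT = {
    "n": {"A": 0.8, "B": 0.1, "C": 0.1},
    "eh": {"A": 0.1, "B": 0.8, "C": 0.1},
    "k": {"A": 0.1, "B": 0.1, "C": 0.8},
}


def sequence_probability(observations):
    """Forward algorithm: P(observations | model), summed over all state paths."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())


if __name__ == "__main__":
    print(sequence_probability(["A", "B", "C"]))  # order the model expects
    print(sequence_probability(["C", "B", "A"]))  # reversed order
```

An input that follows the order the model expects receives a much higher probability than the reversed one, which is the sense in which the input is "matched" against a model.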
HOW SPEECH RECOGNITION WORKS
● Speech is input through a microphone, analyzed by databases (“search aligners”), and then scored against a norm
● A developer decides which scores count as “acceptable” or “unacceptable”; the authoring software can then give the user appropriate feedback (text, audio, video, etc.)
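The scoring step can be sketched as a simple threshold check followed by a branch in the lesson. The 0 to 100 scale, the `ACCEPT_THRESHOLD` value, the feedback messages, and the function names are assumptions made for illustration; an actual engine and authoring tool would define their own.

```python
from dataclasses import dataclass

# Hypothetical cutoff on a 0-100 scale; a developer would tune this per activity.
ACCEPT_THRESHOLD = 70.0


@dataclass
class Feedback:
    acceptable: bool
    message: str


def evaluate(score: float, threshold: float = ACCEPT_THRESHOLD) -> Feedback:
    """Turn an engine score into an accept/reject decision plus a message."""
    if score >= threshold:
        return Feedback(True, "Good. Moving on to the next item.")
    return Feedback(False, "Not quite. Listen to the model and try again.")


def next_item(current_item: int, feedback: Feedback) -> int:
    """Branch: advance on an acceptable attempt, otherwise repeat the item."""
    return current_item + 1 if feedback.acceptable else current_item


if __name__ == "__main__":
    result = evaluate(85.0)
    print(result.message, next_item(3, result))
```

Keeping the threshold as a parameter lets the same lesson logic be stricter or more lenient depending on the exercise.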
[Diagram: Speech Input, SR Engine, Databases, Authoring Software, Feedback]
HOW SPEECH RECOGNITION WORKS
● Communication between the speech engine and the authoring software is essential
SHORTCOMINGS OF SPEECH RECOGNITION TECHNOLOGY
● Quiet environment needed, plus noise-reduction microphones
● General problems with consonants
● Prosody and fluency problems, which require re-engineering the engine
● Perpetual issues: false positives, false negatives
● Most recognizers are effective only in limited domains
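The false positive and false negative problem can be made concrete with a short calculation. In this sketch each sample pairs a recognizer score with a human judgment of whether the utterance was really acceptable; the sample data, the score scale, and the `error_rates` function are invented for illustration.

```python
def error_rates(samples, threshold):
    """Return (false_positive_rate, false_negative_rate) at a given threshold.

    samples: list of (score, truly_acceptable) pairs, where truly_acceptable
    is a human rater's judgment of the same utterance.
    """
    false_pos = sum(1 for score, ok in samples if score >= threshold and not ok)
    false_neg = sum(1 for score, ok in samples if score < threshold and ok)
    n_bad = sum(1 for _, ok in samples if not ok)
    n_good = sum(1 for _, ok in samples if ok)
    fp_rate = false_pos / n_bad if n_bad else 0.0
    fn_rate = false_neg / n_good if n_good else 0.0
    return fp_rate, fn_rate


# Invented scores paired with invented human judgments.
SAMPLES = [
    (90, True), (75, True), (60, True),
    (80, False), (50, False), (30, False),
]

if __name__ == "__main__":
    for threshold in (50, 70, 85):
        print(threshold, error_rates(SAMPLES, threshold))
```

Raising the threshold lowers the share of bad utterances that slip through but rejects more good ones, so the two error types have to be traded off against each other.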
SHORTCOMINGS OF SPEECH RECOGNITION TECHNOLOGY
● If used as a diagnostic tool to pinpoint pronunciation problems, it must be carefully re-engineered to do so, and it must be accurate
CONCLUSIONS
● Most speech recognizers have a couple of domains where they are effective—the recognizer should be used only in these domains (“low stakes” vs. “high stakes”)
● Speech recognition technology is not a “black box” or a “magic pill”. It’s a tool that has to be used very carefully.
● Research needs to be done in order to use a recognizer effectively
● We must never forget that technology is effective only if it allows people to learn
BILC SEMINAR 2009
QUESTIONS?
COMMENTS?