uSpeak2MeEsol Android Tool
Speech Recognition ECE5526
Wilson Burgos
Outline
Introduction
Objective
Existing Solutions
Implementation
Test and Result
Conclusion
Introduction
Lots of $$$ is spent annually to improve language skills for non-native speakers.
Classes for ESOL (English for Speakers of Other Languages)
Lack of effective tools
Speech recognition can help us in some areas
Objective
Create a tool to help people learn to speak English correctly in an effective way.
Engage people using new technology (smartphones)
Using pocketsphinx, Android and text-to-speech technology
Simple and intuitive to use
Fun
Existing Solutions
EyeSpeak - http://www.eyespeakenglish.com
Pros
  Uses native speakers
  Pronunciation, pitch, timing & loudness
Cons
  Difficult to use
  Runs only on Windows
Concept of Operation
The user can start the game from the main menu
The game screen must lead the user through a series of words and log the number of positive responses (the score).
Each word has a corresponding graphic to display. For example, the game might show the user a picture of a mountain.
The user has at most 30 seconds to respond
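The game flow above can be sketched in plain Java, without any Android classes. This is only an illustration of the rules on the slide (one round per word, a 30-second limit, a score that counts positive responses); the class name `WordQuizSketch` and its methods are hypothetical and are not taken from the tool's source.

```java
import java.util.List;

/** Hypothetical sketch of the quiz rules: one round per word, 30 s limit, score counts matches. */
class WordQuizSketch {
    // "At most 30 seconds to respond", as stated on the slide.
    static final long RESPONSE_LIMIT_MS = 30_000;

    private final List<String> words;
    private int index = 0;
    private int score = 0;

    WordQuizSketch(List<String> words) {
        this.words = words;
    }

    boolean hasNext() {
        return index < words.size();
    }

    String currentWord() {
        return words.get(index);
    }

    /** Records one response and moves to the next word. A missing or late response is a miss. */
    boolean answer(String spoken, long elapsedMs) {
        boolean correct = spoken != null
                && elapsedMs <= RESPONSE_LIMIT_MS
                && spoken.trim().equalsIgnoreCase(currentWord());
        if (correct) {
            score++;
        }
        index++;
        return correct;
    }

    int score() {
        return score;
    }
}
```

In the real application the spoken text would come from the recognizer and the elapsed time from the game timer; here both are passed in directly so the logic can be exercised on its own.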
Development Environment
Eclipse IDE with Android plugin
Cygwin
Emulator
  QEMU-based ARM emulator
  Runs the same image as the device
  Limitations
    No camera support
Development Environment
Actual Device
Implementation
Using Java with the Android SDK
Pocketsphinx
  Lightweight speech recognition decoder library
  Implemented in C
Android Architecture
Application Building Blocks
Activity
IntentReceiver
Service
ContentProvider
Application Architecture
Implementation
QuizGameActivity
The screen at the heart of the application: the game play screen.
This screen prompts the user to answer a series of trivia questions and stores the resulting score information.
Uses text-to-speech technology to speak the word when in simple mode
Implementation
[Diagram: RecognizerTask, AudioTask, PocketSphinx, VUMeter]
Implementation
RecognizerTask
Interfaces directly with the pocketsphinx library using JNI calls
The Java Native Interface (JNI) enables the integration of code written in the Java programming language with code written in other languages such as C and C++
Consumes data from the audio queue, produced by the AudioTask
Calls process_raw
Scoring
Based on positive detection of the utterance
Implementation
AudioTask
Interfaces directly with the audio peripherals to gather data
Format: sample rate 8000 Hz, 16-bit PCM, 8192k buffer
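The hand-off between the two tasks, where AudioTask produces buffers and RecognizerTask consumes them from a queue, might look roughly like the plain-Java sketch below. It is an assumption about the general shape, not the tool's code: the `Decoder` interface is a hypothetical stand-in for the JNI call into pocketsphinx, and the buffers are supplied by the caller instead of being read from the microphone.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical producer/consumer sketch of the audio queue between the two tasks. */
class AudioQueueSketch {
    /** Stand-in for the native decoder; the real tool reaches pocketsphinx through JNI. */
    interface Decoder {
        void processRaw(short[] samples, int count);
    }

    // Sentinel object that marks the end of the audio stream.
    private static final short[] END = new short[0];

    /** Producer side: queues each buffer, then the end marker. */
    static Thread producer(BlockingQueue<short[]> queue, short[][] buffers) {
        return new Thread(() -> {
            try {
                for (short[] buffer : buffers) {
                    queue.put(buffer);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Consumer side: feeds each queued buffer to the decoder; returns the samples consumed. */
    static int consume(BlockingQueue<short[]> queue, Decoder decoder) {
        int total = 0;
        try {
            while (true) {
                short[] buffer = queue.take();
                if (buffer == END) {
                    break;
                }
                decoder.processRaw(buffer, buffer.length);
                total += buffer.length;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total;
    }

    /** Runs one producer thread against a consumer on the calling thread. */
    static int run(short[][] buffers) {
        BlockingQueue<short[]> queue = new LinkedBlockingQueue<>();
        Thread audio = producer(queue, buffers);
        audio.start();
        int consumed = consume(queue, (samples, count) -> { /* decoding would happen here */ });
        try {
            audio.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }
}
```

A blocking queue keeps the audio thread from being held up by decoding, which matters when the recording buffer is small.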
PocketSphinx
Very limited documentation
Packaged pocketsphinx into a shared library
Created Java shared library counterpart (jar)
  To be added to the Android application
Compiled using Cygwin
Initialized with custom dictionary and language model
  Speak2me.dic
  Speak2me.lm
  Loaded at startup from Java code
Limitations
Hardware memory
In the sphinx4 demos the recognizer was active all the time gathering data. When running on the device, the AudioRecord buffer fills up, preventing the recognizer from being active all the time.
The game needs to be responsive. How do we solve this problem?
Limitations
Hardware memory
The VUMeter class calculates the energy of the sampled data, removing the DC offset with a filter.
Detection logic was added to trigger end of utterance automatically with configurable lock/unlock thresholds
The game timer automatically starts the recognizer after every given word
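The energy-based end-of-utterance detection described above could be sketched as follows. The slides do not give the filter or the threshold logic, so everything here is an assumption: a common one-pole DC-blocking filter, mean-square frame energy, and a hysteresis where energy above the lock threshold starts an utterance and a run of frames below the unlock threshold ends it. The class name `EnergyEndpointSketch` is hypothetical.

```java
/** Hypothetical sketch: DC removal, frame energy, and lock/unlock end-of-utterance detection. */
class EnergyEndpointSketch {
    private static final double DC_POLE = 0.995; // assumed filter coefficient

    private final double lockThreshold;   // energy above this starts an utterance
    private final double unlockThreshold; // energy below this counts as a quiet frame
    private final int hangoverFrames;     // consecutive quiet frames needed to end it

    private double prevIn = 0.0;  // DC-blocker state: previous input sample
    private double prevOut = 0.0; // DC-blocker state: previous output sample
    private boolean locked = false;
    private int quietFrames = 0;

    EnergyEndpointSketch(double lockThreshold, double unlockThreshold, int hangoverFrames) {
        this.lockThreshold = lockThreshold;
        this.unlockThreshold = unlockThreshold;
        this.hangoverFrames = hangoverFrames;
    }

    /** Mean-square energy of one frame after the DC blocker y[n] = x[n] - x[n-1] + R*y[n-1]. */
    double frameEnergy(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (short sample : frame) {
            double out = sample - prevIn + DC_POLE * prevOut;
            prevIn = sample;
            prevOut = out;
            sum += out * out;
        }
        return sum / frame.length;
    }

    /** Feeds one frame; returns true only on the frame where a started utterance is judged over. */
    boolean endOfUtterance(short[] frame) {
        double energy = frameEnergy(frame);
        if (!locked) {
            if (energy > lockThreshold) {
                locked = true;
                quietFrames = 0;
            }
            return false;
        }
        if (energy < unlockThreshold) {
            quietFrames++;
            if (quietFrames >= hangoverFrames) {
                locked = false;
                quietFrames = 0;
                return true;
            }
        } else {
            quietFrames = 0;
        }
        return false;
    }
}
```

Using two thresholds instead of one avoids flapping between speech and silence when the energy hovers near a single cut-off; the actual threshold values would need tuning on the device.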
Device Speed
To improve detection, the application uses the partial results to determine if a match has been found; it doesn't penalize the user if a partial result is incorrect.
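One way to read the partial-result rule is that a partial hypothesis can only confirm a match, while only the final hypothesis can count as a miss. The sketch below illustrates that reading in plain Java; the class `PartialMatchSketch`, the whole-word comparison, and the `Outcome` values are assumptions made for illustration and are not taken from the tool.

```java
import java.util.Locale;

/** Hypothetical sketch of scoring with partial and final recognizer hypotheses. */
class PartialMatchSketch {
    enum Outcome { PENDING, CORRECT, INCORRECT }

    /** True if the target appears as a whole word in the hypothesis, ignoring case. */
    static boolean matches(String hypothesis, String target) {
        if (hypothesis == null || target == null) {
            return false;
        }
        String wanted = target.trim().toLowerCase(Locale.ROOT);
        for (String word : hypothesis.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** A partial hypothesis never yields INCORRECT; only the final one can. */
    static Outcome judge(String hypothesis, String target, boolean isFinal) {
        if (matches(hypothesis, target)) {
            return Outcome.CORRECT;
        }
        return isFinal ? Outcome.INCORRECT : Outcome.PENDING;
    }
}
```

Accepting an early match lets the game respond before the decoder finishes, which helps on a slow device.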
Screenshots
Test and Results
The cmu07a.dic recognized very poorly

hub4_wsj_sc_3s_8k.cd_semi_5000
TOTAL Words: 91 Correct: 56 Errors: 46
TOTAL Percent correct = 61.54% Error = 50.55% Accuracy = 49.45%
TOTAL Insertions: 11 Deletions: 3 Substitutions: 32

hub4_wsj_sc_3s_8k.cd_semi_5000adapt
TOTAL Words: 91 Correct: 71 Errors: 25
TOTAL Percent correct = 78.02% Error = 27.47% Accuracy = 72.53%
TOTAL Insertions: 5 Deletions: 9 Substitutions: 11
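The percentages in these reports follow from the raw counts. The sketch below shows the arithmetic, assuming the usual definitions: correct words are the reference words minus deletions and substitutions, the error rate counts insertions, deletions and substitutions against the reference words, and accuracy is 100% minus the error rate. The class name `WordErrorSketch` is hypothetical.

```java
/** Hypothetical sketch of the word-level metrics, computed from alignment counts. */
class WordErrorSketch {
    /** Reference words that were recognized correctly. */
    static int correct(int words, int deletions, int substitutions) {
        return words - deletions - substitutions;
    }

    /** Percent correct: correctly recognized words over reference words. */
    static double percentCorrect(int words, int deletions, int substitutions) {
        return 100.0 * correct(words, deletions, substitutions) / words;
    }

    /** Error rate: all three error types over reference words. */
    static double errorRate(int words, int insertions, int deletions, int substitutions) {
        return 100.0 * (insertions + deletions + substitutions) / words;
    }

    /** Accuracy: 100% minus the error rate, so insertions are penalized as well. */
    static double accuracy(int words, int insertions, int deletions, int substitutions) {
        return 100.0 - errorRate(words, insertions, deletions, substitutions);
    }
}
```

With the first model's counts (91 words, 11 insertions, 3 deletions, 32 substitutions) this gives 56 correct words, 61.54% correct, 50.55% error and 49.45% accuracy; with the adapted model's counts (5, 9, 11) it gives 71 correct words, 78.02%, 27.47% and 72.53%. Percent correct and accuracy differ because only accuracy accounts for insertions.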
Test and Results
Using the custom corpus and a custom language model, the tool accurately detects speech in a timely fashion (~2 s).
Installation
Need to install custom lexical and language modeling files
Future Additions
Adapt scoring based on pitch and phoneme recognition.
Add different levels of difficulty
Show progress reports