Spread
• Spread stands for Speech and Phoneme Recognition as Educational Aid for the Deaf and Hearing Impaired Children
• It is a game used to visually motivate deaf and hearing impaired children to learn to speak
How does Spread work?
[Diagram: on the CLIENT, Selection and Record; the .wav file + current word go to the SERVER, where Sphinx transcribes the audio; Scoring produces a result that returns to the client as Feedback]
Transmission
• Transmission begins once the stop button is pressed.
• The wav file, the current word and the training phoneme are sent to the server for processing.
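The three items sent to the server can be pictured as a single message. The sketch below is only an illustration: the slides do not describe the wire format, so the `Attempt` class, its field names and the JSON/base64 encoding are all assumptions.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Attempt:
    """One recorded attempt sent from client to server (hypothetical layout)."""
    wav_bytes: bytes        # raw contents of the recorded .wav file
    current_word: str       # the word the child was asked to say, e.g. "CAR"
    training_phoneme: str   # the phoneme being practised, e.g. "AA"

    def to_json(self) -> str:
        # Binary audio is base64-encoded so it can travel inside JSON text.
        return json.dumps({
            "wav": base64.b64encode(self.wav_bytes).decode("ascii"),
            "current_word": self.current_word,
            "training_phoneme": self.training_phoneme,
        })

    @staticmethod
    def from_json(payload: str) -> "Attempt":
        # Server side: rebuild the attempt from the received message.
        data = json.loads(payload)
        return Attempt(
            wav_bytes=base64.b64decode(data["wav"]),
            current_word=data["current_word"],
            training_phoneme=data["training_phoneme"],
        )
```

Keeping the training phoneme in the same message as the audio means the server needs no per-user state to score the attempt.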
[Diagram: transmission from CLIENT to SERVER; labels: K AA R, Training Phoneme]
Transcribing & Sphinx
Transcribing
• Once the wav file arrives at the server, it is passed to Sphinx to recognize what the user said
Sphinx
• Sphinx is a Java-based Hidden Markov Model speech recognition system developed by Carnegie Mellon University
Sphinx
• To decode the wav file, Sphinx needs three data sets
– Acoustic Model
– Dictionary
– Language Model
Acoustic Model
• The Acoustic Model maps sound features to units of speech called phonemes
• It is derived by sampling a large data set of spoken words, called a speech corpus
[Diagram: audio mapped to the phonemes K, AA, R]
Language Model
• The language model indicates the probability of a particular word appearing given the previous words
– Not used since Spread only needs to recognize individual words
Decoding
• Sphinx in Spread is configured to detect what phonemes were pronounced by the user
[Diagram: SPHINX outputs the phonemes K AA R]
Increasing Accuracy
• To increase accuracy, Sphinx in Spread is configured to recognize only a limited number of phonemes per level
• 7 levels means 7 individually configured Sphinx instances
Sphinx Level 1: CAR, JAR, STAR…
Sphinx Level 2: BED, NET, TENT…
Sphinx Level 3: PLAY, PARTY, CIRCLE…
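One way to picture the per-level restriction is as a small pronunciation dictionary per level, from which the phoneme inventory of that level follows. This is a sketch only: the word lists come from the slide, but the `LEVEL_DICTIONARIES` structure and the ARPAbet-style pronunciations (other than CAR = K AA R) are assumptions, not the project's actual configuration files.

```python
from typing import Dict, List, Set

# Words per level are taken from the slide (the lists there are truncated).
# Pronunciations are assumed ARPAbet-style entries; only CAR -> K AA R
# appears in the slides themselves.
LEVEL_DICTIONARIES: Dict[int, Dict[str, List[str]]] = {
    1: {
        "CAR": ["K", "AA", "R"],
        "JAR": ["JH", "AA", "R"],
        "STAR": ["S", "T", "AA", "R"],
    },
    2: {
        "BED": ["B", "EH", "D"],
        "NET": ["N", "EH", "T"],
        "TENT": ["T", "EH", "N", "T"],
    },
}


def phonemes_for_level(level: int) -> Set[str]:
    """Return the restricted phoneme inventory a level's recognizer must cover."""
    return {
        phoneme
        for pronunciation in LEVEL_DICTIONARIES[level].values()
        for phoneme in pronunciation
    }
```

With these assumed entries, the Level 1 inventory contains AA but not EH, so a Level 1 recognizer never has to distinguish between the two vowels.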
Scoring
• The server compares the decoded result against the expected result, taking note of the training phoneme
[Diagram: Sphinx output "You said: K AA R" compared with "Expected: K AA R"; the Training Phoneme is marked]
Final result
[Diagram: the full client-server flow from "How does Spread work?", ending with Scoring, result and Feedback]
Preliminary results
• Tested with adult members of the hearing impaired community
– Very positive.
– "I wish I had this when I was learning speech"
• Problems: Too enthusiastic
– Loud cheering noises reduced recognition rates
Preliminary results
• SPREAD was tested with hearing impaired students of the SPED division of the Batino Elementary School in Proj. 3, Quezon City
– Accuracy testing and software evaluation
Working with the children
• Of the 40 students, only 5 volunteered to test the software
– The children were generally shy and hesitant to perform the speech
Working with the children
• The children knew very few words
– They knew how to sign some of the words but not how to vocalize them
• The general mood was as if they were taking an exam that they were not prepared for
Working with the children
• Surprisingly, children were very good at conversational phrases
– “Good morning”
– “Good bye and thank you!”
Working with the teachers
• Teachers still need to help the students vocalize some words
– The system cannot yet be left unsupervised with the students
Working with the teachers
• Noisy screen distracts students
– Need to have a simpler screen to focus on
Conclusion
• Need to work closely with SPED teachers on speech curriculum
– Test on recently learned words
• Conversational phrases
– Hearing impaired children use simple phrases rather than words.
– Conversational phrases spoken, other words signed
• UI improvements, simple is better
• Accuracy improvements urgently needed
Image Sources
• Microphone - http://mmflc.com/images/microphone-stock-image.jpg
• Crystal Project - http://www.everaldo.com/crystal/
• Wave form - http://bipinb.com/converting-wav-file-to-gsm-file.htm
Scoring
• Pronouncing the training phoneme correctly and producing a word of the correct length (in phonemes) earns an EXCELLENT score
Expected: K AA R
You said: K AA R
3 Phonemes Long
Got the Training Phoneme
Scoring
• Note that Spread is only looking for the correct pronunciation of the training vowel
Expected: K AA R
You said: K AA T
3 Phonemes Long
Got the Training Phoneme
Scoring
• Not getting the correct word length gets a Good score
Expected: K AA R
You said: K AA R T
3 Phonemes Long
Got the Training Phoneme
Scoring
• Not getting the training vowel means the user will have to try again
– Length is no longer checked
Expected: K AA R
You said: K AE R
Got the Training Phoneme
Sorry =(
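The scoring rules on the preceding slides can be summarised in a few lines. This is a minimal sketch, not the project's actual code: the function name, the "TRY AGAIN" label and the simple membership test for the training phoneme are assumptions; only the EXCELLENT and Good outcomes and the four examples come from the slides.

```python
from typing import List


def score(decoded: List[str], expected: List[str], training_phoneme: str) -> str:
    """Score a decoded phoneme sequence against the expected one."""
    # Without the training phoneme the user must try again;
    # the length is not checked in this case.
    if training_phoneme not in decoded:
        return "TRY AGAIN"
    # Training phoneme present and same number of phonemes as expected.
    if len(decoded) == len(expected):
        return "EXCELLENT"
    # Training phoneme present but the word length differs.
    return "GOOD"


expected = ["K", "AA", "R"]
print(score(["K", "AA", "R"], expected, "AA"))       # EXCELLENT
print(score(["K", "AA", "T"], expected, "AA"))       # EXCELLENT
print(score(["K", "AA", "R", "T"], expected, "AA"))  # GOOD
print(score(["K", "AE", "R"], expected, "AA"))       # TRY AGAIN
```

Note that K AA T still scores EXCELLENT: only the training vowel and the length are checked, so errors in the other phonemes are not penalised.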
Updates
• SPREAD has undergone beta testing with a group of hearing impaired adults
– Testing of original (pass/fail) algorithm
• Results
– Low recognition rates even for recognizable speech
– Puzzling due to high recognition rates with lab speech
Recognition Rate
Word     Rate   Close word
Apple    60%    Apple (60%)
Art      6%     Bat (66%)
Banana   13%    Apple (73%)
Bat      66%    Bat (66%)
Car      0%     Hand (46%)
Fan      0%     Hand (53%)
Hand     20%    Bat (33%)
Jar      0%     Hand (60%)
Lamb     0%     Apple (33%)
Sofa     0%     Hand (46%)
Star     0%     Fan (26%)
Table    0%     Apple (46%)
Van      0%     Art (26%)
Wallet   0%     Hand (60%)
Recommendations
• Better microphone/setup
– Sphinx has preprocessing modules that reduce noise
• Per word recognition
– Use creative word combinations to isolate the training phoneme without having to go into per phoneme recognition
• Check out phoneme recognizers
Per phoneme recognition
• Per phoneme recognition is worse
– Spread is highly dependent on full words for increased recognition rates
Recognizing: Lamb2.wav I heard: ae ah m
Recognizing: Lamb3.wav I heard: ae m
Recognizing: Sofa1.wav I heard: s ow l ow
Recognizing: Sofa2.wav I heard: s ae
Recognizing: Sofa3.wav I heard: s ow hh aa
Recognizing: Star1.wav I heard: ao t
Recognizing: Star2.wav I heard: s d aa r
Recognizing: Star3.wav I heard: s d aa r
Recognizing: Table1.wav I heard: ah d l
Recognizing: Table2.wav I heard: ae ah
Recognizing: Table3.wav I heard: ae ah
Recognizing: Van1.wav I heard: m ae
Recognizing: Van2.wav I heard: m ae