
How Spread Works

Spread

• Spread stands for Speech and Phoneme Recognition as Educational Aid for the Deaf and Hearing Impaired Children

• It is a game used to visually motivate deaf and hearing impaired children to learn to speak

How does Spread work?

[Diagram: the CLIENT (Selection, Record) sends the .wav file + current word to the SERVER (Sphinx: Transcribe, Scoring), which returns the result for Feedback.]

Selection

• The user is presented with a screen showing the word to pronounce

Recording

• Recording begins once the user clicks the record button.

Transmission

• Transmission begins once the stop button is pressed.

• The wav file, the current word and the training phoneme are sent to the server for processing.

[Diagram: the CLIENT sends the phonemes K AA R to the SERVER; one phoneme is labeled as the training phoneme.]
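The slides do not say how the three items are packaged for the server. As a minimal sketch only, assuming a JSON message with base64-encoded audio (the class name, field names and encoding are all hypothetical, not Spread's actual protocol), the request could look like this:

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class PronunciationRequest:
    """Hypothetical message the client sends once the stop button is pressed."""

    wav_bytes: bytes       # raw contents of the recorded .wav file
    current_word: str      # word shown on the selection screen, e.g. "CAR"
    training_phoneme: str  # phoneme being practised, e.g. "AA"

    def to_json(self) -> str:
        # JSON cannot carry raw bytes, so the audio is base64-encoded (an assumption).
        return json.dumps({
            "wav": base64.b64encode(self.wav_bytes).decode("ascii"),
            "current_word": self.current_word,
            "training_phoneme": self.training_phoneme,
        })

    @classmethod
    def from_json(cls, text: str) -> "PronunciationRequest":
        # Server side: rebuild the request from the JSON text.
        data = json.loads(text)
        return cls(
            wav_bytes=base64.b64decode(data["wav"]),
            current_word=data["current_word"],
            training_phoneme=data["training_phoneme"],
        )
```

Keeping the word and the training phoneme in the same message as the audio means the server needs no per-user state to score the attempt.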

Transcribing & Sphinx

Transcribing

• Once the wav file arrives at the server, it is passed to Sphinx to recognize what the user said

Sphinx

• Sphinx is a Java-based Hidden Markov Model speech recognition system developed by Carnegie Mellon University

• To decode the wav file, Sphinx needs three data sets:
  – Acoustic Model
  – Dictionary
  – Language Model

Acoustic Model

• The Acoustic Model maps sound features to units of speech called phonemes

• It is derived by sampling a large data set of spoken words, called a speech corpus

[Diagram: sound features mapped to the phonemes K, AA, R.]

Dictionary

• The dictionary maps words into phonemes

...
CAN K AE N
CAR K AA R
CAT K AE T
T...
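As an illustration of the word-to-phoneme mapping, here is a minimal sketch that parses dictionary lines of the form shown above (a word followed by its phonemes). The function name `parse_dictionary` is hypothetical and the sample contains only the three complete entries from the slide; it is not Sphinx's own loader.

```python
def parse_dictionary(text: str) -> dict[str, list[str]]:
    """Parse lines such as 'CAR K AA R' into {'CAR': ['K', 'AA', 'R']}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and lines without any phonemes
        word, *phonemes = parts
        entries[word] = phonemes
    return entries


SAMPLE = """\
CAN K AE N
CAR K AA R
CAT K AE T
"""

dictionary = parse_dictionary(SAMPLE)
print(dictionary["CAR"])
```

Looking up the current word in such a table gives the expected phoneme sequence that the decoded result is later compared against.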

Language Model

• The language model indicates the probability of a particular word appearing given the previous words
  – Not used, since Spread only needs to recognize individual words

Decoding

• Sphinx in Spread is configured to detect what phonemes were pronounced by the user

[Diagram: Sphinx decodes the recording into the phonemes K AA R.]

Increasing Accuracy

• To increase accuracy, Sphinx in Spread is only made to recognize a limited number of phonemes per level

• 7 levels means 7 individually configured Sphinx instances

[Diagram: one Sphinx per level, each with its own word list. Sphinx Level 1: CAR, JAR, STAR…; further levels: BED, NET, TENT… and PLAY, PARTY, CIRCLE…]
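To make the idea of a limited phoneme set per level concrete, the following sketch collects the phonemes used by a level's word list. It is an assumption about how such a set could be derived, not the team's actual configuration: only the CAR entry comes from the slides, the JAR and STAR pronunciations follow the usual CMU-style notation, and `allowed_phonemes` is a hypothetical helper.

```python
# Level 1 words as listed on the slide (the list is truncated there with "…").
LEVEL_1_WORDS = ["CAR", "JAR", "STAR"]

# Pronunciations in CMU-style notation; JAR and STAR are assumed entries.
DICTIONARY = {
    "CAR": ["K", "AA", "R"],
    "JAR": ["JH", "AA", "R"],
    "STAR": ["S", "T", "AA", "R"],
}


def allowed_phonemes(words: list[str], dictionary: dict[str, list[str]]) -> set[str]:
    """Return the set of phonemes a level's recognizer would need to know."""
    allowed = set()
    for word in words:
        allowed.update(dictionary[word])
    return allowed


level_1_phonemes = allowed_phonemes(LEVEL_1_WORDS, DICTIONARY)
print(sorted(level_1_phonemes))
```

A recognizer restricted to such a small set has fewer candidates to confuse, which is the accuracy argument made on the slide.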

Scoring

• The server compares the decoded result against the expected result, taking note of the training phoneme

[Diagram: Sphinx output "You said: K AA R" compared against "Expected: K AA R", with the training phoneme labeled.]

Final result

Feedback

• The result is sent over to the client to give feedback to the user

Preliminary results

• Tested with adult members of the hearing impaired community
  – Very positive
  – "I wish I had this when I was learning speech"

• Problems: Too enthusiastic
  – Loud cheering noises reduced recognition rates

• SPREAD was tested with hearing impaired students of the SPED division of the Batino Elementary School in Proj. 3, Quezon City
  – Accuracy testing and software evaluation

Working with the children

• Of the 40 students, only 5 volunteered to test the software
  – The children were generally shy and hesitant to perform the speech

• The children only knew very few words
  – They knew how to sign some of the words but not to vocalize them

• General mood was as if they were taking an exam that they were not prepared for

• Surprisingly, children were very good at conversational phrases
  – "Good morning"
  – "Good bye and thank you!"

Working with the teachers

• Teachers still need to help the students vocalize some words
  – System as yet cannot be left unsupervised with the students

• Noisy screen distracts students
  – Need to have a simpler screen to focus on

Recognition Rates

• Sphinx recognition rates were low
  – Hampered by noisy environment

Conclusion

• Need to work closely with SPED teachers on speech curriculum
  – Test on just recently learned words

• Conversational phrases
  – Hearing impaired children use simple phrases rather than words.
  – Conversational phrases spoken, other words signed

• UI improvements, simple is better

• Accuracy improvements urgently needed

The Spread Team

Image Sources

• Microphone - http://mmflc.com/images/microphone-stock-image.jpg

• Crystal Project - http://www.everaldo.com/crystal/

• Wave form - http://bipinb.com/converting-wav-file-to-gsm-file.htm

• Extra slides follow…

Scoring

• There are three possible outcomes
  – EXCELLENT
  – Good
  – Sorry

Scoring

• Getting the training phoneme right, along with the correct number of phonemes in the word, earns an EXCELLENT score

[Diagram: Expected "K AA R"; Sphinx output "You said: K AA R"; checks shown: "3 Phonemes Long", "Got the Training Phoneme".]

Scoring

• Note that Spread is only looking for the correct pronunciation of the training vowel

[Diagram: Expected "K AA R"; Sphinx output "You said: K AA T"; check shown: "3 Phonemes Long".]


Scoring

• Not getting the correct word length gets a Good score

[Diagram: Expected "K AA R"; Sphinx output "You said: K AA R T"; check shown: "3 Phonemes Long".]


Scoring

• Not getting the training vowel means the user will have to try again
  – Length is no longer checked

[Diagram: Expected "K AA R"; Sphinx output "You said: K AE R"; check shown: "Got the Training Phoneme"; result: "Sorry =(".]
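The three-outcome rule described on these scoring slides can be sketched as a small function. This is one reading of the slides, not the team's code: the name `score` is hypothetical, and it assumes that "getting the training phoneme" means the phoneme appears anywhere in the decoded sequence, since the slides do not say whether its position is checked.

```python
def score(expected: list[str], heard: list[str], training_phoneme: str) -> str:
    """Return 'EXCELLENT', 'Good' or 'Sorry' for one pronunciation attempt."""
    # Missing the training phoneme fails outright; length is no longer checked.
    if training_phoneme not in heard:
        return "Sorry"
    # Training phoneme present but the phoneme count differs from the expected word.
    if len(heard) != len(expected):
        return "Good"
    # Training phoneme present and the length matches.
    return "EXCELLENT"


expected = ["K", "AA", "R"]
for heard in (["K", "AA", "R"], ["K", "AA", "T"], ["K", "AA", "R", "T"], ["K", "AE", "R"]):
    print(" ".join(heard), "->", score(expected, heard, "AA"))
```

Under this reading K AA T still scores EXCELLENT for the expected K AA R, which matches the note that Spread only looks for the correct pronunciation of the training vowel.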

Updates

• SPREAD has undergone BETA testing with a group of hearing impaired adults
  – Testing of original (pass/fail) algorithm

• Results
  – Low recognition rates even for recognizable speech
  – Puzzling due to high recognition rates with lab speech

Recognition Rate

Word     Rate   Close word
Apple    60%    Apple (60%)
Art      6%     Bat (66%)
Banana   13%    Apple (73%)
Bat      66%    Bat (66%)
Car      0%     Hand (46%)
Fan      0%     Hand (53%)
Hand     20%    Bat (33%)
Jar      0%     Hand (60%)
Lamb     0%     Apple (33%)
Sofa     0%     Hand (46%)
Star     0%     Fan (26%)
Table    0%     Apple (46%)
Van      0%     Art (26%)
Wallet   0%     Hand (60%)
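To put a number on "low recognition rates", the per-word rates listed above can be summarized with a few lines of arithmetic. The values are copied from the slide; the variable names are illustrative only.

```python
# Per-word recognition rates in percent, as listed on the slide.
RATES = {
    "Apple": 60, "Art": 6, "Banana": 13, "Bat": 66, "Car": 0,
    "Fan": 0, "Hand": 20, "Jar": 0, "Lamb": 0, "Sofa": 0,
    "Star": 0, "Table": 0, "Van": 0, "Wallet": 0,
}

# Unweighted mean over the 14 words.
mean_rate = sum(RATES.values()) / len(RATES)

# Words that were never recognized correctly.
never_recognized = sorted(word for word, rate in RATES.items() if rate == 0)

print(f"mean rate: {mean_rate:.1f}%")
print(f"{len(never_recognized)} of {len(RATES)} words never recognized")
```

The unweighted mean comes to about 11.8%, and 9 of the 14 words were never recognized correctly.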

Analysis

• Microphone

[Figure: lab test data compared with live data.]

Recommendations

• Better microphone/setup
  – Sphinx has preprocessing modules for less noise

• Per word recognition
  – Use creative word combinations to isolate the training phoneme without having to go into per phoneme recognition

• Check out phoneme recognizers

Per phoneme recognition

• Per phoneme recognition is worse
  – Spread is highly dependent on full words for increased recognition rates

Recognizing: Lamb2.wav I heard: ae ah m
Recognizing: Lamb3.wav I heard: ae m
Recognizing: Sofa1.wav I heard: s ow l ow
Recognizing: Sofa2.wav I heard: s ae
Recognizing: Sofa3.wav I heard: s ow hh aa
Recognizing: Star1.wav I heard: ao t
Recognizing: Star2.wav I heard: s d aa r
Recognizing: Star3.wav I heard: s d aa r
Recognizing: Table1.wav I heard: ah d l
Recognizing: Table2.wav I heard: ae ah
Recognizing: Table3.wav I heard: ae ah
Recognizing: Van1.wav I heard: m ae
Recognizing: Van2.wav I heard: m ae
