36605087 voice recognition system report
8/10/2019 36605087 Voice Recognition System Report
1/21
VOICE RECOGNITION BASED
HOME AUTOMATION SYSTEM
CHAPTER 1
Introduction
A voice recognition system is a system that can recognize voices. It can be used to identify spoken words or for security purposes.
Voice recognition is the process of automatically recognizing who is speaking or what is being said, on the basis of individual information included in the speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
Some voice recognition systems are designed to convert spoken words into text.
Voice recognition systems or software can also be used as an alternative to typing on a keyboard. Put simply, you talk to the computer and your words appear on the screen. The software has been developed to provide a fast method of writing on a computer and can help people with a variety of disabilities. It is useful for people with physical disabilities who often find typing difficult, painful or impossible. Voice recognition software can also help those with spelling difficulties, including users with dyslexia, because recognized words are always correctly spelled.
We can see voice recognition systems in daily life. For example, today, when we call most large companies, a person doesn't usually answer the phone. Instead, an automated voice recording answers and instructs you to press buttons to move through option menus. Many companies have moved beyond requiring you to press buttons, though. Often you can just speak certain words (again, as instructed by a recording) to get what you need. The system that makes this possible is a type of voice recognition program: an automated phone system.
You can also use voice recognition software in homes and businesses. A range of software products allows users to dictate to their computer and have their words converted to text in a word processing or e-mail document. You can access function commands, such as opening files and accessing menus, with voice instructions. Some programs are for specific business settings, such as medical or legal transcription.
People with disabilities that prevent them from typing have also adopted voice-recognition systems. If a user has lost the use of his hands, or for visually impaired users when it is not …
Classification of Voice Recognition Systems
Isolated voice recognition systems require a brief pause between spoken words; otherwise they cannot detect the speech completely and the system will malfunction.
Continuous voice recognition systems do not require a brief pause between spoken words, so they can detect continuous speech. This type of system can be regarded as an advanced version of the isolated voice recognition system.
Speaker-dependent voice recognition systems can only recognize speech from one particular speaker's voice. These types of systems can be used for security and identification purposes.
Speaker-independent voice recognition systems can recognize speech from anybody. These types of systems are embedded in voice-activated routing at customer call centres, voice dialing on mobile phones and many other daily applications. This type of system can be regarded as an advanced version of the speaker-dependent voice recognition system.
The Development Workflow of a Voice Recognition System
There are two major stages within voice recognition: a training stage and a testing stage. Training involves "teaching" the system by building its dictionary, an acoustic model for each word that the system needs to recognize. In the testing stage, the acoustic models of these words are used to recognize spoken words with a classification algorithm.
The development workflow consists of three steps:
Speech Acquisition.
Speech Analysis.
User Interface Development.
Speech Acquisition
For training, speech is acquired from the microphone and brought into the development environment for offline analysis. For testing, speech is continuously streamed into the environment for online processing.
During the training stage, it is necessary to record repeated utterances of each word in the dictionary. For example, suppose we are recording the word "Apple" for the dictionary; then we have to record "Apple" many times, with a pause between each utterance. This is necessary for building a robust voice recognition system. If we fail to do so, the system may produce undesirable responses.
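As a rough illustration of why repeated recordings help, the sketch below averages several utterances of each dictionary word into one template and, in the testing stage, picks the word whose template is nearest. It assumes each utterance has already been reduced to a fixed-length feature vector; the function names, the "apple"/"orange" data and the nearest-template rule are hypothetical choices for illustration, not the report's own classification algorithm.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_templates(training):
    """Average the repeated utterances of each word into one template vector."""
    templates = {}
    for word, utterances in training.items():
        count = len(utterances)
        dim = len(utterances[0])
        templates[word] = [sum(u[i] for u in utterances) / count for i in range(dim)]
    return templates


def classify(features, templates):
    """Return the dictionary word whose template is nearest to the test features."""
    return min(templates, key=lambda word: dist(features, templates[word]))


# Hypothetical 2-D feature vectors: three recordings of "apple", two of "orange".
training = {
    "apple": [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]],
    "orange": [[4.0, 0.5], [4.2, 0.7]],
}
templates = build_templates(training)
print(classify([1.1, 1.9], templates))  # apple
```

Averaging many repetitions smooths out the variation between individual utterances, which is one reason a single recording per word tends to give a fragile system.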
We can record the speech using a microphone and a standard PC sound card. This approach works well for training data. In the testing stage, we need to continuously acquire
and buffer speech samples and, at the same time, process the incoming speech frame by frame, or in continuous groups of samples.
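A minimal sketch of continuous acquisition, assuming the sound card delivers audio as lists of numeric samples in chunks of arbitrary size: the generator below buffers the incoming samples and releases fixed-length, overlapping frames for processing. The names `stream_frames`, `frame_len` and `hop` are hypothetical.

```python
from collections import deque
from itertools import islice


def stream_frames(chunks, frame_len, hop):
    """Buffer incoming sample chunks and yield overlapping frames of frame_len samples.

    Consecutive frames start hop samples apart; leftover samples that do not yet
    fill a whole frame stay in the buffer until the next chunk arrives.
    """
    if not 0 < hop <= frame_len:
        raise ValueError("hop must be between 1 and frame_len")
    buffer = deque()
    for chunk in chunks:            # e.g. blocks delivered by the sound card driver
        buffer.extend(chunk)
        while len(buffer) >= frame_len:
            yield list(islice(buffer, frame_len))
            for _ in range(hop):    # slide the analysis window forward by one hop
                buffer.popleft()


frames = list(stream_frames([list(range(5)), list(range(5, 12))], frame_len=4, hop=2))
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9], [8, 9, 10, 11]]
```

At an assumed sampling rate of 8 kHz, 25 ms frames with a 10 ms hop would correspond to `stream_frames(chunks, frame_len=200, hop=80)`.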
Speech Analysis
Once speech has been acquired into the development environment, it has to be processed or analyzed. Speech analysis is one of the most complicated and important steps in the development of a voice recognition system. In this stage, a word detection algorithm is designed that separates each word from the ambient noise. Then an acoustic model is derived that gives a robust representation of each word in the training stage. Finally, an appropriate classification algorithm is selected for the testing stage.
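One simple way to separate words from ambient noise, shown here only as an assumed example (the report does not say which word detection algorithm it uses), is short-time energy thresholding: frames whose mean squared amplitude exceeds a threshold count as speech, and runs of such frames become word boundaries. The threshold and the minimum word length are hypothetical parameters that would need tuning to the room's noise level.

```python
def detect_words(samples, frame_len, threshold, min_frames=2):
    """Return (start, end) sample indices of regions whose short-time energy exceeds threshold.

    Regions shorter than min_frames frames are treated as clicks or noise bursts and dropped.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            loud = sum(x * x for x in frame) / frame_len > threshold
        else:
            loud = False  # sentinel step: close any region still open at the end
        if loud and start is None:
            start = i     # a word begins at this frame
        elif not loud and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len, i * frame_len))
            start = None
    return segments


# Silence, a long loud stretch (a "word"), silence, a short click, silence.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 8 + [1.0] * 4 + [0.0] * 4
print(detect_words(signal, frame_len=4, threshold=0.5))  # [(8, 16)]
```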
User Interface Development
These systems have a graphical user interface for the convenience of the users. In these user interfaces, users first have to train the system; they can then use it for testing and for their work.
How Does Speech-to-Data Conversion Take Place?
To convert speech to on-screen text or a computer command, a computer has to go through several complex steps. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand. To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals. The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into different bands of frequency (frequency is the number of vibrations per second of the sound wave, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may also have to be temporally aligned. People don't always speak at the same speed, so the sound must be adjusted to match the speed of the template sound samples already stored in the system's memory.
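The sampling and normalization steps can be illustrated with a short sketch. It assumes a 16-bit ADC (sample values from -32768 to 32767) and uses a synthetic 1 kHz tone in place of a real voice; `sample_tone` and `normalize` are hypothetical helpers, and scaling to a fixed peak is only one of several possible ways to reach a constant volume level.

```python
import math


def sample_tone(freq_hz, rate_hz, n_samples, amplitude=0.3):
    """Simulate an ADC: measure a sine wave every 1/rate_hz seconds and quantize to 16-bit ints."""
    full_scale = 32767  # largest positive 16-bit sample value
    return [round(amplitude * full_scale * math.sin(2 * math.pi * freq_hz * t / rate_hz))
            for t in range(n_samples)]


def normalize(samples, target_peak=0.9):
    """Scale samples to floats in [-1, 1] so that the loudest one has magnitude target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return [0.0] * len(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]


quiet = sample_tone(1000, 8000, 80)     # 10 ms of a quiet tone at 8 kHz
print(max(abs(s) for s in quiet))       # 9830
print(round(max(abs(s) for s in normalize(quiet)), 6))  # 0.9
```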
Next, the signal is divided into small segments as short as a few hundredths of a second, or even thousandths in the case of plosive consonant sounds (consonant stops produced by obstructing airflow in the vocal tract), like "p" or "t". The program then matches these segments to known phonemes in the appropriate language. A phoneme is the smallest element of a language: a representation of the sounds we make and put together to form meaningful expressions. There are roughly …
The next step seems simple, but it is actually the most difficult to accomplish and is the focus of most speech recognition research. The program examines phonemes in the context of the other phonemes around them. It runs the contextual phoneme plot through a complex statistical model and compares them to a large library of known words, phrases and sentences. The program then determines what the user was probably saying and either outputs it as text or issues a computer command.
Voice Recognition and Statistical Modeling
Early speech recognition systems tried to apply a set of grammatical and syntactical rules to speech. If the words spoken fit into a certain set of rules, the program could determine what the
words were. However, human language has numerous exceptions to its own rules, even when it's spoken consistently. Accents, dialects and mannerisms can vastly change the way certain words or phrases are spoken. Imagine someone from Boston saying the word "barn." He wouldn't pronounce the "r" at all, and the word comes out rhyming with "John." Or consider the sentence "I'm going to see the ocean." Most people don't enunciate their words very carefully. The result might come out as "I'm goin' da see tha ocean." They run several of the words together with no noticeable break, such as "I'm goin'" and "the ocean." Rules-based systems were unsuccessful because they couldn't handle these variations. This also explains why earlier systems could not handle continuous speech: you had to speak each word separately, with a brief pause in between.
Today's speech recognition systems use powerful and complicated statistical modeling systems. These systems use probability and mathematical functions to determine the most likely outcome. According to John Garofolo, Speech Group Manager at the Information Technology Laboratory of the National Institute of Standards and Technology, the two models that dominate the field today are the Hidden Markov Model and neural networks. These methods involve complex mathematical functions, but essentially, they take the information known to the system to figure out the information hidden from it.
The Hidden Markov Model is the most common, so we'll take a closer look at that process. In this model, each phoneme is like a link in a chain, and the completed chain is a word. However, the chain branches off in different directions as the program attempts to match the digital sound with the phoneme that's most likely to come next. During this process, the program assigns a probability score to each phoneme, based on its built-in dictionary and user training.
This process is even more complicated for phrases and sentences: the system has to figure out where each word stops and starts. The classic example is the phrase "recognize speech," which sounds a lot like "wreck a nice beach" when you say it very quickly. The program has to analyze the phonemes using the phrase that came before them in order to get it right. Here's a breakdown of the two phrases:
r eh k ao g n ay z s p iy ch: "recognize speech"
r eh k ay n ay s b iy ch: "wreck a nice beach"
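The role of context can be sketched with a bigram language model, which scores a word sequence by the probability of each word given the word before it. The probabilities below are invented for illustration, and the acoustic match of the two candidates is assumed to be equally good, so only the language model decides between them; a real recognizer combines both scores.

```python
import math

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the sentence start.
BIGRAMS = {
    ("<s>", "recognize"): 0.01, ("recognize", "speech"): 0.2,
    ("<s>", "wreck"): 0.001, ("wreck", "a"): 0.1,
    ("a", "nice"): 0.01, ("nice", "beach"): 0.05,
}
UNSEEN = 1e-6  # assumed probability for any word pair missing from the table


def sentence_log_prob(words):
    """Sum of log P(w_i | w_{i-1}) over the sentence (a bigram language model)."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += math.log(BIGRAMS.get((prev, word), UNSEEN))
        prev = word
    return total


candidates = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(candidates, key=sentence_log_prob)
print(" ".join(best))  # recognize speech
```

With this made-up table, "recognize speech" (0.01 × 0.2) is far more probable than "wreck a nice beach" (0.001 × 0.1 × 0.01 × 0.05), so it wins even though the two phoneme strings are nearly identical.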
Why is this so complicated? If a program has a vocabulary of 60,000 words (common in today's programs), a sequence of three words could be any of 216 trillion possibilities. Obviously, even the most powerful computer can't search through all of them without some help.
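The 216 trillion figure follows from simple counting, assuming every one of the 60,000 words may appear in each of the three positions (repetition allowed, order significant):

```python
vocabulary_size = 60_000
sequence_length = 3

# Each position can hold any vocabulary word, so the count is vocabulary_size ** sequence_length.
possibilities = vocabulary_size ** sequence_length
print(f"{possibilities:,}")  # 216,000,000,000,000
```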
That help comes in the form of program training. According to John Garofolo:
"These statistical systems need lots of exemplary training data to reach their optimal performance, sometimes on the order of thousands of hours of human-transcribed speech and hundreds of megabytes of text. These training data are used to create acoustic models of words, word lists, and [...] multi-word probability networks. There is some art into how one selects, compiles and prepares this training data for 'digestion' by the system and how the system models are 'tuned' to a particular application. These details can make the difference between a well-performing system and a poorly-performing system, even when using the same basic algorithm."
While the software developers who set up the system's initial vocabulary perform much of this training, the end user must also spend some time training it. In a business setting, the primary users of the program must spend some time (sometimes as little as 10 minutes) speaking into the system to train it on their particular speech patterns. They must also train the system to recognize terms and acronyms particular to the company. Special editions of speech recognition programs for medical or legal offices have terms commonly used in those fields already trained into them.
CHAPTER 3
Block Diagram
[Figure: block diagram with the blocks Microphone, PC, Microcontroller, Relay 1, Relay 2, Relay 3, Load 1, Load 2 and Load 3]
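The diagram does not specify how the PC talks to the microcontroller. Reading it as a chain from the microphone through the PC and microcontroller to the relays that switch the loads, a minimal PC-side sketch might turn each recognized phrase into a two-byte command (relay number, then 1 for ON or 0 for OFF). The phrases, the light/fan/heater assignment of Loads 1 to 3, and the frame layout are all hypothetical.

```python
# Hypothetical command table: recognized phrase -> (relay number, switch on?).
# Which appliance sits on Load 1, 2 or 3 is an assumption for illustration.
COMMANDS = {
    "light on": (1, True), "light off": (1, False),
    "fan on": (2, True), "fan off": (2, False),
    "heater on": (3, True), "heater off": (3, False),
}


def command_frame(recognized_text):
    """Turn a recognized phrase into a 2-byte frame for the microcontroller.

    Assumed frame layout: byte 0 = relay number (1-3), byte 1 = 1 for ON or 0 for OFF.
    Returns None if the phrase is not a known command, so nothing is sent.
    """
    phrase = " ".join(recognized_text.lower().split())  # normalize case and spacing
    if phrase not in COMMANDS:
        return None
    relay, switch_on = COMMANDS[phrase]
    return bytes([relay, 1 if switch_on else 0])


print(command_frame("Fan  ON"))  # b'\x02\x01'
```

On the PC, such a frame could then be written to the microcontroller's serial port, for example with the third-party pySerial package (`serial.Serial(port, baudrate).write(frame)`). Returning `None` for unknown phrases means a misrecognized utterance never switches a load.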
CHAPTER 4
… France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system.
Some important conclusions from the work were as follows:
1. Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.
3. More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.
Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.
Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade on speech recognition applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system.
As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition and in overall speech recognition technology in order to consistently achieve performance improvements in operational settings.
Battle Management
Battle management command centers generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Users were very optimistic about the potential of the system, although capabilities were limited.
Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of a natural speech interface. Speech recognition efforts have focused on a database of continuous speech recognition (CSR), large-vocabulary speech which is designed to be representative of the naval resource management task. Significant advances in the state of the art in CSR have been achieved, and current efforts are focused on
integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system.
Training Air Traffic Controllers
Training for air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog that the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.
The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications. The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using vocabulary specifically designed for the ATC task. Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in application of task-domain grammar constraints.
The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc., and Adacel Systems Inc (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo-pilots.
Another approach to ATC simulation with speech recognition has been created by Supremis. The Supremis system is not constrained by rigid grammars imposed by the underlying limitations of other recognition strategies.
Telephony and Other Domains
Automatic speech recognition (ASR) in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.
The improvement of mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Speech is used mostly as part of the user interface, for creating predefined or custom speech commands. Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator), Digital Syphon (Sonic Massager appliance) and SVOX.
People with Disabilities
People with disabilities can benefit from speech recognition programs. Speech recognition is especially useful for people who have difficulty using their hands, ranging from mild repetitive stress injuries to involved disabilities that preclude using conventional computer input devices. In fact, people who used the keyboard a lot and developed repetitive strain injury (RSI) became an urgent early market for speech recognition. Speech recognition is used in deaf telephony, such as voicemail to text, relay services, and captioned telephone. Individuals with learning disabilities who have problems with thought-to-paper communication (essentially, they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can benefit from the software.
Further Applications
Automatic translation;
Automotive speech recognition;
Telematics (e.g. vehicle navigation systems);
Court reporting (real-time voice writing);
Hands-free computing: voice command recognition computer user interface;
Home automation;
Interactive voice response;
Mobile telephony, including mobile email;
Multimodal interaction;
Pronunciation evaluation in computer-aided language learning applications;
Robotics;
Video games, with Tom Clancy's EndWar and Lifeline as working examples;
Transcription (digital speech-to-text);
Speech-to-text (transcription of speech into mobile text messages);
Air traffic control speech recognition.
CHAPTER 5
Performance of Voice Recognition Systems
The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).
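Word error rate is conventionally computed as (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the minimum-edit alignment of the recognized words against a reference transcript of N words; the real time factor is processing time divided by audio duration. A minimal sketch of both (the function names are hypothetical):

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j]: fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,                       # deletion
                          d[i][j - 1] + 1,                       # insertion
                          d[i - 1][j - 1] + (0 if same else 1))  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time / audio duration; below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds


# One substitution ("the" -> "a") and one insertion ("now") against 4 reference words.
print(word_error_rate("turn the light on", "turn a light on now"))  # 0.5
print(real_time_factor(3.0, 6.0))  # 0.5
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many spurious words.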
In 1982, Kurzweil Applied Intelligence and Dragon Systems released speech recognition products. By 1985, Kurzweil's software had a vocabulary of 1,000 words, if uttered one word at a time. Two years later, in 1987, its lexicon reached 20,000 words, entering the realm of human vocabularies, which range from 10,000 to 150,000 words. But recognition accuracy was only 10% in 1993. Two years later, the error rate crossed below 50%. Dragon Systems released "NaturallySpeaking" in 1997, which recognized normal human speech. Progress mainly came from improved computer performance and larger source text databases. The Brown Corpus was the first major database available, containing about one million words. In 2001, recognition accuracy reached its current plateau of 80%, no longer growing with data or computing power. In 2006, Google published a trillion-word corpus, while Carnegie Mellon University researchers found no significant increase in recognition accuracy.
Dictation in Voice Recognition Systems
Dictation machines can achieve good performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation". Commercial speaker-dependent dictation systems usually require only a short training period (sometimes also called 'enrollment') and may successfully capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that users:
have speech characteristics which match the training data,
can achieve proper speaker adaptation, and
work in a clean noise environment (e.g. a quiet office or laboratory space).
This explains why some users, such as those whose speech is heavily accented, experience much lower recognition rates.
Limited-vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.
Algorithms Used in Voice Recognition Systems
Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling has many other applications, such as smart keyboards and document classification.
Hidden Markov Models
Modern general-purpose speech recognition systems are based on Hidden Markov Models. These are statistical models which output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. On a short time scale (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can be thought of as a Markov model for many stochastic purposes.
Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech, taking the logarithm of the magnitude spectrum, decorrelating it using a cosine transform, and then keeping the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.
Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes
with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).
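Several of the feature computations described in the last two paragraphs can be sketched in plain Python: a plain cepstrum (Fourier transform, logarithm of the magnitude, cosine transform, keep the first coefficients), cepstral mean normalization as one common form of cepstral normalization, regression-based delta coefficients (delta-delta coefficients are simply the deltas of the deltas), and the log-likelihood of a feature vector under a mixture of diagonal-covariance Gaussians, as an HMM state would compute it. This is a simplified, assumed illustration: production front ends usually insert a mel-scale filterbank before the logarithm (giving MFCCs), the 13-coefficient default and delta window of 2 are common but arbitrary choices, and all function names are hypothetical.

```python
import cmath
import math


def cepstral_coefficients(frame, num_coeffs=13):
    """Plain cepstrum of one frame: DFT -> log magnitude -> DCT-II -> first num_coeffs values."""
    n = len(frame)
    half = n // 2 + 1  # non-negative frequency bins of a real-valued frame
    log_spectrum = []
    for k in range(half):
        bin_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_spectrum.append(math.log(abs(bin_k) + 1e-10))  # small floor avoids log(0)
    # The DCT-II decorrelates the log spectrum; low-order terms describe its overall shape.
    return [sum(v * math.cos(math.pi * m * (k + 0.5) / half) for k, v in enumerate(log_spectrum))
            for m in range(num_coeffs)]


def cepstral_mean_normalize(frames):
    """Subtract each coefficient's mean over the utterance (removes constant offsets)."""
    dim = len(frames[0])
    means = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    return [[f[i] - means[i] for i in range(dim)] for f in frames]


def deltas(frames, window=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2), edge frames repeated."""
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, window + 1))
    return [[sum(n * (frames[min(t + n, last)][i] - frames[max(t - n, 0)][i])
                 for n in range(1, window + 1)) / denom
             for i in range(len(frames[0]))]
            for t in range(len(frames))]


def diag_gmm_log_likelihood(x, weights, means, variances):
    """log p(x) for a mixture of Gaussians with diagonal covariance matrices."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        score = math.log(w)
        for xi, mi, vi in zip(x, mu, var):  # diagonal covariance: one term per dimension
            score -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        scores.append(score)
    top = max(scores)  # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(s - top) for s in scores))


frame = [math.sin(0.3 * t) for t in range(64)]  # synthetic stand-in for one speech frame
print(len(cepstral_coefficients(frame)))  # 13
```

Because scaling a frame multiplies every spectral magnitude by the same factor, it only shifts the first cepstral coefficient; effects that stay constant over an utterance become constant offsets in the cepstrum, which is what mean normalization removes.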
Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the finite state transducer, or FST, approach).
Dynamic Time Warping (DTW)-Based Speech Recognition
Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW.
A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
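The classic DTW recurrence fills a cost table in which each cell adds the local distance to the cheapest of its three predecessors, so one sequence may "stretch" to match the other. A minimal sketch on one-dimensional sequences (real recognizers compare sequences of feature vectors, and often add a band constraint that limits how far the warping path may stray from the diagonal):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cost of the cheapest monotonic alignment ("warping path") between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best alignment cost of a[:i] with b[:j]; row and column 0 are boundary cells
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances while b waits
                                     cost[i][j - 1],      # b advances while a waits
                                     cost[i - 1][j - 1])  # both advance together
    return cost[n][m]


template = [1, 2, 3, 4, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 2, 2, 1, 1]  # same shape, spoken half as fast
print(dtw_distance(template, slow))  # 0.0
```

The slowed-down version still matches perfectly (distance 0), which is exactly the variation in speaking speed that DTW was used to absorb.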
CHAPTER 6
Further Information
Popular speech recognition conferences held each year or two include SpeechTEK and SpeechTEK Europe, ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU.
… addition to English, and the GALE project, which focused solely on Mandarin and Arabic and required translation simultaneously with speech recognition.
Commercial research and other academic research also continue to focus on increasingly difficult problems. One key area is to improve the robustness of speech recognition performance, not just robustness against noise but robustness against any condition that causes a major degradation in performance. Another key area of research is focused on an opportunity rather than a problem. This research attempts to take advantage of the fact that in many applications there is a large quantity of speech data available, up to millions of hours. It is too expensive to have humans transcribe such large quantities of speech, so the research focus is on developing new methods of machine learning that can effectively utilize large quantities of unlabeled data. Another area of research is gaining a better understanding of human capabilities and using this understanding to improve machine recognition performance.
Voice Recognition Systems: Weaknesses and Flaws
No speech recognition system is 100 percent perfect; several factors can reduce accuracy. Some of these factors are issues that continue to improve as the technology improves. Others can be lessened, if not completely corrected, by the user.
Low Signal-to-Noise Ratio
The program needs to "hear" the words spoken distinctly, and any extra noise introduced into the sound will interfere with this. The noise can come from a number of sources, including loud background noise in an office environment. Users should work in a quiet room with a quality microphone positioned as close to their mouths as possible. Low-quality sound cards, which provide the input for the microphone to send the signal to the computer, often do not have enough shielding from the electrical signals produced by other computer components. They can introduce hum or hiss into the signal.
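Signal-to-noise ratio is usually expressed in decibels as 10 * log10(P_signal / P_noise), where P is mean power. A minimal sketch, assuming the noise power is estimated from a stretch of recording in which nobody is speaking (the sample values are hypothetical):

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(mean signal power / mean noise power)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


speech = [1.0, -1.0] * 50      # hypothetical speech samples
background = [0.1, -0.1] * 50  # hypothetical room noise recorded during silence
print(round(snr_db(speech, background), 6))  # 20.0
```

Every 10 dB increase corresponds to a tenfold increase in the power ratio, so moving the microphone closer or removing a noise source can raise the SNR substantially.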
Overlapping Speech
Current systems have difficulty separating simultaneous speech from multiple users. "If you try to employ recognition technology in conversations or meetings where people frequently interrupt each other or talk over one another, you're likely to get extremely poor results," says John Garofolo.
Intensive Use of Computer Power
Running the statistical models needed for speech recognition requires the computer's processor to do a lot of heavy work. One reason for this is the need to remember each stage of the word-recognition search in case the system needs to backtrack to come up with the right word. The fastest personal computers in use today can still have difficulties with complicated commands or phrases, slowing down the response time significantly. The vocabularies needed by the programs also take up a large amount of hard drive space. Fortunately, disk storage and processor speed are areas of rapid advancement: the computers in use 10 years from now will benefit from an exponential increase in both factors.
Homonyms
Homonyms (more precisely, homophones) are words that are spelled differently and have different meanings but sound the same. "There" and "their," "air" and "heir," "be" and "bee" are all examples. There is no way for a speech recognition program to tell the difference between these words based on sound alone. However, extensive training of systems and statistical models that take word context into account has greatly improved their performance.
CHAPTER 7
References
http://en.wikipedia.org/wiki/Speech_recognition
http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm
https://www.microsoft.com/enable/products/windowsvista/speech.aspx
www.nuance.com/naturallyspeaking
www.faqs.org/docs/Linux...Speech-Recognition-HOWTO.html