36605087 voice recognition system report

Upload: saringagan

Post on 02-Jun-2018


  • 8/10/2019 36605087 Voice Recognition System Report


    VOICE RECOGNITION BASED

    HOME AUTOMATION SYSTEM


    CHAPTER 1

    Introduction

    A voice recognition system is a system that can recognize voices, either to identify the
    words being spoken or to verify a speaker's identity for security purposes.

    Voice recognition is the process of automatically recognizing who is speaking, or what is
    being said, on the basis of individual information included in the speech waves. This
    technique makes it possible to use a speaker's voice to verify their identity and control
    access to services such as voice dialing, banking by telephone, telephone shopping, database
    access services, information services, voice mail, security control for confidential
    information areas, and remote access to computers.

    Some voice recognition systems are designed to convert spoken words into text.

    Voice recognition software can also be used as an alternative to typing on a keyboard. Put
    simply, you talk to the computer and your words appear on the screen. The software has been
    developed to provide a fast method of writing on a computer and can help people with a
    variety of disabilities. It is useful for people with physical disabilities who often find
    typing difficult, painful or impossible. Voice recognition software can also help those with
    spelling difficulties, including users with dyslexia, because recognized words are always
    correctly spelled.

    We can see voice recognition systems at work in daily life. For example, when we call most
    large companies today, a person doesn't usually answer the phone. Instead, an automated
    voice recording answers and instructs you to press buttons to move through option menus.
    Many companies have moved beyond requiring you to press buttons, though. Often you can just
    speak certain words (again, as instructed by a recording) to get what you need. The system
    that makes this possible is a type of voice recognition program: an automated phone system.

    You can also use voice recognition software in homes and businesses. A range of software
    products allows users to dictate to their computer and have their words converted to text
    in a word processing or e-mail document. You can access function commands, such as opening
    files and accessing menus, with voice instructions. Some programs are for specific business
    settings, such as medical or legal transcription.

    People with disabilities that prevent them from typing have also adopted voice-recognition
    systems. If a user has lost the use of his hands, or for visually impaired users when it is not


    Classification of Voice Recognition Systems

    An Isolated Voice Recognition System requires a brief pause between each spoken word;
    otherwise it cannot detect the words completely and will malfunction.

    A Continuous Voice Recognition System does not require a pause between spoken words, so
    it can detect continuous speech. It can be regarded as an advanced version of the
    Isolated Voice Recognition System.

    A Speaker-Dependent Voice Recognition System can only recognize speech from one
    particular speaker's voice. Systems of this type can be used for security and
    identification purposes.

    A Speaker-Independent Voice Recognition System can recognize speech from anybody.
    Systems of this type are embedded in voice-activated routing at customer call centres,
    voice dialing on mobile phones and many other everyday applications. It can be regarded
    as an advanced version of the Speaker-Dependent Voice Recognition System.

    The Development Workflow of a Voice Recognition System

    There are two major stages within voice recognition: a training stage and a testing stage.
    Training involves "teaching" the system by building its dictionary, an acoustic model for
    each word that the system needs to recognize. In the testing stage we use the acoustic
    models of these words to recognize spoken words using a classification algorithm.
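
    The two stages can be sketched in a few lines of Python. This is only an illustration
    under strong assumptions: each utterance is assumed to be already reduced to a short
    feature vector, the "acoustic model" is simply the mean of the training vectors for a
    word, and the classifier picks the nearest model. The function names and the sample
    numbers are hypothetical and do not come from the report.

```python
import math

def train(examples):
    """Training stage: build the dictionary, word -> mean feature vector (a toy acoustic model)."""
    models = {}
    for word, vectors in examples.items():
        dim = len(vectors[0])
        models[word] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return models

def classify(models, features):
    """Testing stage: return the word whose model lies closest to the given features."""
    return min(models, key=lambda word: math.dist(models[word], features))

# Hypothetical 2-D feature vectors for repeated utterances of two command words.
examples = {
    "on": [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3]],
    "off": [[0.1, 1.0], [0.2, 0.9], [0.0, 1.1]],
}
models = train(examples)
print(classify(models, [0.95, 0.15]))
```

    A real system would replace the mean vector with a statistical model per word and the
    nearest-neighbour rule with a probabilistic classifier, but the train-then-test shape
    stays the same.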

    The development workflow consists of three steps:

    Speech Acquisition.

    Speech Analysis.

    User Interface Development.

    Speech Acquisition

    For training, speech is acquired from the microphone and brought into the development
    environment for offline analysis. For testing, the speech is continuously streamed into the
    environment for online processing.

    During the training stage, it is necessary to record repeated utterances of each word in
    the dictionary. For example, if the word "Apple" is in the dictionary, we have to record
    "Apple" many times, with a pause between each utterance. This is necessary for building a
    robust voice recognition system; if we fail to do so, the system may produce undesirable
    responses.

    We can record the speech using a microphone and a standard PC sound card. This approach
    works well for training data. In the testing stage, we need to continuously acquire and
    buffer speech samples and, at the same time, process the incoming speech frame by frame, or
    in continuous groups of samples.
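
    Frame-by-frame processing of a stream can be illustrated with a small buffer that
    collects incoming chunks and hands back complete frames. This is a minimal sketch: the
    class name FrameBuffer, the frame length and the hop size are assumptions made for the
    example, and the hop is assumed to be no larger than the frame length.

```python
from collections import deque
from itertools import islice

class FrameBuffer:
    """Collects streamed samples and emits fixed-length, possibly overlapping frames."""

    def __init__(self, frame_len, hop):
        self.frame_len = frame_len  # samples per frame
        self.hop = hop              # samples to advance between frames (hop <= frame_len)
        self.buffer = deque()

    def push(self, chunk):
        """Add newly acquired samples and return every complete frame now available."""
        self.buffer.extend(chunk)
        frames = []
        while len(self.buffer) >= self.frame_len:
            frames.append(list(islice(self.buffer, self.frame_len)))
            for _ in range(self.hop):
                self.buffer.popleft()
        return frames

fb = FrameBuffer(frame_len=4, hop=2)
print(fb.push([1, 2, 3]))   # not enough samples yet
print(fb.push([4, 5, 6]))   # two overlapping frames become available
```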

    Speech Analysis

    Once speech has been acquired into the development environment, it has to be processed and
    analyzed. Speech analysis is one of the most complicated and important steps in the
    development of a voice recognition system. In this stage a word detection algorithm is
    built that separates each word from the ambient noise. Then an acoustic model is derived
    that gives a robust representation of each word in the training stage. Finally, an
    appropriate classification algorithm is selected for the testing stage.
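
    The report does not say which word detection algorithm is used. One common and simple
    choice, shown here purely as an assumed example, is to compare the short-time energy of
    each frame with a threshold and treat runs of loud frames as words. The function name
    and the threshold value are hypothetical.

```python
def detect_words(samples, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames whose mean energy exceeds threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len              # a word begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))  # the word has ended
            start = None
    if start is not None:                      # the word runs to the end of the recording
        segments.append((start, n_frames * frame_len))
    return segments

# Silence, a loud word, silence, a quieter word.
signal = [0.0] * 4 + [1.0] * 8 + [0.0] * 4 + [0.5] * 4
print(detect_words(signal, frame_len=4, threshold=0.1))
```

    In practice the threshold would be estimated from the ambient noise level rather than
    fixed by hand.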

    User Interface Development

    These systems have a Graphical User Interface for the convenience of the users. Through
    this interface, users first train the system and can then use it for testing and for their
    work.

    How Speech-to-Data Conversion Takes Place

    To convert speech to on-screen text or a computer command, a computer has to go through
    several complex steps. When you speak, you create vibrations in the air. The
    analog-to-digital converter (ADC) translates this analog wave into digital data that the
    computer can understand. To do this, it samples, or digitizes, the sound by taking precise
    measurements of the wave at frequent intervals. The system filters the digitized sound to
    remove unwanted noise, and sometimes to separate it into different bands of frequency
    (frequency is the number of wave cycles per second, heard by humans as differences in
    pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may also
    have to be temporally aligned. People don't always speak at the same speed, so the sound
    must be adjusted to match the speed of the template sound samples already stored in the
    system's memory.
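
    The sampling and normalization steps can be made concrete with a short sketch. It
    simulates the ADC by measuring a sine wave at regular intervals, maps the measurements
    to 16-bit integers, and scales a quiet recording to a constant peak level. The function
    names, the 8000 Hz rate and the 440 Hz tone are assumptions chosen for illustration.

```python
import math

def sample_wave(freq_hz, rate_hz, n):
    """Take n measurements of a sine wave at regular intervals, as an ADC does in time."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]

def quantize(samples, bits=16):
    """Map samples in [-1, 1] to signed integers, as an ADC does in amplitude."""
    top = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * top) for s in samples]

def normalize(samples, peak=1.0):
    """Scale the signal so that its loudest sample has magnitude peak."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)
    return [s * peak / loudest for s in samples]

quiet = [0.25 * s for s in sample_wave(440, 8000, 80)]  # a softly spoken tone
levelled = normalize(quiet)
digital = quantize(levelled)
```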


    Next, the signal is divided into small segments as short as a few hundredths of a second, or
    even thousandths in the case of plosive consonant sounds -- consonant stops produced by
    obstructing airflow in the vocal tract -- like "p" or "t." The program then matches these
    segments to known phonemes in the appropriate language. A phoneme is the smallest element
    of a language -- a representation of the sounds we make and put together to form meaningful
    expressions. There are roughly


    The next step seems simple, but it is actually the most difficult to accomplish and is the
    focus of most speech recognition research. The program examines phonemes in the context of
    the other phonemes around them. It runs the contextual phoneme plot through a complex
    statistical model and compares them to a large library of known words, phrases and
    sentences. The program then determines what the user was probably saying and either outputs
    it as text or issues a computer command.

    Voice Recognition and Statistical Modeling

    Early speech recognition systems tried to apply a set of grammatical and syntactical rules
    to speech. If the words spoken fit into a certain set of rules, the program could determine
    what the words were. However, human language has numerous exceptions to its own rules, even
    when it's spoken consistently. Accents, dialects and mannerisms can vastly change the way
    certain words or phrases are spoken. Imagine someone from Boston saying the word "barn." He
    wouldn't pronounce the "r" at all, and the word comes out rhyming with "John." Or consider
    the sentence, "I'm going to see the ocean." Most people don't enunciate their words very
    carefully. The result might come out as "I'm goin' da see tha ocean." They run several of
    the words together with no noticeable break, such as "I'm goin'" and "the ocean."
    Rules-based systems were unsuccessful because they couldn't handle these variations. This
    also explains why earlier systems could not handle continuous speech -- you had to speak
    each word separately, with a brief pause in between them.

    Today's speech recognition systems use powerful and complicated statistical modeling
    systems. These systems use probability and mathematical functions to determine the most
    likely outcome. According to John Garofolo, Speech Group Manager at the Information
    Technology Laboratory of the National Institute of Standards and Technology, the two models
    that dominate the field today are the Hidden Markov Model and neural networks. These
    methods involve complex mathematical functions, but essentially, they take the information
    known to the system to figure out the information hidden from it.

    The Hidden Markov Model is the most common, so we'll take a closer look at that process. In
    this model, each phoneme is like a link in a chain, and the completed chain is a word.
    However, the chain branches off in different directions as the program attempts to match
    the digital sound with the phoneme that's most likely to come next. During this process,
    the program assigns a probability score to each phoneme, based on its built-in dictionary
    and user training.


    This process is even more complicated for phrases and sentences -- the system has to figure
    out where each word stops and starts. The classic example is the phrase "recognize speech,"
    which sounds a lot like "wreck a nice beach" when you say it very quickly. The program has
    to analyze the phonemes using the phrase that came before it in order to get it right.
    Here's a breakdown of the two phrases:

    r eh k ao g n ay z s p iy ch
    "recognize speech"

    r eh k ay n ay s b iy ch
    "wreck a nice beach"

    Why is this so complicated? If a program has a vocabulary of 60,000 words (common in
    today's programs), a sequence of three words could be any of 216 trillion possibilities.
    Obviously, even the most powerful computer can't search through all of them without some
    help.
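
    The size of that search space follows directly from the vocabulary size: with 60,000
    words and three positions, every position can hold any word, so the count is 60,000
    cubed. The variable names below are only for illustration.

```python
vocabulary = 60_000        # words the program knows
sequence_length = 3        # words in the sequence

# Each position can independently be any vocabulary word.
possibilities = vocabulary ** sequence_length
print(f"{possibilities:,}")  # prints 216,000,000,000,000, i.e. 216 trillion
```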

    That help comes in the form of program training. According to John Garofolo:

    "These statistical systems need lots of exemplary training data to reach their optimal
    performance -- sometimes on the order of thousands of hours of human-transcribed speech and
    hundreds of megabytes of text. These training data are used to create acoustic models of
    words, word lists, and [...] multi-word probability networks. There is some art into how
    one selects, compiles and prepares this training data for 'digestion' by the system and how
    the system models are 'tuned' to a particular application. These details can make the
    difference between a well-performing system and a poorly-performing system -- even when
    using the same basic algorithm."

    While the software developers who set up the system's initial vocabulary perform much of
    this training, the end user must also spend some time training it. In a business setting,
    the primary users of the program must spend some time (sometimes as little as 10 minutes)
    speaking into the system to train it on their particular speech patterns. They must also
    train the system to recognize terms and acronyms particular to the company. Special
    editions of speech recognition programs for medical or legal offices have terms commonly
    used in those fields already trained into them.


    CHAPTER 3

    Block Diagram

    MICROPHONE

    PC

    MICROCONTROLLER

    RELAY 1 RELAY 2 RELAY 3

    LOAD 1 LOAD 2 LOAD 3


    CHAPTER

    France on installing speech recognition systems on Mirage aircraft, and programs in the UK
    dealing with a variety of aircraft platforms. In these programs, speech recognizers have
    been operated successfully in fighter aircraft with applications including: setting radio
    frequencies, commanding an autopilot system, setting steer-point coordinates and weapons
    release parameters, and controlling flight displays. Generally, only very limited,
    constrained vocabularies have been used successfully, and a major effort has been devoted to
    integration of the speech recognizer with the avionics system.

    Some important conclusions from the work were as follows:

    1. Speech recognition has definite potential for reducing pilot workload, but this
    potential was not realized consistently.

    2. Achievement of very high recognition accuracy (95% or more) was the most critical
    factor for making the speech recognition system useful -- with lower recognition rates,
    pilots would not use the system.

    3. More natural vocabulary and grammar, and shorter training times would be useful, but
    only if very high recognition rates could be maintained.

    Laboratory research in robust speech recognition for military environments has produced
    promising results which, if extendable to the cockpit, should improve the utility of speech
    recognition in high-performance aircraft.

    Helicopters

    The problems of achieving high recognition accuracy under stress and noise pertain strongly
    to the helicopter environment as well as to the fighter environment. The acoustic noise
    problem is actually more severe in the helicopter environment, not only because of the high
    noise levels but also because the helicopter pilot generally does not wear a facemask,
    which would reduce acoustic noise in the microphone. Substantial test and evaluation
    programs have been carried out in the past decade in speech recognition systems
    applications in helicopters, notably by the U.S. Army Avionics Research and Development
    Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France
    has included speech recognition in the Puma helicopter. There has also been much useful
    work in Canada. Results have been encouraging, and voice applications have included:
    control of communication radios; setting of navigation systems; and control of an automated
    target handover system.

    As in fighter applications, the overriding issue for voice in helicopters is the impact on
    pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these
    represent only a feasibility demonstration in a test environment. Much remains to be done
    both in speech recognition and in overall speech technology, in order to consistently
    achieve performance improvements in operational settings.

    Battle Management

    Battle management command centers generally require rapid access to and control of large,
    rapidly changing information databases. Commanders and system operators need to query these
    databases as conveniently as possible, in an eyes-busy environment where much of the
    information is presented in a display format. Human-machine interaction by voice has the
    potential to be very useful in these environments. A number of efforts have been undertaken
    to interface commercially available isolated-word recognizers into battle management
    environments. In one feasibility study, speech recognition equipment was tested in
    conjunction with an integrated information display for naval battle management
    applications. Users were very optimistic about the potential of the system, although
    capabilities were limited.

    Speech understanding programs sponsored by the Defense Advanced Research Projects Agency
    (DARPA) in the U.S. have focused on this problem of natural speech interface. Speech
    recognition efforts have focused on a database of continuous speech recognition (CSR),
    large-vocabulary speech which is designed to be representative of the naval resource
    management task. Significant advances in the state of the art in CSR have been achieved,
    and current efforts are focused on integrating speech recognition and natural language
    processing to allow spoken language interaction with a naval resource management system.

    Training Air Traffic Controllers

    Training for air traffic controllers (ATC) represents an excellent application for speech
    recognition systems. Many ATC training systems currently require a person to act as a
    "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the
    dialog which the controller would have to conduct with pilots in a real ATC situation.
    Speech recognition and synthesis techniques offer the potential to eliminate the need for a
    person to act as pseudo-pilot, thus reducing training and support personnel. Air controller
    tasks are also characterized by highly structured speech as the primary output of the
    controller, hence reducing the difficulty of the speech recognition task.

    The U.S. Naval Training Equipment Center has sponsored a number of developments of
    prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls
    short of providing graceful interaction between the trainee and the system. However, the
    prototype training systems have demonstrated a significant potential for voice interaction
    in these systems, and in other training applications. The U.S. Navy has sponsored a
    large-scale effort in ATC training systems, where a commercial speech recognition unit was
    integrated with a complex training system including displays and scenario creation.
    Although the recognizer was constrained in vocabulary, one of the goals of the training
    programs was to teach the controllers to speak in a constrained language, using specific
    vocabulary specifically designed for the ATC task. Research in France has focused on the
    application of speech recognition in ATC training systems, directed at issues both in
    speech recognition and in application of task-domain grammar constraints.

    The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition
    from a number of different vendors, including UFA, Inc, and Adacel Systems Inc (ASI). This
    software uses speech recognition and synthetic speech to enable the trainee to control
    aircraft and ground vehicles in the simulation without the need for pseudo pilots.

    Another approach to ATC simulation with speech recognition has been created by Supremis.
    The Supremis system is not constrained by rigid grammars imposed by the underlying
    limitations of other recognition strategies.

    Telephony and Other Domains

    Automatic speech recognition (ASR) is now commonplace in telephony and is becoming more
    widespread in computer gaming and simulation. Despite the high level of integration with
    word processing in general personal computing, however, ASR in the field of document
    production has not seen the expected increases in use.

    The improvement of mobile processor speeds made feasible the speech-enabled Symbian and
    Windows Mobile smartphones. Speech is used mostly as a part of the user interface, for
    creating pre-defined or custom speech commands. Leading software vendors in this field are:
    Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice
    Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator),
    Digital Syphon (Sonic Massager appliance) and SVOX.

    People with Disabilities

    People with disabilities can benefit from speech recognition programs. Speech recognition
    is especially useful for people who have difficulty using their hands, ranging from mild
    repetitive stress injuries to involved disabilities that preclude using conventional
    computer input devices. In fact, people who used the keyboard a lot and developed RSI
    became an urgent early market for speech recognition. Speech recognition is used in deaf
    telephony, such as voicemail to text, relay services, and captioned telephone. Individuals
    with learning disabilities who have problems with thought-to-paper communication
    (essentially they think of an idea but it is processed incorrectly, causing it to end up
    differently on paper) can benefit from the software.

    Further Applications

    Automatic translation;

    Automotive speech recognition;

    Telematics (e.g. vehicle navigation systems);

    Court reporting (real-time voice writing);

    Hands-free computing: voice command recognition computer user interface;

    Home automation;

    Interactive voice response;

    Mobile telephony, including mobile email;

    Multimodal interaction;

    Pronunciation evaluation in computer-aided language learning applications;

    Robotics;

    Video games, with Tom Clancy's EndWar and Lifeline as working examples;

    Transcription (digital speech-to-text);

    Speech-to-text (transcription of speech into mobile text messages);

    Air traffic control speech recognition.

    CHAPTER 5

    Performance of Voice Recognition Systems

    The performance of speech recognition systems is usually specified in terms of accuracy and
    speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with
    the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and
    Command Success Rate (CSR).
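
    Word error rate is conventionally computed as the number of word substitutions,
    deletions and insertions needed to turn the recognizer's output into the reference
    transcript, divided by the number of words in the reference. The sketch below uses the
    standard edit-distance recurrence; the function name and the example sentences are
    assumptions for illustration, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)

print(word_error_rate("turn on the light", "turn off the light"))
```

    Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many
    more words than were actually spoken.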

    In 1982, Kurzweil Applied Intelligence and Dragon Systems released speech recognition
    products. By 1985, Kurzweil's software had a vocabulary of 1,000 words, if uttered one word
    at a time. Two years later, in 1987, its lexicon reached 20,000 words, entering the realm
    of human vocabularies, which range from 10,000 to 150,000 words. But recognition accuracy
    was only 10% in 1993. Two years later, the error rate crossed below 50%. Dragon Systems
    released "Naturally Speaking" in 1997, which recognized normal human speech. Progress
    mainly came from improved computer performance and larger source text databases. The Brown
    Corpus was the first major database available, containing several million words. In 2001,
    recognition accuracy reached its current plateau of 80%, no longer growing with data or
    computing power. In 2006, Google published a trillion-word corpus, while Carnegie Mellon
    University researchers found no significant increase in recognition accuracy.

    Dictation in Voice Recognition Systems

    Dictation machines can achieve good performance in controlled conditions. There is some
    confusion, however, over the interchangeability of the terms "speech recognition" and
    "dictation". Commercial speaker-dependent dictation systems usually require only a short
    training period (sometimes also called 'enrollment') and may successfully capture
    continuous speech with a large vocabulary at normal pace with very high accuracy. Most
    commercial companies claim that recognition software can achieve between 98% and 99%
    accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that
    users:

    have speech characteristics which match the training data,

    can achieve proper speaker adaptation, and

    work in a clean noise environment (e.g. quiet office or laboratory space).

    This explains why some users, such as those whose speech is heavily accented, experience
    much lower recognition rates.

    Limited vocabulary systems, requiring no training, can recognize a small number of words
    (for instance, the ten digits) as spoken by most speakers. Such systems are popular for
    routing incoming phone calls to their destinations in large organizations.

    Algorithms Used in Voice Recognition Systems

    Both acoustic modeling and language modeling are important parts of modern
    statistically-based speech recognition algorithms. Hidden Markov models (HMMs) are widely
    used in many systems. Language modeling has many other applications such as smart keyboards
    and document classification.

    Hidden Markov models

    Modern general-purpose speech recognition systems are based on Hidden Markov Models. These
    are statistical models which output a sequence of symbols or quantities. HMMs are used in
    speech recognition because a speech signal can be viewed as a piecewise stationary signal
    or a short-time stationary signal. Over a short time (e.g., 10 milliseconds), speech can be
    approximated as a stationary process. Speech can be thought of as a Markov model for many
    stochastic purposes.

    Another reason why HMMs are popular is that they can be trained automatically and are
    simple and computationally feasible to use. In speech recognition, the hidden Markov model
    would output a sequence of n-dimensional real-valued vectors (with n being a small integer,
    such as 10), outputting one of these every 10 milliseconds. The vectors would consist of
    cepstral coefficients, which are obtained by taking a Fourier transform of a short time
    window of speech and decorrelating the spectrum using a cosine transform, then taking the
    first (most significant) coefficients. The hidden Markov model will tend to have in each
    state a statistical distribution that is a mixture of diagonal covariance Gaussians, which
    will give a likelihood for each observed vector. Each word, or (for more general speech
    recognition systems) each phoneme, will have a different output distribution; a hidden
    Markov model for a sequence of words or phonemes is made by concatenating the individual
    trained hidden Markov models for the separate words and phonemes.

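    The cepstral-coefficient recipe described above (Fourier transform of a short window,
    then a cosine transform, then keeping the first coefficients) can be sketched in plain
    Python. This is a simplified illustration, not the front end of any particular
    recognizer: it uses a naive DFT, takes the logarithm of the magnitude spectrum before
    the cosine transform, and omits the windowing and mel filter bank that practical
    systems usually add. The function names and the 64-sample test frame are assumptions.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the first N/2 + 1 DFT bins of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def cepstral_coefficients(frame, n_coeffs=10):
    """Log-magnitude spectrum followed by a DCT-II, keeping the first n_coeffs values."""
    # The small constant avoids log(0) for empty frequency bins.
    log_spec = [math.log(m + 1e-10) for m in dft_magnitudes(frame)]
    m = len(log_spec)
    return [
        sum(log_spec[i] * math.cos(math.pi * k * (i + 0.5) / m) for i in range(m))
        for k in range(n_coeffs)
    ]

# One short frame of a synthetic tone stands in for a window of speech.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
features = cepstral_coefficients(frame)   # a 10-dimensional feature vector
```

    One such vector would be produced for every frame, giving the sequence of
    n-dimensional observations that the hidden Markov model is trained on.
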
    Described above are the core elements of the most common, HMM-based approach to speech
    recognition. Modern speech recognition systems use various combinations of a number of
    standard techniques in order to improve results over the basic approach described above. A
    typical large-vocabulary system would need context dependency for the phonemes (so phonemes
    with different left and right context have different realizations as HMM states); it would
    use cepstral normalization to normalize for different speaker and recording conditions; for
    further speaker normalization it might use vocal tract length normalization (VTLN) for
    male-female normalization and maximum likelihood linear regression (MLLR) for more general
    speaker adaptation. The features would have so-called delta and delta-delta coefficients to
    capture speech dynamics and in addition might use heteroscedastic linear discriminant
    analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and
    an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or
    a global semitied covariance transform (also known as maximum likelihood linear transform,
    or MLLT). Many systems use so-called discriminative training techniques which dispense with
    a purely statistical approach to HMM parameter estimation and instead optimize some
    classification-related measure of the training data. Examples are maximum mutual
    information (MMI), minimum classification error (MCE) and minimum phone error (MPE).

    Decoding of the speech (the term for what happens when the system is presented with a new
    utterance and must compute the most likely source sentence) would probably use the Viterbi
    algorithm to find the best path, and here there is a choice between dynamically creating a
    combination hidden Markov model which includes both the acoustic and language model
    information, or combining it statically beforehand (the finite state transducer, or FST,
    approach).
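
    A minimal sketch of the Viterbi algorithm on a toy discrete HMM is given below. The two
    states stand in for phonemes and the observation symbols for quantized acoustic frames;
    all labels and probabilities are invented for the example, and a real decoder would
    work with log probabilities and continuous feature vectors rather than this small table.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best path so far that ends in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the probability of arriving in s.
            prev = max(states, key=lambda p: best[p][0] * trans_p[p][s])
            prob = best[prev][0] * trans_p[prev][s] * emit_p[s][obs]
            new_best[s] = (prob, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

# Hypothetical model: two phoneme-like states and three quantized acoustic symbols.
states = ["n", "ay"]
start_p = {"n": 0.6, "ay": 0.4}
trans_p = {"n": {"n": 0.7, "ay": 0.3}, "ay": {"n": 0.4, "ay": 0.6}}
emit_p = {
    "n": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "ay": {"low": 0.1, "mid": 0.3, "high": 0.6},
}
probability, path = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path, probability)
```

    The algorithm keeps only the best path into each state at each step, which is what
    makes the search tractable compared with enumerating every possible state sequence.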

    Dynamic time warping (DTW)-based speech recognition

    Dynamic time warping is an approach that was historically used for speech recognition but
    has now largely been displaced by the more successful HMM-based approach. Dynamic time
    warping is an algorithm for measuring similarity between two sequences which may vary in
    time or speed. For instance, similarities in walking patterns would be detected, even if in
    one video the person was walking slowly and in another they were walking more quickly, or
    even if there were accelerations and decelerations during the course of one observation.
    DTW has been applied to video, audio, and graphics; indeed, any data which can be turned
    into a linear representation can be analyzed with DTW.

    A well-known application has been automatic speech recognition, to cope with different
    speaking speeds. In general, it is a method that allows a computer to find an optimal match
    between two given sequences (e.g. time series) with certain restrictions, i.e. the
    sequences are "warped" non-linearly to match each other. This sequence alignment method is
    often used in the context of hidden Markov models.
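
    The basic DTW recurrence is short enough to show in full. The sketch below compares two
    one-dimensional sequences using absolute difference as the local cost and no path
    restrictions; the function name and the sample sequences are assumptions for
    illustration, and a speech system would compare sequences of feature vectors instead of
    single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b.
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current item
                cost[i][j - 1],      # b advances, a repeats its current item
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[len(a)][len(b)]

template = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]   # the same shape spoken at half speed
flat = [3, 3, 3, 3, 3]                  # a different shape
print(dtw_distance(template, slow), dtw_distance(template, flat))
```

    The slowed-down copy aligns with the template at zero cost, while the differently
    shaped sequence does not, which is the property that made DTW useful for coping with
    different speaking speeds.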

    CHAPTER 6

    Further Information

    Popular speech recognition conferences held each year or two include SpeechTEK and
    SpeechTEK Europe, ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU.


    addition to English, and the GALE project, which focused solely on Mandarin and Arabic and
    required translation simultaneously with speech recognition.

    Commercial research and other academic research also continue to focus on increasingly
    difficult problems. One key area is to improve robustness of speech recognition
    performance, not just robustness against noise but robustness against any condition that
    causes a major degradation in performance. Another key area of research is focused on an
    opportunity rather than a problem. This research attempts to take advantage of the fact
    that in many applications there is a large quantity of speech data available, up to
    millions of hours. It is too expensive to have humans transcribe such large quantities of
    speech, so the research focus is on developing new methods of machine learning that can
    effectively utilize large quantities of unlabeled data. Another area of research is better
    understanding of human capabilities and the use of this understanding to improve machine
    recognition performance.

    Voice Recognition Systems: Weaknesses and Flaws

    No speech recognition system is 100 percent perfect; several factors can reduce accuracy.
    Some of these factors are issues that continue to improve as the technology improves.
    Others can be lessened -- if not completely corrected -- by the user.

    Low signal-to-noise ratio

    The program needs to "hear" the words spoken distinctly, and any extra noise introduced
    into the sound will interfere with this. The noise can come from a number of sources,
    including loud background noise in an office environment. Users should work in a quiet room
    with a quality microphone positioned as close to their mouths as possible. Low-quality
    sound cards, which provide the input for the microphone to send the signal to the computer,
    often do not have enough shielding from the electrical signals produced by other computer
    components. They can introduce hum or hiss into the signal.
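
    Signal-to-noise ratio is usually expressed in decibels as ten times the base-10
    logarithm of the ratio of signal power to noise power. The sketch below assumes that a
    stretch of speech and a stretch of background noise have been recorded separately; the
    function name and sample values are invented for illustration.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from the mean power of each sequence."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

speech = [1.0, -1.0, 1.0, -1.0]   # hypothetical speech samples
hiss = [0.1, -0.1, 0.1, -0.1]     # hypothetical background noise at a tenth of the amplitude
print(round(snr_db(speech, hiss), 1))
```

    A tenfold difference in amplitude corresponds to a hundredfold difference in power, or
    about 20 dB; moving the microphone closer to the mouth raises the signal term and so
    raises the ratio.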

    Overlapping speech

    Current systems have difficulty separating simultaneous speech from multiple users. "If you
    try to employ recognition technology in conversations or meetings where people frequently
    interrupt each other or talk over one another, you're likely to get extremely poor
    results," says John Garofolo.

    Intensive use of computer power

    Running the statistical models needed for speech recognition requires the computer's
    processor to do a lot of heavy work. One reason for this is the need to remember each stage
    of the word-recognition search in case the system needs to backtrack to come up with the
    right word. The fastest personal computers in use today can still have difficulties with
    complicated commands or phrases, slowing down the response time significantly. The
    vocabularies needed by the programs also take up a large amount of hard drive space.
    Fortunately, disk storage and processor speed are areas of rapid advancement -- the
    computers in use 10 years from now will benefit from an exponential increase in both
    factors.

    Homonyms

    Homonyms are two words that are spelled differently and have different meanings but sound
    the same. "There" and "their," "air" and "heir," "be" and "bee" are all examples. There is
    no way for a speech recognition program to tell the difference between these words based on
    sound alone. However, extensive training of systems and statistical models that take into
    account word context has greatly improved their performance.
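
    How word context can separate homonyms is easiest to see with a tiny bigram model: count
    which word pairs occur in training text, then choose the candidate spelling that most
    often follows the preceding word. The training sentences and function names below are
    made up for the example, and a real language model would be trained on far more text and
    would smooth its counts.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count how often each pair of adjacent words occurs in the training text."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def pick_homonym(counts, previous_word, candidates):
    """Choose the candidate spelling seen most often after previous_word."""
    return max(candidates, key=lambda word: counts[(previous_word, word)])

# Hypothetical training text.
corpus = [
    "they lost their keys",
    "put it over there",
    "their house is over there",
    "the bee stung me",
    "it will be fine",
]
counts = train_bigrams(corpus)
print(pick_homonym(counts, "over", ["there", "their"]))
print(pick_homonym(counts, "lost", ["there", "their"]))
```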

    CHAPTER 7

    References:

    http://en.wikipedia.org/wiki/Speech_recognition

    http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm

    https://www.microsoft.com/enable/products/windowsvista/speech.aspx

    www.nuance.com/naturallyspeaking

    www.faqs.org/docs/Linux...Speech-Recognition-HOWTO.html