seminar speech

Upload: dharmenderrkgit

Post on 08-Apr-2018


  • 8/7/2019 Seminar Speech


    -: SPEECH RECOGNITION :-

    Introduction

    One doesn't have to be a scientist to know that the computer of the future will talk, listen and understand. Today's Apple Macintosh is one such computer. Apple's Speech Recognition and Speech Synthesis technologies now give speech-savvy applications the power to carry out your voice commands and even speak back to you in plain English.

    Apple Speech Recognition lets the system (the Macintosh) understand what you say, giving you a new way to interact with and control your computer by voice. You don't even have to train it to understand your voice, because it already understands you from your very first word. You can speak naturally, without pausing or stopping. Apple's leadership in speech recognition technology makes this possible by adding a whole new dimension to the user interface: speech. Combined with VoiceOver, speech synthesis will help turn the graphical user interface into a vocal user interface.

    Speech recognition (in many contexts also known as automatic speech recognition, computer speech recognition or, erroneously, voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program.

    Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), and preparation of structured documents (e.g., a radiology report).

    Voice Verification or speaker recognition is a related process that attempts to

    identify the person speaking, as opposed to what is being said.


    Speech Technology Development at IBM:

    The overall view, with emphasis on Via-Scribe and Accessibility

    Speech technologies development and deployments (Technology: Applications)

    Large Vocabulary Speech Recognition: broadcast news transcription, content spotting and indexing, Via-Scribe, MALACH, DARPA projects

    Telephony Speech Recognition (+ natural language understanding): mutual funds transactions, contact center call routing, contact center analytics

    Embedded Speech Recognition (+ multimodal input): embedded speech in telematics (e.g., vehicles), devices (e.g., cell phones, PDAs) and other consumer appliances (e.g., set-top boxes, DVD players)

    Audio Visual Speech Recognition: improved ASR on trading floor

    Conversational Biometrics: speaker identification, speaker verification

    Text to Speech Synthesis: Home Page Reader, ViaVoice

    Machine Translation: MASTOR, DARPA projects, WebSphere

    Speech Analytics: Automated Quality Assurance Application

    Monitor 100% of calls

    Download recorded calls daily from across North America

    Answer questions and assign default ratings

    Provide a ranked list to human monitors to focus on bad calls

    Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as in applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding.

    An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from a script. Some systems require speaker enrollment: a user must provide samples of his or her speech before using them, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.
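    The role of a language model can be illustrated with a tiny bigram model that scores competing word sequences. This is a minimal sketch under stated assumptions, not the method of any product named in this paper: the training sentences are hypothetical, and add-one (Laplace) smoothing is just one simple choice. A recognizer could use such scores to prefer the more plausible of two acoustically similar hypotheses.

```python
import math
from collections import Counter

# Hypothetical training text; a real system would use a large corpus.
TRAINING = [
    "call home",
    "call the office",
    "call home now",
    "open the file",
]

def train_bigrams(sentences):
    """Count history words and word pairs, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])              # every word that precedes another
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a word sequence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words[:-1], words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigrams(TRAINING)
vocab = {w for s in TRAINING for w in s.split()} | {"</s>"}

# The word order seen in training scores higher than the reversed order.
print(log_prob("call home", unigrams, bigrams, len(vocab)) >
      log_prob("home call", unigrams, bigrams, len(vocab)))  # True
```

    A hard artificial grammar is the extreme case of the same idea: word pairs it does not allow get probability zero instead of a small smoothed value.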


    Speech recognition is a technology that is constantly evolving. Apart from its original niche as an assistive technology product, it is experiencing tremendous growth in the commercial market. There are presently three major companies with speech recognition products: Dragon Systems, Lernout & Hauspie (L&H), and IBM. Stiff competition between these companies and growing demand from consumer and business markets have led to a tremendous drop in prices over the last few years. Competition has also fueled the development of a plethora of new products. Each company has several products available, varying in price, features, and the applications they support. This paper seeks to make sense of this overwhelming array of products so that people shopping for speech recognition will have a better understanding of their choices.

    What are the Types of Speech Recognition?

    *Discrete

    Slower dictation process - better for persons with difficulty in language

    processing or in fluid speech

    Word-by-word style, rather than phrases, reflects the way beginning writers

    form sentences

    *Continuous

    Processes speech by phrase

    Takes context into account

    Is less accurate if phrases are interrupted

    Advantages: Speed and accuracy (for most users)
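    A discrete (isolated-word) recognizer relies on the pauses between words to find where each word begins and ends. The sketch below shows one simple way to do this, energy-based endpoint detection, under illustrative assumptions: the frame length (160 samples, i.e. 20 ms at 8 kHz), the fixed energy threshold and the minimum pause length are hypothetical values, whereas practical systems adapt them to the background noise.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def find_words(samples, frame_len=160, threshold=0.01, min_pause_frames=3):
    """Return (start_frame, end_frame) pairs, end exclusive, for runs of loud frames.

    A word ends only after at least `min_pause_frames` consecutive quiet frames,
    so short dips inside a word do not split it.
    """
    words = []
    start = None
    quiet = 0
    for i, e in enumerate(frame_energies(samples, frame_len)):
        if e >= threshold:
            if start is None:
                start = i          # a new word begins
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                words.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:          # signal ended in the middle of a word
        words.append((start, i - quiet + 1))
    return words

# Synthetic signal (frame_len=4): quiet, a "word", a pause, another "word", quiet.
signal = [0.0] * 8 + [0.5] * 12 + [0.0] * 16 + [0.5] * 8 + [0.0] * 4
print(find_words(signal, frame_len=4))  # [(2, 5), (9, 11)]
```

    This is also why continuous speech is harder for such a system: without clear pauses, the energy never drops long enough to mark a word boundary.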

    Who Can Benefit from Speech Recognition?

    Persons with mobility impairments or injuries that prevent keyboard access

    Persons who have or who are seeking to prevent repetitive stress injuries

    Persons with writing difficulties

    Any person who wants hands-free access to the computer

    Any person who wants to increase their text entry speed (reportedly up to 160 wpm)

    What is Required to Use Speech Recognition?

    A Powerful Computer

    Consistent Speech (not necessarily intelligible)

    Fluid speech (i.e., not pausing between words) desirable for use of

    continuous speech products

    Patience

    Basic knowledge of computers

    Fairly high cognitive ability

    Applications of speech recognition

    Command recognition - Voice user interface with the computer

    Dictation

    Interactive Voice Response

    Automotive speech recognition

    Medical Transcription

    Pronunciation Teaching in computer-aided language learning applications

    Automatic Translation

    Hands-free computing

    Speech Analysis

    Speech analysis/input deals with the following research areas:

    [Figure: Speech analysis. Who? (verification, identification); What? (recognition, understanding); How?]

    Human speech has certain characteristics determined by the speaker. Hence, speech analysis can serve to analyze who is speaking, i.e., to recognize a speaker for his/her identification and verification. The computer identifies and verifies the speaker using an acoustic fingerprint. An acoustic fingerprint is a digitally stored speech sample of a person. For example, a company can use speech analysis to identify and verify its employees: an employee has to say a certain sentence into a microphone, and the computer system captures the speaker's voice, identifies it and verifies the spoken statement.
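    The acoustic fingerprint idea can be sketched as follows, under loudly stated assumptions: each utterance has already been reduced to a fixed-length feature vector (the text does not say how), the stored fingerprint is simply the average of a few enrollment vectors, and a claimed identity is accepted when the cosine similarity reaches a hypothetical threshold of 0.9. The speakers and numbers are invented for illustration; real speaker verification systems use much richer statistical models.

```python
import math

def mean_vector(vectors):
    """Average several enrollment feature vectors into one stored fingerprint."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(claimed_id, features, fingerprints, threshold=0.9):
    """Verification: accept the claim only if the sample matches that person's fingerprint."""
    template = fingerprints.get(claimed_id)
    if template is None:
        return False
    return cosine_similarity(features, template) >= threshold

def identify(features, fingerprints):
    """Identification: return the enrolled speaker whose fingerprint is most similar."""
    return max(fingerprints, key=lambda sid: cosine_similarity(features, fingerprints[sid]))

# Hypothetical enrolled employees and a new sample.
fingerprints = {
    "alice": mean_vector([[1.0, 0.2, 0.1], [0.9, 0.25, 0.1]]),
    "bob": mean_vector([[0.1, 1.0, 0.3], [0.2, 0.9, 0.35]]),
}
sample = [0.97, 0.2, 0.12]
print(verify("alice", sample, fingerprints))  # True
print(verify("bob", sample, fingerprints))    # False
print(identify(sample, fingerprints))         # alice
```

    The two functions mirror the distinction in the text: verification checks one claimed identity against one fingerprint, while identification searches all stored fingerprints.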

    Another main task of speech analysis is to analyze what has been said, i.e., to recognize and understand the speech signal itself. Based on the speech sequence, the corresponding text is generated. This can lead to a speech-controlled typewriter, a translation system or part of a workplace for the physically challenged.

    Another area of speech analysis researches speech patterns with respect to how a certain statement was said. For example, a spoken sentence sounds different if a person is angry or calm. One application of this research could be a lie detector.

    Ideally, speech analysis would determine individual words correctly with probability 1. In practice, a word is recognized only with a certain probability, and environmental noise, room acoustics and the speaker's physical and psychological condition play an important role.


    For example, let's assume an extremely bad individual word recognition probability of 0.95. This means that 5% of the words are incorrectly recognized. If we have a sentence with three words, the probability of recognizing the whole sentence correctly is $0.95 \times 0.95 \times 0.95 = 0.95^3 \approx 0.857$.

    This small example emphasizes that a speech analysis system should have a very high individual word recognition probability.
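    The same arithmetic extends to sentences of any length. Assuming, as the example does, that each word is recognized correctly independently of the others, an n-word sentence is correct with probability $p^n$; inverting this gives the word accuracy needed for a target sentence accuracy. A small sketch:

```python
def sentence_accuracy(word_accuracy, n_words):
    """Probability that all n words are recognized correctly (independence assumed)."""
    return word_accuracy ** n_words

def required_word_accuracy(target_sentence_accuracy, n_words):
    """Word accuracy needed for an n-word sentence to be correct with the target probability."""
    return target_sentence_accuracy ** (1.0 / n_words)

for n in (1, 3, 10, 20):
    print(n, round(sentence_accuracy(0.95, n), 3))
# 1 0.95
# 3 0.857
# 10 0.599
# 20 0.358

print(round(required_word_accuracy(0.9, 10), 4))  # 0.9895
```

    So a system that is right on 95% of single words gets a 20-word sentence completely right only about 36% of the time, and reaching 90% on 10-word sentences already requires roughly 99% word accuracy.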

    [Figure: Speech recognition system. Speech is processed by a special chip and the main program to produce recognized speech.]

    The speech recognition system is divided into system components according to a basic principle: data reduction through property extraction. First, speech analysis takes place, in which the properties must be determined.

    [Figure: Speech passes through speech analysis (parameters, response, property extraction) and then pattern recognition (comparison with reference, decision), which draws on reference storage (properties of learned material); the output is understood speech.]

    --: Speech Recognition System :--

    Properties are extracted by comparing the characteristics of individual speech elements with a sequence of speech element characteristics given in advance, wherever such speech elements are present.

    Second, the speech elements are compared with existing references to determine their mapping to one of the known speech elements. The identified speech can then be stored, transmitted or processed as a parameterized sequence of speech elements.
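    One classic way to carry out this comparison with stored references is template matching with dynamic time warping (DTW), which tolerates a word being spoken faster or slower than its stored reference. The text does not say which method is meant, so treat this as an illustrative sketch: each utterance is assumed to be a sequence of already extracted feature values (one number per frame, for brevity), and the reference storage is a hypothetical dictionary of labelled templates.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # reuse seq_b[j-1] for another seq_a frame
                                 cost[i][j - 1],      # reuse seq_a[i-1] for another seq_b frame
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]

def recognize(features, references):
    """Decision step: choose the reference label with the smallest DTW distance."""
    return min(references, key=lambda label: dtw_distance(features, references[label]))

references = {"yes": [1, 3, 5, 3, 1], "no": [5, 5, 1, 1, 1]}  # hypothetical templates
spoken = [1, 1, 3, 5, 5, 3, 1]   # "yes", spoken more slowly
print(dtw_distance(spoken, references["yes"]))  # 0.0
print(recognize(spoken, references))            # yes
```

    The slower utterance still matches its template exactly because DTW may repeat frames of either sequence; a plain frame-by-frame comparison would fail here since the two sequences differ in length.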

    Usually the comparison and decision are executed by the main system processor. The computer's secondary storage contains the letter-to-phone rules, a dictionary of exceptions and the reference characteristics. The concrete methods differ in how the characteristics are defined. The principle of data reduction through property extraction can be applied several times to different characteristics. A system that provides recognition and understanding of a speech signal applies this principle several times:

    [Figure: Acoustical and phonetic analysis (using sound patterns and word models), then syntactical analysis (using syntax), then semantic analysis (using semantics), producing recognized speech.]

    Components of speech recognition and understanding.
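    The letter-to-phone rules and the dictionary of exceptions held in secondary storage work together in a simple way: a word is looked up in the exception dictionary first, and the rules are applied only if it is not there. The rule set and the ARPAbet-style phone symbols below are a tiny hypothetical example, far from a real English rule set.

```python
# Hypothetical exception dictionary: words the rules would pronounce wrongly.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
}

# Hypothetical letter-to-phone rules; multi-letter groups come first so they win.
RULES = [
    ("sh", ["SH"]), ("ch", ["CH"]), ("th", ["TH"]), ("ee", ["IY"]),
    ("a", ["AE"]), ("b", ["B"]), ("c", ["K"]), ("d", ["D"]), ("e", ["EH"]),
    ("f", ["F"]), ("g", ["G"]), ("h", ["HH"]), ("i", ["IH"]), ("k", ["K"]),
    ("l", ["L"]), ("m", ["M"]), ("n", ["N"]), ("o", ["AA"]), ("p", ["P"]),
    ("r", ["R"]), ("s", ["S"]), ("t", ["T"]), ("u", ["AH"]), ("w", ["W"]),
]

def letters_to_phones(word):
    """Convert a word to phones: exception dictionary first, then the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phones, i = [], 0
    while i < len(word):
        for letters, output in RULES:
            if word.startswith(letters, i):
                phones.extend(output)
                i += len(letters)
                break
        else:
            i += 1  # no rule covers this letter: skip it
    return phones

print(letters_to_phones("sheep"))  # ['SH', 'IY', 'P']
print(letters_to_phones("one"))    # ['W', 'AH', 'N'] (from the exception dictionary)
```

    Without the exception entry, the rules alone would turn "one" into ['AA', 'N', 'EH'], which is exactly the kind of error the dictionary of exceptions exists to catch.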


    Speech recognition systems are divided into speaker-independent and speaker-dependent recognition systems. With the same reliability, a speaker-independent system can recognize essentially fewer words than a speaker-dependent system, because the latter is trained in advance. Training in advance means that the speech recognition system has a training phase, which takes half an hour. A speaker-dependent recognition system can recognize around 25,000 words; a speaker-independent recognition system can recognize around 500 words, and with a worse recognition rate. These figures should be understood only as rough guidelines.

    Speech Transmission

    The area of speech transmission deals with efficient coding, so that the speech/sound signal can be transmitted over networks correctly and efficiently while keeping the same quality of speech/sound. Some principles are:

    Signal form coding

    Here no speech-specific properties or parameters are needed; the goal is to achieve the most efficient coding of the audio signal. The data rate of a PCM-coded stereo audio signal with CD-quality requirements is 1,411,200 bits/s.
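    The CD figure follows directly from the standard CD audio parameters, which the text does not spell out: 44.1 kHz sampling, 16 bits per sample and two channels. A one-line check:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Standard CD audio parameters: 44.1 kHz sampling, 16 bits per sample, 2 channels.
print(pcm_bit_rate(44_100, 16, 2))  # 1411200
```

    At that rate, one minute of CD audio occupies 1,411,200 × 60 / 8 = 10,584,000 bytes, roughly 10 MB.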

    Telephone quality, in comparison to CD quality, needs only 64 kbit/s. Using DPCM, the data rate can be lowered to 56 kbit/s without loss of quality.
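    DPCM (differential PCM) transmits the difference between each sample and a prediction of it rather than the sample itself. Because neighbouring speech samples are similar, the differences are small and need fewer bits: at the 8 kHz telephone sampling rate, 64 kbit/s corresponds to 8 bits per sample and 56 kbit/s to 7 bits per sample. The sketch below is a minimal closed-loop DPCM coder under simplifying assumptions that are not from the source: the prediction is just the previous reconstructed sample, and the difference is quantized with a uniform step size. It does not describe any particular standard codec.

```python
def dpcm_encode(samples, step):
    """Closed-loop DPCM: quantize the difference from the previous reconstructed sample."""
    codes = []
    prediction = 0.0
    for x in samples:
        q = round((x - prediction) / step)  # integer code that is transmitted
        codes.append(q)
        prediction += q * step              # track exactly what the decoder will rebuild
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    value = 0.0
    for q in codes:
        value += q * step
        out.append(value)
    return out

samples = [0, 10, 18, 25, 30, 32, 31, 27]
codes = dpcm_encode(samples, step=2)
print(codes)                      # [0, 5, 4, 4, 2, 1, 0, -2]
print(dpcm_decode(codes, step=2)) # [0.0, 10.0, 18.0, 26.0, 30.0, 32.0, 32.0, 28.0]
```

    Each decoded value stays within half a step of the original, and the transmitted codes (at most 5 in magnitude here) are much smaller than the samples (up to 32), which is what lets DPCM spend fewer bits per sample. Predicting from the reconstructed rather than the original samples keeps encoder and decoder in step, so quantization errors do not accumulate.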

    Recognition/synthesis Methods

    There have been attempts to reduce the transmission rate using pure recognition/synthesis methods. Speech analysis (recognition) takes place on the sender side of a speech transmission system, and speech synthesis (generation) takes place on the receiver side.

    [Figure: Analog speech signal → speech recognition → coded speech signal → speech synthesis → analog speech signal.]
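    To see why this approach can lower the transmission rate so drastically, here is a rough estimate of the rate needed to send the recognized words as plain text. The speaking rate, average word length and character encoding are illustrative assumptions, not figures from the text; only the 64 kbit/s telephone rate comes from the section above.

```python
# Illustrative assumptions, not figures from the text:
WORDS_PER_MINUTE = 150  # a typical conversational speaking rate
CHARS_PER_WORD = 6      # average word length including the following space
BITS_PER_CHAR = 8       # plain 8-bit text encoding

def text_bit_rate(words_per_minute, chars_per_word, bits_per_char):
    """Bits per second needed to send recognized speech as plain text."""
    return words_per_minute / 60 * chars_per_word * bits_per_char

rate = text_bit_rate(WORDS_PER_MINUTE, CHARS_PER_WORD, BITS_PER_CHAR)
telephone_rate = 64_000  # PCM telephone quality in bits/s (from the text)
print(rate)  # 120.0
print(telephone_rate / rate)  # several hundred times less data than telephone PCM
```

    The price is that the receiver hears a synthetic voice reading the recognized words: the original speaker's voice and intonation are lost, and any recognition error is transmitted as well.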

    Conclusion

    The major players in the speech recognition market are Dragon Systems, Lernout & Hauspie (L&H), and IBM. Each company offers several products, ranging in price and features. Because of the variety of products available, shopping for a speech recognition system can be an overwhelming experience.

    Dragon's original product, Dragon Dictate, is currently the only product that uses the discrete speech model. Discrete speech is the best solution for persons who have difficulty with language processing or fluid speech, or who form sentences one word at a time rather than in phrases. The latest version, 3.0 Classic, offers fully functional voice control across all applications. It is the only current speech recognition product that supports Windows 3.x. Because it uses discrete speech, it is better than current continuous speech products at recognizing the speech patterns of persons who naturally pause between words, and it seems to be better at learning to recognize persons with unique speech patterns. Unfortunately, Dragon Systems has discontinued development of this product, as the company's focus is now on continuous speech products, which are more viable in the larger commercial market.


    Dragon's current continuous speech product line, known as Dragon NaturallySpeaking, includes a Standard, Preferred, and Professional edition, listed in order from low end to high end. The Preferred edition includes dictation playback and text-to-speech, features that distinguish it from the Standard edition. The Preferred edition also supports input from an external recording device, although no recording device is provided. A special version of the Preferred edition, Dragon NaturallySpeaking Mobile, does include a digital recording device at additional cost. At the high end of Dragon's NaturallySpeaking product line, the Professional edition is distinguished by its expanded macro and scripting capabilities, which allow users to insert long sections of text or carry out complex computer operations with simple commands. The Professional edition also comes in Legal and Medical versions, which feature custom vocabularies for these disciplines.

    L&H products are based on speech recognition technology developed by Kurzweil, a major pioneer in speech recognition.

    The current L&H product line, called VoiceXpress, includes a Standard, Advanced, and Professional edition. The differences between these editions are fairly straightforward. In the Standard edition, VoiceXpress's natural language command interface works only in L&H's own word processing application, called XpressPad. The Advanced edition extends natural language support to include Microsoft Word. The Professional edition further extends natural language support to encompass the entire Microsoft Office suite, plus Internet Explorer. The Professional edition also provides support for recorded dictation, and includes a bundled digital recorder.

    IBM has been a major player in speech recognition for many years. Its discrete speech product, IBM VoiceType, was a major competitor of Dragon Dictate. However, IBM has discontinued this product and is now focusing all its efforts on developing continuous speech products. Its current product line, IBM ViaVoice Millennium, includes a Standard, Web and Professional edition. The Web edition features natural language commands for Internet Explorer, Netscape Communicator and America Online, as well as a specialized vocabulary for online chats. The Professional edition provides most of the features of the Web edition, but also provides natural language commands for the entire Microsoft Office suite, and specialized business, finance, and computer vocabularies.

    Although speech recognition got its start as an assistive technology product,

    the commercial market has fueled its rapid development in recent years, and

    the primary target market of each of the companies described above is now

    the general public, rather than persons with disabilities.

    A person who has a disability, or who works with persons with disabilities, should come away from this presentation with a more accurate picture of which speech recognition products will work best for them. There is a lot of confusion today about speech recognition products, and the main focus of this presentation is to clarify the speech recognition technology.
