Voice Morphing

Voice Morphing Voice morphing means the transition of one speech signal into another. Voice Morphing which is also referred to as voice transformation and voice conversion is a technique to modify a source speaker's speech utterance to sound as if it was spoken by a target speaker. The core process in a voice morphing system is the transformation of the spectral envelope of the source speaker to match that of the target speaker and linear transformations estimated from time-aligned parallel training data are commonly used to achieve this. Applications {some of the applications } 1. Text To Speech (TTS) 2. In public speech systems 3. For special effects ( just like video or image morphing is done ) 4. To diminish Ethnical barriers. {Focus on TTS the most………} Text To Speech (TTS):- A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Uploaded by naveen-krishnan on 17-Nov-2014

Voice morphing refers to the transition of one speech signal into another. Voice morphing, also referred to as voice transformation or voice conversion, is a technique for modifying a source speaker's speech utterance so that it sounds as if it were spoken by a target speaker.

The core process in a voice morphing system is transforming the spectral envelope of the source speaker to match that of the target speaker. Linear transformations estimated from time-aligned parallel training data (recordings of the same sentences by both speakers) are commonly used to achieve this.
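As a rough illustration of that training step, the Python sketch below first aligns a source and a target utterance frame by frame with dynamic time warping (DTW), then fits a linear map from source features to target features on the aligned pairs. It is a simplified sketch under stated assumptions: each frame is a short list of made-up spectral-envelope numbers, the map is fitted independently per dimension (a diagonal transform, y_d = a_d x_d + b_d) rather than as a full matrix, and the function names are hypothetical. Practical systems typically use features such as mel-cepstral coefficients and often richer models, for example Gaussian-mixture-based mappings.

```python
import math


def dtw_align(src, tgt):
    """Align two sequences of feature frames with dynamic time warping.

    Returns (source_index, target_index) pairs running from (0, 0) to
    (len(src) - 1, len(tgt) - 1). Frame distance is Euclidean.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, step = min((cost[i - 1][j - 1], 0), (cost[i - 1][j], 1), (cost[i][j - 1], 2))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


def fit_diagonal_linear_map(src, tgt, path):
    """Least-squares fit of y_d = a_d * x_d + b_d for every feature dimension d.

    Simplification (assumption): each dimension is mapped independently,
    i.e. a diagonal linear transform rather than a full matrix.
    """
    params = []
    for d in range(len(src[0])):
        xs = [src[i][d] for i, _ in path]
        ys = [tgt[j][d] for _, j in path]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx > 0 else 1.0  # constant input: keep slope 1, fit offset only
        params.append((a, my - a * mx))
    return params


def convert(frames, params):
    """Apply the fitted per-dimension map to new source frames."""
    return [[a * x + b for x, (a, b) in zip(frame, params)] for frame in frames]
```

For example, with `src = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]` and a time-stretched target `[[0.5, 1.5], [0.5, 1.5], [1.5, 2.5], [2.5, 3.5]]` (the source shifted by 0.5, with its first frame repeated), DTW pairs source frame 0 with both of the first two target frames, and the fit recovers slope 1.0 and offset 0.5 in both dimensions.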

Applications

{Some of the applications:}

1. Text To Speech (TTS)
2. In public speech systems
3. For special effects (just as video or image morphing is done)
4. To diminish ethnic barriers

{Focus on TTS the most.}

Text To Speech (TTS):

A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database.

{That is, we keep a database of recorded phonetic units and substitute the ones that correspond to our text.}
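A minimal sketch of the concatenation idea, assuming a hypothetical in-memory database that maps unit names to lists of audio samples. Joining raw clips end to end can produce audible clicks, so this version blends a few samples at each join with a linear crossfade. The `overlap` length and the toy sample values are illustrative assumptions; real systems also choose among many candidate recordings of each unit.

```python
def crossfade_concat(units, database, overlap=2):
    """Concatenate recorded units looked up in `database` (hypothetical dict of
    unit name -> list of samples), blending `overlap` samples at each join
    with a linear crossfade to soften discontinuities."""
    out = []
    for name in units:
        clip = database[name]  # raises KeyError if the unit was never recorded
        if not out or overlap == 0:
            out.extend(clip)
            continue
        k = min(overlap, len(out), len(clip))
        for n in range(k):
            w = (n + 1) / (k + 1)  # weight of the incoming clip rises across the overlap
            out[-k + n] = (1 - w) * out[-k + n] + w * clip[n]
        out.extend(clip[k:])
    return out


# Toy database: two "recorded" units with constant sample values.
db = {"h": [1.0, 1.0, 1.0], "i": [0.0, 0.0, 0.0]}
speech = crossfade_concat(["h", "i"], db, overlap=2)
```

Here the two overlapping samples at the join become roughly 0.67 and 0.33, so the output steps down gradually instead of jumping from 1.0 to 0.0.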

Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity.

{‘Phones’ and ‘diphones’ are phonetic terms here: a phone is an individual speech sound, while a diphone is the transition between two adjacent phones, typically running from the middle of one phone to the middle of the next. Google for more info.}
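To make the phone/diphone distinction concrete, the sketch below turns a phone sequence into the diphone unit names a synthesizer would look up. The SAMPA-style symbols for "hello" and the `sil` silence marker are assumptions chosen for illustration.

```python
def to_diphones(phones, silence="sil"):
    """Return diphone unit names for a phone sequence.

    The utterance is padded with a silence marker (an assumed convention) so
    that the transitions into the first phone and out of the last are covered.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# "hello" in SAMPA-style symbols (illustrative).
units = to_diphones(["h", "@", "l", "@U"])
```

This yields `["sil-h", "h-@", "@-l", "l-@U", "@U-sil"]`: n phones become n + 1 diphones. With an inventory of N phones there are at most N² diphones, so the database stays small while still capturing the transitions between sounds.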

For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.
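The "model of the vocal tract" approach can be sketched as a source-filter formant synthesizer: a periodic pulse train stands in for the vocal folds, and a cascade of two-pole resonators, one per formant, stands in for the vocal tract. The resonator follows the widely used Klatt-style digital resonator; the formant frequencies and bandwidths are rough illustrative values for an /a/-like vowel, not measured data, and the function names are hypothetical.

```python
import math


def resonator(signal, freq, bandwidth, fs):
    """Two-pole digital resonator modelling one vocal-tract formant.

    Implements y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with Klatt-style
    coefficients for centre frequency `freq` and `bandwidth` (both in Hz).
    """
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synth_vowel(formants, f0=120.0, fs=16000, duration=0.3):
    """Hypothetical source-filter synthesizer: pulse train through cascaded
    formant resonators. Returns samples normalised to a peak of 1.0."""
    n = int(round(fs * duration))
    period = int(fs / f0)  # samples between glottal pulses
    # Crude glottal source: one unit impulse per pitch period.
    signal = [1.0 if k % period == 0 else 0.0 for k in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)  # each formant filters the previous output
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


# Illustrative (approximate) formants for an /a/-like vowel: (frequency Hz, bandwidth Hz).
vowel_a = synth_vowel([(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)])
```

The returned samples lie in [-1, 1]; to listen to them, they could be scaled to 16-bit integers and written out with Python's standard `wave` module.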

Fig - List of different TTS systems

Public Speech systems:

In public speech systems, we can make announcements sound as if they were spoken by a popular public speaker.

{There are many advantages to this, such as:
1. The public speaker does not need to be physically present.
2. It can be deployed in many places (in fact, railway announcements use a very crude form of the same idea).
3. Cost efficiency.
}

Special Effects

Video and image morphing are used extensively for film and graphical special effects. Similarly, we can enrich the multimedia animation experience by morphing the audio at the same time as the images or video.
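One simple way to keep an audio morph in step with an image or video morph is to drive both with a single interpolation factor that advances frame by frame. The sketch below assumes the source and target spectral envelopes are already time-aligned frame by frame and stored as lists of log-magnitude (dB) values; linear interpolation is an assumed, deliberately basic choice, and the function names are hypothetical.

```python
def morph_envelopes(src_env, tgt_env, alpha):
    """Linearly interpolate two spectral envelopes:
    alpha = 0 gives the source, alpha = 1 gives the target."""
    return [(1 - alpha) * s + alpha * t for s, t in zip(src_env, tgt_env)]


def morph_schedule(n_frames):
    """One alpha per frame, rising linearly from 0 to 1. The same values
    could drive the image/video cross-dissolve for those frames."""
    if n_frames <= 1:
        return [1.0] * n_frames
    return [k / (n_frames - 1) for k in range(n_frames)]


def morph_sequence(src_frames, tgt_frames):
    """Morph a whole utterance; frames are assumed to be aligned one-to-one."""
    if len(src_frames) != len(tgt_frames):
        raise ValueError("source and target must have the same number of frames")
    alphas = morph_schedule(len(src_frames))
    return [morph_envelopes(s, t, a) for s, t, a in zip(src_frames, tgt_frames, alphas)]
```

For three frames the schedule is `[0.0, 0.5, 1.0]`, so the first frame is pure source, the middle frame an even blend, and the last frame pure target, matching what a three-frame image morph would show.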

To diminish ethnic barriers

Through voice morphing, we can provide accent correction and even translation!

{That is, a German engineer can instruct a Chinese worker, an American caller can better understand an Indian call-center agent, etc.}

Ethnic barriers, even in small talk, greatly hinder effective communication. Voice morphing can therefore improve communication and, ultimately, throughput.

Limitations

1. Voice detection is done via sophisticated 3D renderings, but there are many normalization problems {that is, extracting the meaning of the sound / understanding it is difficult}.
2. Some applications require extensive sound libraries.