Prosody modification in speech signals Project by Edi Fridman & Alex Zalts supervision by Yizhar Lavner

Upload: harold-payne

Post on 16-Dec-2015


Page 1

Prosody modification in speech signals

Project by

Edi Fridman & Alex Zalts

supervision by

Yizhar Lavner

Page 2

Prosody: the "non-textual" aspects of the speech signal

"Segmental" aspects: timing, duration, rhythm, stress, and metrical structure. The duration of each individual "segment" is under the control of the speaker to varying degrees, and varies with stress and speaking rate.

The relative strength of an individual syllable, word, or phrase may be realized in several ways, including lengthening (or shortening and cliticization), changes in pitch and amplitude, and changes in spectral character.

Page 3

Project goals

• Prosody modification with TDPSOLA algorithm

• Prosody modification with HNM model

• Conversion of male voice to female voice & vice versa

Page 4

Four steps in prosody modification

• Time-scale modification

• Pitch-scale modification

• Energy envelope modification

• Modification of the distribution of utterances

[Figure: eight example speech waveform plots]

Page 5

TDPSOLA Approach

(*) Based on the Overlap-and-Add (OLA) idea

(*) Synchronization with the original pitch by:
1) Setting up pitch marks in the analysis signal
2) Setting up new pitch marks in the synthesis signal according to the time-scale and pitch-scale factors (here 0.6 for pitch and 1.3 for time)

(*) Building the synthesis signal using OLA
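The three steps above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: it assumes the analysis pitch marks `ta` are already known and the signal is voiced throughout, uses constant factors `alpha` (pitch) and `beta` (time), picks the nearest analysis mark for each synthesis mark, and uses two-period Hann windows. The function name `td_psola` is hypothetical.

```python
import numpy as np


def td_psola(x, ta, alpha, beta):
    """Minimal TD-PSOLA sketch (hypothetical helper, constant factors).

    x     : 1-D speech signal (assumed voiced throughout)
    ta    : increasing analysis pitch-mark sample indices (at least 2)
    alpha : pitch-scale factor (> 1 raises pitch, e.g. 0.6 lowers it)
    beta  : time-scale factor (> 1 lengthens, e.g. 1.3)
    """
    x = np.asarray(x, dtype=float)
    ta = np.asarray(ta, dtype=int)
    periods = np.diff(ta)
    out_len = int(np.ceil(len(x) * beta))
    y = np.zeros(out_len)

    ts = beta * ta[0]  # first synthesis pitch mark
    while ts < beta * ta[-1]:
        # Map the synthesis instant back to the analysis time axis
        # and pick the nearest analysis pitch mark.
        s = int(np.argmin(np.abs(ta - ts / beta)))
        p = int(periods[min(s, len(periods) - 1)])  # local pitch period

        # Extract a two-period, Hann-windowed segment centred on ta[s].
        seg = np.zeros(2 * p + 1)
        lo = ta[s] - p
        a, b = max(lo, 0), min(ta[s] + p + 1, len(x))
        seg[a - lo:b - lo] = x[a:b]
        seg *= np.hanning(2 * p + 1)

        # Overlap-add the segment at the synthesis mark.
        c = int(round(ts))
        d_lo, d_hi = max(c - p, 0), min(c + p + 1, out_len)
        if d_hi > d_lo:
            y[d_lo:d_hi] += seg[d_lo - (c - p):d_hi - (c - p)]

        ts += p / alpha  # synthesis period = analysis period / alpha
    return y
```

With `alpha = beta = 1` the Hann windows spaced one period apart sum to one, so the interior of the signal is reconstructed unchanged; `td_psola(x, ta, 0.6, 1.3)` would apply the factors quoted above.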

Page 6

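Setting up new pitch marks

Let $t_a(s)$ denote the original pitch marks (time instants in the analysis signal) and $P(t)$ the pitch contour.

The stream of synthesis pitch marks $t_s(u)$ is determined from $t_a(s)$ according to the desired time-scale modification $D(t)$ and pitch-scale modification $F_p(P)$ by:

$$t_s(u+1) - t_s(u) = \frac{1}{t'_s(u+1) - t'_s(u)} \int_{t'_s(u)}^{t'_s(u+1)} P'(t)\,dt$$

with

$$t_s(u+1) = D\big(t'_s(u+1)\big), \qquad P'(t) = F_p\big(P(t)\big)$$

That is, $t'_s(u)$ is the instant in the analysis signal that $D$ maps to the synthesis mark $t_s(u)$, and each synthesis period equals the average of the modified pitch period $P'(t)$ over the corresponding interval of the analysis signal.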
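The pitch-mark equation is implicit, because the integration interval depends on $t_s(u+1)$ itself, so it has to be solved numerically. The sketch below is one possible approach, assuming constant factors, i.e. $D(t) = \beta t$ and $F_p(P) = P/\alpha$, a pitch contour $P(t)$ obtained by linear interpolation between the analysis periods, and a few fixed-point iterations per mark. The function `synthesis_marks` and its parameters are illustrative, not taken from the original project.

```python
import numpy as np


def synthesis_marks(ta, alpha, beta, n_grid=16, n_iter=5):
    """Synthesis pitch marks from analysis marks ta (hypothetical helper).

    Assumes D(t) = beta * t and Fp(P) = P / alpha.
    """
    ta = np.asarray(ta, dtype=float)
    periods = np.diff(ta)

    def p_mod(t):
        # P(t): linear interpolation of the analysis periods (assumption);
        # P'(t) = Fp(P(t)) = P(t) / alpha.
        return np.interp(t, ta[:-1], periods) / alpha

    ts = [beta * ta[0]]
    while True:
        t_cur = ts[-1]
        tp_cur = t_cur / beta  # t'_s(u) = D^-1(t_s(u))
        step = p_mod(tp_cur)  # initial guess for t_s(u+1) - t_s(u)
        for _ in range(n_iter):
            tp_next = (t_cur + step) / beta  # t'_s(u+1)
            grid = np.linspace(tp_cur, tp_next, n_grid)
            step = np.mean(p_mod(grid))  # average of P' over the interval
        if (t_cur + step) / beta > ta[-1]:
            break  # next mark would fall past the last analysis mark
        ts.append(t_cur + step)
    return np.array(ts)


# Constant 100-sample period, pitch doubled (alpha = 2), 1.5x longer.
marks = synthesis_marks(np.arange(0, 1001, 100), alpha=2.0, beta=1.5)
```

For a constant pitch period the average is trivial and the marks are simply spaced by $P/\alpha$; the fixed-point loop only matters when $P(t)$ varies within a period.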

Page 7

Problem of TDPSOLA:

The pitch contour cannot easily be changed, because the algorithm is based on the original pitch marks.

[Figure: original pitch-marks and new pitch-marks]

Problem: many of the original pitch marks are skipped, resulting in poor sound quality.
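The effect can be quantified with a small experiment. Assuming a constant pitch period, the factors quoted earlier (0.6 for pitch, 1.3 for time), and the nearest-mark rule for choosing analysis segments, the sketch below counts how many analysis pitch marks are ever used; `used_analysis_marks` is a hypothetical helper.

```python
import numpy as np


def used_analysis_marks(ta, alpha, beta):
    """Indices of analysis pitch marks that at least one synthesis mark maps to."""
    ta = np.asarray(ta, dtype=float)
    period = np.median(np.diff(ta))  # assumes a roughly constant pitch period
    ts = np.arange(beta * ta[0], beta * ta[-1], period / alpha)  # synthesis marks
    # Nearest analysis mark for each synthesis mark (mapped back through 1/beta).
    nearest = np.abs(ta[None, :] - (ts / beta)[:, None]).argmin(axis=1)
    return np.unique(nearest)


ta = np.arange(0, 100_000, 100)  # 1000 marks, 100-sample period
used = used_analysis_marks(ta, alpha=0.6, beta=1.3)
print(len(used) / len(ta))  # → 0.78
```

Mapped back to the analysis signal, consecutive synthesis marks are about 100 / (0.6 × 1.3) ≈ 128 samples apart, more than one analysis period, so roughly one mark in five is never used.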

Page 8

HNM Approach

• The speech signal is modeled as pitch harmonics plus noise

• Harmonics and noise are treated in different ways

• Analysis and synthesis are performed pitch-synchronously

Page 9

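Let $X(n)$ be a speech segment. According to the HNM model, it can be written as:

$$X(n) = \sum_{k=1}^{p} h_k z_k^{\,n} + w(n)$$

where the complex constants $h_k$ and $z_k$ are defined as:

$$h_k = A_k \exp(j\varphi_k), \qquad z_k = \exp(j 2\pi f_k T)$$

$h_k$ - complex amplitude of harmonic $k$ (amplitude $A_k$, phase $\varphi_k$)

$f_k$ - frequency of harmonic $k$

$T$ - sampling period

$w(n)$ - noise

The harmonic part $\hat{X}(n) = \sum_{k=1}^{p} h_k z_k^{\,n}$ is estimated so as to minimize the error $X(n) - \hat{X}(n)$.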
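The harmonic part can be evaluated directly from $h_k$ and $z_k$. In this NumPy sketch (the function name `harmonic_part` is hypothetical), a real-valued harmonic is represented, as an illustrative assumption, by a conjugate pair $f_{-k} = -f_k$, $h_{-k} = h_k^*$, each carrying half of the amplitude, so that the complex sum reduces to $A_k \cos(2\pi f_k n T + \varphi_k)$.

```python
import numpy as np


def harmonic_part(h, f, T, n):
    """Evaluate X_hat(n) = sum_k h_k * z_k**n with z_k = exp(j*2*pi*f_k*T)."""
    h = np.asarray(h, dtype=complex)
    z = np.exp(1j * 2 * np.pi * np.asarray(f, dtype=float) * T)
    n = np.asarray(n)[:, None]  # column of sample indices
    return (h * z ** n).sum(axis=1)


# One harmonic (A = 0.8, phase 0.4 rad, 250 Hz at 8 kHz) as a conjugate pair.
A, phi, f0, T = 0.8, 0.4, 250.0, 1 / 8000
h = [A / 2 * np.exp(1j * phi), A / 2 * np.exp(-1j * phi)]
x = harmonic_part(h, [f0, -f0], T, np.arange(100))
```

The imaginary parts of the pair cancel, so `x.real` holds the real harmonic and `x.imag` is zero up to rounding.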

Page 10

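The amplitudes and phases of the pitch harmonics are computed with the Prony algorithm, by minimizing the least-squares error between the harmonics and the original signal, which yields:

$$(Z^H Z)\,h = Z^H x$$

where $Z$ is the matrix with entries $z_k^{\,n}$, $h$ is the vector of complex amplitudes $h_k$, and $x$ is the vector of samples $X(n)$.

The frequency of harmonic $k$ is set to $f_k = k F_0$, where $F_0$ is the pitch found by the pitch detection algorithm (PDA).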
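With the harmonic frequencies fixed at $f_k = kF_0$, only the linear least-squares step remains, so the normal equations $(Z^H Z)h = Z^H x$ can be solved directly. The NumPy sketch below assumes $F_0$ is already known from the PDA and uses the hypothetical name `estimate_harmonics`. As an assumption for fitting a real segment exactly, it also includes the negative-frequency partner of each harmonic; in this convention each real harmonic is split between $h_k$ and its conjugate, so its amplitude is $2|h_k|$ and its phase is $\arg h_k$.

```python
import numpy as np


def estimate_harmonics(x, f0, fs, p):
    """Least-squares fit of p harmonics of f0 to the real segment x.

    Solves (Z^H Z) h = Z^H x, where column k of Z holds z_k**n.
    Requires len(x) >= 2p and p * f0 < fs / 2 so that Z^H Z is invertible.
    Returns the real amplitudes and phases of harmonics 1..p.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))[:, None]
    k = np.concatenate([np.arange(1, p + 1), -np.arange(1, p + 1)])  # +/- harmonics
    z = np.exp(1j * 2 * np.pi * k * f0 / fs)
    Z = z ** n  # shape (N, 2p)
    ZH = Z.conj().T
    h = np.linalg.solve(ZH @ Z, ZH @ x)  # normal equations
    h_pos = h[:p]  # positive-frequency coefficients
    return 2 * np.abs(h_pos), np.angle(h_pos)


fs, f0 = 8000, 200.0
n = np.arange(400)  # 50 ms, exactly 10 pitch periods
A_true = np.array([1.0, 0.5, 0.25])
phi_true = np.array([0.3, -1.0, 2.0])
x = sum(A * np.cos(2 * np.pi * (k + 1) * f0 * n / fs + ph)
        for k, (A, ph) in enumerate(zip(A_true, phi_true)))
amps, phases = estimate_harmonics(x, f0, fs, 3)
```

Solving the normal equations mirrors the formula above; `np.linalg.lstsq(Z, x)` would be a numerically safer alternative when $Z$ is poorly conditioned.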

In each voiced speech fragment, the maximum voiced frequency $F_m$ is calculated, and the noise part is obtained by filtering the signal with a high-pass filter with cutoff frequency $F_m$.
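The high-pass step can be illustrated with an ideal FFT-domain filter. This is a simplification chosen for brevity (a practical system would more likely use an FIR or IIR high-pass filter), and it assumes $F_m$ has already been estimated; `noise_part` is a hypothetical name.

```python
import numpy as np


def noise_part(x, fs, fm):
    """Keep only the spectrum above the maximum voiced frequency fm (ideal high-pass)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    X[freqs < fm] = 0.0  # remove the harmonic (voiced) band
    return np.fft.irfft(X, n=len(x))


fs = 8000
n = np.arange(800)
x = np.sin(2 * np.pi * 200 * n / fs) + 0.1 * np.sin(2 * np.pi * 3000 * n / fs)
noise = noise_part(x, fs, fm=2500)  # keeps only the 3000 Hz component
```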

In unvoiced fragments, the signal's spectrum is modeled by a $p$th-order all-pole filter $H(z)$. The noise is synthesized by filtering unit-variance Gaussian noise through $H(z)$.
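A common way to obtain the all-pole filter $H(z) = g/A(z)$ is the autocorrelation method with the Levinson-Durbin recursion. The source does not say which estimation method was used, so this NumPy sketch is an assumption; `lpc` and `synth_noise` are hypothetical helper names.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns (a, g): a[0] = 1 and A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
    g = square root of the final prediction-error power.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)]) / n
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, np.sqrt(err)


def synth_noise(a, g, length, rng):
    """Filter unit-variance Gaussian noise through H(z) = g / A(z)."""
    w = rng.standard_normal(length)
    p = len(a) - 1
    y = np.zeros(length + p)  # p leading zeros = zero initial state
    for i in range(length):
        # y[n] = g*w[n] - a[1] y[n-1] - ... - a[p] y[n-p]
        y[i + p] = g * w[i] - np.dot(a[1:], y[i:i + p][::-1])
    return y[p:]
```

For an unvoiced frame, `a, g = lpc(frame, order)` followed by `synth_noise(a, g, len(frame), np.random.default_rng())` produces noise with approximately the same spectral envelope as the frame.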

When pitch scaling is performed, the amplitudes and phases of the modified pitch harmonics must be recomputed. For this purpose, frequency-continuous spectral and phase envelopes are necessary.
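One simple way to build such envelopes is to interpolate between the measured harmonics. The sketch below assumes linear interpolation of log-amplitudes and of unwrapped phases over frequency (an illustrative choice, not necessarily the project's method), keeps only new harmonics up to the highest original harmonic, and uses the hypothetical name `resample_harmonics`. Below the first original harmonic, `np.interp` simply holds the first value.

```python
import numpy as np


def resample_harmonics(f0_old, amps, phases, f0_new):
    """Amplitudes/phases at the new harmonics k * f0_new (hypothetical helper)."""
    f_old = np.arange(1, len(amps) + 1) * f0_old
    n_new = int(np.floor(f_old[-1] / f0_new))  # stay inside the measured band
    f_new = np.arange(1, n_new + 1) * f0_new
    # Frequency-continuous envelopes by linear interpolation (assumption).
    log_amp = np.interp(f_new, f_old, np.log(amps))
    phase = np.interp(f_new, f_old, np.unwrap(phases))
    return f_new, np.exp(log_amp), phase


# Harmonics of 200 Hz resampled for a pitch raised to 300 Hz.
f_new, amps_new, ph_new = resample_harmonics(
    200.0, [1.0, 0.5, 0.25, 0.125, 0.0625], [0.0, 0.1, 0.2, 0.3, 0.4], 300.0)
```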

Page 11

Comparison of TDPSOLA & HNM

Criterion | TDPSOLA | HNM
Sound quality | very good | very good, with possible buzziness
Pitch contour modification | can be done, with a lot of computational load | can be done easily
Computational load | low | high

Page 12

The only goal of pitch scaling was to change F0 while preserving the formants.

There was an attempt to change the spectral envelope in order to convert a male voice to a female voice and vice versa.

A new algorithm was proposed.