Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model
Didier Cadic¹, engineering student
supervised by
Olivier Cappé¹, Maurice Charbit¹, Gérard Chollet¹, Eric Moulines¹
(presented here by Guido Aversano¹,²)
¹Département TSI, ENST, Paris, France
²IIASS, Vietri sul Mare (SA), Italy
Plan of the presentation
Text-to-speech: classic methods
HNM model
Analysis
Synthesis
Analysis-Synthesis examples
Conclusions
Text-To-Speech by concatenation
Examples realized on the AT&T web site:
English, male
English, female (vocal server example)
English, female (another vocal server example)
German, male
French, female
Text-To-Speech by concatenation
2 major challenges:
smooth connection between acoustic units
flexible prosody
TD-PSOLA method
Analysis:
Pitch estimation
Pitch-synchronous windowing
Synthesis:
Rearrangement of frames
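The two stages listed for TD-PSOLA can be sketched for time-scaling as follows. This is a minimal illustration, not the code behind the examples in this presentation: the pitch marks are assumed to be already estimated, and the function name, the `rate` convention (values below 1 slow the signal down) and the nearest-mark selection rule are all assumptions made for this sketch.

```python
import math

def td_psola_stretch(signal, marks, rate):
    """Time-scale `signal` by re-spacing pitch-synchronous grains.

    marks: increasing sample indices of pitch marks (assumed already estimated,
           at least two of them).
    rate:  hypothetical speed factor; rate < 1 slows the signal down.
    """
    out = [0.0] * int(len(signal) / rate)
    t_s = marks[0] / rate                      # first synthesis instant
    while t_s < len(out):
        # Analysis mark closest to the synthesis instant mapped back in time.
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - t_s * rate))
        if i + 1 < len(marks):
            period = marks[i + 1] - marks[i]
        else:
            period = marks[i] - marks[i - 1]
        centre = int(round(t_s))
        for j in range(2 * period):            # two-period Hann-windowed grain
            src = marks[i] - period + j
            dst = centre - period + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                w = 0.5 - 0.5 * math.cos(math.pi * j / period)
                out[dst] += w * signal[src]
        t_s += period                          # grains keep their original spacing
    return out
```

Because the grains are re-spaced by the local pitch period, the pitch is preserved while frames are repeated (slow-down) or skipped (speed-up). With `rate = 1` and a constant period, the overlapping Hann windows sum to one and the interior of the signal is reproduced unchanged.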
TD-PSOLA method
Some very good-quality results:
Singing, original
Singing, modified
Time-scaling
Cello, original
Cello, modified
Pitch-shifting
TD-PSOLA method
Artifacts appearing in non-voiced sounds:
"rain", original
"rain", 0.5 rate
"ss", original
"ss", slowed down (classic method)
"ss", slowed down (improved)
Phase Vocoder method
Intuitive description:
Compression/stretching of the (narrow-band) spectrogram's time-frequency scales…
time-scaling
pitch-shifting
Phase Vocoder method
Examples:
"rain", male voice
Slow-motion by Vocoder (PSOLA: )
"The quick fox …", female voice
Slow-motion by Vocoder
Main problem: phase coherence is lost in the synthesized signal
TD-PSOLA and Vocoder allow basic prosodic modifications.
The problem of unit concatenation for TTS is not solved.
Other kinds of modifications (timbre, denoising, …) should be considered.
We need a parametric model
Harmonic plus Noise Model (HNM)
Main assumption:
stationary segments of a speech signal can always be seen as the superposition of a periodic and a noisy part
HNM Model
Modelling:
$s(t) = H(t) + B(t)$
where: $H(t) = \sum_k A_k \cos(2\pi k f_0 t + \varphi_k)$
and $B(t)$ = white noise passed through an AR filter
HNM analysis of a frame
1. Pitch estimation
Spectral comb method
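The slides do not detail the spectral comb, so the following is only a sketch of one common reading of the idea: for each candidate $f_0$, sum the spectral magnitude at the comb teeth $k f_0$ and keep the candidate with the highest score. The function name, the Hann window, the search range and the number of teeth are assumptions of this sketch.

```python
import cmath
import math

def comb_pitch(frame, fs, f_min=60.0, f_max=400.0, step=1.0, teeth=5):
    """Return the candidate f0 (Hz) whose harmonic comb collects the most energy."""
    n = len(frame)
    # Hann window to limit leakage between neighbouring harmonics.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]

    def magnitude(freq):
        # Magnitude of the windowed frame's spectrum at one frequency.
        return abs(sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
                       for i, x in enumerate(windowed)))

    best_f0, best_score = f_min, -1.0
    f0 = f_min
    while f0 <= f_max:
        score = sum(magnitude(k * f0) for k in range(1, teeth + 1))
        if score > best_score:
            best_f0, best_score = f0, score
        f0 += step
    return best_f0
```

A comb with a fixed number of teeth is used here because a comb placed at a subharmonic also lands on some of the true harmonics; how strongly such candidates compete depends on the relative harmonic amplitudes and on the scoring rule.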
HNM analysis of a frame
1. Pitch estimation
Good results are obtained
In some cases the method erroneously returns $f_0/2$
Possibility of tracking…
"aka…aga"
HNM analysis of a frame
2. Harmonic part: extraction of amplitudes
Least squares method
$H(t) = \sum_k \left[ a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right]$
$\min_{a_k, b_k} \| s(t) - H(t) \|^2$
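As an illustration of the least-squares step, the sketch below fits the cosine and sine coefficients of each harmonic to one frame by solving the normal equations, then converts them to an amplitude and a phase using $a_k = A_k \cos\varphi_k$, $b_k = -A_k \sin\varphi_k$. The function names and the plain Gaussian-elimination solver are choices made for this example only; the toolbox's actual solver is not described in the slides.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_harmonics(frame, fs, f0, n_harm):
    """Least-squares amplitudes A_k and phases phi_k for harmonics k = 1..n_harm."""
    basis = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs
        basis.append([math.cos(w * i) for i in range(len(frame))])
        basis.append([math.sin(w * i) for i in range(len(frame))])
    # Normal equations: (B^T B) c = B^T s
    gram = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(u * s for u, s in zip(bi, frame)) for bi in basis]
    coeffs = solve_linear(gram, rhs)
    a, b = coeffs[0::2], coeffs[1::2]
    amps = [math.hypot(ak, bk) for ak, bk in zip(a, b)]
    phases = [math.atan2(-bk, ak) for ak, bk in zip(a, b)]
    return amps, phases
```

The residual $R(t) = s(t) - H(t)$ then follows by subtracting the fitted harmonic sum from the frame.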
HNM analysis of a frame
2. Extraction of amplitudes
Problem: the noisy part gives a non-null contribution to the spectral power
Gain correction for the harmonics (using a heuristic formula $g(DV)$, where $DV$ is the estimated voicing degree)
HNM analysis of a frame
2. Extraction of amplitudes
Residual: $R(t) = s(t) - H(t)$
HNM analysis of a frame
2. Extraction of amplitudes
Possibility of improving harmonic estimation
HNM analysis of a frame
3. AR filter estimation for the residual:
Linear prediction method
$R(t) = B_g(t) * F(t)$
where $B_g$ = Gaussian white noise
and $F(t)$ = AR filter, $F(z) = \dfrac{1}{a_0 + a_1 z^{-1} + \dots + a_N z^{-N}}$
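Linear prediction of the residual can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is one standard way to estimate an AR filter and is assumed here, since the slides only name "linear prediction". The sketch normalises the denominator so that $a_0 = 1$ and returns the prediction error separately; the function names are illustrative.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a frame."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Solve the normal equations for A(z) = 1 + a_1 z^-1 + ... + a_N z^-N."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err

def lpc(frame, order):
    """AR coefficients (a_0 = 1) and prediction error energy for one frame."""
    return levinson(autocorr(frame, order), order)
```

Under this normalisation, the residual is modelled as white noise of variance `err / len(frame)` filtered by $1 / A(z)$.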
HNM Synthesis
Interpolation for each harmonic between two successive frames
$H(t) = \sum_k \left[ a_k(t) \cos(2\pi k f_0(t)\, t) + b_k(t) \sin(2\pi k f_0(t)\, t) \right] = \sum_k A_k(t) \cos \varphi_k(t)$
$A_k(t_a)$ and $\varphi_k(t_a)$ are known at analysis instants $t_a$
$\varphi_k'(t_a) = 2\pi k f_0(t_a)$ is known by pitch analysis.
HNM Synthesis
An erroneous pitch (usually $f_0/2$) creates a harmonic correspondence problem between successive frames
It is solved by introducing fictitious harmonics
HNM Synthesis
$A_k(t) \cos \varphi_k(t)$
$A_k$: linear interpolation
$\varphi_k$: unwrapping + cubic interpolation
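The interpolation of one harmonic between two analysis instants can be sketched as below. The cubic phase polynomial and the choice of the unwrapping integer follow the classic McAulay-Quatieri "maximally smooth" rule, which is assumed here because the slides only say "unwrapping + cubic interpolation". Phases are in radians, frequencies in radians per sample, and the function names are illustrative.

```python
import math

def cubic_phase_track(theta0, omega0, theta1, omega1, n):
    """Phase at samples 0..n matching phase and frequency at both frame ends."""
    # Unwrapping: pick the multiple of 2*pi that gives the smoothest track.
    x = (theta0 + omega0 * n - theta1) + (omega1 - omega0) * n / 2.0
    m = round(x / (2 * math.pi))
    d = theta1 - theta0 - omega0 * n + 2 * math.pi * m
    alpha = 3.0 * d / n ** 2 - (omega1 - omega0) / n
    beta = -2.0 * d / n ** 3 + (omega1 - omega0) / n ** 2
    return [theta0 + omega0 * t + alpha * t ** 2 + beta * t ** 3
            for t in range(n + 1)]

def synth_harmonic(a0, a1, phase_track):
    """A_k(t) cos(phi_k(t)) with the amplitude interpolated linearly from a0 to a1."""
    n = len(phase_track) - 1
    return [(a0 + (a1 - a0) * t / n) * math.cos(phase_track[t])
            for t in range(n + 1)]
```

For a stationary sinusoid the quadratic and cubic terms vanish and the track reduces to a straight line, so the harmonic is reproduced without a phase jump at the frame boundary.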
HNM Synthesis
Noisy part
Generation of normally distributed random numbers
AR filtering (abrupt changes of coefficients between 2 windows have no noticeable effect…)
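A minimal sketch of the noisy-part synthesis: Gaussian samples are drawn and passed through an all-pole filter whose coefficients are switched abruptly at each frame boundary, while the filter memory carries over from one frame to the next. The frame layout, the per-frame gain and the normalisation $A(z) = 1 + \sum_j a_j z^{-j}$ are assumptions of this example.

```python
import random

def synth_noise(frames, frame_len, rng):
    """Synthesize the noisy part frame by frame.

    frames: list of (gain, ar_coeffs) pairs, one per frame, where ar_coeffs
            holds a_1..a_N of A(z) = 1 + a_1 z^-1 + ... + a_N z^-N.
    rng:    a random.Random instance supplying the Gaussian excitation.
    """
    out = []
    for gain, ar in frames:
        for _ in range(frame_len):
            y = gain * rng.gauss(0.0, 1.0)
            # All-pole recursion; past outputs may belong to the previous frame,
            # so the coefficients change abruptly but the state is continuous.
            for j, a in enumerate(ar, start=1):
                if len(out) >= j:
                    y -= a * out[-j]
            out.append(y)
    return out
```

For example, `synth_noise([(1.0, [-0.9]), (0.5, [0.5, 0.2])], 160, random.Random(0))` produces two 160-sample frames with different spectral envelopes.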
HNM Synthesis
Results
"Carottes": synthesized, original
"Lawyer": synthesized, original
Tuba: synthesized, original
"wazi": synthesized, original
a-e-i-o-u: synthesized, original
singing: synthesized, original
HNM Synthesis
Results
Discours: synthesized, original
"aka aga": synthesized, original
Dussolier: synthesized, original
Andie: synthesized, original, noisy part
"coiffe": synthesized, original
Synthesis with time-stretching
Synthesis instants ($t_s$) differ from analysis instants ($t_a$)
The following parameters remain unchanged:
Noisy part parameters
The pitch
The amplitudes $A_k$ of the harmonics
Synthesis with time-stretching
Phase adaptation
Simple phase trajectories resampling
or
"harmonic" rephasing
a-e-i-o-u: original
slow-motion with phase "stretching"
slow-motion with "harmonic" rephasing
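The time-stretching idea can be illustrated with a small sketch: analysis instants are mapped to new synthesis instants while pitch and amplitudes are kept, so the harmonic phases have to be recomputed for the new frame spacing. The phase rule below (integrating $2\pi k f_0$ over each stretched interval) is only one plausible way to adapt the phases and is not claimed to be the "harmonic" rephasing of the toolbox. The function names and the `rate` convention (values below 1 slow the signal down) are assumptions.

```python
import math

def stretch_instants(t_a, rate):
    """Map analysis instants (seconds) to synthesis instants for a speed factor."""
    return [t / rate for t in t_a]

def propagate_phases(f0_track, t_s, phi_start, k):
    """Phase of harmonic k at each synthesis instant.

    The unchanged pitch values f0_track (Hz, one per frame) are integrated over
    the stretched intervals with the trapezoid rule.
    """
    phases = [phi_start]
    for i in range(1, len(t_s)):
        mean_f0 = 0.5 * (f0_track[i - 1] + f0_track[i])
        step = 2 * math.pi * k * mean_f0 * (t_s[i] - t_s[i - 1])
        phases.append(phases[-1] + step)
    return phases
```

With frames 10 ms apart and `rate = 0.5`, the synthesis instants are 20 ms apart, and a 100 Hz fundamental advances by $4\pi$ per frame instead of $2\pi$.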
Final results
For each sample: original, then synthesized with rate: 0.4, 0.5, 0.6, 0.7, 0.8, 1, 1.2, 1.5, 2
Samples: "carottes", "lawyer", tuba, "wazi", singing, "a-e-i-o-u", Dussolier, Discours, Andie, "aka aga", "coiffe"
Conclusions
Good results, showing the method's potential for different applications including TTS
Future work will include other kinds of modifications (pitch shifting, timbre, etc.)