The Fundamental Frequency Variation Spectrum
DESCRIPTION
THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM. FONETIK 2008. Kornel Laskowski, Mattias Heldner and Jens Edlund. interACT, Carnegie Mellon University, Pittsburgh PA, USA; Centre for Speech Technology, KTH Stockholm, Sweden. Speaker: Hsiao-Tsung.
TRANSCRIPT
![Page 1: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/1.jpg)
THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM
FONETIK 2008
Kornel Laskowski, Mattias Heldner and Jens Edlund
interACT, Carnegie Mellon University, Pittsburgh PA, USA
Centre for Speech Technology, KTH Stockholm, Sweden
Speaker: Hsiao-Tsung
![Page 2: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/2.jpg)
Introduction
While speech recognition systems long ago transitioned from formant localization to spectral (vector-valued) formant representations, prosodic processing continues to rely squarely on a pitch tracker's ability to identify a peak corresponding to the fundamental frequency (F0) of the speaker.
Even if a robust, local, analytic, statistical estimate of absolute pitch were available, applications require a representation of pitch variation and go to considerable additional effort to identify a speaker-dependent quantity for normalization.
![Page 3: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/3.jpg)
The Fundamental Frequency Variation Spectrum
Instantaneous variation in pitch is normally computed by determining a single scalar, the F0, at two temporally adjacent instants and forming their difference.
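The conventional scalar approach described above can be sketched as follows. This is a minimal illustration rather than code from the talk: the function name `delta_f0` and the convention of marking unvoiced frames as NaN are assumptions made for this example.

```python
import numpy as np

def delta_f0(f0_track):
    """First difference of a frame-level F0 track, in Hz per frame.

    Unvoiced frames are assumed to be marked as NaN (a convention chosen
    for this sketch); any difference that touches one is NaN as well.
    """
    f0 = np.asarray(f0_track, dtype=float)
    return f0[1:] - f0[:-1]

# Hypothetical F0 estimates (Hz) at five adjacent instants, one unvoiced.
track = [120.0, 124.0, 130.0, np.nan, 118.0]
print(delta_f0(track))
```

Every difference next to the unvoiced frame is lost, which shows how strongly this representation depends on the pitch tracker producing a valid scalar at both instants.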
![Page 4: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/4.jpg)
The Fundamental Frequency Variation Spectrum
We propose a vector-valued representation of pitch variation, inspired by vanishing-point perspective.
The standard inner product between two vectors can be viewed as the summation of pair-wise products, with pairs selected by orthonormal projection onto a point at infinity.
F: signal’s spectral content (512-point FFT)
![Page 5: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/5.jpg)
The Fundamental Frequency Variation Spectrum
The proposed vanishing-point product instead induces a 1-point perspective projection onto a point at finite distance.
![Page 6: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/6.jpg)
The Fundamental Frequency Variation Spectrum
The FFV spectrum is then given by
is undefined over the interval [-T0, +T0]
![Page 7: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/7.jpg)
The Fundamental Frequency Variation Spectrum
A support for which is continuous over
In practice, we compute the FFV spectrum using magnitude rather than complex spectra.
![Page 8: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/8.jpg)
The Fundamental Frequency Variation Spectrum
FL and FR are 512-point Fourier transforms, computed every 8 ms.
However, the discrete transforms FL and FR are in general not defined at the corresponding dilated frequencies.
We resort to linear interpolation using the coefficients
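The exact FFV formula and interpolation coefficients are not preserved in this transcript, so the following is only a sketch of the general idea under stated assumptions. It compares the magnitude spectrum of the left frame with a dilated magnitude spectrum of the right frame, sampling between FFT bins by linear interpolation. The function names, the dilation factor `2 ** rho` (with `rho` in octaves), the Hann window, and the 16 kHz sampling rate of the test signal are all hypothetical choices, not the authors' definitions.

```python
import numpy as np

def dilate(mag, factor):
    """Sample a magnitude spectrum at bin positions k * factor.

    Values between FFT bins come from linear interpolation; positions
    beyond the last bin are treated as zero.
    """
    k = np.arange(len(mag), dtype=float)
    return np.interp(k * factor, k, mag, right=0.0)

def ffv_sketch(frame_left, frame_right, rhos, n_fft=512):
    """Normalized similarity between |F_L| and a dilated |F_R| per rho.

    rho is a hypothetical dilation parameter in octaves: rho > 0 means
    the right frame has a higher pitch than the left frame.
    """
    window = np.hanning(len(frame_left))  # window choice is an assumption
    mag_l = np.abs(np.fft.rfft(frame_left * window, n_fft))
    mag_r = np.abs(np.fft.rfft(frame_right * window, n_fft))
    g = np.empty(len(rhos))
    for i, rho in enumerate(rhos):
        mag_r_dilated = dilate(mag_r, 2.0 ** rho)
        num = np.dot(mag_l, mag_r_dilated)
        den = np.sqrt(np.dot(mag_l, mag_l) *
                      np.dot(mag_r_dilated, mag_r_dilated))
        # Dividing by the norms makes the result independent of energy.
        g[i] = num / den if den > 0 else 0.0
    return g

def harmonic_frame(f0, n=512, fs=16000, n_harm=10):
    """Synthetic test frame: a sum of n_harm harmonics of f0."""
    t = np.arange(n) / fs
    return sum(np.cos(2 * np.pi * f0 * h * t) for h in range(1, n_harm + 1))

rhos = np.linspace(-0.3, 0.3, 61)
left = harmonic_frame(200.0)
right = harmonic_frame(200.0 * 2 ** 0.1)  # pitch raised by 0.1 octave
g = ffv_sketch(left, right, rhos)
print(rhos[np.argmax(g)])
```

For the synthetic frames, the curve should peak near `rho = 0.1`, the pitch change built into the test signal, without any explicit F0 estimate being made. Scaling either frame leaves the result unchanged because of the normalization.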
![Page 9: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/9.jpg)
The Fundamental Frequency Variation Spectrum
Energy independent
![Page 10: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/10.jpg)
Filterbank
Rapidly changing
Slowly changing
![Page 11: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/11.jpg)
Filterbank
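The filter shapes from the slides are not preserved in this transcript. As a rough illustration of how a filterbank can compress a vector-valued spectrum into a few features for slowly and rapidly changing pitch, the sketch below applies triangular filters along a dilation axis. The number of filters, their centers, their widths, and the rise/fall labels are hypothetical choices made for this example.

```python
import numpy as np

def triangular_filterbank(rhos, centers, half_width):
    """One triangular filter per center, each normalized to unit sum."""
    rhos = np.asarray(rhos, dtype=float)
    bank = np.zeros((len(centers), len(rhos)))
    for i, c in enumerate(centers):
        bank[i] = np.clip(1.0 - np.abs(rhos - c) / half_width, 0.0, None)
        bank[i] /= bank[i].sum()
    return bank

rhos = np.linspace(-0.3, 0.3, 61)
# Hypothetical centers in octaves:
# rapid fall, slow fall, flat, slow rise, rapid rise.
centers = [-0.2, -0.1, 0.0, 0.1, 0.2]
bank = triangular_filterbank(rhos, centers, half_width=0.1)

# Toy FFV-like spectrum peaked at rho = +0.1 (slowly rising pitch).
g = np.exp(-0.5 * ((rhos - 0.1) / 0.03) ** 2)
features = bank @ g
print(np.argmax(features))
```

The filter centered on the toy peak (index 3, the "slow rise" filter) gives the largest response, so the five outputs form a compact description of where the spectrum's mass lies.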
![Page 12: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM](https://reader036.vdocument.in/reader036/viewer/2022081505/568161e5550346895dd207a5/html5/thumbnails/12.jpg)
Discussion
Initial experiments along these lines show that such HMMs, when trained on dialogue data, corroborate research on human turn-taking behavior in conversations.
The FFV representation does not require peak identification, dynamic time warping, median filtering, landmark detection, linearization, or mean pitch estimation and subtraction.
Immediate next steps include fine-tuning the filter banks and the HMM topologies, and testing the results on other tasks where pitch movements are expected to play a role, such as the attitudinal coloring of short feedback utterances, speaker verification, and automatic speech recognition for tonal languages.