
Landmark-Based Speech Recognition:

Spectrogram Reading, Support Vector Machines,

Dynamic Bayesian Networks, and Phonology

Mark

University of Illinois at Urbana-Champaign, USA


Lecture 2: Acoustics of Vowel and Glide Production

• One-Dimensional Linear Acoustics
  – The Acoustic Wave Equation
  – Transmission Lines
  – Standing Wave Patterns

• One-Tube Models
  – Schwa
  – Front cavity resonance of fricatives

• Two-Tube Models
  – The vowel /a/
  – Helmholtz Resonator
  – The vowels /u,i,e/

• Perturbation Theory
  – The vowels /u/, /o/ revisited
  – Glides

1. One-Dimensional Acoustic Wave Equation and Solutions

Acoustics: Constitutive Equations

Acoustic Plane Waves: Time Domain

Acoustic Plane Waves: Frequency Domain


Solution for a Tube with Constant Area and Hard Walls
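The equations on these slides did not survive extraction. As a hedged reference, the standard lossless one-dimensional forms are sketched below, assuming p is acoustic pressure, u is volume velocity, A is the constant cross-sectional area, ρ is air density, c is the speed of sound, and Ω is radian frequency; the lecture's own notation may differ.

```latex
% Constitutive equations (Newton's law; conservation of mass), u = volume velocity
-\frac{\partial p}{\partial x} = \frac{\rho}{A}\frac{\partial u}{\partial t},
\qquad
-\frac{\partial u}{\partial x} = \frac{A}{\rho c^{2}}\frac{\partial p}{\partial t}
\quad\Longrightarrow\quad
\frac{\partial^{2} p}{\partial x^{2}} = \frac{1}{c^{2}}\frac{\partial^{2} p}{\partial t^{2}}

% Plane waves in the time domain
p(x,t) = p^{+}\!\left(t - \tfrac{x}{c}\right) + p^{-}\!\left(t + \tfrac{x}{c}\right),
\qquad
u(x,t) = \frac{A}{\rho c}\left[p^{+}\!\left(t - \tfrac{x}{c}\right) - p^{-}\!\left(t + \tfrac{x}{c}\right)\right]

% Frequency domain, wavenumber k = \Omega / c
P(x,j\Omega) = P^{+}e^{-jkx} + P^{-}e^{jkx},
\qquad
U(x,j\Omega) = \frac{A}{\rho c}\left(P^{+}e^{-jkx} - P^{-}e^{jkx}\right)
```

For a hard-walled tube of constant area, the two boundary conditions (one at each end) give two equations in the two unknowns P+ and P-.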

2. One-Tube Models

Boundary Conditions


Resonant Frequencies

Standing Wave Patterns

Standing Wave Patterns: Quarter-Wave Resonators

Tube Closed at the Left End, Open at the Right End

Standing Wave Patterns: Half-Wave Resonators

Tube Closed at Both Ends Tube Open at Both Ends
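The boundary conditions fix both the resonant frequencies and the standing-wave shapes. A minimal sketch, assuming an ideal lossless hard-walled tube, a closed end meaning zero volume velocity (a pressure maximum), an open end meaning zero pressure, and c = 35000 cm/s (a typical value that the slides do not state); the function name `standing_wave` is illustrative:

```python
import math

C_CM_PER_S = 35000.0  # assumed speed of sound in the vocal tract (cm/s)


def standing_wave(ends, n, length_cm, points=5, c=C_CM_PER_S):
    """Return (frequency in Hz, relative pressure at evenly spaced x in [0, L]).

    ends is "closed-open" (quarter-wave), "closed-closed" or "open-open" (half-wave),
    left end first. A closed end is a pressure maximum, so the pattern is a cosine;
    an open end is a pressure node, so an open-open pattern is a sine.
    """
    if ends == "closed-open":
        k = (2 * n - 1) * math.pi / (2 * length_cm)  # n = 1, 2, ...
        shape = math.cos
    elif ends == "closed-closed":
        k = n * math.pi / length_cm  # n = 0 gives the 0 Hz mode
        shape = math.cos
    elif ends == "open-open":
        k = n * math.pi / length_cm  # n = 1, 2, ...
        shape = math.sin
    else:
        raise ValueError(f"unknown boundary conditions: {ends}")
    freq = k * c / (2 * math.pi)  # from k = 2*pi*F/c
    xs = [length_cm * i / (points - 1) for i in range(points)]
    return freq, [shape(k * x) for x in xs]


f, p = standing_wave("closed-open", 1, 17.5)
print(round(f, 1), [round(v, 3) for v in p])
f, p = standing_wave("closed-closed", 1, 17.5)
print(round(f, 1), [round(v, 3) for v in p])
```

For the closed-open tube the pressure falls from 1 at the closed end to 0 at the open end (a quarter of a cosine cycle); the lowest nonzero closed-closed mode spans half a cycle, from +1 to -1.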

Schwa and Inverted-V (/ə/ and /ʌ/, the vowels in “a tug”)

F1=500Hz=c/4L

F2=1500Hz=3c/4L

F3=2500Hz=5c/4L
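F1, F2, and F3 are the first three odd multiples of c/4L. A minimal sketch, assuming c = 35000 cm/s (the slide does not give c), recovers the implied tract length from F1 and then predicts the higher formants; the helper names are illustrative:

```python
C_CM_PER_S = 35000.0  # assumed speed of sound (cm/s); the slide does not state c


def tube_length_from_f1(f1_hz, c=C_CM_PER_S):
    """Quarter-wave resonator: F1 = c / 4L, so L = c / (4 * F1)."""
    return c / (4 * f1_hz)


def quarter_wave_formants(length_cm, count=3, c=C_CM_PER_S):
    """F_n = (2n - 1) * c / 4L for n = 1, 2, ..., count."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]


L = tube_length_from_f1(500.0)
print(L)                         # 17.5
print(quarter_wave_formants(L))  # [500.0, 1500.0, 2500.0]
```

Under this assumed c, the implied length is 17.5 cm, a commonly quoted adult male vocal tract length, and the predicted F2 and F3 match the slide.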

Front Cavity Resonances of a Fricative

/s/: Front Cavity Resonance = 4500Hz = c/4L if Front Cavity Length is L = 1.9cm

/sh/: Front Cavity Resonance = 2200Hz = c/4L if Front Cavity Length is L = 4.0cm
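Inverting the quarter-wave formula gives the front cavity length from a measured fricative resonance. A minimal sketch, assuming c = 35000 cm/s (not stated on the slide); the function name is illustrative:

```python
C_CM_PER_S = 35000.0  # assumed speed of sound (cm/s)


def front_cavity_length_cm(resonance_hz, c=C_CM_PER_S):
    """Quarter-wave front cavity: F = c / 4L, so L = c / (4 * F)."""
    return c / (4 * resonance_hz)


for fricative, f in [("/s/", 4500.0), ("/sh/", 2200.0)]:
    print(fricative, round(front_cavity_length_cm(f), 1))
# /s/ 1.9
# /sh/ 4.0
```

With this c, the computed lengths reproduce the slide's 1.9 cm and 4.0 cm to one decimal place: the lower /sh/ resonance corresponds to a front cavity roughly twice as long.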

3. Two-Tube Models

Conservation of Mass at the Juncture of Two Tubes

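Tube 1 area = A1; Tube 2 area = A2 = A1/2

Tube 1 velocity = U1(x,t); Tube 2 velocity = U2(x,t) = 2U1(x,t)

Total liters/second transmitted = (velocity) × (tube area), so A1U1 = A2U2 at the juncture: halving the area doubles the velocity.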

Two-Tube Model: Two Different Sets of Waves

Tube 1: Incident Wave P1+, Reflected Wave P1-

Tube 2: Incident Wave P2-, Reflected Wave P2+

Two-Tube Model: Solution in the Time Domain

Two-Tube Model in the Frequency Domain
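The frequency-domain equations for this slide are not preserved. As a hedged sketch, the same resonances can be reached with an impedance argument instead of solving the four equations for P1+, P1-, P2+, P2- directly. Assume a back tube (length L_b, area A_b) closed at the glottis and a front tube (length L_f, area A_f) with zero pressure at the lips; these labels are introduced here for illustration. Continuity of pressure and volume velocity at the juncture requires the input admittances seen from the juncture to cancel:

```latex
% k = \Omega / c; admittance looking back (toward the closed glottis) and forward (toward the open lips)
Y_b(j\Omega) = j\frac{A_b}{\rho c}\tan(kL_b),
\qquad
Y_f(j\Omega) = -j\frac{A_f}{\rho c}\cot(kL_f)

% Resonance condition
Y_b + Y_f = 0 \;\Longleftrightarrow\; A_b\tan(kL_b)\tan(kL_f) = A_f

% Large area ratios decouple the tubes:
%   A_b \ll A_f: \tan(kL_b) \to \infty or \tan(kL_f) \to \infty, so F = (2n-1)c/4L_b or (2n-1)c/4L_f
%   A_b \gg A_f: \tan(kL_b) \to 0 or \tan(kL_f) \to 0, so F = nc/2L_b or nc/2L_f (including 0 Hz)
% Near 0 Hz with A_b \gg A_f, \tan x \approx x replaces the 0 Hz roots with
\Omega^{2} \approx \frac{c^{2} A_f}{L_f\,(A_b L_b)}
```

The A_b ≪ A_f limit is the quarter-wave decoupling used for /AA/; the A_b ≫ A_f limit, with its 0 Hz partial-tube resonances, is where the Helmholtz resonance (back volume A_b L_b, front length/area L_f/A_f) takes over.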

Approximate Solution of the Two-Tube Model, A1>>A2

Approximate solution: Assume that the two tubes are completely decoupled, so that the formants include

- F(BACK CAVITY) = c/4LBACK

- F(FRONT CAVITY) = c/4LFRONT


The Vowels /AA/, /AH/


LBACK=8.8cm F2= c/4LBACK = 1000Hz

LFRONT=12.6cm F1= c/4LFRONT = 700Hz
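To see how good the decoupled approximation is, a minimal numerical sketch can solve the exact closed-glottis, open-lips condition A_b·tan(kL_b)·tan(kL_f) = A_f (a standard result, assumed here because the slide's exact equations are not shown) for the /AA/ lengths above. The slide gives no areas, so the area values below are hypothetical, and c = 35000 cm/s is assumed:

```python
import math

C_CM_PER_S = 35000.0  # assumed speed of sound (cm/s); not given on the slides


def two_tube_formants(l_back, a_back, l_front, a_front, f_max=3000.0, c=C_CM_PER_S):
    """Resonances (Hz) of a back tube closed at the glottis joined to a front tube open at the lips.

    Solves A_b*tan(k*L_b)*tan(k*L_f) = A_f in the pole-free form
    g(F) = A_b*sin(k*L_b)*sin(k*L_f) - A_f*cos(k*L_b)*cos(k*L_f) = 0,
    scanning a 1 Hz grid for sign changes and refining each one by bisection.
    """
    def g(f):
        k = 2 * math.pi * f / c  # wavenumber, rad/cm
        return (a_back * math.sin(k * l_back) * math.sin(k * l_front)
                - a_front * math.cos(k * l_back) * math.cos(k * l_front))

    roots = []
    f = 1.0
    while f < f_max:
        lo, hi = f, f + 1.0
        if g(lo) * g(hi) < 0:  # a root lies in [lo, hi]
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
        f += 1.0
    return roots


def quarter_wave(length_cm, c=C_CM_PER_S):
    """Decoupled approximation: lowest resonance c / 4L of a closed-open tube."""
    return c / (4 * length_cm)


print("decoupled:", round(quarter_wave(12.6)), round(quarter_wave(8.8)))  # decoupled: 694 994
for a_front in (8.0, 1000.0):  # hypothetical front areas (cm^2), with A_b = 1 cm^2
    exact = two_tube_formants(8.8, 1.0, 12.6, a_front)
    print(f"A_f/A_b = {a_front:g}: exact F1, F2 =", [round(x) for x in exact[:2]])
```

With the hypothetical 8:1 area ratio, F1 falls below the decoupled 694 Hz and F2 rises above 994 Hz; with a 1000:1 ratio both land within about 1 Hz of the decoupled values, which is why the approximation needs a large area mismatch.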

Acoustic Impedance

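$Z(j\Omega) = -j\frac{\rho c}{A}\cot\left(\frac{\Omega L}{c}\right)$ (tube of length L and area A, closed at the far end)

$Z(j\Omega) = j\frac{\rho c}{A}\tan\left(\frac{\Omega L}{c}\right)$ (tube of length L and area A, open at the far end)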

Low-Frequency Approximations of Acoustic Impedance

Helmholtz Resonator

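$-Z_1(j\Omega) = Z_2(j\Omega)$, with $Z_1(j\Omega) \approx \frac{1}{j\Omega C}$ (back cavity, $C = V/\rho c^2$) and $Z_2(j\Omega) \approx j\Omega M$ (front tube, $M = \rho L/A$), so $\Omega = 1/\sqrt{MC}$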

The Vowel /i/

Back Cavity = Pharynx Resonances: 0Hz, 2000Hz, 4000Hz

Front Cavity = Palatal Constriction Resonances: 0Hz, 2500Hz, 5000Hz

Back Cavity Volume = 70 cm³

Front Cavity Length/Area = 7 cm⁻¹

$F_1 = \frac{1}{2\pi\sqrt{MC}} = 250$ Hz

Helmholtz Resonance replaces all 0Hz partial-tube resonances.

Formants: 250Hz, 2000Hz, 2500Hz
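With the low-frequency approximations, the back cavity acts as an acoustic compliance C = V/(ρc²) and the front constriction as an acoustic mass M = ρL/A (standard lumped-element definitions, assumed here). ρ cancels, so F = 1/(2π√(MC)) reduces to (c/2π)/√((L/A)·V) and needs only the two numbers on the slide. A minimal sketch, assuming c = 35000 cm/s and ρ = 0.00114 g/cm³ (both illustrative; ρ does not affect the result):

```python
import math

C_CM_PER_S = 35000.0  # assumed speed of sound (cm/s)
RHO_G_PER_CM3 = 0.00114  # assumed air density (g/cm^3); cancels out of F


def helmholtz_hz(back_volume_cm3, front_length_over_area, c=C_CM_PER_S, rho=RHO_G_PER_CM3):
    """Helmholtz resonance F = 1 / (2*pi*sqrt(M*C)) of a back cavity plus front neck."""
    mass = rho * front_length_over_area           # M = rho * L / A
    compliance = back_volume_cm3 / (rho * c**2)   # C = V / (rho * c^2)
    return 1.0 / (2 * math.pi * math.sqrt(mass * compliance))


print(round(helmholtz_hz(70.0, 7.0)))  # /i/: 252
```

The result, about 252 Hz, is close to the 250 Hz quoted for F1 of /i/.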

The Vowel /u/: A Two-Tube Model

Back Cavity = Mouth + Pharynx Resonances: 0Hz, 1000Hz, 2000Hz

Front Cavity = Lips Resonances: 0Hz, 18000Hz, …

Back Cavity Volume = 200 cm³

Front Cavity Length/Area = 2 cm⁻¹

$F_1 = \frac{1}{2\pi\sqrt{MC}} = 250$ Hz

Helmholtz Resonance replaces all 0Hz partial-tube resonances.

Formants: 250Hz, 1000Hz, 2000Hz

The Vowel /u/: A Four-Tube Model

Two Helmholtz Resonators = Two Low-Frequency Formants! F1 = 250Hz, F2 = 500Hz

F3 = Pharynx resonance, c/2L = 2000Hz

Figure: four-tube /u/ model with pharynx, velar tongue-body constriction, mouth, and lips; formants at 250Hz, 500Hz, and 2000Hz

4. Perturbation Theory

Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940)

A(x) is constant everywhere, except for one small perturbation.

Method:
1. Compute formants of the “unperturbed” vocal tract.
2. Perturb the formant frequencies to match the area perturbation.

Conservation of Energy Under Perturbation

“Sensitivity” Functions
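The perturbation idea can be written as a formula. As a hedged sketch in a commonly used energy-based form (normalization conventions vary, and the slide's own equation is not preserved), with KE_n and PE_n the kinetic and potential energy densities of mode n in the unperturbed uniform tube of length L and area A:

```latex
% Small-perturbation formula
\frac{\Delta F_n}{F_n} \approx \frac{1}{L}\int_0^L S_n(x)\,\frac{\Delta A(x)}{A}\,dx,
\qquad
S_n(x) = \frac{\mathrm{KE}_n(x) - \mathrm{PE}_n(x)}{\frac{1}{L}\int_0^L \left[\mathrm{KE}_n + \mathrm{PE}_n\right]dx}

% Pressure \propto \cos(k_n x), volume velocity \propto \sin(k_n x), so
S_n(x) = \sin^{2}(k_n x) - \cos^{2}(k_n x) = -\cos(2k_n x)

% Quarter-wave resonator (closed at glottis x = 0, open at lips x = L): k_n = (2n-1)\pi / 2L
% Half-wave resonator (closed at both ends):                            k_n = n\pi / L
```

Narrowing the tube (ΔA < 0) where kinetic energy dominates (S_n > 0) lowers F_n; narrowing where potential energy dominates (S_n < 0) raises it.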

Sensitivity Functions for the Quarter-Wave Resonator (Lips Open)

Figure: sensitivity functions along the tube (length L), labeled /AA/, /ER/, /IY/, /W/

Sensitivity Functions for the Half-Wave Resonator (Lips Rounded)

Figure: sensitivity functions along the tube (length L), labeled /L,OW/, /UW/
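Both steps of the method can be sketched numerically, assuming a uniform unperturbed tube, the sensitivity function S_n(x) = -cos(2k_n x) (kinetic minus potential energy, normalized), and a small, hypothetical area perturbation; the 20% lip narrowing below is illustrative and not taken from the slides, and the function names are illustrative too:

```python
import math


def sensitivity(n, x, length_cm, ends="closed-open"):
    """S_n(x) = -cos(2 k_n x) for mode n of a uniform tube.

    closed-open (quarter-wave, lips open):   k_n = (2n - 1) * pi / (2L)
    closed-closed (half-wave, lips rounded): k_n = n * pi / L
    """
    if ends == "closed-open":
        k = (2 * n - 1) * math.pi / (2 * length_cm)
    elif ends == "closed-closed":
        k = n * math.pi / length_cm
    else:
        raise ValueError(f"unknown boundary conditions: {ends}")
    return -math.cos(2 * k * x)


def relative_formant_shift(n, delta_area_ratio, length_cm, ends="closed-open", steps=1000):
    """Approximate dF_n / F_n = (1/L) * integral of S_n(x) * dA(x)/A dx (midpoint rule).

    delta_area_ratio is a function x -> dA(x)/A, assumed small.
    """
    dx = length_cm / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += sensitivity(n, x, length_cm, ends) * delta_area_ratio(x) * dx
    return total / length_cm


L = 17.5  # hypothetical tube length (cm)


def lip_rounding(x):
    """Hypothetical perturbation: narrow the last 2 cm before the lips by 20%."""
    return -0.2 if x > L - 2.0 else 0.0


for n in (1, 2, 3):
    print(n, round(relative_formant_shift(n, lip_rounding, L), 4))
```

All three relative shifts come out negative for this perturbation, in line with the familiar lowering effect of lip rounding. A uniform change of area (ΔA/A constant) gives zero shift, since S_n integrates to zero over the tube.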

Formant Frequencies of Vowels

From Peterson & Barney, 1952

Summary

• The acoustic wave equation is easiest to solve in the frequency domain, for example:
  – Solve two boundary condition equations for P+ and P-, or
  – Solve the two-tube model (four equations in four unknowns)

• Quarter-Wave Resonator: open at one end, closed at the other
  – Schwa or Inverted-V (“a tug”)
  – Front cavity resonance of a fricative or stop

• Half-Wave Resonator: closed at the glottis, nearly closed at the lips
  – /uw/

• Two-Tube Models
  – Exact solution: use the reflection coefficient
  – Approximate solution: decouple the tubes and solve each separately

• Helmholtz Resonator
  – When the two-tube model seems to have resonances at 0Hz, use the Helmholtz resonance frequency instead, computed with low-frequency approximations of acoustic impedance
  – /iy/: F1 is a Helmholtz resonance
  – /uw/ and /ow/: both F1 and F2 are Helmholtz resonances

• Perturbation Theory
  – Perturbed area → perturbed formants
  – The sensitivity function explains most vowels and glides in one simple chart
