speech signal processing · i tongue, lips, glottis, teeth. 3/49. vocal folds vocal “chords”...

49
Speech production 1 / 49

Upload: others

Post on 11-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Speech production

1 / 49

Page 2: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Gray’s Anatomy

I Find the bit where thenoise comes out.

I Take it to bits.I See how it works.

http://en.wikipedia.org/wiki/Vocal_tract

2 / 49

Page 3: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vocal apparatus

Apparently from Borden et al,1994.

Physically quite largeI Lungs, throat.I Oral cavity, nasal cavity.I Tongue, lips, glottis, teeth.

3 / 49

Page 4: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vocal folds

Vocal “chords” are perhaps the most obvious part.

Used even in whispered speech.

4 / 49

Page 5: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vowels and consonants

This is a rough separation at best

Vowels are resonant sounds that can be held as if singing.

Consonants are everything elseIncluding some things that are difficult todistinguish from vowels.

Vowels are vocalisedI The vocal chords vibrate, imparting pitch.I The oral cavity resonates, imparting character or quality.

Consonants involve other articulators.I Tongue, lips.I Stops, fricatives, may or may not be vocalised.

5 / 49

Page 6: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

c©The Wire Magazine

Donald Ayler, 1942–2007 (with Albert Ayler, 1936–1970)

6 / 49

Page 7: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Excitation System

c©The Wire Magazine

Donald Ayler, 1942–2007 (with Albert Ayler, 1936–1970)

7 / 49

Page 8: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Modeling the vocal tract

Vocal tract models have two elements:

1. Excitation from the vocal chords.

2. System response from the vocal tract shape.

I will gloss over this. For a full description, see

Thomas F. Quatieri.Discrete-Time Speech Signal Processing: principles andpractice.Signal Processing Series. Prentice Hall PTR, Upper SaddleRiver, New Jersey, 07458, 2002.

8 / 49

Page 9: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Excitation

Excitation is important, it has two forms:1. Voiced sounds.

I Pulse train (not a pure tone)I Corresponds to vocal chords opening.I Has a distinct pitch.I Imposes a distinct fundamental period in the spectra.

2. Unvoiced sounds.I White (-ish, pink maybe) noise.I No pitch, but perception of pitch can come from formants.I Spectra are smooth.

These two forms get you quite a long way in understandingspeech.

9 / 49

Page 10: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

System response 1

I Blah blah lossless tubes.I Blah blah partial differential equations.I Blah blah piston in infinite wall.

10 / 49

Page 11: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

System response 2

<

=

The vocal tract is a prettycomplicated tube. However, itcan be summarised:

I There are multipleresonances. Thesecorrespond to poles.

I Some sounds, notablynasals, introduce zeros.

I There is a zerocorresponding to theradiation impedance.

11 / 49

Page 12: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vocalisation

Simplistic frequency domain view

0

0.2

0.4

0.6

0.8

1

0 0.5 1 1.5 2 2.5 3

Excitation

×

0

0.2

0.4

0.6

0.8

1

0 0.5 1 1.5 2 2.5 3

System response

=

0

0.2

0.4

0.6

0.8

1

0 0.5 1 1.5 2 2.5 3

Vocalised response

12 / 49

Page 13: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Data driven

Why so blase?I In speech and language, hand-crafting a solution tends to

only get you so far.I It’s better to learn what you can from data.

I Provide a model that is capable of representing what youwant.

I Provide an algorithm capable of training the model.I Provide enough data to allow the algorithm to train the

model.

13 / 49

Page 14: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vowels

14 / 49

Page 15: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vowel shapes

15 / 49

Page 16: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

i: “eee”

16 / 49

Page 17: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

ae “ah”

17 / 49

Page 18: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

a “aaa”

18 / 49

Page 19: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

u: “ooo”

19 / 49

Page 20: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

The vowel quadrilateral

20 / 49

Page 21: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

And formalised

21 / 49

Page 22: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

The engineer’s approach

22 / 49

Page 23: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Petersen-Barney data

I Formant frequencies of 76American English speakers(around 1952).

I First vs. second formantfrequencies are plotted.

I American vowels areseparable in twodimensions.

23 / 49

Page 24: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Petersen-Barney rotated

24 / 49

Page 25: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Italian

I Italian vowels are veryclose to cardinal vowels.

I The mid ones areinterchangeable to anextent with dialectSo in this sense there areonly 5.

25 / 49

Page 26: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Spanish

I Simplest of all EuropeanlanguagesPhonetically speaking!

I Just 5 cardinal vowels.I No need for a quadrilateral

“The Spanish voweltriangle”

26 / 49

Page 27: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

English (British RP)

I English is not likeMediterranean languages.

I Classed as Germanic.I Central vowels.I Un-stressed vowel

“schwa” [@].

27 / 49

Page 28: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

French

I Actually quite close toItalian.

I A single central vowelLike English

I Bonus: extra dimension

28 / 49

Page 29: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Rounding

I Generally:Back vowels are rounded.Front vowels are un-rounded.

I French and German (for example)Front vowels can be rounded or unrounded

I N.B.: Rounding is not the only other dimension.Length, stress, rhoticity.

29 / 49

Page 30: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Vowels are optional sometimes

Said Sjlbvdnzv residentGrg Hmphrs, 67:

With just a fewkey letters, Icould be GeorgeHumphries. Thatis my dream.

– The Onion

30 / 49

Page 31: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Difficulties for non-natives

Non-native speakers find some vowels very tricky.I Particularly close together ones.I Particularly rounded vs. back.

31 / 49

Page 32: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Pitch vs. Peach

English Italian

32 / 49

Page 33: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Dessus vs. Dessous

English French

(I am native English and can’t hear the difference)

33 / 49

Page 34: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Diphthongs

0 20 40 60 80 100 1200

4000

“eight five zero” [eIt faIv zIro]

A diphthong is a vowel thatmutates into a different vowel.

I Cow.I Buy.

Often represented by lines onthe vowel quadrilateral.

34 / 49

Page 35: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Consonants

35 / 49

Page 36: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Resonance

Vowels are about resonances.I The vocal tract is “open”.I The sounds can be held for some time.

By contrast, consonants are about stopping resonance.I The tongue touches part of the vocal tract.I Air-flow is restricted somehow.

36 / 49

Page 37: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Sagittal sections

There are lots of named parts.

37 / 49

Page 38: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Example vocabulary

These are adjectives:

Labial The lips.

Dental The teeth.

Alveolar The ridge behindthe upper teeth.Alveoli are theholes your teeth sitin.

Velar The roof of themouth, or softpalate.

Uvular The dangly thingat the back of yourmouth.

38 / 49

Page 39: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

IPA chart

The IPA is characteristically terse:

39 / 49

Page 40: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Labio-dental

I Upper teeth and lower lip.I [f], [v].

40 / 49

Page 41: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Bilabial

I Both lips.I [p], [b].

41 / 49

Page 42: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Alveolar

I Tongue, and the area justbehind the teeth.

I In English, there are threequite close areas:

I Dental, [T] [D].I Alveolar, [s], [z], [t], [d].I Post alveolar, [S], [Z].

42 / 49

Page 43: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Velar

I The tongue and the roof ofthe mouth, the hardpalette.

I [k], [g].

43 / 49

Page 44: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Voicing

0 20 40 60 80 1000

10000

“zero seven”

Many consonants can be voicedor unvoiced. Consider thedifference between these:

I [p] and [b].I [k] and [g].I [s] and [z].

44 / 49

Page 45: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Nasals

0 20 40 60 80 1000

4000

“oh nine one”

Nasals are actually a lot likevowels.

I In normal vowels, the nasalcavity is closed at theuvula.

I In nasals, the oral cavity isclosed at

I the lips: [m].I the alveolar ridge or

teeth: [n].I the velum: [N].

I In both cases, an open pathexists between larynx andair.

45 / 49

Page 46: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Swedish nasal

46 / 49

Page 47: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Fricatives

0 20 40 60 80 1000

10000

“zero seven” [zIroU sev@n]

Fricatives involve frictionI In English, it’s

(labio-)dental or alveolarI Unvoiced: [f], [T], [s], [S].I Voiced: [v], [D], [z], [Z].

47 / 49

Page 48: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Stops

0 20 40 60 800

10000

“six two” [siks tu:]

Stops are momententarycomplete obstructions

I Unvoiced: [p], [t], [k].I Voiced: [b], [d], [g].I Same articulation points as

the nasals.

48 / 49

Page 49: Speech Signal Processing · I Tongue, lips, glottis, teeth. 3/49. Vocal folds Vocal “chords” are perhaps the most obvious part. Used even in whispered speech. 4/49. Vowels and

Also

Consonants also include:I Semi-vowels

I [w], [j].I Very like vowels, but a little more obstructed.

I LiquidsI [ô], [l].I Again, somewhat closer to vowels, but with some

obstruction.

Notice that the English “r” is [ô]. [r] is a trill.

49 / 49