Adaptive Design of Speech Sound Systems

Randy Diehl

In collaboration with Björn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto

Two observations:

Sound systems of natural languages underexploit the sound producing capabilities of humans.

The sounds that are used in natural languages vary in frequency of occurrence.

Vowels: /a/, /i/, /u/ are common; /š/, /õ/ are rare.

Consonants: /p/, /t/ are common; />/, /q/ are rare.

Why are certain speech sounds favored?

Possibilities:

They are easy to hear (i.e., to distinguish from other sounds).

They are easy to produce.

They are easy to learn.

The role of auditory distinctiveness in the design of vowel inventories

Liljencrants & Lindblom (1972)

Diehl, Lindblom, & Creeger (2002, 2003)

Liljencrants & Lindblom (1972)

Possible vowel sound: Any vowel-like output of a computational model of the human vocal tract (Lindblom & Sundberg, 1971).

Auditory distance: Euclidean distance between any two vowel sounds i and j in a space defined by the frequencies of the first several formants:

$$D_{ij} = \left( (\Delta M_1)^2 + (\Delta M'_2)^2 \right)^{1/2}$$

Selection criterion: For any given inventory size, select those vowels whose pairwise distances, Dij, are maximal.
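The selection criterion above can be sketched in a few lines. This is only an illustration under stated assumptions: "maximal pairwise distances" is made concrete here by minimizing the sum of inverse squared distances, the search is exhaustive, and the candidate coordinates are hypothetical grid points rather than output of the vocal tract model. The names `distance`, `crowding`, and `best_inventory` are invented for this sketch.

```python
import itertools
import math


def distance(v1, v2):
    """Euclidean distance D_ij between two vowels given as (M1, M2') pairs."""
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])


def crowding(inventory):
    """Sum of inverse squared pairwise distances (assumed criterion).

    Smaller values mean the vowels are spread further apart.
    """
    return sum(1.0 / distance(a, b) ** 2
               for a, b in itertools.combinations(inventory, 2))


def best_inventory(candidates, size):
    """Try every inventory of the given size and keep the least crowded one."""
    return min(itertools.combinations(candidates, size), key=crowding)


# Hypothetical candidate vowels in arbitrary units (not real formant data).
candidates = [(0, 0), (0, 2), (0, 4), (1, 1), (1, 3), (2, 2)]
print(best_inventory(candidates, 3))
```

Exhaustive search is only practical for small candidate sets; a larger space of possible vowels would need an iterative optimizer instead.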

Predicted vowel systems (Liljencrants & Lindblom, 1972)

A problem: Too many high vowels

These simulations were unrealistic in at least two ways:

Acoustic distance (based on formant frequencies) is probably not a good proxy for auditory distance.

Vowel sounds do not naturally occur in conditions of total quiet.

Improving the realism of the simulations (Diehl, Lindblom, and Creeger, 2002)

Define a notion of ‘auditory distance’ based on plausible auditory representations of vowel sounds.

Model vowel systems as they would have emerged under natural conditions of background noise.

From acoustic to auditory representations

[Figure: an input vowel spectrum (dB against frequency, 0 to 3000 Hz) is converted from Hz to Bark (0 to 16 Bark) and then passed through auditory filtering, giving an output auditory spectrum in Phons/Bark against Bark.]
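The Hz-to-Bark step can be illustrated with a short conversion function. The slides do not say which formula was used, so this sketch assumes Traunmüller's (1990) approximation, without its corrections for the lowest and highest bands; the function name `hz_to_bark` is invented here.

```python
def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Assumption: Traunmueller's (1990) formula, without the end corrections
    that apply below about 2 Bark and above about 20 Bark.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


# The Bark scale compresses high frequencies: equal steps in Hz
# become smaller steps in Bark as frequency rises.
for f in (500, 1000, 2000, 3000):
    print(f, round(hz_to_bark(f), 2))
```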

Computing distances among auditory spectra

[Figure: auditory spectra plotted as Phons/Bark (0 to 100) against Bark (0 to 16).]

1. At each point along the Bark dimension, calculate the difference in Phons/Bark between any vowel pair.

2. Square these differences.

3. Sum the squares.

4. Take the square root of the sum. This is a measure of the Euclidean auditory distance between two vowels.
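The four steps translate directly into code. In this minimal sketch each auditory spectrum is assumed to be a list of Phons/Bark values sampled at the same Bark points; the sampling grid and the name `auditory_distance` are assumptions, not details given on the slide.

```python
import math


def auditory_distance(spectrum_a, spectrum_b):
    """Euclidean distance between two auditory spectra.

    Both arguments are sequences of Phons/Bark values sampled at the
    same points along the Bark dimension (an assumption of this sketch).
    """
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must be sampled at the same Bark points")
    # Steps 1-3: pointwise differences, squared, then summed.
    sum_of_squares = sum((a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b))
    # Step 4: square root of the sum.
    return math.sqrt(sum_of_squares)


# Two toy spectra with two sample points each (not real data).
print(auditory_distance([0.0, 3.0], [4.0, 0.0]))
```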

Effects of the auditory transform

[Figure: predicted 7-vowel systems, Formant-based System versus Auditory-based System, plotted as F1 (Hz, 200 to 800) against F2 (Hz, 0 to 2500).]

Effects of the auditory transform

[Figure: predicted 9-vowel systems, Formant-based System versus Auditory-based System, plotted as F1 (Hz, 200 to 800) against F2 (Hz, 0 to 2500).]

Effects of the auditory transform

The problem of excessive high vowels is reduced—but not eliminated.

Effects of adding background noise

Hypothesis: Vowel systems have evolved to be perceptually robust even at unfavorable signal/noise ratios.

Method

We used noise whose spectral shape mimicked the long-term average for speech (-6 dB/octave).

We computed auditory distances among vowels at 8 different S/N ratios, ranging from 10 dB to -7.5 dB.

We then averaged these distances to determine the optimal vowel systems.
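The bookkeeping behind this method can be sketched as follows. Several details are assumptions, since the slide does not give them: the 8 S/N ratios are taken to be evenly spaced (which gives 2.5 dB steps), the noise slope is anchored at a hypothetical 1000 Hz reference level, and `distance_at_snr` stands in for whatever procedure yields an auditory distance at a given S/N ratio.

```python
import math


def snr_levels(high_db=10.0, low_db=-7.5, count=8):
    """Evenly spaced S/N ratios from high_db down to low_db (assumed spacing)."""
    step = (high_db - low_db) / (count - 1)
    return [high_db - k * step for k in range(count)]


def noise_level_db(freq_hz, ref_hz=1000.0, ref_db=0.0, slope_db_per_octave=-6.0):
    """Noise level at freq_hz for a spectrum falling by 6 dB per octave.

    ref_hz and ref_db are hypothetical anchor values for this sketch.
    """
    return ref_db + slope_db_per_octave * math.log2(freq_hz / ref_hz)


def mean_distance(distance_at_snr, levels):
    """Average a vowel pair's auditory distance over all S/N ratios."""
    return sum(distance_at_snr(snr) for snr in levels) / len(levels)


print(snr_levels())
print(noise_level_db(2000.0))
```

Averaging over several S/N ratios, rather than optimizing at a single one, favors vowel systems that stay distinct across a range of listening conditions.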

Effects of adding background noise (3 vowel system)

[Figure: predicted 3-vowel systems in Quiet and in Noise, each plotted as F1 (Hz, 200 to 800) against F2 (Hz, 0 to 2500).]

Effects of adding background noise (5 vowel system)

[Figure: predicted 5-vowel systems in Quiet and in Noise, each plotted as F1 (Hz, 200 to 800) against F2 (Hz, 0 to 2500).]

Effects of adding background noise (7 vowel system)

[Figure: predicted 7-vowel systems in Quiet and in Noise, each plotted as F1 (Hz, 200 to 800) against F2 (Hz, 0 to 2500).]

Effects of adding background noise (9 vowel system)

[Figure: predicted 9-vowel systems in Quiet and in Noise, each plotted as F1 (Hz, 200 to 800) against F2 (Hz, 0 to 2500).]

Comparisons with actual vowel inventories

The reduction in the number of high vowels (relative to the Liljencrants & Lindblom simulations) yields a much better fit with actual vowel systems.

Some fronting/unrounding of the high, back vowel /u/ also appears to be common among the world’s languages (e.g., Japanese and many other 5-vowel systems, American English).

Why does background noise reduce the number of high vowels?

[Figure: two spectra plotted as dB per Bark (0 to 90) against Frequency (0 to 3500 Hz).]

First formant information tends to be more noise-resistant than higher formant information.

This warps the auditory-distance space for vowels: the front-back dimension contracts relative to the open-close dimension.

This, in turn, leaves less room for high vowels.
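A toy calculation makes the last step concrete. Assume, purely for illustration, that the vowel space is a unit square, that noise scales the front-back dimension by a weight between 0 and 1, and that neighboring vowels must be at least some minimum auditory distance apart. The function `high_vowel_slots` and all numbers are hypothetical.

```python
import math


def high_vowel_slots(front_back_weight, min_distance):
    """Count vowels that fit along the high (close) edge of a toy vowel space.

    The edge has length 1.0 in quiet; noise is modeled, as an assumption,
    by multiplying the front-back dimension by front_back_weight.
    """
    edge_length = 1.0 * front_back_weight
    # Small tolerance guards against floating-point rounding in the division.
    return math.floor(edge_length / min_distance + 1e-9) + 1


print(high_vowel_slots(1.0, 0.25))  # uncontracted front-back dimension
print(high_vowel_slots(0.5, 0.25))  # front-back dimension contracted by half
```

The open-close dimension is left unscaled in this toy model, so only the number of vowels that fit along the front-back edge shrinks.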

More recent modeling (Diehl, Lindblom, and Creeger, 2003)

When we further improve the realism of our auditory model by incorporating temporal (phase-locking) information as well as spectral (excitation-pattern) information, the predicted vowel systems match observed systems fairly closely, even without background noise.

Preferred vowel inventories are reasonably well predicted on the basis of a principle of maximal auditory contrast.

What about preferred consonant inventories?

Voice distinctions

Many languages distinguish certain consonants (e.g., /b/ vs. /p/, /d/ vs. /t/) on the basis of differences in voice onset time (VOT): the interval between the opening of the vocal tract and the onset of vocal fold vibration (voicing).

Voice categories across languages (Lisker & Abramson, 1964)

Why do languages select from these three categories of VOT?

One possibility: aerodynamic and biomechanical factors.

Another possibility: enhanced discriminability at -20 ms VOT and +20 ms VOT yields robust perceptual distinctions between the three categories.

Evidence: human infants, chinchillas, nonspeech analogs of VOT
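If the regions of enhanced discriminability at -20 ms and +20 ms act as category boundaries, a VOT value can be assigned to one of the three categories as sketched below. The category labels ("lead", "short lag", "long lag") follow common usage in the VOT literature and are not named on the slide; the treatment of values that fall exactly on a boundary is an arbitrary choice of this sketch.

```python
def vot_category(vot_ms, lower_boundary_ms=-20.0, upper_boundary_ms=20.0):
    """Assign a voice onset time (in ms) to one of three voice categories.

    Negative VOT means voicing starts before the vocal tract opens.
    Boundary values default to the -20 ms and +20 ms regions of
    enhanced discriminability; labels are assumed, not from the slides.
    """
    if vot_ms < lower_boundary_ms:
        return "lead"
    if vot_ms <= upper_boundary_ms:
        return "short lag"
    return "long lag"


for vot in (-50, 5, 60):
    print(vot, vot_category(vot))
```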

Voice onset time and tone onset time

[Figure: panels A and B, schematic plots of Frequency against Time (ms), labeled -50 ms and +50 ms.]

Discriminability of TOT stimuli

[Figure: Percent Correct Discrimination (0 to 100) for each TOT Stimulus Pair (ms): -50 vs. -20, -40 vs. -10, -30 vs. 0, -20 vs. 10, -10 vs. 20, 0 vs. 30, 10 vs. 40, 20 vs. 50.]

Are TOT categories that are consistent with the natural boundaries more learnable? (Holt, Lotto, and Diehl, JASA, 2004)

[Figure, first panel pair: Relative Frequency (0 to 8) of training stimuli against TOT (-60 to 120 ms) for the Inconsistent and Consistent conditions; Average Blocks to Criterion (0 to 20) by Condition (Consistent, Inconsistent), with values 3.36 and 8.21.]

[Figure, second panel pair: Relative Frequency (0 to 8) of training stimuli against TOT (-60 to 120 ms) for the Inconsistent and Consistent conditions; Average Blocks to Criterion (0 to 20) by Condition (Consistent, Inconsistent), with values 6.57 and 16.86.]

Summary of VOT results:

Preferred voice categories are more discriminable than other possible voice categories.

The results of Holt, Lotto, and Diehl (2004) suggest that they are also more learnable.

Conclusion

Cross-language preferences in speech sound systems appear to reflect performance constraints on talkers, listeners, and language learners.

Unsolved problems

Measuring articulatory energy costs

Weighting contributions of auditory distinctiveness, least effort, and learnability

Predicting variability