AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY

R.J.J.H. van Son, Barbertje M. Streefkerk, and Louis C.W. Pols

Institute of Phonetic Sciences / ACLC
University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands
tel: +31 20 5252183; fax: +31 20 5252197
email: [email protected]

ICSLP2000, Beijing, China, Oct. 20, 2000

INTRODUCTION

• Speech is "efficient":
  – Important components are emphasized
  – Less important ones are de-emphasized

• Two mechanisms:
  1) Prosody: Lexical Stress and Sentence Accent (Prominence)
  2) Predictability: Frequency of Occurrence (tested) and Context (not tested)

MECHANISMS FOR EFFICIENT SPEECH

Speech emphasis should mirror importance, which largely corresponds to unpredictability

• Prosodic structure distributes emphasis according to importance (lexical stress, sentence accent / prominence)

• Speakers can (de-)emphasize according to supposed (un)importance

• Speech production mechanisms can facilitate redundant speech or hamper unpredictable speech

QUESTIONS

• Can the distribution of emphasis or reduction be completely explained from Prosody? (Lexical Stress and Sentence Accent / Prominence)

• If not, can we identify a speech production mechanism that would assist efficiency in speech?
  e.g. preprogrammed articulation of redundant and / or high-frequency syllable-like segments?

SPEECH MATERIAL (DUTCH)

• Single Male Speaker: Vowels and Consonants
  Matched Informal and Read speech, 791 matched VCV pairs
• Polyphone: Vowels only
  273 speakers (out of 5000), telephone speech, 1244 read sentences
  Segmented with a modified HMM recognizer (Xue Wang)

• Corpora sizes: Number of realizations of vowels and consonants

                            Unstressed      Stressed
Corpus             Accent:  –       +       –       +       Total
Single speaker consonants   550     180     569     283     1582
Single speaker vowels       812     461     528     224     2025
Polyphone vowels            4435    4942    9603    3516    22496

• Accent: Sentence accent / Prominence
• Stressed/Unstressed: Lexical stress

METHODS: SPEECH PREPARATION

• Single speaker corpus
  – All 2 x 791 VCV segments hand-labeled
  – Also sentence accent determined by hand
  – 22 native listeners identified consonants from this corpus

• Polyphone corpus
  – Automatically labeled using a pronunciation lexicon and a modified HMM recognizer
  – 10 judges marked prominent words (prominence 1-10)

• Word and Syllable -log2(Frequencies) for both corpora were determined from Dutch CELEX
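
The -log2(Frequency) measure can be illustrated with a minimal sketch. It assumes that "frequency" means relative frequency (count divided by total count); the slides do not say how the CELEX counts were normalized. The function name `neg_log2_frequency` and the toy counts are hypothetical and are not taken from CELEX.

```python
import math


def neg_log2_frequency(counts):
    """Map each item to -log2 of its relative frequency.

    Assumption: relative frequency = count / sum of all counts.
    Rare items get high values, frequent items get low values.
    """
    total = sum(counts.values())
    return {item: -math.log2(n / total) for item, n in counts.items()}


# Hypothetical toy counts for illustration only (NOT CELEX data).
toy_counts = {"de": 8, "van": 4, "spraak": 2, "klinker": 1, "medeklinker": 1}
info_bits = neg_log2_frequency(toy_counts)

for word, bits in sorted(info_bits.items(), key=lambda kv: kv[1]):
    print(f"{word:12s} {bits:.1f} bits")
```

On this scale a halving of the frequency adds one bit, so a positive correlation with duration means that rarer items tend to be longer.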

METHODS: ANALYSIS
Single Speaker Corpus

Consonants and Vowels

• Duration in ms (vowels and consonants)

• Contrast (vowels only)
  F1 / F2 distance to (300, 1450) Hz in semitones

• Spectral Center of Gravity (CoG) (V and C)
  Weighted mean frequency in semitones at point of maximum energy

• Log2(Perplexity) from consonant identification
  Calculated from confusion matrices
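
Two of the measures above can be sketched in a few lines. The sketch rests on assumptions that the slides do not confirm: the contrast is taken as the Euclidean distance in a semitone-scaled F1/F2 plane, and log2(perplexity) is taken as the entropy (in bits) of the listener responses to one stimulus, that is, one row of a confusion matrix. All function names are hypothetical.

```python
import math


def semitones(f_hz, ref_hz):
    """Interval between f_hz and ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def vowel_contrast(f1_hz, f2_hz, center=(300.0, 1450.0)):
    """Distance of (F1, F2) to the reference point, in semitones.

    Assumption: Euclidean distance after converting each formant
    to semitones relative to the reference point (300, 1450) Hz.
    """
    return math.hypot(semitones(f1_hz, center[0]), semitones(f2_hz, center[1]))


def log2_perplexity(response_counts):
    """Entropy in bits of one confusion-matrix row.

    Perplexity is 2 ** entropy, so log2(perplexity) equals the entropy.
    Assumption: response_counts holds how often each response was
    given to a single stimulus consonant.
    """
    total = sum(response_counts)
    entropy = 0.0
    for n in response_counts:
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# A vowel one octave above the reference F1, with reference F2.
print(vowel_contrast(600.0, 1450.0))
# 22 listeners split evenly over two responses.
print(log2_perplexity([11, 11]))
```

A value of 0 bits means that all listeners gave the same response; larger values mean more confusion.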

METHODS: ANALYSIS
Polyphone Corpus

Vowels only

• Loudness
  in sone

• Spectral Center of Gravity (CoG)
  Weighted mean frequency in semitones averaged over the segment

• Prominence (1-10)
  The number of 'PROMINENT' listener judgements
  0 – 5 is considered Unaccented
  6 – 10 is considered Accented
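
The Center of Gravity and the prominence threshold can be sketched as follows. Several details are assumptions, since the slides do not give them: the spectrum is weighted by power, the semitone reference is 1 Hz, and the segment average is taken over per-frame CoG values in semitones. All names are hypothetical; only the 0 – 5 / 6 – 10 split comes from the slide.

```python
import math


def center_of_gravity_hz(freqs_hz, powers):
    """Power-weighted mean frequency of one spectral frame.

    Assumption: weighting by power; the slides only say 'weighted mean'.
    """
    total = sum(powers)
    return sum(f * p for f, p in zip(freqs_hz, powers)) / total


def hz_to_semitones(f_hz, ref_hz=1.0):
    """Convert Hz to semitones. The 1 Hz reference is an assumption."""
    return 12.0 * math.log2(f_hz / ref_hz)


def segment_cog_semitones(freqs_hz, frames):
    """Mean CoG in semitones over all frames of a segment."""
    values = [hz_to_semitones(center_of_gravity_hz(freqs_hz, frame))
              for frame in frames]
    return sum(values) / len(values)


def is_accented(prominence_votes):
    """Slide rule: 0 to 5 'prominent' votes is Unaccented, 6 to 10 is Accented."""
    return prominence_votes > 5


# Toy spectrum with two bins of equal power: the CoG lies halfway.
print(center_of_gravity_hz([1000.0, 3000.0], [1.0, 1.0]))
print(is_accented(5), is_accented(6))
```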

CONSISTENCY OF MEASUREMENTS
Correlation coefficients between factors

• Duration in ms
• Loudness in sones
• CoG: Spectral Center of Gravity (semitones)
• Px: log2(Perplexity), plotted is –R
• Contrast: F1/F2 distance to (300, 1450) Hz (semitones)

[Figure: correlation coefficient R (axis 0 to 0.7) for Unstressed and Stressed segments, each without (–) and with (+) sentence accent, and for the Total.
Single speaker, Consonants (n=1582): Duration x CoG, Duration x Px, CoG x Px.
Single speaker, Vowels (n=2025): Duration x Contrast, Duration x CoG, Contrast x CoG.
Polyphone (n=22496): Loudness x CoG.
Filled symbols: p <= 0.01]

CONSONANT REDUCTION VERSUS FREQUENCY OF OCCURRENCE
(correlation coefficients)

Single speaker corpus (n=1582)

• CoG: Spectral Center of Gravity (semitones)
• Perplexity: log2(Perplexity), plotted is –R
• Syllable and word frequencies were correlated (R=0.230, p=0.01)

[Figure: correlation coefficient R (axis 0 to 0.35) of Duration, CoG and Perplexity with Syllable frequencies and with Word frequencies, for Unstressed and Stressed segments, each without (–) and with (+) sentence accent, and for the Total. Filled symbols: p <= 0.01]

VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE
(correlation coefficients)

Single speaker corpus (n=2025)

• Duration in ms
• Contrast: F1/F2 distance to (300, 1450) Hz (semitones)
• CoG: Spectral Center of Gravity (semitones)
• Syllable and word frequencies were correlated (R=0.280, p<=0.01)

[Figure: correlation coefficient R (axis 0 to 0.35) of Duration, CoG and Contrast with Syllable frequencies and with Word frequencies, for Unstressed and Stressed segments, each without (–) and with (+) sentence accent, and for the Total. Filled symbols: p <= 0.01]

DISCUSSION OF SINGLE SPEAKER DATA

• There are consistent correlations between frequency of occurrence and “acoustic reduction” (duration, CoG and contrast), but not for consonant identification (perplexity)

• Correlations for syllable frequencies tend to be larger than those for word frequencies (p <= 0.01)

• Correlations were found after accounting for Phoneme identity, Lexical Stress and Sentence Accent
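
The slides do not state how phoneme identity, stress and accent were accounted for. One common way, shown here purely as an illustrative assumption, is to subtract from every measurement the mean of its (phoneme, stress, accent) cell and to correlate the residuals. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def center_within_cells(values, cells):
    """Subtract from each value the mean of the cell it belongs to."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, cells):
        sums[c] += v
        counts[c] += 1
    return [v - sums[c] / counts[c] for v, c in zip(values, cells)]


def pearson_r(x, y):
    """Plain Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def within_cell_r(x, y, cells):
    """Correlation of x and y after removing the cell means from both."""
    return pearson_r(center_within_cells(x, cells),
                     center_within_cells(y, cells))


# Toy data: two cells with very different means but the same trend inside.
cells = [("a", "+", "+")] * 3 + [("t", "-", "-")] * 3
neg_log2_freq = [1.0, 2.0, 3.0, 101.0, 102.0, 103.0]
duration_ms = [10.0, 20.0, 30.0, -10.0, 0.0, 10.0]

print(pearson_r(neg_log2_freq, duration_ms))             # dominated by the cell offsets
print(within_cell_r(neg_log2_freq, duration_ms, cells))  # trend inside the cells
```

Removing the cell means also uses up degrees of freedom, so a significance test on the residual correlation would need to take the number of cells into account.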

PROMINENCE VERSUS VOWEL REDUCTION AND FREQUENCY OF OCCURRENCE
(correlation coefficients)

Polyphone corpus (n=22496)

• Loudness (sone)
• CoG: Spectral Center of Gravity (semitones)
• Syllable and word frequencies (-log2(freq))

[Figure: correlation coefficient R (axis 0 to 0.5) of prominence with Loudness, CoG, Syllable freq. and Word freq., plotted against lexical stress (–, +, Total). Filled symbols: p <= 0.01]

VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE
(correlation coefficients)

Polyphone corpus (n=22496)

• Loudness (sone)
• CoG: Spectral Center of Gravity (semitones)
• Syllable and word frequencies were correlated (R=0.316, p<=0.01)
• Accent: + is Prom > 5, – is Prom <= 5

[Figure: correlation coefficient R (axis 0 to 0.10) of Loudness and CoG with Syllable frequencies and with Word frequencies, for Unstressed and Stressed vowels, each without (–) and with (+) accent, and for the Total. Filled symbols: p <= 0.01]

DISCUSSION OF POLYPHONE DATA

• Perceived prominence correlates with “acoustic vowel reduction” (loudness, CoG) and frequency of occurrence (syllable and word)

• There are small but consistent correlations between “acoustic vowel reduction” and frequency of occurrence

• Correlations were found after accounting for Vowel identity, Lexical Stress and Prominence

CONCLUSIONS

• LEXICAL STRESS and SENTENCE ACCENT / PROMINENCE cannot explain all of the “efficiency” of speech: FREQUENCY OF OCCURRENCE and possibly CONTEXT in general are needed for a full account

• A SYLLABARY which speeds up (and reduces) the articulation of “stored”, high-frequency, syllables with respect to “computed”, rare, syllables might explain at least part of our data

SPOKEN LANGUAGE CORPUS
How Efficient is Speech

• 8-10 speakers: ~60 minutes of speech each (fixed and variable materials)

• Informal story telling and retold stories ~15 min
• Reading continuous texts ~15 min
• Reading isolated (pseudo-) sentences ~20 min
• Word lists ~5 min
• Syllable lists ~5 min

MEASURING SPEECH EFFICIENCY

• Speaking Style differences
  (Informal, Retold, Read, Sentences, Lists)

• Predictability
  – Frequency of Occurrence (words and syllables)
  – In Context (language models)
  – Cloze-tests
  – Shadowing (RT or delay)

• Acoustic Reduction
  – Segment identification
  – Duration
  – Spectral reduction