Morphological information and acoustic salience in Dutch compounds Victor Kuperman, IWTS Radboud University Nijmegen


Page 1: Morphological information  and acoustic salience in Dutch compounds

Morphological information and acoustic salience in Dutch compounds

Victor Kuperman, IWTS

Radboud University Nijmegen

Page 2: Morphological information  and acoustic salience in Dutch compounds

Introduction

Kuperman, Pluymaekers, Ernestus, Baayen (in preparation)

Goal:

Role of morphological structure in modulating fine phonetic detail in speech production.

Object:

Interfixes in Dutch compounds.

Page 3: Morphological information  and acoustic salience in Dutch compounds

Background: Theoretical Framework

• Economy of articulatory effort versus discriminability of the speech signal (H&H Theory, Lindblom 1990);

• Distribution of acoustic salience over an utterance depends on the distribution of information;

• Less predictable (more informative) elements are more salient;

• More predictable (less informative) elements are more reduced.

Page 4: Morphological information  and acoustic salience in Dutch compounds

Background: Theoretical Framework

Information transmission is optimal when information is distributed equally (per time unit) throughout the signal.

Important elements need longer or more careful transmission: less likely to be lost to noise.

Acoustic duration smoothes the amount of information in the signal over time (Aylett and Turk, 2004).
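The smoothing idea can be made concrete with a deliberately idealized sketch. It assumes that the information carried by an element is its surprisal, -log2 of its probability in context, and asks how long each element would have to last for information per unit time to stay constant. The probabilities and the rate of 20 bits per second are invented for illustration and are not taken from the study.

```python
import math

def surprisal(p):
    """Information content, in bits, of an element with probability p."""
    return -math.log2(p)

def durations_for_constant_rate(probabilities, bits_per_second):
    """Duration (in seconds) each element needs so that every element
    transmits information at the same rate."""
    return [surprisal(p) / bits_per_second for p in probabilities]

# Hypothetical contextual probabilities of three successive elements.
probs = [0.5, 0.25, 0.125]

# Hypothetical target rate; any positive value gives the same ordering.
durations = durations_for_constant_rate(probs, bits_per_second=20.0)
print(durations)
```

Under these assumptions the least probable element receives the longest duration, which is the pattern the smoothing account predicts.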

Page 5: Morphological information  and acoustic salience in Dutch compounds

Background: Theoretical Framework

Research on reduction in a large variety of language domains: syntactic, discourse-related, phonological and prosodic, and lexical.

Attested types of reduction:

• mostly, durational shortening of phonemes and syllables;

• deletion of phonemes and syllables (Ernestus, 2000; Johnson, 2004; Jurafsky et al., 2001);

• decrease in the spectral centre of gravity (Van Son and Pols, 2003);

• decrease in the mean amplitude (Shields and Balota, 1991);

• lesser degree of centralization of vowels (Wright, 1997);

• higher degree of coarticulation (Scarborough, 2004).

Page 6: Morphological information  and acoustic salience in Dutch compounds

Aims

We test whether acoustic duration of interfixes contributes to smoothing of morphological information over the signal.

The information-theoretical approach to acoustic salience was validated against two datasets with interfixed compounds. Control variables range from morphological to phonological to lexical tiers.

Page 7: Morphological information  and acoustic salience in Dutch compounds

Background: Morphological Predictability

Interfixes in Dutch compounds:

Interfix -s-: oorlog-s-verklaring

Interfix -e(n)-: dier-en-arts

No interfix (zero): oog-arts

Selection of the interfix is not predictable by deterministic rules, but depends on morphological families of the left/right constituents of the compound (Krott et al., 2001).

Page 8: Morphological information  and acoustic salience in Dutch compounds

Background: Morphological Predictability

Left/Right Constituent Families: sets of compounds that share the left/right constituent with the target.

Left constituent family of the compound “appartement-en-complex”:

• appartement-en-complex

• appartement-en-gebouw

• appartement-s-gebouw

Page 9: Morphological information  and acoustic salience in Dutch compounds

Background: Morphological Predictability

Selection of an interfix in a compound is biased towards:

• the most frequent interfix in the Left constituent family;

• the most frequent interfix in the Right constituent family (to a lesser extent).

Page 10: Morphological information  and acoustic salience in Dutch compounds

Methodology: Acoustic Materials

Two datasets collected from the Read Speech component of the Spoken Dutch Corpus:

• 1155 tokens with the interfix -s-. Excluded environments: [s], [z], [ʃ].

• 742 tokens with the interfix -e(n)-. Excluded environments: [n], [m].

Interfixes were manually transcribed by two phoneticians. Acoustic durations for each segment in the datasets were obtained with the help of an HMM-based ASR system that uses the HTK software package.

Page 11: Morphological information  and acoustic salience in Dutch compounds

Methodology: Variables

Dependent variable: (log-transformed) acoustic duration of the interfix

Independent variable: the bias of the left constituent family towards -s- (SBias) or -en- (EnBias), for the respective datasets.

Example left constituent family:

• appartement-s-gebouw

• appartement-en-gebouw

• appartement-en-complex

SBias in this left constituent family is: 1/(1+2) = 0.33.

EnBias in this left constituent family is: 2/(1+2) = 0.67.
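The bias computation can be sketched as follows. This is a minimal illustration that assumes the bias is a proportion of compound types in the family, as the example on this slide suggests. The function name `interfix_bias` and the triple representation are hypothetical and not from the study.

```python
from collections import Counter

def interfix_bias(family, interfix):
    """Share of compound types in a constituent family that take `interfix`."""
    if not family:
        raise ValueError("family must contain at least one compound")
    counts = Counter(linker for _, linker, _ in family)
    return counts[interfix] / len(family)

# Left constituent family of "appartement" from the slide, written as
# (left constituent, interfix, right constituent) triples.
family = [
    ("appartement", "s", "gebouw"),
    ("appartement", "en", "gebouw"),
    ("appartement", "en", "complex"),
]

s_bias = interfix_bias(family, "s")    # 1 / (1 + 2)
en_bias = interfix_bias(family, "en")  # 2 / (1 + 2)
print(round(s_bias, 2), round(en_bias, 2))
```

With only two interfixes attested in the family, the two biases sum to 1, so a high SBias necessarily means a low EnBias for the same left constituent.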

Page 12: Morphological information  and acoustic salience in Dutch compounds

Methodology: Control Variables

Morphological variables:

Positional entropy of the constituent families:

• Number of members in the family;

• Average information load of the family

Page 13: Morphological information  and acoustic salience in Dutch compounds

Methodology: Control Variables

• Compound word frequency; constituent frequencies

• Frequency of word co-occurrence with its neighbors

• Segmental lexical information (Van Son and Pols, 2003)

• Speech rate

• Number of segments after the interfix

• Position in the utterance (initial/final)

• Presence of [n] in the interfix (for -e(n)- dataset)

• Phoneme identity: [s] vs. [z] (for -s- dataset)

• FollowedByStop (for -e(n)- dataset)

• Stress on the interfix syllable (for -s- dataset)

• Stress clash

• Speaker’s sex, language, age.

Page 14: Morphological information  and acoustic salience in Dutch compounds

Results: /s/-dataset

SBias 0.345 ***

RightPositionalEntropy 0.068 ***

SBias*RightPosEntropy -0.069 ***

WordFrequency 0.010 *

Lexical_Information 0.117 ***

PhonemeZ -0.156 ***

SpeechRate -0.511 ***

Stress -0.087 ***

R² = 0.104

Unique contribution of morpholexical factors = 2.0%

Page 15: Morphological information  and acoustic salience in Dutch compounds

Discussion: /s/-dataset

SBias 0.345 ***

RightPositionalEntropy 0.068 ***

SBias*RightPosEntropy -0.069 ***

WordFrequency 0.010 *

The direction of these effects runs counter to the predictions of information theory. High values of these predictors imply a high likelihood of the interfix, so the acoustic duration of the interfix should be reduced, not lengthened.

Page 16: Morphological information  and acoustic salience in Dutch compounds

Results: /en/-dataset

EnBias 0.140 ***

RightPositionalEntropy 0.082 ***

Lexical_Information 0.070 ***

PresenceN 0.707 ***

SpeechRate -0.036 ***

FollowedByStop 0.234 ***

R² = 0.720

Unique contribution of morpholexical factors = 2.3%

Again, higher probability (less information) is associated here with acoustic lengthening, rather than reduction.

Page 17: Morphological information  and acoustic salience in Dutch compounds

General Discussion

Fine phonetic detail is governed by two orthogonal dimensions:

• syntagmatic, and

• paradigmatic.

Syntagmatic perspective: Information-theoretical approaches consider information as a probability of an element in the context of its phonetic, lexical or syntactic neighbors.

The syntagmatic measures presume that the elements and their sequence in a produced unit (syllable, word, clause) are known with certainty.

Page 18: Morphological information  and acoustic salience in Dutch compounds

General Discussion: Syntagmatic Perspective

Example:

Segmental lexical information load (Van Son and Pols, 2003): the contribution of a segment to word disambiguation given the preceding word fragment.

Target word: boo-k

$$\frac{\text{Frequency}(\text{boo-k}\ldots)}{\text{Frequency}(\text{boo-k},\ \text{boo-t},\ \text{boo-ze}\ldots)}$$

Less probable = better disambiguation = longer realization.
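A minimal sketch of this measure follows. It assumes that the information load is the negative base-2 logarithm of the frequency ratio shown on the slide, which is the usual information-theoretic convention; the slide itself shows only the ratio. The frequency counts are invented for illustration, and orthographic strings stand in for phonemic transcriptions.

```python
import math

def segmental_information(lexicon, prefix, segment):
    """-log2 of the share of the frequency mass of words starting with
    `prefix` that continue with `segment`."""
    cohort = sum(f for w, f in lexicon.items() if w.startswith(prefix))
    target = sum(f for w, f in lexicon.items() if w.startswith(prefix + segment))
    if cohort == 0 or target == 0:
        raise ValueError("prefix or continuation not attested in lexicon")
    return -math.log2(target / cohort)

# Hypothetical word frequencies, invented for illustration only.
toy_lexicon = {"book": 60, "books": 20, "boot": 15, "booze": 5}

info_k = segmental_information(toy_lexicon, "boo", "k")  # frequent continuation
info_z = segmental_information(toy_lexicon, "boo", "z")  # rare continuation
print(info_k < info_z)
```

In this toy lexicon the rare continuation carries more information than the frequent one, so the syntagmatic account would predict a longer realization for it.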

Page 19: Morphological information  and acoustic salience in Dutch compounds

General Discussion: Paradigmatic perspective

Selection of the interfix is a pocket of indeterminacy: choice is probabilistic, not deterministic.

The indeterminacy is resolved by paradigmatic support in constituent families. Morpholexical variables determine the strength of the support for available alternatives for the speaker.

Greater support from paradigmatics implies a more confident selection and more salient acoustic realization. Lack of support leads to acoustic reduction.

Page 20: Morphological information  and acoustic salience in Dutch compounds

Conclusions

• Morphological information does affect the acoustic duration of interfixes, but in the direction opposite to that predicted by information theory.

• Interfixes in Dutch compounds form pockets of indeterminacy where the selection is driven by the power of paradigmatic support.

• The maximally likely alternative (one with the most support) is realized with greater acoustic salience and is not reduced.

Page 21: Morphological information  and acoustic salience in Dutch compounds
Page 22: Morphological information  and acoustic salience in Dutch compounds

Methodology: Control Variables

Morphological variables:

• Positional entropy of the constituent families:

• Number of members in the family;

• Average information load of the family

$$H = -\sum_{x} p(x) \log_2 p(x)$$

The left family frequency of appartement is 8 + 5 = 13.

The relative frequencies of family members in this family are:

8/13 = 0.62 for appartementsgebouw,

5/13 = 0.38 for appartementengebouw.

The left positional entropy of appartementengebouw equals

$$-(0.62 \log_2 0.62 + 0.38 \log_2 0.38) = 0.96 \text{ bit}$$
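The worked example above can be reproduced with a short sketch. The function name `positional_entropy` is hypothetical; the frequencies 8 and 5 are the ones given on this slide.

```python
import math

def positional_entropy(frequencies):
    """Shannon entropy, in bits, of the relative frequencies in a family."""
    freqs = [f for f in frequencies if f > 0]
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs)

# Frequencies of the two family members from the slide.
family_freqs = {"appartementsgebouw": 8, "appartementengebouw": 5}

h = positional_entropy(family_freqs.values())
print(f"{h:.2f} bit")
```

Entropy is highest when family members are equally frequent and drops to zero when a single member accounts for all of the family frequency.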