
1

VOT is necessary but not sufficient for describing the voicing contrast in Japanese

Eun Jong Kong*, Mary E. Beckman*, Jan Edwards†

(*Ohio State University, †Univ. of Wisconsin at Madison)

LSA 2009 January 10

2

Introduction

Since the seminal work of Lisker and Abramson (1964), Voice Onset Time (VOT) has been used as the primary measure for comparing word-initial stop voicing and aspiration contrasts across languages.

e.g.,

• Spanish: /d/ vs. /t/ (lead VOT vs. short lag VOT)

• Cantonese: /t/ vs. /tʰ/ (short lag VOT vs. long lag VOT)

• English: /d/ vs. /t/ (lead or short lag VOT vs. long lag VOT)

Figure 1. Voice onset time distributions of apical (dental and alveolar) stops in two-category languages. Panels: Spanish (/d/, /t/), Cantonese (/t/, /tʰ/), English (/d/, /t/); x-axis: voice onset time (msec), with VOT = 0 marked; y-axis: frequency. Taken from Lisker & Abramson (1964).

3

Introduction

VOT has also been a useful acoustic measure for describing children’s mastery of word-initial stops in languages with voicing and/or aspiration contrasts.

e.g., Thai (Gandour et al. 1986)

- stops with a three-way contrast: /d/ vs. /t/ vs. /tʰ/

- lead VOT is mastered later than short lag VOT or long lag VOT

Figure 2. VOT distributions of alveolar stops in Thai (/d/, /t/, /tʰ/) for 3-year-olds, 5-year-olds, and 7-year-olds. Taken from Gandour et al. (1986).

4

Introduction

Is VOT the whole story?

Japanese stops and VOT

Two-way voicing contrast (Homma 1980, Shimizu 1989)

- voiced stops: not only lead VOT, but also short lag VOT (Takada 2004)

- voiceless stops: neither clearly short lag nor clearly long lag, but intermediate between the two (Riney et al. 2007)

This results in overlap in VOT range between the two categories.

Is there another acoustic measure that helps to disambiguate?

5

Goal of the study

To evaluate whether VOT is a sufficient acoustic measure for distinguishing voiced stops from voiceless stops in Japanese, we investigate

- how the acoustic parameter of VOT relates to native speaker/transcriber judgments of accuracy for voiced and voiceless stop consonants in English- and Japanese-acquiring children.

- whether another acoustic parameter is also needed to predict native speaker/transcriber judgments of these productions.

6

Research questions

Children’s stop productions were analyzed to address the following questions.

Question 1) Are there differences between the time-courses for mastering the stop voicing contrasts in English and Japanese?

Method: judgments by trained native speaker/phoneticians, logistic regression.

Question 2) How well does the single acoustic dimension of VOT predict the native speaker/transcriber’s judgments of voiced vs. voiceless stops produced by English- and Japanese-acquiring children?

Question 3) Is there another acoustic dimension that improves the prediction of the native speaker/transcriber’s judgments of the voicing contrast in stops produced by these children?

Method: acoustic analysis, logistic regression.

7

Data collection

1) Production data come from the project

- cross-language investigation of phonological development

www.ling.ohio-state.edu/~edwards/

2) Subjects

- 51 children (2;0-6;0) and 20 adults (18;0-30;0), recorded in Tokyo

- 50 children (2;0-6;0) and 15 adults (18;0-30;0), recorded in Ohio

3) Materials: word-initial pre-vocalic lingual stops, e.g.,

- Japanese: /d/ daikon ‘radish’ vs. /t/ tamago ‘egg’

- English: /d/ dove vs. /t/ tongue

(velar stops were also recorded but are not discussed here)

8

tamago ‘egg’

9

daikon ‘radish’

10

Correct Voicing, Voicing Error

12

Analysis 1: Transcription

Question 1) Are there differences between the time-courses for mastering the stop voicing contrasts in English and Japanese?

Measure: voicing accuracy from transcriptions by a trained phonetician who is a native speaker of English/Japanese.

- voicing correct: /t/ → [t], /d/ → [d], /d/ → [g], /t/ → [k]

- voicing error: /t/ → [d], /d/ → [t], /t/ → [n]

Criterion for mastery: 75% voicing accuracy (adapted from criteria used in norming studies such as Smit et al., 1990).
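The scoring rule and mastery criterion above can be sketched in a few lines of Python. This is only an illustration: the function names, the symbol sets, and the choice to treat the criterion as "at least 75%" are assumptions, not the coding scheme actually used in the study.

```python
# Assumed symbol inventory for illustration; the study's transcriptions
# may use a larger set.
VOICELESS = {"t", "k"}
VOICED = {"d", "g", "n"}  # [n] is voiced, so /t/ -> [n] is a voicing error


def is_voiced(symbol: str) -> bool:
    """Return the voicing class of a transcribed or target consonant."""
    if symbol in VOICED:
        return True
    if symbol in VOICELESS:
        return False
    raise ValueError(f"unknown consonant symbol: {symbol!r}")


def voicing_correct(target: str, transcribed: str) -> bool:
    """True when target and transcription agree in voicing, ignoring place."""
    return is_voiced(target) == is_voiced(transcribed)


def voicing_accuracy(tokens: list[tuple[str, str]]) -> float:
    """Proportion of (target, transcribed) pairs scored as voicing correct."""
    return sum(voicing_correct(t, s) for t, s in tokens) / len(tokens)


def meets_criterion(tokens: list[tuple[str, str]], threshold: float = 0.75) -> bool:
    """Assumed reading of the criterion: accuracy of at least 75%."""
    return voicing_accuracy(tokens) >= threshold
```

Under this rule a place error such as /d/ → [g] still counts as voicing correct, while /t/ → [n] counts as a voicing error, matching the examples on the slide.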

13

Transcription: results

Mixed effects logistic regression.

Dependent variable: token-by-token voicing accuracy (correct / incorrect)

Independent variables: age of child and target voicing (fixed effects) + subject (random effect)

[Figure: % voicing accuracy (y-axis, 0-100) against age in months (x-axis, 30-80) for English and for Japanese. Panels per language: d vs. t; g vs. k (Japanese: g or gj vs. k or kj); voiced vs. voiceless. The 75% accuracy criterion is marked. Annotations on the plots: "before 24 mo" and "/d/ at 42 mo".]

14

Analysis 1: interim conclusion

Transcription Analysis

The voicing contrast is mastered later by Japanese-speaking children than by English-speaking children.

15

Analysis 2: VOT

Question 2) How well does the single acoustic dimension of VOT predict the native speaker/transcriber’s judgments of voiced vs. voiceless stops produced by English- and Japanese-acquiring children?

VOT: the latency between the burst and the voicing onset.

[Figure: waveform of /t/ in “torn” (token torn4_20000; x-axis: time in s) with the burst and the voice onset marked; VOT is the interval between the two.]
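As a worked illustration of the definition, VOT can be computed from two hand-marked time points. The sketch below is hypothetical: the function names are invented here, and the 30 ms boundary between short lag and long lag is an illustrative assumption, not a value from the study.

```python
def vot_ms(burst_s: float, voicing_onset_s: float) -> float:
    """VOT in milliseconds: voicing onset time minus burst time.

    Negative values mean voicing starts before the burst (lead VOT).
    """
    return (voicing_onset_s - burst_s) * 1000.0


def vot_category(vot: float, short_lag_max_ms: float = 30.0) -> str:
    """Coarse three-way label; the 30 ms boundary is an assumed placeholder."""
    if vot < 0:
        return "lead"
    if vot <= short_lag_max_ms:
        return "short lag"
    return "long lag"
```

A burst at 1.000 s followed by voicing at 1.080 s gives a VOT of about 80 ms, a long lag value. A fixed three-way binning like this cannot represent the intermediate Japanese /t/ values described earlier, which is part of why the analyses treat VOT as a continuous predictor.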

16

VOT: results (adults)

English: clear separation between short lag (/d/) and long lag (/t/).

Japanese: lead or short lag (/d/) vs. intermediate lag (/t/), with much overlap.

[Figure: histograms of VOT in seconds (x-axis) against number of counts (y-axis, 0-60) for /d/ and /t/, in male and female panels for English adults and for Japanese adults. VOT = 0 and the VOT medians are marked.]

17

VOT: results (children)

Language-specific VOT distributions in children’s stops

English: clearly separated peaks.

Japanese: intermediate values for /t/, with even more overlap with /d/ than in adults.

[Figure: histograms of VOT in seconds (x-axis, -0.2 to 0.2) against number of counts (y-axis, 0-60) for /d/ and /t/, in 2-year-old and 5-year-old panels for English children and for Japanese children. VOT = 0 and the VOT medians are marked.]

18

Mixed effects logistic regression

Dependent variable: token-by-token voicing judgment (/t/ or not /t/)

Independent variable: VOT

[Figure: probability of transcription as /t/ (y-axis, 0.0-1.0) against VOT in seconds (x-axis, -0.3 to 0.3), one panel per language. English: correctly predicted 94%. Japanese: correctly predicted 80%.]
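To make the shape of this model concrete, here is a minimal pure-Python sketch of a logistic regression with VOT as the only predictor. It is a simplification, not the analysis reported here: it fits fixed effects only and omits the by-subject random effect, the function names are hypothetical, and the toy data are invented for illustration (VOT is given in units of 10 ms so that plain gradient ascent converges quickly).

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood. Returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # residual for this token
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_prob(b0: float, b1: float, x: float) -> float:
    """Predicted probability that a token is transcribed as /t/."""
    return sigmoid(b0 + b1 * x)


# Invented toy tokens: VOT in units of 10 ms, 1 = transcribed as /t/.
# The two classes overlap around 20-25 ms, so the fit stays finite.
TOY_VOT = [-8.0, -2.0, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 8.0]
TOY_IS_T = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
```

Fitting `fit_logistic(TOY_VOT, TOY_IS_T)` yields a positive slope, so longer VOT raises the predicted probability of a /t/ transcription. A faithful replication would need a mixed-effects fitter with a random effect for subject.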

19

VOT: results (children)

Evaluation of predictive value

Model’s prediction accuracy with VOT as an independent variable, i.e., the proportion of tokens where the predicted probability of transcribing /t/ is greater than 50% and the transcriber actually transcribed /t/. ‘VOT model’: 94% (English) and 80% (Japanese)

Baseline prediction accuracy with no independent variable, i.e., the proportion of tokens where the transcriber transcribed a voiceless consonant. ‘Baseline’: 49.7% and 63.3%
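The two accuracy figures can be computed as sketched below. The function names are hypothetical, and one interpretive assumption is made: model accuracy is counted as agreement in both directions (predicted /t/ and transcribed /t/, or predicted not-/t/ and transcribed not-/t/), which is the usual reading of "correctly predicted".

```python
def baseline_accuracy(is_voiceless: list[int]) -> float:
    """Proportion of tokens transcribed as voiceless (1 = voiceless).

    This follows the slide's definition of the no-predictor baseline.
    """
    return sum(is_voiceless) / len(is_voiceless)


def model_accuracy(probs: list[float], is_voiceless: list[int],
                   threshold: float = 0.5) -> float:
    """Proportion of tokens where thresholding the predicted probability
    of a voiceless transcription agrees with the actual transcription."""
    hits = sum((p > threshold) == bool(y) for p, y in zip(probs, is_voiceless))
    return hits / len(is_voiceless)
```

With invented labels `[1, 1, 1, 0, 0]` and predicted probabilities `[0.9, 0.8, 0.4, 0.2, 0.1]`, the baseline is 0.6 and the model accuracy is 0.8, since only the third token is misclassified.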

20

Analysis 2: interim conclusion

Transcription Analysis

The voicing contrast is mastered later by Japanese-speaking children than by English-speaking children.

VOT

The single acoustic dimension of VOT predicts the transcribed voicing for English productions 94% of the time.

Accuracy of prediction for Japanese productions is lower (80%).

21

Analysis 3: H1-H2 by VOT

Question 3) Is there another acoustic dimension that improves the prediction of the native speaker/transcriber’s judgments of the voicing contrast in stops produced by these children?

H1-H2

A type of breathiness measure: the amplitude difference between the first harmonic (H1) and the second harmonic (H2).

[Figure: waveform of “torn” (token torn4_20000) with a 25 ms interval marked, and a spectrum (x-axis: frequency, 0-6000 Hz; y-axis: sound pressure level, dB/Hz) with the first harmonic (H1), the second harmonic (H2), and their difference H1-H2 (dB) labeled.]
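A minimal sketch of the H1-H2 computation, assuming the fundamental frequency of the analysis window is already known. The function names are hypothetical, and evaluating a single DFT bin at f0 and at 2·f0 is a simplification chosen for illustration; it is not necessarily how the measurements in this study were taken.

```python
import math


def harmonic_amplitude(samples, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT).

    Exact only when the window holds a whole number of cycles of freq_hz.
    """
    re = im = 0.0
    for n, s in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * n / sample_rate
        re += s * math.cos(angle)
        im -= s * math.sin(angle)
    return 2.0 * math.hypot(re, im) / len(samples)


def h1_minus_h2_db(samples, sample_rate, f0_hz):
    """H1-H2 in dB: level of the first harmonic minus level of the second."""
    h1 = harmonic_amplitude(samples, sample_rate, f0_hz)
    h2 = harmonic_amplitude(samples, sample_rate, 2.0 * f0_hz)
    return 20.0 * math.log10(h1 / h2)
```

For a synthetic 25 ms window whose second harmonic has half the amplitude of the first, the result is 20·log10(2), about 6.02 dB. Real speech would need f0 estimation and windowing, which this sketch leaves out.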

22

H1-H2 by VOT: adults

English: higher H1-H2 and longer VOT for /t/. No overlap between VOT ranges.

Japanese: higher H1-H2 and longer VOT for /t/. Overlap between VOT ranges.

[Figure: scatterplots of H1-H2 in dB (y-axis, -20 to 20) against log VOT in ms (x-axis, -100 to 100) for adult male and adult female speakers, English and Japanese. Legends as shown: /t/, /tʰ/ (English); /d/, /t/ (Japanese).]

23

H1-H2 by VOT: children

/t/ and /d/ as perceived by the transcriber.

English /t/: longer lag VOT

Japanese /t/: longer lag VOT, higher H1-H2

[Figure: scatterplots of H1-H2 in dB (y-axis, -20 to 20) against log VOT in ms (x-axis, -100 to 100) for 2-year-olds and 5-year-olds, English and Japanese. Legend: /d/ on target, /t/ on target, [t] off target, [d] off target.]

24


Mixed effects logistic regression

Dependent variable: token by token voicing judgment (/t/ or not /t/)

Independent variables: VOT + H1-H2

25

VOT and H1-H2: results (children)

Evaluation of predictive value

Baseline prediction accuracy with no independent variable, i.e., the proportion of tokens where the transcriber transcribed a voiceless consonant: 49.7% and 63.3%

Model’s prediction accuracy with VOT as an independent variable: 94% and 80%

Model’s prediction accuracy with VOT and H1-H2 as independent variables: 94% and 83%

[Figure: bar plots of normalized coefficients for VOT and H1-H2. English children: VOT 7.91 vs. H1-H2 0.27 (29.4 times). Japanese children: VOT 5.84 vs. H1-H2 1.1 (5.3 times). Asterisks mark p < 0.05.]

26

Analysis 3: interim conclusion

Transcription Analysis

The voicing contrast is acquired later by Japanese-speaking children than by English-speaking children.

VOT

The single acoustic dimension of VOT is adequate to characterize the transcription results for English. However, VOT alone does not adequately characterize the transcription results for Japanese.

H1-H2 by VOT

In Japanese, the additional acoustic parameter of H1-H2 improves the prediction of the transcription results. The effect of VOT relative to H1-H2 was greater in English than in Japanese.

27

Summary and conclusion

Japanese-speaking children showed mastery of the voicing contrast at a later age than English-speaking children. However, the VOT ranges for the productions of Japanese-speaking children were similar to those of adults.

When VOT alone was used to predict the judgments of a trained native speaker/transcriber, it was only 80% successful in Japanese, whereas it was 94% successful in English.

Adding the acoustic parameter of H1-H2 improved the prediction of the native speaker/transcriber judgments for the productions of the Japanese-speaking children, but not for those of the English-speaking children.

28

Summary and conclusion

English and Japanese encode their stop voicing contrasts along the acoustic dimensions in language-specific ways.

- English: exclusively along the VOT dimension

- Japanese: along more than the VOT dimension

Unlike in English, VOT is not a sufficient acoustic measure of the stop voicing contrast in Japanese.

It was necessary to examine other relevant acoustic dimensions, such as breathiness, to correctly characterize the Japanese stop voicing contrast.

29

Acknowledgement

This work was supported by NIDCD grant 02932 to Jan Edwards.

We thank the children who participated in the task, the parents who gave their consent, and the principals and teachers at the schools at which the data were collected.

Thank you for your attention!

30

References

Gandour, J., S. H. Petty, R. Dardarananda, S. Dechongkit, and S. Mukongoen. 1986. The acquisition of the voicing contrast in Thai: A study of voice onset time in word-initial stop consonants. Journal of Child Language, 13.

Homma, Y. 1980. Voice onset time in Japanese stops. Onseigakkai Kaihoo, 163, 7-9.

Lisker, L. and A. Abramson. 1964. A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20.

Riney, T., N. Takagi, K. Ota, and Y. Uchida. 2007. The intermediate degree of VOT in Japanese initial voiceless stops. Journal of Phonetics, 35.

Sander, E. 1972. When are speech sounds learned? Journal of Speech and Hearing Disorders, 37, 55-63.

Smit, A. B., L. Hand, J. Freilinger, J. Bernthal, and A. Bird. 1990. The Iowa Articulation Norms Project and its Nebraska replication. Journal of Speech and Hearing Disorders, 55.

Takada, M. 2004. VOT tendency in the initial voiced alveolar plosive /d/ in Japanese and the speakers' age. Journal of the Phonetic Society of Japan, 8(3), 57-66.

31

Extra I: Velars, adults scatterplots

English adults: coronals + velars

Japanese adults: coronals (top) + velars (bottom)

32

Extra I: Velars, children scatterplots

English children

(alv: left, velar: right)

- VOT only model: 93%

- VOT & H1-H2 model: no improvement. VOT was the only effective parameter.

33

Extra I: Velars, children scatterplots

Japanese children

(alv: left, velar: right)

- VOT only model: 87%

- VOT & H1-H2 model: no improvement. VOT was the only effective parameter.
