HESITA(tions) in Portuguese Database

The 6th Workshop on Disfluency in Spontaneous Speech
Stockholm, Sweden, August 21-23, 2013

Sara Candeias (1), Dirce Celorico (1), Jorge Proença (1), Arlindo Veiga (1,2), Fernando Perdigão (1,2)

(1) Instituto de Telecomunicações, Coimbra, Portugal
(2) University of Coimbra, DEEC, Portugal




DiSS 2013

Stockholm, Sweden - August 21-23, 2013

SUMMARY

Scope
Goal
Description of the HESITA Database
Hesitation Patterns
Hesitations across speaking styles
Phonetic form of filled pauses
Segmentation of hesitations
Technical Information
Future



SCOPE

Various scientific domains can benefit from analyzing how hesitations are distributed in speech:

LINGUISTIC or CLINICAL/THERAPEUTIC areas
more directly interested in gathering knowledge to better identify salient information in human speech communication;

SPEECH TECHNOLOGY
to increase the usability of speech systems by overcoming the challenges posed by the presence of such phenomena.


AUTOMATIC LANGUAGE PROCESSING
could benefit from a richer representation of the audio signal that incorporates speaking-style information (hesitations),
to reduce errors in automatic speech recognition,
to improve automatic conversational speech systems.

DETECTION OF HESITATION EVENTS
provides a segmentation of multimedia data into consistent parts,
leads to important applications, such as identifying the speech segments used to train acoustic models for speech recognition in spontaneous speech.


So far, no database of hesitation events for European Portuguese is freely available!


GOAL

HESITA database:
a database for European Portuguese,
mainly focused on hesitation events,
containing a large and rich variety of speech events.

Available through:
Meta-Net: http://metanet4u.l2f.inesc-id.pt/repository/search/
Project page: http://lsi.co.it.pt/spl/hesitation/downloads.html


DESCRIPTION OF THE HESITA DATABASE

HESITA Database: 30 daily news programs
collected from podcasts of a European Portuguese television channel,
~27 hours of speech,
audio downsampled from a 44.1 kHz to a 16 kHz sampling rate,
video information discarded,
studio and out-of-studio recordings, plus some telephone sessions.
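The slides do not say which tool performed the 44.1 kHz to 16 kHz conversion. As a minimal sketch, assuming integer sample rates, the rational up/down factors that a polyphase resampler would need can be derived from the greatest common divisor; `resampling_factors` and `resampled_length` are hypothetical helper names. With SciPy installed, `scipy.signal.resample_poly(x, up, down)` is one function that applies such factors.

```python
from math import gcd


def resampling_factors(src_hz: int, dst_hz: int) -> tuple[int, int]:
    """Return (up, down) such that src_hz * up / down == dst_hz.

    Hypothetical helper: the slides only state the two rates,
    not how the conversion was actually carried out.
    """
    if src_hz <= 0 or dst_hz <= 0:
        raise ValueError("sample rates must be positive")
    g = gcd(src_hz, dst_hz)
    return dst_hz // g, src_hz // g


def resampled_length(n_samples: int, src_hz: int, dst_hz: int) -> int:
    """Number of output samples after rational resampling (rounded up)."""
    up, down = resampling_factors(src_hz, dst_hz)
    return -(-n_samples * up // down)  # ceiling division on integers


up, down = resampling_factors(44_100, 16_000)
print(up, down)  # 160 441
print(resampled_length(44_100, 44_100, 16_000))  # one second of audio -> 16000
```

Upsampling by 160 and then decimating by 441 reaches 16 kHz exactly, which is why the integer ratio is preferred over a floating-point factor.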


In the HESITA Database, the prepared (read) speaking style is dominant:
most of the speech consists of utterances by anchors and professional speakers (14 hours);
spontaneous speech segments come from commentators, reporters, interviewers and interviewees (10 hours);
Lombard speech has low representation (18 minutes).


Hesitation events were manually identified and annotated, with patterns closely following the notation presented by E. Shriberg:


repetitions (r),
substitutions (s),
filler words (p),
deletions (d) and
insertions (i).

Only the speech segments were annotated for hesitations.
Filled-pause vocalizations were transcribed using the SAMPA phonetic alphabet for European Portuguese.


Annotation encompasses information regarding:
audio characteristics - background environments: studio, street, overlapping speech, noise and music;
acoustic events - non-speech events: music, jingles, laughter, coughing or clapping;
respiratory and other events: noise from cars or wind;
speaking style and speaker information.


All annotations were made with the Transcriber software tool.


SP_STU_E3_M represents:
the annotation of a speech segment (SP),
in a noise-free studio environment (STU),
with a high level of spontaneity (E3),
from a male speaker (M).


SP_OVR_E3_M represents:
the annotation of a speech segment (SP),
with overlapping speech (OVR),
in a spontaneous speaking style with a high level of spontaneity (E3),
from a male speaker (M).
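Labels of this kind can be decoded field by field. The sketch below assumes, from the two examples only, that every label has exactly four underscore-separated fields; the code tables contain just the codes explained above, and any other code is passed through unchanged because its meaning is not documented here. `parse_label` and `SegmentLabel` are hypothetical names, not part of the HESITA release.

```python
from dataclasses import dataclass

# Code meanings taken from the two example labels (SP_STU_E3_M, SP_OVR_E3_M).
# Codes not listed here are kept verbatim: their meaning is not given.
SEGMENT = {"SP": "speech segment"}
ENVIRONMENT = {"STU": "studio (noise-free)", "OVR": "overlapping speech"}
SPONTANEITY = {"E3": "high spontaneity"}
SPEAKER = {"M": "male"}


@dataclass(frozen=True)
class SegmentLabel:
    segment: str
    environment: str
    spontaneity: str
    speaker: str


def parse_label(label: str) -> SegmentLabel:
    """Split a segment label into its four fields (assumed layout)."""
    fields = label.strip().split("_")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields separated by '_', got {label!r}")
    seg, env, spont, spk = fields
    return SegmentLabel(
        segment=SEGMENT.get(seg, seg),
        environment=ENVIRONMENT.get(env, env),
        spontaneity=SPONTANEITY.get(spont, spont),
        speaker=SPEAKER.get(spk, spk),
    )


print(parse_label("SP_STU_E3_M"))
print(parse_label("SP_OVR_E3_M").environment)
```

Keeping unknown codes verbatim, rather than rejecting them, lets the parser run over a whole file even if it uses label values beyond the two shown on the slides.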


(r.r) - repetitions (r),

(.w+) - extensions within a word (w+)

(f.) - filled pauses (f).

[6~]: (f.) - the SAMPA symbols in brackets give the phonetic form of extended vowel sounds or vocalic fillers.

res - presence of a respiratory event.


HESITATION PATTERNS

Considering the segments annotated for the presence of hesitations, we can see how the hesitation patterns are distributed.

Top 10 most frequent hesitation patterns.

A total of 4608 events were observed;
filled pauses (f.) and vocalic extensions within a word (.w+) are the most common.
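Given one pattern label per annotated event (how the labels are extracted from the TRS files is not described on these slides), a top-N ranking like the one in the figure is a frequency count. A minimal sketch; `top_patterns` is a hypothetical helper and the input list is made up, not HESITA data.

```python
from collections import Counter
from typing import Iterable


def top_patterns(labels: Iterable[str], n: int = 10) -> list[tuple[str, int, float]]:
    """Rank pattern labels by frequency; return (pattern, count, percent)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(p, c, 100.0 * c / total) for p, c in counts.most_common(n)]


# Toy input for illustration only; these are not HESITA counts.
toy = ["(f.)", "(f.)", "(.w+)", "(r.r)", "(f.)", "(.w+)"]
for pattern, count, pct in top_patterns(toy, n=3):
    print(f"{pattern:8} {count:3d} {pct:6.2f}%")
```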


Pattern models show how the hesitation unfolds, indicating the order of the words before and after the so-called “repair point”.

The repair point (the "." in patterns such as (f.)) marks the place from which the hesitation is repaired and fluency is restored:
in (r.r), a word r was repeated as repair or reinforcement ("de.de");
in (s-.s), the word s was cut off and then substituted ("qua-.quantas");
in (r2.r), the same word r was repeated twice and fluency was then restored ("com.com.com");
in (rs-.rs), the word r was repeated and the word s was cut off and then substituted with a correction ("da tu-.da totalidade").
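The notation can be read mechanically: symbols left of the "." precede the repair point, symbols to the right follow it. Below is a minimal parser for simple (non-embedded) patterns under our own reading of the symbols: a letter, an optional repetition digit (as in r2), and an optional mark, "-" for a cut word or "+" for an extension within a word. `parse_pattern` and `Symbol` are hypothetical names, not part of the HESITA distribution.

```python
import re
from dataclasses import dataclass

# One symbol: a letter (r, s, f, w, ...), an optional digit (as in r2),
# and an optional mark: "-" = word cut off, "+" = extension within a word.
TOKEN = re.compile(r"([a-z])(\d*)([-+]?)")


@dataclass(frozen=True)
class Symbol:
    letter: str
    count: int      # digit as written after the letter, 1 if absent
    cut: bool
    extended: bool


def _tokenize(side: str) -> list[Symbol]:
    symbols = []
    pos = 0
    while pos < len(side):
        m = TOKEN.match(side, pos)
        if m is None:
            raise ValueError(f"unexpected character {side[pos]!r} in {side!r}")
        letter, digits, mark = m.groups()
        symbols.append(Symbol(letter, int(digits) if digits else 1,
                              mark == "-", mark == "+"))
        pos = m.end()
    return symbols


def parse_pattern(pattern: str) -> tuple[list[Symbol], list[Symbol]]:
    """Split a simple pattern at its single repair point '.'."""
    body = pattern.replace(" ", "")
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    if body.count(".") != 1 or "(" in body or ")" in body:
        raise ValueError(f"not a simple pattern with one repair point: {pattern!r}")
    before, after = body.split(".")
    return _tokenize(before), _tokenize(after)


before, after = parse_pattern("(rs-.rs)")
print([s.letter + ("-" if s.cut else "") for s in before], [s.letter for s in after])
```

Embedded patterns are deliberately rejected here; they need a recursive reading of the brackets rather than a single split.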


More complex hesitation patterns are also present, such as embedded hesitations:

"que vo-.que.que.que voltam.que.que possam"

"that re-.that.that.that return.that.that could"

( ( r s- .(r 2 . r) s) . ( r . r ) s).
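Embedded hesitations nest one pattern inside another. As a small sketch (our own helper, not a tool from the HESITA release), the bracket depth and the number of repair points of a pattern string can be checked like this:

```python
def nesting_profile(pattern: str) -> tuple[int, int]:
    """Return (maximum bracket depth, number of repair points).

    Raises ValueError if the parentheses are unbalanced.
    """
    depth = max_depth = 0
    for ch in pattern:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
    if depth != 0:
        raise ValueError("unbalanced '('")
    return max_depth, pattern.count(".")


embedded = "( ( r s- .(r 2 . r) s) . ( r . r ) s)"
print(nesting_profile(embedded))  # (3, 4)
print(nesting_profile("(r.r)"))   # (1, 1)
```

A depth above 1 is a quick way to separate embedded hesitations from the simple patterns counted in the top-10 figure.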


HESITATIONS ACROSS SPEAKING STYLES

In general, hesitation events occur mainly in spontaneous speech:
4406, against 188 in read (prepared) speech and 12 in Lombard speech;
188 hesitations observed in 14 hours of read (prepared) speech give a rate of 0.22 hesitations per minute;
4406 hesitation events in 10 hours of spontaneous speech give a rate of 7.34 hesitations per minute.

The density of hesitations in speech thus varies with the speaking style.
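These rates are event counts divided by speaking time in minutes. A quick check of the reported figures, using the rounded durations (14 h and 10 h) given above; `rate_per_minute` is a hypothetical helper.

```python
def rate_per_minute(events: int, hours: float) -> float:
    """Hesitation events per minute of speech."""
    if hours <= 0:
        raise ValueError("duration must be positive")
    return events / (hours * 60)


read = rate_per_minute(188, 14)          # read (prepared) speech
spontaneous = rate_per_minute(4406, 10)  # spontaneous speech
print(f"read: {read:.2f}/min, spontaneous: {spontaneous:.2f}/min")
# read: 0.22/min, spontaneous: 7.34/min
```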


Distribution of the 5 most common hesitation patterns in read (prepared) speech:
vocalic extensions (.w+) are the most frequent (39.36%), closely followed by filled pauses (f.) (32.45%).

Top 5 most frequent hesitation patterns for read (prepared) speech.

Although the difference between these two is not large, the preference for extensions may reflect the fact that vocalic fillers tend to be more stigmatized in prepared speech.


Repetitions become residual in read (prepared) speech.

Substitutions are more frequent in prepared speech than in spontaneous speech (9.57% vs. 3.61%), suggesting that they are a more suitable communicative strategy as far as speaking fluency is concerned.
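As a worked sanity check, the percentages can be turned back into approximate event counts. This assumes the read-speech percentages are relative to the 188 read-speech hesitations and the 3.61% is relative to the 4406 spontaneous-speech hesitations; the implied counts are our own back-calculation, not figures reported in the presentation.

```python
def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of events matching a reported percentage."""
    return round(percent / 100 * total)


def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentages from the slides; totals are the reported event counts.
reported = {
    "(.w+) in read speech": (39.36, 188),
    "(f.) in read speech": (32.45, 188),
    "substitutions in read speech": (9.57, 188),
    "substitutions in spontaneous speech": (3.61, 4406),
}
for name, (pct, total) in reported.items():
    n = implied_count(pct, total)
    # Round-trip check: the implied count reproduces the reported percentage.
    print(f"{name}: ~{n} of {total} -> {percent(n, total)}%")
```

Each implied count reproduces its reported percentage exactly, which is consistent with the percentages having been computed over those totals.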


PHONETIC FORM OF FILLED PAUSES

The two most common phonetic forms for filled pauses:

the near-open central vowel [ɐ] ([6] in SAMPA),

the mid-central vowel [ə] ([@] in SAMPA).

Phone distribution of filled pauses (top 10 most frequent).

This distribution supports the view that the vocalizations preferred by Portuguese speakers cluster around central vowels, which correspond to the reduced vowels found in unstressed positions.


There is also a slight preference for the high back rounded nasal vowel [ũ] (around 3%).

A preference for nasality is also evident: see [ɐ̃], [ɐm], [ɐ̃m] and [ũ].

Our aim here is not to associate a meaning with each filler sound. However, there is strong empirical evidence that speakers use all of them to play a structuring role in speech.


The choice of one vocalic sound over another appears, at least in some contexts, to be motivated by the neighboring phonetic segments, which to some extent neutralizes the phonetic differences between vocalic fillers.
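Since the filler transcriptions use SAMPA, converting them to IPA for display is a table lookup with greedy longest match, so that two-character symbols such as 6~ are not split. A minimal sketch covering only the symbols discussed here (6 = [ɐ], @ = [ə], plus the nasal vowels 6~ and u~ and the consonant m); a full European Portuguese SAMPA table is larger. The table, `sampa_to_ipa` and `is_nasal` are illustrative, not part of the HESITA release.

```python
# Subset of SAMPA -> IPA for the filled-pause symbols discussed here.
# Nasal vowels use a combining tilde (U+0303).
SAMPA_TO_IPA = {
    "6~": "\u0250\u0303",  # nasal near-open central vowel
    "u~": "u\u0303",       # nasal high back rounded vowel
    "6": "\u0250",         # near-open central vowel
    "@": "\u0259",         # mid-central vowel, as given on the slides
    "u": "u",
    "m": "m",
}
_MAX_LEN = max(len(k) for k in SAMPA_TO_IPA)


def sampa_to_ipa(sampa: str) -> str:
    """Convert a filler transcription using greedy longest match."""
    out = []
    i = 0
    while i < len(sampa):
        for size in range(_MAX_LEN, 0, -1):
            chunk = sampa[i:i + size]
            if chunk in SAMPA_TO_IPA:
                out.append(SAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"unmapped SAMPA symbol at {sampa[i:]!r}")
    return "".join(out)


def is_nasal(sampa: str) -> bool:
    """True if the filler has a nasal vowel (~) or a nasal consonant m."""
    return "~" in sampa or "m" in sampa


for filler in ["6", "@", "6~", "6m", "6~m", "u~"]:
    print(filler, sampa_to_ipa(filler), is_nasal(filler))
```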


SEGMENTATION OF HESITATIONS

Annotation of patterns closely follows E. Shriberg's methodology:
each annotation encompasses the initial and final time marks,
the corresponding label contains the pattern and the orthographic transcription,
the repair point is marked in time, showing the instant at which the hesitation is corrected and fluency in speech is recovered.

The interval from the beginning of the hesitation to its repair point is much longer (0.61 seconds on average) than the interval from the repair point to the end of the hesitation correction (0.34 seconds on average).

These trends in the distribution and duration of hesitation events may also be analyzed as manifestations of planning effort.
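With three time marks per event (initial mark, repair point, final mark), the two averages above are means of two differences. A minimal sketch with made-up times; how the marks are stored in the TRS files is not shown here, so the (start, repair_point, end) tuples are an assumed input format and `split_durations` is a hypothetical helper.

```python
from statistics import mean


def split_durations(events):
    """Mean duration before and after the repair point.

    `events` is an iterable of (start, repair_point, end) times in seconds.
    """
    before, after = [], []
    for start, repair, end in events:
        if not start <= repair <= end:
            raise ValueError(f"repair point outside event: {(start, repair, end)}")
        before.append(repair - start)
        after.append(end - repair)
    if not before:
        raise ValueError("no events")
    return mean(before), mean(after)


# Made-up time marks, only to show the computation.
toy = [(12.40, 13.05, 13.30), (47.10, 47.62, 48.05), (90.00, 90.70, 90.95)]
pre, post = split_durations(toy)
print(f"start -> repair point: {pre:.2f} s, repair point -> end: {post:.2f} s")
```

A filled pause such as (f.) has nothing after its repair point, so it may contribute a zero (or near-zero) second interval.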


TECHNICAL INFORMATION

Directories and files:
the downloadable archive contains 58 audio files and the corresponding TRS files, which cover the two parts of the 30 daily news programs.

Data structure of an entry:
the TRS files have an associated document type definition (DTD) file, trans-14.dtd, which is provided in the archive.

Corpus size:
the TRS files contain a total of 4608 hesitation events;
the whole resource occupies 3 GB, mainly due to the audio files.
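TRS files are XML documents in the Transcriber format, so a standard XML parser can read them. A minimal sketch using Python's `xml.etree.ElementTree` that lists speaker turns; the sample document is invented, and it relies only on the generic Transcriber elements (`Trans`, `Episode`, `Section`, `Turn`, `Sync`), not on any HESITA-specific layout of the hesitation labels, which is not described here. `turns` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A tiny document shaped like a Transcriber (.trs) file. Real HESITA files
# also reference trans-14.dtd and carry the hesitation labels, whose exact
# placement is not described in this presentation.
SAMPLE = """<Trans audio_filename="example">
<Episode>
<Section type="report" startTime="0" endTime="7.5">
<Turn speaker="spk1" startTime="0" endTime="4.2">
<Sync time="0"/>
first turn
</Turn>
<Turn speaker="spk2" startTime="4.2" endTime="7.5">
<Sync time="4.2"/>
second turn
</Turn>
</Section>
</Episode>
</Trans>"""


def turns(root):
    """Yield (speaker, start, end) for every Turn element."""
    for turn in root.iter("Turn"):
        yield (turn.get("speaker", ""),
               float(turn.get("startTime")),
               float(turn.get("endTime")))


root = ET.fromstring(SAMPLE)  # for a real file: ET.parse(path).getroot()
for speaker, start, end in turns(root):
    print(f"{speaker}: {start:.1f}-{end:.1f} s")
```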


FUTURE...

We hope that this database can be a relevant basis for further studies on a variety of speech phenomena.

Thank You
