
The Corpus of Interactional Data: a Large Multimodal Annotated Resource

Philippe Blache

Laboratoire Parole et Langage
Brain and Language Research Institute

CNRS & Aix-Marseille Université

LDC 2013 The CID corpus 1 / 60


Outline

Multimodal annotation: general overview

The formal background

Annotation of the different domains in the CID


Part I

Multimodal Annotation: General Overview


Multimodality

Goals

Description of modalities and their interaction
Analysis of natural communication

Different sources of information

Different modalities: verbal, non-verbal, context, etc.
Different domains: phonetics, prosody, syntax, pragmatics, etc.

Issues

Representation, encoding
Diversity of annotation tools and formats
Alignment vs. synchronization
Data manipulation, querying

Method

Rich annotation for each domain
Homogeneous framework


Multimodal Corpora: a survey

Switchboard in NXT (NITE XML Toolkit)

642 conversations; 830,000 words.
Syntax, turns, disfluency, information status, coreference, phonemes, syllables, prosodic phrases, breaks, accents

LUNA (Spoken Language Understanding in Multilingual Communication Systems)

8,100 human-machine dialogues and 1,000 human-human dialogues in Polish, Italian and French.
Turns, POS, chunks, dialogue acts, reference

SAMMIE (Saarbrücken Multimodal MP3 Player Interaction Experiment)

Multimodal dialogue system, human-machine multimodal interaction (Wizard of Oz)
Transcription, turns, clauses, discourse entities, dialogue acts


Multimodal Corpora: a survey

AMI (Augmented Multi-party Interaction)

100 h of meetings, full manual transcription
Dialogue acts, focus of attention, movement (hand, head, leg), named entities, topic segmentation

The ITC Corpus

11 groups of 4 people (25 minutes each). Task: decision-making scenario
No transcription, functional role, socio-emotional, speech activity, body activity

The ATR Corpus

10 meetings, 1 hour each
No transcription, speech activity, body movements, activity type

Multimodal Corpora


Part II

The Corpus of Interactional Data: a Large-Scale Experiment


CID: main features

8 dialogs, 1 hour each (4 male/male; 4 female/female)

Task:
- “Tell something unusual which happened to you”
- “Tell about professional conflicts you may have met”

Setting

Anechoic room
1 camcorder / 2 microphones

Annotations (aligned on the signal)

Phonetic and orthographic transcription
Prosody (units, intonation, contours)
Morphosyntax, syntax
Discourse (markers, turns, etc.)
Gestures


Example


The Annotation Workflow


The Annotation Architecture


Main steps and contributions

1 Primary Data Preparation

Transcription Convention <<< CID Convention
Generation of orthographic and phonetic transcriptions
Aligning transcriptions with the signal <<< CID

2 Automatic Annotation

Syllabification <<< CID
Intonation
Sentence segmentation <<< CID
POS-tagger
Chunker
Shallow parser


Main steps and contributions

3 Manual Annotation <<< CID

Gestures: hands, head, arms
Prosody: phrasing, contours
Disfluencies
Discourse: turns, backchannels, reported speech, information structure

4 Formal representation

Abstract schema: Typed Feature Structures <<< CID
Generation of the XML schema <<< CID
Formatting data
Querying


Some descriptions

1 Backchannels <<< CID

Vocal and gestural
Description in terms of prosody, discourse, morpho-syntax

2 Detachments <<< CID

Dislocation, cleft, topicalization
Annotation of the detachment type, the category, the function, the anaphor


Part III

The Formal Background


Annotation Graphs


NXT gestures


NXT-format Switchboard


Graph Annotation Format (GrAF)

GrAF: nodes and edges, decorated with feature structures

Annotations are associated with the nodes (rather than with the edges, as in AG)

Nodes may be linked to:

Primary data
Other nodes in the graph


Graph Annotation Format (GrAF)

Base segmentation:

<seg:sink seg:id="42" seg:start="24" seg:end="35"/>

Annotation over the base segmentation:

<msd:node msd:id="16">
  <msd:f name="cat" value="NN"/>
</msd:node>
<msd:edge from="msd:16" to="seg:42"/>

Annotation over another annotation:

<ptb:node ptb:id="23">
  <ptb:f name="type" value="NP"/>
  <ptb:f name="role" value="SBJ"/>
</ptb:node>
<ptb:edge from="ptb:23" to="msd:16"/>
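The layering in this fragment, where a syntactic node points to a morphosyntactic node that in turn points to a base segment, can be pictured with a small Python sketch. The dictionaries are a hypothetical in-memory mirror of the slide's fragment, not an actual GrAF API, and `resolve_spans` is an illustrative helper name.

```python
# Hypothetical in-memory mirror of the GrAF fragment (not a real GrAF API).
segments = {"seg:42": (24, 35)}            # base segmentation: id -> (start, end)
nodes = {
    "msd:16": {"cat": "NN"},               # annotation over the base segmentation
    "ptb:23": {"type": "NP", "role": "SBJ"},  # annotation over another annotation
}
edges = {"msd:16": ["seg:42"], "ptb:23": ["msd:16"]}


def resolve_spans(node_id):
    """Follow edges from a node down to the base segments it ultimately covers."""
    if node_id in segments:
        return [segments[node_id]]
    spans = []
    for target in edges.get(node_id, []):
        spans.extend(resolve_spans(target))
    return spans


print(resolve_spans("ptb:23"))  # [(24, 35)]
```

The NP node carries no offsets of its own; its extent is recovered only by walking the edges, which is what keeps each annotation layer independent of the primary data.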


A generic scheme

Needs

A means to describe the information to be encoded and its organization
A precise description of:

the categories or objects in each domain
the organization of each domain
the relations between the domains

A homogeneous framework for representing all sources of information
Independence from any specific tool or formalism

Solution: Typed Feature Structures

Description of the objects and their properties
Description of the hierarchical structure


An Annotation Scheme in terms of TFS

Type hierarchy:

object
  pros_phr
    ip
    ap
  phono
    syllable
    phoneme
  disfluence
    lex
    non-lex
  gest
    hand
    head
    ...

Constituency hierarchy:

ip ::= ap*
ap ::= syl+
syl ::= const_syl+
const_syl ::= phon+
disf ::= reparandum break reparans
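The first four constituency rules can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: nodes are represented as hypothetical `(type, children)` pairs, `phon` is treated as a leaf, and `RULES` and `is_well_formed` are names invented here.

```python
# Constituency rules from the slide: parent type -> (child type, minimum count).
# ip ::= ap*   ap ::= syl+   syl ::= const_syl+   const_syl ::= phon+
RULES = {
    "ip": ("ap", 0),
    "ap": ("syl", 1),
    "syl": ("const_syl", 1),
    "const_syl": ("phon", 1),
}


def is_well_formed(node):
    """Check a (type, children) pair against the constituency rules."""
    kind, children = node
    if kind == "phon":            # assumed leaf: no constituents
        return not children
    if kind not in RULES:
        return False
    child_kind, minimum = RULES[kind]
    if len(children) < minimum:
        return False
    return all(c[0] == child_kind and is_well_formed(c) for c in children)


syl = ("syl", [("const_syl", [("phon", [])])])
ip = ("ip", [("ap", [syl])])
print(is_well_formed(ip))          # True
print(is_well_formed(("ap", [])))  # False: an ap needs at least one syl
```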


The TFS Schema

Object type:

object
[ index     integer
  location  loc_type ]

Location type:

loc_type
  temporal
    interval  [ start  time_unit
                end    time_unit ]
    point     [ point  time_unit ]
  spatial
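One way to read this schema is as a small set of record types. The sketch below uses Python dataclasses; the class names, the choice of `float` seconds for `time_unit`, and the `duration` helper are assumptions made for illustration, since the slide does not define them.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Interval:          # temporal location with two anchors
    start: float         # time_unit assumed to be seconds
    end: float


@dataclass(frozen=True)
class Point:             # temporal location with a single anchor
    point: float


@dataclass(frozen=True)
class Spatial:           # the slide lists no attributes for spatial locations
    pass


Location = Union[Interval, Point, Spatial]


@dataclass
class AnnotatedObject:   # the top-level "object" type: index + location
    index: int
    location: Location


def duration(obj: AnnotatedObject) -> Optional[float]:
    """Length of a temporal location; None when the location is spatial."""
    loc = obj.location
    if isinstance(loc, Interval):
        return loc.end - loc.start
    if isinstance(loc, Point):
        return 0.0
    return None
```

Because every annotated object inherits `index` and `location`, a query such as "all objects overlapping a given time span" can be written once and applied to any domain.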


Phonetics

phon
[ sampa_label   sampa_unit
  cat           {vowel, consonant}
  type          {occlusive, fricative, nasal, ...}
  articulation  [ lip      [ protrusion  string
                             aperture    aperture ]
                  tongue   [ tip   [ location  string
                                     degree    string ]
                             body  [ location  string
                                     degree    string ] ]
                  velum    aperture
                  glottis  aperture ]
  role          [ epenthetic  boolean
                  liaison     boolean ] ]


Prosody

pros_phr
  ip  [ label         IP
        constituents  list(ap)
        contour       [ direction  string
                        position   string
                        function   string ] ]
  ap  [ label         AP
        constituents  list(syl) ]


Prosody: Example

ip
[ label         IP
  index         18
  location      [ start  83.11
                  end    204.21 ]
  constituents  ap [ label     AP
                     index     25
                     location  [ start  192.28
                                 end    204.21 ] ]
  contour       [ direction  falling
                  position   final
                  function   conclusive ] ]
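The same example can be written as a nested Python dictionary, which makes one implicit constraint easy to test: a constituent's interval should lie inside its parent's interval. This is a sketch only; the dictionary layout and the `constituents_inside` helper are assumptions, not part of the CID tooling.

```python
# The slide's example as a nested dictionary (layout is an assumption).
ip = {
    "label": "IP",
    "index": 18,
    "location": {"start": 83.11, "end": 204.21},
    "constituents": [
        {"label": "AP", "index": 25,
         "location": {"start": 192.28, "end": 204.21}},
    ],
    "contour": {"direction": "falling", "position": "final",
                "function": "conclusive"},
}


def constituents_inside(parent):
    """True if every constituent's interval lies within the parent's interval."""
    p = parent["location"]
    return all(
        p["start"] <= c["location"]["start"] <= c["location"]["end"] <= p["end"]
        for c in parent.get("constituents", [])
    )


print(constituents_inside(ip))  # True
```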


Syllabic structure

syl
[ struct        syl_struct
  position      [ rank        {integer}
                  syl_number  {integer} ]
  accentuable   boolean
  prominence    boolean
  constituents  list(const_syl) ]

const_syl
[ phon        list(phon)
  const_type  {onset, nucleus, coda} ]

syl
[ label         syl
  index         42
  location      [ start  195.12
                  end    204.21 ]
  constituents  { [ const_type  onset
                    phon        /f/ ],
                  [ const_type  nucleus
                    phon        /u/ ],
                  [ const_type  coda
                    phon        /l/ ] }
  struct        CVC
  position      3/3
  accentuable   false
  prominence    false ]
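In this example the `struct` value (CVC) is recoverable from the constituents. A minimal Python sketch follows, assuming that every onset or coda phoneme counts as C and every nucleus phoneme as V; that mapping and the helper name `syllable_struct` are simplifications introduced here.

```python
def syllable_struct(constituents):
    """Derive a C/V skeleton from ordered (const_type, phonemes) pairs.

    Assumption: onset and coda phonemes are consonants (C),
    nucleus phonemes are vowels (V).
    """
    letters = {"onset": "C", "nucleus": "V", "coda": "C"}
    return "".join(letters[kind] * len(phons) for kind, phons in constituents)


# Constituents of the example syllable /ful/.
syl = [("onset", ["f"]), ("nucleus", ["u"]), ("coda", ["l"])]
print(syllable_struct(syl))  # CVC
```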


Disfluency

disfluency
  lex  [ reparandum  frag
         break       int_break ]
    repaired    [ type      rep
                  reparans  change ]
    incomplete  [ dis_type  inc ]
  non_lex
    filled  [ type  fill ]
    silent  [ type  sil ]


Part IV

The annotations


Transcription

EOT: on y va avec des copains on a(v)ait pris l(e) ferry en Normandie,T/ p(ui)sque j’avais un frère qui était en $Normandie, T/$ on traverse on a(v)ait passé [une, uneu] nuit épouvantab(le) sur le ferry et euh on arrive à $Londres,T /$ on voit ma soeur e(lle) nous amène dans le [B&B, biainbi] où ...

Tokens: on y va avec des copains on avait pris le ferry en Normandie puisque j’ avais un frère qui était en Normandie on traverse on avait passé une nuit épouvantable sur le ferry et on arrive à Londres on voit ma soeur elle nous amène dans le B&B où ...
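The step from the enriched transcription to the token line can be approximated with a few regular expressions. The Python sketch below covers only the conventions visible in this example (elided letters in parentheses, `[orthography, pronunciation]` pairs, `$...,T/$` marking, and the filler "euh"); the function name and the filler list are assumptions, and the actual CID tools may proceed differently.

```python
import re


def eot_to_tokens(eot, fillers=("euh",)):
    """Rough normalisation of an enriched transcription into tokens."""
    # $Londres,T /$ -> Londres : keep the marked word, drop the marking
    text = re.sub(r"\$([^,$]+),\s*T\s*/\$", r"\1", eot)
    # [une, uneu] -> une : keep the orthographic form, drop the pronunciation
    text = re.sub(r"\[([^,\]]+),[^\]]*\]", r"\1", text)
    # a(v)ait -> avait : restore elided letters
    text = re.sub(r"\((\w+)\)", r"\1", text)
    # drop filled pauses such as "euh"
    return [tok for tok in text.split() if tok not in fillers]


sample = ("on a(v)ait passé [une, uneu] nuit épouvantab(le) sur le ferry "
          "et euh on arrive à $Londres,T /$ on voit ma soeur")
print(" ".join(eot_to_tokens(sample)))
# on avait passé une nuit épouvantable sur le ferry et on arrive à Londres on voit ma soeur
```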


Segmentation

on y va avec des copains /Wm/ on avait pris le ferry en Normandie puisque j’ avais un frère qui était en Normandie /Wd/ on traverse /Wm/ on avait passé une nuit épouvantable sur le ferry /Wm/ et on arrive à Londres /Wm/ on voit ma soeur /Wm/ elle nous amène dans le B&B /Wm/ où on devait loger /Wd/ on se promène /Wm/ moi /Wm/ j’ étais déjà crevée au bout de trois jours /Wm/ parce qu’ on voyageait vachement à pied /Wm/ donc j’en pouvais plus
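A segmented string of this kind is easy to turn into a list of units. The Python sketch below splits on the `/Wm/` and `/Wd/` markers and keeps, for each unit, the marker that closes it; the function name and the pair representation are illustrative assumptions.

```python
import re


def split_segments(text):
    """Return (segment, boundary) pairs; the last segment has boundary None."""
    # A capturing group makes re.split keep the markers in the result.
    parts = re.split(r"\s*/(W[md])/\s*", text.strip())
    segments = parts[0::2]
    boundaries = parts[1::2] + [None]
    return list(zip(segments, boundaries))


sample = ("on voit ma soeur /Wm/ elle nous amène dans le B&B /Wm/ "
          "où on devait loger /Wd/ on se promène")
for segment, boundary in split_segments(sample):
    print(boundary, "|", segment)
```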


Phonetic transcription

Grapheme-phoneme conversion

Input: enriched transcription
Output: list of phonemes, with liaisons

Example

et c’est comme en anglais te rappelles pas en anglais quand euh tu épelais ton nom euh tu sais quand tu apprends les lettres

e s e k o m a~ n a~ g l e t @ R A p e l p A a~ n a~ g l e k a~ @ t y e p @ l e t o~ n o~ t @ t s e k a~ t A p R a~ l e l e t R #


Phonetics: example in Praat


Phonetics: some figures

Phenomenon                           Number
Elision                              11,058
Word truncation                       1,732
Standard liaison missing                160
Unusual liaison                          49
Non-standard phonetic realization     2,812
Laugh seq.                            2,111
Laughing speech seq.                    367
Single laugh IPU                        844
Overlaps > 150 ms                     4,150


Alignment


Syllables


Prosody


Prosodic contours


POS-tagging


Lexicon


Chunking


Chunking (2)


Some results

Category        Count    Group         Count
Adverb         15 123    AP            3 634
Adjective       4 585    NP           13 107
Auxiliary       3 057    PP            7 041
Determiner      9 427    AdvP         15 040
Conjunction     9 390    VPn          22 925
Interjection    5 068    VP            1 323
Preposition     8 693    Total        63 070
Pronoun        25 199
Noun           13 419    Soft Pct      9 689
Verb           20 436    Strong Pct   14 459
Total         114 397    Total        24 148


Trees


Detachment: annotations

Dislocation: “Chocolate, I hate”

Cleft: “It is John who married Ann”

Pseudo-cleft: “What he wanted to do was to travel”

Binary constructions: “Being happy, it is not always”

Features

Detachment type: D, CV, PSCV, B

Detached category: NP, NPrel, NPproP, NPproD, NPproQ, PP, AP, AdvP, VP, S

Function: Subj, Odir, Oind, Loc, Adj

Resumptive element: Rxx (xx : type of the res. element)


Detachment


Detachment (2)


Disfluencies


Disfluencies (2)


Gestures: hands

Hands

hands_type
[ Symmetry         symmetry
  Phase            phase
  Gesture          gesture
  HandShape        [ Shape
                     Laxness  boolean ]
  HandOrientation  orientation
  Space            [ Region       region
                     Coordinates ]
  Contact
  MovementQuality
  Trajectory       trajectory
  Velocity         velocity
  Amplitude        amplitude ]


Gestures: hands

symmetry: {Both hands symmetrical, Both hands asymmetrical, ...}
phase: {Preparation, Stroke, Hold, Retraction, ...}
gesture: {Adaptor, Iconic, Metaphoric, Deictic, Emblem, ...}
orientation: {Palm up, Palm down, Palm towards self, Palm away from self, ...}
region: {Center center, Center, Periphery, Extreme periphery}
coordinates: {Right, Left, Upper, Lower, Upper right, ...}
contact: {Forehead, Hair, Cheek, Chin, Eyes, Eyebrow, ...}
trajectory: {Upper, Lower, Right, Left, Upper right, Lower right, ...}
velocity: {Normal, Fast, Slow}
amplitude: {Small, Medium, Large}


Stand-off hierarchical encoding

When editing, no distinction is made between features and hierarchy

Prosody: intonation and contour are features, whereas AP ∈ IP is a hierarchical relation


TFS representation: XML schema

[ label         IP
  constituents  list(ap)
  contour       [ direction  string
                  position   string
                  function   string ] ]

<xs:complexType name="IntonationalPhrase">
  <xs:complexContent>
    <xs:extension base="ProsodicPhrase">
      <xs:sequence>
        <xs:element name="constituents">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="accentual_phrase" type="AccentualPhrase"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="contour" type="Contour"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>


XML representation

intervals [2]:
    xmin = 0.78
    xmax = 1.7559754641684542
    text = "ip"
...
intervals [5]:
    xmin = 2.6703535937364578
    xmax = 3.329971301020408
    text = "ip"
...

class = "TextTier"
name = "at_ctr"
xmin = 0
xmax = 3573.6
points: size = 2118
points [1]:
    time = 1.7559754641684542
    mark = "RT"
...
points [3]:
    time = 3.329971301020408
    mark = "F"

<IntonationalPhrase index="0">
  <localisation start="0.78" end="1.7559"/>
  <contour type="RT" time="1.7559"/>
</IntonationalPhrase>
...
<IntonationalPhrase index="5">
  <localisation start="2.6703" end="3.3299"/>
  <contour type="F" time="3.3299"/>
</IntonationalPhrase>
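The mapping shown here pairs each "ip" interval with the contour point located at its right edge. A possible Python sketch of that conversion, using only the standard library, is given below. The input tuples, the `prosody` wrapper element, the matching tolerance, and the reuse of the interval number as `index` are all assumptions; the actual CID conversion scripts are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Values copied from the excerpt: (interval number, xmin, xmax, text).
intervals = [
    (2, 0.78, 1.7559754641684542, "ip"),
    (5, 2.6703535937364578, 3.329971301020408, "ip"),
]
# Contour tier points: (time, mark).
points = [(1.7559754641684542, "RT"), (3.329971301020408, "F")]


def to_xml(intervals, points, tol=1e-6):
    """Build one IntonationalPhrase per "ip" interval, attaching the contour
    point whose time coincides with the interval's end (within tol)."""
    root = ET.Element("prosody")  # hypothetical wrapper element
    for index, xmin, xmax, text in intervals:
        if text != "ip":
            continue
        phrase = ET.SubElement(root, "IntonationalPhrase", index=str(index))
        ET.SubElement(phrase, "localisation",
                      start=f"{xmin:.4f}", end=f"{xmax:.4f}")
        for time, mark in points:
            if abs(time - xmax) <= tol:
                ET.SubElement(phrase, "contour", type=mark, time=f"{time:.4f}")
    return root


print(ET.tostring(to_xml(intervals, points), encoding="unicode"))
```

Note that the sketch rounds times to four decimals, whereas the excerpt appears to truncate them, so the printed values may differ in the last digit.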


Distribution: the CID at SLDR
