data collection and multimodal annotation tools

Data collection and Multimodal Annotation Tools Dagstuhl 2001 Workgroup 2


Page 1: Data collection and Multimodal  Annotation Tools

Data collection and Multimodal Annotation Tools

Dagstuhl 2001

Workgroup 2


Members

Permanents
• Lisa Harper
• Michael Kipp
• Emiel Krahmer
• Jean-Claude Martin
• Dagmar Schmauks

Visiting Scientists
• Harry Bunt
• Kioto Hasida
• John Lee
• Thomas Rist
• Laurent Romary


Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?


Multimodal Corpus

• Sound
  – human speech (e.g. MPEG7)
    • transcription, (morphology)
    • part-of-speech
    • syntax (linguistic DS)
    • binary relations
      – thematic roles
      – rhetorical relations
      – co-reference
  – computer voice, sound, music
  – environmental sounds


Multimodal Corpus (2)

• Vision
  – head: movement, gaze, facial expression
  – gesture: hands/arms
    • basic phases
    • formal features (handshape, trajectories, direction, location etc.)
    • encode qualities (Laban efforts?)
    • functional/semiotic categories (emblem, iconic, deictic, self-adaptors etc.)
  – posture: including feet/legs
  – computer graphics (charts/tables), characters
  – static/dynamic environment (people/objects):
    • moving camera


Multimodal Corpus (3)

• Haptic
  – pressure of feet/hands/back/on seat, texture
  – force feedback
• Biometric
  – heartrate, eye dilation, skin sensitivity, eyebrow movement, breathing
• Smell & taste (VR)
• Balance (VR)
• Thermal (VR)
  – body/object temperature, conduit properties


Multimodal Corpus (4)

• Within-modality/cross-modality relations
  – mirror behavior, synchronized behavior, repeated behavior, postural congruence
  – distance and touch
• Behavioral/Social units? Often across modalities!
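
One possible way to look for cross-modality relations such as synchronized behavior is to test whether annotations on two modality tracks overlap in time. The sketch below is only an illustration under that assumption: the `Annotation` class, the track names, the labels, and the time values are all hypothetical and do not come from the workgroup's material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    track: str    # modality track name (hypothetical, e.g. "gesture")
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # code assigned by the annotator


def overlap(a, b):
    """Return the temporal overlap of two annotations in seconds (0.0 if none)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def cross_modal_pairs(track_a, track_b, min_overlap=0.0):
    """Pair annotations from two tracks that overlap by more than min_overlap."""
    return [(a, b) for a in track_a for b in track_b
            if overlap(a, b) > min_overlap]


# Invented example data for two modalities.
gestures = [Annotation("gesture", 1.0, 2.5, "deictic"),
            Annotation("gesture", 4.0, 5.0, "iconic")]
speech = [Annotation("speech", 2.0, 3.0, "that one"),
          Annotation("speech", 6.0, 7.0, "okay")]

pairs = cross_modal_pairs(gestures, speech)
for g, s in pairs:
    print(g.label, "overlaps", repr(s.label), "for", overlap(g, s), "s")
```

Plain overlap is the simplest criterion; relations such as mirror behavior or postural congruence would need modality-specific comparisons on top of it.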


Existing Corpora: Meta Survey

• Existing surveys:
  – ISLE and NIMM (D8, EU & US)
  – ELRA (EU)
  – COCOSDA (Japan)
  – LDC (US)
  – TalkBank (US)


Existing Corpora: Dagstuhl 2001

• Survey with Dagstuhl participants

• Collected 28 questionnaires

• From 24 different institutes

number of corpora    number of participants
0                    6
1                    12
2-9                  8
10+                  2


Questionnaire

• annotated modalities
  – speech: 20
  – gestures: 17
  – facial expression: 5
  – gaze: 3
  – posture: 3
• file format
  – analogue: 4
  – digital: 12
  – I don't know: 4
• tool
  – own tool: 9
  – other tool: 3
  – no tool: 8
  – I don't know: 1
• application areas
  – tourism/navigation (10), consumer electronics, info kiosk, realty, storytelling, instruction, cinema, graphical design, everyday gestures, education, car, face guessing, games, talk shows


Questionnaire (2)

• Languages
  – English: 11
  – German: 5
  – French: 2
  – Japanese: 3
  – Italian: 2
  – Dutch, Swedish, Finnish: 1
• Planning to collect: 21


Data Collection: Methodology?

• Legal issues:
  – ethical
  – commercial
  – country-dependent legislation
• Practical guidelines (best practice)
  – technical setup for recording
  – field-specific coder training, models for coding manuals
• Specify meta-data
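
To make "specify meta-data" concrete, a corpus description could be kept as a small structured record that is checked for gaps before the corpus is shared. The following is a minimal sketch only: the `CorpusMetadata` class and every field name in it are assumptions chosen to echo the points on this slide and the questionnaire items; they are not a metadata standard proposed by the workgroup.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CorpusMetadata:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    languages: list
    modalities: list
    file_format: str        # e.g. "analogue" or "digital"
    recording_setup: str    # free-text description of the technical setup
    consent_obtained: bool  # relates to the legal/ethical issues
    country: str            # legislation is country dependent
    coding_manual: str = "" # reference to the coding manual, if any

    def missing_fields(self):
        """Return the names of fields that were left empty."""
        return [key for key, value in asdict(self).items()
                if value in ("", [], None)]


record = CorpusMetadata(
    name="example-corpus",
    languages=["English"],
    modalities=["speech", "gestures"],
    file_format="digital",
    recording_setup="two fixed cameras, one headset microphone",
    consent_obtained=True,
    country="DE",
)

serialized = json.dumps(asdict(record), indent=2)
print(serialized)
print("missing:", record.missing_fields())
```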


Coding Schemes

• Survey on existing schemes: ISLE D9

• Guidelines for developing schemes:
  – encoding vs. inference
  – can scheme accommodate semantics or generation languages for MM players (MPEG)
• Standardization
  – partial standards like in speech
  – standards for computer output log files (graphics output, locations, XML, trajectories, time-stamping, granularity)
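
As an illustration of what a time-stamped XML log of computer output might look like, the sketch below writes two events with Python's standard `xml.etree.ElementTree` module and reads them back. The element and attribute names (`outputlog`, `event`, `kind`, `start`, `end`, `timeunit`) are invented for this example and do not correspond to any existing standard; the event data is made up.

```python
import xml.etree.ElementTree as ET


def log_event(parent, kind, start_ms, end_ms, **attrs):
    """Append one time-stamped event element to the log and return it."""
    event = ET.SubElement(parent, "event", kind=kind,
                          start=str(start_ms), end=str(end_ms))
    for key, value in attrs.items():
        event.set(key, str(value))  # XML attribute values must be strings
    return event


# Hypothetical log of system output; granularity here is milliseconds.
log = ET.Element("outputlog", timeunit="ms")
log_event(log, "graphics", 1200, 1850, object="chart1", x=320, y=240)
log_event(log, "speech", 1300, 2100, text="Here is the route")

xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)

# Reading the log back, e.g. to align it with human annotations.
parsed = ET.fromstring(xml_text)
starts = [int(e.get("start")) for e in parsed.iter("event")]
print(starts)
```

Stating the time unit once at the root keeps the granularity explicit, which is one of the points the slide lists for standardization.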


Tools

• Surveys of existing tools:
  – ISLE D11
  – (Bigbee, Loehr, Harper 2001)
  – TalkBank proposal
• Underlying frameworks:
  – track-based
  – annotation graphs
  – spatial annotation?
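
The difference between the track-based and the annotation-graph view can be sketched in a few lines. The example assumes a simplified reading of annotation graphs in which every node is anchored to a time point and every annotation becomes a labeled arc between two nodes; the general formalism is richer than this. The function name `tracks_to_graph` and the example data are hypothetical.

```python
def tracks_to_graph(tracks):
    """Convert track-based annotations into a simple time-anchored graph.

    tracks maps a track name to a list of (start, end, label) tuples.
    Returns (nodes, arcs): nodes maps node id -> time, and each arc is
    (source node, target node, track name, label).
    """
    times = sorted({t for segs in tracks.values()
                    for start, end, _ in segs
                    for t in (start, end)})
    node_of = {t: i for i, t in enumerate(times)}
    nodes = {i: t for t, i in node_of.items()}
    arcs = [(node_of[start], node_of[end], name, label)
            for name, segs in tracks.items()
            for start, end, label in segs]
    return nodes, arcs


# Track-based view: one list of time segments per modality (invented data).
tracks = {
    "speech": [(0.0, 1.2, "put"), (1.2, 2.0, "that there")],
    "gesture": [(1.0, 2.0, "deictic")],
}

nodes, arcs = tracks_to_graph(tracks)
print(nodes)
print(arcs)
```

In the graph view, annotations from different tracks that share a boundary share a node, which makes alignment across modalities explicit.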


Tools: Checklists

• Checklist for coding support
  – fast and efficient annotation
  – efficient view, search & find, customizable
  – extensibility of annotation
  – easy access to scheme definitions (online)
  – automatic extraction of modality-specific specimen (images, sound bits, transcription sequences)
• Checklist for multi-coder support
  – update/merge, concurrent coding, reliability
• Checklist for Import/Export
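
For the "reliability" item under multi-coder support, one widely used measure is Cohen's kappa, which corrects the raw agreement between two coders for agreement expected by chance. The slides do not name a specific measure, so kappa is an assumption here, and the labels below are invented. A minimal sketch:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical gesture codes from two coders for six segments.
coder_1 = ["deictic", "iconic", "deictic", "emblem", "iconic", "deictic"]
coder_2 = ["deictic", "iconic", "iconic", "emblem", "iconic", "deictic"]

kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

This assumes both coders labeled the same pre-segmented units; when coders also segment the signal themselves, agreement on segment boundaries has to be handled separately.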


Tools: Visions

• Bootstrapping (semi-automatic or fully automatic annotation)
• Use MM techniques for coding tools (3D, haptic, VR)
• Standardized analysis (e.g. metrics) and visualization (metaphors)
• Modular generic framework for
  – tools
  – schemes


Tools (4)

[Diagram: Annotation Framework (tracks, types, objects etc.) with the components Coding Tool, Coding Scheme, specific analysis, ML classifier, parser, Logical Layer data, viewer, general analysis]


[Diagram: Annotation Framework (tracks, types, objects etc.) with the components Coding Tool, scheme framework, analysis module, ML classifier, parser; modalities shown: speech, gaze, gesture]


Organizations

• Initiatives:
  – EAGLES/ISLE
  – ATLAS, MATE/NITE
  – TalkBank, Childes
• International:
  – ELRA/ELDA, US? Asia?
• National agencies (Eurospeech):
  – BAS, LDC, MPI Nijmegen


Future

• Data collection project
  – sample videos with illustrative MM data
  – pre-coded minimal data (speech transcription)

• Comparison/integration of schemes

• Encourage collaborative coding?


Future (2)

• Workshop at LREC (Language Resources and Evaluation), Canary Islands! May 2002
  – deadline: 20 Nov 2001
  – paper on Dagstuhl and follow-ups
  – coding exercise based on data collection
  – questionnaire based on Dagstuhl survey