Introduction to ELAN

Johanna Lorenz, Bielefeld University, 5.11.2015

Overview

What is ELAN?

Basic information

The ELAN screen

Working modes

Linguistic tiers and types

Getting started

Creating and saving files

Setting up linguistic types and tiers

Time-aligned annotation fields

Adding linguistic annotations

Import and export options

ELAN’s XML files

Compatibility with other software tools and formats

05/11/2015 Introduction to ELAN 2

What is ELAN?

According to the ELAN manual, the name ELAN stands for ‘EUDICO Linguistic Annotator’, where EUDICO is the abbreviation for the ‘European Distributed Corpus’ project.

What is ELAN?

ELAN is a software tool for creating time-aligned annotations of audio and/or video recordings.

The term time-aligned annotations refers to linking annotations to the corresponding parts (time intervals) of audio or video media files.
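As a rough illustration of what "time-aligned" means, the following Python sketch models an annotation as a tier name, a start time, an end time and a text value. The `Annotation` class and the `annotations_at` function are hypothetical names for this example, not part of ELAN. Times are in milliseconds, the unit ELAN's .eaf files also use for their time slots.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical, minimal model of one time-aligned annotation."""
    tier: str
    start_ms: int  # start of the linked media interval, in milliseconds
    end_ms: int    # end of the linked media interval, in milliseconds
    value: str


def annotations_at(annotations, time_ms):
    """Return all annotations whose interval contains the given media time."""
    return [a for a in annotations if a.start_ms <= time_ms < a.end_ms]


anns = [
    Annotation("tx@W", 1300, 10500, "first sentence"),
    Annotation("tx@P", 10500, 14000, "reply"),
]
print([a.value for a in annotations_at(anns, 5000)])  # ['first sentence']
```

Because each annotation carries its own interval, finding "what is said at this moment" is just an interval lookup on the media timeline.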

What is ELAN?

ELAN is free and open-source software developed by the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands.

The latest version can be downloaded for Windows, Mac, and Linux from the ELAN website (there you can also find the manual and other useful material):

http://tla.mpi.nl/tools/tla-tools/elan/

Why consider ELAN? ELAN is free software for several platforms (Windows, Mac, Linux).

It creates Unicode-based XML files that link annotations to media timelines in a reusable format suited to long-term archiving.

ELAN is very flexible:

you can set up complex transcripts with no limit on the number of tiers

there is no limit on the number of speakers or languages

you can work with audio and several video files

you can import from and export to different linguistic software and formats

ELAN provides powerful search options.

Some linguistic software tools

Editing media files (ELAN is an annotation tool, not a media editor for changing media files)

Audacity (audio)

Squared 5 (video)

Transcription with time-alignment

ELAN

Transcriber: useful for single-speaker transcriptions, but not for more complex data

EXMaRALDA: largely the same functions as ELAN, but less widely used

Some linguistic software tools

Annotation and interlinearisation

Fieldworks

Toolbox

Lexical databases

Fieldworks

LexiquePro: lexicon viewer and editor, without annotation or interlinearisation

Toolbox

Basic information:

The ELAN screen

Screenshot: the ELAN screen in Annotation mode (default), labelled main menu, video viewer, waveform (audio) viewer, selection, media controls, display controls and viewers, timeline viewer/annotations, and annotation tiers.

Basic information:

Working modes

In ELAN, different working modes are available. They are designed for specific tasks, and you can access them via [Main menu] > [Options].

On the previous slide, you saw the Annotation mode, which is optimized for creating annotation fields and editing annotations. This default mode offers the most viewing, editing and searching options.

Basic information:

Working modes

The Media Synchronization mode can be used for synchronizing several media files, e.g. more than one video file, or video and audio files.

The Transcription mode is optimized for transcription work. You can type text into all created annotation fields.

The Segmentation mode is designed for creating segmentations (annotation fields), but not for entering text into annotation fields.

Basic information:

Linguistic tiers and types

In the Annotation mode, you can see all tiers. A tier is a line of annotation. You need to plan the structure of your annotation tiers.

The example file has 7 tiers:

tier content

ref reference: identifier of the sentence

tx text: transcription on sentence level

wo transcription on word level

mb transcription on morpheme level

gl morpheme-aligned glosses in English

ps parts of speech

ft free translation

Basic information:

Linguistic tiers and types

Tiers in ELAN are related hierarchically: the tier hierarchy arranges them in parent-child relationships.

What are the dependencies in our example?

tier parent child(ren)

ref - tx, wo, ft (and, indirectly, all others)

tx ref -

wo ref mb

mb wo gl, ps

gl mb -

ps mb -

ft ref -
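The dependencies in this table can be written down as a map from each tier to its parent. The Python sketch below (with hypothetical helper names `children` and `descendants`) derives each tier's direct children and all tiers that depend on it directly or indirectly, i.e. the tiers that are affected when a parent tier changes.

```python
# Parent of each tier in the example file (None = independent tier).
PARENTS = {
    "ref": None,
    "tx": "ref",
    "wo": "ref",
    "mb": "wo",
    "gl": "mb",
    "ps": "mb",
    "ft": "ref",
}


def children(parents, tier):
    """Direct child tiers of `tier`, in the order they are listed."""
    return [t for t, p in parents.items() if p == tier]


def descendants(parents, tier):
    """All tiers that depend on `tier`, directly or indirectly (depth-first)."""
    result = []
    for child in children(parents, tier):
        result.append(child)
        result.extend(descendants(parents, child))
    return result


print(children(PARENTS, "ref"))     # ['tx', 'wo', 'ft']
print(descendants(PARENTS, "ref"))  # ['tx', 'wo', 'mb', 'gl', 'ps', 'ft']
```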

Basic information:

Linguistic tiers and types

We have seen tiers with different properties:

One is independent (ref) and contains annotations linked directly to the time axis.

The others are referring tiers: they are linked to annotations on their parent tier, which is either ref itself or a tier that depends on ref. They can be linked to a time interval within the boundaries of the parent annotation, but they do not have to be.

If you make changes to the parent tier (e.g. deleting an annotation or changing its time interval), the child tiers are affected as well.

If you delete a child tier, the parent tier is not affected. The time interval of a child annotation cannot be changed independently of its parent.

Basic information:

Linguistic tiers and types

Tier types provide information about the nature of the linguistic data that a tier contains.

You can choose the type names yourself. When you create a type for e.g. the tier ‘ps’ (part of speech), you can also name the type ‘ps’. Then you have to define the properties of the type by choosing a type stereotype.

Each tier in ELAN has a type, and the type's stereotype tells ELAN the following:

Are the annotations of the tier linked (directly) to the time axis?

Can the annotations of the parent tier be subdivided in the child tier?

Basic information:

Linguistic tiers and types

Type stereotypes in ELAN:

None (time-aligned: yes; subdivision: no): non-overlapping annotations on an independent tier, directly linked to the time axis

Time Subdivision (time-aligned: yes, without gaps; subdivision: yes): subdivided annotations can be linked to time within the parent interval, but no gaps are allowed

Symbolic Subdivision (time-aligned: no; subdivision: yes): subdivided annotations without time-alignment; no gaps allowed

Included In (time-aligned: yes, with gaps; subdivision: yes): similar to Time Subdivision, except that gaps are allowed

Symbolic Association (time-aligned: no; subdivision: no): one-to-one correspondence to the parent tier
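To make the difference between the two time-aligned subdividing stereotypes concrete, here is a simplified Python check. It assumes the child annotations are given as (start, end) pairs in milliseconds, sorted by start time. `check_children` is a hypothetical helper, not ELAN's own validation logic, and it leaves out the symbolic stereotypes, whose annotations have no times of their own.

```python
def check_children(parent, children, stereotype):
    """Check child intervals against their parent interval.

    parent: (start_ms, end_ms); children: non-empty list of (start_ms, end_ms),
    sorted by start. Only "Time Subdivision" and "Included In" are handled.
    """
    if stereotype not in ("Time Subdivision", "Included In"):
        raise ValueError(f"unsupported stereotype: {stereotype}")
    p_start, p_end = parent
    prev_end = p_start
    for start, end in children:
        # Every child must lie inside the parent and must not overlap its predecessor.
        if start < prev_end or end > p_end or start >= end:
            return False
        # Time Subdivision: each child must start exactly where the previous one ended.
        if stereotype == "Time Subdivision" and start != prev_end:
            return False
        prev_end = end
    # Time Subdivision: the last child must reach the end of the parent.
    if stereotype == "Time Subdivision":
        return prev_end == p_end
    return True  # Included In: gaps are allowed


parent = (0, 1000)
print(check_children(parent, [(0, 400), (400, 1000)], "Time Subdivision"))  # True
print(check_children(parent, [(0, 400), (500, 1000)], "Time Subdivision"))  # False (gap)
print(check_children(parent, [(100, 400), (500, 900)], "Included In"))      # True
```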

Getting started:

Creating and saving files

1. Start ELAN.

2. Start a new file: [File] > [New]

3. Select your media file(s), click [>>] to move them to ‘Selected files’, and click [OK].

Getting started:

Creating and saving files

4. Save your file: [File] > [Save]

Choose the desired folder via ‘Save in’, enter a file name into the field ‘File name’ and click [Save].

The ELAN file extension ‘.eaf’ will be added automatically.

Getting started:

Creating and saving files

5. Set an automatic backup interval via [File] > [Automatic Backup].

You can choose between several options (1 to 30 minutes); I usually select 5 minutes.

ELAN will produce two files for each transcript:

.eaf-file: contains all of your time-aligned annotations and the path to the media file(s)

.pfsx-file: stores your settings for the display, e.g. tier order, zoom, font…

Getting started:

Setting up linguistic types and tiers

Before you start with your transcriptions and annotations, you have to set up (1) linguistic types and (2) tiers.

First, delete the default tier. [Right-click] on ‘default’ and select [Delete default]. Click on [Delete] and on [Yes].

Second, delete the default type. Go to [Type] > [Delete Linguistic Type]. Click on [Delete] and on [Close].

Getting started:

Setting up linguistic types and tiers

Today we want to create an orthographic transcript with time-alignment on sentence level > tier tx

We also want a word-level annotation, but the words do not have to be time-aligned to the media file > tier wo

In a third tier, we want the corresponding part of speech of each word > tier ps

Finally, we want a phonetic transcription on sentence level > tier ipa

What are the appropriate stereotypes?

Getting started:

Setting up linguistic types and tiers

Setting up (1) linguistic types. What are the appropriate stereotypes?

Go to [Type] > [Add New Linguistic Type]. Write the type name into the field ‘Type Name’, select the correct stereotype and click on [Add].

When you are done with all four types, click on [Close].

type name stereotype

tx none

wo Symbolic Subdivision

ps Symbolic Association

ipa Symbolic Association
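In the saved .eaf file, each of these types becomes a `LINGUISTIC_TYPE` element. The Python sketch below builds such elements with the standard `xml.etree.ElementTree` module. The attribute names (`LINGUISTIC_TYPE_ID`, `TIME_ALIGNABLE`, `CONSTRAINTS`, `GRAPHIC_REFERENCES`) and the stereotype spellings such as `Symbolic_Subdivision` are assumed to follow the EAF format; check them against a file saved by your own ELAN version. The helper `linguistic_type_elements` is hypothetical and only meant for inspecting or generating type definitions.

```python
import xml.etree.ElementTree as ET

# Type name -> stereotype as in the table above (None = independent, time-aligned).
TYPES = {
    "tx": None,
    "wo": "Symbolic_Subdivision",
    "ps": "Symbolic_Association",
    "ipa": "Symbolic_Association",
}

# Symbolic stereotypes are not time-alignable; the others are.
TIME_ALIGNABLE = {
    None: True,
    "Time_Subdivision": True,
    "Included_In": True,
    "Symbolic_Subdivision": False,
    "Symbolic_Association": False,
}


def linguistic_type_elements(types):
    """Build one LINGUISTIC_TYPE element per type definition."""
    elements = []
    for name, stereotype in types.items():
        attrs = {
            "LINGUISTIC_TYPE_ID": name,
            "TIME_ALIGNABLE": "true" if TIME_ALIGNABLE[stereotype] else "false",
            "GRAPHIC_REFERENCES": "false",
        }
        if stereotype is not None:  # independent types carry no constraint
            attrs["CONSTRAINTS"] = stereotype
        elements.append(ET.Element("LINGUISTIC_TYPE", attrs))
    return elements


for element in linguistic_type_elements(TYPES):
    print(ET.tostring(element, encoding="unicode"))
```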

Getting started:

Setting up linguistic types and tiers

Setting up (2) tiers. What is the appropriate tier hierarchy?

We have two speakers; let’s call them W (man in white dress) and P (pink dress). Today we want all tiers for W, but only the tx tier for P.

Go to [Tier] > [Add New Tier].

tier name parent child(ren)

tx - wo, ipa (and, indirectly, ps)

wo tx ps

ps wo -

ipa tx -
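In the .eaf file, each tier is stored as a `TIER` element that names its linguistic type and, for dependent tiers, its parent. The Python sketch below builds these elements for speaker W (all tiers) and speaker P (tx only), assuming ps is attached to wo (one part of speech per word). The attribute names `TIER_ID`, `LINGUISTIC_TYPE_REF`, `PARENT_REF` and `PARTICIPANT` are assumed to follow the EAF format; `TIER_SPECS` and `tiers_for_speaker` are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET

# Tier name (without speaker suffix) -> (linguistic type, parent tier name).
TIER_SPECS = {
    "tx": ("tx", None),
    "wo": ("wo", "tx"),
    "ps": ("ps", "wo"),
    "ipa": ("ipa", "tx"),
}


def tiers_for_speaker(speaker, wanted):
    """Build TIER elements such as 'wo@W' whose parent is 'tx@W'."""
    tiers = []
    for name in wanted:
        ling_type, parent = TIER_SPECS[name]
        attrs = {
            "TIER_ID": f"{name}@{speaker}",
            "LINGUISTIC_TYPE_REF": ling_type,
            "PARTICIPANT": speaker,
        }
        if parent is not None:  # only dependent tiers point to a parent
            attrs["PARENT_REF"] = f"{parent}@{speaker}"
        tiers.append(ET.Element("TIER", attrs))
    return tiers


# All tiers for W, only the tx tier for P.
tiers = tiers_for_speaker("W", ["tx", "wo", "ps", "ipa"]) + tiers_for_speaker("P", ["tx"])
for tier in tiers:
    print(tier.get("TIER_ID"), "parent:", tier.get("PARENT_REF"))
```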

Getting started:

Setting up linguistic types and tiers

Setting up (2) tiers.

Write the name of each tier in the field ‘Tier Name’. You can use the extension ‘@W’ or ‘@P’ to mark the speaker of the tier. You can also fill in the ‘Participant’ field.

For a dependent tier, select its ‘Parent Tier’. Choose the appropriate ‘Linguistic type’ for each tier and click on [Add].

When you are done, click on [Close].

Getting started:

Setting up linguistic types and tiers

Setting up (2) tiers.

Now the tiers are visible in the Annotation mode, but they are unsorted. To change this, [right-click] on the tier names, go to [Sort Tiers] and choose [Sort by Hierarchy].

Getting started:

Setting up linguistic types and tiers

In many language documentation and archiving projects, the same type and tier settings are needed for a large number of files.

ELAN can create a template file from an existing file. The template stores all the information about types and tiers.

It can then be loaded to set up new ELAN files with the same type and tier properties.
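What a template keeps can be illustrated with a Python sketch that copies an .eaf document and drops its annotations and time slots, leaving the tier and type definitions. This is only an analogue of ELAN's own template function, which is what you should use in practice. `strip_annotations` is a hypothetical helper, and the element names (`TIER`, `ANNOTATION`, `TIME_ORDER`, `LINGUISTIC_TYPE`) are assumed from the EAF format; the embedded sample is a trimmed, hypothetical document.

```python
import xml.etree.ElementTree as ET


def strip_annotations(eaf_xml: str) -> str:
    """Return a copy of an EAF document without annotations and time slots."""
    root = ET.fromstring(eaf_xml)
    # Remove every annotation, but keep the TIER elements and their attributes.
    for tier in list(root.iter("TIER")):
        for annotation in tier.findall("ANNOTATION"):
            tier.remove(annotation)
    # Time slots only anchor annotations, so they are dropped as well.
    time_order = root.find("TIME_ORDER")
    if time_order is not None:
        for slot in list(time_order):
            time_order.remove(slot)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1300"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="10500"/>
  </TIME_ORDER>
  <TIER TIER_ID="tx@W" LINGUISTIC_TYPE_REF="tx">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first sentence</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="tx" TIME_ALIGNABLE="true"/>
</ANNOTATION_DOCUMENT>"""

template = strip_annotations(SAMPLE)
print(template)
```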

Getting started:

Time-aligned annotation fields

After setting up the tiers, you can switch to the Segmentation mode ([Options]). Here you can set the boundaries of the annotation fields on the tiers ‘tx@W’ and ‘tx@P’.

Screenshot: the Segmentation mode, labelled playback (what is played, and volume), candidate tiers for segmentation, active tier for segmentation, playback (volume of files and speed rate), segmentation behaviour, zoom of the timeline viewer, and a created annotation field.

Getting started:

Time-aligned annotation fields

How to create an annotation field:

Choose a tier by [double-clicking] it, [click] at the time where you want your annotation to start and press [Enter], then [click] at the time where the annotation field should end and press [Enter].

Start at about 00:00:01.3. End at about 00:00:10.5.
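ELAN displays media positions as hours:minutes:seconds, while the .eaf file stores time slot values as whole milliseconds. The small Python helper below (the name `to_milliseconds` is hypothetical) sketches that conversion for the suggested boundaries.

```python
def to_milliseconds(timestamp: str) -> int:
    """Convert an 'hh:mm:ss.sss' media position to whole milliseconds."""
    hours, minutes, seconds = timestamp.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # round() absorbs floating-point noise such as 1.3 * 1000 = 1300.0000000000002
    return round(total_seconds * 1000)


start = to_milliseconds("00:00:01.3")
end = to_milliseconds("00:00:10.5")
print(start, end, end - start)  # 1300 10500 9200
```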

You can move boundaries that you have already set by clicking and dragging. You can move the whole field or single boundaries (active annotation fields are green).

You can also delete annotation fields via [right-click] > [Delete Annotation].

When you are done, you should have 4 annotation fields (2 per speaker).

Getting started:

Adding linguistic annotations

After preparing the annotation fields, you can start the transcription. Go to the Transcription mode and select the tier types ‘tx’ and ‘ipa’.

Screenshot: the Transcription mode, labelled playback of the whole sound, playback of the selected part, playback volume, and playback speed rate.

Getting started:

Adding linguistic annotations

When you have added the transcriptions, you can switch back to the Annotation mode. Now your file should look like this:

Getting started:

Adding linguistic annotations

This was only one way to create annotation fields and to type text into existing fields.

You can also create annotations in the Annotation mode by selecting a part of the waveform viewer (i.e. a part of the sound). Then [right-click] on the tier where you can create independent annotations (here: tx) and select [New Annotation Here].

You can also type in text or edit text in existing annotation fields in the Annotation mode.

Getting started:

Adding linguistic annotations

In our file, the tiers ‘wo’ and ‘ps’ still have no entries. ELAN is not the most efficient software for creating interlinear glosses.

Toolbox is a program frequently used for annotation and interlinearisation. When you are done with your ELAN transcript, you can export it to Toolbox and, after adding the interlinear glosses, reimport the file into ELAN.

Another workflow is to first create a transcript without time-alignment in Toolbox. Then you can import the data into ELAN, link the file to the sound file and set the annotation boundaries.

Before the import, you have to set the so-called ‘Field markers’. You need to define a ‘Parent marker’ and a ‘Stereotype’ for each ‘Field marker’, i.e. for each tier that you want to import (the settings can be stored and loaded as marker files).
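Toolbox stores its data as plain text in which each line begins with a backslash field marker (for example `\tx`), and a record marker such as `\ref` starts a new record. The Python sketch below groups such lines into records, which is roughly the structure that the field-marker settings describe to ELAN. `parse_toolbox` is a hypothetical helper, and it assumes that each marker occurs once per record and that each value fits on one line, which real interlinear Toolbox files often violate.

```python
def parse_toolbox(text, record_marker="ref"):
    """Group backslash-coded lines into records (one dict of marker -> value each)."""
    records = []
    current = None
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # skip blank lines and continuation lines (not handled here)
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}  # the record marker opens a new record
            records.append(current)
        if current is not None:  # lines before the first record (e.g. the header) are ignored
            current[marker] = value.strip()
    return records


SAMPLE = r"""\_sh v3.0  400  Text
\ref W_001
\tx first sentence
\ft free translation

\ref W_002
\tx second sentence
"""

for record in parse_toolbox(SAMPLE):
    print(record)
```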

Getting started:

Adding linguistic annotations

For today, try to add some entries to the tiers ‘wo’ and ‘ps’.

You can [double-click] on the wo-tier at a point below an existing annotation on its parent tier to create an annotation.

You can create more annotations on the child tier by doing a [right-click] on an existing annotation and choosing [New Annotation Before/After].

Experiment with the different options for creating new annotation fields in ‘wo’ and ‘ps’, and also try working with the Transcription mode.

Getting started:

Adding linguistic annotations

ELAN sets no limit on the number of annotation tiers or on their content. You can have…

transcripts with a reference tier (useful for citation), sentence and word level transcription, morphemes, glosses, parts of speech and free translation

phonetic transcription (like in our example)

information structural annotation

syntactic annotation

Getting started:

Adding linguistic annotations

actions

gaze behavior

gestures

Picture: Front. Psychol., 10 December 2014 | http://dx.doi.org/10.3389/fpsyg.2014.01390

Import and export options:

ELAN’s XML files

The transcripts that you produce with ELAN are saved as XML files.

File name extension ‘.eaf’ > ELAN Annotation Format

EAF is an XML file format defined by an XML schema (XSD).

An EAF file is built from the XML elements and attributes defined in the EAF schema.

You can therefore process ELAN data the same way as any other XML file.

The XML output is suitable for long-term archiving and does not rely on proprietary software for recovery.
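Because EAF is plain XML, standard XML tools can read it. The following Python sketch uses the standard `xml.etree.ElementTree` module to list the time-aligned annotations of a document. The element and attribute names (`TIME_SLOT`, `TIME_VALUE`, `TIER`, `ALIGNABLE_ANNOTATION`, `TIME_SLOT_REF1`/`TIME_SLOT_REF2`, `ANNOTATION_VALUE`) are assumed to match the EAF format, so check them against one of your own .eaf files. The embedded sample is a heavily trimmed, hypothetical document; a real file would be loaded with `ET.parse(path).getroot()` instead, and `read_time_aligned` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_time_aligned(eaf_xml: str):
    """Return (tier_id, start_ms, end_ms, value) for every time-aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map time slot IDs to millisecond values (unaligned slots have no TIME_VALUE).
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, value))
    return rows


SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1300"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="10500"/>
  </TIME_ORDER>
  <TIER TIER_ID="tx@W" LINGUISTIC_TYPE_REF="tx">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first sentence</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="wo@W" LINGUISTIC_TYPE_REF="wo" PARENT_REF="tx@W">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>first</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

print(read_time_aligned(SAMPLE))  # [('tx@W', 1300, 10500, 'first sentence')]
```

Annotations on symbolic tiers such as wo are stored as `REF_ANNOTATION` elements that point to a parent annotation instead of to time slots, which is why the sketch does not list them.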

Import and export options:

Interoperability with other tools and formats

If you need a format other than ELAN’s XML-based file, ELAN provides a number of export options:

Text (Tab-delimited, Interlinear, HTML…)

Formats for other linguistic software tools (Shoebox/Toolbox file, Praat TextGrid)

Multimedia (SMIL, QuickTime text, Subtitles text)
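ELAN's own tab-delimited export lets you choose which information to include. Purely as an illustration of the target format, this Python sketch writes (tier, begin, end, annotation) rows with the standard `csv` module. The column names and order, and the helper name `to_tab_delimited`, are assumptions for this example, not ELAN's exact export layout.

```python
import csv
import io


def to_tab_delimited(rows):
    """Write (tier, begin_ms, end_ms, annotation) rows as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "begin_ms", "end_ms", "annotation"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tab_delimited([("tx@W", 1300, 10500, "first sentence")]), end="")
```

Using the `csv` module rather than joining strings by hand means that values containing tabs or line breaks are quoted instead of silently breaking the column layout.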

Import and export options:

Interoperability with other tools and formats

Import and/or export

between:

ELAN

Praat

Shoebox/Toolbox

Transcriber

CHILDES

and other software …

the end
