Introduction to ELAN
Johanna Lorenz, Bielefeld University, 5.11.2015
Overview
What is ELAN?
Basic information
The ELAN screen
Working modes
Linguistic tiers and types
Getting started
Creating and saving files
Setting up linguistic types and tiers
Time-aligned annotation fields
Adding linguistic annotations
Import and export options
ELAN’s XML files
Compatibility with other software tools and formats
05/11/2015 Introduction to ELAN 2
What is ELAN?
ELAN-Manual:
The name ELAN:
‘EUDICO Linguistic Annotator’
where EUDICO is the abbreviation for the
‘European Distributed Corpus’ project
What is ELAN?
ELAN is a software tool to create time-aligned
annotations of audio and/or video recordings.
The term time-aligned annotations refers to the linking
of annotations to the appropriate parts of audio(visual)
media files.
What is ELAN?
ELAN is free and open-source software developed by the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands.
The latest version can be downloaded for Windows, Mac, and Linux from the ELAN website (there you can also find the manual and other useful material):
http://tla.mpi.nl/tools/tla-tools/elan/
Why consider ELAN?
ELAN is free software for several platforms (Windows, Mac, Linux).
It creates Unicode-based XML files that link annotations to media timelines in a reusable format suitable for long-term archiving.
ELAN is very flexible:
  you can set up complex transcripts without limits on the number of tiers
  it sets no limits on the number of speakers or languages
  it allows you to work with audio and several video files
  you can import from and export to different linguistic software and formats
ELAN provides powerful search options.
Some linguistic software tools
Editing media files
  ELAN is an annotation tool, not a media editor; to change media files, use:
  Audacity (audio)
  Squared 5 (video)
Transcription with time-alignment
  ELAN
  Transcriber
    useful for single-speaker transcriptions, but not for more complicated data
  EXMARaLDA
    largely the same functions as ELAN, but less widely used
Some linguistic software tools
Annotation and interlinearisation
  FieldWorks
  Toolbox
Lexical databases
  FieldWorks
  LexiquePro
    lexicon viewer and editor, no annotation and interlinearisation
  Toolbox
Basic information:
The ELAN screen
[Screenshot: the ELAN screen in Annotation mode (default). Labelled areas: main menu, video viewer, waveform (audio) viewer, selection, media controls, display controls and viewers, annotation tiers, timeline viewer/annotations]
Basic information:
Working modes
In ELAN, different working modes are available. Each is designed for a specific task, and you can switch between them via
[Main menu] > [Options]
On the previous slide, you saw the Annotation mode,
which is optimized for creating annotation fields and
editing annotations. This default mode offers the
most viewing, editing and searching options.
Basic information:
Working modes
The Media Synchronization mode can be used for
synchronizing several media files, e.g. more than
one video or video and audio files.
The Transcription mode is optimized for transcription
work. You can type text in all created annotation
fields.
The Segmentation mode is designed for creating
segments (annotation fields), but not
for entering text into them.
Basic information:
Linguistic tiers and types
In the Annotation mode, you can see all tiers.
A tier is a line of annotation. You have to think about the
structure of your annotation tiers.
The example file has 7 tiers:
tier   content
ref    reference: identifier of the sentence
tx     text: transcription on sentence level
wo     transcription on word level
mb     transcription on morpheme level
gl     morpheme-aligned glosses in English
ps     parts of speech
ft     free translation
Basic information:
Linguistic tiers and types
The tiers in ELAN have a hierarchical relation.
The tier hierarchy arranges them in parent-child relationships.
What are the dependencies in our example?
tier   parent   child(ren)
ref
tx
wo
mb
gl
ps
ft
tier   parent   child(ren)
ref    -        all others
tx     ref      -
wo     ref      mb
mb     wo       gl, ps
gl     mb       -
ps     mb       -
ft     ref      -
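The dependencies in the table can also be written down as a small data structure. The following is only a sketch in Python, not anything ELAN itself provides: the dictionary PARENT and the helper functions are hypothetical names. It reads "all others" in the ref row as "all tiers below ref", while the direct children of ref are tx, wo and ft.

```python
# Parent of each tier in the example file (None marks the independent tier).
# The values are copied from the "parent" column of the table.
PARENT = {
    "ref": None,
    "tx": "ref",
    "wo": "ref",
    "mb": "wo",
    "gl": "mb",
    "ps": "mb",
    "ft": "ref",
}


def children(tier):
    """Return the tiers whose direct parent is `tier`."""
    return [name for name, parent in PARENT.items() if parent == tier]


def descendants(tier):
    """Return every tier below `tier` in the hierarchy, depth first."""
    found = []
    for child in children(tier):
        found.append(child)
        found.extend(descendants(child))
    return found


if __name__ == "__main__":
    for name in PARENT:
        print(name, "->", ", ".join(children(name)) or "-")
```

Calling descendants("wo") lists the tiers that depend on the word tier, directly or indirectly (mb, gl and ps in this example).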
Basic information:
Linguistic tiers and types
We have seen tiers with different properties:
One is independent (ref) and contains annotations linked directly to the time axis.
The others are referring tiers that are linked to annotations on their parent tier (e.g. ref). They can be linked to a time interval within the boundaries of the parent annotation, but do not have to be.
If you make changes to the parent tier (deletion, change of time interval), the child tiers will be affected as well.
If you delete a child tier, the parent tier will not be affected. The time interval of a child tier can't be changed independently.
Basic information:
Linguistic tiers and types
Tier types provide information about the nature of the
linguistic data that the tier contains.
You can choose the names for the types on your own.
When you create a type for e.g. the tier ‘ps’ (part of speech),
you can also name the type ‘ps’. Then you have to define the
properties of the type by choosing a type stereotype.
Each tier in ELAN has a type stereotype. The stereotypes
tell ELAN the following:
Are the annotations of the tier linked (directly) to the time axis?
Can the annotations of the parent tier be subdivided in the child
tier?
Basic information:
Linguistic tiers and types
Type stereotypes in ELAN:
Name                   Time-aligned        Subdivision   Explanation
none                   yes                 no            non-overlapping annotations on an independent tier directly linked to the time axis
Time Subdivision       yes, without gaps   yes           sub-divided annotations can be linked to time within the parent interval, but no gaps
Symbolic Subdivision   no                  yes           sub-divided annotations without time-alignment, no gaps allowed
Included In            yes, with gaps      yes           similar to Time Subdivision, with the difference that gaps are allowed
Symbolic Association   no                  no            one-to-one correspondence to parent tier
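The two questions a stereotype answers (time-aligned? subdivision?) can be used as a lookup. The sketch below simply encodes the table in Python; the class Stereotype, the dictionary STEREOTYPES and the function matching_stereotypes are hypothetical names for illustration and are not part of ELAN. Where the table makes no statement about gaps, the value is left as None.

```python
from typing import NamedTuple, Optional


class Stereotype(NamedTuple):
    time_aligned: bool            # annotations linked directly to the time axis?
    subdivision: bool             # can a parent annotation be subdivided?
    gaps_allowed: Optional[bool]  # None where the table makes no statement


# Values copied from the table of type stereotypes.
STEREOTYPES = {
    "none": Stereotype(True, False, None),
    "Time Subdivision": Stereotype(True, True, False),
    "Symbolic Subdivision": Stereotype(False, True, False),
    "Included In": Stereotype(True, True, True),
    "Symbolic Association": Stereotype(False, False, None),
}


def matching_stereotypes(time_aligned, subdivision):
    """Return the stereotype names that fit the two yes/no answers."""
    return [
        name
        for name, props in STEREOTYPES.items()
        if props.time_aligned == time_aligned and props.subdivision == subdivision
    ]


if __name__ == "__main__":
    # A tier that is not time-aligned but splits the parent annotation:
    print(matching_stereotypes(time_aligned=False, subdivision=True))
```

When two stereotypes match (time-aligned with subdivision), the remaining question is whether gaps between the child annotations should be allowed.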
Getting started:
Creating and saving files
1. Start ELAN.
2. Start a new file: [File] > [New]
3. Select your media file(s), then click [>>] to move them to ‘Selected files’ and click [OK].
Getting started:
Creating and saving files
4. Save your file: [File] > [Save]
Choose the desired folder via ‘Save in’, enter a file name
into the field ‘File name’ and click [Save].
The ELAN file extension ‘.eaf’ will be
added automatically.
Getting started:
Creating and saving files
5. Set the interval for automatic backups via [File] > [Automatic Backup].
You can choose between several options (1 to 30 minutes); I usually select 5 minutes.
ELAN will produce two files for each transcript:
  .eaf file: contains all of your time-aligned annotations and the path to the media file(s)
  .pfsx file: stores your display settings, e.g. tier order, zoom, font…
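When you copy or archive a transcript, it helps to keep both files together. The following Python sketch lists the two file names that belong to one transcript. It assumes that the settings file has the same base name as the .eaf file; the function name companion_files and the file name session1.eaf are made up for the example.

```python
from pathlib import Path


def companion_files(eaf_path):
    """Return the annotation file and the settings file expected next to it."""
    eaf = Path(eaf_path)
    if eaf.suffix != ".eaf":
        raise ValueError(f"not an ELAN annotation file: {eaf}")
    # Assumption: the settings file shares the base name of the .eaf file.
    return [eaf, eaf.with_suffix(".pfsx")]


if __name__ == "__main__":
    for path in companion_files("session1.eaf"):
        print(path)
```

The media files are not included here, because the .eaf file only stores a path to them.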
Getting started:
Setting up linguistic types and tiers
Before you start with your transcriptions and
annotations, you have to set up (1) linguistic types
and (2) tiers.
First, delete the default tier.
[Right-click] on ‘default’ and select [Delete
default]. Click on [Delete] and on [Yes].
Second, delete the default type.
Go to [Type] > [Delete Linguistic Type].
Click on [Delete] and on [Close].
Getting started:
Setting up linguistic types and tiers
Before you start with your transcriptions and annotations,
you have to set up (1) linguistic types and (2) tiers.
Today we want to create an orthographic transcript with time-alignment on sentence level > tier tx
We also want a word-level annotation, but the words do not have to be time-aligned to the media file > tier wo
In a third tier, we want the corresponding part of speech for each word > tier ps
Finally, we want a phonetic transcription on sentence level > tier ipa
What are the appropriate stereotypes?
Getting started:
Setting up linguistic types and tiers
Setting up
(1) linguistic types.
What are the appropriate
stereotypes?
Go to [Type] > [Add New Linguistic Type].
Write the type name into the field ‘Type Name’,
select the correct stereotype and click on [Add].
When you are done with
all four types, click on
[Close].
type name   stereotype
tx          none
wo          Symbolic Subdivision
ps          Symbolic Association
ipa         Symbolic Association
Getting started:
Setting up linguistic types and tiers
Setting up (2) tiers.
What is the appropriate
tier hierarchy?
We have two speakers; let's call them W (man in white
dress) and P (pink dress). Today we want to have all
tiers for W, but for P only the tx tier.
Go to [Tier] > [Add New Tier].
tier name   parent   child(ren)
tx
wo
ps
ipa
tier name   parent   child(ren)
tx          -        wo, ps, ipa
wo          tx       ps
ps          wo       -
ipa         tx       -
Getting started:
Setting up linguistic types and tiers
Setting up (2) tiers.
Write the name of each tier
in the field ‘Tier Name’.
You can use the extension
‘@W’ or ‘@P’ to mark the
speaker of the tier. You can
also fill in the ‘Participant’ field.
Choose the appropriate
‘Linguistic type’ for each
tier and click on [Add].
When you are done, click
on [Close].
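The entries for the ‘Add New Tier’ dialog can be planned before you start clicking. The sketch below is only a planning aid in Python and does not talk to ELAN. It assumes that ps depends on wo (part of speech of each word) and that wo and ipa depend on tx; the names TIER_PARENT, SPEAKER_TIERS and tier_setup are hypothetical.

```python
# Parent of each tier within one speaker (None marks the independent tier).
# Assumption: ps hangs below wo, while wo and ipa hang below tx.
TIER_PARENT = {"tx": None, "wo": "tx", "ps": "wo", "ipa": "tx"}

# Which tiers each speaker gets: all tiers for W, only the tx tier for P.
SPEAKER_TIERS = {"W": ["tx", "wo", "ps", "ipa"], "P": ["tx"]}


def tier_setup():
    """Return one row per tier with the values to enter in the dialog."""
    rows = []
    for speaker, tiers in SPEAKER_TIERS.items():
        for tier in tiers:
            parent = TIER_PARENT[tier]
            rows.append(
                {
                    "tier_name": f"{tier}@{speaker}",
                    "participant": speaker,
                    "linguistic_type": tier,
                    "parent": f"{parent}@{speaker}" if parent else None,
                }
            )
    return rows


if __name__ == "__main__":
    for row in tier_setup():
        print(row)
```

Parent tiers come before their children in the list, which matches the order in which the tiers have to be created.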
Getting started:
Setting up linguistic types and tiers
Setting up (2) tiers.
Now the tiers are visible in the ‘Annotation mode’, but
they are unsorted. To change this, do a [right-click] on the
tiers, go to [Sort Tiers] and choose [Sort by Hierarchy].
Getting started:
Setting up linguistic types and tiers
In many language documentation and archiving projects, the types and tiers settings will be needed for a large number of files.
ELAN provides the possibility to create a template file out of an existing file. It will store all the information about types and tiers.
It can be loaded to set up new ELAN-files with the same types and tiers properties.
Getting started:
Time-aligned annotation fields
After setting up the tiers, you can switch to the Segmentation mode via [Options]. Here you can set the boundaries of the annotation fields on the tiers ‘tx@W’ and ‘tx@P’.
[Screenshot: the Segmentation mode. Labelled areas: playback (what is played, volume of files and speed rate), candidate tiers for segmentation, active tier for segmentation, segmentation behaviour, zoom of timeline viewer, created annotation field]
Getting started:
Time-aligned annotation fields
How to create an annotation field:
Choose a tier by [double-clicking] it, [click] on the time where you
want to start your annotation, press [Enter], [click] on the time
where the annotation field should end and press [Enter].
Start at about 00:00:01.3. End at about 00:00:10.5.
You can move already determined boundaries by
clicking and holding. You can move the whole field or
single boundaries (active annotation fields are green).
You can also delete annotation fields via [right-click] >
[Delete Annotation].
When you are done,
you should have 4
annotation fields
(2 per speaker).
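An annotation field is a time interval on a tier. ELAN's .eaf files store time values in milliseconds, so the start and end times above correspond to 1300 and 10500. The following Python sketch converts a time stamp as shown in ELAN into milliseconds and checks whether two fields overlap, which is not permitted on an independent tier such as tx. The function names are hypothetical.

```python
def to_milliseconds(timestamp):
    """Convert a time stamp of the form 'hh:mm:ss.fff' to whole milliseconds."""
    hours, minutes, seconds = timestamp.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total_seconds * 1000)


def overlaps(first, second):
    """Return True if two (start_ms, end_ms) fields share more than a boundary."""
    return first[0] < second[1] and second[0] < first[1]


if __name__ == "__main__":
    field = (to_milliseconds("00:00:01.3"), to_milliseconds("00:00:10.5"))
    print(field)
```

Two fields that merely touch, where one ends exactly where the next begins, do not count as overlapping in this sketch.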
Getting started:
Adding linguistic annotations
After the preparation of
annotation fields, you can
start the transcription.
Go to the Transcription
mode and select the tier
types ‘tx’ and ‘ipa’.
[Screenshot: the Transcription mode. Labelled playback controls: whole sound, selected part, volume, speed rate]
Getting started:
Adding linguistic annotations
When you have added the transcriptions, you can switch back to the Annotation mode.
Now your file should look like this:
[Screenshot: the transcribed file in Annotation mode]
Getting started:
Adding linguistic annotations
This was only one way to create
annotation fields and to type
text into existing fields.
You can also create annotations in
the Annotation mode by selecting
a part of the waveform viewer (i.e.
a part of the sound). Then
[right-click] on the tier where you
can create independent
annotations (here: tx) and select
[New Annotation Here].
You can also type in text or edit
text in existing annotation fields in
the Annotation mode.
Getting started:
Adding linguistic annotations
In our file, the tiers ‘wo’ and ‘ps’ still have no entries.
ELAN is not the most efficient software for creating interlinear
glosses.
Toolbox is one program that is frequently used for annotation and
interlinearisation. When you are done with your ELAN transcript,
you can export it to Toolbox and, after you have added
interlinear glosses, reimport the file into ELAN.
Another workflow can be to first create a transcript without time-
alignment in Toolbox. Then you can import the data to ELAN, link
the file to the sound file and set the annotation boundaries.
Prior to the import you have to set the so-called ‘Field markers’. You
need to define ‘Parent markers’ and ‘Stereotypes’ for each ‘Field
marker’, i.e. for each tier that you want to import (the settings can be
stored and loaded as marker files).
Getting started:
Adding linguistic annotations
For today, try to add some entries in the tiers ‘wo’ and ‘ps’.
You can [double-click] on the wo-tier at a point below an existing parent
annotation to create an annotation.
You can create more annotations on the child tier when you do a [right-
click] on an existing annotation and choose [New Annotation Before/After].
You can experiment with the different options to create new annotation
fields in ‘wo’ and ‘ps’, also try to work with the Transcription mode.
Getting started:
Adding linguistic annotations
ELAN sets no limit on the number of annotation tiers or
on their content. You can have…
transcripts with a reference tier (useful for citation), sentence and word level transcription, morphemes, glosses, parts of speech and free translation
phonetic transcription (like in our example)
information structural annotation
syntactic annotation
Getting started:
Adding linguistic annotations
actions
gaze behavior
gestures
…
Picture: Front. Psychol., 10 December 2014 | http://dx.doi.org/10.3389/fpsyg.2014.01390
Import and export options:
ELAN’s XML files
The transcripts that you produce with ELAN are
saved as XML files.
file name extension ‘.eaf’ > ELAN annotation format
EAF is an XML file format defined by an XML schema
(xsd)
EAF files are built from the main XML elements and attributes defined in
the EAF schema.
You should be able to process ELAN data in the same
way as you process other XML files.
The XML output is suitable for long-term archiving and
does not rely on proprietary software for recovery.
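As an illustration of processing an .eaf file like any other XML file, the Python sketch below reads the time-aligned annotations of one tier with the standard library. The sample document is a hand-written, heavily trimmed fragment that uses element and attribute names from the EAF format (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE); it is not a complete, schema-valid file, and the annotation text and the function name aligned_annotations are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample; a real .eaf file contains more elements.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1300"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="10500"/>
  </TIME_ORDER>
  <TIER TIER_ID="tx@W" PARTICIPANT="W" LINGUISTIC_TYPE_REF="tx">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def aligned_annotations(root, tier_id):
    """Return (start_ms, end_ms, text) for each time-aligned annotation of a tier."""
    # Map time slot ids to milliseconds; slots without a time value are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    result = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for annotation in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(annotation.get("TIME_SLOT_REF1"))
            end = slots.get(annotation.get("TIME_SLOT_REF2"))
            text = annotation.findtext("ANNOTATION_VALUE", default="")
            result.append((start, end, text))
    return result


root = ET.fromstring(SAMPLE_EAF.strip())

if __name__ == "__main__":
    for start, end, text in aligned_annotations(root, "tx@W"):
        print(start, end, text)
```

For a file on disk, ET.parse(path).getroot() would replace the ET.fromstring call. Annotations on referring tiers (such as wo or ps) are stored differently and are not covered by this sketch.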
Import and export options:
Interoperability with other tools and formats
If you do not want to have an XML-based ELAN file,
ELAN provides a number of export options:
Text (Tab-delimited, Interlinear, HTML…)
Formats for other linguistic software tools (Shoebox/Toolbox file, Praat TextGrid)
Multimedia (SMIL, QuickTime text, Subtitles text)
Import and export options:
Interoperability with other tools and formats
Import and/or export
between:
ELAN
Praat
Shoebox/Toolbox
Transcriber
CHILDES
and other software …
the end