July 11, 2003 E-MELD 2003

E-MELD “School” of Best Practice

Helen Aristar-Dry & Gayathri Sriram
The LINGUIST List

Eastern Michigan University

The LINGUIST List Crew

Working late…

Using all available talent…

Overview

• The E-MELD ‘School’ of Best Practice: latest version
  – Purpose
    • What is ‘best practice’?
    • Why ‘best practice’?
  – Organization
• Demo some of the facilities

A note about the name…

• Showroom of BP? … Nope, it’s got rooms.
• House of BP?
• Funhouse?
• Playhouse?
• Outhouse?
• Bazaar?
• Palace?
• Chateau?
• Shed?

School

What is Best Practice?

Practices designed to ensure that digital language resources:

• endure through time;
• can be reused by others, both now and in the future;
• are as independent as possible of computer environments, scholarly communities, and domains of application.

– Bird & Simons 2003

Best Practice as we know it …

• Distinguish between the archival format and the presentation format(s). BP is concerned primarily with the archival format.
• Archival formats should employ open file formats and open standards.
• Examples of archival formats:
  – Documents: plain text with XML markup
  – Images: TIF, 16-bit grayscale
  – Audio files: pure (uncompressed) WAV files

. . . this afternoon
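As a rough illustration of the audio recommendation, the Python sketch below checks whether a file's bytes open as PCM (uncompressed) WAV. It relies on the fact that Python's standard-library `wave` module reads only PCM-encoded WAV data and raises `wave.Error` for other encodings or non-RIFF input. The function name `is_uncompressed_wav` is hypothetical, and the check says nothing about sample rate, bit depth, or recording quality.

```python
import io
import wave


def is_uncompressed_wav(data: bytes) -> bool:
    """Return True if `data` opens as a PCM (uncompressed) WAV file.

    Python's wave module only reads PCM-encoded WAV data and raises
    wave.Error for other encodings or non-RIFF input, so a successful
    open is used here as a rough proxy for "pure WAV".
    """
    try:
        with wave.open(io.BytesIO(data), "rb"):
            pass  # opening parses the RIFF and fmt headers
        return True
    except (wave.Error, EOFError):
        # Not RIFF/WAVE, a non-PCM encoding, or a truncated header.
        return False


# Usage: write one second of 16-bit mono silence, then check it.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 44100)

print(is_uncompressed_wav(buf.getvalue()))  # True
print(is_uncompressed_wav(b"not audio"))    # False
```

A check like this only confirms the container encoding; whether the recording was made without lossy compression upstream has to be documented in the metadata.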

Best Practice

• Write metadata for the language resource in an approved format.
  Recommended:
  – OLAC format
  – a format mapped to OLAC, e.g., IMDI
• Make the metadata available to a general search engine.
  Recommended:
  – an OLAC service provider, e.g., LINGUIST List
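OLAC metadata is built on the Dublin Core element set. The sketch below serializes a few Dublin Core elements to XML with Python's standard library. The placeholder root element `record`, the helper `make_dc_record`, and the field values are hypothetical; a real OLAC record uses the container element and OLAC-specific extensions defined in the OLAC metadata standard, which this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (the element set OLAC metadata builds on).
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def make_dc_record(fields: dict) -> str:
    """Serialize simple Dublin Core fields as XML (hypothetical helper).

    'record' is a placeholder root, not the actual OLAC container element.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
record = make_dc_record({
    "title": "Biao Min lexicon",
    "subject": "Biao Min language",
    "type": "Text",
})
print(record)
```

In OLAC, records like this are exposed by an archive's data provider and harvested by service providers over OAI-PMH, which is what lets a general search service find them.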

Best Practice

• For morphosyntactic markup: allow different terminology sets, but use an ontology of linguistic concepts (GOLD) as an interlingua.

• Relate the different morphological markup schemas to the ontology by means of a metaschema.
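The idea of a metaschema can be sketched as a lookup table from each project's own gloss tags to shared ontology concepts; once tags are mapped, a query for one concept finds matching forms across differently annotated languages. In this Python sketch the language names, tags, forms, and concept labels are all hypothetical placeholders, not actual GOLD identifiers or data from any real language.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical metaschema: per-language gloss tag -> shared ontology concept.
# Concept labels are placeholders, not actual GOLD identifiers.
METASCHEMA: Dict[str, Dict[str, str]] = {
    "lang_a": {"PST": "PastTense", "PL": "Plural"},
    "lang_b": {"pas": "PastTense", "pl": "Plural"},
}


def to_concept(language: str, tag: str) -> Optional[str]:
    """Map one project-specific tag to its shared concept, if known."""
    return METASCHEMA.get(language, {}).get(tag)


def search(corpus: List[Tuple[str, str, List[str]]], concept: str) -> List[Tuple[str, str]]:
    """Return (language, form) pairs with any tag that maps to `concept`."""
    return [
        (language, form)
        for language, form, tags in corpus
        if any(to_concept(language, tag) == concept for tag in tags)
    ]


# Two annotation schemes use different labels for the same category.
corpus = [
    ("lang_a", "form1", ["PST"]),
    ("lang_b", "form2", ["pas", "pl"]),
    ("lang_a", "form3", ["PL"]),
]
print(search(corpus, "PastTense"))  # [('lang_a', 'form1'), ('lang_b', 'form2')]
```

Each project keeps its own labels; only the mapping table has to be written once per annotation scheme.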

Why Best Practice?

“Best practice is enduring practice” (Simons, bc)

BP is important for all language documentation . . .

. . . but especially for documentation of endangered languages

Why Best Practice?

• According to the Ethnologue, 52 languages have only 1 speaker left.

• Somewhere, 52 field linguists are making audiotapes, videotapes, and transcripts…

What if . . .

– Ten are transcribing in MS Word 6 (which probably won’t be readable in 15 years)?

What if . . .

– Ten more are using compressed audio formats (and compressing away some of the data)?

What if . . .

– Two more forget to turn on the tape recorder?

A true story…

The BBC Domesday Project…

So the School is designed to:

• Help users preserve their valuable data for generations to come.
  – Data:
    • Notes
    • Images
    • Audio & video
  – Users:
    • linguists, programmers, archivists
    • (digital) beginners or advanced users

Objectives:

• Teach
• Motivate
• Facilitate
• Invite (suggestions & participation)

What will the School offer?

– Information about the preservation and digitization of data
– Tutorials to provide hands-on training
– Facilities for online operations on the linguist’s own data, e.g., creation of metadata
– Tools (and links to tools) for client-side operations, e.g., text annotation
– Reading material about various aspects of BP
– A showcase of data from 10 endangered languages digitized according to BP

How is the School organized?

– Information
– Tutorials
– Online facilities
– Client-side tools
– Reading material
– Showcase of data from 10 endangered languages

Classroom

Workroom

Tool Room

Reading Room

Exhibit Hall

The Exhibit Hall

Purpose: to show what can be done within the BP framework

• Data (currently) from Biao Min and Mocovi
  – Info on the language(s)
  – Biao Min lexicon & metadata
    – Archive formats
    – Presentation formats (with some audio)
• Search: cross-language search at a fine-grained morphosyntactic level (thanks to the ontology)
• Comments facility for users
• What else?

Classroom

Teach users how to:
– choose equipment & software
– create metadata and make it available for search
– create an XML file, schema & metaschema
– create and use stylesheets to transform XML files
– annotate & transcribe audio & video files
– address ethical issues
– What else?
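Stylesheets (typically XSLT) turn one archival XML file into any number of presentation formats while leaving the archival file untouched. Python's standard library has no XSLT processor, so this sketch imitates the same separation with `xml.etree.ElementTree`, rendering a small lexicon as an HTML table. The element names `lexicon`, `entry`, `headword`, and `gloss`, the helper `to_html_table`, and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical archival lexicon: content only, no display information.
ARCHIVAL = """<lexicon>
  <entry><headword>word1</headword><gloss>first gloss</gloss></entry>
  <entry><headword>word2</headword><gloss>second gloss</gloss></entry>
</lexicon>"""


def to_html_table(xml_text: str) -> str:
    """Render each <entry> as a table row; the archival XML is not modified."""
    root = ET.fromstring(xml_text)
    table = ET.Element("table")
    for entry in root.iter("entry"):
        row = ET.SubElement(table, "tr")
        for field in ("headword", "gloss"):
            ET.SubElement(row, "td").text = entry.findtext(field, default="")
    return ET.tostring(table, encoding="unicode", method="html")


print(to_html_table(ARCHIVAL))
```

A second function producing, say, a plain-text word list from the same `ARCHIVAL` string would make the same point: presentation formats are derived from the archival file and can be regenerated at any time.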

Workroom

Where users work on their own data, using BP tools for:

• metadata creation (ORE)
• terminology mapping
• annotation & transcription
• lexicon creation (FIELD)
• What else?

Reading Room

– Reference materials
– Manuals
– Links to off-site tutorials
– White papers
– Glossary of terms (linked to other pages on the site)

• What else?

Tool Room

Downloads of:
• FIELD (laptop version)
• Standalone ORE
• Links to LDC, IMDI tools, etc. for
  – Conversion
  – Annotation

• What else?

The “School”

http://emeld.org/school