Tools for Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013 Week 1: Overview

OVERVIEW, GOALS OF CLASS

Tools for documentation
• Physical tools:
  • Hardware
  • Software
  • Stimuli
• Conceptual tools:
  • What makes a good documentary corpus
• Procedural tools:
  • How to go about documenting a language
• Tools for disseminating results

Overview

• Week 1: overview, hardware, software
• Week 2: elicitation techniques, grammar writing
• Week 3: narratives, conversation, corpus building
• Week 4: lexicon, archiving

About the class
• “How to describe/document a language”
• *No practical component* (in that we won’t be working with speakers)
• However, there will be time (I hope!) to talk about your own field data
• And we will be doing some exercises with existing data
  • I will provide datasets for exercises (if you don’t have data of your own to use)
  • You can also use data from the field methods class here at the Institute.

A few assumptions for this class

• Not talking about community-oriented materials here (I see documentary materials as feeding into that though)

• Assuming that the language doesn’t have a lot of other materials apart from what the linguist will be producing

• Assuming that the linguist will be the one doing most of the writing.

• Implicitly assuming a grammar/dictionary/texts model (more on this below).

• None of these assumptions are crucial; they’re just there so we can limit the topic a bit.

PRINCIPLES OF DOCUMENTATION

What is language documentation?
• Documentary linguistics as its own subfield.
• Doing things with linguistic data:
  • Getting the data
  • Preserving it
  • Processing it
  • (Analyzing it)

• Cf. Woodbury (2002): Language documentation is the creation, annotation, preservation, and dissemination of transparent records of a language.

• Important for both theoretical and empirical branches of linguistics:
  • typology, historical linguistics, etc.

What shapes the language record?
• The linguist (i.e. you!)
  • Their interests
  • Their abilities
• The speakers and their interests!
• External circumstances
  • funding
  • time available
  • lucky breaks
  • unlucky breaks

Language Documentation as a Language Legacy

• Particularly relevant for endangered languages.
• Your work might be the only substantive record of a language:
  • few speakers
  • the field might view the language as “done”
  • speakers might view the language as “done”

Planned Documentation vs “Collect it all”
• “Making a record of the language”: ‘comprehensive grammar’
• You can’t collect everything.
• All documentation is sampling.

• Unstructured, unanalyzed corpora usually aren’t very useful:
  • They are hard to use;
  • They don’t get worked on;
  • They usually aren’t big enough to test hypotheses computationally;
  • They require native speakers (or people who are already very familiar with the language) to make sense of them. That is fine for languages with a major presence, but what about the quarter of the world’s languages with fewer than 10,000 speakers?

What counts as documentation?

• When is a collection big enough to count as language documentation?

• Is an article in Linguistic Inquiry language documentation?
  • creation
  • annotation
  • preservation
  • dissemination
  • but of only a very small fragment of a language.

How much time/space does a documentary corpus take?
• Depends on the resources:
  • Time
  • Speakers
  • Money
  • Levels of interest

Grammar, Dictionary, Texts
• “The Boasian Trilogy”
  • Structure, Lexicon, Culture
• A way to present the analysis and also allow others to recreate it (or challenge it) from the underlying data.

• Conceived broadly:
  • Capture language structure
  • Capture language in use
  • Capture lexicon and meaning

Sampling: Documentation as snapshots
• A big part of documentation is constructing a good set of “samples”.
• To do that, you will need to consider what the purpose of the documentary record is. That is, why are you collecting data on the language?
  • “to make a lasting record of the language”
  • “to reclaim the language for future speakers”
  • “to write a reference grammar”
  • “to document the culture in the traditional language”
  • “to investigate a particular aspect of the language”
  • all of the above…
  • …

Sampling
• Are your “snapshots” representative?
  • Speakers
  • Subjects/Topics
  • Grammatical constructions
  • Lexicon
  • …
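One way to check whether your snapshots are representative of speakers and topics is to tally recorded time from a session log. The sketch below is illustrative only: it assumes a hypothetical log format (speaker, topic, minutes) filled with placeholder entries, and minutes recorded is just one possible measure of coverage.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Session(NamedTuple):
    speaker: str
    topic: str
    minutes: int


def minutes_by(log: List[Session], attribute: str) -> Counter:
    """Total recorded minutes for each value of `attribute` ("speaker" or "topic")."""
    totals: Counter = Counter()
    for session in log:
        totals[getattr(session, attribute)] += session.minutes
    return totals


def shares(totals: Counter) -> Dict[str, float]:
    """Each value's proportion of the total, rounded to two decimals."""
    grand_total = sum(totals.values())
    return {key: round(value / grand_total, 2) for key, value in totals.items()}


# Placeholder log entries, not real corpus data.
log = [
    Session("Speaker A", "narrative", 40),
    Session("Speaker A", "elicitation", 60),
    Session("Speaker B", "conversation", 25),
    Session("Speaker A", "narrative", 30),
]
print(shares(minutes_by(log, "speaker")))  # {'Speaker A': 0.84, 'Speaker B': 0.16}
```

Here one speaker accounts for about 84% of the recorded time, the kind of imbalance worth noticing while there is still time to record other speakers. The same tally by `"topic"` shows which subjects are under-sampled.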

Planned versus opportunistic collection
• Planned:
  • translated sentences
  • grammaticality judgments
  • etc.

• Unplanned (or planning gone wrong):
  • Speakers reinterpret your prompts and construct narratives from them.
  • A new speaker comes to a session and wants to tell stories.
  • You find a new (to you) morpheme in your data and want to find out how it works.
  • You overhear a new construction in conversation.

What constitutes a documentary corpus?
• ***Everything***
  • sound files
  • videos
  • transcripts
  • (elicitation prompts: part of the annotation)
  • photographs
  • maps
  • (artifacts)
  • metadata (data about the data)
  • metametadata
  • …
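Metadata is easiest to keep consistent when every session gets the same structured record. The sketch below uses a hypothetical set of fields (session ID, date, language, speakers, recorder, files, prompts) serialized to JSON. The field names are assumptions for illustration, not the schema of any particular archive or of a metadata standard such as IMDI or OLAC.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical field set for one recording session; adapt it to whatever
# schema your archive actually requires.
@dataclass
class SessionMetadata:
    session_id: str
    date: str                # ISO 8601 date string
    language: str
    speakers: List[str]
    recorder: str            # equipment used
    files: List[str] = field(default_factory=list)    # media and transcript files
    prompts: List[str] = field(default_factory=list)  # elicitation prompts used
    notes: str = ""


REQUIRED = ("session_id", "date", "language", "speakers", "recorder")


def missing_fields(meta: SessionMetadata) -> List[str]:
    """Names of required fields that were left empty."""
    return [name for name in REQUIRED if not getattr(meta, name)]


def to_json(meta: SessionMetadata) -> str:
    """Serialize the record; ensure_ascii=False keeps IPA and other non-ASCII text readable."""
    return json.dumps(asdict(meta), ensure_ascii=False, indent=2)


# Placeholder values for illustration only.
example = SessionMetadata(
    session_id="S001",
    date="2013-07-01",
    language="Language X",
    speakers=["Speaker A"],
    recorder="solid-state recorder",
    files=["S001.wav"],
)
print(to_json(example))
```

Filling in `missing_fields` at the end of each session is a cheap way to catch gaps while the speakers and circumstances are still fresh in memory.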

WORKFLOW AND DATA TYPES

Workflow:

1. What do you need to do to document a language?
2. What order do you need to do it in?
3. (How will you know if it’s been done right?)

Scaled workflow
• Project as a whole (timescale of years)
  • e.g. “Bardi language documentation”

• Immediate tasks (timescale of weeks or months)
  • e.g. “Bardi learners’ guide”

• Subtasks (timescale of days or weeks)
  • e.g. “write the section on numbers”

• Data gathering (timescale of a single session)
  • e.g. “get data on numerals in use”

Workflow while on fieldwork

HARDWARE

Sample field kit:
• Equipment:
  • Laptop
  • Audio recorder
  • Video recorder
  • + microphones
  • + backup means of recording (e.g. from laptop, second recorder)

• Media:
  • backup devices [hard drive, DVDs, etc.]
  • memory cards for recorders
  • paper! pens!

• Other:
  • ways of keeping the equipment clean
  • carry bag
  • stills camera (cell phone, iPad, etc.)
  • batteries, other power equipment
  • tripod

• Stimuli/research prompts
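Backup devices only help if the copies are actually intact. A minimal sketch, assuming recordings are plain files in a directory tree: it compares the SHA-256 checksum of each original against its copy on the backup drive. The function names are hypothetical, not part of any existing archiving tool.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks so large media files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir) -> List[str]:
    """Relative paths of files that are missing from the backup or differ from the original."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(rel.as_posix())
    return problems


# Hypothetical paths; substitute your recorder's card and your backup drive.
# print(verify_backup("/path/to/recordings", "/path/to/backup"))
```

An empty list means every original file has an identical copy; running the check before reformatting a memory card avoids deleting the only good version of a recording.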

Audio
• The field has converged on solid-state recorders using SD cards:
  • Handy Zoom H2 or H4 (or H6 coming soon!)
  • Edirol R-09
  • Marantz PMD 660 or 670

• And/or laptops
  • (or laptop plus external sound card/preprocessor)

• What to look for:
  • small/portable
  • AA batteries
  • high quality, lossless formats
  • easy to use
  • easy to transfer data
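Lossless formats are large, which matters when choosing memory cards and backup drives. Assuming uncompressed PCM WAV, the audio data takes sample rate × bytes per sample × channels × seconds: at 44.1 kHz, 16-bit mono that is 44,100 × 2 × 1 × 3,600 = 317,520,000 bytes (about 318 MB) per hour, and at 48 kHz, 24-bit stereo about 1.04 GB per hour. The sketch below does that arithmetic and reads the settings back from a plain PCM WAV file with Python's standard `wave` module, as a quick check that the recorder was set up as intended. The helper names are hypothetical.

```python
import wave
from typing import Tuple


def pcm_bytes_per_hour(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Size of one hour of uncompressed PCM audio data (file header not included)."""
    return sample_rate * (bit_depth // 8) * channels * 3600


def wav_settings(path) -> Tuple[int, int, int]:
    """(sample rate in Hz, bit depth, channel count) of a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate(), w.getsampwidth() * 8, w.getnchannels()


print(pcm_bytes_per_hour(44100, 16, 1))  # 317520000
print(pcm_bytes_per_hour(48000, 24, 2))  # 1036800000
```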

Not recommended:
• Dictaphones
• Cassette recorders
• DAT

Video
• Less consensus on models
• Major component of the documentation or side-project?
• Options:
  • smart phone
  • iPad
  • stills camera with video function
  • dedicated video camera

• Look for:
  • SD card
  • mic jack

• Problems:
  • MPEG vs other proprietary video formats
  • large files
  • memory-intensive

Microphones
• headset vs lapel vs meeting microphone
• dynamic vs condenser; cardioid vs omnidirectional
• wired vs wireless
• XLR vs 1/8" (3.5 mm) jack
• The built-in mics in the Edirol, Handy, etc., are also OK

• You get what you pay for, approximately.

• Remember that microphone placement and volume monitoring are much more important than the quality of the microphone (far more recordings are ruined through the former than the latter).
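Volume monitoring happens during the session, but clipped (overloaded) passages can also be caught afterwards, while it is still possible to re-record. A minimal sketch, assuming 16-bit PCM WAV files: it reports the fraction of samples at or near full scale. The 0.99 threshold is an arbitrary illustrative choice, not a standard, and the function name is hypothetical.

```python
import array
import sys
import wave


def clipped_fraction(path, threshold: float = 0.99) -> float:
    """Fraction of samples in a 16-bit PCM WAV file at or above `threshold` of full scale."""
    with wave.open(str(path), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = w.readframes(w.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    limit = threshold * 32767
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)
```

A nonzero result on a speech recording usually means the input level was set too high; clipping cannot be undone in post-processing, which is why monitoring levels during the session matters.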

Computer
• Laptop
  • Lots of memory
  • Lots of hard drive space
  • Usually don’t need ruggedization features
  • Either get the cheapest possible model and assume it won’t last more than a season, or try for a higher-end model

• Special considerations for high-altitude, high-humidity, or low-temperature work:
  • High altitude: hard drives fail; use solid-state drives
  • High humidity: condensation issues
  • Low temperatures: battery issues (See Lanz 2010)

Tablets?
• Most language software won’t run on iPads or other tablets.
• Great for stimuli, backup recorder, camera, etc.
• Too much data
