Current Trends in Documentation of Endangered Languages Peter K. Austin ELAP, Department of Linguistics SOAS

Post on 23-Dec-2015



Thanks to Oliver Bond, Lise Dobrin, Lenore Grenoble, David Nash and David Nathan for discussion of the ideas in this presentation; they are absolved of responsibility for errors.

Outline

Documentary linguistics and language documentation

Components and skills for documentation

Some current issues and future concerns

Conclusions

Documentary linguistics

new field of linguistics “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998, 2006)

has developed over the last decade in large part in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them, fuelled also by developments in information and communication technologies

essentially concerned with role of language speakers and their rights and needs

Features of documentary linguistics

Himmelmann (2006:15) identifies important new features of documentary linguistics:

Focus on primary data – language documentation concerns the collection and analysis of an array of primary language data to be made available for a wide range of users;

Explicit concern for accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected;

Concern for long-term storage and preservation of primary data – language documentation includes a focus on archiving in order to ensure that documentary materials are made available to potential users into the distant future;

Work in interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to linguistics alone;

Close cooperation with and direct involvement of the speech community – language documentation requires active and collaborative work with community members both as producers of language materials and as co-researchers.

A contrast

language documentation: activity of systematic recording, transcription, translation and analysis of the broadest possible variety of spoken (and written) language samples collected within their appropriate social and cultural context

language description: activity of writing grammar, dictionary, text collection, typically for linguists

Ref: Himmelmann 1998, Woodbury 2003

Uses of documentation

documentation outputs are multifunctional for:
linguistic research - phonology, grammar, discourse, sociolinguistics, typology, historical reconstruction
folklore - oral literature and folklore
poetics - metrical and musical aspects of oral literature
anthropology - cultural aspects, kinship, interaction styles, ritual
oral history, and
education - applications in teaching language revitalisation

Users of documentation

collection, analysis and presentation of data
useful not only for linguistics but also for research into the socio-cultural life of the community
analysed and processed so it can be understood by researchers of other disciplines and does not require any prior knowledge of the language in question
usable by members of the speaker community
respects intellectual property rights, moral rights, individual and cultural sensitivities about access and use
and is done in the most ethical manner possible

The documentation record

core of a documentation project is usually understood to be a corpus of audio and/or video materials with transcription, multi-tier annotation, translation into a language of wider communication, and relevant metadata on context and use of the materials

the corpus will ideally be large, cover a diverse range of genres and contexts, be expandable, opportunistic, portable, transparent, ethical and preservable (Woodbury 2003)

as a result documentation is increasingly done by teams rather than ‘lone wolf linguists’

need to see grammatical analysis and description as a tertiary-level activity contingent on and emergent from the documentation corpus
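As a rough illustration of the record described above (media with transcription, multi-tier annotation, translation and metadata), the following is a minimal Python sketch. The class names, field names and tier labels are hypothetical and chosen only for this example; they do not reproduce any particular archive's or annotation tool's schema.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Annotation:
    """One time-aligned annotation on a named tier (hypothetical structure)."""
    tier: str       # e.g. "transcription", "translation", "gloss"
    start_ms: int   # start of the annotated span in the media file
    end_ms: int     # end of the annotated span in the media file
    value: str      # the annotation text itself


@dataclass
class Session:
    """One recorded event with its metadata and annotations (hypothetical)."""
    session_id: str
    media_files: List[str]   # audio and/or video files
    genre: str               # metadata on context and use
    speakers: List[str]
    annotations: List[Annotation] = field(default_factory=list)

    def tiers(self) -> Set[str]:
        """Names of all tiers that have at least one annotation."""
        return {a.tier for a in self.annotations}

    def missing_tiers(
        self, required: Tuple[str, ...] = ("transcription", "translation")
    ) -> List[str]:
        """Required tiers with no annotations yet, in the order given."""
        present = self.tiers()
        return [t for t in required if t not in present]
```

A session that has been transcribed but not yet translated would report `["translation"]` from `missing_tiers()`, which is one simple way of seeing where a corpus still lacks accessible entry points.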

Phases in documentation project

Project conceptualisation and design
Establishment of field site and permissions
Funding application
Data collecting and processing (including archiving)
Creation of outputs
Monitoring, evaluation and reporting

Phases in data collection and analysis

Recording – of media and text (including metadata)

Capture – analogue to digital transfer

Analysis – transcription, translation, annotation, notation of metadata

Archiving – creating archival objects, assigning access and usage rights

Mobilisation – publication and distribution of materials

Some current issues and challenges

Documentation versus description
The ‘representative’ record
Quality of language documentation
Commodification
Interdisciplinarity
Training for language documentation
Communicating with the wider world

Documentation vs description

Himmelmann and others have tried to distinguish language documentation from language description, but it is unclear whether such a separation is truly meaningful and, even if it is, where the boundaries between the two might lie.

Documentation projects must rely on application of theoretical and descriptive linguistic techniques, if only to ensure that they are usable (i.e. have accessible entry points via transcription, translation and annotation) as well as to ensure that they are comprehensive.

It is only through linguistic analysis that we can discover that some crucial speech genre, lexical form, grammatical paradigm or sentence construction is missing or under-represented in the documentary record.

Without good analysis, recorded audio and video materials do not serve as data for any community of potential users. Similarly, linguistic description without documentary support is sterile, opaque and untestable.

The “representative” record

On a theoretical level, one can define “representative” documentation as the collection of sample texts of all discourse types, all registers and genres, from speakers representing all ages, generations, socioeconomic classes, and so on. On a practical level, however, there are concrete limitations to the range and number of texts which can be collected, transcribed and analysed. Most linguists cannot devote their entire careers to time in the field, which would be required for a truly thorough collection and analysis of data.

A solution (proposed by Seifart in LDD 5) is sampling, i.e. identification of some subset of types that is representative of the language as a whole – but how do we do this in a meaningful way: (i) for an individual language, and (ii) cross-linguistically in a comparable manner?

Sampling criteria

Criteria for differentiation of communicative events:

“Ways of speaking” as distinguished in specific culture / speech community (Ethnography of Communication)
Medium: spoken / written
Plannedness: unplanned / planned
Register: formal / informal
Manner of obtaining data: spontaneous (‘natural’) vs. elicitation vs. stimulated
Target: child-directed / adult-directed / foreigner-directed
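One way to put criteria like these to work is to cross-classify the sessions already collected and list the combinations that have no recordings yet. The Python sketch below does this for three of the criteria above. The dictionary keys and the representation of a session as a plain dict are assumptions made for illustration, and an empty cell is only a prompt for discussion with the community, not proof that a culturally meaningful genre is missing.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding of three of the sampling criteria listed above.
CRITERIA = {
    "medium": ["spoken", "written"],
    "plannedness": ["unplanned", "planned"],
    "register": ["formal", "informal"],
}


def coverage_gaps(sessions, criteria=CRITERIA):
    """Return every combination of criterion values with no session."""
    keys = list(criteria)
    # Count how many sessions fall into each cell of the cross-classification.
    counts = Counter(tuple(s[k] for k in keys) for s in sessions)
    return [
        dict(zip(keys, combo))
        for combo in product(*(criteria[k] for k in keys))
        if counts[combo] == 0
    ]


sessions = [
    {"medium": "spoken", "plannedness": "unplanned", "register": "informal"},
    {"medium": "spoken", "plannedness": "planned", "register": "formal"},
]
for gap in coverage_gaps(sessions):
    print(gap)
```

With two sessions and eight possible cells, the loop prints the six combinations still unrepresented.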

It is clear that the success of a documentation project rests on intimate collaboration with community members. In the ideal, they can be trained to be engaged in data collection themselves, thereby expediting the process (eg. Florey 2004). Even if this is not possible, community members can direct (external) linguists to varying discourse types and to differing speech patterns.

Note however that this could result in focus on rare/unusual/unique discourse types that were in no sense ‘representative’

Himmelmann (2006:66) identifies five major types of communicative events ranged along a continuum from unplanned to planned (next slide). However, it is not clear that this typology is applicable to all languages and all speech communities. Just what counts as a ‘representative’ account of language in use remains unclear, and perhaps the notion should be abandoned.

Himmelmann genres

Parameter    Major Types      Examples
Unplanned    exclamative      Ouch! Fire! Jishin da!
             directive        Scalpel! Sit! Achi ike!
             conversational   greetings, small talk, chat, discussion, interview
             monological      narrative, description, speech, formal address
Planned      ritual           prayer, ceremonial address

Quality of documentation

There is a tendency among some researchers to equate documentation outcomes with archival objects (part of what David Nathan has termed ‘archivism’), that is, the number and volume of recorded digital audio and/or video files and their related transcription, annotation, translation and metadata.

Mere quantity of objects is not a good proxy for quality of research.

Equally, some would argue that outcomes which contribute to language maintenance and revitalization are the true measure of the quality of a documentation project (what better success of an endangered language project than that the language continues to be used?).

So how could we measure ‘quality’ of a documentary corpus? What parameters might be included?

Possible metrics

volume (quantity) as a proxy

form:
media – audio, video, stills – how measured?
text – explicit, transparent, well-structured, standardised, richly detailed, machine-readable
links (relations, hypertext, multimedia) – explicit, well-structured, machine-readable

More possible metrics

content:
new – never inscribed before
unique – not readily replicable
interesting …

organisation and management (workflow, transformations, archiving)

relevance and use of outputs for stakeholders
impact on community of speakers (or other stakeholders)
impact on future of language

Commodification

reduction of languages to things and their treatment as if they were a tradeable commodity

reflected in language documentation through the transformation of languages into bounded objects, indices, technical encodings, and exchangeable goods

results from forces of objectification, standardisation and audit that shape the management of information in contemporary Western culture, especially academic culture with its focus on outputs and counting (eg. RAE, RQF, citation indices, research impact statements etc)

also reflects a theoretical and methodological vacuum that has been filled not by linguistics but by preservationists, archives and technologists

Languages as bounded objects

selections of phenomena crystallised into a singular “language”

languages placed within boundaries, on maps etc.

Languages as indices

language vitality indicators: Unesco defines 9 criteria with 6 scoring levels; SIL uses 8 indicators

these objectify languages: the vitality of an individual language can be quantified, and languages can be ranked according to degree of endangerment

Unesco presents a deterministic relationship between the 9 factors and the vitality and function of languages: “taken together, these nine factors can determine the viability of a language, its function in society and the type of measures required for its maintenance or revitalization”

Languages as exchangeable goods

goal of research is for languages to be ‘preserved’ as ‘resources’ that ‘consumers’ (linguists et al) discover and access via ‘service providers’ (OLAC publicity)

linguists’ professional obligations to speaker communities now often formulated in grant applications and elsewhere in terms of transacted objects (language primers, CDs, books) rather than knowledge sharing, joint engagement in language maintenance activities or other interactions

granting agencies require linguist’s bona fides to be distilled into a ‘letter of support’ from ‘an appropriate representative of the language community’ thus turning a complex of social and political dynamics into an object that is used to legitimise the research

Languages as technical encodings

quantifiable properties (recording hours, data volume, file parameters) and technical desiderata (‘archival quality’, ‘portability’, standardised ontologies) have become reference points in discussing and assessing the methods and goals of documentation

results in grant application by formula: 100 hours of 16-bit 44.1 kHz audio, 25 hours MPEG-2 video, 10% ELAN .eaf files and Toolbox annotations

technical parameters replace balanced discussion of documentation methods; eg. video recordings proposed without reference to hypotheses, goals or methodology; avoidance of data compression substitutes for knowledge of art of audio recording; file formats named rather than corpus structure described
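To see how readily such parameters turn into a headline number, the data volume of uncompressed PCM audio follows directly from duration × sampling rate × bytes per sample × channels. The sketch below assumes 16-bit audio at a 44.1 kHz sampling rate (the standard CD rate) in stereo; the channel count is an assumption, since the formula quoted above does not state it, and container overhead is ignored. The function name is hypothetical.

```python
def pcm_bytes(hours, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Approximate size in bytes of uncompressed PCM audio (headers ignored)."""
    seconds = hours * 3600
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


size = pcm_bytes(100)  # the "100 hours" of the formulaic grant application
print(f"{size / 1e9:.1f} GB")
```

The figure is trivial to compute, which is part of the point: it says nothing about what was recorded, why, or how well.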

Interdisciplinarity

Himmelmann and others have pointed to the importance of taking a multidisciplinary perspective in language documentation and drawing in researchers, theories and methods from a wide range of areas, including anthropology, musicology, psychology, ecology, applied linguistics etc (see Harrison 2005, Coelho 2005, Eisenbeiss 2005).

True interdisciplinary research is difficult to achieve, both because of different theoretical orientations and because of practical differences in approach (ranging from traditionally different practices among linguists and anthropologists concerning payments for consultants, to more significant differences in academic paradigm that make communication and understanding fraught).

Mainstream linguistics has tended to turn away from other disciplines and to emphasise its ‘independence’ by concentrating on theoretical concerns that are of internal interest to linguists only (minimalism, OT phonology – see Libermann 2007).

Documentary linguistics opens new doors to interdisciplinary collaboration but we need to work out how to achieve it.

Reaching the wider world

There are great opportunities for communicating about language and language issues to the general community

At SOAS we have run “Endangered Languages Week” in 2007 and 2008, film showings, public lectures, exhibitions (“Disappearing Voices”), David Crystal’s play (“Living On”)

We see part of your work as ELDP grantees as including outreach and communication activities – we will encourage you to contribute “stories” and images for things like the HRELP annual report, the website etc.

Exhibition

Identifying the gaps

The discourse of endangered languages and language documentation has a strong moral and emotional power which has not been matched by conceptual guidance on what linguistics and linguists can do in response

publications and debates about effective and appropriate documentary methodologies for linguists have been slow to develop, resulting in many unanswered questions: are the goals of documentary linguistics social or formal? are its data symbolic or digital recordings of events? what role(s) should archives play? how could we decide between competing interests?

we lack a framework for assessing quality, value, effectiveness and progress of our work so documentary linguists fall back on established patterns like quantifiable indices and technical standards

Setting some agendas

recognising that some of the challenges described here derive from bureaucratic and technological contexts and should not be taken for granted as defining the discipline

we need to develop a new approach to language documentation that implements the moral and ethical vision that has attracted new participants

replacing the rhetoric that documentation is a separate discipline from descriptive linguistics with a better understanding of their respective goals, methodologies and evaluative criteria

and locating documentation within a wide range of interdisciplinary approaches to human language

with development of appropriate training and outreach

Our goals for the training course

To expose you to good practices in documentation (recording, analysis, archiving, mobilisation, ethics and IPR)

To raise issues that we see as theoretical and practical challenges and to share experiences and ideas (a two-way process)

To begin what we hope is a long-term on-going relationship between you as researchers and us as trainers, archivists, researchers and all round good guys

The end