Language documentation 20 years on

Language documentation 20 years on. Peter K. Austin, Department of Linguistics, SOAS, University of London. 36th International LAUD Symposium, Landau, 3rd April 2014

Upload: peter-austin

Post on 18-Dec-2014

DESCRIPTION

Presentation at 36th LAUD Symposium, 3rd April 2014

TRANSCRIPT

Page 1: Language Documentation 20 years on

Language documentation 20 years on

Peter K. Austin

Department of Linguistics

SOAS, University of London

36th International LAUD Symposium, Landau 3rd April 2014

Page 2: Language Documentation 20 years on

© 2014 Peter K. Austin

Creative Commons licence: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Page 3: Language Documentation 20 years on

Outline

• Language documentation in 1995 and today

• Identifying developments and trends

• Some current challenges

   • Documentation practice

   • Archiving

   • The output gap

• Conclusions

Page 4: Language Documentation 20 years on

Note

Today’s presentation is an attempt at a critical analysis of experiences across the world over the past 20 years, not to criticise or blame anyone, but in order to seek to understand developments and possible directions for the future.

The analysis builds on work with colleagues at SOAS and elsewhere but I alone am to blame for any errors or shortcomings.

Page 5: Language Documentation 20 years on

Language documentation

• “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998)

• has developed over the last 20 years in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them, fuelled also by developments in information, media and communication technologies

• concerned with roles of language speakers and their rights and needs

Page 6: Language Documentation 20 years on

What documentary linguistics is not

• it's not about collecting stuff to preserve it without analysing it

• it's not = description + technology

• it's not necessarily about endangered languages per se

• it's not a fad

Page 7: Language Documentation 20 years on

Indicators that Lang Doc has ‘arrived’

Graduate student interest

• 140 students graduated from SOAS MA in Language Documentation and Description 2004-14 – currently 27 are enrolled

• 10 graduates in PhD in Field Linguistics – 20 currently enrolled

• other documentation programmes, e.g. at UT Austin, have had similar experiences

Page 8: Language Documentation 20 years on

Publications: books and journals

• Gippert et al 2006 Essentials of Language Documentation. Mouton

• Tsunoda 2006 Language endangerment and language revitalization: an introduction

• Language Documentation and Description – 11 issues (2,000+ copies sold), 2 in prep

• Language Documentation and Conservation – 6 issues (on-line only)

• Cambridge Handbook of Endangered Languages 2011

• Routledge Essential Readings 2011

• Oxford Bibliography Online 2012

Page 9: Language Documentation 20 years on

Big money – DoBeS projects

Page 10: Language Documentation 20 years on

ELAR deposits

Page 11: Language Documentation 20 years on

Main features (Himmelmann 2006:15)

• Primary data – collection and analysis of an array of primary language data to be made available for a wide range of users;

• Accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected;

• Long-term storage and preservation of primary data – includes a focus on archiving in order to ensure that documentary materials are made available to potential users now and into the distant future;

Page 12: Language Documentation 20 years on

Main features (cont.)

• Interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to mainstream (“core”) linguistics alone

• Cooperation with and direct involvement of the speech community – active and collaborative work with community members both as producers of language materials and as co-researchers

• Outcome is annotated and translated corpus of archived representative materials on a language

Page 13: Language Documentation 20 years on

LangDoc promised

• To make linguistics what many have claimed it always wanted to be, i.e. “the scientific study of human language”, by:

   • Paying proper attention to data (making linguistics properly empirical)

   • Paying proper attention to analysis in relation to data (metadata, value-adding to the corpus)

• To change the socio-political academic balance between “fieldworkers” and “armchair linguists” (typologists, theoreticians) by providing a foundation (theory, best practices) for data collection and analysis

• To change the balance between “outsider” (linguist) and “insider” (speaker, community member) through empowerment, skills transfer and training

Page 14: Language Documentation 20 years on

• Language Documentation has failed to live up to its promises in all three areas, and in many ways continues what has been seen as “normal science” in Linguistics, especially in relation to outputs and evaluations of them

• There are many challenges facing the field, but also exciting opportunities to be explored – we identify some of these later

Page 15: Language Documentation 20 years on

A 2010 example – Stuart McGill

• 4-year PhD project at SOAS, plus 2-year post-doc

• documentation of Cicipu (Niger-Congo, north-west Nigeria) in collaboration with native speaker researchers

• outcomes:

   • a corpus of texts (video, ELAN, Toolbox)

   • 2,000 item lexicon

   • archive (956 files, 50 Gbytes)

   • overview grammar (134 pages)

   • analysis of agreement (158 pages)

   • website, cassette tapes, books, orthography proposal and workshop

Page 16: Language Documentation 20 years on

Stuart McGill Cicipu corpus

Page 17: Language Documentation 20 years on

Cicipu Toolbox

Page 18: Language Documentation 20 years on

The documentation model 2000-2010

Noah’s arc(hive) – saving the morphemes 2-by-2

Page 19: Language Documentation 20 years on

Despite the rhetoric

• lone wolf linguists primarily focussed on language

• little interdisciplinary interest

• the linguist decides what to deliver to communities (dictionaries, orthographies, story collections, etc.)

Page 20: Language Documentation 20 years on

Key concepts in this period

• Standards (data, metadata, project designs)

• Tools (transcription, glossing)

• Preservation (“archival standard”)

Page 21: Language Documentation 20 years on

Consequences

• objectification and commodification: “reduction of languages to common exchange values, particularly in competitive and programmatic contexts such as grant-seeking and standard-setting where languages are necessarily compared and ranked” (Dobrin, Austin, Nathan)

• lack of audio skills: little or no knowledge about recording arts and microphone types, properties and placement (microphone choice and handling are the single greatest determiner of recording quality)

• video madness: video recordings made without reference to hypotheses, goals, or methodology, simply because the technology is available, portable and relatively inexpensive

• corpus taming: little ability at corpus and metadata management, file naming and bundle organisation
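
One way to picture the "bundle organisation" mentioned in the last point is the sketch below. It is only an illustration under stated assumptions: the file names, the naming pattern (language code, date, session label) and the function names `group_into_bundles` and `unannotated` are all hypothetical and do not come from any particular archive or project.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_into_bundles(filenames):
    """Group files that share a basename (stem) into one bundle."""
    bundles = defaultdict(list)
    for name in filenames:
        stem = PurePosixPath(name).stem
        bundles[stem].append(name)
    # Sort bundle names and members so the result is stable and easy to inspect.
    return {stem: sorted(files) for stem, files in sorted(bundles.items())}


def unannotated(bundles, media_exts=(".wav", ".mp4"), annotation_exts=(".eaf",)):
    """Return names of bundles that hold media but no annotation file."""
    missing = []
    for stem, files in bundles.items():
        exts = {PurePosixPath(f).suffix.lower() for f in files}
        if exts & set(media_exts) and not exts & set(annotation_exts):
            missing.append(stem)
    return missing


# Hypothetical file names following an assumed code_date_session pattern.
files = [
    "xyz_2009-03-14_story01.wav",
    "xyz_2009-03-14_story01.eaf",
    "xyz_2009-03-14_story01.mp4",
    "xyz_2009-03-15_convo02.wav",
]
bundles = group_into_bundles(files)
print(unannotated(bundles))
```

With consistent file naming, a check like this can flag recordings that still lack any annotation before a deposit is made; with inconsistent naming, the grouping itself breaks down, which is the practical cost of poor corpus management.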

Page 22: Language Documentation 20 years on

ILG blindness

many documenters believe that interlinear glossing is the ‘gold standard’ of annotation, but it is very time-consuming and illegible to non-linguists – overview annotations may be preferred as a primary goal: a ‘roadmap’ or index of a recording, giving approximately time-aligned information about what is in the recording, who is participating, and other interesting phenomena

Page 23: Language Documentation 20 years on

Holton 2014

Item 408: Oral Literature Collection, Tape 343, Side B. Robert Zuboff (Kak’weidí clan, Kaakáakw Hít) and Susie James (Chookaneidi clan, T’akdeintaan yádi), July 27, 1972; interviewed by Nora Marks Dauenhauer, migrated from reel to CD. Length 60:14. Content by DK: story of how the Sea Otter came to be is told, 0-4:15; raven sounds are given by Zuboff, and their meaning/use, 4:16-11:10; Zuboff tells a story about a man who became an invisible man (tlékanáa) (13:24); 11:11-13:24; story of a man named Naawan that bit the tongue off of a raven, 13:25-16:09; general conversation and questions about Tlingit phrases, 16:10-19:57; story of a man named Gáneix, 19:58-21:40; discussion about language and storytelling, mention of the Salmon Boy story, 21:41-24:12; Zuboff tells the story about the Woman that Raised the Wood Worm, attributes the story’s people, 24:13-27:34; Susie and Nora talk, Susie speaks about the Man Who Commanded the Tides (Yookis´kookeik) and his sister and raven. She then tells the story of bringing in the house that was way out on the ocean and how raven got the octopus tentacle to bring in the house. She then talks about the type of resources that were in the house but not in detail. She mentions the whale, cod etc. She then goes back to the man who commanded the tide and rescues his mother by placing her in the skin of a black duck, 27:35 to the end of the recording. Notes on file.
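
An overview annotation of this kind can be treated as a simple time-aligned index. The sketch below is a minimal illustration, assuming each entry is reduced to a start time, an end time and a short summary; the `Segment` class and the helper names are hypothetical, and only three of the time ranges from the entry above are used.

```python
from dataclasses import dataclass


def to_seconds(stamp):
    """Convert an 'M:SS' or 'MM:SS' time stamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


@dataclass
class Segment:
    start: int    # seconds from the beginning of the recording
    end: int      # seconds, inclusive
    summary: str  # free-text description of the content


def find_segment(segments, stamp):
    """Return the segment covering the given time stamp, or None."""
    t = to_seconds(stamp)
    for seg in segments:
        if seg.start <= t <= seg.end:
            return seg
    return None


# Three ranges taken from the catalogue entry; the rest are omitted.
segments = [
    Segment(to_seconds("0:00"), to_seconds("4:15"),
            "story of how the Sea Otter came to be"),
    Segment(to_seconds("4:16"), to_seconds("11:10"),
            "raven sounds and their meaning/use"),
    Segment(to_seconds("13:25"), to_seconds("16:09"),
            "story of a man named Naawan"),
]
print(find_segment(segments, "5:00").summary)
```

Even this coarse index lets a non-linguist jump to the part of a 60-minute recording they care about, which interlinear glossing alone does not offer.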

Page 24: Language Documentation 20 years on

Files, files and more files

• data – for the sake of data (mining)

• archivism – quantifiable properties such as recording hours, data volume, and file parameters, and technical desiderata like ‘archival quality’ and ‘portability’ become reference points in assessing the aims and outcomes of language documentation – these are not measures of quality

[Figure labels: “documentary dog”, “archiving tail”, marked with an X]

Page 25: Language Documentation 20 years on

Important concepts since 2010

• diversity: of goals, contexts, people, data, corpora, outcomes

   • move away from Noah’s Arc(hive) to more focused documentation, e.g. ELDP 2012 grant list: bark cloth making, libation rituals, fishing practices, child language, interactive speech, and ethnobotany

• diverse inputs – field interviews, experiments and observations (traditionally the bread and butter of documentation and description) but also YouTube uploads, Twitter feeds, Facebook, blogs, email, chat, Skype, local pedagogy in revitalisation

• diverse outputs – books, papers and archive deposits (the bread and butter of 1990s documentation) but also YouTube uploads, Twitter posts, Facebook, blogs, email, chat, Skype, local pedagogy in revitalisation, mobile apps

Page 26: Language Documentation 20 years on

• collaboration: working with communities to determine project goals and outcomes

• Archiving 2.0: building on Web 2.0 models that link people (rather than documents or files) to create contexts for exchange and sharing, with language archives as a locus for interactivity

• incremental documentation and archiving

Page 27: Language Documentation 20 years on

Archive 2.0: social media models

• traditionally archiving focussed heavily on preservation

• however documentation often deals with highly sensitive topics (sacred stories, gossip)

• needs powerful but flexible access management

• transparency – ease of understanding

• use positively – social networking model

• access through relationships

• relationships and sharing produce new opportunities

• ELAR URCS system

Page 28: Language Documentation 20 years on

ELAR URCS system

• e.g. Trevor Johnston Auslan deposit

• Logged in user displays

Page 29: Language Documentation 20 years on

OAIS model

OAIS archives define three types of ‘packages’: ingestion, archive, dissemination.

[Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
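
The flow between the three package types can be sketched as follows. This is an illustrative model only: the OAIS reference model names the packages Submission, Archival and Dissemination Information Packages (SIP, AIP, DIP), but the `Package` class, the `ingest` and `disseminate` functions, the file names and the identifier format are all assumptions made for this example and do not describe how any real archive works.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str    # "SIP", "AIP" or "DIP"
    files: tuple
    metadata: dict = field(default_factory=dict)


def ingest(sip, archive_id):
    """Turn a producer's submission package into an archival package."""
    if sip.kind != "SIP":
        raise ValueError("only submission packages can be ingested")
    # Copy the metadata so the producer's original package is left unchanged.
    meta = dict(sip.metadata, archive_id=archive_id)
    return Package("AIP", sip.files, meta)


def disseminate(aip, wanted_suffixes):
    """Derive a dissemination package holding only the requested file types."""
    if aip.kind != "AIP":
        raise ValueError("only archival packages can be disseminated")
    files = tuple(f for f in aip.files if f.endswith(tuple(wanted_suffixes)))
    return Package("DIP", files, dict(aip.metadata))


# Hypothetical deposit: one recording and its annotation file.
sip = Package("SIP", ("story01.wav", "story01.eaf"), {"depositor": "example"})
aip = ingest(sip, "deposit-0001")
dip = disseminate(aip, [".eaf"])
print(dip.kind, dip.files)
```

The point of the separation is that what a producer hands over, what the archive stores, and what a designated community receives need not be the same set of files.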

Page 30: Language Documentation 20 years on

ELAR archive 2.0 model

Page 31: Language Documentation 20 years on

Rethinking archive participation

• users

   • e.g. add bookmarks, negotiate access

• depositors

   • e.g. updating and editing content

   • negotiate access

   • monitoring usage

• collaborations

   • exchange & share information

   • establish groups

   • community curation

Page 32: Language Documentation 20 years on

User xx has just applied for access to restricted material in the deposit johnston2012auslan. The following message was attached to the application:

"Hello [depositor], xx here. I'm interested in having a look at some of your video deposit, including annotation files. I am working on a project documenting Central Australian Indigenous sign with yy (see http://iltyemiltyem.tumblr.com/). If ok, I'd like to see how you do the annotation - we have worked out a template and annotation protocol, but this needs a lot of refinement. Regards, MC"

Application: from depositor’s friend, re methods

Page 33: Language Documentation 20 years on

This email is to inform you that user xx's application for access to restricted material in the deposit kunbarlang-389 has just been approved. The depositor included the following note to the user:

"Hi xx, I've approved your access to this collection, but you should know that there is an update in the material I've just deposited, with much more information on both music and texts. I'd be happy to give you access to that when it is processed.

Next time I come to London (October or November this year) I'd be happy to meet up if you would like to discuss."

Response: further info and offer to meet

Page 34: Language Documentation 20 years on

User xx has just applied for access to restricted material in the deposit cappadocian-375. The following message was attached to the application:

"Dear [depositor], I work as a research assistant in Nevsehir University in Cappadocia, Turkey. As you know, Cappadocian language has some relics in this region despite speakers of Cappadocian do not live anymore. In my university, there are few research on this subject with collaboration of Greek friends and local societies … I would like to access to your material … By the way, i would like to interview with you about Cappadocian language for our international journal of art and language. I hope you will have time for our journal . Thank you in advance."

Application: establish credentials and make request

Page 35: Language Documentation 20 years on

This email is to inform you that user xx's application for access to restricted material in the deposit johnston2012auslan has just been approved. The depositor included the following note to the user:

"I am giving you user access which means you should be able to see the ELAN eaf annotation files for the topics "The boy who cried wolf" and for "The hare and the tortoise". You should also be able to see most other movies except those tagged "1a" "4a" and "5". If you cannot see the ELAN eaf annotations I hope the problem will be fixed soon. I told the ELAR team about this."

Response: approval with details and guide

Page 36: Language Documentation 20 years on

Response with advice about usage

“I would have no objection to you getting the movies of these conversation(s) and the eafs from us. Please contact me directly at my work email …

Remember however that the conversational material should not be shown publicly or in a publication if there is any suggestion the participants might feel embarrassed by being identified and people seeing what they have said. (They did give their permission for the corpus to be accessible and viewable, but sometimes people have said things they regret and would not like shown publicly. I made this restriction after seeing the videos and reconsidering their privacy issues.)”

Page 37: Language Documentation 20 years on

Rethinking the archive model

• progressive archiving – a challenge to whole approach of documentary linguistics so far

• establish user account at beginning of project – users add and manage/update resources over time

• user accounts show access and usage/downloads analytics – cf. Academia.edu

Page 38: Language Documentation 20 years on

“classical” archiving: collect, process, publish, then archive (and hope that death does not intervene)

progressive archiving: collect resources/data and archive them as the project proceeds

Page 39: Language Documentation 20 years on

Summary re Archiving 2.0

• flavour of archives changes from finality and completeness to open and evolutionary

• questions for archives about what a “deposit” or “depositor” really is

• archives recast as providers of services within a revised, ‘holistic’ documentation

Page 40: Language Documentation 20 years on

Meta-documentation

• meta-documentation = documentation of language documentation models, processes and outcomes

• the goals, methods and conditions (linguistic, social, physical, technical, historical, biographical) under which the data and analysis were produced

• meta-documentation should be as rich and appropriate as the documentary materials themselves

Page 41: Language Documentation 20 years on

Why?

• developing good ways of presenting and using language documentations

• future preservation of the outcomes of current documentation projects

• sustainability of field

• helping future researchers learn from the successes and failed experiments of those presently grappling with issues in language documentation (Austin 2010)

• documenting IP contributions and career trajectories (Conathan 2011)

Page 42: Language Documentation 20 years on

Meta-documentation categories

• identity of stakeholders involved and their roles in the project

• attitudes of language consultants, both towards their languages and towards the documenter and documentation project

• relationships with consultants and community

• goals and methodology of researcher, including research methods and tools (see Lüpke 2010), corpus theorisation (Woodbury 2011), theoretical assumptions embedded in annotation (abbreviations, glosses), potential for revitalisation

Page 43: Language Documentation 20 years on

• biography of the project, including background knowledge and experience of the researcher and main consultants (eg. how much fieldwork the researcher had done at the beginning of the project and under what conditions, what training the researcher and consultants had received)

• for funded projects, includes original grant application and any amendments, reports to the funder, email communications with the funder and/or any discussions with an archive

Page 44: Language Documentation 20 years on

Shifting the sociology of the academy?

• The development of language documentation from 1995 looked like a possible avenue to legitimise data collection and analysis and shift the sociological power balance between ‘theoretical linguists’ and ‘fieldworkers’ (or ‘butterfly collectors’) as it developed its own theoretical and analytical machinery

• This is the context that led in 2010 to the LSA Resolution Recognizing the Scholarly Merit of Language Documentation

Page 45: Language Documentation 20 years on

LSA Resolution Recognizing the Scholarly Merit of Language Documentation

“[a] shift in practice has broadened the range of scholarly work to include not only grammars, dictionaries, and text collections, but also archives of primary data, electronic databases, corpora, critical editions of legacy materials, pedagogical works designed for the use of speech communities, software, websites, or other digital media;

the products of language documentation and work supporting linguistic vitality are of significant importance to the preservation of linguistic diversity, are fundamental and permanent contributions to the foundation of linguistics, and are intellectual achievements which require sophisticated analytical skills, deep theoretical knowledge, and broad linguistic expertise;

Page 46: Language Documentation 20 years on

“the Linguistic Society of America supports the recognition of these materials as scholarly contributions to be given weight in the awarding of advanced degrees and in decisions on hiring, tenure, and promotion of faculty. It supports the development of appropriate means of review of such works so that their functionality, import, and scope can be assessed relative to other language resources and to more traditional publications”

But this has not happened – why?

Page 47: Language Documentation 20 years on

There is an output gap

Page 48: Language Documentation 20 years on

The output gap

• Outputs from language documentation projects have bifurcated into:

• Published grammars, (bilingual) dictionaries and (glossed) texts – ‘revival’ of familiar genres linguists have been comfortable with for 100+ years

• Archive deposits – hundreds or thousands of files, professionally curated by archivists, but often poorly organised or structured, with little if any contextualisation

Page 49: Language Documentation 20 years on

What is missing?

• Meta-documentation – the documentation of documentation projects, goals, methods, IP contributions, outcomes

• New (unfamiliar) genres that link and contextualise analytical outputs and the archival corpus:

   • ethnographies of documentation project designs

   • accounts of data collection (cf. archaeology ‘field report’)

   • finding-aids to corpus collections

   • ‘exhibitions’ or ‘guided tours’ of archival deposits

• Evaluation measures that enable properly-based peer assessment of documentations, equivalent to the way traditional outputs are judged

Page 50: Language Documentation 20 years on

Open access refereed online publication

• provides a new (relatively inexpensive) platform to shift power away from traditional publishers

• unfortunately, current attempts to do this, eg. Language Science Press, merely replicate in digital form existing and familiar genres of output, while pushing the costs of formatting etc. back to the author (and proofing to ‘volunteers’)

• we need new genres, new experiments in publication and evaluation to bridge the output gap and to realise the potential that language documentation promised to rebalance the sociology of linguistics as a field

Page 51: Language Documentation 20 years on

EL Publishing

• A new online venture to be launched soon which will:

   • have the infrastructure of familiar models of publication (editorial board, peer assessment, etc.)

   • provide a platform to encourage experiments in new genres of output

   • provide a space and an interface to move towards evaluations of these new outputs so that the underlying desire of the LSA statement might be realised

Page 52: Language Documentation 20 years on

Conclusions

• 20 years ago Language Documentation promised a new approach to the study of human language that paid better attention to data collection and analysis

• it appeared to be an opportunity to shift the socio-political academic balance between “fieldworkers” and “armchair linguists” (typologists, theoreticians) by providing a foundation (theory, best practices) for documentation, in contrast to language description

• Over the past 20 years, and especially the last 10 years, we have seen shifts in the goals, methods, foci and contexts of Language Documentation to make it more pluralistic, open, and socially networked and responsive

• However, challenges remain, including encouraging new genres that bridge the output gap, more reflexivity, and better engagement with interdisciplinarity and the ethnography of our research and its contexts

Page 53: Language Documentation 20 years on

Thank you!