Helen Dry & Anthony Aristar
LINGUIST List: http://linguistlist.org
LREC Symposium: The Open Language Archives Community
29 May 2002
OLAC, EMELD, & “Us”
OLAC Launch, LREC-02
Who is “Us”?
• The community of academic linguists
• who produce data & documentation on languages
• who use language data & documentation in their research
• Includes most subscribers to The LINGUIST List
The LINGUIST List
• 15,600 subscribers
• 106 different countries
• 4 European mirror sites:
Tübingen | Stockholm
Edinburgh | Moscow
• Current project: EMELD . . .
What is E-MELD?
“Electronic Metastructure for Endangered Languages Data”
• 5-year collaborative project, begun Sept. 2001
• Participants:
  • The LINGUIST List (Eastern Michigan University, Wayne State University, University of Arizona)
  • The Linguistic Data Consortium (University of Pennsylvania)
  • The Endangered Languages Fund (Yale University, Haskins Laboratories)
• Funded by NSF
E-MELD Objectives:
To aid in …
• …the preservation of Endangered Languages (EL) data and documentation
• …the development of infrastructure for linguistic archives
The Problem with EL archives:
• Lack of interoperability < many different procedures and data formats
• Lack of permanence < use of proprietary tools & standards; unstable institutional support
• Inadequate input from linguists into the standards-setting enterprise
Result:
Endangered Languages
plus
Endangered data
EMELD Components
• Catalog of language resources on the Internet
• Promotion of community consensus about best practice in:
  • Language identification
  • Resource description
  • Markup or annotation
• “Showroom of Best Practice”
“Showroom of Best Practice”
• Information on standards & software
• Query Room, where questions may be addressed to native speakers
• Texts and lexicons from 10 EL’s marked up according to best practice
Languages
Mocovi (Guaicuruan)
7000 speakers [EMU]
Biao Min (Mienic)
21,000 speakers [WSU]
Ega (Kwa)
300 speakers [LDC]
Cambap (Mambiloid)
30 speakers [LDC]
Lakota (Macro-Siouan) [ELF]
Tofa (Turkic) [ELF]
Two from: Alamblak, Dadibi, Mapos Buang, Takaulu Kalagan, Tuwali Ifugao [SIL]
Two from Post-Docs, as yet to be determined.
OLAC & EMELD:
OLAC
Common Goals
EMELD
Needed: Collaboration!
Components (each shown with its OLAC-related counterpart)
1. Catalog of resources → OLAC Service Provider
2. Promotion of community consensus about best practice in:
   1. Resource description → OLAC metadata
   2. Language identification → Ethnologue/LINGUIST language codes proposed as OLAC best practice
LINGUIST = Gateway to Language Resources
Archive 1 Archive 2 Archive 3
LINGUIST = OLAC Service Provider
Data Provider 1 Data Provider 2 Data Provider 3
Key = Metadata
What you need to know to understand metadata:
a) Standardization is power (for Computers)
b) Standardization is hard (for People)
• Why??
• Is it really important? Yes
• Is it really as simple as it sounds? Yes
Metadata
Data about data, e.g., cataloguing information
Facilitates resource description, including summarization
Enables search and retrieval
How LINGUIST will use Metadata
• Harvest metadata from OLAC archives
• Collect metadata from individual linguists
• Provide a searchable database of information (metadata) on
  • Language data & documentation
  • Software & tools
  • Standards & formats
An Example
<olac xmlns="http://www.language-archives.org/OLAC/0.3/">
  <creator>Derbyshire, Desmond C.</creator>
  <date code="1986"></date>
  <title>Topic continuity and OVS order in Hixkaryana</title>
  <relation refine="IsPartOf">In Joel Sherzer and Greg Urban (eds.), Native South American discourse, 237-306. Berlin: Mouton.</relation>
  <type code="Text" />
  <type.linguistic code="description/grammatical" />
  <subject>Word order</subject>
  <subject.language code="x-sil-HIX"/>
</olac>
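A record like the one above can be read with a standard XML parser. The following is a minimal sketch in Python, assuming the record is well-formed XML in the OLAC 0.3 namespace shown on the slide; the function name `summarize` and the choice to prefer the `code` attribute over element text are illustrative assumptions, not part of OLAC or of the LINGUIST harvester.

```python
import xml.etree.ElementTree as ET

# The slide's example record, with straight quotes and the opening tag first.
RECORD = """\
<olac xmlns="http://www.language-archives.org/OLAC/0.3/">
  <creator>Derbyshire, Desmond C.</creator>
  <date code="1986"></date>
  <title>Topic continuity and OVS order in Hixkaryana</title>
  <relation refine="IsPartOf">In Joel Sherzer and Greg Urban (eds.), Native South American discourse, 237-306. Berlin: Mouton.</relation>
  <type code="Text" />
  <type.linguistic code="description/grammatical" />
  <subject>Word order</subject>
  <subject.language code="x-sil-HIX"/>
</olac>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def summarize(xml_text):
    """Map each element name to the list of values found in the record.

    Assumption for this sketch: when an element carries a 'code' attribute,
    that controlled value is used; otherwise the element's text is used.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for child in root:
        value = child.get("code") or (child.text or "").strip()
        summary.setdefault(local_name(child.tag), []).append(value)
    return summary


if __name__ == "__main__":
    for name, values in summarize(RECORD).items():
        print(name, "=", values)
```

Values are kept in lists because an element such as `subject` may occur more than once in a record.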
OLAC Metadata . . . built on Dublin Core set of 15 elements:
Contributor, Coverage, Creator, Date, Description, Format, Identifier,
Language, Publisher, Relation, Rights, Source, Subject, Title, Type
Added for Language Resources:
• Subject.language: A language the resource is about
  E.g. A Grammar of Russian written in English has Subject.language = Russian
• Type.linguistic: The nature of the content from a linguistic point of view
  E.g. transcription, annotation, description, lexicon
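The split between the 15 Dublin Core elements and the language-specific additions can be sketched as a simple lookup. This is an illustration only: the set `OLAC_EXTENSIONS` holds just the two elements named on this slide, the function `classify` is hypothetical, and the record uses plain language names rather than codes. The value "description" for Type.linguistic is an assumed choice from the slide's list of examples.

```python
# The 15 Dublin Core elements listed on the slide (lower-cased).
DUBLIN_CORE = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Only the two additions named on the slide; not a complete OLAC list.
OLAC_EXTENSIONS = {"subject.language", "type.linguistic"}


def classify(element_name):
    """Say whether an element is Dublin Core, an OLAC addition, or unknown."""
    name = element_name.lower()
    if name in DUBLIN_CORE:
        return "dublin-core"
    if name in OLAC_EXTENSIONS:
        return "olac-extension"
    return "unknown"


# The slide's example: a grammar of Russian, written in English.
grammar = {
    "title": "A Grammar of Russian",
    "language": "English",          # the language the resource is written in
    "subject.language": "Russian",  # the language the resource is about
    "type.linguistic": "description",  # assumed value, for illustration
}

if __name__ == "__main__":
    for element, value in grammar.items():
        print(f"{element} ({classify(element)}): {value}")
```

Keeping Language and Subject.language as separate fields is what lets a search for resources about Russian find this grammar even though it is written in English.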
Important for LL Searching
<olac xmlns="http://www.language-archives.org/OLAC/0.3/">
  <creator>Derbyshire, Desmond C.</creator>
  <date code="1986"></date>
  <title>Topic continuity and OVS order in Hixkaryana</title>
  <relation refine="isPartOf">In Joel Sherzer and Greg Urban (eds.), Native South American discourse, 237-306. Berlin: Mouton.</relation>
  <type code="Text" />
  <type.linguistic code="description/grammatical" />
  <subject>Word order</subject>
  <subject.language code="x-sil-HIX"/>
</olac>
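Why coded fields matter for searching can be shown with a small in-memory filter. This is a sketch under stated assumptions, not a description of how the LINGUIST search is built: records are plain dictionaries, the function `search` is hypothetical, and the second record is a placeholder invented only to give the filter something to exclude.

```python
RECORDS = [
    {
        # Taken from the slide's example record.
        "title": "Topic continuity and OVS order in Hixkaryana",
        "creator": "Derbyshire, Desmond C.",
        "subject.language": "x-sil-HIX",
        "type.linguistic": "description/grammatical",
    },
    {
        # Hypothetical placeholder record, not a real resource.
        "title": "(placeholder lexicon)",
        "creator": "(placeholder)",
        "subject.language": "x-sil-HIX",
        "type.linguistic": "lexicon",
    },
]


def search(records, criteria):
    """Return the records whose fields equal every value in `criteria`.

    Exact matching works here only because the fields hold controlled
    codes; free-text fields would need a fuzzier comparison.
    """
    return [
        record
        for record in records
        if all(record.get(field) == wanted for field, wanted in criteria.items())
    ]


if __name__ == "__main__":
    # Everything about the language coded x-sil-HIX.
    for hit in search(RECORDS, {"subject.language": "x-sil-HIX"}):
        print(hit["title"])

    # Narrow the same query to grammatical descriptions.
    narrowed = search(
        RECORDS,
        {"subject.language": "x-sil-HIX",
         "type.linguistic": "description/grammatical"},
    )
    print(len(narrowed))
```

Because both archives and individual linguists would supply the same codes, one query can span resources from different data providers.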
What’s been done so far:
- OLAC harvester on the LINGUIST site:
  - http://saussure.linguistlist.org/olac/
- OLAC metadata editor (ORE) on the LINGUIST site:
  - http://saussure.linguistlist.org/olac/ore/
- Language identification:
  - Code list for ancient languages, constructed languages, and language families to complement the Ethnologue code list
  - Everything on LINGUIST site (not just harvested metadata) categorized according to these codes: see Directory of Linguists
What needs to be added? . . .to LINGUIST Gateway
• Advice about software, tools, formats
• User reviews of archives, software
• Look up for
  • Controlled vocabularies
  • OLAC best practice
What needs to be done? . . .on Language Codes
Mechanism ensuring community input into system
Establishment of working group using OLAC process
Promotion of code use among OLAC data providers
Outcome?
Improved:
• Data Access
• Data Permanence
• Accuracy of language representation