Wikis, Standards and Everything
Lee Gillam, University of Surrey
Laurent Romary, Max-Planck Digital Library
Foreword
• Wikification and standards: is this the wrong talk?
– Wiki: Open + free interaction on-line
– ISO: Dusty documents imposing ways of thinking and working
• Still, reusability and preservation of data
– Requires some minimal principles about data representation
• Interoperability
– And there are quite a few practical standards (e.g. ISO 10646)
• Background (outline)
– The demonstrators: OmegaWiki
– The police: ISO (International Organization for Standardization)
– The topic at hand: language descriptions
• Highly complementary to work done here at MPI-EVA (eWALS)
ISO standards
Title of Standard | Status | Registration Authority | Number of identifiers (approx)
ISO 639-1: Part 1: Alpha-2 code | Published (2002) | Infoterm | 150
ISO 639-2: Part 2: Alpha-3 code | Published (1998) | Library of Congress (LoC) | 400
ISO 639-3: Part 3: Alpha-3 code for comprehensive coverage of languages | Published (2007) | Summer Institute of Linguistics (SIL) | 7000
ISO 639-4: Part 4: Implementation guidelines and general principles for language coding | Expected late 2007 | n/a | n/a
ISO 639-5: Part 5: Alpha-3 code for language families and groups | Expected late 2007 | TBC | 100
ISO 639-6: Part 6: Alpha-4 representation for comprehensive coverage of language variation | Expected early 2008 | GeoLang | 25000
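The code formats in the table (alpha-2 for 639-1, alpha-3 for 639-2, 639-3 and 639-5, alpha-4 for 639-6) can be illustrated with a minimal Python sketch. The function name and return format are assumptions made for illustration; the check looks only at the shape of a code, not at whether any registration authority has actually assigned it.

```python
import re
from typing import List

# Code length per ISO 639 part, as given in the table of standards.
# The lookup structure and function below are illustrative assumptions,
# not part of any ISO 639 text.
PARTS_BY_LENGTH = {
    2: ["ISO 639-1"],
    3: ["ISO 639-2", "ISO 639-3", "ISO 639-5"],
    4: ["ISO 639-6"],
}


def candidate_parts(code: str) -> List[str]:
    """Return the ISO 639 parts whose code format matches `code`.

    Only the format (2, 3 or 4 letters) is checked; membership in the
    actual code tables would need the registration authority's data.
    """
    if not re.fullmatch(r"[A-Za-z]{2,4}", code):
        return []
    return PARTS_BY_LENGTH[len(code)]


print(candidate_parts("en"))
print(candidate_parts("eng"))
```

A three-letter code is ambiguous between Parts 2, 3 and 5 on format alone, which is one reason the parts need to be managed together as data rather than as separate lists.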
Wikis for Languages
• Some possible motivations:
– 50% of languages are endangered (UNESCO);
– large proportion of languages have no “resources” and no web presence;
– discontinuity and fragmentation of research;
– sustainability and curation issues
• And yet…
– Capability for capturing data like never before;
– Expansion of capacity of the Internet and growing pressure for an inclusive multilingual internet;
– OLPC programme;
– Language experts and non-experts are prepared to contribute time and resources
• So, how about a Wiki-based infrastructure that allows us to form communities around languages and harmonize results?
Wikis for Languages
• OmegaWiki, a collaborative project to produce a free, multilingual resource in every language, with lexicological, terminological and thesaurus information
• World Language Documentation Centre (WLDC), currently comprising 22 experts in language technologies, linguistics, terminology standardisation, and localisation
• ISO, provision of the ISO 639 series of standards; focus here on 639-4 and 639-6
Wikis for Languages
[Diagram. Labels: ISO 639-6 data; ISO 639-X data; ISO 639-6 standard; ISO 639-X standard; Expert review; Community review & infrastructure; “Auditors”; ISO 639-4 “standards as databases”; ISO 11179 (metadata registries); ISO 12620 (data categories); Co-ordination: SIL, LoC, Infoterm]
Wikis for Languages
• Language Documentation via ISO 639-4: metadata descriptors are associated with a model that is interoperable with DCIF (ISO 12620); see 639-4, section 9
[Diagram. Sections of the model: Name Section; Language Section; Representation Section; Geographical Info; Societal Info; Linguistic Info; Diachronic Info; Temporal Info; Cultural / Religious Info; Documentation; Description (#). Note on the slide: attribution information missing here]
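As a rough illustration of a record organised along the sections named on this slide, here is a minimal Python sketch. The section names are read off the diagram; how they nest is not recoverable from the transcript, so the flat list, the class and its methods, and the descriptor name `reference_name` are all hypothetical and do not reproduce the ISO 639-4 model.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Section names as they appear on the slide; the flat ordering is an assumption.
SECTIONS = [
    "Name", "Language", "Representation", "Geographical", "Societal",
    "Linguistic", "Diachronic", "Temporal", "Cultural / Religious",
    "Documentation", "Description",
]


@dataclass
class LanguageDescription:
    """Hypothetical container for descriptors grouped by section."""
    identifier: str
    sections: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def set_descriptor(self, section: str, name: str, value: str) -> None:
        # Reject sections that are not part of the (assumed) model.
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        self.sections.setdefault(section, {})[name] = value

    def missing_sections(self) -> List[str]:
        # Sections for which no descriptor has been recorded yet.
        return [s for s in SECTIONS if s not in self.sections]


desc = LanguageDescription("eng")
desc.set_descriptor("Name", "reference_name", "English")
print(desc.missing_sections())
```

Listing the sections that are still empty is one simple way a community platform could show contributors where a language description needs work.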
ISO standards
• Language Codes Standards are growing in number and complexity
– From 2 to 6 – eventually back to 1?
– From 400 identifiers to upwards of 30000 – plus supporting metadata
– From lists to databases – multiple metadata registers
– From tables to metadata registries – registers + policies + “auditors”
– From published text documents to “published” databases – “SAD”
– From IETF RFC to RFCs to RFCs – consume, consume, consume
– From a closed membership committee to an open Community initiative (OmegaWiki) – supporting infrastructure, expert review of community contributions (e-Voting?)
– …. with accompanying (web) services and products – Open Source and bespoke, and secured funding as necessary
Next steps
• Data and models for wiki
– Structured data is necessary in scientific domains
– Registering descriptors and schemas is an essential component of long-term management of such data
• New types of standards
– Stabilisation of knowledge
– Dynamic platforms for describing knowledge
– Complementary to rocket science
• Back to WALS
– MPI-EVA and MPDL => eWALS
• Generic environment for managing and linking 639-4 compliant data
• Connecting the whole thing…
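The point above about registering descriptors can be made concrete with a small Python sketch of a descriptor registry that checks wiki-contributed records against registered names and datatypes. This is only an illustration in the spirit of a metadata registry; the class, its methods and the descriptor names `script` and `speakerPopulation` are hypothetical and are not taken from ISO 11179 or ISO 12620.

```python
from typing import Dict, List, Tuple


class DescriptorRegistry:
    """Hypothetical registry mapping descriptor names to a definition and a type."""

    def __init__(self) -> None:
        self._descriptors: Dict[str, Tuple[str, type]] = {}

    def register(self, name: str, definition: str, datatype: type = str) -> None:
        # A descriptor may be registered only once, so its meaning stays stable.
        if name in self._descriptors:
            raise ValueError(f"descriptor already registered: {name}")
        self._descriptors[name] = (definition, datatype)

    def validate(self, record: Dict[str, object]) -> List[str]:
        """Return a list of problems found in `record` (empty if none)."""
        problems = []
        for key, value in record.items():
            if key not in self._descriptors:
                problems.append(f"unregistered descriptor: {key}")
            elif not isinstance(value, self._descriptors[key][1]):
                problems.append(f"wrong type for {key}")
        return problems


registry = DescriptorRegistry()
registry.register("script", "Writing system used for the language", str)
registry.register("speakerPopulation", "Estimated number of speakers", int)
print(registry.validate({"script": "Latn", "speakerPopulation": "many"}))
```

Keeping the definitions in one registry is what lets independently edited records stay comparable over time, which is the long-term management concern raised in the first bullet.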
Further Sources
• Gillam, L. (2007) "A metadata infrastructure using ISO standards". We Have to Talk about Metadata Workshop at UK e-Science Programme All Hands Meeting 2007 (AHM 2007), Nottingham, 10-13 September. Accepted.
• Gillam, L., Garside, D., Cox, C. (2007) "Developments in Language Codes standards". In Rehm, Witt and Lemnitzer (eds.): Datenstrukturen für linguistische Ressourcen und ihre Anwendungen / Data Structures for Linguistic Resources and Applications. Proc. of GLDV 2007, 11-13 April 2007, Tübingen, Germany: Gunter Narr Verlag.
• Gillam, L., Garside, D., Cox, C. (2006) "Information volumes and linguistic diversity: meeting the challenges for content management". 3rd International Conference on Terminology, Standardization and Technology Transfer, 25-26 August, Beijing, PRC.