Language-Sites: Accessing Language Resources via Geographic Information Systems
Dieter van Uytvanck, Alex Dukers, Paul Trilsbeek, Jacquelijn Ringersma (Peter Wittenburg)
MPI for Psycholinguistics, DOBES Endangered Languages Project
A Little Background Information
The MPI Archive holds
• data for professionals in computational linguistics and phonetics, such as the Dutch Spoken Corpus, the Second Learner Corpus, gesture corpora, etc.
• but also data about small languages, anthropological data, etc.
The users of the latter are mainly
• linguists, ethnologists, musicologists, ethnobiologists, etc.
• speech community members
• overview of the “small languages” in the archive
DOBES Languages
40 language teams from the DOBES program documenting about 60 languages and working independently
MPI Languages
• about 100 researchers at the MPI
• also an increasing number of deposits from external people
User Interests
• researchers have completely different interests compared to HLT
  • non-linguistic influences on language development
  • language contact effects (cognate sets)
  • music systems and relevance of patterns
  • cultural differences in parent-child relations
  • kinship and other relations between persons
  • cultural differences in the relation between “language and thought”
  • etc.
• speech community interests
  • revitalize the language
  • find their identity and pass it on to their children
  • document cultural knowledge encoded in language + music
  • get acquainted with modern technology
  • etc.
Standard way of Access in LAT
• the standard way of accessing a large archive is to browse and/or search in a catalogue
• the MPI archive offers the IMDI infrastructure
• such a canonical catalogue needs to be based on predefined classifications by the researcher and organization principles defined by the archivist
• some professionals like it since it is neutral and offers atomic access
• most users find it boring and not functional
• for the speech community this presentation is certainly completely meaningless
metadata browsing & searching
LAT
Offering new Views in LAT
1. allow everyone to build his/her own virtual collection, i.e. step away from the canonical predefined hierarchy
2. allow people to create community portals where metadata queries are used to present the resources in a web-site style
3. allow everyone to access complex objects such as annotated multimedia recordings
4. allow people to start from a semantic conceptual space
5. allow people to start from geographic information
Create own virtual Collections
• recombining and linking metadata descriptions
• result is a new linked structure of nodes
• still the same “boring” style
Create Community Portals
• creating “nice” web-sites with categories according to some criteria such as genre
• take care: our genres are not the same as community genres
• basis is a dynamic REST-based query on the metadata registry and properly filled-in metadata
• communities like this and it is maintainable for the archivist
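The idea of a portal driven by a metadata query can be sketched roughly as follows. This is only an illustration under assumptions: the endpoint URL, the query parameters (`language`, `format`) and the record fields (`title`, `genre`) are hypothetical and not the actual IMDI registry interface. The label mapping reflects the point that archive genres and community genres differ.

```python
from urllib.parse import urlencode

# Hypothetical registry endpoint; the real metadata registry URL is not given in the slides.
REGISTRY_URL = "https://example.org/metadata/search"


def build_query_url(language, base=REGISTRY_URL):
    """Build a REST-style query URL for all records of one language."""
    return base + "?" + urlencode({"language": language, "format": "json"})


def group_by_genre(records, genre_labels=None):
    """Group record titles into portal sections, one section per genre.

    genre_labels maps archive genre names to the names a community prefers.
    Records without a genre end up in an "Unspecified" section, which shows
    why the approach depends on properly filled-in metadata.
    """
    labels = genre_labels or {}
    sections = {}
    for rec in records:
        genre = rec.get("genre") or "Unspecified"
        sections.setdefault(labels.get(genre, genre), []).append(rec["title"])
    return sections
```

A portal page would fetch the query URL, pass the returned records to `group_by_genre`, and render one block per section; since the page is generated from the query, new deposits show up without manual editing.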
Complex Access to Resources
• navigate from resource to resource by using content links
• resources can be annotated media resources, lexicons with multimedia extensions, metadata descriptions, etc.
• nice, but very specific and time-consuming (work in progress)
Navigation in Conceptual Spaces
• creating conceptual spaces with semantically meaningful relations
• allow people to navigate in such spaces and jump to detailed information in media, lexicons, photos, etc.
• turns out to be very attractive to researchers and community members (work in progress)
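One way to picture such a conceptual space is as a small graph of concepts connected by named relations, with resource links attached to each concept. The sketch below is an assumption for illustration only; the class, the relation names and the URLs are hypothetical and do not describe the actual LAT tools.

```python
class ConceptSpace:
    """A tiny graph of concepts with labelled relations and attached resources."""

    def __init__(self):
        self._relations = {}  # concept -> list of (relation, target concept)
        self._resources = {}  # concept -> list of resource URLs

    def relate(self, source, relation, target):
        """Add a semantically labelled link from one concept to another."""
        self._relations.setdefault(source, []).append((relation, target))

    def attach(self, concept, resource_url):
        """Attach a detail resource (media, lexicon entry, photo) to a concept."""
        self._resources.setdefault(concept, []).append(resource_url)

    def neighbours(self, concept, relation=None):
        """Return outgoing links of a concept, optionally only one relation type."""
        links = self._relations.get(concept, [])
        if relation is None:
            return list(links)
        return [(rel, tgt) for rel, tgt in links if rel == relation]

    def resources(self, concept):
        """Return the resources a user can jump to from this concept."""
        return list(self._resources.get(concept, []))
```

Navigation then amounts to repeatedly calling `neighbours` to move through the space and `resources` to jump to the detail material for the concept reached.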
Geographic Views
• for many users a GIS view is very attractive
• they like to relate languages and cultures to regions
• combining with other resources (geographical, historical, political, etc.)
• we are creating Google Earth overlays (XML, so no dependency on Big Brother)
• some examples on the following slides
GIS Link to Catalogue Node
• as appetizer and entry point to the appropriate catalogue node
• then continuation in the IMDI tree
• automatic generation if coordinates are filled in
(from Gunter Senft)
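The automatic generation step could look roughly like the sketch below, which writes one KML placemark per metadata record that has coordinates (KML is the XML format Google Earth reads for overlays). The record fields `name`, `lat`, `lon` and `node_url` are hypothetical stand-ins for whatever the metadata actually provides; this is not the archive's own generator.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def sessions_to_kml(sessions):
    """Build a KML document with one Placemark per record that has coordinates."""
    ET.register_namespace("", KML_NS)  # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for s in sessions:
        if s.get("lat") is None or s.get("lon") is None:
            continue  # coordinates not filled in: no placemark is generated
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = s["name"]
        # The description carries the link back to the catalogue node.
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = s["node_url"]
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{s['lon']},{s['lat']}"
    return ET.tostring(kml, encoding="unicode")
```

Because the overlay is plain XML, the same file can be regenerated whenever the metadata changes and opened in any viewer that understands KML.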
GIS Link to Complex Resources
• as appetizer and entry point to complex resources such as annotated media or lexicons
(from Stephen Levinson)
GIS as organization Mechanism
• some researchers have organized their material according to field trips and visited places
• GIS overlay gives easy links to all steps
• from there link to the IMDI nodes
(from Niclas Burenhult)
GIS for anthropological Marks
• anthropologists like to set marks about mythical places, historical events and sociologically relevant material
• combination with material from archeology, for example
• zooming in and out to see geographic relations
(from G. Boden)
GIS as entry points for Communities
• here an example from the DOBES Beaver team (Canada)
• used to point to toponyms and their etymology with direct links to resources, web-sites, etc.
(from J. Miller)
GIS as entry point to LR Archives
• could be used to find regional archives with interesting language material
• here the archive at CONICET in Buenos Aires
Other known Usages
• Jamieson: sounds of the world with Apple HyperCard
• CNRS/Quai Branly: explanation of aspects of languages in the world
• WALS (Haspelmath): relating language typology features to regions
• trends to combine geological and time information
• de Vriend: adding coordinates to lexemes for microvariation studies
Pros and Cons
• make the GIS view one view on the data among others, but maintain a proper repository structure
• GIS is excellent for geographically oriented overviews
  • almost everyone is used to reading maps
  • equipment is tuned to add coordinates automatically
• GIS methods allow easy visual correlations
  • geographic parameters influencing language contact
  • very easy to see that big swamps hampered influences
• GIS is optimal for bringing data from various disciplines together
• take care that you are not dependent on Big Brother
Thanks for your attention.
Language Archiving Technology
• LAT to support operations during the resource lifetime
[Diagram: LAT components]
• Shoebox / CHAT / Transcriber (XML)
• ELAN / LEXUS / SYNPATHY: annotation + lexicon
• IMDI: data organization, metadata
• LAMUS: data uploading and management
• access management
• data archiving and copying
• IMDI / GIS: metadata browsing & searching
• ANNEX / LEXUS / IMEX / TROVA: complex access via web
• ODIT / ISOcat: ontology
• ADDIT / VICOS / MEL: enrichments / views
• archive grid / federation
• diagram labels: management framework; preparation, integration, utilization