WP3 Further Specification of Functionality and Interoperability - Gradmann
Posted on 11 May 2015
WP3 Further Specification of Functionality and Interoperability
Work Group 3.2: Semantic and Multilingual Aspects
Issues for Work Group WG3.2: Some Principles
• Europeana surrogates need rich semantic context in (at least):
  • Place, Time, Persons, Abstract Concepts
• The graphs linking surrogates and semantic nodes need to be typed
• We will use linked data wherever possible instead of creating our own semantic nodes
• Source data and their context will be in all European languages (and potentially more!)
• Europeana users will wish to use all European languages (and potentially more!)
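The principles above (typed links from surrogates to semantic nodes, text in many languages) can be pictured as a small data structure. The following is a minimal sketch under stated assumptions: the `Surrogate` and `Literal` classes, the link type names and the example.org URIs are hypothetical illustrations, not the Europeana surrogate model from D2.5.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """A text value that always carries a language tag (e.g. 'de', 'en')."""
    value: str
    lang: str


@dataclass
class Surrogate:
    """Hypothetical surrogate record: every link to a semantic node is typed."""
    uri: str
    titles: list[Literal] = field(default_factory=list)
    # (link type, node URI) pairs; the link types used here (place, time,
    # person, concept) are assumptions mirroring the principles above.
    links: list[tuple[str, str]] = field(default_factory=list)

    def link(self, link_type: str, node_uri: str) -> None:
        self.links.append((link_type, node_uri))

    def nodes_by_type(self, link_type: str) -> list[str]:
        return [node for kind, node in self.links if kind == link_type]

    def title_in(self, lang: str) -> Optional[str]:
        for title in self.titles:
            if title.lang == lang:
                return title.value
        return None


# Usage with placeholder example.org URIs (not real Europeana identifiers).
s = Surrogate("http://example.org/surrogate/1")
s.titles += [Literal("Stadtansicht", "de"), Literal("View of the town", "en")]
s.link("place", "http://example.org/place/42")
s.link("concept", "http://example.org/concept/townscape")
print(s.nodes_by_type("place"))  # ['http://example.org/place/42']
print(s.title_in("en"))          # View of the town
```

Because the link type is stored with every link, a search can ask specifically for place links rather than for anything the surrogate happens to point at.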
WG3.2: Semantic Contextualisation and Multilingual Issues
Issues for Work Group WG3.2: Semantic Contextualisation (1)
• What kind of functionality based on semantic technology do we actually want to enable (have a look at the ThoughtLab and develop ideas from there)? Do we want to enable logical inferencing, for instance?
• What source data do we actually have (subject headings, classifications, thesauri) and how well are objects contextualised in source data?
• What kinds of semantic elements will we be able to produce from these via SKOSification or other automated procedures?
• Which linked data resources will we be using?
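To make "SKOSification" concrete, here is a minimal sketch of turning flat thesaurus records (preferred term, non-preferred terms, broader term) into SKOS triples. The record layout and the example.org namespace are assumptions for illustration; only the RDF and SKOS namespace URIs are the standard W3C ones. Literals are modelled as (text, language) tuples rather than with an RDF library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # hypothetical namespace for minted concepts


def concept_uri(term_id: str) -> str:
    return BASE + term_id


def skosify(records):
    """Turn flat thesaurus records into SKOS triples.

    Each record is assumed to be a dict with keys 'id', 'label', 'lang' and
    optionally 'alt' (non-preferred terms) and 'broader' (id of the broader term).
    """
    triples = []
    for rec in records:
        c = concept_uri(rec["id"])
        triples.append((c, RDF_TYPE, SKOS + "Concept"))
        triples.append((c, SKOS + "prefLabel", (rec["label"], rec["lang"])))
        for alt in rec.get("alt", []):
            triples.append((c, SKOS + "altLabel", (alt, rec["lang"])))
        if rec.get("broader"):
            triples.append((c, SKOS + "broader", concept_uri(rec["broader"])))
    return triples


# Hypothetical input records.
records = [
    {"id": "t1", "label": "Buildings", "lang": "en"},
    {"id": "t2", "label": "Churches", "lang": "en",
     "alt": ["Church buildings"], "broader": "t1"},
]
for triple in skosify(records):
    print(triple)
```

Real source vocabularies will differ in structure and quality, so a mapping step like this would need to be written per source.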
Issues for Work Group WG3.2: Semantic Contextualisation (2)
• To what extent will we be able to automatically contextualise surrogates by linking them to semantic nodes?
• What types of links between surrogates and nodes do we distinguish?
• What may providers expect to get back from us?
• What technology do we need for all this? RDF? SKOS?? OWL???
• What input does Europeana.Connect (EuCo) WP1 expect from us and when?
• What do we expect back from EuCo WP1 and when?
• Any related projects? Results we can reuse??
Issues for Work Group WG3.2: Multilingual Issues
• What is a realistic scope for multilingual functionality:
  • query translation?
  • result set translation??
  • more???
• Which languages will Europeana 1.0 support?
• What input does EuCo WP2 expect from us and when?
• What do we expect back from EuCo WP2 and when do we expect this?
• Any related projects? Results we can reuse??
WG3.2: Semantic and multilingual aspects
• Marco Berni
• Tobias Blanke
• Giuliana de Francesco
• Milena Dobreva
• Martin Doerr
• Zeki Mustafa Dogan
• Nicola Ferro
• Stefan Gradmann
• Antoine Isaac
• Walter Koch
• Stefanos Kollias
• Allison Kupietzky
• Dan Matei
• Hans Nederbragt
• Vivien Petras
• Anne Schiller
• Douglas Tudhope
• Vassilis Tzouvaras
• Dov Wiener
• Issues:
  • intended functionality
  • quality and semantic contextualisation of object data
  • subject headings, thesauri, classification data available
  • which technologies to use
  • realistic scope for multilingual operations
  • related projects in the area of multilinguality
• Office:
  • Sjoerd Siebinga
  • Go Sugimoto
Today (02 April)
• Contextualisation of existing source data
• Contextual data available
• Functional Scope
• Linked data at our disposal
WG3.2 02 April - 1
• Contextual data available
  • List of 84 different vocabularies
  • Some prominent ones such as LCSH, some of them in VIAF
  • Semantic areas: subjects, names, persons, material
  • Various delivery formats
WG3.2 02 April - 2
• Contextualisation of existing source data
  • Geographic names used 50% -> 90%
  • Coordinates 6% -> 8%
  • Time
  • Subjects
  • Persons
  • Organisations
WG3.2 02 April - 3
• Questions / suggestions:
  • Which resources are cross-domain?
  • Which ESE element is to be used?
  • Who will do the cleaning of metadata?
  • Why not store the metadata received as objects in their own right?
  • Minerva list of thesauri to be considered
  • Distinguish subject terms and classification of objects
  • Restrict structured operations to high-level thesauri and do the rest based on lexical associations and the like
  • Ask providers to make their internal authorities available rather than trying to map them
WG3.2 02 April - 4
• Functional Scope (1)
  • Surrogate model as presented in D2.5 doesn’t distinguish different types of relationships such as ‘about’ and ‘was present at’.
  • The point is valid for data organisation and for searching
  • Is a better model realistic for 1.0?
  • Can relation types be derived from the original attributes’ semantics?
  • Contextualisation pertaining to surrogate vs. context data pertaining to originating context
  • Granularity: complex objects
  • We need examples! -> Don Undeen: The Semantic Web in Practice
  • Separation of digital object and conceptual object (FRBRize the model)
  • Annotation: part of surrogate? When are these objects in their own right?
WG3.2 02 April - 5
• Functional Scope (2)
  • Provenance! Diachronic dimensions should be better represented
  • Geographic data: DigMap (input from Milena)
  • Target audience is critical! User modelling!!
  • Reasoning: indirectly connected things
  • Related terms + related (functional) context
  • Flexibility of modelling is a requirement
  • -> inferencing: some kind of reasoning is needed, even if only for machine processing
  • Cost of processing time may be a critical issue in design!
  • How to generalise properties to a small set of super-properties?
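Generalising specific relation types to a small set of super-properties is what the RDFS rule for `rdfs:subPropertyOf` does: if p is a sub-property of q and (s, p, o) holds, then (s, q, o) holds too. The sketch below applies that rule over a hypothetical property hierarchy (the property names are invented for illustration, loosely following ‘about’ and ‘was present at’); it assumes each property has at most one direct super-property.

```python
# Hypothetical property hierarchy: sub-property -> its direct super-property.
SUB_PROPERTY_OF = {
    "depicts": "about",
    "hasSubject": "about",
    "wasPresentAt": "relatedEvent",
    "about": "relatedTo",
    "relatedEvent": "relatedTo",
}


def super_properties(prop):
    """Return prop and all its ancestors, following subPropertyOf transitively."""
    chain = [prop]
    seen = {prop}
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        if prop in seen:  # guard against a cyclic mapping
            break
        seen.add(prop)
        chain.append(prop)
    return chain


def infer(triples):
    """Materialise generalised triples (RDFS subPropertyOf entailment)."""
    out = set()
    for s, p, o in triples:
        for q in super_properties(p):
            out.add((s, q, o))
    return out


def query(triples, prop):
    """Access by (super-)property: all (subject, object) pairs for prop."""
    return sorted((s, o) for s, p, o in infer(triples) if p == prop)


triples = {("s1", "depicts", "c:mill"), ("s2", "wasPresentAt", "e:fair")}
print(query(triples, "relatedTo"))  # [('s1', 'c:mill'), ('s2', 'e:fair')]
print(query(triples, "about"))      # [('s1', 'c:mill')]
```

Materialising the inferred triples once (for example at ingestion) trades storage for query-time processing, which is one way to address the processing-cost concern above.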
WG3.2 03 April - 5
• Functional Scope (3)
  • Access by super-properties based on appropriate generalisations, following data paths
  • Rosetta Stone metaphor: Rosetta navigation
  • Domain-specific ontologies mapped (or pruned) to more generic Europeana ontologies as part of OurEuropeana
  • Higher-level terms (Europeana) + more granular terminology (user)
  • Generalisation, query expansion
  • Characterisation of collections (do we want these?) – or rather fonds (in archival speak), contextual groupings
  • Distinguish curatorial environments (with metadata pertaining to these) from virtual ‘collections’
  • Tree structure in archives: can we represent these in the surrogate structure, or do we model this in semantic contextualisation?
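Combining higher-level terms with more granular terminology usually means query expansion: a search for a generic term is widened to its narrower terms, so objects indexed with specific terms are still found. Here is a minimal sketch over a hypothetical broader-term table, using breadth-first traversal with a depth limit; the vocabulary is invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical vocabulary: narrower term -> broader term.
BROADER = {
    "windmill": "mill",
    "watermill": "mill",
    "mill": "building",
    "church": "building",
}


def narrower_index(broader):
    """Invert the broader-term table into broader -> [narrower terms]."""
    idx = defaultdict(list)
    for child, parent in broader.items():
        idx[parent].append(child)
    return idx


def expand(term, broader=BROADER, max_depth=2):
    """Return term plus narrower terms up to max_depth levels (breadth-first)."""
    idx = narrower_index(broader)
    result = [term]
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in sorted(idx[current]):
            if child not in seen:
                seen.add(child)
                result.append(child)
                queue.append((child, depth + 1))
    return result


print(expand("building"))               # ['building', 'church', 'mill', 'watermill', 'windmill']
print(expand("building", max_depth=1))  # ['building', 'church', 'mill']
```

The depth limit is a design choice: unrestricted expansion over large thesauri can swamp the result set.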
WG3.2 03 April - 5
• Functional Scope (4)
  • (Collections contd.): provider vs. user-generated groupings
  • All ‘collections’ can be reduced to conceptual context (including ‘events’)
  • Questions – answers? Or just surrogate retrieval?? And if we provide answers: multilingually??
• Multilingual issues
  • Linguistic info pertaining to each attribute is a basic requirement – possible?
  • Query expansion + translation as scope + query formulation aids
  • Surrogate model doesn’t account for language, also regarding diachronic aspects
WG3.2 03 April - 5
• [Multilingual issues]
  • Architecture: the language manager indicates a query translation focus, but the multilingual approach should be much more transversal
  • Check against lexica at ingest stage and normalise / enrich
  • Use of an interlingua of controlled terms – but consider out-of-vocabulary terms!
  • Use CACAO results: make recommendations rather than try to impose …
  • Resources in different languages (FRBRizing)
  • Use payloading in search context
  • Who will provide named entity resources, and which standards will we use in this respect?
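An interlingua of controlled terms can be sketched as a lexicon that maps language-specific terms to shared concept identifiers; a query term is translated by going through its concept, and out-of-vocabulary terms (often names) are passed through untranslated and reported. This is a minimal sketch with a hypothetical three-entry lexicon, not the CACAO approach or any existing Europeana component.

```python
from collections import defaultdict

# Hypothetical lexicon: (language, lower-cased term) -> interlingua concept id.
LEXICON = {
    ("en", "mill"): "c:mill",
    ("de", "mühle"): "c:mill",
    ("fr", "moulin"): "c:mill",
}

# Invert the lexicon: concept id -> {language: label}.
LABELS = defaultdict(dict)
for (lang, term), concept in LEXICON.items():
    LABELS[concept][lang] = term


def translate_query(terms, source_lang, target_langs):
    """Translate query terms via concept ids; pass out-of-vocabulary terms through.

    Returns (expanded_terms, oov_terms).
    """
    expanded, oov = [], []
    for term in terms:
        concept = LEXICON.get((source_lang, term.lower()))
        if concept is None:
            # Out-of-vocabulary: keep the term as typed and report it.
            oov.append(term)
            expanded.append(term)
            continue
        for lang in target_langs:
            label = LABELS[concept].get(lang)
            if label is not None and label not in expanded:
                expanded.append(label)
    return expanded, oov


print(translate_query(["Mühle", "Rembrandt"], "de", ["de", "en", "fr"]))
# (['mühle', 'mill', 'moulin', 'Rembrandt'], ['Rembrandt'])
```

Reporting OOV terms separately leaves room to route them to a named-entity resource instead of silently dropping them.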
WG3.2 03 April - 5
• [Multilingual issues]
  • Distinguish properties that are important for multilingual operations from those that are not
  • WordNet use in ThoughtLab with English as pivot language, providing quick wins
  • Freely available resources are rare! UNESCO thesaurus available in some languages: CACAO list, TrebleCLEF, Placenames (European resource)
  • IMPACT uses lexica, some of which may be freely available -> Max Kaiser!
  • Political issues: who are the semantic/linguistic resource providers? Political authorities??
  • Last FP7 call (DL) …
WG3.2 03 April - 5
• [Multilingual issues]
• CEN INNN
• Talk to CLARIN for multilingual services
• Contact FlareNET project
• Eurovoc mapping involving Gemnet and others (Doug)
• Aligning all these resources may be a non-trivial issue
• Organise a seminar joining all projects
• Whitepaper on multilingual issues as a starting point (Milena, Martin, CACAO,
• CERL has produced a thesaurus
• Subject terms and concepts are harder than place names and the like
• Problem of differing standards
WG3.2 03 April - 5
• [Multilingual issues]
  • Whitepaper on multilingual services provided to Europeana as a starting point (Milena, Martin, CACAO, Vivien, Sjoerd, Nicola) until June using the ROSE wiki
  • -> Seminar adjacent to the September meeting
  • Technology watch
WG3.2 03 April - 6
• Linked data at our disposal (quite restricted)
  • Link at ingestion and updating time rather than dynamically in query context (-> use a Europeana cache for pointers -> surrogate model?)
  • DBpedia (pivotal resource for multilingual operations!)
  • Language repository
  • GeoNames
  • LCSH
  • RAMEAU (use MACS and CrissCross to provide mappings)
  • VIAF
  • ETB
  • But: the metadata provided will contain links to other resources, typically not as URIs
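Linking at ingestion time with a cache of pointers, when the incoming metadata holds plain strings rather than URIs, might look like the following minimal sketch. The normalisation rule (case folding and accent stripping), the `LinkCache` class and the example.org URI are assumptions for illustration; a real cache would be filled from resources such as those listed above, and unresolved values stay as literals for a later update run.

```python
import unicodedata


def normalise(name):
    """Case-fold and strip accents so 'Köln' and 'KÖLN' share one key (assumption)."""
    decomposed = unicodedata.normalize("NFKD", name.casefold().strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class LinkCache:
    """Hypothetical local cache: normalised name -> authority URI."""

    def __init__(self, entries):
        self._by_key = {normalise(k): uri for k, uri in entries.items()}

    def lookup(self, name):
        return self._by_key.get(normalise(name))


def enrich(record, cache, key="place"):
    """Ingest-time enrichment: add '<key>_uri' if the cache resolves the value."""
    enriched = dict(record)
    uri = cache.lookup(record.get(key, ""))
    if uri is not None:
        enriched[key + "_uri"] = uri
    return enriched


cache = LinkCache({"Köln": "http://example.org/place/koeln"})
print(enrich({"title": "Stadtansicht", "place": "KÖLN"}, cache))
print(enrich({"title": "Karte", "place": "Atlantis"}, cache))
```

Doing this lookup once per record at ingestion keeps query time free of remote calls, at the cost of having to re-run enrichment when the cache is updated.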
WG3.2 03 April - 7
• Typing relations ...! Including language tags again
WG3.2 03 April – Conclusion (1)
• Semantics: Rosetta Stone metaphor with two types of functionality:
  • Context of surrogates
  • Contextual groupings
• Open: typing relations
WG3.2 03 April – Conclusion (2)
• Multilingual Issues
  • Linguistic info pertaining to each attribute is a basic requirement – possible?
  • Surrogate model doesn’t account for language, also regarding diachronic aspects
  • Scope: query expansion + translation + query formulation aids
  • Whitepaper on multilingual services provided to Europeana as a starting point (Milena, Martin, CACAO, Vivien, Sjoerd, Nicola) until June using the ROSE wiki
  • -> Seminar bringing together all initiatives and projects adjacent to the September meeting