Semantic annotation and search of large virtual heritage collections
Guus Schreiber
Free University Amsterdam
Overview
• A non-technical view on the Semantic Web
• Work on Semantic-Web deployment
– SKOS, RDFa
• Semantic annotation and search in virtual collections: the E-Culture example
The Web: resources and links
[Figure: two URLs connected by a Web link]
The Semantic Web: typed resources and links
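[Figure: two URLs connected by a typed Web link. A resource of type Painting, “Femme au chapeau” (SFMOMA), is linked through the Dublin Core property “creator” to the ULAN entry for Henri Matisse]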
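The typed resources and links in this figure can be written down as subject-predicate-object statements. The sketch below uses plain Python tuples rather than an RDF library; the `ex:` and `ulan:` identifiers and the `ulan:Artist` type are invented for illustration, and only the Dublin Core `creator` property is taken from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; real ULAN and museum URIs look different.
PAINTING = "ex:femme-au-chapeau"
MATISSE = "ulan:henri-matisse"

graph = [
    Triple(PAINTING, "rdf:type", "ex:Painting"),  # typed resource
    Triple(MATISSE, "rdf:type", "ulan:Artist"),   # typed resource (assumed type name)
    Triple(PAINTING, "dc:creator", MATISSE),      # typed link
]


def objects(graph, subject, predicate):
    """Return the objects of all statements with this subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(objects(graph, PAINTING, "dc:creator"))
```

Because the link carries a type, a client can ask specifically for the creator of the painting instead of following an untyped hyperlink.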
Principle 1: semantic annotation
• Description of web objects with “concepts” from a shared vocabulary
Principle 2: semantic search
• Search for objects which are linked via concepts (semantic link)
• Use the type of semantic link to provide meaningful presentation of the search results
[Figure: example terms “urang-utang”, “orange”, “ape” and “great ape”]
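A minimal sketch of the idea, assuming a tiny concept hierarchy and a handful of annotated objects that are invented here (the slide's figure is not recoverable). The search follows "broader" links from the query concept and reports the kind of link that produced each hit, so results can be presented meaningfully.

```python
# Hypothetical hierarchy: each concept points to its broader concept.
BROADER = {
    "orang-utan": "great ape",
    "great ape": "ape",
}

# Hypothetical annotations: object id -> concept used to annotate it.
ANNOTATIONS = {
    "photo-1": "orang-utan",
    "drawing-2": "great ape",
    "print-3": "ape",
    "still-life-4": "orange",  # lexically similar, semantically unrelated
}


def semantic_search(concept):
    """Return (object, concept, link description) for every object annotated
    with the query concept or with one of its broader concepts."""
    path = [concept]
    current = concept
    # Walk up the hierarchy; the second test guards against cycles.
    while current in BROADER and BROADER[current] not in path:
        current = BROADER[current]
        path.append(current)

    results = []
    for obj, annotated in sorted(ANNOTATIONS.items()):
        if annotated in path:
            steps = path.index(annotated)
            link = "exact match" if steps == 0 else f"broader concept, {steps} step(s) up"
            results.append((obj, annotated, link))
    return results


for hit in semantic_search("orang-utan"):
    print(hit)
```

The object annotated with "orange" is not returned: matching runs over concept links, not over string similarity.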
Principle 3: multiple vocabularies, or: the myth of a unified vocabulary
• In large virtual collections there are always multiple vocabularies
– In multiple languages
• Every vocabulary has its own perspective
– You can’t just merge them
• But you can use vocabularies jointly by defining a limited set of links
– “Vocabulary alignment”
• It is surprising what you can do with just a few links
Example: “Tokugawa”
• AAT style/period: Edo (Japanese period); Tokugawa
• SVCN period: Edo
• SVCN is a local in-house thesaurus
• A link between two thesauri
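A sketch of how a single alignment link lets a query for "Tokugawa" reach objects indexed with the in-house SVCN term. The concept identifiers and collection objects are hypothetical, and treating "Tokugawa" as an alternative AAT label for the Edo period is an assumption made for this example.

```python
# Two vocabularies: concept id -> set of labels (ids are hypothetical).
AAT = {"aat:edo": {"Edo (Japanese period)", "Tokugawa"}}
SVCN = {"svcn:edo": {"Edo"}}

# The alignment: one link between the two thesauri.
ALIGNMENT = {("aat:edo", "svcn:edo")}

# Hypothetical collection objects, each indexed with one vocabulary only.
OBJECTS = {
    "museum-a/print-17": "aat:edo",
    "museum-b/sword-4": "svcn:edo",
}


def concepts_for(term):
    """Find concepts labelled with the term, then follow alignment links.
    A single pass is enough here because there is only one link."""
    found = {
        cid
        for vocab in (AAT, SVCN)
        for cid, labels in vocab.items()
        if term in labels
    }
    for a, b in ALIGNMENT:
        if a in found or b in found:
            found |= {a, b}
    return found


def search(term):
    """Return ids of objects indexed with any concept matching the term."""
    concepts = concepts_for(term)
    return sorted(obj for obj, cid in OBJECTS.items() if cid in concepts)


print(search("Tokugawa"))
```

Without the alignment link, "Tokugawa" would only find the AAT-indexed print; with it, the SVCN-indexed object is found as well, even though neither vocabulary was merged into the other.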
RDF/OWL language constructs
• classes and individuals
• subclasses
• properties
• subproperties
• domain/range of properties
• XML Schema datatypes
• equality, inequality
• inverse, transitive, symmetric, functional properties
• property constraints: cardinality, allValuesFrom, someValuesFrom
• conjunction, disjunction, negation of classes
• hasValue, enumerated type
How useful are RDF and OWL?
• RDF: basic level of interoperability
• Some constructs of OWL are key:
– Logical characteristics of properties: symmetric, transitive, inverse
– Identity: sameAs
• OWL pitfalls
– Bad: if it is written in OWL it is an ontology
– Worse: if it is not in OWL, then it is not an ontology
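To show why these few constructs matter, here is a naive forward-chaining sketch that derives new statements from symmetric, transitive and inverse declarations. It is an illustration, not an OWL reasoner: the `ex:` properties and resources are invented, and `owl:sameAs` is treated only as a symmetric and transitive property (a full treatment would also copy statements between identical resources).

```python
def infer(triples, transitive=(), symmetric=(), inverse_of=None):
    """Apply the three property rules repeatedly until nothing new appears."""
    inverses = dict(inverse_of or {})
    # An inverse declaration works in both directions.
    inverses.update({v: k for k, v in (inverse_of or {}).items()})

    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # fixpoint reached
            return facts
        facts |= new


# Hypothetical data.
triples = [
    ("ex:panel", "ex:partOf", "ex:altarpiece"),
    ("ex:altarpiece", "ex:partOf", "ex:chapel"),
    ("ulan:matisse", "owl:sameAs", "ex:henri-matisse"),
]

facts = infer(
    triples,
    transitive={"ex:partOf", "owl:sameAs"},
    symmetric={"owl:sameAs"},
    inverse_of={"ex:partOf": "ex:hasPart"},
)
print(("ex:chapel", "ex:hasPart", "ex:panel") in facts)
```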
W3C Semantic Web Deployment Working Group: making vocabularies/thesauri/ontologies available on the Web
• Schema for interoperable RDF/OWL representation of vocabularies
– SKOS
• Publication guidelines:
– URI management, representation of versions
• Embedding RDF in (X)HTML pages
– RDFa
SKOS: pattern for thesaurus modeling
• Based on ISO standard
• RDF representation
• Documentation: http://www.w3.org/TR/swbp-skos-core-guide/
• Base class: SKOS Concept
Multi-lingual labels for concepts
Semantic relation: broader and narrower
• No subclass semantics assumed!
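The SKOS pattern of the last few slides (a concept with multi-lingual labels and broader/narrower links) can be sketched as a small data structure. This is a plain-Python illustration, not the SKOS RDF schema itself; the `ex:` URIs and the sample concepts are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language tag -> preferred label
    broader: list = field(default_factory=list)      # URIs of broader concepts

    def label(self, lang, fallback="en"):
        """Preferred label in the requested language, else in the fallback."""
        return self.pref_labels.get(lang, self.pref_labels.get(fallback))


def narrower(concepts, uri):
    """narrower is the inverse of broader, so it is derived, not stored.
    Note: broader/narrower is a thesaurus relation only; nothing here
    claims that a narrower concept is a subclass of the broader one."""
    return sorted(c.uri for c in concepts if uri in c.broader)


# Hypothetical concepts.
visual_work = Concept("ex:visual-work", {"en": "visual work"})
painting = Concept(
    "ex:painting",
    {"en": "painting", "nl": "schilderij", "fr": "peinture"},
    broader=["ex:visual-work"],
)
concepts = [visual_work, painting]

print(painting.label("nl"), narrower(concepts, "ex:visual-work"))
```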
Indexing a resource with a SKOS concept
• primarySubject is defined as subproperty
Adding semantics
• Adding OWL statements
• Interpretations of thesaurus relations such as narrower as subclass-of are often imprecise (but can still be useful)
• Learning relations between thesauri is an important form of additional semantics
– Example: AAT contains styles; ULAN contains artists, but there is no link
– Availability of this kind of alignment knowledge is extremely useful
W3C standardization process
• Input: draft specification
• Collect use cases
• Derive requirements
• Create issues list: requirements that cannot be handled by the draft spec
• Propose resolutions for issues
• Continuously: ask for public feedback/comments
• Get consensus on amended spec
• Find two independent implementations for each feature in the spec
Example issue: relationships between lexical labels
• In draft SKOS spec lexical labels of concepts are represented as datatype properties
• Use cases require relations between labels, e.g. “AAT” is an acronym of “Art & Architecture Thesaurus”
• This is a problem because literals have no URI (so cannot be subject of an RDF property)
• Possible resolutions:
– Labels/terms as classes
– Relaxing constraints on label property
– …..
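One way to picture the first option (labels as resources rather than literals) is sketched below. It is only an illustration of the modelling problem, not the resolution the working group adopted; all identifiers, including the `ex:acronymOf` property, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """A label promoted to a resource: it has its own identifier, so it can
    appear as the subject of a statement. A plain literal cannot."""
    uri: str
    literal_form: str
    lang: str = "en"


full = Label("ex:label/aat-full", "Art & Architecture Thesaurus")
short = Label("ex:label/aat-short", "AAT")

# Statements about the concept and, crucially, about the labels themselves.
statements = {
    ("ex:concept/aat", "ex:prefLabel", full.uri),
    ("ex:concept/aat", "ex:altLabel", short.uri),
    (short.uri, "ex:acronymOf", full.uri),  # needs a label with a URI as subject
}


def acronyms_of(label):
    """Return the URIs of labels declared to be acronyms of the given label."""
    return sorted(s for s, p, o in statements if p == "ex:acronymOf" and o == label.uri)


print(acronyms_of(full))
```

The cost of this option is an extra level of indirection: every label lookup now goes through a resource before reaching the literal text.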
Recipes for vocabulary URIs
• Simplified rule:
– Use the “hash” variant for vocabularies that are relatively small and require frequent access
http://www.w3.org/2004/02/skos/core#Concept
– Use the “slash” variant for large vocabularies, where you do not always want the whole vocabulary to be retrieved
http://xmlns.com/foaf/0.1/Person
• For more information and other recipes, see: http://www.w3.org/TR/swbp-vocab-pub/
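The practical difference between the two variants comes from how HTTP treats fragments: the part after "#" is never sent to the server, so every hash URI in a vocabulary resolves to the same document, while each slash URI can be served separately. A short sketch using Python's standard `urllib.parse`; the two extra term URIs (`prefLabel`, `Agent`) are added here for illustration.

```python
from urllib.parse import urldefrag


def document_to_fetch(uri):
    """Return the URL an HTTP client actually requests for a term URI
    (the fragment identifier is stripped before the request is made)."""
    url, _fragment = urldefrag(uri)
    return url


hash_terms = [
    "http://www.w3.org/2004/02/skos/core#Concept",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
]
slash_terms = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/Agent",
]

# Hash variant: one request retrieves the document describing all terms.
print({document_to_fetch(u) for u in hash_terms})
# Slash variant: each term has its own URL and can be retrieved on its own.
print({document_to_fetch(u) for u in slash_terms})
```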
Query for WordNet URI returns “concise bounded description”
RDFa: embedding RDF metadata in an (X)HTML file
Regular HTML
Resulting RDF statements
HTML with RDFa
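The original slides showed the markup as images, which are not recoverable here. As a stand-in, the sketch below shows the principle on an invented page: `about` names the subject, `property` names the predicate, and the element text (or a `content` attribute) supplies the value. The extractor is deliberately tiny and is not a conforming RDFa processor: it handles flat markup only and ignores `rel`, `href`, nested subjects and prefix expansion.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, literal) statements from flat RDFa markup."""

    def __init__(self):
        super().__init__()
        self.subject = ""    # "" stands for the page itself
        self.pending = None  # property whose element text is being collected
        self.buffer = []
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            if "content" in attrs:  # value given explicitly
                self.statements.append((self.subject, attrs["property"], attrs["content"]))
            else:                   # value is the element's text
                self.pending, self.buffer = attrs["property"], []

    def handle_data(self, data):
        if self.pending is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.pending is not None:
            text = "".join(self.buffer).strip()
            self.statements.append((self.subject, self.pending, text))
            self.pending = None


# Invented example page; the subject URI is hypothetical.
PAGE = """
<div about="http://example.org/paintings/1">
  <h1 property="dc:title">Femme au chapeau</h1>
  <p>Painted by <span property="dc:creator">Henri Matisse</span>.</p>
  <meta property="dc:date" content="1905">
</div>
"""

extractor = TinyRDFaExtractor()
extractor.feed(PAGE)
for statement in extractor.statements:
    print(statement)
```

The page still renders as regular HTML; the extra attributes only add machine-readable statements alongside the visible text.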
More information
E-Culture demonstrator
• Part of large Dutch knowledge-economy project MultimediaN
• Partners: VU, CWI, UvA, DEN, ICN
• People:
– Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga
• Artchive.com, ICN: Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)
Use case: painting style
Find paintings of a similar style
KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna
How can we find this other ‘Art nouveau’ painting?
MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on cardboard
91 x 73.5 cm
National Gallery, Oslo
Issues w.r.t. the use case
• Parse annotation to find matches with thesaurus terms
– E.g. match artists to ULAN individuals
• Artist-style links
– AAT contains styles; ULAN contains artists, but there is no link
  • Learn link from corpora
  • Derive it from other annotations
– Domain-specific rules/reasoning needed
  • see example in SWRL doc
  • Painters may have painted in multiple styles
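The two steps (match the artist in the annotation to a ULAN individual, then use artist-style links to find paintings in a similar style) can be sketched as follows. The identifiers and the artist-style table are illustrative assumptions, not data from ULAN, AAT or the demonstrator; the table maps each artist to a set of styles because a painter may have worked in several.

```python
# Hypothetical excerpts; real ULAN and AAT identifiers are different.
ULAN = {
    "klimt, gustav": "ulan:klimt",
    "munch, edvard": "ulan:munch",
}

# Artist -> styles links, as they might be learned or derived (illustrative).
ARTIST_STYLE = {
    "ulan:klimt": {"aat:art-nouveau"},
    "ulan:munch": {"aat:art-nouveau", "aat:expressionism"},
}

# Annotations in the caption format used on the slides.
ANNOTATIONS = {
    "klimt-1907": "KLIMT, Gustav\nPortrait of Adele Bloch-Bauer I\n1907",
    "munch-1893": "MUNCH, Edvard\nThe Scream\n1893",
    "anon-1": "Anonymous\nStill life",
}


def match_artist(annotation):
    """Match the first caption line ('SURNAME, Given') to a ULAN individual."""
    first_line = annotation.strip().splitlines()[0]
    return ULAN.get(first_line.strip().lower())


def styles_of(annotation):
    """Styles associated with the annotation's artist (empty if unknown)."""
    return ARTIST_STYLE.get(match_artist(annotation), set())


def similar_style(query_id):
    """Ids of other objects whose artist shares at least one style."""
    wanted = styles_of(ANNOTATIONS[query_id])
    return sorted(
        oid
        for oid, annotation in ANNOTATIONS.items()
        if oid != query_id and styles_of(annotation) & wanted
    )


print(similar_style("klimt-1907"))
```

A rule of this kind is only a heuristic: attributing a style to a painting through its painter will be wrong whenever the painter changed style, which is why the slide calls for domain-specific reasoning.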
Example enrichment
• Learning relations between art styles in AAT and artists in ULAN through NLP of art-historical texts
• But don’t learn things that already exist!
Culture Web demonstrator
http://e-culture.multimedian.nl
16 Nov 2006
Perspectives
• Basic Semantic Web technology is ready for deployment
– in open knowledge-rich domains
– Important research issues: scalability, vocabulary alignment, metadata extraction
• Web 2.0 features:
– Involving community experts in annotation
– Personalization, myArt
• Social barriers have to be overcome!
– “open door” policy
– Involvement of general public => issues of “quality”
• Importance of using open standards
– Away from custom-made flashy web sites