Semantic annotation and search of large virtual heritage collections

Post on 11-Jan-2016



Semantic annotation and search of large virtual heritage collections

Guus Schreiber

Free University Amsterdam

Overview

• A non-technical view on the Semantic Web
• Work on Semantic-Web deployment
  – SKOS, RDFa
• Semantic annotation and search in virtual collections: the E-Culture example

The Web: resources and links

[Figure: two URLs connected by a Web link]

The Semantic Web: typed resources and links

[Figure: two URLs connected by a Web link, now with types: the Painting “Femme au chapeau” (SFMOMA) is linked through the Dublin Core property “creator” to Henri Matisse (ULAN)]
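The typed link in this figure can be written down as subject-property-object statements. The sketch below is only an illustration in plain Python, not the representation used in the talk: the `ex:` and `ulan:` identifiers are made-up placeholders, while `dc:creator` and `rdf:type` stand in for the Dublin Core and RDF terms.

```python
# Each statement is a (subject, property, object) triple.
# The "ex:" and "ulan:" identifiers are hypothetical placeholders, not real URIs.
triples = [
    ("ex:femme-au-chapeau", "rdf:type", "ex:Painting"),
    ("ex:femme-au-chapeau", "dc:creator", "ulan:henri-matisse"),
]


def objects(subject, prop):
    """Return all objects linked to `subject` through the typed link `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


print(objects("ex:femme-au-chapeau", "dc:creator"))
```

Because the link carries a type, a client can ask specifically for the creator of a resource instead of following an anonymous hyperlink.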

Principle 1: semantic annotation

• Description of web objects with “concepts” from a shared vocabulary

Principle 2: semantic search

• Search for objects which are linked via concepts (semantic link)

• Use the type of semantic link to provide meaningful presentation of the search results

[Figure: example terms “urang-utang”, “orange”, “ape”, “great ape”]

Principle 3: multiple vocabularies, or: the myth of a unified vocabulary

• In large virtual collections there are always multiple vocabularies
  – In multiple languages
• Every vocabulary has its own perspective
  – You can’t just merge them
• But you can use vocabularies jointly by defining a limited set of links
  – “Vocabulary alignment”
• It is surprising what you can do with just a few links

Example: “Tokugawa”

AAT style/period: Edo (Japanese period), Tokugawa

SVCN period: Edo

SVCN is a local in-house thesaurus

A link between two thesauri
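As a rough sketch of what one alignment link buys you: a search for “Tokugawa” matches a label in AAT, follows the link to the in-house SVCN concept, and so reaches objects that were only annotated with SVCN terms. The concept identifiers, the object names, and the annotations below are invented for illustration; only the labels come from the slide.

```python
# Labels per concept; the concept identifiers are hypothetical placeholders.
labels = {
    "aat:edo": ["Edo (Japanese period)", "Tokugawa"],
    "svcn:edo": ["Edo"],
}

# A single alignment link between the two thesauri.
alignment = [("aat:edo", "svcn:edo")]

# Objects annotated with in-house concepts (invented examples).
annotations = {"object-1": "svcn:edo", "object-2": "svcn:other-period"}


def equivalents(concept):
    """Return the concept plus every concept linked to it by an alignment link."""
    found = {concept}
    for a, b in alignment:
        if a == concept:
            found.add(b)
        elif b == concept:
            found.add(a)
    return found


def search(term):
    """Find objects whose annotation matches `term` directly or via alignment."""
    concepts = set()
    for concept, names in labels.items():
        if term in names:
            concepts |= equivalents(concept)
    return sorted(obj for obj, c in annotations.items() if c in concepts)


print(search("Tokugawa"))
```

Without the alignment link, the same query would find nothing, since no SVCN concept carries the label “Tokugawa”.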

RDF/OWL language constructs

• classes and individuals
• subclasses
• properties
• subproperties
• domain/range of properties
• XML Schema datatypes
• equality, inequality
• inverse, transitive, symmetric, functional properties
• property constraints: cardinality, allValuesFrom, someValuesFrom
• conjunction, disjunction, negation of classes
• hasValue, enumerated type

How useful are RDF and OWL?

• RDF: basic level of interoperability
• Some constructs of OWL are key:
  – Logical characteristics of properties: symmetric, transitive, inverse
  – Identity: sameAs
• OWL pitfalls
  – Bad: if it is written in OWL it is an ontology
  – Worse: if it is not in OWL, then it is not an ontology
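To make the “key constructs” concrete, here is a minimal sketch of what declaring a property transitive, symmetric, or the inverse of another lets a system derive. It is a naive fixpoint loop in plain Python, not an OWL reasoner; the `ex:` properties and the sample facts are assumptions chosen for illustration, and the sameAs statement between the two thesauri is likewise hypothetical.

```python
def infer(triples, transitive=(), symmetric=(), inverse=None):
    """Close a set of triples under transitive, symmetric and inverse properties."""
    inverse = inverse or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                # Chain s -p-> o with any o -p-> o2 already known.
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


# Hypothetical sample facts.
facts = [
    ("ex:Amsterdam", "ex:partOf", "ex:Netherlands"),
    ("ex:Netherlands", "ex:partOf", "ex:Europe"),
    ("aat:edo", "owl:sameAs", "svcn:edo"),
]

closed = infer(
    facts,
    transitive={"ex:partOf", "owl:sameAs"},
    symmetric={"owl:sameAs"},
    inverse={"ex:partOf": "ex:hasPart"},  # applied in this direction only
)
print(("ex:Amsterdam", "ex:partOf", "ex:Europe") in closed)
```

Three declarations are enough to turn three stated facts into a usable network of derived links, which is the point the slide makes about these few constructs.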

W3C Semantic Web Deployment Working Group: making vocabularies/thesauri/ontologies available on the Web

• Schema for interoperable RDF/OWL representation of vocabularies
  – SKOS
• Publication guidelines:
  – URI management, representation of versions
• Embedding RDF in (X)HTML pages
  – RDFa

SKOS: pattern for thesaurus modeling

• Based on ISO standard
• RDF representation
• Documentation: http://www.w3.org/TR/swbp-skos-core-guide/
• Base class: SKOS Concept

Multi-lingual labels for concepts

Semantic relation: broader and narrower

• No subclass semantics assumed!

Indexing a resource with a SKOS concept

• primarySubject is defined as subproperty
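The SKOS pattern from the preceding slides (a concept with labels in several languages, related to other concepts by broader and narrower) can be sketched with plain dictionaries. This is an assumption-laden illustration rather than real SKOS data: the `ex:` identifiers and the Dutch labels are invented, and the dictionary keys merely mirror the SKOS property names `prefLabel` and `broader`.

```python
# A tiny SKOS-style concept scheme; identifiers and labels are illustrative only.
concepts = {
    "ex:ape": {
        "prefLabel": {"en": "ape", "nl": "aap"},
        "broader": [],
    },
    "ex:great-ape": {
        "prefLabel": {"en": "great ape", "nl": "mensaap"},
        "broader": ["ex:ape"],
    },
}


def narrower(concept_id):
    """Derive narrower concepts by reading the broader links in reverse."""
    return sorted(c for c, data in concepts.items() if concept_id in data["broader"])


def label(concept_id, lang):
    """Return the preferred label in `lang`, falling back to English."""
    pref = concepts[concept_id]["prefLabel"]
    return pref.get(lang, pref["en"])


print(narrower("ex:ape"), label("ex:great-ape", "nl"))
```

Note that the code only navigates the broader/narrower links; in line with the slide, it does not treat them as subclass statements.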

Adding semantics

• Adding OWL statements
• Interpretations of thesaurus relations such as narrower as subclass-of are often imprecise (but can still be useful)
• Learning relations between thesauri is an important form of additional semantics
  – Example: AAT contains styles; ULAN contains artists, but there is no link
  – Availability of this kind of alignment knowledge is extremely useful

W3C standardization process

• Input: draft specification
• Collect use cases
• Derive requirements
• Create issues list: requirements that cannot be handled by the draft spec
• Propose resolutions for issues
• Continuously: ask for public feedback/comments
• Get consensus on amended spec
• Find two independent implementations for each feature in the spec

Example issue: relationships between lexical labels

• In draft SKOS spec lexical labels of concepts are represented as datatype properties

• Use cases require relations between labels, e.g. “AAT” is an acronym of “Art & Architecture Thesaurus”

• This is a problem because literals have no URI (so they cannot be the subject of an RDF statement)

• Possible resolutions:
  – Labels/terms as classes
  – Relaxing constraints on label property
  – …..
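The issue can be made tangible with a small sketch. In the first list the labels are plain strings, so there is nowhere to attach “is an acronym of”. In the second list each label gets its own identifier and can therefore appear as the subject of a statement. This shows only one conceivable resolution, not the one the working group chose; `ex:literalForm`, `ex:acronymOf`, and the other `ex:` names are hypothetical.

```python
# Draft approach: labels are plain literals, so nothing can be said about them.
literal_triples = [
    ("ex:aat", "skos:prefLabel", "Art & Architecture Thesaurus"),
    ("ex:aat", "skos:altLabel", "AAT"),
]

# Alternative: each label is a resource with its own identifier.
# All "ex:" property names here are hypothetical.
label_triples = [
    ("ex:aat", "ex:prefLabel", "ex:label-1"),
    ("ex:aat", "ex:altLabel", "ex:label-2"),
    ("ex:label-1", "ex:literalForm", "Art & Architecture Thesaurus"),
    ("ex:label-2", "ex:literalForm", "AAT"),
    ("ex:label-2", "ex:acronymOf", "ex:label-1"),  # a statement about a label
]


def literal_form(label_id):
    """Return the text of a label resource, or None if it has none."""
    for s, p, o in label_triples:
        if s == label_id and p == "ex:literalForm":
            return o
    return None


def expand_acronym(text):
    """Look up the full form of an acronym through the label-to-label relation."""
    for s, p, o in label_triples:
        if p == "ex:acronymOf" and literal_form(s) == text:
            return literal_form(o)
    return None


print(expand_acronym("AAT"))
```

The price of this resolution is visible in the data: two simple statements have become five, which is why it was an open issue rather than an obvious fix.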

Recipes for vocabulary URIs

• Simplified rule:
  – Use the “hash” variant for vocabularies that are relatively small and require frequent access
    http://www.w3.org/2004/02/skos/core#Concept
  – Use the “slash” variant for large vocabularies, where you do not always want the whole vocabulary to be retrieved
    http://xmlns.com/foaf/0.1/Person
• For more information and other recipes, see: http://www.w3.org/TR/swbp-vocab-pub/
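The reason behind the rule is that an HTTP client never sends the fragment (the part after `#`) to the server. The short sketch below uses the standard-library `urldefrag` to show which URL is actually requested for the two example URIs; the helper name `document_to_fetch` is ours.

```python
from urllib.parse import urldefrag


def document_to_fetch(uri):
    """Return the URL a client requests over HTTP; the fragment is never sent."""
    url, _fragment = urldefrag(uri)
    return url


# Hash variant: every term resolves to the same document, the whole vocabulary.
print(document_to_fetch("http://www.w3.org/2004/02/skos/core#Concept"))

# Slash variant: each term has its own URL, so a server can answer per term.
print(document_to_fetch("http://xmlns.com/foaf/0.1/Person"))
```

With the hash variant, one request retrieves everything, which suits small vocabularies. With the slash variant, the server sees the individual term and can return only the description of that term.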

Query for WordNet URI returns “concise bounded description”

RDFa: embedding RDF metadata in an (X)HTML file

Regular HTML

Resulting RDF statements

HTML with RDFa

More information

E-Culture demonstrator

• Part of large Dutch knowledge-economy project MultimediaN

• Partners: VU, CWI, UvA, DEN, ICN
• People:
  – Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga

• Artchive.com, ICN: Rijksmuseum Amsterdam, Dutch ethnology musea (Amsterdam, Leiden), National Library (Bibliopolis)

Use case: painting style

Find paintings of a similar style

KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna

How can we find this other ‘Art nouveau’ painting?

MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on cardboard
91 x 73.5 cm
National Gallery, Oslo

Issues w.r.t. the use case

• Parse annotation to find matches with thesaurus terms
  – E.g. match artists to ULAN individuals
• Artists-style links
  – AAT contains styles; ULAN contains artists, but there is no link
    • Learn link from corpora
    • Derive it from other annotations
  – Domain-specific rules/reasoning needed
    • see example in SWRL doc
    • Painters may have painted in multiple styles
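Once artist-style links exist, the use case reduces to a simple rule: two paintings are similar in style if their artists share a style. The sketch below assumes the artist matching has already been done and that the links are available. The style assignments are illustrative assumptions, not data taken from AAT or ULAN.

```python
# Painting-to-artist annotations, using the two examples from the slides.
painting_artist = {
    "Portrait of Adele Bloch-Bauer I": "Klimt, Gustav",
    "The Scream": "Munch, Edvard",
}

# Artist-to-style links: the missing piece between ULAN and AAT.
# These values are illustrative assumptions; a painter may have several styles.
artist_styles = {
    "Klimt, Gustav": {"Art nouveau"},
    "Munch, Edvard": {"Art nouveau", "Expressionism"},
}


def styles_of(painting):
    """Derive the styles of a painting from the styles of its artist."""
    return artist_styles.get(painting_artist[painting], set())


def similar_paintings(painting):
    """Return other paintings sharing at least one style, with the shared styles."""
    own = styles_of(painting)
    result = {}
    for other in painting_artist:
        shared = own & styles_of(other)
        if other != painting and shared:
            result[other] = shared
    return result


print(similar_paintings("Portrait of Adele Bloch-Bauer I"))
```

Returning the shared styles along with each match lets the interface explain why a painting was found. Because a painter may have worked in several styles, a rule this coarse will also produce false matches, which is why the slide calls for domain-specific reasoning.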

Example enrichment

• Learning relations between art styles in AAT and artists in ULAN through NLP of art-historic texts

• But don’t learn things that already exist!

Culture Web demonstrator
http://e-culture.multimedian.nl

16 Nov 2006

Perspectives

• Basic Semantic Web technology is ready for deployment
  – in open knowledge-rich domains
  – Important research issues: scalability, vocabulary alignment, metadata extraction
• Web 2.0 features:
  – Involving community experts in annotation
  – Personalization, myArt
• Social barriers have to be overcome!
  – “open door” policy
  – Involvement of general public => issues of “quality”
• Importance of using open standards
  – Away from custom-made flashy web sites
