Standards for the Representation of Knowledge on the Semantic Web
Antoine ISAAC
STITCH Project
Offene Archivierbare Formate
Oct. 25th, 2007
Agenda
• Interoperability problems in Cultural Heritage
• An introduction to the Semantic Web
  • The problem
  • RDF
  • RDFS/OWL
  • Why is it interesting?
• Porting existing metadata to the Semantic Web
  • SKOS
• Conclusion: SW and semantic alignment
The Interoperability Problem in Cultural Heritage
• STITCH
  • SemanTic Interoperability To access Cultural Heritage
  • Here, CH at large (incl. Digital Libraries)
• Trend: simultaneous access to different collections
  • The European Library, Memory of the Netherlands
• Problem: how to access different collections seamlessly?
• Traditional solution: using object metadata
  • But…
KB Illustrated Manuscripts
Mandragore
The Interoperability Problems
From syntactic to semantic
• Different formats
  • “We have a solution”
  • XML as a standard for data exchange
• Different metadata schemes
  • “Something is coming”
  • Dublin Core for metadata exchange
The Interoperability Problems
From syntactic to semantic (continued)
• Different conceptual vocabularies for description
  • “Do you really want to discuss it now?”
  • No standard vocabulary
    • DDC, UDC, SWD, LCSH, AAT, Iconclass and myriads of others…
  • Not even a common model for these Knowledge Organization Schemes (KOSs)
    • thesauri, classification schemes, subject heading lists…
• Even worse: there are reasons for this!
The result
MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …
MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …
An Ideal Situation
MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …
MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …
Why think of the Semantic Web?
• Cf. Semantic Web activity page at W3C
  • http://www.w3.org/2001/sw/
• “The Semantic Web provides a common framework that allows data to be shared and reused”
• “The Semantic Web is a web of data”
• “It is about common formats for integration and combination of data drawn from diverse sources”
SW Problem: The Web for Humans
• A city
• A flag
• The city’s location
Meaning
SW Problem: The Web for Computers?
• Characters
• Images
Black boxes
• Markup
Layout/Display
The Interoperability Problems in CH (reminder)
MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …
MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …
The Semantic Web Approach: A Web of (Meta)data
[Graph figure. Nodes: par3, file1, Article, Document, Amsterdam, The_Netherlands, City. Links: subject, partOf, type, subClassOf, hasCapital.]
A footnote
• Why “(meta)data”?
• Because what is metadata for certain applications can indeed be the data for the Semantic Web
• Boundary is blurred
The Semantic Web (1/4)
• Pointing at resources
  • What? Knowledge objects, everything that we may want to refer to (including documents)
  • How? Uniform Resource Identifiers (incl. URLs)
A Web of Resources
myVoc2:Amsterdam
http://ex.org/files/file1#par3
http://ex.org/files/file1
myVoc1:Article
http://www.ned.nl/rep321
The Semantic Web (2/4)
• Pointing at resources: URIs
• Creating structured assertions involving resources
  • What? Structured assertions with typed links
  • How? RDF (Resource Description Framework)
Factual knowledge encoded as “triples”: subject – predicate (property) – object
(http://ex.org/files/file1#par3, myVoc1:subject, myVoc2:Amsterdam)
Data in an RDF “graph”
[Graph figure. Resources: http://ex.org/files/file1#par3, http://ex.org/files/file1, myVoc1:Article, myVoc2:Amsterdam, http://www.ned.nl/rep321. Links: myVoc1:subject, myVoc1:partOf, rdf:type.]
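The graph on this slide can be sketched as a plain set of (subject, predicate, object) tuples. This is only an illustration: the three links below are an assumed reading of the flattened figure (par3 has Amsterdam as subject and is part of file1, file1 is an Article), and the `match` helper is hypothetical, not part of any RDF library.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The three edges below are an assumed reading of the slide's figure.
PAR3 = "http://ex.org/files/file1#par3"
FILE1 = "http://ex.org/files/file1"

graph = {
    (PAR3, "myVoc1:subject", "myVoc2:Amsterdam"),
    (PAR3, "myVoc1:partOf", FILE1),
    (FILE1, "rdf:type", "myVoc1:Article"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything the graph states about the paragraph resource.
about_par3 = match(graph, s=PAR3)
```

Because a graph is just a set of triples, two graphs from different owners can be combined with a set union, provided they use the same URIs for the same resources.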
The Semantic Web (3/4)
• Pointing at resources: URIs
• Enabling structured assertions: RDF
• Giving machine-understandable semantics to “building blocks”
  • What? Ontologies
    • “Formal definitions of shared conceptual vocabularies”
    • Giving semantics for properties and classes
  • How? RDFS/OWL (Web Ontology Language)
RDF Schema (RDFS)
• Meta-language to create vocabularies
  • “Article” is an (RDFS) Class
    • Denotes a type, a collection of resources (individuals)
  • “subject” is an (RDFS) Property
• Giving semantics to vocabulary elements
  • My “Article” has the literal “article” as a label for display
    • myVoc1:Article rdfs:label “article”
  • “Article” is a subclass of the class “Document”
    • myVoc1:Article rdfs:subClassOf myVoc1:Document
  • “subject” is applied to resources of type “Document”
    • myVoc1:subject rdfs:domain myVoc1:Document
RDF Schema (RDFS)
• Different kinds of constructs
  • Assigning domains and ranges of properties
  • Creating hierarchies of classes and properties
  • Labels and informal specifications
• (Some) Equipped with formal semantics
  • R rdf:type C1, C1 rdfs:subClassOf C2 -> R rdf:type C2
  • P rdfs:domain C, R1 P R2 -> R1 rdf:type C
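The two entailment rules on this slide can be sketched as a small fixed-point loop over a set of triples. This is a toy illustration, not a full RDFS reasoner; the resources `ex:file1` and `ex:report7` are hypothetical names introduced for the example.

```python
def rdfs_closure(triples):
    """Apply two RDFS rules until no new triple appears.

    Rule 1: R rdf:type C1, C1 rdfs:subClassOf C2  ->  R rdf:type C2
    Rule 2: P rdfs:domain C, R1 P R2              ->  R1 rdf:type C
    """
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            if p == "rdf:type":
                # Rule 1: propagate the type up the class hierarchy.
                for (c1, q, c2) in inferred:
                    if q == "rdfs:subClassOf" and c1 == o:
                        new.add((s, "rdf:type", c2))
            # Rule 2: using a property types its subject with the domain.
            for (prop, q, cls) in inferred:
                if q == "rdfs:domain" and prop == p:
                    new.add((s, "rdf:type", cls))
        if new <= inferred:
            return inferred
        inferred |= new


# ex:file1 and ex:report7 are hypothetical resources for illustration.
facts = {
    ("ex:file1", "rdf:type", "myVoc1:Article"),
    ("myVoc1:Article", "rdfs:subClassOf", "myVoc1:Document"),
    ("myVoc1:subject", "rdfs:domain", "myVoc1:Document"),
    ("ex:report7", "myVoc1:subject", "myVoc2:Amsterdam"),
}
closure = rdfs_closure(facts)
```

Neither `ex:file1` nor `ex:report7` is stated to be a Document; both facts are derived, which is the kind of work the slides describe as delegated to inference engines.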
Web Ontology Language (OWL)
• Same function as RDFS, but more possibilities, e.g.
  • Characteristics of properties
    • Inverse(hasAuthor, authorOf)
  • Restriction on property usage
    • SubClassOf(Books, restriction(hasISBN minCardinality(1)))
  • Combination and exclusion of classes and properties
    • DisjointClasses(Persons, Books)
• Inherits from AI research and Description Logics
• Comes in different levels of complexity:
  • Lite, DL, Full
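To give a feel for what two of these OWL constructs license, here is a rough sketch in plain Python: an inverse-property rule that adds the reverse statement, and a disjointness check that reports resources typed with both classes. The helper names and the `ex:` resources are assumptions made for illustration; a real OWL reasoner does considerably more.

```python
def apply_inverse(triples, prop, inverse):
    """Inverse(prop, inverse): x prop y holds exactly when y inverse x holds."""
    result = set(triples)
    for (s, p, o) in triples:
        if p == prop:
            result.add((o, inverse, s))
        elif p == inverse:
            result.add((o, prop, s))
    return result


def disjointness_violations(triples, class_a, class_b):
    """DisjointClasses(a, b): return resources typed with both classes."""
    members_a = {s for (s, p, o) in triples if p == "rdf:type" and o == class_a}
    members_b = {s for (s, p, o) in triples if p == "rdf:type" and o == class_b}
    return members_a & members_b


# ex:book1 and ex:alice are hypothetical individuals.
data = {
    ("ex:book1", "hasAuthor", "ex:alice"),
    ("ex:book1", "rdf:type", "Books"),
    ("ex:alice", "rdf:type", "Persons"),
}
expanded = apply_inverse(data, "hasAuthor", "authorOf")
clashes = disjointness_violations(expanded, "Persons", "Books")
```

The first helper infers new facts; the second controls existing ones, matching the two roles the slides give to formal semantics.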
Tools to build RDFS/OWL ontologies
Ontological information
[Graph figure. Resources: http://ex.org/files/file1#par3, http://ex.org/files/file1, myVoc1:Article, myVoc1:Document, myVoc2:Amsterdam, http://www.ned.nl/rep321. Links: myVoc1:subject, myVoc1:partOf, rdf:type, rdfs:subClassOf.]
The Semantic Web (4/4)
• Pointing at resources: documents, knowledge objects
• Enabling structured assertions
• Using “building blocks” with precise semantics
• Controlling existing facts, inferring new ones
Part of the work is delegated from the user to inference engines that use the formal semantics of ontologies
Ontological information
[Graph figure. Resources: http://ex.org/files/file1#par3, http://ex.org/files/file1, myVoc1:Article, myVoc1:Document, myVoc2:Amsterdam, http://www.ned.nl/rep321. Links: myVoc1:subject, myVoc1:partOf, rdfs:subClassOf, and two rdf:type links.]
RDFS/OWL and Semantic Interoperability
Why is it interesting?
• The RDF model is simple
  • Just triples
• There is meaning exploitable by computers
• Resources are universal, hence shareable
  • One resource for one object, used in different places
• Vocabularies for (meta)data are made of resources
  • They can be re-used in different applications
  • Their meaning (incl. formal semantics) is shareable
• RDF does not enforce the use of a specific ontology
Building on top of the Web
• Web-based resources allow distribution/sharing of
  • document
  • vocabulary
  • (meta)data
(par3, subject, Amsterdam)
different owners & locations
http://www.kb.nl/eDepot
http://www.geo.org/voc/
http://www.ned.nl/rep321
Why is it interesting?
• Using open standards
  • W3C’s URI, XML, RDF, RDFS, OWL
Footnote: Building on top of XML
<rdf:Description rdf:about="http://www.ned.nl/doc321">
  <myVoc1:subject rdf:resource="http://www.geo.org/Amsterdam"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.geo.org/The_Netherlands">
  <myVoc2:hasCapital rdf:resource="http://www.geo.org/Amsterdam"/>
</rdf:Description>
• RDF can be encoded as XML data
  • RDF/XML is the reference syntax, but others are possible
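As a rough illustration of how such RDF/XML maps back to triples, the sketch below parses the slide's fragment with Python's standard `xml.etree.ElementTree`. The fragment is wrapped in an `rdf:RDF` root so that it is a complete XML document; the `myVoc1` and `myVoc2` namespace URIs are placeholders invented here, since the slide does not give them. The function handles only this simple `rdf:Description` pattern, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment wrapped in an rdf:RDF root. The myVoc1/myVoc2
# namespace URIs are placeholders invented for this sketch.
DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:myVoc1="http://example.org/myVoc1#"
         xmlns:myVoc2="http://example.org/myVoc2#">
  <rdf:Description rdf:about="http://www.ned.nl/doc321">
    <myVoc1:subject rdf:resource="http://www.geo.org/Amsterdam"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.geo.org/The_Netherlands">
    <myVoc2:hasCapital rdf:resource="http://www.geo.org/Amsterdam"/>
  </rdf:Description>
</rdf:RDF>
"""


def triples_from_rdfxml(text):
    """Extract triples from the simple rdf:Description pattern only."""
    triples = []
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree names look like "{namespace}local"; joining the
            # two parts gives the full predicate URI.
            predicate = child.tag[1:].replace("}", "", 1)
            obj = child.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, predicate, obj))
    return triples


triples = triples_from_rdfxml(DOC)
```

Both descriptions point at the same `http://www.geo.org/Amsterdam` URI, which is what lets the two statements join into one graph.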
Problem: Data Population
• How will Semantic Web data be created?
  • Creation of “born-semantic” data?
    • Automatic or manual (tagging)
  • Converting existing databases to SW format
    • Fits the vision of the SW as a place to exchange data
• In the CH situation: porting legacy metadata is fundamental
Porting CH Metadata to the Semantic Web
• Requirement: an ontology to create SW-enabled representations for metadata
  • “Ontologized” metadata schema
• A first candidate: Dublin Core for metadata schema
  • Well-established set of metadata elements
  • Already coming in RDFS!
Porting KOSs to the Semantic Web
• How about metadata values from Knowledge Organization Schemes?
  • E.g. dc:subject values (terms, keywords, classes…)
  • DC does not address the problem of KOS representation
• Why is it important?
  • Their heterogeneity is a primary source of interoperability problems
  • They are provided with (informal) semantics
    • Taxonomies, associative networks can be exploited in many applications
Porting KOSs to the Semantic Web
• A first solution: converting KOSs to formal ontologies
  • Ontologization of terms/concepts into classes
• Problem: KOSs are generally not full-fledged ontologies
  • Iconclass: “Group of Birds” rdfs:subClassOf “Birds”?
  • Much work is needed to make the semantics fit!
  • The concept of a car (reference = a subject in a KOS) vs. the class of cars (reference = a set of objects in the world)
• Things in ontologies and KOSs don’t have the same epistemological status
• We need a model for elements of the realm of subjects
Representing KOSs – Requirements
Many different models and formats to represent vocabularies
• Need for standard formats to develop standardized tools and methods
  • Semantic correspondences
  • Browsing/information retrieval tools using vocabularies
• Need to represent features commonly used by these tools
  • Especially lexical information and semantic links
SKOS (Simple Knowledge Organisation System)
• Model to represent KOSs (thesauri, classification schemes) on the Semantic Web in a simple way
  • Comparable to Dublin Core, for conceptual vocabularies
• SKOS offers building blocks to create XML/RDF data
  • Concepts and ConceptSchemes
  • Lexical properties (prefLabel, altLabel)
  • Semantic relations (broader, related)
  • Notes (scopeNote, definition)
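The building blocks listed above can be sketched as triples for a single concept. The `skos_concept` helper, the `ex:` identifiers and the labels are hypothetical; only the `skos:` property names come from the SKOS vocabulary.

```python
def skos_concept(uri, scheme, pref_label, alt_labels=(), broader=(), scope_note=None):
    """Build the triples for one skos:Concept (labels as (text, lang) pairs)."""
    triples = [
        (uri, "rdf:type", "skos:Concept"),
        (uri, "skos:inScheme", scheme),
        (uri, "skos:prefLabel", pref_label),
    ]
    triples += [(uri, "skos:altLabel", label) for label in alt_labels]
    triples += [(uri, "skos:broader", parent) for parent in broader]
    if scope_note is not None:
        triples.append((uri, "skos:scopeNote", scope_note))
    return triples


# Hypothetical thesaurus entry: "birds", with a synonym and a broader concept.
birds = skos_concept(
    "ex:birds", "ex:animalScheme",
    pref_label=("birds", "en"),
    alt_labels=[("fowl", "en")],
    broader=["ex:animals"],
)
```

Note that `skos:broader` links two concepts without claiming that one is a subclass of the other, which sidesteps the “Group of Birds” problem raised earlier.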
SKOS: Iconclass Example
SKOS: Limitations
• SKOS is a standard
  • Simple
  • Meant for information exchange and re-use
• Not everything can be represented!
  • E.g. for Iconclass, difficulty to represent all types of auxiliaries
    • Keys, structural digits…
• It is still work in progress
  • W3C Semantic Web Deployment Working Group
What have we seen?
• TODO
Back to the Problem: Semantic Alignment
• Different ontologies/individuals should be aligned at the semantic level
  • Using the same resources to join SW graphs together
  • Using the same vocabularies and semantics
• But: it is difficult to recognize equivalent resources at data creation time
  • There is (and will be) no such thing as a single ontology!
• A posteriori semantic alignment is needed
Back to the Problem: Semantic Alignment
• Fortunately, SW languages give appropriate means
  • Equivalence/specialization links for properties and classes
    • myVoc:auteur rdfs:subPropertyOf dc:creator
    • myVoc:Article owl:equivalentClass yourVoc:Artikel
  • Identity link for individuals
    • vu:aisaac owl:sameAs kb:AntoineIsaac
  • (yet unstable) SKOS mapping links for subjects
    • iconclass:birds exactMatch swd:vogel
• But they don’t do the job for us!
  • The links have to be created somehow
  • This is another story…
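Once such a link exists, an application can exploit it. The sketch below, with hypothetical helper names and `ex:` resources, shows how a single `rdfs:subPropertyOf` statement lets a query for `dc:creator` also pick up statements made with `myVoc:auteur`.

```python
def sub_properties(triples, prop):
    """Return prop plus every property declared (transitively) as its sub-property."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for (s, p, o) in triples:
            if p == "rdfs:subPropertyOf" and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def values(triples, subject, prop):
    """Objects of `prop` for `subject`, including sub-property statements."""
    props = sub_properties(triples, prop)
    return {o for (s, p, o) in triples if s == subject and p in props}


# Two collections merged into one graph; ex:book1 and ex:book2 are hypothetical.
merged = {
    ("myVoc:auteur", "rdfs:subPropertyOf", "dc:creator"),
    ("ex:book1", "myVoc:auteur", "vu:aisaac"),
    ("ex:book2", "dc:creator", "kb:AntoineIsaac"),
}
result = values(merged, "ex:book1", "dc:creator")
```

The alignment statement itself still has to come from somewhere, which is the point of the last bullets on the slide.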
Thank you!
Vocabulary alignment
• Find correspondences between vocabulary elements
  • “klassieke ruïnes” ≈ “landschap met ruïnes”
  • “maagd Maria” = “Heilige Moeder”
• STITCH aim: doing it (semi-)automatically
  • Vocabularies are big
  • They evolve over time
• Using techniques from Semantic Web research domain
  • Problem comparable to ontology alignment
  • Techniques already investigated there
    • Linguistics, statistics
Automatic alignment techniques
• Lexical
• Structural
• Statistical
• Background knowledge
Lexical alignment
• Labels of entities, textual definitions
[Figure: labels “tumor”, “brain”, “Long tumor”, “Long”; link “More specific than”]
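One very simple lexical technique compares the word sets of concept labels: equal sets suggest equivalence, and a label that contains all the words of another (plus more) suggests a more specific concept. The sketch below is an assumption about how this could look, not the matcher used in STITCH; the concept identifiers are hypothetical.

```python
def tokens(label):
    """Lower-cased set of words in a label."""
    return frozenset(label.lower().split())


def lexical_links(voc1, voc2):
    """Compare every pair of labels and propose candidate links."""
    links = []
    for c1, l1 in sorted(voc1.items()):
        for c2, l2 in sorted(voc2.items()):
            t1, t2 = tokens(l1), tokens(l2)
            if t1 == t2:
                links.append((c1, "equivalent", c2))
            elif t1 > t2:  # label 1 has all words of label 2, plus more
                links.append((c1, "more specific than", c2))
            elif t1 < t2:
                links.append((c2, "more specific than", c1))
    return links


# Hypothetical concept identifiers mapped to their labels.
voc1 = {"v1:c1": "Long tumor", "v1:c2": "brain"}
voc2 = {"v2:k1": "tumor", "v2:k2": "Brain"}
links = lexical_links(voc1, voc2)
```

Real vocabularies need more than this (stemming, translation, synonyms), which is one reason the slides stress that lexical results must be combined with other techniques.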
Statistical alignment
• Object information (e.g. book indexing)
[Figure: Thesaurus 1, Thesaurus 2, Collection of books; subjects “Dutch Literature” and “Dutch”]
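When the same books are indexed with two thesauri, the overlap between the sets of books attached to two subjects gives a statistical hint that the subjects correspond. A minimal sketch, assuming Jaccard overlap as the measure and a threshold of 0.5 (both are illustrative choices, and the book identifiers are made up):

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def statistical_links(index1, index2, threshold=0.5):
    """Propose links between subjects whose book sets overlap enough."""
    links = []
    for c1, books1 in sorted(index1.items()):
        for c2, books2 in sorted(index2.items()):
            score = jaccard(books1, books2)
            if score >= threshold:
                links.append((c1, c2, score))
    return links


# Hypothetical indexing data: subject -> set of book identifiers.
index1 = {"Dutch Literature": {"b1", "b2", "b3"}, "History": {"b4"}}
index2 = {"Dutch": {"b1", "b2", "b3", "b5"}, "Maps": {"b6"}}
links = statistical_links(index1, index2)
```

This only works where a collection is described by both vocabularies, one of the application conditions listed later in the talk.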
Alignment using shared background knowledge
• Using a shared conceptual reference to find links
[Figure: Thesaurus 1, Thesaurus 2, Background knowledge; subjects “Calendar” and “Publication”]
Alignment: no universal solution
• No single technique gives an ideal solution
• Different techniques have to be selected/combined, depending on the application case
  • Poor vs. rich semantic structure
  • Extensive vs. limited lexical coverage
  • Existence of collections described by several vocabularies
• Alignment is a difficult research problem
Conclusions: Alignment
• Simple techniques yield quick results
  • 12300 Mandragore concepts “accessible” from Iconclass
• Their reliability is not sufficient to use them as the sole source
  • Combination with manual work (checking, completion)
• Semantic alignment is still a difficult research problem
  • No technique is perfect
  • Techniques have to be selected/combined depending on the application case
Demo
• http://prauw.cs.vu.nl/rp33333/MANDRA-SV-ICE-mandraNewNONE , amphibiens
• Blé
Conclusions: Representation
• It is possible to produce standardized SW representations (SKOS) of conceptual vocabularies
  • And of the metadata that use them
• Existing techniques for accessing metadata and vocabularies (OAI-PMH, XML) make the work easier
• It is useful
  • Re-use/interoperability of application components that use the vocabularies
  • Ease of representing links to elements outside the represented vocabulary
Links
• STITCH http://stitch.cs.vu.nl
• Demo collections
  • BNF Mandragore http://mandragore.bnf.fr
  • KB illuminated manuscripts http://www.kb.nl/manuscripts/
• Library-originated integration projects:
  • MSAC search interface http://sigma.nkp.cz
  • MACS project http://macs.cenl.org
• Semantic web links
  • Semantic Web at W3C http://www.w3.org/2001/sw/
  • SKOS http://www.w3.org/2004/02/skos/
• Semantic Web projects dealing with Cultural Heritage
  • MuseumFinland http://www.museosuomi.fi/
  • eCulture http://e-culture.multimedian.nl/
Demo (1)
Subject vocabulary, collection 1
Subjects
Demo (2)
Hierarchical path from root to selected subject
Possible specialization for selected subject
Demo (3)
Document from Collection 2
Semantic alignment of subjects activated
Demo (4)
Subject from voc2 aligned to “voc1:amphibians”