CS3352
Metadata
The Semantic Web
Directories and Thesauri
XML is not enough
Topic maps
RDF
CS3352
Sources of Knowledge for finding documents
[DeRose99]
“The user, including their current explicit query and any historical or profile information the system may have gained earlier.
The documents in the library or on the web, including their nominal "content" and whatever metadata has been attached.
The world, about which the system may have certain information, such as dictionaries and thesauri of natural language terms; basic knowledge of object categories ("dog is-a animal"), and much more…”
Text, image
Mark-up, Links, Catalogue database
Ontologies, Thesauri
Knowledge
CS3352
What is metadata?
Data cataloging resources
– Administrative cataloguing: acquisition history, author…
– Structural: size, image format…
Data describing the content and meaning of resources
– royal UK male trophy presenter, footballer trophy winner
CS3352
Metadata Representation
Expressive, so we can say what we want;
Compositional, so that we can build complex terms out of simple pieces;
Controlled, so we only say consistent and coherent things;
Incremental, so we can keep adding descriptions
CS3352
Dublin Core
A standard for metadata defined by the digital library community
Others: MARC, VRA…
15 Elements:
– Title, Subject, Description
– Creator, Publisher, Contributor
– Date, Type, Format
– Identifier, Source, Language
– Relation, Coverage, Rights
From: Metadata for images, Michael Day, http://www.ukoln.ac.uk
Core elements defined in RFC 2413:
http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt
http://www.ariadne.ac.uk
http://www.ukoln.ac.uk
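A minimal sketch of how the 15 elements might be used: a record is checked against the element list and rendered as HTML meta tags. The `DC.` name prefix follows the common convention for embedding Dublin Core in HTML; the helper name `to_meta_tags` and the sample record are illustrative assumptions, not part of the lecture.

```python
from html import escape

# The 15 Dublin Core elements listed on the slide.
DC_ELEMENTS = {
    "title", "subject", "description",
    "creator", "publisher", "contributor",
    "date", "type", "format",
    "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def to_meta_tags(record):
    """Render a {element: value} record as HTML meta tags (hypothetical helper)."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return [
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]


# Sample record (assumed values for illustration).
record = {
    "title": "Being a Dog Is a Full-Time Job",
    "creator": "Charles M. Schulz",
    "identifier": "ISBN 0836217462",
}
tags = to_meta_tags(record)
```

Because the vocabulary is controlled, a misspelt or invented element is rejected rather than silently stored.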
CS3352
Metadata on the web yesterday
Meta tags
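As a rough illustration of what a crawler could do with meta tags, the sketch below reads `name`/`content` pairs out of a page with Python's standard `html.parser`. The class name `MetaTagReader` and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page used only for illustration.
PAGE = """<html><head>
<title>Peanuts</title>
<meta name="keywords" content="Snoopy, beagle, comic strip">
<meta name="description" content="A page about Snoopy">
</head><body></body></html>"""

reader = MetaTagReader()
reader.feed(PAGE)
```

The values are free text chosen by the page author, so nothing guarantees that two pages use the same keyword for the same thing.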
CS3352
Metadata on the Web yesterday
<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
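A small sketch of how a program might read such a document, using the standard `xml.etree.ElementTree` module on a shortened copy of the book example. The variable names are assumptions; note that the program only works because its author already knew what `friend-of` is supposed to mean.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the book example (XML declaration omitted for brevity).
BOOK = """<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <character><name>Snoopy</name><friend-of>Peppermint Patty</friend-of></character>
  <character><name>Peppermint Patty</name></character>
</book>"""

book = ET.fromstring(BOOK)

# Pair each character with the friend named in its <friend-of> child, if any.
friendships = [
    (character.findtext("name"), character.findtext("friend-of"))
    for character in book.findall("character")
]
```

The parser delivers a tree of labelled nodes; the meaning of each label has to be supplied by the code that walks the tree.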
CS3352
Metadata on the web yesterday
CS3352
World Wide Web
Tim Berners-Lee reprise…
“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”
Berners-Lee 1996
CS3352
Web = Data+Information-Knowledge
Browse the Links
Search using Words
– steamer, tank
Search using experience
Link structure is content
– rhetorical narratives
Search using indexes
Metadata and classifications
CS3352
“Find a very successful European team-based sports person”
Resource describing UK soccer players and their careers
Resource listing sporting competitions including FA Cup and Superbowl
Resource that lists teams that have won the FA Cup
Resource describing the Olympic Games
Steve Redgrave’s home page
?
• Metadata
• Knowledge
• Inference
CS3352
[Figure: ontology of sports people and competitions]
Concepts: People, Sports Person, Soccer player, Rower; Sport, Soccer (participants = 11), Rowing, Coxless Fours (participants = 4), Tennis; Competition, Tournament, Event, Sports Tournament, Soccer Tournament, Tennis Tournament, Olympic Games, Wimbledon, FA Cup; Country, UK, Europe
Relations: participates, win, holds, nationality, partof
Defined concepts: Rower win Olympic Games; UK Rower win Olympic Games > 2 times; Soccer player wins FA Cup once
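The query “Find a very successful European team-based sports person” can be sketched over a toy version of this ontology. Everything concrete here is an assumption for illustration: the is-a edges (the figure's arrows did not survive), the placeholder people, reading "team-based" as participants > 1, and reading "very successful" as more than two wins of one tournament.

```python
# Assumed is-a edges (child -> parent).
IS_A = {
    "Rower": "Sports Person",
    "Soccer player": "Sports Person",
    "Tennis player": "Sports Person",
    "Sports Person": "People",
}
PART_OF = {"UK": "Europe"}                      # partof relation
PARTICIPANTS = {"Soccer": 11, "Coxless Fours": 4}


def reaches(start, target, edges):
    """Follow edges transitively from start; True if target is reached."""
    node = start
    while node is not None:
        if node == target:
            return True
        node = edges.get(node)
    return False


# Placeholder individuals, not real career records.
PEOPLE = [
    {"name": "Rower A", "type": "Rower", "nationality": "UK",
     "sport": "Coxless Fours", "wins": {"Olympic Games": 3}},
    {"name": "Player B", "type": "Soccer player", "nationality": "UK",
     "sport": "Soccer", "wins": {"FA Cup": 1}},
    {"name": "Player C", "type": "Tennis player", "nationality": "UK",
     "sport": "Tennis", "wins": {"Wimbledon": 3}},
    {"name": "Rower D", "type": "Rower", "nationality": "USA",
     "sport": "Coxless Fours", "wins": {"Olympic Games": 3}},
]


def successful_european_team_players(people):
    return [
        p["name"] for p in people
        if reaches(p["type"], "Sports Person", IS_A)          # sports person
        and reaches(p["nationality"], "Europe", PART_OF)      # European
        and PARTICIPANTS.get(p["sport"], 1) > 1               # team-based (default 1 is assumed)
        and any(count > 2 for count in p["wins"].values())    # very successful
    ]


answer = successful_european_team_players(PEOPLE)
```

No single resource states the answer; it follows only when metadata about people is combined with knowledge about sports, places and tournaments.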
CS3352
A Shared Understanding
Metadata
– Data describing the content and meaning of resources
– But everyone must speak the same language…
Terminologies
– Shared and common vocabularies
– For search engines, agents, curators, authors and users
– But everyone must mean the same thing…
Ontologies
– Shared and common understanding of a domain
– Essential for exchange and discovery
CS3352
Ontologies
“The [reusable] specification of conceptualizations, used to help programs and humans share knowledge” [Gruber93]
An ontology will include:
– a vocabulary of terms, and
– some specification of their meaning
– structure on the domain and constrain the possible interpretations of terms [Uschold99]
– precise notion of what meaning means
Ontologies provide: a shared and common understanding of a domain that can be communicated across people and applications
CS3352
Ontology
Precise notion of what meaning means
– formal, explicit, rigorous
– unambiguous
– agents, not just people
– machine computable
– from machine-readable to machine-understandable
use knowledge representation and reasoning to supply the meaning
CS3352
What is an Ontology?
[Figure: spectrum of ontology kinds, from least to most formal]
– Catalog/ID
– Terms/glossary
– Thesauri “narrower term” relation
– Informal is-a
– Formal is-a
– Formal instance
– Frames (properties)
– Value Restrs.
– Disjointness, Inverse, part-of…
– General Logical constraints
From Debbie McGuinness
CS3352
Ontologies and E-Anything
Simple ontologies provide:
– Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language)
– Organization (and navigation support)
– Expectation setting (left side of many web pages)
– Browsing support (tagged structures such as Yahoo!)
– Search support (query expansion approaches such as FindUR, e-Cyc)
– Sense disambiguation
– Conflict detection
– Structured, comparative search
– Generalization/Specialization
– …
From Debbie McGuinness
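Search support through query expansion can be sketched with a tiny thesaurus of "narrower term" links. The thesaurus entries, the documents and the function names below are all assumptions made for illustration; this is not how FindUR or e-Cyc are implemented.

```python
from collections import deque

# Hypothetical "narrower term" thesaurus.
NARROWER = {
    "sport": ["soccer", "rowing", "tennis"],
    "rowing": ["coxless fours"],
}

# Hypothetical document collection.
DOCS = {
    "doc1": "Results of the coxless fours final",
    "doc2": "A history of the steam engine",
    "doc3": "Soccer fixtures for the weekend",
}


def expand(term):
    """Return the term followed by all of its narrower terms, breadth first."""
    expanded, queue = [], deque([term])
    while queue:
        current = queue.popleft()
        if current in expanded:
            continue
        expanded.append(current)
        queue.extend(NARROWER.get(current, []))
    return expanded


def search(query, use_thesaurus=True):
    """Return ids of documents containing the query or, optionally, a narrower term."""
    terms = expand(query.lower()) if use_thesaurus else [query.lower()]
    return sorted(
        doc_id for doc_id, text in DOCS.items()
        if any(term in text.lower() for term in terms)
    )
```

A plain keyword search for "sport" finds nothing in this collection, while the expanded search also matches the documents about soccer and coxless fours.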
CS3352
The Semantic Web
http://www.semanticweb.org
CS3352
Metadata on the web tomorrow
Resources annotated with metadata using knowledge as a shared vocabulary
– Metadata held outside the resource
Knowledge structures for holding the ontology
– XML DTDs
Product classifications
– Directories
Home > Recreation > Sports > Events > International Games > Olympic Games >
W3C: RDF and RDFS
– Resource Description Framework
Topic maps
DAML+OIL
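One way to picture metadata held outside the resource is as RDF-style statements of the form (subject, predicate, object). The sketch below uses plain Python tuples rather than an RDF library; the `example.org` URLs and the statements themselves are hypothetical, while the predicate namespace is the Dublin Core element namespace.

```python
DC = "http://purl.org/dc/elements/1.1/"           # Dublin Core element namespace
PAGE = "http://example.org/redgrave.html"         # hypothetical resource URL

# Statements about resources, stored separately from the pages themselves.
TRIPLES = [
    (PAGE, DC + "title", "Steve Redgrave's home page"),
    (PAGE, DC + "subject", "Rowing"),
    (PAGE, DC + "language", "en"),
    ("http://example.org/fa-cup.html", DC + "subject", "Soccer"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


about_rowing = match(predicate=DC + "subject", obj="Rowing")
```

Because predicates are URIs from a shared vocabulary, two independent annotators who use `subject` refer to the same property.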
CS3352
XML is not good for describing ontologies
XML defines grammars to verify and structure documents
The grammar enforces constraints on tags
Different grammars define the same content
XML lacks a semantic model – it only has a surface model which is a tree.
[Figure: tree of nodes: course; teacher, title, students; name, http]
<course date="...">
  <title>...</title>
  <teacher>...</teacher>
  <name>...</name>
  <http>...</http>
  <students>...</students>
</course>
• node = label + attr/values + contents
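The point that different grammars can carry the same content may be sketched with two hypothetical encodings of one fact ("Ada teaches CS3352"). The documents, the teacher name and the function names are assumptions; `shape` follows the slide's reading of a node as label + attr/values + contents.

```python
import xml.etree.ElementTree as ET

# Two hypothetical grammars for the same fact.
DOC_A = "<course><title>CS3352</title><teacher><name>Ada</name></teacher></course>"
DOC_B = '<teacher name="Ada"><teaches course="CS3352"/></teacher>'


def shape(elem):
    """Surface model of a node: (label, attr/values, text, child nodes)."""
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    return (elem.tag, attrs, text, tuple(shape(child) for child in elem))


def fact_from_a(root):
    """Grammar-specific reader for DOC_A: (teacher, course)."""
    return (root.findtext("teacher/name"), root.findtext("title"))


def fact_from_b(root):
    """Grammar-specific reader for DOC_B: (teacher, course)."""
    return (root.get("name"), root.find("teaches").get("course"))


tree_a, tree_b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
```

The two trees differ, yet both readers return the same fact; the equivalence lives in the hand-written readers, not in anything XML itself provides.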
CS3352
XML is not good for describing ontologies
Meaning of XML documents is intuitively clear
– “semantic” markup tags are domain terms
But computers do not have intuition
– Tag names per se do not provide semantics
– The semantics are encoded outside the XML specification
XML makes no commitment on:
– Domain specific ontological vocabulary
– Ontological modelling primitives
Requires pre-arranged agreement on vocabulary & primitives
Feasible for closed collaboration
– agents in a small & stable community
– pages on a small & stable intranet
CS3352
XML DTDs and XML Schema
DTD does not distinguish between objects and relations
XML Schema’s type extension mechanism is a red herring – it can’t be used to model ontological subtypes
XML has been used as a serialisation syntax for other markup languages – e.g. SMIL, XOL
<class>
  <name>person</name>
</class>
<slot>
  <name>year-of-birth</name>
  <domain>person</domain>
  <slot-cardinality>1</slot-cardinality>
</slot>
CS3352
Requirements for an Ontology-language
Well designed
– Useful and proven modelling primitives
– Intuitive to human users
– Can say simple things simply
– Expressive enough to capture many ontologies
– Efficient, sound and complete reasoning support
Well defined
– Clear syntax – read ontologies
– Formal semantics – understand (process) ontologies, to facilitate machine interpretation of that semantics
– Expressive enough to capture many ontologies
Compatible
– Easy mapping to/from other ontology languages
– Maximum compatibility with XML and RDF(S)
CS3352
Sem Web Research Issues
Ontology creation
– Millions of ontologies will be built
– Ontology Engineering is difficult and time-consuming
– Ontology Learning
– Scalable RDF Repositories (all is built on top of the same data model!)
Infrastructure
– Scalable reasoning services for different languages
– Resource-ID Management
– Versioning of ontologies and corresponding metadata
CS3352
Sem Web Research Issues
Metadata Management
– legacy data (HTML, XML, ...) -> legacy data migration:
– Annotation of Web documents (HTML, PDF, ...)
– Semi-automation using information extraction
– XML-Wrapper / Transformer
– Database Converter / Exporter
Maintenance of Metadata, ontologies and resources
– sources, ontologies, and metadata have to be maintained in a consistent way
– organizational process is needed
– tools are needed
– metadata have to reflect changes of the sources
– metadata have to reflect changes of the ontologies
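A maintenance tool of the kind called for here might, as a rough sketch, flag annotations that have drifted from their source or from the ontology. The data layout (a checksum stored with each annotation), the file names and the function names are assumptions made for this example.

```python
import hashlib


def checksum(text):
    """Fingerprint of a source's content at annotation time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical current state of the ontology and the sources.
ONTOLOGY_TERMS = {"Rower", "Soccer player", "Olympic Games"}
SOURCES = {
    "page1.html": "Rowing results, updated",
    "page2.html": "FA Cup winners",
}

# Hypothetical annotations, each recording the source content it was made against.
ANNOTATIONS = [
    {"resource": "page1.html", "term": "Rower",
     "source_checksum": checksum("Rowing results")},
    {"resource": "page2.html", "term": "Footballer",
     "source_checksum": checksum("FA Cup winners")},
    {"resource": "page2.html", "term": "Soccer player",
     "source_checksum": checksum("FA Cup winners")},
]


def find_inconsistencies(annotations, sources, terms):
    """Return (resource, term, reason) for every annotation needing attention."""
    problems = []
    for a in annotations:
        if a["term"] not in terms:
            problems.append((a["resource"], a["term"], "term not in ontology"))
        if a["resource"] not in sources:
            problems.append((a["resource"], a["term"], "source missing"))
        elif checksum(sources[a["resource"]]) != a["source_checksum"]:
            problems.append((a["resource"], a["term"], "source changed"))
    return problems


problems = find_inconsistencies(ANNOTATIONS, SOURCES, ONTOLOGY_TERMS)
```

The check only detects drift; deciding how to repair an annotation still needs the organizational process mentioned on the slide.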
CS3352
Selected Semantic Web Projects
COHSE – http://inanna.ecs.soton.ac.uk/cohse/
Ontobroker – http://ontobroker.aifb.uni-karlsruhe.de/
SHOE – http://www.cs.umd.edu/projects/plus/SHOE/