CS3352 Metadata Metadata The Semantic Web Directories and Thesauri XML is not enough Topic maps RDF

Upload: nora-watts

Post on 12-Jan-2016


Page 1: CS3352 Metadata The Semantic Web Directories and Thesauri XML is not enough Topic maps RDF

CS3352

Metadata

The Semantic Web
Directories and Thesauri
XML is not enough
Topic maps
RDF

Page 2

Sources of Knowledge for finding documents

[DeRose99]

“The user, including their current explicit query and any historical or profile information the system may have gained earlier.

The documents in the library or on the web, including their nominal "content" and whatever metadata has been attached.

The world, about which the system may have certain information, such as dictionaries and thesauri of natural language terms; basic knowledge of object categories ("dog is-a animal"), and much more…”

Text, image
Mark-up, Links, Catalogue database
Ontologies, Thesauri
Knowledge

Page 3

What is metadata?

Data cataloguing resources
– Administrative cataloguing: acquisition history, author…
– Structural: size, image format…

Data describing the content and meaning of resources

royal UK male trophy presenter, footballer trophy winner

Page 4

Metadata Representation

Expressive, so we can say what we want;
Compositional, so that we can build complex terms out of simple pieces;
Controlled, so we only say consistent and coherent things;
Incremental, so we can keep adding descriptions

Page 5

Dublin Core

A standard for metadata defined by the digital library community
Others: MARC, VRA…

15 Elements:
– Title, Subject, Description
– Creator, Publisher, Contributor
– Date, Type, Format
– Identifier, Source, Language
– Relation, Coverage, Rights

From: Metadata for images, Michael Day, http://www.ukoln.ac.uk

Core elements defined in RFC 2413:
http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt

http://www.ariadne.ac.uk
http://www.ukoln.ac.uk
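As a rough illustration of the fifteen elements listed above, the sketch below stores a Dublin Core style description as a plain Python dictionary and checks its keys against the element set. The element names come from the slide; the sample record values and the helper name `unknown_elements` are invented for illustration only.

```python
# The fifteen Dublin Core elements named on the slide (lower-cased).
DC_ELEMENTS = {
    "title", "subject", "description",
    "creator", "publisher", "contributor",
    "date", "type", "format",
    "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(key for key in record if key.lower() not in DC_ELEMENTS)


# Hypothetical record describing an image; every value is made up.
record = {
    "title": "Trophy presentation",
    "creator": "Unknown photographer",
    "type": "Image",
    "format": "image/jpeg",
    "subject": "football; trophy; royalty",
}

print(unknown_elements(record))                       # all keys are known elements
print(unknown_elements({"title": "x", "mood": "y"}))  # 'mood' is not an element
```

A controlled element set like this is what lets different catalogues exchange records: a key outside the set is flagged rather than silently accepted.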

Page 6

Metadata on the web yesterday

Meta tags

Page 7

Metadata on the Web yesterday

<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
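A small sketch of what a program can and cannot get out of the book record above, using Python's standard `xml.etree.ElementTree`. The XML declaration is left out of the string for brevity; the variable names are illustrative. The parser recovers the structure, but it attaches no meaning to a tag name such as `friend-of`.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>"""

book = ET.fromstring(BOOK_XML)

# Attributes and child text are reachable by tag name ...
isbn = book.get("isbn")
title = book.findtext("title")
names = [c.findtext("name") for c in book.findall("character")]

# ... but <friend-of> is only a label whose text happens to match another
# character's name; nothing tells the parser that it denotes a relation.
friends = {
    c.findtext("name"): c.findtext("friend-of")
    for c in book.findall("character")
}

print(isbn, title, names, friends)
```

A missing `friend-of` element simply comes back as `None`; whether that means "no friends" or "not recorded" is not something the markup can say.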

Page 8

Metadata on the web yesterday

Page 9

World Wide Web

Tim Berners-Lee reprise…

“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”

Berners-Lee 1996

Page 10

Web = Data+Information-Knowledge

Browse the Links
Search using Words
– steamer, tank
Search using experience
Link structure is content
– rhetorical narratives
Search using indexes
Metadata and classifications

Page 11

“Find a very successful European team-based sports person”

Resource describing UK soccer players and their careers
Resource listing sporting competitions including FA Cup and Superbowl
Resource that lists teams that have won the FA Cup
Resource describing the Olympic Games
Steve Redgrave’s home page

?
• Metadata
• Knowledge
• Inference

Page 12

[Figure: ontology fragment for the sports query. Node labels: People, Sport, Competition, Country, Event, Tournament, Sports Tournament, Soccer Tournament, Tennis Tournament, Olympic Games, Wimbledon, FA Cup, Sports Person, Soccer player, Rower, Soccer (participants = 11), Rowing, Coxless Fours (participants = 4), Tennis, UK, Europe. Edge labels: participates, win, nationality, partof, holds. Annotations: "Rower win Olympic Games", "UK Rower win Olympic Games > 2 times", "Soccer player wins FA Cup once".]
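A minimal sketch of how metadata, knowledge and inference might combine to answer the query "Find a very successful European team-based sports person". This is not a real reasoner. The is-a and part-of links, the threshold for "very successful", and the two individuals are assumptions loosely based on the labels in the figure.

```python
# Assumed is-a links between concepts (child -> parent).
IS_A = {
    "Rower": "Sports Person",
    "Soccer player": "Sports Person",
}

# Assumed part-of links between places.
PART_OF = {"UK": "Europe"}

# Team size per sport, following the "participants = ..." labels.
PARTICIPANTS = {"Soccer": 11, "Coxless Fours": 4}

# Hypothetical metadata about individuals; names and win counts are invented.
PEOPLE = [
    {"name": "Rower A", "kind": "Rower", "nationality": "UK",
     "sport": "Coxless Fours", "wins": {"Olympic Games": 3}},
    {"name": "Player B", "kind": "Soccer player", "nationality": "UK",
     "sport": "Soccer", "wins": {"FA Cup": 1}},
]


def is_a(kind, ancestor):
    """Follow is-a links upward from `kind` until `ancestor` or the root."""
    while kind is not None:
        if kind == ancestor:
            return True
        kind = IS_A.get(kind)
    return False


def located_in(place, region):
    """Follow part-of links upward from `place`."""
    while place is not None:
        if place == region:
            return True
        place = PART_OF.get(place)
    return False


def very_successful_european_team_sports_people(people, min_wins=3):
    """Assumption: 'very successful' means at least `min_wins` wins of one event."""
    return [
        p["name"] for p in people
        if is_a(p["kind"], "Sports Person")
        and located_in(p["nationality"], "Europe")
        and PARTICIPANTS.get(p["sport"], 1) > 1
        and max(p["wins"].values(), default=0) >= min_wins
    ]


print(very_successful_european_team_sports_people(PEOPLE))
```

None of the individual records says "European" or "team-based"; both are derived by following links in the shared knowledge, which is the point the figure makes.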

Page 13

A Shared Understanding

Metadata
– Data describing the content and meaning of resources
– But everyone must speak the same language…

Terminologies
– Shared and common vocabularies
– For search engines, agents, curators, authors and users
– But everyone must mean the same thing…

Ontologies
– Shared and common understanding of a domain
– Essential for exchange and discovery

Page 14

Ontologies

“The [reusable] specification of conceptualizations, used to help programs and humans share knowledge” [Gruber93]

An ontology will include:
– a vocabulary of terms, and
– some specification of their meaning
– a structure on the domain that constrains the possible interpretations of terms [Uschold99]
– a precise notion of what meaning means

Ontologies provide: a shared and common understanding of a domain that can be communicated across people and applications

Page 15

Ontology

Precise notion of what meaning means
– formal, explicit, rigorous
– unambiguous
– for agents, not just people
– machine computable
– from machine-readable to machine-understandable

Use knowledge representation and reasoning to supply the meaning

Page 16

What is an Ontology?

[Figure: spectrum of ontology definitions, from simple to expressive]
– Catalog/ID
– Terms/glossary
– Thesauri, “narrower term” relation
– Informal is-a
– Formal is-a
– Formal instance
– Frames (properties)
– Value Restrs.
– Disjointness, Inverse, part-of…
– General Logical constraints

From Debbie McGuinness

Page 17

Ontologies and E-Anything

Simple ontologies provide:
– Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language)
– Organization (and navigation support)
– Expectation setting (left side of many web pages)
– Browsing support (tagged structures such as Yahoo!)
– Search support (query expansion approaches such as FindUR, e-Cyc)
– Sense disambiguation
– Conflict detection
– Structured, comparative search
– Generalization/Specialization
– …

From Debbie McGuinness
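The search support item mentions query expansion. The sketch below shows the general idea only, and does not describe how FindUR or e-Cyc work: a query term is expanded with the narrower terms found in a small thesaurus before matching documents. The thesaurus entries, the documents and the function names are all hypothetical.

```python
# Hypothetical "narrower term" thesaurus: term -> more specific terms.
NARROWER = {
    "sport": ["soccer", "rowing", "tennis"],
    "rowing": ["coxless fours"],
}


def expand(term, thesaurus=NARROWER):
    """Return `term` plus every narrower term reachable from it."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push in reverse so narrower terms are visited in listed order.
        stack.extend(reversed(thesaurus.get(current, [])))
    return seen


def search(query, documents):
    """Return ids of documents containing the query term or a narrower term."""
    terms = expand(query)
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in terms)]


docs = {
    "d1": "Results of the coxless fours final",
    "d2": "Recipe for steamed pudding",
    "d3": "Tennis tournament schedule",
}

print(expand("sport"))
print(search("sport", docs))
```

Neither matching document contains the word "sport"; they are found only because the thesaurus links the query term to more specific ones. Plain substring matching is used to keep the sketch short, so a real system would need proper tokenisation.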

Page 18

The Semantic Web

http://www.semanticweb.org

Page 19

Metadata on the web tomorrow

Resources annotated with metadata using knowledge as a shared vocabulary
– Metadata held outside the resource

Knowledge structures for holding the ontology
– XML DTDs
Product classifications
– Directories
Home > Recreation > Sports > Events > International Games > Olympic Games >

W3C: RDF and RDFS
– Resource Description Framework

Topic maps
DAML+OIL
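A sketch of what "metadata held outside the resource" can look like when written as subject, predicate, object statements, which is the basic shape of an RDF description. Plain tuples are used instead of an RDF library. All URIs and the `ex:` names are made up; `dc:` and `rdfs:` stand in for the Dublin Core and RDF Schema vocabularies.

```python
# Each statement is a (subject, predicate, object) triple, stored separately
# from the page it describes. All URIs and ex: names are hypothetical.
TRIPLES = [
    ("http://example.org/pages/rower-a", "dc:title", "Rower A's home page"),
    ("http://example.org/pages/rower-a", "dc:subject", "ex:Rowing"),
    ("http://example.org/pages/games", "dc:subject", "ex:OlympicGames"),
    ("ex:Rowing", "rdfs:subClassOf", "ex:Sport"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Which resources are annotated with the subject ex:Rowing?
about_rowing = [
    s for s, _, _ in match(TRIPLES, predicate="dc:subject", obj="ex:Rowing")
]
print(about_rowing)
```

Because the statements live apart from the pages, anyone can annotate a resource without editing it, and vocabulary statements such as the `rdfs:subClassOf` line can sit in the same store as the annotations.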

Page 20

XML is not good for describing ontologies

XML defines grammars to verify and structure documents
The grammar enforces constraints on tags
Different grammars define the same content
XML lacks a semantic model – it only has a surface model which is a tree.

[Figure: tree rooted at course, with nodes title, teacher, students, name, http]

<course date="...">
  <title>...</title>
  <teacher>...</teacher>
  <name>...</name>
  <http>...</http>
  <students>...</students>
</course>

• node = label + attr/values + contents
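To make "different grammars define the same content" concrete, the sketch below parses two hypothetical encodings of the same statement (who teaches a course) and dumps each as a surface tree of label, attribute/value pairs and contents. The element names and the teacher's name are invented; `surface` is an illustrative helper, not a standard function.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same fact: who teaches the course.
AS_ELEMENT = "<course><teacher><name>Smith</name></teacher></course>"
AS_ATTRIBUTE = '<course teacher="Smith"/>'


def surface(node):
    """Describe a node as (label, attributes, text, children)."""
    text = (node.text or "").strip()
    return (node.tag, dict(node.attrib), text, [surface(child) for child in node])


tree_a = surface(ET.fromstring(AS_ELEMENT))
tree_b = surface(ET.fromstring(AS_ATTRIBUTE))

print(tree_a)
print(tree_b)
# The trees differ although a human reads the same statement in both.
print(tree_a == tree_b)
```

A program comparing the two documents sees only two unrelated trees. Recognising them as equivalent needs knowledge that lives outside XML.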

Page 21

XML is not good for describing ontologies

Meaning of XML documents is intuitively clear
– “semantic” markup tags are domain terms

But computers do not have intuition
– Tag names per se do not provide semantics
– The semantics are encoded outside the XML specification

XML makes no commitment on:
– Domain specific ontological vocabulary
– Ontological modelling primitives
This requires pre-arranged agreement on both

Feasible for closed collaboration
– agents in a small & stable community
– pages on a small & stable intranet

Page 22

XML DTDs and XML Schema

DTD does not distinguish between objects and relations
XML Schema’s type extension mechanism is a red herring – it can’t be used to model ontological subtypes
XML has been used as a serialisation syntax for other markup languages – e.g. SMIL, XOL

<class>
  <name>person</name>
</class>
<slot>
  <name>year-of-birth</name>
  <domain>person</domain>
  <slot-cardinality>1</slot-cardinality>
</slot>
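A sketch of how a program might read the class and slot definitions from the XOL-style fragment, with the `domain` element written as `<domain>person</domain>`. The `<fragment>` wrapper is made up so that the two top-level elements parse as one document; it is not claimed to be part of XOL. The consistency check at the end is an illustrative assumption about what such a reader would want to verify.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, wrapped in a made-up <fragment> root so that
# it parses as a single document.
XOL_FRAGMENT = """<fragment>
  <class><name>person</name></class>
  <slot>
    <name>year-of-birth</name>
    <domain>person</domain>
    <slot-cardinality>1</slot-cardinality>
  </slot>
</fragment>"""

root = ET.fromstring(XOL_FRAGMENT)

# Collect declared class names and the slots attached to them.
classes = [c.findtext("name").strip() for c in root.findall("class")]
slots = {
    s.findtext("name").strip(): {
        "domain": s.findtext("domain").strip(),
        "cardinality": int(s.findtext("slot-cardinality")),
    }
    for s in root.findall("slot")
}

# Every slot should refer to a class that is actually declared.
undeclared = [name for name, slot in slots.items()
              if slot["domain"] not in classes]

print(classes, slots, undeclared)
```

Here XML only carries the syntax; the meaning of "class", "slot" and "domain" comes from the ontology language layered on top of it.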

Page 23

Requirements for an Ontology-language

Well designed
– Useful and proven modelling primitives
– Intuitive to human users
– Can say simple things simply
– Expressive enough to capture many ontologies
– Efficient, sound and complete reasoning support

Well defined
– Clear syntax – read ontologies
– Formal semantics – understand (process) ontologies – to facilitate machine interpretation of that semantics
– Expressive enough to capture many ontologies

Compatible
– Easy mapping to/from other ontology languages
– Maximum compatibility with XML and RDF(S)

Page 24

Sem Web Research Issues

Ontology creation
– Millions of ontologies will be built
– Ontology Engineering is difficult and time-consuming
– Ontology Learning
– Scalable RDF Repositories (all is built on top of the same data model!)

Infrastructure
– Scalable reasoning services for different languages
– Resource-ID Management
– Versioning of ontologies and corresponding metadata

Page 25

Sem Web Research Issues

Metadata Management
– legacy data (HTML, XML, ...) -> legacy data migration:
– Annotation of Web documents (HTML, PDF, ...)
– Semi-automation using information extraction
– XML-Wrapper / Transformer
– Database Converter / Exporter

Maintenance of Metadata, ontologies and resources
– sources, ontologies, and metadata have to be maintained in a consistent way
organizational process is needed
tools are needed
metadata have to reflect changes of the sources
metadata have to reflect changes of the ontologies

Page 26

Selected Semantic Web Projects

COHSE – http://inanna.ecs.soton.ac.uk/cohse/
Ontobroker – http://ontobroker.aifb.uni-karlsruhe.de/
SHOE – http://www.cs.umd.edu/projects/plus/SHOE/