Nancy Ide

Vassar College

USA

Resource Description Framework

A Tutorial

EUROLAN 2003 • July 28 - August 8 • Bucharest - Romania

EUROLAN 2003 THE SEMANTIC WEB AND LANGUAGE TECHNOLOGY July 28 - August 8, Bucharest - Romania

Outline

The Semantic Web: Where RDF fits in

RDF overview

  Concepts

  Data Model

RDF Syntax

RDF Schema

RDF, RDFS and language technology

Overview

What is the Semantic Web?

  “a conceptual information space in which resources identified by URIs can be processed by machines”

Relies on three key elements:

  identification of resources

  defining the semantics of resource descriptions and relationships among resources

  inferring new knowledge from available information

All of this must be done using common, machine-processable notations

Supporting Technologies

The Layer-cake model

XML

RDF, RDF Schema

Ontologies (OWL)

Rules

Logic Framework

The Base: XML

Provides a common syntax for marking up documents

Data model: ordered, labeled tree

<bookInfo>
  <title>The Royal Navy</title>
  <author>
    <persName type="pen name">
      <title>Sir</title>
      <foreName>Edward</foreName>
      <surName>Bulwer-Lytton</surName>
      <roleName>Baron Lytton of <placeName>Kenworth</placeName></roleName>
    </persName>
  </author>
</bookInfo>

The corresponding tree:

bookInfo
  title
  author
    persName
      title
      foreName
      surName
      roleName
        placeName
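The "ordered, labeled tree" view can be sketched in a few lines of Python using the standard-library parser. This is only an illustration: the function name `outline` and the variable names are assumptions, not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# The bookInfo example from the slide, as a well-formed XML string.
BOOK_INFO = """<bookInfo>
  <title>The Royal Navy</title>
  <author>
    <persName type="pen name">
      <title>Sir</title>
      <foreName>Edward</foreName>
      <surName>Bulwer-Lytton</surName>
      <roleName>Baron Lytton of <placeName>Kenworth</placeName></roleName>
    </persName>
  </author>
</bookInfo>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:  # children are visited in their original order
        rows.extend(outline(child, depth + 1))
    return rows


tree_rows = outline(ET.fromstring(BOOK_INFO))
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

The parser recovers the nesting and the labels, and nothing more: it has no idea what the labels mean.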

Why Do We Need RDF?

XML provides only impoverished semantics

What the human sees:

<bookInfo>
  <title>The Royal Navy</title>
  <author>
    <persName type="pen name">
      <title>Sir</title>
      <foreName>Edward</foreName>
      <surName>Bulwer-Lytton</surName>
      <roleName>Baron Lytton of <placeName>Kenworth</placeName></roleName>
    </persName>
  </author>
</bookInfo>

What the computer sees:

<X356T0>
  <Y71109>The Royal Navy</Y71109>
  <KH561F>
    <L098JN type="pen name">
      <Y71109>Sir</Y71109>
      <XXS553>Edward</XXS553>
      <NJK098>Bulwer-Lytton</NJK098>
      <R4W23T>Baron Lytton of <PPY6G1>Kenworth</PPY6G1></R4W23T>
    </L098JN>
  </KH561F>
</X356T0>

XML “semantics”

No agreement on:

structure

  what does nesting mean? Part-of? Something else?

  is bookInfo an object? class? attribute? relation? something else?

vocabulary

  do both title elements mean the same thing?

  is author the same as creator?

RDF: Resource Description Framework

Provides a machine-processable way to give meaning to information

W3C Recommendation: http://www.w3c.org/RDF

A data model for describing data about data (metadata)

RDF

Three object types:

Resources

  Things being described by RDF expressions. Resources are always named by URIs

  e.g., an HTML document, a specific XML element within the document source, a collection of pages, a book

Properties

  Specific aspect, characteristic, attribute or relation used to describe a resource

  e.g., Creator, Title, Name

Statements

  Resource (Subject) + Property (Predicate) + Property Value (Object)

The Data Model

RDF Statements

  Three parts: subject, predicate, object

  describe properties of resources

Resource

  Anything that can be identified by a URI

  a document, part of a document, or image on the Web, e.g. http://www.cs.vassar.edu/~ide

  a real-world object, e.g. a book: isbn://9402-5546-1234

URIs

Uniform Resource Identifier

  The generic set of all names/addresses consisting of short strings that refer to resources

URLs (Uniform Resource Locators) are a particular type of URI, used on the WWW

URIs look like URLs, sometimes with fragment identifiers to point at specific parts of a document

http://somedomain.com/some/path/to/file#fragment
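As a small illustration of the fragment identifier, the example URI above can be taken apart with Python's standard `urllib.parse` module. The variable names are chosen here for illustration only.

```python
from urllib.parse import urldefrag, urlsplit

uri = "http://somedomain.com/some/path/to/file#fragment"

# urldefrag separates the document address from the fragment identifier.
document, fragment = urldefrag(uri)

# urlsplit exposes the remaining components (scheme, host, path).
parts = urlsplit(uri)

print(document)   # the address of the whole document
print(fragment)   # the part of the document being pointed at
print(parts.scheme, parts.netloc, parts.path)
```

The part before `#` names the document; the fragment names something inside it, such as a single annotated word.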

RDF

Basic element is the triple

  a resource (the subject) is linked to another resource (the object) via an arc labeled by a relation (the predicate)

  <subject> has a property <predicate> valued by <object>

Example

  Nancy Ide --author-of--> Encoding Syntactic Annotation

Examples

Statements

  The English word “car” translates to the French word “voiture”

  The word “car” is a noun

  Nancy Ide is the author of “Encoding Syntactic Annotation”

SUBJECT      PREDICATE      OBJECT
CAR          translates-to  voiture
CAR          is-a           noun
Nancy Ide    author-of      Encoding Syntactic Annotation
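The three statements above can be held as plain subject/predicate/object tuples and queried by pattern. This is a minimal sketch, not an RDF library; `Statement`, `STATEMENTS` and `match` are names invented for this illustration.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


STATEMENTS = [
    Statement("CAR", "translates-to", "voiture"),
    Statement("CAR", "is-a", "noun"),
    Statement("Nancy Ide", "author-of", "Encoding Syntactic Annotation"),
]


def match(statements, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the statements that agree with every part that is not None."""
    return [
        s for s in statements
        if (subject is None or s.subject == subject)
        and (predicate is None or s.predicate == predicate)
        and (obj is None or s.object == obj)
    ]


# Everything that is asserted about CAR:
for s in match(STATEMENTS, subject="CAR"):
    print(s.predicate, s.object)
```

Leaving a position as `None` acts as a wildcard, so the same function answers "what do we know about CAR?" and "who is the author of what?".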

RDF Triples

The subject of one statement can be the object of another statement

RESULT: a labeled directed graph

  Vassar College --employee--> Nancy Ide --author-of--> Encoding Syntactic Annotation
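Chaining statements into a labeled directed graph can be sketched as an adjacency list keyed by subject. The function names are hypothetical, and the traversal assumes the graph has no cycles (true for this two-arc example, not for RDF graphs in general).

```python
from collections import defaultdict

TRIPLES = [
    ("Vassar College", "employee", "Nancy Ide"),
    ("Nancy Ide", "author-of", "Encoding Syntactic Annotation"),
]


def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) arcs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def paths_from(graph, node, prefix=()):
    """Yield (arc labels, target) for every node reachable from node.

    Assumes an acyclic graph; a cycle would recurse without end.
    """
    for predicate, target in graph.get(node, []):
        path = prefix + (predicate,)
        yield path, target
        yield from paths_from(graph, target, path)


reachable = list(paths_from(build_graph(TRIPLES), "Vassar College"))
for labels, target in reachable:
    print(" / ".join(labels), "->", target)
```

Because "Nancy Ide" is the object of one statement and the subject of another, the paper becomes reachable from the college in two steps.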

RDF Syntax

One syntax for expressing RDF statements is XML

Tags and attributes have a specific meaning

  Description element describes a resource

  every attribute or nested element inside a Description is a property of that resource

<Description about="http://www.cs.vassar.edu/~ide">
  <author-of>Encoding Syntactic Annotation</author-of>
</Description>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>

Does this solve the structure and vocabulary problems?

RDF/XML Syntax is Just a Syntax

Different ways to express the same model

<Description about="http://www.cs.vassar.edu/~ide">
  <author-of>Encoding Syntactic Annotation</author-of>
</Description>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>

<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide">
    <author-of>Encoding Syntactic Annotation</author-of>
  </employee>
</Description>

<Description about="http://www.cs.vassar.edu/~ide" author-of="Encoding Syntactic Annotation"/>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>
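That the syntax is "just a syntax" can be checked by reading two of the serializations into triples and comparing the results. The reader below is a deliberately simplified sketch, not a conforming RDF/XML parser: it handles only un-namespaced `Description` elements with an `about` attribute, and does not cover the nested form. The names `triples`, `ELEMENT_FORM` and `ATTRIBUTE_FORM` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Property written as a nested element.
ELEMENT_FORM = """
<Description about="http://www.cs.vassar.edu/~ide">
  <author-of>Encoding Syntactic Annotation</author-of>
</Description>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>
"""

# Same model, with author-of written as an attribute.
ATTRIBUTE_FORM = """
<Description about="http://www.cs.vassar.edu/~ide"
             author-of="Encoding Syntactic Annotation"/>
<Description about="http://www.vassar.edu">
  <employee resource="http://www.cs.vassar.edu/~ide"/>
</Description>
"""


def triples(fragment):
    """Read flat Description elements into a set of (subject, predicate, object)."""
    root = ET.fromstring(f"<root>{fragment}</root>")  # wrap: a fragment has no single root
    found = set()
    for desc in root.iter("Description"):
        subject = desc.get("about")
        # Attributes other than 'about' are properties with literal values.
        for name, value in desc.attrib.items():
            if name != "about":
                found.add((subject, name, value))
        # Nested elements are properties; the value is a resource or the text.
        for prop in desc:
            value = prop.get("resource") or (prop.text or "").strip()
            found.add((subject, prop.tag, value))
    return found


same_model = triples(ELEMENT_FORM) == triples(ATTRIBUTE_FORM)
print(same_model)
```

Comparing sets of triples rather than XML strings is the point: two documents that differ character by character can still assert exactly the same statements.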

Namespaces

Use namespaces to indicate where the defining RDF schema exists

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:vassar="http://www.vassar.edu/schema.rdf"
         xmlns:biblio="http://www.libraries.org/schema.rdf">
  <rdf:Description rdf:about="http://www.cs.vassar.edu/~ide">
    <biblio:author-of>Encoding Syntactic Annotation</biblio:author-of>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.vassar.edu">
    <vassar:employee rdf:resource="http://www.cs.vassar.edu/~ide"/>
  </rdf:Description>
</rdf:RDF>
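What a namespace prefix does can be seen by parsing a prefixed document: the parser replaces each prefix with the full namespace URI, so two vocabularies never clash even if both define `author-of`. In this sketch the `biblio` namespace URI `http://example.org/biblio#` is a placeholder chosen for illustration, not a real schema location.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The biblio namespace URI below is a hypothetical placeholder.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:biblio="http://example.org/biblio#">
  <rdf:Description rdf:about="http://www.cs.vassar.edu/~ide">
    <biblio:author-of>Encoding Syntactic Annotation</biblio:author-of>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(DOC)
description = root[0]   # the single rdf:Description element
prop = description[0]   # its single property element

# ElementTree stores qualified names as "{namespace-uri}local-name".
subject = description.get(f"{{{RDF_NS}}}about")
predicate = prop.tag

print(subject)
print(predicate)
print(prop.text)
```

The prefix itself (`biblio`) is arbitrary; only the URI it stands for identifies the vocabulary.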

What is RDF Used For?

Make explicit statements about web resources

The computer knows that these are statements, knows how the statements relate, can compare values

But... we still lack a way to define a vocabulary

  Should we use author or creator?

  Is Nancy Ide an author?

  Are there other authors?

  What properties can authors have?

RDF and RDFS

RDF is a data model that allows you to assert relations between two objects

RDFS (RDF Schema) is a means to define classes and sub-classes of objects and the relations that may hold between these objects

RDF Schema

RDF provides a data model for metadata annotation and a way to express it in XML, but it cannot define the vocabulary for a domain

RDF Schema allows you to define vocabulary terms and the relations between these terms

Adds semantics to RDF predicates and resources

  defines how a term should be interpreted by specifying its properties and the kinds of objects that can be the values of these properties

Some RDF Schema Terminology

RDF Schema core primitives

  Class, Property

  type, subClassOf, domain, range

Vocabulary definition with these primitives:

  <Person, type, Class>

  <Author, subClassOf, Person>

  <Employee, domain, Person>

These are just RDF statements, but in RDF Schema they have special meaning

The semantics of RDF Schema

The semantics of RDF Schema are expressed in natural language:

2.3.2 rdfs:subClassOf

“This property specifies a subset/superset relation between classes. The rdfs:subClassOf property is transitive. If class A is a subclass of some broader class B, and B is a subclass of C, then A is also implicitly a subclass of C. Consequently, resources that are instances of class A will also be instances of class C, since A is a subset of both B and C. Only instances of rdfs:Class can have the rdfs:subClassOf property and the property value is always of rdf:type rdfs:Class. A class may be a subclass of more than one class.”

RDF Model Theory

The set-theoretical semantics for RDF and RDFS specify entailment rules, for example:

[rdfs7b] (reflexivity)

  (xxx, rdf:type, rdfs:Class) => (xxx, rdfs:subClassOf, xxx)

[rdfs8] (transitivity)

  (xxx, rdfs:subClassOf, yyy) & (yyy, rdfs:subClassOf, zzz) => (xxx, rdfs:subClassOf, zzz)
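The two entailment rules can be applied mechanically by adding derived triples until nothing new appears. The following is a naive sketch of that idea covering only rdfs7b and rdfs8, with hypothetical class names borrowed from the part-of-speech examples; it is not a complete RDFS reasoner.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"
CLASS = "rdfs:Class"


def rdfs_closure(triples):
    """Apply rules rdfs7b and rdfs8 repeatedly until a fixed point is reached."""
    closure = set(triples)
    changed = True
    while changed:
        derived = set()
        for (s, p, o) in closure:
            # rdfs7b: every class is a subclass of itself.
            if p == TYPE and o == CLASS:
                derived.add((s, SUBCLASS, s))
            # rdfs8: subClassOf is transitive.
            if p == SUBCLASS:
                for (s2, p2, o2) in closure:
                    if p2 == SUBCLASS and s2 == o:
                        derived.add((s, SUBCLASS, o2))
        changed = not derived <= closure  # stop once no rule adds anything new
        closure |= derived
    return closure


# Hypothetical input graph; the class names are illustrative.
GRAPH = {
    ("Noun", TYPE, CLASS),
    ("CommonNoun", SUBCLASS, "Noun"),
    ("Noun", SUBCLASS, "PartOfSpeech"),
}

inferred = rdfs_closure(GRAPH) - GRAPH
for triple in sorted(inferred):
    print(triple)
```

The input never says that CommonNoun is a subclass of PartOfSpeech; that statement exists only because the rules entail it.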

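Example RDF Schema

[Figure: class hierarchy with instance data]

Ontology Level: Noun and Verb are each a sub-class of Part-of-Speech; Common Noun is a sub-class of Noun; Motion Verb is a sub-class of Verb; the property Subject-of has a domain and a range among these classes

Data Level: Dogs Subject-of run; Dogs and run are each linked by type to a class at the ontology level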

[Figure: the same hierarchy with a further level added]

Ontology Level: Part-of-Speech, Noun, Verb, Common Noun and Motion Verb linked by sub-class of; Subject-of with its domain and range

Language Level: Property and Class are each a sub-class of Resource

Observations

Classes and properties are modeled separately!

Different from typical object-oriented modeling, where properties (attributes) are part of a class

Because of this, domain/range statements are very restrictive

Remember: RDF Schema is just RDF, but with some added meaning to particular terms

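Domain Restrictions

[Figure: Part-of-Speech with sub-classes Noun and Verb, and below them Common Noun and Motion Verb; a domain arc attaches the property Gender to a class; the instances “chat” and “bouge” each carry the Gender value “M”]

“M” is a literal value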

Problem solved...

[Figure: the domain of Gender now points at Part-of-Speech, the superclass of Noun and Verb]

Moving the domain restriction up the hierarchy solves the problem

But risks over-generalization

  properties get “loose” restrictions

  classes may be allowed properties they should not have

  e.g. now any part of speech has the GENDER property
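The trade-off can be made concrete with a small check. Two assumptions are made here: the original domain of Gender is taken to be Noun (the figure does not survive, so this is a guess for illustration), and domain is treated as a constraint to validate, as the slides do. Under RDFS entailment proper, a domain statement would instead let a reasoner infer that the subject belongs to the domain class. All names in the code are hypothetical.

```python
# Child class -> parent class, following the part-of-speech hierarchy.
SUBCLASS_OF = {
    "CommonNoun": "Noun",
    "MotionVerb": "Verb",
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it by following SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def may_use(instance_class, property_domain):
    """A property may be used on an instance whose class falls within the domain."""
    return is_subclass(instance_class, property_domain)


# Narrow domain (assumed: Noun): the verb "bouge" may not carry Gender.
narrow = {word: may_use(cls, "Noun")
          for word, cls in [("chat", "CommonNoun"), ("bouge", "MotionVerb")]}

# Domain moved up to PartOfSpeech: both pass, and so would any other class.
broad = {word: may_use(cls, "PartOfSpeech")
         for word, cls in [("chat", "CommonNoun"), ("bouge", "MotionVerb")]}

print(narrow)
print(broad)
```

With the broad domain nothing is rejected any more, which is exactly the over-generalization the slide warns about.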

RDF Schema Syntax

Class Definitions

<rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#PartOfSpeech">
  <rdfs:label>POS</rdfs:label>
  <rdfs:comment>Class for the general category part of speech</rdfs:comment>
</rdfs:Class>

<rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#Noun">
  <rdfs:label>Noun</rdfs:label>
  <rdfs:comment>Class for nouns</rdfs:comment>
  <rdfs:subClassOf rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
</rdfs:Class>

Property Definition

<rdf:Property rdf:about="http://www.linguistics.org/schema.rdf#number">
  <rdfs:domain rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

Putting It All Together

The schema file: http://www.linguistics.org/schema.rdf

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#PartOfSpeech">
    <rdfs:label>POS</rdfs:label>
    <rdfs:comment>Class for the general category part of speech</rdfs:comment>
  </rdfs:Class>

  <rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#Noun">
    <rdfs:label>Noun</rdfs:label>
    <rdfs:comment>Class for nouns</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
  </rdfs:Class>

  <rdf:Property rdf:about="http://www.linguistics.org/schema.rdf#number">
    <rdfs:domain rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

</rdf:RDF>

Using the Schema

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pos="http://www.linguistics.org/schema.rdf#">
  <pos:Noun rdf:ID="dogs">
    <pos:number rdf:value="Plural"/>
  </pos:Noun>
  <pos:Verb rdf:ID="run">
    <pos:number rdf:value="Plural"/>
  </pos:Verb>
</rdf:RDF>

Defining a Default Namespace

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.linguistics.org/schema.rdf#">
  <Noun rdf:ID="dogs">
    <number rdf:value="Plural"/>
  </Noun>
  <Verb rdf:ID="run">
    <number rdf:value="Plural"/>
  </Verb>
</rdf:RDF>

Referring To Another Resource

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.linguistics.org/schema.rdf#">
  <Noun rdf:about="Mydoc#W1">
    <number rdf:value="Plural"/>
  </Noun>
  <Verb rdf:about="Mydoc#W2">
    <number rdf:value="Plural"/>
  </Verb>
</rdf:RDF>

Creating Pre-defined Linguistic Objects

One possible use of RDF is to pre-define “linguistic objects” that can be used by other resources such as lexicons, taggers, etc.

An RDF schema defines a class and its properties, but does not instantiate objects of that class

  in previous examples, “dogs” and “run” were instantiated as objects of the classes Noun and Verb

A “Data Category” Definition

File: http://www.linguistics.org/categories.rdf

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.linguistics.org/schema.rdf#">
  <Noun rdf:ID="NMP">
    <gender rdf:value="masculine"/>
    <number rdf:value="plural"/>
  </Noun>
  <Verb rdf:ID="V3pl">
    <number rdf:value="plural"/>
    <person rdf:value="3rd"/>
  </Verb>
</rdf:RDF>

Using the Definition

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ling="http://www.linguistics.org/schema.rdf#">
  <ling:word rdf:value="dog">
    <ling:POS rdf:resource="http://www.linguistics.org/categories.rdf#NMS"/>
  </ling:word>
  <ling:word rdf:about="http://www.mySite.edu/myDoc#W1">
    <ling:POS rdf:resource="http://www.linguistics.org/categories.rdf#NMS"/>
  </ling:word>
</rdf:RDF>

Additions to the linguistics schema.rdf

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://www.linguistics.org/schema.rdf#word">
    <rdfs:label>Word</rdfs:label>
    <rdfs:comment>Class for a word</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:ID="POS">
    <rdfs:domain rdf:resource="http://www.linguistics.org/schema.rdf#word"/>
    <rdfs:range rdf:resource="http://www.linguistics.org/schema.rdf#PartOfSpeech"/>
  </rdf:Property>
</rdf:RDF>

Beyond RDF and RDFS

RDF and RDFS give us the capability to provide some semantics for resources and the relations between them

But there is a lot missing

  boolean operators, cardinality constraints, disjunction, etc.

These are in the next level: OWL

The Semantic Web and Language Technology

The previous examples suggest how the Semantic Web can benefit language technology

Resources

  Pre-defined linguistic objects can be used in lexicons, term banks, annotations, etc.

  A step toward a commonly agreed-upon set of categories

Language Processing applications

  Can exploit linguistic knowledge “attached” to data to enhance capability

Resources and Tools

W3C RDF Model and Syntax Specification: http://www.w3.org/TR/REC-rdf-syntax/

W3C RDF Schema Specification 1.0: http://www.w3.org/TR/2000/CR-rdf-schema-20000327/

W3C RDF Validation Service: http://www.w3.org/RDF/Validator/

W3C RDF: http://www.w3.org/RDF/

List of RDF resources: http://www.ilrt.bris.ac.uk/discovery/rdf/resources/

SiRPAC - Simple RDF Parser & Compiler (Java): http://www.w3.org/RDF/Implementations/SiRPAC/

Libwww - RDF Parser (C): http://www.w3.org/Library/
