1cs236607. the need “most of the web's content today is designed for humans to read, not for...
TRANSCRIPT
The Need
“Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001
2cs236607
Semantic ProcessingWe want to be able to pose complex search
tasks that use the semantics of pieces of information, e.g.,
I want to purchase a DVDof “Dore the Explorer” at a price lower than 10$.Is such a CD available atamazon.com?
3cs236607
Current Solution
Use “intelligent” agents
The Semantic-Web Approach
Content is machine-understandable by being bound to some formal description of itself (i.e. metadata)
5cs236607
GoalsWeb of data - provides common data
representation framework to facilitate integrating multiple sources to draw new conclusions
Increase the utility of information by connecting it to its definitions and to its context
More efficient information access and analysis
cs236607 6
ApplicationsAgents that search the Web and retrieve
valuable information to the end userWeb services that publish their informationPrograms that try to integrate data of
different web services and to produce new results or draw new conclusions from the integrated data
cs236607 7
Ontologies & Inference Engines
“For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001
cs236607 8
XML
“XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean”
cs236607 10
RDF –Resource Description FrameworkMeaning encoded in sets of ‘triples’: entities
have properties which have valuesEntities, properties and values all have
distinct URIs
cs236607 11
“imagine that we have access to a variety of databases with information about people, including their addresses. If we want to find people living in a specific zip code, we need to know which fields in each database represent names and which represent zip codes. RDF can specify that "(field 5 in database A) (is a field of type) (zip code)," using URIs rather than phrases for each term.” Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001a
OntologiesDatabase A and Database B may use
different fields to contain ‘zip code’Ontologies sort this outOntology = ‘a document or file that
formally defines the relations among terms’
Ontologies for the web normally haveA taxonomyA set of inference rules
cs236607 12
Agents
“Agent based computing appears to be the appropriate paradigm to work in a complex world with multiple ontologies, fragments and multiple inferencing engines.”
Stork, Hans-Georg and Mastroddi, Franco, Semantic Web Technologies - a New Action Line in the European Commission’s IST Programme, 2001
cs236607 13
The Power of Agents - Integration“The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001
cs236607 14
‘Ambient Intelligence’
“In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs.”Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001
cs236607 15
17
What is RDF?A part of the semantic-Web activityRDF is a general-purpose language for
representing information on the WebSpecifically, objects and relationships
Designed to allow computer applications to process data based on its semanticsRather than displaying data to humans (as
opposed to RSS)
An RDF document is actually a labeled graph that is represented in XMLThe specific language is called RDF/XMLW3C recommendation (Feb. 2004)
RDF Data Consists of TripletsRDF data is a set of statementsEach statement is a triplet (Resource,
Property, Value)Sometimes we refer to a triplet using the
terminology of (Subject, Predicate, Object)
cs236607 18
The author of http://www.cs.technion.ac.il/kanza/myPage
.html is Yaron Kanza
Resource (Subject): myPage.htmlProperty (Predicate): author Value (Object): Yaron Kanza
19
RDF Data
Subject Objectpredicate
The basic element: Triple (labeled edge)
Person#845
#1002
address
postalCode
6941Haifa
city
Herzel
street
RDF document: edge-labeled graph
Statement
20
page.html
John Smith
John’s Home Page
DC:Creator
DC:Title
<?xml version=“1.0”?><rdf:RDF
xmlns:rdf=“http://www.w3.org/TR/WD-rdf-syntax#”xmlns:dc=“http://purl.org/metadata/dublin_core#”>
<rdf:Description about=“page.html”> <dc:Creator>John Smith</dc:Creator> <dc:Title>John’s Home Page</dc:Title> </rdf:Description>
</rdf:RDF>
21
page.html
John SmithJohn’s Home Page [email protected]
dc:Title
dc:Creator
Name Email
. . .<Description about=“page.html”> <dc:Creator> <Description> <corp:Name>John Smith</corp:Name> <corp:Email>[email protected]</corp:Email> </Description> </dc:Creator> <dc:Title>John’s Home Page</dc:Title></Description>
</RDF>
ContainersGroups of things: <bag> <seq>
<alt><bag>: unordered list; duplicates
allowed<seq>: ordered list; duplicates
allowed<alt>: list of alternatives; one will
be selectedcs236607 22
23
A set of fifteen basic properties for describing generalized Web resources
The “obvious” mapping of Dublin Core properties into RDF properties has not yet been approved by the Dublin Core initiative, but is generally a good example
Dublin Core
24
“Title”: the name given to the resource“Creator”: the person or organization
primarily responsible for the resource“Subject”: what the resource is about“Description”: a description of the content“Publisher”: the person or organization
responsible for making the resource available“Contributor”: someone who has provided
content to the resource other than the creator“Date”: date of creation or publication
Dublin Core
25
“Type”: type of resource, such as home page, technical report, novel, photograph…
“Format”: data format of the resource“Identifier”: URL, ISBN number, …“Source”: another resource that this resource is derived
from“Language”: the language of the content“Relation”: another resource and its relationship to this
one“Coverage”: the portion of time or space described by
this resource (atlases, histories, etc.)“Rights”: the intellectual property rights adhering to
this resource, or a pointer to them
Dublin Core
26
Containers: bags, sequences, alternativesaboutEach, aboutEachPrefixReification (higher order statements)Namespaces and Vocabularies
Advanced RDF
27
Manually from HTML or “user domain XML”With special assisting tools – like Protégé,
Reggie, DC-dot, RDF for XMLIdeally – with some automated procedure
from HTML/XML documentsCan we use XSLT there?
Creating RDF documents
RDF SchemaRDF Schema (RDFS) enriches the data model
of RDF, adding vocabulary and associated semantics forClasses and subclassesProperties and sub-propertiesTyping of properties
Support for describing simple ontologiesAdds an object-oriented flavorBut with a logic-oriented approach and using
“open world” semanticscs236607 29
30
Not an XML Schema!A “companion” specification for RDF specClass, Type, subClassOf,domain, rangeMisc: label, comment, isDefinedBy,etc.
RDF Schema
cs236607 31
<rdf:RDFxmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"xml:base= "http://www.animals.fake/animals#">
<rdf:Description rdf:ID="animal"> <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/></rdf:Description>
<rdf:Description rdf:ID="horse"> <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/> <rdfs:subClassOf rdf:resource="#animal"/></rdf:Description>
</rdf:RDF>Horse is defined as subclass of animal
Example
Example<?xml version="1.0"?><rdf:RDF
xmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:base= "http://www.animals.fake/animals#"><rdfs:Class rdf:ID="animal" /><rdfs:Class rdf:ID="horse">
<rdfs:subClassOf rdf:resource="#animal"/> </rdfs:Class>
</rdf:RDF>
cs236607 32
Abbreviated version. Works because an RDFS class is an RDF resource.
Use rdfs:Class instead of rdfDescription and drop the rdf:type information
RDF and RDF Schema
<rdf:RDF xmlns:g=“http://schema.org/gen” xmlns:u=“http://schema.org/univ”> <u:Chair rdf:ID=“john”> <g:name>John Smith</g:name> </u:Chair></rdf:RDF>
<rdfs:Property rdf:ID=“name”> <rdfs:domain rdf:resource=“Person”></rdfs:Property>
<rdfs:Class rdf:ID=“Chair”> <rdfs:subclassOf rdf:resource= “http://schema.org/gen#Person”></rdfs:Class>
u:Chair
John Smith
rdf:typeg:name
g:Person
g:name
rdfs:Class rdfs:Property
rdf:typerdf:type
rdf:type
rdfs:subclassOf
rdfs:domain
RDF Schema is LimitedWe cannot express facts such as
Two classes are disjointBuild a class that is the union of two classesCardinality restrictionScope of propertiesProvide relationships between properties, such
as transitive, unique, inverse
cs236607 35
OWLA Web ontology language that is more
expressive than RDF and RDF SchemaWritten in XML on top of RDFUsing OWL we want to provide exact
descriptions of items and the relationships between them
Basically, built upon Description Logics
cs236607 36
SPARQLSPARQL =
Query Language + Protocol + XML Results Format
• Access and query RDF graphs
• Product of the RDF Data Access Working Group
• We will only provide some examples and will not go over the entire definition of the language
cs236607 38
39
SPARQL QueryPREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title2WHERE{ ?doc dc:title "SPARQL at speed" . ?doc dc:creator ?c . ?docOther dc:creator ?c . ?docOther dc:title ?title2
}
• On an abstracts/papers database:“Find other papers by the authors of a given paper.”
40
SPARQL QueryPREFIX dc: <http://purl.org/dc/elements/1.1/>PREFIX foaf: <http://xmlns.com/foaf/0.1/>PREFIX shop: <http://example/shop#>
SELECT ?titleWHERE{ ?doc dc:title ?title . FILTER regex(?title, "SPARQL") . ?doc dc:creator ?c . ?c foaf:name ?name . OPTIONAL { ?doc shop:price ?price }}
• “Find books with ‘SPARQL’ in the title. Get the authors’ name and the price (if available).”
• Multiple vocabularies
41
Inference• An RDF graph may be backed by inference
− OWL, RDFS, application, rules
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>SELECT ?typeWHERE{ ?x rdf:type ?type .}
:x rdf:type :C .:C rdfs:subClassOf :D .
--------| type |========| :C || :D |--------
42
Another ExamplePREFIX dc: <http://purl.org/dc/elements/1.1/>PREFIX ldap: <http://ldap.hp.com/people#>PREFIX foaf:
SELECT ?name ?email{ ?doc dc:title ?title . FILTER regex(?title, “SPARQL”) . ?doc dc:creator ?reseacher . ?researcher ldap:email ?email . ?researcher ldap:name ?name}
• “Find the name and email addresses of authors of a paper about SPARQL”
Linkshttp://www.w3.org/RDF/http://www.w3.org/TR/2004/REC-rdf-primer-
20040210/http://www.w3.org/TR/rdf-schema/http://www.w3.org/TR/2004/REC-owl-ref-
20040210/
cs236607 43