
Upload: geraldine-fletcher

Post on 13-Dec-2015


"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize."
Tim Berners-Lee, 1999
Image by Paul Clarke, Wikimedia Commons, CC-BY

"It is not easy to build a robot, and only very clever boys should try it."
Carol Ryrie Brink (1966), Andy Buckram's Tin Men.

How cool would it be to make the intelligent agents come into existence?!

The Baskauf Rule for Technology

Adopting new technology requires that it do something better than the old technology

W3C Semantic Web Activity http://www.w3.org/2001/sw/ logos used according to usage guidelines

W3C Resource Description Framework http://www.w3.org/RDF/

What do RDF and SPARQL do better than traditional databases and SQL?

If the answer is "nothing", then we shouldn't waste our time using it!

RDF is an abstract, graph-based model
• Triples are represented in text as serializations.
• Several serializations are W3C Recommendations:
    • XML (media type: application/rdf+xml)
    • Turtle (media type: text/turtle)
    • also RDFa and JSON-LD (but we won't talk about them today)
• RDF/XML plays well with XML tools like XSLT and XQuery, but isn't very readable.
• RDF/Turtle is easier for humans to read.
• SPARQL is based on Turtle syntax.

W3C RDF/XML Validation/Visualization Service: http://www.w3.org/RDF/Validator/
Load the RDF/XML file from https://gist.github.com/baskaufs/609978f931b96c610f86

IRIs=ovals, literals=rectangles, predicates=arrows

Graph model of data in van-gogh.rdf

Serializations of the data (XML and Turtle shown side by side)
Figure annotations: namespace abbreviations, abbreviated IRIs, type, blank (anonymous) node

Database table

    painting            painter              year
    The Starry Night    Vincent van Gogh     1889
    Birth of Venus      Sandro Botticelli    1485

XML

    <table>
      <record>
        <painting>The Starry Night</painting>
        <painter>Vincent van Gogh</painter>
        <year>1889</year>
      </record>
      <record>
        <painting>Birth of Venus</painting>
        <painter>Sandro Botticelli</painter>
        <year>1485</year>
      </record>
    </table>

RDF (Turtle serialization)

    dbres:The_Starry_Night dcterms:creator viaf:9854560;
        dcterms:created "1889"^^xsd:gYear.
    <http://dbpedia.org/resource/The_Birth_of_Venus_(Botticelli)> dcterms:creator viaf:19686406;
        dcterms:created "1485"^^xsd:gYear.
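To make the jump from table rows to triples concrete, here is a minimal Python sketch that turns the two rows above into N-Triples lines (a line-per-triple cousin of Turtle). The IRIs are the ones used in the Turtle example; the two lookup dictionaries and the function name `row_to_ntriples` are illustrative assumptions, not part of the original data.

```python
# Hypothetical lookup tables: the IRIs come from the Turtle example above,
# but the dictionaries themselves are only an illustration.
PAINTING_IRI = {
    "The Starry Night": "http://dbpedia.org/resource/The_Starry_Night",
    "Birth of Venus": "http://dbpedia.org/resource/The_Birth_of_Venus_(Botticelli)",
}
PAINTER_IRI = {
    "Vincent van Gogh": "http://viaf.org/viaf/9854560",
    "Sandro Botticelli": "http://viaf.org/viaf/19686406",
}
DCTERMS = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"

rows = [
    ("The Starry Night", "Vincent van Gogh", "1889"),
    ("Birth of Venus", "Sandro Botticelli", "1485"),
]


def row_to_ntriples(painting, painter, year):
    """Return the two N-Triples lines that describe one table row."""
    subject = f"<{PAINTING_IRI[painting]}>"
    return [
        f"{subject} <{DCTERMS}creator> <{PAINTER_IRI[painter]}> .",
        f'{subject} <{DCTERMS}created> "{year}"^^<{XSD}gYear> .',
    ]


lines = [line for row in rows for line in row_to_ntriples(*row)]
print("\n".join(lines))
```

Each row becomes two triples that share a subject, which is what the semicolon abbreviates in the Turtle version.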

IRIs denote resources. The resource that is denoted is the referent.

RDF "means" something.

dbres:The_Starry_Night dcterms:creator viaf:9854560.

• dbres:The_Starry_Night denotes the actual painting entitled "The Starry Night"
• viaf:9854560 denotes the actual person whose name was "Vincent van Gogh"
• dcterms:creator denotes the relationship of a subject resource having a maker who is the object agent.

• information resource (web page; deliverable via Internet)
• non-information resource (a painting; not deliverable via Internet)
• simple literal (denotes a string of characters with NO meaning)
• IRI (denotes the person, Vincent van Gogh)

Datatyped literals "mean" something

dbres:The_Starry_Night dcterms:created "1889"^^xsd:gYear.

• dbres:The_Starry_Night denotes the actual painting entitled "The Starry Night"
• "1889"^^xsd:gYear denotes the actual year of 1889 CE
• dcterms:created denotes the relationship of a subject resource being made in the object time period.

The triple

dbres:The_Starry_Night dcterms:created "1889".

does not actually mean anything that makes sense.
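The difference between the two triples can be sketched in Python. This is a toy model, not an RDF library: the `Literal` type and the `year_value` helper are hypothetical names, and the lexical check is deliberately naive. (RDF 1.1 treats an untyped literal as an xsd:string, which leads to the same conclusion: it is a string, not a year.)

```python
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    lexical: str                    # the characters between the quotes
    datatype: Optional[str] = None  # datatype IRI, or None for a plain literal


def year_value(lit):
    """Return the year as an int only if the literal is typed as xsd:gYear."""
    if lit.datatype == XSD + "gYear":
        return int(lit.lexical)
    return None  # a plain string: no year can be read off it


typed = Literal("1889", XSD + "gYear")
plain = Literal("1889")
print(year_value(typed), year_value(plain))
```

The two literals share the same characters but are different RDF terms, so a query for one will not match the other.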

What does RDF do better?

RDF "means" something.

• Great if you care about imparting meaning.
• Really annoying if you don't care about the complications and just want to do string searching.

What is the Semantic Web?

"The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources … It is also about language for recording how the data relates to real world objects."

Let's play with the van Gogh and The Starry Night graph!
It's loaded in the Heard Library triplestore as the graph: http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf

Important note: the graph does NOT live in the triplestore as any particular serialization! It's just a pot full of triples.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX viaf: <http://viaf.org/viaf/>
PREFIX orcid: <http://orcid.org/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dbo: <http://dbpedia.org/ontology/>

These are all of the namespace prefixes we will be using in the rest of the examples (see Gist).
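A prefix declaration is only an abbreviation mechanism: the abbreviated IRI is the namespace IRI with the local name appended. A minimal Python sketch of that expansion, using a few of the prefixes listed above (the `expand` helper is a hypothetical name):

```python
# A subset of the namespace prefixes declared above.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
    "dbres": "http://dbpedia.org/resource/",
    "viaf": "http://viaf.org/viaf/",
}


def expand(name):
    """Expand an abbreviated IRI such as 'dbres:The_Starry_Night' to a full IRI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


print(expand("dbres:The_Starry_Night"))
print(expand("viaf:9854560"))
```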

This is the skeleton SPARQL query that we will use (see Gist).

SELECT DISTINCT ?label
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
WHERE {
  dbres:The_Starry_Night rdfs:label ?label.
}

Replace stuff in orange text with your experimentation.
The DISTINCT keyword prevents repetition if the same result is found multiple times.
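What the skeleton query asks the triplestore to do can be imitated in a few lines of Python: treat the graph as a list of (subject, predicate, object) tuples and let `None` play the role of a ?variable. This is only a sketch of the matching idea, not how a real SPARQL engine works; the `select_distinct` helper and the label value are assumptions made for illustration.

```python
# A tiny in-memory "pot full of triples" (abbreviated IRIs kept as strings).
TRIPLES = [
    ("dbres:The_Starry_Night", "rdfs:label", "The Starry Night"),  # assumed label
    ("dbres:The_Starry_Night", "dcterms:creator", "viaf:9854560"),
    ("dbres:The_Starry_Night", "dcterms:created", "1889"),
    ("dbres:The_Starry_Night", "rdfs:label", "The Starry Night"),  # deliberate repeat
]


def select_distinct(triples, s=None, p=None, o=None):
    """Return matching triples without repeats; None acts like a ?variable."""
    seen, out = set(), []
    for triple in triples:
        matches = all(want is None or want == got
                      for want, got in zip((s, p, o), triple))
        if matches and triple not in seen:
            seen.add(triple)
            out.append(triple)
    return out


labels = [o for _, _, o in
          select_distinct(TRIPLES, s="dbres:The_Starry_Night", p="rdfs:label")]
print(labels)
```

Without the `seen` check the repeated triple would produce the label twice, which is the repetition DISTINCT suppresses.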

What kinds of classes of things are present in this graph? (rdf:type or "a")

SELECT DISTINCT ?resource ?class
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
WHERE {
  ?resource a ?class.
}

Notes:
• The foaf:Document is represented by a blank node.
• There is no limit to the number of classes a resource can be an instance of.

Human-friendly labels for referents.

SELECT DISTINCT ?label
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
WHERE {
  viaf:9854560 rdfs:label ?label.
}

Replace stuff in orange text with your experimentation. Try schema:name, schema:familyName, and schema:givenName.

rdfs:label is the most generic (built-in property) but more specific properties give more precise information.

Schema.org is run by Google, Microsoft, Yahoo, with contributions by Dan Brickley (of FOAF fame).

Find human-friendly labels for The Starry Night.

SELECT DISTINCT ?label
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
WHERE {
  dbres:The_Starry_Night rdfs:label ?label.
}

Replace stuff in orange text with the more specific Dublin Core term dcterms:title.

Dublin Core is the most commonly used vocabulary for metadata.

What is the Semantic Web?

"The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources … It is also about language for recording how the data relates to real world objects."

Linked Data

http://www.w3.org/DesignIssues/LinkedData.html

Tim Berners-Lee expressed the "Linked Data Principles" in 2006:

1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.

"Linked Data" is a similar idea to "the Semantic Web" but focused on HTTP URIs as identifiers and more on data discovery than reasoning.

Dereference the HTTP URI and ask for RDF

dbres:The_Starry_Night

is an abbreviation for

http://dbpedia.org/resource/The_Starry_Night
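"Asking for RDF" is done with HTTP content negotiation: the client sends an Accept header naming an RDF media type. The sketch below only builds such a request with Python's standard library so that it runs offline; whether a given server honours the header and returns Turtle is an assumption that has to be checked against the server.

```python
from urllib.request import Request

iri = "http://dbpedia.org/resource/The_Starry_Night"

# Ask for the Turtle serialization instead of an HTML page.
request = Request(iri, headers={"Accept": "text/turtle"})

# Sending is left commented out so the sketch has no network dependency:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     turtle = response.read().decode("utf-8")

print(request.get_method(), request.full_url, request.get_header("Accept"))
```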

Can we get more information by merging data from dbpedia?

Look for more labels:

SELECT DISTINCT ?label
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
FROM <http://rdf.library.vanderbilt.edu/learn/dbpedia-example-data.rdf>
WHERE {
  dbres:The_Starry_Night rdfs:label ?label.
}

Look for more properties (try first without dbpedia data):

SELECT DISTINCT ?property
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
FROM <http://rdf.library.vanderbilt.edu/learn/dbpedia-example-data.rdf>
WHERE {
  dbres:The_Starry_Night ?property ?value.
}

Yay! We have "learned" more about The Starry Night by adding triples to our graph via Linked Data!
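Because a graph is just a set of triples, merging two graphs amounts to a set union, and triples about the same IRI line up automatically. A minimal Python sketch follows; every triple in it is a hypothetical placeholder standing in for the contents of the two graphs, not the actual data.

```python
# Placeholder triples standing in for van-gogh.rdf (assumed content).
local_graph = {
    ("dbres:The_Starry_Night", "dcterms:creator", "viaf:9854560"),
    ("dbres:The_Starry_Night", "rdfs:label", "The Starry Night"),
}

# Placeholder triples standing in for dbpedia-example-data.rdf (assumed content).
dbpedia_graph = {
    ("dbres:The_Starry_Night", "rdfs:label", "The Starry Night"),
    ("dbres:The_Starry_Night", "rdfs:label", "De sterrennacht"),
    ("dbres:The_Starry_Night", "dbo:museum", "dbres:Museum_of_Modern_Art"),
}


def properties_of(graph, subject):
    """Return the sorted, distinct predicates used with the given subject."""
    return sorted({p for s, p, _ in graph if s == subject})


merged = local_graph | dbpedia_graph  # the shared triple is stored only once

before = properties_of(local_graph, "dbres:The_Starry_Night")
after = properties_of(merged, "dbres:The_Starry_Night")
print(before)
print(after)
```

The merge works only because both graphs use the same IRI for the painting, which is the point the later slides make about standard identifiers.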

What does RDF do better?

• Expressiveness is great. If there isn't a property that you need, you can make one up!

• Consistency is terrible. Resources described with ad hoc properties are not likely to be usefully combined with other people's data.

AAA principle: Anyone can say Anything about Anything.

Can we get more information from dbpedia about van Gogh?

Look for more properties (try first without dbpedia data):

SELECT DISTINCT ?name
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
FROM <http://rdf.library.vanderbilt.edu/learn/dbpedia-example-data.rdf>
WHERE {
  viaf:9854560 schema:name ?name.
}

<rdfs:label xml:lang="ar">فينسنت فان غوخ</rdfs:label>
<rdfs:label xml:lang="zh">文森特·梵高</rdfs:label>
<rdfs:label xml:lang="de">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="ru">Ван Гог, Винсент</rdfs:label>
<rdfs:label xml:lang="en">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="es">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="pt">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="fr">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="it">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="ja">フィンセント・ファン・ゴッホ</rdfs:label>
<rdfs:label xml:lang="pl">Vincent van Gogh</rdfs:label>
<rdfs:label xml:lang="nl">Vincent van Gogh</rdfs:label>
<foaf:name xml:lang="en">Vincent van Gogh</foaf:name>
<foaf:name xml:lang="en">Gogh, Vincent van</foaf:name>
<dbp:name xml:lang="en">Vincent van Gogh</dbp:name>
<dbp:name xml:lang="en">Gogh, Vincent van</dbp:name>

Why didn't it work?

Grrrrrrr. They didn't use schema:name like we did!

Try something more complicated

SELECT DISTINCT ?name
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
FROM <http://rdf.library.vanderbilt.edu/learn/dbpedia-example-data.rdf>
WHERE {
  {viaf:9854560 schema:name ?name.}
  UNION
  {viaf:9854560 rdfs:label ?name.}
  UNION
  {viaf:9854560 foaf:name ?name.}
  UNION
  {viaf:9854560 dbp:name ?name.}
}

Grrrrrrr. They didn't use viaf:9854560 as an IRI for van Gogh as we did:

<rdf:Description rdf:about="http://dbpedia.org/resource/Vincent_van_Gogh">

Here's the solution!

<rdf:Description rdf:about="http://dbpedia.org/resource/Vincent_van_Gogh">
  <owl:sameAs rdf:resource="http://it.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://yago-knowledge.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://id.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://ja.dbpedia.org/resource/フィンセント・ファン・ゴッホ" />
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://eu.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://fr.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://es.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://purl.org/collections/nl/am/p-43441" />
  <owl:sameAs rdf:resource="http://de.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://pl.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://nl.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://ko.dbpedia.org/resource/빈센트_반_고흐" />
  <owl:sameAs rdf:resource="http://el.dbpedia.org/resource/Βίνσεντ_βαν_Γκογκ" />
  <owl:sameAs rdf:resource="http://wikidata.org/entity/Q5582" />
  <owl:sameAs rdf:resource="http://cs.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://viaf.org/viaf/9854560" />
  <owl:sameAs rdf:resource="http://pt.dbpedia.org/resource/Vincent_van_Gogh" />
  <owl:sameAs rdf:resource="http://wikidata.dbpedia.org/resource/Q5582" />
  <owl:sameAs rdf:resource="http://sw.cyc.com/concept/Mx4rwORxl5wpEbGdrcN5Y29ycA" />
  <owl:sameAs rdf:resource="http://rdf.freebase.com/ns/m.07_m2" />
</rdf:Description>

Try something more complicated

SELECT DISTINCT ?name
FROM <http://rdf.library.vanderbilt.edu/learn/van-gogh.rdf>
FROM <http://rdf.library.vanderbilt.edu/learn/dbpedia-example-data.rdf>
WHERE {
  {viaf:9854560 schema:name ?name.}
  UNION
  { ?person owl:sameAs viaf:9854560.
    ?person rdfs:label ?name. }
  UNION
  { ?person owl:sameAs viaf:9854560.
    ?person foaf:name ?name. }
  UNION
  { ?person owl:sameAs viaf:9854560.
    ?person dbp:name ?name. }
}
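The idea behind the owl:sameAs query can be sketched in Python: first collect every resource that declares itself owl:sameAs the VIAF IRI, then gather names attached to any of those aliases. This loosely mirrors the UNION query rather than reproducing it exactly, and it follows owl:sameAs in one direction only, as the query does. The schema:name triple is an assumed stand-in for the local graph; the other values appear in the DBpedia excerpt.

```python
GRAPH = {
    ("viaf:9854560", "schema:name", "Vincent van Gogh"),  # assumed local triple
    ("dbres:Vincent_van_Gogh", "owl:sameAs", "viaf:9854560"),
    ("dbres:Vincent_van_Gogh", "rdfs:label", "Vincent van Gogh"),
    ("dbres:Vincent_van_Gogh", "foaf:name", "Gogh, Vincent van"),
}

NAME_PROPERTIES = {"schema:name", "rdfs:label", "foaf:name", "dbp:name"}


def names_for(graph, iri):
    """Collect names for iri and for every resource stated to be owl:sameAs it."""
    aliases = {iri} | {s for s, p, o in graph if p == "owl:sameAs" and o == iri}
    return sorted({o for s, p, o in graph
                   if s in aliases and p in NAME_PROPERTIES})


print(names_for(GRAPH, "viaf:9854560"))
```

Without the owl:sameAs triple, only the single schema:name value would be found.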

References:

Harry Halpin, Patrick J. Hayes, James P. McCusker, Deborah L. McGuinness, and Henry S. Thompson. 2010. When owl:sameAs Isn't the Same: An Analysis of Identity in Linked Data. International Semantic Web Conference (ISWC). http://iswc2010.semanticweb.org/pdf/261.pdf

Also, blog post on "bloating" caused by owl:sameAs: http://baskauf.blogspot.com/2014/05/confessions-of-rdf-agnostic-part-5.html

What does RDF do better?

With RDF, you can discover other people's triples (Linked Data).

• Great if they used standard properties to link and standard IRIs to identify.

• Really annoying if they made up their own properties and IRIs.

So the examples in the book where you make up your own vocabulary don't really leverage the power of Linked Data. You're not much better off than if you used standard database and querying techniques.

One can infer previously unstated facts based on logic (Entailment)

This is a key benefit of having RDF "mean" something rather than just making it be a transfer mechanism or database system.

"The chief utility of a formal semantic theory is not to provide any deep analysis of the nature of the things being described by the language or to suggest any particular processing model, but rather to provide a technical way to determine when inference processes are valid, i.e. when they preserve truth." RDF Semantics http://www.w3.org/TR/rdf-mt/

A semantic client does not “know” what the URIs and literals “mean”

dwc:decimalLatitude <http://rs.tdwg.org/dwc/terms/decimalLatitude>

has no more meaning to a machine than:

xq:p2-glwsopgn_2q4as <http://xq1w.org/3t3/nv_c/p2-glwsopgn_2q4as>

"-121.34278" is just a string of Unicode characters

But a semantic client can follow rules about what can be inferred to be true

If

aaa rdfs:range XXX. uuu aaa vvv.

then

vvv rdf:type XXX.

Application of an entailment rule

The FOAF vocabulary asserts:

foaf:depiction rdfs:range foaf:Image.

This does NOT mean that the object of a triple containing foaf:depiction must be an image.

The AAA Principle allows the predicate foaf:depiction to be used with any kind of object.

The entailment rule rdfs3 means that a semantic client can materialize an entailed triple stating that the rdf:type of the object is foaf:Image.

FOAF = Friend of a Friend vocabulary http://xmlns.com/foaf/spec/

Entailment rule example

The AAA Principle allows me to assert that:

<http://viaf.org/viaf/9854560> foaf:depiction <http://commons.wikimedia.org/wiki/File:Van_Gogh_Age_19.jpg>.

In English we would say: {The person Vincent van Gogh} has a depiction {a certain jpeg image}

From the range of foaf:depiction, a client can infer that:

<http://commons.wikimedia.org/wiki/File:Van_Gogh_Age_19.jpg> rdf:type foaf:Image.

RDF also allows me to assert that: <urn:lsid:ubio.org:namebank:111731> foaf:depiction <http://dbpedia.org/resource/Moby-Dick>.

In English we would say: {The name Physeter macrocephalus Linnaeus, 1758} has a depiction {the novel Moby Dick}

DBpedia declares <http://dbpedia.org/resource/Moby-Dick> rdf:type bibo:Book

But a semantic client infers <http://dbpedia.org/resource/Moby-Dick> rdf:type foaf:Image.

based on the range declaration of foaf:depiction

A novel is an image!!! Oops. We must be more careful with foaf:depiction because of its range declaration.
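The range rule can be applied mechanically, which is exactly why the nonsensical conclusion appears. Below is a minimal Python sketch of rule rdfs3 over the triples from this example; the `apply_rdfs3` helper is a hypothetical name and the abbreviated IRIs are kept as plain strings.

```python
# Range declaration taken from the FOAF vocabulary.
RANGES = {"foaf:depiction": "foaf:Image"}

ASSERTED = [
    ("viaf:9854560", "foaf:depiction",
     "http://commons.wikimedia.org/wiki/File:Van_Gogh_Age_19.jpg"),
    ("urn:lsid:ubio.org:namebank:111731", "foaf:depiction", "dbres:Moby-Dick"),
    ("dbres:Moby-Dick", "rdf:type", "bibo:Book"),
]


def apply_rdfs3(triples, ranges):
    """For each triple whose predicate has a range, infer 'object rdf:type range'."""
    inferred = {(o, "rdf:type", ranges[p]) for _, p, o in triples if p in ranges}
    return sorted(inferred - set(triples))  # keep only triples not already stated


for triple in apply_rdfs3(ASSERTED, RANGES):
    print(triple)
```

The rule never rejects the Moby-Dick triple; it simply adds the conclusion that the novel is a foaf:Image alongside the stated bibo:Book type.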

Image by Randy Son of Robert Wikimedia Commons cc-by-2.0

Aside on inconsistencies
• Under the Open World assumption, we cannot infer anything from triples that are unstated (i.e. not making a statement does not imply that the statement is false).
• Stating more triples restricts the possible states of the "world" of discourse described by the graph.
• It is possible to make statements which entail that there is no possible "world" that the graph describes, e.g.
    ex:steve my:age "18.5"^^xsd:integer.
• Careless use of terms with strong entailments increases the likelihood of rendering a graph inconsistent.
• See http://baskauf.blogspot.com/2014/05/confessions-of-rdf-agnostic-part-4.html for examples and more on this.
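The "18.5"^^xsd:integer example is inconsistent because the characters "18.5" are not a valid way to write an integer, so the literal cannot denote any value. A rough Python sketch of that lexical check follows; the regular expression reflects the xsd:integer lexical form (an optional sign followed by digits), and the helper name is hypothetical. A real reasoner does considerably more than this.

```python
import re

# Lexical form of xsd:integer: an optional sign followed by one or more digits.
INTEGER_FORM = re.compile(r"[+-]?[0-9]+")


def is_valid_integer(lexical):
    """Return True if the string is a legal lexical form for xsd:integer."""
    return INTEGER_FORM.fullmatch(lexical) is not None


for candidate in ("18", "-7", "18.5"):
    print(candidate, is_valid_integer(candidate))
```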

Entailment summary
• Entailment rules do NOT enforce conditions.
• Entailment rules imply that other unstated triples exist.
• Inferred triples are true to the extent that the statements which entail them are also true. This introduces a requirement for an element of trust.
• A client is not required to apply all possible entailment rules.
• A client is not required to apply rules to any particular set of triples.

Quote from section 3 of OWL 2 Primer http://www.w3.org/TR/owl2-primer/#Modeling_Knowledge:_Basic_Notions

"a set of statements A entails a statement a if in any state of affairs wherein all statements from A are true, also a is true."

"… the vocabulary of the graph may be interpreted relative to a stronger notion of vocabulary entailment, i.e. with a larger set of semantic conditions understood to be imposed on the interpretations. … [This] can be thought of as an addition of information, and may make more entailments hold than held before the change. "

section 6 of RDF Semantics W3C Recommendation http://www.w3.org/TR/rdf-mt/#MonSemExt

Vocabulary trends

    vocabulary-interpretation      rdf-interpretation → rdfs-interpretation → owl-interpretation
    entailment                     weaker → stronger
    semantic conditions imposed    fewer → more
    information                    less → more
    likelihood of inconsistency    less → more

What does RDF do better?

With RDF, you can infer entailed triples that nobody has explicitly stated.

• Great if consistent use of terms entails triples that make sense.

• Really annoying if careless use of terms entails triples that are nonsensical or that generate inconsistencies.