ESWC SS 2012 - Tuesday tutorial, Dan Brickley and Denny Vrandečić: Linked Open Data
Linked Open Data
Dan Brickley, Google
Denny Vrandečić, Wikimedia
Session: Linked Data, Tuesday, 9:45-11:15
Agenda
- Notation
- Linked Open Data principles
- Applied LOD principles
- Application: schema.org
- Application: Wikidata
- Open questions
- Hands-on intro: On links
- Hands-on: Exploration
- Hands-on: SPARQL
- Hands-on: Spark
22/05/2012
Notation
- URIs are generally abbreviated here as CURIEs (e.g. dbpedia:Kalamaki for http://dbpedia.org/resource/Kalamaki)
- Entities and literals are drawn as labeled rectangles
- Blank nodes are drawn as circles
- Triples are drawn as arrows from subject to object, labeled with the property
[Figure: the node dbpedia:Kalamaki with an arrow labeled fb:likes]
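To make the CURIE convention concrete, here is a minimal Python sketch that expands and compacts CURIEs against a prefix table. The table and the function names `expand_curie` and `compact_uri` are illustrative assumptions; only the `dbpedia:` mapping comes from the slide, and `rdfs:` is the standard RDF Schema namespace.

```python
# Prefix table for this sketch: dbpedia: is the prefix used on the slide,
# rdfs: is the standard RDF Schema namespace.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Turn a CURIE such as 'dbpedia:Kalamaki' into the full URI it abbreviates."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no known prefix in {curie!r}")
    return prefixes[prefix] + reference


def compact_uri(uri: str, prefixes: dict = PREFIXES) -> str:
    """Abbreviate a full URI using the longest matching namespace, if any."""
    matches = [p for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri  # no known namespace: keep the full URI
    best = max(matches, key=lambda p: len(prefixes[p]))
    return f"{best}:{uri[len(prefixes[best]):]}"


print(expand_curie("dbpedia:Kalamaki"))  # http://dbpedia.org/resource/Kalamaki
print(compact_uri("http://www.w3.org/2000/01/rdf-schema#label"))  # rdfs:label
```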
LOD PRINCIPLES Background
Linked Open Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that they can be looked up
3. When someone looks up a URI, provide useful information in standard formats (e.g. RDF, SPARQL)
4. Include links to other URIs, so that more things can be discovered
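Principles 1 and 2 can be checked mechanically, at least roughly. The sketch below (hypothetical helper `is_http_name`) only tests the form of a name, not whether a server actually answers; it also accepts `https`, which goes beyond the slide's wording. A CURIE such as `dbpedia:Kalamaki` fails the check until it is expanded.

```python
from urllib.parse import urlparse


def is_http_name(uri: str) -> bool:
    """True if uri is an absolute http(s) URI with a host, i.e. something a client could GET."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for name in ["http://dbpedia.org/resource/Kalamaki", "dbpedia:Kalamaki", "urn:isbn:0451450523"]:
    print(name, is_http_name(name))
```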
WHY SEMANTIC COMPUTING? LOD Application
1 2 10 100 10 D
- Field 1, Tag: 0 = face up, 1 = face down
- Field 2, Suit: 1 = Clubs, 2 = Diamonds, 3 = Hearts, 4 = Spades
- Field 3, Rank: 1 = Ace, 2..10 = 2..10, 11 = Jack, 12 = Queen, 13 = King
- Field 4: address of the next card
- Field 5: "human-readable" title
Example from Donald Knuth, The Art of Computer Programming, Volume 1, Section 2.1
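A minimal Python sketch of what a program must know to read such a record. It assumes a whitespace-separated text form of the five fields (Knuth packs them into machine words), and the names `decode` and `Card` are illustrative. The point is that the meaning of "1" or "2" lives in this decoder, not in the data.

```python
from dataclasses import dataclass

SUITS = {1: "Clubs", 2: "Diamonds", 3: "Hearts", 4: "Spades"}
FACE_RANKS = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}


@dataclass
class Card:
    face_down: bool
    suit: str
    rank: str
    next_address: int
    title: str


def decode(record: str) -> Card:
    """Decode 'TAG SUIT RANK NEXT TITLE'; the meaning of each number is hard-coded here."""
    tag, suit, rank, next_address, title = record.split(maxsplit=4)
    rank_number = int(rank)
    return Card(
        face_down=(tag == "1"),
        suit=SUITS[int(suit)],
        rank=FACE_RANKS.get(rank_number, str(rank_number)),
        next_address=int(next_address),
        title=title,
    )


print(decode("1 2 10 100 10 D"))
```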
[Figures: the record 1 2 10 100 10 D drawn as a node with a cards:next arrow; the node is then named cards:d10 and linked to cards:card]
http://example.org/cards/d10: Oh, an unknown term! But it is an HTTP URI!
GET /cards/d10 HTTP/1.1
Host: example.org
Accept: text/n3, application/rdf+xml

HTTP/1.1 200 OK
Content-Type: text/n3; charset=UTF-8

cards:d10 rdf:type cards:Card ;
    rdfs:label "10 of Diamonds"@en ;
    cards:suit cards:diamonds ;
    cards:rank cards:rank-10 .
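The lookup above can be sketched with Python's standard `urllib.request`. This only builds the request (method, path, host and `Accept` header) without sending it, since `example.org` is a placeholder domain that will not serve this RDF; the helper name `rdf_lookup_request` is an assumption.

```python
from urllib.request import Request


def rdf_lookup_request(uri: str) -> Request:
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(uri, headers={"Accept": "text/n3, application/rdf+xml"}, method="GET")


req = rdf_lookup_request("http://example.org/cards/d10")
print(req.get_method(), req.selector, req.host, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual GET against a server that does serve RDF.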
[Figures, built up over several slides: the record becomes the node cards:d10, linked to cards:card, cards:facedown, cards:diamonds, cards:rank-10, "10"^^xsd:int, color:red, "10 of Diamonds"@en and "Karo 10"@de, with arrows labeled cards:next, cards:rank and rdfs:label among others]
cards:suit ○ cards:color ⊑ cards:cardcolor
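The axiom above is a property chain: if a card has a suit, and that suit has a color, the card has that card color. A minimal forward-chaining sketch in Python, assuming triples are plain tuples of CURIE strings and assuming a `cards:diamonds cards:color color:red` link (the axiom implies one, but the slide text does not spell it out); `apply_chain` is a hypothetical helper, not a reasoner API.

```python
Triple = tuple[str, str, str]


def apply_chain(triples: set, p1: str, p2: str, inferred: str) -> set:
    """Property chain p1 ○ p2 ⊑ inferred: from (x p1 y) and (y p2 z), add (x inferred z)."""
    new = {
        (x, inferred, z)
        for (x, a, y) in triples if a == p1
        for (y2, b, z) in triples if b == p2 and y2 == y
    }
    return triples | new


graph: set = {
    ("cards:d10", "cards:suit", "cards:diamonds"),
    ("cards:diamonds", "cards:color", "color:red"),  # assumed link
}
result = apply_chain(graph, "cards:suit", "cards:color", "cards:cardcolor")
print(sorted(result))
```

Passing a different color property changes the derived `cards:cardcolor` without touching the function.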
Programming
Classic:
// card is an array [tag, suit, rank, next, title]; card[1] is the suit
function color(card) { if (card[1] == 1 || card[1] == 4) { return 1; } else { return 2; } }
Symbolic constants:
function color(card) { if (card.suit == cards.clubs || card.suit == cards.spades) { return cards.black; } else { return cards.red; } }
Wannabe hacker:
function color(card) { return 2 - Number(card[1] == 1 || card[1] == 4); }
Semantic:
cards:cardcolor
PREFIX cards: <http://example.org/cards/>
SELECT ?color WHERE { cards:d10 cards:cardcolor ?color }
Where is the knowledge? How do I edit it?
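One answer to "where is the knowledge?": in the semantic version it sits in triples that can be edited without changing code. A toy Python matcher for a single triple pattern (terms starting with `?` are variables) shows the idea behind the `SELECT ?color` query; it is a sketch under those assumptions, far from a SPARQL engine (no joins, filters or prefixes).

```python
Triple = tuple[str, str, str]

# The knowledge lives in data: edit these triples, not the code.
GRAPH: set = {
    ("cards:d10", "cards:cardcolor", "color:red"),
    ("cards:d10", "rdfs:label", "10 of Diamonds"),
}


def match(pattern: Triple, triples: set) -> list:
    """Match a single triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in sorted(triples):
        binding: dict = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # one variable cannot take two different values
            elif term != value:
                break
        else:  # no break: every position matched
            results.append(binding)
    return results


print(match(("cards:d10", "cards:cardcolor", "?color"), GRAPH))
```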
[Figure: the same card graph, now with color:yellow as an additional color node]
cards:suit ○ skat:color ⊑ cards:cardcolor
[Figure: the same card graph, now with color:yellow, color:purple and a poker:color link]
cards:suit ○ poker:color ⊑ cards:cardcolor
BUT THESE ARE KNOWLEDGE-BASED SYSTEMS, AS DONE FOR DECADES!
CHRIS WELTY, IBM
“In the Semantic Web, it is not the ‘Semantic’ which is new, it is the ‘Web’ which is new.”
[Figure: the card graph (cards:d10, cards:diamonds, color:red, cards:card, color:yellow, color:purple, poker:color) linked to aifb:Elena via fb:likes]
[Figure: a large graph of linked concepts, spreading out from Elena, AIFB, Purple and Diamond to topics such as KIT, Karlsruhe, Tatort, India, Asia, airlines, restaurants, carbon, religion, philosophy and the Semantic Web]
[Figures: the Semantic Web in 2007, 2008, 2009, 2010 and 2011]
SCHEMA.ORG Applications
Schema.org: a quick look.
[Figure: Yandex]
[Figure: schema.org type overview, including Event, Place, Intangible, LocalBusiness, Organization, CivicStructure, CreativeWork, Landform and UserInteraction]
For example?
<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">My Title</span></h2>
  <meta itemprop="duration" content="PT1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedUrl"
    content="http://example.com/videoplayer.swf?video=123" />
  <object ...>
    <embed type="application/x-shockwave-flash" ...>
  </object>
  <span itemprop="description">Video description</span>
</div>
Type: http://schema.org/VideoObject
- name = My Title
- duration = PT1M33S
- thumbnailUrl = thumbnail.jpg
- embedUrl = http://example.com/videoplayer.swf?video=123
- description = Video description
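How a consumer gets from the markup to the name/value list can be sketched with Python's standard `html.parser`. This minimal reader (the class name `MicrodataParser` is illustrative) assumes one flat item: no nested `itemscope`, no `itemref`, and plain text content; a real microdata parser follows the full HTML specification.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Reads one flat microdata item: its itemtype plus itemprop name/value pairs."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.props = {}
        self._prop = None   # itemprop whose text we are collecting
        self._tag = None    # element that carries that itemprop
        self._depth = 0     # nested elements with the same tag name
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a and self.item_type is None:
            self.item_type = a.get("itemtype")
        if self._prop is not None:
            if tag == self._tag:
                self._depth += 1
            return  # ignore itemprops nested inside another property
        prop = a.get("itemprop")
        if prop is None:
            return
        if tag == "meta":
            # <meta> has no text content; its value is the content attribute
            self.props[prop] = a.get("content", "")
        else:
            self._prop, self._tag, self._depth, self._text = prop, tag, 0, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop is None or tag != self._tag:
            return
        if self._depth:
            self._depth -= 1
        else:
            self.props[self._prop] = "".join(self._text).strip()
            self._prop = None


html = """<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">My Title</span></h2>
  <meta itemprop="duration" content="PT1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedUrl" content="http://example.com/videoplayer.swf?video=123" />
  <span itemprop="description">Video description</span>
</div>"""

parser = MicrodataParser()
parser.feed(html)
parser.close()
print(parser.item_type, parser.props)
```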
(this is almost all you need to know about RDF, incidentally)
WIKIDATA Applications
[Screenshot: mockup of a Wikidata item page, English interface]
Berlin: Capital of Germany. Also known as: City of Berlin
- Continent: Europe [3 sources]
- Country: Germany [2 sources]
- Population: 3,499,879, as of November 30, 2011, method: extrapolation [1 source]; 3,500,000, as of 2012, method: estimate [2 sources]; [further values]
- Phone prefix: 030 since June 1973 [2 sources]; 0311 before June 1973 [1 source]
- Mayor: "Klaus W" being typed, with suggestions Klaus Wowereit (German politician), Klaus Wunderlich (German musician), Klaus Wagner (stalker of the British royal family), Klaus Wagner (German mathematician), Klaus Waldeck (Austrian musician and lawyer) [no source]
- Vehicle registration code: B [1 source]
- Area: 891.85 km² [2 sources]
- Twin city: Los Angeles [no sources]
- [new statement]
[Screenshot: the same Berlin page with the interface in German. Property names are localized (Kontinent, Land, Einwohner, Telefonvorwahl, Bürgermeister, Amtliches Kennzeichen, Fläche, Partnerstadt) and numbers use German formatting (3.499.879; 891,85 km²), while the statements themselves are the same]
Application: Infoboxes
- Now: every Wikipedia article calls an infobox template with its own local values
- With Wikidata: one page holds the values
- The Wikipedias fill their infoboxes with values from Wikidata
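The infobox idea, sketched in Python: one shared record of values, rendered by each language edition with its own labels and number formatting. The data structures and function names are hypothetical (not the Wikidata API); the values and German labels are taken from the Berlin mockup.

```python
# Hypothetical central record for one item, shared by all language editions.
BERLIN = {"population": 3_499_879, "area_km2": 891.85}

# Hypothetical per-language property labels.
LABELS = {
    "en": {"population": "Population", "area_km2": "Area (km²)"},
    "de": {"population": "Einwohner", "area_km2": "Fläche (km²)"},
}


def format_number(value, lang: str) -> str:
    """Format with thousands separators; German swaps ',' and '.'."""
    text = f"{value:,}"
    if lang == "de":
        text = text.replace(",", "_").replace(".", ",").replace("_", ".")
    return text


def infobox(item: dict, lang: str) -> list:
    """Render infobox rows for one language from the shared item data."""
    labels = LABELS[lang]
    return [f"{labels[key]}: {format_number(value, lang)}" for key, value in item.items()]


for lang in ("en", "de"):
    print(infobox(BERLIN, lang))
```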
OPEN QUESTIONS Or: A few dozen possible paper, project and thesis topics
UNFINISHED WORK Open questions
Unfinished work
- What does a unifying logic look like?
- How do we export proofs?
- How do we validate proofs?
- How do we express trust?
- How does the crypto stack really work?
- What are usable interfaces to the Semantic Web?
- How are Semantic Web applications created?
IDENTITY AND REPRESENTATION Open questions
[Figure: many identifiers for one character: http://simpsons.com/id/Bart, http://rdf.freebase.com/id/en.bart_simpson, http://en.wikipedia.org/wiki/Bart_Simpson, http://dbpedia.org/resource/Bart_Simpson, the labels "Bart" and "Bart Simpson", and 4030 (character ID on ComicbookDB)]
Identity and representation
- Is there anything out there?
- How to find the right identifier?
- How to know what an identifier identifies?
- What about the multitude of identifiers?
- How do we know that two identifiers identify the same entity?
- How do we know that two identifiers identify different entities?
- Without this, can we still usefully apply statistical techniques?
- What about creating new identifiers?
- What if identifiers are ambiguous?
- How to find representations for entities that fit my UI?
- How to choose a representation?
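For the question of when two identifiers name the same entity, a common building block is grouping identifiers connected by `owl:sameAs` into equivalence classes. A minimal union-find sketch in Python; the class name is illustrative and the sameAs links in the usage example are assumed, not taken from any dataset. A `False` answer only means "not known to be the same", not "different".

```python
class SameAsIndex:
    """Union-find over owl:sameAs links: groups identifiers into equivalence classes."""

    def __init__(self) -> None:
        self._parent: dict = {}

    def find(self, uri: str) -> str:
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            self._parent[uri] = self._parent[self._parent[uri]]  # path halving
            uri = self._parent[uri]
        return uri

    def add_same_as(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def known_same(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


index = SameAsIndex()
index.add_same_as("http://simpsons.com/id/Bart", "http://dbpedia.org/resource/Bart_Simpson")
index.add_same_as("http://dbpedia.org/resource/Bart_Simpson", "http://rdf.freebase.com/id/en.bart_simpson")
print(index.known_same("http://simpsons.com/id/Bart", "http://rdf.freebase.com/id/en.bart_simpson"))
# The Wikipedia URL names a page about Bart, so no sameAs link is added for it.
print(index.known_same("http://simpsons.com/id/Bart", "http://en.wikipedia.org/wiki/Bart_Simpson"))
```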
TRUST AND DIVERSITY Open questions
[Screenshot: the Berlin item page again, now with Mayor: Klaus Wowereit [no source]]
A statement in Wikidata
[Figures, built up over several slides: the population statements for Berlin (3,499,879 as of November 30, 2011, method: extrapolation; 3,500,000; and, in one variant, 8,000 as of the 15th century, method: estimate) drawn as a graph. Each statement becomes a node (Statement1, Statement2) linked to the item Berlin, the property population and its value; qualifiers hang off the statement node (as of 2011-11-30, method Extrapolation); finally, reference links connect the statements to Source1, Source2 and Source3]
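The statement graph above can be sketched as a small Python data model that flattens a statement into triples around a statement node. The `Statement` class, the plain-string property names, and which sources back which statement are illustrative assumptions, not Wikidata's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item, with qualifiers and references (hypothetical model)."""
    item: str
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)


def to_triples(statement_id: str, st: Statement) -> list:
    """Flatten a statement into triples around a statement node, as in the slide's graph."""
    triples = [(st.item, st.prop, statement_id), (statement_id, "value", st.value)]
    triples += [(statement_id, q, v) for q, v in st.qualifiers.items()]
    triples += [(statement_id, "reference", r) for r in st.references]
    return triples


statement1 = Statement(
    "Berlin", "population", 3_499_879,
    qualifiers={"as of": "2011-11-30", "method": "Extrapolation"},
    references=["Source1", "Source2"],
)
statement2 = Statement("Berlin", "population", 3_500_000, references=["Source3"])

for triple in to_triples("Statement1", statement1):
    print(triple)
```

Both statements coexist; neither value overwrites the other.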
Trust and diversity
- How to express provenance information?
- How to store provenance of data?
- Can provenance information be expressed such that the data is still easily accessible?
- How to query data with provenance information?
- How to deal with genuinely diverse data?
- How to match diverse vocabularies?
- How to deal with noisy data?
- Is reification really necessary?
- Do named graphs provide solutions?
- Use one graph per statement?
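To make the reification vs. named graphs question concrete: standard RDF reification describes one triple with four triples about a statement node, to which provenance can then be attached; the named-graph alternative puts the triple in its own graph and describes the graph. A minimal Python sketch; the `ex:` terms, including `ex:source`, are made up for illustration. The reified description does not itself assert the original triple.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id: str, triple: tuple) -> set:
    """Standard RDF reification: describe a triple with four triples about a statement node."""
    s, p, o = triple
    return {
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    }


claim = ("ex:Berlin", "ex:population", "3499879")
described = reify("ex:statement1", claim)
# Provenance attached to the statement node (ex:source is a made-up property).
with_provenance = described | {("ex:statement1", "ex:source", "ex:Source1")}

# Named-graph alternative: one graph per statement, provenance attached to the graph name.
quad = claim + ("ex:graph1",)
graph_provenance = {("ex:graph1", "ex:source", "ex:Source1")}

print(len(with_provenance), "triples via reification;", "1 quad +", len(graph_provenance), "triple via a named graph")
```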
UNITS AND ACCURACY Open questions
Units and accuracy
- How to express "17th century" next to literal dates?
- How to express heterogeneous accuracies?
- Is a functional value of 40,000 km really inconsistent with 39,987 km?
- How to express confidence values?
- How to express units?
- Is 176 cm equal to 5 ft 9 in? 177 cm too? Is equality transitive?
- How to express ranges (e.g. the property "active" for bands)?
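The 176 cm question as arithmetic: 5 ft 9 in is 69 in, and 69 × 2.54 = 175.26 cm. With any fixed tolerance, "roughly equal" is not transitive, as this Python sketch shows; the 1 cm tolerance and the function names are arbitrary assumptions.

```python
CM_PER_INCH = 2.54  # exact by definition


def feet_inches_to_cm(feet: int, inches: int) -> float:
    return (feet * 12 + inches) * CM_PER_INCH


def roughly_equal(a_cm: float, b_cm: float, tolerance_cm: float = 1.0) -> bool:
    """Tolerance-based 'equality' (the 1 cm tolerance is an arbitrary choice)."""
    return abs(a_cm - b_cm) <= tolerance_cm


five_nine = feet_inches_to_cm(5, 9)   # 69 in, about 175.26 cm
print(roughly_equal(five_nine, 176))  # True: about 0.74 cm apart
print(roughly_equal(176, 177))        # True: 1 cm apart
print(roughly_equal(five_nine, 177))  # False: about 1.74 cm apart
```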
SERIALIZATIONS Open questions
[Figure: the example graph as nodes and arrows: http://simpsons.com/id/Bart, http://simpsons.com/id/Marge and http://simpsons.com/id/Lisa, labeled "Bart", "Marge" and "Lisa" via http://www.w3.org/2000/01/rdf-schema#label; typed via http://www.w3.org/1999/02/22-rdf-syntax-ns#type as http://family.org/id/Child or http://family.org/id/Adult; and connected by http://family.org/id/parent and http://family.org/id/sibling]
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:family="http://family.org/id/">
  <rdf:Description rdf:about="http://simpsons.com/id/Marge">
    <rdf:type rdf:resource="http://family.org/id/Adult"/>
    <rdfs:label>Marge</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://simpsons.com/id/Bart">
    <rdfs:label>Bart</rdfs:label>
    <rdf:type rdf:resource="http://family.org/id/Child"/>
    <family:parent rdf:resource="http://simpsons.com/id/Marge"/>
    <family:sibling rdf:resource="http://simpsons.com/id/Lisa"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://simpsons.com/id/Lisa">
    <rdfs:label>Lisa</rdfs:label>
    <rdf:type rdf:resource="http://family.org/id/Child"/>
    <family:parent rdf:resource="http://simpsons.com/id/Marge"/>
  </rdf:Description>
</rdf:RDF>
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix family: <http://family.org/id/> .
@prefix simpsons: <http://simpsons.com/id/> .

simpsons:Marge rdf:type family:Adult ;
    rdfs:label "Marge" .
simpsons:Bart rdf:type family:Child ;
    rdfs:label "Bart" ;
    family:parent simpsons:Marge ;
    family:sibling simpsons:Lisa .
simpsons:Lisa rdf:type family:Child ;
    rdfs:label "Lisa" ;
    family:parent simpsons:Marge .
{ "id": "Bart", "type": "Child", "sibling": "Lisa", "parent": "Marge" }
Child(Bart). sibling(Bart, Lisa). parent(Bart, Marge).
Bart is a son of [[parent::Marge]] and the brother of [[sibling::Lisa]].
0 HEAD
1 FILE simpsons
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME Marge /Bouvier/
2 SURN Simpson
1 SEX F
1 FAMS @F1@
0 @I2@ INDI
1 NAME Bart /Simpson/
1 SEX M
1 FAMC @F1@
0 @I3@ INDI
1 NAME Lisa /Simpson/
1 SEX F
1 FAMC @F1@
0 @F1@ FAM
1 WIFE @I1@
1 CHIL @I2@
1 CHIL @I3@
0 TRLR
Serializations
- Do all tools need to understand all serializations?
- Are all serializations lossless?
- How to ensure they are up-to-date?
- What about current tools that don't understand anything?
- Is the data sufficiently complete?
- How to seamlessly ground and lift data to RDF?
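A quick way to test "are all serializations lossless?": lift the flat JSON view back to triples and compare with the full graph. A Python sketch, assuming the JSON keys map onto the `family:` and `simpsons:` namespaces (an assumption, since the JSON itself says nothing about namespaces); `lift` is a hypothetical helper.

```python
import json

# The nine triples of the example graph, with CURIEs for brevity.
GRAPH = {
    ("simpsons:Marge", "rdf:type", "family:Adult"),
    ("simpsons:Marge", "rdfs:label", "Marge"),
    ("simpsons:Bart", "rdf:type", "family:Child"),
    ("simpsons:Bart", "rdfs:label", "Bart"),
    ("simpsons:Bart", "family:parent", "simpsons:Marge"),
    ("simpsons:Bart", "family:sibling", "simpsons:Lisa"),
    ("simpsons:Lisa", "rdf:type", "family:Child"),
    ("simpsons:Lisa", "rdfs:label", "Lisa"),
    ("simpsons:Lisa", "family:parent", "simpsons:Marge"),
}

JSON_VIEW = '{ "id": "Bart", "type": "Child", "sibling": "Lisa", "parent": "Marge" }'


def lift(doc: str) -> set:
    """Lift the flat JSON back to triples under assumed namespace conventions."""
    obj = json.loads(doc)
    subject = "simpsons:" + obj.pop("id")
    triples = set()
    for key, value in obj.items():
        if key == "type":
            triples.add((subject, "rdf:type", "family:" + value))
        else:
            triples.add((subject, "family:" + key, "simpsons:" + value))
    return triples


lifted = lift(JSON_VIEW)
missing = GRAPH - lifted
print(f"recovered {len(lifted)} of {len(GRAPH)} triples; lost {len(missing)}")
```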
ONTOLOGIES Open questions
Ontologies
- "An ontology is a formal specification of a shared conceptualization"
- Defines concepts and their formal relations to each other
- You can understand a concept without having a word for it
- Axiom not possible in OWL L, can only be approximated:
parent ○ brother = uncle
[Figure: family tree around Bart, showing Marge, Selma, Sideshow Bob, ⚭Homer, ⚭Herb and ♂]
parent ○ sister ○ husband V ⊑ sibling
Ontologies
- Strict taxonomies: Bart a FictionalPerson
- owl:sameAs: GDR sameAs Germany
- Classes as individuals: Eagle a EndangeredSpecies
- rdfs:domain and rdfs:range: family:child rdfs:range foaf:Person
- "Unauthorized" extensions: foaf:favouriteMovie
Ontologies
- How to achieve and measure sharedness?
- Who defines the semantics of a term?
- How to achieve correctness?
- Does sharedness mean correctness?
- How to overcome limitations on expressivity?
- How to deal with wishes for more expressivity?
- How to deal with undecidability?
- What does inconsistency mean?
- How to deal with brittleness?
PRIVACY Open questions
Privacy
- How to ensure privacy?
- What does privacy mean?
- How to publish linked data that is not open?
- What about the ethics of combining data?
SCALABILITY Open questions
Web Data Commons
- Extracts data from the Common Crawl (5 billion pages, 20 TB compressed)
- 65,408,946 domains with triples
- 1,222,563,749 typed entities
- 3,294,248,653 triples
- www.webdatacommons.org
Scalability
- How to efficiently use Semantic Web data?
- How to select the appropriate set?
- How to cache it?
- How to deal with frequent updates?
- How to deal with SPARQL endpoints vs. RDF?
- How to do federated queries?
- Who pays for it, and when?
QUESTIONS?
WHAT ABOUT THE LINKS? Introduction to Hands-On
What are the links in "linked data"?
Are they links between things?
Are they links between documents?
How exactly do the "Web hyperlinks" we know and love relate to the factual "typed links" of data modeling?
Links and Links
- These questions motivate and drive the Linked Data project, and have been with the Web from the start.
- They explain our most boring debates ("httpRange-14").
- And they show how the 'Semantic Web' is a project to improve the mainstream Web itself.
In the beginning...
(1989, 1994, ...)
What's in a (hyper)link?
- Does a node in the graph stand for 'Stephen Fry'-the-person, or for 'a page about Stephen Fry'?
- What about when there are multiple pages about the same person? In different voices? Sometimes disagreeing?
- RDF thinks in triples, but data management is often in quads: asking who-said-what in SPARQL.
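Quads in miniature: keeping a graph name per triple lets us ask who said what, while flattening everything into one set of triples loses that. A Python sketch; the graph names and the `ex:color` property are hypothetical, and the colors are the alternatives from the card slides.

```python
from collections import defaultdict

# Quads: (subject, predicate, object, graph). The graph name records who said it.
QUADS = [
    ("cards:diamonds", "ex:color", "color:red", "graph:classic"),
    ("cards:diamonds", "ex:color", "color:yellow", "graph:skat"),
    ("cards:diamonds", "ex:color", "color:purple", "graph:poker"),
    ("cards:d10", "cards:suit", "cards:diamonds", "graph:classic"),
]


def who_said(quads, subject, predicate):
    """Group objects by graph, roughly SELECT ?g ?o WHERE { GRAPH ?g { <s> <p> ?o } }."""
    answers = defaultdict(list)
    for s, p, o, g in quads:
        if s == subject and p == predicate:
            answers[g].append(o)
    return dict(answers)


def flatten(quads):
    """Merge all graphs into one set of triples, dropping the graph names."""
    return {(s, p, o) for s, p, o, _g in quads}


print(who_said(QUADS, "cards:diamonds", "ex:color"))
print(len(flatten(QUADS)), "triples, with no record of who said which")
```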
1989 again
One flat graph? What if we disagree?
A Graph of Graphs?
- Classic WWW hypertext is a top-level document graph.
- Those documents make claims about the world: factual graphs, e.g. schema.org, RDFa.
- SPARQL lets us store and query all this.
- Each Web 'node' may give us its own 'nodes and links' description, including links.
[Figure: sites and datasets describing Stephen Fry: IMDB, BBC, stephenfry.com, Freebase, sameas.org, dbpedia.org, New York Times, Rotten Tomatoes, VIAF]
We can emphasize the landscape of sites/datasets...
(No single 'correct' view)
Or we can zoom in, and see how records can be merged / flattened into a single set of triples...
Summary
- Linked datasets, pages, real-world things...
- ... all of these are represented in RDF datasets.
- To query this hands-on, we can use SPARQL to ask questions, and 'named graphs' to organize factual claims into groups.
EXPLORATION Hands-on
Hands-on
- You will explore datasets about Stephen Fry with SPARQL
- SPARQL yourself and your colleagues
- Spark: SPARQL on the Web
Thinking about data
- We made a data/ folder for you
- Real public RDF data about a real person
- Sources: DBpedia, Freebase, VIAF, sameas.org, New York Times, Identi.ca, BBC, Rotten Tomatoes, IMDB and us.
- I'll briefly introduce the data now; then see info/data-and-queries-intro.txt
http://192.168.0.20:8080/openrdf-workbench/repositories/Tuesday
What to do
- "Get your hands dirty" with real Linked Data
- If you hit a problem, make a note of it, and ask!
- Most files have RDF describing Stephen Fry; he is real and human, please bear that in mind.
- Study the shape and patterns of the data, ask yourself questions, and use SPARQL to explore.
Questions
- What RDF schemas/ontologies do you see?
- How are people and other things identified?
- Are there common patterns across sources?
- Can you write queries that integrate these?
- What bugs are there in the data? How do you think they got there?
Internet Detectives
- For each triple, can you figure out "how it got there"? In whose voice is it?
- Is there a real schema? (if the Wi-Fi is up)
- How would you check its truth? Who "said" it, and how could a machine tell?
- Which sources (or parts) aggregate different points of view within a single RDF graph?
data-and-queries-intro.txt
- See the info/ folder for more details: SPARQL setup and a short querying tutorial.
- The goal is to study the Linked Data Web and understand how it might evolve.
- Identify project and research topics, and ways of helping to improve the Web.
SPARQL YOURSELF Hands-on
SPARQL yourself
SPARQL Web Form: http://192.168.0.20:8080/openrdf-workbench/repositories/Students/query
SPARQL endpoint: http://192.168.0.20:8080/openrdf-sesame/repositories/Students
SPARK Hands-on
Spark
Spark visualizations
Exercise
Semantic MediaWiki
Semantic MediaWiki - Export
Task
- Let's add semanticweb.org as an additional source, in order to add Dan from there to the list of "Friends of Spark".
- Expand spark.zip, then check test/index.html