Methodological Guidelines for Publishing Linked Data, presented in Bolivia at UPB, UCB, UMSS, and Jalasoft.
Methodological Guidelines for Publishing Linked Data
Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net, {bvillazon,asun,ocorcho}@fi.upm.es
Phone: 34.91.3366605, Fax: 34.91.3524819
Cochabamba, Bolivia, May 2011
ToC
• Ontology Engineering Group
• Introduction to Linked Data
• Guidelines for Publishing Linked Data
• Demo
Ontology Engineering Group
People
• Director: A. Gómez-Pérez
• Research Group (38 people): 2 Full Professors, 6 Associate Professors, 1 Assistant Professor, 6 Postdocs, 14 PhD Students, 6 MSc Students, 4 Software Engineers
• Management (5): 3 Project Managers, 1 System Administrator, 1 Secretary
• 80+ Past Collaborators
• 15+ visitors
http://www.oeg-upm.net
Students from: France, Lithuania, Italy, Serbia, Macedonia, Germany, Malaysia, Colombia, India, Ecuador, Cameroon, Bolivia
> 30 research projects (1999 to 2013)
[Figure: project timeline by type: company projects, Spanish projects, EU project participation, EU projects coordinated]
• Company and Spanish projects: Katalyx, IGN/RAE/AMPER/XMEDIA, WHO/IGN/BNE/FAO, ContentWeb, Servicios Semánticos, REIMDOC (FIT), Red/Gis4Gov/11811/UPnP/UpGrid/Autores3.0/WEBn+1, GeoBuddies, España Virtual/mIO!/Buscamedia/PLATA, BabelData/myBigData, HA98-0002, HF02-0013, 20 special/complementary actions (Ac. Especiales/Complementarias)
• EU projects: MKBEEM, OntoWeb, SEEMP, NeOn, Esperonto, PIKON, Knowledge Web, ADMIRE, DynaLearn, OntoGrid, Marie Curie, SemSorGrid4Env, SEALS, MONNET, SCALUS, PlanetData, Wf4Ever
Collaboration with international companies
Collaboration with other research groups
Univ. of Amsterdam, Free Univ. of Amsterdam, DFKI, Univ. of Augsburg, Univ. of Karlsruhe, Univ. of Koblenz, KSL, Stanford Univ., Univ. of Wien, Univ. of NR & ALS, Univ. of Innsbruck, Univ. of Hannover, Univ. of Mannheim, Univ. of Bielefeld, Univ. of Brasilia, Forschungszentrum Informatik, Univ. of Galway (DERI), Free Univ. of Brussels, Univ. of Zurich, Open University, Oxford University, Ústav Informatiky, Academy of Sciences, Univ. of Manchester, Univ. of Liverpool, Univ. of Sheffield, Univ. of Aberdeen, Univ. of Edinburgh, Univ. of Southampton, Univ. of Hull, CNR, Univ. of Trento, Univ. of Bolzano, INRIA, Univ. of Athens, TUC, Univ. of Tel Aviv
Research Areas
[Figure: timeline of research areas, 1995 to 2008]
• Ontological Engineering
• Natural Language Processing
• (Social) Semantic Web
• Semantic e-Science (Data Integration, Semantic Grid)
• Internet of Things
Linked Data in OEG
• GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. http://geo.linkeddata.es
• El Viajero Linked Data is a project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group. http://webenemasuno.linkeddata.es/
• A project with the Biblioteca Nacional (National Library of Spain) to publish the library information as Linked Data. http://cultura.linkeddata.es/visualizer/
Linked Data in OEG
• Tools for generating and consuming Linked Data, e.g.,
  • geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf
  • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Spanish Thematic Network of Linked Data http://red.linkeddata.es
  » Group leader: Ontology Engineering Group
  » 19 Research Groups
  » 4 companies
Introduction to Linked Data
Classic Web
• Data sources such as MovieDB and the CIA World FactBook are exposed to the Web via HTML, PDF, etc.
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Classic Web
• Information from single pages can be found via search engines
• Complex queries over multiple pages / data sources?
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
What do we actually want?
• Use the Web like a single global database, e.g., spanning MovieDB and the CIA World FactBook
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Linked Data enables such a Web of Data
• Global identifier: URI (Uniform Resource Identifier), a string of characters used to identify a name or a resource on the Internet
• Data model: RDF (Resource Description Framework), a standard model for data interchange on the Web
• Access mechanism: HTTP
• Connection: typed links
[Figure: the MovieDB resource http://imdb.../TLLuvia, whose http://.../name is “Even the Rain”, is connected by the typed link http://.../filming_location to the CIA World FactBook resource http://cia.../Bolivia, whose http://.../population is 8000000]
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
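To make the idea concrete, here is a minimal Python sketch of the movie example, assuming each data source is simply a set of (subject, predicate, object) triples held in memory. The full URIs are hypothetical stand-ins, because the slide elides the real ones; the point is that the typed link filming_location lets a single query hop from the movie data into the factbook data.

```python
# Hypothetical URIs: the slide elides the real ones ("http://imdb.../TLLuvia").
MOVIE = "http://moviedb.example/TLLuvia"
BOLIVIA = "http://factbook.example/Bolivia"
NAME = "http://example.org/vocab/name"
FILMING_LOCATION = "http://example.org/vocab/filming_location"
POPULATION = "http://example.org/vocab/population"

# Each data source publishes its own triples (subject, predicate, object).
movie_db = {
    (MOVIE, NAME, "Even the Rain"),
    (MOVIE, FILMING_LOCATION, BOLIVIA),  # typed link into the other data set
}
factbook = {
    (BOLIVIA, POPULATION, 8000000),
}


def objects(graph, subject, predicate):
    """Return all objects of the triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def population_of_filming_location(movie):
    """Answer a question spanning both sources by following the typed link."""
    web_of_data = movie_db | factbook  # treat the Web as one global database
    results = []
    for place in objects(web_of_data, movie, FILMING_LOCATION):
        results.extend(objects(web_of_data, place, POPULATION))
    return results


print(population_of_filming_location(MOVIE))  # [8000000]
```

Neither source alone can answer "what is the population where this movie was filmed?"; the shared URI for Bolivia is what joins them.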
In a nutshell
• An extension of the current Web…
• … where information, services, and data are given well-defined and explicitly represented meaning, …
• … so that it can be shared and used by humans and machines, …
• … better enabling them to work in cooperation
• How?
  • Promoting information exchange by tagging web content with machine-processable descriptions of its meaning
  • And technologies and infrastructure to do this
  • And clear principles on how to publish data
The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
Sources: http://www.w3.org/DesignIssues/LinkedData.html and http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
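Principle 3 is commonly implemented with HTTP content negotiation. The sketch below assumes a hypothetical URL layout in which a thing's URI lives under /resource/ and a request is answered with a 303 redirect to /data/ (RDF) or /page/ (HTML), depending on the Accept header, similar to the layout Pubby uses. It only computes the response; it does not run a server.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3", "application/n-triples"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header):
    """Media types from an HTTP Accept header, best (highest q) first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def dereference(path, accept):
    """Return (status, Location) for a request on a hypothetical /resource/ URI."""
    if not path.startswith("/resource/"):
        return 404, None
    name = path[len("/resource/"):]
    for media_type in parse_accept(accept):
        if media_type in RDF_TYPES:
            return 303, "/data/" + name  # machine-readable description
        if media_type in HTML_TYPES:
            return 303, "/page/" + name  # human-readable description
    return 406, None  # nothing we can serve


print(dereference("/resource/Provincia/Madrid", "text/turtle"))
```

The 303 ("See Other") status tells the client that the URI names a thing, not a document, and points it to a document describing that thing.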
RDF: Resource Description Framework
• W3C recommendation
• RDF is a basic KR (knowledge representation) language based on semantic networks
• Useful to represent metadata and describe any type of information in a machine-accessible way (aka data model)
• Resources are described in terms of properties and property values using RDF statements
• Statements are represented as triples, consisting of a subject, predicate, and object [S,P,O]
[Figure: a statement: Subject --property--> Object]
© Slide adapted from “RDF and RDF Schema”- Raúl García et al.
RDF - Example
[Figure: RDF graph with the statements]
  http://upb.edu/Alex hasName “Alex Villazón”
  http://upb.edu/Alex hasColleague http://upb.edu/Hugo
  http://upb.edu/Beto hasColleague http://upb.edu/Hugo
  http://upb.edu/Beto hasSex “Male”
• For practical purposes, especially if handwritten, URIs are shortened using XML namespaces
  • xmlns:upb="http://upb.edu/"
  • upb:Alex is equivalent to http://upb.edu/Alex
• RDF serializations: XML, N3, N-Triples
[Figure: the same graph with prefixed names]
  upb:Alex person:hasName “Alex Villazón”
  upb:Alex person:hasColleague upb:Hugo
  upb:Beto person:hasColleague upb:Hugo
  upb:Beto person:hasSex “Male”
© Slide adapted from “RDF and RDF Schema”- Raúl García et al.
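As an illustration of one of these serializations, a minimal Python sketch that writes the example graph as N-Triples, expanding each prefixed name into a full URI. The upb: namespace comes from the slide; the person: namespace is not given there, so http://example.org/person# is a hypothetical placeholder. Literal escaping covers only backslashes and double quotes.

```python
PREFIXES = {
    "upb": "http://upb.edu/",                # from the slide
    "person": "http://example.org/person#",  # hypothetical: not given on the slide
}


def expand(curie):
    """Turn a prefixed name such as upb:Alex into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def term(value):
    """Render a known prefixed name as <URI>; anything else as a plain literal."""
    if ":" in value and value.split(":", 1)[0] in PREFIXES:
        return "<" + expand(value) + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


triples = [
    ("upb:Alex", "person:hasName", "Alex Villazón"),
    ("upb:Alex", "person:hasColleague", "upb:Hugo"),
    ("upb:Beto", "person:hasColleague", "upb:Hugo"),
    ("upb:Beto", "person:hasSex", "Male"),
]


def to_ntriples(triples):
    """One statement per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(triples))
```

A literal that happened to start with a known prefix followed by ":" would be misread as a URI; a real serializer keeps URIs and literals as distinct types instead of guessing.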
RDF - SPARQL
[Figure: the graph with prefixed names from the previous slide]
• Query: “Tell me which persons have Hugo as a colleague”
  • Graph pattern: ?person person:hasColleague upb:Hugo
  • Result: upb:Alex and upb:Beto
• SPARQL: query language for RDF, W3C recommendation
  SELECT ?s WHERE { ?s person:hasColleague upb:Hugo . }
© Slide adapted from “RDF and RDF Schema”- Raúl García et al.
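To show what a SPARQL engine does with the single triple pattern in that WHERE clause, here is a small pattern matcher over the same four triples (kept as prefixed names for brevity). It is a toy evaluator for one basic triple pattern, not a SPARQL implementation.

```python
graph = [
    ("upb:Alex", "person:hasName", "Alex Villazón"),
    ("upb:Alex", "person:hasColleague", "upb:Hugo"),
    ("upb:Beto", "person:hasColleague", "upb:Hugo"),
    ("upb:Beto", "person:hasSex", "Male"),
]


def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def match(pattern, graph):
    """Evaluate one triple pattern; return one dict of bindings per matching triple."""
    solutions = []
    for triple in graph:
        binding = {}
        for pattern_term, value in zip(pattern, triple):
            if is_var(pattern_term):
                if binding.get(pattern_term, value) != value:
                    break  # same variable would be bound to two different values
                binding[pattern_term] = value
            elif pattern_term != value:
                break  # constant does not match
        else:
            solutions.append(binding)
    return solutions


# SELECT ?s WHERE { ?s person:hasColleague upb:Hugo . }
result = [b["?s"] for b in match(("?s", "person:hasColleague", "upb:Hugo"), graph)]
print(result)  # ['upb:Alex', 'upb:Beto']
```

Constants in the pattern filter triples; variables collect the values that make the pattern true, which is exactly the result shown on the slide.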
So does that mean I have to publish my data as Linked Data, now?
• But, why?
• What was your incentive to publish an HTML page in 1990?
  • Share data in documents, and because your neighbor was doing it
• So, why should we publish Linked Data in 2011?
  • Share data as data, and because your neighbor is doing it
© Slide adapted from “Introduction to Linked Data”- Juan Sequeda
And guess who is starting to publish Linked Data now?
• UK Government
• US Government
• BBC
• Open Calais
• Freebase
• NY Times
• CNET
• DBpedia
• …
Linked Open Data evolution
[Figure: the Linked Open Data cloud in 2007, 2008, 2009 and 2010]
http://richard.cyganiak.de/2007/10/lod/
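Guidelines for Publishing Linked Data
[Figure: the publishing process: identification of the data sources, vocabulary modelling, generation of the RDF data, publication of the RDF data, data cleansing, linking the RDF data, and enabling effective discovery]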
Identification of the data sources
• Guidelines based on the Open Data Manual [1]
• Two possibilities:
  • To find the data sources already available in a public data catalog, e.g., the Aporta project [2]
  • To get an agreement with a particular government body to publish its data sources, e.g., GeoLinkedData with the IGN
[1] http://opendatamanual.org/
[2] http://aporta.es
Identification of the data sources: GeoLinkedData
• IGN (National Geographic Institute of Spain): agreement with the IGN; data stored in Oracle & MySQL
• INE (National Statistics Institute of Spain): data sources available in a public data catalog
Identification of the data sources: IGN & INE
[Figure: sample INE data, the Industry Production Index by Province and Year]
Vocabulary Modelling: Ontology
• An ontology is an engineering artifact, which provides:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms
    • Almost always including concepts and their classification
    • Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies are expressed in OWL or RDF(S), both based on RDF
Vocabulary Modelling: Reuse available vocabularies
• Search for suitable vocabularies, e.g., in Linked Open Vocabularies
• Are there suitable vocabularies?
  • Yes: build the vocabulary by reusing available vocabularies
  • No: …
Vocabulary Modelling: Reuse available non-ontological resources
• Search for suitable non-ontological resources: highly reliable Web sites, domain-related sites, government catalogs
• Are there suitable resources?
  • Yes: build the vocabulary by transforming available resources
  • No: build the vocabulary from scratch
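The two flowcharts can be read as one decision procedure. The function below is a hypothetical encoding of it; its arguments stand for the outcome of the two searches, which in practice is a manual judgment rather than something computed.

```python
def plan_vocabulary(suitable_vocabularies, suitable_resources):
    """Follow the two decision points of the vocabulary modelling flow.

    suitable_vocabularies: vocabularies from the first search (e.g., in
        Linked Open Vocabularies) judged to fit the domain.
    suitable_resources: non-ontological resources from the second search
        (reliable Web sites, domain-related sites, government catalogs).
    """
    if suitable_vocabularies:
        return ("reuse vocabularies", list(suitable_vocabularies))
    if suitable_resources:
        return ("transform non-ontological resources", list(suitable_resources))
    return ("build from scratch", [])


print(plan_vocabulary(["WGS84 Geo Positioning"], []))
```

Reuse is tried first because shared terms make the published data easier to link and interpret; building from scratch is the fallback.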
Vocabulary Modelling: GeoLinkedData
The vocabulary reuses:
• SCOVO statistical terms (scv:Dataset, scv:Dimension, scv:Item)
• WGS84 Geo Positioning: an RDF vocabulary
• An ontology of hydrographical phenomena (rivers, lakes, etc.)
• A vocabulary for instants, intervals, durations, etc.
• An ontology for the OGC Geography Markup Language
• Names and international code systems for territories and groups
Resulting ontology (NeOn Toolkit: http://neon-toolkit.org/):
• Classes: 33
• Object properties: 44
• Data properties: 318
Generation of the RDF Data
• INE data: NOR2O
• IGN data: ODEMapster
• IGN geospatial column: geometry2rdf
Generation of the RDF Data: NOR2O
[Figure: NOR2O transforms the INE Industry Production Index table (by Province and Year) into RDF]
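NOR2O's output is not described here, so the following is only a sketch of the idea, assuming the statistical table is modelled with the SCOVO terms that appear in the GeoLinkedData vocabulary (scv:Dataset, scv:Dimension, scv:Item). The base URI and the row values are placeholders, not INE figures.

```python
from urllib.parse import quote

SCV = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/ipi/"  # hypothetical base URI for generated resources


def table_to_scovo(rows, dataset_name):
    """Turn (province, year, value) rows into SCOVO-style triples."""
    dataset = BASE + quote(dataset_name, safe="")
    triples = [(dataset, RDF + "type", SCV + "Dataset")]
    dimensions = set()
    for i, (province, year, value) in enumerate(rows):
        item = f"{BASE}item/{i}"
        province_dim = BASE + "province/" + quote(province, safe="")
        year_dim = BASE + "year/" + str(year)
        triples += [
            (item, RDF + "type", SCV + "Item"),
            (item, SCV + "dataset", dataset),
            (item, SCV + "dimension", province_dim),  # one table axis
            (item, SCV + "dimension", year_dim),      # the other table axis
            (item, RDF + "value", value),             # the cell value
        ]
        dimensions.update([province_dim, year_dim])
    # Each dimension value is typed once, in a stable order.
    triples += [(d, RDF + "type", SCV + "Dimension") for d in sorted(dimensions)]
    return triples


placeholder_rows = [("Madrid", 2009, 100.0), ("Madrid", 2010, 101.0)]  # not real data
for triple in table_to_scovo(placeholder_rows, "ipi"):
    print(triple)
```

Each table cell becomes one item, and the row and column headers become dimensions shared by all items that lie on them.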
Generation of the RDF Data: R2O & ODEMapster
• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
• The ODEMapster processor generates RDF instances from relational instances, based on the mapping description expressed in the R2O document.
www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
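The following Python sketch conveys the idea behind a declarative relational-to-RDF mapping: the mapping is data, and a generic processor applies it to every row. It does not use R2O syntax or ODEMapster; the table, columns, mapping keys and the name property are hypothetical, and an in-memory SQLite database stands in for the Oracle or MySQL source.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping in the spirit of a declarative description (NOT R2O syntax).
MAPPING = {
    "table": "lago",
    "key": "id",
    "class": "http://geo.linkeddata.es/ontology/Lago",
    "uri_template": "http://geo.linkeddata.es/resource/Lago/{id}",
    "columns": {"nombre": "http://example.org/vocab/name"},  # hypothetical property
}


def generate_rdf(conn, mapping):
    """Apply the mapping to every row of the table and return RDF triples."""
    columns = list(mapping["columns"])
    # Identifiers come from the (trusted) mapping, never from user input.
    query = "SELECT {key}, {cols} FROM {table} ORDER BY {key}".format(
        key=mapping["key"], cols=", ".join(columns), table=mapping["table"])
    triples = []
    for key, *values in conn.execute(query):
        subject = mapping["uri_template"].format(id=key)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for column, value in zip(columns, values):
            if value is not None:  # NULL cells produce no triple
                triples.append((subject, mapping["columns"][column], value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lago (id INTEGER PRIMARY KEY, nombre TEXT)")
conn.executemany("INSERT INTO lago VALUES (?, ?)", [(1, "Lago A"), (2, None)])
for triple in generate_rdf(conn, MAPPING):
    print(triple)
```

Keeping the mapping separate from the processor is what makes the approach declarative: supporting a new table means writing a new mapping, not new code.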
Generation of the RDF Data: R2O & ODEMapster
• Creation of the R2O mappings
[Figure: excerpt of the R2O document]
Generation of the RDF Data: geometry2rdf
• Tool for generating RDF from geometrical information
• The geometry can be provided in GML or WKT
• The generated RDF follows our Geometry Model
http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
Generation of the RDF Data: geometry2rdf
Oracle SDO_UTIL package
-- Export, as GML 3.1.1 text, the geometries of the features labelled 'Arroyo' (stream)
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(c.geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta = 'Arroyo'
Generation of the RDF Data: Geometry Model
Prefixes: geoes: http://geo.linkeddata.es/ and geo: http://www.w3.org/2003/01/geo/wgs84_pos#
• geoes:ontology/Polígono (polygon), geoes:ontology/Curva (curve) and geo:Point are rdfs:subClassOf geoes:ontology/Geometría (geometry)
• A geo:Point has geo:lat and geo:long
• A curve is formadoPor (formed by) a collection of 2 or more geo:Points
• A polygon is formadoPor a collection of 3 or more geo:Points
Generation of the RDF Data: RDF generated according to our Geometry Model
[Figure: sample RDF output]
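A minimal sketch of how a WKT geometry could be turned into triples following the Geometry Model, assuming 2D coordinates in WKT's x y (longitude latitude) order and single-ring polygons. The point URIs and the full URI of formadoPor (placed under geoes:ontology/) are assumptions; geometry2rdf's actual output may differ, and this sketch does not preserve point order, which a real model would need (e.g., with an RDF list).

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEOES = "http://geo.linkeddata.es/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FORMADO_POR = GEOES + "formadoPor"  # assumed full URI for formadoPor


def parse_wkt(wkt):
    """Parse 2D 'LINESTRING (x y, ...)' or single-ring 'POLYGON ((x y, ...))'."""
    kind, _, body = wkt.strip().partition("(")
    coords = body.replace("(", "").replace(")", "")
    points = [tuple(float(v) for v in pair.split()) for pair in coords.split(",")]
    return kind.strip().upper(), points


def geometry_to_triples(uri, wkt):
    """Describe a WKT geometry as a Curva or Polígono formed by geo:Points."""
    kind, points = parse_wkt(wkt)
    if kind == "LINESTRING" and len(points) >= 2:
        geometry_class = GEOES + "Curva"
    elif kind == "POLYGON" and len(points) >= 3:
        geometry_class = GEOES + "Polígono"
    else:
        raise ValueError(f"unsupported or too short geometry: {kind}")
    triples = [(uri, RDF_TYPE, geometry_class)]
    for i, (lon, lat) in enumerate(points):  # WKT order is x y, i.e. long lat
        point = f"{uri}/point/{i}"  # hypothetical point URIs
        triples += [
            (point, RDF_TYPE, GEO + "Point"),
            (point, GEO + "lat", lat),
            (point, GEO + "long", lon),
            (uri, FORMADO_POR, point),
        ]
    return triples


for triple in geometry_to_triples("http://example.org/rio/1",
                                  "LINESTRING (-3.70 40.41, -3.69 40.42)"):
    print(triple)
```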
Generation of the RDF Data: URI Generation
• URIs are extremely relevant in this process, since they are the key to aligning heterogeneous resources that come from different data sources
• Guidelines: Cool URIs [1], UK Cabinet Office [2]
• Examples:
  • http://geo.linkeddata.es/ontology/{class/property}, e.g., http://geo.linkeddata.es/ontology/Lago
  • http://geo.linkeddata.es/resource/dataset/type/{resourcename}, e.g., http://geo.linkeddata.es/resource/Provincia/Madrid
[1] http://www.w3.org/TR/cooluris/
[2] http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf
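The two URI patterns can be minted with a pair of helper functions. This Python sketch follows the resource/{type}/{name} shape of the Madrid example and assumes names are percent-encoded as UTF-8, which produces exactly the encoded Málaga URI used in the Data Cleansing slide.

```python
from urllib.parse import quote

BASE = "http://geo.linkeddata.es"


def ontology_uri(term):
    """Pattern http://geo.linkeddata.es/ontology/{class/property}."""
    return f"{BASE}/ontology/{quote(term, safe='')}"


def resource_uri(resource_type, name):
    """Pattern .../resource/{type}/{name}, as in .../resource/Provincia/Madrid."""
    # safe='' also encodes '/', so a name can never add extra path segments;
    # non-ASCII characters are percent-encoded from their UTF-8 bytes.
    return f"{BASE}/resource/{quote(resource_type, safe='')}/{quote(name, safe='')}"


print(resource_uri("Provincia", "Málaga"))
# http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
```

Minting URIs in one place keeps them stable and consistent across data sources, which is what makes later linking possible.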
Generation of the RDF Data: Provenance Information
• It is relevant:
  • to manage the provenance information of the resources
  • to establish the license of the information
• Example (Pubby: http://www4.wiwiss.fu-berlin.de/pubby/)
Publication of the RDF data
• Virtuoso 6.1.0: SPARQL endpoint
• Pubby 0.3, including provenance support: Linked Data and HTML interface (http://www4.wiwiss.fu-berlin.de/pubby/)
• map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/
Data Cleansing
• To find possible errors, as identified by Hogan et al.:
  • HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs that return 40x/50x errors
  • reasoning issues, such as a namespace without a vocabulary, e.g., an invented rss:item term
  • malformed/incompatible datatypes, e.g., “true” as xsd:int
• To fix the identified errors
• Example: encoding URIs
  • Special characters: á, é, ñ
  • http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
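Two of the checks listed above are easy to automate. This Python sketch flags malformed or out-of-range literals for a few XML Schema datatypes (using simplified lexical patterns) and detects URIs that still contain characters needing percent-encoding. HTTP-level and reasoning checks are outside its scope.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
# Simplified lexical patterns for a few XML Schema datatypes.
LEXICAL = {
    XSD + "int": re.compile(r"[+-]?[0-9]+"),
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


def check_literal(lexical, datatype):
    """Classify a typed literal as 'ok', 'malformed', 'out of range' or 'unchecked'."""
    pattern = LEXICAL.get(datatype)
    if pattern is None:
        return "unchecked"
    if not pattern.fullmatch(lexical):
        return "malformed"  # e.g. "true" typed as xsd:int
    if datatype == XSD + "int" and not -2**31 <= int(lexical) <= 2**31 - 1:
        return "out of range"  # xsd:int is a 32-bit signed integer
    return "ok"


def needs_encoding(uri):
    """True if the URI still contains raw spaces or non-ASCII characters (á, é, ñ...)."""
    return any(ord(ch) > 127 or ch == " " for ch in uri)


print(check_literal("true", XSD + "int"))  # malformed
```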
Linking the RDF Data
• Identify suitable data sets as linking targets: http://ckan.net
• Discover relationships between data items: Silk Framework (http://www4.wiwiss.fu-berlin.de/bizer/silk/), LIMES (http://aksw.org/Projects/limes)
• Validate the relationships discovered: sameAs Validator (http://oegdev.dia.fi.upm.es:8080/sameAs/)
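Silk and LIMES offer many similarity metrics and indexing strategies. To illustrate only the core idea of link discovery, the sketch below swaps in a plain string comparison (difflib's SequenceMatcher on accent-free, case-folded labels) and keeps the best target above a threshold. The threshold, labels and example.org URIs are hypothetical, and the resulting owl:sameAs candidates would still need validation.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Case-fold and strip accents so 'Málaga' and 'malaga' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def discover_links(source, target, threshold=0.9):
    """Return (source, owl:sameAs, target, score) for the best match above threshold."""
    links = []
    for s_uri, s_label in source.items():
        best = None
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (t_uri, score)
        if best is not None:
            links.append((s_uri, OWL_SAME_AS, best[0], best[1]))
    return links


source = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid",
          "http://example.org/geo/Malaga": "Málaga"}
target = {"http://dbpedia.org/resource/Madrid": "Madrid",
          "http://example.org/db/Malaga": "Malaga",
          "http://example.org/db/Mallorca": "Mallorca"}
for link in discover_links(source, target):
    print(link)
```

Comparing every source item with every target item is quadratic; real link-discovery frameworks avoid this with indexing and blocking.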
Linking the RDF Data: GeoLinkedData
[Figure: GeoLinkedData resources linked to GeoNames and DBpedia, e.g., for Madrid]
• http://geo.linkeddata.es/.../Madrid
• http://sws.geonames.org/6355233/
• http://dbpedia.org/resource/Madrid
Linking the RDF Data: sameAs Validator
http://oegdev.dia.fi.upm.es:8080/sameAs/
Enable Effective Discovery: Register the dataset in the CKAN registry
• Add the dataset to CKAN, the open registry of data and content packages
• Minimum information:
  • Name: unique ID for your data set on CKAN
  • Title: full name of your data set
  • URL: link to the data set home page
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
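Before registering, one can check that the minimum information is present. This sketch assumes CKAN's rule that dataset names are 2 to 100 characters of lowercase ASCII letters, digits, '-' and '_'; the example package values are hypothetical, and the registration itself (via the CKAN web form or API) is not shown.

```python
import re
from urllib.parse import urlparse

# Assumed CKAN naming rule: 2-100 chars, lowercase ASCII letters, digits, '-' or '_'.
NAME_RE = re.compile(r"[a-z0-9_-]{2,100}")


def check_ckan_minimum(package):
    """Return a list of problems with the minimum CKAN fields (empty if none)."""
    problems = []
    if not NAME_RE.fullmatch(package.get("name", "")):
        problems.append("name: 2-100 lowercase letters, digits, '-' or '_'")
    if not package.get("title", "").strip():
        problems.append("title: missing")
    url = urlparse(package.get("url", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("url: must be an absolute http(s) link")
    return problems


example = {  # hypothetical values
    "name": "geolinkeddata",
    "title": "GeoLinkedData",
    "url": "http://geo.linkeddata.es",
}
print(check_ckan_minimum(example))  # []
```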
Enable Effective Discovery: Sitemap protocol (http://sitemaps.org/)
• Used by web crawlers to efficiently find all your content and discover what has been updated
• A sitemap file contains information regarding one or more URLs on your Web site; this information helps search engines crawl your website better
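A minimal Python sketch that writes a Sitemap file with the standard library, using the sitemaps.org 0.9 namespace; the URL list is an assumption. ElementTree escapes characters such as "&" automatically, which the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit of the Sitemap protocol


def build_sitemap(urls, lastmod=None):
    """Return a Sitemap XML document listing the given URLs."""
    if len(urls) > MAX_URLS:
        raise ValueError("split the URLs over several sitemaps and use a sitemap index")
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:  # W3C datetime, e.g. "2011-05-01"
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap(["http://geo.linkeddata.es/resource/Provincia/Madrid"], "2011-05-01"))
```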
Enable Effective Discovery: Sindice, the best RDF search engine
Enable Effective Discovery: sitemap4rdf
• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates a sitemap
• Usage: run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap
  sitemap4rdf http://yoursite/sparql http://yoursite/resource/
• Example:
  sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/
http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
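The tool's internals are not described here, so the following is a hypothetical reconstruction of its first step: build a query that lists subject URIs under the given prefix, then read the endpoint's answer in the W3C SPARQL JSON results format. The query uses SPARQL 1.1 STRSTARTS, which an endpoint from 2011 may not support; a regex filter would be the SPARQL 1.0 alternative. The sample response is made up for illustration.

```python
import json


def list_uris_query(prefix):
    """Hypothetical query listing subject URIs under a prefix (prefix must not contain '"')."""
    return ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
            f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }}')


def uris_from_results(results_json, prefix):
    """Extract URI bindings from a SPARQL JSON result, keeping those under prefix."""
    data = json.loads(results_json)
    var = data["head"]["vars"][0]
    found = set()
    for binding in data["results"]["bindings"]:
        term = binding.get(var)
        if term and term["type"] == "uri" and term["value"].startswith(prefix):
            found.add(term["value"])  # the set also removes duplicates
    return sorted(found)


sample = json.dumps({"head": {"vars": ["s"]}, "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://geo.linkeddata.es/resource/Provincia/Madrid"}},
    {"s": {"type": "bnode", "value": "b0"}},
    {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Madrid"}},
]}})
print(list_uris_query("http://geo.linkeddata.es/"))
print(uris_from_results(sample, "http://geo.linkeddata.es/"))
```

The resulting URIs would then be written into a Sitemap file; checking the prefix on the client side as well guards against endpoints that ignore the filter.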
Enable Effective Discovery: Submit the sitemap location to Sindice
• http://sindice.com/main/submit
Enable Effective Discovery: Submit the sitemap location to Google
• https://www.google.com/webmasters/tools/
Demo: http://geo.linkeddata.es/browser
• Provinces
• Capital of Province
• Provinces: Industry Production Index
• Beaches
Demo: http://webenemasuno.linkeddata.es/
• Trips
• Guide Locations
• Guide
Future Work
• Streaming resources