Validating and Describing Linked Data Portals using RDF Shape Expressions
Presentation at the 1st Linked Data Quality Workshop, Leipzig, 2 Sept. 2014. Author: Jose Emilio Labra Gayo. Applies Shape Expressions to validate the WebIndex linked data portal.
Eric Prud'hommeaux, World Wide Web Consortium, MIT, Cambridge, MA, USA
Harold Solbrig, Mayo Clinic College of Medicine, Rochester, MN, USA
Jose Emilio Labra Gayo, WESO Research Group, University of Oviedo
Jose María Álvarez Rodríguez, Dept. Computer Science, Carlos III University, Spain
This talk in one slide
Shape Expressions
Simple language to describe and validate RDF
Can be applied to linked data portals
Use case: WebIndex project
Web Index
Measure WWW's contribution to development and human rights by country
Developed by the Web Foundation
81 countries, 116 indicators, 5 years (2007-12)
Linked data portal: http://data.webfoundation.org/webindex/2013
WebIndex workflow
[Workflow diagram. Stages shown: Data (Excel); Conversion (Excel to RDF); Enrichment; RDF Datastore; Visualizations; Linked data portal]
WebIndex data model
ITU_B 2011 2012 2013 ...
Germany 20.34 35.46 37.12 ...
Spain 19.12 23.78 25.45 ...
France 20.12 21.34 28.34 ...
... ... ... ... ...
ITU_B 2010 2011 2012 ...
Germany 20.34 35.46 37.12 ...
Spain 19.12 23.78 25.45 ...
France 20.12 21.34 28.34 ...
... ... ... ... ...
DataSets are published by Organizations
DataSets contain several slices
Slices group observations
Model based on RDF Data Cube
Main entity = Observation
Observations have values by years
Observations refer to indicators and countries
[Diagram labels: Observation; Years; Indicator (% Broadband subscribers); Countries; Slice; DataSet]
Indicators are provided by Organizations
Examples:
ITU = International Telecommunication Union
UN = United Nations
WB = World Bank
...
Main WebIndex data model*
Observation
  rdf:type = qb:Observation
  cex:value: xsd:float
  dc:issued: xsd:dateTime
  rdfs:label: xsd:string
  cex:ref-year: xsd:gYear
Indicator
  rdf:type = cex:Primary | cex:Secondary
  rdfs:label: xsd:string
  rdfs:comment: xsd:string
  skos:notation: xsd:string
Organization
  rdf:type = org:Organization
  rdfs:label: xsd:string
  foaf:homepage: IRI
Slice
  rdf:type = qb:Slice
  qb:sliceStructure: wf:sliceByArea
Country
  rdf:type = wf:Country
  wf:iso2: xsd:string
  wf:iso3: xsd:string
  rdfs:label: xsd:string
DataSet
  rdf:type = qb:DataSet
  qb:structure: wf:DSD
  rdfs:label: xsd:string
Arcs between entities (labels as shown in the diagram): cex:ref-area, cex:indicator, qb:observation, qb:slice, wf:provider, dc:publisher; the cardinality label 1..n appears twice
*Simplified
Excel to RDF (Turtle): interrelated linked data

indicator:ITU_B a wf:SecondaryIndicator ;
    rdfs:label "Broadband subscribers %" .

dataset:DITU a qb:DataSet ;
    rdfs:label "ITU Dataset" ;
    dc:publisher org:ITU ;
    qb:slice slice:ITU10B , slice:ITU11B , ... .
...

slice:ITU11B a qb:Slice ;
    qb:sliceStructure wf:sliceByYear ;
    qb:observation obs:obs8165 , obs:obs8166 , ... .
...

org:ITU a org:Organization ;
    rdfs:label "ITU" ;
    foaf:homepage <http://www.itu.int/> .

country:Spain a wf:Country ;
    wf:iso2 "ES" ;
    wf:iso3 "ESP" ;
    rdfs:label "Spain" .

obs:obs8165 a qb:Observation ;
    rdfs:label "ITU B in ESP, 2011" ;
    cex:indicator indicator:ITU_B ;
    qb:dataSet dataset:DITU ;
    cex:value "23.78"^^xsd:float ;
    cex:ref-year 2011 ;
    cex:ref-area country:Spain ;
    dc:issued "2013-05-30"^^xsd:date ;
    ... .
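The conversion step from a spreadsheet table to Turtle observations can be sketched roughly as follows. This is only an illustration under stated assumptions, not the WebIndex conversion code: the function name `rows_to_turtle`, the `ISO3` lookup and the observation numbering are hypothetical, and prefix declarations are left out.

```python
# Hypothetical sketch: one qb:Observation per (country, year) cell of a table.
# The ISO3 lookup and the obs: numbering are illustrative assumptions.
ISO3 = {"Germany": "DEU", "Spain": "ESP", "France": "FRA"}

def rows_to_turtle(indicator, years, rows, first_id=1):
    """Return a Turtle fragment (prefixes omitted) for the given table."""
    blocks = []
    obs_id = first_id
    for country, values in rows:
        for year, value in zip(years, values):
            label = f"{indicator.replace('_', ' ')} in {ISO3[country]}, {year}"
            blocks.append(
                f"obs:obs{obs_id} a qb:Observation ;\n"
                f'    rdfs:label "{label}" ;\n'
                f"    cex:indicator indicator:{indicator} ;\n"
                f'    cex:value "{value}"^^xsd:float ;\n'
                f"    cex:ref-year {year} ;\n"
                f"    cex:ref-area country:{country} ."
            )
            obs_id += 1
    return "\n\n".join(blocks)

table = [("Germany", [20.34, 35.46, 37.12]), ("Spain", [19.12, 23.78, 25.45])]
turtle = rows_to_turtle("ITU_B", [2010, 2011, 2012], table)
print(turtle)
```

Each cell becomes its own resource, which is why the number of observations grows with countries times years and why constraints on their shape are worth checking automatically.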
Description and Validation
Lots of constraints
Observations must be linked to some country
Observations have a float value
Observations are related with an indicator, a country and a year
Dataset contains several slices and slices contain several observations
... etc.
Q: How can we express those constraints easily?
Our proposal: Shape Expressions
Shape Expressions
A simple and intuitive language to:
Describe the topology of RDF data
Validate that an RDF graph matches a shape
Two syntaxes:
- Compact syntax (inspired by RelaxNG, Turtle and SPARQL)
- RDF
More details about ShEx: paper at Semantics-2014 (5th Sept, 10:30h)
http://www.w3.org/2013/ShEx/Primer
Country
A <Country> has at least the following properties:
rdf:type with value wf:Country
rdfs:label with value of type xsd:string
wf:iso2 with value of type xsd:string
wf:iso3 with value of type xsd:string
Using Shape Expressions:
<Country> {
  rdf:type (wf:Country),
  rdfs:label xsd:string,
  wf:iso2 xsd:string,
  wf:iso3 xsd:string
}
Callouts on the shape: Label (<Country>), Open shape (the braces), Conjunction (the commas)
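What matching a node against the <Country> shape involves can be sketched in a few lines of Python. This is a simplified illustration, not how any ShEx implementation works: the triple encoding, the `lit` helper and `matches_open` are assumptions made for the example, and each arc is taken to require exactly one value.

```python
# Graph as (subject, predicate, object) triples. IRIs are plain strings and
# literals are ("lit", lexical form, datatype) tuples (an assumed encoding).
def lit(text, datatype="xsd:string"):
    return ("lit", text, datatype)

def is_string(obj):
    return isinstance(obj, tuple) and obj[0] == "lit" and obj[2] == "xsd:string"

# One entry per arc of <Country>: predicate -> test on the object.
COUNTRY = {
    "rdf:type": lambda obj: obj == "wf:Country",
    "rdfs:label": is_string,
    "wf:iso2": is_string,
    "wf:iso3": is_string,
}

def matches_open(graph, node, shape):
    """True if every arc of the shape has exactly one value and it passes.

    Predicates not named in the shape are ignored, which makes the shape open.
    """
    for predicate, test in shape.items():
        found = [o for s, p, o in graph if s == node and p == predicate]
        if len(found) != 1 or not test(found[0]):
            return False
    return True

graph = [
    ("country:Spain", "rdf:type", "wf:Country"),
    ("country:Spain", "rdfs:label", lit("Spain")),
    ("country:Spain", "wf:iso2", lit("ES")),
    ("country:Spain", "wf:iso3", lit("ESP")),
    ("country:Spain", "rdfs:comment", lit("extra triple, allowed")),
    ("country:Nowhere", "rdf:type", "wf:Country"),
]

print(matches_open(graph, "country:Spain", COUNTRY))    # True
print(matches_open(graph, "country:Nowhere", COUNTRY))  # False
```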
DataSets
<DataSet> {
  rdf:type (qb:DataSet),
  qb:structure (wf:DSD),
  dc:publisher @<Organization>,
  rdfs:label xsd:string ?,
  qb:slice @<Slice>+
}
What does it mean? A <DataSet> has the shape:
rdf:type with value qb:DataSet
qb:structure with value wf:DSD
dc:publisher with shape <Organization>
Optional rdfs:label with value of type xsd:string
One or more qb:slice with shape <Slice>
Cardinality possibilities: * (0 or more), ? (0 or 1), + (1 or more), {m,n} (between m and n)
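The cardinality markers listed for this slide map naturally onto minimum and maximum counts. The sketch below is an illustration only: the function names are hypothetical, and treating an absent marker as "exactly one" is an assumption.

```python
import re

def cardinality_bounds(marker):
    """Map a cardinality marker to (min, max); max is None when unbounded."""
    if marker == "":
        return (1, 1)        # assumed default: exactly one
    if marker == "*":
        return (0, None)
    if marker == "?":
        return (0, 1)
    if marker == "+":
        return (1, None)
    m = re.fullmatch(r"\{(\d+),(\d+)\}", marker)
    if m:
        return (int(m.group(1)), int(m.group(2)))
    raise ValueError(f"unknown cardinality marker: {marker!r}")

def allows(marker, count):
    """True if `count` values satisfy the cardinality marker."""
    low, high = cardinality_bounds(marker)
    return count >= low and (high is None or count <= high)

# A <DataSet> needs one or more qb:slice arcs and at most one rdfs:label.
print(allows("+", 0))      # False: no slices
print(allows("+", 2))      # True
print(allows("?", 0))      # True: the label is optional
print(allows("{2,3}", 4))  # False
```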
Slices
<Slice> has the properties:
rdf:type with value qb:Slice
qb:sliceStructure with value wf:sliceByYear
One or more qb:observation with shape <Observation>
cex:indicator with shape <Indicator>
<Slice> {
  rdf:type (qb:Slice),
  qb:sliceStructure (wf:sliceByYear),
  qb:observation @<Observation>+,
  cex:indicator @<Indicator>
}
Observations
<Observation> {
  rdf:type (qb:Observation),
  cex:value xsd:float ?,
  dc:issued xsd:dateTime,
  rdfs:label xsd:string ?,
  qb:dataSet @<DataSet>,
  cex:ref-area @<Country>,
  cex:indicator @<Indicator>,
  cex:ref-year xsd:gYear
}
...and more
Indicators
<Indicator> {
  rdf:type (wf:PrimaryIndicator wf:SecondaryIndicator),
  rdfs:label xsd:string,
  rdfs:comment xsd:string ?,
  skos:notation xsd:string ?
}
Organizations
<Organization> {
  rdf:type (org:Organization),
  rdfs:label xsd:string,
  foaf:homepage IRI,
  org:hasSubOrganization @<Organization>
}
Current use of Shape Expressions in WebIndex
1. Documentation of the linked data portal
   Human-readable
   Machine-processable
2. Team communication
   Tell the developers which shapes they must generate
3. Validation
   For example: check if a value of type qb:Observation has shape <Observation>
http://weso.github.io/wiDoc
OK, but... why not use SPARQL?
Organization shape
Shape Expressions:
<Organization> {
  rdf:type (org:Organization)
  , rdfs:label xsd:string
  , foaf:homepage IRI
  , org:hasSubOrganization @<Organization>
}
SPARQL:
CONSTRUCT {
  ?Organization shex:hasShape <Organization> .
} WHERE {
  { SELECT ?Organization {
      ?Organization a ?o .
  } GROUP BY ?Organization HAVING (COUNT(*)=1)}
  { SELECT ?Organization {
      ?Organization a ?o . FILTER ((?o = org:Organization))
  } GROUP BY ?Organization HAVING (COUNT(*)=1)}
  { SELECT ?Organization {
      ?Organization rdfs:label ?o .
  } GROUP BY ?Organization HAVING (COUNT(*)=1)}
  { SELECT ?Organization {
      ?Organization rdfs:label ?o .
      FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
  } GROUP BY ?Organization HAVING (COUNT(*)=1)}
  { SELECT ?Organization {
      ?Organization foaf:homepage ?o .
  } GROUP BY ?Organization HAVING (COUNT(*)=1)}
  { SELECT ?Organization {
      ?Organization foaf:homepage ?o . FILTER (isIRI(?o))
  } GROUP BY ?Organization HAVING (COUNT(*)=1)}
  { SELECT ?Organization (COUNT(*) AS ?Organization_c0) {
      ?Organization org:hasSubOrganization ?o .
  } GROUP BY ?Organization}
  { SELECT ?Organization (COUNT(*) AS ?Organization_c1) {
      ?Organization org:hasSubOrganization ?o .
  } GROUP BY ?Organization}
  FILTER (?Organization_c0 = ?Organization_c1)
}
Full example of WebIndex: 61 lines of ShEx vs 580 lines of SPARQL
Implementations of Shape Expressions
Name | Main developer | Language | Features
FancyDemo | Eric Prud'hommeaux | JavaScript | First implementation; Semantic Actions and conversion to SPARQL; http://www.w3.org/2013/ShEx/
JsShExTest | Jesse van Dam | JavaScript | Supports RDF and Compact syntax; https://github.com/jessevdam/shextest
ShExcala | Jose E. Labra | Scala | Several extensions: negations, reverse arcs, relations, ...; efficient implementation using derivatives; http://labra.github.io/ShExcala/
Haws | Jose E. Labra | Haskell | Prototype to check inference semantics; http://labra.github.io/haws/
RDFShape: online validation tool
Deployed at: http://rdfshape.weso.es
Based on ShExcala
Can validate by: URI, File, Textarea, SPARQL endpoint, Dereference
Shapes ≠ Types
Types
  Oriented to concepts
  Inference: RDF Schema, OWL
  Types are global
Shapes
  Oriented to RDF graphs
  Interface descriptions
  Shapes can be local to a data portal
Type definition in RDF Data Cube:
qb:Observation a rdfs:Class, owl:Class;
  rdfs:label "Observation"@en;
  rdfs:comment "... "@en;
  rdfs:subClassOf qb:Attachable;
  owl:equivalentClass scovo:Item;
  rdfs:isDefinedBy ...
Shape in WebIndex:
<Observation> {
  rdf:type (qb:Observation),
  cex:ref-year xsd:gYear
  ... other properties from WebIndex
}
Shape in LandPortal:
<Observation> {
  rdf:type (qb:Observation),
  lb:time @<Time>
  ... other properties from LandPortal
}
However, we can also define generic shape templates
Extensions & challenges
The Shape Expressions language has just been born
It will be affected by the W3C charter group on RDF Data Shapes
Mailing list: [email protected]
"The discussion on [email protected] is the best entertainment since years; Game of Thrones colors pale." @PaulZH
Alternatives, negations, etc.
More expressiveness from regular expressions:
Alternatives
Negations
Groupings
<Country> {
  a (wf:Country) ,
  rdfs:label xsd:string ,
  ( wf:iso2 xsd:string | wf:iso3 xsd:string ) ,
  ! dc:creator .
}
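The alternative and negation in this <Country> variant can be illustrated with a small check over a list of triples. This is a sketch under assumptions, not the semantics of any ShEx implementation: the triple encoding is invented for the example, and the alternative is read here as "at least one branch matches".

```python
# Triples are (subject, predicate, object); literals are
# ("lit", lexical form, datatype) tuples (an assumed encoding).
def lit(text, datatype="xsd:string"):
    return ("lit", text, datatype)

def values(graph, node, predicate):
    return [o for s, p, o in graph if s == node and p == predicate]

def one_string(graph, node, predicate):
    """Exactly one value for the predicate, and it is an xsd:string literal."""
    vals = values(graph, node, predicate)
    return (len(vals) == 1 and isinstance(vals[0], tuple)
            and vals[0][0] == "lit" and vals[0][2] == "xsd:string")

def matches_country(graph, node):
    has_type = values(graph, node, "rdf:type") == ["wf:Country"]
    has_label = one_string(graph, node, "rdfs:label")
    # Alternative: wf:iso2 or wf:iso3 (assumption: at least one branch).
    has_code = (one_string(graph, node, "wf:iso2")
                or one_string(graph, node, "wf:iso3"))
    # Negation: the node must have no dc:creator arc at all.
    no_creator = not values(graph, node, "dc:creator")
    return has_type and has_label and has_code and no_creator

graph = [
    ("country:Spain", "rdf:type", "wf:Country"),
    ("country:Spain", "rdfs:label", lit("Spain")),
    ("country:Spain", "wf:iso2", lit("ES")),
    ("country:France", "rdf:type", "wf:Country"),
    ("country:France", "rdfs:label", lit("France")),
    ("country:France", "wf:iso3", lit("FRA")),
    ("country:France", "dc:creator", "org:ITU"),
]

print(matches_country(graph, "country:Spain"))   # True: iso2 branch, no creator
print(matches_country(graph, "country:France"))  # False: dc:creator is present
```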
Open vs Closed shapes
Open shapes allow remaining triples after validation
Open shape:
<User> { a (foaf:Person) }
Closed shape:
<User> [ a (foaf:Person) ]
Example data:
:john a foaf:Person ; foaf:name "John" .
:anna a foaf:Person .
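The difference between the two <User> shapes can be sketched as a check on leftover triples. The encoding below (string triples and the `matches_user` helper) is an assumption made for illustration.

```python
# Triples are (subject, predicate, object) with plain strings (assumed encoding).
graph = [
    (":john", "rdf:type", "foaf:Person"),
    (":john", "foaf:name", "John"),
    (":anna", "rdf:type", "foaf:Person"),
]

def matches_user(graph, node, closed):
    """Check the <User> shape; a closed shape rejects any remaining triples."""
    arcs = [(p, o) for s, p, o in graph if s == node]
    typed = [arc for arc in arcs if arc == ("rdf:type", "foaf:Person")]
    if len(typed) != 1:
        return False
    leftover = [arc for arc in arcs if arc not in typed]
    return not leftover if closed else True

print(matches_user(graph, ":john", closed=False))  # True: foaf:name is ignored
print(matches_user(graph, ":john", closed=True))   # False: foaf:name remains
print(matches_user(graph, ":anna", closed=False))  # True
print(matches_user(graph, ":anna", closed=True))   # True
```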
Shape inclusions
A shape includes/extends another shape
Example:
<Provider> extends <Organization> { wf:sourceURI IRI }
A <Provider> has the same shape as an <Organization> plus the property wf:sourceURI
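One way to picture shape inclusion is as merging the constraints of the base shape with the new ones. The sketch below is a simplification under assumptions: shapes are dictionaries from predicate to a test, each arc requires exactly one value, the recursive org:hasSubOrganization arc is left out, and the example.org IRIs are made up.

```python
# IRIs are plain strings; literals are ("lit", lexical form, datatype) tuples.
def lit(text, datatype="xsd:string"):
    return ("lit", text, datatype)

def is_string(obj):
    return isinstance(obj, tuple) and obj[0] == "lit" and obj[2] == "xsd:string"

def is_iri(obj):
    return isinstance(obj, str)

# Simplified <Organization>: the org:hasSubOrganization arc is omitted here.
ORGANIZATION = {
    "rdf:type": lambda obj: obj == "org:Organization",
    "rdfs:label": is_string,
    "foaf:homepage": is_iri,
}

# <Provider> extends <Organization>: same arcs plus wf:sourceURI.
PROVIDER = {**ORGANIZATION, "wf:sourceURI": is_iri}

def matches(graph, node, shape):
    for predicate, test in shape.items():
        found = [o for s, p, o in graph if s == node and p == predicate]
        if len(found) != 1 or not test(found[0]):
            return False
    return True

graph = [
    ("org:ITU", "rdf:type", "org:Organization"),
    ("org:ITU", "rdfs:label", lit("ITU")),
    ("org:ITU", "foaf:homepage", "http://www.itu.int/"),
    ("org:ITU", "wf:sourceURI", "http://example.org/itu-source"),  # made up
    ("org:Other", "rdf:type", "org:Organization"),
    ("org:Other", "rdfs:label", lit("Other")),
    ("org:Other", "foaf:homepage", "http://example.org/other"),    # made up
]

print(matches(graph, "org:ITU", PROVIDER))        # True
print(matches(graph, "org:Other", ORGANIZATION))  # True
print(matches(graph, "org:Other", PROVIDER))      # False: no wf:sourceURI
```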
Reverse and relation arcs
Shape Expressions describe subjects
Can be extended to describe:
objects (reverse arcs)
predicates (relation arcs)
<Country> {
  rdf:type (wf:Country),
  rdfs:label xsd:string,
  wf:iso2 xsd:string ,
  wf:iso3 xsd:string ,
  ^ cex:ref-area @<Observation> *
}
Reverse arc: a country has several incoming arcs with property cex:ref-area from subjects with shape <Observation>
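A reverse arc looks at triples where the node is the object rather than the subject. The sketch below illustrates that idea only: the helper names are hypothetical, the <Observation> check is reduced to a type test as a stand-in for the full shape, and requiring every incoming cex:ref-area subject to conform is an assumption.

```python
# Triples are (subject, predicate, object) with plain strings (assumed encoding).
graph = [
    ("obs:obs8165", "rdf:type", "qb:Observation"),
    ("obs:obs8165", "cex:ref-area", "country:Spain"),
    ("obs:obs8166", "rdf:type", "qb:Observation"),
    ("obs:obs8166", "cex:ref-area", "country:Spain"),
    ("ex:stray", "cex:ref-area", "country:France"),  # subject is not an observation
]

def incoming(graph, node, predicate):
    """Subjects that point at `node` through `predicate`."""
    return [s for s, p, o in graph if p == predicate and o == node]

def looks_like_observation(graph, node):
    # Simplified stand-in for the full <Observation> shape.
    return (node, "rdf:type", "qb:Observation") in graph

def reverse_arc_ok(graph, country):
    """`^ cex:ref-area @<Observation> *`: zero or more conforming subjects."""
    subjects = incoming(graph, country, "cex:ref-area")
    return all(looks_like_observation(graph, s) for s in subjects)

print(incoming(graph, "country:Spain", "cex:ref-area"))  # ['obs:obs8165', 'obs:obs8166']
print(reverse_arc_ok(graph, "country:Spain"))    # True
print(reverse_arc_ok(graph, "country:France"))   # False
print(reverse_arc_ok(graph, "country:Germany"))  # True: zero incoming arcs is fine
```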
Semantic actions
Define actions to be executed during validation
%lang{ ...actions... %}
Calls the lang processor, passing it the given actions
<Country> {
  ...
  wf:iso2 xsd:string %js{return /^[A-Z]{2}$/i.test(_.o.lex); %} ,
  wf:iso3 xsd:string %js{return /^[A-Z]{3}$/i.test(_.o.lex); %}
}
Variables and filters
Add variable bindings to matched values
Add FILTER rules similar to SPARQL
<Country> {
  wf:iso2 (xsd:string AS ?iso2)
  wf:iso3 (xsd:string AS ?iso3)
  FILTER (regex(?iso2,"^[A-Z]{2}$","i") && regex(?iso3,"^[A-Z]{3}$","i"))
}
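The regular-expression test applied to the bound values can be reproduced in Python for illustration. The function name `iso_codes_ok` is hypothetical, and the sketch only mirrors the case-insensitive pattern from the FILTER, not the filter mechanism itself.

```python
import re

# Case-insensitive, ASCII-only letters; fullmatch plays the role of ^...$.
ISO2 = re.compile(r"[A-Z]{2}", re.IGNORECASE | re.ASCII)
ISO3 = re.compile(r"[A-Z]{3}", re.IGNORECASE | re.ASCII)

def iso_codes_ok(iso2, iso3):
    """True if iso2 is two letters and iso3 is three letters."""
    return bool(ISO2.fullmatch(iso2)) and bool(ISO3.fullmatch(iso3))

print(iso_codes_ok("ES", "ESP"))   # True
print(iso_codes_ok("es", "esp"))   # True: the "i" flag ignores case
print(iso_codes_ok("ESP", "ES"))   # False: lengths are swapped
print(iso_codes_ok("E1", "ESP"))   # False: digit in iso2
```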
Conclusions
Shape Expressions: DSL to describe and validate RDF
Role similar to DTDs or Schema languages for XML
Quality of linked data portals
Shape Expressions as interface descriptions
Publishers: understand what to publish
Consumers: check data before processing
Future Work
Shape Expressions language
  Named graphs
  Implementations: debugging and error messages
  Performance
Applications to
  Other linked data use cases
  User interface generation
  Binding: generate parsers/tools from shapes
  ...
End of presentation
More info: Jose Emilio Labra Gayo
http://www.weso.es