validating and describing linked data portals using rdf shape expressions

29
Validating and Describing Linked Data Portals using RDF Shape Expressions Eric Prud'hommeaux World Wide Web Consortium MIT, Cambridge, MA, USA [email protected] Harold Solbrig Mayo Clinic USA College of Medicine, Rochester, MN, USA Jose Emilio Labra Gayo WESO Research group University of Oviedo Spain [email protected] Jose María Álvarez Rodríguez Dept. Computer Science Carlos III University Spain josemaria.alvarez@uc3m .es

Upload: jose-emilio-labra-gayo

Post on 17-Dec-2014

304 views

Category:

Software


0 download

DESCRIPTION

Presentation at 1st Linked Data Quality Workshop, Leipzig, 2nd Sept. 2014 Author: Jose Emilio Labra Gayo Applies Shapes Expressions to validate the WebIndex linked data portal

TRANSCRIPT

Page 1: Validating and Describing Linked Data Portals using RDF Shape Expressions

Validating and Describing Linked Data Portals using

RDF Shape Expressions

Eric Prud'hommeauxWorld Wide Web

ConsortiumMIT, Cambridge, MA, USA

[email protected]

Harold SolbrigMayo Clinic

USACollege of Medicine, Rochester,

MN, USA

Jose Emilio Labra GayoWESO Research groupUniversity of Oviedo

[email protected]

Jose María Álvarez RodríguezDept. Computer Science

Carlos III UniversitySpain

[email protected]

Page 2: Validating and Describing Linked Data Portals using RDF Shape Expressions

This talk in one slide

Shape ExpressionsSimple language to describe and validate RDFCan be applied to linked data portals

Use case: WebIndex project

Page 3: Validating and Describing Linked Data Portals using RDF Shape Expressions

Web Index

Measure WWW's contribution to development and human rights by countryDeveloped by the Web Foundation81 countries, 116 indicators, 5 years (2007-12)

Linked data portalhttp://data.webfoundation.org/webindex/2013

Page 4: Validating and Describing Linked Data Portals using RDF Shape Expressions

Webindex workflow

Data(Excel)

RDF Datastore

VisualizationsLinked data portal

ConversionExcel RDF

Enrichment

Page 5: Validating and Describing Linked Data Portals using RDF Shape Expressions

WebIndex data model

ITU_B 2011 2012 2013 ...

Germany 20.34 35.46 37.12 ...

Spain 19.12 23.78 25.45 ...

France 20.12 21.34 28.34 ...

... ... ... ... ...

ITU_B 2011 2012 2013 ...

Germany 20.34 35.46 37.12 ...

Spain 19.12 23.78 25.45 ...

France 20.12 21.34 28.34 ...

... ... ... ... ...

ITU_B 2010 2011 2012 ...

Germany 20.34 35.46 37.12 ...

Spain 19.12 23.78 25.45 ...

France 20.12 21.34 28.34 ...

... ... ... ... ...

DataSets are published by OrganizationsDatasets contain several slices Slices group observations

Model based on RDF Data CubeMain entity = Observation

Observations have values by yearsObservations refer to indicators and countries

Observation

YearsIndicator% Broadband subscribers

Countries

SliceDataSet

Indicators are provided by OrganizationsExamples ITU = International Telecommunication UnionUN = United NationsWB = World bank...

Page 6: Validating and Describing Linked Data Portals using RDF Shape Expressions

Main webIndex data model*

Observation rdf:type = qb:Observationcex:value: xsd:floatdc:issued: xsd:dateTimerdfs:label: xsd:Stringcex:ref-year: xsd:gYear

Indicator rdf:type = cex:Primary | cex:Secondaryrdfs:label: xsd:stringrdfs:comment: xsd:stringskos:notation: xsd:String

Organization rdf:type = org:Organizationrdfs:label: xsd:Stringfoaf:homepage: IRI

Slice rdf:type = qb:Sliceqb:sliceStructure: wf:sliceByArea

Country rdf:type = wf:Countrywf:iso2 : xsd:stringwf:iso3 : xsd:stringrdfs:label : xsd:String

DataSet rdf:type = qb:DataSetqb:structure : wf:DSDrdfs:label : xsd:String

cex:ref-area cex:indicator

qb:observation

1..nqb:observation

qb:slice

wf:provider

dc:publisher

1..n

*Simplified

Page 7: Validating and Describing Linked Data Portals using RDF Shape Expressions

indicator:ITU_B a wf:SecondaryIndicator ; rdfs:label "Broadband subscribers %" .dataset:DITU a qb:DataSet ; rdfs:label "ITU Dataset" ; dc:publisher org:ITU ; qb:slice slice:ITU10B , slice:ITU11B, . ... ...slice:ITU11B a qb:Slice ; qb:sliceStructure wf:sliceByYear ; qb:observation obs:obs8165, obs:obs8166, ......org:ITU a org:Organization ; rdfs:label "ITU" ; foaf:homepage <http://www.itu.int/> .country:Spain a wf:Country ; wf:iso2 "ES" ; wf:iso3 "ESP" ; rdfs:label "Spain" .

obs:obs8165 a qb:Observation ; rdfs:label "ITU B in ESP, 2011" ; cex:indicator indicator:ITU_B ; qb:dataSet dataset:DITU ; cex:value "23.78"^^xsd:float ; cex:ref-year 2011 ; cex:ref-area country:Spain ; dc:issued "2013-05-30"^^xsd:date ;... .

Excel RDF (Turtle)interrelated

linked data

Page 8: Validating and Describing Linked Data Portals using RDF Shape Expressions

Description and Validation

Lots of constraintsObservations must be linked to some countryObservations have a float valueObservations are related with an indicator, a country

and a yearDataset contains several slices and slices contain

several observations....etc.

Q: How can we express those constraints easily?Our proposal: Shape expressions

Page 9: Validating and Describing Linked Data Portals using RDF Shape Expressions

Shape Expressions

A simple and intuitive language to:Describe the topology of RDF dataValidate that an RDF graph matches a shape

Two syntaxes- Compact syntax (inspired by RelaxNG, Turtle and SPARQL)

- RDF

More details about ShEx: Paper in Semantics-2014 (5th Sept, 10:30h)

http://www.w3.org/2013/ShEx/Primer

Page 10: Validating and Describing Linked Data Portals using RDF Shape Expressions

Country

<Country> { rdf:type (wf:Country), rdfs:label xsd:string, wf:iso2 xsd:string, wf:iso3 xsd:string }

Label Open shape

Conjunction

A <Country> has at least the following properties: rdf:type with value wf:Country rdfs:label with value of type xsd:stringwf:iso2 with value of type xsd:stringwf:iso3 with value of type xsd:string

Using shape Expressions:

Page 11: Validating and Describing Linked Data Portals using RDF Shape Expressions

DataSets

A <DataSet> has the shape:rdf:type with value qb:Datasetqb:structure with value wf:DSDOptional rdfs:label with value of type xsd:string

One or more qb:slice with shape <Slice><DataSet> { rdf:type (qb:DataSet), qb:structure (wf:DSD), dc:publisher @<Organization>, rdfs:label xsd:string ?, qb:slice @<Slice>+}

Cardinality posibilities: * (0 or more) ? (0 or 1) + (1 or more) {m,n} between m and n

Page 12: Validating and Describing Linked Data Portals using RDF Shape Expressions

Whar does it mean?

Slices

<Slice> has the properties:rdf:type with value qb:Sliceqb:SliceStructure with value wf:sliceByYearSeveral qb:observation with shape <Observation>

cex:indicator with shape <Indicator>

<Slice> { rdf:type (qb:Slice) , qb:sliceStructure (wf:sliceByYear) , qb:observation @<Observation>+ , cex:indicator @<Indicator> }

Page 13: Validating and Describing Linked Data Portals using RDF Shape Expressions

Observations

<Observation> { rdf:type (qb:Observation), cex:value xsd:float ?, dc:issued xsd:dateTime, rdfs:label xsd:string ?, qb:dataSet @<DataSet>, cex:ref-area @<Country>, cex:indicator @<Indicator>, cex:ref-year xsd:gYear}

Page 14: Validating and Describing Linked Data Portals using RDF Shape Expressions

...and moreIndicators<Indicator> { rdf:type (wf:PrimaryIndicator wf:SecondaryIndicator), rdfs:label xsd:string, rdfs:comment xsd:string ?, skos:notation xsd:string ?}

Organizations<Organization> { rdf:type (org:Organization), rdfs:label xsd:string, foaf:homepage IRI, org:hasSubOrganization @<Organization>}

Page 15: Validating and Describing Linked Data Portals using RDF Shape Expressions

Current Use of shape expressions in WebIndex

1. Documentation of linked data portalHuman-readableMachine processable

2. Team communicationTell the developers which shapes they must generate

3. Validation For example: check if a value of type qb:Observation has shape <Observation>

http://weso.github.io/wiDoc

Page 16: Validating and Describing Linked Data Portals using RDF Shape Expressions

1 CONSTRUCT { 2 ?Organization shex:hasShape <Organization> . 3 } WHERE { SELECT ?Organization { 4 ?Organization a ?o . 5 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 6 { SELECT ?Organization { 7 ?Organization a ?o . FILTER ((?o = org:Organization)) 8 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 9 { SELECT ?Organization {10 ?Organization rdfs:label ?o .11 } GROUP BY ?Organization HAVING (COUNT(*)=1)}12 { SELECT ?Organization {13 ?Organization rdfs:label ?o . 14 FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))15 } GROUP BY ?Organization HAVING (COUNT(*)=1)}16 { SELECT ?Organization {17 ?Organization foaf:homepage ?o .18 } GROUP BY ?Organization HAVING (COUNT(*)=1)}19 { SELECT ?Organization {20 ?Organization foaf:homepage ?o . FILTER (isIRI(?o))21 } GROUP BY ?Organization HAVING (COUNT(*)=1)}22 { SELECT ?Organization (COUNT(*) AS ?Organization_c0) {23 ?Organization org:hasSubOrganization ?o .24 } GROUP BY ?Organization}25 { SELECT ?Organization (COUNT(*) AS ?Organization_c1) {26 ?Organization org:hasSubOrganization ?o .26 } GROUP BY ?Organization}28 FILTER (?Organization_c0 = ?Organization_c1)29 }

OK, but...why not use SPARQL?

Shape Expressions1 <Organization> {2 rdf:type (org:Organization)3 , rdfs:label xsd:string4 , foaf:homepage IRI5 , org:hasSubOrganization @<Organization>6 }

SPARQL

Full example of WebIndex: 61 lines ShEx vs 580 lines SPARQL

Opgnization shape

Page 17: Validating and Describing Linked Data Portals using RDF Shape Expressions

Implementations of Shape Expressions

Name Main Developer

Language Features

FancyDemo Eric Prud'hommeaux

Javascript First implementationSemantic Actions and Conversion to SPARQLhttp://www.w3.org/2013/ShEx/

JsShExTest Jesse van Dam Javascript Supports RDF and Compact syntaxhttps://github.com/jessevdam/shextest

ShExcala Jose E. Labra Scala Several extensions: negations, reverse arcs, relations,...Efficient implementation using Derivativeshttp://labra.github.io/ShExcala/

Haws Jose E. Labra Haskell Prototype to check Inference semanticshttp://labra.github.io/haws/

Page 18: Validating and Describing Linked Data Portals using RDF Shape Expressions

RDFShape: Online validation toolRDFShape = online validation tool

Deployed at: http://rdfshape.weso.esBased on ShExcala

Can validate by:URIFileTextareaSPARQL endpoint Dereference

Page 19: Validating and Describing Linked Data Portals using RDF Shape Expressions

Shapes ≠ TypesTypes oriented to concepts

Inference: RDF Schena, OWLShapes oriented to RDF graphs

Interface descriptions

<Observation> { rdf:type (qb:Observation),cex:ref-year xsd:gYear...other properties from WebIndex }

WebIndex

<Observation> { rdf:type (qb:Observation), lb:time @<Time>...other properties from LandPortal}

LandPortal

qb:Observation a rdfs:Class, owl:Class; rdfs:label "Observation"@en; rdfs:comment "... "@en; rdfs:subClassOf qb:Attachable; owl:equivalentClass scovo:Item; rdfs:isDefinedBy ...

RDF Data Cube

Types are globalCan be local to a data portal

However, we can also define generic Shapes templates

Page 20: Validating and Describing Linked Data Portals using RDF Shape Expressions

Extensions & challenges

Shape Expressions language has just been bornIt will be affected by W3c Charter group about

RDF Data ShapesMailing list: [email protected]"The discussion on [email protected] is the

best entertainment since years; Game of Thrones colors pale." @PaulZH

Page 21: Validating and Describing Linked Data Portals using RDF Shape Expressions

Alternatives, negations, etc.

More expressiveness from regular expressionsAlternativesNegationsGroupings

<Country> { a (wf:Country) , rdfs:label xsd:string , ( wf:iso2 xsd:string | wf:iso3 xsd:string ) , ! dc:creator . }

Page 22: Validating and Describing Linked Data Portals using RDF Shape Expressions

Open vs Closed shapes

Open shapes allow remaining triples after validation

<User> { a (foaf:Person) }

:john a foaf:Person, foaf:name "John" .

<User> [ a (foaf:Person) ]

:anna a foaf:Person .

Open shape Closed shape

Page 23: Validating and Describing Linked Data Portals using RDF Shape Expressions

Shape inclusions

A shape includes/extends another shapeExample:

<Provider> extends <Organization> { wf:sourceURI IRI }

A <Provider> has the same shape as an <Organization> plus the property wf:sourceURI

Page 24: Validating and Describing Linked Data Portals using RDF Shape Expressions

Reverse and relation arcs

Shape Expressions describe subjects Can be extended to describe

objects (reverse arcs)predicates (relation arcs)

<Country> { rdf:type (wf:Country), rdfs:label xsd:string, wf:iso2 xsd:string , wf:iso3 xsd:string , ^ cex:ref-area @<Observation> *}

Reverse arc: a country has several incoming arcs with property cex:ref-areafrom subjects with shape <Observation>

Page 25: Validating and Describing Linked Data Portals using RDF Shape Expressions

Semantic actions

Defines actions to be executed during validation%lang{ ...actions... %}

Calls lang processor passing it the given actions

<Country> { ... wf:iso2 xsd:string %js{return /^[A-Z]{2}$/i.test(_.o.lex); %} , wf:iso3 xsd:string %js{return /^[A-Z]{3}$/i.test(_.o.lex); %}}

Page 26: Validating and Describing Linked Data Portals using RDF Shape Expressions

Variables and filters

Add variable bindings to matched valuesAdd FILTER rules similar to SPARQL

<Country> { wf:iso2 (xsd:string AS ?iso2) wf:iso3 (xsd:string AS ?iso3) FILTER (regex(?iso2,"^[A-Z]{2}$","i") && regex(?iso3,"^[A-Z]{3}$","i"))}

Page 27: Validating and Describing Linked Data Portals using RDF Shape Expressions

Conclusions

Shape Expressions: DSL to describe and validate RDFRole similar to DTDs or Schema languages for XML

Quality of linked data portalsShape Expressions as interface descriptions

Publishers: Understand what to publishConsumers: Check data before processing

Page 28: Validating and Describing Linked Data Portals using RDF Shape Expressions

Future Work

Shape Expressions languageNamed graphsImplementations: Debugging and error messagesPerformance

Applications toOther linked data use casesUser interface generation Binding: generate parsers/tools from shapes...

Page 29: Validating and Describing Linked Data Portals using RDF Shape Expressions

End of presentation

More info: Jose Emilio Labra Gayohttp://www.weso.es