Using SPARQL and SPIN for
Data Quality Management
on the Semantic Web
Christian Fürber / Martin Hepp
Presentation @ BIS
May 4th 2010
Vision of the Semantic Web
Publishing data on the
web in a meaningful way for
more automation,
better integration,
and higher reusability of data.
C. Fürber, M. Hepp: Using SPARQL and SPIN for Data
Quality Management on the Semantic Web 2
© Hanspeter Graf / www.pixelio.de
Growth of Data:
Well on Track…
Reference: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html
Retrieving information
Building smart SemWeb apps
…but what if the published data was of poor quality?
Get a giant camcorder from Amazon!
Using Poor Data is Costly
Without quality checks, your SemWeb apps will take this data seriously and…
…get an oversized shipping package with expensive postage,
…and waste transportation capacity.
Is there any way to avoid data quality disasters?
Yes, if we know about data quality problems before anything bad happens!
A giant camcorder on the road!
The Impact of Poor Data Quality
Poor Decisions
Failed Business Processes
Failed Projects
Higher Costs
Missed Revenues
Lower Product / Service Quality
Lower Stakeholder Satisfaction
Fatal Disasters
Data Quality is a Key Bottleneck of the Semantic Web
<vocab:location rdf:about="http://www.stockdbdemo2.com/stockdblocation/1">
    <vocab:location_ZIP></vocab:location_ZIP>
    <vocab:location_STREETNO></vocab:location_STREETNO>
    <vocab:location_COUNTRY>France</vocab:location_COUNTRY>
    <vocab:location_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</vocab:location_ID>
    <vocab:location_STREET>8489 Strong St.</vocab:location_STREET>
    <vocab:location_STATE>NV</vocab:location_STATE>
    <rdfs:label>location #1</rdfs:label>
    <vocab:location_CITY>Las Vegas</vocab:location_CITY>
</vocab:location>
Missing literal values
Functional dependency violation
Syntax violation
Unique value violation
Our Approach
Identification of data quality problems at the instance level of Semantic Web sources,
solely with Semantic Web technologies.
Integration advantages
Access to SemWeb data may be useful for data quality management (DQM).
Proposed Architecture
Layers: SPARQL + SPIN Query Layer, Ontology Layer, Data Sources Layer
Components: RDB, SPIN, OBDQM, Domain Ontology, Knowledge Base, Linked Data Cloud
Defining Data Quality Rules with
SPARQL (1)
Define what is allowed and negate it.
Define what is not allowed.
Negations and regular expressions save manual effort.
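The two rule styles above can be contrasted in a small plain-Python sketch. This is not SPARQL or SPIN; the sample values and the "letters and spaces only" pattern are assumptions made up for illustration. It only shows why a negated pattern needs less manual effort than listing every bad value.

```python
import re

# Hypothetical sample values for a country property (not taken from the slides).
values = ["USA", "France", "U$A", ""]

# Style 1: define what is allowed, then negate it.
# Assumed rule: a value consists of one or more letters or spaces.
ALLOWED = re.compile(r"^[A-Za-z ]+$")
violations_negated = [v for v in values if not ALLOWED.match(v)]

# Style 2: define what is not allowed, listing each bad value by hand.
DISALLOWED = {"", "U$A"}
violations_listed = [v for v in values if v in DISALLOWED]

print(violations_negated)  # ['U$A', '']
print(violations_listed)   # ['U$A', '']
```

Both styles flag the same values here, but the second one has to be extended by hand for every new kind of bad value.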
Defining Data Quality Rules with
SPARQL (2)
The city "Las Vegas" must be in the country "USA".
# Checking functional dependency of {?arg4} with {?arg2}
CONSTRUCT {
    _:b0 a spin:ConstraintViolation .
    _:b0 spin:violationRoot ?this .
    _:b0 spin:violationPath vocab:location_COUNTRY .
}
WHERE {
    ?this vocab:location_CITY "Las Vegas" .
    FILTER (!spl:hasValue(?this, vocab:location_COUNTRY, "USA")) .
}
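To make the logic of this rule concrete, here is a minimal plain-Python sketch of the same check over a list of triples. It is an illustration only, not how SPIN evaluates constraints: the tuple representation, the shortened subject names, the second (clean) record, and the `has_value` helper standing in for `spl:hasValue` are all assumptions.

```python
# Triples as (subject, predicate, object). The first record loosely mirrors
# the Las Vegas / France location from the RDF example; the second is hypothetical.
triples = [
    ("location/1", "location_CITY", "Las Vegas"),
    ("location/1", "location_COUNTRY", "France"),
    ("location/2", "location_CITY", "Las Vegas"),
    ("location/2", "location_COUNTRY", "USA"),
]

def has_value(subject, predicate, value):
    """Rough stand-in for spl:hasValue: true if the exact triple exists."""
    return (subject, predicate, value) in triples

# WHERE part: every subject in the city "Las Vegas" ...
# FILTER part: ... that does not have the country "USA".
violations = [
    s for (s, p, o) in triples
    if p == "location_CITY" and o == "Las Vegas"
    and not has_value(s, "location_COUNTRY", "USA")
]

print(violations)  # ['location/1']
```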
Defining Data Quality Rules with
SPARQL (3)
High reusability of data quality rules through SPIN's SPARQL query templates.
# Checking functional dependency of {?arg4} with {?arg2}
CONSTRUCT {
    _:b0 a spin:ConstraintViolation .
    _:b0 spin:violationRoot ?this .
    _:b0 spin:violationPath ?arg3 .
}
WHERE {
    ?this ?arg1 ?arg2 .
    FILTER (!spl:hasValue(?this, ?arg3, ?arg4)) .
}
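The reuse idea behind the template can be sketched in plain Python as one function whose parameters play the role of `?arg1` to `?arg4`. This is an analogy, not the SPIN template mechanism: the function name, the tuple representation, the second record, and the state-based rule are hypothetical.

```python
def functional_dependency_violations(triples, arg1, arg2, arg3, arg4):
    """Return subjects that have (arg1, arg2) but lack (arg3, arg4)."""
    facts = set(triples)
    return sorted(
        {s for (s, p, o) in facts
         if p == arg1 and o == arg2 and (s, arg3, arg4) not in facts}
    )

# First record loosely mirrors the RDF example; the second is a hypothetical clean one.
triples = [
    ("location/1", "location_CITY", "Las Vegas"),
    ("location/1", "location_STATE", "NV"),
    ("location/1", "location_COUNTRY", "France"),
    ("location/2", "location_CITY", "Reno"),
    ("location/2", "location_STATE", "NV"),
    ("location/2", "location_COUNTRY", "USA"),
]

# The same "template" instantiated twice with different arguments.
city_rule = functional_dependency_violations(
    triples, "location_CITY", "Las Vegas", "location_COUNTRY", "USA")
state_rule = functional_dependency_violations(
    triples, "location_STATE", "NV", "location_COUNTRY", "USA")

print(city_rule)   # ['location/1']
print(state_rule)  # ['location/1']
```

Only the arguments change between the two rules; the check itself is written once.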
Enforced DQ-Rules with SPIN
Application: http://www.topquadrant.com/products/TB_Composer.html#free
More Data Quality Rule Templates (1)
Data Quality Problem: Missing literal values
SPARQL Query Template:
ASK WHERE {
    ?this ?arg1 "" .
}
Data Quality Problem: Out of range value (lower limit)
SPARQL Query Template:
ASK WHERE {
    ?this ?arg1 ?value .
    FILTER (?value < ?arg2) .
}
Data Quality Problem: Out of range value (upper limit)
SPARQL Query Template:
ASK WHERE {
    ?this ?arg1 ?value .
    FILTER (?value > ?arg2) .
}
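As a rough plain-Python reading of these three ASK templates, each check below returns True when at least one matching triple exists. The tuple representation, the function names, and the `weight_KG` property with its limits are assumptions for illustration; values of the checked property are assumed to be numeric for the range checks.

```python
# Triples as (subject, predicate, object). The ZIP and CITY values loosely
# mirror the RDF example; the weight_KG triple is hypothetical.
triples = [
    ("location/1", "location_ZIP", ""),
    ("location/1", "location_CITY", "Las Vegas"),
    ("item/1", "weight_KG", 250.0),
]

def missing_literal(triples, arg1):
    """True if some subject has an empty string as value of property arg1."""
    return any(p == arg1 and o == "" for (_, p, o) in triples)

def below_lower_limit(triples, arg1, arg2):
    """True if some value of property arg1 is smaller than the limit arg2."""
    return any(p == arg1 and o < arg2 for (_, p, o) in triples)

def above_upper_limit(triples, arg1, arg2):
    """True if some value of property arg1 is larger than the limit arg2."""
    return any(p == arg1 and o > arg2 for (_, p, o) in triples)

print(missing_literal(triples, "location_ZIP"))        # True
print(missing_literal(triples, "location_CITY"))       # False
print(below_lower_limit(triples, "weight_KG", 0))      # False
print(above_upper_limit(triples, "weight_KG", 100))    # True
```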
[Diagram: RDB, RDB, Knowledge Base, Global Ontology]
More Data Quality Rule Templates (2)
Data Quality Problem: Syntax violation (only letters, commas, dots, and spaces allowed)
SPARQL Query Template:
ASK WHERE {
    ?this ?arg1 ?value .
    FILTER (!regex(str(?value), "^([A-Za-z,. ])*$")) .
}
Data Quality Problem: Unique value violation
SPARQL Query Template:
CONSTRUCT {
    _:b0 a spin:ConstraintViolation .
    _:b0 spin:violationRoot ?a .
    _:b0 spin:violationPath ?arg1 .
}
WHERE {
    ?a ?arg1 ?uniqueValue .
    ?b ?arg1 ?uniqueValue .
    FILTER (?a != ?b) .
}
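A plain-Python sketch of these two checks is given below, reusing the regular expression from the syntax template. It is an approximation: Python's `re` module differs in details from the regular expressions used by SPARQL, and the tuple representation, function names, and the duplicate-ID records are assumptions for illustration.

```python
import re

# Same character class as in the syntax template: letters, commas, dots, spaces.
PATTERN = re.compile(r"^([A-Za-z,. ])*$")

# The first record loosely mirrors the RDF example; location/2 and location/3
# are hypothetical, with location/2 reusing the ID of location/1.
triples = [
    ("location/1", "location_STREET", "8489 Strong St."),
    ("location/1", "location_CITY", "Las Vegas"),
    ("location/1", "location_ID", 1),
    ("location/2", "location_ID", 1),
    ("location/3", "location_ID", 2),
]

def syntax_violation(triples, arg1):
    """True if some value of property arg1 does not match the pattern."""
    return any(p == arg1 and not PATTERN.search(str(o)) for (_, p, o) in triples)

def unique_value_violations(triples, arg1):
    """Subjects that share a value of property arg1 with a different subject."""
    by_value = {}
    for s, p, o in triples:
        if p == arg1:
            by_value.setdefault(o, set()).add(s)
    return sorted(s for subjects in by_value.values()
                  if len(subjects) > 1 for s in subjects)

print(syntax_violation(triples, "location_STREET"))     # True (contains digits)
print(syntax_violation(triples, "location_CITY"))       # False
print(unique_value_violations(triples, "location_ID"))  # ['location/1', 'location/2']
```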
[Diagram: RDB, RDB, Knowledge Base, Global Ontology]
Contributions
• Domain-independent SPARQL query templates for data quality problem identification
• Queries are highly reusable
• Architecture enables the use of Linked Data
• Methodology for data quality management of Semantic Web data
• First approach on how to apply SPIN for DQM
Limitations & Open Issues
• Knowing the problem does not mean we can solve it
• Homonym / synonym handling
• Incomplete knowledge may cause constraint violations of clean instances
• Current approach focuses on literal values
• Scalability on large data sets
Ongoing Extensions
• Extension to a broader set of data quality problems
• Enabling synonym handling and homonym tolerance
• Enhancement of performance
• Calculation of information quality scores
• Integration of Linked Data as a trusted reference for data quality management
• Evaluation of the quality of popular Semantic Web data sets at the instance level (e.g. GeoNames and DBpedia)
• Extension for (semi-)automated data cleansing
Christian Fuerber
Researcher
E-Business & Web Science Research Group
Werner-Heisenberg-Weg 39
85577 Neubiberg
Germany
skype c.fuerber
web http://www.unibw.de/ebusiness
homepage http://www.fuerber.com
Paper is available at http://bit.ly/bYes0V
References & Links
LOD-Cloud: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html
D2RQ: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
SPIN: http://spinrdf.org/
TopBraid Composer Free Edition: http://www.topquadrant.com/products/TB_Composer.html#free