Linked Data Quality Assessment – daQ and Luzzu Jeremy Debattista University of Bonn Presentation at the Ontology Engineering Group (UPM)

Upload: jerdeb

Post on 08-Aug-2015

TRANSCRIPT

Page 1: Linked Data Quality Assessment – daQ and Luzzu

Linked Data Quality Assessment – daQ and Luzzu

Jeremy Debattista

University of Bonn

Presentation at the Ontology Engineering Group (UPM)

Page 2: Linked Data Quality Assessment – daQ and Luzzu

…who am I?

• B.Sc (Hons) in Computer Science – University of Malta
– Thesis: Collaborative Editing and Expert Finding

• M.App Sc in Computer Science – DERI, National University of Ireland, Galway
– Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments

• PhD Candidate – University of Bonn

Page 3: Linked Data Quality Assessment – daQ and Luzzu

… my PhD – the big picture

• Work related to Data Quality (in LD)
– representing quality metadata (daQ)
– assessing data quality (Luzzu)
– identifying new metrics from standard vocabularies (like PROV-O)

Page 4: Linked Data Quality Assessment – daQ and Luzzu

… the need for Quality Metadata

• Convincing data consumers to use our published data

• Filtering datasets

• Poor Quality Perspective – Big Data Veracity

Page 5: Linked Data Quality Assessment – daQ and Luzzu

… the daQ vocabulary

Page 6: Linked Data Quality Assessment – daQ and Luzzu

… the daQ vocabulary

Page 7: Linked Data Quality Assessment – daQ and Luzzu

… the daQ vocabulary

• Metadata as Named Graphs

• Usage of abstract class concept

• Metric assessment as Observations

• Preserving Provenance information
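
The idea of keeping each metric assessment as a dated observation inside a per-dataset quality graph can be sketched in a few lines. This is a minimal illustration in plain Python rather than RDF; the class names, field names, metric identifier, and values are assumptions made for the example and are not terms from the daQ vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One metric assessment: which metric, on which dataset, when, and the value."""
    metric: str            # hypothetical metric identifier
    computed_on: str       # URI of the assessed dataset
    value: float
    computed_at: datetime  # kept so the history of assessments is preserved


@dataclass
class QualityGraph:
    """Stand-in for a named graph holding the quality metadata of one dataset."""
    graph_uri: str
    dataset_uri: str
    observations: list = field(default_factory=list)

    def record(self, metric, value, when):
        """Append a new observation instead of overwriting the previous one."""
        obs = Observation(metric, self.dataset_uri, value, when)
        self.observations.append(obs)
        return obs

    def latest(self, metric):
        """Most recent observation for a metric, or None if it was never assessed."""
        matching = [o for o in self.observations if o.metric == metric]
        return max(matching, key=lambda o: o.computed_at, default=None)


# Illustrative values only; they are not results from the talk.
graph = QualityGraph("urn:example:quality-graph", "http://example.org/dataset")
graph.record("ex:Dereferenceability", 0.62, datetime(2015, 6, 1, tzinfo=timezone.utc))
graph.record("ex:Dereferenceability", 0.71, datetime(2015, 8, 1, tzinfo=timezone.utc))
print(graph.latest("ex:Dereferenceability").value)
```

Because observations are appended rather than replaced, earlier assessments stay queryable, which is one way to read the "preserving provenance" point.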

Page 8: Linked Data Quality Assessment – daQ and Luzzu

… daQ on the Web

http://purl.org/eis/vocab/daq

Page 9: Linked Data Quality Assessment – daQ and Luzzu

… daQ Applications

• daQ validator
– validates quality metric schemas that extend daQ (will be online soon)
– e.g. checking that each dimension is in exactly one category…

• Luzzu – next slides
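
The validator check mentioned above (each dimension belongs to exactly one category) could look roughly like the following. This is a sketch over plain Python dictionaries, not the actual daQ validator; the function name and the example schema are made up for illustration.

```python
from collections import defaultdict


def dimensions_violating_single_category(dimensions, category_to_dimensions):
    """Return {dimension: [categories]} for every dimension that is not
    declared in exactly one category (zero or several both count)."""
    owners = defaultdict(set)
    for category, dims in category_to_dimensions.items():
        for dim in dims:
            owners[dim].add(category)
    every_dimension = set(dimensions) | set(owners)
    return {
        dim: sorted(owners.get(dim, ()))
        for dim in sorted(every_dimension)
        if len(owners.get(dim, ())) != 1
    }


# Hypothetical schema with two deliberate mistakes:
# "Availability" sits in two categories, "Interlinking" in none.
schema_dimensions = ["Availability", "Licensing", "Conciseness", "Interlinking"]
schema_categories = {
    "Accessibility": ["Availability", "Licensing"],
    "Intrinsic": ["Conciseness", "Availability"],
}
violations = dimensions_violating_single_category(schema_dimensions, schema_categories)
print(violations)
```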

Page 10: Linked Data Quality Assessment – daQ and Luzzu

… Luzzu – QA Framework

• A comprehensive QA framework
– assesses LD quality using user-provided metrics (we have a number of LOD metrics already) in a scalable manner
– provides queryable metadata (daQ)
– provides quality reports which can be used for cleaning

• Java-based with Maven integration

• http://eis-bonn.github.io/Luzzu

Page 11: Linked Data Quality Assessment – daQ and Luzzu

… Luzzu – QA Framework

Page 12: Linked Data Quality Assessment – daQ and Luzzu

… Luzzu – QA Framework

Page 13: Linked Data Quality Assessment – daQ and Luzzu

…what’s missing in Luzzu

• Make Luzzu work better on Big Data Platforms

– We already have a Spark processor

– How can metrics be scaled across different cores? Something like map-reduce maybe?
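
To make the map-reduce question concrete, here is one possible shape for a metric whose result can be computed per partition and then merged. It is only a sketch and not how Luzzu works: the metric (share of triples whose object is an HTTP IRI) and all function names are assumptions. A thread pool is used to keep the example self-contained; real CPU parallelism in CPython would need processes or a platform such as Spark.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(triples):
    """Map step: partial counts (matching, total) for one partition of triples."""
    matching = sum(
        1 for _s, _p, o in triples if o.startswith(("http://", "https://"))
    )
    return matching, len(triples)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts by adding them component-wise."""
    return left[0] + right[0], left[1] + right[1]


def assess(triples, workers=4):
    """Split the triples into partitions, map in parallel, then reduce."""
    partitions = [triples[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    matching, total = reduce(reduce_counts, partials, (0, 0))
    return matching / total if total else 0.0


example_triples = [
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/name", "Alice"),
    ("http://example.org/b", "http://example.org/knows", "http://example.org/c"),
    ("http://example.org/c", "http://example.org/page", "https://example.org/c.html"),
]
print(assess(example_triples, workers=2))
```

This only works directly for metrics whose partial results can be merged (counts, sums, sets). Metrics that need a global view of the graph would need a different decomposition.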

Page 14: Linked Data Quality Assessment – daQ and Luzzu

… data quality lifecycle

Page 15: Linked Data Quality Assessment – daQ and Luzzu

… quality metrics

• Traditional naïve way

• Probabilistic Techniques (A paper was presented at ESWC this year)

Page 16: Linked Data Quality Assessment – daQ and Luzzu

… probabilistic technique hypothesis

Probabilistic approximation techniques would:

(H1) drastically improve computation time

(H2) give close to accurate results
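
Reservoir sampling, one of the techniques named in this talk, illustrates the trade-off behind H1 and H2: an expensive check is run on a fixed-size uniform sample instead of on every item. The sketch below uses the classic Algorithm R. The `looks_resolvable` predicate is a hypothetical stand-in for a costly test such as an HTTP lookup, and the data is synthetic, so this is not the implementation evaluated in the paper.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: uniform sample of up to k items from a stream of unknown length."""
    rng = rng if rng is not None else random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = item  # item i is kept with probability k / (i + 1)
    return reservoir


def estimate_ratio(stream, predicate, k, rng=None):
    """Estimate the share of items satisfying `predicate` from a sample of size k."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)


def looks_resolvable(uri):
    """Hypothetical stand-in for an expensive check (no network access here)."""
    return not uri.endswith("/broken")


# Synthetic data: 900 of 1000 URIs pass the check, so the exact ratio is 0.90.
uris = [f"http://example.org/r/{n}" for n in range(900)]
uris += [f"http://example.org/r/{n}/broken" for n in range(100)]
estimate = estimate_ratio(iter(uris), looks_resolvable, k=200, rng=random.Random(42))
print(f"exact: 0.90, estimated from 200 samples: {estimate:.2f}")
```

Only 200 of the 1000 items are checked, which is where the time saving comes from. The price is that the result is an estimate whose error depends on the sample size.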

Page 17: Linked Data Quality Assessment – daQ and Luzzu

… probabilistic techniques used

• Reservoir Sampling
– Dereferenceability
– Links to External Data Providers

• Bloom Filters
– Extensional Conciseness

• Clustering Coefficient Estimation
– Clustering Coefficient of a Network
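
A Bloom filter can stand in for an exact set when counting duplicates, which is the kind of check a conciseness metric needs. The sketch below is a minimal illustration and not the implementation from the paper: the filter size, the number of hash functions, the double-hashing scheme, and the definition used (share of descriptions not seen before) are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash positions per item.
    It may report false positives but never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def estimate_conciseness(descriptions, bloom=None):
    """Approximate share of descriptions that were not seen before."""
    if bloom is None:
        bloom = BloomFilter()
    total = unique = 0
    for description in descriptions:
        total += 1
        if description not in bloom:  # definitely new
            unique += 1
            bloom.add(description)
        # otherwise: probably a duplicate (could be a false positive)
    return unique / total if total else 0.0


# Hypothetical serialised resource descriptions; the third repeats the first.
descriptions = ["name=Alice;age=30", "name=Bob;age=41", "name=Alice;age=30", "name=Eve;age=25"]
print(estimate_conciseness(descriptions))
```

Memory use is fixed by `num_bits` regardless of how many descriptions are processed. False positives can only make a new item look like a duplicate, so the estimate can be too low but never too high.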

Page 18: Linked Data Quality Assessment – daQ and Luzzu

… some results

Precision: approx. 75%; Time Saved: > 2 Orders of Magnitude

Precision: 100%; Time Saved: > 2 Orders of Magnitude

Page 19: Linked Data Quality Assessment – daQ and Luzzu

… some results

Precision: approx. 97%; Time Saved: > 3 Orders of Magnitude

Page 20: Linked Data Quality Assessment – daQ and Luzzu

… some results

Precision: approx. 95%; Time Saved: > 1 Order of Magnitude

Page 21: Linked Data Quality Assessment – daQ and Luzzu

… what am I working on

• Large-scale / Data Web-scale evaluation (Journal Paper)
– assessing the quality of LOD Cloud datasets

• daQ (Journal Paper)

Page 22: Linked Data Quality Assessment – daQ and Luzzu

… what do we do at Bonn

• Open Government Data – Publishing and Consumption
– Data Value Chains, Value Creation, Budgeting

• Portal for publication and consumption of open data
– Lowering of semantic data to shallower domain-specific formats (RDB, CSV, etc.)

• RDF Visualisations and Recommendations

Page 23: Linked Data Quality Assessment – daQ and Luzzu

… what do we do at Bonn

• Dataset Change Detection

• Collaborative Authoring and Open Educational Content

• Low-threshold agile methodology for collaborative vocabulary development

• Mapping of AutomationML to RDF

Page 24: Linked Data Quality Assessment – daQ and Luzzu

… some tools

http://purl.org/net/exconquer/

Page 25: Linked Data Quality Assessment – daQ and Luzzu

… some tools

http://purl.org/net/dsaas

Page 26: Linked Data Quality Assessment – daQ and Luzzu

… some tools

http://slidewiki.org

Page 27: Linked Data Quality Assessment – daQ and Luzzu

… some tools

http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html