Post on 08-Aug-2015

Linked Data Quality Assessment – daQ and Luzzu

Jeremy Debattista, University of Bonn

Presentation at the Ontology Engineering Group (UPM)

…who am I?

• B.Sc (Hons) in Computer Science – University of Malta
  – Thesis: Collaborative Editing and Expert Finding

• M.App Sc in Computer Science – DERI, National University of Ireland, Galway
  – Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments

• PhD Candidate – University of Bonn

… my PhD – the big picture

• Work related to Data Quality (in LD)
  – representing quality metadata (daQ)
  – assessing data quality (Luzzu)
  – identifying new metrics from standard vocabularies (like PROV-O)

… the need for Quality Metadata

• Convincing data consumers to use our published data

• Filtering datasets

• Poor Quality Perspective – Big Data Veracity

… the daQ vocabulary

• Metadata as Named Graphs

• Usage of abstract class concept

• Metric assessment as Observations

• Preserving Provenance information
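The bullets above can be pictured with a small data model. This is only a sketch in Python, not the daQ vocabulary itself: the class and field names (`Observation`, `QualityGraph`, `computed_on`, `computed_at`) and the `ex:` identifiers are hypothetical stand-ins chosen for illustration, and the abstract category/dimension/metric hierarchy is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Observation:
    """One metric assessment: a value plus minimal provenance."""
    metric: str            # hypothetical metric identifier
    computed_on: str       # IRI of the assessed dataset
    value: float
    computed_at: datetime  # when the assessment was made


@dataclass
class QualityGraph:
    """Quality metadata kept apart from the data, in the spirit of a named graph."""
    name: str
    observations: List[Observation] = field(default_factory=list)

    def observe(self, metric: str, dataset: str, value: float) -> Observation:
        """Record a new observation instead of overwriting the previous one."""
        obs = Observation(metric, dataset, value, datetime.now(timezone.utc))
        self.observations.append(obs)
        return obs

    def history(self, metric: str) -> List[Observation]:
        """All observations of one metric, oldest first."""
        found = [o for o in self.observations if o.metric == metric]
        return sorted(found, key=lambda o: o.computed_at)
```

Keeping every observation, rather than a single latest value, is what would let a consumer see how a dataset's quality changes over time.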

… daQ on the Web

http://purl.org/eis/vocab/daq

… daQ Applications

• daQ validator – validates quality metric schemas that extend daQ (will be online soon)
  – e.g. checking that each dimension is in exactly one category…

• Luzzu – next slides
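The example check mentioned for the validator (each dimension in exactly one category) could look roughly like this. It is a sketch, not the validator's code: the function name, the pair-based input format and the `ex:` identifiers are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def dimensions_not_in_exactly_one_category(
    dimensions: Iterable[str],
    category_has_dimension: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Return each offending dimension with the categories that claim it.

    A dimension is reported when zero categories or more than one
    category link to it.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for category, dimension in category_has_dimension:
        owners[dimension].add(category)
    return {
        d: sorted(owners.get(d, set()))
        for d in dimensions
        if len(owners.get(d, set())) != 1
    }


# Hypothetical schema: one dimension claimed twice, one never claimed.
schema_dimensions = ["ex:Availability", "ex:Conciseness", "ex:Licensing"]
schema_links = [
    ("ex:Accessibility", "ex:Availability"),
    ("ex:Intrinsic", "ex:Conciseness"),
    ("ex:Representational", "ex:Conciseness"),
]
violations = dimensions_not_in_exactly_one_category(schema_dimensions, schema_links)
```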

… Luzzu – QA Framework

• A comprehensive QA framework
  – assesses LD quality in a scalable manner, using user-provided metrics (a number of LOD metrics are already implemented)
  – provides queryable quality metadata (daQ)
  – provides quality reports which can be used for cleaning

• Java-based, with Maven integration

• http://eis-bonn.github.io/Luzzu

…what’s missing in Luzzu

• Make Luzzu work better on Big Data Platforms

– We already have a Spark processor

– How can metrics be scaled across multiple cores? Something like map-reduce, maybe?
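One way to picture the map-reduce idea for a simple counting metric is sketched below. It is not how Luzzu or its Spark processor works: the function names, the triple representation and the metric (the share of triples passing a check) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Sequence, Tuple

Triple = Tuple[str, str, str]
Partial = Tuple[int, int]  # (matching triples, triples seen)


def map_chunk(chunk: Sequence[Triple], check: Callable[[Triple], bool]) -> Partial:
    """Map step: assess one partition independently of the others."""
    return sum(1 for t in chunk if check(t)), len(chunk)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: partial counts merge by simple addition."""
    return a[0] + b[0], a[1] + b[1]


def ratio_metric(
    triples: Sequence[Triple],
    check: Callable[[Triple], bool],
    workers: int = 4,
) -> float:
    """Share of triples passing `check`, computed partition by partition."""
    size = max(1, -(-len(triples) // workers))  # ceiling division
    chunks = [triples[i:i + size] for i in range(0, len(triples), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: map_chunk(c, check), chunks))
    matched, seen = reduce(combine, partials, (0, 0))
    return matched / seen if seen else 0.0
```

The thread pool only illustrates the structure. In CPython, CPU-bound work would need a process pool or a cluster framework to use several cores. The approach fits metrics whose partial results can be merged, such as counts. Metrics that need a global view of the graph do not decompose this easily.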

… data quality lifecycle

… quality metrics

• Traditional naïve way

• Probabilistic Techniques (A paper was presented at ESWC this year)

… probabilistic technique hypothesis

Probabilistic approximation techniques would:

(H1) drastically improve computational time

(H2) give close to accurate results

… probabilistic techniques used

Techniques:

• Reservoir Sampling

• Bloom Filters

• Clustering Coefficient Estimation

Metrics:

• Dereferenceability

• Links to External Data Providers

• Extensional Conciseness

• Clustering Coefficient of a Network
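As an illustration of the first technique, here is a minimal sketch of reservoir sampling (the classic "Algorithm R"), which keeps a uniform random sample of fixed size from a stream of unknown length. The function name and parameters are illustrative and are not taken from Luzzu.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return up to k items drawn uniformly at random from the stream."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

An expensive per-resource check (for example an HTTP lookup) can then run on the sample only, and the pass rate on the sample serves as an estimate for the whole dataset.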

… some results

Precision: approx. 75%; Time Saved: > 2 Orders of Magnitude

Precision: 100%; Time Saved: > 2 Orders of Magnitude

… some results

Precision: approx. 97%; Time Saved: > 3 Orders of Magnitude
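Bloom filters, one of the techniques listed earlier, can be sketched as follows. The example assumes a conciseness-style measure defined as the share of non-duplicate items in a stream. The class, the function names and the serialisation of items as strings are illustrative assumptions and not Luzzu's implementation.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomFilter:
    """Fixed-size bit array: no false negatives, rare false positives."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.size = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.size for i in range(self.hashes))

    def add(self, item: str) -> bool:
        """Insert item; return True if it was possibly present already."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def approx_unique_ratio(items: Iterable[str], expected_items: int) -> float:
    """Estimate unique / total without storing the items themselves."""
    bloom = BloomFilter(expected_items)
    total = unique = 0
    for item in items:
        total += 1
        if not bloom.add(item):
            unique += 1
    return unique / total if total else 0.0
```

A false positive makes a new item look like a duplicate, so the estimate can only err on the low side. The naïve alternative keeps every item in an exact set, which is precise but needs memory proportional to the number of distinct items.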

… some results

Precision: approx. 95%; Time Saved: > 1 Order of Magnitude
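To give a feel for estimating a clustering coefficient instead of computing it exactly, the sketch below averages the local clustering coefficient over randomly sampled nodes. This simple node-sampling estimator is swapped in for illustration and is not necessarily the estimator used in the work presented here. The graph representation and the function names are assumptions.

```python
import random
from itertools import combinations
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def local_clustering(graph: Graph, node: Hashable) -> float:
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def estimate_avg_clustering(
    graph: Graph,
    samples: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Average local clustering over `samples` nodes drawn with replacement."""
    rng = rng or random.Random()
    nodes = list(graph)
    if not nodes or samples <= 0:
        return 0.0
    picked = [rng.choice(nodes) for _ in range(samples)]
    return sum(local_clustering(graph, n) for n in picked) / samples
```

The exact value visits every node, whereas the estimate touches only `samples` nodes, so the number of samples sets the trade-off between running time and accuracy.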

… what am I working on

• Large Scale / Data Web Scale evaluation (Journal Paper)
  – assessing the quality of LOD Cloud datasets

• daQ (Journal Paper)

… what do we do at Bonn

• Open Government Data – Publishing and Consumption
  – Data Value Chains, Value Creation, Budgeting

• Portal for publication and consumption of open data
  – Lowering of semantic data to shallower domain-specific formats (RDB, CSV, etc.)

• RDF Visualisations and Recommendations

… what do we do at Bonn

• Dataset Change Detection

• Collaborative Authoring and Open Educational Content

• Low-threshold agile methodology for collaborative vocabulary development

• Mapping of AutomationML to RDF

… some tools

http://purl.org/net/exconquer/

http://purl.org/net/dsaas

http://slidewiki.org

http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html
