Daniela Cristiana Docan | 6th Sept. | INSPIRE Conference 2017, Strasbourg
EEA Data Quality Management supporting
INSPIRE implementation
Data Quality in INSPIRE
INSPIRE Technical Guidelines use ISO 19157 Geographic Information – Data Quality
•Data Quality Elements mentioned in INSPIRE TG
ISO 19157 Geographic Information – Data Quality
INSPIRE TGs cover:
1. Data quality elements/sub-elements
2. Data quality measures (tests to be applied on the dataset)
3. Minimum data quality requirements/conformance quality level
•Data Quality in INSPIRE: Protected Sites TG
Recommendations:
•DQ elements and sub-elements
•Corresponding DQ measures
•Minimum data quality requirements (acceptance criteria or conformance quality level)
Item: fields, records, values, features, relationships, files in the dataset package
INSPIRE data quality requirements
INSPIRE:
Completeness/Omission
INSPIRE: Rate of missing items
INSPIRE: No recommendation/constraints
EEA’s Data Flow stages
•Source: EEA Common Workspace/Generic QA/QC
•Guidelines for Reporting Obligations, XML schema, database schema, quality control checks
•A priori DQ requirements (absolute positional accuracy: 10 m)
“what we want”
•Automatic and manual quality checks
•Conformance test (minimum data quality requirements)
•Metadata and/or standalone data quality report
•A posteriori DQ results/values
“what we get”
•Run automatic quality checks [QA scripts – XQuery]
•Automatic QA report for MS
•ETL (Extract, Transform, Load) tools
•Automatic and manual quality checks
•EEA’s Automatic quality report
•Source: Eionet Central Data Repository (CDR)
equivalent to measures in ISO standard
•How will EEA’s quality checks connect with ISO elements & measures?
Requirements in INSPIRE:
•Element: Completeness – Omission
•Standardised measure: Rate of missing items [error rate] (e.g. real, percentage, ratio)
•Minimum data quality requirements: None
EEA’s CDF quality checks:
•✓ Mandatory values
•✓ User-defined data quality measure: all records must have the SITE_CODE field filled
•Conformance test: the acceptance criterion is [0%] errors in the dataset
Source: Eionet Central Data repository (CDR)
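The SITE_CODE check above can be sketched in a few lines of Python. This is only an illustration of the "rate of missing items" measure with a 0% acceptance criterion, not the EEA's actual QA script (those are described as XQuery). The record layout, the function name `rate_of_missing_items` and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ConformanceResult:
    checked: int       # number of records inspected
    missing: int       # records where the mandatory field is empty
    error_rate: float  # missing / checked, as a fraction in [0, 1]
    passed: bool       # True when error_rate <= acceptance threshold


def rate_of_missing_items(records, field="SITE_CODE", max_error_rate=0.0):
    """Count records whose mandatory field is absent, None or blank.

    `field` and `max_error_rate` are illustrative parameters; the slide
    only states that SITE_CODE must be filled and that 0% errors are accepted.
    """
    missing = sum(1 for r in records if not str(r.get(field) or "").strip())
    checked = len(records)
    rate = missing / checked if checked else 0.0
    return ConformanceResult(checked, missing, rate, rate <= max_error_rate)


# Hypothetical records, invented for this sketch.
sample = [
    {"SITE_CODE": "SITE_A", "SITE_NAME": "Example one"},
    {"SITE_CODE": "", "SITE_NAME": "Example two"},
    {"SITE_CODE": None, "SITE_NAME": "Example three"},
    {"SITE_CODE": "SITE_D", "SITE_NAME": "Example four"},
]

result = rate_of_missing_items(sample)
print(f"{result.missing}/{result.checked} missing "
      f"({result.error_rate:.0%}); passed={result.passed}")
# prints: 2/4 missing (50%); passed=False
```

Reporting the rate as well as the pass/fail flag keeps the a posteriori value ("what we get") separate from the a priori requirement ("what we want").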
Data Quality Rule Registry (DQRR)
•[catalogue of standardised and user-defined data quality measures]
= Measures in ISO 19157
= Elements in ISO 19157
•Data quality checks – different points of view
1. “Minimal mapping unit”
[the smallest size of area allowed to be represented in a given data set]
Topological consistency or conceptual consistency?
2. “C34 – Coordinate accuracy”
“These are required to be in format the ETRS89 (2D)-EPSG:4258 coordinate reference system,
with a 10m accuracy. Hence a check is required to ensure that, when coordinates are
reported, each coordinate is to 4 decimal places, adhering to the 10m accuracy required.”
CDF’s guideline
The number of decimal places for decimal degrees coordinates
☺ Precision or resolution, not absolute positional accuracy
Source: www.ncetm.org.uk
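The distinction drawn for check C34 can be made concrete with a little arithmetic. The sketch below counts the decimal places of a reported coordinate and converts that count into an approximate ground step. It assumes a rounded figure of about 111.32 km per degree of latitude, with east-west distances scaled by the cosine of the latitude. The constant, the function names and the sample coordinate are assumptions for illustration only.

```python
import math
from decimal import Decimal

# Approximate length of one degree of latitude in metres (assumed constant).
METRES_PER_DEGREE_LAT = 111_320.0


def decimal_places(text):
    """Number of digits after the decimal point in a coordinate string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def ground_resolution_m(places, latitude_deg=0.0):
    """Smallest step expressible with `places` decimals, in metres.

    Returns (north_south, east_west). This is the resolution of the
    number format, not the positional accuracy of the measurement.
    """
    step_deg = 10.0 ** -places
    north_south = step_deg * METRES_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


reported = "48.5839"  # hypothetical latitude in decimal degrees
places = decimal_places(reported)
ns, ew = ground_resolution_m(places, latitude_deg=float(reported))
print(f"{places} decimal places: about {ns:.1f} m north-south, "
      f"{ew:.1f} m east-west")
```

Four decimal places therefore correspond to a step of roughly 11 m north-south, which is why the check is associated with a 10 m figure. A coordinate written to four decimals can still lie far from the true position, so the check measures resolution and says nothing about absolute positional accuracy.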
Logical consistency – Conceptual consistency or Format consistency?
The error vector for a single point (Source: Weir et al., 2001, p.413)
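The figure itself is not reproduced in this transcript. As a rough illustration of the idea, the error vector of a point can be taken as the difference between its reported position and a reference ("true") position, with the vector's length serving as the absolute positional error. The sketch assumes planar coordinates in metres. The function name, the coordinates and the comparison against the 10 m requirement are assumptions, not the formulation used by Weir et al.

```python
import math


def error_vector(measured, reference):
    """Return (dx, dy, length) of the vector from reference to measured.

    Both points are (x, y) tuples in a projected CRS with metre units
    (an assumption of this sketch).
    """
    dx = measured[0] - reference[0]
    dy = measured[1] - reference[1]
    return dx, dy, math.hypot(dx, dy)


# Hypothetical reported and reference positions.
dx, dy, length = error_vector((1003.0, 2004.0), (1000.0, 2000.0))
meets_requirement = length <= 10.0  # a priori requirement of 10 m

print(dx, dy, length, meets_requirement)
# prints: 3.0 4.0 5.0 True
```

Unlike counting decimal places, this needs an independent reference position for each checked point.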
Conclusions
INSPIRE Data Specification requirements/recommendations
Data Specifications are not restrictive on the data quality (e.g. elements to be covered, measure (tests) to be applied, or minimum data quality requirements)
Consistency in defining data quality elements and measures (tests) across different annexes and/or themes
(e.g. Annex III - Topological consistency and Temporal consistency and validity)
EEA’s QA/QC workflow
•Fulfils the INSPIRE requirements on data quality
•New components of the data quality will be covered/improved (e.g. absolute positional accuracy)
•The Data Quality Rule Registry (DQRR) project promotes the “interoperability of the data quality” by proposing:
•Common criteria to categorise data quality checks across EEA’s production stages
•Harmonise DQ terminology across different Core Data Flows
(e.g. record uniqueness, duplicate elements, duplicates entries, duplicate value, duplicities, and uniqueness of primary key)
•Assign/link the existing DQ checks to ISO quality elements and sub-elements
•Harmonise the Standalone Data Quality Reports
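The terminology point above can be illustrated with a small lookup that folds the variant names listed on the slide into one label. The choice of "record uniqueness" as the canonical term and the function name `harmonise` are assumptions for this sketch. The DQRR's actual vocabulary is not given in the slides.

```python
# Canonical label chosen for illustration only (hypothetical).
CANONICAL_TERM = "record uniqueness"

# Variant names taken from the slide, spelled as they appear there.
SYNONYMS = {
    "record uniqueness",
    "duplicate elements",
    "duplicates entries",
    "duplicate value",
    "duplicities",
    "uniqueness of primary key",
}


def harmonise(term):
    """Map a check name to the canonical term when it is a known variant.

    Unknown names are returned in normalised form (trimmed, lower case)
    so they can be reviewed rather than silently dropped.
    """
    normalised = " ".join(term.lower().split())
    return CANONICAL_TERM if normalised in SYNONYMS else normalised


for name in ["Duplicities", " duplicate   value ", "Mandatory values"]:
    print(f"{name!r} -> {harmonise(name)!r}")
```

A registry entry could then carry the canonical term alongside the ISO 19157 element and sub-element assigned to the check.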
•Q&A
Thank you,
Daniela Cristiana Docan
daniela.docan@eea.europa.eu