Data Quality Resources in Species Occurrence Digitization
Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva
Agricultural Automation Laboratory – LAA
Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil
Outline
Background
Biodiversity Data Digitizer (BDD) & IABIN
Data Quality Methodology
Data Quality Tools
  BDD Geo Tool
  BDD Taxon Tool
Conclusion
Background
Importance of Species Occurrence Data
  GBIF Portal
  IABIN Portal
Data quality impacts the uses of data
Location and taxonomic data domains: georeferencing and identification are two major causes of error in species occurrence data
Need to improve Data Quality (DQ)
Data Quality & IABIN-PTN
Inter-American Biodiversity Information Network (IABIN)
Pollinators Thematic Network (PTN)
  GEF-funded project (2006-2011) (~$180k)
  11 countries in Latin America
  ~400,000 records
Responsibilities
  Development of tools for data digitization and integration
  Data digitization
  Training and support
  Reviewing proposals, reports, data
Close contact with data owners / providers
Data Quality & IABIN-PTN
Opportunities & needs
  Discuss digitization issues with the grantees
  Standards: importance and role (TDWG)
  Data quality: concepts
Improve data quality
  Provide mechanisms integrated into digitization tools, versus isolated tools
Biodiversity Data Digitizer (BDD)
Designed for easy:
  Digitization
  Manipulation
  Publication
Rich data content
FAO-GEF pollinator project
Darwin Core
EOL/Plinian Core
Interaction Extension
FAO Deficit Protocol
FAO Monitoring Protocol
MRTG Schema
Dublin Core
Demo: Thu
DQ Assessment Methodology
What is Data Quality?
Data Domain (context): Location Data Domain
Dimension (aspect): Completeness, Consistency, Credibility, Accuracy, Precision
Problem (error patterns): Missing value, Incorrect value, Nonatomic value, Inconsistent value, Information contamination
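The error patterns named on this slide can be illustrated with a small check over a single location record. This is only a sketch under stated assumptions: the field names are loosely modeled on Darwin Core terms, and `REQUIRED_FIELDS` and `find_problems` are hypothetical names, not part of BDD. It covers three of the patterns (missing, incorrect, nonatomic value) and does not attempt the others.

```python
import re

# Hypothetical field names, loosely modeled on Darwin Core terms.
REQUIRED_FIELDS = ("country", "municipality", "decimalLatitude", "decimalLongitude")


def find_problems(record):
    """Return a list of (field, problem) pairs found in one location record."""
    problems = []

    # Missing value: a required field is absent or blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append((field, "missing value"))

    # Incorrect value: coordinates outside their valid ranges.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append(("decimalLatitude", "incorrect value"))
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append(("decimalLongitude", "incorrect value"))

    # Nonatomic value: one field carrying more than one piece of information,
    # approximated here by a separator inside the municipality field.
    municipality = record.get("municipality") or ""
    if re.search(r"[,;/]", municipality):
        problems.append(("municipality", "nonatomic value"))

    return problems


record = {
    "country": "Brazil",
    "municipality": "São Paulo, SP",
    "decimalLatitude": 123.5,
    "decimalLongitude": None,
}
for field, problem in find_problems(record):
    print(field, "->", problem)
```

Each dimension would need its own rules in practice; the separator test for nonatomic values is a deliberately crude heuristic.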
DQ Management Methodology
How to improve the DQ?
Reducing Errors
  Prevention
  Detection and Correction
Error prevention is considered superior to error detection
Resources to Improve DQ on BDD
Tools to prevent errors on occurrence data digitization
Integrated to BDD species occurrence data-entry interface
BDD Geo Tool: prevents location data digitization errors
BDD Taxon Tool: prevents taxonomic data digitization errors
BDD Geo Tool: Step 1 of 3 – Primary Data
BDD Geo Tool: Step 2 of 3 – Data Source
BDD Geo Tool: Step 3 of 3 – Uncertainty
BDD Geo Tool: Location data form is filled
BDD Geo Tool
Improved
Completeness: adds data not available before (e.g. lat/long, municipality)
Consistency: consistent data obtained from a consistent source (avoiding errors like lat:0, long:0, municipality: New Orleans)
Credibility: associates data with a credible source (BioGeomancer, Google, GeoNames)
Accuracy: better than the center of mass of a region
Precision: an uncertainty indicator increases data fitness for use
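The idea of filling location fields from a single source and recording an uncertainty could be sketched as below. This is a minimal illustration, not the BDD Geo Tool itself: it does not call BioGeomancer, Google or GeoNames, and `GAZETTEER`, `Location`, `georeference` and `validate` are hypothetical names. The coordinates and the uncertainty radius in the in-memory gazetteer are illustrative values only.

```python
from dataclasses import dataclass


@dataclass
class Location:
    municipality: str
    country: str
    latitude: float
    longitude: float
    uncertainty_m: float  # uncertainty radius in meters
    source: str           # where the coordinates came from


# Hypothetical stand-in for a gazetteer service; values are illustrative only.
GAZETTEER = {
    ("sao paulo", "brazil"): (-23.55, -46.63, 30000.0),
}


def georeference(municipality, country, source="example-gazetteer"):
    """Fill coordinates and uncertainty from one source, so fields stay consistent."""
    key = (municipality.strip().lower(), country.strip().lower())
    if key not in GAZETTEER:
        raise ValueError(f"unknown locality: {municipality}, {country}")
    lat, lon, uncertainty = GAZETTEER[key]
    return Location(municipality.strip(), country.strip(), lat, lon, uncertainty, source)


def validate(location):
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    if not -90 <= location.latitude <= 90 or not -180 <= location.longitude <= 180:
        errors.append("coordinates out of range")
    if location.latitude == 0 and location.longitude == 0:
        errors.append("lat:0, long:0 looks like a placeholder")
    if location.uncertainty_m <= 0:
        errors.append("uncertainty must be positive")
    return errors


loc = georeference(" Sao Paulo ", "Brazil")
print(loc, validate(loc))
```

Running `validate` at data-entry time, before the record is saved, is what makes this prevention rather than later detection.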
BDD Taxon Tool: Step 1 of 2 – Taxonomic Name Selection
BDD Taxon Tool: Step 2 of 2 – Taxonomic Hierarchy Selection
BDD Taxon Tool: Taxonomic data form filled
BDD Taxon Tool
Improved
Completeness: the taxonomic hierarchy is filled from a taxon name
Consistency: consistent data are obtained from a consistent source (Catalogue of Life)
Credibility: data are associated with a credible source (Catalogue of Life)
Accuracy: avoids spelling mistakes and entry of an incorrect taxonomic hierarchy
Precision: suggestions of complete scientific names
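The two steps of name selection followed by hierarchy filling could look roughly like this. It is a sketch under assumptions: a tiny in-memory `CHECKLIST` stands in for a lookup against a taxonomic authority such as the Catalogue of Life, and `suggest` and `fill_hierarchy` are hypothetical names rather than BDD functions.

```python
# Hypothetical in-memory checklist standing in for a taxonomic authority lookup.
CHECKLIST = {
    "Apis mellifera": {
        "kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
        "order": "Hymenoptera", "family": "Apidae", "genus": "Apis",
    },
    "Apis cerana": {
        "kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
        "order": "Hymenoptera", "family": "Apidae", "genus": "Apis",
    },
    "Bombus terrestris": {
        "kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
        "order": "Hymenoptera", "family": "Apidae", "genus": "Bombus",
    },
}


def suggest(prefix, limit=10):
    """Step 1: offer complete scientific names matching what the user typed."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = sorted(name for name in CHECKLIST if name.lower().startswith(needle))
    return matches[:limit]


def fill_hierarchy(name):
    """Step 2: fill the higher ranks from the selected name instead of typing them."""
    ranks = CHECKLIST.get(name)
    if ranks is None:
        raise KeyError(f"name not in checklist: {name}")
    return {"scientificName": name, **ranks}


print(suggest("api"))
print(fill_hierarchy("Bombus terrestris"))
```

Because the user picks a name from the suggestions rather than typing it in full, spelling errors and mismatched hierarchies are prevented at entry time.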
Conclusion
Integrated existing techniques, tools, and credible data sources into a species occurrence data-entry tool
Improved completeness, consistency, accuracy and precision of species occurrence data
Error prevention in taxonomic and location data
Tools available for an audience with little literacy in data digitization and DQ
Conclusion
Next steps
Other tools, techniques, dimensions, error patterns and domains of data quality in biodiversity are yet to be explored and added
Work on error correction in existing data
Spreadsheet-based data correction
Suggestions and collaboration are welcome!
Acknowledgements
IABIN – PTN
  Laurie Adams (P2), Mike Ruggiero (ITIS), Mike Frame, Liz Sellers and Ben Wheeler (USGS)
  Pedro Correa (University of São Paulo)
  All data grantees
FAO-UNEP-GEF Pollinator project in Brazil
  Barbara Gemmil-Herren (FAO)
  Ministry of the Environment - Brazil
  All data grantees
Thank you
Allan Koch Veiga
Etienne Americo Cartolano
Antonio Mauro Saraiva
Agricultural Automation Laboratory – LAA
Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil