Taxonomic Databases Working Group Annual Meeting 2011
GBIF: Issues in providing federated access to digital information related to biological specimens.
David Remsen
Senior Programme Officer
Global Biodiversity Information Facility (GBIF)
TDWG 2011
Issue #1: The consequences of scale
Goal – Provide timely access to a large federated network of biodiversity databases
About GBIF
• 341 publishers
• 9290 datasets
• 310M records
The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.
• 57 countries
• 45 organisations
Primary biodiversity data
“Wrapper” Software
Your database (e.g., Insect Collection, Bird Observations, Herbarium)
Install one of these ‘wrappers’
• PyWrapper (Python)
• TAPIR Link (PHP)
• DiGIR (PHP)
Data shared as ABCD or DarwinCore
The promise of federation
Insect Collection, Herbarium, Bird Observations, Herbarium
Any specimens from Thailand?
GBIF Data Portal
I will ask!
I do! I do! I do! Nope!
GBIF Data Portal as a Gateway
The challenge of federation
Insect Collection, Herbarium, Bird Observations, Herbarium
Hello?
Server Not Available
Server Not Available
GBIF Data Portal
Hi!
The rise of Indexing
Insect Collection, Herbarium, Bird Observations, Herbarium
Any data records from Thailand?
Send me an index of all of your data
GBIF Data Portal (now with Data!)
GBIF Data Portal as a Data Index
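The contrast between the gateway and the index can be sketched in a few lines. This is an illustration only: the `Provider` class, its `search` and `dump` methods, and the sample records are hypothetical stand-ins, not the wrapper protocols or the portal's actual code.

```python
class ProviderUnavailable(Exception):
    """Raised when a (hypothetical) provider server does not respond."""


class Provider:
    def __init__(self, name, records, online=True):
        self.name = name
        self.records = records      # list of dicts with a "country" key
        self.online = online

    def search(self, country):
        # Live query answered by the provider at request time.
        if not self.online:
            raise ProviderUnavailable(self.name)
        return [r for r in self.records if r["country"] == country]

    def dump(self):
        # Full export, used once to build the central index.
        if not self.online:
            raise ProviderUnavailable(self.name)
        return list(self.records)


def federated_search(providers, country):
    """Gateway model: ask every provider at query time."""
    hits, failed = [], []
    for p in providers:
        try:
            hits.extend(p.search(country))
        except ProviderUnavailable:
            failed.append(p.name)   # its records are silently missing
    return hits, failed


def build_index(providers):
    """Index model: harvest each provider once, group records by country."""
    index = {}
    for p in providers:
        for r in p.dump():
            index.setdefault(r["country"], []).append(r)
    return index


# Invented sample data for illustration.
insects = Provider("Insect Collection", [{"id": "i1", "country": "Thailand"}])
birds = Provider("Bird Observations", [{"id": "b1", "country": "Thailand"}])
herbarium = Provider("Herbarium", [{"id": "h1", "country": "Kenya"}])
providers = [insects, birds, herbarium]

index = build_index(providers)      # harvested while every server was up
birds.online = False                # later, one server becomes unavailable

live_hits, failed = federated_search(providers, "Thailand")
indexed_hits = index.get("Thailand", [])
```

In this sketch the live search misses the bird record once that server is down, while the index still returns it. The trade-off is that the index is only as fresh as the last harvest.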
The wrong tools for the job
Insect Collection, Herbarium, Bird Observations, Herbarium
Any data records from Thailand?
Send me an index of your data once per month
Here is page one.
If I go offline, start again
Not too fast!
You ask the same questions every time
GBIF Data Portal (now with Data!)
Darwin Core Archives
A text-based solution to publishing biodiversity data
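Because an archive is a zip file of delimited text, a consumer needs little more than a zip reader and a CSV parser. The sketch below assumes a tab-delimited core file named `occurrence.txt` with a header row. A real archive describes its files and columns in a `meta.xml` descriptor, which this example skips. The sample rows are invented; the column names are Darwin Core terms.

```python
import csv
import io
import zipfile


def build_sample_archive():
    """Create a tiny in-memory archive so the example is self-contained."""
    core = (
        "occurrenceID\tscientificName\tcountry\n"
        "occ-1\tPanthera tigris\tThailand\n"
        "occ-2\tTrochilus polytmus\tJamaica\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("occurrence.txt", core)
    buf.seek(0)
    return buf


def read_core(archive, core_name="occurrence.txt"):
    """Return the core file's rows as a list of dicts keyed by column name."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(core_name) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


records = read_core(build_sample_archive())
thai = [r for r in records if r["country"] == "Thailand"]
```

The whole dataset arrives in one download from one URL, so there is no paging and no per-query load on the publisher's server.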
A Refined Approach
Insect Collection, Herbarium, Bird Observations, Herbarium
Any data records from Thailand?
This is fast!
GBIF Data Portal (now with Data!)
This is easy
URL URL URL URL
Growth
Chart axis: 2007, 2008, 2009, 2010, Today
Chart values: 70 million, 147 million, 180 million, 201 million, 302 million
Need for a new standard identified
Issue #2: Geospatial Integration
Goal – Provide accurate reporting of nationally-bound data
Challenge – Inaccurate recording of geospatial coordinates
Geo-referenced USA data
Verbatim data as shared on the network
Issue #2: Geospatial Integration
• Remediation includes
  • Integration of national shapefiles to verify that coordinates fell within country boundaries
    – Including EEZ boundaries
    – Including islands
  • Identified outliers
  • Qualified the nature of the error (e.g., “coordinates inverted”)
  • Marked and omitted these records from display
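The remediation steps above can be sketched as a point-in-polygon test followed by a retry with latitude and longitude swapped. Everything here is an assumption made for illustration: the crude rectangle stands in for a real national shapefile (with EEZ and islands), and the `classify` labels are hypothetical, not GBIF's actual issue codes.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify(lat, lon, boundary):
    """Label a record's coordinates against the stated country's boundary."""
    if point_in_polygon(lon, lat, boundary):
        return "ok"
    # Retry with the two values swapped to qualify the error.
    if point_in_polygon(lat, lon, boundary):
        return "coordinates inverted"
    return "outside boundary"


# Rough rectangle, NOT a real boundary: a placeholder for a shapefile polygon.
USA_BOX = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]

good = classify(lat=40.0, lon=-100.0, boundary=USA_BOX)
swapped = classify(lat=-100.0, lon=40.0, boundary=USA_BOX)
outlier = classify(lat=40.0, lon=10.0, boundary=USA_BOX)
```

Records labelled anything other than "ok" would be the ones marked and omitted from display.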
Geo-referenced USA data
Data following interpretation
- Coastal regions recognised
- Offshore islands recognised
Issue #3: Taxonomic Integration
• Goal – Provide access to biodiversity data according to taxonomic groups and concepts
• Challenge –
  – Heterogeneous and sometimes inaccurate classification
    • Same taxon appearing in different classifications
  – Presence of homonyms that complicate reconciling above
  – Misspellings
  – Wide range of orthographies for the same name.
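A minimal sketch of how these challenges interact when matching names, assuming a small reference list of (name, kingdom) pairs. The list, the `match_name` helper, and the 0.85 similarity cutoff are illustrative choices, not the matching service described in the talk. Misspellings and case or spacing variants are handled with fuzzy string matching from Python's `difflib`; homonyms stay ambiguous unless extra context such as kingdom is supplied.

```python
import difflib

# Illustrative reference list. Oenanthe is a genuine homonym:
# a plant genus (water dropworts) and a bird genus (wheatears).
REFERENCE = [
    ("Oenanthe", "Plantae"),
    ("Oenanthe", "Animalia"),
    ("Trochilidae", "Animalia"),
]


def normalise(name):
    """Collapse whitespace and case so orthographic variants compare equal."""
    return " ".join(name.split()).casefold()


def match_name(raw, kingdom=None, cutoff=0.85):
    """Return the reference entries that best match a verbatim name."""
    canon = {}
    for name, k in REFERENCE:
        canon.setdefault(normalise(name), []).append((name, k))
    close = difflib.get_close_matches(
        normalise(raw), list(canon), n=1, cutoff=cutoff
    )
    if not close:
        return []
    candidates = canon[close[0]]
    if kingdom is not None:
        # Context from the record's classification resolves homonyms.
        candidates = [c for c in candidates if c[1] == kingdom]
    return candidates


misspelled = match_name("Trochillidae")               # extra "l"
ambiguous = match_name("oenanthe")                    # two candidates
resolved = match_name("  OENANTHE ", kingdom="Plantae")
unknown = match_name("Quercus")
```

When `ambiguous` holds more than one candidate, the record cannot be placed in a single taxonomic group from the name alone, which is the Oenanthe problem a user faces in search results.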
Enabled taxonomic data to be published through GBIF
Trochilidae (Hummingbirds) (today)
Misinterpretations (Hummingbirds are only found in western hemisphere)
Trochilidae (Hummingbirds) (next month)
Improved interpretation
Search for Oenanthe (water dropwort plant or wheatear bird)
Difficult for user to interpret
Accurate search results
Today
Next month
Improved the means to match names
In summary
• GBIF has had to deploy different data access strategies in order to effectively scale
• Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF
• Geospatial filtering via shapefiles provides basis for more accurate national reporting
  – Basis for additional services later (e.g., ecosystem shapefiles, protected areas, etc.)
• Heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure.
  – Comprehensive authoritative taxonomic data is a key organisational component of collections data
Thank you