


BioData: a new bioassessment database for the USGS

Briefing for the CDI 2011.06.08

http://aquatic.biodata.usgs.gov

Today

• What is BioData?
• Why Did We Build It?
• Current Capabilities
• Future Possibilities
• Data Integration/Interoperability Challenges

What is BioData? – in a nutshell

A data management, storage, and distribution system for aquatic bioassessment data.

• data capture
• data curation
• data publication

Why We Built It - A Brief History

1992 – The National Water-Quality Assessment (NAWQA) Program began collecting bioassessment data (macroinvertebrates, fish, algae, and stream habitat)

NAWQA Study Units


1992 – 1999: Local data management and national data aggregations

1999 – NAWQA national bioassessment database (BioTDB)

WRD Needs Assessment (2006)

Surveyed WRD Science Centers to find out:
• How much aquatic ecology data is being collected outside the NAWQA Program?
• What kinds?
• What methods?
• Where and how are data being stored?

What We Discovered

• Collaborative water projects with other agencies, states, localities, and partners are producing as much data as the NAWQA Program.
• 80% of WSCs reported projects collecting aquatic ecology data.
• 120 projects had a macroinvertebrate, fish, algae, or habitat component (2000–2005).
• Approximately 15,000 samples.
• The majority of samples are being collected using NAWQA and USEPA national stream bioassessment protocols.
• Samples are being sent to a variety of taxonomic labs.

What We Discovered

The data are stored electronically but are very difficult to discover, access, and integrate:
• 47% in Excel
• 13% in EPA databases
• 19% in home-grown relational databases
(79% combined)

U.S. Department of the Interior, U.S. Geological Survey

BioData: a new bioassessment database for the USGS

Briefing for the USGS GCMRC 5/9/2011

http://aquatic.biodata.usgs.gov

What Should We Do?

1. Do nothing?

2. Implement a federated system?

3. Incrementally refurbish existing NAWQA database?

4. Redesign and “re-build” using modern, web-enabled, extensible architecture? (BioData)

BioData – Version 1 Objective

A data storage, retrieval, and distribution system for aquatic bioassessment data most commonly produced by USGS WRD projects.

“Most Commonly Produced”

• Project objectives: bioassessment and monitoring
• Setting: streams and rivers
• Types of data: macroinvertebrates, fish, algae, study reach habitat
• Sampling protocols: NAWQA, USEPA

Additional Characteristics

• An internet application
• Available to any USGS ecologist
• Designed to be adapted and extended
• Supports scientific workflow
• Serves as an online data archive
• Curates taxonomic nomenclature: maps it forward and harmonizes it across all the data
• Supports data exchange between biologists and labs
• Makes it easy to add web data services
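"Mapping nomenclature forward" can be pictured as following a chain of name replacements until reaching the current name, then applying that to every record so old and new data share one name. The sketch below assumes a simple synonym table (old name to newer name); the table contents and function names are hypothetical placeholders, not BioData's actual data model.

```python
# Hypothetical synonym table: each outdated name points to the name that replaced it.
# Placeholder names only; these are not real taxonomic revisions.
SYNONYMS = {
    "Taxon A": "Taxon B",
    "Taxon B": "Taxon C",
}


def map_forward(name, synonyms):
    """Follow the synonym chain until reaching a name with no newer replacement."""
    seen = {name}
    current = name
    while current in synonyms:
        current = synonyms[current]
        if current in seen:
            raise ValueError(f"cycle in synonym table at {current!r}")
        seen.add(current)
    return current


def harmonize(names, synonyms):
    """Map every name in a dataset forward so old and new records share one name."""
    return [map_forward(n, synonyms) for n in names]


if __name__ == "__main__":
    print(harmonize(["Taxon A", "Taxon C", "Taxon D"], SYNONYMS))
    # -> ['Taxon C', 'Taxon C', 'Taxon D']
```

The cycle check guards against a curation error in the table; names absent from the table are treated as already current.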

[Figure: BioData architecture. Labels: BioData Input (project data management: field data input, data exchange with labs, data review); inputs: field data, lab data, external data (NAWQA legacy data); BioData Retrieval (DWH) (data distribution: public web site, web data services, application-specific output).]

Data Retrieval Features (https://aquatic.biodata.usgs.gov)

Real-time feedback on how many samples your query will return

Save a query to your desktop, then email it to colleagues so they can run it

A variety of file formats

Multiple data sets downloaded in one step
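Two of these features fit together naturally if a query is stored as a small description of filters rather than as data: the count can be recomputed whenever a filter changes, and the same description can be saved to a file and shared. The sketch below assumes that design; the `SampleQuery` fields, the sample dictionaries, and the JSON file layout are hypothetical and are not BioData's actual query format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SampleQuery:
    """Hypothetical saved-query definition; field names are illustrative only."""
    community: str  # e.g. "fish", "macroinvertebrates", "algae"
    states: List[str] = field(default_factory=list)
    start_year: Optional[int] = None
    end_year: Optional[int] = None

    def matches(self, sample: dict) -> bool:
        if sample["community"] != self.community:
            return False
        if self.states and sample["state"] not in self.states:
            return False
        if self.start_year is not None and sample["year"] < self.start_year:
            return False
        if self.end_year is not None and sample["year"] > self.end_year:
            return False
        return True


def count_matches(query: SampleQuery, samples: List[dict]) -> int:
    """Count samples the query would return; rerun whenever a filter changes."""
    return sum(1 for s in samples if query.matches(s))


def save_query(query: SampleQuery, path: str) -> None:
    """Write the query (not the data) to a small JSON file that can be shared."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(query), f, indent=2)


def load_query(path: str) -> SampleQuery:
    with open(path, encoding="utf-8") as f:
        return SampleQuery(**json.load(f))


if __name__ == "__main__":
    import os
    import tempfile

    samples = [
        {"community": "fish", "state": "VA", "year": 2003},
        {"community": "fish", "state": "MD", "year": 1998},
        {"community": "algae", "state": "VA", "year": 2004},
    ]
    q = SampleQuery(community="fish", states=["VA", "MD"], start_year=2000)
    print(count_matches(q, samples))  # -> 1
    path = os.path.join(tempfile.mkdtemp(), "fish_query.json")
    save_query(q, path)
    assert load_query(path) == q
```

Sharing the query rather than an extract means a colleague who runs it later gets whatever data currently match.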

Data Retrieval Demo

https://aquatic.biodata.usgs.gov


Data Input/Management Features

• Retrieve restricted (unreleased) data
• Manage and organize data by project
• Project control over rights to enter and edit data
• Built-in help and data validation checks
• Auto-saving
• Data entry screens tailored to field sheets
• Send electronic orders to labs
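Project-level edit rights and built-in validation can be combined in one submission step: check who is allowed to write to the project, then check the record itself, and store it only if both pass. This is a minimal sketch under assumed names; `Project`, `submit`, and the two validation rules are hypothetical examples, not BioData's actual checks.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Project:
    """Hypothetical project record listing the users allowed to enter and edit its data."""
    name: str
    editors: Set[str] = field(default_factory=set)
    records: List[dict] = field(default_factory=list)


def validate_fish_count(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes.
    These rules are illustrative examples, not BioData's actual checks."""
    problems = []
    if not record.get("taxon"):
        problems.append("taxon is required")
    count = record.get("count")
    if not isinstance(count, int) or isinstance(count, bool) or count < 0:
        problems.append("count must be a non-negative integer")
    return problems


def submit(user: str, project: Project, record: dict) -> List[str]:
    """Check edit rights first, then validate; store the record only if it passes."""
    if user not in project.editors:
        raise PermissionError(f"{user} may not edit project {project.name!r}")
    problems = validate_fish_count(record)
    if not problems:
        project.records.append(record)
    return problems


if __name__ == "__main__":
    p = Project("Example stream survey", editors={"alice"})
    print(submit("alice", p, {"taxon": "Taxon A", "count": 4}))  # -> []
    print(submit("alice", p, {"taxon": "", "count": -1}))
```

Returning a list of problems, rather than stopping at the first one, lets a data entry screen show every issue at once.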

Data Input/Mgt Demo

Data integration – touchpoints

• First challenge – find the data
• Second challenge – compatible methods?

• Third challenge – get the data

We need to pick a data exchange standard
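Without a shared standard, every lab's file needs its own translation onto a common record layout. The sketch below shows that translation step for CSV files; the three exchange fields, the lab names, and their column names are hypothetical placeholders, and the code does not stand for any particular exchange standard.

```python
import csv
import io
from typing import Dict, List

# Hypothetical shared exchange fields; a real standard would define many more.
EXCHANGE_FIELDS = ["sample_id", "taxon", "count"]

# Hypothetical per-lab mappings from each lab's column names to the shared fields.
LAB_COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "lab_a": {"SampleID": "sample_id", "TaxonName": "taxon", "Abundance": "count"},
    "lab_b": {"samp": "sample_id", "species": "taxon", "n": "count"},
}


def to_exchange(lab: str, csv_text: str) -> List[dict]:
    """Translate one lab's CSV into records that use only the shared field names."""
    mapping = LAB_COLUMN_MAPS[lab]
    records = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rec = {mapping[col]: val for col, val in raw.items() if col in mapping}
        missing = [f for f in EXCHANGE_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"{lab}: missing fields {missing}")
        rec["count"] = int(rec["count"])
        records.append(rec)
    return records


if __name__ == "__main__":
    a = to_exchange("lab_a", "SampleID,TaxonName,Abundance\nS1,Baetis,12\n")
    b = to_exchange("lab_b", "samp,species,n\nS1,Baetis,12\n")
    print(a == b)  # -> True
```

Each additional lab adds one more mapping; an agreed standard would let labs emit the shared layout directly and remove this per-lab step.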

• Fourth challenge – harmonize taxonomy

Does “Thienemannimyia group” = “Thienemannimyia gr.”? Does ITIS solve this?
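The spelling half of the “Thienemannimyia group” vs. “Thienemannimyia gr.” question can be handled by normalizing qualifier abbreviations before comparing names. This sketch assumes a small, hypothetical abbreviation table; it settles only spelling variants, not whether two labs' “group” designations cover the same set of species, which remains a taxonomic judgment.

```python
# Hypothetical abbreviation table for qualifiers, keyed in lower case.
QUALIFIERS = {"gr.": "group", "gr": "group", "grp.": "group", "grp": "group", "group": "group"}


def normalize_taxon(name: str) -> str:
    """Collapse whitespace and spell out qualifier abbreviations such as 'gr.'."""
    return " ".join(QUALIFIERS.get(tok.lower(), tok) for tok in name.split())


if __name__ == "__main__":
    print(normalize_taxon("Thienemannimyia gr.") == normalize_taxon("Thienemannimyia group"))
    # -> True
```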

ITIS


• Only handles published names
• We have to handle unpublished names:
  – Provisional = new taxon claimed but not “officially” published
  – Conditional = uncertain or indeterminate identification, e.g., “Thienemannimyia group”

ITIS is not complete for all groups:
• Fish – good, we can integrate tightly with it
• Macroinvertebrates – doable
• Algae – ITIS not ready yet
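Because ITIS covers only published names, a local name table has to carry provisional and conditional names alongside them. One possible design, sketched here as an assumption rather than BioData's actual schema, records each name's status plus a link to the nearest published name it can be rolled up to when comparing datasets. `TaxonName`, `NameStatus`, `rollup_name`, and the roll-up rule are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NameStatus(Enum):
    PUBLISHED = "published"      # name found in an authority such as ITIS
    PROVISIONAL = "provisional"  # new taxon claimed but not officially published
    CONDITIONAL = "conditional"  # uncertain or indeterminate identification


@dataclass(frozen=True)
class TaxonName:
    name: str
    status: NameStatus
    authority_id: Optional[str] = None  # identifier in the authority, when published
    parent: Optional[str] = None        # nearest published name to roll up to


def rollup_name(t: TaxonName) -> str:
    """Name used for cross-dataset comparison: published names as-is, others via parent."""
    if t.status is NameStatus.PUBLISHED:
        return t.name
    if t.parent is None:
        raise ValueError(f"{t.name!r} has no published parent to roll up to")
    return t.parent


if __name__ == "__main__":
    group = TaxonName("Thienemannimyia group", NameStatus.CONDITIONAL, parent="Thienemannimyia")
    print(rollup_name(group))  # -> Thienemannimyia
```

Rolling up to the parent trades taxonomic resolution for comparability; the original name and status stay on the record.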

Data integration – touchpoints


• Fifth challenge – integrate with physicochemical and ancillary data

A common geospatial framework would help
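Once biological and physicochemical samples are located in a common geospatial framework, pairing them can reduce to a join on a shared location key. The sketch assumes each sample already carries a `reach_id` (a hypothetical field name) and ignores sampling dates, which a real integration would also need to match.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def join_by_reach(bio_samples: List[dict], chem_samples: List[dict]) -> List[Tuple[dict, dict]]:
    """Pair each biological sample with every physicochemical sample on the same reach.
    Assumes both inputs carry a shared 'reach_id' (hypothetical field name)."""
    chem_by_reach: Dict[str, List[dict]] = defaultdict(list)
    for c in chem_samples:
        chem_by_reach[c["reach_id"]].append(c)
    return [(b, c) for b in bio_samples for c in chem_by_reach.get(b["reach_id"], [])]


if __name__ == "__main__":
    bio = [{"sample_id": "B1", "reach_id": "R1"}, {"sample_id": "B2", "reach_id": "R9"}]
    chem = [{"sample_id": "C1", "reach_id": "R1"}, {"sample_id": "C2", "reach_id": "R2"}]
    print([(b["sample_id"], c["sample_id"]) for b, c in join_by_reach(bio, chem)])
    # -> [('B1', 'C1')]
```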

NHD (National Hydrography Dataset)

• Which NHD?
• An NHD “snap to” service with APIs that developers could use in their applications?
• A service to translate an NHD address to other versions of NHD (including future versions)?
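At its core, a “snap to” service finds the point on the nearest stream line closest to a sampling location. The sketch below does that by brute force over every line segment, assuming planar (projected) coordinates and flowlines given as lists of vertices keyed by a hypothetical ID; it is not an actual NHD API, and a production service would also need a spatial index and a proper map projection or geodesic distances.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Project p onto segment a-b, clamping to the segment's endpoints."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a  # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_flowlines(p: Point, flowlines: Dict[str, Sequence[Point]]) -> Tuple[str, Point, float]:
    """Return (flowline id, snapped point, distance) for the nearest flowline.
    Brute force over every segment, in planar coordinates."""
    best: Optional[Tuple[str, Point, float]] = None
    for flowline_id, coords in flowlines.items():
        for a, b in zip(coords, coords[1:]):
            q = closest_point_on_segment(p, a, b)
            d = math.dist(p, q)
            if best is None or d < best[2]:
                best = (flowline_id, q, d)
    if best is None:
        raise ValueError("no flowline segments to snap to")
    return best


if __name__ == "__main__":
    lines = {"reach-1": [(0.0, 0.0), (10.0, 0.0)], "reach-2": [(0.0, 5.0), (10.0, 5.0)]}
    print(snap_to_flowlines((3.0, 1.0), lines))
```

Clamping `t` to [0, 1] keeps the snapped point on the segment itself, so a site beyond the end of a reach snaps to that reach's endpoint.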

http://aquatic.biodata.usgs.gov

BioData

For more information, contact: Pete Ruhl, pmruhl@usgs.gov, 703-648-6841
