U.S. Department of the Interior U.S. Geological Survey BioData http://aquatic.biodata.usgs.gov

Upload: toby-hamilton

Post on 26-Dec-2015


U.S. Department of the Interior
U.S. Geological Survey

BioData

http://aquatic.biodata.usgs.gov

• What is BioData?
• Objectives
• Development Highlights (Agile Methodology)
• Future Directions

WHAT IS BIODATA

What Is BioData?

An internet-accessible database application for managing, storing, and distributing USGS aquatic bioassessment data

Internet Application

• All you need is a web browser
• No VPN required
• Updates from the data center (you don’t have to do anything)
• Feature rich

BioData is three systems

BioDataInput: project data management (field data, lab data)

• field data input
• data exchange with labs
• data review

PublicRetrieval: data distribution

• public web site
• web data services
• application-specific output

InternalRetrieval

• internal web site
• internal analysis
• finalized data
• preliminary and restricted data

BioData is an Internet application

• Data management tool
• Online data archive
• Data-distribution system
• Taxonomic harmonization system
• Ecologist-lab data exchange system

OBJECTIVES

1. Purpose and Scope

2. USGS Water Mission Area needs

3. Data Provenance and Quality

Purpose and Scope

Secure, web-enabled data management system for capturing, storing, archiving, curating, and distributing aquatic bioassessment data collected or compiled by USGS science projects

Data used to characterize and investigate the condition of water-dependent and water-related ecological resources

Water Resources Needs Assessment 2007

• From 2000-2006, Water Science Centers (WSCs) collected ~15,000 biology samples in 147 projects
• Project data are stored electronically but are time-consuming to discover and access
• NAWQA Program has collected > 20,000 macroinvertebrate, algae, fish, and stream habitat samples at 2,200 sites (since 1994); data stored in BioTDB

USGS Water Mission Area

• Philosophically, we recognize that certain kinds of data sets lend themselves to a more centralized strategy
• Similar goals to the NWIS QWDATA system
• Fundamental community composition data for stream bioassessments use a fairly narrow range of methods and objectives
• Data integration needs

Data Provenance and Quality 1/2

1. Data produced using USGS-approved sampling and laboratory methods

a. data traceable to the methods used to produce them

2. Automated data validation and verification checks flag erroneous and suspect data

3. Data must be coded as reviewed and accepted by USGS and must pass all validation checks before it is released to the public.
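
The release rule in item 3 can be sketched as a simple gate. This is a minimal illustration and not BioData's actual implementation: the record fields (`taxon`, `count`, `review_status`), the status value `"accepted"`, and the two checks in `validate` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical result record; all field names are assumptions."""
    taxon: str
    count: int
    review_status: str = "pending"  # assumed values: "pending", "accepted"
    flags: list = field(default_factory=list)


def validate(result):
    """Run illustrative automated checks and return a list of flag strings."""
    flags = []
    if not result.taxon.strip():
        flags.append("missing taxon name")
    if result.count < 0:
        flags.append("negative count")
    return flags


def is_releasable(result):
    """A record is public only if it was accepted in review AND has no flags."""
    result.flags = validate(result)
    return result.review_status == "accepted" and not result.flags


results = [
    SampleResult("Hydropsyche", 12, review_status="accepted"),
    SampleResult("Hydropsyche", 12),                             # not yet reviewed
    SampleResult("Ceratopsyche", -3, review_status="accepted"),  # fails a check
]
public = [r for r in results if is_releasable(r)]
```

Only the first record ends up in `public`; the other two stay internal until they are reviewed or corrected.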

Data Provenance and Quality 2/2

4. Laboratory authorization process

a. demonstrate compliance with business rules and data standards

5. Taxonomic nomenclature is controlled and managed by domain experts

a. maintain name-validation rules

b. map stored data

c. curate stored data for scientific advancements

6. At this time, sampling sites must be in the USGS National Water Information System (NWIS)

Extensible design: Support for Bioassessment & Monitoring data

Current | Vision
Streams | Lakes, wetlands, coastal areas, …
Fish, macroinvertebrates, algae, physical habitat | Mussels, crayfish, zooplankton
NAWQA and NRSA protocols | Additional protocols
Retrieve tables of count data | Biotic indices, maps, graphs
USGS projects | DOI bureaus, other agencies
Integrate with NWIS water-quality and flow data through shared site framework | Integrate with partners through NHD spatial framework
Interactive data entry, lab batch loads | Field data entry, mobile apps & devices

Data Management Support

Data Life Cycle Diagram

Data Management Working Group of the Community for Data Integration (CDI)

https://my.usgs.gov/confluence/display/cdi/Home

Quality management features

• Automated data validation
• Data review tools

BioData Taxonomy

• Developed Taxonomic Management utility
• Data delivered using consistent, up-to-date nomenclature and hierarchy
• All provisional/conditional identities are mapped to published names (ICZN, ICBN)
• ITIS TSNs are reported for fish and macroinvertebrate data
• Originally-reported (bench) identities are stored and delivered in result data tables

Bench Name | Reference | BioData Name | Published Name
Hydropsyche | unspecified | Ceratopsyche/Hydropsyche | Hydropsychinae
Hydropsyche | Wiggins, 1996 | Ceratopsyche/Hydropsyche | Hydropsychinae
Hydropsyche | Morse and Holzenthal, 2008 | Hydropsyche | Hydropsyche
Ceratopsyche | Morse and Holzenthal, 2008 | Ceratopsyche | Ceratopsyche
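
The table above reads naturally as a lookup keyed on the bench name plus the taxonomic reference the lab used. The following is a minimal sketch under that assumption. The dictionary holds only the four rows shown on the slide, and the names `TAXON_MAP` and `harmonize` are hypothetical, not part of BioData.

```python
# (bench name, reference) -> (BioData name, published name).
# Rows copied from the slide's table; the structure itself is an assumption.
TAXON_MAP = {
    ("Hydropsyche", "unspecified"): ("Ceratopsyche/Hydropsyche", "Hydropsychinae"),
    ("Hydropsyche", "Wiggins, 1996"): ("Ceratopsyche/Hydropsyche", "Hydropsychinae"),
    ("Hydropsyche", "Morse and Holzenthal, 2008"): ("Hydropsyche", "Hydropsyche"),
    ("Ceratopsyche", "Morse and Holzenthal, 2008"): ("Ceratopsyche", "Ceratopsyche"),
}


def harmonize(bench_name, reference="unspecified"):
    """Return the bench identity together with its mapped names.

    The originally reported (bench) name is kept in the output, mirroring
    the slide's point that bench identities are stored and delivered.
    """
    key = (bench_name, reference)
    if key not in TAXON_MAP:
        raise LookupError(f"no mapping for {bench_name!r} ({reference})")
    biodata_name, published_name = TAXON_MAP[key]
    return {
        "bench_name": bench_name,
        "reference": reference,
        "biodata_name": biodata_name,
        "published_name": published_name,
    }


old_sense = harmonize("Hydropsyche", "Wiggins, 1996")
new_sense = harmonize("Hydropsyche", "Morse and Holzenthal, 2008")
```

The same bench name resolves differently depending on the reference: under Wiggins (1996) it maps to the subfamily Hydropsychinae, while under Morse and Holzenthal (2008) it stays at genus level.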

DEVELOPMENT HIGHLIGHTS

Organization

• Steering Committee
• Pete Ruhl, Project Manager
• Biological Users Group (BUG)
• Developers
  • Center for Integrated Data Analytics (CIDA), Middleton, Wis.
  • Flexion Inc., Madison, Wisconsin (Contractor)

Timeline

• FY 2007 – Planning starts
• 2009 February – Iteration 1
• 2011 March – Production release to NAWQA Program
• 2012 Spring – Available to USGS

Agile Development Approach

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

From – Rasmusson, Jonathan, 2010, The Agile Samurai: How Agile Masters Deliver Great Software: Dallas, Texas, The Pragmatic Bookshelf.

Release Planning Meeting

1. Write Stories (around broad themes)

a. A story is (who, what, why): As a <type of user>, I want <some goal>, so that <some reason>

b. Example: As a data administrator, I want all batch-uploaded data to be tagged with original record information so that we can maintain the provenance of the data

2. Assign Value

a. Vote, discuss, rewrite, consensus

3. Complexity

4. Pick Stories

Summary

From – Rasmusson, Jonathan, 2010, The Agile Samurai: How Agile Masters Deliver Great Software: Dallas, Texas, The Pragmatic Bookshelf.

FUTURE DIRECTIONS

What’s in the pipeline?

• Data management support for more sampling and laboratory protocols
• More ways to bring data into the system
  • Batch-loading data - Make more data available
  • Field apps
• Expand data distribution, sharing
  • Web services
  • Map interface for discovering, retrieving data

BioData

For more information:

Use the online form

OR

Pete Ruhl [email protected] 703-648-6841

Mitch Harris [email protected] 217-328-9716