U.S. Department of the Interior, U.S. Geological Survey: BioData
TRANSCRIPT
What Is BioData?
An internet-accessible database application for managing, storing, and distributing USGS aquatic bioassessment data
http://aquatic.biodata.usgs.gov
Internet Application
All you need is a web browser
No VPN required
Updates from the data center (you don’t have to do anything)
Feature rich
BioData is three systems
BioDataInput: project data management (field data, lab data)
• field data input
• data exchange with labs
• data review
PublicRetrieval: data distribution
• public web site
• web data services
• application-specific output
InternalRetrieval: internal analysis
• internal web site
• finalized data
• preliminary and restricted data
BioData is an Internet application
Data management tool
Online data archive
Data-distribution system
Taxonomic harmonization system
Ecologist lab data exchange system
OBJECTIVES
1. Purpose and Scope
2. USGS Water Mission Area needs
3. Data Provenance and Quality
Purpose and Scope
Secure, web-enabled data management system for capturing, storing, archiving, curating, and distributing aquatic bioassessment data collected or compiled by USGS science projects
Data used to characterize and investigate the condition of water-dependent and water-related ecological resources
Water Resources Needs Assessment 2007
From 2000 to 2006, Water Science Centers (WSCs) collected ~15,000 biology samples in 147 projects
Project data are stored electronically but are time-consuming to discover and access
The NAWQA Program has collected more than 20,000 macroinvertebrate, algae, fish, and stream habitat samples at 2,200 sites since 1994; these data are stored in BioTDB
USGS Water Mission Area
Philosophically, we recognize that certain kinds of data sets lend themselves to a more centralized strategy
Goals are similar to those of the NWIS QWDATA system
Fundamental community-composition data for stream bioassessments use a fairly narrow range of methods and objectives
Data integration needs
Data Provenance and Quality 1/2
1. Data are produced using USGS-approved sampling and laboratory methods
a. data are traceable to the methods used to produce them
2. Automated data validation and verification checks flag erroneous and suspect data
3. Data must be coded as reviewed and accepted by USGS and must pass all validation checks before they are released to the public
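The release rule in item 3 can be sketched as a simple gate. This is a minimal illustration in Python, not BioData's actual implementation: the `SampleRecord` class, its field names, and the two example checks are hypothetical, chosen only to show how automated flags and a review status could combine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical record structure; field names are illustrative, not BioData's schema.
@dataclass
class SampleRecord:
    site_id: str
    taxon: str
    count: Optional[int]
    reviewed_and_accepted: bool = False
    flags: List[str] = field(default_factory=list)


# Each check returns a flag message, or None if the record passes.
Check = Callable[[SampleRecord], Optional[str]]


def check_count_present(rec: SampleRecord) -> Optional[str]:
    return None if rec.count is not None else "missing count"


def check_count_non_negative(rec: SampleRecord) -> Optional[str]:
    if rec.count is not None and rec.count < 0:
        return "negative count"
    return None


# Assumed example checks; a real system would have many more.
CHECKS: List[Check] = [check_count_present, check_count_non_negative]


def validate(rec: SampleRecord) -> List[str]:
    """Run every check and store the resulting flags on the record."""
    rec.flags = [msg for chk in CHECKS if (msg := chk(rec)) is not None]
    return rec.flags


def is_releasable(rec: SampleRecord) -> bool:
    """Public release requires reviewer acceptance and no validation flags."""
    return rec.reviewed_and_accepted and not validate(rec)
```

In this sketch a record that passes every automated check is still withheld until it is marked as reviewed and accepted, which mirrors the two conditions stated in item 3.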
Data Provenance and Quality 2/2
4. Laboratory authorization process
a. demonstrate compliance with business rules and data standards
5. Taxonomic nomenclature is controlled and managed by domain experts
a. maintain name-validation rules
b. map stored data
c. curate stored data for scientific advancements
6. At this time, sampling sites must be in the USGS National Water Information System (NWIS)
Extensible design
Support for Bioassessment & Monitoring data
Current | Vision
Streams | Lakes, wetlands, coastal areas, …
Fish, macroinvertebrates, algae, physical habitat | Mussels, crayfish, zooplankton
NAWQA and NRSA protocols | Additional protocols
Retrieve tables of count data | Biotic indices, maps, graphs
USGS projects | DOI bureaus, other agencies
Integrate with NWIS water-quality and flow data through shared site framework | Integrate with partners through NHD spatial framework
Interactive data entry, lab batch loads | Field data entry, mobile apps & devices
Data Management Support
Data Life Cycle Diagram
Data Management Working Group of the Community for Data Integration (CDI)
https://my.usgs.gov/confluence/display/cdi/Home
BioData Taxonomy
Developed Taxonomic Management utility
Data delivered using consistent, up-to-date nomenclature and hierarchy
All provisional/conditional identities are mapped to published names (ICZN, ICBN)
ITIS TSNs are reported for fish and macroinvertebrate data
Originally reported (bench) identities are stored and delivered in result data tables
Bench Name | Reference | BioData Name | Published Name
Hydropsyche | unspecified | Ceratopsyche/Hydropsyche | Hydropsychinae
Hydropsyche | Wiggins, 1996 | Ceratopsyche/Hydropsyche | Hydropsychinae
Hydropsyche | Morse and Holzenthal, 2008 | Hydropsyche | Hydropsyche
Ceratopsyche | Morse and Holzenthal, 2008 | Ceratopsyche | Ceratopsyche
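The mapping in the table above can be sketched as a lookup keyed on both the bench name and the taxonomic reference the lab used. This is a hedged illustration in Python, not the Taxonomic Management utility itself: the `TaxonMapping` type, the `NAME_MAP` dictionary, and the `harmonize` function are hypothetical names, and only the four rows from the table are included.

```python
from typing import Dict, NamedTuple, Tuple


class TaxonMapping(NamedTuple):
    biodata_name: str
    published_name: str


# Keys are (bench name, reference); values are copied from the slide's table.
NAME_MAP: Dict[Tuple[str, str], TaxonMapping] = {
    ("Hydropsyche", "unspecified"):
        TaxonMapping("Ceratopsyche/Hydropsyche", "Hydropsychinae"),
    ("Hydropsyche", "Wiggins, 1996"):
        TaxonMapping("Ceratopsyche/Hydropsyche", "Hydropsychinae"),
    ("Hydropsyche", "Morse and Holzenthal, 2008"):
        TaxonMapping("Hydropsyche", "Hydropsyche"),
    ("Ceratopsyche", "Morse and Holzenthal, 2008"):
        TaxonMapping("Ceratopsyche", "Ceratopsyche"),
}


def harmonize(bench_name: str, reference: str = "unspecified") -> TaxonMapping:
    """Look up the BioData and published names for a bench identification.

    Raises KeyError for combinations that are not in the mapping table.
    """
    return NAME_MAP[(bench_name, reference)]


if __name__ == "__main__":
    for (bench, ref), mapped in NAME_MAP.items():
        print(f"{bench} ({ref}) -> {mapped.published_name}")
```

Keying on the reference as well as the name reflects what the table shows: the same bench name, Hydropsyche, resolves to a different published name depending on which reference the identification followed.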
Organization
Steering Committee
Pete Ruhl, Project Manager
Biological Users Group (BUG)
Developers
• Center for Integrated Data Analytics (CIDA), Middleton, Wisconsin
• Flexion Inc., Madison, Wisconsin (Contractor)
Timeline
FY 2007 – Planning starts
2009 February – Iteration 1
2011 March – Production release to NAWQA Program
2012 Spring – Available to USGS
Agile Development Approach
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
From – Rasmussen, Jonathan, 2010, The Agile Samurai: How Agile Masters Deliver Great Software: Dallas, Texas, The Pragmatic Bookshelf.
Release Planning Meeting
1. Write Stories (around broad themes)
A story is (who, what, why)
As a <type of user>, I want <some goal>, so that <some reason>
Example: As a data administrator, I want all batch-uploaded data to be tagged with original record information so that we can maintain the provenance of the data
2. Assign Value
Vote, discuss, rewrite, consensus
3. Complexity
4. Pick Stories
Summary
From – Rasmussen, Jonathan, 2010, The Agile Samurai: How Agile Masters Deliver Great Software: Dallas, Texas, The Pragmatic Bookshelf.
What’s in the pipeline?
Data management support for more sampling and laboratory protocols
More ways to bring data into the system
• Batch-loading data: make more data available
• Field apps
Expand data distribution and sharing
• Web services
• Map interface for discovering and retrieving data
BioData
For more information:
Use the online form
OR
Pete Ruhl [email protected] 703-648-6841
Mitch Harris [email protected] 217-328-9716