Cyndy Chandler 22 July 2011
Biological and Chemical Oceanography Data Management Office
(BCO-DMO)
SOST IWG-OP Biodiversity Ad Hoc Committee ~ July 2011 Quarterly Meeting ~ Washington, DC
bco-dmo.org Biological and Chemical Oceanography Data Management Office
BCO-DMO
What is BCO-DMO?
Who is BCO-DMO?
Why is BCO-DMO different?
How do we accomplish our task?
Outline
Discussion: Data Management for Biodiversity Research
BCO-DMO staff provide data management support for investigators and projects funded by NSF Ocean Sciences Biological and Chemical Oceanography Sections or NSF OPP ANT Organisms & Ecosystems Program
partner with individual investigators and those associated with collaborative research projects
data management support throughout the project
capture and record documentation (metadata) sufficient to support data reuse and re-purposing
load data and metadata into a relational database and ensure their availability online
ensure final archive in appropriate data center (e.g. NODC); contribute to special repositories (e.g. CDIAC, OBIS, GenBank)
‘proposal to preservation’
What is BCO-DMO?
BCO-DMO Staff
Biology Department
  Peter Wiebe (Lead Investigator)
  Robert Groman (co-PI)
  Dicky Allison (Data Specialist)
  Tobias Work (Programmer)
Marine Chemistry and Geochemistry
  David Glover (co-PI)
  Cyndy Chandler (co-PI)
  Stephen Gegg (Data Specialist)
additional data specialists, consultants and collaborators as needed
Who is BCO-DMO?
BCO-DMO staff are funded to …
support NSF OCE and OPP funded researchers
ensure that data are …
  available to the research community in a timely manner
  sufficiently documented to facilitate reuse and re-purposing
work with investigators during all phases of research:
  data management planning and stewardship
  proposal writing
  cruise preparation
  cruise and data documentation
  effective organization of data in the BCO-DMO data system
  permanent archive of data at NODC
Why is BCO-DMO different?
How do we accomplish our task?
BCO-DMO staff work in partnership with PIs to create well-documented data sets from research programs
involving a wide variety of sampling gear
Data Discovery and Availability
our primary task is to ensure that data from NSF OCE funded awards are freely available online
the BCO-DMO data system and interfaces facilitate
  data discovery (text and map-based browse systems)
  data access to assess fitness-for-purpose
  data export and download
  data preservation in a permanent archive (the National Oceanographic Data Center (NODC))
How do we accomplish our task?
Field Data to Database
in situ data from research cruises are documented and contributed to the online data system and discoverable through a variety of user interfaces
How do we accomplish our task?
Original data from Bongo net tows and CTD/Niskin Rosette
MOCNESS data – paper to digital
“Data Management in the Wild” ~ MOCNESS Data
hauled in by people . . .
. . . the samples are processed by people, observations recorded by people, and digital data sets created by people
[Figure: MOCNESS Sampling. Panel labels: raw biology data, raw physical data, digital biology data, digital physical data, CTD sensor data]
MapServer Starting Screen
http://bco-dmo.org/BCO-DMO
Geospatial MapServer interface showing all available data.
MapServer with selections
access to data
BCO-DMO staff work in partnership with PIs to create well-documented data sets to enable reuse
and re-purposing of data to support US contributions to large coordinated
research programs and global ocean research themes
How do we accomplish our task?
BCO-DMO and Other Data Repositories
BCO-DMO is part of a network of distributed data repositories working to support the research community and to ensure that data are available in the public domain.
Carbon Dioxide Information Analysis Center
North American Carbon Program
Long Term Ecological Research Network
National Center for Biotechnology Information: GenBank
Rolling Deck to Repository (R2R)
How do we accomplish our task?
“A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.”
In: Advice to a Young Investigator (Santiago Ramón y Cajal, 1897)
Thank you.
Questions?
photo by Chris Linder (WHOI)
http://bco-dmo.org/
What additional cyber-infrastructure is needed to support biodiversity research?
What else is needed to support biodiversity research?
The remaining slides are a supplement to the talk that may be useful during the data management discussion.
NSF Dimensions of Biodiversity Program
data from 9 awards to be managed by BCO-DMO
NSF OCE #1046144 Dimensions: The Role of Viruses in Structuring Biodiversity in Methanotrophic Marine Ecosystems
NSF OCE #1046017 and #1046098 Dimensions: Significance of nitrification in shaping planktonic biodiversity in the ocean
NSF OCE #1045966, #1046001, #1046368 and #1046297 Dimensions: Biological controls on the ocean C:N:P ratios
NSF OCE #1046371 and #1046372 Dimensions: Uncovering the novel diversity of the copepod microbiome and its effect on habitat invasions by the copepod host
What else is needed to support biodiversity research?
Marine Biodiversity Operation Network
Extended research network being considered
Infrastructure Options
Challenge:
  there are currently many sources with overlapping and/or incomplete information
  researchers must locate resources, resolve conflicts/duplicates, review and ‘repair’ retrieved data
Strategies and Solutions:
  data warehousing – extract, transfer, load data
  data federation – network of distributed repositories; data remain at the source and are retrieved on demand
  data aggregation – central catalog (e.g. EOL)
What else is needed to support biodiversity research?
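The difference between warehousing and federation can be illustrated with a minimal sketch. Everything here is hypothetical: the `Repository`, `Warehouse` and `Federation` classes, the repository names and the sample records are in-memory stand-ins invented for illustration, not part of any BCO-DMO system.

```python
from typing import Dict, List

Record = Dict[str, str]


class Repository:
    """Stand-in for one source repository (hypothetical, in-memory)."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self.records = records

    def query(self, taxon: str) -> List[Record]:
        # Tag each hit with its source so provenance is not lost.
        return [dict(r, source=self.name) for r in self.records if r["taxon"] == taxon]


class Warehouse:
    """Data warehousing: copy records into one central store at load time."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def load_from(self, repo: Repository) -> None:
        self.records.extend(dict(r, source=repo.name) for r in repo.records)

    def query(self, taxon: str) -> List[Record]:
        return [r for r in self.records if r["taxon"] == taxon]


class Federation:
    """Data federation: leave records at the source and ask each repository on demand."""

    def __init__(self, repos: List[Repository]) -> None:
        self.repos = repos

    def query(self, taxon: str) -> List[Record]:
        hits: List[Record] = []
        for repo in self.repos:
            hits.extend(repo.query(taxon))
        return hits


# Illustrative sample data (made up for this sketch).
repo_a = Repository("repo_a", [{"taxon": "Calanus finmarchicus", "count": "12"}])
repo_b = Repository("repo_b", [{"taxon": "Oithona similis", "count": "40"}])

warehouse = Warehouse()
warehouse.load_from(repo_a)
warehouse.load_from(repo_b)
federation = Federation([repo_a, repo_b])

# A source adds a record after the warehouse was loaded.
repo_b.records.append({"taxon": "Calanus finmarchicus", "count": "7"})

warehouse_hits = warehouse.query("Calanus finmarchicus")
federation_hits = federation.query("Calanus finmarchicus")
print(len(warehouse_hits), len(federation_hits))  # prints "1 2"
```

In this toy case the warehouse copy goes stale until it is reloaded, while the federated query sees the new record because it asks the source each time. The cost, not shown here, is that every source must be reachable and must answer the same kind of query.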
Advantages and Disadvantages
data warehousing – one central repository for all data
  one system ‘one stop shop’ is rarely appropriate for all data types
  data and information loss during transfer
data federation – network of distributed repositories
  data remain closer to the ‘source of origin’ and local expertise
  data and information loss is limited
  requires negotiated arrangements (standards) to support interoperability of distributed systems
  long-term preservation must be considered
data aggregation (e.g. EOL)
What else is needed to support biodiversity research?
Interoperability
the ability of different data repository systems to exchange and integrate data and information and present a unified view to the user
requires syntactic (format) compatibility
  e.g. access/security, file formats, transfer protocols
  to retrieve data and information
requires semantic (language) compatibility
  e.g. metadata standards, controlled vocabularies, ontologies
  to understand data and information
What else is needed to support biodiversity research?
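The two kinds of compatibility can be sketched in a few lines. This is an assumption-laden illustration: the two payloads, their field names and the shared vocabulary in `FIELD_MAP` are invented for the example and do not describe the actual formats of any repository named in this talk.

```python
import csv
import io
import json
from typing import Dict, List

# Two hypothetical repositories return the same kind of observation
# in different file formats and with different field names.
CSV_PAYLOAD = "species,lat,lon\nCalanus finmarchicus,42.5,-67.0\n"
JSON_PAYLOAD = (
    '[{"scientificName": "Calanus finmarchicus", '
    '"decimalLatitude": 41.9, "decimalLongitude": -66.2}]'
)


# Syntactic step: know the format well enough to retrieve the records.
def parse_csv(text: str) -> List[Dict[str, object]]:
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text: str) -> List[Dict[str, object]]:
    return json.loads(text)


# Semantic step: map each local field name to one shared term
# (a tiny, made-up controlled vocabulary).
FIELD_MAP = {
    "species": "taxon",
    "scientificName": "taxon",
    "lat": "latitude",
    "decimalLatitude": "latitude",
    "lon": "longitude",
    "decimalLongitude": "longitude",
}


def harmonize(record: Dict[str, object]) -> Dict[str, object]:
    out: Dict[str, object] = {}
    for key, value in record.items():
        shared = FIELD_MAP[key]
        # Coordinates arrive as text from CSV and as numbers from JSON.
        out[shared] = float(value) if shared in ("latitude", "longitude") else value
    return out


# Unified view: one list of records, one set of field names.
unified = [harmonize(r) for r in parse_csv(CSV_PAYLOAD) + parse_json(JSON_PAYLOAD)]
for row in unified:
    print(row)
```

Parsing alone would leave two incompatible sets of field names. The mapping table is where the negotiated standards (metadata standards, controlled vocabularies) would do their work in a real system.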
Trans-disciplinary, cross-agency collaboration and cooperation
a workshop of 100 invited participants held in Broomfield, Colorado in March 2011
NSF sponsored with support from USGS
primary objective: “to substantially advance discussions and directions of data life cycle, data integration and data citation, with strong emphasis on end-use, and to provide a state-of-the-field report to NSF and the USGS of the geoinformatics community’s capabilities and needs ...”
final report (in progress) http://tw.rpi.edu/web/Workshop/Community/GeoData2011
Geo-Data Informatics 2011 Workshop
Exploring the Life Cycle, Citation and Integration of Geo-Data
What else is needed to support biodiversity research?
some thoughts . . .
integration of distributed, loosely federated data repositories
designed to foster biodiversity research and assessment
Microbes to Mammals
Habitat to Health
Taxonomy to Tipping Points
What else is needed to support biodiversity research?
Data repositories for biodiversity research?
What else is needed to support biodiversity research?
BCO-DMO
LTER sites
NCBI GenBank
OBIS
MICROBIS: ICoMM Marine Microbes Database
EOL
Protein Data Bank (3D structures of DNA, RNA)
Cell Image Library (cellimagelibrary.org)
NOAA, NASA, EPA and USGS sites
Literature (some are proprietary)
Coordinating groups for biodiversity research?
What else is needed to support biodiversity research?
NSF, NOAA, NASA, EPA and USGS agency program managers, representatives, committees
Interagency Working Groups and Advisory Committees
Scientific Steering Committees
Interagency Working Group on Ocean Observations (IWGOO) Support Office hosted at the Consortium for Ocean Leadership
Other considerations:
What are the connection axes (geospatial, temporal, organism/taxon/species name)?
PI name (e.g. Web of Science ResearcherID; or ORCID - Open Researcher and Contributor ID)
Data provenance is very important
Persistent identifiers (DOIs ?)
References (reciprocal links) to published literature
Access to proprietary information
What else is needed to support biodiversity research?
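One way to picture the connection axes is as a test for whether two records from different repositories plausibly describe the same sampling event. The sketch below is only an illustration under stated assumptions: the `Observation` type, the `linked` function, the tolerance values and the sample records are all hypothetical, and the spatial check is a naive box in degrees that ignores the date line and the shrinking of longitude degrees toward the poles.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    source: str   # which repository holds the record
    taxon: str    # organism/taxon/species name axis
    day: date     # temporal axis
    lat: float    # geospatial axis
    lon: float


def linked(a: Observation, b: Observation, max_days: int = 7, max_deg: float = 0.5) -> bool:
    """Return True when two records agree on all three connection axes."""
    same_taxon = a.taxon.strip().lower() == b.taxon.strip().lower()
    close_in_time = abs((a.day - b.day).days) <= max_days
    close_in_space = abs(a.lat - b.lat) <= max_deg and abs(a.lon - b.lon) <= max_deg
    return same_taxon and close_in_time and close_in_space


# Made-up records: a net tow count, a sequence record, and an unrelated tow.
tow = Observation("repo_a", "Calanus finmarchicus", date(2011, 7, 1), 42.5, -67.0)
sequence = Observation("repo_b", "calanus finmarchicus ", date(2011, 7, 4), 42.3, -66.8)
distant = Observation("repo_c", "Calanus finmarchicus", date(2011, 7, 2), 30.0, -50.0)

print(linked(tow, sequence))
print(linked(tow, distant))
```

Plain string matching on names is the weakest part of this sketch. In practice the taxon axis would need a shared taxonomic reference so that synonyms and spelling variants resolve to the same organism.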
Existing Repositories
Other considerations:
Long tail or ‘dark data’ (Heidorn 2008)
Other considerations
What are the use cases?
Benedict, et al. 2007
What additional cyber-infrastructure is needed to support biodiversity research?
What else is needed to support biodiversity research?
Additional repositories?
What about the *omics data?
Connections between repositories?
Standards (semantic and syntactic)
Advisory groups, workshops and governance systems
Existing Repositories
CDIAC
Carbon Dioxide Information Analysis Center - Ocean CO2
http://cdiac.ornl.gov/oceans/
TCO2 (DIC)
TALK
pH
pCO2
CFCs
SF6
CCl4
CaCO3
DOC, TOC
TDN
dC14
OBIS - USA
Ocean Biodiversity Information System (OBIS) - USA
http://obisusa.nbii.gov (will redirect)
GenBank http://www.ncbi.nlm.nih.gov/genbank/
SUBMIT DATA
SEARCH for DATA