TRANSCRIPT
Data Resources
US Perspective
Kerstin Lehnert Suzanne Carbotte
Lamont-Doherty Earth Observatory of Columbia University
Scientific Data in the Digital Age
“It is exceedingly rare that fundamentally new approaches to research and education arise. Information technology has ushered in such a fundamental change. Digital data collections are at the heart of this change.”
US National Science Board, Report to the US National Science Foundation, 2005
Access to Data
“Effective access to research data, in a responsible and efficient manner, is required to take advantage of the new opportunities and benefits offered by new information and communication technologies.”
Organization for Economic Co-operation & Development: “Principles and Guidelines for Access to Research Data from Public Funding”, May 2007
Open Access to Data: Benefits
- Democratize access to research resources
- Ensure broad dissemination of results
- Facilitate new cross-disciplinary approaches - access for non-specialist users
- Enable verification of research results
- Provide new research opportunities
- Provide access to data from a variety of sources and enable integration across fields
- Provide foundation for use of automated tools
- Facilitate more efficient use of resources
  - Data are often expensive to collect (especially marine!)
  - Data are often/usually unique; repeat collection/analysis is rare
Data Synthesis ‘the Old Way’
Months to Years
Data Synthesis Today
2 Minutes
Data Visualization: 2 Minutes
GeoMapApp software: www.geomapapp.org
Sharing Research Data: USA
“GAO recommends the agencies explore opportunities in the grants process to better ensure the availability of data to other researchers and determine if additional archiving strategies are warranted.”
GAO Report #07-1172, September 28, 2007
Existing US Data Resources relevant for MARGINS Science
- Marine Geoscience Data System: hosts the MARGINS Data Portal
- Geoinformatics for Geochemistry: hosts PetDB, SedDB, SESAR, EarthChem (links to GEOROC & NAVDAT)
- NGDC: Marine geoscience data - mostly legacy programs
- IRIS: Seismic network data and earthquake catalogs
- UNAVCO: GPS data
- GEON: Lidar data
- SIO-GDC: hosts marine geoscience data from Scripps expeditions
- WHOI: hosts data from vehicles of the NDSF
www.marine-geo.org
www.geoinfogeochem.org
PetDB
SESAR Sample Registry
EarthChem
SedDB
Antarctic Multibeam
Seismic Reflection Field Data Center
MARGINS
Ridge2K
Legacy
GfG & MGDS Collaborations & Partnerships
Boston Univ
Oregon State
Boise State
University of Kansas
WHOI
Scripps
Texas A&M
UTIG
NGDC
University of NH
Data & IT • GEON • UNAVCO • USGS • IODP • ICDP • Pangaea • CoreWall • PaleoStrat • MetPetDB • LEPR
Science
Development
Operation

- Data modeling
- Metadata standards
- QC & ingestion procedures
- Data submission tools & procedures

- Solicitation & Compilation
- Ingestion
- Quality Control
- Documentation
- Curation
- User support
- Archiving

- Web applications
- Query tools
- Download options
- Web services, XML
- Visualization & data analysis tools

- System operation
- Maintenance
- User support

- Education modules
- Presentations
- Publications
- Exhibits & demos
- Workshops & short courses
- Web sites (News etc.)

Data
Services
Access
Education & Outreach
Scope of the MGDS
- Metadata catalog: Central cruise catalog and data repository for all MARGINS programs - important goal is to preserve full data collection context for each expedition
- Sensor Database: data documentation and access for multibeam and geophysical data from Palmer & Gould and MCS reflection data from Ewing & Langseth
- Global DEM: Synthesis of multibeam bathymetry into the Global Multi Resolution Topography - GMRT
- MG&G Legacy data and derived data
- Tools for data access: lower barrier to data access with tools tailored to science needs
October 23-24, 2007
MARGINS Database
Provides access to expedition information & data for all MARGINS funded marine and some terrestrial programs
Diverse data collected during these programs hosted within MARGINS database:
- swath bathymetry
- gravity and magnetics
- MCS reflection
- water column data (BLISP, CTD)
- side-scan sonar mapping data
- rock and fluid sampling information
Database includes links to WHOI (near bottom camera), UTIG (processed MCS), IRIS (seismometer), UNAVCO (GPS)
MGDS Data Holdings
MGDS Access Interfaces
Data Link (server side)
GeoMapApp (client side)
Web services
Access data hosted at distributed data repositories
Access to data at distributed data repositories
Alvin and Jason2 near bottom photos
With bathymetry tiles exposed through a programmatic interface, the data can also be used in Google Earth
GfG Program: Scope
- PetDB, SedDB, EarthChem data sets: Build and provide access to integrated compilations of large volumes of geochemical data; desktop access to the entire published geochemical literature within minutes
- EarthChem Portal: Central access point to the broadest range of geochemical data in federated databases
- SESAR Sample Registry: Provide global unique identifiers for samples; build global sample catalog
Database Features
Archive & serve integrated data sets of geochemical data (each individual value searchable)
Include complete metadata of samples and analytical procedures for searching and data evaluation
Offer interactive, dynamic user interfaces that allow extraction of any customized subset of the data
Support data analysis
- Tools for data quality assessment & control
- Tools for visualization (map interfaces, plotting tools)
- Integration with broader Geoscience data via interoperability & partnerships
EarthChem Data
EarthChem Portal
Access via GeoMapApp
Ambiguous Sample Naming
Examples from the PetDB Database
Name    Location            Publication       Cruise
D3-1    SEIR                ANDERSON, 1980    VM3301 (Vema)
D3-1    North Fiji Basin    EISSEN 1994       Starmer 1 (Nadir)
D3-1    Shimada Smt         GRAHAM 1988       S1-79 (Sea Sounder)
D3-1    Gorda Ridge         CLAGUE 1984       KK2-83NP (Kana Keoki)
3-1     Lamont Smts         BATIZA 1982       RISE III (New Horizon)
Sample names are duplicated.
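The duplication problem can be illustrated with a minimal Python sketch. It assumes nothing beyond the rows listed in the PetDB examples; the function name `find_name_collisions` and the tuple layout are hypothetical, chosen only for illustration, and do not reflect how PetDB itself stores or checks sample names.

```python
from collections import defaultdict

# Rows transcribed from the PetDB examples: (name, location, publication, cruise).
SAMPLES = [
    ("D3-1", "SEIR", "ANDERSON, 1980", "VM3301 (Vema)"),
    ("D3-1", "North Fiji Basin", "EISSEN 1994", "Starmer 1 (Nadir)"),
    ("D3-1", "Shimada Smt", "GRAHAM 1988", "S1-79 (Sea Sounder)"),
    ("D3-1", "Gorda Ridge", "CLAGUE 1984", "KK2-83NP (Kana Keoki)"),
    ("3-1", "Lamont Smts", "BATIZA 1982", "RISE III (New Horizon)"),
]


def find_name_collisions(samples):
    """Return {name: [locations]} for names attached to more than one location.

    Hypothetical helper: a name that maps to several locations cannot
    serve as a unique key for a sample.
    """
    by_name = defaultdict(set)
    for name, location, _publication, _cruise in samples:
        by_name[name].add(location)
    return {
        name: sorted(locations)
        for name, locations in by_name.items()
        if len(locations) > 1
    }


collisions = find_name_collisions(SAMPLES)
for name, locations in collisions.items():
    print(f"{name} is used for samples from: {', '.join(locations)}")
```

In this sketch the name alone is treated as the lookup key, which is exactly what fails: one key returns four unrelated samples.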
Sample names are modified or changed.
Dredge sample 3, Amphitrite Cruise 1963/4

Name           Publication
D3             Engel 1964
D-3            Scheidegger 1981, Schilling 1971
PD3            Tatsumoto 1965, 1966
PD-3           Hedge 1970, Muehlenbach 1972
PV D-3         Engel 1965
AMPH3D         Pineau 1976
AMPH-D3        MacDougall 1986
AMPH D-3       Sun 1980, Schilling 1975
AMPH 3-PD-3    Hart 1971
S-10           Subbarao 1972
Name                           Publication
46 396B 22 3,28-38             Dungan 1978
396B 22 3,28-38                Muehlenbach 1979
249                            Dungan 1978
DSDP046-0396B-022-003/28-38    PetDB
DSDP Leg 46, Hole 396B, Section 22, Sample 3, 28-33cm
International Geo Sample Number IGSN
- SESAR serves as registry that provides & manages unique identifiers for samples
- IGSN - International Geo Sample Number
- Obtained upon submission of sample metadata (registration)
- Implementation in sample collection & curation ongoing (IODP, core repositories)
- About 4 million samples registered
- System still under development
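The idea of a registry that issues one identifier per sample at registration can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class `ToySampleRegistry`, its methods, and the `TOY000001`-style identifiers are all hypothetical and are not the SESAR interface or the actual IGSN syntax.

```python
import itertools


class ToySampleRegistry:
    """Hypothetical in-memory registry; NOT the SESAR API or real IGSN syntax."""

    def __init__(self, prefix="TOY"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._records = {}  # identifier -> metadata dict
        self._aliases = {}  # published name -> set of identifiers

    def register(self, metadata, aliases=()):
        """Store sample metadata and return a newly issued unique identifier."""
        identifier = f"{self._prefix}{next(self._counter):06d}"
        self._records[identifier] = dict(metadata)
        for alias in aliases:
            self._aliases.setdefault(alias, set()).add(identifier)
        return identifier

    def lookup(self, identifier):
        """Return the metadata stored for one identifier."""
        return self._records[identifier]

    def resolve(self, name):
        """Return the sorted identifiers known under a published sample name."""
        return sorted(self._aliases.get(name, set()))


registry = ToySampleRegistry()
amph = registry.register(
    {"description": "Dredge sample 3, Amphitrite Cruise 1963/4"},
    aliases=["D3", "D-3", "PD3", "AMPH-D3"],
)
# A second, made-up sample that happens to share the published name "D3".
other = registry.register(
    {"description": "hypothetical second dredge sample"},
    aliases=["D3"],
)

print(registry.resolve("PD3"))  # one candidate
print(registry.resolve("D3"))   # two candidates: the name alone is ambiguous
```

The design choice shown is that published names are kept only as aliases, while the issued identifier is the sole key for the metadata record, so renamed or duplicated names no longer affect identity.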
Challenges for Open Data Access
Improving Global Data Access
Agreed on statements of principle and recommendations to address technical, procedural, and organizational issues of open global data sharing.
“Building a Global Data Network for Studies of Earth Processes at the World’s Plate Boundaries”
International Workshop, Kiel (Germany), May 2007.
Attended by 71 people from 14 countries.
Sponsored by the MARGINS, Ridge2000, InterMARGINS, InterRIDGE programs.
Workshop Recommendations
Science User Needs
- Access to all data needed to reproduce scientific results
- Access to multidisciplinary & integrated marine & terrestrial data
Data Documentation & Publication
- Uniform best practices & standards for data acquisition, data submission to data centers & data publication
- Easy procedures for metadata creation & data submission
Data & Metadata Interoperability
- Minimize proliferation of metadata standards
- Development of a data discovery service across distributed data resources
Opportunities & Obstacles for International Data Sharing
- Leverage international bodies & programs (e.g. GEOSS, eGY, ICSU, IPY)
- Establish dedicated task group & special interest groups to advance implementation of a global data network
Cyberinfrastructure
Geoinformatics = Cyberinfrastructure for the Geosciences
Goal: A genuine infrastructure of highly reliable, widely accessible capabilities and services to support the entire range of scientific work.
Infrastructure Components
- Technological Infrastructure
- Institutional & Management Models
- Legal & Policy Framework
- Financial Support
- Cultural & Behavioral Changes
MARGINS: TAMU*
LEGACY: NGDC/UNH
Ridge2000: WHOI*
Antarctic MBS
Seismic Reflection DMS: UTIG (Lead)
MGDS