Metadata Standards for Gridded Climate Data in the Earth System Grid
Robert Drach LLNL/PCMDI
UCRL-PRES-149779
Drach 2 Sept. 10, 2002
Overview
I. Earth System Grid: Grid Access to Climate Research Data
II. Metadata Standards for Gridded Climate Data
Part I
ESG: Grid Access to Climate Research Data
The goal of ESG is to make climate data – particularly climate model data – an easily accessible community resource. The project is funded by the SciDAC program: Scientific Discovery through Advanced Computing.
Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities – minimize the amount of data movement.
Multiple interfaces to ESG will allow researchers to focus on science rather than issues of data transfer, format, and data set manipulation.
Foundation is Globus Grid technology
Earth System Grid Overview
Globus middleware supports linkage of distributed data archives, supercomputers, workstations, local disk caches into data/computational grids.
GridFTP: high-performance, secure, robust data transfer mechanism: protocol, server, client library.
ESG is integrating OpenDAP (DODS protocol) with the GridFTP protocol.
Single sign-on using Grid Security Infrastructure
Proxy certificates
Community Authorization Service (CAS)
Replica Location Service: manages copying and placement of files in a distributed environment.
Logical vs. physical files
http://www.globus.org
ESG uses Globus Grid technology.
ESG: U.S. Collaborations & Development
ORNL: Climate storage & computational resources
ANL: Computational grids & grid-based applications
USC/ISI: Computational grids & grid-based applications
NCAR: Climate change prediction and scenarios
LBNL: Climate storage facility and access
LLNL: Model diagnostics & inter-comparison
Program for Climate Model Diagnosis and Intercomparison
Validation and intercomparison of atmospheric general circulation models, coupled ocean-atmosphere models
Development of analysis software, quality control, archiving, distribution of model results. Climate Data Analysis Tools (CDAT) is a Python-based analysis and visualization system.
Global warming detection studies
CMIP (coupled models) and AMIP (atmospheric GCMs) gather model simulation results from thirty modeling groups worldwide.
PCMDI and Model Development
[Diagram labels: Modeling groups; Controlled simulation runs; Simulation data; PCMDI (diagnosis, quality control, data archival); Feedback to modelers; Observations; Data assimilation; Gridded observation data]
ESG-II Architecture
Portals
Servers
Middleware
ESG: Metadata Services
[Diagram: ESG metadata services architecture]
Layers: ESG clients API & user interfaces; high-level metadata services; core metadata services; metadata holdings
Client functions: search & discovery, administration, browsing & display, analysis & visualization
Service components: metadata extraction, display, browsing, query, annotation, validation, aggregation, discovery; metadata & data registration; publishing; metadata access (update, insert, delete, query); service translation library
Holdings: data & metadata catalog, Dublin Core database, CF database, mirror, Dublin Core XML files, comments XML files
OpenDAP (DODS): Distributed Oceanographic Data System (Unidata)
  Integration of Globus GridFTP, DODS data access
THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)
LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)
  Works with CDAT, Ferret, GrADS, …
CDAT: Climate Data Analysis Tools (PCMDI), includes CDMS: Climate Data Management System, VCDAT visualization
Community Data Portal project (NCAR)
NCL (NCAR)
Globus Grid technology (ANL, ISI): GridFTP, CAS Community Authorization Service
ESG leverages existing software and projects.
CDAT: Example of an ESG GUI Client Access
LAS/CDAT: Example of a Web-based Data Portal
Technology: Web-based (end user requirements)
  LAS, DODS, ESG (i.e., Globus), CDAT
Portal should hide/simplify the Grid for users
  Single sign-on
  Community-based authorization
  Simplified resource location
  Remote job submission, management
Accesses the ESG Grid Testbed
Part II
Metadata Standards for Gridded Climate Data
Most climate simulation data are in the form of gridded datasets: collections of variables as a function of longitude, latitude, time, and vertical level.
A dataset is a logical container:
  A file
  An aggregation of files
  A collection of database tables
Model-generated data
  Model data
  Derived data: zonal averages, global averages, virtual variables
Observational data, including reanalyses
Attributes in the form of (name, value) pairs, array values
Climate Model Datasets
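As a rough illustration of the dataset picture above, the sketch below models a gridded variable with named dimensions and (name, value) attributes, then computes a zonal average as an example of derived data. The `Variable` class and `zonal_average` function are hypothetical names invented for this sketch; they are not part of CDAT, netCDF, or any ESG software.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Variable:
    """Hypothetical gridded variable: dimensions, nested-list values, attributes."""
    name: str
    dims: tuple            # dimension names, outermost first
    values: list           # nested lists; outermost index is the first dimension
    attributes: dict = field(default_factory=dict)


def zonal_average(var):
    """Derived data: average over longitude, leaving a function of latitude."""
    assert var.dims == ("latitude", "longitude")
    return Variable(
        name=var.name + "_zonal_mean",
        dims=("latitude",),
        values=[mean(row) for row in var.values],
        attributes=dict(var.attributes),
    )


# Made-up sample values on a 2 x 3 latitude/longitude grid.
temperature = Variable(
    name="temperature",
    dims=("latitude", "longitude"),
    values=[[280.0, 282.0, 284.0], [290.0, 291.0, 292.0]],
    attributes={"units": "K"},
)
zonal = zonal_average(temperature)
```

The derived variable keeps the attributes of its source, which is one simple way a derived quantity can remain self-describing.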
Binary formats are a suitable basis for storing data, but lack the metadata to support certain application requirements.
netCDF (UCAR)
  array data model
  flexible attribute/value metadata model
  simple API
HDF (NCSA, NASA)
  collection of APIs, can be tailored to specific data models including scientific data sets, satellite data, point data
Binary formats
GRIB (WMO, ECMWF, NCEP)
  mixed sequential/array data model
  tailored for simulation output, supports common horizontal grid types
  hardwired metadata model
  good compression capabilities
  lacks a standard API
Binary formats
Self-describing binary formats are flexible, but underconstrain representation of coordinate systems.
Coordinate Systems
Index Space
Variable Space
Coordinate Space
Coordinate System: Time(i), Latitude(j,k), Longitude(j,k)
V = Temperature(Time, Latitude, Longitude)
V’ = Temperature(i,j,k)
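The relationship between index space and coordinate space can be sketched in a few lines. The example assumes the coordinate system named on this slide, Time(i), Latitude(j,k), Longitude(j,k); the array values and function names are made up for illustration only.

```python
# Coordinate variables map index space to coordinate space.
time = [0.0, 1.0]                             # Time(i)
latitude = [[10.0, 10.5], [20.0, 20.5]]       # Latitude(j, k)
longitude = [[100.0, 110.0], [101.0, 111.0]]  # Longitude(j, k)

# The variable is stored in index space: temperature[i][j][k].
temperature = [
    [[280.0, 281.0], [282.0, 283.0]],
    [[284.0, 285.0], [286.0, 287.0]],
]


def coordinates(i, j, k):
    """Return the (time, latitude, longitude) location of index (i, j, k)."""
    return (time[i], latitude[j][k], longitude[j][k])


def located_values():
    """Pair every stored value with its coordinate-space location."""
    return {
        coordinates(i, j, k): temperature[i][j][k]
        for i in range(len(time))
        for j in range(len(latitude))
        for k in range(len(latitude[0]))
    }
```

Because latitude and longitude here depend on both j and k, neither can be treated as a simple 1D axis, which is the case that a convention has to describe explicitly.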
Curvilinear grid - Los Alamos POP ocean model
Horizontal Grids
Temperature(i,j)
Latitude(i,j)
Longitude(i,j)
Lat_bounds(i,j,4)
Lon_bounds(i,j,4)
Reduced grid
Horizontal Grids
Temperature(i,j)
Latitude(i)
Longitude(i,j)
Lat_bounds(i,2)
Lon_bounds(i,j,4)
General grid – Colorado State geodesic grid
Horizontal Grids
Temperature(npts)
Latitude(npts)
Longitude(npts)
Lat_bounds(npts,6)
Lon_bounds(npts,6)
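On a general grid there are no separable latitude and longitude axes, so a regional selection has to test each point's own coordinates. The sketch below assumes the Temperature(npts), Latitude(npts), Longitude(npts) layout shown here; the values and the `select_region` helper are hypothetical.

```python
# One entry per grid point; values are made up for illustration.
latitude = [-60.0, -10.0, 0.0, 30.0, 75.0]         # Latitude(npts)
longitude = [10.0, 120.0, 200.0, 310.0, 45.0]      # Longitude(npts)
temperature = [260.0, 295.0, 300.0, 288.0, 250.0]  # Temperature(npts)


def select_region(lat_range, lon_range):
    """Return (lat, lon, value) for every point inside the closed lat/lon box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        (lat, lon, value)
        for lat, lon, value in zip(latitude, longitude, temperature)
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


tropics = select_region((-30.0, 30.0), (0.0, 360.0))
```

This point-by-point test ignores the cell bounds; a fuller treatment would use Lat_bounds and Lon_bounds to decide which cells overlap the region.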
Applications must be able to recognize the spatial/temporal coordinate axes.
Visualization: continental overlays
Data: selection by axis type
Spatial/temporal location
import cdms

file = cdms.open('sample.nc')
temperature = file['temperature']
# Select by coordinate value: latitudes between 45S and 45N.
data = temperature(latitude=(-45.0, 45.0))
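Selection by coordinate value presupposes that the application knows which axis is latitude. Once that is known, a coordinate range on a 1D axis reduces to an index range. The sketch below shows the idea with the standard library only; it is not how CDMS implements selection, and the `index_range` helper and sample values are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def index_range(axis, lower, upper):
    """Return the (start, stop) slice of an ascending axis within [lower, upper]."""
    return bisect_left(axis, lower), bisect_right(axis, upper)


# Made-up 1D latitude axis and a matching variable.
latitude = [-90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0]
temperature = [230.0, 260.0, 290.0, 300.0, 291.0, 262.0, 235.0]

start, stop = index_range(latitude, -45.0, 45.0)
data = temperature[start:stop]
```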
Climate simulations use different types of calendars:
  ‘proleptic’ Gregorian
  Julian
  Mixed Gregorian/Julian
  No leap years (noleap)
  30-day months
Climatologies represent multi-year averages.
Time representation and calendars
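The effect of the calendar choice can be seen by decoding the same time offset under two model calendars. The sketch below handles only the no-leap and 30-day-month cases; the function name and the calendar labels `noleap` and `360_day` are choices made for this example, and real time decoding involves units, reference dates, and the mixed calendars that are not covered here.

```python
NOLEAP_MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def date_from_days(days, base_year, calendar):
    """Convert whole days since January 1 of base_year to (year, month, day)."""
    if calendar == "360_day":
        month_lengths = [30] * 12
    elif calendar == "noleap":
        month_lengths = NOLEAP_MONTH_LENGTHS
    else:
        raise ValueError("unsupported calendar: " + calendar)

    days_per_year = sum(month_lengths)
    year = base_year + days // days_per_year
    day_of_year = days % days_per_year      # zero-based day within the year
    for month, length in enumerate(month_lengths, start=1):
        if day_of_year < length:
            return (year, month, day_of_year + 1)
        day_of_year -= length


# The same offset lands on different dates in different calendars.
noleap_date = date_from_days(59, 2000, "noleap")
thirty_day_date = date_from_days(59, 2000, "360_day")
```

A standard date library would place day 59 of the year 2000 on February 29, which neither model calendar contains, so the calendar has to travel with the data as metadata.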
Several conventions have been developed to augment the netCDF data model.
Represent a balance between needs of data producers and data consumers.
COARDS convention
  1D coordinate axes, rectilinear horizontal grids
  axis identification based on units
  variables limited to four dimensions
  ordering of dimensions fixed
  http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
Metadata conventions
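Axis identification based on units can be sketched as a small classifier over a coordinate variable's attributes. This is a simplified reading of the idea, not the full rule set of either convention: the unit spellings listed, the use of a `positive` attribute for vertical axes, and the `axis_type` function itself are assumptions made for this example.

```python
# A few common unit spellings; the conventions accept more variants than these.
LATITUDE_UNITS = {"degrees_north", "degree_north", "degrees_N", "degree_N"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degrees_E", "degree_E"}


def axis_type(attributes):
    """Guess the axis type of a coordinate variable from its attributes."""
    units = attributes.get("units", "")
    if units in LATITUDE_UNITS:
        return "latitude"
    if units in LONGITUDE_UNITS:
        return "longitude"
    if " since " in units:          # e.g. "days since 1979-01-01"
        return "time"
    if "positive" in attributes:    # direction of increasing height or depth
        return "vertical"
    return None                     # not recognizable from these rules


# Hypothetical attribute sets for four coordinate variables.
axes = {
    "lat": axis_type({"units": "degrees_north"}),
    "lon": axis_type({"units": "degrees_east"}),
    "time": axis_type({"units": "days since 1979-01-01"}),
    "lev": axis_type({"units": "m", "positive": "up"}),
}
```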
CF (Climate and Forecast) convention
  based on earlier conventions, COARDS and GDT
  multidimensional coordinates (auxiliary coordinate variables)
  simplified axis identification
  specific representation for several horizontal grid types: rectilinear, curvilinear, reduced grids
  variables can have an arbitrary number of dimensions
  no constraint on ordering of dimensions
  non-Gregorian calendars
  standard name table
  http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
Metadata conventions
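Auxiliary coordinate variables are tied to a data variable through an attribute that lists their names. The sketch below assumes the CF `coordinates` attribute holds a blank-separated list of variable names; the in-memory `variables` dictionary and the helper function are hypothetical stand-ins for a real file.

```python
# Hypothetical in-memory description of a file on a curvilinear grid.
variables = {
    "temperature": {
        "dims": ("j", "i"),
        "attributes": {"units": "K", "coordinates": "lat lon"},
    },
    "lat": {"dims": ("j", "i"), "attributes": {"units": "degrees_north"}},
    "lon": {"dims": ("j", "i"), "attributes": {"units": "degrees_east"}},
}


def auxiliary_coordinates(name, variables):
    """Return the auxiliary coordinate variable names listed by a variable."""
    listed = variables[name]["attributes"].get("coordinates", "").split()
    missing = [c for c in listed if c not in variables]
    if missing:
        raise KeyError(f"undefined coordinate variables: {missing}")
    return listed


aux = auxiliary_coordinates("temperature", variables)
```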
Ability to recognize comparable quantities is fundamental to model intercomparison.
CF defines a schema for standard name tables.
  XML representation used for the table of standard variable names and descriptions
  standard_name attribute is optional
  no restriction on variable names
Relationship to ontology development?
Comparability of quantities
<standard_name_table>
  <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
  <contact>[email protected]</contact>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure defined at the level of the mean topography within the grid box.</description>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>
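A table in this form can be read with a standard XML parser and used to decide whether two names refer to the same quantity. The sketch below is one possible reading of the schema shown on this slide; the `air_pressure_at_sea_level` entry is added here so that the alias resolves, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed table; the second entry is added for this example.
TABLE_XML = """
<standard_name_table>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
  </entry>
  <entry id="air_pressure_at_sea_level">
    <canonical_units>Pa</canonical_units>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>
"""


def load_table(xml_text):
    """Return (units by standard name, alias -> standard name) from a table."""
    root = ET.fromstring(xml_text.strip())
    units = {e.get("id"): e.findtext("canonical_units") for e in root.findall("entry")}
    aliases = {a.get("id"): a.findtext("entry_id") for a in root.findall("alias")}
    return units, aliases


def comparable(name_a, name_b, units, aliases):
    """True if both names resolve to the same entry in the table."""
    a = aliases.get(name_a, name_a)
    b = aliases.get(name_b, name_b)
    return a in units and a == b


units, aliases = load_table(TABLE_XML)
same = comparable("mean_sea_level_pressure", "air_pressure_at_sea_level", units, aliases)
```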
CF / NetCDF Object Model
[Diagram: object model. Classes: Collection, NetCDFAggregate, NetCDFFile, Dimension, Attribute, Variable, CoordinateVariable, GriddedVariable, GeneralCoordinateVariable, AuxiliaryCoordinateVariable, BoundaryVariable. Group label: CF. Legend: inheritance, relationship (1:n), relationship (m:n).]
ESG has adopted the netCDF data model and the CF convention as standards. Other standards and conventions will follow. NcML markup language.
ESG metadata
CF and NcML apply to data aggregates as well as files
Data aggregation: collections of files/datasets are treated as single entities.
  array model
  netCDF-like
  tailored for extraction of 'hyperslabs' of data
Aspects of aggregation:
  combining/merging variables
  joining variables
  creating new coordinate axes
  overlaying/adding metadata
  nesting datasets
Aggregation
Aggregation maps well to multifile datasets:
  multifile datasets can be thought of as 'partitioned' into files
  variables may 'span' multiple files
  usually a dataset is partitioned on time and/or vertical level axes
PCMDI CDAT supports aggregations via the cdscan utility, uses XML representation
THREDDS/DODS aggregation server (http://www.unidata.ucar.edu/projects/THREDDS/)
Aggregation
[Figure axes: Time, Level, Variable]
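A variable that spans several files partitioned on time can be presented as one array by tracking each file's offset along the aggregate axis. The sketch below uses in-memory dictionaries in place of real files; the file names, the record layout, and the `read_hyperslab` function are all hypothetical, and this is not the cdscan or THREDDS implementation.

```python
# Each hypothetical file holds a contiguous block of time steps for one variable.
FILES = [
    {"path": "ta_1979.nc", "time": [0, 1, 2], "values": [10.0, 11.0, 12.0]},
    {"path": "ta_1980.nc", "time": [3, 4], "values": [13.0, 14.0]},
    {"path": "ta_1981.nc", "time": [5, 6, 7], "values": [15.0, 16.0, 17.0]},
]


def aggregate_time_axis(files):
    """Join the per-file time axes into one aggregate axis."""
    axis = []
    for f in files:
        axis.extend(f["time"])
    return axis


def read_hyperslab(files, start, stop):
    """Read aggregate time indices [start, stop), touching only overlapping files."""
    result, touched, offset = [], [], 0
    for f in files:
        n = len(f["time"])
        lo = max(start - offset, 0)     # first local index wanted in this file
        hi = min(stop - offset, n)      # one past the last local index wanted
        if lo < hi:
            result.extend(f["values"][lo:hi])
            touched.append(f["path"])
        offset += n
    return result, touched


values, touched = read_hyperslab(FILES, 2, 6)
```

A request that falls inside a single file opens only that file, which is what keeps data movement low when the dataset is large.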
The Earth System Grid project is developing metadata services to support a variety of schemas and conventions.
The initial focus of ESG is to enable climate researchers to make effective use of distributed, model-generated datasets.
The netCDF schema and CF convention are the foundation for representation of this data.
Summary