Metadata Standards for Gridded Climate Data in the Earth System Grid

Robert Drach LLNL/PCMDI

UCRL-PRES-149779

Drach 2 Sept. 10, 2002

Overview

I. Earth System Grid: Grid Access to Climate Research Data

II. Metadata Standards for Gridded Climate Data

Part I

ESG: Grid Access to Climate Research Data

Earth System Grid Overview

The goal of ESG is to make climate data, particularly climate model data, an easily accessible community resource. The project is funded by the SciDAC program: Scientific Discovery through Advanced Computing.

Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.

Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and dataset manipulation.

The foundation is Globus Grid technology.

ESG uses Globus Grid technology.

Globus middleware supports linkage of distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.

GridFTP: a high-performance, secure, robust data transfer mechanism (protocol, server, client library). ESG is integrating the OpenDAP (DODS) protocol with the GridFTP protocol.

Single sign-on using the Grid Security Infrastructure: proxy certificates, Community Authorization Service (CAS).

Replica Location Service: manages copying and placement of files in a distributed environment; logical vs. physical files.

http://www.globus.org
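The logical vs. physical file distinction can be illustrated with a small in-memory catalog. This is a minimal sketch under stated assumptions, not the Replica Location Service API: the class `ReplicaCatalog`, its methods, and all logical names and `gsiftp://` URLs are hypothetical. One logical file name maps to any number of physical replicas, and a client picks one, here by a simple host-preference rule.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Hypothetical logical-to-physical file mapping (not the Globus RLS API)."""

    def __init__(self):
        # logical file name -> list of physical URLs, in registration order
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record one physical copy of a logical file (duplicates are ignored)."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical copies (empty list if none)."""
        return list(self._replicas.get(logical_name, []))

    def select(self, logical_name, preferred_hosts=()):
        """Return a replica on the first preferred host that has one, else the first registered."""
        replicas = self.lookup(logical_name)
        if not replicas:
            raise KeyError(logical_name)
        for host in preferred_hosts:
            for url in replicas:
                if urlparse(url).hostname == host:
                    return url
        return replicas[0]


# Usage with hypothetical names: one logical file, two physical copies.
catalog = ReplicaCatalog()
catalog.register("pcm/run1/tas.nc", "gsiftp://archive.example.org/pcm/run1/tas.nc")
catalog.register("pcm/run1/tas.nc", "gsiftp://cache.example.org/tmp/tas.nc")
url = catalog.select("pcm/run1/tas.nc", preferred_hosts=["cache.example.org"])
# url is the cache.example.org copy
```

Keeping the logical name stable lets files be copied or moved between sites without changing how users refer to them.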

ESG: U.S. Collaborations & Development

ORNL: Climate storage & computational resources

ANL: Computational grids & grid-based applications

USC/ISI: Computational grids & grid-based applications

NCAR: Climate change prediction and scenarios

LBNL: Climate storage facility and access

LLNL: Model diagnostics & inter-comparison

Program for Climate Model Diagnosis and Intercomparison

Validation and intercomparison of atmospheric general circulation models and coupled ocean-atmosphere models

Development of analysis software; quality control, archiving, and distribution of model results. Climate Data Analysis Tools (CDAT) is a Python-based analysis and visualization system.

Global warming detection studies

CMIP (coupled models) and AMIP (atmospheric GCMs) gather model simulation results from thirty modeling groups worldwide.

PCMDI and Model Development

[Figure: PCMDI's role in model development. Labels: modeling groups; controlled simulation runs; simulation data; PCMDI diagnosis, quality control, data archival; feedback to modelers; observations; data assimilation; gridded observation data.]

ESG-II Architecture

[Figure: ESG-II architecture layers: portals, servers, middleware.]

ESG: Metadata Services

[Figure: layered diagram of ESG metadata services. Labels: ESG clients API & user interfaces; search & discovery; administration; browsing & display; analysis & visualization; high-level metadata services (metadata aggregation, metadata discovery, metadata & data registration, publishing); core metadata services (metadata access: update, insert, delete, query; service translation library); metadata extraction, display, browsing, query, annotation, and validation; metadata holdings (data & metadata catalog, Dublin Core database, CF database, mirror Dublin Core XML files, comments XML files).]

ESG is leveraging existing software and projects.

OpenDAP (DODS): Distributed Oceanographic Data System (Unidata); integration of Globus GridFTP with DODS data access

THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)

LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory); works with CDAT, Ferret, GrADS, …

CDAT: Climate Data Analysis Tools (PCMDI), including CDMS (Climate Data Management System) and VCDAT visualization

Community Data Portal project (NCAR)

NCL (NCAR)

Globus Grid technology (ANL, ISI): GridFTP, CAS (Community Authorization Service)

CDAT: Example of an ESG GUI Client Access

LAS/CDAT: Example of a Web-based Data Portal

Technology: web-based (end-user requirements); LAS, DODS, ESG (i.e., Globus), CDAT.

The portal should hide/simplify the Grid for users: single sign-on; community-based authorization; simplified resource location; remote job submission and management.

Accesses the ESG Grid testbed.

Part II

Metadata Standards for Gridded Climate Data

Climate Model Datasets

Most climate simulation data are in the form of gridded datasets: collections of variables as a function of longitude, latitude, time, and vertical level.

A dataset is a logical container: a file, an aggregation of files, or a collection of database tables.

Model-generated data: model data; derived data such as zonal averages, global averages, and virtual variables.

Observational data, including reanalyses.

Attributes in the form of (name, value) pairs; array values.
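Zonal and global averages are the most common derived data. A minimal NumPy sketch, assuming a regular latitude-longitude grid with dimensions ordered (..., lat, lon) and cell areas proportional to cos(latitude); the field `tas` and its grid are hypothetical example data.

```python
import numpy as np


def zonal_mean(field):
    """Zonal average: mean over the last (longitude) axis, (..., lat, lon) -> (..., lat)."""
    return field.mean(axis=-1)


def global_mean(field, lat_deg):
    """Area-weighted global mean of field(..., lat, lon) on a regular lat-lon grid.

    Cell area is taken as proportional to cos(latitude), the usual
    approximation for equal-angle grid cells.
    """
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return (zonal_mean(field) * weights).sum(axis=-1) / weights.sum()


# Hypothetical monthly field: 12 time steps on a 4.5 x 4.5 degree grid.
lat = np.arange(-87.75, 90.0, 4.5)   # 40 cell-center latitudes
lon = np.arange(0.0, 360.0, 4.5)     # 80 cell-center longitudes
tas = 288.0 - 30.0 * np.sin(np.deg2rad(lat))[None, :, None] ** 2 + np.zeros((12, lat.size, lon.size))

tas_zonal = zonal_mean(tas)          # shape (12, 40)
tas_global = global_mean(tas, lat)   # shape (12,)
```

An unweighted mean over all cells would overweight the many small high-latitude cells; the cosine weights correct for that on this kind of grid only.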

Binary formats

netCDF (UCAR): array data model; flexible attribute/value metadata model; simple API.

HDF (NCSA, NASA): collection of APIs that can be tailored to specific data models, including scientific data sets, satellite data, and point data.

These formats are a suitable basis for storing data, but they lack the metadata to support certain application requirements.

Binary formats

GRIB (WMO, ECMWF, NCEP): mixed sequential/array data model, tailored for simulation output; supports common horizontal grid types; hardwired metadata model; good compression capabilities; lacks a standard API.

Coordinate Systems

Self-describing binary formats are flexible, but they underconstrain the representation of coordinate systems.

[Figure: a variable maps between index space, variable space, and coordinate space.]

Coordinate system: Time(i), Latitude(j,k), Longitude(j,k)

V = Temperature(Time, Latitude, Longitude)

V' = Temperature(i,j,k)
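The distinction between index space and coordinate space can be made concrete: the file stores V' = Temperature(i,j,k) as a plain array, and only the coordinate system Time(i), Latitude(j,k), Longitude(j,k) says where each element lies. This sketch uses small hypothetical arrays; note that because latitude and longitude are 2-D, there is no single array index that is "the latitude axis".

```python
import numpy as np

# Hypothetical example: 3 time steps on a small 2 x 4 curvilinear patch.
nt, nj, nk = 3, 2, 4
time = np.array([0.0, 1.0, 2.0])                          # Time(i), e.g. days since some origin
jj, kk = np.meshgrid(np.arange(nj), np.arange(nk), indexing="ij")
latitude = 10.0 + 2.0 * jj + 0.5 * kk                     # Latitude(j,k): 2-D coordinate
longitude = 200.0 + 3.0 * kk - 0.5 * jj                   # Longitude(j,k): 2-D coordinate
temperature = 280.0 + np.arange(nt * nj * nk, dtype=float).reshape(nt, nj, nk)  # V'(i,j,k)


def locate(i, j, k):
    """Map an index-space position (i, j, k) to its coordinate-space location."""
    return float(time[i]), float(latitude[j, k]), float(longitude[j, k])


def value_at(i, j, k):
    """Return V = Temperature(Time, Latitude, Longitude) as a (location, value) pair."""
    return locate(i, j, k), float(temperature[i, j, k])
```

For example, `locate(1, 1, 2)` gives `(1.0, 13.0, 205.5)`: the same index pair (j, k) determines both latitude and longitude.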

Horizontal Grids

Curvilinear grid: Los Alamos POP ocean model

Temperature(i,j)

Latitude(i,j)

Longitude(i,j)

Lat_bounds(i,j,4)

Lon_bounds(i,j,4)
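On a curvilinear grid, Latitude(i,j) and Longitude(i,j) are 2-D, so finding the cell for a given point cannot be done with two independent 1-D lookups. A minimal sketch, assuming the nearest cell center (by great-circle distance, haversine formula) is an acceptable answer; a stricter version would test containment in the four-vertex cell given by Lat_bounds(i,j,4) and Lon_bounds(i,j,4). The function names are hypothetical.

```python
import numpy as np


def great_circle_deg(lat1, lon1, lat2, lon2):
    """Central angle in degrees between points (haversine formula); inputs in degrees."""
    p1, p2 = np.deg2rad(lat1), np.deg2rad(lat2)
    dphi = p2 - p1
    dlmb = np.deg2rad(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return np.rad2deg(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def nearest_cell(lat2d, lon2d, lat, lon):
    """Return the (i, j) index of the cell center closest to (lat, lon)."""
    d = great_circle_deg(lat2d, lon2d, lat, lon)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return int(i), int(j)
```

Because the distance uses sines of half-angle differences, longitudes such as 359 and 0 are correctly treated as neighbours.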

Horizontal Grids

Reduced grid

Temperature(i,j)

Latitude(i)

Longitude(i,j)

Lat_bounds(i,2)

Lon_bounds(i,j,4)
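In a reduced grid, each latitude row i has its own number of longitudes, so Longitude(i,j) (and Temperature(i,j)) are naturally ragged. One common array layout, assumed here, pads every row to the longest one and masks the unused slots. The function `reduced_grid`, the equal spacing starting at 0 degrees east, and the row lengths in the usage example are hypothetical.

```python
import numpy as np


def reduced_grid(latitudes, nlon_per_row):
    """Build Latitude(i) and a padded, masked Longitude(i,j) for a reduced grid.

    Row i has nlon_per_row[i] equally spaced points starting at 0 degrees east;
    slots with j >= nlon_per_row[i] are masked.
    """
    nlon_per_row = np.asarray(nlon_per_row)
    nmax = int(nlon_per_row.max())
    j = np.arange(nmax)
    valid = j[np.newaxis, :] < nlon_per_row[:, np.newaxis]   # shape (nlat, nmax)
    spacing = 360.0 / nlon_per_row                           # degrees between points, per row
    lon = j[np.newaxis, :] * spacing[:, np.newaxis]
    longitude = np.ma.masked_array(lon, mask=~valid)
    return np.asarray(latitudes, dtype=float), longitude


# Hypothetical 3-row grid: fewer points near the poles.
lat, lon = reduced_grid([-60.0, 0.0, 60.0], [4, 8, 4])
```

A temperature field stored as Temperature(i,j) would use the same mask, so `lon.count()` is also the number of real grid points.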

Horizontal Grids

General grid: Colorado State geodesic grid

Temperature(npts)

Latitude(npts)

Longitude(npts)

Lat_bounds(npts,6)

Lon_bounds(npts,6)
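On a general grid there is no latitude axis at all: every array is indexed by a single cell index npts. Selecting a latitude band therefore becomes a mask over cells, applied identically to values, centers, and the (npts, 6) bounds. A minimal sketch with hypothetical helper names and a hypothetical 6-cell grid.

```python
import numpy as np


def lat_band_indices(lat, lat_min, lat_max):
    """Indices of cells whose center latitude lies in [lat_min, lat_max]."""
    lat = np.asarray(lat)
    return np.nonzero((lat >= lat_min) & (lat <= lat_max))[0]


def subset_cells(idx, *arrays):
    """Apply the same cell selection to every per-cell array (values, centers, bounds)."""
    return tuple(np.asarray(a)[idx] for a in arrays)


# Hypothetical 6-cell general grid: one value per cell, 6 bound vertices per cell.
latitude = np.array([-80.0, -30.0, 0.0, 44.9, 45.0, 60.0])
longitude = np.array([10.0, 50.0, 90.0, 130.0, 170.0, 210.0])
temperature = np.array([250.0, 285.0, 300.0, 288.0, 283.0, 270.0])
lat_bounds = np.zeros((6, 6))   # placeholder for Lat_bounds(npts, 6)

idx = lat_band_indices(latitude, -45.0, 45.0)
t_sel, lat_sel, b_sel = subset_cells(idx, temperature, latitude, lat_bounds)
```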

Spatial/temporal location

Applications must be able to recognize the spatial/temporal coordinate axes.

Visualization: continental overlays

Data: selection by axis type

```python
import cdms

# Open a dataset and obtain a file variable.
f = cdms.open('sample.nc')
temperature = f['temperature']

# Select by latitude range; cdms recognizes the latitude axis,
# so the call does not depend on the axis position.
data = temperature(latitude=(-45.0, 45.0))
f.close()
```

Time representation and calendars

Climate simulations use different types of calendars: 'proleptic' Gregorian, Julian, mixed Gregorian/Julian, no leap years (noleap), and 30-day months.

Climatologies represent multi-year averages.
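Calendar differences matter because the same "days since" offset can denote different dates. A minimal sketch of year and month lengths under each calendar, assuming CF-style calendar names (`proleptic_gregorian`, `julian`, `gregorian`/`standard` for the mixed calendar, `noleap`/`365_day`, `360_day` for 30-day months); the names and functions are assumptions, not taken from the talk.

```python
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def is_leap(year, calendar):
    """Leap-year rule for the (assumed) CF-style calendar names."""
    julian = year % 4 == 0
    gregorian = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
    if calendar in ("noleap", "365_day", "360_day"):
        return False
    if calendar == "julian":
        return julian
    if calendar == "proleptic_gregorian":
        return gregorian
    if calendar in ("gregorian", "standard"):
        # Mixed calendar: Julian rule up to the 1582 reform, Gregorian after.
        return julian if year <= 1582 else gregorian
    raise ValueError(f"unknown calendar: {calendar}")


def days_in_month(year, month, calendar):
    if calendar == "360_day":
        return 30
    if calendar in ("gregorian", "standard") and (year, month) == (1582, 10):
        return 21  # 5-14 October 1582 do not exist in the mixed calendar
    if month == 2 and is_leap(year, calendar):
        return 29
    return DAYS_IN_MONTH[month - 1]


def days_in_year(year, calendar):
    return sum(days_in_month(year, m, calendar) for m in range(1, 13))
```

For example, 1900 has 365 days in the proleptic Gregorian calendar but 366 in the Julian calendar, so climatologies built from different models must account for the calendar before averaging by date.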

Metadata conventions

Several conventions have been developed to augment the netCDF data model. They represent a balance between the needs of data producers and data consumers.

COARDS convention: 1-D coordinate axes and rectilinear horizontal grids; axis identification based on units; variables limited to four dimensions; fixed ordering of dimensions. http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
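"Axis identification based on units" can be sketched as a lookup on the units string of a 1-D coordinate variable. The unit spellings below follow the COARDS profile as best recalled and should be treated as assumptions to check against the page above; the vertical rule (pressure units or a `positive` attribute) is a simplification. The order check encodes the COARDS recommendation that identified dimensions appear in the relative order T, Z, Y, X.

```python
LAT_UNITS = {"degrees_north", "degree_north", "degree_n", "degrees_n", "degreen", "degreesn"}
LON_UNITS = {"degrees_east", "degree_east", "degree_e", "degrees_e", "degreee", "degreese"}
PRESSURE_UNITS = {"pa", "hpa", "mb", "millibar", "millibars", "bar"}


def identify_axis(units, attributes=None):
    """Guess the axis type (T, Z, Y, X) of a 1-D coordinate variable from its units."""
    attributes = attributes or {}
    u = units.strip().lower()
    if u in LAT_UNITS:
        return "Y"
    if u in LON_UNITS:
        return "X"
    if " since " in u:              # e.g. "days since 1979-01-01"
        return "T"
    if u in PRESSURE_UNITS or "positive" in attributes:
        return "Z"
    return None                     # not a recognized spatial/temporal axis


def coards_order_ok(axes):
    """True if identified axes appear in the relative order T, Z, Y, X."""
    rank = {"T": 0, "Z": 1, "Y": 2, "X": 3}
    ranks = [rank[a] for a in axes if a in rank]
    return ranks == sorted(ranks)
```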

Metadata conventions

CF (Climate and Forecast) convention: based on the earlier COARDS and GDT conventions; multidimensional coordinates (auxiliary coordinate variables); simplified axis identification; specific representation for several horizontal grid types (rectilinear, curvilinear, and reduced grids); variables can have an arbitrary number of dimensions, with no constraint on the ordering of dimensions; non-Gregorian calendars; a standard name table. http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
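Auxiliary coordinate variables are attached to a data variable through its `coordinates` attribute, a blank-separated list of variable names, and cell boundaries through a `bounds` attribute on the coordinate. A minimal sketch using a hypothetical in-memory stand-in for a netCDF file (a dict of variables with `dims` and `attrs`); the subset check is a simplified form of the CF rule that an auxiliary coordinate may only use dimensions of its data variable.

```python
def auxiliary_coordinates(variables, name):
    """Resolve the CF `coordinates` attribute of data variable `name`.

    Returns {coordinate name: (dimensions, bounds variable name or None)}.
    Raises ValueError if an auxiliary coordinate uses a dimension the data
    variable does not have (simplified form of the CF rule).
    """
    var = variables[name]
    resolved = {}
    for coord in var["attrs"].get("coordinates", "").split():
        cdims = variables[coord]["dims"]
        if not set(cdims) <= set(var["dims"]):
            raise ValueError(f"{coord!r} spans dimensions not used by {name!r}")
        resolved[coord] = (cdims, variables[coord]["attrs"].get("bounds"))
    return resolved


# Hypothetical in-memory description of a curvilinear (POP-like) file.
example_file = {
    "temperature": {"dims": ("time", "j", "i"),
                    "attrs": {"units": "K", "coordinates": "lat lon"}},
    "lat": {"dims": ("j", "i"), "attrs": {"units": "degrees_north", "bounds": "lat_bounds"}},
    "lon": {"dims": ("j", "i"), "attrs": {"units": "degrees_east", "bounds": "lon_bounds"}},
    "lat_bounds": {"dims": ("j", "i", "nv"), "attrs": {}},
    "lon_bounds": {"dims": ("j", "i", "nv"), "attrs": {}},
}
```

Because the link is by name rather than by dimension position, the data variable's dimensions may appear in any order, which is what lets CF drop the COARDS ordering constraint.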

Comparability of quantities

The ability to recognize comparable quantities is fundamental to model intercomparison. CF defines a schema for standard name tables: an XML representation is used for the table of standard variable names and descriptions. The standard_name attribute is optional, and there is no restriction on variable names. Relationship to ontology development?

```xml
<standard_name_table>
  <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
  <contact>[email protected]</contact>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure defined at the level of the mean topography within the grid box.</description>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>
```
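A table in this form can be read with Python's standard `xml.etree.ElementTree`, with alias entries followed to their target standard name, so that two variables with different local names can be matched through their `standard_name` attributes. A minimal sketch: `load_table` and `resolve` are hypothetical helpers, and the embedded XML is an abbreviated copy of the example table (institution and contact omitted).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example table (institution and contact omitted).
TABLE_XML = """
<standard_name_table>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure defined at the level of the mean topography within the grid box.</description>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>
"""


def load_table(xml_text):
    """Return (entries, aliases): entries maps id -> (units, description); aliases maps id -> target id."""
    root = ET.fromstring(xml_text.strip())
    entries = {
        e.get("id"): (e.findtext("canonical_units"), e.findtext("description"))
        for e in root.findall("entry")
    }
    aliases = {a.get("id"): a.findtext("entry_id") for a in root.findall("alias")}
    return entries, aliases


def resolve(name, aliases):
    """Follow alias links until a non-alias standard name is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias cycle at {name}")
        seen.add(name)
        name = aliases[name]
    return name


entries, aliases = load_table(TABLE_XML)
```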

ESG metadata

[Figure: CF / NetCDF object model. Classes: Collection, NetCDFAggregate, NetCDFFile, Dimension, Attribute, Variable, CoordinateVariable; classes labeled CF: GriddedVariable, AuxiliaryCoordinateVariable, BoundaryVariable, GeneralCoordinateVariable. Legend: inheritance; relationship (1:n); relationship (m:n).]

ESG has adopted the netCDF data model and the CF convention as standards. Other standards and conventions will follow.

NcML markup language.
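The class names in the object model can be sketched as Python dataclasses. This is one plausible reading of the labels only: the exact inheritance edges and 1:n / m:n relationships of the original diagram are not recoverable, so the structure below (variables referencing shared dimensions, files owning dimensions and variables, aggregates holding files) is an assumption, and attributes are simplified to a dict rather than an `Attribute` class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dimension:
    name: str
    length: int


@dataclass
class Variable:
    name: str
    # Assumed m:n link: a variable uses several dimensions; a dimension serves many variables.
    dimensions: Tuple[Dimension, ...]
    attributes: Dict[str, object] = field(default_factory=dict)


class CoordinateVariable(Variable):
    """1-D variable named after its single dimension (netCDF convention)."""


class GriddedVariable(Variable):
    """CF: data variable located on a grid."""


class AuxiliaryCoordinateVariable(Variable):
    """CF: multidimensional coordinate, referenced via a `coordinates` attribute."""


class BoundaryVariable(Variable):
    """CF: cell boundaries, referenced via a `bounds` attribute."""


class GeneralCoordinateVariable(Variable):
    """CF class named in the diagram; its exact role is not described on the slide."""


@dataclass
class NetCDFFile:
    path: str
    dimensions: List[Dimension] = field(default_factory=list)   # assumed 1:n
    variables: List[Variable] = field(default_factory=list)     # assumed 1:n
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class NetCDFAggregate:
    files: List[NetCDFFile] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    members: List[object] = field(default_factory=list)   # files and/or aggregates
```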

Aggregation

CF and NcML apply to data aggregates as well as to files.

Data aggregation: collections of files/datasets are treated as single entities, using an array model that is netCDF-like and tailored for extraction of 'hyperslabs' of data.

Aspects of aggregation: combining/merging variables, joining variables, creating new coordinate axes, overlaying/adding metadata, and nesting datasets.
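Joining a variable along its time axis lets a client request a hyperslab without knowing how the data are split into files. A minimal sketch, assuming time is axis 0 and using in-memory NumPy arrays as hypothetical stand-ins for per-file reads; `JoinedVariable` is not a CDAT or THREDDS class. Only the parts that overlap the requested time range are touched.

```python
import numpy as np


class JoinedVariable:
    """Hypothetical view of one variable whose time axis (axis 0) spans several files.

    `parts` stands in for the per-file arrays; a real aggregation would read
    each piece lazily from its file instead of holding it in memory.
    """

    def __init__(self, parts):
        self.parts = list(parts)
        # offsets[n] is the global time index at which part n starts
        self.offsets = np.cumsum([0] + [p.shape[0] for p in self.parts])

    @property
    def shape(self):
        return (int(self.offsets[-1]),) + self.parts[0].shape[1:]

    def hyperslab(self, start, stop, *other_axes):
        """Return data[start:stop, *other_axes], touching only the parts that overlap."""
        pieces = []
        for part, lo, hi in zip(self.parts, self.offsets[:-1], self.offsets[1:]):
            a, b = max(start, lo), min(stop, hi)
            if a < b:
                pieces.append(part[(slice(a - lo, b - lo),) + other_axes])
        if not pieces:
            raise IndexError("time range outside the aggregation")
        return np.concatenate(pieces, axis=0)


# Two hypothetical files holding time steps 0-1 and 2-4 of a (time, lat, lon) variable.
file1 = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
file2 = np.arange(2 * 3 * 4, 5 * 3 * 4, dtype=float).reshape(3, 3, 4)
tas = JoinedVariable([file1, file2])
slab = tas.hyperslab(1, 4, slice(0, 2), slice(None))   # spans both files
```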

Aggregation

Aggregation maps well to multifile datasets, which can be thought of as 'partitioned' into files. Variables may 'span' multiple files; usually a dataset is partitioned on the time and/or vertical level axes.

PCMDI CDAT supports aggregations via the cdscan utility, which uses an XML representation.

THREDDS/DODS aggregation server (http://www.unidata.ucar.edu/projects/THREDDS/)

[Figure: a variable partitioned into files along the time and level axes.]
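A time-partitioned dataset can be described declaratively in XML. This sketch generates an NcML aggregation that joins files along an existing `time` dimension, using Python's `xml.etree.ElementTree`. It assumes the later NcML 2.2 syntax (namespace `http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2`, `type="joinExisting"`), which may differ from the NcML of this period; cdscan's own XML format is not reproduced here, and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def ncml_join_existing(locations, dim_name="time"):
    """Return NcML text aggregating `locations` along an existing dimension."""
    ET.register_namespace("", NCML_NS)   # serialize with NcML as the default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        dimName=dim_name, type="joinExisting")
    for loc in locations:
        # One nested <netcdf location="..."/> per file partition, in time order.
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", location=loc)
    return ET.tostring(root, encoding="unicode")


files = ["tas_1979.nc", "tas_1980.nc", "tas_1981.nc"]
ncml = ncml_join_existing(files)
```

Listing the partitions in time order matters: a join along an existing dimension concatenates them in the order given.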

Summary

The Earth System Grid project is developing metadata services to support a variety of schemas and conventions.

The initial focus of ESG is to enable climate researchers to make effective use of distributed, model-generated datasets.

The netCDF schema and CF convention are the foundation for the representation of these data.