Enabling technologies for facilitating access and use of data

Russ Rew and John Caron, Unidata
Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society
NCDC, Asheville, 2010-03-09


Page 1: Enabling technologies for facilitating access and use of data

Enabling technologies for facilitating access and use of data

Russ Rew and John Caron, Unidata

Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society

NCDC, Asheville, 2010-03-09

Page 2: Enabling technologies for facilitating access and use of data

Goal: N + M instead of N * M things on your TODO List

[Diagram: File Format #1, File Format #2, ..., File Format #N on one side of the CDM; Visualization & Analysis, NetCDF file, Data Server, and Web Service on the other]
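The N + M goal can be illustrated with a small counting sketch. The function names and the counts used below are hypothetical and chosen only for illustration; the point is that a common model needs one reader per format plus one binding per use, rather than one reader per (format, use) pair.

```python
def adapters_without_common_model(n_formats: int, m_uses: int) -> int:
    """Each use (viewer, server, ...) needs its own reader for each format."""
    return n_formats * m_uses


def adapters_with_common_model(n_formats: int, m_uses: int) -> int:
    """Each format is read into the common model once; each use binds to the model once."""
    return n_formats + m_uses


if __name__ == "__main__":
    n, m = 20, 4  # illustrative counts, not taken from the slides
    print("without a common model:", adapters_without_common_model(n, m))
    print("with a common model:   ", adapters_with_common_model(n, m))
```

Adding one more format then costs one new reader instead of M new readers.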

Page 3: Enabling technologies for facilitating access and use of data

Common Data Model

• What is it?
• Capabilities for observational data
• Current status

Page 4: Enabling technologies for facilitating access and use of data

What is it?

• Abstract Data Model for scientific data
• Implemented by Netcdf-Java library
• Core of the THREDDS Data Server
• Co-evolving with the CF Conventions

Page 5: Enabling technologies for facilitating access and use of data

Abstract Data Model, aka Object Model

• Data Access Layer
  – NetCDF / HDF5 / OPeNDAP
  – Subset in index space

• Coordinate System Layer
  – CF, VisAD, HDF-EOS, GRIB
  – Georeferencing

• Feature Type Layer
  – OGC WxS, ISO, CSML
  – Subset in coordinate space
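The difference between subsetting in index space and subsetting in coordinate space can be sketched as follows. This is a minimal plain-Python illustration, not the Netcdf-Java API; the function names and sample values are hypothetical, and the coordinate axis is assumed to be one-dimensional and sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def subset_by_index(values, start, stop):
    """Index-space subset: the caller must already know array positions."""
    return values[start:stop]


def subset_by_coordinate(coords, values, lo, hi):
    """Coordinate-space subset: select values whose coordinate lies in [lo, hi].

    Assumes coords is sorted ascending and aligned element by element with values.
    """
    first = bisect_left(coords, lo)
    last = bisect_right(coords, hi)
    return values[first:last]


# Hypothetical latitude axis (degrees north) and temperatures (kelvin).
lats = [10.0, 20.0, 30.0, 40.0, 50.0]
temps = [300.1, 295.4, 288.0, 280.2, 271.9]

by_index = subset_by_index(temps, 1, 3)
by_coord = subset_by_coordinate(lats, temps, 15.0, 40.0)
```

The coordinate-space request is expressed in physical units, which is why it needs the georeferencing supplied by the coordinate system layer.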

Page 6: Enabling technologies for facilitating access and use of data

Abstract Data Model

• Turns a collection of bytes into a collection of objects called features
  – E.g.: grids, swaths, profiles, radial sweeps

• These objects play the same role as a schema does in a database

• Defines the things (nouns) and what operations (verbs) are possible

Page 7: Enabling technologies for facilitating access and use of data

Netcdf-Java library implementation

• 100% pure Java, open source, developed and maintained by Unidata

• Object oriented, strongly typed, garbage collected, huge open-source libraries, runtime configurable == highly productive

• Many different file formats
• Many different coordinate system conventions
• Library is used by many other software packages

Page 8: Enabling technologies for facilitating access and use of data

Netcdf-Java File Formats

• General: NetCDF-3, NetCDF-4, HDF5, HDF4, OPeNDAP

• Gridded: GRIB-1, GRIB-2, GEMPAK, McIDAS, UAMIV CAMx

• Point: BUFR, GEMPAK

• Radar: NEXRAD 2&3, DORADE, CINRAD, UF

• Satellite: DMSP, GINI, McIDAS, FYSAT, HDF-EOS

• Misc: GTOPO, NLDN, USPLN, etc.

• Write your own IOServiceProvider Java class

Page 9: Enabling technologies for facilitating access and use of data

Transforms (CF)

Projections
• albers_conical_equal_area, lambert_azimuthal_equal_area, lambert_conformal_conic, mcidas_area, mercator, orthographic, rotated_pole, stereographic (including polar), transverse_mercator, UTM (ellipsoidal), vertical_perspective

Vertical Transforms
• atmosphere_sigma, atmosphere_hybrid_sigma_pressure, ocean_s, ocean_sigma, existing3DField

Write your own CoordTransBuilderIF Java class

Page 10: Enabling technologies for facilitating access and use of data

Used by other applications

– Integrated Data Viewer, ToolsUI (Unidata)
– Panoply (NASA)
– ncBrowse (EPIC/NOAA)
– Java NEXRAD Viewer (NCDC/NOAA)
– MyWorld GIS (Northwestern)
– EDC for ArcGIS, ERDDAP (SFSC/NOAA)
– Live Access Server (PMEL/NOAA)
– ncWMS (Reading)
– Matlab plug-in (USGS)

Page 11: Enabling technologies for facilitating access and use of data

Core of the THREDDS Data Server

[Diagram components: Servlet Container (motherlode.ucar.edu) hosting the THREDDS Server and the NetCDF-Java library; configCatalog.xml; Datasets; IDD Data; catalog.xml; services: HTTPServer, WMS, WCS, OPeNDAP; Remote Access Client]

Page 12: Enabling technologies for facilitating access and use of data

THREDDS Data Server (TDS)

• Web server for scientific data
• 100% Java servlet
• Provides remote data access
  – OPeNDAP
  – Open Geospatial Consortium (OGC) WMS and WCS
  – HTTP file transfer
  – Experimental data access protocols
• Infrastructure, not a portal

Page 13: Enabling technologies for facilitating access and use of data

TDS and NcML

• Embed NcML into the TDS configuration catalog
• Server serves a virtual dataset defined by NcML
  – NcML hidden from the client
• Can "fix" metadata problems
• Can augment metadata
• General Aggregations
  – joinNew, joinExisting, union
• Specialized Aggregations
  – Forecast Model Run Collection (FMRC)
  – Point Feature Collections (version 4.2)
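The three general aggregation types can be sketched conceptually in plain Python. This is not NcML or the Netcdf-Java implementation: each file's variable is modeled as a simple list, the function names are hypothetical, and resolving duplicate variable names in favor of the first file is an assumption of this sketch.

```python
def join_existing(parts):
    """Concatenate pieces along a dimension that already exists in each file."""
    combined = []
    for part in parts:
        combined.extend(part)
    return combined


def join_new(parts):
    """Stack pieces along a new outer dimension, one entry per file."""
    return [list(part) for part in parts]


def union(datasets):
    """Merge the variables of several datasets into one virtual dataset.

    Assumption of this sketch: on a name clash, the first dataset wins.
    """
    merged = {}
    for dataset in datasets:
        for name, data in dataset.items():
            merged.setdefault(name, data)
    return merged


# Two hypothetical files, each holding two values of one variable.
file_a, file_b = [1, 2], [3, 4]
existing = join_existing([file_a, file_b])   # one axis, length 4
stacked = join_new([file_a, file_b])         # new outer axis, length 2
merged = union([{"temp": file_a}, {"wind": file_b, "temp": [9, 9]}])
```

With joinExisting the aggregated dimension grows; with joinNew the number of files becomes the length of the new dimension.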

Page 14: Enabling technologies for facilitating access and use of data

TDS / NcML: Modify all files in datasetScan

<datasetScan name="Ocean Satellite Data" path="/data/ocean/sat/" location="/data/ncdc/impacts/scenario4b/run1234">
  <netcdf>
    <attribute name="NCS:Provenence" value="NCDC assimilation prog4gd from GOES-10"/>
  </netcdf>
</datasetScan>

Page 15: Enabling technologies for facilitating access and use of data

TDS / NcML aggregation

<dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km">
  <netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinNew">
      <scan location="/data/satellite/WEST-CONUS_4km/" suffix=".gini" />
    </aggregation>
  </netcdf>
</dataset>

Page 16: Enabling technologies for facilitating access and use of data

Co-evolving with the CF Conventions

• Implementation of the CF Conventions
• Strong feedback (in both directions) between CF and CDM
• CF is the recommended way to write datasets
• CDM also deals with legacy datasets and other file formats besides netCDF

Page 17: Enabling technologies for facilitating access and use of data

CF

• CF has mostly focused on model gridded data
  – Driven by IPCC work

• Has a general coordinate system model
  – :coordinates = "lat lon alt time";
  – Sufficient for swath, some in-situ data

• Current efforts
  – Radial data (NCAR/EOL)
  – Discrete Sample data (aka point, in-situ data)
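How a reader might resolve the coordinates attribute can be sketched as follows. This is a simplified illustration, not the CDM's coordinate system builder: the dataset is modeled as a plain dictionary, and the function and variable names are hypothetical.

```python
def resolve_coordinates(data_var_attrs, variables):
    """Look up the coordinate variables named by a data variable.

    data_var_attrs: attributes of the data variable, e.g. {"coordinates": "lat lon alt time"}
    variables: mapping from variable name to its values for the whole dataset
    """
    names = data_var_attrs.get("coordinates", "").split()
    missing = [name for name in names if name not in variables]
    if missing:
        raise KeyError(f"coordinate variables not found: {missing}")
    return {name: variables[name] for name in names}


# Hypothetical dataset with a single observation.
dataset = {
    "lat": [40.0],
    "lon": [-105.3],
    "alt": [1650.0],
    "time": [0.0],
    "temperature": [288.2],
}
coords = resolve_coordinates({"coordinates": "lat lon alt time"}, dataset)
```

The attribute is just a blank-separated list of variable names, so the same mechanism covers grids, swaths, and some in-situ data.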

Page 18: Enabling technologies for facilitating access and use of data

Discrete Sample Data Categorization

• Point: measured at one point in time and space
• Station: time series of points at the same location
• Profile: points along a vertical line
• Time series of Profiles: a time series of profiles at the same location
• Trajectory: points along a 1D curve in time/space
• Trajectory of Profiles: a collection of profile features which originate along a trajectory

Page 19: Enabling technologies for facilitating access and use of data

Proposed Encoding Variations

• Rectangular Array
  – Multidimensional
  – Single: one feature in the file

• Ragged Array: different length features
  – Contiguous
  – Non-Contiguous
  – Flattened
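The two main ragged-array variations can be sketched in plain Python. This assumes the contiguous form stores a per-feature sample count and the non-contiguous form stores a per-sample index of the owning feature; the function names and sample values are hypothetical, and the details of the proposal may differ.

```python
def split_contiguous(samples, row_sizes):
    """Contiguous ragged array: each feature's samples are stored together.

    row_sizes[i] is the number of samples belonging to feature i.
    """
    features, start = [], 0
    for size in row_sizes:
        features.append(samples[start:start + size])
        start += size
    return features


def split_indexed(samples, parent_index, n_features):
    """Non-contiguous ragged array: samples may be interleaved.

    parent_index[k] is the feature that sample k belongs to.
    """
    features = [[] for _ in range(n_features)]
    for value, parent in zip(samples, parent_index):
        features[parent].append(value)
    return features


# Three hypothetical profiles with 2, 1, and 3 samples.
contiguous = split_contiguous([1, 2, 3, 4, 5, 6], [2, 1, 3])
indexed = split_indexed([1, 3, 2, 4, 5, 6], [0, 1, 0, 2, 2, 2], 3)
```

The contiguous form suits complete features written once; the indexed form suits samples appended in arrival order, as in real-time feeds.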

Page 20: Enabling technologies for facilitating access and use of data

Current CDM Status

• Discrete Sample Data proposal
  – Almost finalized (Caron/Gregory/Hankin)
  – CDM implementation now in 4.1
  – Collections of files to be in 4.2

• Forecast Model Run Collection refactor
  – Also using Collection
  – Caching on the server
  – Scale to much larger collections (NCDC/Nomads)
  – Scheduled for 4.2

Page 21: Enabling technologies for facilitating access and use of data

CDM funding status

• CDM/THREDDS work competes with many other priorities at Unidata

• THREDDS is most used by large data centers (NOAA/NASA/USGS/EPS, EU)

• Important (but indirect) benefits to NSF ATM constituency (US academic meteorology)

• Unidata is fully committed but not much chance of expanded base funding from NSF

Page 22: Enabling technologies for facilitating access and use of data

CDM funding status (cont)

• Have a proposal in to NSF Cyber-Infrastructure solicitation
  – Integration of TDS and IDD/LDM data streams
  – Explore use of Hadoop (Map/Reduce) for very large collections

• Need commitment of resources from you
  – ($$) Custom work when compatible
  – In-kind contribution == time and attention for CF/CDM from domain experts and engineers

Page 23: Enabling technologies for facilitating access and use of data

Thank You!