A presentation for the 4th COPS Workshop, September 25-26, 2006, Hohenheim, Germany

A presentation for the 4th COPS Workshop, September 25-26, 2006, Hohenheim, Germany. Raymond McCord, Oak Ridge National Laboratory*, Oak Ridge, Tennessee, USA. Assisted by Dave Turner, University of Wisconsin, Madison, Wisconsin, USA. Overview of Atmospheric Radiation Measurements (ARM) Data Management and Archiving in NetCDF formats. *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725.



Page 1: A presentation for the 4th COPS Workshop, September 25-26, 2006, Hohenheim, Germany

Overview of Atmospheric Radiation Measurements (ARM) Data Management and Archiving in NetCDF formats

Raymond McCord
Oak Ridge National Laboratory*
Oak Ridge, Tennessee, USA

Assisted by Dave Turner
University of Wisconsin
Madison, Wisconsin, USA

*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Page 2: Overview

• Data management
  – Objectives
  – Policy
• ARM data and systems description
  – Systems overview
  – Data storage strategy
• About Data Files and Formats
  – Features
  – Header attributes
  – Data structure
  – Access and Analytical Tools
• ARM Data and Information Types
• Beyond “the data file”
  – Where are the metadata??
  – Web tour of www.arm.gov
• ARM Data Access
  – Overview of Archive
  – Demo of ARM Archive user interfaces (time allowing)

Page 3: Quotes from Raymond

• “Storing data is EASY. Finding and using data later is NOT…”
  – Data accessibility and usage, not storage, are the primary metrics of an Archive
• “Systematically and consistently organized data does not occur without cost. Consider the results from previous science projects with no extra effort for data archiving.”

• “The natural tendency over time for data and information is chaos. Effort must be exerted to overcome this.”

• “Successfully managed data by projects may not be ready to be archived.”

• Scientific data systems must be designed to accommodate changes (content, access, users, etc.). This is noticeably different from business systems – the origin of most of our technology.

Page 4: Data Management: Objectives

• ARM Objectives
  – Create a data product that is:
    • Logically and structurally consistent through time
    • Capable of accommodating changes (scope, content, quality information, etc.)
    • Accessible both “now” and in the future
  – Develop and operate a data system that is:
    • Quick to develop and able to process data in a timely manner
    • Modular for expansion and change
    • Able to withstand external review (mostly scientific and quality issues)
• COPS Objectives
  – When possible create data products “like ARM”
  – When possible attain the same data management objectives as ARM

Page 5: ARM Data Policy

• Provide open data access:
  – To maximize exchange of data
    • between collaborating programs
    • to be available for scientific objectives
  – In a timely manner (known and minimal delays)
  – To data of “known and reasonable” quality
  – From routine instrument operations
  – With delayed and restricted access for experimental implementations
• Record data usage and users
  – Retrospective notifications of new quality information or reprocessed data
  – Important for documenting “worth” of data to sponsoring organization
  – Required for “National User Facility” status
    • Provides access to operational funding beyond research programs

Page 6: Data Systems

Page 7: ARM Data Systems: Overview

[Diagram: ARM data systems]
• Sites: Southern Great Plains (Oklahoma, USA); North Slope of Alaska (Alaska); Tropical Western Pacific (Manus, Nauru, Darwin); Aerial Vehicles; Mobile Facility
• Data Mgt & Processing Facility (PNNL)
• External data: Model, Satellite, GIS (BNL)
• ARM Archive (ORNL): 70 TB
• Users: ARM Scientists; General Scientific Community (2100 users, 140 universities, 44 countries)
• Data volume: 25 GB/Day

• Geographically dispersed
• Enabled by Internet Technology
• Continuous availability
• Today: >2000 different data streams
• Availability/Quality/Meaning

Page 8: ARM Data Systems: Detail

[Diagram: data flow from the instruments to the Archive. Components: laptops, data logger, site data systems, shared disk, ARM DMF, ARM Archive, external disk (shipped), research user system (very limited user access), research / data quality system. Transfer intervals labeled on the arrows: continuous, hourly, daily, as needed.]

Page 9: ARM Data Storage Strategy

• ARM data are stored in Data Streams
  – A “data stream” is a series of files (daily) that have similar contents and structure.
    • Files can be concatenated across time if needed.
    • Daily files are created as a convenience for processing, review, transfer, and distribution.

• The same instruments at different locations create files with the same data stream structure.

• Automated QC flags are contained within the data files.
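
The data stream idea can be sketched in a few lines of Python. This is only an illustration: it assumes daily files are named `<datastream>.<YYYYMMDD>.<HHMMSS>.<ext>`, which is not stated on the slide, and the file names in the usage note are hypothetical. The function groups file names by data stream and orders each group in time, which is the step needed before files can be concatenated.

```python
import re
from collections import defaultdict

# Assumed file name layout (not taken from the slides):
#   <datastream>.<YYYYMMDD>.<HHMMSS>.<ext>
# where <datastream> looks like "nim1metM1.b1".
FILE_PATTERN = re.compile(
    r"^(?P<stream>[A-Za-z0-9]+\.[a-z0-9]+)"
    r"\.(?P<date>\d{8})\.(?P<time>\d{6})\.(?P<ext>\w+)$"
)


def group_by_data_stream(filenames):
    """Map each data stream name to its files in chronological order."""
    streams = defaultdict(list)
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed layout
        streams[match["stream"]].append((match["date"], match["time"], name))
    # Sorting the (date, time, name) tuples orders the files in time.
    return {
        stream: [name for _, _, name in sorted(entries)]
        for stream, entries in streams.items()
    }
```

For example, `group_by_data_stream(["nim1metM1.b1.20060926.000000.cdf", "nim1metM1.b1.20060925.000000.cdf"])` would return one entry, keyed by `nim1metM1.b1`, with the 25 September file listed first.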

Page 10: About Data Files and Formats

Page 11: NetCDF File “features”

• Processed ARM data files are stored in NetCDF format
  – Self-contained data documentation
    • Header block
    • Data arrays
  – Non-proprietary format (open source)
  – Efficient binary format
  – Directly accessible by application software (IDL, MATLAB)
  – Libraries available for data creation and access from your own software
    • available for Fortran, C, C++, Perl, Java
    • http://www.unidata.ucar.edu/software/netcdf/index.html

Page 12: NetCDF File Structure (Header)

• File-specific information – creation time, dimension values for arrays
• Data definition attributes
  – Data field names (varname)
  – Data field description (longname)
  – Data limits
    • min, max – optional
  – Measurement info
    • units, resolution, missing value code, etc. – optional??
• Global values (attributes)
  – Descriptive information that is valid for a portion of the data stream
    • Location name, reference for retrieval algorithm, long term calibration information, contact information, etc.
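
To make the header layout concrete, here is a minimal Python sketch that models a header as plain dictionaries instead of reading a real NetCDF file. All field names, attribute names (`long_name`, `units`, `valid_min`, `valid_max`, `missing_value`), and values are illustrative assumptions, not taken from an ARM file. The helper reports which data fields lack the descriptive attributes a later user would probably want.

```python
# Plain-dict stand-in for a NetCDF header; nothing here is read from a file.
# Every name and value below is a made-up example.
header = {
    "dimensions": {"time": 1440},
    "global_attributes": {
        "site_name": "hypothetical site",
        "contact": "hypothetical contact",
    },
    "variables": {
        "temp_mean": {
            "long_name": "Mean air temperature",
            "units": "C",
            "valid_min": -40.0,
            "valid_max": 50.0,
            "missing_value": -9999.0,
        },
        # A sparsely documented field, to show what the check reports.
        "wind_speed": {"long_name": "Wind speed"},
    },
}

# Attributes treated as recommended in this sketch (an assumption).
RECOMMENDED = ("long_name", "units", "missing_value")


def missing_attributes(header, recommended=RECOMMENDED):
    """Return {field name: [recommended attributes it lacks]}."""
    report = {}
    for name, attrs in header["variables"].items():
        lacking = [key for key in recommended if key not in attrs]
        if lacking:
            report[name] = lacking
    return report
```

Calling `missing_attributes(header)` on the example above flags only `wind_speed`, which has a description but no units or missing value code.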

Page 13: Examples of ARM Header Information

Online Demo Link Here

Page 14: NetCDF File Structure (Data)

• Data are stored in “array” records after the header.
  – ARM data are “dimensioned” by Time and sometimes Height
    • Time recording is very important.
    • ARM uses base time + time offset and composite time
  – Multi dimensional arrays are possible, but rare.
• Data fields are stored in the same order as defined in the header.
  – Data are accessible by “array number”
  – Avoid using this!!!
• Single and multiple dimension data arrays can occur in any order within a data stream.
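
The “base time + time offset” scheme can be illustrated with a short Python sketch. It assumes that the base time is a count of seconds since 1970-01-01 00:00:00 UTC and that each offset is in seconds relative to the base time; the slide does not state the units, and the field names and sample values are made up. The fields are also looked up by name, in line with the advice above to avoid access by array number.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference epoch for the base time (not stated on the slide).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sample_times(base_time, time_offsets):
    """Convert a base time plus per-sample offsets (seconds) to UTC datetimes."""
    start = EPOCH + timedelta(seconds=base_time)
    return [start + timedelta(seconds=offset) for offset in time_offsets]


# Fields are addressed by name, never by their position in the file.
fields = {
    "base_time": 1159142400,          # 2006-09-25 00:00:00 UTC
    "time_offset": [0.0, 60.0, 120.0],
    "temp_mean": [11.2, 11.1, 10.9],  # made-up sample values
}

times = sample_times(fields["base_time"], fields["time_offset"])
```

Here `times` holds three timestamps one minute apart, starting at midnight UTC on 25 September 2006, one per entry in `temp_mean`.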

Page 15: NetCDF Data Access and Analysis

• Applications using NetCDF can:
  – Access data by filename / data field name
  – Concatenate similar files (e.g., from a time series)
  – Merge values based on similar dimension values
• Links to NetCDF tools can be found at:
  – http://www.arm.gov/data/tools.stm
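
The concatenate and merge operations listed above can be sketched without any NetCDF library. The example below is an assumption-laden illustration: each series is represented as a Python dict from time (seconds) to value, the function names are hypothetical, and “similar dimension values” is simplified to exactly equal time values.

```python
def concatenate(*daily_series):
    """Combine daily {time: value} series into one, ordered by time.

    If the same time appears in more than one series, the later series wins.
    """
    combined = {}
    for series in daily_series:
        combined.update(series)
    return dict(sorted(combined.items()))


def merge_on_time(left, right):
    """Join two {time: value} series on the time values they share."""
    shared = sorted(left.keys() & right.keys())
    return [(t, left[t], right[t]) for t in shared]
```

For instance, concatenating `{0: 11.2, 60: 11.1}` with `{86400: 9.8}` yields one three-sample series, and merging it with a second series keeps only the times present in both. Real tools usually also allow a tolerance when matching times, which this sketch leaves out.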

Page 16: ARM Data/Information Structure

Going to a “higher” view!!

Page 17: ARM Data Types - overview

• Continuous data (stored offline, accessible by requests from user interface)
  – ARM collected data
  – Value added products
  – External data
• Special data (stored online, accessible from web interface)
  – Field Campaign (IOP) data
  – Beta data
  – PI generated data products

Page 18: ARM Data Types – more detail

• ARM collected data
  – RAW data files
    • Available upon request, but not accessible from User Interface
    • Minimal documentation; user beware
    • Wide variety of formats; many are binary
  – Processed data files
    • Accessible from user interfaces
    • Common formats include NetCDF and HDF
• Value added products (VAPs)
  – Include one or more of the following
    • Advanced algorithms
    • Multiple data inputs
    • Input from long-time periods

– ARM produces some VAPs to improve the quality of existing measurements. In addition, when more than one measurement is available, ARM also produces "best estimate" VAPs.

[email protected] (@ornl.gov), 1-888-ARM-DATA

Page 19: Types of Quality Information

• Automated products
  – QC flags
    • inserted in data files during processing
  – QA flags
  – Summaries of flags (data color)
• Manual products
  – Data Quality Reports (DQRs)
    • web accessible reports
    • delivered as html files after data requests
    • event driven and problem-based
  – Mentor Instrument Reports
    • web accessible (http://www.db.arm.gov/IMMS/)
    • Also linked to instrument web pages.
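
As an illustration of how automated QC flags stored alongside the data might be used, the Python sketch below screens a series of values with per-sample flags. The encoding is an assumption made for this example only: one integer flag per sample, zero meaning that every automated test passed, and each bit standing for one failed test. The bit meanings and the default mask are hypothetical.

```python
def apply_qc(values, qc_flags, reject_mask=0b11):
    """Replace samples whose QC flag has any rejected bit set with None.

    Assumed encoding (hypothetical): one integer flag per sample,
    0 means all tests passed, each bit marks one failed test.
    Only the bits selected by reject_mask cause a sample to be dropped.
    """
    if len(values) != len(qc_flags):
        raise ValueError("values and qc_flags must have the same length")
    return [
        value if (flag & reject_mask) == 0 else None
        for value, flag in zip(values, qc_flags)
    ]
```

With the default mask, a sample flagged `1` or `2` is replaced by `None`, while a sample flagged `4` is kept because that bit is not in the mask. Choosing the mask lets a user decide which automated tests matter for a given analysis.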

Page 20: Beyond the Data File!!

• Overview of Information Structure
  – “Patience… Please… getting ready for a Web Tour”
• You will benefit from our “logic”.
• You will need our “content”.
• We will need to know your “content”.
• Your structural “logic” will also be helpful to us.
• A “sneak attack” on Metadata Issues

Page 21: ARM Information Structure

[Diagram: ARM information structure. Boxes: Sites, “Instruments”, VAPs, Guest, “????”, Data streams, Measurements. Annotation labels: Location, etc.; Categories + metadata; Documentation + Categories + Metadata; Data stream “Family” metadata.]

Page 22: Tour of www.arm.gov

Data streams Measurements

Instruments

What do you see now??

Page 23: Data Access (user interfaces)

How many doors are enough??

Page 24: Accessing Data from the Archive

• User interface options
  – Overall scheme of user interfaces
  – Logical view of interfaces
• More details and demo (time allowing)
  – ARM Data Browser
  – Web Shopping Cart
  – Catalog Interface
  – Thumbnail Browser
  – IOP Data Browser
  – Contact Us…..
    • 1-888-ARM-DATA, [email protected]
• Continuous data distribution
  – “Standing Orders”

Page 25: You are NOT alone...

• 3 sites
• 10’s facilities
• 100’s data sources
• 100’s data users
• 1000’s measurement types
• 1,000,000’s data files
• 1,000,000,000’s measurements
• 10,000,000,000,000’s bytes

[Chart: Request Statistics From Archive. Series: files, MB. Time axis: Oct-95 to Oct-05. Value axis: 0 to 2,400,000.]

[Chart: Archive Data Flow. Series: MB in, MB out. Time axis: Oct-95 to Oct-05. Value axis: 0 to 4,500,000.]

Page 26: Comparison of User Interface Options

([email protected], 1-888-ARM-DATA)

Table columns: Interface name; Accessible data; “Shopping” approach

• ARM Data Browser
  – Accessible data: Routine ARM data
  – “Shopping” approach: “I know what I want. Do you have it?” Searching with predefined selection criteria.
• Catalog Interface
  – Accessible data: Routine ARM data
  – “Shopping” approach: “I am not sure what I want. I need to see what you have available.” Browsing a hierarchy of availability summaries.
• Thumbnail Browser
  – Accessible data: Most routine ARM data
  – “Shopping” approach: “I will know what I want when I see it.” Searching with a combination of predefined selection criteria and visual review of data plots.
• Web Shopping Cart
  – Accessible data: Routine ARM data and some IOP data
  – “Shopping” approach: “I need to read about what you have, then I will decide.” Discover areas of interest by browsing the ARM web documentation and collect items of interest.
• IOP Data Browser
  – Accessible data: IOP, special, PI, and beta data
  – “Shopping” approach: “I need to look in the odd parts bin.” Direct access to IOP data. Navigate /year/site/iop directory tree. Also use narrow Google search.

Page 27: Overall Interface Scheme

Display detailed information (file list, DQRs, color map, QLs)

Display summary results from search (# files, # DQRs, # QLs)

Identify “data of interest” (answer questions)

Order files

Page 28: You and the Archive (Simplified view)

[Diagram: flow from “Start” to “End”. Components: Archive web-based User Interface, Database, File Retrieval Processor, Mass Storage System, FTP host. Arrow labels: query specifications, query results, file list and tracking, requested files, e-mail notification, user copy (FTP).]

Page 29: User Interface “Demo”

use presentation
Go to web interface

Page 30: Display Thumbnails

Page 31: Thumbnail Browser – Catalog Interface

Thumbnail Page

Page 32: IOP Data Browser – IOP View

Click for access to more data sub-directories

Page 33: IOP Data Browser – Data Selection

Page 34: Standing Order Processing

[Diagram: standing order processing. Components: Data base, New File Processor, Notification Processor, FTP host (ftp.so.archive.arm.gov) with delivery directories. Labels: Email specifications to Archive, order specifications, new data files, temporary copy, e-mail notification, user copy (FTP).]

Page 35: Questions? Comments?

Page 36: Detailed Reference Slides

Page 37: Data access policy “goals” (1)

• Data exchange between ARM and COPS as open and complete as needed
  – (more comments on next slide)
• Provide online documentation about
  – Measurement technology
  – Installation and site information
  – Data structure
  – Basic QA review methods and results
• Generate data products in a “timely” manner
  – Predictable schedule for generation and access
• Retain complete and comprehensive records of data inventory, usage, and users.
  – In a searchable database
• Distribute to data users updated information for data quality and data revisions (reprocessing) as needed

Page 38: Data access policy “goals” (2)

• Assume that fully open access has the best potential for overall scientific output
  – No cost for data exchange and access
• Protect “rights” of data generators
  – Provide initial opportunity for publication and evaluation
    • Especially for data from “new” instruments.
  – Offer co-authorship or acknowledgement to instrument PI’s.
• Prevent premature access of data
  – Very early access only as needed for operational planning (forecasting)
  – Before initial QC evaluation is complete
• Recipients of data have unrestricted use.
• Within an “access group” all requestors have equal access
  – No favorites between groups (??)
• Data file format (netCDF) and structure will match ARM when possible (??)

Page 39: ARM Archive Systems

[Diagram: ARM Archive systems. Inputs: DMF system, External Data system, IOP Data system, external disk (shipped) carrying radar spectral data. Archive components: Storage Processing, Mass Storage System, Metadata Database, User interface, Retrieval processing, FTP host, Standing Orders, ARM Web documentation. Endpoints: users. Transfer intervals labeled on the arrows: daily, as needed.]

Page 40: Logical Structure of ARM Metadata

[Diagram: logical structure of ARM metadata]
• Instrument Class description (examples: MET, SKYRAD)
• Instrument Code description (examples: 1met, 30met, skyrad20s, skyrad60s)
• Data stream (examples: nim1metM1.b1, zcc1metM1.b1, nim30metM1.b1, zcc30metM1.b1), each made up of daily files
• Site / facility list
• Date range
• Measurement metadata (Meas type)
• Inventories of Stored and Retrieved files
• Connected systems: Storage processing, User Interface, Web Info