DOD Center for Geosciences / Atmospheric Research Colorado State University

Overview of the Data Processing and Error Analysis System (DPEAS)

Andrew S. Jones

Colorado State University (CSU)
Cooperative Institute for Research in the Atmosphere (CIRA)

DOD Center for Geosciences / Atmospheric Research (CG/AR)
Fort Collins, CO

What is it?

- Data processing system for “large” data analysis tasks using common PCs
- Features:
  - 2nd generation system (replaces an earlier system called PORTAL (Jones et al., 1995))
  - Parallel implementation
  - Web-based documentation and monitoring
  - Incorporates a Fortran interpreter for input tasks
  - Virtualized I/O subsystem (only memory-resident data structures are needed; data algorithms now function like a model)
  - Able to fail over to redundant hardware
  - Extensible User Module
- Error Analysis code is still under development
- Implemented on Windows NT/2000 OS

What Does it Do?

- Global merge capabilities for numerous data sets
- Current system in operational use for 2+ years at CIRA
  - Current average operational throughput rate using 15 processors on 8 PCs is 17 TB/yr (47 GB/day).
  - Measured max. throughput rate is 2.5 PB/yr (7.1 TB/day)
- Simplifies
  - Powerful abstraction layers allow anyone to write parallel code
  - Virtual I/O subsystem reduces end-user code complexities
  - Users interact using a language most already know
- Easily Scales
  - Limited process “cross-talk” improves scaling behavior
  - Tests have shown that a 2000-machine cluster is physically feasible. Basically… just add hardware.
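
The per-day figures quoted above follow from the per-year rates by dividing by 365. A minimal sketch of that conversion is shown here; the slide does not say whether decimal (1000) or binary (1024) prefixes were used, so both are computed as an assumption, and the function name is illustrative. Either convention lands close to the quoted 47 GB/day and 7.1 TB/day, which suggests the slide values were rounded from more precise rates.

```python
DAYS_PER_YEAR = 365

def per_day(rate_per_year, unit_step):
    """Convert a yearly rate to a daily rate expressed in the next smaller unit.

    unit_step is 1000 (decimal prefixes) or 1024 (binary prefixes); which one
    the slide used is not stated, so it is left as a parameter.
    """
    return rate_per_year * unit_step / DAYS_PER_YEAR

for step in (1000, 1024):
    avg_gb_per_day = per_day(17, step)    # 17 TB/yr  -> GB/day
    max_tb_per_day = per_day(2.5, step)   # 2.5 PB/yr -> TB/day
    print(f"step={step}: average {avg_gb_per_day:.1f} GB/day, "
          f"maximum {max_tb_per_day:.2f} TB/day")
```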

10 Data Types Are Currently Supported

Reads and writes HDF-EOS natively
- GOES IMAGER (McIDAS)
- NOAA AVHRR GAC and LAC (McIDAS)
- NOAA AMSU-A and B (HDF-EOS)
- DMSP SSM/I (Byte Stream)
- DMSP SSM/T-2 (NGDC OIS)
- DMSP OLS (NGDC OIS)
- TRMM TMI and VIRS (HDF)
- User extensible… (your format here)

The Hardware

[Figure: hardware diagram showing the OPERATIONAL CLUSTER (24/7) and the EXPERIMENTAL CLUSTER (nights only/7) in a STORAGE VIEW and a PROCESSOR VIEW. Legend: Primary, Backup, Wn = Worker.]

Processor view nodes: Primary, Backup, W1, W2, W3, W5, W6
Storage view nodes: Primary, Backup (Mirrored Set: 240 GB, 240 GB); W1, W2 (66 GB, 240 GB); W4

Cluster Summary (All Ingest Processes, Most Higher Level Remapped Products): 9 Processors, 3.0 GFlops, 2.25 GB RAM
Cluster Summary (Large Global Sectors): 6 Processors, 2.5 GFlops, 2.5 GB RAM

Failover Mode

[Figure: cluster diagram (STORAGE VIEW and PROCESSOR VIEW of the operational and experimental clusters) with the Primary marked as failed (X). Legend: Primary, Backup, Wn = Worker.]

Failover Steps (automated):
1. Synchronize states
2. Promote the Backup

Restore Steps (manually initiated):
1. Demote the Backup
2. Restore Mirror Set
3. Synchronize states
4. Reactivate Primary
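
The ordering of the failover and restore steps can be sketched as a small state model. This is only an illustration under stated assumptions: the `Node` class, the role names, and the use of a dictionary to stand in for the mirrored set are all hypothetical, and the slides do not describe how DPEAS actually synchronizes state.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    role: str                      # hypothetical roles: "active", "standby", "failed"
    state: dict = field(default_factory=dict)

def failover(primary: Node, backup: Node, mirrored_state: dict) -> None:
    """Automated steps, in the order listed on the slide."""
    primary.role = "failed"
    backup.state = dict(mirrored_state)   # 1. Synchronize states
    backup.role = "active"                # 2. Promote the Backup

def restore(primary: Node, backup: Node, mirrored_state: dict) -> None:
    """Manually initiated steps, in the order listed on the slide."""
    backup.role = "standby"               # 1. Demote the Backup
    mirrored_state.clear()                # 2. Restore Mirror Set (modeled here as
    mirrored_state.update(backup.state)   #    re-copying the surviving state)
    primary.state = dict(mirrored_state)  # 3. Synchronize states
    primary.role = "active"               # 4. Reactivate Primary

primary = Node("Primary", "active", {"jobs_done": 41})
backup = Node("Backup", "standby")
mirror = dict(primary.state)              # stand-in for the mirrored set

failover(primary, backup, mirror)
backup.state["jobs_done"] += 1            # work continues on the promoted Backup
restore(primary, backup, mirror)
print(primary.role, backup.role, primary.state)
```

Keeping failover automatic and restore manual, as the slide describes, means an operator confirms the repaired Primary is healthy before it takes over again.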

Module Context

[Figure: block diagram of the DPEAS module context. Labeled components: GUIs; Command Shell Interpreter; Internet Information Services; Web Browser; Other Applications; Explorer; Command Line; Operating System (Windows 2000); Batch Job Client; Batch Job Service; DPEAS Data Processing Engine (annotated “This is DPEAS”); DPEAS Fortran Interpreter; DPEAS HDF-EOS Virtual I/O Subsystem; Analysis Modules; User Modules; Translation Modules; Output Modules; DPEAS System State. Labeled connections: DPEAS Input Script; Command Line Script; Spawn Subtask; DPEAS Subtask.]

An example of a DPEAS input script file

How DPEAS Starts

Program Start

DPEAS Initialization

Interpreting DPEAS script declarations

Interpreting DPEAS script executable statements

How DPEAS Ends

Program End

DPEAS Summary

Interpreting DPEAS script executable statements

How Are Spawned Input Scripts and Jobs Created?

All spawned DPEAS jobs run machine-generated DPEAS input scripts, which the data processing engine generates from the Master DPEAS input script. (The examples shown previously were examples of DPEAS machine-generated code.)

This is automated within DPEAS, and the user code goes along for the free ride since it is part of the DPEAS executable (it’s like meeting a friendly virus that helps to spread your code along with it).

What Does DPEAS Parallelism Look Like?

Do loop contents are sent to other resources in parallel

The new jobs run the same “DPEAS.exe”, but execute only the subtask operations

Completed jobs allow additional jobs to start
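
The scheduling pattern described above (loop iterations handed out as independent jobs, with a finished job freeing a slot for the next) can be sketched as follows. This is not DPEAS code: DPEAS spawns separate copies of its executable, whereas this sketch uses a Python thread pool so that it stays self-contained, and `subtask`, `run_do_loop`, and the job limit are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_JOBS = 3   # hypothetical limit on simultaneously running jobs

def subtask(index):
    """Stand-in for the body of one do-loop iteration run as a spawned job."""
    return index, index * index

def run_do_loop(indices, max_jobs=MAX_CONCURRENT_JOBS):
    """Run every loop iteration as its own job and return results in loop order."""
    results = {}
    # The pool runs at most max_jobs subtasks at once; each completed job
    # frees a slot so that a queued subtask can start.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(subtask, i) for i in indices]
        for future in as_completed(futures):
            i, value = future.result()
            results[i] = value
    return [results[i] for i in indices]

print(run_do_loop(range(1, 7)))
```

The sketch only works because the iterations do not depend on each other, which matches the limited process “cross-talk” the slides credit for good scaling.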

The 3 Programming Steps to Add a User Routine to DPEAS

1. Insert a program “hook”
   The program hook makes the main DPEAS program aware of the existence of your wrapper routine.

2. Create a wrapper routine
   The wrapper routine tells the DPEAS Fortran interpreter how to parse and interact with your application subroutine arguments.

3. Create an application routine
   The application routine performs the “real” work. You can do anything you want within the application routine.
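
The division of labor between the three pieces can be sketched in a few lines. The real implementation is Fortran inside User_Module.f90, which the transcript does not reproduce, so everything here is a hypothetical Python analogue: the registry `ROUTINES`, the routine names, the `DATA` table, and the toy `interpret` function are assumptions made only to show the pattern.

```python
# Hypothetical analogue of the hook / wrapper / application pattern.
ROUTINES = {}                       # routines the interpreter knows about
DATA = {"ch1": [1.0, 2.0, 3.0]}     # stand-in for memory-resident data

def scale_channel(values, factor):
    """3. Application routine: performs the "real" work."""
    return [v * factor for v in values]

def scale_channel_wrapper(raw_args):
    """2. Wrapper routine: parses script arguments, then calls the application routine."""
    name, factor_text = raw_args
    return scale_channel(DATA[name], float(factor_text))

# 1. Program hook: makes the main program aware of the wrapper routine.
ROUTINES["SCALE_CHANNEL"] = scale_channel_wrapper

def interpret(line):
    """Toy interpreter for statements of the form 'call NAME(arg, arg)'."""
    keyword, _, rest = line.partition(" ")
    if keyword.upper() != "CALL":
        raise ValueError(f"unsupported statement: {line!r}")
    name, _, arg_text = rest.partition("(")
    args = [a.strip() for a in arg_text.rstrip(")").split(",")]
    return ROUTINES[name.strip().upper()](args)

print(interpret("call scale_channel(ch1, 2.0)"))
```

Only the wrapper knows about argument parsing, so the application routine stays ordinary science code.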

How does the “User_Module.f90” relate to my DPEAS Input Scripts?

[Figure: flow diagram. Labels, in reading order: User_Module.f90 (Program Hook, Wrapper Routine, Application Routine); DPEAS Input Script; Ordinary Fortran Compiler; Compile; Interpret; Automated Parallelization Using Self-Replication; "DPEAS.exe" Interprets DPEAS Input Script; End; Return to Master; "DPEAS.exe" Interprets DPEAS Input Script; DPEAS Input Script; Subtask.]

User Example: The user’s application routine

Using the virtual I/O data via pointers

1. Find each MW channel

2. Allocate a new output array data structure

Your science code looks like this

User Example: The results: Complete integration

The new user routine is now fully integrated into DPEAS

User Example: The output HDF-EOS file

User Example: The output image representation

[Figure: image of 150 GHz Effective Emissivity, calculated from GOES-08 IMAGER and NOAA-15 AMSU-B.]

User Example: Summary

Creates 2 new routines:
- Wrapper routine
- Application routine

Requires 25 lines of executable code:
  2 – Program hook
  4 – Wrapper routine
  19 – Application routine
    2 – Variable assignments
    3 – Science algorithm
    14 – Virtual I/O library calls (using only 2 Virtual I/O library routines)

Small overhead for gaining massive parallelism capabilities!

User Example: How complex would the user routine be, if written without the Virtual I/O library?

Creates 2 new routines:
- Wrapper routine
- Application routine

Requires 59 lines of executable code:
  2 – Program hook
  4 – Wrapper routine
  53 – Application routine
    2 – Variable assignments
    3 – Science algorithm
    48 – HDF-EOS library calls (using 26 HDF-EOS library routines)

Answer: Without the DPEAS Virtual I/O library there would be:

24 additional I/O routines called by the user (+1200%)

34 additional lines of user code (+136%; 59 lines is 236% of the original 25)
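
The comparison can be checked directly from the line counts given on the two slides. The short sketch below recomputes the totals and percentages; the dictionary keys and function names are illustrative. Note that going from 25 to 59 lines is a 136% increase (59 is 236% of 25), whereas going from 2 to 26 library routines is a 1200% increase.

```python
# Line counts as listed on the two "User Example" slides.
with_virtual_io = {"hook": 2, "wrapper": 4, "application": 2 + 3 + 14, "io_routines": 2}
without_virtual_io = {"hook": 2, "wrapper": 4, "application": 2 + 3 + 48, "io_routines": 26}

def total_lines(counts):
    """Executable lines across the hook, wrapper, and application routine."""
    return counts["hook"] + counts["wrapper"] + counts["application"]

def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old

lines_with = total_lines(with_virtual_io)
lines_without = total_lines(without_virtual_io)
extra_lines = lines_without - lines_with
extra_routines = without_virtual_io["io_routines"] - with_virtual_io["io_routines"]

print(f"lines: {lines_with} vs {lines_without} "
      f"(+{extra_lines}, +{percent_increase(lines_without, lines_with):.0f}%)")
print(f"I/O routines: +{extra_routines} "
      f"(+{percent_increase(without_virtual_io['io_routines'], with_virtual_io['io_routines']):.0f}%)")
```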

User Example: Conclusions

Implementation Insights
- Minimal amount of end-user code is required
- The effort and resources involved are small (the DPEAS program recompiled in < 30 s on the user’s desktop)

Virtual I/O Insights
- The DPEAS virtual I/O access method is less complex than traditional HDF-EOS file access methods

End user’s perspective
- End users are protected from technical data format issues
- End users can develop higher quality code by leveraging shared robust common modules
- Scalability is greatly enhanced with little end user effort

Summary

- DPEAS can process large data sets in an efficient manner while maintaining centralized management controls and error handling behaviors
- Parallelism of the code is automatic and runs on “cheap hardware”
- Failover capabilities make the system more robust
- User code is shielded from complexities of the system using software abstraction layers
- Little training is needed since user interfaces are in a known scientific language
- User modules directly access data from memory, which makes traditional file access methods obsolete while maintaining needed file compatibility

What did I learn about HDF-EOS in the process?

HDF-EOS is an excellent “universal” data format. It works for all satellite sensor types I have encountered to date (10+)

HDF-EOS requires serious software design before the implementation stage

It is my experience that “Time” information as a geo/time field for sectorizing is overrated and is likely to cause future software design headaches with the more complex sensors if encouraged to be the “norm”

My 2 cents: How HDF-EOS could be made even better

(Hopefully someone has already thought of these things, and this short list will be a reaffirmation)

Given that GOES data, for example, and other multi-detector sensors can have multiple times for each channel for the same geolocation position, and that in addition, they can and do interrupt their sensor scans at any time…

Treat “Time” as a data attribute
- Currently I associate “Time” and other associated arrays with their principal data array by nomenclature
- It would be better to use data array attribute “groups”. Then “Time”, “Calibration”, and other associated arrays could be grouped with the data array through the data format.

Why Data Attributes?

Many data channels have “associated” information

For example, it might be very meaningful to associate the min. and max. of a grid location with its mean value

It would be better if there was a standard way of showing that group association, so we don’t have to understand each other’s unique nomenclatures, “intent”, or have to resort to the use of unusual “mixed” HDF/HDF-EOS data files

Data attributes should not be arbitrarily limited in scope, but have full data type ranges

Units could also be incorporated through data attributes
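
The contrast between association by nomenclature and association by attribute group can be sketched as a data structure. This is a hypothetical in-memory model, not an HDF-EOS API and not a proposal for its file layout: the `DataArray` class, the `Ch1_Time` naming convention, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DataArray:
    name: str
    values: list
    units: str = ""
    # Associated arrays ("Time", "Calibration", ...) grouped with the
    # principal array by structure instead of by a naming convention.
    attributes: dict = field(default_factory=dict)

def associates_by_nomenclature(arrays, principal):
    """Find associated arrays through a '<principal>_<role>' naming convention."""
    prefix = principal + "_"
    return {key[len(prefix):]: vals
            for key, vals in arrays.items() if key.startswith(prefix)}

# Association by nomenclature: readers must know the naming convention.
flat = {
    "Ch1": [250.1, 251.3],
    "Ch1_Time": [0.0, 0.5],
    "Ch1_Calibration": [1.0, 1.0],
}

# Association by attribute group: the grouping is explicit in the structure.
grouped = DataArray(
    "Ch1", flat["Ch1"], units="K",
    attributes={
        "Time": DataArray("Time", flat["Ch1_Time"], units="s"),
        "Calibration": DataArray("Calibration", flat["Ch1_Calibration"]),
    },
)

print(sorted(associates_by_nomenclature(flat, "Ch1")))
print(sorted(grouped.attributes), grouped.attributes["Time"].units)
```

Because each attribute is itself a full array with its own units, the sketch also reflects the points that attributes should not be limited in data type and that units can travel with the data.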

The End

[email protected]

Appendix

The following series of slides shows how a user can easily modify DPEAS

1. The user’s program hook

2. … wrapper routine

3. … application routine (using the virtual I/O data via pointers)

4. Usage of the new user routine in a DPEAS input script file

5. The Results: Complete Integration

User Example: The user’s program hook

2 lines of code

User Example: The user’s wrapper routine

4 lines of executable code

User Example: The user’s application routine

Using the virtual I/O data via pointers

1. Find each MW channel

2. Allocate a new output array data structure

Your science code looks like this

User Example: Usage of the new user routine in a DPEAS input script file

User Example: The results: Complete integration

The new user routine is now fully integrated into DPEAS

Where Do I Find DPEAS?

DPEAS Home Page:

http://luna.cira.colostate.edu/DPEAS/DPEAS_frame.htm

Please direct questions to [email protected]