Post-processing Workflows Using Data Grids to Support Hydrologic Modeling. Bakinam T. Essawy and Jonathan L. Goodall, Department of Civil and Environmental Engineering, University of Virginia. 3rd CUAHSI Conference on Hydroinformatics, July 17, 2015

  • Post-processing Workflows Using Data Grids to Support Hydrologic

    Modeling

    Bakinam T. Essawy and Jonathan L. Goodall Department of Civil and Environmental Engineering

    University of Virginia

    3rd CUAHSI Conference on Hydroinformatics July 17, 2015

  • VIC Output data set

    VIC data set http://boto.ocean.washington.edu/story/show/45

    VIC Output data set on the iRODS server

    Variable Infiltration Capacity (VIC) Macro-scale Hydrologic Model

    Example of a flux file, named Fluxes_x_y, where x = latitude and y = longitude. Flux files contain information about moisture and energy fluxes at each time step for the three soil layers (top, middle, and deep).
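
    The naming convention above can be sketched in Python as follows. This is a minimal illustration, assuming the name splits cleanly on underscores; the function name `parse_flux_filename` and the sample coordinates are hypothetical and do not come from the data set itself.

```python
def parse_flux_filename(name):
    """Split a name of the form Fluxes_<lat>_<lon> into (lat, lon) floats."""
    # Unpacking fails with ValueError if the name does not have three parts.
    prefix, lat_text, lon_text = name.split("_")
    if prefix.lower() != "fluxes":
        raise ValueError(f"not a flux file name: {name!r}")
    return float(lat_text), float(lon_text)


# Hypothetical file name; the coordinates are illustrative only.
lat, lon = parse_flux_filename("Fluxes_37.0625_-78.4375")
print(lat, lon)
```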

  • The VIC Model

    Source: Gao et al. (2009)

    VIC = Variable Infiltration Capacity, a regional-scale land surface hydrology model

    VIC was developed at the University of Washington and Princeton University and has been applied worldwide

    Spatial resolution: 1/8-degree grid cell

    Three layers of soil:

    top layer (Layer 0, 0-10 cm)

    mid layer (Layer 1, 10-30 cm)

    lower layer (Layer 2, 30-100 cm)
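
    The three soil layers listed above can be written down as a small lookup table. The sketch below is only an illustration: the depth ranges are copied from the slide, while the function name `layer_for_depth` and the choice to treat each range as a half-open interval [top, bottom) are assumptions.

```python
# (layer index, name, top depth in cm, bottom depth in cm), as listed on the slide.
SOIL_LAYERS = [
    (0, "top", 0, 10),
    (1, "mid", 10, 30),
    (2, "lower", 30, 100),
]


def layer_for_depth(depth_cm):
    """Return (index, name) of the layer whose range [top, bottom) contains depth_cm."""
    for index, name, top, bottom in SOIL_LAYERS:
        if top <= depth_cm < bottom:
            return index, name
    raise ValueError(f"depth {depth_cm} cm is outside the modeled soil column")


print(layer_for_depth(5))
print(layer_for_depth(45))
```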

  • County-level population data extracted from the Terra Populus website

  • Integrated Rule-Oriented Data System (iRODS)

    The iRODS-enabled Data Federation Consortium (DFC) is an NSF project that provides support for both federation of resources and services.

    This work is funded by the DFC project, and uses a DFC data grid for storage of, and long-term access to, the stored datasets over heterogeneous resources.

    The DFC data grid also supports sharing of workflows that enable the reproducibility of the model results.

  • Workflow Structured Object (WSO)

    Within the iRODS data grid, a Workflow Structured Object (WSO) enables the execution of a workflow, while capturing provenance information and archiving results.

    The workflow, the input files, and the output files can be shared.

    The workflow can be re-executed with new input files, and versions of the output file are saved automatically.
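
    The idea of re-running a workflow while keeping every earlier output can be illustrated with a local-filesystem analogy. This is a toy sketch, not the iRODS or WSO implementation: the function `run_workflow`, the `runDir<N>` naming, and the `output.txt` file name are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def run_workflow(base_dir, workflow, inputs):
    """Run workflow(inputs) and store the result in a new numbered run directory."""
    base = Path(base_dir)
    # Count earlier runs so that each execution gets its own directory.
    existing = [p for p in base.iterdir() if p.is_dir() and p.name.startswith("runDir")]
    run_dir = base / f"runDir{len(existing)}"
    run_dir.mkdir()
    (run_dir / "output.txt").write_text(str(workflow(inputs)))
    return run_dir


with tempfile.TemporaryDirectory() as base:
    first = run_workflow(base, sum, [1, 2, 3])   # first execution
    second = run_workflow(base, sum, [10, 20])   # re-execution with new inputs
    # Both outputs remain available; the second run did not overwrite the first.
    print(first.name, (first / "output.txt").read_text())
    print(second.name, (second / "output.txt").read_text())
```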

  • Objectives

    Demonstrate how different data transfer approaches can be used to connect cyberinfrastructure systems developed by different groups.

    Demonstrate how iRODS can provide federation across data grids.

  • Objectives

    Demonstrate the use of Amazon Web Services (AWS) for computing, and how public repositories such as SEAD allow the data and modeling resources used within analyses to be shared and uniquely identified.

    Work toward an approach to model reproducibility in which a scientist can easily share a model, together with its inputs and outputs, so that others can benefit from it.

  • Main components and data flow in the post-processing system

  • WSO files used by the WSO for creating the visualization

    Figure labels: shell script, Python scripts, parameter file, workflow file, visualization

  • Two main directories for storing all files required by the WSO on the iRODS server

    The location on the iRODS server where the shell script and the Python scripts are stored

    Contents of one of the runDir directories associated with the WSO, such as the data staged in or out, the CSV files output by the Python scripts, and the stdout

    The parameter file, the generated run file, and the output runDir created each time the run file is accessed

    The mounted collection

    The location of the mounted WSO on the iRODS server

  • The execution of the WSO installed on the hydrology grid from the client machine

    The user logs in to the client machine on which the iCommands are installed

    ils lists all the collections under the path /hydrology/home/bakinam

    The WSO is run by using the iget command on the generated .run file. The output message indicates that the WSO has been executed successfully

    icd changes to the collection where the WSO files are located

    Listing the mounted collection
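
    The session annotated above can be summarized as an ordered list of iCommands invocations. The sketch below only builds and prints the command lines; it does not contact an iRODS server. The function name `wso_session`, the collection name `wso`, the file name `example.run`, and the ordering of the steps are assumptions; only the path /hydrology/home/bakinam and the commands ils, icd, and iget come from the slide.

```python
import shlex


def wso_session(home_collection, wso_collection, run_file):
    """Return the iCommands invocations, in order, as argument lists."""
    return [
        ["ils", home_collection],  # list the collections under the home path
        ["icd", wso_collection],   # change to the collection holding the WSO files
        ["ils"],                   # list the mounted collection
        ["iget", run_file],        # getting the generated .run file runs the WSO
    ]


commands = wso_session(
    "/hydrology/home/bakinam",      # path shown on the slide
    "/hydrology/home/bakinam/wso",  # hypothetical collection name
    "example.run",                  # hypothetical run file name
)
for argv in commands:
    print(shlex.join(argv))
```

    Keeping each command as an argument list would also make it straightforward to pass the same entries to `subprocess.run` on a machine where the iCommands are installed.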

  • Conclusion

    Reproducible data visualizations on large hydrological data collections

    Using strong and weak federation of data across communities (e.g., TerraPop interoperability example)

    Publishing data along with workflow-produced metadata (e.g., SEAD interoperability example) using a unique identifier

  • Future Plans

    Replace SEAD with HydroShare for sharing my datasets, and create a HydroShare resource type from my WSO.

  • Bakinam T. Essawy

    Department of Civil and Environmental Engineering

    University of Virginia

    [email protected]

    Questions