
Mariana Vertenstein, [email protected]

CESM Software Engineering Group, National Center for Atmospheric Research

CESM is primarily sponsored by the National Science Foundation and the Department of Energy

CESM Infrastructure Update

Outline

- New approach to infrastructure development: the Common Infrastructure for Modeling the Earth (CIME)
- New coupling complexity: new components, routing complexity, new grids, and the challenges of data assimilation
- ESMF collaboration: NUOPC and on-line regridding
- New infrastructure capabilities: (1) Statistical Ensemble Test, (2) creation of parallel workflow capabilities, (3) PIO2

In the past, infrastructure development (no IP) was tied to science development (which has IP).

[Diagram: hub-and-spoke coupled architecture. The COUPLER connects seven components: Atmosphere (CAM, data), Land (CLM, data), River (RTM, data), Land Ice (CISM), Wave (WW3, data), Ocean (POP, data), and Sea Ice (CICE, data).]

Why CIME?

- Facilitate infrastructure modernization as a collaborative project (e.g. CESM infrastructure)
- A positive outcome in response to the February summit of the US Global Change Research Program (USGCRP) / Interagency Group on Integrative Modeling (IGIM). IGIM is charged with coordinating global change-related modeling activities across the Federal Government and providing guidance to USGCRP on modeling priorities.
- Enable separation of infrastructure (no intellectual property) from scientific development code (whose intellectual property must be protected)
- Eliminate duplication of effort

CIME: current steps forward

- Move ALL CESM infrastructure to a PUBLIC GitHub repository
- This will facilitate and encourage outside collaboration, frequent feedback on infrastructure development, quick problem resolution, and rapid improvement in the productivity, reliability, and extensibility of the CIME infrastructure
- CIME can be developed and tested as a stand-alone system, independent of prognostic components

Old paradigm: everything in a restricted developer repository

[Diagram: a single restricted Subversion repository holds all model components, e.g. the ATM models CAM (prognostic), DATM (data), SATM (stub), and XATM (cpl test), together with the infrastructure: driver-coupler code, share code, scripts, system and unit testing, and mapping utilities.]

New paradigm: all infrastructure is Open Source; IP protection is still in place for prognostic components

[Diagram: the restricted Subversion repository now holds only the prognostic components: CAM, CLM, CICE, POP (MPAS), RTM (MOSART), and CISM. A PUBLIC open-source GitHub repository holds the infrastructure: driver-coupler, share code, scripts, mapping utilities, system/unit testing, all data models, all stub models, and all cpl-test models.]

CIME infrastructure can be used to facilitate releases and external collaborations

[Diagram: the PUBLIC open-source GitHub repository (driver-coupler with the ESMF collaboration, share code, scripts, system/unit testing, mapping utilities, all data models, all stub models, and all cpl-test models) can be paired with different sets of prognostic components, e.g. CESM, ESMF/NUOPC, HYCOM.]

CIME implementation

- Stand-alone capability: CIME can be run and tested "stand-alone" with either all data models, all stub models, or all "test-cpl" models
- New unit testing framework: new unit tests in the coupler; the framework can also be applied to prognostic components (a generic sketch follows this list)
- Consolidation of separate externals: each part of CIME was previously developed independently, which often led to inconsistencies. CIME is now a SINGLE entity that ensures consistency among its various parts, which simplifies and adds robustness to the development process.
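For flavor only: the real framework targets the Fortran coupler code, but an isolated unit test of a pure function has roughly this shape. This is a minimal Python sketch; merge_fluxes is an invented toy function, not a CIME routine.

    # Sketch of an isolated unit test; all names here are illustrative.
    import unittest

    def merge_fluxes(flux_a, flux_b, frac_a):
        """Toy area-fraction-weighted merge of two component fluxes."""
        return frac_a * flux_a + (1.0 - frac_a) * flux_b

    class TestMergeFluxes(unittest.TestCase):
        def test_all_from_a(self):
            self.assertEqual(merge_fluxes(10.0, 2.0, 1.0), 10.0)

        def test_even_split(self):
            self.assertAlmostEqual(merge_fluxes(10.0, 2.0, 0.5), 6.0)

    if __name__ == "__main__":
        unittest.main()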

Coupling Challenges

New component, routing, and grid complexity; DART data assimilation

Coupling complexity: currently have 7 components

[Diagram: the same hub-and-spoke architecture as above: the COUPLER connects Atmosphere (CAM, data), Land (CLM, data), River (RTM, data), Land Ice (CISM), Wave (WW3, data), Ocean (POP, data), and Sea Ice (CICE, data).]

Routing and Regridding Complexity

- Each component can run on its own grid; the only assumption is that ocn/ice are on the same grid
- Multiple grids supported: regular lat/lon; dipole and tripole (ocn-pop, ice-cice); hexagonal (regular and unstructured Voronoi meshes); MPAS grids (atm dycore, ocn, ice, land-ice)
- The coupler is responsible for regridding, currently using mapping files that are generated offline with the ESMF parallel regridding tool. Fluxes are mapped conservatively; states are mapped with either bi-linear or higher-order non-conservative methods. (A sketch of applying such a map appears below.)
- Components communicate with the coupler at potentially different frequencies and with unique routing patterns
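These offline mapping files reduce to a sparse matrix of weights: each destination cell value is a weighted sum of source cell values. A minimal sketch of applying such a map, assuming SCRIP/ESMF-style col/row/S triplets; the function and variable names are illustrative, not the actual CESM reader.

    # Sketch: apply an offline-generated mapping file (SCRIP/ESMF-style
    # col/row/S sparse triplets) to regrid a source field onto a
    # destination grid. Names are illustrative assumptions.
    import numpy as np
    from scipy.sparse import csr_matrix

    def build_map(col, row, S, n_src, n_dst):
        """Assemble the sparse regridding operator from 1-based triplets."""
        return csr_matrix((S, (row - 1, col - 1)), shape=(n_dst, n_src))

    def regrid(map_matrix, src_field):
        """dst_j = sum_i S_ji * src_i; conservative if the weights are."""
        return map_matrix @ src_field

    # Toy example: 4 source cells mapped onto 2 destination cells.
    col = np.array([1, 2, 3, 4])           # source indices (1-based)
    row = np.array([1, 1, 2, 2])           # destination indices (1-based)
    S = np.array([0.5, 0.5, 0.5, 0.5])     # weights (each dest row sums to 1)
    M = build_map(col, row, S, n_src=4, n_dst=2)
    print(regrid(M, np.array([1.0, 3.0, 2.0, 4.0])))  # -> [2. 3.]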

Coupling Complexity (Routing)

[Diagram: routing among Ocean (POP, data), Sea Ice (CICE, data), Atmosphere (CAM, data), and Land Ice (CISM) through the COUPLER.]

Coupling Complexity (Routing)

[Diagram: routing among Ocean (POP, data), Sea Ice (CICE, data), Land (CLM, data), and River (RTM, data) through the COUPLER.]

Coupling Complexity (Routing)

[Diagram: routing among Atmosphere (CAM, data), Land (CLM, data), River (RTM, data), and Land Ice (CISM) through the COUPLER.]

Multiple instance capability: DART data assimilation

[Diagram: multiple instances of each component (Atmosphere (CAM), Land (CLM), Ocean (POP), Sea Ice (CICE), River-Runoff) run under a single COUPLER, while DART assimilates atmosphere observations (atm obs) and ocean observations (ocn obs) into the ensemble.]

Next steps in coupling with data assimilation

- All data assimilation is currently done via files, and DART and CESM are separate executables
- CESM must be stopped and restarted every data assimilation interval (6 hours): an extremely inefficient and expensive use of system resources
- Limitation: only 1 coupler, but multiple component instances
- New project: enable DART to be a component of the CESM coupled system. Each CESM component will have pause/resume capability and will be able to start up from a restart file during the model run. Extend the coupler capability to permit multiple couplers within a single executable. (A sketch of the pause/resume cycle follows.)
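A minimal sketch of the targeted pause/resume cycle. Component, Filter, and every method name below are hypothetical stand-ins for the actual CESM/DART interfaces, which this slide only describes at a high level.

    # Sketch of the planned in-executable assimilation cycle; all names
    # are illustrative inventions, not CESM or DART interfaces.
    class Component:
        def __init__(self, name):
            self.name, self.state, self.hours = name, 0.0, 0

        def advance(self, hours):
            self.hours += hours    # stand-in for the model time-step loop

        def pause(self):
            pass                   # would write a restart-style snapshot

        def resume(self):
            pass                   # would restart from the updated state

    class Filter:
        def assimilate(self, components):
            for c in components:   # stand-in for the DART ensemble update
                c.state += 0.1

    def run_cycles(components, da_filter, n_cycles, interval=6):
        for _ in range(n_cycles):
            for c in components:
                c.advance(interval)
                c.pause()
            da_filter.assimilate(components)  # in place: no executable restart
            for c in components:
                c.resume()

    run_cycles([Component("atm"), Component("ocn")], Filter(), n_cycles=4)

The point of the pattern is the middle step: the filter updates paused component states inside the running executable, removing the stop/restart every 6 hours.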

ESMF Collaboration

NUOPC and Online Regridding

Two ESMF/CESM Collaborations

1. Online regridding: ESMF is collaborating with CSEG, and this will be brought into CIME

2. Introduction of "ESMF/NUOPC" capability in CESM/CIME. Vertenstein is co-PI on the ESPC proposal "An Integration and Evaluation Framework for ESPC Coupled Models":
- Import the standalone version of HYCOM into the validated NUOPC version of the CESM coupled system and test it using the CORE forcing
- Run reference CESM configurations (500 years of present-day climate + IPCC scenarios) for both POP and HYCOM

NUOPC and CESM

What is NUOPC? NUOPC-layer "generic components" are templates that encode rules for drivers, models, mediators (custom coupling code), and connectors (for data transfer).

Goal for the driver: maintain a single CESM driver, but restructure it to accommodate both MCT and NUOPC component interfaces.

Goal for components: implement ESMF-based NUOPC components as a CESM option, but build on existing ESMF component interfaces.

Redesign of cpl7 as a first step

Why? The driver code was one large routine (6K lines of code), hard-coded to contain MCT data types: difficult to understand, difficult to modify, and difficult to extend with an alternative coupling architecture.

What was the redesign?
- Introduced a new abstraction layer between the driver and the components; the driver has no reference to MCT or ESMF types
- Much easier to incorporate new ESMF/NUOPC driver components
- Permits backwards compatibility and memory sharing between MCT and ESMF data structures (see the sketch after the diagram below)

[Diagram: original vs. redesigned driver. Originally, an MCT-based DRIVER talked to CAM directly through an MCT hub <-> CAM exchange, so an ESMF CAM required an extra MCT <-> ESMF translation. In the redesign, a component-type-based DRIVER references only a neutral component type, which connects through either an MCT hub <-> CAM exchange or an ESMF hub <-> CAM exchange (component_type <-> MCT or component_type <-> ESMF), supporting both MCT CAM and ESMF CAM.]
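The redesign follows a familiar adapter pattern: the driver holds only a neutral component type, and the framework-specific exchange lives behind it. A minimal Python sketch of that pattern; the real driver is Fortran, and every name below is invented for illustration.

    # Sketch of the driver/component abstraction described above.
    from abc import ABC, abstractmethod

    class ComponentExchange(ABC):
        """Neutral component type: the driver references only this."""
        @abstractmethod
        def run(self, coupling_data): ...

    class MCTExchange(ComponentExchange):
        def run(self, coupling_data):
            # would pack coupling_data into MCT attribute vectors
            return {"framework": "MCT", **coupling_data}

    class ESMFExchange(ComponentExchange):
        def run(self, coupling_data):
            # would pack coupling_data into ESMF/NUOPC Fields
            return {"framework": "ESMF", **coupling_data}

    def driver_step(components, coupling_data):
        # The driver loop has no reference to MCT or ESMF types.
        return [comp.run(coupling_data) for comp in components]

    print(driver_step([MCTExchange(), ESMFExchange()], {"t_surf": 288.0}))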

Current and Future Work

Current status:
- NUOPC implementation complete for all CESM components AND HYCOM
- Modified the data exchange (between coupler/mediator and components) to use NUOPC Fields

Future work:
- Clean up and prepare the NUOPC version for wider use
- Merge the code back to CIME and reconcile it with other developmental changes as well as with the MCT implementation
- Performance evaluation

ESMF online regridding

As more regionally refined grids are introduced (e.g. MPAS, SE), we need to:
- minimize the number of mapping files that are needed
- simplify and streamline the workflow for generating new user grid configurations

Online regridding will be a requirement for run-time adaptive mesh refinement, and ESMF is the only tool that currently delivers this capability.

Status: a prototype implementation has been done and is being updated for the newest coupler.

New Infrastructure Capabilities (1)

Statistical Ensemble Test

CESM Ensemble Consistency Test

Motivation: ensure that changes during the CESM development cycle (code modifications, compiler changes, new machine architectures) do not adversely affect the code.

Question: is the new data statistically distinguishable from the old?

Old method: compare multiple long simulations; time-consuming and subjective.

New method: evaluate new data in the context of an ensemble of CESM runs.

Part 1: Create a "truth" ensemble of 1-year CESM runs (151 members)
- Use an "accepted" machine and an "accepted" software stack
- Members differ by O(10^-14) perturbations in the initial atmospheric temperature

Part 2: Create an "accepted" statistical distribution
- Statistics based on the ensemble
- Summary file included with the CESM release

Part 3: Evaluate "new" runs (new platform, code base, ...)
- Create 3 "new" runs (initial conditions randomly selected from the ensemble)
- Principal Component Analysis (PCA)-based testing (a toy sketch follows)
- Provides false positive rates
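For intuition, a toy numerical sketch of a PCA-based consistency check. This is emphatically not the actual CESM-ECT algorithm; the data, the number of retained components, and the pass/fail rule (ensemble min/max of leading PC scores) are illustrative simplifications.

    # Toy sketch of a PCA-based consistency check; NOT the real CESM-ECT.
    import numpy as np

    rng = np.random.default_rng(0)
    ens = rng.normal(size=(151, 20))   # 151 ensemble members x 20 variables

    # Standardize with ensemble statistics, then get principal components.
    mu, sigma = ens.mean(axis=0), ens.std(axis=0)
    z = (ens - mu) / sigma
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt.T                  # ensemble PC scores
    lo, hi = scores.min(axis=0), scores.max(axis=0)

    def consistent(run, n_pcs=10):
        """Pass a new run whose leading PC scores stay in the ensemble range."""
        s = ((run - mu) / sigma) @ vt.T
        return np.all((s[:n_pcs] >= lo[:n_pcs]) & (s[:n_pcs] <= hi[:n_pcs]))

    print(consistent(ens[0]))          # a member of the ensemble passes
    print(consistent(mu + 25 * sigma)) # a wildly different run fails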

CESM Ensemble Consistency Test

Many uses:
- Port verification (new CESM-supported architectures, heterogeneous computing platforms)
- Sanity check on climate similarity for new CESM system snapshots (caught a recent bug that would not have been caught before!)
- Exploration of new algorithms, solvers, compiler options, ...
- Evaluation of data compression on CESM data


New Infrastructure Capabilities (2)

New End-to-End workflow

Parallel post-processing as part of the model run

New Infrastructure Capabilities (3)

New Parallel IO Library (PIO2)

New Parallel IO Library: PIO2

The PIO2 rewrite has:
- a new C language API (the F90 API is retained)
- a new decomposition option that improves scalability
- data aggregation, which improves performance
- a new testing framework

PIO2 provides higher performance through data aggregation, and the new subset rearranger provides higher scalability. It offers more options for IO performance tuning, but also provides tools that make tuning easier.

The subset rearranger gives better scaling; the box rearranger gives an optimal data layout. (A toy sketch of the rearrangement idea follows.)

Example decomposition: CLM data on 2048 tasks, rearranged to 32 IO tasks.
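For intuition, a toy sketch of the subset-rearrangement idea. Real PIO2 does this with MPI communication and decomposition maps, so everything below, including the function name, is an illustrative simplification.

    # Toy sketch of compute-task -> IO-task rearrangement; illustrative only.
    def subset_rearrange(compute_data, n_io_tasks):
        """Group compute tasks into subsets, one subset per IO task."""
        per_io = len(compute_data) // n_io_tasks
        io_buffers = []
        for io_rank in range(n_io_tasks):
            # Each IO task aggregates a contiguous subset of compute tasks,
            # so aggregation stays local and scales with the task count.
            subset = compute_data[io_rank * per_io:(io_rank + 1) * per_io]
            io_buffers.append([x for chunk in subset for x in chunk])
        return io_buffers

    # 8 compute tasks each hold 2 values; rearrange onto 2 IO tasks.
    compute_data = [[2 * r, 2 * r + 1] for r in range(8)]
    for io_rank, buf in enumerate(subset_rearrange(compute_data, 2)):
        print(f"io task {io_rank} writes {buf}")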

Acknowledgements (plus many more)

ESMF Collaboration: Cecelia Deluca, Fei Liu, Gerhard Theurich, Peggy Li, Robert Oehmke, Mathew Rothstein, Tony Craig, Jim Edwards

Statistical Ensemble Test: Allison Baker, Dorrit Hammerling, Mike Levy, Doug Nychka, John Dennis, Joe Tribbia, Dave Williamson, Jim Edwards

Parallel IO: Jim Edwards, John Dennis, Jayesh Krishna

Data Assimilation: Alicia Karspeck, Nancy Collins, Tim Hoar, Kevin Reader, Jeff Anderson, Tony Craig

Parallel Workflow: Alice Bertini, Sheri Mickelson, Jay Schollenberger

CSEG: Ben Andre, David Bailey, Alice Bertini, Cheryl Craig, Tony Craig, Brian Eaton, Jim Edwards, Erik Kluzek, Mike Levy, Bill Sacks, Sean Santos, Jay Schollenberger