Cactus in GrADS

Dave Angulo, Ian Foster, Matei Ripeanu, Michael Russell. Distributed Systems Laboratory, The University of Chicago. With: Gabrielle Allen, Thomas Dramlitsch, Ed Seidel, John Shalf, Thomas Radke

TRANSCRIPT

Page 1: Cactus in GrADS

Cactus in GrADS

Dave Angulo, Ian Foster

Matei Ripeanu, Michael Russell

Distributed Systems Laboratory

The University of Chicago

With: Gabrielle Allen, Thomas Dramlitsch, Ed Seidel, John Shalf, Thomas Radke

Page 2: Cactus in GrADS

Distributed Systems Lab ARGONNE CHICAGO

Presentation Outline

Cactus Overview

– Architecture

– Applications

Cactus and Grid computing

– Metacomputing, Worms, …

Proposed Cactus-GrADS project

– The “Cactus-G worm”

– Tequila thorn and architecture

– Issues

Page 3: Cactus in GrADS

What is Cactus?

Cactus is a freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multidimensional simulations

– Originally developed for astrophysics, but nothing about it is astrophysics-specific

Page 4: Cactus in GrADS

Cactus Applications

Example output from Numerical Relativity Simulations

Page 5: Cactus in GrADS

Cactus Architecture

Codes are constructed by linking a small core (flesh) with selected modules (thorns)

– Custom linking/configuration tools

Core provides basic management services

A wide variety of thorns are supported

– Numerical methods

– Grids and domain decompositions

– Visualization and steering

– Etc.

Page 6: Cactus in GrADS

Cactus Architecture

[Architecture diagram. Labels: Cactus; Flesh; Thorns (Computational Toolkit, Toolkit, Toolkit); Configure, CST, Make; Operating Systems: AIX, NT, Linux, Unicos, Solaris, HP-UX, SuperUX, Irix, OSF]

Page 7: Cactus in GrADS

Cactus Applications

A Cactus “application” is just another thorn, “linked” with other tool thorns

Numerous astrophysics applications

– E.g., calculate Schwarzschild event horizons for colliding black holes

Potential candidates for GrADS work

– Elliptical Solver, BenchADM

– Both use 3-D grid abstract topology

Page 8: Cactus in GrADS

Cactus Model (cont.)

Building an executable

Cactus Source

– Flesh

– Thorns: IOBasic, IOASCII, WaveToy, LDAP, Worm

Configuration

• Compiler options

• Tool options

• MPI options

• HDF5 options

Page 9: Cactus in GrADS

Running Cactus

Parameter File

• Specify which thorns to activate

• Specify global parameters

• Specify restricted parameters

• Specify private parameters
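
The role of the parameter file can be illustrated with a small reader. This is only a sketch: the `ActiveThorns = "..."` line and the `thorn::parameter = value` form are assumed here for illustration, the parameter names are hypothetical, and the global/restricted/private scoping is not modelled. The Cactus documentation is the authority on the real grammar.

```python
# Hypothetical sketch of reading a Cactus-style parameter file.
# The file layout and the parameter names below are assumptions made
# for illustration; they are not taken from the slides.

def parse_parameter_file(text):
    """Return (list of active thorns, {thorn: {parameter: value}})."""
    active_thorns = []
    parameters = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"')
        if key.lower() == "activethorns":
            active_thorns.extend(value.split())
        else:
            thorn, _, name = key.partition("::")
            parameters.setdefault(thorn, {})[name] = value
    return active_thorns, parameters


SAMPLE = '''
# thorns to activate (names taken from the build slide)
ActiveThorns = "WaveToy IOBasic IOASCII"
wavetoy::amplitude = 1.0      # hypothetical parameter
iobasic::out_every = 10       # hypothetical parameter
'''

active, params = parse_parameter_file(SAMPLE)
print(active)
print(params)
```

All values are kept as strings; a real reader would also validate each value against the type and range declared by the owning thorn.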

Page 10: Cactus in GrADS

Parallelism in Cactus

Distributed memory model: each thorn is passed a section of the global grid

The parallel driver (implemented in a thorn) can use whatever method it likes to decompose the grid across processors and exchange ghost zone information

– Each thorn is presented with a standard interface, independent of the driver

Standard driver distributed with Cactus (PUGH) is for a parallel unigrid and uses MPI for the communication layer

PUGH can do custom processor decomposition and static load balancing

AMR driver also provided
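
The decomposition and ghost-zone idea can be sketched in one dimension. This is not PUGH's algorithm or interface: the function names are hypothetical, the "processors" are simulated in a single process, and plain list slicing stands in for the MPI exchange.

```python
# Minimal 1-D sketch of block decomposition with ghost zones.
# All names are hypothetical; no MPI is used.

def decompose(global_size, num_procs):
    """Split indices 0..global_size-1 into contiguous blocks, one per processor."""
    base, extra = divmod(global_size, num_procs)
    bounds, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def scatter_with_ghosts(global_grid, num_procs, ghost=1):
    """Give each processor its block plus `ghost` cells copied from each neighbour."""
    n = len(global_grid)
    blocks = []
    for lo, hi in decompose(n, num_procs):
        g_lo, g_hi = max(lo - ghost, 0), min(hi + ghost, n)
        blocks.append({"owned": (lo, hi), "offset": g_lo,
                       "data": global_grid[g_lo:g_hi]})
    return blocks


def smooth_local(block):
    """Three-point average on owned points, reading ghost cells at block edges."""
    lo, hi = block["owned"]
    data, offset = block["data"], block["offset"]
    out = {}
    for i in range(lo, hi):
        j = i - offset
        if 0 < j < len(data) - 1:
            out[i] = (data[j - 1] + data[j] + data[j + 1]) / 3.0
        else:
            out[i] = data[j]  # physical boundary point: left unchanged
    return out


grid = [float(i * i) for i in range(10)]
result = {}
for block in scatter_with_ghosts(grid, num_procs=3):
    result.update(smooth_local(block))
print([result[i] for i in range(10)])
```

The update routine sees only its local block, so the same code runs unchanged for any number of blocks; this mirrors the point that thorns are independent of the driver.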

Page 11: Cactus in GrADS

Cactus and Grid Computing: General Observations

Reasons to work with Cactus

– Rich structure, computationally intensive, numerous opportunities for Grid computing

– Talented and motivated developer/user community

Issues

– At core, relatively simple structure

– Cactus system is relatively complex

– User community is relatively small

Page 12: Cactus in GrADS

Cactus-G: Possible Opportunities

“Metacomputing”: use heterogeneous systems as source of low-cost cycles

– Departmental pool or multi-site system

Dynamic resource selection, e.g.

– “Cheapest” resources to achieve interactivity

– “Fastest” resource for best turnaround

– “Best” resolution to meet turnaround goal

– Spawn independent tasks: e.g., analysis

– Migration to “better” resource for all above
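
The selection policies above might be sketched as follows. The `Resource` fields, the cost unit and the assumption that throughput scales with the inverse cube of the grid edge are all hypothetical, chosen only to make the policies concrete.

```python
# Hypothetical sketch of "cheapest", "fastest" and "best resolution" selection.
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cost_per_hour: float      # hypothetical accounting unit
    steps_per_second: float   # assumed throughput at the reference grid size


def cheapest_meeting_rate(resources, min_steps_per_second):
    """Cheapest resource that is still fast enough to stay interactive."""
    ok = [r for r in resources if r.steps_per_second >= min_steps_per_second]
    return min(ok, key=lambda r: r.cost_per_hour) if ok else None


def fastest(resources):
    """Resource with the best turnaround."""
    return max(resources, key=lambda r: r.steps_per_second)


def best_resolution(resource, steps, deadline_s, sizes, ref_size=100):
    """Largest grid edge that finishes in time, assuming rate ~ (ref_size / n)**3."""
    feasible = []
    for n in sizes:
        rate = resource.steps_per_second * (ref_size / n) ** 3
        if steps / rate <= deadline_s:
            feasible.append(n)
    return max(feasible) if feasible else None


pool = [
    Resource("dept-pool", cost_per_hour=1.0, steps_per_second=2.0),
    Resource("cluster", cost_per_hour=4.0, steps_per_second=10.0),
    Resource("supercomputer", cost_per_hour=20.0, steps_per_second=50.0),
]
cheap = cheapest_meeting_rate(pool, min_steps_per_second=5.0)
quick = fastest(pool)
edge = best_resolution(pool[1], steps=1000, deadline_s=1000, sizes=[100, 200, 400])
print(cheap.name, quick.name, edge)
```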

Page 13: Cactus in GrADS

Cactus-G: Common Building Blocks

Resource selection based on resource and application characterizations

Implementation and management of distributed output

(De)centralized logging, accounting for resource usage, parameter selection, etc.

Fault discovery, recovery, tolerance

Code/executable management and creation

Next-generation Cactus that increases flexibility with respect to parameter selection

Page 14: Cactus in GrADS

Proposed Cactus-G Challenge Problem: Cactus-G Worm

Migrate to “faster/cheaper/bigger” system

– When system identified by resource discovery

– When resource requirements change

Why?

– Tests much of the machinery required for Cactus-G (source code mgmt, discovery, …)

– Places substantial demands on GrADS

– Good potential to show real benefit

– Migration approach simplifies infrastructure demands (MPI-2 support not required)

Page 15: Cactus in GrADS

Cactus-G Worm: Basic Architecture and Operation

[Architecture diagram. Components: Cactus “flesh”; “Tequila” thorn; application and other thorns; Application Manager; GrADS Resource Selector; Grid Information Service; compute resources; code repositories …; storage resources]

Labeled interactions:

(0) Possible user input

(1) Adaptation request

(1’) Resource notification

(2) Resource request

(3) Write checkpoint

(4) Migration request

(5) Cactus startup

(6) Load code

(7) Read checkpoint

Other labels: Store models, etc.; Query

Page 16: Cactus in GrADS

Tequila Thorn Functions

Initiates adaptation on application request or on notification of new resources

– Can include user input (e.g., HTTP thorn)

Requests resources from external entity

– GIS or ResourceSelector

Checkpoints application

Contacts Application Manager to request restart on new resources

– AppManager has security, robustness advantages vs. direct restart
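
The sequence of functions listed above can be sketched as a single adaptation routine. Every class and method name here is hypothetical: the selector returns a static bag of resources, the Application Manager only records restart requests, and an in-memory copy stands in for a checkpoint file.

```python
# Hypothetical sketch of the Tequila thorn's adaptation sequence.

class DummyResourceSelector:
    """Returns the same static bag of resources for every request."""
    def __init__(self, bag):
        self.bag = list(bag)

    def request_resources(self, requirements):
        return list(self.bag)


class RecordingApplicationManager:
    """Records restart requests instead of launching anything."""
    def __init__(self):
        self.restarts = []

    def request_restart(self, resources, checkpoint):
        self.restarts.append((tuple(resources), checkpoint))


class TequilaSketch:
    def __init__(self, selector, app_manager):
        self.selector = selector
        self.app_manager = app_manager
        self.state = {"iteration": 0}

    def write_checkpoint(self):
        # An in-memory copy stands in for writing a checkpoint file.
        return dict(self.state)

    def adapt(self, requirements, current_resources):
        """Request resources, checkpoint, then ask the manager for a restart."""
        offered = self.selector.request_resources(requirements)
        if not offered or offered == list(current_resources):
            return False  # nothing new on offer, keep running
        checkpoint = self.write_checkpoint()
        self.app_manager.request_restart(offered, checkpoint)
        return True


manager = RecordingApplicationManager()
thorn = TequilaSketch(DummyResourceSelector(["hostB"]), manager)
thorn.state["iteration"] = 42
moved = thorn.adapt({"min_cpus": 1}, current_resources=["hostA"])
stayed = thorn.adapt({"min_cpus": 1}, current_resources=["hostB"])
print(moved, stayed, manager.restarts)
```

Routing the restart through the manager object, rather than having the thorn start the new run itself, reflects the security and robustness argument made on the slide.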

Page 17: Cactus in GrADS

Cactus-G Worm: Approach

1) Uniproc Tequila thorn that speaks to GIS, adapts periodically [done: Cactus group]

2) Tequila thorn that speaks to UCSD Resource Selector [current focus]

3) Integrate accurate performance models

4) Support multiprocessor execution

5) Detailed evaluation

6) Add adaptation triggers: e.g., contract violation, new regime, user input

Page 18: Cactus in GrADS

Tequila Thorn + ResourceSelector

ResourceSelector must be set up as service

Tequila thorn sends request for new bag of resources

ResourceSelector responds with the new bag

Page 19: Cactus in GrADS

Current Status

Tequila thorn prototype developed that speaks to ResourceSelector

Dummy ResourceSelector that returns a static bag of resources

Demonstrated Cactus+Tequila operating

Performance model developed

Expected by May: multiprocessor support, ResourceSelector interface, real performance model

Page 20: Cactus in GrADS

Open Issues

Should we move more management logic into Application Manager?

How does Contract Monitor fit into architecture?

How does PPS fit into architecture?

How do COP and Application Launcher fit into architecture (Cactus has its own launcher and compiles its own code)?

How does Pablo fit into architecture (Which thorns are monitored, is flesh monitored)?

Page 21: Cactus in GrADS

The End

Page 22: Cactus in GrADS

Request and Response

The Request to the ResourceSelector will be stored in the InformationService

Only the pointer to the data in the IS will be passed to the ResourceSelector

The Response from the ResourceSelector will also be stored in the IS

Only the pointer to the data in the IS will be passed back.
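
The pointer-passing exchange described above might look like the following sketch. The in-memory dictionary standing in for the InformationService, the use of a UUID string as the "pointer", and the request format are all assumptions made for illustration.

```python
# Hypothetical sketch: request and response travel through the
# InformationService (IS); only keys are passed between the parties.
import uuid


class InformationServiceSketch:
    """In-memory stand-in for the IS."""
    def __init__(self):
        self._records = {}

    def store(self, data):
        key = str(uuid.uuid4())  # the key plays the role of the "pointer"
        self._records[key] = data
        return key

    def retrieve(self, key):
        return self._records[key]


class ResourceSelectorSketch:
    def __init__(self, info_service, available):
        self.info = info_service
        self.available = list(available)

    def handle(self, request_key):
        """Read the request from the IS, store the response there, return its key."""
        request = self.info.retrieve(request_key)
        bag = self.available[: request["count"]]
        return self.info.store({"request": request_key, "resources": bag})


def tequila_request(info_service, selector, count):
    """Store the request, pass only its key, then follow the returned key."""
    request_key = info_service.store({"count": count})
    response_key = selector.handle(request_key)
    return info_service.retrieve(response_key)["resources"]


info = InformationServiceSketch()
selector = ResourceSelectorSketch(info, ["node1", "node2", "node3"])
bag = tequila_request(info, selector, count=2)
print(bag)
```

Both records remain in the store after the exchange, so a later phase could pick them up again without contacting the original sender.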

Page 23: Cactus in GrADS

Tequila communication overview

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 24: Cactus in GrADS

Cactus Architecture in GrADS

[Architecture diagram. Labels: Cactus; Flesh; Thorns (Computational Toolkit, Toolkit, Toolkit, GrADS communication library); Configure, CST, Make; Operating Systems: AIX, NT, Linux, Unicos, Solaris, HP-UX, SuperUX, Irix, OSF]

Page 25: Cactus in GrADS

Communication Details step 1

Event sent to Tequila thorn requesting restart

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 26: Cactus in GrADS

Communication Details step 2

Tequila stores AART in IS

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 27: Cactus in GrADS

Communication Details step 3

Tequila sends request to ResourceSelector, passing pointer to data in IS

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 28: Cactus in GrADS

Communication Details step 4

ResourceSelector retrieves AART from IS

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 29: Cactus in GrADS

Communication Details step 5

ResourceSelector stores bag of resources (in AART) in IS

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 30: Cactus in GrADS

Communication Details step 6

ResourceSelector responds to Tequila, passing pointer to data in IS

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 31: Cactus in GrADS

Communication Details step 7

Tequila retrieves AART with new bag of resources from IS

[Diagram labels: Cactus, Tequila Thorn, ResourceSelector, InformationService]

Page 32: Cactus in GrADS

Requirements

Using the IS for communication adds overhead.

Why do this?

GrADS requirement 1: do some things (e.g. compile) at one time and have the results stored in a persistent storage area. Pick these stored results up later and complete other phases.

Page 33: Cactus in GrADS

Sample Tequila Scenario

User asks to run an ADM simulation 400x400x400 for 1000 timesteps in 10s.

Resource selector contacted to obtain virtual machines

Best virtual machine selected based on performance model.

AM starts Cactus on that virtual machine (and monitors execution. Contracts?)

User (or application manager) decides that computation advances too slowly and decides to search for a better virtual machine

AM finds a better machine, commands the Cactus run to checkpoint, transfers files and restarts Cactus
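
The selection and migration decisions in this scenario can be sketched with a toy performance model. The model (run time proportional to grid points times timesteps, divided by aggregate update rate), the machine descriptions and the fixed migration cost are all hypothetical; the slides do not specify the actual performance model.

```python
# Hypothetical sketch of the scenario's two decisions:
# pick the best virtual machine, then decide whether migrating pays off.

def predicted_runtime(grid_points, timesteps, machine):
    """Toy model: total point updates divided by the machine's aggregate rate."""
    rate = machine["procs"] * machine["updates_per_sec_per_proc"]
    return grid_points * timesteps / rate


def pick_best(machines, grid_points, timesteps):
    """Virtual machine with the smallest predicted run time."""
    return min(machines, key=lambda m: predicted_runtime(grid_points, timesteps, m))


def should_migrate(current, candidate, remaining_steps, grid_points, migration_cost_s):
    """Migrate only if the remaining work finishes sooner, checkpoint cost included."""
    stay = predicted_runtime(grid_points, remaining_steps, current)
    move = predicted_runtime(grid_points, remaining_steps, candidate) + migration_cost_s
    return move < stay


GRID_POINTS = 400 ** 3  # the 400x400x400 run from the scenario
vm_a = {"name": "vm-a", "procs": 16, "updates_per_sec_per_proc": 1_000_000}
vm_b = {"name": "vm-b", "procs": 64, "updates_per_sec_per_proc": 2_000_000}

best = pick_best([vm_a, vm_b], GRID_POINTS, timesteps=1000)
# Suppose the run started on vm-a and vm-b is discovered later.
early = should_migrate(vm_a, vm_b, remaining_steps=500,
                       grid_points=GRID_POINTS, migration_cost_s=120)
late = should_migrate(vm_a, vm_b, remaining_steps=10,
                      grid_points=GRID_POINTS, migration_cost_s=120)
print(best["name"], early, late)
```

With these made-up numbers, migration is worthwhile when 500 timesteps remain but not when only 10 remain, because the fixed checkpoint-and-transfer cost then outweighs the gain.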