www.cactuscode.org www.gridlab.org

Scenarios for Grid Applications

Ed Seidel
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
[email protected]
Scenarios 2
PHYSICS TOMORROW Remote
visualization, monitoring and steering
Distributed computing
Grid portals More dynamic
scenarios
Outline
Scenarios 3
Why Grids? (1): eScience
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data
Source: Ian Foster
Scenarios 4
Why Grids? (2): eBusiness
Engineers at a multinational company collaborate on the design of a new product
A multidisciplinary analysis in aerospace couples code and data in four companies
An insurance company mines data from partner hospitals for fraud detection
An application service provider offloads excess load to a compute cycle provider
An enterprise configures internal & external resources to support eBusiness workload
Source: Ian Foster
Scenarios 5
Current Grid App Types
Community driven: serving the needs of distributed communities
– Video conferencing
– Virtual collaborative environments
– Code sharing to "experiencing each other" at a distance...
Data driven: remote access of huge data, data mining
– Weather information systems
– Particle physics
Process/simulation driven: demanding simulations of science and engineering
– Get less attention in the Grid world, yet drive HPC!
– Remote, steered, etc...
Present examples: "simple but very difficult"
Future examples: dynamic, interacting combinations of all types
Scenarios 6
From Telephone Conference Calls to Access Grid International Video Meetings
Access Grid: lead, Argonne. NSF STAR TAP: lead, UIC's Electronic Visualization Lab
Creating a Virtual Global Research Lab
Internet Linked Pianos
New Cyber Arts: Humans Interacting with Virtual Realities
Source: Smarr
Scenarios 7
NSF's EarthScope -- USArray: Explosions of Data! Typical!
High resolution of crust & upper mantle structure
Transportable broadband array
– 400 broadband seismometers
– ~70 km spacing, ~1500 x 1500 km grid
– ~2-year deployments at each site
– Rolling deployment over more than 10 years
Permanent reference network of geodetic-quality GPS receivers
All data to the community in near real time
Bandwidth will be driven by visual analysis in repositories
Real-time simulations work as the data come in
Source: Smarr/Frank Vernon (IGPP SIO, UCSD)
Scenarios 8
Rollout Over 14 Years Starting With Existing Broadband Stations
Source: Smarr
EarthScope Rollout
Scenarios 9
Common Infrastructure
Wide range of scientific disciplines will require a common infrastructure
Common needs, driven by the science/engineering:
– Large number of sensors/instruments: data to the community in real time!
– Daily generation of large data sets
– Growth in computing power: from TB to PB machines
– Experimental data
– Data on multiple length and time scales
– Automatic archiving in distributed repositories
– Large community of end users
– Multi-megapixel and immersive visualization
– Collaborative analysis from multiple sites
– Complex simulations needed to interpret data
Source: Smarr
Scenarios 10
Common Infrastructure
Wide range of scientific disciplines will require a common infrastructure
Some will need optical networks:
– Communications: dedicated lambdas
– Data: large peer-to-peer, lambda-attached storage
Source: Smarr
Scenarios 11
Huge Black Hole Collision Simulation: 3000 frames of volume rendering, TB of data
Scenarios 12
Issues for Complex Simulations
Huge amounts of data needed/generated across different machines
– How to retrieve, track, and manage data across the Grid?
– In this case, we had to fly from Berlin to NCSA and bring the data back on disks!
Many components developed by distributed collaborations
– How to bring communities together?
– How to find/load/execute different components?
Many computational resources available
– How to find the best ones to start? How to distribute work effectively?
Needs of computations change with time!
– How to adapt to changes? How to monitor the system?
How to interact with experiments? Coming!
Scenarios 13
Scale of computations much larger: need Grids to keep up...
– Much larger processes; task farm many more processes (e.g. Monte Carlo)
– Also: how to simply make better use of current resources
Complexity approaching that of Nature
– Simulations of the Universe and its constituents: black holes, neutron stars, supernovae; human genome, human behavior
Teams of computational scientists working together: the future of science and engineering
– Must support efficient, high-level problem description
– Must support collaborative computational science: look at Grand Challenges
– Must support all different languages
Ubiquitous Grid computing
– Resources, applications, etc. replaced by an abstract notion: Grid Services (e.g., OGSA)
– Very dynamic simulations, even deciding their own future
– Apps may find the services themselves: distributed, spawned, etc...
– Must be tolerant of dynamic infrastructure
– Monitored, viz'ed, controlled from anywhere, with colleagues elsewhere
Future View: Much is Here
Scenarios 14
Much already here:
– Scale of computations much larger: need Grids to keep up...
– Much larger processes; task farm many more processes (e.g. Monte Carlo)
– Also: how to simply make better use of current resources
– Complexity approaching that of Nature: simulations of the Universe and its constituents (black holes, neutron stars, supernovae; human genome, human behavior; climate modeling, environmental effects)
Future View
Scenarios 15
Future View
Teams of computational scientists working together: the future of science and engineering
– Must support efficient, high-level problem description
– Must support collaborative computational science: Grand Challenges
– Must support all different languages
Ubiquitous Grid computing
– Resources, applications, etc. replaced by an abstract notion: Grid Services (e.g., OGSA)
– Very dynamic simulations, even deciding their own future
– Apps may find services themselves: distributed, spawned, etc...
– Must be tolerant of dynamic infrastructure
– Monitor, viz, control from anywhere, with colleagues elsewhere
Scenarios 16
SC93 - SC2000, typical scenario:
– Find remote resource (often using multiple computers)
– Launch job (usually static, tightly coupled)
– Visualize results (usually in-line, fixed)
Need to go far beyond this
– Make it much, much easier: portals, Globus, standards
– Make it much more dynamic, adaptive, fault tolerant
– Migrate this technology to the general user
Metacomputing Einstein's Equations: connecting T3E's in Berlin, Garching, SDSC
Grid Applications So Far
Scenarios 17
Cactus Computational Toolkit: science, Autopilot, AMR, PETSc, HDF, MPI, GrACE, Globus, remote steering
1. User has science idea...
2. Composes/builds code components with interface...
3. Selects appropriate resources...
4. Steers simulation, monitors performance...
5. Collaborators log in to monitor...
Want to integrate and migrate this technology to the generic user...
The Vision, Part I: "ASC"
Scenarios 18
Start simple: sit here, compute there
Accounts for one user (real case): berte.zib.de denali.mcs.anl.gov golden.sdsc.edu gseaborg.nersc.gov harpo.wustl.edu horizon.npaci.edu loslobos.alliance.unm.edu mcurie.nersc.gov modi4.ncsa.uiuc.edu ntsc1.ncsa.uiuc.edu origin.aei-potsdam.mpg.de pc.rzg.mpg.de pitcairn.mcs.anl.gov quad.mcs.anl.gov rr.alliance.unm.edu sr8000.lrz-muenchen.de
16 machines, 6 different usernames, 16 passwords, ...
This is hard, but it gets much worse from here…
Portals provide:
– Single access to all resources
– Locate/build executables
– Central/collaborative parameter files
– Job submission/tracking
– Access to new Grid technologies
Use any web browser!
Scenarios 19
Start simple: sit here, compute there
Scenarios 20
Distributed Computation: Harnessing Multiple Computers
Why would anyone want to do this?
– Capacity: computers can't keep up with needs
– Throughput
Issues:
– Bandwidth (increasing faster than computation)
– Latency
– Communication needs, topology
– Communication/computation ratio
Techniques to be developed:
– Overlapping communication and computation
– Extra ghost zones to reduce latency
– Compression
– Algorithms to do this for the scientist
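The latency trade-off behind "extra ghost zones" can be captured in a back-of-the-envelope cost model. The sketch below is illustrative (the function and its parameters are not from Cactus): with g ghost layers, neighbouring machines only synchronise every g update steps, so a high-latency wide-area link is hit far less often, at the price of larger messages.

```python
import math

def exchange_cost(steps, ghosts, layer_bytes, latency_s, bandwidth_Bps):
    """Estimate total communication time for a stencil run that keeps
    `ghosts` ghost layers: neighbours are synchronised only every
    `ghosts` steps, trading fewer (latency-bound) messages for larger ones."""
    rounds = math.ceil(steps / ghosts)      # one exchange per `ghosts` steps
    bytes_per_round = ghosts * layer_bytes  # deeper halo => bigger message
    return rounds * (latency_s + bytes_per_round / bandwidth_Bps)
```

On a slow transatlantic link like the ~2.5 MB/sec mentioned later in the talk, three ghost layers beat one, because the message count falls faster than the message size grows.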
Scenarios 21
Remote Visualization & Steering
Remote viz data served over HTTP
Streaming HDF5, auto-downsampling
Any viz client: Amira, LCA Vision, OpenDX
Change any steerable parameter:
– Parameters
– Physics, algorithms
– Performance
Scenarios 22
Remote File Access
Remote data storage: TB data file
Example requests: "grr, psi on timestep 2"; "lapse for r<1, every other point"
HDF5 VFD/GridFTP: clients use a file URL (downsampling, hyperslabbing)
Network monitoring service reports when more bandwidth is available
Simulation data analyzed at NCSA (USA), visualized at AEI (Germany), shown at SC02 (Baltimore)
Scenarios 23
Vision, Part II: Dynamic Distributed Computing
Many new ideas
– Consider: the Grid IS your computer
– Networks, machines, devices come and go
– Dynamic codes, aware of their environment, seek out resources
– Distributed and Grid-based thread parallelism
– Begin to change the way you think about problems: think global, solve much bigger problems
Many old ideas
– 1960's all over again
– How to deal with dynamic processes, processor management, memory hierarchies, etc.
Make apps able to respond to the changing Grid environment...
Scenarios 24
Code/user/infrastructure should be aware of its environment
– What Grid services are available?
– Discover resources available NOW, and their current state
– What is my allocation?
– What is the bandwidth/latency between sites?
Code/user/infrastructure should be able to make decisions
– A slow part of my simulation can run asynchronously... spawn it off!
– New, more powerful resources just became available... migrate there!
– Machine went down... reconfigure and recover!
– Need more memory (or less!)... get it by adding (dropping) machines!
New Paradigms for Dynamic Grids
Scenarios 25
Code/user/infrastructure should be able to publish to a central server for tracking, monitoring, steering...
– Unexpected event... notify users!
– Collaborators from around the world all connect, examine the simulation
Rethink your algorithms: task farming, vectors, pipelines, etc. all apply on Grids...
The Grid IS your computer!
New Paradigms for Dynamic Grids
Scenarios 26
New Grid Applications: Examples
Intelligent parameter surveys, Monte Carlos: may control other simulations!
Dynamic staging: move to a faster/cheaper/bigger machine ("Grid Worm"); need more memory? Need less?
Multiple universe: create a clone to investigate a steered parameter ("Grid Virus")
Automatic component loading: needs of the process change; discover/load/execute a new calculation component on an appropriate machine
Automatic convergence testing: from initial data, or initiated during the simulation
Look ahead: spawn off and run a coarser resolution to predict the likely future
Spawn independent/asynchronous tasks: send to a cheaper machine; the main simulation carries on
Routine profiling: best machine/queue; choose resolution parameters based on the queue
Dynamic load balancing: inhomogeneous loads, multiple grids
Scenarios 27
Issues Raised by Grid Scenarios
Infrastructure: is it ubiquitous? Is it reliable? Does it work?
Security: how does a user or process authenticate as it moves from site to site? Firewalls? Ports?
Information: how does the user/application get information about the Grid? Need reliable, ubiquitous Grid information services, reachable from a portal, cell phone, or PDA
Data: what is a file? Where does it live? Crazy Grid apps will leave pieces of files all over the world
Tracking: how does the user track the Grid simulation hierarchies?
Scenarios 28
What can be done now?
Some current examples that work now, building blocks for the future:
– Dynamic, adaptive distributed computing: increased scaling from 15% to 70%
– Migration: Cactus Worm
– Spawning
– Task farm/steering combination
If these can be done now, think what you could do tomorrow
We are developing tools to enable such scenarios for any applications
Scenarios 29
Dynamic Adaptive Distributed Computation (T. Dramlitsch, with Argonne/U. Chicago)
SDSC IBM SP: 1024 processors, 5x12x17 = 1020 used
NCSA Origin array: 256+128+128 processors, 5x12x(4+2+2) = 480 used
Connected by an OC-12 line (but only 2.5 MB/sec achieved); GigE within each site: 100 MB/sec
These experiments: Einstein equations (but could be any Cactus application)
Achieved:
– First runs: 15% scaling
– With new techniques: 70-85% scaling, ~250 GF
Won Gordon Bell Prize (Supercomputing 2001, Denver)
Dynamic adaptation: number of ghost zones, compression, ...
Scenarios 30
Adapt:
– 2 ghosts -> 3 ghosts; compression on!
– Automatically load balance at t=0
– Automatically adapt to bandwidth/latency issues
Application has NO KNOWLEDGE of the machine(s) it is on, networks, etc.
Adaptive techniques make NO assumptions about the network
Issues: what if network conditions change faster than the adaptation?
Dynamic Adaptation
Scenarios 31
Cactus simulation (could be anything) starts, launched from a portal (queries a Grid Information Server, finds available resources)
Migrates itself to the next site, according to some criterion
Registers its new location with the GIS, terminates the old simulation
User tracks/steers, using http, streaming data, etc...
Continues around Europe...
If we can do this, much of what we want can be done!
Cactus Worm: Basic Scenario
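The Worm's migrate-if-better loop can be sketched in a few lines. This is a minimal illustration, not the real Worm: the helper functions stand in for a GIIS query and Cactus checkpointing, and the ">20% faster" criterion is an assumed example of "some criterion".

```python
def pick_resource(resources, current):
    """Hypothetical criterion: move only to a machine meaningfully
    (>20%) faster than the one we are running on."""
    best = max(resources, key=lambda r: r["gflops"])
    if best["name"] != current["name"] and best["gflops"] > 1.2 * current["gflops"]:
        return best
    return None

def worm_step(state, current, info_server):
    """One decision point of the Worm: query the information server,
    and if a better resource exists, checkpoint and move there."""
    resources = info_server()            # stand-in for a GIIS query
    target = pick_resource(resources, current)
    if target is not None:
        checkpoint = dict(state)         # serialise state before moving
        return checkpoint, target        # restart from checkpoint at target
    return state, current                # otherwise keep computing here
```

In the real scenario the "resources" come and go as the simulation travels, so the same loop naturally carries it around Europe.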
Scenarios 32
Determining When to Migrate: Contract Monitor
GrADS project activity: Foster, Angulo, Cactus
Establish a "contract", driven by user-controllable parameters:
– Time quantum for "time per iteration"
– % degradation in time per iteration (relative to prior average) before noting a violation
– Number of violations before migration
Potential causes of violation:
– Competing load on the CPU
– Computation requires more processing power: e.g., mesh refinement, new sub-computation
– Hardware problems
– Going too fast! Using too little memory? Why waste a resource?
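The contract described above can be sketched directly: flag a violation when time per iteration degrades beyond a threshold relative to the recent average, and request migration after enough consecutive violations. This is an illustrative sketch, not the actual GrADS code; the window size and thresholds stand in for the user-controllable parameters.

```python
from collections import deque

class ContractMonitor:
    """Sketch of a GrADS-style contract: a violation is noted when
    time-per-iteration exceeds the rolling average by more than
    `max_degradation`; after `max_violations` consecutive violations,
    migration is requested."""

    def __init__(self, window=10, max_degradation=0.30, max_violations=3):
        self.history = deque(maxlen=window)   # recent per-iteration times
        self.max_degradation = max_degradation
        self.max_violations = max_violations
        self.violations = 0

    def record(self, seconds_per_iteration):
        """Record one iteration time; return True if we should migrate."""
        if self.history:
            avg = sum(self.history) / len(self.history)
            if seconds_per_iteration > avg * (1 + self.max_degradation):
                self.violations += 1          # contract violated
            else:
                self.violations = 0           # back within contract
        self.history.append(seconds_per_iteration)
        return self.violations >= self.max_violations
```

With the defaults, a sudden competing load that doubles the iteration time triggers migration on the third consecutive slow iteration, matching the "3 successive contract violations" shown in the migration experiment.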
Scenarios 33
[Figure: iterations/second vs. clock time. Running at UIUC; load applied; after 3 successive contract violations, resource discovery & migration; run resumes at UC (migration time not to scale).]
(Foster, Angulo, GrADS, Cactus Team…)
Migration Experiments: Contract Based
Scenarios 34
Spawning: SC2001 Demo
Black hole collision simulation: every n timesteps, time-consuming analysis tasks are done
– Process output data, find gravitational waves, horizons
– These take much time and do not run well in parallel
Solution: user invokes the "Spawner"; analysis tasks are outsourced
– Globus-enabled resource: resource discovery (only mocked up last year), login, data transfer; remote jobs started up
Main simulation can keep going without pausing (except to spawn, which may itself be time-consuming)
It worked!
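A toy sketch of the Spawner idea: every n timesteps, hand the slow analysis tasks to a pool of workers so the main evolution loop never blocks on them. Here a local thread pool stands in for Globus-discovered remote resources; the class and its interface are illustrative, not the actual Cactus thorn.

```python
from concurrent.futures import ThreadPoolExecutor

class Spawner:
    """Every `every_n` timesteps, outsource analysis tasks (e.g. horizon
    finding, wave extraction) to a worker pool standing in for remote
    Grid resources, while the main simulation continues."""

    def __init__(self, every_n, pool_size=4):
        self.every_n = every_n
        self.pool = ThreadPoolExecutor(max_workers=pool_size)
        self.pending = []

    def maybe_spawn(self, timestep, data, analysis_tasks):
        """Called once per timestep by the main loop; spawns and returns
        immediately, so the evolution never waits on analysis."""
        if timestep % self.every_n == 0:
            for task in analysis_tasks:
                self.pending.append(self.pool.submit(task, data))

    def collect(self):
        """Gather all spawned results (e.g. at the end of the run)."""
        return [f.result() for f in self.pending]
```

The key design point from the slide is preserved: the main simulation only pays the (possibly non-trivial) cost of launching the task, never the cost of running it.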
Scenarios 35
Main Cactus BH Simulation starts here
User only has to invoke “Spawner” thorn…
Spawning on ARG Testbed
All analysis tasks spawned automatically to free resources worldwide
Scenarios 36
Need to run a large simulation with dozens of input parameters
– Selection is trial and error (resolution, boundary, etc.)
Remember the look-ahead scenario? Run at lower resolution to predict the likely outcome!
– Task farm dozens of smaller jobs across the Grid to set initial parameters for the big run
– The task farm manager sends jobs out across resources and collects results
– The lowest-error parameter set is chosen
Main simulation starts with the "best" parameters
If possible, low-resolution jobs with different parameters can be cloned off at various intervals, run for a while, and the results returned to steer coordinate parameters for the next phase of evolution
Task Farming/Steering Combo
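The task-farming step above can be sketched as: run a cheap low-resolution job for every candidate parameter set in parallel, then keep the set with the lowest error. A local pool stands in for Grid resources; the function and its arguments are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def task_farm_best(parameter_sets, run_low_res, workers=8):
    """Farm one low-resolution job per candidate parameter set across
    a worker pool (a stand-in for remote Grid resources), collect the
    error each job reports, and return the lowest-error parameter set."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(run_low_res, parameter_sets))
    best = min(range(len(parameter_sets)), key=errors.__getitem__)
    return parameter_sets[best], errors[best]
```

The winning parameters then seed the big run; the same farm can be re-invoked mid-run on cloned low-resolution jobs to steer later phases of the evolution.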
Scenarios 37
Main Cactus BH simulation starts here
Dozens of low-resolution jobs sent out to test parameters
Data returned for the main job
Huge job generates remote data, visualized in Baltimore
SC2002 Demo
Scenarios 38
Physicist has a new idea! "We see something, but too weak. Please simulate to enhance the signal!"
[Diagram: a Brill wave simulation S spawns sub-simulations S1, S2 on resource pools P1, P2.]
Future Dynamic Grid Computing
Scenarios 39
[Diagram: a simulation roams the Grid.] Found a black hole: load a new component. Look for the horizon. Calculate/output gravitational waves. Calculate/output invariants. Find the best resources. Free CPUs at NCSA, SDSC, RZG, LRZ! Archive data at SDSC. Add more resources. Clone the job with a steered parameter. Queue time over: find a new machine. Further calculations at AEI. Archive to the LIGO experiment.
Future Dynamic Grid Computing
Scenarios 40
Grid Application Toolkit (GAT)
Application developers should be able to build simulations with tools that easily enable dynamic Grid capabilities
Want to build a programming API to easily allow:
– Query information server (e.g. GIIS): what's available for me? What software? How many processors?
– Network monitoring
– Decision routines: how to decide? Cost? Reliability? Size?
– Spawning routines: now start this up over here, and that up over there
– Authentication server: issues commands, moves files on your behalf
– Data transfer: use whatever method is desired (Gsi-ftp, streamed HDF5, scp, ...)
– Etc.: apps themselves find, and become, services
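A hypothetical sketch of the shape such an API might take, wiring together the pieces listed above. All names here are illustrative, not the real GAT bindings: the information server, transfer, and launcher are pluggable callables, mirroring the idea that the toolkit should not care whether the backend is Globus, scp, or streamed HDF5.

```python
class GridApplicationToolkit:
    """Illustrative sketch of a GAT-like API: resource discovery,
    data transfer, and remote job start-up behind one interface."""

    def __init__(self, info_server, transfer, launcher):
        self.info_server = info_server  # e.g. a GIIS query function
        self.transfer = transfer        # e.g. GridFTP / streamed HDF5 / scp
        self.launcher = launcher        # remote job start-up

    def find_resources(self, min_procs=1):
        """Ask the information server what is available for me."""
        return [r for r in self.info_server() if r.get("procs", 0) >= min_procs]

    def spawn(self, task, resource, input_files=()):
        """Stage input files to the chosen resource, then launch the task."""
        for f in input_files:
            self.transfer(f, resource)
        return self.launcher(task, resource)
```

Because every backend is injected, the same application code runs whether the "Grid" is a testbed of Globus-enabled machines or a mocked-up local pool, which is exactly what makes the dynamic scenarios portable.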
Scenarios 41
Grid-Related Projects
GridLab (www.gridlab.org): enabling these scenarios
ASC: Astrophysics Simulation Collaboratory (www.ascportal.org)
– NSF funded (WashU, Rutgers, Argonne, U. Chicago, NCSA)
– Collaboratory tools, Cactus portal
Global Grid Forum (GGF) (www.gridforum.org): Applications Working Group
GrADS: Grid Application Development Software (www.isi.edu/grads)
– NSF funded (Rice, NCSA, U. Illinois, UCSD, U. Chicago, U. Indiana, ...)
TIKSL/GriKSL (www.griksl.org)
– German DFN funded: AEI, ZIB, Garching
– Remote online and offline visualization, remote steering/monitoring
Cactus Team (www.CactusCode.org): dynamic distributed computing...
Scenarios 42
Summary
Grid computing has many promises:
– Making today's computing easier by managing resources better
– Creating an environment for the advanced apps of tomorrow
In order to do this, your applications must be portable!
Rethink your problems for the Grid computer:
– Distributed computing, grid threads, etc.
– Other forms of parallelism: task farming, spawning, migration
– Data access, data mining, etc.: all much more powerful
Next, we'll show you demos of what you can do now, and how to prepare for the future