www.cactuscode.org www.gridlab.org

Scenarios for Grid Applications

Ed Seidel
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
[email protected]
Scenarios 2
PHYSICS TOMORROW Remote
visualization, monitoring and steering
Distributed computing
Grid portals More dynamic
scenarios
Outline
Scenarios 3
Why Grids? (1): eScience
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data
Source: Ian Foster
Scenarios 4
Why Grids? (2): eBusiness
Engineers at a multinational company collaborate on the design of a new product
A multidisciplinary analysis in aerospace couples code and data in four companies
An insurance company mines data from partner hospitals for fraud detection
An application service provider offloads excess load to a compute cycle provider
An enterprise configures internal & external resources to support eBusiness workload
Source: Ian Foster
Scenarios 5
Current Grid App Types
Community driven: serving the needs of distributed communities
– Video conferencing
– Virtual collaborative environments
– Code sharing to "experiencing each other" at a distance...
Data driven: remote access of huge data, data mining
– Weather information systems
– Particle physics
Process/simulation driven: demanding simulations of science and engineering
– Get less attention in the Grid world, yet drive HPC!
– Remote, steered, etc...
Present examples: "simple but very difficult"
Future examples: dynamic, interacting combinations of all types
Scenarios 6
From Telephone Conference Calls to Access Grid International Video Meetings
Access Grid: lead, Argonne. NSF STAR TAP: lead, UIC's Electronic Visualization Lab
Creating a Virtual Global Research Lab
Internet Linked Pianos
New Cyber Arts: Humans Interacting with Virtual Realities
Source: Smarr
Scenarios 7
NSF's EarthScope -- USArray: Explosions of Data! Typical!
High resolution of crust & upper mantle structure
Transportable broadband array
– 400 broadband seismometers
– ~70 km spacing, ~1500 x 1500 km grid
– ~2-year deployments at each site
– Rolling deployment over more than 10 years
Permanent reference network of geodetic-quality GPS receivers
All data to the community in near real time
Bandwidth will be driven by visual analysis in repositories
Real-time simulations work as the data come in
Source: Smarr/Frank Vernon (IGPP SIO, UCSD)
Scenarios 8
Rollout Over 14 Years Starting With Existing Broadband Stations
Source: Smarr
EarthScope Rollout
Scenarios 9
Common Infrastructure
Wide range of scientific disciplines will require a common infrastructure
Common needs, driven by the science/engineering:
– Large number of sensors/instruments: data to the community in real time!
– Daily generation of large data sets
– Growth in computing power: from TB to PB machines
– Experimental data
– Data on multiple length and time scales
– Automatic archiving in distributed repositories
– Large community of end users
– Multi-megapixel and immersive visualization
– Collaborative analysis from multiple sites
– Complex simulations needed to interpret data
Source: Smarr
Scenarios 10
Common Infrastructure
Wide range of scientific disciplines will require a common infrastructure
Some will need optical networks:
– Communications: dedicated lambdas
– Data: large peer-to-peer, lambda-attached storage
Source: Smarr
Scenarios 11
Huge Black Hole Collision Simulation: 3000 frames of volume rendering, TB of data
Scenarios 12
Issues for Complex Simulations
Huge amounts of data needed/generated across different machines
– How to retrieve, track, and manage data across the Grid?
– In this case, we had to fly from Berlin to NCSA and bring the data back on disks!
Many components developed by distributed collaborations
– How to bring communities together?
– How to find/load/execute different components?
Many computational resources available
– How to find the best ones to start? How to distribute work effectively?
Needs of computations change with time!
– How to adapt to changes? How to monitor the system?
How to interact with experiments? Coming!
Scenarios 13
Scale of computations much larger: need Grids to keep up...
– Much larger processes; task farm many more processes (e.g. Monte Carlo)
– Also: how to simply make better use of current resources
Complexity approaching that of Nature
– Simulations of the Universe and its constituents: black holes, neutron stars, supernovae; human genome, human behavior
Teams of computational scientists working together: the future of science and engineering
– Must support efficient, high-level problem description
– Must support collaborative computational science: look at Grand Challenges
– Must support all different languages
Ubiquitous Grid computing
– Resources, applications, etc. replaced by an abstract notion: Grid Services (e.g., OGSA)
– Very dynamic simulations, even deciding their own future
– Apps may find the services themselves: distributed, spawned, etc...
– Must be tolerant of dynamic infrastructure
– Monitored, viz'ed, controlled from anywhere, with colleagues elsewhere
Future View: Much is Here
Scenarios 14
Much already here:
– Scale of computations much larger: need Grids to keep up...
– Much larger processes; task farm many more processes (e.g. Monte Carlo)
– Also: how to simply make better use of current resources
– Complexity approaching that of Nature: simulations of the Universe and its constituents (black holes, neutron stars, supernovae; human genome, human behavior; climate modeling, environmental effects)
Future View
Scenarios 15
Future View
Teams of computational scientists working together: the future of science and engineering
– Must support efficient, high-level problem description
– Must support collaborative computational science: Grand Challenges
– Must support all different languages
Ubiquitous Grid computing
– Resources, applications, etc. replaced by an abstract notion: Grid Services (e.g., OGSA)
– Very dynamic simulations, even deciding their own future
– Apps may find services themselves: distributed, spawned, etc...
– Must be tolerant of dynamic infrastructure
– Monitor, viz, control from anywhere, with colleagues elsewhere
Scenarios 16
SC93 - SC2000, typical scenario:
– Find remote resource (often using multiple computers)
– Launch job (usually static, tightly coupled)
– Visualize results (usually in-line, fixed)
Need to go far beyond this
– Make it much, much easier: portals, Globus, standards
– Make it much more dynamic, adaptive, fault tolerant
– Migrate this technology to the general user
Metacomputing Einstein's Equations: connecting T3E's in Berlin, Garching, SDSC
Grid Applications So Far
Scenarios 17
Cactus Computational Toolkit: science, Autopilot, AMR, PETSc, HDF, MPI, GrACE, Globus, remote steering
1. User has science idea...
2. Composes/builds code components with interface...
3. Selects appropriate resources...
4. Steers simulation, monitors performance...
5. Collaborators log in to monitor...
Want to integrate and migrate this technology to the generic user...
The Vision, Part I: "ASC"
Scenarios 18
Start simple: sit here, compute there
Accounts for one user (real case): berte.zib.de denali.mcs.anl.gov golden.sdsc.edu gseaborg.nersc.gov harpo.wustl.edu horizon.npaci.edu loslobos.alliance.unm.edu mcurie.nersc.gov modi4.ncsa.uiuc.edu ntsc1.ncsa.uiuc.edu origin.aei-potsdam.mpg.de pc.rzg.mpg.de pitcairn.mcs.anl.gov quad.mcs.anl.gov rr.alliance.unm.edu sr8000.lrz-muenchen.de
16 machines, 6 different usernames, 16 passwords, ...
This is hard, but it gets much worse from here…
Portals provide:
– Single access to all resources
– Locate/build executables
– Central/collaborative parameter files
– Job submission/tracking
– Access to new Grid technologies
Use any web browser!
Scenarios 19
Start simple: sit here, compute there
Scenarios 20
Distributed Computation: Harnessing Multiple Computers
Why would anyone want to do this?
– Capacity: computers can't keep up with needs
– Throughput
Issues:
– Bandwidth (increasing faster than computation)
– Latency
– Communication needs, topology
– Communication/computation ratio
Techniques to be developed:
– Overlapping communication and computation
– Extra ghost zones to reduce latency
– Compression
– Algorithms to do this for the scientist
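The latency trade-off behind "extra ghost zones" can be captured in a back-of-the-envelope cost model. The sketch below is illustrative (the function and its parameters are not from Cactus): with g ghost layers, neighbouring machines only synchronise every g update steps, so a high-latency wide-area link is hit far less often, at the price of larger messages.

```python
import math

def exchange_cost(steps, ghosts, layer_bytes, latency_s, bandwidth_Bps):
    """Estimate total communication time for a stencil run that keeps
    `ghosts` ghost layers: neighbours are synchronised only every
    `ghosts` steps, trading fewer (latency-bound) messages for larger ones."""
    rounds = math.ceil(steps / ghosts)      # one exchange per `ghosts` steps
    bytes_per_round = ghosts * layer_bytes  # deeper halo => bigger message
    return rounds * (latency_s + bytes_per_round / bandwidth_Bps)
```

On a slow transatlantic link like the ~2.5 MB/sec mentioned later in the talk, three ghost layers beat one, because the message count falls faster than the message size grows.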
Scenarios 21
Remote Visualization & Steering
Remote viz data served over HTTP
Streaming HDF5, auto-downsampling
Any viz client: Amira, LCA Vision, OpenDX
Change any steerable parameter:
– Parameters
– Physics, algorithms
– Performance
Scenarios 22
Remote File Access
Remote data storage: TB data file
Example requests: "grr, psi on timestep 2"; "lapse for r<1, every other point"
HDF5 VFD/GridFTP: clients use a file URL (downsampling, hyperslabbing)
Network monitoring service reports when more bandwidth is available
Simulation data analyzed at NCSA (USA), visualized at AEI (Germany), shown at SC02 (Baltimore)
Scenarios 23
Vision, Part II: Dynamic Distributed Computing
Many new ideas
– Consider: the Grid IS your computer
– Networks, machines, devices come and go
– Dynamic codes, aware of their environment, seek out resources
– Distributed and Grid-based thread parallelism
– Begin to change the way you think about problems: think global, solve much bigger problems
Many old ideas
– 1960's all over again
– How to deal with dynamic processes, processor management, memory hierarchies, etc.
Make apps able to respond to the changing Grid environment...
Scenarios 24
Code/user/infrastructure should be aware of its environment
– What Grid services are available?
– Discover resources available NOW, and their current state
– What is my allocation?
– What is the bandwidth/latency between sites?
Code/user/infrastructure should be able to make decisions
– A slow part of my simulation can run asynchronously... spawn it off!
– New, more powerful resources just became available... migrate there!
– Machine went down... reconfigure and recover!
– Need more memory (or less!)... get it by adding (dropping) machines!
New Paradigms for Dynamic Grids
Scenarios 25
Code/user/infrastructure should be able to publish to a central server for tracking, monitoring, steering...
– Unexpected event... notify users!
– Collaborators from around the world all connect, examine the simulation
Rethink your algorithms: task farming, vectors, pipelines, etc. all apply on Grids...
The Grid IS your computer!
New Paradigms for Dynamic Grids
Scenarios 26
New Grid Applications: Examples
Intelligent parameter surveys, Monte Carlos: may control other simulations!
Dynamic staging: move to a faster/cheaper/bigger machine ("Grid Worm"); need more memory? Need less?
Multiple universe: create a clone to investigate a steered parameter ("Grid Virus")
Automatic component loading: needs of the process change; discover/load/execute a new calculation component on an appropriate machine
Automatic convergence testing: from initial data, or initiated during the simulation
Look ahead: spawn off and run a coarser resolution to predict the likely future
Spawn independent/asynchronous tasks: send to a cheaper machine; the main simulation carries on
Routine profiling: best machine/queue; choose resolution parameters based on the queue
Dynamic load balancing: inhomogeneous loads, multiple grids
Scenarios 27
Issues Raised by Grid Scenarios
Infrastructure: is it ubiquitous? Is it reliable? Does it work?
Security: how does a user or process authenticate as it moves from site to site? Firewalls? Ports?
Information: how does the user/application get information about the Grid? Need reliable, ubiquitous Grid information services, reachable from a portal, cell phone, or PDA
Data: what is a file? Where does it live? Crazy Grid apps will leave pieces of files all over the world
Tracking: how does the user track the Grid simulation hierarchies?
Scenarios 28
What can be done now?
Some current examples that work now, building blocks for the future:
– Dynamic, adaptive distributed computing: increased scaling from 15% to 70%
– Migration: Cactus Worm
– Spawning
– Task farm/steering combination
If these can be done now, think what you could do tomorrow
We are developing tools to enable such scenarios for any applications
Scenarios 29
Dynamic Adaptive Distributed Computation (T. Dramlitsch, with Argonne/U. Chicago)
SDSC IBM SP: 1024 processors, 5x12x17 = 1020 used
NCSA Origin array: 256+128+128 processors, 5x12x(4+2+2) = 480 used
Connected by an OC-12 line (but only 2.5 MB/sec achieved); GigE within each site: 100 MB/sec
These experiments: Einstein equations (but could be any Cactus application)
Achieved:
– First runs: 15% scaling
– With new techniques: 70-85% scaling, ~250 GF
Won Gordon Bell Prize (Supercomputing 2001, Denver)
Dynamic adaptation: number of ghost zones, compression, ...
Scenarios 30
Adapt:
– 2 ghosts -> 3 ghosts; compression on!
– Automatically load balance at t=0
– Automatically adapt to bandwidth/latency issues
Application has NO KNOWLEDGE of the machine(s) it is on, networks, etc.
Adaptive techniques make NO assumptions about the network
Issues: what if network conditions change faster than the adaptation?
Dynamic Adaptation
Scenarios 31
Cactus simulation (could be anything) starts, launched from a portal (queries a Grid Information Server, finds available resources)
Migrates itself to the next site, according to some criterion
Registers its new location with the GIS, terminates the old simulation
User tracks/steers, using http, streaming data, etc...
Continues around Europe...
If we can do this, much of what we want can be done!
Cactus Worm: Basic Scenario
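The Worm's migrate-if-better loop can be sketched in a few lines. This is a minimal illustration, not the real Worm: the helper functions stand in for a GIIS query and Cactus checkpointing, and the ">20% faster" criterion is an assumed example of "some criterion".

```python
def pick_resource(resources, current):
    """Hypothetical criterion: move only to a machine meaningfully
    (>20%) faster than the one we are running on."""
    best = max(resources, key=lambda r: r["gflops"])
    if best["name"] != current["name"] and best["gflops"] > 1.2 * current["gflops"]:
        return best
    return None

def worm_step(state, current, info_server):
    """One decision point of the Worm: query the information server,
    and if a better resource exists, checkpoint and move there."""
    resources = info_server()            # stand-in for a GIIS query
    target = pick_resource(resources, current)
    if target is not None:
        checkpoint = dict(state)         # serialise state before moving
        return checkpoint, target        # restart from checkpoint at target
    return state, current                # otherwise keep computing here
```

In the real scenario the "resources" come and go as the simulation travels, so the same loop naturally carries it around Europe.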
Scenarios 32
Determining When to Migrate: Contract Monitor
GrADS project activity: Foster, Angulo, Cactus
Establish a "contract", driven by user-controllable parameters:
– Time quantum for "time per iteration"
– % degradation in time per iteration (relative to prior average) before noting a violation
– Number of violations before migration
Potential causes of violation:
– Competing load on the CPU
– Computation requires more processing power: e.g., mesh refinement, new sub-computation
– Hardware problems
– Going too fast! Using too little memory? Why waste a resource?
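The contract described above can be sketched directly: flag a violation when time per iteration degrades beyond a threshold relative to the recent average, and request migration after enough consecutive violations. This is an illustrative sketch, not the actual GrADS code; the window size and thresholds stand in for the user-controllable parameters.

```python
from collections import deque

class ContractMonitor:
    """Sketch of a GrADS-style contract: a violation is noted when
    time-per-iteration exceeds the rolling average by more than
    `max_degradation`; after `max_violations` consecutive violations,
    migration is requested."""

    def __init__(self, window=10, max_degradation=0.30, max_violations=3):
        self.history = deque(maxlen=window)   # recent per-iteration times
        self.max_degradation = max_degradation
        self.max_violations = max_violations
        self.violations = 0

    def record(self, seconds_per_iteration):
        """Record one iteration time; return True if we should migrate."""
        if self.history:
            avg = sum(self.history) / len(self.history)
            if seconds_per_iteration > avg * (1 + self.max_degradation):
                self.violations += 1          # contract violated
            else:
                self.violations = 0           # back within contract
        self.history.append(seconds_per_iteration)
        return self.violations >= self.max_violations
```

With the defaults, a sudden competing load that doubles the iteration time triggers migration on the third consecutive slow iteration, matching the "3 successive contract violations" shown in the migration experiment.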
Scenarios 33
[Figure: iterations/second vs. clock time. Running at UIUC; load applied; after 3 successive contract violations, resource discovery & migration; run resumes at UC (migration time not to scale).]
(Foster, Angulo, GrADS, Cactus Team…)
Migration Experiments: Contract Based
Scenarios 34
Spawning: SC2001 Demo
Black hole collision simulation: every n timesteps, time-consuming analysis tasks are done
– Process output data, find gravitational waves, horizons
– These take much time and do not run well in parallel
Solution: user invokes the "Spawner"; analysis tasks are outsourced
– Globus-enabled resource: resource discovery (only mocked up last year), login, data transfer; remote jobs started up
Main simulation can keep going without pausing (except to spawn, which may itself be time-consuming)
It worked!
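A toy sketch of the Spawner idea: every n timesteps, hand the slow analysis tasks to a pool of workers so the main evolution loop never blocks on them. Here a local thread pool stands in for Globus-discovered remote resources; the class and its interface are illustrative, not the actual Cactus thorn.

```python
from concurrent.futures import ThreadPoolExecutor

class Spawner:
    """Every `every_n` timesteps, outsource analysis tasks (e.g. horizon
    finding, wave extraction) to a worker pool standing in for remote
    Grid resources, while the main simulation continues."""

    def __init__(self, every_n, pool_size=4):
        self.every_n = every_n
        self.pool = ThreadPoolExecutor(max_workers=pool_size)
        self.pending = []

    def maybe_spawn(self, timestep, data, analysis_tasks):
        """Called once per timestep by the main loop; spawns and returns
        immediately, so the evolution never waits on analysis."""
        if timestep % self.every_n == 0:
            for task in analysis_tasks:
                self.pending.append(self.pool.submit(task, data))

    def collect(self):
        """Gather all spawned results (e.g. at the end of the run)."""
        return [f.result() for f in self.pending]
```

The key design point from the slide is preserved: the main simulation only pays the (possibly non-trivial) cost of launching the task, never the cost of running it.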
Scenarios 35
Main Cactus BH Simulation starts here
User only has to invoke “Spawner” thorn…
Spawning on ARG Testbed
All analysis tasks spawned automatically to free resources worldwide
Scenarios 36
Need to run a large simulation with dozens of input parameters
– Selection is trial and error (resolution, boundary, etc.)
Remember the look-ahead scenario? Run at lower resolution to predict the likely outcome!
– Task farm dozens of smaller jobs across the Grid to set initial parameters for the big run
– The task farm manager sends jobs out across resources and collects results
– The lowest-error parameter set is chosen
Main simulation starts with the "best" parameters
If possible, low-resolution jobs with different parameters can be cloned off at various intervals, run for a while, and the results returned to steer coordinate parameters for the next phase of evolution
Task Farming/Steering Combo
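The task-farming step above can be sketched as: run a cheap low-resolution job for every candidate parameter set in parallel, then keep the set with the lowest error. A local pool stands in for Grid resources; the function and its arguments are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def task_farm_best(parameter_sets, run_low_res, workers=8):
    """Farm one low-resolution job per candidate parameter set across
    a worker pool (a stand-in for remote Grid resources), collect the
    error each job reports, and return the lowest-error parameter set."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(run_low_res, parameter_sets))
    best = min(range(len(parameter_sets)), key=errors.__getitem__)
    return parameter_sets[best], errors[best]
```

The winning parameters then seed the big run; the same farm can be re-invoked mid-run on cloned low-resolution jobs to steer later phases of the evolution.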
Scenarios 37
Main Cactus BH simulation starts here
Dozens of low-resolution jobs sent out to test parameters
Data returned for the main job
Huge job generates remote data, visualized in Baltimore
SC2002 Demo
Scenarios 38
Physicist has a new idea! "We see something, but too weak. Please simulate to enhance the signal!"
[Diagram: a Brill wave simulation S spawns sub-simulations S1, S2 on resource pools P1, P2.]
Future Dynamic Grid Computing
Scenarios 39
[Diagram: a simulation roams the Grid.] Found a black hole: load a new component. Look for the horizon. Calculate/output gravitational waves. Calculate/output invariants. Find the best resources. Free CPUs at NCSA, SDSC, RZG, LRZ! Archive data at SDSC. Add more resources. Clone the job with a steered parameter. Queue time over: find a new machine. Further calculations at AEI. Archive to the LIGO experiment.
Future Dynamic Grid Computing
Scenarios 40
Grid Application Toolkit (GAT)
Application developers should be able to build simulations with tools that easily enable dynamic Grid capabilities
Want to build a programming API to easily allow:
– Query information server (e.g. GIIS): what's available for me? What software? How many processors?
– Network monitoring
– Decision routines: how to decide? Cost? Reliability? Size?
– Spawning routines: now start this up over here, and that up over there
– Authentication server: issues commands, moves files on your behalf
– Data transfer: use whatever method is desired (Gsi-ftp, streamed HDF5, scp, ...)
– Etc.: apps themselves find, and become, services
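A hypothetical sketch of the shape such an API might take, wiring together the pieces listed above. All names here are illustrative, not the real GAT bindings: the information server, transfer, and launcher are pluggable callables, mirroring the idea that the toolkit should not care whether the backend is Globus, scp, or streamed HDF5.

```python
class GridApplicationToolkit:
    """Illustrative sketch of a GAT-like API: resource discovery,
    data transfer, and remote job start-up behind one interface."""

    def __init__(self, info_server, transfer, launcher):
        self.info_server = info_server  # e.g. a GIIS query function
        self.transfer = transfer        # e.g. GridFTP / streamed HDF5 / scp
        self.launcher = launcher        # remote job start-up

    def find_resources(self, min_procs=1):
        """Ask the information server what is available for me."""
        return [r for r in self.info_server() if r.get("procs", 0) >= min_procs]

    def spawn(self, task, resource, input_files=()):
        """Stage input files to the chosen resource, then launch the task."""
        for f in input_files:
            self.transfer(f, resource)
        return self.launcher(task, resource)
```

Because every backend is injected, the same application code runs whether the "Grid" is a testbed of Globus-enabled machines or a mocked-up local pool, which is exactly what makes the dynamic scenarios portable.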
Scenarios 41
Grid-Related Projects
GridLab (www.gridlab.org): enabling these scenarios
ASC: Astrophysics Simulation Collaboratory (www.ascportal.org)
– NSF funded (WashU, Rutgers, Argonne, U. Chicago, NCSA)
– Collaboratory tools, Cactus portal
Global Grid Forum (GGF) (www.gridforum.org): Applications Working Group
GrADS: Grid Application Development Software (www.isi.edu/grads)
– NSF funded (Rice, NCSA, U. Illinois, UCSD, U. Chicago, U. Indiana, ...)
TIKSL/GriKSL (www.griksl.org)
– German DFN funded: AEI, ZIB, Garching
– Remote online and offline visualization, remote steering/monitoring
Cactus Team (www.CactusCode.org): dynamic distributed computing...
Scenarios 42
Summary
Grid computing has many promises:
– Making today's computing easier by managing resources better
– Creating an environment for the advanced apps of tomorrow
In order to do this, your applications must be portable!
Rethink your problems for the Grid computer:
– Distributed computing, grid threads, etc.
– Other forms of parallelism: task farming, spawning, migration
– Data access, data mining, etc.: all much more powerful
Next, we'll show you demos of what you can do now, and how to prepare for the future