TRANSCRIPT
Grid Computing in Numerical Relativity and Astrophysics
Gabrielle Allen: [email protected]
Computer Science & Physics
Center for Computation & Technology (CCT)
Louisiana State University
Challenge Problems
• Cosmology
• Black Hole and Neutron Star Models
• Supernovae
• Astronomical Databases
• Gravitational Wave Data Analysis
• Drive HEC & Grids
Gravitational Wave Physics
[Diagram: cycle linking Observations and Models through Complex Simulations and Analysis & Insight]
Computational Science Needs
• Requires an incredible mix of technologies & expertise!
• Many scientific/engineering components
  – Physics, astrophysics, CFD, engineering, ...
• Many numerical algorithm components
  – Finite difference? Finite volume? Finite elements?
  – Elliptic equations: multigrid, Krylov subspace, ...
  – Mesh refinement
• Many different computational components
  – Parallelism (HPF, MPI, PVM, ???)
  – Multipatch
  – Architecture (MPP, DSM, Vector, PC Clusters, FPGA, ???)
  – I/O (generate TBs/simulation, checkpointing…)
  – Visualization of all that comes out!
• New technologies
  – Grid computing
  – Steering, data archives
• Such work cuts across many disciplines and areas of CS…
Cactus Code
• Freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations
• Developed for numerical relativity, but now a general framework for parallel computing (CFD, astrophysics, climate modeling, chemical engineering, quantum gravity, …)
• Finite difference, adaptive mesh refinement (Carpet, Samrai, Grace), adding FE/FV, multipatch
• Active user and developer communities; main development now at LSU and AEI
• Open source, documentation, etc.
• Cactus modules (thorns) for numerical relativity.
• Many additional thorns available from other groups (AEI, CCT, …)
• Agree on some basic principles (e.g. names of variables) and then can share evolution, analysis, etc.
• Can choose whether or not to use e.g. gauge choice, macros, masks, matter coupling, conformal factor
• Over 100 relativity papers & 30 student theses: a production research code
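The "agree on names of variables" principle works through Cactus's configuration language: each thorn declares which interface it implements and which grid functions it provides, so independently written thorns can operate on the same data. A simplified sketch of such a declaration (not the actual ADMBase file; names and attributes abbreviated):

```
# interface.ccl -- simplified sketch, not the real ADMBase declarations
implements: ADMBase

real metric type = GF timelevels = 3
{
  gxx gxy gxz gyy gyz gzz
} "ADM 3-metric components"
```

Any evolution or analysis thorn that inherits from this interface can then read or write gxx, gxy, … by name, which is what lets evolution, initial data and analysis thorns from different groups be mixed freely.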
Cactus Einstein
[Diagram: thorn arrangement — Evolve: ADM, EvolSimple; Analysis: ADMAnalysis, ADMConstraints, AHFinder, Extract, PsiKadelia, TimeGeodesic; InitialData: IDAnalyticBH, IDAxiBrillBH, IDBrillData, IDLinearWaves, IDSimple; Gauge Conditions: CoordGauge, Maximal; Infrastructure: SpaceMask, ADMCoupling, ADMMacros, StaticConformal, ADMBase]
Grand Challenge Collaborations

NSF Black Hole Grand Challenge
• 8 US institutions
• 5 years
• Attack the colliding black hole problem

NASA Neutron Star Grand Challenge
• 5 US sites
• 3 years
• Colliding neutron star problem

EU Astrophysics Network
• 10 EU sites
• 3 years
• Continuing these problems

Examples of the Future of Science & Engineering
• Require large-scale simulations, beyond the reach of any machine
• Require large geo-distributed cross-disciplinary collaborations
• Require Grid technologies, but not yet using them!
New Paradigm: Grid Computing
• Computational resources across the world
  – Compute servers (double every 18 months)
  – File servers
  – Networks (double every 9 months)
  – Playstations, cell phones, etc…
• Grid computing integrates communities and resources
• How to take advantage of this for scientific simulations?
  – Harness multiple sites and devices
  – Models with a new level of complexity and scale, interacting with data
  – New possibilities for collaboration and advanced scenarios
NLR and Louisiana Optical Network (LONI)

State initiative ($40M) to support research:
• 40 Gbps optical network
• Connects 7 sites
• Grid resources (IBM P5) at sites
• LIGO/CAMD

New possibilities:
• Dynamic provisioning and scheduling of network bandwidth
• Network-dependent scenarios
• "EnLIGHTened" Computing (NSF)
Current Grid Application Types
• Community Driven
  – Distributed communities share resources
  – Video conferencing
  – Virtual collaborative environments
• Data Driven
  – Remote access of huge data, data mining
  – E.g. gravitational wave analysis, particle physics, astronomy
• Process/Simulation Driven
  – Demanding simulations of science and engineering
  – Task farming, resource brokering, distributed computations, workflow
• Remote visualization, steering and interaction, etc…

Typical scenario:
• Find remote resources (task farm, distribute)
• Launch jobs (static)
• Visualize, collect results

Prototypes and demos; need to move to:
• Fault tolerance
• Robustness
• Scaling
• Easy to use
• Complete solutions
New Paradigms for Dynamic Grids
• Addressing large, complex, multidisciplinary problems with collaborative teams of varied researchers...
• Code/User/Infrastructure should be aware of the environment
  – Discover and monitor resources available NOW
  – What is my allocation on these resources?
  – What is the bandwidth/latency?
• Code/User/Infrastructure should make decisions
  – Slow part of the simulation can run independently… spawn it off!
  – New powerful resources just became available… migrate there!
  – Machine went down… reconfigure and recover!
  – Need more memory (or less!)? Get it by adding (dropping) machines!
• Dynamically provision and use new high-end resources and networks
Future Dynamic Grid Computing
[Diagram: scenario callouts — LIGO experiment: "We see something, but too weak. Please simulate to enhance signal!"; Find best resources; Free CPUs!!; sites NCSA, SDSC, RZG, LRZ; Queue time over, find new machine; Add more resources; Clone job with steered parameter; Found a black hole, load new component; Look for horizon; Calculate/output gravitational waves; Calculate/output invariants; Further calculations (AEI); Archive data (SDSC); Archive to LIGO]
New Grid Scenarios
• Intelligent parameter surveys, speculative computing, Monte Carlo
• Dynamic Staging: move to a faster/cheaper/bigger machine
• Multiple Universe: create a clone to investigate a steered parameter
• Automatic Component Loading: needs of the process change; discover/load/execute a new calculation component on an appropriate machine
• Automatic Convergence Testing
• Look Ahead: spawn off and run a coarser resolution to predict the likely future
• Spawn Independent/Asynchronous Tasks: send to a cheaper machine; the main simulation carries on
• Routine Profiling: best machine/queue; choose resolution parameters based on queue
• Dynamic Load Balancing: inhomogeneous loads, multiple grids
• Inject dynamically acquired data
But … Need Grid Apps and Programming Tools
• Need application programming tools for Grid environments
  – Frameworks for developing Grid applications
  – Toolkits providing Grid functionality
  – Grid debuggers and profilers
  – Robust, dependable, flexible Grid tools
• Challenging CS problems:
  – Missing or immature grid services
  – Changing environment
  – Different and evolving interfaces to the "grid"
  – Interfaces are not simple for scientific application developers
• Application developers need easy, robust and dependable tools
GridLab Project
• EU 5th Framework ($7M)
• Partners in Europe and US
  – PSNC (Poland), AEI & ZIB (Germany), VU (Netherlands), MASARYK (Czech), SZTAKI (Hungary), ISUFI (Italy), Cardiff (UK), NTUA (Greece), Chicago, ISI & Wisconsin (US), Sun, Compaq/HP, LSU
• Application and testbed oriented (Cactus + Triana)
  – Numerical relativity
  – Dynamic use of grids
• Main goal: develop an application programming environment for the Grid
www.gridlab.org
Grid Application Toolkit (GAT)
• Abstract programming interface between applications and Grid services
• Designed for applications (move file, run remote task, migrate, write to remote file)
• Led to the GGF Simple API for Grid Applications
• Main result from the GridLab project
www.gridlab.org/GAT
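The abstraction idea behind GAT can be illustrated with a toy adaptor pattern: the application codes against one interface, and the toolkit tries whatever middleware backends are actually present. This is a sketch of the pattern only; all names and the API shape here are hypothetical, not the real GAT calls.

```python
def gridftp_copy(src, dst):
    # would use a GridFTP service; in this sketch it is simply unavailable
    raise NotImplementedError("no GridFTP service reachable")

def local_copy(src, dst):
    # trivial fallback backend
    return "copied %s -> %s locally" % (src, dst)

class GridFileAPI:
    """Applications code against one interface; 'adaptors' bind it to
    whatever middleware is actually present (all names hypothetical)."""
    def __init__(self, adaptors):
        self.adaptors = adaptors
    def copy(self, src, dst):
        for adaptor in self.adaptors:   # try each backend in turn
            try:
                return adaptor(src, dst)
            except NotImplementedError:
                continue
        raise RuntimeError("no adaptor could copy the file")

api = GridFileAPI([gridftp_copy, local_copy])
msg = api.copy("gw.h5", "archive/gw.h5")   # falls through to local_copy
```

The point of the design is that the application never changes when the grid environment does: only the list of adaptors differs between sites.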
Distributed Computation
• Issues
  – Bandwidth (increasing faster than CPU)
  – Latency
  – Communication needs, topology
  – Communication/computation ratio
• Techniques to be developed
  – Overlapping communication and computation
  – Extra ghost zones to reduce latency
  – Compression
  – Algorithms to do this for the scientist
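The "extra ghost zones" technique trades bandwidth for latency: with g ghost layers, a 3-point stencil can take g steps between halo exchanges and the owned points stay exact, so fewer, larger messages cross the slow wide-area link. A minimal single-process sketch of the bookkeeping (pure Python, hypothetical 1D diffusion update; a real code would exchange via MPI):

```python
def step(u):
    """One explicit 3-point-stencil update; endpoints are held fixed."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = u[i] + 0.1 * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return v

def run_single(u0, nsteps):
    """Reference: evolve the whole domain in one piece."""
    u = list(u0)
    for _ in range(nsteps):
        u = step(u)
    return u

def run_decomposed(u0, nsteps, g):
    """Two subdomains with g ghost points each, exchanging every g steps."""
    n = len(u0) // 2
    left = list(u0[:n + g])     # owns indices 0..n-1, ghosts n..n+g-1
    right = list(u0[n - g:])    # ghosts 0..g-1, owns the rest
    for _ in range(nsteps // g):
        # halo exchange: refill ghost zones from the neighbour's owned points
        left[n:n + g] = right[g:2 * g]
        right[0:g] = left[n - g:n]
        # g ghost layers tolerate g steps before the next exchange is needed
        for _ in range(g):
            left = step(left)
            right = step(right)
    return left[:n] + right[g:]  # stitch the owned regions back together
```

Staleness creeps inward one point per step from the subdomain edge, so after g steps it has only reached the ghost zone, never the owned points; the decomposed run reproduces the single-domain run exactly.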
Harnessing Multiple Computers

Why do this?
• Capacity: computers can't keep up with needs
• Throughput: combine resources

[Diagram: SDSC IBM SP, 1024 procs, 5x12x17 = 1020; NCSA Origin Array, 256+128+128 procs, 5x12x(4+2+2) = 480; connected by an OC-12 line (but only 2.5 MB/sec); GigE within each site: 100 MB/sec]

Cactus + MPICH-G2
• Communications dynamically adapt to the application and environment
• Any Cactus application
• Scaling: 15% -> 85%
• "Gordon Bell Prize" (with U. Chicago/Northern, Supercomputing 2001, Denver)

Dynamic Adaptive Distributed Computation
Remote Viz & Steering
• HTTP
• Streaming HDF5, auto-downsample
• Any viz client: LCA Vision, OpenDX
• Changing steerable parameters
  – Physics, algorithms
  – Performance
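The steering loop amounts to the simulation re-reading its steerable parameters between iterations, while an HTTP handler mutates them safely from outside. A minimal sketch of that handshake (the parameter name and policy here are hypothetical, not Cactus's actual steering interface):

```python
import threading

steerable = {"courant_factor": 0.25}   # hypothetical steerable parameter
lock = threading.Lock()

def set_parameter(name, value):
    """What an HTTP steering handler would call on a POST request."""
    with lock:
        steerable[name] = value

def evolve_step(dt_base):
    """Each iteration re-reads the current parameter values."""
    with lock:
        return dt_base * steerable["courant_factor"]

dt1 = evolve_step(1.0)
set_parameter("courant_factor", 0.5)   # user steers mid-run
dt2 = evolve_step(1.0)                 # next step picks up the new value
```

The lock is the whole trick: the web thread and the evolution loop never see a half-updated parameter set.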
Cactus Worm (SC2000)
• Cactus simulation starts, launched from portal
• Migrates itself to another site
  – Grid technologies
• Registers new location
• User tracks/steers, using HTTP, streaming data, etc…
• Continues around Europe…
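The Worm's cycle is essentially checkpoint, move, restart, re-register. A toy version of that loop (JSON stands in for the real checkpoint format, and the "sites" are just labels here; the real Worm moved the checkpoint file between hosts via grid services):

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Write the full simulation state to a restartable file."""
    with open(path, "w") as f:
        json.dump(state, f)

def restore(path):
    """Rebuild the simulation state from a checkpoint file."""
    with open(path) as f:
        return json.load(f)

# toy migration cycle: evolve, checkpoint, "move" to the next site, resume
sites = ["siteA", "siteB", "siteC"]            # hypothetical hosts
path = os.path.join(tempfile.gettempdir(), "worm.ckpt")
state = {"step": 0, "data": 1.0}
for site in sites:
    for _ in range(5):                         # a few evolution steps per site
        state["step"] += 1
        state["data"] *= 1.1
    checkpoint(state, path)                    # in the real Worm the file moves to `site`
    state = restore(path)                      # the restarted job re-registers its location
```

Because everything the simulation needs lives in the checkpoint, the restarted job is indistinguishable from the original, wherever it lands.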
Task Spawning (SC2001)
• User only has to invoke the Cactus "Spawner" thorn…
• Appropriate analysis tasks spawned automatically to free resources worldwide
• "Spawner" thorn automatically prepares analysis tasks for spawning
• Grid technologies find resources, manage tasks, collect data
• Intelligence to decide when to spawn
• SC2001: resources of the GGTC testbed
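The spawning idea can be sketched with a worker pool standing in for remote grid resources: the main evolution loop hands analysis work off and carries on without blocking. The spawning policy and task below are hypothetical stand-ins, not the Spawner thorn's actual logic:

```python
from concurrent.futures import ThreadPoolExecutor

def find_apparent_horizon(ckpt):
    # stand-in for an expensive analysis task run on a remote resource
    return ("horizon", ckpt)

class Spawner:
    """Toy model: the evolution loop hands analysis work to a pool of
    workers and carries on; results are collected at the end."""
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = []
    def maybe_spawn(self, step, state):
        if step % 10 == 0:                 # hypothetical spawning policy
            self.pending.append(self.pool.submit(find_apparent_horizon, state))
    def collect(self):
        return [f.result() for f in self.pending]

sp = Spawner()
for step in range(30):                     # main loop is never blocked
    sp.maybe_spawn(step, "checkpoint-%d" % step)
results = sp.collect()                     # horizon results from steps 0, 10, 20
```

The "intelligence to decide when to spawn" lives in maybe_spawn; in the real system that decision also weighs where free resources currently are.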
Global Grid Testbed Collaboration, Supercomputing 2001
• Main Cactus BH simulation starts here; black hole simulations spawned apparent-horizon-finding tasks across the grid
• 5 continents and over 14 countries
• Around 70 machines, 7500+ processors
• Many hardware types, including PS2, IA32, IA64, MIPS
• Many OSs, including Linux, Irix, AIX, OSF, Tru64, Solaris, Hitachi
• Many organizations: DOE, NSF, MPG, universities, vendors
• All ran the same Grid infrastructure, used for different applications
• Prizes for most heterogeneous and most distributed testbed
Black Hole Task Farming (SC2002)
• Main Cactus BH simulation started in California
• Dozens of low-resolution jobs test the corotation parameter
• Error measure returned
• Black hole server controls tasks and steers the main job
• Huge job generates remote data, visualized in Baltimore
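The farming loop above reduces to: farm out cheap trial jobs over the parameter, collect an error measure from each, steer the main job toward the best candidate. A minimal sketch of that control flow (the error function and the value 0.7 are invented for illustration):

```python
def trial_run(omega):
    """Hypothetical cheap low-resolution job returning an error measure."""
    return (omega - 0.7) ** 2          # pretend 0.7 is the best corotation value

# the "black hole server" farms out candidate corotation parameters,
# collects the error measures, and steers the main job toward the best one
candidates = [i / 10 for i in range(11)]
errors = {w: trial_run(w) for w in candidates}
best = min(errors, key=errors.get)     # parameter value to steer the main job to
```

In the SC2002 demonstration the trial jobs ran on remote grid resources and the error measures were returned to the controlling server; the selection logic is the same.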
Job Migration: GridLab demonstration, SC2003
Notification and Information
[Diagram: GridSphere Portal connects users to "The Grid" via an SMS server, mail server, IM server and replica catalog, holding user details, notification preferences and simulation information]
"Grid-enabled" Gravitational Physics
• Adaptive, intelligent simulation codes able to adapt to environment
• Simulation data stored across geographically distributed spaces
  – Organization, access, mining issues
  – Analysis of federated data sets by virtual organizations
• Data analysis of LIGO, GEO, LISA signals
  – Interacting with simulation data
  – Managing parameter space/signal analysis
• Now working on domain-specific information and knowledge-based services:
  – Gravitational physics description language
    • Schema for describing, searching, encoding simulation results
    • Automated logging of simulations: reproducibility
  – Notification and data-sharing services to enable collaboration
  – Relativity services
    • Remote servers running e.g. waveform extraction, horizon finding, etc.
    • Connection to publications and information
    • Automated analysis
Credits
• This talk describes work carried out over a number of years by physicists, computer scientists, mathematicians and others in the joint AEI-LSU numerical relativity groups and by colleagues.