AstroGrid-D WP 5: Resource Management for Grid Jobs
Report by:
Rainer Spurzem
(ZAH-ARI)
and T. Brüsemeister, J. Steinacker
AstroGrid-D Meeting April 18, 2023
Meeting 13:10 – 14:30 WG5
Meeting of WG5 and friends: GridWay discussion (together with Ignacio Llorente)
Expected list of topics:
• The present GridWay installation in Heidelberg: solutions and problems. Which use cases work, and how? Demos or screenshots if available.
• How about more than one GridWay installation in AstroGrid-D simultaneously at different sites?
• Cooperation of information system and job submission (in general, or in the special case of our AstroGrid-D information system and GridWay)?
• Miscellaneous (data staging postponed to next session)
GridWay:
• Lightweight metascheduler on top of GT2.4/GT4
• Central server architecture
• Support of the GGF DRMAA standard API for job submission and management
• Simple round-robin/flooding scheduling algorithm, but extensible
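A round-robin policy of the kind named here can be sketched in a few lines of Python. This is only an illustration of the idea, not GridWay's actual implementation; the function name and the host names are hypothetical.

```python
from itertools import cycle


def round_robin_dispatch(jobs, resources):
    """Assign jobs to resources in strict rotation.

    The policy ignores resource load and job requirements; that is
    exactly what makes it simple and what a real broker would improve on.
    """
    if not resources:
        raise ValueError("at least one resource is required")
    rotation = cycle(resources)
    return [(job, next(rotation)) for job in jobs]


if __name__ == "__main__":
    # Hypothetical host and job names, for illustration only.
    hosts = ["site-a.example.org", "site-b.example.org", "site-c.example.org"]
    for job, host in round_robin_dispatch(["job1", "job2", "job3", "job4"], hosts):
        print(job, "->", host)
```

Because the rotation never inspects queue length or installed software, a job can land on a busy or unsuitable resource.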
[Figure: architecture diagram. Labels: GT4 resources; hydra.ari.uni-heidelberg.de; scheduler/broker (GridWay); information system; matchmaking; job status: “gwps”]
A practical example with screenshots:
Our View (Thanks: Hans-Martin)
• D5.1: central resource broker with queue
• Present: use GridWay, throughway, round-robin
• More installations useful?
Questions:
• Parameters needed between GridWay and the information system (queue status, module availability, data availability, hardware)
• When is it feasible to have real brokerage? How?
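To make the brokerage question concrete, the sketch below shows one possible matchmaking step over the four parameter groups listed above (queue status, modules, data, hardware). All class and field names are assumptions made for this illustration; they are not taken from GridWay or from the AstroGrid-D information system.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    queued_jobs: int                              # queue status
    modules: set = field(default_factory=set)     # installed software modules
    datasets: set = field(default_factory=set)    # data already available on site
    hardware: set = field(default_factory=set)    # special hardware


@dataclass
class JobRequest:
    modules: set = field(default_factory=set)
    datasets: set = field(default_factory=set)
    hardware: set = field(default_factory=set)


def match(job, resources):
    """Return the resources that satisfy the request, shortest queue first."""
    eligible = [
        r for r in resources
        if job.modules <= r.modules
        and job.datasets <= r.datasets
        and job.hardware <= r.hardware
    ]
    return sorted(eligible, key=lambda r: r.queued_jobs)
```

Unlike round-robin, this filter can only work if the information system publishes the listed parameters for every resource and keeps them current.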
Meeting 15:15 – 17:00 Use Cases
Porting use cases onto the Grid:
• NBODY6++: astrophysical case for direct N-body: star clusters, galactic nuclei, black holes, gravitational wave generation
• Special hardware: GRAPE, MPRACE (FPGA), future technologies (HT, Xtoll, GRAPE-DR)
• GRAPE in the Grid, Astrogrid-D, International DEISA
Jun Makino and colleagues in Tokyo: support, cooperation, over many years
N-Body + Grav. Waves @ ARI: Peter Berczik, Ingo Berentzen, Jonathan Downing, Miguel Preto, Gabor Kupi, Christoph Eichhorn; David Merritt (RIT, USA)…
In VESF/LSC collaboration on gravitational wave modelling from dense star clusters: Pau Amaro-Seoane (AEI, Potsdam, D); G. Schäfer, A. Gopakumar (Univ. Jena, D); M. Benacquista (UT Brownsville, USA)
Further collaborations: Sverre Aarseth (IoA Cambridge, UK); Seppo Mikkola (U Turku, FIN)
Globular Cluster ω Centauri (Central Region)
Ground Based View
Detection of Gravitational Waves? Was Einstein right?
Example: VIRGO Detector in Cascina near Pisa, Italy
Basic idea of any GRAPE N-body code:

$$\mathbf{f}_{ij} = G\, m_j\, \frac{\mathbf{r}_{ij}}{\left(r_{ij}^{2} + \epsilon^{2}\right)^{3/2}}, \qquad \mathbf{a}_i = \sum_{j=1;\; j \neq i}^{N} \mathbf{f}_{ij}$$

Scaling labels in the figure: ~N and ~N^2
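The pairwise force sum on this slide can be written out as a direct-summation loop. The sketch below is plain Python and assumes that r_ij means r_j − r_i (so the force is attractive) and that ε is a softening length. The function name and the default values G = 1, ε = 0 are choices made for this example.

```python
import math


def accelerations(masses, positions, G=1.0, eps=0.0):
    """Direct summation: a_i = sum over j != i of G m_j r_ij / (r_ij^2 + eps^2)^(3/2).

    positions is a list of [x, y, z]; r_ij is taken as r_j - r_i.
    With eps = 0, two particles at the same position cause a division by zero.
    """
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):                      # outer loop: ~N
        for j in range(n):                  # all pairs: ~N^2 force evaluations
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc
```

The inner pair loop is the ~N^2 part that GRAPE-type boards evaluate in hardware, while the ~N work per step stays on the host.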
Hardware - GRAPE
• GRAPE6a, -BL: PCI board for PC clusters
• PROGRAPE-4: FPGA-based board from RIKEN (Hamada)
• GRAPE7: new FPGA-based board from Tokyo Univ. (Fukushige)
• GRAPE-DR: new board from Makino et al., NAOJ
• MPRACE1,2: FPGA boards from Univ. Mannheim/GRACE (Kugel et al.)
GRAPE6a PCI board: ~128 Gflops for a price of ~5K USD; memory for up to 128K particles
ARI 32 node GRAPE6a clusters

• 32 dual-Xeon 3.0 GHz nodes
• 32 GRAPE6a
• 14 TB RAID
• Infiniband link (10 Gb/s)
• Speed: ~4 Tflops
• N up to 4M
• Cost: ~500K USD
• Funding: NSF/NASA/RIT

• 32 dual-Xeon 3.2 GHz nodes
• 32 GRAPE6a
• 32 FPGA
• 7 TB RAID
• Dual port Infiniband link (20 Gb/s)
• Speed: ~4 Tflops
• N up to 4M
• Cost: ~380K EUR
• Funding: Volkswagen/Baden-Württemberg
ARI-ZAH + RIT GRAPE6a clusters
Performance Analysis (3.2 Tflop/s): Harfst et al. 2007, New Astron.
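As a quick consistency check on the quoted numbers, the short calculation below multiplies the per-board rate (~128 Gflops) by the 32 boards and compares the result with the 3.2 Tflop/s figure. It assumes that the ~4 Tflops cluster speed is simply the sum of the board peak rates, ignoring the host CPUs, and takes 1 Tflop = 1000 Gflops.

```python
BOARDS = 32                      # GRAPE6a boards per cluster
PEAK_PER_BOARD_GFLOPS = 128.0    # approximate per-board rate
SUSTAINED_TFLOPS = 3.2           # rate quoted for the performance analysis


def aggregate_peak_tflops(boards, gflops_per_board):
    """Sum of board peak rates, converted from Gflops to Tflops."""
    return boards * gflops_per_board / 1000.0


def efficiency(sustained_tflops, peak_tflops):
    """Fraction of the aggregate peak that is actually sustained."""
    return sustained_tflops / peak_tflops


if __name__ == "__main__":
    peak = aggregate_peak_tflops(BOARDS, PEAK_PER_BOARD_GFLOPS)
    print(f"aggregate peak: {peak:.3f} Tflops")
    print(f"sustained fraction: {efficiency(SUSTAINED_TFLOPS, peak):.1%}")
```

Under these assumptions the aggregate peak is about 4.1 Tflops, consistent with the "~4 Tflops" bullet, and the sustained rate is roughly 78% of it.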
Software: High Accuracy Integrators for Systems with long-range force + relaxation (gravothermal)

S.J. Aarseth, S. Mikkola (ca. 20,000 lines):
• Hierarchical block time steps
• Ahmad-Cohen neighbour scheme
• Kustaanheimo-Stiefel and chain regularization for bound subsystems of N<6 (quaternions!)
• 4th order Hermite scheme (pred/corr)
• Bulirsch-Stoer (for KS)

Codes:
• NBODY6 (Aarseth 1999)
• NBODY6++ (Spurzem 1999) using MPI/shmem, copy algorithm
• Parallel binary integration in progress
• Parallel GRAPE use (Harfst, Gualandris, Merritt, Spurzem, Berczik, Portegies Zwart, 2007)
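The idea behind hierarchical block time steps can be illustrated as follows: each particle's preferred step is rounded down to the largest value of the form dt_max / 2^k, so that particles fall into a small number of blocks that can be advanced together. This is a minimal sketch with hypothetical function names. It leaves out the time-step criterion and the synchronisation with the current time that a production code such as NBODY6 needs.

```python
def block_time_step(dt_desired, dt_max=1.0):
    """Largest step dt_max / 2**k (k >= 0) that does not exceed dt_desired."""
    if dt_desired <= 0:
        raise ValueError("time step must be positive")
    dt = dt_max
    while dt > dt_desired:
        dt *= 0.5          # halving is exact in binary floating point
    return dt


def group_by_step(desired_steps, dt_max=1.0):
    """Map each block step to the indices of the particles that use it."""
    blocks = {}
    for index, dt in enumerate(desired_steps):
        blocks.setdefault(block_time_step(dt, dt_max), []).append(index)
    return blocks
```

For example, preferred steps of 0.3 and 0.26 both end up in the 0.25 block, so the two particles are advanced together. Grouping particles this way lets a parallel code or a GRAPE board process many of them in one call.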
High Accuracy Integrators: Record with GRAPE cluster at 2 million particles!
[Figure by D.C. Heggie, via www.maths.ed.ac.uk]
Larger N needed!
● Baumgardt, Heggie, Hut; Baumgardt, Makino
● Harfst, Gualandris, Merritt, Spurzem, Berczik
Parallel PP on GRAPE6a cluster
ARI Cluster: ~3.2 Tflop/s sustained
Harfst, Gualandris, Merritt, Spurzem, Portegies Zwart, Berczik, New Astron. 2007.
Visualisation
With S. Dominiczak, W. Frings
John-von-Neumann Institute for Computing (NIC), FZ Jülich
google for xnbody
• Xnbody visualization with FZ Jülich (Unicore)
• NBODY6++ use case in Astrogrid-D (Globus GT4.0)
  - Simple JSDL job: ok
  - Parallel job + GRAPE/MPRACE request: in progress
• Astrogrid-D participation in international networks, like MODEST, AGENA (EGEE)
• Goal: share and load-balance GRAPE/MPRACE resources in an international grid-based frame
International GRAPE-Grid Collaboration
Members of Astrogrid-D:
• ARI-ZAH Univ. Heidelberg, D
• Main Astron. Obs. Kiev, UA
Candidates:
• Univ. Amsterdam, NL
• Obs. Astroph. Marseille, F
• Fessenkov Obs., Almaty, KZ
NBODY6++ Requirements
• Fortran 77 with cpp preprocessor and make
• Data access for job chain
• Staging of binary and ASCII input/output
Optional:
• Parallel runs (PBS, mpich-mpif77, mpirun, others)
• GRAPE hardware
• xnbody direct visualization and interaction interface
Future:
• GridMPI, runs across sites
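The "data access for job chain" and staging requirements can be pictured as a loop in which each run stages in the state left by the previous run and stages out its own results. The sketch below simulates this with local directories. The file names `restart.dat` and `output_<n>.txt` and the stand-in `run_segment` function are hypothetical and do not describe the real NBODY6++ files or any grid staging service.

```python
import shutil
import tempfile
from pathlib import Path


def run_segment(workdir, segment):
    """Stand-in for one simulation run: read restart state, write new state."""
    restart = workdir / "restart.dat"
    done = int(restart.read_text()) if restart.exists() else 0
    restart.write_text(str(done + 1))
    (workdir / "output.txt").write_text(f"segment {segment} finished\n")


def run_chain(storage, n_segments):
    """Run n_segments jobs in sequence, staging files through `storage`."""
    for segment in range(n_segments):
        with tempfile.TemporaryDirectory() as tmp:
            workdir = Path(tmp)                  # scratch space on the "resource"
            saved = storage / "restart.dat"
            if saved.exists():
                shutil.copy(saved, workdir / "restart.dat")      # stage in
            run_segment(workdir, segment)
            shutil.copy(workdir / "restart.dat", saved)          # stage out state
            shutil.copy(workdir / "output.txt",
                        storage / f"output_{segment}.txt")       # stage out results
```

In a grid setting the copies would be replaced by whatever staging mechanism the middleware provides. The point is only that every link of the chain depends on the files left by the one before it.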
Meeting 17:30 – 18:30 WG5 with WG3
Common workgroup meeting of WG3 (Distributed Data Management) with WG5 (Resource Management for Grid Jobs)
Expected list of topics:
• How can we improve data staging together? Which steps, what is needed, action items, people?
• Further interaction with other WGs, e.g. WG7 user interfaces, WG6 data streaming, WG1 system integration
• Next deliverables 5.4-5.8, others...
• Open discussion on sustainability, internationality, EGEE, follow-up project, breakout ideas, guided by Goals Last Year
How can we improve data staging together? Which steps, what is needed, action items, people?
Use Astrogrid-D file management system?
WP5: Resource Management for Grid Jobs: Tasks

Task V-1: Specification of Requirements and Architecture
  AIP (8), ARI-ZAH (6), ZIB (6), AEI (2), MPE (2), MPA (1)
  Start Sep. 05; Deliverable D5.1 Oct. 2006
  Status: COMPLETED

Task V-2: Development of Grid-Job Management (Feb. 07)
  ZIB (24), ARI-ZAH (12), MPA (5)
  Start June 06; Deliverables D5.2 Feb. 2007, D5.6 June 2008
  Status: 5.2 COMPLETED

Task V-4: Adaptation of User- and Programmer Interfaces (May 07)
  AIP (18), ARI-ZAH (12), AEI (5), MPE (4), MPA (1)
  Start Dec. 06; Deliverables D5.4 May 2007, D5.7 Sep. 2008
  Status: PENDING

Task V-3: Development Link to Robotic Telescopes, Requests (Feb 07)
  AIP (17), ZIB (6)
  Start Sep. 06; Deliverables D5.3 Feb. 2007, D5.5 Oct. 2007, D5.8 Sep. 2008
  Status: IN PROGRESS
Next Steps in WG-5 / WG-3
Short Term:
• Improve the deployment by pushing the implementation of modules for at least 2-5 pioneer use cases (this year) [D5.4, 5.7].
• Demonstrate the ability to deploy and run these use cases on more than one resource using GridWay (this year) [D5.4, 5.7].
• Use first primitive data staging (handing data through). Note: Useful Document GridGateWay 2007-10-05 by HMA et al.
Middle Term:
• Enable GridWay as AstroGrid-D job manager (May 08) [D5.6]
• Solve the problem of how to handle data management together with GridWay (Aug 08) [TA II-5]
• Increase the number of use cases and prospective users [D5.4]
• Improve international impact / compatibility issues, e.g. with EGEE
WG5: Current status, Job Management
Decision for the Job Submission Description Language (JSDL)
• supported by the Open Grid Forum (OGF)
[Diagram: GUI; jsdlproc (JSDL → RSL/XML); GT4.0]
(GT4.2 is currently being developed and will support JSDL directly)
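For orientation, the sketch below builds a minimal JSDL document with Python's standard XML library, using the JSDL 1.0 namespaces and the POSIX application extension. The helper name `simple_jsdl`, the job name, and the executable path in the usage note are hypothetical. Which elements jsdlproc or a given GT4 installation accepts is not stated in the slides.

```python
import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def simple_jsdl(job_name, executable, arguments=(), output="stdout.txt"):
    """Return a minimal JSDL job description as an XML string."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    ident = ET.SubElement(desc, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL}}}JobName").text = job_name
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX}}}Output").text = output
    return ET.tostring(job, encoding="unicode")
```

Calling `simple_jsdl("nbody-test", "/path/to/nbody6", ["input.dat"])` yields a document that a translator could map to a middleware-specific format such as RSL/XML.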
WG5: Current status, Scheduler/Broker
GridWay:
• Lightweight metascheduler on top of GT2.4/GT4
• Central server architecture
• Support of the GGF DRMAA standard API for job submission and management
• Simple round-robin/flooding scheduling algorithm, but extensible
[Figure: architecture diagram. Labels: GT4 resources; hydra.ari.uni-heidelberg.de; scheduler/broker (GridWay); information system; matchmaking; job status: “gwps”]
WG5: Current status, Robotic Telescopes
First steps accomplished toward the integration into AstroGrid:
• Adopted the Remote Telescope Markup Language (RTML) and developed a first description of STELLA-I
• This description can contain dynamic information, e.g. about weather
• Developed a generic transformation from RTML to RDF, which we can upload to the AstroGrid information service (for this we modified the program OwlMap from the FRESCO project)
• The user can use SPARQL queries to find appropriate telescopes.
• SPARQL queries can also be implemented in tools like the Grid-Resource Map.
[Figure: STELLA-I. Robotic telescopes STELLA-I & II in Tenerife (Canary Islands)]
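The chain from a telescope description to queryable triples can be illustrated with a toy example. The XML below is a simplified, hypothetical stand-in and does not follow the real RTML schema. The tuple list stands in for RDF, and the `query` function only mimics a single-pattern SPARQL lookup. The weather value is invented sample data.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical telescope description; NOT the real RTML schema.
DESCRIPTION = """
<Telescope name="STELLA-I">
  <Location>Tenerife</Location>
  <Weather>clear</Weather>
</Telescope>
"""


def to_triples(xml_text):
    """Turn each child element into a (subject, predicate, object) triple."""
    root = ET.fromstring(xml_text)
    subject = root.get("name")
    return [(subject, child.tag, (child.text or "").strip()) for child in root]


def query(triples, predicate, value):
    """Return the subjects having the given predicate/value pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})
```

A call such as `query(to_triples(DESCRIPTION), "Weather", "clear")` plays the role of a query for telescopes that currently report good weather.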
WG5: Next Steps, Robotic Telescopes
Next steps:
• RTML description of STELLA-II, RoboTel and other robotic telescopes
• Develop a system that adds dynamic weather information
• Develop a transformation from RTML to the telescope-specific language for AIP-operated telescopes, to be able to send observation requests in RTML
• Provide access through the AstroGrid by applying Grid security mechanisms, VO management
• Development of a scheduler for a network of robotic telescopes
• A lot of testing
  The AIP has a simulator for STELLA and RoboTel