The French ACI GRID* initiative and its latest achievements using Grid'5000
Thierry PRIOL
Director of the French ACI GRID
Thierry.Priol@inria.fr
Franck Cappello
Director ACI GRID Grid’5000
Franck.Cappello@inria.fr
Contents
An overview of the ACI GRID initiative and some of the projects
The Grid’5000 project
Concluding remarks
* Member of the non-usual suspect National Grid Initiatives (W. Gentzsch)
Objectives of the ACI GRID
Push the national research effort on grid computing
Increase the visibility of French Grid research activities
Fund medium and long term research activities in Grid using a bottom-up approach (nothing imposed !)
Stimulate synergies between research groups
Encourage experimentation with the available grid infrastructure being deployed through national projects
Develop new software for experimental grid infrastructures
New system and programming environments for distributed computing or large data management
Organisation
Programme Director : Thierry Priol since January 2004, M. Cosnard before
Scientific council: Brigitte Plateau
Budget: ~8 M€* (including 8 PhD grants)
This is incentive funding (around 98.3 M€ estimated by GridCoord)
* Computing and network infrastructures, permanent researchers salaries already paid by the state
Timeline: 2001 2002 2003 2004 2005 2006 2007
Call1 18 projects 2.25M€
Call2 12 projects 3M€
Call3-G5K 5 projects 1M€
Call4-G5K 6 projects 1M€
Several kinds of projects
Multidisciplinary project
Software project
Young research team
Collaboration
International
Testbed
ACI GRID projects
Middleware, tools, environments
CGP2P (F. Cappello, LRI/CNRS); ASP (F. Desprez, ENS Lyon/INRIA); EPSN (O. Coulaud, INRIA); PADOUE (A. Doucet, LIP6); MEDIAGRID (C. Collet, IMAG); DARTS (S. Frénot, INSA-Lyon); Grid-TLSE (M. Dayde, ENSEEIHT); RMI (C. Pérez, IRISA); CONCERTO (Y. Maheo, VALORIA); CARAML (G. Hains, LIFO)
Algorithms
TAG (S. Genaud, LSIIT); ANCG (N. Emad, PRISM); DOC-G (V-D. Cung, UVSQ)
Compiler techniques
Métacompil (G-A. Silbert, ENMP)
Networks and communication
RESAM (C. Pham, ENS Lyon); ALTA (C. Pérez, IRISA/INRIA)
Applications
COUMEHY (C. Messager, LTHE): Climate; GenoGrid (D. Lavenier, IRISA): Bioinformatics; GeoGrid (J-C. Paul, LORIA): Oil reservoir; IDHA (F. Genova, CDAS): Astronomy; Guirlande-fr (L. Romary, LORIA): Language; GriPPS (C. Blanchet, IBCP): Bioinformatics; HydroGrid (M. Kern, INRIA): Environment; Medigrid (J. Montagnat, INSA-Lyon): Medical
Grid Testbeds
CiGri-CIMENT (L. Desbat, UjF); Mecagrid (H. Guillard, INRIA); GLOP (V. Breton, IN2P3); GRID5000 (F. Cappello, INRIA)
Support for disseminations
ARGE (A. Schaff, LORIA); GRID2 (J-L. Pazat, IRISA/INSA); DataGRAAL (Y. Denneulin, IMAG)
GRID ASP: Client/Server Approach for Simulation over the Grid
Call 1 (2001 - 2003)
Project coordinator: F. Desprez
E-mail: Frederic.Desprez@inria.fr Web: http://graal.ens-lyon.fr/ASP/
Participants: ENS-Lyon, INRIA, LORIA, LIFC, IRCOM, LST, SRSMC, Physique Lyon1
Objectives
Building a portable set of tools for computational servers in an ASP (Application Service Provider) model
DIET (Distributed Interactive Engineering Toolbox)
Porting several different applications: physics, geology, chemistry, electronic device simulation, robotics, …
Focus on issues: resource localization, (hierarchical) scheduling, performance evaluation (both static and dynamic), data persistence, data redistribution between servers
[Figure: DIET architecture. Clients (C, C++, Scilab, Web browser) issue calls such as grpc_call(MatPROD, A, B); agents, each with a scheduler; servers S1, S2, S3 (C, Fortran, Java) behind a local scheduler or batch system; distributed software database; distributed performance database; visualization server; direct connection.]
TLSE : Web expert site for sparse matrices based on grid infrastructure
Call 2 (2002 - 2004)
Project coordinator: Michel Daydé
E-mail: Michel.Dayde@enseeiht.fr Web: http://www.enseeiht.fr/lima/tlse/
Participants: CERFACS, FéRIA-IRIT, LIP-ENSL, LaBRI, CEA, CNES, EADS, EDF, IFP
Objectives
Design a Web expertise site for sparse matrices
Dissemination of our expertise in sparse linear algebra
Easy access and experimentation with software and tools: only statistics are provided, not computing resources
Exploitation of the computing power of the grid for parametric studies
Contents : Sparse matrix software, Bibliography, Collections of sparse matrices
CGP2P: Global P2P Computing
“Fusion of Desktop Grid and P2P systems”
Call 1 (2001 - 2003)
Coordinator: Franck Cappello
E-mail: fci@lri.fr Web: www.lri.fr/~fci
Participants: LRI, LIFL, ID IMAG, LARIA, LAL, EADS
Requests concern computations or data
Services concern computation or data
[Figure: a coordination system links client PCs (request, result) and service-provider PCs (accept, provide); potential communications for parallel applications (MPI).]
Desktop Grid middleware: XtremWeb
Fault tolerant MPI: MPICH-V
Sandbox for binary applications: SBSLM
Large Scale Storage: US
Workflow/Dataflow language: YML
Scheduling simulator: SimLargeGrid
French ADSL analysis
Theoretical proof of the protocols
Convergence/Integration with GRID (GT3)
RMI: Programming the Grid with distributed Objects
Call 1 (2001 - 2003)
Project coordinator: C. Pérez
E-mail: Christian.Perez@irisa.fr Web: http://www.irisa.fr/Grid-RMI/en/
Participants: IRISA, ENS-Lyon, LIFL, INRIA, EADS
Objectives
Provide a framework to combine various communication middleware and runtimes
For parallel programming:
Message based runtimes (MPI, PVM, …)
DSM-based runtimes (TreadMarks, …)
For distributed programming:
RPC/RMI based middleware (DCE, CORBA, Java)
Middleware for discrete-event based simulation (HLA)
Get the maximum performance from the network!
Offer zero-copy mechanism to middleware/runtime
[Figure: PadicoTM architecture. PadicoTM Core: Madeleine (portability across networks: Myrinet, SCI, TCP) and Marcel (I/O aware multi-threading). PadicoTM Services: personality layer and internal engine hosting DSM, JVM, MPI, CORBA and HLA implementations (Mpich, OmniORB, MICO, Orbacus, Orbix/E, Kaffe, CERTI, Mome).]
[Plot: bandwidth (MB/s, 0 to 250) versus message size (bytes, 1 to 1E+07) for CORBA/Myrinet-2000, MPI/Myrinet-2000, Java/Myrinet-2000, CORBA/SCI, MPI/SCI and TCP/Ethernet-100.]
HydroGrid: distributed code coupling in hydrogeology, using software components
Call 2 (2002 - 2004)
Project coordinator: M. Kern
E-mail: Michel.Kern@inria.fr Web: http://www-rocq.inria.fr/~kern/HydroGrid/HydroGrid-en.html
Participants: INRIA Rocquencourt, INRIA Rennes, IMFS Strasbourg, Geosciences Rennes
Objectives
Simulate flow and transport of pollutants in the subsurface
Take into account couplings between different physical phenomena
Couple parallel codes on a grid, software from ACI GRID RMI project
Links between numerical and software coupling
Example applications: reactive transport (top), density driven flow (bottom), fractured media
[Figure: coupled components: flow, transport, chemistry, meshing, visualization. Density driven flow: mass fraction.]
Main feedback from call1 & call2 projects
Lack of a large scale testbed available for experiments
Several small scale testbeds at the regional level
Duplication of effort when setting up testbeds
Various types of Grids
Need to be able to experiment with various software layers
Incompatible with a production Grid
In the first ½ of 2003, the design and development of an experimental platform for Grid researchers was decided: Grid’5000 as a real life system
[Figure: log(cost & coordination) versus log(realism) for four approaches: math (model, protocol proof); simulation (SimGrid, MicroGrid, Bricks, NS, etc.); emulation (Data Grid eXplorer, WANinLab, Emulab); live systems (DAS, PlanetLab, Naregi Testbed, NSF GENI, Grid’5000). Difficulty labels: Reasonable, Challenging, Major challenge.]
How to proceed…
The Grid’5000 project
Grid’5000 Objective
Deploy an experimental large scale computing infrastructure to allow any kind of experiments
Experiments of any kind of grids (Virtual Supercomputer, Desktop Grid, …)
Experimental conditions
Configuration of the entire software stack from the application to the operating system
[Figure: a computer testbed stack (Application, Middleware, OS (…), BIOS) beside a grid testbed stack (Grid Application, Grid Middleware, OS (…), Grid BIOS), the latter labelled Grid’5000.]
The Grid’5000 Project
Building a nation wide experimental platform for Large scale Grid & P2P experiments
9 geographically distributed sites
Every site hosts a cluster (from 256 CPUs to 1K CPUs)
All sites are connected by RENATER (French Res. and Edu. Net.)
RENATER hosts probes to trace network load conditions
Design and develop a system/middleware environment to safely test and repeat experiments
Use the platform for Grid experiments in real life conditions
Port and test applications, develop new algorithms
Address critical issues of Grid system/middleware: Programming, Scalability, Fault Tolerance, Scheduling
Address critical issues of Grid Networking: High performance transport protocols, QoS
Investigate original mechanisms: P2P resources discovery, Desktop Grids
Planning
[Timeline, June 2003 to 2007. Phases: discussions and prototypes; installation of clusters and network; preparation and calibration; first experiments; experiments; international collaborations (CoreGRID). Year marks: June 2003, 2004, 2005, 2006, 2007.]
[Processor counts shown along the timeline: 1250 CPUs, 2000, 2300, ~2500, 3500, 5000. Other labels: Funded, Today.]
Grid’5000 foundations: Measurements and condition injection
Allow users to run their favorite measurement tools and experimental condition injectors
Quantitative metrics:
Performance: execution time, throughput, overhead, QoS (batch, interactive, soft real time, real time).
Scalability: resource occupation (CPU, memory, disc, network), application algorithms, number of users, number of resources.
Fault-tolerance: tolerance to very frequent failures (volatility), tolerance to massive failures (a large fraction of the system disconnects), fault tolerance consistency across the software stack.
Experimental condition injection:
Background workloads: CPU, memory, disk, network, traffic injection at the network edges.
Stress: high number of clients, servers, tasks, data transfers.
Perturbation: artificial faults (crash, intermittent failure, memory corruptions, Byzantine), rapid platform reduction/increase, slowdowns, etc.
Application Runtime
Grid or P2P Middleware
Operating System
Programming Environments
Networking
Application
Let users create, deploy and run their software stack, including the software to test and their environment + measurement tools + experimental conditions injectors
[Figure labels alongside the stack: Experimental conditions injector, Measurement tools]
Grid’5000 principle: A highly reconfigurable experimental platform
Experiment workflow
Log into Grid’5000, import data/codes
Build an env.?
yes: Reserve 1 node, reboot node (existing env.*), adapt env., reboot node; repeat until Env. OK? is yes
no: continue directly
Reserve nodes corresponding to the experiment
Reboot the nodes in the user experimental environment (optional)
Transfer params + run the experiment
Collect experiment results
Exit Grid’5000
* Available on all sites: Fedora4all, Ubuntu4all, Debian4all
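The workflow on this slide can be read as a small decision procedure. The sketch below is only an illustration of that reading: the function name, the step labels and the `fix_rounds` parameter are hypothetical, and it does not call any Grid’5000 tool.

```python
def experiment_steps(build_env, fix_rounds=0):
    """Return the ordered steps of the workflow as a list of labels.

    build_env  -- True if the user first builds a custom environment
    fix_rounds -- hypothetical number of extra adapt/reboot rounds needed
                  before the "Env. OK?" check passes
    """
    steps = ["log in, import data/codes"]
    if build_env:
        steps += ["reserve 1 node", "reboot node (existing env.)"]
        # Adapt and reboot until the environment is judged OK.
        for _ in range(fix_rounds + 1):
            steps += ["adapt env.", "reboot node"]
    steps += [
        "reserve nodes for the experiment",
        "reboot nodes in user env. (optional)",
        "transfer params and run",
        "collect results",
        "exit",
    ]
    return steps
```

Calling `experiment_steps(True, fix_rounds=1)` lists two adapt/reboot rounds before the reservation of the experiment nodes.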
Grid’5000 map
Orsay: 1000 (684)
Rennes: 518 (518)
Bordeaux: 500 (96)
Toulouse: 500 (116)
Lyon: 500 (252)
Grenoble: 500 (270)
Sophia Antipolis: 500 (434)
Lille: 500 (106)
Nancy: 500 (94)
Should be red today at Orsay!
Grenoble
Rennes
Lyon
Toulouse
Sophia
Orsay
Bordeaux
Hardware Configuration
Grid’5000 network provided by RENATER
10 Gbps
Dark fiber, dedicated lambda, fully isolated traffic!
Grid’5000 as an Instrument
High security for Grid’5000 and the Internet, despite the deep reconfiguration feature
Grid’5000 is confined: communications between sites are isolated from the Internet and vice versa (level 2 MPLS, dedicated lambda).
A software infrastructure allowing users to access Grid’5000 from any Grid’5000 site and have a simple view of the system
A user has a single account on Grid’5000; Grid’5000 is seen as a cluster of clusters, with 9 (1 per site) unsynchronized home directories
Reservation/scheduling tools allowing users to select nodes and schedule experiments
A reservation engine + batch scheduler (1 per site) + OAR Grid (a co-reservation scheduling system)
A user toolkit to reconfigure the nodes: software image deployment and node reconfiguration tool
OS Reconfiguration techniques: Reboot OR Virtual Machines
Currently we use Reboot, but Xen will be used in the default environment.
Let users select their experimental environment: fully dedicated or shared within a virtual machine
Reboot:
Remote control with IPMI, RSA, etc.
Disc repartitioning, if necessary
Reboot or Kernel switch (Kexec)
Virtual Machine:
No need for reboot
Virtual machine technology: selection not so easy
Xen has some limitations:
- Xen3 in “initial support” status for Intel VT-x
- Xen2 does not support x86/6
- Many patches not supported
- High overhead on high speed Net.
Resource usage: activity (Feb’06)
Activity > 70%
April: just before SC’06 and Grid’06 deadlines
Community: Grid’5000 users
345 registered users coming from 45 laboratories.
IBCP, IMAG, INRIA-Alpes, INSA-Lyon, Prism-Versailles, BRGM, INRIA, CEDRAT, IME/USP.br, INF/UFRGS.br, LORIA,
UFRJ.br, LABRI, LIFL, ENS-Lyon, EC-Lyon, IRISA, RENATER, IN2P3, LIFC, LIP6, UHP-Nancy,
France-telecom, LRI, IDRIS, AIST.jp, UCD.ie, LIPN-Paris XIII, U-Picardie, EADS, EPFL.ch, LAAS, ICPS-Strasbourg,
Univ. Nantes, Sophia, CS-VU.nl, FEW-VU.nl, Univ. Nice, ENSEEIHT, CICT, IRIT, CERFACS, ENSIACET, INP-Toulouse, SUPELEC
About 230+ Experiments
About 200 Publications
A series of Events
• Series of conferences and tutorials including Grid PlugTest (N-Queens and Flowshop Contests).
The objective of this event was to bring together ProActive users, to present and discuss current and future features of the ProActive Grid platform, and to test the deployment and interoperability of ProActive Grid applications on various Grids.
The N-Queens Contest (4 teams) where the aim was to find the number of solutions to the N-queens problem, N being as big as possible, in a limited amount of time
The Flowshop Contest (3 teams)
1600 CPUs in total: 1200 provided by Grid’5000 + 50 by the other Grids (EGEE, DEISA, NorduGrid) + 350 CPUs on clusters.
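For readers unfamiliar with the N-Queens contest task, here is a minimal single-machine sketch of counting solutions by backtracking over bitmasks. It is a textbook approach and an assumption on our part, not the code used by the contest teams; the function name is hypothetical.

```python
def count_n_queens(n):
    """Count placements of n mutually non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # bitmask with one bit per column

    def place(cols, diag1, diag2):
        if cols == full:  # a queen has been placed in every row
            return 1
        total = 0
        # Columns not attacked by any earlier queen in this row.
        free = full & ~(cols | diag1 | diag2)
        while free:
            bit = free & -free  # lowest free column
            free -= bit
            total += place(cols | bit,
                           ((diag1 | bit) << 1) & full,
                           (diag2 | bit) >> 1)
        return total

    return place(0, 0, 0)
```

Each choice of column for the first row roots an independent subtree, so the count splits naturally into tasks that can be farmed out to separate CPUs and summed afterwards.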
Don’t miss Grid@work 2006, Nov. 26 to Dec. 1
http://www.etsi.org/plugtests/Upcoming/GRID2006/GRID2006.htm
Grid@work (October 10-14, 2005)
Experiment: Geophysics: Seismic Ray Tracing in 3D mesh of the Earth
Stéphane Genaud, Marc Grunberg, and Catherine Mongenet
IPGS: “Institut de Physique du Globe de Strasbourg”
Building a seismic tomography model of the Earth geology using seismic wave propagation characteristics in the Earth.
Seismic waves are modeled from events detected by sensors.
Ray tracing algorithm: waves are reconstructed from rays traced between the epicenter and one sensor.
An MPI parallel program composed of 3 steps:
1) Master-worker: ray tracing and mesh update by each process with blocks of rays successively fetched from the master process,
2) all-to-all communications to exchange submesh information between the processes,
3) merging of cell information of the submesh associated with each process.
Reference: 32 CPUs
JXTA DHT scalability
Goals: study of a JXTA “DHT”
“Rendezvous” peers form the JXTA DHT
Performance of this DHT?
Scalability of this DHT?
Organization of a JXTA overlay (peerview protocol)
Each rendezvous peer has a local view of other rendezvous peers
Loosely-consistent DHT between rendezvous peers
Mechanism for ensuring convergence of local views
Benchmark: time for local views to converge
Up to 580 nodes on 6 sites
[Figure legend: Edge Peer, rdv Peer]
1) It requires 2 hours to contact all “rendezvous” peers
2) With the default settings, the view of every rendezvous peer is limited to only 300 rendezvous peers
3) The view of every “rendezvous” peer is very unstable
Fully Distributed Batch Scheduler
Motivation: evaluation of a fully distributed resource allocation service (batch scheduler)
Vigne: unstructured network, flooding (random walk optimized for scheduling).
Experiment: a bag of 944 homogeneous tasks / 944 CPUs
Synthetic sequential code (Monte Carlo application).
Measure the mean execution time for a task (computation time depends on the resource)
Measure the overhead compared with an ideal execution (central coordinator)
Objective: 1 task per CPU.
Tested configuration:
Grid'5000 site       #CPU used   Execution time
Bordeaux             82          1740 s
Orsay                344         1959 s
Rennes (paraci)      98          1594 s
Rennes (parasol)     62          2689 s
Rennes (paravent)    198         2062 s
Sophia               160         1905 s
944 CPUs: Bordeaux (82), Orsay (344), Rennes Paraci (98), Rennes Parasol (62), Rennes Paravent (198), Sophia (160)
Duration: 12 hours
Result:
Submission interval      5 s      10 s     20 s
Average execution time   2057 s   2042 s   2039 s
Overhead                 4.80%    4.00%    3.90%
L. Rilling et al., 2006
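The overhead figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes (this is our reading, not stated on the slide) that the "ideal execution" time is the CPU-weighted mean of the per-site execution times; under that assumption the computed overheads land within about 0.1 percentage point of the reported ones.

```python
# (site, CPUs used, execution time in seconds), copied from the slide
sites = [
    ("Bordeaux", 82, 1740),
    ("Orsay", 344, 1959),
    ("Rennes (paraci)", 98, 1594),
    ("Rennes (parasol)", 62, 2689),
    ("Rennes (paravent)", 198, 2062),
    ("Sophia", 160, 1905),
]
# Submission interval (s) -> measured average execution time (s)
measured = {5: 2057, 10: 2042, 20: 2039}

total_cpus = sum(cpus for _, cpus, _ in sites)
# Assumption: ideal mean = CPU-weighted mean of the per-site times.
ideal_mean = sum(cpus * t for _, cpus, t in sites) / total_cpus

# Relative overhead of each measured mean over the ideal mean, in percent.
overhead_pct = {k: 100 * (t / ideal_mean - 1) for k, t in measured.items()}

for interval, pct in overhead_pct.items():
    print(f"{interval:>2} s interval: {pct:.2f}% overhead")
```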
Large Scale experiment of DIET: A GridRPC environment
Raphaël Bolze
7 sites: Lyon, Orsay, Rennes, Lille, Sophia, Toulouse, Bordeaux
8 clusters - 585 machines - 1170 CPUs.
Objectives:
- Prove that the DIET environment is scalable.
- Test the functionalities of DIET at large scale
1120 clients submitted more than 45 000 REAL GridRPC requests (dgemm matrix multiply) to GridRPC servers
Solving the Flow-Shop Scheduling Problem
“one of the hardest challenge problems in combinatorial optimization”
• Schedule a set of jobs on a set of machines minimizing the makespan.
• Jobs order must be respected and machines can execute 1 job at a time.
• Complexity is very high for large size instances (possible schedules).
• Exhaustive enumeration of all combinations would take several years.
• The challenge is thus to reduce the number of explored solutions.
• But the problem cannot be efficiently solved without computational grids.
New Grid exact method based on the Branch-and-Bound algorithm (Talbi, Melab, et al.), combining new approaches of combinatorial algorithmics, grid computing, load balancing and fault tolerance.
Problem: 50 jobs on 20 machines, optimally solved for the 1st time, with 1245 CPUs (peak)
Using simultaneously Grid5000 and other clusters
Involved Grid5000 sites (6): Bordeaux, Lille, Orsay, Rennes, Sophia-Antipolis and Toulouse.
The optimal solution required a wall-clock time of 1 month and 3 weeks.
E. Talbi, N. Melab, 2006
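To make the flow-shop objective concrete, the sketch below computes the makespan of one job order and finds the best order by exhaustive enumeration. It assumes a permutation flow shop (every machine processes the jobs in the same order) and uses a made-up 3-job, 2-machine instance; it is not the Branch-and-Bound method of the slide, only the naive baseline that such a method is designed to avoid.

```python
from itertools import permutations

def makespan(order, proc):
    """Completion time of the last job on the last machine.

    proc[j][m] is the processing time of job j on machine m; every job
    visits the machines in the same sequence 0, 1, 2, ...
    """
    n_machines = len(proc[0])
    done = [0] * n_machines  # done[m]: when machine m finishes its latest job
    for j in order:
        for m in range(n_machines):
            ready = done[m - 1] if m > 0 else 0  # job j has left machine m-1
            done[m] = max(done[m], ready) + proc[j][m]
    return done[-1]

def best_order(proc):
    """Exhaustive search over all job orders; only feasible for tiny instances."""
    return min(permutations(range(len(proc))),
               key=lambda order: makespan(order, proc))

# Hypothetical instance (not from the slides): 3 jobs, 2 machines.
proc = [[3, 2], [1, 4], [2, 1]]
best = best_order(proc)
```

With n jobs there are n! orders, so for the 50-job instance on the slide (50! is roughly 3 x 10^64 orders) enumeration is out of reach, which is why pruning the search tree matters.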
TCP limits over 10Gb/s links
Highlighting TCP stream interaction issues in very high bandwidth links (congestion collapse) and poor bandwidth fairness
Grid’5000 10Gb/s connections evaluation
Evaluation of TCP variants over Grid’5000 10Gb/s links (BIC TCP, H-TCP, Westwood…)
Interaction of 10 1Gb/s TCP streams, over the 10Gb/s Rennes-Nancy link, during 1 hour.
Aggregated bandwidth of 9.3 Gb/s over a time interval of a few minutes, then a very sharp drop of the bandwidth on one of the connections.
P. Primet et al., 2006
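The slide speaks of poor bandwidth fairness without naming a metric. One common way to quantify fairness among streams is Jain's fairness index; the sketch below uses it as an illustration only, with made-up per-stream rates rather than the measured data.

```python
def jain_index(rates):
    """Jain's fairness index: 1.0 when all rates are equal,
    1/n when a single flow gets all the bandwidth."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0  # no traffic at all: treat as undefined/zero
    return (total * total) / (len(rates) * squares)

# Hypothetical per-stream throughputs in Gb/s (illustrative, not measured).
even = [0.93] * 10            # ten streams sharing 9.3 Gb/s equally
skewed = [1.0] * 9 + [0.1]    # one stream collapses while the others fill up

fair_even = jain_index(even)
fair_skewed = jain_index(skewed)
```

A single collapsed stream out of ten already pulls the index visibly below 1, which is the kind of effect the experiment highlights.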
Grid’5000 main achievements in 2006
A large scale and highly reconfigurable Grid experimental platform
Used by Master's students, Ph.D. students, postdocs and researchers (and results are presented in their reports, theses, papers, etc.)
Grid’5000 offers in 2006:
9 clusters distributed over 9 sites in France,
about 10 Gigabit/s (directional) of bandwidth,
the capability for all users to reconfigure the platform [protocols/OS/Middleware/Runtime/Application]
Grid’5000 results in 2006:
300+ users
~200 publications, ~230 planned experiments
Grid’5000 has been open to French Grid researchers since July 2005
Grid’5000 is open to other communities in 2006 (CoreGRID)
Grid’5000 winter school (Philippe d’Anfray, ~January 2007)
Connection to other Grid experimental platforms: Netherlands (from October 2006), Japan (under discussion)
Sustainability ensured by INRIA after 2007
Concluding remarks
GRID in its wider definition
Computing, data and knowledge Grids, P2P
Not only focusing on the use of Supercomputers… neither on Globus…
An emphasis on middleware but also on applications/algorithms to make them Grid-aware
The French ACI GRID led to many European initiatives
Several groups of the ACI GRID projects are involved in EU funded projects (almost absent in FP5, involved in 10 projects in FP6 and leader of 3 projects)
The idea to set up a Network of Excellence in Grid Research came from the ACI GRID (M. Cosnard)
On-going discussions to have a European dimension of Grid’5000 funded under the 7th Framework Programme
Funding of Grid research still available
Through the “Agence Nationale de la Recherche”
To get more information about the ACI-GRID
http://www-sop.inria.fr/aci/grid
Thierry.Priol@inria.fr
[Figure: DAS3 and Grid’5000, Oct. 2006; labels: 2600 CPUs, 1500 CPUs]
Announcement
Project consultation meeting
Bridging Global Computing with Grid (BIGG)
In conjunction with
November 28-29, 2006
The objective of the workshop is to provide a direct gateway facilitating interactions between two different communities: Grid & Global Computing