18.04.23
SHORT OVERVIEW OF CURRENT STATUS
A. A. Moskovsky
Program Systems Institute, Russian Academy of Sciences
IKI - MSR Research Workshop
Moscow, 10-12 June, 2009
“SKIF-GRID” SUPERCOMPUTING PROJECT OF THE UNION STATE OF RUSSIA AND BELARUS
18.04.23 Slide 2
Pereslavl-Zalessky
• Russian Golden Ring
• City: 857 years old
• Hometown of Great Dukes of Russia
• The first building site of Peter the Great's navy
• Ancient capital of Russian Orthodox church
[Map: Moscow to Pereslavl-Zalessky, 120 km]
18.04.23 Slide 3
“SKIF-GRID” PROJECT TIMELINE
1. 2000-2004 - SKIF project, SKIF K-1000 is #98 in Top500
2. June 2004 – first proposal filed for “SKIF-GRID” project
3. March 2007 – approved by Government
4. March 2008 - SKIF-MSU supercomputer deployed (#36 in June 08 Top 500)
5. May 2008 - “SKIF-Testbed” federation created.
6. March 2009 – alliance agreement signed for SKIF series 4 development
18.04.23 Slide 4
PROJECT ORGANIZATION: 2007-2008
Project directions
1. Grid technology
2. Supercomputers
• SW
• HW
3. Security
4. Pilot projects – applications of HPC and grid technology
18.04.23 Slide 5
«SKIF MSU»
18.04.23 Slide 6
SKIF MSU
• Theoretical peak performance: 60 TFlops
• Linpack: 47 TFlops
• Advanced clustering solutions: diskless computational nodes
• Original blade design

Parameter            Value
CPU architecture     x86-64
CPU model            Intel Xeon E5472, 3.0 GHz (4 cores)
Nodes (dual CPU)     625
CPU cores total      5000
Interconnect         Infiniband DDR, Fat Tree
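
The figures on this slide can be cross-checked with a little arithmetic. The sketch below recomputes the core count and theoretical peak from the table and derives the Linpack efficiency. The value of 4 double-precision floating-point operations per core per cycle is an assumption that is not stated on the slide; it is the usual figure for this processor generation, and the function names are illustrative only.

```python
# Figures taken from the slide.
NODES = 625
CPUS_PER_NODE = 2        # "dual CPU" nodes
CORES_PER_CPU = 4
CLOCK_GHZ = 3.0
LINPACK_TFLOPS = 47.0

# Assumption (not on the slide): 4 double-precision flops per core per cycle.
FLOPS_PER_CYCLE = 4


def total_cores():
    """Number of CPU cores in the machine."""
    return NODES * CPUS_PER_NODE * CORES_PER_CPU


def peak_tflops():
    """Theoretical peak: cores x clock (GHz) x flops per cycle, in TFlops."""
    return total_cores() * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0


def linpack_efficiency():
    """Fraction of the theoretical peak achieved on Linpack."""
    return LINPACK_TFLOPS / peak_tflops()


if __name__ == "__main__":
    print("cores:", total_cores())
    print("peak, TFlops:", peak_tflops())
    print("Linpack efficiency: %.1f%%" % (100 * linpack_efficiency()))
```

Under that assumption the computed peak agrees with the 60 TFlops quoted above, and the Linpack result corresponds to roughly 78% of peak.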
18.04.23 Slide 7
«SKIF-Testbed» a/k/a “SKIF-Polygon”
• Federation of HPC centers, ~100 TFlops
• 4 computers in the current Top 500:
   • MSU (#35 in Top500)
   • South Ural State University
   • Tomsk State University
   • Ufa State Technical University
18.04.23 Slide 8
Middleware platform – UNICORE 6.1
• X.509 for security
• Certificate Authority at Pereslavl-Zalessky (PyCA)
• Site platform:
   • UNICORE 6.1
   • Java 1.5
   • Linux
   • Torque
• Experimental sites: UNICORE is complemented with additional services/modules
18.04.23 Slide 9
Applications (2007-2008)
• HPC applications:
   • Drug design (MSU Belozersky Institute, SRCC, Chelyabinsk SU)
   • Inverse problems in soil remote sensing (SRCC)
   • Computational chemistry (MSU Chemistry department)
• Geophysical data services
• Mammography database prototype (N.N. Semenov Chemical Physics Institute, RAS)
• Text mining (PSI RAS)
• Engineering (South Ural University …)
• Space Research Institute...
• …
18.04.23
SKIF-Aurora
2009-2010: second phase of SKIF-GRID project
18.04.23 Slide 11
SKIF Series 4: original R&D goals
• Highest density of performance (largest possible number of CPUs per 1U)
• Lower latency
• Fewer cables and connectors: better reliability
• Increased heat output per 1U
   • We need new cooling technology… How to?
• Improved interconnect: we need better scalability, bandwidth and latency than the best available solutions provide (e.g. Infiniband QDR)
• New approach to monitoring and management of the supercomputer
• Combining standard CPUs and accelerators in computational nodes of the supercomputer
18.04.23 Slide 12
Spring’2008: SKIF Series 4 - How To?
18.04.23 Slide 13
Summer’2008: SKIF Series 4 - Know How!
• Italian-Russian cooperation
• «SKIF Series 4» == «SKIF-AURORA Project»
• Designed by an alliance of Eurotech, PSI RAS and RSC SKIF, with support from Intel
• To be presented at ISC 09
[Logo: Program Systems Institute of RAS]
18.04.23 Slide 14
SKIF-Aurora distinctive features
• No moving parts
• Liquid cooling – power efficiency
• x86_64 processors (Intel Nehalem)
• 3-D torus interconnect
• Redundant management/monitoring subsystem
• FPGA on board (optional)
• SSD disks (optional)
• QDR Infiniband
18.04.23 Slide 15
SKIF-Aurora
• 32 nodes per chassis
• 64 CPUs in 6U
• Up to 8 chassis per rack
• Up to 512 CPUs per rack
• Up to 2048 cores
• To build 500 TFlops: 21 racks in 2009
• Scalable due to 3-D torus
• 10 kW per chassis
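
The density figures above are mutually consistent, which a short calculation can confirm. The sketch below uses only numbers from the slide and derives the rest: CPUs per node, cores per CPU, total cores in 21 racks, the per-core performance implied by the 500 TFlops target, and the power draw. It assumes that "up to 2048 cores" refers to one fully populated rack and that all 21 racks are fully populated; the variable names are illustrative.

```python
# Figures taken from the slide.
NODES_PER_CHASSIS = 32
CPUS_PER_CHASSIS = 64
CHASSIS_PER_RACK = 8        # "up to 8 chassis per rack"
CORES_PER_RACK = 2048       # assumption: the 2048 cores are per full rack
KW_PER_CHASSIS = 10
RACKS = 21
TARGET_TFLOPS = 500

# Derived quantities.
cpus_per_node = CPUS_PER_CHASSIS // NODES_PER_CHASSIS
cpus_per_rack = CPUS_PER_CHASSIS * CHASSIS_PER_RACK
cores_per_cpu = CORES_PER_RACK // cpus_per_rack
total_cores = CORES_PER_RACK * RACKS

# Per-core performance needed to reach the target with 21 full racks.
gflops_per_core_needed = TARGET_TFLOPS * 1000.0 / total_cores

# Power, assuming every chassis draws the quoted 10 kW.
kw_per_rack = KW_PER_CHASSIS * CHASSIS_PER_RACK
total_kw = kw_per_rack * RACKS

if __name__ == "__main__":
    print("CPUs per node:", cpus_per_node)
    print("CPUs per rack:", cpus_per_rack)
    print("cores per CPU:", cores_per_cpu)
    print("total cores in %d racks: %d" % (RACKS, total_cores))
    print("GFlops per core needed: %.2f" % gflops_per_core_needed)
    print("power per rack, kW:", kw_per_rack)
    print("total power, kW:", total_kw)
```

The derived 512 CPUs per rack matches the slide, and the target works out to somewhat under 12 GFlops per core.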
18.04.23 Slide 16
SKIF-AURORA: Designed by the alliance of Eurotech, PSI RAS and RSC SKIF
• PCBs, mechanics, power supply, cooling, levels 1 and 2 of the management system
• Level 3 of the management system, interconnect (3D-torus: firmware, routing, drivers, MPI-2…), FPGA as accelerator
18.04.23 Slide 17
SKIF-AURORA Management Subsystem
18.04.23 Slide 18
3-D torus interconnect implementation
[Diagram labels: System Interconnect, 3D-torus; Subsidiary Interconnect, Infiniband; FPGA, FPGA, FPGA, FPGA...; CPU, CPU, CPU, CPU; standard part; non-standard part]
• Only the QCD-specific functionality is implemented by the Italian team
• Russian teams to upgrade the network to a general-purpose interconnect (MPI 2.0), due to appear in fall 2009
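
To illustrate the topology named above, here is a minimal sketch of addressing in a 3-D torus: each node has a direct link in both directions along each of the three axes, with wraparound at the edges. This is a generic illustration and not the SKIF-Aurora firmware or routing code; the 4 x 4 x 4 grid and the function names are hypothetical.

```python
def torus_neighbors(node, dims):
    """Return the six neighbours of `node` in a 3-D torus of size `dims`.

    Coordinates wrap around, so the node at the edge of an axis is linked
    to the node at the opposite edge of the same axis.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimal number of link hops between nodes `a` and `b`.

    Along each axis a message may travel in either direction, so the
    distance is the shorter of the two ways around the ring.
    """
    return sum(min((p - q) % n, (q - p) % n) for p, q, n in zip(a, b, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical grid size, not taken from the slides
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims))
```

Because of the wraparound links, opposite corners of the grid are only three hops apart in this example, and the longest path along an axis grows with half the axis length rather than the full length.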
18.04.23 Slide 19
R&D Directions Using FPGA
• Collective MPI operations using FPGA
• FPGA to facilitate support of PGAS languages (UPC, Titanium, etc.)
• FPGA+CPU hybrid computing
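
As background for the first direction, the sketch below simulates what a collective operation of the allreduce kind computes: every rank contributes a value and every rank ends up with the sum. It uses the well-known recursive-doubling pattern in plain Python, with no MPI library and no FPGA involved; it is not the project's design, only an illustration of the communication pattern that such hardware support would target. The function name is hypothetical.

```python
def allreduce_recursive_doubling(values):
    """Simulate a sum-allreduce across len(values) ranks.

    In each round, rank r exchanges its partial sum with rank r XOR step
    and both add the received value. After log2(n) rounds every rank
    holds the global sum. The rank count must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of ranks must be a power of two")
    current = list(values)
    step = 1
    rounds = 0
    while step < n:
        # All ranks exchange simultaneously, so build the new state from
        # the old one rather than updating in place.
        current = [current[r] + current[r ^ step] for r in range(n)]
        step *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    result, rounds = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
    print(result, rounds)
```

With eight ranks the operation completes in three exchange rounds, so the latency of each exchange dominates the cost for small messages.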
18.04.23 Slide 20
Conclusions
• Is based on collaboration between international teams
• Harnesses shared expertise and results
• Aimed to develop a family of petascale-level supercomputers with innovative techniques:
   • Higher density of CPUs (flops per volume)
   • Efficient water cooling system
   • Scalable, powerful 3D-torus interconnect
   • Etc.
18.04.23 Slide 21
Datacenter visualization
18.04.23 Slide 22
Datacenter visualization
18.04.23 Slide 23
THANKS
SKIF-GRID web site
http://skif-grid.botik.ru