23.06.22 SHORT OVERVIEW OF CURRENT STATUS A. A. Moskovsky Program Systems Institute, Russian Academy of Sciences IKI - MSR Research Workshop Moscow, 10-12 June, 2009 “SKIF-GRID” SUPERCOMPUTING PROJECT OF THE UNION STATE OF RUSSIA AND BELARUS


SHORT OVERVIEW OF CURRENT STATUS

A. A. Moskovsky
Program Systems Institute, Russian Academy of Sciences

IKI - MSR Research Workshop
Moscow, 10-12 June, 2009

“SKIF-GRID” SUPERCOMPUTING PROJECT OF THE UNION STATE OF RUSSIA AND BELARUS


Pereslavl-Zalessky

• Russian Golden Ring
• City: 857 years old
• Hometown of Great Dukes of Russia
• The first building site of Peter the Great's navy
• Ancient capital of the Russian Orthodox church

Moscow to Pereslavl-Zalessky: 120 km


“SKIF-GRID” PROJECT TIMELINE

1. 2000-2004 - SKIF project, SKIF K-1000 is #98 in Top500

2. June 2004 – first proposal filed for “SKIF-GRID” project

3. March 2007 – approved by Government

4. March 2008 - SKIF-MSU supercomputer deployed (#36 in June 08 Top 500)

5. May 2008 - “SKIF-Testbed” federation created.

6. March 2009 – alliance agreement signed for SKIF series 4 development


PROJECT ORGANIZATION: 2007-2008

Project directions

1. Grid technology

2. Supercomputers

• SW

• HW

3. Security

4. Pilot projects – applications of HPC and grid technology


«SKIF MSU»


SKIF MSU

• Theoretical peak performance: 60 TFlops
• Linpack: 47 TFlops
• Advanced clustering solutions: diskless computational nodes
• Original blade design

Parameter           Value
CPU architecture    x86-64
CPU model           Intel Xeon E5472, 3.0 GHz (4 cores)
Nodes (dual CPU)    625
CPU cores total     5 000
Interconnect        Infiniband DDR, Fat Tree
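As a rough cross-check of the figures on this slide, the peak number can be reproduced from the node count and the clock rate. The sketch below assumes 4 double-precision floating-point operations per core per cycle, a value the slide does not state; the function name is illustrative only.

```python
def peak_tflops(nodes, cpus_per_node, cores_per_cpu, ghz, flops_per_cycle):
    """Theoretical peak in TFlops = total cores * clock (GHz) * flops per cycle."""
    cores = nodes * cpus_per_node * cores_per_cpu
    return cores * ghz * flops_per_cycle / 1000.0  # GFlops -> TFlops


# Figures from the slide: 625 dual-CPU nodes, 4-core Xeon E5472 at 3.0 GHz.
# ASSUMPTION (not on the slide): 4 double-precision flops per core per cycle.
skif_msu_peak = peak_tflops(625, 2, 4, 3.0, 4)

# Ratio of the measured Linpack result (47 TFlops) to the theoretical peak.
linpack_efficiency = 47.0 / skif_msu_peak

print(f"peak = {skif_msu_peak:.1f} TFlops")
print(f"Linpack efficiency = {linpack_efficiency:.1%}")
```

Under that assumption the computed peak matches the 60 TFlops quoted above, and the Linpack result corresponds to roughly 78% of peak.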


«SKIF-Testbed» a/k/a “SKIF-Polygon”

Federation of HPC centers, ~100 TFlops

4 computers in the current Top 500:
• MSU (#35 in Top500)
• South Urals State University
• Tomsk State University
• Ufa State Technical University


Middleware platform – UNICORE 6.1

• X.509 for security
• Certificate Authority at Pereslavl-Zalessky (PyCA)
• Site platform: UNICORE 6.1, Java 1.5, Linux, Torque
• Experimental sites: UNICORE is complemented with additional services/modules


Applications (2007-2008)

HPC applications:
• Drug design (MSU Belozersky Institute, SRCC, Chelyabinsk SU)
• Inverse problems in soil remote sensing (SRCC)
• Computational chemistry (MSU Chemistry department)

Geophysical data services
Mammography database prototype (N.N. Semenov Chemical Physics Institute, RAS)
Text mining (PSI RAS)
Engineering (South Ural University …)
Space Research Institute...
…


SKIF-Aurora

2009-2010: second phase of SKIF-GRID project


SKIF Series 4: original R&D goals

• Highest density of performance (largest possible number of CPUs per 1U)
• Lower latency
• Fewer cables and connectors, hence better reliability
• Increased heat output per 1U
   • We need new cooling technology… How to?
• Improved interconnect: we need better scalability, bandwidth and latency than the best available solutions provide (e.g. Infiniband QDR)
• New approach to monitoring and management of the supercomputer
• Combining standard CPUs and accelerators in computational nodes of the supercomputer


Spring’2008: SKIF Series 4 - How To?


Summer’2008: SKIF Series 4 - Know How!

Italian-Russian Cooperation

«SKIF Series 4» == «SKIF-AURORA Project»

• Designed by an alliance of Eurotech, PSI RAS and RSC SKIF, with support from Intel
• To be presented at ISC 09

Program Systems Institute of RAS


SKIF-Aurora distinctive features

• No moving parts
• Liquid cooling – power efficiency
• x86_64 processors (Intel Nehalem)
• 3-D torus interconnect
• Redundant management/monitoring subsystem
• FPGA on board (optional)
• SSD disks (optional)
• QDR Infiniband


SKIF-Aurora

• 32 nodes per chassis, 64 CPUs in 6U
• Up to 8 chassis per rack
   • Up to 512 CPUs per rack
   • Up to 2048 cores
• To build 500 TFlops: 21 racks in 2009
• Scalable due to 3-D torus
• 10 kW per chassis
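The density figures on this slide follow from simple multiplication, and the 21-rack estimate for 500 TFlops can be checked the same way. The sketch below is only a back-of-the-envelope calculation: the clock rate (2.93 GHz) and the 4 flops per core per cycle are assumptions that do not appear on the slide, and all constant names are illustrative.

```python
import math

# Figures taken from the slide.
NODES_PER_CHASSIS = 32
CPUS_PER_NODE = 2        # 64 CPUs per chassis / 32 nodes
CHASSIS_PER_RACK = 8
KW_PER_CHASSIS = 10
CORES_PER_CPU = 4        # implied by 2048 cores / 512 CPUs per rack

# ASSUMPTIONS (not on the slide): CPU clock and flops per core per cycle.
ASSUMED_GHZ = 2.93
ASSUMED_FLOPS_PER_CYCLE = 4

cpus_per_rack = NODES_PER_CHASSIS * CPUS_PER_NODE * CHASSIS_PER_RACK
cores_per_rack = cpus_per_rack * CORES_PER_CPU

# Peak per rack in TFlops (cores * GHz * flops per cycle gives GFlops).
rack_tflops = cores_per_rack * ASSUMED_GHZ * ASSUMED_FLOPS_PER_CYCLE / 1000.0

# Whole racks needed to reach a 500 TFlops peak.
racks_for_500 = math.ceil(500.0 / rack_tflops)

# Power of a fully populated rack.
rack_kw = CHASSIS_PER_RACK * KW_PER_CHASSIS

print(cpus_per_rack, cores_per_rack, racks_for_500, rack_kw)
```

With these assumed values the calculation reproduces the 512 CPUs and 2048 cores per rack and arrives at 21 racks for 500 TFlops; a different clock rate would shift the rack count.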


SKIF-AURORA: Designed by the alliance of Eurotech, PSI RAS and RSC SKIF

• PCBs, mechanics, power supply, cooling, levels 1 and 2 of the management system
• Level 3 of the management system, interconnect (3D-torus: firmware, routing, drivers, MPI-2…), FPGA as accelerator


SKIF-AURORA Management Subsystem


3-D torus interconnect implementation

[Diagram labels: System Interconnect, 3D-torus; Subsidiary Interconnect, Infiniband; FPGA, FPGA, FPGA, FPGA, ...; CPU, CPU, CPU, CPU; standard part; non-standard part]

• Only the QCD-specific part is implemented by the Italian team
• Russian teams are to upgrade the network to a general-purpose interconnect (MPI 2.0), due to appear in fall 2009
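To illustrate why a 3-D torus scales, here is a minimal sketch of its addressing: every node has a fixed number of direct neighbours regardless of machine size, and wrap-around links shorten the longest routes. The 8 x 8 x 8 dimensions and the function names are hypothetical and are not taken from the slides or from the SKIF-Aurora firmware.

```python
def torus_neighbors(node, dims):
    """Return the direct neighbours of `node` in a 3-D torus.

    Links wrap around in every dimension, so a node on the "edge"
    still has two neighbours per axis (six in total when every
    dimension has size 3 or more).
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a, b, dims):
    """Minimal hop count between two nodes, going either way around each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


DIMS = (8, 8, 8)  # hypothetical machine size, not from the slides

print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (7, 4, 1), DIMS))
```

In this toy layout, node (7, 4, 1) is 6 hops from the origin rather than 12, because the first coordinate is reached through the wrap-around link.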


R&D Directions Using FPGA

• Collective MPI operations using FPGA
• FPGA to facilitate support of PGAS languages (UPC, Titanium, etc.)
• FPGA+CPU hybrid computing
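The slide names collective operations as a target for FPGA support without describing the design. To make concrete what such a collective computes, here is a software-only simulation of one classic allreduce schedule (recursive doubling). It illustrates the operation itself, not the project's FPGA implementation, and the function name is hypothetical.

```python
def allreduce_recursive_doubling(values):
    """Simulate a sum-allreduce over len(values) ranks.

    In each round, rank r exchanges its partial sum with rank r XOR step
    and adds the partner's value. After log2(n) rounds every rank holds
    the global sum. The number of ranks must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("number of ranks must be a power of two")

    data = list(values)
    rounds = 0
    step = 1
    while step < n:
        # All ranks exchange simultaneously, so build the new state
        # from the old one rather than updating in place.
        data = [data[rank] + data[rank ^ step] for rank in range(n)]
        step *= 2
        rounds += 1
    return data, rounds


result, rounds = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
print(result, rounds)
```

For 8 ranks the simulation finishes in 3 exchange rounds with every rank holding the total, 36. The number of rounds grows only logarithmically with the rank count, which is why the per-round latency of such operations matters.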


Conclusions

• Is based on collaboration between international teams
• Harnesses shared expertise and results
• Aims to develop a family of petascale-level supercomputers with innovative techniques:
   • Higher density of CPUs (flops per volume)
   • Efficient water cooling system
   • Scalable, powerful 3D-torus interconnect
   • Etc.


Datacenter visualization



THANKS

SKIF-GRID web site

http://skif-grid.botik.ru