“Future Technologies” (WP8) Prototypes
Iris Christadler, Dr. Herbert Huber
Leibniz Supercomputing Centre, Germany
-
Prototype Overview (1/2)
• CEA, “GPU/CAPS”: 1U Tesla Server T1070 (CUDA, CAPS, DDT), Intel Harpertown nodes
  – Take advantage of accelerators more easily. Compare HMPP with other approaches to programming accelerators.
• CINECA: I/O subsystem (SSD, Lustre, pNFS)
  – Assess the applicability of new file system and storage technologies.
• CINES-LRZ, “LRB/CS”: Hybrid SGI ICE/UV/Nehalem-EP & Nehalem-EX/ClearSpeed/Larrabee
  – Evaluate a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system.
• CSCS, “UPC/CAF”: Prototype PGAS language compilers (CAF + UPC for Cray XT systems)
  – Understand the usability and programmability of PGAS languages.
• EPCC, “FPGA”: Maxwell FPGA prototype (VHDL support & consultancy + software licenses, e.g. Mitrion-C)
  – Assess the potential of high-level languages for using FPGAs in HPC. Compare energy efficiency with other solutions.
SC09, “Future Technologies” (WP8) Prototypes, Outlook Deliverable D8.3.2 2
-
Prototype Overview (2/2)
• FZJ, “Cell & FPGA interconnect”: eQPACE (PowerXCell8i cluster with special network processor)
  – Gain deep expertise in communication network issues. Extend the application domain of the QPACE system.
• LRZ, “RapidMind”: RapidMind Multi-Core Development Platform (automatic code generation for x86, GPUs and Cell)
  – Assess the potential of data stream languages. Compare RapidMind with other approaches for programming accelerators or multi-core systems.
• NCF, “ClearSpeed”: ClearSpeed CATS 700 units
  – Evaluate ClearSpeed accelerator hardware for large-scale applications.
• SNIC-KTH: Air-cooled blade system from Supermicro with AMD Istanbul processors & QDR IB (subject to EC approval)
  – Evaluate and optimize energy efficiency and packing density of commodity hardware.
Experiences with the prototypes will be reported in Deliverable D8.3.2 [http://www.prace-project.eu/documents/public-deliverables-1/]
-
A SELECTION OF RESULTS
The teaser
-
Rinf
[Figure: Rinf results]
-
Euroben results - accelerator languages
[Figure: Accelerator Languages (absolute performance); Mflop/s on a logarithmic axis for peak perf, mod2am, mod2as and mod2f; series: MKL (8 Nehalem cores), CUDA (1 C1060), CellSs (1 PowerXCell8i), Cn (1 CSX700); bar annotations v. peak: 94%, 81%, 79%, 78%]
[Figure: Accelerator Languages (% of peak performance); % of peak performance on a logarithmic axis for mod2am, mod2as and mod2f; series: MKL, CUDA, CellSs, Cn]
mod2f/MKL: single-threaded only
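The “% of peak” panel relates a measured rate to the theoretical peak of the hardware. A minimal sketch of that ratio, assuming both values are given in the same unit; the function name and the numbers are illustrative and are not taken from the slides:

```python
def percent_of_peak(measured_mflops: float, peak_mflops: float) -> float:
    """Return measured performance as a percentage of theoretical peak."""
    if peak_mflops <= 0:
        raise ValueError("peak performance must be positive")
    return 100.0 * measured_mflops / peak_mflops


# Illustrative values only (not measurements from the prototypes).
example = percent_of_peak(measured_mflops=40_000.0, peak_mflops=50_000.0)
print(f"{example:.1f}% of peak")
```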
-
Euroben results - GPGPU languages
[Figure: Performance Comparison (dense matrix-matrix mul.) on Nvidia C1060; Gflop/s (0 to 100) versus matrix size (m); series: CUDA, CAPS, CUDA+MPI 4x4, RapidMind, OpenCL, MKL (8 cores Nehalem)]
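The Gflop/s values for dense matrix-matrix multiplication follow from the operation count and the run time. A minimal sketch, assuming square m x m matrices and the conventional count of 2*m^3 floating-point operations; the naive triple loop and all names are illustrative and are not the Euroben mod2am code:

```python
import time


def matmul(a, b):
    """Naive dense product of two square matrices given as lists of rows."""
    m = len(a)
    c = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for k in range(m):
            aik = a[i][k]
            for j in range(m):
                c[i][j] += aik * b[k][j]
    return c


def matmul_gflops(m: int, seconds: float) -> float:
    """Rate in Gflop/s, assuming 2*m**3 operations for an m x m product."""
    return 2.0 * m**3 / seconds / 1e9


m = 64
a = [[1.0] * m for _ in range(m)]
b = [[2.0] * m for _ in range(m)]
start = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - start
print(f"m={m}: {matmul_gflops(m, elapsed):.4f} Gflop/s")
```

A pure-Python loop is far slower than MKL or a GPU kernel; the sketch only shows how the plotted quantity is derived from matrix size and time.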
-
Euroben results - productivity
[Figure: Development Time versus Performance (dense matrix-matrix mul.); left axis: Development Time in Days (0 to 20); right axis: Performance in Mflop/s (logarithmic, 1 to 100000); series: total time, first version, time for benchmarking, Performance]
* OpenCL and CUDA+MPI port based on existing CUDA port
** RapidMind developer included
-
First IO-Results
[Figure: first I/O results]
-
PROTOTYPES
A glimpse of what you will find in Deliverable D8.3.2
-
eQPACE
Extend the communication capabilities of eQPACE to make it suitable for a wider range of applications. Reach a top position in the Green500 list (FZJ).
• Hardware: PowerXCell8i processor nodes with custom 3D-torus interconnect.
• Benchmarks: HPL, Euroben kernels, torus network benchmark, applications & iterative solvers.
• Programming environments: Cell SDK & CellSs
-
RapidMind
Evaluation of the RapidMind programming model (LRZ).
• Hardware:
  – CPUs (Nehalem EP, AMD Opteron)
  – GPUs (Nvidia Tesla and Quadro FX)
  – Cell (QS22-blade cluster)
• Software: RapidMind makes it possible to write code that runs on x86 cores as well as on accelerators such as GPUs and Cell.
  – Evaluate ease-of-use & portability
  – Assess RapidMind performance on different architectures
  – Compare RapidMind with other accelerator languages
[Figure: RapidMind mod2am; Gflop/s versus matrix size (m); series: x86-dp (8 cores Nehalem), cuda-dp (C1060), glsl-sp (FX 5800)]
-
LRZ-CINES
Evaluation of a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system (CINES, LRZ).
• Hardware:
  – SGI ICE (Nehalem EP)
  – SGI UV (Nehalem EX)
  – ClearSpeed CSX700
• Benchmarks:
  – Euroben kernels
  – Synthetic BMs: HPL, Rinf, Intel MPI Benchmark, Apex-MAP
  – Application BMs: Gadget, RAxML, SPECFEM3D_GLOBE
-
Hybrid technology demonstrator
Evaluating GPGPU with CAPS HMPP (CEA).
• Hardware: Tesla servers connected to Bull servers via PCI-E.
• Software: CAPS HMPP makes it possible to exploit the potential of GPGPUs by simply adding preprocessor directives to legacy Fortran and C codes.
[Figure: CAPS hmpp mod2am; Gflop/s versus matrix size (m)]
[Figure: CUDA mod2am; Gflop/s versus matrix size (m)]
-
Maxwell FPGA
Evaluate the performance and usability of the HARWEST Compiling Environment (EPCC).
• Hardware: FPGA prototype “Maxwell” (32 FPGAs) from both Alpha Data Ltd and Nallatech Ltd using Virtex-4 FPGAs supplied by Xilinx Corp.
• Benchmarks: 4 Euroben kernels
• Languages:
  – VHDL
  – HCE
-
PGAS languages
Evaluate ease of use of the PGAS programming model (CSCS).
• Hardware: Cray XT5
• Compiler: Cray Compiler Environment (CCE)
• Evaluation of the compiler:
  – Functional correctness
  – Conformance with language standards
  – Usability for existing CAF and UPC benchmarks/applications
• Benchmarks from Rice University, George Washington University and the Lawrence Berkeley National Laboratory
-
ClearSpeed/PetaPath
Evaluate ClearSpeed-Petapath system (NCF).
• Hardware: 114 ClearSpeed CSX700 cards
• Language: Cn
• Benchmarks:
  – 4 Euroben kernels
  – 4 Applications
    • Astronomy
    • Geophysics
    • Numerical mathematics
    • Medical tomography
-
XC4-IO
• Compare the performance of storage infrastructure access using different hardware configurations and file system architectures (CINECA).
-
SNIC-KTH
Evaluate energy efficiency of high density commodity parts (SNIC-KTH).
• Hardware: AMD Istanbul
• Benchmarks: Euroben, STREAM, IMB, Gromacs, CFD
• Measure power consumption per component
• Adjust fan speed and fan power
• Assess energy management features of AMD Istanbul (control of voltage and frequency of components)
[Figure: SNIC-KTH Preliminary Results (Gromacs)]
-
RESEARCH ACTIVITIES
Results will be reported in Deliverable D8.3.2.
-
Parallel GPU
Evaluation of GPGPU programming languages (CSC).
• Languages:
  – CUDA+MPI
  – OpenCL
• Benchmarks:
  – GPU-HMMER
  – Euroben Kernels
• Hardware:
  – Tesla
  – AMD Firestream
  – CEA WP8 Prototype
-
Advanced PGAS Programming
Evaluate usability of PGAS
upc_barrier;
upc_forall (sc=0; sc
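For readers unfamiliar with UPC: upc_forall takes a fourth, affinity expression, and with an integer affinity each iteration runs on the thread whose index equals that value modulo THREADS. The Python sketch below only emulates this rule sequentially to show which thread would own which iteration. It is not a reconstruction of the truncated code on this slide, and owner and iterations_for are hypothetical helper names:

```python
def owner(affinity: int, threads: int) -> int:
    """Thread that executes an iteration with the given integer affinity."""
    return affinity % threads


def iterations_for(mythread: int, threads: int, n: int) -> list[int]:
    """Iterations of `upc_forall (i = 0; i < n; i++; i)` run by one thread."""
    return [i for i in range(n) if owner(i, threads) == mythread]


THREADS = 4  # assumed thread count, for illustration only
for t in range(THREADS):
    print(t, iterations_for(t, THREADS, 10))
```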
-
Research on power efficiency
Evaluate power consumption of components (STFC, PSNC).
• Hardware: ClearSpeed, Tesla, Firestream, Cell, Power6.
• Different workloads: stand-by, neutral, real life, artificial stress.
• Assess CPU, memories, accelerators, HDDs, cooling fans, backplane, power supply.
• Power measurements with: clamp meters, PDUs with built-in ammeters, values from system management software
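Power readings taken at intervals can be turned into an energy value and an efficiency figure. A minimal sketch, assuming timestamped power samples in watts (for example read from a PDU) and trapezoidal integration; the sample values, the function names and the flop count are illustrative, not measurements from the prototypes:

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def mflops_per_watt(flop_count: float, samples) -> float:
    """Average Mflop/s divided by average power over the sampled interval."""
    duration = samples[-1][0] - samples[0][0]
    avg_power = energy_joules(samples) / duration
    return flop_count / duration / 1e6 / avg_power


# Illustrative readings: (seconds since start, watts).
readings = [(0.0, 200.0), (10.0, 300.0), (20.0, 300.0), (30.0, 200.0)]
print(energy_joules(readings), "J")
print(mflops_per_watt(3.0e12, readings), "Mflop/s per W")
```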
-
Contact information:
Dr. Herbert Huber (WP8 Leader), [email protected]
Iris Christadler (WP8 Co-Leader), [email protected]
Leibniz Supercomputing Centre, Germany
THANK YOU FOR YOUR ATTENTION!
COMMENTS? QUESTIONS?