QNIBTerminal Plus InfiniBand - Containerized MPI Workloads
DESCRIPTION
In this deck, Christian Kniep presents: QNIBTerminal Plus InfiniBand - Containerized MPI Workloads. Watch the video presentation: http://wp.me/p3RLHQ-dvM

TRANSCRIPT
QNIBTerminal plus InfiniBand
Containerized MPI Workloads
2014-11-05
Christian Kniep
insideHPC Edition: slides slightly modified compared to the HPC Advisory Council version
Agenda
2
• Docker in a Nutshell
• QNIBTerminal
• Testbed
• MPI Benchmark
• HPCG Results
• Future Work
• Conclusion
Docker in a Nutshell
3-6
• (chroot on steroids)²
• Builds on top of LinuX Containers (LXC)
  • Kernel namespaces (isolation)
  • cgroups (resource mgmt)
• Red Hat backing
• Public repositories
• Intuitive build system
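The build system mentioned above can be sketched as a minimal Dockerfile; the base image, package names, and tag are assumptions for illustration, not taken from the deck:

```dockerfile
# Hypothetical sketch of Docker's build system: each instruction
# creates a cached image layer on top of the previous one.
FROM centos:7

# Install a toolchain for MPI builds (package names are illustrative).
RUN yum install -y gcc make openmpi openmpi-devel

# Bake the benchmark sources into the image.
COPY osu-micro-benchmarks-4.4.1 /opt/osu

CMD ["/bin/bash"]
```

Running `docker build -t qnib/osu-sketch .` against such a file would then produce a reusable, shareable image.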
Traditional vs. Lightweight Layers
7
[Diagram: in traditional virtualisation, a hypervisor on the host kernel runs one guest kernel plus OS userland per service; in containerisation, services share the host kernel and each ships only its own OS userland. InfiniBand (IB) hardware sits below both stacks.]
QNIBTerminal Motivation
8-10
• Plain Metrics
• Plain Log Events
• Overlap Metrics/Log Events
QNIBTerminal Overview
11
[Stack diagram of linked containers]
• Services: haproxy, dns (helixdns), etcd
• Log/Events (ELK): elasticsearch, logstash, kibana
• Performance: carbon, graphite-web, graphite-api, grafana
• Compute: slurmctld, compute0 … compute<N> (each running slurmd)
Testbed
12
One-node setup:
• All network traffic over the bridge
• Crippled MPI workload
• Multiple Open MPI versions installed
• Multiple gcc versions
• 3 containers on top (CentOS 6, CentOS 7, Ubuntu 12)
• SLURM resource scheduler
  • 1 native partition
  • 3 container partitions
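The one native plus three container partitions could be expressed in a slurm.conf fragment like the one below; the node names, CPU counts, and partition names are assumptions for illustration, not from the deck:

```
# Hypothetical slurm.conf fragment: one native partition plus one
# partition per container flavour (all names are illustrative).
NodeName=venus[001-008] CPUs=8 RealMemory=32000 State=UNKNOWN
PartitionName=native Nodes=venus[001-008] Default=YES State=UP
PartitionName=cos7   Nodes=venus[001-008] State=UP
PartitionName=cos6   Nodes=venus[001-008] State=UP
PartitionName=u12    Nodes=venus[001-008] State=UP
```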
MPI Benchmark
13
• 8 nodes (CentOS 7, 2x 4-core XEON, 32 GB, Mellanox ConnectX-2)
• osu-micro-benchmarks-4.4.1
• osu_alltoall with two tasks on two hosts

(The MPI benchmark was not in the original HPC Advisory Council presentation.)

$ mpirun -np 2 -H venus001,venus002 $(pwd)/osu_alltoall
# OSU MPI All-to-All Personalized Exchange Latency Test v4.4.1
# Size       Avg Latency(us)
1                       1.83
2                       1.82
4                       1.74
8                       1.63
16                      1.62
32                      1.68
64                      1.80
128                     2.77
256                     3.11
512                     3.51
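To compare native and containerized runs, the same benchmark would be submitted once per SLURM partition. A minimal sketch, assuming the partition names from the testbed slide; the script only prints the commands it would submit, so it runs without SLURM installed:

```shell
#!/bin/sh
# Sketch: build the submission command for each partition
# (native plus the three container flavours). Flags are illustrative.
for part in native cos7 cos6 u12; do
    echo "srun -p ${part} -N 2 -n 2 ./osu_alltoall"
done
```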
MPI Benchmark: distribution results [2 tasks @ 2 nodes]
14
[Bar chart: latency [us] (0 to 5) vs. message size (4 to 1024) for native, cos7, cos6, u12.]
(The MPI benchmark was not in the original HPC Advisory Council presentation.)
MPI Benchmark: Open MPI comparison [2 tasks @ 2 nodes, avg(1B->64B)]
15
[Bar chart: latency [us] (0 to 2.8) for native, cos7, cos6, u12 across the distribution default and Open MPI 1.5.4, 1.6.4, 1.8.3; distribution defaults are oMPI 1.6.4 for native and cos7, oMPI 1.5.4 for cos6 and u12.]
HPCG Benchmark
16
• Mimics a thermodynamic application workload
• Linpack corrective / successor in the long term?
HPCG Benchmark: distribution results
17-18
[Bar chart: GFLOP/s (3 to 6) for native, cos7, cos6, u12; software stacks: CentOS 7.0 / oMPI 1.6.4 / gcc 4.8.2, CentOS 6.5 / oMPI 1.5.4 / gcc 4.4.7, Ubuntu 12.04 / oMPI 1.5.4 / gcc 4.6.3.]
HPCG Benchmark: Open MPI comparison
19-21
[Bar chart: GFLOP/s (3 to 6) for native, cos7, cos6, u12 across the distribution default and Open MPI 1.5.4, 1.6.4, 1.8.4; distribution defaults: native and cos7 with oMPI 1.6.4 / gcc 4.8.2, cos6 with oMPI 1.5.4 / gcc 4.4.7, u12 with oMPI 1.5.4 / gcc 4.6.3.]
Future Work
22
• Security evaluations
• Compare different orchestration frameworks
• Use of SR-IOV (keynote earlier today)
• Compare with tuned bare metal
• Tune the Docker installation
Conclusion
23
• Bare-metal kernel provides access to IB
• Container is in charge from MPI upwards
• Out of the box: container beats bare metal
• Plenty of tooling within the Docker ecosystem
• Bare-metal / application abstraction works fine
• Low performance overhead
• Benchmark real-world applications
• Continuous testing/deployment of containerized workloads
La Fin
24-27
• Interested?
  • Docker pitch today
  • Internal evaluations
  • Workshops / talks
• Questions?
• Paper: http://doc.qnib.org/
• Contact: @CQnib / @qnibinc / @_qnib • [email protected] • http://qnib.org
Photo: https://www.flickr.com/photos/dharmabum1964/3108162671