
Immutable Application Containers

Reproducibility of CAE computations through Immutable Application Containers

HPC Advisory Council - Swiss Workshop 2015

Agenda

• Docker Introduction

• How I got here: QNIBng, QNIBTerminal, QNIBMonitoring

• Immutable Application Containers

About Me

• >10 years iterating through SysAdmin, SysOps, SysEngineer and R&D Engineer roles

DevOps @Locafox (hyper-scale web service)

• Founder of QNIB Solutions

Holistic system management

Containerization of SysOps and workload

Consultancy / software design & development

Docker in a Nutshell

Logistics Pre-1960

• Magnitude of different goods, not well suited to being shipped together

• Different environments to handle them; transactional costs fairly high

Container Logistics

• Standard container: no mixup

• Easy transactions, no matter the environment

Container Matrix

Multiple Guests

[Diagram: traditional virtualisation vs. containerisation. With a Type I/II hypervisor, each guest brings its own kernel plus OS userland on top of the host kernel; with containers, all userlands (the native one included) share the single host kernel.]

Interface View

[Diagram: where the application meets the system. Under virtualisation the guest talks to emulated devices / hyper-calls (<<100 functions) of a native or hosted hypervisor; a container talks straight to the syscall interface (>100 functions) and the library abstraction. Containerisation is a lightweight abstraction without I/O overhead or startup latency.]

"But when it comes to cloud operations, we see the VM as the only truly safe isolation. … Until we see foolproof security for containers, we will always double-bag our customers' workloads."

– McLuckie (Senior Product Lead for Google Compute Engine, Feb 2015)

VM vs. Container

• Isolation: hypervisor vs. kernel namespaces

• Resource allocation: hypervisor vs. cgroups

• Security: hypervisor vs. SELinux/AppArmor
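A quick way to see namespaces and cgroups at work from the Docker CLI; a minimal sketch, assuming a cgroup-v1 host, with the centos:7 image and the limits purely illustrative:

$ docker run --rm centos:7 ps -ef         # PID namespace: only the container's processes
$ docker run --rm -m 512m --cpuset-cpus=0-1 centos:7 \
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes
536870912                                 # the 512 MB cgroup limit, seen from inside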

Docker Workshop

• 1/2 day, July 16th @ISC High Performance

Deep dive into the talking points

How Docker might impact system operations & HPC applications

Further discussion beyond what I am talking about today

Docker Workshop #2

• Full day, September 28th @ISC Cloud&BigData

QNIBng

IBPM

• B.Sc. report in 2011

QNIBng

• Qualified Networking w/ InfiniBand (next generation)

Log and performance monitoring targeted at the IB layer

Day 2

QNIBTerminal

[Architecture diagram: nginx as http-proxy in front; consul for service discovery / health checks; Log/Events via an ELK stack (logstash, ls-indexer, elasticsearch, kibana, es-kopf); Performance via influxdb and grafana; Compute as slurmctld plus compute0…compute<N> nodes running slurmd]

• Holistic approach to target the complete HPC stack
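Wiring up such a stack boils down to a handful of docker run calls; a sketch using generic Docker Hub images of that era, not the actual qnib images:

$ docker run -d --name consul -p 8500:8500 progrium/consul -server -bootstrap
$ docker run -d --name elasticsearch elasticsearch:1.7
$ docker run -d --name kibana --link elasticsearch:elasticsearch -p 5601:5601 kibana:4
$ docker run -d --name grafana -p 3000:3000 grafana/grafana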

QNIBMonitoring

• Containerized monitoring stack

Logstash stack (ELK)

Performance monitoring (Graphite universe)

Combining log and performance information

Service discovery / health checks

QNIBTerminal

[Diagram: SLURM events and Docker events feeding into the monitoring stack]

Conclusion

• Big head-room in connecting layers

Trace errors/events that span multiple layers

Holistic view of the complete stack

• High potential within system operations: configure once… run anywhere…

• Independent environments for

Training

Integration tests

Proofs-of-concept to explore new stack components

MPI Workloads w/ Docker

HPC Testcluster 'Venus'

• 8 nodes (CentOS 7, 2x 4-core Xeon, 32GB, Mellanox ConnectX-2)

• 3 containers on top: CentOS 6/7, Ubuntu 12

• SLURM resource scheduler: 1 native / 3 container partitions

• Multiple Open MPI versions installed: 1.5, 1.6, 1.8
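With one partition per userland, the same job script can be steered into either environment just by the partition flag; partition names here are made up for illustration:

$ sbatch -p native -N 2 osu_job.sh   # bare-metal run
$ sbatch -p cos6   -N 2 osu_job.sh   # same job, CentOS 6 container userland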

MPI µBenchmark

• osu-micro-benchmarks-4.4.1

• osu_alltoall with two tasks on two hosts

$ mpirun -np 2 -H venus001,venus002 $(pwd)/osu_alltoall
# OSU MPI All-to-All Personalized Exchange Latency Test v4.4.1
# Size       Avg Latency(us)
1            1.83
2            1.82
4            1.74
8            1.63
16           1.62
32           1.68
64           1.80
128          2.77
256          3.11
512          3.51
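For the container partitions, the host's InfiniBand devices have to be handed into the container; a sketch, with the image name hypothetical and libibverbs-utils assumed to be installed in it:

$ docker run --rm --net=host \
    --device=/dev/infiniband/uverbs0 \
    --device=/dev/infiniband/rdma_cm \
    centos7-ofed ibv_devinfo          # should list the ConnectX-2 HCA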

MPI µBenchmark [result]

[Plot: average latency (µs) over message size for native, cos7, cos6 and u12]

MPI µBenchmark [result #2]

[Bar chart: latency by Open MPI version (distribution default, 1.5.4, 1.6.4, 1.8.3) for native (oMPI 1.6.4, gcc 4.8.2), cos7 (oMPI 1.6.4, gcc 4.8.2), cos6 (oMPI 1.5.4, gcc 4.4.7) and u12 (oMPI 1.5.4, gcc 4.6.3)]

HPCG Benchmark

• Mimics thermodynamic application workloads

• Meant as a corrective to LINPACK
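For reference, an HPCG run is just an MPI launch of the xhpcg binary, with the problem size read from hpcg.dat in the working directory (node names reused from the µbenchmark example):

$ mpirun -np 8 -H venus001,venus002 ./xhpcg
# the final GFLOP/s rating is written to an output file in the working directory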

HPCG - Distribution Results

[Bar chart: GFLOP/s for native and cos7 (CentOS 7.0, oMPI 1.6.4, gcc 4.8.2), cos6 (CentOS 6.5, oMPI 1.5.4, gcc 4.4.7) and u12 (Ubuntu 12.0, oMPI 1.5.4, gcc 4.6.3)]

HPCG - Overall Results

[Bar chart: GFLOP/s by Open MPI version (distribution default, 1.5.4, 1.6.4, 1.8.4) for native (oMPI 1.6.4, gcc 4.8.2), cos7 (oMPI 1.6.4, gcc 4.8.2), cos6 (oMPI 1.5.4, gcc 4.4.7) and u12 (oMPI 1.5.4, gcc 4.6.3)]

Conclusion

• High potential within end-user applications: build once… run anywhere…

• An abstracted userland enables fine-tuning, regardless of the bare-metal system

• But: will the results be equivalent?

Immutable App. Containers

Motivation

• Bundling up applications sounds good, but what about the consistency of the results?

• The container communicates with the host's kernel; results should be fairly stable, since the syscall interface is quite mature
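This is easy to verify: whatever userland an image ships, uname reports the host's kernel. The output below is illustrative, for an Ubuntu 14.04 host running kernel 3.13.0:

$ uname -r                                 # on the host
3.13.0-24-generic
$ docker run --rm ubuntu:12.04 uname -r    # inside an Ubuntu 12.04 container
3.13.0-24-generic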

Test Bed

• Laptop (VirtualBox): MacBook Pro, Intel Core i7 @3GHz, 16GB RAM, 512GB SSD

• Workstation: AMD Phenom II X4 955 (3.2GHz), 4GB RAM, 64GB SSD

• HPC Cluster: 8x SUN FIRE X2250, each 2x Intel Xeon X5472, 32GB RAM, QDR InfiniBand

• AWS EC2 Public Cloud (XEN):

Compute instance c3.xlarge, Intel Xeon E5-2680, 7.5 GB RAM, SSD storage

Compute instance c4.4xlarge, Intel Xeon E5-2666 v3, 30 GB RAM, SSD storage

Use-case

• OpenFOAM tutorial 'cavity'

• 3 solid walls, 1 moving 'lid'

Test Case setting

• Changes made to increase the computational effort:

Mesh size increased from 20x20x1 to 1000x1000x1000

Vertex dimensions stretched by a factor of 10

Iterates 50 times with a step width of 0.1 ms

• Decomposition into 16 subdomains: decompose && mpirun -np 16 icoFoam -parallel > log 2>&1 (workflow spelled out below)

• Each iteration outputs the average cell pressure; the sum of these pressures serves as the consistency measure
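Spelled out, the parallel run follows the standard OpenFOAM pattern; a sketch, assuming the case's decomposeParDict is set to 16 subdomains:

$ blockMesh                                    # build the enlarged cavity mesh
$ decomposePar                                 # split the domain into 16 subdomains
$ mpirun -np 16 icoFoam -parallel > log 2>&1   # parallel solver run, 50 iterations
$ reconstructPar                               # merge the per-processor results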

OpenFOAM Containers

• Immutable application containers:

u1204of222: Ubuntu 12.04 & OpenFOAM 2.2.2

u1204of230: Ubuntu 12.04 & OpenFOAM 2.3.0

u1410of231: Ubuntu 14.10 & OpenFOAM 2.3.1
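A build recipe for one of these images could look roughly like this. A hedged sketch, not the actual qnib Dockerfile; repository URL, package name and install path are assumptions based on the OpenFOAM Ubuntu packaging of that era:

FROM ubuntu:12.04
# OpenFOAM 2.2.2 from the upstream apt repository (URL and package name assumed)
RUN echo "deb http://www.openfoam.org/download/ubuntu precise main" \
      > /etc/apt/sources.list.d/openfoam.list && \
    apt-get update && \
    apt-get install -y --force-yes openfoam222
# source the OpenFOAM environment in every shell (install path assumed)
RUN echo ". /opt/openfoam222/etc/bashrc" >> /etc/bash.bashrc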

Host Systems

• MacBook: boot2docker (1.4.0), kernel 3.16.7

CoreOS 618, kernel 3.19

• Workstation: CentOS 6.6, kernel 2.6.32

Ubuntu 12.04, kernels 3.13.0 & 3.17.7

Ubuntu 14.10, kernels 3.13.0 & 3.18.1

• HPC Cluster 'venus': CentOS 7.0 alpha, kernel 3.10.0

• AWS EC2: Ubuntu 14.04, kernel 3.13.0

CoreOS 494, kernel 3.17.2

CoreOS 618, kernel 3.19

Results

• The pressure remains the same across minor releases:

OpenFOAM 2.2.2: 8.6402816 p

OpenFOAM 2.3.0/2.3.1: 8.6402463 p

• Runtime varies a lot
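The consistency measure itself is a one-liner; a sketch assuming the per-iteration average pressures have been extracted into pressure.log, one value per line (file name and format are hypothetical):

$ awk '{ sum += $1 } END { printf "%.7f\n", sum }' pressure.log
8.6402816    # illustrative: the value seen with the OpenFOAM 2.2.2 containers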

Results [cont]

[Bar chart: wall-clock runtimes for MacBook, Workstation, HPC.shm, 1x/2x/4x/8x HPC.IB, c3.xlarge and c4.4xlarge, ranging roughly from 7 to 70]

• A paper describing the study can be found at http://doc.qnib.org/HPCAC2015.pdf

Future Work

• Cavity iterates towards a deterministic result

Use a Kármán vortex street instead (chaotic, sensitive to random events)

Is the behaviour of the userland within an immutable container still stable?

• Diversify use-cases

End-user applications: e.g. the EAGER pipeline (Uni Tübingen)

If you have ideas / use-cases -> talk to me…

• How about GPGPUs, MICs? Docker's abstraction should only be expected to cover the same architecture

Conclusion

• Results remain stable across multiple

Hardware versions

Operating systems

Kernel versions

Network technologies (SHM, InfiniBand)

• Keeping the images and input data could be enough

• Vendors are able to fine-tune and certify

Distribution vendors

Software vendors


Thank you for your attention…

• Picture Credits

p5/6: http://de.slideshare.net/dotCloud/why-docker2bisv4

p7: http://blog.docker.com

p25: http://www.umbc.edu/hpcreu/2014/projects/team4.html https://software.sandia.gov/hpcg/

p32: https://www.youtube.com/watch?v=0KLajv6kS6Q

p38: https://www.youtube.com/watch?v=hZm7lc4sC2o

p40: https://www.flickr.com/photos/dharmabum1964/3108162671
