adac6 · why hpc containers april 2017 cray inc. confidential and proprietary mobility of computing...

24
ADAC6 Appropriate Use of Containers in HPC

Upload: others

Post on 13-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

ADAC6Appropriate Use of Containers in HPC

Page 2: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Agenda

April 2017 Cray Inc. Confidential and Proprietary

● Why HPC containers● Investigate different container runtimes for HPC ● Integration with batch systems● Best practices for HPC● HPC Application performance

2

Page 3: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Why HPC containers

April 2017 Cray Inc. Confidential and Proprietary

● Mobility of computing to both users and HPC centers● Means to capture and distribute software and compute environments● Entire workflow can be contained for others to replicate no matter what

version of Linux they are running (Kernel ABI compatibility – syscall’s)● Reproducibility of results – may not mean same performance

● MPI-ABI, libraries, host drivers● Standard and simplified packaging

● Independent Software Vendors (ISV) codes● Delivered benchmarks● CRI standard and/or propriety formats

● Legacy workload packaging and execution

3

Page 4: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Container Runtime Environments

April 2017 Cray Inc. Confidential and Proprietary

● Enterprise, and HPC container environments● opencontainers/runc

● 236 contributors, 17 release. 3,613 commits● Standard image format (Open Container Initiative – OCI)

● rtk/rtk● 197 contributors, 65 releases , 5,539 commits● Standard image format (appc Specification – Application Container Image, ACI)

● NERSC Shifter (github)● 11 contributors, 5 releases, 1,712 commits● Propriety image format

● LBL Singularity (syslabs.io)● 63 contributors, 19 releases, 4,503 commits● Propriety image format

4

As of 06/18/18

Page 5: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Integration with batch systems

April 2017 Cray Inc. Confidential and Proprietary

● Examples of Moab/Torque and Cray ALPS● Application based utilities● Limited or no cgroup and kernel namespace support

① $ aprun -n N -b runc --bundle /tmp/cle.img run $(date +%Y%m%d%H%M)② $ aprun -n N… -b rkt run \

--stage1-name=coreos.com/rkt/stage1-fly:1.21.0 \registry-1.docker.io/library/cle:latest --net=host --exec=/bin/hostname

③ $ aprun -n N… -b shifter --image cle:latest /bin/hostname④ $ aprun -n N… -b singularity exec /global/cle.img /bin/hostname

5

Page 6: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Container Runtime Specifics

April 2017 Cray Inc. Confidential and Proprietary

● Container runtimes configuration and semantics● Unique image management● Runtime arguments are specific

6

Page 7: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Configuration & Runtime options

April 2017 Cray Inc. Confidential and Proprietary

● runc● Embedded in the OCI image definition: config.json

● rkt● /etc/rkt, /usr/lib/rkt, user-defined

● Repository authentication, data and image locations● Command line can override system configurations

● Shifter● System configuration (/etc/shifter)

● Authentication, data and image locations● Singularity

● System configuration $SYSCONFDIR/singularity/singularity.conf● Authentication, data and image locations

7

Page 8: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

runc

April 2017 Cray Inc. Confidential and Proprietary

● Create a root file system, created via Docker container$ docker export alpine | tar xvfC - ./rootfs

● Create container configuration file config.json$ runc spec$ vi config.json # modify the args, see OCI specification

● Run the container$ runc --log /dev/stdout run $(date +%Y%m%d%H%M)

8

OCI specification: https://github.com/opencontainers/runtime-spec/blob/master/runtime.md

Page 9: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

rkt

April 2017 Cray Inc. Confidential and Proprietary9

● Pull an image locallydocker2aci$ docker2aci docker://jsparkscraycom/cle:6.0.2

● Run the container$ qsub –Isalloc: Granted job allocation 88$ rkt run --no-overlay=true --net=host \

--stage1-name=coreos.com/rkt/stage1-fly:1.22.0 \registry-1.docker.io/jsparkscraycom/cle:6.0.2 \${options} –inherit-env=true \--environment=LD_LIBRARY_PATH=$LD_LIBRARY_PATH \--exec=/lus/${USER}/a.out -- myargs

Page 10: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Shifter

April 2017 Cray Inc. Confidential and Proprietary

● Pull an image$ shifterimg pull jsparkscraycom/test:latest

● Run the container$ shifter --image jsparkscraycom/test:latest \/usr/bin/date

10

Page 11: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Singularity

April 2017 Cray Inc. Confidential and Proprietary

● Pull an image (if needed)$ singularity pull docker://jsparkscraycom/test:latest

● Run the container$ singularity exec \

docker://jsparkscraycom/test:latest \/usr/bin/date

11

Page 12: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Container Best Practices

April 2017 Cray Inc. Confidential and Proprietary

● Portability● Run anywhere vs. exploit host acceleration● Isolated contained environment, aka encapsulation – full vs. partial

● Image standardization compliance● OCI, ACI and custom (singularity)

● HPC Optimizations● Override container defaults via mounts and environment variables

● Libraries and configurations● Framework independence

● No vendor/runtime lock in● Stripped frameworks

● opencontainers/runc, Charliecloud, rtk

12

Page 13: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Containerization Models

April 2017 Cray Inc. Confidential and Proprietary13

● HPC case● HPC runtimes

(Shifter/Singularity/...)● Partial encapsulation –

limited namespaces● Access to host resources –

networks, storage● blurs the line between

container and host such that local directories can exist within the container.

● Multiple applications● Batch system integration● Image format for scale (pfs)

● Enterprise case● Standard runtimes

(docker/rkt)● Fully encapsulated

● Everything the application needs is in the container

● Single application● Orchestration software● Standard image formats

Page 14: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

HPC Partial Encapsulation

April 2017 Cray Inc. Confidential and Proprietary14

Container

binaries, libraries, configurations

PFS

ISV app, libs, configuration

ISV app, libs,configuration,

directories

exec() / mount()

filesystem mount

ssh/sshd

/home, /var/lib/hugetlbfs, /opt/nvidia, /lustre

Page 15: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

HPC Application performance

April 2017 Cray Inc. Confidential and Proprietary

● Launch times● Time to setup and launch via container runtime

● Application performance● Hugepage optimization● Environment pass-through

15

An Updated Performance Comparison of Virtual Machinesand Linux ContainersRef: http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf

Page 16: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Launch Times

April 2017 Cray Inc. Confidential and Proprietary16

0. 00

2. 00

4. 00

6. 00

8. 00

10 .0 0

12 .0 0

14 .0 0

16 .0 0

18 .0 0

20 .0 0

2 4 8 16 32 64

Sec

on

ds

N um ber of nodes

Container Execution OverheadExecution time of /bin/true

ba re O S

sh if te r

r kt

r unC w / Lu str e

si ngu la ri t y

r unC w / t m pf s

Page 17: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

OSU Benchmarks

April 2017 Cray Inc. Confidential and Proprietary17

0. 00

10 0. 00

20 0. 00

30 0. 00

40 0. 00

50 0. 00

60 0. 00

0 2 832 12

851

220

4881

9232

768

1310

72

5242

88

2097

152

Lat

ency

(u

s)

S ize

OSU One Sided MPI_GET latency Test v3.8

r kt

S hif t er

C LE

S ing ul ar it y

0

10 00

20 00

30 00

40 00

50 00

60 00

70 00

80 00

90 00

1 416 64 25

610

2440

9616

384

6553

626

214

410

485

7641

943

04

Ban

dw

idth

(M

B/s

)

S ize

OSU One Sided MPI_GET Bandwidth Test v3.8

r kt

S hif t er

C LE

S ing ul ar it y

Page 18: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Hugepage / DMAPP

April 2017 Cray Inc. Confidential and Proprietary18

0

10 00

20 00

30 00

40 00

50 00

60 00

70 00

80 00

1 416 64 25

610

2440

9616

384

6553

626

214

410

485

7641

943

04

Ban

dw

idth

(M

B/s

)

S ize

OSU MPI One Slided MPI_Get Bandwidth

S hif t er w/ ohu gep age s

S hif t er whu gep age s

010 0020 0030 0040 0050 0060 0070 0080 0090 00

10 000

1 416 64 25

610

2440

9616

384

6553

626

214

410

485

7641

943

04

Ban

dw

idth

(M

B/s

)

S ize

OSU MPI One Sided MPI_Get Bandwidth

Hugepages + DMAPP

S hif t er w/ oD M AP P

S hif t er wD M AP P

Page 19: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

NPB Single node

April 2017 Cray Inc. Confidential and Proprietary19

0. 0020 .0 0

40 .0 060 .0 0

80 .0 010 0. 00

12 0. 0014 0. 00

16 0. 0018 0. 00

20 0. 00

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

bt cg dc ep f t i s l u sp ua

Tim

e in

sec

on

ds

B enchm ark

NAS Parallel Benchmarks 3.3Serial Single node CLASS=A

N PB - S in gle n ode

Page 20: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

NPB Multi-node

April 2017 Cray Inc. Confidential and Proprietary20

0. 00

20 .0 0

40 .0 0

60 .0 0

80 .0 0

10 0. 00

12 0. 00

14 0. 00

CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC

bt cg ep f t i s l u m g sp

be nch m ar k

Tim

e in

sec

on

ds

NAS Parallel Benchmarks 3.3NPROCS=256 CLASS=D

N PB

Page 21: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Quantum ESPRESSO – optimization

April 2017 Cray Inc. Confidential and Proprietary21

0. 00

50 .0 0

10 0. 00

15 0. 00

20 0. 00

25 0. 00

30 0. 00

35 0. 00

36 72 10 8 14 4 18 0 21 6 25 2 28 8

Ex

ec

uti

on

tim

e (

se

cs

)

cores

Execution time of QUANTUM ESPRESSO 6.0

/ Broadwellhugepage

C LE

S hif t er

S ing ul ar it y

0. 00

50 .0 0

10 0. 00

15 0. 00

20 0. 00

25 0. 00

30 0. 00

35 0. 00

36 72 10 8 14 4 18 0 21 6 25 2 28 8

Ex

ec

uti

on

tim

e (

se

cs

)

cores

Execution time of QUANTUM ESPRESSO 6.0

/ Broadwellnone-hugepage

C LE

S hif t er

S ing ul ar it y

Page 22: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Radioss – optimization

April 2017 Cray Inc. Confidential and Proprietary22

0

10

20

30

40

50

60

36 72 10 8 14 4

Ela

pse

d T

ime

(sec

s)

N um ber of C ores

Radioss (Intel MPI): Normalized to CLEBroadwell ppn:36

w /o h uge pag es

hu gep age s

I nt el l ib ra r ies

en vir o nm en t pr op oga ti on

Page 23: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Conclusions and Future Work

April 2017 Cray Inc. Confidential and Proprietary

● Deployment● Enterprise frameworks required per-node image caches● Enterprise frameworks, as expected offer more runtime options● HPC frameworks enforce strict user security

● Performance● Native application performance can be achieved, requires host-level access

to resources (network, file system)● Host environment pass-through● Launch time dependent on container infrastructure and image size

● Future work● Scaling investigation of open container frameworks

● Shared image storage – faster startup times● Abstraction layer standardizing on applications packaging, deployed and run

in isolation – OCI.

23

Page 24: ADAC6 · Why HPC containers April 2017 Cray Inc. Confidential and Proprietary Mobility of computing to both users and HPC centers Means to capture and distribute software and compute

Legal Disclaimer

April 2017 Cray Inc. Confidential and Proprietary24

Information in this document is provided in connection with Cray Inc. products. No license, express or implied, to any intellectual property rights is granted by this document.

Cray Inc. may make changes to specifications and product descriptions at any time, without notice.

All products, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

Cray hardware and software products may contain design defects or errors known as errata, which may cause the product to deviatefrom published specifications. Current characterized errata are available on request.

Cray uses codenames internally to identify products that are in development and not yet publicly announced for release. Customers and other third parties are not authorized by Cray Inc. to use codenames in advertising, promotion or marketing and any use of Cray Inc. internal codenames is at the sole risk of the user.

Performance tests and ratings are measured using specific systems and/or components and reflect the approximate performance of Cray Inc. products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA and YARCDATA. The following are trademarks of Cray Inc.: CHAPEL, CLUSTER CONNECT, CLUSTERSTOR, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX, LIBSCI, NODEKARE, REVEAL. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT and XT. The registered trademark LINUX is used pursuant to a sublicense fromLMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used on this website are the property of their respective owners.