Deploy containers on your cluster - A proof of concept


Post on 29-Jun-2020


What is an HPC cluster (in my world!)

Where do I come from?

Run and maintain a bioinformatics cluster at the Bioinformatics Research Centre (BiRC), Aarhus University

E-mail: [email protected]

The setup

● 3000+ cores

● 3.5PB parallel file system (henceforth known as /faststorage )

● Use SLURM as our scheduler

What is an HPC cluster (in my world!)

A bunch of servers connected together with access to a shared file system

Pipelines are split into parallel pieces and run on multiple nodes at once, to achieve an accumulated speedup

A multiuser system. Pipelines are run by unprivileged users (no root!)

Everything is orchestrated by a scheduler, which takes care of resource sharing. E.g.:

● Kills jobs that take too long

● Enforces the limits of cores+memory of each job

● Packs multiple jobs from multiple users together on as few nodes as possible

What is an HPC cluster (in my world!)

What kinds of jobs do we run?

Lots of data
- Large input datasets, large shared reference datasets
- Sensitive data

Lots of different software by lots of different people
- Versions keep changing

Work-in-progress pipelines
- Batches are seldom run twice. But a batch can have 50,000 jobs of the same job type

Everything is in flux


Docker

Docker: A Revolutionary Change in Cloud Computing


Docker

Docker's focus: make software run the same anywhere

● Use containers to make software OS independent

● Take over networking, to make containers independent of the datacenter environment

○ no static/fixed IPs

● One storage model, to make it backing independent

○ image/container content is just files in your filesystem

Docker takes care of many of the nitty-gritty details and lets you focus on packaging your software once and for all

What are Linux containers?

Chroot on steroids

● Each container comes with its own OS

● Spawning a container runs a new init. Every running container on the host is an independent OS running on the system

Uses features in the Linux kernel to achieve process isolation

● Cgroups for resource management

● Linux namespaces for process isolation

Leverages OverlayFS in the data/deployment model

● Spawn multiple containers from the same template without copying a thing

What are Linux containers?

Has long been underway. Full support under anything but Ubuntu/Debian can be tricky

Linux Namespaces

● PID namespace

● Network namespace

● UTS namespace (hostname)

● User namespace (UID/GID)

● Mount namespace

What are Linux containers?

Why is this powerful?

● Container will work the same anywhere

● Each container is isolated

○ Allow unprivileged users to run anything. Let them become root

● Utilize OverlayFS

○ Spawn a new full OS in under a second

○ Spawning multiple containers from the same template takes up no extra space

● No hypervisor, just native performance

○ No need for syscall translation => No overhead

○ Run 100+ containers on one host

Back to Docker =>

Docker

Docker is by far the most popular container implementation

The design philosophy of docker has been adopted wholesale

● Creating docker images through recipes (Dockerfile)

● Running containers are ephemeral

● Make docker images reusable by others

● Images are easy to publish, download and use

● Split your software stack into smaller units by containerizing one service at a time

Docker

● Has gained serious traction among companies/developers working in the cloud.

Here Docker and its philosophy help:

Plan, structure, develop and deploy the software stack

● Lots of effort has been put into containerizing existing software stacks (also in academia)

○ Restructure code under a better, more scalable model

○ Cloud ready

○ Get in while the buzz is hot

Docker

Some of the heavy hitters

From academia

Björn Grüning (bgruening) from the University of Freiburg


Meanwhile in HPC...

Can we get Docker into our HPC clusters?

How can we capitalize?

A lot of software has already been dockerized.

Projects like: https://github.com/BioDocker/containers

Or projects that make it easy to get into containerizing: https://github.com/mulled/mulled

And the list of container resources grows every day

How can we deploy all these containers, with their ready-to-use software, inside our HPC cluster?

Merging containers into cluster computing

Let's look at the pipeline

● Individual pieces of software strung together in a chain*

● Each link in the chain takes output from the previous link and uses it as input.

Instead of the actual software being the link, how about using containers?

To rephrase:

Split your pipeline into smaller units by containerizing one link at a time

● Makes your pipelines cluster independent**

● Much of the development can be done off-cluster, on your own system

● Write your awesome software once, and everybody can use it. #citations

● Reuse others' (a little bit less awesome) software in your pipeline

*A lattice I guess, or else we wouldn't be doing stuff in parallel

**well no. But a step in the right direction

Use case - The cluster user

Missing a piece of software?

Search the web for existing images:

● https://hub.docker.com

● https://github.com/BioDocker/containers

● https://github.com/mulled/mulled

● https://docker-ui.genouest.org/app/#/containers

Or query from the command line*:

$:> docker search bowtie2

Or find a link in a research paper

*This does require mulled, biodocker etc. to be set up as repos

Use case - The cluster user

No luck? Build your own container.

$:> mkdir bowtie2 && cd bowtie2
$:> vim Dockerfile
FROM ubuntu

RUN apt-get update -qq --fix-missing
RUN apt-get install -qq -y wget unzip
RUN wget -q -O bowtie2.zip https://sourceforge.net/.../bowtie2-2.2.9-linux-x86_64.zip/download
RUN unzip bowtie2.zip -d /opt/
RUN ln -s /opt/bowtie2-2.2.9 /opt/bowtie2
RUN rm bowtie2.zip

ENV PATH $PATH:/opt/bowtie2

$:> docker build -t bowtie2-2.2.9 .

$:> docker images
REPOSITORY      TAG      IMAGE ID       CREATED         SIZE
bowtie2-2.2.9   latest   49c23f71b287   9 seconds ago   289 MB
ubuntu          latest   c73a085dc378   5 days ago      127 MB

$:> docker run --rm -it bowtie2-2.2.9 bowtie2 -h

Bowtie 2 version 2.2.9 by Ben Langmead ([email protected], www.cs.jhu.edu/~langmea)

Usage:

bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

...


Use case - The cluster user

Push our own work to dockerhub for others to re-use:

$:> docker push bowtie2-2.2.9

Docker images can be pushed to repositories (dockerhub being one), and automatically pulled in if needed.

● Dockerhub can monitor git repositories and rebuild a new docker image on commits.

Set up a (private) docker repository on your local network that pulls content from the most relevant global repos.

Each docker daemon can stream in >1GB docker images within seconds.

What would we like to achieve?

● Make your life as a user easier by reusing existing, working docker images from papers, colleagues and previous projects

● Make your life as an administrator easier by not maintaining a plethora of software compiled to custom specifications from source

● Make our pipelines easier to rerun on a different cluster, by packaging the software into docker images that can run everywhere

What do we need?

1. Mapping of data
Enable containers to work on the data (massive in size) on the HPC filesystem like any piece of software (within reason ;))

2. Resource limiting
A way for the docker daemon to run under the resource management of SLURM, so that the scheduler can do resource sharing.

3. Maintain security
A cluster user should never be able to achieve privilege escalation (of any sort)

● Alice should only be able to run as alice

● No one but Alice should be able to run as alice

Mapping of data

Map data from host to container via a bind mount

docker run -v /storage:/storage debian /bin/bash

Idea: make a 1-1 map of the shared storage into the container. File paths are the same outside and inside a container. Easy to work with.

Example:

#sbatch
docker run -v /storage:/storage tool_a /storage/input -o /storage/output.a
docker run -v /storage:/storage tool_b /storage/output.a -o /storage/output.b
docker run -v /storage:/storage cat /storage/output.b

#sbatch
tool_a /storage/input -o /storage/output.a
tool_b /storage/output.a -o /storage/output.b
cat /storage/output.b
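The containerized variant of the batch script repeats the same bind mount on every line. A minimal sketch of a helper that factors it out could look like the following. The names `container_step` and `STORAGE` are hypothetical and not part of Docker or the original slides, and the `--rm` flag is an added assumption so that finished containers do not pile up. The helper only prints the command (a dry run), so it can be tried without a docker daemon.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): build one containerized pipeline
# step with the shared storage bind-mounted 1:1, so that file paths are the
# same inside and outside the container.
STORAGE="${STORAGE:-/storage}"

# Usage: container_step IMAGE [ARGS...]
# Prints the docker command instead of running it (dry run). Remove the
# leading "echo" to execute the command for real.
container_step() {
    image="$1"
    shift
    echo docker run --rm -v "${STORAGE}:${STORAGE}" "$image" "$@"
}

container_step tool_a /storage/input -o /storage/output.a
container_step tool_b /storage/output.a -o /storage/output.b
```

Because the mount is 1:1, the arguments passed to each tool are exactly the paths a native run would use.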


Mapping of data

Problem solved. Let's crack on

Major breach of no. 3: Maintain security

Docker defaults

● Containers run as root

● Anyone in the docker group can spawn containers

● All are equal in the eyes of the daemon

● Alice gets to spawn just as much as root does

Mapping of data

Evil Alice

By mapping part of the host OS into a container, Alice can act as root on the host OS.

What about:

docker run -v /storage/sensitive_data:/unsensitive_data debian /bin/bash

And even worse:

docker run -v /etc/shadow:/root/shadow debian /bin/bash

Read-write access to our password file!

Mapping of data

Unprivileged containers

Any storage that is mapped inside a container retains the restrictions of the user spawning the container

Filesystems don’t have multiple, separate UID/GID ranges

Utilize the size of this UID/GID space, and shift containers into unused UIDs/GIDs to isolate them.

UID/GID gets translated back and forth when

Unprivileged containers have existed and been used in LXC for a while.

Fairly new (and unknown) option in Docker

Mapping of data

How does it work?

Assign an isolated UID space and GID space to a user

2 new files: /etc/subuid and /etc/subgid

Use these UIDs/GIDs inside the container

$:> usermod --add-subuids 100000-165536 alice
$:> usermod --add-subgids 100000-165536 alice

$:> docker daemon --userns-remap alice:alice &
$:> docker run --rm -it -v /etc/shadow:/root/shadow debian /bin/bash
#:> touch /etc/shadow
#:> touch /root/shadow
touch: cannot touch '/root/shadow': Permission denied

*Available in Ubuntu since 14.04. But not in CentOS 7 yet.

Mapping of data

That was a step too far!

What about reference data, input data and output data?

Solution:

Shift UIDs and GIDs into boring isolation, but keep the UID of the user and the GID of the project.

cat /etc/subgid
plants:100000:10000
plants:10000:1
plants:110000:64535

cat /etc/subuid
alice:100000:1000
alice:1000:1
alice:101001:64535
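The three-line pattern in /etc/subuid can be generated rather than typed by hand. The sketch below is a hypothetical helper (`subid_lines` is not a standard tool). It assumes, as the slide's example implies, that the ranges are assigned to container IDs in file order: everything below the kept ID is shifted into the isolated range, the kept ID maps to itself, and everything above it is shifted again. The base of 100000 and the total of 65536 IDs are assumptions taken from the alice example.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/subuid-style lines (name:start:count)
# that shift every ID into an isolated range except one, which maps to itself.
#   $1 name     user or group name that owns the ranges
#   $2 keep_id  the UID (or GID) that must stay unshifted, assumed > 0
#   $3 base     first host ID of the isolated range (default 100000)
subid_lines() {
    local name="$1" keep_id="$2" base="${3:-100000}" total=65536
    # container IDs 0 .. keep_id-1 land in the isolated range
    echo "${name}:${base}:${keep_id}"
    # container ID keep_id maps to the identical host ID
    echo "${name}:${keep_id}:1"
    # the remaining container IDs land in the isolated range again
    echo "${name}:$((base + keep_id + 1)):$((total - keep_id - 1))"
}

subid_lines alice 1000
```

For `alice` with UID 1000 this prints the same three lines as the subuid file on the slide. The slide's subgid example uses a slightly different start and count for its last range; both layouts avoid overlapping host IDs.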

Mapping of data

$:> docker daemon --userns-remap alice:plants &
$:> docker run --rm -it \
      -v /etc/shadow:/root/shadow \
      -v /storage:/storage debian /bin/bash

#:> touch /root/shadow
touch: cannot touch '/root/shadow': Permission denied
#:> cd /storage
#:> ls
humans lost+found plants
#:> ls humans/
ls: cannot open directory humans/: Permission denied
#:> ls plants/
some_plant.gene

Success!

Mapping of data

What did we need?

● Edit /etc/subuid and /etc/subgid to shift anything but the user UID and project GID into an isolated UID/GID range

● Multiple running docker daemons. One per <user>:<group> mapping

○ Add --userns-remap to restrict container file access

○ Add --group to restrict access to the docker daemon

Your users are now able to run containers on your filesystem!

docker daemon \
    --graph=/mnt/scratch/$USER.$PROJECT/docker \
    --pidfile=/mnt/scratch/$USER.$PROJECT/docker.pid \
    -H unix:///mnt/scratch/$USER.$PROJECT/docker.sock \
    --group=$USER_ID \
    --userns-remap=$USER_ID:$GROUP_ID
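One daemon per <user>:<group> pair means the command line has to be assembled for each pair, and only for projects the user actually belongs to. The sketch below shows one way this might be done. The function names `is_member` and `daemon_cmd` are hypothetical; the flags and the /mnt/scratch layout are copied from the slide, including its use of the user ID for --group. The daemon command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, flags and paths follow the slide.

# Succeed only if user $1 is a member of group $2.
is_member() {
    for g in $(id -Gn "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Print the daemon command line for one <user>:<project> pair.
#   $1 user name, $2 project (group) name, $3 numeric UID, $4 numeric GID
daemon_cmd() {
    user="$1"; project="$2"; uid="$3"; gid="$4"
    dir="/mnt/scratch/${user}.${project}"
    echo docker daemon \
        "--graph=${dir}/docker" \
        "--pidfile=${dir}/docker.pid" \
        -H "unix://${dir}/docker.sock" \
        "--group=${uid}" \
        "--userns-remap=${uid}:${gid}"
}

# Example values taken from the alice/plants slides.
daemon_cmd alice plants 1000 10000
```

A wrapper would typically call `is_member "$user" "$project"` first and refuse to start a daemon when the check fails, so that a user cannot pick an arbitrary project GID.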

Resource limiting

In any HPC cluster the scheduler must have total resource control.

● Jobs are run with the privileges of the user

● Processes are subprocesses of slurmd

● Docker daemon must be spawned by root

● Containers run as subprocesses of the docker daemon

1. Unprivileged users must be able to start the docker daemon

2. The scheduler must be able to monitor/control the resources of docker

3. When a job is killed, all containers spawned by that job must die

Resource limiting

SLURM already uses cgroups. And that is all we need

● Write a setuid script start_docker that asserts permissions and forks out a docker daemon locked to the <user>:<project>

● Run start_docker inside a job to use containers

○ The cgroup stays with the daemon, monitoring/limiting its resources

● Use SLURM's epilog hook to clean up afterwards

○ Kills docker daemon and containers if still running

○ Deletes any container leftovers
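The epilog cleanup described above might be sketched as follows. This is an illustration under stated assumptions, not the script used on the cluster: `cleanup_daemons` is a hypothetical name, the pidfile and graph directory layout is the /mnt/scratch/<user>.<project> layout from the earlier slide, and the ten-second grace period is arbitrary. It also trusts the pidfile; a production version would verify that the PID really belongs to a docker daemon before killing it.

```shell
#!/bin/sh
# Sketch of an epilog-style cleanup (hypothetical function name).
#   $1 scratch root, e.g. /mnt/scratch
#   $2 user name of the job owner
cleanup_daemons() {
    local scratch="$1" user="$2" pidfile pid i
    # refuse to run with empty arguments, since paths are removed below
    [ -n "$scratch" ] && [ -n "$user" ] || return 1
    for pidfile in "$scratch/$user".*/docker.pid; do
        [ -f "$pidfile" ] || continue        # the glob matched nothing
        pid=$(cat "$pidfile")
        kill "$pid" 2>/dev/null              # ask the daemon to shut down
        i=0
        # give it up to 10 seconds before forcing the issue
        while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 10 ]; do
            sleep 1
            i=$((i + 1))
        done
        kill -9 "$pid" 2>/dev/null
        rm -rf "${pidfile%/docker.pid}/docker"   # image/container leftovers
        rm -f "$pidfile"
    done
}

# In a SLURM epilog this could be called as:
#   cleanup_daemons /mnt/scratch "$SLURM_JOB_USER"
```

Matching on `<user>.*` removes the daemons of every project the user had running on the node, which fits a setup where a node's epilog runs once per finished job.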

Resource limiting

Check the process tree

slurmstepd
  ├─bash
  │   ├─pstree 20238 -a
  │   └─sudo docker_daemon plants
  │       └─docker_daemon /usr/local/bin/docker_daemo...
  │           └─dockerd --graph=/mnt/scratch/alice.pl...
  │               ├─docker-containe -l unix:///var/ru...
  │               │   └─7*[{docker-containe}]
  │               └─14*[{dockerd}]
  └─5*[{slurmstepd}]

And the cgroup

alice@vm47:~$ cat /proc/self/cgroup
11:name=systemd:/user/0.user/6.session
10:hugetlb:/user/0.user/6.session
...
alice@vm47:~$ cat /proc/`pidof dockerd`/cgroup
11:name=systemd:/user/0.user/6.session
10:hugetlb:/user/0.user/6.session
...
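The manual comparison of the two cgroup listings can be scripted. A minimal sketch, assuming a Linux /proc file system; `same_cgroup` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if two processes report identical cgroup
# membership in /proc/<pid>/cgroup (Linux only).
same_cgroup() {
    a="/proc/$1/cgroup"
    b="/proc/$2/cgroup"
    # both files must be readable, otherwise two empty reads would compare equal
    [ -r "$a" ] && [ -r "$b" ] && [ "$(cat "$a")" = "$(cat "$b")" ]
}

# Trivial self-check: a process is always in its own cgroup.
same_cgroup "$$" "$$" && echo "same cgroup"
```

Inside a job, `same_cgroup "$$" "$(pidof dockerd)"` would confirm that the daemon is still accounted to the job, provided only one dockerd is running on the node.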

Limitations

This is a proof of concept

● Docker locks /etc/passwd and /etc/group
No way to inject user/project names. Only UID and GID available

● Docker's --userns-remap limits a user to one project at a time
Limitations in the kernel make this unlikely to change

● Limitations in the kernel allow no more than 5 lines in subgid(!?)

* There is an (arbitrary) limit on the number of lines in the file. As at Linux 3.18, the limit is five lines.

- user_namespaces manpage

Limitations

● How about network?

○ How to communicate with containers on different nodes?

○ How about RDMA?

● Docker is still in very active development

Docker 1.8 - August 12, 2015
Docker 1.9 - November 3, 2015
Docker 1.10 - February 4, 2016
Docker 1.11 - April 13, 2016
Docker 1.12 - June 20, 2016

All saw major changes and the introduction of new concepts and features.

● Not all features are supported in the major distributions

○ Ubuntu/debian ✓

○ Archlinux ✗

○ CentOS ✗