mincs - containers in the shell script (eng. ver.)

Upload: masami-hiramatsu

Post on 24-Jan-2017

TRANSCRIPT

Page 1: MINCS - containers in the shell script (Eng. ver.)

MINCS – containers in the shell script

@mhiramat

Github.com/mhiramat/

Page 2: MINCS - containers in the shell script (Eng. ver.)

Who

@mhiramat

A Linux kernel hacker, but with few chances to write code these days (><)o

Maintainer of perf-probe and kprobes

Page 3: MINCS - containers in the shell script (Eng. ver.)

At First

This presentation is almost 100% about shell scripts.

It is not about kernel C source code.

Page 4: MINCS - containers in the shell script (Eng. ver.)

What is a container?

Container == Docker?

There are other OSS implementations!

LXC

runc

OpenVZ

etc.

So what is a container ...?

Page 5: MINCS - containers in the shell script (Eng. ver.)

What is Docker?

Docker provides many container-related features.

Containerize

Packaging software

Managing layers and their catalog

REST API

etc.

How does it work??

Page 6: MINCS - containers in the shell script (Eng. ver.)

Docker is Great, but...

It seems a bit ... too BIG

All the features are hidden in one binary

It is hard to know how it works

Remember the Unix philosophy

Keep It Simple, Stupid

We can do it with existing tools

Page 7: MINCS - containers in the shell script (Eng. ver.)

Let's mimic it!

Let's try to make a minimal container

How to use the namespaces

How to bind the devices

How to change the rootfs with chroot/pivot_root

How to use Capabilities and CPUSET etc.

Let's try to overlay the layers

Now we have overlayfs!

How to manage layers

Page 8: MINCS - containers in the shell script (Eng. ver.)

MINCS

Minimum Container Shell-scripts

https://github.com/mhiramat/mincs

Basic functions

Use PID/Net/UTS/Mount namespaces

Layering with overlayfs

Capabilities, CPUSET and more

POSIX shell script (not bash script)

This can work with busybox shell/dash

Page 9: MINCS - containers in the shell script (Eng. ver.)

The MINCS

Frontend

minc

marten

polecat

Backend

minc-exec

minc-coat

minc-leash

minc-farm

minc-trapper

Page 10: MINCS - containers in the shell script (Eng. ver.)

Frontend Scripts

Frontends of MINCS

minc: run a command in a container

marten: manage layered container images

polecat: make a self-executable containerized command

Frontend == parsing options

Set options in environment variables and call backend scripts

The marten/minc-farm pair is an exception

Page 11: MINCS - containers in the shell script (Eng. ver.)

minc

The main tool of MINCS

Run a command in a container

Works like chroot

(Or docker run? :)

Set up namespaces and workspaces with overlayfs

No container image is needed, unlike Docker

No separate rootfs directory is needed, unlike chroot (the current rootfs can be reused)

Netns is not enabled by default

[mhiramat@localhost mincs]$ sudo ./minc ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:58 ?        00:00:00 ps -ef

Page 12: MINCS - containers in the shell script (Eng. ver.)

minc: Usage

Usage: minc [options] [command]

Options:

-r/--root ROOTDIR Specify a directory as a rootfs. If omitted, use “/”.

-t/--temp TEMPDIR Specify a working directory. If omitted, use a tmpdir by mkdir.

-k Do not remove working directory

--name UTSNAME Specify the host name in the container

--debug Show the debug log

If the command is omitted, run $SHELL.

Page 13: MINCS - containers in the shell script (Eng. ver.)

Dive into the shell script

Let's look into the minc command

Phase 1: Parse the command line and setup env-vars.

Phase 2: Invoke minc-exec

Setup netns and cpumask (if needed)

Move to the new namespaces

Get correct PID and setup UTSNAME

Setup rootfs for container

Bind device files

Unmount original mounts

Chroot to new rootfs and setup capabilities

Page 14: MINCS - containers in the shell script (Eng. ver.)

Minc: command line parsing

Case and while loop

Getopts is not used (it is not flexible enough)

while { case & shift } loop

Mainly sets environment variables

After the loop

Call minc-farm to get the image based on its UUID

Set up post-processing with the trap command

Call minc-exec
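The while/case/shift loop described on this slide can be sketched as follows. This is a minimal illustration and not the actual minc source: the function names `parse_opts` and `need_arg` and the variable names `MINC_ROOTDIR`, `MINC_TMPDIR`, `MINC_KEEPDIR`, `MINC_UTSNAME`, `MINC_DEBUG` and `MINC_CMD` are assumptions. Only the option names come from the usage slide.

```shell
#!/bin/sh
# Hypothetical sketch of a getopts-free option parser (POSIX sh).
# All variable and function names are illustrative assumptions.

# need_arg COUNT: fail unless the option still has an argument after it.
need_arg() {
  [ "$1" -ge 2 ] || { echo "option needs an argument" >&2; return 1; }
}

parse_opts() {
  MINC_ROOTDIR=/          # default rootfs is "/", as in the usage text
  MINC_TMPDIR=
  MINC_KEEPDIR=0
  MINC_UTSNAME=
  MINC_DEBUG=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -r|--root) need_arg "$#" || return 1; MINC_ROOTDIR=$2; shift 2 ;;
      -t|--temp) need_arg "$#" || return 1; MINC_TMPDIR=$2; shift 2 ;;
      -k)        MINC_KEEPDIR=1; shift ;;
      --name)    need_arg "$#" || return 1; MINC_UTSNAME=$2; shift 2 ;;
      --debug)   MINC_DEBUG=1; shift ;;
      --)        shift; break ;;
      -*)        echo "unknown option: $1" >&2; return 1 ;;
      *)         break ;;   # first non-option word starts the command
    esac
  done
  # Whatever is left is the command; fall back to $SHELL if it is empty.
  if [ "$#" -eq 0 ]; then
    MINC_CMD=${SHELL:-/bin/sh}
  else
    MINC_CMD=$*
  fi
  export MINC_ROOTDIR MINC_TMPDIR MINC_KEEPDIR MINC_UTSNAME MINC_DEBUG MINC_CMD
}
```

Called as `parse_opts "$@"`, the loop stops at the first word that is not an option, so options of the contained command (such as the `-ef` of `ps -ef`) are left alone.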

Page 15: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(1): Overview

Self-executing shell script

Unshare requires some other command to execute, so the script calls itself

This is for historical reasons: minc-exec used to be a single script, chns

The first execution is outside the container

Setup netns and cpuset

Call unshare to make a container (namespace)

The second execution is inside the container

Switch the script's behavior by checking PID == 1

Hide some things from the program running in the container

Device files / unused mount points
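The two-stage self-execution can be sketched roughly like this. It is an assumption-laden illustration, not the real minc-exec: `stage_of` and `main` are hypothetical names, and the actual script does much more in each stage. The `unshare -iumpf` flags are the ones shown later in this deck.

```shell
#!/bin/sh
# Hypothetical sketch of a self-re-executing script (not the actual minc-exec).
# Stage 1 runs on the host; stage 2 runs as PID 1 of the new PID namespace.

# stage_of PID: a shell started by "unshare -p -f" sees itself as PID 1.
stage_of() {
  if [ "$1" -eq 1 ]; then echo inside; else echo outside; fi
}

main() {
  case "$(stage_of "$$")" in
    outside)
      # Host-side setup (netns, cpuset) would go here, then re-run this
      # same script in new IPC, UTS, mount and PID namespaces (needs root).
      exec unshare -iumpf "$0" "$@"
      ;;
    inside)
      # Container-side setup (rootfs, device files, chroot) would go here.
      echo "running as PID $$ inside the new namespaces"
      ;;
  esac
}

# main "$@"   # left commented out: running it requires root privileges
```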

Page 16: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(2): netns/cpuset

netns

Use “ip netns” to create a new network namespace if needed

Use the trap command to remove it when the shell exits

Just create a veth pair across the namespaces

Do not assign an IP address

We can use “pipework” for more networking options

CPUSET

Just set a CPU bitmask by using taskset

Still not using cgroups
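A rough sketch of the netns and CPU-mask steps is shown below. The helper `run`, the function names and the interface naming scheme (`veth-NAME` / `vethc-NAME`) are assumptions made for illustration; the real minc-exec may name and order things differently. With `DRY_RUN` set, the commands are only printed, which allows trying the sketch without root.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual minc-exec code. The namespace name,
# the veth interface names and the CPU mask are illustrative assumptions.

# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# setup_netns NAME: create a network namespace and a veth pair, and move
# one end into the namespace. No IP address is assigned.
# Note: Linux interface names are limited to 15 characters.
setup_netns() {
  ns=$1
  run ip netns add "$ns"
  run ip link add "veth-$ns" type veth peer name "vethc-$ns"
  run ip link set "vethc-$ns" netns "$ns"
}

# cleanup_netns NAME: meant to be registered with trap so that the
# namespace is removed when the shell exits.
cleanup_netns() {
  run ip netns del "$1"
}

# setup_cpumask MASK: restrict this shell (and its future children) to the
# CPUs in MASK, e.g. 0x3 for CPU 0 and CPU 1.
setup_cpumask() {
  run taskset -p "$1" "$$"
}
```

A typical call sequence would be `setup_netns minc42; trap 'cleanup_netns minc42' EXIT; setup_cpumask 0x3`.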

Page 17: MINCS - containers in the shell script (Eng. ver.)

Intermission: Trap command

Trap is great :)

We can handle signals and shell exit

It can call shell script functions

Minc usually uses trap to...

Remove temporary files/PID file

Show informational messages on exit

Suppress ^C
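As a small illustration of these uses of trap, the sketch below creates a temporary working directory, removes it on exit and ignores ^C while a command runs. The function name `with_workdir` and the directory template are assumptions, not names taken from MINCS.

```shell
#!/bin/sh
# Minimal sketch of trap-based cleanup; names are illustrative assumptions.

# with_workdir CMD [ARGS...]: run CMD with a fresh temporary directory
# appended as its last argument. The function body is a subshell, so the
# traps below only apply while CMD runs.
with_workdir() (
  workdir=$(mktemp -d "${TMPDIR:-/tmp}/minc-demo.XXXXXX") || exit 1
  trap 'rm -rf "$workdir"' EXIT   # remove the directory when the subshell exits
  trap '' INT                     # ignore ^C
  "$@" "$workdir"
  status=$?
  exit "$status"
)
```

For example, `with_workdir ls -ld` lists the temporary directory, which is gone again once the function returns.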

Page 18: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(3): Change namespace

Use unshare to change namespaces

Run unshare, passing $0 as the command

PID, mount, IPC and UTS namespaces are unshared

unshare -iumpf $0 "$@"

For the netns, we use ip netns exec

ip netns exec $MINC_NETNS unshare -iumpf $0 "$@"

Page 19: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(4): Setup PID and utsname

Get the original PID (the PID in the parent namespace)

The PID outside the container is useful for sending signals

Since the unshare command forks, we already know the PID inside the container.

Even if we separate the mount namespace, /proc stays the same until we remount it.

This means we can still look at /proc/self.

Set up utsname

Use the hostname command to set the utsname
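One way to read the outside PID from the not-yet-remounted /proc is sketched below. This is an assumption about how it could be done, not a copy of minc-exec: the names `get_outer_pid`, `OUTER_PID` and `set_utsname` are hypothetical, and a Linux procfs is assumed.

```shell
#!/bin/sh
# Hypothetical sketch; assumes Linux procfs. Not the actual minc-exec code.

# get_outer_pid: store this shell's PID, as seen by the procfs mounted at
# /proc, in OUTER_PID. The shell itself opens /proc/self/stat for the
# redirection of the "read" builtin, so "self" is the shell, not a child.
# Call it directly, not inside $(...), which would fork a subshell.
# Until /proc is remounted in the new PID namespace, the first field of
# the stat file is still the PID in the parent namespace.
get_outer_pid() {
  read -r OUTER_PID _rest < /proc/self/stat
}

# set_utsname NAME: set the host name. After the UTS namespace has been
# unshared this only affects the container. Requires root.
set_utsname() {
  hostname "$1"
}
```

Inside the container the shell sees itself as PID 1, while `OUTER_PID` keeps the number that a process on the host can use with `kill`.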

Page 20: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(5): Mount namespace

Set up the mount namespace

In some environments (with systemd?), mount information propagates to other namespaces

mount --make-rprivate /

This stops all mount operations from propagating

Overlaying the workspace via minc-coat

The minc-coat backend puts an overlay on the rootfs image.

So the rootfs itself is not changed afterwards.

If the rootfs may be changed, use the --direct option

Page 21: MINCS - containers in the shell script (Eng. ver.)

Minc-coat: Implement overlays

Make root/, storage/, work/ under tempdir

root/: the mountpoint for overlayfs → $RD

storage/: the overlayfs upper directory → $UD

work/: a workdir for overlayfs → $WD

Build a new rootfs via overlayfs

Not only mount namespaces, but also layering, is used for storage isolation

There are some differences depending on the version

Overlayfs in the upstream kernel

mount -t overlay -o upperdir=$UD,lowerdir=$BASEDIR,workdir=$WD overlayfs $RD

Overlayfs in Ubuntu 14.10 (out-of-tree)

mount -t overlayfs -o upperdir=$UD,lowerdir=$BASEDIR overlayfs $RD
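Putting the private mount namespace and the overlay mount together, the flow might look like the sketch below. The function name `mount_overlay` and the `run` dry-run helper are assumptions, and trying the upstream mount first with a fallback to the out-of-tree one is just one possible way to handle the version difference; the real minc-coat may detect it differently.

```shell
#!/bin/sh
# Hypothetical sketch of the overlay setup; not the actual minc-coat code.

# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# mount_overlay BASEDIR TMPDIR: build the new rootfs at TMPDIR/root with
# BASEDIR as the read-only lower layer and TMPDIR/storage as upper layer.
mount_overlay() {
  basedir=$1
  tmpdir=$2
  RD=$tmpdir/root
  UD=$tmpdir/storage
  WD=$tmpdir/work
  mkdir -p "$RD" "$UD" "$WD"
  # Keep mount events from propagating to other mount namespaces.
  run mount --make-rprivate /
  # Upstream overlayfs needs a workdir; the older out-of-tree one does not.
  run mount -t overlay -o "upperdir=$UD,lowerdir=$basedir,workdir=$WD" overlayfs "$RD" ||
    run mount -t overlayfs -o "upperdir=$UD,lowerdir=$basedir" overlayfs "$RD"
}
```

Everything the container writes ends up in `storage/`, so the lower layer stays untouched and can be shared.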

Page 22: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(6): Special Files

Special files and directories

Make /etc, /dev, /sys and /proc on the new rootfs

Bind mounts under /dev

Touch dummy files and bind-mount the devices onto them (like symlinks)

/dev/console, /dev/null, /dev/zero, /dev/random, /dev/urandom, /dev/mqueue

(and others, if you need)

/dev/pts is mounted with newinstance

Mount /proc for the new PID namespace

The old /proc should be remounted read-only.

Files that must be read-only (/proc/sys etc.) should be bind-mounted from the read-only /proc.

Bind mount /sys

This could be skipped or made read-only
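The device and pseudo-filesystem setup can be sketched as below. It is an illustration under assumptions, not the minc-exec source: `bind_devices` and `run` are hypothetical names, /dev/mqueue and the read-only /proc/sys handling are left out for brevity, and /sys is simply bound read-only.

```shell
#!/bin/sh
# Hypothetical sketch of populating /dev, /proc and /sys in the new rootfs.

# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# bind_devices ROOTDIR: ROOTDIR is the overlay mountpoint ($RD).
bind_devices() {
  rd=$1
  mkdir -p "$rd/etc" "$rd/dev" "$rd/sys" "$rd/proc"
  run mount -t tmpfs tmpfs "$rd/dev"
  for dev in console null zero random urandom; do
    # A bind mount needs an existing target, so create an empty file first.
    run touch "$rd/dev/$dev"
    run mount --bind "/dev/$dev" "$rd/dev/$dev"
  done
  run mkdir -p "$rd/dev/pts"
  run mount -t devpts -o newinstance devpts "$rd/dev/pts"
  # Fresh procfs for the new PID namespace.
  run mount -t proc proc "$rd/proc"
  # Bind /sys, then remount that bind mount read-only.
  run mount --bind /sys "$rd/sys"
  run mount -o remount,ro,bind "$rd/sys"
}
```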

Page 23: MINCS - containers in the shell script (Eng. ver.)

Intermission: Debug

How to debug it?

Just to check the commands, run it with --debug

This option enables “set -x”

If you want to break into it, write “bash” (or another shell you like) in the script

You can do anything :)

Or write whatever command you want to run

MINCS is just a set of shell scripts

You can change it as you want.

Page 24: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(7): Post-process Mountpoint

Remove old mountpoints

If we keep them, they can still be visible after chroot

Use pivot_root to unmount some things

Let's monitor it with “df -h”

Finally, call minc-leash to chroot.

Page 25: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(7): Post-process Mountpoint

Before minc:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        740M     0  740M   0% /dev
tmpfs           748M     0  748M   0% /dev/shm
tmpfs           748M  8.5M  740M   2% /run
tmpfs           748M     0  748M   0% /sys/fs/cgroup
/dev/sda2        15G  8.6G  6.5G  58% /

Page 26: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(7): Post-process Mountpoint

With the overlay rootfs and the special files mounted:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  8.6G  6.5G  58% /
devtmpfs        740M     0  740M   0% /dev
tmpfs           748M     0  748M   0% /dev/shm
tmpfs           748M     0  748M   0% /sys/fs/cgroup
tmpfs           748M  8.5M  740M   2% /run
overlayfs        15G  8.6G  6.5G  58% /tmp/minc1012-NpuyIA/root
tmpfs           748M     0  748M   0% /tmp/minc1012-NpuyIA/root/dev
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/console
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/null
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/zero
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/random
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/urandom

Page 27: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(7): Post-process Mountpoint

After the first pivot_root:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  8.6G  6.5G  58% /.orig
devtmpfs        740M     0  740M   0% /.orig/dev
tmpfs           748M     0  748M   0% /.orig/dev/shm
tmpfs           748M     0  748M   0% /.orig/sys/fs/cgroup
tmpfs           748M  8.5M  740M   2% /.orig/run
overlayfs        15G  8.6G  6.5G  58% /
tmpfs           748M     0  748M   0% /dev
devtmpfs        740M     0  740M   0% /dev/console
devtmpfs        740M     0  740M   0% /dev/null
devtmpfs        740M     0  740M   0% /dev/zero
devtmpfs        740M     0  740M   0% /dev/random
devtmpfs        740M     0  740M   0% /dev/urandom

Page 28: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(7): Post-process Mountpoint

After removing the old procfs, etc.:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  8.6G  6.5G  58% /.orig
overlayfs        15G  8.6G  6.5G  58% /
tmpfs           748M     0  748M   0% /dev
devtmpfs        740M     0  740M   0% /dev/console
devtmpfs        740M     0  740M   0% /dev/null
devtmpfs        740M     0  740M   0% /dev/zero
devtmpfs        740M     0  740M   0% /dev/random
devtmpfs        740M     0  740M   0% /dev/urandom

Page 29: MINCS - containers in the shell script (Eng. ver.)

Minc-exec(7): Post-process Mountpoint

After the 2nd pivot_root and chroot to the new rootfs:

Filesystem      Size  Used Avail Use% Mounted on
overlayfs        15G  8.6G  6.5G  58% /
tmpfs           748M     0  748M   0% /dev
devtmpfs        740M     0  740M   0% /dev/console
devtmpfs        740M     0  740M   0% /dev/null
devtmpfs        740M     0  740M   0% /dev/zero
devtmpfs        740M     0  740M   0% /dev/random
devtmpfs        740M     0  740M   0% /dev/urandom
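The root switch can be approximated with the sketch below. Note that it is a simplified variant and not what minc-exec does: instead of the two pivot_root steps shown in the df listings, it uses a single pivot_root followed by a lazy unmount of the old root. The names `switch_root` and `run` are assumptions.

```shell
#!/bin/sh
# Hypothetical, simplified sketch: one pivot_root plus a lazy unmount,
# instead of the two-step pivot_root that the slides describe.

# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# switch_root ROOTDIR: ROOTDIR must be a mount point (here, the overlay).
switch_root() {
  rd=$1
  run mkdir -p "$rd/.orig"
  run cd "$rd"
  run pivot_root . .orig     # new root becomes /, old root moves to /.orig
  run cd /
  run umount -l /.orig       # detach the old root and everything below it
}
```

After this, `df -h` inside the container should no longer list the host's mounts, much like the last listing above.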

Page 30: MINCS - containers in the shell script (Eng. ver.)

Minc-leash: capabilities and chroot

Leash() = “Least capabilities shell”

Limits capabilities and does chroot by using capsh (libcap)

Changes UID/GID too

If we skip the capabilities setting, it just does chroot

Wash() = “Wash out the environment variables”

MINCS uses environment variables internally, so clean them up

Unset all the variables starting with MINC_
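The two helpers could look roughly like this. It is a sketch under assumptions, not the minc-leash source: the function bodies, the `run` dry-run helper and the argument order of `leash` are made up for illustration. The capsh options used (`--chroot=`, `--drop=` and `--` followed by shell arguments) are standard capsh options; `--chroot` comes first because dropping CAP_SYS_CHROOT beforehand would make the chroot fail.

```shell
#!/bin/sh
# Hypothetical sketch of wash/leash; not the actual minc-leash code.

# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# wash: unset every exported variable whose name starts with MINC_.
wash() {
  for var in $(env | sed -n 's/^\(MINC_[A-Za-z0-9_]*\)=.*/\1/p'); do
    unset "$var"
  done
}

# leash ROOT CAPS COMMAND: chroot into ROOT, drop the comma-separated
# capabilities in CAPS, then run COMMAND via the "-c" form capsh accepts.
# With an empty CAPS it falls back to a plain chroot.
leash() {
  root=$1
  caps=$2
  cmd=$3
  if [ -n "$caps" ]; then
    run capsh --chroot="$root" --drop="$caps" -- -c "$cmd"
  else
    run chroot "$root" /bin/sh -c "$cmd"
  fi
}
```

Calling `wash` right before `leash` keeps the internal `MINC_` variables out of the environment of the contained program.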

Page 31: MINCS - containers in the shell script (Eng. ver.)

Use cases of MINCS

Good learning material for containers

If you hit some limitations in Docker, you can try things out with it and understand them.

Prototyping new features

Containers for embedded devices

Is it wrong to want to run applications in containers on embedded devices? :)

Docker (>14MB, docker only) vs MINCS+Busybox (<4MB, +shell and tools)

→ Boot2MINC

Page 32: MINCS - containers in the shell script (Eng. ver.)

Boot2minc

Minimal ISO image + MINCS

https://github.com/mhiramat/boot2minc

Forked from Minimal Linux Live (https://github.com/ivandavidov/minimal)

Including

Linux kernel

Busybox (+unshare patch)

MINCS

8MB image including the kernel (can run on Qemu-kvm)

The size can be reduced further if we optimize the configuration

Page 33: MINCS - containers in the shell script (Eng. ver.)

Marten: Manage container images

Minc provides only the container feature

Should we prepare a rootfs via debootstrap?

How to get the rootfs of Fedora/CentOS etc.?

We want to reuse the result of a previous container easily

Overlayfs-based container image manager

Identify container images by Docker-like UUID

Track the dependencies between images

Import Docker exported/saved images

Page 34: MINCS - containers in the shell script (Eng. ver.)

Demonstration

Minc

Marten

Boot2minc

Page 35: MINCS - containers in the shell script (Eng. ver.)

TODO

minc

Work with pipework

Correct TTY support via tmux/screen

Use cgroups to limit cpu/memory/io usage (minc-cage?)

Plugin support for btrfs and dm-thin

Marten

Container execution command (like docker run)

Support OCI-compatible container export/import and signing

Page 36: MINCS - containers in the shell script (Eng. ver.)

Known Issues

Testcases

Well, we can make them with shell scripts too :)

Capsh

Capsh only accepts “sh -c” type commands

It doesn't accept escape characters…

Page 37: MINCS - containers in the shell script (Eng. ver.)

Conclusion

What I'd like to say is

“We can run a container by combining commands”

Docker etc. are nothing special; we already have the fundamental tools.

And

“Shell script is great!”

Page 38: MINCS - containers in the shell script (Eng. ver.)

END

Thank you very much! :)

https://github.com/mhiramat/mincs

Page 39: MINCS - containers in the shell script (Eng. ver.)

Example: Import image from Docker

# docker save centos | gzip - > centos.tar.gz
# marten import centos.tar.gz
Importing image: centos
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
5b12ef8fd57065237a6833039acc0e7f68e363c15d8abb5cacce7143a1f7de8a
8efe422e6104930bd0975c199faa15da985b6694513d2e873aa2da9ee402174c
# marten images
ID            SIZE  NAME
511136ea3c5a  4.0K  (noname)
5b12ef8fd570  4.0K  (noname)
8efe422e6104  224M  centos
# minc -r centos /bin/bash