mincs - containers in the shell script (eng. ver.)
1
MINCS – containers in the shell script
@mhiramat / github.com/mhiramat/
2
Who
@mhiramat: a Linux kernel hacker, but with little chance to write code (><)o
Maintainer of perf-probe and kprobes
3
First of all
This presentation is almost 100% about shell scripts.
It is not about the kernel's C source code.
4
What is a container?
Container == Docker?
There are other OSS implementations!
LXC
runc
OpenVZ
etc…
So what is a container...?
5
What is Docker?
Docker provides many container-related features.
Containerization
Packaging software
Managing layers and their catalog
REST API
etc…
How does it work??
6
Docker is Great, but...
It seems a bit... too BIG
All the features are hidden in one binary
It is hard to see how it works
Remember the Unix philosophy
Keep It Simple, Stupid
We can do it with existing tools
7
Let's mimic it!
Let's try to make a minimal container
How to use namespaces
How to bind-mount device files
How to change the rootfs with chroot/pivot_root
How to use capabilities, CPUSET, etc.
Let's try to overlay layers
Now we have overlayfs!
How to manage layers
8
MINCS
Minimum Container Shell-scripts: https://github.com/mhiramat/mincs
Basic functions
Use PID/Net/UTS/Mount namespaces
Layering with overlayfs
Capabilities, CPUSET and more
POSIX shell script (not a bash script)
Works with the busybox shell and dash
9
The MINCS
Frontend
minc
marten
polecat
Backend
minc-exec
minc-coat
minc-leash
minc-farm
minc-trapper
10
Frontend Scripts
Frontends of MINCS
minc: run a command in a container
marten: manage layered container images
polecat: make a self-executable containerized command
Frontend == option parsing
Set the options as environment variables and call the backend scripts
The marten/minc-farm pair is an exception
11
minc
The main tool of MINCS
Runs a command in a container
Works like chroot
(Or like docker run? :)
Sets up namespaces and a workspace with overlayfs
Needs no container image, unlike Docker
Needs no separate rootfs directory, unlike chroot (it can reuse the current rootfs)
netns is not enabled by default
[mhiramat@localhost mincs]$ sudo ./minc ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:58 ?        00:00:00 ps -ef
12
minc: Usage
Usage: minc [options] [command]
Options:
-r/--root ROOTDIR Specify a directory to use as the rootfs. If omitted, use “/”.
-t/--temp TEMPDIR Specify a working directory. If omitted, a temporary directory is created.
-k Do not remove the working directory
--name UTSNAME Specify the host name in the container
--debug Show the debug log
If the command is omitted, run $SHELL.
13
Dive into the shell script
Let's look into the minc command
Phase 1: Parse the command line and set up environment variables.
Phase 2: Invoke minc-exec
Set up netns and cpumask (if needed)
Move into the new namespaces
Get the correct PID and set up the UTS name
Set up the rootfs for the container
Bind-mount device files
Unmount the original mounts
Chroot into the new rootfs and set up capabilities
14
Minc: command line parsing
Case and while loop
getopts is not used (not flexible enough; POSIX getopts has no long options like --name)
while { case & shift } loop
Mainly sets environment variables
After the loop
Call minc-farm to get the image by its UUID
Register post-processing with the trap command
Call minc-exec
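A minimal POSIX sh sketch of such a while/case/shift loop, covering the options from the usage slide. It is not the actual minc code: the MINC_* variable names, need_arg and the MINC_NARGS counter are hypothetical. Because a function cannot shift its caller's arguments, it records how many words it consumed instead.

```sh
# Hypothetical helper: fail unless the option has an argument.
# $1 = number of remaining words ($#), $2 = the option name
need_arg() {
  if [ "$1" -lt 2 ]; then
    echo "minc: option $2 requires an argument" >&2
    return 1
  fi
}

# Parse minc-style options into (hypothetical) MINC_* environment variables.
# On success MINC_NARGS holds how many words were options.
parse_minc_options() {
  MINC_NARGS=$#
  while [ $# -gt 0 ]; do
    case "$1" in
      -r|--root)
        need_arg $# "$1" || return 1
        MINC_BASEDIR=$2; shift 2 ;;
      -t|--temp)
        need_arg $# "$1" || return 1
        MINC_TMPDIR=$2; shift 2 ;;
      -k)      MINC_KEEPDIR=1; shift ;;
      --name)
        need_arg $# "$1" || return 1
        MINC_UTSNAME=$2; shift 2 ;;
      --debug) MINC_DEBUG=1; shift ;;
      --)      shift; break ;;
      -*)      echo "minc: unknown option: $1" >&2; return 1 ;;
      *)       break ;;   # first non-option word starts the command
    esac
  done
  MINC_NARGS=$((MINC_NARGS - $#))
  # backend scripts read these from the environment
  export MINC_BASEDIR MINC_TMPDIR MINC_KEEPDIR MINC_UTSNAME MINC_DEBUG
}
```

A caller would run `parse_minc_options "$@" || exit 1; shift "$MINC_NARGS"` and then `[ $# -eq 0 ] && set -- "$SHELL"`, matching "If the command is omitted, run $SHELL".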
15
Minc-exec(1) : Overview
Self-executing shell script
unshare needs a command to execute, so the script calls itself
This is for historical reasons: minc-exec used to be a single script called chns
The first execution is outside the container
Set up netns and cpuset
Call unshare to make a container (namespaces)
The second execution is inside the container
The script switches its behavior by checking PID == 1
Hide things from the program running in the container
Device files / unused mount points
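The PID == 1 switch can be sketched as a tiny POSIX sh function. This is an illustration, not minc-exec itself, and minc_stage is a hypothetical name. It relies on the fact that with `unshare -p -f` the forked child becomes PID 1 of the new PID namespace, so when the script re-executes itself there, `$$` is 1.

```sh
# Hypothetical helper: report which half of a self-executing script should run.
# $1 is the PID to check; it defaults to this shell's own PID ($$).
minc_stage() {
  if [ "${1:-$$}" -eq 1 ]; then
    echo inside    # second run: PID 1 in the new PID namespace
  else
    echo outside   # first run: still in the host namespaces
  fi
}
```

The script body could then branch with `case $(minc_stage) in outside) ... ;; inside) ... ;; esac`: the outside branch sets up netns/cpuset and re-executes "$0" via unshare, the inside branch hides devices and mount points before running the command. POSIX keeps `$$` at the parent shell's value inside `$(...)`, so the check also works there.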
16
Minc-exec(2) : netns/cpuset
netns
Use “ip netns” to create a new network namespace if needed
Use the trap command to remove it when the shell exits
Just create a veth pair between the namespaces
No IP address is assigned
We can use “pipework” for more networking options
CPUSET
Just set a CPU bitmask by using taskset.
Still not using cgroups
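taskset takes the CPU set as a hexadecimal bitmask in which bit N allows CPU N. A small POSIX sh sketch of turning a CPU list into that mask; cpulist_to_mask is a hypothetical helper, not part of MINCS (taskset's -c option also accepts a list directly).

```sh
# Hypothetical helper: convert a comma-separated CPU list ("0,2,3") into the
# hex bitmask taskset expects. Shell arithmetic limits this to CPUs 0..62.
cpulist_to_mask() {
  mask=0
  old_ifs=$IFS
  IFS=,
  for cpu in $1; do
    mask=$(( mask | (1 << cpu) ))   # set bit number $cpu
  done
  IFS=$old_ifs
  printf '%x\n' "$mask"
}
```

For example, `taskset 0x"$(cpulist_to_mask 0,2,3)" command` uses the mask 0xd, so the command may only run on CPUs 0, 2 and 3.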
17
Intermission: Trap command
Trap is great :)
We can handle signals and exits
It can call shell functions
minc usually uses trap to...
Remove temporary files/the PID file
Show informational messages on exit
Suppress ^C
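A minimal POSIX sh sketch of the cleanup pattern: create a working directory and register an EXIT trap that removes it and prints a message. setup_workdir and MINC_TMPDIR are hypothetical names. Here the INT trap turns ^C into a normal exit so the cleanup still runs; `trap '' INT` would suppress ^C entirely instead.

```sh
# Hypothetical sketch: create a work dir that is removed however the shell exits.
setup_workdir() {
  MINC_TMPDIR=$(mktemp -d "${TMPDIR:-/tmp}/minc$$-XXXXXX") || exit 1
  # Single quotes: $MINC_TMPDIR is expanded when the trap fires
  trap 'rm -rf "$MINC_TMPDIR"; echo "minc: removed $MINC_TMPDIR" >&2' EXIT
  # Turn ^C into an ordinary exit so the EXIT trap above still runs
  trap 'exit 130' INT
}
```

The mktemp template produces names shaped like /tmp/minc1012-NpuyIA. A -k style option could simply skip the EXIT trap to keep the directory.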
18
Minc-exec(3): Change namespace
Use unshare to change namespaces
Run unshare, passing $0
PID, mount, IPC and UTS namespaces are unshared:
unshare -iumpf "$0" "$@"
For the netns, we use ip netns exec:
ip netns exec $MINC_NETNS unshare -iumpf "$0" "$@"
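Building the command in the positional parameters keeps "$@" correctly quoted while optionally prefixing `ip netns exec`. A sketch under assumptions: enter_namespaces and the DRYRUN switch (print instead of exec) are hypothetical; MINC_NETNS is the name used on the slide.

```sh
# Hypothetical sketch: re-run this script ($0) inside new namespaces.
# -i/-u/-m/-p unshare IPC, UTS, mount and PID; -f forks so the child is PID 1.
enter_namespaces() {
  if [ -n "$MINC_NETNS" ]; then
    # the network namespace must already exist ("ip netns add ...")
    set -- ip netns exec "$MINC_NETNS" unshare -iumpf "$0" "$@"
  else
    set -- unshare -iumpf "$0" "$@"
  fi
  if [ -n "$DRYRUN" ]; then
    echo "$@"        # only show what would be executed
  else
    exec "$@"        # replace this shell; needs root
  fi
}
```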
19
Minc-exec(4): Setup PID and utsname
Get the original PID (the PID in the parent namespace)
The PID outside the container is the one to use for sending signals
Since the unshare command forks, we know the PID inside the container (it is 1).
Even with a separate mount namespace, /proc stays the same until we remount it.
This means /proc/self still shows our PID as seen from the parent namespace.
Set up utsname
Use the hostname command to set the utsname
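A sketch of both steps, assuming it runs inside the new namespaces before /proc is remounted. get_outer_pid, set_utsname, MINC_OUTER_PID and MINC_UTSNAME are hypothetical names. The trick is that `read` is a shell builtin, so the redirection opens /proc/self/stat in the shell itself, and field 1 is the shell's PID as the still-old /proc sees it.

```sh
# Hypothetical sketch: store this shell's PID as seen by the mounted /proc.
# Call it directly, not inside $(...), which would fork a subshell and
# make /proc/self point at that subshell instead.
get_outer_pid() {
  read -r MINC_OUTER_PID _rest < /proc/self/stat
}

# Set the host name of the new UTS namespace (needs root).
set_utsname() {
  if [ -n "$MINC_UTSNAME" ]; then
    hostname "$MINC_UTSNAME"
  fi
}
```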
20
Minc-exec(5):Mount namespace
Set up the mount namespace
In some environments (with systemd?), mount events propagate to other namespaces
mount --make-rprivate /
Keeps all mount operations from propagating
Overlay the workspace via minc-coat
The minc-coat backend overlays the rootfs image,
so the rootfs itself is not changed afterwards.
If it is fine to change the rootfs directly, use the --direct option
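Whether mounts would leak can be checked in /proc/self/mountinfo, where an optional "shared:N" field marks a shared mount (systemd mounts / as shared by default). A small sketch with a hypothetical helper, shared_mounts, that lists such mount points; after `mount --make-rprivate /` in the new mount namespace, it should print nothing there.

```sh
# Hypothetical helper: print mount points marked "shared:N" in a mountinfo
# file (default: /proc/self/mountinfo).
# Field 5 is the mount point; optional fields start at 7 and end at a lone "-".
shared_mounts() {
  awk '{
    for (i = 7; i <= NF && $i != "-"; i++)
      if ($i ~ /^shared:/) { print $5; break }
  }' "${1:-/proc/self/mountinfo}"
}
```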
21
Minc-coat: Implement overlays
Make root/, storage/ and work/ under the tempdir
root/: the mount point for overlayfs → $RD
storage/: the overlayfs upper directory → $UD
work/: the workdir for overlayfs → $WD
Build a new rootfs with overlayfs
Not only mount namespaces, but also layering for storage isolation
Some details differ depending on the version
Overlayfs in the upstream kernel:
mount -t overlay -o upperdir=$UD,lowerdir=$BASEDIR,workdir=$WD overlayfs $RD
Overlayfs for Ubuntu 14.10 (out-of-tree):
mount -t overlayfs -o upperdir=$UD,lowerdir=$BASEDIR overlayfs $RD
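A sketch of the preparation step: create root/, storage/ and work/ under a temporary directory and print the mount command for the given overlay flavour. coat_mount_cmd is a hypothetical helper that only prints the command (mounting needs root); on a real system the flavour could be picked by looking for "overlay" or "overlayfs" in /proc/filesystems.

```sh
# Hypothetical sketch of minc-coat style setup.
# $1: "overlay" (upstream) or "overlayfs" (old Ubuntu 14.10 module)
# $2: lower, read-only base directory; $3: temporary working directory
coat_mount_cmd() {
  fstype=$1 basedir=$2 tmpdir=$3
  RD=$tmpdir/root UD=$tmpdir/storage WD=$tmpdir/work
  mkdir -p "$RD" "$UD" "$WD" || return 1
  if [ "$fstype" = overlay ]; then
    # upstream overlayfs requires a workdir on the same filesystem as upperdir
    echo "mount -t overlay -o upperdir=$UD,lowerdir=$basedir,workdir=$WD overlayfs $RD"
  else
    echo "mount -t overlayfs -o upperdir=$UD,lowerdir=$basedir overlayfs $RD"
  fi
}
```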
22
Minc-exec(6): Special Files
Special files and directories
Make /etc, /dev, /sys and /proc on the new rootfs
Bind mounts under /dev
Touch dummy files and bind-mount the devices onto them (like symlinks)
/dev/console, /dev/null, /dev/zero, /dev/random, /dev/urandom, /dev/mqueue
(and others, if you need them)
/dev/pts is mounted with newinstance
Mount /proc for the new PID namespace
The old /proc should be remounted read-only.
Files that should be read-only (/proc/sys etc.) should be bind-mounted from the read-only /proc.
Bind-mount /sys
This could be skipped, or made read-only
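A dry-run-able sketch of these steps. setup_special_files, run and DRYRUN are hypothetical (with DRYRUN set, commands are printed instead of executed); /dev/mqueue and the read-only /proc handling are left out to keep it short.

```sh
# Print commands when DRYRUN is set, otherwise execute them.
run() {
  if [ -n "$DRYRUN" ]; then echo "$*"; else "$@"; fi
}

# $1: the new root directory (the overlayfs mount point)
setup_special_files() {
  rd=$1
  run mkdir -p "$rd/dev" "$rd/proc" "$rd/sys" "$rd/etc"
  run mount -t tmpfs tmpfs "$rd/dev"
  for dev in console null zero random urandom; do
    # a bind mount needs an existing target, so create an empty file first
    run touch "$rd/dev/$dev"
    run mount --bind "/dev/$dev" "$rd/dev/$dev"
  done
  # a private devpts instance, so the host's ptys stay invisible
  run mkdir -p "$rd/dev/pts"
  run mount -t devpts -o newinstance,ptmxmode=0666 devpts "$rd/dev/pts"
  run ln -s pts/ptmx "$rd/dev/ptmx"
  # a fresh procfs that reflects the new PID namespace
  run mount -t proc proc "$rd/proc"
  # /sys as a read-only bind mount
  run mount --bind /sys "$rd/sys"
  run mount -o remount,ro,bind "$rd/sys"
}
```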
23
Intermission: Debug
How to debug it?
To just check the commands, run it with --debug
This option enables “set -x”
If you want to break into it, insert “bash” (or any shell you like) into the script
You can do anything there :)
Or insert the command you want to run
MINCS is just a set of shell scripts
You can change it as you want.
24
Minc-exec(7): Post-process Mountpoint
Remove old mount points
If we keep them, they are still visible after chroot
Use pivot_root to unmount them
Let's monitor it with “df -h”
Finally, call minc-leash to chroot.
25
Before minc:
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        740M     0  740M   0% /dev
tmpfs           748M     0  748M   0% /dev/shm
tmpfs           748M  8.5M  740M   2% /run
tmpfs           748M     0  748M   0% /sys/fs/cgroup
/dev/sda2        15G  8.6G  6.5G  58% /
26
Special files:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  8.6G  6.5G  58% /
devtmpfs        740M     0  740M   0% /dev
tmpfs           748M     0  748M   0% /dev/shm
tmpfs           748M     0  748M   0% /sys/fs/cgroup
tmpfs           748M  8.5M  740M   2% /run
overlayfs        15G  8.6G  6.5G  58% /tmp/minc1012-NpuyIA/root
tmpfs           748M     0  748M   0% /tmp/minc1012-NpuyIA/root/dev
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/console
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/null
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/zero
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/random
devtmpfs        740M     0  740M   0% /tmp/minc1012-NpuyIA/root/dev/urandom
27
After the first pivot_root:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  8.6G  6.5G  58% /.orig
devtmpfs        740M     0  740M   0% /.orig/dev
tmpfs           748M     0  748M   0% /.orig/dev/shm
tmpfs           748M     0  748M   0% /.orig/sys/fs/cgroup
tmpfs           748M  8.5M  740M   2% /.orig/run
overlayfs        15G  8.6G  6.5G  58% /
tmpfs           748M     0  748M   0% /dev
devtmpfs        740M     0  740M   0% /dev/console
devtmpfs        740M     0  740M   0% /dev/null
devtmpfs        740M     0  740M   0% /dev/zero
devtmpfs        740M     0  740M   0% /dev/random
devtmpfs        740M     0  740M   0% /dev/urandom
28
After removing the old procfs, etc.:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  8.6G  6.5G  58% /.orig
overlayfs        15G  8.6G  6.5G  58% /
tmpfs           748M     0  748M   0% /dev
devtmpfs        740M     0  740M   0% /dev/console
devtmpfs        740M     0  740M   0% /dev/null
devtmpfs        740M     0  740M   0% /dev/zero
devtmpfs        740M     0  740M   0% /dev/random
devtmpfs        740M     0  740M   0% /dev/urandom
29
After the 2nd pivot_root and chroot to the new rootfs:
Filesystem      Size  Used Avail Use% Mounted on
overlayfs        15G  8.6G  6.5G  58% /
tmpfs           748M     0  748M   0% /dev
devtmpfs        740M     0  740M   0% /dev/console
devtmpfs        740M     0  740M   0% /dev/null
devtmpfs        740M     0  740M   0% /dev/zero
devtmpfs        740M     0  740M   0% /dev/random
devtmpfs        740M     0  740M   0% /dev/urandom
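For comparison, a simpler single-pivot variant of this sequence, roughly following the example in the pivot_root(8) manual page plus a lazy unmount: cd into the new root, pivot the old root onto a subdirectory, detach it, then chroot. This is not MINCS's two-step pivot_root; switch_root_sketch, run and DRYRUN are hypothetical, and the new root must be a mount point (such as the overlayfs $RD) in a private mount namespace.

```sh
# Print commands when DRYRUN is set, otherwise execute them.
run() {
  if [ -n "$DRYRUN" ]; then echo "$*"; else "$@"; fi
}

# $1: new root (must be a mount point); remaining args: command to run in it
switch_root_sketch() {
  rd=$1
  shift
  run mkdir -p "$rd/.orig"
  run cd "$rd"
  # the old root moves to /.orig and the current directory becomes /
  run pivot_root . .orig
  # lazily detach the old root together with everything mounted below it
  run umount -l /.orig
  run rmdir /.orig
  # make the new root this process's root directory as well
  run chroot . "$@"
}
```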
30
Minc-leash: capabilities and chroot
Leash() = “Least capabilities shell”
Limits capabilities and does chroot by using capsh (libcap)
Changes UID/GID too
If capability setting is skipped, it just does chroot
Wash() = “Wash out the environment variables”
MINCS uses environment variables internally, so clean them up
Unset all variables whose names start with MINC_
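A sketch of both ideas in POSIX sh. wash and leash here are illustrations, not the MINCS functions themselves. wash only sees exported variables (those listed by env), which are the ones a child process would inherit. leash hands the command to capsh as a single "-c" string, because capsh runs /bin/bash (inside the new root) with its trailing arguments; that is why escaping is fragile.

```sh
# Unset every exported variable whose name starts with MINC_.
wash() {
  for var in $(env | sed -n 's/^\(MINC_[A-Za-z0-9_]*\)=.*/\1/p'); do
    unset "$var"
  done
}

# $1: new root, $2: comma-separated capabilities to drop (may be empty),
# remaining args: the command. Needs root; not called in the test.
leash() {
  [ $# -ge 2 ] || return 1
  root=$1 drop=$2
  shift 2
  if [ -n "$drop" ] && command -v capsh >/dev/null 2>&1; then
    # --chroot comes first so that dropping cap_sys_chroot cannot break it
    exec capsh --chroot="$root" --drop="$drop" -- -c "$*"
  else
    exec chroot "$root" "$@"
  fi
}
```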
31
Use cases of MINCS
Good learning material for containers
If you hit some limitation in Docker, you can try MINCS and understand why.
Prototyping new features
Containers for embedded devices
Is it wrong to want to run applications in containers on embedded devices? :)
Docker (>14MB, Docker only) vs MINCS+Busybox (<4MB, including a shell and tools) → Boot2MINC
32
Boot2minc
Minimal ISO image + MINCS
https://github.com/mhiramat/boot2minc
Forked from Minimal Linux Live (https://github.com/ivandavidov/minimal)
Including
Linux kernel
Busybox (+ unshare patch)
MINCS
8MB image including the kernel (runs on qemu-kvm)
The size can be reduced further by optimizing the configuration
33
Marten: Manage container images
minc provides only the container feature
Should we prepare a rootfs via debootstrap?
How do we get the rootfs of Fedora/CentOS etc.?
We want to reuse the result of a previous container easily
Overlayfs-based container image manager
Identifies container images by Docker-like UUIDs
Tracks the dependencies between images
Imports images exported/saved by Docker
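To give an idea of what importing involves, a sketch that lists the layer IDs in a gzipped `docker save` archive. It assumes the legacy archive layout in which each layer is stored as <id>/layer.tar next to <id>/json; list_layers is a hypothetical helper, not marten's implementation.

```sh
# List layer IDs in a gzipped "docker save" archive (legacy layout assumed:
# one directory per layer, named by its hex ID, containing layer.tar).
list_layers() {
  tar -tzf "$1" |
    sed -n 's#^\(\./\)\{0,1\}\([0-9a-f][0-9a-f]*\)/layer\.tar$#\2#p'
}
```

Each listed ID could then be unpacked into its own directory and stacked as a lowerdir, with the parent recorded in the layer's json giving the dependency order.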
34
Demonstration
Minc
Marten
Boot2minc
35
TODO
minc
Work with pipework
Proper TTY support via tmux/screen
Use cgroups to limit CPU/memory/IO usage (minc-cage?)
Plugin support for btrfs and dm-thin
marten
A container execution command (like docker run)
Support OCI-compatible container export/import and signing
36
Known Issues
Testcases
Well, we can write them in shell script too :)
capsh
capsh only accepts an “sh -c” style command
It doesn't accept escape characters…
37
Conclusion
What I'd like to say is
“We can run a container by combining commands”
Docker etc. is not special; we already have the fundamental tools.
And
“Shell script is great!”
39
Example: Import image from Docker
# docker save centos | gzip - > centos.tar.gz
# marten import centos.tar.gz
Importing image: centos
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
5b12ef8fd57065237a6833039acc0e7f68e363c15d8abb5cacce7143a1f7de8a
8efe422e6104930bd0975c199faa15da985b6694513d2e873aa2da9ee402174c
# marten images
ID            SIZE  NAME
511136ea3c5a  4.0K  (noname)
5b12ef8fd570  4.0K  (noname)
8efe422e6104  224M  centos
# minc -r centos /bin/bash