Transcript
Page 1: Seven problems of Linux Containers

parallels.com || openvz.org || criu.org

Seven Problems of Linux Containers

Kir Kolyshkin <[email protected]>

28 April 2013 LinuxFest Northwest

Page 2: Seven problems of Linux Containers

Seventy Seven Problems of Linux Containers

Kir Kolyshkin <[email protected]>

28 April 2013 LinuxFest Northwest

(of which I am going to cover six)

Page 3: Seven problems of Linux Containers

Problem 1: Effective virtualization

● Virtualization is partitioning
● Historical way: $M mainframes
● Modern way: virtual machines
● Problem: performance overhead
● Partial solution: hardware support (Intel VT, AMD-V)

Page 4: Seven problems of Linux Containers

Solution: isolation

● Run many isolated userspace instances on top of one single (Linux) kernel
● All processes see each other
– files, process information, network, shared memory, users, etc.
● Make them unsee it!

Page 5: Seven problems of Linux Containers

Page 6: Seven problems of Linux Containers

One historical way to unsee

chroot()

Page 7: Seven problems of Linux Containers

Namespaces

● Implemented in the Linux kernel
– PID
– net
– IPC
– UTS
– mnt
– user

● clone() with CLONE_NEW* flags
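As a minimal sketch of how the namespace list above maps onto clone() flags, the snippet below ORs together the CLONE_NEW* constants for a chosen set of namespaces. The numeric values are the ones defined in the kernel's <linux/sched.h>; the helper name clone_flags and the short namespace keys are hypothetical, chosen for this illustration only. It only computes the flag word and does not create any namespace (that needs clone() or unshare() with sufficient privileges).

```python
# Values of the CLONE_NEW* constants from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def clone_flags(namespaces):
    """OR together the CLONE_NEW* flag of every requested namespace.

    Hypothetical helper: the keys are the short names used on the slide.
    """
    flags = 0
    for name in namespaces:
        flags |= CLONE_FLAGS[name.lower()]
    return flags


# A container that wants every namespace listed on the slide.
full = clone_flags(["PID", "net", "IPC", "UTS", "mnt", "user"])
print(hex(full))
```

A container runtime would pass a flag word like this to clone() when starting the container's first process.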

Page 8: Seven problems of Linux Containers

Problem 2: Shared resources

● All containers share the same set of resources (CPU, RAM, disk, various kernel things ...)

● Need fair distribution of goods so everyone gets their share

● Need DoS prevention
● Need prioritization

– “All animals are equal, but some animals are more equal than others” -- George Orwell

Page 9: Seven problems of Linux Containers

Page 10: Seven problems of Linux Containers

Solution: OpenVZ resource controls

● OpenVZ:
– user beancounters (control 20 parameters)
– hierarchical CPU scheduler
– disk quota per container
– I/O priorities per container
● Dynamic control, can “resize” at runtime

Page 11: Seven problems of Linux Containers

Solution: cgroups

● Cgroups is a mechanism to control resources for hierarchical groups of processes
● Cgroups is nothing without controllers:
– blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio
● Cgroups are orthogonal to namespaces
● Still a work in progress (kernel memory)
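To make the controller idea concrete, here is a small sketch that parses text in the format of /proc/<pid>/cgroup, where each line reads hierarchy-ID:controller-list:cgroup-path. It shows which group a process belongs to for each controller. The function name parse_proc_cgroup and the sample text (including the /ct101 group name) are made up for this illustration.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path the process sits in.

    Expects lines of the form "hierarchy-ID:controller-list:cgroup-path",
    the format used by /proc/<pid>/cgroup.
    """
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        # Several controllers may be mounted on the same hierarchy.
        for controller in controllers.split(","):
            if controller:
                membership[controller] = path
    return membership


# Hypothetical sample; on a real system read /proc/self/cgroup instead.
SAMPLE = """\
4:memory:/ct101
3:cpu,cpuacct:/ct101
2:freezer:/
"""

membership = parse_proc_cgroup(SAMPLE)
for controller, path in sorted(membership.items()):
    print(controller, path)
```

Because cgroups are orthogonal to namespaces, this membership says nothing about which namespaces the process lives in.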

Page 12: Seven problems of Linux Containers

Problem 3: easy resources

● User Beancounters are complicated:
– http://wiki.openvz.org/UBC_consistency_check
– user has to set all these parameters
– some of which are interdependent
● We created a collection of valid configs,
● ... wrote a whole book about UBC
● ... and a set of tools to help

Page 13: Seven problems of Linux Containers

Page 14: Seven problems of Linux Containers

Solution: VSwap

● Only two primary parameters: RAM and swap
– others still exist, but no longer have to be set
● Swap is virtual, no actual I/O is performed
● Slow down to emulate real swap
● Only when an actual global RAM shortage occurs does virtual swap go into the real swap
● Currently only available in the OpenVZ kernel
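The two-parameter model can be illustrated with a toy accounting function. This is not the OpenVZ kernel's algorithm, only a sketch under stated assumptions: usage up to the RAM limit counts as RAM, the excess counts as virtual swap, and anything beyond RAM plus swap is refused. The function name vswap_account, the megabyte units and the use of MemoryError are all assumptions made for the example.

```python
def vswap_account(used_mb, ram_mb, swap_mb):
    """Split a container's memory use into RAM and virtual swap (toy model).

    Assumption: going over RAM + swap is treated as an allocation failure.
    """
    if used_mb > ram_mb + swap_mb:
        raise MemoryError("container is over its RAM + swap limit")
    in_ram = min(used_mb, ram_mb)
    # The excess is "swapped out" only on paper: no disk I/O happens unless
    # the host as a whole runs short of RAM.
    in_vswap = used_mb - in_ram
    return in_ram, in_vswap


# A container configured with 1024 MB RAM and 512 MB swap, using 1500 MB.
in_ram, in_vswap = vswap_account(1500, ram_mb=1024, swap_mb=512)
print(in_ram, in_vswap)
```

The slowdown that emulates real swap is not modelled here.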

Page 15: Seven problems of Linux Containers

Problem 4: fast live migration

● We can migrate an OpenVZ container from one physical server to another without a shutdown
● We want to do it fast even for huge containers
– huge disk: use shared storage
– huge RAM: ???

Page 16: Seven problems of Linux Containers

Normal migration process

● (Assuming shared storage)
1. Freeze the container
2. Dump its complete state to a dump file
3. Copy dump file to destination server
4. Undump
5. Unfreeze
● Problem: huge dump file

Page 17: Seven problems of Linux Containers

Solution 1: network swap

1. Dump the minimal memory, lock the rest
2. Restore the minimal memory, mark the rest as swapped out
3. Set up network swap from the source
4. Unfreeze. Missing RAM will be “swapped in”
5. Migrate the rest of RAM and kill it on source

Page 18: Seven problems of Linux Containers

Page 19: Seven problems of Linux Containers

Solution 1: network swap

1. Dump the minimal memory, lock the rest
2. Copy, undump what we have, mark the rest as swapped out
3. Set up network swap served from the source
4. Unfreeze. Missing RAM will be “swapped in”
5. Migrate the rest of RAM and kill it on source
● PROBLEM? Reliability, no way to rollback
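The network swap steps can be sketched as a small in-memory simulation. It is a toy, not the OpenVZ implementation: pages are dictionary entries, the "network swap" is a direct reference to the source's page table, and the class name LazyDestination and its methods are hypothetical.

```python
class LazyDestination:
    """Destination side of a network-swap migration (toy simulation)."""

    def __init__(self, source_pages, hot_pages):
        # Step 3: this reference stands in for network swap served by the source.
        self.source = source_pages
        # Step 2: restore only the minimal memory; the rest is "swapped out".
        self.local = {n: source_pages[n] for n in hot_pages}
        self.faults = 0

    def read(self, page_no):
        # Step 4: a missing page is "swapped in" from the source on first use.
        if page_no not in self.local:
            self.faults += 1
            self.local[page_no] = self.source[page_no]
        return self.local[page_no]

    def drain(self):
        # Step 5: migrate the rest of RAM, then drop the source copy.
        for page_no, data in self.source.items():
            self.local.setdefault(page_no, data)
        self.source = None


source = {n: f"page-{n}" for n in range(8)}
dest = LazyDestination(source, hot_pages=[0, 1])
value = dest.read(5)   # not restored yet, fetched from the source
dest.drain()
print(value, dest.faults, len(dest.local))
```

The reliability problem is visible in the model: between unfreeze and drain(), the container's state is split across two hosts, so losing either one loses the container.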

Page 20: Seven problems of Linux Containers

Solution 2: Iterative RAM migration

1. Ask kernel to track modified pages
2. Copy all memory to destination system
3. Ask kernel for list of modified pages
4. Copy those pages
5. GOTO 3 until satisfied
6. Freeze and do migration as usual
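The loop above can be sketched as a simulation, with a Python set standing in for the kernel's modified-page tracking. Everything here is illustrative: the names iterative_migrate and shrinking_workload, the threshold that defines "satisfied", and the workload that dirties fewer pages each round are all assumptions.

```python
def iterative_migrate(memory, run_workload, threshold=2, max_rounds=10):
    """Copy memory while the container runs, then finish while frozen."""
    dirty = set()                      # step 1: track modified pages

    def write(page_no, data):
        memory[page_no] = data
        dirty.add(page_no)

    dest = dict(memory)                # step 2: copy all memory
    rounds = 0
    while True:
        run_workload(write, rounds)    # the container is still running
        changed = sorted(dirty)        # step 3: list of modified pages
        # Step 5: "satisfied" here means few enough dirty pages are left.
        if len(changed) <= threshold or rounds >= max_rounds:
            break
        for page_no in changed:        # step 4: copy those pages
            dest[page_no] = memory[page_no]
        dirty.clear()
        rounds += 1
    # Step 6: container frozen, only the last few pages remain to copy.
    for page_no in changed:
        dest[page_no] = memory[page_no]
    return dest, rounds, len(changed)


def shrinking_workload(write, round_no):
    """Hypothetical workload that dirties fewer pages on every round."""
    for page_no in range(max(8 - 3 * round_no, 1)):
        write(page_no, f"v{round_no}-{page_no}")


memory = {n: "init" for n in range(16)}
dest, rounds, frozen_pages = iterative_migrate(memory, shrinking_workload)
print(rounds, frozen_pages, dest == memory)
```

Only the pages copied in the final step contribute to downtime, which is why the approach helps with huge RAM. A workload that never dirties fewer pages would end at max_rounds and fall back to a longer frozen copy.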

Page 21: Seven problems of Linux Containers

Problem 5: upstreaming

● OpenVZ was developed separately
● Then we wanted to merge it upstream (i.e. to vanilla Linux kernel)
● Problem?

Page 22: Seven problems of Linux Containers

Page 23: Seven problems of Linux Containers

Problem 5: upstreaming

● OpenVZ was developed separately
● Then we wanted to merge it upstream (i.e. to vanilla Linux kernel)
● Problem: upstream devs are not accepting our work

Page 24: Seven problems of Linux Containers

Solution 1: rewrite from scratch

● User Beancounters -> CGroups
● Did 2 rewrites for PID namespace until it finally got accepted
● Network namespace redone
● It works!
● About 1500 patches landed in vanilla
● Parallels made it to the top 10 contributors

Page 25: Seven problems of Linux Containers

Solution 2: CRIU

● We tried hard to merge checkpoint/restore
● Other people tried hard too, no luck
● Can't make it to the kernel, let's go userspace
● With minimal kernel intervention when required
● Kernel exports most of the information already, so let's just add missing bits and pieces

Page 26: Seven problems of Linux Containers

CRIU

● Checkpoint / Restore (mostly) In Userspace
● Tools currently at version 0.4
● Will do 1.0 release this year
● Kernel 3.8 has about 120 patches from us
– 95% of needed features are there
● Memory snapshot recently made it to -mm tree

Page 27: Seven problems of Linux Containers

Page 28: Seven problems of Linux Containers

Problem 6: common file system

● Container is just a directory on host, all CTs reside on the same FS
● File system journal is a bottleneck
● Lots of small-size files I/O on CT backup
● No sub-tree disk quota support in upstream
● No per-container snapshots
● Live migration: rsync -- changed inodes
● File system type and properties are fixed

Page 29: Seven problems of Linux Containers

Solution 1: LVM

● Works only on top of a block device
● Hard to manage (e.g. how to migrate a huge volume?)
● No dynamic allocation
● Complicated management

Page 30: Seven problems of Linux Containers

Solution 2: loop device

● VFS operations lead to double page-caching
– (already fixed in the recent kernels)
● No dynamic allocation, max space is used
● Limited feature set

Page 31: Seven problems of Linux Containers

Solution 3: ploop

● Basic idea: same as loop, just better
● Modular design:
– various image formats (qcow2 in TODO)
– various I/O backends
● More features:
– live resize
– instant live snapshots
– write tracker to help in live migration

Page 32: Seven problems of Linux Containers

Any problems questions?

● [email protected]
● Twitter: @kolyshkin
