See Docker from the Perspective of Linux Process
Allen Sun, [email protected]
Agenda
1. Prerequisites: Linux process (do_fork / copy_process), namespaces
2. How Docker deals with processes: dockerinit, ENTRYPOINT, CMD
syscall: fork()
Process A calls fork().
Parent (original PID): Process A continues and later calls wait().
Child (new PID): Process B calls execve() and executes a different program!
When the child calls exit(), it becomes a ZOMBIE and the parent receives SIGCHLD.
The parent's wait() then cleans up the zombie.
Reference: http://www.lynx.com/the-fork-call-posix-processes-and-parent-child-relationships
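The lifecycle above can be sketched in a few lines. This is a minimal illustration in Python, not something taken from the slides, and it assumes a Linux or other POSIX system; the helper name spawn_and_wait is hypothetical. The parent keeps its PID, the child gets a new one and replaces its program image with execv(), and the parent's waitpid() reaps the child so that it does not remain a zombie.

```python
import os
import sys
from typing import List, Tuple


def spawn_and_wait(argv: List[str]) -> Tuple[int, int]:
    """fork(), execv() in the child, waitpid() in the parent.

    Returns (child_pid, exit_code). A negative exit_code means the child
    was killed by that signal number. Hypothetical helper for illustration.
    """
    pid = os.fork()
    if pid == 0:
        # Child: new PID, still running a copy of the parent's program.
        try:
            os.execv(argv[0], argv)  # replace the program image, PID unchanged
        finally:
            os._exit(127)  # only reached if execv() failed
    # Parent: original PID. waitpid() blocks until the child exits and
    # removes its zombie entry from the process table.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return pid, -os.WTERMSIG(status)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, code = spawn_and_wait(
        [sys.executable, "-c", "import sys; sys.exit(7)"]
    )
    print("parent", os.getpid(), "child", child_pid, "exit code", code)
```

The child here runs a second Python interpreter that exits with status 7, so the parent sees exit code 7 and a child PID different from its own.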
do_fork
do_fork:
1. copy_process
2. determine PID
3. wake_up_new_task
4. wait_for_completion
copy_process:
1. check flags
2. dup and init task_struct
3. check resource limit
4. copy/share process details: copy_semundo, copy_namespaces, ……
5. set IDs, task relationships, etc.
Reference: Mauerer W. Professional Linux Kernel Architecture, Figure 2-7 and Figure 2-8. John Wiley & Sons, 2010.
task_struct and namespaces
struct task_struct
    struct nsproxy *nsproxy
struct nsproxy
    struct uts_namespace *uts_ns   (points to struct uts_namespace)
    struct mnt_namespace *mnt_ns   (points to struct mnt_namespace)
    struct net *net_ns             (points to struct net)
nsproxy proxies 5 kinds of namespace for a process:
1. uts_namespace
2. mnt_namespace
3. pid_namespace
4. ipc_namespace
5. net
user_namespace is not in nsproxy!
Based on Linux kernel 3.13
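One way to observe this from user space, as a hedged sketch: on Linux, the namespaces a process belongs to appear as symlinks under /proc/<pid>/ns. This procfs interface is not mentioned in the slides, and list_namespaces is a hypothetical helper. Two processes share a namespace when their link targets are identical.

```python
import os
from typing import Dict, Union


def list_namespaces(pid: Union[int, str] = "self") -> Dict[str, str]:
    """Map namespace name -> link target (e.g. 'uts' -> 'uts:[4026531838]').

    Reads the symlinks in /proc/<pid>/ns; entries that cannot be read
    (for example due to permissions) are skipped.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


if __name__ == "__main__":
    for name, target in list_namespaces().items():
        print(f"{name:20s} {target}")
```

Running it inside and outside a container and comparing the targets shows which namespaces differ between the two processes.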
What is in namespaces?
struct pid_namespace {
    …
    struct task_struct *child_reaper;
    …
    int level;
    struct pid_namespace *parent;
};
struct mnt_namespace {
    atomic_t count;
    struct mount *root;
    struct list_head list;
    ……
};
struct uts_namespace {
    struct kref kref;
    struct new_utsname name;
    struct user_namespace *user_ns;
    ……
};
struct new_utsname {
    char sysname[..];
    char nodename[..];
    char release[..];
    char version[..];
    char machine[..];
    char domainname[..];
};
……
Based on Linux kernel 3.13
Docker? Where is Docker?
Docker Client -> Docker Daemon -> Docker Container, Docker Container, ……
The Docker Daemon forks: fork -> do_fork -> copy_process -> copy_namespaces, followed by do_execve.
A Docker container is born just by a fork syscall followed by the exec of a process!
Difference (Docker's fork vs normal fork)
Special flags passed to the clone() syscall and handled in do_fork():
flag name        Linux kernel version
CLONE_NEWNS      2.4.19
CLONE_NEWUTS     2.6.19
CLONE_NEWIPC     2.6.19
CLONE_NEWPID     2.6.24
CLONE_NEWNET     2.6.29
CLONE_NEWUSER    3.8
Namespaces in Docker
func init() {
    namespaceList = Namespaces{
        {Key: "NEWNS", Value: syscall.CLONE_NEWNS, File: "mnt"},
        {Key: "NEWUTS", Value: syscall.CLONE_NEWUTS, File: "uts"},
        {Key: "NEWIPC", Value: syscall.CLONE_NEWIPC, File: "ipc"},
        {Key: "NEWUSER", Value: syscall.CLONE_NEWUSER, File: "user"},
        {Key: "NEWPID", Value: syscall.CLONE_NEWPID, File: "pid"},
        {Key: "NEWNET", Value: syscall.CLONE_NEWNET, File: "net"},
    }
}
Based on libcontainer v1.2.0
USER_NAMESPACE: not fully implemented in Docker
NET_NAMESPACE: not used in network modes "host" and "other container"
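As a rough illustration of how these flags combine, the sketch below ORs the CLONE_NEW* bits into the single flags word handed to clone(). The numeric values are the ones defined in the Linux header linux/sched.h. The function name clone_flags and the mode strings "bridge", "host" and "container" are assumptions made for this example, not libcontainer's API. Following the two notes above, the user namespace is left out by default and the net namespace is skipped when the container shares another network stack.

```python
# Flag values as defined in the Linux kernel header <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # mount namespace
CLONE_NEWUTS = 0x04000000   # hostname / domain name
CLONE_NEWIPC = 0x08000000   # System V IPC, POSIX message queues
CLONE_NEWUSER = 0x10000000  # user and group IDs
CLONE_NEWPID = 0x20000000   # process IDs
CLONE_NEWNET = 0x40000000   # network stack

# Hypothetical mode names: modes in which the container reuses an
# existing network namespace instead of getting its own.
SHARED_NETWORK_MODES = {"host", "container"}


def clone_flags(network_mode: str = "bridge", use_user_ns: bool = False) -> int:
    """Return the OR-ed namespace flags for a new container process."""
    flags = CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID
    if network_mode not in SHARED_NETWORK_MODES:
        flags |= CLONE_NEWNET
    if use_user_ns:
        flags |= CLONE_NEWUSER
    return flags


if __name__ == "__main__":
    for mode in ("bridge", "host", "container"):
        print(f"{mode:10s} {clone_flags(mode):#010x}")
```

Actually creating the namespaces with such a flags word requires privileges (or a user namespace), which is why the sketch only computes the value.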
What to Fork?
Docker Client -> Docker Daemon -> (fork with flags!) -> ? ? -> …… Docker Container
Fork a Docker container?
Docker container == process(es)?
What Process to Fork?
Whatever! It is just a process.
The process has only been forked, not exec'ed yet. The result looks like this:
task_struct ready
namespaces ready
other resources ready
The process is still static; no new program is running. :(
Then exec! exec what?
Have you ever heard of dockerinit, ENTRYPOINT or CMD in Docker?
name          description
dockerinit    The init program that runs first inside the new namespaces to set up mount, net namespaces and other things.
ENTRYPOINT    An ENTRYPOINT allows you to configure a container that will run as an executable.
CMD           The main purpose of a CMD is to provide defaults for an executing container.
Reference: https://docs.docker.com/reference/builder
dockerinit, ENTRYPOINT, CMD
Docker Daemon process -> fork -> exec: 1. dockerinit, 2. ENTRYPOINT, 3. CMD
fork: new namespaces
dockerinit: init namespaces
dockerinit, ENTRYPOINT and CMD are the only process (same PID)
Docker Daemon and dockerinit
Docker Daemon (parent) <- syncPipe -> dockerinit (child)
Usage: coordinate the order of operations between Docker Daemon and dockerinit.
dockerinit stays blocked as long as there is nothing to read from syncPipe.
Why?
How to coordinate?
Docker Daemon step -> dockerinit state
1. Create Command: the executable in the container (dockerinit)
2. Create syncPipe
3. Pass pipe to child
4. command.start(): fork and exec the command -> fork, new PID! syncPipe (nothing): blocked
5. SetupCgroups -> syncPipe (nothing): blocked, controlled by cgroup
6. Init network -> syncPipe (nothing): blocked, controlled by cgroup
7. Sync with child -> syncPipe (has networkState): read from syncPipe :)
Based on libcontainer v1.2.0
How to coordinate?
dockerinit:
1. SetupNetwork
2. SetupRoute
3. Init mount ns: set up devices, mount points and fs
4. Apply apparmor
5. execv ENTRYPOINT: exec, same PID!
x. execv CMD: exec, same PID! Finally, YOUR APP!
Docker Daemon:
8. command.wait()
Based on libcontainer v1.2.0
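The handshake can be imitated with an ordinary pipe. The following is a simplified sketch in Python under stated assumptions, not libcontainer's code: run_with_sync_pipe, the second "report" pipe and the keys in the state dictionary are all hypothetical and exist only so the example can show what the child received. The child blocks in read() until the parent, having finished its own setup, writes the network state and closes the pipe.

```python
import json
import os


def run_with_sync_pipe(network_state: dict) -> dict:
    """Fork a child that blocks on a sync pipe until the parent sends state."""
    sync_r, sync_w = os.pipe()      # parent -> child: network state
    report_r, report_w = os.pipe()  # child -> parent: demo-only report

    pid = os.fork()
    if pid == 0:
        # Child (plays the role of dockerinit).
        exit_code = 1
        try:
            os.close(sync_w)    # otherwise read() below never sees EOF
            os.close(report_r)
            with os.fdopen(sync_r, "rb") as pipe:
                payload = pipe.read()  # blocked until the parent writes and closes
            state = json.loads(payload)
            # A real init would now configure the network and exec the app.
            report = {"child_pid": os.getpid(), "state": state}
            os.write(report_w, json.dumps(report).encode())
            exit_code = 0
        finally:
            os._exit(exit_code)

    # Parent (plays the role of the daemon).
    os.close(sync_r)
    os.close(report_w)
    # The cgroup and network setup would happen here, while the child
    # is still blocked on the empty pipe.
    with os.fdopen(sync_w, "wb") as pipe:
        pipe.write(json.dumps(network_state).encode())
    # Leaving the with-block closes the write end and unblocks the child.
    with os.fdopen(report_r, "rb") as pipe:
        result = json.loads(pipe.read())
    _, status = os.waitpid(pid, 0)  # corresponds to command.wait()
    result["exit_code"] = os.WEXITSTATUS(status)
    return result
```

Calling run_with_sync_pipe({"ip": "172.17.0.2/16"}) returns the state as seen by the child together with the child's PID and exit code. Because the child cannot proceed before the parent writes, everything the parent does before the write is guaranteed to happen first.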
Docker Container
Docker Daemon process -> fork -> exec: 1. dockerinit, 2. ENTRYPOINT, 3. CMD (your application)
fork: new namespaces
dockerinit: init namespaces
cgroups applied
dockerinit, ENTRYPOINT and CMD are the only process (same PID)
Docker Container: process, process, process, process
Why Coordinate?
1. Docker Daemon needs to synchronize with dockerinit.
   dockerinit is blocked so that none of its children can escape from the cgroups.
2. Namespaces cannot be switched inside the Go runtime.
   dockerinit is blocked until Docker Daemon transfers the network details that will be used to set up the network interface in the new net namespace.
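DaoCloud (道客网络)
The first cloud computing company in China focused on container technology, founded by former EMC and VMware executives, with a core R&D team from Oracle, Microsoft, Alibaba, Shanda and other leading technology companies.
Founded in 2014, it received venture investment from Lightspeed (光速安振) in 2015. It is headquartered in Shanghai, with branch offices in Beijing and San Francisco.
Visit the DaoCloud website https://www.daocloud.io/ to start your own Docker cloud hosting.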
2014/12/09
THANK YOU !
Email: [email protected]
@莲子弗如清
WeChat: shlallen