runc init process of double fork

Kunal Kushwaha
September 24, 2019

Supporting slides that explain the runc nsexec.c code.

Transcript

  1. runC init, Kunal Kushwaha (@kunalkushwaha)

  2. Container Creation: Process Isolation
    - New process creation
    - Namespace settings
    - Cgroups
    - seccomp
    Code involved: nsexec.c and the libcontainer Go code.
  3. Why C code?
    - setns() (namespace setting): the system call changes the namespace of the calling thread only.
    - The Go runtime cannot guarantee that the current goroutine stays mapped to a particular OS thread.
    - Even runtime.LockOSThread() is not enough to guarantee this.
    - So C code helps here.
    - nsexec() is invoked before the Go runtime boots.
    - It does nothing if it is not invoked from container init, i.e. if the environment variable _LIBCONTAINER_INITPIPE is not set.
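
The "runs before the Go runtime boots, and does nothing unless the variable is set" behaviour can be sketched with a GCC/Clang constructor function. This is a minimal sketch and not the nsexec.c source: the names `early_init`, `init_pipe_requested` and `early_init_ran` are hypothetical, and treating an empty value as "not set" is an assumption of the sketch.

```c
#include <stdlib.h>

/* -1 until the constructor has run, then 0 (skipped) or 1 (init requested). */
static int early_init_state = -1;

/* Hypothetical helper: 1 when _LIBCONTAINER_INITPIPE is set and non-empty. */
int init_pipe_requested(void)
{
    const char *value = getenv("_LIBCONTAINER_INITPIPE");
    return value != NULL && *value != '\0';
}

/* Runs before main(); in a cgo build that is also before the Go runtime boots. */
__attribute__((constructor)) static void early_init(void)
{
    early_init_state = init_pipe_requested();
    if (!early_init_state)
        return; /* Not a container init: do nothing. */

    /* Namespace setup would go here, before any extra threads exist. */
}

/* Hypothetical helper: lets a caller confirm that the constructor already ran. */
int early_init_ran(void)
{
    return early_init_state != -1;
}
```

By the time `main()` starts, `early_init_ran()` already returns 1, which is the property the slide relies on: the namespace work is finished before the runtime has a chance to start additional threads.
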
  4. nsexec() works in 3 stages:
    1. Parent process (runc command context)
    2. Child process
    3. Grandchild (container application process)
  5. Flow
    1. Parent process (case JUMP_PARENT)
      a. Has a socket pair to communicate with the child and the grandchild.
      b. Clones the child process.
      c. Waits for both children to be ready, then starts communication.
    2. Child process (case JUMP_CHILD)
      a. Joins namespaces.
      b. Unshares namespaces.
    3. Grandchild (container application process) (case JUMP_INIT)
      a. Syncs with the parent.
      b. Unshares if asked by the parent.
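
The parent, child and grandchild hand-off can be sketched as below. This is a simplified illustration under stated assumptions, not the nsexec.c flow: `fork()` stands in for the `clone()` calls, no namespaces are joined or unshared, one socket pair per descendant is used, and `run_stages` plus the `MSG_*` values are hypothetical names.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical one-byte sync messages. */
enum { MSG_CHILD_READY = 1, MSG_GRANDCHILD_READY = 2 };

/* Returns 0 once both descendants have reported ready, -1 on any error. */
int run_stages(void)
{
    int child_sync[2], grandchild_sync[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, child_sync) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, grandchild_sync) < 0) {
        close(child_sync[0]);
        close(child_sync[1]);
        return -1;
    }

    pid_t child = fork();
    if (child == 0) {
        /* Stage 2: the child. It keeps only the write ends. */
        close(child_sync[0]);
        close(grandchild_sync[0]);

        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            /* Stage 3: the grandchild reports to the parent and exits. */
            char msg = MSG_GRANDCHILD_READY;
            _exit(write(grandchild_sync[1], &msg, 1) == 1 ? 0 : 1);
        }

        char msg = MSG_CHILD_READY;
        _exit(write(child_sync[1], &msg, 1) == 1 ? 0 : 1);
    }

    /* Stage 1: the parent. It keeps only the read ends. */
    close(child_sync[1]);
    close(grandchild_sync[1]);

    char from_child = 0, from_grandchild = 0;
    int ok = child > 0 &&
             read(child_sync[0], &from_child, 1) == 1 &&
             read(grandchild_sync[0], &from_grandchild, 1) == 1 &&
             from_child == MSG_CHILD_READY &&
             from_grandchild == MSG_GRANDCHILD_READY;

    close(child_sync[0]);
    close(grandchild_sync[0]);
    if (child > 0)
        waitpid(child, NULL, 0); /* Reap the intermediate child. */
    return ok ? 0 : -1;
}
```

The intermediate child exits right after reporting, so only the grandchild would live on as the container process. The parent learns that both are ready purely from the socket messages.
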
  6. Q & A
    Q: Why does runC use clone() and not fork()?
    A: In glibc, fork() and clone() both use the clone() system call internally. clone() lets the caller choose exactly what is shared between the parent and child process, whereas fork() creates a copy of the parent process.
    Useful info: the-difference-between-fork-vfork-exec-and-clone
    Q: Why does the child process unshare the user namespace first and the other namespaces afterwards?
    A: The user namespace is special:
      - It affects the ability to unshare other namespaces and is used as the context for privilege checks.
      - Behaviour is inconsistent across kernel versions: unsharing all namespaces together can result in an incorrect namespace object.
    Ref: https://github.com/opencontainers/runc/blob/master/libcontainer/nsenter/nsexec.c#L860
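
Both answers can be illustrated with a short sketch. The first helper shows that `clone()` flags decide what is shared: with `CLONE_VM` the child's write is visible to the parent, without it the child works on a copy, as after `fork()`. The second helper unshares the user namespace in a separate step before any other requested namespaces. The names `value_seen_by_parent` and `unshare_user_first` are hypothetical, the code is not taken from nsexec.c, and actually unsharing namespaces needs privileges that may not be available everywhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

/* Written by the child; the parent checks it after the child exits. */
static volatile int shared_value;

static int child_fn(void *arg)
{
    shared_value = *(int *)arg;
    return 0;
}

/*
 * Hypothetical helper: run child_fn via clone() with the given extra flags
 * and return what the parent sees in shared_value afterwards (-1 on error).
 */
int value_seen_by_parent(int extra_flags)
{
    /* waitpid() below cannot wait for a CLONE_THREAD child. */
    assert((extra_flags & CLONE_THREAD) == 0);

    int wanted = 42;
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_value = 0;
    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, extra_flags | SIGCHLD, &wanted);
    int ok = pid >= 0 && waitpid(pid, NULL, 0) == pid;
    free(stack);
    return ok ? shared_value : -1;
}

/* Hypothetical helper: unshare the user namespace first, then the rest. */
int unshare_user_first(int flags)
{
    if (flags & CLONE_NEWUSER) {
        if (unshare(CLONE_NEWUSER) < 0)
            return -1;
        flags &= ~CLONE_NEWUSER;
    }
    if (flags != 0 && unshare(flags) < 0)
        return -1;
    return 0;
}
```

Calling `value_seen_by_parent(CLONE_VM)` yields the child's value, while `value_seen_by_parent(0)` yields the parent's untouched 0. A call such as `unshare_user_first(CLONE_NEWUSER | CLONE_NEWNS)` performs two `unshare()` calls in the order the answer describes.
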