system call changes w.r.t. the current process thread.
- The Go runtime cannot ensure that the current thread stays mapped to a particular system thread.
- Even runtime.LockOSThread() cannot ensure this.
- So ‘C’ code helps here.
- nsexec() is invoked before the Go runtime boots.
- It does nothing if it is not invoked from container init,
- i.e. if the environment variable _LIBCONTAINER_INITPIPE is not set.
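The "do nothing unless invoked from container init" check can be sketched roughly as below. This is a minimal illustration, not the code from nsexec.c: the function name init_pipe_fd() is hypothetical, and it assumes the variable carries a file descriptor number as a decimal string.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns the descriptor number stored in
 * _LIBCONTAINER_INITPIPE, or -1 when the variable is unset or is not a
 * valid non-negative integer. A caller would treat -1 as "not container
 * init" and return immediately, leaving the process untouched.
 */
int init_pipe_fd(void)
{
    const char *val = getenv("_LIBCONTAINER_INITPIPE");
    char *end = NULL;
    long fd;

    if (val == NULL || *val == '\0')
        return -1;              /* not invoked from container init */

    errno = 0;
    fd = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || fd < 0 || fd > INT_MAX)
        return -1;              /* assumption: a malformed value is treated as "not set" */

    return (int)fd;
}
```

A caller would run this check first and fall through to the normal Go program when it returns -1.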
to communicate with the child and grandchild.
   b. Clone the child process.
   c. Wait for both children to be ready, then start communication.
2. Child process (case JUMP_CHILD)
   a. Join namespaces.
   b. Unshare namespaces.
3. Grandchild (container application process) (case JUMP_INIT)
   a. Sync with the parent.
   b. Unshare if asked by the parent.
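The parent/child synchronisation in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single child instead of a child and grandchild, it uses fork() instead of clone() for brevity, and the names spawn_and_wait_ready() and SYNC_READY are hypothetical rather than taken from nsexec.c.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SYNC_READY 0x41 /* hypothetical "I am ready" message */

/*
 * Creates a socket pair, starts one child, and waits for the child to
 * report readiness. Returns the byte received from the child, or -1 on
 * any error.
 */
int spawn_and_wait_ready(void)
{
    int sv[2];
    unsigned char msg = 0;
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: keep only its end of the socket pair. */
        unsigned char ready = SYNC_READY;

        close(sv[0]);
        /* Joining and unsharing namespaces would happen here. */
        if (write(sv[1], &ready, 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: block until the child reports that it is ready. */
    close(sv[1]);
    ssize_t n = read(sv[0], &msg, 1);
    close(sv[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != 1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return msg;
}
```

With a grandchild, the parent would create a second socket pair and wait on both ends before it starts communicating.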
not fork()?
A: In glibc, fork() and clone() both internally call the clone() system call. clone() lets the caller choose exactly what is shared between the parent and child processes, whereas fork() creates a copy of the parent process.
Useful info: the-difference-between-fork-vfork-exec-and-clone

Q: Why does the child process unshare the user namespace first, and the other namespaces only afterwards?
A: The user namespace is special:
- It affects the ability to unshare other namespaces and is used as the context for privilege checks.
- Behaviour is inconsistent across kernels: unsharing all namespaces together can result in an incorrect namespace object.
Ref: https://github.com/opencontainers/runc/blob/master/libcontainer/nsenter/nsexec.c#L860
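The "user namespace first" ordering can be sketched as a two-step unshare(). This is only an illustration of the ordering, not the code behind the reference above: struct unshare_plan, plan_unshare() and unshare_user_first() are hypothetical names, and the sketch omits any setup a real runtime would perform between the two steps.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper type: the two unshare() steps for a set of flags. */
struct unshare_plan {
    int first; /* CLONE_NEWUSER if it was requested, otherwise 0 */
    int rest;  /* every other requested flag */
};

/* Splits the requested flags so the user namespace is handled on its own. */
struct unshare_plan plan_unshare(int flags)
{
    struct unshare_plan p;

    p.first = flags & CLONE_NEWUSER;
    p.rest = flags & ~CLONE_NEWUSER;
    return p;
}

/*
 * Unshares the user namespace first (if requested), then the remaining
 * namespaces in a second call. Returns 0 on success, or -1 with errno
 * set by unshare().
 */
int unshare_user_first(int flags)
{
    struct unshare_plan p = plan_unshare(flags);

    if (p.first != 0 && unshare(p.first) < 0)
        return -1;

    /* Assumption: uid/gid mapping setup would take place at this point. */

    if (p.rest != 0 && unshare(p.rest) < 0)
        return -1;

    return 0;
}
```

For example, plan_unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID) yields CLONE_NEWUSER as the first step and CLONE_NEWNS | CLONE_NEWPID as the second. Actually calling unshare() with these flags needs suitable privileges or a kernel that permits unprivileged user namespaces.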