Restore (C/R) support in Docker to facilitate container migration
    host_a$ docker checkpoint <container_id>
    host_b$ docker restore <container_id>
• Actual C/R to be done with the Checkpoint Restore In Userspace (CRIU) utility
software tool for Linux (http://criu.org)
  ◦ Freeze a running application
  ◦ Checkpoint to a collection of “image” files
  ◦ Restore later from image files
  ◦ Application resumes execution from the point it was frozen
• Implemented mainly in userspace in C
  ◦ Presented by Pavel Emelianov, OpenVZ team leader, in July 2011
  ◦ Version 1.0 released in November 2013
  ◦ Version 1.3 with Docker container support released in August 2014
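The freeze, checkpoint, and restore steps above map onto two CRIU subcommands, `criu dump` and `criu restore`. The following is a minimal sketch, assuming a process tree rooted at a known PID and a writable image directory (both placeholders). The helper names `criu_dump_cmd` and `criu_restore_cmd` are illustrative and not part of CRIU; they only print the commands instead of running them.

```shell
# Print (do not run) the CRIU command that checkpoints a process tree.
# $1 = PID of the tree's root process, $2 = directory for the image files.
criu_dump_cmd() {
    printf 'criu dump --tree %s --images-dir %s\n' "$1" "$2"
}

# Print the CRIU command that restores from the same image directory.
# $1 = directory holding the image files written by the dump.
criu_restore_cmd() {
    printf 'criu restore --images-dir %s\n' "$1"
}

# Example with placeholder values (PID 1234, images under /tmp/ckpt).
criu_dump_cmd 1234 /tmp/ckpt
criu_restore_cmd /tmp/ckpt
```

Running the printed commands generally requires root, and a process started from an interactive shell usually also needs `--shell-job` on both dump and restore.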
ideas listed at http://criu.org/Usage_scenarios
  ◦ Container live migration
  ◦ Slow-boot services speed up
  ◦ Reboot-less kernel upgrade
  ◦ Networking load balancing
  ◦ HPC issues
  ◦ Desktop environment suspend/resume
  ◦ Freeze for inspection and/or debugging
  ◦ ...
• Container live migration was the main use case for CRIU
number of issues C/R’ing Docker containers
  ◦ see backup slides for details
• Excellent support from upstream CRIU developers and community
• With CRIU 1.3, now possible to C/R
  ◦ Works with AUFS (default) as well as VFS and UnionFS
  ◦ Device Mapper not tested
• No native support in Docker yet
• No container migration yet
options to checkpoint and restore:
    A) The Docker daemon and (all) its containers
    B) An individual container (without the Docker daemon)
• Option A isn’t currently possible with CRIU due to nested namespaces
  ◦ Option B is possible on the same machine
  ◦ Will look into adding migration support
need native C/R support in Docker?
  ◦ Container state
    ▪ After checkpoint, Docker thinks the container has finished and exited
    ▪ After restore, Docker doesn’t know container has resumed
  ◦ Process tree ownership
    ▪ Restored process tree is a child of init, not Docker daemon
  ◦ Other uncovered issue...
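The container-state mismatch described above can be observed from the host. This is a rough sketch, assuming the caller already knows the host PID of the restored container's first process. The function name `docker_state_is_stale` and the overridable `DOCKER` variable are illustrative choices, not part of Docker or CRIU.

```shell
# DOCKER can be overridden (for example in tests); defaults to the real CLI.
DOCKER=${DOCKER:-docker}

# Succeeds (exit 0) when Docker reports the container as not running
# while the given PID is still alive, i.e. the two views disagree.
# $1 = container ID, $2 = host PID of the restored container's first process.
docker_state_is_stale() {
    running=$("$DOCKER" inspect --format '{{.State.Running}}' "$1") || return 2
    [ "$running" = "false" ] && kill -0 "$2" 2>/dev/null
}
```

A caller could run `docker_state_is_stale <container_id> <pid>` after a restore done outside Docker; a zero exit status indicates that Docker still considers the container exited even though its processes are alive.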
namespaces
  ◦ two ways to start a container: interactive ($ docker run -i ...) or detached ($ docker run -d ...)
  ◦ in both cases the process is a child of the docker daemon (not the docker client) running in global PID namespace
  ◦ CRIU does not support nested PID namespaces
• Solution: C/R is done on process tree without Docker
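Checkpointing the process tree without Docker starts from the host PID of the container's first process, which `docker inspect` reports as `State.Pid`. The sketch below assumes that PID is the root of the tree to dump. The helper names and the overridable `DOCKER` variable are illustrative, and the CRIU command is only printed, not run.

```shell
# DOCKER can be overridden (for example in tests); defaults to the real CLI.
DOCKER=${DOCKER:-docker}

# Print the host PID of the container's first process as Docker records it.
# $1 = container ID or name.
container_init_pid() {
    "$DOCKER" inspect --format '{{.State.Pid}}' "$1"
}

# Print the CRIU dump command for that process tree.
# $1 = container ID or name, $2 = image directory.
container_dump_cmd() {
    pid=$(container_init_pid "$1") || return 1
    printf 'criu dump --tree %s --images-dir %s\n' "$pid" "$2"
}
```

A real dump of a Docker container needs further options, such as the external mount and cgroup handling covered on the other issue slides.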
mounts
  ◦ /etc/{hosts,hostname} from container’s config dir
  ◦ /etc/resolv.conf from container’s config dir (or /etc/resolv.conf in older versions)
  ◦ /.dockerinit from Docker’s init dir in older versions
  ◦ bind mount paths for files in /etc can be obtained with docker inspect, but not for /.dockerinit
• Solution: external bind mount support with --ext-mount-map
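To illustrate the `--ext-mount-map` solution, the sketch below builds the restore-side options for the three /etc files. It assumes the `KEY:HOST_PATH` form of the option on restore and uses the in-container path as the key. The function name `ext_mount_restore_args` is hypothetical, and the container's config directory is passed in by the caller rather than guessed.

```shell
# Print --ext-mount-map options that point each bind-mounted /etc file
# at its copy in the container's config directory on the host.
# $1 = container config directory on the host (supplied by the caller).
ext_mount_restore_args() {
    cfg_dir=$1
    args=""
    for f in hosts hostname resolv.conf; do
        args="$args --ext-mount-map /etc/$f:$cfg_dir/$f"
    done
    # Drop the leading space left by the first append.
    printf '%s\n' "${args# }"
}
```

The matching dump command also has to declare the same mount points as external with `--ext-mount-map`, so that CRIU does not try to dump them.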
mount over /proc/kcore
  ◦ appeared in Docker 0.10.0, caused dump failure
• Solution: patch 494c044
• Issue: dumpable flag
  ◦ appeared in Docker 0.11.1 (libcontainer dropping all capabilities, keeping those specified in config)
  ◦ value is set to 2, which cannot be restored
• Solution: patch 8870aa1
subdirs and properties
  ◦ after checkpointing, Docker daemon would remove container’s cgroups subdirs (because the container has “exited”)
  ◦ after restoring subdirs, properties were not restored
• Solution: cgroups restoration support with --manage-cgroups
detached mode
  ◦ container’s stdin set to the global /dev/null in detached mode
    $ docker run -d …
• Solution: fixed in Docker; use --evasive-devices for older Docker versions
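The workaround flags named in these issue slides can be combined on one dump command. The sketch below is illustrative only: the function name `docker_dump_cmd` and its "old" switch are hypothetical, and the command is printed rather than run.

```shell
# Print a CRIU dump command with the workaround flags from the slides.
# $1 = PID of the container's first process, $2 = image directory,
# $3 = "old" to add --evasive-devices (for Docker versions that predate
#      the stdin fix); any other value, or none, omits it.
docker_dump_cmd() {
    cmd="criu dump --tree $1 --images-dir $2 --manage-cgroups"
    if [ "${3:-}" = "old" ]; then
        cmd="$cmd --evasive-devices"
    fi
    printf '%s\n' "$cmd"
}
```

For example, `docker_dump_cmd 4242 /tmp/img old` prints the command with both `--manage-cgroups` and `--evasive-devices` appended.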
/proc/<pid>/map_files symbolic link paths point inside AUFS branches
  ◦ CRIU gets confused seeing the same file in its physical location (in the branch) and its logical location (from the root of mount namespace)
  ◦ fixing the kernel is the right solution but time-consuming to roll out
• Solution:
  ◦ fixed in AUFS (but will take time to be available in all distros)
  ◦ in the meantime, CRIU patch d8b41b6 will compensate for the problem