Slide 1

Slide 1 text

Experimental Docker Checkpoint and Restore with CRIU
Docker Meetup, September 17, 2014
Saied Kazemi (saied@)

Slide 2

Slide 2 text

Ultimate Goal
● Native container Checkpoint and Restore (C/R) support in Docker to facilitate container migration
    host_a$ docker checkpoint
    host_b$ docker restore
● Actual C/R to be done with the Checkpoint Restore In Userspace (CRIU) utility

Slide 3

Slide 3 text

What is CRIU?
● An open source software tool for Linux (http://criu.org)
  ○ Freeze a running application
  ○ Checkpoint to a collection of “image” files
  ○ Restore later from image files
  ○ Application resumes execution from the point it was frozen
● Implemented mainly in userspace in C
  ○ Presented by Pavel Emelianov, OpenVZ team leader, in July 2011
  ○ Version 1.0 released in November 2013
  ○ Version 1.3 with Docker container support released in August 2014
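The freeze, checkpoint, and restore cycle described above can be sketched with the criu command line. This is a minimal dry-run sketch, not the commands used in the talk: the PID 1234 and the image directory /tmp/ckpt are made-up values, and the functions only print the commands so the example is safe to run on a machine without CRIU. The --shell-job option is assumed to be needed because the example process is started from a terminal.

```shell
#!/bin/sh
# Dry-run sketch: build the CRIU command lines and print them instead of
# executing them. The PID and directory below are illustrative assumptions.

# Print a command that freezes the process tree rooted at PID $1 and
# writes its image files into directory $2.
criu_dump_cmd() {
    echo "criu dump -t $1 -D $2 --shell-job"
}

# Print a command that restores the process tree from the image files
# in directory $1; execution resumes from the point of the dump.
criu_restore_cmd() {
    echo "criu restore -D $1 --shell-job"
}

criu_dump_cmd 1234 /tmp/ckpt
criu_restore_cmd /tmp/ckpt
```

Running the printed commands for real typically requires root privileges and a kernel built with checkpoint/restore support.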

Slide 4

Slide 4 text

CRIU Usage Scenarios
● A set of ideas listed at http://criu.org/Usage_scenarios
  ○ Container live migration
  ○ Slow-boot services speed up
  ○ Reboot-less kernel upgrade
  ○ Networking load balancing
  ○ HPC issues
  ○ Desktop environment suspend/resume
  ○ Freeze for inspection and/or debugging
  ○ ...
● Container live migration was the main use case for CRIU

Slide 5

Slide 5 text

CRIU and Docker
● There were a number of issues C/R’ing Docker containers
  ○ see backup slides for details
● Excellent support from upstream CRIU developers and community
● With CRIU 1.3, it is now possible to C/R Docker containers
  ○ Works with AUFS (default) as well as VFS and UnionFS
  ○ Device Mapper not tested
● No native support in Docker yet
● No container migration yet

Slide 6

Slide 6 text

Docker and its Containers
[Diagram labels: docker run ..., docker -d, init, grandchild, Container 1, Container 2, Global namespace, external bind mount]

Slide 7

Slide 7 text

Live Demo
● Demo using docker_cr.sh helper script
● Demo using nsinit binary

Slide 8

Slide 8 text

Docker C/R Options
● There are two options to checkpoint and restore:
  A) The Docker daemon and (all) its containers
  B) An individual container (without the Docker daemon)
● Option A isn’t currently possible with CRIU due to nested namespaces
  ○ Option B is possible on the same machine
  ○ Will look into adding migration support

Slide 9

Slide 9 text

Native C/R Support
● Why do we need native C/R support in Docker?
  ○ Container state
    ■ After checkpoint, Docker thinks the container has finished and exited
    ■ After restore, Docker doesn’t know the container has resumed
  ○ Process tree ownership
    ■ Restored process tree is a child of init, not of the Docker daemon
  ○ Other uncovered issues...

Slide 10

Slide 10 text

Next Steps
● Add C/R support to libcontainer
  ○ use nsinit for validation
● Add C/R support to Docker
● Add migration support

Slide 11

Slide 11 text

Backup Slides

Slide 12

Slide 12 text

[Diagram labels: container, docker, exec driver, nsinit, libcontainer, criu]

Slide 13

Slide 13 text

Issues and Solutions
● Issue: nested PID namespaces
  ○ two ways to start a container: interactive ($ docker run -i ...) or detached ($ docker run -d ...)
  ○ in both cases the process is a child of the docker daemon (not the docker client) running in global PID namespace
  ○ CRIU does not support nested PID namespaces
● Solution: C/R is done on process tree without Docker
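Checkpointing the process tree without Docker means pointing CRIU at the container's init process as seen from the global PID namespace. The sketch below shows one way that PID might be looked up with docker inspect and turned into a dump command. The function names, the container name "web", and the directory /tmp/ckpt are hypothetical, and the dump command is only printed, not executed.

```shell
#!/bin/sh
# Sketch under assumptions: function names and example values are
# illustrative and do not come from the docker_cr.sh script.

# Look up the PID, in the global PID namespace, of a container's
# init process.
container_pid() {
    docker inspect --format '{{.State.Pid}}' "$1"
}

# Print (do not run) a dump command that targets that process tree
# directly, bypassing the Docker daemon.
dump_container_cmd() {
    # $1 = container name or ID, $2 = image directory
    pid=$(container_pid "$1") || return 1
    echo "criu dump -t $pid -D $2"
}

# Example (requires a running Docker daemon and a container named "web"):
#   dump_container_cmd web /tmp/ckpt
```

Because the dump bypasses the daemon, Docker's own view of the container is not updated, which is the state mismatch described under Native C/R Support.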

Slide 14

Slide 14 text

Issues and Solutions
● Issue: external bind mounts
  ○ /etc/{hosts,hostname} from container’s config dir
  ○ /etc/resolv.conf from container’s config dir (or the host’s /etc/resolv.conf in older versions)
  ○ /.dockerinit from Docker’s init dir in older versions
  ○ bind mount paths for files in /etc can be obtained with docker inspect, but not for /.dockerinit
● Solution: external bind mount support with --ext-mount-map
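As a rough illustration of how --ext-mount-map might be used for the three /etc files, the sketch below builds the option list for both directions. On dump, each option maps the mount point inside the container to a key stored in the images; on restore, it maps that key to the host path to bind mount. The function name and the config directory /var/lib/docker/containers/CID are assumptions made for the example, and /.dockerinit is left out because its host path cannot be obtained from docker inspect.

```shell
#!/bin/sh
# Sketch under assumptions: ext_mount_args is a hypothetical helper and
# the config directory passed to it is an illustrative path.

# Print the --ext-mount-map options for the external bind mounts.
# $1 = "dump" or "restore", $2 = container's config dir on the host
ext_mount_args() {
    mode=$1
    dir=$2
    args=""
    for f in hosts hostname resolv.conf; do
        if [ "$mode" = dump ]; then
            # dump: <mount point in container>:<key saved in the images>
            args="$args --ext-mount-map /etc/$f:/etc/$f"
        else
            # restore: <key from the images>:<host path to bind mount>
            args="$args --ext-mount-map /etc/$f:$dir/$f"
        fi
    done
    # Strip the single leading space left by the loop.
    echo "${args# }"
}

ext_mount_args dump /var/lib/docker/containers/CID
ext_mount_args restore /var/lib/docker/containers/CID
```

The printed options would be appended to the criu dump and criu restore command lines respectively.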

Slide 15

Slide 15 text

Issues and Solutions
● Issue: /dev/null bind mount over /proc/kcore
  ○ appeared in Docker 0.10.0, caused dump failure
● Solution: patch 494c044
● Issue: dumpable flag
  ○ appeared in Docker 0.11.1 (libcontainer dropping all capabilities, keeping those specified in config)
  ○ value is set to 2, which cannot be restored
● Solution: patch 8870aa1

Slide 16

Slide 16 text

Issues and Solutions
● Issue: restoring cgroups subdirs and properties
  ○ after checkpointing, Docker daemon would remove container’s cgroups subdirs (because the container has “exited”)
  ○ after restoring subdirs, properties were not restored
● Solution: cgroups restoration support with --manage-cgroups
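To see which cgroups subdirs are at stake, the controller paths of a process can be read from /proc/<pid>/cgroup before checkpointing. The sketch below is only an inspection aid under assumptions, not how --manage-cgroups works internally: cgroup_paths is a hypothetical helper that parses a file in that format and prints one "controllers path" pair per line.

```shell
#!/bin/sh
# Sketch under assumptions: cgroup_paths is a hypothetical helper for
# inspection only; it does not create or restore any cgroups.

# Print "controllers path" for each line of a file in
# /proc/<pid>/cgroup format: "hierarchy-ID:controllers:path".
cgroup_paths() {
    while IFS=: read -r _id ctrl path; do
        # An empty controller field (as on a unified hierarchy) is
        # printed as "unified".
        echo "${ctrl:-unified} $path"
    done < "$1"
}

# Example: list the cgroup paths of the current shell.
cgroup_paths /proc/self/cgroup
```

Comparing this listing before the checkpoint and after the restore would show whether the subdirs removed by the Docker daemon have been recreated.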

Slide 17

Slide 17 text

Issues and Solutions
● Issue: stdin in detached mode
  ○ container’s stdin set to the global /dev/null in detached mode
    $ docker run -d …
● Solution: fixed in Docker; use --evasive-devices for older Docker versions

Slide 18

Slide 18 text

Issues and Solutions
● Issue: AUFS
  ○ /proc/<pid>/map_files symbolic link paths point inside AUFS branches
  ○ CRIU gets confused seeing the same file in its physical location (in the branch) and its logical location (from the root of mount namespace)
  ○ fixing the kernel is the right solution but time-consuming to roll out
● Solution:
  ○ fixed in AUFS (but will take time to be available in all distros)
  ○ in the meantime, CRIU patch d8b41b6 will compensate for the problem