Slide 1

Slide 1 text

Let’s take a glance at the future of containers! Uchio Kondo / GMO Pepabo, Inc. 2018.12.05 JapanContainerDays v18.12 Introduction to CRIU

Slide 2

Slide 2 text

Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura Technical department, Dev Productivity/R&D Team RubyKaigi 2019 at Fukuoka Local Organizer Chair on CNDJ at Fukuoka, 2019.04

Slide 3

Slide 3 text

http://www.fuk-ab.co.jp/network_int.html Hi from Fukuoka

Slide 4

Slide 4 text

Pepabo Fukuoka branch

Slide 5

Slide 5 text

Scope of Today’s Talk •What is CRIU? •Dive into the inside of CRIU •How can we use CRIU? • Migration • Reduction of bootstrap cost •How to combine CRIU into a runtime?

Slide 6

Slide 6 text

#containerdaysjp #TerraceRoom

Slide 7

Slide 7 text

Scope of Today’s Talk •What is CRIU? •Dive into the inside of CRIU •How can we use CRIU? • Migration • Reduction of bootstrap time •How to combine CRIU into a runtime? For Developers/Operators Using Containers For RUNTIME Developers

Slide 8

Slide 8 text

What is CRIU?

Slide 9

Slide 9 text

CRIU is: C/R In Userspace • A project to implement checkpoint/restore (C/R) functionality for Linux • Generally, VMs can be dumped and restored • CRIU provides this functionality for processes/containers • https://www.criu.org/Main_Page • Formerly known as crtools

Slide 10

Slide 10 text

What CRIU is for • CRIU is developed as a project of Virtuozzo • https://www.virtuozzo.com/ • CRIU is currently used by OpenVZ (https://openvz.org/), LXC/LXD and Docker. You can also use the criu command on its own.

Slide 11

Slide 11 text

CRIU can create checkpoints • ... for PROCESSES. • Dumping memory, fds, socket state...

Slide 12

Slide 12 text

Hey, containers are PROCESSES!!

Slide 13

Slide 13 text

Containers are PROCESSES • So CRIU can create checkpoints for containers! • CRIU has many features for checkpointing containers, e.g. network, namespace, cgroup...

Slide 14

Slide 14 text

Docker + CRIU demo

Slide 15

Slide 15 text

Note... https://github.com/moby/moby/issues/35691 • Checkpoint does not work with 18.03 and later... I used 17.06 for now • moby@master has fixed this issue

Slide 16

Slide 16 text

Enable docker checkpoint • Follow the instructions in https://github.com/docker/cli/blob/master/experimental/checkpoint-restore.md • Preparation: Install CRIU yourself (and Docker v17.06 :) • (Ubuntu Bionic has criu package v3.6) • Add the --experimental flag to the dockerd startup command, then restart
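As an alternative to the command-line flag, dockerd also reads an `experimental` key from its daemon configuration file. A minimal sketch, assuming the file is `/etc/docker/daemon.json`; the helper name `enable_experimental` is hypothetical and only transforms the file content, it does not write or restart anything:

```python
import json


def enable_experimental(config_text: str) -> str:
    """Return daemon.json content with "experimental" set to true.

    Existing keys are preserved; an empty file is treated as {}.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config["experimental"] = True
    return json.dumps(config, indent=2, sort_keys=True)


# Example: merge the flag into an existing (illustrative) configuration.
print(enable_experimental('{"storage-driver": "overlay2"}'))
```

After writing the result back to the configuration file, dockerd still has to be restarted, as the slide says.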

Slide 17

Slide 17 text

Enable docker checkpoint

Slide 18

Slide 18 text

Checkpoint/Restore demo • Run a simple container that counts up a number in memory • e.g. • Then create a checkpoint: • And restart with the --checkpoint option • Thus, the count is rolled back to the checkpoint! • (Without --checkpoint, the count restarts from 0)
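The commands shown on the slide did not survive in this transcript. The sketch below follows the example in Docker's checkpoint-restore document rather than the speaker's exact demo; the container name `cr`, the checkpoint name `checkpoint1` and the `busybox` counter loop are assumptions. It only builds and prints the command lines:

```python
import shlex

# Shell loop that keeps a counter in memory and prints it every second.
COUNTER_LOOP = "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"


def demo_commands(name: str = "cr", checkpoint: str = "checkpoint1"):
    """Return the docker invocations for the run / checkpoint / restore cycle."""
    return [
        # 1. Start the counting container in the background.
        ["docker", "run", "-d", "--name", name,
         "busybox", "/bin/sh", "-c", COUNTER_LOOP],
        # 2. Checkpoint it (the container is stopped by default).
        ["docker", "checkpoint", "create", name, checkpoint],
        # 3. Start it again from the saved state.
        ["docker", "start", "--checkpoint", checkpoint, name],
    ]


for argv in demo_commands():
    print(shlex.join(argv))
```

Watching `docker logs` for the container before and after step 3 is one way to see the counter resume from the checkpointed value.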

Slide 19

Slide 19 text

Checkpoint/Restore demo • Docker uses the criu command internally to checkpoint/restore

Slide 20

Slide 20 text

Resources about CRIU • Slides from the OpenVZ team: • https://www.slideshare.net/openvz/criu-13dusseldorf • One of the most reliable articles written in Japanese: • https://gihyo.jp/admin/serial/01/linux_containers/0032

Slide 21

Slide 21 text

Dive into the inside of CRIU

Slide 22

Slide 22 text

How can we invoke CRIU • There are 2 modes: • Via cli: criu command. Normally we use this • Via API: server/client model

Slide 23

Slide 23 text

cli model [Diagram: Shell, criu command, Kernel, Target process; syscalls, /proc files ...]
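For the cli mode, a dump/restore pair could look roughly like this. A minimal sketch: the helper names are hypothetical, the PID and image directory are placeholders, and only a few well-known criu options are used (`-t`, `-D`, `--shell-job`, `--leave-running`, `-d`):

```python
def criu_dump_argv(pid: int, images_dir: str,
                   shell_job: bool = False, leave_running: bool = False):
    """Build a `criu dump` command line for the process tree rooted at pid."""
    argv = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if shell_job:
        argv.append("--shell-job")      # process was started from a shell/tty
    if leave_running:
        argv.append("--leave-running")  # do not kill the tree after dumping
    return argv


def criu_restore_argv(images_dir: str,
                      shell_job: bool = False, detached: bool = False):
    """Build the matching `criu restore` command line."""
    argv = ["criu", "restore", "-D", images_dir]
    if shell_job:
        argv.append("--shell-job")
    if detached:
        argv.append("-d")               # detach criu from the restored tree
    return argv


print(" ".join(criu_dump_argv(1234, "/tmp/ckpt", shell_job=True)))
print(" ".join(criu_restore_argv("/tmp/ckpt", shell_job=True)))
```

Both commands normally need root privileges; the lists can be passed to `subprocess.run` as they are.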

Slide 24

Slide 24 text

Server/client model • CRIU can run as a service: criu service • Clients can access this service via a socket, using protobuf • CRIU provides some protobuf wrappers: • C wrapper (called libcriu) • Python wrapper • Go wrapper (experimental)
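Without any wrapper, a client talks to the service by sending one serialized protobuf message (`criu_req`) over a UNIX socket of type SOCK_SEQPACKET and reading one `criu_resp` back. The sketch below shows only that transport step; the function name is hypothetical, and the request bytes are assumed to come from an `rpc_pb2` module generated from CRIU's `rpc.proto`, which is not included here:

```python
import socket


def criu_rpc_roundtrip(socket_path: str, request: bytes,
                       max_reply: int = 1 << 16) -> bytes:
    """Send one serialized request to the CRIU service and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET) as s:
        s.connect(socket_path)
        s.send(request)
        return s.recv(max_reply)


# Usage with the generated protobuf module (assumption, not executed here):
#   import rpc_pb2 as rpc
#   req = rpc.criu_req()
#   req.type = rpc.CHECK
#   raw = criu_rpc_roundtrip("/path/to/criu_service.socket", req.SerializeToString())
#   resp = rpc.criu_resp()
#   resp.ParseFromString(raw)
```

The wrappers listed on the slide hide exactly this exchange behind a function-call API.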

Slide 25

Slide 25 text

Server/client model [Diagram: Program with libcriu, UNIX domain socket carrying protobuf, CRIU service, Kernel, Target process; syscalls, /proc files ...]

Slide 26

Slide 26 text

Details of the container C/R process • docker checkpoint/restore uses CRIU • Let’s look into how docker uses CRIU!!!

Slide 27

Slide 27 text

Processes that docker hosts

Slide 28

Slide 28 text

Detailed processes overview • dockerd \_ docker-containerd \_ containerd-shim \_ Container’s process

Slide 29

Slide 29 text

Linux Namespace • dockerd \_ docker-containerd \_ containerd-shim \_ Container’s process • Labels: Host’s Linux Namespace, Container’s Linux Namespace

Slide 30

Slide 30 text

Assigned cgroup • dockerd \_ docker-containerd \_ containerd-shim \_ Container’s process • Labels: Systemd-managed cgroup (docker.service), each container’s own cgroup

Slide 31

Slide 31 text

How CRIU makes images • CRIU gets information about the process via syscalls, /proc files, iproute2 utilities... [Diagram: CRIU, Kernel, Target process; syscalls, /proc files ...]

Slide 32

Slide 32 text

How CRIU makes images • Then it dumps them into images; normally the processes are killed at this point. • Images: Memory dump, Network conf, File descriptors, cgroup params, Process attrs, ... [Diagram: CRIU, Kernel, Target process; syscalls, /proc files ...]

Slide 33

Slide 33 text

How CRIU restores images • CRIU uses these images on restore • Images: Memory dump, Network conf, File descriptors, cgroup params, Process attrs, ... [Diagram: CRIU, Kernel, Restored process]

Slide 34

Slide 34 text

The raw images

Slide 35

Slide 35 text

crit: image utility • CRIU is bundled with the crit command, which can decode images in CRIU format.
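crit turns a binary image into JSON, which makes the images easy to inspect from a script. A minimal sketch, assuming `crit decode -i <image> --pretty` as the invocation; the helper names are hypothetical and `SAMPLE_PSTREE` is hand-written data in the shape of a decoded `pstree.img`, not output from a real dump:

```python
import json


def crit_decode_argv(image_path: str):
    """Build the crit command that decodes one image file to JSON."""
    return ["crit", "decode", "-i", image_path, "--pretty"]


def process_tree(decoded_json: str):
    """Extract (pid, ppid) pairs from a decoded pstree image."""
    doc = json.loads(decoded_json)
    return [(entry["pid"], entry["ppid"]) for entry in doc["entries"]]


# Illustrative data only: two processes, the second a child of the first.
SAMPLE_PSTREE = """
{"magic": "PSTREE",
 "entries": [
   {"pid": 1, "ppid": 0, "pgid": 1, "sid": 1, "threads": [1]},
   {"pid": 12, "ppid": 1, "pgid": 1, "sid": 1, "threads": [12, 13]}
 ]}
"""

print(crit_decode_argv("pstree.img"))
print(process_tree(SAMPLE_PSTREE))
```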

Slide 36

Slide 36 text

How can we use CRIU?

Slide 37

Slide 37 text

Case 1: Migration

Slide 38

Slide 38 text

P.Haul Project https://criu.org/P.Haul • Extension to make live migration with CRIU possible. • Super experimental • Not so active

Slide 39

Slide 39 text

P.Haul works? • Example of node-to-node migration using a sample process • https://github.com/checkpoint-restore/p.haul/blob/master/test/mtouch/HOWTO • There is also an example for docker 1.9.0... but I cannot reproduce it now • https://github.com/checkpoint-restore/p.haul/blob/master/test/docker/HOWTO

Slide 40

Slide 40 text

Migration demo • P.Haul looks too inactive! • So I implemented it using my own container runtime... I’ll show it later!

Slide 41

Slide 41 text

Case 2: Reduction of Bootstrap Cost

Slide 42

Slide 42 text

Containers with slow bootstrap • Especially big applications: Legacy Rails, JVM, ... • These applications cannot fully benefit from the lightweight nature of containers. • e.g. A small Rails project takes 2,500 ms or more to become ready. • A Jenkins project takes 5,000 ms or more to start listening on port 8080...

Slide 43

Slide 43 text

FYI: “FastContainer” • An architecture to handle containers • A container is bootstrapped on the first request, and automatically shut down after some minutes. • This means containers are restarted repeatedly, which forces them to stay refreshed and clean. • cf. “Phoenix Server” in the book “Infrastructure as Code” • Used in our PaaS service: https://mc.lolipop.jp • See @matsumotory’s paper/slides https://speakerdeck.com/matsumoto_r/fastcontainer-shi-xing-huan-jing-falsebian-hua-nisu-zao-kushi-ying-dekiruheng-chang-xing-wochi-tusisutemuakitekutiya

Slide 44

Slide 44 text

FYI: “FastContainer” [Diagram: Web Request, Web Proxy, Dispatcher, CMDB, FastContainer Runtime, FastContainer (killed)] Steps: 1. Check 2. Boot 3. Forward 4. Terminate
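The four steps can be modelled as a toy dispatcher. This is an in-memory sketch under stated assumptions, not the nginx-haconiwa implementation: the class and method names are hypothetical, a dict stands in for the CMDB, and "boot" and "terminate" only update that dict:

```python
class Dispatcher:
    """Toy model of the Check / Boot / Forward / Terminate cycle."""

    def __init__(self, lifetime_sec: float):
        self.lifetime_sec = lifetime_sec
        self.booted_at = {}  # container name -> boot timestamp

    def reap(self, now: float):
        """4. Terminate: drop every container that outlived its lifetime."""
        expired = [name for name, started in self.booted_at.items()
                   if now - started >= self.lifetime_sec]
        for name in expired:
            del self.booted_at[name]
        return expired

    def handle(self, name: str, now: float) -> str:
        """Serve one request and report what the dispatcher had to do."""
        self.reap(now)
        if name in self.booted_at:    # 1. Check: already running?
            return "forward"          # 3. Forward
        self.booted_at[name] = now    # 2. Boot on demand
        return "boot+forward"         # 3. Forward


d = Dispatcher(lifetime_sec=60)
print(d.handle("app", now=0))    # first request boots the container
print(d.handle("app", now=10))   # still alive, just forwarded
print(d.handle("app", now=70))   # lifetime passed, booted again
```

The "boot+forward" path is where a slow-starting application hurts, which is the cost the CRIU restore approach tries to cut.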

Slide 45

Slide 45 text

Experiment overview • Environment: https://github.com/FastContainer/nginx-haconiwa • Containers: 192.168.199.10 • Bench Host: 192.168.199.20 • Service Meshing: • Runtime:

Slide 46

Slide 46 text

Experiment codes
Benchmarker:
ab -g bench-rails.tsv \
  -s 120 -c 1 -t 90 -n 1000000 -k -l http://192.168.199.10/
Script for visualization:
import numpy as np
import matplotlib.pyplot as plt
# Columns 1 and 4 of ab's -g output: request time (epoch seconds) and total time (ms)
data = np.loadtxt("/path/to/bench-rails.tsv", delimiter="\t", skiprows=1, usecols=(1,4), dtype=int)
# Sort by request time, then turn the N x 2 rows into two series
data = np.rot90(sorted(data, key=lambda x:x[0]), k=-1)
plt.plot(data[0], data[1], linewidth=1, color="orange")
plt.ylim(0, 2700)
plt.show()

Slide 47

Slide 47 text

Needs fast boot up • One bottleneck of this architecture is “slow boot” apps • Comparison of Apache HTTPD vs Rails application: [Graph: ms/r over unixtime; series: Apache(phpinfo), RoR(no bootsnap)]

Slide 48

Slide 48 text

Normal FastCon lifecycle ngx_mruby Haconiwa Containers Restart on next request Stop after “Lifetime” Haconiwa

Slide 49

Slide 49 text

Lifecycle with CRIU ngx_mruby Haconiwa Containers ReSTORE on next request Make image just before stop, In async process haconiwa restore Image

Slide 50

Slide 50 text

Using CRIU to make boot fast • Comparison of hot-start Rails application and cold-start (from criu image) Rails: RoR(no bootsnap/From CRIU image) RoR(no bootsnap)

Slide 51

Slide 51 text

Misc.

Slide 52

Slide 52 text

No-downtime kernel upgrade? • Is it possible?: Yes(logically).

Slide 53

Slide 53 text

Kubernetes integration? • There seems to be no plan yet... (I want more info) • A project in a UBC class refers to this: • https://www.cs.ubc.ca/~bestchai/teaching/cs416_2017w2/project2/project_m6r8_s8u8_v5v8_y6x8_proposal.pdf

Slide 54

Slide 54 text

Checkpoint

Slide 55

Slide 55 text

Restore :)

Slide 56

Slide 56 text

How to combine CRIU into a runtime?

Slide 57

Slide 57 text

I’ll introduce My container runtime...

Slide 58

Slide 58 text

https://haconiwa.mruby.org/

Slide 59

Slide 59 text

Haconiwa • Highly configurable container runtime written in mruby • Not OCI-compatible for now (I am planning...) • Implemented basic container features: • Linux namespace, cgroup, chroot/pivot_root, capability/uid/gid, rlimit, seccomp, apparmor... • Implemented some “hooks”: • Lifetime hooks, async timeout/interval hooks, sighandlers

Slide 60

Slide 60 text

Haconiwa accepts DSLs

Slide 61

Slide 61 text

What I’m working on now • Bundling CRIU features into Haconiwa • haconiwa checkpoint: • To create checkpoint from a running container • haconiwa restore: • To make a restored container, with some spec changes

Slide 62

Slide 62 text

CRIU deep features: • These are what I used in haconiwa development: • Restoration process hooks(action script) • Change cgroup name on restore • Replace supervisor program by --exec-cmd

Slide 63

Slide 63 text

Restoration process hooks • CRIU has hooks that are invoked as checkpointing or restoration proceeds: Action Scripts. • e.g. post-dump, post-restore, setup-namespaces... • Haconiwa uses an action script to change the container’s IP from the dumped one to the one written in a new DSL.
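An action script is an executable passed to criu with `--action-script`; criu runs it at each stage and names the stage in the `CRTOOLS_SCRIPT_ACTION` environment variable. A minimal sketch of such a script: the function name and returned messages are hypothetical, and the network reconfiguration that Haconiwa performs is only indicated by a comment, not implemented:

```python
import os


def handle_action(env) -> str:
    """Dispatch on the stage name that criu passes to the action script."""
    action = env.get("CRTOOLS_SCRIPT_ACTION", "")
    if action == "setup-namespaces":
        # Namespaces exist but processes are not restored yet; a runtime
        # could change the container's IP address here (not implemented).
        init_pid = env.get("CRTOOLS_INIT_PID")
        return f"would reconfigure network for init pid {init_pid}"
    if action == "post-restore":
        return "restore finished"
    return f"ignored {action}"


if __name__ == "__main__":
    print(handle_action(os.environ))
```

Keeping the dispatch in a plain function that takes the environment as an argument makes the script easy to exercise without running criu.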

Slide 64

Slide 64 text

Change cgroup name on restore • A Haconiwa container has a name option, which decides its cgroup name. • When you want the restored container to have a different name from the dumped one, you must also change its cgroup name. • CRIU’s --cgroup-root option solves this

Slide 65

Slide 65 text

Replace supervisor program • Haconiwa has its own hooks, and the restore process should also restore these hooks as defined in the DSL. • This is outside CRIU’s feature set • Hooks are implemented in the “container supervisor”, rather than in the container itself • So I implemented a way to attach a “supervisor for restored containers” to a restored container, and hooks are invoked in that supervisor (SV)

Slide 66

Slide 66 text

Replacement process • Before: Haconiwa sv \- criu restore \- Container • exec() • After: Haconiwa sv \- haconiwa _restored \- Container • wait() in new program • Restore done!
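The restore-side options mentioned in this part of the talk (`--action-script`, `--cgroup-root`, `--exec-cmd`) can be combined on one `criu restore` command line. A minimal sketch that only builds the argument list; the helper name, paths and cgroup name are placeholders, and the `haconiwa _restored` command is taken from the slide without its real arguments, which are not shown there:

```python
def restore_with_supervisor_argv(images_dir: str, cgroup_root=None,
                                 action_script=None, exec_cmd=None):
    """Build a `criu restore` command using the options from the talk."""
    argv = ["criu", "restore", "-D", images_dir]
    if cgroup_root:
        # Restore into a different cgroup root than the dumped one.
        argv += ["--cgroup-root", cgroup_root]
    if action_script:
        # Script invoked by criu at each restore stage.
        argv += ["--action-script", action_script]
    if exec_cmd:
        # After a successful restore, criu exec()s this command, which
        # becomes the parent of the restored processes. Must come last.
        argv += ["--exec-cmd", "--", *exec_cmd]
    return argv


print(restore_with_supervisor_argv(
    "/tmp/ckpt",
    cgroup_root="/new-container-name",
    action_script="/path/to/action-script",
    exec_cmd=["haconiwa", "_restored"],
))
```

Everything after `--` belongs to the replacement supervisor, so options for criu itself have to be placed before `--exec-cmd`.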

Slide 67

Slide 67 text

See official document • https://criu.org/Tree_after_restore

Slide 68

Slide 68 text

Haconiwa x CRIU Demo

Slide 69

Slide 69 text

Live Rails Migration Using haconiwa C/R

Slide 70

Slide 70 text

Demo Overview Load Balancer Victim container Restored container Image On shared storage Victim Host Dest Host http://Mac:10080 http://Mac:11080 Nonstop! https://github.com/udzura/nginx-haconiwa/tree/haconiwa-migration

Slide 71

Slide 71 text

Conclusion

Slide 72

Slide 72 text

Conclusion • CRIU can create checkpoints for containers, and restore them. • I introduced 2 use cases: • Migration • Reduction of bootstrap cost • There is no Kubernetes integration yet, but maybe soon? • I have been developing CRIU integration for my container runtime :)

Slide 73

Slide 73 text

Join Us To Be Cloud Native! Follow us: @pb_recruit