Slide 1

Docker for production
Benjamin Allot, Sysadmin/R&D Engineer @Scality
Sysadmin Days #8, October 18th, 2018

Slide 2

Private cloud: 100% software solution for any standard x86 server platform
· object & file storage in a single system
· peer-to-peer architecture
· unlimited scalability
· unbounded scale-out performance
· most adaptive set of robust data protection mechanisms
· autonomous self-healing
· designed in close collaboration with the biggest (cloud-scale) service providers in the world

Public clouds: multi-cloud data controller to access and manage data across clouds
· a single, unified API across all clouds to simplify application development
· the only multi-cloud data management solution independent of the storage system
· stores data in standard cloud format to make the data consumable directly by native cloud apps and services
· true multi-cloud IT
· global search across all managed data, independent of cloud location

Slide 3

Benjamin Allot
Trained as a Developer · Unix Sysadmin for a living · Tech Leader of the "Setup team"

Docker
● Adoption early in 2014
● Tooling at Scality for our builds, then CI
● S3 API compatible server deployed with Docker

Slide 4

Docker Image: You said layers? (Thanks to xkcd)

Slide 5

Docker Image: You said layers? Check the Docker documentation here.
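A quick, hedged way to see those layers from the command line (any image works; alpine:3.8 is only an example here):

  # Pull a small image and list its layers: one line per layer, newest on
  # top, with the Dockerfile instruction that created it and its size.
  docker pull alpine:3.8
  docker history alpine:3.8
  # Containers started from the image only add a thin writable layer on top;
  # the read-only layers below are shared between all containers using them.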

Slide 6

Docker: Storage Drivers

Slide 7

Docker: Storage Drivers

● The good: the layers are stored efficiently thanks to a "copy-on-write" mechanism

Slide 8

Docker: Storage Drivers

● The good: the layers are stored efficiently thanks to a "copy-on-write" mechanism
● The bad: there are several storage drivers, each with its strengths and weaknesses

Slide 9

Docker: Storage Drivers

● The good: the layers are stored efficiently thanks to a "copy-on-write" mechanism
● The bad: there are several storage drivers, each with its strengths and weaknesses
● The ugly: a bad combination of Docker, storage driver, and kernel can lead to issues (mostly upon stop)

Slide 10

Docker: Storage Drivers

Slide 11

Docker Storage Driver: which and why?

Btrfs/ZFS: requires disk formatting
AUFS: not supported by the kernel anymore since 3.18
Device-mapper: warning, bad performance for loop-lvm
Overlay: runs out of inodes easily
Overlay2: requires disabling SELinux, requires CentOS 7.4 (kernel 3.10.0-693)

Docker storage driver of choice = overlay2
- Best performance/stability with the fewest requirements
- With Docker < 18.02, detection of kernel capabilities for overlay2 is buggy (requires forcing the storage driver for Docker 17.03)
- Educated bet on the future
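To double-check what the daemon actually picked on a given host, a small sketch (commands only; the last check assumes a RedHat-family system):

  # Which storage driver is the running daemon using?
  docker info --format '{{.Driver}}'
  # Does this kernel expose the overlay filesystem at all?
  grep overlay /proc/filesystems
  # Kernel and distribution versions, to compare against the compatibility matrix
  uname -r
  cat /etc/redhat-release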

Slide 12

Docker: Storage Drivers

● Configure it with the /etc/docker/daemon.json file
  ○ "storage-driver": "overlay2"
  ○ "storage-opts": ["overlay2.override_kernel_check=true"]
● Double-check the compatibility matrix
● On RedHat/CentOS, be wary of:
  ○ XFS with a specific mkfs option is mandatory with OverlayFS for /var/lib/docker
  ○ "Device or resource busy" when stopping a container
  ○ How to set up OverlayFS on RedHat
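Put together, a minimal sketch of that configuration; the XFS note assumes /var/lib/docker sits on an XFS filesystem, where overlay2 needs d_type support (mkfs.xfs -n ftype=1):

  # /etc/docker/daemon.json should contain:
  #   {
  #     "storage-driver": "overlay2",
  #     "storage-opts": ["overlay2.override_kernel_check=true"]
  #   }
  # then restart the daemon so it takes effect:
  systemctl restart docker

  # Check d_type support on the filesystem holding /var/lib/docker
  # (point xfs_info at the actual mount point): ftype=1 is required.
  xfs_info /var/lib/docker | grep ftype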

Slide 13

Docker: Storage Drivers issues (as of 12th September 2017)

Storage Driver    Number of issues    Number of issues open
Device Mapper     184                 41
Overlay (1 & 2)   150                 30
Zfs               26                  7
Btrfs             46                  14
Aufs              87                  24

Slide 14

Docker: Summary

● Check which storage driver to use before deploying to production
● Be wary of kernel capabilities and Docker's "default" storage driver decision
● The future might be to use containerd 1.1+ directly (interesting history of graph drivers and why they aren't supported "as is" in containerd)

Slide 15

What production?

● To deploy our S3 Connector (RING product)
● To replicate objects into several clouds with Zenko (open source)
  ○ We used to use Docker Swarm
  ○ Now we use Kubernetes

Slide 16

Why Kubernetes?
- Runs everywhere, on any cloud => provides an API abstraction
- The control plane runs server-side (compared to docker-compose)
- Self-healing
- Auto-scaling (of pods, of the cluster, of resource requests)
- Huge set of plugins (centralised logging, monitoring, ingress)
- Big community
- Docker announced Kubernetes support in 2017
- Customers trust it and want it
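As a small, hedged illustration of the self-healing and pod auto-scaling points above (the "web" deployment and nginx image are placeholders, and the autoscaler needs cluster metrics to be available):

  # Self-healing: the Deployment controller replaces any pod that disappears.
  kubectl create deployment web --image=nginx
  kubectl scale deployment web --replicas=3
  kubectl delete pod -l app=web   # kill the pods...
  kubectl get pods -l app=web     # ...and watch replacements show up

  # Pod auto-scaling on CPU usage:
  kubectl autoscale deployment web --min=3 --max=10 --cpu-percent=80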

Slide 17

AWS: KOPS, EKS (one day?)
GCP: GKE
Azure: AKS
Bare Metal: ?

MetalK8s
● An opinionated Kubernetes distribution with a focus on long-term on-prem deployments
● A commitment to bare-metal
● Open Source: check here

Slide 18

MetalK8s: quality

1) Inventory precheck: n etcd % 2 = 1, n master > 1, n node > 0
2) ping (connectivity check)
3) Precheck about CentOS (kernel)
4) Precheck on storage
5) Create LVM VG/LV
6) => call Kubespray! <=
7) Register the LVs into Kubespray
8) Deploy nginx ingress
9) Deploy Prometheus + Grafana
10) Deploy Elasticsearch + Kibana
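The inventory rules in step 1 boil down to a little arithmetic; a hypothetical sketch of the check (variable names and counts are illustrative, the real prechecks live in the MetalK8s playbooks):

  # Counts taken from the inventory (example values)
  n_etcd=3; n_master=3; n_node=5

  [ $((n_etcd % 2)) -eq 1 ] || { echo "etcd member count must be odd"; exit 1; }
  [ "$n_master" -gt 1 ]     || { echo "need more than one master";     exit 1; }
  [ "$n_node" -gt 0 ]       || { echo "need at least one node";        exit 1; }
  echo "inventory precheck passed"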

Slide 19

("/sys/fs/cgroup/devices/kubepods/burstable/pod6ee68e26-9bed-11e8-b370-f403435bf038 /a1ac00006e2cd56faf6b14212c6a881371d9e1a683c2852fd47295fda4b00954": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/kubepods/burstable/pod6ee68e26-9bed-11e8-b370-f403435bf038/a 1ac00006e2cd56faf6b14212c6a881371d9e1a683c2852fd47295fda4b00954: no space left on device One tiny little problem ….

Slide 20

Cgroup: No space left on device?

$ cat /proc/cgroups
#subsys_name    hierarchy    num_cgroups    enabled
cpuset          10           3741           1
cpu             2            3971           1
cpuacct         2            3971           1
memory          4            3971           1
devices         7            3971           1
freezer         8            3741           1
net_cls         3            3741           1
blkio           5            3971           1
perf_event      6            3741           1
hugetlb         11           3741           1
pids            9            3971           1
net_prio        3            3741           1
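The num_cgroups column is the interesting one. A hedged way to spot leaked ("zombie") memory cgroups, which is what eventually produces the "no space left on device" error, is to compare the kernel's count with the directories actually present:

  # Number of memory cgroups the kernel is tracking...
  awk '$1 == "memory" {print $3}' /proc/cgroups
  # ...versus the memory cgroup directories that actually exist.
  find /sys/fs/cgroup/memory -type d | wc -l
  # A kernel count far above the directory count suggests cgroups are leaking.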

Slide 21

What we found so far

● A kernel cgroup bug for kernels prior to 4.0, identified for Moby

Slide 22

What we found so far

● A kernel cgroup bug for kernels prior to 4.0, identified for Moby
● A runc change of behavior with kernel memory accounting

Slide 23

What we found so far

● A kernel cgroup bug for kernels prior to 4.0, identified for Moby
● A runc change of behavior with kernel memory accounting
● A Kubernetes bug identifying the issue

Slide 24

What we found so far

● A kernel cgroup bug for kernels prior to 4.0, identified for Moby
● A runc change of behavior with kernel memory accounting
● A Kubernetes bug identifying the issue
● More precisely: this commit is responsible for the "bug"

Slide 25

What we found so far

● A kernel cgroup bug for kernels prior to 4.0, identified for Moby
● A runc change of behavior with kernel memory accounting
● A Kubernetes bug identifying the issue
● More precisely: this commit is responsible for the "bug"
● A Chinese page describing the issue (thank you Google Translate)
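The runc change mentioned above enables kernel memory accounting on the cgroups it creates; whether that is active can be read straight from the cgroup filesystem. A hedged check (the kubepods path assumes the cgroupfs driver seen in the earlier error message):

  # Kernel memory accounting is switched on once a kmem limit has been written;
  # on kernels prior to 4.0 this is what ends up leaking cgroups.
  cat /sys/fs/cgroup/memory/kubepods/memory.kmem.limit_in_bytes
  cat /sys/fs/cgroup/memory/kubepods/memory.kmem.usage_in_bytes
  # A non-zero usage (or a limit below the huge "unlimited" default) indicates
  # kmem accounting is enabled on this node.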

Slide 26

What's next

● Use a recent kernel, even on CentOS

Slide 27

What's next

● Use a recent kernel, even on CentOS
● Wait for the fix to be backported

Slide 28

What's next

● Use a recent kernel, even on CentOS
● Wait for the fix to be backported
● Reboot your servers regularly

Slide 29

What's next

● Use a recent kernel, even on CentOS
● Wait for the fix to be backported
● Reboot your servers regularly
● Recompile your kernel without "CONFIG_MEMCG_KMEM"
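Before choosing between these options, it helps to check what the node currently runs; a small sketch (the config file path assumes a stock CentOS/RHEL kernel package):

  # Running kernel version (the cgroup fix is in mainline 4.0+)
  uname -r
  # Is kernel memory accounting compiled in? (CONFIG_MEMCG_KMEM=y on stock CentOS 7)
  grep CONFIG_MEMCG_KMEM /boot/config-$(uname -r)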

Slide 30

Conclusion

MetalK8s: an opinionated Kubernetes distribution with a focus on long-term on-prem deployments

Slide 31

Q&A

MetalK8s: an opinionated Kubernetes distribution with a focus on long-term on-prem deployments