Docker Storage drivers for production

A short presentation of various issues with Docker, its storage drivers, kernel capabilities, and their interaction with Kubernetes

Benjamin Allot

October 23, 2018
Transcript

  1. Benjamin Allot, Sysadmin/R&D Engineer @Scality
    Sysadmin Days #8 October 18th, 2018
    Docker for production

  2. FOR ANY STANDARD x86 SERVER PLATFORM
    100% SOFTWARE SOLUTION
    PRIVATE CLOUD / PUBLIC CLOUDS
    MULTI-CLOUD DATA CONTROLLER TO ACCESS AND MANAGE DATA ACROSS CLOUDS
    ● object & file storage in a single system · peer to peer architecture · unlimited scalability · unbounded scale-out performance · most adaptive set of robust data protection mechanisms · autonomous self-healing · designed in close collaboration with the biggest (cloud-scale) service providers in the world
    ● a single, unified API across all clouds to simplify application development · the only multi-cloud data management solution independent of the storage system · stores data in standard cloud format to make the data consumable directly by native cloud apps and services · true multi-cloud IT · global search across all managed data independent of cloud location

  3. Benjamin Allot
    Tech Leader of “Setup team”
    Trained as a Developer
    Unix Sysadmin for a living
    Docker
    ● Adoption early in 2014
    ● Tooling at Scality for our builds, then CI
    ● S3 API compatible server deployed with docker

  4. Docker Image : You said layers ?
    Thanks to xkcd

  5. Docker Image: You said layers ?
    Check Docker documentation here
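
    A quick way to actually see those layers on a host (a small sketch; busybox is only an example image):

    $ docker pull busybox
    $ docker history busybox        # one line per layer, with the step that created it
    $ docker image inspect --format '{{json .RootFS.Layers}}' busybox   # the layer digests (diff IDs)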

  6. Docker : Storage Drivers

  7. ● The good: The layers are stored efficiently thanks to a
    “copy on write” mechanism
    Docker : Storage Drivers

  8. ● The good: The layers are stored efficiently thanks to a
    “copy on write” mechanism
    ● The bad: There are several storage drivers, each with
    their strengths and weaknesses
    Docker : Storage Drivers

  9. ● The good: The layers are stored efficiently thanks to a
    “copy on write” mechanism
    ● The bad: There are several storage drivers, each with
    their strengths and weaknesses
    ● The ugly: A bad combination of Docker, storage driver
    and kernel can lead to issues (mostly upon container stop)
    Docker : Storage Drivers
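
    A tiny illustration of that copy-on-write behaviour (a sketch; container and file names are made up):

    $ docker run -d --name cow-demo busybox sleep 3600
    $ docker exec cow-demo touch /tmp/hello
    $ docker diff cow-demo    # lists only what diverged from the image (A/C/D entries), here /tmp/hello
    $ docker rm -f cow-demo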

  10. Docker : Storage Drivers

  11. ● Btrfs/Zfs: require dedicated disk formatting
    ● AUFS: not supported by the kernel anymore since 3.18
    ● Device Mapper: warning, bad performance with loop-lvm
    ● Overlay: runs out of inodes easily
    ● Overlay2: requires disabling SELinux, requires CentOS 7.4 (kernel 3.10.0-693)
    Docker storage driver of choice = overlay2
    - Best performance/stability with the fewest requirements
    - With docker < 18.02, detection of kernel capabilities for overlay2 is buggy (the storage driver must be forced with docker 17.03)
    - An educated bet on the future
    Docker Storage Driver: which and why ?
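
    Checking what a given host actually ended up with (a sketch; given the buggy detection on docker < 18.02, verify rather than trust the default):

    $ uname -r                                    # kernel version (overlay2 wants 4.0+, or CentOS 7.4's 3.10.0-693)
    $ docker info --format '{{.Driver}}'          # the storage driver the daemon actually selected
    $ docker info | grep -A 3 'Storage Driver'    # backing filesystem and d_type support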

  12. ● Configure it with the /etc/docker/daemon.json file (see the sketch below)
    ○ “storage-driver”: “overlay2”
    ○ “storage-opts”: [“overlay2.override_kernel_check=true”]
    ● Double check the compatibility matrix
    ● On RedHat/CentOS be wary of:
    ○ XFS: a specific mkfs option is mandatory for OverlayFS on /var/lib/docker
    ○ “Device or resource busy” errors when stopping a container
    ○ How to Setup OverlayFS on RedHat
    Docker : Storage Drivers
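
    Put together, the configuration looks roughly like this (a sketch; the mkfs option in question is ftype=1, which gives OverlayFS the d_type support it needs; /dev/sdX1 is a placeholder):

    $ cat /etc/docker/daemon.json
    {
      "storage-driver": "overlay2",
      "storage-opts": ["overlay2.override_kernel_check=true"]
    }
    $ xfs_info /var/lib/docker | grep ftype      # must show ftype=1
    $ mkfs.xfs -n ftype=1 /dev/sdX1              # only when (re)formatting the backing device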

  13. Docker : Storage Drivers issues
    Storage Driver     Number of issues    Number of open issues
    Device Mapper      184                 41
    Overlay (1 & 2)    150                 30
    Zfs                 26                  7
    Btrfs               46                 14
    Aufs                87                 24 (last: 12th September 2017)

  14. ● Check the Storage Driver to use before
    deploying to production
    ● Be wary of Kernel capabilities and Docker's
    “default” Storage Driver decision
    ● The future might be to use containerd 1.1+
    directly (interesting history of graph drivers and
    why they aren't supported “as is” in containerd)
    Docker : Summary

  15. ● To deploy our S3 connector (the RING product)
    ● To replicate objects across several clouds with Zenko
    (Open Source)
    ○ We used to use Docker Swarm
    ○ Now we use Kubernetes
    What production ?

  16. Why Kubernetes ?
    - Runs everywhere, on any cloud => provides an API abstraction
    - Control plane runs server side (compared to docker compose)
    - Self-healing
    - Auto-scaling (of pods, of the cluster, of resource requests)
    - Huge set of plugins (centralised logging, monitoring, ingress)
    - Big community
    - Docker announced Kubernetes support in 2017
    - Customers trust it and want it

  17. MetalK8s
    ● An opinionated Kubernetes distribution with a
    focus on long-term on-prem deployments
    ● A commitment to bare metal
    ● Open Source : check here
    Elsewhere :
    ● AWS : KOPS, EKS (one day ?)
    ● GCP : GKE
    ● Azure : AKS
    ● Bare Metal : ?

  18. MetalK8s: quality
    1) Inventory precheck (n etcd % 2 = 1, n master > 1, n node > 0)
    2) ping (connectivity check)
    3) precheck about CentOS (kernel)
    4) precheck on storage
    5) create LVM VGs/LVs
    6) => call kubespray ! <=
    7) register the LVs into kubespray
    8) deploy nginx ingress
    9) deploy prometheus + grafana
    10) deploy elasticsearch + kibana
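
    A minimal sketch of the shape that inventory precheck expects, assuming kubespray-style group names (host names are placeholders):

    $ cat inventory.ini
    # odd number of etcd members (n etcd % 2 = 1)
    [etcd]
    node-1
    node-2
    node-3

    # more than one master (n master > 1)
    [kube-master]
    node-1
    node-2

    # at least one node (n node > 0)
    [kube-node]
    node-3

    [k8s-cluster:children]
    kube-master
    kube-node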

  19. ("/sys/fs/cgroup/devices/kubepods/burstable/pod6ee68e26-9bed-11e8-b370-f403435bf038
    /a1ac00006e2cd56faf6b14212c6a881371d9e1a683c2852fd47295fda4b00954": 0x40000100 ==
    IN_CREATE|IN_ISDIR): inotify_add_watch
    /sys/fs/cgroup/devices/kubepods/burstable/pod6ee68e26-9bed-11e8-b370-f403435bf038/a
    1ac00006e2cd56faf6b14212c6a881371d9e1a683c2852fd47295fda4b00954: no space left on
    device
    One tiny little problem ….

  20. $ cat /proc/cgroups
    #subsys_name hierarchy num_cgroups enabled
    cpuset 10 3741 1
    cpu 2 3971 1
    cpuacct 2 3971 1
    memory 4 3971 1
    devices 7 3971 1
    freezer 8 3741 1
    net_cls 3 3741 1
    blkio 5 3971 1
    perf_event 6 3741 1
    hugetlb 11 3741 1
    pids 9 3971 1
    net_prio 3 3741 1
    Cgroup : No space left on device ?
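
    None of the counters above are anywhere near a limit, which is what makes the error puzzling. A quick way to compare what the kernel reports with what is actually mounted (a sketch; paths assume cgroup v1, as on CentOS 7):

    $ grep ^memory /proc/cgroups                     # what the kernel says it is tracking
    $ find /sys/fs/cgroup/memory -type d | wc -l     # cgroup directories actually visible
    # the findings below point at kernel memory accounting leaking cgroup IDs, so cgroup
    # creation (and watches on it) can fail with ENOSPC even when both numbers look small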

  21. ● A kernel cgroup bug for kernels prior to 4.0, identified in Moby
    What we found so far

  22. ● A kernel cgroup bug for kernel < 4.0
    ● A runc change of behavior with kernel memory accounting
    What we found so far

  23. ● A kernel cgroup bug for kernels prior to 4.0, identified in Moby
    ● A runc change of behavior with kernel memory accounting
    ● A Kubernetes bug identifying the issue
    What we found so far

  24. ● A kernel cgroup bug for kernels prior to 4.0, identified in Moby
    ● A runc change of behavior with kernel memory accounting
    ● A Kubernetes bug identifying the issue
    ● More precisely : this commit is responsible for the “bug”
    What we found so far

  25. ● A kernel cgroup bug for kernels prior to 4.0, identified in Moby
    ● A runc change of behavior with kernel memory accounting
    ● A Kubernetes bug identifying the issue
    ● More precisely : this commit is responsible for the “bug”
    ● A Chinese page describing the issue (thank you Google Translate)
    What we found so far

  26. ● Use a recent kernel, even on CentOS
    What’s next

  27. ● Use a recent kernel, even on CentOS
    ● Wait for the fix to be backported
    What’s next

  28. ● Use a recent kernel, even on CentOS
    ● Wait for the fix to be backported
    ● Reboot your servers regularly
    What’s next

  29. ● Use a recent kernel, even on CentOS
    ● Wait for the fix to be backported
    ● Reboot your servers regularly
    ● Recompile your kernel without “CONFIG_MEMCG_KMEM”
    What’s next
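
    Before going the recompile route, it is worth checking whether the option is enabled in the running kernel at all (a sketch; the config path is the usual CentOS location):

    $ grep CONFIG_MEMCG_KMEM /boot/config-$(uname -r)   # CentOS 7 kernels ship with it enabled (=y)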

  30. An opinionated Kubernetes distribution
    with a focus on long-term on-prem
    deployments
    Conclusion

  31. An opinionated Kubernetes distribution
    with a focus on long-term on-prem
    deployments
    Q&A
