
How have we been building a containers-based PaaS these last 5 years?

A talk about building a PaaS and a container scheduling system

Soulou

June 26, 2018

Transcript

  1. What is Scalingo?
     • Third-party application hosting platform → images built from source code
     • Database-as-a-Service hosting → pre-built images available on Docker Hub
     • No Docker knowledge required
  2. Constraints
     • PaaS provider objective ≠ software service company
     • Infrastructure optimization: high level of consolidation
     • Efficient scheduling → avoid paying for unused resources
  3. Tooling timeline
     • Tools: Docker 0.1.0 → 2013-03-23; etcd 0.1.0 → 2013-08-11
     • Orchestration: Swarm 1.0.0 → 2015-11-03; Kubernetes 1.0.6 → 2015-09-11
  4. A (small) Swiss Army knife
     • A little of everything, not efficient everywhere
     • Lock-in → a tool for us: isolation helper, apps distribution
  5. Logging - demo: breaking docker logs
     $ docker run -it --log-opt max-file=1 --log-opt max-size=1m ubuntu:16.04 bash
     # inside the container, write well past the 1 MB rotation limit:
     $ for i in $(seq 100000); do perl -e "print 'xxxxxxxxxxxxxxxxxxxxxxxxx$i' x 10, \"\n\""; done
     # on the host: the follower breaks once the single log file rotates
     $ docker logs --follow <id>
  6. Logging
     • Agent → (TCP + TLS) → message bus
     • Docker 1.10.0 (2016-02-04): syslog driver with TCP+TLS support
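A minimal daemon-wide sketch of that syslog setup in Docker's /etc/docker/daemon.json (the endpoint and certificate path are hypothetical):

```json
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp+tls://logs.example.internal:6514",
    "syslog-tls-ca-cert": "/etc/docker/certs/ca.pem"
  }
}
```

The same options can also be passed per container with `--log-driver syslog --log-opt syslog-address=…`.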
  7. Networking throttling
     # resolve the host-side veth peer of the container's eth0
     $ peer_index=$(nsenter --target "${pid}" --net -- ip link show eth0 | sed -n 's/^[0-9]*: eth0@if\([0-9]*\):.*/\1/p')
     $ iface=$(ip link show | sed -n "s/^${peer_index}: \([^:@]*\).*/\1/p")
     # throttle it with a token bucket filter
     $ tc qdisc replace dev "${iface}" root tbf rate "${limit}mbit" latency 200ms burst "${burst}MB"
     No hook possible in Docker for this → our job
  8. Security concerns
     • Unprivileged users only: 1 container → 1 user
     • Apps are built with 1 layer → 1 tarball
     • Ability to patch the base image
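The 1 layer → 1 tarball build can be sketched with plain tar (the paths and Procfile below are invented for illustration); `docker import` turns such a tarball into a single-layer image:

```shell
# Pack an app's rootfs into one tarball - the whole build is a single layer
mkdir -p /tmp/app-rootfs/app
echo 'web: bundle exec rails server' > /tmp/app-rootfs/app/Procfile
tar -C /tmp/app-rootfs -cf /tmp/app-layer.tar .
# A single-layer image could then be created with:
#   docker import /tmp/app-layer.tar myapp:latest
tar -tf /tmp/app-layer.tar | grep -c 'Procfile'
```

With a single layer, patching the base image means rebuilding one tarball rather than invalidating a layer chain.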
  9. Storage - beginnings
     NFS → good enough to start. Downsides:
     • SPoF
     • HA tricky to get right
     • Hard to scale
     • No blkio cgroup!
  10. Storage - NFS HA
      (diagram: databases on an NFS master replicated via DRBD to a standby, reached through a VIP; on master failure, the VIP fails over)
  11. Software Defined Storage
      FOSS: GlusterFS, Ceph, OpenSDS, OpenEBS, Rook
      Proprietary: StorageOS, Hedvig, ScaleIO (Dell), Kasten, Virtuoso, DataCore, Diamanti, Hatchway (VMware), Portworx, Quobyte, Datera, Robin Cloud Platform, ...
  12. Storage - attached disks from a SAN
      • ≥ 100 volumes per host
      • LVM magic, e.g. https://github.com/Scalingo/go-fssync
      • Good cloning tools
      • Reattach on host failure
  13. Scheduling - a little reminder
      • Our goal: optimize server usage
      • Automate all the things
      • Include business logic
  14. Scheduling
      • Offline: stop the world and move things around
      • Online: fit a new container into an existing topology
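Offline scheduling is essentially bin packing; a toy first-fit sketch in awk (host capacity and container demands are invented numbers, not Scalingo's):

```shell
# First-fit: place container memory demands (MB) onto 2048 MB hosts
result=$(awk 'BEGIN {
  n = split("512 1024 256 768 512 1536", demand, " ")
  cap = 2048; hosts = 0
  for (i = 1; i <= n; i++) {
    placed = 0
    # try the first host that still has room
    for (h = 1; h <= hosts; h++)
      if (free[h] >= demand[i]) { free[h] -= demand[i]; placed = 1; break }
    # otherwise open a new host
    if (!placed) { hosts++; free[hosts] = cap - demand[i] }
  }
  print hosts " hosts"
}')
echo "$result"   # first-fit needs 3 hosts for these demands
```

Real offline passes can additionally migrate existing containers, which is why they "stop the world".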
  15. Scheduling - online
      • E. Arzuaga and D. R. Kaeli. Quantifying load imbalance on virtualized enterprise servers. ACM Press, p. 235.
      • D. Ferrari and S. Zhou. An empirical investigation of load indices for load balancing applications.
      • T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif. Sandpiper: black-box and gray-box resource management for virtual machines. 53(17): 2923–2938.
      Current strategy → Sandpiper
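Sandpiper ranks hosts by a "volume" metric that blows up as any resource approaches saturation; a sketch with made-up utilization fractions:

```shell
# volume = 1 / ((1-cpu) * (1-mem) * (1-net)); higher means more loaded
cpu=0.70; mem=0.60; net=0.30
volume=$(awk -v c="$cpu" -v m="$mem" -v n="$net" \
  'BEGIN { printf "%.2f", 1 / ((1-c) * (1-m) * (1-n)) }')
echo "$volume"   # prints 11.90
```

Migrations then move load from the highest-volume host toward lower-volume ones.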
  16. Why not existing tech today? Business logic and support
      → Control needed over the core infrastructure
      → Mastering the technology: would we need to patch K8s?
      → Scheduling strategy (with overcommitting)
      → "Unstable" solution
  17. Why not existing tech today? Owned apps vs third-party
      → Not designed for third-party workloads
      → Billing requirements
      → Fine-grained resource control