
How have we been building a containers-based PaaS these last 5 years?

A talk about building a PaaS and a container scheduling system

Soulou

June 26, 2018

Transcript

  1. What is Scalingo?
     • Third-party application hosting platform → images built from source code
     • Database-as-a-Service hosting → pre-built images available on Docker Hub
     • No Docker knowledge required
  2. Constraints
     • PaaS provider objective ≠ software service company
     • Infrastructure optimization: high level of consolidation
     • Efficient scheduling → avoid paying for unused resources
  3. Tooling timeline
     • Tools: Docker 0.1.0 → 2013-03-23; etcd 0.1.0 → 2013-08-11
     • Orchestration: Swarm 1.0.0 → 2015-11-03; Kubernetes 1.0.6 → 2015-09-11
  4. A (small) Swiss Army knife
     • A little of everything, not efficient everywhere
     • Lock-in → a tool for us: isolation helper, apps distribution
  5. Logging - demo: breaking docker logs
     $ docker run -it --log-opt max-file=1 --log-opt max-size=1m ubuntu:16.04 bash
     # inside the container, write well past the 1 MB rotation limit:
     $ for i in $(seq 100000); do perl -e "print 'xxxxxxxxxxxxxxxxxxxxxxxxx$i' x 10, \"\n\""; done
     # on the host: the follower breaks once the single log file rotates
     $ docker logs --follow <id>
  6. Logging
     • Agent → (TCP + TLS) → message bus
     • Docker 1.10.0 (2016-02-04): syslog driver with TCP+TLS support
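A minimal daemon-wide sketch of that syslog setup in Docker's /etc/docker/daemon.json (the endpoint and certificate path are hypothetical):

```json
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp+tls://logs.example.internal:6514",
    "syslog-tls-ca-cert": "/etc/docker/certs/ca.pem"
  }
}
```

The same options can also be passed per container with `--log-driver syslog --log-opt syslog-address=…`.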
  7. Networking throttling
     # resolve the host-side veth peer of the container's eth0
     $ peer_index=$(nsenter --target "${pid}" --net -- ip link show eth0 | sed -n 's/^[0-9]*: eth0@if\([0-9]*\):.*/\1/p')
     $ iface=$(ip link show | sed -n "s/^${peer_index}: \([^:@]*\).*/\1/p")
     # throttle it with a token bucket filter
     $ tc qdisc replace dev "${iface}" root tbf rate "${limit}mbit" latency 200ms burst "${burst}MB"
     No hook possible in Docker for this → our job
  8. Security concerns
     • Unprivileged users only: 1 container → 1 user
     • Apps are built with 1 layer → 1 tarball
     • Ability to patch the base image
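The 1 layer → 1 tarball build can be sketched with plain tar (the paths and Procfile below are invented for illustration); `docker import` turns such a tarball into a single-layer image:

```shell
# Pack an app's rootfs into one tarball - the whole build is a single layer
mkdir -p /tmp/app-rootfs/app
echo 'web: bundle exec rails server' > /tmp/app-rootfs/app/Procfile
tar -C /tmp/app-rootfs -cf /tmp/app-layer.tar .
# A single-layer image could then be created with:
#   docker import /tmp/app-layer.tar myapp:latest
tar -tf /tmp/app-layer.tar | grep -c 'Procfile'
```

With a single layer, patching the base image means rebuilding one tarball rather than invalidating a layer chain.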
  9. Storage - beginnings
     NFS → good enough to start. Downsides:
     • SPoF
     • HA tricky to get right
     • Hard to scale
     • No blkio cgroup!
  10. Storage - NFS HA
      (diagram: databases on an NFS master replicated via DRBD to a standby, reached through a VIP; on master failure, the VIP fails over)
  11. Software Defined Storage
      FOSS: GlusterFS, Ceph, OpenSDS, OpenEBS, Rook
      Proprietary: StorageOS, Hedvig, ScaleIO (Dell), Kasten, Virtuoso, DataCore, Diamanti, Hatchway (VMware), Portworx, Quobyte, Datera, Robin Cloud Platform, ...
  12. Storage - attached disks from a SAN
      • ≥ 100 volumes per host
      • LVM magic, e.g. https://github.com/Scalingo/go-fssync
      • Good cloning tools
      • Reattach on host failure
  13. Scheduling - a little reminder
      • Our goal: optimize server usage
      • Automate all the things
      • Include business logic
  14. Scheduling
      • Offline: stop the world and move things around
      • Online: fit a new container into an existing topology
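Offline scheduling is essentially bin packing; a toy first-fit sketch in awk (host capacity and container demands are invented numbers, not Scalingo's):

```shell
# First-fit: place container memory demands (MB) onto 2048 MB hosts
result=$(awk 'BEGIN {
  n = split("512 1024 256 768 512 1536", demand, " ")
  cap = 2048; hosts = 0
  for (i = 1; i <= n; i++) {
    placed = 0
    # try the first host that still has room
    for (h = 1; h <= hosts; h++)
      if (free[h] >= demand[i]) { free[h] -= demand[i]; placed = 1; break }
    # otherwise open a new host
    if (!placed) { hosts++; free[hosts] = cap - demand[i] }
  }
  print hosts " hosts"
}')
echo "$result"   # first-fit needs 3 hosts for these demands
```

Real offline passes can additionally migrate existing containers, which is why they "stop the world".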
  15. Scheduling - online
      • E. Arzuaga and D. R. Kaeli. Quantifying load imbalance on virtualized enterprise servers. ACM Press, p. 235.
      • D. Ferrari and S. Zhou. An empirical investigation of load indices for load balancing applications.
      • T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif. Sandpiper: black-box and gray-box resource management for virtual machines. 53(17): 2923–2938.
      Current strategy → Sandpiper
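Sandpiper ranks hosts by a "volume" metric that blows up as any resource approaches saturation; a sketch with made-up utilization fractions:

```shell
# volume = 1 / ((1-cpu) * (1-mem) * (1-net)); higher means more loaded
cpu=0.70; mem=0.60; net=0.30
volume=$(awk -v c="$cpu" -v m="$mem" -v n="$net" \
  'BEGIN { printf "%.2f", 1 / ((1-c) * (1-m) * (1-n)) }')
echo "$volume"   # prints 11.90
```

Migrations then move load from the highest-volume host toward lower-volume ones.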
  16. Why not existing tech today? Business logic and support
      → Control needed over the core infrastructure
      → Mastering the technology: would we need to patch K8s?
      → Scheduling strategy (with overcommitting)
      → "Unstable" solution
  17. Why not existing tech today? Owned apps vs third-party
      → Not designed for third-party workloads
      → Billing requirements
      → Fine-grained resource control