Slide 1

Slide 1 text

Container orchestration: in theory and practice
Jamie Hewland

Slide 2

Slide 2 text

Our approach
We build open source, scalable platforms that allow anyone with a mobile phone to access vital information and essential services, putting wellbeing in the palm of their hands.

Slide 3

Slide 3 text

What I’m here to talk about
• We’re using Mesos and Marathon in production
• … plus a bunch of other stuff
• Why did we do this?
• What have we learned?

Slide 4

Slide 4 text

Use case: Universal Core
• Our product for mobile websites: Universal Core
• Simple CMS websites, Django-based
• Needed to host many websites (100s) with many variations for Facebook’s Free Basics platform
• Existing system of manually placing Python processes on hosts wasn’t scaling

Slide 5

Slide 5 text

Universal Core cluster
• Mesos + Marathon cluster running over 400 websites, each in a Docker container
• Cluster of ~10 nodes in a South African datacenter
• Automated deployment of CMS websites by project managers
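As a rough illustration of "each website in a Docker container", the sketch below builds the kind of app definition that could be posted to Marathon. The field names follow the Marathon app definition format of the 1.x era as best understood here; the app ID prefix, image name, container port, and resource figures are made-up placeholders, not values from the Universal Core cluster.

```python
import json


def marathon_app(site_id, image, cpus=0.1, mem_mb=128):
    """Build a Marathon app definition for one containerised website.

    All defaults here are illustrative assumptions.
    """
    return {
        "id": f"/sites/{site_id}",
        "cpus": cpus,
        "mem": mem_mb,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the host.
                "portMappings": [{"containerPort": 8000, "hostPort": 0}],
            },
        },
    }


# Hypothetical site name and image.
app = marathon_app("example-site", "example/unicore-cms:latest")
print(json.dumps(app, indent=2))
```

Generating one such definition per site from a template is what makes it plausible for non-developers to trigger deployments.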

Slide 6

Slide 6 text

Universal Core cluster

Slide 7

Slide 7 text

Use case: MomConnect
• MomConnect (2014)
• Connects every pregnant woman in South Africa to national health services
• Provides information, advice, and opportunities to ask questions and share opinions
• Want to expand to other countries

Slide 8

Slide 8 text

MomConnect: where it’s going
• Seed is our “health stack”: a platform to take health services to new countries
• Data sovereignty issues mean we can’t use traditional cloud providers
• Hosted, supported & replicated locally

Slide 9

Slide 9 text

Organisational scaling
• We want to scale for impact while remaining a small company
• Hand over to local partner after some time
• “Plant the seed”

Slide 10

Slide 10 text

Seed Stack
• Need a common, automated infrastructure platform for launching services (“Seed Stack”)
• Evolving our infrastructure beyond “hosting” into a general platform for connected microservices

Slide 11

Slide 11 text

Seed Stack requirements
• Coexist with unreliable infrastructure
• Make efficient use of limited resources
• Friendly interface for other devs to use
• High level of automation for provisioning and monitoring

Slide 12

Slide 12 text

Docker containers
Any software you need, just do a docker run

Slide 13

Slide 13 text

Docker: in practice
• Not standard processes: need new tools to manage them
• New networking problems: many services per host
• Persistent storage not straightforward: have to mount volumes

Slide 14

Slide 14 text

Container orchestration
• Allocate cluster resources
• Schedule containers to run on a mixed cluster
• Restart containers if they fall over
• Provide tools for building larger systems
• Some concept of highly available “state of the world”
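The first three responsibilities above can be sketched as a toy scheduler. This is a deliberately simplified first-fit placement, not how Mesos or Marathon work internally (Mesos uses resource offers); the `Node` class, node names, and resource figures are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: float
    mem: int  # MB
    tasks: dict = field(default_factory=dict)  # task name -> (cpus, mem)

    def free(self):
        """Return the (cpus, mem) not yet claimed by tasks on this node."""
        used_cpus = sum(c for c, _ in self.tasks.values())
        used_mem = sum(m for _, m in self.tasks.values())
        return self.cpus - used_cpus, self.mem - used_mem


def schedule(nodes, task, cpus, mem):
    """Place a task on the first node with enough free CPU and RAM."""
    for node in nodes:
        free_cpus, free_mem = node.free()
        if cpus <= free_cpus and mem <= free_mem:
            node.tasks[task] = (cpus, mem)
            return node.name
    return None  # no node has capacity


def handle_node_failure(nodes, failed_name):
    """Drop a failed node and reschedule its tasks on the survivors."""
    failed = next(n for n in nodes if n.name == failed_name)
    survivors = [n for n in nodes if n.name != failed_name]
    placements = {
        task: schedule(survivors, task, cpus, mem)
        for task, (cpus, mem) in failed.tasks.items()
    }
    return survivors, placements


nodes = [Node("node-a", cpus=2, mem=2048), Node("node-b", cpus=4, mem=8192)]
schedule(nodes, "web", cpus=1.5, mem=1024)    # fits on node-a
schedule(nodes, "worker", cpus=1.0, mem=512)  # node-a lacks CPU, goes to node-b
nodes, moved = handle_node_failure(nodes, "node-a")
print(moved)
```

A real orchestrator also has to keep this placement state highly available, which is the "state of the world" point in the last bullet.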

Slide 15

Slide 15 text

Lots of open source projects
• Apache Mesos
• Mesosphere Marathon
• Google Kubernetes
• Apache Aurora
• HashiCorp Nomad
• Docker Swarm

Slide 16

Slide 16 text

Resources
Compute: CPU, RAM
Networking: IP addresses, ports, domains…
Storage: persistent general storage

Slide 17

Slide 17 text

Resources
Compute: CPU, RAM ✔
Networking: IP addresses, ports, domains…
Storage: persistent general storage

Slide 18

Slide 18 text

Container networking
• Routing: how to route requests to/from/between dynamically deployed containers
• Load-balancing: how to balance requests across multiple containers from internal & external clients
• Service discovery: how containers know where to reach each other

Slide 19

Slide 19 text

Diversion: Docker bridge networking

Slide 20

Slide 20 text

Diversion: Docker bridge networking

Slide 21

Slide 21 text

IP-per-container problem
• Don’t want to have to decide on ports for things (the scheduler should do it)
• Things expect other things to be on certain ports (HTTP: 80, PostgreSQL: 5432, RabbitMQ: 5672…)
• DNS SRV records are nice but nothing queries them
• Want an IP address per container, like a regular host

Slide 22

Slide 22 text

IP-per-container solutions
• Virtual overlay networks
  • Docker overlay networks, flannel, weave, …
  • Layer 2 - data link (frame)
  • Encapsulation and tunnelling of packets
• IP routing
  • Project Calico, flannel
  • Layer 3 - network (packet)
  • Standard IP routing with iptables isolation
VMworld 2013: Troubleshooting VXLAN and Network Services in a Virtualized Environment, Deshpande & Thakkar (2013)

Slide 23

Slide 23 text

IP-per-container solutions
• Software routers
  • Airbnb’s SmartStack Synapse, marathon-lb, …
  • Layer 4/7 - transport/application
  • HAProxy/Nginx routing by IP:port or HTTP proxying
• Cloud-specific
  • flannel, others…
  • AWS VPC route tables, GCE networking

Slide 24

Slide 24 text

Service discovery
• Give me the address for service x
• Various options:
  • HTTP API (significant code changes)
  • DNS (we might need the port too, TTLs…)
  • Layer 7 routing (limits available protocols)
• Ideally: don’t want to change the app code, only configuration

Slide 25

Slide 25 text

HashiCorp Consul
• Consul does more than one thing: DNS, key/value store, locks, health checks…
• For service discovery: DNS server and HTTP API, e.g. postgresql.service.consul, marathon.service.consul
• Does health checks of services locally on the node
• Spreads health information very quickly via gossip protocol
  • HashiCorp Serf: SWIM protocol (Das, Gupta, Motivala - 2002)
• Can use Consul Template to configure other tools
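To make the HTTP API option concrete, here is a minimal sketch of turning a Consul health query result into a list of usable endpoints. The sample payload imitates the shape returned by `GET /v1/health/service/<name>?passing` with most fields omitted; the node names and addresses are invented, and no network call is made.

```python
import json

# Illustrative, trimmed response for GET /v1/health/service/postgresql?passing
SAMPLE_RESPONSE = json.loads("""
[
  {"Node": {"Node": "node-a", "Address": "10.0.0.11"},
   "Service": {"Service": "postgresql", "Address": "", "Port": 5432}},
  {"Node": {"Node": "node-b", "Address": "10.0.0.12"},
   "Service": {"Service": "postgresql", "Address": "10.0.0.99", "Port": 5433}}
]
""")


def endpoints(entries):
    """Return (address, port) pairs for each healthy service instance.

    An empty service address means "use the node's address".
    """
    result = []
    for entry in entries:
        address = entry["Service"].get("Address") or entry["Node"]["Address"]
        result.append((address, entry["Service"]["Port"]))
    return result


print(endpoints(SAMPLE_RESPONSE))
```

The DNS interface gives the same answer without code changes in the app, but a plain A-record lookup of `postgresql.service.consul` returns only addresses, which is why the port question keeps coming up.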

Slide 26

Slide 26 text

Consular: Marathon <-> Consul bridge
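The idea of a Marathon-to-Consul bridge can be sketched as a translation from Marathon task records to Consul service registrations. This is an illustration of the concept only, not Consular's actual implementation; the sample task is invented, and the field names are assumed to match Marathon's task listing (`id`, `appId`, `host`, `ports`) and Consul's agent service registration payload (`ID`, `Name`, `Address`, `Port`).

```python
def consul_registration(task):
    """Translate one Marathon task into a Consul service registration payload."""
    return {
        "ID": task["id"],
        # Consul service names double as DNS labels, so flatten the
        # slash-separated Marathon app ID into a hyphenated name.
        "Name": task["appId"].strip("/").replace("/", "-"),
        "Address": task["host"],
        # Use the first host port Marathon assigned to the task.
        "Port": task["ports"][0],
    }


# Hypothetical task, shaped like an entry in Marathon's task list.
task = {
    "id": "sites_example.abc123",
    "appId": "/sites/example",
    "host": "10.0.0.11",
    "ports": [31005],
}
registration = consul_registration(task)
print(registration)
```

A bridge of this kind would also need to deregister services when tasks stop, which is one place where Marathon's and Consul's views of the world can drift apart.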

Slide 27

Slide 27 text

Distributed storage
• Open source distributed storage/networked filesystem: GlusterFS, Ceph, HDFS
• Docker volume API to tie to containers
• Software that moves volumes with your containers: Flocker, Sheepdog Project
• “EBS on AWS, bring your own network block device elsewhere”

Slide 28

Slide 28 text

What we picked
• Mesos + Marathon
• Consul service discovery
• Templated Nginx routing/load-balancing
• GlusterFS storage
• Custom frontend for launching apps
• Shared PostgreSQL, RabbitMQ, Redis instances
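"Templated Nginx routing/load-balancing" can be pictured as regenerating an Nginx config whenever the set of healthy backends changes. The sketch below does the rendering in plain Python as a stand-in for a templating tool such as Consul Template; the `render_upstream` helper, the `example.org` domain, and the backend addresses are assumptions for illustration.

```python
def render_upstream(name, backends):
    """Render an Nginx upstream plus a server block proxying to it.

    `backends` is a list of (host, port) pairs, e.g. from service discovery.
    """
    servers = "\n".join(f"    server {host}:{port};" for host, port in backends)
    return (
        f"upstream {name} {{\n{servers}\n}}\n\n"
        "server {\n"
        "    listen 80;\n"
        f"    server_name {name}.example.org;\n"
        "    location / {\n"
        f"        proxy_pass http://{name};\n"
        "    }\n"
        "}\n"
    )


config = render_upstream(
    "sites-example", [("10.0.0.11", 31005), ("10.0.0.12", 31017)]
)
print(config)
```

Note that the names and addresses are interpolated without any validation, so whoever controls the template inputs effectively controls the router's configuration.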

Slide 29

Slide 29 text

What works well
• Mesos! Marathon!
• Make much better use of available resources
• Adding more nodes relatively painless
• If a node falls over things behave pretty well
• High level of automation (for most cases)

Slide 30

Slide 30 text

…and not so well
• Networking: we don’t have IP-per-container yet
• Duplicated and sometimes conflicting functionality in Marathon and Consul
• Security: templated configuration is very powerful
• Want to move more infrastructure into containers that run on Mesos
• Still a lot of auxiliary services outside the cluster

Slide 31

Slide 31 text

Future work
• Waiting for integration between Mesos/Marathon and IP-per-container solutions
• Logging, monitoring, fault-reporting all still need work
• Sharing secrets between nodes/containers (HashiCorp Vault)
• Ask us in 6 months about how our first couple of in-country deployments have gone…

Slide 32

Slide 32 text

praekelt/seed-stack
Thank you.