Container orchestration: in theory and practice (ScaleConf 2016)

Container orchestration: in theory and practice Jamie Hewland

Our approach
We build open source, scalable platforms that allow anyone with a mobile phone to access vital information and essential services, putting wellbeing in the palm of their hands.

What I’m here to talk about
• We’re using Mesos and Marathon in production
• … plus a bunch of other stuff
• Why did we do this?
• What have we learned?

Use case: Universal Core
• Our product for mobile websites: Universal Core
• Simple CMS websites, Django-based
• Needed to host many websites (100s) with many variations for Facebook’s Free Basics platform
• Existing system of manually placing Python processes on hosts wasn’t scaling

Universal Core cluster
• Mesos + Marathon cluster running over 400 websites, each in a Docker container
• Cluster of ~10 nodes in a South African datacenter
• Automated deployment of CMS websites by project managers
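As a rough illustration of what "each website in a Docker container" looks like to Marathon, here is a minimal sketch that builds an app definition of the kind posted to Marathon's `/v2/apps` endpoint. The field layout follows Marathon's JSON app format from this era as I understand it; the app ID prefix, image name, container port and resource numbers are all hypothetical and are not taken from the talk.

```python
import json


def marathon_app(site_id, image, cpus=0.1, mem_mb=128):
    """Build a Marathon app definition for one containerised website.

    All values here are illustrative assumptions, not the real cluster's.
    """
    return {
        "id": "/sites/" + site_id,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the host.
                "portMappings": [{"containerPort": 8000, "hostPort": 0}],
            },
        },
    }


# Hypothetical site and image names.
app = marathon_app("example-site", "example/unicore-cms:latest")
print(json.dumps(app, indent=2))
```

Generating one such definition per site is what makes automated deployment by non-engineers plausible: the only inputs that change are the site ID and the image.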

Universal Core cluster

Use case: MomConnect
• MomConnect - 2014
• Connects every pregnant woman in South Africa to national health services
• Provides information, advice, and opportunities to ask questions and share opinions
• Want to expand to other countries

MomConnect: where it’s going
• Seed is our “health stack”: platform to take health services to new countries
• Data sovereignty issues mean we can’t use traditional cloud providers
• Hosted, supported & replicated locally

Organisational scaling
• We want to scale for impact while remaining a small company
• Hand over to local partner after some time
• “Plant the seed”

Seed Stack
• Need a common, automated infrastructure platform for launching services (“Seed Stack”)
• Evolving our infrastructure beyond “hosting”: general platform for connected microservices

Seed Stack requirements
• Coexist with unreliable infrastructure
• Make efficient use of limited resources
• Friendly interface for other devs to use
• High level of automation from provisioning and monitoring perspective

Docker containers
Any software you need, just do a docker run

Docker: in practice
• Not standard processes: need new tools to manage
• New networking problems: many services per host
• Persistent storage not straightforward: have to mount volumes

Container orchestration
• Allocate cluster resources
• Schedule containers to run on a mixed cluster
• Restart containers if they fall over
• Provide tools for building larger systems
• Some concept of highly available “state of the world”
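The first three responsibilities above (allocate, schedule, restart) can be sketched as a toy first-fit scheduler. This is a deliberately simplified illustration with hypothetical names, not how Mesos or Marathon are implemented (Mesos, for example, uses resource offers rather than a central loop like this).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: float
    mem: int
    tasks: dict = field(default_factory=dict)  # task name -> (cpus, mem)

    def free(self):
        """Return the (cpus, mem) not yet claimed by tasks on this node."""
        used_cpus = sum(c for c, _ in self.tasks.values())
        used_mem = sum(m for _, m in self.tasks.values())
        return self.cpus - used_cpus, self.mem - used_mem


def place(nodes, task, cpus, mem):
    """First-fit: put the task on the first node with enough free resources."""
    for node in nodes:
        free_cpus, free_mem = node.free()
        if cpus <= free_cpus and mem <= free_mem:
            node.tasks[task] = (cpus, mem)
            return node.name
    return None  # no node has capacity


def handle_node_failure(nodes, failed_name):
    """Drop the failed node and re-place its tasks on the surviving nodes."""
    failed = next(n for n in nodes if n.name == failed_name)
    survivors = [n for n in nodes if n.name != failed_name]
    moved = {
        task: place(survivors, task, cpus, mem)
        for task, (cpus, mem) in failed.tasks.items()
    }
    return survivors, moved


# A "mixed" cluster: nodes of different sizes.
nodes = [Node("big", cpus=4, mem=8192), Node("small", cpus=1, mem=1024)]
place(nodes, "web-1", 0.5, 512)
place(nodes, "web-2", 0.5, 512)
nodes, moved = handle_node_failure(nodes, "big")
print(moved)
```

The dictionary of node state here stands in for the "state of the world"; in a real orchestrator that state has to be replicated so the scheduler itself can fail over.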

Lots of open source projects
• Apache Mesos
• Mesosphere Marathon
• Google Kubernetes
• Apache Aurora
• HashiCorp Nomad
• Docker Swarm

Resources
Compute: CPU, RAM
Networking: IP addresses, ports, domains…
Storage: persistent general storage

Resources
Compute: CPU, RAM ✔
Networking: IP addresses, ports, domains…
Storage: persistent general storage

Container networking
• Routing: how to route requests to/from/between dynamically deployed containers
• Load-balancing: how to balance requests across multiple containers from internal & external clients
• Service discovery: how containers know where to reach each other

Diversion: Docker bridge networking

IP-per-container problem
• Don’t want to have to decide on ports for things (the scheduler should do it)
• Things expect other things to be on certain ports (HTTP: 80, PostgreSQL: 5432, RabbitMQ: 5672…)
• DNS SRV records are nice but nothing queries them
• Want an IP address per container, like a regular host
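To make the SRV point concrete: an SRV record carries the port as well as the host, which is exactly what a scheduler-assigned port needs, but a client has to ask for it explicitly. The sketch below parses SRV record data in the standard "priority weight port target" layout. The record values and hostnames are made up, and the selection rule is simplified (real clients pick randomly in proportion to weight among records of equal priority).

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata):
    """Parse SRV record data of the form 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def pick_endpoint(records):
    """Simplified choice: lowest priority value, then highest weight."""
    best = min(records, key=lambda r: (r.priority, -r.weight))
    return best.target, best.port


# Hypothetical answers for a PostgreSQL service on scheduler-assigned ports.
answers = [
    "1 10 31005 node-3.example.internal.",
    "1 20 31877 node-7.example.internal.",
    "2 50 31002 node-1.example.internal.",
]
records = [parse_srv(a) for a in answers]
print(pick_endpoint(records))
```

A client that only does a plain address lookup gets the host and then assumes port 5432, which is why the slide concludes that an IP address per container is the more practical goal.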

IP-per-container solutions
• Virtual overlay networks
  • Docker overlay networks, flannel, weave, …
  • Layer 2 - data link (frame)
  • Encapsulation and tunnelling of packets
• IP routing
  • Project Calico, flannel
  • Layer 3 - network (packet)
  • Standard IP routing with iptables isolation
VMworld 2013: Troubleshooting VXLAN and Network Services in a Virtualized Environment, Deshpande & Thakkar (2013)

IP-per-container solutions
• Software routers
  • Airbnb’s SmartStack Synapse, marathon-lb, …
  • Layer 4/7 - transport/application
  • HAProxy/Nginx routing by IP:port or HTTP proxying
• Cloud-specific
  • flannel, others…
  • AWS VPC route tables, GCE networking

Service discovery
• Give me the address for service x
• Various options:
  • HTTP API (significant code changes)
  • DNS (we might need the port too, TTLs…)
  • Layer 7 routing (limits available protocols)
• Ideally: don’t want to change the app code, only configuration

HashiCorp Consul
• Consul does more than one thing: DNS, key/value store, locks, health checks…
• For service discovery: DNS server and HTTP API, e.g. postgresql.service.consul, marathon.service.consul
• Does health checks of services locally on the node
• Spreads health information very quickly via gossip protocol
  • HashiCorp Serf: SWIM protocol (Das, Gupta, Motivala - 2002)
• Can use Consul Template to configure other tools
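The two discovery interfaces mentioned above can be sketched without a running agent. The helper below builds the DNS name for a service and extracts host/port pairs from a response shaped like Consul's catalog endpoint (`/v1/catalog/service/<name>`). The sample payload is hand-written for illustration, with made-up nodes and addresses, and only includes the fields the sketch reads.

```python
import json

# Hand-written sample in the shape of a Consul catalog response (assumption:
# only the fields used below are shown; real responses carry more).
SAMPLE_RESPONSE = json.loads("""
[
  {"Node": "node-1", "Address": "10.0.0.11",
   "ServiceName": "postgresql", "ServiceAddress": "", "ServicePort": 5432},
  {"Node": "node-2", "Address": "10.0.0.12",
   "ServiceName": "postgresql", "ServiceAddress": "10.0.0.12", "ServicePort": 5432}
]
""")


def consul_dns_name(service, domain="consul"):
    """Name to query against Consul's DNS interface, e.g. postgresql.service.consul."""
    return "{}.service.{}".format(service, domain)


def endpoints(catalog_entries):
    """Return (host, port) pairs, falling back to the node address
    when the service did not register an address of its own."""
    result = []
    for entry in catalog_entries:
        host = entry.get("ServiceAddress") or entry["Address"]
        result.append((host, entry["ServicePort"]))
    return result


print(consul_dns_name("postgresql"))
print(endpoints(SAMPLE_RESPONSE))
```

The DNS name needs only a configuration change in the app, while the HTTP API also gives the port, which matters when the scheduler assigns ports dynamically.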

Consular: Marathon <-> Consul bridge

Distributed storage
• Open source distributed storage/networked filesystem: GlusterFS, Ceph, HDFS
• Docker volume API to tie to containers
• Software that moves volumes with your containers: Flocker, Sheepdog Project
• “EBS on AWS, bring your own network block device elsewhere”

What we picked
• Mesos + Marathon
• Consul service discovery
• Templated Nginx routing/load-balancing
• GlusterFS storage
• Custom frontend for launching apps
• Shared PostgreSQL, RabbitMQ, Redis instances
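"Templated Nginx routing" roughly means regenerating an Nginx config whenever the set of healthy backends changes. The sketch below shows the idea with plain Python string formatting as a stand-in for a real template tool; the domain, service name and addresses are hypothetical, and this is not the actual Seed Stack template.

```python
def render_nginx(service, backends):
    """Render an upstream + server block for one service.

    `backends` is a list of (host, port) pairs, for example as returned
    by a service-discovery lookup.
    """
    servers = "\n".join(
        "    server {}:{};".format(host, port) for host, port in backends
    )
    return (
        "upstream {name} {{\n"
        "{servers}\n"
        "}}\n"
        "server {{\n"
        "    listen 80;\n"
        "    server_name {name}.example.org;\n"
        "    location / {{\n"
        "        proxy_pass http://{name};\n"
        "    }}\n"
        "}}\n"
    ).format(name=service, servers=servers)


# Hypothetical backends on scheduler-assigned ports.
config = render_nginx("mysite", [("10.0.0.11", 31005), ("10.0.0.12", 31877)])
print(config)
```

Because the upstream lists every backend, Nginx load-balances across them by default, so routing and load-balancing come from the same generated file.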

What works well
• Mesos! Marathon!
• Make much better use of available resources
• Adding more nodes relatively painless
• If a node falls over things behave pretty well
• High level of automation (for most cases)

…and not so well
• Networking: we don’t have IP-per-container yet
• Duplicated and sometimes conflicting functionality in Marathon and Consul
• Security: templated configuration is very powerful
• Want to move more infrastructure into containers that run on Mesos
• Still a lot of auxiliary services outside the cluster

Future work
• Waiting for integration between Mesos/Marathon and IP-per-container solutions
• Logging, monitoring, fault-reporting all still need work
• Sharing secrets between nodes/containers (HashiCorp Vault)
• Ask us in 6 months about how our first couple of in-country deployments have gone…

praekelt/seed-stack Thank you.
