Container orchestration: in theory and practice (ScaleConf 2016)

Open source tools from companies such as Mesosphere, CoreOS and HashiCorp have made it possible for small teams to build scalable, highly automated infrastructure on which to deploy applications. Software container systems such as Docker have drastically simplified the way these applications are packaged and executed.

Over the last two years, we've worked hard to update our infrastructure from a complex set of manually coordinated processes to an automated system that deploys Docker containers to a cluster. We made this move to reduce costs, scale further and make our software more portable.

We've used this new infrastructure to host hundreds of websites for Facebook's Free Basics platform. Our new health software stack, Seed, has pushed the requirements of the infrastructure even further. We're working towards replicating this stack to deployments in countries across the world.

The talk will cover some of the basic building blocks of cluster orchestration systems. We'll also discuss some of the challenges in networking containers at scale and touch on some solutions for persistent storage.

Finally, we'll look at how these problems and the available solutions have shaped our infrastructure, the design trade-offs we made, and what we're looking forward to in the future.


Jamie Hewland

March 14, 2016


  1. Container orchestration: in theory and practice Jamie Hewland

  2. Our approach
     We build open source, scalable platforms that allow anyone with a mobile phone to access vital information and essential services – putting wellbeing in the palm of their hands.
  3. What I’m here to talk about
     • We’re using Mesos and Marathon in production
     • … plus a bunch of other stuff
     • Why did we do this?
     • What have we learned?
  4. Use case: Universal Core
     • Our product for mobile websites: Universal Core
     • Simple CMS websites, Django-based
     • Needed to host many websites (100s) with many variations for Facebook’s Free Basics platform
     • Existing system of manually placing Python processes on hosts wasn’t scaling
  5. Universal Core cluster
     • Mesos + Marathon cluster running over 400 websites, each in a Docker container
     • Cluster of ~10 nodes in South African datacenter
     • Automated deployment of CMS websites by project managers
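
As a rough illustration of what "each in a Docker container" could look like on Marathon, the sketch below builds an app definition in the older `container.docker.portMappings` layout that Marathon accepted on `POST /v2/apps` around the time of the talk. The helper name, image name and resource figures are hypothetical and are not taken from the slides.

```python
import json


def marathon_app(site_id, image, cpus=0.1, mem=128, container_port=8000):
    """Build a Marathon app definition for one Dockerised website.

    All values are illustrative assumptions, not the talk's real settings.
    """
    return {
        "id": "/" + site_id,
        "cpus": cpus,
        "mem": mem,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the host.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0}
                ],
            },
        },
    }


# Hypothetical site and image names.
app = marathon_app("example-site", "example/unicore-cms:latest")
print(json.dumps(app, indent=2))
```

Generating one such definition per website is one plausible way to automate hundreds of near-identical deployments; the deck does not say how the real definitions were produced.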
  6. Universal Core cluster

  7. Use case: MomConnect
     • MomConnect - 2014
     • Connects every pregnant woman in South Africa to national health services
     • Provides information, advice, and opportunities to ask questions and share opinions
     • Want to expand to other countries
  8. MomConnect: where it’s going
     • Seed is our “health stack”: a platform to take health services to new countries
     • Data sovereignty issues mean we can’t use traditional cloud providers
     • Hosted, supported & replicated locally
  9. Organisational scaling
     • We want to scale for impact while remaining a small company
     • Hand over to local partner after some time
     • “Plant the seed”
  10. Seed Stack
      • Need a common, automated infrastructure platform for launching services (“Seed Stack”)
      • Evolving our infrastructure beyond “hosting”: a general platform for connected microservices
  11. Seed Stack requirements
      • Coexist with unreliable infrastructure
      • Make efficient use of limited resources
      • Friendly interface for other devs to use
      • High level of automation from provisioning and monitoring perspective
  12. Docker containers Any software you need, just do a docker

  13. Docker: in practice
      • Not standard processes: need new tools to manage
      • New networking problems: many services per host
      • Persistent storage not straightforward: have to mount volumes
  14. Container orchestration
      • Allocate cluster resources
      • Schedule containers to run on a mixed cluster
      • Restart containers if they fall over
      • Provide tools for building larger systems
      • Some concept of highly available “state of the world”
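
The first three responsibilities above (allocate, schedule, restart) can be sketched with a toy first-fit scheduler. This is a deliberately simplified model with made-up node and task names. It is not how Mesos works internally: Mesos offers resources to frameworks such as Marathon, which then decide what to launch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: float
    mem: int
    tasks: list = field(default_factory=list)


def schedule(task, cpus, mem, nodes):
    """Place a task on the first node with enough free CPU and memory."""
    for node in nodes:
        if node.cpus >= cpus and node.mem >= mem:
            node.cpus -= cpus
            node.mem -= mem
            node.tasks.append((task, cpus, mem))
            return node.name
    return None  # no node has enough capacity


def fail_node(dead, nodes):
    """Reschedule every task from a failed node onto the surviving nodes."""
    survivors = [n for n in nodes if n is not dead]
    return {
        task: schedule(task, cpus, mem, survivors)
        for task, cpus, mem in dead.tasks
    }


nodes = [Node("node-1", cpus=1.0, mem=1024), Node("node-2", cpus=2.0, mem=2048)]
placements = {t: schedule(t, 0.5, 512, nodes) for t in ["web-a", "web-b", "web-c"]}
moved = fail_node(nodes[0], nodes)
print(placements)
print(moved)
```

A real orchestrator also has to keep the placement table itself highly available, which is the “state of the world” point on the slide.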
  15. Lots of open source projects
      • Apache Mesos
      • Mesosphere Marathon
      • Google Kubernetes
      • Apache Aurora
      • HashiCorp Nomad
      • Docker Swarm
  16. Resources
      Compute: CPU, RAM
      Networking: IP addresses, ports, domains…
      Storage: persistent general storage
  17. Resources
      Compute: CPU, RAM ✔
      Networking: IP addresses, ports, domains…
      Storage: persistent general storage
  18. Container networking
      • Routing: how to route requests to/from/between dynamically deployed containers
      • Load-balancing: how to balance requests across multiple containers from internal & external clients
      • Service discovery: how containers know where to reach each other
  19. Diversion: Docker bridge networking

  20. Diversion: Docker bridge networking

  21. IP-per-container problem
      • Don’t want to have to decide on ports for things (the scheduler should do it)
      • Things expect other things to be on certain ports (HTTP: 80, PostgreSQL: 5432, RabbitMQ: 5672…)
      • DNS SRV records are nice but nothing queries them
      • Want an IP address per container, like a regular host
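
To make the SRV point concrete: an SRV record carries the port as well as the host, so a client that looked it up could find a service on a scheduler-assigned port. The sketch below parses the data portion of such a record in the field order defined by RFC 2782 (priority, weight, port, target). The host name and port are hypothetical.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_rdata(rdata):
    """Parse SRV record data in RFC 2782 order: priority weight port target."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


# Hypothetical answer for a PostgreSQL container on a scheduler-assigned port.
record = parse_srv_rdata("1 1 31005 node-3.example.internal.")
print(f"connect to {record.target}:{record.port}")
```

Most client libraries only resolve A records and assume the well-known port (5432 here), which is why the slide asks for an IP address per container instead.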
  22. IP-per-container solutions
      • Virtual overlay networks
        • Docker overlay networks, flannel, weave, …
        • Layer 2 - data link (frame)
        • Encapsulation and tunnelling of packets
      • IP routing
        • Project Calico, flannel
        • Layer 3 - network (packet)
        • Standard IP routing with iptables isolation
      VMworld 2013: Troubleshooting VXLAN and Network Services in a Virtualized Environment, Deshpande & Thakkar (2013)
  23. IP-per-container solutions
      • Software routers
        • Airbnb’s SmartStack Synapse, marathon-lb, …
        • Layer 4/7 - transport/application
        • HAProxy/Nginx routing by IP:port or HTTP proxying
      • Cloud-specific
        • flannel, others…
        • AWS VPC route tables, GCE networking
  24. Service discovery
      • Give me the address for service x
      • Various options:
        • HTTP API (significant code changes)
        • DNS (we might need the port too, TTLs…)
        • Layer 7 routing (limits available protocols)
      • Ideally: don’t want to change the app code, only configuration
  25. HashiCorp Consul
      • Consul does more than one thing: DNS, key/value store, locks, health checks…
      • For service discovery: DNS server and HTTP API, e.g. postgresql.service.consul, marathon.service.consul
      • Does health checks of services locally on the node
      • Spreads health information very quickly via gossip protocol
        • HashiCorp Serf: SWIM protocol (Das, Gupta, Motivala - 2002)
      • Can use Consul Template to configure other tools
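
As a sketch of the HTTP API option, the function below extracts host and port pairs from a response shaped like the one Consul returns for `GET /v1/health/service/postgresql?passing`. The sample body is hand-written with hypothetical addresses and trimmed to the fields used here; a real response contains more fields, including the check results.

```python
import json

# Hand-written sample, not captured from a real cluster.
SAMPLE = """
[
  {"Node": {"Node": "node-1", "Address": "10.0.0.11"},
   "Service": {"ID": "pg-1", "Service": "postgresql", "Address": "", "Port": 5432}},
  {"Node": {"Node": "node-2", "Address": "10.0.0.12"},
   "Service": {"ID": "pg-2", "Service": "postgresql", "Address": "10.0.0.22", "Port": 5433}}
]
"""


def endpoints(body):
    """Return (host, port) pairs for each service instance in the response."""
    result = []
    for entry in json.loads(body):
        service = entry["Service"]
        # An empty service address means "use the address of the node".
        host = service.get("Address") or entry["Node"]["Address"]
        result.append((host, service["Port"]))
    return result


print(endpoints(SAMPLE))
```

The DNS interface exposes the same data without application changes, at the cost of the port and TTL caveats noted on the service discovery slide.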
  26. Consular: Marathon <-> Consul bridge
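
The core idea of such a bridge can be sketched as a translation from a Marathon task to a Consul service registration. This is an assumption about the general approach, not Consular's actual code. The task fields follow Marathon's `/v2/tasks` output, the result follows the body of Consul's `/v1/agent/service/register` endpoint, and the naming rule and sample values are made up.

```python
def registration_for(task):
    """Translate one Marathon task into a Consul service registration body."""
    return {
        "ID": task["id"],
        # Hypothetical naming rule: "/group/app" becomes "group-app".
        "Name": task["appId"].strip("/").replace("/", "-"),
        "Address": task["host"],
        # Register the first port Marathon assigned to the task.
        "Port": task["ports"][0],
    }


# Hypothetical task, shaped like an entry from Marathon's task list.
task = {
    "id": "example-site.6f1c",
    "appId": "/sites/example-site",
    "host": "node-1.example.internal",
    "ports": [31005],
}
payload = registration_for(task)
print(payload)
```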

  27. Distributed storage
      • Open source distributed storage/networked filesystem: GlusterFS, Ceph, HDFS
      • Docker volume API to tie to containers
      • Software that moves volumes with your containers: Flocker, Sheepdog Project
      • “EBS on AWS, bring your own network block device elsewhere”
  28. What we picked
      • Mesos + Marathon
      • Consul service discovery
      • Templated Nginx routing/load-balancing
      • GlusterFS storage
      • Custom frontend for launching apps
      • Shared PostgreSQL, RabbitMQ, Redis instances
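
“Templated Nginx routing” amounts to regenerating Nginx configuration whenever the set of running containers changes. The sketch below renders an `upstream` block from a list of backends, using plain Python string formatting as a stand-in for Consul Template. The upstream name and addresses are hypothetical.

```python
def render_upstream(name, backends):
    """Render an Nginx upstream block listing one server per backend."""
    servers = "\n".join(f"    server {host}:{port};" for host, port in backends)
    return f"upstream {name} {{\n{servers}\n}}\n"


# Hypothetical backends, e.g. two containers on scheduler-assigned ports.
config = render_upstream(
    "example-site",
    [("10.0.0.11", 31005), ("10.0.0.12", 31017)],
)
print(config)
```

In a real setup the rendered file would be written to disk and Nginx reloaded; anything that can influence the template input can change the routing, which is relevant to the security concern about templated configuration later in the deck.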
  29. What works well
      • Mesos! Marathon!
      • Make much better use of available resources
      • Adding more nodes relatively painless
      • If a node falls over things behave pretty well
      • High level of automation (for most cases)
  30. …and not so well
      • Networking: we don’t have IP-per-container yet
      • Duplicated and sometimes conflicting functionality in Marathon and Consul
      • Security: templated configuration is very powerful
      • Want to move more infrastructure into containers that run on Mesos
      • Still a lot of auxiliary services outside the cluster
  31. Future work
      • Waiting for integration between Mesos/Marathon and IP-per-container solutions
      • Logging, monitoring, fault-reporting all still need work
      • Sharing secrets between nodes/containers (HashiCorp Vault)
      • Ask us in 6 months about how our first couple of in-country deployments have gone…
  32. praekelt/seed-stack Thank you.