Production-Ready Containers: Using Aurora and Mesos

dlester
September 22, 2015

Presented at Container Summit 2015 in San Francisco.

Contact: @davelester




  1. Production-Ready Containers: Using Aurora and Mesos
    @davelester, OSS Advocate at Twitter, Inc.
    Apache Mesos and Apache Aurora PMC Member
  2. Twitter is the pulse of the planet

  3. How does your system handle spikes like this? (Castle in the Sky, 8/03/2013 TPS record)
  4. None
  5. “Throw machines at the problem” vs. improving the scalability of your system
  6. Previously at Twitter
    • During the World Cup, Twitter ran on a large Ruby on Rails codebase. At that time, approximately 200 engineers were pushing code
    • Needed a solution to isolate failure and isolate feature development
  7. Mesos at Twitter
    • “Twitter Stack”, including Mesos, Aurora, Finagle
    • Hundreds of separate services with different owners
    • Managed by the Site Reliability Engineering (SRE) team
  8. Mesos Adopters

  9. The New Stack


  11. Credit: “Protecting Yourself from the Container Shakeout” by Boris

  12. Two Questions
    • What are the right abstractions for software to integrate?
    • What are the interfaces for developers?
  13. Four Value Propositions for Apache Mesos
    • Reliability
    • Manageability
    • Utilization
    • Approachability
  14. Overview
    1. Resource management and Apache Mesos
    2. Container scheduling and Apache Aurora
    3. Future abstractions and interfaces for managing containers in a distributed system
  15. Resource management and Apache Mesos

  16. Apache Mesos
    A layer of abstraction between machines in a cluster and application frameworks that share resources
    [Diagram: framework(s) (Aurora, …) on top of Apache Mesos, which runs across the machines (nodes) of the cluster]
  17. Apache Aurora
    [Diagram: Apache Aurora as a framework on top of Apache Mesos, which runs across the machines (nodes) of the cluster]
  18. Mesos is the nervous system for the datacenter; schedulers are the brain(s)
  19. Mesos Actively Monitors
    • Cluster state (via ZooKeeper)
    • Available hardware resources
  20. Key Problems Addressed
    • Fault tolerance
    • Resource efficiency and

  21. Mesos Design Challenges
    1. Each framework may have different scheduling needs
    2. Must scale to tens of thousands of nodes running hundreds of jobs with millions of tasks
    3. Must be fault-tolerant and highly available
  22. Mesos Frameworks
    • Services: Aurora, Marathon, Kubernetes, Singularity
    • Big Data: Spark, Storm, Hadoop
    • Batch: Chronos, Jenkins
  23. Container Scheduling and Apache Aurora

  24. Basic Mesos Architecture
    Distributing/scheduling tasks across a set of agent machines
    [Diagram: ZooKeeper cluster, Mesos master, and three agents]
  25. Schedulers/Frameworks
    Example: scheduling an Aurora job on two machines. One box idles.
    [Diagram: Aurora, ZooKeeper cluster, Mesos master, and three agents; two agents run the job :) and one idles ¯\_(ツ)_/¯]
  26. Leveraging Containers
    Uses Linux Containers for CPU and memory isolation; container size can change (be elastic) over time. Also: native Docker support.
    [Diagram: Aurora, ZooKeeper cluster, Mesos master, and three agents; two agents run containers and one idles ¯\_(ツ)_/¯]
  27. Failure of an agent
    Example: the failure is detected by the scheduler and the task is rescheduled onto another machine.
    [Diagram: framework, ZooKeeper cluster, Mesos master, two healthy agents :) and one failed agent :(]
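The rescheduling step on slide 27 can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use the Mesos or Aurora APIs, and the `Cluster` class, the `handle_agent_failure` method, and the agent and task names are all hypothetical. It assumes the scheduler already knows which agent failed and simply moves that agent's tasks to the least-loaded surviving agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model (hypothetical): maps each agent name to the tasks it runs."""
    agents: dict[str, set[str]] = field(default_factory=dict)

    def least_loaded(self) -> str:
        # Pick the surviving agent with the fewest tasks; ties broken by name.
        return min(self.agents, key=lambda a: (len(self.agents[a]), a))

    def handle_agent_failure(self, failed: str) -> dict[str, str]:
        """Drop a failed agent and move its tasks onto surviving agents.

        Returns a mapping of task name -> agent it was rescheduled onto.
        """
        orphaned = self.agents.pop(failed, set())
        if orphaned and not self.agents:
            raise RuntimeError("no surviving agents to reschedule onto")
        moves = {}
        for task in sorted(orphaned):
            target = self.least_loaded()
            self.agents[target].add(task)
            moves[task] = target
        return moves


# Same starting point as slide 25: a job on two machines, one box idle.
cluster = Cluster({"agent-1": {"web-0"}, "agent-2": {"web-1"}, "agent-3": set()})
moves = cluster.handle_agent_failure("agent-2")
print(moves)
```

In this toy run the task from the failed agent lands on the previously idle machine. A real scheduler would also have to detect the failure and account for resources, which the sketch leaves out.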
  28. Failure of a master
    Leader election of master nodes whose state is replicated across other nodes. Agents connect to new Mesos master.
    [Diagram: ZooKeeper cluster, a failed Mesos master :(, a standby master, and three agents]
  29. Apache Aurora
    [Diagram: Apache Aurora as a framework on top of Apache Mesos, which runs across the machines (nodes) of the cluster]
  30. Aurora’s Approach
    • One scheduler to rule them all: can manage both long-running services and cron jobs
    • For services, “run 200 of these, forever”
    • Has built-in priority and quota systems
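The “run 200 of these, forever” idea on slide 30 can be read as a reconciliation step: compare the desired instance count with what is running and launch whatever is missing. The sketch below is a toy illustration of that idea, not Aurora's implementation; the `reconcile` function and the numbering of instances from 0 are assumptions made for the example.

```python
def reconcile(desired_count, running_ids):
    """Compare desired instances (0..desired_count-1) with running ones.

    Returns (to_launch, to_kill) as sorted lists of instance ids.
    """
    wanted = set(range(desired_count))
    to_launch = sorted(wanted - running_ids)
    to_kill = sorted(running_ids - wanted)
    return to_launch, to_kill


# 200 instances are wanted, but instances 7 and 42 have died.
running = set(range(200)) - {7, 42}
to_launch, to_kill = reconcile(200, running)
print(to_launch, to_kill)
```

Running a step like this repeatedly is what would make the count hold “forever”: each pass repairs whatever drifted since the last one.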
  31. Key Aurora Features
    • Deployment and scheduling of jobs
    • Rich DSL for defining services
    • Health checking and SLA monitoring
    • Battle-tested in production at Twitter for multiple years
  32. Key Aurora Features
    • Scheduling Diversity Constraints
    • Host and rack diversity
    • Job abstraction to bundle tasks
    • Ability to run multiple applications that are replicas of one another, and manage through a single point
    • Rolling Deploys
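Host and rack diversity constraints from slide 32 can be sketched as a placement function that caps how many instances of a job share a host or a rack. This is a toy illustration and not Aurora's scheduler or configuration syntax; `place_instances`, its parameters, and the host and rack names are hypothetical.

```python
from collections import Counter


def place_instances(count, hosts, max_per_host=1, max_per_rack=None):
    """Place `count` instances on hosts while respecting diversity limits.

    `hosts` maps host name -> rack name. Returns a list of
    (instance_id, host) pairs, or raises if the limits cannot be met.
    """
    per_host = Counter()
    per_rack = Counter()
    placement = []
    for instance in range(count):
        candidates = [
            host for host, rack in hosts.items()
            if per_host[host] < max_per_host
            and (max_per_rack is None or per_rack[rack] < max_per_rack)
        ]
        if not candidates:
            raise RuntimeError(f"cannot place instance {instance}")
        # Prefer the least-used rack, then the least-used host, then name order.
        chosen = min(candidates, key=lambda h: (per_rack[hosts[h]], per_host[h], h))
        per_host[chosen] += 1
        per_rack[hosts[chosen]] += 1
        placement.append((instance, chosen))
    return placement


hosts = {"host-a": "rack-1", "host-b": "rack-1", "host-c": "rack-2", "host-d": "rack-2"}
placement = place_instances(3, hosts, max_per_host=1, max_per_rack=2)
print(placement)
```

With one instance per host, losing a single machine costs at most one replica; the rack limit gives the same kind of protection against losing a whole rack.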
  33. Aurora supports preemption
    • Can mark production and non-production jobs; production jobs can preempt non-prod jobs
    • Critical to increased utilization of machines in our cluster
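The preemption rule on slide 33 can be sketched for a single machine: a production task that does not fit may evict non-production tasks, while a non-production task never evicts anything. This is a toy model under stated assumptions, not Aurora's algorithm; the `Task` class, `schedule_with_preemption`, the CPU-only accounting, and the largest-victim-first eviction order are all choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpus: float
    production: bool


def schedule_with_preemption(running, capacity, new_task):
    """Try to fit new_task on one machine with `capacity` CPUs.

    Returns (placed, running_after, evicted). Only production tasks may
    evict, and only non-production tasks are ever evicted.
    """
    free = capacity - sum(t.cpus for t in running)
    evicted = []
    if new_task.production:
        # Evict the largest non-production tasks first until the task fits.
        victims = sorted((t for t in running if not t.production),
                         key=lambda t: t.cpus, reverse=True)
        for victim in victims:
            if new_task.cpus <= free:
                break
            evicted.append(victim)
            free += victim.cpus
    if new_task.cpus > free:
        # Not enough room even after preemption: leave the machine untouched.
        return False, list(running), []
    survivors = [t for t in running if t not in evicted]
    return True, survivors + [new_task], evicted


running = [Task("api", 2.0, True), Task("batch", 2.0, False)]
placed, after, evicted = schedule_with_preemption(running, 4.0, Task("web", 2.0, True))
print(placed, [t.name for t in after], [t.name for t in evicted])
```

The utilization argument follows from this rule: non-production work can fill otherwise idle capacity because it can be evicted whenever production work needs the room.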
  34. Future Abstractions and Interfaces

  35. What are tomorrow’s abstractions?

  36. Will foundations determine tomorrow’s abstractions?

  37. Mesos: datacenter as a computer?

  38. Resource Management != Container Scheduling

  39. Unbundling and modularization
    • Cloud Foundry has been unbundled into Diego, Lattice
    • Mesos has a module system + frameworks
    • Kubernetes has a plugin system
    • Docker has a plugin system
  40. What are common interfaces for tomorrow’s developers? CLI? UI?

  41. Will these interfaces vary locally vs. in the cloud? Small vs. large scale?
  42. Schedulers? Kubernetes, Aurora, Marathon, …?

  43. Container tools themselves, like Docker, Swarm, etc?

  44. Orchestration as a service?

  45. Credit: “Protecting Yourself from the Container Shakeout” by Boris

  46. Thank You Dave Lester @davelester