
A story of Mesos, Twitter, and a growing ecosystem

dlester
June 24, 2015

Presented at the DevNation conference, June 24th 2015 in Boston, MA.





  1. A story of Mesos, Twitter, and a growing ecosystem
    @davelester, OSS Advocate at Twitter, Inc.
    Apache Mesos and Aurora PMC Member
    DevNation 2015, Boston
  2. Why you’re probably here
    • Schedulers (Kubernetes, Aurora, Marathon)
    • Cluster Managers (Mesos, YARN)
    • Containers (Docker, Rocket, OCF)
    • Buzzwords (Cloud, Microservices)
    • Stickers?
  3. Infographic credit, @tpetr: https://twitter.com/tpetr/status/609098710233051136

  4. http://memegenerator.net/instance/58183934

  5. Three Big Ideas
    1. Orchestration is harder than you think
    2. High utilization is key for large deployments
    3. Multi-framework is the future
  6. Out of Scope
    • In-depth coverage of the project’s research background
    • For specific framework/adopter talks, see:
      • Bill Farner, “Past, Present, Future of the Aurora Scheduler” [1]
      • Connor Doyle, “Simplifying with Mesos and Marathon” [2]
      • Vinod Kone, “Jenkins on Mesos” [3]
    [1] https://www.youtube.com/watch?v=Dsc5CPhKs4o
    [2] https://www.youtube.com/watch?v=TPXw_lMTJVk
    [3] https://www.youtube.com/watch?v=OgVaQPYEsVo
  7. Apache Mesos Essentials

  8. Mesos is a resource manager that abstracts the machines in a cluster and lets schedulers share their resources
  9. [Diagram: schedulers (Aurora, Storm, Spark) sit on top of Apache Mesos, which sits on top of the machines (eight nodes)]
  10. Mesos Actively Monitors
    • Cluster state (via ZooKeeper)
    • What machines are online and running tasks
    • Available hardware resources
      • RAM, CPU, and disk available to schedule tasks
  11. Key Problems Mesos Addresses
    • Fault tolerance
    • Resource efficiency and utilization
  12. Mesos Design Challenges
    1. Each framework may have different scheduling needs
    2. Must scale to tens of thousands of nodes running hundreds of jobs with millions of tasks
    3. Must be fault-tolerant and highly available
    http://static.usenix.org/events/nsdi11/tech/full_papers/Hindman_new.pdf
  13. Basic Mesos Architecture
    Distributing/scheduling tasks across a set of worker machines
    [Diagram: ZooKeeper Cluster, Mesos Master, three Workers]
  14. Schedulers/Frameworks
    Example: scheduling an Aurora job on two machines. One box idles.
    [Diagram: Aurora, Mesos Master, ZooKeeper Cluster; workers marked :) :) ¯\_(ツ)_/¯]
  15. Leveraging Containers
    Uses Linux Containers for CPU and memory isolation; a container’s size can change (be elastic) over time. Also: native Docker support.
    [Diagram: Aurora, Mesos Master, ZooKeeper Cluster; workers marked Container, ¯\_(ツ)_/¯, Container]
  16. Demo Time: Distributed Shell

  17. [Screenshot: two “hello-devnation” entries, each with task ID hello-devnation.200cc4db-a921-11e4-8347-56847afe9799]

  18. None
  19. None
  20. None
  21. sleep 3; echo "hello devnation!" (hello-devnation)

  22. [Screenshot: multiple “hello-devnation” entries, each with task ID hello-devnation.200cc4db-a921-11e4-8347-56847afe9799]
  23. None
  24. [Screenshot: multiple “hello-devnation” entries, each with task ID hello-devnation.200cc4db-a921-11e4-8347-56847afe9799]
  25. Failure of a worker
    Example: the failure is detected by the scheduler and the task is rescheduled onto another machine.
    [Diagram: Framework, Mesos Master, ZooKeeper Cluster; workers marked :) :) :(]
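The rescheduling step on slide 25 can be sketched in a few lines of Python. This is a toy model rather than Mesos or Aurora code: the function name `reschedule`, the worker names, and the "least-loaded worker" rule are all assumptions made for illustration.

```python
def reschedule(assignments, healthy_workers):
    """Return a new task -> worker mapping with tasks moved off failed workers.

    assignments: dict of task name -> worker name (hypothetical names).
    healthy_workers: workers that are still online.
    Tasks on a worker that is no longer healthy move to the healthy worker
    currently running the fewest tasks (an assumed, simplistic policy).
    """
    load = {worker: 0 for worker in healthy_workers}
    for worker in assignments.values():
        if worker in load:
            load[worker] += 1

    result = {}
    for task, worker in sorted(assignments.items()):
        if worker not in load:
            # The task's worker failed: pick the least-loaded healthy worker.
            worker = min(sorted(load), key=load.get)
            load[worker] += 1
        result[task] = worker
    return result


before = {"hello-0": "worker-1", "hello-1": "worker-2"}
after = reschedule(before, healthy_workers=["worker-1", "worker-3"])
print(after)
```

In a real cluster the scheduler learns about the lost task from Mesos status updates and waits for a suitable offer before relaunching; the sketch collapses that into a single function call.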
  26. Failure of a master
    A new leader is elected among the master nodes, whose state is replicated across the other nodes. Workers connect to the new Mesos master.
    [Diagram: ZooKeeper Cluster; Mesos Master marked :(, standby Master; three Workers]
  27. Why care about resource utilization?
    • Less hardware required to run the same jobs, driving down costs
    • Easier to manage fewer machines
  28. Holy Grail of Resource Utilization http://people.csail.mit.edu/matei/papers/2011/nsdi_mesos.pdf

  29. Quasar
    • Users specify a performance target for applications instead of typical resource reservations
    • Machine learning is used to predict resource usage and for cluster scheduling
    • Research by Christina Delimitrou and Christos Kozyrakis at Stanford
    http://www.industry-academia.org/download/2014-asplos-quasar-Stanford-paper.pdf
  30. Google Borg
    • Google’s cluster management solution
    • Borg has “probably saved Google the cost of building an extra data center”
    • Google helped fund AMP Lab, and John Wilkes spoke at MesosCon 2014
    • See: Wired Magazine, “Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon”
    www.wired.com/2013/03/google-borg-twitter-mesos/
  31. Mesos, Aurora, and the @TwitterOSS stack

  32. Twitter is the pulse of the planet

  33. None
  34. How does your system handle spikes like this? (Castle in the Sky, 8/03/2013 TPS record)
    https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
  35. “Throw machines at the problem” vs. improve the scalability of your system
  36. Previously at Twitter
    • During the World Cup, Twitter ran on a monolithic Ruby on Rails codebase. At that time, approximately 200 engineers were pushing code
    • Needed a solution to isolate failure and isolate feature development
  37. Now at Twitter
    • Hundreds of separate services with different owners
    • Managed by the Site Reliability Engineering (SRE) team
    • Running the “Twitter Stack”, including Mesos, Aurora, and Finagle
  38. https://twitter.com/raffi/status/488437255346741249

  39. Common pattern among companies; see Groupon talk, “Breaking up the monolith”
    http://vimeo.com/105880150
  40. [Diagram: Schedulers on top of Apache Mesos, which sits on top of the machines (eight nodes)]
    aurora.incubator.apache.org
  41. Apache Aurora: “run 200 of these, forever”
    [Diagram: the Apache Aurora scheduler on top of Apache Mesos, which sits on top of the machines (eight nodes)]
    aurora.incubator.apache.org
  42. About Apache Aurora
    • One scheduler to rule them all: can manage both long-running services and cron jobs
    • Runs the world’s largest production Mesos clusters, with tens of thousands of servers in shared clusters
    • Originally developed and battle-tested in production at Twitter for several years, open sourced in 2013, and now an Apache TLP
  43. Key Aurora Features
    Key features Aurora provides:
    • Deployment and scheduling of jobs
    • Rich DSL for defining services
    • Health checking
    • Can mark production and non-production jobs
      • Has an additional priority system
  44. Key Aurora Features
    • Scheduling diversity constraints
      • Host and rack diversity
    • Job abstraction to bundle tasks
      • Ability to run multiple applications that are replicas of one another, and manage them through a single point
    • Rolling deploys
    • SLA monitoring
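To make the "host diversity" idea on slide 44 concrete, here is a minimal Python sketch of placing replicas under a per-host limit. It is not Aurora's scheduler: the function name `place_with_host_limit`, the host names, and the "fewest replicas first" tie-breaking are assumptions chosen for illustration.

```python
from collections import Counter


def place_with_host_limit(instances, hosts, limit_per_host):
    """Assign instance indexes to hosts, never exceeding limit_per_host.

    Each instance goes to the eligible host with the fewest replicas so far,
    which spreads replicas out (a simple stand-in for a diversity constraint).
    Raises ValueError when the limit cannot be satisfied.
    """
    placed = Counter()
    placement = {}
    for index in range(instances):
        candidates = [h for h in hosts if placed[h] < limit_per_host]
        if not candidates:
            raise ValueError("not enough hosts to satisfy the diversity limit")
        host = min(candidates, key=lambda h: placed[h])
        placed[host] += 1
        placement[index] = host
    return placement


placement = place_with_host_limit(4, ["host-a", "host-b", "host-c"], limit_per_host=2)
print(placement)
```

Rack diversity would follow the same pattern with racks in place of hosts; the point is only that losing one host (or rack) takes out a bounded number of replicas.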
  45. Configuration in Aurora is similar to config in Google’s Borg

  46. Example .aurora File (hello_world.aurora)
    import os

    hello_world_process = Process(
      name = 'hello_world',
      cmdline = 'echo hello world')

    hello_world_task = Task(
      resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
      processes = [hello_world_process])

    hello_world_job = Job(
      cluster = 'cluster1',
      role = os.getenv('USER'),
      task = hello_world_task)

    jobs = [hello_world_job]

    aurora.incubator.apache.org/documentation/latest/tutorial/
  47. Example .aurora File (hello_world_productionized.aurora)
    include('hello_world.aurora')

    production_resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 2 * GB)
    staging_resources = Resources(cpu = 0.1, ram = 32 * MB, disk = 512 * MB)

    # hello_world is used here as a template name, as on the original slide
    hello_world_template = hello_world(
      name = "hello_world-{{cluster}}",
      task = hello_world(resources=production_resources))

    jobs = [
      # production jobs
      hello_world_template(cluster = 'cluster1', instances = 25),
      hello_world_template(cluster = 'cluster2', instances = 15),

      # staging jobs
      hello_world_template(
        cluster = 'local',
        instances = 1,
        task = hello_world(resources=staging_resources)),
    ]
  48. Why .aurora configs?
    • They define common patterns of usage
    • Templates can be owned by individual teams, which control optimization of job execution
      • For example, your JVM team could own an Aurora config with JVM optimizations
    • They provide snapshots of a job; you can even check them into git if you’d like to manage them that way
  49. Lastly, Aurora uses a custom executor
    • Executors are responsible for executing code on individual worker machines and for sending status to Mesos when a task completes
    • Most frameworks use the command-line executor; however, you can create your own
  50. Future of the project, ecosystem, and frameworks

  51. Frameworks are the gateway drug to Mesos

  52. Frameworks
    • Services: Aurora, Marathon, Kubernetes, Singularity
    • Big Data: Spark, Storm, Hadoop
    • Batch: Chronos, Jenkins
    • Storage: Cassandra, HDFS, MySQL
    http://mesos.apache.org/documentation/latest/mesos-frameworks/
  53. Kubernetes on Mesos
    • Shares similarities with Google’s Borg; developed by Google engineers and the OSS community
    • Commercial support: CoreOS, IBM, Kismatic, Mesosphere, Microsoft, Red Hat
    • Is being ported to Apache Mesos
    https://github.com/mesosphere/kubernetes-mesos
  54. Apache Mysos*
    • New framework developed by Twitter to provision MySQL clusters on Apache Mesos
    • Currently being tested at Twitter to offer a self-service mode for provisioning MySQL instances
    • Currently, resources on a machine are treated as ephemeral; looking to integrate with persistent storage primitives in the near future
    * The project is currently considering a name change
  55. Apache Myriad
    • Enables Apache Hadoop YARN and Apache Mesos to co-exist on the same cluster
    • YARN applications can continue to run on top of YARN, unaware of Mesos
    • Launches YARN node managers within Mesos
    • Now part of the Apache Incubator
    http://www.youtube.com/watch?v=d7vZWm_xS9c
  56. Singularity at HubSpot
    • HubSpot built a custom framework called Singularity to run their services (prior to Aurora being open sourced)
    • They run entirely on AWS
    • Reduced hardware resources; QA environment runs at 50% of its previous capacity
    https://mesosphere.com/resources/mesos-case-study-hubspot/
  57. Mesosphere DCOS
    • Popularizing the vision of running many frameworks on a single shared Apache Mesos cluster
    • Community edition runs on AWS; enterprise edition can run on-prem with customizations
    • Provides Mesos, a meta-scheduler (Marathon), services (Mesos frameworks), and a UI / CLI
    https://mesosphere.com/resources/mesos-case-study-hubspot/
  58. BYOF! (Build Your Own Framework)

  59. Resources for writing Mesos frameworks today
    • Tobi Knaup’s “Getting Started” guide for writing a scheduler in Scala [1]
    • “Mesos + Docker Tutorial: How to Build Your Own Framework” [2]
    • “Building Massively-Scalable Distributed Systems using Go and Mesos” [3]
    [1] https://github.com/guenter/mesos-getting-started
    [2] http://codefutures.com/mesos-docker-tutorial-how-to-build-your-own-framework/
    [3] http://java.dzone.com/articles/building-massively-scalable
  60. Today: framework bindings
    • Mesos uses protocol buffers for network communication, serializing structured data
    • Framework bindings manage communication with the Mesos master according to the scheduler API
    • Build your own framework using bindings in your favorite language: C++, Python, Java, Clojure, Haskell, Go... or write your own bindings!
  61. Future: Mesos HTTP API
    • Mesos will provide several different APIs for remote components to communicate with it using HTTP
      • Scheduler API, scheduler <-> master
      • Executor API, executor <-> worker
      • Internal API, master <-> worker
    https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit#
  62. Resource Offers
    Mesos provides a set of resource “offers” (descriptions of available hardware resources) to a scheduler, which decides on behalf of Mesos which offers to use and what to run on them.
    [Diagram: Framework, Mesos Master, ZooKeeper Cluster, three Workers]
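The offer cycle on slide 62 can be illustrated with a small Python model. It deliberately does not use the Mesos bindings: the `Offer` and `Task` classes, the function `match_offers`, and the first-fit rule are hypothetical stand-ins for what a real scheduler would do in its offer callback.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A simplified resource offer: what one worker currently has free."""
    worker: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A pending task and the resources it needs."""
    name: str
    cpus: float
    mem_mb: int


def match_offers(offers, pending):
    """First-fit matching of pending tasks onto offers.

    Returns (launches, declined): launches is a list of (task, worker) pairs,
    declined lists the workers whose offers were not used at all.
    """
    launches, declined = [], []
    queue = list(pending)
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        used = False
        for task in list(queue):
            if task.cpus <= cpus and task.mem_mb <= mem:
                launches.append((task.name, offer.worker))
                cpus -= task.cpus
                mem -= task.mem_mb
                queue.remove(task)
                used = True
        if not used:
            declined.append(offer.worker)
    return launches, declined


offers = [
    Offer("worker-1", cpus=4.0, mem_mb=8192),
    Offer("worker-2", cpus=1.0, mem_mb=1024),
    Offer("worker-3", cpus=4.0, mem_mb=8192),
]
pending = [Task("web-0", cpus=3.0, mem_mb=4096), Task("web-1", cpus=3.0, mem_mb=4096)]
launches, declined = match_offers(offers, pending)
print(launches, declined)
```

With these made-up numbers the job lands on two machines and the third offer is declined, which mirrors the "one box idles" picture from slide 14. The division of labor is the important part: Mesos describes what is free, and the scheduler decides what to do with it.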
  63. Scheduler/Mesos Interactions http://mesosphere.github.io/presentations/hack-week-2014/

  64. Future: optimistic offers
    • Allow Mesos to make offers to frameworks in parallel
    • Brings the project closer to the architecture of Google’s Omega (next-generation Borg)
    • Will enable improved multi-framework integration on Mesos
    http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41684.pdf
  65. Preemption via Aurora’s Thermos Today
    • Aurora is a “greedy” scheduler; it will collect all the offers it can to manage resources
    • The executor is necessary to do preemption of tasks, including cases when:
      • Production jobs need to be run instead of non-production jobs
      • Duplicate tasks are incorrectly running
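As a rough illustration of the production versus non-production case on slide 65, the sketch below picks which running tasks to preempt so that a production task fits. It is not Aurora's preemption logic: the function `pick_victims`, the dictionary fields, and the "lowest priority first" order are assumptions made for this example.

```python
def pick_victims(running, needed_cpus, free_cpus):
    """Choose non-production tasks to preempt so a production task fits.

    running: list of dicts with keys name, production, priority, cpus
             (a hypothetical shape, not an Aurora data structure).
    Returns a list of task names to kill (empty if nothing needs to be
    killed), or None if even killing every non-production task would not
    free enough CPU.
    """
    victims = []
    candidates = sorted(
        (task for task in running if not task["production"]),
        key=lambda task: task["priority"],
    )
    for task in candidates:
        if free_cpus >= needed_cpus:
            break
        victims.append(task["name"])
        free_cpus += task["cpus"]
    if free_cpus < needed_cpus:
        return None
    return victims


running = [
    {"name": "batch-a", "production": False, "priority": 1, "cpus": 2.0},
    {"name": "web", "production": True, "priority": 0, "cpus": 4.0},
    {"name": "batch-b", "production": False, "priority": 5, "cpus": 2.0},
]
print(pick_victims(running, needed_cpus=3.0, free_cpus=1.0))
```

Production tasks are never candidates here, and the function stops as soon as enough CPU is free, so it preempts no more work than the incoming task requires.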
  66. Preemption is a crucial feature for increased resource utilization in larger clusters
    <- Remember this? Our goal
  67. Future preemption in Mesos
    • Does not currently exist in Mesos; when a scheduler is given an offer, it’s up to the scheduler how to handle those resources
    • Will be introducing the idea of revocable offers, like normal offers but:
      • Once accepted by a framework, the revocable offer can be revoked at any time
      • Revocation means that the underlying task / executor may be killed at any time
    https://cwiki.apache.org/confluence/display/MESOS/DRAFT+Design+Doc+-+Revocable+Offers
  68. Container support in Mesos today
    • Linux Containers are used by default and provide isolation of CPU, memory, and disk
    • Docker containers may be run (added in Mesos 0.20.0, August 2014)
  69. Future Container Support
    • Rocket / appc specification
    • Open Container Project
  70. Storage in Mesos today
    • Resources in Mesos are designed to be ephemeral
    • Most users provision Mesos worker nodes with HDFS or another distributed file system
    Mesos currently has support for:
    • Disk isolation, enabling you to enforce disk quota limits on sandboxes (added in Mesos 0.22.0)
    • Dedicated hosts, via schedulers like Aurora that can schedule to specific machines. Not a Mesos feature.
  71. Future Mesos Storage Primitives
    Actively being worked on. Several key features:
    • Persistent volumes: creating a volume outside of your task’s sandbox that persists even when the task completes
    • Dynamic reservations: adding the ability to reserve the resources a task uses, to ensure they’re offered back when the task exits
  72. Recap: the future Mesos multi-framework world
    • Frameworks will be able to communicate via an HTTP API, making them much easier to write
    • Offers will be sent to multiple frameworks optimistically, allowing them to make decisions faster and mediate differences between needs
    • Revocable offers will be introduced to allow resources to be taken from other frameworks, killing any running tasks
  73. Help us build the future OS for distributed systems and get involved
  74. MesosCon 2015
    • Annual community-driven conference
    • MesosCon 2015 will be co-located with LinuxCon in Seattle, August 20-21, 2015 (hackathon on Aug 19)
    • CFP for full talks closed; CFP for lightning talks open until July 15th
    • Future MesosCon events: Dublin, Beijing, and more
    events.linuxfoundation.org/events/mesoscon
  75. Three Mesos books are being released this year
    • Mesos in Action [1]
    • Building Applications on Mesos [2]
    • Apache Mesos Essentials [3]
    [1] http://manning.com/ignazio/
    [2] http://www.amazon.com/Building-Applications-Mesos-David-Greenberg/dp/149192652X
    [3] https://www.packtpub.com/big-data-and-business-intelligence/apache-mesos-essentials
  76. Learning More
    Apache Mesos website
    • http://mesos.apache.org
    Mailing lists
    • dev@mesos.apache.org, user@mesos.apache.org
    IRC
    • #mesos on irc.freenode.net
  77. Thank You Dave Lester @davelester dlester@twitter.com