
Build Distributed, Fault-Tolerant Infrastructure with Apache Mesos

January 31, 2015


Presented at FOSDEM 2015 in Brussels.





  1. Build Distributed, Fault-Tolerant Infrastructure with Apache Mesos @davelester, OSS

    Advocate at Twitter, Inc. Apache Mesos and Aurora PMC Member FOSDEM 2015
  2. Imagine..

  3. You’re an app developer

  4. You’re a Site Reliability Engineer

  5. You run on the public cloud

  6. You’ve statically partitioned your batch workloads and services

  7. Apache Mesos: a layer of abstraction between machines in a

    cluster and application frameworks (Aurora, Storm, Spark) that share resources. (Diagram: frameworks running atop Mesos, atop a pool of nodes.)
  8. What is Mesos? • cluster manager • resource manager

  9. Overview 1. Apache Mesos Essentials 2. Using and Building Frameworks

  10. Out of Scope • For depth on the project’s research background,

    see Ben Hindman’s original paper from UC Berkeley, or talks • For specific framework/adopter talks, see: • Brenden Matthews, “Hadoop on Mesos” • Bill Farner, “Past, Present, Future of the Aurora Scheduler” • Vinod Kone, “Jenkins on Mesos”
  11. Apache Mesos Essentials

  12. Mesos Actively Monitors • Cluster state (via ZooKeeper) • Available

    hardware resources
  13. Key Problems Addressed • Fault tolerance • Resource efficiency and utilization

  14. Mesos Design Challenges 1. Each framework may have different scheduling

    needs 2. Must scale to tens of thousands of nodes running hundreds of jobs with millions of tasks 3. Must be fault-tolerant and highly available http://static.usenix.org/events/nsdi11/tech/full_papers/Hindman_new.pdf
  15. Basic Mesos Architecture: the Mesos master distributes/schedules tasks

    across a set of worker machines, coordinated by a ZooKeeper cluster. (Diagram: master, three workers, ZooKeeper.)
  16. Schedulers/Frameworks :) :) ¯\_(ツ)_/¯ Mesos Master Example: scheduling an

    Aurora job on two machines. One box idles. Aurora ZooKeeper Cluster
  17. Leveraging Containers Mesos Master Uses Linux containers for CPU and

    memory isolation; container sizes can change (be elastic) over time. Also: native Docker support. Aurora ZooKeeper Cluster Container ¯\_(ツ)_/¯ Container
  18. Demo Time: Distributed Shell

  19.–26. (Demo screenshots; no slide text.)
  27. Failure of a worker :) Mesos Master Example: failure detected

    by the scheduler and the task rescheduled onto another machine. :) ZooKeeper Cluster :( Framework
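The recovery flow above can be sketched as a toy simulation (all names are illustrative; this is not the real Mesos API, which delivers task-lost status updates to the scheduler):

```python
# Toy simulation of the worker-failure flow: the scheduler notices that a
# task's worker is no longer healthy and reschedules it elsewhere.
def reschedule_lost_tasks(assignments, healthy_workers):
    """Map each task whose worker failed onto some healthy worker."""
    healthy = list(healthy_workers)
    new_assignments = {}
    for task, worker in assignments.items():
        if worker in healthy:
            new_assignments[task] = worker      # still running fine
        else:
            new_assignments[task] = healthy[0]  # reschedule onto a healthy machine
    return new_assignments

assignments = {"web-0": "worker-a", "web-1": "worker-b"}
# worker-b fails; worker-a and worker-c remain healthy
after = reschedule_lost_tasks(assignments, ["worker-a", "worker-c"])
print(after)  # {'web-0': 'worker-a', 'web-1': 'worker-a'}
```

In real Mesos, the scheduler reacts to a status update rather than polling worker health, but the decision — keep healthy placements, re-place lost ones — is the same.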
  28. Failure of a master :( Leader election of master nodes

    whose state is replicated across other nodes. Workers connect to new Mesos master. ZooKeeper Cluster Mesos Master Master Worker Worker Worker
  29. Why care about resource utilization? • Less hardware required to

    run the same jobs, driving down costs • Easier to manage fewer machines
  30. Holy Grail of Resource Utilization http://people.csail.mit.edu/matei/papers/2011/nsdi_mesos.pdf

  31. Quasar • Users specify performance target for applications instead of

    typical resource reservations • Machine-learning used to predict resource usage and for cluster scheduling • Research by Christina Delimitrou and Christos Kozyrakis at Stanford http://www.industry-academia.org/download/2014-asplos-quasar-Stanford-paper.pdf
  32. Google Borg • Google’s cluster management solution • Borg has

    “probably saved Google the cost of building an extra data center” • See: Wired Magazine, “Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon” • Google helped fund AMP Lab, and John Wilkes spoke at MesosCon 2014 www.wired.com/2013/03/google-borg-twitter-mesos/
  33. Mesos at Twitter • “Twitter Stack”, including Mesos, Aurora, Finagle

    • Hundreds of separate services with different owners • Managed by Site Reliability Engineer (SRE) team
  34. (Image slide; no text.)
  35. How does your system handle spikes like this? (Castle in

    the Sky, 8/03/2013 TPS record) https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
  36. “throw machines at the problem” vs. improving the scalability of

    your system
  37. • During the World Cup, Twitter ran on a large Ruby

    on Rails codebase, with approximately 200 engineers pushing code at the time • Needed a solution to isolate failures and isolate feature development Previously at Twitter
  38. https://twitter.com/raffi/status/488437255346741249

  39. Common pattern among companies; see Groupon talk, “Breaking up the

    monolith” http://vimeo.com/105880150
  40. Mesos Adopters mesos.apache.org/documentation/latest/powered-by-mesos/

  41. Using and Building Mesos Frameworks

  42. Resource Offers Mesos provides a set of resource “offers” (descriptions

    of available hardware resources) to a scheduler, which makes decisions for Mesos. ZooKeeper Cluster Mesos Master Worker Worker Worker Framework
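The offer cycle described above can be modeled as a small sketch (illustrative names only, not the real scheduler API): the master advertises per-worker resources as offers, and the framework's scheduler accepts what it needs and declines the rest.

```python
# Toy model of the Mesos offer cycle. Offers describe free resources on
# each worker; the scheduler picks one that fits its pending task and
# declines the others, returning them to the master's pool.
offers = [
    {"worker": "worker-1", "cpus": 4.0, "mem_mb": 8192},
    {"worker": "worker-2", "cpus": 0.5, "mem_mb": 512},
    {"worker": "worker-3", "cpus": 2.0, "mem_mb": 4096},
]
task_needs = {"cpus": 1.0, "mem_mb": 1024}  # one pending task

def handle_offers(offers, needs):
    accepted, declined = [], []
    placed = False
    for offer in offers:
        fits = offer["cpus"] >= needs["cpus"] and offer["mem_mb"] >= needs["mem_mb"]
        if fits and not placed:
            accepted.append(offer["worker"])  # launch the task here
            placed = True
        else:
            declined.append(offer["worker"])  # offer goes back to the master
    return accepted, declined

accepted, declined = handle_offers(offers, task_needs)
print(accepted, declined)  # ['worker-1'] ['worker-2', 'worker-3']
```

This division of labor is the key Mesos design point: the master only tracks and offers resources, while placement decisions stay in the framework scheduler.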
  43. Framework Bindings • Build your own framework using the scheduler

    API, in your favorite language • Bindings currently exist for C++, Python, Java, Clojure, Haskell, Go… or write your own bindings!
  44. Resources for Writing Mesos Frameworks • Tobi Knaup’s “Getting Started

    guide” for writing a scheduler in Scala https://github.com/guenter/mesos-getting-started • “Mesos + Docker Tutorial: How to Build Your Own Framework” https://www.voxxed.com/blog/2014/12/mesosdockertutorialhowtobuildyourownframework/ • “Building Massively-Scalable Distributed Systems using Go and Mesos” java.dzone.com/articles/building-massively-scalable
  45. Scheduler/Mesos Interactions http://mesosphere.github.io/presentations/hack-week-2014/

  46. Frameworks • Services: Aurora, Marathon, Kubernetes, Singularity • Big

    Data: Spark, Storm, Hadoop • Batch: Chronos, Jenkins http://mesos.apache.org/documentation/latest/mesos-frameworks/
  47. Kubernetes on Mesos • Similarities between Google’s Borg and Omega

    systems, and Mesos as a resource manager • Google recently open sourced Kubernetes, a new project inspired by ideas from those internal systems • Commercial support: Microsoft, Red Hat, IBM, Docker, Mesosphere • Being ported to Mesos as a scheduler https://github.com/mesosphere/kubernetes-mesos
  48. YARN alongside Mesos • Hadoop 2.0 (aka YARN) can work

    with Mesos • Presented at MesosCon 2014 • Prototype developed by eBay/Paypal to share resources across multiple resource managers, using a control plane • Future work being explored in this space http://www.youtube.com/watch?v=d7vZWm_xS9c
  49. Singularity at HubSpot • HubSpot built a custom framework called

    Singularity to run their services (prior to Aurora being open sourced) • They run entirely on AWS • Reduced hardware resources; QA environment runs at 50% of its previous capacity https://mesosphere.com/resources/mesos-case-study-hubspot/
  50. Iguazú at Coursera • Built a custom Mesos scheduler called

    Iguazú • Uses Docker containers to bundle existing code for long-running jobs • Relies on Mesos to manage how jobs are run https://tech.coursera.org/blog/2014/11/17/long-running-jobs-at-coursera/
  51. Apache Aurora: a framework running on Apache Mesos across a

    pool of machines. (Diagram: Aurora atop Mesos atop nodes.) aurora.incubator.apache.org
  52. Aurora’s Approach • One scheduler to rule them all: can

    manage both long-running services and cron jobs • Can mark jobs as production or non-production; production jobs can preempt non-production jobs • Has an additional priority system
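The production-over-non-production rule can be illustrated with a simplified sketch (hypothetical names; Aurora's actual preemption also weighs priorities and resource vectors):

```python
# Toy sketch of Aurora-style preemption: when a production task needs room
# on a full machine, a non-production task can be evicted to make space.
def preempt_for_production(running, incoming):
    """running: list of (task, is_production); incoming: (task, is_production).
    Assumes the machine is full and holds a fixed number of task slots.
    Returns (new_running, evicted_task_or_None)."""
    if not incoming[1]:
        return running, None  # non-production jobs never preempt anything
    for i, (task, is_prod) in enumerate(running):
        if not is_prod:
            evicted = task
            new_running = running[:i] + running[i + 1:] + [incoming]
            return new_running, evicted
    return running, None  # everything running is production; nothing to evict

running = [("billing", True), ("ad-hoc-analysis", False)]
new_running, evicted = preempt_for_production(running, ("web-frontend", True))
print(evicted)  # ad-hoc-analysis
```

The evicted non-production task is not lost: the scheduler simply reschedules it later, when spare capacity appears elsewhere in the cluster.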
  53. Key Aurora Features • “run 200 of these, forever” •

    Key features Aurora provides: • Deployment and scheduling of jobs • Rich DSL for defining services • Health checking • Battle-tested in production at Twitter for several years
  54. • Scheduling Diversity Constraints • Host and rack diversity •

    Job abstraction to bundle tasks • Ability to run multiple applications that are replicas of one another, and manage through a single point • Rolling Deploys • SLA Monitoring Key Aurora Features
  55. Example .aurora File (hello_world.aurora)

    import os
    hello_world_process = Process(name = 'hello_world', cmdline = 'echo hello world')
    hello_world_task = Task(
        resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
        processes = [hello_world_process])
    hello_world_job = Job(
        cluster = 'cluster1',
        role = os.getenv('USER'),
        task = hello_world_task)
    jobs = [hello_world_job]
    aurora.incubator.apache.org/documentation/latest/tutorial/
  56. Example .aurora File (hello_world_productionized.aurora)

    include('hello_world.aurora')
    production_resources = Resources(cpu = 1.0, ram = 512 * MB, disk = 2 * GB)
    staging_resources = Resources(cpu = 0.1, ram = 32 * MB, disk = 512 * MB)
    hello_world_template = hello_world(
        name = "hello_world-{{cluster}}",
        task = hello_world(resources = production_resources))
    jobs = [
        # production jobs
        hello_world_template(cluster = 'cluster1', instances = 25),
        hello_world_template(cluster = 'cluster2', instances = 15),
        # staging jobs
        hello_world_template(
            cluster = 'local',
            instances = 1,
            task = hello_world(resources = staging_resources)),
    ]
  57. One more framework feature: Executors • Executors are responsible for

    executing code on individual worker machines and sending status to Mesos when a task completes • Most frameworks use the pre-packaged command-line executor, but you can create your own
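The command-line executor's job can be sketched in a few lines (illustrative only; a real Mesos executor reports intermediate states like TASK_RUNNING through the executor driver API):

```python
# Toy sketch of what a command-line executor does on a worker: run the
# task's shell command and report a Mesos-style terminal state.
import subprocess

def run_task(cmdline):
    """Run a shell command and translate its exit code into a terminal
    task state, the way a command executor reports back to the master."""
    result = subprocess.run(cmdline, shell=True)
    return "TASK_FINISHED" if result.returncode == 0 else "TASK_FAILED"

status = run_task("echo hello world")
print(status)  # TASK_FINISHED
```

Custom executors exist for cases where "run one command, exit" is not enough, such as Aurora's Thermos described on the next slide.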
  58. Aurora’s Executor, Thermos • Executes on each worker node of

    the cluster • Necessary for preemption of tasks, including cases when: • Production jobs need to be run instead of non-production jobs • Duplicate tasks are incorrectly running
  59. Future Mesos Features • optimistic offers • storage primitives •

    support for more container technologies, like Rocket
  60. MesosCon 2015 • MesosCon 2015 will be co-located with LinuxCon

    in Seattle August 20-21, 2015 • Looking for speakers, sponsors, and attendees! • CFP and early-bird registration open until February 14th events.linuxfoundation.org/events/mesoscon
  61. Learning More • Apache Mesos website • http://mesos.apache.org • Mesosphere

    tutorials • http://mesosphere.io
  62. Thank You Dave Lester @davelester dlester@twitter.com