
Apache Mesos

dlester
October 30, 2013

Apache Mesos

Presented at Big Data DC Meetup, Oct 8th 2013.

Some slides adapted from a previous talk by Benjamin Hindman.


Transcript

  1. Apache Mesos. DAVE LESTER, OPEN SOURCE ADVOCATE AT TWITTER. Big Data DC Meetup, Oct 8 2013
  2. HELLO @davelester OPEN SOURCE ADVOCATE AT TWITTER • Open Programs at Twitter • Apache Mesos PMC • Apache Aurora PMC
  3. BRIEF OUTLINE Community Overview Get Started

  4. APACHE MESOS

  5. APACHE MESOS IS... A next-generation resource manager, “the kernel of the data center”, that provides fault tolerance and improves resource utilization in your distributed systems. A compact piece of software that makes it easier to develop and run software in your datacenter.
  6. Data Center Challenges

  7. RAPID CHANGES: New Applications [diagram: Mesos spanning a cluster of nodes running Hadoop, Spark, MPI, …]
  8. RAPID CHANGES: New Applications [diagram: same cluster, Storm added]
  9. RAPID CHANGES: New Applications [diagram: same cluster, Chronos added]
  10. RAPID CHANGES: New Applications [diagram: Mesos nodes running Hadoop, Spark, MPI, Storm, Chronos]
  11. RAPID CHANGES: New Hardware [diagram: Mesos nodes running Hadoop, Spark, MPI, Storm, Chronos]
  12. RAPID CHANGES: New Hardware [diagram: same frameworks, nodes replaced by AWS instances]
  13. RAPID CHANGES: New Hardware [diagram: same frameworks on AWS instances]
  14. HIGH AVAILABILITY • How do you manage the state of jobs? • How is failure detected and managed? • How do you reschedule failed jobs?
  15. EFFICIENCY • How do you utilize all of the cores and memory available? • How can I reduce latency?
  16. None
  17. SCALABILITY • More traffic + users (sometimes infrequent)? • More data to analyze
  18. How Does Mesos Help?

  19. MESOS

  20. Apache ZooKeeper MESOS

  21. Apache ZooKeeper MESOS

  22. Apache ZooKeeper MESOS

  23. Apache ZooKeeper MESOS

  24. Apache ZooKeeper MESOS

  25. Now, launch frameworks!

  26. MESOS FRAMEWORKS • Hadoop (github.com/mesos/hadoop) • Spark (github.com/mesos/spark) • DPark (github.com/douban/dpark) • Storm (github.com/nathanmarz/storm) • Chronos (github.com/airbnb/chronos) • Aurora (github.com/twitter/aurora) • Marathon (github.com/mesosphere/marathon) • [ADD YOUR FRAMEWORK TO OUR LIST!]
  27. FRAMEWORK COMMONALITY • run processes simultaneously (distributed) • handle process failures (fault-tolerance) • optimize execution (elasticity, scheduling)
  28. MESOS Apache ZooKeeper Apache Hadoop Chronos

  29. MESOS Apache ZooKeeper Apache Hadoop Chronos

  30. MESOS Apache ZooKeeper Apache Hadoop Chronos

  31. MESOS Apache ZooKeeper Apache Hadoop Chronos

  32. BUT WHY?

  33. RESEARCH ORIGINS Originally a UC Berkeley AMPLab research project including Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Check out the Berkeley Stack. mesos.apache.org/documentation
  34. STATIC PARTITIONING Apache Hadoop Chronos

  35. STATIC PARTITIONING (1) hard to utilize machines (e.g., 72 GB RAM and 24 CPUs) Apache Hadoop Chronos
  36. STATIC PARTITIONING (2) hard to scale elastically Apache Hadoop Chronos
  37. STATIC PARTITIONING (3) hard to deal with failures Apache Hadoop Chronos
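
The first static-partitioning problem can be illustrated with a little arithmetic. The sketch below is not from the talk: it takes the 24-CPU, 72 GB machine from the slide and two hypothetical task shapes (`memory_heavy` and `cpu_heavy` are made-up values) and counts how much of the machine stays idle when it is dedicated to one framework, compared with sharing the leftover with a second one.

```python
MACHINE = {"cpus": 24, "mem_gb": 72}  # machine size quoted on the slide


def pack(capacity, task):
    """Return how many copies of `task` fit in `capacity`, plus the leftover."""
    count = min(capacity[r] // task[r] for r in task)
    leftover = {r: capacity[r] - count * task.get(r, 0) for r in capacity}
    return count, leftover


# Hypothetical task shapes, chosen only for illustration.
memory_heavy = {"cpus": 2, "mem_gb": 16}
cpu_heavy = {"cpus": 4, "mem_gb": 2}

# Static partitioning: the whole machine belongs to the memory-heavy framework.
n_mem, left_static = pack(MACHINE, memory_heavy)

# Sharing: a second framework is allowed to use whatever is left over.
n_cpu, left_shared = pack(left_static, cpu_heavy)

static_cpu_use = 1 - left_static["cpus"] / MACHINE["cpus"]
shared_cpu_use = 1 - left_shared["cpus"] / MACHINE["cpus"]

print(f"static: {n_mem} tasks, CPU utilization {static_cpu_use:.0%}")
print(f"shared: {n_mem + n_cpu} tasks, CPU utilization {shared_cpu_use:.0%}")
```

With these made-up shapes, memory runs out long before the CPUs do, so a statically partitioned machine leaves most of its cores stranded. Real workloads will not pack this neatly.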
  38. “KERNEL” FOR THE DATACENTER Apache Hadoop Chronos

  39. MESOS PRIMITIVES • scheduler: distributed system “master” • (executor: lower-level control of task execution, optional) • requests/offers: resource allocations • tasks: “threads” of the distributed system • state: working set of the distributed system • …
  40. SCHEDULER Apache Hadoop Chronos

  41. SCHEDULER (1) brokers for resources (with master) (2) launches tasks (3) handles task termination
  42. BROKERING FOR RESOURCES (1) make resource requests: 2 CPUs, 1 GB RAM, slave * (2) respond to resource offers: 4 CPUs, 4 GB RAM, slave foo.bar.com
  43. OFFERS Non-blocking resource allocation. Offers exist to answer the question: “what should Mesos do if it can’t satisfy a request?” (1) wait until it can (2) offer the best allocation it can immediately
  44. “TWO-LEVEL SCHEDULING” Mesos: controls resource allocations to schedulers. Schedulers: make decisions about what to run given allocated resources.
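
The split between the two levels can be sketched as a toy simulation. This is not the Mesos API: `Offer`, `Task`, `ToyScheduler` and `toy_master` are hypothetical names invented for illustration, and the host `baz.bar.com` is made up (the 4 CPU / 4 GB offer from `foo.bar.com` mirrors the slide). The master hands out offers without waiting for a perfect fit, and the scheduler alone decides which pending tasks to launch on each offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Resources the toy master offers from one slave."""
    slave: str
    cpus: float
    mem_gb: float


@dataclass(frozen=True)
class Task:
    """A consumer of resources, described here by a command line."""
    name: str
    command: str
    cpus: float
    mem_gb: float


class ToyScheduler:
    """Second level: decides what to run on the resources it is offered."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offer(self, offer):
        """Return the tasks to launch on this offer; an empty list declines it."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_gb
        for task in list(self.pending):  # iterate over a copy while removing
            if task.cpus <= cpus and task.mem_gb <= mem:
                cpus -= task.cpus
                mem -= task.mem_gb
                launched.append(task)
                self.pending.remove(task)
        return launched


def toy_master(offers, scheduler):
    """First level: offer what is available now instead of blocking."""
    return {o.slave: [t.name for t in scheduler.resource_offer(o)] for o in offers}


offers = [
    Offer("foo.bar.com", cpus=4, mem_gb=4),
    Offer("baz.bar.com", cpus=1, mem_gb=1),  # hypothetical second slave
]
scheduler = ToyScheduler([
    Task("web", "python -m http.server", cpus=2, mem_gb=1),
    Task("batch", "sleep 60", cpus=2, mem_gb=1),
    Task("big", "sleep 60", cpus=8, mem_gb=16),
])

placements = toy_master(offers, scheduler)
print(placements)
print([t.name for t in scheduler.pending])
```

In this run the two small tasks land on `foo.bar.com`, the small offer is declined, and the oversized task stays pending until a larger offer arrives. The master never needs to know why the scheduler made those choices.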
  45. TASKS Either a concrete command line or an opaque description (which requires a framework executor to execute). A consumer of resources.
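
One way to picture the two kinds of task description is a small record that carries either a command line or an opaque payload, never both. The `ToyTask` class below is a hypothetical illustration, not a Mesos data structure; the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical task record: resources plus exactly one description."""
    name: str
    cpus: float
    mem_gb: float
    command: Optional[str] = None  # concrete command line
    data: Optional[bytes] = None   # opaque description for a framework executor

    def __post_init__(self):
        # Exactly one of the two descriptions must be provided.
        if (self.command is None) == (self.data is None):
            raise ValueError("set exactly one of command or data")

    @property
    def needs_executor(self):
        """Opaque descriptions need a framework executor to interpret them."""
        return self.command is None


shell_task = ToyTask("hello", cpus=1, mem_gb=0.5, command="echo hello")
opaque_task = ToyTask("job-17", cpus=2, mem_gb=1, data=b"\x00serialized-work")

print(shell_task.needs_executor, opaque_task.needs_executor)
```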
  46. TASK OPERATIONS • launching/killing • health monitoring/reporting (failure detection) • resource usage monitoring (statistics)
  47. GETTING STARTED

  48. DOWNLOAD AND INSTALL http://mesos.apache.org/downloads/
      $ tar zxf mesos-0.13.0.tar.gz
      $ cd mesos-0.13.0
      $ ./configure --prefix=/path/to/install/directory
      $ make install
  49. BASIC RESOURCES • Mesos Getting Started Page: http://mesos.apache.org/gettingstarted/ • Vagrant Script: https://github.com/everpeace/vagrant-mesos

  50. ACADEMIC RESEARCH • Multi-Agent Cluster Scheduling for Scalability and Flexibility, Andrew Konwinski • Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, Hindman, Konwinski, Zaharia, et al. • Google Omega paper
  51. RELEASES maintained: 0.13.0 • stable: 0.14.0 • development: 0.15.0 (* tested and in production at Twitter)
  52. PACKAGES Mesosphere has created scripts & distributed packages for: • OS X • Debian • Ubuntu • Red Hat
  53. PACKAGES If you care about packages in the Mesos core, join the Mesos dev list (dev@mesos.apache.org). Folks are actively working on various distributions.
  54. STARTING A MASTER
      $ mesos-master --help
      $ mesos-master --ip=a.b.c.d
      $ MESOS_ip=a.b.c.d mesos-master
  55. STARTING A (FAULT-TOLERANT) MASTER
      $ mesos-master --zk=zk://ip1:port1,ip2:port2,…/mesos
  56. STARTING A SLAVE
      $ mesos-slave --help
      $ mesos-slave --master=ip:port
      $ mesos-slave --master=zk://ip1:port1,ip2:port2,…/mesos
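
The `zk://` address in the commands above follows a simple pattern: a comma-separated list of `host:port` pairs followed by a path. A minimal sketch of assembling it is shown below. The helper names (`zk_url`, `master_command`, `slave_command`) and the IP addresses are hypothetical. Only the `--zk` and `--master` flags and the `/mesos` path come from the slides.

```python
def zk_url(hosts, path="/mesos"):
    """Build a zk://host:port,host:port/path string from (host, port) pairs."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    ensemble = ",".join(f"{host}:{port}" for host, port in hosts)
    return f"zk://{ensemble}{path}"


def master_command(hosts):
    """Argument list for a fault-tolerant master, as on the slide."""
    return ["mesos-master", f"--zk={zk_url(hosts)}"]


def slave_command(hosts):
    """Argument list for a slave that finds the master through ZooKeeper."""
    return ["mesos-slave", f"--master={zk_url(hosts)}"]


# Hypothetical ZooKeeper ensemble; replace with real addresses.
hosts = [("10.0.0.1", 2181), ("10.0.0.2", 2181)]

print(" ".join(master_command(hosts)))
print(" ".join(slave_command(hosts)))
```

Masters and slaves are given the same ZooKeeper address, so generating both commands from one host list keeps them from drifting apart.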
  57. AURORA Builds on top of Apache Mesos and provides common features that allow any site to run large-scale production applications. Runs large parts of Twitter.com, including our ad services. Now part of the Apache Incubator. A snapshot of the scheduler code is online: http://github.com/twitter/aurora/
  58. AURORA Typical services consist of dozens or hundreds of replicas of tasks. As a service scheduler, Aurora provides the abstraction of a "job" to bundle and manage these tasks. Features: definition, the concept of an instance and the serverset, deployment and scheduling, health checking, and introspection. Allows cross-cutting concerns such as observability and log collection to be handled.
  59. COMMUNITY

  60. TWITTER STORY Remember Twitter during World Cup 2010? Humble beginnings as a Ruby on Rails app. Rearchitected our infrastructure, featuring Mesos*. * See the blog post by Raffi Krikorian (Twitter VP of Platform Eng).
  61. MESOS AT TWITTER TODAY "Mesos is the cornerstone of our elastic compute infrastructure -- it's how we build all our new services and is critical for Twitter's continued success at scale. It's one of the primary keys to our data center efficiency." - Chris Fry, SVP of Engineering at Twitter
  62. COMPANIES USING MESOS • Airbnb • MediaCrossing • Sharethrough • Vimeo • Categorize • Conviva • CloudPhysics • Xogito
  63. MESOS SUPPORT • Databricks • Mesosphere.io • Grand Logic, Inc

  64. COMMUNITY ACTIVITY • Docker and Mesos integration • New framework bindings

  65. CONTRIBUTE AND JOIN US http://mesos.apache.org/community/ • Start a local meetup group • Join our IRC channel: irc.freenode.net #mesos • November 19th Mesos townhall meeting
  66. HOW TO GET IN TOUCH Drop me a line via email or Twitter. I will be happy to answer your questions. @DAVELESTER DAVE@DAVELESTER.ORG THANK YOU PS: remember to grab a Mesos sticker