Apache Mesos at Twitter

Chris Aniszczyk

June 14, 2014

Transcript

  1. Agenda

    • Introduction
    • How does Mesos work?
    • Mesos Ecosystem
    • Conclusion
    • Q&A
  2. Twitter Scale…

    • 255M+ active users
    • 500M+ Tweets per day
    • 77% of users are outside the US
    • 100TB+ compressed data per day
    • Timeline shown: 2006 to 2014
  3. Easy solution!? Let's add machines… but…

    • Can get expensive… even with commodity hardware…
    • Hard to fully utilize machines (e.g., 72 GB RAM and 24 CPUs)
    • Hard to deal with failures…
    • What else could we do…?
  4. Evaluate industry…

    • Google was ahead of the game in managing warehouse-scale computing: http://research.google.com/pubs/pub35290.html
    • Google hit a lot of these problems before many other companies and came up with interesting solutions: http://youtube.com/watch?v=0ZFMlO98Jkc
  5. Evaluate research at universities…

    • Universities (wooooo PhDs) were doing research in this area, so we decided to partner with them and hire researchers: https://amplab.cs.berkeley.edu/tag/mesos/
    • “Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon”: http://www.wired.com/2013/03/google-borg-twitter-mesos
  6. Enter Apache Mesos

    • We took university research and spun it into an open source project at the Apache Software Foundation: https://blog.twitter.com/2012/incubating-apache-mesos
    • https://twitter.com/ApacheMesos/statuses/360039441500340224
  7. What exactly is Mesos?

    • Mesos is an open source project with a healthy independent community: http://mesos.apache.org
    • Mesos is a distributed system to build and run distributed systems
    • Mesos provides fine-grained resource sharing and isolation
    • Mesos enables high-availability and fault-tolerance for your cluster
  8. Resource sharing increases throughput and utilization

    [Charts: three panels with axes from 0% to 33% and one panel with an axis from 0% to 100%; plotted values not recoverable.]
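
The plotted values on this slide did not survive, so the following is only a rough sketch of the claim that sharing raises utilization. It compares three statically partitioned frameworks with a single shared pool. The framework names, the demand numbers and the 24-CPU cluster size are all hypothetical and are not taken from the talk.

```python
# Hypothetical CPU demand of three frameworks over four time steps.
# These numbers are illustrative only; they are not from the slides.
DEMAND = {
    "hadoop":   [20, 2, 2, 6],
    "spark":    [2, 20, 2, 6],
    "services": [2, 2, 20, 6],
}
CLUSTER_CPUS = 24  # assumed cluster size


def static_utilization(demand, total_cpus):
    """Each framework owns a fixed, equal slice and cannot borrow idle CPUs."""
    share = total_cpus / len(demand)
    steps = len(next(iter(demand.values())))
    used = sum(
        min(series[t], share)
        for series in demand.values()
        for t in range(steps)
    )
    return used / (total_cpus * steps)


def shared_utilization(demand, total_cpus):
    """All frameworks draw from one pool, capped only by the cluster size."""
    steps = len(next(iter(demand.values())))
    used = sum(
        min(sum(series[t] for series in demand.values()), total_cpus)
        for t in range(steps)
    )
    return used / (total_cpus * steps)


static = static_utilization(DEMAND, CLUSTER_CPUS)
shared = shared_utilization(DEMAND, CLUSTER_CPUS)
print(f"static partitions: {static:.1%}, shared pool: {shared:.1%}")
```

With bursty demand like this, each static slice is either saturated or mostly idle, while the shared pool absorbs whichever framework is busy at that moment.
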
  9. Running at the container level improves performance…

    [Chart: time to provision (seconds), axis ticks at 1, 100 and 10000, comparing bare metal, VM and container.]
    Inspired by Tomas Barton’s Mesos talk at InstallFest in Prague
  10. Agenda

    • Introduction
    • How does Mesos work?
    • Mesos Ecosystem
    • Conclusion
    • Q&A
  11. Mesos consists of master/slave nodes

    [Diagram labels: Mesos Slave (Hadoop task-tracker, Mesos Executor, Task #1, Task #2, ./ruby XYZ); Mesos Slave (Docker Executor, Docker Executor, java -jar XYZ.jar, ./xyz); three Mesos Masters; Hadoop scheduler; Marathon scheduler; ZooKeeper quorum.]
    *Thank you to Niklas Nielsen and Adam Borlen for the following diagrams explaining Mesos: https://www.youtube.com/watch?v=EI0ROkf0vks
  12. Applications are known as frameworks in Mesos; they interact with the master

    [Diagram: same as slide 11.]
  13. Multiple masters can be in place for HA; they coordinate leader election with ZK

    [Diagram: same as slide 11.]
  14. Master schedules tasks to run on slaves’ available resources; slaves use executors to coordinate execution of tasks

    [Diagram: same as slide 11.]
    Tasks are the unit of execution.
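
As a rough illustration of tasks being placed onto slaves' available resources, here is a toy first-fit placement. It is a sketch under assumptions, not how Mesos itself allocates: real Mesos hands resource offers to framework schedulers, which the slides do not detail. The `Slave` class, the `schedule` function, the task names and the resource sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Toy slave tracking its remaining CPUs and memory (hypothetical model)."""
    name: str
    cpus: int
    mem_gb: int
    tasks: list = field(default_factory=list)

    def fits(self, cpus, mem_gb):
        return cpus <= self.cpus and mem_gb <= self.mem_gb

    def launch(self, task_name, cpus, mem_gb):
        # Launching a task consumes part of the slave's available resources.
        self.cpus -= cpus
        self.mem_gb -= mem_gb
        self.tasks.append(task_name)


def schedule(tasks, slaves):
    """First-fit: put each (name, cpus, mem_gb) task on the first slave with room."""
    placement = {}
    for name, cpus, mem_gb in tasks:
        for slave in slaves:
            if slave.fits(cpus, mem_gb):
                slave.launch(name, cpus, mem_gb)
                placement[name] = slave.name
                break
        else:
            placement[name] = None  # no slave has enough free resources
    return placement


slaves = [Slave("slave-1", cpus=24, mem_gb=72), Slave("slave-2", cpus=24, mem_gb=72)]
tasks = [
    ("hadoop-map", 16, 32),
    ("rails-web", 4, 8),
    ("spark-job", 12, 48),
    ("too-big", 32, 16),
]
placement = schedule(tasks, slaves)
print(placement)
```

The point of the sketch is only that small tasks from different frameworks can share one machine, which is what makes the fine-grained sharing on the earlier slides possible.
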
  15. Mesos provides fine-grained resource isolation (via cgroups)

    [Diagram labels: Compute Node, Mesos Slave Process, Hadoop task-tracker, Mesos Executor, Task #1, Task #2, ruby XYZ, Container (Cgroups), Executor.]
    Slaves isolate executors and tasks via containers (dotted line)
  16. Mesos provides fine-grained resource isolation (via cgroups)

    [Diagram labels: Compute Node, Mesos Slave Process, Hadoop task-tracker, Task #1, Task #2, Task #3, Container (Cgroups).]
    Containers can GROW AND SHRINK as tasks run and complete
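
One way to picture a container growing and shrinking is to treat its limits as the sum of the resources of the tasks currently inside it. The sketch below assumes exactly that; the `Container` class and the task sizes are hypothetical, and no real cgroups are touched.

```python
class Container:
    """Toy container whose limits follow the tasks it currently holds."""

    def __init__(self):
        self.tasks = {}  # task name -> (cpus, mem_mb)

    def start_task(self, name, cpus, mem_mb):
        self.tasks[name] = (cpus, mem_mb)

    def finish_task(self, name):
        self.tasks.pop(name, None)

    def limits(self):
        # Assumption: the container limit is the sum over running tasks.
        cpus = sum(c for c, _ in self.tasks.values())
        mem_mb = sum(m for _, m in self.tasks.values())
        return cpus, mem_mb


container = Container()
history = []

container.start_task("Task #1", cpus=1, mem_mb=512)
history.append(container.limits())   # container grows to fit Task #1

container.start_task("Task #2", cpus=2, mem_mb=1024)
history.append(container.limits())   # grows again for Task #2

container.finish_task("Task #1")
history.append(container.limits())   # shrinks once Task #1 completes

print(history)
```
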
  17. Mesos provides componentized resource isolation

    [Diagram labels: Mesos Slave Process, Mesos Containerizer, CGroups CPU isolator, CGroups Memory isolator, Launcher, Containerizer API, Container foo, Executor bar, Task baz.]
    When a slave starts, you can specify a “containerizer” to launch the container and a set of isolators to enforce resource constraints (CPU/memory).
    Mesos can track and allocate more resource types, allowing you to manage resources like IP addresses, ports, disk space and even GPUs!
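
The componentized design can be sketched as a containerizer that delegates to a list of isolators, each responsible for one resource type. The classes below are hypothetical and do not mirror Mesos's actual C++ interfaces. `cpu.shares` and `memory.limit_in_bytes` are cgroup v1 control files; the conversion of 1024 shares per CPU is an assumption made for illustration. The sketch only computes the settings and never writes to the cgroup filesystem.

```python
class CpuIsolator:
    """Hypothetical isolator that turns a CPU count into a cpu.shares value."""
    SHARES_PER_CPU = 1024  # assumed conversion factor

    def settings(self, resources):
        return {"cpu.shares": int(resources["cpus"] * self.SHARES_PER_CPU)}


class MemoryIsolator:
    """Hypothetical isolator that turns megabytes into a byte limit."""

    def settings(self, resources):
        return {"memory.limit_in_bytes": resources["mem_mb"] * 1024 * 1024}


class Containerizer:
    """Toy containerizer: asks each configured isolator for its settings."""

    def __init__(self, isolators):
        self.isolators = isolators

    def plan(self, container_id, resources):
        merged = {}
        for isolator in self.isolators:
            merged.update(isolator.settings(resources))
        return {"container": container_id, "cgroup_settings": merged}


# Choosing the isolators when the containerizer is built mirrors the idea of
# choosing them when the slave starts.
containerizer = Containerizer([CpuIsolator(), MemoryIsolator()])
plan = containerizer.plan("foo", {"cpus": 2, "mem_mb": 512})
print(plan)
```

Supporting another resource type in this sketch would mean adding one more isolator class, which is the appeal of the pluggable layout.
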
  18. Mesos provides pluggable resource isolation (e.g., Docker)

    [Diagram labels: Mesos Slave Process, External Containerizer, External Containerizer API, External Containerizer Program, Containerizer API, Container foo, MySQL, Ubuntu 13.10, Container bar, Ruby, Centos 6.4.]
    github.com/mesosphere/deimos
  19. Mesos has no single point of failure

    [Diagram labels: Framework, Masters.]
    The master keeps monitoring tasks and waits for a node to reconnect; the master will update the framework with any tasks that were completed while it was gone.
    Tasks keep running!
  20. Master node can fail over (ZK quorum will elect a new leader)

    [Diagram labels: Framework, Masters.]
    Tasks keep running!
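
Master fail-over can be pictured with a small in-memory model of leader election. It is loosely based on the common ZooKeeper recipe in which each candidate registers a sequential entry and the lowest sequence number leads. The slides do not say which recipe Mesos uses, so treat this as an assumption. `ToyQuorum` is a hypothetical stand-in and not a ZooKeeper client.

```python
import itertools


class ToyQuorum:
    """In-memory stand-in for a coordination service (not a ZooKeeper client)."""

    def __init__(self):
        self._sequence = itertools.count()
        self._candidates = {}  # sequence number -> master name

    def join(self, master):
        """Register a candidate and return its sequence number."""
        seq = next(self._sequence)
        self._candidates[seq] = master
        return seq

    def leave(self, seq):
        """Drop a candidate, e.g. because its session ended after a crash."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number is the leader."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


quorum = ToyQuorum()
seats = {name: quorum.join(name) for name in ("master-1", "master-2", "master-3")}

first_leader = quorum.leader()
quorum.leave(seats[first_leader])   # simulate the leading master failing
second_leader = quorum.leader()     # a standby master takes over

print(first_leader, "->", second_leader)
```
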
  21. Slave processes can fail over

    [Diagram labels: Compute Node, Mesos Slave Process, Mesos Executor, Mesos Executor.]
    The slave loads checkpointed state to learn which pids to reconnect to for each task, then re-registers with the master.
    Tasks keep running!
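
The recovery idea on this slide, reloading checkpointed state to find which running executors to reconnect to, can be sketched with a JSON file standing in for the checkpoint. The `ToySlave` class, the file name and the process ids are hypothetical; the real on-disk checkpoint format is not described in the slides.

```python
import json
import os
import tempfile


class ToySlave:
    """Toy slave that checkpoints its task-to-executor mapping to disk."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.executors = {}  # task id -> executor process id

    def launch(self, task_id, executor_pid):
        self.executors[task_id] = executor_pid
        self._checkpoint()

    def _checkpoint(self):
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write never leaves a half-written checkpoint behind.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(self.executors, handle)
        os.replace(tmp_path, self.checkpoint_path)

    @classmethod
    def recover(cls, checkpoint_path):
        """Start a new slave process and reload whatever was checkpointed."""
        slave = cls(checkpoint_path)
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path) as handle:
                slave.executors = json.load(handle)
        return slave


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "slave-state.json")

    old_slave = ToySlave(path)
    old_slave.launch("task-1", 4001)
    old_slave.launch("task-2", 4002)
    del old_slave  # simulate the slave process dying; executors keep running

    new_slave = ToySlave.recover(path)
    recovered_executors = dict(new_slave.executors)

print(recovered_executors)
```

In this sketch the restarted slave knows which executors to reconnect to without relaunching any task, which is the sense in which "tasks keep running".
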
  22. Agenda

    • Introduction
    • How does Mesos work?
    • Mesos Ecosystem
    • Conclusion
    • Q&A
  23. [Diagram labels]

    • Storage: MySQL, Tweet store, Flock, User Store
    • Cache: Memcached, Redis
    • Logic: Tweet Service, User Service, Timeline Service, SocialGraph Service, DM Service
    • Presentation: API, Web, Search, Feature X, Feature Y
    • Presentation: TFE (netty), Reverse Proxy
    • HTTP, Thrift, Thrift
    • Aurora, Mesos, Monorail
  24. [Diagram labels]

    • Marathon, Mesos, Chronos
    • Batch/Streaming: Hadoop, Spark, Kafka
    • Query/Analysis: Cascading, Presto, Hive, Shark, Pig
    • Services: Rails, Redis, Cassandra, KairosDB, RDS
    • Hadoop A, Hadoop B
  25. Agenda

    • Introduction
    • How does Mesos work?
    • Mesos Ecosystem
    • Conclusion
    • Q&A
  26. Conclusion

    • Mesos is a distributed system to build and run distributed systems (think datacenter OS)
    • Mesos enables resource sharing, high-availability and fault-tolerance for your data centers
    • Mesos is an open source project with a healthy independent community: http://mesos.apache.org
    • So please check it out, use it, or contribute back if you can to make it better!
  27. Thank you for listening!

    Chris Aniszczyk (@cra) [email protected]
    http://opensource.twitter.com
    http://mesos.apache.org
    email: {user,dev}@mesos.apache.org
    Also thanks to Niklas Nielsen and Adam Borlen for their slides explaining Mesos from ApacheCon 2014: https://www.youtube.com/watch?v=EI0ROkf0vks