Slide 1

Slide 1 text

@andypiper Chris Aniszczyk Head of Open Source @cra Apache Mesos at Twitter #TXLF 2014

Slide 2

Slide 2 text

Hi, I’m @cra & run the @TwitterOSS office!

Slide 3

Slide 3 text

Twitter is Built on Open Source…

Slide 4

Slide 4 text

Agenda
• Introduction
• How does Mesos work?
• Mesos Ecosystem
• Conclusion
• Q&A

Slide 5

Slide 5 text

Twitter Scale…
• 255M+ active users
• 500M+ Tweets per day
• 77% of users are outside the US
• 100TB+ compressed data per day
• Timeline: 2006 to 2014

Slide 6

Slide 6 text

Growth challenges… sad times… remember the fail whale?

Slide 7

Slide 7 text

Ups and Downs… remember World Cup 2010?
http://gigaom.com/2010/06/11/is-the-world-cup-bringing-down-twitter/

Slide 8

Slide 8 text

Easy solution!? Let’s add machines… but…
• Can get expensive… even with commodity hardware…
• Hard to fully utilize machines (e.g., 72 GB RAM and 24 CPUs)
• Hard to deal with failures…
• What else could we do…?

Slide 9

Slide 9 text

Evaluate industry…
• Google was ahead of the game in managing warehouse-scale computing: http://research.google.com/pubs/pub35290.html
• Google hit a lot of these problems before many other companies and came up with interesting solutions: http://youtube.com/watch?v=0ZFMlO98Jkc

Slide 10

Slide 10 text

Evaluate research at universities…
• Universities (wooooo PhDs) were doing research in this area, so we decided to partner with them and hire researchers: https://amplab.cs.berkeley.edu/tag/mesos/
• “Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon”: http://www.wired.com/2013/03/google-borg-twitter-mesos

Slide 11

Slide 11 text

Enter Apache Mesos
• We took university research and spun it into an open source project at the Apache Software Foundation: https://blog.twitter.com/2012/incubating-apache-mesos
• https://twitter.com/ApacheMesos/statuses/360039441500340224

Slide 12

Slide 12 text

What exactly is Mesos?
• Mesos is an open source project with a healthy independent community: http://mesos.apache.org
• Mesos is a distributed system to build and run distributed systems
• Mesos provides fine-grained resource sharing and isolation
• Mesos enables high-availability and fault-tolerance for your cluster

Slide 13

Slide 13 text

This is your typical data center
[Diagram: machines numbered 1 to 9]

Slide 14

Slide 14 text

This is your typical data center with statically partitioned apps
[Diagram: machines numbered 1 to 9, divided among apps]

Slide 15

Slide 15 text

Not sharing wastes resources
[Charts: three utilization panels, each with axis ticks at 0%, 11%, 22%, and 33%]

Slide 16

Slide 16 text

Resource sharing increases throughput and utilization
[Charts: three utilization panels with axis ticks at 0%, 11%, 22%, and 33%, next to one combined panel with axis ticks from 0% to 100%]
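
The utilization argument on this slide and the previous one can be checked with a little arithmetic. The sketch below assumes nine machines split evenly among three apps; the app names and the demand figures are hypothetical and are not taken from the talk.

```python
# Hypothetical demand, in machines, for three apps at three points in time.
demand = {
    "hadoop": [5, 2, 1],
    "web": [2, 5, 2],
    "batch": [1, 2, 6],
}

TOTAL_MACHINES = 9
PARTITION_SIZE = TOTAL_MACHINES // len(demand)  # 3 machines per app


def static_usage(t):
    """Machines in use at time t when each app is capped at its own partition."""
    return sum(min(series[t], PARTITION_SIZE) for series in demand.values())


def shared_usage(t):
    """Machines in use at time t when all apps draw from one shared pool."""
    return min(sum(series[t] for series in demand.values()), TOTAL_MACHINES)


for t in range(3):
    static_pct = 100 * static_usage(t) / TOTAL_MACHINES
    shared_pct = 100 * shared_usage(t) / TOTAL_MACHINES
    print(f"t={t}: static {static_pct:.0f}%  shared {shared_pct:.0f}%")
```

With these made-up numbers the static layout leaves machines idle whenever one app wants more than its partition while another wants less; the shared pool lets the busy app borrow the idle capacity.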

Slide 17

Slide 17 text

Running at the container level improves performance…
[Chart: time to provision (seconds), with axis ticks at 1, 100, and 10000, comparing bare metal, VM, and container]
Inspired by Tomas Barton’s Mesos talk at InstallFest in Prague

Slide 18

Slide 18 text

Agenda
• Introduction
• How does Mesos work?
• Mesos Ecosystem
• Conclusion
• Q&A

Slide 19

Slide 19 text

Mesos consists of master/slave nodes
[Diagram: two Mesos slaves (one with a Hadoop task-tracker and a Mesos executor, running Task #1, Task #2, and ./ruby XYZ; one with two Docker executors, running java -jar XYZ.jar and ./xyz), three Mesos masters, a Hadoop scheduler, a Marathon scheduler, and a Zookeeper quorum]
*Thank you to Niklas Nielsen and Adam Bordelon for the following diagrams explaining Mesos: https://www.youtube.com/watch?v=EI0ROkf0vks

Slide 20

Slide 20 text

Applications are known as frameworks in Mesos; they interact with the master
[Diagram: Mesos masters, Zookeeper quorum, Hadoop and Marathon schedulers, and Mesos slaves running executors and tasks]

Slide 21

Slide 21 text

Multiple masters can be in place for HA; they coordinate leader election through ZooKeeper (ZK)
[Diagram: Mesos masters, Zookeeper quorum, Hadoop and Marathon schedulers, and Mesos slaves running executors and tasks]
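
To make the leader election idea concrete, here is a toy in-memory model of the common ZooKeeper election recipe: each candidate registers with an increasing sequence number, the lowest number leads, and when the leader disappears the next-lowest takes over. The class `ToyElection` and the master names are hypothetical; this is a sketch of the idea, not ZooKeeper client code or Mesos source.

```python
import itertools


class ToyElection:
    """In-memory stand-in for an election group (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self._next_seq = itertools.count()
        self._members = {}  # sequence number -> master name

    def join(self, name):
        """Register a candidate and return its sequence number."""
        seq = next(self._next_seq)
        self._members[seq] = name
        return seq

    def leave(self, seq):
        """Remove a candidate, e.g. because its session ended."""
        self._members.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]


election = ToyElection()
seqs = {name: election.join(name) for name in ("master-1", "master-2", "master-3")}

first_leader = election.leader()
election.leave(seqs[first_leader])  # simulate the leading master crashing
second_leader = election.leader()

print(first_leader, "->", second_leader)
```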

Slide 22

Slide 22 text

Master schedules tasks to run on slaves’ available resources; slaves use executors to coordinate execution of tasks
Tasks are the unit of execution
[Diagram: Mesos masters, Zookeeper quorum, Hadoop and Marathon schedulers, and Mesos slaves running executors and tasks]
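
In Mesos this scheduling happens through resource offers: the master offers a slave's free resources to a framework's scheduler, and the scheduler replies with the tasks it wants to launch there. The toy model below sketches one round of that exchange. All class names, task names, and resource figures are hypothetical, and the real Mesos scheduler API is considerably richer than this.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Slave:
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)


class ToyFramework:
    """A framework scheduler that greedily fills each offer with pending tasks."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offer(self, cpus, mem_mb):
        accepted = []
        for task in list(self.pending):  # iterate over a copy while removing
            if task.cpus <= cpus and task.mem_mb <= mem_mb:
                accepted.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem_mb -= task.mem_mb
        return accepted


class ToyMaster:
    """Offers each slave's free resources to the framework, then records launches."""

    def __init__(self, slaves):
        self.slaves = slaves

    def offer_round(self, framework):
        for slave in self.slaves:
            for task in framework.resource_offer(slave.cpus, slave.mem_mb):
                slave.cpus -= task.cpus
                slave.mem_mb -= task.mem_mb
                slave.tasks.append(task.name)


slaves = [Slave("slave-1", 4.0, 8192), Slave("slave-2", 2.0, 4096)]
framework = ToyFramework(
    [Task("web", 2.0, 2048), Task("cache", 2.0, 4096), Task("batch", 2.0, 4096)]
)
ToyMaster(slaves).offer_round(framework)

for slave in slaves:
    print(slave.name, slave.tasks)
```

The point of the split is that the master only tracks and offers resources, while each framework keeps its own placement logic.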

Slide 23

Slide 23 text

Mesos provides fine-grained resource isolation (via cgroups)
[Diagram: a compute node running a Mesos slave process, with a Hadoop task-tracker and a Mesos executor (Task #1, Task #2, ruby XYZ), each executor inside a cgroups container]
Slaves isolate executors and tasks via containers (dotted line)

Slide 24

Slide 24 text

Mesos provides fine-grained resource isolation (via cgroups)
[Diagram: a compute node running a Mesos slave process, with a Hadoop task-tracker and Task #1, Task #2, and Task #3 inside a cgroups container]
Containers can GROW AND SHRINK as tasks run and complete
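
One way to picture a container growing and shrinking is to treat its limit as the sum of the resources of the tasks currently running inside it. The sketch below is a simplified, hypothetical model (it ignores the executor's own resources and does not touch real cgroups); the task names and sizes are made up.

```python
class ToyContainer:
    """Tracks the resource limit of one executor's container (hypothetical model)."""

    def __init__(self):
        self.tasks = {}  # task name -> (cpus, mem_mb)

    def start(self, name, cpus, mem_mb):
        self.tasks[name] = (cpus, mem_mb)

    def finish(self, name):
        del self.tasks[name]

    def limits(self):
        """Current limit: the sum over all running tasks."""
        cpus = sum(c for c, _ in self.tasks.values())
        mem_mb = sum(m for _, m in self.tasks.values())
        return cpus, mem_mb


container = ToyContainer()
container.start("task-1", 1.0, 512)
container.start("task-2", 0.5, 256)
before = container.limits()

container.start("task-3", 2.0, 1024)  # the container grows
grown = container.limits()

container.finish("task-1")  # the container shrinks
shrunk = container.limits()

print(before, grown, shrunk)
```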

Slide 25

Slide 25 text

Mesos provides componentized resource isolation
[Diagram: a Mesos slave process using the Containerizer API to drive a Mesos containerizer, which combines a launcher, a CGroups CPU isolator, and a CGroups memory isolator to run container foo with executor bar and task baz]
When a slave starts, you can specify a “containerizer” to launch the container and a set of isolators to enforce resource constraints (CPU/memory)
Mesos can track and allocate more resource types, allowing you to manage resources like IP addresses, ports, disk space and even GPUs!
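
The componentized design can be sketched as a containerizer that is handed a list of isolators and asks each one which constraint to enforce. Everything below is a hypothetical toy: the class names, the `limit` method, and the resource keys are illustrative and do not mirror the actual Mesos C++ interfaces.

```python
class CpuIsolator:
    """Stands in for the CGroups CPU isolator on the slide."""

    def limit(self, resources):
        return {"cpus": resources["cpus"]}


class MemoryIsolator:
    """Stands in for the CGroups memory isolator on the slide."""

    def limit(self, resources):
        return {"mem_mb": resources["mem_mb"]}


class ToyContainerizer:
    """Launches containers and enforces only what its isolators cover."""

    def __init__(self, isolators):
        self.isolators = list(isolators)

    def launch(self, container_id, resources):
        enforced = {}
        for isolator in self.isolators:
            enforced.update(isolator.limit(resources))
        return {"id": container_id, "enforced": enforced}


requested = {"cpus": 1.0, "mem_mb": 512}

cpu_only = ToyContainerizer([CpuIsolator()]).launch("foo", requested)
cpu_and_mem = ToyContainerizer([CpuIsolator(), MemoryIsolator()]).launch("foo", requested)

print(cpu_only)
print(cpu_and_mem)
```

Swapping the isolator list changes what is enforced without touching the containerizer itself, which is the appeal of the pluggable layout.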

Slide 26

Slide 26 text

Mesos provides pluggable resource isolation (e.g., Docker)
[Diagram: a Mesos slave process calling an external containerizer program through the External Containerizer API; container foo runs MySQL on Ubuntu 13.10 and container bar runs Ruby on CentOS 6.4]
github.com/mesosphere/deimos

Slide 27

Slide 27 text

“Everything fails all the time”
Werner Vogels (Amazon CTO)

Slide 28

Slide 28 text

Mesos has no single point of failure
(The master keeps monitoring tasks and waits for a node to reconnect; the master will then update the framework with any tasks that completed while it was gone.)
Tasks keep running!
[Diagram: framework and masters]

Slide 29

Slide 29 text

The master node can fail over (the ZK quorum will elect a new leader)
Tasks keep running!
[Diagram: framework and masters]

Slide 30

Slide 30 text

Slave processes can fail over (a restarted slave loads checkpointed state to learn what pods to reconnect to for each task, then re-registers with the master)
Tasks keep running!
[Diagram: a compute node with a Mesos slave process and two Mesos executors]
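
The recovery idea is: write down which executor owns which task while running, and read that record back after a restart. The sketch below does this with a JSON file in a temporary directory. The file name, the task and executor names, and the JSON layout are all hypothetical; real Mesos checkpoints use their own on-disk format.

```python
import json
import os
import tempfile


def checkpoint(path, running):
    """Persist the task -> executor mapping so a restarted slave can recover it."""
    with open(path, "w") as f:
        json.dump(running, f)


def recover(path):
    """Load the mapping written by checkpoint()."""
    with open(path) as f:
        return json.load(f)


state_dir = tempfile.mkdtemp()
state_path = os.path.join(state_dir, "slave-state.json")

running = {"task-1": "executor-a", "task-2": "executor-b"}
checkpoint(state_path, running)

running = None  # simulate the slave process dying and losing its memory

recovered = recover(state_path)
reconnect_to = sorted(set(recovered.values()))
print("reconnect to:", reconnect_to)
```

Because the executors and their tasks are separate processes, they can keep running while the slave process is down; the restarted slave only needs to find them again.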

Slide 31

Slide 31 text

The Mesos ecosystem is growing (frameworks everywhere)
http://mesos.apache.org/documentation/latest/mesos-frameworks/

Slide 32

Slide 32 text

Chronos: Distributed cron with dependencies https://github.com/airbnb/chronos
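
"Cron with dependencies" means a job can be set to run only after the jobs it depends on have finished. The sketch below shows the ordering idea with the standard library's `graphlib` (Python 3.9+). The job names are hypothetical, and this is not the Chronos API or its job format.

```python
from graphlib import TopologicalSorter

# job -> set of jobs it depends on (hypothetical job names)
jobs = {
    "export-logs": set(),
    "aggregate": {"export-logs"},
    "report": {"aggregate"},
    "cleanup": {"report", "aggregate"},
}

# A valid run order places every job after all of its dependencies.
order = list(TopologicalSorter(jobs).static_order())
print(order)
```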

Slide 33

Slide 33 text

Marathon: init.d for your data center https://github.com/mesosphere/marathon
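
The "init.d for your data center" idea is that you declare how many instances of each service should run and the system keeps reality in line with that. The toy function below compares desired and running instance counts and reports the difference. The app names and counts are hypothetical, and this is an illustration of the concept rather than Marathon's API or implementation.

```python
def reconcile(desired, running):
    """Return instances to start (positive) or stop (negative) for each app."""
    changes = {}
    for app in sorted(set(desired) | set(running)):
        delta = desired.get(app, 0) - running.get(app, 0)
        if delta != 0:
            changes[app] = delta
    return changes


desired = {"web": 3, "api": 2}
running = {"web": 1, "api": 2, "old-job": 1}

changes = reconcile(desired, running)
print(changes)
```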

Slide 34

Slide 34 text

Aurora: Advanced scheduler used by Twitter in production http://aurora.incubator.apache.org

Slide 35

Slide 35 text

You can also build your own framework…

Slide 36

Slide 36 text

Agenda
• Introduction
• How does Mesos work?
• Mesos Ecosystem
• Conclusion
• Q&A

Slide 37

Slide 37 text

#PoweredByMesos (public) http://mesos.apache.org/documentation/latest/powered-by-mesos/

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Mesos allows services to scale
Engineers think about resources, not machines

Slide 40

Slide 40 text

[Diagram: service architecture layers]
• Storage: MySQL, Tweet store, Flock, User Store
• Cache: Memcached, Redis
• Logic: Tweet Service, User Service, Timeline Service, SocialGraph Service, DM Service
• Presentation: API, Web, Search, Feature X, Feature Y
• Presentation: TFE (netty), Reverse Proxy
• Protocols: HTTP, Thrift
• Also shown: Aurora, Mesos, Monorail

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Mesos enables multi-tenant clusters
Small teams can move fast
AWS-based infrastructure beyond just Hadoop

Slide 43

Slide 43 text

[Diagram: Marathon, Mesos, and Chronos, with the following workload groups]
• Batch/Streaming: Hadoop, Spark, Kafka
• Query/Analysis: Cascading, Presto, Hive, Shark, Pig
• Services: Rails, Redis, Cassandra, KairosDB, RDS
• Also shown: Hadoop A, Hadoop B

Slide 44

Slide 44 text

Agenda
• Introduction
• How does Mesos work?
• Mesos Ecosystem
• Conclusion
• Q&A

Slide 45

Slide 45 text

Conclusion
• Mesos is a distributed system to build and run distributed systems (think datacenter OS)
• Mesos enables resource sharing, high-availability and fault-tolerance for your data centers
• Mesos is an open source project with a healthy independent community: http://mesos.apache.org
• So please check it out, use it or contribute back if you can to make it better!

Slide 46

Slide 46 text

https://elastic.mesosphere.io

Slide 47

Slide 47 text

http://mesos.apache.org Open Source Support from the Mesos Community

Slide 48

Slide 48 text

http://mesos.apache.org/community/user-groups/ Learn more via Mesos User Groups

Slide 49

Slide 49 text

http://mesosphere.io/learn Commercial Support from Mesosphere

Slide 50

Slide 50 text

http://mesoscon.org First #MesosCon to coincide with LinuxCon 2014!

Slide 51

Slide 51 text

Thank you for listening!
Chris Aniszczyk (@cra)
[email protected]
http://opensource.twitter.com
http://mesos.apache.org
email: {user,dev}@mesos.apache.org
Also thanks to Niklas Nielsen and Adam Bordelon for their slides explaining Mesos from ApacheCon 2014: https://www.youtube.com/watch?v=EI0ROkf0vks

Slide 52

Slide 52 text

Resources
http://mesos.apache.org
http://mesosphere.io/learn/
http://wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos
http://mesosphere.io/2013/09/26/docker-on-mesos/
http://typesafe.com/blog/play-framework-grid-deployment-with-mesos
http://research.google.com/pubs/pub35290.html
http://nerds.airbnb.com/hadoop-on-mesos/
https://blog.twitter.com/2013/mesos-graduates-from-apache-incubation
http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/
https://www.youtube.com/watch?v=EI0ROkf0vks