Slide 1

Slide 1 text

Distributed Scheduling with Apache Mesos on the Cloud. UberConf 2015, Denver. Diptanu Gon Choudhury, @diptanu

Slide 3

Slide 3 text

This Talk • Challenges of traditional Data Centre environments • Taming the complexities of running services at Scale • Cloud Native Cluster Management with Titan

Slide 4

Slide 4 text

Evolution of Data Centers: Mid-90s, Early 2000s

Slide 5

Slide 5 text

The Modern Data Centre [Diagram: a grid of VMs above layers labelled "SDN Network Storage"]

Slide 6

Slide 6 text

Data Centre of 2015 [Diagram: a grid of boxes marked C above layers labelled "SDN Network Storage", "Cloud Storage", and "Cloud Persistence"]

Slide 7

Slide 7 text

Internet Scale Complexities • Distributed across geographies (London, Virginia, Tokyo) to lower latencies • Highly available within a region by distribution across buildings

Slide 8

Slide 8 text

Evolution of Applications [Diagram labels: Application Server, Database, Software Load Balancers, API Servers, Mid Tier Services, Caches, Distributed K/V Stores]

Slide 9

Slide 9 text

The Multi Core World • Scale by running on more cores • Scale by adding more servers • Scale by running on commodity hardware

Slide 10

Slide 10 text

The Modern Internet Scale Application Is a Data Center Application

Slide 11

Slide 11 text

Data Centre Applications are essentially Distributed Systems

Slide 13

Slide 13 text

Operational Aspects of Distributed Systems • Provisioning Compute Resources • Configuration of services • Distribution of services • Supervision and Fault Tolerance • Service Discovery

Slide 14

Slide 14 text

Provisioning Resources • Usually Ops folks assign servers to specific teams or applications • This statically partitions the data centre [Diagram: VM1 to VM6 divided among Service, Database, and Batch Process]

Slide 15

Slide 15 text

Challenges of manual provisioning • Node failures: services from failed nodes have to be restored on specific servers • Distributing services across fault domains is harder at scale

Slide 16

Slide 16 text

Different Fault Domains • Node: memory, disk, CPU, network card, etc. • Rack: PDU, switch • Data Centre: power, cooling
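To make the idea of distributing a service across fault domains concrete, here is a minimal sketch that spreads replicas round-robin across racks, so that losing one rack (its PDU or switch) does not take out every replica. The function name, rack names, and node names are hypothetical; this is an illustration, not how Mesos or Titan places tasks.

```python
from collections import defaultdict


def spread_across_racks(replicas, nodes_by_rack):
    """Assign replica indices to (rack, node) pairs, rotating through racks.

    nodes_by_rack is a hypothetical inventory: {rack_name: [node_name, ...]}.
    """
    if not nodes_by_rack:
        raise ValueError("no racks available")
    rack_names = sorted(nodes_by_rack)
    next_slot = defaultdict(int)  # next node position to use within each rack
    placement = {}
    for i in range(replicas):
        rack = rack_names[i % len(rack_names)]
        nodes = nodes_by_rack[rack]
        node = nodes[next_slot[rack] % len(nodes)]
        next_slot[rack] += 1
        placement[i] = (rack, node)
    return placement


placement = spread_across_racks(4, {"rack-a": ["n1", "n2"], "rack-b": ["n3"]})
```

With two racks, consecutive replicas alternate between them, so any single rack failure leaves at least half of the replicas running.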

Slide 17

Slide 17 text

Maintenance • Upgrading software is harder • Choosing which machines to upgrade is more difficult [Diagram: VM1 to VM6 divided among Service, Database, and Batch Process]

Slide 18

Slide 18 text

Poor Utilization [Chart values as extracted: 90, 2, 5, 95 (API Services); 2, 65, 55, 5 (Batch Processes)]

Slide 19

Slide 19 text

Ideal Utilization [Chart: axis from 0 to 100]

Slide 20

Slide 20 text

Challenges of Manual Scheduling • Homogeneous distribution of applications on a single node decreases utilization. • Static partitioning doesn’t allow sharing the resources of a group of machines across multiple applications.
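A small worked example can show why static partitioning hurts utilization. The sketch below compares giving every application its own node against packing the same applications onto shared nodes with a simple first-fit heuristic. The CPU demands and the node size are made-up numbers, and first-fit is only one of many packing strategies; it is used here for illustration.

```python
def first_fit(demands, capacity):
    """Return how many nodes first-fit needs to host all CPU demands."""
    free = []  # remaining CPU capacity of each node opened so far
    for demand in demands:
        if demand > capacity:
            raise ValueError("demand does not fit on any node")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                free[i] = remaining - demand
                break
        else:
            free.append(capacity - demand)  # open a new node
    return len(free)


def utilization(demands, capacity, node_count):
    """Fraction of the total CPU capacity that is actually used."""
    return sum(demands) / (capacity * node_count)


demands = [2, 6, 1, 5, 2]  # hypothetical CPU needs of five applications
static_nodes = len(demands)  # one dedicated node per application
shared_nodes = first_fit(demands, capacity=8)
print(utilization(demands, 8, static_nodes), utilization(demands, 8, shared_nodes))
```

In this made-up case the dedicated layout needs five 8-CPU nodes, while the shared layout fits the same work on two.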

Slide 21

Slide 21 text

Enter the Era of Cluster Managers • Mesos • Borg • Kubernetes • CoreOS Fleet

Slide 22

Slide 22 text

Cluster Managers [Diagram: a Cluster Manager running API Servers, Batch Apps, and DBs on a grid of boxes marked C, above layers labelled "SDN Network Storage"]

Slide 23

Slide 23 text

Mesos

Slide 24

Slide 24 text

Mesos Moves the focus from Servers to Compute Resources

Slide 25

Slide 25 text

Mesos • Two level scheduler • Provides discovery and brokerage of compute resources • Semantics for launching processes • Event driven API for monitoring life cycle of applications • Resource Isolation on a single node
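The two-level idea can be sketched as a toy model: the first level advertises free resources as offers, and the second level (a framework scheduler) decides which of its pending tasks to launch on them. All class and function names below are hypothetical and this is not the Mesos API; it only models the division of responsibility.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Second level: chooses which offered resources to use for its tasks."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offers(self, offers):
        launches = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            for task in list(self.pending):  # iterate over a copy while removing
                if task.cpus <= cpus and task.mem_mb <= mem:
                    cpus -= task.cpus
                    mem -= task.mem_mb
                    self.pending.remove(task)
                    launches.append((task.name, offer.node))
        return launches


def offer_round(free_nodes, scheduler):
    """First level: turn free capacity into offers and hand them to a framework."""
    offers = [Offer(node, cpus, mem) for node, (cpus, mem) in free_nodes.items()]
    return scheduler.resource_offers(offers)


scheduler = FrameworkScheduler(
    [Task("api", 2, 1024), Task("batch", 4, 2048), Task("cache", 1, 512)]
)
launches = offer_round({"node-1": (4, 4096), "node-2": (4, 4096)}, scheduler)
```

The resource manager never needs to know what "api" or "batch" mean; placement policy lives entirely in the framework.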

Slide 26

Slide 26 text

Domain Specific Frameworks • Scheduling decisions are left to users • Allows plugging in multiple frameworks • Allows users to define their own states for processes (Pending -> Running -> Dead) • Sends messages when the state of a process changes (Task Dispatched -> Task Staging -> Task Running -> Task Finished)
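One way a framework could translate the task messages into its own process states is sketched below. The message names are taken from the slide as plain strings, not from the Mesos enum, and the mapping from messages to Pending, Running, and Dead is an assumption made for illustration.

```python
# Message order as listed on the slide.
TASK_MESSAGES = ["Task Dispatched", "Task Staging", "Task Running", "Task Finished"]

# Hypothetical mapping from task messages to the framework's own states.
FRAMEWORK_STATE = {
    "Task Dispatched": "Pending",
    "Task Staging": "Pending",
    "Task Running": "Running",
    "Task Finished": "Dead",
}


class ProcessTracker:
    """Tracks one process and records each framework-level state change."""

    def __init__(self):
        self.state = "Pending"
        self.history = ["Pending"]
        self._last_index = -1

    def on_message(self, message):
        index = TASK_MESSAGES.index(message)  # raises ValueError if unknown
        if index <= self._last_index:
            return self.state  # ignore duplicate or out-of-date messages
        self._last_index = index
        new_state = FRAMEWORK_STATE[message]
        if new_state != self.state:
            self.state = new_state
            self.history.append(new_state)
        return self.state


tracker = ProcessTracker()
for message in TASK_MESSAGES:
    tracker.on_message(message)
```

Ignoring messages that arrive out of order keeps a late "Task Running" from reviving a process that has already finished.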

Slide 27

Slide 27 text

Mesos - Indirections [Diagram: Scheduler 1 and Scheduler 2 sit on top of Mesos, which manages 10s of 1000s of compute nodes]

Slide 28

Slide 28 text

Mesos - Abstractions • IaaS: provisions machines • Mesos: deploys and supervises processes

Slide 29

Slide 29 text

Mesos - Abstractions • EC2 • Mesos: deploys and supervises processes

Slide 30

Slide 30 text

Mesos - Abstractions • Mesos: deploys and supervises processes • Titan, Aurora, Marathon

Slide 31

Slide 31 text

Mesos - Custom Executors [Diagram: a Compute Node with the layers Linux Kernel, Mesos Slave, Mesos Executor, and Process]

Slide 32

Slide 32 text

Mesos - SDK for building Data Centre OS • SDK in Java, Python, Go • Provides interfaces for exchanging messages between schedulers and executors • Provides log replication capabilities

Slide 33

Slide 33 text

Data Centre OS Services • A highly available and consistent control plane for managing the state of the cluster • An API for users and other services to submit job specifications • Custom Executors to set up processes and communicate life cycle events of processes • Containerizer for providing process isolation

Slide 34

Slide 34 text

Cloud Native Scheduling • AutoScaling • Dynamic Reservations • Automatic Node Replacements • Fail-overs across multiple Regions

Slide 35

Slide 35 text

Titan

Slide 36

Slide 36 text

Titan • A distributed compute service native to public clouds • Provides AutoScaling to clusters of Containers • Supervises containers and provides failover mechanisms to applications running in containers • Provides logging, monitoring, volume management capabilities

Slide 37

Slide 37 text

Titan

Slide 38

Slide 38 text

A Compute Node of Titan

Slide 39

Slide 39 text

Scheduling Library for Mesos

Slide 40

Slide 40 text

Dynamic Reservations • Titan allows reserving resources for specific applications • Enforces the reservations under resource contention • Reservations are made at a priority level, from P1 (Guaranteed Reservations) to P3 (Best Effort)
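Enforcing reservations by priority under contention can be sketched as follows: requests are served in priority order, so guaranteed (P1) applications receive their CPUs before best-effort (P3) ones get whatever is left. The function, the application names, and the numbers are hypothetical, and this is not Titan's actual algorithm.

```python
def allocate(capacity, requests):
    """Grant CPUs in priority order; a lower priority number is served first.

    requests is a list of (app, priority, cpus) tuples, e.g. priority 1 for
    guaranteed reservations and 3 for best effort.
    """
    granted = {}
    remaining = capacity
    # sorted() is stable, so requests with equal priority keep their order.
    for app, _priority, cpus in sorted(requests, key=lambda r: r[1]):
        share = min(cpus, remaining)
        granted[app] = share
        remaining -= share
    return granted


grants = allocate(10, [("batch", 3, 6), ("api", 1, 6), ("reports", 2, 3)])
```

Here the cluster is oversubscribed (15 CPUs requested, 10 available), so the best-effort batch job is the one that gets squeezed.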

Slide 41

Slide 41 text

AutoScaling • Two levels of AutoScaling: scaling of underlying compute resources, and application scaling based on business and performance metrics
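The two levels can be illustrated with a pair of small calculations: first size the application from a metric, then size the cluster from what the application needs. The metric (requests per second), the per-instance target, and the function names are assumptions chosen for the example.

```python
import math


def desired_instances(requests_per_second, target_rps_per_instance, min_instances=1):
    """Application level: instances needed to keep each one at its target load."""
    needed = math.ceil(requests_per_second / target_rps_per_instance)
    return max(min_instances, needed)


def desired_nodes(instances, cpus_per_instance, cpus_per_node):
    """Cluster level: nodes needed to host the requested instances."""
    return math.ceil(instances * cpus_per_instance / cpus_per_node)


instances = desired_instances(950, target_rps_per_instance=100)
nodes = desired_nodes(instances, cpus_per_instance=2, cpus_per_node=8)
```

Keeping the two calculations separate mirrors the split on the slide: the application scales on its own metrics, and the underlying compute pool follows.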

Slide 42

Slide 42 text

jobs.netflix.com • The Data Centre As a Computer: http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf • Resource Scheduling using Fenzo: http://www.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud • Apache Mesos: https://www.cs.berkeley.edu/~alig/papers/mesos.pdf • Large Scale Cluster Management at Google: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf