Distributed Scheduling with Apache Mesos in the Cloud

Diptanu Choudhury

July 23, 2015

Transcript

  1. This Talk • Challenges of traditional Data Centre environments • Taming the complexities of running services at Scale • Cloud Native Cluster Management with Titan
  2. The Modern Data Centre [diagram: VMs provisioned on top of an SDN network and network storage]
  3. Data Centre of 2015 [diagram: containers provisioned on top of an SDN network, network storage, cloud storage, and cloud persistence]
  4. Internet Scale Complexities • Regions such as London, Virginia, and Tokyo • Distributed across geographies to lower latencies • Highly available within a region by distributing across buildings
  5. Evolution of Applications • Application servers • Databases • Software load balancers • API servers • Mid-tier services • Caches • Distributed K/V stores
  6. The Multi-Core World • Scale by running on more cores • Scale by adding more servers • Scale by running on commodity hardware
  7. Operational Aspects of Distributed Systems • Provisioning compute resources • Configuration of services • Distribution of services • Supervision and fault tolerance • Service discovery
  8. Provisioning Resources • Ops folks usually assign servers to specific teams or applications [diagram: VM1-VM6 statically assigned to a service, a database, and a batch process] • This statically partitions the data centre
  9. Challenges of Manual Provisioning • Node failures: services from failed nodes must be restored on specific servers • Distributing services across fault domains is harder at scale
  10. Different Fault Domains • Node • Memory, disk, CPU, network card, etc. • Rack • PDU • Switch • Data Centre • Power • Cooling
  11. Maintenance • Upgrading software is harder • Choosing which machines to upgrade is more difficult [diagram: VM1-VM6 statically assigned to a service, a database, and a batch process]
  12. Challenges of Manual Scheduling • Homogeneous distribution of applications on a single node decreases utilization. • Static partitioning doesn’t allow sharing a group of machines’ resources across multiple applications.
  13. Cluster Managers [diagram: a cluster manager places API servers, batch apps, and DBs onto containers spanning an SDN network and network storage]
  14. Mesos • Two-level scheduler • Provides discovery and brokerage of compute resources • Semantics for launching processes • Event-driven API for monitoring the life cycle of applications • Resource isolation on a single node
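
The "semantics for launching processes" on slide 14 amount to a scheduler accepting a resource offer and handing Mesos a task description to run on that node. A minimal sketch using the Java SDK (the task name, command, and resource sizes are placeholders, not from the talk):

    import java.util.Collections;
    import org.apache.mesos.Protos.*;
    import org.apache.mesos.SchedulerDriver;

    // Minimal sketch: turn a single resource offer into one task that runs a shell command.
    class OfferHandler {
        static void launchEcho(SchedulerDriver driver, Offer offer) {
            TaskInfo task = TaskInfo.newBuilder()
                .setName("echo-task")                                  // placeholder task name
                .setTaskId(TaskID.newBuilder().setValue("echo-1"))
                .setSlaveId(offer.getSlaveId())                        // run on the node that made the offer
                .addResources(Resource.newBuilder()
                    .setName("cpus").setType(Value.Type.SCALAR)
                    .setScalar(Value.Scalar.newBuilder().setValue(0.1)))
                .addResources(Resource.newBuilder()
                    .setName("mem").setType(Value.Type.SCALAR)
                    .setScalar(Value.Scalar.newBuilder().setValue(32)))
                .setCommand(CommandInfo.newBuilder().setValue("echo hello"))
                .build();
            // Accept the offer and launch the task; Mesos handles placement and isolation on the node.
            driver.launchTasks(Collections.singletonList(offer.getId()),
                               Collections.singletonList(task));
        }
    }

A framework would call something like this from its resourceOffers callback; the two-level design means the framework, not Mesos, decides which offers to use and what to run on them.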
  15. Domain-Specific Frameworks • Scheduling decisions are left to users • Allows plugging in multiple frameworks • Allows users to define their own states for processes • Pending -> Running -> Dead • Sends messages when the state of a process changes • Task Dispatched -> Task Staging -> Task Running -> Task Finished
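
The state transitions on slide 15 reach the framework as status-update messages. A rough sketch, assuming the Java SDK, of how a scheduler's statusUpdate handler might react (the log lines and the rescheduling comment are illustrative, not from the talk):

    import org.apache.mesos.Protos.TaskState;
    import org.apache.mesos.Protos.TaskStatus;

    // Sketch of reacting to the lifecycle messages Mesos delivers to Scheduler#statusUpdate.
    class TaskLifecycle {
        static void onStatusUpdate(TaskStatus status) {
            String taskId = status.getTaskId().getValue();
            switch (status.getState()) {
                case TASK_STAGING:    // accepted; the executor and sandbox are being set up
                case TASK_STARTING:
                    System.out.println(taskId + " is being set up");
                    break;
                case TASK_RUNNING:
                    System.out.println(taskId + " is running");
                    break;
                case TASK_FINISHED:   // terminal: completed successfully
                    System.out.println(taskId + " finished");
                    break;
                case TASK_FAILED:
                case TASK_KILLED:
                case TASK_LOST:       // terminal: a framework would typically reschedule here
                    System.out.println(taskId + " terminated: " + status.getState());
                    break;
                default:
                    break;
            }
        }
    }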
  16. Mesos - Indirections [diagram: Mesos brokering 10s of 1000s of compute nodes to multiple schedulers (Scheduler 1, Scheduler 2)]
  17. Mesos - SDK for building Data Centre OS • SDK in Java, Python, Go • Provides interfaces for exchanging messages between schedulers and executors • Provides log replication capabilities
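
To make the scheduler/executor message exchange on slide 17 concrete, here is a minimal custom executor sketch in Java (the echo behaviour is invented for illustration; a real executor would start the actual process or container):

    import org.apache.mesos.Executor;
    import org.apache.mesos.ExecutorDriver;
    import org.apache.mesos.MesosExecutorDriver;
    import org.apache.mesos.Protos.*;

    // Minimal custom executor: acknowledges launched tasks and echoes framework messages back.
    public class EchoExecutor implements Executor {
        @Override public void registered(ExecutorDriver d, ExecutorInfo e, FrameworkInfo f, SlaveInfo s) {}
        @Override public void reregistered(ExecutorDriver d, SlaveInfo s) {}
        @Override public void disconnected(ExecutorDriver d) {}

        @Override public void launchTask(ExecutorDriver driver, TaskInfo task) {
            // A real executor would fork/exec a process or start a container here.
            driver.sendStatusUpdate(TaskStatus.newBuilder()
                .setTaskId(task.getTaskId())
                .setState(TaskState.TASK_RUNNING)
                .build());
        }

        @Override public void killTask(ExecutorDriver driver, TaskID taskId) {
            driver.sendStatusUpdate(TaskStatus.newBuilder()
                .setTaskId(taskId)
                .setState(TaskState.TASK_KILLED)
                .build());
        }

        @Override public void frameworkMessage(ExecutorDriver driver, byte[] data) {
            driver.sendFrameworkMessage(data);  // echo the scheduler's message back, to show the channel
        }

        @Override public void shutdown(ExecutorDriver d) {}
        @Override public void error(ExecutorDriver d, String message) {}

        public static void main(String[] args) {
            Status status = new MesosExecutorDriver(new EchoExecutor()).run();
            System.exit(status == Status.DRIVER_STOPPED ? 0 : 1);
        }
    }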
  18. Data Centre OS Services • A highly available and consistent control plane for managing the state of the cluster • An API for users and other services to submit job specifications • Custom executors to set up processes and communicate life cycle events of processes • A containerizer for providing process isolation
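
Slide 18 mentions an API for submitting job specifications but does not show the schema. A hypothetical Java shape for such a specification might look like the following (every field name here is invented for illustration):

    import java.util.Map;

    // Hypothetical job specification a user might submit to the control plane's API.
    // Field names are illustrative, not Titan's actual schema.
    public class JobSpec {
        String name;                 // e.g. "edge-api"
        String dockerImage;          // container image the containerizer should run
        int instances;               // desired number of containers
        double cpusPerInstance;      // CPUs per container
        int memoryMbPerInstance;     // memory per container, in MB
        int priority;                // 1 = guaranteed ... 3 = best effort (see the reservations slide)
        Map<String, String> env;     // environment variables passed to each container
    }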
  19. Cloud Native Scheduling • AutoScaling • Dynamic Reservations • Automatic Node Replacements • Fail-overs across multiple Regions
  20. Titan • A distributed compute service native to public clouds • Provides AutoScaling for clusters of containers • Supervises containers and provides failover mechanisms to applications running in containers • Provides logging, monitoring, and volume management capabilities
  21. Dynamic Reservations • Titan allows reserving resources for specific applications • Enforces the reservations under resource contention • Reservations are made at a priority level • P1 (Guaranteed Reservations) <-> P3 (Best Effort)
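
The talk does not show how the P1 <-> P3 ordering is enforced. As a rough, invented illustration, a scheduler under contention could free capacity by preempting best-effort tasks first and never touching guaranteed reservations:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Illustrative sketch only, not Titan's implementation.
    class PreemptionSketch {
        static class RunningTask {
            final String id;
            final int priority;   // 1 = guaranteed ... 3 = best effort
            final double cpus;
            RunningTask(String id, int priority, double cpus) {
                this.id = id; this.priority = priority; this.cpus = cpus;
            }
        }

        // Pick lowest-priority tasks to preempt until enough CPUs are freed.
        static List<RunningTask> pickVictims(List<RunningTask> running, double cpusNeeded) {
            List<RunningTask> candidates = new ArrayList<>(running);
            candidates.sort(Comparator.comparingInt((RunningTask t) -> t.priority).reversed());
            List<RunningTask> victims = new ArrayList<>();
            double freed = 0;
            for (RunningTask t : candidates) {
                if (freed >= cpusNeeded) break;
                if (t.priority == 1) continue;   // guaranteed (P1) reservations are never preempted
                victims.add(t);
                freed += t.cpus;
            }
            return victims;
        }
    }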
  22. AutoScaling • Two levels of AutoScaling • Scaling of underlying compute resources • Application scaling based on business and performance metrics
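
A sketch of the two scaling levels from slide 22, with made-up thresholds and metrics (the real policies are driven by business and performance metrics, as the slide states):

    // Illustrative only: the thresholds and the requests-per-second metric are invented.
    class AutoScalingSketch {
        // Level 1: grow or shrink the pool of underlying compute nodes based on cluster utilization.
        static int desiredAgentCount(int currentAgents, double clusterCpuUtilization) {
            if (clusterCpuUtilization > 0.80) return currentAgents + 1;   // add a node
            if (clusterCpuUtilization < 0.30 && currentAgents > 1) return currentAgents - 1;   // drain one
            return currentAgents;
        }

        // Level 2: scale an application's container count from a performance metric,
        // e.g. the requests per second each container can handle.
        static int desiredInstances(double requestsPerSecond, double rpsPerInstance) {
            return Math.max(1, (int) Math.ceil(requestsPerSecond / rpsPerInstance));
        }
    }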
  23. jobs.netflix.com • The Data Centre as a Computer: http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf • Resource Scheduling using Fenzo: http://www.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud • Apache Mesos: https://www.cs.berkeley.edu/~alig/papers/mesos.pdf • Large Scale Cluster Management at Google: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf