Distributed Scheduling with Apache Mesos in the Cloud

Diptanu Choudhury

July 23, 2015

Transcript

  1. This Talk • Challenges of traditional Data Centre environments • Taming the complexities of running services at Scale • Cloud Native Cluster Management with Titan
  2. The Modern Data Centre [diagram: VMs provisioned on top of an SDN network and network storage]
  3. Data Centre of 2015 [diagram: containers provisioned on top of an SDN network, network storage, cloud storage, and cloud persistence]
  4. Internet Scale Complexities • Regions such as London, Virginia, and Tokyo • Distributed across geographies to lower latencies • Highly available within a region by distributing across buildings
  5. Evolution of Applications • Application servers • Databases • Software load balancers • API servers • Mid-tier services • Caches • Distributed K/V stores
  6. The Multi-Core World • Scale by running on more cores • Scale by adding more servers • Scale by running on commodity hardware
  7. Operational Aspects of Distributed Systems • Provisioning compute resources • Configuration of services • Distribution of services • Supervision and fault tolerance • Service discovery
  8. Provisioning Resources • Ops folks usually assign servers to specific teams or applications [diagram: VM1-VM6 statically assigned to a service, a database, and a batch process] • This statically partitions the data centre
  9. Challenges of Manual Provisioning • Node failures: services from failed nodes must be restored on specific servers • Distributing services across fault domains is harder at scale
  10. Different Fault Domains • Node • Memory, disk, CPU, network card, etc. • Rack • PDU • Switch • Data Centre • Power • Cooling
  11. Maintenance • Upgrading software is harder • Choosing which machines to upgrade is more difficult [diagram: VM1-VM6 statically assigned to a service, a database, and a batch process]
  12. Challenges of Manual Scheduling • Homogeneous distribution of applications on a single node decreases utilization. • Static partitioning doesn’t allow sharing a group of machines’ resources across multiple applications.
  13. Cluster Managers [diagram: a cluster manager places API servers, batch apps, and DBs onto containers spanning an SDN network and network storage]
  14. Mesos • Two-level scheduler • Provides discovery and brokerage of compute resources • Semantics for launching processes • Event-driven API for monitoring the life cycle of applications • Resource isolation on a single node
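
The "semantics for launching processes" on slide 14 amount to a scheduler accepting a resource offer and handing Mesos a task description to run on that node. A minimal sketch using the Java SDK (the task name, command, and resource sizes are placeholders, not from the talk):

    import java.util.Collections;
    import org.apache.mesos.Protos.*;
    import org.apache.mesos.SchedulerDriver;

    // Minimal sketch: turn a single resource offer into one task that runs a shell command.
    class OfferHandler {
        static void launchEcho(SchedulerDriver driver, Offer offer) {
            TaskInfo task = TaskInfo.newBuilder()
                .setName("echo-task")                                  // placeholder task name
                .setTaskId(TaskID.newBuilder().setValue("echo-1"))
                .setSlaveId(offer.getSlaveId())                        // run on the node that made the offer
                .addResources(Resource.newBuilder()
                    .setName("cpus").setType(Value.Type.SCALAR)
                    .setScalar(Value.Scalar.newBuilder().setValue(0.1)))
                .addResources(Resource.newBuilder()
                    .setName("mem").setType(Value.Type.SCALAR)
                    .setScalar(Value.Scalar.newBuilder().setValue(32)))
                .setCommand(CommandInfo.newBuilder().setValue("echo hello"))
                .build();
            // Accept the offer and launch the task; Mesos handles placement and isolation on the node.
            driver.launchTasks(Collections.singletonList(offer.getId()),
                               Collections.singletonList(task));
        }
    }

A framework would call something like this from its resourceOffers callback; the two-level design means the framework, not Mesos, decides which offers to use and what to run on them.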
  15. Domain-Specific Frameworks • Scheduling decisions are left to users • Allows plugging in multiple frameworks • Allows users to define their own states for processes • Pending -> Running -> Dead • Sends messages when the state of a process changes • Task Dispatched -> Task Staging -> Task Running -> Task Finished
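
The state transitions on slide 15 reach the framework as status-update messages. A rough sketch, assuming the Java SDK, of how a scheduler's statusUpdate handler might react (the log lines and the rescheduling comment are illustrative, not from the talk):

    import org.apache.mesos.Protos.TaskState;
    import org.apache.mesos.Protos.TaskStatus;

    // Sketch of reacting to the lifecycle messages Mesos delivers to Scheduler#statusUpdate.
    class TaskLifecycle {
        static void onStatusUpdate(TaskStatus status) {
            String taskId = status.getTaskId().getValue();
            switch (status.getState()) {
                case TASK_STAGING:    // accepted; the executor and sandbox are being set up
                case TASK_STARTING:
                    System.out.println(taskId + " is being set up");
                    break;
                case TASK_RUNNING:
                    System.out.println(taskId + " is running");
                    break;
                case TASK_FINISHED:   // terminal: completed successfully
                    System.out.println(taskId + " finished");
                    break;
                case TASK_FAILED:
                case TASK_KILLED:
                case TASK_LOST:       // terminal: a framework would typically reschedule here
                    System.out.println(taskId + " terminated: " + status.getState());
                    break;
                default:
                    break;
            }
        }
    }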
  16. Mesos - Indirections [diagram: Mesos brokering 10s of 1000s of compute nodes to multiple schedulers (Scheduler 1, Scheduler 2)]
  17. Mesos - SDK for building Data Centre OS • SDK in Java, Python, Go • Provides interfaces for exchanging messages between schedulers and executors • Provides log replication capabilities
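
To make the scheduler/executor message exchange on slide 17 concrete, here is a minimal custom executor sketch in Java (the echo behaviour is invented for illustration; a real executor would start the actual process or container):

    import org.apache.mesos.Executor;
    import org.apache.mesos.ExecutorDriver;
    import org.apache.mesos.MesosExecutorDriver;
    import org.apache.mesos.Protos.*;

    // Minimal custom executor: acknowledges launched tasks and echoes framework messages back.
    public class EchoExecutor implements Executor {
        @Override public void registered(ExecutorDriver d, ExecutorInfo e, FrameworkInfo f, SlaveInfo s) {}
        @Override public void reregistered(ExecutorDriver d, SlaveInfo s) {}
        @Override public void disconnected(ExecutorDriver d) {}

        @Override public void launchTask(ExecutorDriver driver, TaskInfo task) {
            // A real executor would fork/exec a process or start a container here.
            driver.sendStatusUpdate(TaskStatus.newBuilder()
                .setTaskId(task.getTaskId())
                .setState(TaskState.TASK_RUNNING)
                .build());
        }

        @Override public void killTask(ExecutorDriver driver, TaskID taskId) {
            driver.sendStatusUpdate(TaskStatus.newBuilder()
                .setTaskId(taskId)
                .setState(TaskState.TASK_KILLED)
                .build());
        }

        @Override public void frameworkMessage(ExecutorDriver driver, byte[] data) {
            driver.sendFrameworkMessage(data);  // echo the scheduler's message back, to show the channel
        }

        @Override public void shutdown(ExecutorDriver d) {}
        @Override public void error(ExecutorDriver d, String message) {}

        public static void main(String[] args) {
            Status status = new MesosExecutorDriver(new EchoExecutor()).run();
            System.exit(status == Status.DRIVER_STOPPED ? 0 : 1);
        }
    }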
  18. Data Centre OS Services • A highly available and consistent control plane for managing the state of the cluster • An API for users and other services to submit job specifications • Custom executors to set up processes and communicate life cycle events of processes • A containerizer for providing process isolation
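
Slide 18 mentions an API for submitting job specifications but does not show the schema. A hypothetical Java shape for such a specification might look like the following (every field name here is invented for illustration):

    import java.util.Map;

    // Hypothetical job specification a user might submit to the control plane's API.
    // Field names are illustrative, not Titan's actual schema.
    public class JobSpec {
        String name;                 // e.g. "edge-api"
        String dockerImage;          // container image the containerizer should run
        int instances;               // desired number of containers
        double cpusPerInstance;      // CPUs per container
        int memoryMbPerInstance;     // memory per container, in MB
        int priority;                // 1 = guaranteed ... 3 = best effort (see the reservations slide)
        Map<String, String> env;     // environment variables passed to each container
    }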
  19. Cloud Native Scheduling • AutoScaling • Dynamic Reservations • Automatic Node Replacements • Fail-overs across multiple Regions
  20. Titan • A distributed compute service native to public clouds • Provides AutoScaling for clusters of containers • Supervises containers and provides failover mechanisms to applications running in containers • Provides logging, monitoring, and volume management capabilities
  21. Dynamic Reservations • Titan allows reserving resources for specific applications • Enforces the reservations under resource contention • Reservations are made at a priority level • P1 (Guaranteed Reservations) <-> P3 (Best Effort)
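
The talk does not show how the P1 <-> P3 ordering is enforced. As a rough, invented illustration, a scheduler under contention could free capacity by preempting best-effort tasks first and never touching guaranteed reservations:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Illustrative sketch only, not Titan's implementation.
    class PreemptionSketch {
        static class RunningTask {
            final String id;
            final int priority;   // 1 = guaranteed ... 3 = best effort
            final double cpus;
            RunningTask(String id, int priority, double cpus) {
                this.id = id; this.priority = priority; this.cpus = cpus;
            }
        }

        // Pick lowest-priority tasks to preempt until enough CPUs are freed.
        static List<RunningTask> pickVictims(List<RunningTask> running, double cpusNeeded) {
            List<RunningTask> candidates = new ArrayList<>(running);
            candidates.sort(Comparator.comparingInt((RunningTask t) -> t.priority).reversed());
            List<RunningTask> victims = new ArrayList<>();
            double freed = 0;
            for (RunningTask t : candidates) {
                if (freed >= cpusNeeded) break;
                if (t.priority == 1) continue;   // guaranteed (P1) reservations are never preempted
                victims.add(t);
                freed += t.cpus;
            }
            return victims;
        }
    }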
  22. AutoScaling • Two levels of AutoScaling • Scaling of underlying compute resources • Application scaling based on business and performance metrics
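
A sketch of the two scaling levels from slide 22, with made-up thresholds and metrics (the real policies are driven by business and performance metrics, as the slide states):

    // Illustrative only: the thresholds and the requests-per-second metric are invented.
    class AutoScalingSketch {
        // Level 1: grow or shrink the pool of underlying compute nodes based on cluster utilization.
        static int desiredAgentCount(int currentAgents, double clusterCpuUtilization) {
            if (clusterCpuUtilization > 0.80) return currentAgents + 1;   // add a node
            if (clusterCpuUtilization < 0.30 && currentAgents > 1) return currentAgents - 1;   // drain one
            return currentAgents;
        }

        // Level 2: scale an application's container count from a performance metric,
        // e.g. the requests per second each container can handle.
        static int desiredInstances(double requestsPerSecond, double rpsPerInstance) {
            return Math.max(1, (int) Math.ceil(requestsPerSecond / rpsPerInstance));
        }
    }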
  23. jobs.netflix.com • The Data Centre as a Computer: http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf • Resource Scheduling using Fenzo: http://www.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud • Apache Mesos: https://www.cs.berkeley.edu/~alig/papers/mesos.pdf • Large Scale Cluster Management at Google: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf