Mesos Marathon Overview

Presentation at the Aug '16 Mesos Syd meetup.

Andreas Nygard

August 17, 2016

Transcript

  1. Overview

    - What + Why?
    - Looking at the implementation:
      - How Marathon schedules tasks with Mesos
      - How Marathon monitors tasks
    - Questions
  2. Marathon

    [Diagram labels: Marathon Framework, Framework 2, Framework 3; Mesos Master; three Mesos Agents, each with an Executor and two Tasks. Resources shown: 36 CPU, 3TB RAM, 5000TB Storage, 120,000 Ports]
  3. Marathon

    - Highly available
    - Provides a REST API for launching / scaling / upgrading apps
    - Supports constraints on where your app should be deployed
      - e.g. UNIQUE, CLUSTER, GROUP_BY
    - Event subscription, Metrics
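As a rough illustration of the REST API and placement constraints mentioned on this slide, the sketch below builds (but does not send) a request that would create an app through Marathon's `/v2/apps` endpoint. The host name, app id, command and resource figures are placeholder assumptions, not values from the talk.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; replace with your own host and port.
MARATHON_URL = "http://marathon.example.com:8080"


def build_app(app_id, cmd, instances, constraints):
    """Return an app definition as a plain dict (illustrative values only)."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.5,
        "mem": 64,
        "instances": instances,
        # Each constraint is [field, operator] or [field, operator, value].
        "constraints": constraints,
    }


def build_create_request(app):
    """Prepare a POST to /v2/apps; the request is built here, not sent."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        MARATHON_URL + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# UNIQUE on hostname asks for at most one instance per host.
app = build_app("/my-app", "sleep 3600", 3, [["hostname", "UNIQUE"]])
request = build_create_request(app)
print(request.get_method(), request.full_url)
print(json.dumps(app, indent=2))
```

Passing `request` to `urllib.request.urlopen` would submit it to a real cluster; scaling the app would typically be a `PUT` to the same app's resource with a new `instances` value.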
  4. How we’re using it

    - Initial use case: Deliver a service which tracks and coordinates thousands of map reduce jobs.
    - Providing a PaaS for other teams to host their apps.
    - Requires a whole ecosystem of tooling, such as:
      - Continuous delivery (Ansible, Artifactory, CI tools)
      - Infrastructure testing (Ansible, Serverspec, Infrataster)
      - Monitoring (Splunk)
      - Alerting (Splunk)
  5. How Tasks are Scheduled

    Mesos -> Marathon: resourceOffers(offers: Offer[])
    Marathon -> Mesos: acceptOffers(offers: OfferID[], ops: Operation[]) or, declineOffer(offer: offerID)

    Example Offer:

        {
          id = "812d-das.."
          framework_id = "marathon-e81.."
          slave_id = "node-1-34ca0.."
          hostname = "node-1"
          resources = [
            { name = "cpus" type = scalar value = 8 },
            ..
          ]
        }

    Example Operation:

        {
          type = Launch
          launch = {
            task_info = {
              task_id = "my-app-32e20.."
              slave_id = "node-1"
              resources = [{ name = "cpus" type = scalar value = 0.5 }]
              container = ..
            }
          }
        }

    Note: protobuf protocol, not JSON!
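The offer/accept exchange on this slide can be sketched in a few lines of Python. This is a simplified model using plain dataclasses and dicts, not the Mesos protobuf messages or any real client library; the `Offer`, `TaskRequest`, `fits` and `respond` names are hypothetical, and only scalar resources are considered.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Simplified stand-in for a Mesos resource offer (scalar resources only)."""
    id: str
    slave_id: str
    hostname: str
    resources: dict


@dataclass
class TaskRequest:
    """What a framework wants to launch (hypothetical helper type)."""
    task_id: str
    resources: dict


def fits(offer, task):
    """True if the offer holds at least the requested amount of every resource."""
    return all(
        offer.resources.get(name, 0.0) >= amount
        for name, amount in task.resources.items()
    )


def respond(offers, task):
    """Accept the first offer that fits and decline all the others.

    Returns (accepted_offer_id or None, declined_offer_ids, launch_operation or None).
    """
    accepted, declined, operation = None, [], None
    for offer in offers:
        if accepted is None and fits(offer, task):
            accepted = offer.id
            # Mirrors the shape of the example operation on the slide.
            operation = {
                "type": "Launch",
                "launch": {
                    "task_info": {
                        "task_id": task.task_id,
                        "slave_id": offer.slave_id,
                        "resources": [
                            {"name": n, "type": "scalar", "value": v}
                            for n, v in task.resources.items()
                        ],
                    }
                },
            }
        else:
            declined.append(offer.id)
    return accepted, declined, operation


offers = [
    Offer("offer-1", "node-1", "node-1", {"cpus": 0.25}),
    Offer("offer-2", "node-2", "node-2", {"cpus": 8.0}),
]
task = TaskRequest("my-app-1", {"cpus": 0.5})
accepted, declined, operation = respond(offers, task)
print(accepted, declined)
```

A real scheduler would also account for ranges (ports), roles and constraints, and could launch several tasks against a single offer.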
  6. How Tasks are Scheduled

    [Diagram components: Mesos, MarathonScheduler, OfferMatcher, TaskLauncherActor, TaskLauncher]

    Calls shown:
    - Mesos / MarathonScheduler: resourceOffers(offers); acceptOffers(offers, ops) or, declineOffer(offer)
    - MarathonScheduler / OfferMatcher: foreach offer: match(offer)
    - OfferMatcher: foreach matcher: match(offer)
    - TaskLauncherActor: new(app, count); subscribe(self)
    - TaskLauncher: add(app, count) *

    * Triggered when:
    - app added
    - app scaled
    - task failed

    Note: Slightly simplified – there are some more intermediate types
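One way to read the flow on this slide: per-app launchers subscribe to a central matcher, and each incoming offer is passed to every subscribed matcher in turn. The sketch below models that reading in Python. The class names are hypothetical stand-ins and do not mirror Marathon's actual Scala types, offers are reduced to a single cpu figure, and each launcher takes at most one task per offer.

```python
class AppLauncher:
    """Stand-in for a per-app launcher that still needs `count` tasks started."""

    def __init__(self, app_id, cpus, count):
        self.app_id = app_id
        self.cpus = cpus
        self.remaining = count

    def match(self, offer_cpus):
        """Return the cpus consumed from the offer, or 0.0 if nothing is launched."""
        if self.remaining > 0 and offer_cpus >= self.cpus:
            self.remaining -= 1
            return self.cpus
        return 0.0


class OfferMatcher:
    """Fans each offer out to every subscribed launcher."""

    def __init__(self):
        self.matchers = []

    def subscribe(self, matcher):
        self.matchers.append(matcher)

    def match(self, offer_cpus):
        """Offer what is left to each matcher; return the app ids that launched."""
        launched = []
        for matcher in self.matchers:
            used = matcher.match(offer_cpus)
            if used > 0:
                offer_cpus -= used
                launched.append(matcher.app_id)
        return launched


offer_matcher = OfferMatcher()
app_a = AppLauncher("app-a", cpus=0.5, count=1)
app_b = AppLauncher("app-b", cpus=1.0, count=2)
offer_matcher.subscribe(app_a)
offer_matcher.subscribe(app_b)

first = offer_matcher.match(1.0)   # only app-a fits what is left
second = offer_matcher.match(2.0)  # app-a is done, app-b takes one task
print(first, second, app_b.remaining)
```

An empty result from `match` corresponds to declining the offer; a non-empty one corresponds to accepting it with the matching launch operations.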
  7. How Tasks are Monitored

    - Health Checks (a task counts as healthy):
      - Default: Where Mesos indicates TASK_RUNNING
      - HTTP / HTTPS: Where the response status code is between 200 and 399
      - TCP: Where connection is established
      - Command: Where exit code is zero
    - Once maxConsecutiveFailures is exceeded, task is killed
    - Marathon will restart a task with status:
      - FINISHED | ERROR | FAILED | KILLED | GONE (agent terminated)
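The health check rules above can be sketched as a small Python model. The function and class names are hypothetical, and the default limit of 3 is only an example value. One assumption to flag: the tracker signals a kill when the run of consecutive failures reaches the configured limit; adjust the comparison if your Marathon version treats the boundary differently.

```python
# Statuses after which Marathon restarts a task, as listed on the slide.
RESTART_STATUSES = {"FINISHED", "ERROR", "FAILED", "KILLED", "GONE"}


def http_healthy(status_code):
    """HTTP / HTTPS check: healthy for status codes from 200 to 399."""
    return 200 <= status_code <= 399


def command_healthy(exit_code):
    """Command check: healthy when the command exits with zero."""
    return exit_code == 0


def should_restart(status):
    """True if a task that ended with this status would be restarted."""
    return status in RESTART_STATUSES


class HealthTracker:
    """Counts consecutive failed checks for a single task."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.failures = 0

    def record(self, healthy):
        """Record one check result; return True if the task should be killed."""
        if healthy:
            self.failures = 0  # any success resets the run of failures
            return False
        self.failures += 1
        # Assumption: kill once the run of failures reaches the limit.
        return self.failures >= self.max_consecutive_failures


tracker = HealthTracker(max_consecutive_failures=3)
statuses = [200, 500, 503, 302, 500, 500, 500]
kill_signals = [tracker.record(http_healthy(code)) for code in statuses]
print(kill_signals)
```

Note how the 302 response in the middle resets the counter, so only the final run of three failures triggers a kill.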