Mesos Marathon Overview

Presentation at the Aug '16 Mesos Syd meetup.

Andreas Nygard

August 17, 2016

Transcript

  1. Overview
    - What + Why?
    - Looking at the implementation:
      - How Marathon schedules tasks with Mesos
      - How Marathon monitors tasks
    - Questions
  2. Marathon
    [Diagram: three frameworks (Marathon Framework, Framework 2, Framework 3) sit above the Mesos Master, which sits above three Mesos Agents; each agent runs an Executor with two Tasks. Caption: 36 CPU - 3TB RAM - 5000TB Storage - 120,000 Ports]
  3. Marathon
    - Highly available
    - Provides a REST API for launching / scaling / upgrading apps
    - Supports constraints on where your app should be deployed
      - e.g. UNIQUE, CLUSTER, GROUP_BY
    - Event subscription, Metrics
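The three constraint operators named on this slide can be illustrated with a small sketch. This is not Marathon's implementation: the function name `satisfies`, the plain-dict agent attributes, and the simplified GROUP_BY rule (only allow a value that is currently tied for least-used) are all assumptions made for illustration.

```python
from collections import Counter


def satisfies(constraint, candidate, placed):
    """Return True if launching on `candidate` keeps the constraint satisfied.

    constraint: [field, operator] or [field, operator, value]
    candidate:  attribute dict of the agent making the offer (hypothetical shape)
    placed:     attribute dicts of agents already running a task of this app
    """
    field, operator = constraint[0], constraint[1]
    value = constraint[2] if len(constraint) > 2 else None

    candidate_value = candidate.get(field)
    if candidate_value is None:
        return False  # the agent does not expose the field at all

    # How many existing tasks already sit on each value of the field.
    used = Counter(agent[field] for agent in placed if field in agent)

    if operator == "UNIQUE":
        # No two tasks may share the same value, e.g. one task per hostname.
        return used[candidate_value] == 0
    if operator == "CLUSTER":
        # Every task must run where the field equals the given value.
        return candidate_value == value
    if operator == "GROUP_BY":
        # Simplified assumption: spread evenly by only allowing a value
        # that is tied for the lowest task count seen so far.
        counts = dict(used)
        counts.setdefault(candidate_value, 0)
        return counts[candidate_value] == min(counts.values())
    raise ValueError(f"unsupported operator: {operator}")


running = [{"hostname": "node-1", "rack_id": "rack-1"}]
print(satisfies(["hostname", "UNIQUE"], {"hostname": "node-1"}, running))
print(satisfies(["rack_id", "CLUSTER", "rack-1"], {"rack_id": "rack-2"}, running))
```

In an app definition these constraints appear as small arrays such as `["hostname", "UNIQUE"]`, which is why the sketch takes the constraint as a list.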
  4. How we’re using it
    - Initial use case: Deliver a service which tracks and coordinates thousands of map reduce jobs.
    - Providing a PaaS for other teams to host their apps.
    - Requires a whole ecosystem of tooling, such as:
      - Continuous delivery (Ansible, Artifactory, CI tools)
      - Infrastructure testing (Ansible, Serverspec, Infrataster)
      - Monitoring (Splunk)
      - Alerting (Splunk)
  5. How Tasks are Scheduled
    [Diagram: message exchange between Mesos and Marathon]
    - Mesos to Marathon: resourceOffers(offers: Offer[])
    - Marathon to Mesos: acceptOffers(offers: OfferID[], ops: Operation[]) or declineOffer(offer: OfferID)

    Example Operation:
        {
          type = Launch
          launch = {
            task_info = {
              task_id = "my-app-32e20.."
              slave_id = "node-1"
              resources = [{
                name = "cpus"
                type = scalar
                value = 0.5
              }]
              container = ..
            }
          }
        }

    Example Offer:
        {
          id = "812d-das.."
          framework_id = "marathon-e81.."
          slave_id = "node-1-34ca0.."
          hostname = "node-1"
          resources = [
            { name = "cpus" type = scalar value = 8 },
            ..
          ]
        }

    Note: protobuf protocol, not JSON!
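The accept-or-decline decision on this slide can be sketched as a resource check against a single offer. This is a minimal illustration only: it uses plain Python dicts shaped like the slide's example rather than the real protobuf messages, and the names `scalar_resources` and `match_offer`, plus the IDs in the usage lines, are hypothetical.

```python
def scalar_resources(offer):
    """Sum the scalar resources of an offer into a name -> amount dict."""
    totals = {}
    for res in offer["resources"]:
        if res["type"] == "scalar":
            totals[res["name"]] = totals.get(res["name"], 0.0) + res["value"]
    return totals


def match_offer(offer, task_id, needs):
    """Return a Launch operation if the offer covers `needs`, else None.

    A None result stands for the declineOffer branch; a dict stands for the
    operation that would be passed to acceptOffers.
    """
    available = scalar_resources(offer)
    for name, amount in needs.items():
        if available.get(name, 0.0) < amount:
            return None  # not enough of this resource: decline the offer

    return {
        "type": "Launch",
        "launch": {
            "task_info": {
                "task_id": task_id,
                "slave_id": offer["slave_id"],
                "resources": [
                    {"name": name, "type": "scalar", "value": amount}
                    for name, amount in needs.items()
                ],
            }
        },
    }


# Hypothetical offer, shaped like the slide's example.
offer = {
    "id": "offer-1",
    "slave_id": "node-1",
    "hostname": "node-1",
    "resources": [
        {"name": "cpus", "type": "scalar", "value": 8},
        {"name": "mem", "type": "scalar", "value": 1024},
    ],
}

operation = match_offer(offer, "my-app-1", {"cpus": 0.5})
print("accept" if operation else "decline")
```

The sketch only claims the resources the task needs, so the remainder of an accepted offer stays available to other tasks or frameworks.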
  6. How Tasks are Scheduled
    [Diagram: call flow between Mesos, MarathonScheduler, OfferMatcher, TaskLauncherActor and TaskLauncher]
    - Mesos: resourceOffers(offers); acceptOffers(offers, ops) or declineOffer(offer)
    - Calls shown between the components:
      - add(app, count) *
      - new(app, count)
      - subscribe(self)
      - foreach offer: match(offer)
      - foreach matcher: match(offer)
    - * add(app, count) is marked with:
      • app added
      • app scaled
      • task failed
    - Note: Slightly simplified; there are some more intermediate types
  7. How Tasks are Monitored
    - Health Checks:
      - Default: Where Mesos indicates TASK_RUNNING
      - HTTP / HTTPS: Where response is between 200 and 399
      - TCP: Where connection is established
      - Command: Where exit code is zero
    - Once maxConsecutiveFailures is exceeded, task is killed
    - Marathon will restart a task with status:
      - FINISHED | ERROR | FAILED | KILLED | GONE (agent terminated)
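The health check rules above can be sketched as a small failure counter. This is an illustration, not Marathon's code: the class `HealthTracker` and the helper functions are hypothetical names, and the comparison uses "greater than" only because the slide says the limit must be "exceeded"; the exact comparison in Marathon is not stated here.

```python
# Terminal statuses after which the slide says Marathon restarts a task.
RESTARTABLE = {"FINISHED", "ERROR", "FAILED", "KILLED", "GONE"}


def http_healthy(status_code):
    """HTTP / HTTPS check: healthy when the response code is 200..399."""
    return 200 <= status_code <= 399


def command_healthy(exit_code):
    """Command check: healthy when the command exits with zero."""
    return exit_code == 0


class HealthTracker:
    """Counts consecutive failed checks for one task."""

    def __init__(self, max_consecutive_failures):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record(self, healthy):
        """Record one check result; return True if the task should be killed."""
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        # Assumption: "exceeded" on the slide is read as strictly greater than.
        return self.consecutive_failures > self.max_consecutive_failures


def should_restart(status):
    """True if a task in this status would be restarted, per the slide."""
    return status in RESTARTABLE


tracker = HealthTracker(max_consecutive_failures=2)
for code in (500, 503, 500):
    if tracker.record(http_healthy(code)):
        print("kill task, then restart:", should_restart("KILLED"))
```

Resetting the counter on every success is what makes the limit apply to consecutive failures rather than to the total number of failures.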