• namespaces
  • Isolate PIDs between processes (PID)
  • Isolate a process's view of network resources (network)
  • Isolate the hostname, so a process can be given its own (UTS)
  • Isolate the filesystem mount points, similar to chroot (mount)
  • Isolate inter-process communication (IPC)
  • Isolate users, mapping specific users to specific processes (user)
• cgroups: https://sysadmincasts.com/episodes/14-introduction-to-linux-control-groups-cgroups
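To make the namespace list concrete, here is a minimal sketch that prints the namespaces the current process belongs to. It assumes a Linux host exposing `/proc/<pid>/ns`; the helper name `list_namespaces` is made up for illustration, and on other systems it simply returns an empty result.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable.

    Assumption: Linux, where /proc/<pid>/ns holds one symlink per namespace.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, no such process, or no permission to inspect it.
        return result
    for name in names:
        try:
            # The link target identifies the namespace instance.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20s} {ident}")
```

Two processes that report the same identifier for an entry share that namespace; a containerized process will typically show different identifiers from the host's init process.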
• One cluster for
  • stateless services such as Web servers & app servers (via Marathon)
  • stateful services like PostgreSQL, MemSQL, Kafka, Cassandra, etc.
  • elastic data processing via Spark, Akka, etc.
  • CI/CD, for example Jenkins+Marathon
• Dynamic partitioning of your cluster, depending on your needs
• Increased utilization (10% → 80%+)
• A top-level ASF project
• A cluster resource negotiator
• Scalable to 10,000s of nodes but also useful for a handful of nodes
• Fault-tolerant, battle-tested
• An SDK for distributed apps
• Native Docker support
mesos.apache.org
resource: anything a task consumes to do its work
• standard resources: cpu, mem, disk, ports
• Dominant Resource Fairness (DRF) algorithm guarantees fair allocation across resource types
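The DRF idea can be sketched in a few lines: repeatedly give the next task to the user whose dominant share (their largest fractional use of any single resource) is currently lowest. This is a simplified illustration, not Mesos's allocator; the function name `drf_allocate` and the input shapes are assumptions, and a user whose next task no longer fits is simply dropped from consideration.

```python
def drf_allocate(capacity, demands):
    """Simplified Dominant Resource Fairness.

    capacity: {resource: total amount in the cluster}
    demands:  {user: {resource: amount needed per task}}
    Returns   {user: number of tasks launched}.
    """
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(user):
        # Largest fraction of any one resource currently held by this user.
        return max(
            tasks[user] * demands[user].get(r, 0) / capacity[r] for r in capacity
        )

    active = set(demands)
    while active:
        # Sorting first makes tie-breaking deterministic.
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need.get(r, 0) <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need.get(r, 0)
            tasks[user] += 1
        else:
            # This user's next task does not fit; stop offering to them.
            active.discard(user)
    return tasks


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem": 18}
    per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
    print(drf_allocate(cluster, per_task))  # {'A': 3, 'B': 2}
```

In the example, user A is memory-bound and user B is CPU-bound; both end up with a dominant share of 2/3, which is the sense in which the allocation is fair across resource types.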
DC/OS 'init system'
• starts instances of long-running services
• restarts the instances if they crash
• provides composition primitives
• supports health checks
• supports rolling upgrades
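As an illustration of the points above, and assuming the 'init system' in question is Marathon (as the rest of the deck suggests), a long-running service with a health check might be described roughly as follows. The app id, command, and numeric values are hypothetical; the resulting JSON is the kind of document one would submit to Marathon's `/v2/apps` endpoint.

```python
import json


def build_app_definition(app_id, cmd, instances):
    """Build a Marathon-style app definition as a plain dict.

    All concrete values here (resources, timings) are illustrative only.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.25,
        "mem": 64,
        "instances": instances,  # how many copies to keep running
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/",
                "portIndex": 0,
                "gracePeriodSeconds": 30,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


if __name__ == "__main__":
    app = build_app_definition("/webapp", "python3 -m http.server $PORT0", 2)
    print(json.dumps(app, indent=2))
```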
• Groups can contain one or more apps/groups
• Dependency management
• Scaling
https://mesosphere.com/blog/2015/06/21/web-application-analytics-using-docker-and-marathon/
• DNS-based
  • pros: easy to integrate; SRV records
  • cons: no health checks; TTL
  • examples: Mesos-DNS, Consul
• Proxy-based
  • pros: no port conflicts; fast failover
  • cons: no UDP; management of VIPs (Minuteman) or service ports (Marathon-lb)
  • examples: Minuteman, Marathon-lb
• Application-aware
  • pros: developer fully in control; full-featured
  • cons: implementation effort; requires distributed state management (ZK, etcd or Consul)
  • examples: roll-your-own, Finagle
• rolling deployment
  • bring up N instances of the new app & terminate N instances of the old app until all old instances are gone
  • goal: minimize capacity requirements
• blue-green deployment
  • launch a new stack and switch traffic from old to new when the new instances are healthy
  • goal: minimize impact of regressions, friction, delays, and allow easy rollbacks
• canary deployment
  • bring up a new stack, start by routing a small portion of traffic to the new app, and slowly increase
  • goal: test production traffic slowly & safely
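The canary strategy boils down to a weighted routing decision plus a schedule for raising the weight. The sketch below is one possible shape for that logic, not how any particular load balancer implements it; `canary_schedule`, `route`, and the starting share and growth factor are all assumptions chosen for illustration.

```python
import random


def canary_schedule(start=0.05, factor=2.0):
    """Yield the fraction of traffic sent to the new version, step by step."""
    if not 0 < start <= 1:
        raise ValueError("start must be in (0, 1]")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    share = start
    while share < 1.0:
        yield share
        share = min(1.0, share * factor)
    yield 1.0  # final step: all traffic goes to the new version


def route(share_new, rng=random):
    """Pick a backend for one request given the current canary share."""
    return "new" if rng.random() < share_new else "old"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for share in canary_schedule():
        hits = sum(route(share, rng) == "new" for _ in range(1000))
        print(f"target share {share:.2f}: {hits}/1000 requests hit the new app")
```

In practice each step would be gated on the new instances staying healthy, and a failed check would set the share back to zero, which is what makes rollback cheap.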
• Cluster-internal: Minuteman, an L4 distributed LB, usage via VIP in Marathon
• Internal or edge: Marathon-lb, dynamically updates HAProxy, usage via package+service ports in Marathon
• External, for example Azure's offerings
• Based on health checks
• Policy via
  • minimumHealthCapacity: float value between 0 and 1, specifies the fraction of app instances that must stay healthy while a deployment is in progress
  • maximumOverCapacity: float value between 0 and 1, specifies the maximum fraction of additional instances, beyond the target count, that may run during a deployment
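To see what the two policy values mean in numbers, the following sketch turns them into instance counts. The function name `deployment_bounds` is hypothetical, and the rounding (up for the healthy floor, down for the total ceiling) is an assumption about how such fractions are typically applied, not a statement of Marathon's exact behavior.

```python
import math


def deployment_bounds(instances, minimum_health_capacity=1.0, maximum_over_capacity=1.0):
    """Return (min_healthy, max_total) instance counts during a deployment.

    min_healthy: fewest healthy instances that must remain at any time.
    max_total:   most instances (old + new) allowed to run at once.
    Rounding direction is an assumption made for this sketch.
    """
    for name, value in (
        ("minimum_health_capacity", minimum_health_capacity),
        ("maximum_over_capacity", maximum_over_capacity),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # round() guards against floating-point noise such as 11.999999999999998.
    min_healthy = math.ceil(round(instances * minimum_health_capacity, 9))
    max_total = math.floor(round(instances * (1 + maximum_over_capacity), 9))
    return min_healthy, max_total


if __name__ == "__main__":
    print(deployment_bounds(10, 0.5, 0.2))  # (5, 12)
    print(deployment_bounds(10, 1.0, 0.0))  # (10, 10)
```

The second call shows a combination that leaves no room to move: every old instance must stay healthy and no extra instance may start, so a rolling deployment under these bounds cannot make progress without extra headroom in one of the two values.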