… • Clusters: go to http://301.sh/cct2016 • We have clusters prepared → team up in groups of 3-4 people • One person drives; the others help, comment, and plan • Join the DC/OS Slack community at http://chat.dcos.io/ • We will invite you to the private channel #cc2016 • Proctors are around to help: Bernd, Jens, Tobi
• One cluster for: • stateless services such as web servers and app servers (via Marathon) • stateful services such as PostgreSQL, MemSQL, Kafka, Cassandra, etc. • elastic data processing via Spark, Akka, etc. • CI/CD, for example Jenkins + Marathon • Dynamic partitioning of your cluster, depending on your needs • Increased utilization (10% → 80%+)
• A top-level ASF project • A cluster resource negotiator • Scalable to 10,000s of nodes but also useful for a handful of nodes • Fault-tolerant, battle-tested • An SDK for distributed apps • Native Docker support mesos.apache.org
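• Resource: anything a task consumes to carry out its work • Standard resources: cpus, mem, disk, ports • To guarantee fair allocation across resource types, Mesos uses the Dominant Resource Fairness (DRF) algorithm: each framework's dominant share is its largest share of any single resource, and resources go first to the framework with the lowest dominant share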
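To make the DRF rule concrete, here is a minimal sketch of progressive filling: launch one task at a time, always for the framework with the lowest dominant share, until nothing else fits. This is an illustration under simplifying assumptions, not Mesos's allocator (which works through resource offers); `drf_allocate` and the demand vectors are hypothetical, and unlike the textbook loop it keeps serving other frameworks when the lowest-share one no longer fits.

```python
from fractions import Fraction


def drf_allocate(total, demands):
    """Launch tasks one at a time using Dominant Resource Fairness (DRF).

    total:   resource -> capacity, e.g. {"cpus": 9, "mem": 18}
    demands: framework -> per-task demand, same resource keys as total
    Returns framework -> number of tasks launched.
    """
    for name, demand in demands.items():
        if not any(demand[r] > 0 for r in total):
            raise ValueError(f"{name} must demand some resource")
    used = {r: 0 for r in total}
    alloc = {f: {r: 0 for r in total} for f in demands}
    tasks = {f: 0 for f in demands}

    def dominant_share(f):
        # The largest fraction of any single resource that f currently holds.
        return max(Fraction(alloc[f][r]) / Fraction(total[r]) for r in total)

    while True:
        # Frameworks whose next task still fits into what is left.
        fits = [f for f, d in demands.items()
                if all(used[r] + d[r] <= total[r] for r in total)]
        if not fits:
            break
        # Core DRF rule: serve the lowest dominant share first
        # (ties broken by name so the result is deterministic).
        f = min(fits, key=lambda name: (dominant_share(name), name))
        for r in total:
            used[r] += demands[f][r]
            alloc[f][r] += demands[f][r]
        tasks[f] += 1
    return tasks


if __name__ == "__main__":
    cluster = {"cpus": 9, "mem": 18}
    per_task = {"A": {"cpus": 1, "mem": 4}, "B": {"cpus": 3, "mem": 1}}
    print(drf_allocate(cluster, per_task))  # {'A': 3, 'B': 2}
```

With 9 CPUs and 18 GB, A (memory-heavy tasks) ends with 3 tasks and B (CPU-heavy tasks) with 2; both reach a dominant share of 2/3 (A: 12/18 of memory, B: 6/9 of CPU).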
Marathon acts as the distributed init system for DC/OS • starts instances of long-running services • restarts the instances if they crash • supports health checks • supports a range of upgrade (deployment) strategies • high availability (HA) built in
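A sketch of what this looks like from the user's side: you declare an app with an instance count and a health check, and Marathon keeps that many healthy instances running. The field names follow Marathon's `/v2/apps` JSON format; the resource sizes, app id, and the in-cluster URL `http://marathon.mesos:8080` are assumptions for illustration.

```python
import json
import urllib.request


def marathon_app(app_id, cmd, instances=3):
    """Build a Marathon app definition with an HTTP health check."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.1,
        "mem": 64,
        # Marathon keeps this many running and relaunches crashed tasks.
        "instances": instances,
        "portDefinitions": [{"port": 0}],  # 0: let Marathon assign the port
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/",
            "portIndex": 0,
            "gracePeriodSeconds": 30,  # startup time before failures count
            "intervalSeconds": 10,
            "timeoutSeconds": 5,
            "maxConsecutiveFailures": 3,  # then the task is killed and replaced
        }],
    }


def submit(app, marathon_url="http://marathon.mesos:8080"):
    """POST the app to Marathon's /v2/apps endpoint (URL is an assumption)."""
    req = urllib.request.Request(
        marathon_url + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    app = marathon_app("/hello", "python3 -m http.server $PORT0")
    # Print only; call submit(app) from a machine inside the cluster.
    print(json.dumps(app, indent=2))
```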
• Groups can contain one or more apps and/or other groups • Useful for dependency management and for scaling related apps together • Labels (key/value metadata) → useful for non-hierarchical organization https://mesosphere.com/blog/2015/06/21/web-application-analytics-using-docker-and-marathon/
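The sketch below shows why groups help with dependency management: nested groups are walked recursively, each app's `dependencies` define a start order, and labels select apps across the hierarchy. The group, app ids, and `tier` label are hypothetical, and only app-to-app dependencies are modelled (Marathon also allows depending on a whole group).

```python
from graphlib import TopologicalSorter

# Hypothetical group: ids and labels are illustrative only.
shop = {
    "id": "/shop",
    "groups": [
        {"id": "/shop/data",
         "apps": [{"id": "/shop/data/db", "labels": {"tier": "data"}}]},
    ],
    "apps": [
        {"id": "/shop/backend", "dependencies": ["/shop/data/db"],
         "labels": {"tier": "app"}},
        {"id": "/shop/frontend", "dependencies": ["/shop/backend"],
         "labels": {"tier": "web"}},
    ],
}


def iter_apps(group):
    """Yield every app in a group, descending into nested groups."""
    yield from group.get("apps", [])
    for sub in group.get("groups", []):
        yield from iter_apps(sub)


def start_order(group):
    """Order apps so each one starts after the apps it depends on."""
    graph = {a["id"]: set(a.get("dependencies", [])) for a in iter_apps(group)}
    return list(TopologicalSorter(graph).static_order())


def select_by_label(group, key, value):
    """Labels cut across the hierarchy: filter apps by a key/value pair."""
    return [a["id"] for a in iter_apps(group)
            if a.get("labels", {}).get(key) == value]


if __name__ == "__main__":
    print(start_order(shop))
    print(select_by_label(shop, "tier", "web"))
```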
The why and the what: • Containers vs VMs • app-level dependency management • lightweight (startup time, footprint, average runtime) • isolation & security
• Namespaces (isolation): • Isolate process IDs between processes (pid) • Isolate a process's network resources (net) • Isolate the hostname so each container can have its own (UTS) • Isolate filesystem mount points (mnt; comparable to chroot) • Isolate inter-process communication (IPC) • Isolate user and group IDs, e.g. root inside a container can map to an unprivileged user outside (user) • cgroups (limiting and accounting of resources such as CPU and memory) https://sysadmincasts.com/episodes/14-introduction-to-linux-control-groups-cgroups
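On Linux you can see these namespaces directly: each process has a symlink per namespace type under `/proc/<pid>/ns`, and two processes share a namespace exactly when the link targets match. The helpers below are a hypothetical sketch that just read `/proc`; they return empty results on systems without it.

```python
import os

# Namespace types from the bullets above, as named in /proc/<pid>/ns.
NAMESPACES = {
    "pid": "process IDs",
    "net": "network interfaces, routes, ports",
    "uts": "hostname and domain name",
    "mnt": "filesystem mount points",
    "ipc": "System V IPC, POSIX message queues",
    "user": "user and group IDs",
}


def current_namespaces(pid="self"):
    """Return {namespace type: identifier} for a process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}  # not Linux, or /proc not mounted
    result = {}
    for name in NAMESPACES:
        try:
            # e.g. 'uts:[4026531838]'
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # namespace type not exposed by this kernel
    return result


def current_cgroups(pid="self"):
    """Return the lines of /proc/<pid>/cgroup (cgroup membership)."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return [line.strip() for line in f if line.strip()]
    except OSError:
        return []


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name:5} {ident:25} {NAMESPACES[name]}")
    print(current_cgroups())
```

Running it on the host and inside a Docker container typically shows different identifiers for pid, net, uts, mnt, and ipc, since the runtime creates new namespaces for each container.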
• Docker Hub https://hub.docker.com/ • Google Cloud https://cloud.google.com/tools/container-registry/ • AWS https://aws.amazon.com/ecr/ • Run your own https://docs.docker.com/registry/deploying/
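Which of these registries an image comes from is encoded in the image name itself: a first path component containing `.` or `:` (or `localhost`) is a registry host, otherwise Docker Hub is assumed. The parser below is a simplified sketch of that convention (no digest support, no validation); the image names in the test are hypothetical.

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, repository, tag).

    Simplified: digests (@sha256:...) and validation are not handled.
    """
    first, sep, rest = ref.partition("/")
    # A first component with '.' or ':' (or 'localhost') names a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", ref  # Docker Hub is the default
    # A ':' after the last '/' separates the tag.
    colon = path.rfind(":")
    if colon > path.rfind("/"):
        path, tag = path[:colon], path[colon + 1:]
    else:
        tag = "latest"
    # Official Docker Hub images live under the 'library/' namespace.
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path
    return registry, path, tag


if __name__ == "__main__":
    for ref in ["nginx", "mesosphere/marathon:v1.3.0",
                "gcr.io/my-project/app:1.0", "localhost:5000/app"]:
        print(ref, "->", parse_image_ref(ref))
```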
Service discovery approaches:
• DNS-based. Pros: easy to integrate; SRV records. Cons: no health checks; TTL (clients may cache stale records). Examples: Mesos-DNS, Consul
• Proxy-based. Pros: no port conflicts; fast failover. Cons: no UDP; management of VIPs (Minuteman) or service ports (Marathon-lb). Examples: Minuteman, Marathon-lb
• Application-aware. Pros: developer fully in control; full-featured. Cons: implementation effort; requires distributed state management (ZK, etcd or Consul). Examples: roll-your-own, Finagle
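To illustrate the application-aware approach, here is a minimal client-side load balancer: the client holds the endpoint list itself, rotates through it, and skips endpoints it has marked down, which is the kind of health awareness plain DNS lacks. The endpoints are hypothetical; in practice they would come from a discovery source such as a ZooKeeper, etcd, or Consul watch, which is omitted here.

```python
import itertools


class RoundRobinBalancer:
    """Client-side round-robin over endpoints, skipping unhealthy ones."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._healthy = set(self._endpoints)
        self._cycle = itertools.cycle(self._endpoints)

    def mark_down(self, endpoint):
        # e.g. after a failed request or health probe
        self._healthy.discard(endpoint)

    def mark_up(self, endpoint):
        if endpoint in self._endpoints:
            self._healthy.add(endpoint)

    def pick(self):
        # Try each endpoint at most once per call.
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            if endpoint in self._healthy:
                return endpoint
        raise RuntimeError("no healthy endpoints")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
    print([lb.pick() for _ in range(3)])
    lb.mark_down("10.0.0.2:8080")
    print([lb.pick() for _ in range(3)])
```

The trade-off from the table shows up directly: the client gets full control over routing and failover, but every service client has to carry this logic and keep its endpoint list in sync.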
• Rolling deployment: bring up N instances of the new app and terminate N instances of the old app, repeating until all old instances are gone. Goal: minimize capacity requirements.
• Blue-green deployment: launch a new stack and switch traffic from old to new once the new instances are healthy. Goal: minimize the impact of regressions, friction, and delays, and allow easy rollbacks.
• Canary deployment: bring up a new stack, start by routing a small portion of traffic to the new app, and increase it slowly. Goal: test with production traffic slowly and safely.
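A small simulation makes the capacity argument for rolling deployments visible: with batch size N, the cluster never runs more than `instances + N` copies at once. This is a sketch that assumes new instances become healthy immediately; `rolling_deployment` is a hypothetical helper, not a Marathon API.

```python
def rolling_deployment(instances, batch):
    """Simulate a rolling deployment as (old, new) instance counts per step.

    Assumes each new instance becomes healthy right after it starts.
    """
    if instances < 0 or batch < 1:
        raise ValueError("need instances >= 0 and batch >= 1")
    old, new = instances, 0
    steps = [(old, new)]
    while old > 0:
        # Start up to N new instances (never more than the target count).
        new += min(batch, instances - new)
        steps.append((old, new))
        # Then terminate the same number of old instances.
        old -= min(batch, old)
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    steps = rolling_deployment(instances=5, batch=2)
    print(steps)
    print("peak instances:", max(o + n for o, n in steps))  # 7 = 5 + 2
```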
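• Cluster-internal: Minuteman, an L4 (transport-layer) distributed load balancer; used by assigning a VIP in the Marathon app definition • Internal or edge: Marathon-lb, which dynamically updates its HAProxy configuration; installed as a package and used via service ports in Marathon • External: for example, Azure's load-balancing offerings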
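Both cluster load balancers are driven by the Marathon app definition. The sketch below builds one app that asks for a Minuteman VIP via a `VIP_0` port label and for Marathon-lb exposure via the `HAPROXY_GROUP` app label plus a service port. These label conventions are the ones we believe the DC/OS 1.7/1.8 era and the Marathon-lb README used; check them against your versions. The VIP address, service port, and app id are hypothetical.

```python
import json


def load_balanced_app(app_id, service_port, vip, haproxy_group="external"):
    """Marathon app exposed via Marathon-lb (service port) and Minuteman (VIP)."""
    return {
        "id": app_id,
        "cmd": "python3 -m http.server $PORT0",
        "cpus": 0.1,
        "mem": 64,
        "instances": 2,
        # Marathon-lb instances in this group pick the app up (assumed label).
        "labels": {"HAPROXY_GROUP": haproxy_group},
        "portDefinitions": [{
            "port": service_port,  # service port that HAProxy listens on
            "protocol": "tcp",
            "labels": {"VIP_0": vip},  # Minuteman VIP, "ip:port" form
        }],
    }


if __name__ == "__main__":
    print(json.dumps(load_balanced_app("/web", 10000, "10.0.0.10:80"), indent=2))
```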
• Deployments are driven by health checks • The policy is set in the app's upgradeStrategy: • minimumHealthCapacity: float between 0 and 1; the fraction of the app's instances that must stay healthy during a deployment • maximumOverCapacity: float between 0 and 1; the maximum fraction of extra instances that may run on top of the target instance count during a deployment
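Because both settings are fractions of the instance count, it helps to turn them into absolute numbers. The sketch below does that, assuming the healthy floor rounds up and the extra-instance allowance rounds down; Marathon's exact rounding is not stated in the slides, and the defaults of 1.0 mirror what we believe Marathon uses. `deployment_bounds` is a hypothetical helper.

```python
import math
from fractions import Fraction


def deployment_bounds(instances, minimum_health_capacity=1.0,
                      maximum_over_capacity=1.0):
    """(min healthy instances, max total instances) during a deployment.

    Rounding (ceil for the floor, floor for the extra) is an assumption.
    """
    for name, value in (("minimumHealthCapacity", minimum_health_capacity),
                        ("maximumOverCapacity", maximum_over_capacity)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Fraction(str(x)) avoids binary float rounding before ceil/floor.
    min_healthy = math.ceil(instances * Fraction(str(minimum_health_capacity)))
    extra = math.floor(instances * Fraction(str(maximum_over_capacity)))
    return min_healthy, instances + extra


if __name__ == "__main__":
    # 10 instances, keep half healthy, allow 20% extra capacity.
    print(deployment_bounds(10, 0.5, 0.2))  # (5, 12)
```

So with 10 instances, `minimumHealthCapacity: 0.5` and `maximumOverCapacity: 0.2`, at least 5 instances stay healthy and at most 12 run at any point during the deployment, under the rounding assumption above.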