Docker in Production

spiddy
July 22, 2015

A fast-forward through the process of modernizing an infrastructure into a microservice-oriented, container-based cluster.

Transcript

  1. Docker in Production - Speaker: Dimitris Kapanidis

  2. Innovating Container Delivery

  3. Index - Requirements - Phase 1 - Continuous Delivery - Phase 2 - Operating System - Benchmarking Day - Disaster Recovery Day
  4. Requirements
    - Production: 23 million daily hits; off-load DB usage; fast response time
    - Process: quick deploy cycles; easy rollbacks; easy scaling; automatic failover; Dev, QA, Prod parity; no SPOF; canary releases
  5. Phase 1 - Containerize an existing Tomcat WAR

  6. Delivery Life Cycle - Classic

  7. Infrastructure

  8. Issues with This Delivery Lifecycle - Too many open questions
    - How is the Production environment provisioned?
    - What JDK, Tomcat version?
    - How are JDK/Tomcat installed? (apt-get, tarball)
    - Values of variables? (JAVA_OPTS, JAVA_HOME, CATALINA_OPTS, CATALINA_HOME)
    - Logging mechanism
    - Application Configuration
  9. WAR is not executable

  10. How? - Application is a WAR file - The migration is initially an Ops project - Gave the confidence to bootstrap the rest - Keep it incremental!
  11. Infrastructure

  12. Infrastructure

  13. Infrastructure

  14. Delivery Life Cycle - Containers

  15. Creating Container
    Dockerfile:
      FROM tomcat:7.0.59
      ADD app.war webapps/
    Directory Structure:
      > ls
      app.war Dockerfile
  16. Delivery Life Cycle - Containers

  17. Containers are executable

  18. Container Image Tuning - Tomcat APR Installation - JAVA_OPTS Memory tuning - Remove default Tomcat webapps - Expand WAR during docker build - Set Locale, Timezone
  19. Container Image Build Checklist - Use official parent Images - Add variable Configuration (with defaults) - Use ENV variables when possible - Use Template config files to inject values - Add variable Validation (with exit status) - Follow open-source examples
  20. Adding entrypoint
    Dockerfile:
      FROM tomcat:7.0.59
      ADD app.war webapps/
      COPY docker-entrypoint.sh /entrypoint.sh
      ENTRYPOINT ["/entrypoint.sh"]
    Directory Structure:
      > ls
      app.war Dockerfile docker-entrypoint.sh
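The checklist items (configuration with defaults, ENV variables, template config files, validation with exit status) could come together in an entrypoint script along these lines. This is a minimal sketch, not the script used in the talk: the variable names `APP_ENV`, `JAVA_XMX`, `CONF_TEMPLATE`, `CONF_TARGET` and the `@APP_ENV@` placeholder are assumptions made for illustration. The paths assume the official tomcat image, where Tomcat lives under `/usr/local/tomcat`.

```shell
#!/bin/sh
# docker-entrypoint.sh (hypothetical sketch, not the script from the talk)
set -e

# Variable configuration with defaults; every name here is an assumption.
: "${APP_ENV:=dev}"
: "${JAVA_XMX:=512m}"
: "${CONF_TEMPLATE:=/usr/local/tomcat/conf/app.properties.tmpl}"
: "${CONF_TARGET:=/usr/local/tomcat/conf/app.properties}"

# Variable validation: return non-zero so the container fails fast
# instead of starting with a broken configuration.
validate() {
  case "$APP_ENV" in
    dev|qa|prod) ;;
    *) echo "APP_ENV must be dev, qa or prod (got '$APP_ENV')" >&2; return 64 ;;
  esac
  size=${JAVA_XMX%[mg]}   # strip a trailing m or g, if present
  case "$size" in
    "$JAVA_XMX"|''|*[!0-9]*)
      echo "JAVA_XMX must look like 512m or 2g (got '$JAVA_XMX')" >&2; return 64 ;;
  esac
}

# Template injection: $1 = template file, $2 = target file.
render() {
  sed -e "s|@APP_ENV@|$APP_ENV|g" "$1" > "$2"
}

validate
if [ -f "$CONF_TEMPLATE" ]; then
  render "$CONF_TEMPLATE" "$CONF_TARGET"
fi
JAVA_OPTS="${JAVA_OPTS:-} -Xmx$JAVA_XMX"
export JAVA_OPTS

# Hand over to the image's command (for the tomcat image: catalina.sh run).
exec "$@"
```

With such a script, `docker run -e APP_ENV=prod -e JAVA_XMX=2g registry.local/user/app` would start Tomcat with the injected values, while an invalid value stops the container with a non-zero exit status before Tomcat starts.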
  21. Bringing down walls, but still a two-step build process…

  22. Introducing captain https://github.com/harbur/captain

  23. Captain Configuration
    captain.yml:
      app:
        build: Dockerfile
        image: registry.local/user/app
        pre:
          - mvn clean install
  24. Atomic Builds, but we still depend on mvn being installed on the build machine...
  25. Captain Configuration
    captain.yml:
      app:
        build: Dockerfile
        image: registry.local/user/app
        pre:
          - docker run -it --rm -v ~/.m2:/root/.m2 -v "$PWD":/app -w /app maven:3.3.3 mvn clean install
  26. Continuous Delivery - in other words: from Commit to Container

  27. Continuous Integration: Build - master, branch B, branch A - captain build - latest, branch A
  28. Continuous Integration: Test - captain test - latest

  29. Continuous Integration: Push - captain push - latest - latest

  30. Continuous Integration: Release - v0.1, v0.2, v1.0 - captain push
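The build, test and push stages from the Continuous Integration slides could be chained in a CI job roughly as follows. This is a sketch under assumptions: only the three `captain` subcommands come from the slides, while the `run` wrapper, the `pipeline` function and the `DRY_RUN` switch are hypothetical helpers added so the script can be tried without captain installed.

```shell
#!/bin/sh
# ci.sh (hypothetical wrapper; only the captain subcommands come from the slides)
set -e

# DRY_RUN=1 (the default in this sketch) prints each command without running it.
: "${DRY_RUN:=1}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

pipeline() {
  run captain build   # build the image(s) described in captain.yml
  run captain test    # run the configured tests against the built image
  run captain push    # publish to the registry; reached only if the steps above succeeded
}

pipeline
```

Running it with `DRY_RUN=0` executes the real commands, and because of `set -e` a failing build or test stops the job before anything is pushed.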

  31. Phase 2 - Distributed Container-Based Architecture (a.k.a. Microservices)

  32. Phase 2 - Legacy Production

  33. Phase 2 - Add Cache

  34. Phase 2 - Add Messaging System

  35. Phase 2 - Add Tasks

  36. Phase 2 - Topology

  37. How to Achieve - Infrastructure as Code - Cluster-centric Operating System - Designed with fail-safes - Avoid Single Point-of-Failure (SPOF) - Designed with redundancy - Partition tolerant - Lightweight
  38. Operating System Designed for Cluster

  39. The Operating System - CoreOS

  40. The Operating System - CoreOS

  41. The Operating System - CoreOS

  42. The Operating System - CoreOS

  43. Cloud Config YML - Initialization of cluster instances - Infrastructure as Code - Treat servers as Cattle (not pets) - Scale-out with one command - CoreOS Update reboot-strategy: off - Discovery URL to auto-configure cluster - Configured Fleet Metadata
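A cloud-config covering the points on this slide (discovery URL, fleet metadata, reboot strategy off) might be generated like this. It is a deliberately minimal sketch, not the file used in the talk: the `cloud_config` function, the `role=web` metadata and the `<token>` placeholder are assumptions, and a real file would also carry network addresses and the keys mentioned on later slides. `"off"` is quoted so that YAML parsers do not read it as a boolean.

```shell
#!/bin/sh
# gen-cloud-config.sh (hypothetical helper; all values are placeholders)
set -e

# Both defaults are assumptions: request a real discovery URL from
# https://discovery.etcd.io/new and pick metadata that fits the cluster.
: "${DISCOVERY_URL:=https://discovery.etcd.io/<token>}"
: "${FLEET_METADATA:=role=web}"

cloud_config() {
  cat <<EOF
#cloud-config
coreos:
  etcd:
    discovery: $DISCOVERY_URL
  fleet:
    metadata: $FLEET_METADATA
  update:
    reboot-strategy: "off"
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
EOF
}

cloud_config
```

Because every instance boots from the same generated file, scaling out amounts to launching more machines with it as user data; the shared discovery URL lets them join the existing cluster.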
  44. Fleet Units - One per Service - Configuration Management with Etcd - Side-kick to activate Load Balancer
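One common way to build such a side-kick is a small loop that keeps an etcd key alive with a TTL while the service runs, so a load balancer watching etcd can add or drop the backend. The talk does not show its side-kick, so the sketch below is an assumption throughout: the key layout `/services/<service>/<instance>`, the variable names and the TTL are illustrative. It uses the etcdctl v2 `set --ttl` and `rm` commands.

```shell
#!/bin/sh
# lb-sidekick.sh (hypothetical; key layout, names and TTL are assumptions)
set -e

: "${ETCDCTL:=etcdctl}"
: "${SERVICE:=app}"
: "${INSTANCE:=1}"
: "${ENDPOINT:=10.0.0.1:8080}"
: "${TTL:=60}"

# Publish the endpoint; the key expires on its own if the side-kick dies.
announce() {
  "$ETCDCTL" set "/services/$SERVICE/$INSTANCE" "$ENDPOINT" --ttl "$TTL"
}

# Remove the endpoint immediately on a clean stop.
withdraw() {
  "$ETCDCTL" rm "/services/$SERVICE/$INSTANCE"
}

case "${1:-}" in
  start)
    # Refresh at half the TTL so the key never lapses while the service is healthy.
    while true; do
      announce >/dev/null
      sleep $((TTL / 2))
    done ;;
  stop)
    withdraw ;;
esac
```

In a fleet unit, `lb-sidekick.sh start` would be the `ExecStart` and `lb-sidekick.sh stop` the `ExecStop`, with the unit bound to the service it announces.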
  45. Production Cluster - cloud-config.yml (injecting private keys for read-only access to git repo & docker registry) - docker pull - git checkout - Fleet Unit Files
  46. Benchmarking Day

  47. Production Server Topology 1

  48. Production Server Topology 2

  49. Production Server Topology 3

  50. Production Server Topology 4

  51. Production Server Topology 5

  52. On 4 cluster servers

  53. Disaster Recovery Day

  54. Worst Case Scenario - Entire Cluster failure - Running Containers +350 - Rebuilt cluster etcd from scratch - Downtime: 2 hours
  55. Worst Case Scenario - Timeline
    - Buggy App (100% CPU) deployed on cluster
      - Measures: Deployment on Test environment
      - Measures: Canary deploys (10% of cluster)
    - High CPU usage disrupted etcd cluster
      - Measures: Tune etcd time-outs
      - Measures: Set up isolated etcd-cluster servers
    - Reboot of machines provoked OS upgrade
      - Measures: Lock down CoreOS version by disabling service-upgrade; Upgrades on-demand
  56. Thank you