Docker in Production

spiddy
July 22, 2015

A fast-forward through the process of modernizing an infrastructure into a microservice-oriented, container-based cluster.

Transcript

  1. Docker in Production
    Speaker: Dimitris Kapanidis

  2. Innovating Container Delivery

  3. Index
    - Requirements
    - Phase 1
    - Continuous Delivery
    - Phase 2
    - Operating System
    - Benchmarking Day
    - Disaster Recovery Day

  4. Requirements
    - Production
      - 23 million daily hits
      - Off-load DB usage
      - Fast response time
    - Process
      - Quick deploy cycles
      - Easy rollbacks
      - Easy scaling
      - Automatic failover
      - Dev, QA, Prod parity
      - No SPOF
      - Canary Releases

  5. Phase 1
    Containerize an existing Tomcat WAR

  6. Delivery Life Cycle - Classic

  7. Infrastructure

  8. Issues with This Delivery Lifecycle
    - Too many open questions
      - How is the Production environment provisioned?
      - Which JDK and Tomcat versions?
      - How are JDK/Tomcat installed? (apt-get, tarball)
      - Values of variables?
        - JAVA_OPTS, JAVA_HOME
        - CATALINA_OPTS, CATALINA_HOME
      - Logging mechanism
      - Application Configuration

  9. WAR
    is not executable

  10. How?
    - Application is a WAR file
    - The migration is initially an Ops project
    - Gave the confidence to bootstrap the rest
    - Keep it incremental!

  11. Infrastructure

  12. Infrastructure

  13. Infrastructure

  14. Delivery Life Cycle - Containers

  15. Creating Container
    Dockerfile:
    FROM tomcat:7.0.59
    ADD app.war webapps/
    Directory Structure:
    > ls
    app.war
    Dockerfile

  16. Delivery Life Cycle - Containers

  17. Containers
    are executable

  18. Container Image Tuning
    - Tomcat APR Installation
    - JAVA_OPTS Memory tuning
    - Remove default Tomcat webapps
    - Expand WAR during docker build
    - Set Locale, Timezone

  19. Container Image Build Checklist
    - Use official parent Images
    - Add variable Configuration (with defaults)
    - Use ENV variables when possible
    - Use Template config files to inject values
    - Add variable Validation (with exit status)
    - Follow open-source examples

  20. Adding entrypoint
    Dockerfile:
    FROM tomcat:7.0.59
    ADD app.war webapps/
    COPY docker-entrypoint.sh /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]
    Directory Structure:
    > ls
    app.war
    Dockerfile
    docker-entrypoint.sh
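
The deck does not show the contents of docker-entrypoint.sh. As a rough illustration of the checklist items (defaults, ENV variables, template injection, validation with an exit status), a minimal sketch might look like the following. The variable names (DB_HOST, DB_PORT), the template path, and the @KEY@ placeholder syntax are all assumptions made for this example, not details from the talk.

```shell
#!/bin/sh
# Hypothetical docker-entrypoint.sh sketch; names and paths are illustrative assumptions.
set -eu

# Fail with a message unless the value consists only of digits.
require_number() {
  name=$1 value=$2
  case "$value" in
    ''|*[!0-9]*)
      echo "ERROR: $name must be a number, got '$value'" >&2
      return 1
      ;;
  esac
}

# Replace @DB_HOST@ and @DB_PORT@ placeholders in a template file.
# Note: values containing "|" or "&" would need escaping for sed.
render_template() {
  src=$1 dst=$2
  sed -e "s|@DB_HOST@|${DB_HOST}|g" \
      -e "s|@DB_PORT@|${DB_PORT}|g" "$src" > "$dst"
}

main() {
  # Variable configuration with defaults
  : "${DB_HOST:=localhost}"
  : "${DB_PORT:=5432}"
  : "${APP_TEMPLATE:=/usr/local/tomcat/conf/app.properties.tmpl}"
  : "${APP_CONFIG:=/usr/local/tomcat/conf/app.properties}"

  # Validation with a non-zero exit status, so a misconfigured container stops early
  require_number DB_PORT "$DB_PORT" || exit 1

  render_template "$APP_TEMPLATE" "$APP_CONFIG"

  # Hand over to the image's command (for the tomcat image: catalina.sh run)
  exec "$@"
}

# Docker passes CMD as arguments to the entrypoint; do nothing without them.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

With a script like this, a value could be overridden at run time with something like `docker run -e DB_HOST=db.internal registry.local/user/app`, while an invalid DB_PORT would make the container exit before Tomcat starts.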

  21. Bringing down walls
    but still a two-step build process…

  22. Introducing
    captain
    https://github.com/harbur/captain

  23. Captain Configuration
    captain.yml:
    app:
      build: Dockerfile
      image: registry.local/user/app
      pre:
        - mvn clean install

  24. Atomic Builds
    but we still depend on mvn being installed on the build machine...

  25. Captain Configuration
    captain.yml:
    app:
      build: Dockerfile
      image: registry.local/user/app
      pre:
        - docker run -it --rm -v ~/.m2:/root/.m2 \
          -v "$PWD":/app -w /app maven:3.3.3 mvn clean install
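
To make the two steps behind this configuration concrete, here is a sketch of the same build done by hand: compile the WAR inside the maven:3.3.3 container, then build the image. The function names and the DRY_RUN switch are assumptions added for illustration. The `-it` flags from the slide are left out here because `docker run -it` fails when no terminal is attached, which is common on CI servers.

```shell
#!/bin/sh
# Sketch only: function names and DRY_RUN are illustrative, not part of captain.
set -eu

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the WAR inside the maven image; ~/.m2 is mounted to reuse the local cache.
maven_build() {
  run docker run --rm \
    -v "$HOME/.m2:/root/.m2" \
    -v "$PWD:/app" -w /app \
    maven:3.3.3 mvn clean install
}

# Build the application image from the Dockerfile in the current directory.
image_build() {
  run docker build -t "$1" .
}

# Usage: ./build.sh registry.local/user/app
if [ "$#" -gt 0 ]; then
  maven_build
  image_build "$1"
fi
```

The only tool the build machine needs in this arrangement is Docker itself, which is the point the slide makes.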

  26. Continuous Delivery
    in other words: from Commit to Container

  27. Continuous Integration: Build
    master
    branch B
    branch A
    captain build
    latest
    branch A

  28. Continuous Integration: Test
    captain test
    latest

  29. Continuous Integration: Push
    captain push
    latest latest

  30. Continuous Integration: Release
    v0.1 v0.2 v1.0
    captain push

  31. Phase 2
    Distributed Container-Based Architecture
    (a.k.a. Microservices)

  32. Phase 2 - Legacy Production

  33. Phase 2 - Add Cache

  34. Phase 2 - Add Messaging System

  35. Phase 2 - Add Tasks

  36. Phase 2 - Topology

  37. How to Achieve
    - Infrastructure as Code
    - Cluster-centric Operating System
    - Designed with fail-safes
    - Avoid Single Point-of-Failure (SPOF)
    - Designed with redundancy
    - Partition tolerant
    - Lightweight

  38. Operating System
    Designed for Cluster

  39. The Operating System - CoreOS

  40. The Operating System - CoreOS

  41. The Operating System - CoreOS

  42. The Operating System - CoreOS

  43. Cloud Config YML
    - Initialization of cluster instances
    - Infrastructure as Code
    - Treat servers as Cattle (not pets)
    - Scale-out with one command
    - CoreOS Update reboot-strategy: off
    - Discovery URL to auto-configure cluster
    - Configured Fleet Metadata
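
The deck does not include the cloud-config file itself. As a hedged sketch of the three settings named above (discovery URL, Fleet metadata, and reboot-strategy off), the function below prints a small CoreOS cloud-config. The function name and the "role=web" metadata value are assumptions, and a working file would also need etcd2 listen and advertise URLs, which are omitted here for brevity.

```shell
#!/bin/sh
set -eu

# Print a CoreOS cloud-config for one cluster member.
# $1: etcd discovery URL, $2: fleet metadata (for example "role=web")
cloud_config() {
  cat <<EOF
#cloud-config
coreos:
  etcd2:
    discovery: $1
  fleet:
    metadata: "$2"
  update:
    reboot-strategy: "off"
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
EOF
}

# Possible usage (the discovery service issues one URL per new cluster):
#   cloud_config "$(curl -s 'https://discovery.etcd.io/new?size=3')" "role=web" > cloud-config.yml
```

Because every instance boots from the same generated file, adding a server is a matter of launching another machine with it, which matches the "cattle, not pets" item on the slide.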

  44. Fleet Units
    - One per Service
    - Configuration Management with Etcd
    - Side-kick to activate Load Balancer
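
The slide does not say how the sidekick was implemented. A common pattern, shown here only as an assumption, is a small loop that keeps writing the service address into etcd with a TTL; a load balancer that watches those keys then picks the instance up, and the entry expires by itself if the machine dies. The key layout (/services/NAME/HOST), the TTL of 60 seconds, and the script name are hypothetical.

```shell
#!/bin/sh
set -eu

# Overridable so the function can be tried without a running cluster.
ETCDCTL=${ETCDCTL:-etcdctl}

# Publish host:port under a service key with a TTL (etcdctl v2 syntax).
announce() {
  service=$1 host=$2 port=$3 ttl=$4
  "$ETCDCTL" set "/services/$service/$host" "$host:$port" --ttl "$ttl"
}

# Re-announce well before the TTL runs out; the loop runs only when arguments are given.
# Usage: ./sidekick.sh app 10.0.0.5 8080
if [ "$#" -ge 3 ]; then
  while true; do
    announce "$1" "$2" "$3" 60
    sleep 45
  done
fi
```

In a Fleet setup, a script like this would typically run from its own unit that is scheduled on the same machine as the service unit it announces.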

  45. Production Cluster
    cloud-config.yml (injecting private keys for read-only access of git repo & docker registry)
    docker pull
    git checkout
    Fleet Unit Files

  46. Benchmarking Day

  47. Production Server Topology 1

  48. Production Server Topology 2

  49. Production Server Topology 3

  50. Production Server Topology 4

  51. Production Server Topology 5

  52. On 4 cluster servers

  53. Disaster Recovery Day

  54. Worst Case Scenario
    - Entire Cluster failure
    - Running Containers: 350+
    - Rebuilt cluster etcd from scratch
    - Downtime: 2 hours

  55. Worst Case Scenario - Timeline
    - Buggy App (100% CPU) deployed on cluster
      - Measures: Deployment on Test environment
      - Measures: Canary deploys (10% of cluster)
    - High CPU usage disrupted etcd cluster
      - Measures: Tune etcd time-outs
      - Measures: Set up isolated etcd-cluster servers
    - Reboot of machines triggered OS upgrade
      - Measures: Lock down CoreOS version by disabling service-upgrade; upgrades on demand

  56. Thank you
