Slide 1

Slide 1 text

Docker in Production
Speaker: Dimitris Kapanidis

Slide 2

Slide 2 text

Innovating Container Delivery

Slide 3

Slide 3 text

Index
- Requirements
- Phase 1
- Continuous Delivery
- Phase 2
- Operating System
- Benchmarking Day
- Disaster Recovery Day

Slide 4

Slide 4 text

Requirements
- Production
  - 23 million daily hits
  - Off-load DB usage
  - Fast response time
- Process
  - Quick deploy cycles
  - Easy rollbacks
  - Easy scaling
  - Automatic failover
  - Dev, QA, Prod parity
  - No SPOF
  - Canary Releases

Slide 5

Slide 5 text

Phase 1
Containerize an existing Tomcat WAR

Slide 6

Slide 6 text

Delivery Life Cycle - Classic

Slide 7

Slide 7 text

Infrastructure

Slide 8

Slide 8 text

Issues with This Delivery Lifecycle
- Too many open questions
  - How is the Production environment provisioned?
  - What JDK and Tomcat versions?
  - How are JDK/Tomcat installed? (apt-get, tarball)
  - Values of variables?
    - JAVA_OPTS, JAVA_HOME
    - CATALINA_OPTS, CATALINA_HOME
  - Logging mechanism
  - Application Configuration

Slide 9

Slide 9 text

WAR is not executable

Slide 10

Slide 10 text

How?
- Application is a WAR file
- The migration is initially an Ops project
- Gave the confidence to bootstrap the rest
- Keep it incremental!

Slide 11

Slide 11 text

Infrastructure

Slide 12

Slide 12 text

Infrastructure

Slide 13

Slide 13 text

Infrastructure

Slide 14

Slide 14 text

Delivery Life Cycle - Containers

Slide 15

Slide 15 text

Creating Container
Dockerfile:
    FROM tomcat:7.0.59
    ADD app.war webapps/
Directory Structure:
    > ls
    app.war Dockerfile

Slide 16

Slide 16 text

Delivery Life Cycle - Containers

Slide 17

Slide 17 text

Containers are executable

Slide 18

Slide 18 text

Container Image Tuning
- Tomcat APR Installation
- JAVA_OPTS Memory tuning
- Remove default Tomcat webapps
- Expand WAR during docker build
- Set Locale, Timezone

Slide 19

Slide 19 text

Container Image Build Checklist
- Use official parent Images
- Add variable Configuration (with defaults)
  - Use ENV variables when possible
  - Use Template config files to inject values
- Add variable Validation (with exit status)
- Follow open-source examples

Slide 20

Slide 20 text

Adding entrypoint
Dockerfile:
    FROM tomcat:7.0.59
    ADD app.war webapps/
    COPY docker-entrypoint.sh /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]
Directory Structure:
    > ls
    app.war Dockerfile docker-entrypoint.sh
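The deck does not show the contents of docker-entrypoint.sh. The following is only a minimal sketch of how the checklist items (defaults for variables, template config files, validation with an exit status) could look in such a script. The variable names APP_DB_URL and APP_LOG_LEVEL, the @NAME@ placeholder syntax and the template path are hypothetical and not taken from the talk.

```shell
#!/bin/sh
# Hypothetical docker-entrypoint.sh sketch; names below are illustrative assumptions.

# Variable configuration with a default: keep a value passed via
# `docker run -e APP_LOG_LEVEL=...`, otherwise fall back to INFO.
: "${APP_LOG_LEVEL:=INFO}"

# Variable validation: return non-zero when the named variable is unset or empty.
require_var() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "ERROR: $1 must be set" >&2
        return 1
    fi
}

# Template config file: copy $1 to $2, replacing @NAME@ placeholders with
# the current values. Values containing '|' or '&' would need escaping.
render_template() {
    sed -e "s|@APP_DB_URL@|${APP_DB_URL}|g" \
        -e "s|@APP_LOG_LEVEL@|${APP_LOG_LEVEL}|g" "$1" > "$2"
}

main() {
    require_var APP_DB_URL || exit 1
    render_template conf/app.properties.tpl conf/app.properties
    # Replace the shell with the container command so it receives signals.
    exec "$@"
}

# Run only when a command was passed (for example the parent image's CMD).
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

With the exec-form ENTRYPOINT shown on the slide, the image's CMD arrives as the script's arguments, so the final `exec "$@"` hands control to Tomcat once validation and templating have passed.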

Slide 21

Slide 21 text

Bringing down walls, but still a two-step build process…

Slide 22

Slide 22 text

Introducing captain https://github.com/harbur/captain

Slide 23

Slide 23 text

Captain Configuration
captain.yml:
    app:
      build: Dockerfile
      image: registry.local/user/app
      pre:
        - mvn clean install

Slide 24

Slide 24 text

Atomic Builds, but we still depend on mvn being installed on the build machine...

Slide 25

Slide 25 text

Captain Configuration
captain.yml:
    app:
      build: Dockerfile
      image: registry.local/user/app
      pre:
        - docker run -it --rm -v ~/.m2:/root/.m2 \
          -v "$PWD":/app -w /app maven:3.3.3 mvn clean install

Slide 26

Slide 26 text

Continuous Delivery in other words: from Commit to Container

Slide 27

Slide 27 text

Continuous Integration: Build
    captain build
(Diagram labels: master, branch A, branch B; latest, branch A)

Slide 28

Slide 28 text

Continuous Integration: Test
    captain test
(Diagram label: latest)

Slide 29

Slide 29 text

Continuous Integration: Push
    captain push
(Diagram labels: latest, latest)

Slide 30

Slide 30 text

Continuous Integration: Release
    captain push
(Diagram labels: v0.1, v0.2, v1.0)
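The build, test and push steps from the Continuous Integration slides could be chained in a CI job roughly as follows. This is a sketch, not the speaker's actual job definition: the function name ci_pipeline and the CAPTAIN override variable are hypothetical, and only the three captain subcommands named on the slides are used.

```shell
#!/bin/sh
# Hypothetical CI step: run captain build, test and push in order and
# stop at the first failure so a broken image is never pushed.
# CAPTAIN is an illustrative override for the binary (defaults to captain).
ci_pipeline() {
    "${CAPTAIN:-captain}" build || return 1
    "${CAPTAIN:-captain}" test  || return 1
    "${CAPTAIN:-captain}" push  || return 1
}
```

A CI job would call `ci_pipeline` from the repository root, next to captain.yml, and treat a non-zero exit status as a failed build.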

Slide 31

Slide 31 text

Phase 2
Distributed Container-Based Architecture (a.k.a. Microservices)

Slide 32

Slide 32 text

Phase 2 - Legacy Production

Slide 33

Slide 33 text

Phase 2 - Add Cache

Slide 34

Slide 34 text

Phase 2 - Add Messaging System

Slide 35

Slide 35 text

Phase 2 - Add Tasks

Slide 36

Slide 36 text

Phase 2 - Topology

Slide 37

Slide 37 text

How to Achieve
- Infrastructure as Code
- Cluster-centric Operating System
- Designed with fail-safes
- Avoid Single Point-of-Failure (SPOF)
- Designed with redundancy
- Partition tolerant
- Lightweight

Slide 38

Slide 38 text

Operating System Designed for Cluster

Slide 39

Slide 39 text

The Operating System - CoreOS

Slide 40

Slide 40 text

The Operating System - CoreOS

Slide 41

Slide 41 text

The Operating System - CoreOS

Slide 42

Slide 42 text

The Operating System - CoreOS

Slide 43

Slide 43 text

Cloud Config YML
- Initialization of cluster instances
- Infrastructure as Code
- Treat servers as Cattle (not pets)
- Scale-out with one command
- CoreOS Update reboot-strategy: off
- Discovery URL to auto-configure cluster
- Configured Fleet Metadata
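The deck does not include the cloud-config file itself. As a rough illustration of the three settings named on this slide (discovery URL, fleet metadata, reboot-strategy off), the sketch below renders a minimal CoreOS cloud-config from a shell function. The function name, the choice of the `etcd` section and the example metadata value are assumptions, and a real file would also carry units, keys and other settings.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal CoreOS cloud-config.
#   $1 = etcd discovery URL, $2 = fleet metadata (e.g. role=web)
# "off" is quoted so YAML parsers do not read it as a boolean.
render_cloud_config() {
    cat <<EOF
#cloud-config
coreos:
  etcd:
    discovery: $1
  fleet:
    metadata: $2
  update:
    reboot-strategy: "off"
EOF
}
```

Passing the same rendered file as user data to every new instance is one way to get the "scale-out with one command" behaviour: each machine joins the cluster through the shared discovery URL.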

Slide 44

Slide 44 text

Fleet Units
- One per Service
- Configuration Management with Etcd
- Side-kick to activate Load Balancer
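The slide does not say how the side-kick activates the load balancer. A common pattern, assumed here, is that the side-kick writes the service address to etcd with a TTL and the load balancer reads its backends from those keys. The sketch below shows the two commands such a side-kick would run; the key layout, the function names and the ETCDCTL override are hypothetical, and the etcdctl v2 `set ... --ttl` and `rm` commands are used.

```shell
#!/bin/sh
# Hypothetical side-kick helpers built on the etcdctl v2 CLI.
# ETCDCTL is an illustrative override for the binary (defaults to etcdctl).

# announce KEY VALUE [TTL]: register a backend; the key expires after TTL
# seconds unless it is refreshed, so a dead host drops out on its own.
announce() {
    "${ETCDCTL:-etcdctl}" set "$1" "$2" --ttl "${3:-60}"
}

# withdraw KEY: remove the backend immediately on a clean stop.
withdraw() {
    "${ETCDCTL:-etcdctl}" rm "$1"
}
```

A side-kick unit would call something like `announce /services/app/host1 10.0.0.1:8080` in a loop with a sleep shorter than the TTL, and call `withdraw` with the same key when the main service stops.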

Slide 45

Slide 45 text

Production Cluster
- cloud-config.yml (injecting private keys for read-only access to the git repo and docker registry)
(Diagram labels: docker pull; git checkout; Fleet Unit Files)

Slide 46

Slide 46 text

Benchmarking Day

Slide 47

Slide 47 text

Production Server Topology 1

Slide 48

Slide 48 text

Production Server Topology 2

Slide 49

Slide 49 text

Production Server Topology 3

Slide 50

Slide 50 text

Production Server Topology 4

Slide 51

Slide 51 text

Production Server Topology 5

Slide 52

Slide 52 text

On 4 cluster servers

Slide 53

Slide 53 text

Disaster Recovery Day

Slide 54

Slide 54 text

Worst-Case Scenario
- Entire cluster failure
- Running containers: 350+
- Rebuilt the etcd cluster from scratch
- Downtime: 2 hours

Slide 55

Slide 55 text

Worst-Case Scenario - Timeline
- Buggy App (100% CPU) deployed on cluster
  - Measures: Deployment on Test environment
  - Measures: Canary deploys (10% of cluster)
- High CPU usage disrupted etcd cluster
  - Measures: Tune etcd time-outs
  - Measures: Set up isolated etcd-cluster servers
- Reboot of machines provoked OS upgrade
  - Measures: Lock down CoreOS version by disabling service-upgrade; upgrades on demand
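To make the "Canary deploys (10% of cluster)" measure concrete, the sketch below computes how many instances a canary step would touch for a given cluster size. The function name and the rounding rule (round up, never fewer than one instance) are assumptions for illustration, not the speaker's tooling.

```shell
#!/bin/sh
# Hypothetical helper: canary_count TOTAL PERCENT
# Prints how many instances receive the new version first,
# rounding up and never returning fewer than one.
canary_count() {
    total=$1
    percent=$2
    count=$(( (total * percent + 99) / 100 ))
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    echo "$count"
}
```

For the four cluster servers mentioned in the benchmarking section, `canary_count 4 10` yields a single canary machine, so a buggy release like the one in this timeline would load one server rather than the whole cluster.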

Slide 56

Slide 56 text

Thank you