Slide 1

Slide 1 text

Docker in Production: A Survival Guide
Greg Poirier - CTO - @GetOpsee

Slide 2

Slide 2 text

Who am I? • CTO at Opsee. • Lots of years in operations, development. • Super jazzed about containers.

Slide 3

Slide 3 text

Docker at Opsee • I love containers. • Lightweight deployable objects. • “Build once. Deploy anywhere.”

Slide 4

Slide 4 text

A Bumpy Ride

Slide 5

Slide 5 text

A Bumpy Ride • Adopting any new technology requires a significant investment in the form of time and energy. • You will make mistakes. • You will learn from them.

Slide 6

Slide 6 text

Docker in Production • Building software for containers • Deploying containers • Operational considerations • Logging, resource allocation.

Slide 7

Slide 7 text

Runtime Containers • You want thin containers. • Faster deploys. • Faster builds. • Fewer disk problems.

Slide 8

Slide 8 text

Thin Containers • Avoid "OS" containers. • Avoid startup scripts.

Slide 9

Slide 9 text

Thin Containers • Runtime dependencies go in volumes. • Use multiple containers and link them. • Containers cost very little. • Inodes and disk space.

Slide 10

Slide 10 text

Build vs. Runtime Containers • Build containers are for building. • Compilers, deployment stuff, etc. • Runtime containers are for running. • Just the build artifacts.
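
The build/runtime split above can be sketched as two docker invocations. This is only an illustration under assumptions, not the pipeline used at Opsee: the helper name build_and_package, the ./out artifact directory, and the runtime/Dockerfile path are all hypothetical, and the build image is assumed to write its artifacts to /out.

```shell
# build_and_package BUILD_IMAGE RUNTIME_REF (hypothetical helper)
# Step 1 runs the heavy build image (compilers and tooling) and leaves
# the artifacts in ./out on the host.
# Step 2 builds the thin runtime image; its Dockerfile (assumed to live
# at runtime/Dockerfile) is expected to copy only ./out.
build_and_package() {
    build_image="$1"
    runtime_ref="$2"
    docker run --rm -v "$PWD/out:/out" "$build_image" || return 1
    docker build -t "$runtime_ref" -f runtime/Dockerfile .
}
```

The compilers never reach the runtime image, which keeps it small and quick to pull.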

Slide 11

Slide 11 text

Don’t Fear Multiple Containers

config:
  image: yourOrg/getConfig
  command: /getConfig serviceName -o /etc/config.yaml
  volumes:
    - /etc

service:
  image: yourOrg/serviceName
  command: /serviceName /etc/config.yaml
  volumes_from:
    - config:ro

Slide 12

Slide 12 text

Export Things for Humans • Put stuff in host-mounted volumes for people • If you must • Ship stuff to S3 • Log • Emit metrics

Slide 13

Slide 13 text

Deploying Containers • Registries • Tags • Schedulers

Slide 14

Slide 14 text

Registries • Depending on registries sucks. • Downtime is extremely frustrating. • I think they mostly understand this.

Slide 15

Slide 15 text

MFW Registry Downtime

Slide 16

Slide 16 text

Registry Downtime • Downtime can and will happen. • Restart on the same host if you crash. • Docker or Systemd restart policy. • Don’t fail to start if you can’t pull. • ExecStartPre=-/usr/bin/docker pull…
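
The same "pull if you can, start regardless" idea can be sketched as a shell wrapper for hosts that do not use systemd. This is a minimal sketch: the helper name start_service is an assumption for illustration, and it relies on an earlier pull having left a copy of the image on the host.

```shell
# start_service IMAGE NAME (hypothetical helper)
# Try to refresh the image, but do not give up if the registry is down:
# fall back to whatever copy of the image is already cached locally.
start_service() {
    image="$1"
    name="$2"
    if ! docker pull "$image"; then
        echo "warning: pull of $image failed, using cached image" >&2
    fi
    # --restart=always asks the Docker daemon to restart the container
    # on this host if it exits.
    docker run -d --restart=always --name "$name" "$image"
}
```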

Slide 17

Slide 17 text

Deploying Containers • Avoid symbolic container tags such as production. • Tags identify code running in a container. • You can use labels for this as well, but don’t.

Slide 18

Slide 18 text

Tag Your Images • Simple Example: • You run yourOrg/yourService:production • You update the tag to point to a new image version • One of the instances in your ELB restarts. • Two versions without a deploy.

Slide 19

Slide 19 text

I Promise This is Bad • Deploys should be deliberate. • Control what code is running very carefully. • Make it obvious to the casual observer what version is running.
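
One way to keep tags tied to code is to tag each image with the commit it was built from. The sketch below assumes the service lives in a git repository with a Dockerfile at its root; the helper names image_ref and release are hypothetical.

```shell
# image_ref REPO VERSION (hypothetical helper)
# Build an image reference that names exactly one version of the code.
image_ref() {
    printf '%s:%s\n' "$1" "$2"
}

# release REPO (hypothetical helper)
# Build and push an image tagged with the current short commit hash,
# so `docker ps` shows which code each container is running.
release() {
    repo="$1"
    version="$(git rev-parse --short HEAD)" || return 1
    ref="$(image_ref "$repo" "$version")"
    docker build -t "$ref" . && docker push "$ref"
}
```

Because the tag never moves, a restarted instance pulls the same code it ran before, and a new version only appears through a deliberate deploy.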

Slide 20

Slide 20 text

Schedulers • Most of them are good. • Some of them are easier. • Some of them are harder.

Slide 21

Slide 21 text

Choosing a Scheduler • Operational complexity. • Features. • Most importantly: your needs.

Slide 22

Slide 22 text

Docker-Compose

Slide 23

Slide 23 text

The Power of Docker-Compose Compels You • Containers work well together. • E.g. NSQ + Service + Configuration • Choose a scheduler that supports docker-compose. • It’s got what devs need.

Slide 24

Slide 24 text

Operations

Slide 25

Slide 25 text

Operations • Docker does not solve operational problems. • Docker’s default configuration is not suitable for production. • Docker’s default configuration will lead to downtime.

Slide 26

Slide 26 text

Logging • Default logging driver: json-file • gliderlabs/logspout • So many problems…

Slide 27

Slide 27 text

Don’t Use json-file in Production • Long-running containers in production will eventually consume all of the disk space available to /var/lib/docker because of json-file’s default configuration. • Use syslog or awslogs instead.
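
Selecting a different driver per container might look like the sketch below. The helper name run_with_syslog is hypothetical, and the host is assumed to run a syslog daemon that the Docker daemon can reach.

```shell
# run_with_syslog IMAGE NAME (hypothetical helper)
# Send the container's stdout/stderr to the host's syslog instead of
# letting json-file write them under /var/lib/docker.
run_with_syslog() {
    image="$1"
    name="$2"
    docker run -d --log-driver=syslog --log-opt tag="$name" --name "$name" "$image"
}
```

The tag option labels each log line with the container name so services can be told apart in the syslog stream.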

Slide 28

Slide 28 text

No Sensible Defaults Anywhere

Slide 29

Slide 29 text

No Sensible Defaults Anywhere • CoreOS uses json-file by default. • Debian(s) use json-file by default. • RHEL(s) use json-file by default. • Everyone defaults to something inappropriate for production.

Slide 30

Slide 30 text

Breathe.

Slide 31

Slide 31 text

Logspout • We tried to make logspout happen. • Problems with connection handling, etc. • Don’t use json-file or logspout.

Slide 32

Slide 32 text

Disk

Slide 33

Slide 33 text

Disk • You really need to manage disk space carefully. • Remove stale images. • Remove stopped containers. • Don’t store tons of state locally.

Slide 34

Slide 34 text

No Really... • rm -rf /var/lib/docker • docker ps -aq | xargs docker rm • docker images -q | xargs docker rmi -f
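
A less drastic variant of the commands above removes only exited containers and dangling (untagged) images, which makes it safer to run on a schedule. This is a sketch; the helper name cleanup_docker is hypothetical.

```shell
# cleanup_docker (hypothetical helper)
# Remove exited containers and dangling images, leaving running
# containers and tagged images alone.
cleanup_docker() {
    exited="$(docker ps -aq -f status=exited)"
    if [ -n "$exited" ]; then
        # Unquoted on purpose: one argument per container ID.
        docker rm $exited
    fi
    dangling="$(docker images -q -f dangling=true)"
    if [ -n "$dangling" ]; then
        docker rmi $dangling
    fi
}
```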

Slide 35

Slide 35 text

Memory Allocation • Declare the resources you intend to use. • This is important to do. • Pick a scheduler that supports this.

Slide 36

Slide 36 text

You Still Have Work to Do • V8 and JVM will allocate heap until they are OOM killed. • Go does not adhere to resource limits. • Nothing adheres to resource limits but the kernel.

Slide 37

Slide 37 text

Memory Management Settings • V8 and JVM allow you to control memory allocation. • Max heap isn’t everything. • If you don’t set max heap, they will allocate heap until the kernel kills them. • Plan for this or don’t.
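
Declaring a limit and sizing the heap to fit inside it could be sketched as follows. The 75% heap ratio is an assumption chosen for illustration, not a recommendation from the talk, and the helper names heap_mb and run_jvm_service are hypothetical.

```shell
# heap_mb LIMIT_MB (hypothetical helper)
# Reserve roughly a quarter of the container limit for non-heap memory
# (thread stacks, native buffers, and so on). The ratio is an assumption.
heap_mb() {
    echo $(( $1 * 3 / 4 ))
}

# run_jvm_service IMAGE LIMIT_MB JAR (hypothetical helper)
# -m is the limit enforced by the kernel; -Xmx keeps the JVM heap below
# it so the process is not OOM killed just for filling its heap.
run_jvm_service() {
    image="$1"
    limit="$2"
    jar="$3"
    heap="$(heap_mb "$limit")"
    docker run -d -m "${limit}m" "$image" java "-Xmx${heap}m" -jar "$jar"
}
```

For a V8 process the corresponding knob is node's --max-old-space-size flag, which also takes a value in megabytes.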

Slide 38

Slide 38 text

Thanks! • Thanks for coming! • Thanks for listening! • Questions, comments? • @grepory on Twitter

Slide 39

Slide 39 text

Operators are Standing By