Docker in Production: A Survival Guide

For the past few years, the Linux container ecosystem has been growing at a breakneck pace with a vanguard of companies like Joyent, Docker, and CoreOS. Opsee chose to build its product using Docker and CoreOS, and for the past year, we have been delivering software continuously to production with containers. In this talk, we'll discuss the lessons learned from a year of containers in production and the mistakes that led to our current set of best practices. How do we build software for containers? How do we ship containers? How do we do all of it without shooting ourselves in the foot?

Greg Poirier

April 08, 2016

Transcript

  1. Docker in Production: A Survival Guide Greg Poirier - CTO - @GetOpsee
  2. Who am I? • CTO at Opsee. • Lots of years in operations, development. • Super jazzed about containers.
  3. Docker at Opsee • I love containers. • Lightweight deployable objects. • “Build once. Deploy anywhere.”
  4. A Bumpy Ride
  5. A Bumpy Ride • Adopting any new technology requires a significant investment in the form of time and energy. • You will make mistakes. • You will learn from them.
  6. Docker in Production • Building software for containers • Deploying containers • Operational considerations • Logging, resource allocation.
  7. Runtime Containers • You want thin containers. • Faster deploys. • Faster builds. • Fewer disk problems.
  8. Thin Containers • Avoid "OS" containers. • Avoid startup scripts.
  9. Thin Containers • Runtime dependencies go in volumes. • Use multiple containers and link them. • Containers cost very little. • Inodes and disk space.
  10. Build vs. Runtime Containers • Build containers are for building. • Compilers, deployment stuff, etc. • Runtime containers are for running. • Just the build artifacts.
  11. Don’t Fear Multiple Containers
      config:
        image: yourOrg/getConfig
        command: /getConfig serviceName -o /etc/config.yaml
        volumes:
          - /etc

      service:
        image: yourOrg/serviceName
        command: /serviceName /etc/config.yaml
        volumes_from:
          - config:ro
  12. Export Things for Humans • Put stuff in host-mounted volumes for people • If you must • Ship stuff to S3 • Log • Emit metrics
  13. Deploying Containers • Registries • Tags • Schedulers
  14. Registries • Depending on registries sucks. • Downtime is extremely frustrating. • I think they mostly understand this.
  15. MFW Registry Downtime
  16. Registry Downtime • Downtime can and will happen. • Restart on the same host if you crash. • Docker or Systemd restart policy. • Don’t fail to start if you can’t pull. • ExecStartPre=-/usr/bin/docker pull…
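
The last two bullets on slide 16 can be sketched as a small wrapper, for hosts where the container is launched from a shell script rather than a unit file. It mirrors the `-` prefix in `ExecStartPre=-`, which tells systemd to ignore a failing command. This is a minimal sketch, not the setup Opsee ran: the function name `pull_or_continue`, the `DOCKER` override variable, and the image tag are hypothetical.

```shell
# Hypothetical helper: try to refresh the image, but never block startup
# when the registry is unreachable. DOCKER can be overridden for testing.
pull_or_continue() {
  image="$1"
  if ! "${DOCKER:-docker}" pull "$image"; then
    echo "warning: could not pull $image; using the local copy if present" >&2
  fi
  return 0
}

# Example usage (assumed image name and tag):
#   pull_or_continue yourOrg/serviceName:3f2c1ab
#   docker run -d --restart=on-failure yourOrg/serviceName:3f2c1ab
```

If the image was never pulled on this host, the subsequent `docker run` still fails, so this only protects restarts on hosts that already hold a copy.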
  17. Deploying Containers • Avoid symbolic container tags. • Tags identify code running in a container. • You can use labels for this as well, but don’t.
  18. Tag Your Images • Simple Example: • You run yourOrg/yourService:production • You update the tag to point to a new image version • One of the instances in your ELB restarts. • Two versions without a deploy.
  19. I Promise This is Bad • Deploys should be deliberate. • Control what code is running very carefully. • Make it obvious to the casual observer what version is running.
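
One way to avoid symbolic tags is to derive the tag from the commit being deployed and to refuse mutable names outright. A minimal sketch, assuming images are tagged with a short Git SHA (the talk does not say which scheme Opsee used); `image_ref` and the list of rejected tags are hypothetical.

```shell
# Hypothetical helper: build an image reference from a repository and a
# version identifier, rejecting tags that can silently move.
image_ref() {
  repo="$1"
  version="$2"
  case "$version" in
    ""|latest|production|staging|stable)
      echo "refusing symbolic tag: '$version'" >&2
      return 1
      ;;
  esac
  printf '%s:%s\n' "$repo" "$version"
}

# Example usage (assumes the build runs inside a Git checkout):
#   ref=$(image_ref yourOrg/yourService "$(git rev-parse --short HEAD)")
#   docker build -t "$ref" . && docker push "$ref"
```

With a tag like this, `docker ps` on any host shows which commit is running.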
  20. Schedulers • Most of them are good. • Some of them are easier. • Some of them are harder.
  21. Choosing a Scheduler • Operational complexity. • Features. • Most importantly: your needs.
  22. Docker-Compose
  23. The Power of Docker-Compose Compels You • Containers work well together. • E.g. NSQ + Service + Configuration • Choose a scheduler that supports docker-compose. • It’s got what devs need.
  24. Operations
  25. Operations • Docker does not solve operational problems. • Docker’s default configuration is not suitable for production. • Docker’s default configuration will lead to downtime.
  26. Logging • Default logging driver: json-file • gliderlabs/logspout • So many problems…
  27. Don’t Use json-file in Production • Long-running containers in production will eventually consume all of the disk space available to /var/lib/docker because of json-file’s default configuration. • Use syslog or awslogs.
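
For reference, the logging driver can be chosen per container on the command line. A minimal sketch; `run_with_syslog` and the `DOCKER` override are hypothetical, and the image name is a placeholder.

```shell
# Hypothetical wrapper: start a detached container that logs to the host's
# syslog instead of json-file. DOCKER can be overridden for a dry run.
run_with_syslog() {
  "${DOCKER:-docker}" run -d --log-driver=syslog "$@"
}

# Example usage (assumed image name and tag):
#   run_with_syslog yourOrg/serviceName:3f2c1ab
#
# If json-file cannot be avoided, it can at least be capped per container:
#   docker run -d --log-driver=json-file \
#     --log-opt max-size=10m --log-opt max-file=3 yourOrg/serviceName:3f2c1ab
```

Setting `DOCKER=echo` prints the command that would run without starting anything.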
  28. No Sensible Defaults Anywhere
  29. No Sensible Defaults Anywhere • CoreOS uses json-file by default. • Debian(s) use json-file by default. • RHEL(s) use json-file by default. • Everyone defaults to something inappropriate for production.
  30. Breathe. </high horse>
  31. Logspout • We tried to make logspout happen. • Problems with connection handling, etc. • Don’t use json-file or logspout.
  32. Disk
  33. Disk • You really need to manage disk space carefully. • Remove stale images. • Remove stopped containers. • Don’t store tons of state locally.
  34. No Really... • rm -rf /var/lib/docker • docker ps -aq | xargs docker rm • docker images -q | xargs docker rmi -f
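
A gentler variant of the last two commands on slide 34 targets only exited containers and dangling images. This is a sketch, assuming GNU `xargs` (for `-r`, which skips the command when there is no input); the function names and the `DOCKER` override are hypothetical.

```shell
# Hypothetical helpers for routine cleanup. DOCKER can be overridden for testing.

# Remove containers that have exited, leaving running ones alone.
remove_stopped_containers() {
  "${DOCKER:-docker}" ps -aq --filter status=exited \
    | xargs -r "${DOCKER:-docker}" rm
}

# Remove untagged (dangling) images left behind by rebuilds and pulls.
remove_dangling_images() {
  "${DOCKER:-docker}" images -q --filter dangling=true \
    | xargs -r "${DOCKER:-docker}" rmi
}
```

Both functions are safe to run from cron on a live host, unlike `rm -rf /var/lib/docker`.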
  35. Memory Allocation • Declare the resources you intend to use. • This is important to do. • Pick a scheduler that supports this.
  36. You Still Have Work to Do • V8 and JVM will allocate heap until they are OOM killed. • Go does not adhere to resource limits. • Nothing adheres to resource limits but the kernel.
  37. Memory Management Settings • V8 and JVM allow you to control memory allocation. • Max heap isn’t everything. • If you don’t set max heap, they will allocate heap until the kernel kills them. • Plan for this or don’t.
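
To make the heap settings concrete: one common approach is to size the runtime's max heap below the container's memory limit, so there is headroom for stacks, native buffers, and the runtime itself. A minimal sketch; the 75% ratio, the function name, and the image names are assumptions for illustration, not figures from the talk.

```shell
# Hypothetical helper: given a container memory limit in MB, print a max-heap
# size in MB that leaves 25% headroom for non-heap memory (assumed ratio).
heap_mb() {
  limit_mb="$1"
  echo $(( limit_mb * 75 / 100 ))
}

# Example usage with a 512 MB limit (image names are placeholders):
#   limit=512
#   heap=$(heap_mb "$limit")
#   # JVM: -Xmx caps the Java heap
#   docker run -d --memory="${limit}m" yourOrg/jvmService:3f2c1ab \
#     java -Xmx"${heap}m" -jar /service.jar
#   # Node.js (V8): --max-old-space-size is given in MB
#   docker run -d --memory="${limit}m" yourOrg/nodeService:3f2c1ab \
#     node --max-old-space-size="$heap" /service.js
```

The right ratio depends on the workload; measure non-heap usage before settling on one.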
  38. Thanks! • Thanks for coming! • Thanks for listening! • Questions, comments? • @grepory on Twitter
  39. Operators are Standing By