A million containers isn't cool

You know what's cool? A hundred containers.

A lot of us ship software multiple times a day—but what goes into that, and how do we make it happen reliably?

In this talk, we'll look at the deployment of a typical web app/API. We'll focus on build artifacts - the things we actually ship to production - and why it's helpful to make their build and deployment processes consistent.

From there, we'll move on to containers, Docker in particular, with a focus on container images and how they can get us to that goal.

We'll deliberately sidestep the world of distributed schedulers—Mesos, Kubernetes, and friends. They're great tools when you need to manage a growing fleet of computers, but running them doesn't come without an operational cost.

By following the example of a production system that's built this way—containerised apps without a distributed scheduler—we'll explore what it takes to move apps into containers, and how doing so might shape your infrastructure.

To wrap up, we'll look at some alternatives that could serve you well if Docker isn't the right fit for your organisation.

Chris Sinjakli

March 13, 2017

Transcript

  1. A million containers
    isn’t cool

  2. You know what’s
    cool?

  3. A hundred
    containers

  4. A million containers isn’t cool
    You know what’s cool? A hundred containers.
    @ChrisSinjo

  5. GOCARDLESS

  9. We aren’t
    #webscale
    (#sorrynotsorry)

  10. So why do we care
    about containers?

  11. POST /cash/monies HTTP/1.1
    { amount: 100 }

  12. High per-request

  13. Reliability is

  14. Deploying software reliably

  15. Deploying software reliably
    How containers can help

  16. Deploying software reliably
    How containers can help
    Other options

  17. First things first:
    deployment artifacts

  18. Source code

    Something you can
    put on a server

  19. A .jar file
    A statically linked binary
    An OS package (.deb, .rpm)

  20. Some languages
    start on the back foot

  22. Capistrano:
    a typical Ruby flow

  23. On each server:

  24. On each server:
    - Clone source

  25. On each server:
    - Clone source
    - Build dependencies

  26. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations

  27. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets

  28. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets
    - SIGHUP

  29. What’s wrong here?

  30. Hope

  31. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets
    - SIGHUP

  32. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets
    - SIGHUP
    Hope

  33. $ bundle install

    Building nokogiri using system libraries.
    Gem::Ext::BuildError: ERROR: Failed to build
    gem native extension.

  34. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets
    - SIGHUP
    Hope

  35. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets
    - SIGHUP
    Hope
    Hope

  36. On each server:
    - Clone source
    - Build dependencies
    - Run schema migrations
    - Build static assets
    - SIGHUP
    Hope
    Hope
    Hope

  37. “Hope is not a strategy.”
    – Traditional SRE saying
    https://landing.google.com/sre/book.html

  38. There’s something
    else

  39. Applications
    don’t
    run in a
    vacuum

  40. Ruby app

  41. Ruby app
    Ruby dependencies

  42. Ruby app
    Ruby dependencies
    Native libraries

  43. Ruby app
    Ruby dependencies
    Native libraries

  44. Ruby app
    Ruby dependencies
    Native libraries
    Nokogiri
    libxml2

  45. Ruby app
    Ruby dependencies
    Native libraries
    Nokogiri
    libxml2

  46. Ruby app
    Ruby dependencies
    Native libraries
    Nokogiri
    libxml2

  47. How do we install
    software?

  48. Nokogiri
    libxml2

  49. Nokogiri: $ bundle install
    libxml2

  50. Nokogiri: $ bundle install
    libxml2: $ apt-get install libxml2

  51. Nokogiri: App’s source repository
    libxml2: Chef or whatever

  52. That seems
    inconvenient…

  53. Container images:
    totally a thing

  54. Nokogiri: App’s source repository
    libxml2: Chef or whatever

  55. Nokogiri: App’s source repository
    libxml2: App’s source repository

  56. This is why most
    people care about
    Docker

  57. namespaces
    cgroups
    images

  58. namespaces
    cgroups
    images

  59. https://twitter.com/benjiweber/status/770306615555854336

  60. Deploying software reliably
    How containers can help
    Other options

  61. Deploying software reliably
    How containers can help
    Other options

  62. So what did we
    care about?

  63. Uniform deployment

  64. Uniform deployment
    Based around an artifact

  65. Uniform deployment
    Based around an artifact
    Fail early
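
One way to read these three aims together: build the artifact once, before any server is touched, so a failed dependency build stops the release instead of leaving servers half-updated. The following is a minimal sketch under that reading. The `build` and `deploy` hooks are hypothetical and do not come from the talk.

```python
from typing import Callable, Iterable, List


def release(
    build: Callable[[], str],
    deploy: Callable[[str, str], None],
    servers: Iterable[str],
) -> List[str]:
    """Build one artifact, then ship that same artifact to every server.

    `build` and `deploy` are hypothetical hooks: `build` returns an artifact
    identifier (for example an image tag) and raises if the build fails.
    """
    # Fail early: if this raises, no server has been touched yet.
    artifact = build()

    deployed = []
    for server in servers:
        # Every server receives the identical, already-built artifact.
        deploy(server, artifact)
        deployed.append(server)
    return deployed
```

Contrast this with the per-server flow earlier in the deck, where each server rebuilds dependencies itself and any one of them can fail partway through.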

  66. And what didn’t we
    care about?

  67. Know what your
    aims aren’t

  68. Distributed
    schedulers

  69. compute compute compute !!!
    compute
    compute

  70. Scheduler
    compute compute compute !!!
    compute
    compute

  71. compute compute compute !!!
    compute
    compute
    Scheduler
    App
    App
    App

  72. compute compute compute !!!
    compute
    compute
    Scheduler
    App
    App
    App

  73. compute compute compute !!!
    compute
    compute
    Scheduler
    App
    App App

  74. Nothing comes for
    free

  75. Kubernetes means:

  76. Kubernetes means:
    — a distributed scheduler

  77. Kubernetes means:
    — a distributed scheduler
    — cluster DNS

  78. Kubernetes means:
    — a distributed scheduler
    — cluster DNS
    — etcd

  79. Kubernetes means:
    — a distributed scheduler
    — cluster DNS
    — etcd
    — …

  80. Nothing comes for
    free

  81. We aren’t
    #webscale
    (#sorrynotsorry)

  82. Distributed
    schedulers

  83. Distributed
    schedulers

  84. So what did
    we build?

  85. 3 parts…

  86. Service definitions

  87. A service:

  88. A service:
    — an image

  89. A service:
    — an image
    — environment config

  90. A service:
    — an image
    — environment config
    — command to run

  91. A service:
    — an image
    — environment config
    — command to run
    — limits (memory, CPU)

  92. A service:
    — an image
    — environment config
    — command to run
    — limits (memory, CPU)
    — …
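
The fields named on this slide could be modelled as a small record. This is only a sketch: the field names, units, defaults and the example values are assumptions for illustration, not the format GoCardless used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ServiceDefinition:
    """Hypothetical shape of a service definition; all names are illustrative."""

    image: str                      # container image to run
    command: List[str]              # command to run inside the container
    environment: Dict[str, str] = field(default_factory=dict)  # environment config
    memory_limit_mb: int = 512      # memory limit (assumed unit: MiB)
    cpu_shares: int = 1024          # relative CPU weight (assumed unit)


# Example values are made up, apart from the revision shown later in the deck.
api = ServiceDefinition(
    image="example/app:279d903588",
    command=["bundle", "exec", "puma"],
    environment={"RACK_ENV": "production"},
)
```

Because a definition like this is plain data, it can live in whatever configuration management tool already describes the servers.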

  93. This is
    config management

  94. So we
    used Chef

  95. Chef
    Service A
    Service C
    Service B

  96. Chef
    Service A
    Service C
    Service B
    Compute 1
    Compute 2
    Compute 3

  97. Chef
    Service A
    Service C
    Service B
    Compute 1
    Service A Service B
    Compute 2
    Compute 3
    config

  98. Chef
    Service A
    Service C
    Service B
    Compute 1
    Service A Service B
    Compute 2
    Service B Service C
    Compute 3
    config

  99. Chef
    Service A
    Service C
    Service B
    Compute 1
    Service A Service B
    Compute 2
    Service B Service C
    Compute 3
    Service A Service C
    config

  100. Chef
    Service A
    Service C
    Service B
    Compute 1
    Service A Service B
    Compute 2
    Service B Service C
    Compute 3
    Service A Service C
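
The diagram describes static placement: configuration decides which services run on which machine, with no scheduler involved. As a rough sketch, that placement is just a lookup table. The node and service names below are made up to mirror the diagram.

```python
from typing import Dict, List

# Static placement mirroring the diagram: each compute node lists its services.
PLACEMENT: Dict[str, List[str]] = {
    "compute-1": ["service-a", "service-b"],
    "compute-2": ["service-b", "service-c"],
    "compute-3": ["service-a", "service-c"],
}


def nodes_running(service: str, placement: Dict[str, List[str]]) -> List[str]:
    """Return the nodes configured to run `service`, in sorted order."""
    return sorted(node for node, services in placement.items() if service in services)
```

Here every service appears on two of the three nodes, so losing one machine still leaves a copy of each service running.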

  101. Service definitions

  102. Service definitions
    Single-node orchestration

  103. Enter Conductor

  104. conductor service upgrade
    --id gocardless_app_production
    --revision 279d903588

  105. conductor service upgrade
    --id gocardless_app_production
    --revision 279d903588

  106. conductor service upgrade
    --id gocardless_app_production
    --revision 279d903588

  107. The flow:

  108. The flow:
    — start containers for new version

  109. The flow:
    — start containers for new version
    — wait for health check

  110. The flow:
    — start containers for new version
    — wait for health check
    — rewrite local nginx config

  111. The flow:
    — start containers for new version
    — wait for health check
    — rewrite local nginx config
    — reload nginx

  112. The flow:
    — start containers for new version
    — wait for health check
    — rewrite local nginx config
    — reload nginx
    — stop old containers
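
The five steps above can be sketched as one function. This is not Conductor's code. Each step is passed in as a hypothetical callable, and the timeout behaviour (give up before touching nginx if the new containers never become healthy) is an assumption about what "wait for health check" implies.

```python
import time
from typing import Callable, List


def upgrade(
    start_new: Callable[[], List[str]],
    is_healthy: Callable[[str], bool],
    write_nginx_config: Callable[[List[str]], None],
    reload_nginx: Callable[[], None],
    stop_old: Callable[[], None],
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
) -> List[str]:
    """Run the five steps in order; abort before touching nginx if unhealthy."""
    new_containers = start_new()

    deadline = time.monotonic() + timeout_s
    while not all(is_healthy(c) for c in new_containers):
        if time.monotonic() >= deadline:
            # Old containers still serve traffic, so failing here is safe.
            raise TimeoutError("new containers never became healthy")
        time.sleep(poll_s)

    write_nginx_config(new_containers)
    reload_nginx()
    stop_old()
    return new_containers
```

The ordering is the point: traffic only moves once the new version has passed its health check, and the old containers are stopped last.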

  113. Conductor
    nginx Docker

  114. Conductor
    nginx Docker
    Old

  115. Conductor
    nginx
    traffic
    Old
    traffic
    Docker

  116. Conductor
    nginx
    traffic
    Old
    New
    traffic
    API
    Docker

  117. Conductor
    nginx
    traffic
    Old
    New
    traffic
    health check
    Docker

  118. Conductor
    nginx
    traffic
    Old
    New
    traffic
    config
    Docker

  119. Conductor
    nginx
    traffic
    Old
    New
    traffic
    reload
    Docker

  120. Conductor
    nginx
    traffic
    Old
    New
    traffic
    Docker

  121. Conductor
    nginx
    traffic
    Old
    New
    traffic
    Docker
    API

  122. Conductor
    nginx
    traffic
    New
    traffic
    Docker
    API

  123. Conductor
    nginx
    traffic
    New
    traffic
    Docker

  124. What about
    cron jobs?

  125. conductor cron generate
    --id gocardless_cron_production
    --revision 279d903588

  126. conductor cron generate
    --id gocardless_cron_production
    --revision 279d903588

  127. gocardless/
    ▼ app/
    payment_stuff.rb
    ▶ lib/
    generate-cron

  128. # Clean up expired API tokens
    */30 * * * * scripts/cleanup-api-tokens

  129. # Clean up expired API tokens
    */30 * * * * /usr/local/bin/conductor run
    --id gocardless_cron_production
    --revision 279d903588
    scripts/cleanup-api-tokens
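
The last two slides show a crontab entry before and after generation: the schedule is kept and the command is wrapped in `conductor run`. A minimal sketch of that rewrite follows. The function name and its parsing rules are assumptions, and it only handles plain five-field entries, not `@hourly`-style shortcuts or environment lines.

```python
def wrap_cron_line(line: str, service_id: str, revision: str) -> str:
    """Prefix the command of a five-field crontab entry with `conductor run`.

    Comments and blank lines pass through unchanged.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    # Five schedule fields, then the command (which may contain spaces).
    parts = stripped.split(None, 5)
    if len(parts) < 6:
        raise ValueError(f"not a five-field crontab entry: {line!r}")
    schedule, command = " ".join(parts[:5]), parts[5]
    return (
        f"{schedule} /usr/local/bin/conductor run "
        f"--id {service_id} --revision {revision} {command}"
    )
```

Applied to `*/30 * * * * scripts/cleanup-api-tokens` with the ID and revision from the slides, this produces the wrapped entry shown above as a single line.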

  130. Service definitions
    Single-node orchestration

  131. Service definitions
    Single-node orchestration
    A way to trigger deploys

  132. Keep it
    boring

  133. Keep it
    in Capistrano

  134. Capistrano
    Legacy
    infra
    deploy

  135. Capistrano
    Legacy
    infra
    deploy
    New
    infra
    deploy

  136. Help developers
    do their job

  137. $

  138. 1 thing
    missing

  139. “Hey, this process died.”
    – a computer

  140. Process Process Process
    Supervisor

  141. Process Process Process
    Supervisor

  142. Process Process Process
    Supervisor

  143. Process Process Process
    Supervisor
    start

  144. Some supervisors:

  145. Some supervisors:
    — Upstart

  146. Some supervisors:
    — Upstart
    — systemd

  147. Some supervisors:
    — Upstart
    — systemd
    — runit

  148. Those didn’t
    play well with
    Docker

  149. Docker restart
    policies

  150. We didn’t get
    along well

  151. Hard to stop
    or
    Gave up entirely

  152. Hard to stop
    or
    Gave up entirely

  153. We built a
    process supervisor

  154. conductor supervise

  155. Specifically:

  156. Specifically:
    — check number of containers

  157. Specifically:
    — check number of containers
    — health check each container

  158. Specifically:
    — check number of containers
    — health check each container

  159. Specifically:
    — check number of containers
    — health check each container
    — restart if either fails

  160. Specifically:
    — check number of containers
    — health check each container
    — restart if either fails
    — at most every 5 seconds
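
The supervision loop described above might look roughly like this. It is a sketch, not `conductor supervise` itself. The callables are hypothetical, and "at most every 5 seconds" is read here as the interval between checks, which is an assumption.

```python
import time
from typing import Callable, List


def supervise_once(
    desired_count: int,
    list_containers: Callable[[], List[str]],
    is_healthy: Callable[[str], bool],
    restart: Callable[[], None],
) -> bool:
    """Run one check; restart and return True if either check fails."""
    containers = list_containers()
    count_ok = len(containers) == desired_count
    health_ok = all(is_healthy(c) for c in containers)
    if count_ok and health_ok:
        return False
    restart()
    return True


def supervise(check: Callable[[], bool], interval_s: float = 5.0, iterations: int = 0) -> None:
    """Call `check` repeatedly, starting checks at most once per `interval_s`.

    `iterations=0` means loop forever; a positive value is handy for tests.
    """
    done = 0
    while iterations == 0 or done < iterations:
        started = time.monotonic()
        check()
        done += 1
        # Sleep off whatever remains of the interval.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Splitting the single check from the loop keeps the decision logic testable without waiting on real time.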

  161. # service conductor-supervise stop

  162. We don’t want this
    piece of software

  163. $

  164. Deploying software reliably
    How containers can help
    Other options

  165. Deploying software reliably
    How containers can help
    Other options

  166. systemd + rkt
    or
    VMs + autoscaling

  167. Supervisor: systemd
    Containers: rkt

  168. Supervisor: systemd
    Containers: rkt

  169. To fit our usage:

  170. To fit our usage:
    — Conductor generates systemd config

  171. To fit our usage:
    — Conductor generates systemd config
    — systemd manages processes

  172. To fit our usage:
    — Conductor generates systemd config
    — systemd manages processes
    — Delete conductor supervise

  173. To fit our usage:
    — Conductor generates systemd config
    — systemd manages processes
    — Delete conductor supervise
    — HTTP health checks???
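
Generating systemd config could be as simple as rendering a unit file from a service definition. The sketch below is an illustration only: the function name is hypothetical, and the command line is passed in whole so that no particular rkt invocation is assumed. It does nothing about HTTP health checks, which this slide leaves as an open question.

```python
def render_unit(description: str, exec_start: str) -> str:
    """Render a minimal systemd service unit that restarts the process on exit.

    `exec_start` is the full command line systemd should run; what it contains
    (for example a container runtime invocation) is left to the caller.
    """
    return "\n".join(
        [
            "[Unit]",
            f"Description={description}",
            "",
            "[Service]",
            f"ExecStart={exec_start}",
            "Restart=always",
            "",
            "[Install]",
            "WantedBy=multi-user.target",
            "",
        ]
    )
```

With `Restart=always`, systemd takes over the job of noticing that a process died and starting it again, which is what lets a custom supervisor be deleted.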

  174. systemd + rkt
    or
    VMs + autoscaling

  175. Supervisor: autoscaling
    Containers → VMs

  176. Supervisor: autoscaling
    Containers → VMs

  179. Meta-thoughts

  180. Meta-thoughts

  181. Some reckons

  182. Introduce
    new infrastructure
    where failure
    is survivable

  183. Non-critical batch jobs

    Background workers

    API servers

  184. Goal state is
    what matters

  185. Everything might
    change before your
    next method call

  186. The system isn’t
    interesting
    without context

  187. Start with why

  188. Thank you
    ❤
    @ChrisSinjo
    @GoCardlessEng

  189. We’re hiring
    ❤
    @ChrisSinjo
    @GoCardlessEng

  190. Questions?
    ❤
    @ChrisSinjo
    @GoCardlessEng
