
A million containers isn't cool

You know what's cool? A hundred containers.

A lot of us ship software multiple times a day—but what goes into that, and how do we make it happen reliably?

In this talk, we'll look at the deployment of a typical web app/API. We'll focus on build artifacts - the things we actually ship to production - and why it's helpful to make their build and deployment processes consistent.

From there, we'll move on to containers—Docker in particular—with a focus on container images and how they can get us to that goal.

We'll deliberately sidestep the world of distributed schedulers—Mesos, Kubernetes, and friends. They're great tools when you need to manage a growing fleet of computers, but running them doesn't come without an operational cost.

By following the example of a production system that's built this way—containerised apps without a distributed scheduler—we'll explore what it takes to move apps into containers, and how doing so might shape your infrastructure.

To wrap up, we'll look at some alternatives that could serve you well if Docker isn't the right fit for your organisation.

Chris Sinjakli

March 13, 2017


Transcript

  1. A million containers isn’t cool

  2. You know what’s cool?

  3. A hundred containers

  4. A million containers isn’t cool You know what’s cool? A

    hundred containers. @ChrisSinjo
  5. GOCARDLESS

  6. None
  7. None
  8. None
  9. We aren’t #webscale (#sorrynotsorry)

  10. So why do we care about containers?

  11. POST /cash/monies HTTP/1.1 { amount: 100 }

  12. High per-request

  13. Reliability is

  14. Deploying software reliably

  15. Deploying software reliably How containers can help

  16. Deploying software reliably How containers can help Other options

  17. First things first: deployment artifacts

  18. Source code ↓ Something you can put on a server

  19. A .jar file A statically linked binary An OS package

    (.deb, .rpm)
  20. Some languages start on the back foot

  21. None
  22. Capistrano: a typical Ruby flow

  23. On each server:

  24. On each server: - Clone source

  25. On each server: - Clone source - Build dependencies

  26. On each server: - Clone source - Build dependencies -

    Run schema migrations
  27. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets
  28. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets - SIGHUP
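The per-server flow above can be sketched in plain Ruby (this is not real Capistrano code; step commands and the runner are illustrative) to make the failure surface explicit: every step runs on every host, so each one is a separate chance to fail mid-deploy.

```ruby
# Illustrative sketch of the Capistrano-style per-server deploy flow.
# Each step is a remote command that can fail independently on each host.
STEPS = {
  "clone source"        => "git clone git@example.com:app.git",
  "build dependencies"  => "bundle install --deployment",
  "run migrations"      => "bundle exec rake db:migrate",
  "build static assets" => "bundle exec rake assets:precompile",
  "restart"             => "kill -HUP $(cat unicorn.pid)",
}

def deploy!(server, runner)
  STEPS.each do |name, cmd|
    runner.call(server, cmd) or raise "#{name} failed on #{server}"
  end
end

# Five steps times N servers: N separate chances for `bundle install`
# to hit a native-extension build error partway through a deploy.
servers = %w[app1 app2 app3]
servers.each { |s| deploy!(s, ->(_srv, _cmd) { true }) }
puts "#{servers.size * STEPS.size} remote steps for #{servers.size} servers"
```

The point isn't the exact commands: it's that the build work is repeated per server at deploy time, rather than done once up front.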
  29. What’s wrong here?

  30. Hope

  31. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets - SIGHUP
  32. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets - SIGHUP Hope
  33. $ bundle install … Building nokogiri using system libraries. Gem::Ext::BuildError:

    ERROR: Failed to build gem native extension.
  34. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets - SIGHUP Hope
  35. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets - SIGHUP Hope Hope
  36. On each server: - Clone source - Build dependencies -

    Run schema migrations - Build static assets - SIGHUP Hope Hope Hope
  37. – Traditional SRE saying “Hope is not a strategy.” https://landing.google.com/sre/book.html

  38. There’s something else

  39. Applications don’t run in a vacuum

  40. Ruby app

  41. Ruby app Ruby dependencies

  42. Ruby app Ruby dependencies Native libraries

  43. Ruby app Ruby dependencies Native libraries

  44. Ruby app Ruby dependencies Native libraries Nokogiri libxml2

  45. Ruby app Ruby dependencies Native libraries Nokogiri libxml2

  46. Ruby app Ruby dependencies Native libraries Nokogiri libxml2

  47. How do we install software?

  48. Nokogiri libxml2

  49. Nokogiri libxml2 $ bundle install

  50. Nokogiri libxml2 $ apt-get install libxml2 $ bundle install

  51. Nokogiri libxml2 Chef or whatever App’s source repository

  52. That seems inconvenient…

  53. Container images: totally a thing

  54. Nokogiri libxml2 Chef or whatever App’s source repository

  55. Nokogiri libxml2 App’s source repository App’s source repository

  56. This is why most people care about Docker
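A minimal Dockerfile sketch of what slide 55 describes — the base image, paths, and commands here are illustrative, not GoCardless's actual image — baking both layers from slide 50 into a single artifact:

```dockerfile
# Illustrative only: bake the native library and the gems into one image,
# so every server pulls an identical, already-built artifact.
FROM ruby:2.3

# The apt layer (slide 50's `apt-get install libxml2`)...
RUN apt-get update && \
    apt-get install -y libxml2-dev && \
    rm -rf /var/lib/apt/lists/*

# ...and the bundler layer (`bundle install`), built once at image-build
# time instead of on every server at deploy time.
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install --deployment

COPY . .
CMD ["bundle", "exec", "unicorn"]
```

If the nokogiri native extension fails to build, it fails once, in CI, before anything reaches a server — that's the "fail early" property the talk asks for.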

  57. namespaces cgroups images

  58. namespaces cgroups images

  59. https://twitter.com/benjiweber/status/770306615555854336

  60. Deploying software reliably How containers can help Other options

  61. Deploying software reliably How containers can help Other options

  62. So what did we care about?

  63. Uniform deployment

  64. Uniform deployment Based around an artifact

  65. Uniform deployment Based around an artifact Fail early

  66. And what didn’t we care about?

  67. Know what your aims aren’t

  68. Distributed schedulers

  69. compute compute compute !!! compute compute

  70. Scheduler compute compute compute !!! compute compute

  71. compute compute compute !!! compute compute Scheduler App App App

  72. compute compute compute !!! compute compute Scheduler App App App

  73. compute compute compute !!! compute compute Scheduler App App App

  74. Nothing comes for free

  75. Kubernetes means:

  76. Kubernetes means: — a distributed scheduler

  77. Kubernetes means: — a distributed scheduler — cluster DNS

  78. Kubernetes means: — a distributed scheduler — cluster DNS —

    etcd
  79. Kubernetes means: — a distributed scheduler — cluster DNS —

    etcd — …
  80. Nothing comes for free

  81. We aren’t #webscale (#sorrynotsorry)

  82. Distributed schedulers

  83. Distributed schedulers

  84. So what did we build?

  85. 3 parts…

  86. Service definitions

  87. A service:

  88. A service: — an image

  89. A service: — an image — environment config

  90. A service: — an image — environment config — command

    to run
  91. A service: — an image — environment config — command

    to run — limits (memory, CPU)
  92. A service: — an image — environment config — command

    to run — limits (memory, CPU) — …
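The fields listed above, sketched as a Ruby hash. The real definitions live in Chef; this exact schema is illustrative, not conductor's actual format, and all values are made up.

```ruby
# Hypothetical service definition mirroring the fields from the slides:
# an image, environment config, a command to run, and resource limits.
SERVICE = {
  id:      "gocardless_app_production",
  image:   "registry.example.com/gocardless/app:279d903588",
  env:     { "RAILS_ENV" => "production" },
  command: "bundle exec unicorn -c config/unicorn.rb",
  limits:  { memory: "1G", cpus: 2 },
}.freeze

puts SERVICE.fetch(:id)
```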
  93. This is config management

  94. So we used Chef

  95. Chef Service A Service C Service B

  96. Chef Service A Service C Service B Compute 1 Compute

    2 Compute 3
  97. Chef Service A Service C Service B Compute 1 Service

    A Service B Compute 2 Compute 3 config
  98. Chef Service A Service C Service B Compute 1 Service

    A Service B Compute 2 Service B Service C Compute 3 config
  99. Chef Service A Service C Service B Compute 1 Service

    A Service B Compute 2 Service B Service C Compute 3 Service A Service C config
  100. Chef Service A Service C Service B Compute 1 Service

    A Service B Compute 2 Service B Service C Compute 3 Service A Service C
  101. Service definitions

  102. Service definitions Single-node orchestration

  103. Enter Conductor

  104. conductor service upgrade --id gocardless_app_production --revision 279d903588

  105. conductor service upgrade --id gocardless_app_production --revision 279d903588

  106. conductor service upgrade --id gocardless_app_production --revision 279d903588

  107. The flow:

  108. The flow: — start containers for new version

  109. The flow: — start containers for new version — wait

    for health check
  110. The flow: — start containers for new version — wait

    for health check — rewrite local nginx config
  111. The flow: — start containers for new version — wait

    for health check — rewrite local nginx config — reload nginx
  112. The flow: — start containers for new version — wait

    for health check — rewrite local nginx config — reload nginx — stop old containers
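The five-step flow above can be sketched with the moving parts injected as lambdas (real conductor talks to Docker and nginx; these names are illustrative). The key property: old containers are only stopped after the new ones pass their health check and nginx has been reloaded onto them.

```ruby
# Sketch of conductor's single-node upgrade flow. If the health check
# fails, we raise before touching nginx: the old version keeps serving.
def upgrade(new_containers, old_containers, ops)
  new_containers.each { |c| ops[:start].call(c) }
  unless new_containers.all? { |c| ops[:healthy?].call(c) }
    raise "health check failed: old containers left serving traffic"
  end
  ops[:rewrite_nginx].call(new_containers)
  ops[:reload_nginx].call
  old_containers.each { |c| ops[:stop].call(c) }
end

# Record the order of operations with stub lambdas.
log = []
ops = {
  start:         ->(c)   { log << [:start, c] },
  healthy?:      ->(_c)  { log << [:check]; true },
  rewrite_nginx: ->(_cs) { log << [:rewrite] },
  reload_nginx:  ->      { log << [:reload] },
  stop:          ->(c)   { log << [:stop, c] },
}
upgrade(["new-1"], ["old-1"], ops)
puts log.map(&:first).join(" -> ")
```

Running it prints `start -> check -> rewrite -> reload -> stop` — the same order the following diagrams walk through.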
  113. Conductor nginx Docker

  114. Conductor nginx Docker Old

  115. Conductor nginx traffic Old traffic Docker

  116. Conductor nginx traffic Old New traffic API Docker

  117. Conductor nginx traffic Old New traffic health check Docker

  118. Conductor nginx traffic Old New traffic config Docker

  119. Conductor nginx traffic Old New traffic reload Docker

  120. Conductor nginx traffic Old New traffic Docker

  121. Conductor nginx traffic Old New traffic Docker API

  122. Conductor nginx traffic New traffic Docker API

  123. Conductor nginx traffic New traffic Docker

  124. What about cron jobs?

  125. conductor cron generate --id gocardless_cron_production --revision 279d903588

  126. conductor cron generate --id gocardless_cron_production --revision 279d903588

  127. gocardless/ ▼ app/ payment_stuff.rb ▶ lib/ generate-cron

  128. # Clean up expired API tokens */30 * * *

    * scripts/cleanup-api-tokens
  129. # Clean up expired API tokens */30 * * *

    * /usr/local/bin/conductor run --id gocardless_cron_production --revision 279d903588 scripts/cleanup-api-tokens
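The rewrite shown on slides 128 and 129 is a one-line transformation: wrap each job so it runs via `conductor run` at a pinned revision. A sketch (this is not generate-cron's real code):

```ruby
# Wrap a plain crontab entry so the command runs inside the service's
# container at a specific revision, as on slide 129.
def wrap_cron(schedule, command, id:, revision:)
  "#{schedule} /usr/local/bin/conductor run " \
    "--id #{id} --revision #{revision} #{command}"
end

puts wrap_cron("*/30 * * * *", "scripts/cleanup-api-tokens",
               id: "gocardless_cron_production", revision: "279d903588")
```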
  130. Service definitions Single-node orchestration

  131. Service definitions Single-node orchestration A way to trigger deploys

  132. Keep it boring

  133. Keep it in Capistrano

  134. Capistrano Legacy infra deploy

  135. Capistrano Legacy infra deploy New infra deploy

  136. Help developers do their job

  137. $

  138. 1 thing missing

  139. – a computer “Hey, this process died.”

  140. Process Process Process Supervisor

  141. Process Process Process Supervisor

  142. Process Process Process Supervisor

  143. Process Process Process Supervisor start

  144. Some supervisors:

  145. Some supervisors: — Upstart

  146. Some supervisors: — Upstart — systemd

  147. Some supervisors: — Upstart — systemd — runit

  148. Those didn’t play well with Docker

  149. Docker restart policies

  150. We didn’t get along well

  151. Hard to stop or Gave up entirely

  152. Hard to stop or Gave up entirely

  153. We built a process supervisor

  154. conductor supervise

  155. Specifically:

  156. Specifically: — check number of containers

  157. Specifically: — check number of containers — health check each

    container
  158. Specifically: — check number of containers — health check each

    container
  159. Specifically: — check number of containers — health check each

    container — restart if either fails
  160. Specifically: — check number of containers — health check each

    container — restart if either fails — at most every 5 seconds
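One tick of a supervise loop in the shape described above — illustrative, not conductor's real code: too few containers, or any container failing its health check, triggers a restart, and a minimum interval between restarts prevents restart storms.

```ruby
# A single check of a `conductor supervise`-style loop. Returns :backoff
# if we restarted too recently, :restart if the container count is low
# or any health check fails, and :ok otherwise.
def supervise_tick(expected:, running:, healthy:, last_restart:, now:, min_interval: 5)
  return :backoff if now - last_restart < min_interval
  return :restart if running.size < expected
  return :restart unless running.all? { |c| healthy.call(c) }
  :ok
end

puts supervise_tick(expected: 2, running: %w[a b],
                    healthy: ->(_c) { true }, last_restart: 0, now: 10)
```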
  161. # service conductor-supervise stop

  162. We don’t want this piece of software

  163. $

  164. Deploying software reliably How containers can help Other options

  165. Deploying software reliably How containers can help Other options

  166. systemd + rkt or VMs + autoscaling

  167. Supervisor: systemd Containers: rkt

  168. Supervisor: systemd Containers: rkt

  169. To fit our usage:

  170. To fit our usage: — Conductor generates systemd config

  171. To fit our usage: — Conductor generates systemd config —

    systemd manages processes
  172. To fit our usage: — Conductor generates systemd config —

    systemd manages processes — Delete conductor supervise
  173. To fit our usage: — Conductor generates systemd config —

    systemd manages processes — Delete conductor supervise — HTTP health checks???
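A sketch of the kind of unit conductor could generate per service under this alternative — paths, image names, and limits here are all made up. systemd's `Restart=` replaces `conductor supervise` for process restarts, but systemd has no built-in HTTP health check, hence the open question on the slide.

```ini
# Illustrative systemd unit, generated per service. Not a real
# GoCardless config; names and values are hypothetical.
[Unit]
Description=gocardless_app_production @ 279d903588

[Service]
ExecStart=/usr/bin/rkt run example.com/gocardless-app:279d903588
Restart=on-failure
RestartSec=5
MemoryLimit=1G

[Install]
WantedBy=multi-user.target
```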
  174. systemd + rkt or VMs + autoscaling

  175. Supervisor: autoscaling Containers → VMs

  176. Supervisor: autoscaling Containers → VMs

  177. None
  178. None
  179. Meta-thoughts

  180. Meta-thoughts

  181. Some reckons

  182. Introduce new infrastructure where failure is survivable

  183. Non-critical batch jobs ↓ Background workers ↓ API servers

  184. Goal state is what matters

  185. Everything might change before your next method call

  186. The system isn’t interesting without context

  187. Start with why

  188. Thank you ❤ @ChrisSinjo @GoCardlessEng

  189. We’re hiring ❤ @ChrisSinjo @GoCardlessEng

  190. Questions? ❤ @ChrisSinjo @GoCardlessEng