The OPS side of DEV

I think we as developers need to pay more attention to the operational aspects of our code. Presented at the Crash & Burn conference in Stockholm.

Approx 40 minutes.

Mårten Gustafson

March 02, 2012

Transcript

  1. Hello. My name is Mårten Gustafson

  2. ...I used to work here...

  3. ...now I work here...

  4. ...doing mostly this...

  5. ...but spend a fair share of my time looking at

    metrics like this...
  6. AUTOMATE ALL THE THINGS! ...being rabid about this, which makes

    me a fan of...
  7. ...DevOps...and its general concepts...but...

  8. the OPS side of DEV ...I’m talking about this

  9. OpsDev ...I think: we (as developers) need to think about

    this!
  10. develop for operations * I think we need to get

    better at this
  11. develop for production * I think we need to focus

    on this, which is why...
  12. ...I’ll start off with one of the most boring things

    to most developers...
  13. logging

  14. * Huge files * Messy log format * Hard to

    filter * Hard to correlate * Might as well...
  15. /bin/my-awesome-service > /dev/null 2>&1

  16. surprisingly hard

  17. or... we’re surprisingly bad

  18. so, logging

  19. 1. do it

  20. framework pick a framework that: * makes sense * is a

    de facto standard? * is flexible * is lightweight * is easy to use
  21. consistent try to log in a consistent manner * think

    of log messages in terms of operation * what’s an error (should somebody be woken up?) * what’s a trace (who’s the audience?) * etc
  22. 2. rotation & retention

  23. resources are finite * rotate your log files * put

    an upper bound on the size of one log file
  24. define window of interest * for the local disk: toss

    anything that’s older than X, or: * compress (and then toss when they’re even older) * will you ever look in compressed log files?
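The two rotation rules above (cap the size of one file, cap how many files are kept) can be sketched with nothing but the JDK's `java.util.logging.FileHandler`. This is only an illustration under assumed values: the limits, the `service.%g.log` file name and the class name are made up for the example, and a real service would more likely configure this in whatever logging framework it already uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.stream.Stream;

public class RotatingLogExample {
    // Hypothetical limits for illustration: keep at most 5 files of roughly 1 KiB each.
    static final int MAX_BYTES_PER_FILE = 1024;
    static final int MAX_FILES = 5;

    /** Writes {@code messages} log lines into {@code dir} and returns how many log files remain. */
    static long writeAndCount(Path dir, int messages) throws IOException {
        // %g is replaced by the generation number; generation 0 is the newest file.
        String pattern = dir.resolve("service.%g.log").toString();
        FileHandler handler = new FileHandler(pattern, MAX_BYTES_PER_FILE, MAX_FILES, true);
        handler.setFormatter(new SimpleFormatter());

        Logger logger = Logger.getLogger("rotation.example");
        logger.setUseParentHandlers(false); // keep the demo output off the console
        logger.addHandler(handler);
        try {
            for (int i = 0; i < messages; i++) {
                logger.info("message number " + i);
            }
        } finally {
            logger.removeHandler(handler);
            handler.close();
        }
        // Older generations beyond MAX_FILES have been discarded by the handler.
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".log")).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rotation-demo");
        System.out.println("log files on disk: " + writeAndCount(dir, 1000));
    }
}
```

The handler rotates only after a write pushes a file past the limit, so a single file can end up slightly larger than `MAX_BYTES_PER_FILE`; the bound on total disk usage is approximate, not exact.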
  25. 3. formatting

  26. 0 [main] INFO Main - foo
      0 [main] WARN Main - bar
      0 [doer] ERROR Worker - gah
      java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
  27. 0 [main] INFO Main - foo
      0 [main] WARN Main - bar
      0 [doer] ERROR Worker - gah
      java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
      0 [main] INFO Main - foo
      0 [main] WARN Main - bar
      0 [doer] ERROR Worker - gah
      java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
      0 [main] INFO Main - foo
      0 [main] WARN Main - bar
      0 [doer] ERROR Worker - gah
      java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
      0 [main] INFO Main - foo
      0 [main] WARN Main - bar
      0 [doer] ERROR Worker - gah
      java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
  28. easy on the eyes * aligned (easier on the eyes)

  29. easy on the tools * tail & grep friendly

  30. INFO  [2012-02-25 20:24:03] foo.Main - foo
      WARN  [2012-02-25 20:24:03] foo.Main - bar
      ERROR [2012-02-25 20:24:03] foo.Worker - gah
      ! java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
  31. INFO  [2012-02-25 20:24:03] foo.Main - foo
      WARN  [2012-02-25 20:24:03] foo.Main - bar
      ERROR [2012-02-25 20:24:03] foo.Worker - gah
      ! java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
      INFO  [2012-02-25 20:24:03] foo.Main - foo
      WARN  [2012-02-25 20:24:03] foo.Main - bar
      ERROR [2012-02-25 20:24:03] foo.Worker - gah
      ! java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
      INFO  [2012-02-25 20:24:03] foo.Main - foo
      WARN  [2012-02-25 20:24:03] foo.Main - bar
      ERROR [2012-02-25 20:24:03] foo.Worker - gah
      ! java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
      INFO  [2012-02-25 20:24:03] foo.Main - foo
      WARN  [2012-02-25 20:24:03] foo.Main - bar
      ERROR [2012-02-25 20:24:03] foo.Worker - gah
      ! java.lang.NullPointerException: gah
      ! at Doer.worker(Doer.java:13)
  32. https://github.com/codahale/logula * have a look at this
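For readers not using Logula, a layout in the same spirit (level first, padded to a fixed width, then timestamp, logger and message, with exception lines prefixed by `!` so they are easy to grep) could be approximated with a custom `java.util.logging.Formatter`. This is a rough sketch, not Logula's implementation: the class name `AlignedFormatter` is invented here, and the level names are the JDK's (`WARNING`, `SEVERE`) instead of the `WARN` and `ERROR` shown on the slides.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class AlignedFormatter extends Formatter {
    private final DateTimeFormatter timestamp;

    public AlignedFormatter(ZoneId zone) {
        this.timestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(zone);
    }

    @Override
    public String format(LogRecord record) {
        StringBuilder out = new StringBuilder();
        // Pad the level to 7 characters (the longest JDK level name is WARNING)
        // so that timestamps and logger names line up in a column.
        out.append(String.format("%-7s [%s] %s - %s%n",
                record.getLevel().getName(),
                timestamp.format(Instant.ofEpochMilli(record.getMillis())),
                record.getLoggerName(),
                formatMessage(record)));
        Throwable thrown = record.getThrown();
        if (thrown != null) {
            // Prefix every exception line with "! " so `grep '^!'` finds stack traces.
            out.append("! ").append(thrown).append(System.lineSeparator());
            for (StackTraceElement frame : thrown.getStackTrace()) {
                out.append("! at ").append(frame).append(System.lineSeparator());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LogRecord record = new LogRecord(Level.WARNING, "bar");
        record.setLoggerName("foo.Main");
        System.out.print(new AlignedFormatter(ZoneId.systemDefault()).format(record));
    }
}
```

Because every record starts with the level in a fixed column, a plain `grep '^SEVERE'` or `tail -f ... | grep '^!'` works without any parsing.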

  33. 4. destinations

  34. * when this is your reality, you don’t really want

    log files only on the local machine’s disk
  36. to name a few

  37. SMTP file syslog SQL AMQP IRC XMPP to name a

    few
  38. critical = “real time”

  40. audit = remote + restricted access

  41. mix and match

  42. always fall back on a local file (fallacies of distributed computing) *

    not SAN, NFS, NAS, etc.
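One way to picture "always fall back on a local file" is a handler that tries the remote destination first and hands the record to a local handler when that fails. The sketch below uses `java.util.logging` and makes a loud assumption: the primary handler signals failure by throwing. Many real handlers report errors to an `ErrorManager` instead, so a production version would need to hook into that. `FallbackHandler` and `InMemoryHandler` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class FallbackHandler extends Handler {
    private final Handler primary;  // e.g. a remote destination (syslog, AMQP, ...)
    private final Handler fallback; // e.g. a FileHandler writing to a local disk

    public FallbackHandler(Handler primary, Handler fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        try {
            primary.publish(record);
        } catch (RuntimeException e) {
            // Assumption: the primary handler throws when its destination is unreachable.
            fallback.publish(record);
        }
    }

    @Override
    public void flush() {
        try { primary.flush(); } catch (RuntimeException ignored) { }
        fallback.flush();
    }

    @Override
    public void close() {
        try { primary.close(); } catch (RuntimeException ignored) { }
        fallback.close();
    }

    /** Stand-in destination for the demo: stores records, or throws when marked as down. */
    public static class InMemoryHandler extends Handler {
        public final List<LogRecord> records = new ArrayList<>();
        public volatile boolean down;

        @Override
        public void publish(LogRecord record) {
            if (down) {
                throw new IllegalStateException("destination unavailable");
            }
            records.add(record);
        }

        @Override public void flush() { }
        @Override public void close() { }
    }

    public static void main(String[] args) {
        InMemoryHandler remote = new InMemoryHandler();
        InMemoryHandler local = new InMemoryHandler();
        FallbackHandler handler = new FallbackHandler(remote, local);
        handler.publish(new LogRecord(Level.INFO, "remote is up"));
        remote.down = true;
        handler.publish(new LogRecord(Level.SEVERE, "remote is down"));
        System.out.println("remote=" + remote.records.size() + " local=" + local.records.size());
    }
}
```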
  43. (beware of sensitive data) * security sensitive: keys, passwords, etc

    * integrity sensitive: whatever your users might provide that’s not for everyone’s eyes
  44. 5. separation

  45. we usually log most things

  46. we usually don’t separate

  47. UTILIZE ALL THE CONTEXTS!

  48. multiple logs & context logs

  49. * Look at the SiftingAppender in logback-classic for an example

  50. traditional log * Look at the SiftingAppender in logback-classic for

    an example
  51. traditional log userid * Look at the SiftingAppender in logback-classic

    for an example
  52. traditional log userid session id * Look at the SiftingAppender

    in logback-classic for an example
  53. 6. configuration

  54. sane defaults * location * rotation

  55. per environment * have a configuration that automatically adapts to

    the environment * log everything to stdout in local development * log everything to file in test * log X to Y and Z to FOO in prod
  56. re:configurable * don’t require a deploy to change a log

    level * provide an API * use JMX * so that you can tweak logging (enable tracing) right in production when you need to
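As a sketch of the "use JMX" bullet, the log level of a `java.util.logging` logger can be exposed as a writable attribute on a small MXBean, so that jconsole or any other JMX client can flip it at runtime. The bean, its `ObjectName` and the class names below are invented for illustration. The JDK also ships `java.lang.management.PlatformLoggingMXBean`, which offers similar control for `java.util.logging` out of the box, and other logging frameworks tend to have their own equivalents.

```java
import java.lang.management.ManagementFactory;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LogLevelJmxExample {
    /** Management interface; the MXBean suffix tells JMX how to expose it. */
    public interface LogLevelControlMXBean {
        String getLevel();
        void setLevel(String levelName);
    }

    public static class LogLevelControl implements LogLevelControlMXBean {
        private final Logger logger;

        public LogLevelControl(Logger logger) {
            this.logger = logger;
        }

        @Override
        public String getLevel() {
            Level level = logger.getLevel();
            return level == null ? "(inherited)" : level.getName();
        }

        @Override
        public void setLevel(String levelName) {
            // Level.parse accepts names such as "INFO", "FINE" or "SEVERE".
            logger.setLevel(Level.parse(levelName));
        }
    }

    /** Registers a control bean for the logger under the given (hypothetical) object name. */
    static ObjectName register(Logger logger, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(new LogLevelControl(logger), objectName);
        return objectName;
    }

    public static void main(String[] args) throws Exception {
        Logger logger = Logger.getLogger("foo.Worker");
        ObjectName name = register(logger, "example.logging:type=LogLevelControl,name=foo.Worker");

        // A JMX client would do this remotely; here it is done in-process.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.setAttribute(name, new Attribute("Level", "FINE"));
        System.out.println("level is now " + logger.getLevel());
    }
}
```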
  58. metrics

  59. let your code speak

  60. INSTRUMENT ALL THE CODE!

  61. meters

  62. counters meters timers gauges histograms
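To make two of these instrument types concrete, here is a deliberately tiny counter and timer built only on `java.util.concurrent`. It is not the API of the Metrics library referenced in this talk; `TinyMetrics` and its method names are made up, and a real library adds rates, percentiles and reporting on top of the same idea.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class TinyMetrics {
    /** Counts events, e.g. requests served or errors seen. */
    public static final class Counter {
        private final LongAdder value = new LongAdder();

        public void inc() { value.increment(); }
        public long count() { return value.sum(); }
    }

    /** Measures how often a piece of code runs and how long it takes. */
    public static final class Timer {
        private final LongAdder calls = new LongAdder();
        private final LongAdder totalNanos = new LongAdder();
        private final AtomicLong maxNanos = new AtomicLong();

        public <T> T time(Callable<T> work) throws Exception {
            long start = System.nanoTime();
            try {
                return work.call();
            } finally {
                // Record the duration even when the work throws.
                long elapsed = System.nanoTime() - start;
                calls.increment();
                totalNanos.add(elapsed);
                maxNanos.accumulateAndGet(elapsed, Math::max);
            }
        }

        public long count() { return calls.sum(); }

        public double meanMillis() {
            long n = calls.sum();
            return n == 0 ? 0.0 : totalNanos.sum() / 1e6 / n;
        }

        public double maxMillis() { return maxNanos.get() / 1e6; }
    }

    public static void main(String[] args) throws Exception {
        Counter requests = new Counter();
        Timer lookups = new Timer();
        for (int i = 0; i < 3; i++) {
            requests.inc();
            lookups.time(() -> { Thread.sleep(5); return "row"; });
        }
        System.out.println("requests=" + requests.count()
                + " lookups=" + lookups.count()
                + " meanMillis=" + lookups.meanMillis());
    }
}
```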

  63. EXPOSE ALL THE VALUES!

  64. ...or whatever makes sense for you ...and yes, it’s Comic

    Sans for BAYEUX ...and yes, it’s Helvetica for JMX
  65. JMX JSON XML HTTP XMPP AMQP THRIFT BAYEUX RMI CSV

    ...or whatever makes sense for you ...and yes, it’s Comic Sans for BAYEUX ...and yes, it’s Helvetica for JMX
  66. * put all your values into your tools and services

    * BUT DON’T FORGET THE AD-HOC, LOCAL, USAGE (ie JMX)
  68. self checks!

  69. @Override
      protected Result check() throws Exception {
          if (database.ping()) {
              return Result.healthy();
          }
          return Result.unhealthy("Can't ping database");
      }

    * databases * other services * other dependencies * make them explicitly invokable
  70. trend on them

  71. alert on them

  72. ...render them as markers/buttons/light bulbs/whatever

  73. make instrumentation a habit * just do it

  74. find optimal usage later * you’ll never use it if

    it ain’t there
  76. http://metrics.codahale.com/

  78. packaging * adaptive to different environments

  79. one package * one package, regardless of environment

  80. bundle dependencies

  81. über jar (maven shade plugin) * for example

  82. adaptive configuration * adaptive to different environments

  83. isolated * mocks dependencies with (static) dummy answers

  84. <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-core</artifactId>
        <scope>test</scope>
      </dependency>

    * maven example

  87. <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-core</artifactId>
        <scope>compile</scope>
      </dependency>

    * maven example

  88. 127.0.0.1 * expect everything to be available on 127.0.0.1

  89. test / qa / staging / prod * the usual

    suspects
  90. either detect your environment * ie, bundle configurations for all

    environments
  91. or load externalized configuration * DNS * ZooKeeper * CouchDB

    * Doozer * External property/YML/JSON/whatever files ** in one sane specified location (preferably the working directory)
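The "external property file in one sane location" option can be sketched as defaults bundled with the package, overridden by an optional file in the working directory. Everything named here is hypothetical: the `service.properties` file name, the `db.host` key and the `LayeredConfig` class are placeholders, and the bundled defaults point at 127.0.0.1 to match the isolated mode described earlier.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class LayeredConfig {
    /**
     * Starts from the defaults shipped with the package and lets an external
     * properties file, if present, override individual keys.
     */
    static Properties load(Properties bundledDefaults, Path externalFile) throws IOException {
        Properties config = new Properties();
        config.putAll(bundledDefaults);
        if (Files.isRegularFile(externalFile)) {
            try (InputStream in = Files.newInputStream(externalFile)) {
                config.load(in); // keys in the external file win over the defaults
            }
        }
        return config;
    }

    public static void main(String[] args) throws IOException {
        Properties defaults = new Properties();
        defaults.setProperty("db.host", "127.0.0.1"); // works with no configuration at all

        // One specified location: a file in the working directory (hypothetical name).
        Properties config = load(defaults, Paths.get("service.properties"));
        System.out.println("db.host=" + config.getProperty("db.host"));
    }
}
```

With no external file the package still starts with its defaults, which is the zero-touch behaviour the deck argues for; dropping a file next to it changes only the keys that differ per environment.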
  92. strive for zero-touch configuration * packages should JUST WORK

  94. the operational aspect needs to be an integral part of:

  95. the operational aspect needs to be an integral part of:

    development
  96. the operational aspect needs to be an integral part of:

    design
  97. the operational aspect needs to be an integral part of:

    architecture
  98. the operational aspect needs to be an integral part of:

    reasoning
  99. And therefore, a quick comment on...

  100. *aaS ....this...

  101. ...[whatever] as a service * whatever as a service

  102. OUTSOURCE ALL THE THINGS!

  103. to name some * logging * alerting

  104. Tools! Nice!

  105. (new shiny object syndrome)

  107. aka

  108. fallacies of distributed computing

  110. Fallacies of distributed computing (Peter Deutsch):

    1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn't change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous (James Gosling)
  112. so when using this, we need to seriously consider...

  113. Fallacies of distributed computing (Peter Deutsch):

    1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn't change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous (James Gosling)

    ...this * reliability (overall, geo location, connectivity) * security (communication, retention) * cost (of using, of not being available)
  114. by all means...

  115. ...use services

  116. but not only

  117. don’t bet your operation on their availability
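In code, not betting on an external service's availability usually comes down to two things: a bound on how long you wait, and something sensible to do when the call fails. The sketch below shows that shape with a plain `ExecutorService` and `Future.get` with a timeout. The names `GuardedCall` and `callOrFallback` are invented for this example, and the timeout values are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedCall {
    /**
     * Runs the remote call on the pool and waits at most timeoutMillis for it.
     * If the call is too slow or throws, the fallback value is returned instead.
     */
    static <T> T callOrFallback(ExecutorService pool, Callable<T> remoteCall,
                                long timeoutMillis, T fallback) throws InterruptedException {
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // interrupt the call if it is still running
            return fallback;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Simulated slow external service: takes 2 seconds, we wait 100 ms.
            String answer = callOrFallback(pool, () -> {
                Thread.sleep(2000);
                return "answer from the service";
            }, 100, "locally cached answer");
            System.out.println(answer);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

What the fallback should be (a cached value, a local file, a degraded feature) is a design decision per dependency; the point is only that the decision is made in your code and not left to the provider's uptime.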

  119. responsibility

  120. logging

  121. YOU

  122. metrics

  123. YOU

  124. packaging

  125. YOU

  126. configuration

  127. YOU

  128. sane defaults

  129. YOU

  130. YOU

  131. NOT operations

  132. NOT your hosting provider

  133. NOT your boss

  134. NOT your service provider

  135. NOT your colleague

  136. YOU

  137. develop accordingly

  138. develop for operations

  139. love your operations @martengustafson http://marten.gustafson.pp.se/ marten.gustafson@gmail.com