The OPS side of DEV

I think we, as developers, need to pay more attention to the operational aspects of our code. Presented at the Crash & Burn conference in Stockholm.

Approx 40 minutes.

Mårten Gustafson

March 02, 2012

Transcript

  1. Hello.
    My name is Mårten Gustafson

  2. ...I used to work here...

  3. ...now I work here...

  4. ...doing mostly this...

  5. ...but spend a fair share of my time looking at metrics like this...

  6. AUTOMATE
    ALL THE THINGS!
    ...being rabid about this, which makes me a fan of...

  7. ...DevOps...and its general concepts...but...

  8. the OPS side of DEV
    ...I’m talking about this

  9. OpsDev
    ...I think: we (as developers) need to think about this!

  10. develop for operations
    * I think we need to get better at this

  11. develop for production
    * I think we need to focus on this, which is why...

  12. ...I’ll start off with one of the most boring things to most developers...

  13. logging

  14. * Huge files
    * Messy log format
    * Hard to filter
    * Hard to correlate
    * Might as well...

  15. /bin/my-awesome-service > /dev/null 2>&1

  16. surprisingly hard

  17. or...
    we’re surprisingly bad

  18. so, logging

  19. 1. do it

  20. framework
    pick a framework that:
    * makes sense
    * is de-facto standard?
    * is flexible
    * is lightweight
    * is easy to use

  21. consistent
    try to log in a consistent manner
    * think of log messages in terms of operations
    * what’s an error (should somebody be woken up?)
    * what’s a trace (who’s the audience?)
    * etc

  22. 2. rotation & retention

  23. resources are finite
    * rotate your log files
    * put an upper bound on the size of one log file
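
Rotation with an upper bound per file can be had from the JDK alone. The following is a minimal sketch using `java.util.logging.FileHandler`; the class name, the logger name, the file name pattern and the limits (5 files of about 1 MB) are assumptions chosen for illustration, not values from the talk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class RotatingLogExample {
    // Hypothetical limits: keep at most 5 files of roughly 1 MB each.
    static final int MAX_BYTES_PER_FILE = 1_000_000;
    static final int MAX_FILES = 5;

    static Logger createLogger(Path directory) {
        try {
            Files.createDirectories(directory);
            // %g is replaced by the generation number; generation 0 is the active file.
            String pattern = directory.resolve("service.%g.log").toString();
            FileHandler handler =
                    new FileHandler(pattern, MAX_BYTES_PER_FILE, MAX_FILES, true);
            handler.setFormatter(new SimpleFormatter());

            Logger logger = Logger.getLogger("rotation.example");
            logger.setUseParentHandlers(false); // do not echo to the console
            logger.addHandler(handler);
            logger.setLevel(Level.INFO);
            return logger;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path directory = Paths.get(System.getProperty("java.io.tmpdir"), "rotation-example");
        createLogger(directory).info("service started");
    }
}
```

Once the active file passes the byte limit the handler rotates it, and the oldest generation beyond `MAX_FILES` is discarded, so disk usage stays bounded at about `MAX_BYTES_PER_FILE * MAX_FILES`.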

  24. define window of interest
    * for the local disk: toss anything that’s older than X, or:
    * compress (and then toss when they’re even older)
    * will you ever look in compressed log files?

  25. 3. formatting

  26. 0 [main] INFO Main - foo
    0 [main] WARN Main - bar
    0 [doer] ERROR Worker - gah
    java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)

  27. 0 [main] INFO Main - foo
    0 [main] WARN Main - bar
    0 [doer] ERROR Worker - gah
    java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
    0 [main] INFO Main - foo
    0 [main] WARN Main - bar
    0 [doer] ERROR Worker - gah
    java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
    0 [main] INFO Main - foo
    0 [main] WARN Main - bar
    0 [doer] ERROR Worker - gah
    java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
    0 [main] INFO Main - foo
    0 [main] WARN Main - bar
    0 [doer] ERROR Worker - gah
    java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)

  28. easy on the eyes
    * aligned (easier on the eyes)

  29. easy on the tools
    * tail & grep friendly

  30. INFO  [2012-02-25 20:24:03] foo.Main - foo
    WARN  [2012-02-25 20:24:03] foo.Main - bar
    ERROR [2012-02-25 20:24:03] foo.Worker - gah
    ! java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
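
A format like the one on this slide can be approximated with a custom `java.util.logging.Formatter`. This is a sketch under assumptions: the class name `AlignedFormatter`, the UTC time zone and the mapping of `SEVERE`/`WARNING` to `ERROR`/`WARN` are illustrative choices, and it is not how Logula itself is implemented.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

final class AlignedFormatter extends Formatter {
    // Assumption: timestamps are rendered in UTC.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of("UTC"));

    @Override
    public String format(LogRecord record) {
        StringBuilder out = new StringBuilder();
        // Pad the level to five characters so the columns line up.
        out.append(String.format("%-5s [%s] %s - %s%n",
                label(record.getLevel()),
                TIMESTAMP.format(Instant.ofEpochMilli(record.getMillis())),
                record.getLoggerName(),
                formatMessage(record)));
        if (record.getThrown() != null) {
            StringWriter trace = new StringWriter();
            record.getThrown().printStackTrace(new PrintWriter(trace, true));
            // Prefix every stack trace line so grep can include or exclude them.
            for (String traceLine : trace.toString().split("\\R")) {
                out.append("! ").append(traceLine.trim()).append(System.lineSeparator());
            }
        }
        return out.toString();
    }

    // Map java.util.logging level names to the shorter labels used on the slide.
    private static String label(Level level) {
        if (level == Level.SEVERE) return "ERROR";
        if (level == Level.WARNING) return "WARN";
        return level.getName();
    }
}
```

With every stack trace line prefixed by `! `, a plain `grep -v '^!'` gives one line per event, and `grep '^ERROR'` finds the errors without matching message text.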

  31. INFO  [2012-02-25 20:24:03] foo.Main - foo
    WARN  [2012-02-25 20:24:03] foo.Main - bar
    ERROR [2012-02-25 20:24:03] foo.Worker - gah
    ! java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
    INFO  [2012-02-25 20:24:03] foo.Main - foo
    WARN  [2012-02-25 20:24:03] foo.Main - bar
    ERROR [2012-02-25 20:24:03] foo.Worker - gah
    ! java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
    INFO  [2012-02-25 20:24:03] foo.Main - foo
    WARN  [2012-02-25 20:24:03] foo.Main - bar
    ERROR [2012-02-25 20:24:03] foo.Worker - gah
    ! java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)
    INFO  [2012-02-25 20:24:03] foo.Main - foo
    WARN  [2012-02-25 20:24:03] foo.Main - bar
    ERROR [2012-02-25 20:24:03] foo.Worker - gah
    ! java.lang.NullPointerException: gah
    ! at Doer.worker(Doer.java:13)

  32. https://github.com/codahale/logula
    * have a look at this

  33. 3. destinations

  34. * when this is your reality, you don’t want log files only on the local machine’s disk

  36. to name a few

  37. SMTP file
    syslog
    SQL
    AMQP
    IRC XMPP
    to name a few

  38. critical
    =
    “real time”

  40. audit
    =
    remote + restricted access

  41. mix and match

  42. always fall back on a local file
    (fallacies of distributed computing)
    * a truly local disk, not SAN, NFS, NAS, etc.
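
The fallback idea can be sketched as a thin wrapper around whatever remote destination is in use. Everything named here (`FallbackLogSink`, `RemoteSink`, the boolean return value) is hypothetical and only illustrates the principle; a real logging framework would do this inside an appender.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FallbackLogSink {
    /** Hypothetical abstraction over a remote destination such as syslog or AMQP. */
    interface RemoteSink {
        void send(String line) throws IOException;
    }

    private final RemoteSink remote;
    private final Path localFile;

    FallbackLogSink(RemoteSink remote, Path localFile) {
        this.remote = remote;
        this.localFile = localFile;
    }

    /** Returns true if the remote accepted the line, false if it went to the local file. */
    boolean write(String line) {
        try {
            remote.send(line);
            return true;
        } catch (IOException | RuntimeException remoteFailure) {
            // The network is not reliable: keep the line on local disk instead of losing it.
            appendLocally(line);
            return false;
        }
    }

    private void appendLocally(String line) {
        try {
            Files.write(localFile,
                    (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The local path should point at a disk attached to the machine itself; a network mount would reintroduce the failure mode the fallback is meant to cover.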

  43. (beware of sensitive data)
    * security sensitive: keys, passwords, etc
    * integrity sensitive: whatever your users might provide that’s not for everyone’s eyes

  44. 4. separation

  45. we usually log most things

  46. we usually don’t separate

  47. UTILIZE
    ALL THE CONTEXTS!

  48. multiple logs & context logs

  49. * Look at the SiftingAppender in logback-classic for an example

  50. traditional log
    * Look at the SiftingAppender in logback-classic for an example

  51. traditional log
    userid
    * Look at the SiftingAppender in logback-classic for an example

  52. traditional log
    userid
    session id
    * Look at the SiftingAppender in logback-classic for an example
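
To make the idea of context logs concrete without pulling in logback, here is a JDK-only sketch of a per-thread logging context. `LogContext`, its method names and the file naming scheme are assumptions for illustration; logback's SiftingAppender works from its own MDC and configuration rather than from code like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LogContext {
    // One context map per thread, e.g. filled in when a request starts.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(LinkedHashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static void clear() {
        CONTEXT.get().clear();
    }

    /** Prefixes the message with the current context, in insertion order. */
    static String decorate(String message) {
        Map<String, String> context = CONTEXT.get();
        if (context.isEmpty()) {
            return message;
        }
        StringBuilder prefix = new StringBuilder("[");
        for (Map.Entry<String, String> entry : context.entrySet()) {
            if (prefix.length() > 1) {
                prefix.append(' ');
            }
            prefix.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return prefix.append("] ").append(message).toString();
    }

    /** Picks a per-context log file, falling back to the traditional shared log. */
    static String logFileFor(String key) {
        String value = CONTEXT.get().get(key);
        return value == null ? "service.log" : key + "-" + value + ".log";
    }
}
```

A request handler would call `put("userid", ...)` and `put("sessionid", ...)` on entry and `clear()` on exit, so that every line logged in between can be filtered or routed by those values.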

  53. 5. configuration

  54. sane defaults
    * location
    * rotation

  55. per environment
    * have a configuration that automatically adapts to the environment
    * log everything to stdout in local development
    * log everything to file in test
    * log X to Y and Z to FOO in prod

  56. re:configurable
    * don’t require a deploy to change a log level
    * provide an API
    * use JMX
    * so that you can tweak logging (enable tracing) right in production when you need to
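
For `java.util.logging`, the JDK already exposes log levels over JMX through `PlatformLoggingMXBean`. The sketch below changes a level in-process through that bean, which is the same operation a JMX console would invoke remotely; the class name and the logger name `foo.Worker` are illustrative assumptions, and other logging frameworks ship their own MBeans.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.PlatformLoggingMXBean;
import java.util.logging.Level;
import java.util.logging.Logger;

final class RuntimeLogLevel {
    // Keep a strong reference: java.util.logging holds loggers only weakly,
    // and the MXBean can only change loggers that currently exist.
    static final Logger WORKER_LOG = Logger.getLogger("foo.Worker");

    /** Sets a logger's level through the platform logging MXBean. */
    static void setLevel(String loggerName, Level level) {
        PlatformLoggingMXBean logging =
                ManagementFactory.getPlatformMXBean(PlatformLoggingMXBean.class);
        logging.setLoggerLevel(loggerName, level.getName());
    }

    public static void main(String[] args) {
        setLevel("foo.Worker", Level.FINEST); // enable tracing while investigating
        WORKER_LOG.finest("trace output is now enabled");
        setLevel("foo.Worker", Level.INFO);   // and back to normal, no deploy needed
    }
}
```

Note that handlers have their own levels, so the trace line in `main` only becomes visible if a handler is configured to accept `FINEST` as well.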

  58. metrics

  59. let your code speak

  60. INSTRUMENT
    ALL THE CODE!

  61. meters

  62. counters
    meters
    timers
    gauges
    histograms
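
To show how little code basic instrumentation needs, here is a JDK-only sketch covering counters, timers and gauges. `SimpleMetrics` and its method names are hypothetical, meters and histograms are left out, and in practice a library such as the Metrics library referenced later in the talk would be the better choice.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class SimpleMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> timerNanos = new ConcurrentHashMap<>();
    private final Map<String, Supplier<Number>> gauges = new ConcurrentHashMap<>();

    /** Counter: how many times something happened. */
    void increment(String name) {
        counters.computeIfAbsent(name, key -> new LongAdder()).increment();
    }

    /** Gauge: a value read on demand, such as a queue size. */
    void gauge(String name, Supplier<Number> value) {
        gauges.put(name, value);
    }

    /** Timer: accumulated duration and number of calls of a piece of work. */
    <T> T time(String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            timerNanos.computeIfAbsent(name, key -> new LongAdder())
                    .add(System.nanoTime() - start);
            increment(name + ".calls");
        }
    }

    /** A sorted snapshot, ready to be exposed over JMX, HTTP or anything else. */
    Map<String, Number> snapshot() {
        Map<String, Number> values = new TreeMap<>();
        counters.forEach((name, count) -> values.put(name, count.sum()));
        timerNanos.forEach((name, nanos) -> values.put(name + ".totalNanos", nanos.sum()));
        gauges.forEach((name, gauge) -> values.put(name, gauge.get()));
        return values;
    }
}
```

Typical usage is a one-liner at the call site, for example `metrics.time("db.query", () -> repository.load(id))`, where `repository` stands in for whatever the code already calls.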

  63. EXPOSE
    ALL THE VALUES!

  64. ...or whatever makes sense for you
    ...and yes, it’s Comic Sans for BAYEUX
    ...and yes, it’s Helvetica for JMX

  65. JMX JSON
    XML
    HTTP
    XMPP
    AMQP
    THRIFT
    BAYEUX
    RMI
    CSV
    ...or whatever makes sense for you
    ...and yes, it’s Comic Sans for BAYEUX
    ...and yes, it’s Helvetica for JMX

  66. * put all your values into your tools and services
    * BUT DON’T FORGET THE AD-HOC, LOCAL USAGE (i.e. JMX)

  68. self checks!

  69. @Override
    protected Result check() throws Exception {
        if (database.ping()) {
            return Result.healthy();
        }
        return Result.unhealthy("Can't ping database");
    }
    * databases
    * other services
    * other dependencies
    * make them explicitly invokable
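
Making the checks explicitly invokable mostly means keeping them in one place and being able to run them all on demand. A JDK-only sketch follows; `SelfChecks` and its methods are hypothetical names, and the boolean result is a simplification of the `Result` type used on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

final class SelfChecks {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    /** Registers a named check for a database, another service or any dependency. */
    void register(String name, BooleanSupplier check) {
        checks.put(name, check);
    }

    /** Runs every check; an exception counts as unhealthy instead of aborting the run. */
    Map<String, Boolean> runAll() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        checks.forEach((name, check) -> {
            boolean healthy;
            try {
                healthy = check.getAsBoolean();
            } catch (RuntimeException failure) {
                healthy = false;
            }
            results.put(name, healthy);
        });
        return results;
    }

    /** Convenience for an endpoint or alert that needs a single answer. */
    boolean allHealthy() {
        return !runAll().containsValue(false);
    }
}
```

The map returned by `runAll()` can be served from an admin endpoint, fed into trending, or used as the basis for alerts.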

  70. trend on them

  71. alert on them

  72. ...render them as markers/buttons/light bulbs/whatever

  73. make instrumentation a habit
    * just do it

  74. find optimal usage later
    * you’ll never use it if it ain’t there

  76. http://metrics.codahale.com/

  78. packaging
    * adaptive to different environments

  79. one package
    * one package, regardless of environment

  80. bundle dependencies

  81. über jar
    (maven shade plugin)
    * for example

  82. adaptive configuration
    * adaptive to different environments

  83. isolated
    * mocks dependencies with (static) dummy answers

  84. <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <scope>test</scope>
    </dependency>

    * maven example

  87. <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <scope>compile</scope>
    </dependency>

    * maven example

  88. 127.0.0.1
    * expect everything to be available on 127.0.0.1

  89. test / qa / staging / prod
    * the usual suspects

  90. either detect your environment
    * i.e., bundle configurations for all environments
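
Detecting the environment and picking a bundled configuration could look roughly like this. The variable name `APP_ENV`, the property keys and the host name `db.internal.example` are made up for the example; only the idea of defaulting to a local setup on 127.0.0.1 comes from the slides.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

final class EnvironmentConfig {
    /** Hypothetical variable name; the talk does not prescribe one. */
    static final String ENV_VARIABLE = "APP_ENV";

    /** Falls back to "local" so a fresh checkout works without any setup. */
    static String detectEnvironment(Map<String, String> variables) {
        String value = variables.get(ENV_VARIABLE);
        if (value == null || value.trim().isEmpty()) {
            return "local";
        }
        return value.trim().toLowerCase(Locale.ROOT);
    }

    /** Sane defaults first, then per-environment overrides bundled with the package. */
    static Properties settingsFor(String environment) {
        Properties defaults = new Properties();
        defaults.setProperty("database.host", "127.0.0.1");
        defaults.setProperty("log.destination", "stdout");

        Properties settings = new Properties(defaults);
        switch (environment) {
            case "test":
                settings.setProperty("log.destination", "file");
                break;
            case "prod":
                settings.setProperty("database.host", "db.internal.example");
                settings.setProperty("log.destination", "file,syslog");
                break;
            default:
                break; // local development keeps the defaults
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = settingsFor(detectEnvironment(System.getenv()));
        System.out.println(settings.getProperty("log.destination"));
    }
}
```

Because the same package carries every environment's settings, deploying to a new environment needs no repackaging, only the environment variable.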

  91. or load externalized configuration
    * DNS
    * ZooKeeper
    * CouchDB
    * Doozer
    * External property/YML/JSON/whatever files
    ** in one sane specified location (preferably the working directory)

  92. strive for zero-touch configuration
    * packages should JUST WORK

  94. the operational aspect needs to be an
    integral part of:

  95. the operational aspect needs to be an
    integral part of: development

  96. the operational aspect needs to be an
    integral part of: design

  97. the operational aspect needs to be an
    integral part of: architecture

  98. the operational aspect needs to be an
    integral part of: reasoning

  99. And therefore, a quick comment on...

  100. *aaS
    ...this...

  101. ...[whatever] as a service
    * whatever as a service

  102. OUTSOURCE
    ALL THE THINGS!

  103. to name some
    * logging
    * alerting

  104. Tools! Nice!

  105. (new shiny object syndrome)

  107. aka

  108. fallacies of distributed computing

  110. 1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn't change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous (- James Gosling)
    Fallacies of distributed computing
    - Peter Deutsch

  112. so when using this, we need to seriously consider...

  113. 1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn't change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous (- James Gosling)
    Fallacies of distributed computing
    - Peter Deutsch
    ...this
    * reliability (overall, geo location, connectivity)
    * security (communication, retention)
    * cost (of using, of not being available)

  114. by all means...

  115. ...use services

  116. but not only

  117. don’t bet
    your operation
    on their availability

  119. responsibility

  120. logging

  121. YOU

  122. metrics

  123. YOU

  124. packaging

  125. YOU

  126. configuration

  127. YOU

  128. sane defaults

  129. YOU

  130. YOU

  131. NOT operations

  132. NOT your hosting provider

  133. NOT your boss

  134. NOT service provider

  135. NOT your colleague

  136. YOU

  137. develop accordingly

  138. develop for operations

  139. love your operations
    @martengustafson
    http://marten.gustafson.pp.se/
