Observability-Driven Development: What DevOps is Really About


Greg Shackles

October 26, 2018


  1. Observability-Driven Development: What DevOps is Really About
    Greg Shackles, Principal Engineer, Olo, @gshackles, greg@gregshackles.com
  2. whoami
    Greg Shackles, Principal Engineer, Olo. Based in Los Angeles, CA.
    @gshackles, greg@gregshackles.com, github.com/gshackles, speakerdeck.com/gshackles
  3. it all starts here

  4. what does that mean?
    continuous integration, postmortems, agile, infrastructure as code, continuous delivery, continuous deployment, autoscaling, containers, automation, on-call rotations, instrumentation, chaos testing, canary releases, monitoring, observability, site reliability engineering, feature flags, continuous testing
  5. are we doing DevOps now?

  6. (no text on this slide)
  7. ok, so what even is DevOps?
    culture, automation, measurement, sharing
    DevOps is really about people
  8. this is a team sport
    create a culture of sharing: shared ownership, understanding, and responsibility
    shipping is just the beginning: product development requires continuously asking questions and iterating
  9. let’s talk about measurement

  10. what does your system look like?

  11. what does it really look like?

  12. control theory
    observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs
  13. why is this important?
    you can’t improve what you don’t measure: if you’re not measuring, you’re effectively flying blind
    level up your skills and your value: this mindset will make you a better engineer and more valuable to your company
    observability is about asking questions: it’s a cultural trait that drives how you build systems and products
    break down silos, share knowledge: build a shared understanding and insight into your systems
  14. test-driven development: Red, Green, Refactor

  15. testing in production
    reality check: you’re always testing in production, whether you think you are or not
    embrace it
  16. observability-driven development
    test in production
    keep observing
    measure in production, validate assumptions
    consider adding instrumentation ahead of time to determine impact
    does anything now look abnormal? do you know what normal looks like?
  17. observe in production: release happened here
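
The "do you know what normal looks like?" question can be turned into a check that runs after each release. The sketch below is only an illustration, not something from the talk: it assumes you already collect a numeric metric (for example request latency) and compares a post-release value against a pre-release baseline. The function name `looks_abnormal` and the three-sigma default are hypothetical choices.

```python
from statistics import mean, stdev
from typing import Sequence


def looks_abnormal(
    baseline: Sequence[float],
    current: float,
    tolerance_sigmas: float = 3.0,  # hypothetical default; tune per metric
) -> bool:
    """Return True when `current` is far from the pre-release baseline.

    "Normal" is modeled here as the baseline mean plus or minus a number
    of sample standard deviations. Real metrics are often skewed, so this
    is a starting point rather than a robust anomaly detector.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(current - center) > tolerance_sigmas * spread


# Latency samples (ms) gathered before the release, then one value after it.
before_release = [100.0, 102.0, 98.0, 101.0, 99.0]
print(looks_abnormal(before_release, 103.0))
print(looks_abnormal(before_release, 140.0))
```

Adding the instrumentation before the release matters: without the baseline samples there is nothing to compare the post-release value against.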

  18. what should we measure?

  19. handy methods
    RED: Rate, Errors, Duration
    USE: Utilization, Saturation, Errors
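
As a rough illustration of the RED method, the three numbers can be derived from a list of per-request observations. This is a minimal sketch under assumptions: the `RedWindow` class and its field names are hypothetical, and a real service would usually export these through a metrics library rather than hold them in memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RedWindow:
    """RED metrics for one service over a fixed observation window."""

    window_seconds: float
    durations_ms: List[float] = field(default_factory=list)
    errors: int = 0

    def record(self, duration_ms: float, ok: bool) -> None:
        """Record one finished request."""
        self.durations_ms.append(duration_ms)
        if not ok:
            self.errors += 1

    @property
    def rate(self) -> float:
        """Rate: requests per second over the window."""
        return len(self.durations_ms) / self.window_seconds

    @property
    def error_ratio(self) -> float:
        """Errors: fraction of requests that failed."""
        total = len(self.durations_ms)
        return self.errors / total if total else 0.0

    @property
    def mean_duration_ms(self) -> float:
        """Duration: mean request time (percentiles are often preferred)."""
        total = len(self.durations_ms)
        return sum(self.durations_ms) / total if total else 0.0


window = RedWindow(window_seconds=60.0)
for duration_ms, ok in [(10.0, True), (20.0, True), (30.0, False), (40.0, True)]:
    window.record(duration_ms, ok)
print(window.rate, window.error_ratio, window.mean_duration_ms)
```

USE applies the same idea to resources (CPU, disk, queues) instead of requests: how busy the resource is, how much work is waiting, and how many errors it reports.
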
  20. it’s not just about technical metrics

  21. SLIs, SLOs, SLAs… oh my!
    Service Level Indicator (SLI): defined quantitative measure of some aspect of a service (e.g. request latency, error rate, availability)
    Service Level Objective (SLO): target value or range for a service, measured by an SLI (e.g. 99th percentile for request latency of 100ms or less)
    Service Level Agreement (SLA): contract with your users that includes consequences of meeting or missing the SLOs (e.g. public service: 10% discount for every hour below SLO; internal service: team gets paged)
    landing.google.com/sre/book/chapters/service-level-objectives.html
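
To make the SLI/SLO relationship concrete, the slide's example objective (99th percentile request latency of 100ms or less) can be checked against a batch of latency samples. This is a sketch under assumptions: the function names are hypothetical, and the nearest-rank percentile used here is one of several common definitions, so a monitoring system may report slightly different values.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_slo(
    latencies_ms: Sequence[float],
    pct: float = 99.0,            # SLO percentile from the slide's example
    threshold_ms: float = 100.0,  # SLO target from the slide's example
) -> bool:
    """The SLI is the measured percentile; the SLO is the target it must meet."""
    return percentile(latencies_ms, pct) <= threshold_ms


# 99 fast requests and one slow outlier still meet a p99 objective...
print(meets_latency_slo([50.0] * 99 + [500.0]))
# ...but two slow requests out of 100 do not.
print(meets_latency_slo([50.0] * 98 + [500.0, 500.0]))
```

The SLA sits outside this code: it is the agreement about what happens (a discount, a page) when the check fails.
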
  22. make it visible

  23. create a culture of sharing

  24. stigmergy
    the trace left by an action stimulates the next action by the same or a different agent
  25. stigmergic alerting

  26. start small: there isn’t a one-size-fits-all solution

  27. error logging

  28. structured logging: serilog.net, getseq.net
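
The idea behind structured logging is that each event carries named fields that a log server can filter and aggregate on, rather than a free-text line that has to be parsed. The slide points at Serilog and Seq for .NET; the sketch below is not their API. It illustrates the same idea with Python's standard `logging` module, and the `JsonFormatter` class, the `props` key, and the `orders` logger name are all hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object instead of a free-text line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"props": {...}} become queryable keys.
        event.update(getattr(record, "props", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON events to the given stream."""
    logger = logging.getLogger("orders")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("order placed", extra={"props": {"order_id": 1234, "elapsed_ms": 87}})
print(buffer.getvalue().strip())
```

With events shaped like this, a question such as "which orders took longer than 500ms?" becomes a field query instead of a regular expression.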

  29. metrics

  30. feature flags

  31. feature flags: launchdarkly.com
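
A feature flag decouples deploying code from releasing it, which is what makes gradual, observable rollouts possible. The sketch below shows one common technique, a percentage rollout with stable hashing; it is not the LaunchDarkly API, and the function names and flag key are hypothetical.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing keeps the assignment deterministic, so a user sees the same
    behavior on every request without any stored state.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly `rollout_percent` percent of users."""
    return bucket(flag_key, user_id) < rollout_percent


# Raising the percentage only ever adds users; nobody already enabled is dropped.
for percent in (0, 10, 50, 100):
    enabled = sum(is_enabled("new-checkout", f"user-{n}", percent) for n in range(1000))
    print(percent, enabled)
```

Tagging metrics and log events with the flag state lets you compare the two populations while the rollout is in progress.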

  32. safely experiment in production: github.com/github/Scientist.net
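
The pattern behind this kind of experiment is to run the existing code path (the control) and the new one (the candidate) side by side, always return the control's result, and publish whether the two agreed. The sketch below is a simplified illustration of that pattern and not the Scientist.NET API; the names `run_experiment` and `Observation` are hypothetical, and it leaves out details such as timing and randomized execution order.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Observation:
    """What was learned from one side-by-side run."""

    name: str
    matched: bool
    control_value: Any
    candidate_value: Any
    candidate_error: Optional[str]


def run_experiment(
    name: str,
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    publish: Callable[[Observation], None],
) -> Any:
    """Run both code paths, publish the comparison, return the control result."""
    control_value = control()
    candidate_value, candidate_error = None, None
    try:
        candidate_value = candidate()
    except Exception as exc:  # a failing candidate must never affect callers
        candidate_error = repr(exc)
    matched = candidate_error is None and candidate_value == control_value
    publish(Observation(name, matched, control_value, candidate_value, candidate_error))
    return control_value


results: List[Observation] = []
total = run_experiment(
    "order-total",           # hypothetical experiment name
    control=lambda: 1999,    # existing implementation
    candidate=lambda: 2000,  # new implementation under evaluation
    publish=results.append,
)
print(total, results[0].matched)
```

In practice `publish` would send the observation to your metrics or logging pipeline, so mismatches show up on the same dashboards as everything else.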

  33. (no text on this slide)
  34. questions?
    Greg Shackles, Principal Engineer, Olo, @gshackles, greg@gregshackles.com