What if Socrates was a dev?

or: notes on observability

Zbigniew Siciarz

March 30, 2020


  1. What if Socrates was a dev? or: notes on observability

    Zbigniew Siciarz @ PyWaw 93 2020.03.30
  2. Known knowns: obvious context; things we learned to do well.

    Example: JavaScript for FE developers (or is it?)
  3. Known unknowns: complex/complicated context; concepts we need to learn in order to progress.

    For example: new APIs, system architectures. But also: why does the system behave like this?
  4. Unknown knowns: intuition/muscle memory; years of practice and experience.

    Example: your native language. https://www.freecodecamp.org/news/how-to-discover-your-unknown-knowns/
  5. “The holy grail of observability is the ability to be able to ask any question, understand any previously unseen state your system may get itself into; without having to ship new code to handle that state (bc that implies you knew enough to predict it)”

    -- Charity Majors (@mipsytipsy)
  6. Logging: controlled and used by devs (mostly).

    You wouldn't want to check each of 20 replicas by hand, would you? Centralized logging: the Twelve-Factor App showed us the way.
  7. Logging? What logging? ¯\_(ツ)_/¯

        for process in processes_to_archive:
            with transaction():
                archived_process = archivist.create_from(process)
                process_manager.delete(process)
                self.dispatch("process_archived", archived_process)

    Tip: always log destructive actions!
  8. Does this code answer why?

        def activate_process(self, process):
            if (
                process.name
                and self.is_valid_name(process.name)
                and process.is_approved
                and process.owner.is_supervisor()
            ):
                self.process_controller.set_active(process)

    But does this code answer why not?
  9. Does this code answer why not?

        def activate_process(self, process):
            if not process.name:
                logger.warning(f"Process {process} has empty name")
                return
            if not self.is_valid_name(process.name):
                logger.warning(f"Process name {process.name} isn't valid")
                return
            if not process.is_approved:
                logger.warning(f"Process {process} requires approval")
                return
            # ...
  10. Targeted logging: provide detailed, DEBUG-level logs in production for specific services/users with issues, without redeploying.

    https://tersesystems.com/blog/2019/07/22/targeted-diagnostic-logging-in-production/
  11. Metrics: numeric data measured over time; controlled by devs, used by everyone.

    Need smart aggregation/retention rules. https://blog.digitalocean.com/observability-and-metrics/
  12. Distributed tracing: correlate the flow of events across a distributed system; essential in the world of microservices.

    Zipkin, Jaeger, Lightstep, OpenTracing, however... https://thenewstack.io/opentracing-opencensus-merge-into-a-single-new-project-opentelemetry/
  13. *Three pillars is bullshit: each pillar is flawed.

    Observability requires: high throughput, high cardinality, no sampling, long retention, all at once (and a pony). https://www.infoq.com/news/2019/02/rethinking-observability/
  14. Good alerts: reliable, actionable, investigable, serious enough; have recovery instructions/runbooks attached.

    https://www.infoworld.com/article/3265735/eliminating-alert-fatigue-a-devops-secret.html
  15. Fast CI/CD process: merged? Ship it! Deploy small changes frequently.

    WE DO NOT BREAK USERSPACE! https://lkml.org/lkml/2012/12/23/75
  16. Code review tips: how will you know if this is broken?

    Most changes should be instrumented; important changes must be.
  17. Final thoughts: observability is a cross-cutting feature of your product.

    Embrace known unknowns; limit unknown unknowns.
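
The tip on slide 7 (always log destructive actions) could look roughly like this. It is only a sketch: `transaction()`, `FakeArchivist` and `FakeProcessManager` are hypothetical stand-ins added to make the example runnable, and the event dispatch from the slide is left out. Only `create_from` and `delete` come from the slide itself.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("archiver")


@contextmanager
def transaction():
    # Stand-in for a real database transaction (assumption, not from the talk).
    yield


def archive_processes(processes, archivist, process_manager):
    """Archive and delete each process, logging every destructive step."""
    archived = []
    for process in processes:
        with transaction():
            archived_process = archivist.create_from(process)
            logger.info("Archived process %r as %r", process, archived_process)
            process_manager.delete(process)
            # Deletion cannot be undone, so record exactly what was removed.
            logger.warning("Deleted process %r after archiving", process)
        archived.append(archived_process)
    return archived


class FakeArchivist:
    """Hypothetical archivist that just labels the process."""

    def create_from(self, process):
        return f"archived:{process}"


class FakeProcessManager:
    """Hypothetical manager that remembers what it deleted."""

    def __init__(self):
        self.deleted = []

    def delete(self, process):
        self.deleted.append(process)
```

With this in place, the question "who deleted process X, and when?" can be answered from the logs instead of from guesswork.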
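
One possible way to approach the targeted logging from slide 10, using only the standard `logging` module. This is a sketch under assumptions, not the approach from the linked article: `TARGETED_USERS`, `TargetedDebugFilter`, `ListHandler` and `handle_request` are hypothetical names, and in a real system the set of targeted users would have to be refreshed at runtime (for example from a config service) to avoid a redeploy.

```python
import logging

# Users currently under investigation. Assumption: something outside this
# sketch (feature flag, config service) updates this set at runtime.
TARGETED_USERS = set()


class TargetedDebugFilter(logging.Filter):
    """Pass records above DEBUG for everyone, DEBUG only for targeted users."""

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True
        return getattr(record, "user_id", None) in TARGETED_USERS


class ListHandler(logging.Handler):
    """Collects messages in memory so the sketch is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("targeted_demo")
logger.setLevel(logging.DEBUG)  # let DEBUG records reach the filter
logger.propagate = False
handler = ListHandler()
handler.addFilter(TargetedDebugFilter())
logger.addHandler(handler)


def handle_request(user_id):
    # `extra` attaches user_id to the log record so the filter can see it.
    logger.debug("request details for %s", user_id, extra={"user_id": user_id})
    logger.info("request served for %s", user_id, extra={"user_id": user_id})
```

Calling `handle_request("alice")` emits only the INFO line; after `TARGETED_USERS.add("bob")`, `handle_request("bob")` emits the DEBUG line as well.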
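
The core idea behind slide 12 (correlating events across services) can be illustrated without any tracing library. The sketch below is not the Zipkin, Jaeger or OpenTelemetry API; the header name `X-Trace-Id` and all function names are hypothetical, and the "services" are plain functions standing in for HTTP calls.

```python
import contextvars
import uuid

# Holds the trace ID of the request currently being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name


def start_or_continue_trace(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    current_trace_id.set(trace_id)
    return trace_id


def outgoing_headers():
    """Headers to attach to calls made to downstream services."""
    return {TRACE_HEADER: current_trace_id.get()}


def log_event(service, message, sink):
    """Record an event tagged with the current trace ID."""
    sink.append(
        {"trace_id": current_trace_id.get(), "service": service, "message": message}
    )


def service_b(headers, sink):
    start_or_continue_trace(headers)
    log_event("b", "handled downstream call", sink)


def service_a(headers, sink):
    start_or_continue_trace(headers)
    log_event("a", "received request", sink)
    service_b(outgoing_headers(), sink)  # stands in for an HTTP call
```

Every event produced while serving one request carries the same `trace_id`, so events from different services can be joined later in a central store.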