
What if Socrates was a dev?

or: notes on observability

Zbigniew Siciarz

March 30, 2020

  1. What if Socrates was a dev? or: notes on observability

    Zbigniew Siciarz @ PyWaw 93 2020.03.30
  2. Οἶδα οὐδὲν εἰδώς

  3. I know that I know nothing

  4. knowledge matrix

  5. known knowns: obvious context, things we learned to do well

    example: JavaScript for FE developers (or is it?)
  6. known unknowns: complex/complicated context, concepts we need to learn in order to progress

    for example: new APIs, system architectures, but also: why does the system behave like this?
  7. unknown knowns: intuition/muscle memory, years of practice and experience

    example: your native language https://www.freecodecamp.org/news/how-to-discover-your-unknown-knowns/
  9. unknown unknowns ??? I don't even

  10. four stages of competence https://en.wikipedia.org/wiki/Four_stages_of_competence

  11. Observability

  12. “The holy grail of observability is the ability to be able to ask any question, understand any previously unseen state your system may get itself into; without having to ship new code to handle that state (bc that implies you knew enough to predict it)”

    -- Charity Majors (@mipsytipsy)
  13. three pillars of observability*: logs, metrics, distributed tracing

  14. Logging: controlled and used by devs (mostly)

    centralized logging: you wouldn't want to check each of 20 replicas by hand, would you? The Twelve-Factor App showed us the way
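One way to read the Twelve-Factor advice ("treat logs as event streams") in Python: the app only writes to stdout, and the platform (container runtime, systemd, a log shipper) collects the stream from every replica into one central place. A minimal sketch using the standard `logging` module; the function name `configure_logging` and the format string are illustrative choices, not something prescribed by the talk.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send all log records to stdout, Twelve-Factor style.

    The process never manages log files itself; the execution environment
    captures the stream and ships it to the central log store.
    """
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously configured handlers
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("orders").info("order created")
```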
  15. plain text logging: log lines are human-readable, grep-friendly, lnav-friendly

  16. structured logging: machine-readable, nested contexts, format-agnostic

    Python: structlog https://2019.djangocon.eu/talks/logging-rethought-2-the-actions-
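structlog gives you structured, context-carrying logs out of the box. To show the underlying idea without third-party dependencies, here is a minimal standard-library sketch: a formatter that emits one JSON object per line, and a `LoggerAdapter` subclass whose `bind()` returns a child logger with extra context, similar in spirit to structlog's `bind()`. `JSONFormatter` and `BoundLogger` are hypothetical names; this is not structlog's API.

```python
import json
import logging
from typing import Any


class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Key/value context attached by BoundLogger (see below), if any.
        event.update(getattr(record, "context", {}))
        return json.dumps(event, sort_keys=True, default=str)


class BoundLogger(logging.LoggerAdapter):
    """Logger that carries key/value context; bind() nests more context."""

    def bind(self, **new_context: Any) -> "BoundLogger":
        # Return a child logger; the parent's context is left untouched.
        return BoundLogger(self.logger, {**self.extra, **new_context})

    def process(self, msg, kwargs):
        # Attach the accumulated context to the record as `record.context`.
        kwargs["extra"] = {"context": dict(self.extra)}
        return msg, kwargs


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JSONFormatter())
    base = logging.getLogger("billing")
    base.addHandler(handler)
    base.setLevel(logging.INFO)
    base.propagate = False

    log = BoundLogger(base, {"request_id": "abc123"})
    log.bind(user_id=42).info("invoice sent")
    # {"event": "invoice sent", "level": "INFO", "logger": "billing", "request_id": "abc123", "user_id": 42}
```

Because every line is a JSON object, the log store can filter on `user_id` or `request_id` directly instead of relying on regexes over free text.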

  17. logging? what logging? ¯\_(ツ)_/¯

        for process in processes_to_archive:
            with transaction():
                archived_process = archivist.create_from(process)
                process_manager.delete(process)
                self.dispatch("process_archived", archived_process)

    tip: always log destructive actions!
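A sketch of the archiving loop with the destructive step logged. `transaction`, `archivist`, `process_manager` and `dispatch` are hypothetical stand-ins for the objects on the slide (a plain function parameter replaces `self.dispatch`). Logging right before `delete()` is one reasonable choice: the intent is recorded even if the deletion then fails.

```python
import contextlib
import logging

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def transaction():
    """Hypothetical stand-in for a real DB transaction (e.g. Django's atomic())."""
    yield


def archive_processes(processes_to_archive, archivist, process_manager, dispatch):
    """Archive and delete processes, leaving an audit trail in the logs.

    archivist, process_manager and dispatch are hypothetical collaborators
    mirroring the objects on the slide.
    """
    for process in processes_to_archive:
        with transaction():
            archived_process = archivist.create_from(process)
            # Log before the destructive step, so the intent is recorded
            # even if delete() raises.
            logger.info(
                "Deleting process %s (archived as %s)", process, archived_process
            )
            process_manager.delete(process)
            dispatch("process_archived", archived_process)
```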
  18. does this code answer why?

        def activate_process(self, process):
            if (
                process.name
                and self.is_valid_name(process.name)
                and process.is_approved
                and process.owner.is_supervisor()
            ):
                self.process_controller.set_active(process)

    but does this code answer why not?
  19. does this code answer why not?

        def activate_process(self, process):
            if not process.name:
                logger.warning(f"Process {process} has empty name")
                return
            if not self.is_valid_name(process.name):
                logger.warning(f"Process name {process.name} isn't valid")
                return
            if not process.is_approved:
                logger.warning(f"Process {process} requires approval")
                return
            # ...
  20. Targeted logging: provide detailed, DEBUG-level logs in production for specific services/users with issues, without redeploying

    https://tersesystems.com/blog/2019/07/22/targeted-diagnostic-logging-in-production/
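The linked article discusses targeted logging for the JVM; a rough Python adaptation is a `logging.Filter` that lets INFO and above through unconditionally, but DEBUG only for targeted users. The `user_id` record attribute (passed via `extra=`) and the idea of updating the target set from an admin endpoint or feature flag are assumptions made for this sketch.

```python
import logging


class TargetedDebugFilter(logging.Filter):
    """Let INFO+ records through; let DEBUG records through only for targeted users.

    `targeted_users` is a plain mutable set, so it can be changed at runtime
    (hypothetically from an admin endpoint or a feature flag) without a redeploy.
    """

    def __init__(self, targeted_users=()):
        super().__init__()
        self.targeted_users = set(targeted_users)

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.INFO:
            return True
        # `user_id` is assumed to be attached via `extra={"user_id": ...}`.
        return getattr(record, "user_id", None) in self.targeted_users


if __name__ == "__main__":
    debug_filter = TargetedDebugFilter({"alice"})
    handler = logging.StreamHandler()
    handler.addFilter(debug_filter)
    log = logging.getLogger("checkout")
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)  # records must reach the filter to be judged
    log.propagate = False

    log.debug("cart state dumped", extra={"user_id": "alice"})  # emitted
    log.debug("cart state dumped", extra={"user_id": "bob"})    # filtered out
    debug_filter.targeted_users.add("bob")                      # target bob too
    log.debug("cart state dumped", extra={"user_id": "bob"})    # now emitted
```

The logger itself has to run at DEBUG so records reach the filter, which means every `debug()` call still creates a record; on very hot code paths that overhead may matter.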
  21. Metrics: numeric data measured over time; controlled by devs, used by everyone

    need smart aggregation/retention rules https://blog.digitalocean.com/observability-and-metrics/
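To make "aggregation/retention rules" concrete: a minimal sketch that rolls raw `(timestamp, value)` samples into fixed-width time buckets (count, sum, min, max) and then drops buckets older than a retention window. Real time-series databases do this internally with their own configuration; the names `Rollup`, `downsample` and `apply_retention` and the bucket sizes are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Rollup:
    """Aggregate of all samples that fell into one time bucket."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def downsample(
    samples: Iterable[Tuple[float, float]], bucket_seconds: int
) -> Dict[int, Rollup]:
    """Aggregate (timestamp, value) samples into buckets keyed by bucket start."""
    buckets: Dict[int, Rollup] = {}
    for ts, value in samples:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets.setdefault(start, Rollup()).add(value)
    return buckets


def apply_retention(
    buckets: Dict[int, Rollup], now: float, keep_seconds: int
) -> Dict[int, Rollup]:
    """Keep only buckets that start inside the retention window."""
    return {start: r for start, r in buckets.items() if start >= now - keep_seconds}
```

A common pattern is to keep raw samples briefly and coarser rollups much longer; which policy fits depends on the questions your dashboards need to answer.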
  22. business dashboard

  23. dev dashboard

  24. health dashboard

  25. Distributed tracing: correlate the flow of events across a distributed system; essential in the world of microservices

    Zipkin, Jaeger, Lightstep, OpenTracing, however... https://thenewstack.io/opentracing-opencensus-merge-into-a-single-new-project-opentelemetry/
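The core mechanism behind all of these tools is context propagation: every service reuses the incoming trace ID and creates a new span ID for its own work. A minimal sketch using the W3C Trace Context `traceparent` header, which OpenTelemetry propagates by default; the helper names are hypothetical, and the sketch ignores parts of the spec such as `tracestate`, other versions and all-zero IDs.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional

# W3C Trace Context `traceparent` header, version 00:
#   00-<32 hex trace-id>-<16 hex parent span id>-<2 hex trace flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                         # shared by every span of one request
    span_id: str                          # unique to this span
    parent_span_id: Optional[str] = None  # the caller's span, if any
    sampled: bool = True

    def to_traceparent(self) -> str:
        """Header value to send on outgoing calls made inside this span."""
        flags = "01" if self.sampled else "00"
        return f"00-{self.trace_id}-{self.span_id}-{flags}"


def parse_traceparent(header: Optional[str]) -> Optional[SpanContext]:
    """Parse a traceparent header; return None if missing or malformed."""
    if not header:
        return None
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def start_span(incoming_headers: Mapping[str, str]) -> SpanContext:
    """Continue the caller's trace if there is one, otherwise start a new trace."""
    parent = parse_traceparent(incoming_headers.get("traceparent"))
    if parent is None:
        return SpanContext(trace_id=secrets.token_hex(16), span_id=secrets.token_hex(8))
    return SpanContext(
        trace_id=parent.trace_id,
        span_id=secrets.token_hex(8),
        parent_span_id=parent.span_id,
        sampled=parent.sampled,
    )
```

Each service would call `start_span` on incoming requests and forward `to_traceparent()` on outgoing ones; a tracing backend can then group spans sharing a `trace_id` into one tree using the parent links.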
  26. Zipkin UI

  27. *three pillars is bullshit: each pillar is flawed

    observability requires high throughput, high cardinality, no sampling and long retention, all at once (and a pony) https://www.infoq.com/news/2019/02/rethinking-observability/
  28. Alerting

  29. Bad alerts: too frequent, noisy; tend to get ignored; mask real problems
  30. Good alerts: reliable, actionable, investigable, serious enough, have recovery instructions/runbooks attached

    https://www.infoworld.com/article/3265735/eliminating-alert-fatigue-a-devops-secret.html
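Two of the properties above can be sketched in a few lines: an alert fires only when a breach persists for several consecutive checks (similar in spirit to the `for:` duration in Prometheus alerting rules), and every notification carries a runbook link. `AlertRule` and the example runbook URL are hypothetical; real alerting systems evaluate rules against stored metrics rather than single values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlertRule:
    name: str
    threshold: float
    for_checks: int    # condition must hold this many consecutive checks
    runbook_url: str   # where the on-call person starts investigating
    _breaches: int = field(default=0, init=False)

    def evaluate(self, value: float) -> Optional[str]:
        """Return an alert message once the breach has persisted; else None."""
        if value > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # a single good reading resets the streak
        # Fire exactly once when the streak reaches for_checks, not on every
        # breached check, so one incident does not flood the on-call channel.
        if self._breaches == self.for_checks:
            return (
                f"[{self.name}] value {value} > {self.threshold} for "
                f"{self.for_checks} checks. Runbook: {self.runbook_url}"
            )
        return None


if __name__ == "__main__":
    rule = AlertRule(
        "api-error-rate",
        threshold=0.05,
        for_checks=3,
        runbook_url="https://wiki.example.com/runbooks/api-errors",  # hypothetical
    )
    for reading in [0.1, 0.1, 0.01, 0.1, 0.1, 0.1, 0.1]:
        message = rule.evaluate(reading)
        if message:
            print(message)
```

A one-off spike never pages anyone, which addresses the "too frequent, noisy" failure mode; a production system would also send a "resolved" notification when the streak ends.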
  31. Developing with observability in mind

  32. Healthy feedback loop: build, ship, observe https://charity.wtf/2019/10/28/deploys-its-not-actually-about-fridays/

  33. Fast CI/CD process: merged? ship it! Deploy small changes frequently

    WE DO NOT BREAK USERSPACE! https://lkml.org/lkml/2012/12/23/75
  34. Reliable environment

  35. Code review tips: how will you know if this is broken?

    most changes should be instrumented, important changes must be
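One low-friction answer to "how will you know if this is broken?" is a decorator that counts calls and failures and logs durations, so instrumenting a change costs one line in review. A minimal sketch; the module-level `Counter`s are hypothetical stand-ins for a real metrics client (for example a Prometheus or StatsD library).

```python
import functools
import logging
import time
from collections import Counter

logger = logging.getLogger(__name__)

# Hypothetical in-process counters; in production these would be a metrics client.
CALLS = Counter()
ERRORS = Counter()


def instrumented(func):
    """Count calls and failures of `func` and log how long each call took."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS[name] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS[name] += 1
            logger.exception("%s failed", name)
            raise  # instrumentation must not swallow the error
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.debug("%s took %.1f ms", name, elapsed_ms)

    return wrapper


@instrumented
def divide(a: float, b: float) -> float:
    return a / b
```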
  36. Final thoughts: observability is a cross-cutting feature of your product

    embrace known unknowns, limit unknown unknowns
  37. ὁ δὲ ἀνεξέταστος βίος οὐ βιωτὸς ἀνθρώπῳ

  38. The unexamined life is not worth living

  39. Thanks!

  40. Further reading:

    https://grafana.com/blog/2019/10/21/whats-next-for-observability/
    https://charity.wtf/
    https://blog.revdebug.com/observability-in-microservices