What if Socrates was a dev?

or: notes on observability

Zbigniew Siciarz

March 30, 2020

Transcript

  1. What if Socrates was a dev?
    or: notes on observability
    Zbigniew Siciarz @ PyWaw 93
    2020.03.30

  2. Οἶδα οὐδὲν εἰδώς

  3. I know that I know nothing

  4. knowledge matrix

  5. known knowns
    obvious context
    things we learned to do well
    example: JavaScript for FE developers (or is it?)

  6. known unknowns
    complex/complicated context
    concepts we need to learn in order to progress
    for example: new APIs, system architectures
    but also: why does the system behave like this?

  7. unknown knowns
    intuition/muscle memory
    years of practice and experience
    example: your native language
    https://www.freecodecamp.org/news/how-to-discover-your-unknown-knowns/

  8. [image-only slide]

  9. unknown unknowns
    ???
    I don't even

  10. four stages of competence
    https://en.wikipedia.org/wiki/Four_stages_of_competence

  11. Observability

  12. The holy grail of observability is the ability to be able to ask any
    question, understand any previously unseen state your system
    may get itself into; without having to ship new code to handle
    that state (bc that implies you knew enough to predict it)
    -- Charity Majors (@mipsytipsy)

  13. three pillars of observability*
    logs
    metrics
    distributed tracing

  14. Logging
    controlled and used by devs (mostly)
    you wouldn't want to check each of 20 replicas by hand, would you?
    centralized logging
    the Twelve-Factor App showed us the way: treat logs as event streams

  15. plain text logging
    log lines are human readable
    grep-friendly, lnav-friendly
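
Slides 14 and 15 together suggest the simplest setup: the process writes one plain-text line per event to stdout, as the Twelve-Factor App recommends, and the platform collects the streams centrally. A minimal stdlib sketch; the format string and the `billing` logger name are illustrative assumptions, not taken from the talk.

```python
import logging
import sys


def configure_plain_logging(stream=sys.stdout, level=logging.INFO):
    """Send plain-text log lines to a stream (stdout by default)."""
    handler = logging.StreamHandler(stream)
    # One event per line: easy to grep or to browse in lnav
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously configured handlers
    root.setLevel(level)
    return handler


configure_plain_logging()
logging.getLogger("billing").info("invoice %s sent to %s", 42, "alice@example.com")
```

The call above prints a timestamp followed by `INFO billing: invoice 42 sent to alice@example.com`; routing that stream to a central store is left to the environment, not the application.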

  16. structured logging
    machine readable
    nested contexts
    format-agnostic
    Python: structlog
    https://2019.djangocon.eu/talks/logging-rethought-2-the-actions-of-frank-taylor-jr/
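
The slide recommends structlog. To show the idea without a third-party dependency, here is a hedged stdlib-only sketch that renders each record as one JSON object per line and merges in any fields passed via `extra=`; structlog provides this, plus context binding with `bind()`, out of the box. `JSONFormatter` and the `orders` logger are hypothetical names.

```python
import json
import logging
import sys


class JSONFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Attributes every LogRecord has; anything else arrived via `extra=`
    _STANDARD = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record):
        event = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Copy the caller's contextual fields into the event
        for key, value in vars(record).items():
            if key not in self._STANDARD:
                event[key] = value
        return json.dumps(event, default=str)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
log = logging.getLogger("orders")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("order placed", extra={"order_id": 1234, "user": "alice"})
# {"level": "INFO", "logger": "orders", "event": "order placed", "order_id": 1234, "user": "alice"}
```

Because every field is a separate key, a log store can filter on `order_id` directly instead of parsing it out of free text.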

  17. logging? what logging? ¯\_(ツ)_/¯
    for process in processes_to_archive:
        with transaction():
            archived_process = archivist.create_from(process)
            process_manager.delete(process)
        self.dispatch("process_archived", archived_process)
    tip: always log destructive actions!
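
A sketch of the slide's loop with the tip applied: the intent is logged before anything is deleted, and each deletion is logged once its transaction block has finished. Plain dicts stand in for the slide's `archivist` and `process_manager`, and `transaction()` is a hypothetical no-op placeholder for a real database transaction.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("archiving")


@contextmanager
def transaction():
    # Hypothetical placeholder: a real one would commit or roll back
    yield


def archive_processes(process_ids, live, archive):
    """Move processes from `live` to `archive`, logging every deletion."""
    # Log the intent first, so an interrupted run still leaves a trace
    logger.info("Archiving %d process(es): %s", len(process_ids), process_ids)
    for process_id in process_ids:
        with transaction():
            archive[process_id] = live[process_id]
            del live[process_id]
        # Logged after the transaction block, i.e. once the change went through
        logger.info("Archived and deleted process %s", process_id)
```

With logging configured at INFO level, `archive_processes([1, 2], live, archive)` emits one line announcing the batch and one line per deleted process, which is exactly what you want to find when someone asks where their data went.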

  18. does this code answer why?
    def activate_process(self, process):
        if (
            process.name
            and self.is_valid_name(process.name)
            and process.is_approved
            and process.owner.is_supervisor()
        ):
            self.process_controller.set_active(process)
    but does this code answer why not?

  19. does this code answer why not?
    def activate_process(self, process):
        if not process.name:
            logger.warning("Process %s has empty name", process)
            return
        if not self.is_valid_name(process.name):
            logger.warning("Process name %s isn't valid", process.name)
            return
        if not process.is_approved:
            logger.warning("Process %s requires approval", process)
            return
        # ...
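
The chain of early returns can also be expressed as data: each rule pairs a condition with the reason to log when it fails, so adding a check cannot silently skip its explanation. This is a hedged sketch of that alternative, not the talk's code; `Process`, `is_valid_name` and the rule list are hypothetical stand-ins for the objects on the slide.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("processes")


@dataclass
class Process:
    # Hypothetical minimal model of the slide's process object
    name: str
    is_approved: bool
    owner_is_supervisor: bool


def is_valid_name(name):
    # Hypothetical validation rule, for illustration only
    return name.isidentifier()


# (condition, reason logged when the condition is false), checked in order
ACTIVATION_RULES = [
    (lambda p: bool(p.name), "Process %r has empty name"),
    (lambda p: is_valid_name(p.name), "Process name %r isn't valid"),
    (lambda p: p.is_approved, "Process %r requires approval"),
    (lambda p: p.owner_is_supervisor, "Owner of process %r is not a supervisor"),
]


def can_activate(process):
    """Return True if every rule passes; otherwise log the first failure."""
    for check, reason in ACTIVATION_RULES:
        if not check(process):
            logger.warning(reason, process.name)
            return False
    return True
```

For example, `can_activate(Process("nightly_sync", is_approved=False, owner_is_supervisor=True))` returns `False` and logs `Process 'nightly_sync' requires approval`.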

  20. Targeted logging
    provide detailed, DEBUG level logs in production
    for specific services/users with issues
    without redeploying
    https://tersesystems.com/blog/2019/07/22/targeted-diagnostic-logging-in-production/
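
A minimal stdlib sketch of the idea, assuming each record carries a `user_id` attribute passed via `extra=`: a `logging.Filter` drops DEBUG records unless they concern a user currently under investigation. The targeted set is ordinary mutable state, so it can change at runtime without a redeploy; how it gets updated (admin endpoint, feature flag, config poll) is left open. `TargetedDebugFilter` is a hypothetical name, and this is not the linked article's implementation.

```python
import logging


class TargetedDebugFilter(logging.Filter):
    """Pass DEBUG records only for targeted users; INFO and above always pass."""

    def __init__(self, targeted_users=()):
        super().__init__()
        # Mutable at runtime: add or remove user ids while the service runs
        self.targeted_users = set(targeted_users)

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True
        return getattr(record, "user_id", None) in self.targeted_users


logger = logging.getLogger("checkout")
logger.setLevel(logging.DEBUG)  # let DEBUG reach the handler; the filter decides
handler = logging.StreamHandler()
targeting = TargetedDebugFilter()
handler.addFilter(targeting)
logger.addHandler(handler)

# While investigating a report from user 42:
targeting.targeted_users.add(42)
logger.debug("cart contents: %s", ["book"], extra={"user_id": 42})  # emitted
logger.debug("cart contents: %s", ["pen"], extra={"user_id": 7})    # dropped
```

Only the first `debug` call reaches the output; everyone else keeps the usual log volume.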

  21. Metrics
    numeric data measured over time
    controlled by devs, used by everyone
    need smart aggregation/retention rules
    https://blog.digitalocean.com/observability-and-metrics/
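
A toy sketch of aggregation and retention: raw measurements are collapsed into fixed-size time buckets that keep only count, sum and max, and only the newest buckets are retained. Real metric stores do this (and much more) for you; `RollingMetric` is a hypothetical class, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Bucket:
    start: int  # bucket start time, in seconds
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")


class RollingMetric:
    """Keep per-bucket summaries for the last `retention` buckets only."""

    def __init__(self, bucket_seconds=60, retention=60):
        self.bucket_seconds = bucket_seconds
        self.buckets = deque(maxlen=retention)  # oldest buckets fall off

    def record(self, timestamp, value):
        # Assumes non-decreasing timestamps; late data would open a new bucket
        start = int(timestamp) - int(timestamp) % self.bucket_seconds
        if not self.buckets or self.buckets[-1].start != start:
            self.buckets.append(Bucket(start))
        bucket = self.buckets[-1]
        bucket.count += 1
        bucket.total += value
        bucket.maximum = max(bucket.maximum, value)

    def averages(self):
        return [(b.start, b.total / b.count) for b in self.buckets]


latency = RollingMetric(bucket_seconds=60, retention=3)
for ts, ms in [(0, 120), (30, 80), (61, 200), (130, 50), (190, 70)]:
    latency.record(ts, ms)
print(latency.averages())  # [(60, 200.0), (120, 50.0), (180, 70.0)]
```

The first minute's data has already been dropped by the retention limit: storage stays bounded no matter how many raw samples arrive, at the cost of losing individual values.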

  22. business dashboard

  23. dev dashboard

  24. health dashboard

  25. Distributed tracing
    correlate flow of events across distributed system
    essential in the world of microservices
    Zipkin, Jaeger, Lightstep, OpenTracing... however, OpenTracing and OpenCensus are merging into OpenTelemetry
    https://thenewstack.io/opentracing-opencensus-merge-into-a-single-new-project-opentelemetry/

  26. Zipkin UI

  27. *three pillars is bullshit
    each pillar is flawed
    observability requires:
    high throughput
    high cardinality
    no sampling
    long retention
    all at once (and a pony)
    https://www.infoq.com/news/2019/02/rethinking-observability/

  28. Alerting

  29. Bad alerts
    too frequent, noisy
    tend to get ignored
    mask real problems

  30. Good alerts
    reliable
    actionable
    investigable
    serious enough
    have recovery instructions/runbooks attached
    https://www.infoworld.com/article/3265735/eliminating-alert-fatigue-a-devops-secret.html
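
A hedged sketch of two of these properties: the rule fires only after the condition has held for several consecutive checks (similar in spirit to the `for` clause in Prometheus alerting rules), which keeps one-off spikes from paging anyone, and every alert carries its runbook link. `AlertRule` and the runbook URL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    name: str
    threshold: float
    for_checks: int  # consecutive breaches required before firing
    runbook_url: str
    _breaches: int = field(default=0, init=False, repr=False)

    def evaluate(self, value):
        """Return an alert message, or None if nothing should fire."""
        if value > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # condition recovered: reset the streak
        if self._breaches == self.for_checks:  # fire once per streak
            return (f"[{self.name}] {value:.1%} > {self.threshold:.1%} for "
                    f"{self.for_checks} checks. Runbook: {self.runbook_url}")
        return None


rule = AlertRule("checkout-error-rate", threshold=0.05, for_checks=3,
                 runbook_url="https://wiki.example.com/runbooks/checkout-errors")
for error_rate in [0.09, 0.01, 0.06, 0.07, 0.08, 0.09]:
    alert = rule.evaluate(error_rate)
    if alert:
        print(alert)
# [checkout-error-rate] 8.0% > 5.0% for 3 checks. Runbook: https://wiki.example.com/runbooks/checkout-errors
```

The isolated 9% spike at the start never pages anyone; the sustained breach fires exactly once, with instructions attached.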

  31. Developing with observability in mind

  32. Healthy feedback loop
    build
    ship
    observe
    https://charity.wtf/2019/10/28/deploys-its-not-actually-about-fridays/

  33. Fast CI/CD process
    merged? ship it!
    deploy small changes frequently
    WE DO NOT BREAK USERSPACE!
    https://lkml.org/lkml/2012/12/23/75

  34. Reliable environment

  35. Code review tips
    how will you know if this is broken?
    most changes should be instrumented
    important changes must be instrumented
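
One low-friction way to make "how will you know if this is broken?" answerable in review is an instrumentation decorator that records latency and outcome for every call. This sketch keeps the numbers in an in-process dict; `METRICS`, `instrumented` and `charge` are hypothetical, and a real service would send the values to its metrics backend instead.

```python
import functools
import time
from collections import defaultdict

# Stand-in for a real metrics client: metric name -> list of durations (s)
METRICS = defaultdict(list)


def instrumented(name):
    """Record call latency under `<name>.ok` or `<name>.error`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "error"
            try:
                result = func(*args, **kwargs)
                outcome = "ok"
                return result
            finally:
                # Runs for both returns and exceptions
                METRICS[f"{name}.{outcome}"].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("payments.charge")
def charge(amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    return f"charged {amount}"
```

In review, a one-line `@instrumented(...)` on a new code path is cheap to ask for, and the error counter gives a direct answer to "how will we know?".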

  36. Final thoughts
    observability is a cross-cutting feature of your product
    embrace known unknowns
    limit unknown unknowns

  37. ὁ δὲ ἀνεξέταστος βίος οὐ βιωτὸς ἀνθρώπῳ

  38. The unexamined life is not worth living

  39. Thanks!

  40. Further reading
    https://grafana.com/blog/2019/10/21/whats-next-for-observability/
    https://charity.wtf/
    https://blog.revdebug.com/observability-in-microservices
