What if Socrates was a dev?

What if Socrates was a dev? or: notes on observability
Zbigniew Siciarz @ PyWaw 93 2020.03.30

Οἶδα οὐδὲν εἰδώς

I know that I know nothing

knowledge matrix

known knowns
- obvious context
- things we learned to do well
- example: JavaScript for FE developers (or is it?)

known unknowns
- complex/complicated context
- concepts we need to learn in order to progress
- for example: new APIs, system architectures
- but also: why does the system behave like this?

unknown knowns
- intuition/muscle memory
- years of practice and experience
- example: your native language
- https://www.freecodecamp.org/news/how-to-discover-your-unknown-knowns/

unknown unknowns ??? I don't even

four stages of competence https://en.wikipedia.org/wiki/Four_stages_of_competence

Observability

"The holy grail of observability is the ability to be able to ask any question, understand any previously unseen state your system may get itself into; without having to ship new code to handle that state (bc that implies you knew enough to predict it)"
-- Charity Majors (@mipsytipsy)

three pillars of observability*
- logs
- metrics
- distributed tracing

Logging
- controlled and used by devs (mostly)
- you wouldn't want to check each of 20 replicas by hand, would you?
- centralized logging
- the Twelve-Factor App showed us the way

plain text logging
- log lines are human readable
- grep-friendly, lnav-friendly

structured logging
- machine readable
- nested contexts
- format-agnostic
- Python: structlog
- https://2019.djangocon.eu/talks/logging-rethought-2-the-actions-of-frank-taylor-jr/
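A minimal sketch of the structured logging idea, using only the standard library rather than structlog so it runs without extra packages. The `JsonFormatter` class, the `build_logger` helper and the `context` key passed through `extra` are assumptions made for this illustration, not structlog's API.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Assumed convention: structured fields arrive as extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Return a logger that writes one JSON line per event to `stream`."""
    logger = logging.getLogger("structured_demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.warning(
    "process_rejected",
    extra={"context": {"process_id": 42, "reason": "empty name"}},
)
print(stream.getvalue(), end="")
```

Each line is a JSON object, so a log aggregator can filter on `process_id` or `reason` directly instead of parsing free text.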

logging? what logging? ¯\_(ツ)_/¯
    for process in processes_to_archive:
        with transaction():
            archived_process = archivist.create_from(process)
            process_manager.delete(process)
            self.dispatch("process_archived", archived_process)
tip: always log destructive actions!
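One possible way to follow that tip, as a sketch: the same archive-then-delete loop with a log line around the destructive step. Plain lists stand in for the slide's `archivist` and `process_manager`, and the transaction is left out; both are simplifications assumed here so the example can run on its own.

```python
import logging

logger = logging.getLogger("archiver")


def archive_processes(processes_to_archive, archived, active):
    """Archive and delete processes, leaving a log trail for every deletion.

    `archived` and `active` are plain lists standing in for the real
    archive store and process manager (assumption for this sketch).
    """
    count = 0
    for process in list(processes_to_archive):
        logger.info("Archiving process %r", process)
        archived.append(process)
        active.remove(process)  # the destructive step
        logger.warning("Deleted process %r after archiving", process)
        count += 1
    logger.info("Archived %d processes", count)
    return count
```

When a process later turns out to be missing, the log now says when it was deleted and that an archived copy was made first.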

does this code answer why?
    def activate_process(self, process):
        if (
            process.name
            and self.is_valid_name(process.name)
            and process.is_approved
            and process.owner.is_supervisor()
        ):
            self.process_controller.set_active(process)
but does this code answer why not?

does this code answer why not?
    def activate_process(self, process):
        if not process.name:
            logger.warning(f"Process {process} has empty name")
            return
        if not self.is_valid_name(process.name):
            logger.warning(f"Process name {process.name} isn't valid")
            return
        if not process.is_approved:
            logger.warning(f"Process {process} requires approval")
            return
        # ...

Targeted logging
- provide detailed, DEBUG level logs in production for specific services/users with issues
- without redeploying
- https://tersesystems.com/blog/2019/07/22/targeted-diagnostic-logging-in-production/
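A rough sketch of targeted logging with the standard `logging` module: a filter lets INFO and above through for everyone, but DEBUG only for users on a target list that can change at runtime. The `TargetedDebugFilter` name and the `user` field passed through `extra` are assumptions for this example; in a real system the target list would probably come from a config service or feature flag.

```python
import logging


class TargetedDebugFilter(logging.Filter):
    """Pass INFO and above for everyone, DEBUG only for targeted users."""

    def __init__(self, targeted_users=()):
        super().__init__()
        # Mutable on purpose: users can be added or removed without a redeploy.
        self.targeted_users = set(targeted_users)

    def filter(self, record):
        if record.levelno >= logging.INFO:
            return True
        # Assumed convention: callers pass extra={"user": ...} on each call.
        return getattr(record, "user", None) in self.targeted_users
```

The filter is attached to a handler with `handler.addFilter(...)` while the logger itself stays at DEBUG. Adding a name to `targeted_users` then turns on detailed logs for that user only. The trade-off is that DEBUG records are still created for everyone and discarded at the handler.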

Metrics
- numeric data measured over time
- controlled by devs, used by everyone
- need smart aggregation/retention rules
- https://blog.digitalocean.com/observability-and-metrics/
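To make "aggregation/retention rules" a bit more concrete, here is a small sketch of one common approach: raw samples are rolled up into fixed time buckets, so the compact summaries can be kept much longer than the raw data. The `downsample` function and its output shape are made up for this illustration.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds=60):
    """Aggregate (timestamp, value) samples into fixed-size time buckets.

    Returns {bucket_start: {"count", "min", "max", "avg"}}, ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }
```

Keeping `min` and `max` next to `avg` means a short spike is still visible after the raw samples are gone, which an average alone would hide.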

business dashboard

dev dashboard

health dashboard

Distributed tracing
- correlate flow of events across distributed system
- essential in the world of microservices
- Zipkin, Jaeger, Lightstep, OpenTracing, however...
- https://thenewstack.io/opentracing-opencensus-merge-into-a-single-new-project-opentelemetry/
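The core of "correlate flow of events" can be sketched without any tracing library: every incoming request either continues an existing trace ID or starts a new one, and the same ID is attached to every outgoing call. The header name `X-Trace-Id` and both function names are assumptions for this example. Real tracers such as Zipkin or Jaeger use their own header formats and also record timed spans, which this sketch leaves out.

```python
import contextvars
import uuid

# Hypothetical header name; real tracers define their own propagation formats.
TRACE_HEADER = "X-Trace-Id"

_trace_id = contextvars.ContextVar("trace_id", default=None)


def start_or_continue_trace(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    _trace_id.set(trace_id)
    return trace_id


def outgoing_headers():
    """Headers to attach to downstream calls so they join the same trace."""
    trace_id = _trace_id.get()
    return {TRACE_HEADER: trace_id} if trace_id else {}
```

If every service also writes the trace ID into its log lines, one search by ID shows the whole path of a request across services.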

Zipkin UI

*three pillars is bullshit
- each pillar is flawed
- observability requires: high throughput, high cardinality, no sampling, long retention
- all at once (and a pony)
- https://www.infoq.com/news/2019/02/rethinking-observability/

Alerting

Bad alerts
- too frequent, noisy
- tend to get ignored
- mask real problems

Good alerts
- reliable
- actionable
- investigable
- serious enough
- have recovery instructions/runbooks attached
- https://www.infoworld.com/article/3265735/eliminating-alert-fatigue-a-devops-secret.html
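A sketch of how two of these properties might look in code: the rule fires only after several consecutive breaches (less noise from one-off spikes) and every alert carries a runbook link. The `AlertRule` class, its fields and the example URL are all hypothetical, not taken from any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    threshold: float
    consecutive_breaches: int
    runbook_url: str  # hypothetical link to recovery instructions

    def evaluate(self, recent_values):
        """Return an alert dict if the last N values all exceed the threshold."""
        window = recent_values[-self.consecutive_breaches:]
        if len(window) < self.consecutive_breaches:
            return None  # not enough data to decide yet
        if all(value > self.threshold for value in window):
            return {
                "alert": self.name,
                "values": window,
                "runbook": self.runbook_url,
            }
        return None
```

For example, a rule with `threshold=0.05` and `consecutive_breaches=3` stays quiet for `[0.01, 0.2, 0.01]` and fires for `[0.06, 0.07, 0.08]`.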

Developing with observability in mind

Healthy feedback loop
- build
- ship
- observe
- https://charity.wtf/2019/10/28/deploys-its-not-actually-about-fridays/

Fast CI/CD process
- merged? ship it!
- deploy small changes frequently
- WE DO NOT BREAK USERSPACE!
- https://lkml.org/lkml/2012/12/23/75

Reliable environment

Code review tips
- how will you know if this is broken?
- most changes should be instrumented
- important changes must be
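One lightweight way to instrument a change, as a sketch: a decorator that logs the outcome and duration of every call. The `instrumented` name and the log message format are assumptions for this example. A production setup would more likely emit a metric or a trace span than a plain log line.

```python
import functools
import logging
import time

logger = logging.getLogger("instrumentation")


def instrumented(func):
    """Log outcome and duration of each call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        outcome = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            outcome = "error"
            raise  # instrumentation must not swallow the failure
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info(
                "%s outcome=%s duration_ms=%.1f",
                func.__name__, outcome, elapsed_ms,
            )

    return wrapper
```

With this in place, the reviewer's question "how will you know if this is broken?" has at least a first answer: error outcomes and unusual durations show up in the logs.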

Final thoughts
- observability is a cross-cutting feature of your product
- embrace known unknowns
- limit unknown unknowns

ὁ δὲ ἀνεξέταστος βίος οὐ βιωτὸς ἀνθρώπῳ

The unexamined life is not worth living

Thanks!

Further reading
- https://grafana.com/blog/2019/10/21/whats-next-for-observability/
- https://charity.wtf/
- https://blog.revdebug.com/observability-in-microservices
