Observability is Programmed

Yury Nino
August 23, 2022

Presented at QCon Live

Transcript

  1. Agenda

    - Current Observability Landscape
    - Why Observability as Code [OaC]?
    - What are the benefits of [OaC]?
    - A methodology for starting with [OaC]
  2. What is Observability?

    In Control Theory, observability is defined as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. In Software Engineering, observability allows us to understand:
    • Any state of the system.
    • The inner workings of its components.
    • All without shipping any new custom code.
    • And solely by interrogating with external tools.
  3. What is NOT Observability?

    Some vendors insist that observability is simply another synonym for telemetry, indistinguishable from monitoring!
    Observability is defined as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
    Monitoring is about collecting, processing, aggregating, and displaying real-time quantitative metrics about a system.
  4. For modern software systems, observability is not about mathematical equations. It is about how people interact with and try to understand their complex systems.
  5. Observability Evolution

    [Timeline figure: 1960, 2013, 2016, 2017, 2018, 2020, 2022. Label: The (four) Pillars of Observability at Twitter]
    Source: https://www.humio.com/blog/observability-redefined/
  6. The purpose of DevOps Automation isn’t just speed, it’s about leveraging the intrinsic motivation and creativity of developers again by freeing them from non-creative, tedious repair work!

    Observability-driven development (ODD) uses data and tooling to observe the state and behavior of a system before, during, and after development to learn more about its patterns of weakness.
  7. Is this: Observability as Code?

    [Diagram, steps numbered 1 to 6: Developers, Local IaC on Git, IaC in Git, Continuous Integration, Continuous Delivery, Continuous Deployment, SaaS Pipelines, Development Environment, Production Environment]
  8. What is Observability as Code?

    Since Monitoring is mostly about metrics, while Observability is about events … Observability as Code must include:
    • Many actionable active checks and alerts.
    • Proactively notifying engineers of failures and warnings.
    • Maintaining a runbook for stability and predictability in production systems.
    • Expecting clusters and clumps of tightly coupled systems to all break at once.
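
The requirements on this slide (actionable checks, proactive notification, a runbook per alert) can be sketched as alert rules kept in version control. The following is a minimal illustration only, not the speaker's implementation: the `AlertRule` format, the field names, the metric names, and the runbook paths are all hypothetical.

```python
from dataclasses import dataclass
import operator

# Hypothetical rule format: the field names are illustrative and do not
# come from the talk or from any specific monitoring product.
@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    comparison: str   # one of ">", "<", ">=", "<="
    threshold: float
    severity: str     # "warning" or "failure"
    runbook: str      # where the runbook entry for this alert lives

_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def validate(rules):
    """Return a list of problems so a CI step can reject a bad change."""
    problems = []
    seen = set()
    for rule in rules:
        if rule.name in seen:
            problems.append(f"{rule.name}: duplicate rule name")
        seen.add(rule.name)
        if rule.comparison not in _OPS:
            problems.append(f"{rule.name}: unknown comparison {rule.comparison!r}")
        if not rule.runbook:
            problems.append(f"{rule.name}: missing runbook")
    return problems


def evaluate(rules, sample):
    """Return the rules that fire for one sample of metric values.

    Assumes the rules already passed validate(); metrics missing from
    the sample are skipped rather than treated as failures.
    """
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and _OPS[rule.comparison](value, rule.threshold):
            fired.append(rule)
    return fired


# Example rules (assumed metric names and runbook paths).
RULES = [
    AlertRule("high_error_rate", "http_5xx_ratio", ">", 0.05,
              "failure", "runbooks/high_error_rate.md"),
    AlertRule("low_disk", "disk_free_gb", "<", 10.0,
              "warning", "runbooks/low_disk.md"),
]
```

Running `validate(RULES)` in a pipeline before the rules are applied is one way to make the checks reviewable and repeatable; `evaluate(RULES, sample)` shows which alerts a given set of metric values would raise.
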
  9. Is this: Observability as Code?

    [Diagram, steps numbered 1 to 6: Developers, Local IaC on Git, IaC in Git, Continuous Integration, Continuous Delivery, Continuous Deployment, SaaS Pipelines, Development Environment, Production Environment. Both environments are annotated: Answer questions, Make Decisions]
  10. Because …

    • Repeatable & Reusable
    • Security
    • Context & Documentation
    • Auditable History
    • Disaster Recovery
    • Efficient Delta Changes
    • Ownership & Packaging
    • Reduce Toiling
  11. Elementary

    Levels: Elementary · Simple · Sophisticated · Advanced
    • Team is distracted by picking the wrong way to fix bugs.
    • Team is collecting metrics but is not monitoring them.
    • There is an interest in implementing [OaC].
    • Metrics are not visualized and do not deliver value to the business.
    • Code is poorly instrumented, so new builds are not examined.
    • Incident responders cannot easily diagnose issues.
  12. Simple

    Levels: Elementary · Simple · Sophisticated · Advanced
    • Team is using a monitoring platform and is familiar with its API features.
    • Team is determining what to monitor based on the list of services and the KPIs that need to be met.
    • The process is administered manually and requires a lot of human intervention.
    • Simple events are applied, like "turn it off", but there is no methodology for notifying the team about them.
  13. Sophisticated

    Levels: Elementary · Simple · Sophisticated · Advanced
    • Team is using [IaC] tools and is familiar with the CI/CD capabilities of code versioning systems.
    • An automation workflow for [OaC] is implemented and is running in lower environments.
    • Engineers find it intuitive to debug problems and troubleshoot incidents in production.
    • Metrics are collected and visualized to deliver value to the business, capturing known unknowns.
  14. Advanced

    Levels: Elementary · Simple · Sophisticated · Advanced
    • An automation workflow for [OaC] is implemented and is running in production.
    • Engineers can trigger deployment of their own code after it’s been peer reviewed, satisfies controls, and is checked in.
    • Observability code paths can be enabled or disabled instantly, without needing a deployment.
    • [OaC] allows using the same tooling to debug code on one machine as on 10,000.
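
One way to picture "enabled or disabled instantly, without needing a deployment" is a toggle that is read at call time from a file operators can edit. This is only a sketch under stated assumptions: the JSON toggle file, the `observed` decorator, and the toggle names are hypothetical, and a real system would more likely use a feature-flag service and cache the lookup instead of re-reading a file on every call.

```python
import functools
import json


def load_toggles(path):
    """Read the toggle file; a missing or malformed file means 'all off'."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def observed(toggle_name, toggles_path, events):
    """Decorator that records a call event only while the toggle is on.

    The toggle is checked on every call, so editing the file changes the
    behavior of an already running process without a redeploy.
    """
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            if load_toggles(toggles_path).get(toggle_name, False):
                events.append({"event": toggle_name, "args": args})
            return func(*args, **kwargs)
        return inner
    return wrap
```

A function decorated with `@observed("checkout.debug", "toggles.json", events)` runs unchanged while the toggle is absent or `false`, and starts appending events to the `events` list as soon as the file contains `{"checkout.debug": true}`.
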
  15. In shadows

    Stages: In shadows · Investment · Adoption · Cultural Expectation
    • There is low or no organizational awareness, and product teams do not receive feedback on their features.
    • Early adopters infrequently perform monitoring or observability strategies.
    • Team is identifying where to observe and is designing in such a way as to make instrumentation easy.
    • Team has decided to adopt [OaC], but is unsure how to get started and avoid common dead ends.
  16. Investment

    Stages: In shadows · Investment · Adoption · Cultural Expectation
    • [OaC] is officially sanctioned and practitioners are dedicating resources to the practice.
    • Team is identifying where to observe and is designing in such a way as to make instrumentation easy.
    • On-call duty is not excessively stressful, and engineers are not hesitant to take additional shifts as needed.
    • Multiple teams are interested and engaged with a strategy for observing several critical services.
  17. Adoption

    Stages: In shadows · Investment · Adoption · Cultural Expectation
    • [OaC] is officially sanctioned and there is a team dedicated to implementing it.
    • Developers have easy access to [KPIs] for outcomes and system utilization/cost, and can visualize them.
    • Team is following [OaC] practice to enforce observability as part of continuous deployment.
    • Team is adding metric collection, tracing, and context to get better insights.
  18. Cultural Expectation

    Stages: In shadows · Investment · Adoption · Cultural Expectation
    • Instrumentation is standardized, with best practices like proactive monitoring and alerting in place.
    • There is a feedback loop from observations to stakeholder teams that takes advantage of Observability as Code.
    • Team is using insights to discuss learnings that are shared and implemented through initiatives.
    • Team is familiar with strategies such as OpenTelemetry, which provides a single set of components and language-specific telemetry libraries.