Observability is Programmed

Slide 1

Slide 1 text

Observability is Programmed: Observability as Code

Slide 2

Slide 2 text

Yury Niño Roa, Cloud Infrastructure Engineer @Google

Slide 3

Slide 3 text

Agenda
- Current Observability Landscape
- Why Observability as Code [OaC]?
- What are the benefits of [OaC]?
- A methodology for starting with [OaC]

Slide 4

Slide 4 text

Observability Landscape

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

What is Observability?
In Control Theory, observability is defined as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
In Software Engineering, observability allows us to understand:
● Any state of the system.
● The inner workings of its components.
● All without shipping any new custom code.
● And solely by interrogating the system with external tools.

Slide 7

Slide 7 text

What is NOT Observability?
Some vendors insist that observability is simply another synonym for telemetry, indistinguishable from monitoring!
Observability is defined as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
Monitoring is about collecting, processing, aggregating, and displaying real-time quantitative metrics about a system.
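As a minimal sketch of that distinction, the Python below contrasts a pre-aggregated metric with raw events that can be sliced by any field at query time. The event fields (`route`, `customer`, `status`, `duration_ms`) and the helper names are hypothetical, chosen only for illustration; they are not taken from the talk.

```python
from collections import Counter

# Hypothetical request events; every field name here is an illustrative assumption.
EVENTS = [
    {"route": "/checkout", "customer": "acme", "status": 500, "duration_ms": 912},
    {"route": "/checkout", "customer": "acme", "status": 500, "duration_ms": 877},
    {"route": "/checkout", "customer": "globex", "status": 200, "duration_ms": 120},
    {"route": "/search", "customer": "acme", "status": 200, "duration_ms": 45},
]


def is_error(event):
    """Treat any 5xx status as a server-side error."""
    return event["status"] >= 500


def error_rate(events):
    """Monitoring view: one aggregated number whose definition was fixed in advance."""
    return sum(1 for e in events if is_error(e)) / len(events)


def break_down(events, field, predicate):
    """Observability view: group matching raw events by a field chosen at query time."""
    return Counter(e[field] for e in events if predicate(e))


print(error_rate(EVENTS))
print(break_down(EVENTS, "customer", is_error))
print(break_down(EVENTS, "route", is_error))
```

The aggregated rate says that something is failing; the breakdown over retained events is what lets a question nobody anticipated (which customer, which route) be answered without shipping new code.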

Slide 8

Slide 8 text

For modern software systems, observability is not about mathematical equations. It is about how people interact with and try to understand their complex systems.

Slide 9

Slide 9 text

Observability Evolution
[Timeline: 1960, 2013, 2016, 2017, 2018, 2020, 2022]
The (four) Pillars of Observability at Twitter
https://www.humio.com/blog/observability-redefined/

Slide 10

Slide 10 text

Data Observability will help organizations better understand and troubleshoot their data-intensive systems.

Slide 11

Slide 11 text

Observability as Code

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

It is part of a bigger thing: Observability-Driven Development
[Diagram labels: Visualize, Stack, Instrumentation, Application]

Slide 14

Slide 14 text

The purpose of DevOps automation isn’t just speed. It’s about leveraging the intrinsic motivation and creativity of developers again by freeing them from non-creative, tedious repair work!
Observability-driven development (ODD) uses data and tooling to observe the state and behavior of a system before, during, and after development to learn more about its patterns of weakness.

Slide 15

Slide 15 text

How does Observability as Code look?

Slide 16

Slide 16 text

Is this Observability as Code?
[Diagram, steps 1 to 6: Developers, Local IaC on Git, IaC in Git SaaS, Continuous Integration, Continuous Delivery, Continuous Deployment, Pipelines, Development Environment, Production Environment]

Slide 17

Slide 17 text

Remember! Monitoring is monitoring! Observing is event-first testing in production.
So, how does Observability as Code look?

Slide 18

Slide 18 text

What is Observability as Code?
Since Monitoring is mostly metrics, while Observability is about events … Observability as Code must include:
● Many actionable active checks and alerts.
● Proactively notifying engineers of failures and warnings.
● Maintaining a runbook for stability and predictability in production systems.
● Expecting clusters and clumps of tightly coupled systems to all break at once.
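The items above can be sketched as alert rules that live in version control next to the service they watch. This is a minimal Python illustration under stated assumptions: the `AlertRule` type, the thresholds, the measurement keys (`error_rate`, `p99_ms`), and the runbook paths are all hypothetical and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AlertRule:
    """A check kept in version control; all fields are illustrative assumptions."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # returns True when the rule fires
    severity: str  # "warning" or "failure"
    runbook: str   # hypothetical path to the runbook for this alert


RULES: List[AlertRule] = [
    AlertRule(
        name="checkout-error-rate",
        condition=lambda m: m.get("error_rate", 0.0) > 0.05,
        severity="failure",
        runbook="runbooks/checkout-error-rate.md",
    ),
    AlertRule(
        name="checkout-latency-p99",
        condition=lambda m: m.get("p99_ms", 0.0) > 800,
        severity="warning",
        runbook="runbooks/checkout-latency.md",
    ),
]


def evaluate(rules, measurements):
    """Return one notification line per firing rule, pointing at its runbook."""
    return [
        f"[{rule.severity}] {rule.name}: see {rule.runbook}"
        for rule in rules
        if rule.condition(measurements)
    ]


notifications = evaluate(RULES, {"error_rate": 0.12, "p99_ms": 430})
for line in notifications:
    print(line)
```

Because the rules are plain code, a change to a threshold or a runbook link goes through the same review and history as any other change.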

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Why Observability as Code?

Slide 21

Slide 21 text

Because …
● Repeatable & Reusable
● Security
● Context & Documentation
● Auditable History
● Disaster Recovery
● Efficient Delta Changes
● Ownership & Packaging
● Reduce Toiling

Slide 22

Slide 22 text

How to start with Observability as Code?

Slide 23

Slide 23 text

Observability Maturity Model
[Chart labels: High Sophistication, High Adoption, Observability Pioneer]

Slide 24

Slide 24 text

Sophistication
Levels: Elementary, Simple, Sophisticated, Advanced

Slide 25

Slide 25 text

Elementary
● Team is distracted by picking the wrong way to fix bugs.
● Team is collecting metrics but they are not monitoring them.
● There is an interest in implementing [OaC].
● Metrics are not visualized and do not give value to business.
● Code is poorly instrumented so new builds are not examined.
● Incident responders cannot easily diagnose issues.

Slide 26

Slide 26 text

Simple
● Team is using a monitoring platform and is familiar with its API features.
● Team is determining what to monitor based on a list of services and the KPIs that must be met.
● The process is administered manually and requires a lot of human intervention.
● Simple events are applied, such as turning it off, but there is no methodology for notifying the team about them.

Slide 27

Slide 27 text

Sophisticated
● Team is using [IaC] tools and is familiar with the CI/CD capabilities of code versioning systems.
● An automation workflow for [OaC] is implemented and it is running in lower environments.
● Engineers find it intuitive to debug problems and troubleshoot incidents in production.
● Metrics are collected and visualized to give value to business, capturing known unknowns.

Slide 28

Slide 28 text

Advanced
● An automation workflow for [OaC] is implemented and it is running in production.
● Engineers can trigger deployment of their own code after it’s been peer reviewed, satisfies controls, and is checked in.
● Observability code paths can be enabled or disabled instantly, without needing a deployment.
● [OaC] allows using the same tooling to debug code on one machine as on 10,000.
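The point about enabling or disabling observability code paths without a deployment can be sketched with a flag that is read on every call. This is a minimal Python illustration, not a specific product's API: `ObservabilityFlags` is an in-memory stand-in for a runtime flag service, and `CAPTURED`, `observed`, and `order_total` are hypothetical names introduced for the example.

```python
import functools
import threading


class ObservabilityFlags:
    """In-memory stand-in for a runtime flag service (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._enabled = set()

    def set(self, name, enabled):
        with self._lock:
            if enabled:
                self._enabled.add(name)
            else:
                self._enabled.discard(name)

    def is_enabled(self, name):
        with self._lock:
            return name in self._enabled


FLAGS = ObservabilityFlags()
CAPTURED = []  # stand-in for an event pipeline


def observed(flag_name):
    """Record a call event only while the named flag is switched on."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # The flag is checked on every call, so flipping it takes effect immediately.
            if FLAGS.is_enabled(flag_name):
                CAPTURED.append({"fn": func.__name__, "args": args, "result": result})
            return result
        return wrapper
    return decorator


@observed("debug-orders")
def order_total(quantity, unit_price):
    return quantity * unit_price


order_total(1, 10)                # flag off: nothing is recorded
FLAGS.set("debug-orders", True)   # flipped at runtime, no redeploy
order_total(3, 20)                # recorded
FLAGS.set("debug-orders", False)
order_total(5, 30)                # flag off again: nothing is recorded
print(CAPTURED)
```

In a real system the flag state would come from an external store rather than process memory, so that the same switch applies across every instance.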

Slide 29

Slide 29 text

Adoption
Levels: In the shadows, Investment, Adoption, Cultural Expectation

Slide 30

Slide 30 text

In the shadows
● There is low or no organizational awareness, and product teams do not receive feedback on their features.
● Early adopters infrequently apply monitoring or observability strategies.
● Team is identifying where to observe and is designing in such a way as to make instrumentation easy.
● Team has decided to adopt [OaC], but is unsure how to get started while avoiding common dead ends.

Slide 31

Slide 31 text

Investment
● [OaC] is officially sanctioned and practitioners are dedicating resources to the practice.
● Team is identifying where to observe and is designing in such a way as to make instrumentation easy.
● On-call duty is not excessively stressful, and engineers are not hesitant to take additional shifts as needed.
● Multiple teams are interested and engaged with a strategy for observing several critical services.

Slide 32

Slide 32 text

Adoption
● [OaC] is officially sanctioned and there is a team dedicated to implementing it.
● Developers have easy access to [KPIs] for outcomes and system utilization/cost, and can visualize them.
● Team is following [OaC] practice to enforce observability as part of continuous deployment.
● Team is adding metric collection, tracing, and context to get better insights.
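Enforcing observability as part of continuous deployment can be sketched as a pipeline gate that rejects service definitions missing their observability artifacts. This is a minimal Python illustration under stated assumptions: the required keys (`dashboard`, `alerts`, `slo`), the service definitions, and the file paths are all hypothetical policy choices, not something prescribed by the talk.

```python
# Assumed policy for illustration: every service must declare these artifacts.
REQUIRED_ARTIFACTS = ("dashboard", "alerts", "slo")


def missing_artifacts(service):
    """List the required observability artifacts a service definition does not declare."""
    declared = service.get("observability", {})
    return [key for key in REQUIRED_ARTIFACTS if not declared.get(key)]


def gate(services):
    """Map each failing service name to its missing artifacts; empty means the gate passes."""
    failures = {}
    for service in services:
        missing = missing_artifacts(service)
        if missing:
            failures[service["name"]] = missing
    return failures


# Hypothetical service definitions, as they might be parsed from files in the repository.
SERVICES = [
    {
        "name": "checkout",
        "observability": {
            "dashboard": "dashboards/checkout.json",
            "alerts": "alerts/checkout.yaml",
            "slo": "slo/checkout.yaml",
        },
    },
    {"name": "search", "observability": {"dashboard": "dashboards/search.json"}},
]

failures = gate(SERVICES)
print(failures)
```

A pipeline step could exit with a non-zero status whenever `failures` is non-empty, which blocks the deployment until the missing definitions are checked in.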

Slide 33

Slide 33 text

Cultural Expectation
● Instrumentation is standardized, with best practices like proactive monitoring and alerting in place.
● There is a feedback loop from the observations to the stakeholder team, taking advantage of Observability as Code.
● Team uses insights to discuss learnings, which are shared and implemented through initiatives.
● Team is familiar with strategies such as OpenTelemetry, which standardizes telemetry into a single set of components and language-specific telemetry libraries.

Slide 34

Slide 34 text

Waiting Is Not an Option
Thank you very much!