
Telemetry Everywhere: Observability in the DevOps Cosmos


From a talk with Moogsoft on how observability supports teams through their DevOps evolution as they become higher-performing organizations.

Helen Beal

July 22, 2020


Transcript

  1. Human

    Chief Ambassador: DevOps Institute
    DevOps Editor: InfoQ
    Ambassador: CD Foundation
    Analyst: Accelerated Strategies
    WoW coach, speaker, learning facilitator
    Geek, wordsmith, Bananagrammer
    Volunteer warden at Kingley Vale
    Once saw a flamingo lay an egg
    Can dig an Olive Ridley turtle nest
    Mission: Bringing joy to work
  2. Automation and Astronomy

    “DevOps isn’t about automation, just as astronomy isn’t about telescopes.”
    Chris Little, Senior Research Director, Gartner
  3. Continuous Service Assurance

    • Launch the rocket ship more frequently
    • Make every launch safer
    • Predict meteors: prevent data disasters
    • Smooth mission workflows
    • Turn data into actionable insights
    • Recycle knowledge
    • Power continuous innovation
    • Create more stars: optimize profitability
    • Delight all the planets’ customers

    The most overlooked competitive advantage is uninterrupted service.

    Icons made by Freepik, Eucalyp and Icongeek26 from www.flaticon.com
  4. What does this mean for DevOps evolution?

    Culture
    • Visibility and transparency build trust
    • Data-driven, not opinion-driven, conversations
    • Fast feedback on experiments
    • A tool that supports team autonomy: “We build it, we own it”

    Automation
    • Accelerated root cause(s) analysis and insights
    • Pre-emptive warning and forecasting of operating behavior
    • Automated service assurance
    • Data discovery, crunch & insights

    Lean
    • Accelerates flow (MTTx)
    • Removes handoffs and delays between teams
    • Observability across the end-to-end value stream
    • Focus on customer experience

    Measurement
    • Real data that measures progress and improvements: operations, SRE, SLOs and error budgets
    • Actionable insights based on streaming data
    • Telemetry everywhere

    Sharing
    • Provides a shared platform for collaborative analysis
    • Builds a knowledge base so local discoveries become global improvements
    • ChatOps
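“MTTx” is shorthand for the family of “mean time to X” metrics, such as mean time to detect (MTTD) and mean time to resolve or restore (MTTR). The sketch below shows how such figures might be derived from incident telemetry. It is a minimal illustration, not material from the talk: the incident records and the `mean_time` helper are hypothetical, and it assumes MTTR is measured from incident start, which is only one of several conventions teams use.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: when each incident began, was detected and was resolved.
incidents = [
    {"start": datetime(2020, 7, 1, 9, 0),
     "detected": datetime(2020, 7, 1, 9, 10),
     "resolved": datetime(2020, 7, 1, 10, 0)},
    {"start": datetime(2020, 7, 5, 14, 0),
     "detected": datetime(2020, 7, 5, 14, 20),
     "resolved": datetime(2020, 7, 5, 16, 0)},
]


def mean_time(records, from_key, to_key):
    """Average the elapsed time between two timestamps across all records."""
    if not records:
        raise ValueError("no incidents to average")
    # Sum timedeltas starting from a zero timedelta, then divide by the count.
    total = sum((r[to_key] - r[from_key] for r in records), timedelta())
    return total / len(records)


mttd = mean_time(incidents, "start", "detected")  # mean time to detect
mttr = mean_time(incidents, "start", "resolved")  # mean time to resolve, from incident start
print(f"MTTD: {mttd}, MTTR: {mttr}")
```

With these made-up records, MTTD is 15 minutes and MTTR is 90 minutes. Tracking how such averages move over time, rather than any single value, is what indicates whether flow is accelerating.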
  5. What does this mean for an SRE?

    • Reducing the toil associated with incident management, particularly around cause analysis, improving uptime and MTTR
    • Providing a platform for inspecting and adapting according to SLOs, and ultimately improving teams’ ability to meet them
    • Offering a route to improvement when SLOs are not met and error budgets are overspent
    • Relieving team cognitive load when dealing with vast amounts of data, reducing burnout
    • Releasing humans and teams from toil, improving productivity, innovation, and the flow and delivery of value
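An error budget is the amount of unreliability an SLO permits: with a 99.9% success target, up to 0.1% of requests in the SLO window may fail. The hypothetical `ErrorBudget` class below is a minimal, request-based sketch of how a team might check whether its budget is overspent; time-based budgets (allowed minutes of downtime) are another common approach, and all numbers here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical model)."""
    slo_target: float     # e.g. 0.999 = 99.9% of requests must succeed
    total_requests: int   # requests served in the SLO window
    failed_requests: int  # requests that counted as "bad" against the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget itself: the share of requests the SLO permits to fail.
        return (1 - self.slo_target) * self.total_requests

    @property
    def consumed(self) -> float:
        # Fraction of the budget spent; anything above 1.0 means it is overspent.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.allowed_failures

    @property
    def over_spent(self) -> bool:
        return self.failed_requests > self.allowed_failures


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
print(f"budget: {budget.allowed_failures:.0f} failures, "
      f"consumed: {budget.consumed:.0%}, overspent: {budget.over_spent}")
```

Here the budget is 1,000 failed requests, so 1,500 failures consume roughly 150% of it and `over_spent` is `True`. Because of floating-point arithmetic, `allowed_failures` may not be an exact integer, so compare it with a tolerance rather than `==`.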