Datadog & Me
• Founding member and organizer of JDDUG Fukuoka
• Speaker at Datadog Live Tokyo 2025 (June)
• Speaker at Datadog Live Tokyo 2025 (December)
• Exhibited at the JDDUG booth at Datadog Summit (October)
• Visited the Datadog Japan office for a total of 6 days
100% Reliability Is Rarely the Right Answer
services - ones that never fail. It turns out that past a certain point, however, increasing reliability is worse for a service (and its users) rather than better! Extreme reliability comes at a cost: maximizing stability limits how fast new features can be developed and how quickly products can be delivered to users, and dramatically increases their cost, which in turn reduces the numbers of features a team can afford to offer. Further, users typically don't notice the difference between high reliability and extreme reliability in a service, because the user experience is dominated by less reliable components like the cellular network or the device they are working with. Put simply, a user on a 99% reliable smartphone cannot tell the difference between 99.99% and 99.999% service reliability! With this in mind, rather than simply maximizing uptime, Site Reliability Engineering seeks to balance the risk of unavailability with the goals of rapid innovation and efficient service operations, so that users' overall happiness - with features, service, and performance - is optimized.
Ref: https://sre.google/sre-book/embracing-risk/
My Interpretation
① Over-Engineering Reliability = Higher Costs + Slower Velocity + Diminishing Returns for Users = No Value Added
② SRE = Reliability × Velocity × Cost Efficiency = Maximize Overall Value through Balance
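The smartphone example from the SRE book can be checked with a little arithmetic. The sketch below assumes the device and the service fail independently and that the user needs both to be up, so the availabilities multiply; the function name and that serial-chain model are illustrative assumptions, not something stated on the slide.

```python
def end_to_end_availability(*components: float) -> float:
    """Availability of a serial chain: every component must be up at once.

    Assumes independent failures (an illustrative simplification).
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


DEVICE = 0.99  # the "99% reliable smartphone" from the quote

four_nines = end_to_end_availability(DEVICE, 0.9999)
five_nines = end_to_end_availability(DEVICE, 0.99999)

# What the user actually experiences in each case, and the gap between them.
print(f"99.99%  service -> {four_nines:.5%} end to end")
print(f"99.999% service -> {five_nines:.5%} end to end")
print(f"gain from the extra nine: {five_nines - four_nines:.5%}")
```

Under these assumptions the extra nine moves what the user experiences by less than one hundredth of a percentage point, because the device dominates the result.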
Critical Systems
• supporting all Nulab services
• Handles sensitive org/user data
• Auth, Billing, and Security-hardening features
The Burden of Return
• Sunset of a primary product
• New tech stack & lack of domain knowledge
• Juggling childcare and housework
Solo SRE
• Moved to a team of 1 after 3 departures
• Getting by with help from kind neighboring teams (Platform, other SREs, Developers, etc.)
Rising AIOps
Experiencing AIOps through a 30-day trial and a strong nudge
Datadog CSM ) 🔄
Trial is available!
Maybe just a little...
Wow, Bits is impressive!
Too busy to touch it
I need more SREs!
Bits AI SRE!
that aren't urgent but can't be ignored
• App not deploying
• A few 5xx errors
• Help!
• Too many slow queries
• Internal tools inaccessible
• Pod restarting repeatedly
• It just broke suddenly
• High CPU usage, huh?
• Feels slow, doesn't it?
Often impossible to start on immediately, and they tie up multiple people
finish with just Bits now
I don't even open AWS Console or Terminal anymore
Futahashi-san is impressive!
A world I never would have believed a year ago
That's insane
But is Bits in its current state enough?
trigger an investigation.
• APM latency alerts cannot carry additional context.
• Issues that no monitor or APM latency alert catches stay invisible and cannot be investigated.
• Unsupported data sources cannot be used in investigations.
We must lower the barrier to entry and expand coverage for Bits
and Adoption of Bits
Higher Expectations for "Bits AI SRE"
While "Bits AI SRE" is highly effective even without tuning, I expect it to achieve even greater results with less effort in the future. Specifically, it currently requires pre-configured monitors as investigation triggers, and context must be added to those monitors to gain deeper insights. In the future, I hope it will be able to autonomously recognize anomalies and leverage context based on all available information within Datadog.
Ref: https://www.datadoghq.com/ja/blog/datadog-live-tokyo-2025-recap/
My Interpretation
I WANT TO investigate without monitors, master all the data, and just make my life easy! lol
Futahashi-san is impressive!
creates PRs to fix the root cause
• Investigate Synthetics API: Support for Synthetics Monitors
• Recommended Actions: Triage steps based on investigation results
• Bits.md: Provide shared context to Bits for investigations
• Start investigations from APM latency graphs & APM Watchdog stories: Trigger investigations without monitors
• Prompt-based investigations: Trigger investigations without monitors
Ref: https://www.datadoghq.com/product-preview/bits-ai-sre-pilot-features/
• Insights from dashboards and service pages
• Insights from cloud costs
• Insights from SLOs
• Capacity planning
• Investigating issues undetected by monitors
The possibilities are endless!!!!
| | … | Bits Assistant | Datadog MCP | Pup CLI |
| Primary Role | Autonomous SRE | Conversational Assistant | Gateway to External AI | AI-powered CLI |
| Primary Use | Incident Response | Data Search, Insights, General Q&A | Using Datadog via external LLMs | Human/AI ops & Scripting |
| UI | Specific Pages / Slack / (Mobile) | All Pages / Slack / (Mobile) | External LLM UI (Kiro, Claude, etc.) | Terminal / Scripts |
| Target Users | SRE / Ops | Every human being | Dev / SRE / Ops | AI / SRE / Ops |
Each one is unique, and all are impressive!!
What Bits Has Brought to Us 🤲
➡➡➡ Fundamental Change
• Focusing on the Essence of SRE
  ‒ Bits protects the “NOW”, I protect the “FUTURE”
  ‒ Data Consolidation / Context Enrichment / Feature Utilization ➡➡➡ Enhanced Observability
• Strengthening Organization-wide Incident Response
  ‒ And when I shape Bits, Bits also shapes me
  ‒ Human-AI Collaboration / Visibility / Memory ➡➡➡ Improvement Loop
Bits is not just automation; it is a fundamental transformation of our world
• Evangelize Bits: Share features and celebrate successes
• Prepare the "Dog Park": Deploy and utilize based on Datadog Best Practices
• Train Bits: Foster growth through known alerts and chaos injection
• Secure the "Dog Food" budget: Guardrails and cost monitoring
Lead as a "Capable Senior" through Servant Leadership
I'll keep pushing forward too! 🏋
Futahashi-san is impressive!
Tips for Empowering Bits 💡
telemetry
• Enable key features and ingest essential data
• Create effective Dashboards for investigation
• Include telemetry links in Monitor Messages
• Train through Feedback and Memory
• Set up Slack Integration
A better environment for humans is a better environment for our Good Boy
Ref: https://docs.datadoghq.com/bits_ai/bits_ai_sre/knowledge_sources
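As one way to picture the "Include telemetry links in Monitor Messages" tip, here is a minimal sketch that assembles a message body with links to related telemetry. It is plain string building, not a Datadog API call; the function name, the service, and the example.com URLs are all hypothetical, and it assumes the monitor message field renders Markdown-style links.

```python
def build_monitor_message(summary: str, links: dict[str, str]) -> str:
    """Return a message body: a summary followed by one Markdown link per line."""
    lines = [summary, "", "Related telemetry:"]
    for label, url in links.items():
        lines.append(f"- [{label}]({url})")
    return "\n".join(lines)


# Hypothetical service and URLs, for illustration only.
message = build_monitor_message(
    "Checkout API p95 latency is above threshold.",
    {
        "Service dashboard": "https://example.com/dashboard/checkout",
        "Error logs": "https://example.com/logs?query=service:checkout",
    },
)
print(message)
```

Keeping the links next to the alert text means whoever picks up the alert, human or AI, starts with pointers to the relevant dashboards and logs instead of searching for them.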
the World of Incident Response
  ‒ A Partner with "Super-Canine" Insight and Expression, Growing alongside us like a Teammate
• Our Mission as SREs: Build the Ultimate Dog Park for Bits
  ‒ Provide the Best Datadog Environment for the Best Partner
• Confidence in the Rapid Evolution of Datadog and Bits
  ‒ Reach a Higher Realm simply by continuing your journey with Datadog
Let's Evolve and Transform Our Organizations Together with Bits!
Datadog is impressive!