ABOUT ME
• chaos with computers since the 80s.
• Background in development and operations.
• My own three chaos monkeys at home.
• Skateboarder at heart.
• Serverless aficionado.
AWS Pop-up Loft Stockholm @gunnargrosch

ABOUT OPSIO
• private and hybrid cloud platforms.
• Delivering everything from magical customer support to fully managed cloud services.
• Helping customers find the right level of monitoring and observability.
STRATEGY, ARCHITECTURE, MIGRATION, CLOUDOPS, DEVOPS, 24/7 SUPPORT, OPTIMIZATION, MANAGED SERVICES

GOAL OF THIS SESSION
• difference between monitoring and observability.
• Explore the three pillars of observability.
• Touch on the whats and whys.
• Leave you with more questions than answers.

WHERE ARE WE AT?
• change.
• The way we build and operate systems has evolved.
• Containers, Kubernetes, microservices, serverless, and service meshes change the way we operate software.
• The systems we build have become more distributed and more ephemeral.

WHERE ARE WE AT?
• with time.
• We have a responsibility to ensure our application is good enough.
• No service is going to fix our software.
• Most failures will come from the application layer or from the interactions between different applications or services.

WHERE ARE WE AT?
• of our application.
• We can focus on making our application more robust.
• Visibility is more important than ever for understanding, operating, maintaining, and evolving applications.

DECISION MAKING
• needs?
• How do we tell the difference between these tools?
• Should we stop monitoring?
• Should we only focus on observability?
• What’s the difference between logs and metrics?
• Do we really need all three pillars?

DECISION MAKING
• Evaluate the value it can provide for your challenges.
• The tool should solve your problems.
• We want to evolve instead of starting over.

DECISION MAKING
• are the weaknesses of the tool?
• What are the problems it solves?
• What are the tradeoffs it makes?
• How easy is it to adopt and integrate into an existing infrastructure?

WHAT TO MONITOR AND HOW?
• is changing. Almost.
• To follow the change we need visibility.
[Diagram: Better visibility, Better understanding, Better systems, Better instrumentation]

WHAT TO MONITOR AND HOW?
• waiting for something.
• We monitor because we expect failure.
• A failure-centric approach to monitoring becomes a problem when the number of failure modes increases.
[Diagram: Dev, Test, QA, Prod, Monitor]

WHAT TO MONITOR AND HOW?
• behave gracefully when they fail.
• Use graceful degradation techniques such as retries, timeouts, and rate limiting.
• Increases overall complexity.
• Monitoring is human-centric.
• Automation and self-healing platforms make it less human-centric.

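One of the degradation techniques named on this slide, retries, can be sketched as below. This is a minimal illustration, not a recommended implementation: `TransientError`, `call_with_retries`, and the exponential backoff schedule are assumptions made for the example and do not come from the talk.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError, wait and retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Even this small wrapper adds a code path that only runs during failures, which illustrates the slide's point that graceful degradation increases overall complexity.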
WHAT TO MONITOR AND HOW?
• The key is still the design of the systems themselves.
• Monitoring should be failure-centric and human-centric.
• Systems will continue to get more complex.
• Monitoring of every failure becomes unnecessary.

WHAT TO MONITOR AND HOW?
• context.
• Find answers to questions not yet formulated.
• Whitebox monitoring gives us data.
• When data is processed, interpreted, organized, and structured, it’s called information.
• Observability gives us information.

WHAT TO MONITOR AND HOW?
• creates overheads.
• How data is gathered, processed, and stored is a key consideration.
• Having data and information doesn’t solve problems.
• Intuition and knowledge give better observability.

WHAT TO MONITOR AND HOW?
• health of a system.
• We can use the information to alert based on symptoms.
• We can use the information to debug failures.
• We can use the information to better understand our system.
• We can use the information to understand dependencies.

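As a rough illustration of alerting on a symptom rather than on a specific cause, the sketch below fires when the share of failed requests crosses a threshold. The function name `should_alert` and the 5% default are hypothetical choices for the example, not values from the talk.

```python
def should_alert(total_requests, failed_requests, max_error_ratio=0.05):
    """Alert on a user-visible symptom: the share of requests that failed."""
    if total_requests == 0:
        return False  # no traffic in the window, so there is no symptom to report
    return failed_requests / total_requests > max_error_ratio
```

The check does not care why requests fail, so it keeps working as new failure modes appear.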
THE THREE PILLARS OF OBSERVABILITY
• excel in providing valuable insight along with ample context.
• Incredibly useful when we are searching at a very fine level of granularity.
• The data is rich but without further processing it can be overwhelming.

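Assuming this slide describes event logs (its opening words are missing from the transcript), one way to attach "ample context" to each event is to emit structured JSON lines. The sketch below uses only the Python standard library; `log_event`, the logger name, and the field names are hypothetical.

```python
import io
import json
import logging


def log_event(logger, message, **context):
    """Emit one JSON log line: the message plus its key/value context."""
    logger.info(json.dumps({"message": message, **context}, sort_keys=True))


# Usage: capture output in memory so the example has no side effects.
stream = io.StringIO()
logger = logging.getLogger("observability-demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))

log_event(logger, "payment failed", order_id="A-17", latency_ms=412)
record = json.loads(stream.getvalue())  # each line parses back into fields
```

Because every line is machine-readable, the further processing the slide calls for (filtering, aggregation) can work on fields instead of free text.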
THE THREE PILLARS OF OBSERVABILITY
• end-to-end request.
• Provides visibility into the path and the structure of a request.
• Complex applications can benefit from tracing.
• Specific points in the execution of a request are represented.
• Instrumentation is added in code which is propagated through the path.

THE THREE PILLARS OF OBSERVABILITY
• We are able to debug requests spanning multiple services to find bottlenecks.
• Traces help us understand the which and sometimes even the why.

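The idea of representing specific points in a request and propagating instrumentation along its path can be sketched with a toy in-process tracer. This is an illustration only: the `Tracer` class and the span names are hypothetical, and a real system would use a tracing library and carry the trace ID across service boundaries.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Toy tracer: records nested spans that all share one trace ID."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []
        self._stack = []  # names of the spans that are currently open

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        record = {"trace_id": self.trace_id, "name": name, "parent": parent}
        self.spans.append(record)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield record
        finally:
            record["duration_s"] = time.perf_counter() - start
            self._stack.pop()


# Usage: one request that fans out into two nested steps.
tracer = Tracer()
with tracer.span("checkout"):
    with tracer.span("charge-card"):
        pass
    with tracer.span("send-receipt"):
        pass
```

The parent links give the structure of the request, and the per-span durations are what point at a bottleneck.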
THE THREE PILLARS OF OBSERVABILITY
• time.
• Use mathematical modeling and prediction to understand the behavior of a system as a time series.
• Optimized for storage, processing, compression, and retrieval.
• Enables long retention of data as well as easy querying.
• Allows for reduction of data resolution over time.

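Reducing data resolution over time can be sketched as averaging raw samples into wider time buckets. This is a minimal illustration under assumptions: the `downsample` helper and the choice of the mean as the aggregate are not from the talk, and real time-series stores offer several aggregation functions.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of the bucket it falls into.
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Usage: five 10-second samples collapse into two 60-second points.
raw = [(0, 10), (10, 20), (20, 30), (60, 40), (70, 60)]
coarse = downsample(raw, 60)
```

Older data can be kept only in its coarse form, which is what makes long retention affordable.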
BEST PRACTICES
• log level.
• Establishing service tiers with quotas and priorities.
• Adjust log rates dynamically.
• Identify a small set of hard failure modes.
• Don’t monitor more than needed.
• Avoid noisy monitoring.
• Trace through service meshes.

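The quota and dynamic log rate ideas above can be sketched as a per-window limiter. This is one possible approach under stated assumptions: the `LogQuota` class, its fixed-window strategy, and the numbers in the usage are hypothetical.

```python
class LogQuota:
    """Allow at most `limit` log lines per time window; count the rest as dropped."""

    def __init__(self, limit, window_seconds):
        self.limit = limit  # may be changed at runtime to adjust the log rate
        self.window_seconds = window_seconds
        self.window_start = None
        self.used = 0
        self.dropped = 0

    def allow(self, now):
        """Return True if a line logged at time `now` (in seconds) fits the quota."""
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now  # start a fresh window
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        self.dropped += 1
        return False


# Usage: two lines per second are allowed; the third in the window is dropped.
quota = LogQuota(limit=2, window_seconds=1.0)
decisions = [quota.allow(t) for t in (0.0, 0.1, 0.2, 1.5)]
```

Tracking `dropped` keeps the limiter itself observable, and a separate quota per service tier would give higher-priority services more room.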