Cloud Monitoring - AWS Pop-up Loft Stockholm Nov 2 2018

Cloud Monitoring in the era of Microservices and Distributed Architectures
AWS Pop-up Loft Stockholm Gunnar Grosch @gunnargrosch

About me

• Cloud Evangelist at Opsio (www.opsio.se).
• Have been creating chaos with computers since the 80s.
• Background in development and operations.
• My own three chaos monkeys at home.
• Skateboarder at heart.
• Serverless aficionado.

About Opsio

• Opsio specializes in the operation and support of public, private and hybrid cloud platforms.
• Delivering everything from magical customer support to fully managed cloud services.
• Helping customers find the right level of monitoring and observability.
• Services: strategy, architecture, migration, CloudOps, DevOps, 24/7 support, optimization, managed services.

Goal of this session

• Look at why monitoring is complex.
• See the difference between monitoring and observability.
• Explore the three pillars of observability.
• Touch on the whats and whys.
• Leave you with more questions than answers.

Where are we at?

• The infrastructure space is in the midst of a change.
• The way we build and operate systems has evolved.
• Containers, Kubernetes, microservices, serverless, and service meshes change the way we operate software.
• The systems we build have become more distributed and more ephemeral.

• The tools will only get better with time.
• We have a responsibility to ensure our application is good enough.
• No service is going to fix our software.
• Most failures will come from the application layer or from the interactions between different applications or services.

• We can focus on the performance and business logic of our application.
• We can focus on making our application more robust.
• Visibility is more important than ever for understanding, operating, maintaining and evolving applications.

• Many tools are just as bleeding edge as the infrastructure.
• There has been a surge in both open source and SaaS tooling.

Decision making

Oh, we have a new problem.

• How do we choose the best tool for our needs?
• How do we tell the difference between these tools?
• Should we stop monitoring?
• Should we only focus on observability?
• What’s the difference between logs and metrics?
• Do we really need all three pillars?

• Don’t just replace your problem with a tool.
• Evaluate the value it can provide for your challenges.
• The tool should solve your problems.
• We want to evolve instead of starting over.

• What are the strengths of the tool?
• What are the weaknesses of the tool?
• What are the problems it solves?
• What are the tradeoffs it makes?
• How easy is it to adopt and integrate into an existing infrastructure?

What to monitor and how?

• Monitoring and observability are not the same.
• Everything is changing. Almost.
• To follow the change we need visibility.

Diagram labels: Better visibility, Better understanding, Better systems, Better instrumentation.

Monitoring is being on the lookout for failures.

• We develop, test, deploy and monitor.
• We monitor waiting for something.
• We monitor because we expect failure.
• A failure-centric approach to monitoring becomes a problem when the number of failure modes increases.

Diagram labels: Dev, Test, QA, Prod, Monitor.

• Complex architectures give more possible failures.

Chart axes: Possible failures, Complexity.

• Embracing failure means we need to design services to behave gracefully when they fail.
• Use graceful degradation techniques such as retries, timeouts and rate limiting.
• This increases overall complexity.
• Monitoring is human-centric.
• Automation and self-healing platforms make it less human-centric.
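
As an illustration of the retries and timeouts mentioned above, here is a minimal Python sketch. The helper `call_with_retries`, its parameters and the choice of exponential backoff with jitter are assumptions made for this example, not something shown in the talk.

```python
import random
import time


def call_with_retries(operation, attempts=3, timeout_s=1.0,
                      base_delay_s=0.1, sleep=time.sleep):
    """Call operation(timeout_s) until it succeeds or attempts run out.

    `operation` is a hypothetical callable that enforces the timeout it is
    given and raises TimeoutError or ConnectionError on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation(timeout_s)
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            if attempt < attempts - 1:
                # Exponential backoff with jitter, so that many clients
                # do not retry against a struggling service in lockstep.
                delay = base_delay_s * (2 ** attempt)
                sleep(delay + random.uniform(0, delay))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Each retry and timeout is one more behavior to reason about when something goes wrong, which is the added complexity the slide points at.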

• So how do we design monitoring for these systems?
• Key is still design of the systems themselves.
• Monitoring should be failure and human centric.
• Systems will continue to get more complex.
• Monitoring of every failure becomes unnecessary.

Observability is understanding how a system behaves.

• Observability provides insights into the behavior of systems with context.
• Find answers to questions not yet formulated.
• Whitebox monitoring gives us data.
• When data is processed, interpreted, organized and structured, it is called information.
• Observability gives us information.

• Observability is our ability to easily find information from data when we need it.
• Information on the fly or pre-processed.

Diagram labels: Alerting, Overview, Debugging, Profiling, Dependencies, Monitoring, Observability.

• Observability isn’t about data collection alone.
• Raw data creates overhead.
• How data is gathered, processed and stored is a key consideration.
• Having data and information doesn’t solve problems.
• Intuition and knowledge give better observability.

Observability is more about what we do with the data than where it comes from.

• We can use the information to know the overall health of a system.
• We can use the information to alert based on symptoms.
• We can use the information to debug failures.
• We can use the information to better understand our system.
• We can use the information to understand dependencies.

The three pillars of observability

• Logs
• Traces
• Metrics

Logs

• A log is a record of events that happened over time.
• Logs excel in providing valuable insight along with ample context.
• Incredibly useful when we are searching at a very fine level of granularity.
• The data is rich but without further processing it can be overwhelming.

• What we’re looking for is information.
• The number of data points we can collect is endless.
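
One common way to keep the context that makes logs valuable while still being able to process them into information is to emit each event as structured data. The sketch below uses only Python's standard `logging` and `json` modules; the `JsonFormatter` class, the `context` key and the example fields (`order_id`, `retries`) are assumptions made for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record;
        # this sketch expects them under a hypothetical "context" key.
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


def build_logger(stream, name="checkout"):
    """Create a logger that writes JSON events to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.info("payment failed", extra={"context": {"order_id": "o-123", "retries": 2}})
event = json.loads(stream.getvalue())
```

Because every event is a set of key-value pairs, it can be filtered and aggregated later without parsing free text.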

Traces

• A trace represents a series of events in an end-to-end request.
• Provides visibility into the path and the structure of a request.
• Complex applications can benefit from tracing.
• Specific points in the execution of a request are represented.
• Instrumentation is added in code and the trace context is propagated along the path of the request.

• We understand the lifecycle of a request better.
• We are able to debug requests spanning multiple services to find bottlenecks.
• Traces help us understand the which and sometimes even the why.
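
To make the idea of instrumentation that is propagated along the path of a request concrete, here is a small in-process sketch using Python's standard `contextvars` module. The `span` helper, the field names and the `handle_checkout` example are hypothetical; a real tracer would also carry the trace context between services (for example in request headers) and export the spans to a backend.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# The currently active span, carried implicitly along the call path.
_current_span = contextvars.ContextVar("current_span", default=None)
finished_spans = []  # stand-in for exporting spans to a tracing backend


@contextmanager
def span(name):
    """Record one step of a request as a child of the active span."""
    parent = _current_span.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "start": time.perf_counter(),
    }
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - record["start"]
        _current_span.reset(token)
        finished_spans.append(record)


def handle_checkout():
    # One request made of two steps; each step becomes a child span.
    with span("checkout"):
        with span("reserve-stock"):
            pass
        with span("charge-card"):
            pass


handle_checkout()
```

All spans share one `trace_id`, and the `parent_id` links rebuild the structure of the request, so the per-span durations show where the time went.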

Metrics

• Metrics are a numeric representation of our data over time.
• Mathematical modeling and prediction can be applied to the time series to understand the behavior of a system.
• Optimized for storage, processing, compression and retrieval.
• Enables long retention of data as well as easy querying.
• Allows for reduction of data resolution over time.
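
Reducing data resolution over time is often done by downsampling: older points are averaged into wider time buckets. The sketch below is one possible way to do that; the `downsample` function, the use of the mean as the aggregate and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_s):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_s)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Six samples taken every 15 seconds, reduced to one point per minute.
raw = [(0, 10), (15, 20), (30, 30), (45, 40), (60, 50), (75, 70)]
per_minute = downsample(raw, bucket_s=60)
```

The choice of aggregate matters: a mean smooths out spikes, so a system that cares about peaks might keep the maximum per bucket as well.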

• Modern monitoring systems use labels as additional key-value pairs, allowing a high degree of dimensionality in the data model.
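
A rough sketch of what labels mean for the data model: each unique combination of label values becomes its own series, and queries can aggregate across any subset of labels. The `LabeledCounter` class and the label names below are hypothetical and not taken from any particular monitoring system.

```python
from collections import Counter


class LabeledCounter:
    """A counter metric with one series per unique label combination."""

    def __init__(self, name):
        self.name = name
        self._series = Counter()

    def inc(self, amount=1, **labels):
        # Sort the labels so that keyword order does not create new series.
        key = tuple(sorted(labels.items()))
        self._series[key] += amount

    def value(self, **labels):
        """Return the value of the series with exactly these labels."""
        return self._series[tuple(sorted(labels.items()))]

    def total(self, **match):
        """Sum every series whose labels include all pairs in `match`."""
        wanted = set(match.items())
        return sum(v for key, v in self._series.items() if wanted <= set(key))


http_requests = LabeledCounter("http_requests_total")
http_requests.inc(method="GET", status="200")
http_requests.inc(method="GET", status="500")
http_requests.inc(method="POST", status="500")
http_requests.inc(status="200", method="GET")

server_errors = http_requests.total(status="500")
```

The number of series grows with the number of distinct label combinations, so labels with many possible values increase storage and query cost.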

Best practices

• Log everything or log selectively.
• Use quotas instead of log levels.
• Establish service tiers with quotas and priorities.
• Adjust log rates dynamically.
• Identify a small set of hard failure modes.
• Don’t monitor more than needed.
• Avoid noisy monitoring.
• Trace through service meshes.
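
One way to read "quotas instead of log levels" is to give each service a budget of log events per second rather than filtering by severity. The token bucket below is a minimal sketch of that idea; the `LogQuota` class and its parameters are assumptions made for this example.

```python
import time


class LogQuota:
    """Token bucket allowing `rate_per_s` log events per second on average,
    with bursts of up to `burst` events."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate_per_s = rate_per_s  # may be changed at runtime
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self):
        """Return True if one more log event fits within the quota."""
        now = self._clock()
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = now - self._last
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate_per_s)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

Service tiers could be modeled as one `LogQuota` per tier with different rates, and adjusting log rates dynamically then amounts to changing `rate_per_s` while the service runs.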

Questions?

Thank you for attending!

Tynäsgatan 12, 65216 Karlstad
+46 10 252 55 01
[email protected]
www.opsio.se
Gunnar Grosch @gunnargrosch
