Gunnar Grosch
November 02, 2018

Cloud Monitoring - AWS Pop-up Loft Stockholm Nov 2 2018

Cloud Monitoring in the era of Microservices and Distributed Architectures, from the AWS Pop-up Loft Stockholm, Nov 2 2018.

https://www.opsiocloud.com | https://www.opsio.se

Transcript

  1. Cloud Monitoring in the era of Microservices and Distributed Architectures
    AWS Pop-up Loft Stockholm, Gunnar Grosch @gunnargrosch
  2. ABOUT ME
    • Cloud Evangelist at Opsio, www.opsio.se.
    • Have been creating chaos with computers since the '80s.
    • Background in development and operations.
    • My own three chaos monkeys at home.
    • Skateboarder at heart.
    • Serverless aficionado.
  3. ABOUT OPSIO
    • Opsio specializes in the operation and support of public, private and hybrid cloud platforms.
    • Delivering everything from magical customer support to fully managed cloud services.
    • Helping customers find the right level of monitoring and observability.
    (Slide graphic: STRATEGY, ARCHITECTURE, MIGRATION, CLOUDOPS, DEVOPS, 24/7 SUPPORT, OPTIMIZATION, MANAGED SERVICES)
  4. GOAL OF THIS SESSION
    • Look at why monitoring is complex.
    • See the difference between monitoring and observability.
    • Explore the three pillars of observability.
    • Touch on the whats and whys.
    • Leave you with more questions than answers.
  5. WHERE ARE WE AT?
    • The infrastructure space is in the midst of a change.
    • The way we build and operate systems has evolved.
    • Containers, Kubernetes, microservices, serverless, and service meshes change the way we operate software.
    • The systems we build have become more distributed and more ephemeral.
  6. WHERE ARE WE AT?
    • The tools will only keep getting better with time.
    • We have a responsibility to ensure our application is good enough.
    • No service is going to fix our software.
    • Most failures will come from the application layer or from the interactions between different applications or services.
  7. WHERE ARE WE AT?
    • We can focus on the performance and business logic of our application.
    • We can focus on making our application more robust.
    • Having visibility is more important than ever before to understand, operate, maintain and evolve applications.
  8. WHERE ARE WE AT?
    • Many tools are just as bleeding-edge as the infrastructure.
    • There has been a surge in both open source and SaaS tooling.
  9. DECISION MAKING
    • How do we choose the best tool for our needs?
    • How do we tell the difference between these tools?
    • Should we stop monitoring?
    • Should we only focus on observability?
    • What’s the difference between logs and metrics?
    • Do we really need all three pillars?
  10. DECISION MAKING
    • Don’t just replace your problem with a tool.
    • Evaluate the value it can provide for your challenges.
    • The tool should solve your problems.
    • We want to evolve instead of starting over.
  11. DECISION MAKING
    • What are the strengths of the tool?
    • What are the weaknesses of the tool?
    • What problems does it solve?
    • What tradeoffs does it make?
    • How easy is it to adopt and integrate into existing infrastructure?
  12. WHAT TO MONITOR AND HOW?
    • Monitoring and observability are not the same.
    • Everything is changing. Almost.
    • To follow the change we need visibility.
    (Slide graphic: Better visibility, Better understanding, Better systems, Better instrumentation)
  13. WHAT TO MONITOR AND HOW?
    Monitoring is being on the lookout for failures.
  14. WHAT TO MONITOR AND HOW?
    • We develop, test, deploy and monitor.
    • We monitor while waiting for something to happen.
    • We monitor because we expect failure.
    • A failure-centric approach to monitoring becomes a problem when the number of failure modes increases.
    (Slide graphic: Dev, Test, QA, Prod, Monitor)
  15. WHAT TO MONITOR AND HOW?
    • Complex architectures give more possible failures.
    (Slide graphic: chart of Possible failures vs. Complexity)
  16. WHAT TO MONITOR AND HOW?
    • Embracing failure means we need to design services to behave gracefully when they fail.
    • Use graceful-degradation techniques such as retries, timeouts and rate limiting.
    • This increases overall complexity.
    • Monitoring is human-centric.
    • Automation and self-healing platforms make it less human-centric.
  17. WHAT TO MONITOR AND HOW?
    • So how do we design monitoring for these systems?
    • The key is still the design of the systems themselves.
    • Monitoring should be failure- and human-centric.
    • Systems will continue to get more complex.
    • Monitoring every failure becomes unnecessary.
  18. WHAT TO MONITOR AND HOW?
    Observability is understanding how a system behaves.
  19. WHAT TO MONITOR AND HOW?
    • Observability provides insights into the behavior of systems, with context.
    • Find answers to questions not yet formulated.
    • White-box monitoring gives us data.
    • When data is processed, interpreted, organized and structured, it becomes information.
    • Observability gives us information.
  20. WHAT TO MONITOR AND HOW?
    • Observability is our ability to easily find information in data when we need it.
    • That information can be derived on the fly or pre-processed.
  21. WHAT TO MONITOR AND HOW?
    (Slide graphic: Alerting, Overview, Debugging, Profiling, Dependencies, grouped under Monitoring and Observability)
  22. WHAT TO MONITOR AND HOW?
    • Observability isn’t about data collection alone.
    • Raw data creates overhead.
    • How data is gathered, processed and stored is a key consideration.
    • Having data and information doesn’t solve problems.
    • Intuition and knowledge give better observability.
  23. WHAT TO MONITOR AND HOW?
    Observability is more about what we do with the data than where it comes from.
  24. WHAT TO MONITOR AND HOW?
    • We can use the information to know the overall health of a system.
    • We can use the information to alert based on symptoms.
    • We can use the information to debug failures.
    • We can use the information to better understand our system.
    • We can use the information to understand dependencies.
  25. THE THREE PILLARS OF OBSERVABILITY
    • Logs
    • Traces
    • Metrics
  26. THE THREE PILLARS OF OBSERVABILITY
    • Logs are timestamped records of discrete events that happened over time.
    • Logs excel at providing valuable insight along with ample context.
    • They are incredibly useful when we are searching at a very fine level of granularity.
    • The data is rich, but without further processing it can be overwhelming.
  27. THE THREE PILLARS OF OBSERVABILITY
    • What we’re looking for is information.
    • The number of data points we can collect is endless.
  28. THE THREE PILLARS OF OBSERVABILITY
    • A trace represents the series of events that make up an end-to-end request.
    • It provides visibility into the path and the structure of a request.
    • Complex applications can benefit from tracing.
    • Specific points in the execution of a request are represented.
    • Instrumentation is added in code, and the trace context is propagated along the request path.
  29. THE THREE PILLARS OF OBSERVABILITY
    • We understand the lifecycle of a request better.
    • We are able to debug requests spanning multiple services to find bottlenecks.
    • Traces help us understand the which and sometimes even the why.
  30. THE THREE PILLARS OF OBSERVABILITY
    • Metrics are a numeric representation of our data over time.
    • Mathematical modeling and prediction can be used to understand the behavior of a system as a time series.
    • Metrics are optimized for storage, processing, compression and retrieval.
    • This enables long retention of data as well as easy querying.
    • Data resolution can be reduced over time (downsampling).
  31. THE THREE PILLARS OF OBSERVABILITY
    • Modern monitoring systems use labels, additional key-value pairs, allowing a high degree of dimensionality in the data model.
  32. BEST PRACTICES
    • Log everything, or log selectively.
    • Use quotas instead of log levels.
    • Establish service tiers with quotas and priorities.
    • Adjust log rates dynamically.
    • Identify a small set of hard failure modes.
    • Don’t monitor more than needed.
    • Avoid noisy monitoring.
    • Trace through service meshes.