Cloud Monitoring - AWS Pop-up Loft Stockholm Nov 2 2018

Gunnar Grosch
November 02, 2018

Cloud Monitoring in the era of Microservices and Distributed Architectures, from AWS Pop-up Loft Stockholm, Nov 2 2018

https://www.opsiocloud.com | https://www.opsio.se

Transcript

  1. Cloud Monitoring in the era of Microservices and Distributed Architectures
    AWS Pop-up Loft Stockholm, Gunnar Grosch, @gunnargrosch
  2. ABOUT ME
    • Cloud Evangelist at Opsio (www.opsio.se).
    • Have been creating chaos with computers since the ’80s.
    • Background in development and operations.
    • My own three chaos monkeys at home.
    • Skateboarder at heart.
    • Serverless aficionado.
  3. ABOUT OPSIO
    • Opsio specializes in the operation and support of public, private and hybrid cloud platforms.
    • Delivering everything from magical customer support to fully managed cloud services.
    • Helping customers find the right level of monitoring and observability.
    (Slide graphic: Strategy, Architecture, Migration, CloudOps, DevOps, 24/7 Support, Optimization, Managed Services)
  4. GOAL OF THIS SESSION
    • Look at why monitoring is complex.
    • See the difference between monitoring and observability.
    • Explore the three pillars of observability.
    • Touch on the whats and whys.
    • Leave you with more questions than answers.
  5. WHERE ARE WE AT?
    • The infrastructure space is in the midst of a change.
    • The way we build and operate systems has evolved.
    • Containers, Kubernetes, microservices, serverless, and service meshes change the way we operate software.
    • The systems we build have become more distributed and more ephemeral.
  6. WHERE ARE WE AT?
    • The tools are only going to get better with time.
    • We have a responsibility to ensure our application is good enough.
    • No service is going to fix our software.
    • Most failures will come from the application layer or from the interactions between different applications or services.
  7. WHERE ARE WE AT?
    • We can focus on the performance and business logic of our application.
    • We can focus on making our application more robust.
    • Having visibility is more important than ever before to understand, operate, maintain and evolve applications.
  8. WHERE ARE WE AT?
    • Many tools are just as bleeding edge as the infrastructure.
    • There has been a surge in both open source and SaaS tooling.
  9. DECISION MAKING
    • How do we choose the best tool for our needs?
    • How do we tell the difference between these tools?
    • Should we stop monitoring?
    • Should we only focus on observability?
    • What’s the difference between logs and metrics?
    • Do we really need all three pillars?
  10. DECISION MAKING
    • Don’t just replace your problem with a tool.
    • Evaluate the value it can provide for your challenges.
    • The tool should solve your problems.
    • We want to evolve instead of starting over.
  11. DECISION MAKING
    • What are the strengths of the tool?
    • What are the weaknesses of the tool?
    • What are the problems it solves?
    • What are the tradeoffs it makes?
    • How easy is it to adopt and integrate into an existing infrastructure?
  12. WHAT TO MONITOR AND HOW?
    • Monitoring and observability are not the same.
    • Everything is changing. Almost.
    • To follow the change we need visibility.
    (Diagram: Better visibility, Better understanding, Better systems, Better instrumentation)
  13. WHAT TO MONITOR AND HOW?
    Monitoring is being on the lookout for failures.
  14. WHAT TO MONITOR AND HOW?
    • We develop, test, deploy and monitor.
    • We monitor, waiting for something.
    • We monitor because we expect failure.
    • A failure-centric approach to monitoring becomes a problem when the number of failure modes increases.
    (Diagram: Dev, Test, QA, Prod, Monitor)
  15. WHAT TO MONITOR AND HOW?
    • Complex architectures give more possible failures.
    (Chart axes: Possible failures, Complexity)
  16. WHAT TO MONITOR AND HOW?
    • Embracing failure means we need to design services to behave gracefully when they fail.
    • Use graceful degradation techniques such as retries, timeouts and rate limiting.
    • This increases overall complexity.
    • Monitoring is human-centric.
    • Automation and self-healing platforms make it less human-centric.
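The retries and timeouts mentioned on slide 16 can be sketched as follows. This is a minimal illustration and not taken from the talk: the function name `call_with_retries` and its parameters are hypothetical, the `deadline` is an overall time budget standing in for a timeout, and catching every `Exception` is a simplification that a real service would narrow to transient errors.

```python
import random
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, deadline=2.0,
                      sleep=time.sleep):
    """Call operation() until it succeeds, attempts run out, or the time budget is spent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    start = time.monotonic()
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Exponential backoff with jitter, so many clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            out_of_attempts = attempt == attempts - 1
            out_of_time = time.monotonic() - start + delay > deadline
            if out_of_attempts or out_of_time:
                raise  # give up and surface the last error to the caller
            sleep(delay)
```

Each retry adds load and latency, which is one concrete way these techniques increase overall complexity.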
  17. WHAT TO MONITOR AND HOW?
    • So how do we design monitoring for these systems?
    • The key is still the design of the systems themselves.
    • Monitoring should be failure- and human-centric.
    • Systems will continue to get more complex.
    • Monitoring of every failure becomes unnecessary.
  18. WHAT TO MONITOR AND HOW?
    Observability is understanding how a system behaves.
  19. WHAT TO MONITOR AND HOW?
    • Observability provides insights into the behavior of systems with context.
    • Find answers to questions not yet formulated.
    • Whitebox monitoring gives us data.
    • When data is processed, interpreted, organized and structured, it is called information.
    • Observability gives us information.
  20. WHAT TO MONITOR AND HOW?
    • Observability is our ability to easily find information from data when we need it.
    • Information on the fly or pre-processed.
  21. WHAT TO MONITOR AND HOW?
    (Diagram: Alerting, Overview, Debugging, Profiling, Dependencies, Monitoring, Observability)
  22. WHAT TO MONITOR AND HOW?
    • Observability isn’t about data collection alone.
    • Raw data creates overhead.
    • How data is gathered, processed and stored is a key consideration.
    • Having data and information doesn’t solve problems.
    • Intuition and knowledge give better observability.
  23. WHAT TO MONITOR AND HOW?
    Observability is more about what we do with the data than where it comes from.
  24. WHAT TO MONITOR AND HOW?
    • We can use the information to know the overall health of a system.
    • We can use the information to alert based on symptoms.
    • We can use the information to debug failures.
    • We can use the information to better understand our system.
    • We can use the information to understand dependencies.
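Alerting on symptoms, as slide 24 puts it, could look something like the sketch below. It is an assumption-laden example rather than the speaker's method: the names `error_rate` and `should_alert`, the 5% threshold and the three-window rule are all hypothetical choices. The idea is to alert on what users experience (a sustained error rate) instead of on each individual cause.

```python
def error_rate(total_requests, failed_requests):
    """Fraction of failed requests in one window; 0.0 when there was no traffic."""
    if total_requests == 0:
        return 0.0
    return failed_requests / total_requests


def should_alert(windows, threshold=0.05, min_breaches=3):
    """windows: list of (total, failed) tuples per interval, oldest first.

    Alert only when the most recent `min_breaches` windows all exceed the
    threshold, so a single noisy interval does not page anyone.
    """
    recent = windows[-min_breaches:]
    if len(recent) < min_breaches:
        return False
    return all(error_rate(total, failed) > threshold for total, failed in recent)
```

Requiring several consecutive breaches trades a little detection delay for fewer noisy alerts.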
  25. THE THREE PILLARS OF OBSERVABILITY
    • Logs
    • Traces
    • Metrics
  26. THE THREE PILLARS OF OBSERVABILITY
    • A log is a record of events that happened over time.
    • Logs excel in providing valuable insight along with ample context.
    • Incredibly useful when we are searching at a very fine level of granularity.
    • The data is rich but without further processing it can be overwhelming.
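One way to keep the context of a log event while making it easier to process later is to emit structured (JSON) log lines. The sketch below uses only Python's standard `logging` module; the `JsonFormatter` class, the `build_logger` helper and the `context` key are hypothetical names chosen for illustration, not something shown in the talk.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, merging any attached context."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # `context` is set through logging's `extra=` argument (see usage below).
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(stream, name="orders"):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Usage: the event carries its context as searchable fields.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("payment declined", extra={"context": {"order_id": "A-1001", "service": "checkout"}})
```

Fields such as `order_id` can then be filtered and aggregated downstream instead of being parsed out of free text.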
  27. THE THREE PILLARS OF OBSERVABILITY
    • What we’re looking for is information.
    • The number of data points we can collect is endless.
  28. THE THREE PILLARS OF OBSERVABILITY
    • A trace represents a series of events in an end-to-end request.
    • Provides visibility into the path and the structure of a request.
    • Complex applications can benefit from tracing.
    • Specific points in the execution of a request are represented.
    • Instrumentation is added in code, and the trace context is propagated along the request path.
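The propagation of trace context described on slide 28 can be illustrated within a single process. This is a toy sketch, not a real tracing library: `span`, `finished_spans` and `handle_request` are hypothetical names, and across services the trace ID would have to travel with the request (for example in a header) instead of in a context variable.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# Holds the span that is currently active in this execution context.
_current_span = contextvars.ContextVar("current_span", default=None)
finished_spans = []


@contextmanager
def span(name):
    """Record one step of a request, linked to its parent span and trace."""
    parent = _current_span.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "start": time.monotonic(),
    }
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration"] = time.monotonic() - record["start"]
        _current_span.reset(token)
        finished_spans.append(record)


def handle_request():
    """Instrumented request: child spans inherit the root span's trace ID."""
    with span("GET /orders"):
        with span("auth.check"):
            pass
        with span("db.query"):
            pass
```

After `handle_request()` runs, every entry in `finished_spans` shares one `trace_id`, and the `parent_id` links reconstruct the path and structure of the request.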
  29. THE THREE PILLARS OF OBSERVABILITY
    • We understand the lifecycle of a request better.
    • We are able to debug requests spanning multiple services to find bottlenecks.
    • Traces help us understand the which and sometimes even the why.
  30. THE THREE PILLARS OF OBSERVABILITY
    • Metrics are a numeric representation of our data over time.
    • Mathematical modeling and prediction can be applied to understand the behavior of a system over time.
    • Optimized for storage, processing, compression and retrieval.
    • Enables long retention of data as well as easy querying.
    • Allows for reduction of data resolution over time.
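Reducing data resolution over time, the last point on slide 30, is often done by downsampling: older points are merged into coarser buckets. A minimal sketch, assuming points are `(unix_timestamp, value)` pairs and that averaging is an acceptable aggregate; the function name `downsample` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Collapse (timestamp, value) points into one averaged point per time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Usage: four 30-second samples become two 60-second samples.
raw = [(0, 10), (30, 20), (60, 30), (90, 50)]
per_minute = downsample(raw, 60)
```

Averaging hides spikes, so systems that downsample often keep the minimum and maximum per bucket as well; that choice is left out here for brevity.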
  31. THE THREE PILLARS OF OBSERVABILITY
    • Modern monitoring systems use labels as additional key-value pairs, allowing a high degree of dimensionality in the data model.
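To show what labels add to a metric, here is a small in-memory counter keyed by label sets. It is a sketch for illustration only: `LabeledCounter` and the metric name `http_requests_total` are hypothetical and do not represent the API of any particular monitoring system.

```python
from collections import Counter


class LabeledCounter:
    """A counter that keeps one time series per unique combination of labels."""

    def __init__(self, name):
        self.name = name
        self._series = Counter()

    def inc(self, amount=1, **labels):
        # Sorting makes the key independent of keyword argument order.
        key = tuple(sorted(labels.items()))
        self._series[key] += amount

    def total(self, **selector):
        """Sum every series whose labels include all selector key-value pairs."""
        wanted = set(selector.items())
        return sum(value for key, value in self._series.items() if wanted <= set(key))


# Usage: one metric name, sliced along different dimensions.
requests = LabeledCounter("http_requests_total")
requests.inc(method="GET", status="200")
requests.inc(method="GET", status="500")
requests.inc(2, method="POST", status="200")
```

With these samples, `requests.total(method="GET")` and `requests.total(status="200")` answer different questions from the same data. Each new label value creates another series, so high-cardinality labels increase storage cost.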
  32. BEST PRACTICES
    • Log everything or log selectively.
    • Use quotas instead of log levels.
    • Establish service tiers with quotas and priorities.
    • Adjust log rates dynamically.
    • Identify a small set of hard failure modes.
    • Don’t monitor more than needed.
    • Avoid noisy monitoring.
    • Trace through service meshes.
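Log quotas with dynamically adjustable rates could be implemented with a token bucket per service tier. The sketch below is one possible approach and not the speaker's implementation: the class `LogQuota`, its parameters and the injectable `clock` are hypothetical.

```python
import time


class LogQuota:
    """Token bucket: allows `rate` log lines per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self):
        """Return True if one more log line fits within the quota right now."""
        now = self._clock()
        # Refill tokens for the time elapsed, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def set_rate(self, rate):
        """Adjust the quota at runtime, e.g. raise it for a service during an incident."""
        self.rate = rate
```

A service would call `allow()` before emitting each log line and drop or sample the line when it returns `False`; higher-priority tiers simply get a larger `rate` and `burst`.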