
Developing an observability strategy - AWS Summit Amsterdam 2023

As you migrate to the cloud or mature your operations within the cloud, it is crucial to optimize your observability so that your stakeholders can comprehend how your applications are operating.
In this session, learn how to define your observability strategy for the future in order to satisfy the needs of all of your stakeholders and ensure that you can deliver successful business outcomes.

Mohammed Fazalullah

August 02, 2024

Transcript

  1. © 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.

    AMSTERDAM | JUNE 1, 2023
  2. Developing an observability strategy

    COP201
    Mohammed Fazalullah Qudrath, Senior Developer Advocate, Amazon Web Services
  3. Lifecycle of an issue

    Detect → Investigate → Remediate
  4. Issue timeline

    Stages: Detect, Identify, Fix, Verify
    Measures: MTTD, MTTI, MTTR
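
The timeline measures above can be computed from incident timestamps. The following is a minimal sketch, not the speaker's method. It assumes MTTD runs from issue start to detection, MTTI from detection to identification of the cause, and MTTR from issue start to verified resolution; conventions differ between teams. The `Incident` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    started: datetime     # when the issue began
    detected: datetime    # when an alarm or a person noticed it
    identified: datetime  # when the cause was pinpointed
    resolved: datetime    # when the fix was verified


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def issue_timeline_metrics(incidents):
    """Compute MTTD, MTTI, and MTTR under the assumptions stated above."""
    return {
        "MTTD": mean_minutes([i.detected - i.started for i in incidents]),
        "MTTI": mean_minutes([i.identified - i.detected for i in incidents]),
        "MTTR": mean_minutes([i.resolved - i.started for i in incidents]),
    }


# Two made-up incidents, with offsets in minutes from the start time.
t0 = datetime(2023, 6, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=20),
             t0 + timedelta(minutes=50)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=30),
             t0 + timedelta(minutes=70)),
]
timeline = issue_timeline_metrics(incidents)
print(timeline)
```

Tracking the three numbers separately shows which stage of the timeline dominates: slow detection points at alarms, slow identification points at logs and traces.
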
  5. Pillars of observability
  6. What is observability?

    A measure of how well we can understand a system from the work it does
    • 90% of the methods in this service complete in under 200 milliseconds
    • This API is handling 1500 HTTP requests per second
    • CPU utilization for this service is at 85%
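
A statement such as "90% of the methods complete in under 200 milliseconds" is a percentile claim over latency samples. Here is a small sketch of how such a figure could be checked, using the nearest-rank percentile definition (one of several common definitions, chosen here as an assumption). The sample latencies are made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical method latencies in milliseconds.
latencies_ms = [120, 95, 180, 210, 150, 130, 175, 160, 140, 450]

p90 = percentile(latencies_ms, 90)
under_200 = sum(1 for value in latencies_ms if value < 200) / len(latencies_ms)
print(f"p90 = {p90} ms, share under 200 ms = {under_200:.0%}")
```

With this sample the 90th percentile is above 200 ms, so the claim on the slide would not hold for it, even though the median looks healthy.
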
  7. What is instrumentation?

    Instrumentation: Measuring events in software using code (a type of structural monitoring)
    Example: Calls to this database took, on average, 50 milliseconds
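
As an illustration of measuring events in code, the sketch below wraps a function in a timing decorator and records each call's duration. The names `timed`, `DURATIONS_MS`, and `fetch_user` are hypothetical, and the in-memory dictionary only stands in for a real metrics backend.

```python
import functools
import time

# In-memory store of observed durations, keyed by operation name.
# Stand-in for a real metrics backend (assumption for illustration).
DURATIONS_MS = {}


def timed(operation):
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even when the call raises.
                elapsed_ms = (time.perf_counter() - start) * 1000
                DURATIONS_MS.setdefault(operation, []).append(elapsed_ms)
        return wrapper
    return decorator


@timed("db.query")
def fetch_user(user_id):
    time.sleep(0.01)  # placeholder for a real database call
    return {"id": user_id}


for uid in range(3):
    fetch_user(uid)

samples = DURATIONS_MS["db.query"]
average_ms = sum(samples) / len(samples)
print(f"Calls to this database took, on average, {average_ms:.1f} milliseconds")
```

Because the measurement lives in a decorator, the business logic in `fetch_user` stays unchanged, which keeps instrumentation cheap to add.
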
  8. Foundations of observability

    Metrics, Traces, Logs
  9. What are the main observability use cases?

    • AIOps and DevOps
    • Microservices and containers
    • Digital experience monitoring (DEM)
    • Data lakes
  10. Observable systems should emit events: Metrics, logs, and traces

    • Logs: “The database won’t start after the update”
    • Metrics: “Our application is 35% slower than last week after this configuration change”
    • Traces: “What are the dependencies for this service?”
  11. Observability matters because of . . .

    Operational: Visibility, Real-time troubleshooting
    Business: Customer experience, Applications = $
  12. Observability signals
  13. Metrics

    Time series measurement of system performance, health, or business data over a period.
    Typically visualized on graphical charts for human consumption.
  14. Logs

    Textual data captured from applications indicating an event happened at that time.
    Log events are queried to perform deep analysis or troubleshooting.
  15. Traces

    A record of the end-to-end journey of user requests, from the user interface through the entire distributed system and back to the user.
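
To make the idea of an end-to-end journey concrete, the sketch below models a trace as spans that share one trace ID and point at their parent span. It is a simplified illustration and not the data model of AWS X-Ray or any other tracing product. The `Span` class and the helper functions are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work within a request (hypothetical, simplified model)."""
    name: str
    trace_id: str                    # shared by every span of the same request
    parent_id: Optional[str] = None  # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def start_trace(name):
    """Create the root span for a new user request."""
    return Span(name=name, trace_id=uuid.uuid4().hex)


def child_of(parent, name):
    """Create a span caused by `parent`, propagating its trace ID."""
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


def path_to(span, spans_by_id):
    """Walk parent links back to the root and return the names in call order."""
    names = []
    current = span
    while current is not None:
        names.append(current.name)
        current = spans_by_id.get(current.parent_id)
    return list(reversed(names))


root = start_trace("browser: click checkout")
api = child_of(root, "api: POST /orders")
db = child_of(api, "db: insert order")

spans_by_id = {s.span_id: s for s in (root, api, db)}
journey = path_to(db, spans_by_id)
print(" -> ".join(journey))
```

The parent links are what let a tracing backend answer "which of my services is slow?" for a single request rather than on average.
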
  16. Profiles/continuous profiling

    Analyzing application code complexity dynamically by tracking function calls, resource utilization, errors, etc.
    The core idea of continuous profiling is to capture CPU, memory, I/O, and networking usage over time, connecting it with the code base in a manner so that it’s possible to directly locate a certain line of source code.
  17. Putting all of these together

    Signal type | Information collected (examples) | Context
    Metrics | Resource utilization | Are my resources optimally allocated?
    Logs | Business context, web traffic | Is my server healthy?
    Traces | Service fault, latency | Which of my services is slow?
    Profiles | Function calls, code execution | What line of code is causing issues? Is there a memory leak?
  18. Developing an observability strategy
  19. Observability virtuous cycle

    Cycle: Collect, Improve, Act
    Inputs: Customer experience, Business stakeholders, Customer needs, KPIs
  20. Strategy

    What do I ? What do my customers want? What do stakeholders need? What are my KPIs?
    What do I ? Customer experience; metrics, logs, and traces; identify data sources
    How do I ? Alert when outcomes are at risk; evaluate impact; establish root cause and fix
  21. Full-stack observability strategies

    Outside-in: Begin with establishing what good looks like to your end users. Examples include
    • Web page response times
    • Failed purchases
    • JavaScript errors
    Inside-out: Begin by establishing what good looks like for your backend applications. Examples include
    • Slow queries
    • Integration health
    • Container restarts
    What you observe should reflect your desired business outcomes
  22. Full-stack observability strategies

    Inside-out, Outside-in, Business objectives
    Your goals, objectives, and approach to observability should be shaped by your business objectives.
    This will determine what signals you receive from your workloads, what to create alarms and notifications around, and how to build a full-stack observability solution that reduces your mean time to resolution.
  23. Business insights come from signals

    System-level telemetry: CPU wait %, disk queue depth
    Business-level metrics: Webpage response time, job run length
    Business insight: Customer sentiment, SLAs
  24. Observability best practices
  25. #1: Observable systems should emit events – Metrics, logs, and traces

    • Logs: “The database won’t start after the update”
    • Metrics: “Our application is 35% slower than last week after this configuration change”
    • Traces: “What are the dependencies for this service?”
  26. #2: All components should be instrumented

    Layers: Browser, Mobile; Server (virtual) hardware and managed services; Host operating system and containers; Application
    Services shown: Amazon EC2 instance, Amazon Cognito, Kinesis, DynamoDB, API Gateway, CloudWatch, Amazon S3
  27. #3: Instrumentation should not be opt in, manual, or hard to do

    Diagram labels: On premises, Web server, On-premises relational data, Synthetic customers, Customers, Public cloud, Microservices, API, Browser, Apps, Mobile, NoSQL data store
  28. What metrics do we need?

    • These metrics align to our KPIs
    • These are what our customers care about
  29. Designing dashboards

    High-level dashboards: Customer experience, system level, service instance
    Low-level dashboards: Infrastructure, microservice, dependency
    Stakeholder dashboards: Cost, service audit, capacity planning
    Diagram labels: Clients, Client dashboard, Customer experience dashboard, API microservice, Amazon EC2 instances, System dashboard, Backend microservice, AWS Lambda functions, Infrastructure dashboard, Dependency dashboard, Microservice dashboard
    Source: Amazon Builders’ Library, “Building dashboards for operational visibility”
  30. Designing alarms

    Diagram labels: Metric, Event, Alarm, Notification, Take action, Runbook, Playbook
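
One way to read the alarm diagram is as a chain from metric to alarm to notification to runbook. The sketch below illustrates that chain under stated assumptions: the alarm fires when at least M of the last N datapoints exceed a threshold, and the notification simply carries a runbook reference. The `Alarm` class, the `notify` function, and the runbook path are hypothetical and do not call any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical threshold alarm over a metric's recent datapoints."""
    name: str
    threshold: float
    datapoints_to_alarm: int  # M: breaching datapoints required to fire
    evaluation_periods: int   # N: size of the window that is evaluated
    runbook: str              # where responders find the remediation steps

    def state(self, datapoints):
        """Return "ALARM" if at least M of the last N datapoints breach."""
        window = datapoints[-self.evaluation_periods:]
        breaching = sum(1 for value in window if value > self.threshold)
        return "ALARM" if breaching >= self.datapoints_to_alarm else "OK"


def notify(alarm, state):
    """Build a notification that points responders at the runbook."""
    if state != "ALARM":
        return None
    return (f"[{alarm.name}] threshold {alarm.threshold} breached; "
            f"follow runbook: {alarm.runbook}")


cpu_alarm = Alarm(name="high-cpu", threshold=85.0, datapoints_to_alarm=3,
                  evaluation_periods=5, runbook="runbooks/high-cpu.md")

cpu_percent = [62.0, 88.0, 91.0, 70.0, 93.0, 95.0]  # made-up metric values
state = cpu_alarm.state(cpu_percent)
message = notify(cpu_alarm, state)
print(state, message)
```

Requiring several breaching datapoints instead of one reduces notifications caused by short spikes, at the cost of slightly slower detection.
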
  31. Troubleshooting

    An alarm tells you that there is a problem . . .
    How do you find out quickly there is a problem?
  32. Logging and tracing

    Logging: Immutable record of an event; a record of WHAT happened
    Tracing: Follow the path of a request, user centric; tell you the impact to the request path or user
  33. AWS Observability options
  34. AWS observability options

    Approaches: AWS-native services; Open-source managed services; Do it yourself (DIY) – AWS OSS solutions
    Layers: Observability; Insights and ML; Instrumentation; Collectors and SDKs
    AWS-native: Amazon CloudWatch (ServiceLens, Container Insights, Lambda Insights, Contributor Insights, Application Insights, Synthetics, RUM, Dashboards, Alarms, Metrics, Logs), AWS X-Ray
    Open source: Amazon Managed Grafana, Amazon OpenSearch Service, Amazon Managed Service for Prometheus, Jaeger and Zipkin Tracing
    Agents and collectors: CloudWatch agent, AWS X-Ray agent, AWS Distro for OpenTelemetry
  35. AWS-native observability stack

    Digital experience: CloudWatch Synthetics, CloudWatch RUM, CloudWatch Evidently
    Application: AWS X-Ray insights, CloudWatch ServiceLens, Container Insights, Lambda Insights, Contributor Insights, CloudWatch Application Insights
    Infrastructure: AWS X-Ray, CloudWatch Logs, alarms, metrics, and dashboards
    Axis: Outside-in, Inside-out
  36. The outside-in example
  37. Outside-in

    FOCUS ON END USER EXPERIENCE
    Diagram labels: CloudWatch RUM, CloudWatch Evidently, AWS X-Ray, Alarm, Logs, Synthetics, Metrics, Dashboards, CloudWatch Anomaly detection, Metrics Insights
  38. Outside-in

    FOCUS ON END USER EXPERIENCE
    Diagram labels: Web application, CloudWatch RUM, CloudWatch Evidently, AWS X-Ray, Alarm, Logs, Synthetics, Metrics, Dashboards, CloudWatch Anomaly detection, Metrics Insights
  39. Typical SLOs

    • Page load time
    • Purchases completed successfully
    • JavaScript and HTML errors
    • Conversion rates and new customer acquisition
    • New feature adoption rates
    • Search engine traffic
    All are related to end-user behavior and performance
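
An SLO such as "purchases completed successfully" can be checked by comparing a measured ratio against a target. The sketch below is one possible way to do that. The 99.5% target, the event counts, and the "error budget" calculation (allowed failures minus actual failures, a common practice that is not taken from the slides) are all assumptions for illustration.

```python
def slo_report(good_events, total_events, target):
    """Compare a success ratio against an SLO target.

    target is a fraction, for example 0.995 for 99.5%.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    sli = good_events / total_events
    allowed_failures = (1 - target) * total_events
    actual_failures = total_events - good_events
    return {
        "sli": sli,
        "meets_slo": sli >= target,
        # Positive means failures can still be absorbed within the target.
        "error_budget_remaining": allowed_failures - actual_failures,
    }


# Hypothetical week of checkout data: 10,000 purchase attempts, 40 failed.
report = slo_report(good_events=9_960, total_events=10_000, target=0.995)
print(report)
```

The same function works for the other user-facing objectives on the slide once each is expressed as good events over total events, for instance page loads faster than a chosen limit.
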
  40. The inside-out scenario
  41. Inside-out

    FOCUS ON INTERNAL SYSTEM PERFORMANCE
    Diagram labels: AWS X-Ray, Alarm, Logs, Metrics, Dashboards, CloudWatch Anomaly detection, Logs Insights, Contributor Insights, Metrics Insights
  42. Inside-out

    FOCUS ON INTERNAL SYSTEM PERFORMANCE
    Workloads: EC2 instances, Lambda functions, AWS Fargate, Amazon EKS, Amazon RDS
    Diagram labels: AWS X-Ray, Alarm, Logs, Metrics, Dashboards, CloudWatch Anomaly detection, Logs Insights, Contributor Insights, Metrics Insights
  43. Typical SLOs

    • Slow queries
    • High/low CPU utilization
    • Disk usage, IOPS
    • API response time
    • Errors, faults, and retries
    These are internal-facing signals
  44. A well-developed, full-stack example

    FULL STACK MEANS RECEIVING SIGNALS FROM ALL TIERS OF YOUR APPLICATION
    Workloads: Web application, EC2 instances, Lambda functions, AWS Fargate, Amazon EKS, Amazon RDS
    Diagram labels: CloudWatch RUM, CloudWatch Evidently, AWS X-Ray, Alarm, Logs, Synthetics, Metrics, Dashboards, CloudWatch Anomaly detection, Metrics Insights, Logs Insights, Contributor Insights
  45. What about hybrid, distributed, or on-premises workloads?

    Almost identical to workloads on AWS
    On premises: Database, Server, Firewall, VPN gateway, Direct Connect gateway, Internet
    Diagram labels: AWS X-Ray, Alarm, Logs, Metrics, Dashboards, CloudWatch Anomaly detection, Logs Insights, Contributor Insights, Metrics Insights
  46. Summary
  47. Summary

    • Remember the basics: metrics, logs, and traces
    • Outside-in: Begin with establishing what good looks like to your end users
    • Inside-out: Begin by establishing what good looks like for your backend applications
    • AWS Observability options for outside-in and inside-out approaches
  48. AWS Observability workshop

    HANDS-ON EXPERIENCE WITH AWS OBSERVABILITY SERVICES
    https://observability.workshop.aws
  49. AWS Observability best practices

    COLLECTION OF USEFUL AWS OBSERVABILITY IDEAS, TIPS, BLOGS & OTHER RESOURCES
    https://aws-observability.github.io/observability-best-practices/
  50. Thank you!

    Please complete the session survey in the mobile app
    Mohammed Fazalullah Qudrath, Senior Developer Advocate, Amazon Web Services