How to achieve full-stack Observability with AWS

SMS tech
August 23, 2024

Transcript

  1. How to achieve full-stack Observability with AWS

    Takashi Kaga (SMS Co., Ltd.), JAWS PANKRATION 2024
    © SMS Co., Ltd.
  2. Agenda

    1. About Me
    2. Observability Learning from CNCF
    3. How to Achieve full-stack Observability Using Amazon CloudWatch
    4. Summary
  3. About Me

    • Takashi Kaga (@TAKA_0411)
    • SRE at SMS Co., Ltd.
    • AWS Community Builder (Cloud Operations, since 2023)
    • Core Member: Media-JAWS
  4. About CNCF

    Cloud Native Computing Foundation (CNCF): https://www.cncf.io/
    - CNCF is part of the Linux Foundation and was founded in 2015.
    - CNCF offers support for growing cloud-native projects.
    - CNCF is creating an Observability Whitepaper.
  5. CNCF and representative projects

    Graduated and Incubating Projects: https://www.cncf.io/projects/
    - Argo (Continuous Integration & Delivery)
    - Fluentd (Observability)
    - Istio (Service Mesh)
    - Kubernetes (Scheduling & Orchestration)
    - Prometheus (Observability)
    - OpenTelemetry (Observability)
  6. Observability as defined by CNCF

    “It is a function of a system with which humans and machines can observe, understand and act on the state of said system.”
    Observability Whitepaper: What is Observability? https://github.com/cncf/tag-observability/blob/main/whitepaper.md
  7. What is observing the state of the system?

    Figure: Telemetry correlation for deeper insights (https://ubuntu.com/observability/what-is-observability)
    Observe and analyze the various data a system outputs in order to estimate and address its internal state.
  8. Observe the system: Monitoring and Observability

    Monitoring
    - Monitoring sets specific conditions and thresholds and periodically checks the status of the system.
    - Monitoring reveals the “when” and “what” of system errors.

    Observability
    - Observability aims to understand the internal state of the system, prevent problems, and identify their causes.
    - Observability reveals the “why” and “how” of system errors.
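
The monitoring half of this slide, a fixed condition checked against periodic samples, can be sketched in a few lines. This is only an illustration: the `ThresholdRule` type, the metric name, and the sample values are made up and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """A monitoring condition: alert when a metric exceeds a threshold."""
    metric_name: str   # hypothetical metric name
    threshold: float   # hypothetical threshold value


def evaluate(rule: ThresholdRule, samples: list[float]) -> bool:
    """Return True if the most recent sample breaches the threshold."""
    return bool(samples) and samples[-1] > rule.threshold


cpu_rule = ThresholdRule(metric_name="cpu_utilization_percent", threshold=80.0)
# Periodic samples; the latest one (91.2) is above the threshold.
print(evaluate(cpu_rule, [42.0, 55.5, 91.2]))
```

A check like this answers "when" and "what" (the CPU metric crossed 80 at the latest sample). It says nothing about "why", which is where the telemetry data discussed on the following slides comes in.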
  9. Observe the system: analyze data output by the system

    Figure: Telemetry correlation for deeper insights (https://ubuntu.com/observability/what-is-observability)
    Observe and analyze the various data a system outputs in order to estimate and address its internal state.
  10. Analyze data: Telemetry Data

    Telemetry Data
    - It is an important element in understanding system status.
    - It is collected to detect and resolve system anomalies, optimize performance, etc.
    - CNCF defines Metrics, Logs, and Traces as primary signals.
    - In addition, signals such as Profiles and Dumps are also important.
  11. Primary signals as defined by CNCF

    Observability Whitepaper: Observability Signals https://github.com/cncf/tag-observability/blob/main/whitepaper.md
  12. Telemetry Data

    Metrics
    - Quantified data on various activities.
      - Already quantified data
        - CPU utilization, memory utilization
      - Data broken down as numerical values
        - Number of requests
    Observability Whitepaper: Metrics https://github.com/cncf/tag-observability/blob/main/whitepaper.md
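
The two kinds of metric on this slide can be contrasted in a small sketch: a reading that is already a number, and a number obtained by aggregating individual events. The access records and the CPU value below are invented for illustration.

```python
from collections import Counter

# Hypothetical access records: (minute, path, HTTP status).
requests = [
    ("12:00", "/", 200),
    ("12:00", "/login", 200),
    ("12:01", "/", 500),
    ("12:01", "/", 200),
    ("12:01", "/login", 200),
]


def requests_per_minute(records):
    """Break individual events down into a numeric series (count per minute)."""
    return dict(Counter(minute for minute, _path, _status in records))


# "Already quantified" data is a plain reading (value is made up).
cpu_utilization_percent = 37.5

print(cpu_utilization_percent)
print(requests_per_minute(requests))
```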
  13. Telemetry Data

    Logs
    - Describe activities and operations that occur in an OS, application, server, etc.
      - System logs
      - Application logs
      - Security logs
      - Audit logs
    Observability Whitepaper: Logs https://github.com/cncf/tag-observability/blob/main/whitepaper.md
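
As one way to picture an application log, the sketch below emits a log entry as a single JSON line using Python's standard `logging` module. The JSON field names, the logger name, and the message are assumptions chosen for the example. The talk does not prescribe a log format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output on our handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # stands in for stdout or a log file
app_log = build_logger("app", buffer)
app_log.info("order %s created", "A-1001")
print(buffer.getvalue().strip())
```

Structured lines like this tend to be easier to filter and aggregate in a log backend than free-form text, though the right fields depend on the system.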
  14. Telemetry Data

    Traces
    - A description of what happened in a distributed transaction, such as a request initiated by an end user.
    Observability Whitepaper: Traces https://github.com/cncf/tag-observability/blob/main/whitepaper.md
  15. Example: Traces in Datadog

    - Horizontal axis: time
    - Vertical axis: call relationship
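
The two axes of a trace view (time and call relationship) can be modeled with a minimal span structure. The sketch below is not Datadog's or OpenTelemetry's data model. The `Span` fields and the sample trace are hypothetical and kept to the minimum needed to show parent/child nesting and timing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int             # offset from the start of the trace
    end_ms: int


def render(spans: list[Span]) -> list[str]:
    """Return one line per span: indentation = call depth, brackets = time range."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit siblings in start-time order, then descend into their children.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} [{span.start_ms}-{span.end_ms} ms]")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical trace of a single end-user request.
trace = [
    Span("a", None, "GET /orders", 0, 120),
    Span("b", "a", "auth.verify", 5, 25),
    Span("c", "a", "db.query", 30, 110),
]
print("\n".join(render(trace)))
```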
  16. Telemetry Data

    Profiles
    - Sampling of stack traces at runtime
      - CPU profiler, heap profiler, I/O profiler, etc.
    - Programming-language-specific profilers
      - pprof (Go), Xdebug (PHP)

    Dumps
    - Snapshot at a point in time
      - Core dump, etc.
    Observability Whitepaper: Profiles, Dumps https://github.com/cncf/tag-observability/blob/main/whitepaper.md
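
For a feel of what a CPU profile contains, here is a small sketch using Python's built-in `cProfile` and `pstats` modules. Note the substitution: `cProfile` is a deterministic (tracing) profiler, not a sampling profiler as described on the slide, and Python is not one of the languages the slide names. The `busy_work` function is a made-up workload.

```python
import cProfile
import io
import pstats


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound workload to profile."""
    return sum(i * i for i in range(n))


def profile_call(func, *args) -> str:
    """Run func under cProfile and return a text report of the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time and keep the five most expensive entries.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile_call(busy_work, 100_000)
print(report)
```

The report lists call counts and time per function, which is the kind of detail that metrics, logs, and traces usually do not provide.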
  17. Summary so far

    - CNCF's Observability Whitepaper will help you understand the concept and practice of Observability.
    - The CNCF definition of observability is “It is a function of a system with which humans and machines can observe, understand and act on the state of said system.”
    - Telemetry data is important to achieve observability.
      - Metrics, Logs, Traces, Profiles, Dumps
  18. Definition of full-stack Observability

    An observability solution that monitors the entire stack of services, including front-end and back-end as well as end-user experience and security. (※)
    (※) Definitions vary by vendor offering Observability SaaS.
  19. full-stack Observability by Splunk

    Cisco and Splunk Bring Full-Stack Observability to the Entire Enterprise
    https://www.splunk.com/en_us/blog/devops/cisco-and-splunk-bring-full-stack-observability-to-the-entire-enterprise.html
  20. full-stack Observability by Dynatrace

    What is full-stack observability?
    https://www.dynatrace.com/knowledge-base/full-stack-observability/
  21. Observability Services in AWS

    Amazon CloudWatch
    - Collects key telemetry data from AWS services and other sources
      - CloudWatch Metrics
      - CloudWatch Logs
      - CloudWatch Application Signals
    - Numerous other functions besides data collection
  22. Amazon CloudWatch Feature List

    Amazon CloudWatchの概要と基本 (Amazon CloudWatch Overview and Basics): AWS Black Belt Online Seminar
    https://pages.awscloud.com/rs/112-TZM-766/images/AWS-Black-Belt_2023_AmazonCloudWatch_0330_v1.pdf
    Application Signals (Trace) is not listed because this document dates from March 2023.
  23. Topic: Simple Web Services

    - front-end (user experience)
    - back-end (infrastructure)
    - Build / Deploy Pipelines
    - Security
    Focus on front-end and back-end this time.
  24. 1. front-end monitoring

    - front-end (user experience)
    - back-end (infrastructure)
    - Build / Deploy Pipelines
    - Security
  25. 1. front-end monitoring

    - Users' browsers execute CloudWatch RUM's JavaScript to send metrics, logs, etc.
    - Link metrics and real-time logs output by CloudFront to CloudWatch.
    - Headless browsers access endpoints (URLs) to obtain metrics, traces, screenshots, etc.
  26. 2. back-end monitoring

    - front-end (user experience)
    - back-end (infrastructure)
    - Build / Deploy Pipelines
    - Security
  27. 2. back-end monitoring

    - Link metrics output by ALB and Fargate to CloudWatch.
    - Link trace data to CloudWatch (automatic instrumentation with OpenTelemetry).
    - Link metrics output by data store services to CloudWatch.
    - Link task logs and DB logs to CloudWatch Logs.
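
The AWS-managed metrics on this slide reach CloudWatch without application code. Where an application-level number is also wanted, a custom metric can be published through the CloudWatch `PutMetricData` API. The sketch below is not from the talk: the namespace, metric name, dimension, and service name are all hypothetical. It assumes the real client would come from `boto3.client("cloudwatch")`, and a small recording stand-in is used here so that the example runs without AWS credentials.

```python
from datetime import datetime, timezone


def publish_request_count(client, service: str, count: int) -> dict:
    """Send one data point via put_metric_data and return the parameters used."""
    params = {
        "Namespace": "Example/WebService",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "RequestCount",  # hypothetical metric name
                "Dimensions": [{"Name": "Service", "Value": service}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    }
    client.put_metric_data(**params)
    return params


class RecordingClient:
    """Offline stand-in for boto3.client("cloudwatch"); it only records calls."""

    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)


fake = RecordingClient()
publish_request_count(fake, "orders-api", 42)
print(fake.calls[0]["MetricData"][0]["MetricName"])
```

With real credentials, passing `boto3.client("cloudwatch")` instead of `RecordingClient()` would send the data point to CloudWatch. Custom metrics are billed, so this is worth checking against current pricing before use.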
  28. About full-stack Observability with AWS

    - First, read CNCF's Observability Whitepaper to understand the concept of Observability and telemetry data.
    - If the workload is in AWS, use Amazon CloudWatch to proceed with collection and analysis of telemetry data.
    - Full-stack observability with Amazon CloudWatch is feasible, although there are challenges in getting Profiles and Dumps.
  29. Amazon CloudWatch or Observability SaaS

    Introduction
    - Amazon CloudWatch: Available as soon as the workload is on AWS.
    - Observability SaaS: There is a slight time lag before it can be used, including account sign-up and initial setup.

    Features
    - Amazon CloudWatch: Features are being added, but not at the same pace as SaaS.
    - Observability SaaS: Many features have been added, ranging from front-end and mobile application monitoring to security visualization.

    Integration with other services
    - Amazon CloudWatch: If you need to integrate with SaaS or other services, you will need to set up and build it yourself.
    - Observability SaaS: It is possible to monitor a large amount of data by linking with various services.

    Cost
    - Amazon CloudWatch: It is an inexpensive way to collect key telemetry data, and it is excellent for cost analysis.
    - Observability SaaS: Traces and Logs are useful but can spike costs; CloudWatch integration can increase AWS costs.

    Support
    - Amazon CloudWatch: You can expect extensive follow-up by AWS support.
    - Observability SaaS: Support quality may vary depending on the case, e.g., English-only support.