Slide 1

Slide 1 text

© SMS Co., Ltd. Takashi Kaga (SMS Co.,Ltd) JAWS PANKRATION 2024 How to achieve full-stack Observability with AWS

Slide 2

Slide 2 text

Agenda
1. About Me
2. Observability Learning from CNCF
3. How to Achieve full-stack Observability Using Amazon CloudWatch
4. Summary

Slide 3

Slide 3 text

01 About Me

Slide 4

Slide 4 text

About Me
- Takashi Kaga (@TAKA_0411)
- SRE at SMS Co.,Ltd
- AWS Community Builder (Cloud Operations, since 2023)
- Core Member : Media-JAWS

Slide 5

Slide 5 text

About Me

Slide 6

Slide 6 text

02 Observability Learning from CNCF

Slide 7

Slide 7 text

About CNCF
Cloud Native Computing Foundation (CNCF) https://www.cncf.io/
- Cloud Native Computing Foundation
- CNCF is part of the Linux Foundation and was founded in 2015
- CNCF supports growing cloud-native projects
- CNCF maintains an Observability Whitepaper

Slide 8

Slide 8 text

CNCF and representative projects
Graduated and Incubating Projects https://www.cncf.io/projects/
- Argo (Continuous Integration & Delivery)
- Fluentd (Observability)
- Istio (Service Mesh)
- Kubernetes (Scheduling & Orchestration)
- Prometheus (Observability)
- OpenTelemetry (Observability)

Slide 9

Slide 9 text

Observability as defined by CNCF
“It is a function of a system with which humans and machines can observe, understand and act on the state of said system.”
Observability Whitepaper : What is Observability? https://github.com/cncf/tag-observability/blob/main/whitepaper.md

Slide 10

Slide 10 text

What does it mean to observe the state of a system?
Telemetry correlation for deeper insights https://ubuntu.com/observability/what-is-observability
Observe and analyze the various data a system outputs so that its internal state can be inferred and acted on.

Slide 11

Slide 11 text

Observe the system : Monitoring and Observability
Monitoring
- Monitoring sets specific conditions and thresholds and periodically checks the status of the system against them.
- Monitoring reveals the “when” and “what” of system errors.
Observability
- Observability aims to understand the internal state of the system, prevent problems, and identify their causes.
- Observability reveals the “why” and “how” of system errors.
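
The monitoring half of this comparison can be sketched as a threshold rule evaluated over periodic samples. This is only an illustration: the `Sample` type, the 80% CPU threshold, and the "three samples in a row" rule are assumptions made for the example, not anything taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: int       # seconds since the start of the window (hypothetical)
    cpu_percent: float   # CPU utilization reported at that time


def evaluate_threshold(samples, threshold, consecutive):
    """Return the timestamp at which `consecutive` samples in a row
    exceeded `threshold`, or None if that never happens."""
    streak = 0
    for sample in samples:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if sample.cpu_percent > threshold else 0
        if streak >= consecutive:
            return sample.timestamp
    return None


samples = [Sample(0, 40.0), Sample(60, 85.0), Sample(120, 91.0), Sample(180, 95.0)]
fired_at = evaluate_threshold(samples, threshold=80.0, consecutive=3)
```

A rule like this answers "when" and "what" (CPU stayed above the threshold, starting at a known time). It says nothing about "why", which is where the telemetry discussed in the following slides comes in.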

Slide 12

Slide 12 text

Observe the system : analyze the data output by the system
Telemetry correlation for deeper insights https://ubuntu.com/observability/what-is-observability
Observe and analyze the various data a system outputs so that its internal state can be inferred and acted on.

Slide 13

Slide 13 text

Analyze data : Telemetry Data
Telemetry Data
- It is an important element in understanding system status.
- It is collected to detect and resolve system anomalies, optimize performance, etc.
- CNCF defines Metrics, Logs, and Traces as primary signals.
- In addition, signals such as Profiles and Dumps are also important.

Slide 14

Slide 14 text

Primary signals as defined by CNCF
Observability Whitepaper : Observability Signals https://github.com/cncf/tag-observability/blob/main/whitepaper.md

Slide 15

Slide 15 text

Telemetry Data
Metrics
- Quantified data on various activities.
  - Data that is already numeric
    - CPU utilization, Memory utilization
  - Data aggregated into numerical values
    - Number of requests
Observability Whitepaper : Metrics https://github.com/cncf/tag-observability/blob/main/whitepaper.md
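
The second kind of metric, data aggregated into numbers, can be illustrated with a minimal sketch that turns individual request events into a requests-per-minute count. The event list and its `(epoch_seconds, path)` shape are hypothetical.

```python
from collections import Counter

# Hypothetical request events: (seconds since start, request path)
events = [(0, "/"), (12, "/login"), (59, "/"), (61, "/"), (130, "/login")]


def requests_per_minute(events):
    """Aggregate raw request events into a count per one-minute bucket."""
    counts = Counter(timestamp // 60 for timestamp, _path in events)
    return dict(sorted(counts.items()))


rpm = requests_per_minute(events)
```

CPU or memory utilization needs no such step because the value is numeric when it is sampled. A request count only exists once events have been bucketed like this.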

Slide 16

Slide 16 text

Telemetry Data
Logs
- Describe activities and operations that occur in an OS, application, server, etc.
  - System logs
  - Application logs
  - Security logs
  - Audit logs
Observability Whitepaper : Logs https://github.com/cncf/tag-observability/blob/main/whitepaper.md
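
An application log is easier to search and correlate when each entry is emitted as structured data. The sketch below uses Python's standard `logging` module to write one JSON object per line. The field names and the `request_id` attribute are assumptions chosen for the example, not a format prescribed by the whitepaper or by CloudWatch.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # `request_id` is a hypothetical field passed via `extra=`.
            "request_id": getattr(record, "request_id", None),
        })


def build_logger(stream, name="app"):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()  # stands in for stdout in this sketch
log = build_logger(buffer)
log.info("user signed in", extra={"request_id": "req-123"})
entry = json.loads(buffer.getvalue())
```

In a container the stream would normally be stdout, so that whatever log driver the platform uses can forward the lines unchanged.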

Slide 17

Slide 17 text

Telemetry Data
Traces
- A description of what happened in a distributed transaction, such as a request initiated by an end user.
Observability Whitepaper : Traces https://github.com/cncf/tag-observability/blob/main/whitepaper.md

Slide 18

Slide 18 text

Example : Traces in Datadog
- Horizontal Axis : Time Axis
- Vertical Axis : Call Relationship
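
The two axes can be reproduced in a small text rendering: each span becomes one row, indented by its depth in the call tree and drawn as a bar positioned by its start time and duration. This is a simplified sketch, not how Datadog or any other tool stores traces. The `Span` fields and the sample spans are hypothetical, and sorting by start time is only enough to keep children under their parents in simple cases like this one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    end_ms: int


def depth(span, by_id):
    """Count how many ancestors a span has (the vertical axis)."""
    level = 0
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        level += 1
    return level


def waterfall(spans, ms_per_char=10):
    """Render spans as text rows: indentation = call depth, bar = time."""
    by_id = {s.span_id: s for s in spans}
    rows = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = " " * (s.start_ms // ms_per_char)
        bar = "#" * max(1, (s.end_ms - s.start_ms) // ms_per_char)
        label = "  " * depth(s, by_id) + s.name
        rows.append(f"{label:<20}|{offset}{bar}")
    return rows


spans = [
    Span("a", None, "GET /orders", 0, 100),
    Span("b", "a", "auth.check", 5, 25),
    Span("c", "a", "db.query", 30, 90),
]
lines = waterfall(spans)
```

Printing `lines` shows the root request as the longest bar, with the two child calls indented below it and shifted right according to when they started.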

Slide 19

Slide 19 text

Telemetry Data
Profiles
- Sampling of stack traces at runtime
  - CPU Profiler, Heap Profiler, IO Profiler, etc.
- Language-specific profilers
  - pprof (Go), Xdebug (PHP)
Dumps
- Snapshot at a point in time
  - core dump, etc.
Observability Whitepaper : Profiles, Dumps https://github.com/cncf/tag-observability/blob/main/whitepaper.md
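
As a rough illustration of what a language-specific profiler reports, the sketch below profiles a small function with Python's standard `cProfile` module. Note that `cProfile` is a deterministic (tracing) profiler rather than a sampling one, so it is a stand-in for the idea and not an instance of the sampling approach named on the slide. The `busy` function is hypothetical.

```python
import cProfile
import io
import pstats


def busy(n):
    """Hypothetical CPU-bound work to give the profiler something to measure."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy(100_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

The resulting report attributes time to individual functions, which is the level of detail that metrics, logs, and traces usually do not reach.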

Slide 20

Slide 20 text

Summary so far
- CNCF's Observability Whitepaper will help you understand the concept and practice of Observability.
- CNCF's definition of observability is “It is a function of a system with which humans and machines can observe, understand and act on the state of said system.”
- Telemetry data is essential to achieving observability: Metrics, Logs, Traces, Profiles, Dumps

Slide 21

Slide 21 text

03 How to Achieve full-stack Observability Using Amazon CloudWatch

Slide 22

Slide 22 text

Definition of full-stack Observability
An observability solution that monitors the entire service stack, including the front-end and back-end as well as end-user experience and security. (*)
(*) Definitions vary among vendors that offer Observability SaaS.

Slide 23

Slide 23 text

full-stack Observability by Splunk
Cisco and Splunk Bring Full-Stack Observability to the Entire Enterprise
https://www.splunk.com/en_us/blog/devops/cisco-and-splunk-bring-full-stack-observability-to-the-entire-enterprise.html

Slide 24

Slide 24 text

full-stack Observability by Dynatrace
What is full-stack observability?
https://www.dynatrace.com/knowledge-base/full-stack-observability/

Slide 25

Slide 25 text

Observability Services in AWS

Slide 26

Slide 26 text

Observability Services in AWS
Amazon CloudWatch
- Collects key telemetry data from AWS services and other sources
  - CloudWatch Metrics
  - CloudWatch Logs
  - CloudWatch Application Signals
- Offers numerous other functions besides data collection
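
One way application code can feed both CloudWatch Logs and CloudWatch Metrics is the CloudWatch Embedded Metric Format (EMF), which the slides do not mention: a JSON log line whose `_aws` block tells CloudWatch which fields to extract as metrics. The sketch below only builds such a line. The namespace, metric name, and dimension are hypothetical, and how the line reaches CloudWatch (agent, log driver, and so on) is outside the scope of this example.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions, timestamp_ms=None):
    """Build one Embedded Metric Format log line as a JSON string."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set containing every dimension name.
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        # The metric value and dimension values live at the top level.
        metric_name: value,
    }
    record.update(dimensions)
    return json.dumps(record)


line = emf_record(
    namespace="ExampleApp",            # hypothetical namespace
    metric_name="Latency",
    value=123,
    unit="Milliseconds",
    dimensions={"Service": "orders"},  # hypothetical dimension
    timestamp_ms=1700000000000,
)
```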

Slide 27

Slide 27 text

Amazon CloudWatch Feature List
Amazon CloudWatch Overview and Basics (Amazon CloudWatchの概要と基本) : AWS Black Belt Online Seminar
https://pages.awscloud.com/rs/112-TZM-766/images/AWS-Black-Belt_2023_AmazonCloudWatch_0330_v1.pdf
Application Signals (Trace) does not appear in the list because this document dates from March 2023.

Slide 28

Slide 28 text

How to achieve full-stack observability using Amazon CloudWatch

Slide 29

Slide 29 text

Topic : Simple Web Services
- front-end (user experience)
- back-end (infrastructure)
- Build / Deploy Pipelines
- Security
This talk focuses on the front-end and back-end.

Slide 30

Slide 30 text

1. front-end monitoring
- front-end (user experience)
- back-end (infrastructure)
- Build / Deploy Pipelines
- Security

Slide 31

Slide 31 text

1. front-end monitoring
- CloudWatch RUM's JavaScript runs in users' browsers to collect metrics, logs, etc.
- Link the metrics and real-time logs output by CloudFront to CloudWatch.
- Headless browsers access endpoints (URLs) to obtain metrics, traces, screenshots, etc.

Slide 32

Slide 32 text

2. back-end monitoring
- front-end (user experience)
- back-end (infrastructure)
- Build / Deploy Pipelines
- Security

Slide 33

Slide 33 text

2. back-end monitoring
- Link the metrics output by ALB and Fargate to CloudWatch.
- Link trace data to CloudWatch (using OpenTelemetry automatic instrumentation).
- Link the metrics output by data store services to CloudWatch.
- Link task logs and DB logs to CloudWatch Logs.
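
OpenTelemetry automatic instrumentation is usually configured through environment variables rather than code changes. The sketch below builds those variables in the `name`/`value` list shape that an ECS container definition uses for `environment`. The `OTEL_*` names are standard OpenTelemetry SDK settings. The service name, the resource attribute, and the assumption that a collector sidecar listens on `localhost:4317` in the same task are illustrative only, and any settings specific to CloudWatch Application Signals are not shown.

```python
def otel_environment(service_name, collector_endpoint="http://localhost:4317", attributes=None):
    """Return OpenTelemetry settings as an ECS-style environment list."""
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        # Assumed: a collector sidecar in the same task receives OTLP here.
        "OTEL_EXPORTER_OTLP_ENDPOINT": collector_endpoint,
    }
    if attributes:
        # Resource attributes are passed as comma-separated key=value pairs.
        env["OTEL_RESOURCE_ATTRIBUTES"] = ",".join(
            f"{key}={value}" for key, value in sorted(attributes.items())
        )
    return [{"name": name, "value": value} for name, value in env.items()]


environment = otel_environment(
    "orders-api",                                    # hypothetical service name
    attributes={"deployment.environment": "production"},
)
```

Keeping this configuration in the task definition means the application image itself stays unchanged when the trace destination changes.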

Slide 34

Slide 34 text

04 Summary

Slide 35

Slide 35 text

About full-stack Observability with AWS
- First, read CNCF's Observability Whitepaper to understand the concept of Observability and telemetry data.
- If the workload runs on AWS, use Amazon CloudWatch to collect and analyze telemetry data.
- Full-stack observability with Amazon CloudWatch is feasible, although obtaining Profiles and Dumps remains a challenge.

Slide 36

Slide 36 text

Happy Observability life with Amazon CloudWatch

Slide 37

Slide 37 text

05 Appendix

Slide 38

Slide 38 text

Amazon CloudWatch or Observability SaaS
Getting started
- Amazon CloudWatch: Available as soon as the workload is on AWS.
- Observability SaaS: There is a slight time lag before it can be used, including account sign-up and initial setup.
Features
- Amazon CloudWatch: Features are being added, but not at the same pace as SaaS.
- Observability SaaS: Many features have been added, covering front-end, mobile application, and security visualization.
Integration with other services
- Amazon CloudWatch: If you need to integrate with SaaS or other services, you will need to set up and build it yourself.
- Observability SaaS: It is possible to monitor a large amount of data by linking with various services.
Cost
- Amazon CloudWatch: It is an inexpensive way to collect key telemetry data, and it is excellent for cost analysis.
- Observability SaaS: Traces and Logs are useful but can spike costs; CloudWatch integration can increase AWS costs.
Support
- Amazon CloudWatch: You can expect extensive follow-up by AWS support.
- Observability SaaS: Support quality may vary depending on the case, e.g., English-only support.