Observability and Distributed Tracing in Practice

Marcus
April 01, 2023

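As cloud and cloud native applications become widespread, enterprise software architectures keep evolving and development keeps getting faster. The resulting challenge is how developers can monitor these services in closer to real time. Observability has drawn more and more discussion in recent years, and the CNCF even has a dedicated section introducing observability tools. How exactly does observability differ from monitoring? How should developers and SRE teams meet the challenges of cloud native applications? This session covers the following in an accessible way:
- What is Observability?
- Observability vs. monitoring
- Distributed tracing tool: OpenTelemetry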

Transcript

  1. AGENDA
    Observability
    ○ What is Observability
    ○ Observability vs. monitoring
    Practices
    ○ Tools in practice
    ○ OpenTelemetry
    Take Away
    ○ Summary
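  2. Hello! I'm Marcus
    ▸ An engineer focused on back-end development
    ▸ I enjoy taking technical courses and attending conferences to pick up new knowledge
    ▸ I share what I learn on my blog and Facebook page
    Blog: m@rcus 學習筆記
    Fb: m@rcus 學習筆記粉絲團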
  3. Challenge
    Complexity: low → high
    Infrastructure: Virtualization, Cloud, Orchestration, Containers, Serverless / FaaS
    Architectures: Monolith, N-tier, SOA, Microservices
    Infrastructure: Cloud native infrastructure is more than servers, network, and storage in the cloud; it is as much about operational hygiene as it is about elasticity and scalability.
    FaaS: Provides a platform that lets customers develop, run, and manage application functionality without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app.
    Architectures: Software architecture refers to the fundamental structures of a software system and the discipline of creating such structures and systems. Each structure comprises software elements, relations among them, and properties of both elements and relations.
    Reference: Hunting for Evil with the Elastic Stack
  4. Challenge: Monitor Complexity
    Slide labels: Constant Change, Deployment Frequency, Organization, Technical, Move to DevOps, Endless Dependencies
    MTTR (Mean time to recovery): problems resolved within one hour
    Chart values: 53.4% and 35.9%, for 2021 and 2022
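The MTTR figure on this slide can be made concrete with a small calculation. The sketch below assumes MTTR is the average of (recovery time minus detection time) over a set of incidents, and also computes the share of incidents resolved within a time limit such as one hour. The incident timestamps and the names `INCIDENTS`, `mean_time_to_recovery`, and `share_resolved_within` are made up for illustration; they are not from the talk.

```python
from datetime import datetime, timedelta

# Hypothetical incidents as (detected_at, recovered_at) pairs.
INCIDENTS = [
    (datetime(2023, 1, 5, 10, 0), datetime(2023, 1, 5, 10, 30)),   # 30 minutes
    (datetime(2023, 2, 11, 14, 0), datetime(2023, 2, 11, 16, 0)),  # 120 minutes
    (datetime(2023, 3, 2, 9, 0), datetime(2023, 3, 2, 9, 45)),     # 45 minutes
]


def recovery_durations(incidents):
    """Time from detection to recovery for each incident."""
    return [recovered - detected for detected, recovered in incidents]


def mean_time_to_recovery(incidents):
    """MTTR: the arithmetic mean of the recovery durations."""
    durations = recovery_durations(incidents)
    return sum(durations, timedelta()) / len(durations)


def share_resolved_within(incidents, limit):
    """Fraction of incidents whose recovery took at most `limit`."""
    durations = recovery_durations(incidents)
    return sum(d <= limit for d in durations) / len(durations)


if __name__ == "__main__":
    print("MTTR:", mean_time_to_recovery(INCIDENTS))
    print("Resolved within 1h:", share_resolved_within(INCIDENTS, timedelta(hours=1)))
```

With these three sample incidents, the mean is 65 minutes and two of the three incidents fall within the one-hour limit.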
  5. “Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. In control theory, the observability and controllability of a linear system are mathematical duals.” - wiki
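For readers curious about the control-theory half of this quote, the duality can be written out for a linear time-invariant system. This is the standard textbook statement, added here as background; the symbols A, B, C and n are not from the slides.

```latex
\begin{align*}
  \dot{x} &= A x + B u, \qquad y = C x, \qquad x \in \mathbb{R}^{n} \\[4pt]
  \mathcal{O}(A, C) &=
    \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix},
    &&\text{observable} \iff \operatorname{rank} \mathcal{O}(A, C) = n \\[4pt]
  \mathcal{C}(A, B) &=
    \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix},
    &&\text{controllable} \iff \operatorname{rank} \mathcal{C}(A, B) = n \\[4pt]
  \mathcal{O}(A, C)^{\top} &= \mathcal{C}(A^{\top}, C^{\top})
\end{align*}
```

The last line is the duality: the pair (A, C) is observable exactly when the pair (Aᵀ, Cᵀ) is controllable. In software terms, only the first half of the quote carries over: internal state should be inferable from external outputs such as metrics, logs, and traces.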
  6. Known / Unknown matrix
    • Things we are aware of and understand
    • Things we are aware of but don't understand
    • Things we are not aware of but understand
    • Things we are not aware of and don't understand
    Axis labels: Known, Unknown (on both axes)
    Quadrant labels: Testing, Monitoring, Observability, N/A
  7. Known / Unknown matrix (same quadrants and labels as the previous slide: Testing, Monitoring, Observability, N/A), with two callouts added:
    • Is the system working properly?
    • Why is the system not working?
  8. Metrics
    • Is my service healthy?
    • How much traffic do we have?
    Logs
    • Why did this node crash?
    • Was function X on the node called?
    Traces
    • Why was this request slow?
    • Where should I optimize performance?
    • Which services are involved?
  9. OpenTelemetry
    “OpenTelemetry, also known as OTel for short, is a vendor-neutral open-source Observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, logs. As an industry-standard it is natively supported by a number of vendors.”
  10. Architecture (diagram)
    • Application and 3rd-party libraries, each calling the OpenTelemetry API
    • API: minimal implementation
    • SDK: core functionality (span scope, context propagation, W3C standard, etc.)
    • Telemetry data: logs, metrics, tracing, etc.
    • Exporter: sends telemetry data over OTLP
    • Collector: Receive, Process, Export
    • Backends: Prometheus, Jaeger, Zipkin, SigNoz
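The Receive, Process, Export stages of the Collector can be pictured as three functions chained together. The following is a toy model in plain Python, not the real OpenTelemetry Collector (which is a separate service configured through its own configuration file). The names `Telemetry`, `receive`, `process`, `export`, and the routing table are hypothetical and exist only to show the data flow.

```python
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """One telemetry record; `kind` is "trace", "metric", or "log"."""
    kind: str
    name: str
    attributes: dict = field(default_factory=dict)


def receive(raw_items):
    """Receive: turn incoming payloads (plain dicts here) into records."""
    return [Telemetry(**item) for item in raw_items]


def process(items, extra_attributes):
    """Process: enrich every record with shared attributes."""
    return [
        Telemetry(i.kind, i.name, {**i.attributes, **extra_attributes})
        for i in items
    ]


def export(items, routes):
    """Export: group record names by the backend configured for their kind.

    Records whose kind has no configured backend are dropped.
    """
    delivered = {}
    for item in items:
        backend = routes.get(item.kind)
        if backend is not None:
            delivered.setdefault(backend, []).append(item.name)
    return delivered


if __name__ == "__main__":
    raw = [
        {"kind": "trace", "name": "GET /donuts"},
        {"kind": "metric", "name": "http.requests"},
        {"kind": "log", "name": "startup"},
    ]
    routes = {"trace": "jaeger", "metric": "prometheus"}  # assumed routing
    items = process(receive(raw), {"service.name": "donut_shop"})
    print(export(items, routes))
```

Keeping the three stages separate is what lets an application emit telemetry once while the backend (Jaeger, Prometheus, and so on) is chosen later in the pipeline.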
  11. OpenTelemetry (OTel) Distributed Tracing
    APIs
    • Tracing (Stable)
    • Metrics (Stable)
    • Logging (Mixed)
    SDK
    It implements the Tracing API, the Metrics API, and the Context API. This SDK also supports ILogger integration.
    Instrumentation libraries
    • ASP.NET, ASP.NET Core
    • Grpc.Net.Client, HTTP clients
    • Redis client, SQL client
    • …etc
    Exporter
    • Azure Application Insights
    • Azure Monitor
    Collector
  12. Traces - Context Propagation
    Trace
    Defined implicitly by its spans. A trace can be thought of as a directed acyclic graph of spans where the edges between spans are defined as parent/child relationships.
    Span
    Represents a single unit of work in a system: an individual unit of work done within a distributed system. Spans generally have a name, and a start and end timestamp.
    Diagram labels: Parent, Trace, Spans
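Context propagation is what links spans from different services into one trace: the caller passes its trace ID and span ID along with the request, and the callee creates a child span under them. The sketch below hand-rolls this with the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`) using only the standard library. It is an illustration, not the OpenTelemetry SDK; `SpanContext`, `start_span`, `inject`, and `extract` are hypothetical names chosen for this example.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# traceparent: version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                         # shared by every span in the trace
    span_id: str                          # unique per span
    parent_span_id: Optional[str] = None  # the parent/child edge; None for a root


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a root span, or a child span that inherits the parent's trace id."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return SpanContext(trace_id, secrets.token_hex(8), parent_id)


def inject(ctx: SpanContext, headers: dict) -> None:
    """Write the context into outgoing request headers (flags 01 = sampled)."""
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-01"


def extract(headers: dict) -> Optional[SpanContext]:
    """Read the caller's context from incoming headers; None if absent or invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, _flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:  # all-zero ids are invalid
        return None
    return SpanContext(trace_id, span_id)


if __name__ == "__main__":
    root = start_span()                   # service A handles the request
    outgoing = {}
    inject(root, outgoing)                # A calls B and sends the header
    child = start_span(extract(outgoing)) # service B continues the trace
    print(outgoing["traceparent"])
    print(child.trace_id == root.trace_id, child.parent_span_id == root.span_id)
```

Both spans end up with the same trace ID, and the child's `parent_span_id` points at the caller's span, which is exactly the parent/child edge in the directed acyclic graph described on this slide.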
  13. Structured data
    Purpose: to make the data analyzable
    Tags
    • http.status_code
    • http.url
    Example: "http.status_code": 200, "http.url": "http://example.com", "my.custom.application.tag": "hello",
    Span Context
    • service.name
    • service.version
    • Custom tag
    Example: "service.name": "donut_shop", "service.version": "2.0.0", "k8s.pod.uid": "1138528c-c36e-11e9-a1a7-42010a800198",
    Logs
    • traceID: the trace identifier of the span context
    • spanID: the span identifier of the span context
    Example: "TraceId": "f4dbb3edd765f620", "SpanId": "43222c2d51a7abe3", "SeverityText": "INFO", "SeverityNumber": 9, "Body": "20200415T072306-0700 INFO I like donuts"
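  14. Summary
    Observability
    ○ What is Observability
    ○ Observability vs. monitoring
    OpenTelemetry
    ○ A new standard for collecting telemetry data
    ○ Instrumenting, generating, and collecting data
    ○ Context Propagation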
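The structured log record on slide 13 carries `TraceId` and `SpanId` fields, which is what makes logs analyzable alongside traces. A minimal sketch of that idea with Python's standard `logging` module follows. The field names mirror the slide's example; the formatter class, the `logs_for_trace` helper, and the second log line are assumptions for illustration, and a real setup would normally let an OpenTelemetry SDK attach the IDs automatically.

```python
import io
import json
import logging


class JsonTraceFormatter(logging.Formatter):
    """Render each record as one JSON object with trace correlation fields."""

    def format(self, record):
        return json.dumps({
            "TraceId": getattr(record, "trace_id", ""),
            "SpanId": getattr(record, "span_id", ""),
            "SeverityText": record.levelname,
            "Body": record.getMessage(),
        })


def logs_for_trace(lines, trace_id):
    """Return the parsed log entries that belong to the given trace."""
    return [e for e in map(json.loads, lines) if e["TraceId"] == trace_id]


# Write logs to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonTraceFormatter())
logger = logging.getLogger("donut_shop")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# `extra` attaches the ids to the record; the values are sample data.
logger.info("I like donuts",
            extra={"trace_id": "f4dbb3edd765f620", "span_id": "43222c2d51a7abe3"})
logger.info("unrelated request",
            extra={"trace_id": "aaaaaaaaaaaaaaaa", "span_id": "bbbbbbbbbbbbbbbb"})

matches = logs_for_trace(stream.getvalue().splitlines(), "f4dbb3edd765f620")

if __name__ == "__main__":
    for entry in matches:
        print(entry)
```

Filtering by `TraceId` pulls out only the log lines written while handling one request, so a slow or failed trace can be read together with its logs.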