Observability in Go Application using OpenTelemetry

Slide 1

Slide 1 text

Observability in Go Application using OpenTelemetry Jun 23 2021 Sakti Dwi Cahyono

Slide 2

Slide 2 text

01 Intro
02 Observability
03 Built-in tools
04 Logging, tracing, metrics
05 Tracing implementation
06 Q&A

Slide 3

Slide 3 text

Observability

Slide 4

Slide 4 text

Definition
“Observability refers to understanding a system through observation, and classifies the tools that accomplish this. These tools include tracing tools, sampling tools, and tools based on fixed counters. It does not include benchmark tools, which modify the state of the system by performing a workload experiment.”
Excerpt from: Brendan Gregg, “BPF Performance Tools: Linux System and Application Observability.”

Slide 5

Slide 5 text

Source: http://www.brendangregg.com/blog/2014-08-23/linux-perf-tools-linuxcon-na-2014.html

Slide 6

Slide 6 text

Why observability?
● Microservices/services create complex interactions.
● Failures don't exactly repeat.
● Debugging multi-tenancy is painful.
● Monitoring can no longer help us.

Slide 7

Slide 7 text

What is observability?
● We need to answer questions about our systems.
  ○ What characteristics did the queries that timed out at 500ms have in common? Service versions?
● Instrumentation produces data.
● Querying data answers our questions.

Slide 8

Slide 8 text

Telemetry aids observability
● Telemetry data isn't observability itself.
● Instrumentation code is how we get telemetry.
● Telemetry data can include traces, logs, and/or metrics.
  ○ All different views into the same underlying truth.

Slide 9

Slide 9 text

Built-in tools

Slide 10

Slide 10 text

Go diagnostics
Source: https://golang.org/doc/diagnostics
● Profiling, package runtime/pprof, net/http/pprof
  ○ cpu, heap, threadcreate, goroutine, block, mutex
● Tracing, package x/net/trace
● Debugging, GODEBUG environment variable
● Runtime statistics and events, package runtime, runtime/debug
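As a minimal sketch of the profiling entry above, the program below writes a heap profile into memory and lists the named profiles the runtime exposes. It uses only the standard library; the helper names heapProfileSize and profileNames are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapProfileSize is a hypothetical helper: it writes the current heap
// profile into an in-memory buffer and reports how many bytes were written.
func heapProfileSize() (int, error) {
	var buf bytes.Buffer
	runtime.GC() // refresh allocation statistics before sampling the heap
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// profileNames is a hypothetical helper: it returns the names of all
// profiles currently registered with runtime/pprof.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

func main() {
	n, err := heapProfileSize()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Printf("heap profile: %d bytes\n", n)
	fmt.Println("available profiles:", profileNames())
}
```

In a real program the profile would normally go to a file that go tool pprof can open. For a long-running service, a blank import of net/http/pprof registers handlers under /debug/pprof/ on the default HTTP mux instead.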

Slide 11

Slide 11 text

Logging, tracing, metrics

Slide 12

Slide 12 text

Metrics, logs, and traces, oh my!
● Metrics
  ○ Aggregated summary statistics.
● Logs
  ○ Detailed debugging information emitted by processes.
● Distributed Tracing
  ○ Provides insights into the full lifecycles, aka traces, of requests to a system, allowing you to pinpoint failures and performance issues.
Structured data can be transmuted into any of these!
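To illustrate the last point, here is a rough sketch in which one structured record per request is turned into a log line, a metric, and a per-trace grouping. The Event type, its fields, and the helper names are assumptions made for this example, not part of any library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical structured record emitted once per handled request.
type Event struct {
	TraceID    string  `json:"trace_id"`
	Service    string  `json:"service"`
	Route      string  `json:"route"`
	Status     int     `json:"status"`
	DurationMS float64 `json:"duration_ms"`
}

// toLog renders the event as a single JSON log line.
func toLog(e Event) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

// countByRoute aggregates events into a per-route request count (a metric).
func countByRoute(events []Event) map[string]int {
	counts := make(map[string]int)
	for _, e := range events {
		counts[e.Route]++
	}
	return counts
}

// byTrace groups the events that belong to the same request (a trace view).
func byTrace(events []Event) map[string][]Event {
	traces := make(map[string][]Event)
	for _, e := range events {
		traces[e.TraceID] = append(traces[e.TraceID], e)
	}
	return traces
}

func main() {
	events := []Event{
		{TraceID: "t1", Service: "web", Route: "/checkout", Status: 200, DurationMS: 48},
		{TraceID: "t1", Service: "cart", Route: "/cart", Status: 200, DurationMS: 12.5},
		{TraceID: "t2", Service: "web", Route: "/checkout", Status: 500, DurationMS: 503},
	}
	for _, e := range events {
		line, err := toLog(e)
		if err != nil {
			fmt.Println("marshal failed:", err)
			continue
		}
		fmt.Println(line)
	}
	fmt.Println("requests per route:", countByRoute(events))
	fmt.Println("events in trace t1:", len(byTrace(events)["t1"]))
}
```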

Slide 13

Slide 13 text

Metrics concepts in a nutshell
● Gauges
  ○ Instantaneous point-in-time value (e.g. CPU utilization).
● Cumulative counters
  ○ Cumulative sums of data since process start (e.g. request counts).
● Cumulative histogram
  ○ Grouped counters for a range of buckets (e.g. 0-10ms, 11-20ms).
● Rates
  ○ The derivative of a counter, typically (e.g. requests per second).
● Aggregation by tags
  ○ Data can be joined along shared tags (e.g. hostname, cluster name).
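The first four concepts above can be sketched in plain Go as follows. This is an illustrative toy rather than the API of any metrics library: the type and function names are invented, it is not safe for concurrent use, and the histogram keeps one count per bucket accumulated since process start.

```go
package main

import "fmt"

// Counter is a cumulative sum since process start; it only goes up.
type Counter struct{ total float64 }

func (c *Counter) Add(n float64) { c.total += n }

// Gauge holds an instantaneous point-in-time value.
type Gauge struct{ value float64 }

func (g *Gauge) Set(v float64) { g.value = v }

// Histogram counts observations per bucket. bounds are inclusive upper
// limits in ascending order; counts has one extra slot for overflow.
type Histogram struct {
	bounds []float64
	counts []uint64
}

func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe increments the first bucket whose upper limit fits v,
// or the overflow bucket when v exceeds every limit.
func (h *Histogram) Observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
			return
		}
	}
	h.counts[len(h.bounds)]++
}

// Rate derives a per-second rate from two readings of a cumulative
// counter taken elapsedSeconds apart.
func Rate(prev, curr, elapsedSeconds float64) float64 {
	return (curr - prev) / elapsedSeconds
}

func main() {
	var requests Counter
	var cpu Gauge
	latency := NewHistogram(10, 20) // buckets in ms: <=10, <=20, >20

	cpu.Set(0.42)
	for _, ms := range []float64{3, 12, 18, 250} {
		requests.Add(1)
		latency.Observe(ms)
	}
	fmt.Println("requests:", requests.total)
	fmt.Println("cpu utilization:", cpu.value)
	fmt.Println("latency buckets:", latency.counts)
	fmt.Println("requests/second:", Rate(100, 160, 30))
}
```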

Slide 14

Slide 14 text

Tracing concepts in a nutshell
● Span
  ○ Represents a single unit of work in a system.
  ○ Typically encapsulates: operation name, a start and finish timestamp, the parent span identifier, the span identifier, and context items.
● Trace
  ○ Defined implicitly by its spans. A trace can be thought of as a directed acyclic graph of spans where the edges between spans are defined as parent/child relationships.
● DistributedContext
  ○ Contains the tracing identifiers, tags, and options that are propagated from parent to child spans.
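A simplified sketch of these concepts, assuming a flat list of spans that share a trace ID: the parent/child edges are recovered from each span's parent identifier and the trace is printed as an indented tree. The Span struct and helper names are hypothetical and much smaller than what a real tracing SDK records.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Span is a hypothetical, simplified record of one unit of work.
type Span struct {
	TraceID  string // shared by every span in the same trace
	SpanID   string // identifies this span
	ParentID string // empty for the root span
	Name     string // operation name
	Start    time.Time
	End      time.Time
	Attrs    map[string]string // context items
}

// children indexes spans by parent ID, making explicit the parent/child
// edges that implicitly define the trace.
func children(spans []Span) map[string][]Span {
	byParent := make(map[string][]Span)
	for _, s := range spans {
		byParent[s.ParentID] = append(byParent[s.ParentID], s)
	}
	return byParent
}

// render walks the trace depth-first starting at parentID and returns
// one indented line per span with its duration.
func render(byParent map[string][]Span, parentID string, depth int) []string {
	var lines []string
	for _, s := range byParent[parentID] {
		indent := strings.Repeat("  ", depth)
		lines = append(lines, fmt.Sprintf("%s%s (%s)", indent, s.Name, s.End.Sub(s.Start)))
		lines = append(lines, render(byParent, s.SpanID, depth+1)...)
	}
	return lines
}

// exampleTrace returns made-up spans for one request.
func exampleTrace() []Span {
	t0 := time.Date(2021, time.June, 23, 10, 0, 0, 0, time.UTC)
	ms := func(n int) time.Time { return t0.Add(time.Duration(n) * time.Millisecond) }
	return []Span{
		{TraceID: "t1", SpanID: "a", Name: "GET /checkout", Start: ms(0), End: ms(50)},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "SELECT orders", Start: ms(5), End: ms(25)},
		{TraceID: "t1", SpanID: "c", ParentID: "a", Name: "POST /payments", Start: ms(26), End: ms(48)},
	}
}

func main() {
	// The root span has an empty ParentID, so rendering starts from "".
	for _, line := range render(children(exampleTrace()), "", 0) {
		fmt.Println(line)
	}
}
```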

Slide 15

Slide 15 text

Source: https://sgryphon.wordpress.com/2020/11/16/a-guide-to-w3c-trace-context/
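The W3C Trace Context format referenced above carries the tracing identifiers between services in a traceparent HTTP header of the form version-traceid-parentid-flags. The sketch below parses only version 00 of that header; the TraceParent type and parseTraceParent function are illustrative names, not part of any library.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the fields of a version-00 traceparent header.
type TraceParent struct {
	TraceID  string // 32 lowercase hex characters
	ParentID string // 16 lowercase hex characters
	Sampled  bool   // lowest bit of the trace-flags field
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent is a hypothetical parser for version 00 only.
func parseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: want 4 dash-separated fields")
	}
	version, traceID, parentID, flags := parts[0], parts[1], parts[2], parts[3]
	if version != "00" {
		return TraceParent{}, errors.New("traceparent: unsupported version")
	}
	if !isLowerHex(traceID, 32) || !isLowerHex(parentID, 16) || !isLowerHex(flags, 2) {
		return TraceParent{}, errors.New("traceparent: malformed field")
	}
	// All-zero identifiers are defined as invalid.
	if strings.Trim(traceID, "0") == "" || strings.Trim(parentID, "0") == "" {
		return TraceParent{}, errors.New("traceparent: all-zero identifier")
	}
	bits, err := strconv.ParseUint(flags, 16, 8)
	if err != nil {
		return TraceParent{}, err
	}
	return TraceParent{TraceID: traceID, ParentID: parentID, Sampled: bits&1 == 1}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("trace=%s parent=%s sampled=%t\n", tp.TraceID, tp.ParentID, tp.Sampled)
}
```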

Slide 16

Slide 16 text

Add more context to traces with Span Events
● Span Events are context-aware logging.
● An event contains timestamped information added to a span. You can think of this as a structured log, or a way to annotate your spans with specific details about what happened along the way.
  ○ Contains:
    ■ the name of the event
    ■ one or more attributes
    ■ a timestamp
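The shape of a span event can be sketched with two small structs. The types below are hypothetical stand-ins written for this example; the OpenTelemetry Go API offers a comparable AddEvent method on its Span interface, with its own option types for attributes and timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// Event is timestamped, structured information attached to a span.
type Event struct {
	Name  string
	Time  time.Time
	Attrs map[string]string
}

// Span is a hypothetical, stripped-down span that only collects events.
type Span struct {
	Name   string
	Events []Event
}

// AddEvent records what happened, when it happened, and any details.
func (s *Span) AddEvent(name string, attrs map[string]string) {
	s.Events = append(s.Events, Event{Name: name, Time: time.Now(), Attrs: attrs})
}

func main() {
	span := &Span{Name: "checkout"}
	span.AddEvent("cache.miss", map[string]string{"key": "cart:42"})
	span.AddEvent("retry", map[string]string{"attempt": "2"})

	// Events stay attached to the span that was active when they occurred,
	// which is what makes them context-aware compared with a plain log line.
	for _, e := range span.Events {
		fmt.Println(span.Name, e.Time.Format(time.RFC3339Nano), e.Name, e.Attrs)
	}
}
```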

Slide 17

Slide 17 text

Implementation

Slide 18

Slide 18 text

OpenCensus + OpenTracing = OpenTelemetry
● OpenTracing:
  ○ Provides APIs and instrumentation for distributed tracing.
● OpenCensus:
  ○ Provides APIs and instrumentation that allow you to collect application metrics and distributed tracing.
● OpenTelemetry:
  ○ An effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries.

Slide 19

Slide 19 text

Source: https://opentelemetry.io/docs/

Slide 20

Slide 20 text

OpenTelemetry Components
● Proto
● Specification (API, SDK, Data)
● Collector
● Instrumentation Libraries

Slide 21

Slide 21 text

Demo Repository https://github.com/sakti/o11ygo

Slide 22

Slide 22 text

We are hiring https://sampingan.company/career

Slide 23

Slide 23 text

Backend Engineer Wanted!
Sampingan is currently seeking a Backend Engineer.
What you will do:
● Translating business requirements into scalable technical solutions;
● Producing high-quality, maintainable code, testing it, and reviewing it collaboratively to ensure efficiency;
● Pairing with team members on functional and nonfunctional requirements, spreading design philosophy and goals, and improving code quality across the team;
● Participating in preparing system requirements, specifications, and design;
● Ensuring maintainability of core app assets and artifacts;
● Researching new tools, learning and experimenting with new languages and technologies, and growing continuously with us;
● Continuously refactoring applications and architectures to maintain high quality levels, with experience in troubleshooting server performance: memory issues, GC tuning, and resource leaks.
If you are interested, send your CV to [email protected]. Don't forget to include the role name and your name in the subject line.
Sampingan @sampingan.business & @sampingan.id
Follow us on www.sampingan.co.id

Slide 24

Slide 24 text

Q&A