Slide 1

Slide 1 text

Enable effective observability with Python
Ernesto Arbitrio

Slide 2

Slide 2 text

🤓 Who am I
Senior Software Engineer at Crunch.io / YouGov.
Passionate about Python.
Vice President of Python Italia.
I do love cooking and eating 🍷 🍝 🥩

Slide 3

Slide 3 text


Slide 4

Slide 4 text

What is observability?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Looking inside the system

Slide 7

Slide 7 text

Why?

Slide 8

Slide 8 text

The 3 pillars of observability
-Logs
-Metrics
-Traces

Slide 9

Slide 9 text

Everything is based on events
-Logging: recording events
-Metrics: data combined from measuring events
-Tracing: recording events with causal ordering
Credit: codahale

Slide 10

Slide 10 text

Everything is based on events
-Log: single events (response time)
-Metric: aggregated data (response time)
-Trace: detailed tree (response time)
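A minimal sketch of the slide's point, using only the standard library: one kind of event (a request's response time) is shown as a log, as a metric, and as a trace. The durations, the `Span` class, and the child-span breakdown are hypothetical illustration data, not taken from the talk.

```python
from __future__ import annotations

import logging
import statistics
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("store")

# Hypothetical response times (ms) for GET /store/; 493 is the request from the slides.
durations_ms = [120, 95, 130, 493, 110]

# Log: one record per event, readable (and greppable) on its own.
for ms in durations_ms:
    logger.info("GET /store/ 200 %dms", ms)

# Metric: the same events reduced to a few aggregate numbers.
metric = {
    "count": len(durations_ms),
    "mean_ms": statistics.mean(durations_ms),
    "max_ms": max(durations_ms),
}


# Trace: a single event expanded into a tree of timed operations.
@dataclass
class Span:
    name: str
    duration_ms: int
    children: list[Span] = field(default_factory=list)


slow_request = Span("GET /store/", 493, [
    Span("catalog", 300, [Span("stock", 250)]),  # hypothetical breakdown
    Span("auth", 150),                            # hypothetical breakdown
])
```

The log answers "what happened to this request", the metric answers "what is typical", and only the tree says where the 493 ms went.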

Slide 11

Slide 11 text

Logs show response time
10.100.5.3 - - [23/Feb/2018:10:27:30 +0530] GET /store/ HTTP/1.1 200 6406 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36 493ms
This request took 493ms!
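Because each log line is one self-contained event, pulling the response time out of it is just text parsing. A minimal sketch, assuming the line follows the usual combined log format (quoted request, referrer, and user-agent fields, which the slide text may have lost in extraction) with the response time appended as `<n>ms`. `LINE_RE` and `parse` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed format: combined log format + trailing "<n>ms" response time.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "[^"]*" (?P<ms>\d+)ms$'
)


def parse(line):
    """Return the interesting fields of one access-log line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "time": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "response_ms": int(m["ms"]),
    }


line = (
    '10.100.5.3 - - [23/Feb/2018:10:27:30 +0530] "GET /store/ HTTP/1.1" 200 6406 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/64.0.3282.167 Safari/537.36" 493ms'
)
record = parse(line)
print(record["path"], record["status"], record["response_ms"])
```

This tells you one request took 493 ms. It does not tell you whether that is slow, which is the question the next slide raises.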

Slide 12

Slide 12 text

Metrics show response time
Is 493ms slow? How fast were most requests at 10:27 AM?
[Chart: response times by time of day (12:00 AM to 10:00 PM) and day (SUN, WED, FRI), from FASTEST to SLOWEST]
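To answer "is 493ms slow?", you compare one event against the aggregate for the same time window. A minimal sketch with the standard `statistics` module. The sample window and the `percentile_rank` helper are hypothetical. A real metrics backend would usually keep histograms rather than raw samples.

```python
from __future__ import annotations

import statistics

# Hypothetical response times (ms) for requests served between 10:00 and 11:00 AM.
window_ms = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125,
             130, 140, 150, 160, 180, 200, 240, 300, 493, 900]


def percentile_rank(samples: list[float], value: float) -> float:
    """Percentage of samples strictly faster than `value`."""
    below = sum(1 for s in samples if s < value)
    return 100 * below / len(samples)


median = statistics.median(window_ms)
# quantiles(n=100) returns the 99 cut points p1..p99; index 94 is p95.
p95 = statistics.quantiles(window_ms, n=100)[94]

print(f"median={median}ms p95={p95:.0f}ms")
print(f"493ms is slower than {percentile_rank(window_ms, 493):.0f}% of requests")
```

With this made-up data, most requests finish well under 493 ms, so that request is in the slow tail. The metric still cannot say why.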

Slide 13

Slide 13 text

Traces show response time
What caused the request to take ~493ms?

Slide 14

Slide 14 text

Thoughts
-Log: easy to “grep” and to read manually
-Metric: identifies trends
-Trace: identifies causes across services

Slide 15

Slide 15 text

Distributed tracing

Slide 16

Slide 16 text

Distributed tracing
-Span ID
-Parent Span ID
-Start time
-End time
-Additional context (metadata)
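The five fields above are enough to rebuild the call tree, provided each new span knows which span is currently active. The following toy model is not the OpenTelemetry SDK: it tracks the active span in a `contextvars.ContextVar` so that nested `with` blocks record their parent's ID. `Span`, `start_span`, and `finished` are hypothetical names. OpenTelemetry's `tracer.start_as_current_span` fills the same role, with random IDs and a shared trace ID.

```python
from __future__ import annotations

import contextvars
import itertools
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Span:
    span_id: int
    parent_span_id: Optional[int]  # None for the root span
    name: str
    start: float
    end: Optional[float] = None
    attributes: dict[str, Any] = field(default_factory=dict)  # additional context


_next_id = itertools.count(1)
_current_span = contextvars.ContextVar("current_span", default=None)
finished: list[Span] = []


@contextmanager
def start_span(name: str, **attributes: Any) -> Iterator[Span]:
    """Open a span whose parent is whatever span is currently active."""
    parent = _current_span.get()
    span = Span(
        span_id=next(_next_id),
        parent_span_id=parent.span_id if parent is not None else None,
        name=name,
        start=time.monotonic(),
        attributes=attributes,
    )
    token = _current_span.set(span)
    try:
        yield span
    finally:
        span.end = time.monotonic()
        _current_span.reset(token)
        finished.append(span)


# Nesting mirrors the call structure: each inner span records its parent's ID.
with start_span("Store FE", path="/store/"):
    with start_span("Catalog"):
        with start_span("Stock"):
            pass
    with start_span("Auth"):
        pass
```

Spans land in `finished` in the order they end (innermost first), so an exporter sees children before parents. That is why backends reassemble the tree from the parent IDs and do not rely on arrival order.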

Slide 17

Slide 17 text

Distributed tracing
-Store FE: ID 1, PID none, start 8:30, end 8:50
  -Catalog: ID 2, PID 1, start 8:32, end 8:40
    -Stock: ID 3, PID 2, start 8:34, end 8:36
    -Stock: ID 4, PID 2, start 8:35, end 8:37
    -Stock: ID 5, PID 2, start 8:36, end 8:38
  -Auth: ID 6, PID 1, start 8:40, end 8:47
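A trace backend can rebuild the tree above from nothing but span ID, parent ID, and timestamps. This is a small sketch of that reconstruction, using the slide's six spans. The start and end values are treated as H:MM clock times, as they appear on the slide, and `Span`, `minutes`, and `render` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]
    name: str
    start: str  # "H:MM", as on the slide
    end: str


def minutes(hhmm: str) -> int:
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


SPANS = [
    Span(1, None, "Store FE", "8:30", "8:50"),
    Span(2, 1, "Catalog", "8:32", "8:40"),
    Span(3, 2, "Stock", "8:34", "8:36"),
    Span(4, 2, "Stock", "8:35", "8:37"),
    Span(5, 2, "Stock", "8:36", "8:38"),
    Span(6, 1, "Auth", "8:40", "8:47"),
]


def render(spans):
    """Indented tree, children ordered by start time, with each span's duration."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines = []

    def walk(parent_id, depth):
        for s in sorted(children.get(parent_id, []), key=lambda c: minutes(c.start)):
            duration = minutes(s.end) - minutes(s.start)
            lines.append(f"{'  ' * depth}{s.name} (ID {s.span_id}): {duration} min")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


for line in render(SPANS):
    print(line)
```

Reading the output makes the slide's story explicit. The three Stock calls overlap (they run concurrently inside Catalog), and Catalog plus Auth account for 15 of Store FE's 20 units.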

Slide 18

Slide 18 text

Sampling
Do you really need all of this data? Or would the right sampling be sufficient?
-Traces with errors
-Traces with high latency
-Traces with specific attributes
-Traces that finish with no issues: a statistically representative sample of all OK traces
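The policy above can be written as one decision function applied to each completed trace. It must run after the trace finishes, because errors and latency are only known then. A minimal sketch: the `Trace` class, the 1000 ms threshold, the 10% rate, and the `customer.tier` attribute rule are all hypothetical. OK traces are sampled by hashing the trace ID, so every component that sees the same trace makes the same keep/drop decision.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Trace:
    trace_id: str
    duration_ms: float
    has_error: bool = False
    attributes: dict = field(default_factory=dict)


LATENCY_THRESHOLD_MS = 1000                     # hypothetical "high latency" cut-off
OK_SAMPLE_RATE = 0.1                            # hypothetical: keep ~10% of healthy traces
INTERESTING_ATTRIBUTES = {"customer.tier": "premium"}  # hypothetical attribute rule


def trace_id_fraction(trace_id: str) -> float:
    """Map a trace ID to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def should_keep(t: Trace) -> bool:
    if t.has_error:
        return True
    if t.duration_ms >= LATENCY_THRESHOLD_MS:
        return True
    if any(t.attributes.get(k) == v for k, v in INTERESTING_ATTRIBUTES.items()):
        return True
    # Healthy trace: keep a representative, deterministic fraction.
    return trace_id_fraction(t.trace_id) < OK_SAMPLE_RATE


kept = sum(should_keep(Trace(f"t{i}", duration_ms=50)) for i in range(10_000))
print(f"kept {kept} of 10000 healthy traces")
```

In OpenTelemetry deployments, rules like these are typically configured in the Collector, not in application code, because the Collector can see whole traces.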

Slide 19

Slide 19 text

Sampling techniques

Slide 20

Slide 20 text

How to? OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.

Slide 21

Slide 21 text

A bit of context

Slide 22

Slide 22 text

Reduce the number of tools

Slide 23

Slide 23 text

Using the OpenTelemetry (OTel) Collector

Slide 24

Slide 24 text

Architecture of the OTel Collector

Slide 25

Slide 25 text

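Example
import os

import requests
from flask import Flask, render_template
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

trace.set_tracer_provider(
    TracerProvider(resource=Resource.create({SERVICE_NAME: "ecommerce-web-service"}))
)

app = Flask(__name__)
# Auto-instrument incoming Flask requests and outgoing requests calls.
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()

# Send spans in batches to the Jaeger agent (UDP, port 6831).
jaeger_exporter = JaegerExporter(agent_host_name="jaeger", agent_port=6831)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(jaeger_exporter))

products_api_url = os.environ.get("PRODUCTS_API_URL")
tracer = trace.get_tracer(__name__)


@app.get("/")
def index():
    with tracer.start_as_current_span("/ GET"):
        with tracer.start_as_current_span("/ products"):
            r = requests.get(products_api_url)
            items = r.json()["items"]
        return render_template("index.html", items=items)


if __name__ == "__main__":
    # DEBUG defaults to on; DEBUG=0 or false turns it off.
    debug = os.environ.get("DEBUG", "true").lower() in ("1", "true", "yes")
    app.run(debug=debug, host="0.0.0.0", port=5001)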

Slide 26

Slide 26 text

Example (same code as Slide 25), annotated: Manually sent to backend. The application exports its spans itself, through JaegerExporter and BatchSpanProcessor, directly to the Jaeger agent.

Slide 27

Slide 27 text

DEMO 
 🤞

Slide 28

Slide 28 text

Thanks! github.com/ernestoarbitrio