Slide 1

Slide 1 text

The Importance of Observability for Kafka-based applications with Zipkin

Slide 2

Slide 2 text

Jorge Quilcate-Otoya
@jeqo89
github.com/jeqo
github.com/sysco-middleware
Middleware team at SYSCO AS, focused on data integration and distributed tracing

Slide 3

Slide 3 text

SYSCO AS
Middleware department: Integration and Data Engineering
We are hiring!
Partners: [partner logos not captured]
github.com/sysco-middleware
sysco.no/

Slide 4

Slide 4 text

Agenda
- Event-Driven Applications and Kafka
- Observability and Distributed Tracing
- Simulating Observability tools

Slide 5

Slide 5 text

Apache Kafka
“Apache Kafka® is a distributed streaming platform.”

Slide 6

Slide 6 text

Event-Driven Applications and Kafka
(Photo: Amazonas River)

Slide 7

Slide 7 text

Event-Driven Architectural Style https://docs.microsoft.com/en-us/azure/architecture/guide/architecture-styles/event-driven

Slide 8

Slide 8 text

Service Collaboration and Dataflow
[Diagram: Orchestration, four services (Svc) coordinated directly; Choreography, four services (Svc) connected through an Event Bus]
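To make the two collaboration styles concrete, here is a minimal Python sketch. The service names, event types, and the in-memory `EventBus` are hypothetical. With orchestration the call order lives in one place; with choreography it only emerges from which service reacts to which event, which is part of why tracing becomes useful in event-driven systems.

```python
from collections import defaultdict
from typing import Callable


def orchestrate(order_id: str, log: list) -> None:
    """Orchestration: one coordinator calls each (hypothetical) service in order."""
    for step in ("reserve_stock", "charge_payment", "ship_order"):
        log.append(f"orchestrator -> {step}({order_id})")


class EventBus:
    """Choreography: services only publish and subscribe to events."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[str], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, order_id: str) -> None:
        for handler in self._subscribers[event_type]:
            handler(order_id)


def wire_choreography(bus: EventBus, log: list) -> None:
    """Each hypothetical service reacts to one event and emits the next one."""

    def stock(order_id: str) -> None:
        log.append(f"stock handles OrderPlaced({order_id})")
        bus.publish("StockReserved", order_id)

    def payment(order_id: str) -> None:
        log.append(f"payment handles StockReserved({order_id})")
        bus.publish("PaymentCharged", order_id)

    def shipping(order_id: str) -> None:
        log.append(f"shipping handles PaymentCharged({order_id})")

    bus.subscribe("OrderPlaced", stock)
    bus.subscribe("StockReserved", payment)
    bus.subscribe("PaymentCharged", shipping)


orch_log: list = []
orchestrate("o-1", orch_log)

bus = EventBus()
choreo_log: list = []
wire_choreography(bus, choreo_log)
bus.publish("OrderPlaced", "o-1")  # no coordinator: the flow emerges from subscriptions
print(choreo_log)
```

Both runs produce the same three steps, but only the orchestrated version states the sequence explicitly; in the choreographed one, you need something like a trace to see it.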

Slide 9

Slide 9 text

Kafka Ecosystem
https://www.slideshare.net/ConfluentInc/etl-is-dead-long-live-streams

Slide 10

Slide 10 text

Observability and Distributed Tracing
(Photo: Titicaca Lake)

Slide 11

Slide 11 text

What is Observability? “In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” - Wikipedia

Slide 12

Slide 12 text

Observability is for *Unknown Unknowns* https://twitter.com/mipsytipsy/status/963956028940234752

Slide 13

Slide 13 text

Observability methods

Slide 14

Slide 14 text

Observability methods

Slide 15

Slide 15 text

Distributed Tracing Concepts
- Span = execution of a task
- Trace = tree of spans
- Context Propagation = passing the trace context between distributed components (e.g. HTTP headers, Kafka record headers)
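The span and trace definitions above can be illustrated with a small data structure. This is a minimal sketch, not Brave's or Zipkin's model: the `Span` class, `new_trace`, and `render_tree` are hypothetical names. Ids are 16 hex characters, similar in shape to Zipkin's 64-bit ids. A trace is simply every span that shares a `trace_id`, and the tree is rebuilt from `parent_id` links.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work; spans sharing a trace_id form one trace."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str) -> "Span":
        # A child reuses the trace id and points back to its parent span.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        self.end = time.time()


def new_trace(name: str) -> Span:
    # The root span starts a new trace and has no parent.
    return Span(name=name, trace_id=uuid.uuid4().hex[:16])


def render_tree(spans: list) -> list:
    """Rebuild the tree from parent ids, indenting children under parents."""
    children: dict = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines: list = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in children.get(parent_id, []):
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


root = new_trace("handle-request")
send_span = root.child("send-event")
process_span = send_span.child("process-event")
for s in (process_span, send_span, root):
    s.finish()
print("\n".join(render_tree([root, send_span, process_span])))
```

Running it prints the three span names as an indented tree, with `handle-request` at the root.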

Slide 16

Slide 16 text

Demo Lab 01: Hello world to Distributed Tracing
● Tracing concepts
● Brave instrumentation
https://github.com/jeqo/talk-kafka-zipkin#lab-1-hello-world-distributed-tracing

Slide 17

Slide 17 text

Adoption approaches
Annotation-based
- Part of your code
- Instrument libraries first
- Add custom spans on demand
- Check benchmarks
Black-box

Slide 18

Slide 18 text

How does it work?
[Diagram: Svc 0 and Svc 1 each run a tracer; both tracers report to the Collector of the Tracing System, which stores spans in the Tracing DB]
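The flow in this diagram can be sketched in a few lines of Python. Everything here is hypothetical and in-memory: a `Tracer` per service buffers finished spans and reports them in batches, and a `Collector` indexes them by trace id in a dict that stands in for the tracing database. Real Zipkin deployments send spans over a transport such as HTTP or Kafka; the batch size and field names below are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FinishedSpan:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str


class Collector:
    """Receives span batches from many tracers and indexes them by trace id."""

    def __init__(self) -> None:
        self.storage = defaultdict(list)  # stands in for the Tracing DB

    def accept(self, batch: list) -> None:
        for span in batch:
            self.storage[span.trace_id].append(span)


class Tracer:
    """Per-service tracer: buffers finished spans and reports them in batches."""

    def __init__(self, service: str, collector: Collector, batch_size: int = 2) -> None:
        self.service = service
        self.collector = collector
        self.batch_size = batch_size
        self._buffer: list = []

    def record(self, trace_id: str, span_id: str, parent_id: Optional[str], name: str) -> None:
        self._buffer.append(FinishedSpan(trace_id, span_id, parent_id, self.service, name))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the collector and start a new buffer.
        if self._buffer:
            self.collector.accept(self._buffer)
            self._buffer = []


collector = Collector()
svc0 = Tracer("svc-0", collector)
svc1 = Tracer("svc-1", collector)
svc0.record("t1", "a", None, "GET /orders")
svc1.record("t1", "b", "a", "process order")
svc0.record("t1", "c", "a", "send event")  # svc-0 buffer reaches 2 spans and flushes
svc1.flush()                               # report whatever svc-1 still holds
print(sorted(s.service for s in collector.storage["t1"]))  # ['svc-0', 'svc-0', 'svc-1']
```

Reporting in batches, off the request path, is one way to keep the tracer's overhead in each service low; the collector is the only component that needs to see spans from every service.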

Slide 19

Slide 19 text

Zipkin Architecture

Slide 20

Slide 20 text

Demo Lab 02: Tracing Kafka-based applications
● Kafka clients and Kafka Streams instrumentation
● Kafka interceptors for Kafka Connect connectors
https://github.com/jeqo/talk-kafka-zipkin#lab-02-twitter-kafka-based-application
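The core of Kafka client instrumentation is context propagation through record headers. The following sketch simulates that without a broker. `Record`, `inject`, `extract`, and the in-memory `topic` are hypothetical stand-ins. The header names follow Zipkin's B3 multi-header style. Whether a given library writes these headers or a different encoding is not assumed here. Kafka header values are bytes, so the context is encoded on the producer side and decoded on the consumer side.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field

# B3-style header names (assumed wire format for this sketch).
TRACE_ID, SPAN_ID, PARENT_ID = "X-B3-TraceId", "X-B3-SpanId", "X-B3-ParentSpanId"


def new_id() -> str:
    return uuid.uuid4().hex[:16]


@dataclass
class Record:
    """Stand-in for a Kafka record: a value plus byte-valued headers."""
    value: str
    headers: list = field(default_factory=list)


def inject(ctx: dict, record: Record) -> None:
    # Producer side: write the span context into the record headers.
    for key, value in ctx.items():
        record.headers.append((key, value.encode()))


def extract(record: Record) -> dict:
    # Consumer side: read the span context back from the headers.
    return {key: value.decode() for key, value in record.headers}


topic = deque()  # hypothetical in-memory "topic"

# Producer: start a trace with a send span and inject its context.
producer_ctx = {TRACE_ID: new_id(), SPAN_ID: new_id()}
rec = Record("tweet-123")
inject(producer_ctx, rec)
topic.append(rec)

# Consumer: continue the same trace with a child span.
received = topic.popleft()
parent = extract(received)
consumer_ctx = {TRACE_ID: parent[TRACE_ID], SPAN_ID: new_id(), PARENT_ID: parent[SPAN_ID]}
print(consumer_ctx[TRACE_ID] == producer_ctx[TRACE_ID])  # True
```

Because the context travels inside the record, producer and consumer spans join one trace even though the two sides never call each other directly.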

Slide 21

Slide 21 text

Adoption approaches
Annotation-based
- Part of your code
- Instrument libraries first
- Add custom spans on demand
- Check benchmarks
Black-box
- Agent-based model
- Framework/Protocol support
- Machine impact
- Promising approach: Service Mesh/Sidecar Proxy
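In the annotation-based approach, "add custom spans on demand" means wrapping only the pieces of your own code that library instrumentation does not already cover. The sketch below shows one way to do that in Python with a decorator. `traced` and the module-level `finished_spans` list are hypothetical, not part of Brave or any Zipkin library. The span records the call's duration and any exception, then re-raises it so the application behaves as before.

```python
import functools
import time
from typing import Any, Callable

finished_spans: list = []  # stands in for a reporter (hypothetical)


def traced(name: str) -> Callable:
    """Hypothetical decorator that wraps each call of a function in a custom span."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)  # tag the span with the failure, then re-raise
                raise
            finally:
                finished_spans.append({
                    "name": name,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "error": error,
                })
        return wrapper
    return decorator


@traced("enrich-tweet")
def enrich(text: str) -> str:
    return text.upper()


enrich("hello")
print(finished_spans[0]["name"], finished_spans[0]["error"])  # enrich-tweet None
```

Keeping custom spans this thin matters for the "check benchmarks" point: every wrapped call pays for a timer and an append, so it is worth measuring before wrapping hot paths.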

Slide 22

Slide 22 text

Service Meshes and Zipkin

Slide 23

Slide 23 text

#QOTD https://twitter.com/rakyll/status/971231712049971200

Slide 24

Slide 24 text

Simulating Observability tools
(Photo: Lima, Chorrillos)

Slide 25

Slide 25 text

“SimianViz/Spigo”: Simulate Protocol Interactions in Go (github.com/adrianco/spigo)
➔ Model your architecture
➔ Simulate interactions
➔ Generate traces
➔ Visualize your system’s traffic with Vizceral
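The model, simulate, generate-traces loop can be sketched in a few lines. This is not Spigo, which is written in Go and uses its own architecture format. It is a hypothetical Python illustration of the same idea. `ARCHITECTURE` is a made-up service graph, and the span dicts use simplified keys (`traceId`, `id`, `parentId`, `service`) loosely inspired by Zipkin's field names. Latencies are random, not measured.

```python
import random

# Hypothetical architecture model: each service lists the services it calls.
ARCHITECTURE = {
    "edge": ["tweets-api"],
    "tweets-api": ["kafka", "users-db"],
    "kafka": ["indexer"],
    "users-db": [],
    "indexer": [],
}


def simulate_request(root: str, rng: random.Random) -> list:
    """Walk the model from `root`, emitting one span (a dict) per service visit."""
    trace_id = "%016x" % rng.getrandbits(64)
    spans: list = []

    def visit(service: str, parent_id) -> None:
        span_id = "%016x" % rng.getrandbits(64)
        spans.append({
            "traceId": trace_id,
            "id": span_id,
            "parentId": parent_id,
            "service": service,
            "durationMs": rng.randint(1, 50),  # made-up latency, not a measurement
        })
        for callee in ARCHITECTURE[service]:
            visit(callee, span_id)

    visit(root, None)
    return spans


rng = random.Random(42)  # fixed seed so every run generates the same traces
traces = [simulate_request("edge", rng) for _ in range(3)]
print(len(traces), len(traces[0]))  # 3 5
```

Traces generated this way can be fed to the same tooling as real ones, which makes it possible to try out dashboards and models before the system exists.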

Slide 26

Slide 26 text

"Monitoring Microservices: A Challenge" - Adrian Cockcroft

Slide 27

Slide 27 text

Models from Traces, e.g. Vizceral https://www.youtube.com/watch?v=jWpI8qzqNHk
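One model that falls out of tracing data is a service dependency graph with call counts, the kind of traffic view Vizceral draws. Zipkin's UI also offers a dependencies view. The sketch below assumes simplified span dicts with `traceId`, `id`, `parentId`, and `service` keys; these are hypothetical and only loosely follow Zipkin's field names. It counts one link per parent-to-child span pair that crosses a service boundary.

```python
from collections import Counter


def dependency_links(spans: list) -> Counter:
    """Count parent->child service calls across all spans (a simple traffic model)."""
    by_id = {(s["traceId"], s["id"]): s for s in spans}
    links: Counter = Counter()
    for s in spans:
        parent = by_id.get((s["traceId"], s["parentId"]))
        # Root spans have no parent; calls within one service are not dependencies.
        if parent is not None and parent["service"] != s["service"]:
            links[(parent["service"], s["service"])] += 1
    return links


spans = [
    {"traceId": "t1", "id": "1", "parentId": None, "service": "edge"},
    {"traceId": "t1", "id": "2", "parentId": "1", "service": "tweets-api"},
    {"traceId": "t1", "id": "3", "parentId": "2", "service": "indexer"},
    {"traceId": "t2", "id": "1", "parentId": None, "service": "edge"},
    {"traceId": "t2", "id": "2", "parentId": "1", "service": "tweets-api"},
]
print(dict(dependency_links(spans)))
# {('edge', 'tweets-api'): 2, ('tweets-api', 'indexer'): 1}
```

Keying spans by `(traceId, id)` keeps span ids from different traces from colliding, since ids only need to be unique within a trace in this sketch.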

Slide 28

Slide 28 text

Demo Lab 03: Spigo and Vizceral
● Spigo for simulating architecture behavior
● Zipkin for tracing and Vizceral for traffic monitoring
https://github.com/jeqo/talk-kafka-zipkin#lab-3-spigo-simulation

Slide 29

Slide 29 text

Takeaways
➔ If you are building distributed systems, with Kafka or not, consider distributed tracing.
➔ Instrument libraries first, not your code.
➔ Experiment by simulating your deployment.
➔ How many models can you build from tracing data?

Slide 30

Slide 30 text

References
Papers:
- Dapper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf
- Canopy: http://cs.brown.edu/~jcmace/papers/kaldor2017canopy.pdf
- Automating Failure Testing Research at Internet Scale: https://people.ucsc.edu/~palvaro/socc16.pdf
Posts:
- Logging v. Instrumentation: https://peter.bourgon.org/blog/2016/02/07/logging-v-instrumentation.html
- Monitoring and Observability: https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c
- Monitoring in the Time of Cloud Native: https://medium.com/@copyconstruct/monitoring-in-the-time-of-cloud-native-c87c7a5bfa3e
Tools:
- Zipkin: https://zipkin.io/
- Brave: https://github.com/openzipkin/brave
- Kafka Interceptors: https://github.com/sysco-middleware/kafka-interceptors
- Spigo: https://github.com/adrianco/spigo
- Vizceral: https://github.com/Netflix/vizceral

Slide 31

Slide 31 text

Thanks! Q&A
github.com/jeqo/talk-kafka-zipkin
github.com/sysco-middleware
(Photo: Machu Picchu)