Slide 1

Slide 1 text

Making sense of event-driven systems
Distributed tracing for Apache Kafka®-based applications with Zipkin
@jeqo89 | #codeone

Slide 2

Slide 2 text

“Complexity symptoms:
- Change amplification
- Cognitive load
- Unknown unknowns”
John Ousterhout, “A Philosophy of Software Design”

Slide 3

Slide 3 text

“Complexity is caused by two things: dependencies and obscurity”
John Ousterhout, “A Philosophy of Software Design”

Slide 4

Slide 4 text

SERVER CLIENT SERVER

Slide 5

Slide 5 text

CLIENT SERVER CLIENT SERVER

Slide 6

Slide 6 text

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Complexity happens…

Slide 15

Slide 15 text

twitter.com/rakyll/status/971231712049971200

Slide 16

Slide 16 text

Jorge Esteban Quilcate Otoya
twitter: @jeqo89 | github: jeqo
Peruvian in Oslo, Norway
Integration team at SYSCO AS
Unclogging data pipelines by day
Contributing to Apache Kafka and OpenZipkin communities by night

Slide 17

Slide 17 text

Talk: “Making sense of your event-driven systems”
37 min | Q&A
- Why?
- What’s distributed tracing?
- How to instrument Kafka apps?
- Demo
- What’s next?
Demo time depth sync/async causal relation

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Traces in Zipkin

Slide 20

Slide 20 text

“Demystifying” Kafka client configurations
- Kafka producers
- Kafka Streams
- Kafka Consumer
github.com/jeqo/tracing-kafka-apps

Slide 21

Slide 21 text

github.com/jeqo/tracing-kafka-apps
Blocking call

kafkaProducer.send(record, (metadata, exception) -> {
  //...
}).get(); // Synchronous send

Slide 22

Slide 22 text

github.com/jeqo/tracing-kafka-apps
Non-blocking call

kafkaProducer.send(record, (metadata, exception) -> {
  //...
}); // Async send

Slide 23

Slide 23 text

github.com/jeqo/tracing-kafka-apps
Batched send

var producerConfig = new Properties();
// ...
producerConfig.put(ProducerConfig.LINGER_MS_CONFIG, 1_000);
producerConfig.put(ProducerConfig.BATCH_SIZE_CONFIG, 100_000);

Slide 24

Slide 24 text

github.com/jeqo/tracing-kafka-apps
enable.auto.commit=true

config = new Properties();
//...
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);

Slide 25

Slide 25 text

github.com/jeqo/tracing-kafka-apps
Commit per record

var records = consumer.poll(TIMEOUT);
records.forEach(record -> {
  doSomething(record);
  consumer.commitSync(Map.of(new TopicPartition(...), new OffsetAndMetadata(...)));
});

Slide 26

Slide 26 text

Record

Slide 27

Slide 27 text

Annotation-based
[Diagram labels: CLIENT, SERVER, TraceContext=abc, tracer, tracer, TRACES]

Slide 28

Slide 28 text

[Diagram labels: PRODUCER, CONSUMER, tracer, tracer, TraceContext=abc, BROKER, TraceContext=abc, TRACES]
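
The diagram suggests that the trace context travels with the record through the broker. A minimal plain-Java sketch of that idea follows. It assumes a map standing in for record headers and a made-up header name (`trace-context`). Neither is the Brave or Kafka API; the real instrumentation appears on later slides.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for record headers; every name here is hypothetical. */
final class HeaderPropagationSketch {
  static final String HEADER = "trace-context"; // assumed header name, for illustration only

  /** Producer side: write the trace context next to the payload. */
  static void inject(String traceContext, Map<String, byte[]> recordHeaders) {
    recordHeaders.put(HEADER, traceContext.getBytes(StandardCharsets.UTF_8));
  }

  /** Consumer side: read the context back, or null if the record was not traced. */
  static String extract(Map<String, byte[]> recordHeaders) {
    byte[] value = recordHeaders.get(HEADER);
    return value == null ? null : new String(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // The headers map plays the role of what the broker stores with the record.
    Map<String, byte[]> headers = new HashMap<>();
    inject("abc", headers);
    System.out.println("consumer continues trace: " + extract(headers));
  }
}
```

Because the context is stored with the record, the consumer can continue the same trace even when it reads the record long after it was produced.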

Slide 29

Slide 29 text

/** Annotation-based approach **/
ScopedSpan span = tracer.startScopedSpan("process");
try { // The span is in "scope"
  doProcess();
} catch (RuntimeException | Error e) {
  span.error(e); // mark as error
  throw e;
} finally {
  span.finish(); // always finish
}

Slide 32

Slide 32 text

github.com/openzipkin/b3-propagation
B3 Propagation
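
As a rough illustration of what B3 carries, the sketch below builds both encodings described in the b3-propagation repository: the multi-header form (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`) and the single `b3` header. The class `B3Headers` and its methods are hypothetical helpers written for this example. Real applications would normally let Brave do this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Brave: builds B3 headers by hand for illustration. */
final class B3Headers {

  /** Multi-header encoding: one header per trace-context field. */
  static Map<String, String> multi(String traceId, String spanId, String parentSpanId, boolean sampled) {
    Map<String, String> headers = new LinkedHashMap<>();
    headers.put("X-B3-TraceId", traceId);
    headers.put("X-B3-SpanId", spanId);
    if (parentSpanId != null) headers.put("X-B3-ParentSpanId", parentSpanId);
    headers.put("X-B3-Sampled", sampled ? "1" : "0");
    return headers;
  }

  /** Single-header encoding: {TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}. */
  static String single(String traceId, String spanId, String parentSpanId, boolean sampled) {
    String value = traceId + "-" + spanId + "-" + (sampled ? "1" : "0");
    return parentSpanId == null ? value : value + "-" + parentSpanId;
  }

  public static void main(String[] args) {
    String traceId = "80f198ee56343ba864fe8b2a57d3eff7";
    String spanId = "e457b5a2e4d86bd1";
    String parentId = "05e3ac9a4f6e3b90";
    System.out.println(multi(traceId, spanId, parentId, true));
    System.out.println("b3: " + single(traceId, spanId, parentId, true));
  }
}
```

The single-header form is compact, which suits message headers where every record pays for the extra bytes.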

Slide 34

Slide 34 text

Black-box
[Diagram labels: CLIENT, SERVER, TraceContext=abc, agent, agent, TRACES]
linkerd.io/2019/08/09/service-mesh-distributed-tracing-myths/

Slide 35

Slide 35 text

linkerd.io/2019/08/09/service-mesh-distributed-tracing-myths/

Slide 37

Slide 37 text

Mixed
[Diagram labels: CLIENT, SERVER, TraceContext=abc, agent, agent, TRACES, tracer, tracer]
linkerd.io/2019/08/09/service-mesh-distributed-tracing-myths/

Slide 38

Slide 38 text

“The more accurately you try to measure the position of a particle, the less accurately you can measure its speed”
Heisenberg's uncertainty principle

Slide 39

Slide 39 text

[Diagram labels: Record, services, Collect]

Slide 40

Slide 40 text

[Diagram labels: Record, Collect, Store, BringYourOwnDB, services]

Slide 41

Slide 41 text

[Diagram labels: Record, Collect, Store, Dependencies (batch), services]

Slide 42

Slide 42 text

Streaming Messaging Kafka Clients REST Proxy KSQL Kafka Source Connector Kafka Streams Kafka Sink Connector

Slide 43

Slide 43 text

/** Instrumentation for Kafka Clients **/
Producer producer = new KafkaProducer<>(settings);
Producer tracedProducer = kafkaTracing.producer(producer);

producer.send(
  new ProducerRecord<>(
    "my-topic", key, value
  ));

Slide 44

Slide 44 text

/** Instrumentation for Kafka Clients **/
Producer producer = new KafkaProducer<>(settings);
Producer tracedProducer = kafkaTracing.producer(producer); // wrap

tracedProducer.send(
  new ProducerRecord<>(
    "my-topic", key, value
  ));

Slide 45

Slide 45 text

/** Instrumentation for Kafka Clients **/
Consumer consumer = new KafkaConsumer<>(settings);
Consumer tracedConsumer = kafkaTracing.consumer(consumer);

while (running) {
  var records = consumer.poll(1000);
  records.forEach(this::process);
}

Slide 46

Slide 46 text

/** Instrumentation for Kafka Clients **/
Consumer consumer = new KafkaConsumer<>(settings);
Consumer tracedConsumer = kafkaTracing.consumer(consumer); // wrap

while (running) {
  var records = tracedConsumer.poll(1000);
  records.forEach(this::process);
}

Slide 47

Slide 47 text

/** Instrumentation for Kafka Clients **/
void process(ConsumerRecord record) {
  // extract span from record headers
  Span span = kafkaTracing.nextSpan(record)
    .name("process")
    .start();
  try (var ws = tracer.withSpanInScope(span)) {
    doProcess(record);
  } catch (RuntimeException | Error e) {
    span.error(e);
    throw e;
  } finally {
    span.finish();
  }
}

Slide 48

Slide 48 text

/** Instrumentation for Kafka Streams **/
var b = new StreamsBuilder();
b.stream("input-topic")
  .map(this::prepare)
  .join(table, this::tableJoiner)
  .transformValues(this::transform)
  .to("output-topic");
var topology = b.build();
KafkaStreams kafkaStreams = new KafkaStreams(topology, config);
kafkaStreams.start();

Slide 49

Slide 49 text

/** Instrumentation for Kafka Streams **/
var b = new StreamsBuilder();
b.stream("input-topic")
  .map(this::prepare)
  .join(table, this::tableJoiner)
  .transformValues(this::transform)
  .to("output-topic");
var topology = b.build();
KafkaStreams kafkaStreams = // wrap
  ksTracing.kafkaStreams(topology, config);
kafkaStreams.start();

Slide 50

Slide 50 text

/** Instrumentation for Kafka Streams **/
var b = new StreamsBuilder();
b.stream("input-topic")
  .map(this::prepare)
  .join(table, this::tableJoiner)
  .transformValues(this::transform)
  .to("output-topic");
var topology = b.build();

Slide 51

Slide 51 text

/** Instrumentation for Kafka Streams **/
var b = new StreamsBuilder();
b.stream("input-topic")
  .transform(ksTracing.map("preparing", () -> this::prepare))
  .join(table, this::tableJoiner)
  .transformValues(ksTracing.transformValues(
    "transforming", () -> this::transform))
  .to("output-topic");

Slide 52

Slide 52 text

REST Proxy KSQL Kafka Source Connector Kafka Sink Connector Kafka Interceptors

Slide 53

Slide 53 text

@jeqo89 at #kafkasummit Demo: Tracing Kafka-based applications github.com/jeqo/talk-kafka-zipkin

Slide 54

Slide 54 text

https://www.confluent.io/blog/importance-of-distributed-tracing-for-apache-kafka-based-applications

Slide 55

Slide 55 text

Distributed Tracing is an Event Streaming problem
[Diagram labels: Record, Collect, Store, Dependencies (batch), Data-at-rest, services]

Slide 56

Slide 56 text

Netflix pipeline

Slide 58

Slide 58 text

Distributed Tracing is a Stream Processing Problem
[Diagram labels: Span Collected, partitioned-spans, Traces Store]
github.com/openzipkin-contrib/zipkin-storage-kafka

Slide 59

Slide 59 text

Scatter-gather/scalable back-end
kafka.apache.org/documentation/streams/developer-guide/interactive-queries.html#adding-an-rpc-layer-to-your-application

Slide 60

Slide 60 text

Distributed Tracing is a Stream Processing Problem
[Diagram labels: Span Collected, Trace Aggregation, partitioned-spans, traces-completed, Traces Store]
github.com/openzipkin-contrib/zipkin-storage-kafka

Slide 61

Slide 61 text

Session windows and suppression
kafka.apache.org/documentation/streams/developer-guide/dsl-api.html#session-windows
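
To make the session-window idea concrete, here is a plain-Java sketch under simplifying assumptions. It takes the timestamps of one trace's spans, already sorted, and starts a new session whenever the gap to the previous span exceeds an inactivity gap. The class and method names are made up for this example. It only imitates the grouping semantics and is not how Kafka Streams implements session windows.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of session windowing; not the Kafka Streams implementation. */
final class SessionWindowSketch {

  /** Splits sorted timestamps into sessions: a gap larger than inactivityGapMs starts a new one. */
  static List<List<Long>> sessions(List<Long> sortedTimestampsMs, long inactivityGapMs) {
    List<List<Long>> result = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    for (long ts : sortedTimestampsMs) {
      boolean gapExceeded =
          !current.isEmpty() && ts - current.get(current.size() - 1) > inactivityGapMs;
      if (gapExceeded) {
        result.add(current); // the previous session is closed
        current = new ArrayList<>();
      }
      current.add(ts);
    }
    if (!current.isEmpty()) result.add(current);
    return result;
  }

  public static void main(String[] args) {
    // Three spans arrive close together, then two more after a long pause.
    System.out.println(sessions(List.of(0L, 100L, 250L, 5_000L, 5_050L), 1_000L));
  }
}
```

Suppression corresponds to emitting a session only once it is known to be closed. That is why the sketch returns whole sessions instead of partial ones.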

Slide 63

Slide 63 text

var b = new StreamsBuilder();
b.stream(spansTopic, ...).groupByKey()
  // how long to wait for another span
  .windowedBy(SessionWindows.with(timeout)...)
  .aggregate(ArrayList::new, aggregateSpans(), joinAggregates(), ...)
  // hold until a new record tells that a window is closed and we can process it further
  .suppress(untilWindowCloses(unbounded()))
  .toStream()

Slide 64

Slide 64 text

Distributed Tracing is a Stream Processing Problem
[Diagram labels: Span Collected, Trace Aggregation, partitioned-spans, traces-completed, dependencies, Traces Store]
github.com/openzipkin-contrib/zipkin-storage-kafka

Slide 65

Slide 65 text

var b = new StreamsBuilder();
b.stream(spansTopic, ...).groupByKey()
  ... // session windows and suppression
  .toStream() // traceStream
  .flatMapValues(spansToDependencyLinks())
  .selectKey((key, value) -> linkKey(value))
  .to(dependencyTopic, ...);
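
The slide does not show what `spansToDependencyLinks()` does. As a guess at the kind of logic involved, the sketch below derives "parent service -> child service" links from the spans of one completed trace. The `Span` class here is a minimal, hypothetical model with only three fields. It is not Zipkin's span type or the zipkin-storage-kafka implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: derives service-to-service links from one completed trace. */
final class DependencyLinkSketch {

  /** Minimal, hypothetical span model; real Zipkin spans carry many more fields. */
  static final class Span {
    final String id;
    final String parentId; // null for the root span
    final String service;

    Span(String id, String parentId, String service) {
      this.id = id;
      this.parentId = parentId;
      this.service = service;
    }
  }

  /** Counts calls per "parent->child" service pair. */
  static Map<String, Integer> links(List<Span> trace) {
    Map<String, String> serviceBySpanId = new HashMap<>();
    for (Span s : trace) serviceBySpanId.put(s.id, s.service);

    Map<String, Integer> counts = new TreeMap<>();
    for (Span s : trace) {
      String parentService = s.parentId == null ? null : serviceBySpanId.get(s.parentId);
      // Skip root spans, spans whose parent is missing, and calls within one service.
      if (parentService == null || parentService.equals(s.service)) continue;
      counts.merge(parentService + "->" + s.service, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Span> trace = List.of(
        new Span("a", null, "frontend"),
        new Span("b", "a", "backend"),
        new Span("c", "b", "backend"),
        new Span("d", "a", "worker"));
    System.out.println(links(trace));
  }
}
```

Keying each link by its service pair, as `selectKey` does on the slide, would let a downstream step sum the counts per pair.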

Slide 66

Slide 66 text

Distributed Tracing is a Stream Processing Problem
[Diagram labels: Span Collected, Trace Aggregation, Traces Store, partitioned-spans, traces-completed, Dependencies Store, dependencies]
github.com/openzipkin-contrib/zipkin-storage-kafka

Slide 67

Slide 67 text

Tumbling window aggregates
kafka.apache.org/documentation/streams/developer-guide/dsl-api.html#session-windows
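
A tumbling window assigns each event to exactly one fixed-size, non-overlapping bucket. The sketch below shows that arithmetic in plain Java, assuming non-negative epoch-millisecond timestamps. The class and method names are invented for illustration and are not Kafka Streams API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of tumbling-window bucketing; not the Kafka Streams API. */
final class TumblingWindowSketch {

  /** Start of the fixed-size window containing the timestamp (assumes timestampMs >= 0). */
  static long windowStart(long timestampMs, long windowSizeMs) {
    return timestampMs - (timestampMs % windowSizeMs);
  }

  /** Counts events per window, keyed by window start. */
  static Map<Long, Integer> countPerWindow(List<Long> timestampsMs, long windowSizeMs) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long ts : timestampsMs) {
      counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // One-minute windows over four dependency-link events.
    System.out.println(countPerWindow(List.of(0L, 59_999L, 60_000L, 125_000L), 60_000L));
  }
}
```

Unlike a session window, the bucket boundaries here do not depend on the data, so per-window counts can be updated as each event arrives.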

Slide 68

Slide 68 text

Distributed Tracing is a Stream Processing Problem
[Diagram labels: Span Collected, Trace Aggregation, Traces Store, partitioned-spans, traces-completed, Dependencies Store, Custom processors, dependencies]
github.com/openzipkin-contrib/zipkin-storage-kafka

Slide 69

Slide 69 text

Distributed Tracing is a Stream Processing Problem
[Diagram labels: partitioned-spans, traces-completed, error-traces, dependencies, Long-term Error Trace Store, path-aggregate]
github.com/jeqo/zipkin-storage-kafka-experiments
github.com/openzipkin/openzipkin.github.io/wiki/2018-07-18-Aggregation-and-Analysis-at-Netflix

Slide 70

Slide 70 text

Haystack: tracing and analysis platform
[Diagram labels: Tracing, Trends and Metrics, Anomaly Detection, Remediation, Alerting, services]
expediadotcom.github.io/haystack/

Slide 71

Slide 71 text

expediadotcom.github.io/haystack/
Haystack: tracing and analysis platform

Slide 72

Slide 72 text

expediadotcom.github.io/haystack/
Haystack

Slide 73

Slide 73 text

Wrapping up:
- Consider Distributed Tracing as a better data source to deal with system collaboration complexity
- A Distributed Tracing pipeline is an Event Streaming pipeline
- Focus on aggregation, model extraction and signals (!)

Slide 74

Slide 74 text

Resources
* Demos, source code: github.com/jeqo/tracing-kafka-apps, github.com/jeqo/talk-kafka-zipkin
* Blog post: confluent.io/blog/importance-of-distributed-tracing-for-apache-kafka-based-applications
* Zipkin Kafka Backend: github.com/openzipkin-contrib/zipkin-storage-kafka
* Kafka Interceptor for Zipkin: github.com/sysco-middleware/kafka-interceptor-zipkin
* Sites using Zipkin: github.com/openzipkin/openzipkin.github.io/wiki/Sites
* Haystack: github.com/ExpediaDotCom/haystack, gitter.im/expedia-haystack
* Martin Kleppmann et al. 2019. Online Event Processing. https://dl.acm.org/citation.cfm?id=3321612
* John Ousterhout. A Philosophy of Software Design. amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201
* Jonathan Kaldor et al. 2017. Canopy: An End-to-End Performance Tracing And Analysis System. SOSP ’17 (2017). doi.org/10.1145/3132747.3132749
* Peter Alvaro et al. 2016. Automating Failure Testing Research at Internet Scale. SoCC ’16. dx.doi.org/10.1145/2987550.2987555
* Mark Burgess. 2019. From Observability to Significance in Distributed Information Systems. https://arxiv.org/abs/1907.05636

Slide 75

Slide 75 text

github.com/openzipkin/openzipkin.github.io/wiki/Sites

Slide 76

Slide 76 text

fin
github.com/jeqo/talk-kafka-zipkin
gitter.im/openzipkin/zipkin
github.com/openzipkin