github: jeqo
Peruvian in Oslo, Norway
Integration team at SYSCO AS
Unclogging Data pipelines by day
Contributing to Apache Kafka and OpenZipkin communities by night
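V> consumer = new KafkaConsumer<>(settings);
// wrap the consumer so that polled records are traced
Consumer<K, V> tracedConsumer = kafkaTracing.consumer(consumer);
while (running) {
  // poll through the traced wrapper, not the raw consumer
  var records = tracedConsumer.poll(Duration.ofMillis(1000));
  records.forEach(this::process);
}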
b = new StreamsBuilder();
b.stream("input-topic")
  .map(this::prepare)
  .join(table, this::tableJoiner)
  .transformValues(this::transform)
  .to("output-topic");
var topology = b.build();
// how long to wait for another span
.windowedBy(SessionWindows.with(timeout)...)
.aggregate(ArrayList::new, aggregateSpans(), joinAggregates(), ...)
// hold until a new record tells that a window is closed and we can process it further
.suppress(untilWindowCloses(unbounded()))
.toStream()
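The session-window idea above can be tried without a Kafka cluster. What follows is a minimal batch sketch in plain Java, not the Kafka Streams implementation: spans are grouped by trace ID, and a span starts a new session when it arrives more than `gapMs` after the previous span of the same trace. The `Span` class, the `sessions` method and the `gapMs` parameter are hypothetical names chosen for illustration. Because the whole input is available up front, every session is already closed when it is returned, which loosely stands in for the suppress step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SessionWindowSketch {

    // Hypothetical minimal span: only the fields this sketch needs.
    static final class Span {
        final String traceId;
        final String name;
        final long timestampMs;

        Span(String traceId, String name, long timestampMs) {
            this.traceId = traceId;
            this.name = name;
            this.timestampMs = timestampMs;
        }
    }

    // Groups spans by trace ID, then splits each group into sessions.
    // A span joins the current session if it is at most gapMs later than
    // the previous span of the same trace; otherwise it opens a new session.
    static List<List<Span>> sessions(List<Span> spans, long gapMs) {
        Map<String, List<Span>> byTrace = new LinkedHashMap<>();
        for (Span span : spans) {
            byTrace.computeIfAbsent(span.traceId, k -> new ArrayList<>()).add(span);
        }

        List<List<Span>> result = new ArrayList<>();
        for (List<Span> group : byTrace.values()) {
            group.sort(Comparator.comparingLong((Span s) -> s.timestampMs));
            List<Span> current = new ArrayList<>();
            for (Span span : group) {
                boolean gapExceeded = !current.isEmpty()
                        && span.timestampMs - current.get(current.size() - 1).timestampMs > gapMs;
                if (gapExceeded) {
                    result.add(current);          // session closed: emit it
                    current = new ArrayList<>();  // and start a new one
                }
                current.add(span);
            }
            result.add(current);                  // emit the last session of the trace
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span("a", "get", 0L),
                new Span("a", "query", 50L),
                new Span("a", "late", 500L));
        for (List<Span> session : sessions(spans, 100L)) {
            System.out.println(session.get(0).traceId + ": " + session.size() + " span(s)");
        }
    }
}
```

In a real stream the input never ends, so a session can only be declared closed once newer records show that the gap has passed. That is why the pipeline above holds results back before emitting them.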
data source to deal with system collaboration complexity
Distributed Tracing pipeline is an Event Streaming pipeline
Focus on aggregation, model extraction and signals (!)
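One possible reading of "model extraction" is deriving a service dependency graph from completed traces. The sketch below is an assumption about what such a step could look like, not code from the talk or from Zipkin: it counts one call for every span whose parent span belongs to a different service. The `Span` class, its fields and the `links` method are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DependencyLinkSketch {

    // Hypothetical minimal span: ID, parent ID (null for a root span) and owning service.
    static final class Span {
        final String id;
        final String parentId;
        final String service;

        Span(String id, String parentId, String service) {
            this.id = id;
            this.parentId = parentId;
            this.service = service;
        }
    }

    // Returns call counts keyed by "caller -> callee" for one trace.
    // Spans whose parent is missing or lives in the same service add no link.
    static Map<String, Integer> links(List<Span> trace) {
        Map<String, String> serviceBySpanId = new HashMap<>();
        for (Span span : trace) {
            serviceBySpanId.put(span.id, span.service);
        }

        Map<String, Integer> calls = new TreeMap<>();
        for (Span span : trace) {
            String parentService =
                    span.parentId == null ? null : serviceBySpanId.get(span.parentId);
            if (parentService != null && !parentService.equals(span.service)) {
                calls.merge(parentService + " -> " + span.service, 1, Integer::sum);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Span> trace = List.of(
                new Span("1", null, "gateway"),
                new Span("2", "1", "orders"),
                new Span("3", "2", "billing"));
        links(trace).forEach((link, count) -> System.out.println(link + " x" + count));
    }
}
```

Feeding each closed trace through a step like this turns raw spans into a smaller model (who calls whom, and how often) that is easier to query and alert on than the spans themselves.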
Resources
* Blog post: confluent.io/blog/importance-of-distributed-tracing-for-apache-kafka-based-applications
* Zipkin Kafka Backend: github.com/openzipkin-contrib/zipkin-storage-kafka
* Kafka Interceptor for Zipkin: github.com/sysco-middleware/kafka-interceptor-zipkin
* Sites using Zipkin: github.com/openzipkin/openzipkin.github.io/wiki/Sites
* Haystack: github.com/ExpediaDotCom/haystack, gitter.im/expedia-haystack
* Martin Kleppmann et al. 2019. Online Event Processing. https://dl.acm.org/citation.cfm?id=3321612
* John Ousterhout. A Philosophy of Software Design. amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201
* Jonathan Kaldor et al. 2017. Canopy: An End-to-End Performance Tracing And Analysis System. SOSP ’17 (2017). doi.org/10.1145/3132747.3132749
* Peter Alvaro et al. 2016. Automating Failure Testing Research at Internet Scale. SoCC ’16. dx.doi.org/10.1145/2987550.2987555
* Mark Burgess. 2019. From Observability to Significance in Distributed Information Systems. https://arxiv.org/abs/1907.05636