How Plumbr uses Kafka

Nikita Salnikov-Tarnovski

February 04, 2018
Transcript

  1. What is Kafka
     • Distributed streaming platform
     • It lets you publish and subscribe to streams of records
     • It lets you store streams of records in a fault-tolerant way
  2. What is Kafka
     • Kafka runs as a cluster on one or more servers
     • The Kafka cluster stores streams of records in categories called topics
     • Each record consists of a key, a value, and a timestamp
  3. Brokers
     • Several brokers form a cluster
     • Coordinated with ZooKeeper
     • All partitions are distributed among the brokers
  4. Producers
     • The producer sends a record to a topic
     • Based on the record's key, a partition is chosen
     • The leader broker for that partition is found
     • The producer waits for the requested acks
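The key-to-partition step can be sketched in plain Java. Kafka's actual default partitioner hashes the serialized key bytes with murmur2; the sketch below substitutes `String.hashCode()` purely to show the property that matters: the same key always maps to the same partition, which is what gives per-key ordering.

```java
// Simplified sketch of Kafka's key-based partition choice.
// The real DefaultPartitioner uses murmur2 over the serialized key
// bytes; String.hashCode() stands in here just to illustrate that
// a fixed key deterministically lands on one partition.
public class PartitionSketch {
    static int choosePartition(String key, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = choosePartition("user-42", 6);
        int p2 = choosePartition("user-42", 6);
        System.out.println(p1 == p2);          // same key -> same partition
        System.out.println(p1 >= 0 && p1 < 6); // always a valid index
    }
}
```

Records without a key skip this step; with no key the producer spreads records across partitions instead.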
  5. Fast writes
     • Brokers "cheat" and don't write synchronously to disk
     • They write to the OS disk cache (page cache)
     • And let the OS take care of flushing to disk
  6. Replication
     • Each topic can be replicated among brokers
     • So for each partition there are X copies
     • Follower brokers simply consume messages from the leader
  7. Kafka Connect
     • Off-the-shelf solution to pipe data to or from Kafka
     • E.g. a DB, Elasticsearch, files, etc.
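As a concrete example, the FileStreamSource connector that ships with Kafka pipes lines of a file into a topic with nothing but configuration; the connector name, file path, and topic below are illustrative:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "file-lines"
  }
}
```

Posting this JSON to a Connect worker's REST API is enough to start the pipe; sink connectors (e.g. for Elasticsearch or a DB) are configured the same way.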
  8. Kafka Streams
     • DSL and platform for writing data processing streams
     • If you squint enough, very similar to Java 8 streams and the Fork-Join pool
     • But across multiple JVMs and servers
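The Java 8 streams analogy can be made concrete: the word count below is plain `java.util.stream` with no Kafka involved, yet the Kafka Streams DSL version of the same pipeline reads almost identically (`flatMapValues`, `groupBy`, `count`) — the difference is that the DSL partitions the work across JVMs and keeps the counts in fault-tolerant state stores.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Word count in plain Java 8 streams: split lines into words,
// group by word, count. The Kafka Streams DSL expresses the same
// shape, but over an unbounded, partitioned stream of records.
public class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(Arrays.asList("hello kafka", "hello streams"));
        System.out.println(counts.get("hello")); // 2
    }
}
```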
  9. Kafka cluster
     • 5 brokers
     • 2x replication
     • 20 TB of data for the last 90 days
     • Inflow ~125 GB per day
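These numbers hang together as a quick sanity check (assuming the slide's "G" and "T" mean GB and TB): ~125 GB/day over 90 days is ~11 TB of raw data, or ~22 TB once every byte is stored twice for 2x replication — the same ballpark as the quoted 20 TB.

```java
// Back-of-the-envelope check of the cluster sizing on this slide.
public class SizingCheck {
    // Raw data retained: daily inflow times the retention window.
    static double rawTb(double inflowGbPerDay, int retentionDays) {
        return inflowGbPerDay * retentionDays / 1000.0;
    }

    // Disk actually used: every byte is kept replicationFactor times.
    static double storedTb(double rawTb, int replicationFactor) {
        return rawTb * replicationFactor;
    }

    public static void main(String[] args) {
        double raw = rawTb(125, 90);      // 11.25 TB raw
        double stored = storedTb(raw, 2); // 22.5 TB on disk
        System.out.printf("raw=%.2f TB, stored=%.2f TB%n", raw, stored);
    }
}
```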
  10. Spring Cloud Stream
      • Greatly simplifies development of Kafka-based apps
      • A couple of annotations and the data flows :)
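A minimal sketch of what that "couple of annotations" looked like in the Spring Cloud Stream of this era: `@EnableBinding` wires the app to a channel and `@StreamListener` handles each record (this annotation model was later deprecated in favor of functional bindings). The class name and payload type are illustrative, and running it requires a Spring Boot app with the Kafka binder on the classpath and a broker to connect to:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

// "A couple of annotations and the data flows": bind to the default
// input channel and handle each incoming record with one method.
@SpringBootApplication
@EnableBinding(Sink.class)
public class ConsumerApp {

    @StreamListener(Sink.INPUT)
    public void handle(String payload) {
        System.out.println("Received: " + payload);
    }

    public static void main(String[] args) {
        SpringApplication.run(ConsumerApp.class, args);
    }
}
```

The topic to consume from is pure configuration, e.g. `spring.cloud.stream.bindings.input.destination=my-topic` in `application.properties` (topic name illustrative).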
  11. Solving performance problems is hard. We don’t think it needs to be.
      @JavaPlumbr/@iNikem http://plumbr.eu