Slide 1

Slide 1 text

Intro to Kafka

Slide 2

Slide 2 text

What is Kafka?
- Distributed event streaming platform
- Publish-subscribe model
- Implemented as persistent and durable append-only log(s)
- Dumb broker, smart consumer (unlike traditional message queues)
- Highly performant and scalable (if it’s good enough for LinkedIn, it’s gonna be just fine for you)

Slide 3

Slide 3 text

A message queue or persistent storage?
- Messages are not removed from the log once consumed
- How long messages are kept is controlled by the retention policy
- Messages are stored in topics, which can be partitioned
- Messages are identified by “offsets” within a given topic/partition
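
The offset and retention ideas above can be illustrated with a small in-memory model. This is only a sketch in plain Python, not Kafka's API: `PartitionLog`, `apply_retention`, and the record-count limit are hypothetical names made up for this example (real Kafka retention is configured by time or size on the broker).

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one topic partition: an append-only list of records."""
    base_offset: int = 0                      # offset of the oldest retained record
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return the offset assigned to it."""
        self.records.append(value)
        return self.base_offset + len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return (offset, value) pairs from `offset` on; reading deletes nothing."""
        start = max(offset, self.base_offset) - self.base_offset
        chunk = self.records[start:start + max_records]
        return [(self.base_offset + start + i, v) for i, v in enumerate(chunk)]

    def apply_retention(self, max_records):
        """Stand-in for a retention policy: drop the oldest records over the limit."""
        excess = len(self.records) - max_records
        if excess > 0:
            del self.records[:excess]
            self.base_offset += excess


log = PartitionLog()
for event in ["signup", "login", "purchase", "logout"]:
    log.append(event)

# Two independent consumers each keep their own position in the same log.
first_read = log.read(0)
second_read = log.read(2)

# Only retention removes data; surviving records keep their original offsets.
log.apply_retention(max_records=2)
after_retention = log.read(0)
```

Reading from offset 0 twice would return the same records both times; only `apply_retention` changes what is available, and the surviving records keep their offsets (2 and 3 here).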

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Practical considerations
- Semantics: at-most-once delivery vs. at-least-once delivery
- Exactly-once delivery?
- Log compaction
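
The difference between the two delivery semantics comes down to whether the consumer commits its offset before or after processing a message. The simulation below is a minimal sketch in plain Python; `consume`, `run_with_restart`, and the `crash_at` parameter are hypothetical helpers for illustration and do not correspond to any Kafka client call.

```python
def consume(log, committed, commit_first, crash_at=None):
    """Process records from `committed` onward; return (new_committed, processed).

    commit_first=True  -> commit the offset, then process (at-most-once)
    commit_first=False -> process, then commit the offset (at-least-once)
    crash_at           -> offset at which a crash hits between the two steps
    """
    processed = []
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return committed, processed   # crashed before processing
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                return committed, processed   # crashed before committing
            committed = offset + 1
    return committed, processed


def run_with_restart(log, commit_first):
    """Run once with a crash at offset 1, then restart from the committed offset."""
    committed, first = consume(log, 0, commit_first, crash_at=1)
    committed, second = consume(log, committed, commit_first)
    return first + second


log = ["a", "b", "c"]
at_most_once = run_with_restart(log, commit_first=True)    # "b" is lost
at_least_once = run_with_restart(log, commit_first=False)  # "b" is seen twice
```

With at-least-once delivery, duplicates are possible, so handlers are usually written to be idempotent.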

Slide 6

Slide 6 text

Practical considerations
- Partitions are the unit of parallelism (within a consumer group, each partition is read by at most one consumer)
- More partitions - more throughput
- More partitions - potentially more problems as well :( (increased unavailability, latency)
- Message ordering is preserved only within a given partition of a topic. This is critical if you care about causality
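
A short sketch of why keys matter for ordering, and how partitions are split across a consumer group. Everything here is an illustrative assumption: `partition_for` uses CRC32 as a stand-in for the client's real partitioner, and `assign` is a simple round-robin, not Kafka's actual assignment protocol.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32, for illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Round-robin partitions over consumers: each partition gets exactly one owner."""
    owners = defaultdict(list)
    for i, partition in enumerate(partitions):
        owners[consumers[i % len(consumers)]].append(partition)
    return dict(owners)


NUM_PARTITIONS = 3
events = [("user-1", "created"), ("user-2", "created"),
          ("user-1", "updated"), ("user-2", "deleted"), ("user-1", "deleted")]

# Route every event to a partition based on its key.
partitions = defaultdict(list)
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Per-key order survives because a key always maps to the same partition.
user1_partition = partitions[partition_for("user-1", NUM_PARTITIONS)]
user1_events = [v for k, v in user1_partition if k == "user-1"]

# Two consumers in one group share three partitions; no partition has two owners.
assignment = assign(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"])
```

Events for different keys may land in different partitions and be processed in any relative order, so causally related events should share a key.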

Slide 7

Slide 7 text

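Kafka vs. RabbitMQ
- Two different things!
- Different systems for different purposes (neither is better than the other)
- RabbitMQ is a smart broker/dumb consumer type of system
- No replayable log in classic Rabbit queues: messages are removed once consumed and acknowledged (durable queues survive a broker restart, but they are not long-term storage)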

Slide 8

Slide 8 text

Kafka killer features summary
- Performance
- Durability
- Ability to replay events (even indefinitely, durability is for real)
- Log compaction
- Heart of event-driven ecosystems (Kafka Streams, Apache Samza/Spark/Flink, oh my!)
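
Log compaction and replay fit together: compaction keeps the latest record per key, so replaying a compacted log still rebuilds the same current state. The sketch below is a simplified model in plain Python; `compact` and `replay` are hypothetical helpers, and unlike real Kafka (which retains delete markers for a configurable period) it drops tombstones immediately.

```python
def compact(records):
    """Keep only the latest record per key, preserving original offsets and order.

    `records` is a list of (offset, key, value); value None is a tombstone (delete).
    """
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)    # later records overwrite earlier ones
    survivors = sorted(latest.values())       # offsets are unique, so this is offset order
    return [r for r in survivors if r[2] is not None]


def replay(records):
    """Rebuild current state by replaying the log from the beginning."""
    state = {}
    for _, key, value in records:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


log = [(0, "user-1", "alice@old.example"),
       (1, "user-2", "bob@example.com"),
       (2, "user-1", "alice@new.example"),
       (3, "user-3", "carol@example.com"),
       (4, "user-3", None)]

compacted = compact(log)

# Replaying the full log and the compacted log yields the same state.
state_from_full = replay(log)
state_from_compacted = replay(compacted)
```

The compacted log is shorter, but the offsets of the surviving records are unchanged, so consumers that stored an offset can still resume from it.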

Slide 9

Slide 9 text

Kafka use cases
- Pub-Sub system for (micro)services architecture
- Activity Tracking
- Metrics and Analytics
- Stream processing
- Event Sourcing
- Cross-database replication (if you know what you are doing)

Slide 10

Slide 10 text

Thanks!