What is Kafka?
- Distributed event streaming platform
- Publish-subscribe model
- Implemented as persistent and durable append-only log(s)
- Dumb broker, smart consumer (the opposite of traditional message queues)
- Highly performant and scalable (if it’s good enough for LinkedIn, it’s gonna be just fine for you)
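
To make "dumb broker, smart consumer" concrete, here is a minimal in-memory sketch. It is not Kafka's API: the `Broker` and `Consumer` classes and their methods are hypothetical names made up for illustration. The point is only that the broker appends and serves reads by offset, while each consumer decides where it reads from.

```python
class Broker:
    """Hypothetical 'dumb' broker: it only appends records and serves reads by offset."""

    def __init__(self):
        self._log = []  # append-only list; the list index plays the role of the offset

    def append(self, record):
        self._log.append(record)
        return len(self._log) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # The broker does not track who is reading or what they have already seen.
        return self._log[offset:offset + max_records]


class Consumer:
    """Hypothetical 'smart' consumer: it owns its position in the log."""

    def __init__(self, broker, start_offset=0):
        self.broker = broker
        self.offset = start_offset

    def poll(self, max_records=10):
        records = self.broker.read(self.offset, max_records)
        self.offset += len(records)  # advance our own position
        return records


broker = Broker()
for event in ["signup", "login", "purchase"]:
    broker.append(event)

fast = Consumer(broker)
slow = Consumer(broker)

fast_seen = fast.poll()               # reads all three records
slow_seen = slow.poll(max_records=1)  # reads only the first record

# Rewinding is just resetting the consumer-side offset.
fast.offset = 0
replayed = fast.poll()
```

In real Kafka, committed offsets are stored on the cluster in an internal topic, but the consumer still chooses which offset to fetch from, which is what makes independent readers and replay possible.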

A message queue or persistent storage?
- Messages are not removed from the log once consumed
- How long messages are kept is controlled by the retention policy
- Messages are stored in topics, which can be partitioned
- Messages are identified by “offsets” within a given topic/partition
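
The interaction between offsets and retention can be sketched with a toy single-partition log. `RetainedLog` and its methods are hypothetical names, and the explicit `now` argument exists only to keep the example deterministic. Real Kafka deletes whole log segments rather than individual records, and retention can be time-based or size-based (the `retention.ms` and `retention.bytes` topic settings).

```python
import time
from collections import deque


class RetainedLog:
    """Hypothetical single-partition log with time-based retention."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = deque()  # entries are (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now=None):
        now = time.time() if now is None else now
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def read_from(self, offset):
        # Reading never deletes anything.
        return [(o, v) for (o, _, v) in self._records if o >= offset]

    def enforce_retention(self, now=None):
        # Drop records older than the retention window, oldest first.
        now = time.time() if now is None else now
        while self._records and now - self._records[0][1] > self.retention_seconds:
            self._records.popleft()

    @property
    def earliest_offset(self):
        return self._records[0][0] if self._records else self._next_offset


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)
log.append("c", now=90)

first_read = log.read_from(0)
second_read = log.read_from(0)  # same result: consuming does not remove records

log.enforce_retention(now=100)  # "a" (age 100 s) and "b" (age 70 s) exceed 60 s
after_retention = log.read_from(0)
```

Note that the surviving record keeps offset 2 after the older ones are dropped: offsets are never renumbered, only the earliest available offset moves forward.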

Practical considerations
- Partitions are the unit of parallelism (each partition is read by at most one consumer within a consumer group)
- More partitions: more throughput
- More partitions: potentially more problems as well :( (increased unavailability, latency)
- Ordering of messages is preserved only within a given partition of a topic. This is critical if you care about causality
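
A small sketch of why keys matter for ordering and how partitions cap parallelism. Everything here is illustrative: `partition_for` and `assign_partitions` are hypothetical helpers, CRC32 stands in for a stable hash (Kafka's default partitioner uses murmur2), and the round-robin assignment is just one simple strategy.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Stable hash of the key; CRC32 is used here only for illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Round-robin: every partition goes to exactly one consumer of the group."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


NUM_PARTITIONS = 3
partitions = defaultdict(list)

events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "updated"),
    ("user-2", "deleted"),
    ("user-1", "deleted"),
]
for key, value in events:
    # Same key -> same partition, so per-key order is preserved.
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

user1_partition = partitions[partition_for("user-1", NUM_PARTITIONS)]
user1_events = [v for k, v in user1_partition if k == "user-1"]

# Two consumers share three partitions.
assignment = assign_partitions(list(range(NUM_PARTITIONS)), ["c1", "c2"])
# Four consumers, three partitions: the fourth consumer gets nothing.
idle = assign_partitions(list(range(NUM_PARTITIONS)), ["c1", "c2", "c3", "c4"])
```

Events for `user-1` stay in order because they all land in one partition; nothing is guaranteed about how they interleave with `user-2` events in another partition. Adding consumers beyond the partition count does not add parallelism.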
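
Kafka vs. RabbitMQ
- Two different things!
- Different systems for different purposes (one is not better than the other)
- RabbitMQ is a smart broker/dumb consumer type of system
- Classic RabbitMQ queues drop a message once it is consumed and acknowledged, so there is no replayable log (durable queues survive a broker restart, but consumed messages are gone)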

Kafka use cases
- Pub-Sub system for (micro)services architecture
- Activity Tracking
- Metrics and Analytics
- Stream processing
- Event Sourcing
- Cross-database replication (if you know what you are doing)
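
As one illustration of the event sourcing use case, state can be derived by replaying an ordered log of events instead of being stored directly. This is a sketch with made-up event names (`deposited`, `withdrawn`) and a plain Python list standing in for a topic partition; `apply` and `replay` are hypothetical helpers.

```python
def apply(balance, event):
    """Fold a single event into the current state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")


def replay(events, upto=None):
    """Rebuild state from the start of the log, optionally stopping at an offset."""
    balance = 0
    for event in events[:upto]:
        balance = apply(balance, event)
    return balance


# A plain list stands in for one partition of an account's event topic.
account_log = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]

current_balance = replay(account_log)            # state after all events
balance_after_two = replay(account_log, upto=2)  # state as of offset 2 (exclusive)
```

Because the log is retained and ordered per partition, any past state can be reconstructed by replaying up to the corresponding offset.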