Slide 1

Apache Kafka
Tech Confluence
January 28th, 2015

Slide 2

What is Kafka?

Slide 3

Scalable Pub/Sub & Queueing

Slide 4

Kafka
• Built by LinkedIn
• Powers their real-time features
• Topic: a logical queue
• Partitioned: multiple “logs”
• A single topic lives on multiple machines; a partition is bound to a machine*
• * With replication, copies of a partition also live on other machines
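
To make the topic/partition split concrete, here is a minimal in-memory sketch in Python. It is not Kafka's client API: the `Topic` class, the `partition_for` helper, and the use of `crc32` as the key hash are all assumptions made for illustration. It only shows how a keyed message could be routed to one of several append-only logs.

```python
import zlib


class Topic:
    """Toy model of a topic: a fixed set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Hypothetical partitioner: stable hash of the key modulo the
        # partition count. crc32 is only a stand-in for a real hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        # The offset is the record's position within its partition.
        return partition, len(log) - 1


topic = Topic("page-views", num_partitions=3)
first = topic.append("user-42", "viewed /home")
second = topic.append("user-42", "viewed /jobs")
```

Both records share a key, so they land in the same partition with consecutive offsets; ordering is only meaningful within a partition.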

Slide 5

Use Cases
• Event logging
  • Application logs, JSON events, etc.
• Event sourcing
  • Logging mutations to replay
• Commit logs
• Messaging
• Batch and realtime ingestion
• Metrics
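
The event sourcing bullet ("logging mutations to replay") can be sketched in a few lines of Python. The event shape `(kind, key, value)` and the `apply_event`/`replay` helpers are hypothetical, chosen only to show the idea: state is never stored directly, it is rebuilt by replaying the log.

```python
def apply_event(state, event):
    """Apply one logged mutation to the state dictionary."""
    kind, key, value = event
    if kind == "set":
        state[key] = value
    elif kind == "delete":
        state.pop(key, None)
    return state


def replay(log, upto=None):
    """Rebuild state from scratch by replaying the first `upto` events."""
    state = {}
    for event in log[:upto]:
        apply_event(state, event)
    return state


mutation_log = [
    ("set", "name", "Ada"),
    ("set", "email", "ada@example.com"),
    ("delete", "email", None),
    ("set", "name", "Ada L."),
]

current = replay(mutation_log)          # state after every mutation
earlier = replay(mutation_log, upto=2)  # state as of the second mutation
```

Because the log is the source of truth, replaying a prefix of it recovers any earlier state.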

Slide 6

Distributed, Partitioned Commit Log
like a log file

Slide 7

Distributed, Partitioned Commit Log
like tail -F

Slide 8

Single Producer, Multiple Consumer
fanout
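
A rough Python sketch of fanout, assuming a toy `Log` and `Consumer` (neither is a Kafka class): every consumer keeps its own read position, so reading never removes anything from the log and one consumer's pace does not affect another's.

```python
class Log:
    """Append-only list of records shared by all consumers."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1


class Consumer:
    """Reads from a shared log while tracking its own offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)  # only this consumer's position moves
        return batch


log = Log()
for i in range(5):
    log.append(f"event-{i}")

indexer = Consumer(log)
archiver = Consumer(log)

indexer_seen = indexer.poll()                  # reads everything at once
archiver_first = archiver.poll(max_records=2)  # reads at its own pace
archiver_rest = archiver.poll()
```

Both consumers end up seeing all five events, and the log itself is unchanged.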

Slide 9

Durable
• Kafka’s “killer app”
• A partition is like a huge disk-backed circular buffer
• Tunable retention, default: 7 days
• Consumers can go away for hours and pick up right where they left off
• Enables both batch and realtime use cases
• Try that with RabbitMQ. I dare you.
• Can run a new version at the same time as the old version! Compare side by side
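
The retention and "pick up where they left off" bullets can be modeled with a short Python sketch. `RetainedLog` is a hypothetical in-memory stand-in, with timestamps passed in by hand; it only illustrates that old records expire by age while offsets keep counting up, so a returning consumer can resume from its saved offset.

```python
DAY = 24 * 60 * 60


class RetainedLog:
    """Toy log that drops records older than a retention window."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0  # offset of the oldest record still kept
        self.records = []     # list of (timestamp, value)

    def append(self, timestamp, value):
        self.records.append((timestamp, value))
        return self.base_offset + len(self.records) - 1

    def expire(self, now):
        # Drop records older than the retention window; offsets never reset.
        cutoff = now - self.retention_seconds
        while self.records and self.records[0][0] < cutoff:
            self.records.pop(0)
            self.base_offset += 1

    def read_from(self, offset):
        # If the saved offset has expired, start at the oldest kept record.
        start = max(offset, self.base_offset)
        values = [v for _, v in self.records[start - self.base_offset:]]
        return values, start + len(values)


log = RetainedLog(retention_seconds=7 * DAY)
for day in range(3):
    log.append(day * DAY, f"day-{day}")

seen, saved_offset = log.read_from(0)  # consumer reads, then goes away

for day in range(3, 10):               # producer keeps writing meanwhile
    log.append(day * DAY, f"day-{day}")
log.expire(now=9 * DAY)                # retention drops the oldest records

resumed, next_offset = log.read_from(saved_offset)
```

The returning consumer gets exactly the records it missed, even though the oldest two records have since been expired.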

Slide 10

Made for Speed
• Pushes expensive operations to consumers/producers
  • Compression
  • Work tracking
• Smart design leverages OS internals
  • Almost all linear IO: no SSDs wanted
• Consumers are cheap
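
As a hedged illustration of pushing compression to the edges, the Python sketch below has the producer compress a whole batch and the consumer decompress it, while the "broker" is just a list that stores the bytes it is handed. The helper names and the newline-delimited JSON framing are assumptions for this example, not Kafka's wire format.

```python
import json
import zlib


def compress_batch(messages):
    """Producer side: serialize a batch as JSON lines and compress it once."""
    payload = "\n".join(json.dumps(m) for m in messages).encode("utf-8")
    return zlib.compress(payload)


def decompress_batch(blob):
    """Consumer side: undo the compression and parse each line."""
    text = zlib.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


events = [{"user": i % 10, "action": "click", "page": "/home"} for i in range(1000)]
raw_size = len("\n".join(json.dumps(m) for m in events).encode("utf-8"))

blob = compress_batch(events)
broker_log = [blob]  # toy broker: appends the opaque bytes it receives

restored = decompress_batch(broker_log[0])
```

Batching before compressing matters here: repetitive messages compress far better together than one at a time.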

Slide 11

Benefits
• Horizontally scales very well
  • Millions of messages/s on fairly low-spec hardware
  • https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
• Decouples producers from consumers
• Keeps data around
  • Rewind, redo!
• Democratizes the use of data in the org: “realtime warehouse”
• Simplifies many state use cases

Slide 12

Who uses it?
• Used at dozens of companies:
  • LinkedIn
  • Pinterest
  • Twitter
  • Netflix
  • Scores of analytics and metrics companies
  • FullContact :)

Slide 13

Questions?
• There’s another 40 minutes’ worth of things I can talk about: lots of interesting features & design.
• Interesting links:
  • http://kafka.apache.org/documentation.html
  • http://samza.apache.org/ (stream processor built on top of Kafka)