Apache Kafka

A basic 10-minute intro to Kafka, given at Tech Confluence, January 2015.

Michael Rose

January 29, 2015

Transcript

  1. Apache Kafka Tech Confluence January 28th, 2015

  2. What is Kafka?

  3. Scalable Pub/Sub & Queueing

  4. Kafka • Built by LinkedIn • Powers their real-time features • Topic: a logical queue • Partitioned: multiple “logs” • A single topic lives on multiple machines, a partition is bound to a machine* • *replication

  5. Use Cases • Event logging • Application logs, JSON events, etc. • Event sourcing • Logging mutations to replay • Commit logs • Messaging • Batch and realtime ingestion • Metrics

  6. Distributed, Partitioned Commit Log like a log file

  7. Distributed, Partitioned Commit Log like tail -F

  8. Single Producer, Multiple Consumer fanout

  9. Durable • Kafka’s “killer app” • A partition is like a huge disk-backed circular buffer • Tunable retention, default: 7 days • Consumers can go away for hours and pick up right where they left off • Enables both batch & realtime cases • Try that with RabbitMQ. I dare you. • Can run a new version at the same time as the old version! Compare side by side

  10. Made for Speed • Pushes expensive operations to consumers/producers • Compression • Work tracking • Smart design leverages OS internals • Almost all linear IO: no SSDs wanted • Consumers are cheap

  11. Benefits • Horizontally scales very well • Millions of messages/s on fairly low-spec hardware • https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines • Decouples producers from consumers • Keeps data around • Rewind, redo! • Democratizes the use of data in the org: “realtime warehouse” • Simplifies many state use cases

  12. Who uses it? • Used by dozens of companies: • LinkedIn • Pinterest • Twitter • Netflix • Scores of analytics and metrics companies • FullContact :)

  13. Questions? • There’s another 40 minutes’ worth of things I can talk about: lots of interesting features & design. • Interesting links: • http://kafka.apache.org/documentation.html • http://samza.apache.org/ (Stream processor built on top of Kafka)
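
The model described on slides 4, 8 and 9 (a topic split into partitioned, append-only logs, with each consumer tracking its own position) can be illustrated with a small in-memory sketch. This is a toy under stated assumptions, not Kafka's API: the `Topic` class, its `send` and `poll` methods, and the CRC32-based partition choice are all hypothetical names and choices made for illustration, and retention and replication are not modeled.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory model of a partitioned, append-only topic (not Kafka's API)."""

    def __init__(self, num_partitions):
        # Each partition is an independent append-only log.
        self.partitions = [[] for _ in range(num_partitions)]
        # (consumer name, partition) -> next offset that consumer will read.
        self.offsets = defaultdict(int)

    def send(self, key, value):
        # Assumption: pick the partition by hashing the key with CRC32, so
        # messages with the same key land in the same log and stay ordered.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1  # (partition, offset of the new message)

    def poll(self, consumer, partition, max_messages=10):
        # Reading never removes data; it only advances this consumer's offset.
        start = self.offsets[(consumer, partition)]
        batch = self.partitions[partition][start:start + max_messages]
        self.offsets[(consumer, partition)] = start + len(batch)
        return batch


topic = Topic(num_partitions=3)
partition, _ = topic.send("user-42", "signup")
topic.send("user-42", "login")

# Two independent consumers read the same partition (fanout).
print(topic.poll("realtime", partition))  # ['signup', 'login']
topic.send("user-42", "logout")
print(topic.poll("realtime", partition))  # ['logout'], resumes where it left off
print(topic.poll("batch", partition))     # ['signup', 'login', 'logout']
```

Because reads only move a per-consumer offset, a slow batch consumer and a fast realtime consumer can share one log without interfering with each other, which is the property the "Durable" slide highlights.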