Apache Kafka

This presentation gives an overview of the Apache Kafka project. It covers producers, consumers, topics, partitions, APIs, architecture and usage.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Music by

"Little Planet", composed and performed by Bensound from http://www.bensound.com/

Mike Frampton

May 16, 2020

Transcript

  1. What Is Apache Kafka?
    • A stream processing platform
    • Open source / Apache 2.0 license
    • Written in Java and Scala
    • A publish/subscribe system for record streams
    • Scalable / fault tolerant
    • Topic-based, partitioned FIFO queues
  2. How Does Kafka Work?
    • Kafka runs as a cluster of servers
    • Stores records in topics
    • Topics are partitioned into queues
    • Partitions are stored across the cluster
    • Consumers are organised into groups
    • Stream processors transform records
    • Reusable connectors process queues
      – For instance, database connectors
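To make "consumers organised into groups" concrete: within one group, each partition is read by exactly one consumer, so the partitions are shared out among the group members. The sketch below is a toy round-robin assignment in plain Python, not Kafka's own assignor logic; the function name and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Share partitions round-robin so each partition has exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical topic with four partitions and a group of two consumers.
group = assign_partitions(partitions=[0, 1, 2, 3], consumers=["consumer-a", "consumer-b"])
print(group)  # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Adding a third consumer to the list would spread the same four partitions over three owners, which is how a group scales out its reads.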
  3. Kafka APIs
    • Producer API
      – Allows applications to publish to topics
    • Consumer API
      – Applications subscribe to topics / process data streams
    • Streams API
      – Application acts as a stream processor, transforming a stream
    • Connect API
      – Build reusable producers / consumers
      – E.g. RDBMS connectors/producers/consumers
    • Admin API
      – For topic and broker management
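The stream processor role can be illustrated without any Kafka library: read each record from an input topic, transform it, and publish the result to an output topic. This is a minimal in-memory sketch of that idea only; it does not use the real Streams API, and the topic lists and the function name are hypothetical.

```python
def stream_processor(input_topic, output_topic, transform):
    """Read each record from input_topic, transform it, and append the result to output_topic."""
    for record in input_topic:
        output_topic.append(transform(record))


# Plain lists stand in for topics here (an assumption of this sketch).
raw_events = ["login", "click", "logout"]
normalised_events = []
stream_processor(raw_events, normalised_events, str.upper)
print(normalised_events)  # ['LOGIN', 'CLICK', 'LOGOUT']
```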
  4. Kafka Topic Queue Offsets
    • Records are published to topics
    • Topics are multi-subscriber
    • Topics contain partition queues
    • A partition queue contains a sequence of records
    • Each record has a queue offset (position)
    • Consumers use the offset to read records
    • Queue record retention is configurable
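The offset and retention ideas above can be sketched as a small append-only log. This is a toy model under stated assumptions: retention here is a simple record count (`max_records`, a hypothetical parameter), whereas Kafka's retention is configured by time or size, and a read from an expired offset is simply moved forward to the oldest retained record.

```python
class PartitionLog:
    """Toy partition queue: append-only records addressed by offset."""

    def __init__(self, max_records):
        self.max_records = max_records  # retention limit (assumed: a record count)
        self.records = []
        self.base_offset = 0  # offset of the oldest record still retained

    def append(self, value):
        """Store a record and return the offset it was assigned."""
        offset = self.base_offset + len(self.records)
        self.records.append(value)
        if len(self.records) > self.max_records:
            self.records.pop(0)  # retention: drop the oldest record
            self.base_offset += 1
        return offset

    def read(self, offset):
        """Return (offset, value) pairs from the given offset onwards."""
        start = max(offset, self.base_offset)
        return [
            (self.base_offset + i, value)
            for i, value in enumerate(self.records)
            if self.base_offset + i >= start
        ]


log = PartitionLog(max_records=3)
offsets = [log.append(value) for value in "abcde"]
print(offsets)      # [0, 1, 2, 3, 4]
print(log.read(3))  # [(3, 'd'), (4, 'e')]
print(log.read(0))  # [(2, 'c'), (3, 'd'), (4, 'e')] because 'a' and 'b' have expired
```

Offsets keep increasing even after old records expire, so a consumer's stored position stays meaningful across retention clean-ups.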
  5. Kafka Producer Consumer
    • Producers write to partitions, e.g. Producer1 → P0
    • Producers are responsible for record → partition mapping
    • Kafka only guarantees order within a partition
    • Kafka cluster contains <n> servers
    • Partitions are mapped to servers
    • Consumers are members of consumer groups
    • Each consumer must maintain its partition read offset
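A short sketch of the record → partition mapping and the per-partition ordering guarantee described above. It assumes key-based partitioning with `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function), and the `Consumer` class and event data are hypothetical illustrations rather than the real client API.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for this sketch


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a record key to a partition; the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Consumer:
    """Toy consumer that tracks its own read offset for one partition."""

    def __init__(self):
        self.offset = 0

    def poll(self, partition):
        """Return records not yet seen and advance the stored offset."""
        batch = partition[self.offset:]
        self.offset += len(batch)
        return batch


partitions = [[] for _ in range(NUM_PARTITIONS)]
events = [("user-1", "login"), ("user-2", "login"), ("user-1", "view"), ("user-1", "logout")]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# All "user-1" records share one partition, so their relative order is preserved.
user1_partition = partitions[partition_for("user-1")]
user1_events = [value for key, value in user1_partition if key == "user-1"]
print(user1_events)  # ['login', 'view', 'logout']

consumer = Consumer()
first_batch = consumer.poll(user1_partition)
second_batch = consumer.poll(user1_partition)  # empty: nothing new since the last poll
```

Records with different keys may land in different partitions, and no ordering is implied between those partitions.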
  6. Kafka's Stack Role
    • A low latency messaging system
      – Records load balanced across partitions
    • As a storage system
      – Using local file system storage
      – Scales horizontally in terms of performance
    • As a stream processing system
      – Using stream API to transform data
    • Data replication provides fault tolerance
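The last point, replication for fault tolerance, can be pictured as every partition having copies on several brokers, with one copy acting as leader. The toy model below copies each record to all replicas synchronously and promotes a surviving replica when the leader fails; this is a deliberate simplification of how Kafka followers and leader election actually work, and the class and broker names are hypothetical.

```python
class ReplicatedPartition:
    """Toy partition whose records are copied to every broker holding a replica."""

    def __init__(self, brokers):
        self.replicas = {broker: [] for broker in brokers}
        self.leader = brokers[0]

    def append(self, record):
        """Write the record to every replica (simplified: synchronous copy)."""
        for replica_log in self.replicas.values():
            replica_log.append(record)

    def fail(self, broker):
        """Remove a failed broker and promote another replica if it was the leader."""
        del self.replicas[broker]
        if broker == self.leader:
            self.leader = next(iter(self.replicas))

    def read(self):
        """Read all records from the current leader."""
        return list(self.replicas[self.leader])


partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
partition.append("a")
partition.append("b")
partition.fail("broker-1")  # the leader goes down
print(partition.leader)     # broker-2
print(partition.read())     # ['a', 'b'] (no records lost)
```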
  7. Available Books
    • See “Big Data Made Easy” – Apress Jan 2015
    • See “Mastering Apache Spark” – Packt Oct 2015
    • See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
    • Find the author on Amazon
      – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
    • Connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
  8. Connect
    • Feel free to connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
    • See my open source blog at
      – open-source-systems.blogspot.com/
    • I am always interested in
      – New technology
      – Opportunities
      – Technology based issues
      – Big data integration