Intro to Apache Kafka #phpbnl20

A 30-minute introduction to the features of Apache Kafka, the anatomy of a Kafka cluster, and how to talk to a Kafka cluster once you've got one.

This talk was held as part of the PHPBenelux 2020 Unconference track. Please give me feedback for this talk on Joind.in: https://joind.in/talk/27a56

Tobias Gies

January 25, 2020

  1. Introduction to
    Apache Kafka
    PHPBenelux conference
    2020-01-25

  2. Part 1
    The feature pitch
    Why I think this Kafka thing is pretty cool

  3. Kafka is not just a message queue
    • Distributed data streaming and storage platform
    • Publish / subscribe model
    • Fault-tolerant and scalable (partitioning and replication are first-class citizens)
    • Read: At-least-once delivery or exactly-once delivery*
    • Write: Consistency settings can be tuned to fit use-case (trade-off vs. data loss risk)

  4. Kafka is not just a message queue (2)
    • Reactive to the core: You always work with streams of data, never static tables
    • Strictly ordered (per partition, more on this later)
    • Data compaction: Different ways of getting rid of old* data (more on this later)

  5. Kafka is FAST. Like, seriously.
    • Millions of messages per second are not a big problem, even on small clusters
    • Most often, speed is I/O-limited (disk, network interface…)
    • Very little delay for a single message
    • Does not need expensive hardware (comparatively)

  6. Part 2
    High-Level Concepts
    A Kafka anatomy lesson

  7. Brokers and Clusters
    Image from "Kafka in a Nutshell" by Kevin Sookocheff
    • Broker is what Kafka calls a single server.
    • Multiple brokers form a Cluster.
    • Data is replicated to several brokers in the cluster, with one broker acting as the Leader for a given partition of data.
    • Reads and writes for a partition are served by its leader.
    • The leader coordinates replication.
    • In case of failure, a replica will take over leadership.
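The leader/replica behaviour above can be sketched as a toy model (Python for brevity; in real Kafka the cluster controller elects a new leader, and only from the in-sync replica set):

```python
def new_leader(replica_brokers, failed_brokers):
    """Toy leader failover: promote the first surviving replica.
    Real Kafka restricts the choice to the in-sync replica (ISR)
    set, and the election is coordinated by the cluster controller.
    """
    survivors = [b for b in replica_brokers if b not in failed_brokers]
    # If no replica survives, the partition is offline until a broker returns.
    return survivors[0] if survivors else None
```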

  8. Kafka Topics
    • One Topic consists of one or more Partitions…
    • … which each contain any number of Messages.
    • Partitions have an ordering guarantee: Messages are stored in the same order they are written.

  9. Messages
    Consist of…
    • Headers (e.g. Timestamp)
    • Key (Byte-Array)
    • Value (Byte-Array)
    Default maximum message size is ~1 MB.
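The message anatomy above can be sketched as a plain data structure (illustrative only, in Python for brevity; this is not any client library's actual message class):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class KafkaMessage:
    """Illustrative sketch of the three parts of a Kafka message."""
    headers: Dict[str, bytes] = field(default_factory=dict)  # e.g. a timestamp
    key: Optional[bytes] = None    # byte array; may be null (unkeyed message)
    value: Optional[bytes] = None  # byte array; null marks a deleted record
```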

  10. Topics (2) – Partition assignment
    Two options for partition assignment:
    • If the message has no key (key is null), partition assignment happens round-robin.
    • If the message has a key, partition assignment happens based on a hash of the key…
    • … which means messages with the same key will always be in the same partition.
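Those two assignment modes can be sketched as a toy partitioner (Python for brevity; Kafka's real default partitioner hashes keys with murmur2, so this sketch is not wire-compatible with it):

```python
import hashlib
import itertools

class ToyPartitioner:
    """Toy model of Kafka's two partition-assignment modes.
    Kafka's default partitioner uses a murmur2 hash; md5 is
    substituted here purely for illustration."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.count()

    def partition_for(self, key):
        if key is None:
            # No key: spread messages round-robin across partitions.
            return next(self._round_robin) % self.num_partitions
        # Keyed: a deterministic hash, so the same key always maps to
        # the same partition (preserving per-key ordering).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.num_partitions
```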

  11. Messages (2) – Special cases
    • In an event stream topic (example: access log):
    • All messages have null keys, because there is no meaningful identity for an event
    • In a data changelog topic (example: topic ingested from a database):
    • A message with null value marks a deleted record

  12. Topics (3) – (Change-)Log Compaction
    (Messages spread across Partition 1 and Partition 2)
    Key   Value
    User1 [email protected]
    User2 [email protected]
    User1 [email protected]
    User3 [email protected]

  13. Topics (3) – (Change-)Log Compaction
    (Messages spread across Partition 1 and Partition 2)
    Key   Value
    User1 [email protected]
    User2 [email protected]
    User1 [email protected]
    User3 [email protected]
    User2 null

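The end result of compaction on the tables above can be sketched as follows (Python for brevity; real compaction runs asynchronously per partition segment and also retains the most recent segments as-is, so this only models the retained state):

```python
def compact(log):
    """Toy model of the end state of Kafka log compaction:
    keep only the newest value per key, then drop tombstones."""
    latest = {}
    for key, value in log:  # later records overwrite earlier ones
        latest[key] = value
    # A null value (tombstone) marks a deletion and is itself
    # eventually removed by compaction.
    return {k: v for k, v in latest.items() if v is not None}

# Mirrors the slides: User1 is updated, User2 is deleted via a tombstone.
log = [("User1", "v1"), ("User2", "v2"), ("User1", "v3"),
       ("User3", "v4"), ("User2", None)]
```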

  17. Part 3
    Interacting with Kafka
    How to get data in and out, aggregate and transform it

  18. Terminology
    • Producers put data into Kafka
    • Consumers read data from Kafka
    • Connectors link Kafka to external data stores
    • Stream processors filter, merge, aggregate, and transform data

  19. A basic Kafka Producer in PHP
    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', 'localhost:9092');
    $producer = new RdKafka\Producer($conf);
    $topic = $producer->newTopic("test");
    for ($i = 0; $i < 10; $i++) {
        $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Message $i");
        $producer->poll(0);
    }
    // Wait for any outstanding produce requests before shutdown,
    // otherwise locally queued messages may be lost.
    $producer->flush(10 * 1000);

  20. A basic Kafka Consumer in PHP
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup'); // required for KafkaConsumer
    $conf->set('metadata.broker.list', 'localhost:9092');
    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['test']);
    while (true) { // In a real application, use a proper event loop instead.
        $message = $consumer->consume(120 * 1000);
        switch ($message->err) {
            case RD_KAFKA_RESP_ERR_NO_ERROR:
                var_dump($message);
                break;
            case RD_KAFKA_RESP_ERR__PARTITION_EOF:
            case RD_KAFKA_RESP_ERR__TIMED_OUT:
                // No new messages right now; keep polling.
                break;
            default:
                throw new \Exception($message->errstr(), $message->err);
        }
    }

  21. Kafka's own frameworks & services
    • Kafka Connect:
    • Links external data stores to Kafka using pre-built Connectors that only need configuration.
    • Great tool to make existing systems' data accessible in Kafka.
    • Kafka Streams:
    • Java framework for stream processing. Build your own stream processor in a single Java file.
    • Manages consumers, producers, data stores, etc. transparently.
    • KSQL:
    • Java not your thing? Write stream processors in an SQL-like language.

  22. Bonus slide: More cool tools
    • First-party CLI scripts: kafka-topics, kafka-consumer-groups, kafka-console-consumer, many more
    • MirrorMaker: Replicate topics across different Kafka clusters
    • Kaf: alternative open source CLI client written in Go
    • Cruise Control: Kafka cluster management, workload rebalancing, self-healing
    • Debezium: Live replication of data from RDBMSs (MySQL & co.) into Kafka
    • Many more:
    • https://github.com/monksy/awesome-kafka/
    • https://github.com/infoslack/awesome-kafka/

  23. Thank you!
    Got feedback?
    https://joind.in/talk/27a56
