Intro to Apache Kafka #phpbnl20

A 30-minute introduction to the features of Apache Kafka, the anatomy of a Kafka cluster, and how to talk to a Kafka cluster once you've got one.

This talk was held as part of the PHPBenelux 2020 Unconference track. Please give me feedback on this talk.


Tobias Gies

January 25, 2020


Slide 3 – Kafka is not just a message queue

• Distributed data streaming and storage platform
• Publish / subscribe model
• Fault-tolerant and scalable (partitioning and replication are first-class citizens)
• Read: at-least-once or exactly-once delivery*
• Write: consistency settings can be tuned to fit the use case (trade-off vs. data loss risk)
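The write-side consistency trade-off mentioned above is configured per producer. A minimal sketch using the php-rdkafka extension; the broker address is a placeholder, and the property names are standard librdkafka configuration keys:

```php
<?php
$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'localhost:9092');
// 'acks' controls the durability vs. latency trade-off:
//   '0'   – fire and forget (fastest, messages may be lost)
//   '1'   – wait for the partition leader only
//   'all' – wait for all in-sync replicas (safest, slowest)
$conf->set('acks', 'all');
// Idempotent produce avoids duplicates on retry (exactly-once on the write path)
$conf->set('enable.idempotence', 'true');
$producer = new RdKafka\Producer($conf);
```

Tightening `acks` and enabling idempotence lowers the data-loss risk at the cost of throughput, which is exactly the tuning knob the slide refers to.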
Slide 4 – Kafka is not just a message queue (2)

• Reactive to the core: you always work with streams of data, never static tables
• Strictly ordered (per partition, more on this later)
• Data compaction: different ways of getting rid of old* data (more on this later)
Slide 5 – Kafka is FAST. Like, seriously.

• Millions of messages per second are not a big problem, even on small clusters
• Most often, speed is I/O-limited (disk, network interface…)
• Very little delay for a single message
• Does not need expensive hardware (comparatively)
Slide 7 – Brokers and Clusters

Image from "Kafka in a Nutshell" by Kevin Sookocheff

• A Broker is what Kafka calls a single server.
• Multiple brokers form a Cluster.
• Data is replicated to several brokers in the cluster, with one broker the Leader for a given partition of data.
• Reads and writes for a partition are served by its leader.
• The leader coordinates replication.
• In case of failure, a replica will take over leadership.
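The failover behaviour on this slide can be illustrated with a toy model. This is not Kafka's actual election protocol (the cluster controller handles that, preferring in-sync replicas); it only shows the principle that a surviving replica takes over leadership:

```php
<?php
// Toy model of partition leadership: the first replica that has not
// failed acts as leader; on failure, the next surviving replica takes over.
function electLeader(array $replicas, array $failedBrokers): ?string {
    foreach ($replicas as $broker) {
        if (!in_array($broker, $failedBrokers, true)) {
            return $broker; // first surviving replica becomes leader
        }
    }
    return null; // all replicas down: the partition is offline
}

$replicas = ['broker-1', 'broker-2', 'broker-3'];
echo electLeader($replicas, []), "\n";           // broker-1 leads
echo electLeader($replicas, ['broker-1']), "\n"; // broker-2 takes over
```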
Slide 8 – Kafka Topics

• One Topic consists of one or more Partitions…
• … which each contain any number of Messages.
• Partitions have an ordering guarantee: messages will be stored in the same order they are written.
Slide 9 – Messages consist of…

• Headers (e.g. timestamp)
• Key (byte array)
• Value (byte array)

The default maximum message size is ~1 MB.
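The message anatomy above can be modelled in a few lines. This is a conceptual sketch, not the php-rdkafka API: the record structure and the ~1 MB limit (the broker's `message.max.bytes` default, approximated here as 1,000,000 bytes) are the point:

```php
<?php
// Conceptual model of a Kafka message: headers plus raw-byte key and value,
// both of which may be null. The size limit approximates the broker default.
const DEFAULT_MAX_MESSAGE_BYTES = 1_000_000; // ≈ 1 MB

function makeMessage(?string $key, ?string $value, array $headers = []): array {
    $size = strlen($key ?? '') + strlen($value ?? '');
    if ($size > DEFAULT_MAX_MESSAGE_BYTES) {
        throw new InvalidArgumentException('Message exceeds default broker limit');
    }
    return ['headers' => $headers, 'key' => $key, 'value' => $value];
}

$msg = makeMessage('user-42', '{"name":"Tobias"}', ['timestamp' => '1579953600000']);
```

Larger messages are possible, but require raising the limit on broker, producer and consumer alike.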
Slide 10 – Topics (2): Partition assignment

Two options for partition assignment:

• If the message has no key (key is null), partition assignment happens round-robin.
• If the message has a key, partition assignment happens based on the key's hash…
• … which means messages with the same key will always be in the same partition.
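Both strategies can be sketched in a few lines. Kafka's default partitioner actually uses murmur2 hashing; `crc32` stands in here for simplicity, and the principle (equal keys map to the same partition) is the same:

```php
<?php
// Sketch of the default partitioning strategy. Kafka's client uses murmur2;
// crc32 stands in here, but the guarantee is identical: same key => same partition.
function assignPartition(?string $key, int $numPartitions, int $roundRobinCounter = 0): int {
    if ($key === null) {
        // No key: spread messages round-robin over all partitions
        return $roundRobinCounter % $numPartitions;
    }
    // Keyed: hash the key, so equal keys always land in the same partition
    return crc32($key) % $numPartitions;
}
```

Because all messages for one key share a partition, they also share that partition's ordering guarantee, which is what makes keyed changelogs work.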
Slide 11 – Messages (2): Special cases

• In an event stream topic (example: access log):
  • All messages have null keys, because there is no meaningful identity for an event.
• In a data changelog topic (example: topic ingested from a database):
  • A message with a null value marks a deleted record.
Slides 12–16 – Topics (3): (Change-)Log Compaction

[Animated example across five slides: two partitions shown as Key/Value tables holding records for User1, User2 and User3. A tombstone (User2 with a null value) is written, and compaction step by step discards the superseded records, keeping only the latest value per key.]
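The compaction sequence on these slides can be simulated in miniature. This is a rough sketch, not Kafka's actual segment-based algorithm (which also retains tombstones for a configurable period so consumers can observe the deletion first), but the end result is the same: only the latest value per key survives, and tombstoned keys disappear:

```php
<?php
// Rough sketch of log compaction over a list of [key, value] records:
// keep only the newest record per key, then drop keys whose newest
// record is a tombstone (null value).
function compact(array $log): array {
    $latest = [];
    foreach ($log as [$key, $value]) {
        $latest[$key] = $value; // later records overwrite earlier ones
    }
    return array_filter($latest, fn($value) => $value !== null);
}

$log = [
    ['User1', 'v1'],
    ['User2', 'v1'],
    ['User1', 'v2'],  // supersedes User1 => v1
    ['User2', null],  // tombstone: User2 is deleted
];
// compact($log) keeps only User1 => v2
```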
Slide 17 – Part 3: Interacting with Kafka

How to get data in and out, aggregate and transform it.
Slide 18 – Terminology

• Producers put data into Kafka.
• Consumers read data from Kafka.
• Connectors link Kafka to external data stores.
• Stream processors filter, merge, aggregate, and transform data.
Slide 19 – A basic Kafka Producer in PHP

$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'localhost:9092');
$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic("test");
for ($i = 0; $i < 10; $i++) {
    // RD_KAFKA_PARTITION_UA lets librdkafka pick the partition
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Message $i");
    $producer->poll(0); // serve delivery report callbacks
}
// Wait (up to 10 s) until all buffered messages are actually sent before shutdown
$producer->flush(10 * 1000);
Slide 20 – A basic Kafka Consumer in PHP

$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'localhost:9092');
$conf->set('group.id', 'myConsumerGroup'); // consumers with the same group.id share the work
$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['test']);
while (true) { // in a real application this would be a proper event loop
    $message = $consumer->consume(120 * 1000);
    switch ($message->err) {
        case RD_KAFKA_RESP_ERR_NO_ERROR:
            var_dump($message);
            break;
        case RD_KAFKA_RESP_ERR__PARTITION_EOF:
        case RD_KAFKA_RESP_ERR__TIMED_OUT:
            break; // nothing new within the timeout, keep polling
        default:
            throw new \Exception($message->errstr(), $message->err);
    }
}
Slide 21 – Kafka's own frameworks & services

• Kafka Connect:
  • Links external data stores to Kafka using pre-built Connectors that only need configuration.
  • Great tool to make existing systems' data accessible in Kafka.
• Kafka Streams:
  • Java framework for stream processing. Build your own stream processor in a single Java file.
  • Manages consumers, producers, data stores, etc. transparently.
• KSQL:
  • Java not your thing? Write stream processors in an SQL-like language.
Slide 22 – Bonus slide: More cool tools

• First-party CLI scripts: kafka-topics, kafka-consumer-groups, kafka-console-consumer, and many more
• MirrorMaker: replicate topics across different Kafka clusters
• Kaf: alternative open-source CLI client written in Go
• Cruise Control: Kafka cluster management, workload rebalancing, self-healing
• Debezium: live replication of data from RDBMS (MySQL & co.) into Kafka
• Many more…