Intro to Apache Kafka #phpbnl20

A 30-minute introduction to the features of Apache Kafka, the anatomy of a Kafka cluster, and how to talk to a Kafka cluster once you've got one.

This talk was held as part of the PHPBenelux 2020 Unconference track. Please give me feedback for this talk on Joind.in: https://joind.in/talk/27a56


Tobias Gies

January 25, 2020

Transcript

  1. Slide 3: Kafka is not just a message queue
     • Distributed data streaming and storage platform
     • Publish / subscribe model
     • Fault-tolerant and scalable (partitioning and replication are first-class citizens)
     • Read: at-least-once or exactly-once* delivery
     • Write: consistency settings can be tuned to fit the use case (trade-off vs. data loss risk)
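A minimal sketch (plain PHP, deliberately not the rdkafka API) of why commit-after-processing gives at-least-once delivery: if a consumer crashes after processing a message but before committing its offset, that message is processed again after restart.

```php
<?php
// Illustrative sketch: at-least-once delivery falls out of
// committing offsets *after* processing a message.
$messages = ["m0", "m1", "m2", "m3"];
$committedOffset = 0;   // last durably committed read position
$processed = [];

// First run: process two messages, but crash before committing m1.
$processed[] = $messages[0];
$committedOffset = 1;   // commit after processing m0
$processed[] = $messages[1];
// -- crash here: offset 2 was never committed --

// Restart: resume from the last committed offset.
for ($i = $committedOffset; $i < count($messages); $i++) {
    $processed[] = $messages[$i];
    $committedOffset = $i + 1;
}

// "m1" was processed twice: delivered at least once, not exactly once.
```

Committing *before* processing would instead give at-most-once semantics: a crash after the commit but before processing loses the message.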
  2. Slide 4: Kafka is not just a message queue (2)
     • Reactive to the core: you always work with streams of data, never static tables
     • Strictly ordered (per partition, more on this later)
     • Data compaction: different ways of getting rid of old* data (more on this later)
  3. Slide 5: Kafka is FAST. Like, seriously.
     • Millions of messages per second are not a big problem, even on small clusters
     • Most often, throughput is I/O-limited (disk, network interface, …)
     • Very low latency for a single message
     • Does not need (comparatively) expensive hardware
  4. Slide 7: Brokers and Clusters (image from "Kafka in a Nutshell" by Kevin Sookocheff)
     • A Broker is what Kafka calls a single server.
     • Multiple brokers form a Cluster.
     • Data is replicated to several brokers in the cluster, with one broker acting as the Leader for a given partition of data.
     • Reads and writes for a partition are served by its leader.
     • The leader coordinates replication.
     • In case of failure, a replica will take over leadership.
  5. Slide 8: Kafka Topics
     • One Topic consists of one or more Partitions…
     • … which each contain any number of Messages.
     • Partitions have an ordering guarantee: messages are stored in the same order they are written.
  6. Slide 9: Messages consist of…
     • Headers (e.g. timestamp)
     • Key (byte array)
     • Value (byte array)
     The default maximum message size is ~1 MB.
  7. Slide 10: Topics (2) – Partition assignment
     Two options for partition assignment:
     • If the message has no key (key is null), partition assignment happens round-robin.
     • If the message has a key, partition assignment happens based on a hash of the key…
     • … which means messages with the same key will always be in the same partition.
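The key-hashing rule above can be sketched in a few lines of plain PHP. Note this is purely illustrative: the actual hash function differs between clients (the Java client uses murmur2, while librdkafka ships both CRC32- and murmur2-based partitioners), so `crc32` here is a stand-in, not the real algorithm.

```php
<?php
// Illustrative sketch of key-based partition assignment:
// hash the key, take it modulo the partition count.
function pickPartition(?string $key, int $numPartitions): int {
    if ($key === null) {
        // No key: in practice the producer spreads these round-robin.
        return random_int(0, $numPartitions - 1);
    }
    // Keyed message: deterministic hash => stable partition.
    return crc32($key) % $numPartitions;
}

// Messages with the same key always land in the same partition...
$a = pickPartition("User1", 6);
$b = pickPartition("User1", 6);
// ...so $a === $b on every run.
```

This determinism is what makes per-key ordering (and log compaction, below) possible: all records for "User1" live in one partition, in write order.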
  8. Slide 11: Messages (2) – Special cases
     • In an event stream topic (example: access log):
       • All messages have null keys, because there is no meaningful identity for an event.
     • In a data changelog topic (example: a topic ingested from a database):
       • A message with a null value marks a deleted record.
  9. Slide 12: Topics (3) – (Change-)Log Compaction

     Partition 1                   Partition 2
     Key    Value                  Key    Value
     User1  u1@example.com         User2  two@example.net
     User1  user1@gmail.com        User3  three@yahoo.com
 10. Slide 13: Topics (3) – (Change-)Log Compaction

     Partition 1                   Partition 2
     Key    Value                  Key    Value
     User1  u1@example.com         User2  two@example.net
     User1  user1@gmail.com        User3  three@yahoo.com
                                   User2  null
 11. Slide 14: Topics (3) – (Change-)Log Compaction

     Partition 1                   Partition 2
     Key    Value                  Key    Value
     User1  u1@example.com         User2  two@example.net
     User1  user1@gmail.com        User3  three@yahoo.com
                                   User2  null

 12. Slide 15: Topics (3) – (Change-)Log Compaction

     Partition 1                   Partition 2
     Key    Value                  Key    Value
     User1  u1@example.com         User2  two@example.net
     User1  user1@gmail.com        User3  three@yahoo.com
                                   User2  null

 13. Slide 16: Topics (3) – (Change-)Log Compaction

     Partition 1                   Partition 2
     Key    Value                  Key    Value
     User1  u1@example.com         User2  two@example.net
     User1  user1@gmail.com        User3  three@yahoo.com
                                   User2  null
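The compaction walked through on these slides can be summarized in a small plain-PHP sketch. This is the *semantics* only, not Kafka's actual implementation (which compacts segment files in the background and only deletes tombstones after a retention period):

```php
<?php
// Illustrative sketch of log-compaction semantics: per key, only the
// most recent value survives; a null value ("tombstone") eventually
// removes the key entirely.
function compactLog(array $records): array {
    $latest = [];
    foreach ($records as [$key, $value]) {
        $latest[$key] = $value; // later records overwrite earlier ones
    }
    // Tombstones are dropped once their retention period has passed.
    return array_filter($latest, fn($v) => $v !== null);
}

// The changelog from the slides, in write order:
$records = [
    ["User1", "u1@example.com"],
    ["User2", "two@example.net"],
    ["User1", "user1@gmail.com"],
    ["User3", "three@yahoo.com"],
    ["User2", null], // tombstone: User2 was deleted
];
$compacted = compactLog($records);
// Only User1's newest address and User3 remain; User2 is gone.
```

A consumer reading the compacted topic from the beginning can therefore still rebuild the full current state of the table, just without its history.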
 14. Slide 17: Part 3 – Interacting with Kafka
     How to get data in and out, and how to aggregate and transform it
 15. Slide 18: Terminology
     • Producers put data into Kafka
     • Consumers read data from Kafka
     • Connectors link Kafka to external data stores
     • Stream processors filter, merge, aggregate, and transform data
 16. Slide 19: A basic Kafka Producer in PHP

     $conf = new RdKafka\Conf();
     $conf->set('metadata.broker.list', 'localhost:9092');
     $producer = new RdKafka\Producer($conf);
     $topic = $producer->newTopic("test");
     for ($i = 0; $i < 10; $i++) {
         $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Message $i");
         $producer->poll(0);
     }
     // Wait for all buffered messages to be delivered before shutdown,
     // otherwise unsent messages are silently lost.
     $producer->flush(10 * 1000);
 17. Slide 20: A basic Kafka Consumer in PHP

     $conf = new RdKafka\Conf();
     $conf->set('metadata.broker.list', 'localhost:9092');
     $conf->set('group.id', 'myConsumerGroup'); // required for KafkaConsumer
     $consumer = new RdKafka\KafkaConsumer($conf);
     $consumer->subscribe(['test']);
     while (true) { // A real application would use a proper event loop here.
         $message = $consumer->consume(120 * 1000);
         switch ($message->err) {
             case RD_KAFKA_RESP_ERR_NO_ERROR:
                 var_dump($message);
                 break;
             case RD_KAFKA_RESP_ERR__PARTITION_EOF:
             case RD_KAFKA_RESP_ERR__TIMED_OUT:
                 break; // No new messages; keep polling.
             default:
                 throw new \Exception($message->errstr(), $message->err);
         }
     }
 18. Slide 21: Kafka's own frameworks & services
     • Kafka Connect:
       • Links external data stores to Kafka using pre-built Connectors that only need configuration.
       • Great tool for making existing systems' data accessible in Kafka.
     • Kafka Streams:
       • Java framework for stream processing. Build your own stream processor in a single Java file.
       • Manages consumers, producers, data stores, etc. transparently.
     • KSQL:
       • Java not your thing? Write stream processors in an SQL-like language.
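To illustrate "only need configuration": the FileStreamSource connector that ships with Kafka can be set up with a properties file along these lines (file path, connector name, and topic are example placeholders, as in the standard Kafka quickstart):

```properties
# Example Kafka Connect source connector: tails a file into a topic.
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/test.txt
topic=connect-test
```

Each new line appended to the file becomes a message on the `connect-test` topic; sink connectors work the same way in the opposite direction.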
 19. Slide 22: Bonus slide: More cool tools
     • First-party CLI scripts: kafka-topics, kafka-consumer-groups, kafka-console-consumer, and many more
     • MirrorMaker: replicate topics across different Kafka clusters
     • Kaf: alternative open source CLI client written in Go
     • Cruise Control: Kafka cluster management, workload rebalancing, self-healing
     • Debezium: live replication of data from RDBMSs (MySQL & co.) into Kafka
     • Many more:
       • https://github.com/monksy/awesome-kafka/
       • https://github.com/infoslack/awesome-kafka/