Slide 1

Advanced Kafka

Slide 2

Tombstones
- When you want to delete the message stored under a given key in a partition, just send a message with a null payload (do not send the string "null" though)
- A GDPR-friendly feature ;)
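
A minimal sketch using the Java kafka-clients producer; the topic name ("users"), the key, and the broker address are placeholders. Note that the record's value is literally null, not the string "null":

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TombstoneExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value (a tombstone) marks the key for deletion; on a
            // compacted topic the key disappears once log compaction runs.
            producer.send(new ProducerRecord<>("users", "user-42", null));
            producer.flush();
        }
    }
}
```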

Slide 3

Publishing huge messages (in megabytes)
- Do you really have to do this in the first place?
- If yes, you need to adjust multiple settings (see the sketch below):
  1. Consumer: "fetch.message.max.bytes" (in the modern Java consumer: "max.partition.fetch.bytes")
  2. Broker: "replica.fetch.max.bytes"
  3. Broker: "message.max.bytes"
  4. Topic: "max.message.bytes"
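
A sketch of where each setting lives, assuming messages up to 10 MB; the values are illustrative, and the broker entries belong in server.properties rather than in Java code:

```java
import java.util.Properties;

public class LargeMessageConfig {
    public static void main(String[] args) {
        // Consumer side (modern Java consumer property names):
        Properties consumerProps = new Properties();
        consumerProps.put("max.partition.fetch.bytes", "10485760"); // 10 MB per partition
        consumerProps.put("fetch.max.bytes", "52428800");           // 50 MB per fetch request

        // Broker side, in server.properties (not Java code):
        //   message.max.bytes=10485760        largest record batch the broker accepts
        //   replica.fetch.max.bytes=10485760  must be >= message.max.bytes so followers can replicate
        // Topic-level override:
        //   max.message.bytes=10485760
        // (The producer-side "max.request.size" may also need raising.)
        System.out.println(consumerProps);
    }
}
```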

Slide 4

How to choose the number of partitions?
- More partitions → more throughput
- Work backwards from your throughput target: if processing a message takes 10 ms, one consumer handles ~100 messages/second per partition, so a target of 1000 messages/second requires 10 partitions (see the sketch below)
- More partitions → longer unavailability when a broker goes down (recovery time grows with the number of partitions)
- More partitions → higher latency and longer replication time
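
A sketch of that calculation applied when creating a topic with the Java AdminClient; the topic name, broker address, and per-partition throughput are assumptions for illustration:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        // 10 ms per message -> ~100 messages/second per partition,
        // so a 1000 messages/second target needs ~10 partitions.
        int targetMsgPerSec = 1000;
        int perPartitionMsgPerSec = 100; // assumed, measure for your workload
        int partitions = (int) Math.ceil((double) targetMsgPerSec / perPartitionMsgPerSec);

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Replication factor 3, as discussed on the replication slides.
            NewTopic topic = new NewTopic("orders", partitions, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```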

Slide 5

Kafka Controller
- A "normal" broker with some extra responsibilities
- The first broker that registers itself as the controller in the cluster
- Responsible for electing partition leaders

Slide 6

Kafka Replication
- Replication is necessary for a reasonable production setup with satisfactory availability and durability
- The replication factor (set per topic at creation; the broker's "default.replication.factor" provides the default) determines how many times a partition is replicated, i.e. on how many brokers it will exist. 3 is a reasonable default
- A replication factor of 3 (N) lets you lose 2 (N-1) brokers while still being operational

Slide 7

Kafka Replication: Leader/Follower
- Leader replica: each partition has its own leader; all requests (produce/consume) go through the leader
- Follower replicas: they keep themselves up-to-date with the leader; if the leader goes down, one of the followers takes over

Slide 8

In-sync replicas
- Replicas that are "up-to-date" with the leader
- "replica.lag.time.max.ms" controls how far behind a replica may fall before it is no longer considered in-sync

Slide 9

In-sync replicas
- To ensure consistency, you might require data to be committed to more than one replica before a write is acknowledged: "min.insync.replicas" (enforced for producers using acks=all)
- When it is set to 2 and you have 3 brokers, you can lose only 1 broker; if you lose 2, it will no longer be possible to produce messages to the affected partitions (see the sketch below)
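
A sketch of setting the option on an existing topic with the Java AdminClient; the topic name and broker address are placeholders:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MinInsyncReplicasExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // With replication.factor=3 and min.insync.replicas=2, writes with
            // acks=all survive the loss of a single broker.
            AlterConfigOp setMinIsr = new AlterConfigOp(
                new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> configs =
                Map.of(topic, List.of(setMinIsr));
            admin.incrementalAlterConfigs(configs).all().get();
        }
    }
}
```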

Slide 10

Leader election
- Clean election: an in-sync replica is chosen as the new leader; the standard process
- Unclean election: no in-sync replica exists (e.g. 2 brokers are down and then the last one, the leader, goes down too)
- Unclean election is a difficult choice between consistency and availability: either accept possible message loss or keep the partition offline; configurable via "unclean.leader.election.enable"

Slide 11

Split-brain
- A controller goes down (network partition, stop-the-world GC pause) and, after coming back, still thinks it is the controller, even though a new one was elected in the meantime
- An epoch number (a monotonically increasing counter for controllers) is used to prevent split-brain: the highest one wins
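
A toy sketch of the idea (illustrative only, not Kafka's actual code): brokers remember the highest controller epoch they have seen and ignore requests stamped with an older one, so a "zombie" controller cannot cause damage:

```java
public class EpochFencing {
    private int highestSeenEpoch = 0;

    // Accept a controller request only if its epoch is current.
    public synchronized boolean accept(int controllerEpoch) {
        if (controllerEpoch < highestSeenEpoch) {
            return false; // stale controller: reject the request
        }
        highestSeenEpoch = controllerEpoch;
        return true;
    }

    public static void main(String[] args) {
        EpochFencing broker = new EpochFencing();
        System.out.println(broker.accept(1)); // true: current controller
        System.out.println(broker.accept(2)); // true: a new controller was elected
        System.out.println(broker.accept(1)); // false: the old controller came back
    }
}
```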

Slide 12

Zookeeper
- Not part of Kafka's core itself, but Kafka uses it
- Zookeeper is a service for maintaining shared configuration and coordination
- Kafka uses it e.g. for electing the controller and for keeping cluster membership info (which brokers are part of the cluster)
- Planned to be removed as a dependency (KIP-500, replaced by KRaft mode)

Slide 13

Producer's Reliability
- acks: 0/1/all - no acknowledgement / acknowledged by the leader only / acknowledged by all required in-sync replicas
- Error handling: if you don't want to lose messages, you need a retry strategy (see the sketch below)
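
A minimal sketch of a reliability-oriented configuration with the Java client; the topic, key, value, and broker address are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");                // wait for all required in-sync replicas
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("enable.idempotence", "true"); // avoid duplicates caused by retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "key", "value"),
                (metadata, exception) -> {
                    if (exception != null) {
                        // Retries are exhausted or the error is not retriable:
                        // log, alert, or persist the record for a later retry.
                        exception.printStackTrace();
                    }
                });
        }
    }
}
```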

Slide 14

Thanks!