kafka_for_rubyists_advanced_kafka.pdf

karol.galanciak

January 15, 2021
Transcript

  1. Tombstones
    - When you want to delete the messages stored under a given key in a partition
    - Just send a message with a null payload (do not send the string “null”, though)
    - GDPR-friendly feature ;)
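A toy model of the difference between a real tombstone and a "null" string. It assumes the topic is log-compacted, which is where tombstones take effect; `Record` and `compact` are hypothetical names for this sketch and are not part of any Kafka client.

```ruby
# Toy model of log compaction: keep only the latest record per key,
# then drop every key whose latest record is a tombstone (nil payload).
Record = Struct.new(:key, :payload)

def compact(log)
  latest = {}
  log.each { |record| latest[record.key] = record } # later records win
  latest.values.reject { |record| record.payload.nil? }
end

log = [
  Record.new("user-1", '{"email":"a@example.com"}'),
  Record.new("user-2", '{"email":"b@example.com"}'),
  Record.new("user-1", nil),    # tombstone: user-1 disappears after compaction
  Record.new("user-2", "null")  # NOT a tombstone: an ordinary 4-character payload
]

compact(log).each { |record| puts "#{record.key} => #{record.payload.inspect}" }
```

Only the nil payload removes the key; the "null" string simply overwrites the previous value for "user-2".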
  2. Publishing huge messages (in megabytes)
    - Do you really have to do this in the first place?
    - If yes, you need to adjust multiple settings:
      1. Consumer: "fetch.message.max.bytes"
      2. Broker: "replica.fetch.max.bytes"
      3. Broker: "message.max.bytes"
      4. Topic: "max.message.bytes" (per-topic override of the broker's "message.max.bytes")
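One way to keep these settings consistent is a small sanity check before deploying. The sketch below is only an illustration: the setting names come from the slide, while `SIZE_SETTINGS` and `undersized_settings` are hypothetical helpers, and the rule "every limit must be at least the largest message" is a simplification of how brokers and clients actually enforce the limits.

```ruby
# Setting names are taken from the slide; the helper itself is hypothetical.
SIZE_SETTINGS = %w[
  fetch.message.max.bytes
  replica.fetch.max.bytes
  message.max.bytes
  max.message.bytes
].freeze

# Returns the names of the settings that are smaller than the largest
# message we intend to publish (simplified rule, see the note above).
def undersized_settings(settings, largest_message_bytes)
  SIZE_SETTINGS.select { |name| settings.fetch(name) < largest_message_bytes }
end

settings = {
  "fetch.message.max.bytes" => 5 * 1024 * 1024,
  "replica.fetch.max.bytes" => 5 * 1024 * 1024,
  "message.max.bytes"       => 1_000_000, # forgotten: still a small value
  "max.message.bytes"       => 5 * 1024 * 1024
}

puts undersized_settings(settings, 4 * 1024 * 1024).inspect
```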
  3. How to choose the number of partitions?
    - More partitions - more throughput
    - What do you expect? For example, if processing a message takes 10 ms, a single consumer handles 100 messages/second, so to process 1000 messages/second you need 10 partitions (one consumer per partition)
    - More partitions - increased unavailability when a broker is down (proportional to the number of partitions)
    - More partitions - higher latency, increased replication time
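The sizing example above is simple arithmetic and can be sketched as a helper. `partitions_needed` is a hypothetical name, and the model assumes one consumer per partition that processes messages one at a time.

```ruby
# Minimum number of partitions so that consumers (one per partition, processing
# messages sequentially) can keep up with the target rate.
# Rational arithmetic avoids floating-point rounding surprises before ceil.
def partitions_needed(processing_ms_per_message:, target_messages_per_second:)
  Rational(target_messages_per_second * processing_ms_per_message, 1000).ceil
end

puts partitions_needed(processing_ms_per_message: 10, target_messages_per_second: 1000)
```

This gives a lower bound only; the trade-offs listed above (unavailability, latency, replication time) argue against going far beyond it.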
  4. Kafka Controller
    - A “normal” broker with some extra responsibilities
    - The first broker that registers itself as a controller in the cluster
    - Responsible for electing partition leaders
  5. Kafka Replication
    - Replication is necessary for a reasonable production setup with satisfactory availability and durability
    - Replication factor (set per topic at creation; the broker-level default is "default.replication.factor") determines how many times a partition is replicated (on how many brokers it will exist). 3 is a reasonable default
    - A replication factor of 3 (N) allows you to lose 2 (N-1) brokers while still being operational
  6. Kafka Replication: Leader/Follower
    - Leader replica: each partition has its own leader, all requests (produce/consume) go through the leader
    - Follower replicas: they ensure they are up-to-date with the leader. If the leader goes down, one of the followers will take over
  7. In-sync replicas
    - Replicas that are “up-to-date”
    - Configurable via “replica.lag.time.max.ms”: how long a follower may lag behind the leader and still be considered in-sync
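A toy model of that rule: a follower stays in the in-sync set while it has caught up with the leader recently enough. The helper name, the hash layout, and the timestamps are assumptions made for this sketch; the real broker tracks this internally.

```ruby
# last_caught_up_ms_by_broker maps a broker id to the time (in ms) at which
# that replica was last fully caught up with the leader.
def in_sync_replicas(last_caught_up_ms_by_broker, now_ms:, replica_lag_time_max_ms:)
  last_caught_up_ms_by_broker
    .select { |_broker, caught_up_ms| now_ms - caught_up_ms <= replica_lag_time_max_ms }
    .keys
end

caught_up = { 1 => 100_000, 2 => 95_000, 3 => 60_000 }
puts in_sync_replicas(caught_up, now_ms: 100_000, replica_lag_time_max_ms: 30_000).inspect
```

Broker 3 last caught up 40 seconds ago, which exceeds the 30-second limit used here, so it drops out of the in-sync set.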
  8. In-sync replicas
    - For ensuring consistency, you might choose to require data to be committed to more than one replica: “min.insync.replicas”
    - When it’s set to 2 and you have 3 brokers, you can lose only 1 broker; if you lose 2, it will no longer be possible to produce messages to the affected partitions
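The availability rule above can be sketched as a predicate. It assumes producers use acks=all (the mode in which “min.insync.replicas” is enforced) and that every lost broker hosted a replica of the partition; `can_produce?` is a hypothetical helper.

```ruby
# A partition accepts writes (from producers using acks=all) only while the
# number of in-sync replicas is at least min.insync.replicas.
def can_produce?(replication_factor:, min_insync_replicas:, brokers_lost:)
  remaining_in_sync = [replication_factor - brokers_lost, 0].max
  remaining_in_sync >= min_insync_replicas
end

(0..3).each do |lost|
  ok = can_produce?(replication_factor: 3, min_insync_replicas: 2, brokers_lost: lost)
  puts "brokers lost: #{lost}, can produce: #{ok}"
end
```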
  9. Leader election
    - Clean election - when an in-sync replica is chosen as the new leader; the standard process
    - Unclean election - when no in-sync replica exists (e.g. 2 brokers are down and then the last one, the leader, goes down)
    - Unclean election - a difficult choice, consistency vs. availability (we can lose messages or decide to keep the partition offline)
    - Configurable via "unclean.leader.election.enable"
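The decision described above can be sketched as a small function. This is a simplified model, not the controller's actual algorithm: `elect_leader` and its arguments are hypothetical, and candidates are simply taken in replica-list order.

```ruby
# Returns [leader, kind] where kind is :clean, :unclean or :offline.
def elect_leader(replicas:, in_sync:, alive:, unclean_enabled:)
  # Clean election: prefer a live replica that is still in the in-sync set.
  candidate = replicas.find { |broker| in_sync.include?(broker) && alive.include?(broker) }
  return [candidate, :clean] if candidate

  # Unclean election: any live replica, accepting possible message loss.
  if unclean_enabled
    candidate = replicas.find { |broker| alive.include?(broker) }
    return [candidate, :unclean] if candidate
  end

  # Consistency over availability: the partition stays offline.
  [nil, :offline]
end

# Brokers 1 and 2 fell out of sync, then the leader (3) went down and 1 came back.
p elect_leader(replicas: [1, 2, 3], in_sync: [3], alive: [1], unclean_enabled: true)
p elect_leader(replicas: [1, 2, 3], in_sync: [3], alive: [1], unclean_enabled: false)
```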
  10. Split-brain
    - One controller goes down (network partition, stop-the-world GC pause) and, after coming back, still thinks it’s the controller, but in the meantime a new one was elected
    - Epoch number (monotonically increasing number for controllers) is used to prevent split-brain - the highest one wins
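Epoch-based fencing can be sketched in a few lines. `ToyBroker` and its method names are made up for this illustration; the only idea taken from the slide is that a request carrying a lower epoch than the highest one seen is ignored.

```ruby
# Toy broker that remembers the highest controller epoch it has seen and
# rejects requests from any controller with an older (stale) epoch.
class ToyBroker
  attr_reader :highest_epoch_seen

  def initialize
    @highest_epoch_seen = 0
  end

  def handle_controller_request(epoch)
    return :rejected_stale_epoch if epoch < @highest_epoch_seen

    @highest_epoch_seen = epoch
    :accepted
  end
end

broker = ToyBroker.new
puts broker.handle_controller_request(1) # original controller
puts broker.handle_controller_request(2) # newly elected controller
puts broker.handle_controller_request(1) # old controller back after its GC pause
```

The "zombie" controller with epoch 1 is fenced off because the broker has already accepted a request with epoch 2.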
  11. ZooKeeper
    - It’s not part of the Kafka “core” itself, but it’s used by Kafka
    - ZooKeeper - a service for maintaining shared configuration
    - It’s used e.g. for electing a controller or keeping info about cluster membership (which brokers are part of the cluster)
    - Planned to be removed as a dependency
  12. Producer’s Reliability
    - acks: 0/1/all - no acknowledgement / acknowledgement by the leader / acknowledgement by all required in-sync replicas
    - Error handling - if you don’t want to lose messages, you should retry somehow
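One possible shape for "retry somehow" is a bounded retry with exponential backoff. The sketch is client-agnostic: `DeliveryFailed`, `deliver_with_retries`, and the `sleeper` parameter are hypothetical, and the block stands in for whatever call your Kafka client uses to produce a message and wait for its acknowledgement.

```ruby
# Hypothetical error type; map your client's delivery errors onto it.
class DeliveryFailed < StandardError; end

# Runs the block until it succeeds or max_attempts is reached, sleeping
# base_delay, 2 * base_delay, 4 * base_delay, ... seconds between attempts.
def deliver_with_retries(max_attempts: 3, base_delay: 0.1, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue DeliveryFailed
    raise if attempt >= max_attempts # give up and surface the error

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage: the block fails twice and succeeds on the third attempt.
result = deliver_with_retries(base_delay: 0.01) do |attempt|
  raise DeliveryFailed, "broker unavailable" if attempt < 3

  :delivered
end
puts result
```

Retrying can deliver the same message more than once, so consumers should be prepared for duplicates.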