Slide 1

Slide 1 text

Kafka in Production
Andrey Panasyuk, @defascat

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

Remote Calls
Types
1. Synchronous calls
2. Asynchronous calls
Limitations
1. Peer-to-peer
2. Retries
3. Load balancing
4. Durability
5. Backpressure

Slide 4

Slide 4 text

Message Queues
1. External tool
2. Asynchronous communication protocol

Slide 5

Slide 5 text

Let's get to Kafka!!!

Slide 6

Slide 6 text

Apache Kafka
"Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a 'massively scalable pub/sub message queue architected as a distributed transaction log'."
Source: Wikipedia

Slide 7

Slide 7 text

Pub/Sub

Slide 8

Slide 8 text

Concepts. Log
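The log idea named on this slide can be sketched in a few lines of plain Java. This is only an in-memory model for illustration, not Kafka's storage code; the class `AppendOnlyLog` and its method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative in-memory model of a log: records are only ever appended,
// and each record is addressed by its offset (its position in the log).
// AppendOnlyLog is a hypothetical name, not a Kafka class.
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns the offset assigned to it.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Returns up to maxRecords records starting at fromOffset.
    // Reading removes nothing, so several readers can each keep
    // their own offset and re-read the same data.
    List<String> read(long fromOffset, int maxRecords) {
        if (fromOffset < 0 || fromOffset >= records.size() || maxRecords <= 0) {
            return Collections.emptyList();
        }
        int from = (int) fromOffset;
        int to = Math.min(records.size(), from + maxRecords);
        return new ArrayList<>(records.subList(from, to));
    }

    // Offset that the next appended record will receive.
    long endOffset() {
        return records.size();
    }
}
```

A reader that remembers only "my next offset" can stop and resume at any time, which is the property the later consumer slides rely on.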

Slide 9

Slide 9 text

Concepts. Data Flow

Slide 10

Slide 10 text

Concepts. Distributed Log

Slide 11

Slide 11 text

Concepts. Partitions
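One way to picture how a keyed message ends up in a partition is a hash of the key taken modulo the partition count. The sketch below is a simplified stand-in: `KeyPartitioner` is a made-up class, and it hashes with `Arrays.hashCode`, whereas Kafka's own default partitioner uses a different hash function (murmur2 over the serialized key).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the Kafka client.
class KeyPartitioner {
    // Maps a key to a partition index in the range [0, numPartitions).
    // The same key always maps to the same partition, which is what
    // gives per-key ordering inside a single partition.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Stand-in hash; assumed for illustration only.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }
}
```

Note that changing the number of partitions changes the mapping, so keys may move to a different partition after a topic is expanded.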

Slide 12

Slide 12 text

Concepts. Partitions to consumers
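Within one consumer group each partition is read by exactly one consumer. A minimal round-robin sketch of that mapping follows; `GroupAssignment` is a hypothetical class, and Kafka's real assignment strategies are more involved than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of spreading partitions over a consumer group.
class GroupAssignment {
    // Assigns partitions 0..numPartitions-1 to consumers in round-robin order.
    // Every partition gets exactly one owner; a consumer may own several
    // partitions, or none if there are more consumers than partitions.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            result.get(owner).add(partition);
        }
        return result;
    }
}
```

With three partitions and four consumers, the fourth consumer receives an empty list, so adding consumers beyond the partition count does not add read parallelism.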

Slide 13

Slide 13 text

Concepts. Architecture

Slide 14

Slide 14 text

I’ve heard all this in other presentations. Let’s get to it!

Slide 15

Slide 15 text

Kafka. Controller
1. One of the brokers
2. Manages the state of partitions
3. Manages the state of replicas
4. Partition manipulations
5. High availability

Slide 16

Slide 16 text

Kafka + ZooKeeper
1. Cluster membership
2. Electing leader
3. Topic configuration
4. Offsets for a Group/Topic/Partition combination

Slide 17

Slide 17 text

Kafka. Guarantees
1. Delivery guarantees
   a. At least once (by default)
   b. At most once
   c. Exactly once
2. Fault-tolerance vs latency
   a. No ack (acks=0)
   b. Acks from leader (acks=1)
   c. Acks from followers (acks=all)
3. Message order in a single partition
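The difference between at-least-once and at-most-once comes down to whether the consumer commits its offset after or before processing a record. The simulation below is a toy model in plain Java, with an invented class `DeliverySemanticsDemo` and a single simulated crash; it does not use the Kafka client. Exactly-once needs more (idempotent processing or transactions) and is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Kafka code: one consumer, one partition, one crash.
class DeliverySemanticsDemo {
    // Reads `log` from offset 0, crashes once at crashAtOffset between its
    // two steps (process and commit), then restarts from the committed offset.
    // Returns the records that were actually processed, in order.
    static List<String> run(List<String> log, int crashAtOffset, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        boolean crashed = false;
        int offset = committed;
        while (offset < log.size()) {
            if (commitBeforeProcessing) {
                committed = offset + 1;             // step 1: commit
                if (!crashed && offset == crashAtOffset) {
                    crashed = true;
                    offset = committed;             // restart: this record is skipped
                    continue;
                }
                processed.add(log.get(offset));     // step 2: process
            } else {
                processed.add(log.get(offset));     // step 1: process
                if (!crashed && offset == crashAtOffset) {
                    crashed = true;
                    offset = committed;             // restart: this record is seen again
                    continue;
                }
                committed = offset + 1;             // step 2: commit
            }
            offset++;
        }
        return processed;
    }
}
```

For a log of `a, b, c` with a crash at offset 1, committing after processing yields `a, b, b, c` (a duplicate, nothing lost), while committing before processing yields `a, c` (no duplicate, one record lost).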

Slide 18

Slide 18 text

Kafka. Adding a broker
1. Adds a new machine into ISR
2. Starts rebalancing partitions (if automatic rebalance is enabled)
   a. Too many partitions can cause an issue
3. Notifies consumers
4. Notifies producers

Slide 19

Slide 19 text

Kafka. Failure Scenarios
1. In-Sync Replicas (ISR)
2. Leader election
3. CAP
   a. Partition Tolerance
   b. Availability
   c. Consistency*
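The interplay of in-sync replicas, leader election, and the consistency/availability trade-off can be sketched as a small selection function. This is a simplified model under stated assumptions, not the controller's actual code: `LeaderElection`, its parameters, and the `-1` "offline" convention are all invented for the example.

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of choosing a partition leader.
class LeaderElection {
    // Returns the broker id of the new leader, or -1 if the partition
    // has to stay offline.
    //   replicas     - brokers hosting the partition, in preference order
    //   isr          - replicas known to be fully caught up
    //   liveBrokers  - brokers currently alive
    //   allowUnclean - if true, fall back to a live replica outside the ISR,
    //                  trading possible data loss for availability
    static int electLeader(List<Integer> replicas, Set<Integer> isr,
                           Set<Integer> liveBrokers, boolean allowUnclean) {
        for (int replica : replicas) {
            if (isr.contains(replica) && liveBrokers.contains(replica)) {
                return replica; // consistent choice: no acknowledged data is lost
            }
        }
        if (allowUnclean) {
            for (int replica : replicas) {
                if (liveBrokers.contains(replica)) {
                    return replica; // available, but may miss recent records
                }
            }
        }
        return -1;
    }
}
```

The `allowUnclean` flag is where the asterisk on consistency shows up: refusing an out-of-sync leader keeps the data consistent but leaves the partition unavailable until an in-sync replica returns.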

Slide 20

Slide 20 text

I’m a Java Developer. Show me the code!

Slide 21

Slide 21 text

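Kafka. Producer
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    // brokers, userId and steps are String variables defined elsewhere
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", brokers);
    properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    // KafkaProducer sends ProducerRecord; KeyedMessage belongs to the old Scala producer
    ProducerRecord<String, String> data = new ProducerRecord<>("sync", userId, steps);
    producer.send(data);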

Slide 22

Slide 22 text

Kafka. Real-world Producers
1. Topic name validation
2. Adding metrics
3. Adding default metadata

Slide 23

Slide 23 text

Kafka. Message availability

Slide 24

Slide 24 text

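Kafka. Consumer
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    // brokers and groupId are String variables defined elsewhere
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", brokers);
    properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.setProperty("group.id", groupId);
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
    // subscribe() takes a collection of topic names
    consumer.subscribe(Collections.singletonList("sync"));
    while (true) {
        // poll(Duration) needs client 2.0+; older clients use poll(100)
        consumer.poll(Duration.ofMillis(100))
                .forEach(r -> System.out.println(r.key() + ": " + r.value()));
    }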

Slide 25

Slide 25 text

Kafka. Real-world Consumers
1. Metrics
2. Invalid message queue
3. Separating message processing in KafkaMessageProcessor
4. Different implementations
   a. 1 thread for all partitions vs 1 thread per partition
   b. Autocommit
   c. Poll periods
   d. Batch support
   e. Rebalancing considerations
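Items 2 and 3 of the list above can be sketched together: the polling loop delegates each record to a processor, and records that fail are set aside instead of blocking the partition. The slide names `KafkaMessageProcessor` but does not show it, so the interface signature here is an assumption, as are `SafeBatchHandler` and the in-memory list that stands in for a real invalid message queue (for example, a separate topic).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Assumed shape of the processor named on the slide; the real one is not shown.
interface KafkaMessageProcessor {
    void process(String key, String value) throws Exception;
}

// Hypothetical wrapper that keeps bad records from stopping consumption.
class SafeBatchHandler {
    // Stand-in for an invalid message queue; a real setup would likely
    // publish these records somewhere durable.
    final List<Map.Entry<String, String>> invalidMessages = new ArrayList<>();
    private final KafkaMessageProcessor processor;

    SafeBatchHandler(KafkaMessageProcessor processor) {
        this.processor = processor;
    }

    // Processes one polled batch and returns how many records succeeded.
    // A record that throws is parked in invalidMessages and skipped.
    int handleBatch(List<Map.Entry<String, String>> batch) {
        int succeeded = 0;
        for (Map.Entry<String, String> record : batch) {
            try {
                processor.process(record.getKey(), record.getValue());
                succeeded++;
            } catch (Exception e) {
                invalidMessages.add(record);
            }
        }
        return succeeded;
    }
}
```

Keeping the processor behind an interface also makes it testable without a broker, since a batch can be any list of key/value pairs.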

Slide 26

Slide 26 text

Kafka. Serialization
    public interface Deserializer<T> {
        public void configure(Map<String, ?> configs, boolean isKey);
        public T deserialize(String topic, byte[] data);
        public void close();
    }
    public interface Serializer<T> {
        public void configure(Map<String, ?> configs, boolean isKey);
        public byte[] serialize(String topic, T data);
        public void close();
    }
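A matching serializer/deserializer pair could look roughly like the following. To keep the sketch compilable without the Kafka client on the classpath, it declares local copies of the two interfaces from the slide; in a real project the classes would implement the interfaces from `org.apache.kafka.common.serialization` instead. `StepCountSerializer` and `StepCountDeserializer` are hypothetical names, and Kafka already ships an integer serializer, so this is purely illustrative.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Local stand-ins mirroring the interfaces on the slide (assumption:
// used here only so the example compiles on its own).
interface Serializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);
    byte[] serialize(String topic, T data);
    void close();
}

interface Deserializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);
    T deserialize(String topic, byte[] data);
    void close();
}

// Encodes an Integer as 4 big-endian bytes.
class StepCountSerializer implements Serializer<Integer> {
    public void configure(Map<String, ?> configs, boolean isKey) { }

    public byte[] serialize(String topic, Integer data) {
        if (data == null) {
            return null; // null values pass through unchanged
        }
        return ByteBuffer.allocate(4).putInt(data).array();
    }

    public void close() { }
}

// Decodes the 4-byte form produced by StepCountSerializer.
class StepCountDeserializer implements Deserializer<Integer> {
    public void configure(Map<String, ?> configs, boolean isKey) { }

    public Integer deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        if (data.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + data.length);
        }
        return ByteBuffer.wrap(data).getInt();
    }

    public void close() { }
}
```

The length check in the deserializer is the kind of place where a malformed record surfaces, which is one reason to pair custom deserializers with an invalid message queue.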

Slide 27

Slide 27 text

Kafka. Consumer Failure
1. Wait for ZooKeeper timeout
2. Controller processes event from ZooKeeper
3. Controller notifies consumers
4. Consumers select new partition consumer

Slide 28

Slide 28 text

Do you really have all this mess working?

Slide 29

Slide 29 text

Kafka. Corporate Challenge Usages
1. User Sync Processing
2. Analytics

Slide 30

Slide 30 text

Kafka. Our Deployment
1. Yahoo kafka-manager
2. MirrorMaker

Slide 31

Slide 31 text

Kafka. Practices
1. Topics are created manually on prod, automatically on QA environments
2. Do not delete topics (KAFKA-1397, KAFKA-2937, KAFKA-4834, ...)
3. Invalid message queue (IMQ) implementation
4. Use identical versions on all brokers

Slide 32

Slide 32 text

Kafka. Tuning
1. 20-100 brokers per cluster; hard limit of 10,000 partitions per cluster (Netflix)
2. Increase replica.lag.time.max.ms and replica.lag.max.messages
3. Increase num.replica.fetchers
4. Reduce retention
5. Increase rebalance.max.retries, rebalance.backoff.ms

Slide 33

Slide 33 text

Monitoring And Alerting
1. Consumer metrics
2. Producer metrics
3. Kafka Broker metrics
4. ZooKeeper metrics
5. PagerDuty alerts

Slide 34

Slide 34 text

Current State. Message Input Rate

Slide 35

Slide 35 text

Current State. Producer Latency

Slide 36

Slide 36 text

Let's wrap this up!

Slide 37

Slide 37 text

Kafka. Extension Points
● Storages
  ○ Amazon S3 (Sink)
  ○ Files (Source)
  ○ Elasticsearch (Sink)
  ○ HDFS (Sink)
  ○ JDBC (Source, Sink)
  ○ Cassandra (Sink)
  ○ PostgreSQL (Sink)
  ○ Oracle/MySQL/MSSQL (Sink)
  ○ Vertica (Source, Sink)
  ○ Ignite (Source, Sink)
● Protocols/Queues
  ○ MQTT (Source)
  ○ SQS (Source)
  ○ JMS (Sink)
  ○ RabbitMQ (Source)
● Others
  ○ Mixpanel (Sink)

Slide 38

Slide 38 text

Alternatives. ActiveMQ
1. Pros
   a. Simplicity
   b. Far richer feature set (standard protocols, TTLs, in-memory)
   c. DLQ
   d. Extension points
2. Cons
   a. Delivery guarantees
   b. Losing messages under high load
   c. Failure handling scenarios
   d. Throughput in transactional mode

Slide 39

Slide 39 text

Alternatives. RabbitMQ
● Pros
  ○ Simpler to start
  ○ More features
    ■ Ability to query/filter
    ■ Federated queues
    ■ Sophisticated routing
  ○ Plugins
● Cons
  ○ Scales mostly vertically
  ○ Assumes consumers are mostly online
  ○ Delivery guarantees are less rich

Slide 40

Slide 40 text

Kafka. Strengths and Weaknesses
1. Strengths
   a. Horizontal scalability
   b. Rich delivery guarantee models
   c. Disk persistence
2. Weaknesses
   a. Need for ZooKeeper
   b. Lack of any kind of backpressure
   c. Lack of useful features other queues have
   d. Lack of any kind of DLQ
   e. Limited number of extension points
   f. Complex internal protocols
   g. Too smart clients
