Slide 1

Slide 1 text

MULTI-DATACENTER KAFKA

Slide 2

Slide 2 text

EKO KURNIAWAN KHANNEDY
▸ Senior Principal R&D Engineer at Blibli.com
▸ JVM User Group Indonesia Committee
▸ Blogger & Vlogger

Slide 3

Slide 3 text

BLIBLI.COM IS ONE OF THE BIGGEST B2C ECOMMERCE PLATFORMS IN INDONESIA

Slide 4

Slide 4 text

BLIBLI.COM SYSTEM
▸ BLIBLI.COM (ECOMMERCE)
▸ BLIPAY (EWALLET, PAYMENT)

Slide 5

Slide 5 text

MESSAGE BROKER IN BLIBLI.COM
▸ Used RABBITMQ from the beginning
▸ Started using KAFKA in 2015
▸ Deprecated RABBITMQ in 2017 because of scalability issues
▸ Still migrating from RABBITMQ to KAFKA today

Slide 6

Slide 6 text

MESSAGE BROKER IN BLIPAY
▸ Started developing the Blipay system in 2017
▸ Uses Event-Driven Architecture (no HTTP calls between services; all communication goes through the Message Broker)
▸ KAFKA is the only Message Broker in Blipay
▸ Regulation in Indonesia requires Blipay to run in multiple datacenters for disaster recovery

Slide 7

Slide 7 text

AND THIS IS OUR JOURNEY USING KAFKA

Slide 8

Slide 8 text

BLIPAY ARCHITECTURE
[Diagram: services BLIPAY-API, BLIPAY-MEMBER, BLIPAY-MERCHANT, BLIPAY-TRANSACTION, BLIPAY-PAYMENT, BLIPAY-NOTIFICATION and BLIPAY-PROMOTION connected through APACHE KAFKA; data stores shown are MONGODB, POSTGRESQL and REDIS]

Slide 9

Slide 9 text

ALL COMMUNICATION USING KAFKA
[Diagram: BLIPAY-API, BLIPAY-MEMBER, BLIPAY-MERCHANT, BLIPAY-TRANSACTION, BLIPAY-PAYMENT, BLIPAY-NOTIFICATION and BLIPAY-PROMOTION around APACHE KAFKA; flow labels: 1. HTTP Request, 2. HTTP Request, 3. Some Events, 4. SaveBalanceEvent, 4. SaveTransactionEvent]
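
To make the event-driven flow concrete, here is a minimal sketch in which services never call each other directly and only publish and handle events. The `InMemoryBroker` class, the handler functions and the event payloads are illustrative assumptions, not Blipay's actual code; only the event names `SaveBalanceEvent` and `SaveTransactionEvent` come from the slide.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for Kafka: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Append to the topic log, then hand the event to every subscriber.
        self.logs[topic].append(event)
        for handler in self.subscribers[topic]:
            handler(event)


def run_demo():
    broker = InMemoryBroker()
    balances = {}
    transactions = []

    # Hypothetical handlers; which real service owns each event is not
    # recoverable from the slide.
    def save_balance(event):
        member = event["member"]
        balances[member] = balances.get(member, 0) + event["amount"]

    def save_transaction(event):
        transactions.append(event["id"])

    broker.subscribe("SaveBalanceEvent", save_balance)
    broker.subscribe("SaveTransactionEvent", save_transaction)

    # A service that received an HTTP request emits events instead of
    # calling the other services over HTTP.
    broker.publish("SaveTransactionEvent", {"id": "tx-1", "member": "m-1", "amount": 50})
    broker.publish("SaveBalanceEvent", {"member": "m-1", "amount": 50})
    return balances, transactions
```

The publisher only knows topic names, so a new consumer can be added without touching the service that emits the event.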

Slide 10

Slide 10 text

WE ARE HAPPY USING EVENT-DRIVEN ARCHITECTURE

Slide 11

Slide 11 text

UNTIL WE NEED TO DEPLOY TO MULTI-DATACENTER

Slide 12

Slide 12 text

SOLUTION NO 1
[Diagram: two datacenters, each with MICROSERVICES, an APACHE KAFKA CLUSTER and DATABASES; a MIRROR MAKER in each datacenter copies messages to the other Kafka cluster]

Slide 13

Slide 13 text

PROBLEM WITH SOLUTION NO 1
▸ INFINITE LOOP
▸ Mirror Maker in datacenter 1 sends a message to datacenter 2
▸ Mirror Maker in datacenter 2 receives the message and sends it back to datacenter 1
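
The loop can be reproduced with a small simulation. This is a sketch under stated assumptions: each cluster is modelled as a plain list, both mirrors copy the same topic name in both directions, and `mirror_once` and `simulate` are hypothetical helper names, not Mirror Maker APIs.

```python
def mirror_once(source, target, already_copied):
    """Copy every message the mirror has not seen yet; return how many."""
    new_messages = source[already_copied:]
    target.extend(new_messages)
    return len(new_messages)


def simulate(rounds):
    dc1 = ["PaymentCreated"]  # one hypothetical event produced in datacenter 1
    dc2 = []
    copied_1_to_2 = 0
    copied_2_to_1 = 0
    for _ in range(rounds):
        # Mirror in datacenter 1 pushes to datacenter 2 ...
        copied_1_to_2 += mirror_once(dc1, dc2, copied_1_to_2)
        # ... and the mirror in datacenter 2 pushes the copy straight back.
        copied_2_to_1 += mirror_once(dc2, dc1, copied_2_to_1)
    return len(dc1), len(dc2)
```

A single produced message grows by one copy per round in each cluster, so `simulate(3)` returns `(4, 3)` and the counts never settle.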

Slide 14

Slide 14 text

SOLUTION NO 2
[Diagram: two datacenters, each with MICROSERVICES, an APACHE KAFKA CLUSTER and DATABASES; a single MIRROR MAKER between the two Kafka clusters]

Slide 15

Slide 15 text

PROBLEM WITH SOLUTION NO 2
▸ All services in all datacenters consume the message
▸ For blipay-notification, we received duplicate SMS & email
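
A minimal sketch of why the notification is sent twice, assuming one notification service per datacenter that reads its local cluster. The function names and the `otp-123` message are made up for illustration.

```python
def deliver_notifications(clusters):
    """Each datacenter's notification service sends one SMS per local message."""
    sent = []
    for dc_name, log in clusters.items():
        for message in log:
            sent.append((dc_name, message))
    return sent


def simulate_one_way_mirror():
    dc1 = ["otp-123"]  # produced once in datacenter 1
    dc2 = list(dc1)    # one-way mirror: datacenter 2 holds a copy
    return deliver_notifications({"dc1": dc1, "dc2": dc2})
```

The message was produced once, yet both datacenters act on it, so the customer gets the same SMS twice.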

Slide 16

Slide 16 text

SOLUTION NO 3
[Diagram: two datacenters, each with MICROSERVICES, an APACHE KAFKA CLUSTER and DATABASES; Database Mirroring between the databases, no Mirror Maker]

Slide 17

Slide 17 text

PROBLEM WITH SOLUTION NO 3
▸ Not all services use a database
▸ Some services need to react to events, like blipay-notification

Slide 18

Slide 18 text

SOLUTION NO 4
[Diagram: two datacenters, each with MICROSERVICES, an APACHE KAFKA CLUSTER and DATABASES; Database Mirroring between the databases and a MIRROR MAKER between the Kafka clusters]

Slide 19

Slide 19 text

PROBLEM WITH SOLUTION NO 4
▸ All services in all datacenters consume the message
▸ We still get duplicate SMS & email
▸ And now we get DUPLICATE PRIMARY KEY errors, because a row arrives through database sync and the service then inserts it again when it receives the message from Kafka
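
The duplicate primary key error can be reproduced with an in-memory SQLite database standing in for the datacenter 2 database. This is only a sketch: the table layout, the `tx-1` row and the function name are assumptions, and the real services use MongoDB and PostgreSQL rather than SQLite.

```python
import sqlite3


def simulate_duplicate_key():
    dc2_db = sqlite3.connect(":memory:")
    dc2_db.execute("CREATE TABLE transactions (id TEXT PRIMARY KEY, amount INTEGER)")

    # Step 1: database mirroring has already copied the row from datacenter 1.
    dc2_db.execute("INSERT INTO transactions VALUES (?, ?)", ("tx-1", 50))

    # Step 2: the service in datacenter 2 consumes the mirrored Kafka message
    # and tries to save the same transaction again.
    event = {"id": "tx-1", "amount": 50}
    try:
        dc2_db.execute(
            "INSERT INTO transactions VALUES (?, ?)", (event["id"], event["amount"])
        )
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate primary key"
    finally:
        dc2_db.close()
```

The same record reaches datacenter 2 through two independent paths, and the second write is rejected.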

Slide 20

Slide 20 text

SOLUTION NO 5
[Diagram: two datacenters, each with MICROSERVICES, an APACHE KAFKA CLUSTER and DATABASES; Database Mirroring between the databases, a MIRROR MAKER between the Kafka clusters, and a "Block Network" rule between MICROSERVICES and KAFKA in one datacenter]

Slide 21

Slide 21 text

RESULT WITH SOLUTION NO 5
▸ No duplicate primary key
▸ No duplicate SMS & email
▸ And we are happy :D

Slide 22

Slide 22 text

IT'S TIME FOR A SIMULATION

Slide 23

Slide 23 text

DISASTER RECOVERY SIMULATION
▸ Shut down datacenter 1
▸ Switch all traffic to datacenter 2
▸ Unblock the microservices' network access to Kafka

Slide 24

Slide 24 text

AND THE RESULTS ARE ...
▸ All microservices start consuming all Kafka messages from the beginning
▸ Duplicate Primary Key Error EVERYWHERE!
▸ We receive SMS & email again from the beginning

Slide 25

Slide 25 text

WHY DID THIS HAPPEN?

Slide 26

Slide 26 text

ANATOMY OF KAFKA TOPIC
[Diagram: a topic split into PARTITION 1, PARTITION 2, PARTITION 3 and PARTITION 4]

Slide 27

Slide 27 text

ANATOMY OF KAFKA PARTITION

Slide 28

Slide 28 text

KAFKA MIRROR MAKER ONLY COPIES MESSAGES TO THE OTHER KAFKA CLUSTER, NOT THE CONSUMER OFFSETS
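
A small simulation of the failover shows the effect. It assumes a consumer group with no committed offset starts from the earliest message (the behaviour of `auto.offset.reset=earliest`, which matches what was observed). The `Cluster` class and the helper functions are illustrative names, not Kafka APIs.

```python
class Cluster:
    def __init__(self):
        self.log = []        # messages of one partition
        self.committed = {}  # consumer group -> next offset to read


def consume(cluster, group):
    # No committed offset for this group: fall back to the earliest offset.
    start = cluster.committed.get(group, 0)
    messages = cluster.log[start:]
    cluster.committed[group] = len(cluster.log)
    return messages


def mirror_messages(source, target):
    # Only the messages are copied; committed offsets stay behind.
    target.log = list(source.log)


def simulate_failover():
    dc1, dc2 = Cluster(), Cluster()
    dc1.log.extend(["m0", "m1", "m2"])
    consume(dc1, "blipay-notification")          # everything processed in dc1
    mirror_messages(dc1, dc2)
    return consume(dc2, "blipay-notification")   # first read after failover
```

After the switch, the group is unknown to the second cluster, so all three already-processed messages are delivered again.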

Slide 29

Slide 29 text

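SOLUTION NO 6
[Diagram: two datacenters, each with MICROSERVICES, an APACHE KAFKA CLUSTER and DATABASES; Database Mirroring between the databases, a MIRROR MAKER and an OFFSET SYNC between the Kafka clusters, and a "Block Network" rule between MICROSERVICES and KAFKA in one datacenter]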
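
The slides do not show how the offset sync was built, so the following is only a sketch of the idea. It assumes, loudly, that a message has the same offset in both clusters, which a real mirrored topic does not guarantee. `sync_offsets` and `replay_after_failover` are hypothetical helpers.

```python
def sync_offsets(source_committed, target_committed):
    """Copy committed offsets to the other cluster, never moving a group backwards."""
    for group, offset in source_committed.items():
        target_committed[group] = max(offset, target_committed.get(group, 0))
    return target_committed


def replay_after_failover(log, committed, group):
    """Messages a group would receive on its first read after the switch."""
    return log[committed.get(group, 0):]
```

With a mirrored log `["m0", "m1", "m2"]` and a group that committed offset 3 in datacenter 1, syncing first means nothing is replayed after the switch, whereas without the sync the whole log is consumed again.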

Slide 30

Slide 30 text

LET'S RUN THE SIMULATION AGAIN!

Slide 31

Slide 31 text

AND THE RESULTS ARE ...
▸ No duplicate primary key
▸ No duplicate SMS & email
▸ And we are happy :D

Slide 32

Slide 32 text

BUT THERE IS ONE MORE THING

Slide 33

Slide 33 text

EVERY MONTH WE DEPLOY NEW FEATURES

Slide 34

Slide 34 text

AND THIS IS WHAT HAPPENS WHEN WE DEPLOY NEW FEATURES
▸ We start receiving SMS & email about Blipay from the beginning, but not all SMS & email
▸ Some services start reconsuming Kafka messages from the beginning, but again, not all messages
▸ Now we are getting confused

Slide 35

Slide 35 text

LONG STORY SHORT: IT'S BECAUSE OF KAFKA OFFSET RETENTION

Slide 36

Slide 36 text

DEFAULT IS 1 DAY

Slide 37

Slide 37 text

SO IF AN OFFSET SEES NO NEW COMMIT FOR 1 DAY, KAFKA DELETES IT AND THE CONSUMER STARTS AGAIN FROM THE BEGINNING
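
The broker setting behind this is `offsets.retention.minutes`, whose default was 1440 minutes (1 day) before Kafka 2.0 and is 7 days since. The sketch below models the expiry rule in a simplified way. It assumes the consumer falls back to the earliest offset, and that the earliest offset is still 0 because topic retention has not removed anything yet. `starting_offset` is a hypothetical helper.

```python
ONE_DAY_MINUTES = 24 * 60


def starting_offset(committed_offset, minutes_since_commit,
                    offsets_retention_minutes=ONE_DAY_MINUTES):
    """Return the offset a restarted consumer resumes from."""
    if minutes_since_commit > offsets_retention_minutes:
        # The stored offset has expired, so the consumer starts over.
        return 0
    return committed_offset
```

A quiet topic whose last commit is two days old restarts at 0, while a busy topic committed an hour ago resumes where it stopped. That would explain why only some messages were consumed again after a deployment.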

Slide 38

Slide 38 text

SO WE INCREASED OFFSET RETENTION TO BE LONGER THAN TOPIC RETENTION
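
A quick way to check this rule, assuming topic retention is set through `log.retention.hours` and offset retention through `offsets.retention.minutes`. The helper names and the 24-hour safety margin are illustrative choices, not values from the talk.

```python
def offsets_outlive_topic(offsets_retention_minutes, log_retention_hours):
    """True when committed offsets are kept longer than the messages themselves."""
    return offsets_retention_minutes > log_retention_hours * 60


def minimum_offsets_retention_minutes(log_retention_hours, safety_margin_hours=24):
    """Suggest an offsets.retention.minutes value above the topic retention."""
    return (log_retention_hours + safety_margin_hours) * 60
```

With a 7-day topic retention (168 hours), a 1-day offset retention fails the check, and the suggested minimum is 11520 minutes (8 days). Once offsets outlive the messages, an expired offset can no longer trigger a replay of messages that still exist.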

Slide 39

Slide 39 text

HTTPS://WWW.BLIBLI.COM/PAGE/KARIR/