Slide 1

Slide 1 text

© Hitachi, Ltd. 2021. All rights reserved. Apache Kafka for Microservices 2021.10.29 Hitachi Ltd., R&D Group Kiminori Kurihara OSS Tech Talk

Slide 2

Slide 2 text

© Hitachi, Ltd. 2021. All rights reserved. Summary Transactional Outbox pattern: one of methods for achieving distributed transaction. Kafka Connectivity ➢ Apache Kafka can easily connect to various data sources. ➢ Consistency with Kafka & DB event is useful for microservices. Use in microservices Development Migration Strangler Fig pattern: synchronizing DB for update and DB for query (CQRS pattern). Change data Caption(CDC) Kafka and Debezium • Low code: Using provided image & config files, Kafka connector can be built. • Source & Sink Connector: Data into Kafka (Source) & from Kafka (Sink) connector for data integration. • Various Data source: Self-developed service, managed service, DB and so on. Kafka Connector Kafka can connect various data sources.

Slide 3

Slide 3 text

© Hitachi, Ltd. 2021. All rights reserved. Agenda 1. Background 2. Apache Kafka 1. Apache Kafka Overview 2. High scalability 3. Fault tolerance 4. Loosely coupled 5. Connectivity 3. Distributed transaction in microservices 1. Database per service 2. Problem: Consistency 3. Solution: Distributed transaction 1. Saga pattern 2. TCC pattern 4. Saga vs. TCC 4. Using Kafka in microservices 1. Problem in Saga pattern 2. Transactional Outbox pattern 3. Debezium + Kafka 4. Transactional Outbox pattern with Debezium 1. Sample 5. Other use case 1. Strangler Fig pattern 5. Conclusion

Slide 4

Slide 4 text

© Hitachi, Ltd. 2021. All rights reserved. Agenda 1. Background 2. Apache Kafka 1. Apache Kafka Overview 2. High scalability 3. Fault tolerance 4. Loosely coupled 5. Connectivity 3. Distributed transaction in microservices 1. Database per service 2. Problem: Consistency 3. Solution: Distributed transaction 1. Saga pattern 2. TCC pattern 4. Saga vs. TCC 4. Using Kafka in microservices 1. Problem in Saga pattern 2. Transactional Outbox pattern 3. Debezium + Kafka 4. Transactional Outbox pattern with Debezium 1. Sample 5. Other use case 1. Strangler Fig pattern 5. Conclusion

Slide 5

Slide 5 text

© Hitachi, Ltd. 2021. All rights reserved. 1. Background ➢ Why can Apache Kafka be adopted for the hub of microservices architecture? https://www.confluent.jp/what-is-apache-kafka/ Microservices Architecture • Divide the system into smaller units called services and loosely couple them. Overview • Improve the agility of business speed by allowing systems to change flexibly. Merit Hub • There are many examples of microservices using Apache Kafka as a hub. Why?

Slide 6

Slide 6 text

© Hitachi, Ltd. 2021. All rights reserved. Agenda 1. Background 2. Apache Kafka 1. Apache Kafka Overview 2. High scalability 3. Fault tolerance 4. Loosely coupled 5. Connectivity 3. Distributed transaction in microservices 1. Database per service 2. Problem: Consistency 3. Solution: Distributed transaction 1. Saga pattern 2. TCC pattern 4. Saga vs. TCC 4. Using Kafka in microservices 1. Problem in Saga pattern 2. Transactional Outbox pattern 3. Debezium + Kafka 4. Transactional Outbox pattern with Debezium 1. Sample 5. Other use case 1. Strangler Fig pattern 5. Conclusion

Slide 7

Slide 7 text

© Hitachi, Ltd. 2021. All rights reserved. 2-1. Apache Kafka Overview ➢ Apache Kafka is one of the most famous OSSs for distributed event streaming platform. • Kafka can easily handle dozens of nodes and distributed processing by "partitions" 1. High scalability 1 2 3 4 5 • One message is distributed into several partitions. • Messages can be sent by several semantics. 2. Fault tolerance • Services are loosely coupled by using Kafka as message hub. • Kafka has a rebalancing function for service configuration changes. 3. Loosely coupled • Kafka ecosystem provides low-code integration functions to connect to external components. 4. Connectivity Used by thousands of companies for • high-performance data pipelines • streaming analytics • data integration • mission-critical applications.

Slide 8

Slide 8 text

© Hitachi, Ltd. 2021. All rights reserved. 2-2. High scalability 1. Kafka has “Topic” consisted of several partitions. 1 2 3 4 5 ➢ Kafka distributes partitions across multiple brokers to hold messages for high scalability. Produces messages to Topic Subscribe messages from Topic 2. One partition is distributed to several broker nodes. 3. Messages are performed on a per-partition. -> Then the performance is scalable. Kafka messaging

Slide 9

Slide 9 text

© Hitachi, Ltd. 2021. All rights reserved. 2-3. Fault tolerance One message is replicated into several partitions against message lost. Message store 1 2 3 4 5 ➢ One message is distributed into several partitions. ➢ Messages can be sent by several semantics. Types of message semantics At most once Exactly once At least once A message is delivered to consumers at most once. -> the message can be lost. A message is delivered to consumers at least once. -> the message can be delivered many times. A message is delivered to consumers exactly once. Attention) this is guaranteed only inside Kafka. - A producer can produce the same message many times. - A consumer cannot send ACK to Kafka after processing the message (e.g. NW disconnection) Designing idempotent is necessary.

Slide 10

Slide 10 text

© Hitachi, Ltd. 2021. All rights reserved. 2-4. Loosely coupled 1 2 3 4 5 ➢ Services are loosely coupled by using Kafka as message hub. ➢ Kafka has a rebalancing function for configuration changes. Service configuration changes don’t influence other services. New service If there is not a messaging hub, connected services must change after adding new service. Rebalancing Kafka automatically adapts the configuration changes inside Service. Loose Coupling between services. Loose Coupling between service and Kafka.

Slide 11

Slide 11 text

© Hitachi, Ltd. 2021. All rights reserved. 2-5. Apache Kafka - Connectivity Source Connector 1 2 3 4 5 ➢ Kafka ecosystem provides low-code integration functions to connect to external components (Kafka Connect). DB Service worker 1 worker 2 worker 3 Sink Connector worker 1 worker 2 worker 3 DB Service Managed service Managed service Kafka Need provided image & config file to build Image files are mainly provided by Confluent or Github.

Slide 12

Slide 12 text

© Hitachi, Ltd. 2021. All rights reserved. Agenda 1. Background 2. Apache Kafka 1. Apache Kafka Overview 2. High scalability 3. Fault tolerance 4. Loosely coupled 5. Connectivity 3. Distributed transaction in microservices 1. Database per service 2. Problem: Consistency 3. Solution: Distributed transaction 1. Saga pattern 2. TCC pattern 4. Saga vs. TCC 4. Using Kafka in microservices 1. Problem in Saga pattern 2. Transactional Outbox pattern 3. Debezium + Kafka 4. Transactional Outbox pattern with Debezium 1. Sample 5. Other use case 1. Strangler Fig pattern 5. Conclusion

Slide 13

Slide 13 text

© Hitachi, Ltd. 2021. All rights reserved. 3-1. Database per service Monolith 1 2 3 4 ➢ In microservices architecture, each service typically has its own database for loosely coupling. Microservices Component 1 Component 2 Component 3 Component 4 DB Service 1 DB 1 Service 2 DB 2 Service 3 DB 3 • There is one DB. • All components can access it. • Each service has its own DB. • Each DB can be accessed only by the relevant service.

Slide 14

Slide 14 text

© Hitachi, Ltd. 2021. All rights reserved. 3-2. Problem: consistency 1 2 3 4 ➢ When multiple DBs are updated, consistent state of the data is not maintained. ⇒ Commit (at step #2) can’t be rollbacked Service 1 Service 2 1. Request 2. Update & Commit 3. Request 5. Failure Occur 6. Rollback 4. Update DB 1 DB 2 Problem Multiple DBs cannot be committed at the same time.

Slide 15

Slide 15 text

© Hitachi, Ltd. 2021. All rights reserved. 3-3. Solution: Distributed transaction Saga pattern ➢ Typical solution of this issue is distributed transaction, just like Saga / TCC pattern does TCC(try confirm-cancel) pattern • Update is committed in each service, respectively. • Compensation transaction in failure cancels the committed data. • 2 phase across services: (i) commit OK/NG check (ii) commit request after all services responding OK 1 2 3 4

Slide 16

Slide 16 text

© Hitachi, Ltd. 2021. All rights reserved. 3-3-1. Saga pattern ➢ Solution 1. Saga pattern: Send compensating transaction in failure occurred. Service 1 Service 2 1. Request 2.Update & Commit 3. Request 5. Failure Occur 6. Rollback 7. Compensating Transaction Request 8. Commit for Canceling step #2 4. Update DB 1 DB 2 Solution Send compensating transaction in failure cancels the committed data. 1 2 3 4

Slide 17

Slide 17 text

© Hitachi, Ltd. 2021. All rights reserved. 3-3-2. TCC pattern – normal case ➢ Solution 2. TCC pattern: 2 phase commit across services. DB 1 DB 2 Solution 2 phase( (i) commit OK/NG check, (ii) confirm/cancel request) across services. Service 1 Service 2 1. Request 3. Request 5. Commit OK 6. Commit 4. Commit Preparation 8. Commit 7. Commit Direction 2. Commit Preparation 1 2 3 4

Slide 18

Slide 18 text

© Hitachi, Ltd. 2021. All rights reserved. 3-3-2. TCC pattern – failure case ➢ Solution 2. TCC pattern: 2 phase commit across services. DB 1 DB 2 Solution 2 phase( (i) commit OK/NG check, (ii) confirm/cancel request) across services. Service 1 Service 2 1. Request 3. Request 7. Commit NG 8. Cancel Commit Preparation 4. Commit Preparation 6. Rollback 2. Commit Preparation 5. Failure occur 1 2 3 4

Slide 19

Slide 19 text

© Hitachi, Ltd. 2021. All rights reserved. 3-4. Saga vs. TCC 1 2 3 4 ➢ We evaluated that Saga is superior for large system from viewpoints about throughput and loose coupling. Saga TCC Process Outline Compensation transaction 2 phase commit Throughput Good 1 phase process in normal Not Good Always 2 phase process Loose coupling Good Independent commit at respective services Not Good Commit after checks at all other services Consistency Not Good Inconsistent state can exist Good Only temporary data can exist Problem For early resolution of inconsistent state, (i) data commit and (ii) requests to other services in individual services must be done without conflict. The larger the system size, the greater the degree of performance problem.

Slide 20

Slide 20 text

© Hitachi, Ltd. 2021. All rights reserved. Agenda 1. Background 2. Apache Kafka 1. Apache Kafka Overview 2. High scalability 3. Fault tolerance 4. Loosely coupled 5. Connectivity 3. Distributed transaction in microservices 1. Database per service 2. Problem: Consistency 3. Solution: Distributed transaction 1. Saga pattern 2. TCC pattern 4. Saga vs. TCC 4. Using Kafka in microservices 1. Problem in Saga pattern 2. Transactional Outbox pattern 3. Debezium + Kafka 4. Transactional Outbox pattern with Debezium 1. Sample 5. Other use case 1. Strangler Fig pattern 5. Conclusion

Slide 21

Slide 21 text

© Hitachi, Ltd. 2021. All rights reserved. 4-1. Problem in Saga pattern ➢ (i) data commit and (ii) requests to other services in individual services can be temporarily inconsistent. Service 1 1. Request 2.Update & Commit 3. Request DB 1 Problem & Solution Process 2 & 3 must be done without conflict. => Kafka is suitable for the solution. After 2, the system state becomes temporarily inconsistent. Service 2 2 & 3 must be processed without conflict. if 2 fails => 3 must not be processed. if 2 successes => 3 must be processed. Kafka 1 2 3 4 5

Slide 22

Slide 22 text

© Hitachi, Ltd. 2021. All rights reserved. 4-2. Transactional outbox pattern ➢ One of the implementation method for Saga pattern is transactional outbox pattern. Message relay Updating DB & request to other services are processed without conflict using outbox table. 3. Read RDB Transactional outbox pattern 1. Service 1 updates business data table. 2. Service 1 inserts into the outbox table and commits. 3. Message relay reads the outbox table. 4. The data of outbox table is sent to other services by the Message relay. Other services 6 6 business data outbox table 1. CUD 2. C Service 1 Sequence 4. Request Is there any tool? 1 2 3 4 5

Slide 23

Slide 23 text

© Hitachi, Ltd. 2021. All rights reserved. 4-3. Debezium + Kafka ➢ z “ ” “ ” Debezium connector system https://access.redhat.com/documentation/ja-jp/red_hat_integration/2020-q2/html-single/debezium_user_guide/index • Debezium is OSS about distributed platform for Change Data Capture(CDC). • Debezium provides Kafka Connector -> “Debezium connector”. • Debezium connector converts “update of RDB” to “event to Kafka”. 1 2 3 4 5

Slide 24

Slide 24 text

© Hitachi, Ltd. 2021. All rights reserved. 4-4. Transactional outbox pattern with Debezium ➢ Debezium provides the Kafka connector for the message relay of transactional outbox pattern and the library for services. Debezium Connector Kafka 4. Produce RDB Transactional outbox pattern with Debezium connector 1. Service 1 update its DB(business data). 2. The library insert updating info. Into outbox table. 3. 1&2 is committed at the same time. 4. Debezium connector detects the change of outbox table. 5. the data of outbox table is sent to Kafka by Debezium connector. Other services 5. Consume 6 6 business data outbox table 1. CUD 2. C debezium-quarkus-outbox library Service 1 Sequence 1 2 3 4 5

Slide 25

Slide 25 text

© Hitachi, Ltd. 2021. All rights reserved. 4-4-1. Sample ➢ A sample of the transactional outbox pattern is provided in Github (E-commerce). https://github.com/debezium/debezium-examples/tree/master/saga 1 2 3 4 5

Slide 26

Slide 26 text

© Hitachi, Ltd. 2021. All rights reserved. 4-4-1-1. Sample - outbox table ➢ The outbox table has metadata and payload about updating. ➢ One of metadata indicates the Kafka topic to be sent. # Column Content Example 1 ID ID of outbox table f9a78388-883a-45d7-b4f6-7d47f8c89187 2 AGGREGAT ETYPE Process name used for indicating Kafka topic (defined by business logic) credit-approval 3 AGGREGATE ID ID of process event (defined by business logic) 57974d15-a2f5-49a7-b18a-71c636193bdd 4 TYPE Request type in process (defined by business logic) Request 5 PAYLOAD updating information about business data. (omitted) 6 TIMESTAMP timestamp of request 2021-08-16 07:12:42.605371 7 TRACING SPAN CONTEXT ID for distributed tracing (Jaeger) uber-trace- id=76d506f0a064b52d¥:a7c822eda5e591 86¥:76d506f0a064b52d¥:1 1 2 3 4 5

Slide 27

Slide 27 text

© Hitachi, Ltd. 2021. All rights reserved. 4-4-1-2. Sample – compensation transaction ➢ When order service receives the error response from one service, it sends the compensating transaction to the other. Order service Kafka Customer service Payment service 1. requests 2. request 3. request 5. error 4. success 6. responses 7. compensating transaction request 8. compensating transaction request Communication for compensating transaction case The library sends compensating transactions to the other services except for the service where error occurred. 1 2 3 4 5

Slide 28

Slide 28 text

© Hitachi, Ltd. 2021. All rights reserved. 4-5. Other use case ➢ Debezium connector is utilized in loosely coupling the monolithic system. 6 Data of Service 1 Transaction Log 1. CUD Debezium Connector Kafka 2. Replica 3. Produce RDB ・・・ Service 1 Debezium Connector System Other services 4. Consume Debezium Connector receive whole transaction logs from RDB, by setting it as replica of RDB. ’ in migration. 1 2 3 4 5

Slide 29

Slide 29 text

© Hitachi, Ltd. 2021. All rights reserved. 4-5-1. Other use case – strangler fig pattern ➢ Strangler Fig Pattern: Functions in legacy system are gradually split into services. CRUD CDC-based Strangler Fig Pattern Example: separating update and query processes in monolith Component 1 Component 2 Component 3 Component 4 RDB CUD Component 1 Component 2 Component 3 Component 4 RDB Loosely coupled (DBs are constructed by CQRS pattern) Proxy R Data Source Kafka Debezium Connector Connector Update Query Legacy Monolith 1 2 3 4 5

Slide 30

Slide 30 text

© Hitachi, Ltd. 2021. All rights reserved. Agenda 1. Background 2. Apache Kafka 1. Apache Kafka Overview 2. High scalability 3. Fault tolerance 4. Loosely coupled 5. Connectivity 3. Distributed transaction in microservices 1. Database per service 2. Problem: Consistency 3. Solution: Distributed transaction 1. Saga pattern 2. TCC pattern 4. Saga vs. TCC 4. Using Kafka in microservices 1. Problem in Saga pattern 2. Transactional Outbox pattern 3. Debezium + Kafka 4. Transactional Outbox pattern with Debezium 1. Sample 5. Other use case 1. Strangler Fig pattern 5. Conclusion

Slide 31

Slide 31 text

© Hitachi, Ltd. 2021. All rights reserved. 5. Conclusion suitable for microservices message hub Kafka Characteristics ➢ The Apache Kafka with Debezium is useful for microservices arch. on not only dev. phase but migration phase. (1) Updating data & (2)requests to other services become consistent Transactional Outbox pattern Strangler Fig pattern Functions in legacy system are gradually split into services z z z 1. High scalability 2. Fault tolerance 3. Loosely coupled 4. Connectivity Especially important

Slide 32

Slide 32 text

© Hitachi, Ltd. 2021. All rights reserved. Trademark • Apache Hadoop®, Hadoop, Apache Kafka®, Kafka, Apache ZooKeeper™, ZooKeeper, and associated open-source project names are trademarks of the Apache Software Foundation. • Debezium name and logo are trademarks of Red Hat. • Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. • MySQL, and Oracle are registered trademarks of Oracle and/or its affiliates. • Twitter is registered trademark of Twitter, Inc. • All other company names, product names, service names, and other proper nouns mentioned herein are trademarks or registered trademarks of their respective companies • TM and ® marks are not indicated in the text and figures in this presentation

Slide 33

Slide 33 text

No content