Change Data Streaming For Microservices With Apache Kafka and Debezium

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: How can you avoid inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for the services themselves. Join this session to learn how to leverage CDC for reliable microservices integration and for solving typical challenges such as gradually extracting microservices from existing monoliths, building an audit log, and updating caches as well as full-text search indexes.

In a live demo we'll show how to use Debezium to set up a change data stream out of your service's database, without any code changes needed. You'll see how to consume change events in other services, how to gain real-time insight into your changing data using Kafka Streams and much more.

Gunnar Morling

September 12, 2019

Transcript

  1. Change Data Streaming For Microservices With Apache Kafka and Debezium

    Gunnar Morling, Software Engineer
  2. Gunnar Morling

    Open source software engineer at Red Hat; Debezium; Hibernate; Spec Lead for Bean Validation 2.0; other projects: Deptective, MapStruct; Java Champion; [email protected]; @gunnarmorling; http://in.relation.to/gunnar-morling/; #JavaZone #Debezium
  3. A Common Problem: Updating Multiple Resources

    [Diagram: Order Service, Database]
  4. A Common Problem: Updating Multiple Resources

    [Diagram: Order Service, Database, Cache]
  5. A Common Problem: Updating Multiple Resources

    [Diagram: Order Service, Database, Cache, Search Index]
  6. A Common Problem: Updating Multiple Resources

    [Diagram: Order Service, Database, Cache, Search Index] “Friends Don't Let Friends Do Dual Writes”
  7. A Better Solution: Streaming Change Events From the Database

    [Diagram: Order Service]
  8. A Better Solution: Streaming Change Events From the Database

    [Diagram: Order Service; Change Data Capture; event stream C C U C U U D C] Legend: C - Create, U - Update, D - Delete
  9. A Better Solution: Streaming Change Events From the Database

    [Diagram: Order Service; Change Data Capture; event stream C C U C U U D C] Legend: C - Create, U - Update, D - Delete
  10. Log-based Change Data Capture: Tailing the transaction log

    Canonical source for all applied data changes in exact order. Advantages: all data changes are captured; no polling delay or overhead; transparent to writing applications and models; can capture deletes and old record state.
  11. Debezium: Change Data Capture Platform

    Retrieves change events from TX logs from different DBs; transparent to writing apps; comprehensive type support (PostGIS etc.); snapshotting, filtering etc.; fully open-source, very active community; latest version: 0.9; production deployments at multiple companies (e.g. WePay, Trivago, BlaBlaCar etc.).
  12. Change Event Structure

    Key: PK of table. Value: describing the change event (before state, after state, source info). Serialization formats: JSON, Avro.

    {
      "before": null,
      "after": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "[email protected]" },
      "source": { "name": "dbserver1", "server_id": 0, "ts_sec": 0, "file": "mysql-bin.000003", "pos": 154, "row": 0, "snapshot": true, "db": "inventory", "table": "customers" },
      "op": "c",
      "ts_ms": 1486500577691
    }
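As a rough illustration of how a consumer might read an event of this shape, here is a minimal Python sketch. The sample payload is copied from the slide with the email field left out; the helper name `describe_change` and the `OP_NAMES` mapping are hypothetical, and only the three operation codes named in the talk (c, u, d) are mapped.

```python
import json

# Sample change event, copied from the slide above (email field omitted).
RAW_EVENT = """
{
  "before": null,
  "after": {"id": 1004, "first_name": "Anne", "last_name": "Kretchmar"},
  "source": {"name": "dbserver1", "file": "mysql-bin.000003", "pos": 154,
             "snapshot": true, "db": "inventory", "table": "customers"},
  "op": "c",
  "ts_ms": 1486500577691
}
"""

# Operation codes as given in the talk's legend: C - Create, U - Update, D - Delete.
OP_NAMES = {"c": "create", "u": "update", "d": "delete"}


def describe_change(event: dict) -> dict:
    """Summarise one change event (hypothetical helper, not a Debezium API)."""
    before = event.get("before")
    after = event.get("after")
    # For deletes "after" is null, so fall back to the old row state.
    row = after if after is not None else before
    return {
        "operation": OP_NAMES.get(event["op"], event["op"]),
        "table": event["source"]["table"],
        "id": row["id"] if row else None,
        "snapshot": bool(event["source"].get("snapshot", False)),
    }


summary = describe_change(json.loads(RAW_EVENT))
print(summary)
```

In a real service the payload would arrive from a Kafka consumer rather than a string constant; that wiring is deliberately left out here.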
  13. Debezium Connectors

    MySQL; Postgres; MongoDB; SQL Server; Cassandra (Incubating); Oracle (Incubating, based on XStream). Possible future additions: DB2? MariaDB?
  14. Zero-Code Streaming Pipelines

    [Diagram: Postgres, MySQL, Kafka Connect, Apache Kafka, Kafka Connect]
  15. Zero-Code Streaming Pipelines

    [Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect]
  16. Zero-Code Streaming Pipelines

    [Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect (ES Connector), Elasticsearch]
  17. Zero-Code Streaming Pipelines

    [Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect (ES Connector, JDBC Connector), Elasticsearch, Analytics DB]
  18. Zero-Code Streaming Pipelines

    [Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect (ES Connector, JDBC Connector, ISPN Connector), Elasticsearch, Analytics DB, Infinispan]
  19. Auditing

    [Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka, Customer Events]
  20. Auditing

    [Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka, Customer Events]

    "Transactions" table:

    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
  21. Auditing

    [Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka, Customer Events, Transactions]

    "Transactions" table:

    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
  22. Auditing

    [Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka, Customer Events, Transactions, Kafka Streams]

    "Transactions" table:

    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
  23. Auditing

    [Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka, Customer Events, Transactions, Kafka Streams, Enriched Customer Events]

    "Transactions" table:

    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
  24. Auditing

    Customers:

    {
      "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" },
      "op": "u",
      "ts_ms": 1486500577691
    }
  25. Auditing

    Customers:

    {
      "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" },
      "op": "u",
      "ts_ms": 1486500577691
    }

    Transactions:

    {
      "before": null,
      "after": { "id": "tx-3", "user": "Rebecca", "use_case": "Update customer" },
      "source": { "name": "dbserver1", "table": "transactions", "txId": "tx-3" },
      "op": "c",
      "ts_ms": 1486500577691
    }
  26. Auditing

    Customers:

    {
      "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" },
      "op": "u",
      "ts_ms": 1486500577691
    }

    Transactions:

    {
      "before": null,
      "after": { "id": "tx-3", "user": "Rebecca", "use_case": "Update customer" },
      "source": { "name": "dbserver1", "table": "transactions", "txId": "tx-3" },
      "op": "c",
      "ts_ms": 1486500577691
    }
  27. Auditing

    Enriched Customers:

    {
      "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" },
      "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3", "user": "Rebecca", "use_case": "Update customer" },
      "op": "u",
      "ts_ms": 1486500577691
    }
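The enrichment step in the auditing slides joins each customer change event with the matching row of the "Transactions" table via the shared `txId`. The talk does this with Kafka Streams; the sketch below only imitates the join logic in plain Python over in-memory dictionaries, so the function names `index_transactions` and `enrich` are hypothetical and no Kafka Streams API is shown. The sample events are shortened versions of the ones on the slides.

```python
from typing import Optional

# Shortened sample events from the slides (email fields omitted).
customer_event = {
    "before": {"id": 1004, "last_name": "Kretchmar"},
    "after": {"id": 1004, "last_name": "Kretchmar"},
    "source": {"name": "dbserver1", "table": "customers", "txId": "tx-3"},
    "op": "u",
}
transaction_event = {
    "before": None,
    "after": {"id": "tx-3", "user": "Rebecca", "use_case": "Update customer"},
    "source": {"name": "dbserver1", "table": "transactions", "txId": "tx-3"},
    "op": "c",
}


def index_transactions(events: list) -> dict:
    """Build a lookup table txId -> transaction metadata row."""
    return {e["after"]["id"]: e["after"] for e in events if e["after"] is not None}


def enrich(event: dict, tx_by_id: dict) -> Optional[dict]:
    """Copy user and use case into the event's source block.

    Returns None when the metadata is not known yet, so a caller can
    buffer the event and retry later.
    """
    tx = tx_by_id.get(event["source"]["txId"])
    if tx is None:
        return None
    enriched = dict(event)  # shallow copy, the input event stays untouched
    enriched["source"] = {
        **event["source"],
        "user": tx["user"],
        "use_case": tx["use_case"],
    }
    return enriched


tx_by_id = index_transactions([transaction_event])
enriched_event = enrich(customer_event, tx_by_id)
print(enriched_event["source"])
```

Returning None for a missing transaction is one possible design choice: the customer event and its metadata row travel on different topics, so either one may arrive first.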
  28. Pattern: Microservice Data Synchronization

    Microservice Architectures: propagate data between different services without coupling; each service keeps optimised views locally.

    [Diagram: Order, Item and Stock apps, each with a local DB; Item Changes, Stock Changes]
  29. Pattern: Outbox (Separate Events Table)

    [Diagram: Order Service, Source DB (with "Events" table), Kafka Connect (DBZ), Apache Kafka, Order Events, Credit Worthiness Check Events, Shipment Service, Customer Service]

    ID   Category  Type                 Payload
    123  Order     OrderCreated         { "id" : 123, ... }
    456  Order     OrderDetailCanceled  { "id" : 456, ... }
    789  ...       ...                  ...

    "Outbox" table:

    Id    AggregateType  AggregateId  Type                 Payload
    ec6e  Order          123          OrderCreated         { "id" : 123, ... }
    8af8  Order          456          OrderDetailCanceled  { "id" : 456, ... }
    890b  Customer       789          InvoiceCreated       { "id" : 789, ... }
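The core of the outbox pattern is that the business row and the event row are written in one local database transaction, so the two can never disagree. A minimal sketch of that write path follows, using an in-memory SQLite database as a stand-in for the service's database. The `orders` table, the snake_case column names and the `create_order` function are assumptions modelled on the "Outbox" table above, not code from the talk.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    " id TEXT PRIMARY KEY, aggregate_type TEXT, aggregate_id TEXT,"
    " type TEXT, payload TEXT)"
)


def create_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Insert the order and its OrderCreated event atomically."""
    # "with conn" commits if the block succeeds and rolls back on an
    # exception, so either both rows are written or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)",
            (order_id, "CREATED"),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregate_type, aggregate_id, type, payload)"
            " VALUES (?, ?, ?, ?, ?)",
            (
                str(uuid.uuid4()),
                "Order",
                str(order_id),
                "OrderCreated",
                json.dumps({"id": order_id}),
            ),
        )


create_order(conn, 123)
for row in conn.execute("SELECT aggregate_type, aggregate_id, type, payload FROM outbox"):
    print(row)
```

The service never talks to Kafka itself in this setup; a CDC connector watching the outbox table would pick up the inserted row and publish it.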
  30. Pattern: Microservice Extraction (Migrating from Monoliths to Microservices)

    Extract microservice for single component(s); keep write requests against running monolith; stream changes to extracted microservice; test new functionality; switch over, evolve schema only afterwards.
  31. Pattern: Leverage the Powers of SMTs (Single Message Transformations)

    Aggregate sharded tables to single topic; keep compatibility with existing consumers; format conversions, e.g. for dates; ensure compatibility with sink connectors; extracting "after" state only; expand MongoDB's JSON structures.
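To make two of these transformations concrete, the sketch below imitates their effect on a single message in plain Python. It is not the Kafka Connect SMT API: real SMTs are configured on the connector and run inside Kafka Connect. The function names, the `_shard<N>` suffix convention and the topic name in the usage lines are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical naming convention: shard tables end in "_shard<N>".
SHARD_SUFFIX = re.compile(r"_shard\d+$")


def extract_after_state(event: dict) -> Optional[dict]:
    """Keep only the new row state, dropping the change event envelope.

    Returns None for deletes, where there is no "after" state.
    """
    return event.get("after")


def route_to_logical_topic(topic: str) -> str:
    """Map the topic of each shard table to one shared logical topic."""
    return SHARD_SUFFIX.sub("", topic)


event = {
    "before": None,
    "after": {"id": 1004, "first_name": "Anne", "last_name": "Kretchmar"},
    "source": {"name": "dbserver1", "table": "customers_shard2"},
    "op": "c",
}

flat_row = extract_after_state(event)
logical_topic = route_to_logical_topic("dbserver1.inventory.customers_shard2")
print(logical_topic, flat_row)
```

Flattening to the "after" state is what makes the events digestible for sink connectors that expect plain rows rather than the full before/after envelope.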
  32. Pattern: Ensuring Data Quality (Detecting Missing or Wrong Data)

    Constantly compare record counts on source and sink side; raise alert if threshold is reached; compare every n-th record field by field, e.g. have all records compared within one week.
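Both checks can be sketched in a few lines. The version below works on in-memory dictionaries keyed by primary key; in practice the rows would come from queries against the source database and the sink. The function names and the `offset` parameter are hypothetical. Rotating `offset` from 0 to n-1 across runs is one assumed way to get every record compared within a fixed period.

```python
def count_drift_alert(source_count: int, sink_count: int, threshold: int) -> bool:
    """True when the record counts differ by at least the threshold."""
    return abs(source_count - sink_count) >= threshold


def compare_sample(source: dict, sink: dict, n: int, offset: int = 0) -> dict:
    """Compare every n-th source record with the sink, field by field.

    source and sink map primary key -> row dict. Returns a dict mapping
    each mismatching key to the sorted list of differing field names.
    """
    mismatches = {}
    for key in sorted(source)[offset::n]:
        src_row = source[key]
        sink_row = sink.get(key)
        if sink_row is None:
            mismatches[key] = ["<missing in sink>"]
            continue
        fields = src_row.keys() | sink_row.keys()
        diff = sorted(f for f in fields if src_row.get(f) != sink_row.get(f))
        if diff:
            mismatches[key] = diff
    return mismatches


source_rows = {
    1: {"name": "Anne", "city": "Oslo"},
    2: {"name": "Bob", "city": "Bergen"},
    3: {"name": "Sarah", "city": "Oslo"},
    4: {"name": "Rebecca", "city": "Hamburg"},
}
sink_rows = {
    1: {"name": "Anne", "city": "Oslo"},
    2: {"name": "Bob", "city": "Bergen"},
    3: {"name": "Sarah", "city": "Berlin"},  # wrong value in the sink
    # key 4 is missing in the sink
}

print(count_drift_alert(len(source_rows), len(sink_rows), threshold=1))
print(compare_sample(source_rows, sink_rows, n=2, offset=0))
print(compare_sample(source_rows, sink_rows, n=2, offset=1))
```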
  33. Running Debezium on Kubernetes (Strimzi: Kubernetes Operator for Apache Kafka)

    YAML-based custom resource definitions for Kafka clusters, Kafka Connect, topics and users; K8s Operator sets up Kafka, ZooKeeper etc. based on that; Sandbox project at CNCF; supported by Red Hat; AMQ Streams; Debezium Developer Preview.
  34. Summary

    CDC enables use cases such as replication, microservices data exchange and much more. Debezium: CDC for a growing number of databases. Contributions welcome! Tell us about your feature requests and ideas! “Friends Don't Let Friends Do Dual Writes”
  35. Resources

    Website: https://debezium.io/; source code, examples, Compose files etc.: https://github.com/debezium; discussion group: https://groups.google.com/forum/#!forum/debezium; Strimzi (Kafka on Kubernetes/OpenShift): https://strimzi.io/; latest news: @debezium