Change Data Streaming Use Cases With Apache Kafka and Debezium (JavaDay Istanbul 2020)

Debezium (noun de·be·zi·um /dɪˈbiːziəm/) - Secret Sauce for Change Data Capture

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: How can you avoid inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.

In this session you’ll see how to leverage CDC for reliable microservices integration, e.g. using the outbox pattern, as well as many other CDC applications, such as maintaining audit logs, automatically keeping your full-text search index in sync, and driving streaming queries. We’ll also discuss practical matters, e.g. HA set-ups, best practices for running Debezium in production on and off Kubernetes, and the many use cases enabled by Kafka Connect's single message transformations.

Gunnar Morling

September 12, 2020

Transcript

  1. Change Data Streaming Use Cases With Apache Kafka and Debezium

    Java Day Istanbul 2020 Gunnar Morling Software Engineer @gunnarmorling
  2. Agenda

    1. The Issue with Dual Writes: What's the problem? Change data capture to the rescue!
    2. CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
    3. Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  3. Gunnar Morling

    Open source software engineer at Red Hat: Debezium, Hibernate, Quarkus
    Spec Lead for Bean Validation 2.0
    Other projects: Layrry, Deptective, MapStruct
    Java Champion
    @gunnarmorling #Debezium
  4. A Common Problem Updating Multiple Resources Database Order Service @gunnarmorling

    #Debezium
  5. A Common Problem Updating Multiple Resources Cache Database Order Service

    @gunnarmorling #Debezium
  6. A Common Problem Updating Multiple Resources Cache Database Order Service

    Search Index @gunnarmorling #Debezium
  7. A Common Problem Updating Multiple Resources Order Service Cache Database

    Search Index "Friends Don't Let Friends Do Dual Writes" @gunnarmorling #Debezium
  8. A Better Solution Streaming Change Events From the Database Order

    Service @gunnarmorling #Debezium
  9. A Better Solution Streaming Change Events From the Database Order

    Service C C U C U U D C C - Create U - Update D - Delete Change Data Capture @gunnarmorling #Debezium
  10. A Better Solution Streaming Change Events From the Database Order

    Service C C U C U U D C C - Create U - Update D - Delete Change Data Capture @gunnarmorling #Debezium
  11. Change Data Capture With Debezium

  12. Debezium: Change Data Capture Platform

    CDC for multiple databases, based on transaction logs
    Snapshotting, filtering etc.
    Fully open-source, very active community
    Via Apache Kafka or embedded
    Many production deployments (e.g. WePay, Shopify, Convoy, JW Player, Usabilla etc.)
    @gunnarmorling #Debezium
  13. Debezium Connectors

    MySQL, Postgres, MongoDB, SQL Server
    Incubating: Db2, Cassandra, Oracle (based on XStream)
    @gunnarmorling #Debezium
  14. Meme idea: Robin Moffatt

  15. Log- vs. Query-Based CDC

    (Query-Based / Log-Based)
    All data changes are captured: - / +
    No polling delay or overhead: - / +
    Transparent to writing applications and models: - / +
    Can capture deletes and old record state: - / +
    Installation/Configuration: + / -
    @gunnarmorling #Debezium
  16. Change Event Structure

    Key: primary key of the table
    Value: describes the change event (old row state, new row state, metadata)
    Serialization formats: JSON, or Avro using a schema registry

    {
      "before": null,
      "after": {
        "id": 1004,
        "first_name": "Anne",
        "last_name": "Kretchmar",
        "email": "annek@noanswer.org"
      },
      "source": {
        "name": "dbserver1",
        "server_id": 0,
        "ts_sec": 0,
        "file": "mysql-bin.000003",
        "pos": 154,
        "row": 0,
        "snapshot": true,
        "db": "inventory",
        "table": "customers"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }
    @gunnarmorling #Debezium
  17. Agenda

    1. The Issue with Dual Writes: What's the problem? Change data capture to the rescue!
    2. CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
    3. Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  18. CDC – "Liberation for Your Data" @gunnarmorling #Debezium

  19. Postgres MySQL Apache Kafka Data Replication Zero-Code Streaming Pipelines @gunnarmorling

    #Debezium
  20. Postgres MySQL Apache Kafka Kafka Connect Kafka Connect Data Replication

    Zero-Code Streaming Pipelines @gunnarmorling #Debezium
  21. Postgres MySQL Apache Kafka Kafka Connect Kafka Connect DBZ PG

    DBZ MySQL Data Replication Zero-Code Streaming Pipelines @gunnarmorling #Debezium
  22. Postgres MySQL Kafka Connect Kafka Connect Apache Kafka DBZ PG

    DBZ MySQL Elasticsearch ES Connector Data Replication Zero-Code Streaming Pipelines @gunnarmorling #Debezium
  23. Postgres MySQL Kafka Connect Kafka Connect Apache Kafka DBZ PG

    DBZ MySQL Elasticsearch ES Connector JDBC Connector ES Connector Data Warehouse Data Replication Zero-Code Streaming Pipelines @gunnarmorling #Debezium
  24. Postgres MySQL Kafka Connect Kafka Connect Apache Kafka DBZ PG

    DBZ MySQL Elasticsearch ES Connector JDBC Connector ES Connector ISPN Connector Infinispan Data Replication Zero-Code Streaming Pipelines Data Warehouse @gunnarmorling #Debezium
  25. Postgres MySQL Kafka Connect Kafka Connect Apache Kafka DBZ PG

    DBZ MySQL Elasticsearch ES Connector JDBC Connector ES Connector ISPN Connector Infinispan Data Replication Zero-Code Streaming Pipelines Data Warehouse @gunnarmorling #Debezium
  26. Data Replication Low-Latency Streaming Pipelines https://medium.com/convoy-tech/ @gunnarmorling #Debezium

  27. Auditing Source DB Kafka Connect Apache Kafka DBZ Customer Events

    CRM Service @gunnarmorling #Debezium
  28. Auditing Source DB Kafka Connect Apache Kafka DBZ Customer Events

    CRM Service
    "Transactions" table:
    Id   | User    | Use Case
    tx-1 | Bob     | Create Customer
    tx-2 | Sarah   | Delete Customer
    tx-3 | Rebecca | Update Customer
    @gunnarmorling #Debezium
  29. Auditing Source DB Kafka Connect Apache Kafka DBZ Customer Events

    Transactions CRM Service
    "Transactions" table:
    Id   | User    | Use Case
    tx-1 | Bob     | Create Customer
    tx-2 | Sarah   | Delete Customer
    tx-3 | Rebecca | Update Customer
    @gunnarmorling #Debezium
  30. Auditing Source DB Kafka Connect Apache Kafka DBZ Customer Events

    Transactions CRM Service Kafka Streams
    "Transactions" table:
    Id   | User    | Use Case
    tx-1 | Bob     | Create Customer
    tx-2 | Sarah   | Delete Customer
    tx-3 | Rebecca | Update Customer
    @gunnarmorling #Debezium
  31. Auditing Source DB Kafka Connect Apache Kafka DBZ Customer Events

    Transactions CRM Service Kafka Streams Enriched Customer Events
    "Transactions" table:
    Id   | User    | Use Case
    tx-1 | Bob     | Create Customer
    tx-2 | Sarah   | Delete Customer
    tx-3 | Rebecca | Update Customer
    bit.ly/debezium-auditlogs @gunnarmorling #Debezium
  32. Auditing: Transactions and Customers Change Events

    Transactions:
    { "id": "tx-3" }
    {
      "before": null,
      "after": {
        "id": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "source": {
        "name": "dbserver1",
        "table": "transactions",
        "txId": "tx-3"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }

    Customers:
    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "annek@example.com"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "annek@noanswer.org"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }
    @gunnarmorling #Debezium
  33. Auditing: Enriched Customers

    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "annek@example.com"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "annek@noanswer.org"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }
    @gunnarmorling #Debezium
  34. Auditing: Non-trivial join implementation; no ordering across topics, so change events need to be buffered until the TX data is available

    @Override
    public KeyValue<JsonObject, JsonObject> transform(JsonObject key, JsonObject value) {
        boolean enrichedAllBufferedEvents = enrichAndEmitBufferedEvents();

        if (!enrichedAllBufferedEvents) {
            bufferChangeEvent(key, value);
            return null;
        }

        KeyValue<JsonObject, JsonObject> enriched = enrichWithTxMetaData(key, value);

        if (enriched == null) {
            bufferChangeEvent(key, value);
        }

        return enriched;
    }

    bit.ly/debezium-auditlogs @gunnarmorling #Debezium
  35. Microservice CDC Patterns

  36. Order Item Stock App Local DB Local DB Local DB

    App App Item Changes Stock Changes Microservice Architectures Data Synchronization Propagate data between different services without coupling Each service keeps optimised views locally @gunnarmorling #Debezium
  37. Source DB Kafka Connect Apache Kafka DBZ Order Events Credit

    Worthiness Check Events Outbox Pattern Separate Events Table Order Service Shipment Service Customer Service Orders Outbox @gunnarmorling #Debezium
  38. Outbox Pattern: Separate Events Table

    Order Service, Shipment Service, Customer Service; Source DB (Orders, Outbox), Kafka Connect (DBZ), Apache Kafka (Order Events, Credit Worthiness Check Events)
    "Outbox" table:
    Id   | AggregateType | AggregateId | Type                | Payload
    ec6e | Order         | 123         | OrderCreated        | { "id" : 123, ... }
    8af8 | Order         | 456         | OrderDetailCanceled | { "id" : 456, ... }
    890b | Customer      | 789         | InvoiceCreated      | { "id" : 789, ... }
    bit.ly/debezium-outbox-pattern @gunnarmorling #Debezium
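The write side of the pattern can be sketched in a few lines: the service builds the outbox row and persists it in the same transaction as the business data. This is a minimal illustrative sketch in plain Java, using a Map in place of a persisted entity; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;

// Sketch of the outbox pattern's write side: the order service inserts one
// row per event into the "Outbox" table within the same database transaction
// as the order itself; Debezium captures that row and routes it to a topic
// per aggregate type. Column names follow the outbox table on the slide;
// the class and method names are hypothetical.
public class OutboxEvent {

    public static Map<String, Object> orderCreated(long orderId, String payloadJson) {
        return Map.<String, Object>of(
            "id", UUID.randomUUID().toString(),     // unique event id
            "aggregatetype", "Order",               // routed to the "Order" topic
            "aggregateid", String.valueOf(orderId), // becomes the message key
            "type", "OrderCreated",
            "payload", payloadJson);
    }

    public static void main(String[] args) {
        // In a real service this row would be persisted via JDBC/JPA in the
        // same transaction that inserts the order itself.
        System.out.println(orderCreated(123, "{\"id\": 123}"));
    }
}
```

Because the outbox insert and the business write share one transaction, either both become visible to Debezium or neither does; the dual-write problem never arises.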
  39. Strangler Pattern Migrating from Monoliths to Microservices https://martinfowler.com/bliki/StranglerFigApplication.html @gunnarmorling #Debezium

  40. Customer Strangler Pattern @gunnarmorling #Debezium

  41. Router CDC Customer Customer' Reads/ Writes Reads Strangler Pattern Transformation

    @gunnarmorling #Debezium
  42. Router CDC Customer Reads/ Writes Reads/ Writes CDC Strangler Pattern

    @gunnarmorling #Debezium
  43. Agenda

    1. The Issue with Dual Writes: What's the problem? Change data capture to the rescue!
    2. CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
    3. Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  44. @gunnarmorling Deployment Topologies Basic Set-Up CDC #Debezium

  45. Deployment Topologies Database High Availability @gunnarmorling CDC #Debezium

  46. Deployment Topologies Database High Availability @gunnarmorling CDC #Debezium

  47. Deployment Topologies Automatic Fail-over @gunnarmorling HA Proxy CDC #Debezium

  48. Deployment Topologies Automatic Fail-over @gunnarmorling HA Proxy CDC #Debezium

  49. Deployment Topologies Can't Change Binlog Mode? @gunnarmorling CDC Primary Secondary

    #Debezium
  50. Deployment Topologies High Availability for Connectors @gunnarmorling CDC Deduplicator CDC

    #Debezium
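The deduplicator in this active/active set-up can be sketched as a small stateful filter that tracks the highest log position seen per source and drops anything at or below it. A minimal sketch, assuming positions (e.g. binlog offsets or LSNs) increase monotonically; class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "deduplicator" from the redundant-connector topology: two CDC
// connectors emit the same change events, and a downstream step forwards each
// event only once by remembering the highest source log position seen per
// source. Assumes positions increase monotonically; names are hypothetical.
public class Deduplicator {

    private final Map<String, Long> highestSeen = new HashMap<>();

    // Returns true if the event at this position is new for the given source
    // and should be forwarded; false if it is a duplicate.
    public boolean accept(String source, long position) {
        Long last = highestSeen.get(source);
        if (last != null && position <= last) {
            return false;
        }
        highestSeen.put(source, position);
        return true;
    }

    public static void main(String[] args) {
        Deduplicator dedup = new Deduplicator();
        System.out.println(dedup.accept("dbserver1", 154)); // true: first time seen
        System.out.println(dedup.accept("dbserver1", 154)); // false: same event from the second connector
    }
}
```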
  51. Running on Kubernetes

    Deployment via Operators: YAML-based custom resource definitions for
    Kafka/Connect clusters, topics etc.; the operator applies the configuration.
    Advantages: automated deployment and scaling, simplified upgrading,
    portability across clouds.

    apiVersion: "kafka.strimzi.io/v1alpha1"
    kind: "KafkaConnector"
    metadata:
      name: "inventory-connector"
      labels:
        connect-cluster: my-connect-cluster
    spec:
      class: i.d.c.p.PostgresConnector
      tasksMax: 1
      config:
        database.hostname: "postgres"
        database.port: "5432"
        database.user: "bob"
        database.password: "secret"
        database.dbname: "prod"
        database.server.name: "dbserver1"
        schema.whitelist: "inventory"
    @gunnarmorling #Debezium
  52. Sneak Peek - UI Proof-of-Concept @gunnarmorling #Debezium

  53. @gunnarmorling Sneak Peek - UI Proof-of-Concept https://github.com/debezium/debezium-ui-poc/ #Debezium

  54. Single Message Transformations: The Swiss Army Knife of Kafka Connect

    Format conversions, time/date fields
    Extract new row state
    Aggregate sharded tables to single topic
    Keep compatibility with existing consumers
    Coming with Debezium: Router, "Flattener"
    @gunnarmorling © Emilian Robert Vicol https://flic.kr/p/c8s6Y3 #Debezium
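The "Flattener" idea, extracting the new row state, can be illustrated in a few lines of plain Java: unwrap the change event envelope and keep only the "after" state. This is a conceptual sketch, not Debezium's actual SMT implementation; the class name is hypothetical and events are represented as plain Maps.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of what the "Flattener" SMT does: unwrap the CDC change
// event envelope and keep only the new row state, so existing consumers that
// expect flat records keep working. Field names ("op", "after") follow the
// change event structure shown earlier; the class name is hypothetical.
public class EnvelopeFlattener {

    // Returns the "after" state for creates, updates, and snapshot reads,
    // or null for deletes (no new row state exists).
    @SuppressWarnings("unchecked")
    public static Map<String, Object> flatten(Map<String, Object> envelope) {
        if ("d".equals(envelope.get("op"))) {
            return null;
        }
        return (Map<String, Object>) envelope.get("after");
    }

    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("op", "c");
        event.put("before", null);
        event.put("after", Map.of("id", 1004, "email", "annek@noanswer.org"));
        System.out.println(flatten(event)); // only the new row state survives
    }
}
```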
  55. Single Message Transformations Externalizing Large Column Values @gunnarmorling DBZ Amazon

    S3 #Debezium
  56. Single Message Transformations: Externalizing Large Column Values (DBZ, Amazon S3)

    {
      "before": { ... },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "annek@noanswer.org",
        "image": "imgs-<offset>-after"
      },
      ...
    }
    @gunnarmorling #Debezium
  57. Takeaways Change Data Capture – Liberation for your data! Enabling

    use cases such as replication, streaming queries, maintaining CQRS read models etc. Microservices: outbox and strangler patterns Debezium: open-source CDC for a growing number of databases @gunnarmorling #Debezium
  58. Takeaways Change Data Capture – Liberation for your data! Enabling

    use cases such as replication, streaming queries, maintaining CQRS read models etc. Microservices: outbox and strangler patterns Debezium: open-source CDC for a growing number of databases @gunnarmorling "Friends Don't Let Friends Do Dual-Writes" #Debezium
  59. Resources

    Website: debezium.io
    debezium.io/documentation/online-resources
    Source code: https://github.com/debezium/
    Latest news: debezium.io/blog, @debezium
    @gunnarmorling #Debezium
  60. gunnar@hibernate.org @gunnarmorling Q&A

  62. Outlook: View Materialization

    Awareness of transaction boundaries: a topic with BEGIN/END markers
    enables consumers to buffer all events of one transaction.

    BEGIN:
    {
      "transactionId": "tx-123",
      "eventType": "begin transaction",
      "ts_ms": 1486500577125
    }

    END:
    {
      "transactionId": "tx-123",
      "ts_ms": 1486500577691,
      "eventType": "end transaction",
      "eventCount": [
        { "name": "dbserver1.inventory.Order", "count": 1 },
        { "name": "dbserver1.inventory.OrderLine", "count": 5 }
      ]
    }
    @gunnarmorling
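A consumer of these markers can be sketched as a per-transaction buffer that only releases a batch once all events announced by the END marker's eventCount have arrived. This is a minimal sketch with events reduced to strings; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a consumer that uses BEGIN/END transaction markers to materialize
// consistent views: change events are buffered per transaction id and only
// released once the END marker's total event count has been reached. Since
// there is no ordering across topics, the END marker may arrive before all
// events have. Class and method names are hypothetical.
public class TransactionBuffer {

    private final Map<String, List<String>> pending = new HashMap<>();

    public void onBegin(String txId) {
        pending.put(txId, new ArrayList<>());
    }

    public void onChangeEvent(String txId, String event) {
        pending.computeIfAbsent(txId, id -> new ArrayList<>()).add(event);
    }

    // Called when the END marker is seen; returns the complete batch once all
    // events announced by the marker have arrived, otherwise null.
    public List<String> onEnd(String txId, int expectedEventCount) {
        List<String> events = pending.get(txId);
        if (events != null && events.size() == expectedEventCount) {
            pending.remove(txId);
            return events;
        }
        return null; // still waiting for buffered events
    }

    public static void main(String[] args) {
        TransactionBuffer buffer = new TransactionBuffer();
        buffer.onBegin("tx-123");
        buffer.onChangeEvent("tx-123", "Order created");
        System.out.println(buffer.onEnd("tx-123", 2)); // null: one event still missing
        buffer.onChangeEvent("tx-123", "OrderLine added");
        System.out.println(buffer.onEnd("tx-123", 2)); // the complete batch
    }
}
```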