Change Data Streaming Use Cases With Apache Kafka and Debezium (JavaDay Istanbul 2020)

Debezium (noun de·be·zi·um /dɪ:ˈbɪ:ziːəm/) - Secret Sauce for Change Data Capture

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: How can you avoid inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.

In this session you’ll see how to leverage CDC for reliable microservices integration, e.g. using the outbox pattern, as well as many other CDC applications, such as maintaining audit logs, automatically keeping your full-text search index in sync, and driving streaming queries. We’ll also discuss practical matters, e.g. HA set-ups, best practices for running Debezium in production on and off Kubernetes, and the many use cases enabled by Kafka Connect's single message transformations.

Gunnar Morling

September 12, 2020

Transcript

  1. Change Data Streaming Use Cases With Apache Kafka and Debezium

    Java Day Istanbul 2020 Gunnar Morling Software Engineer @gunnarmorling
  2. (1) The Issue with Dual Writes: What's the problem? Change data capture to the rescue!

    (2) CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
    (3) Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  3. Gunnar Morling

    Open source software engineer at Red Hat: Debezium, Hibernate, Quarkus
    Spec Lead for Bean Validation 2.0
    Other projects: Layrry, Deptective, MapStruct
    Java Champion
    @gunnarmorling #Debezium
  4. A Common Problem: Updating Multiple Resources

    Diagram: Order Service, Cache, Database, Search Index
    “Friends Don't Let Friends Do Dual Writes”
  5. A Better Solution: Streaming Change Events From the Database

    Diagram: Order Service, Change Data Capture; event stream C C U C U U D C
    Legend: C - Create, U - Update, D - Delete
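
The create/update/delete stream on this slide can be replayed by a consumer to keep a local copy of the table in sync. The following is a minimal sketch of that idea, not Debezium code: the class `ReplicaSketch`, its `ChangeEvent` type, and the use of plain maps as rows are assumptions made for illustration. Only the `c`, `u`, and `d` operation codes and the before/after row states come from the talk.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer that applies change events to an in-memory replica. */
final class ReplicaSketch {

    /** Simplified change event: key, operation code, old and new row state. */
    static final class ChangeEvent {
        final int key;
        final String op;
        final Map<String, String> before;
        final Map<String, String> after;

        ChangeEvent(int key, String op, Map<String, String> before, Map<String, String> after) {
            this.key = key;
            this.op = op;
            this.before = before;
            this.after = after;
        }
    }

    /** Applies one event: creates and updates store the new state, deletes remove the row. */
    static void apply(Map<Integer, Map<String, String>> replica, ChangeEvent event) {
        switch (event.op) {
            case "c":
            case "u":
                replica.put(event.key, event.after);
                break;
            case "d":
                replica.remove(event.key);
                break;
            default:
                throw new IllegalArgumentException("Unknown op: " + event.op);
        }
    }

    /** Helper that builds a row with the two name columns used in the example. */
    static Map<String, String> row(String firstName, String lastName) {
        Map<String, String> r = new HashMap<>();
        r.put("first_name", firstName);
        r.put("last_name", lastName);
        return r;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> replica = new HashMap<>();
        apply(replica, new ChangeEvent(1004, "c", null, row("Anne", "Kretchmar")));
        System.out.println("after create: " + replica);
        apply(replica, new ChangeEvent(1004, "d", row("Anne", "Kretchmar"), null));
        System.out.println("after delete: " + replica);
    }
}
```

Because every change arrives as an event in log order, the replica needs no access to the source database once it has consumed the stream.
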
  7. Debezium: Change Data Capture Platform

    - CDC for multiple databases
    - Based on transaction logs
    - Snapshotting, Filtering etc.
    - Fully open-source, very active community
    - Via Apache Kafka or embedded
    - Many production deployments (e.g. WePay, Shopify, Convoy, JW Player, Usabilla etc.)
  8. Debezium Connectors

    - MySQL
    - Postgres
    - MongoDB
    - SQL Server
    - Incubating: Db2, Cassandra
    - Oracle (Incubating, based on XStream)
  9. Log- vs. Query-Based CDC

                                                        Query-Based   Log-Based
    All data changes are captured                            -            +
    No polling delay or overhead                             -            +
    Transparent to writing applications and models           -            +
    Can capture deletes and old record state                 -            +
    Installation/Configuration                               +            -
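
To make the first and fourth rows of this comparison concrete, here is a small simulation, offered as a sketch only. `ToyTable`, its version counter, and `pollSince` are hypothetical stand-ins: the append-only list plays the role of a transaction log, and `pollSince` plays the role of a periodic "rows changed since my last poll" query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of log-based and query-based change capture (all names hypothetical). */
final class CdcComparison {

    /** A table that also records every write in an append-only log. */
    static final class ToyTable {
        private final Map<Integer, String> values = new LinkedHashMap<>();
        private final Map<Integer, Long> versions = new LinkedHashMap<>();
        private final List<String> log = new ArrayList<>();
        private long clock = 0;

        void insert(int id, String value) {
            values.put(id, value);
            versions.put(id, ++clock);
            log.add("c:" + id + ":" + value);
        }

        void update(int id, String value) {
            values.put(id, value);
            versions.put(id, ++clock);
            log.add("u:" + id + ":" + value);
        }

        void delete(int id) {
            values.remove(id);
            versions.remove(id);
            ++clock;
            log.add("d:" + id);
        }

        /** Query-based capture: returns only rows that still exist and changed after lastSeen. */
        List<String> pollSince(long lastSeen) {
            List<String> result = new ArrayList<>();
            for (Map.Entry<Integer, Long> e : versions.entrySet()) {
                if (e.getValue() > lastSeen) {
                    result.add(e.getKey() + ":" + values.get(e.getKey()));
                }
            }
            return result;
        }

        /** Log-based capture: returns every change in the order it happened. */
        List<String> readLog() {
            return new ArrayList<>(log);
        }
    }

    /** Four writes that all happen between two polls. */
    static ToyTable scenario() {
        ToyTable table = new ToyTable();
        table.insert(1, "a");
        table.update(1, "b");
        table.insert(2, "x");
        table.delete(2);
        return table;
    }

    public static void main(String[] args) {
        ToyTable table = scenario();
        System.out.println("log-based:   " + table.readLog());
        System.out.println("query-based: " + table.pollSince(0));
    }
}
```

In this scenario the log yields all four changes, while the poll sees only the final state of row 1: the intermediate value and the short-lived row 2, including its deletion, never show up.
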
  10. Change Event Structure

    Key: Primary key of table
    Value: Describing the change event
      - Old row state
      - New row state
      - Metadata
    Serialization formats:
      - JSON
      - Avro
    Using a schema registry

        {
          "before": null,
          "after": {
            "id": 1004,
            "first_name": "Anne",
            "last_name": "Kretchmar",
            "email": "[email protected]"
          },
          "source": {
            "name": "dbserver1",
            "server_id": 0,
            "ts_sec": 0,
            "file": "mysql-bin.000003",
            "pos": 154,
            "row": 0,
            "snapshot": true,
            "db": "inventory",
            "table": "customers"
          },
          "op": "c",
          "ts_ms": 1486500577691
        }
  11. (1) The Issue with Dual Writes: What's the problem? Change data capture to the rescue!

    (2) CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
    (3) Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  12. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka
  13. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL
  14. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL, ES Connector, Elasticsearch
  15. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL, ES Connector, Elasticsearch, JDBC Connector, Data Warehouse
  16. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL, ES Connector, Elasticsearch, JDBC Connector, Data Warehouse, ISPN Connector, Infinispan
  18. Auditing

    Diagram: CRM Service, Source DB, DBZ, Kafka Connect, Apache Kafka, Customer Events
    "Transactions" table:

        Id    User     Use Case
        tx-1  Bob      Create Customer
        tx-2  Sarah    Delete Customer
        tx-3  Rebecca  Update Customer
  19. Auditing

    Diagram: CRM Service, Source DB, DBZ, Kafka Connect, Apache Kafka, Customer Events, Transactions
    "Transactions" table:

        Id    User     Use Case
        tx-1  Bob      Create Customer
        tx-2  Sarah    Delete Customer
        tx-3  Rebecca  Update Customer
  20. Auditing

    Diagram: CRM Service, Source DB, DBZ, Kafka Connect, Apache Kafka, Customer Events, Transactions, Kafka Streams
    "Transactions" table:

        Id    User     Use Case
        tx-1  Bob      Create Customer
        tx-2  Sarah    Delete Customer
        tx-3  Rebecca  Update Customer
  21. Auditing

    Diagram: CRM Service, Source DB, DBZ, Kafka Connect, Apache Kafka, Customer Events, Transactions, Kafka Streams, Enriched Customer Events
    "Transactions" table:

        Id    User     Use Case
        tx-1  Bob      Create Customer
        tx-2  Sarah    Delete Customer
        tx-3  Rebecca  Update Customer

    bit.ly/debezium-auditlogs
  22. { "id": "tx-3" }

    Customers:

        {
          "before": {
            "id": 1004,
            "last_name": "Kretchmar",
            "email": "[email protected]"
          },
          "after": {
            "id": 1004,
            "last_name": "Kretchmar",
            "email": "[email protected]"
          },
          "source": {
            "name": "dbserver1",
            "table": "customers",
            "txId": "tx-3"
          },
          "op": "u",
          "ts_ms": 1486500577691
        }

    Transactions:

        {
          "before": null,
          "after": {
            "id": "tx-3",
            "user": "Rebecca",
            "use_case": "Update customer"
          },
          "source": {
            "name": "dbserver1",
            "table": "transactions",
            "txId": "tx-3"
          },
          "op": "c",
          "ts_ms": 1486500577691
        }
  23. Auditing

    Enriched Customers:

        {
          "before": {
            "id": 1004,
            "last_name": "Kretchmar",
            "email": "[email protected]"
          },
          "after": {
            "id": 1004,
            "last_name": "Kretchmar",
            "email": "[email protected]"
          },
          "source": {
            "name": "dbserver1",
            "table": "customers",
            "txId": "tx-3",
            "user": "Rebecca",
            "use_case": "Update customer"
          },
          "op": "u",
          "ts_ms": 1486500577691
        }
  24. Auditing: Non-trivial join implementation

    - no ordering across topics
    - need to buffer change events until TX data available

        @Override
        public KeyValue<JsonObject, JsonObject> transform(JsonObject key, JsonObject value) {
            // First try to emit events that were buffered earlier
            boolean enrichedAllBufferedEvents = enrichAndEmitBufferedEvents();

            // Older events are still waiting: queue this one behind them to keep the order
            if (!enrichedAllBufferedEvents) {
                bufferChangeEvent(key, value);
                return null;
            }

            // Try to enrich the current event with its transaction metadata
            KeyValue<JsonObject, JsonObject> enriched = enrichWithTxMetaData(key, value);

            // Metadata has not arrived yet: buffer the event for a later attempt
            if (enriched == null) {
                bufferChangeEvent(key, value);
            }

            return enriched;
        }

    bit.ly/debezium-auditlogs
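
The helper methods called on this slide are not shown in the deck. The sketch below is one way the same buffering logic could look without Kafka Streams, purely for illustration: `TxEnricherSketch`, its method names, and the string-based events are assumptions, and the actual implementation is the one linked from the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone version of the "buffer until TX data is available" join. */
final class TxEnricherSketch {
    private final Map<String, String> userByTx = new HashMap<>();
    // Each buffered entry is {txId, payload}; the queue preserves arrival order
    private final Deque<String[]> buffer = new ArrayDeque<>();

    /** Called when a row of the "Transactions" table arrives. */
    void onTransaction(String txId, String user) {
        userByTx.put(txId, user);
    }

    /** Called for each customer change event; returns the events that can be emitted now. */
    List<String> onChangeEvent(String txId, String payload) {
        List<String> emitted = new ArrayList<>();

        // Emit buffered events from the head for as long as their metadata is known
        while (!buffer.isEmpty() && userByTx.containsKey(buffer.peekFirst()[0])) {
            String[] pending = buffer.pollFirst();
            emitted.add(enrich(pending[0], pending[1]));
        }

        // Buffer the new event if older ones are still waiting or its metadata is missing
        if (!buffer.isEmpty() || !userByTx.containsKey(txId)) {
            buffer.addLast(new String[] {txId, payload});
        } else {
            emitted.add(enrich(txId, payload));
        }
        return emitted;
    }

    int bufferedCount() {
        return buffer.size();
    }

    private String enrich(String txId, String payload) {
        return payload + " by " + userByTx.get(txId);
    }

    public static void main(String[] args) {
        TxEnricherSketch enricher = new TxEnricherSketch();
        System.out.println(enricher.onChangeEvent("tx-3", "update 1004")); // metadata not there yet
        enricher.onTransaction("tx-3", "Rebecca");
        System.out.println(enricher.onChangeEvent("tx-3", "update 1005")); // both events emitted
    }
}
```

As in the slide's code, buffered events are only retried when the next change event arrives, so a production version would also need some way to flush the buffer when the stream goes quiet.
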
  25. Microservice Architectures: Data Synchronization

    - Propagate data between different services without coupling
    - Each service keeps optimised views locally

    Diagram: Order, Item, Stock (each with App and Local DB), Item Changes, Stock Changes
  26. Outbox Pattern: Separate Events Table

    Diagram: Order Service, Source DB (Orders, Outbox), DBZ, Kafka Connect, Apache Kafka, Order Events, Credit Worthiness Check Events, Shipment Service, Customer Service
  27. Outbox Pattern: Separate Events Table

    Diagram: Order Service, Source DB (Orders, Outbox), DBZ, Kafka Connect, Apache Kafka, Order Events, Credit Worthiness Check Events, Shipment Service, Customer Service
    "Outbox" table:

        Id    AggregateType  AggregateId  Type                 Payload
        ec6e  Order          123          OrderCreated         { "id" : 123, ... }
        8af8  Order          456          OrderDetailCanceled  { "id" : 456, ... }
        890b  Customer       789          InvoiceCreated       { "id" : 789, ... }

    bit.ly/debezium-outbox-pattern
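
One way to read the "Outbox" table: each row captured from it becomes one message, with the aggregate type selecting the destination topic and the aggregate id serving as the message key, so that all events of one aggregate stay in order. The sketch below illustrates that mapping under stated assumptions. `OutboxRoutingSketch`, `OutboxRow`, `Message`, and the `<aggregatetype>-events` topic naming are hypothetical and do not reproduce Debezium's own routing configuration.

```java
import java.util.Locale;

/** Hypothetical mapping from captured outbox rows to outgoing messages. */
final class OutboxRoutingSketch {

    /** One row of the "Outbox" table, using the columns shown on the slide. */
    static final class OutboxRow {
        final String id;
        final String aggregateType;
        final String aggregateId;
        final String type;
        final String payload;

        OutboxRow(String id, String aggregateType, String aggregateId, String type, String payload) {
            this.id = id;
            this.aggregateType = aggregateType;
            this.aggregateId = aggregateId;
            this.type = type;
            this.payload = payload;
        }
    }

    /** The message a consumer of the outbox events would receive. */
    static final class Message {
        final String topic;
        final String key;
        final String eventType;
        final String payload;

        Message(String topic, String key, String eventType, String payload) {
            this.topic = topic;
            this.key = key;
            this.eventType = eventType;
            this.payload = payload;
        }
    }

    /** Routes by aggregate type and keys by aggregate id (the topic naming is an assumption). */
    static Message route(OutboxRow row) {
        String topic = row.aggregateType.toLowerCase(Locale.ROOT) + "-events";
        return new Message(topic, row.aggregateId, row.type, row.payload);
    }

    public static void main(String[] args) {
        OutboxRow row = new OutboxRow("ec6e", "Order", "123", "OrderCreated", "{ \"id\" : 123 }");
        Message message = route(row);
        System.out.println(message.topic + " / key=" + message.key + " / " + message.eventType);
    }
}
```

The service itself only inserts the outbox row in the same local transaction as its business data; producing the message is left to the change data capture pipeline.
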
  28. (1) The Issue with Dual Writes: What's the problem? Change data capture to the rescue!

    (2) CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
    (3) Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  29. Running on Kubernetes: Deployment via Operators

    - YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
    - Operator applies configuration
    - Advantages:
        - Automated deployment and scaling
        - Simplified upgrading
        - Portability across clouds

        apiVersion: "kafka.strimzi.io/v1alpha1"
        kind: "KafkaConnector"
        metadata:
          name: "inventory-connector"
          labels:
            connect-cluster: my-connect-cluster
        spec:
          class: i.d.c.p.PostgresConnector
          tasksMax: 1
          config:
            database.hostname: "postgres"
            database.port: "5432"
            database.user: "bob"
            database.password: "secret"
            database.dbname: "prod"
            database.server.name: "dbserver1"
            schema.whitelist: "inventory"
  30. Single Message Transformations: The Swiss Army Knife of Kafka Connect

    - Format conversions
    - Time/date fields
    - Extract new row state
    - Aggregate sharded tables to single topic
    - Keep compatibility with existing consumers
    - Coming with Debezium: Router, "Flattener"

    Photo: © Emilian Robert Vicol https://flic.kr/p/c8s6Y3
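
Two of the items above, "Extract new row state" and "Aggregate sharded tables to single topic", boil down to small per-message functions. The sketch below shows the idea on plain Java maps and strings rather than on the Kafka Connect API. `SmtSketch`, both method names, and the `orders_shard_<n>` table naming are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-message transformations, modelled on plain maps and strings. */
final class SmtSketch {

    /**
     * "Flattening": replaces the change event envelope by its new row state.
     * Returns null for deletes, where the envelope has no "after" state.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> extractNewRowState(Map<String, Object> envelope) {
        Object after = envelope.get("after");
        return after == null ? null : (Map<String, Object>) after;
    }

    /**
     * "Routing": maps topics of sharded tables such as orders_shard_7 to one logical topic.
     * Topics that do not match the shard pattern are returned unchanged.
     */
    static String routeShardedTopic(String topic) {
        return topic.replaceAll("^(.*)\\.orders_shard_\\d+$", "$1.orders");
    }

    public static void main(String[] args) {
        Map<String, Object> after = new HashMap<>();
        after.put("id", 1004);
        after.put("last_name", "Kretchmar");

        Map<String, Object> envelope = new HashMap<>();
        envelope.put("before", null);
        envelope.put("after", after);
        envelope.put("op", "c");

        System.out.println(extractNewRowState(envelope));
        System.out.println(routeShardedTopic("dbserver1.inventory.orders_shard_7"));
    }
}
```

Flattening keeps existing consumers working when they expect plain rows instead of the before/after envelope, at the cost of losing the old row state and the source metadata.
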
  31. Single Message Transformations: Externalizing Large Column Values

    Diagram: DBZ, Amazon S3

        {
          "before": { ... },
          "after": {
            "id": 1004,
            "last_name": "Kretchmar",
            "email": "[email protected]",
            "image": "imgs-<offset>-after"
          },
          ...
        }
  32. Takeaways

    - Change Data Capture – Liberation for your data!
    - Enabling use cases such as replication, streaming queries, maintaining CQRS read models etc.
    - Microservices: outbox and strangler patterns
    - Debezium: open-source CDC for a growing number of databases
  33. Takeaways

    - Change Data Capture – Liberation for your data!
    - Enabling use cases such as replication, streaming queries, maintaining CQRS read models etc.
    - Microservices: outbox and strangler patterns
    - Debezium: open-source CDC for a growing number of databases

    “Friends Don't Let Friends Do Dual-Writes”
  34. Outlook: View Materialization, Awareness of Transaction Boundaries

    - Topic with BEGIN/END markers
    - Enable consumers to buffer all events of one transaction

    BEGIN:

        {
          "transactionId" : "tx-123",
          "eventType" : "begin transaction",
          "ts_ms": 1486500577125
        }

    END:

        {
          "transactionId" : "tx-123",
          "ts_ms": 1486500577691,
          "eventType" : "end transaction",
          "eventCount" : [
            { "name" : "dbserver1.inventory.Order", "count" : 1 },
            { "name" : "dbserver1.inventory.OrderLine", "count" : 5 }
          ]
        }

    @gunnarmorling
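
A consumer could use the event counts in the END marker to decide when it has seen a whole transaction. The sketch below illustrates that idea only. `TxBufferSketch` and its methods are hypothetical, and it assumes the consumer has already summed the per-table counts (1 + 5 = 6 in the example above). Since there is no ordering across topics, the END marker may arrive before some of the data events, so completeness is checked on every arrival.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical consumer-side buffer that releases events one whole transaction at a time. */
final class TxBufferSketch {
    private final Map<String, List<String>> eventsByTx = new HashMap<>();
    private final Map<String, Integer> expectedByTx = new HashMap<>();

    /** Buffers a data event; returns the full transaction if it is now complete, else an empty list. */
    List<String> onDataEvent(String txId, String event) {
        eventsByTx.computeIfAbsent(txId, k -> new ArrayList<>()).add(event);
        return releaseIfComplete(txId);
    }

    /** Records the total event count announced by the END marker of a transaction. */
    List<String> onEndMarker(String txId, int totalEventCount) {
        expectedByTx.put(txId, totalEventCount);
        return releaseIfComplete(txId);
    }

    private List<String> releaseIfComplete(String txId) {
        Integer expected = expectedByTx.get(txId);
        List<String> events = eventsByTx.getOrDefault(txId, Collections.<String>emptyList());
        // Incomplete until the END marker is known and all announced events have arrived
        if (expected == null || events.size() < expected) {
            return Collections.emptyList();
        }
        eventsByTx.remove(txId);
        expectedByTx.remove(txId);
        return events;
    }

    public static void main(String[] args) {
        TxBufferSketch buffer = new TxBufferSketch();
        buffer.onDataEvent("tx-123", "Order#1");
        for (int i = 1; i <= 4; i++) {
            buffer.onDataEvent("tx-123", "OrderLine#" + i);
        }
        System.out.println(buffer.onEndMarker("tx-123", 6));          // still one event missing
        System.out.println(buffer.onDataEvent("tx-123", "OrderLine#5")); // transaction complete
    }
}
```

Releasing events per transaction lets a materialized view move from one consistent state to the next instead of exposing partially applied transactions.
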