Practical Change Data Streaming Use Cases With Apache Kafka and Debezium

Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) - Secret Sauce for Change Data Capture

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: how can you avoid inconsistencies between Kafka and the database?
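To make the inconsistency concrete, here is a minimal sketch of a naive dual write. `FlakyBroker` and `place_order_dual_write` are hypothetical stand-ins invented for this illustration (a plain dict plays the database, and no real Kafka client is involved); the point is only that the two writes share no transaction.

```python
class FlakyBroker:
    """Hypothetical stand-in for a message broker whose publish call can fail."""

    def __init__(self, fail=False):
        self.fail = fail
        self.messages = []

    def publish(self, message):
        if self.fail:
            raise ConnectionError("broker unavailable")
        self.messages.append(message)


def place_order_dual_write(database, broker, order):
    """Naive dual write: two independent writes with no shared transaction."""
    database[order["id"]] = order  # write 1: takes effect on its own
    broker.publish(order)          # write 2: may fail after write 1 succeeded


database = {}  # a dict standing in for the service's database
broker = FlakyBroker(fail=True)

try:
    place_order_dual_write(database, broker, {"id": 123, "status": "CREATED"})
except ConnectionError:
    pass  # the database write already happened; nothing rolls it back

# The order exists in the database, but no event ever reached the broker.
print(123 in database, len(broker.messages))  # prints: True 0
```

Swapping the order of the two writes only moves the problem: the event would then be published even if the database write fails.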

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.
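As a rough illustration of what a consumer does with such change events, the sketch below applies a small stream of events to a local read view. The envelope fields (`op`, `before`, `after`) and the operation codes `c`, `u`, `d` follow the event structure shown later in the transcript; the function name `apply_change`, the sample rows, and the in-memory dict are assumptions made for this example, and no Kafka client is involved.

```python
def apply_change(view, key, value):
    """Apply one change event to a local read view keyed by primary key."""
    op = value["op"]
    if op in ("c", "u"):   # create / update: store the new row state
        view[key["id"]] = value["after"]
    elif op == "d":        # delete: drop the row if it is present
        view.pop(key["id"], None)
    else:
        raise ValueError(f"unexpected operation: {op!r}")


# Hypothetical sample events as (key, value) pairs, in log order.
events = [
    ({"id": 1004}, {"op": "c", "before": None,
                    "after": {"id": 1004, "last_name": "Kretchmar"}}),
    ({"id": 1004}, {"op": "u", "before": {"id": 1004, "last_name": "Kretchmar"},
                    "after": {"id": 1004, "last_name": "Kretchmer"}}),
    ({"id": 1005}, {"op": "c", "before": None,
                    "after": {"id": 1005, "last_name": "Smith"}}),
    ({"id": 1005}, {"op": "d", "before": {"id": 1005, "last_name": "Smith"},
                    "after": None}),
]

view = {}
for key, value in events:
    apply_change(view, key, value)

print(view)  # prints: {1004: {'id': 1004, 'last_name': 'Kretchmer'}}
```

Because events for one key arrive in the order they were committed, replaying them yields the same final state as the source table.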

In this session you’ll see how to leverage CDC for reliable microservices integration as well as many other use cases such as extracting microservices out of monoliths, invalidating your 2nd-level cache after external data changes, automatically keeping your full-text search index in sync, maintaining audit logs, and much more. We’ll also discuss practical matters such as ensuring data quality in data streaming pipelines and implementing data conversions using single message transformations.
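The idea behind single message transformations can be sketched as a chain of small functions, each taking one record and returning a modified one. Real SMTs are Java classes plugged into Kafka Connect; the Python below is only an analogy, and the names `extract_new_row_state`, `rename_field`, `apply_chain`, and the renamed field `given_name` are hypothetical.

```python
def extract_new_row_state(event):
    """Replace the change event envelope with just its new row state."""
    return dict(event["after"])


def rename_field(old, new):
    """Build a transformation that renames one field of a flat record."""
    def transform(record):
        return {(new if name == old else name): value
                for name, value in record.items()}
    return transform


def apply_chain(record, transforms):
    """Run the record through each transformation in order."""
    for transform in transforms:
        record = transform(record)
    return record


event = {
    "before": None,
    "after": {"id": 1004, "first_name": "Anne"},
    "op": "c",
    "ts_ms": 1486500577691,
}
chain = [extract_new_row_state, rename_field("first_name", "given_name")]
result = apply_chain(event, chain)

print(result)  # prints: {'id': 1004, 'given_name': 'Anne'}
```

Keeping each transformation small and stateless is what makes them easy to combine per connector.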

Gunnar Morling

October 26, 2019

Transcript

  1. Practical Change Data Streaming Use Cases With Apache Kafka and Debezium

    Gunnar Morling, Software Engineer
  2. The Issue with Dual Writes (1): What's the problem? Change data capture to the rescue!

    CDC Use Cases & Patterns (2): Replication, Audit Logs, Microservices. Practical Matters (3): Deployment Topologies, Running on Kubernetes, Single Message Transforms.
  3. Gunnar Morling: Open source software engineer at Red Hat (Debezium, Hibernate)

    Spec Lead for Bean Validation 2.0. Other projects: Deptective, MapStruct. Java Champion. @gunnarmorling #Debezium
  4. A Common Problem: Updating Multiple Resources

    Diagram: Order Service, Database
  5. A Common Problem: Updating Multiple Resources

    Diagram: Order Service, Database, Cache
  6. A Common Problem: Updating Multiple Resources

    Diagram: Order Service, Database, Cache, Search Index
  7. A Common Problem: Updating Multiple Resources

    Diagram: Order Service, Database, Cache, Search Index. Quote: "Friends Don't Let Friends Do Dual Writes"
  8. A Better Solution: Streaming Change Events From the Database

    Diagram: Order Service
  9. A Better Solution: Streaming Change Events From the Database

    Diagram: Order Service; Change Data Capture; event stream C C U C U U D C (C = Create, U = Update, D = Delete)
  10. A Better Solution: Streaming Change Events From the Database

    Diagram: Order Service; Change Data Capture; event stream C C U C U U D C (C = Create, U = Update, D = Delete)
  11. CDC = ...

    Consumer-Driven Contracts? Centers for Disease Control and Prevention? Caribbean Developers Conference? (Image credits: © Brick Pics, https://flic.kr/p/q5YgYD; https://cdc.dev/)
  12. CDC = ...

    Consumer-Driven Contracts? Centers for Disease Control and Prevention? Caribbean Developers Conference? Change Data Capture!
  13. Debezium: Change Data Capture Platform

    Log-based CDC for multiple databases. Comprehensive type support (PostGIS etc.). Snapshotting, filtering etc. Via Apache Kafka or embedded. Fully open-source, very active community. Production deployments at multiple companies (e.g. WePay, Convoy, JW Player, Usabilla, BlaBlaCar).
  14. Debezium Connectors

    MySQL, Postgres, MongoDB, SQL Server, Cassandra (Incubating), Oracle (Incubating, based on XStream). Possible future additions: DB2? MariaDB?
  15. Log- vs. Query-Based CDC

                                                      Query-Based   Log-Based
    All data changes are captured                     -             +
    No polling delay or overhead                      -             +
    Transparent to writing applications and models    -             +
    Can capture deletes and old record state          -             +
    Installation/Configuration                        +             -
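The first and fourth rows of this comparison can be illustrated with a toy model. In the sketch below, `table` is the current row state, `change_log` is an append-only list standing in for a database transaction log, and `poll` mimics query-based capture by comparing the current rows with the previous poll. All names and sample data are hypothetical; this is not how any particular connector is implemented.

```python
table = {}       # current row state, keyed by primary key
change_log = []  # append-only record of every change (toy transaction log)


def write(op, key, row=None):
    """Apply a write to the table and record it in the log."""
    before = table.get(key)
    if op == "d":
        del table[key]
    else:
        table[key] = row
    change_log.append({"op": op, "key": key, "before": before, "after": row})


def poll(previous_snapshot):
    """Query-based capture: report rows that differ from the last poll."""
    current = dict(table)
    changed = {k: v for k, v in current.items() if previous_snapshot.get(k) != v}
    return current, changed


snapshot, _ = poll({})                      # first poll: table is empty
write("c", 1, {"email": "a@example.com"})   # row created ...
write("u", 1, {"email": "b@example.com"})   # ... updated ...
write("d", 1)                               # ... and deleted between two polls
snapshot, seen_by_polling = poll(snapshot)  # second poll

log_ops = [entry["op"] for entry in change_log]
print(seen_by_polling, log_ops)  # prints: {} ['c', 'u', 'd']
```

The poller sees nothing at all, while the log still holds the create, the update, and the delete together with the old row state.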
  16. Change Event Structure

    Key: primary key of table. Value: describes the change event (old row state, new row state, metadata). Serialization formats: JSON, Avro. Example value:
    { "before": null, "after": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "[email protected]" }, "source": { "name": "dbserver1", "server_id": 0, "ts_sec": 0, "file": "mysql-bin.000003", "pos": 154, "row": 0, "snapshot": true, "db": "inventory", "table": "customers" }, "op": "c", "ts_ms": 1486500577691 }
  17. The Issue with Dual Writes (1): What's the problem? Change data capture to the rescue!

    CDC Use Cases & Patterns (2): Replication, Audit Logs, Microservices. Practical Matters (3): Deployment Topologies, Running on Kubernetes, Single Message Transforms.
  18. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Apache Kafka
  19. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Apache Kafka, Kafka Connect (x2)
  20. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Apache Kafka, Kafka Connect (x2), DBZ PG, DBZ MySQL
  21. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector
  22. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector, JDBC Connector, ES Connector, Data Warehouse
  23. Data Replication: Zero-Code Streaming Pipelines

    Diagram: Postgres, MySQL, Kafka Connect (x2), Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector, JDBC Connector, ES Connector, ISPN Connector, Infinispan, Data Warehouse
  24. Auditing

    Diagram: CRM Service, Source DB, Kafka Connect, DBZ, Apache Kafka, Customer Events. "Transactions" table (Id, User, Use Case): tx-1, Bob, Create Customer; tx-2, Sarah, Delete Customer; tx-3, Rebecca, Update Customer.
  25. Auditing

    Diagram: CRM Service, Source DB, Kafka Connect, DBZ, Apache Kafka, Customer Events, Transactions. "Transactions" table (Id, User, Use Case): tx-1, Bob, Create Customer; tx-2, Sarah, Delete Customer; tx-3, Rebecca, Update Customer.
  26. Auditing

    Diagram: CRM Service, Source DB, Kafka Connect, DBZ, Apache Kafka, Customer Events, Transactions, Kafka Streams. "Transactions" table (Id, User, Use Case): tx-1, Bob, Create Customer; tx-2, Sarah, Delete Customer; tx-3, Rebecca, Update Customer.
  27. Auditing

    Diagram: CRM Service, Source DB, Kafka Connect, DBZ, Apache Kafka, Customer Events, Transactions, Kafka Streams, Enriched Customer Events. "Transactions" table (Id, User, Use Case): tx-1, Bob, Create Customer; tx-2, Sarah, Delete Customer; tx-3, Rebecca, Update Customer.
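The enrichment step in this diagram amounts to a join between customer change events and rows of the "Transactions" table on the transaction id. The talk uses Kafka Streams for this; the sketch below is only a batch stand-in in plain Python, assuming each customer event carries the transaction id under `source.txId`. The function name `enrich` and the `pending` list are hypothetical.

```python
def enrich(customer_events, transaction_events):
    """Attach user and use case to each customer event via its transaction id."""
    # Index the transaction metadata by transaction id.
    tx_by_id = {t["after"]["id"]: t["after"] for t in transaction_events}

    enriched, pending = [], []
    for event in customer_events:
        tx = tx_by_id.get(event["source"]["txId"])
        if tx is None:
            pending.append(event)  # metadata not seen yet; retry later
            continue
        source = {**event["source"], "user": tx["user"], "use_case": tx["use_case"]}
        enriched.append({**event, "source": source})
    return enriched, pending


customer_events = [
    {"op": "u", "after": {"id": 1004, "last_name": "Kretchmar"},
     "source": {"table": "customers", "txId": "tx-3"}},
    {"op": "u", "after": {"id": 1005, "last_name": "Smith"},
     "source": {"table": "customers", "txId": "tx-9"}},  # no metadata available
]
transaction_events = [
    {"op": "c", "after": {"id": "tx-3", "user": "Rebecca", "use_case": "Update customer"},
     "source": {"table": "transactions", "txId": "tx-3"}},
]

enriched, pending = enrich(customer_events, transaction_events)
print(enriched[0]["source"]["user"], len(pending))  # prints: Rebecca 1
```

A streaming implementation has to handle the same two cases: the metadata arriving before the data change, and the data change arriving first and having to wait.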
  28. Auditing

    Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" }, "op": "u", "ts_ms": 1486500577691 }
  29. Auditing

    Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" }, "op": "u", "ts_ms": 1486500577691 }
    Transactions: { "before": null, "after": { "id": "tx-3", "user": "Rebecca", "use_case": "Update customer" }, "source": { "name": "dbserver1", "table": "transactions", "txId": "tx-3" }, "op": "c", "ts_ms": 1486500577691 } { "id": "tx-3" }
  30. Auditing

    Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" }, "op": "u", "ts_ms": 1486500577691 }
    Transactions: { "before": null, "after": { "id": "tx-3", "user": "Rebecca", "use_case": "Update customer" }, "source": { "name": "dbserver1", "table": "transactions", "txId": "tx-3" }, "op": "c", "ts_ms": 1486500577691 } { "id": "tx-3" }
  31. Auditing (bit.ly/debezium-auditlogs)

    Enriched Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3", "user": "Rebecca", "use_case": "Update customer" }, "op": "u", "ts_ms": 1486500577691 }
  32. Microservice Architectures: Data Synchronization

    Propagate data between different services without coupling. Each service keeps optimised views locally. Diagram: Order, Item, and Stock apps, each with a local DB; Item Changes, Stock Changes.
  33. Outbox Pattern: Separate Events Table (bit.ly/debezium-outbox-pattern)

    Diagram: Order Service, Source DB (with "Outbox" table), Kafka Connect, DBZ, Apache Kafka, Order Events, Credit Worthiness Check Events, Shipment Service, Customer Service. "Outbox" table:
    Id    AggregateType  AggregateId  Type                 Payload
    ec6e  Order          123          OrderCreated         { "id" : 123, ... }
    8af8  Order          456          OrderDetailCanceled  { "id" : 456, ... }
    890b  Customer       789          InvoiceCreated       { "id" : 789, ... }
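The core of the outbox pattern is that the business row and the event row are written in one local transaction, so either both exist or neither does. Below is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the service database. The column names mirror the "Outbox" table on the slide; the `orders` table, the `create_order` function, and the `item` detail are assumptions made for this example.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_type TEXT, "
    "aggregate_id TEXT, type TEXT, payload TEXT)"
)


def create_order(conn, order_id, details):
    """Insert the order and its outbox event in a single transaction."""
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "CREATED")
        )
        # Raises TypeError if details cannot be serialized to JSON.
        payload = json.dumps({"id": order_id, **details})
        conn.execute(
            "INSERT INTO outbox (id, aggregate_type, aggregate_id, type, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "Order", str(order_id), "OrderCreated", payload),
        )


create_order(conn, 123, {"item": "book"})
try:
    create_order(conn, 456, {"item": object()})  # event cannot be built
except TypeError:
    pass  # the order insert for 456 was rolled back together with the event

orders = conn.execute("SELECT id FROM orders").fetchall()
outbox = conn.execute("SELECT aggregate_id, type FROM outbox").fetchall()
print(orders, outbox)  # prints: [(123,)] [('123', 'OrderCreated')]
```

A CDC connector then only needs to capture inserts into the outbox table and route them to topics, for instance by aggregate type.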
  34. Strangler Pattern: Migrating from Monoliths to Microservices

    https://martinfowler.com/bliki/StranglerFigApplication.html
  35. The Issue with Dual Writes (1): What's the problem? Change data capture to the rescue!

    CDC Use Cases & Patterns (2): Replication, Audit Logs, Microservices. Practical Matters (3): Deployment Topologies, Running on Kubernetes, Single Message Transforms.
  36. Deployment Topologies: Can't change binlog mode?

    Diagram: Primary, Secondary, CDC
  37. Deployment Topologies: High Availability for Connectors

    Diagram: CDC, CDC, Deduplicator
  38. Running on Kubernetes: Deployment via Operators

    YAML-based custom resource definitions for Kafka clusters, Kafka Connect, topics and users. Operator applies configuration. Advantages: automated deployment of Kafka, ZooKeeper etc.; easier scaling (up and down); simplified upgrading; portability across clouds.
  39. Running on Kubernetes: Operating Kafka Connect

    Distributed mode. Offsets stored in Kafka. Configuration via REST. Single node: no re-balancing issues (< Apache Kafka 2.3). Single connector: health checks based on REST API.
  40. Single Message Transformations: The Swiss Army Knife of Kafka Connect

    Format conversions. Time/date fields. Extract new row state. Aggregate sharded tables to single topic. Keep compatibility with existing consumers.
  41. Single Message Transformations: Externalizing large field values

    Diagram: DBZ, Amazon S3
  42. Single Message Transformations: Externalizing large field values

    Diagram: DBZ, Amazon S3. Resulting event: { "before": { ... }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "[email protected]", "image": "imgs-<offset>-after" }, ... }
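One way to picture this transformation: the large value is written to external storage and the event keeps only a reference to it. The sketch below uses a plain dict in place of Amazon S3 and builds the reference in the `imgs-<offset>-after` shape shown on the slide. The function name `externalize_field`, the `prefix` parameter, and the sample offset are assumptions; the actual transformation from the talk is not shown in the transcript.

```python
def externalize_field(event, field, offset, store, prefix="imgs"):
    """Move a field's value into external storage and keep only a reference."""
    result = dict(event)
    for state in ("before", "after"):
        row = event.get(state)
        if row and field in row:
            key = f"{prefix}-{offset}-{state}"  # e.g. imgs-42-after
            store[key] = row[field]             # upload stand-in
            result[state] = {**row, field: key}
    return result


store = {}  # a dict standing in for an object store such as Amazon S3
event = {
    "before": None,
    "after": {"id": 1004, "last_name": "Kretchmar", "image": b"\x89PNG..."},
    "op": "c",
}

slim_event = externalize_field(event, "image", offset=42, store=store)
print(slim_event["after"]["image"], list(store))  # prints: imgs-42-after ['imgs-42-after']
```

Consumers that need the image fetch it by reference, while everyone else receives a small event.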
  43. Summary

    Change Data Capture – Liberation for your data! Enabling use cases such as replication, microservices data exchange and much more. Debezium: open-source CDC for a growing number of databases.
  44. Resources

    Website: https://debezium.io/. Source code, examples, Compose files etc.: https://github.com/debezium. Discussion group: https://groups.google.com/forum/#!forum/debezium. Strimzi (Kafka on Kubernetes/OpenShift): https://strimzi.io/. Latest news: @debezium.