Practical Change Data Streaming Use Cases With Apache Kafka and Debezium

Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) - Secret Sauce for Change Data Capture

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging, though, when you add a service’s database to the picture: how can you avoid inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.

In this session you’ll see how to leverage CDC for reliable microservices integration as well as many other use cases such as extracting microservices out of monoliths, invalidating your 2nd-level cache after external data changes, automatically keeping your full-text search index in sync, maintaining audit logs, and much more. We’ll also discuss practical matters such as ensuring data quality in data streaming pipelines and implementing data conversions using single message transformations.
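
As a rough illustration of the cache-invalidation and search-index use cases mentioned above, the following sketch applies change events to an in-memory view. It assumes the flat event envelope shown in the slides (`before`, `after`, `op`) and that `id` is the table's primary key; the function name `apply_change_event` is hypothetical and not part of Debezium. Depending on converter settings, real events may be wrapped in an additional schema/payload structure, which this sketch does not handle.

```python
import json


def apply_change_event(view: dict, raw_event: str) -> None:
    """Apply one change event to an in-memory view keyed by primary key.

    Assumes the op codes from the slides: "c" (create), "u" (update), "d" (delete).
    """
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "u"):
        # Creates and updates carry the new row state in "after".
        row = event["after"]
        view[row["id"]] = row
    elif op == "d":
        # Deletes carry only the old row state in "before".
        row = event["before"]
        view.pop(row["id"], None)
    else:
        raise ValueError(f"unexpected op code: {op!r}")


# Row taken from the example event in the slides.
customer = {
    "id": 1004,
    "first_name": "Anne",
    "last_name": "Kretchmar",
    "email": "annek@noanswer.org",
}

cache: dict = {}
apply_change_event(cache, json.dumps({"before": None, "after": customer, "op": "c"}))
snapshot_after_create = dict(cache)  # copy kept only to inspect the intermediate state

apply_change_event(cache, json.dumps({"before": customer, "after": None, "op": "d"}))
```

Because the view is driven only by the events, the writing application never has to update the cache itself, which is the point of avoiding dual writes.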

Gunnar Morling

October 26, 2019

Transcript

  1. Practical Change Data Streaming Use Cases With Apache Kafka and Debezium. Gunnar Morling, Software Engineer
  2. DATA

  3. None
  4. None
  5. (1) The Issue with Dual Writes: What's the problem? Change data capture to the rescue! (2) CDC Use Cases & Patterns: Replication, Audit Logs, Microservices. (3) Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  6. Gunnar Morling: Open source software engineer at Red Hat (Debezium, Hibernate). Spec Lead for Bean Validation 2.0. Other projects: Deptective, MapStruct. Java Champion. #Debezium @gunnarmorling
  7. A Common Problem: Updating Multiple Resources. Database, Order Service
  8. A Common Problem: Updating Multiple Resources. Cache, Database, Order Service
  9. A Common Problem: Updating Multiple Resources. Cache, Database, Order Service, Search Index
  10. A Common Problem: Updating Multiple Resources. Order Service, Cache, Database, Search Index. “Friends Don't Let Friends Do Dual Writes”
  11. A Better Solution: Streaming Change Events From the Database. Order Service
  12. A Better Solution: Streaming Change Events From the Database. Order Service, C C U C U U D C (C - Create, U - Update, D - Delete), Change Data Capture
  13. A Better Solution: Streaming Change Events From the Database. Order Service, C C U C U U D C (C - Create, U - Update, D - Delete), Change Data Capture
  14. CDC = ... Consumer-Driven Contracts? Centers for Disease Control and Prevention? Caribbean Developers Conference? © Brick Pics © https://flic.kr/p/q5YgYD https://cdc.dev/
  15. CDC = ... Consumer-Driven Contracts? Centers for Disease Control and Prevention? Caribbean Developers Conference? Change Data Capture!
  16. Change Data Capture With Debezium
  17. Debezium: Change Data Capture Platform. Log-based CDC for multiple databases; comprehensive type support (PostGIS etc.); snapshotting, filtering etc.; via Apache Kafka or embedded; fully open-source, very active community; production deployments at multiple companies (e.g. WePay, Convoy, JW Player, Usabilla, BlaBlaCar etc.)
  18. Debezium Connectors: MySQL, Postgres, MongoDB, SQL Server, Cassandra (Incubating), Oracle (Incubating, based on XStream). Possible future additions: DB2? MariaDB?
  19. Meme idea: Robin Moffatt

  20. Log- vs. Query-Based CDC (query-based / log-based): All data changes are captured (- / +); No polling delay or overhead (- / +); Transparent to writing applications and models (- / +); Can capture deletes and old record state (- / +); Installation/Configuration (+ / -)
  21. Change Event Structure. Key: Primary key of table. Value: Describing the change event (old row state, new row state, metadata). Serialization formats: JSON, Avro. { "before": null, "after": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { "name": "dbserver1", "server_id": 0, "ts_sec": 0, "file": "mysql-bin.000003", "pos": 154, "row": 0, "snapshot": true, "db": "inventory", "table": "customers" }, "op": "c", "ts_ms": 1486500577691 }
  22. (1) The Issue with Dual Writes: What's the problem? Change data capture to the rescue! (2) CDC Use Cases & Patterns: Replication, Audit Logs, Microservices. (3) Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  23. CDC – "Liberation for Your Data"
  24. Data Replication: Zero-Code Streaming Pipelines. Postgres, MySQL, Apache Kafka
  25. Data Replication: Zero-Code Streaming Pipelines. Postgres, MySQL, Apache Kafka, Kafka Connect, Kafka Connect
  26. Data Replication: Zero-Code Streaming Pipelines. Postgres, MySQL, Apache Kafka, Kafka Connect, Kafka Connect, DBZ PG, DBZ MySQL
  27. Data Replication: Zero-Code Streaming Pipelines. Postgres, MySQL, Kafka Connect, Kafka Connect, Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector
  28. Data Replication: Zero-Code Streaming Pipelines. Postgres, MySQL, Kafka Connect, Kafka Connect, Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector, JDBC Connector, ES Connector, Data Warehouse
  29. Data Replication: Zero-Code Streaming Pipelines. Postgres, MySQL, Kafka Connect, Kafka Connect, Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector, JDBC Connector, ES Connector, ISPN Connector, Infinispan, Data Warehouse
  30. Auditing: Source DB, Kafka Connect, Apache Kafka, DBZ, Customer Events, CRM Service
  31. Auditing: Source DB, Kafka Connect, Apache Kafka, DBZ, Customer Events, CRM Service. "Transactions" table (Id, User, Use Case): (tx-1, Bob, Create Customer), (tx-2, Sarah, Delete Customer), (tx-3, Rebecca, Update Customer)
  32. Auditing: Source DB, Kafka Connect, Apache Kafka, DBZ, Customer Events, Transactions, CRM Service. "Transactions" table (Id, User, Use Case): (tx-1, Bob, Create Customer), (tx-2, Sarah, Delete Customer), (tx-3, Rebecca, Update Customer)
  33. Auditing: Source DB, Kafka Connect, Apache Kafka, DBZ, Customer Events, Transactions, CRM Service, Kafka Streams. "Transactions" table (Id, User, Use Case): (tx-1, Bob, Create Customer), (tx-2, Sarah, Delete Customer), (tx-3, Rebecca, Update Customer)
  34. Auditing: Source DB, Kafka Connect, Apache Kafka, DBZ, Customer Events, Transactions, CRM Service, Kafka Streams, Enriched Customer Events. "Transactions" table (Id, User, Use Case): (tx-1, Bob, Create Customer), (tx-2, Sarah, Delete Customer), (tx-3, Rebecca, Update Customer)
  35. Auditing. Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "annek@example.com" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" }, "op": "u", "ts_ms": 1486500577691 }
  36. Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "annek@example.com" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" }, "op": "u", "ts_ms": 1486500577691 } Transactions: { "before": null, "after": { "id": "tx-3", "user": "Rebecca", "use_case": "Update customer" }, "source": { "name": "dbserver1", "table": "transactions", "txId": "tx-3" }, "op": "c", "ts_ms": 1486500577691 } { "id": "tx-3" }
  37. Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "annek@example.com" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3" }, "op": "u", "ts_ms": 1486500577691 } Transactions: { "before": null, "after": { "id": "tx-3", "user": "Rebecca", "use_case": "Update customer" }, "source": { "name": "dbserver1", "table": "transactions", "txId": "tx-3" }, "op": "c", "ts_ms": 1486500577691 } { "id": "tx-3" }
  38. Auditing (bit.ly/debezium-auditlogs). Enriched Customers: { "before": { "id": 1004, "last_name": "Kretchmar", "email": "annek@example.com" }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { "name": "dbserver1", "table": "customers", "txId": "tx-3", "user": "Rebecca", "use_case": "Update customer" }, "op": "u", "ts_ms": 1486500577691 }
  39. Demo: Streaming Queries
  40. None
  41. Microservice CDC Patterns
  42. Microservice Architectures: Data Synchronization. Propagate data between different services without coupling; each service keeps optimised views locally. Order, Item, Stock (App and Local DB each), Item Changes, Stock Changes
  43. Outbox Pattern: Separate Events Table. Source DB (with "Outbox" table), Kafka Connect, Apache Kafka, DBZ, Order Events, Credit Worthiness Check Events; Order Service, Shipment Service, Customer Service. "Outbox" table (Id, AggregateType, AggregateId, Type, Payload): (ec6e, Order, 123, OrderCreated, { "id" : 123, ... }), (8af8, Order, 456, OrderDetailCanceled, { "id" : 456, ... }), (890b, Customer, 789, InvoiceCreated, { "id" : 789, ... }). bit.ly/debezium-outbox-pattern
  44. Strangler Pattern: Migrating from Monoliths to Microservices. https://martinfowler.com/bliki/StranglerFigApplication.html
  45. Strangler Pattern: Customer
  46. Strangler Pattern: Router, CDC, Customer, Customer', Reads/Writes, Reads, Transformation
  47. Strangler Pattern: Router, CDC, Customer, Reads/Writes, Reads/Writes, CDC
  48. (1) The Issue with Dual Writes: What's the problem? Change data capture to the rescue! (2) CDC Use Cases & Patterns: Replication, Audit Logs, Microservices. (3) Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  49. Deployment Topologies: Basic Set-Up. CDC
  50. Deployment Topologies: Database High Availability. CDC
  51. Deployment Topologies: Database High Availability. CDC
  52. Deployment Topologies: Automatic Fail-over. HA Proxy, CDC
  53. Deployment Topologies: Can't change binlog mode? CDC, Primary, Secondary
  54. Deployment Topologies: High Availability for Connectors. CDC, Deduplicator, CDC
  55. Running on Kubernetes: Deployment via Operators. YAML-based custom resource definitions for Kafka clusters, Kafka Connect, topics and users; Operator applies configuration. Advantages: automated deployment of Kafka, ZooKeeper etc.; easier scaling (up and down); simplified upgrading; portability across clouds
  56. Running on Kubernetes: Operating Kafka Connect. Distributed mode; offsets stored in Kafka; configuration via REST; single node: no re-balancing issues (< Apache Kafka 2.3); single connector: health checks based on REST API
  57. Single Message Transformations: The Swiss Army Knife of Kafka Connect. Format conversions; time/date fields; extract new row state; aggregate sharded tables to single topic; keep compatibility with existing consumers
  58. Single Message Transformations: Externalizing large field values. DBZ, Amazon S3
  59. Single Message Transformations: Externalizing large field values. DBZ, Amazon S3. { "before": { ... }, "after": { "id": 1004, "last_name": "Kretchmar", "email": "annek@noanswer.org", "image": "imgs-<offset>-after" }, ... }
  60. Summary: Change Data Capture – Liberation for your data! Enabling use cases such as replication, microservices data exchange and much more. Debezium: open-source CDC for a growing number of databases
  61. DATA

  62. Resources. Website: https://debezium.io/; Source code, examples, Compose files etc.: https://github.com/debezium; Discussion group: https://groups.google.com/forum/#!forum/debezium; Strimzi (Kafka on Kubernetes/OpenShift): https://strimzi.io/; Latest news: @debezium
  63. gunnar@hibernate.org @gunnarmorling #Debezium ?!

  64. None
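
The auditing slides (30 to 38) join customer change events with rows from the "Transactions" table via the shared `txId`. The sketch below mimics that join in plain Python using the example events from the slides. It is only an in-memory stand-in, not the Kafka Streams topology used in the talk, and the helper names `index_transaction` and `enrich` are hypothetical.

```python
from copy import deepcopy
from typing import Optional


def index_transaction(transactions: dict, tx_event: dict) -> None:
    """Remember the audit metadata of a 'transactions' table change event by its id."""
    row = tx_event["after"]
    transactions[row["id"]] = {"user": row["user"], "use_case": row["use_case"]}


def enrich(change_event: dict, transactions: dict) -> Optional[dict]:
    """Return a copy of the change event with user and use case added to 'source'.

    Returns None if the matching transaction metadata has not been seen yet;
    a real pipeline would have to buffer or retry in that case.
    """
    metadata = transactions.get(change_event["source"]["txId"])
    if metadata is None:
        return None
    enriched = deepcopy(change_event)  # leave the original event untouched
    enriched["source"].update(metadata)
    return enriched


# Example events as shown on the slides.
customer_event = {
    "before": {"id": 1004, "last_name": "Kretchmar", "email": "annek@example.com"},
    "after": {"id": 1004, "last_name": "Kretchmar", "email": "annek@noanswer.org"},
    "source": {"name": "dbserver1", "table": "customers", "txId": "tx-3"},
    "op": "u",
    "ts_ms": 1486500577691,
}
transaction_event = {
    "before": None,
    "after": {"id": "tx-3", "user": "Rebecca", "use_case": "Update customer"},
    "source": {"name": "dbserver1", "table": "transactions", "txId": "tx-3"},
    "op": "c",
    "ts_ms": 1486500577691,
}

transactions: dict = {}
too_early = enrich(customer_event, transactions)  # metadata not indexed yet
index_transaction(transactions, transaction_event)
enriched_event = enrich(customer_event, transactions)
```

The resulting `enriched_event` has the same shape as the "Enriched Customers" event on slide 38, with `user` and `use_case` added to the `source` block.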