
Introducing Change Data Capture with Debezium And Apache Kafka

Aykut Bulgu
November 26, 2020


Istanbul JUG, Nov 26th 2020

Follow me on Twitter (@systemcraftsman) or subscribe at https://www.systemcraftsman.com/join/ to get updates from me.


Transcript

  1. @systemcraftsman Introducing Change Data Capture with Debezium and Apache Kafka. Aykut M. Bulgu, Technology Consultant | Software Architect, aykut@systemcraftsman.com
  2. @systemcraftsman Me as Code (#oc apply -f aykutbulgu.yaml):
     apiVersion: redhat/v2.5
     kind: Middleware & AppDev Consultant
     metadata:
       name: Aykut Bulgu
       namespace: Red Hat Consulting - CEMEA
       annotations:
         twitter: @systemcraftsman
         email: aykut@systemcraftsman.com
         organizer: Software Craftsmanship Turkey
         founder: System Craftsman
       labels:
         married: yes
         children: daughter
         interests: tech (cloud & middleware), aikido, 80s
     spec:
       replicas: 2
       containers:
       - image: aykut:latest
  3. @systemcraftsman Agenda:
     - The Issue with Dual Writes: What's the problem? Change data capture to the rescue!
     - CDC Use Cases & Patterns: Replication, Audit Logs, Microservices
     - Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  4. @systemcraftsman Common Problem, updating multiple resources: Order Service, Database
  5. @systemcraftsman Common Problem, updating multiple resources: Order Service, Database, Cache
  6. @systemcraftsman Common Problem, updating multiple resources: Order Service, Database, Cache, Search Index
  7. @systemcraftsman Common Problem, updating multiple resources: Order Service, Database, Cache, Search Index
  8. @systemcraftsman ‘Friends Don't Let Friends Do Dual Writes!’

  9. @systemcraftsman As a Solution: stream change events from the database (Order Service)
  10. @systemcraftsman As a Solution: stream change events from the database. The Order Service emits a stream C | C | U | C | U | U | D via Change Data Capture (C = Create, U = Update, D = Delete)
  11. @systemcraftsman As a Solution: stream change events from the database. The Order Service emits a stream C | C | U | C | U | U | D via Change Data Capture (C = Create, U = Update, D = Delete)
  12. @systemcraftsman Change Data Capture with Debezium: Debezium is an open source distributed platform for change data capture
  13. @systemcraftsman Debezium Change Data Capture Platform: CDC for multiple databases, based on transaction logs; snapshotting, filtering, etc.; fully open source with a very active community; latest version: 1.3; production deployments at multiple companies (e.g. WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar)
  14. @systemcraftsman Red Hat CDC Supported Databases. GA connectors: MySQL, Postgres, SQL Server, MongoDB. Developer Preview: DB2
  15. @systemcraftsman Advantages of Log-based CDC (tailing the transaction logs): all data changes are captured; no polling delay or overhead; transparent to writing applications and models; can capture deletes; can capture old record state and further metadata. https://debezium.io/blog/2018/07/19/advantages-of-log-based-change-data-capture/
  16. @systemcraftsman Log-based vs Query-based CDC:
                                                      Query-based | Log-based
      All data changes are captured                        -      |     +
      No polling delay or overhead                         -      |     +
      Transparent to writing applications and models       -      |     +
      Can capture deletes and old record state             -      |     +
      Simple installation/configuration                    +      |     -
  17. @systemcraftsman Debezium Change Event Structure:
      - Key: primary key of the table
      - Value: describes the change event (before state, after state, metadata)
      - Serialization formats: JSON, Avro
      - CloudEvents can be used too
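For illustration, an update to a row in a hypothetical customers table produces an event roughly like the following (a simplified sketch: key and value are actually separate parts of the Kafka record, and the exact fields depend on connector and serialization settings; "op" is "c" for create, "u" for update, "d" for delete, "r" for snapshot reads):

```json
{
  "key": { "id": 1004 },
  "value": {
    "before": { "id": 1004, "email": "old@example.com" },
    "after":  { "id": 1004, "email": "new@example.com" },
    "source": {
      "connector": "mysql",
      "db": "inventory",
      "table": "customers",
      "ts_ms": 1606383600000
    },
    "op": "u",
    "ts_ms": 1606383600123
  }
}
```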
  18. @systemcraftsman Single Message Transformations: lightweight inline transformation of single messages; format conversions; time/date fields; extract new row state; aggregate sharded tables to a single topic; keep compatibility with existing consumers; transformations do not interact with external systems; modify events before storing in Kafka. (Image: "Penknife, Swiss Army Knife" by Emilian Robert Vicol, used under CC BY 2.0)
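One SMT that Debezium ships is ExtractNewRecordState, which unwraps the before/after envelope into just the new row state so existing consumers keep working. A sketch of the relevant connector configuration fragment (the SMT class and property names are real Debezium 1.x options; the values are illustrative):

```json
{
  "transforms": "unwrap",
  "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
  "transforms.unwrap.drop.tombstones": "false",
  "transforms.unwrap.add.fields": "op,source.ts_ms"
}
```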
  19. @systemcraftsman Change Data Capture Use Cases & Patterns

  20. @systemcraftsman Data Replication, Zero-Code Streaming Pipelines: MySQL and PostgreSQL as sources, Apache Kafka in the middle
  21. @systemcraftsman Data Replication, Zero-Code Streaming Pipelines: adds Kafka Connect on both sides of Apache Kafka
  22. @systemcraftsman Data Replication, Zero-Code Streaming Pipelines: adds the DBZ MySQL and DBZ PG source connectors
  23. @systemcraftsman Data Replication, Zero-Code Streaming Pipelines: adds an ES Connector feeding ElasticSearch
  24. @systemcraftsman Data Replication, Zero-Code Streaming Pipelines: adds a SQL Connector feeding a Data Warehouse
  25. @systemcraftsman Auditing, CDC and a bit of Kafka Streams: a CRM Service writes to a Source DB, captured by DBZ on Kafka Connect into Apache Kafka. Source: http://bit.ly/debezium-auditlogs
  26. @systemcraftsman Auditing: adds a transactions table (Id | User | Use Case: tx-1 Bob Create Customer; tx-2 Sarah Delete Customer; tx-3 Rebecca Update Customer)
  27. @systemcraftsman Auditing: adds Customer Events and Transactions topics
  28. @systemcraftsman Auditing: adds Kafka Streams
  29. @systemcraftsman Auditing: adds an Enriched Customers topic produced by Kafka Streams
  30. @systemcraftsman Auditing, CDC and a bit of Kafka Streams. Source: http://bit.ly/debezium-auditlogs
  31. @systemcraftsman Auditing, CDC and a bit of Kafka Streams. Source: http://bit.ly/debezium-auditlogs
  32. @systemcraftsman Auditing, CDC and a bit of Kafka Streams. Source: http://bit.ly/debezium-auditlogs
  33. @systemcraftsman Auditing, CDC and a bit of Kafka Streams. Source: http://bit.ly/debezium-auditlogs
  34. @systemcraftsman Microservices Data Exchange: propagate data between different services without coupling; each service keeps optimised views locally
  35. @systemcraftsman Microservices: Outbox Pattern. Source: http://bit.ly/debezium-outbox-pattern
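In the outbox pattern, the service writes a domain event into an outbox table in the same transaction as its state change; Debezium captures that table, and its EventRouter SMT routes events to per-aggregate topics. A configuration sketch (the SMT class is real; the field names shown are its defaults, spelled out for illustration):

```json
{
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.route.by.field": "aggregatetype",
  "transforms.outbox.table.field.event.key": "aggregateid"
}
```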

  36. @systemcraftsman Strangler Pattern: extract a microservice for single component(s); keep write requests against the running monolith; stream changes to the extracted microservice; test new functionality; switch over, evolve schema only afterwards. (Photo: "Strangler vines on trees, seen on the Mount Sorrow hike" by cynren, under CC BY SA 2.0)
  37. @systemcraftsman Mono to micro, Strangler Pattern: the Customer component inside the monolith
  38. @systemcraftsman Mono to micro, Strangler Pattern: a Router sends reads/writes to the monolith's Customer component and reads to the extracted Customer service, kept in sync via CDC and a Transformation
  39. @systemcraftsman Mono to micro, Strangler Pattern: the Router sends reads/writes to both sides, synchronized via CDC
  40. @systemcraftsman Running on OpenShift: getting the best cloud-native Apache Kafka running on enterprise Kubernetes
  41. @systemcraftsman Running on OpenShift (upstream community: Strimzi, cloud-native Apache Kafka). Provides: container images for Apache Kafka, Connect, ZooKeeper and MirrorMaker; Kubernetes Operators for managing/configuring Apache Kafka clusters, topics and users; Kafka Consumer, Producer and Admin clients; Kafka Streams
  42. @systemcraftsman Running on OpenShift, Deployment via Operators: YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.; the Operator applies the configuration. Advantages: automated deployment and scaling, simplified upgrading, portability across clouds
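With the operator, a Debezium connector itself can be expressed as a YAML custom resource that the operator reconciles into the Connect cluster. A sketch of a Strimzi KafkaConnector resource (API version as it was around late 2020; cluster name, hostnames and credentials are placeholders):

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: inventory-connector
  labels:
    # must match the name of the KafkaConnect cluster resource
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    database.hostname: mysql
    database.port: 3306
    database.user: debezium
    database.password: dbz
    database.server.id: 184054
    database.server.name: dbserver1
    database.include.list: inventory
    database.history.kafka.bootstrap.servers: kafka:9092
    database.history.kafka.topic: schema-changes.inventory
```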
  43. @systemcraftsman Demo Time! https://github.com/systemcraftsman/debezium-demo

  44. @systemcraftsman Thank You @systemcraftsman aykut@systemcraftsman.com aykut@redhat.com