
Introducing Change Data Capture with Debezium And Apache Kafka

Aykut Bulgu
November 26, 2020


Istanbul JUG, Nov 26th 2020

Follow me on Twitter (@systemcraftsman) or subscribe at https://www.systemcraftsman.com/join/ to get updates from me.


Transcript

  1. @systemcraftsman Me as Code - #oc apply -f aykutbulgu.yaml
     apiVersion: redhat/v2.5
     kind: Middleware & AppDev Consultant
     metadata:
       name: Aykut Bulgu
       namespace: Red Hat Consulting - CEMEA
       annotations:
         twitter: @systemcraftsman
         email: [email protected]
         organizer: Software Craftsmanship Turkey
         founder: System Craftsman
       labels:
         married: yes
         children: daughter
         interests: tech (cloud & middleware), aikido, 80s
     spec:
       replicas: 2
       containers:
       - image: aykut:latest
  2. @systemcraftsman Agenda: The Issue with Dual Writes (What's the problem?; Change data capture to the rescue!); CDC Use Cases & Patterns (Replication, Audit Logs, Microservices); Practical Matters (Deployment Topologies, Running on Kubernetes, Single Message Transforms)
  3. @systemcraftsman As a Solution: stream change events from the database. Order Service -> C | C | U | C | U | U | D -> Change Data Capture (C - Create, U - Update, D - Delete)
  4. @systemcraftsman As a Solution: stream change events from the database. Order Service -> C | C | U | C | U | U | D -> Change Data Capture (C - Create, U - Update, D - Delete)
  5. @systemcraftsman Change Data Capture with Debezium Debezium is an open

    source distributed platform for change data capture
  6. @systemcraftsman Debezium - Change Data Capture Platform: CDC for multiple databases; based on transaction logs; snapshotting, filtering, etc.; fully open source, very active community; latest version: 1.3; production deployments at multiple companies (WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar, etc.)
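A minimal sketch of what registering such a connector can look like through the Kafka Connect REST API (POST to /connectors). Host names, credentials, database and table names below are placeholders rather than anything from the deck; the snapshot.mode and *.include.list properties correspond to the snapshotting and filtering capabilities mentioned above:

    {
      "name": "inventory-connector",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.include.list": "inventory",
        "table.include.list": "inventory.orders",
        "snapshot.mode": "initial",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
      }
    }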
  7. @systemcraftsman Advantages of Log-based CDC - Tailing the Transaction Logs: all data changes are captured; no polling delay or overhead; transparent to writing applications and models; can capture deletes; can capture old record state and further metadata. https://debezium.io/blog/2018/07/19/advantages-of-log-based-change-data-capture/
  8. @systemcraftsman Log vs Query based CDC (+ = supported, - = not supported)

     Criterion                                        Query-based   Log-based
     All data changes are captured                    -             +
     No polling delay or overhead                     -             +
     Transparent to writing applications and models   -             +
     Can capture deletes and old record state         -             +
     Simple installation/configuration                +             -
  9. @systemcraftsman Debezium Change Event Structure • Key: PK of the table • Value: describes the change event ◦ before state ◦ after state ◦ metadata info • Serialization formats: ◦ JSON ◦ Avro • CloudEvents can be used as well
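As a rough illustration of that structure (field values are invented and the schema portion is omitted), an update to an orders row could be serialized as the following JSON key and value payload:

    Key:
    { "id": 1001 }

    Value (payload):
    {
      "before": { "id": 1001, "quantity": 1 },
      "after":  { "id": 1001, "quantity": 2 },
      "source": {
        "connector": "mysql",
        "name": "dbserver1",
        "db": "inventory",
        "table": "orders",
        "file": "mysql-bin.000003",
        "pos": 805
      },
      "op": "u",
      "ts_ms": 1606385400000
    }

The op field uses c/u/d for create, update and delete, matching the C | U | D notation on the earlier slides.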
  10. @systemcraftsman Single Message Transformations (Image source: "Penknife, Swiss Army Knife" by Emilian Robert Vicol, used under CC BY 2.0): lightweight, inline transformation of single messages; format conversions; time/date fields; extract new row state; aggregate sharded tables into a single topic; keep compatibility with existing consumers. Transformations do not interact with external systems; they modify events before they are stored in Kafka.
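For example, the "extract new row state" transformation can be switched on with a few connector properties, so that consumers receive only the flattened after-state of each row. A sketch; the add.fields selection below is just one possible choice:

    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.add.fields": "op,source.ts_ms"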
  11. @systemcraftsman Data Replication - Zero-Code Streaming Pipelines [Diagram: MySQL and PostgreSQL databases alongside Apache Kafka topics]
  12. @systemcraftsman Data Replication - Zero-Code Streaming Pipelines [Diagram: Kafka Connect added between the MySQL/PostgreSQL databases and Apache Kafka]
  13. @systemcraftsman Data Replication - Zero-Code Streaming Pipelines [Diagram: Debezium MySQL and Debezium PostgreSQL source connectors running on Kafka Connect, streaming changes into Apache Kafka]
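The PostgreSQL side of such a pipeline is configured much like the MySQL example shown earlier. A sketch with placeholder connection details; database and table names are assumptions, and plugin.name pgoutput assumes a reasonably recent PostgreSQL server:

    {
      "name": "postgres-source-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.dbname": "shipmentdb",
        "database.server.name": "pgserver1",
        "table.include.list": "public.shipments",
        "plugin.name": "pgoutput",
        "slot.name": "debezium_slot"
      }
    }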
  14. @systemcraftsman Data Replication - Zero-Code Streaming Pipelines [Diagram: an Elasticsearch sink connector added on the consuming side, writing change events from Kafka into Elasticsearch]
  15. @systemcraftsman Data Replication - Zero-Code Streaming Pipelines [Diagram: full pipeline - Debezium MySQL/PostgreSQL source connectors into Apache Kafka, Elasticsearch and SQL sink connectors out to Elasticsearch and a data warehouse]
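The consuming side needs no custom code either. A sketch of the Elasticsearch sink (assuming Confluent's Elasticsearch sink connector; topic name and URL are placeholders, and the unwrap SMT from above flattens the Debezium envelope before indexing):

    {
      "name": "elastic-sink-connector",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "dbserver1.inventory.orders",
        "connection.url": "http://elasticsearch:9200",
        "key.ignore": "false",
        "type.name": "_doc",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
      }
    }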
  16. @systemcraftsman Auditing - CDC and a bit of Kafka Streams (Source: http://bit.ly/debezium-auditlogs) [Diagram: a CRM service writes to its source DB; a Debezium connector on Kafka Connect streams the changes into Apache Kafka]
  17. @systemcraftsman Auditing - CDC and a bit of Kafka Streams (Source: http://bit.ly/debezium-auditlogs) [Diagram: same pipeline, plus the business transactions behind the changes:]
      Id    User     Use Case
      tx-1  Bob      Create Customer
      tx-2  Sarah    Delete Customer
      tx-3  Rebecca  Update Customer
  18. @systemcraftsman Auditing - CDC and a bit of Kafka Streams (Source: http://bit.ly/debezium-auditlogs) [Diagram: the change events land in a Customer Events topic and the transaction metadata (tx-1 Bob Create Customer, tx-2 Sarah Delete Customer, tx-3 Rebecca Update Customer) in a Transactions topic]
  19. @systemcraftsman Auditing - CDC and a bit of Kafka Streams (Source: http://bit.ly/debezium-auditlogs) [Diagram: a Kafka Streams application consumes both the Customer Events and the Transactions topics]
  20. @systemcraftsman Auditing - CDC and a bit of Kafka Streams (Source: http://bit.ly/debezium-auditlogs) [Diagram: the Kafka Streams application joins customer change events with the transaction metadata and writes the result to an Enriched Customers topic]
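The Transactions topic in this picture can be populated in different ways; one option (an assumption here, the linked post may capture transaction-scoped metadata such as user and use case from a table the application maintains) is to let a recent Debezium connector emit transaction boundary metadata directly by adding to the source connector configuration:

    "provide.transaction.metadata": "true"

Either way, the Kafka Streams application joins the customer change events with that transaction data to produce the enriched audit records shown above.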
  21. @systemcraftsman Microservices - Data Exchange: propagate data between different services without coupling; each service keeps optimised views locally.
  22. @systemcraftsman Microservices - Strangler Pattern (Photo: "Strangler vines on trees, seen on the Mount Sorrow hike" by cynren, under CC BY-SA 2.0): extract a microservice for single component(s); keep write requests against the running monolith; stream changes to the extracted microservice; test the new functionality; switch over and evolve the schema only afterwards.
  23. @systemcraftsman Running on OpenShift - Cloud-native Apache Kafka. Provides: container images for Apache Kafka, Connect, ZooKeeper and MirrorMaker; Kubernetes Operators for managing/configuring Apache Kafka clusters, topics and users; Kafka Consumer, Producer and Admin clients, Kafka Streams. Upstream community: Strimzi.
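A taste of what that declarative approach looks like: a minimal Kafka cluster definition as a Strimzi custom resource. This is a sketch only; the apiVersion and listener syntax vary slightly between Strimzi releases, the cluster name is a placeholder, and ephemeral storage is only suitable for testing:

    apiVersion: kafka.strimzi.io/v1beta1
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        replicas: 3
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
        storage:
          type: ephemeral
      zookeeper:
        replicas: 3
        storage:
          type: ephemeral
      entityOperator:
        topicOperator: {}
        userOperator: {}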
  24. @systemcraftsman Running on OpenShift - Deployment via Operators: the desired state is described in YAML-based custom resource definitions for Kafka/Connect clusters, topics etc., and the Operator applies that configuration. Advantages: automated deployment and scaling; simplified upgrading; portability across clouds.
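For Kafka Connect and the Debezium connectors themselves, the same model applies. A sketch with assumed names (the container image and connector settings are placeholders; the strimzi.io/use-connector-resources annotation lets the operator manage connectors as resources too):

    apiVersion: kafka.strimzi.io/v1beta1
    kind: KafkaConnect
    metadata:
      name: debezium-connect
      annotations:
        strimzi.io/use-connector-resources: "true"
    spec:
      replicas: 1
      bootstrapServers: my-cluster-kafka-bootstrap:9092
      image: my-registry/connect-with-debezium:1.3  # custom image containing the Debezium plug-ins
    ---
    apiVersion: kafka.strimzi.io/v1beta1
    kind: KafkaConnector
    metadata:
      name: inventory-connector
      labels:
        strimzi.io/cluster: debezium-connect
    spec:
      class: io.debezium.connector.mysql.MySqlConnector
      tasksMax: 1
      config:
        database.hostname: mysql
        database.server.name: dbserver1
        database.include.list: inventory

The operator watches these resources and (re)configures the Connect cluster and its connectors accordingly.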