@systemcraftsman Introducing Change Data Capture with Debezium and Apache Kafka Aykut M. Bulgu Technology Consultant | Software Architect [email protected]

Agenda
● The Issue with Dual Writes
  ○ What's the problem?
  ○ Change data capture to the rescue!
● CDC Use Cases & Patterns
  ○ Replication
  ○ Audit Logs
  ○ Microservices
● Practical Matters
  ○ Deployment Topologies
  ○ Running on Kubernetes
  ○ Single Message Transforms

As a Solution: Change Data Capture
● Stream change events from the database
● Diagram: Order Service, Change Data Capture, event stream C | C | U | C | U | U | D
● Legend: C - Create, U - Update, D - Delete

Debezium: Change Data Capture Platform
● CDC for multiple databases
● Based on transaction logs
● Snapshotting, filtering, etc.
● Fully open-source, very active community
● Latest version: 1.3
● Production deployments at multiple companies (e.g. WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar)

Advantages of Log-based CDC: Tailing the Transaction Logs
● All data changes are captured
● No polling delay or overhead
● Transparent to writing applications and models
● Can capture deletes
● Can capture old record state and further metadata
https://debezium.io/blog/2018/07/19/advantages-of-log-based-change-data-capture/

Log- vs Query-based CDC
                                                Query-based   Log-based
All data changes are captured                   -             ✓
No polling delay or overhead                    -             ✓
Transparent to writing applications and models  -             ✓
Can capture deletes and old record state        -             ✓
Simple Installation/Configuration               ✓             -
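
To make the comparison above concrete, the sketch below simulates both approaches against a tiny in-memory table. It is an illustration only, not Debezium code: the `Table` class, its logical `clock`, the `log` list and the `poll` helper are hypothetical names invented for this example, and the "transaction log" is just a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table: current rows plus an append-only change log (hypothetical)."""
    rows: dict = field(default_factory=dict)  # key -> (value, updated_at)
    log: list = field(default_factory=list)   # (op, key, before, after)
    clock: int = 0                            # logical timestamp

    def _tick(self):
        self.clock += 1
        return self.clock

    def insert(self, key, value):
        self.rows[key] = (value, self._tick())
        self.log.append(("c", key, None, value))

    def update(self, key, value):
        before, _ = self.rows[key]
        self.rows[key] = (value, self._tick())
        self.log.append(("u", key, before, value))

    def delete(self, key):
        before, _ = self.rows.pop(key)
        self._tick()
        self.log.append(("d", key, before, None))


def poll(table, since):
    """Query-based CDC: select rows whose updated_at is newer than the last poll."""
    changed = [(k, v) for k, (v, ts) in table.rows.items() if ts > since]
    return changed, table.clock


table = Table()

# Several writes happen between two polls.
table.insert(1, "pending")
table.update(1, "paid")
table.insert(2, "pending")
table.delete(2)

polled, last_poll = poll(table, 0)
print(polled)     # the poll sees only the rows that still exist, in their final state
print(table.log)  # the log holds every change, including the delete and old values
```

The poll never sees the intermediate "pending" state of row 1 or anything about row 2, whereas the log contains all four changes together with the previous values.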

Debezium Change Event Structure
● Key: primary key (PK) of the table
● Value: describes the change event
  ○ Before state
  ○ After state
  ○ Metadata info
● Serialization formats:
  ○ JSON
  ○ Avro
● CloudEvents could be used too
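
As an illustration of the structure above, the sketch below parses a hand-written change event value in JSON. The `before`, `after`, `source`, `op` and `ts_ms` field names follow the Debezium event envelope; the database, table, column values and the `changed_fields` helper are made up for this example, and the schema part of the message is omitted.

```python
import json

# Hand-written sample value of an update event (illustrative data only).
raw_value = """
{
  "before": {"id": 1004, "email": "old@example.com"},
  "after":  {"id": 1004, "email": "new@example.com"},
  "source": {"connector": "mysql", "db": "inventory", "table": "customers"},
  "op": "u",
  "ts_ms": 1600000000000
}
"""

# Operation codes carried in the "op" field.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def changed_fields(event):
    """Return the column names whose value differs between before and after."""
    before = event.get("before") or {}  # None for creates
    after = event.get("after") or {}    # None for deletes
    return sorted(k for k in before.keys() | after.keys()
                  if before.get(k) != after.get(k))


event = json.loads(raw_value)
table = event["source"]["table"]
print(f"{table}: {OPERATIONS[event['op']]} changed {changed_fields(event)}")
```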

Single Message Transformations
● Lightweight single message inline transformation
● Format conversions
● Time/date fields
● Extract new row state
● Aggregate sharded tables to single topic
● Keep compatibility with existing consumers
● Transformation does not interact with external systems
● Modify events before storing in Kafka
Image Source: “Penknife, Swiss Army Knife” by Emilian Robert Vicol, used under CC BY 2.0
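
Two of the transformations listed above, extracting the new row state and aggregating sharded tables into one topic, can be sketched as pure functions over a single message. Debezium ships these as Kafka Connect SMTs (`io.debezium.transforms.ExtractNewRecordState` and `io.debezium.transforms.ByLogicalTableRouter`); the Python below only mimics the idea and is not their implementation. The topic names, the regular expression and the function names are assumptions for illustration.

```python
import re


def extract_new_state(value):
    """Keep only the 'after' row state; a delete (after is None) yields None."""
    return value.get("after")


def route_topic(topic,
                pattern=r"^(.*)\.customers_shard\d+$",
                replacement=r"\1.customers_all"):
    """Map per-shard topics onto one logical topic; other topics pass through."""
    return re.sub(pattern, replacement, topic)


def transform(topic, value):
    """Apply both steps to one message without calling any external system."""
    return route_topic(topic), extract_new_state(value)


# Hypothetical create event coming from one shard of a customers table.
message = {"before": None, "after": {"id": 7, "name": "Ann"}, "op": "c"}
print(transform("dbserver1.inventory.customers_shard2", message))
```

Because each function looks at one message and nothing else, the same logic can run inline in the connector before the event is written to Kafka.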

Auditing: CDC and a bit of Kafka Streams
● Diagram components: CRM Service, Source DB, DBZ, Kafka Connect, Apache Kafka, Kafka Streams
● Topics: Customer Events, Transactions
● Transaction metadata:
  Id    User     Use Case
  tx-1  Bob      Create Customer
  tx-2  Sarah    Delete Customer
  tx-3  Rebecca  Update Customer
Source: http://bit.ly/debezium-auditlogs
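
The enrichment step in the diagram, combining customer change events with transaction metadata, can be sketched without Kafka Streams as a lookup by transaction id. The slide uses Kafka Streams for this; the version below is a simplified in-memory stand-in, and the `tx_id`, `user` and `use_case` field names, the customer rows and the `enrich` helper are assumptions chosen to match the table on the slide.

```python
# Transaction metadata keyed by transaction id (values taken from the slide).
transactions = {
    "tx-1": {"user": "Bob", "use_case": "Create Customer"},
    "tx-2": {"user": "Sarah", "use_case": "Delete Customer"},
    "tx-3": {"user": "Rebecca", "use_case": "Update Customer"},
}

# Hypothetical customer change events, each tagged with its transaction id.
customer_events = [
    {"tx_id": "tx-1", "op": "c", "after": {"id": 1, "name": "Acme"}},
    {"tx_id": "tx-3", "op": "u", "after": {"id": 1, "name": "Acme Corp"}},
    {"tx_id": "tx-9", "op": "u", "after": {"id": 2, "name": "Initech"}},
]


def enrich(event, tx_table):
    """Attach who did what to a change event, or return None if unknown."""
    meta = tx_table.get(event["tx_id"])
    if meta is None:
        # Metadata not available yet; a streaming job would buffer and retry.
        return None
    return {**event, "audit": meta}


audit_log = [e for e in (enrich(ev, transactions) for ev in customer_events)
             if e is not None]

for entry in audit_log:
    print(entry["tx_id"], entry["audit"]["user"], entry["audit"]["use_case"])
```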

Microservices: Microservices Data Exchange
● Propagate data between different services without coupling
● Each service keeps optimised views locally
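
One way to picture a service keeping an optimised view locally is a consumer that applies each change event to its own store. The sketch below uses a plain dictionary as that store; the event shape follows the Debezium `op`/`before`/`after` envelope, while `apply_event`, the keys and the sample order rows are hypothetical.

```python
def apply_event(view, key, event):
    """Apply one change event to a local, service-owned view."""
    if event["op"] == "d":
        view.pop(key, None)          # delete: drop the row if present
    else:
        view[key] = event["after"]   # create, update or snapshot read: upsert
    return view


# Hypothetical stream of (key, value) pairs for an orders table.
events = [
    (1, {"op": "c", "before": None, "after": {"id": 1, "status": "NEW"}}),
    (2, {"op": "c", "before": None, "after": {"id": 2, "status": "NEW"}}),
    (1, {"op": "u", "before": {"id": 1, "status": "NEW"},
         "after": {"id": 1, "status": "SHIPPED"}}),
    (2, {"op": "d", "before": {"id": 2, "status": "NEW"}, "after": None}),
]

local_view = {}
for key, event in events:
    apply_event(local_view, key, event)

print(local_view)
```

The consuming service never queries the owning service's database; it only replays the events it receives, which is what keeps the two decoupled.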

Microservices: Strangler Pattern
● Extract microservice for single component(s)
● Keep write requests against running monolith
● Stream changes to extracted microservice
● Test new functionality
● Switch over, evolve schema only afterwards
Photo: “Strangler vines on trees, seen on the Mount Sorrow hike” by cynren, under CC BY SA 2.0

Running on OpenShift: Deployment via Operators
● YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
● Operator applies configuration
● Advantages
  ○ Automated deployment and scaling
  ○ Simplified upgrading
  ○ Portability across clouds