Slide 1

Slide 1 text

Open-Source Change Data Capture With Debezium
Gunnar Morling
Software Engineer, Red Hat
@gunnarmorling

Slide 2

Slide 2 text

#Debezium @gunnarmorling Today’s Objectives Learn About…

Slide 3

Slide 3 text

Gunnar Morling
● Open source software engineer at Red Hat
  ○ Debezium
  ○ Quarkus
● Spec Lead for Bean Validation 2.0
● kcctl, ModiTect, MapStruct
● Java Champion
● @gunnarmorling

Slide 4

Slide 4 text

Debezium: Log-based Change Data Capture
● Taps into TX log to capture INSERT/UPDATE/DELETE events
● Propagated to consumers via Apache Kafka and Kafka Connect

Slide 5

Slide 5 text

Change Data Capture
A Giant Enabler for Your Data

Slide 6

Slide 6 text

Debezium in a Nutshell
● A CDC Platform
  ■ Based on transaction logs
  ■ Snapshotting, filtering, etc.
  ■ Outbox support
  ■ Web-based UI
● Fully open-source, very active community
● Large production deployments

Slide 7

Slide 7 text

Debezium: Connectors
● Stable
  ■ MySQL
  ■ Postgres
  ■ MongoDB
  ■ SQL Server
  ■ Db2
  ■ Oracle
● Incubating
  ■ Vitess
  ■ Cassandra

Slide 8

Slide 8 text

Debezium: Connectors
Becoming the De-Facto CDC Standard
https://debezium.io/blog/2021/09/22/deep-dive-into-a-debezium-community-connector-scylla-cdc-source-connector/

Slide 9

Slide 9 text

Debezium Architecture

Slide 10

Slide 10 text

Debezium: Deployment Alternatives
Embedded Engine and Debezium Server

Slide 11

Slide 11 text

Data Change Events
● Old and new row state
● Metadata on table, TX id, etc.
● Operation type, timestamp

Slide 12

Slide 12 text

Data Change Events
● Old and new row state
● Metadata on table, TX id, etc.
● Operation type, timestamp

Slide 13

Slide 13 text

Data Change Events
● Old and new row state
● Metadata on table, TX id, etc.
● Operation type, timestamp
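
As a rough illustration of the three bullets above, the sketch below models one change event as a plain Python dictionary. The top-level fields (`before`, `after`, `source`, `op`, `ts_ms`) and the one-letter operation codes follow the Debezium event envelope; all values, the `inventory.customers` table, and the helper functions `changed_columns` and `describe` are made up for this example.

```python
# Illustrative change event payload; every value here is invented.
event = {
    "before": {"id": 1004, "email": "old@example.com"},   # old row state
    "after": {"id": 1004, "email": "new@example.com"},    # new row state
    "source": {                                            # metadata
        "connector": "postgresql",
        "db": "inventory",
        "table": "customers",
        "txId": 556,
    },
    "op": "u",               # operation type
    "ts_ms": 1634000000000,  # timestamp in milliseconds
}

# One-letter operation codes used in the "op" field.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def changed_columns(event):
    """Return the sorted names of columns whose value differs between before and after."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))


def describe(event):
    """Build a one-line, human-readable summary of a change event."""
    source = event["source"]
    columns = ", ".join(changed_columns(event))
    return (
        f"{OPERATIONS[event['op']]} on {source['db']}.{source['table']} "
        f"(tx {source['txId']}): {columns}"
    )


print(describe(event))
```

A consumer would typically branch on `op` and read `after` for inserts and updates, and `before` for deletes.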

Slide 14

Slide 14 text

Outbox Pattern

Slide 15

Slide 15 text

Challenge: Microservices Data Exchange
● Services need to update their database,
● send messages to other services,
● and do both consistently!

Slide 16

Slide 16 text

Outbox Pattern
“Dual writes” are prone to inconsistencies!

Slide 17

Slide 17 text

Outbox Pattern

Slide 18

Slide 18 text

Outbox Pattern

Slide 19

Slide 19 text

Outbox Pattern

Slide 20

Slide 20 text

Outbox Pattern
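
The idea behind the outbox pattern can be sketched as follows: instead of writing to the database and to a message broker separately (a dual write), the service writes the business row and an outbox row in one local transaction, and CDC picks up the outbox insert from the transaction log. This is a minimal sketch, assuming SQLite as a stand-in for the service database; the table names, column names, and the `place_order` function are hypothetical and chosen only for illustration.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    """Create a business table and an outbox table (hypothetical layout)."""
    conn.execute(
        "CREATE TABLE purchase_order (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "id TEXT PRIMARY KEY, aggregatetype TEXT NOT NULL, "
        "aggregateid TEXT NOT NULL, type TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def place_order(conn, customer, fail_before_commit=False):
    """Insert the order and its outbox message atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO purchase_order (customer) VALUES (?)", (customer,)
        )
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?, ?, ?)",
            (
                str(uuid.uuid4()),
                "Order",
                str(order_id),
                "OrderCreated",
                json.dumps({"id": order_id, "customer": customer}),
            ),
        )
        if fail_before_commit:
            # Simulated crash: neither row must survive.
            raise RuntimeError("simulated failure before commit")
    return order_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
place_order(conn, "alice")
try:
    place_order(conn, "bob", fail_before_commit=True)
except RuntimeError:
    pass
```

Because both inserts share one transaction, the failed call leaves neither an order nor a message behind, which is the consistency a dual write cannot guarantee.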

Slide 21

Slide 21 text

Variation on Postgres
pg_logical_emit_message()
● Directly writing arbitrary messages to the WAL
● No need for an outbox table
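
A minimal sketch of what such a call could look like from application code. `pg_logical_emit_message(transactional, prefix, content)` is the Postgres function named on the slide; the helper `emit_message_statement`, the `"outbox"` prefix, and the `%s` placeholder style (as used by psycopg-like drivers) are assumptions for this example. The helper only builds the statement, so it runs without a database.

```python
import json


def emit_message_statement(prefix, payload):
    """Return (sql, params) for a transactional logical decoding message.

    The first argument `true` makes the message part of the current
    transaction, so it is only decoded if that transaction commits.
    """
    sql = "SELECT pg_logical_emit_message(true, %s, %s)"
    return sql, (prefix, json.dumps(payload, sort_keys=True))


sql, params = emit_message_statement("outbox", {"type": "OrderCreated", "id": 1})
print(sql, params)
```

With a driver that accepts this placeholder style, the pair would be passed to the cursor's `execute` call inside the same transaction as the business write.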

Slide 22

Slide 22 text

pg_logical_emit_message()
Exporting auditing metadata
● Pure CDC events lack metadata like business user, device id, etc.
● Solution: emit at TX begin, enrich events e.g. using SMT

Slide 23

Slide 23 text

Directly Emitting WAL Events
The Ask
Provide a facility for producing raw WAL events

Slide 24

Slide 24 text

Challenges

Slide 25

Slide 25 text

Keeping Track of Table Schemas
How to Interpret Incoming Events?
● Messages typically not self-descriptive
● Challenge: incoming events may adhere to earlier schema version

Slide 26

Slide 26 text

MySQL DDL Parser
Solution: Parse DDL Events
● Based on the ANTLR parser generator

Slide 27

Slide 27 text

Recovering schema after restarts
● Persisting schema change history in a Kafka topic
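
To illustrate why a persisted history helps, the sketch below keeps an ordered list of schema versions keyed by log position and looks up the version that was in effect for a given event. It is a deliberate simplification: each history entry here carries the resulting column list directly rather than a DDL statement to be parsed, a plain in-memory list stands in for the Kafka topic, and the `SchemaHistory` class and its integer positions are hypothetical.

```python
from bisect import bisect_right


class SchemaHistory:
    """Ordered schema versions for one table, keyed by log position."""

    def __init__(self):
        self._positions = []  # log positions, strictly increasing
        self._schemas = []    # column tuple in effect from that position on

    def record(self, position, columns):
        """Append a schema change; positions must arrive in log order."""
        if self._positions and position <= self._positions[-1]:
            raise ValueError("schema changes must be recorded in log order")
        self._positions.append(position)
        self._schemas.append(tuple(columns))

    def schema_at(self, position):
        """Return the columns that were in effect at the given log position."""
        idx = bisect_right(self._positions, position) - 1
        if idx < 0:
            raise LookupError("no schema recorded at or before this position")
        return self._schemas[idx]


history = SchemaHistory()
history.record(10, ["id", "name"])
history.record(50, ["id", "name", "email"])  # e.g. after an ALTER TABLE ... ADD COLUMN
print(history.schema_at(49))
print(history.schema_at(50))
```

After a restart, replaying the recorded entries rebuilds the same structure, so an event read from an older log position can still be interpreted with the schema that applied when it was written.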

Slide 28

Slide 28 text

Keeping Track of Table Schemas
The Ask
● Efficient is good, but make it simple to consume
● Provide it when needed, as e.g. in Postgres (pgoutput)
  ○ At the beginning of session
  ○ After a table change
● Facility to query past schema versions

Slide 29

Slide 29 text

On the Subject of Parsing…
LogMiner Events

Slide 30

Slide 30 text

Preventing Unbounded WAL Growth (I)
Challenging API designs

Slide 31

Slide 31 text

Preventing Unbounded WAL Growth (I)
Can’t Commit Offsets Without Events

Slide 32

Slide 32 text

Preventing Unbounded WAL Growth (II)
High-traffic/Low-traffic Logical Databases
● Problem:
  ○ WAL global
  ○ Logical replication slots per database

Slide 33

Slide 33 text

Preventing Unbounded WAL Growth
The Ask
Make sure interfaces work correctly also in corner cases

Slide 34

Slide 34 text

Snapshotting

Slide 35

Slide 35 text

Snapshotting
General Idea
● Need initial backfill of sink systems, but don’t have all TX logs
● Solution: scan data once before streaming

Slide 36

Slide 36 text

Snapshotting
The Ask
Allow for consistent, lock-less snapshots

Slide 37

Slide 37 text

Snapshotting
Limitations of Classic Approach
● Can’t update filter list
● Long-running snapshots can’t be paused/resumed
● Can’t stream changes until snapshot completed
● Can’t re-snapshot selected tables

Slide 38

Slide 38 text

Snapshotting
Incremental Snapshotting
● “DBLog: A Watermark Based Change-Data-Capture Framework”, by Andreas Andreakis and Ioannis Papapanagiotou
● Key idea: interleave snapshot events and events from TX log
https://arxiv.org/pdf/2010.12597v1.pdf

Slide 39

Slide 39 text

Snapshotting
Incremental Snapshotting

Slide 40

Slide 40 text

Incremental Snapshotting
Windowing via Watermarks

Slide 41

Slide 41 text

Incremental Snapshotting
Buffer Processing

Slide 42

Slide 42 text

Incremental Snapshotting
Buffer Processing
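
The windowing and buffer processing steps can be sketched roughly as below, following the watermark idea from the DBLog paper: a chunk of rows is read between a low and a high watermark, log events seen inside that window evict rows with the same key from the chunk buffer, and whatever remains at the high watermark is emitted as snapshot (read) events. This is a simplified sketch, not Debezium's implementation; the tuple shapes, the `process_window` function, and the sample keys are hypothetical.

```python
def process_window(log_events, chunk_rows):
    """Interleave one snapshot chunk with events from the transaction log.

    log_events: iterable, in log order, of ("low",), ("high",) or
                ("change", op, key, row) tuples.
    chunk_rows: dict mapping primary key -> row, as read by the chunk
                query issued between the two watermarks.
    Returns the emitted events as (op, key, row) tuples.
    """
    buffer = dict(chunk_rows)
    window_open = False
    emitted = []
    for event in log_events:
        kind = event[0]
        if kind == "low":
            window_open = True
        elif kind == "high":
            # Rows still buffered were not touched inside the window,
            # so the chunk's view of them is safe to emit as reads.
            emitted.extend(("r", key, row) for key, row in buffer.items())
            buffer.clear()
            window_open = False
        else:
            _, op, key, row = event
            if window_open:
                # The log event is newer than the chunk's view of this key.
                buffer.pop(key, None)
            emitted.append((op, key, row))
    return emitted


log = [
    ("change", "u", 1, {"v": "a2"}),
    ("low",),
    ("change", "u", 2, {"v": "b2"}),
    ("change", "d", 3, None),
    ("high",),
    ("change", "c", 5, {"v": "e"}),
]
chunk = {2: {"v": "b1"}, 3: {"v": "c"}, 4: {"v": "d"}}
result = process_window(log, chunk)
```

In this run, keys 2 and 3 are dropped from the buffer because the log already carries newer state for them, and only key 4 is emitted as a read event.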

Slide 43

Slide 43 text

Incremental Snapshotting
Semantics
● No guarantee for snapshot (read) events for all records
● May receive update or delete without prior insert/read
● May receive read and update/delete
● What is guaranteed: complete data set after snapshot

Slide 44

Slide 44 text

Incremental Snapshotting
Comparison
● Can’t update filter list ✅
● Long-running snapshots can’t be paused/resumed ✅
● Can’t stream changes until snapshot completed ✅
● Can’t re-snapshot selected tables ✅

Slide 45

Slide 45 text

Wrap-Up

Slide 46

Slide 46 text

Takeaways
● CDC: SELECT, INSERT, UPDATE, DELETE… STREAM?
● Debezium: open-source CDC for a variety of databases
● Outlook: incrementally updated materialized views?

Slide 47

Slide 47 text

Resources
● Debezium https://debezium.io/
● Incremental snapshotting https://debezium.io/blog/2021/10/07/incremental-snapshots/
● Outbox implementation https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
● Demo repo https://github.com/debezium/debezium-examples

Slide 48

Slide 48 text

Thank You!
Q & A
📧 [email protected]
@gunnarmorling

Slide 49

Slide 49 text

Image Credits
In Order of Appearance

Unsplash https://unsplash.com/license
© Pablo García Saldaña https://unsplash.com/photos/lPQIndZz8Mo
© David Clode https://unsplash.com/photos/T49WTav4LgU
© Aaron Burden https://unsplash.com/photos/GFpxQ2ZyNc0
© Nathan Dumlao https://unsplash.com/photos/wQDysNUCKfw
© mari lezhava https://unsplash.com/photos/q65bNe9fW-w
© Michał Parzuchowski https://unsplash.com/photos/Bt0PM7cNJFQ
© Charles Forerunner https://unsplash.com/photos/3fPXt37X6UQ

Flickr
Attribution 2.0 Generic https://creativecommons.org/licenses/by/2.0/
© Thomas Kamann https://flic.kr/p/coa2c
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
© Wall Boat https://flic.kr/p/Y6zkmX
Attribution-ShareAlike 2.0 Generic https://creativecommons.org/licenses/by-sa/2.0/
© Andrew Hart https://flic.kr/p/dmjkSk
Attribution 2.0 Generic (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
© Ryan https://flic.kr/p/8gwtzo