Slide 1

Slide 1 text

Debezium Snapshots Revisited!
Gunnar Morling
Senior Staff Software Engineer, Decodable
@gunnarmorling
Image © Nicolas Buffler https://flic.kr/p/jpWcWD (CC BY 2.0)

Slide 3

Slide 3 text

#DebeziumSnapshotting @gunnarmorling Agenda

Slide 4

Slide 4 text

Gunnar Morling
● Software engineer at Decodable
● Former project lead of Debezium
● kcctl 🧸, JfrUnit, ModiTect, MapStruct
● Spec Lead for Bean Validation 2.0
● Java Champion

Slide 5

Slide 5 text

Recap – Debezium
Log-Based Change Data Capture

Slide 6

Slide 6 text

Snapshotting: Why Is It Needed?
● Need to backfill data, but don’t have all TX logs
● Solution: scan data once before streaming
● Emit READ event for each record

Slide 7

Slide 7 text

Snapshotting: Classic Approach – General Idea
● Capture current position in transaction log
● Scan all relevant tables
● Start streaming
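
The three steps of the classic approach can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not Debezium's implementation: the table is a plain dict, the transaction log is a list of (op, key, row) tuples, and the function name and the changes_during_scan parameter are hypothetical. It shows why changes made while the scan runs only reach consumers after the scan has finished.

```python
def snapshot_then_stream(table, tx_log, changes_during_scan):
    """Toy model of a classic snapshot followed by log streaming.

    table: dict mapping key -> row, as seen by the consistent scan
    tx_log: list of (op, key, row) tuples already written to the log
    changes_during_scan: log entries that arrive while the scan runs
    """
    # 1. Capture the current position in the transaction log.
    position = len(tx_log)

    # 2. Scan all relevant rows, emitting one READ ("r") event per record.
    events = [("r", key, row) for key, row in sorted(table.items())]

    # Changes made during the scan land in the log, but are not streamed yet.
    tx_log.extend(changes_during_scan)

    # 3. Start streaming from the captured position.
    events.extend(tx_log[position:])
    return events
```

Calling it with a two-row table and one update arriving mid-scan yields the two read events first and the update last, which matches the limitation that streaming cannot begin until the snapshot has completed.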

Slide 8

Slide 8 text

Snapshotting: Key Configuration Options
● snapshot.mode (initial, never, schema_only_recovery)
● snapshot.select.statement.overrides
● snapshot.max.threads
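
As a rough illustration of how these options might appear together, here is a connector configuration fragment written as a Python dict. The option names and the three snapshot.mode values come from the slide. The table name inventory.orders, the WHERE clause, the thread count, and the per-table key suffix are assumptions for the example. Whether each option is available depends on the connector and Debezium version, so check the connector documentation before relying on it.

```python
# Modes listed on the slide; connectors may support further values.
SLIDE_SNAPSHOT_MODES = {"initial", "never", "schema_only_recovery"}

# Hypothetical fragment of a connector configuration.
connector_config = {
    # Take an initial snapshot, then switch to streaming.
    "snapshot.mode": "initial",
    # Tables whose snapshot SELECT should be overridden (assumed table name).
    "snapshot.select.statement.overrides": "inventory.orders",
    # Assumed convention: one extra key per listed table holding the SELECT.
    "snapshot.select.statement.overrides.inventory.orders":
        "SELECT * FROM inventory.orders WHERE status <> 'ARCHIVED'",
    # Number of threads used for the initial snapshot (assumed value).
    "snapshot.max.threads": "4",
}


def check_snapshot_mode(config):
    """Return True if the configured mode is one of those named on the slide."""
    return config.get("snapshot.mode") in SLIDE_SNAPSHOT_MODES
```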

Slide 9

Slide 9 text

Snapshotting: Limitations of Classic Approach
● Can’t update filter list

Slide 10

Slide 10 text

Snapshotting: Limitations of Classic Approach
● Can’t update filter list
● Can’t pause & resume long-running snapshots

Slide 11

Slide 11 text

Snapshotting: Limitations of Classic Approach
● Can’t update filter list
● Can’t pause & resume long-running snapshots
● Can’t stream changes until snapshot completed

Slide 12

Slide 12 text

Snapshotting: Limitations of Classic Approach
● Can’t update filter list
● Can’t pause & resume long-running snapshots
● Can’t stream changes until snapshot completed
● Can’t re-snapshot selected tables

Slide 13

Slide 13 text

Incremental Snapshots
© Karen Blaha https://flic.kr/p/aeuPys (CC BY-SA 2.0)

Slide 14

Slide 14 text

Incremental Snapshotting: The Paper
● “DBLog: A Watermark Based Change-Data-Capture Framework”, by Andreas Andreakis and Ioannis Papapanagiotou
● Key idea: interleave snapshot events and events from TX log
https://arxiv.org/pdf/2010.12597v1.pdf

Slide 15

Slide 15 text

Incremental Snapshotting: General Idea

Slide 16

Slide 16 text

Incremental Snapshotting: Windowing via Watermarks

Slide 17

Slide 17 text

Incremental Snapshotting: Buffer Processing

Slide 18

Slide 18 text

Incremental Snapshotting: Buffer Processing
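
The diagrams for the windowing and buffer-processing slides did not survive in this transcript, so here is a minimal sketch of the watermark idea as described in the DBLog paper. It is a toy model under assumptions, not Debezium's code: process_window and the tuple layout are hypothetical, and the low and high watermarks are modelled as marker entries in the log. A chunk of rows is read between the two watermarks. Any chunk row whose key also appears in the log inside that window is dropped from the buffer, because the log event is newer. The remaining rows are emitted as read events once the high watermark is seen.

```python
def process_window(chunk, log_events):
    """Interleave one snapshot chunk with log events using watermarks.

    chunk: dict key -> row, read by the chunk query between the watermarks
    log_events: list of (kind, key, row) in log order, where kind is
        "low" or "high" for the watermark markers, otherwise an op code
    """
    buffer = dict(chunk)      # snapshot rows waiting to be emitted
    output = []
    in_window = False

    for kind, key, row in log_events:
        if kind == "low":
            in_window = True                  # window opens
        elif kind == "high":
            in_window = False                 # window closes
            # Emit the surviving chunk rows as READ ("r") events.
            output.extend(("r", k, v) for k, v in buffer.items())
            buffer.clear()
        else:
            if in_window:
                # The log event supersedes the buffered snapshot row.
                buffer.pop(key, None)
            output.append((kind, key, row))   # log events always pass through
    return output
```

Because log events pass through unchanged and only stale snapshot rows are discarded, streaming continues while the chunk is being processed.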

Slide 19

Slide 19 text

Incremental Snapshotting: Semantics
● No guarantee of a snapshot (read) event for every record
● May receive an update or delete without a prior insert/read
● May receive a read followed by an update/delete
● What is guaranteed: a complete data set after the snapshot
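
One way a consumer can cope with these semantics is to treat every event as an idempotent upsert or delete keyed by primary key. The sketch below assumes Debezium's op codes ("r" read, "c" create, "u" update, "d" delete) and uses a hypothetical apply_event helper with a dict standing in for the sink. It is an illustration of the consequence for consumers, not a prescribed pattern from the talk.

```python
def apply_event(state, op, key, row):
    """Apply one change event to a keyed sink, tolerating missing history."""
    if op == "d":
        # A delete may arrive for a key that was never read or inserted.
        state.pop(key, None)
    else:
        # Reads, creates and updates are all handled as upserts, so an
        # update without a prior read still produces the right row.
        state[key] = row
    return state


def materialize(events):
    """Fold a sequence of (op, key, row) events into the final data set."""
    state = {}
    for op, key, row in events:
        apply_event(state, op, key, row)
    return state
```

With this handling the order in which read and change events interleave does not matter for the final result, as long as events for the same key arrive in log order.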

Slide 20

Slide 20 text

Demo
© Wall Boat https://flic.kr/p/Y6zkmX (Public Domain)

Slide 21

Slide 21 text

Incremental Snapshotting: Connector Offsets

Slide 22

Slide 22 text

Incremental Snapshotting: MySQL Read-Only Snapshots
● Write access to the DB may not be desirable

Slide 23

Slide 23 text

Incremental Snapshotting: Signalling Channels
● Database table
● Kafka topic
● JMX
● Custom

Signal fields:
id: 924e3ff8-2245-43ca-ba77-2af9af02fa07
type: log, {execute|pause|resume|stop}-snapshot
value: { "data-collections": ["schema1.table1", "schema2.table2"], "type":"incremental", "additional-condition":"color=blue" }
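
To make the signal structure concrete, the following sketch builds a signal record with the three fields shown on the slide. The helper name build_signal is hypothetical, and the field label "value" simply follows the slide. The actual column or message field names for a given signalling channel should be taken from the Debezium documentation. How the record is delivered (table insert, Kafka message, JMX call) is left out.

```python
import json
import uuid

SNAPSHOT_ACTIONS = {"execute", "pause", "resume", "stop"}


def build_signal(action, data_collections, additional_condition=None):
    """Build an incremental snapshot signal shaped like the slide's example."""
    if action not in SNAPSHOT_ACTIONS:
        raise ValueError(f"unsupported snapshot action: {action}")

    value = {
        "data-collections": list(data_collections),
        "type": "incremental",
    }
    if additional_condition is not None:
        # Optional filter restricting which rows are snapshotted.
        value["additional-condition"] = additional_condition

    return {
        "id": str(uuid.uuid4()),        # arbitrary unique signal id
        "type": f"{action}-snapshot",   # e.g. "execute-snapshot"
        "value": json.dumps(value),     # payload serialized as JSON
    }
```

For example, build_signal("execute", ["schema1.table1", "schema2.table2"], "color=blue") produces a record equivalent to the one on the slide, apart from the generated id.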

Slide 24

Slide 24 text

Incremental Snapshotting: Notifications

Slide 25

Slide 25 text

#Debezium + #ApacheFlink | @gunnarmorling Comparison

Slide 26

Slide 26 text

Incremental Snapshotting: Benefits
● Can update filter list ✅

Slide 27

Slide 27 text

Incremental Snapshotting: Benefits
● Can update filter list ✅
● Long-running snapshots can be paused/resumed ✅

Slide 28

Slide 28 text

Incremental Snapshotting: Benefits
● Can update filter list ✅
● Long-running snapshots can be paused/resumed ✅
● Can stream changes before snapshot completed ✅

Slide 29

Slide 29 text

Incremental Snapshotting: Benefits
● Can update filter list ✅
● Long-running snapshots can be paused/resumed ✅
● Can stream changes before snapshot completed ✅
● Can re-snapshot selected tables ✅

Slide 31

Slide 31 text

Resources
● Incremental Snapshots in Debezium
  https://debezium.io/blog/2021/10/07/incremental-snapshots/
● Read-only Incremental Snapshots for MySQL
  https://debezium.io/blog/2022/04/07/read-only-incremental-snapshots/
● Flink CDC
  https://ververica.github.io/flink-cdc-connectors/

Slide 32

Slide 32 text

Upcoming
● Debezium & Kafka Connect – Ask the Experts
  With Chris Cranford (Red Hat) and Chris Egerton (Aiven)
  Sep 27, 2:30 PM
● Change Stream Processing with Debezium and Apache Flink
  With Robert Metzger (Decodable)
  Sep 27, 5:30 PM, Dremio Office
  https://www.meetup.com/sf-big-analytics/events/294068331/

Slide 33

Slide 33 text

Thank You!
Q & A
📧 [email protected]
@gunnarmorling
