Slide 1

Slide 1 text

Data Contracts In Practice With Debezium and Apache Flink
Gunnar Morling, @gunnarmorling
Image © Benjamin White https://flic.kr/p/2iGM2x1 (CC BY 2.0 DEED)

Slide 2

Slide 2 text

Data Contracts With Debezium + Apache Flink | @gunnarmorling
Does Change Data Capture Break Encapsulation? 🤔

Slide 3

Slide 3 text

Today’s Mission
Learn About…

Slide 4

Slide 4 text

Gunnar Morling
● Software engineer at Decodable
● Former project lead of Debezium
● kcctl 🧸, JfrUnit, ModiTect, MapStruct
● Java Champion
● 1⃣ 🐝 🏎

Slide 5

Slide 5 text

The observer pattern for your database

Slide 6

Slide 6 text

Debezium
Log-Based Change Data Capture

Slide 7

Slide 7 text

Change Data Capture
Liberation for Your Data
https://www.decodable.co/blog/seven-ways-to-put-cdc-to-work

Slide 8

Slide 8 text

Debezium: Data Change Events

Slide 9

Slide 9 text

Does Change Data Capture Break Encapsulation?

Slide 10

Slide 10 text

(Potential) Concerns?
Your Table Model Becomes Your API
● Names and types directly exposed
● Particularly problematic for legacy schemas
Image © massmatt https://flic.kr/p/25eF9D3 (CC BY 2.0)

Slide 11

Slide 11 text

(Potential) Concerns?
Fine-grained Events
● 1:1 relationship between tables and event streams
● May be too fine-grained
Image © Michele Dorsey Walfred https://flic.kr/p/MDCCP4 (CC BY 2.0 DEED)

Slide 12

Slide 12 text

(Potential) Concerns?
Schema Changes Might Break Things
● Renaming columns
● Changing types
● Removing columns
● Changing cardinality of associations
Image © Insights Unspoken https://flic.kr/p/zNTwN9 (CC BY 2.0 DEED)

Slide 13

Slide 13 text

(Potential) Concerns?
Accidental Data Leaks
● Exposing all columns…
● …and rows
Image © Leonid Mamchenkov https://flic.kr/p/qzBLy (CC BY 2.0 DEED)

Slide 14

Slide 14 text

Houston, we have a problem!
Image © Jeff Hitchcock https://flic.kr/p/2hN4RG7 (CC BY 2.0 DEED)

Slide 15

Slide 15 text

Is It Really a Problem, Though?

Slide 16

Slide 16 text

Data Contracts
R. D. Barry https://flic.kr/p/P5RWWR (CC BY-SA 2.0 DEED)

Slide 17

Slide 17 text

Data Contracts
A data contract is a document that defines the structure, format, semantics, quality, and terms of use for exchanging data between a data provider and their consumers.

Slide 18

Slide 18 text

Data Contracts
Towards Data Products
● Documentation of intent
● Owned and evolved by the publisher
Image © Jerome Vial https://flic.kr/p/71KpZy (CC BY-SA 2.0 DEED)

Slide 19

Slide 19 text

Data Contracts
datacontract.com

Slide 20

Slide 20 text

Outbox Pattern

Slide 21

Slide 21 text

The Outbox Pattern

Slide 22

Slide 22 text

Variation on Postgres
pg_logical_emit_message()
● Directly writing arbitrary messages to the WAL
● No need for an outbox table

Slide 23

Slide 23 text

Stream Processing
Colin Howley https://flic.kr/p/698F5j (CC BY-ND 2.0)

Slide 24

Slide 24 text

Apache Flink
Stateful Computations over Data Streams
https://flink.apache.org/

Slide 25

Slide 25 text

Apache Flink
APIs for Application Development
Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)

Slide 26

Slide 26 text

Example

Slide 27

Slide 27 text

Example

Slide 28

Slide 28 text

Example

Slide 29

Slide 29 text

There’s More…
SQL All the Things!
● Filters
● Derived fields
● Consumer-specific contracts
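
A minimal sketch of what a filter and a derived field mean for a published contract. The talk expresses this in Flink SQL; the plain Python below only illustrates the logic and is not Flink code. The event follows Debezium's change event envelope (`op`, `before`, `after`); the table columns (`fname`, `lname`, `ssn`, `deleted`), the contract fields, and the function name are all hypothetical.

```python
from typing import Optional

# Hypothetical raw change event in Debezium's envelope shape.
# Column names are invented for illustration.
raw_event = {
    "op": "u",
    "before": {"id": 42, "fname": "Ada", "lname": "Byron",
               "ssn": "123-45-6789", "deleted": False},
    "after": {"id": 42, "fname": "Ada", "lname": "Lovelace",
              "ssn": "123-45-6789", "deleted": False},
}


def to_contract_event(event: dict) -> Optional[dict]:
    """Map an internal row image to the public contract, or drop it."""
    row = event.get("after")
    # Filter: deletes carry no "after" image, and soft-deleted rows
    # are not part of this (hypothetical) contract.
    if row is None or row["deleted"]:
        return None
    return {
        # Renamed field: the internal column name is not exposed.
        "customer_id": row["id"],
        # Derived field: computed from two internal columns.
        "full_name": f'{row["fname"]} {row["lname"]}',
        # "ssn" is deliberately left out of the contract.
    }


print(to_contract_event(raw_event))
```

A consumer-specific contract would be a second mapping function over the same raw events, exposing a different subset of fields.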

Slide 30

Slide 30 text

Beyond the Basics
alvaroreguly https://flic.kr/p/yJVjVY (CC BY 2.0 DEED)

Slide 31

Slide 31 text

Streaming Data Contracts
Joining Two Tables

Slide 32

Slide 32 text

Streaming Data Contracts
Denormalizing Data

Slide 33

Slide 33 text

Joining Two Streams

Slide 34

Slide 34 text

Nested Data Structures
UDFs to the Rescue

Slide 35

Slide 35 text

Nested Data Structures
UDFs to the Rescue
https://www.youtube.com/@decodable

Slide 36

Slide 36 text

Expanding Partial Events
Postgres TOAST Columns

Slide 37

Slide 37 text

Expanding Partial Events
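
A minimal sketch of the idea behind expanding partial events, assuming the Debezium Postgres connector's default placeholder `__debezium_unavailable_value` for unchanged TOAST columns (the placeholder is configurable). A stream processor such as Flink would keep the last known value in keyed state; here a plain Python dict stands in for that state, and the class and column names are hypothetical.

```python
# Default placeholder emitted for TOAST columns that did not change
# (assumption: the connector's default configuration is in use).
UNAVAILABLE = "__debezium_unavailable_value"


class ToastExpander:
    """Fills placeholders with the last value seen for the same key."""

    def __init__(self) -> None:
        # Stand-in for per-key state: primary key -> {column: last value}.
        self._last_seen: dict = {}

    def expand(self, key, row: dict) -> dict:
        known = self._last_seen.setdefault(key, {})
        expanded = {}
        for column, value in row.items():
            if value == UNAVAILABLE:
                # Reuse the remembered value; if none was ever seen,
                # the placeholder is passed through unchanged.
                value = known.get(column, UNAVAILABLE)
            else:
                known[column] = value
            expanded[column] = value
        return expanded


expander = ToastExpander()
expander.expand(1, {"id": 1, "title": "Draft", "body": "a very long text"})
update = expander.expand(1, {"id": 1, "title": "Final", "body": UNAVAILABLE})
print(update)
```

The second event leaves `expand` with the full `body` restored, so downstream consumers never see the placeholder as long as the state was populated by an earlier event or snapshot.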

Slide 38

Slide 38 text

Schema Changes
Bureau of Land Management Oregon and Washington https://flic.kr/p/26mD2nW (CC BY 2.0 DEED)

Slide 39

Slide 39 text

Schema Evolution: Producer-driven Changes

Slide 40

Slide 40 text

Schema Changes
Renaming a Column

Slide 41

Slide 41 text

Renaming a Column
How To Do It

Slide 42

Slide 42 text

Renaming a Column
How To Do It
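
One way to keep a contract stable while a column is renamed, sketched in plain Python. This is an illustration under assumptions, not necessarily the procedure shown on the slides: the mapping layer accepts both the old and the new physical column name during the migration window, so the published field never changes. The column names `fname` and `first_name` and the function name are hypothetical.

```python
def to_stable_contract(row: dict) -> dict:
    """Publish the same contract shape before and after the rename."""
    # During the migration window the source table may deliver either
    # the old column ("fname") or the new one ("first_name").
    if "first_name" in row:
        first_name = row["first_name"]
    else:
        first_name = row["fname"]
    return {
        "customer_id": row["id"],
        # The contract field name stays fixed regardless of the
        # physical column name.
        "first_name": first_name,
    }


before_rename = to_stable_contract({"id": 7, "fname": "Grace"})
after_rename = to_stable_contract({"id": 7, "first_name": "Grace"})
print(before_rename == after_rename)
```

Once no events with the old column name remain in flight, the fallback branch can be removed without consumers noticing.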

Slide 43

Slide 43 text

Wrap-Up

Slide 44

Slide 44 text

Takeaways 🤩
● Debezium: Real-time change event streams for your data
● Apache Flink: Data contracts for…
  ○ …encapsulating internal models ✅
  ○ …consciously designed events ✅
  ○ …ensuring compatibility ✅
  ○ …protecting sensitive data ✅

Slide 45

Slide 45 text

Houston, we may have a problem. If we do, we know how to solve it!
Image © NASA Hubble Space Telescope https://flic.kr/p/22tV2DJ (CC BY 2.0 DEED)

Slide 47

Slide 47 text

Learn More
● Blog post: https://www.decodable.co/blog/change-data-capture-breaks-encapsulation-does-it-though
● Example source code: github.com/decodableco/examples → cdc-data-contracts

Slide 48

Slide 48 text

Decodable Talks at Current ‘24
● Timing is Everything: Understanding Event-Time Processing in Flink SQL
  🗣 Sharon Xie 📆 Tuesday 4pm 🗺 Ballroom F
● Data Contracts In Practice With Debezium and Apache Flink
  🗣 Gunnar Morling 📆 Tuesday 3pm 🗺 Meeting Room 18C
● So You Want to Write a User-Defined Function (UDF) for Flink?
  🗣 Hans-Peter Grahsl 📆 Wednesday 1:30pm 🗺 Ballroom F
● The Joy of JARs (and Other Flink SQL Troubleshooting Tales)
  🗣 Robin Moffatt 📆 Wednesday 3pm 🗺 Ballroom F

Slide 49

Slide 49 text

Thank You!
Q & A
📧 gunnar@decodable.co
@gunnarmorling
morling.dev
