Slide 1

Real-time Change Stream Processing with Apache Flink
Gunnar Morling, Software Engineer, Decodable (@gunnarmorling)
Image © Marja van Bochove https://flic.kr/p/5Q6yUY (CC BY 2.0)

Slide 2

The world is real-time. So should your data be.

Slide 3

No content

Slide 4

#Debezium + #ApacheFlink | @gunnarmorling
Today’s Mission
Learn About…

Slide 5

Gunnar Morling
● Software engineer at Decodable
● Former project lead of Debezium
● kcctl 🧸, JfrUnit, ModiTect, MapStruct
● Spec Lead for Bean Validation 2.0
● Java Champion

Slide 6

© Kai Schreiber https://flic.kr/p/uecg (CC BY-SA 2.0)

Slide 7

Debezium: Log-Based Change Data Capture

Slide 8

Debezium in a Nutshell: Open-Source Change Data Capture
● A CDC platform
  ○ Based on transaction logs
  ○ Snapshotting, filtering, etc.
  ○ Outbox support
  ○ Web-based UI
● Fully open-source, very active community
● Large production deployments

Slides 9–10

Change Data Capture: Liberation for Your Data

Slide 11

Debezium Supported Databases
● Core
  ○ MySQL
  ○ Postgres
  ○ SQL Server
  ○ MongoDB
  ○ Db2
  ○ Oracle
● Community-led:
  ○ Vitess, Cassandra, Spanner
● External: ScyllaDB, Yugabyte

Slides 12–14

Debezium: Data Change Events
● Old and new row state
● Metadata on table, TX id, etc.
● Operation type, timestamp
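
The anatomy of such an event can be sketched in a few lines of Python. This is a minimal illustration, assuming Debezium's JSON envelope with the `before`, `after`, `source`, `op` and `ts_ms` fields, where `op` is `c`, `u`, `d`, or `r` (snapshot read). The table, columns and values are made up, the schema part is omitted, and real events carry more connector-specific source metadata (e.g. the LSN for Postgres).

```python
import json

# Hypothetical update event for an "orders" table, modeled on the shape of
# Debezium's JSON change event envelope (schema part omitted).
RAW_EVENT = """
{
  "before": {"id": 1001, "status": "NEW", "amount": 42.0},
  "after":  {"id": 1001, "status": "SHIPPED", "amount": 42.0},
  "source": {"connector": "postgresql", "db": "shop", "schema": "inventory",
             "table": "orders", "txId": 5678},
  "op": "u",
  "ts_ms": 1700000000000
}
"""

OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def describe_change(event: dict) -> dict:
    """Extract old/new row state, metadata and operation type from an event."""
    before = event.get("before") or {}  # null for inserts and snapshot reads
    after = event.get("after") or {}    # null for deletes
    # Columns whose value differs between the old and the new row state
    changed = sorted(
        column for column in set(before) | set(after)
        if before.get(column) != after.get(column)
    )
    source = event["source"]
    return {
        "table": f'{source["schema"]}.{source["table"]}',
        "tx_id": source.get("txId"),
        "operation": OPERATIONS[event["op"]],
        "timestamp_ms": event["ts_ms"],
        "changed_columns": changed,
    }


if __name__ == "__main__":
    print(describe_change(json.loads(RAW_EVENT)))
```

Having both the old and the new row state is what lets a consumer compute such a diff without querying the source database.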

Slide 15

Debezium: Becoming the De-Facto CDC Standard
https://debezium.io/blog/2021/09/22/deep-dive-into-a-debezium-community-connector-scylla-cdc-source-connector/

Slide 16

Apache Flink
© Colin Howley https://flic.kr/p/698F5j (CC BY-ND 2.0)

Slide 17

Apache Flink: Stateful Computations over Data Streams
https://flink.apache.org/

Slide 18

Apache Flink: Common Use Cases
● Real-time reporting/dashboards
● Low-latency alerting, notifications
● Materialized view maintenance, caches
● Real-time cross-database sync, lookup joins, windowed joins, aggregations
● Machine learning: model serving, feature engineering
● Change data capture, data integration
https://flink.apache.org/poweredby.html

Slide 19

Apache Flink: APIs for Application Development
Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)

Slides 20–25

Apache Flink: Stream Processing of Change Data Events
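
One way to picture how a stream processor consumes these events: each Debezium event is turned into changelog rows, in the spirit of Flink's row kinds (`+I` insert, `-U` update-before, `+U` update-after, `-D` delete), and a continuously maintained query result applies or retracts each row. The sketch below is plain Python, not Flink's runtime; the customer rows and the count-per-country query are hypothetical.

```python
from collections import Counter
from typing import Iterator, Tuple

Row = dict
ChangelogRow = Tuple[str, Row]


def to_changelog(event: dict) -> Iterator[ChangelogRow]:
    """Turn a Debezium-style event into Flink-style changelog rows."""
    op = event["op"]
    if op in ("c", "r"):   # insert or snapshot read
        yield "+I", event["after"]
    elif op == "u":        # update: retract the old row, then add the new one
        yield "-U", event["before"]
        yield "+U", event["after"]
    elif op == "d":        # delete: retract the old row
        yield "-D", event["before"]


class CountPerCountry:
    """Continuously maintained result of a hypothetical query:
    SELECT country, COUNT(*) FROM customers GROUP BY country"""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def apply(self, kind: str, row: Row) -> None:
        delta = 1 if kind in ("+I", "+U") else -1
        self.counts[row["country"]] += delta
        if self.counts[row["country"]] == 0:
            del self.counts[row["country"]]  # drop groups that became empty


events = [
    {"op": "r", "before": None, "after": {"id": 1, "country": "DE"}},
    {"op": "c", "before": None, "after": {"id": 2, "country": "DE"}},
    {"op": "u", "before": {"id": 2, "country": "DE"}, "after": {"id": 2, "country": "FR"}},
    {"op": "d", "before": {"id": 1, "country": "DE"}, "after": None},
]

view = CountPerCountry()
for event in events:
    for kind, row in to_changelog(event):
        view.apply(kind, row)

print(dict(view.counts))
```

The update of customer 2 moves one unit of count from `DE` to `FR`, and the delete of customer 1 removes the `DE` group entirely; without the old row state in the event, the retraction would not be possible.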

Slide 26

Debezium and Apache Flink: Integration Options

Slide 27

Use Cases
Image: Angelo Brathot https://flic.kr/p/PFDvkY (Public Domain)

Slide 28

Exporting Auditing Metadata with pg_logical_emit_message()
● Pure CDC events lack metadata such as the business user, device id, etc.
● Solution: emit this metadata at transaction begin, then enrich the change events with it, e.g. using an SMT (single message transformation)
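
The enrichment step can be sketched in plain Python as a stand-in for the SMT. It assumes the application calls `pg_logical_emit_message(true, 'audit', '<json>')` at the start of each transaction, and that the resulting Debezium message event (op `m`) carries the transaction id in `source.txId` and the payload in `message.content`. In Debezium that content is a binary field, so a real consumer would have to decode it first; here it is assumed to be a plain JSON string. The `audit` prefix, the payload fields and the in-memory state are hypothetical.

```python
import json
from typing import Dict, List, Optional

AUDIT_PREFIX = "audit"  # hypothetical prefix passed to pg_logical_emit_message()


class AuditEnricher:
    """Attaches the audit metadata emitted in a transaction's logical decoding
    message to all data change events of the same transaction."""

    def __init__(self) -> None:
        # transaction id -> audit metadata; unbounded here, for illustration only
        self.metadata_by_tx: Dict[int, dict] = {}

    def process(self, event: dict) -> Optional[dict]:
        tx_id = event["source"]["txId"]
        if event["op"] == "m":
            message = event["message"]
            if message["prefix"] == AUDIT_PREFIX:
                # Assumption: content has already been decoded to a JSON string
                self.metadata_by_tx[tx_id] = json.loads(message["content"])
            return None  # the message event itself is not forwarded
        enriched = dict(event)
        enriched["audit"] = self.metadata_by_tx.get(tx_id)  # None if no message
        return enriched


events = [
    {"op": "m", "source": {"txId": 42},
     "message": {"prefix": "audit", "content": '{"user": "alice", "device": "tablet-7"}'}},
    {"op": "u", "source": {"txId": 42},
     "before": {"id": 1, "qty": 1}, "after": {"id": 1, "qty": 2}},
    {"op": "c", "source": {"txId": 43}, "before": None, "after": {"id": 2, "qty": 5}},
]

enricher = AuditEnricher()
output: List[dict] = [e for e in map(enricher.process, events) if e is not None]
for e in output:
    print(e["op"], e["audit"])
```

A production variant would also evict the per-transaction state once the transaction has completed.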

Slide 29

Audit Logs: Enriching Change Data Events with Metadata

Slide 30

Data Contracts
© Marcin Wichary https://flic.kr/p/6d9P7t (CC BY 2.0)

Slide 31

Data Contracts: Encapsulating Your Schema
Chris Riccomini (https://cnr.sh/essays/kafka-change-data-capture-breaks-database-encapsulation) 🤔

Slide 32

Data Contracts: Encapsulating Your Schema
Image source: “Data Contracts — From Zero To Hero” by Mehdio (https://towardsdatascience.com/data-contracts-from-zero-to-hero-343717ac4d5e)

Slide 33

Data Contracts: Encapsulating Your Schema
Image source: “An Engineer's Guide to Data Contracts - Pt. 1” by Chad Sanderson and Adrian Kreuziger (https://dataproducts.substack.com/p/an-engineers-guide-to-data-contracts)

Slide 34

Data Contracts: Encapsulating Your Schema
● Consciously design your exposed
  ○ Set of columns
  ○ Their names and types
  ○ Data structure (e.g. DDD aggregates)
● Design changes to all of the above just as consciously
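
A minimal sketch of such a contract: the internal row state from a change event is mapped to an explicitly designed public shape, so internal columns stay private and renaming an internal column does not silently break consumers. The internal column names, the `OrderV1` type and the mapping rules are hypothetical examples, not part of Debezium or Flink.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderV1:
    """Public, consciously designed contract for order events (version 1)."""
    order_id: int
    customer_id: int
    status: str
    total_cents: int


def to_contract(after: Optional[dict]) -> Optional[OrderV1]:
    """Map the internal row state to the public contract.

    Internal-only columns (e.g. the hypothetical 'internal_notes') are dropped,
    names are stabilized, and types are normalized.
    """
    if after is None:  # a delete event has no new row state
        return None
    return OrderV1(
        order_id=after["id"],
        customer_id=after["cust_fk"],
        status=after["status_code"].upper(),
        total_cents=round(after["total"] * 100),
    )


internal_row = {
    "id": 1001, "cust_fk": 7, "status_code": "shipped",
    "total": 19.99, "internal_notes": "call before delivery",
}
print(asdict(to_contract(internal_row)))
```

Versioning the contract type (here `V1`) gives a natural place to make deliberate, announced changes later.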

Slide 35

Demo
© Luke Jones https://flic.kr/p/sEq4MA (CC BY-SA 2.0)

Slide 36

Driving a Dashboard: Propagating Joined Data to Elasticsearch/Kibana
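
The joining part of such a pipeline can be sketched as a stateful, non-windowed join over two change streams: each side is kept as current state keyed by id, and every change on either side re-emits the affected joined documents as upserts keyed by document id, which is what a search index sink would write. The `orders`/`customers` tables and fields are hypothetical, a dictionary stands in for Elasticsearch, and this is plain Python rather than Flink's join operator.

```python
from typing import Dict, List, Tuple

Document = Tuple[str, dict]  # (document id, document body) to upsert into the index


class OrderCustomerJoin:
    """Stateful inner join on orders.customer_id = customers.id.

    Keeps the latest version of every row on both sides; deletes are not
    handled in this sketch.
    """

    def __init__(self) -> None:
        self.orders: Dict[int, dict] = {}
        self.customers: Dict[int, dict] = {}

    def _document(self, order: dict) -> Document:
        customer = self.customers[order["customer_id"]]
        return str(order["id"]), {**order, "customer_name": customer["name"]}

    def on_order(self, order: dict) -> List[Document]:
        self.orders[order["id"]] = order
        if order["customer_id"] in self.customers:
            return [self._document(order)]
        return []  # no matching customer yet; emitted once the customer arrives

    def on_customer(self, customer: dict) -> List[Document]:
        self.customers[customer["id"]] = customer
        # A customer change affects every joined document of that customer
        return [self._document(order) for order in self.orders.values()
                if order["customer_id"] == customer["id"]]


index: Dict[str, dict] = {}  # stand-in for the Elasticsearch index


def upsert(documents: List[Document]) -> None:
    for doc_id, body in documents:
        index[doc_id] = body  # same id overwrites, so re-emitting is idempotent


join = OrderCustomerJoin()
upsert(join.on_order({"id": 1, "customer_id": 7, "total": 30}))  # nothing emitted yet
upsert(join.on_customer({"id": 7, "name": "Anna"}))               # order 1 emitted
upsert(join.on_customer({"id": 7, "name": "Anna Lee"}))           # order 1 re-emitted
print(index)
```

Keying documents by the order id is what makes repeated emissions safe: the dashboard always sees the latest joined state.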

Slide 37

Demo
© Luke Jones https://flic.kr/p/sEq4MA (CC BY-SA 2.0)

Slides 38–42

Nested Data Structures: UDFs to the Rescue
https://www.youtube.com/@decodable
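
The idea behind such a UDF can be sketched as an aggregate function that collects child rows (here, order lines) into a nested array per parent. It must also be able to retract rows, because in a changelog stream an update or delete of a child row has to remove its old version from the array. The method names `accumulate` and `retract` mirror the corresponding methods of Flink's `AggregateFunction`, but this is a standalone Python sketch with hypothetical data, not a registered Flink UDF.

```python
from typing import Dict, List


class OrderLinesAgg:
    """Collects order lines into a nested list per order, supporting retraction."""

    def create_accumulator(self) -> Dict[int, dict]:
        return {}  # line id -> latest line row

    def accumulate(self, acc: Dict[int, dict], line: dict) -> None:
        acc[line["id"]] = line

    def retract(self, acc: Dict[int, dict], line: dict) -> None:
        acc.pop(line["id"], None)

    def get_value(self, acc: Dict[int, dict]) -> List[dict]:
        # Sort by line id so that the nested array has a deterministic order
        return [acc[line_id] for line_id in sorted(acc)]


agg = OrderLinesAgg()
accumulators: Dict[int, Dict[int, dict]] = {}  # order id -> accumulator


def apply(kind: str, line: dict) -> None:
    acc = accumulators.setdefault(line["order_id"], agg.create_accumulator())
    if kind in ("+I", "+U"):
        agg.accumulate(acc, line)
    else:  # "-U" or "-D"
        agg.retract(acc, line)


apply("+I", {"id": 1, "order_id": 100, "product": "Flink book", "qty": 1})
apply("+I", {"id": 2, "order_id": 100, "product": "Kafka book", "qty": 1})
apply("-U", {"id": 2, "order_id": 100, "product": "Kafka book", "qty": 1})
apply("+U", {"id": 2, "order_id": 100, "product": "Kafka book", "qty": 3})
apply("-D", {"id": 1, "order_id": 100, "product": "Flink book", "qty": 1})

order_document = {"order_id": 100, "lines": agg.get_value(accumulators[100])}
print(order_document)
```

The resulting document, with its lines embedded as an array, is the kind of nested structure a search index or document store expects.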

Slides 43–44

Transactional Aggregation: Correlating Events From the Same Transaction
https://www.slideshare.net/FlinkForward/squirreling-away-640-billion-how-stripe-leverages-flink-for-change-data-capture
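
A minimal sketch of correlating events by transaction, assuming Debezium's transaction metadata is enabled (`provide.transaction.metadata=true`). Each data change event then carries a `transaction` block with the transaction `id` and its `total_order`, and a separate transaction topic emits `BEGIN` and `END` markers, where `END` states the total `event_count`. Since the two topics are not ordered relative to each other, the buffer below releases a transaction's events only once all of them have arrived, whether the `END` marker or the last change event comes first. Transaction ids and row data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class TransactionBuffer:
    """Buffers change events per transaction and emits them as one unit
    once the number announced by the END marker has been received."""

    def __init__(self) -> None:
        self.events: Dict[str, List[dict]] = defaultdict(list)
        self.expected: Dict[str, int] = {}

    def on_transaction_marker(self, marker: dict) -> Optional[List[dict]]:
        if marker["status"] == "END":
            self.expected[marker["id"]] = marker["event_count"]
            return self._maybe_complete(marker["id"])
        return None  # BEGIN carries no event count; nothing to do

    def on_change_event(self, event: dict) -> Optional[List[dict]]:
        tx_id = event["transaction"]["id"]
        self.events[tx_id].append(event)
        return self._maybe_complete(tx_id)

    def _maybe_complete(self, tx_id: str) -> Optional[List[dict]]:
        expected = self.expected.get(tx_id)
        if expected is not None and len(self.events[tx_id]) == expected:
            del self.expected[tx_id]
            batch = self.events.pop(tx_id)
            # Restore the original order of the events within the transaction
            return sorted(batch, key=lambda e: e["transaction"]["total_order"])
        return None


buffer = TransactionBuffer()
tx = "571:53195829"
results = [
    buffer.on_transaction_marker({"status": "BEGIN", "id": tx}),
    buffer.on_change_event({"op": "c", "after": {"id": 1},
                            "transaction": {"id": tx, "total_order": 2}}),
    buffer.on_transaction_marker({"status": "END", "id": tx, "event_count": 2}),
    buffer.on_change_event({"op": "c", "after": {"id": 10},
                            "transaction": {"id": tx, "total_order": 1}}),
]
completed = [r for r in results if r is not None]
print([[e["after"]["id"] for e in batch] for batch in completed])
```

Downstream logic then sees each transaction as one consistent unit instead of a series of intermediate states.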

Slide 45

Wrap-Up

Slide 46

Takeaways 🤩
● The fresher the data, the more valuable it is
● Debezium and Apache Flink: a powerhouse for change stream processing
● Data streaming stacks can be non-trivial to set up and operate

Slide 47

Learn More
● Debezium: @debezium | https://debezium.io/
● Apache Flink: @ApacheFlink | https://flink.apache.org/
● Getting started with Flink: github.com/decodableco/examples → flink-learn

Slide 48

Thank You!
Q & A
📧 gunnar@decodable.co
@gunnarmorling

Slide 49

No content

Slide 50

What’s New in Debezium?
● Incremental snapshotting
● Postgres logical decoding messages
● Multi-DB support (SQL Server)
● Debezium Server sinks
● MongoDB change streams support
● Debezium UI
● Debezium 2.0