Slide 1

Slide 1 text

Change Data Streaming Patterns With Debezium & Apache Flink
Gunnar Morling, Senior Staff Software Engineer, Decodable, @gunnarmorling
Image © pixelchecker https://flic.kr/p/ah5sPr (CC BY 2.0)

Slide 2

Slide 2 text

The world is real-time. So should your data be.

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

#ChangeDataStreamingPatterns @gunnarmorling
Today’s Mission: Learn About…

Slide 5

Slide 5 text

Gunnar Morling
● Software engineer at Decodable
● Former project lead of Debezium
● kcctl 🧸, JfrUnit, ModiTect, MapStruct
● Spec Lead for Bean Validation 2.0
● Java Champion

Slide 6

Slide 6 text

The Tools
Image: https://flic.kr/p/PFDvkY (Public Domain, Angelo Brathot)

Slide 7

Slide 7 text

Debezium: Log-Based Change Data Capture
● Taps into the transaction (TX) log to capture INSERT/UPDATE/DELETE events
● Events are typically propagated to consumers via Apache Kafka
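Conceptually, log-based CDC reads the transaction log sequentially and remembers how far it has read, so a restarted connector continues where it left off instead of re-scanning tables. The sketch below simulates that idea with an in-memory list standing in for the TX log and an integer standing in for a log position; it is not Debezium's API, and `LogEntry` and `LogReader` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class LogEntry:
    """One record in a simulated transaction log (hypothetical shape)."""
    position: int  # monotonically increasing log position, akin to an LSN
    op: str        # "INSERT", "UPDATE" or "DELETE"
    table: str
    row: dict


class LogReader:
    """Reads entries past the last committed position and tracks its progress."""

    def __init__(self, log: List[LogEntry], committed_position: int = -1):
        self.log = log
        self.committed_position = committed_position

    def poll(self) -> Iterator[LogEntry]:
        # Only entries after the committed position are new to this reader
        for entry in self.log:
            if entry.position > self.committed_position:
                yield entry

    def commit(self, position: int) -> None:
        # Durably storing this position is what lets a restarted reader
        # resume without skipping or re-reading entries
        self.committed_position = position


log = [
    LogEntry(0, "INSERT", "customers", {"id": 1, "name": "Sarah"}),
    LogEntry(1, "UPDATE", "customers", {"id": 1, "name": "Sara"}),
    LogEntry(2, "DELETE", "customers", {"id": 1}),
]
reader = LogReader(log)
first = next(reader.poll())
reader.commit(first.position)

# A "restarted" reader picks up after the committed position
resumed = LogReader(log, committed_position=reader.committed_position)
print([entry.op for entry in resumed.poll()])  # ['UPDATE', 'DELETE']
```

Debezium keeps such positions as connector offsets; the point here is only that reading from a stored position makes the capture resumable.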

Slide 9

Slide 9 text

Debezium: Open-Source Change Data Capture
● A CDC platform
  ○ Based on transaction logs
  ○ Snapshotting, filtering, etc.
  ○ Outbox support
  ○ Web-based UI
● Fully open source, very active community
● Large production deployments

Slide 10

Slide 10 text

Debezium: Data Change Events
● Old and new row state
● Metadata on table, TX id, etc.
● Operation type, timestamp
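To make the envelope concrete, here is a trimmed update event, roughly as the Debezium Postgres connector would emit it with the JSON converter and schemas disabled, together with a helper that summarizes it. The field names `before`, `after`, `source`, `op` and `ts_ms` follow Debezium's event structure; the sample values, the `describe_change` helper, and the assumption that the table uses `REPLICA IDENTITY FULL` (so that `before` holds the complete old row) are illustrative, and real `source` blocks carry more fields.

```python
import json

# Hypothetical, trimmed update event; assumes REPLICA IDENTITY FULL so that
# "before" holds the complete old row. Real source blocks carry more fields.
RAW_EVENT = """
{
  "before": {"id": 1004, "email": "anne@example.com"},
  "after":  {"id": 1004, "email": "anne.marie@example.com"},
  "source": {"connector": "postgresql", "schema": "inventory",
             "table": "customers", "txId": 556, "lsn": 24023128},
  "op": "u",
  "ts_ms": 1700000000000
}
"""

# Debezium operation codes
OPERATIONS = {"c": "create", "u": "update", "d": "delete",
              "r": "read (snapshot)", "t": "truncate", "m": "message"}


def describe_change(event: dict) -> str:
    """Summarizes a change event: operation type, table, TX id, changed columns."""
    op = OPERATIONS.get(event["op"], "unknown")
    source = event["source"]
    before = event.get("before") or {}
    after = event.get("after") or {}
    # Columns whose value differs between the old and the new row state
    changed = sorted(column for column in set(before) | set(after)
                     if before.get(column) != after.get(column))
    return f"{op} on {source['schema']}.{source['table']} (tx {source['txId']}): {changed}"


event = json.loads(RAW_EVENT)
print(describe_change(event))  # update on inventory.customers (tx 556): ['email']
```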

Slide 13

Slide 13 text

Debezium: Deployment Options

Slide 14

Slide 14 text

Becoming the De-Facto CDC Standard
Debezium, Google Cloud Spanner, ScyllaDB

Slide 15

Slide 15 text

Apache Flink
Image: https://flic.kr/p/PFDvkY (Public Domain, Angelo Brathot)

Slide 16

Slide 16 text

Apache Flink: Stateful Computations over Data Streams
https://flink.apache.org/

Slide 17

Slide 17 text

Apache Flink: APIs for Application Development
Image source: “Change Data Capture with Flink SQL and Debezium”, by Marta Paes at DataEngBytes (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)

Slide 18

Slide 18 text

Apache Flink: Stream Processing of Change Data Events
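When Flink consumes Debezium events, each event is interpreted as one or more changelog rows, and an update becomes a retraction of the old row (`-U`) plus the new row (`+U`). The sketch below mimics that interpretation in plain Python to show why retractions matter for a continuously maintained aggregate. It assumes events whose `before` field holds the full old row; `to_changelog` and `count_per_category` are hypothetical helpers, not Flink APIs.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Changelog row kinds, using the short strings of Flink's RowKind
INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE = "+I", "-U", "+U", "-D"


def to_changelog(event: dict) -> List[Tuple[str, dict]]:
    """Interprets one Debezium-style event as changelog rows."""
    op = event["op"]
    if op in ("c", "r"):
        return [(INSERT, event["after"])]
    if op == "u":
        return [(UPDATE_BEFORE, event["before"]), (UPDATE_AFTER, event["after"])]
    if op == "d":
        return [(DELETE, event["before"])]
    return []  # truncate and message events are ignored here


def count_per_category(events: Iterable[dict]) -> Counter:
    """Maintains a row count per category, retracting superseded row states."""
    counts: Counter = Counter()
    for event in events:
        for kind, row in to_changelog(event):
            if kind in (INSERT, UPDATE_AFTER):
                counts[row["category"]] += 1
            else:  # UPDATE_BEFORE and DELETE retract a previous row state
                counts[row["category"]] -= 1
    return +counts  # unary plus drops categories whose count fell to zero


events = [
    {"op": "c", "before": None, "after": {"id": 1, "category": "books"}},
    {"op": "c", "before": None, "after": {"id": 2, "category": "books"}},
    {"op": "u", "before": {"id": 2, "category": "books"},
     "after": {"id": 2, "category": "games"}},
    {"op": "d", "before": {"id": 1, "category": "books"}, "after": None},
]
print(dict(count_per_category(events)))  # {'games': 1}
```

Without the `-U` retraction, the update would leave a stale count for `books`.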

Slide 24

Slide 24 text

Outbox Pattern

Slide 25

Slide 25 text

Challenge: Microservices Data Exchange
● Services need to update their database,
● send messages to other services,
● and do both consistently!

Slide 26

Slide 26 text

Outbox Pattern
“Dual writes” are prone to inconsistencies!

Slide 27

Slide 27 text

Outbox Pattern
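The essence of the outbox pattern is that the business data and the outgoing message are written in one local transaction, so either both are persisted or neither is; a CDC tool such as Debezium then captures the inserts into the outbox table and publishes them. The sketch below shows the transactional part with Python's built-in `sqlite3` module standing in for the service's database. The `purchase_orders` and `outbox` table layouts are assumptions, loosely modeled on the columns Debezium's outbox event router expects by default (`id`, `aggregatetype`, `aggregateid`, `type`, `payload`).

```python
import json
import sqlite3
import uuid

# In-memory database standing in for the service's own database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    CREATE TABLE outbox (
        id            TEXT PRIMARY KEY,
        aggregatetype TEXT NOT NULL,
        aggregateid   TEXT NOT NULL,
        type          TEXT NOT NULL,
        payload       TEXT NOT NULL
    );
""")


def place_order(order_id: int, customer: str, total: float, fail: bool = False) -> None:
    """Inserts the order and its outbox event within one local transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO purchase_orders VALUES (?, ?, ?)",
                     (order_id, customer, total))
        if fail:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", str(order_id), "OrderCreated",
             json.dumps({"id": order_id, "customer": customer, "total": total})),
        )


place_order(1, "sarah", 99.5)
try:
    place_order(2, "bob", 10.0, fail=True)
except RuntimeError:
    pass  # the order insert for id 2 has been rolled back

print(conn.execute("SELECT COUNT(*) FROM purchase_orders").fetchone()[0])  # 1
print(conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0])           # 1
```

With a dual write (database first, then a direct send to Kafka), the same crash would have left an order without a message, or, with the opposite ordering, a message about an order that was never stored.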

Slide 31

Slide 31 text

Variation on Postgres: pg_logical_emit_message()
● Directly writing arbitrary messages to the WAL
● No need for an outbox table
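On the consuming side, such messages arrive as events of their own rather than as table rows. The sketch below filters and decodes them in Python. It assumes the Debezium Postgres connector's message event shape (`op` set to `"m"` and a `message` block with `prefix` and `content`) and that the content arrives base64-encoded, as bytes fields do with the JSON converter; the `outbox` prefix and the `extract_message` helper are made up for illustration.

```python
import base64
import json
from typing import Optional

# On the database side, the service would write the message inside its own
# transaction using the three-argument form (transactional, prefix, content):
#   SELECT pg_logical_emit_message(true, 'outbox',
#                                  '{"orderId": 42, "type": "OrderCreated"}');


def extract_message(event: dict, prefix: str = "outbox") -> Optional[dict]:
    """Returns the JSON content of a logical decoding message event with the
    given prefix, or None for all other events. Assumes the content arrives
    base64-encoded, as bytes fields do with the JSON converter."""
    if event.get("op") != "m":
        return None
    message = event["message"]
    if message["prefix"] != prefix:
        return None
    return json.loads(base64.b64decode(message["content"]))


content = base64.b64encode(b'{"orderId": 42, "type": "OrderCreated"}').decode("ascii")
event = {"op": "m", "message": {"prefix": "outbox", "content": content}}
print(extract_message(event))  # {'orderId': 42, 'type': 'OrderCreated'}
```

Using `true` for the transactional flag means the message only becomes visible if the surrounding transaction commits, which gives the same atomicity as an outbox table.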

Slide 32

Slide 32 text

Outbox Pattern: Flink CDC Source

Slide 33

Slide 33 text

Outbox Pattern: Flink Pipeline
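Stripped of Flink specifics, the pipeline step keeps only inserts into the outbox table, ignores everything else (including deletes, should the service clean up outbox rows right after writing them), and derives topic and key from the outbox columns. A plain-Python sketch, assuming outbox columns `aggregatetype`, `aggregateid`, `type` and `payload` and a topic name that mirrors the default `outbox.event.<aggregatetype>` naming of Debezium's outbox event router. In the talk this logic lives in a Flink job; `OutboxRecord` and `route_outbox_events` are hypothetical names, not Flink APIs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class OutboxRecord:
    """A message ready to be published (hypothetical container)."""
    topic: str
    key: str
    event_type: str
    payload: str


def route_outbox_events(events: Iterable[dict],
                        outbox_table: str = "outbox") -> Iterator[OutboxRecord]:
    """Keeps only inserts into the outbox table and routes them by aggregate type."""
    for event in events:
        if event["source"]["table"] != outbox_table:
            continue  # changes to regular tables are not messages
        if event["op"] != "c":
            continue  # e.g. deletes from outbox clean-up must not be re-published
        row = event["after"]
        yield OutboxRecord(
            topic=f"outbox.event.{row['aggregatetype']}",
            key=row["aggregateid"],
            event_type=row["type"],
            payload=row["payload"],
        )


events = [
    {"op": "c", "source": {"table": "purchase_orders"}, "after": {"id": 1}},
    {"op": "c", "source": {"table": "outbox"},
     "after": {"id": "e1", "aggregatetype": "order", "aggregateid": "1",
               "type": "OrderCreated", "payload": '{"id": 1}'}},
    {"op": "d", "source": {"table": "outbox"}, "before": {"id": "e1"}, "after": None},
]
for record in route_outbox_events(events):
    print(record.topic, record.key, record.event_type)  # outbox.event.order 1 OrderCreated
```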

Slide 34

Slide 34 text

Outbox Pattern: Serializer
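The serializer's job is to map each outbox event onto the parts of a Kafka record. A hedged Python sketch of that mapping: the `SerializedRecord` container, the `eventType` header name and the topic naming are illustrative choices, not taken from the talk. In a Flink job, this role would typically be played by an implementation of the Kafka connector's `KafkaRecordSerializationSchema`.

```python
import json
from typing import List, NamedTuple, Tuple


class SerializedRecord(NamedTuple):
    """The parts of a Kafka producer record (hypothetical container)."""
    topic: str
    key: bytes
    value: bytes
    headers: List[Tuple[str, bytes]]


def serialize_outbox_event(row: dict) -> SerializedRecord:
    """Maps an outbox row to a Kafka record: aggregate id as key, payload as
    value, event type as a header. Keying by aggregate id sends all events of
    one aggregate to the same partition, which preserves their order."""
    return SerializedRecord(
        topic=f"outbox.event.{row['aggregatetype']}",
        key=row["aggregateid"].encode("utf-8"),
        # The payload is stored as JSON text; re-encoding normalizes whitespace
        value=json.dumps(json.loads(row["payload"]), separators=(",", ":")).encode("utf-8"),
        headers=[("eventType", row["type"].encode("utf-8"))],
    )


row = {"aggregatetype": "order", "aggregateid": "42", "type": "OrderCreated",
       "payload": '{"id": 42, "total": 99.5}'}
record = serialize_outbox_event(row)
print(record.topic, record.key, record.value)
# outbox.event.order b'42' b'{"id":42,"total":99.5}'
```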

Slide 35

Slide 35 text

Strangler Fig Pattern

Slide 36

Slide 36 text

Challenge: Migrating Systems
● Gradually evolve from old into new
● Support temporary coexistence
● Avoid big bang cut-over
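At its core, the Strangler Fig pattern puts a routing layer in front of the old system and shifts traffic to the new one piece by piece. The sketch below models that routing layer as a Python class; handlers are plain functions standing in for HTTP calls, and the path prefixes and service names are hypothetical. Because migrating a prefix is a single set update, it can also be undone, in line with the goal of avoiding a big bang cut-over.

```python
from typing import Callable, Set

Handler = Callable[[str], str]


class StranglerRouter:
    """Sends requests for migrated path prefixes to the new service and all
    other requests to the monolith (hypothetical sketch)."""

    def __init__(self, monolith: Handler, new_service: Handler) -> None:
        self.monolith = monolith
        self.new_service = new_service
        self.migrated: Set[str] = set()

    def migrate(self, prefix: str) -> None:
        self.migrated.add(prefix)

    def roll_back(self, prefix: str) -> None:
        # Each step is reversible: traffic for the prefix returns to the monolith
        self.migrated.discard(prefix)

    def handle(self, path: str) -> str:
        # Match whole path segments, so "/customers" does not claim "/customersX"
        if any(path == p or path.startswith(p + "/") for p in self.migrated):
            return self.new_service(path)
        return self.monolith(path)


router = StranglerRouter(monolith=lambda path: f"monolith:{path}",
                         new_service=lambda path: f"customers-service:{path}")
router.migrate("/customers")
print(router.handle("/customers/42"))  # customers-service:/customers/42
print(router.handle("/orders/7"))      # monolith:/orders/7
router.roll_back("/customers")
print(router.handle("/customers/42"))  # monolith:/customers/42
```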

Slide 37

Slide 37 text

CDC-based Strangler Fig Pattern
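With CDC, the new service does not query the monolith's database; it consumes the monolith's change events, including the initial snapshot (`op` = `"r"`), and keeps its own store up to date. A minimal sketch, assuming Debezium-style events for a hypothetical `customers` table with `id`, `first_name` and `last_name` columns, and a dictionary standing in for the new service's database. Upserting and deleting by key means that re-applying the same events in order yields the same state.

```python
from typing import Dict


def apply_change(store: Dict[int, dict], event: dict) -> None:
    """Applies one monolith change event to the new service's store,
    upserting or deleting by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        # Translate into the new service's model instead of copying columns 1:1
        store[row["id"]] = {"id": row["id"],
                            "name": f"{row['first_name']} {row['last_name']}"}
    elif op == "d":
        store.pop(event["before"]["id"], None)


store: Dict[int, dict] = {}
events = [
    {"op": "r", "before": None,
     "after": {"id": 1, "first_name": "Sarah", "last_name": "Thompson"}},
    {"op": "c", "before": None,
     "after": {"id": 2, "first_name": "George", "last_name": "Bailey"}},
    {"op": "u", "before": None,
     "after": {"id": 1, "first_name": "Sarah", "last_name": "Walker"}},
    {"op": "d", "before": {"id": 2}, "after": None},
]
for event in events:
    apply_change(store, event)
print(store)  # {1: {'id': 1, 'name': 'Sarah Walker'}}
```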

Slide 44

Slide 44 text

CDC-based Strangler Fig Pattern
Demo repo: https://bit.ly/ff21-sfp

Slide 45

Slide 45 text

Benefits
● Incremental migration → “baby steps”
● Pause or stop the migration without losing the effort already spent
● Migration steps ideally reversible
Rationale: ⚠ minimize risk ⚠

Slide 46

Slide 46 text

CDC Pipeline Considerations
● Data model leaking from the monolith?
● “1:1 replication” → building aggregates?

Slide 47

Slide 47 text

Enhanced CDC Processing: Single Message Transforms
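A Single Message Transform modifies each record individually as it passes through Kafka Connect. A commonly used example is Debezium's `ExtractNewRecordState` transform, which replaces the change event envelope with just the new row state. The Python function below imitates that idea for illustration only; it is not the transform's actual code, and dropping deletes is a simplification chosen for this sketch.

```python
from typing import Optional, Sequence


def unwrap(event: dict, add_fields: Sequence[str] = ()) -> Optional[dict]:
    """Per-message transform: replaces the change event envelope with the new
    row state and optionally copies metadata fields, prefixed with "__".
    Deletes return None, i.e. they are dropped in this simplified sketch."""
    if event["op"] == "d":
        return None
    record = dict(event["after"])
    for field in add_fields:
        # Look in the envelope first (op, ts_ms), then in the source block (table, ...)
        value = event[field] if field in event else event["source"][field]
        record[f"__{field}"] = value
    return record


event = {"op": "u", "ts_ms": 1700000000000, "source": {"table": "customers"},
         "before": None, "after": {"id": 1, "email": "anne@example.com"}}
print(unwrap(event, ["op", "table"]))
# {'id': 1, 'email': 'anne@example.com', '__op': 'u', '__table': 'customers'}
```

Because such a transform only ever sees one message, it cannot join or aggregate across events; that requires stateful stream processing.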

Slide 48

Slide 48 text

Enhanced CDC Processing: Custom Stream Processing with Flink

Slide 49

Slide 49 text

Example: Join With Custom Aggregation

Slide 50

Slide 50 text

Example: Join With Custom Aggregation
Flink CDC Connector

Slide 51

Slide 51 text

Example: Join With Custom Aggregation
Flink SQL
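The slides implement this example with Flink SQL and a custom aggregation. As a language-neutral sketch of what such a job computes, the Python class below keeps both CDC streams in state and, whenever an order or one of its lines changes, re-emits the complete order with its lines nested inside, keyed by order id. The table and column names (`purchase_orders`, `order_lines`, `order_id`) are hypothetical, and the sketch assumes a line never moves to a different order.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class OrderAggregator:
    """Joins purchase orders with their order lines and emits the complete,
    nested order document whenever either side changes (sketch)."""

    def __init__(self) -> None:
        self.orders: Dict[int, dict] = {}
        # order id -> (line id -> line row)
        self.lines: Dict[int, Dict[int, dict]] = defaultdict(dict)

    def on_event(self, table: str, event: dict) -> Tuple[int, Optional[dict]]:
        deleted = event["op"] == "d"
        row = event["before"] if deleted else event["after"]
        if table == "purchase_orders":
            order_id = row["id"]
            if deleted:
                self.orders.pop(order_id, None)
            else:
                self.orders[order_id] = row
        else:  # order_lines; assumes a line never moves to another order
            order_id = row["order_id"]
            if deleted:
                self.lines[order_id].pop(row["id"], None)
            else:
                self.lines[order_id][row["id"]] = row
        return order_id, self._document(order_id)

    def _document(self, order_id: int) -> Optional[dict]:
        order = self.orders.get(order_id)
        if order is None:
            return None  # order deleted or not seen yet: downstream gets a tombstone
        lines = sorted(self.lines[order_id].values(), key=lambda line: line["id"])
        return {**order, "lines": lines}


agg = OrderAggregator()
agg.on_event("purchase_orders", {"op": "c", "before": None,
                                 "after": {"id": 1, "customer": "sarah"}})
agg.on_event("order_lines", {"op": "c", "before": None,
                             "after": {"id": 10, "order_id": 1, "item": "book"}})
key, doc = agg.on_event("order_lines", {"op": "c", "before": None,
                                        "after": {"id": 11, "order_id": 1, "item": "pen"}})
print(key, [line["item"] for line in doc["lines"]])  # 1 ['book', 'pen']
```

Emitting one nested document per order, rather than one row per joined order/line pair, is what lets a downstream system such as a search index consume the aggregate directly.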

Slide 52

Slide 52 text

Audit Logs

Slide 53

Slide 53 text

Challenge: Capturing Intent
● Pure CDC events lack metadata such as the business user, device id, etc.
● Solution: emit this metadata at TX begin via pg_logical_emit_message(), then enrich the change events using Flink

Slide 54

Slide 54 text

Capturing Intent: Enriching Change Data Events with Metadata

Slide 55

Slide 55 text

Capturing Intent: Enriching Change Events via Apache Flink
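Since both the metadata message and the subsequent data change events carry the id of the transaction they belong to, enrichment amounts to a stateful lookup keyed by transaction id. In Flink this would typically be a keyed stream with per-key state; the plain-Python generator below only illustrates the logic. It assumes Debezium Postgres events with a `source.txId` field, a message prefix `audit`, and message content already decoded to JSON text; all three are illustrative choices.

```python
import json
from typing import Dict, Iterable, Iterator


def enrich_with_metadata(events: Iterable[dict]) -> Iterator[dict]:
    """Attaches the audit metadata emitted at the start of a transaction to every
    change event of that transaction, correlating them via source.txId."""
    # Note: this sketch never clears state; a real job would expire it
    # once a transaction is complete.
    metadata_by_tx: Dict[int, dict] = {}
    for event in events:
        tx_id = event["source"]["txId"]
        if event["op"] == "m" and event["message"]["prefix"] == "audit":
            # Metadata message written at TX begin; content assumed to be JSON text
            metadata_by_tx[tx_id] = json.loads(event["message"]["content"])
            continue
        enriched = dict(event)
        enriched["audit"] = metadata_by_tx.get(tx_id)  # None if no metadata was emitted
        yield enriched


events = [
    {"op": "m", "source": {"txId": 700},
     "message": {"prefix": "audit", "content": '{"user": "bob", "device": "tablet-17"}'}},
    {"op": "u", "source": {"txId": 700}, "before": None, "after": {"id": 1, "qty": 3}},
    {"op": "c", "source": {"txId": 701}, "before": None, "after": {"id": 2, "qty": 1}},
]
for event in enrich_with_metadata(events):
    print(event["after"]["id"], event["audit"])
# 1 {'user': 'bob', 'device': 'tablet-17'}
# 2 None
```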

Slide 56

Slide 56 text

Wrap-Up

Slide 57

Slide 57 text

Takeaways
● The fresher data is, the more valuable it is
● Change data capture and Debezium: powerful tools for real-time change event feeds
● Combining CDC with stream processing: many more possibilities
● Learn more:
  ○ https://www.infoq.com/articles/wonders-of-postgres-logical-decoding-messages/
  ○ https://github.com/decodableco/examples/blob/main/postgres-logical-decoding/

Slide 58

Slide 58 text

Thank You!
Q & A
@gunnarmorling

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

Debezium: Correlating Events From the Same Transaction

Slide 61

Slide 61 text

Debezium: Correlating Events From the Same Transaction
https://www.slideshare.net/FlinkForward/squirreling-away-640-billion-how-stripe-leverages-flink-for-change-data-capture
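When Debezium's transaction metadata is enabled (the `provide.transaction.metadata` connector option), BEGIN and END markers are written to a separate topic, the END marker carries the transaction's total `event_count`, and each change event gets a `transaction` block with the transaction `id` and its position (`total_order`). A downstream job can use this to buffer events until a transaction is complete. The sketch below shows that buffering in plain Python; it assumes these field names as described, and it omits things a production job needs, such as timeouts for transactions that never complete.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class TransactionBuffer:
    """Buffers change events per transaction and releases them as one batch once
    all events announced by the transaction's END marker have arrived."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[dict]] = defaultdict(list)
        self.expected: Dict[str, int] = {}

    def on_transaction_event(self, tx_event: dict) -> Optional[List[dict]]:
        if tx_event["status"] == "END":
            self.expected[tx_event["id"]] = tx_event["event_count"]
            return self._try_release(tx_event["id"])
        return None  # BEGIN markers carry no event count

    def on_change_event(self, event: dict) -> Optional[List[dict]]:
        tx_id = event["transaction"]["id"]
        self.pending[tx_id].append(event)
        return self._try_release(tx_id)

    def _try_release(self, tx_id: str) -> Optional[List[dict]]:
        expected = self.expected.get(tx_id)
        if expected is None or len(self.pending[tx_id]) < expected:
            return None  # END not seen yet, or events still missing
        del self.expected[tx_id]
        events = self.pending.pop(tx_id)
        # Restore the order of events within the transaction
        return sorted(events, key=lambda e: e["transaction"]["total_order"])


buf = TransactionBuffer()
tx = "571:53195832"
buf.on_transaction_event({"status": "BEGIN", "id": tx})
buf.on_change_event({"op": "u", "transaction": {"id": tx, "total_order": 2}})
buf.on_transaction_event({"status": "END", "id": tx, "event_count": 2})  # one event missing
batch = buf.on_change_event({"op": "c", "transaction": {"id": tx, "total_order": 1}})
print([e["op"] for e in batch])  # ['c', 'u']
```

Since the END marker and the change events may arrive on different topics in either order, the release check runs on both paths.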