Change Data Streaming Patterns With Debezium & Apache Flink

Microservices are one of the big trends in software engineering of the last few years: organizing business functionality in several self-contained, loosely coupled services helps engineering teams to work efficiently, make the most suitable technical decisions, and react quickly to new business requirements.

In this session we'll discuss and showcase how open-source change data capture (CDC) with Debezium can help developers with typical challenges they often face when working on microservices. Come and join us to learn how to:

* Employ the outbox pattern for reliable, eventually consistent data exchange between microservices, without incurring unsafe dual writes or tight coupling
* Gradually extract microservices from existing monolithic applications, using CDC, the strangler fig pattern and Apache Flink
* Build audit logs that contain not only the changed data itself, but also additional metadata such as business user, client configuration, or use case identifier
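
The outbox pattern from the first bullet can be sketched as follows. This is a minimal illustration, not code from the talk: it uses an in-memory SQLite database in place of a production database, the `purchase_order` table and the `place_order` function are hypothetical, and the outbox column names (`id`, `aggregatetype`, `aggregateid`, `type`, `payload`) follow what are, to my understanding, the defaults expected by Debezium's outbox event router. The point it shows is that the business row and the outgoing event are written in one local transaction, so there is no dual write; a CDC connector would then pick up the outbox rows from the transaction log.

```python
import json
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical business table owned by the service.
    conn.execute(
        "CREATE TABLE purchase_order ("
        "id TEXT PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
    )
    # Outbox table; column names are an assumption (see lead-in).
    conn.execute(
        "CREATE TABLE outbox ("
        "id TEXT PRIMARY KEY, aggregatetype TEXT NOT NULL, "
        "aggregateid TEXT NOT NULL, type TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def place_order(conn: sqlite3.Connection, customer: str, total: float) -> str:
    """Store an order and its 'OrderCreated' event in a single transaction."""
    order_id = str(uuid.uuid4())
    payload = json.dumps({"orderId": order_id, "customer": customer, "total": total})
    # The connection context manager commits if the block succeeds and
    # rolls back if it raises, so both inserts succeed or fail together.
    with conn:
        conn.execute(
            "INSERT INTO purchase_order (id, customer, total) VALUES (?, ?, ?)",
            (order_id, customer, total),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "Order", order_id, "OrderCreated", payload),
        )
    return order_id
```

Other services never read the `purchase_order` table directly; they consume only the events derived from `outbox`, which keeps the service's internal schema decoupled from its public event contract.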

Gunnar Morling

March 30, 2023

Transcript

  1. Change Data Streaming Patterns With Debezium & Apache Flink
     Gunnar Morling, Senior Staff Software Engineer, Decodable, @gunnarmorling
     Image © pixelchecker https://flic.kr/p/ah5sPr (CC BY 2.0)
  2. #ChangeDataStreamingPatterns @gunnarmorling
     Gunnar Morling
     • Software engineer at Decodable
     • Former project lead of Debezium
     • kcctl 🧸, JfrUnit, ModiTect, MapStruct
     • Spec Lead for Bean Validation 2.0
     • Java Champion
  3. Debezium: Log-Based Change Data Capture
     • Taps into TX log to capture INSERT/UPDATE/DELETE events
     • Typically propagated to consumers via Apache Kafka
  5. Debezium: Open-Source Change Data Capture
     • A CDC Platform
       ◦ Based on transaction logs
       ◦ Snapshotting, filtering, etc.
       ◦ Outbox support
       ◦ Web-based UI
     • Fully open-source, very active community
     • Large production deployments
  6. Debezium: Data Change Events
     • Old and new row state
     • Metadata on table, TX id, etc.
     • Operation type, timestamp
  9. Apache Flink: APIs for Application Development
     Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)
  10. Challenge: Microservices Data Exchange
     • Services need to update their database,
     • send messages to other services,
     • and do both consistently!
  11. Challenge: Migrating Systems
     • Gradually evolve from old into new
     • Support temporary coexistence
     • Avoid big bang cut-over
  12. Benefits
     • Incremental migration → “baby steps”
     • Pause or stop the migration without losing the effort already spent
     • Migration steps ideally reversible
     Rationale: ⚠ minimize risk ⚠
  13. Challenge: Capturing Intent
     • Pure CDC events lack metadata like business user, device id, etc.
     • Solution: emit the metadata via pg_logical_emit_message() at TX begin, enrich events using Flink
  14. Takeaways
     • The fresher data is, the more valuable it is
     • Change Data Capture and Debezium: Powerful tools for real-time change event feeds
     • Combining CDC with stream processing: Many more possibilities
     • Learn more:
       ◦ https://www.infoq.com/articles/wonders-of-postgres-logical-decoding-messages/
       ◦ https://github.com/decodableco/examples/blob/main/postgres-logical-decoding/
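
The enrichment idea from slide 13 (metadata emitted at TX begin, then attached to the change events of the same transaction) can be sketched in plain Python. This is not Flink code and not the implementation from the talk: the function name `enrich_with_tx_metadata`, the `audit` field, and the simplified shape of the metadata message (an event with `op` set to `"m"` and a `message` dictionary) are assumptions for illustration. The `op`, `before`, `after`, and `source.txId` fields mirror the envelope of Debezium change events. The sketch also assumes events arrive in transaction log order, so the metadata message precedes the changes it describes.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def enrich_with_tx_metadata(events: Iterable[Event]) -> Iterator[Event]:
    """Attach per-transaction metadata to the change events of that transaction."""
    # State keyed by transaction id. A stream processor would keep this in
    # managed keyed state and expire it; here it simply grows with each TX.
    metadata_by_tx: Dict[int, Dict[str, Any]] = {}
    for event in events:
        tx_id = event["source"]["txId"]
        if event["op"] == "m":
            # Metadata message emitted at TX begin: remember it, emit nothing.
            metadata_by_tx[tx_id] = event["message"]
            continue
        # Data change event: copy it and add the metadata seen for its TX,
        # or None if the transaction did not emit any.
        yield {**event, "audit": metadata_by_tx.get(tx_id)}


if __name__ == "__main__":
    stream = [
        {"op": "m", "source": {"txId": 7}, "message": {"user": "bob"}},
        {"op": "u", "source": {"txId": 7}, "before": {"qty": 1}, "after": {"qty": 2}},
    ]
    for enriched in enrich_with_tx_metadata(stream):
        print(enriched)
```

Keying the state by transaction id is what lets a single metadata message cover every row changed in that transaction, without the application adding audit columns to each table.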