Change Data Streaming Patterns With Debezium & Apache Flink

Microservices are one of the big trends in software engineering of the last few years: organizing business functionality in several self-contained, loosely coupled services helps engineering teams to work efficiently, make the most suitable technical decisions, and react quickly to new business requirements.

In this session we'll discuss and showcase how open-source change data capture (CDC) with Debezium can help developers with typical challenges they often face when working on microservices. Come and join us to learn how to:

* Employ the outbox pattern for reliable, eventually consistent data exchange between microservices, without incurring unsafe dual writes or tight coupling
* Gradually extract microservices from existing monolithic applications, using CDC, the strangler fig pattern and Apache Flink
* Build audit logs that contain not only the changed data itself, but also additional metadata like business user, client configuration, or use case identifier

Gunnar Morling

March 30, 2023

Transcript

  1. Image © pixelchecker https://flic.kr/p/ah5sPr (CC BY 2.0)
    Change Data Streaming Patterns
    With Debezium & Apache Flink
    Gunnar Morling
    Senior Staff Software Engineer, Decodable
    @gunnarmorling

  2. The world is real-time. So should be your data.

  4. #ChangeDataStreamingPatterns @gunnarmorling
    Today’s Mission
    Learn About…

  5. Gunnar Morling
    ● Software engineer at Decodable
    ● Former project lead of Debezium
    ● kcctl 🧸, JfrUnit, ModiTect, MapStruct
    ● Spec Lead for Bean Validation 2.0
    ● Java Champion

  6. The Tools
    https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot

  7. Debezium
    Log-Based Change Data Capture
    ● Taps into TX log to capture INSERT/UPDATE/DELETE events
    ● Typically propagated to consumers via Apache Kafka

  9. Debezium
    Open-Source Change Data Capture
    ● A CDC Platform
      ○ Based on transaction logs
      ○ Snapshotting, filtering, etc.
      ○ Outbox support
      ○ Web-based UI
    ● Fully open-source, very active community
    ● Large production deployments

  10. Debezium: Data Change Events
    ● Old and new row state
    ● Metadata on table, TX id, etc.
    ● Operation type, timestamp
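
As a rough illustration of the event structure listed above, the following Python sketch builds a dict shaped like a Debezium change event and summarizes it. The field names `before`, `after`, `source`, `op` and `ts_ms` follow Debezium's event envelope; the database, table, column values and the `describe_change` helper are made up for this example.

```python
# A simplified change event for an UPDATE, modeled on Debezium's envelope.
# All values are illustrative assumptions, not output from a real connector.
update_event = {
    "before": {"id": 1001, "email": "old@example.com"},
    "after": {"id": 1001, "email": "new@example.com"},
    "source": {"db": "inventory", "table": "customers", "txId": 571},
    "op": "u",  # c = create, u = update, d = delete, r = snapshot read
    "ts_ms": 1680163200000,
}

OPERATIONS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT READ"}


def describe_change(event):
    """Hypothetical helper: summarize which columns an event changed."""
    before = event["before"] or {}  # None for inserts
    after = event["after"] or {}    # None for deletes
    changed = sorted(
        col for col in before.keys() | after.keys()
        if before.get(col) != after.get(col)
    )
    return {
        "table": event["source"]["table"],
        "operation": OPERATIONS[event["op"]],
        "changed_columns": changed,
    }


print(describe_change(update_event))
```

Because both row states travel in the same event, a consumer can work out what changed without querying the source database.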

  13. Debezium
    Deployment Options

  14. Debezium
    Becoming the De-Facto CDC Standard
    Google Cloud Spanner, ScyllaDB

  15. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot
    Apache Flink

  16. Apache Flink
    Stateful Computations over Data Streams
    https://flink.apache.org/

  17. Apache Flink
    APIs for Application Development
    Image source: “Change Data Capture with Flink SQL and Debezium” by Marta Paes at DataEngBytes
    (https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium)

  18. Apache Flink
    Stream Processing of Change Data Events

  24. Outbox Pattern

  25. Challenge: Microservices Data Exchange
    ● Services need to update their database,
    ● send messages to other services,
    ● and do both consistently!

  26. Outbox Pattern
    “Dual writes” are prone to inconsistencies!
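
One way to picture the alternative to dual writes: the service writes its business data and the outgoing message in the same local transaction, and CDC later relays the outbox rows. The sketch below uses Python's built-in sqlite3 purely as a stand-in database; the `purchase_order` and `outbox` tables, their columns and the `place_order` function are assumptions for illustration.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    conn.execute(
        "CREATE TABLE purchase_order "
        "(id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER)"
    )
    conn.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_type TEXT, "
        "aggregate_id TEXT, event_type TEXT, payload TEXT)"
    )


def place_order(conn, item, quantity):
    """Insert the order and its outbox event in ONE transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO purchase_order (item, quantity) VALUES (?, ?)",
            (item, quantity),
        )
        order_id = cur.lastrowid
        # Check placed here on purpose to simulate a failure between the writes.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", str(order_id), "OrderCreated",
             json.dumps({"id": order_id, "item": item, "quantity": quantity})),
        )
    return order_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
place_order(conn, "widget", 3)
try:
    place_order(conn, "gadget", 0)  # fails after the first INSERT
except ValueError:
    pass

orders = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(orders, events)
```

The failed call leaves neither an order nor an event behind, so the two tables cannot drift apart the way a separate database write and message send can.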

  27. Outbox Pattern

  31. Variation on Postgres
    pg_logical_emit_message()
    ● Directly writing arbitrary messages to the WAL
    ● No need for an outbox table
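
A minimal sketch of how such a message might be emitted from application code. `pg_logical_emit_message(transactional, prefix, content)` is the Postgres function named above; the `build_emit_statement` helper, the `outbox` prefix and the payload shape are assumptions, and the `%s` placeholders assume a psycopg-style driver. No database is contacted here.

```python
import json


def build_emit_statement(prefix, payload, transactional=True):
    """Hypothetical helper returning (sql, params) for a DB-API cursor.

    With transactional=True the message is part of the current transaction
    and is only decoded if that transaction commits.
    """
    sql = "SELECT pg_logical_emit_message(%s, %s, %s::text)"
    params = (transactional, prefix, json.dumps(payload))
    return sql, params


sql, params = build_emit_statement(
    "outbox",
    {"aggregate_id": "42", "type": "OrderCreated"},
)
print(sql)
print(params)
# With a real connection this would be run as: cursor.execute(sql, params)
```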

  32. Outbox Pattern
    Flink CDC Source

  33. Outbox Pattern
    Flink Pipeline

  34. Outbox Pattern
    Serializer

  35. Strangler Fig Pattern

  36. Challenge: Migrating Systems
    ● Gradually evolve from old into new
    ● Support temporary coexistence
    ● Avoid big bang cut-over
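
To make the idea of temporary coexistence concrete, here is a small sketch of a routing facade that sends already-migrated request paths to a new service and everything else to the legacy system. The path prefixes, backend names and the `StranglerRouter` class are invented for this illustration; in a CDC-based setup, the new service's data would be kept in sync from the old database via change events.

```python
class StranglerRouter:
    """Hypothetical facade deciding which backend serves a request."""

    def __init__(self, legacy="monolith", modern="new-service"):
        self.legacy = legacy
        self.modern = modern
        self.migrated_prefixes = set()

    def migrate(self, prefix):
        """One small step: start serving this prefix from the new service."""
        self.migrated_prefixes.add(prefix)

    def rollback(self, prefix):
        """Reverse a step by sending the prefix back to the legacy system."""
        self.migrated_prefixes.discard(prefix)

    def route(self, path):
        for prefix in self.migrated_prefixes:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return self.modern
        return self.legacy


router = StranglerRouter()
print(router.route("/owners/7"))   # nothing migrated yet
router.migrate("/owners")
print(router.route("/owners/7"))   # now served by the new service
print(router.route("/visits/3"))   # still served by the legacy system
router.rollback("/owners")
print(router.route("/owners/7"))   # step reversed
```

Each `migrate` call moves one slice of functionality, and `rollback` undoes it, which is what lets both systems run side by side instead of switching over all at once.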

  37. CDC-based Strangler Fig Pattern

  44. CDC-based Strangler Fig Pattern
    Demo Repo: https://bit.ly/ff21-sfp

  45. Benefits
    ● Incremental migration → “baby steps”
    ● Pause or stop migration without losing the effort already spent
    ● Migration steps ideally reversible
    Rationale: ⚠ minimize risk ⚠

  46. CDC Pipeline Considerations
    ● Data model leaking from monolith?
    ● “1:1 replication” → building aggregates?

  47. Enhanced CDC Processing
    Single Message Transforms

  48. Enhanced CDC Processing
    Custom Stream Processing with Flink

  49. Example: Join With Custom Aggregation
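
The slide's actual example is not part of this transcript. As a stand-in, the following sketch shows the general idea in plain Python rather than Flink: change events from two tables are joined on a key and folded into one nested document per parent row. The `orders` and `order_lines` table names, the column names, the event shape and the `OrderAggregator` class are all assumptions.

```python
class OrderAggregator:
    """Keeps per-order state and returns a nested document on every change."""

    def __init__(self):
        self.orders = {}  # order id -> order row
        self.lines = {}   # order id -> {line id -> line row}

    def apply(self, event):
        table, op = event["table"], event["op"]
        # Deletes carry the row in "before", everything else in "after".
        row = event["after"] if op != "d" else event["before"]
        if table == "orders":
            order_id = row["id"]
            if op == "d":
                self.orders.pop(order_id, None)
            else:
                self.orders[order_id] = row
        else:  # order_lines
            order_id = row["order_id"]
            lines = self.lines.setdefault(order_id, {})
            if op == "d":
                lines.pop(row["id"], None)
            else:
                lines[row["id"]] = row
        return self.snapshot(order_id)

    def snapshot(self, order_id):
        order = self.orders.get(order_id)
        if order is None:
            return None  # no parent row yet, or it was deleted
        lines = self.lines.get(order_id, {})
        return {**order, "lines": [lines[k] for k in sorted(lines)]}


events = [
    {"table": "orders", "op": "c", "before": None,
     "after": {"id": 1, "customer": "alice"}},
    {"table": "order_lines", "op": "c", "before": None,
     "after": {"id": 10, "order_id": 1, "item": "hammer", "qty": 2}},
    {"table": "order_lines", "op": "c", "before": None,
     "after": {"id": 11, "order_id": 1, "item": "nails", "qty": 100}},
    {"table": "order_lines", "op": "d", "after": None,
     "before": {"id": 10, "order_id": 1, "item": "hammer", "qty": 2}},
]

aggregator = OrderAggregator()
for event in events:
    result = aggregator.apply(event)
print(result)
```

Downstream consumers then see one self-contained order document instead of the source system's table layout.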

  50. Example: Join With Custom Aggregation
    Flink CDC Connector

  51. Example: Join With Custom Aggregation
    Flink SQL

  52. Audit Logs

  53. Challenge: Capturing Intent
    pg_logical_emit_message()
    ● Pure CDC events lack metadata like business user, device id, etc.
    ● Solution: emit a message with that metadata at TX begin, enrich events using Flink
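
The enrichment step can be sketched without Flink as a small stateful function keyed by transaction id: the metadata message emitted at the start of a transaction is remembered, and every later change event from the same transaction gets it attached. The event shapes, the `audit` prefix and the `TransactionEnricher` class are assumptions made for this example; a Flink job would hold the same information in keyed state rather than a dict.

```python
class TransactionEnricher:
    """Attaches per-transaction metadata to the change events that follow it."""

    def __init__(self):
        # tx id -> metadata; cleanup of finished transactions is omitted here.
        self.metadata_by_tx = {}

    def process(self, event):
        tx_id = event["tx_id"]
        if event["kind"] == "message":
            # Metadata emitted at TX begin: remember it, emit nothing.
            self.metadata_by_tx[tx_id] = event["content"]
            return None
        # A regular change event: copy it and add the audit metadata (if any).
        return {**event, "audit": self.metadata_by_tx.get(tx_id)}


stream = [
    {"kind": "message", "tx_id": 7, "prefix": "audit",
     "content": {"user": "bob", "use_case": "update-address"}},
    {"kind": "change", "tx_id": 7, "table": "customers", "op": "u"},
    {"kind": "change", "tx_id": 7, "table": "addresses", "op": "c"},
]

enricher = TransactionEnricher()
enriched = [out for out in map(enricher.process, stream) if out is not None]
for event in enriched:
    print(event["table"], event["audit"]["user"])
```

This relies on the metadata message arriving before the change events of its transaction, which is why it is emitted at TX begin.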

  54. Capturing Intent
    Enriching Change Data Events with Metadata

  55. Capturing Intent
    Enriching Change Events Via Apache Flink

  56. Wrap-Up

  57. Takeaways
    ● The fresher the data, the more valuable it is
    ● Change Data Capture and Debezium: Powerful tools for real-time change event feeds
    ● Combining CDC with stream processing: Many more possibilities
    ● Learn more:
      ○ https://www.infoq.com/articles/wonders-of-postgres-logical-decoding-messages/
      ○ https://github.com/decodableco/examples/blob/main/postgres-logical-decoding/

  58. Thank You!
    Q & A
    @gunnarmorling

  60. Debezium
    Correlating Events From Same Transaction

  61. Debezium
    Correlating Events From Same Transaction
    https://www.slideshare.net/FlinkForward/squirreling-away-640-billion-how-stripe-leverages-flink-for-change-data-capture