Change Data Streaming Patterns in Distributed Systems @ Flink Forward 2021

Abstract:
Microservices are one of the big trends in software engineering of the last few years; organising business functionality in several self-contained, loosely coupled services helps teams to work efficiently, make the most suitable technical decisions, and react quickly to new business requirements.

In this session we'll discuss and showcase how open-source change data capture (CDC) with Debezium can help developers with typical challenges they often face when working on microservices. Come and join us to learn how to:

* Employ the outbox pattern for reliable, eventually consistent data exchange between microservices, without incurring unsafe dual writes or tight coupling
* Gradually extract microservices from existing monolithic applications, using CDC, the strangler fig pattern and Apache Flink
* Coordinate long-running business transactions across multiple services using CDC-based saga orchestration, ensuring such activity gets consistently applied or aborted by all participating services.

Event Page:
https://www.flink-forward.org/global-2021/conference-program#change-data-streaming-patterns-in-distributed-systems

Video Recording:
https://www.youtube.com/watch?v=JsihpvQEv2c

Demo Repository: https://github.com/hpgrahsl/flinkforward21

Hans-Peter Grahsl

October 26, 2021

Transcript

  1. Change Data Streaming Patterns in Distributed Systems
    Gunnar Morling
    Software Engineer, Red Hat
    @gunnarmorling
    Hans-Peter Grahsl
    Technical Trainer, Netconomy
    @hpgrahsl

  2. #CDCPatterns @gunnarmorling @hpgrahsl
    Today’s Objectives
    … implemented using Change Data Capture

  3. Gunnar Morling
    ● Open source software engineer at Red Hat
      ○ Debezium
      ○ Quarkus
    ● Spec Lead for Bean Validation 2.0
    ● Java Champion
    ● @gunnarmorling

  4. Hans-Peter Grahsl
    ● Technical Trainer at NETCONOMY
    ● Independent Engineer & Consultant
    ● Confluent Community Catalyst
    ● MongoDB Champion
    ● @hpgrahsl

  5. Debezium: Log-based Change Data Capture
    ● Taps into the transaction (TX) log to capture INSERT/UPDATE/DELETE events
    ● Propagates these events to consumers via Apache Kafka and Kafka Connect

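A toy, in-memory sketch of the log-based approach: the database appends every committed change to an ordered log, and a CDC reader tails that log from its last stored offset, so it can stop and resume without missing or re-reading events. `TxLog` and `CdcReader` are hypothetical names; Debezium itself reads database-specific logs (such as the MySQL binlog or the Postgres write-ahead log) and keeps its offsets through Kafka Connect.

```python
from dataclasses import dataclass, field


@dataclass
class TxLog:
    """Hypothetical append-only log of committed row changes (stand-in for a binlog/WAL)."""
    entries: list = field(default_factory=list)

    def append(self, op: str, table: str, key, row) -> None:
        self.entries.append({"op": op, "table": table, "key": key, "row": row})


@dataclass
class CdcReader:
    """Tails the log from a stored offset, like a connector resuming after a restart."""
    log: TxLog
    offset: int = 0

    def poll(self) -> list:
        new_events = self.log.entries[self.offset:]
        self.offset += len(new_events)  # remember how far we have read
        return new_events


log = TxLog()
log.append("INSERT", "customers", 1, {"name": "Alice"})
log.append("UPDATE", "customers", 1, {"name": "Alice B."})

reader = CdcReader(log)
print([e["op"] for e in reader.poll()])  # ['INSERT', 'UPDATE']

log.append("DELETE", "customers", 1, None)
print([e["op"] for e in reader.poll()])  # ['DELETE']

# A reader restored from the saved offset does not see old events again.
restored = CdcReader(log, offset=reader.offset)
print(restored.poll())  # []
```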

  6. Debezium in a Nutshell
    ● A CDC Platform
      ■ Based on transaction logs
      ■ Snapshotting, filtering, etc.
      ■ Outbox support
      ■ Web-based UI
    ● Fully open-source, very active community
    ● Large production deployments

  7. Debezium: Connectors
    ● Stable
      ■ MySQL
      ■ Postgres
      ■ MongoDB
      ■ SQL Server
      ■ Db2
      ■ Oracle
    ● Incubating
      ■ Vitess
      ■ Cassandra

  8. Debezium: Deployment Alternatives
    Embedded Engine and Debezium Server

  9. Data Change Events
    ● Old and new row state
    ● Metadata on table, TX id, etc.
    ● Operation type, timestamp

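For illustration, here is a simplified, hand-written update event modeled on Debezium's event envelope (`before`, `after`, `source`, `op`, `ts_ms`), plus a small helper that reads it. The concrete values and the subset of `source` fields are assumptions for this sketch; real events carry more metadata, and the exact `source` fields differ per connector.

```python
# Simplified update event modeled on Debezium's envelope; all values are invented.
update_event = {
    "before": {"id": 1004, "email": "annek@example.com"},   # old row state
    "after": {"id": 1004, "email": "anne.k@example.com"},   # new row state
    "source": {"connector": "postgresql", "db": "inventory", "table": "customers", "txId": 565},
    "op": "u",              # c = create, u = update, d = delete, r = read (snapshot)
    "ts_ms": 1635239100000,
}

OPS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}


def describe(event: dict) -> str:
    """Summarises a change event: operation type, table, TX id and changed columns."""
    before = event["before"] or {}
    after = event["after"] or {}
    changed = sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))
    src = event["source"]
    return f'{OPS[event["op"]]} on {src["table"]} (tx {src["txId"]}): {changed}'


print(describe(update_event))  # UPDATE on customers (tx 565): ['email']
```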

  10. Data Change Events

  11. Data Change Events

  12. Outbox Pattern

  13. Challenge: Microservices Data Exchange
    ● Services need to update their database,
    ● send messages to other services,
    ● and do both consistently!

  14. Outbox Pattern
    “Dual writes” are prone to inconsistencies!

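A minimal sketch of why dual writes are risky, using an in-memory SQLite database and a hypothetical `send_to_broker` function that simulates an unreachable broker: the database commit succeeds, the message send fails, and the two systems now disagree. Sending first and committing second only moves the problem, since the commit can then fail after the message is already out.

```python
import sqlite3


class BrokerUnavailable(Exception):
    """Raised by the hypothetical broker client below."""


def send_to_broker(topic: str, message: dict) -> None:
    # Hypothetical message broker client; here it always fails, e.g. during a network outage.
    raise BrokerUnavailable(f"cannot reach broker for topic {topic!r}")


def place_order_dual_write(db: sqlite3.Connection, order_id: int) -> None:
    # Write 1: the local database transaction commits...
    with db:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'CREATED')", (order_id,))
    # Write 2: ...but the send fails afterwards; there is no shared transaction to roll back.
    send_to_broker("orders", {"id": order_id, "status": "CREATED"})


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
try:
    place_order_dual_write(db, 1)
except BrokerUnavailable:
    pass  # the caller sees an error, but the order is already committed

# The order exists, yet no other service has been told about it.
print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```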

  15. Outbox Pattern

  16. Outbox Pattern

  17. Outbox Pattern

  18. Outbox Pattern

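A sketch of the outbox idea with SQLite: the order row and the outbox event are written in the same local transaction, so either both exist or neither does. The outbox columns (`id`, `aggregatetype`, `aggregateid`, `type`, `payload`) are modeled on the column names commonly used with Debezium's outbox event router, and the `outbox.event.<aggregatetype>` topic name mimics its default topic naming. `relay_outbox` is a hypothetical polling stand-in for what Debezium does by reading the outbox inserts from the transaction log.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregatetype TEXT, aggregateid TEXT,
                         type TEXT, payload TEXT);
""")


def place_order(order_id: int) -> None:
    """Writes the business row and the outbox event in ONE local transaction."""
    with db:  # commits both inserts together, or rolls both back on error
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'CREATED')", (order_id,))
        db.execute(
            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", str(order_id), "OrderCreated",
             json.dumps({"id": order_id, "status": "CREATED"})),
        )


def relay_outbox(after_rowid: int):
    """Stand-in for CDC: messages for outbox rows written after `after_rowid`, plus the new position."""
    rows = db.execute(
        "SELECT rowid, aggregatetype, aggregateid, type, payload FROM outbox "
        "WHERE rowid > ? ORDER BY rowid",
        (after_rowid,),
    ).fetchall()
    messages = [
        {"topic": f"outbox.event.{agg_type}", "key": agg_id, "type": ev_type,
         "payload": json.loads(payload)}
        for _, agg_type, agg_id, ev_type, payload in rows
    ]
    position = rows[-1][0] if rows else after_rowid
    return messages, position


place_order(1)
place_order(2)
messages, position = relay_outbox(0)
print([m["key"] for m in messages], position)  # ['1', '2'] 2
```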

  19. Outbox Pattern
    With enrichment from an external system using Flink

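The enrichment step can be sketched as a lookup against the external system for each outbox event, with a cache so that repeated keys do not trigger repeated remote calls. The customer service, its data and the field names are hypothetical; in Flink such a lookup would more typically use Async I/O or a lookup join than a blocking call like this one.

```python
from functools import lru_cache

# Hypothetical external system, e.g. a customer service reachable over HTTP.
CUSTOMER_SERVICE = {"c-42": {"name": "Alice", "tier": "GOLD"}}
calls = 0


@lru_cache(maxsize=1024)
def fetch_customer(customer_id: str) -> tuple:
    """Simulated remote lookup; the cache avoids hitting the external system for every event."""
    global calls
    calls += 1
    customer = CUSTOMER_SERVICE.get(customer_id, {})
    return customer.get("name"), customer.get("tier")


def enrich(event: dict) -> dict:
    """Returns a copy of the outbox event with customer details merged into its payload."""
    name, tier = fetch_customer(event["payload"]["customerId"])
    return {**event, "payload": {**event["payload"], "customerName": name, "customerTier": tier}}


events = [
    {"type": "OrderCreated", "payload": {"orderId": 1, "customerId": "c-42"}},
    {"type": "OrderCreated", "payload": {"orderId": 2, "customerId": "c-42"}},
]
enriched = [enrich(e) for e in events]
print(enriched[0]["payload"]["customerTier"], calls)  # GOLD 1
```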

  20. Integrating Debezium With Apache Flink
    Debezium → Kafka Topic → Flink

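Once the change events sit in a Kafka topic, Flink reads them as a changelog. The pure-Python sketch below mimics, in simplified form, how Flink's `debezium-json` format maps Debezium operations onto Flink row kinds (`+I` insert, `-U`/`+U` update before/after, `-D` delete); it illustrates the mapping and is not Flink code.

```python
def to_changelog(event: dict) -> list:
    """Maps a Debezium-style change event to Flink-style changelog rows (RowKind short strings)."""
    op = event["op"]
    if op in ("c", "r"):  # insert or snapshot read
        return [("+I", event["after"])]
    if op == "u":         # an update retracts the old row, then adds the new one
        return [("-U", event["before"]), ("+U", event["after"])]
    if op == "d":
        return [("-D", event["before"])]
    raise ValueError(f"unknown op {op!r}")


rows = to_changelog({"op": "u", "before": {"id": 1, "qty": 2}, "after": {"id": 1, "qty": 5}})
print(rows)  # [('-U', {'id': 1, 'qty': 2}), ('+U', {'id': 1, 'qty': 5})]
```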

  21. Integrating Debezium With Apache Flink
    Flink CDC Connectors (Debezium Embedded Engine) → Flink

  22. Strangler Fig Pattern

  23. Challenge: Migrating Systems
    ● Gradually evolve from old into new
    ● Support temporary coexistence
    ● Avoid big-bang cut-over

  24. CDC-based Strangler Fig Pattern

  25. CDC-based Strangler Fig Pattern

  26. CDC-based Strangler Fig Pattern

  27. CDC-based Strangler Fig Pattern

  28. CDC-based Strangler Fig Pattern

  29. CDC-based Strangler Fig Pattern

  30. CDC-based Strangler Fig Pattern

  31. CDC-based Strangler Fig Pattern
    Demo Repo: https://bit.ly/ff21-sfp

  32. Benefits
    ● Incremental migration → “baby steps”
    ● Pause or stop migration without losing the effort already spent
    ● Migration steps ideally reversible
    Rationale: ⚠ minimize risk ⚠

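A rough sketch of the CDC-based strangler fig steps, assuming a routing layer in front of both systems and Debezium-style change events (`op`, `before`, `after`): CDC keeps the new service's store in sync while traffic still goes to the monolith, then individual endpoints are switched over, and switching them back is the rollback. `StranglerRouter` and `apply_change` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class StranglerRouter:
    """Hypothetical routing facade: migrated path prefixes go to the new service, the rest to the monolith."""
    migrated: set = field(default_factory=set)

    def route(self, path: str) -> str:
        prefix = "/" + path.strip("/").split("/")[0]
        return "new-service" if prefix in self.migrated else "monolith"


def apply_change(store: dict, event: dict) -> None:
    """Keeps the new service's data in sync with the monolith via CDC events."""
    if event["op"] == "d":
        store.pop(event["before"]["id"], None)
    else:  # c, r and u all carry the latest row state in "after"
        store[event["after"]["id"]] = event["after"]


router = StranglerRouter()
new_service_store = {}

# Step 1: requests still go to the monolith while CDC fills the new service's store.
for ev in [{"op": "r", "before": None, "after": {"id": 1, "name": "Alice"}},
           {"op": "u", "before": {"id": 1, "name": "Alice"}, "after": {"id": 1, "name": "Alice B."}}]:
    apply_change(new_service_store, ev)
print(router.route("/customers/1"))  # monolith

# Step 2: switch the customer endpoints over; removing the prefix again is the rollback.
router.migrated.add("/customers")
print(router.route("/customers/1"), router.route("/orders/7"))  # new-service monolith
```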

  33. CDC Pipeline Considerations
    ● Data model leaking from the monolith?
    ● “1:1 replication” → building aggregates?

  34. Enhanced CDC Processing
    Single Message Transforms

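A sketch of what a chain of message transforms does to each event, written as plain Python functions: the first flattens the change event envelope to the new row state (loosely modeled on Debezium's `ExtractNewRecordState` SMT, whose real delete handling is configurable), the second renames legacy columns so the monolith's data model does not leak into the new service. The column names `CUST_NM` and `CUST_EML` are hypothetical.

```python
def extract_new_state(event: dict):
    """Flattens a Debezium-style envelope to the plain new row state; this sketch drops deletes (None)."""
    if event["op"] == "d":
        return None
    return dict(event["after"])


def rename_fields(record, mapping: dict):
    """Renames legacy column names so the monolith's data model does not leak downstream."""
    if record is None:
        return None
    return {mapping.get(k, k): v for k, v in record.items()}


def transform_chain(event: dict):
    # Transforms run one after another, like a chain of single message transforms.
    legacy_names = {"CUST_NM": "name", "CUST_EML": "email"}  # hypothetical legacy columns
    return rename_fields(extract_new_state(event), legacy_names)


event = {"op": "u",
         "before": {"id": 7, "CUST_NM": "Bob", "CUST_EML": "bob@example.com"},
         "after": {"id": 7, "CUST_NM": "Bob", "CUST_EML": "bob.b@example.com"}}
print(transform_chain(event))  # {'id': 7, 'name': 'Bob', 'email': 'bob.b@example.com'}
```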

  35. Enhanced CDC Processing
    Custom stream processing with Flink

  36. Example: join with custom aggregation

  37. Example: join with custom aggregation
    Flink Table API

  38. Example: join with custom aggregation
    Flink SQL

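Independent of the Table API or SQL syntax, such a join with aggregation has to keep state for both inputs and re-emit the aggregate whenever either side changes. A plain-Python sketch of that state handling, assuming hypothetical `customers` and `addresses` change streams in Debezium's `before`/`after` shape, where each customer document carries a nested list of its addresses:

```python
from collections import defaultdict


class CustomerAggregator:
    """Joins hypothetical customer and address change streams into one document per customer."""

    def __init__(self):
        self.customers = {}                 # customer id -> latest customer row
        self.addresses = defaultdict(dict)  # customer id -> {address id -> latest address row}

    def _emit(self, customer_id):
        customer = self.customers.get(customer_id)
        if customer is None:  # inner-join semantics: nothing to emit without a customer row
            return None       # (a real changelog would emit a retraction after a delete)
        addresses = sorted(self.addresses[customer_id].values(), key=lambda a: a["id"])
        return {**customer, "addresses": addresses}

    def on_customer(self, event):
        before, after = event["before"], event["after"]
        if after is None:
            self.customers.pop(before["id"], None)
        else:
            self.customers[after["id"]] = after
        return self._emit((after or before)["id"])

    def on_address(self, event):
        before, after = event["before"], event["after"]
        if before is not None:  # drop the old version first; also handles moves between customers
            self.addresses[before["customer_id"]].pop(before["id"], None)
        if after is not None:
            self.addresses[after["customer_id"]][after["id"]] = after
        # For brevity, only the (new) owning customer's document is re-emitted.
        return self._emit((after or before)["customer_id"])


agg = CustomerAggregator()
agg.on_customer({"op": "c", "before": None, "after": {"id": 1, "name": "Alice"}})
agg.on_address({"op": "c", "before": None, "after": {"id": 10, "customer_id": 1, "city": "Graz"}})
doc = agg.on_address({"op": "c", "before": None, "after": {"id": 11, "customer_id": 1, "city": "Berlin"}})
print([a["city"] for a in doc["addresses"]])  # ['Graz', 'Berlin']
```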

  39. Saga Pattern

  40. Challenge: Long-running Business Transactions
    ● Multiple services need to act collaboratively to achieve a consistent outcome
    ● Without two-phase commit (2PC) protocols
    ● Ensure correctness in case of failures

  41. Saga Pattern

  42. Saga Pattern
    Message Flow

  43. Saga Pattern
    Compensation

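The saga implementation listed under Resources exchanges saga requests and replies through outbox events captured by Debezium. Leaving that messaging aside, the orchestration and compensation logic can be sketched in-process: run the steps in order and, if one fails, run the compensations of the already completed steps in reverse order. The step names and functions are hypothetical, and a real orchestrator would persist its state between messages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]        # e.g. a request sent to a participating service
    compensation: Callable[[dict], None]  # semantically undoes a successfully completed action


def run_saga(steps: list, ctx: dict) -> str:
    """Runs steps in order; on failure, compensates the completed steps in reverse order."""
    completed = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            ctx.setdefault("log", []).append(f"{step.name} failed: {exc}")
            for done in reversed(completed):
                done.compensation(ctx)
                ctx["log"].append(f"compensated {done.name}")
            return "ABORTED"
        completed.append(step)
        ctx.setdefault("log", []).append(f"{step.name} ok")
    return "COMPLETED"


def reserve_credit(ctx): ctx["credit_reserved"] = True
def release_credit(ctx): ctx["credit_reserved"] = False
def charge_payment(ctx): raise RuntimeError("card declined")
def refund_payment(ctx): ctx["refunded"] = True


steps = [SagaStep("credit", reserve_credit, release_credit),
         SagaStep("payment", charge_payment, refund_payment)]
ctx = {}
print(run_saga(steps, ctx), ctx["log"])
# ABORTED ['credit ok', 'payment failed: card declined', 'compensated credit']
```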

  44. Saga Pattern
    Configuration

  45. Saga Pattern
    Execution Flow

  46. Saga Pattern
    Execution Flow

  47. Saga Pattern
    Execution Flow

  48. Saga Pattern
    Expanding Partial Change Events with Flink

  49. Saga Pattern
    Expanding Partial Change Events with Flink

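One common source of partial change events is Postgres with unchanged TOASTed columns: Debezium's Postgres connector then emits a placeholder (configurable via `unavailable.value.placeholder`) instead of the actual value. A sketch of the expansion, which in Flink would typically live in keyed state: remember the last complete row per key and fill placeholders from it. The constant below assumes the connector's default placeholder, and the saga row fields are hypothetical.

```python
# Assumption: the connector's default placeholder for unavailable (unchanged TOASTed) values.
UNAVAILABLE = "__debezium_unavailable_value"


class PartialEventExpander:
    """Keeps the last complete row per key and fills placeholders in partial update events."""

    def __init__(self):
        self.last_seen = {}  # key -> last complete row (keyed state in a Flink job)

    def expand(self, key, row: dict) -> dict:
        previous = self.last_seen.get(key, {})
        # Placeholder values are replaced by the previous value (None if never seen before).
        full = {col: (previous.get(col) if val == UNAVAILABLE else val) for col, val in row.items()}
        self.last_seen[key] = full
        return full


expander = PartialEventExpander()
expander.expand("saga-1", {"id": "saga-1", "status": "STARTED", "payload": '{"orderId": 1}'})
update = expander.expand("saga-1", {"id": "saga-1", "status": "SUCCEEDED", "payload": UNAVAILABLE})
print(update["payload"])  # {"orderId": 1}
```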

  50. Wrap-Up

  51. Takeaways
    ● CDC: a powerful tool in the box for event-driven architectures
    ● Debezium: open-source CDC for a variety of databases
    ● Debezium + Apache Flink = ❤

  52. Resources
    ● Outbox implementation
      https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
    ● Strangler fig pattern
      https://martinfowler.com/bliki/StranglerFigApplication.html
    ● Saga implementation
      https://www.infoq.com/articles/saga-orchestration-outbox/
    ● Demo repo
      https://github.com/debezium/debezium-examples

  53. Thank You!
    Q & A
    @gunnarmorling
    @hpgrahsl

  54. Image Credits
    In Order of Appearance
    Unsplash https://unsplash.com/license
    © Pablo García Saldaña https://unsplash.com/photos/lPQIndZz8Mo
    © David Clode https://unsplash.com/photos/T49WTav4LgU
    © Aaron Burden https://unsplash.com/photos/GFpxQ2ZyNc0
    © Nathan Dumlao https://unsplash.com/photos/wQDysNUCKfw
    © mari lezhava https://unsplash.com/photos/q65bNe9fW-w
    © Michał Parzuchowski https://unsplash.com/photos/Bt0PM7cNJFQ
    © Charles Forerunner https://unsplash.com/photos/3fPXt37X6UQ
    Flickr
    Attribution 2.0 Generic https://creativecommons.org/licenses/by/2.0/
    © Thomas Kamann https://flic.kr/p/coa2c
    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    © Wall Boat https://flic.kr/p/Y6zkmX
    Attribution-ShareAlike 2.0 Generic https://creativecommons.org/licenses/by-sa/2.0/
    © Andrew Hart https://flic.kr/p/dmjkSk
