Keep Your Cache Always Fresh With Debezium

The saying goes that there are only two hard things in Computer Science: cache invalidation, and naming things. Well, turns out the first one is solved actually ;)

Join us for this session to learn how to keep read views of your data in distributed caches close to your users, always kept in sync with your primary data stores via change data capture. You will learn how to

- Implement a low-latency data pipeline for cache updates based on Debezium, Apache Kafka, and Infinispan
- Create denormalized views of your data using Kafka Streams and make them accessible via plain key look-ups from a cache cluster close by
- Propagate updates between cache clusters using cross-site replication
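
The last hop of such a pipeline, applying change events to a key/value cache, can be sketched in a few lines. This is a minimal illustration in plain Python, not the code from the talk's demo: the plain `dict` stands in for an Infinispan cache, `apply_change_event` is a hypothetical helper, and the events are simplified versions of Debezium's change event envelope (only `op`, `before`, and `after`, without schema or source metadata).

```python
from typing import Any, Dict, Optional


def apply_change_event(cache: Dict[Any, dict], key: Any, event: Optional[dict]) -> None:
    """Apply one simplified Debezium-style change event to a key/value cache."""
    if event is None:
        # Tombstone record (null value): nothing left to cache for this key.
        cache.pop(key, None)
        return
    op = event["op"]
    if op in ("c", "u", "r"):
        # Create, update, and snapshot read all carry the new row state in "after".
        cache[key] = event["after"]
    elif op == "d":
        # Delete: the row is gone from the system of record, so evict it.
        cache.pop(key, None)
    else:
        # Other event types (e.g. truncate) are out of scope for this sketch.
        raise ValueError(f"unexpected op: {op!r}")


cache: Dict[Any, dict] = {}  # stand-in for a remote cache (assumption)
events = [
    (5, {"op": "c", "before": None, "after": {"id": 5, "name": "Peter"}}),
    (5, {"op": "u", "before": {"id": 5, "name": "Peter"}, "after": {"id": 5, "name": "Pete"}}),
    (4, {"op": "c", "before": None, "after": {"id": 4, "name": "Mike"}}),
    (5, {"op": "d", "before": {"id": 5, "name": "Pete"}, "after": None}),
]
for key, event in events:
    apply_change_event(cache, key, event)
```

Because every event carries the full new row state, replaying the same event twice leaves the cache unchanged, which fits well with at-least-once delivery from Kafka.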

We'll also touch on some advanced concepts, such as detecting and rejecting writes to the system of record which are derived from outdated cached state, and show in a demo how all the pieces come together, of course connected via Apache Kafka.
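
One common way to reject writes derived from outdated cached state is optimistic locking with a version number. The abstract does not say which mechanism the talk uses, so the following is only a sketch under that assumption; `SystemOfRecord` and `StaleWriteError` are hypothetical names, and the in-memory dict stands in for a database table with a version column.

```python
class StaleWriteError(Exception):
    """Raised when a write is based on an outdated version of a row."""


class SystemOfRecord:
    """In-memory stand-in for a table with a version column (assumption)."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def insert(self, key, value):
        self._rows[key] = (1, value)
        return 1

    def read(self, key):
        return self._rows[key]

    def update(self, key, new_value, expected_version):
        current_version, _ = self._rows[key]
        if current_version != expected_version:
            # The caller read this row (e.g. from a cache) before a newer write.
            raise StaleWriteError(
                f"expected version {expected_version}, found {current_version}"
            )
        self._rows[key] = (current_version + 1, new_value)
        return current_version + 1


db = SystemOfRecord()
db.insert(2, "Jenny")
cached_version, cached_value = db.read(2)  # what a cache entry would hold

# A first writer updates the row; the cache has not caught up yet.
db.update(2, "Jennifer", expected_version=cached_version)

# A second writer still uses the cached version and is rejected.
try:
    db.update(2, "Jen", expected_version=cached_version)
    rejected = False
except StaleWriteError:
    rejected = True
```

In a real database the version comparison and the update would need to happen atomically, for example in a single conditional `UPDATE` statement.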

Presented at Kafka Summit London 2022; Infinispan slides contributed by Katia Aresti.

Gunnar Morling

April 26, 2022

Transcript

  1. Image © Nathalie https://flic.kr/p/21Ghf2g (CC BY 2.0)
    Keep Your Cache Always Fresh With Debezium!
    Gunnar Morling
    Software Engineer, Red Hat
    @gunnarmorling

  2. #Debezium @gunnarmorling
    The Challenge
    ● 90% read requests
    ● Complex queries
    Multi-site Application With Shared Database

  3. #Debezium @gunnarmorling
    … Multi-site application with shared system-of-record database
    … With local, denormalized read views (CQRS)
    ... Automatically kept in sync after writes
    Today’s Mission
    🤔
    Explore How to Build a…

  4. #Debezium @gunnarmorling
    Agenda
    Demo!
    Learn Learn

  5. #Debezium @gunnarmorling
    ● Software engineer at Red Hat
    ○ Debezium
    ○ Quarkus
    ● kcctl 🧸, JfrUnit, ModiTect, MapStruct
    ● Spec Lead for Bean Validation 2.0
    ● Java Champion
    ● @gunnarmorling
    Gunnar Morling

  6. #Debezium @gunnarmorling
    The Idea
    Caching to the Rescue!

  7. #Debezium @gunnarmorling
    The Idea
    Caching to the Rescue!

  8. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot

  9. #Debezium @gunnarmorling
    Infinispan
    100% Open-source In-Memory Distributed Data Store
    Interoperability
    Resilient
    Fault Tolerant Data
    Clustered Processing Query
    ACID Tx

  10. #Debezium @gunnarmorling
    Infinispan Deployment
    Local Cache
    App
    Data
    5, Peter

  11. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    5, Peter
    5, Peter
    App 3
    Data
    Put
    4, Mike
    Remove
    5 Peter
    Get
    2, Null
    Infinispan Deployment
    Local Cache

  12. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    1, Maria
    2, Jenny
    App 3
    Data
    1, Maria
    2, Jenny
    Infinispan Deployment
    Replicated Cache

  13. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    1, Maria
    2, Jenny
    App 3
    Data
    1, Maria
    2, Jenny
    Put
    3, Juan
    3, Juan
    Infinispan Deployment
    Replicated Cache

  14. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    3, Juan
    1, Maria
    2, Jenny
    3, Juan
    App 3
    Data
    1, Maria
    2, Jenny
    3, Juan
    Infinispan Deployment
    Replicated Cache

  15. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria 2, Jenny
    App 3
    Data
    3, Juan
    Infinispan Deployment
    Distributed Cache (One Owner)

  16. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria 2, Jenny
    App 3
    Data
    3, Juan
    Get 2
    Jenny
    Get 2
    Jenny
    Infinispan Deployment
    Distributed Cache (One Owner)

  17. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria 2, Jenny
    App 3
    Data
    3, Juan
    Infinispan Deployment
    Distributed Cache (One Owner)

  18. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    2, Jenny
    3, Juan
    App 3
    Data
    1, Maria
    3, Juan
    Infinispan Deployment
    Distributed Cache (Two Owners)

  19. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    2, Jenny
    3, Juan
    App 3
    Data
    1, Maria
    3, Juan
    Put 4 Will Put 4 Will
    Infinispan Deployment
    Distributed Cache (Two Owners)

  20. #Debezium @gunnarmorling
    App 1
    Data
    App 2
    Data
    1, Maria
    2, Jenny
    4, Will
    2, Jenny
    3, Juan
    4, Will
    App 3
    Data
    1, Maria
    3, Juan
    Infinispan Deployment
    Distributed Cache (Two Owners)

  21. #Debezium @gunnarmorling

  22. #Debezium @gunnarmorling
    Infinispan
    Client/Server
    Service 1
    Service 2
    Binary
    (hot rod)
    REST
    ...
    Service 3
    Data
    Infinispan Cluster
    Data
    Data

  23. #Debezium @gunnarmorling
    Infinispan Cross-Site Replication
    AWS (LON)
    GCP (NYC)
    Load Balancer
    APP
    APP
    Service
    APP
    APP
    Service
    Shared
    State
    Shared
    State
    Shared
    State
    Shared
    State
    Data
    Data
    Data NYC
    Data LON
    RELAY2

  24. #Debezium @gunnarmorling
    The Question
    How To Keep The Cache In Sync?

  25. https://flic.kr/p/PFDvkY Public Domain, Angelo Br

  26. #Debezium @gunnarmorling
    Debezium in a Nutshell
    Open-Source Change Data Capture
    ● A CDC Platform
    ○ Based on transaction logs
    ○ Snapshotting, filtering, etc.
    ○ Outbox support
    ○ Web-based UI
    ● Fully open-source, very active
    community
    ● Large production deployments

  27. #Debezium @gunnarmorling
    Change Data Capture
    Liberation for Your Data

  28. #Debezium @gunnarmorling
    Change Data Capture
    Liberation for Your Data

  29. #Debezium @gunnarmorling
    Solution Overview
    Capturing Changes From the Database

  30. #Debezium @gunnarmorling
    ● Community-hosted connectors
    ● pg_logical_emit_message()
    ● Multi-DB support (SQL Server)
    ● Debezium Server sinks
    ● MongoDB change streams support
    ● Debezium UI
    ● etc.
    Detour: What’s New in Debezium?

  31. #Debezium @gunnarmorling
    ● Can’t update filter list
    ● Long-running snapshots can’t be paused/resumed
    ● Can’t stream changes until snapshot completed
    ● Can’t re-snapshot selected tables
    Detour: What’s New in Debezium?
    Incremental Snapshotting

  32. #Debezium @gunnarmorling
    Incremental Snapshotting
    ● “DBLog: A Watermark Based
    Change-Data-Capture
    Framework”, by Andreas Andreakis
    and Ioannis Papapanagiotou
    ● Key idea: interleave snapshot events
    and events from TX log
    https://arxiv.org/pdf/2010.12597v1.pdf
    Detour: What’s New in Debezium?
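
The key idea of interleaving snapshot events with transaction log events can be sketched roughly as follows. This is a simplified reading of the watermark approach from the DBLog paper, not Debezium's implementation: the tuples `("low",)` and `("high",)` stand in for watermark markers in the log, `chunk` is the set of rows read from the table between the two watermarks, and `interleave` is a hypothetical helper.

```python
def interleave(log, chunk):
    """Merge one snapshot chunk into a stream of log events.

    log:   list of ("low",), ("high",) or ("change", key, value) tuples
    chunk: dict of key -> value, read from the table between the watermarks
    """
    out = []
    remaining = dict(chunk)
    in_window = False
    for event in log:
        kind = event[0]
        if kind == "low":
            in_window = True
        elif kind == "high":
            in_window = False
            # Rows not touched inside the window are safe to emit now.
            out.extend(("snapshot", key, value) for key, value in remaining.items())
        else:
            _, key, _ = event
            if in_window:
                # The log event is newer than (or equal to) the chunk row,
                # so the possibly stale snapshot row is dropped.
                remaining.pop(key, None)
            out.append(event)
    return out


log = [
    ("change", 1, "Maria v2"),
    ("low",),
    ("change", 2, "Jenny v2"),
    ("high",),
    ("change", 3, "Juan v2"),
]
chunk = {1: "Maria v2", 2: "Jenny", 3: "Juan"}
result = interleave(log, chunk)
```

Here the stale snapshot row for key 2 is discarded because a change to the same key arrived inside the watermark window, while the rows for keys 1 and 3 are emitted at the high watermark and later changes simply follow them.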

  33. #Debezium @gunnarmorling
    Incremental Snapshotting
    Detour: What’s New in Debezium?

  34. #Debezium @gunnarmorling
    Incremental Snapshotting
    Detour: What’s New in Debezium?

  35. #Debezium @gunnarmorling
    Support for pg_logical_emit_message()
    Detour: What’s New in Debezium?
    ● Directly writing arbitrary messages to the WAL
    ● No need for an outbox table

  36. #Debezium @gunnarmorling
    Multi-DB Support
    Detour: What’s New in Debezium?

  37. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot
    Demo

  39. #Debezium @gunnarmorling
    ● Fast start-up, low memory consumption
    ● Developer joy
    ● Imperative and Reactive
    ● Best-of-breed libraries
    ● Run via HotSpot and GraalVM native binaries
    Quarkus - Supersonic Subatomic Java
    A Stack for Building Cloud-native Apps

  40. #Debezium @gunnarmorling
    ● Java API for stateful stream processing
    ● Rich set of operators
    ● Scaling out to multiple JVMs
    ● Interactive queries
    Kafka Streams
    Streaming Queries on Kafka Topics

    View Slide

  41. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot
    Demo

  42. #Debezium @gunnarmorling
    … Multi-site application with shared system-of-record database ✅
    … With local, denormalized read views (CQRS) ✅
    ... Automatically kept in sync after writes ✅
    Today’s Mission
    🤩
    Explore How to Build a…

  43. #Debezium @gunnarmorling
    ● Infinispan: @infinispan | https://infinispan.org/
    ● Debezium: @debezium | https://debezium.io/
    ● Demo: https://github.com/debezium/debezium-examples/ →
    distributed-caching
    ● kcctl 🧸: https://github.com/kcctl/kcctl/
    Learn More

  44. Try our Kafka service!
    ▸ 48 hour trial
    ▸ Free of charge
    ▸ OpenShift and Kubernetes AppDev examples
    Red Hat OpenShift Streams for Apache Kafka Trial
    TRY IT TODAY!
    http://red.ht/TryKafka

  45. #Debezium @gunnarmorling
    Q & A
    [email protected]
    @gunnarmorling
    📧
    Thank You!
