Keep Your Cache Always Fresh With Debezium

The saying goes that there are only two hard things in Computer Science: cache invalidation, and naming things. Well, turns out the first one is solved actually ;)

Join us for this session to learn how to keep read views of your data in distributed caches close to your users, always kept in sync with your primary data stores via change data capture. You will learn how to

- Implement a low-latency data pipeline for cache updates based on Debezium, Apache Kafka, and Infinispan
- Create denormalized views of your data using Kafka Streams and make them accessible via plain key look-ups from a cache cluster close by
- Propagate updates between cache clusters using cross-site replication
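
As a rough illustration of the first bullet, here is a minimal Python sketch of how a consumer might apply Debezium change events to a key-value cache. It assumes the standard Debezium event envelope (`op`, `before`, `after`), optionally wrapped in a `payload` field as the JSON converter does when schemas are enabled. The function name `apply_change_event` and the plain `dict` standing in for an Infinispan cache are hypothetical, and reading from Kafka is left out entirely.

```python
def apply_change_event(cache, key, event):
    """Apply one Debezium-style change event to a dict that stands in for the cache."""
    if event is None:
        # A null value is the tombstone record that follows a delete event.
        cache.pop(key, None)
        return

    # With the JSON converter and schemas enabled, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]

    if op in ("c", "u", "r"):
        # create, update, or snapshot read: "after" holds the new row state
        cache[key] = payload["after"]
    elif op == "d":
        # delete: "after" is null, so drop the cached entry
        cache.pop(key, None)
    # Other op codes (for example truncate) are ignored in this sketch.


cache = {}
apply_change_event(cache, 5, {"op": "c", "before": None, "after": {"id": 5, "name": "Peter"}})
apply_change_event(cache, 5, {"op": "u", "before": {"id": 5, "name": "Peter"}, "after": {"id": 5, "name": "Pete"}})
```

Because every event carries the full new row state in `after`, the cache entry is simply overwritten, which keeps the update logic idempotent if an event is delivered twice.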

We'll also touch on some advanced concepts, such as detecting and rejecting writes to the system of record which are derived from outdated cached state, and show in a demo how all the pieces come together, of course connected via Apache Kafka.
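
One common way to reject writes derived from outdated cached state is an optimistic version check. The abstract does not say which mechanism the talk uses, so the following Python sketch is only an assumption: every row carries a version, the cached copy remembers the version it was built from, and a write is accepted only if that version still matches. The names `SystemOfRecord` and `StaleWriteError` are hypothetical.

```python
class StaleWriteError(Exception):
    """Raised when a write is based on an outdated version of a row."""


class SystemOfRecord:
    """In-memory stand-in for the primary database (hypothetical)."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._rows.get(key, (0, None))
        if expected_version != current_version:
            # The caller saw an older state (for example from a lagging cache).
            raise StaleWriteError(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, value)
        return new_version
```

A client that read version 1 from the cache while the database has already moved to version 2 would get a `StaleWriteError` and could re-read and retry.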

Presented at Kafka Summit London 2022; Infinispan slides contributed by Katia Aresti.

Gunnar Morling

April 26, 2022

Transcript

  1. Keep Your Cache Always Fresh With Debezium! Gunnar Morling, Software Engineer, Red Hat, @gunnarmorling. Image © Nathalie https://flic.kr/p/21Ghf2g (CC BY 2.0)
  2. #Debezium @gunnarmorling The Challenge: Multi-site Application With Shared Database • 90% read requests • Complex queries
  3. Today’s Mission 🤔 Explore How to Build a… … Multi-site application with shared system-of-record database … With local, denormalized read views (CQRS) … Automatically kept in sync after writes
  4. Agenda: Demo!, Learn, Learn
  5. Gunnar Morling • Software engineer at Red Hat ◦ Debezium ◦ Quarkus • kcctl 🧸, JfrUnit, ModiTect, MapStruct • Spec Lead for Bean Validation 2.0 • Java Champion • @gunnarmorling
  6. The Idea: Caching to the Rescue!
  7. The Idea: Caching to the Rescue!
  8. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot

  9. Infinispan: 100% Open-source In-Memory Distributed Data Store. Interoperability Resilient Fault Tolerant Data Clustered Processing Query ACID Tx
  10. Infinispan Deployment: Local Cache (diagram: one app with its data, holding entry 5, Peter)
  11. Infinispan Deployment: Local Cache (diagram: App 1, App 2, App 3, each with its own data; entries 1, Maria; 2, Jenny; 5, Peter; 5, Peter; operations Put 4, Mike; Remove 5, Peter; Get 2, Null)
  12. Infinispan Deployment: Replicated Cache (diagram: App 1, App 2, App 3; each holds 1, Maria and 2, Jenny)
  13. Infinispan Deployment: Replicated Cache (diagram: App 1, App 2, App 3; each holds 1, Maria and 2, Jenny; operation Put 3, Juan; new entry 3, Juan)
  14. Infinispan Deployment: Replicated Cache (diagram: App 1, App 2, App 3; each holds 1, Maria; 2, Jenny; 3, Juan)
  15. Infinispan Deployment: Distributed Cache (One Owner) (diagram: App 1, App 2, App 3; entries 1, Maria; 2, Jenny; 3, Juan, each stored once)
  16. Infinispan Deployment: Distributed Cache (One Owner) (diagram: App 1, App 2, App 3; entries 1, Maria; 2, Jenny; 3, Juan; operation Get 2, Jenny, shown twice)
  17. Infinispan Deployment: Distributed Cache (One Owner) (diagram: App 1, App 2, App 3; entries 1, Maria; 2, Jenny; 3, Juan, each stored once)
  18. Infinispan Deployment: Distributed Cache (Two Owners) (diagram: App 1, App 2, App 3; data sets 1, Maria and 2, Jenny; 2, Jenny and 3, Juan; 1, Maria and 3, Juan)
  19. Infinispan Deployment: Distributed Cache (Two Owners) (diagram: App 1, App 2, App 3; data sets 1, Maria and 2, Jenny; 2, Jenny and 3, Juan; 1, Maria and 3, Juan; operation Put 4, Will, shown twice)
  20. Infinispan Deployment: Distributed Cache (Two Owners) (diagram: App 1, App 2, App 3; data sets 1, Maria, 2, Jenny and 4, Will; 2, Jenny, 3, Juan and 4, Will; 1, Maria and 3, Juan)
  22. Infinispan Client/Server (diagram: Service 1, Service 2, Service 3; protocols Binary (Hot Rod), REST, ...; Infinispan Cluster with three data nodes)
  23. Infinispan Cross-Site Replication (diagram: sites AWS (LON) and GCP (NYC); Load Balancer; app services with shared state; data nodes labeled NYC and LON; RELAY2)
  24. The Question: How To Keep The Cache In Sync?
  25. https://flic.kr/p/PFDvkY Public Domain, Angelo Br

  26. Debezium in a Nutshell: Open-Source Change Data Capture • A CDC Platform ◦ Based on transaction logs ◦ Snapshotting, filtering, etc. ◦ Outbox support ◦ Web-based UI • Fully open-source, very active community • Large production deployments
  27. Change Data Capture: Liberation for Your Data
  28. Change Data Capture: Liberation for Your Data
  29. Solution Overview: Capturing Changes From the Database
  30. Detour: What’s New in Debezium? • Community-hosted connectors • pg_logical_emit_message() • Multi-DB support (SQL Server) • Debezium Server sinks • MongoDB change streams support • Debezium UI • etc.
  31. Detour: What’s New in Debezium? Incremental Snapshotting • Can’t update filter list • Long-running snapshots can’t be paused/resumed • Can’t stream changes until snapshot completed • Can’t re-snapshot selected tables
  32. Detour: What’s New in Debezium? Incremental Snapshotting • “DBLog: A Watermark Based Change-Data-Capture Framework”, by Andreas Andreakis and Ioannis Papapanagiotou, https://arxiv.org/pdf/2010.12597v1.pdf • Key idea: interleave snapshot events and events from TX log
  33. Detour: What’s New in Debezium? Incremental Snapshotting
  34. Detour: What’s New in Debezium? Incremental Snapshotting
  35. Detour: What’s New in Debezium? Support for pg_logical_emit_message() • Directly writing arbitrary messages to the WAL • No need for an outbox table
  36. Detour: What’s New in Debezium? Multi-DB Support
  37. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot Demo

  39. Quarkus - Supersonic Subatomic Java: A Stack for Building Cloud-native Apps • Fast start-up, low memory consumption • Developer joy • Imperative and Reactive • Best-of-breed libraries • Run via HotSpot and GraalVM native binaries
  40. Kafka Streams: Streaming Queries on Kafka Topics • Java API for stateful stream processing • Rich set of operators • Scaling out to multiple JVMs • Interactive queries
  41. https://flic.kr/p/PFDvkY Public Domain, Angelo Brathot Demo

  42. Today’s Mission 🤩 Explore How to Build a… … Multi-site application with shared system-of-record database ✅ … With local, denormalized read views (CQRS) ✅ … Automatically kept in sync after writes ✅
  43. Learn More • Infinispan: @infinispan | https://infinispan.org/ • Debezium: @debezium | https://debezium.io/ • Demo: https://github.com/debezium/debezium-examples/ → distributed-caching • kcctl 🧸: https://github.com/kcctl/kcctl/
  44. Red Hat OpenShift Streams for Apache Kafka Trial: Try our Kafka service! ▸ 48 hour trial ▸ Free of charge ▸ OpenShift and Kubernetes AppDev examples. TRY IT TODAY! http://red.ht/TryKafka
  45. Thank You! Q & A 📧 gunnar@hibernate.org @gunnarmorling
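
The key idea from slide 32 (interleave snapshot events and events from the TX log) can be illustrated with a small Python sketch of the watermark window described in the DBLog paper. This is a simplified model, not Debezium's implementation: the function name `interleave`, the tuple-based event format, and the `"log"`/`"snapshot"` origin labels are all hypothetical. A chunk of rows is read from the table between a low and a high watermark; any key that also changes in the log inside that window is dropped from the chunk, so a stale snapshot row never overwrites a newer log event.

```python
def interleave(log_events, chunk_rows):
    """Merge one snapshot chunk into a stream of log events.

    log_events: list of ("low",), ("high",) or ("change", key, row), in log order.
    chunk_rows: dict key -> row, read from the table between the two watermarks.
    Returns a list of (key, row, origin) in emission order.
    """
    out = []
    buffer = dict(chunk_rows)  # snapshot rows still eligible for emission
    in_window = False

    for event in log_events:
        kind = event[0]
        if kind == "low":
            in_window = True
        elif kind == "high":
            in_window = False
            # Emit the snapshot rows that were not superseded inside the window.
            for key, row in buffer.items():
                out.append((key, row, "snapshot"))
            buffer.clear()
        else:
            _, key, row = event
            if in_window:
                # The log has a newer state for this key: discard the snapshot row.
                buffer.pop(key, None)
            out.append((key, row, "log"))
    return out


events = [("change", 1, "a1"), ("low",), ("change", 2, "b2"), ("high",), ("change", 3, "c2")]
result = interleave(events, {1: "a1", 2: "b1", 3: "c1"})
```

In this run the stale snapshot row `b1` for key 2 is never emitted, because the log delivered `b2` inside the watermark window.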