Keep Your Cache Always Fresh With Debezium

The saying goes that there are only two hard things in Computer Science: cache invalidation, and naming things. Well, it turns out the first one is actually solved ;)

Join us for this session to learn how to keep read views of your data in distributed caches close to your users, always kept in sync with your primary data stores via change data capture. You will learn how to:

- Implement a low-latency data pipeline for cache updates based on Debezium, Apache Kafka, and Infinispan
- Create denormalized views of your data using Kafka Streams and make them accessible via plain key look-ups from a cache cluster close by
- Propagate updates between cache clusters using cross-site replication
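The following plain-Java sketch roughly illustrates the first two points. It is not the demo's code and uses no Debezium, Kafka, or Infinispan APIs. Change events are reduced to an `op` code plus `before`/`after` row state, loosely following Debezium's event envelope (`c` create, `u` update, `d` delete, `r` snapshot read). The `Customer`, `Order`, `ChangeEvent`, `CustomerView`, and `ViewMaintainer` types are hypothetical, and a `ConcurrentHashMap` stands in for the cache. In the setup described here, Kafka Streams would maintain the join and write the result to Infinispan.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical row types for two source tables.
record Customer(long id, String name) {}
record Order(long id, long customerId, String item) {}

// Simplified stand-in for a Debezium change event envelope (assumption):
// op is "c" (create), "u" (update), "d" (delete) or "r" (snapshot read).
record ChangeEvent<T>(String op, T before, T after) {}

// Denormalized read view: a customer together with the items of all their orders.
record CustomerView(long customerId, String name, List<String> items) {}

class ViewMaintainer {
    // Local copies of the source tables (what a Kafka Streams table would hold).
    final Map<Long, Customer> customers = new ConcurrentHashMap<>();
    final Map<Long, Order> orders = new ConcurrentHashMap<>();
    // Stand-in for the cache serving plain key look-ups.
    final Map<Long, CustomerView> cache = new ConcurrentHashMap<>();

    void onCustomerEvent(ChangeEvent<Customer> e) {
        if (e.op().equals("d")) {
            customers.remove(e.before().id());
            cache.remove(e.before().id());
        } else { // "c", "u" and "r" are all upserts
            customers.put(e.after().id(), e.after());
            refresh(e.after().id());
        }
    }

    void onOrderEvent(ChangeEvent<Order> e) {
        if (e.op().equals("d")) {
            orders.remove(e.before().id());
            refresh(e.before().customerId());
        } else {
            orders.put(e.after().id(), e.after());
            // An update may move the order to another customer: refresh the old view too.
            if (e.before() != null) refresh(e.before().customerId());
            refresh(e.after().customerId());
        }
    }

    // Rebuilds the denormalized view of one customer and writes it to the cache.
    private void refresh(long customerId) {
        Customer c = customers.get(customerId);
        if (c == null) {
            cache.remove(customerId); // no parent row, so no view to serve
            return;
        }
        List<String> items = orders.values().stream()
                .filter(o -> o.customerId() == customerId)
                .sorted(Comparator.comparingLong(Order::id))
                .map(Order::item)
                .collect(Collectors.toList());
        cache.put(customerId, new CustomerView(customerId, c.name(), items));
    }
}
```

Each event updates the local copy of its table and then rebuilds only the affected customer's view, so readers get a ready-made, denormalized entry from a single key look-up. Rescanning all orders in `refresh` keeps the sketch short; a real stream processor would keep its join state indexed.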

We'll also touch on some advanced concepts, such as detecting and rejecting writes to the system of record which are derived from outdated cached state, and show in a demo how all the pieces come together, of course connected via Apache Kafka.
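The abstract does not spell out how stale writes are detected. One common approach, assumed for this sketch, is optimistic concurrency control: each cached view carries the version of the source row it was derived from, and the system of record only accepts a write whose expected version still matches. `VersionedStore` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical system of record that rejects writes based on outdated reads.
class VersionedStore {
    // A stored value plus its version; a cache entry would carry this version along.
    record Row(String value, long version) {}

    private final Map<String, Row> rows = new HashMap<>();

    synchronized Row read(String key) {
        return rows.get(key);
    }

    // Compare-and-set style write: succeeds only if the caller's expected version
    // equals the stored one (0 means "the row must not exist yet").
    synchronized boolean write(String key, String newValue, long expectedVersion) {
        Row current = rows.get(key);
        long currentVersion = current == null ? 0 : current.version();
        if (currentVersion != expectedVersion) {
            return false; // the write was derived from outdated state: reject it
        }
        rows.put(key, new Row(newValue, currentVersion + 1));
        return true;
    }
}
```

In a relational system of record, the same check is often expressed as `UPDATE ... SET ..., version = version + 1 WHERE id = ? AND version = ?`, treating an update count of zero as a conflict.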

Presented at Kafka Summit London 2022; Infinispan slides contributed by Katia Aresti.

Gunnar Morling

April 26, 2022

Transcript

  1. Keep Your Cache Always Fresh With Debezium! Gunnar Morling, Software Engineer, Red Hat, @gunnarmorling (title image © Nathalie, https://flic.kr/p/21Ghf2g, CC BY 2.0)
  2. #Debezium @gunnarmorling The Challenge: Multi-site Application With Shared Database • 90% read requests • Complex queries
  3. Today’s Mission 🤔 Explore How to Build a… … Multi-site application with shared system-of-record database … With local, denormalized read views (CQRS) … Automatically kept in sync after writes
  4. Gunnar Morling • Software engineer at Red Hat ◦ Debezium ◦ Quarkus • kcctl 🧸, JfrUnit, ModiTect, MapStruct • Spec Lead for Bean Validation 2.0 • Java Champion • @gunnarmorling
  5. Infinispan Deployment: Local Cache [diagram: App 1, App 2, and App 3, each with its own local data; entries shown: 1, Maria; 2, Jenny; 5, Peter; operations shown: Put 4, Mike; Remove 5, Peter; Get 2 returns null]
  6. Infinispan Deployment: Replicated Cache [diagram: App 1, App 2, and App 3 each hold 1, Maria and 2, Jenny]
  7. Infinispan Deployment: Replicated Cache [diagram: App 1, App 2, and App 3 each hold 1, Maria and 2, Jenny; operation shown: Put 3, Juan]
  8. Infinispan Deployment: Replicated Cache [diagram: App 1, App 2, and App 3 each hold 1, Maria; 2, Jenny; and 3, Juan]
  9. Infinispan Deployment: Distributed Cache (One Owner) [diagram: entries 1, Maria; 2, Jenny; and 3, Juan, each stored only once across App 1, App 2, and App 3]
  10. Infinispan Deployment: Distributed Cache (One Owner) [diagram: entries 1, Maria; 2, Jenny; and 3, Juan, each stored only once across App 1, App 2, and App 3; operations shown: two Get 2 requests, both returning Jenny]
  11. Infinispan Deployment: Distributed Cache (One Owner) [diagram: entries 1, Maria; 2, Jenny; and 3, Juan, each stored only once across App 1, App 2, and App 3]
  12. Infinispan Deployment: Distributed Cache (Two Owners) [diagram: entries 1, Maria; 2, Jenny; and 3, Juan, each stored on two of App 1, App 2, and App 3]
  13. Infinispan Deployment: Distributed Cache (Two Owners) [diagram: entries 1, Maria; 2, Jenny; and 3, Juan, each stored on two of the three apps; operation shown: Put 4, Will, sent to two nodes]
  14. Infinispan Deployment: Distributed Cache (Two Owners) [diagram: entries 1, Maria; 2, Jenny; 3, Juan; and 4, Will, each stored on two of App 1, App 2, and App 3]
  15. Infinispan Client/Server [diagram: Service 1, Service 2, and Service 3 access the data in an Infinispan cluster via the binary (Hot Rod) or REST protocol]
  16. Infinispan Cross-Site Replication [diagram: two sites, AWS (LON) and GCP (NYC), behind a load balancer; each site runs apps and a service with shared state backed by local data; the LON and NYC data are linked via RELAY2]
  17. Debezium in a Nutshell: Open-Source Change Data Capture • A CDC platform ◦ Based on transaction logs ◦ Snapshotting, filtering, etc. ◦ Outbox support ◦ Web-based UI • Fully open-source, very active community • Large production deployments
  18. Detour: What’s New in Debezium? • Community-hosted connectors • pg_logical_emit_message() • Multi-DB support (SQL Server) • Debezium Server sinks • MongoDB change streams support • Debezium UI • etc.
  19. Detour: What’s New in Debezium? Incremental Snapshotting, addressing limitations of the initial snapshot: • Can’t update filter list • Long-running snapshots can’t be paused/resumed • Can’t stream changes until snapshot completed • Can’t re-snapshot selected tables
  20. Detour: What’s New in Debezium? Incremental Snapshotting • Based on “DBLog: A Watermark Based Change-Data-Capture Framework” by Andreas Andreakis and Ioannis Papapanagiotou (https://arxiv.org/pdf/2010.12597v1.pdf) • Key idea: interleave snapshot events and events from the TX log
  21. Detour: What’s New in Debezium? Support for pg_logical_emit_message() • Directly write arbitrary messages to the WAL • No need for an outbox table
  22. Quarkus: Supersonic Subatomic Java, A Stack for Building Cloud-native Apps • Fast start-up, low memory consumption • Developer joy • Imperative and Reactive • Best-of-breed libraries • Runs on HotSpot or as GraalVM native binaries
  23. Kafka Streams: Streaming Queries on Kafka Topics • Java API for stateful stream processing • Rich set of operators • Scaling out to multiple JVMs • Interactive queries
  24. Today’s Mission 🤩 Explore How to Build a… … Multi-site application with shared system-of-record database ✅ … With local, denormalized read views (CQRS) ✅ … Automatically kept in sync after writes ✅
  25. Learn More • Infinispan: @infinispan | https://infinispan.org/ • Debezium: @debezium | https://debezium.io/ • Demo: https://github.com/debezium/debezium-examples/ → distributed-caching • kcctl 🧸: https://github.com/kcctl/kcctl/
  26. Red Hat OpenShift Streams for Apache Kafka: Try our Kafka service! ▸ 48-hour trial ▸ Free of charge ▸ OpenShift and Kubernetes AppDev examples. TRY IT TODAY! http://red.ht/TryKafka
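Slides 5 to 14 contrast Infinispan's local, replicated, and distributed modes. The simplified model below, built around a hypothetical `DistributedCacheModel` class, shows the idea behind the number of owners: every key is placed on `numOwners` nodes, so a put is written to each owner and a get can be served by any of them. Placement here is plain modulo hashing, an assumption made only for illustration. Infinispan itself maps keys to segments with a consistent hash and takes the owner count from the cache's `owners` setting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a distributed cache: each key lives on numOwners of the nodes.
// Modulo placement is an illustrative assumption, not Infinispan's consistent hash.
class DistributedCacheModel {
    private final List<Map<Integer, String>> nodes = new ArrayList<>();
    private final int numOwners;

    DistributedCacheModel(int nodeCount, int numOwners) {
        if (numOwners < 1 || numOwners > nodeCount) {
            throw new IllegalArgumentException("need 1 <= numOwners <= nodeCount");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
        this.numOwners = numOwners;
    }

    // Owners: the primary node picked by hashing, plus the next numOwners - 1 nodes.
    List<Integer> owners(int key) {
        int primary = Math.floorMod(Integer.hashCode(key), nodes.size());
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numOwners; i++) {
            result.add((primary + i) % nodes.size());
        }
        return result;
    }

    // A put is stored on every owner.
    void put(int key, String value) {
        for (int n : owners(key)) {
            nodes.get(n).put(key, value);
        }
    }

    // A get is answered by the first owner that holds the key (null if absent).
    String get(int key) {
        for (int n : owners(key)) {
            String v = nodes.get(n).get(key);
            if (v != null) return v;
        }
        return null;
    }

    // Number of entries physically stored on one node.
    int entriesOn(int node) {
        return nodes.get(node).size();
    }
}
```

Setting `numOwners` equal to the node count makes every node hold every entry, as in the replicated cache of slides 6 to 8, while `numOwners = 1` corresponds to slides 9 to 11. With two owners, an entry survives the loss of any single node, at the cost of storing it twice.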
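Slides 19 and 20 name the key idea behind incremental snapshotting, taken from the DBLog paper: interleave snapshot events with events from the transaction log. The following single-chunk sketch of that watermark technique assumes simplified in-memory log entries and string-valued rows; `WatermarkChunk` and `LogEntry` are hypothetical names, and this is not Debezium's implementation. A chunk row is dropped when the log contains a change for the same key between the low and high watermark, and the remaining chunk rows are emitted once the high watermark is reached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of DBLog-style watermark reconciliation for a single snapshot chunk.
class WatermarkChunk {
    // A transaction log entry: a LOW/HIGH watermark marker or a change to a row key.
    record LogEntry(String kind, int key, String value) {
        static LogEntry low() { return new LogEntry("LOW", -1, null); }
        static LogEntry high() { return new LogEntry("HIGH", -1, null); }
        static LogEntry change(int key, String value) { return new LogEntry("CHANGE", key, value); }
    }

    // Merges a chunk (rows selected between the watermarks) into the log stream
    // and returns the events in the order they would be emitted.
    static List<String> process(List<LogEntry> log, Map<Integer, String> chunk) {
        Map<Integer, String> pending = new LinkedHashMap<>(chunk);
        List<String> emitted = new ArrayList<>();
        boolean inWindow = false;
        for (LogEntry e : log) {
            switch (e.kind()) {
                case "LOW" -> inWindow = true;
                case "HIGH" -> {
                    // Chunk rows not superseded inside the window become snapshot
                    // ("read") events, emitted right at the high watermark.
                    pending.forEach((k, v) -> emitted.add("read " + k + "=" + v));
                    pending.clear();
                    inWindow = false;
                }
                default -> {
                    // The log already carries state for this key that is at least as
                    // new as the chunk row, so the chunk copy is dropped.
                    if (inWindow) pending.remove(e.key());
                    emitted.add("change " + e.key() + "=" + e.value());
                }
            }
        }
        return emitted;
    }
}
```

Dropping the chunk copy avoids overwriting newer state with an older snapshot row, while log events keep flowing during the snapshot. Debezium's implementation additionally has to split tables into chunks by primary key, persist snapshot progress, and handle multiple tables, none of which is modeled here.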