SG Kafka Meetup Jan 2025 : Realtime Audit Framework : Leveraging CDC & Kafka Connect

Singapore Kafka meetup talk given on 16 January 2025, "Realtime Audit Framework: Leveraging CDC and Kafka Connect", by Zabeer Farook and Lahiru Samarawickrama.

Zabeer Farook

January 17, 2025

Transcript

  1. About Us

    Zabeer Farook, Technical Architect, Credit Agricole CIB
    • Confluent Community Catalyst
    • Passionate about stream data processing, event-driven architecture, Cloud & DevOps
    • Loves travelling and exploring places

    Lahiru Samarawickrama, Technical Lead, Credit Agricole CIB
    • Passionate about distributed computing and big data
    • Enjoys travelling and hanging out with friends
  2. Today's Story Line

    • Why Audit Trail
    • Change Data Capture
    • Debezium CDC
    • CDC Based Audit Trail
    • Q&A
    • Debezium & Kafka Connect Demo
  3. What is CDC (Change Data Capture)

    Change Data Capture is a software process that tracks and captures the changes made in a database.
  4. Log Based CDC

    • Captures event streams for any database changes
    • Change events are typically sourced from the DB transaction logs
    • Events are then streamed/shipped to the target, typically via messaging infrastructure
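To make the idea of a change event concrete, here is a minimal sketch of flattening one event into an audit row. It assumes the event payload follows the Debezium-style envelope (`before`, `after`, `source`, `op`, `ts_ms`) with no surrounding schema wrapper; the sample values and the `to_audit_record` helper are hypothetical and not taken from the talk.

```python
import json

# Hypothetical change event shaped like a Debezium envelope; values are invented.
RAW_EVENT = json.dumps({
    "before": {"id": 7, "status": "PENDING"},
    "after": {"id": 7, "status": "APPROVED"},
    "source": {"db": "trades", "table": "orders"},
    "op": "u",
    "ts_ms": 1737000000000,
})

# Operation codes used in the envelope: create, update, delete, snapshot read.
OPERATIONS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT_READ"}


def to_audit_record(raw: str) -> dict:
    """Flatten one change event into a row for an audit table (illustrative)."""
    event = json.loads(raw)
    before = event.get("before") or {}  # None for inserts
    after = event.get("after") or {}    # None for deletes
    changed = sorted(
        col for col in set(before) | set(after)
        if before.get(col) != after.get(col)
    )
    return {
        "table": event["source"]["table"],
        "operation": OPERATIONS[event["op"]],
        "changed_columns": changed,
        "old_values": {c: before.get(c) for c in changed},
        "new_values": {c: after.get(c) for c in changed},
        "changed_at_ms": event["ts_ms"],
    }


record = to_audit_record(RAW_EVENT)
print(record)
```

Because the log carries both the old and the new row image, the audit row can be built without querying the source database again.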
  5. Other Types of CDC

    Trigger Based: employ a DB trigger to capture the changes
    • Dedicated trigger on each table to monitor insert/update/delete
    • Tightly coupled with the main transactional updates
    • Interferes with the transactional updates and may cause performance issues

    Query Based: periodic polling for changes in a table
    • Uses a created / modified timestamp to query recently updated records
    • Neither efficient nor reliably correct
    • Interferes with the transactional updates and may cause lock contention
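The correctness problem with query-based CDC can be shown with a small sketch. It uses an in-memory SQLite table and a hypothetical `poll_changes` helper that tracks a high-water mark on a `modified_at` column; the table, column names, and values are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, modified_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, "NEW", 100), (2, "NEW", 100)]
)


def poll_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at, id",
        (last_seen,),
    ).fetchall()
    new_mark = max((r[2] for r in rows), default=last_seen)
    return rows, new_mark


first_poll, mark = poll_changes(conn, 0)

# Between two polls: row 1 is updated twice and row 2 is deleted.
conn.execute("UPDATE orders SET status = 'PAID', modified_at = 110 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'SHIPPED', modified_at = 120 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

second_poll, mark = poll_changes(conn, mark)
print(second_poll)
```

The second poll sees only the final state of row 1. The intermediate `PAID` state and the deletion of row 2 never appear, which is a gap an audit trail cannot tolerate and which log-based CDC avoids.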
  6. CDC Use Cases

    • Database Migration
    • Audit / History Event Capture
    • Populating Analytical Data Store
    • Database Replication
    • Update Search Index
    • Cache Update / Invalidation
    • Update CQRS Read Model
    • Outbox Microservices Pattern
  7. Debezium CDC

    • CDC platform based on transaction logs
    • Snapshotting support
    • Fully open source and widely used
    • Active community support
    • Supports connectors for all major databases
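As a rough illustration of how a Debezium connector is registered with Kafka Connect, the sketch below builds a registration payload. It assumes a PostgreSQL source and Debezium 2.x property names (`topic.prefix` replaced `database.server.name` from earlier versions); the slides do not say which database was used, and the connector name, host, credentials, and table are placeholders.

```python
import json

# Sketch of a connector registration payload; all values are placeholders.
connector = {
    "name": "audit-orders-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",            # logical decoding plugin
        "database.hostname": "db.example.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "change-me",
        "database.dbname": "trades",
        "topic.prefix": "audit",              # prefix for the change topics
        "table.include.list": "public.orders",
        "snapshot.mode": "initial",           # snapshot once, then stream the log
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

A payload like this would typically be sent with an HTTP POST to the `/connectors` endpoint of a Kafka Connect worker.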
  8. Best Practices

    • Use a correlation ID to identify a group of related updates
    • Use a de-duplication ID / idempotency logic to avoid duplicates
    • Use an enrichment service to add context to audit data or to format it
    • Maintain configuration to include / exclude columns from the audit
    • Design a generic table for holding audit data; a NoSQL DB is an option too
    • Make use of a schema registry
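Several of these practices can be sketched together. The example below is a minimal in-memory stand-in for a generic audit table. It assumes each event carries a `correlation_id` and a `position` (for example a log position) that can serve as an idempotency key. The field names, the `AuditStore` class, and the exclusion list are all hypothetical.

```python
from collections import defaultdict

# Hypothetical audit configuration: columns never copied into audit rows.
EXCLUDED_COLUMNS = {"orders": {"password_hash", "internal_notes"}}


class AuditStore:
    """In-memory stand-in for a generic audit table."""

    def __init__(self):
        self.rows = {}  # idempotency key -> audit row

    def save(self, event: dict) -> bool:
        """Store the event once; return False if it was already stored."""
        key = (event["table"], event["pk"], event["position"])
        if key in self.rows:
            return False  # redelivered event, ignore it
        hidden = EXCLUDED_COLUMNS.get(event["table"], set())
        self.rows[key] = {
            "table": event["table"],
            "pk": event["pk"],
            "correlation_id": event["correlation_id"],
            "values": {k: v for k, v in event["values"].items() if k not in hidden},
        }
        return True

    def by_correlation_id(self) -> dict:
        """Group stored rows by the request that caused them."""
        groups = defaultdict(list)
        for row in self.rows.values():
            groups[row["correlation_id"]].append(row)
        return dict(groups)


store = AuditStore()
event_a = {"table": "orders", "pk": 7, "position": 1001, "correlation_id": "req-42",
           "values": {"status": "APPROVED", "internal_notes": "vip"}}
event_b = {"table": "orders", "pk": 8, "position": 1002, "correlation_id": "req-42",
           "values": {"status": "APPROVED"}}

first = store.save(event_a)
second = store.save(event_a)  # same event delivered again after a retry
third = store.save(event_b)
groups = store.by_correlation_id()
```

Kafka Connect source connectors deliver at least once by default, so the consumer side has to tolerate replays. Keying the audit row on a stable position makes the write idempotent.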
  9. Challenges

    • Managing schema changes in source tables
    • Handling snapshot events
    • Handling huge data volumes combined with high network latency (tune offset.flush.interval.ms)
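One possible way to handle snapshot events is to separate them from real changes before they reach the audit trail. The sketch below assumes Debezium-style events, where snapshot reads carry `op` set to `r` and a `snapshot` marker in the `source` block. Treating snapshot rows as a baseline rather than as audited changes is an assumption here, not something the slides prescribe.

```python
def is_snapshot_event(event: dict) -> bool:
    """True for rows read during a snapshot rather than from the transaction log."""
    if event.get("op") == "r":
        return True
    snapshot = (event.get("source") or {}).get("snapshot")
    # A missing marker or "false" means the event came from the log.
    return str(snapshot).lower() not in ("none", "false")


# Invented sample events: two snapshot reads followed by two real changes.
events = [
    {"op": "r", "source": {"snapshot": "true"}, "after": {"id": 1}},
    {"op": "r", "source": {"snapshot": "last"}, "after": {"id": 2}},
    {"op": "u", "source": {"snapshot": "false"}, "after": {"id": 1}},
    {"op": "d", "source": {}, "after": None},
]

baseline = [e for e in events if is_snapshot_event(e)]
audit_trail = [e for e in events if not is_snapshot_event(e)]
print(len(baseline), len(audit_trail))
```

Keeping the two streams apart means the initial load of existing rows is not recorded as if users had just created them.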