Slide 1

Slide 1 text

Realtime Audit Framework: Leveraging CDC & Kafka Connect

Slide 2

Slide 2 text

About Us

Zabeer Farook
Technical Architect, Credit Agricole CIB
- Confluent Community Catalyst
- Passionate about stream data processing, event-driven architecture, Cloud & DevOps
- Love travelling & exploring places
Let’s connect

Lahiru Samarawickrama
Technical Lead, Credit Agricole CIB
- Passionate about distributed computing & big data
- Enjoy travelling & hanging out with friends
Let’s connect

Slide 3

Slide 3 text

TODAY’S STORY LINE
● Why Audit Trail
● Change Data Capture
● Debezium CDC
● CDC Based Audit Trail
● Debezium & Kafka Connect Demo
● Q&A

Slide 4

Slide 4 text

Why Audit Trail
● Compliance
● Security
● Regulations
● Transparency

Slide 5

Slide 5 text

What is CDC (Change Data Capture)
Change Data Capture is a software process that tracks and captures the changes made to data in a database.

Slide 6

Slide 6 text

Log Based CDC
● Captures event streams for any database changes
● Change events are typically sourced from the database transaction logs
● Events are then streamed / shipped to the target, typically via messaging infrastructure

Slide 7

Slide 7 text

Other Types Of CDC

Trigger Based: employ a DB trigger to capture the changes
● Dedicated trigger on each table to monitor insert/update/delete
● Tightly coupled with the main transactional updates
● Interferes with the transactional updates and may cause performance issues

Query Based: periodic polling for changes in a table
● Uses a created / modified timestamp to query recently updated records
● Neither efficient nor fully correct: deletes and intermediate updates between two polls can be missed
● Interferes with the transactional updates and may cause lock contention
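The correctness gap of query-based CDC can be illustrated with a minimal sketch. It uses an in-memory SQLite database; the `orders` table, its columns and the integer `modified_at` watermark are assumptions made up for this example, not part of the talk.

```python
import sqlite3


def poll_changes(conn, since):
    """Query-based CDC: return rows modified after the last poll watermark."""
    return conn.execute(
        "SELECT id, status, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY id",
        (since,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, modified_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, "NEW", 10), (2, "NEW", 10)]
)

# First poll picks up both inserts.
first_poll = poll_changes(conn, since=0)

# Between two polls: order 1 changes twice and order 2 is deleted.
conn.execute("UPDATE orders SET status = 'PAID', modified_at = 11 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'SHIPPED', modified_at = 12 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

# Second poll only sees the final state of order 1.
second_poll = poll_changes(conn, since=10)
```

In this sketch the second poll returns only the latest state of order 1; the intermediate `PAID` state and the deletion of order 2 never reach the consumer, which is the kind of loss that log-based CDC avoids.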

Slide 8

Slide 8 text

CDC Use Cases
● Database Migration
● Audit / History Event Capture
● Populating Analytical Data Store
● Database Replication
● Update Search Index
● Cache Update / Invalidation
● Update CQRS Read Model
● Outbox Microservices Pattern

Slide 9

Slide 9 text

Some Of The Available CDC Tools

Slide 10

Slide 10 text

Debezium CDC
● CDC platform based on transaction logs
● Snapshotting support
● Fully open source and widely used
● Active community support
● Connectors for major databases, including MySQL, PostgreSQL, SQL Server, Oracle, MongoDB and Db2

Slide 11

Slide 11 text

Sample Change Data Feed
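As a rough illustration of what such a feed carries, the sketch below parses one Debezium-style change event and summarises it. The envelope fields (`before`, `after`, `source`, `op`, `ts_ms`) follow the Debezium event format; the table, columns and values are invented for this example, and only the event payload is shown (with the JSON converter and schemas enabled, it would be wrapped in a `schema` / `payload` structure).

```python
import json

# Illustrative change event payload; all values are made up for this sketch.
SAMPLE_EVENT = json.loads("""
{
  "before": {"id": 1001, "status": "PENDING", "amount": 250},
  "after":  {"id": 1001, "status": "APPROVED", "amount": 250},
  "source": {"connector": "postgresql", "db": "demo", "schema": "public",
             "table": "payments", "txId": 556, "lsn": 24023128,
             "snapshot": "false"},
  "op": "u",
  "ts_ms": 1700000000000
}
""")

# Debezium operation codes: create, update, delete, snapshot read.
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT_READ"}


def describe(event):
    """Summarise one change event: operation, table and changed columns."""
    before = event.get("before") or {}   # null for inserts
    after = event.get("after") or {}     # null for deletes
    changed = {
        col: (before.get(col), after.get(col))
        for col in sorted(set(before) | set(after))
        if before.get(col) != after.get(col)
    }
    source = event["source"]
    return {
        "operation": OP_NAMES.get(event["op"], event["op"]),
        "table": f'{source["schema"]}.{source["table"]}',
        "changed": changed,
    }


summary = describe(SAMPLE_EVENT)
```

Comparing `before` and `after` is what makes the feed useful for auditing: each event already carries both the old and the new state of the row.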

Slide 12

Slide 12 text

Debezium Deployment Options
Most popular option: with Kafka Connect
Source: debezium.io
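With the Kafka Connect option, a Debezium connector is typically registered by posting its configuration to the Connect REST API. The sketch below only builds that request. It assumes a PostgreSQL source and Debezium 2.x property names (`topic.prefix`; older releases used `database.server.name`); the host names, credentials, connector name and table list are hypothetical and are not taken from the demo repository.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, config):
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical connection details; replace with the real environment values.
AUDIT_SOURCE_CONFIG = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "audit_user",
    "database.password": "change-me",
    "database.dbname": "demo",
    "topic.prefix": "demo",
    "table.include.list": "public.payments",
}

request = build_connector_request(
    "http://localhost:8083", "audit-source", AUDIT_SOURCE_CONFIG
)
# To register against a running Connect worker:
# urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes the configuration easy to review and test before it is applied to a live worker.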

Slide 13

Slide 13 text

Debezium Deployment Options
Debezium Server or Embedded Engine
Source: debezium.io

Slide 14

Slide 14 text

CDC Based Audit Trail Framework

Slide 15

Slide 15 text

Demo https://github.com/lmsamarawickrama/cdc-audit-demo

Slide 16

Slide 16 text

Best Practices
● Use a correlation id to identify a group of related updates
● Use a de-duplication id / idempotency logic to avoid duplicates
● Make use of an enrichment service to enrich audit data with additional context or to format data
● Maintain configuration to include / exclude columns from audit
● Design a generic table for holding audit data; a NoSQL DB is an option too
● Make use of a schema registry
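Several of these practices can be sketched together. The example below turns a change event into a generic audit record, drops excluded columns, derives a de-duplication id and ignores redelivered events. It is a minimal sketch, not the framework from the talk: the exclusion config, the use of the source transaction id as a fallback correlation id, the id derived from PostgreSQL-style `txId` / `lsn` fields and the in-memory store are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical per-table configuration of columns to leave out of the audit.
EXCLUDED_COLUMNS = {"payments": {"card_number"}}


def dedup_id(event):
    """Stable id from the source position, so a redelivered event maps to the same key."""
    src = event["source"]
    raw = json.dumps(
        [src.get("db"), src.get("schema"), src.get("table"),
         src.get("txId"), src.get("lsn"), event["op"]]
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def to_audit_record(event, correlation_id=None):
    """Flatten a change event into one generic audit row."""
    src = event["source"]
    excluded = EXCLUDED_COLUMNS.get(src["table"], set())
    before = event.get("before") or {}
    after = event.get("after") or {}
    changes = {
        col: {"old": before.get(col), "new": after.get(col)}
        for col in sorted(set(before) | set(after))
        if col not in excluded and before.get(col) != after.get(col)
    }
    if correlation_id is None:
        # Assumption: fall back to the transaction id to group related updates.
        correlation_id = str(src.get("txId"))
    return {
        "audit_id": dedup_id(event),
        "correlation_id": correlation_id,
        "table": src["table"],
        "operation": event["op"],
        "changed_at_ms": event["ts_ms"],
        "changes": changes,
    }


class AuditStore:
    """In-memory stand-in for the generic audit table; skips duplicate ids."""

    def __init__(self):
        self.rows = {}

    def save(self, record):
        if record["audit_id"] in self.rows:
            return False  # already stored, e.g. after an at-least-once redelivery
        self.rows[record["audit_id"]] = record
        return True


event = {
    "before": {"id": 7, "status": "PENDING", "card_number": "1111"},
    "after": {"id": 7, "status": "APPROVED", "card_number": "2222"},
    "source": {"db": "demo", "schema": "public", "table": "payments",
               "txId": 556, "lsn": 24023128},
    "op": "u",
    "ts_ms": 1700000000000,
}

store = AuditStore()
record = to_audit_record(event)
first_save = store.save(record)
second_save = store.save(to_audit_record(event))  # simulated duplicate delivery
```

In a real deployment the same idea would usually be expressed as a unique key on the audit table, so that the idempotency check is enforced by the database rather than by application memory.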

Slide 17

Slide 17 text

Challenges
● Managing schema changes in source tables
● Handling snapshot events
● Handling huge volumes of data over high-latency networks (tune the Kafka Connect worker setting offset.flush.interval.ms)
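One way to approach the snapshot challenge is to separate snapshot reads from live changes before they reach the audit table, since a snapshot row describes existing state rather than a user action. The sketch below relies on the Debezium conventions that snapshot reads carry `op` = `r` and that `source.snapshot` is something other than `false` during a snapshot; the routing labels `baseline` and `audit` are hypothetical.

```python
def is_snapshot_event(event):
    """Return True for events emitted while Debezium is snapshotting a table."""
    marker = (event.get("source") or {}).get("snapshot", "false")
    marker = str(marker).lower()
    return event.get("op") == "r" or marker not in ("false", "none", "")


def route(event):
    """Hypothetical routing: snapshot rows form a baseline, live changes are audited."""
    return "baseline" if is_snapshot_event(event) else "audit"


snapshot_row = {"op": "r", "source": {"snapshot": "true"}, "after": {"id": 1}}
live_update = {"op": "u", "source": {"snapshot": "false"}, "after": {"id": 1}}
```

Whether snapshot rows should be stored as a baseline or discarded depends on the audit requirements; the sketch only shows where that decision can be made.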

Slide 18

Slide 18 text

Q&A
Thanks! Any questions?