Introducing Change Data Capture with Debezium and Apache Kafka

Aykut Bulgu
November 26, 2020

Istanbul JUG, Nov 26th 2020

Follow me on Twitter (@systemcraftsman) or subscribe at https://www.systemcraftsman.com/join/ to get updates from me.

Transcript

  1. @systemcraftsman
    Introducing Change Data
    Capture with Debezium and
    Apache Kafka
    Aykut M. Bulgu
    Technology Consultant | Software Architect
    [email protected]

  2. @systemcraftsman
    Me as Code
    # oc apply -f aykutbulgu.yaml
    apiVersion: redhat/v2.5
    kind: Middleware & AppDev Consultant
    metadata:
      name: Aykut Bulgu
      namespace: Red Hat Consulting - CEMEA
      annotations:
        twitter: @systemcraftsman
        email: [email protected]
        organizer: Software Craftsmanship Turkey
        founder: System Craftsman
      labels:
        married: yes
        children: daughter
        interests: tech (cloud & middleware), aikido, 80s
    spec:
      replicas: 2
      containers:
      - image: aykut:latest

  3. @systemcraftsman
    Agenda
    The Issue with Dual Writes
      What's the problem?
      Change data capture to the rescue!
    CDC Use Cases & Patterns
      Replication
      Audit Logs
      Microservices
    Practical Matters
      Deployment Topologies
      Running on Kubernetes
      Single Message Transforms

  4. @systemcraftsman
    Common Problem
    Updating multiple resources
    Order Service
    Database

  5. @systemcraftsman
    Common Problem
    Updating multiple resources
    Order Service
    Database
    Cache

  6. @systemcraftsman
    Common Problem
    Updating multiple resources
    Order Service
    Database
    Cache
    Search Index

  8. @systemcraftsman
    ‘Friends Don't Let Friends Do Dual Writes!’

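To make the dual-write problem concrete, here is a minimal Python sketch. The `FlakyCache` class and `place_order` function are hypothetical stand-ins invented for illustration, not part of the talk. The service writes to a database and then to a cache with no shared transaction, so a failure in between leaves the two out of sync.

```python
class FlakyCache:
    """Hypothetical cache client whose writes can fail (e.g. a network error)."""

    def __init__(self, fail=False):
        self.data = {}
        self.fail = fail

    def put(self, key, value):
        if self.fail:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


def place_order(database, cache, order_id, order):
    """Dual write: two independent writes with no shared transaction."""
    database[order_id] = order   # write 1 succeeds
    cache.put(order_id, order)   # write 2 may fail after write 1 is done


database = {}                    # a dict standing in for the real database
cache = FlakyCache(fail=True)

try:
    place_order(database, cache, 1, {"item": "book", "qty": 2})
except ConnectionError:
    pass  # the order is already in the database; nothing rolls it back

in_database = 1 in database
in_cache = 1 in cache.data
print(in_database, in_cache)
```

The order ends up in the database but not in the cache, which is the kind of inconsistency that streaming changes from the database is meant to avoid.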

  9. @systemcraftsman
    As a Solution
    Stream change events from the database
    Order Service

  10. @systemcraftsman
    As a Solution
    Stream change events from the database
    Order Service
    Change Data Capture
    C | C | U | C | U | U | D
    C - Create
    U - Update
    D - Delete

  11. @systemcraftsman
    As a Solution
    Stream change events from the database
    Order Service
    Change Data Capture
    C | C | U | C | U | U | D
    C - Create
    U - Update
    D - Delete

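A stream of create (c), update (u) and delete (d) events is enough for a consumer to rebuild the current state of a table. The sketch below is a simplified illustration in Python. The event layout (`op`, `key`, `after`) is an assumption made for the example, not the exact Debezium format.

```python
def apply_change(replica, event):
    """Apply one change event to an in-memory replica (a plain dict)."""
    op, key = event["op"], event["key"]
    if op in ("c", "u"):
        replica[key] = event["after"]   # create and update both upsert
    elif op == "d":
        replica.pop(key, None)          # delete removes the row
    else:
        raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical stream of change events, in commit order
events = [
    {"op": "c", "key": 1, "after": {"status": "NEW"}},
    {"op": "c", "key": 2, "after": {"status": "NEW"}},
    {"op": "u", "key": 1, "after": {"status": "PAID"}},
    {"op": "d", "key": 2, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

The same loop could feed a cache or a search index instead of a dict; the point is that every downstream copy is derived from a single ordered stream.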

  12. @systemcraftsman
    Change Data Capture with Debezium
    Debezium is an open source distributed platform for change data capture

  13. @systemcraftsman
    Debezium
    Change Data Capture Platform
    CDC for multiple databases
    Based on transaction logs
    Snapshotting, filtering, etc.
    Fully open-source, very active community
    Latest version: 1.3
    Production deployments at multiple companies (e.g. WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar)

  14. @systemcraftsman
    Red Hat CDC
    Supported Databases
    GA Connectors:
    MySQL
    PostgreSQL
    SQL Server
    MongoDB
    Developer Preview:
    Db2

  15. @systemcraftsman
    Advantages of Log-based CDC
    Tailing the Transaction Logs
    All data changes are captured
    No polling delay or overhead
    Transparent to writing applications and models
    Can capture deletes
    Can capture old record state and further metadata
    https://debezium.io/blog/2018/07/19/advantages-of-log-based-change-data-capture/

  16. @systemcraftsman
    Log vs Query based CDC
                                                      Query-based   Log-based
    All data changes are captured                          -            ✓
    No polling delay or overhead                           -            ✓
    Transparent to writing applications and models         -            ✓
    Can capture deletes and old record state               -            ✓
    Simple installation/configuration                      ✓            -

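The difference between the two approaches can be simulated in a few lines of Python. This is a toy model under stated assumptions: `table` stands in for a database table with an `updated_at` column, `log` for the transaction log, and `poll` for a query-based capture that selects rows changed since the last poll. None of these names come from the talk.

```python
table = {}   # current table contents, keyed by primary key
log = []     # stand-in for the database transaction log


def write(op, key, row=None, ts=0):
    """Perform a write and record it in the log, as a database would."""
    if op == "d":
        del table[key]
    else:
        table[key] = {**row, "updated_at": ts}
    log.append((op, key))


def poll(since):
    """Query-based capture: keys of rows whose updated_at is newer than `since`."""
    return sorted(k for k, row in table.items() if row["updated_at"] > since)


write("c", 1, {"status": "NEW"}, ts=1)
write("u", 1, {"status": "PAID"}, ts=2)   # overwrites the NEW state before the poll
write("c", 2, {"status": "NEW"}, ts=3)
write("d", 2)                             # row is gone before the poll runs

polled = poll(since=0)
print(polled, len(log))
```

The poll sees only row 1 in its latest state, while the log holds all four changes, including the delete and the intermediate state.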

  17. @systemcraftsman
    Debezium
    Change Event Structure
    ● Key: PK of table
    ● Value: Describing the change event
    ○ Before state,
    ○ After state,
    ○ Metadata info
    ● Serialization formats:
    ○ JSON
    ○ Avro
    ● The CloudEvents format can be used too

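As a rough illustration of the structure above, this Python sketch shows what a change event for an update might look like once deserialized from JSON. The field names `before`, `after`, `source`, `op` and `ts_ms` follow the Debezium event envelope. The table, column names and values are made up for the example, and schema information is assumed to be disabled.

```python
import json

# Key: the primary key of the changed row (hypothetical "orders" table)
key = {"id": 1001}

# Value: the change itself, with before/after state and metadata
value = {
    "before": {"id": 1001, "status": "NEW"},
    "after": {"id": 1001, "status": "PAID"},
    "source": {"connector": "mysql", "db": "inventory", "table": "orders"},
    "op": "u",               # c = create, u = update, d = delete
    "ts_ms": 1606392000000,  # illustrative timestamp
}


def changed_columns(value):
    """Return the names of columns whose value differs between before and after."""
    before = value["before"] or {}
    after = value["after"] or {}
    return sorted(c for c in before.keys() | after.keys()
                  if before.get(c) != after.get(c))


print(json.dumps(key), changed_columns(value))
```

Having both states in one event is what lets a consumer work out what changed without querying the database again.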

  18. @systemcraftsman
    Single Message Transformations
    Image source: “Penknife, Swiss Army Knife” by Emilian Robert Vicol, used under CC BY 2.0
    Lightweight single message inline transformation
    Format conversions
    Time/date fields
    Extract new row state
    Aggregate sharded tables to single topic
    Keep compatibility with existing consumers
    Transformation does not interact with external systems
    Modify events before storing in Kafka

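Real single message transforms are Java classes configured in Kafka Connect (Debezium ships `ExtractNewRecordState` and `ByLogicalTableRouter`, for instance). The Python sketch below only imitates two of the ideas listed above, extracting the new row state and routing sharded tables to one topic. The function names, the topic name and the `_shard_N` naming scheme are assumptions made for illustration.

```python
import re


def extract_new_row_state(value):
    """Keep only the 'after' state, dropping the change event envelope."""
    return value["after"]


# Assumed shard naming scheme: <table>_shard_<number>
SHARD_SUFFIX = re.compile(r"^(?P<base>.+)_shard_\d+$")


def route_to_logical_topic(topic):
    """Map topics of sharded tables to a single logical topic."""
    match = SHARD_SUFFIX.match(topic)
    return match.group("base") if match else topic


# One record as (topic, value); the topic name is hypothetical
topic = "dbserver1.inventory.orders_shard_2"
value = {"before": None, "after": {"id": 7, "status": "NEW"}, "op": "c"}

transformed = (route_to_logical_topic(topic), extract_new_row_state(value))
print(transformed)
```

Both functions look only at the single record they are given and call no external system, which matches the constraints listed for transformations.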

  19. @systemcraftsman
    Change Data Capture Usages & Patterns

  20. @systemcraftsman
    Data Replication
    Zero-Code Streaming Pipelines
    [Diagram: MySQL and PostgreSQL databases; Apache Kafka topics]

  21. @systemcraftsman
    Data Replication
    Zero-Code Streaming Pipelines
    [Diagram: MySQL and PostgreSQL databases; Apache Kafka topics; two Kafka Connect clusters]

  22. @systemcraftsman
    Data Replication
    Zero-Code Streaming Pipelines
    [Diagram: MySQL and PostgreSQL -> Kafka Connect (DBZ MySQL, DBZ PG) -> Apache Kafka topics; second Kafka Connect cluster]

  23. @systemcraftsman
    Data Replication
    Zero-Code Streaming Pipelines
    [Diagram: MySQL and PostgreSQL -> Kafka Connect (DBZ MySQL, DBZ PG) -> Apache Kafka topics -> Kafka Connect (ES Connector) -> Elasticsearch]

  24. @systemcraftsman
    Data Replication
    Zero-Code Streaming Pipelines
    [Diagram: MySQL and PostgreSQL -> Kafka Connect (DBZ MySQL, DBZ PG) -> Apache Kafka topics -> Kafka Connect (ES Connector, SQL Connector) -> Elasticsearch and Data Warehouse]

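"Zero-code" here means the pipeline is set up through connector configuration rather than application code. As a sketch, the Python snippet below builds the JSON body one might POST to the Kafka Connect REST API (`/connectors`) to register a Debezium MySQL source connector. Property names follow the Debezium 1.x MySQL connector documentation (later versions renamed some of them). Every host name, credential and name is a placeholder assumption.

```python
import json

# Hypothetical connector registration; all values are placeholders
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}

# The serialized body that would be sent with Content-Type: application/json
request_body = json.dumps(connector, indent=2)
print(request_body)
```

A sink connector (for Elasticsearch or a data warehouse) would be registered the same way with its own connector class and settings.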

  25. @systemcraftsman
    Auditing
    CDC and a bit of Kafka Streams
    [Diagram: CRM Service -> Source DB -> DBZ (Kafka Connect) -> Apache Kafka]
    Source: http://bit.ly/debezium-auditlogs

  26. @systemcraftsman
    Auditing
    CDC and a bit of Kafka Streams
    [Diagram: CRM Service -> Source DB -> DBZ (Kafka Connect) -> Apache Kafka]
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
    Source: http://bit.ly/debezium-auditlogs

  27. @systemcraftsman
    Auditing
    CDC and a bit of Kafka Streams
    [Diagram: CRM Service -> Source DB -> DBZ (Kafka Connect) -> Apache Kafka topics (Customer Events, Transactions)]
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
    Source: http://bit.ly/debezium-auditlogs

  28. @systemcraftsman
    Auditing
    CDC and a bit of Kafka Streams
    [Diagram: CRM Service -> Source DB -> DBZ (Kafka Connect) -> Apache Kafka topics (Customer Events, Transactions) -> Kafka Streams]
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
    Source: http://bit.ly/debezium-auditlogs

  29. @systemcraftsman
    Auditing
    CDC and a bit of Kafka Streams
    [Diagram: CRM Service -> Source DB -> DBZ (Kafka Connect) -> Apache Kafka topics (Customer Events, Transactions) -> Kafka Streams -> Enriched Customers topic]
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer
    Source: http://bit.ly/debezium-auditlogs

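The enrichment step joins customer change events with transaction metadata on the transaction id. The talk uses Kafka Streams for this. The Python sketch below only illustrates the join logic with in-memory data, using the users and use cases shown in the table. The event layout (`tx_id`, `op`, `customer_id`) is an assumption for the example.

```python
# Transaction metadata, keyed by transaction id (values from the table above)
transactions = {
    "tx-1": {"user": "Bob", "use_case": "Create Customer"},
    "tx-2": {"user": "Sarah", "use_case": "Delete Customer"},
    "tx-3": {"user": "Rebecca", "use_case": "Update Customer"},
}

# Hypothetical customer change events carrying the id of their transaction
customer_events = [
    {"tx_id": "tx-1", "op": "c", "customer_id": 1},
    {"tx_id": "tx-3", "op": "u", "customer_id": 1},
    {"tx_id": "tx-2", "op": "d", "customer_id": 1},
]


def enrich(event, transactions):
    """Attach who made the change and why, looked up by transaction id."""
    meta = transactions[event["tx_id"]]
    return {**event, "user": meta["user"], "use_case": meta["use_case"]}


enriched = [enrich(event, transactions) for event in customer_events]
for record in enriched:
    print(record)
```

A streaming implementation also has to handle metadata that arrives after the change event, which this lookup-based sketch ignores.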

  30. @systemcraftsman
    Auditing
    CDC and a bit of Kafka Streams
    Source: http://bit.ly/debezium-auditlogs

  34. @systemcraftsman
    Microservices
    Microservices Data Exchange
    Propagate data between different services without coupling
    Each service keeps optimised views locally

  35. @systemcraftsman
    Microservices
    Outbox Pattern
    Source: http://bit.ly/debezium-outbox-pattern

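The core of the outbox pattern is that the service writes its business data and an outgoing event row in the same local transaction, and CDC then streams the outbox table to Kafka. Below is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the `place_order` function are hypothetical choices for illustration, not the schema from the linked article.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "aggregate_type TEXT, aggregate_id TEXT, type TEXT, payload TEXT)"
)


def place_order(conn, order_id):
    """Insert the event and the order in ONE transaction: both or neither."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "INSERT INTO outbox (aggregate_type, aggregate_id, type, payload) "
            "VALUES (?, ?, ?, ?)",
            ("order", str(order_id), "OrderCreated",
             json.dumps({"id": order_id, "status": "NEW"})),
        )
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "NEW")
        )


place_order(conn, 1)
try:
    place_order(conn, 1)  # duplicate primary key: the whole transaction fails
except sqlite3.IntegrityError:
    pass

orders_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
outbox_count = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(orders_count, outbox_count)
```

The failed second call leaves no orphaned outbox row, so the events a CDC connector would capture from the outbox table always match committed orders.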

  36. @systemcraftsman
    Microservices
    Strangler Pattern
    Extract microservice for single component(s)
    Keep write requests against running monolith
    Stream changes to extracted microservice
    Test new functionality
    Switch over, evolve schema only afterwards
    Photo: “Strangler vines on trees, seen on the Mount Sorrow hike” by cynren, under CC BY-SA 2.0

  37. @systemcraftsman
    Mono to micro: Strangler Pattern
    Customer

  38. @systemcraftsman
    Mono to micro: Strangler Pattern
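    [Diagram: Router sends reads/writes to the monolith (Customer) and reads to the new microservice (Customer); CDC with transformation streams changes from the monolith to the microservice]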

  39. @systemcraftsman
    Mono to micro: Strangler Pattern
    [Diagram: Router sends reads/writes to both the monolith and the Customer microservice; CDC streams changes in both directions]

  40. @systemcraftsman
    Running on OpenShift
    Getting the best cloud-native Apache Kafka running on enterprise Kubernetes

  41. @systemcraftsman
    Running on OpenShift
    Cloud-native Apache Kafka
    Provides:
    Container images for Apache Kafka, Connect, ZooKeeper and MirrorMaker
    Kubernetes Operators for managing/configuring Apache Kafka clusters, topics and users
    Kafka Consumer, Producer and Admin clients, Kafka Streams
    Upstream Community: Strimzi

  42. @systemcraftsman
    Running on OpenShift
    Deployment via Operators
    YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
    Operator applies configuration
    Advantages:
    Automated deployment and scaling
    Simplified upgrading
    Portability across clouds

  43. @systemcraftsman
    Demo Time!
    https://github.com/systemcraftsman/debezium-demo

  44. @systemcraftsman
    Thank You
    @systemcraftsman
    [email protected]
    [email protected]
