Practical Change Data Streaming Use Cases With Apache Kafka and Debezium (QCon San Francisco 2019)

Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) - Secret Sauce for Change Data Capture

Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: How can you avoid inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.

In this session you’ll see how to leverage CDC for reliable microservices integration, e.g. using the outbox pattern, as well as many other CDC applications, such as maintaining audit logs, automatically keeping your full-text search index in sync, and driving streaming queries. We’ll also discuss practical matters, e.g. HA set-ups, best practices for running Debezium in production on and off Kubernetes, and the many use cases enabled by Kafka Connect's single message transformations.

Gunnar Morling

November 12, 2019

Transcript

  1. Practical Change Data Streaming Use Cases
    With Apache Kafka and Debezium
    Gunnar Morling
    Software Engineer
    @gunnarmorling

  2. DATA

  3. 3

  4. 4

  5. (1) The Issue with Dual Writes
    What's the problem?
    Change data capture to the rescue!
    (2) CDC Use Cases & Patterns
    Replication
    Audit Logs
    Microservices
    (3) Practical Matters
    Deployment Topologies
    Running on Kubernetes
    Single Message Transforms

  6. Gunnar Morling
    Open source software engineer at Red Hat
    Debezium
    Hibernate
    Spec Lead for Bean Validation 2.0
    Other projects: Deptective, MapStruct
    Java Champion
    #CDCUseCases @gunnarmorling

  7. A Common Problem
    Updating Multiple Resources
    Order Service
    Database

  8. A Common Problem
    Updating Multiple Resources
    Order Service
    Database
    Cache

  9. A Common Problem
    Updating Multiple Resources
    Order Service
    Database
    Cache
    Search Index

  10. A Common Problem
    Updating Multiple Resources
    Order Service
    Database
    Cache
    Search Index
    “Friends Don't Let Friends Do Dual Writes”

  11. A Better Solution
    Streaming Change Events From the Database
    Order Service

  12. A Better Solution
    Streaming Change Events From the Database
    Order Service
    Change Data Capture
    C C U C U U D C
    C - Create
    U - Update
    D - Delete

  13. A Better Solution
    Streaming Change Events From the Database
    Order Service
    Change Data Capture
    C C U C U U D C
    C - Create
    U - Update
    D - Delete

  14. Change Data Capture
    With Debezium

  15. Debezium
    Change Data Capture Platform
    CDC for multiple databases
    Based on transaction logs
    Snapshotting, Filtering etc.
    Fully open-source, very active community
    Via Apache Kafka or embedded
    Many production deployments (e.g. WePay, Convoy, JW Player, Usabilla, BlaBlaCar etc.)

  16. Debezium Connectors
    MySQL
    Postgres
    MongoDB
    SQL Server
    Cassandra (Incubating)
    Oracle (Incubating, based on XStream)
    Possible future additions
    DB2?
    MariaDB?

  17. Meme idea: Robin Moffatt

  18. Log- vs. Query-Based CDC
                                                      Query-Based  Log-Based
    All data changes are captured                          -           +
    No polling delay or overhead                           -           +
    Transparent to writing applications and models         -           +
    Can capture deletes and old record state               -           +
    Installation/Configuration                             +           -

  19. Change Event Structure
    Key: Primary key of table
    Value: Describing the change event
    Old row state
    New row state
    Metadata
    Serialization formats:
    JSON
    Avro
    {
      "before": null,
      "after": {
        "id": 1004,
        "first_name": "Anne",
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "server_id": 0,
        "ts_sec": 0,
        "file": "mysql-bin.000003",
        "pos": 154,
        "row": 0,
        "snapshot": true,
        "db": "inventory",
        "table": "customers"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }
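
To make the roles of the "before", "after" and "op" fields on this slide concrete, here is a minimal sketch of how a consumer might branch on the operation type. It assumes the event value has already been parsed into nested maps by some JSON library; the class name `ChangeEventDescriber` and the summary strings are made up for illustration and are not part of Debezium.

```java
import java.util.Map;

/** Minimal sketch: summarizes a change event that has been parsed into nested maps. */
class ChangeEventDescriber {

    /** Builds a one-line summary from the "op", "before" and "after" fields. */
    static String describe(Map<String, Object> event) {
        Object op = event.get("op");
        Object before = event.get("before"); // old row state, null for a create
        Object after = event.get("after");   // new row state, null for a delete
        if ("c".equals(op)) {
            return "create: " + after;
        }
        if ("u".equals(op)) {
            return "update: " + before + " -> " + after;
        }
        if ("d".equals(op)) {
            return "delete: " + before;
        }
        // Any other operation code is reported rather than guessed at.
        return "unhandled op: " + op;
    }
}
```

For the event shown on the slide, "op" is "c" and "before" is null, so only the "after" branch carries data.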

  20. (1) The Issue with Dual Writes
    What's the problem?
    Change data capture to the rescue!
    (2) CDC Use Cases & Patterns
    Replication
    Audit Logs
    Microservices
    (3) Practical Matters
    Deployment Topologies
    Running on Kubernetes
    Single Message Transforms

  21. CDC – "Liberation for Your Data"

  22. Data Replication
    Zero-Code Streaming Pipelines
    Postgres, MySQL
    Apache Kafka

  23. Data Replication
    Zero-Code Streaming Pipelines
    Postgres, MySQL
    Kafka Connect
    Apache Kafka
    Kafka Connect

  24. Data Replication
    Zero-Code Streaming Pipelines
    Postgres, MySQL
    Kafka Connect: DBZ PG, DBZ MySQL
    Apache Kafka
    Kafka Connect

  25. Data Replication
    Zero-Code Streaming Pipelines
    Postgres, MySQL
    Kafka Connect: DBZ PG, DBZ MySQL
    Apache Kafka
    Kafka Connect: ES Connector
    Elasticsearch

  26. Data Replication
    Zero-Code Streaming Pipelines
    Postgres, MySQL
    Kafka Connect: DBZ PG, DBZ MySQL
    Apache Kafka
    Kafka Connect: ES Connector, JDBC Connector
    Elasticsearch, Data Warehouse

  27. Data Replication
    Zero-Code Streaming Pipelines
    Postgres, MySQL
    Kafka Connect: DBZ PG, DBZ MySQL
    Apache Kafka
    Kafka Connect: ES Connector, JDBC Connector, ISPN Connector
    Elasticsearch, Data Warehouse, Infinispan

  28. Data Replication
    Low-Latency Streaming Pipelines
    https://medium.com/convoy-tech/

  29. Auditing
    CRM Service
    Source DB
    Kafka Connect: DBZ
    Apache Kafka: Customer Events

  30. Auditing
    CRM Service
    Source DB
    Kafka Connect: DBZ
    Apache Kafka: Customer Events
    "Transactions" table
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer

  31. Auditing
    CRM Service
    Source DB
    Kafka Connect: DBZ
    Apache Kafka: Customer Events, Transactions
    "Transactions" table
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer

  32. Auditing
    CRM Service
    Source DB
    Kafka Connect: DBZ
    Apache Kafka: Customer Events, Transactions
    Kafka Streams
    "Transactions" table
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer

  33. Auditing
    CRM Service
    Source DB
    Kafka Connect: DBZ
    Apache Kafka: Customer Events, Transactions, Enriched Customer Events
    Kafka Streams
    "Transactions" table
    Id    User     Use Case
    tx-1  Bob      Create Customer
    tx-2  Sarah    Delete Customer
    tx-3  Rebecca  Update Customer

  34. Auditing
    Customers
    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }

  35. Customers
    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }
    Transactions
    {
      "id": "tx-3"
    }
    {
      "before": null,
      "after": {
        "id": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "source": {
        "name": "dbserver1",
        "table": "transactions",
        "txId": "tx-3"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }

  36. Customers
    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }
    Transactions
    {
      "id": "tx-3"
    }
    {
      "before": null,
      "after": {
        "id": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "source": {
        "name": "dbserver1",
        "table": "transactions",
        "txId": "tx-3"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }

  37. Auditing
    Enriched Customers
    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }

  38. Auditing
    Non-trivial join implementation
    no ordering across topics
    need to buffer change events until TX data available
    bit.ly/debezium-auditlogs
    @Override
    public KeyValue<JsonObject, JsonObject> transform(JsonObject key, JsonObject value) {
        // First retry the events that are still waiting for their TX metadata
        boolean enrichedAllBufferedEvents = enrichAndEmitBufferedEvents();

        // Older events are still buffered: queue this one behind them to keep the order
        if (!enrichedAllBufferedEvents) {
            bufferChangeEvent(key, value);
            return null;
        }

        KeyValue<JsonObject, JsonObject> enriched = enrichWithTxMetaData(key, value);

        // TX metadata has not arrived yet: buffer the event and emit nothing for now
        if (enriched == null) {
            bufferChangeEvent(key, value);
        }
        return enriched;
    }
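
The buffering idea on this slide ("need to buffer change events until TX data available") can be tried out without Kafka Streams. The following is only a sketch under simplifying assumptions: events and transaction metadata are handed in by method calls rather than read from topics, `TxEnricher` and `ChangeEvent` are hypothetical names, and the metadata map is never cleaned up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of "buffer change events until TX data is available", without Kafka Streams. */
class TxEnricher {

    /** A change event reduced to the two fields this sketch needs (hypothetical type). */
    static final class ChangeEvent {
        final String txId;
        final String description;

        ChangeEvent(String txId, String description) {
            this.txId = txId;
            this.description = description;
        }
    }

    private final Map<String, String> userByTxId = new HashMap<>(); // TX metadata seen so far
    private final Deque<ChangeEvent> buffer = new ArrayDeque<>();   // events awaiting metadata

    /** Handles a record from the transactions topic; returns events that can now be emitted. */
    List<String> onTransaction(String txId, String user) {
        userByTxId.put(txId, user);
        return drainBuffer();
    }

    /** Handles a record from the customers topic; returns events that can now be emitted. */
    List<String> onChangeEvent(ChangeEvent event) {
        buffer.addLast(event); // always queue behind older events to preserve their order
        return drainBuffer();
    }

    /** Emits from the head of the buffer for as long as the TX metadata is known. */
    private List<String> drainBuffer() {
        List<String> emitted = new ArrayList<>();
        while (!buffer.isEmpty() && userByTxId.containsKey(buffer.peekFirst().txId)) {
            ChangeEvent next = buffer.removeFirst();
            emitted.add(next.description + " by " + userByTxId.get(next.txId));
        }
        return emitted;
    }
}
```

Because there is no ordering across topics, either side may arrive first; the sketch handles both cases by draining the buffer after every input.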

  39. Microservice
    CDC Patterns

  40. Microservice Architectures
    Data Synchronization
    Propagate data between different services without coupling
    Each service keeps optimised views locally
    Order, Item, Stock: each an App with a Local DB
    Item Changes
    Stock Changes

  41. Outbox Pattern
    Separate Events Table
    Order Service
    Source DB: Orders, Outbox
    Kafka Connect: DBZ
    Apache Kafka: Order Events, Credit Worthiness Check Events
    Shipment Service
    Customer Service

  42. Outbox Pattern
    Separate Events Table
    Order Service
    Source DB: Orders, Outbox
    Kafka Connect: DBZ
    Apache Kafka: Order Events, Credit Worthiness Check Events
    Shipment Service
    Customer Service
    "Outbox" table
    Id    AggregateType  AggregateId  Type                 Payload
    ec6e  Order          123          OrderCreated         { "id" : 123, ... }
    8af8  Order          456          OrderDetailCanceled  { "id" : 456, ... }
    890b  Customer       789          InvoiceCreated       { "id" : 789, ... }
    bit.ly/debezium-outbox-pattern
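
As a rough illustration of how rows of the "Outbox" table could be turned into Kafka messages, the sketch below maps one row to a topic, key and value. The column names follow the slide; the class names and the `<aggregate type>-events` topic naming are assumptions made for this example, not the behaviour of any particular Debezium component.

```java
import java.util.Locale;

/** Sketch: maps one row of the "Outbox" table to a Kafka message (topic, key, value). */
class OutboxRouter {

    /** One outbox row, with the columns shown on the slide. */
    static final class OutboxRow {
        final String id;
        final String aggregateType;
        final String aggregateId;
        final String type;
        final String payload;

        OutboxRow(String id, String aggregateType, String aggregateId, String type, String payload) {
            this.id = id;
            this.aggregateType = aggregateType;
            this.aggregateId = aggregateId;
            this.type = type;
            this.payload = payload;
        }
    }

    /** The resulting message (hypothetical type for this sketch). */
    static final class Message {
        final String topic;
        final String key;
        final String eventType;
        final String value;

        Message(String topic, String key, String eventType, String value) {
            this.topic = topic;
            this.key = key;
            this.eventType = eventType;
            this.value = value;
        }
    }

    /** Picks the topic from the aggregate type and uses the aggregate id as message key. */
    static Message route(OutboxRow row) {
        // Hypothetical naming scheme: "Order" becomes "order-events"
        String topic = row.aggregateType.toLowerCase(Locale.ROOT) + "-events";
        return new Message(topic, row.aggregateId, row.type, row.payload);
    }
}
```

Using the aggregate id as the key means all events for one aggregate share a key, so Kafka keeps them in order within a partition.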

  43. Strangler Pattern
    Migrating from Monoliths to Microservices
    https://martinfowler.com/bliki/StranglerFigApplication.html

  44. Strangler Pattern
    Customer

  45. Strangler Pattern
    Router
    Customer: Reads/Writes
    CDC, Transformation
    Customer': Reads

  46. Strangler Pattern
    Router
    Customer
    Reads/Writes, Reads/Writes
    CDC, CDC

  47. (1) The Issue with Dual Writes
    What's the problem?
    Change data capture to the rescue!
    (2) CDC Use Cases & Patterns
    Replication
    Audit Logs
    Microservices
    (3) Practical Matters
    Deployment Topologies
    Running on Kubernetes
    Single Message Transforms

  48. Deployment Topologies
    Basic Set-Up
    CDC

  49. Deployment Topologies
    Database High Availability
    CDC

  50. Deployment Topologies
    Database High Availability
    CDC

  51. Deployment Topologies
    Automatic Fail-over
    HA Proxy
    CDC

  52. Deployment Topologies
    Automatic Fail-over
    HA Proxy
    CDC

  53. Deployment Topologies
    Can't Change Binlog Mode?
    Primary, Secondary
    CDC

  54. Deployment Topologies
    High Availability for Connectors
    CDC, CDC
    Deduplicator

  55. Running on Kubernetes
    Deployment via Operators
    YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
    Operator applies configuration
    Advantages
    Automated deployment and scaling
    Simplified upgrading
    Portability across clouds
    apiVersion: "kafka.strimzi.io/v1alpha1"
    kind: "KafkaConnector"
    metadata:
      name: "inventory-connector"
      labels:
        connect-cluster: my-connect-cluster
    spec:
      class: io.debezium.connector.postgresql.PostgresConnector
      tasksMax: 1
      config:
        database.hostname: "postgres"
        database.port: "5432"
        database.user: "bob"
        database.password: "secret"
        database.dbname: "prod"
        database.server.name: "dbserver1"
        schema.whitelist: "inventory"

  56. Running on Kubernetes
    Operating Kafka Connect
    Distributed mode
    Offsets stored in Kafka
    Configuration via REST
    Single node: no re-balancing issues (< Apache Kafka 2.3)
    Single connector: health checks based on REST API
    Fight duplication: Jsonnet templates
    // a database + connector per tenant
    {
      "name": "inventory-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "bob",
        "database.password": "secret",
        "database.dbname": std.extVar('tenant'),
        "database.server.name": std.extVar('tenant'),
        "schema.whitelist": "inventory"
      }
    }

  57. Single Message Transformations
    The Swiss Army Knife of Kafka Connect
    Format conversions
    Time/date fields
    Extract new row state
    Aggregate sharded tables to single topic
    Keep compatibility with existing consumers
    © Emilian Robert Vicol https://flic.kr/p/c8s6Y3

  58. Single Message Transformations
    Externalizing Large Column Values
    DBZ
    Amazon S3

  59. Single Message Transformations
    Externalizing Large Column Values
    DBZ
    Amazon S3
    {
      "before": { ... },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]",
        "image": "imgs--after"
      },
      ...
    }
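
The idea of externalizing large column values can be sketched as a function that swaps an oversized value for a reference to where it was stored. This is a simplified stand-in, not a Kafka Connect SMT: a real one would implement Kafka Connect's `Transformation` interface and write to an object store such as Amazon S3. Here a plain map plays the role of the store, and the class name, size threshold and key layout are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: replaces oversized binary column values with a reference to external storage. */
class LargeValueExternalizer {

    private final Map<String, byte[]> blobStore; // stand-in for an object store such as Amazon S3
    private final int maxInlineBytes;            // values above this size are externalized

    LargeValueExternalizer(Map<String, byte[]> blobStore, int maxInlineBytes) {
        this.blobStore = blobStore;
        this.maxInlineBytes = maxInlineBytes;
    }

    /** Returns a copy of the row in which oversized byte[] columns hold a storage key instead. */
    Map<String, Object> apply(String keyPrefix, Map<String, Object> row) {
        Map<String, Object> result = new HashMap<>(row);
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value instanceof byte[] && ((byte[]) value).length > maxInlineBytes) {
                // Hypothetical key layout: <prefix>/<column name>
                String storageKey = keyPrefix + "/" + column.getKey();
                blobStore.put(storageKey, (byte[]) value);
                result.put(column.getKey(), storageKey);
            }
        }
        return result;
    }
}
```

Consumers that need the full value would look it up in the store by the key carried in the event, while all other consumers only pay for the small reference.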

  60. Takeaways
    Change Data Capture – Liberation for your data!
    Enabling use cases such as replication, streaming queries, maintaining CQRS read models etc.
    Microservices: outbox and strangler patterns
    Debezium: open-source CDC for a growing number of databases
    “Friends Don't Let Friends Do Dual-Writes”

  61. DATA

  62. Resources
    Website: debezium.io
    debezium.io/documentation/online-resources
    debezium.io/blog
    Latest news: @debezium
    Strimzi (Kafka on Kubernetes): strimzi.io

  63. Q&A
    [email protected]
    @gunnarmorling

  64. Outlook: View Materialization
    Awareness of Transaction Boundaries
    Topic with BEGIN/END markers
    Enable consumers to buffer all events of one transaction
    BEGIN
    {
      "transactionId" : "tx-123",
      "eventType" : "begin transaction",
      "ts_ms": 1486500577125
    }
    END
    {
      "transactionId" : "tx-123",
      "ts_ms": 1486500577691,
      "eventType" : "end transaction",
      "eventCount" : [
        {
          "name" : "dbserver1.inventory.Order",
          "count" : 1
        },
        {
          "name" : "dbserver1.inventory.OrderLine",
          "count" : 5
        }
      ]
    }
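
To show how a consumer might use such END markers to buffer all events of one transaction, here is a small sketch. It simplifies the marker from the slide: instead of per-topic counts it takes a single total (for the example above that would be 1 + 5 = 6), and `TransactionBuffer` with its method names is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch: holds back change events until all events of their transaction have arrived. */
class TransactionBuffer {

    private final Map<String, List<String>> eventsByTx = new HashMap<>(); // buffered events
    private final Map<String, Integer> expectedByTx = new HashMap<>();    // totals from END markers

    /** Buffers one change event; returns the whole transaction once it is complete. */
    Optional<List<String>> onEvent(String txId, String event) {
        eventsByTx.computeIfAbsent(txId, id -> new ArrayList<>()).add(event);
        return releaseIfComplete(txId);
    }

    /** Records the total event count announced by the END marker of a transaction. */
    Optional<List<String>> onEnd(String txId, int totalEventCount) {
        expectedByTx.put(txId, totalEventCount);
        return releaseIfComplete(txId);
    }

    /** Releases the buffered events if the END marker was seen and no event is missing. */
    private Optional<List<String>> releaseIfComplete(String txId) {
        Integer expected = expectedByTx.get(txId);
        List<String> buffered = eventsByTx.get(txId);
        int bufferedCount = buffered == null ? 0 : buffered.size();
        if (expected == null || bufferedCount < expected) {
            return Optional.empty();
        }
        expectedByTx.remove(txId);
        eventsByTx.remove(txId);
        List<String> result = buffered == null ? new ArrayList<>() : buffered;
        return Optional.of(result);
    }
}
```

The END marker and the change events travel on different topics, so the sketch accepts them in any order and releases the transaction with whichever input completes it.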

  65. 65