Slide 1

Slide 1 text

Practical Change Data Streaming Use Cases With Apache Kafka and Debezium
Gunnar Morling, Software Engineer
@gunnarmorling

Slide 2

Slide 2 text

DATA

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text

1. The Issue with Dual Writes
   What's the problem?
   Change data capture to the rescue!
2. CDC Use Cases & Patterns
   Replication
   Audit Logs
   Microservices
3. Practical Matters
   Deployment Topologies
   Running on Kubernetes
   Single Message Transforms

Slide 6

Slide 6 text

Gunnar Morling
Open source software engineer at Red Hat
   Debezium
   Hibernate
   Spec Lead for Bean Validation 2.0
Other projects: Deptective, MapStruct
Java Champion
@gunnarmorling #CDCUseCases

Slide 7

Slide 7 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Database]

Slide 8

Slide 8 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Database, Cache]

Slide 9

Slide 9 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Database, Cache, Search Index]

Slide 10

Slide 10 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Database, Cache, Search Index]
"Friends Don't Let Friends Do Dual Writes"

Slide 11

Slide 11 text

A Better Solution: Streaming Change Events From the Database
[Diagram: Order Service]

Slide 12

Slide 12 text

A Better Solution: Streaming Change Events From the Database
[Diagram: Order Service, Change Data Capture, stream of change events]
Legend: C - Create, U - Update, D - Delete

Slide 13

Slide 13 text

A Better Solution: Streaming Change Events From the Database
[Diagram: Order Service, Change Data Capture, stream of change events]
Legend: C - Create, U - Update, D - Delete

Slide 14

Slide 14 text

Change Data Capture With Debezium

Slide 15

Slide 15 text

Debezium: Change Data Capture Platform
CDC for multiple databases
   Based on transaction logs
   Snapshotting, Filtering etc.
Fully open-source, very active community
Via Apache Kafka or embedded
Many production deployments (e.g. WePay, Convoy, JW Player, Usabilla, BlaBlaCar etc.)

Slide 16

Slide 16 text

Debezium Connectors
MySQL
Postgres
MongoDB
SQL Server
Cassandra (Incubating)
Oracle (Incubating, based on XStream)
Possible future additions
   DB2?
   MariaDB?

Slide 17

Slide 17 text

Meme idea: Robin Moffatt

Slide 18

Slide 18 text

Log- vs. Query-Based CDC

                                                 Query-Based   Log-Based
All data changes are captured                         -            +
No polling delay or overhead                          -            +
Transparent to writing applications and models        -            +
Can capture deletes and old record state              -            +
Installation/Configuration                            +            -

Slide 19

Slide 19 text

Change Event Structure
Key: Primary key of table
Value: Describing the change event
   Old row state
   New row state
   Metadata
Serialization formats: JSON, Avro

{
  "before": null,
  "after": {
    "id": 1004,
    "first_name": "Anne",
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "source": {
    "name": "dbserver1",
    "server_id": 0,
    "ts_sec": 0,
    "file": "mysql-bin.000003",
    "pos": 154,
    "row": 0,
    "snapshot": true,
    "db": "inventory",
    "table": "customers"
  },
  "op": "c",
  "ts_ms": 1486500577691
}
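A consumer typically dispatches on the "op" field and then reads "before" or "after". The following is a minimal sketch of that idea, not Debezium's API: the class ChangeEventApplier, the nested ChangeEvent type and the customer helper are hypothetical names, rows are plain maps instead of parsed JSON, and "id" is assumed to be the primary key column.

```java
import java.util.HashMap;
import java.util.Map;

/** Applies simplified change events to an in-memory copy of a table. */
class ChangeEventApplier {

    /** Hypothetical stand-in for the event value; real events also carry "source" and "ts_ms". */
    static final class ChangeEvent {
        final Map<String, Object> before; // old row state, null for creates
        final Map<String, Object> after;  // new row state, null for deletes
        final String op;                  // "c" = create, "u" = update, "d" = delete

        ChangeEvent(Map<String, Object> before, Map<String, Object> after, String op) {
            this.before = before;
            this.after = after;
            this.op = op;
        }
    }

    /** Local copy of the table, keyed by the (assumed) primary key column "id". */
    final Map<Object, Map<String, Object>> rows = new HashMap<>();

    void apply(ChangeEvent event) {
        switch (event.op) {
            case "c":
            case "u":
                // creates and updates both carry the new row state
                rows.put(event.after.get("id"), event.after);
                break;
            case "d":
                // deletes only carry the old row state
                rows.remove(event.before.get("id"));
                break;
            default:
                throw new IllegalArgumentException("Unsupported op: " + event.op);
        }
    }

    /** Small helper that builds a customer row for the demo. */
    static Map<String, Object> customer(int id, String firstName, String lastName) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", id);
        row.put("first_name", firstName);
        row.put("last_name", lastName);
        return row;
    }

    public static void main(String[] args) {
        ChangeEventApplier applier = new ChangeEventApplier();
        Map<String, Object> anne = customer(1004, "Anne", "Kretchmar");
        applier.apply(new ChangeEvent(null, anne, "c"));
        System.out.println(applier.rows);
        applier.apply(new ChangeEvent(anne, null, "d"));
        System.out.println(applier.rows);
    }
}
```

Because deletes arrive as explicit events with the old row state, the local copy can drop the row without polling the source table.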

Slide 20

Slide 20 text

1. The Issue with Dual Writes
   What's the problem?
   Change data capture to the rescue!
2. CDC Use Cases & Patterns
   Replication
   Audit Logs
   Microservices
3. Practical Matters
   Deployment Topologies
   Running on Kubernetes
   Single Message Transforms

Slide 21

Slide 21 text

CDC – "Liberation for Your Data"

Slide 22

Slide 22 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Apache Kafka]

Slide 23

Slide 23 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Kafka Connect, Apache Kafka]

Slide 24

Slide 24 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka]

Slide 25

Slide 25 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect (ES Connector), Elasticsearch]

Slide 26

Slide 26 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect (ES Connector, JDBC Connector), Elasticsearch, Data Warehouse]

Slide 27

Slide 27 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Kafka Connect (DBZ PG, DBZ MySQL), Apache Kafka, Kafka Connect (ES Connector, JDBC Connector, ISPN Connector), Elasticsearch, Data Warehouse, Infinispan]

Slide 28

Slide 28 text

Data Replication: Low-Latency Streaming Pipelines
https://medium.com/convoy-tech/

Slide 29

Slide 29 text

Auditing
[Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka (Customer Events)]

Slide 30

Slide 30 text

Auditing
[Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka (Customer Events)]

"Transactions" table
Id     User      Use Case
tx-1   Bob       Create Customer
tx-2   Sarah     Delete Customer
tx-3   Rebecca   Update Customer

Slide 31

Slide 31 text

Auditing
[Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka (Customer Events, Transactions)]

"Transactions" table
Id     User      Use Case
tx-1   Bob       Create Customer
tx-2   Sarah     Delete Customer
tx-3   Rebecca   Update Customer

Slide 32

Slide 32 text

Auditing
[Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka (Customer Events, Transactions), Kafka Streams]

"Transactions" table
Id     User      Use Case
tx-1   Bob       Create Customer
tx-2   Sarah     Delete Customer
tx-3   Rebecca   Update Customer

Slide 33

Slide 33 text

Auditing
[Diagram: CRM Service, Source DB, Kafka Connect (DBZ), Apache Kafka (Customer Events, Transactions, Enriched Customer Events), Kafka Streams]

"Transactions" table
Id     User      Use Case
tx-1   Bob       Create Customer
tx-2   Sarah     Delete Customer
tx-3   Rebecca   Update Customer

Slide 34

Slide 34 text

Auditing

Customers:
{
  "before": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "after": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "source": {
    "name": "dbserver1",
    "table": "customers",
    "txId": "tx-3"
  },
  "op": "u",
  "ts_ms": 1486500577691
}

Slide 35

Slide 35 text

Customers:
{
  "before": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "after": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "source": {
    "name": "dbserver1",
    "table": "customers",
    "txId": "tx-3"
  },
  "op": "u",
  "ts_ms": 1486500577691
}

Transactions:
{ "id": "tx-3" }
{
  "before": null,
  "after": {
    "id": "tx-3",
    "user": "Rebecca",
    "use_case": "Update customer"
  },
  "source": {
    "name": "dbserver1",
    "table": "transactions",
    "txId": "tx-3"
  },
  "op": "c",
  "ts_ms": 1486500577691
}

Slide 36

Slide 36 text

Customers:
{
  "before": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "after": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "source": {
    "name": "dbserver1",
    "table": "customers",
    "txId": "tx-3"
  },
  "op": "u",
  "ts_ms": 1486500577691
}

Transactions:
{ "id": "tx-3" }
{
  "before": null,
  "after": {
    "id": "tx-3",
    "user": "Rebecca",
    "use_case": "Update customer"
  },
  "source": {
    "name": "dbserver1",
    "table": "transactions",
    "txId": "tx-3"
  },
  "op": "c",
  "ts_ms": 1486500577691
}

Slide 37

Slide 37 text

Auditing

Enriched Customers:
{
  "before": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "after": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]"
  },
  "source": {
    "name": "dbserver1",
    "table": "customers",
    "txId": "tx-3",
    "user": "Rebecca",
    "use_case": "Update customer"
  },
  "op": "u",
  "ts_ms": 1486500577691
}

Slide 38

Slide 38 text

Auditing
Non-trivial join implementation
   no ordering across topics
   need to buffer change events until TX data available
bit.ly/debezium-auditlogs

@Override
public KeyValue transform(JsonObject key, JsonObject value) {
    // First try to enrich and emit the events buffered earlier
    boolean enrichedAllBufferedEvents = enrichAndEmitBufferedEvents();

    // Older events are still waiting for TX data: queue this one behind them
    if (!enrichedAllBufferedEvents) {
        bufferChangeEvent(key, value);
        return null;
    }

    // Enrich with TX metadata; buffer the event if that metadata is not available yet
    KeyValue enriched = enrichWithTxMetaData(key, value);
    if (enriched == null) {
        bufferChangeEvent(key, value);
    }
    return enriched;
}
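The buffering logic can be tried out without Kafka Streams. The following is a minimal in-memory sketch under stated assumptions, not the implementation behind bit.ly/debezium-auditlogs: TxEnricher, Change, onTransaction and onChange are hypothetical names, events are reduced to a transaction id plus a description, and emitted events are returned as a list instead of being forwarded to a topic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Joins change events with transaction metadata, buffering events that arrive too early. */
class TxEnricher {

    /** Hypothetical simplified change event: the transaction id plus a description. */
    static final class Change {
        final String txId;
        final String description;

        Change(String txId, String description) {
            this.txId = txId;
            this.description = description;
        }
    }

    private final Map<String, String> userByTxId = new HashMap<>();
    private final Deque<Change> buffer = new ArrayDeque<>();

    /** Called for each transactions record; returns the events that can now be emitted. */
    List<String> onTransaction(String txId, String user) {
        userByTxId.put(txId, user);
        return drainBuffer();
    }

    /** Called for each customers record; returns the events that can now be emitted. */
    List<String> onChange(String txId, String description) {
        List<String> emitted = drainBuffer();
        Change change = new Change(txId, description);
        String user = userByTxId.get(txId);
        if (!buffer.isEmpty() || user == null) {
            buffer.addLast(change); // keep arrival order behind older buffered events
        } else {
            emitted.add(enrich(change, user));
        }
        return emitted;
    }

    /** Emits buffered events from the head of the queue while their TX metadata is known. */
    private List<String> drainBuffer() {
        List<String> emitted = new ArrayList<>();
        while (!buffer.isEmpty() && userByTxId.containsKey(buffer.peekFirst().txId)) {
            Change change = buffer.pollFirst();
            emitted.add(enrich(change, userByTxId.get(change.txId)));
        }
        return emitted;
    }

    private static String enrich(Change change, String user) {
        return change.description + " (user: " + user + ")";
    }

    public static void main(String[] args) {
        TxEnricher enricher = new TxEnricher();
        // The change event arrives before its transaction record and is buffered
        System.out.println(enricher.onChange("tx-3", "update customer 1004"));
        // Once the TX data is available, the buffered event is enriched and emitted
        System.out.println(enricher.onTransaction("tx-3", "Rebecca"));
    }
}
```

Draining from the head of the queue only keeps the emitted events in arrival order, at the cost of holding back later events whose metadata is already known.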

Slide 39

Slide 39 text

Microservice CDC Patterns

Slide 40

Slide 40 text

Microservice Architectures: Data Synchronization
Propagate data between different services without coupling
Each service keeps optimised views locally
[Diagram: Order, Item and Stock apps, each with a local DB; Item Changes, Stock Changes]

Slide 41

Slide 41 text

Outbox Pattern: Separate Events Table
[Diagram: Order Service, Source DB (Orders, Outbox), Kafka Connect (DBZ), Apache Kafka (Order Events, Credit Worthiness Check Events), Shipment Service, Customer Service]

Slide 42

Slide 42 text

Outbox Pattern: Separate Events Table
[Diagram: Order Service, Source DB (Orders, Outbox), Kafka Connect (DBZ), Apache Kafka (Order Events, Credit Worthiness Check Events), Shipment Service, Customer Service]

"Outbox" table
Id     AggregateType   AggregateId   Type                  Payload
ec6e   Order           123           OrderCreated          { "id" : 123, ... }
8af8   Order           456           OrderDetailCanceled   { "id" : 456, ... }
890b   Customer        789           InvoiceCreated        { "id" : 789, ... }

bit.ly/debezium-outbox-pattern
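One way to picture the routing step is to group outbox rows by aggregate type, so that each aggregate type ends up in its own topic. The following is an in-memory sketch only: OutboxRouter, OutboxEvent and the "<aggregate type>-events" topic naming are assumptions made for illustration, not Debezium's routing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Routes rows of an outbox table to per-aggregate topics (in memory, for illustration). */
class OutboxRouter {

    /** One row of the "Outbox" table. */
    static final class OutboxEvent {
        final String id;
        final String aggregateType;
        final String aggregateId;
        final String type;
        final String payload;

        OutboxEvent(String id, String aggregateType, String aggregateId, String type, String payload) {
            this.id = id;
            this.aggregateType = aggregateType;
            this.aggregateId = aggregateId;
            this.type = type;
            this.payload = payload;
        }
    }

    /** Assumed naming scheme: "Order" becomes "order-events". */
    static String topicFor(OutboxEvent event) {
        return event.aggregateType.toLowerCase(Locale.ROOT) + "-events";
    }

    /** Groups events by target topic, preserving the order in which they were written. */
    static Map<String, List<OutboxEvent>> route(List<OutboxEvent> events) {
        Map<String, List<OutboxEvent>> byTopic = new LinkedHashMap<>();
        for (OutboxEvent event : events) {
            byTopic.computeIfAbsent(topicFor(event), topic -> new ArrayList<>()).add(event);
        }
        return byTopic;
    }

    public static void main(String[] args) {
        List<OutboxEvent> outbox = Arrays.asList(
                new OutboxEvent("ec6e", "Order", "123", "OrderCreated", "{ \"id\" : 123 }"),
                new OutboxEvent("8af8", "Order", "456", "OrderDetailCanceled", "{ \"id\" : 456 }"),
                new OutboxEvent("890b", "Customer", "789", "InvoiceCreated", "{ \"id\" : 789 }"));
        route(outbox).forEach((topic, events) ->
                System.out.println(topic + ": " + events.size() + " event(s)"));
    }
}
```

Using the aggregate id as the message key would be a natural follow-up, since it keeps all events of one aggregate in one partition.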

Slide 43

Slide 43 text

Strangler Pattern: Migrating from Monoliths to Microservices
https://martinfowler.com/bliki/StranglerFigApplication.html

Slide 44

Slide 44 text

Strangler Pattern
[Diagram: Customer]

Slide 45

Slide 45 text

Strangler Pattern
[Diagram: Router, Customer (Reads/Writes), Customer' (Reads), CDC, Transformation]

Slide 46

Slide 46 text

Strangler Pattern
[Diagram: Router, Customer, CDC (x2), Reads/Writes (x2)]

Slide 47

Slide 47 text

1. The Issue with Dual Writes
   What's the problem?
   Change data capture to the rescue!
2. CDC Use Cases & Patterns
   Replication
   Audit Logs
   Microservices
3. Practical Matters
   Deployment Topologies
   Running on Kubernetes
   Single Message Transforms

Slide 48

Slide 48 text

Deployment Topologies: Basic Set-Up
[Diagram: CDC]

Slide 49

Slide 49 text

Deployment Topologies: Database High Availability
[Diagram: CDC]

Slide 50

Slide 50 text

Deployment Topologies: Database High Availability
[Diagram: CDC]

Slide 51

Slide 51 text

Deployment Topologies: Automatic Fail-over
[Diagram: HA Proxy, CDC]

Slide 52

Slide 52 text

Deployment Topologies: Automatic Fail-over
[Diagram: HA Proxy, CDC]

Slide 53

Slide 53 text

Deployment Topologies: Can't Change Binlog Mode?
[Diagram: Primary, Secondary, CDC]

Slide 54

Slide 54 text

Deployment Topologies: High Availability for Connectors
[Diagram: CDC (x2), Deduplicator]

Slide 55

Slide 55 text

Running on Kubernetes: Deployment via Operators
YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
Operator applies configuration
Advantages
   Automated deployment and scaling
   Simplified upgrading
   Portability across clouds

apiVersion: "kafka.strimzi.io/v1alpha1"
kind: "KafkaConnector"
metadata:
  name: "inventory-connector"
  labels:
    connect-cluster: my-connect-cluster
spec:
  class: i.d.c.p.PostgresConnector
  tasksMax: 1
  config:
    database.hostname: "postgres"
    database.port: "5432"
    database.user: "bob"
    database.password: "secret"
    database.dbname: "prod"
    database.server.name: "dbserver1"
    schema.whitelist: "inventory"

Slide 56

Slide 56 text

Running on Kubernetes: Operating Kafka Connect
Distributed mode
Offsets stored in Kafka
Configuration via REST
Single node: no re-balancing issues (< Apache Kafka 2.3)
Single connector: health checks based on REST API
Fight duplication: Jsonnet templates

// a database + connector per tenant
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "i.d.c.p.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "bob",
    "database.password": "secret",
    "database.dbname": std.extVar('tenant'),
    "database.server.name": std.extVar('tenant'),
    "schema.whitelist": "inventory"
  }
}

Slide 57

Slide 57 text

Single Message Transformations: The Swiss Army Knife of Kafka Connect
Format conversions
   Time/date fields
   Extract new row state
Aggregate sharded tables to single topic
Keep compatibility with existing consumers
© Emilian Robert Vicol https://flic.kr/p/c8s6Y3
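Two of the listed transformations can be pictured as small pure functions over a single message. The following sketch deliberately avoids the Kafka Connect transformation API and works on plain maps and strings: TransformSketches, extractNewRowState and mergeShardTopics are hypothetical names, and the "_shard<N>" topic suffix is an assumed naming convention for sharded tables.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustrations of what two single message transformations do. */
class TransformSketches {

    /** "Extract new row state": replaces the change event envelope with its "after" part. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> extractNewRowState(Map<String, Object> envelope) {
        return (Map<String, Object>) envelope.get("after");
    }

    /** "Aggregate sharded tables to single topic": strips an assumed "_shard<N>" suffix. */
    static String mergeShardTopics(String topic) {
        return topic.replaceAll("_shard\\d+$", "");
    }

    public static void main(String[] args) {
        Map<String, Object> after = new HashMap<>();
        after.put("id", 1004);
        after.put("last_name", "Kretchmar");

        Map<String, Object> envelope = new HashMap<>();
        envelope.put("before", null);
        envelope.put("after", after);
        envelope.put("op", "c");

        System.out.println(extractNewRowState(envelope));
        System.out.println(mergeShardTopics("dbserver1.inventory.customers_shard2"));
    }
}
```

Flattening the envelope this way is what keeps sink connectors and existing consumers unaware of the change event structure.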

Slide 58

Slide 58 text

Single Message Transformations: Externalizing Large Column Values
[Diagram: DBZ, Amazon S3]

Slide 59

Slide 59 text

Single Message Transformations: Externalizing Large Column Values
[Diagram: DBZ, Amazon S3]

{
  "before": { ... },
  "after": {
    "id": 1004,
    "last_name": "Kretchmar",
    "email": "[email protected]",
    "image": "imgs--after"
  },
  ...
}

Slide 60

Slide 60 text

Takeaways
Change Data Capture – Liberation for your data!
Enabling use cases such as replication, streaming queries, maintaining CQRS read models etc.
Microservices: outbox and strangler patterns
Debezium: open-source CDC for a growing number of databases
"Friends Don't Let Friends Do Dual-Writes"

Slide 61

Slide 61 text

DATA

Slide 62

Slide 62 text

Resources
Website: debezium.io
debezium.io/documentation/online-resources
debezium.io/blog
Strimzi (Kafka on Kubernetes): strimzi.io
Latest news: @debezium

Slide 63

Slide 63 text

Q&A
[email protected]
@gunnarmorling

Slide 64

Slide 64 text

Outlook: View Materialization
Awareness of Transaction Boundaries
Topic with BEGIN/END markers
Enable consumers to buffer all events of one transaction

BEGIN:
{
  "transactionId": "tx-123",
  "eventType": "begin transaction",
  "ts_ms": 1486500577125
}

END:
{
  "transactionId": "tx-123",
  "ts_ms": 1486500577691,
  "eventType": "end transaction",
  "eventCount": [
    { "name": "dbserver1.inventory.Order", "count": 1 },
    { "name": "dbserver1.inventory.OrderLine", "count": 5 }
  ]
}
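A consumer could use the event counts of the END marker to decide when a transaction is complete. The following is a rough sketch of such a buffer under stated assumptions: TransactionBuffer, onEvent and onEnd are hypothetical names, the per-collection counts of the END marker are assumed to be summed into one expected total by the caller, and the END marker may arrive before or after the data events.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Buffers the change events of a transaction until all of them have arrived. */
class TransactionBuffer {

    private final Map<String, List<String>> eventsByTx = new HashMap<>();
    private final Map<String, Integer> expectedByTx = new HashMap<>();

    /** Called for each change event; returns the complete transaction or an empty list. */
    List<String> onEvent(String txId, String event) {
        eventsByTx.computeIfAbsent(txId, id -> new ArrayList<>()).add(event);
        return releaseIfComplete(txId);
    }

    /** Called for the END marker, with the sum of its per-collection event counts. */
    List<String> onEnd(String txId, int expectedCount) {
        expectedByTx.put(txId, expectedCount);
        return releaseIfComplete(txId);
    }

    private List<String> releaseIfComplete(String txId) {
        Integer expected = expectedByTx.get(txId);
        List<String> events = eventsByTx.getOrDefault(txId, Collections.<String>emptyList());
        if (expected == null || events.size() < expected) {
            return Collections.emptyList(); // END marker or some events still missing
        }
        expectedByTx.remove(txId);
        eventsByTx.remove(txId);
        return events;
    }

    public static void main(String[] args) {
        TransactionBuffer buffer = new TransactionBuffer();
        System.out.println(buffer.onEvent("tx-123", "Order 1"));
        System.out.println(buffer.onEnd("tx-123", 2));
        System.out.println(buffer.onEvent("tx-123", "OrderLine 1"));
    }
}
```

Releasing all events of a transaction at once lets a materialized view move from one consistent state to the next instead of exposing half-applied transactions.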

Slide 65

Slide 65 text
