Slide 1

Slide 1 text

Change Data Streaming Use Cases With Apache Kafka and Debezium
Java Day Istanbul 2020
Gunnar Morling, Software Engineer
@gunnarmorling

Slide 2

Slide 2 text

1. The Issue with Dual Writes
   - What's the problem?
   - Change data capture to the rescue!
2. CDC Use Cases & Patterns
   - Replication
   - Audit Logs
   - Microservices
3. Practical Matters
   - Deployment Topologies
   - Running on Kubernetes
   - Single Message Transforms

Slide 3

Slide 3 text

Gunnar Morling
- Open source software engineer at Red Hat
  - Debezium
  - Hibernate
  - Quarkus
- Spec Lead for Bean Validation 2.0
- Other projects: Layrry, Deptective, MapStruct
- Java Champion
@gunnarmorling #Debezium

Slide 4

Slide 4 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Database]

Slide 5

Slide 5 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Cache, Database]

Slide 6

Slide 6 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Cache, Database, Search Index]

Slide 7

Slide 7 text

A Common Problem: Updating Multiple Resources
[Diagram: Order Service, Cache, Database, Search Index]
"Friends Don't Let Friends Do Dual Writes"
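The hazard behind this slide can be shown with a minimal Java sketch. Everything in it is hypothetical: in-memory maps stand in for the database and the search index, and the `searchIndexAvailable` flag simulates an outage between the two writes. It only illustrates how two writes without a shared transaction can leave the resources out of sync.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: in-memory maps stand in for the database and the search index.
final class DualWriteDemo {
    final Map<Long, String> database = new HashMap<>();
    final Map<Long, String> searchIndex = new HashMap<>();
    boolean searchIndexAvailable = true; // simulated availability of the second resource

    // Writes to two resources without a shared transaction.
    void placeOrder(long id, String order) {
        database.put(id, order); // first write succeeds
        if (!searchIndexAvailable) {
            throw new IllegalStateException("search index unavailable");
        }
        searchIndex.put(id, order); // never reached when the index is down
    }

    // True when both resources hold the same data.
    boolean consistent() {
        return database.equals(searchIndex);
    }

    public static void main(String[] args) {
        DualWriteDemo demo = new DualWriteDemo();
        demo.searchIndexAvailable = false;
        try {
            demo.placeOrder(1L, "order-1");
        } catch (IllegalStateException e) {
            System.out.println("second write failed: " + e.getMessage());
        }
        System.out.println("consistent = " + demo.consistent());
    }
}
```

After the failed call the order exists in the database but not in the search index, and nothing in the service records that the second write is still outstanding.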

Slide 8

Slide 8 text

A Better Solution: Streaming Change Events From the Database
[Diagram: Order Service]

Slide 9

Slide 9 text

A Better Solution: Streaming Change Events From the Database
[Diagram: Order Service; Change Data Capture emits a stream of change events: C C U C U U D C]
Legend: C - Create, U - Update, D - Delete

Slide 11

Slide 11 text

Change Data Capture With Debezium

Slide 12

Slide 12 text

Debezium: Change Data Capture Platform
- CDC for multiple databases
- Based on transaction logs
- Snapshotting, filtering etc.
- Fully open-source, very active community
- Via Apache Kafka or embedded
- Many production deployments (e.g. WePay, Shopify, Convoy, JW Player, Usabilla)

Slide 13

Slide 13 text

Debezium Connectors
- MySQL
- Postgres
- MongoDB
- SQL Server
- Incubating: Db2, Cassandra
- Oracle (incubating, based on XStream)

Slide 14

Slide 14 text

Meme idea: Robin Moffatt

Slide 15

Slide 15 text

Log- vs. Query-Based CDC

                                                  Query-Based  Log-Based
All data changes are captured                          -           +
No polling delay or overhead                           -           +
Transparent to writing applications and models         -           +
Can capture deletes and old record state               -           +
Installation/Configuration                             +           -

Slide 16

Slide 16 text

Change Event Structure
- Key: Primary key of table
- Value: Describing the change event
  - Old row state
  - New row state
  - Metadata
- Serialization formats:
  - JSON
  - Avro, using a schema registry

    {
      "before": null,
      "after": {
        "id": 1004,
        "first_name": "Anne",
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "server_id": 0,
        "ts_sec": 0,
        "file": "mysql-bin.000003",
        "pos": 154,
        "row": 0,
        "snapshot": true,
        "db": "inventory",
        "table": "customers"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }
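A consumer of such events typically looks at `op` and the `after` state. The following Java sketch, under stated assumptions, keeps an in-memory cache in sync. `CacheUpdater` and its nested `ChangeEvent` class are hypothetical simplifications of the envelope above (no JSON parsing, no schema), and only the create, update, and delete operations from the earlier slides are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keeps an in-memory cache in sync from Debezium-style change events.
final class CacheUpdater {

    // Simplified stand-in for a change event: primary key plus the envelope fields used here.
    static final class ChangeEvent {
        final long key;
        final Map<String, Object> before; // old row state, null for creates
        final Map<String, Object> after;  // new row state, null for deletes
        final String op;                  // "c" = create, "u" = update, "d" = delete

        ChangeEvent(long key, Map<String, Object> before, Map<String, Object> after, String op) {
            this.key = key;
            this.before = before;
            this.after = after;
            this.op = op;
        }
    }

    private final Map<Long, Map<String, Object>> cache = new HashMap<>();

    // Creates and updates store the new row state; deletes evict the entry.
    void apply(ChangeEvent event) {
        switch (event.op) {
            case "c":
            case "u":
                cache.put(event.key, event.after);
                break;
            case "d":
                cache.remove(event.key);
                break;
            default:
                throw new IllegalArgumentException("Unsupported op: " + event.op);
        }
    }

    Map<String, Object> get(long key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        CacheUpdater updater = new CacheUpdater();
        Map<String, Object> anne = Map.of("id", 1004, "first_name", "Anne", "last_name", "Kretchmar");
        updater.apply(new ChangeEvent(1004L, null, anne, "c"));
        System.out.println(updater.get(1004L));
        updater.apply(new ChangeEvent(1004L, anne, null, "d"));
        System.out.println(updater.get(1004L));
    }
}
```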

Slide 17

Slide 17 text

1. The Issue with Dual Writes
   - What's the problem?
   - Change data capture to the rescue!
2. CDC Use Cases & Patterns
   - Replication
   - Audit Logs
   - Microservices
3. Practical Matters
   - Deployment Topologies
   - Running on Kubernetes
   - Single Message Transforms

Slide 18

Slide 18 text

CDC – "Liberation for Your Data" @gunnarmorling #Debezium

Slide 19

Slide 19 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL, Apache Kafka]

Slide 20

Slide 20 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL -> Kafka Connect -> Apache Kafka -> Kafka Connect]

Slide 21

Slide 21 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL -> Kafka Connect (DBZ PG, DBZ MySQL) -> Apache Kafka -> Kafka Connect]

Slide 22

Slide 22 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL -> Kafka Connect (DBZ PG, DBZ MySQL) -> Apache Kafka -> Kafka Connect (ES Connector) -> Elasticsearch]

Slide 23

Slide 23 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL -> Kafka Connect (DBZ PG, DBZ MySQL) -> Apache Kafka -> Kafka Connect (ES Connector, JDBC Connector) -> Elasticsearch, Data Warehouse]

Slide 24

Slide 24 text

Data Replication: Zero-Code Streaming Pipelines
[Diagram: Postgres, MySQL -> Kafka Connect (DBZ PG, DBZ MySQL) -> Apache Kafka -> Kafka Connect (ES Connector, JDBC Connector, ISPN Connector) -> Elasticsearch, Data Warehouse, Infinispan]

Slide 26

Slide 26 text

Data Replication: Low-Latency Streaming Pipelines
https://medium.com/convoy-tech/

Slide 27

Slide 27 text

Auditing
[Diagram: CRM Service -> Source DB -> Kafka Connect (DBZ) -> Apache Kafka (Customer Events)]

Slide 28

Slide 28 text

Auditing
[Diagram: CRM Service -> Source DB -> Kafka Connect (DBZ) -> Apache Kafka (Customer Events)]

"Transactions" table:

Id    User     Use Case
tx-1  Bob      Create Customer
tx-2  Sarah    Delete Customer
tx-3  Rebecca  Update Customer

Slide 29

Slide 29 text

Auditing
[Diagram: CRM Service -> Source DB -> Kafka Connect (DBZ) -> Apache Kafka (Customer Events, Transactions)]

"Transactions" table:

Id    User     Use Case
tx-1  Bob      Create Customer
tx-2  Sarah    Delete Customer
tx-3  Rebecca  Update Customer

Slide 30

Slide 30 text

Auditing
[Diagram: CRM Service -> Source DB -> Kafka Connect (DBZ) -> Apache Kafka (Customer Events, Transactions) -> Kafka Streams]

"Transactions" table:

Id    User     Use Case
tx-1  Bob      Create Customer
tx-2  Sarah    Delete Customer
tx-3  Rebecca  Update Customer

Slide 31

Slide 31 text

Auditing
[Diagram: CRM Service -> Source DB -> Kafka Connect (DBZ) -> Apache Kafka (Customer Events, Transactions) -> Kafka Streams -> Enriched Customer Events]

"Transactions" table:

Id    User     Use Case
tx-1  Bob      Create Customer
tx-2  Sarah    Delete Customer
tx-3  Rebecca  Update Customer

bit.ly/debezium-auditlogs

Slide 32

Slide 32 text

Customers (value):

    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }

Transactions (key, then value):

    { "id": "tx-3" }

    {
      "before": null,
      "after": {
        "id": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "source": {
        "name": "dbserver1",
        "table": "transactions",
        "txId": "tx-3"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }

Slide 33

Slide 33 text

Auditing: Enriched Customers

    {
      "before": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "table": "customers",
        "txId": "tx-3",
        "user": "Rebecca",
        "use_case": "Update customer"
      },
      "op": "u",
      "ts_ms": 1486500577691
    }

Slide 34

Slide 34 text

Auditing
- Non-trivial join implementation
  - no ordering across topics
  - need to buffer change events until TX data available
- bit.ly/debezium-auditlogs

    @Override
    public KeyValue transform(JsonObject key, JsonObject value) {
        // First try to enrich and emit the events buffered earlier
        boolean enrichedAllBufferedEvents = enrichAndEmitBufferedEvents();

        // Older events are still waiting for TX data: buffer this one behind them
        if (!enrichedAllBufferedEvents) {
            bufferChangeEvent(key, value);
            return null;
        }

        KeyValue enriched = enrichWithTxMetaData(key, value);

        // TX data for this event has not arrived yet: buffer it for a later attempt
        if (enriched == null) {
            bufferChangeEvent(key, value);
        }

        return enriched;
    }
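The buffering idea can be sketched without Kafka Streams. The following Java class is a hypothetical, in-memory illustration, not the implementation from the linked post: `TxEnricher`, its method names, and the string-based events are all assumptions. It holds change events back, in arrival order, until the metadata for their transaction has been seen.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of "buffer change events until TX data is available".
final class TxEnricher {

    // A change event waiting for its transaction metadata.
    private static final class Pending {
        final String txId;
        final String payload;

        Pending(String txId, String payload) {
            this.txId = txId;
            this.payload = payload;
        }
    }

    private final Map<String, String> userByTxId = new HashMap<>(); // TX metadata seen so far
    private final Deque<Pending> buffer = new ArrayDeque<>();       // pending events, oldest first

    // Called for each record from the transactions topic.
    void onTransactionMetadata(String txId, String user) {
        userByTxId.put(txId, user);
    }

    // Called for each customer change event; returns the events that can be emitted now.
    List<String> onChangeEvent(String txId, String payload) {
        buffer.addLast(new Pending(txId, payload));
        return drain();
    }

    // Emits buffered events from the front and stops at the first one still lacking TX data,
    // so the original order of change events is preserved.
    List<String> drain() {
        List<String> enriched = new ArrayList<>();
        while (!buffer.isEmpty() && userByTxId.containsKey(buffer.peekFirst().txId)) {
            Pending next = buffer.removeFirst();
            enriched.add(next.payload + " by " + userByTxId.get(next.txId));
        }
        return enriched;
    }

    public static void main(String[] args) {
        TxEnricher enricher = new TxEnricher();
        System.out.println(enricher.onChangeEvent("tx-3", "update customer 1004"));
        enricher.onTransactionMetadata("tx-3", "Rebecca");
        System.out.println(enricher.drain());
    }
}
```

A real implementation would also need durable state for the buffer and the metadata, which this sketch leaves out.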

Slide 35

Slide 35 text

Microservice CDC Patterns

Slide 36

Slide 36 text

Microservice Architectures: Data Synchronization
- Propagate data between different services without coupling
- Each service keeps optimised views locally
[Diagram: Order, Item, and Stock services, each an App with a Local DB; Item Changes, Stock Changes]

Slide 37

Slide 37 text

Outbox Pattern: Separate Events Table
[Diagram: Order Service -> Source DB (Orders, Outbox) -> Kafka Connect (DBZ) -> Apache Kafka (Order Events, Credit Worthiness Check Events) -> Shipment Service, Customer Service]

Slide 38

Slide 38 text

Outbox Pattern: Separate Events Table
[Diagram: Order Service -> Source DB (Orders, Outbox) -> Kafka Connect (DBZ) -> Apache Kafka (Order Events, Credit Worthiness Check Events) -> Shipment Service, Customer Service]

"Outbox" table:

Id    AggregateType  AggregateId  Type                 Payload
ec6e  Order          123          OrderCreated         { "id" : 123, ... }
8af8  Order          456          OrderDetailCanceled  { "id" : 456, ... }
890b  Customer       789          InvoiceCreated       { "id" : 789, ... }

bit.ly/debezium-outbox-pattern
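How rows of such an outbox table might end up in per-aggregate topics can be sketched in plain Java. The class below is a hypothetical illustration with in-memory lists standing in for Kafka topics. The `<aggregatetype>-events` topic naming and all class and method names are assumptions of this sketch, not Debezium's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: routes outbox rows to one topic per aggregate type.
final class OutboxRouter {

    // One row of the "Outbox" table shown on the slide.
    static final class OutboxRow {
        final String id;
        final String aggregateType;
        final String aggregateId;
        final String type;
        final String payload;

        OutboxRow(String id, String aggregateType, String aggregateId, String type, String payload) {
            this.id = id;
            this.aggregateType = aggregateType;
            this.aggregateId = aggregateId;
            this.type = type;
            this.payload = payload;
        }
    }

    // In-memory stand-in for Kafka topics: topic name -> messages in arrival order.
    private final Map<String, List<String>> topics = new LinkedHashMap<>();

    // Topic naming scheme is an assumption made for this sketch.
    static String topicFor(OutboxRow row) {
        return row.aggregateType.toLowerCase(Locale.ROOT) + "-events";
    }

    // Appends the event to its topic; a real router would key the message by aggregate id.
    void route(OutboxRow row) {
        topics.computeIfAbsent(topicFor(row), t -> new ArrayList<>())
              .add(row.aggregateId + ":" + row.type);
    }

    List<String> topic(String name) {
        return topics.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        OutboxRouter router = new OutboxRouter();
        router.route(new OutboxRow("ec6e", "Order", "123", "OrderCreated", "{}"));
        router.route(new OutboxRow("890b", "Customer", "789", "InvoiceCreated", "{}"));
        System.out.println(router.topic("order-events"));
        System.out.println(router.topic("customer-events"));
    }
}
```

Because the service inserts the outbox row in the same local transaction as its business data, the event exists if and only if the business change was committed.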

Slide 39

Slide 39 text

Strangler Pattern: Migrating from Monoliths to Microservices
https://martinfowler.com/bliki/StranglerFigApplication.html

Slide 40

Slide 40 text

Strangler Pattern
[Diagram: Customer]

Slide 41

Slide 41 text

Strangler Pattern
[Diagram: Router; Customer (reads/writes); Customer' (reads); CDC with Transformation between Customer and Customer']

Slide 42

Slide 42 text

Strangler Pattern
[Diagram: Router; reads/writes go to both Customer services; CDC in both directions]

Slide 43

Slide 43 text

1. The Issue with Dual Writes
   - What's the problem?
   - Change data capture to the rescue!
2. CDC Use Cases & Patterns
   - Replication
   - Audit Logs
   - Microservices
3. Practical Matters
   - Deployment Topologies
   - Running on Kubernetes
   - Single Message Transforms

Slide 44

Slide 44 text

Deployment Topologies: Basic Set-Up
[Diagram: CDC]

Slide 45

Slide 45 text

Deployment Topologies: Database High Availability
[Diagram: CDC]

Slide 47

Slide 47 text

Deployment Topologies: Automatic Fail-over
[Diagram: HA Proxy, CDC]

Slide 49

Slide 49 text

Deployment Topologies: Can't Change Binlog Mode?
[Diagram: Primary, Secondary, CDC]

Slide 50

Slide 50 text

Deployment Topologies: High Availability for Connectors
[Diagram: two CDC instances, Deduplicator]
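The slide names a deduplicator downstream of two CDC instances without describing how it works. One possible approach, sketched here in Java under a loud assumption, is to track the highest source position seen per source and drop anything at or below it. The assumption is that every event carries a position that only grows within one source (for example a log offset). The `Deduplicator` class and its `accept` method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: drops events that a second connector instance has already delivered.
// Assumes each event carries a source position that increases monotonically per source.
final class Deduplicator {
    private final Map<String, Long> lastPosition = new HashMap<>();

    // Returns true if the event is new and should be forwarded, false if it is a duplicate.
    boolean accept(String source, long position) {
        Long last = lastPosition.get(source);
        if (last != null && position <= last) {
            return false; // already seen this position, or an older one
        }
        lastPosition.put(source, position);
        return true;
    }

    public static void main(String[] args) {
        Deduplicator deduplicator = new Deduplicator();
        System.out.println(deduplicator.accept("dbserver1", 154L)); // from connector A
        System.out.println(deduplicator.accept("dbserver1", 154L)); // same event from connector B
    }
}
```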

Slide 51

Slide 51 text

Running on Kubernetes: Deployment via Operators
- YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
- Operator applies configuration
- Advantages
  - Automated deployment and scaling
  - Simplified upgrading
  - Portability across clouds

    apiVersion: "kafka.strimzi.io/v1alpha1"
    kind: "KafkaConnector"
    metadata:
      name: "inventory-connector"
      labels:
        connect-cluster: my-connect-cluster
    spec:
      class: io.debezium.connector.postgresql.PostgresConnector
      tasksMax: 1
      config:
        database.hostname: "postgres"
        database.port: "5432"
        database.user: "bob"
        database.password: "secret"
        database.dbname: "prod"
        database.server.name: "dbserver1"
        schema.whitelist: "inventory"

Slide 52

Slide 52 text

Sneak Peek - UI Proof-of-Concept

Slide 53

Slide 53 text

Sneak Peek - UI Proof-of-Concept
https://github.com/debezium/debezium-ui-poc/

Slide 54

Slide 54 text

Single Message Transformations: The Swiss Army Knife of Kafka Connect
- Format conversions
- Time/date fields
- Extract new row state
- Aggregate sharded tables to single topic
- Keep compatibility with existing consumers
- Coming with Debezium: Router, "Flattener"
Image: © Emilian Robert Vicol, https://flic.kr/p/c8s6Y3
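Two of the transformations listed above, extracting the new row state and mapping sharded tables onto a single topic, can be illustrated with plain Java. This is a sketch on `Map`-based events, not the Kafka Connect `Transformation` interface. The class name, method names, and the `_shard<N>` topic suffix are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of two single message transformations on Map-based events.
final class MessageTransforms {

    // "Flattener": keep only the new row state of a change event envelope.
    // Returns null when the envelope has no "after" state, e.g. for deletes.
    @SuppressWarnings("unchecked")
    static Map<String, Object> extractNewState(Map<String, Object> envelope) {
        return (Map<String, Object>) envelope.get("after");
    }

    // "Router": map per-shard topics onto one logical topic.
    // The "_shard<N>" suffix is an assumed naming scheme.
    static String routeTopic(String topic) {
        return topic.replaceAll("_shard\\d+$", "");
    }

    public static void main(String[] args) {
        Map<String, Object> envelope = new HashMap<>();
        envelope.put("before", null);
        envelope.put("after", Map.of("id", 1004, "last_name", "Kretchmar"));
        envelope.put("op", "c");

        System.out.println(extractNewState(envelope));
        System.out.println(routeTopic("dbserver1.inventory.customers_shard2"));
    }
}
```

Consumers that expect flat rows rather than the full envelope can keep working unchanged when such a transformation is applied before the event reaches them.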

Slide 55

Slide 55 text

Single Message Transformations: Externalizing Large Column Values
[Diagram: DBZ, Amazon S3]

Slide 56

Slide 56 text

Single Message Transformations: Externalizing Large Column Values
[Diagram: DBZ, Amazon S3]

    {
      "before": { ... },
      "after": {
        "id": 1004,
        "last_name": "Kretchmar",
        "email": "[email protected]",
        "image": "imgs--after"
      },
      ...
    }
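The idea of replacing a large column value with a reference can be sketched as follows. This is a hypothetical Java illustration: an in-memory map stands in for Amazon S3, and the size threshold and the `<rowKey>/<column>` reference scheme are assumptions, not the key format used on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: moves large binary column values to a blob store and
// leaves a reference in the row. An in-memory map stands in for Amazon S3.
final class LargeValueExternalizer {
    private final Map<String, byte[]> blobStore = new HashMap<>();
    private final int thresholdBytes;

    LargeValueExternalizer(int thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    // Returns a copy of the row in which every byte[] value larger than the
    // threshold has been replaced by a reference into the blob store.
    Map<String, Object> transform(String rowKey, Map<String, Object> row) {
        Map<String, Object> result = new HashMap<>(row);
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value instanceof byte[] && ((byte[]) value).length > thresholdBytes) {
                String reference = rowKey + "/" + column.getKey(); // assumed key scheme
                blobStore.put(reference, (byte[]) value);
                result.put(column.getKey(), reference);
            }
        }
        return result;
    }

    byte[] load(String reference) {
        return blobStore.get(reference);
    }

    public static void main(String[] args) {
        LargeValueExternalizer externalizer = new LargeValueExternalizer(1024);
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1004);
        row.put("image", new byte[4096]);
        System.out.println(externalizer.transform("1004", row).get("image"));
    }
}
```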

Slide 57

Slide 57 text

Takeaways
- Change Data Capture – Liberation for your data!
- Enabling use cases such as replication, streaming queries, maintaining CQRS read models etc.
- Microservices: outbox and strangler patterns
- Debezium: open-source CDC for a growing number of databases

Slide 58

Slide 58 text

Takeaways
- Change Data Capture – Liberation for your data!
- Enabling use cases such as replication, streaming queries, maintaining CQRS read models etc.
- Microservices: outbox and strangler patterns
- Debezium: open-source CDC for a growing number of databases
"Friends Don't Let Friends Do Dual-Writes"

Slide 59

Slide 59 text

Resources
- Website: debezium.io
- debezium.io/documentation/online-resources
- debezium.io/blog
- Source code: https://github.com/debezium/
- Latest news: @debezium

Slide 60

Slide 60 text

Q&A
[email protected]
@gunnarmorling

Slide 62

Slide 62 text

Outlook: View Materialization
Awareness of Transaction Boundaries
- Topic with BEGIN/END markers
- Enable consumers to buffer all events of one transaction

BEGIN:

    {
      "transactionId" : "tx-123",
      "eventType" : "begin transaction",
      "ts_ms": 1486500577125
    }

END:

    {
      "transactionId" : "tx-123",
      "ts_ms": 1486500577691,
      "eventType" : "end transaction",
      "eventCount" : [
        { "name" : "dbserver1.inventory.Order", "count" : 1 },
        { "name" : "dbserver1.inventory.OrderLine", "count" : 5 }
      ]
    }
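A consumer could use the END marker's event counts to decide when a transaction is complete. The Java sketch below is a hypothetical illustration of that buffering: the class and method names are assumptions, events are plain strings, and the per-collection counts of the END marker are reduced to their sum (6 for the example above). Since there is no ordering across topics, the END marker may arrive before the last change event, so completeness is checked on both paths.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buffers the change events of a transaction until the END
// marker has arrived and the announced number of events has been received.
final class TransactionBuffer {
    private final Map<String, List<String>> pending = new HashMap<>();
    private final Map<String, Integer> expectedCounts = new HashMap<>();

    // Called for each change event; returns the complete transaction or an empty list.
    List<String> onEvent(String txId, String event) {
        pending.computeIfAbsent(txId, id -> new ArrayList<>()).add(event);
        return releaseIfComplete(txId);
    }

    // Called for the END marker; totalCount is the sum of its "eventCount" entries.
    List<String> onEnd(String txId, int totalCount) {
        expectedCounts.put(txId, totalCount);
        pending.computeIfAbsent(txId, id -> new ArrayList<>());
        return releaseIfComplete(txId);
    }

    // Releases the buffered events once END was seen and all announced events arrived.
    private List<String> releaseIfComplete(String txId) {
        Integer expected = expectedCounts.get(txId);
        List<String> events = pending.get(txId);
        if (expected == null || events.size() < expected) {
            return List.of();
        }
        expectedCounts.remove(txId);
        pending.remove(txId);
        return events;
    }

    public static void main(String[] args) {
        TransactionBuffer buffer = new TransactionBuffer();
        buffer.onEvent("tx-123", "Order 1");
        System.out.println(buffer.onEnd("tx-123", 2));
        System.out.println(buffer.onEvent("tx-123", "OrderLine 1"));
    }
}
```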