Slide 1

Slide 1 text

@systemcraftsman
Introducing Change Data Capture with Debezium and Apache Kafka
Aykut M. Bulgu, Technology Consultant | Software Architect
[email protected]

Slide 2

Slide 2 text

Me as Code

#oc apply -f aykutbulgu.yaml
apiVersion: redhat/v2.5
kind: Middleware & AppDev Consultant
metadata:
  name: Aykut Bulgu
  namespace: Red Hat Consulting - CEMEA
  annotations:
    twitter: @systemcraftsman
    email: [email protected]
    organizer: Software Craftsmanship Turkey
    founder: System Craftsman
  labels:
    married: yes
    children: daughter
    interests: tech (cloud & middleware), aikido, 80s
spec:
  replicas: 2
  containers:
    - image: aykut:latest

Slide 3

Slide 3 text

Agenda
- The Issue with Dual Writes
  - What's the problem?
  - Change data capture to the rescue!
- CDC Use Cases & Patterns
  - Replication
  - Audit Logs
  - Microservices
- Practical Matters
  - Deployment Topologies
  - Running on Kubernetes
  - Single Message Transforms

Slide 4

Slide 4 text

Common Problem: updating multiple resources (Order Service → Database)

Slide 5

Slide 5 text

Common Problem: updating multiple resources (Order Service → Database, Cache)

Slide 6

Slide 6 text

Common Problem: updating multiple resources (Order Service → Database, Cache, Search Index)

Slide 7

Slide 7 text

Common Problem: updating multiple resources (Order Service → Database, Cache, Search Index)

Slide 8

Slide 8 text

"Friends Don't Let Friends Do Dual Writes!"
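The hazard behind this slogan can be shown in a few lines. A minimal sketch (all names hypothetical): the order service performs two independent writes, and a failure between them leaves the database and the cache disagreeing, with no transaction spanning both.

```python
# Sketch of the dual-write hazard: two independent writes with no shared
# transaction. If the second write fails, the stores silently diverge.

db, cache = {}, {}

def update_cache(order_id, status, fail=False):
    if fail:
        raise ConnectionError("cache unreachable")
    cache[order_id] = status

def place_order(order_id, status, cache_fails=False):
    db[order_id] = status                         # write 1: succeeds
    update_cache(order_id, status, cache_fails)   # write 2: may fail

place_order("o-1", "CREATED")
try:
    place_order("o-2", "CREATED", cache_fails=True)
except ConnectionError:
    pass

# The database knows about o-2, the cache does not.
print(sorted(db), sorted(cache))
```

Retrying does not fix this in general: the writer may crash before the retry, and there is no record of which downstream writes are still pending. That is the gap CDC closes.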

Slide 9

Slide 9 text

As a Solution: stream change events from the database (Order Service)

Slide 10

Slide 10 text

As a Solution: stream change events from the database
Order Service → C | C | U | C | U | U | D → Change Data Capture
C = Create, U = Update, D = Delete

Slide 11

Slide 11 text

As a Solution: stream change events from the database
Order Service → C | C | U | C | U | U | D → Change Data Capture
C = Create, U = Update, D = Delete
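The C/U/D stream above is all a downstream consumer needs to rebuild state. A logic sketch (event shapes are illustrative, not Debezium's actual envelope) of applying such a stream to keep a replica in sync:

```python
# Applying a stream of change events (c=create, u=update, d=delete)
# to a downstream replica, in order.

events = [
    {"op": "c", "key": 1, "after": {"status": "CREATED"}},
    {"op": "u", "key": 1, "after": {"status": "PAID"}},
    {"op": "c", "key": 2, "after": {"status": "CREATED"}},
    {"op": "d", "key": 2, "after": None},
]

replica = {}
for ev in events:
    if ev["op"] in ("c", "u"):
        replica[ev["key"]] = ev["after"]   # upsert the new row state
    elif ev["op"] == "d":
        replica.pop(ev["key"], None)       # remove the deleted row

print(replica)  # {1: {'status': 'PAID'}}
```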

Slide 12

Slide 12 text

Change Data Capture with Debezium
Debezium is an open-source distributed platform for change data capture.

Slide 13

Slide 13 text

Debezium: Change Data Capture Platform
- CDC for multiple databases
- Based on transaction logs
- Snapshotting, filtering, etc.
- Fully open source, very active community
- Latest version: 1.3
- Production deployments at multiple companies (e.g. WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar)

Slide 14

Slide 14 text

Red Hat CDC Supported Databases
- GA connectors: MySQL, PostgreSQL, SQL Server, MongoDB
- Developer Preview: Db2
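For reference, deploying one of these connectors is a single JSON request to the Kafka Connect REST API. A minimal sketch for the MySQL connector; hostnames, credentials and server/topic names are illustrative:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

POSTed to the Connect REST endpoint (e.g. `POST http://connect:8083/connectors`), this starts streaming every change in the `inventory` database to Kafka topics prefixed with `dbserver1`.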

Slide 15

Slide 15 text

Advantages of Log-based CDC: Tailing the Transaction Logs
- All data changes are captured
- No polling delay or overhead
- Transparent to writing applications and models
- Can capture deletes
- Can capture old record state and further metadata
https://debezium.io/blog/2018/07/19/advantages-of-log-based-change-data-capture/

Slide 16

Slide 16 text

Log-based vs Query-based CDC

                                                 Log-based | Query-based
All data changes are captured                        ✓     |     -
No polling delay or overhead                         ✓     |     -
Transparent to writing applications and models       ✓     |     -
Can capture deletes and old record state             ✓     |     -
Simple installation/configuration                    -     |     ✓

Slide 17

Slide 17 text

Debezium Change Event Structure
- Key: primary key of the table
- Value: describes the change event
  - before state
  - after state
  - metadata
- Serialization formats: JSON, Avro
- The CloudEvents format can be used too
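A simplified example of such a change event value for an update (the real Debezium envelope carries more `source` metadata, and a `schema` section when the JSON converter is configured to include schemas):

```json
{
  "before": { "id": 1001, "status": "CREATED" },
  "after":  { "id": 1001, "status": "PAID" },
  "source": {
    "connector": "mysql",
    "db": "inventory",
    "table": "orders"
  },
  "op": "u",
  "ts_ms": 1600000000000
}
```

The `op` field is `c`, `u` or `d`; for creates `before` is null, and for deletes `after` is null.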

Slide 18

Slide 18 text

Single Message Transformations
Modify events before storing in Kafka: lightweight, inline transformation of single messages.
- Format conversions
- Time/date fields
- Extract new row state
- Aggregate sharded tables to a single topic
- Keep compatibility with existing consumers
A transformation does not interact with external systems.
Image source: "Penknife, Swiss Army Knife" by Emilian Robert Vicol, used under CC BY 2.0
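As a concrete case, the "extract new row state" bullet maps to the `ExtractNewRecordState` SMT that ships with Debezium: it unwraps the before/after envelope down to just the new row state, which keeps existing consumers working. A sketch of the connector properties (property values illustrative):

```properties
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=false
```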

Slide 19

Slide 19 text

Change Data Capture Use Cases & Patterns

Slide 20

Slide 20 text

Data Replication: Zero-Code Streaming Pipelines
(diagram: MySQL, PostgreSQL, Apache Kafka)

Slide 21

Slide 21 text

Data Replication: Zero-Code Streaming Pipelines
MySQL and PostgreSQL → Kafka Connect → Apache Kafka → Kafka Connect

Slide 22

Slide 22 text

Data Replication: Zero-Code Streaming Pipelines
MySQL and PostgreSQL → Kafka Connect (Debezium MySQL and PostgreSQL connectors) → Apache Kafka

Slide 23

Slide 23 text

Data Replication: Zero-Code Streaming Pipelines
MySQL and PostgreSQL → Kafka Connect (Debezium connectors) → Apache Kafka → Kafka Connect (Elasticsearch connector) → Elasticsearch

Slide 24

Slide 24 text

Data Replication: Zero-Code Streaming Pipelines
MySQL and PostgreSQL → Kafka Connect (Debezium connectors) → Apache Kafka → Kafka Connect (Elasticsearch connector → Elasticsearch; SQL connector → data warehouse)
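The sink side of this pipeline is just another Kafka Connect connector, so the whole path really is zero application code. A sketch using the Confluent Elasticsearch sink connector; the connection URL and topic name are illustrative, and the exact required settings vary by connector version:

```json
{
  "name": "es-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://elasticsearch:9200",
    "topics": "dbserver1.inventory.orders",
    "key.ignore": "false"
  }
}
```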

Slide 25

Slide 25 text

Auditing: CDC and a bit of Kafka Streams
CRM Service → Source DB → Kafka Connect (DBZ) → Apache Kafka
Source: http://bit.ly/debezium-auditlogs

Slide 26

Slide 26 text

Auditing: CDC and a bit of Kafka Streams
CRM Service → Source DB → Kafka Connect (DBZ) → Apache Kafka

Id   | User    | Use Case
tx-1 | Bob     | Create Customer
tx-2 | Sarah   | Delete Customer
tx-3 | Rebecca | Update Customer

Source: http://bit.ly/debezium-auditlogs

Slide 27

Slide 27 text

Auditing: CDC and a bit of Kafka Streams
CRM Service → Source DB → Kafka Connect (DBZ) → Apache Kafka (Customer Events and Transactions topics)

Id   | User    | Use Case
tx-1 | Bob     | Create Customer
tx-2 | Sarah   | Delete Customer
tx-3 | Rebecca | Update Customer

Source: http://bit.ly/debezium-auditlogs

Slide 28

Slide 28 text

Auditing: CDC and a bit of Kafka Streams
CRM Service → Source DB → Kafka Connect (DBZ) → Apache Kafka (Customer Events and Transactions topics) → Kafka Streams

Id   | User    | Use Case
tx-1 | Bob     | Create Customer
tx-2 | Sarah   | Delete Customer
tx-3 | Rebecca | Update Customer

Source: http://bit.ly/debezium-auditlogs

Slide 29

Slide 29 text

Auditing: CDC and a bit of Kafka Streams
CRM Service → Source DB → Kafka Connect (DBZ) → Apache Kafka (Customer Events and Transactions topics) → Kafka Streams → Enriched Customers topic

Id   | User    | Use Case
tx-1 | Bob     | Create Customer
tx-2 | Sarah   | Delete Customer
tx-3 | Rebecca | Update Customer

Source: http://bit.ly/debezium-auditlogs
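The enrichment step above joins each customer change event with the transaction metadata recorded for its originating transaction. A plain-Python logic sketch (not the actual Kafka Streams API; data shapes are illustrative):

```python
# Enrich customer change events with the user behind the originating
# transaction, joining on transaction id.

transactions = {  # state built from the "transactions" topic
    "tx-1": {"user": "Bob"},
    "tx-2": {"user": "Sarah"},
}

customer_events = [  # from the "customer events" topic
    {"tx_id": "tx-1", "op": "c", "customer": "ACME"},
    {"tx_id": "tx-2", "op": "d", "customer": "ACME"},
]

enriched = [
    {**ev, "user": transactions[ev["tx_id"]]["user"]}
    for ev in customer_events
]

print(enriched[0])  # first event now carries the responsible user
```

In the real implementation this is a stream-table join in Kafka Streams, with the transactions topic materialized as state so late-arriving events can still be matched.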

Slide 30

Slide 30 text

Auditing: CDC and a bit of Kafka Streams
Source: http://bit.ly/debezium-auditlogs


Slide 34

Slide 34 text

Microservices Data Exchange
- Propagate data between different services without coupling
- Each service keeps optimised views locally

Slide 35

Slide 35 text

Microservices: Outbox Pattern
Source: http://bit.ly/debezium-outbox-pattern
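The outbox pattern avoids dual writes by making the service write its business row and an event row into an "outbox" table inside one local transaction; Debezium then captures inserts to the outbox table and publishes them to Kafka. A sketch using sqlite3 for brevity; table and column names are illustrative:

```python
# Outbox pattern sketch: both rows commit atomically or not at all,
# so the captured event stream can never disagree with the data.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_type TEXT, aggregate_id TEXT, type TEXT, payload TEXT)""")

with conn:  # one local transaction covering both inserts
    conn.execute("INSERT INTO orders VALUES (?, ?)", ("o-1", "CREATED"))
    conn.execute(
        "INSERT INTO outbox (aggregate_type, aggregate_id, type, payload) "
        "VALUES (?, ?, ?, ?)",
        ("Order", "o-1", "OrderCreated", json.dumps({"id": "o-1"})),
    )

rows = conn.execute("SELECT aggregate_id, type FROM outbox").fetchall()
print(rows)  # [('o-1', 'OrderCreated')]
```

From here, CDC on the outbox table gives other services a reliable event feed without the producing service ever talking to Kafka directly.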

Slide 36

Slide 36 text

Strangler Pattern
- Extract a microservice for single component(s)
- Keep write requests against the running monolith
- Stream changes to the extracted microservice
- Test new functionality
- Switch over; evolve the schema only afterwards
Photo: "Strangler vines on trees, seen on the Mount Sorrow hike" by cynren, under CC BY-SA 2.0

Slide 37

Slide 37 text

Mono to micro: Strangler Pattern (monolith with Customer component)

Slide 38

Slide 38 text

Mono to micro: Strangler Pattern
Router sends reads/writes to the monolith's Customer component; CDC plus a transformation streams changes to the new Customer service, which serves reads.

Slide 39

Slide 39 text

Mono to micro: Strangler Pattern
Router sends reads/writes to both Customer implementations; CDC in both directions keeps the two in sync.

Slide 40

Slide 40 text

Running on OpenShift
Getting the best cloud-native Apache Kafka running on enterprise Kubernetes

Slide 41

Slide 41 text

Running on OpenShift
Upstream community: Strimzi (cloud-native Apache Kafka)
Provides:
- Container images for Apache Kafka, Kafka Connect, ZooKeeper and MirrorMaker
- Kubernetes Operators for managing/configuring Apache Kafka clusters, topics and users
- Kafka Consumer, Producer and Admin clients; Kafka Streams

Slide 42

Slide 42 text

Running on OpenShift: Deployment via Operators
- Source: YAML-based custom resource definitions for Kafka/Connect clusters, topics, etc.
- The Operator applies the configuration
Advantages:
- Automated deployment and scaling
- Simplified upgrading
- Portability across clouds
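With Strimzi, even the Debezium connector itself can be declared as a custom resource and reconciled by the Operator instead of being posted to the Connect REST API by hand. A sketch; the cluster label, names and connection details are illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnector
metadata:
  name: inventory-connector
  labels:
    # ties the connector to a KafkaConnect cluster managed by Strimzi
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    database.hostname: mysql
    database.port: "3306"
    database.user: debezium
    database.password: dbz
    database.server.id: "184054"
    database.server.name: dbserver1
    database.include.list: inventory
    database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9092
    database.history.kafka.topic: schema-changes.inventory
```

Applying this with `oc apply -f` lets the Operator create, reconfigure or delete the connector declaratively, which is what makes the deployment portable and upgradeable.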

Slide 43

Slide 43 text

Demo Time! https://github.com/systemcraftsman/debezium-demo

Slide 44

Slide 44 text

Thank You
@systemcraftsman
[email protected] | [email protected]