Streaming Database Changes with Debezium

Slides from a talk given at Devoxx Belgium 2017; the recording of the talk can be found at https://youtu.be/IOZ2Um6e430

"Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) - secret ingredient for change data capture"

Updating caches and full-text indexes, synchronizing data between microservices, maintaining different read models in a CQRS-style architecture, feeding operational data to your analytics tools: these are just a few of the use cases that benefit from streaming changes out of your datastore.

In this session, you’ll learn what change data capture (CDC) is about and how it can be implemented using Debezium, an open-source CDC solution based on Apache Kafka. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed not to compromise on data correctness and completeness even when things go wrong.


Gunnar Morling

November 09, 2017


  1. Streaming Database Changes with Debezium
    Gunnar Morling
    @gunnarmorling

  2. Agenda
    Use cases for change data streams
    How to create a change data stream?
    Change Data Capture with Kafka (Connect)
    Introducing Debezium
    Demo
    #Devoxx #Debezium @gunnarmorling
  3. Gunnar Morling
    Open source software engineer at Red Hat
    Debezium
    Hibernate
    Spec Lead for Bean Validation 2.0 - Thursday, 13:35
    Other projects: ModiTect, MapStruct - Thursday, 13:10
    gunnar@hibernate.org
    @gunnarmorling
    http://in.relation.to/gunnar-morling/
  4. Change Data Capture: What is it about?
    Get an event stream with all data and schema changes in your DB
    [Diagram labels: DB 1, Apache Kafka, DB 2]
  5. CDC Use Cases: Data Replication
    Replicate data to other DB
    Feed analytics system or DWH
    Feed data to other teams
    [Diagram labels: DB 1, Apache Kafka, DB 2]
  6. CDC Use Cases: Microservice Architectures
    Propagate data between different services without coupling
    Each service keeps optimised views locally
    [Diagram labels: Order, Item and Stock apps, each with a local DB; Item Changes, Stock Changes]
  7. CDC Use Cases: Others
    Update or invalidate caches
    Enable full-text search via Elasticsearch, Solr etc.
    Update CQRS read models
  8. How to Capture Changes?

  9. How to Capture Changes? Possible approaches
    Dual writes
      Failure handling?
      Prone to race conditions
    Polling for changes
      How to find changed rows?
      How to handle deleted rows?
    https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/
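
The two questions about polling can be made concrete with a small sketch. It assumes a hypothetical `customers` table with an `updated_at` column that the application maintains (neither is from the talk) and uses an in-memory SQLite database purely for illustration.

```python
import sqlite3

# Hypothetical table; updated_at is a logical timestamp set by the application.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated_at INTEGER)"
)

def poll(since: int) -> list:
    """Return the rows whose updated_at is newer than the last value we saw."""
    return conn.execute(
        "SELECT id, email, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()

conn.execute("INSERT INTO customers VALUES (1004, 'annek@noanswer.org', 1)")
first = poll(0)   # the insert is visible to the poller

# Two updates happen between polls; only the final state can be observed.
conn.execute("UPDATE customers SET email = 'a@example.com', updated_at = 2 WHERE id = 1004")
conn.execute("UPDATE customers SET email = 'b@example.com', updated_at = 3 WHERE id = 1004")
second = poll(1)

# The delete leaves no row behind, so the poller never learns about it.
conn.execute("DELETE FROM customers WHERE id = 1004")
third = poll(3)

print(first, second, third)
```

In this sketch the intermediate update and the delete are both invisible to the poller, which is the gap that reading the database log is meant to close.
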
  10. How to Capture Changes! Monitoring the DB
    Reading the database log
    Failures only cause delays but no inconsistencies
    Transparent to upstream applications
  11. Database Logs: A closer look
    Apps write to the DB
    DB records changes in log files, then updates tables
    Logs used for TX recovery, replication etc.
      MySQL: binlog
      Postgres: write-ahead log
      MongoDB: oplog
    Let's use that for CDC!
  12. Using Apache Kafka as Basis: Well suited for CDC
    Messages have a key
    Guaranteed ordering (per partition)
    Pull-based
    Supports compaction
    Scales horizontally
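
Keyed messages and compaction fit CDC because a compacted topic eventually retains only the most recent value per key, which for change events means the latest state of each row. The following is a simplified model of that idea, not Kafka's implementation; the keys and values are made up, and a `None` value stands in for a tombstone (delete marker).

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[dict]]

def compact(log: List[Record]) -> Dict[str, dict]:
    """Keep the latest value per key; a None value (tombstone) removes the key."""
    latest: Dict[str, dict] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest

# Records for one partition, in the order they were written.
log: List[Record] = [
    ("1004", {"first_name": "Anne"}),
    ("1005", {"first_name": "Bob"}),
    ("1004", {"first_name": "Anne Marie"}),  # later update for the same key
    ("1005", None),                          # row 1005 was deleted
]

state = compact(log)
print(state)  # {'1004': {'first_name': 'Anne Marie'}}
```

Because ordering is guaranteed per partition and records with the same key land in the same partition, "latest" is well defined for every row.
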
  13. Kafka Connect
    A framework for source and sink connectors
    Track offsets
    Schema support
    Clustering
    Rich eco-system of connectors
  14. CDC Topology with Kafka Connect
    [Diagram labels: Postgres, MySQL, Apache Kafka]
  15. CDC Topology with Kafka Connect
    [Diagram labels: Postgres, MySQL, Apache Kafka, Kafka Connect, Kafka Connect]
  16. CDC Topology with Kafka Connect
    [Diagram labels: Postgres, MySQL, Apache Kafka, Kafka Connect, Kafka Connect, DBZ PG, DBZ MySQL]
  17. CDC Topology with Kafka Connect
    [Diagram labels: Postgres, MySQL, Kafka Connect, Kafka Connect, Apache Kafka, DBZ PG, DBZ MySQL, Elasticsearch, ES Connector]
  18. CDC Message Structure
    Key (PK of table) and Value
    Payload: Before state, After state, Source info
    JSON
      Human-readable
      Verbose
      Optional: schema in each message
    Avro
      Compact binary representation
      Using Confluent Schema Registry
    Example:
      {
        "schema": { ... },
        "payload": {
          "before": null,
          "after": {
            "id": 1004,
            "first_name": "Anne",
            "last_name": "Kretchmar",
            "email": "annek@noanswer.org"
          },
          "source": {
            "name": "dbserver1",
            "server_id": 0,
            "ts_sec": 0,
            "file": "mysql-bin.000003",
            "pos": 154,
            "row": 0,
            "snapshot": true,
            "db": "inventory",
            "table": "customers"
          },
          "op": "c",
          "ts_ms": 1486500577691
        }
      }
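
As a rough illustration of how a consumer might read such an event, the sketch below parses the sample payload from this slide (with the elided `schema` part left out) and summarises it. The `describe` helper and the `OPS` mapping are illustrative names, not part of Debezium; the mapping covers only the create, update and delete codes.

```python
import json

# Sample change event from the slide; the elided "schema" part is omitted.
EVENT_JSON = """
{
  "payload": {
    "before": null,
    "after": {"id": 1004, "first_name": "Anne", "last_name": "Kretchmar",
              "email": "annek@noanswer.org"},
    "source": {"name": "dbserver1", "server_id": 0, "ts_sec": 0,
               "file": "mysql-bin.000003", "pos": 154, "row": 0,
               "snapshot": true, "db": "inventory", "table": "customers"},
    "op": "c",
    "ts_ms": 1486500577691
  }
}
"""

# Operation codes as they appear in the "op" field.
OPS = {"c": "create", "u": "update", "d": "delete"}

def describe(event: dict) -> str:
    """Summarise a change event: operation, origin table and affected row."""
    payload = event["payload"]
    source = payload["source"]
    op = OPS.get(payload["op"], payload["op"])
    # Deletes carry no "after" state, so fall back to "before".
    row = payload["after"] if payload["after"] is not None else payload["before"]
    return f'{op} on {source["db"]}.{source["table"]}: {row}'

event = json.loads(EVENT_JSON)
summary = describe(event)
print(summary)
```

Here `before` is null because the row did not exist previously, and `snapshot` is true because the event came from the initial snapshot rather than from the binlog.
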
  19. Initial Snapshots
    TX logs may have been purged when starting a connector
    Take a snapshot of tables (sending INSERT events)
    Switch to log reading afterwards
  20. Debezium Connectors
    MySQL
    Postgres
    MongoDB (uses patch format)
    Coming next
      Oracle
      SQL Server
      Cassandra? MariaDB?
  21. Debezium Connectors
    Common event format as far as possible
    Common options
      Filters
      Snapshotting mode
    Monitoring via JMX
      Snapshot progress
      Seconds behind master
      Last event, last offset etc.
  22. Operating a Connector
    Using Kafka Connect REST API
    Set up
    Stopping, e.g. to
      Reconfigure
      Update to new version
    Starting up again
    Take snapshots as needed
    Outage of components may cause delays but no inconsistencies
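
Setting up a connector through the Kafka Connect REST API amounts to a POST to `/connectors` with a JSON body containing the connector name and its configuration. The sketch below only builds that request and does not send it. The host, connector name, credentials and values are placeholders, and the property names follow the Debezium MySQL tutorial from around the time of the talk, so they may differ in other versions.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # placeholder address of a Kafka Connect worker

# Placeholder configuration for a MySQL connector; values are assumptions.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.whitelist": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "dbhistory.inventory",
    },
}

def registration_request(base_url: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /connectors request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

req = registration_request(CONNECT_URL, connector)
print(req.get_method(), req.full_url)
# To actually register the connector: urllib.request.urlopen(req)
```

The other operations on the slide map onto further Kafka Connect endpoints: `GET /connectors/{name}/status` to inspect a connector, `PUT /connectors/{name}/config` to reconfigure it, `PUT /connectors/{name}/pause` and `/resume` to stop and restart it, and `DELETE /connectors/{name}` to remove it.
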
  23. Demo

  24. Message Transformations
    Altering individual messages via SMTs
    Use cases:
      Extraction
      Conversion
      Routing
    Apply on source or sink side
    Provided by Debezium:
      Logical table router
      Event flattening SMT
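
The two transformations named on this slide can be pictured as small per-message functions. The sketch below is a conceptual model only: real SMTs are Java classes configured on the connector, and the function names, topic names and regular expression here are hypothetical.

```python
import re
from typing import Optional

def flatten(envelope: dict) -> Optional[dict]:
    """Event flattening: keep only the row's new state from the change envelope.

    Returns None when there is no "after" state (for example, a delete).
    """
    return envelope.get("after")

def route(topic: str, pattern: str, replacement: str) -> str:
    """Logical table routing: rewrite the destination topic with a regex,
    for example to merge the topics of several shards into one."""
    return re.sub(pattern, replacement, topic)

envelope = {
    "before": None,
    "after": {"id": 1004, "first_name": "Anne"},
    "op": "c",
}
flat = flatten(envelope)

# Hypothetical sharded topic name merged into a single logical topic.
merged = route(
    "dbserver1.inventory.customers_shard1",
    r"(.*)customers_shard\d+",
    r"\1customers_all",
)
print(flat, merged)
```

Flattening is useful on the sink side when the target (a search index, say) expects plain rows rather than before/after envelopes.
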
  25. Demo

  26. Trying It Out Yourself
    Docker images for everything
    Extensive tutorial: http://debezium.io/docs/tutorial/
    Docker Compose files: https://github.com/debezium/debezium-examples/
    OpenShift instructions: http://debezium.io/docs/openshift/
  27. Outlook
    Debezium 0.7
      Based on Kafka 1.0
      Support for Postgres on Amazon RDS via wal2json
    Debezium 0.x
      More Connectors
      Support for Infinispan as source and sink
    Debezium 1.x
      Support for event aggregation
      Building blocks for declarative CQRS support
  28. Summary
    Debezium brings CDC for a growing number of databases
    Transparently set up change data event streams
    Works reliably even in case of failures
      No dual write issues
      Consumers wait until connectors are up again
    Used in production at several sites already
    Everything is open source (Apache License v2)
    Contributions welcome!
  29. Resources
    Website: http://debezium.io/
    Source code, examples, Compose files etc.: https://github.com/debezium
    Discussion group: https://groups.google.com/forum/#!forum/debezium
    Latest news: @debezium